A quantum Monte Carlo algorithm for arbitrary spin-1/2 Hamiltonians

We present a universal parameter-free quantum Monte Carlo (QMC) algorithm designed to simulate arbitrary spin-$1/2$ Hamiltonians. To ensure the convergence of the Markov chain to equilibrium for every conceivable case, we devise a clear and simple automated protocol that produces QMC updates that are provably ergodic and satisfy detailed balance. We demonstrate the applicability and versatility of our method by considering several illustrative examples, including the simulation of the XY model on a triangular lattice, the toric code, and random $k$-local Hamiltonians. We have made our program code freely accessible on GitHub.


I. INTRODUCTION
Quantum Monte Carlo (QMC) algorithms [1][2][3] are an indispensable tool in the study of the equilibrium properties of large quantum many-body systems, with applications ranging from superconductivity and novel quantum materials [4][5][6] through the physics of neutron stars [7] and quantum chromodynamics [8,9].The algorithmic development of QMC remains an active area of research, with continual efforts being made to extend the scope of QMC applicability and to improve convergence rates of existing algorithms with the goal of expanding our understanding of quantum systems and facilitating the discovery of novel quantum phenomena [10][11][12][13][14].
While QMC algorithms have been adapted to the simulation of a wide variety of physical systems, different physical models typically require the development of distinct model-specific update rules and measurement schemes, making the simulation of many large-scale quantum many-body systems prohibitively time-consuming.
In this paper, we introduce a universal parameterfree QMC algorithm designed to reliably simulate arbitrary spin-1/2 Hamiltonians.To achieve such capabilities, we devise a clear and simple automated protocol for generating the necessary set of updates to ensure the ergodic Markov chain Monte Carlo sampling of any conceivable input system.The generated QMC updates are shown to be ergodic as well as satisfying detailed balance thereby guaranteeing the proper convergence of the QMC Markov chain.We demonstrate the validity and flexibility of our code by studying in detail a number of models, including the XY model on a triangular lattice, the toric code, random k-local Hamiltonians and more.We have also made our program code freely accessible on GitHub [15].* lbarash@usc.edu The technique we propose here builds on the recently introduced Permutation Matrix Representation (PMR) QMC [16] -an abstract Trotter-errorfree technique wherein the quantum dimension consists of products of elements of permutation groups and which as a result allows for the general treatment of entire classes of Hamiltonians.In PMR QMC, the partition function is expanded in a power series in the off-diagonal strength of the Hamiltonian about the partition function of the classical (diagonal) component of the Hamiltonian [17,18].
The paper is organized as follows.In Sec.II, we provide a brief overview of the off-diagonal series expansion for quantum partition functions, on which our approach is founded.In Sec.III, we analyze spin-1/2 Hamiltonians in the context of PMR-QMC.In the following section, Sec.IV, we discuss our method to generate all the necessary QMC updates, proving that these ensure both ergodicity and detailed balance.Section V is devoted to the implementation of comprehensive measurement schemes.In Sec.VI we showcase the power of our technique by detailing the simulation results of the physical models mentioned above.We conclude in Sec.VII with an additional discussion of future directions of research.

II. THE OFF-DIAGONAL PARTITION FUNCTION EXPANSION
The permutation matrix representation (PMR) protocol [16] begins by first casting the to-besimulated Hamiltonian in PMR form, i.e., as a sum where { Pj } is a set of M +1 distinct generalized permutation matrices [19], i.e., matrices with at most one nonzero element in each row and each column.
Each operator Pj can be written, without loss of generality, as Pj = D j P j where D j is a diagonal matrix and P j is a permutation matrix with no fixed points (equivalently, no nonzero diagonal elements) except for the identity matrix P 0 = 1.We will refer to the basis in which the operators {D j } are diagonal as the computational basis and denote its states by {|z⟩}.We will call the diagonal matrix D 0 the 'classical Hamiltonian'.The permutation matrices appearing in H will be treated as a subset of a permutation group, wherein P 0 is the identity element.
The {D j P j } off-diagonal operators (in the computational basis) give the system its 'quantum dimension'.For j > 0, each term D j P j obeys D j P j |z⟩ = d j (z ′ )|z ′ ⟩ where d j (z ′ ) is a possibly complex-valued coefficient and |z ′ ⟩ ̸ = |z⟩ is a basis state.Upon casting the Hamiltonian in PMR form, one can show [16] that the partition function Z = Tr e −βH can be written as Ez 0 ,...,Ez q ] ⟨z|S iq |z⟩ .
(2) The sum above is a double sum: over the set of all basis states z and over all products of q offdiagonal permutation operators S iq = P iq . . .P i2 P i1 with q running from zero to infinity.The multi-index i q = (i 1 , . . ., i q ) covers all products of permutation operators, where each index i j runs from 1 to M .
In the above sum, each summand is a product of three terms.The first is D (z,S iq ) ≡ q j=1 d (ij ) zj consisting of a product of the matrix elements The various {|z j ⟩} q j=0 states are the states obtained from the action of the ordered P ij operators in the product S iq on |z 0 ⟩, then on |z 1 ⟩, and so forth.For S iq = P iq . . .P i2 P i1 , we have |z 0 ⟩ = |z⟩, P ij |z j−1 ⟩ = |z j ⟩ for j = 1, 2, . . ., q.The proper indexing of the states |z j ⟩ is |z (i1,i2,...,ij ) ⟩ to indicate that the state at the j-th step depends on all P i1 . . .P ij .We will however use the shorthand |z j ⟩ to simplify the notation.The sequence of basis states {|z j ⟩} may be viewed as a 'walk' [20] on the Hamiltonian graph where every matrix element H ij corresponds to an edge between the two basis states, or nodes, i and j.
The second term in each summand, e −β[Ez 0 ,...,Ez q ] , is called the divided differences of the function f (•) = e −β(•) with respect to the inputs [E z0 , . . ., E zq ], where The divided differences [21,22] A useful property is that the divided differences are invariant under rearrangements of the input values.Now, the term ⟨z|S iq |z⟩ in Eq. ( 2) evaluates either to 1 or to zero.Moreover, since the permutation matrices with the exception of P 0 have no fixed points, the condition ⟨z|S iq |z⟩ = 1 implies S iq = 1, i.e., S iq must evaluate to the identity element P 0 (note that the identity element does not appear in the sequences S iq ).The expansion Eq. ( 2) can thus be more succinctly rewritten as i.e., as a sum over all basis states and permutation matrix products that evaluate to the identity matrix.
Having derived the expansion Eq. ( 5) for any Hamiltonian cast in the form Eq. ( 1), we are now in a position to interpret the partition function expansion as a sum of weights, i.e., Z = C w C , where the set of configurations {C} consists of all the distinct pairs {|z⟩, S iq }.We refer to as the configuration weight.We note that as written, the weights w C can in general be complex-valued, despite the partition function Z = C w C being real (and positive).Hence, the imaginary portions of the complex-valued weights do not contribute to the partition function and may be disregarded altogether. 2 We may therefore ignore the imaginary components of the weights to obtain the strictly realvalued weights

III. PERMUTATION MATRIX REPRESENTATION OF SPIN-1/2 HAMILTONIANS
We next discuss the PMR formulation of general spin-1/2 Hamiltonians, which are the main focus of this work.Hamiltonians of spin-1/2 systems are often, and can always be, represented using Pauli matrices (X, Y , and Z).In order to represent this class of Hamiltonians in PMR form, Eq. ( 1), we choose our group of permutations to be the set of all tensor products of Pauli-X operators, G X , while the diagonal matrices D j will be expressed in terms of tensor products of Pauli-Z matrices.As we shall see, any permutation operator for spin-1/2 systems belongs to G X .
Consider a Hamiltonian H given as a linear combination of Pauli strings Here, s (i) j represents a Pauli matrix s ∈ {X, Y, Z} in the ith Pauli string, which acts on the j-th spin, where j ∈ {1, . . ., n} with n being the number of spins in the system (we assume that each Pauli string S (i)  contains no more than one Pauli matrix s (i) j for each spin index j).Explicitly, the Hamiltonian has the form where the c i 's are real-valued coefficients.To cast the Hamiltonian in PMR form, we first write each Pauli string as a product of a diagonal operator and a permutation operator, i.e. we will write S (i) as a string of Pauli-Z's multiplied by a string of Pauli-X's using the fact that Y = −iZX giving where ci = (−i) y being the number of Y operators appearing in the i-th Pauli string.Here, X (i) represents the i-th string (or product) of Pauli-X operators.The notation Z (i) similarly represents a string of Pauli-Z operators.
Noting that Pauli-X strings are permutation operators and that Pauli-Z strings are diagonal in our chosen basis, we next group together all terms that have the same Pauli-X component, ending up with a Hamiltonian of the form The permutation operators X (i) ∈ G X in the above expression are now distinct products of Pauli-X operators, and we identify as the accompanying diagonal operators, as desired.

A. QMC configurations
As was discussed above, for any Hamiltonian cast in PMR form, the partition function Z = Tr [e −βH ] can be written as a sum of configuration weights, where a configuration C = {|z⟩, S iq } is a pair of a 'classical' basis state |z⟩ (an eigenstate of diagonal operators) and a product S iq of permutation operators that must evaluate to the identity element P 0 = 1.The configuration C induces a list of states {|z 0 ⟩ = |z⟩, |z 1 ⟩, . . ., |z q ⟩ = |z⟩}, which in turn generates a corresponding multi-set of energies E C = {E z0 , E z1 , . . ., E zq } for the configuration.
We can now consider a QMC algorithm, i.e., a Markov Chain Monte Carlo routine, that samples these configurations with probabilities proportional to their weights W C , Eq. ( 7).The Markov process would start with some initial configuration and a set of (probabilistic) rules, or QMC updates, will dictate transitions from one configuration to the next.
We will take the initial state of the chain to be C 0 = {|z⟩, S 0 = 1} where |z⟩ is a randomly generated initial classical state.The weight of this initial configuration is i.e., the classical Boltzmann weight of the initial randomly generated basis state |z⟩.
The set of QMC updates will be discussed in the next sections.Before doing so, we note that to make certain that configurations are sampled properly, i.e. in proportion to their weight, we will ensure that the Markov chain is (i) ergodic, i.e., that the QMC updates are capable of generating all basis states |z⟩ as well as all sequences S iq evaluating to the identity, and (ii) satisfies detailed balance, a sufficient (but not strictly necessary) condition dictating that the ratio of transition probabilities from one configuration to another and the transition in the opposite direction equals to the ratio of their respective weights [23,24].In what follows, we show how both conditions are made to be satisfied.

B. Fundamental cycles
As noted above, for spin-1/2 Hamiltonians, the permutation operators P i are Pauli-X strings.As such, they obey (i) [P i , P j ] = 0 for every i, j, i.e., they all commute and (ii) P 2 i = 1.Moreover, each Pauli-X string can be represented as a bit-string of the form indicates whether X i is in the string (b i = 1) or not (b i = 0).(For example, [1 0 1 0 0 1] would correspond to the string X 1 X 3 X 6 .)Denoting by p i the bit-string corresponding to the operator P i , one can easily verify that the product of two permutation operators P i and P j would likewise correspond to the addition modulo 2 (the XOR operation) of the two respective bit-strings p i ⊕p j .Specifically, a sequence of operators evaluating to the identity, S iq = P iq . . .P i2 P i1 = 1, can be written as ⊕ q j=1 p ij = 0, where 0 is the zero bit-string -the bit-string consisting of only zeros.
Moreover, the following observations may be made: (i) Any given sequence of permutation operators S iq that evaluates to the identity is a permutation of a multi-set of operators, the product of which evaluates to the identity.(ii) A multi-set of operators obtained from another by the removal of a pair of identical operators will evaluate to the identity if and only if the original multi-set evaluates to the identity (this follows from the commutativity of permutation operators and the fact that P 2 i = 1).(iii) Upon removing all pairs of identical operators from a multi-set, one is left with a (proper) set of permutation operators, the product of which evaluates to the identity.This set consists of the permutation operators that are contained an odd number of times in the original sequence S iq .
We shall call a set of permutation operators that multiply to the identity a 'cycle'.In terms of binary strings, cycles may be represented by bit-strings [a 1 a 2 • • • a M ] indicating which of the permutation operators P 1 , . . ., P M belong to the set.The length of a cycle would be the number of permutation operators in it (equivalently, the number of 1's in its binary vector representation).As pointed out above, any sequence of operators S iq has a 'core' cycle from which S iq can be generated via insertions of operator pairs followed by reordering.
The question of how one can generate all possible sequences of operators (which evaluate to the identity) is thus reduced to the question of how one can generate all possible cycles (as the former can be derived from the latter).In terms of binary strings, the above question translates to generating the bitstrings [a 1 a 2 • • • a M ] (a j ∈ {0, 1}) that obey the following system of linear equations over mod-2 addition: ⊕ M j=1 a j p j = 0.This question can be answered by finding the nullspace over mod-2 addition of the matrix whose columns are the bit-strings p j .Any bit-string [a 1 a 2 • • • a M ] obeying the condition ⊕ M j=1 a j p j = 0 can be written as mod-2 addition of bit-strings from the nullspace basis.Now, the nullspace basis can be found efficiently via Gaussian elimination over mod-2 addition [25].We shall refer to cycles represented by the bit-strings from the nullspace basis as fundamental cycles.Any cycle is therefore a combination of fundamental cycles.
Generally, the nullspace basis states can be chosen in many different ways, and so the choice of the set of fundamental cycles is not unique.From a practical point of view, however, we find that obtaining a 'minimal cycle basis', i.e., a basis that minimizes the lengths of all basis cycles, is advantageous from the QMC standpoint.This follows from the fact that the probability of a QMC update to be accepted is a decreasing function of the cycle length (see Sec. IV D).To reduce the cycles lengths, we therefore find a basis using Gaussian elimination and then proceed to replace long-cycle basis states with shorter basis states by performing mod-2 additions between the bit-strings of pairs of cycles, accepting the changes each time a new cycle with a shorter length is found.The process ends when a pass through all pairs of cycles does not result in an improvement.
It is worth noting that minimizing cycle lengths of Hamiltonians with nontrivial topologies will not always lead to a cycle basis whose fundamental cycle lengths are all of the order O(1), i.e., do not grow with system size.In the presence of a non-trivial topology, e.g., where periodic boundary conditions are imposed or for other nonzero genus models, there may exist fundamental cycles becoming longer with the system size such as the cycles 'wrapping around the system', consisting of an extensive number of permutation operators.

C. Ergodicity
We are now in a position to utilize the insights from the previous subsection to establish a set of updates to the sequence of operators S iq that would allow for the possibility of generating all possible operator sequences that evaluate to the identity.
As a preliminary step, prior to the simulation taking place, we produce a list of fundamental cycles C 1 , . . ., C T , where T is the dimension of the nullspace, for the to-be-simulated Hamiltonian cast in PMR form.
We next prove that all possible sequences can be generated, i.e., that the Markov chain is ergodic, via a combination of insertion and removal of fundamental cycles, changing the order of the permutation operators in the sequence (permutation operator swapping), and the insertion and removal of pairs of identical permutation operators.Claim: Every permutation operator sequence evaluating to the identity can be generated by (i) insertion and removal of fundamental cycles, (ii) insertion and removal of pairs of identical permutation operators, (iii) swapping the order of two adjacent permutation operators.Proof: For each permutation operator sequence S iq evaluating to the identity, one can consider the cycle C iq consisting of permutation operators that appear an odd number of times in S iq .In other words, by removal of pairs of identical operators, the multi-set of operators that are contained in S iq can be reduced to the cycle C iq consisting of permutation operators that appear only once.
As shown in Sec.IV B, the cycle C iq can be expressed through fundamental cycles, since any cycle can be expressed in terms of the mod-2 nullspace of the matrix containing bit-strings of permutation operators P i from (1).Specifically, c iq = ⊕ T j=1 α j µ j , where c iq is the bit-string representation of the cycle C iq , µ j is a bit-string representation of the fundamental cycle C j , and α j ∈ {0, 1}, j = 1, 2, . . ., T .
The latter observation implies that the cycle C iq can be obtained by inserting those fundamental cycles C j for which the condition α j = 1 is satisfied, accompanied by any necessary reordering of permutation operators and the removal of pairs of identical permutation operators.In turn, the sequence S iq can be obtained from C iq via the addition of pairs of permutation operators together with reordering of permutation operators.This concludes the proof of our claim.■ The ability to generate all permutation operator sequences that evaluate to the identity, S iq , implies ergodicity along the quantum (or imaginary time) dimension.In addition, the generation of all possible classical basis states |z⟩ can be achieved independently by considering moves such as single-or multi-spin flips of the basis states.Moves of this nature guarantee ergodicity along the 'classical' dimension.
In the next section we will use the above observations to devise QMC updates that ensure ergodicity in the entire configuration space, which is the direct product of the classical and quantum configuration spaces, i.e., the pairs C = {|z⟩, S iq }.

D. QMC updates
We next describe the basic update moves for our QMC algorithm.To ensure both performance and high accuracy of evaluating the weight of any given configuration C = {|z⟩, S iq }, we employ a divided differences calculation by means of addition and removal of items from the input multi-set of classical energies [26].We recall that the weight of a configuration requires the calculation of the divided differences e −β[Ez 0 ,...,Ez q ] with inputs [E z0 , . . ., E zq ], where E zi = ⟨z i |H|z i ⟩ = ⟨z i |D 0 |z i ⟩.For S iq = P iq . . .P i2 P i1 , we have |z 0 ⟩ = |z⟩, P ij |z j−1 ⟩ = |z j ⟩ for j = 1, 2, . . ., q.As it was shown in Ref. [26], upon adding an item to or removing an item from the input list of an already evaluated divided differences e −β[E0,...,Eq] , the re-evaluation can be done with only O(sq) floating point operations and O(sq) bytes of memory, where [E 0 , . . ., E q ] are the inputs and s ∝ max i,j |βE i − βE j |.
Following the analysis of the previous subsection, to ensure the ergodicity of the Markov chain Monte Carlo, we find that employing the following QMC updates suffices.

Simple (local) swap
Simple swap is an update that consists of selecting a random integer m from {1, . . ., q − 1}, and then attempting to swap the permutation operators P im and P im+1 in S iq , namely: The updated sequence also evaluates to the identity.Since the internal classical state |z m ⟩ may change, implementing the swap involves adding a new energy E z ′ m and removing the old one E zm from the energy multi-set.The acceptance probability for the update satisfying the detailed balance condition is where it follows from Eq. ( 7) that , and the energy multi-set Here, e −β[E C ] is a shorthand for e −β[Ez 0 ,Ez 1 ,...,Ez q ] for the configuration C and likewise for C ′ .

Pair insertion and deletion
Pair insertion is an update consisting of selecting random integers m ∈ {1, . . ., q + 1} and j ∈ {1, . . ., M }, and then attempting an insertion of the pair of permutation operators P j into the S iq sequence in such a way that the updated operators P im and P im+1 are P j .As a result, the length of the sequences increases: q → q + 2.
The updated sequence also evaluates to the identity.The input multi-set of energies of the new configuration differs from that of the current configuration by having two additional energies E z ′ m and E z ′ m+1 .The weight of the new configuration is then , where the energy multi-set The acceptance probability is as in Eq. ( 13) with the aforementioned E C ′ .
The reverse update of pair insertion is that of pair deletion.This update can be implemented only for q ≥ 2. It consists of selecting a random integer m ∈ {1, . . ., q − 1} checking the validity of the condition P im = P im+1 , and attempting the removal of the pair of operators P im and P im+1 if this condition is true.The updated sequence also evaluates to the identity.The input multi-set of energies of the new configuration differs from that of the current configuration by having one fewer E zm value and one fewer E zm+1 value.The weight of the new configuration is then proportional to e −β[E C ′ ] , where To maintain the detailed balance condition, the acceptance probability turns out to be

Block swap
Block swap is an update that involves a change of the classical state z.Here, a random position k ∈ {1, . . ., q − 1} in the product S iq is picked such that the product is split into two (non-empty) subsequences, S iq = S 2 S 1 , with S 1 = P i k • • • P i1 and S 2 = P iq • • • P i k+1 .The classical state |z ′ ⟩ at position k in the product is given by where where the energy multi-set The acceptance probability is as in Eq. ( 13) with the aforementioned E C ′ .

Classical updates
Classical moves are moves that involve a manipulation of the classical state |z⟩ while leaving S iq unchanged.In a single bit-flip classical move, a spin from the classical bit-string state |z⟩ of C is picked at random and is subsequently flipped, generating a state |z ′ ⟩ and hence a new configuration C ′ .Calculating the weight of C ′ requires the calculation of the new energy multi-set E C ′ and recalculation of the divided differences, so it can become computationally intensive if q is large.Classical moves should therefore be attempted with relatively low probabilities if q is large.Simply enough, the acceptance probability for a classical move satisfying the detailed balance condition is (13).
In the absence of a quantum part of the Hamiltonian (M = 0), not only are classical moves the only moves necessary, but they are also the only moves that have nonzero acceptance probabilities.Since the initial configuration of the QMC algorithm is a random classical configuration |z⟩ and an empty operator sequence S 0 = 1, for a purely classical Hamiltonian, the algorithm automatically reduces to a classical thermal algorithm keeping the size of the imaginary-time dimension at zero (q = 0) for the duration of the simulation.

Fundamental cycle completion
Fundamental cycle completion is an update that consists of choosing a subsequence S from S iq , choosing a fundamental cycle containing all operators of the subsequence S, and attempting to replace the subsequence S with the remaining operators from the selected cycle.
As is discussed in Sec IV C, for the ergodicity condition to be fulfilled, it is necessary to be able to perform the insertion (or, equivalently, completion) of any of the fundamental cycles.For the probability of accepting an update to be non-negligible it is preferable that the change in the value of q will be minimal.The above implies that to complete a fundamental cycle of length l it is most advantageous to replace a subsequence of length r from S iq by the remaining l − r permutation operators, where r ≈ l/2.
We find that in some cases, there are fundamental cycles such that a simple cycle completion update, where the subsequence S comprises consecutive elements of S iq , is always rejected for these cycles due to the zero weight of the resulting configuration [see Eq. ( 7)].Thus, the fundamental cycle completion routine may never accept some of the fundamental cycles during the Markov process if inserted 'as is'.To resolve this issue, we have developed a subroutine that we refer to as 'cycle completion with gaps', which does not require the elements of the sequence S to be consecutive within S iq .Specific details of this protocol can be found in Appendix A.

Composite update
The role of the composite update is to ensure that non-fundamental cycles have the chance of being incorporated into the sequence of operators directly rather than via the concatenation of fundamental cycles which are inserted through the cycle completion move.The composite update is required for situations where fundamental cycle insertions may have zero weight [due to the vanishing of one or more of the matrix elements in the product D (z,S iq ) , see Eq. ( 7)], whereas non-fundamental cycles may not.
The update consists of a combination of several basic updates and is described as follows.(i) Perform one of the basic QMC updates equally likely: either a simple swap, a pair insertion, a pair deletion, or a fundamental cycle completion.(ii) If the resulting weight is zero, reject the entire update with a probability of 1/2, and with the remaining probability, return to step (i).(iii) Finalize the update with the following acceptance probability, which satisfies the detailed balance condition: Here, R(C, C ′ ) = 1 when the update C → C ′ is a simple swap, R(C, C ′ ) = M when it is a pair insertion, R(C, C ′ ) = 1/M when it is a pair deletion, and To prove that the QMC updates ensure ergodicity in the entire configuration space, consider two arbitrary configurations C = {|z⟩, S iq } and C ′ = {|z ′ ⟩, S i ′ q ′ } such that W C ̸ = 0 and W C ′ ̸ = 0.It follows from Sec. IV C that the above QMC updates allow in particular the following sequence of transformations: Hence, the transformation from C to C ′ is possible, and the ergodicity holds.

Worm update
An alternative to the composite update, also capable of incorporating non-fundamental cycles is a worm-type global update for PMR-QMC.This update involves introducing a 'disturbance' (or a 'worm head') into the sequence of operators S iq by either appending S iq with a single operator or removing one from it (we will refer to this addition or removal of an operator as 'single operator moves').Insertion or removal of a single permutation operator causes the sequence to evaluate to a non-identity permutation, thus resulting in a zero-weight configuration.Consequently, the disturbed sequence must be 'healed' back to an identity-forming sequence.The healing process involves introducing further moves, either by employing the basic updates described above such as simple swap, fundamental cycle completion, pair insertion, and pair deletion or by applying additional single operator moves.These single operator moves have the power to heal the sequence.After each such move, the instantaneous sequence S iq is checked to determine if it evaluates to the identity.If it does, the worm update ends; if not, additional moves are required.
To make sure that detailed balance is conserved, and that the acceptance rates of intermediate worm moves are sufficiently high, we assign non-identity intermediate configurations (sequences of operators that do not evaluate to the identity) their 'natural' weight W C as per Eq. ( 7).This allows intermediate moves to be accepted or rejected with probabilities obeying detailed balance.Additionally, to prevent the worm from straying too far from being healed (in other words, to ensure that the sequence of operators is not too far from the identity), we introduce a small probability p f to reject the entire worm update at each intermediate state.
The worm update is specified as follows.(i) Start with a sequence of operators S iq that evaluates to the identity, and store S iq .(ii) Generate a modified sequence of operators by applying one of the following updates with equal probabilities: either a local swap, fundamental cycle completion, pair insertion or deletion, or a single operator move.Accept or reject the new configuration with probabilities obeying detailed balance.(iii) The worm update ends if the new S iq evaluates to the identity.If it does not evaluate to the identity, revert to the stored S iq and exit with probability p f ; with the remaining probability, return to step (ii).

V. MEASUREMENTS
The PMR formulation allows one to measure a wide range of static operators and additional dynamical quantities [27].The key to being able to do so is to write for any given operator A its thermal average as Although, generally, both w C and A C are complexvalued, both the sums C A C w C and C w C are real-valued since both H and A are Hermitian oper-ators.Therefore, we have where The quantity Re[A C w C ]/ Re[w C ] is therefore the instantaneous quantity associated with the configuration C = (z, S iq ) that will be gathered during the simulation.

B. Measurements of custom static operators
We next consider the measurement of a general static operator A. We proceed by casting it in PMR form, i.e., as A = i Ãi Pi where each Ãi is diagonal and the Pi 's are permutation operators.The operators Pi belong to the group G X containing all possible Pauli-X tensor products, whose elements appear in the PMR representation of the Hamiltonian (see section III).We note that in the most general case, the Pi operators in the representation of A may not all appear in the Hamiltonian's PMR decomposition Eq. ( 1).Nonetheless, we can always write and we may focus on a single Ã P term at a time.
where D (z,S iq ) e −β[Ez 0 ,...,Ez q ] is the weight of the configuration (z, S iq ).We differentiate between two cases: (i) the operator P appears in the Hamiltonian or can be written as a product of permutation operators that appear in the Hamiltonian, (ii) The operator P does not appear in the Hamiltonian and cannot be written as a product of permutation operators that do.
In the latter case, since every S iq sequence consists of the permutation operators that appear in the Hamiltonian, we always have ⟨z| P S iq |z⟩ = 0.Then, it follows from Eq. ( 29) that ⟨ Ã P ⟩ = 0.
In the former case, the operator to be measured has the form A = Ã P where Ã is diagonal and P = P i1 P i2 • • • P i k .We modify Eq. ( 29) so that (z, Siq ) with Siq = P S iq is seen as a configuration instead of (z, S iq ).Thus, we arrive at: where (31) In the above, δ P = 1 if the leftmost operators of Siq are P i1 P i2 • • • P i k and is zero otherwise, and (32) The above formulation allows automatic measurements of static operators as long as these are set up as linear combinations of Pauli strings as in Eq. ( 8).For an arbitrary static operator A given as a linear combination of Pauli strings, one is required to cast the expression in the form i Ãi Pi .As a next step, the Pi operators are expressed in terms of the permutations P i of the Hamiltonian.Here, the Gaussian elimination over mod-2 addition can be used [25].Finally, Eqs. ( 30), ( 31) are employed to compute the thermal average ⟨A⟩.
However, for the thermal average of a custom observable to be computed correctly, it is necessary to account for the following restrictions on its structure.We find that employing Eqs. ( 30) and ( 31) lead to obtaining the correct value of the thermal average ⟨A⟩ under the condition that Ã(z) = 0 for all basis states |z⟩ for which D (z, P ) = 0. Otherwise, it is possible that the following conditions hold for some of the configurations: D (z, P ) = 0 and D (z, Siq ) = 0 while D (z,S iq ) ̸ = 0 and Ã(z) ̸ = 0, where Siq = P S iq .Such configurations (z, S iq ) have a nonzero contribution in Eq. ( 29), but the Markov chain does not generate the corresponding configurations (z, Siq ) because they have zero weights: W (z, Siq ) = 0.
In particular, it is possible to obtain the thermal average of any observable of the form A = i A i D i P i , where each A i is an arbitrary diagonal matrix and each D i P i is a generalized permutation matrix appearing in Eq. ( 1).Any operator A satisfying ⟨z|A|z ′ ⟩ = 0 for all basis states |z⟩ and |z ′ ⟩ such that ⟨z|H|z ′ ⟩ = 0, can be written in such a form and hence its thermal average (30) will be computed correctly.
In addition to the above measurement protocol, there is also a way to correctly obtain a thermal average of a static operator A which can be written in the form A = i P (i) , where each P (i) is a matrix of the general form where k are arbitrary diagonal matrices and D j1 P j1 , . . ., D j k P j k are the generalized permutation matrices appearing in Eq. ( 1).
Focusing on a single such term, we calculate the expectation value for an observable of the form where each of the D j k P j k appear in the PMR decomposition of the Hamiltonian and the A k matrices are arbitrary diagonal operators.Carrying out the offdiagonal expansion for Tr [ P e −βH ], we first obtain where with A i (z j ) ≡ ⟨z j |A i |z j ⟩.Note that there is a one-to-one correspondence between non-vanishing terms ⟨z| P S iq |z⟩ = ⟨z|P j k • • • P j2 P j1 S iq |z⟩ and nonvanishing terms ⟨z|S iq |z⟩ which appear in the partition function expansion.We can therefore rewrite the above as: where D (z,S ĩ) e −β[Ez 0 ,...,Ez q+k ] is the weight of configuration (z, S ĩ) with S ĩ = P j k • • • P j2 P j1 S iq .This gives: where M P (z, S iq ) is redefined as (39) In the above expression, δ P = 1 if the leftmost operators of S iq is P j1 P j2 • • • P j k and is zero otherwise.
Denoting A(z, S iq ) = q i=0 M Pi (z, S iq ), we can thus write ⟨A⟩ as C. Monitoring the average sign A necessary condition for the proper importance sampling of partition function weights is that all weights are nonnegative.Whenever the partition function expansion produces negative weights [see Eq. ( 7)], the system is said to possess a sign problem [28].In the presence of a sign problem, configuration weights cannot be treated as unnormalized probabilities as they should in Markov chain Monte Carlo simulations.In such cases, a common workaround is to take the configuration weights to be the absolute values of the original ones |W C | [28][29][30].By doing so, a thermal average of an observable A is re-written as where the average sign is monitored throughout the simulation.The average sign is usually considered a figure of merit of how adverse the sign problem is.For models that do not have a sign problem, all weights are positive, ⟨sgn⟩ = 1 and the expression for ⟨A⟩ naturally reduces to its original form, Eq. ( 18).For sign-problematic systems, ⟨sgn⟩ ≈ 0, and one would expect to obtain extremely large error bars for ⟨A⟩ that would as a result require an exponentially long simulation time for an accurate computation of observables.
As we demonstrate in the next section, in some cases accurate calculations are achievable even when ⟨sgn⟩ is relatively small.In Appendix B we provide an improved approximation of ⟨A⟩ in the presence of a sign problem and an approximation for the statistical error σ(A), which can be determined during the simulation.

VI. RESULTS AND DISCUSSION
In this section, we demonstrate the success of our method in simulating a variety of large-scale quantum many-body systems, taking as test cases models that would help highlight the extensive scope of the algorithm.
Wherever exact calculations were possible, we have verified the correctness and accuracy of our technique by ensuring that the calculated values agree with exact values.

A. The XY model on a triangular lattice
We consider the prototypical quantum anisotropic XY (XZ) model [31][32][33] on a triangular lattice with open boundary conditions (see Fig. 1).This model and variants thereof have been used extensively as simplified models for a variety of physical systems such as liquid helium, high-T c superconductors, anisotropic magnets and more.The Hamiltonian we study is where ⟨jk⟩ denotes neighbors on a triangular lattice with n = L 2 spins containing L sites on each side.The above XY model gives rise to a severe sign problem, preventing a true quantitative understanding of the phase diagram of the model if J > 0 and Γ > 0, which will be our region of interest.Specifically, we shall consider a system with L = 8 and J = 1 and allow the parameter Γ, which serves as the strength of the quantum component of the Hamiltonian, to vary.
The PMR form of the Hamiltonian, Eq. ( 43), is Here, D 0 = J ⟨jk⟩ Z j Z k is the 'classical' component of the Hamiltonian that is diagonal in the computational basis.The set {V jk = X j X k } consists of off-diagonal permutation operators that give the system its 'quantum dimension' and obey V ij |z⟩ = |z ′ ⟩ for every basis state |z⟩, where |z ′ ⟩ ̸ = |z⟩ is also a basis state differing from |z⟩ by two spin flips.There are M = 3L 2 −4L+1 off-diagonal operators V jk , one for each edge of the lattice (see Fig. 1).Since V ij V jk V ki = 1 for every triplet of spins with indices i, j and k that form a basic triangle of the lattice (or a triangular plaquette), we conclude that the model admits 2(L − 1) 2 fundamental cycles, one for each basic triangle (see Sec. IV B).
Figure 2 shows the computed mean energy, its error bar, and ⟨sgn⟩ as a function of β and Γ.Here, we have used 2 • 10 9 Monte-Carlo updates.As is shown in Fig. 2, the observables can be accurately calculated using our method even in cases where the values of ⟨sgn⟩ are as small as 10 −3 .However, as expected, the values of ⟨sgn⟩ decrease exponentially with both β and Γ.
Table I shows the increase in complexity of the calculation with the severity of the sign problem in more detail.In particular, one can see that the wall-clock time is roughly proportional to ⟨q⟩ for ⟨q⟩ > 1 (in agreement with prior results pertaining to divided-difference calculations [26]).This is because most of the computing time is spent on the calculation and reevaluation of the divided differ-

ences.
B. The XY model on a square lattice: Dependence of convergence time on temperature We next investigate the dependence of simulation runtimes on the inverse-temperature β, using as a test case the XY-model on a square lattice with periodic boundary conditions imposed.Similar to the XY model on the triangular lattice discussed in the previous subsection, the permutations of the XY model on the square lattice are two-body XX interactions.The fundamental cycles are therefore length-4 products of permutations corresponding to edges surrounding each plaquette.On top of these, the periodic boundary conditions induce additional cycles that wrap around the lattice.For an 8 × 8 square lattice, the nullspace consists of 65 cycles, of which 56 are of length four and 9 of length eight wrapping around the lattice either horizontally or vertically.
Figure 3 shows the wall-clock time of 10 7 QMC updates as a function of β, fitted by the dashed curve with the optimal fit of 3.14 × β 2.23 .The inset shows the estimated error σ(⟨E⟩) of 10 7 QMC updates versus β.The statistical error turns out to be of the same order of magnitude across the entire tested temperature range.
We thus find that the convergence time of the algorithm grows relatively slowly with β, obeying a modest power-law, indicating that low-temperature simulations are readily attainable.

C. Topological models
As mentioned above, in the presence of a nontrivial topology, e.g., where periodic boundary conditions are imposed or for other nonzero genus models, there may exist fundamental cycles becoming longer with the system size such as the cycles 'wrapping around the system', consisting of an extensive number of permutation operators (as was the case in the XY model on a square lattice with periodic boundary conditions discussed above).For such models, some fundamental cycle lengths will be of the order O(N ), i.e., grow with system size, (as opposed to being O(1).For that reason, these models are usually exceptionally difficult to study as they require so-called 'global' rather than local moves.
To demonstrate that our proposed method can successfully solve these models as well, we next present some results pertaining to the well-known 'toric code' model.The toric code is defined on a periodic two-dimensional lattice (a torus), usually chosen to be the square lattice, with a spin-1/2 particle located on each edge.The Hamiltonian of the toric code is given by [34,35]: where J > 0 and A v = i∈v X i and B p = i∈p Z i with i ∈ v denoting the edges touching the vertex v, and i ∈ p denoting the edges surrounding the plaquette p.In PMR, the Hamiltonian is rewritten as where The only fundamental cycle is equal to the product of all plaquette terms P ≡ v P v .The local moves of insertion and deletion of pairs of permutation operators are however not ergodic on their own.This can be seen by noticing that the annihlation/creation local updates implies that plaquette terms P v appear in even numbers.On the other hand, the product of all plaquette terms P ≡ v P v also evaluates to the identity and so sequences where all plaquette terms are odd-numbered should also be accounted for.There are thus two topological sectors: an even one and an odd one.For the simulation to sample configurations in both, there must be a global move that jumps between the two and which changes the parity of all plaquette operators.As follows from Sec. IV C, the fundamental cycle completion accomplishes this move.
Figure 4 shows N odd /N versus β, where N odd is the number of visited odd-sector configurations, and N is the total number of visited configurations.As can be seen, the Markov chain switches easily between the even sector and the odd one for sufficiently low temperatures, indicating that the Markov chain is ergodic and the algorithm works properly.The above results also agree with the expected behavior of toric code, where transitions between ground states are generated by pairs of excitations at sufficiently low temperatures [36].

D. Random spin-1/2 Hamiltonians
To illustrate the versatility of our algorithm, we next present simulation results for randomly generated spin-1/2 Hamiltonians.We produce random n-spin m-term k-local spin-1/2 Hamiltonian by adding together m randomly generated Pauli strings.To create a k-local Pauli string, we first sample k spin indices i 1 , . . ., i k from the set of spin indices {1, . . ., n}.For each chosen index, we pick at random an operator from the set {X, Y, Z}, thereby cre- ating a Pauli string with locality k.Our final random Hamiltonian attains the form i c (i) S (i) , where each randomly generated Pauli string S (i) is multiplied by a real-valued coefficient c (i) randomly drawn from the interval [−1, 1].To demonstrate the ease with which our approach allows the simulation of such systems, we have generated random 40

E. Classically frustrated spin models
It is important to note that while our approach guarantees a correct equilibrium distribution of the Markov chain for any input spin-1/2 Hamiltonian, a universal rapid mixing of the Markov chain cannot be ensured in general.
One class of many-body systems that is known to considerably hinder the convergence of Monte Carlo algorithms -classical or quantum -is that of strongly frustrated spin models.For these, there exist competing terms in the Hamiltonians the minimization of one directly conflicts with the minimization of others creating 'frustration'.For such Hamiltonians, employing a Markov chain whose classical updates are based on single spin-flip Metropolis moves will result in slow mixing.This is because single spin-flip moves may cause the simulation to get trapped in metastable regions of configuration space (i.e., 'local minima').
In many cases, the slowdown caused by frustration can be significantly mitigated (but not fully cured in general due to the NP-hardness of the underly-ing problem) by replacing the classical Metropolis updates with suitable cluster updates if such exist [37][38][39][40][41]. Replacing classical Metropolis updates (see Sec. IV D 4) with relevant cluster updates often results in a much faster converging algorithm. 3Another efficient approach to some frustrated systems is to combine the QMC algorithm with parallel tempering [42,43] or population annealing [44][45][46].A simple way to detect whether geometric frustration leads to a computational inefficiency in a Monte Carlo calculation is to compare the statistical error obtained from binning analysis (see Appendix B) with the statistical error estimated from multiple completely independent, and hence uncorrelated, runs of the algorithm.If the system is thermalized throughout the course of the simulation, these two estimates should roughly agree.For a highly frustrated system at a low temperature, for which the auto-correlation time is longer than the timescale of the simulation, the error obtained from the binning analysis is significantly underestimated due to the measurements being not fully decorrelated.
Figure 7 shows the computed mean energy as a function of β for a slightly frustrated model where only a small fraction of the edges are antiferromagnetic.The inset shows the error estimates from binning analysis and from independent runs.This example suggests that it is possible to compute the observables with high accuracy at low temperatures 3 These classical cluster moves can be generalized to quantum cluster moves for which the configuration weights are different from Boltzmann weights.However, such a generalization is beyond the scope of the present paper.
if the fraction of frustrated plaquettes is sufficiently low.Markov chain Monte Carlo algorithms are naturally well-suited for massively parallel simulations, where independently run Markov chains contribute equally to the collection of statistics.As shown in Fig. 8, the algorithm, indeed, demonstrates a nearperfect strong scaling speedup, with the only restriction being that each of the parallel processes is expected to employ a sufficient number of initial thermalization (warmup) steps prior to measurement collection.
In addition to sequential execution of the C++ program code, our software package includes capabilities for executing the code in a parallel fashion on high-performance compute clusters using Message Passing Interface (MPI) protocols [47], allowing for extensive parallelization of the algorithm [15].

VII. SUMMARY AND OUTLOOK
We presented a universal, parameter-free, Trottererror-free quantum Monte Carlo scheme capable of simulating, for the first time, arbitrary spin-1/2 Hamiltonians.We have demonstrated that the permutation matrix representation of Hamiltonians allows one to automatically produce QMC updates that are provably ergodic and satisfy detailed balance, thereby ensuring the convergence of the Markov chain to the proper thermal equilibrium.We have in addition illustrated how a wide range of observables may be calculated throughout the simulation.
Our algorithm therefore allows one to study the equilibrium properties of essentially any conceivable spin-1/2 system with a single piece of code that accepts as input a description of a Hamiltonian.This is in stark contrast to existing techniques, which generally require specially tailored model-specific QMC updates for each to-be-simulated system, and are thus limited to simulating models of very specified structures and geometries.
We believe that the generality and versatility of our approach make our proposed technique a very useful tool for condensed matter physicists studying spin systems, allowing the community to explore, with ease, an extremely wide range of physical models, many of which have so far been inaccessible, cumbersome to code, or too large to implement with existing techniques.To that aim, we have made our program code freely accessible on GitHub [15].
We note though that while our approach guaran-tees a correct equilibrium distribution of the Markov chain, the proposed algorithm does not guarantee a universal rapid mixing of the Markov chain, nor does it resolve or aims to resolve the sign problem.
The generality of the technique covered in this study makes it easily extendable to other types of systems, e.g., fermionic, bosonic or higher spin systems.We intend to explore these in future work.

Figure 1 .
Figure 1.A triangular lattice with open boundary conditions.Here the side length is L = 3.

Figure 2 .
Figure 2. Top: Dependence of mean energy and ⟨sgn⟩ on β for L = 8 and Γ = 0.3.Bottom: ⟨sgn⟩ and relative error of the mean energy as a function of Γ for L = 8 and β = 1.

Figure 3 .
Figure 3. Calculations of the XY-model on a square lattice with periodic boundary conditions: wall-clock time in seconds of 10 7 QMC updates as a function of β.Parameters: L = 8, J = 1, Γ = −0.25.The dashed line is 3.14 × β 2.23 .Inset: estimated statistical error of 10 7 QMC updates as a function of β.

Figure 4 .
Figure 4. Toric code: the fraction of time in the odd sector as a function of β for L = 4, L = 6, and L = 8.
-spin m-term Hamiltonians with m varying from m = 1 to m = 50, simulating 200 randomly generated instances per each value of m at β = 1 for three choices of locality k = 3, k = 5, and k = 8. Figure 5 (top) shows the average of the energy ⟨E⟩ over the 200 instances as a function of m.The error bars indicate magnitude of fluctuations of the averaged energy.The bottom panel of Fig. 5 depicts the average sign ⟨sgn⟩, averaged over the 200 instances per each choice of m and k.Similarly, Figure 6 depicts ⟨E⟩ and ⟨sgn⟩ as a function of m for β = 5.

Figure 6 .
Figure 6.Top: Average energy ⟨E⟩ over 200 randomly generated Hamiltonian instances as a function of m for random k-local 40-spin Hamiltonians for k = 3, k = 5, and k = 8.Bottom: A similar plot for ⟨sgn⟩, averaged over the 200 Hamiltonian instances.Here, β = 5.

Figure 7 .
Figure 7.Dependence of mean energy on β for the slightly frustrated XY model on a triangular lattice.Inset: error estimation.Parameters: L = 8, Γ = −0.25,J = 1 for a few randomly selected edges and J = −1 for the remaining edges.

FFigure 8 .
Figure 8.Strong scaling speedup.Here, T (p) is the time required to complete the same amount of work using the parallel setup with p processing threads.The total numbers of initial thermalization QMC updates and main QMC updates are equal to 2.5 × 10 7 and 2.5 × 10 8 , respectively, for each of the parallel setups.Calculations were performed for the XY-model on a square lattice with periodic boundary conditions with L = 8, J = 1, Γ = −0.25.

Table I .
Dependence of the expansion order and average sign on the off-diagonal strength Γ for the antiferromagnetic XY model on a triangular lattice.Calculations are shown for L = 8, J = 1, β = 1.