Gauge Invariant and Anyonic Symmetric Transformer and RNN Quantum States for Quantum Lattice Models

Symmetries such as gauge invariance and anyonic symmetry play a crucial role in quantum many-body physics. We develop a general approach to constructing gauge invariant or anyonic symmetric autoregressive neural network quantum states, including a wide range of architectures such as Transformer and recurrent neural network (RNN), for quantum lattice models. These networks can be efficiently sampled and explicitly obey gauge symmetries or anyonic constraint. We prove that our methods can provide exact representation for the ground and excited states of the 2D and 3D toric codes, and the X-cube fracton model. We variationally optimize our symmetry incorporated autoregressive neural networks for ground states as well as real-time dynamics for a variety of models. We simulate the dynamics and the ground states of the quantum link model of $\text{U(1)}$ lattice gauge theory, obtain the phase diagram for the 2D $\mathbb{Z}_2$ gauge theory, determine the phase transition and the central charge of the $\text{SU(2)}_3$ anyonic chain, and also compute the ground state energy of the SU(2) invariant Heisenberg spin chain. Our approach provides powerful tools for exploring condensed matter physics, high energy physics and quantum information science.


I. INTRODUCTION
In recent years, there has been a growing interest in machine learning approaches to simulating quantum many-body systems. An important step in this direction is the use of neural networks, e.g. restricted Boltzmann machines, to represent variational wave functions. However, many neural networks do not automatically enforce the symmetries of physical models. A considerable amount of work has been devoted to remedying this deficiency for several classes of global symmetries, such as translational symmetry [3], discrete rotational symmetry [3], global U(1) symmetry [4], and anti-symmetry [5][6][7].
In addition to global symmetries, local symmetries can be encoded through gauge invariance. The notion of gauge invariance is crucial in quantum mechanics. In high energy physics, a theory is required to be invariant under the action of its gauge symmetry group [24]. Gauge invariance appears naturally in various condensed matter physics models. For example, the topological states of the toric code and double semion models arise as the ground states of their gauge-invariant Hamiltonians [25,26]. Also, novel quantum matter such as the fracton is the ground state of a Hamiltonian where the subsystem symmetry is gauged [27]. In quantum information, various quantum error correction codes can be viewed as eigenstates in a certain gauge-invariant code space [28].

Simulating quantum many-body gauge theory is exponentially costly. There has been much effort to efficiently simulate quantum lattice gauge theory with both digital and analog quantum computers [32], but more effort is required experimentally to achieve good fidelity. Two standard approaches to simulating gauge theory classically are stochastic, integrating an effective Lagrangian by sampling, and variational. When simulating gauge theory, the stochastic approach naturally obeys gauge invariance but is plagued with exponential costs associated with the sign problem in models with a finite density of fermions or involving quantum dynamics [33]. The variational approach overcomes this difficulty by being constrained to an approximate variational space. Imposing gauge symmetries in the variational approach is particularly important and challenging as, otherwise, lower energy states can exist in the gauge-violating part of the Hilbert space; therefore gauge symmetries must be explicitly constrained. While the stochastic approach has been well studied, there have been limited attempts at using the variational approach for gauge theory. Tensor networks can be readily applied to gauge theory in one dimension, and ongoing efforts are required to address challenges in higher dimensions [32,34]. A variational approach based on gauge equivariant networks has been introduced very recently [35][36][37].
We develop, for the first time, a general approach to constructing gauge invariant or anyonic symmetric (such as obeying the fusion rule for anyons) autoregressive neural networks (AR-NN) for quantum lattice models. Autoregressive neural networks, such as recurrent neural networks (RNN) [38,39], pixel convolutional neural networks (pixelCNN) [40], and Transformers [41], have revolutionized the fields of computer vision and language translation and generation, among many others. Autoregressive neural network quantum states have recently been introduced in quantum many-body physics [4,42,43] and shown to be capable of representing volume law states (as one generically needs in dynamics) with a number of parameters that scales sub-linearly [44]. A central feature of AR-NN is the capability of exactly sampling configurations from them. This is to be contrasted with the standard approach of sampling configurations by doing a random walk over a Markov chain, which is often plagued with long equilibration times and non-ergodic behaviors. We construct gauge invariant AR-NN for the quantum link model of U(1) lattice gauge theory [45] and $\mathbb{Z}_N$ gauge theory, and anyonic symmetric AR-NN for SU(2)$_k$ anyons. We demonstrate the exact representation of gauge invariant AR-NN for the ground and excited states of the 2D [26] and 3D [46] toric codes, and the X-cube fracton model [47]. We optimize our symmetry incorporated AR-NN for the quantum link model, the 2D toric code in a transverse field, the 1D Heisenberg chain with SU(2) symmetry, and the SU(2)$_3$ anyonic chain [29,30,48], to obtain ground states accurately and extract phase diagrams and various dynamic properties.

II. CONSTRUCTION FOR GAUGE OR ANYONIC SYMMETRIES
Our goal in this work is to generate autoregressive neural networks (AR-NN) which variationally represent wave functions of quantum lattice models and explicitly obey their gauge symmetries; i.e., given a set of gauge symmetry operators $\{G_i\}$ with local support, we would like to construct a wave function $|\psi\rangle$ such that $G_i|\psi\rangle = |\psi\rangle$ for each $i$. To do this, we will work within the 'gauge basis' $\{|x\rangle\}$ which is diagonal in the gauge, $\langle x|\psi\rangle = \langle x|G_i|\psi\rangle$. A sufficient condition for gauge invariance of the wave function is to ensure that the gauge-violating basis elements $|x\rangle$ have zero amplitude in $|\psi\rangle$. Throughout this work, we will primarily work with gauges $G_i$ which are local, i.e., $G_i|x\rangle$ only affects a compact range of sites within the vicinity of site $i$.
While we would typically want our AR-NN to take as input the configuration $\{x_1, x_2, \ldots, x_n\}$ and evaluate $\psi(x_1, x_2, \ldots, x_n)$, we will find it useful to instead evaluate $\psi(\vec{x})$, where $\vec{x} \equiv \{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}$ and $\vec{x}_i \equiv (x_{i1}, x_{i2}, \ldots, x_{iv})$ is a composite particle specifying the configuration of not only site $i$ but also some number of nearby sites. The motivation for working with composite particles is that a particular local gauge constraint $G_i$ might only depend on composite particle $\vec{x}_i$ (and potentially $\vec{x}_{i+1}$), making it easier to apply the gauge constraints. Different composite particles can naturally overlap in physical sites, and we will simply augment our gauge constraints to require that the configurations of the composite particles agree on the state of a physical site; i.e., basis states of composite particles which map to disagreeing physical states should also have zero amplitude.
AR-NN perform two functions: sampling and evaluation. AR-NN can sample configurations $\vec{x}$ from $|\psi(\vec{x})|^2$. This is done sequentially (in some pre-determined order) one composite particle $\vec{x}_i$ at a time; the probability to sample $\vec{x}_i$ is equal to $a^2(\vec{x}_i|\vec{x}_{<i})$, where $a(\vec{x}_i|\vec{x}_{<i})$ is a function which returns the conditional amplitude. Evaluation of the AR-NN gives a value $\psi(\vec{x}) = \prod_i a(\vec{x}_i|\vec{x}_{<i})\, e^{i\theta(\vec{x}_i|\vec{x}_{<i})}$, where $\theta(\vec{x}_i|\vec{x}_{<i})$ is a function which returns the conditional phase. Both evaluation and sampling rely on the existence of a gauge block which takes $\vec{x}_1, \ldots, \vec{x}_{k-1}$ and outputs the possible values $\{\vec{z}_i\}$ of $\vec{x}_k$ along with their respective amplitudes $a(\vec{z}_i|\vec{x}_{<k})$ and phases $\theta(\vec{z}_i|\vec{x}_{<k})$, ensuring that the amplitude of any configuration which is going to violate the gauge constraint is set to zero. To build this gauge block, we start with an autoregressive neural network block which returns a list of amplitudes which do not constrain the gauge (such blocks are standard in autoregressive models such as Transformers and RNN); we then zero out those partial configurations which break the gauge (on the already established composite particles) and renormalize the probabilities in this list (see Fig. 1(a)). Given the gauge block it is then straightforward to both sample and evaluate (see Fig. 1(b, c)). Note that the probability induced by our AR-NN is different from the probability induced by the AR-NN with only the autoregressive neural network block, even if one projects out the gauge-breaking configurations from the latter network.
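To make the gauge block concrete, the following is a minimal PyTorch sketch under our own illustrative assumptions (it is not the paper's implementation); `gauge_mask` stands in for the model-specific check on the already established composite particles.

```python
import torch

def gauge_block(raw_amp, raw_phase, gauge_mask):
    """Zero out gauge-violating candidates and renormalize.

    raw_amp:    (batch, num_states) unnormalized conditional amplitudes from a
                standard autoregressive block (Transformer or RNN output).
    raw_phase:  (batch, num_states) conditional phases theta(z_i | x_{<k}).
    gauge_mask: (batch, num_states) boolean, True where candidate z_i of the
                next composite particle is consistent with the gauge constraints.
    """
    probs = raw_amp ** 2 * gauge_mask                # zero out gauge-breaking states
    probs = probs / probs.sum(dim=-1, keepdim=True)  # renormalize survivors
    return probs.sqrt(), raw_phase                   # conditional amplitudes and phases

# Sampling one composite particle then proceeds as
# a, theta = gauge_block(...); x_k = torch.multinomial(a ** 2, num_samples=1)
```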
It is worth noticing that the construction is not limited to gauge theory, but can be generalized to wave functions with either local or global constraints which are checked in the same way as gauge constraints are checked. This will be helpful for describing constraints from certain global symmetries or special algebraic structure, such as the SU(2) symmetry for the Heisenberg model and the SU(2)$_k$ fusion rules for non-abelian anyons.

III. OPTIMIZATION ALGORITHMS
We use AR-NN to calculate both ground states and real-time dynamic properties. In both cases, we need to optimize our AR-NN. For ground states, an AR-NN is optimized with respect to energy, and for real-time dynamics, we optimize an AR-NN at time-step $t + 2\tau$ given a network at time $t$. We describe the details of these optimizations. As these optimization approaches are general, we use $x$ to denote a configuration, which in our context stands for a configuration of composite particles.

FIG. 1. (b) Evaluation process. The evaluation process can be performed in parallel for all the input sites. Given the input $\{\vec{x}_k\}$, the gauge block simultaneously generates amplitudes and phases for all sites. We then select the correct amplitudes and phases based on the input configuration for each site and construct the wave function from the selected amplitudes and phases. (c) Sampling process. The sampling is done sequentially for each site. We begin with no input and generate the amplitude and phase for the first site. The configuration of the first site is then sampled from the square of the amplitude. Afterwards, we feed the first sample into the gauge block to obtain the second sample. This process continues until we obtain the whole configuration.
For the ground state optimization, we minimize the energy, which is estimated stochastically over samples $x \sim |\psi_\theta|^2$ through the local energy

$E_{\text{loc}}(x) \equiv \sum_{x'}\langle x|H|x'\rangle\frac{\psi_\theta(x')}{\psi_\theta(x)}. \qquad (2)$

We further control the sampling variance [49] by subtracting from $E_{\text{loc}}(x)$ the average over the batch, $E_{\text{avg}} \equiv \frac{1}{N}\sum_{x\in\text{batch}} E_{\text{loc}}(x)$, and define the stochastic variance reduced loss function as

$\mathcal{L}_E = \frac{2}{N}\sum_{x\in\text{batch}}\mathrm{Re}\left[\left(E_{\text{loc}}(x) - E_{\text{avg}}\right)\log\psi_\theta^*(x)\right],$

where the gradient is taken on $\log\psi_\theta^*$ using PyTorch's [50] automatic differentiation.
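In PyTorch this estimator can be sketched as follows (a schematic of the variance reduction step, not the authors' code); `log_amp` and `phase` are assumed network outputs so that $\log\psi^* = \texttt{log\_amp} - i\,\texttt{phase}$, and the local energies are detached from the computational graph:

```python
import torch

def energy_loss(log_amp, phase, e_loc):
    """Variance-reduced stochastic energy loss.

    log_amp, phase: (N,) real tensors carrying gradients; log psi* = log_amp - i*phase.
    e_loc:          (N,) complex local energies E_loc(x) for sampled x.
    """
    e_loc = e_loc.detach()                  # no gradient through local energies
    e_avg = e_loc.mean()                    # batch average for variance reduction
    log_psi_conj = log_amp - 1j * phase     # log psi_theta^*(x)
    # The gradient of this scalar w.r.t. network parameters is the stochastic
    # gradient of <H>, with reduced variance from the E_avg subtraction.
    return 2 * torch.real((e_loc - e_avg) * log_psi_conj).mean()
```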
With this loss function, we also use transfer learning techniques [51,52]: we train our neural networks on smaller systems and use these parameters as the initial starting points for optimizing on larger systems (see Appendix E for details).
For the dynamics optimization, we use a stochastic version of the logarithmic forward-backward trapezoid method [53], which can be viewed as a higher order generalization of IT-SWO [54] and the logarithmic version of the loss functions in Refs. 9 and 42. We initialize two copies of the neural network, $\psi_{\theta(t)}$ and $\psi_{\theta(t+2\tau)}$. At each time step, we train $\psi_{\theta(t+2\tau)}$ so that $|\Psi_\theta\rangle \equiv (1 + iH\tau)|\psi_{\theta(t+2\tau)}\rangle$ matches $|\Phi\rangle \equiv (1 - iH\tau)|\psi_{\theta(t)}\rangle$ by minimizing the negative logarithm of the overlap,

$\mathcal{L} = -\log\frac{\langle\Psi_\theta|\Phi\rangle\langle\Phi|\Psi_\theta\rangle}{\langle\Psi_\theta|\Psi_\theta\rangle\langle\Phi|\Phi\rangle}. \qquad (5)$

The gradient of the negative logarithm of the overlap can be evaluated stochastically in terms of batch estimators $\alpha(x)$ and $\beta(x)$, where $\alpha_{\text{avg}}$ and $\beta_{\text{avg}}$ are respectively the average values of $\alpha(x)$ and $\beta(x)$ over the batch of samples. We can then define a loss function whose gradient reproduces this stochastic estimate, where the gradient is taken on $\log\Psi_\theta^*$ using PyTorch's [50] automatic differentiation.
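Schematically, one step of this procedure can be written as below; this is our own sketch of the overlap gradient with $x$ sampled from $|\Psi_\theta|^2$, in which the ratio $r(x)$ plays the role of the $\alpha$, $\beta$ estimators whose precise definitions are derived in Appendix G:

```python
import torch

def overlap_loss(log_Psi, log_ratio):
    """Surrogate loss for -log |<Psi|Phi>|^2 / (<Psi|Psi><Phi|Phi>).

    log_Psi:   (N,) complex log Psi_theta(x) for x ~ |Psi_theta|^2, with gradients.
    log_ratio: (N,) complex log [Phi(x)/Psi_theta(x)], treated as data (detached).
    """
    r = torch.exp(log_ratio.detach())    # ratio estimates Phi(x)/Psi_theta(x)
    r_avg = r.mean()
    # Gradient (taken on log Psi*) matches 2 Re E[(1 - r/r_avg) grad log Psi*],
    # the stochastic gradient of the negative log overlap.
    return 2 * torch.real((1 - r / r_avg) * torch.conj(log_Psi)).mean()
```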
For both optimizations, $\psi_\theta(x)$ is evaluated as described in Fig. 1(b) and $x$ is sampled from $|\psi_\theta|^2$ as described in Fig. 1(c). The full derivations of the stochastic gradients for both optimizations are in Appendix G.
In addition, we extensively use the transfer learning technique, by training on small system sizes before moving on to large system sizes. Transfer learning provides a good initialization for neural networks that are trained on large system sizes, and we observe that it in general significantly reduces the number of iterations needed. The details of the usage of this technique are described in the caption of each figure (see more details in Appendix E).

IV. APPLICATIONS IN QUANTUM LATTICE MODELS

A. U(1) Quantum Link Model
The quantum link model (QLM) of U(1) lattice gauge theory in 1+1 dimensions in the Hamiltonian formulation with staggered fermions [45] is defined as

$H = -\sum_i\left(\psi_i^\dagger U_{i,i+1}\psi_{i+1} + \text{h.c.}\right) + m\sum_i(-1)^i\psi_i^\dagger\psi_i + \frac{g^2}{2}\sum_i E_{i,i+1}^2, \qquad (8)$

where $m$ is the staggered fermion mass, $g$ is the gauge coupling, $i = 1, 2, \ldots$ labels the lattice site, $\psi_i$ is the fermion operator, $U_{i,i+1}$ is the link variable and $E_{i,i+1}$ the electric flux for the U(1) gauge field on link $(i, i+1)$ [45]. We denote by $|q_i\rangle$ the basis state at site $i$, and by $|e_{i,i+1}\rangle$ the basis at link $(i, i+1)$. Each unit cell is defined to include two sites and two links. The operators $E_{i,i+1}$ and $U_{i,i+1}$ satisfy the commutation relations $[E_{i,i+1}, U_{i,i+1}] = U_{i,i+1}$ and $[E_{i,i+1}, U^\dagger_{i,i+1}] = -U^\dagger_{i,i+1}$. The gauge constraint is given by the Gauss's law operator

$G_i = \psi_i^\dagger\psi_i - E_{i,i+1} + E_{i-1,i} - \frac{1 - (-1)^i}{2},$

such that the ground state $|\psi\rangle$ satisfies $G_i|\psi\rangle = 0$ for each $i$. The QLM has gained growing interest and been studied in different settings in recent years [32,34,[55][56][57]. We focus on the (1+1)D QLM with the $S = 1/2$ representation for the link operators $U_{i,i+1}$ and $E_{i,i+1}$. Under the Jordan-Wigner transformation, Eq. 8 becomes [58]

$H = -\sum_i\left(S_i^+ S^+_{i,i+1} S_{i+1}^- + \text{h.c.}\right) + m\sum_i(-1)^i\left(S_i^3 + \frac{1}{2}\right) + \frac{g^2}{2}\sum_i\left(S^3_{i,i+1}\right)^2, \qquad (9)$

where $S^\pm \equiv S^1 \pm iS^2$ and $S^1, S^2, S^3$ are the Heisenberg matrices, and the Gauss's law operator becomes

$G_i = S^3_i - S^3_{i,i+1} + S^3_{i-1,i} + \frac{(-1)^i}{2}.$

For the $S = 1/2$ representation, the last term on the right side of Eq. 9 is constant and, hence, can be discarded.
We define the composite particles of our gauge invariant AR-NN as in Fig. 2 and choose an order from left to right. Each composite particle $|\sigma_i\rangle$ consists of a fermion $|q_i\rangle$, which can be either unoccupied $|\circ\rangle$ or occupied $|\bullet\rangle$, and a gauge field in the link $|e_{i,i+1}\rangle$, which can be either $|\rightarrow\rangle$ or $|\leftarrow\rangle$. Note that in this case the composite particles do not overlap. The Gauss's law operator $G_i$ acts on $|\sigma_i\rangle$ and $|\sigma_{i+1}\rangle$ to determine allowed configurations and so can be checked in the gauge block which generates the composite particle at site $i + 1$: given $|\sigma_i\rangle$, only those configurations of $|\sigma_{i+1}\rangle$ compatible with $G_i = 0$ receive nonzero amplitude. We implement and variationally optimize this AR-NN for the ground state of Eq. 9. Fig. 3 shows the results for 6 unit cells (i.e. 12 particles), which closely match the energy of the exact solution. More importantly, the gauge invariant construction guarantees that the solution is in the physical space, while the neural network without the gauge constraint (i.e. removing the gauge-checking from the AR-NN) finds a lower energy but non-physical state.
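As an illustration of the local check, the toy function below enumerates the allowed composite particles $(q, e)$ at site $i$ given the incoming link field; the encoding is our own illustrative assumption, with the paper's exact operator given by Gauss's law above.

```python
# Toy Gauss's law check for the spin-1/2 QLM: which composite particles
# (occupation q, outgoing link field e) are allowed given the incoming link
# field e_prev. The encoding of charges and fields is an illustrative choice.

def staggered_charge(q, i):
    """Charge of occupation q in {0, 1} at site i: q - (1 - (-1)^i)/2."""
    return q - (1 - (-1) ** i) // 2

def allowed_next(e_prev, i):
    """(q, e) pairs at site i satisfying E_out - E_in = charge_i (S = 1/2)."""
    return [(q, e) for q in (0, 1) for e in (-0.5, 0.5)
            if e - e_prev == staggered_charge(q, i)]

print(allowed_next(e_prev=-0.5, i=2))   # -> [(0, -0.5), (1, 0.5)]
```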
We in addition compute the ground state for 40, 80, 120 and 160 unit cells with both the Transformer and the RNN (Fig. 4). The average electric fields are compared with tensor network (TN) results [59]. We find that our results (for matching system sizes) are similar to the TN results for both the Transformer and the RNN. In addition, we extrapolated the ground state energy for the 160 unit cell model at $m = 0.7$ (see Fig. 18 in Appendix A) by linearly extrapolating in variance vs. energy. We find that the extrapolated ground state energy is $-199.7923$, while our lowest energy is $-199.7803 \pm 0.0005$, giving us a relative error of only $6 \times 10^{-5}$.

We also consider the real-time dynamics for $m = 0.1$ and $m = 2.0$ for 6 and 12 unit cells, starting with an initial product state with $|\bullet\rightarrow\bullet\rightarrow\rangle$ for each unit cell. Fig. 5(a) shows the conservation of energy for different ansatzes. The total energy is $-1.2$ for $m = 0.1$, and $-24$ for $m = 2.0$. We find that our gauge invariant AR-NN captures the correct electric field oscillation and has a lower per step infidelity compared with the non-gauge ansatz (see Fig. 5(b) and (c)), and, additionally, the anticipated string inversion of the electric flux for small mass (and respectively the static electric flux for large mass) (see Fig. 6).
While the current work focuses on the $S = 1/2$ representation, our construction can be generalized to an arbitrary $S$ representation. For a higher spin $S$, composite particles can be defined similarly (see Fig. 2), except that the number of degrees of freedom for each $e_{i,i+1}$ increases to $2S + 1$.

B. 2D $\mathbb{Z}_N$ Gauge Theory
For the 2D toric code [26], consider an $L \times L$ periodic square lattice, where each edge has the basis $\{|0\rangle, |1\rangle\}$. Let $V, P, E$ denote the sets of vertices, plaquettes and edges of the lattice, respectively, such that $|V| = |P| = L^2$ and $|E| = 2L^2$. Here we consider the toric code with a transverse magnetic field,

$H = H_{TC} - h\sum_{e\in E}\sigma^z_e, \qquad (11)$

where $H_{TC} \equiv -\sum_{v\in V} A_v - \sum_{p\in P} B_p$ is the toric code Hamiltonian, $A_v \equiv \prod_{e\ni v}\sigma^z_e$ (the star operator), $B_p \equiv \prod_{e\in p}\sigma^x_e$, and $h$ is the strength of the transverse field. Note that $A_v$ is the gauge constraint such that the ground state $|\psi\rangle$ of Eq. A1 and Eq. 11 satisfies $A_v|\psi\rangle = |\psi\rangle$ for each $v$.
The composite particle construction is illustrated in Fig. 7. We order the composite particles by an "S" shape going up one row and down the next (see Fig. 29(b) in Appendix E). Two constraints must be checked in the gauge checking process of a gauge block.
When working on the gauge block associated with composite particle $\vec{x}_v$, we check that the product of the $\sigma^z$ eigenvalues over the four edges of $\vec{x}_v$ equals 1 (despite $A_v$ acting on an entire state, this can be checked locally on a single composite particle). In addition, composite particles overlap with their four immediately adjacent composite particles. The gauge block for the composite particle at $v$ therefore checks consistency of the physical sites with the first $v - 1$ composite particles. For example, if the shared edge between two adjacent composite particles is set to $|1\rangle$ by the left particle, then the right particle must also carry $|1\rangle$ on that edge. In the "S" ordering, there always exist valid choices for each composite particle. For a composite particle that is not the last one, there is an unchosen site which provides the freedom for choices to be valid. For the last composite particle, though all sites are fixed, the fixed configuration must be valid because the product of all the Gauss's law constraints is 1 and all previous Gauss's law constraints have been satisfied to be 1.
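The two checks can be spelled out in a few lines; the sketch below is an illustration under our own encoding (edge values $\pm 1$ as $\sigma^z$ eigenvalues, edges keyed by name), not the paper's implementation.

```python
import numpy as np

def gauss_ok(edges):
    """Local Gauss's law A_v = 1: product of the four sigma^z eigenvalues."""
    return np.prod(edges) == 1

def consistent(shared_fixed, proposal):
    """Bond consistency: edges fixed by earlier composite particles must agree."""
    return all(proposal[e] == v for e, v in shared_fixed.items() if e in proposal)

print(gauss_ok([1, -1, -1, 1]))   # True: an even number of flipped edges
print(consistent({"left": -1},
                 {"left": -1, "up": 1, "right": 1, "down": -1}))  # True
```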
We begin by showing that we can analytically generate an AR-NN for the ground state of $H_{TC}$. One ground state of Eq. 11 is $|\psi\rangle = \prod_{v\in V}(1 + A_v)|+\rangle^{\otimes E}$, which gives equal weight to all configurations that do not violate the gauge constraint. In our construction, if we use an autoregressive neural network block which gives equal weight to all the configurations (this is straightforward to arrange by setting the last linear layer's weight matrices to zero and bias vectors to equal amplitudes), we exactly achieve this state. Checking the $A_v$ does not affect the relative probabilities because it involves only one composite particle and hence is not conditional. On the other hand, the 'gauge constraints' which verify consistency of the underlying state of the sites leave equal probability between all consistent states. To see this, we examine the effect of the gauge constraint on $|a(\vec{x}_k|\vec{x}_{<k})|^2$ for any given $k$, which is the conditional probability of the composite particle $\vec{x}_k$. Due to the conditioning from previous composite particles $\{\vec{x}_{<k}\}$, some sites of the composite particle $\vec{x}_k$ are fixed. For the Gauss's law gauge constraints to be 1, the product of all the unchosen site configurations in $\vec{x}_k$ must be either 1 or $-1$, depending on the chosen site configurations. Let $S_1 = \{(b_1, \ldots, b_j) \mid \prod_{r=1}^j b_r = 1\}$ and $S_{-1} = \{(c_1, \ldots, c_j) \mid \prod_{r=1}^j c_r = -1\}$ be the two possible sets of unchosen site configurations, where $b_r$ and $c_r$ are the configurations of the unchosen sites in $\vec{x}_k$. Consider the function $f: S_1 \to S_{-1}$ which flips the sign of the first unchosen site and leaves the others unchanged. Notice that $f$ is bijective and thus $S_1$ and $S_{-1}$ have the same cardinality, implying that after normalization $|a(\vec{x}_k|\vec{x}_{<k})|^2$ will have the same amplitude for any $\{\vec{x}_{\le k}\}$. We can also generate excited states by changing the $A_v$ for a fixed (even) number of vertices to constrain this local eigenvalue to be $-1$ instead of 1. We provide a numerical verification of this by computing the energy for an exactly represented tower of ground and excited states in Fig. 8.
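The cardinality argument can be verified by brute force; the small script below (an illustration, not from the paper) enumerates the unchosen sites of a composite particle and confirms the two parity sectors have equal size.

```python
from itertools import product
from math import prod

j = 3                                              # number of unchosen sites
configs = list(product([1, -1], repeat=j))
s_plus = [c for c in configs if prod(c) == 1]      # S_1: even parity
s_minus = [c for c in configs if prod(c) == -1]    # S_{-1}: odd parity
f = lambda c: (-c[0],) + c[1:]                     # bijection: flip first site
assert len(s_plus) == len(s_minus) == 2 ** (j - 1)
assert sorted(f(c) for c in s_plus) == sorted(s_minus)
print(len(s_plus), len(s_minus))                   # equal cardinalities
```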
With a nonzero value of the external field $h$, the ground state of Eq. A1 is no longer exactly representable, and we variationally optimize our AR-NN to compute the ground state energy. Fig. 9 shows the minimum energy for Eq. A1 for different $h$, and the energy derivative, computed using the Hellmann-Feynman theorem [60]. The toric code is expected to exhibit a quantum phase transition between the topological and trivial phases at an intermediate value of $h$, and the sharp change of the energy derivative around $h = 0.34$ is an indicator of this phase transition, which is consistent with the quantum Monte Carlo prediction of $h = 0.328474$ in the thermodynamic limit [62]. We can additionally identify the transition by considering the Wilson loop operator $W_C = \prod_{e\in C}\sigma^x_e$ for a closed loop $C$. It is predicted that the topological order phase follows an area law decay, $\langle W_C\rangle \sim \exp(-\alpha A_C)$, and the trivial phase follows a perimeter law decay, $\langle W_C\rangle \sim \exp(-\beta P_C)$, where $A_C, P_C$ are the enclosed area and perimeter of the loop $C$ [63]. Fig. 10 shows the values of $\langle W_C\rangle$ using our variationally optimized AR-NN. By comparing the respective fits to the area and perimeter laws, we again see the transition at $h = 0.34$. Finally, we compare the non-local string correlation operators $S_\gamma = \prod_{e\in\gamma}\sigma^z_e$ of our variational states, which can be viewed as a measure of the correlation of a pair of excited particle and anti-particle along a path $\gamma$. In the topological order phase the non-local string operators will decay to zero, while they will remain constant in the trivial phase [64]. In Fig. 11, this is seen clearly on both sides of the transition.
At $h = 0.36$, we additionally benchmark our results against the infinite system size iPEPS results [61] and the gauge equivariant results [35] (Fig. 12). Here we use an improved ansatz with $180^\circ$ rotation symmetry, defined by symmetrizing $\log\psi$ over a configuration $x$ and its rotated counterpart $Rx$, where $R$ rotates configuration $x$ by 180 degrees. We find that our results (at least up to $L = 12$) are lower in energy density than the gauge equivariant results and the iPEPS results, which indicates that our approach is very competitive with state-of-the-art methods. Our approach can be naturally generalized to 2D $\mathbb{Z}_N$ gauge theory, which can be described in the language of Kitaev's $D(G)$ model with group $G = \mathbb{Z}_N$ (see Appendix B). In this case, the basis at each edge becomes a group element in $\mathbb{Z}_N$. Similarly to Fig. 7, one can define a composite particle over four edges from a vertex and impose gauge invariance. We can also extend our approach to the (1+1)D $\mathbb{Z}_N$ lattice quantum electrodynamics (QED) model, which is discussed in Appendix C.

C. 3D Toric Code and Fracton Model
We turn to gauge invariant AR-NN for the ground and excited states of the 3D toric code [46] and the fracton model [47,65]. The 3D toric code generalizes the 2D toric code to an $L \times L \times L$ periodic cube where each edge has the basis $\{|0\rangle, |1\rangle\}$. The Hamiltonian takes the same form as the 2D model (see Eq. 11), except that for each $A_v \equiv \prod_{e\ni v}\sigma^z_e$ there are six edges $e$ associated with each vertex $v$. A ground state of the 3D toric code similarly satisfies $A_v|\psi\rangle = B_p|\psi\rangle = |\psi\rangle$ for each $v, p$. One of the degenerate ground states can also be expressed as $\prod_{v\in V}(1 + A_v)|+\rangle^{\otimes E}$. The excited states can be generated by breaking certain constraints from $A_v$ and $B_p$ as in the 2D case.

FIG. 13. $A^i_v$ and $B_c$ for the X-cube fracton model.
The X-cube fracton model [47] is also defined on an $L \times L \times L$ periodic cube where each edge has the basis $\{|0\rangle, |1\rangle\}$. The Hamiltonian takes the form

$H = -\sum_{v,i} A^i_v - \sum_c B_c,$

where $B_c \equiv \prod_{e\in c}\sigma^z_e$ runs over the edges of a small cube $c$. The gauge constraint, i.e. Gauss's law, is $B_c|\psi\rangle = |\psi\rangle$. There are three operators $A^i_v \equiv \prod_{e_i\ni v}\sigma^x_{e_i}$ for the three choices $i = zy, xy, xz$, depending on which 2D plane $A^i_v$ acts on. The operators are illustrated in Fig. 13. A ground state of the X-cube fracton model satisfies $A^i_v|\psi\rangle = B_c|\psi\rangle = |\psi\rangle$ for each $v$, $i$ and $c$.
The composite particles for the 3D toric code and the X-cube fracton model are illustrated in Fig. 14. For the 3D toric code, a composite particle is made up of six particles associated with a vertex. The ground state can be constructed by initializing the biases of the autoregressive neural network to give the all $|+\rangle$ state and imposing the gauge checking value on each composite particle to be 1. The excited states can be constructed by forcing an even number of composite particles to have gauge checking value $-1$. For the X-cube fracton model, a composite particle consists of twelve particles on each mini cube. The ground state comes from initializing all biases of the autoregressive neural network to give the all $|+\rangle$ state and requiring all composite particles to have the gauge checking value 1. The excited states break the gauge checking value on sets of four nearby composite particles to be $-1$. We numerically verify the exact representations of the ground and excited states of the 3D toric code and the X-cube fracton model in Fig. 8, where the energy is shown to be exactly the same as the theoretical predictions.
Our approach can be naturally generalized to Haah's code [65] and the checkerboard fracton model [47]. Similarly to 2D $\mathbb{Z}_N$ gauge theory (see Section IV B), one can consider applying gauge invariant AR-NN to study 3D $\mathbb{Z}_N$ gauge theory in the context of the 3D toric code or the X-cube fracton model [27] with an external field.

D. SU(2)$_k$ Anyonic Chain and SU(2) Symmetry
Non-abelian anyons play a crucial role in universal topological quantum computation. Here we consider a chain of Fibonacci anyons, which can be regarded as an SU(2)$_{k=3}$ deformation of the ordinary quantum spin-1/2 chain [48]. In this model, there is one type of anyon $\tau$ and a trivial vacuum state $1$ for each site. The constraint from symmetry requires that $\tau$ and $1$ satisfy the following fusion rules: $\tau\otimes\tau = \tau\oplus 1$, $\tau\otimes 1 = 1\otimes\tau = \tau$. We work directly in this basis where each site is either $1$ or $\tau$, generating an anyonic symmetric AR-NN. We then proceed to work out the entire phase diagram of the Fibonacci anyons. This can be done particularly efficiently compared with standard Monte Carlo sampling [13] thanks to the exact sampling of the AR-NN.
Our anyonic symmetric AR-NN is constructed so that it obeys the anyon fusion rule directly, by checking two adjacent input configurations and imposing zero amplitude when both are $\tau$. Each anyon is a composite particle and the gauge checking implements the constraint from the anyon fusion rule.
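A minimal sketch of this check, following the rule stated above (zero amplitude when two adjacent sites are both $\tau$) with the illustrative encoding $0 \to 1$ (vacuum) and $1 \to \tau$:

```python
import torch

def fusion_mask(prev_label, num_labels=2):
    """Mask candidate labels for the next site given the previous site."""
    mask = torch.ones(num_labels, dtype=torch.bool)
    if prev_label == 1:      # previous site is tau
        mask[1] = False      # forbid tau immediately after tau
    return mask

probs = torch.tensor([0.3, 0.7])
masked = probs * fusion_mask(prev_label=1)
print(masked / masked.sum())   # renormalized conditional distribution: [1., 0.]
```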
We consider the Hamiltonian [30]

$H = \cos\theta\, H_2 + \sin\theta\, H_3, \qquad (13)$

where the two-anyon interactions $H_2$ can be described by the golden chain Hamiltonian [29,30], the three-anyon interactions $H_3$ can be described by the Majumdar-Ghosh chain Hamiltonian [30], and $\phi = (\sqrt{5} + 1)/2$ is the golden ratio entering their matrix elements. This model is predicted to exhibit five phases with respect to different $\theta$ [30]. Fig. 15 shows the optimized energies of the Hamiltonian in Eq. 13 for different $\theta$ and the energy derivative computed using the Hellmann-Feynman theorem [60]. The non-differentiable points of the energy derivative indicate the phase transition points, which agree with the conformal field theory prediction. In the special case of $\theta = 0$, the model reduces to the Fibonacci anyons in a golden chain, which has a gapless phase [30]. Using our optimized AR-NN, we compute the second Renyi entropy $S_2$ [4]. Since the second Renyi entropy $S_2$ is related to the central charge $c$ under the periodic boundary condition as $S_2 \sim \frac{c}{4}\log(L)$ with system size $L$ [66], we then extract the central charge, finding a value $c = 0.703 \pm 0.005$, very close to the exact result of 0.7 (see Fig. 16).
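The central-charge extraction is a linear fit of $S_2$ against $\log L$ with $c = 4 \times$ slope; a schematic of this step, with placeholder entropies rather than our measured data, is:

```python
import numpy as np

# S_2 ~ (c/4) log(L) under periodic boundaries, so c = 4 * slope.
L = np.array([16, 24, 32, 40])
S2 = np.array([0.80, 0.87, 0.92, 0.96])   # hypothetical Renyi entropies
slope, intercept = np.polyfit(np.log(L), S2, 1)
print("central charge estimate:", 4 * slope)
```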
This can be generalized to the SU(2)$_k$ formulation of anyon theory, for which there are $k + 1$ species of anyons labeled by $j = 0, 1/2, 1, \ldots, k/2$ with the fusion rule of SU(2)$_k$ [29]. The Hamiltonian can be expressed with operators from the representation of the Temperley-Lieb algebra [29]. To construct an anyonic symmetric autoregressive neural network for the general SU(2)$_k$ anyonic chain, one works in the angular momentum basis $\{|\ldots, j_{i-1}, j_i, j_{i+1}, \ldots\rangle\}$ where $j_i \in \{0, 1/2, 1, \ldots, k/2\}$. Since each $j_i$ is included as the SU(2)$_k$ fusion rule outcome of $j_{i-1}$ and an extra 1/2 angular momentum, one can view $j_i$ as a composite particle and the gauge checking is the fusion rule.

FIG. 15. Optimized energies $\langle H\rangle$ and energy derivative versus $\theta$. Phase transitions occur when the energy function is not differentiable. Vertical lines in orange are located at the exact phase transition points [30]. We use the 1D RNN with 3 layers, 36 hidden dimensions and the real-imaginary parameterization. We use the transfer learning technique where we first train the neural network on a 32-anyon model for 3000 iterations and then we transfer the neural network to the 40-anyon model for another 3000 iterations. In each iteration, we use 12000 samples. The neural network architecture, initialization, and optimization details are discussed in Appendix E.
The Fibonacci anyon is a special case of the SU(2)$_{k=3}$ formulation, considering the mapping $\tau \to j = 1$ and $1 \to j = 0$ and applying $3/2 \times j = 3/2 - j$ from the SU(2)$_3$ fusion rule to the even-number sites [29]. Note that this gives a slightly different AR-NN from what is described above. Besides the Fibonacci anyon, one can consider the Yang-Lee anyon, which follows the SU(2)$_3$ fusion rule [67].
Using this framework, one can also consider the Heisenberg spin chain with SU(2) symmetry, since it can be considered as the SU(2)$_k$ deformation of the ordinary quantum spin-1/2 chain [48] as $k \to \infty$. In Appendix D, we provide the detailed construction of an SU(2) invariant autoregressive neural network for the Heisenberg model, which can be viewed as the case of SU(2)$_{k=\infty}$, and obtain accurate results for the 1D Heisenberg model.

V. CONCLUSION
We have provided a general approach to constructing gauge invariant or anyonic symmetric autoregressive neural network wave functions for various quantum lattice models.These wave functions explicitly satisfy the gauge or algebraic constraints, allow for perfect sampling of configurations, and are capable of explicitly returning the amplitude of a configuration including normalization.To accomplish this, we have upgraded standard AR-NN in such a way that the constraints can be autoregressively satisfied.
We have given explicit constructions of AR-NN which exactly represent the ground and excited states of several models, including the 2D and 3D toric codes as well as the X-cube fracton model. For those models for which exact representations are unknown, we variationally optimize our symmetry incorporated AR-NN to obtain either high-quality ground states or time-dependent wave functions. This has been done for the U(1) quantum link model, $\mathbb{Z}_N$ gauge theory, the SU(2)$_3$ anyonic chain, and the SU(2) quantum spin-1/2 chain. For these systems we are able to measure dynamical properties, produce phase diagrams, and compute observables accurately.
Our approach opens up the possibility of probing a larger variety of models and the physics associated with them. For example, the higher spin representation $S > 1/2$ in the (1+1)D QLM would allow one to probe the quantum chromodynamics related physics of confinement and string breaking [68]. For the (3+1)D QLM, there is the Coulomb phase which manifests in pyrochlore spin liquids [69]. For $\mathbb{Z}_N$ gauge theory, it will be interesting to consider the general $\mathbb{Z}_N$ toric code with transverse field or disorder, with a goal of understanding its phase diagram. Recently, there have been proposals to understand the $\mathbb{Z}_N$ X-cube fracton model with nontrivial statistical phases [27]. For non-abelian anyons, the general SU(2)$_k$ formulation exhibits rich physics for different $k$, and one can study the corresponding topological liquids and edge states [48]. Our approach can also be extended to study the phase diagram of 2D Heisenberg models with SU(2) symmetry.
Besides exploring various models in condensed matter physics and high energy physics, our approach can also be further applied to quantum information and quantum computation. Fibonacci anyons are known to support universal topological quantum computation, which is robust to local perturbations [70]. It will be interesting to see how well one can approximately simulate topological quantum computation or different braiding operations with anyonic symmetric autoregressive neural networks. As toric codes are an important example of quantum error correction codes, our approach can be used to approximately study the performance of a toric code under different noise conditions. With respect to the recent efforts on simulating lattice gauge theories with quantum computation, our approach also provides an alternative method to compare to and benchmark quantum computers. In summary, the approach we have developed is versatile and powerful for investigating condensed matter physics, high energy physics and quantum information science.

ACKNOWLEDGEMENT

D.L. is grateful for insightful discussion in high-energy physics with J. Stokes and J. Shen. D.L. acknowledges helpful discussion with L. Yeo, O. Dubinkin, R. Levy, P. Xiao, R. Sun, and G. Carleo. This work is supported by the National Science Foundation under Cooperative Agreement No. PHY2019786 (the NSF AI Institute for Artificial Intelligence and Fundamental Interactions http://iaifi.org/). This work utilizes resources supported by the National Science Foundation's Major Research Instrumentation program, Grant No. 1725729, as well as the University of Illinois at Urbana-Champaign [71]. The authors acknowledge MIT Satori and MIT SuperCloud [72] for providing HPC resources that have contributed to the research results reported within this paper. Z.Z.
Appendix A: Additional Results

In this section, we present additional results. Fig. 17 shows a 6-unit-cell quantum link model dynamics, which we can exactly diagonalize. We observed that the gauge invariant AR-NN matches the exact results up to $t = 4$, while the non-gauge ansatz quickly fails to capture the electric fields even with the sign gradient (SG) optimizer [73] (Fig. 17(b)). In addition, the gauge invariant AR-NN in general has a lower per step infidelity (Fig. 17(c)).
In Fig. 19, we measure the local observables of the $12 \times 12$ toric code model. We show that even though the neural network does not automatically preserve translational symmetry, the optimization drives the neural network to a translationally symmetric state. In addition, the weights of the RNN are translationally invariant, which, although not guaranteed to produce translationally invariant outputs, could be potentially useful for preserving translational symmetry.
Furthermore, we benchmark our method on the following model:

$H = H_{TC} - h\sum_{e\in E}\sigma^z_e + J_y\sum_{p\in P}\prod_{e\in p}\sigma^y_e. \qquad (A1)$

With the additional $J_y\sum_{p\in P}\prod_{e\in p}\sigma^y_e$ term, the Hamiltonian exhibits a sign problem compared to the original toric code model, which would be challenging for the Monte Carlo method. We first test our method on small systems ($3 \times 3$, to compare against exact diagonalization) and find relative energy differences on the order of $10^{-5}$, suggesting good agreement (see Fig. 20). We further apply our method on a $10 \times 10$ lattice. Though there is no exact benchmark available here, we measure the energy difference from a variance extrapolated result, and find our variational answer is close to it within $\Delta E \sim 0.1$ (see Fig. 21).

Exact Representation of Ground State
We generalize our gauge invariant autoregressive construction for the 2D $\mathbb{Z}_2$ toric code. Kitaev's $D(G)$ model [26] is defined on an $L \times L$ periodic square lattice where each edge has a basis $\{|g\rangle, g \in G\}$ for some group $G$. Here we focus on finite groups, in particular $G = \mathbb{Z}_N$ for $\mathbb{Z}_N$ theory. Without loss of generality, we attach an upward arrow for each edge in the y-direction and a right arrow for each edge in the x-direction. We employ the notation of Sec. IV B and introduce operators $A^g_v$ and $B^{h_u,h_d,h_l,h_r}_p$ as in Fig. 22.
The Hamiltonian defined on $\mathcal{H}(G)^{\otimes E}$ is

$H = -\sum_{v\in V} A_v - \sum_{p\in P} B_p,$

where $A_v \equiv \frac{1}{|G|}\sum_{g\in G} A^g_v$ and $B_p \equiv \sum_{h_u h_r h_d h_l = 1_G} B^{h_u,h_r,h_d,h_l}_p$.
Wave functions are complex in general, but both the Transformer network and the 1D/2D RNN are real. We use two approaches (Fig. 24), (a) amplitude-phase and (b) real-imaginary, to parametrize complex wave functions from real neural networks. In both parameterizations, the input configuration $x$, together with a default configuration $x_0$, is embedded (i.e. each state of a composite particle is mapped to a unique vector) before being fed into the Transformer or 1D/2D RNN. Certain gauge blocks in an AR-NN take a default state $x_0$ as opposed to any state of the composite particles; the embedded vector of this default state has arbitrary parameters which are trained during optimization. The Transformer used in this work (Fig. 25) is the same as the Transformer used in Ref. 8, which can be viewed as the standard Transformer encoder with masked multi-head attention from Ref. 41 but without an additional add & norm layer. The Transformer consists of a standard positional encoding layer, which uses sinusoidal functions to encode the positional information of the embedded input. After positional encoding, the input is fed into the standard masked multi-head attention mechanism. The mask here is crucial for autoregressiveness, as it only allows each site to depend on the previous sites. The output of the attention layer is then passed through a standard feed forward layer. The detailed explanation of the Transformer can be found in Refs. 8 and 41. This Transformer is essentially equivalent to the standard PyTorch implementation [50], but was implemented independently because that implementation did not exist at the start of our work.

FIG. 26. The GRU cell [38] on which different RNNs are constructed. This is the same GRU cell as the PyTorch [50] implementation.
For all RNNs in this work, we used the gated recurrent unit (GRU) cell [38] (Fig. 26) in PyTorch [50], which takes one input vector $x_k$ and the hidden input $h_{k-1}$, and computes

$r_k = \sigma(W_{ir} x_k + b_{ir} + W_{hr} h_{k-1} + b_{hr}),$
$z_k = \sigma(W_{iz} x_k + b_{iz} + W_{hz} h_{k-1} + b_{hz}),$
$n_k = \tanh(W_{in} x_k + b_{in} + r_k \odot (W_{hn} h_{k-1} + b_{hn})),$
$h_k = (1 - z_k) \odot n_k + z_k \odot h_{k-1},$

where $\sigma$ is the sigmoid function and $\odot$ means element-wise product. We then build 1D and periodic 2D RNN cells (Fig. 27) based on the GRU cell. The 1D RNN cell computes $(y_{\text{raw}}, h_{\text{new}}) = \text{GRUcell}(x_k, h_{\text{old}})$, whereas the periodic 2D RNN cell takes four hidden inputs $h_{\text{old}_i}$, $i = 1, 2, 3, 4$, concatenates them, applies the GRU cell, and average-pools the hidden output back to the dimension of a single hidden input (see Fig. 27(b)).
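Since the cell is identical to PyTorch's GRUCell, a single step can be reproduced directly (a minimal usage sketch with an illustrative hidden dimension):

```python
import torch

hidden_dim = 40                                   # illustrative size
cell = torch.nn.GRUCell(input_size=hidden_dim, hidden_size=hidden_dim)
x_k = torch.randn(12, hidden_dim)                 # batch of embedded inputs
h_prev = torch.zeros(12, hidden_dim)              # previous hidden state
h_new = cell(x_k, h_prev)                         # one GRU step
print(h_new.shape)                                # torch.Size([12, 40])
```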

RNNs
With the RNN cells, we can build 1D and periodic 2D RNNs.
The 1D RNN (Fig. 28) has a multi-layer design and shares the same structure as the PyTorch [50] GRU [38]. The embedded input configuration is fed into the cells one at a time through multiple layers and produces a raw output. In our work, the cells at different layers share the weight matrices and bias vectors.
The periodic 2D RNN has a more complicated design to capture the most correlations. Before the first layer of the periodic 2D RNN, a special concatenation of the embedded input needs to be performed. At each location, the concatenation layer takes the four surrounding inputs (periodically) and concatenates them into a single vector. If any (or all) of the surrounding inputs lies later in the conditioning order in Fig. 29(b), the corresponding input is replaced with a default input $x_0$. The concatenation layer is illustrated for a small input array in Fig. 30, and the full periodic 2D RNN is shown in Fig. 31.
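The concatenation can be sketched as follows; the neighbor ordering and roll directions are illustrative assumptions, while the substitution of the default input $x_0$ for later-in-order neighbors follows the description above.

```python
import torch

def concat_neighbors(emb, x0, order):
    """emb: (L, L, d) embedded inputs; x0: (d,) default; order: (L, L) ints."""
    L, _, d = emb.shape
    shifts = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # four periodic neighbors
    pieces = []
    for dr, dc in shifts:
        nb = torch.roll(emb, shifts=(dr, dc), dims=(0, 1))
        nb_order = torch.roll(order, shifts=(dr, dc), dims=(0, 1))
        later = (nb_order > order).unsqueeze(-1)  # neighbor comes later in order
        pieces.append(torch.where(later, x0.expand(L, L, d), nb))
    return torch.cat(pieces, dim=-1)              # (L, L, 4d) concatenated vector

out = concat_neighbors(torch.randn(4, 4, 8), torch.zeros(8),
                       torch.arange(16).reshape(4, 4))
print(out.shape)   # torch.Size([4, 4, 32])
```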

Initialization and Optimization
We use different initialization techniques for different models. For the QLM model, the initialization is done through tomography, minimizing $-\log|\psi(x)|^2$ for the desired configuration $x$ ($|\bullet\rightarrow\bullet\rightarrow\rangle$ for each unit cell in this case). For the 2D toric code model, we set the weight matrix in the last linear layer to be 0 and the bias such that the wave function assigns equal amplitude to each allowed configuration of a composite particle, which empirically produces a very low initial energy. We used a transfer learning technique, where we first train our neural network on a $6 \times 6$ model before training it on the $10 \times 10$ model. When transferred to the larger system, the weight and bias in the last linear layer are dropped and replaced with the initialization scheme described above. In Fig. 32, we show the effect of transfer learning on a $10 \times 10$ toric code model. With transfer learning, the energy is clearly lower than the energy without transfer learning.
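The equal-amplitude initialization amounts to zeroing the last layer's weight and setting a constant bias, for example (with a hypothetical final layer `last`):

```python
import torch

num_states = 16                         # e.g. candidate states of a composite particle
last = torch.nn.Linear(32, num_states)  # hypothetical final layer of the AR-NN
with torch.no_grad():
    last.weight.zero_()                 # output becomes input-independent
    last.bias.fill_(1.0)                # equal unnormalized amplitude for every state
# The gauge block then removes gauge-violating states and renormalizes.
```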
For the anyon model, we set the weight matrix in the last linear layer to be 0 and the bias to a fixed initial state. For the phase diagram, we used a transfer learning technique, where we first train the neural network on 32 anyons with $\theta = 0, \pi/4, \pi/2, 3\pi/4, \pi, 5\pi/4, 3\pi/2, 7\pi/4$, and $2\pi$ for 3000 iterations, and then transfer the model with the $\theta$ that is closest to the desired value of $\theta$ for 40 anyons for another 3000 iterations. Similarly to the toric code case, when transferred to the larger system, the weight and bias in the last linear layer are dropped and replaced by the initialization described above. In all models, except the last layer, the weights and biases are initialized using PyTorch's [50] default initialization.
For optimization, we used the Adam [76] optimizer with an initial learning rate of 0.01. For the QLM dynamics, the learning rate is halved at iterations 100, 200, 270, 350, and 420; for the QLM ground state optimization, the learning rate is halved at iterations 300, 600, 900, 1200, 1800, 2400, 3000, 4000, 5000, 6000, and 7000; and for the ground state optimization of other models, the learning rate is halved at iterations 100, 500, 1000, 1800, 2500, 4000, and 6000. In addition, for the 6-unit-cell cases and the 12-unit-cell $m = 0.1$ case with Gauss's law, we use the sign gradient (SG) optimizer [73] for 15-30 iterations (depending on the resulting fidelity) before switching to the regular optimizer, and for the 12-unit-cell $m = 2.0$ case with Gauss's law, we modified the loss function by adding an energy penalty term described in Appendix F.
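This schedule corresponds to PyTorch's MultiStepLR with gamma = 0.5; a sketch with the QLM ground-state milestones quoted above:

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]   # stand-in for network parameters
opt = torch.optim.Adam(params, lr=0.01)
sched = torch.optim.lr_scheduler.MultiStepLR(
    opt,
    milestones=[300, 600, 900, 1200, 1800, 2400, 3000, 4000, 5000, 6000, 7000],
    gamma=0.5)                                   # halve the learning rate
for step in range(8000):
    # ... compute loss, loss.backward(), then:
    opt.step()
    sched.step()
```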

FIG. 4. Variational ground state optimization for the open-boundary QLM of different system sizes and different $m$'s with gauge invariant construction. (a) The expectation value of the electric fields averaged over all links, (b) energy and (inset) energy variance per unit cell. We compare our results with the tensor network (TN) results (dashed lines in (a)) from Ref. 59. The Transformer neural network has 1 layer and 32 hidden dimensions, whereas the RNN has 2 layers and 40 hidden dimensions. For both neural networks, we use the amplitude-phase parameterization (see Fig. 24). The neural networks are randomly initialized. They are first trained for 3000 iterations with 12000 samples on 40 unit cells. Then, we use the transfer learning technique and train the Transformer for 1000 iterations on 80 unit cells and 600 iterations on 120 unit cells. The RNN is trained for 1000 iterations on 80 and 120 unit cells and 600 iterations on 160 unit cells. The neural network architecture and optimization details are discussed in Appendix E.

FIG. 5. Dynamics for the 6- and 12-unit-cell (12-24 sites and 12-24 links) open-boundary QLM for $m = 0.1$ and $m = 2.0$ with and without gauge invariant construction. The dashed curves are the exact results from the exact diagonalization for 6 unit cells. (a) The change in the energy during the dynamics. (b) The expectation value of the electric field averaged over all links. (c) The per step infidelity measure, where $|\Psi\rangle$ and $|\Phi\rangle$ are defined in Sec. III. We use the Transformer neural network with 1 layer, 16 hidden dimensions for 6 unit cells and 32 hidden dimensions for 12 unit cells, and the real-imaginary parameterization (see Fig. 24). The initial state is $|\bullet\rightarrow\bullet\rightarrow\rangle$ for each unit cell and we train the neural network using the forward-backward trapezoid method with the time step $\tau = 0.005$, 600 iterations in each time step, and 12000 samples in each iteration. The neural network architecture, initialization and optimization details are discussed in Appendix E.

FIG. 6. Dynamics of the gauge invariant AR-NN for the 12-unit-cell QLM with (a) $m = 0.1$ and (b) $m = 2.0$. The ansatz, initialization and optimization are the same as in Fig. 5 and are discussed in Appendix E.


FIG. 9. (a) Energy, (inset) energy variance and (b) energy derivative (computed by the Hellmann-Feynman theorem [60] as $d\langle H\rangle/dh = \langle dH/dh\rangle = -\sum_{e\in E}\langle\sigma^z_e\rangle$) versus $h$. We use the 2D RNN with 3 layers, 32 hidden dimensions and the amplitude-phase parameterization (see Fig. 24). We use the transfer learning technique where we first train the neural network on a $6 \times 6$ model for 8000 iterations and then we transfer the neural network to the $10 \times 10$ model for another 1000 iterations. In each iteration, we use 12000 samples. The neural network architecture, initialization and optimization details are discussed in Appendix E.

FIG. 10. Perimeter and area laws for the $10 \times 10$ 2D toric code. The expectation value of the Wilson loop operator with respect to the (a) perimeter and (b) area of the loop in a log-y scale for $h = 0.34$ and 0.35. (c) The correlation coefficient $R^2$ of the fits to the perimeter and area laws for different $h$. The ansatz, initialization, and optimization are the same as in Fig. 9 and discussed in Appendix E.

FIG. 11. (a) Non-local string correlation function for the $10 \times 10$ 2D toric code between a pair of particle and anti-particle a distance $L_y$ apart. (b) The correlation of a pair of particle and anti-particle at a distance of $L_y = 5\sqrt{2}$ for different $h$. The ansatz, initialization, and optimization are the same as in Fig. 9 and discussed in Appendix E.

FIG. 12. (a) Energy per site and (b) variance of energy per site for the $L \times L$ toric code model with $h = 0.36$. We compare our results (blue squares) with the iPEPS results of infinite system size (dashed lines) from Ref. 61 and the gauge equivariant neural network results from Ref. 35. Notice that due to the difference in the definition of $h$, the $h$ here is twice as large as in Ref. 61. We use the 2D RNN with 3 layers, 32 hidden dimensions and the amplitude-phase parameterization (see Fig. 24). The neural network is randomly initialized and trained for 8000 iterations with 12000 samples on a $6 \times 6$ system. Then, we used the transfer learning technique to train the neural network on $8 \times 8$ and $10 \times 10$ systems for another 8000 iterations and on a $12 \times 12$ system for 4000 iterations. The neural network architecture, initialization and optimization details are discussed in Appendix E.

FIG. 14. Composite particles of the 3D toric code and X-cube fracton model. The colors help identify how composite particles are defined (i.e. which parts of (a) map to (b) and (c)). (a) Physical structure of the 3D toric code and X-cube fracton model. (b) Composite particles of the 3D toric code. We define each star as a composite particle and check bond consistency for adjacent particles similarly to the 2D toric code. (c) Composite particles of the X-cube fracton model. We define each cube as a composite particle and check bond consistency on faces of adjacent particles.

FIG. 16. The second Renyi entropy $S_2$ versus the system size $L$ for the optimized AR-NN for the Fibonacci anyons in a golden chain ($\theta = 0$) with the periodic boundary condition. The inset shows the variance of energy of the AR-NN. The slope of the fitted line is the central charge $c$. The AR-NN is the 1D RNN with 3 layers, $L$ hidden dimensions, and the amplitude-phase parameterization. The neural network is trained for 8000 iterations with a sample size of 12000. The neural network architecture, initialization, and optimization details are discussed in Appendix E.

FIG. 17. Dynamics for the 6-unit-cell (12 sites and 12 links) open-boundary QLM for $m = 0.1$ and $m = 2.0$ with and without gauge invariant construction. The dashed curves are the exact results from the exact diagonalization for 6 unit cells. The "6-cell Gauss SG" is the same as the "6-cell Gauss" in Fig. 5. (a) The change in the energy during the dynamics. (b) The expectation value of the electric field averaged over all links. (c) The per step infidelity measure, where $|\Psi\rangle$ and $|\Phi\rangle$ are defined in Sec. III. We use the Transformer neural network with 1 layer, 16 hidden dimensions and the real-imaginary parameterization (see Fig. 24). The initial state is $|\bullet\rightarrow\bullet\rightarrow\rangle$ for each unit cell and we train the neural network using the forward-backward trapezoid method with the time step $\tau = 0.005$, 600 iterations in each time step, and 12000 samples in each iteration. For the results labeled with SG, we used the sign gradient (SG) optimizer [73] for 15-30 iterations (depending on the resulting fidelity) before switching to the regular optimizer. The neural network architecture, initialization and optimization details are discussed in Appendix E.

FIG. 18. Ground state energy extrapolation at $m = 0.7$ for 160 unit cells. The energies and variances are obtained from the training output of the RNN in Fig. 4. We used a linear fit for variances smaller than 0.2 and obtained a y-intercept of $-199.7923$. Our variational ground state energy is $-199.7803 \pm 0.0005$. The ansatz, initialization and optimization are the same as in Fig. 4 and are discussed in Appendix E.

FIG. 20. (a) Relative error in energy and (b) variance of energy for the $3 \times 3$ toric code model with the additional term described in Eq. A1. Here we choose $h = 0.36$ and run the neural network for different $J_y$'s. We use the 2D RNN neural network with the real-imaginary parameterization. The neural network is trained using the Adam optimizer for 10000 iterations with 12000 samples in each iteration. The neural network architecture, initialization and optimization details are discussed in Appendix E.

FIG. 24. Two parameterizations of complex wave functions from autoregressive neural networks. (a) The amplitude-phase parameterization. The raw output is used as the input of both the amplitude branch and the phase branch. (b) The real-imaginary parameterization. The raw output is used as the input of both the real branch and the imaginary branch, which is later converted to the amplitude branch and the phase branch.

FIG. 27. (a) 1D RNN cell. A ResNet (skip connection) [75] is added between the input $x_k$ and output $y_k$. Note that $x_k$, $y_k$, $h_{\text{old}}$ and $h_{\text{new}}$ have the same dimension. (b) 2D RNN cell with periodic boundary condition. This cell requires four hidden inputs $h_{\text{old}_i}$, $i = 1, 2, 3, 4$, and generates one hidden output $h_{\text{new}}$. Skip connections are added for both the output $y_k$ and the hidden output $h_{\text{new}}$. The average pooling reduces the hidden output to have the same dimension as each hidden input. Note that the dimension of $x_k$ and $y_k$ is four times the dimension of each $h_{\text{old}_i}$ and $h_{\text{new}}$.

FIG. 29. (a) The hidden information path for the 2D RNN with periodic boundary for a $3 \times 3$ system. Blue arrows show the non-boundary information path whereas the green arrows show the periodic boundary information path. When there are fewer than four hidden inputs, zero vectors are used for padding. (b) The conditioning and sampling order of the 2D RNN.

FIG. 30. 2D RNN input concatenation layer. For (a) a $3 \times 3$ input array with a default input $x_0$, (b) the concatenation layer takes the four input vectors surrounding each site with periodic boundary condition and outputs the concatenated vector of the four surrounding vectors. $x_0$ is used when the surrounding inputs appear later in the conditioning order.

FIG. 31. The full periodic 2D RNN.

FIG. 32. Per site energy with and without transfer learning during the training process for the $10 \times 10$ toric code model. The first 10 iterations are not included as their energies are outside the range of the figure. The energy with transfer learning is clearly lower than the energy without transfer learning.
FIG. 3. Variational ground state optimization for the 6-unit-cell (12 sites and 12 links) open-boundary QLM for $m = 0$ with and without gauge invariant construction. The gauge invariant autoregressive neural network reaches an accurate ground state while the ansatz without gauge constraints arrives at a non-physical state in the optimization. We use the Transformer neural network with 1 layer, 32 hidden dimensions and the real-imaginary parameterization (see Fig. 24). The neural network is randomly initialized and is trained for 1000 iterations with 12000 samples in each iteration. The neural network architecture and optimization details are discussed in Appendix E.
FIG. 8. Energies of the analytical constructions of the ground and excited states of the $11 \times 11$ 2D and $4 \times 4 \times 4$ 3D toric codes, and the $4 \times 4 \times 4$ X-cube fracton model. Here $N_{\text{break}}$ is the number of Gauss's law violations. The dashed lines are the exact values for each model. The analytical construction generates the same values as the exact ones up to stochastic errors from sampling.
The operator $A_v$ implements Gauss's law and the gauge constraint. Defining $|+\rangle \equiv \frac{1}{\sqrt{|G|}}\sum_{g\in G}|g\rangle$, the state $|\psi\rangle = \prod_{p\in P} B_p|+\rangle^{\otimes E}$ is the ground state. This is because $|\psi\rangle$ is a ground state for each $A_v$ and $B_p$. It is easy to verify that $B_p|\psi\rangle = |\psi\rangle$. To see $A_v|\psi\rangle = |\psi\rangle$, notice that $A_v$ and $B_p$ commute and $A_v|+\rangle^{\otimes E} = |+\rangle^{\otimes E}$. Similarly to the $\mathbb{Z}_2$ toric code, the ground state can be constructed using gauge invariant autoregressive neural networks by defining each star as a composite particle and checking Gauss's law and bond consistency.

FIG. 21. Ground state energy extrapolation at $h = 0.36$ and $J_y = 0.3$ for the $10 \times 10$ modified toric code model. The energies and variances are obtained from the training output. We used a linear fit for variances smaller than 0.2 and obtained a y-intercept of $-213.938$. Our variational ground state energy is $-213.802 \pm 0.003$. We use the 2D RNN neural network with the real-imaginary parameterization. The neural network is trained using the transfer learning technique for 5000 iterations after the $3 \times 3$ result, with 12000 samples in each iteration. The neural network architecture, initialization and optimization details are discussed in Appendix E.