Gauge equivariant neural networks for quantum lattice gauge theories

Gauge symmetries play a key role in physics, appearing in areas such as the quantum field theories of the fundamental particles and the emergent degrees of freedom of quantum materials. Motivated by the desire to efficiently simulate many-body quantum systems with exact local gauge invariance, gauge equivariant neural-network quantum states are introduced, which exactly satisfy the local Hilbert space constraints necessary for the description of quantum lattice gauge theory with Z_d gauge group on different geometries. Focusing on the special case of the Z_2 gauge group on a periodically identified square lattice, the equivariant architecture is analytically shown to contain the loop-gas solution as a special case. Gauge equivariant neural-network quantum states are used in combination with variational quantum Monte Carlo to obtain compact descriptions of the ground-state wavefunction for the Z_2 theory away from the exactly solvable limit, and to demonstrate the confining/deconfining phase transition via the Wilson loop order parameter.

Introduction - Quantum many-body systems defined on a lattice with local gauge invariance occur ubiquitously in the description of physics at diverse energy scales: they appear in the effective theories of many-electron systems [1] and topological phases [2], in the bosonization of two-dimensional lattice fermions [3,4], and in the microscopically regulated description of interacting elementary particles in the Standard Model [5]. Quantum lattice gauge theories are characterized by the fact that the physical states span a subspace of the many-body Hilbert space defined by satisfying a set of local operator constraints. These operator constraints give rise to group invariance of the associated wavefunction, called gauge invariance.
Because of the analytic intractability of gauge theories, it is important to develop techniques for simulating them. Methods such as lattice gauge theory [6] accomplish this via a quantum-classical mapping which rewrites the quantum problem as a statistical mechanics problem in one higher dimension. Only for sign-free problems can this mapping be done without introducing 'negative' weights in the statistical mechanics model, which otherwise induce either an exponential cost in the simulations or the need for additional approximations to mitigate the sign problem. Alternatively, quantum gauge theories can be simulated using variational methods based on compact parameterizations of many-body wavefunctions. DMRG [7], a variational approach based on matrix-product states, has been extensively used to analyze gauge theories [8]. While DMRG is efficacious for one-dimensional systems, the bond dimension of the matrix product state needed to accurately represent the ground state grows exponentially with width in higher dimensions.
Machine-learning techniques based on a neural-network variational representation of quantum states [9] have extended the scope of variational methods by accurately representing low-energy states of strongly correlated systems in two or more spatial dimensions [10][11][12][13][14][15][16][17][18]. Imposing physical symmetries in neural-network quantum states is a very active research topic [19,20] that reflects the broader need to impose symmetries in machine learning applications to physics [21]. Our work further extends the scope of neural-network quantum states by developing gauge equivariant neural networks, which are special-purpose variational families of wavefunctions that are explicitly gauge invariant. Our work is inspired by recent developments of group-invariant networks with group equivariant layers, which have found applications both in data science and in physics. In data science, the data-generating process is often assumed to be invariant under a symmetry group G [22,23]. For example, Ref. 24 derived equivariant layers starting from the assumption that the input data transforms covariantly as a tensor field on a Riemannian manifold. Very recently, gauge equivariant networks have been constructed and variationally optimized using the reverse relative entropy to approximate the Gibbs distribution associated with the Euclidean action functional for lattice Yang-Mills in the case of the abelian U(1) [25] and subsequently non-abelian SU(N) gauge group [26] (a different gauge equivariant construction for this non-abelian group is also considered in [27]). The approach of [25,26], which is closely related to variational inference [28], relies on the existence of an analytic continuation between the quantum theory and a positive-definite Gibbs measure, which, due to the well-known sign problem, presents a challenge in the presence of fermions.
The paper is organized as follows. The Z_2 gauge theory Hamiltonian is introduced using a notation that generalizes to the Z_d gauge group and to higher dimensional models such as the 3D toric code and the X-cube model (explicitly described in SM III and IV). The general construction of the gauge equivariant neural network is then described. Gauge equivariant neural wavefunctions are variationally optimized using variational Monte Carlo on square lattices up to system size 12 × 12, demonstrating the transition of Wilson loop order parameters from perimeter to area law.

Z_2 Gauge Theory - We start by briefly reviewing the formulation of lattice gauge theory for the simplest non-trivial gauge group Z_2 = {−1, 1}. The generalization to Z_d and to higher dimensional models, such as the 3D toric code and the X-cube model, is discussed in Sec. III and IV of the Supplemental Material. We consider the Hamiltonian with Z_2 fields on the edges e ∈ E of a periodic square lattice,

H = −J Σ_{f∈F} B_f − h Σ_{e∈E} X_e,   (1)

with B_f := ∏_{e∈f} Z_e, where F is the set of the smallest 1 × 1 plaquettes on the lattice, e ∈ f are the edges around plaquette f, and Z_e, X_e are the usual Pauli matrices. Let V be the set of vertices of the lattice. In order to define the Hilbert space of physical states, define for each vertex v ∈ V a local operator consisting of the product of the Pauli-X operators incident on that vertex, A_v := ∏_{e∋v} X_e, where e ∋ v indicates the edges containing v. These vertex operators commute amongst themselves, commute with the Hamiltonian H, and have eigenvalues ±1. In this work we focus on the so-called even gauge theory, in which the physical Hilbert space H_phys is chosen to be the +1 eigenspace of all vertex operators, A_v |ψ⟩ = |ψ⟩ for all v ∈ V. Since the vertex operators satisfy the global operator identity ∏_{v∈V} A_v = 1, only |V| − 1 of the constraints defining H_phys are independent. The dimension of the physical state space is therefore found to be dim H_phys = 2^{|E|} / 2^{|V|−1} = 2^{L²+1}.
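As a concrete sanity check of these definitions, the commutation relations, the global identity, and the dimension count can all be verified numerically on a tiny lattice. The following is a sketch only: the edge-indexing convention and the values of J and h are illustrative choices, not taken from the paper.

```python
import numpy as np
from functools import reduce

# Z_2 gauge theory operators on a 2x2 periodic lattice (|E| = 8 qubits,
# so full 256x256 matrices are still tractable).
L = 2
nE = 2 * L * L
I2, Zp, Xp = np.eye(2), np.diag([1.0, -1.0]), np.array([[0.0, 1.0], [1.0, 0.0]])

def edge(i, j, mu):           # edge attached to vertex (i, j); mu = 0: x, mu = 1: y
    return 2 * ((i % L) * L + (j % L)) + mu

def op_on(pauli, edges):      # tensor product acting with `pauli` on given edges
    return reduce(np.kron, [pauli if e in edges else I2 for e in range(nE)])

plaq = lambda i, j: {edge(i, j, 0), edge(i, j, 1), edge(i + 1, j, 0), edge(i, j + 1, 1)}
star = lambda i, j: {edge(i, j, 0), edge(i, j, 1), edge(i, j - 1, 0), edge(i - 1, j, 1)}

B = [op_on(Zp, plaq(i, j)) for i in range(L) for j in range(L)]   # B_f
A = [op_on(Xp, star(i, j)) for i in range(L) for j in range(L)]   # A_v

J, h = 1.0, 0.7               # illustrative couplings
H = -J * sum(B) - h * sum(op_on(Xp, {e}) for e in range(nE))

# Every A_v commutes with H, and the global identity prod_v A_v = 1 holds.
assert all(np.allclose(Av @ H, H @ Av) for Av in A)
assert np.allclose(reduce(np.matmul, A), np.eye(2 ** nE))

# dim H_phys = Tr prod_v (1 + A_v)/2 = 2^{|E|} / 2^{|V|-1} = 2^{L^2+1}
P = reduce(np.matmul, [(np.eye(2 ** nE) + Av) / 2 for Av in A])
print(int(round(np.trace(P).real)))   # -> 32
```

The trace of the projector ∏_v (1 + A_v)/2 counts the dimension of the common +1 eigenspace, reproducing 2^{L²+1} = 32 for L = 2.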
Further details on the description of the Hilbert space are provided in Sec. I of the Supplemental Material.
The Z_2 gauge theory is exactly solvable in both the weak coupling (h → 0) and strong coupling (h → ∞) limits. For infinite transverse field h = ∞ the non-degenerate ground state is simply the uniform superposition state |+⟩^⊗E, which is manifestly gauge invariant, where |+⟩ satisfies X|+⟩ = |+⟩. In the opposite extreme of h = 0, the ground states are also eigenstates of all B_f and are four-fold degenerate. As shown originally by Wegner using duality arguments [29], the uniform superposition |+⟩^⊗E and the ground states at h = 0 correspond to different phases of matter, which are distinguished by the expectation value of a non-local operator called the Wilson loop, defined for any closed curve C ⊆ E on the lattice as

Ŵ_C := ∏_{e∈C} Z_e.

Wegner found a critical value of the transverse field h_c separating a deconfined phase for h < h_c, in which ⟨Ŵ_C⟩ decays exponentially with the perimeter of C, from a confined phase where ⟨Ŵ_C⟩ decays exponentially with the area enclosed by C.
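Both exactly solvable limits can be checked by brute-force enumeration on a 2 × 2 lattice. The sketch below (with an illustrative edge-indexing convention) evaluates the diagonal observable W_C(x) = ∏_{e∈C} x_e in the h → ∞ state, where |ψ(x)|² is uniform over all configurations, and in the h = 0 loop-gas state, which is uniform over configurations with every plaquette product equal to +1.

```python
import numpy as np
from itertools import product

# At h -> infinity the state is |+>^{⊗E}: |psi(x)|^2 is uniform over x in
# {±1}^E, so <W_C> = 0 for any non-empty loop. At h = 0 one ground state is
# the loop-gas state, uniform over configurations with all B_f = +1, where a
# contractible W_C equals the product of the enclosed plaquettes, so <W_C> = 1.
L = 2
def edge(i, j, mu):                      # mu = 0: x-edge, mu = 1: y-edge
    return 2 * ((i % L) * L + (j % L)) + mu

def plaq(x, i, j):                       # B_f eigenvalue on configuration x
    return (x[edge(i, j, 0)] * x[edge(i, j, 1)]
            * x[edge(i + 1, j, 0)] * x[edge(i, j + 1, 1)])

configs = [np.array(c) for c in product([-1, 1], repeat=2 * L * L)]
loop_gas = [x for x in configs
            if all(plaq(x, i, j) == 1 for i in range(L) for j in range(L))]

C = [edge(0, 0, 0), edge(0, 0, 1), edge(1, 0, 0), edge(0, 1, 1)]  # one plaquette
W = lambda x: int(np.prod(x[C]))
print(np.mean([W(x) for x in loop_gas]),   # -> 1.0  (h = 0 limit)
      np.mean([W(x) for x in configs]))    # -> 0.0  (h -> infinity limit)
```

The enumeration also confirms the counting of the physical sector: with three independent plaquette constraints on 2^8 configurations, exactly 32 closed-loop configurations survive.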
Gauge equivariant neural networks - We now present a neural network which explicitly preserves the gauge invariance of the wavefunction. The classical configuration space Z_2^E can be regarded as a subset of the continuous vector space C^E := C^{L×L×2} consisting of tensors of shape (L, L, 2), where an edge e is specified by the vertex v ∈ V indexed by the first two axes and the direction μ ∈ {x̂, ŷ} indexed by the third axis. The components of an arbitrary tensor φ ∈ C^E will then be indexed as φ_μ(v), where v ∈ V and μ ∈ {x̂, ŷ}. The action of the gauge group on the space of (L, L, 2) tensors is described as follows. Given a square matrix Ω ∈ {−1, 1}^V := {−1, 1}^{L×L}, we define a gauge transformation g_Ω : C^E → C^E by the rule

(g_Ω φ)_μ(v) = Ω(v) φ_μ(v) Ω(v + μ),

where Ω(v) and Ω(v + μ) denote the entries of the matrix Ω at the lattice location v ∈ V and the shifted lattice location v + μ ∈ V, assuming boundaries are periodically identified. It is straightforward to show that the gauge transformation associated with the Hilbert space operator A_v is given by g_{Ω_v}, where Ω_v is defined for each v ∈ V by Ω_v(v') = −1 if v' = v and Ω_v(v') = 1 otherwise. A wavefunction which obeys the Gauss law constraint is then one for which

ψ(g_{Ω_v} φ) = ψ(φ) for all v ∈ V,   (6)

for the case where φ ∈ Z_2^E. Let us call a function h : C^E → C gauge invariant if h(g_Ω φ) = h(φ) for all Ω, and a function f : C^E → C^E gauge equivariant if f(g_Ω φ) = g_Ω f(φ) for all Ω. A wavefunction which consists of multiple layers of gauge equivariant functions interwoven with pointwise nonlinearities, followed by a final gauge invariant layer, is guaranteed to obey Eq. 6, since gauge equivariance is preserved by composition and pointwise nonlinearities. In the following, we construct neural network blocks which are equivariant and invariant respectively.
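The transformation rule and the gauge invariance of plaquette products can be verified numerically. The sketch below assumes one particular convention for the (i, j) axes and the direction of the periodic shifts.

```python
import numpy as np

# Verify that 1x1 plaquette Wilson loops are invariant under an arbitrary
# Z_2 gauge transformation (g_Omega phi)_mu(v) = Omega(v) phi_mu(v) Omega(v+mu).
L = 4
rng = np.random.default_rng(1)
phi = rng.choice([-1.0, 1.0], size=(L, L, 2))     # edge field, axes (i, j, mu)
Omega = rng.choice([-1.0, 1.0], size=(L, L))      # arbitrary gauge transform

def gauge(phi, Om):
    out = np.empty_like(phi)
    # mu = x: edge from v = (i, j) to v + x = (i, j+1); mu = y: to v + y = (i+1, j)
    out[:, :, 0] = Om * phi[:, :, 0] * np.roll(Om, -1, axis=1)
    out[:, :, 1] = Om * phi[:, :, 1] * np.roll(Om, -1, axis=0)
    return out

def plaquettes(phi):                               # W_f for all 1x1 plaquettes
    return (phi[:, :, 0] * np.roll(phi[:, :, 1], -1, axis=1)
            * np.roll(phi[:, :, 0], -1, axis=0) * phi[:, :, 1])

# Each corner of a plaquette contributes Omega(v)^2 = 1, so W_f is unchanged.
assert np.allclose(plaquettes(gauge(phi, Omega)), plaquettes(phi))
print("plaquette Wilson loops are gauge invariant")
```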
Gauge Equivariant Layer - The construction of the gauge equivariant layer parallels the construction found in [25], applied to discrete abelian groups. Given vertices v, v' ∈ V and a path γ ⊆ E from v to v', consisting of a sequence of adjacent edges, we define the Wilson path as the function W_γ : C^E → C given by the formula

W_γ(φ) := ∏_{e∈γ} φ(e).

It is easy to show that for any Ω and any path γ from v to v' we have

W_γ(g_Ω φ) = Ω(v) W_γ(φ) Ω(v').   (8)

The above Wilson path is the fundamental primitive from which the gauge equivariant layers will be constructed. Note that if γ is a closed curve C, then the function W_C is gauge invariant. Consider a fixed equivariant layer described by the equivariant function f : C^E → C^E. Each such layer is specified by decorating each edge e = (v, v + μ) ∈ E of the lattice with the following data:

(i) a path γ_e ⊆ E from v to v + μ; (ii) a collection of closed loops C^e_1, . . . , C^e_{n_e} ⊆ E; and (iii) a function h_e parameterized by a neural network.
From the above data we construct a gauge equivariant function f : C^E → C^E whose component on edge e = (v, v + μ) is

f_e(φ) := W_{γ_e}(φ) h_e(W_{C^e_1}(φ), . . . , W_{C^e_{n_e}}(φ)).   (9)

The equivariance follows directly from Eq. (8): the closed-loop inputs to h_e are gauge invariant, while the Wilson path prefactor transforms as W_{γ_e}(g_Ω φ) = Ω(v) W_{γ_e}(φ) Ω(v + μ), so each component f_e transforms exactly like the edge variable φ_μ(v).

Numerical Experiments - Here we determine the phase diagram of Eq. 1 for different values of the transverse field h. When h = 0, we can analytically write a network which exactly represents the ground state with a single gauge invariant block (and no equivariant blocks), as detailed in Sec. II of the Supplemental Material. For all h, we use variational Monte Carlo to approximately determine the ground states on square lattices. The family of variational wavefunctions is summarized in Fig. 1.

[Fig. 4 caption: The ground state expectation value of rectangular Wilson loops of size l1 × l2 (l1, l2 ≤ 4) as a function of the enclosed area A_C in the confining phase h > h_c (top) and of the perimeter P_C in the deconfined phase h < h_c (bottom) on a 12 × 12 lattice. The linear fits to the log-linear plots are consistent with area law scaling ⟨Ŵ_C⟩ ∼ e^{−α A_C} with best fit parameter α = 0.185 and perimeter law scaling ⟨Ŵ_C⟩ ∼ e^{−α' P_C} with best fit parameter α' = 0.00718; due to the smallness of α', a linear fit of ⟨Ŵ_C⟩ versus P_C is also consistent with the data.]

Real-valued weights and biases are used in all neural networks. The neural networks h_e are chosen to be convolutional neural networks with periodic padding, to capture translation symmetry and to facilitate transfer learning. Residual layers, or equivalently skip connections, which are manifestly gauge equivariant, are also employed. The neural network parameters are optimized using the stochastic reconfiguration algorithm [30]. Further details about the architecture and optimization are provided in Sec. V of the Supplemental Material.
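A minimal numerical sketch of the gauge equivariant layer described above: here γ_e is taken to be the edge e itself, and the trained network h_e is replaced by a fixed pointwise function of the adjacent plaquette Wilson loop (an illustrative stand-in, not the CNN used in the experiments).

```python
import numpy as np

L = 4
rng = np.random.default_rng(2)
phi = rng.choice([-1.0, 1.0], size=(L, L, 2))
Omega = rng.choice([-1.0, 1.0], size=(L, L))

def gauge(phi, Om):                         # (g_Omega phi)_mu(v)
    out = np.empty_like(phi)
    out[:, :, 0] = Om * phi[:, :, 0] * np.roll(Om, -1, axis=1)
    out[:, :, 1] = Om * phi[:, :, 1] * np.roll(Om, -1, axis=0)
    return out

def plaquettes(phi):                        # gauge invariant W_f inputs
    return (phi[:, :, 0] * np.roll(phi[:, :, 1], -1, axis=1)
            * np.roll(phi[:, :, 0], -1, axis=0) * phi[:, :, 1])

def equivariant_layer(phi):
    """f_e(phi) = W_{gamma_e}(phi) * h_e(invariant inputs), with gamma_e = e
    and h_e a fixed nonlinearity of the adjacent plaquette loop (toy choice)."""
    h = np.tanh(1.0 + 0.5 * plaquettes(phi))
    out = np.empty_like(phi)
    out[:, :, 0] = phi[:, :, 0] * h         # transforms like the edge (v, v+x)
    out[:, :, 1] = phi[:, :, 1] * h         # transforms like the edge (v, v+y)
    return out

# Equivariance check: f(g_Omega phi) == g_Omega f(phi)
lhs = equivariant_layer(gauge(phi, Omega))
rhs = gauge(equivariant_layer(phi), Omega)
assert np.allclose(lhs, rhs)
```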
We start with a benchmark of the Hamiltonian in Eq. 1 on a 3 × 3 square lattice, comparing the gauge equivariant neural network, the gauge invariant network, a restricted Boltzmann machine (RBM), and exact diagonalization; Fig. 2 presents the energy and its variance. In addition, we benchmark a Hamiltonian with a sign problem, H = −J Σ_{f∈F} ∏_{e∈f} Z_e − h Σ_{e∈E} X_e − J_y Σ_{f∈F} ∏_{e∈f} Y_e for J = h = 1, with results shown in Fig. 3. The numbers of variational parameters for the three neural networks above are 66, 24, and 1044, respectively. It can be seen that even with a small number of parameters, the gauge equivariant neural network achieves better performance than the RBM and attains results close to the exact values. We further apply our method to larger square lattices of size L × L with L ∈ {8, 10, 12}. It is known that the Wilson loop expectation value ⟨Ŵ_C⟩ decays exponentially with an area law for h > h_c and with a perimeter law for h < h_c [31]. In Fig. 4, the variational wavefunction is shown to capture the area law and perimeter law behaviors of the Wilson loop in the corresponding regimes, yielding the associated decay factors. In Fig. 5, we compute the energy derivatives and perform a simultaneous fit of area and perimeter laws to log⟨Ŵ_C⟩ for different h. The change in the data around h = 0.3 suggests a deconfinement/confinement phase transition, consistent with [32,33].
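For illustration only, the log-linear fitting procedure used for the Wilson loop data can be sketched on synthetic values generated from an exact area law; the numbers below are not the paper's measured data.

```python
import numpy as np

# Generate <W_C> for rectangular loops obeying an exact area law with a known
# decay factor, then recover that factor from the slope of log<W_C> vs area.
alpha_true = 0.185
sides = [(l1, l2) for l1 in range(1, 5) for l2 in range(l1, 5)]   # l1, l2 <= 4
area = np.array([l1 * l2 for l1, l2 in sides], dtype=float)
W = np.exp(-alpha_true * area)

slope, intercept = np.polyfit(area, np.log(W), 1)
alpha_fit = -slope
print(round(alpha_fit, 3))   # -> 0.185
```

The perimeter law fit proceeds identically with the perimeter 2(l1 + l2) in place of the area.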
Discussion and future directions - In this work, we have shown how to use gauge equivariant networks to represent variational states which exactly obey local gauge constraints. This significantly expands the space of models whose phase diagrams can now be numerically established. For example, one could variationally explore the phase diagrams of the Z_d lattice gauge theory for d > 2 (see SM III); the 3D toric code (SM IV), the X-cube fracton model (SM IV), or Kitaev D(G) models (SM V) with external fields; or models with disorder. Another interesting application would be to relax the restriction to the even sector of the gauge theory and explore the physics of different gauge sectors. Beyond the explicit constructions given here, it will be interesting to further extend the reach of such networks by generalizing the approach described in this work to models with different gauge symmetries or to constraints arising from gauging subsystem symmetries. Finally, the ability to exactly impose gauge constraints variationally has the potential for impact beyond ground state calculations, for example in overcoming obstacles in optimizing combinatorial structures [34] or in quantum-state tomography [35].
Acknowledgements. Di Luo acknowledges helpful discussions with Jiayu Shen, Luke Yeo, Oleg Dubinkin, Ryan Levy, Peijun Xiao and Ruoyu Sun. We acknowledge support from Department of Energy grant DE-SC0020165. The work was partially carried out with internship and computational support from the Center for Computational Mathematics at the Flatiron Institute. The numerical experiments were conducted using NetKet [36].

I. Detailed Description of the Z2 Gauge Theory Hilbert Space
The lattice underlying the gauge theory is chosen to be the L × L square lattice with periodic boundary conditions and the topology of a torus. Let G = (V, E) denote the undirected interaction graph with |V| = L² vertices, |E| = 2L² edges and |F| = L² faces, in accordance with Euler's formula |V| − |E| + |F| = 2 − 2g for a torus, which has genus g = 1. Vertices, edges and faces are indexed by v ∈ V, e ∈ E, and f ∈ F respectively. Each edge hosts a qubit C² = span_C{|−1⟩, |1⟩} with basis vectors labeled by the eigenvalues [37] of the Pauli-Z operator, Z|±1⟩ = ±|±1⟩, so the joint tensor product Hilbert space for all qubits is H = ⊗_{e∈E} C² with orthonormal basis elements |x⟩ := ⊗_{e∈E} |x_e⟩ where x_e ∈ Z_2. It will be convenient to define the superposition state |+⟩ = (1/√2)(|−1⟩ + |1⟩) ∈ C². Cardinalities of sets are omitted when they appear in superscripts, so, for example, x ∈ Z_2^E denotes a ±1 string of length |E|, indexed by the edges of the graph G.
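The vertex, edge, and face counts above can be checked directly against Euler's formula:

```python
# |V| - |E| + |F| = 2 - 2g must vanish for the torus (genus g = 1),
# for every lattice size used in the paper's experiments.
for L in (3, 8, 10, 12):
    V, E, F = L * L, 2 * L * L, L * L
    assert V - E + F == 2 - 2 * 1
    print(L, V, E, F)
```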

II. Exact Representation of Z2 Toric Code Ground States
In this section, we provide exact constructions of gauge equivariant neural networks for the ground states and excited states of the Z_2 toric code model. This can be done with a neural network with no equivariant blocks and a single invariant block. It differs from our construction in the main text in two additional ways: (1) we use a fully connected network instead of a convolutional network, and (2) in addition to all the primitive plaquettes we include two topologically non-trivial loops around the torus.
There are four degenerate ground states of the Z_2 model. One exact ground state can be obtained by starting with the uniform superposition state |+⟩^⊗E and applying the following projection operator, which commutes with all vertex operators,

P = ∏_{f∈F} (1 + B_f)/2,   (S1)

which gives rise to the loop-gas ground state |ψ_loop-gas⟩ = P|+⟩^⊗E. Excited states are obtained from the ground state by relaxing the eigenvalues of various plaquette operators B_f to −1, in a manner consistent with the global operator constraint ∏_{f∈F} B_f = 1. Because the toric code is frustration free, to be a ground state of Eq. 1 it suffices to be a ground state of each B_f and X_e term separately. As the gauge equivariant construction already ensures that the state is a ground state of the X_e terms, we simply choose a function of the elementary Wilson loops for our network,

ψ(φ) = h_e(W_{C_1}(φ), . . . , W_{C_{L²}}(φ), W_{C_x}(φ), W_{C_y}(φ)),   (S2)

where each C_i is an elementary plaquette for i = 1, ..., L², C_x is a loop winding around the whole torus in the x-direction, and C_y is a loop winding around the whole torus in the y-direction. We must choose an h_e such that Eq. S2 is a ground state of each B_f, i.e. ⟨x|B_f|ψ⟩ = ⟨x|ψ⟩ for each B_f and x. This implies that ⟨x|ψ⟩ = 0 for any |x⟩ such that B_f|x⟩ = −|x⟩. Consider the two operators τ^x_z = ∏_{i∈C_x} Z_i and τ^y_z = ∏_{i∈C_y} Z_i. The operators τ^x_z and τ^y_z further distinguish the four degenerate ground states, whose expectation values (a, b) of the two operators are one of the four choices in {±1} × {±1}. Hence, we define h_e as the product

h_e = ∏_{i=1}^{L²} h_1(W_{C_i}(φ)) · h_x(W_{C_x}(φ)) · h_y(W_{C_y}(φ)),

where h_1(W_{C_i}(φ)) = 1 for W_{C_i}(φ) = 1 and zero otherwise, h_x(W_{C_x}(φ)) = 1 for W_{C_x}(φ) = a and zero otherwise, and h_y(W_{C_y}(φ)) = 1 for W_{C_y}(φ) = b and zero otherwise. The above construction provides an exact representation by a one-layer gauge invariant network of each of the ground states of the Z_2 toric code.

III. Gauge Equivariant Neural Networks for Z_d and General Kitaev Models

In this section, we provide the gauge equivariant neural network construction for general Kitaev models over the Z_d group and over non-abelian groups. Consider the usual L × L square periodic lattice where each edge carries a basis {|g⟩ : g ∈ G} for a given group G.
We focus on finite groups here, taking G = Z_d for the Z_d theory. Without loss of generality, we attach an upward arrow to each edge in the y-direction and a right arrow to each edge in the x-direction. We introduce operators A^g_v and B_f in the standard quantum double fashion: A^g_v acts on the edges incident to v by multiplication by g according to the arrow convention, and B_f projects onto configurations whose oriented product around f is the identity. Given a configuration x on the lattice, vertices v_1, v_2 ∈ V and a directed path γ ⊆ E from v_1 to v_2, consisting of a sequence of adjacent edges with direction, we define the generalized Wilson path as the function W_γ : G^E → G given by the ordered product

W_γ(x) := ∏_{e∈γ} x_e^{σ_e},

where σ_e = +1 when the path direction agrees with the edge orientation, and σ_e = −1 (the group inverse) when it opposes it.

The Hamiltonian defined on H(G)^⊗E is

H = − Σ_{v∈V} A_v − Σ_{f∈F} B_f,  with A_v := (1/|G|) Σ_{g∈G} A^g_v,   (S13)

where the product defining W_γ is directional, such that an edge contributes its original value g when the path direction agrees with the edge direction and its inverse g^{−1} when the path direction is opposite to the edge direction. We claim that for any open path γ, W_γ is gauge equivariant, i.e. W_γ commutes with A^g_v for each v and g. To see this, we consider three cases. The first case is that A^g_v does not act on any edge of the path γ, and it is clear that they commute. The second case is that A^g_v acts on a vertex along the path γ other than v_1, v_2; since A^g_v must touch two adjacent edges of the path at the same time, the arrow convention ensures that its effect cancels out in the product. The last case is that A^g_v acts on one of the endpoints v_1, v_2 of the path γ; here it is straightforward to verify directly that A^g_v commutes with W_γ. Without loss of generality, one can consider a Wilson path on the bottom edge of a plaquette with the path γ going clockwise and check the cancellation explicitly.

For a closed loop C, we also introduce a Wilson loop function W_C : G^E → C, given by the trace of the directed product of group elements around C. Notice that W_C is gauge invariant, i.e. W_C ∘ A^g_v(φ) = W_C(φ) for any v and g. This is because any A^g_v touches either no edge or two adjacent edges along C. The invariance property holds trivially if no edge is touched, and still holds if two edges are touched, since the effect of A^g_v always cancels out due to the arrow convention; without loss of generality, this can be verified on each plaquette. With W_γ and W_C, we can construct the gauge equivariant layer (a gauge equivariant function f : G^E → C^E, Eq. S15) and the gauge invariant layer (a gauge invariant function h : G^E → C, Eq. S16) of the gauge equivariant neural network for the general Kitaev model with both Z_d and non-abelian group G. In particular, we can construct a one-layer gauge invariant network for the ground state,

ψ(φ) = ∏_{i=1}^{L²} h_1(W_{C_i}(φ)),

where each C_i is an elementary plaquette for i = 1, ..., L², and h_1(W_{C_i}(φ)) = 1 for W_{C_i}(φ) = tr(I), with I ∈ G the identity element, and zero otherwise. This is because A^g_v|ψ⟩ = |ψ⟩ for each v and g from the above discussion, and hence A_v|ψ⟩ = |ψ⟩; and B_f|ψ⟩ = |ψ⟩ holds due to the fact that for any finite group, tr(g) = tr(I) implies g = I.
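A numerical sketch for the abelian case G = Z_d, written additively: an edge from v to v + μ carries g_e ∈ {0, . . . , d − 1}, and a gauge transformation sends g_e → (w(v) + g_e − w(v + μ)) mod d. The parameterization, d = 5, and the plaquette orientation are illustrative conventions.

```python
import numpy as np

d, L = 5, 4
rng = np.random.default_rng(3)
g = rng.integers(0, d, size=(L, L, 2))     # Z_d edge field, axes (i, j, mu)
w = rng.integers(0, d, size=(L, L))        # arbitrary gauge transformation

def gauge(g, w):
    out = np.empty_like(g)
    out[:, :, 0] = (w + g[:, :, 0] - np.roll(w, -1, axis=1)) % d
    out[:, :, 1] = (w + g[:, :, 1] - np.roll(w, -1, axis=0)) % d
    return out

def plaq_loop(g):
    # Traverse each plaquette counterclockwise: edges along the path direction
    # enter with +g, edges against it with -g (the "inverse" of the text above).
    return (g[:, :, 0] + np.roll(g[:, :, 1], -1, axis=1)
            - np.roll(g[:, :, 0], -1, axis=0) - g[:, :, 1]) % d

# Every vertex weight w(v) enters a closed loop once with + and once with -,
# so the plaquette Wilson loops are gauge invariant.
assert np.array_equal(plaq_loop(gauge(g, w)), plaq_loop(g))
```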

VI. Neural Network Architecture and Numerical Details
In this section we provide the details of the architecture of the gauge equivariant neural network. There are two basic components of the network, the equivariant layer and the invariant layer. For the equivariant layer, we choose a fixed path γ_e for each horizontal edge and each vertical edge, together with the fixed loops C^e_i entering Eq. 9. The function h_e is taken to be a convolutional neural network (CNN) with leaky ReLU activation functions. One can further compose different equivariant layers in series or in parallel. The invariant layer is expressed as f_e(φ) = h_e(W_{C^e_1}(φ), . . . , W_{C^e_{n_e}}(φ)), with fixed closed loops C^e_i and h_e a convolutional neural network. The invariant layer's h_e has two outputs, which parameterize the log amplitude and the phase of the wavefunction separately. The ELU activation is used for the log amplitude output, while the softsign activation is used for the phase output. The full network is a composition of equivariant layers (Fig. 1(b)) followed by a final invariant layer (Fig. 1(c)). For the 8 × 8, 10 × 10 and 12 × 12 experiments, we use the neural network architecture of Fig. 1 with CNN kernel sizes 4, 5 and 5 respectively. For the experiments in Fig. 2, the gauge equivariant neural network uses the architecture of Fig. 1(a) with one channel of gauge equivariant blocks, and the gauge invariant neural network uses the architecture of Fig. 1(c), where the kernel size in all CNNs is 3. The RBM has three times as many hidden neurons as input neurons and outputs the log amplitude and phase of the wavefunction.
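A toy sketch of the invariant layer's two heads, with fixed pointwise ELU- and softsign-style maps standing in for the trained CNN; all parameter values are illustrative.

```python
import numpy as np

L = 4
rng = np.random.default_rng(5)
phi = rng.choice([-1.0, 1.0], size=(L, L, 2))
Omega = rng.choice([-1.0, 1.0], size=(L, L))

def gauge(phi, Om):                         # Z_2 gauge transformation
    out = np.empty_like(phi)
    out[:, :, 0] = Om * phi[:, :, 0] * np.roll(Om, -1, axis=1)
    out[:, :, 1] = Om * phi[:, :, 1] * np.roll(Om, -1, axis=0)
    return out

def plaquettes(phi):                        # gauge invariant inputs
    return (phi[:, :, 0] * np.roll(phi[:, :, 1], -1, axis=1)
            * np.roll(phi[:, :, 0], -1, axis=0) * phi[:, :, 1])

def invariant_block(phi, a=0.3, b=-0.1):
    """Pool plaquette Wilson loops into (log amplitude, phase)."""
    W = plaquettes(phi)
    log_amp = np.sum(np.where(a * W > 0, a * W, np.expm1(a * W)))  # ELU-style
    phase = np.sum(b * W / (1.0 + np.abs(b * W)))                  # softsign-style
    return log_amp, phase

# Both outputs depend on phi only through gauge invariant loops,
# so the whole block is invariant under any g_Omega.
assert np.allclose(invariant_block(phi), invariant_block(gauge(phi, Omega)))
```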
Stochastic reconfiguration was performed with batch size 1000 and a fixed learning rate of 0.05. The number of iterations is 120 in general, except that the 10 × 10 and 12 × 12 experiments use 150 iterations. The 12 × 12 experiment starts with transfer learning from the ansatz optimized at 10 × 10. The convolutional neural network parameters are initialized with orthogonal initialization [39]. For sampling, we adopt standard Monte Carlo sampling with single spin flips.
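One stochastic reconfiguration step can be sketched as follows, assuming the standard conventions O_k(x) = ∂ log ψ(x)/∂θ_k for the log-derivatives sampled by Monte Carlo and a diagonal-shift regularization of S; the function name, shapes, and shift value are illustrative, not NetKet's API.

```python
import numpy as np

def sr_step(O, E_loc, theta, lr=0.05, eps=1e-4):
    """O: (n_samples, n_params) log-derivatives; E_loc: (n_samples,) local energies."""
    n = O.shape[0]
    O_c = O - O.mean(axis=0)                       # centered log-derivatives
    S = O_c.conj().T @ O_c / n                     # quantum geometric tensor S_kl
    F = O_c.conj().T @ (E_loc - E_loc.mean()) / n  # energy gradient estimate
    S = S + eps * np.eye(S.shape[0])               # diagonal shift regularization
    return theta - lr * np.linalg.solve(S, F)      # natural-gradient-like update

rng = np.random.default_rng(4)
theta = rng.normal(size=3)                         # toy parameters and samples
O = rng.normal(size=(1000, 3))
E_loc = rng.normal(size=1000)
theta_new = sr_step(O, E_loc, theta)
print(theta_new.shape)   # -> (3,)
```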

VII. Observables and Quantities across Phase Transition
We provide further figures (see Fig. S7, S8, S9, S10, S11) on observables and quantities across the phase transition.