Equivariant flow-based sampling for lattice gauge theory

We define a class of machine-learned flow-based sampling algorithms for lattice gauge theories that are gauge-invariant by construction. We demonstrate the application of this framework to U(1) gauge theory in two spacetime dimensions, and find that near critical points in parameter space the approach is orders of magnitude more efficient at sampling topological quantities than more traditional sampling procedures such as Hybrid Monte Carlo and heat bath.

Many important physical theories are described by Lagrangians that are invariant under local symmetry transformations that form Lie groups; such theories are named gauge theories. For example, the standard model of particle physics, which is our most accurate description of nature at the shortest length scales, is a quantum field theory centered around the action of three gauge groups [1][2][3][4], and several important condensed matter systems can be described by effective gauge theories [5][6][7][8]. In the strong-coupling limit, these theories are nonperturbative, and numerical formulations on discrete spacetime lattices offer the only known way to compute properties of interest from first principles.
Calculations within lattice frameworks typically proceed by estimating expectation values of observables using Markov chain Monte Carlo (MCMC) to sample from thermodynamic distributions or Euclidean-time path integrals. In both cases, samples $U$ (typically high dimensional) are drawn from an exponentially weighted distribution $p(U) = e^{-S(U)}/Z$, where the physics is encoded in an energy or action functional $S(U)$, and the normalizing constant $Z$ is unknown. When MCMC sampling from the distribution $p(U)$ is efficient, precise physical predictions can be made from the theory. However, as the model parameters are tuned towards criticality, e.g., to describe universal properties of condensed matter theories or to access the continuum or large-$N$ limits of quantum field theories, critical slowing-down (CSD) can cause the computational cost of sampling to diverge [9,10].
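As a concrete illustration of such sampling (not code from this work), a Metropolis chain targeting $p(\theta) \propto e^{-S(\theta)}$ can be sketched for a single U(1) phase with a toy one-plaquette action; the function and parameter names are illustrative:

```python
import numpy as np

def metropolis_u1(action, n_steps, step_size=0.5, seed=0):
    """Toy Metropolis sampler for one U(1) phase theta in [-pi, pi),
    targeting p(theta) proportional to exp(-action(theta))."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        # symmetric proposal, wrapped back into the principal interval
        prop = (theta + rng.uniform(-step_size, step_size) + np.pi) % (2 * np.pi) - np.pi
        # accept with probability min(1, exp(S(theta) - S(prop)))
        if rng.random() < np.exp(action(theta) - action(prop)):
            theta = prop
        samples.append(theta)
    return np.array(samples)

# Toy action: S(theta) = -beta cos(theta), a single-plaquette Wilson action
beta = 2.0
chain = metropolis_u1(lambda t: -beta * np.cos(t), n_steps=20000)
```

For multi-dimensional lattice distributions the same accept/reject logic applies per local update, and it is the growth of correlations between successive samples near criticality that drives the cost divergence described above.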
In this Letter, we develop a provably correct flow-based sampling algorithm designed for lattice gauge theories with continuous gauge groups. Our approach is immediately applicable to U(1) gauge theory and can be applied to non-Abelian gauge theories, such as lattice QCD, upon development of bijective maps acting on single elements of the gauge group. Here, we demonstrate the application of this approach to U(1) gauge theory in two spacetime dimensions. In this theory, the field strength tensor fluctuates independently on each lattice site, and therefore, there is no critical limit corresponding to infinite correlation length; however, in the limit of small bare coupling, topological freezing results in severe difficulties when general sampling methods are applied. Hybrid Monte Carlo (HMC) [49] and heat bath (HB) [50][51][52] are two widely used MCMC approaches that are applicable to pure gauge theories with continuous gauge group (HMC is also applicable to a much broader class of theories, including gauge theories with fermion content). Both methods explore topological sectors very slowly, as seen in the Markov chain histories of topological charge depicted in Fig. 1. For 2D U(1) gauge theory, in particular, there are also a number of specialized calculation techniques that do not suffer from topological freezing, including a cluster algorithm [53], complex Langevin [54], dual variable approaches [55][56][57], and direct integration of the path integral [57]. We use the latter to check the accuracy of the numerical methods under study. In contrast, our flow-based algorithm is generally applicable to pure-gauge theories with continuous gauge groups; yet it produces independent samples of field configurations with appropriate frequency from each topological sector, enabling far more accurate estimation of topological quantities at a given computational cost when compared with similarly general methods. 
Critical to the success of this approach is enforcing exact gauge symmetry in the flow-based distribution: when the symmetry is enforced, we can successfully train flow-based models at a range of parameters approaching the topologically frozen regime, while without it, models of similar scale fail to learn the distributions under the same training procedure.
Flow-based sampling.-Flow-based generative models allow sampling from an expressive family of distributions by applying an invertible function $f$ to samples $U$ from a fixed, simple prior distribution defined by a density $r(U)$ [58]. The resulting samples $U' = f(U)$ are distributed according to a model density $q(U')$. The invertible function is constructed specifically to allow efficient evaluation of the Jacobian factor for any given sample, so that the associated normalized probability density,
$$q(U') = r(U) \left| \det \frac{\partial f(U)}{\partial U} \right|^{-1}, \qquad (1)$$
is returned with each sample drawn. This feature enables training the flow model, i.e., optimizing the function $f$, by minimizing the distance between the model probability density $q(U')$ and the desired density $p(U')$ using a chosen metric. Any deviation from the true distribution due to an imperfect model can be corrected by a number of techniques; in this Letter, we apply independence Metropolis sampling [26]. (Reweighted observables can also be used [59,60]. This is efficient when measurements of the action are more costly than measurements of observables.) A powerful approach to defining a flexible invertible function $f$ is through composition of several coupling layers, $f \coloneqq g_m \circ \dots \circ g_1$. Coupling layers act on samples $U$ by applying an analytically invertible transformation (such as a scaling) to a subset of the components $U^A \coloneqq \{U^i : i \in A\}$, where the superscript $i$ indexes components of the multidimensional sample $U$ and the set $A$ indicates the components that are transformed. The remaining (unmodified) components $U^B$, defined by $U^B = U \setminus U^A$, are given as input to a feed-forward neural network that parametrizes the transformation. This variable splitting guarantees invertibility despite the use of noninvertible feed-forward networks.
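The variable-splitting mechanism can be sketched for real-valued components; this minimal affine coupling layer (a standard construction, with a fixed stand-in function in place of the feed-forward network) shows why the map stays invertible and why the log-Jacobian of Eq. (1) is cheap to accumulate:

```python
import numpy as np

class AffineCoupling:
    """Minimal affine coupling-layer sketch for real-valued variables.
    Components selected by `mask` (the set A) are transformed; the
    remaining components (the set B) condition the transform through
    `net`, a stand-in for a feed-forward network."""
    def __init__(self, mask, net):
        self.mask = mask  # boolean array: True marks the updated set A
        self.net = net    # maps frozen components -> (log_scale, shift)

    def forward(self, u):
        log_s, t = self.net(u[~self.mask])
        out = u.copy()
        out[self.mask] = u[self.mask] * np.exp(log_s) + t
        # log|det J| is just the sum of log-scales over the updated set,
        # because the frozen components pass through unchanged
        return out, np.sum(log_s)

    def inverse(self, u):
        # the frozen components are unchanged, so the same (log_s, t)
        # can be recomputed, making the layer exactly invertible
        log_s, t = self.net(u[~self.mask])
        out = u.copy()
        out[self.mask] = (u[self.mask] - t) * np.exp(-log_s)
        return out, -np.sum(log_s)
```

Composing several such layers with different masks, each contributing its log-Jacobian additively, gives the full flow $f = g_m \circ \dots \circ g_1$.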
Gauge-invariant flows.-Lattice gauge theories can be defined in terms of one gauge variable $U_\mu(x)$ per nearest-neighbor link $(x, x + \hat\mu)$ of the lattice. Thus, samples live in the compact manifold $G^{N_d \cdot V}$, where $G$ is the manifold of the gauge group, $N_d$ is the spacetime dimension, and $V$ is the lattice volume. The physical distribution $p(U)$ is exactly invariant under a discrete translational symmetry group with $V$ elements and a continuous $V$-dimensional gauge symmetry group, meaning that the density associated with any transformed field configuration $\tilde U$ is identical to that of the untransformed configuration, $p(\tilde U) = p(U)$. Under a gauge transformation, links $U_\mu(x)$ are transformed by a group-valued field $\Omega(x)$ as
$$U_\mu(x) \to \tilde U_\mu(x) = \Omega(x)\, U_\mu(x)\, \Omega^\dagger(x + \hat\mu). \qquad (2)$$
In the flow-based approach, symmetries correspond to flat directions of the probability density that must be reproduced by the model. Exactly encoding symmetries in machine learning models can improve training and model quality compared with learning the symmetries over the course of training [25,28,61-65]. The incorporation of translational symmetries into models is possible using convolutional architectures, as studied, for example, in Ref. [61]. To address gauge symmetry, one could attempt to use a gauge-fixing procedure to select a single configuration from each gauge-equivalent class, leaving only physical degrees of freedom; however, the only known gauge-fixing procedures that preserve translational invariance are based on implicit differential equation constraints [66], which do not have a straightforward implementation in flows. Here, instead, we introduce a method to preserve exact gauge invariance in flow-based models.
When a flow-based model is defined in terms of coupling layers, its output distribution will be invariant under a symmetry group if two conditions are met: (1) The prior distribution is symmetric; (2) Each coupling layer is equivariant under the symmetry, i.e., all transformations commute through application of the coupling layer [61,63,[67][68][69].
Choosing a prior distribution that is symmetric is typically straightforward, for example, a uniform distribution with respect to the Haar measure over gauge links is both translationally and gauge invariant. Using gaugeequivariant coupling layers with such a prior distribution then defines a gauge-invariant flow-based model.
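For U(1), condition (1) is particularly simple: representing links by their phases, the Haar-uniform prior is a constant density on $[-\pi, \pi)$ per link, which translations and gauge transformations (phase shifts mod $2\pi$) leave unchanged. A minimal sketch (function name illustrative):

```python
import numpy as np

def u1_haar_prior(lattice_shape, rng):
    """Uniform (Haar) prior over U(1) link variables, stored as phases
    theta in [-pi, pi). The density is constant, so it is invariant
    under translations and under gauge transformations, which act as
    phase shifts modulo 2*pi."""
    return rng.uniform(-np.pi, np.pi, size=lattice_shape)
```

For a 2D lattice of extent $L$ the shape would be $(2, L, L)$: one phase per link direction per site.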
Gauge-equivariant coupling layers.-We construct an explicitly gauge-equivariant and invertible coupling layer $g: G^{N_d \cdot V} \to G^{N_d \cdot V}$ by splitting the input variables into subsets $U^A$ and $U^B$. In terms of these subsets, we define the action of the coupling layer to be
$$g(U^A, U^B) = (U'^A, U^B), \quad \text{with} \quad U'^i = h(U^i S^i)\, (S^i)^{-1} \ \ \text{for} \ i \in A, \qquad (3)$$
in which $h: G \to G$ is an invertible kernel which is explicitly parametrized by a set of gauge-invariant quantities $I^i$ constructed from the elements of $U^B$. Here, $S^i$ is a product of links such that $U^i S^i$ forms a loop that starts and ends at a common point $x$ and, therefore, transforms under the gauge symmetry to
$$U^i S^i \to \Omega(x)\, U^i S^i\, \Omega^\dagger(x).$$
With this definition, the coupling layer is gauge equivariant if the kernel satisfies
$$h(\Omega(x)\, W\, \Omega^\dagger(x)) = \Omega(x)\, h(W)\, \Omega^\dagger(x), \qquad (4)$$
which implies that $U'^i \to \tilde U'^i$ transforms according to Eq. (2). To ensure invertibility, the product of links $S^i$ must not contain any links in $U^A$. For an Abelian group, the transformation property in Eq. (4) is trivially satisfied by any kernel. In the U(1) gauge theory considered below, therefore, we define the kernel using invertible flows parametrized by neural networks. For non-Abelian theories, it has been shown that it is possible to construct invertible functions on spheres [70] and surjective functions on general Lie groups [71]. If these approaches can be generalized to produce invertible functions with convergent power expansions, they will satisfy the necessary kernel transformation property, since $h(X W X^\dagger) = \sum_n \alpha_n (X W X^\dagger)^n = X\, h(W)\, X^\dagger$. An example of a variable splitting suitable for both Abelian and non-Abelian gauge theories is given by the pattern depicted in Fig. 2. In this example, the set of updated links $U^A$ consists of vertical links spaced by four sites, and the products $U^i S^i$ are $1 \times 1$ loops adjacent to each $U^i$. This is sufficiently sparse such that every $S^i$ is independent of all updated links in $U^A$, and a nontrivial set of invariants $I^i$ (e.g., all traced $1 \times 1$ loops that are not adjacent to updated links) can be constructed to parametrize the transformation.
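The Abelian case of this construction can be sketched directly in phase variables, where Eq. (3) becomes an additive shift $\theta'_i = \theta_i + [h(P) - P]$ for the adjacent plaquette phase $P$. The sketch below (names and the specific kernel are illustrative; a real layer would also condition $h$ on the invariants $I^i$ through a neural network) verifies equivariance numerically: applying a gauge transformation before or after the layer gives the same result.

```python
import numpy as np

def plaq(theta):
    """Plaquette phases P(x) on a 2D U(1) lattice; theta has shape (2, L, L),
    indexed [direction, x, y]."""
    t0, t1 = theta
    return t0 + np.roll(t1, -1, 0) - np.roll(t0, -1, 1) - t1

def equivariant_layer(theta, mask, h):
    """Gauge-equivariant coupling-layer sketch: each updated direction-0
    link, selected by `mask`, is shifted by h(P) - P using its adjacent
    plaquette as the loop U^i S^i of Eq. (3). The shift is gauge
    invariant, so the layer commutes with gauge transformations.
    `mask` must be sparse enough that S^i contains no updated links
    (cf. the Fig. 2 pattern)."""
    p = plaq(theta)
    out = theta.copy()
    out[0] = np.where(mask, theta[0] + (h(p) - p), theta[0])
    return out

def gauge_transform(theta, omega):
    """Eq. (2) in phase variables: theta_mu(x) += omega(x) - omega(x + mu)."""
    out = theta.copy()
    out[0] = theta[0] + omega - np.roll(omega, -1, 0)
    out[1] = theta[1] + omega - np.roll(omega, -1, 1)
    return out
```

Because the plaquette phase is gauge invariant, the kernel here can be any invertible circle map, exactly as stated for Abelian groups above.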
Composing coupling layers using rotations and offsets of the pattern allows all links to be updated. (For example, composing eight such layers is sufficient to update all links in 2D.) Using gauge-equivariant coupling layers constructed in terms of kernels generalizes the "trivializing map" proposed in Ref. [72]. There, repeatedly applying a specific kernel based on gradients of the action theoretically trivializes a gauge theory, i.e., maps the Euclidean-time distribution to a uniform one. The family of gauge-equivariant flows defined here includes the trivializing map in the limit of a large number of coupling layers and an arbitrarily expressive kernel, indicating that, in this limiting case, exact sampling as described in Ref. [72] is possible. However, the approach presented here allows for more general and inexpensive parametrizations of $h$. These can be optimized to produce flows that similarly trivialize the theory and which may have a lower cost of evaluation than implementations of the analytical trivializing map [73].
Application to U(1) gauge theory.-Gauge theory with a U(1) gauge group defined in two spacetime dimensions is the quenched limit of $1+1$D electrodynamics, i.e., the Schwinger model [74]. The full Schwinger model reproduces many features of quantum chromodynamics (confinement, an axial anomaly, topology, and chiral symmetry breaking) while being analytically tractable. Even in the quenched limit, the well-defined gauge field topology results in severe slowing-down of MCMC methods for sampling lattice discretizations of the model as the coupling is taken to criticality. We consider the lattice discretization given by the Wilson gauge action [75],
$$S(U) \coloneqq -\beta \sum_x \mathrm{Re}\, P(x),$$
where $P(x)$ is the plaquette at $x$ defined in terms of link variables $U_\mu(x) \in \mathrm{U}(1)$,
$$P(x) = U_1(x)\, U_2(x + \hat 1)\, U_1^\dagger(x + \hat 2)\, U_2^\dagger(x).$$
Among the observables we measure is the topological susceptibility $\chi_Q = \langle Q^2 / V \rangle$, where the topological charge $Q \coloneqq (1/2\pi) \sum_x \arg P(x)$ is defined in terms of the plaquette phase in the principal interval, $\arg P(x) \in [-\pi, \pi]$.
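In phase variables, these definitions reduce to a few lines of array code; the following sketch (function names illustrative) computes the Wilson action and topological charge for a 2D U(1) configuration stored as phases of shape $(2, L, L)$:

```python
import numpy as np

def wilson_action(theta, beta):
    """S(U) = -beta * sum_x Re P(x); for U(1) phases, Re P(x) = cos(arg P(x))."""
    t0, t1 = theta
    p = t0 + np.roll(t1, -1, 0) - np.roll(t0, -1, 1) - t1
    return -beta * np.sum(np.cos(p))

def topo_charge(theta):
    """Q = (1/2pi) sum_x arg P(x), with arg P(x) wrapped to [-pi, pi).
    On a periodic lattice the plaquette phases telescope to zero, so Q
    is an integer up to floating-point error."""
    t0, t1 = theta
    p = t0 + np.roll(t1, -1, 0) - np.roll(t0, -1, 1) - t1
    arg = (p + np.pi) % (2 * np.pi) - np.pi
    return np.sum(arg) / (2 * np.pi)
```

The integer-valued $Q$ is exactly the quantity whose slow exploration by local update algorithms constitutes topological freezing.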
To investigate slowing-down due to topological freezing, we studied the theory at a fixed lattice size, $L = 16$, using seven choices of the parameter $\beta \in \{1, 2, 3, 4, 5, 6, 7\}$; the topology is increasingly frozen as $\beta \to \infty$. For each parameter choice, we trained gauge-invariant flow-based models using a uniform prior distribution and a composition of 24 gauge-equivariant coupling layers. The kernels $h$ were chosen to be mixtures of noncompact projections [70], which are suitable for U(1) group elements; in particular, we used six components for each mixture and parametrized each transformation with a convolutional neural network. The model architecture was held fixed across all choices of $\beta$, ensuring identical cost to draw samples for each parameter choice. To train the models, we minimized the Kullback-Leibler divergence between the model density $q(U)$ and the target density $e^{-S(U)}/Z$. Training was halted when the loss function reached a plateau. For this proof-of-principle study, we did not perform extensive optimization over the variable splitting pattern, neural network architecture, or training hyperparameters, and it is likely that better models can be trained, for example, using automatic hyperparameter and architecture searches [76].
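The training objective admits a compact stochastic estimate: since $\mathrm{KL}(q\,\|\,p) = \mathrm{E}_{U \sim q}[\log q(U) + S(U)] + \log Z$ and $\log Z$ is constant in the model parameters, only model log-densities and action values on model samples are needed. A minimal sketch (function name illustrative):

```python
import numpy as np

def reverse_kl_loss(log_q, action_vals):
    """Stochastic estimate of KL(q || p) up to the constant log Z:
    E_{U~q}[log q(U) - log p(U)] = E_{U~q}[log q(U) + S(U)] + log Z.
    Both inputs are evaluated on samples drawn from the model q."""
    return np.mean(np.asarray(log_q) + np.asarray(action_vals))
```

Because the expectation is taken over model samples, no configurations from the target distribution are required for training, which is what makes self-training of the flow possible before any correct sampler exists.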
After training, the flow-based models were used to generate proposals for an independence Metropolis Markov chain [26], producing ensembles of 100000 samples each. For comparison, ensembles of identical size were produced using the HMC and heat bath algorithms. For all choices of β, we fixed the HMC trajectory length to achieve > 80% acceptance rate when using a leapfrog integrator with five steps. Each HB step was defined as one sweep, i.e., a single update of every link. To within 10%, the computational cost per HMC trajectory was equal to the cost per proposal from the flow-based model in a single-threaded CPU environment, while the cost per heat bath step was half that of HMC or flow.
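The independence Metropolis step accepts or rejects each model proposal using the ratio of importance weights $p(U')/q(U')$ and $p(U)/q(U)$, in which the unknown normalization of $p$ cancels. A minimal sketch, assuming precomputed log-densities for a batch of proposals (names illustrative):

```python
import numpy as np

def independence_metropolis(proposals, log_q, log_p_unnorm, seed=0):
    """Independence Metropolis chain from flow proposals. Proposal i is
    accepted over the current state j with probability
    min(1, w_i / w_j), where w = p_unnorm / q; rejections repeat the
    previous sample, which is what restores exactness."""
    rng = np.random.default_rng(seed)
    log_w = np.asarray(log_p_unnorm) - np.asarray(log_q)
    idx = [0]  # start the chain at the first proposal
    for i in range(1, len(proposals)):
        if np.log(rng.random()) < log_w[i] - log_w[idx[-1]]:
            idx.append(i)          # accept the new proposal
        else:
            idx.append(idx[-1])    # reject: repeat the previous sample
    return [proposals[i] for i in idx]
```

Since proposals are drawn independently of the chain state, autocorrelations arise only from rejections, so a well-trained model with high acceptance yields nearly independent samples.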
Using samples from a flow-based model as proposals within a Markov chain ensures unbiased estimates after thermalization; at the finite ensemble size used here, all observables were found to agree with analytical results within statistical uncertainties. Of the observables we studied, local quantities like powers of plaquettes and expectation values of small Wilson loops were estimated up to two times as precisely by HMC and HB as by the flow-based algorithm. However, Fig. 3 shows that, for observables with larger extent such as $W_{l \times l}$ with $l \geq 4$, and particularly for $\chi_Q$, large autocorrelations in the HMC and HB samples, in some cases on the order of the Markov chain length or longer, result in estimates that have lower precision than the flow-based estimates and have underestimated uncertainties, despite accounting for measured autocorrelations.
For Markov chain methods, the characteristic length of autocorrelations for an observable $O$ can be defined by the integrated autocorrelation time $\tau^{\mathrm{int}}_O$ [77]. Figure 4 compares $\tau^{\mathrm{int}}_Q$ for HMC and HB to that in the flow-based algorithm as an indicator of how well the three methods explore the distribution of topological charge. For all three methods, $\tau^{\mathrm{int}}_Q$ grows as $\beta$ is increased. However, this problem is far less severe for the flow-based algorithm than for HMC or HB. For example, the autocorrelation time in the flow-based algorithm is approximately ten at the largest value of $\beta$, whereas $\tau^{\mathrm{int}}_Q \approx 4000$ for HB and $\tau^{\mathrm{int}}_Q \approx 15000$ for HMC. Accounting for the relative cost per step of each Markov chain, the flow-based Metropolis sampler is, therefore, roughly 1500 times more efficient than HMC and 200 times more efficient than heat bath in determining topological quantities. A promising possibility for further development is mixing flow-based Markov chain steps with HMC trajectories or heat bath sweeps to gain the benefits of standard Markov chain steps for local observables and of the flow-based algorithm for extended and topological observables.
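A simple fixed-window estimator of $\tau^{\mathrm{int}}_O$ illustrates the definition (production analyses such as Ref. [77] use an automatic windowing procedure rather than the user-chosen cutoff assumed here):

```python
import numpy as np

def tau_int(x, window=None):
    """Fixed-window estimate of the integrated autocorrelation time,
    tau_int = 1/2 + sum_{t=1}^{window} rho(t), where rho(t) is the
    normalized autocorrelation of the Markov chain series x.
    For independent samples tau_int -> 1/2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if window is None:
        window = n // 10
    xc = x - x.mean()
    var = np.mean(xc * xc)
    rho = [np.mean(xc[: n - t] * xc[t:]) / var for t in range(1, window + 1)]
    return 0.5 + float(np.sum(rho))
```

Applied to the topological charge series of each sampler, this is the quantity plotted in Fig. 4; the effective number of independent measurements in a chain of length $N$ scales as $N / (2\tau^{\mathrm{int}}_O)$.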
Summary.-Critical slowing-down of sampling in lattice gauge theories is an obstacle to precisely estimating quantities of physical interest as critical limits of the theories are approached. Topological freezing is a particularly severe slowing-down preventing estimation of quantities coupled to gauge field topology in many lattice gauge theories. Flow-based models enable direct sampling from an approximation to the distribution of interest, from which estimates of physical observables can be derived that are exact in the infinite-statistics limit. Here, we introduce flow-based models constructed to satisfy exact gauge invariance and find that applying this approach to a two-dimensional Abelian gauge theory enables more efficient estimation of topological quantities than the two other algorithms that are applicable to general lattice gauge theories, HMC and heat bath.
While the numerical investigations in this Letter demonstrate the efficiency of the proposed sampling approach in a region of parameter space where HMC and heat bath suffer significantly from topological freezing, it remains to be investigated how the algorithm scales in the thermodynamic and continuum limits. In the thermodynamic limit, one naively expects an exponential degradation of the independence Metropolis acceptance rate with volume (given a fixed model); the experimental question will be how the model complexity must be scaled to maintain the acceptance rate. In the continuum limit, it will be important to efficiently construct correlations over large length scales, which may be enabled by architectures such as hierarchical models [78,79] and dilated convolutions [80].
The approach presented here is generally applicable to gauge theories defined by Lie groups, including non-Abelian theories such as QCD. To extend this method to such theories, expressive invertible functions must be defined as the kernels of gauge-equivariant coupling layers. There are several possible avenues forward; for example, Ref. [70] defines flows on spherical manifolds and Ref. [71] defines surjective (though not bijective) maps on Lie groups, both using neural network parametrizations. Future work will explore constructing kernels based on generalizations of these methods and thus producing gauge-invariant flows for non-Abelian theories like QCD.
FIG. 4. Integrated autocorrelation time for the topological charge, $\tau^{\mathrm{int}}_Q$, measured on ensembles of $16 \times 16$ lattices generated using HMC, heat bath, and the flow-based algorithm. Ten replicas of each ensemble were used to estimate errors, which are smaller than the plot markers for most points.