Flow-based sampling in the lattice Schwinger model at criticality

Recent results suggest that flow-based algorithms may provide efficient sampling of field distributions for lattice field theory applications, such as studies of quantum chromodynamics and the Schwinger model. In this work, we provide a numerical demonstration of robust flow-based sampling in the Schwinger model at the critical value of the fermion mass. In contrast, at the same parameters, conventional methods fail to sample all parts of configuration space, leading to severely underestimated uncertainties.

Many important physical systems across particle and condensed matter physics can be described in the language of quantum field theory (QFT). Lattice field theory (LFT) is the only known systematically improvable approach to ab-initio calculations of QFTs in nonperturbative regimes, such as quantum chromodynamics (QCD) at low energies. LFT is based on the path-integral formulation of QFT, discretized on a Euclidean spacetime lattice. Monte Carlo techniques render the high-dimensional discretized path integral tractable by recasting the problem as statistical sampling: the expectation value of an observable O can be computed as
⟨O⟩ = (1/Z) ∫ DU O(U) e^{−S_E(U)} ≈ (1/N) Σ_i O(U_i),
where Z is the partition function, S_E is the Euclidean action, and {U_i} is a set of N samples of the lattice field degrees of freedom distributed as p(U) = exp[−S_E(U)]/Z. Statistical uncertainties decrease as 1/√N as the estimate converges to the true value.
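As a toy illustration of this 1/√N scaling, the following standalone sketch estimates an observable under a Gaussian stand-in for p(U) (not the lattice theory itself) and checks how the statistical error shrinks with sample size:

```python
import numpy as np

# Toy illustration: estimate <x^2> under a Gaussian "distribution"
# p(x) ~ exp(-x^2/2) by direct sampling, and check that the statistical
# error of the sample mean shrinks as 1/sqrt(N). (True value is 1.)
rng = np.random.default_rng(0)

def estimate(n_samples):
    x = rng.standard_normal(n_samples)          # exact samples from p
    obs = x**2                                  # observable O(x) = x^2
    mean = obs.mean()
    err = obs.std(ddof=1) / np.sqrt(n_samples)  # 1/sqrt(N) scaling
    return mean, err

for n in (100, 10_000, 1_000_000):
    mean, err = estimate(n)
    print(f"N={n:>9d}  <x^2> = {mean:.4f} +/- {err:.4f}")
```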
In theories such as QCD, for which exact sampling algorithms are not known, Markov Chain Monte Carlo (MCMC) techniques are typically used. However, field samples or "configurations" from MCMC are correlated, i.e., subsequently generated configurations are not statistically independent. Depending on the MCMC approach, these "autocorrelations" may grow as the system is tuned towards criticality [1], e.g., to describe universal properties of condensed matter theories or to access the continuum or large-N_c limits of gauge theories [2,3]. Autocorrelations may become especially severe if MCMC updates are unlikely to generate transitions between modes that are separated in configuration space. This effect, known as "freezing," can prevent effective exploration of the distribution for any practical sample size, amounting to an in-practice violation of ergodicity, a necessary condition for the validity of MCMC.
Importantly, this affects Hybrid Monte Carlo (HMC), the state-of-the-art algorithm for sampling QCD field configurations, which generates samples by continuously evolving the fields through configuration space via Hamiltonian dynamics [4]. These dynamics make the algorithm susceptible to freezing due to the topological properties of gauge fields, which divide the distribution into different modes or "topological sectors". As the system is tuned towards criticality, increasingly large potential barriers suppress tunneling between sectors. In contrast, emerging flow-based sampling algorithms [5][6][7][8][9][10][11] propose random hops throughout configuration space, unaffected by density barriers. While topological freezing presents a well-known obstacle to extending the reach of state-of-the-art lattice QCD calculations [1,12,13], promising results of flow-based samplers in theories without fermions [6,7,14,15], without gauge fields [9], or away from criticality [16] suggest that these methods may provide a path towards mitigating freezing in this context.

FIG. 1. Demonstration of underestimated uncertainties when using HMC, in contrast to flow-based sampling, which provides consistent results converging to a baseline value (black). The top panel shows estimates of the chiral condensate ψ̄ψ in the Schwinger model at critical parameters β = 2.0, κ = 0.276, and L = 16, as a function of the sample size N. The bottom panel shows the scaling of the statistical uncertainty δψ̄ψ. Flow-based sampling (blue) converges to the baseline value with uncertainties scaling as 1/√N. Meanwhile, HMC (gray, red) exhibits seemingly convergent uncertainties that are in fact severely underestimated, as indicated by the discrepancy with the baseline and sudden jumps when tunneling events occur (red).
In this Letter, we show that flow-based sampling can circumvent topological freezing in a fermionic gauge theory at criticality. Specifically, we provide a numerical demonstration in the Schwinger model (two-dimensional quantum electrodynamics) at the critical value of the fermion mass, illustrating that the flow-based approach is robust at sample sizes where HMC fails. A stark example is shown in Fig. 1, where HMC estimates of a key observable in the theory appear to be converging to a biased value, while in fact the uncertainty is underestimated due to insufficient sampling of the different topological sectors. In contrast, the flow-based sampling estimate is accurate with reliable uncertainties.
Flow-based sampling for the Schwinger model.-Normalizing flow models [17][18][19] are based on applying a diffeomorphic "flow" transformation f to (possibly high-dimensional) samples z drawn from a base distribution, r(z).
This procedure yields samples U = f(z) distributed according to the model density q(U) = r(z) |det ∂f/∂z|^{−1}. Flow-based sampling uses the model q to approximate a target distribution p. Neural networks can be used to construct an expressive and trainable flow, which can be optimized by minimizing the distance between p and q. Provably exact sampling that corrects for deviations between p and q can be obtained with independence Metropolis [20,21] or reweighting; we use the former in this work. Both corrections may be applied a posteriori, enabling embarrassingly parallel sampling that can provide practical advantages over HMC and over sequential algorithms incorporating flows [14][15][16].
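The independence Metropolis correction can be sketched as follows; the Gaussian log-densities and proposal here are hypothetical stand-ins for the flow model q and the target exp[−S_E]/Z:

```python
import math, random

# Minimal sketch of independence Metropolis with i.i.d. model proposals.
# Stand-ins: target p = N(1, 1) (unnormalized), model q = N(0, 1).
random.seed(1)

def log_p(x):            # unnormalized log-density of the target
    return -0.5 * (x - 1.0) ** 2

def log_q(x):            # log-density of the proposal/model
    return -0.5 * x ** 2 - 0.5 * math.log(2 * math.pi)

def draw_proposal():     # independent proposals drawn from the model
    return random.gauss(0.0, 1.0)

def independence_metropolis(n_steps):
    chain = [draw_proposal()]
    for _ in range(n_steps):
        cur, prop = chain[-1], draw_proposal()
        # Acceptance probability min(1, p(U')q(U) / (p(U)q(U')))
        log_a = (log_p(prop) - log_q(prop)) - (log_p(cur) - log_q(cur))
        chain.append(prop if math.log(random.random()) < log_a else cur)
    return chain

chain = independence_metropolis(20_000)
print(sum(chain) / len(chain))  # approaches the target mean of 1
```

Because each proposal is independent of the current state, deviations of q from p only cause occasional rejections; they cannot bias the stationary distribution.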
Wick rotating, discretizing, and integrating out the fermionic degrees of freedom yields a Euclidean lattice Schwinger model action amenable to Monte Carlo sampling [41][42][43], given in terms of gauge links U_µ(x) at position x in direction µ:
S_E(U) = −β Σ_x Re P(x) − log det D†[U]D[U].   (2)
The first term is the gauge action, where β is the inverse of the squared gauge coupling, and the plaquette P(x) is the smallest possible Wilson loop, a gauge-invariant product of links around a 1 × 1 square. It is defined as
P(x) = U_0(x) U_1(x + 0̂) U†_0(x + 1̂) U†_1(x),
where µ̂ is the unit vector in the µ direction. The second term, given in terms of the Wilson-Dirac operator D [44,45], encodes the effect of fermions and gauge-fermion interactions. The bare fermion mass m_0 is controlled by the hopping parameter κ = 1/(4 + 2m_0) that parametrizes D.
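A minimal sketch of the gauge part of this action for compact U(1) in two dimensions, with links represented by their phases θ_µ(x) (the fermion determinant is omitted):

```python
import numpy as np

# Gauge part of the action for compact U(1) in 2D: links are phases
# theta[mu, x0, x1], the plaquette is the oriented product of links around
# a 1x1 square, and S_g = -beta * sum_x Re P(x). Periodic boundaries are
# implemented with np.roll.
rng = np.random.default_rng(0)
L, beta = 16, 2.0
theta = rng.uniform(-np.pi, np.pi, size=(2, L, L))

def plaquette_args(theta):
    # arg P(x) = theta_0(x) + theta_1(x+0hat) - theta_0(x+1hat) - theta_1(x)
    return (theta[0]
            + np.roll(theta[1], -1, axis=0)
            - np.roll(theta[0], -1, axis=1)
            - theta[1])

def gauge_action(theta, beta):
    return -beta * np.cos(plaquette_args(theta)).sum()

print(gauge_action(theta, beta))
```

For a "cold" configuration with all links equal to the identity (all phases zero), every plaquette is 1 and the action takes its minimum value −β L².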
To achieve efficient sampling via a flow-based approach, it is critical to incorporate the physical properties of the target distribution. For the Schwinger model specifically, gauge invariance imposes strong constraints on the target distribution, which we build into our models using the framework of gauge-equivariant flows on compact manifolds developed in Refs. [6,7,46]. Another challenge is sampling of theories with fermionic degrees of freedom. Out of the four treatments in Ref. [9], here we consider a "marginal sampler" using exact evaluation of the fermion determinant. This means that our model describes only gauge degrees of freedom, and we compute Eq. (2) exactly during training and for MCMC sampling.
Following Ref. [6], gauge-equivariant flows are constructed by composing a sequence of equivariant coupling layers. Each coupling layer updates an "active" subset of the links conditioned on a disjoint "frozen" subset. Different partitionings are used in each layer so that all variables are updated. In each layer, gauge-invariant closed Wilson loops are computed from the frozen links and fed into a "context function" constructed from neural networks. The outputs are used to parametrize the transformation of the active links, which is constrained to commute with gauge transformations. Combined with a gauge-invariant base distribution, this yields a gauge-invariant model.
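The active/frozen coupling pattern can be sketched schematically as below; this toy layer acts on scalar site variables and omits the gauge-equivariance machinery (Wilson-loop inputs, induced link transformations) described in the text:

```python
import torch

# Schematic (non-gauge-equivariant) coupling layer illustrating the
# active/frozen pattern: a binary mask selects active sites, a context
# network reads only the frozen sites, and its output shifts the active
# ones. A pure shift has zero log-Jacobian-determinant.
torch.manual_seed(0)

class CouplingLayer(torch.nn.Module):
    def __init__(self, mask):
        super().__init__()
        self.register_buffer("mask", mask)           # 1 = active, 0 = frozen
        self.context = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1)

    def forward(self, x):
        frozen = x * (1 - self.mask)                 # context sees frozen sites only
        shift = self.context(frozen)
        y = x + self.mask * shift                    # update active sites only
        logdet = torch.zeros(x.shape[0])             # volume-preserving shift
        return y, logdet

mask = torch.zeros(1, 1, 8, 8)
mask[..., ::2, ::2] = 1.0                            # sparse active subset
layer = CouplingLayer(mask)
x = torch.randn(4, 1, 8, 8)
y, logdet = layer(x)
print(torch.equal(y * (1 - mask), x * (1 - mask)))   # prints True
```

Because the transformation of each active site depends only on frozen sites, the Jacobian is triangular and the density of the flow remains tractable, which is what makes the composed model trainable by maximum likelihood or KL minimization.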
Unlike in the κ = 0 limit of pure-gauge theory, the Schwinger model exhibits long-range correlations, with the correlation length defined by the inverse of the mass of the lightest particle. At criticality, the correlation length diverges, which demands new architectural features over those employed in pure-gauge models [6]. First, we use a subset of active links that is locally more sparse, with each active link completely surrounded by frozen ones, to allow for better propagation of information. Second, we provide larger 2 × 1 Wilson loops along with 1 × 1 plaquettes as inputs for context functions. Third, our architecture includes dilated convolutions, which have translational equivariance and better context aggregation, i.e., an exponential expansion of the receptive field without loss of resolution or coverage [47]. Fourth, we parametrize our transformations using highly expressive neural splines [48]. Finally, we decay the learning rate over the course of training.
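The context-aggregation benefit of dilated convolutions can be checked numerically. This sketch (using all-ones kernels so a point perturbation spreads maximally) compares the receptive field of three undilated convolutions against dilations 1, 2, 3:

```python
import torch

# Receptive-field illustration for a stack of kernel-3 convolutions:
# perturb a single input site and measure how far the response spreads.
# Each convolution with dilation d extends the field by d sites per side.
torch.manual_seed(0)

def receptive_halfwidth(dilations):
    convs = [torch.nn.Conv2d(1, 1, 3, padding=d, dilation=d, bias=False)
             for d in dilations]
    x = torch.zeros(1, 1, 33, 33)
    x[..., 16, 16] = 1.0                 # point source at the center
    with torch.no_grad():
        for conv in convs:
            conv.weight.fill_(1.0)       # all-ones kernel: maximal spread
            x = conv(x)
    nonzero_cols = (x.abs() > 0).nonzero()[:, -1]
    return int(nonzero_cols.max() - 16)

print(receptive_halfwidth([1, 1, 1]))    # 3: three undilated convolutions
print(receptive_halfwidth([1, 2, 3]))    # 6: same depth, wider field
```

At the same depth and parameter count, the dilated stack covers twice the distance, which is the "exponential expansion of the receptive field" exploited by the architecture.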
We train this flow model for the Schwinger model at criticality and compare the performance of flow-based MCMC using this model against that of HMC. At finite lattice spacing, a diverging correlation length is realized by tuning κ to its critical value, resulting in a vanishing renormalized fermion mass. To achieve this, we take β = 2.0 and κ = 0.276 [41] with a square lattice of extent L = 16. Details of the architecture and training scheme are in the Supplementary Material.
Advantages of flow-based sampling.-A clear illustration of the advantages of flow-based sampling for the Schwinger model at criticality is given in Fig. 1, which compares estimates of the chiral condensate from HMC with those from flow-based MCMC. This quantity is a simple fermionic observable whose value is correlated with the topological sectors and is therefore sensitive to freezing. We quantify uncertainties using the integrated autocorrelation time computed with the "gamma method" [49] and obtain the baseline result using an augmented version of HMC that efficiently samples topological sectors. Clearly, the single frozen HMC stream yields estimates that are manifestly inconsistent with the baseline result, indicating severely underestimated uncertainties even at very large sample sizes, N ≈ 10^5. The dataset of samples from six independent HMC streams can incorporate information from multiple topological sectors even in the presence of freezing. However, as the figure shows, this estimate is still biased for N ≈ 10^5 samples, with incorrect uncertainties deceptively scaling as 1/√N. The estimate becomes consistent with the ground truth only when N ≳ 10^6; the uncertainty, however, catastrophically increases, a clear indication of an ergodicity problem. This analysis suggests that affordable HMC stream lengths may not be sufficient to diagnose bias. By contrast, flow-based results converge smoothly to the baseline value, with errors scaling as 1/√N.

Figure 2 provides a more direct illustration of freezing in the Monte Carlo histories of topological quantities. The topological sectors of the Schwinger model are distinguished by the integer-valued topological charge. A common discretization is [36]
Q = (1/2π) Σ_x Im log P(x),
where the imaginary part of the complex logarithm is restricted to (−π, π]. Due to lattice artifacts, this observable fluctuates even when the topological sector is fixed. A better-suited observable σ to identify true tunneling events was proposed in Ref. [41]; the value of σ is positive (negative) for even (odd) topological sectors, and changes in its value are correlated with tunneling events.
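The topological charge discretization above can be sketched for compact U(1) links stored as phases; the wrapping convention follows the (−π, π] restriction of the complex logarithm:

```python
import numpy as np

# Geometric topological charge for compact U(1) in 2D:
# Q = (1/2pi) sum_x Im log P(x), with Im log restricted to (-pi, pi].
# Links are phases theta[mu, x0, x1] on a periodic L x L lattice.
def topological_charge(theta):
    arg = (theta[0]
           + np.roll(theta[1], -1, axis=0)
           - np.roll(theta[0], -1, axis=1)
           - theta[1])
    # Map each plaquette angle into (-pi, pi] before summing.
    wrapped = np.pi - np.mod(np.pi - arg, 2 * np.pi)
    return wrapped.sum() / (2 * np.pi)

rng = np.random.default_rng(0)
theta = rng.uniform(-np.pi, np.pi, size=(2, 8, 8))
q = topological_charge(theta)
print(q)  # integer-valued up to floating-point rounding
```

On a periodic lattice the unwrapped plaquette angles sum to zero, since every link enters once with each sign; the wrapping contributes multiples of 2π, which is why Q is an integer.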
Results for these observables based on two Markov chains generated with HMC (with different random number seeds) and one with flow-based sampling are illustrated in Fig. 2. In the first HMC stream, Q appears to fluctuate without any evidence of freezing. However, σ is completely frozen for all samples shown, implying that these fluctuations arise from discretization effects and do not correspond to tunneling events between topological sectors. In the second HMC stream, we see an abrupt change in the behavior of Q. This coincides with a change in σ, confirming that a true tunneling event has occurred. By contrast, flow-based sampling exhibits rapid fluctuations in both Q and σ, demonstrating sampling that rapidly mixes topological sectors.

A fair and comprehensive comparison of the costs of HMC and flow-based MCMC requires quantifying three factors for each: setup costs, the raw computational cost of a sampling step, and the sampling efficiency (i.e., the degree of autocorrelation). Setup costs, predominantly equilibration for HMC and training for flows, are particularly difficult to compare in this case. Full equilibration of HMC requires observing and discarding many tunneling events, which occur stochastically, while training costs for the flow-based approach may vary over orders of magnitude depending on the training scheme. Raw computational costs may be measured directly, but depend strongly on implementation details. On the same GPU hardware, we find that flow-based MCMC steps are ∼10 times less expensive than HMC trajectories, due to the frequent inversions of the Dirac operator in HMC. However, there is room for optimization in both cases.
Nevertheless, disregarding setup and raw computational costs, an approximate comparison of sampling efficiency is sufficient to show the advantage of flow-based sampling over HMC. Each algorithm exhibits some characteristic time between tunneling events; a chain with many times that number of steps will be required to incorporate information from all topological sectors with appropriate weights. For these HMC parameters, we find tunneling events are separated by ∼ 20k trajectories on average. Meanwhile, sampling with our flow model, the topological sector changes every ∼ 6 steps on average. Thus, for this model at the parameters investigated, we estimate that the advantage in sampling efficiency of flow-based MCMC over HMC is more than three orders of magnitude.
Conclusion and outlook.-In this Letter, we demonstrate that flow-based sampling can be applied to lattice gauge theories with fermion content at criticality. Specifically, we have developed an architecture that can successfully model long-range correlations in the Schwinger model at vanishing renormalized fermion mass. The resulting flow-based sampler does not suffer from topological freezing at these parameters and thus outperforms HMC by orders of magnitude. These results represent an important milestone in first-principles calculations in gauge field theories with fermions, such as QCD, using provably exact machine learning.
Importantly, the flow models described here may serve as an "engine" for a much broader class of sampling algorithms. For example, flow-based MCMC updates may be interleaved with steps of HMC [11] or other MCMC algorithms [50]. Such composite algorithms may provide improved sampling over either method alone. Another possible improvement is using the flow models developed here inside a hierarchical multilevel MCMC scheme, as proposed by Ref. [16]. Finally, our technical advances can be adapted for more general MCMC schemes, e.g., generalizations of the HMC algorithm [14,15] and stochastic normalizing flows [51][52][53].
Challenges remain on the road to large-scale applications, such as state-of-the-art QCD calculations. The sampling approach here employs exact evaluation of the fermion determinant, but more scalable approaches will be needed for larger volumes and theories in higher dimensions; extending the machine-learned stochastic determinant estimators of Ref. [9] to lattice gauge theories presents a promising opportunity. If the success demonstrated here for the Schwinger model can be extended to other theories, and in particular at scale, it will have widespread impact across nuclear and particle physics, as well as in condensed matter applications.

This work is associated with an ALCF Aurora Early Science Program project, and used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. The authors acknowledge the MIT SuperCloud and Lincoln Laboratory Supercomputing Center [54] for providing HPC resources that have contributed to the research results reported within this paper. Numerical experiments and data analysis used PyTorch [55], Horovod [56], NumPy [57], and SciPy [58]. Figures were produced using matplotlib [59].
where k, µ, ν change for each layer. The flow is constructed by iterating through k ∈ {0, 1, 2, 3} in order, first with µ = 0, ν = 1, then with µ = 1, ν = 0, such that all links are updated in 8 layers. We use a model with 48 layers, so each link is updated 6 times. For active loops, we use the plaquettes that project forwards from their corresponding active links ("active plaquettes").

Each layer has its own convolutional neural network that outputs the parameters defining the transformation of active plaquettes. Each neural network takes six input channels,
(sin θ_P, cos θ_P, sin θ_{2×1}, cos θ_{2×1}, sin θ_{1×2}, cos θ_{1×2}),
where θ_P is the argument of the plaquette, and θ_{2×1} (θ_{1×2}) is the argument of the 2×1 (1×2) Wilson loop. The sin and cos are applied to ensure that the input is a continuous function of the gauge fields. Each neural network is built from three convolutions with kernel size 3 and dilation factors 1, 2, 3 in order (where 1 is a standard undilated convolution). Between the convolutions there are 64 hidden channels, and the final output has 10 channels. After each intermediate convolution we use LeakyReLU activations, but no activation is applied to the final output. The 10 output channels are used to parametrize the positions and slopes of the 3 knots of a circular rational quadratic spline s(θ_P), as well as an overall offset t. These are used to transform the active plaquettes θ_{P_A} as θ'_{P_A} = s(θ_{P_A}) + t. The active links are updated by inferring a link transformation that will induce the transformation of the active loops. The total number of parameters of the model is ∼2.2 × 10^6. We initialize the model weights using the PyTorch 1.9 defaults.

We use a self-training scheme where the loss function is a stochastic estimate of the Kullback-Leibler divergence [60] made with q-distributed samples generated by the model,
L = (1/B) Σ_{i=1}^{B} [log q(U_i) + S_E(U_i)] + const.,
where B = 3072 is the batch size (i.e., the number of field samples generated for each gradient step), and the constant does not affect optimization and so is not computed.
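The context function described above can be sketched as follows; the channel counts, kernel size, dilations, and activations follow the text, while the circular padding (natural for a periodic lattice) and other details are assumptions:

```python
import torch

# Sketch of one layer's context function: three convolutions with kernel
# size 3 and dilations 1, 2, 3, LeakyReLU between convolutions, 6 input
# channels (sin/cos of the three loop angles) and 10 output channels
# (spline knot positions/slopes plus an offset). Circular padding is an
# assumption for the periodic lattice.
torch.manual_seed(0)

def make_context_net():
    return torch.nn.Sequential(
        torch.nn.Conv2d(6, 64, 3, padding=1, dilation=1, padding_mode="circular"),
        torch.nn.LeakyReLU(),
        torch.nn.Conv2d(64, 64, 3, padding=2, dilation=2, padding_mode="circular"),
        torch.nn.LeakyReLU(),
        torch.nn.Conv2d(64, 10, 3, padding=3, dilation=3, padding_mode="circular"),
    )

net = make_context_net()
loops = torch.randn(1, 6, 16, 16)   # sin/cos of theta_P, theta_2x1, theta_1x2
params = net(loops)
print(params.shape)                 # torch.Size([1, 10, 16, 16])
```

Padding equal to the dilation keeps the output the same size as the lattice, so each site receives its own set of spline parameters.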
We train using the Adam optimizer with standard PyTorch 1.9 parameters. We decay the learning rate every 30k gradient updates following the schedule [3, 1.5, 0.75, 0.4, 0.2, 0.1, 0.05] × 10^{−4}. In total, we train the model for 210k gradient steps. For each step, we clip the absolute value of each parameter gradient to be less than or equal to 0.1, and the norm over all gradients to be less than or equal to 10. We use a mixed-precision approach for both training and sampling, using single precision for the neural network evaluations but double precision for all other computations.
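A sketch of this optimization scheme with a placeholder model and loss; the schedule values, clipping thresholds, and step structure follow the text (with step counts scaled down here), and everything else is an assumption:

```python
import torch

# Adam with a stepped learning-rate schedule, per-parameter gradient-value
# clipping at 0.1, and global gradient-norm clipping at 10. The linear
# model and squared-output loss are placeholders for the flow and its
# KL loss; 210 steps with a decay every 30 stand in for 210k/30k.
torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
schedule = [3e-4, 1.5e-4, 0.75e-4, 0.4e-4, 0.2e-4, 0.1e-4, 0.05e-4]

for step in range(210):
    lr = schedule[min(step // 30, len(schedule) - 1)]
    for group in opt.param_groups:
        group["lr"] = lr
    opt.zero_grad()
    loss = model(torch.randn(32, 4)).pow(2).mean()   # placeholder loss
    loss.backward()
    torch.nn.utils.clip_grad_value_(model.parameters(), 0.1)
    torch.nn.utils.clip_grad_norm_(model.parameters(), 10.0)
    opt.step()

print(opt.param_groups[0]["lr"])   # final learning rate from the schedule
```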
We construct an ensemble from the flow model by drawing independent samples as proposals for independence Metropolis to ensure exactness. The Metropolis acceptance rate is ∼ 17%. To obtain a sample of N configurations, we take the initial 1.25N configurations in the chain and discard the first 0.25N for equilibration.

III. HMC DETAILS
The HMC results presented in this work are for HMC applied to the exact determinant action Eq. (2), without using any additional pseudofermionic degrees of freedom. That is, the action for accept/reject tests is computed directly using Eq. (2), and molecular dynamics forces are computed using exact derivatives of S_f[U] with respect to the gauge links. We use trajectories of length τ_HMC = 1 divided into 10 steps, yielding an acceptance rate of 94%. We use double precision.
The HMC data used in Fig. 1 (and Fig. S2 below) are taken from streams 2 × 10^5 trajectories long, with no trajectories discarded between measurements. Each stream is initialized from a hot start, i.e., all links drawn from independent uniform distributions. We use the same scheme for equilibration cuts as described above for flows. For the six-stream example, an equal number of configurations is taken from each stream.

For the Schwinger model, it is possible to implement an augmentation step for HMC that proposes hops to other topological sectors [25], i.e., configurations U′ such that ∆Q ≡ Q(U′) − Q(U) ∈ Z with ∆Q ≠ 0. This is achieved by distributing the proposed change across the links, with coordinates in lattice units, i.e., x_i ∈ {0, . . . , L − 1}. For simplicity, we restrict ∆Q ∈ {−2, −1, 0, 1, 2} and propose each ∆Q with equal probability. The proposal is accepted or rejected with a standard Metropolis step. The acceptance rate of these proposals is 34% for these parameters.
Interleaving this augmentation with HMC steps produces an algorithm with different properties than HMC alone (augmented HMC), and no equivalent construction is known for many theories, including QCD. Thus we use this method solely to obtain baseline results for the chiral condensate in Fig. 1 and the topological susceptibility in Fig. S2. To do so, an ensemble of 1.2 × 10^7 configurations is produced using these augmentation hits alternated with HMC steps. The baseline results are estimated to be ⟨ψ̄ψ⟩ = 1.50918(9) and χ_Q = 0.003875(4). In Fig. S1 we show part of the Monte Carlo history of Q and σ for the baseline run with augmented HMC. As can be seen, rapid fluctuations occur at a rate comparable to the flow model. In particular, σ exhibits an asymmetric distribution along positive and negative values, as expected from the higher total weight of even topological sectors.

IV. TOPOLOGICAL SUSCEPTIBILITY
It is also interesting to look at the bias of an observable other than the chiral condensate illustrated in Fig. 1 of the main text. The topological susceptibility, defined as
χ_Q = ⟨Q²⟩/V,
where V = L² is the lattice volume, is shown computed from both HMC and the flow-based approach in Fig. S2. We observe similar behavior in this observable as for the chiral condensate of Fig. 1: an underestimation of uncertainties in the HMC results that is difficult to discern without very high statistics.
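A minimal estimator sketch, assuming the standard definition χ_Q = ⟨Q²⟩/V applied to a Monte Carlo series of topological charges:

```python
import numpy as np

# Topological susceptibility estimator from a series of charges Q_i
# measured on configurations of volume V = L*L: chi_Q = <Q^2> / V.
def topological_susceptibility(charges, volume):
    charges = np.asarray(charges, dtype=float)
    return np.mean(charges**2) / volume

rng = np.random.default_rng(0)
charges = rng.integers(-2, 3, size=10_000)   # stand-in integer charge series
print(topological_susceptibility(charges, 16 * 16))
```

In practice the statistical error of this estimator must account for autocorrelations in the charge series, which is exactly where frozen HMC streams yield deceptively small uncertainties.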