Parallel Tempered Metadynamics: Overcoming potential barriers without surfing or tunneling

At fine lattice spacings, Markov chain Monte Carlo simulations of QCD and other gauge theories with or without fermions are plagued by slow modes that give rise to large autocorrelation times. This can lead to simulation runs that are effectively stuck in one topological sector, a problem known as topological freezing. Here, we demonstrate that for a relevant set of parameters, Metadynamics can be used to unfreeze 4-dimensional SU(3) gauge theory. However, compared to local update algorithms and the Hybrid Monte Carlo algorithm, the computational overhead is significant in pure gauge theory, and the required reweighting procedure may considerably reduce the effective sample size. To deal with the latter problem, we propose modifications to the Metadynamics bias potential and the combination of Metadynamics with parallel tempering. We test the new algorithm in 4-dimensional SU(3) gauge theory and find that it can achieve topological unfreezing without compromising the effective sample size, thereby reducing the autocorrelation times of topological observables by at least two orders of magnitude compared to conventional update algorithms. Additionally, we observe significantly improved scaling of autocorrelation times with the lattice spacing in 2-dimensional U(1) gauge theory.


I. INTRODUCTION
In recent years, physical predictions based on lattice simulations have reached sub-percent accuracies [1]. With ever-shrinking uncertainties, the demand for precise extrapolations to the continuum grows, which in turn necessitates finer lattice spacings. Current state-of-the-art methods for simulations of lattice gauge theories either rely on a mixture of heat bath [2][3][4][5][6] and overrelaxation [7][8][9] algorithms for pure gauge theories or molecular-dynamics-based algorithms like the Hybrid Monte Carlo algorithm (HMC) [10] for simulations including dynamical fermions. For all of these algorithms, the computational effort to carry out simulations dramatically increases at fine lattice spacings due to critical slowing down. While the exact behavior depends on a number of factors, such as the update algorithms, the exact discretization of the action, and the choice of boundary conditions, the scaling of the integrated autocorrelation times with the inverse lattice spacing can usually be described by a power law.
In addition to the general diffusive slowing down, topologically non-trivial gauge theories may exhibit topological freezing [11][12][13][14][15][16][17][18]. This effect appears due to the inability of an algorithm to overcome the action barriers between topological sectors, which can lead to extremely long autocorrelation times of topological observables and thus an effective breakdown of ergodicity.
Over the years, several strategies have been developed to deal with this situation. On the most basic level, it has become customary in large-scale simulations to monitor the topological charge of the configurations in each ensemble, thus avoiding regions of parameter space which are affected by topological freezing [19][20][21]. Another possibility to circumvent the problem consists in treating fixed topology as a finite volume effect and either correcting observables for it [22,23] or increasing the physical volume sufficiently to derive the relevant observables from local fluctuations [24]. It is also possible to use open boundary conditions in one lattice direction [25], which invalidates the concept of an integer topological charge at the price of introducing additional boundary artifacts and a loss of translational symmetry.
In this work we propose a new update algorithm, parallel tempered Metadynamics (PT-MetaD), and demonstrate its efficiency in 4-dimensional SU(3) gauge theory at parameter values where conventional update algorithms suffer from topological freezing. In its basic variant, which we present here, PT-MetaD consists of two update streams simulating the same physical system. One of the streams uses any suitable, conventional algorithm, while the other one includes a bias potential that facilitates tunneling between topological sectors. At regular intervals, swaps between the two streams are suggested, so that the good topological sampling of the bias potential stream carries over to the other stream. The algorithm thus combines ideas from parallel tempering [77], Metadynamics [78], and multicanonical simulations [43], leading to an efficient sampling of topological sectors while avoiding the problem of small effective sample sizes that is usually associated with techniques involving reweighting, such as Metadynamics or multicanonical simulations. Additionally, applying PT-MetaD to theories including fermions is conceptually straightforward.
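Since both streams share the same action S(U), the action-dependent factors cancel in the Metropolis ratio for a proposed swap, and the acceptance probability depends only on the bias potential evaluated at the CV values of the two candidate configurations. A minimal sketch of this accept/reject step follows; the function names and the use of a scalar CV value in place of a full gauge configuration are our own illustration, not the authors' implementation:

```python
import math
import random

def swap_accepted(V_bias, s_measure, s_bias, rng=random.random):
    """Metropolis accept/reject for swapping the configurations of the
    unbiased ("measurement") stream and the biased stream.

    Because both streams share the same action S(U), the action terms
    cancel in the acceptance ratio and only the bias potential V
    survives:
        p_acc = min(1, exp(V(s_bias) - V(s_measure))),
    where s_bias / s_measure are the CV values of the configurations
    currently held by the biased / unbiased stream, respectively.
    """
    delta = V_bias(s_bias) - V_bias(s_measure)
    return rng() < min(1.0, math.exp(delta))
```

Measurements taken on the unbiased stream then require no reweighting, which is the source of the improved effective sample size described above.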
This paper is organized as follows. We start out by giving a general introduction to Metadynamics in Section II. Afterwards, Section III describes our simulation setup, including our choice of actions, observables, and update algorithms. Some details on the application of Metadynamics in the context of SU(3) gauge theory are also given. In Section IV, we present baseline results obtained with conventional update algorithms, including a rough determination of gradient flow scales for the doubly blocked Wilson (DBW2) action. In Section V, we present results obtained with pure Metadynamics for 4-dimensional SU(3) and discuss several possible improvements. In Section VI, we introduce parallel tempered Metadynamics and show some scaling tests of the new algorithm in 2-dimensional U(1) gauge theory, as well as exploratory results in 4-dimensional SU(3). Finally, in Section VII, we conclude with a summary and outlook on the application of the new algorithm to full QCD. Appendices A to D contain mathematical conventions and derivations related to the Metadynamics force calculation, and Appendix E summarizes exact results for observables in 2-dimensional U(1) gauge theory.

II. METADYNAMICS
Consider a system described by a set of degrees of freedom {U}, where the states are distributed according to the probability density

p(U) = e^(−S(U)) / Z,

with the partition function Z defined as

Z = ∫ D[U] e^(−S(U)).

The expectation value of an observable O is defined as

⟨O⟩ = (1/Z) ∫ D[U] O(U) e^(−S(U)).

In the context of lattice gauge theories, the integration measure D[U] is usually the product of Haar measures for each link variable, but more generally D[U] may be understood as a measure on the configuration space of the system. Metadynamics [78] is an enhanced-sampling method based on the addition of a history-dependent bias potential V_t(s(U)) to the action S(U), where t is the current simulation time. The dynamics of the modified system are governed by S_t^M(U) = S(U) + V_t(s(U)), and now explicitly depend on a number of observables s_i(U), with i ∈ {1, . . ., N}, that are referred to as collective variables (CVs). These CVs span a low-dimensional projection of the configuration space of the system, and may generally be arbitrary functions of the underlying degrees of freedom {U}. However, when used in combination with molecular-dynamics-based algorithms, such as the Hybrid Monte Carlo algorithm, the CVs need to be differentiable functions of the underlying degrees of freedom. During the course of a simulation, the bias potential is modified in such a way as to drive the system away from regions of configuration space that have been explored previously, eventually converging towards an estimate of the negative free energy as a function of the CVs, up to a constant offset [79,80]. Usually, this is accomplished by constructing the potential from a sum of Gaussians g(s), so that at simulation time t, the potential is given by

V_t(s) = Σ_{t′<t} g(s − s(U_{t′})),

where the sum runs over the times t′ at which Gaussians have been deposited. The exact form of the Gaussians is determined by the parameters w and δs_i:

g(s) = w exp(−Σ_{i=1}^{N} s_i² / (2 δs_i²)).

Both parameters affect the convergence behavior of the potential in a similar way: Increasing the height w or the widths δs_i may accelerate the convergence of the potential during early stages of the simulation but lead to larger fluctuations around the equilibrium during later stages. Furthermore, the widths δs_i effectively introduce a smallest scale that can still be resolved in the space spanned by the CVs, which needs to be sufficiently small to capture the relevant details of the potential.
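The Gaussian deposition scheme described above can be sketched in a few lines for a one-dimensional CV. The function names are ours, and a production implementation would store the potential on a grid rather than keep every Gaussian centre:

```python
import math

def make_metad_potential(w, delta_s):
    """Return (deposit, V): a history-dependent bias built from Gaussians.

    Each call to deposit(s_t) adds one Gaussian of height w and width
    delta_s centred at the current CV value s_t; V(s) evaluates the
    accumulated potential. One-dimensional CV for simplicity.
    """
    centres = []

    def deposit(s_t):
        centres.append(s_t)

    def V(s):
        return sum(w * math.exp(-(s - c) ** 2 / (2.0 * delta_s ** 2))
                   for c in centres)

    return deposit, V
```

Regions of CV space the simulation revisits accumulate more Gaussians, so the bias grows there and pushes the system toward unexplored regions.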
If the bias potential has reached a stationary state, i.e., its time-dependence in the region of interest is just an overall additive constant, the modified probability density, which we shall also refer to as the target density, is given by

p′(U) = e^(−S(U)−V(s(U))) / Z′,

with the modified partition function

Z′ = ∫ D[U] e^(−S(U)−V(s(U))).

Expectation values with respect to the modified distribution can then be defined in the usual way, i.e., via

⟨O⟩′ = (1/Z′) ∫ D[U] O(U) e^(−S(U)−V(s(U))).

On the other hand, expectation values with respect to the original, unmodified probability density can be written in terms of the modified probability distribution with an additional weighting factor. For a dynamic potential, different reweighting schemes have been put forward to achieve this goal [81], but if the potential is static, the weighting factors are directly proportional to the exponential of the bias potential:

⟨O⟩ = ⟨O e^(V(s))⟩′ / ⟨e^(V(s))⟩′.

The case of a static potential is thus essentially the same as a multicanonical simulation [43].
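With a static potential, the reweighting formula above translates directly into code. A small sketch with illustrative names, assuming one sample per configuration and a one-dimensional CV:

```python
import math

def reweighted_mean(observables, cv_values, V):
    """Estimate <O> with respect to the original density from samples
    drawn with a static bias potential V(s), using
        <O> = <O e^{V(s)}>' / <e^{V(s)}>'.
    """
    weights = [math.exp(V(s)) for s in cv_values]
    num = sum(o * w for o, w in zip(observables, weights))
    return num / sum(weights)
```

Configurations sampled in regions where the bias potential is large carry correspondingly large weights, which is what degrades the effective sample size discussed later.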
In situations where the evolution of the system is hindered by high action barriers separating relevant regions of configuration space, Metadynamics can be helpful in overcoming those barriers, since the introduction of a bias potential modifies the marginal distribution over the set of CVs. For conventional Metadynamics, the bias potential is constructed in such a way that the marginal modified distribution is constant:

p′(s) = ∫ D[U] p′(U) δ(s − s(U)) = const.

Conversely, for a given original distribution p(s) and a desired target distribution p′(s), the required potential is given by

V(s) = ln( p(s) / p′(s) ).

Nevertheless, it is important to recognize that even if the bias potential completely flattens out the marginal distribution over the CVs, the simulation is still expected to suffer from other (diffusive) sources of critical slowing down, as is common for Markov chain Monte Carlo simulations.

A. Choice of gauge actions
For our simulations of SU(3) gauge theory, we work on a 4-dimensional lattice Λ with periodic boundary conditions. Configurations are generated using the Wilson [82] and the DBW2 [83] gauge actions, both of which belong to a one-parameter family of gauge actions involving standard 1 × 1 plaquettes as well as 1 × 2 planar loops, which may be expressed as

S_g[U] = (β/3) Σ_n { c₀ Σ_{μ<ν} Re Tr[1 − W_{1μ,1ν}(n)] + c₁ Σ_{μ≠ν} Re Tr[1 − W_{1μ,2ν}(n)] }.

Here, W_{kμ,lν}(n) refers to a Wilson loop of shape k × l in the μ-ν plane originating at the site n. The coefficients c₀ and c₁ are constrained by the normalization condition c₀ + 8c₁ = 1 and the positivity condition c₀ > 0, where the latter condition is sufficient to guarantee that the set of configurations with minimal action consists of locally pure gauge configurations [84]. For the Wilson gauge action (c₁ = 0), only plaquette terms contribute, whereas the DBW2 action (c₁ = −1.4088) also involves rectangular loops.
It is well known that the critical slowing down of topological modes is more pronounced for improved gauge actions than for the Wilson gauge action [12,14,15,17,18]: A larger negative coefficient c₁ suppresses small dislocations, which are expected to be the usual mechanism mediating transitions between topological sectors on the lattice. Among the most commonly used gauge actions, this effect is most severely felt by the DBW2 action. In previous works [15,17], local update algorithms were found to be inadequate for exploring different topological sectors in a reasonable time frame at lattice spacings around 0.06 fm. Instead, the authors had to generate thermalized configurations in different topological sectors using the Wilson gauge action, before using these configurations as starting points for simulations with the DBW2 action. Thus, this action allows us to explore parameters where severe critical slowing down is visible, while avoiding very fine lattice spacings and thereby limiting the required computational resources.

B. Observables
The observables we consider here are based on various definitions of the topological charge, and Wilson loops of different sizes at different smearing levels. The unrenormalized topological charge is defined using the clover-based definition of the field strength tensor:

Q_c = (1/32π²) Σ_n ε_{μνρσ} Tr[ F_{μν}(n) F_{ρσ}(n) ].

This field strength tensor is given by

F_{μν}(n) = (−i/8) [ C_{μν}(n) − C†_{μν}(n) ],

where the clover term C_{μν}(n) is defined as

C_{μν}(n) = P_{μ,ν}(n) + P_{ν,−μ}(n) + P_{−μ,−ν}(n) + P_{−ν,μ}(n),

and P_{μ,ν}(n) denotes the plaquette:

P_{μ,ν}(n) = U_μ(n) U_ν(n + μ̂) U†_μ(n + ν̂) U†_ν(n).

Alternatively, the topological charge may also be defined via the plaquette-based definition, here denoted by Q_p. Similar to the clover-based field strength tensor, the plaquette-based field strength tensor is defined as

F^plaq_{μν}(n) = (−i/2) [ P_{μ,ν}(n) − P†_{μ,ν}(n) ].

Both Q_c and Q_p formally suffer from O(a²) artifacts, although the coefficient is typically smaller for the clover-based definition Q_c. The topological charge is always measured after O(30) steps of stout smearing [85] with a smearing parameter ρ = 0.12. To estimate the autocorrelation times of the system, we consider the squared topological charge [18], the Wilson gauge action, and n × n Wilson loops for n ∈ {2, 4, 8} at different smearing levels. We denote the latter two by S_w and W_n, respectively.

C. Update algorithms
Throughout this work, we employ a number of different update schemes: To illustrate critical slowing down of conventional update algorithms and to set a baseline for comparison with Metadynamics-based algorithms, we use standard Hybrid Monte Carlo updates with unit length trajectories (1HMC), a single heat bath sweep (1HB), five heat bath sweeps (5HB), and a single heat bath sweep followed by four overrelaxation sweeps (1HB+4OR). The local update algorithms are applied to three distinct SU(2) subgroups during each sweep [6], and the HMC updates use an Omelyan-Mryglod-Folk fourth-order minimum norm integrator [86] with a step size of ϵ = 0.2, which leads to acceptance rates above 99% for the parameters used here.
We compare these update schemes to Metadynamics HMC updates with unit length trajectories (MetaD-HMC), and a combination of parallel tempering with Metadynamics (PT-MetaD), which is discussed in more detail in Section VI.
An important requirement for the successful application of Metadynamics is the identification of appropriate CVs. In our case, the CV should obviously be related to the topological charge. However, it should not always be (close to) integer-valued but rather reflect the geometry of configuration space with respect to the boundaries between topological sectors. On the other hand, the CV needs to track the topological charge closely enough for the algorithm to be able to resolve and overcome the action barriers between topological sectors. A straightforward approach is to apply only a moderate amount of some kind of smoothing procedure, such as cooling or smearing, to a gluonic definition of the topological charge, for which we choose Q_c. Since these smoothing procedures involve spatial averaging, the action will become less local, which complicates the use of local update algorithms. Therefore, we use the HMC algorithm to efficiently update the entire gauge field at the same time, which requires a differentiable smoothing procedure, such as stout [85] or HEX smearing [87]. Due to its simpler implementation compared to HEX smearing, we choose stout smearing here. Previous experience [88] seems to indicate that for Q_c, four to five stout smearing steps with a smearing parameter ρ = 0.12 strike a reasonable balance between having a smooth CV and still representing the topological charge accurately. We found that using Q_p would require significantly more smearing steps, whereas some improved definitions involving more general rectangular loops would not reduce the necessary amount of smearing.
The force contributed by the bias potential may be written in terms of the chain rule:

∂V_meta/∂U = (∂V_meta/∂Q_meta) · (∂Q_meta/∂U^(s)) · (∂U^(s)/∂U^(s−1)) ⋯ (∂U^(1)/∂U).

Here we have introduced the notation V_meta for the bias potential and Q_meta for the CV to clearly distinguish it from other definitions of the topological charge. Note that there is an implicit summation over the lattice sites n_i and the Lorentz indices μ_i. The first term in the equation, corresponding to the derivative of the bias potential with respect to Q_meta, is trivial, but the latter two terms are more complicated: The derivative of Q_meta with respect to the maximally smeared field U^(s) is given by a sum of staples with clover term insertions, and the final terms correspond to the stout force recursion [85] that also appears during the force calculation when using smeared fermions. Note that in machine learning terminology, this operation is essentially a backpropagation [89] and may be computed efficiently using reverse mode automatic differentiation. More details on the calculation of the force can be found in Appendices B to D.

The bias potential is constructed from a sum of one-dimensional Gaussians, as described in Section II, and stored as a histogram. Due to charge conjugation symmetry, we can update the potential symmetrically. Values at each point are reconstructed by linearly interpolating between the two nearest bins, and the derivative is approximated by their finite difference. To limit the evolution to relevant regions of the phase space, we introduce an additional penalty term to the potential once the value of Q_meta has crossed certain thresholds Q_min and Q_max. If the system has exceeded the threshold, the potential is given by the outermost value of the histogram, plus an additional term that scales quadratically with the distance to the outer limit of the histogram.

Unless mentioned otherwise, we have used the following values as default parameters for the potential: Q_max/min = ±8, n_bins = 800, w = 0.05, while δQ has always been set equal to the bin width, i.e., (Q_max − Q_min)/n_bins = 0.02. It is often convenient to build up a bias potential in one or several runs, and then simulate and measure with a static potential generated from the previous runs. In some sense, this can be thought of as a combination of Metadynamics and multicanonical simulations.
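The histogram-based potential with linear interpolation, finite-difference derivative, charge-conjugation-symmetric updates, and quadratic out-of-range penalty described above can be sketched as follows. The class name and the penalty strength k are our own illustrative choices; only the default grid parameters follow the values quoted in the text:

```python
class HistogramPotential:
    """Bias potential stored on a uniform grid over [q_min, q_max].

    Values between grid points are obtained by linear interpolation of
    the two nearest bin-edge values; the derivative is their finite
    difference. Outside the range, the potential is the outermost
    stored value plus a quadratic penalty in the distance to the edge
    (penalty strength k is an illustrative parameter, not a value
    quoted in the paper).
    """

    def __init__(self, q_min=-8.0, q_max=8.0, n_bins=800, k=100.0):
        self.q_min, self.q_max, self.n = q_min, q_max, n_bins
        self.dq = (q_max - q_min) / n_bins
        self.values = [0.0] * (n_bins + 1)  # values at the bin edges
        self.k = k

    def deposit(self, q, w=0.05):
        """Distribute height w onto the bins bracketing q and, by charge
        conjugation symmetry, onto those bracketing -q."""
        for qq in {q, -q}:  # set dedupes q == -q at the origin
            x = (qq - self.q_min) / self.dq
            i = min(max(int(x), 0), self.n - 1)
            f = x - i
            self.values[i] += (1.0 - f) * w
            self.values[i + 1] += f * w

    def __call__(self, q):
        if q < self.q_min:
            return self.values[0] + self.k * (self.q_min - q) ** 2
        if q > self.q_max:
            return self.values[-1] + self.k * (q - self.q_max) ** 2
        x = (q - self.q_min) / self.dq
        i = min(int(x), self.n - 1)
        f = x - i
        return (1.0 - f) * self.values[i] + f * self.values[i + 1]

    def derivative(self, q):
        if q < self.q_min:
            return -2.0 * self.k * (self.q_min - q)
        if q > self.q_max:
            return 2.0 * self.k * (q - self.q_max)
        x = (q - self.q_min) / self.dq
        i = min(int(x), self.n - 1)
        return (self.values[i + 1] - self.values[i]) / self.dq
```

The piecewise-linear interpolation makes the derivative piecewise constant, which is what enters the Metadynamics force as ∂V_meta/∂Q_meta.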

IV. RESULTS WITH CONVENTIONAL UPDATE ALGORITHMS
To establish a baseline to compare our results to, we have investigated the performance of some conventional update algorithms using the Wilson and DBW2 gauge actions. Furthermore, we have made a rough determination of the gradient flow scales t₀ and w₀ for the DBW2 action. Some preliminary results for the Wilson action were already presented in [88].

A. Critical slowing down with Wilson and DBW2 gauge actions
In order to study the scaling of autocorrelations for different update schemes, we have performed a series of simulations with the Wilson gauge action on a range of lattice spacings. The parameters were chosen to keep the physical volume approximately constant at around (1.1 fm)⁴, using the scale given by the rational fit function in [90], which was based on data from [91]. A summary of the simulation parameters can be found in Table I.

TABLE I. A summary of the simulation parameters for the Wilson gauge action runs using conventional update algorithms. The scale was set via the rational fit from [90] (where r₀ = 0.49 fm), which in turn used data from [91].

Since autocorrelation times near second-order phase transitions are expected to be described by a power law, we use the following fit ansatz in an attempt to parametrize the scaling:

τ_int(a) = c · a^(−z).

All autocorrelation times and their uncertainties are estimated following the procedure described in [92]. Figure 1 shows the scaling of the integrated autocorrelation times of 2 × 2 Wilson loops W₂ and the square Q_c² of the clover-based topological charge with the lattice spacing. Additionally, the figure also includes power law fits to the data and the resulting values for the dynamical critical exponents z(W₂) and z(Q_c²). Both observables were measured after 31 stout smearing steps with a smearing parameter ρ = 0.12. While the integrated autocorrelation times of both observables increase towards finer lattice spacings and are adequately described by a power law behavior, the increase is much steeper for the squared topological charge than for the smeared 2 × 2 Wilson loops. Below a crossover point at a ≈ 0.08 fm, the autocorrelation times of the squared topological charge start to dominate. They can be described by both a dynamical critical exponent z ≈ 5 or, alternatively, by an exponential increase that was first suggested in [16]. This behavior is compatible with the observations in [18].
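The power-law ansatz τ_int(a) = c · a^(−z) can be fitted by ordinary linear regression in log-log space. A minimal sketch with illustrative names (an unweighted fit; the analysis in the text propagates uncertainties following [92], which is omitted here):

```python
import math

def fit_power_law(a_values, tau_values):
    """Least-squares fit of tau_int(a) = c * a**(-z) via linear
    regression of log(tau) on log(a); returns (c, z)."""
    xs = [math.log(a) for a in a_values]
    ys = [math.log(t) for t in tau_values]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    intercept = ybar - slope * xbar
    # tau = exp(intercept) * a**slope, so z = -slope
    return math.exp(intercept), -slope
```

The dynamical critical exponent z is the negative slope of the log-log regression line.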
In contrast, the autocorrelation time of Wilson loops is compatible with a much smaller exponent z ≈ 1–2. As can be seen in Table III, the critical exponent does not change significantly with the size of the Wilson loop after 31 stout smearing steps. Generally, the integrated autocorrelation times of smeared Wilson loops slightly increase both with the size of the loops and the number of smearing levels. The only exception to this behavior occurs for larger loops, where a few steps of smearing are required to obtain a clean signal and not measure the autocorrelation of the noise instead.
Regarding the different update algorithms, the unit length HMC does show a somewhat better scaling behavior for all observables than the local update algorithms, but is also the most computationally expensive update scheme considered here (see Table II). For all the local update algorithms, the critical exponents are very similar, but the combination of one heat bath and four overrelaxation steps has the smallest prefactor. It is interesting to note that this algorithm is also approximately twice as fast as the five-step heat bath update scheme, while still providing smaller autocorrelation times. The single step heat bath without overrelaxation, although numerically cheapest, exhibits the worst prefactor of the local update algorithms. Note that the reported numbers for the critical exponents differ from those in [88] due to a different fit ansatz (in the proceedings, the ansatz included an additional constant term).

For the DBW2 action, the problem is more severe. Figure 2 shows the time series of the topological charge for two runs using the 1HMC and the 1HB+4OR update scheme. Both simulations were carried out on 16⁴ lattices at β = 1.25 using the DBW2 action.

FIG. 1. Integrated autocorrelation times of W₂ (left) and Q_c² (right) for different update schemes using the Wilson gauge action. The scaling of both observables can be described using a power-law fit (Equation (20)) and is compatible with a dynamical critical exponent z ≈ 2 for the Wilson loops and z ≈ 5 for the squared topological charge. Details on the simulation parameters are listed in Table I.

Evidently, both update schemes are unable to tunnel between different topological sectors in a reasonable time. Only a single configuration during the 1HB+4OR run and two (successive) configurations during the 1HMC run fulfill the condition |Q_c| > 0.5.

B. Scale setting for the DBW2 action
To the best of our knowledge, scales for the DBW2 action in pure gauge theory have only been computed based on simulations with β ≤ 1.22 [17,94], and interpolation formulas are only available based on data with β ≤ 1.04 [95]. Since here we perform simulations at β = 1.25, we compute approximate values for t₀ [96] and w₀ [97], which allows us to estimate our lattice spacings for comparison to the Wilson results. Both scales are based on the energy density E, which is defined in terms of the field strength tensor G_{μν}(n, t) of the flowed gauge field as

E(t) = (1/4) G^a_{μν}(n, t) G^a_{μν}(n, t),

averaged over the lattice volume. Similar to the topological charge definitions, we adopt a plaquette- and clover-based definition of the field strength tensor, with the only difference being that the components are also made traceless, and not just anti-Hermitian. The gradient flow scales t₀ and w₀ are both defined implicitly via

t² ⟨E(t)⟩ |_{t=t₀} = 0.3,    t (d/dt) [ t² ⟨E(t)⟩ ] |_{t=w₀²} = 0.3.

The flow equation was integrated using the third-order commutator-free Runge-Kutta scheme from [96] with a step size of ϵ = 0.025. Measurements of the clover-based energy density were performed every 10 integration steps, and t² ⟨E(t)⟩ was fitted with a cubic spline, which was evaluated with a step size of 0.001. For every value of β, two independent simulations with 100 measurements each were performed on 48 × 32³ lattices. Every measurement was separated by 200 update sweeps with the previously described 1HB+4OR update scheme, and the initial 2000 updates were discarded as thermalization phase.
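The implicit definition of t₀ amounts to a root find on the monitored quantity t²⟨E(t)⟩. A simplified sketch using linear interpolation between flow-time measurements in place of the cubic-spline fit used in the text (the function name and data layout are illustrative):

```python
def find_t0(flow_times, E_values, target=0.3):
    """Locate t0, defined implicitly by t**2 * <E(t)> = target, by
    linear interpolation between adjacent flow-time measurements.
    (The paper uses a cubic-spline fit; linear interpolation is a
    simplified stand-in.)"""
    g = [t * t * e for t, e in zip(flow_times, E_values)]
    pairs = list(zip(flow_times, g))
    for (t1, g1), (t2, g2) in zip(pairs, pairs[1:]):
        if g1 <= target <= g2:
            return t1 + (target - g1) * (t2 - t1) / (g2 - g1)
    raise ValueError("target not bracketed by the flow history")
```

w₀ can be located the same way after replacing t²⟨E(t)⟩ by its logarithmic derivative, estimated from finite differences of the flow history.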
Our results are displayed in Table IV. Using the physical value of √t₀ = 0.1638(10) fm from [98], these results imply a physical volume of approximately (0.9 fm)⁴ and a temperature of around 219 MeV for the 16⁴ lattice from the previous section.
In order to facilitate comparison with other results, we also provide an interpolation of our lattice spacing results. For this purpose, we use a rational fit ansatz with three fit parameters d₁, d₂, d₃ that is asymptotically consistent with perturbation theory [90] and has a sufficient number of degrees of freedom to describe our data well. For our reference clover-based t₀ scale setting, this results in a fit with χ²/d.o.f. ≈ 1.31 and parameters d₁ ≈ 1.0351, d₂ ≈ −1.3763, d₃ ≈ 0.4058, which is displayed in Figure 3. We want to emphasize that these results are not meant to be an attempt at a precise scale determination but rather only serve as an approximate estimate. Especially for the finer lattices, the proper sampling of the topological sectors cannot be guaranteed, and the comparatively small volumes may introduce non-negligible finite volume effects.

V. RESULTS WITH METADYNAMICS
Figure 4 shows the time series of the topological charge obtained from simulations with the HMC and the MetaD-HMC with five and ten stout smearing steps on a 22⁴ lattice at β = 6.4035 using the Wilson gauge action. Both MetaD-HMC runs tunnel multiple times between different topological sectors, whereas the conventional HMC run essentially displays only a single tunneling event between the sectors Q = 0 and Q = 1. The integrated autocorrelation times of Q_c² were measured for both MetaD-HMC runs, with five and ten smearing steps used to define the CV, respectively. A noteworthy difference between the two MetaD-HMC runs is the increase of fluctuations with higher amounts of smearing. If too many smearing steps are used to define the CV, the resulting Q values will generally be closer to integers, which will eventually drive the system to coarser regions of configuration space. Since these regions do not contribute significantly to expectation values in the path integral, it is desirable to minimize the time that the algorithm spends there. This is directly related to the issue of small effective sample sizes after reweighting, which we will discuss in more detail in Section V B.
A similar comparison of topological charge time series for the DBW2 action can be seen in Figure 5. Here, two MetaD-HMC runs with four and five stout smearing steps on a 16⁴ lattice at β = 1.25 are compared to the 1HMC and 1HB+4OR runs, which were already shown in Figure 2. Both conventional update schemes are confined to the zero sector, whereas the two MetaD-HMC runs explore topological sectors up to |Q| = 6.
More quantitatively, the integrated autocorrelation time of Q_c² is estimated to be τ_int(Q_c²) = 5126(1500) for the run with 4 smearing steps and τ_int(Q_c²) = 4159(1137) for the run with 5 smearing steps. On the other hand, lower bounds for the autocorrelation times of the 1HMC and 1HB+4OR update schemes are 4 × 10⁵, which is larger by more than a factor of 70.
To illustrate the role of the CV Q_meta, it may be helpful to compare the time series of Q_meta and Q_c, as shown in Figure 6. The two observables are clearly correlated, but Q_meta is distributed more evenly between integers.

A. Computational overhead and multiple timescale integration
A fair comparison of the different update schemes also needs to take the computational cost of the algorithms into account. Although significant efforts were made to optimize the performance of our implementation of the MetaD-HMC, it is still clear that the additional overhead introduced by the computation of the Metadynamics force contribution is significant for pure gauge theory. The relative overhead is especially large compared to local update algorithms, which are already more efficient than the regular HMC. Note, however, that due to the more non-local character of the DBW2 gauge action, the relative loss in efficiency when switching to Metadynamics from either a local update algorithm or the HMC is already noticeably smaller. Since the majority of the computational overhead comes from the Metadynamics force contribution, and the involved scales are different from those relevant for the gauge force, it seems natural to split the integration into multiple timescales in a similar fashion to the Sexton-Weingarten scheme [99]: The force contributions from the bias potential are correlated to the topological charge, which is an IR observable, whereas the gauge force is usually dominated by short-range, UV fluctuations. Therefore, it is conceivable that integrating the Metadynamics force contribution on a coarser timescale than the gauge force could significantly decrease the required computational effort, while still being sufficiently accurate to lead to reasonable acceptance rates.
We have attempted to use combinations of both the Leapfrog and the Omelyan-Mryglod-Folk second-order integrator with the Omelyan-Mryglod-Folk fourth-order minimum norm integrator in a multiple timescale integration scheme. Unfortunately, we were unable to achieve a meaningful reduction of Metadynamics force evaluations without encountering integrator instabilities and deteriorating acceptance rates. However, this approach might still be helpful for simulations with dynamical fermions, where it is already common to split the forces into more than two levels.
Even if such a multiple timescale approach should turn out to be unsuccessful in reducing the number of Metadynamics force evaluations, we expect the relative overhead of Metadynamics to be much smaller for simulations including dynamical fermions. In a previous study [42], it was found that compared to conventional HMC simulations, simulations with Metadynamics using 20 steps of stout smearing were about three times slower in terms of real time.

B. Scaling of the reweighting factor and improvements to the bias potential
Due to the addition of the bias potential, unweighted averages do not lead to expectation values with respect to the original, physical probability density. If corrected with a reweighting procedure, the overlap between the sampled distribution and the distribution of physical interest needs to be sufficiently large for the method to work properly. A common measure to quantify the efficiency of the reweighting procedure is the effective sample size (ESS), defined as

ESS = (Σ_i w_i)² / Σ_i w_i²,

where w_i is the respective weight associated with each individual configuration. In the case of a static bias potential, this is simply w_i = e^(V(Q_meta,i)). We found the normalized ESS, i.e., the ESS divided by the total number of configurations, to generally be of order O(10⁻²) or lower when simulating in regions of parameter space where conventional algorithms fail to sample different topological sectors.
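The normalized ESS is straightforward to compute from the bias-potential values of the sampled configurations. A short sketch (illustrative function name; a static potential is assumed, so w_i = exp(V(Q_meta,i))):

```python
import math

def normalized_ess(bias_values):
    """Normalized effective sample size for static-potential
    reweighting: ESS = (sum w_i)^2 / sum w_i^2 with
    w_i = exp(V(Q_meta,i)), divided by the number of configurations."""
    w = [math.exp(v) for v in bias_values]
    ess = sum(w) ** 2 / sum(x * x for x in w)
    return ess / len(w)
```

Equal weights give a normalized ESS of 1; a single dominant weight drives it towards 1/N, signalling that reweighted averages rest on very few configurations.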
Although the low ESS ultimately results from the fact that the bias potential is constructed in such a way as to have a flat marginal distribution over the CV, we can nonetheless distinguish between two contributions towards this effect. On the one hand, there is the inevitable flattening of the intersector barriers by the bias potential, which is necessary to facilitate tunneling between adjacent topological sectors. On the other hand, however, the different weights of the individual topological sectors are also canceled by the bias potential. While it is necessary to faithfully reproduce the intersector barriers, the leveling of the weights of the different topological sectors is often unwanted. It increases the time that the simulation spends at large values of |Q|, so that these sectors are overrepresented compared to their true statistical weight. It is therefore conceivable that by retaining only the intersector barrier part of the bias potential, the relative weights of the different topological sectors will be closer to their physical values, and the ESS will increase. The resulting marginal distribution over the topological charge is then expected to no longer be constant, but rather resemble a parabola. In cases where this modification to the bias potential is used, we will either explicitly mention it or include the abbreviation "mod." to make a clear distinction between it and the original potential.

FIG. 6. Time series of the CV Q_meta and the topological charge Q_c, measured after 5 and 30 stout smearing steps with a smearing parameter of ρ = 0.12, respectively. The data are from the 5-stout MetaD-HMC run shown in Figure 5.
Here and in Section VI of this work, we perform scaling tests of the proposed improvements in 2-dimensional U(1) gauge theory, where high statistics can be generated more easily than in 4-dimensional SU(3) gauge theory. The action is given by the standard Wilson plaquette action, and updates are performed with a single-hit Metropolis algorithm. The topological charge is defined using a geometric, integer-valued definition,

Q = (1/2π) Σ_n φ_P(n),   φ_P(n) ∈ (−π, π],

where φ_P(n) denotes the plaquette phase at site n. For all Metadynamics updates, we use a field-theoretic definition of the topological charge that is generally not integer-valued,

Q_meta = (1/2π) Σ_n sin φ_P(n).

Since the charge distributions obtained from the two definitions already show reasonable agreement without any smearing for the parameters considered here, we can use local update algorithms and directly include the Metadynamics contribution in the staple. A similar idea that encourages tunneling in the Schwinger model by adding a small modification to the action was proposed in [100].
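To make the distinction concrete, the following sketch (illustrative Python, assuming the plaquette phases have already been wrapped to (−π, π]) evaluates both definitions on a smooth configuration of charge Q = 1:

```python
import numpy as np

# A constant plaquette phase of 2*pi*Q/V on all V plaquettes mimics a
# smooth, instanton-like configuration of charge Q in 2-dimensional U(1).
V, Q = 100, 1
phases = np.full(V, 2 * np.pi * Q / V)

q_geometric = phases.sum() / (2 * np.pi)                 # exactly integer here
q_field_theoretic = np.sin(phases).sum() / (2 * np.pi)   # close to, but not, an integer

print(q_geometric, q_field_theoretic)
```

For rough configurations the two definitions differ more strongly, which is why smearing is generally needed at finer couplings.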
In previous tests in 2-dimensional U(1) gauge theory, we found that the bias potentials could be described by a sum of a quadratic and multiple oscillating terms [101],

V(Q) ≈ α Q² + Σ_k β_k cos(2πkQ + γ_k).

Here, we fit the bias potentials obtained from the 2-dimensional U(1) simulations to this form. We then obtain a modified bias potential by subtracting the resulting quadratic term from the data. Table VI contains the normalized ESS and integrated autocorrelation times for different lattice spacings on the same line of constant physics in 2-dimensional U(1) theory. We compare Metadynamics runs using bias potentials obtained directly from previous simulations with Metadynamics runs using potentials that were modified to retain the relative weights of the topological sectors as described above. We see large improvements for both the ESS and τ_int in the modified case, even for the finest lattices considered.
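As a minimal sketch of the modification (not the analysis code used for this work), one can fit and subtract the quadratic trend of a tabulated bias potential; since the oscillating part averages out over many periods, a plain least-squares quadratic fit captures the trend:

```python
import numpy as np

def subtract_quadratic_trend(q, v):
    """Return the bias potential with its quadratic trend removed.

    q, v: bin centers of the CV and the tabulated bias-potential values.
    The least-squares quadratic captures the smooth part encoding the
    relative sector weights; only the intersector barriers remain.
    """
    trend = np.polyval(np.polyfit(q, v, deg=2), q)
    return v - trend

# Synthetic potential: quadratic trend plus unit-period oscillations.
q = np.linspace(-7, 7, 1401)
v = 0.5 * q**2 + 0.3 * np.cos(2 * np.pi * q)
v_mod = subtract_quadratic_trend(q, v)
# v_mod retains the barriers but no large-scale trend.
```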
We expect the quadratic term to be mostly relevant for small volumes and high temperatures. With larger volumes and lower temperatures, the slope should decrease, and with it the importance of correctly capturing this term. On the other hand, the oscillating term is expected to grow more important with finer lattice spacings, as the barriers between the different sectors grow steeper. Thus, the oscillating term needs to be described more and more accurately towards the continuum.
A standard technique to decrease, but not completely eliminate, action barriers is well-tempered Metadynamics [102]. In this approach, the height w of the added Gaussians decays with increasing potential. In our tests, we found that this method increases the ESS at the cost of higher autocorrelation times. Whether the gains in the ESS outweigh the loss from the higher autocorrelation times depends on the choice of parameters. Although this technique might yield moderate improvements in the overall sampling efficiency, we decided not to attempt any fine-tuning of the parameters at this point.
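The decaying Gaussian height of well-tempered Metadynamics can be sketched as follows (illustrative Python; the name delta_t for the tempering parameter that sets the decay scale is our assumption, not notation from the text):

```python
import math

def well_tempered_height(w0, v_at_cv, delta_t):
    """Deposit height for well-tempered Metadynamics.

    w0: initial Gaussian height; v_at_cv: bias already accumulated at
    the current CV value; delta_t: tempering parameter (> 0). The
    deposited height decays exponentially with the accumulated bias,
    so the potential converges instead of growing without bound.
    """
    return w0 * math.exp(-v_at_cv / delta_t)

# No accumulated bias: full height; large accumulated bias: tiny deposits.
print(well_tempered_height(0.002, 0.0, 1.0))   # 0.002
```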

C. Accelerating the equilibration/buildup of the bias potential
Another avenue of improvement is accelerating the buildup of the bias potential. This aspect becomes especially relevant when considering large-scale simulations, where current simulations are typically limited to O(10^4) update sweeps, and a lengthy buildup phase of the bias potential would render the method infeasible. Additionally, the range of relevant Q values will also increase with larger physical volumes.
The first idea explored here is to exploit the aforementioned well-tempered variant of Metadynamics by choosing a larger starting value of the Gaussian height w and letting it decay slowly, so as to minimize the change in the potential that arises from the decay. While this approach introduces the decay rate as another tunable parameter, we found that it did indeed reduce the number of update iterations required to thermalize the potential. A small caveat is that an optimal choice of the decay rate requires prior knowledge of the approximate height of the action barriers.
A second way of reducing the buildup time is to use an enhancement of Metadynamics most commonly referred to as multiple walkers Metadynamics [103], where the potential is simultaneously built up by several independent streams in a trivially parallelizable way. In our case, it is convenient to make each stream start in a distinct topological sector. In 2-dimensional U(1) gauge theory, this can be achieved by seeding each stream with an instanton configuration of charge Q, which can be constructed according to [104]. The serial and parallel buildup are compared in Figure 7, where the potential parameters for each stream are given by Q_max/min = ±7, n_bins = 1400, and w = 0.002. In the case of 4-dimensional SU(3), the direct construction of instantons with higher charge is not quite as simple as in 2-dimensional U(1) gauge theory. The construction of lattice instantons with even charge is described in [104], and lattice instantons with odd charge can be constructed by combining multiple instantons with charge |Q| = 1 [88, 105]. Regardless, starting with instantons is not required, since we only need each stream to fall into the specified sector. The time until the streams start to tunnel is a first indicator of the thermalization timescale of the potential. Independent of the possible improvements mentioned here, a fine-tuning of the standard Metadynamics parameters could also prove worthwhile with regard to accelerating the buildup and improving the quality of the bias potential.

VI. PARALLEL TEMPERED METADYNAMICS

In order to eliminate the problem of small effective sample sizes observed in our Metadynamics simulations due to the required reweighting, we propose to combine Metadynamics with parallel tempering [77]. This is done in a spirit similar to the parallel tempering on a line defect proposed by Hasenbusch [30]. We introduce two simulation streams, one with a bias potential and one without it, while the physical actions S(U) are the same for both streams, as illustrated in Figure
8. Since we are working in pure gauge theory, this means the second stream without bias potential can be simulated with local update algorithms. After a fixed number of updates have been performed on the two streams, a swap of the configurations is proposed and subjected to a standard Metropolis accept-reject step. The action difference is given by

ΔS = V_t(Q_meta(U_2)) − V_t(Q_meta(U_1)),

where the indices of the quantities denote the number of the stream, and V_t is the bias potential in the first stream. It is important to note that the action difference is simple to compute regardless of what the physical action looks like. Even in simulations where dynamical fermions are present, the contributions from the physical action always cancel, by virtue of the two streams having the same action parameters; only the contribution from the bias potential remains. Moreover, the action differences, and thus the swap rates, should be largely independent of the volume. Since the second (measurement) stream samples configurations according to the physical distribution, no reweighting is needed, and thus the effective sample size is not reduced. Additionally, if the swaps are effective, the measurement stream will inherit the topological sampling from the stream with bias potential and thus also sample topological sectors well. Effectively, the accept-reject step for swap proposals serves as a filter for configurations with vanishing weight, thereby decreasing the statistical uncertainties on all observables weakly correlated with the topological charge. What remains to be seen is whether the efficiency of the sampling of the topological sectors carries over from the bias potential stream to the measurement stream. In this section, we address this question via both scaling tests in 2-dimensional U(1) and exploratory runs in 4-dimensional SU(3) in a region where conventional update algorithms are effectively frozen.
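The swap step can be sketched as follows (illustrative Python, not the production code; bias evaluates the static potential V_t at the CV of a configuration):

```python
import math
import random

def swap_accepted(bias, q1, q2, rng=random.random):
    """Metropolis accept-reject for swapping the configurations of the
    biased stream (1) and the measurement stream (2).

    The physical actions cancel because both streams use identical
    action parameters, so only Delta S = V_t(Q2) - V_t(Q1) enters.
    """
    delta_s = bias(q2) - bias(q1)
    return delta_s <= 0 or rng() < math.exp(-delta_s)

# A proposed swap that lowers the total action is always accepted.
print(swap_accepted(lambda q: q**2, q1=2.0, q2=1.0))   # True
```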

A. Scaling tests in 2-dimensional U(1)
We carried out a number of simulations in 2-dimensional U(1) gauge theory using the same parameters and simulation setup as described in Section V B. We reuse the bias potentials already built for these Metadynamics runs and keep them static in a number of parallel tempered Metadynamics runs. For each set of parameters, we carried out one run with the respective unmodified potential and one run with a potential modified as described in Section V B. In these runs, swaps between the two streams were proposed after each had completed a single update sweep over all lattice sites. The run parameters, as well as the resulting autocorrelation times of the topological charge Q, can be found in Table VII.
Since the relevant configuration space is now the product of the configuration spaces of the individual streams, autocorrelation timescales of observables defined on the product space contain important information about the dynamics of the system. Therefore, we additionally monitor the sum of the squared topological charges on both streams. This observable allows us to distinguish fluctuations in Q originating from true tunneling events from repeated swaps between the two streams without tunneling.
TABLE VII. Integrated autocorrelation times for different lattices on the same line of constant physics in 2-dimensional U(1) gauge theory, using both Metropolis and PT-MetaD updates. Observables indexed with 1 are taken from the stream with bias potential, whereas those indexed with 2 are taken from the regular stream. The modification of the bias potential is discussed in Section V B. Overall, 10^7 measurements were performed with a separation of 10 update sweeps between every measurement.

Figure 9 shows the scaling of the total number of independent configurations, given by the quotient of the effective sample size, Equation (25), and the integrated autocorrelation time of the topological susceptibility. The performance of the standard Metropolis algorithm is compared to parallel tempered and standard Metadynamics, with both modified (see Section V B) and non-modified bias potentials. PT-MetaD performs well throughout the entire range of lattice spacings, consistently outperforming standard Metadynamics by more than an order of magnitude. Most importantly, the ratio of independent configurations seems to reach a plateau for finer lattice spacings, indicating an improved scaling behavior compared to conventional Metadynamics, which in itself already scales significantly better than the single-hit Metropolis algorithm. It is also worth noting that the modified bias potential provides better results than the non-modified one, as evidenced by the normalized effective sample sizes and swap rates presented in Table VIII. This is consistent with our expectation that large excursions in the topological charge, which produce irrelevant configurations, are curbed by the modification of the bias potential. A more detailed look at the effectiveness of the new algorithm is provided by Figure 10. It compares the results of PT-MetaD and standard Metadynamics at our finest lattice spacing, with and without modification of the bias potential. For reference, exact values from
analytical solutions [40, 106–108] are also provided (see Appendix E for more details). First, we note that there is no significant difference in performance between standard and parallel tempered Metadynamics in the topology-related observables Q and Q² in the case of a modified bias potential. This indicates that the swaps are effective in carrying over the topological sampling of the bias potential stream to the measurement stream. On the other hand, the inclusion of the irrelevant higher sectors with the unmodified bias potential does increase the error bars, and there is some indication that not all of the topological sector sampling is carried over into the measurement run of PT-MetaD. Furthermore, Figure 10 shows that PT-MetaD with
a modified bias potential is superior to standard Metadynamics. This is clearly the effect of the higher effective sample size and number of independent configurations. In summary, our scaling tests in 2-dimensional U(1) suggest that parallel tempered Metadynamics with a modified bias potential achieves much improved topological sampling, equivalent to standard Metadynamics, while at the same time not suffering from a reduced effective sample size. There is some indication that the ratio of statistically independent to total configurations reaches a stable plateau in the continuum limit.
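The figure of merit underlying Figure 9 can be estimated from a time series and its reweighting factors as below (illustrative Python with a naive positive-window τ_int estimator; a production analysis would use an automatic windowing procedure):

```python
import numpy as np

def n_independent(weights, series):
    """Estimate ESS / (2 * tau_int), the number of statistically
    independent configurations, from reweighting factors and a time
    series of an observable (e.g., the topological susceptibility).
    """
    w = np.asarray(weights, dtype=float)
    ess = w.sum() ** 2 / (w ** 2).sum()
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    var = np.mean(x * x)
    tau_int = 0.5
    for t in range(1, len(x) // 2):
        rho = np.mean(x[:-t] * x[t:]) / var   # normalized autocorrelation
        if rho < 0:                           # naive cutoff at first negative value
            break
        tau_int += rho
    return ess / (2.0 * tau_int)

# Uncorrelated data with uniform weights: nearly every configuration counts.
rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)
print(n_independent(np.ones_like(x), x))
```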

B. First results in 4-dimensional SU(3)
For our exploratory study in 4-dimensional SU(3), we turn to the DBW2 gauge action at β = 1.25 on a V = 16^4 lattice, which we have already used in Section V. For our first run, which is depicted in the left panels of Figure 12, we combined a local 1HB+4OR measurement stream with a 4stout MetaD-HMC stream that dynamically generates the bias potential. Between swap proposals, updates for the two streams are performed at a ratio of 10 1HB+4OR update sweeps to a single unit-length MetaD-HMC trajectory, which roughly reflects the relative wall-clock times of the algorithms. One can see that the measurement run starts exploring other topological sectors almost as soon as the parallel run with active bias potential has gained access to them. In the later stages of the run, when the bias potential is sufficiently built up to allow the Metadynamics run to enter higher topological sectors, one can see that the swap rate is lowered by the action difference between the topological sectors, leading to an overall swap rate of ∼8%. This effect mirrors the reduction of the effective sample size in pure Metadynamics updates and
FIG. 12. Topological charge time series for our parallel tempered Metadynamics runs on a V = 16^4 lattice at β = 1.25 with the DBW2 gauge action and four steps of stout smearing in the definition of Q_meta. The left panels show results of our first run with a dynamically built bias potential, while the right panels show our second run with a modified static potential. The topmost row shows the time series of the topological charges in the respective measurement runs, while the second row is for the Metadynamics part (note the different y-scales). The third row displays the sum of the topological charges on the measurement stream and the stream including a bias potential, serving as an indicator for genuine transitions of the entire system into new topological sectors. In the bottom row, the running average of the swap acceptance rate with a window size of 2000 is displayed.
may be ameliorated by removing the quadratic term in the bias potential, as discussed in Section V B. In fact, the relevant point is that the action difference between the maxima of the bias potential for different topological sectors reflects the relative weight of these sectors in the path integral and should not be flattened out. Ideally, we want the bias potential to reproduce only the barriers between the sectors, not their relative weights. For a second exploratory PT-MetaD run, we therefore opted for a static bias potential of this sort. Lacking data precise enough to model the bias potential in detail, as we did in 2-dimensional U(1), we started from the bias potential of a previous Metadynamics run and extracted the high-frequency (in Q_meta) part corresponding to the topological barriers, while eliminating the long-range part corresponding to the relative weights of the topological sectors. For this purpose, we chose to perform a singular spectrum analysis [109] and cross-checked the result with a simple piecewise subtraction of the Q² term between consecutive local maxima. As displayed in Figure 11, both methods result in a similar modified bias potential that seems to reproduce the intersector barriers rather well.

The right panels of Figure 12 display the results of the corresponding PT-MetaD run. Notably, excursions to large absolute values of the topological charge in the stream with bias potential are now curbed, and the swap acceptance rate has increased to ∼21%. In addition, the acceptance rate is approximately constant over the entire run, as is to be expected for a static bias potential. The resulting autocorrelation times are given in Equation (35). We would like to emphasize that the bias potential we extracted is a rather rough guess. With a larger amount of data, it might be possible to extract a better bias potential, possibly leading to even higher acceptance rates. Considering the rather simple forms used to model the bias potential, it might also be possible to describe it with sufficient accuracy for good initial guesses at other run parameters. We plan to address these points in the future.
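A simple stand-in for the trend extraction (not the singular spectrum analysis itself) interpolates an envelope through consecutive local maxima and subtracts it, assuming the potential is tabulated on a grid:

```python
import numpy as np

def piecewise_detrend(q, v):
    """Subtract the envelope through local maxima of a bias potential,
    retaining only the intersector barriers (a crude stand-in for the
    trend-removal methods discussed in the text).
    """
    n = len(v)
    peaks = [i for i in range(n)
             if (i == 0 or v[i] >= v[i - 1]) and (i == n - 1 or v[i] >= v[i + 1])]
    envelope = np.interp(q, q[peaks], v[peaks])   # piecewise-linear envelope
    return v - envelope

# Synthetic potential: gentle quadratic trend plus unit-period barriers.
q = np.linspace(-7, 7, 1401)
v = 0.05 * q**2 + 0.3 * np.cos(2 * np.pi * q)
v_mod = piecewise_detrend(q, v)
# v_mod vanishes at the maxima and keeps the barrier wells between them.
```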
In any case, these first results clearly show that the parallel tempered Metadynamics algorithm is able to achieve enhanced topological sampling in 4-dimensional SU(3) without the reduction of the effective sample size that is typical for algorithms with a bias potential.

VII. CONCLUSION AND OUTLOOK
In this paper, we have proposed a new update algorithm, parallel tempered Metadynamics (PT-MetaD), and applied it to 4-dimensional SU(3) gauge theory. In its simplest form, which we have investigated here, it consists of two parallel streams simulating the same physical system. One of the streams includes a fixed bias potential, which facilitates transitions between topological sectors. This bias potential can, for example, be extracted from a Metadynamics simulation of the system. The second stream can utilize any efficient conventional update algorithm that, by itself, may exhibit topological freezing. At regular intervals, swaps between the two streams are proposed and subjected to a Metropolis accept-reject step. We have demonstrated that in this way, the good topological sampling of the stream with bias potential carries over to the conventional update stream without compromising its effective sample size. When performing measurements exclusively on the second, conventional update stream, one therefore obtains topological unfreezing without any reweighting or additional modifications. The price to pay is the additional update stream including a bias potential, which requires only minimal communication with the measurement stream and is thus trivially parallelizable. The proposed algorithm may be helpful in overcoming potential barriers in more general cases without compromising the effective sample size.
We have demonstrated that PT-MetaD can unfreeze 4-dimensional SU(3) gauge theory at parameter values where conventional algorithms are frozen. Furthermore, scaling tests in 2-dimensional U(1) gauge theory indicate gains of more than an order of magnitude compared to standard Metadynamics in the total sample size, as well as an improved scaling of autocorrelation times with the lattice spacing compared to standard update algorithms. We have also demonstrated that the buildup of the Metadynamics bias potential may be accelerated by running multiple Metadynamics simulations in parallel.
We believe these results are promising and plan to study the scaling behavior of the methods tested here in more detail for 4-dimensional SU(3) gauge theory, and eventually in full QCD. Conceptually, there seem to be no obstacles to implementing parallel tempered Metadynamics in full QCD. We also plan to explore possible optimizations of parallel tempered Metadynamics. These include optimizing the bias potential via enhanced buildup and extraction and, possibly, describing it parametrically. Furthermore, it would be interesting to investigate whether adding additional streams with the same or modified bias potentials could increase performance, despite the additional computational cost. This might be especially interesting for large-scale simulations, where additional streams may offer a reduction of autocorrelation times in a trivially parallelizable manner.

Appendix B: Clover charge derivative
In order to obtain an expression for the force contribution from the topological bias potential in Equation (19), the algebra-valued derivative of the collective variable Q_meta with respect to the group-valued fully smeared links has to be calculated first. Evidently, this part of the force calculation does not depend on the smearing procedure used. The subsequent stout force recursion [19, 85] required to relate this term to the derivative with respect to the unsmeared links is discussed in Appendix C.
Recall that the clover-based definition of the topological charge is given by

Q_c = (1/32π²) Σ_n ε_μνρσ tr[F^clov_μν(n) F^clov_ρσ(n)],   (B1)

where the field strength tensor is defined as

F^clov_μν(n) = −(i/8) (C_μν(n) − C_νμ(n)),   (B2)

and the clover term is given by

C_μν(n) = P_μ,ν(n) + P_ν,−μ(n) + P_−μ,−ν(n) + P_−ν,μ(n).   (B3)

For notational convenience, we introduce the auxiliary variables R_μν(n) = C_μν(n) − C_νμ(n) and drop the specification of the lattice site n unless pertinent to the formula.
What we need for the force is the sum of the partial derivatives in all eight group directions. By using the cyclicity of the trace and the symmetry relation (B4), we can rewrite the sum of partial derivatives as in Equation (B5). Note that in the second line of Equation (B5) and the subsequent equations in this section, the index α is not summed over.
Applying the partial derivative operators, we get two nonvanishing contributions from every plaquette term. Using the same symmetry arguments as before, we can further simplify the expression and obtain the final result for the derivative. In a similar way, expressions for the derivatives of improved definitions of the topological charge involving larger loops can be obtained.
For a traceless Hermitian matrix Q, this reduces to

λ³ − c₁λ − c₀ = 0.   (D2)

According to the Cayley–Hamilton theorem, Q satisfies its own characteristic equation,

Q³ = c₁ Q + c₀ 1,   (D3)

where we again follow the notation in [85]:

c₀ = det Q = (1/3) tr[Q³],   c₁ = (1/2) tr[Q²].

The structure of Q restricts the range of c₀:

|c₀| ≤ c₀^max = 2 (c₁/3)^{3/2}.

For analytic functions, Equation (D3) may therefore be used to reduce the series representation of the function to a polynomial of degree 2 in the matrix itself:

exp(iQ) = f₀ 1 + f₁ Q + f₂ Q².

More generally, analytic functions of n × n matrices can be reduced to polynomials of degree n − 1 using the same strategy. The complex coefficients f_j may be expressed in terms of complex auxiliary functions h_j and real variables u and w that are closely related to the eigenvalues of Q:

f_j = h_j / (9u² − w²).

The stout force recursion also involves terms related to the derivative of the exponential function that appear in Equation (C8). Specifically, the matrix polynomials B_i are defined through

B_i = b_{i0} 1 + b_{i1} Q + b_{i2} Q²,

and the coefficients b_{ij} are given by

b_{1j} = [2u r_j^(1) + (3u² − w²) r_j^(2) − 2(15u² + w²) f_j] / [2(9u² − w²)²],   (D22)

b_{2j} = [r_j^(1) − 3u r_j^(2) − 24u f_j] / [2(9u² − w²)²].

FIG. 1. Scaling of the integrated autocorrelation times of square 2 × 2 Wilson loops W₂ (left) and the squared topological charge Q²_c (right) for different update schemes using the Wilson gauge action. The scaling of both observables can be described using a power-law fit (Equation (20)) and is compatible with a dynamical critical exponent z ≈ 2 for the Wilson loops and z ≈ 5 for the squared topological charge. Details on the simulation parameters are listed in Table I.

FIG. 2. Time series of the topological charge for V = 16^4, β = 1.25 using the DBW2 action. The configurations were generated with the 1HMC (top) and the 1HB+4OR (bottom) update schemes. Out of a total of 400000 configurations each, only a single configuration during the 1HB+4OR run and two (successive) configurations during the 1HMC run fulfill the condition |Q_c| > 0.5.

FIG. 4. FIG. 5.
Figure 4 shows the time series of the topological charge obtained from simulations with the HMC and the MetaD-HMC with five and ten stout smearing steps on a 22^4 lattice at β = 6.4035 using the Wilson gauge action. Both MetaD-HMC runs tunnel multiple times between different topological sectors, whereas the conventional HMC run essentially displays only a single tunneling event between sectors Q = 0 and Q = 1. The autocorrelation times of Q²_c for the MetaD-HMC runs are comparable despite the different amounts of smearing used.

FIG. 8. Illustration of the PT-MetaD algorithm, with the two columns representing two simulation streams. The bias potentials are illustrated at the top, while subsequent rows represent the time series, with different shades of the configurations for better visibility. Q values are indicative and do not represent results from an actual simulation.

FIG. 10. Comparison of expectation values and uncertainties for the plaquette P, larger Wilson loops W_n, the topological charge Q, and the topological susceptibility Q² for variations of MetaD-Metropolis in 2-dimensional U(1) for 32² lattices and β = 12.8. The modification of the bias potential is discussed in Section V B. Dashed lines correspond to the exact solutions [40, 106–108]. Results obtained from a run with standard Metropolis updates are off the scale everywhere except for Q, where the expectation value is exactly equal to zero due to complete topological freezing.

FIG. 11. Comparison between the original bias potential and its trend-subtracted modifications from singular spectrum analysis and piecewise subtraction of the Q² term.

T^a ε_ανρσ Re tr[T^a A_ανρσ] = 8 T^a Re tr[T^a A_α].   (B6)

An expression of the above form can be rewritten using the projector induced by the scalar product of the algebra defined in Equation (A4):

8 T^a Re tr[T^a A_α] = −4 P_su(3)(A_α).   (B7)

Including the missing prefactors from Equation (B1) and Equation (B2), we obtain the following final result for the derivative of the clover-based topological charge with respect to the gauge link U_α(n):

T^a ∂^a_{n,α} Q_c = (1/32π²) T^a ∂^a_{n,α} ε_μνρσ tr[F^clov_μν F^clov_ρσ] = −(1/2048π²) T^a ∂^a_{n,α} ε_μνρσ tr[R_μν R_ρσ] = (1/512π²) P_su(3)(A_α).

TABLE II. Relative performance of the update algorithms used in our scaling runs. The results cited here are taken from simulations on 22^4 lattices. Note that the performance of the heat bath algorithm is slightly better for larger β [4, 5].

TABLE III. The dynamical critical exponents obtained from power-law fits to the integrated autocorrelation times of Q²_c, S_w, and Wilson loops of different sizes after 31 stout smearing steps for different update schemes and the Wilson gauge action. Notably, the dynamical critical exponents associated with Q²_c are much larger than those associated with the smeared action or smeared Wilson loops of different sizes.
Table V shows the relative timings for the different update schemes used in this section, measured for simulations carried out on 16^4 lattices. While no sig-

TABLE V. Relative timings for the different update schemes, measured for simulations carried out on 16^4 lattices. The significant computational overhead for the Metadynamics updates compared to the other algorithms is due to the stout smearing and the stout force recursion required for the Metadynamics force calculation.

TABLE VI. Normalized effective sample sizes for simulations carried out on different lattices on the same line of constant physics in 2-dimensional U(1) gauge theory. For each set of parameters, 10^7 measurements were performed with a separation of 10 update sweeps between every measurement. More details on the simulation setup can be found in Section V B.
FIG. 9. Continuum scaling of the total sample size for the standard Metropolis algorithm and variations of MetaD-Metropolis in 2-dimensional U(1), including the use of a modified bias potential (see Section V B). The corresponding data can be found in Table VI and Table VII. Lines are drawn to guide the eyes.