Multicanonical reweighting for the QCD topological susceptibility

We introduce a reweighting technique which allows for a continuous sampling of temperatures in a single simulation, and employ it to compute the temperature dependence of the QCD topological susceptibility at high temperatures. The method determines the ratio of the susceptibility at any two temperatures within the explored temperature range. We find that the results from the method agree with our previous determination, and that it is competitive with, but not better than, existing methods for determining the temperature derivative of the susceptibility. The method may also be useful for exploring the temperature dependence of other thermodynamical observables in QCD in a continuous way.


I. INTRODUCTION
The axion solution to the strong CP problem, proposed more than four decades ago [1][2][3], solves the fine-tuning problem of the smallness of the θ parameter in QCD by introducing a new light (< meV) pseudo-Goldstone boson in the Standard Model. Soon after it was proposed, it was also realized that long-wavelength axions would be produced abundantly early in the Big Bang, and can serve as a candidate for dark matter [4,5]. These proposals have heightened interest both in searching experimentally for the axion, and in better understanding its phenomenology and cosmological history; for a review of these topics see for instance Ref. [6]. An important input for the cosmological production of axions and for their contemporary properties is the QCD topological susceptibility χ_top(T), which determines the axion mass:

    χ_top(T) = m_A²(T) f_A² .    (1)

Here m_A(T) is the temperature-dependent axion mass and f_A is the axion decay constant. The low-temperature value of χ_top is well determined [7,8], but the high-temperature regime determines the axion production efficiency; it is particularly important to determine χ_top(T) in the temperature range from 500 MeV to 1200 MeV [9]. In this temperature range, this can only be achieved by nonperturbative lattice investigations [10]. The most straightforward lattice methods, based on brute-force sampling of the gauge-field configuration space, face difficulties in this range because topologically nontrivial gauge-field configurations become very rare, leading to a loss of statistical power. New methodologies are needed to overcome this problem. In recent years one such methodology has been developed [11,12] which provides access to high temperatures. In these calculations, it was shown that the difference of the expectation value of the QCD action in two topological sectors provides a determination of d(ln χ_top)/d(ln T), which can be integrated to give the temperature dependence of χ_top.
Alternatively, one can approach the problem at a fixed, high temperature by reweighting between topological sectors. A first attempt, based on a fixed guess for the reweighting function [13], explored temperatures around 500 MeV. Another method, in which the reweighting function is determined dynamically via an iterative self-consistent technique, was introduced in [14] and further improved in [15], where it was shown to be effective in the pure-glue theory up to at least 7 T_c.
Our main motivation for this work is to explore whether a new technique might improve existing methods for determining χ_top. The method introduced here is related to, but distinct from, the technique of Refs. [11,12]. We train a single Markov-chain Monte-Carlo simulation to explore a wide range of temperatures in a detailed-balance-respecting way by replacing the weighting function exp[−βS] with exp[−W(S)], where W(S) is established by an iterative procedure such that the resulting ensemble can be reweighted to describe any temperature in a relatively wide range. We do this separately in the Q = 0 and |Q| = 1 (topologically trivial and instanton-number ±1) sectors, and combine with a determination of χ_top(T) at one temperature to determine χ_top across the full accessible temperature range. We show that the approach has an efficiency similar to that of [11], with both (slight) advantages and disadvantages relative to their approach. The technique can be extended to include fermions if the line of constant physics (lattice m(β)) is known. However, we find that the approach we developed in Ref. [15] seems to afford better numerical efficiency.
In the next section we review the definition of the topological susceptibility and lay the groundwork for our technique. Section III introduces our technique to carry out a detailed-balance-preserving Markov chain over a range of temperatures. Then Section IV presents our results and Section V closes with a discussion.

arXiv:2103.01069v1 [hep-lat] 1 Mar 2021

II. TOPOLOGICAL SUSCEPTIBILITY
The topological susceptibility is defined as

    χ_top ≡ (T/V) ⟨Q²⟩ ,    Q = ∫ d⁴x q(x) ,    (2)

where q is the topological charge density, V the spatial volume and T the temperature. In the continuum Q always takes an integer value, while on the lattice one requires a refined definition of Q. Considering the continuum integer topological charge Q, one can write the partition function as a sum over topological sectors:

    Z = Σ_N Z_N ,    Z_N = ∫ DU δ(Q[U] − N) e^{−βS[U]} .    (3)

The susceptibility is then given by:

    χ_top = (T/V) (Σ_N N² Z_N) / (Σ_N Z_N) .    (4)

At low temperatures and/or large volumes, this sum will have important contributions from many N values. But at a sufficiently high temperature, such that (V/T) χ_top(T) ≪ 1, the numerator is dominated by N = 1 and N = −1 and the denominator is dominated by N = 0. Renaming Z_1 + Z_{−1} → Z_1 (the part of the partition function where Q² = 1, that is, Q = ±1) and considering a finite volume, we find

    χ_top a⁴(β) = (1/V_L) Z_1/Z_0 ,    V_L = N_τ N_x N_y N_z .    (5)

Here we have re-expressed the temperature and volume as they would appear in a lattice calculation, with lattice spacing a and lattice extents (N_τ, N_x, N_y, N_z) in the temporal and the three space directions, and V_L = N_τ N_x N_y N_z the number of lattice sites. The last line emphasizes that the lattice spacing is a function of the lattice gauge coupling β = 6/g²_latt. The nontrivial relation between these two quantities is determined by a scale-setting measurement.
In this work, we will present a method which determines the ratio of susceptibilities at two specified temperatures. With a fixed lattice extent in the spatial and temporal directions, the two temperatures will correspond to two gauge couplings β_h and β_c (h for hot and c for cold), which will in turn correspond to two different lattice spacings a(β_h) and a(β_c). The ratio of susceptibilities at the two temperatures will then be given as:

    [χ_top a⁴](β_h) / [χ_top a⁴](β_c) = [ Z_1(β_h)/Z_0(β_h) ] / [ Z_1(β_c)/Z_0(β_c) ] .    (6)

Our method will also allow for the determination of the above ratio for any pair of temperatures within the prespecified range. We will do so by computing the ratio of partition functions using two Monte-Carlo simulations: one that works within the Q = 1 topological sector and one which works within the Q = 0 sector, but with each simulation exploring the full range of β values, as we describe in the next section.

III. TEMPERATURE REWEIGHTING
In a given Monte-Carlo simulation, we intend to sample a range of temperatures continuously over a prespecified range in pure glue QCD regularized on a finite space-time lattice. We will accomplish this using a reweighting method very similar to the old proposal of Berg and Neuhaus [18]. In this section we first show how a reweighted Monte-Carlo can be used to determine the susceptibility; then we show how the reweighted Monte-Carlo can be carried out; and finally we show how to determine the reweighting function itself. In practice, the numerical work proceeds in exactly the opposite order.

A. Susceptibility using reweighting
In the standard Monte-Carlo simulation one evaluates the partition function Z(β) at a particular gauge coupling β by generating gauge configurations with the probability distribution

    dP[U] = (1/Z(β)) e^{−βS[U]} DU ,    Z(β) = ∫ DU e^{−βS[U]} .    (7)

Such a Monte-Carlo simulates a single temperature, which is controlled by the gauge coupling β since the lattice dimensions are fixed. In order to simulate a range of β values and therefore a range of temperatures, the sampling weight of Eq. (7) is replaced with

    dP[U] ∝ e^{−W[S[U]]} DU .    (8)

The weight function W[S] is a general function of the action S, and its derivative can be interpreted, approximately, as the gauge coupling β to use at a given action value:

    β(S) ≈ W'[S] ≡ dW[S]/dS .    (9)

In this sense, the usual Monte-Carlo simply has W[S] = βS, with the constant slope simulating the fixed coupling. We show schematically in Figure 1 how W[S] covers a range of temperatures. The function W[S] is not known a priori; we will return to its determination shortly, but will first discuss how a simulation with such a weight function can be carried out and used. Once W[S] is known, a Monte-Carlo simulation with W[S] generates an ensemble which can be reweighted to determine the partition function at a given β value β_0 via

    Z(β_0) ∝ Σ_i exp( W[S_i] − β_0 S_i ) ,    (10)

where i indexes the sampled configurations. In practice this partition function is used to determine expectation values for operators via

    ⟨O⟩_{β_0} = Σ_i O_i exp( W[S_i] − β_0 S_i ) / Σ_i exp( W[S_i] − β_0 S_i ) .    (11)

By establishing two reweighting functions W[S] and W_Q[S] for the Q = 0 and |Q| = 1 ensembles respectively, we can generate multi-temperature ensembles, labeled by i and i_Q, which respectively sample the Q = 0 and the |Q| = 1 sectors across temperatures. The ratio needed in Eq. (6) is then given by

    [χ_top a⁴](β_h) / [χ_top a⁴](β_c) = [ Σ_{i_Q} e^{W_Q[S_{i_Q}] − β_h S_{i_Q}} / Σ_i e^{W[S_i] − β_h S_i} ] × [ Σ_i e^{W[S_i] − β_c S_i} / Σ_{i_Q} e^{W_Q[S_{i_Q}] − β_c S_{i_Q}} ] .    (12)

There are two subtleties associated with this expression. The first is that Eq. (10) only determines the partition function up to an overall multiplicative factor. However, this multiplicative factor cancels between the numerator and denominator expressions computed from the same sample.
The second subtlety is that the partition function Z also has severe lattice-spacing dependent renormalizations, which we cannot easily compute. Fortunately, these cancel because each lattice spacing occurs once in the numerator and once in the denominator in Eq. (12).
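To make the reweighting step concrete, the following minimal Python sketch (our own illustration, not the production code of this work; the function name and interface are ours) evaluates the partition function up to a constant and a reweighted expectation value from a list of sampled actions:

```python
import numpy as np

def reweight(S, W, beta0, O=None):
    """Reweight configurations sampled with weight exp(-W[S]) to coupling beta0.

    S     : array of action values S_i from the multicanonical chain
    W     : callable giving the weight function W[S] used in the run
    beta0 : target gauge coupling
    O     : optional array of per-configuration observable values
    Returns Z(beta0) up to an overall constant, and, if O is given,
    the reweighted expectation value <O> at beta0."""
    logw = W(S) - beta0 * S
    logw -= logw.max()          # overall constant; cancels in all ratios
    w = np.exp(logw)
    Z = w.sum()
    if O is None:
        return Z
    return Z, float((w * O).sum() / Z)
```

The susceptibility ratio is then built from four such evaluations, two per topological sector; the arbitrary normalization drops out of every ratio formed from the same sample.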

B. Update algorithm with W [S]
In this section we explain the algorithm to perform a Monte-Carlo simulation with weight function e^{−W[S]}. Our aim is to generate a sampling with probability distribution

    dP[U] ∝ e^{−W[S[U]]} DU ,    (14)

where S[U] is the standard lattice gauge action and we assume that W[S[U]] is a known, differentiable function of the action S. Simulating such a weight requires a slight modification of the standard Hybrid Monte-Carlo (HMC) algorithm [19]. As in the standard HMC algorithm, we introduce canonical momenta π_µ for the link variables U_µ, and define a Hamiltonian for this system as

    H[π, U] = Σ_{x,µ} (1/2) Tr π_µ²(x) + W[S_U] ,    (15)

where S_U in Eq. (15) is shorthand for S[U]. A single update then consists of:

1. Picking a random canonical momentum π_µ(x) from a Gaußian ensemble independently for each of the elements of the Lie algebra.
2. Solving the following Hamilton equations of motion (shown here schematically):

    dU_µ/dt = π_µ U_µ ,    (16)
    dπ_µ/dt = −∂H/∂U_µ = −W'[S_U] ∂S_U/∂U_µ ,    (17)

for a total time t_0. The derivative with respect to U_µ is a Lie derivative and in this sense these equations are a schematic representation. Here the time t is a fictitious variable under which the Hamiltonian H is conserved.

3. Accepting the proposed pair with probability min(1, e^{−∆H}), where ∆H is the change in H accumulated by the discretized evolution, and otherwise retaining the old configuration.
These Hamiltonian equations are discretized using a time-symmetric solver such as the leapfrog or Omelyan algorithms. Under these algorithms, one iteratively solves Eq. (16) for all link variables U at fixed π, and then solves Eq. (17) for all π variables at fixed U. Before applying Eq. (17), we must compute S_U and use the (instantaneous) value of W'[S_U] in place of the usual factor of β at each time step during the update.
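The modified update is easiest to see on a toy model. The sketch below (our own illustration, not the openQCD implementation; names are ours) replaces the gauge action with S = |x|²/2 for a vector of Gaussian degrees of freedom, and replaces the constant β by the instantaneous slope dW/dS in the leapfrog force, exactly as described above:

```python
import numpy as np

rng = np.random.default_rng(1)

def hmc_step(x, W, dW, t0=1.0, nsteps=20):
    """One multicanonical HMC update for weight exp(-W(S)), S = |x|^2/2.

    The only change from standard HMC is that the force on x is
    -dW(S) * dS/dx, i.e. the instantaneous slope dW/dS replaces beta."""
    dt = t0 / nsteps
    p = rng.standard_normal(x.shape)        # step 1: Gaussian momenta
    S = 0.5 * x @ x
    H0 = 0.5 * p @ p + W(S)
    xn = x.copy()
    p = p - 0.5 * dt * dW(S) * xn           # leapfrog: half step in p
    for k in range(nsteps):
        xn = xn + dt * p                    # full step in x
        S = 0.5 * xn @ xn
        if k < nsteps - 1:
            p = p - dt * dW(S) * xn         # full step in p
    p = p - 0.5 * dt * dW(S) * xn           # closing half step in p
    H1 = 0.5 * p @ p + W(S)
    if rng.random() < np.exp(min(0.0, H0 - H1)):   # step 3: accept-reject
        return xn, True
    return x, False
```

With the linear choice W(S) = βS this reduces to ordinary HMC at fixed coupling, which provides a convenient correctness check.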
The use of a time-symmetric algorithm is essential, since it ensures reversibility: if the pair (U, π) evolves into (U', π'), then the momentum-reversed pair (U', −π') evolves back into (U, −π), which, together with the accept-reject step, guarantees detailed balance. For the case of a Q = 1 simulation, an additional accept-reject step is needed, in which we check whether the configuration has fallen down into the Q = 0 sector and reject the update if this is the case. In practice we can buffer every N-th configuration and only perform this step after every N HMC update steps, reverting to the last buffered configuration when the check fails. We define Q as the lattice sum of an a²-improved topological density definition after τ_F = 2.4a² units of Wilson flow, as in [15]. We find that values of N = 5 or N = 10 are adequate to preserve a good acceptance rate.
Lastly we remark on the optimal length of the individual trajectories. The figure of merit for trajectory length is the mean-squared change in S_U per unit numerical effort. The numerical effort is approximately linear in the trajectory length t_0. For a short trajectory, ∆S_U is linear, and (∆S_U)² quadratic, in trajectory length; but beyond a certain (fairly short) trajectory length the action change saturates. Therefore we started with a study of how (∆S_U)² varies with trajectory length and chose the value which maximizes (∆S_U)²/t_0, namely t_0 ≈ 0.75a. A single trajectory leads to a change of (∆S_U/S_U)² ≈ 3/N_dof, where N_dof = 24 V_L is the number of lattice degrees of freedom (3 polarizations and 8 colors per site). Therefore, for a well-chosen W[S_U] function, since changes to S_U accumulate in a Brownian fashion, the number of updates needed to explore the full β range is of order

    N_updates ∼ N_dof ln²(β_max/β_min) .    (18)
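For orientation, this Brownian scaling can be wrapped in a small helper (a back-of-envelope estimate only; `updates_to_cross` is our hypothetical name, and any order-one prefactor is ignored):

```python
import numpy as np

def updates_to_cross(beta_min, beta_max, V_L):
    """Back-of-envelope count of HMC trajectories needed to diffuse once
    across the full action range: N_updates ~ N_dof ln^2(beta_max/beta_min),
    with N_dof = 24 V_L (3 polarizations x 8 colors per lattice site)."""
    n_dof = 24 * V_L
    return n_dof * np.log(beta_max / beta_min) ** 2
```

For the 10 × 32² × 36 lattice and β range used later, this estimate is of order 10⁵ trajectories per crossing; many crossings are needed for useful statistics.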

C. Choice and determination of W [S]
Now we return to the question of how to determine the weight function W[S]. We start by choosing the range of β values we want to explore, β ∈ [β_min, β_max]. A short fixed-β Markov chain establishes S_max = ⟨S_U⟩_{β_min} and S_min = ⟨S_U⟩_{β_max}. We then follow our procedure in [15,20] and choose a discrete set of values S_i, i = (0, ..., N_i), with S_0 = S_min and S_{N_i} = S_max; W[S_U] will be determined by its values W[S_i]. However, because the update described above works best when both W[S_U] and W'[S_U] are continuous functions, we shall interpolate W[S_U] between these points using a cubic spline function, rather than using a piecewise-linear function as in our previous papers. We extend W[S] linearly above S_max with slope W'[S > S_max] = β_min, and similarly below S_min with slope W'[S < S_min] = β_max; these are also the values of the first derivatives used in completing the definition of the spline function.
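A clamped cubic spline with these boundary slopes can be assembled as follows (a self-contained sketch with our own naming; a library routine such as SciPy's `CubicSpline` with clamped boundary conditions would serve equally well):

```python
import numpy as np

def clamped_spline(xk, yk, d0, dn):
    """Clamped cubic spline through the knots (xk, yk) with prescribed end
    slopes d0 and dn.  Outside [xk[0], xk[-1]] the returned function
    continues linearly with those same slopes, so both W and W' are
    continuous everywhere, as the update algorithm requires."""
    xk, yk = np.asarray(xk, float), np.asarray(yk, float)
    n = len(xk)
    h = np.diff(xk)
    delta = np.diff(yk) / h
    # Solve for knot slopes s_i enforcing C2 continuity at interior knots.
    A = np.zeros((n, n)); b = np.zeros(n)
    A[0, 0] = 1.0; b[0] = d0
    A[-1, -1] = 1.0; b[-1] = dn
    for i in range(1, n - 1):
        A[i, i-1] = 1.0 / h[i-1]
        A[i, i]   = 2.0 * (1.0 / h[i-1] + 1.0 / h[i])
        A[i, i+1] = 1.0 / h[i]
        b[i] = 3.0 * (delta[i-1] / h[i-1] + delta[i] / h[i])
    s = np.linalg.solve(A, b)

    def W(x):
        x = np.asarray(x, float)
        j = np.clip(np.searchsorted(xk, x) - 1, 0, n - 2)
        t = (x - xk[j]) / h[j]
        h00 = (1 + 2*t) * (1 - t)**2; h10 = t * (1 - t)**2   # Hermite basis
        h01 = t**2 * (3 - 2*t);       h11 = t**2 * (t - 1)
        val = h00*yk[j] + h[j]*h10*s[j] + h01*yk[j+1] + h[j]*h11*s[j+1]
        lo = yk[0] + d0 * (x - xk[0])       # linear continuation below
        hi = yk[-1] + dn * (x - xk[-1])     # linear continuation above
        return np.where(x < xk[0], lo, np.where(x > xk[-1], hi, val))
    return W
```

In the notation of the text one would call this with d0 = β_max at the S_min end and dn = β_min at the S_max end, so that W'[S] interpolates between the two couplings.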
We will use the same automated improvement scheme to determine the W[S_i] as we introduced in Ref. [20] (see also [21]). We generate a Markov chain using the update approach of the previous subsection, but after each update, we adjust the function W[S_U] so as to make the current S value less likely (implying that the current S is oversampled). This is done by raising W[S_i] at the knots nearest the current S_U by a small amount s_r, following the prescription of Ref. [20]. The value s_r is initially chosen so that, in the time it takes for the Monte-Carlo to go from the top of the range to the bottom and back (see Eq. (18)), the W[S_i] will change by of order 100. Each time the S value makes its way from the first interval to the last and back (which we call a "sweep"), we reduce s_r; at first we reduce it by a factor of 2, but after the average W[S_i] changes by less than 1 per sweep, we change it by a reduced amount. The update ends when five sweeps change the average W[S_i] by a total of less than 1. This approach is inefficient if the initial W function is very far from its final form. Therefore we improve the initial guess over the simplest choice (a linear W[S]). We choose an intermediate β value β_mid and perform a fixed-β Monte-Carlo calculation to determine the associated S_mid value. We then fit W' to a quadratic, with values (β_max, β_mid, β_min) at the points (S_min, S_mid, S_max), integrate, and use this to determine starting guesses for the W[S_i]. With this approach, the initial and final W functions differ by less than 100, as shown in Figure 2, which displays the difference between the initial and final W function choice for the specific lattice and β range described in the next section. The complete built W is shown in Figure 3. Finally, one must determine W[S_U] for the Q = 1 sector. Here we can take as an initial guess the W[S_U] value determined in the Q = 0 sector.
To further refine this guess, we shift it by

    W_{Q=1}[S] = W_{Q=0}[S] − 11 ln(T/T_0) ,    (19)

where T is the temperature associated with the β value described by the slope W'[S], using a scale-setting relation between lattice coupling β and temperature T, and T_0 is a reference temperature, which could for instance be the temperature at β_min. The factor 11 is the expected power of the temperature dependence of χ_top a⁴ when the lattice spacing a varies as 1/T, at leading perturbative order. Again, Eq. (19) is only used to refine the initial guess for W_{Q=1}[S]; we again perform an automated W-changing Markov chain to improve this guess; however the initial s_r value can be made much smaller. After the W[S_U]-setting Markov chains are completed, we freeze the values of W_{Q=0}[S] and W_{Q=1}[S] and use them in detailed-balance-respecting Markov chains which we use to determine the susceptibility as described in Subsection III A.
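The iterative flattening procedure of this subsection resembles a Wang-Landau update. The toy sketch below captures the logic on a binned one-dimensional problem (all names are ours; the `sample_S` callable is a hypothetical stand-in for one HMC update, and the actual prescription of Refs. [20,21] differs in detail, e.g. in how s_r is annealed after the initial halvings):

```python
import numpy as np

def build_weights(sample_S, nbins, S_min, S_max, s_r=0.5, sweeps=30, seed=0):
    """Iteratively flatten the sampled S histogram, Wang-Landau style.

    sample_S(W, edges, rng) draws one S value from a chain running with
    the current weights.  After each draw, W in the visited bin is
    raised by s_r, making the current (oversampled) S less likely; s_r
    is halved each time the walker completes a sweep from the first bin
    to the last and back again."""
    rng = np.random.default_rng(seed)
    edges = np.linspace(S_min, S_max, nbins + 1)
    W = np.zeros(nbins)
    heading_up, done = True, 0
    while done < sweeps:
        S = sample_S(W, edges, rng)
        b = min(np.searchsorted(edges, S) - 1, nbins - 1)
        W[b] += s_r                      # penalize the current bin
        if heading_up and b == nbins - 1:
            heading_up = False
        elif not heading_up and b == 0:
            heading_up, done = True, done + 1
            s_r *= 0.5                   # anneal the adjustment size
    return W - W.min()
```

At convergence W tracks the log density of states up to a constant, which is exactly what makes the sampled S histogram flat across the target range.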

IV. RESULTS
In this section we present the results of a simulation with the aforementioned W(S) sampling. Our goal is only to test the method; we will not study multiple lattice spacings to attempt a continuum limit. Instead, we choose one lattice geometry from among the geometries studied in Refs. [14,15], which we can therefore use to compare our results for susceptibility ratios to those obtained via a competing technique. Our goal is to establish which approach, the one described here or the one described in Ref. [15], is more efficient at establishing the topological susceptibility at high temperature. We choose to investigate a lattice with temporal extent N_τ = 10 and spatial extent 32² × 36, with (β_min, β_max) = (6.9076, 8.01951), which corresponds, according to the scale-setting calculation of Ref. [22] which we will use throughout, to T_{β_min} = 2.5 T_c and T_{β_max} = 9.4 T_c. This scale-setting relation involves applying a fit to scale-setting data beyond the range where the reference has performed simulations, that is, an extrapolation, which means it may not be absolutely trustworthy. However, by expressing our results in terms of χ_top(β) a⁴(β), we can remain agnostic about the relation between T and β and simply determine how accurately we can determine χ_top a⁴ as a function of β for this specific lattice geometry. Getting continuum results over a range of temperatures will then of course require multiple lattice N_τ values and a reliable scale setting. However, for the time being our goal is just to evaluate the precision-to-numerical-cost ratio of the technique, so we leave this problem for later. The number of updates used in each procedure is listed in Table I. Note that an unfortunate choice of s_r made the Q = 1 building unnecessarily inefficient; a more careful choice should have required a fraction of as many trajectories, so that the trajectory count would be dominated by the measurements.
We begin with a check that our W[S] function correctly generates a rather uniform sample of configurations across the desired β range.

[Table: temperatures T/T_c, the corresponding β values from the scale setting of Ref. [22], and our results for the log susceptibility ratio ln(χ_top a⁴(T)/χ_top a⁴(2.8 T_c)) with its 1-σ statistical error.]

We investigate this by plotting a histogram of the S values measured during the sampling Markov chain, both for the Q = 0 and the Q = 1 ensembles, shown in Figure 4. The sample is adequately uniform.
Our main results are presented in Figure 5, which shows how χ_top a⁴ changes as a function of β across the range we study, evaluated from our Markov chains using Eq. (12). We have chosen to use a value near the beginning of the β range (T = 2.8 T_c) as the low temperature and to express all other temperatures in relation to this one. The figure, and the further data presented in Table IV, show that the error bars are smallest between nearby β values and grow to around ±0.35 in ln(χ_top) for the most widely separated temperatures. Our results agree within error bars with the results of Ref. [15] for those values where they are directly comparable.

V. DISCUSSION
We have shown that the method we propose can successfully find the β dependence of χ top a 4 , and therefore the temperature dependence of the susceptibility if the line of constant physics (that is, a(β)) is known. This determines χ top over a range of temperatures if it is known at the lowest temperature, which is where it is most easily determined by other approaches.
There are two key questions. Is it more or less effective than the rather similar approach of Refs [11,12]? And how does it compare with the approach of Ref. [15]?
The approach of Refs. [11,12] computes d ln(χ_top a⁴)/dβ at several β values, which it integrates to determine χ_top a⁴(β). We review this approach and compare it to our own in the appendix. To summarize: in a high-statistics determination, the approaches have essentially the same numerical precision. However, if lower precision is desired, the numerical cost associated with building the W(S) functions in our approach is essentially "dead weight" which does not contribute to the statistical power. The other approach does not suffer from this problem and so is more efficient for a low-statistics determination. Our approach has the advantage that it automatically includes all intermediate temperatures, while the alternative bases the determined d ln(χ_top a⁴)/dβ on a finite set of values, which may leave discrete integration errors. But it is not difficult to use enough values to render this a minor concern.
Finally, we want to compare the numerical efficiency to the method of Ref. [15]. Fortunately, the single lattice we investigated in this work was also used in that reference, and we can directly compare the error on χ_top a⁴(β_h)/χ_top a⁴(β_c) found here with the error on the same quantity found there, along with the number of HMC trajectories needed in each case. In that reference χ_top(T) was determined at β = (6.90097, 7.30916, 7.76294), corresponding to T = (2.5, 4.1, 7.0) T_c, a slightly narrower range than considered here. The three determinations required a total of 9.2×10⁶ trajectories, about half the number which would have been needed here. The average trajectory length used was also shorter in that reference than what we used here. The final errors on ln(χ_top) in that study, for this lattice, were (0.09, 0.09, 0.08) at the three temperatures. In comparison, between 2.8 T_c and 7.0 T_c we find a statistical error of 0.30. To reduce this error to the level of the other study would therefore require about 10 times more statistics in our measurement runs, indicating that the present method is of order 10 times less numerically efficient. Moreover, Ref. [15] finds that the number of trajectories needed for a given statistical error barely changes as one increases the volume (larger aspect ratio) or makes the lattice finer (larger N_τ at fixed aspect ratio), whereas we know that the number of updates needed for the method described in this paper should scale with the number of lattice sites, see Eq. (18).
We conclude that our method is at least 10 times less efficient than the single-temperature reweighting approach of [15], and will become still less efficient as one goes closer to the large-volume and continuum limits. As we understand it, this also implies that it should be easier in principle to achieve small statistical errors with the approach of [15] than with the approach of [11,12].

ACKNOWLEDGMENTS
The authors acknowledge support by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through the CRC-TR 211 "Strong-interaction matter under extreme conditions" -project number 315477589 -TRR 211.
We also thank the GSI Helmholtzzentrum and the TU Darmstadt and its Institut für Kernphysik for supporting this research. Calculations were conducted on the Lichtenberg high performance computer of the TU Darmstadt. This work was performed using the framework of the publicly available openQCD-1.6 package [23].

Appendix A: Comparison with the slope method
Our approach is closely related to the approach of Refs. [11,12], and as we understand it, the errors per numerical effort are nearly the same. To explain this conclusion, we start with a quick review of their approach, and then look at the issue of statistical power in each approach.
Their approach also seeks to compute Eq. (6) and then use a determination of χ_top(β_c) to determine χ_top at other temperatures. Taking the log of Eq. (6) we find

    ln( [χ_top a⁴](β_h) / [χ_top a⁴](β_c) ) = ln( Z_1(β_h)/Z_0(β_h) ) − ln( Z_1(β_c)/Z_0(β_c) ) .    (A1)

Now note that the β dependence of ln Z is set by the expectation value of the action:

    d ln Z_N(β) / dβ = −⟨S⟩_N ,    (A2)

and therefore

    (d/dβ) ln( Z_1(β)/Z_0(β) ) = ⟨S⟩_0 − ⟨S⟩_1 ,    (A3)

and the ratio we want is

    ln( [χ_top a⁴](β_h) / [χ_top a⁴](β_c) ) = ∫_{β_c}^{β_h} dβ ( ⟨S⟩_0 − ⟨S⟩_1 ) .    (A4)

Refs. [11,12] evaluate ⟨S⟩ in both Q = 0 and Q = 1 ensembles at a number of β values, which are then used to estimate this integral by, e.g., the trapezoid rule; our approach instead evaluates the same ratio directly via Eq. (12). Therefore the real difference between the approaches is whether Eq. (A4) is estimated based on interpolating results for several temperatures, or using a single Markov chain which spans all temperatures. Now consider the statistical power of each approach. The accuracy of a Monte-Carlo evaluation of ⟨S⟩ is set by the variance of S and the number of independent configurations used. The variance should be reasonably approximated as that for N_dof Gaußian random variables: σ²_S ≈ N_dof/2. Therefore order-1 errors in ⟨S⟩ require N_dof/2 evaluations. Since the expectation value determines an integrand, this is multiplied by the integration range, so N_dof/2 evaluations return an error in the log ratio of partition functions which is of order ln(β_max) − ln(β_min). Evaluating ⟨S⟩ at multiple β values leads to a larger error at each evaluation, but because each is responsible for a narrower ∆β range and the errors are uncorrelated, the final statistical uncertainty is independent of the number of β values used in the evaluation and depends only on the total number of Markov steps and the width of the β range considered. The final error estimate is

    ∆ ln χ ≈ ln(β_max/β_min) √( N_dof / (2 N_updates) ) .

The error rises by √2 and N_updates is doubled when we recall that separate simulations are needed in the Q = 0 and Q = 1 sectors.
In comparison, we see in Eq. (18) that our approach can explore the full β range, leading to order-1 errors in ln χ, in N_updates ∼ N_dof ln²(β_max/β_min). Therefore the two approaches produce errors per unit numerical effort which are the same up to an order-1 factor. In a numerical experiment on a toy problem (N independent Gaußian random variables x with action S = x²/2) we find that the order-1 factor is in fact 1, so the two approaches have the same statistical power per unit compute time, provided that W[S] is well determined and neglecting the computational effort expended in evaluating it.
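The ingredients of this comparison are easy to reproduce. The sketch below (our own naming; a check of the inputs rather than the full toy Markov-chain experiment) verifies the variance estimate σ²_S ≈ N_dof/2 for Gaussian degrees of freedom and encodes the slope-method error formula:

```python
import numpy as np

rng = np.random.default_rng(0)

def action_variance(n_dof, n_samples=5000):
    """Sample S = |x|^2/2 for n_dof independent Gaussian variables and
    return the measured variance of S (expected value: n_dof / 2)."""
    x = rng.standard_normal((n_samples, n_dof))
    return float((0.5 * (x**2).sum(axis=1)).var())

def slope_method_error(beta_min, beta_max, n_dof, n_updates):
    """Error estimate derived above:
    Delta ln chi ~ ln(beta_max/beta_min) * sqrt(n_dof / (2 n_updates))."""
    return np.log(beta_max / beta_min) * np.sqrt(n_dof / (2.0 * n_updates))
```

Since the variance grows linearly with N_dof while the error falls only as the square root of the statistics, both approaches inherit the same N_dof scaling, which is the content of the comparison above.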
We should also remark on how each approach is extended to full (unquenched) QCD. In each case the main challenge is dealing with the way quark masses must be varied with the lattice spacing and therefore with β: m = m(β) (which must also be determined as part of the scale setting). This added β dependence changes Eq. (A2), replacing S → S + (dm/dβ) ψ̄ψ. In our approach one must replace W[S] → W[S] + ψ̄( D̸ + m(W'[S]) )ψ, where we use W'[S] in place of β to set the scale dependence of m. This amends Eq. (17) by the addition of the standard fermionic force term and by the replacement dW/dS_U → (dW/dS_U) + (dm/dβ)(dW'/dS_U) Σ_x ψ̄ψ(x), where Σ_x ψ̄ψ(x) is the sum of the ψ̄ψ value over all sites in the current configuration. Finally, in Eq. (12), the W − βS reweighting must be complemented by a determinant ratio from the S-dependent mass to the physical mass for the desired β value.