$\chi_{\textrm{top}}(T \gg T_{\textrm{c}})$ in pure-glue QCD through reweighting

We calculate the topological susceptibility at 2.5 Tc and 4.1 Tc in SU(3) pure Yang-Mills theory. We define topology with the help of gradient flow and we largely overcome the problem of poor statistics at high temperatures by applying a reweighting technique in terms of the topological charge, measured after a specific small amount of gradient flow. This allows us to obtain a sample of configurations which compares topological sectors with good statistics, with enhanced tunneling between topologies. We quote continuum extrapolated results at these two temperatures and conclude that our method is viable and can be extended without new conceptual problems to the case of full QCD with fermions.


Introduction
Two of the most challenging problems in particle physics are the strong CP problem and the origin of dark matter. The axion [1,2], a hypothetical light scalar particle which appears in the Peccei-Quinn mechanism [3,4], could solve both problems at once: The additional degrees of freedom explain that the CP violating phase Θ QCD in the QCD Lagrangian vanishes [4] and the corresponding particle could play the role of dark matter in the universe [5][6][7].
There is currently a lot of experimental effort to detect axions; for a review on this we refer to Ref. [8]. From a theory point of view, the properties of the axion are sensitive to the topological susceptibility of QCD χ(T ), defined as with q(x) the topological charge density and Q = d 4 x q(x) its integral (defined below). While the topological susceptibility at low temperatures is well established [9], calculations become much more challenging at high temperatures, and axion cosmology requires knowledge of χ(T ) at temperatures up to about 7 T c [10,11]. Recently there has been a burst of progress in determining χ(T ) at high temperatures [12][13][14][15][16][17][18]. However, we feel that it would still be valuable to make an independent study of topological susceptibility which reaches up to 7 T c . At high temperatures topology is expected to be dominated by rare single instanton [19,20] (really caloron [21]) configurations with a typical size ρ ∼ 0.4/T [22]. These configurations become more suppressed as one considers higher temperatures, by χ(T ) ∝ T −7 (at lowest perturbative order in pure-glue gauge theory; with light fermions there is an additional factor of T −N f/3 ). This makes studying topology by lattice Monte Carlo simulations challenging; in an ensemble of high-temperature gauge theory configurations, virtually none of the configurations will possess topology, leading to severely limited statistics. For instance, if we keep the number of temporal points across the lattice fixed, the instanton density in terms of lattice sites vanishes as T −11 . Furthermore, the efficiency with which a Markov chain algorithm samples these topological configurations will be additionally suppressed, because the chain must pass through "small" instantons (or dislocations) to move between distinct topological sectors, and these dislocations get rarer with decreasing lattice spacing as a −11 .
One way around this problem is to measure topology at a low temperature where instantons are not rare, and to work up to high temperatures differentially by studying fixed-topology ensembles [12,18]. But we feel it is important as a cross-check to be able to perform a direct study of topology at high temperature. This will require a reweighting procedure to overcome the sampling challenges. Our goal in this paper is to present such a reweighting approach. Since this work is exploratory, we will content ourselves with a study of the quenched (or pure-glue) theory. Once the technique is well established, we see no obstacles in adapting it to the unquenched case (though there will be the usual increase in numerical cost associated with going from pure glue to unquenched). This paper is organized as follows: In Sec. 2 we discuss in detail the method that we use to enhance the number of instantons in the lattice simulations, namely a combination of gradient flow and reweighting. Results of the topological susceptibility of the quenched theory up to 4.1 T c obtained with this method are presented in Sec. 3. Conclusions and a discussion can then be found in Sec. 4.

Method
The statistical problem of calculating the topological susceptibility at high temperatures on the lattice is twofold. First, the quantity is expected to be physically very small which will result in most of the configurations having Q = 0. Only by collecting a huge amount of statistics, a meaningful statement about expectation values involving the topological charge can be made. This problem becomes more severe as one goes to higher temperatures. And in addition, the algorithm itself used to generate the sample (usually heatbath/overrelaxation or HMC) tends to get stuck in the different topological sectors, with tunneling events between sectors more and more suppressed as one takes the continuum limit. Here we show how to use reweighting to overcome both problems.

Definition of Reweighting
Our reweighting approach is an evolution of those in Refs. [23][24][25][26]. Since on the lattice the topological charge is not restricted to integer values, rare events exist that will enable tunneling between different sectors. The goal is then to enhance those and use them to generate a sample of configurations almost homogeneously distributed across the topological sectors of interest. At the same time, it is mandatory to be able to know by how much they were enhanced so that this effect can be removed at the end without losing the statistical power. In the following, we describe one way of achieving this for the case of pure SU(3) Yang-Mills theory with periodic boundary conditions. The non-perturbative approach of lattice gauge theory is based on a stochastic evaluation of the partition function if the ensemble is distributed according to the modified probability distribution Notice that it is guaranteed that Eqs. (2.3) and (2.5) yield identical results for any choice of the reweighting function W as long as our algorithm converges to Eq. (2.6) and N → ∞.
The argument ξ is an arbitrary set of reweighting variables which need to be measured on each produced configuration.

Choice of Reweighting Variable
If chosen correctly, reweighting variables can account for a clear distinction between different phases and therefore favor or suppress certain sectors in Monte Carlo space. A natural choice is then the topological charge itself: (2.7) HereF µν (x) is a lattice discretized form of the field strength. The conventional choice is the "clover" value (the average over the four square plaquettes touching the point x), but we use an a 2 -improved choice composed of a linear combination of squares and 1 × 2 rectangles [27], specificallŷ However, using Q directly on the original configuration actually fails, because the topological density contains high-dimension operator corrections which are not topological and which receive large random additive contributions. The solution is well known; we should apply some amount of gradient flow [28,29] to remove the UV fluctuations responsible for this problem. We therefore define our single reweighting variable ξ as where t denotes a relatively small amount of Wilson flow. Specifically, we choose t to be enough Wilson flow that topology-1 configurations are clearly distinguished from random fluctuations, but not enough to remove "dislocations," small concentrations of topological charge which are the intermediate steps between the Q = 0 and Q = 1 sectors. Therefore, Q is able to distinguish between fluctuations about the Q = 0 sector, dislocations which lie between topological sectors, and genuine Q = 1 configurations. The true topological charge of the configuration is denoted by Q and is measured after a larger amount of flow. We shall come back to this distinction in Sec. 2.5.

Updating with an Arbitrary Weight Function
Next we describe the Markov chain algorithm whose equilibrium probability distribution is assuming that the function W (Q ) is already known. Although it is common practice to use the heatbath/overrelaxation algorithm in the context of pure gauge theories, the hybrid Monte Carlo algorithm (HMC) supports a conceptually simple fermionic extension, so we will use it instead. One of the simplest ways of producing a sample according to a given probability distribution is to use a Metropolis-inspired algorithm. This algorithm fulfils detailed balance and therefore the Markov chain has an equilibrium distribution to which the system converges if enough updates are done. After having evolved the Hamilton equations as part of the standard molecular dynamics evolution, an accept/reject step accepts the configuration with probability P HMC = min 1, e −∆H , (2.11) where ∆H = H f − H i is the energy difference (the subscripts 'i' and 'f' refer to 'initial' and 'final', respectively). It is given by . This step alone is of course not sufficient for incorporating reweighting. Therefore, the configuration cannot be fully accepted yet. An additional reweighting accept/reject step in terms of Q accepts the configuration with probability where ∆W = W f − W i . In total, the transition probability P (C i → C f ) is given by The probability P G (π i ) ∼ e − 1 2 π 2 i with which the conjugate momenta are chosen is drawn Gaussian as usual. The probability P M refers to the Molecular dynamics evolution which is a deterministic process. Therefore, P M can be seen as a δ-function that evolves the fields from (π i , U i ) → (π f , U f ) with unit probability. Our procedure can be summarized as follows: 2.1 If accepted, store the candidate configuration.
2.2 If rejected, return to the initial current configuration (go to step 1).
3. If 2.1 is true, integrate the flow equation up to flow time t and perform a Metropolis step in terms of ∆W (Q ).
3.1 If accepted, return to the unflowed candidate configuration and fully accept it.
3.2 If rejected, return to the initial current configuration (go to step 1).
We have described the algorithm assuming a known reweighting function W (Q ). In the next section we shall address the question of how to choose W (Q ) such that the final sample is, in the best case scenario, homogenously distributed across topological sectors.

Building the Reweighting Function W (Q )
As explained in the previous section, the a priori knowledge of the reweighting function W is mandatory to implement reweighting. In this section we describe how to find an optimal choice in a completely automated way. Our approach is similar to Refs. [25,26].
We perform two HMC Markov chains; one to determine W and one to apply the (now fixed) W function to perform our actual Monte Carlo study. Here we describe the preparatory Markov chain which determines W . This preparatory run consists of reweighting updates as described in Subsec. 2.3 with the only difference that the function W is updated after each trajectory. In this way, we are able to force the system to visit certain sectors in Monte Carlo space that are rare and, at the same time, avoid those that already were visited quite often.
First, we need to define a reweighting domain Ω rew . This is the interval in Q where W will account for reweighting. A natural choice is (−Q max , Q max ), where Q max is the highest integer value corresponding to the highest topological sector that we want to include in the reweighting sample. Since we are ultimately interested in Q 2 , we can make use of the symmetry Q → −Q and redefine the reweighting variable as the absolute value ξ = |Q |. In this way our reweighting domain is (2.14) We divide this domain into N int intervals, 0 < Q 1 < Q 2 < ... < Q N int = Q max , and we name the interval between Q i and Q i+1 , ω i . We define W (Q ) by giving it definite values at each Q i and interpolating linearly between these values; that is, W (Q ) is taken as piecewise linear. The last interval, In other words, values above the top edge of our reweighting domain are not rejected; they are just not reweighted any higher than the boundary value of the domain. To summarize, Having defined our reweighting function in the domain of interest, we can start making reweighting updates. After letting the system thermalize with ordinary HMC updates, we begin building the reweighting function with reweighting updates. We start with a constant function W (Q ) ≡ 1. 1 Our philosophy is that, whatever value of Q we currently have, this value is presumably oversampled, and should be made less common by reducing W (Q ) at the current value. Because W is piecewise linear, the most local change we can make is to change the values at the two edges of the current interval. If Q ∈ ω i with i = 0, only the corresponding values W i and W i+1 are changed according to while the rest of the function remains unaffected. This procedure is illustrated in Fig. 1.
The first interval, Q ∈ ω 0 , is a special case. We need to remember that Q is strictly positive and therefore W 0 will get updated less than the rest of the points. We correct for this via The value of s controls by how much W changes each update. As soon as the gross features of W arise, we decrease its value to slowly reach convergence. In order to do so it is instructive to introduce the notion of complete sweep. We refer to a complete sweep when the reweighting variable |Q | ranges from ω 0 to ω N int −1 AND back to ω 0 . We count the number of updates needed to accomplish this and name it M . There is no need that it visits all intervals. The combination δW = sM/N int tells us how much in average one point of W has been changed during the completion of the last sweep. After each complete sweep we compute δW and reduce s to s → max s /2, s 1 − δW . Therefore a sweep which changes a point in W by of average more than 1.5 will lead to s being cut in half, while a sweep which changes a point in W by a smaller average amount will result in reducing s less. After the value of s has been updated, one resets the counter of M back to zero waiting for the next completed sweep to appear and repeats the process.
Eventually, after several sweeps, once the gross features of W have arisen, δW will get small since s is being consistently lowered, and the value of M also should get smaller since less trajectories are needed for a completed sweep to appear (that is the whole idea of this update). We consider the procedure to be complete and W (Q ) to be ready for use in a Monte Carlo study when δW < 0.1. An animated GIF of how W (Q ) evolves in this process is included in the additional materials.
Our approach bears some similarities to the "metadynamics" approach [30][31][32] which has also been considered (but to our knowledge not yet implemented) for this problem. The main difference is that, in metadynamics, the W (ξ) function is included as part of a "force" term in an HMC evolution, whereas we implement it purely through a Metropolis accept-reject step. The force-term approach is more efficient since HMC trajectories can be longer and because the acceptance rate is higher. But a Metropolis implementation is much more flexible in terms of the variables ξ which can be used; they need not have a simple closed-form field derivative. 2 There are also some differences in the form of W (ξ) and in the way W (ξ) is updated. The biggest difference is that we perform a separate Markov chain with W (ξ) fixed, rather than trying to extract information from the trajectory where W (ξ) is still being tuned. This makes the statistical interpretation of our results more robust.

Parameters to Tune
The procedure described in the last section allows for the automated determination of W (Q ), allowing for an efficient reweighting. But several parameters are still to be determined, and we found in practice that a certain amount of hand-tuning was needed to select them.
First, there is the depth of gradient flow to use in establishing Q . (In practice we actually used stout smearing [33] with step-size 0.06 as our gradient flow algorithm. This would be totally inadequate if our goal were a precision study of flowed operator expectation values, but here it is only important to suppress UV fluctuations, so a more efficient if less careful implementation of flow should be adequate.) We found that t = 0.24 is insufficient to separate configurations of different topology, while t = 0.42 is enough; larger amounts of gradient flow start to destroy the dislocations, which makes it more difficult to find the configurations intermediate between topological sectors. Optimally, one should perform several beginning-to-end determinations of χ, each on the lattice and temperature, but each time using different t values, to do a systematic study of which choice leads to the highest statistics for a fixed computational effort; but we have not done this.
Second, there is the choice of the number and location of intervals. We found 20 intervals to be adequate except that there were two "corners" in the W (Q ) function where its slope rather abruptly changes, see Fig. 3. These appear to be the points where the dominant type of configuration changes (regular thermal configuration to dislocation, dislocation to full sized caloron), and the Monte-Carlo simulation tends to get stuck at these points. We partly cured this by using more, narrower intervals at these points, which handles finer structure in the reweighting function and also leads to the algorithm spending more time near these points. Unfortunately, this requires a little hand-tuning, working against the idea of a fully automated determination.
Next, there are the details of the parameter s which controls how fast we adjust the W (Q ) function. We tried variations on the procedure described above and found little change to the efficiency with which a good W (Q ) is generated. In any case, if high statistics are desired, the Monte Carlo with fixed W (Q ) takes most of the computational effort.
Next, there is the length of molecular-dynamics time used in the HMC algorithm updates. A larger HMC step leads to a larger change in the configuration, which is good because it more efficiently explores the phase space. But it leads to larger changes in Q value and therefore to a higher rejection rate. So the HMC trajectory length needs to be tuned to provide about 50% acceptance rate in the e ∆W acceptance step. Again, 50% is a rule of thumb; a more careful analysis would compare the total achieved statistics at fixed numerical effort as a function of HMC trajectory length. To date we have not carried out such a study, and have instead used the 50% rule of thumb. Our results in what follow used HMC trajectory lengths of 0.2 − 0.25 a. We are well aware that such a small trajectory length will result in big autocorrelation effects between configurations. We have taken special care in providing a reliable error estimate by making a careful error analysis based on binning and jackknife [34].
Finally, there is the choice of the final observable used to determine χ(T ). Every 100 HMC trajectories, we make a measurement of Q which we use in our statistical analysis of the susceptibility. We use the value of |Q| after some amount t of gradient flow, thresholded to be 1 if |Q| ≥ Q thresh and 0 if |Q| < Q thresh . (Configurations with |Q| > 1 are very rare at the temperatures and for the volumes of interest, as we will establish in the next section.) This leaves open the exact choice of t, of Q thresh , and of gradient flow procedure (Wilson versus Zeuthen [35]). All choices should lead to the same continuum limit and it is not expensive to sample using various choices and compare. This is what we will do; any difference between χ(T ) values due to different threshold or flow depth will indicate deficiencies in our lattice spacing, and must be seen to vanish when we take the continuum limit.

Results
Our goal in this paper is to demonstrate our method and show that it can obtain statistically powerful results at high temperatures in a range of lattice spacings and volumes. With this in mind, we study 10 different lattices, as listed in Tab. 1. We use the Wilson gauge action at two temperatures corresponding to 2.5 T c and 4.1 T c ; the values of β are taken from Refs. [36,37]. At the higher temperature we consider aspect ratios between 1:1 and 4:1 with N τ = 8 and at both temperatures we consider lattice spacings with N τ = 6, 8, 10 with an aspect ratio of about 2.5:1. This allows one study of the volume scaling, and lets us take the continuum limit, but it is not sufficient to consider both limits simultaneously. All calculations were carried out over a six month period on one eight-core desktop machine and one server node with two Xeon-Phi (KNL) CPUs. By modern standards this is an extremely modest computational budget. The first question is: is it sufficient so sample only Q = 0 and |Q| = 1 sectors, or are larger values of |Q| also important in establishing the topological susceptibility? To study this, first look at Eq. (2.5) and consider what happens when we use a thresholded Q as the observable: where in the numerator we only have to sum over |Q| = 1 and higher configurations since Q = 0 does not contribute, while in the denominator we only sum over |Q| = 0 because they completely dominate the ensemble. Clearly we need both Q = 0 and |Q| = 1 configurations to perform the calculation; but if our accuracy goal is 10%, then we only need |Q| = 2 and higher if the total probability to be in one of these states is at least 2.5% of that for |Q| = 1 states. Therefore we carried out the construction of W (Q ) in the domain 0 ≤ |Q | ≤ 2, shown in Fig. 2, for the lower temperature we study and a 6 × 16 3 lattice (Lattice A 1 ). We see immediately from the figure that |Q | > 1. our t values the |Q| = 1 values all have |Q | > 0.7 and |Q| = 2 values all have |Q | > 1.5, so this means that |Q| = 2 configurations are suppressed relative to |Q| = 1 configurations by about e −9 . Therefore |Q| = 2 plays a tiny role in Eq. (3.1) and can be safely ignored. In a larger volume, an instanton gas estimate says that the |Q| = 1 configurations should get more common with V and the |Q| = 2 configurations should get more common with V 2 . So |Q| = 2 would start to become relevant in a box with an aspect ratio of about 15. Such enormous lattices are not needed to study χ(T = 2.5 T c ), and so we do not need to consider |Q| ≥ 2. This conclusion only strengthens for larger T where the susceptibility is still smaller. Obviously, at some lower temperature it will break down and we will need many topological sectors; so every time we go to lower temperatures we must revisit this issue.
We proceed to compute the reweighting function W (Q ) using Q max = 1. Two examples are shown in Fig. 3. In each case there is a deep minimum at Q = 0 corresponding to ordinary Q = 0 configurations and a much shallower minimum near |Q | = 1, corresponding to |Q| = 1 configurations. The broad plateau in between can be understood as configurations containing a dislocation. The sharp features in the 4.1 T c plot are caused by our abruptly adjusting the width of our intervals. At finer lattices (larger N τ ) the |Q| = 1 minimum becomes deeper (or more accurately, the barrier gets higher), as the size of physical calorons becomes more different from the lattice spacing.
With the W (Q ) reweighting functions in hand, we proceed to evaluate the topological susceptibility via Eq. (3.1). For completeness, we present all of our results in Tab. 2.  Lat 11.16 (14) 11.24 (14) 11.07(13) 11.12 (14) 11.15 (14) 11.24(14) B 3 11.76 (17) 11.80 (17) 11.72 (17) 11.74 (17) 11.76 (17) 11.80(17) First, consider χ as a function of aspect ratio, shown in Fig. 4. The figure evaluates Q after t = 2.4a 2 of improved (Zeuthen) gradient flow, and considers three different values for the threshold to distinguish between |Q| = 1 and Q = 0: Q thresh = 0.5, 0.7, and 0.9. The figure shows that, as expected, aspect ratios smaller than 2 are badly discrepant; but the difference between an aspect ratio of 2 and 4 is of order tens of percent, and is not statistically significant. Therefore in this exploratory study, we will only consider the continuum limit for an aspect ratio of about 2.5. The figure also shows that although the value of Q thresh introduces a systematic effect (the lower the threshold, the higher the determined χ value), this effect is statistically irrelevant already at LT < 1.    Fig. 6 (note that the N τ = 8 lattice has a slightly different aspect ratio than the N τ = 6, 10 lattices; smaller for 2.5 T c and larger for 4.1 T c ). At the higher temperature, we show results for two flow depths (t = 1.2a 2 and t = 2.4a 2 ) and two choices of flow action (Wilson and Zeuthen). The different Q definitions differ significantly for N τ = 6 (note that the determinations use the same Markov chain, so the errors are highly correlated and the difference is statistically very significant) but are nearly indistinguishable for N τ = 10; so issues of topology definition are seen to become small on fine lattices. The choice of topology definition is irrelevant in the continuum limit if we extrapolate in terms of ln(χ).
However, if we attempt to extrapolate χ(T, a 2 ) linearly against a 2 , we get very poor behavior. At T = 2.5 T c the two continuum limits, based on extrapolating ln(χ) and extrapolating χ directly, differ by more than their error bars. And at T = 4.1 T c , the linear extrapolations of χ(T ) using different definitions of topology are incompatible, and each definition leads to a negative extrapolated value, which is clearly unphysical. On the other hand, if we perform a linear extrapolation of ln(χ) against a 2 , the different definitions of topology produce compatible results, which are finite and physical. The reason that one should extrapolate in ln(χ) and not in χ directly, as we understand it [27], is that the topological susceptibility is controlled by the exponential suppression of the caloron action exp(−S caloron ) = exp(−8π 2 /g 2 (µ ∼ T )). This action receives multiplicative O a 2 lattice corrections: χ ∝ exp(−S) → exp(−[1 − O(a 2 T 2 )] S). That is, the a 2 corrections are best viewed as a shift in the caloron action and therefore in the logarithm of the susceptibility. Therefore an extrapolation of ln(χ(T )) in terms of a 2 is better justified, and better behaved. Indeed, Fig. 3 shows that S caloron is about twice as large at T = 4.1 T c than at T = 2.5 T c ; so the slope of the extrapolation should be twice as large in the left panel of Fig. 6 as in Fig. 5, which it is. Therefore this picture of the nature of a 2 errors is consistent with our findings, and an extrapolation of ln(χ 2 ) against a 2 is the theoretically best motivated way to extrapolate to the continuum.

Discussion
We have presented a methodology for applying reweighting [23] to the measurement of topology in high temperature pure-glue SU(3) QCD. Our approach involves reweighting in terms of a "poor man's" topological measurement Q (Q measured after a small amount of flow t = 0.42a 2 and using an a 2 improved topological density operator). There is then a two-stage simulation; first we simulate while dynamically changing our reweight function, to determine the form of the reweight function. Then we fix the reweight function and perform a Monte Carlo simulation to determine the topological susceptibility.
The method is effective; with modest numerical resources we are able to treat T = 4.1 T c up to an aspect ratio of 4 and up to a lattice spacing with N τ = 10, obtaining good statistics. Making a full continuum extrapolation but at a modest aspect ratio of 2.5 (extrapolating the t = 2.4a 2 , Zeuthen flow results), we find Our results at individual N τ values are consistent with previous studies; our A 1 lattice gives the same susceptibility as found by Berkowitz et al [13], and our results at 4.1 T c and N τ = 6, 8 (lattices B 1 and B 2e ) appear compatible with those at 4.0 T c from Borsanyi et al [17], who have significantly larger statistical errors despite applying much more numerical effort. Those authors also provide a continuum-extrapolated functional fit for χ(T ), which is in reasonable agreement with our results; applying their fit form to the temperatures we studied, we obtain χ(2.5 T c ) = 1.9 × 10 −4 T 4 c and χ(4.1 T c ) = 5.6 × 10 −6 T 4 c . Our results teach a few other lessons. On a lattice with N τ = 6, χ is sensitive to the exact definition of topology (depth of flow, flow action, threshold). This dependence is nearly gone by N τ = 10 and seems not to affect the continuum extrapolation. The continuum extrapolation should be performed in terms of ln(χ), not in terms of χ itself. The continuum extrapolation corrections to ln(χ) can be large and are larger at higher temperature. None of these lessons should be surprising [27].
We should not claim that our technique solves all problems, however. Looking at the Q value as a function of measurement number for a short portion of a Markov chain evolution, shown in Fig. 7, we see that despite our reweighting function, there are a few points where the simulation gets "stuck. 3 " It moves easily in the range 0 < |Q | < 0.27 and similarly moves easily across 0.2 < |Q | < 0.85; but it has difficulty moving from one of these ranges to the other. There is a similar "barrier" around |Q | = 0.8. These problems become more severe as we move to larger N τ . We believe that this occurs because Q is an incomplete descriptor which is missing some other information which distinguishes between The reweighting allows efficient sampling in three regions, 0 < |Q | < 0.27, 0.20 < |Q | < 0.85, and 0.8 < |Q | < 1.05, but has difficulty moving between these regions. these regions. We partly overcame this problem by making more, narrower reweighting bins in these overlap regions; our reweighting procedure causes the Markov chain to spend approximately equal time in each bin, so narrower bins cause more time to be spent in these regions, which helps the Markov chain to find the way between the different regions. (This is the reason for the cuspy discontinuities in Fig. 3.) However, while this helps, it hardly solves the problem, as Fig. 7 attests. We are searching for one or more additional observables to serve as further reweighting variables in the hopes of improving this sampling. Another inefficiency is that the number of updates needed to move between topological sectors does not improve as we increase the volume. Therefore, to achieve a given level of statistics, the numerical effort must grow linearly with the volume. We don't foresee any solution to this problem.
Conceptually there are no obstacles to applying our technique to the unquenched case. However, we expect doing so to be numerically more difficult. First, the HMC algorithm requires far more computer power with fermions. Second, the Q = 1 sector has near-zero eigenvalues, while the Q = 0 sector should have the smallest eigenvalue close to πT . The chiral limit should be severe. Third, the characteristic size of a caloron is smaller with light quarks than without [22], and therefore the lattice spacing should need to be smaller (N τ values larger) than what we need in pure glue. But we view these added numerical challenges as reasons that such studies should use reweighting; with the other challenges just mentioned, we absolutely need the improvement in statistical sampling of |Q| = 1 from reweighting if any statistical power is to be achieved at temperatures in the range of 3 T c to 7 T c .
We might hope that a similar reweighting method might help with the topologicalsector sampling problem at low temperatures but fine lattices. However, it is not clear to us that Q will be an effective reweighting variable in this case. We leave study of this problem for future work.