Topological Susceptibility of the 2d O(3) Model under Gradient Flow

The 2d O(3) model is widely used as a toy model for ferromagnetism and for Quantum Chromodynamics. With the latter it shares --- among other basic aspects --- the property that the continuum functional integral splits into topological sectors. Topology can also be defined in its lattice regularised version, but semi-classical arguments suggest that the topological susceptibility $\chi_{\rm t}$ does not scale towards a finite continuum limit. Previous numerical studies confirmed that the quantity $\chi_{\rm t}\, \xi^{2}$ diverges at large correlation length $\xi$. Here we investigate the question whether or not this divergence persists when the configurations are smoothened by the Gradient Flow (GF). The GF destroys part of the topological windings; on fine lattices this strongly reduces $\chi_{\rm t}$. However, even when the flow time is so long that the GF impact range --- or smoothing radius --- attains $\xi/2$, we do still not observe evidence of continuum scaling.

The 2d O(3) model is widely used as a toy model for ferromagnetism and for Quantum Chromodynamics. With the latter it shares -among other basic aspects -the property that the continuum functional integral splits into topological sectors. Topology can also be defined in its lattice regularised version, but semi-classical arguments suggest that the topological susceptibility χ t does not scale towards a finite continuum limit. Previous numerical studies confirmed that the quantity χ t ξ 2 diverges at large correlation length ξ. Here we investigate the question whether or not this divergence persists when the configurations are smoothened by the Gradient Flow (GF). The GF destroys part of the topological windings; on fine lattices this strongly reduces χ t . However, even when the flow time is so long that the GF impact range -or smoothing radius -attains ξ/2, we do still not observe evidence of continuum scaling.

Introduction
We are going to deal with the 2d O(3) model, a non-linear σ-model, which is also known as the Heisenberg model, or CP(1) model. It is a highly popular toy model both in solid state physics -where it describes ferromagnetsand in particle physics, where it shares fundamental features with Quantum Chromodynamics (QCD). In particular, it is asymptotically free [1], it has a dynamically generated mass gap (which was computed analytically [2] and numerically, e.g. in Ref. [3]), and in the continuum formulation its configurations are divided into topological sectors, due to Π 2 [S 2 ] = Z.
On a square lattice of unit spacing, the standard lattice action reads where xy are nearest-neighbour lattice sites. The naïve continuum limit leads to S[ e ] = β 2 d 2 x ∂ µ e(x) · ∂ µ e(x) , (1.2) but at the quantum level it is far from obvious if the continuum limit is well-defined. This is a long-standing issue, which arises in the context of topology. In particular, the crucial question is whether or not the topological susceptibility χ t exhibits a continuum scaling behaviour, i.e. whether or not the term χ t ξ 2 , which is supposed to be a scaling term, is finite in the continuum limit ξ → ∞ (where ξ is the correlation length in lattice units).
Numerous studies have addressed this question before, including Refs. [4][5][6][7][8][9][10]. After a period of confusion, the consensus seemed to be that χ t ξ 2 diverges at large ξ, which can be interpreted such that this model is -in a particular sense -ill-defined. This conclusion is consistent with studies with different lattice actions in the same universality class [11][12][13].
However, although smoothing techniques have been applied before, e.g. in Ref. [8], this question has not yet been addressed under application of the Gradient Flow (GF). Unlike ad-hoc approaches, this smoothing procedure is justified based on the renormalisation group [14][15][16]. Here we explore the fate of the term χ t ξ 2 after applying the GF.
There are other models with topological sectors, including QCD, which suffer from the same problem, if one uses a naïve lattice formulation of the topological charge density. However, in QCD this problem is less severe: solutions without GF are known [17], and the GF provides another solution [18]. This raises the question if this remedy could also cure the divergence in the 2d O(3) model, which we are going to investigate. Section 2 describes the numerical tools used in this study. Section 3 comments on the semi-classical picture, and Section 4 presents our results for the topological scaling behaviour at the quantum level, based on extensive Monte Carlo simulations. We discuss the outcome in Section 5. Appendix A compares various numerical implementations of the GF. Preliminary results (with short GF times) have been published in two proceedings contributions [19].

Numerical techniques 2.1 Algorithm
Our simulations were performed with the Wolff cluster algorithm [20]. It adapts the concept of the Swendsen-Wang algorithm [21] from the Ising model to the O(N) models, where it is highly efficient. In our study we employed both the single-cluster as well as the multi-cluster variant.

Topological charge
Regarding the topological charge of a lattice configuration, we applied the geometric formulation, which was introduced in Ref. [4]. For periodic boundary conditions -which we assume throughout this article -it assigns an integer charge Q[ e ] ∈ Z to each configuration (except for a subset of measure zero); this formulation is reviewed e.g. in Ref. [13], Section 3. 1 In the absence of a θ-term, symmetry implies Q = 0, 2 hence the topological susceptibility takes the simple form

Correlation length
The natural scale of the system is set by the correlation length ξ, which corresponds to the inverse energy gap. It is obtained by a 2-parameter fit to the correlation function between spin averages in layers at fixed instances in Euclidean time, say x 2 and x 2 + r, with 0 ≤ r < L, where we refer to a periodic L × L lattice with sites x = (x 1 , x 2 ). This proportionality relation holds when |r − L/2| is sufficiently small. In practice we follow the recipe of Ref. [23] to determine ξ by a fit in the range L/3 ≤ r ≤ 2L/3 (varying this range leads to minor modifications of the fitting result for ξ). Our results agree with the literature, in particular with ξ-values given in Refs. [3,23,24]. In each volume V = L×L we tune β such that L/ξ ≃ 6. Hence increasing the lattice volume corresponds to a system of fixed physical size, which approaches the continuum limit. The corresponding values of β and ξ are given in Table 1 (the column for ξ at t = 0 contains the results before application of the GF). 1 Alternative definitions of the topological charge of a lattice configuration where suggested in Refs. [7,12,22]. 2 This can be seen from the topological charge density in the continuum, q(x) = ǫ µν e(x)· (∂ µ e(x) × ∂ ν e(x))/8π : a global spin flip e(x) → − e(x), which is the Z(2) subgroup of the global O(3) symmetry, changes the sign of q.  Table 1: Overview of the parameters in our study: we consider eight volumes V = L × L, in each one β is tuned such that L/ξ ≃ 6, and the GF time unit amounts to t 0 = L 2 /5760. When we apply the GF, the correlation length ξ does not change significantly up to flow time 10t 0 . Before the GF, at t = 0, ξ agrees fairly well with the second moment correlation length ξ 2 , which can be measured more precisely. Our numerical measurements are based on S ξ and S ξ 2 independent simulations for ξ and ξ 2 , respectively, where each simulation generated 10 5 configurations.
These results are based on sets of S ξ independent measurements, see Table  1, each of which involves 10 5 configurations. If we insert the errors as obtained from the fits to eq. (2.2), the independent results are not fully consistent: requiring a unique value at each L, we obtain χ 2 /d.o.f. ≃ 4.0, which shows that the errors are underestimated. 3 Therefore we amplify the errors by a factor of 2, which leads to consistency, in particular to χ 2 /d.o.f. ≃ 1.0. These extended errors are inserted into the Gaussian propagation to obtain the error of the average values given in Table 1.
Despite the sizeable statistics, the uncertainty in ξ is non-negligible; for comparison, the relative errors on χ t are much smaller, see Section 4. Therefore we also measured the second moment correlation length ξ 2 . It is obtained from the Fourier transform of the spin-spin correlation function e x · e y at zero momentum (χ m ), and at the lowest non-zero momentum (F ), where χ m is the magnetic susceptibility. ξ 2 can be measured more precisely than ξ, cf. Table 1, since it does not require any fit. In this case we have performed S ξ 2 independent measurements, each one based on 10 5 configurations (again the errors are somewhat enhanced for compatibility of the individual results). Strictly speaking, this is not the physical scale, but it is known to coincide with ξ to high accuracy: in the large-L limit the discrepancy is below 1 [25], and at L/ξ 2 ≃ 4, L ≥ 70 it is still below 1 % [13]. 4 (A systematic comparison in other models is given in Ref. [26].)

Gradient flow
The GF in the O(N) models has been formulated in Refs. [27]. In the continuum, the spin components e(x) i are altered by integrating the differential equation where ∆ is the Laplace operator, and t is the GF time (of dimension [length] 2 ), which starts at t = 0, i.e. e (0, x) = e (x) and t ≥ 0. The GF preserves the spin norm, which corresponds to the condition e · ∂ t e = 0.
The concept of the GF is based on the heat kernel K(t, x) [14][15][16], which allows us to estimate its impact range, or smoothing radius, On the lattice we deal with the spin field e(t) x , and we insert the standard lattice Laplacian, For the numerical integration of eq. (2.4) also the GF time t has to be discretised. Here we apply the Runge-Kutta method, see e.g. Ref. [28]. We first compute the gradients to all spin components at all lattice sites, then we rotate all spins simultaneously (afterwards the normalisation | e x | = 1 is re-adjusted at each site). In small and moderate volumes we used the 4-point Runge-Kutta method, with time step dt = 10 −4 . 5 In this project, the GF integration took most of the computation time. In order to handle lattice sizes up to L = 404, it was mandatory to implement an adaptive step size. We applied the Dormand-Prince algorithm [29], which gradually increases dt, if the Runge-Kutta 4-point and 5-point gradients agree to high accuracy. At long flow times this method provides a gain in computing time by several orders of magnitude: once a configuration is quite smooth, dt can be enhanced drastically without causing significant artifacts. This is discussed in Appendix A.
In order to compare results in different volumes, and therefore at different couplings, we need a GF time unit t 0 , which has to be determined by referring to a dimensional observable. Such a time unit allows for the matching of results from different couplings and volumes, and therefore for a controlled continuum extrapolation (which is not obvious for ad hoc smoothing techniques). In QCD, t 0 is usually fixed by the condition E t 2 0 = 0.3 [14] (or E t 2 0 = 0.1 for SU(2) Yang-Mills theory [30]), where the density E = −Tr[G µν G µν ]/2 serves as an observable, which is easily measurable (G µν is a lattice field strength tensor).
In Refs. [19] we have used the corresponding density in the 2d O(3) model, E = S /βV . However, this turned out to be impractical: for increasing GF time t the (dimensionless) term E t rises from 0 to some maximum and decreases again. The value of this maximum decreases as we enlarge L, so in order to capture all volumes under consideration, we had to take a small reference value like E t short 0 = 0.08 (for instance, the value 0.1 is never 5 We checked that the results coincide within the errors with those obtained at dt = 10 −5 . On the other hand, when we increase the step size to dt = 10 −3 we noticed (in a few cases) non-negligible artifacts; they typically emerge at an early stage of the GF. attained at L = 404). Thus we obtained short time units t short 0 < ∼ 0.1, and up to 6t short 0 the impact range attained at most 1.6 lattice spacings. In order to probe much larger impact ranges, of O(ξ), we now refer directly to ξ as our reference observable to fix t 0 . We define it such that 10t 0 -the longest GF time in our study -corresponds to an impact range of about ξ/2, (2.8) Table 1 contains the GF time unit t 0 , as well as the correlation length measured at 10t 0 ; we see that it hardly changes compared to t = 0. 6 An example for the GF time evolution of the correlation function C(r) of eq. (2.2) is illustrated in Figure 1: at a fixed distance r it increases under GF, but the value of ξ remains virtually unaffected. This is consistent with the fact thatx is still small compared to L,x(10t 0 ) ≃ L/12, so it does not reach out to the interval, where we performed fits to relation (2.2). The GF does, however, have the expected effect of suppressing the statistical errors in ξ (they are amplified with the factor of 2, as at t = 0, cf. Subsection 2.3).
On the other hand, after applying the GF the second moment correlation length ξ 2 increases above its value at t = 0, as illustrated in Figure 2. This property is generic; 7 it implies that ξ 2 (t > 0) cannot be used to set an (approximate) scale. Instead our results for ξ(t) justify the use of the scale ξ 2 (0) even after the GF, all the way up to t/t 0 = 10.

The semi-classical picture
Ref. [4] was the first study to show that the numerical results for the topological susceptibility χ t , based on Monte Carlo simulations of the standard action (1.1), do not seem compatible with continuum scaling, i.e. with the scaling towards a finite continuum limit, which is naïvely expected. In particular, the dimensionless term χ t ξ 2 seems to diverge in the continuum limit. 6 In the framework of finite temperature gauge theory, Ref. [31] discusses the question how long the GF time can be, before destroying physical information. Our results for ξ(t) show that -in our case -we are on the safe side, at least up to 10t 0 . 7 Note that the entire configurations contribute to the terms χ m and F , in contrast to the fits, which determine ξ within a limited range. Hence the short-distance deformation of the correlation function (see Figure 1) is likely to cause the distortion of ξ 2 . The correlation function C(r) = s x 2 · s x 2 +r , measured before and after the GF, at L = 120 (as an example). At fixed separation r, C(r) rises under GF, but the value of ξ -obtained from a fit to relation (2.2) in the interval 40 ≤ r ≤ 80 -remains practically constant.
Small topological windings, which may occur in lattice configurations with low action, were blamed for this effect; it was suspected that their dominant rôle on fine lattices prevents continuum scaling [4,5].
Ref. [6] provided a comprehensive semi-classical explanation for this behaviour. It generally considered 2d CP(N −1) models, 8 where the continuum instanton action (the minimal action within the topological sector |Q| = 1) amounts to On the lattice, a single topological winding (Q = ±1) with minimal action is denoted as a dislocation. Its action was numerically obtained as [6] S disloc = βǫ disloc , ǫ disloc ≃ 6.69 · N/2 . At the quantum level, the fate of a model depends on the balance between action and entropy. In this case, a perturbative calculation of the β-function suggests that the fate of this model depends on ǫ disloc as follows [6], ǫ disloc > 4π continuum scaling < 4π divergence in the continuum limit. (3.3) This implies that continuum scaling of χ t is safe at N ≥ 4. At N = 3 it can still be arranged for by adding non-standard terms to the lattice action [32]. N = 2, however, is a peculiar case, where ǫ inst coincides with the bound derived from the β-function. In this case, which corresponds to the O(3) model, the semi-classical picture predicts the term χ t ξ 2 to diverge in the continuum limit. This semi-classical argument is not rigorous, of course, there is no compelling reason for it to be conclusive at the full quantum level. Still, a variety of numerical studies ultimately suggested that this prediction is confirmed, cf. Section 1.
Ref. [11] applied a sophisticated lattice action, a (truncated) classically perfect action, which was constructed by means of classical block spin renormalisation group transformations. It involves couplings over several lattice spacings, which exclude dislocations with ǫ disloc < 4π, but χ t ξ 2 still was found to diverge logarithmically in the continuum limit.
Very different are topological lattice actions, in particular the constraint action, where all configurations have action 0, if the relative angles between all nearest neighbour spins is below some bound δ. 9 In this case, the dislocations are extremely degenerate, with ǫ disloc = 0. Hence one might expect a very bad divergence of χ t ξ 2 in the continuum limit, which is attained in this case by δ → 0. It turned out, however, that the divergence is still compatible with a logarithmic dependence on ξ [13].
Here this question will be revisited under application of the GF. 10 Before doing so, however, we begin with an observation about the relevance of the semi-classical picture. To this end, it is sufficient to consider modest lattice volumes, of sizes L = 24 . . . 80, with the β-values of Table 1. In each volume we selected 50 000 configurations with topological charge |Q| = 1. Figure 3 refers to the quantity ǫ = S/β (S being the lattice action (1.1)): it shows the mean value ǫ , as well as the minimum obtained in each volume. At GF time t = 0 even the minima (in this set of configurations) are orders of magnitude above the instanton and dislocation values. This suggests that -although configurations with ǫ down to ǫ disloc exist -their contribution to a typical expectation value is negligible in our settings. 11 When we apply the GF, as described in Section 2, the configurations become smoother and the action decreases, so one might suspect that now the semi-classical configurations (or at least their vicinity) become relevant. Figure 3 shows that this is not the case: even when we run the GF up to 10t 0 , the averages and minima (within a set of 50 000 configurations, at any instant t) are still more than a factor of 5 times larger than ǫ disloc .  Figure 3: The quantity ǫ = S /β measured in volumes V = L × L, at L/ξ ≈ 6. In each volume, and at each flow time t/t 0 = 0, 1 . . . 10, we used 50 000 configurations with topological charge |Q| = 1. We also show the minima of ǫ within this set. Even after a long GF, up to flow time 10t 0 , these minima are still more than a factor of 5 above the dislocation value ǫ disloc ≃ 6.69. We infer that dislocations and their vicinities hardly contribute to the statistics. Figure 3 further shows that this observation hardly depends on the volume. It raises the question how relevant the semi-classical consideration really is, since it does not refer to the statistically significant contributions (unless presumably in tiny physical volumes). Nevertheless, our goal is a direct verification of its prediction; this is the question to be addressed in the next section.

Topology under the Gradient Flow
Based on S χ sets of 10 5 configurations in each volume (see Table 2), we finally measured the topological susceptibility χ t , given in eq. (2.1). Unlike the case of ξ, the results for χ t from these S χ independent simulations are consistent within our estimated errors: in this case we obtain a ratio χ 2 /d.o.f. ≃ 0.90 (the cluster algorithm allows us to avoid topological auto-correlations).
Our results for χ t are listed in Table 2. They are averaged over all simulations, and each of their standard errors enters the Gaussian composition of the final error. The evolution under GF is illustrated in Figure 4, which shows the dimensionless term Q 2 = χ t V . In large lattice volumes, i.e. on fine lattices, we see a rapid decrease of Q 2 when the GF starts, in particular from t = 0 to t 0 (in Appendix A we will see that most of this effect happens even within a first small fraction of t 0 ). At a later stage Q 2 still keeps decreasing, but at an ever slower rate.   Table 1, based on S χ measurements with 10 5 configurations each.
At last we arrive at the discussion of the "scaling term" χ t ξ 2 . Regarding the correlation length, we rely on the property that the flow times are not excessively long, so that physical aspects are not affected, and in particular the long-range scale ξ should not change, as we argued in Subsection 2.4. In fact, our results in Table 1 confirm that the modifications of ξ are minor: in each volume, ξ(0) and ξ(10t 0 ) agree within less than 1.7σ. (It is also noteworthy that the sign of ξ(0) − ξ(10t 0 ) differs in our results from different lattice volumes, which further shows the absence of a systematic effect of the GF on ξ up to 10t 0 .) Trusting the stability of ξ, we replace it by ξ 2 (0), for which we have precise results -see Table 1 -and use it at any flow time t ∈ [0, 10t 0 ], cf.  Figure 5.
It is an unambiguous observation that -after any fixed multiple of the flow time unit t 0 that we considered -the quantity χ t ξ 2 keeps growing as we increase the correlation length; we cannot observe convergence towards a finite continuum limit. This trend is most obvious in our largest lattice volumes and at long flow times. At relatively short GF, in particular at flow time t 0 , the "scaling term" looks almost stable up to L = 80, ξ ≈ 13, but even closer to the continuum limit it turns into the (qualitative) behaviour observed at long flow times.
As a first hypothesis, we assume the asymptotic behaviour at large ξ to be logarithmic. 12 This can be expressed by the ansatz which was successful in fits to results obtained with topological lattice actions [13]. As an alternative, we consider another 3-parameter ansatz, which describes a power-law, as in Refs. [13,19]. That behaviour corresponds to the semi-classical picture of Ref. [6] (for the case of 4d Yang-Mills gauge theory, this property is worked out explicitly in Ref. [33]). We first consider the data before the GF. In this case, we perform fits over the entire range L = 24 . . . 404, so there are 5 "degrees of freedom", and we obtain at t = 0  . In all cases the logarithmic fits have a better quality, in contrast to the case t = 0. The constants a 1 , a 2 , b 1 , b 2 are positive and incompatible with 0 at any GF time up to 10t 0 . Hence our data are incompatible with continuum scaling. but this quantity is somewhat large for the logarithmic fit. 13 However, even there the uncertainties of the fitting parameters are moderate. The observation that the constants a 1 , a 2 , b 1 , b 2 are all larger than 0 (far beyond the errors) confirms that the data before GF are incompatible with continuum scaling. Figure 6 shows the constants a i (t) and b i (t) obtained from the fits to the functions (4.1) and (4.2) at GF times t = t 0 , 2t 0 . . . 10t 0 . The lower plot also shows the ratio χ 2 /d.o.f., as a measure of the quality of the fits. All fits were performed in the range L = 80 . . . 404, hence they capture 5 data points, corresponding to 5 lattice volumes. In all these cases, i.e. after the GF, the fits to the logarithmic ansatz (4.1) are superior, as we see from the lower plot in Figure 6 (this behaviour agrees with Refs. [19]).
The essential observation, however, is based on the upper plot: it shows that the constants a 1 , a 2 , b 1 , b 2 keep on being larger than zero during the GF; zero values are well beyond the errors. 14 Therefore, even after the GF our data are incompatible with a scaling of χ t ξ 2 towards a finite continuum limit.

Conclusions
There is a variety of models with topological sectors, and some of them are plagued by problems with the continuum scaling of χ t . This is not the case in the simple 1d O(2) model, where -for a multitude of lattice actionsχ t ξ exhibits a straight convergence to its continuum value of 1/2π 2 [13,34].
In naïve lattice formulations of 4d Yang-Mills gauge theory, as well as QCD, this problem appears, but there are various ways to overcome it, see Ref. [35] for pure SU(3) gauge theory, and the aforementioned Refs. [17,18] for QCD.
Regarding the 2d CP(N − 1) models, the numerical results confirm the semi-classical picture of Ref. [6] that we sketched in Section 3: no problem occurs at N ≥ 4, and at N = 3 there is a divergence, but it can be avoided by non-standard lattice actions, see e.g. Refs. [32,36]. 13 The fits to our preliminary data that we considered in Refs. [19] had a similar quality for both functions. On the other hand, in Ref. [13] we observed superiority of the logarithmic function for data obtained with the constraint lattice action.
14 In light of Figure 6, one might question the behaviour of a 2 and b 1 at short t ≤ t 0 . However, in Refs. [19] we arrived at the same conclusion also for such short GF times.
There remains the case N = 2, which is peculiar indeed: in this model, which is equivalent to the 2d O(3) model, no way around the divergent continuum limit of χ t ξ 2 is known; we have seen that not even the GF, which is a safe remedy in other models, helps in this specific case.
This does not mean that all topological terms in the 2d O(3) model are ill-defined. Even without GF, there is evidence for the opposite to hold for the following quantities: • The correlation function of the topological charge density q x , q x q y , is well-defined (i.e. finite in the continuum limit) at all separations x − y, expect for x = y. That point alone causes the divergence of χ t = y q x q y [10,13], and the situation is similar in QCD [17]. 15 • The kurtosis c 4 = (3 Q 2 2 − Q 4 )/V is a characteristic of the distribution of the topological charges (it vanishes if this distribution is Gaussian). In the continuum limit, the ratio c 4 /χ t converges to a value close to −1 [38] (which is the value of a dilute instanton gas).
• If we add a θ-term, S[ e ] θ = S[ e ] − iθQ[ e ], with −π < θ ≤ π, we obtain an expectation value Q , which does not need to vanish anymore. Therefore we now have to refer to the general expression for χ t , The expectation value Q is well-defined at any vacuum angle θ, but the function Q (θ) has an infinite slope at θ = 0. This is the picture elaborated in Ref. [39], without GF, which is sketched schematically in Figure 7. It implies that θ remains finite under renormalisation, 16 and that χ t (θ) does exhibit continuum scaling at any θ = 0.
We have seen that the picture of Ref. [39] -in particular the infinite slope at θ = 0 -seems to (qualitatively) persist under the GF. This extends our previous observation [19] to much longer flow times.
Ref. [39] did not consider this behaviour unnatural, although χ t (θ = 0) is supposed be an observable. This scenario requires the free energy F (θ) = 15 In a fixed topological sector, the correlation q x q y at large separation can be employed for an indirect measurement of χ t [37]. 16 Ref. [39] concludes that each value of θ ∈ [0, π] represents a different continuum theory. model is that its slope -which is proportional to χ t -seems to diverge at θ = 0. This picture corresponds to Ref. [39], and to private communication with A. Zamolodchikov; our results suggest that its qualitative features persist under the GF.
Here we present our numerical results, which support this scenario. We leave it to the reader to decide whether he/she considers this property as fatal for the topology of the 2d O(3) model.

A Numerical integration of the Gradient Flow
This appendix compares various implementations of the GF based on the Runge-Kutta method; for a pedagogical description of this method we recommend Ref. [28]. In particular we are going to address the performance of the Dormand-Prince adaptive step size algorithm [29].
That algorithm allowed us to handle lattices up to size L = 404 with a high statistics of 2 · 10 5 configurations, see Tables 1 and 2. In the smaller volumes we could run the GF at a fixed step size of dt = 10 −4 , and extensive tests demonstrated the consistency with the Dormand-Prince algorithm, up to L = 270. This appendix is going to concentrate on L = 404, where fixed dt = 10 −4 production runs are prohibitively expensive. Instead we refer to a sample of 100 test configurations, which were generated at β = 1.807 (the value used in our study), well thermalised and independent.
Our tests have further shown that the most delicate part of the GF is the very beginning. This is expected: possible artifacts due to the finite step size dt are most likely before the configurations become smooth. In this appendix we consider flow time t = 0 to 10 ≃ 0.35t 0 . This interval is of primary interest: we will see that most of the reduction of Q 2 that we observe up to 10t 0 (see Figure 4) happens in the very first flow period.
Strictly speaking, the application of the Dormand-Prince algorithm requires two parameters: the initial time step dt 0 , and a "tolerance parameter" ε. If the gradients computed by the Runge-Kutta method with 4 points and with 5 points 17 coincide within this tolerance, i.e. the norm of their difference is below ε, then dt will be increased in the subsequent step -in the opposite case it will be decreased. 18 Regarding the initial time step dt 0 , we ran numerous tests with dt 0 = 10 −3 and dt 0 = 10 −4 : when everything else was kept fixed, we never found any difference which could be significant at our level of precision. After just a few time steps one obtains results, which are practically indistinguishable. Since this choice hardly affects the computation time, we used dt 0 = 10 −4 in our production runs, and also in the tests to be presented in this appendix. Hence our discussion focuses on the tolerance parameter ε.
We are going to compare three numerical implementations of the GF: • Fixed step size dt = 10 −4 .
First we consider the topological charges of these 100 test configurations. We checked for possible deviations when we apply these GF implementations, but they fully agree at any t = 1, 2, 3 . . . 10. Figure 8 shows the value of the Q 2 obtained from this sample. It confirms that most of the destruction of topological windings happens very early, at t < 1 ≃ 0.035t 0 . This corresponds to an impact range below 2 lattice spacings, hence it matches the picture of a quick destruction of numerous tiny dislocations (compared to the correlation length ξ ≃ 68). The topological windings that persist can either be large, or small with a structure, which resists the GF for longer flow time. We saw that these remaining windings still make the topological susceptibility diverge.   Figure 9 illustrates how dt increases when we apply the Dormand-Prince algorithm, with tolerance parameter ε = 10 −6 or ε = 10 −7 . The difference between these two scenarios is significant: in particular, at ε = 10 −6 the step size soon attains a remarkable magnitude of dt ≈ 0.25; at t ≈ 4 the configurations are already sufficiently smooth to allow for this value. (That case also confirms that, in exceptional cases, the algorithm can temporarily decrease dt, cf. footnote 18.) Since we did not observe any significant difference in the results, the use of this value of ε is highly motivated. It provides a gain in computation time by several orders of magnitude: this gain can be estimated by assuming the GF to take computation time ∝ 1/dt, although adaptive step size algorithms require some additional operations.