Slow running of the Gradient Flow coupling from 200 MeV to 4 GeV in $N_{\rm f}=3$ QCD

Using a finite volume Gradient Flow (GF) renormalization scheme with Schr\"odinger Functional (SF) boundary conditions, we compute the non-perturbative running coupling in the range $2.2 \lesssim {\bar g}_\mathrm{GF}^2(L) \lesssim 13$. Careful continuum extrapolations turn out to be crucial to reach our high accuracy. The running of the coupling is always between one-loop and two-loop and very close to one-loop in the region of $200\,{\rm MeV} \lesssim \mu=1/L \lesssim 4\,{\rm GeV}$. While there is no convincing contact to two-loop running, we match non-perturbatively to the SF coupling with background field. In this case we know the $\mu$ dependence up to $\sim 100\,{\rm GeV}$ and can thus connect to the $\Lambda$-parameter.


Introduction
The energy dependence of the strong coupling constant α s (µ) in a physical scheme provides information on how to connect the low and high energy regimes of QCD. Relating these very different domains of the strong interactions is key to providing a solid determination of the fundamental parameters of the Standard Model [1]. Lattice QCD is in principle an ideal tool for such studies. Observables defined at short Euclidean distances can be used for a non-perturbative physical coupling definition (see for example [2] and references cited therein), and its value can be extracted accurately via Monte Carlo simulations. A direct implementation of this program has to face the so-called window problem: the short Euclidean distance used to define the renormalization scale has to be both large compared to the lattice spacing a and small compared to the total size of the box (denoted by L) used in the simulation. Since the box has to be large enough to describe hadronic physics, computational constraints severely limit the range of renormalization scales that one can study.
Finite size scaling provides an elegant solution for this problem [3]. Relating the renormalization scale µ with the finite size of the box via µ = 1/L, the couplingḡ 2 (L) depends on only one scale. 1 Lattices of different volumes can be matched, allowing us to compute the step scaling function σ(u) [3]. It measures how much the coupling changes when the renormalization scale changes by a fixed factor, which we set to two, (1.1) It can be considered a discrete version of the renormalization group β-function. The exact relation is with the convention where the universal coefficients in the asymptotic expansion take the values b 0 = 9/(16π 2 ) and b 1 = 1/(4π 4 ) in N f = 3 QCD. Once σ(u) is known one can set u 0 =ḡ 2 (L 0 ) and use the recursive relation u k = σ(u k−1 ), k = 1, . . . , n s , (1.4) to relate non-perturbatively the scale 1/L 0 with the scales 2 −k /L 0 for k = 0, . . . , n s . A few iterations suffice to connect a hadronic low energy scale with the electroweak scale. This is the strategy of the ALPHA collaboration. Using the so called Schrödinger Functional (SF) scheme [4,5], QCD with N f = 0, 2 and N f = 4 quark flavors has been studied [6][7][8]. Of immediate relevance to the present work is the recent application of this technique to the high energy domain of N f = 3 QCD [1]. There the energy dependence of the strong coupling was studied between the electroweak scale and an intermediate energy scale µ 0 = 1/L 0 ∼ 4 GeV, defined byḡ 2 SF (L 0 ) = 2.012, with very high accuracy. This strategy is theoretically very appealing, but has some practical difficulties. The computational cost of measuring the SF coupling grows fast at low energies and in particular towards the continuum limit. Thus it is challenging to reach the low energy domain characteristic of hadronic physics, especially if one aims at maintaining the high precision achieved in [1]. The recently proposed coupling definitions based on the Gradient Flow (GF) [9] are much better suited for this task. The relative precision of the GF coupling in a Monte Carlo simulation is typically high and shows a weak dependence on both the energy scale and the cutoff (see [10] for a recent review and more quantitative statements). Moreover GF couplings can easily be used in combination with finite size scaling and a particular choice of boundary conditions [11][12][13][14].
In this work we use the GF coupling defined with SF boundary conditions [12] (denoted byḡ 2 GF (L)) to connect non-perturbatively the intermediate energy scale 1/L 0 with a typical hadronic scale 1/L had defined by the condition g 2 GF (L had ) = 11.31 . (1.5) The main result of this paper is the relation As the reader will see, our choices of lattice discretization and scale 1/L had are such that L had can be related with the pion and kaon decay constants by using the CLS ensembles [15]. This work therefore represents an essential step in the ALPHA collaboration effort of a first principles determination of the strong coupling constant and quark masses at the electroweak scale in terms of low energy hadronic observables [16,17]. The paper is organized as follows. In section 2 we fix our notation and introduce the details of our coupling definition. Section 3 discusses general aspects of taking the continuum limit while section 4 contains the extraction of the continuum σ(u). After arriving at our main result in section 5 we discuss our findings in section 6.

Continuum
We work in 4-dimensional Euclidean space and consider standard SF boundary conditions with zero background field [4,5]. In summary, gauge fields are periodic in the three spatial directions with period L, and the spatial components k = 1, 2, 3 of the gauge field satisfy homogeneous Dirichlet boundary conditions in time, We choose the value θ = 1/2 [18]. Defining the projectors P ± = 1 2 (1 ± γ 0 ), the time boundary conditions read P + ψ(0, x) = 0 =ψ(0, x)P − , P − ψ(T, x) = 0 =ψ(T, x)P + . (2. 3) The GF [9,19] defines a family of gauge fields B µ (t, x) parametrized by the flow time t ≥ 0 via the equation 2 is the covariant derivative, and G µν (t, x) is the field strength tensor of the flow field, Gauge invariant composite operators defined from the flow field B µ (t, x) are renormalized observables, see [20]. In particular, our definition of a running coupling follows the proposal of using the action density at positive flow time [9]. In a finite volume and with our choice of boundary conditions the running coupling was defined in [12] where N (c) is a known function [12]. Note that we use only the spatial components of the field strength tensor to define the coupling. As argued in [12] boundary effects are smaller for this particular coupling definition, while we have observed that one does not lose numerical precision. The coupling is defined by projecting to the sector of vanishing topological charge, Q = 1 32π 2 x µνρσ G a µν (t, x)G a ρσ (t, x), via the insertion of δ Q,0 into the path integral expectation values. This choice is convenient because lattice simulations with SF boundary conditions suffer from the topology freezing problem at small lattice spacing [21][22][23]. Projecting to the zero charge sector avoids this problem [23]. The renormalization scheme is completely defined by adding that we use This choice is fixed in this work, apart from section 3 where we also consider other values of c.

Lattice
For our lattice computations we work on a (L/a) 3 × (T /a) lattice with lattice spacing a. We use the tree-level improved Symanzik gauge action [24]. With S 0 and S 1 denoting the set of 1 × 1 and 2 × 1 oriented loops respectively, we have where U (C) denotes the product of the link variables U µ (x) around the loop C. Tree-level O(a 2 ) bulk improvement is guaranteed by choosing c 0 = 5/3 and c 1 = −1/12. Modifications of the gauge action near the time boundaries x 0 = 0, T lead to Schrödinger Functional boundary conditions in the continuum. We stick to option B of reference [25] and choose the weights w k (C) as follows: 3 all links in C are on the time boundary c t (g 0 ), C has one link on the time boundary 1, otherwise , (2.9a) all links in C are on the time boundary 3/2, C has two links on the time boundary 1, otherwise The improvement coefficient c t is inserted with the available one-loop precision, see section 3.1. We simulate three massless flavors of non-perturbatively O(a)-improved Wilson fermions with action where m 0 is the bare quark mass that we set to the critical value m cr . The Dirac operator can be decomposed as where D w is the usual lattice Wilson-Dirac operator, is the Sheikholeslami-Wohlert term [27] with F cl µν being the lattice clover discretized version of the field strength tensor, and finally is the contribution of the fermionic boundary counterterm [28]. We use the non-perturbatively determined c sw (g 0 ) [29]. Except at the time boundaries, our action is the same as the one used by the CLS collaboration [15]. With our choice of boundary conditions in time, the complete removal of O(a) effects requires the knowledge of the boundary improvement coefficients c t ,c t . We use their values determined in perturbation theory. As an estimate of the uncertainty of perturbation theory we use the last known term in the perturbative series, the one-loop term (cf. section 3.1). Details will be discussed later.
SF boundary conditions on the lattice are imposed in complete analogy to the continuum counterparts. The gauge links obey while the fermion boundary conditions remain the same as in the continuum, eq. (2.3). There is much freedom when translating the GF equation eq. (2.4), and the energy density used to define the coupling (see eq. (2.6)) to the lattice. Different choices differ only by cutoff effects, but these can be substantial. A popular choice is the Wilson flow (no summation over µ) is the force deriving from the Symanzik tree-level improved (Lüscher-Weisz) gauge action eq. (2.8) (see [30] for more details). We insert the correction term ∆ µ = ∇ * µ ∇ µ into the flow equation for all links (x, µ) except for those links (x, 0) where an end-point touches one of the SF boundaries x 0 = 0, T . For those links we simply The discretized observable is defined to be the action density derived from the Lüscher-Weisz gauge action (i.e. eq. (2.8)). Our choices guarantee that, neglecting small terms coming from the time boundaries at x 0 = 0, T , we do not introduce any O(a 2 ) cutoff effects neither through the flow equation nor through the definition of the observable. The remaining cutoff effects in our flow quantities are hence produced by our lattice action eqs. (2.8,2.10) and by the initial condition for the flow equation at t = 0 [30]. Although this is our preferred setup, in several parts of the work we will compare the results with the more standard Wilson flow / clover-observable discretization.
At non-zero a/L our coupling definition reads and Several comments are in order. We have chosen to define the coupling through just the magnetic part E mag of E since this choice has a lower sensitivity to the boundary improvement coefficient c t , and because its (tree-level) O(a 2 ) improvement does not need any further terms 4 . As in [12] the normalization factorN (c, a/L) is computed on the lattice with our choices of discretization (action, flow and observable), such that in the relationḡ 2 GF = g 2 0 + O(g 4 0 ) the leading term has all lattice artifacts removed. Due to the fact that we use a tree-level improved action, and neither the Zeuthen flow equation nor the Lüscher-Weisz observable discretization introduce any O(a 2 ) artifacts, we furthermore have i.e. the lattice normalization in fact only corrects sub-leading O((a/L) 4 ) terms. Finally, on the lattice one has to clarify what is meant by projecting to zero topology. We define the topological charge by [9] using the clover discretization of the field strength tensor of V . We then set √ 8t = cL with c = 0.3 and use the Zeuthen flow. With this definition the topological charge is not integer valued but approaches integers close to the continuum limit. Therefore, the Kronecker δ Q,0 of the continuum definition is replaced bŷ 3 General considerations on the continuum limit of flow quantities All studies of finite size scaling with the GF scheme show significant cutoff effects in the extrapolations of the step scaling function (see [10] and references therein). In fact one may be concerned not only by the leading 5 O(a 2 ) effects, but also by the sub-leading higher order corrections (in the present case, starting at O(a 3 )) that might lead to the wrong continuum limit. Local composite fields constructed from the flow field have a natural length scale given by the smoothing radius √ 8t, which is smaller than L by a factor c. Hence the natural expansion parameter for the cutoff effects is = a/ √ 8t = a/(cL). A first example is provided by checking the effects at tree-level. This just amounts to studying δ(c, a/L), eq. (2.20). In order to get a more general picture, we consider besides our discretization of the flow observable ("Zeuthen flow"), also the one used in many studies: Wilson flow and clover discretization of the energy density [9] (for short "Wilson flow"). We find that 6 δ(c, a/L) ∼ −0.9118 2 + 0.4867 4 Wilson flow where ∼ holds with corrections of less than 10 −4 for < 0.33 and for c in a range 0.1-0.4. One may also consider the GF coupling for twisted periodic boundary conditions [13]; the above numbers hardly change. This example not only shows that in fact the cutoff effects are predominantly a function of = a/ √ 8t, but also that the contribution of orders higher than 2 are only at the level of a few percent for < 0.3. There is a clear hierarchy of the different orders at small , say < 0.3.
Of course one has to study the situation beyond tree-level perturbation theory, and in particular the scaling properties of the lattice approximation to the step scaling function σ(u) of eq. (1.1). In order to do so, it is useful to consider the general ratio that has a natural expansion where = a/(cL) and = a/(c sL). The connection to the standard step scaling function is σ(u) = u R −1 c,c (u, 0, 2). It is again worthwhile to first consider tree-level. To this end, we temporarily replace the normalizationN (c, a/L) by the continuum one, N , in eq. (2.18); otherwise all cutoff effects are removed. With this replacement, the tree-level ratio R c,c (u, 0, s) is to a very good approximation just a function of u and the product sc , while the function A c,c (u) depends little on c, c . An inspection of our numerical data shows that this is true also at non-vanishing coupling. These properties allow us to get insight into the scaling properties of the step scaling function by considering the case s = 1 where we can use our full dataset. As we shall see in section 4, we have 5 lattice resolutions L/a = 8, 12, 16, 24, 32 at our disposal. Continuum extrapolations can involve a change of the lattice spacing of up to a factor four. Moreover these ratios can be computed even more precisely than the step scaling function, since they are evaluated on the same ensembles and one profits from the statistical correlation of the data. Figures 1 and 2 show R c,c (u, a/L, 1) for all together six different combinations, c, c and two values of u. The data originate from the simulations described in appendix A, forming first the ratios at the availableḡ 2 c and then performing a (very smooth) interpolation to the two chosen values of u. As shown in the figures, we separately extrapolate the ratios for the two different discretizations of the flow observables to the continuum limit. We use a pure a 2 ansatz for the cutoff effects in the ranges The data are compatible with the linear behavior in a 2 and the so-estimated continuum limits agree. The test is rather stringent because here the precision is higher than in the step scaling functions, which form the core observables of the rest of this paper. For the step scaling functions there is no analogy of the correlations of numerator and denominator in eq. (3.2), which enhance the precision of R c,c (u, a/L, 1). Figure 1 and figure 2 are a good confirmation that higher order cutoff effects are small, when eq. (3.4) is satisfied. Translating the bounds (3.4) to the case of the step scaling function we have Wilson flow , Zeuthen flow . We then expect the step scaling function computed using the Zeuthen flow to have only small corrections to an a 2 scaling for 2 = a 2 /(cL) 2 < 0.33. Our coarsest data set has L/a = 8 and c = 0.3, which implies 2 = 0.17. The difference in the bounds eq. (3.5) means that the more precise continuum limit is obtained for the Zeuthen flow. Despite the fact that cutoff effects for the Wilson flow are smaller, their complicated functional form makes extrapolations more difficult and less precise. In particular the coarser lattices used to determine the continuum step scaling function in the next section would have significant violations of the leading a 2 scaling if we were using the Wilson flow data.
However, one has to state that the a 2 corrections are sizable. Since neither the Zeuthen flow equation nor the evaluation of a classically improved observable introduce any a 2 cutoff effects, these remaining lattice artifacts are a consequence of the quantum corrections due to the initial condition of the flow equation at t = 0 and due to the action of the fluctuating fields in the path integral [30]. Whether there are practical ways to reduce these remaining a 2 effects substantially is an interesting problem that deserves further attention in the future.

Boundary O(a/L) effects
With our choice of SF boundary conditions eqs. (2.14, 2.3), the complete removal of O(a) cutoff effects requires not only the non-perturbative value of the coefficient c sw [29], but also the determination of the boundary coefficients c t ,c t . These are known only to one-loop for our choice of lattice action [31][32][33] and therefore we have to estimate the possible effects of higher order terms in the coupling. For this purpose it is convenient to recall that our GF coupling is defined at time-slice x 0 = T /2, and with our choice c = 0.3 and T = L the smearing radius is √ 8t = cL = 0.3T . Therefore we expect boundary effects to be suppressed, since our observable is localized at the center of the lattice, away from the boundaries. The issue was investigated in [14] with the conclusion that indeed these boundary contributions are small. Here we estimate the effect quantitatively and specifically for our observable.
We first quote the linear a-effects at leading order in perturbation theory. They are obtained by expanding the tree-level norm N in c t − 1, We have normalized by the one-loop contribution to c t , using the known c (1) t . In this way, ∆ ct Σ gives the effect in Σ if one takes as an uncertainty the one-loop term in the perturbative series of c t . As here the one-loop term is the last known one, this is exactly what we want to do in this work.
As a check on the use of perturbation theory, we performed simulations on our smallest lattice L/a = 8 atḡ 2 ∼ 4.5 with three different values of c t around the one-loop one. We found that the effective coefficient when we estimate it from a numerical derivative at our central simulation point The agreement with lowest order perturbation theory is good enough to just take eq. (3.8) as our estimate of the uncertainty.
We propagate (by quadrature) the full one-loop effect of this boundary counterterm eq. (3.8) to our error on Σ(u, a/L). Note that this effect is sub-dominant in comparison with our statistical accuracy. The corresponding uncertainty due toc t will be neglected since it is suppressed by a further power of g 2 .

Continuum extrapolations and the β-function
As already mentioned, the way to connect non-perturbatively the hadronic scale L had and the intermediate scale L 0 passes through the computation of the step scaling function. It is defined as the continuum limit of its lattice approximation, The condition m = 0 fixes the bare quark mass for each resolution a/L and each value of the bare coupling g 2 0 . The resulting function is denoted m cr (g 0 , a/L) and described in appendix A. The second condition,ḡ 2 GF (L) = u fixes g 0 for each value of u and resolution a/L considered. The doubled lattices, whereḡ 2 GF (2L) is determined, share the bare parameters with the L/a lattices.  16 3.976400 6.037 (14) 11.346(100) 1203 ∅ 11.346(124) Step scaling functions. At the specified β, we listḡ 2 (L) on the L/a-lattice obtained from the described fit as well asḡ 2 (2L) on the 2L/a-lattice. Their errors do not contain the uncertainty of c t . N ms and N Q refer to the measurements on the 2L/a-lattice Simulations with N Q = ∅ were carried out with the algorithm restricted to Q = 0. At β = 3.556470 and L/a = 16 we have two ensembles, with and without fixing the topology (both ensembles give compatible results and in columns 4,7 we quote as results the weighted avarage). The last column contains Σ(u, a/L) with u equal to the central value of column 3 and the full error obtained fromḡ 2 (L),ḡ 2 (2L) as well as the uncertainty of c t . Note that errors in columns 3 and 7 are correlated, as discussed in the text.

Strategy and data set
In practice these conditions have to be implemented by a tuning of the bare parameters such that the renormalized ones are fixed as described. We briefly explain our strategy to arrive at a precise tuning for a few appropriate values of u and the estimates of Σ.
1. The tuning of the bare mass m 0 was already carried out in [33] for the full range of bare couplings and a/L considered. In the continuum limit the chiral point of vanishing quark mass is unique; the a/L-dependence is a cutoff effect. However, in order to have a smooth extrapolation to the continuum limit, one first defines exactly which mass is set to zero at a fixed a/L and then determines the function m cr (g 0 , a/L). In the cited reference this task was carried out with high precision. As a result we can neglect any deviations from the exact critical line. The used functions m cr (g 0 , a/L) are listed in appendix A.
2. As a next step we performed 9 precise simulations with L/a = 16. These determine 9 values of u = v i , i = 1, . . . , 9, which we take as our prime targets to compute σ(v i ).
We further need values of β for L/a = 8, 12 such thatḡ 2 GF equals our target values v i . This is achieved by an interpolation of several simulations described in detail in appendix A.2. At this point we found for each L/a = 8, 12, 16 nine values of β where couplingsḡ 2 GF (L) match rather well. These β-values are listed in table 1. 3. We then carried out simulations on the doubled lattices at the same values of β, m 0 , see columns 4-6 in table 1. The data forḡ 2 GF (2L) in the table are estimates of the step scaling function Σ(u, a/L) at u =ḡ 2 GF (L) β,L/a . As our estimates forḡ 2 GF (L) we could take the numbers from the interpolation in step 2. These are simply the same as those at L/a = 16. However, in order to enhance the precision, we perform separately at each L/a = 8, 12 an interpolating fit to all available data of table 8. These fits determineḡ 2 GF (L) in table 1. Details on the very well determined interpolation are given in appendix A.2.
4. As a last step we propagate the errors ofḡ 2 GF (L) into those of Σ. As we will see in section 4.2.1 our non-perturbative data is well described by the functional form which suggests to use the derivative, ∂Σ/∂u = Σ 2 /u 2 for the error propagation. This yields the last column of table 1, where u is the central value ofḡ 2 GF without error. The difference of the errors in columns 4 and 7 is mostly due to the uncertainty of O(a) improvement, eq. (3.8); a small part of the uncertainty is also contributed by the propagated errors ofḡ 2 GF (L).
The last two rows in table 1 are from additional simulations performed with the aim of havingḡ 2 GF (2L) ≈ 11.3. They will also be useful below.

Continuum extrapolation of the step scaling function
The results at finite resolution need to be extrapolated to the continuum. It is apparent from table 1 that this is an essential step, since Σ changes by up to 20% in the accessible range of a/L -far outside the statistical errors. However, our investigation in section 3 showed that the cutoff effects are strongly dominated by the (a/L) 2 terms, which motivates extrapolations linear in this variable. Given the high precision which we achieve, this is a crucial part of this work, and a detailed analysis will follow. In particular, we first study the systematic effects in the continuum determination of σ(u) by performing independent extrapolations at 9 fixed values of u. These can transparently be illustrated by simple graphs.

σ(u) and systematic effects in the continuum extrapolations
Apart from the last two rows of table 1, the deviations ofḡ 2 GF (L) from the 9 target values v i (the ones at L/a = 16) are very small. We can therefore simply shift the data for Σ using eq. (4.3). The resulting data is shown in figure 3. Within the uncertainties, linearity in a 2 is perfect and we extrapolate by at each value v i . The quality of the fits is very good with a total χ 2 of 6.3 with 9 degrees of freedom. The fit parameters σ i , second column of table 2, are first estimates of the continuum step scaling function. It turns out that the non-perturbative results are well described by 1/σ i −1/v i ≈ −0.083 (see last two columns of table 2), which is the functional form of one-loop perturbation theory, but with a coefficient slightly different from the perturbative −0.0790. This surprising behavior holds out to σ(u) = O(10). We will come to a comparison with perturbation theory later. For now this suggests to fit also 1/Σ(v i , a/L) = 1/σ i + r i × (a/L) 2 . The quality of these fits is as good as the previous ones (χ 2 = 6.3 for 9 degrees of freedom). Discriminating statistically between the two fit forms would require far higher precision than we have. An implicit assumption behind eq. (4.4) and eq. (4.5) is that higher orders in a 2 are negligible. When this is the case, the fit-parameters σ i have to agree between the two fits (see table 2). There is agreement at the level of one standard deviation. However, the difference between the two extrapolations is of course systematic: σ i are always larger when they are extrapolated following eq. (4.5). This is also apparent in figure 3. Furthermore, when nonlinearities in a 2 are negligible, there is the more stringent condition r i = −r i /σ 2 i . As expected, we find more significant differences between these slope parameters 7 (see figure 4). Note that the difference between the functional forms of eq. (4.4) and eq. (4.5) is of order a 4 . Due to the relatively large O(a 2 ) effects, these are not negligible at large values of the coupling (at small values of u we have good agreement between both type of fits). It is this O(a 4 ) effect that produces a systematic shift in the parameters σ i , r i .
A fit of 1/σ(u) − 1/u to a constant provides a good description of our continuum data (χ 2 /dof < 1) in the whole range u ∈ [2.1 , 6.5]. Although the systematic difference between the continuum fits eq. (4.4) and eq. (4.5) was point by point in σ i below our statistical accuracy, the uncertainty in a constant fit to 1/σ(u) − 1/u is reduced by a factor 3 due to the fact that we use 9 independent values to determine it. The systematic effect then becomes clearly noticeable.

Fitting strategy
The previous considerations illustrate that the O(a 4 ) effects are not large, but still cannot simply be ignored. The size of the O(a 2 ) term, that amounts to 20% at the largest value of the coupling u max = 6.5 at L/a = 8, suggests that there the O(a 4 ) effects are around 5%. Taking into account that a u-independent term is removed by the normalization of the coupling, this translates into the rough scaling This systematic effect is negligible compared with our statistical accuracy for the lattices with L/a ≥ 12 at all values of u (in fact the differences seen in table 2 become insignificant when we perform the extrapolations with just L/a ≥ 12), but it becomes dominant at L/a = 8 and large values of u. When fitting to some particular functional form one performs a minimization of a χ 2 function, defined as where p α represent the parameters that describe the function f (x i ; p α ), and x i , y i are the independent and dependent variables, respectively. The weight, W i , of each data point, is usually taken from their uncertainty, but here we should take into account that we cannot expect our data to be more accurately described by a linear function in a 2 than ∆ sys Σ i . For the following we therefore define the weights by which strongly reduces the weights of the points further away from the continuum. Note that we distinguish the weights of the fits from the errors ∆Σ i of the data (statistical and the one due to the uncertainty in c t ), which enter the error propagation from the data to the parameters of the fit.
As an example for the consequences of introducing W i , we repeat fits eq. (4.4) and eq. (4.5). We obtain continuum values 1/σ(u) − 1/u which are still perfectly described by a constant, but now the values of the constants are −0.0824(5) and −0.0830(6), respectively. Comparing with the last row of table 2 we see that uncertainties have increased and central values are closer. Now both types of fits agree within one standard deviation.

Determination of σ(u)
As already noted, our non-perturbative data is very well described by an effective one-loop functional form. This suggests two strategies to determine the continuum step scaling function. First we can perform continuum extrapolations at constant values of u as suggested in the previous sections (eq. (4.4) and eq. (4.5)). The continuum values of σ(v i ) can then be fitted to a functional form The number of parameters n σ is varied in order to check the stability of the procedure. Second, one can also consider the possibility of combining the ansatz for the cutoff effects immediately with the parametrization of the continuum function σ(u) Apart from checking the stability of the procedure, advantages of this global fit are as follows. The shifts to common values of u for different a/L are not needed and the data in the last two rows of table 1 are easily included. Also more general forms of cutoff effects can be tried. Our investigation suggests that   4) with u 0 = 11.31 and scale factors s(g 2 1 , g 2 2 ) for g 2 1 = 2.6723, g 2 2 = 11.31 for different fits to cutoff effects and the continuum β-function. Fits are labelled by Σ or 1/Σ for continuum extrapolations according to eq. (4.4) or eq. (4.5) respectively while the parametrization of the continuum step scaling function is labelled as σ for σ(u) = u + s 0 u 2 + s 1 u 3 + u 3 nσ n=1 c n u n and labelled as Q for eq. (4.9). Fits to the β-function (eq. (4.12)) are labelled P . For global fits we specify n ρ , of eq. (4.11), while its absence indicates a fit of data extrapolated to the continuum at each value of u = v i . The weights W i refer to the definition of χ 2 , eq. (4.7).
is a good parametrization of ρ when at least n ρ = 2 terms are included. Figure 5 shows a comparison between the individual extrapolations at fixed u according to eq. (4.4) and eq. (4.5), and a global fit eq. (4.10) with n σ = n ρ = 2. We recall that all fits are performed with the weights of eq. (4.8).
A more quantitative test of the agreement between the σ obtained from different analysis is through the sequence u 0 = 11.31, u i≥1 , eq. (1.4). We collect this information in table 3. Once the polynomial is not too restricted, the results depend very little on the number of terms n σ since we use this polynomial interpolation only in the range where data are available.

Determination of the β-function
Since our main goal is the determination of the scale factor L had /L 0 (see eq. (1.6)) it is very convenient to replace the parametrization of σ(u) by a parametrization of the β-function. Namely, we write , P (g 2 ) = p 0 + p 1 g 2 + p 2 g 4 + . . . . The one-loop effective β-function just corresponds to the choice P (u) = p 0 , while higher order terms parameterize possible (obviously small) deviations useful for a more detailed analysis and an estimate of uncertainties. The step scaling function is then given by where n σ parameters correspond to n max = n σ − 2. The parameters p i , i = 0, . . . , n σ − 1 in eq. (4.12) can be obtained by fitting our data for σ(u) to eq. (4.13). Any of our previous methods to extrapolate the lattice step scaling function Σ(u, a/L) to the continuum can be used. In the case of the global fits, we make use of a further variant to parametrize the cutoff effects by fitting . (4.14) Note that this fit ansatz differs from other global fits only by terms O(a 4 ). Comparing the different approaches provides an additional check that these effects are under control (see discussion in sections 4.2.1, 4.2.2). Solving numerically eq. (4.13) for u we then compute the series of couplings u i . In table 3 we compare the results to those obtained via the parameterizations of the step scaling function. There is good agreement between different types of fits. Figure 6 shows a comparison of the β-function obtained with two different fits. Their agreement underlines that all uncertainties have been taken care of and that the small difference to the one-loop β-function is significant. At couplings g 2 ∼ 3 and larger, including the universal two-loop term, b 1 g 5 , in the β-function enlarges the difference. Therefore, perturbation theory is of little use in our range of couplings.
In the following we will use as our central result and uncertainty the fit in the last row of the   to the coupling in our GF scheme. More precisely, we define the function Recall that the SF coupling is defined with a background field, while the boundary conditions of our gradient flow scheme correspond to a zero background field. The connection between the couplings goes through the common bare parameters defined by the condition g 2 SF (L) = u, m = 0, together with the resolution a/L. We do not need the functional dependence on u, but rather just the single value ϕ(2.012). We combine the change of schemes SF → GF with a scale change by a factor of two, because this avoids the disadvantages of both schemes at the same time:ḡ 2 GF has noticeable cutoff effects when a/L is too small andḡ 2 SF needs very large statistics if L/a is too large. A last choice to make is the discretization. Here we choose the Wilson gauge action where the counter-terms (coefficients c t ,c t , see [1]) which cancel linear a effects are perturbatively known, such that they are suppressed to the negligible level of g 8 a/L. The action as well as the definition of the critical line m = 0 is exactly as in [1,34]. In fact, with the exception of L/a = 16, the numerical values of β, κ,ḡ 2 SF in table 4 are taken from there, interpolated to the fixed valueḡ 2 SF = 2.012. More details will be given elsewhere [34]. Our measurements of the GF coupling on the doubled lattices ("Zeuthen flow") are listed in table 4. The errors in the last column include the errors ofḡ 2 SF (L 0 ) (column 4 of table 4). Like for the step-scaling function in eq. (4.3), we use the derivative ∂ u Φ(u, a/L) Φ(u, a/L) 2 /u 2 for the Gaussian error propagation. The additional error does not depend very much on this particular ansatz and is subdominant, as can be also seen in table 4 and in figure 7 where the errors both before and after error propagation are shown.
The continuum extrapolation of Φ can be seen in figure 7. We also show results with the Wilson flow, but the Zeuthen flow eq. (2.18) has smaller cutoff effects. Due to the very Using our fits to the β-function, the scale factor s = L 2 /L 1 between g 2 =ḡ(L 2 ) and g 1 =ḡ(L 1 ) can be easily computed via log(s(g 2 1 , g 2 2 )) = Numbers for s from the various fits are shown in the last column of table 3. They refer to our default value g 2 2 =ḡ 2 GF (L had ) = 11.31 defining L had and g 2 1 =ḡ 2 GF (2L 0 ) = 2.6723 given by the central value of ϕ(2.012) determined above. The error of ϕ can be propagated straightforwardly, yielding L had /L 0 = 21.86 (42) .
MS = 0.0791(21)/L 0 is known [1], the last step on the way to a determination of the Λ-parameter in physical units is the computation of a physical observable of dimension mass in large volume and at the physical masses of the three quarks. This has to be combined with L had /a at identical bare couplings and extrapolated to a = 0. Passing this last milestone still needs input from the CLS ensembles [15].

Discussion
The main goal of this work was to connect the (technical) scales L 0 and L had precisely. This is one of the three steps leading to a determination of the three-flavor Λ-parameter in physical units. The precision of the result, eq. (5.6), is rather remarkable since such a scale ratio can only be determined through the running of a coupling and a step scaling strategy [3] -at least if one wants to obtain a purely non-perturbative result and a controlled continuum limit. Since couplings usually run relatively slowly it is necessary to determine this running with extreme precision in order to achieve the 2% accuracy on the scale ratio. Through the gradient flow [9] running coupling in a finite volume [12] we achieved excellent precision. However, scaling violations had to be dealt with very carefully. After applying systematic Symanzik improvement [30], they were still very significant, but we could show that they are rather accurately described by an a 2 behavior when the flow time, t, satisfies a 2 /(8t) < 0.3. Since we chose our lattice spacings small enough, we could extrapolate to the continuum with three resolutions. All in all this milestone on the way to a precise Λ-parameter has been passed.
Let us discuss also what else we have learned on the way. The behavior of the step scaling function, figure 5, is rather surprising. It follows the one-loop functional form very precisely, but with a coefficient slightly different from the universal perturbative one, out to large values of the coupling. For further details, one better considers the β-function ( figure 6). Here, the non-perturbative result is in the middle between one-loop and twoloop at our smallest coupling, α =ḡ 2 GF (L)/(4π) = 0.17. Describing this by higher order perturbation theory requires a large three-loop coefficient and therefore signals the breaking down of perturbation theory at this coupling or close by. One might consider the statistical significance at the weakest coupling in figure 6 insufficient for a strong conclusion, but the effect becomes increasingly significant at larger α. For example at α = 0.25, still a coupling where perturbation theory is routinely used, the non-perturbative running is many standard deviations away from two-loop. Perturbation theory has broken down.
This finding reinforces what we saw before in the SF-schemes, where the region below α ≈ 0.2 was studied [1]. The β-function in one of the schemes (ν = 0) discussed in [1] is close to the known three-loop one, while other schemes are significantly off. In figure 8 we plot it together with the GF-scheme used in this paper. For the GF-scheme we show only the range of couplings covered by our data. In contrast, for the SF-scheme, we show it all the way to g = 0, since the connection to the asymptotic perturbative behavior was convincingly established. The figure provides a warning that perturbation theory needs to be applied with great care in the sense that its asymptotic nature should not be forgotten. For more details we refer to [1]. The figure also summarizes well where we stand concerning the determination of Λ. "Only" the very low energy connection of the GF-scheme to the hadronic world remains to be carried through. The CLS simulations will allow us to achieve this with an estimated 1 − 1.5% precision [15,35,36]. For now, let us just mention that a combination of the rough resultḡ 2 GF (L) = 11 for β = 3.55, L/a = 16 with the lattice spacing of [15] yields L had = 1 fm. We have therefore computed the running in a range of µ from around 200 MeV to 4 GeV.  Table 5: Parameter r a determining the interval of the Zolotarev approximation (we always choose r b = 7.5), and the corresponding number of poles, N poles . We also report the measured values λ a , λ b in our runs.

A.1 Algorithms, simulation parameters, and autocorrelations
In this work we simulated with a modified version of the openQCD v1.0 package [26], using a Hasenbusch-type splitting of the quark determinant for two of our mass-degenerate quarks [38,39], and an RHMC [40,41] for the third one. Apart from boundary terms, we have the same action as CLS. The interested reader may find it useful to consult [15], where those simulations are described. Here we focus on some peculiarities of our finite volume simulations: the projection to the zero topological charge sector, the scaling of the spectral gap of the Dirac operator, and the behavior of the integrated autocorrelation times of the renormalized coupling. The latter characterize the performance of the algorithm and hence the effort which we put into the computation.

A.1.1 Algorithms
An important speed up in HMC simulations is gained by splitting the contribution of two of the quarks, (det D) 2 = det(D † D), into several factors [38], and representing each factor by a separate pseudo-fermion field. For our expensive simulations with L/a = 24, 32, we used three factors. More precisely, the splitting is characterized by the mass-parameters: µ 0 = 0, µ 1 = 0.1, and µ 2 = 1.2, in the notation of [26]. Having µ 0 = 0 means that twisted mass reweighting, which is also implemented in the package, is not used. We find that this   is not necessary, as the finite volume operator D † D has a sufficiently stable gap. We shall show results for the gap in appendix A.1.2.
A peculiar aspect of our finite volume renormalization scheme is that we are only interested in expectation values obtained in the zero topological sector (see eq. (2.6)). On the lattice, this is implemented by using the definition of the topological charge at positive flow time (see eqs. (2.21-2.22) and [42] for more information). We explored two possibilities in order to obtain these expectation values: Algorithm A: Use a standard simulation and include the termδ(Q), eq. (2.22), as part of the definition of the observable.
Algorithm B: Include the factorδ(Q) as part of the Boltzmann weight and generate an ensemble that only contains configurations with |Q| < 0.5. This is easily implemented by adding an accept/reject step after each trajectory.
In our tables we use N Q to denote the number of configurations that have |Q| ≥ 0.5, and therefore do not contribute to the determination of expectation values. The symbol ∅, instead, denotes an ensemble produced with Algorithm B. These ensembles have |Q| < 0.5 throughout. A downside of Algorithm B is that the acceptance rate can drop significantly below 1, with our lowest value being 0.65. This low acceptance rate is due to attempts of the algorithm to enter other topological sectors, and not to large violations of the HMC energy conservation. As this happens only at the coarse lattice spacings, one can usually choose an efficient algorithm between A and B for a given choice of parameters; at least in the range we considered.  Table 6: Integrated autocorrelation times measured in the simulations performed to determine the step scaling function Σ. Measurements ofḡ 2 GF (L) with L/a = 16, 24 were separated by 10 MDU's, while those at L/a = 32 are separated by 20 MDU's. Accordingly some of the determined τ int are below one in units of measurements, which introduces a (small) bias. Values marked with an * corresponds to ensembles generated with Algorithm B.

A.1.2 Rational approximation and spectral gap of the Dirac operator
The RHMC algorithm uses a Zolotarev approximation [43] in the interval [r a , r b ] for the operator R = (D †D ) −1/2 , which enters the decomposition, HereD = D ee − D eo D −1 oo D oe denotes the even-odd preconditioned Dirac operator, and 1 e is the projector to the subspace of quark fields that vanish on the odd sites of the lattice. The operators D ee , D eo , D oo , and D oe refer to the even-even, even-odd, odd-odd, and odd-even parts of the Dirac operator, respectively. The residual factor W = det(DR), instead, is considered as a reweighting factor which corrects possible (small) errors in the approximation of R; we estimate this using two random sources (cf. rhmc.pdf of the documentation of the openQCD package for more detail information). The precision of our rational approximations with parameters in table 5 is very high. Consequently the reweighting taking into account the factor W has very little effect. Figure 9 summarizes the values for the smallest, λ min a , and largest, λ max b , eigenvalues of D †D , measured during our most challenging runs (those with sizes L/a = 24, 32). More quantitative information is found in table 5. The main conclusion is that even at the largest volumes, our choice of boundary conditions ensures the existence of a gap in the Dirac operator, and with our chosen values of r a , r b the simulations are safe.

A.1.3 Scaling of autocorrelation times
Once more we focus on the more challenging simulations and discuss the scaling of the integrated autocorrelation times in our simulations with lattice sizes L/a = 16, 24, 32. Table 6 shows the autocorrelation times, determined as in [44] in molecular dynamic units, while fig. 10 indicates that they roughly follow the expected scaling with a −2 [45] at constantḡ 2 GF (L) i.e. in fixed physical volume. Even the deviations from scaling seen at the larger coupling have a plausible explanation in terms of a correction to scaling. When the lattice spacing is bigger than around 0.05 fm, the standard HMC still shows topological activity [22]. Algorithm B will therefore have a number of attempts to change topology, which increases with the lattice spacing. These attempts are vetoed by the acceptance step, reducing the acceptance rate and increasing the autocorrelations. This easily explains the three highest lying points in the figure, but of course the quality of the data is not good enough for a quantitative statement. The length of our Monte Carlo chains is always between 200 τ int and 2000 τ int . Despite the expectation that autocorrelations will eventually scale rather differently in large volume compared to our situation with Schrödinger functional boundary conditions, the longest autocorrelation times of our finite volume simulations are comparable to the longest ones observed in large volume in [15].

A.1.4 The critical lines
Since we work in a massless renormalization scheme, we need to define and know the critical line in the space of bare lattice parameters (β, κ, L/a); or equivalently (g 2 0 , am 0 , L/a) with The critical line, am 0 = am cr (g 0 , a/L), is defined by m 1 = 0, where m 1 is a current quark mass in an (L/a) 4 lattice. Making m cr dependent on L/a in this way and using the same L/a in this definition as in Σ, the cutoff effects are guaranteed to disappear as O(a 2 ) in the improved theory. Details on m 1 as well as on the many precise simulations done to find the critical lines by interpolation can be found in [33]. For completeness we here list the results needed to compute m cr . In Table 7 we provide the coefficients of the interpolating functions for the critical lines, am cr (g 0 , a/L) =   at a given value of L/a, valid for all values of g 2 0 used in this paper. These parameterizations guarantee m 1 L < 0.005. With these coefficients the reader can reconstruct the input massparameter κ cr corresponding to our simulations.
A.2 Tuning to selected couplings.
In table 8 we collect our raw data forḡ 2 GF (L) on the small lattices. As explained in the main text, we make maximum use of these data by performing smooth interpolations for L/a = 8, 12. This enables a very precise determination ofḡ 2 GF (L) for those bare parameters where we have computedḡ 2 GF (2L). At fixed L/a we fit v(β) = 1/ḡ 2 GF (L) , (A.4) to a Padé ansatz of degrees [n 1 , n 2 ], v(β) = n 1 n=0 a n β n 1 + n 2 n=1 b n β n , (A.5) and obtain predictionsḡ 2 GF (L) at the desired β from the fit and their errors from the covariance matrix of the fit parameters.
In fig. 11 we show a couple of typical fits of all the L/a = 8 data to a [4, 0] Padé and a [1,2] one. These fits have a good quality. Other fit functions were tested with the result that, once the fits have a reasonable number of degrees of freedom and a good χ 2 , the interpolated values ofḡ 2 GF (L) are entirely stable within their errors. This holds also for the L/a = 12 data. As final description of our L/a = 8, 12 data we use the