Domain wall QCD with physical quark masses

We present results for several light hadronic quantities ($f_\pi$, $f_K$, $B_K$, $m_{ud}$, $m_s$, $t_0^{1/2}$, $w_0$) obtained from simulations of 2+1 flavor domain wall lattice QCD with large physical volumes and nearly-physical pion masses at two lattice spacings. We perform a short, O(3)%, extrapolation in pion mass to the physical values by combining our new data in a simultaneous chiral/continuum `global fit' with a number of other ensembles with heavier pion masses. We use the physical values of $m_\pi$, $m_K$ and $m_\Omega$ to determine the two quark masses and the scale - all other quantities are outputs from our simulations. We obtain results with sub-percent statistical errors and negligible chiral and finite-volume systematics for these light hadronic quantities, including: $f_\pi$ = 130.2(9) MeV; $f_K$ = 155.5(8) MeV; the average up/down quark mass and strange quark mass in the $\bar {\rm MS}$ scheme at 3 GeV, 2.997(49) and 81.64(1.17) MeV respectively; and the neutral kaon mixing parameter, $B_K$, in the RGI scheme, 0.750(15) and the $\bar{\rm MS}$ scheme at 3 GeV, 0.530(11).


I. INTRODUCTION
The low energy details of the strong interactions, encapsulated theoretically in the Lagrangian of QCD, are responsible for producing mesons and hadrons from quarks, creating most of the mass of the visible universe, and determining a vacuum state which exhibits symmetry breaking. For many decades, the methods of numerical lattice QCD have been used to study these phenomena, This isospin symmetric version of QCD requires three inputs to perform a simulation at a single lattice spacing: a bare coupling constant, a degenerate light quark mass (m u = m d ), and a strange quark mass. We fix these using the physical values for m π , m K , and m Ω . In particular, for a fixed bare coupling, adjusting m u = m d and m s until m π /m Ω and m K /m Ω take on their physical values leads to a determination of the lattice spacing, a, for this coupling. All other low energy quantities, such as f π and f K , are now predictions. By repeating this for different lattice spacings, physical predictions in the continuum limit (a → 0) for other low energy QCD observables are obtained. In this work, we used results from our earlier simulations to estimate the input physical quark masses and then we make a modest correction in our results, using chiral perturbation theory and simple analytic ansatz, to adjust to the required quark mass values, a correction of less than 10% in the quark mass. These physical quark mass simulations would not have been possible without IBM Blue Gene/Q resources [1][2][3][4].
For the past decade, the RBC and UKQCD collaborations have been steadily approaching the physical quark mass point with a series of 2+1 flavor domain wall fermion simulations. Recently [5] we reported on a combined analysis of three of our domain wall fermion ensembles with the Shamir kernel, namely our 32 3  MeV partially-quenched and 171(1) MeV unitary. We refer to these as our 32I, 24I and 32ID ensembles, respectively. (The lattice spacings and other results for these ensembles quoted here come from global fits that include the new, physical quark mass ensembles, as well as new observable measurements on these older ensembles. As such, central values have shifted from earlier published values, generally within the published errors. Also, the new errors are smaller, because of the increased data.) For the latter 32ID ensembles, the use of a coarser lattice represented a compromise between the need to simulate with a large physical volume in order to keep finite-volume errors under control in the presence of such light pions and the prohibitive cost of increasing the lattice size. The DSDR term was used to suppress the dislocations in the gauge field that dominate the residual chiral symmetry breaking in the domain wall formulation at strong coupling.
The addition of this ensemble set resulted in a factor of two reduction in the chiral extrapolation systematic error over our earlier analysis of the Iwasaki ensembles alone (24I and 32I) [6], but the total errors on our physical predictions remained on the order of 2%. Now, combining algorithmic advances with the power of the latest generation of supercomputers, we are finally able to perform large volume simulations directly at the physical point without the need for such compromises.
In this paper we present an analysis of two 2+1 flavor domain wall ensembles simulated essentially at the physical point. The lattice sizes are 48 3 × 96 and 64 3 × 128 with physical volumes of (5.476 (12) fm) 3 and (5.354 (16) fm) 3 (m π L = 3.86 and 3.78). Throughout this document we refer to these ensembles with the labels 48I and 64I respectively. We utilize the Möbius domain wall action tuned such that the Möbius and Shamir kernels are identical up to a numerical factor, which allows us to simulate with a smaller fifth dimension, and hence a lower cost, for the same physics. This is discussed in more detail in Section II. The values of L s are 24 and 12 for the 48I and 64I ensembles respectively. For the 48I ensemble, L s would have to be more than twice as large to achieve the same residual mass with the Shamir kernel. The corresponding residual masses, m res , comprise ∼ 45% of the physical light quark mass for the 48I ensemble, and ∼ 30% for the 64I. We use the Iwasaki gauge action with β = 2.13 and 2.25, giving inverse lattice spacings of a −1 = 1.730(4) GeV and 2.359(7) GeV, and the degenerate up/down quark masses were tuned to give (very nearly) physical pion masses of 139.2(4) MeV and 139.2(5) MeV.
We also introduce a third ensemble generated with Shamir domain wall fermions and the Iwasaki gauge action at β = 2.37, corresponding to an inverse lattice spacing of 3.148(17) GeV, with a lattice volume of 32 3 × 64 and with L s = 12. The lightest unitary pion mass is 371(5) MeV.
Although these masses are unphysically heavy, this ensemble provides a third lattice spacing for each of the measured quantities, allowing us to bound the O(a 4 ) errors on our final results. We label this ensemble 32Ifine.
We have taken full advantage of each of our expensive 48I and 64I gauge configurations by developing a measurement package that uses EigCG to produce DWF eigenvectors in order to deflate subsequent quark mass solves, and that uses the all-mode-averaging (AMA) technique of Ref. [7].
In AMA, quark propagators are generated on every timeslice of the lattice but with reduced precision, and then corrected with a small number of precise measurements. To reduce the fractional overhead of calculating eigenvectors and the large I/O demands of storing them, we share propagators between m π , m K , f π , f K , B K , the K l3 form factor f Kπ + (q 2 = 0) and the K → (ππ) I=2 amplitude. (The last two quantities are not reported here.) By putting so many measurements into a single job, the EigCG setup costs are only ∼ 20% of the total time, and we find this approach speeds up the measurement of these quantities by between 5 and 25 times, depending on the observable.
Here again the Blue Gene/Q has been invaluable, since it has a large enough memory to store the required eigenvectors and the reliability to run for sufficient time to use them in all of the above measurements. In Section III we present the results of these measurements.
As mentioned already, in order to correct for the minor differences between the simulated and physical pion masses, we perform a short chiral extrapolation. As these new 48I and 64I ensembles have essentially the same quark masses, we must include data with other quark masses in order to determine the mass dependences. We achieve this by combining the 64I and 48I ensembles with the aforementioned 32 3 × 64 and 24 3 × 64 Iwasaki gauge action ensemble sets (32I and 24I, respectively), and the 32 3 × 64 Iwasaki+DSDR ensemble set (32ID), in a simultaneous chiral/continuum 'global fit'. We also include the new 32Ifine ensemble, to give us a third lattice spacing with the same action, to improve the continuum extrapolation. We note that these are the same kinds of fits we have used in our previous work with the 24I, 32I and 32ID ensembles -here we have the addition of very accurate data at physical quark masses. In addition, we also have added Wilson flow measurements of the scale on all of our ensembles to the global fits. While the Wilson flow scale in physical units is an output of our simulations, the relative values on the various ensembles provide additional accurate data that helps to constrain the lattice spacing determinations. In Section IV we discuss our fitting strategy in more detail and the fit results are presented in Section V.
Given the length of this paper and the many details discussed, we present a summary of our physical results in Table I as the last part of this introduction. These are continuum results for isospin symmetric 2+1 flavor QCD without electromagnetic effects. Our input values are m π , m K , m Ω , and the results in Table I are outputs from our simulations. For results quoted in the MS scheme, the first error is statistical and the second is the error from renormalization. For other quantities, the error is the statistical error. The other usual sources of error (finite volume, chiral extrapolation, continuum limit) have all been removed through our measurements and any error estimates we can generate for these possible systematic errors are dramatically smaller than the (already small) statistical error quoted. This is discussed at great length in Section V. The which for most quantities is much larger than any systematic error we can measure or estimate. The exception is for the quantities in MS andB K . For these quantities, the second error is the systematic error on the renormalization, which is dominated by the perturbative matching between the continuum RI-MOM scheme and the continuum MS scheme.
The layout of this document is as follows: In Section II we present the details of our new en-sembles, including a more general discussion of the Möbius domain wall action. The associated simulated values of the pseudoscalar masses and decay constants, the Ω-baryon mass, the vector and axial current renormalization factors, the neutral kaon mixing parameter, B K , and the Wilson flow scales, t 1/2 0 and w 0 , are given in Section III. In Section IV we provide an overview of our global fitting procedure for those quantities, the results of which are given in Section V. Finally, we present our conclusions in Section VI.

II. SIMULATION DETAILS AND ENSEMBLE PROPERTIES
Substantial difficulties must be overcome in order to work with physical values of the light quark mass. Common to all fermion formulations are the challenges of increasing the physical spacetime volume to avoid the large finite-volume errors that would result from decreasing the pion mass at fixed volume. Similarly, the range of eigenvalues of the Dirac operator increases substantially, requiring many more iterations for the computation of its inverse and motivating the use of deflation and all-mode-averaging to reduce this computational cost. For domain wall fermions it is also necessary to decrease the size of the residual chiral symmetry breaking to reduce the size of the residual mass to a level below that of the physical light quark masses. While this could have been accomplished using the Shamir domain wall formulation [8,9] used in previous RBC and UKQCD work, this would have required a doubling or tripling of the length of the fifth dimension, L s , at substantial computational cost.
Instead, our new, physical ensembles have been generated with a modified domain wall fermion action that suppresses residual chiral symmetry breaking, resulting in values for the residual mass that lie below that of the physical light quark, but without the substantial increase in L s that would have been required in the original domain wall framework.
We use the Möbius framework of Brower, Neff and Orginos [10][11][12]. Although the action has been changed, we remain within the subspace of the Möbius parametrization that preserves the L s → ∞ limit of domain wall fermions. The changes to the Symanzik effective action resulting from this change in fermion formulation can be made arbitrarily small and are of the same size as the observed level of residual chiral symmetry breaking. As discussed in Section II A, we are therefore able to combine our new ensembles in a continuum extrapolation with previous RBC and UKQCD ensembles.

A. Möbius fermion formalism
In this section and in Appendix A we describe the implementation of Möbius domain wall fermions, and provide a self-contained derivation of many of the properties of this formulation on which our calculation depends.
Of central importance is the degree to which the present results from the Möbius version of the domain wall formalism can be combined with those from our earlier Shamir calculations when taking a continuum limit. As reviewed below and in Appendix A, the Shamir  and therefore from each other, vanish in this chiral limit. Note, this equivalence in the chiral limit holds for both the fermion determinant that is used to generate the gauge ensembles (shown below) and for the 4-D propagators (shown in Appendix A) which determine all of the Green's functions which appear in our measurements and define our lattice approximation to QCD.
Thus, we expect that all details of the four dimensional approximation to QCD defined by the Shamir and Möbius actions must agree in the limit L s → ∞ and, in our case of finite L s , will show differences on the order of the residual chiral symmetry breaking, the most accessible effect of finite L s . Since this constraint holds at finite lattice spacing, we conclude that the coefficients of the O(a 2 ) corrections which appear in the four-dimensional, effective Symanzik Lagrangians for the Shamir and Möbius actions should agree at this same, sub-percent level, allowing a consistent continuum limit to be obtained from a combination of Shamir and Möbius results.
To understand this argument in greater detail, it is useful to connect the Shamir and Möbius theories in two steps. We might first discuss the relation between two Shamir theories: one with a smaller L s and larger residual chiral symmetry breaking, and a second with a larger value of L s and a value for m res below the physical light quark mass. In the second step we can compare this large L s Shamir theory with a corresponding Möbius theory that has the same approximate degree of residual chiral symmetry breaking. For example, when comparing our β = 2. 13 Shamir and Möbius ensembles, we might begin with our 24 3 × 64, L s = 16, 24I ensemble with m res a = 0.003154 (15) which is larger than the physical light quark mass. Next we consider a fictitious, L s = 48 ensemble which should have a value of m res very close to the 0.0006102 (40) value of our 48I Möbius ensemble. In this comparison we would work with the same Shamir formalism and simply approach the chiral limit more closely by increasing L s from 16 to 48. Clearly the 5× reduction in the light quark mass will produce a significant change in the theory, which to a large degree should be equivalent to reducing the input quark mass in a theory with a large fixed value of L s . Of course, there will be smaller changes as well. In addition to reducing the size of m res , we will also reduce the size of the dimension-five, O(a) Sheikholeslami-Wohlert term (whose effects are expected to be at the m res a 2 ≤ 0.1% level even for the smaller value of L s ). There will be further small changes coming from approaching the L s → ∞ limit, for example the 3% change in the lattice spacing discussed in Appendix C.
The second comparison can be made between the fictitious L s = 48 Shamir ensemble and our actual 48I Möbius ensemble with L s = 24 and b + c = 2. Since the product of L s (b + c) is the same for these two examples, the approximate sign function will agree for eigenvalues of the kernel H M which are close to zero. In fact, a study of the eigenvalues λ of H M for the Shamir normalization shows that they lie in the range 0 ≤ λ ≤ 1.367 (14) for β = 2.13. One can then examine the ratio of the two approximate sign functions, which determine the corresponding 4-D Dirac operators, over this entire eigenvalue range and show that the approximate Shamir and Möbius sign functions ε(H M ) agree at the 0.1% level. Thus, in this second step we are comparing two extremely similar theories whose description of QCD is expected to differ in all aspects at the 0.1% level. We now turn to a detailed discussion of the Shamir and Möbius operators and their relation to the overlap theory.
Our conventions are as follows. The usual Wilson matrix is where For our physical point ensembles we use a generalized form of the domain wall action [10][11][12], where and we define This generalized set of actions reduces to the standard Shamir action in the limit b = 1, c = 0, and it can also be taken to give the polar approximation to the Neuberger overlap action as another limiting case [13,14]. In all of our simulations we take the coefficients b and c as constant across the fifth dimension. This setup is well known to yield a tanh approximation to the overlap sign function. Coefficients that vary across the fifth dimension can also be used to introduce other rational approximations to the sign function, such as the Zolotarev approximation [15][16][17].
As in the Shamir domain wall fermion formulation we identify "physical", four-dimensional quark fields q andq whose Green's functions define our domain wall fermion approximation to continuum QCD. We choose to construct these as simple chiral projections of the five-dimensional fields ψ andψ which appear in the action given in Eq. (3): While there is considerable freedom in this choice of the physical, four-dimensional quark fields, as is shown in Appendix A, this choice results in four-dimensional propagators which agree with those of the corresponding overlap theory up to a contact term in the L s → ∞ limit. This choice is also dictated by the requirement that we be able to combine results from the present, physical point calculation with earlier results using Shamir fermions in taking a continuum limit. With this choice both the Möbius and Shamir theories will yield 4-dimensional fermion propagators which differ only at the level of the residual chiral symmetry breaking. The choice of physical quark fields given in Eq. (6) has the added benefits that the corresponding four-dimensional propagators satisfy a simple γ 5 hermiticity relation and a hermitian, partially-conserved axial current can be easily defined.
In practice, one solves for physical quark propagators using the linear system To find the 4d effective action which corresponds to our choice of physical fields we must first perform some changes to the field basis as follows. We write where, for now leaving a matrix Q − undefined, Then withH and H − = γ 5 D − , H + = γ 5 D + we may write We may choose Q − to place the matrix D 5 χ in a particularly convenient form as follows, and introduce the so-called transfer matrix as Here the Möbius kernel is We find D 5 χ takes the following form, for which we can perform a UDL decomposition around the top left block: Here, the Schur complement is Denoting the left and right factors as U and L(m) respectively, we write this factorization as D 5 χ = U D S (m)L(m). The determinants of the U and L(m) are unity, and the determinant of the product is simply where We can see that after the removal of the determinant of the Pauli Villars fields with m = 1 in our ensembles we are left with the determinant of an effective overlap operator, which is the following rational function of the kernel: We identify D ov as an approximation to the overlap operator with approximate sign function Note that since sgn(H M ) = sgn(αH M ) for all positive α, changing the Möbius parameters b + c while keeping b − c = 1 fixed leaves our kernel H M proportional to the kernel for the Shamir formulation. This therefore changes only the approximation to the overlap sign function, but not the form of the L s → ∞ limit of the action.
In this way, our new simulations with the Möbius action will differ from those with Shamir domain wall fermions only through terms proportional to the residual chiral symmetry breaking. In particular the change of action is not fundamentally different from simulating with a different L s .
Other, equivalent views of this approximation to the sign function are useful. Noting we see that since we have and for this reason our approximation to the sign function is often called the tanh approximation.
For eigenvalues of H M near zero, this tanh expression becomes a poor approximation to the sign function and it is for these small eigenvalues that the largest contributions to residual chiral symmetry breaking typically occur. For small eigenvalues λ of H M , the tanh approximation is a steep, but not discontinuous, function at λ = 0. Examining Eq. (24) one can easily see that which approaches the discontinuity of the sign function only as L s → ∞. The quality of the sign function approximation for small eigenvalues can be improved by either increasing L s (at a linear cost) or by increasing the Möbius scale factor α = b + c while keeping b − c = 1 (close to costfree), or both. One concludes that the scale factor b +c should be increased to the maximum extent consistent with keeping the upper edge of the spectrum of H M within the bounded region in which ε(H M ) is a good approximation to the sign function. In the limit of large L s a simulation with (b + c) > 1 will have the same degree of chiral symmetry breaking as a simulation in which that scale factor has been set to one but with L s increased to L s (b + c).
In Appendix A we continue the above review of the relation between the DWF and overlap operators, demonstrating the equality of the Shamir and Möbius four-dimensional fermion propagators in the limit L s → ∞. We also introduce a practical construction of the conserved vector and axial currents for Möbius fermions, appropriate for our choice of physical fermion fields.

B. Simulation parameters and ensemble generation
We generated three domain wall ensembles with the Iwasaki gauge action. The 48I and 64I ensembles were generated with Möbius domain wall fermions and with (near-)physical pion masses, and the 32Ifine ensemble was generated with Shamir DWF and with a heavier mass but finer lattice spacing. The results from previous fits to our older ensembles were used to choose the input light and strange quark masses to the simulations. The input parameters are listed in Table II [18], and n i are the number of steps comprising a single update of the corresponding action. The coarsest time-steps are at level 1, and the step sizes are chosen such that the total trajectory length is 1 MD time unit. More detail regarding the notation and integrators can be found in Appendix A of Ref. [5].

C. Ensemble properties
In Figure 1 we plot the Monte Carlo evolution of the topological charge, plaquette, and the light quark scalar and pseudoscalar condensates, after thermalization. We measured the topological charge by cooling the gauge fields with 60 rounds of APE smearing (smearing coefficient 0.45), and then measured the field-theoretic topological charge density using the 5Li discretization of Ref. [19], which eliminates the O(a 2 ) and O(a 4 ) terms at tree level. In Figure 2 we plot histograms of the topological charge distributions.
In Figure 3 we plot the integrated autocorrelation time for the same observables on the 32Ifine, 48I, and 64I ensembles as a function of the cutoff in Molecular Dynamics (MD) time separation, ∆ cutoff : where is the autocorrelation function associated with the observable Y (t). The mean and variance of Y (t) are denoted Y and σ 2 , and ∆ is the lag measured in MD time units. The error on the integrated autocorrelation time is estimated using a method discussed in our earlier paper [5]: for each fixed ∆ in Eq. (31) we bin the set of measurements Y (t) −Y Y (t + ∆) −Y over neighboring configurations and estimate the error on the mean · · · t by bootstrap resampling. We then increase the bin size until the error bars stop growing, which we found to correspond to bin sizes of 960, 100, For all quantities considered, we observe that the chosen bin sizes are sufficient to account for the autocorrelations suggested by Figure 3. We also observe a significant decrease in the rate of tunneling between configurations with different topological charge as the lattice spacing becomes finer, as evidenced by the long autocorrelation time on the 32Ifine ensemble.
After generating our ensembles we discovered that there are spurious correlations between U(1) random numbers generated by the CPS random number generator with a new seed. Fortunately, as discussed in Appendix G, we determined that the correlation present in the freshly-seeded RNG state was lost during thermalization, and consquently that this had no measurable effect on our thermalized gauge configurations or measurements.

III. SIMULATION MEASUREMENT RESULTS
In this section we present the results of fitting to a number of observables on the 48I and 64I en- The ability to generate physical mass ensembles forced us to seek dramatic improvements in our measurement strategy, since the statistical error for kaon observables increases with decreasing light quark mass (holding the strange quark mass fixed). For an example of this behavior, consider the kaon two-point function, which in the limit of large t goes as where .. is the average over the gauge field ensemble. The standard deviation on this quantity, i.e.
its statistical error, goes as C 2 (t) , which contains two strange quark propagators and two light quark propagators. This quantity can also be represented as a linear combination of exponentially decaying terms: where m ss is the mass of the ss state. The signal-to-noise ratio goes as exp (−[m K − (m ss + m π )/2]t) in the large time limit, and therefore decays faster with lighter pions.
The first component of our measurement strategy involves maximally reusing propagators for all of our measurements, which include m π , m K , m Ω , f π , f K , B K , f Kπ + (0) and K → (ππ) I=2 . (Note that the latter two quantities are not reported on in this document.) Reusing propagators requires choosing a common source for our propagators that remains satisfactory across the entire range of measurements. Also, since we measure both two-and three-point functions, we need to be able to control the spatial momentum of the sources in order to project out unwanted momenta.
We performed numerous studies of Coulomb gauge fixed wall sources and Coulomb gauge fixed box sources for many of these observables. (The box sources were generically chosen so that an integer multiple of their linear dimension would fit in the lattice volume, allowing us to obtain zero momentum projections by using all possible box sources.) While the box sources showed faster projection onto the desired ground state, the statistical errors on the wall sources were much smaller, such that the errors on the measured quantities per unit of computer time were essentially the same. From these studies, we chose to use the simple Coulomb gauge fixed wall sources.
In previous work on the η − η ′ mass, which involves disconnected quark diagrams, we found that translating n-point functions over all possible temporal source locations reduced the error essentially as the square-root of the number of translations [20]. The calculation of such a large number of quark propagators on a single configuration can be accomplished much more quickly by a deflation algorithm. The EigCG algorithm [21] was used for K → (ππ) I=0,2 measurements at unphysical kinematics in Ref. [22], and was adopted for this calculation. Measurements were again performed for all temporal translations of the n-point functions, and a factor of 7 speed-up was achieved. The major drawback of EigCG is the considerable memory footprint. However, BG/Q partitions have large memory and therefore this issue can be managed. In practice we found that only a fraction of the vectors generated by the EigCG method were good representations of the true eigenmodes, and in future we may be able to reduce the CG time further by pre-calculating exact low-modes using the implicitly restarted Lanczos algorithm with Chebyshev acceleration [23].
An alternative approach to generating a large number of quark propagators is to use inexact deflation. [24]. This approach had not been optimally formulated for the domain wall operator when the measurements on our new ensembles were begun. However, a new formulation of inexact deflation appropriate to DWF, known as HDCG [25], has since been developed, and has been shown to be more efficient than EigCG; this technique is now being used for our valence measurements.
The final component of our measurement package is the use of the all-mode averaging (AMA) [7] method to further reduce the cost of translating the propagator sources along the temporal direction. AMA is a generalization of low-mode averaging, in which one constructs an approximate propagator using exact low eigenmodes and a polynomial approximation to the high modes obtained by applying deflated conjugate gradient (CG) to a source vector on each temporal slice and averaging over the solutions. The stopping condition on the deflated conjugate gradient can generally be relaxed, reducing the iteration count. The remaining bias in the observable is corrected using a small number of exact solves obtained using the low modes and a precise deflated CG solve from a single timeslice for the high-mode contribution. The benefit of this procedure is that the CG solves used for the polynomial approximation can be performed very cheaply using inexact 'sloppy' stopping conditions of 10 −4 or 10 −5 as many of the low modes are already projected out exactly. The net result of combining the sloppy translated solution with the (typically small) bias correction is an exact result calculated many times more cheaply than if we were to perform precise deflated solves on every timeslice.
In order to avoid any bias due to the even-odd decomposed Dirac operator used in the CG, we calculate the eigenvectors using EigCG on a volume source spanning the entire four-dimensional volume, and the temporal slices where we perform the exact solves are chosen randomly for each configuration. We calculate low modes in single precision using EigCG in order to reduce the memory footprint, and also perform the sloppy solves in single precision. For the exact solves, we achieve double precision accuracy through multiple restarts of single precision solves, restarting the solve by correcting the defect as calculated in double precision. For the zero-momentum strange quark propagators required, we do a standard, accurate CG solve for sources on every timeslice. On the 48I, we performed our measurements using single rack BG/Q partitions (1024 nodes), calculating 600 low modes with EigCG (filling the memory) and running continuously for 5.5 days. (Note that this timing includes non-zero momentum light quark solves for measurements of f K + and additional light quark solves for K → (ππ) I=2 , which are not reported in this document.) For the 64I ensemble, the measurements were performed on between 8 and 32 rack BG/Q partitions at the ALCF and 1500 low modes were calculated by EigCG. On a 32 rack partition, the latter took 5.3 hours and the solver sustained 1 PFlops. (The EigCG setup time is efficiently amortized in these calculations by using the EigCG eigenvectors to deflate a large number of solves.) The Coulomb gauge-fixing matrices for the 64I ensemble were not computed on the BG/Q and were instead determined separately (and more quickly) on a cluster, using the timesliceby-timeslice Coulomb gauge FASD algorithm [26].
We simultaneously fit the residual mass, pseudoscalar masses and decay constants, axial and vector current renormalization coefficients (Z A and Z V , respectively), and kaon bag parameter (B lh ).
A separate fit was performed for the Ω-baryon mass. The values for these observables obtained on each lattice, as well as the statistical errors computed by jackknife resampling, are summarized in Table V. The corresponding fit ranges are summarized in Tables VI and VII. In the following sections we discuss the fit procedures and plot effective masses and amplitudes for each observable.

A. Residual mass
For domain wall fermions, the leading effect of having a finite fifth dimension is an additive renormalization to the bare quark masses known as the residual mass, m res . We extract the residual mass from the ratio where j a 5q is the pseudoscalar density evaluated at the midpoint of the fifth dimension, and j a 5 is the physical pseudoscalar density constructed from the surface fields (cf. Ref. [27], Eqs. (8) and (9) ). In Figure 4 we plot the effective residual mass, as well as the fit, on each ensemble.

B. Pseudoscalar Masses
The masses of the pion and kaon at the simulated quark masses, denoted m ll and m lh respectively, were extracted by fitting to two-point functions of the form Here the subscripts indicate the interpolating operators and the superscripts denote the operator smearing used for the sink and source, respectively. In the following we have used Coulomb gauge-fixed wall (W ) sources, and both local (L) and Coulomb gauge-fixed wall sinks. We extract the pseudoscalar meson masses by fitting three correlators simultaneously: PP LW , PP WW , and AP LW , where P is the pseudoscalar operator and A is the temporal component of the axial current.
These are fit to the following analytic form for the ground state of a Euclidean two-point correlation function: where the + (-) sign corresponds to the PP (AP) correlators, and X denotes the physical state to which the operators couple. In the following sections we use to denote the amplitude for a given correlator. The effective mass plots associated with these correlators, as well as the fitted masses, are shown in Figures 5, 6, 7, and 8.

C. Pseudoscalar Decay Constants and Axial Current Renormalization
The pseudoscalar decay constants, f π and f K , are defined in terms of the coupling of the pseudoscalar meson fields to the local four-dimensional axial current A a µ : where is formed from the surface fields q(x). In order to match this operator to the physically normalized Symanzik-improved axial operator A Sa µ , we must derive the appropriate renormalization factor, Z A . In the domain wall fermion formalism it is also possible to define a five-dimensional current A a µ which satisfies the discretized partially-conserved axial current (PCAC) relation, where ∆ − µ is the backwards discretized derivative. The factor relating this to the Symanzik current In the past, we took advantage of the fact that Z A = 1 + O(m res ) to approximate Z A as Z A /Z A , which can be computed directly via the following ratio: The 5-D current A a µ (x) is properly defined as the current carried by the link between x and x + µ, whereas the 4-D current A a µ (x) is defined on the lattice site x. The correlation functions C(t +   2 ) = ∑ x A a 0 ( x,t)π a ( 0, 0) and L(t) = ∑ x A a 0 ( x,t)π a ( 0, 0) , that one would use to compute the above ratio, are therefore not defined at the same temporal coordinate. By taking appropriate combinations of these correlators one can remove the associated O(a) error and reduce the O(a 2 ) error. Z A /Z A is then computed via the following ratio: [27] While the 1-2% m res errors associated with the above determination of Z A could be neglected in our earlier work, where we were far from the chiral limit and the statistical errors were larger than in the current work, in Refs. [6] and [28] it was shown that a better approximation could be obtained via the vector current. The local vector current operator formed from the domain wall surface fields is which is related to the Symanzik vector current V Sa µ by a renormalization coefficient Z V which was shown to be equal to Z A up to terms O(m 2 res ) [6]. There is also a five-dimensional conserved vector current V a µ for which the renormalization factor, Z V , is unity, and we can obtain a significantly better approximation to Z A by computing Z V /Z V on the lattice: Below we determine both Z A /Z A and Z V /Z V , but use only the latter to renormalize our decay constants.

Determination of Z A /Z A
We introduce a practical approach to the conserved axial current for Möbius fermions in Appendix A and Ref. [29]. For the numerical determination of Z A , the explicit construction of the current, used in Eq. (42), can be avoided with an alternate determination that utilizes the ratio of the divergences of the four-dimensional and five-dimensional axial currents: where the last equality follows from the PCAC relation, Eq. (41). We extract Z A from our lattice data using the improved ratio which is also constructed to minimize errors at O(a 2 ) [27]. The translation by 1 2 in the argument of the correlation function associated with A a µ arises from the divergence. The five-dimensional current A a µ , by contrast, is defined on the links between lattice sites, so its divergence is centered on the lattice. In Figure 9 we plot the effective Z A and fit on each ensemble.

Determination of Z V /Z V
Since the relatively noisy ρ meson is the lightest state to which the vector current couples, computing Z V accurately requires a different approach from that used for Z A (Eq. (46)). Instead, we calculate the pion electromagnetic form factors f + ll (q 2 ) and f − ll (q 2 ), defined by the matrix element where q = p 2 − p 1 is the momentum transfer. Current conservation implies f − ll (q 2 ) = 0 for all q 2 , leaving only the vector form factor, f + ll . For two pions at rest, f + ll (0) = 1, and we can fit Z V from the temporal component of Eq. (48). We fit to the ratiõ whereC WW is the pion two-point function, Eq. (37), with the around-the-world state removed using the fitted pion mass, and C WW PV P (t src ,t,t snk ) is the three-point function defined by the matrix element, Eq. (48).
On the 32Ifine and 48I ensembles, this matrix element was computed for all π − π separations, t sink − t src , that are a multiple of 4. For the 64I ensemble we computed on separations that are multiples of 5. We determine the ranges of π − π separations to use in the fit by plotting the midpoint of Eq. (49) as a function of the π − π separation on each ensemble and looking for a plateau: based on this analysis we chose to include π − π separations in the range 16-32 on the 32Ifine ensemble, 12-24 on the 48I ensemble, and 15-40 on the 64I ensemble. In Figure 10 we illustrate this method by plotting Eq. (49) for a single π − π separation included in the fit, as well as the fitted value for Z V , on each ensemble. performed using several π − π separations, not just the separation plotted here.

Determination of the Decay Constants
The light-light pseudoscalar decay constant can be computed from Z V and the amplitudes of the PP and AP correlators as and likewise for the heavy-light pseudoscalar. In Figures 11 and 12 we plot the effective ampli- associated with f ll and f lh .

D. Neutral Kaon Mixing Parameter
We compute the neutral kaon mixing parameter, B lh , from the ratio where O VV +AA is the ∆S = 2 four-quark operator responsible for the mixing: The matrix element in the numerator of Eq. we have also employed in previous calculations [5]. We determine appropriate ranges of K −K separations to include in the fit using the same procedure as described in the previous section for Z V

E. Omega Baryon Mass
We measured the Ω-baryon mass m hhh from the two-point correlator using an interpolating operator where C denotes the charge conjugation matrix. We performed measurements using both Coulomb gauge-fixed wall sources and Z 3 box (Z 3 B) sources, and, in both cases, a local (point) sink. The correlator, Eq. (55), is a 4 × 4 matrix in spin space which couples to both positive (+) and negative (−) parity states, and has the asymptotic form for large t. The fit to extract m hhh is performed by first projecting onto the positive parity component, for each source type, and then performing a simultaneous fit of both correlators to a sum of two exponential functions with common mass terms : One can also include terms proportional to e −m − (N t −t) , where m − is the mass of the ground state in the negative parity channel, to account for around-the-world contamination effects, but we find that our lattices are sufficiently large and the masses of these states sufficiently heavy that including these terms has no statistically significant influence on the fitted Ω mass. Using multiple source types and double-exponential fits to common masses allows us to reduce the statistical error on the Ω baryon mass m hhh , as well as to also fit the mass of the first excited state in the positive parity channel m ′ hhh . Figure 14 plots the effective Ω-baryon mass on each ensemble.

F. Wilson flow scales
The Wilson flow scales, t 1/2 0 and w 0 , are quantities with the dimension of length defined via the following equations: [30] and [31] where E is the discretized Yang-Mills action density, We determine the action density using the clover discretization, for which F µν is estimated at each lattice site from the clover of four 1 × 1 plaquettes in the µ − ν plane. We find that this leads to smaller discretization errors (especially for t 0 ) than estimating F µν directly from the plaquette via which is in agreement with some previous experience [30]. In Figure 15 we show an example of the interpolation of the two scales on the 64I ensemble. The final results for all ensembles are listed in Table V.

IV. SIMULTANEOUS CHIRAL/CONTINUUM FITTING PROCEDURE
The bare quark masses for the 48I and 64I ensembles were chosen based on the results for the physical quark masses at equivalent bare couplings obtained in Ref. [  the dimensionless ratios m π /m Ω and m K /m Ω are shown in Table V. Since we are not simulating electromagnetism, we compare to the following physical values: m π = 135.0 MeV, m K = 495.7 MeV and m Ω = 1.67225 GeV. Clearly our simulations are very close to the physical point, yet we must perform the very modest extrapolation in order to obtain precise physical results.

A. Summary of global fit procedure
In Refs. [5,6] we have detailed a strategy for performing simultaneous chiral and continuum 'global' fits to our lattice data. In this document we perform such fits to the following quantities: m π , m K , f π , f K , m Ω and the Wilson flow scales t 1/2 0 and w 0 . We parametrize the mass dependence of each quantity using three ansätze (where applicable): NLO partially-quenched chiral perturbation theory with and without finite-volume corrections (i.e. infinite volume χPT), which we henceforth refer to as the 'ChPTFV' and 'ChPT' ansätze respectively; and a linear 'analytic' ansatz. For the ChPT and ChPTFV ansätze we use heavy-meson χPT [32,33] to describe quantities with valence strange quarks. We use the difference between the results obtained for each ansatz to estimate our systematic errors. In order to account for discretization effects, we include in each fit form an a 2 term. As discussed in Ref. [5], we neglect higher order effects including terms in a 4 and a 2 ln(a 2 ). The fits are performed to dimensionless data, with the parameters determined in the bare normalization of a reference ensemble r. The bare lattice quark masses and data on other ensemble sets are 'renormalized' into this scheme via additional fit parameters: For an ensemble e, these are Z e l , Z e h for normalizing the light and heavy quark masses respectively, and R e a for the scale. These are defined as follows: where a is the lattice spacing andm = m + m res . The procedure for obtaining the general dimensionless fit form for a quantity Q is described in Appendix B.
We choose a continuum scaling trajectory along which m π /m Ω and m K /m Ω match their physical values. Here we include the Ω baryon mass due to the ease of obtaining an precise lattice measurement and its simple quark mass dependence. This procedure defines m π , m K and m Ω as having no lattice spacing dependence. After performing the fit, we obtain the lattice spacing for the reference ensemble by comparing the value of any of the aforementioned quantities to the corresponding physical value after extrapolating to the physical quark masses. The lattice spacings for the other ensembles are then obtained by dividing this value by R e a . An alternate choice of scaling trajectory, for example using f π in place of m Ω , would reintroduce the scale dependence on m Ω and remove it from f π ; the values of each a 2 coefficient are therefore dependent on the choice of scaling trajectory, but the continuum limit is guaranteed to be the same (up to our ability to measure and extrapolate the quantities in question). Note that the inclusion of the Wilson flow data results in significant improvements in the statistical error on the lattice spacings compared to our previous determinations due to its influence on the shared ratios R a .

B. Details specific to this calculation
Using our simultaneous fit strategy, we combine our 64I and 48I physical point ensembles with a number of existing domain wall ensembles: the '24I' and '32I' ensembles with lattice volumes 24 3 × 64 × 16 and 32 3 × 64 × 16 and Shamir domain wall fermions with the Iwasaki gauge action at bare couplings β = 2.13 and 2.25 respectively (equal to the 48I and 64I bare couplings respectively) described in Refs. [33] and [6]; the '32ID' ensembles with volume 32 3 × 64 × 32 and Shamir domain wall fermions with the Iwasaki+DSDR gauge action at β = 1.75 described in Ref. [5]; and finally the '32Ifine' ensemble with volume 32 3 × 64 × 12 and Shamir domain wall fermions with the Iwasaki gauge action at β = 2.37 described in this document. Following our earlier analyses, we use the 32I ensemble set as the reference ensemble against which the 'scaling parameters', Z l/h and R a , are defined.

Ensemble-specific parameters
As discussed in Section II, the Möbius parameters of the 48I and 64I ensembles are chosen to ensure the equivalence of the Möbius and Shamir kernels; as a result, the ensembles with the Iwasaki gauge action can all be described by the same continuum scaling trajectory, i.e. with the same a 2 scaling coefficients. As described in Ref. [5], additional parameters must be introduced to describe the lattice spacing dependence of the 32ID ensembles, which use the Iwasaki+DSDR gauge action to suppress the dislocations that enhance the domain wall residual chiral symmetry breaking on this coarse lattice.
The 48I and 64I ensembles have identical bare couplings to the 24I and 32I ensembles respectively, yet differ in their values of the total quark mass, L s and Möbius scale parameter α. The change in residual chiral symmetry breaking resulting from the changes in L s and α gives rise to a shift in the bare mass parameter of the low-energy effective Lagrangian, which we account for at leading order in our fits by renormalizing the quark masses asm = m + m res . Higher order effects such as those of order m res a 2 are small enough to be ignored. After performing this correction we might assume that the scaling parameters Z l , Z h and R a (or equivalently the lattice spacing) for the 48I and 24I ensembles should be identical, and likewise for the 64I and 32I ensembles. However when we performed our global fits we found that the 48I lattice spacing is 3.2(2)% larger than that of the 24I ensemble, and the 64I lattice spacing is 1.1(2)% larger than the 32I value. We saw no statistically discernible differences in Z l and Z h . In Appendix C we discuss how changes in the lattice spacing can arise from the changes in L s and the Möbius scale parameter α, and we provide additional numerical evidence that these effects can be well described by allowing for a change in the lattice scale between each of these pairs of ensembles. As a result we use independent R a parameters for every ensemble in our global fits, while fixing Z 48I l/h = Z 24I l/h and Z 64I l/h = Z 32I l/h in order to reduce the number of free parameters.

Weighted global fits
The fits are performed independently for each superjackknife sample J by minimizing χ 2 J under changes in the set of fit parameters c J of the function f . χ 2 J is defined as were y iJ is the J th superjackknife sample of a measurement i and x iJ are the associated input parameters (quark masses, etc). σ i is the error on the measurement, and provides the weight of each data point in the fit.
The naïve χ 2 -minimization procedure weights each data point according to just its statistical error, and is therefore unable to account for systematic uncertainties on the fit function itself. Given that NLO χPT can only be expected to be accurate to O(5%) in the 200 -370 MeV pion-mass range in which the majority of our data lies, the fits over-weight the data in this heavy-mass region resulting in deviations of the fit curve from the light-mass data. In practice the enhanced precision of the near-physical 64I and 48I data partially compensates for the larger number of heavy-mass data points, resulting in only O(1σ − 2σ ) deviations between these data and the fit curve. However, as the intention of these global fits is only to perform a few-percent mass extrapolation of our near-pristine data, such deviations are unacceptable. While this can be remedied to a certain degree by removing data from the heavy-mass region, there remains pollution from the systematic uncertainty of the fit form. We chose an alternative approach in which we force the fit curve to pass through our near-physical data by increasing the weight of these data in the χ 2 minimization as follows.
We introduce a measurement-dependent weighting factor ω i to the χ 2 determination: Note that only the relative values of ω i matter as the same parameters that minimize χ 2 will also minimize rχ 2 , where r is some common factor. (Of course the algorithm itself has some numerical stopping condition which will need to be adjusted to take into account the change in normalization of χ 2 .) In principle one could tune the relative weights based on a combination of the measured statistical error and an estimate of the systematic error of the fit function at each point, but this runs the risk of becoming too complex and arbitrary. Instead, as previously mentioned, we weight the data such that the fit is forced to pass directly through the data points on the 48I and 64I ensembles.
To achieve this, we set ω i = Ω for those data, where Ω is assumed to be large, and ω i = 1 for the remainder. This is performed independently for each superjackknife sample, and does not change the fluctuations on the data between superjackknife samples. As a result, the statistical error from the overweighted points is unchanged by this procedure. In Appendix D we demonstrate that the fit results become independent of Ω in the limit Ω → ∞ and that the procedure has the desired effect of forcing the fit through the physical point data.
For large values of Ω we must choose small values of the numerical stopping condition on the minimization algorithm, increasing the time to perform the fit and making it more susceptible to finite-precision errors. In the aforementioned appendix we determine that Ω = 5000 and a stopping condition of δ χ 2 min = 1 × 10 −4 is sufficient.

V. FIT RESULTS AND PHYSICAL PREDICTIONS
We performed global fits using the ChPTFV, ChPT and analytic ansätze. As discussed in Ref. [5], we attempt to separate the finite-volume and chiral extrapolation effects by performing the analytic fits to data that is first corrected to the infinite-volume using the ChPTFV fit results. Following Ref. [5], the ChPTFV and ChPT fits were performed with a 370 MeV pion mass cut on the data (this is set slightly larger than the value used in that paper, as we wish to include in our fit the 32Ifine data with a 371(5)MeV pion). The criteria for excluding the other fitted data are as follows: For f π we exclude the data if the pion mass with the same set of partially-quenched quark masses lies above the cut; for f K and m K data points with light valence quark mass m x and heavy mass m y , we exclude the data if the pion with m x = m y on that ensemble is above the pion mass cut; and for m Ω , t 1/2 0 and w 0 we exclude the data only if the unitary pion on that ensemble is also excluded. We consider two different pion mass cuts for the analytic fits: the 370 MeV cut used for the ChPTFV and ChPT fits, and a lower, 260 MeV cut. In our previous work we determined that the analytic fits were not able to accurately describe the data over the range from the physical point to the heaviest data, forcing us to use the lower cut. However, in the present analysis the fit predictions are dominated by the near-physical data due to the overweighting procedure, and these data require only a small, percent-scale, chiral extrapolation to correct to the physical light quark mass. This can be seen in Table VIII, in which we list the sizes of the various corrections required to obtain the physical prediction. We therefore also perform analytic fits with the 370 MeV cut, which includes substantially more data, including a third lattice spacing, that may enable a more precise determination of the dominant a 2 scaling behaviour. In practice we find the results to be highly consistent.  The predicted values of the lattice spacings and (unrenormalized) physical quark masses obtained using the ChPTFV ansatz are listed in Table X alongside the correlated (superjackknife) differences between those and the results for the other ansätze. A similar listing of the physical predictions can be found in Table XI. The corresponding fit parameters for all four ansätze are given in Table XII. For the analytic fit with the 260 MeV cut, the cut excludes the 32Ifine data for which the pion mass is 371(5) MeV, and we are therefore unable to directly obtain the scaling parameters associated with the heavy 32Ifine data; instead we first fit without these data and then determine the remaining unknowns, Z 32Ifine l/h and R 32Ifine a , by including the 32Ifine data while freezing the other fit parameters to those obtained without these data.
In Figure 18 we plot the unitary mass dependence of m π , m K and m Ω , which are used to determine the quark masses and overall lattice scale. In this figure we clearly see that the overweighting procedure forces the curve to pass through the near-physical data as desired, and that this procedure does not introduce any significant tension with the heavier data. In Figure 16 we plot a histogram of the deviation of the data from the ChPTFV fit, showing excellent general agreement between the fit and the data, and in Figure 17 we plot the corresponding histograms for the analytic fits.
For the analytic fit with the 370 MeV mass cut we observe O(3 − 4)σ deviations of the 32ID pion mass data from the fit curve, which arise because of chiral curvature in the data: the fit is pinned near the physical point by the overweighting procedure and is strongly influenced by the larger volume of data in the heavy mass regime, leading to deviations from the lighter 32ID data that lies between these extremes. Nevertheless, in Tables X, XI and XII we generally observe better agreement between the analytic fit with the 370 MeV mass cut and the ChPTFV results than for the lower cut. The total (uncorrelated) χ 2 /d.o. f . are given in Table IX and are sub-unity for all four ansätze.
As previously mentioned, the inclusion of the Wilson flow data in these fits has a significant effect on the precision of the lattice spacings via their influence on the shared R a parameters. This can be seen in Table XIV, in which we show the various scaling parameters, as well as the unrenormalized quark masses and lattice spacings, obtained using the ChPTFV ansatz with and without the Wilson flow data. For the 48I and 64I ensembles, for which the hadronic measurements are very precise, we see only a small improvement in the statistical error. However, for the 32I, 24I and 32Ifine ensembles we observe factors of three or more improvements in precision. The results themselves are very consistent.

A. Systematic error estimation
In our previous analyses we used the difference between the ChPTFV and ChPT results as a conservative estimate of the higher-order finite-volume errors on our results (recall the ChPTFV formulae incorporate the NLO finite-volume corrections). From a purely χPT perspective this is a considerable over-estimate of the size of the NNLO and above corrections, which are known to be only a small fraction of the NLO values even at smaller volumes. Our prudence was motivated by Ref. [34], in which the authors observed significant deviations between the finite-volume corrections predicted by standard finite-volume chiral perturbation theory and those obtained via a resummed version of the Lüscher formula [35] that relates the finite-volume mass shift of a particle to the infinite-volume Euclidean scattering length of that particle with the pion. Nevertheless, one can conclude from those results that the full finite-volume corrections can be expected to differ from the NLO χPT predictions by only 30-50% for the light pions that we are currently using.
Our present fits are dominated by near-physical data computed on 5.5fm volumes, such that (e.g. in Tables X and XI) we observe only very tiny differences between the ChPT and ChPTFV fit results; these differences are typically 10-20% of the size of the statistical error, and hence have negligible impact upon the total error. Given the small size of these differences and that the true sizes of the higher-order finite-volume effects are expected to be several times smaller, we therefore choose to omit the finite-volume systematic from our error estimate.
The estimate of the chiral extrapolation error is made difficult due to the fact that the global fits combine the chiral and continuum extrapolations together, and in this analysis the latter are larger than the former while being less well determined by the fits (the a 2 parameters have typically 50-100% statistical error). As a result, the established procedure of estimating the chiral error from  the difference of the ChPTFV and analytic result with a 260 MeV cut is no longer satisfactory.
In this analysis we considered analytic fits with both a 260 MeV and a 370 MeV pion mass cut. The latter is clearly applying the linear ansatz outside of its region of applicability, leading to deviations from the 32ID data at the 3-4σ level. Despite this there is generally excellent agreement between the continuum predictions of this fit and the ChPTFV. The analytic fit with the 260 MeV mass cut does not suffer from this issue, but at the expense of fitting to a considerably smaller amount of data, including one less lattice spacing. The ChPTFV fit on the other hand is theoretically 'clean' in that it is the correct ansatz for the data in the chiral limit, and agrees very well with our data when applied in the 140 to 370 MeV pion mass range. In Table VIII  Given the small size of the observed differences in the corrections to the 48I and 64I data, and our understanding that these are likely a result of deficiencies in the fitting strategies for those ansätze, we choose to take the cleaner ChPTFV ansatz, which describes our data very well, as our final result and treat the systematic error associated with the extrapolation to the physical point as negligible.  The outliers in the right-hand plot are exclusively from m π on the 32ID ensembles, indicating that the linear curve is deviating from the data due to chiral curvature.

B. Physical predictions
In this section we present our predictions.

χPT parameters
The LO and NLO SU(2) partially-quenched χPT low-energy constants are given in Table XII.
We can also compute the ratio of the decay constant to the LO SU(2) χPT parameter f , for which we obtain:

Lattice spacings
For the lattice spacings we obtain the following values: where we quote the statistical error in parentheses. Our previous values [5] for the lattice spacings of the 32I, 24I and 32ID ensembles are as follows: where the errors are statistical, chiral and finite-volume. We observe a 1.8σ tension between the new and old values of the 32I lattice spacing, which appears to arise from the introduction of the physical point data; if we look at Figure 18 we see that the physical point data appears to favor a stronger light quark mass slope than one would obtain from the heavier data. Nevertheless there do not seem to be any clear discrepancies, except for those that might be attributed to statistical effects. Other than this, our new results are consistent with these values, and are significantly more precise due to the inclusion of the Wilson flow data.

Decay constants
In Table XI we list the predicted values of f π , f K and f K / f π obtained using the ChPTFV ansatz, as well as the differences between those results and those of the other ansätze. As we now have data at several lattice spacings, we can examine the scaling of both f π and f K in order to ensure that their dependence on the lattice spacing can be described by a quadratic form. In Figure 20 we plot the data, corrected to the physical quark masses and the infinite volume using the ChPTFV fit, as a function of the lattice spacing. In addition we show the scaling curve for the Iwasaki ensembles. We observe excellent consistency between the data and the fit curve for both quantities.
In Figure 19 we show the chiral extrapolation in the continuum/infinite-volume limits with the ChPTFV ansatz, again showing excellent agreement between the data and the fit.
We obtain the following physical predictions: where, as above, the statistical errors are given in parentheses. Previously [5] we obtained Here we see that the inclusion of the 48I and 64I data, giving statistically precise data at simulated masses very near the physical quark masses, has led to a highly significant improvement in our results.
In our first global fit analysis [6], performed only to the 32I and 24I ensembles over a (unitary) pion mass range of 290-420 MeV, we obtained a value for f π from our NLO χPT fit that was 6.6% (9 MeV) lower than the experimental value. We concluded that this discrepancy was due to systematic errors in the chiral extrapolation, and introduced the analytic fits as a means of estimating this systematic. When we included the 32ID ensembles into the global fit [5] we observed a marked improvement in the results for the decay constants and a corresponding reduction in the size of the chiral systematic (as estimated by taking the difference between the ChPTFV and analytic fit results).  [36], f π − = 0.1304(2) GeV and f K − = 0.1562 (7) GeV.
Here, f π − is determined experimentally using the measured branching fraction and pion lifetime, with |V ud | computed very precisely via nuclear β decay, such that the error is dominated by higher order terms in the decay width formula. On the other hand, the value for f K − requires |V us | as input, which, for the quoted result, is computed using |V us | f + (0) determined via semileptonic kaon decays and lattice input for f + (0). The consistency of our f K with the PDG value could therefore be taken as both representing the consistency of experiment with the Standard Model, and the quality of the lattice QCD determinations both the kaon semileptonic form factor and our determination of the kaon decay constant.

Wilson flow scales
In Table XI where the statistical error is quoted in parentheses. The above values can be compared to the following results obtained using 2+1f 2HEX-smeared Wilson fermions [31]: where we have combined the statistical and systematic errors in quadrature. We find excellent agreement between our results.

Unrenormalized physical quark masses
The quark masses in bare lattice units on the 32I reference ensemble are given in Table X. In physical units, and including the residual mass, the unrenormalized physical quark masses are given in Table XV. Combining these results we obtain the following: where the errors are statistical.  (21) TABLE XV. Unrenormalized physical quark masses. For the analytic fits, the corresponding pion mass cut is given in parentheses.

C. Renormalized physical quark masses and the chiral condensate
The quark masses presented above are defined in the bare lattice normalization of the 32I reference ensemble. On each of the 32I and 24I ensembles independently, we calculate the non-perturbative renormalization factors that are necessary to convert quark masses in the corresponding bare normalization into a variant of the Rome-Southampton RI-MOM scheme [37] that can be related to MS via perturbation theory. The procedure applied below is identical to that used in Refs. [6] and [5], and the determination of the renormalization coefficients is documented in Appendix F; below we provide only a brief outline.
We compute amputated, projected bilinear vertex functions, where O is an operator, Π are the matrix-valued amputated vertex functions and Γ (s) are projection operators, for which the superscript s indexes the particular renormalization scheme (where applicable). We use the 'symmetric' RI-MOM schemes, defined by the following condition on the incoming and outgoing quark momenta, p in and p out respectively: p 2 in = p 2 out = q 2 ≡ (p in − p out ) 2 . We define renormalization factors by matching to the tree-level amplitude at the scale µ 2 = q 2 : In order to cancel the factors of the quark field renormalization in the denominator, we use whereΛ O ≡ Λ bare O × (Λ tree O ) −1 , S and V are the scalar and vector operators repectively, and Z V is the vector-current renormalization computed using hadronic variables via the procedure given in Section III C 2. We use two different choices of projection operator for the vector vertex, formed from the quantities / qq µ /q 2 and γ µ ; these define the SMOM and SMOM γ µ schemes respectively.
More details on the projection operators and the numerical determination of these quantities can be found in Appendix F.
We now describe the procedure by which we obtain the renormalized quark masses given the and Z m f thus determined are given in Table XVI.
Applying the renormalization factors to the masses from the previous section, we obtain the values given in Table XVII The quark mass ratio is for which there is no systematic error associated with the perturbative matching as it cancels in the ratio.
For comparison, in our previous work [5] we obtained for which the errors are statistical, chiral and finite-volume. Our new results are highly consistent with these values and again show a substantial improvement in the systematic error as a result of including the near-physical data.
We can also compute the chiral condensate, by combining the leading-order SU(2) χPT parameters from The subsequent conversion to the MS scheme at 3 GeV can be performed by further dividing by the appropriate scheme change factor.
It is customary to quote the dimension-one quantity (Σ) 1/3 . We obtain which, after converting to MS and combining, gives where the errors are statistical and from the perturbative matching respectively.  Values are given on the 32I and 24I ensembles and in the continuum limit for the latter quantity.

D. Neutral kaon mixing parameter, B K
The neutral kaon mixing parameter is renormalization scheme dependent, and as such the fits must be performed using renormalized data. As this introduces additional systematic errors, we follow our established procedure of performing these fits separately from the main global fit analysis.
Below we first summarize our non-perturbative renormalization procedure for B K and then present the results of the chiral/continuum fit and finally our physical predictions.

Renormalization of B K
In this section we provide a brief outline of the procedure for determining the renormalization coefficients; for more details we refer the reader to Appendix F and Refs. [38] and [5].
As with the quark mass renormalization, we make use of 'symmetric' regularization-invariant momentum schemes (RI-SMOM for short), defined by the condition µ 2 = p 2 1 = p 2 2 = q 2 ≡ (p 1 − p 2 ) 2 , where p 1 and p 2 are the momenta of the incoming and outgoing quarks: d(p 1 )s(−p 2 ) → d(−p 1 )s(p 2 ). We compute the amputated and projected Green's function of the relevant fourquark operator, O LL , describing the K −K mixing, normalized by the square of the average between the vector and axial bilinear: where for the operator O, as before. Note that the quark wave function renormalization factor cancels in the ratio. In Appendix F we show that the difference between Λ V and Λ A at 3 GeV is numerically negligible, and therefore the above choice of normalization is irrelevant. The superscript (s i ) refers to choice of projector (cf. [5]): either γ µ or / q. The choices s 1 = s 2 = γ µ and s 1 = s 2 = / q define the so-called SMOM(γ µ , γ µ ) and SMOM( / q, / q) schemes respectively.
We perform the full analysis separately for each scheme and use the difference to estimate the systematic error associated with the MS matching. While treating the two schemes in an equal fashion is the most rigorous estimate we can make with the current data, we have indications that this might overestimate the error on the SMOM( / q, / q) result: A preliminary study [39] of step scaling to higher momentum scales suggests that the scale evolution in this scheme agrees with perturbation theory over the full range of scales, whereas the SMOM(γ µ , γ µ ) scheme evolves into better agreement as the scale is raised. The perturbative truncation error is therefore greater for the SMOM(γ µ , γ µ ) scheme than for the SMOM( / q, / q) scheme. The complete study of the evolution to higher energy scales requires careful treatment of the charm threshold, and is the subject of further work by RBC and UKQCD. These observations are consistent with our earlier results at lower scales, and the better agreement with the perturbative scale evolution for the SMOM( / q, / q) scheme was the reason we have, in this work and previously, taken our central values for B K from this scheme [38].
We compute Z B K on each ensemble at a number of q 2 , and interpolate to a chosen high momentum scale at which the matching to MS can be performed. We choose to perform the matching at 3.0 GeV as before. The values of the renormalization coefficients at the various lattice momenta and further details of the analysis are given in Appendix F.
All matrix elements included in the global fit must be renormalized to a common scale of 3.0 GeV in order that the global fit can extrapolate these to a shared, universal continuum limit. As described in Ref. [5], due to the coarseness of the 32ID ensemble we are unable to renormalize directly at 3 GeV without introducing potentially sizeable lattice artifacts. Instead we renormalize with a lower momentum scale of µ 0 = 1.4363 GeV, and apply the continuum non-perturbative running σ (s 1 ,s 2 ) B K (µ, µ 0 ), extracted from the 32I and 24I lattices (and extrapolated to the continuum), to convert this value to µ = 3GeV. More details of this conversion are given in Appendix F.   Table XVIII.

Chiral/continuum fit to B K
As above, we describe the chiral dependence using chiral perturbation theory, with and without finite-volume corrections, as well as a linear ansatz with a 260 MeV and 370 MeV pion mass cut.
The chiral/continuum fit forms can be found in Ref. [38]. As before, we use separate parameters to describe the lattice spacing dependence of the Iwasaki and Iwasaki+DSDR actions. The fit parameters can be found in Table XX, and in Figure 23 we show examples of the unitary and continuum extrapolations. In Figure 24, in which we plot a histogram of the statistical deviations of the data from the ChPTFV fit curve, we see excellent consistency between the data and the fit. The total χ 2 /d.o.f. for each of the four ansätze are given in Table XIX.

Predicted values
In Table XXI we list the continuum predictions for B K , renormalized in each of the two intermediate schemes, that we obtained using the ChPTFV ansatz, as well as the sizes of the differences between those and the other chiral ansätze. We use the SMOM( / q, / q) result for our central value, giving us a final continuum result in a non-perturbative MOM scheme with 0.3% total error after all sources of error are accounted for: GeV.
Here we have again neglected the chiral and finite-volume systematic errors as their sizes are considerably smaller than the statistical error.
This final prediction, and the result in the SMOM(γ µ , γ µ ) scheme, can be converted into the MS scheme using the following one-loop matching coefficients [38]: using α s (3 GeV) = 0.24544. The resulting MS values are also listed in Table XXI. For the reasons discussed above, we use the value obtained via the SMOM( / q, / q) scheme for our final MS result. The matching introduces a perturbative truncation error, which we estimate by taking the full difference between the results obtained using the two RI-SMOM intermediate schemes.
We obtain: where the errors are statistical and from the perturbative matching to MS respectively.
In the renormalization group invariant (RGI) scheme, the above corresponds tô Previously [5] we obtained: for which the errors are statistical, chiral, finite-volume and from the perturbative matching respectively. Comparing with the above, we see excellent agreement. Our new result offers a considerable improvement in the statistical error, but the truncation effects are the same as we have not changed the scale, and dominate the final error.

VI. CONCLUSIONS
Combining decades of theoretical, algorithmic and computational advances, we are finally able to perform 2 + 1 flavor simulations with an essentially chiral action directly at the physical masses of the up, down and strange quarks in isospin symmetric QCD with both fine lattice spacings and large physical volumes. In this paper we report on two such ensembles; a 48 3  The global fits are performed using the techniques developed in Refs. [6] and [5]. We fit to the following quantities: m π , m K , f π , f K , m Ω and the Wilson flow scales w 0 and t 1/2 0 . A separate fit is performed to the neutral kaon mixing parameter, B K . To describe the mass dependence of these quantities we use NLO partially-quenched chiral perturbation theory with and without finitevolume corrections (referred to as the 'ChPTFV' and 'ChPT' ansätze) and also a linear 'analytic' ansatz.
Despite the significantly improved precision of the 48I and 64I data, we found that the fits missed these data by 1-2σ ; this is an artifact of the large number of data points in the heavy-mass regime where χPT is only reliable to O(5%). We resolve this issue by over-weighting the 48I and 64I data in order that the fit is forced to pass through these points.
The 48I and 64I ensembles each have the same gauge coupling as the corresponding 24I and 32I ensembles, but with smaller residual chiral symmetry breaking (significantly so for the former).
We found that the differences in the fermion action between these two pairs of ensembles, each evaluated at the same gauge coupling, resulted in a 3.2(2)% difference between the 48I and 24I lattice scales, and a 1.1(2)% difference between that of the 64I and 32I ensembles. In Appendix C we show that this can be understood as an unexpectedly large effect of the changes in L s and the Möbius scale parameter α which distinguish these ensembles, and provide added numerical evidence that these effects are accurately described by such shifts in the lattice scales.
We showed that due to the dominance of the 48I and 64I data, which were measured with nearphysical pion masses on large, 5.5fm boxes, the systematic errors associated with the chiral ex- Our results for the light and strange quark masses, obtained in Section V C, are renormalized in the MS scheme at 3 GeV. The only remaining uncertainties on these quantities are statistical and perturbative matching errors, roughly 1% each. The renormalization and running of the quark masses were computed nonperturbatively, details of which can be found in Appendix F. The masses are quite consistent with our previous determinations, but show significant improvement due to the inclusion of the physical point ensembles. Our masses agree with the FLAG averages, but have errors that are both smaller than those of the average as well as those of any of the individual results used therein [5,[44][45][46][47]. The ratio of strange to light quark masses, shown in Eq. (86), is also consistent with the FLAG average [40], but here the error is slightly larger since systematic errors mostly cancel, though it is as small as any individual result used in the average [5,[44][45][46][47].
The FLAG average for the standard model kaon bag parameter is largely dominated by the Budapest-Marseille-Wuppertal collaboration (BMWc) result [43],B K = 0.7727(81) stat (34) sys (77) PT , where the errors are statistical, systematic and from perturbation theory, respectively. We would like to stress the difficulties one encounters in reliably assessing truncation errors, a point also compare the non-perturbative scale evolution to the NLO running between 2 and 3 GeV. We find a deviation of around 1.5% for the RI-SMOM(γ µ , γ µ ) and for the RI-MOM schemes, and of 0.5% for the RI-SMOM( / q, / q) scheme.
It is useful to compare our results with Ref. [43] in the intermediate MOM schemes (before converting to MS) as these numbers are purely non-perturbative: where we neglect the various sources of systematic errors in our result since they are considerably smaller than the statistical error. These results are in different non-perturbative schemes and at different scales, and are therefore not directly comparable. However, we can compare their relative total errors: our result and that of BMWc have a 0.3% and a 1.1% relative error, respectively. We emphasize that in terms of objective statistical errors, and every systematic effect for which there is a theoretical framework for estimation (e.g. discretization, mass extrapolation, and finite volume), our new result is more precise than those entering the FLAG average. This is reflected in the 0.3% total relative error on results in a non-perturbatively defined / q RI scheme. Our assessment of the (subjective) perturbative systematic uncertainty on the conversion to MS is more pessimistic than that of FLAG and BMWc, but we believe that it is better founded on the evidence of multiple intermediate schemes.
Predictions ofB K in lattice QCD have now reached a level of precision where other ingredients in its utilization for SM-tests are limiting progress (e.g. our knowledge on |V cb |).
The results for the kaon and pion decay constant and their ratio are compatible with the FLAG average and amongst the most precise N f = 2 + 1 predictions that have been made. Our results will certainly allow for further constraining CKM-unitarity tests [40].
The most significant remaining differences between our simulations and the physical world are isospin breaking and EM effects and the effect of quenching the charm quark.
Including isospin breaking effects requires using non-degenerate masses for the up and down quarks. This is possible within the domain wall fermion framework with current technology, for example using the rational quotient action or the one-flavor action developed by TWQCD [52].
However, these techniques are computationally demanding, and the effects in question are expected to be similar in size to the electromagnetic effects, hence there is limited value in considering these in isolation.
The RBC and UKQCD collaborations have performed exploratory calculations using QCD domain wall configurations with quenched electromagnetic interactions [53,54] and have performed unquenched simulations using reweighting techniques [55]. There is increasing effort in the lattice community to control these effects, from more precise electro-quenched calculations [56,57] (i.e. with EM included only in the valence sector) up to full QCD+QED simulations [58]. Adding QED to lattice simulations is challenging for many reasons. Firstly, adding a coupling constant to the theory, especially in the context of non-degenerate light quarks, considerably increases the cost of the simulations, particularly when using a chiral action close to the physical point. Secondly, the absence of mass gap in QED implies finite-size effects with power-law dependence on the lattice spatial extent, which are potentially large compared to the QED contributions [58,59]. Finally, it is still not clear how to define quantities such as decay constants in QCD+QED, because the matrix elements are infrared divergent and gauge dependent [60]. Because of these issues, the addition of isospin-breaking effects and electromagnetism remains an important and challenging topic for our future calculations.
Dynamical charm effects are expected to be small for the majority of the quantities studied in this paper, but for quantities such as the K L −K S mass difference and K → ππ amplitudes they can have significant contributions. This is therefore the most promising avenue for RBC and UKQCD to take, allowing us to address these systematic errors on our flagship calculations. The biggest hurdle for including the charm is the requirement of simulating with finer lattice spacings, which tends to incur freezing of topology as well as requiring large computing power to obtain sufficiently large physical volumes. RBC and UKQCD have developed the 'dislocation enhancing determinant' (DED) method [61] to overcome the effects of the topology freezing, and have already commenced large-scale physical simulations with dynamical charm.

ACKNOWLEDGMENTS
The generation of the 48 3  Research Council of Canada.

Appendix A: Conserved currents of the Möbius domain wall action
The connection of the Möbius formulation to overlap fermions can be made at the propagator level and with the familiar DWF physical fields q L and q R . In the following subsection we repeat known but important results connecting the surface-to-surface and surface-to-bulk propagators of the Möbius domain wall action (in our conventions) with the four dimensional overlap propagator.
These results are then used to establish a practical implementation of the conserved axial and vector currents for the Möbius case.

Domain wall and overlap propagators, and contact terms
The approximate overlap operator can be written in terms of our four dimensional Schur complement matrices as Observe that if we solve the following 5-D system of equations, and substitute the UDL decomposition, this yields Since L(m)(q, 0, . . ., 0) T 1 = q and (L(m)φ ) 1 = φ 1 , the topmost row of our 5-D system of equations gives the overlap propagator: This approximate overlap operator can however be expressed in terms of theψ basis fields, and The cancellation the Pauli-Villars term can be expressed in terms of unmodified generalized domain wall matrix D 5 GDW . The overlap contact term can be subtracted from the overlap propagator. Here we definẽ This relation is simpler to interpret in our convention than with the convention from Ref. [12]: the mass term is applied to our five dimensional surface fields without field rotation. With this, This is just the normal valence propagator of the physical DWF fields q = (P −1 ψ) 1 andq = (ψR 5 P) 1 . We see that the usual domain wall valence propagator has always contained both the contact term subtraction and the appropriate multiplicative renormalization of the overlap fermion propagator. As a result, the issues of lattice artifacts in NPR raised in Ref. [63] have never been present in domain valence analyses. This was guaranteed to be the case because Shamir's 5-D construction is designed to exactly suppress chiral symmetry breaking in the limit of infinite L s , including any contact term.
For later use, we may also consider the propagator into the bulk from a surface field q for Möbius fermions, Now, and so we have, . . .
Finally, applying the permutation matrix, we have the five dimensional propagator from a physical field, . . .
The connection between domain wall systems and the overlap, well established in the literature and reproduced in this section, is useful in understanding the relation of domain wall fermions to their 4-D effective action.

Conserved vector and axial currents
The standard derivation of lattice Ward identities proceeds as follows. A change of variables of the fermion fields ψ andψ at a single site y is performed: Under the path integral, the Jacobian is unity, and the partition function is left invariant: The Wilson action gives eight terms from varyingψ y and eight terms from varying ψ y due to the 4-D hopping stencil: where ∆ − µ is the backwards discretized derivative.
An equivalent alternate approach may be taken, however, and this is a better way to approach non-local actions such as the chiral fermions. Gauge symmetry leaves the action invariant at O(α) under the simultaneous active substitution, for a fixed site y of A change of variables on the fermion fields at site y may be performed simultaneously to absorb the phase on the fermions: Under the path integral, the Jacobian is again unity, and the phase associated with the fermion is absorbed. We can now view the change in action as being associated with the unabsorbed phases on the eight gauge links connected to site y: For a gauge invariant Lagrangian we can always use a picture where the same change in action, and same current conservation law may be arrived at by differentiating with respect to the eight links connected to a site: This arises because the phase freedom of fermions and of gauge fields are necessarily coupled and inseparable in a gauge theory. For the nearest-neighbor Wilson action, this generates the same eight terms entering ∆ − µ J µ = 0.
In the case of non-local actions, the Dirac matrix, whatever its form, can be viewed as a sum of gauge covariant paths. When generating a current conservation law from U (1) rotation of the fermion field at site y, we sum over all fieldsψ(x) and ψ(x) connecting through the Dirac matrix D(x, y) to the fixed site ψ(y) andψ(y). The following sum is always constrained to be zero for all y, and is identical to that found by Kikukawa and Yamada [64]: The partitioning of this sum of terms, into a paired discrete divergence operator and current is not obvious, and it is cumbersome to generate Kikukawa and Yamada's non-local kernel.
It is instructive to consider what happens if we derive the same sum of terms by differentiating with respect to the 8 links connected to site y.
The structure of Eq. (A32) always lends itself interpretation as a backwards finite difference. For a non-local action, the differentiation Eq. (A32) appears to generate a lot more terms than the fermion field differentiation Eq. (A31). The reason is clear: these extra terms are constrained by gauge symmetry to sum to zero, but only after cancellation between the different terms in Eq. (A32). Specifically, we consider an action constructed as the product of Wilson matrices: The link variation approach gives three terms, each of which are conserved under a nearestneighbor difference divergence: varying with respect to the 8 links we obtain, via the product rule, Each of these contributions contain a backwards difference operator, and it is trivial to split this into a divergence and corresponding conserved current using Eq. (A24).
The above comment is generally applicable to any function of the Wilson matrix. We take this approach to establish the exactly-conserved vector current of an approximate overlap operator, where the approximation is represented by a rational function. We will also establish that matrix elements of this current are identical to those of the Furman and Shamir approach [9] in the case of domain wall fermions. The Furman and Shamir approach will then be used to also establish an axial Ward identity for our generalized Möbius domain wall fermions under which an explicitly known defect arises. This is important in both renormalizing lattice operators and also in determining the most appropriate measure of residual chiral symmetry breaking in our simulations. We construct the conserved vector current by determining the variation in the overlap Dirac operator, δ y D ov : We can similarly find the variation in T −1 induced by a variation in D W , where the variation in D W is just the backwards divergence of the standard Wilson conserved current operator. Denoting, we see that we may re-express the identity and this lets us find a symmmetrical form: We may now look at the variation of the term T −L s Compiling these results, we find The terms may be expanded until insertions of the the backwards divergence of the Wilson current are reached (Eq. (A24)). Gauge symmetry then implies the conservation of the obvious current and the vector Ward identities can be constructed. For example, we may take as source η j j ′ αα ′ (z) = δ j j′ δ αα ′ δ 4 (z − x) and a two-point function of the conserved current may be constructed as Note that when c = 0, the insertion of Eq. (A40) contains only terms such as which are also present in the surface to bulk propagator Eq. (A20). As one would expect, when we take b and c to represent domain wall fermions, the two-point function of our exactly conserved vector current -derived from the four dimensional effective action -exactly matches the matrix element of the vector current constructed by Furman and Shamir [9], Eq. (2.21), from a five dimensional interpretation of the action.
Since the Furman and Shamir current was easily constructed from the five dimensional propagator Eq. (A20), one might hope to do the same in the generalized approach to domain wall fermions.
To play a similar trick for the c term, we would need to generate the terms and These are not manifestly present in Eq. (A19). However, the presence of the contect term on the s = 0 slice can be removed after a propagator calculation. We define this slice as In a practical calculation, the source vector η may be used to eliminate the contact term by forming By applying P + and P − we find we have the following set of vectors and we may eliminate to form a L s + 1 vectors from a 4-D source η This may be used to construct for s ∈ {0 . . . L s − 1}, and by contracting these vectors through the Wilson conserved current the matrix element, Eq. (A42), can be formed in a very similar manner to the standard DWF conserved vector current. When c = 0 the matrix element reduces to being identical to that for the Furman and Shamir vector current.
A flavor non-singlet axial current, almost conserved under a backwards difference operator, can now also be constructed following Furman and Shamir. We associate a fermion field rotation where We acquire a related (almost-) conserved axial current, whose pseudoscalar matrix element is The exact vector current conservation induces the same J 5q midpoint density defect that arose for DWF, and the Ward identity is ∆ − µ ψγ 5 ψ(x)|A µ (y) = ψγ 5 ψ(x)|2mP(y) + 2J 5q (y) .
This allows us to retain the usual definition of the residual mass in the case of Möbius domain wall fermions. We emphasize that the definition, via the zero-momentum pion matrix element of J 5q is particularly important, because then our PCAC relation, π( p = 0)|2mP + 2J 5q = 0, guarantees that the low momentum lattice pions are massless. This is the appropriate measure of chiral symmetry breaking for the analysis of the chiral expansion.
Section III C discusses methods of using the vector and axial ward identities to measure the renormalization of the local vector and axial currents, and their use in our analysis. 4. Using a r = R e a a e , rewrite the function in terms of the lattice spacing on the ensemble e:  This fit function now describes the data in lattice units for the ensemble e.

Appendix C: Dependence of the lattice spacing on the fermion action
In Sec. IV we described that, contrary to our expectations, combining the 24I and 48I ensembles into a single global fit required that two lattice spacings, differing by 3.2(2)%, be used for these two, nominally similar ensembles. (Similar but smaller discrepancies between the lattice spacings for the 32I and 64I ensembles were also found.) In this appendix we will discuss this phenomenon in greater detail and describe additional measurements that we performed in order to verify that this assignment of different lattice spacings is correct. For clarity we will focus on the 24I and 48I If we were to describe the low energy Green's functions computed on the 24I and 48I ensembles as corresponding to separate Symanzik effective theories, these two effective theories would be essentially identical, except for differences in their low energy constants of order (ma) n . For example, in a theory with chiral fermions the dimension-4 (F µν ) 2 term, closely related to the lattice scale, would have coefficients which differed by terms of order (ma) 2 , terms much too small to be relevant here. Of course, had such a term been important, our global fitting procedure would have included its effects by describing both the 24I and 48I ensembles with a single Symanzik effective theory, with a single lattice spacing, whose mass-dependent coefficients were represented by explicit mass-dependent terms in the fit. In this framework both the 24I and 48I ensembles would be described by the same lattice spacing a and the same value of R a .
It may be useful to briefly review the meaning of the lattice spacing a as it is generally defined in field theory and specifically defined in the calculation presented here. Perhaps the simplest way to define the cut-off scale is by specifying the value of a "physical" quantity, such as the Wilson flow or three-gluon coupling, at a sufficiently short flow time or large gluon momentum that the process can be understood in perturbation theory. Theories with identical lattice actions but with different quark masses will give the same value for the lattice scale up to terms of order (ma) 2 if we introduce the lattice scale a as the natural lower/upper limit on the flow times or momentum scales that are available for such a short-distance definition. From this perspective, such mass dependent effects are much too small to result in the 3% discrepancy we find. In our actual approach, we define the lattice spacing through the mass of the Ω − . This requires our global fitting procedure and an explicit extrapolation to a specific value of input quark masses, specifically those which give physical values for m π /m Ω and m K /m Ω , in order that such a low-energy definition of the lattice scale be well defined. Necessarily, in this approach the 24I and 48I ensembles are assigned a common lattice spacing and their different input quark masses are completely accounted for in the global fitting procedure (up to negligible systematic effects). For our low-energy definition of the lattice spacing, it is not possible to interpret the 3% difference in a between the 24I and 48I ensembles as resulting from their different input masses.
Instead, the change in the lattice spacing between the 24I and 48I ensembles must be attributed to some other change in the lattice action. We are left to conclude that this effect must be a result of the change in fermion formulation. As discussed in Section II, we can consider this change as For example, the value of the lattice spacing, which is determined by the strength of QCD inter-actions at the scale of Λ QCD , is a strong function of the anti-screening produced by QCD vacuum polarization. The quarks act to reduce this anti-screening, and the Pauli-Villars determinant was originally included in the domain wall fermion action [8] to regulate what would have been a divergent contribution to QCD vacuum polarization coming from the increasing number of fermion species as L s → ∞. While, as can be seen by the relation with overlap fermions discussed in Sec. II, these effects have a well defined L s → ∞ limit, we cannot rule out the possibility that they appear at the 3% level for β = 2.13 and L s = 16. Instead, we interpret this large shift in a as providing new information about the potential effects of finite L s , and a warning that simple estimates can occasionally be misleadingly low. In this spirit, we should recognize that the earlier arguments about the insensitivity of the coefficients of the O(a 2 ) Symanzik correction terms to our change in action may underestimate these effects. Of course, in this case, if our few tenths of a percent estimate were to become even a 5% effect, it would not interfere with our current continuum extrapolations.
Since the conclusion, implied by our global fits, that the lattice spacing did indeed change by 3.2% and 1% when going from the 24I to 48I and 32I to 64I ensembles respectively, was a surprise, it was important to test this hypothesis. For that purpose, we generated two additional MDWF+I ensembles with input parameters set equal to those of the lightest 24I and 32I ensembles (i.e. those with am l = 0.005 and am l = 0.004, respectively), but using the Möbius parameters and L s values that were used for the 48I and 64I ensembles respectively. We compensated for the reduction in the residual mass by increasing the input bare quark mass in order that the total quark masses remain equal to the 24I and 32I values. If the observed differences in the lattice spacings can indeed be attributed to the change in L s (that which would have been required if the new ensembles were generated with the Shamir action), then the lattice scales for these new ensembles should match those determined for the 48I and 64I ensembles.
We refer to these new ensembles as the '24Itest' and '32Itest' ensembles. They were generated  is responsible for the observed differences in lattice spacing. It provides further confidence in our global fitting procedure, which was sufficiently reliable to produce strong evidence for this effect even though it was not expected in advance.
Note, in this explanation we continue to assume the near equality of the Möbius and Shamir 4-D theories for fixed L s (b + c), and to view the difference in a as what would have been observed had we used only the Shamir action, increasing L s from 16 to 48 (for 24I/48I) and 16 to 24 (for 32I/64I). While we believe that this assumption has a strong theoretical justification, the numerical experiment just described does not provide direct evidence for its validity.

Appendix D: Weighted fits
We define a weighted χ 2 as  where i indexes the measurements, y i and σ i are the measured value and statistical error, x i the associated coordinates, and c the set of parameters of the fit function f . The quantities ω i are set to a value Ω for some subset of the data, where Ω is assumed to be large, and to unity for all other data. We demonstrate below that the dependence on Ω vanishes in the limit Ω → ∞ and that this limit is sensible.
The minimum of χ 2 satisfies Writing out the derivative explicitly and dividing both sides by Ω gives the following expression: If we naïvely take the Ω → ∞ limit of this equation, it appears that all of the data with ω i = 1 drop out entirely and hence do not contribute to the fit. This is certainly true in those cases in which the number of data points with weight ω i = Ω is sufficient to determine the full set of parameters c. However when there are fewer points, there is no solution that satisfies Eq. (D3) in the Ω → ∞ limit. We argue that if one first determines the solution for finite Ω, either analytically or numerically, then afterwards take the limit Ω → ∞, the solution remains valid and in fact depends on the data with ω i = 1. The resolution of this apparent paradox is that when the overweighted points are insufficient to determine the parameters, the fit has (almost-)unconstrained directions with infinitesimally small curvature arising from the vanishing unweighted data, and hence there is a well defined minimum.

a. Simple example
It is straightforward to demonstrate the behavior discussed above via a simple example in which we are attempting to determine the parameters of the function by minimizing where r i are a series of N data points with coordinates x i and unit variances for simplicity.
Let us first consider a scenario in which we have three data points (N = 3), two of which are overweighted: w 0 = w 1 = Ω, and the third is assigned w 2 = 1. Here the overweighted data points are sufficient to determine both parameters and the result of solving for the minimum of Eq. (D5) in the limit of large Ω, and the result of solving at finite Ω and taking the limit afterwards, are identical: Notice that this result does not contain the unit-weight data point, r 2 Let us now consider just two data points (N = 2), and take w 0 = Ω and w 1 = 1 such that the number of overweighted data points is no longer sufficient to determine both parameters. The equations for the minimum of χ 2 are: (D7) Taking the large Ω limit gives These are identical up to a trivial normalization, hence we have two unknowns and only one equation; no unique solution can be found. (Note that the fact that the equations are the same will not be true in a general case with multiple over-constrained data points; there one would instead find expressions that cannot be simultaneously satisfied.) On the other hand we can solve for the minimum at finite Ω; the solutions are identical to those given in Equation (D6), and are independent of Ω, allowing us to take the large Ω limit a posteriori without issue.
Finally we consider one further example, again with three data points but this time with only one over-weighted: N = 3, w 0 = Ω and w 1 = w 2 = 1. Here, as above, the number of overweighted points is insufficient to determine both parameters, but all three points together are more than enough to constrain the parameters (with one degree of freedom). We might therefore expect that the solutions at finite Ω would be Ω-dependent unlike in the previous example. Indeed this is the case, but it is straightforward to show that the solutions are finite in the limit Ω → ∞ and furthermore that they are functions of all three data points in this limit. The expressions are somewhat lengthy and we have not reproduced them here, but we have plotted the Ω dependence of the solutions for a particular set of data points and parameters in Figure 25. In the figure we also plot the function before and after the weighting, demonstrating that it does indeed pass through the over-weighted data point. in χ 2 under a shift of the fit parameters is less than some chosen value, δ χ 2 min . As we increase Ω at fixed δ χ 2 min , the relative effects of fluctuations in the unweighted data are reduced and the fit becomes more tolerant to increasingly large deviations of the fit from the unit-weight data. This manifests as an increase in the jackknife statistical error of our predictions. We must therefore choose a value of δ χ 2 min that is small enough to properly take into account the constraints from the unit-weight data. The choice is limited by the increased time for the fit to reach its minimum coupled with the inevitable limits of finite precision. For fixed δ χ 2 min , the time to perform the fit also naturally increases with Ω due to the increase in the overall scale of the fluctuations. We must therefore determine an optimal value for Ω that is large enough that our predictions are no longer noticeably dependent on its value while small enough for the fits to complete in a reasonable time and to be unaffected by finite precision errors.
In Figure 26 we show examples of the Ω dependence on the predicted values of f π , f K , w 0 and t 1/2 0 . The plots also show the result of reducing the stopping condition δ χ 2 min by several orders of magnitude. We observe percent-scale shifts in the central values of these quantities from the unweighted fit results, and we clearly see the behavior flattens out at around Ω = 1000. We choose Ω = 5000 as a value large enough to be well within the flat region while small enough to avoid the difficulties discussed above. For the chosen value of Ω we observed no significant dependence of the results on δ χ 2 min , but to be conservative we chose 1 × 10 −4 as our final value. Note that we observed stronger dependence of our results on δ χ 2 min for some alternate choices of guess parameters, but with tighter stopping conditions the results stabilized and agreed with those  respectively, and were originally described in Refs. [33] and [6]. We also perform measurements of the Wilson flow scales on the 32ID ensemble, which has a lattice volume of 32 3 × 64 × 32, Shamir domain wall fermions with the Iwasaki+DSDR gauge action at β = 1.75, and was described in Ref. [5].

Wilson flow scales
The procedure for determining the Wilson flow scales is described in Section III F. We have three

Vector current renormalization
In Section III C we describe how the renormalization coefficient relating the domain wall local axial current to the physically-normalized Symanzik current can be determined via the quantity Z V /Z V , which relates the local vector current V µ to the conserved 5D current V µ . This quantity is used to renormalize the decay constants. In our earlier works [5,6] we obtained Z V by fitting directly to the ratio of two-point functions, Since the lightest state that couples to the vector operator is the noisy ρ meson, for this work we instead determine the ratio for the 48I, 64I and 32Ifine ensembles via the three-point function, π|V µ |π , as described in Section III C 2; this procedure gives a substantially more precise result than the above. In the global fits we attempt to describe the aforementioned ensembles, along with 32I and 24I ensemble sets, using the same continuum scaling trajectory. In order to guarantee consistent scaling behavior we must therefore recompute Z V on the 32I and 24I ensemble sets using the new method. This is not necessary for the 32ID ensembles, which are described by a On the 24I ensemble set we measured on 147 and 153 configurations of the am l = 0.005 and 0.01 ensembles respectively. We also included 85 measurements on the heavier am l = 0.02 ensemble and 105 measurements on the am l = 0.03 ensemble described in Ref. [33]. For the 32I ensembles we measure on 135, 152 and 120 configurations of the am l = 0.004, 0.006 and 0.008 ensembles respectively. In Table XXVI we list the measured values on each ensemble and extrapolated to the chiral limit.

Appendix F: Non-perturbative renormalization
In order to determine the renormalization coefficients for the quark masses and B K , we use what is now the standard framework for our collaboration: the Rome-Southampton non-perturbative renormalization schemes [37] with momentum sources, twisted boundary conditions and nonexceptional kinematics [65][66][67][68]. This setup has already been described in several previous publications [5,67,69,70], and results in tiny statistical errors, infra-red contamination suppression, and consistent removal of a 2 discretization effects in the vertex functions.
A key aspect of the RI-MOM approach is that any other, potentially regularization dependent, scheme may be easily converted into the RI-MOM scheme using momentum-space scattering amplitudes determined (either perturbatively or non-perturbatively) solely within that other scheme. figurations, for which we use the timeslice by timeslice FASD algorithm [26]). We use nonexceptional 'symmetric' momentum configurations, defined by the condition where, for bilinear vertices, p 1 and p 2 are the incoming and outgoing quark momenta respectively, and for the four-quark vertices used to compute Z B K the quark momenta are assigned as follows: In the above, q = p 1 − p 2 is the momentum transfer.
In contrast to the symmetric scheme, the original RI-MOM scheme defined in Ref. [37], which we do not include here, corresponds to the zero-momentum transfer kinematics, i.e. p 1 = p 2 , and suffers from enhanced non-perturbative effects at high energies arising from low-momentum loop effects; in particular the effects of the dynamical chiral symmetry breaking are greatly enhanced [6].
We compute projected, amputated vertex functions of the form Precise definitions of the projectors P depend on the choice of operator, the kinematics, and the choice of scheme. In practice the Green's functions are first computed at finite values of the quark mass and then extrapolated to the chiral limit; this quark mass dependence is however very mild for the non-exceptional schemes considered here and we omit it below for the purpose of clarity.
The renormalization factors are defined by imposing where Z q is the quark wave function renormalization factor, and n the number of fermion fields in O. A second, separate condition is required in order to extract Z q . Note that the right-hand side of the above depends on the choice of projector.
In order to simplify the equations, we introduce the following notation: projection scheme and for each ensemble, as a function of the external momenta.
In this work we are only interested in quantities that renormalize multiplicatively, such that the Z-factors and the Λs are simply scalars. For a general lattice action with non-zero chiral symmetry breaking, the four-quark operator responsible for K −K mixing in fact mixes with other operators, and Z O and Λ bare O become matrix-valued [69]. However for our choice of action, the residual chiral symmetry breaking is negligible and only multiplicative renormalization is required.
Once a bare matrix element O bare (a) of the operator O has been computed on a lattice with lattice spacing a, the Z-factor can be used to convert it into the corresponding MOM-scheme: In order to connect the lattice results to phenomenology, they have to be matched to a scheme suitable for a continuum computation, such as MS; this is performed using perturbation theory.
The final equation reads: This quantity has a well-defined continuum limit as any potential divergences are absorbed by the

Z-factors.
We remind the reader that the Z-factors defined above are scheme dependent. The renormalization scheme is fixed by the choice of projectors and of kinematics; specifically, with the choice of symmetric kinematics given above, it depends on the projector used for the operator O and that used to extract Z q . For both the quark mass renormalization factor, Z m , and the B K renormalization factor, Z B K we use two SMOM schemes; for the former these are the RI-SMOM and the RI-SMOM γ µ [68] schemes, and for the latter the SMOM(γ µ , γ µ ) and SMOM( / q, / q) [38] schemes.
In the main analysis we use the difference between the MS results computed using these two , and from there to MS in the usual way. This is described in more detail in Section V C.
We first introduce the renormalization factor of the flavour non-singlet bilinears. We define Λ S and Λ P , the amputated and projected Green's functions of the scalar and pseudoscalar bilinear operators respectively, as Similarly, for the local vector and axial currents we define: where (s) denotes the choice of projector. Following Ref. [68], we define the γ µ and the / q-schemes (or projectors) in the following way: and Γ ( / q) V µ = / qq µ /q 2 , and Γ ( / q) For completeness, we also renormalize the tensor current. The vertex function is Π σ µν , where and the amputated and projected vertex are For the projectors, we use The corresponding renormalization factors Z S,V,T,A,P /Z q can then obtained by imposing Eq. (F3) with n = 2.
To obtain the renormalization factor of the quark mass, Z m , we take the ratio of the vector and scalar bilinears in order to cancel the quark wave-function renormalization: where Z V is computed hadronically via the procedure given in Section III C 2. In the previous equation, we have used the fact that Z m = 1/Z S = 1/Z P in the chiral limit. Similarly, we should expect Z A = Z V up to some small corrections arising, for example, from the fact that we work at finite L s , or due to infrared contaminations. In our estimate of the systematic errors, we have also replaced Λ S by Λ P and Λ V by Λ A in Equation (F14).

b. Renormalization of the kaon bag parameter
The renormalization factor Z B K is defined in a similar manner. The amputated Green's function of the relevant four-quark operator O VV +AA describing K −K oscillations in the Standard Model is computed numerically with a certain choice of kinematics and projected onto its tree-level value.
We normalize by the square of the average between the vector and axial bilinear: where such that the quark field renormalization cancels in the ratio. In practice we find that the difference between the vector and axial vertices are very small, hence choosing the average rather than simply Λ V or Λ A in the denominator, has no discernable effect.
In Eq. (F15), the superscripts s 1 and s 2 label the choice of projectors. We refer the reader to Refs. [5,38] for the details on the implementation, including the explicit definitions of projectors.

a. Quark mass renormalization
For the quark mass renormalization we require only the values on the 32I and 24I ensembles, which together are sufficient to perform the continuum extrapolation of Z m /Z l/h . Here we discuss an update of the analysis performed in Ref. [5] using the newly-determined lattice spacings and a number of additional data points.
In the Rome-Southampton method, the projected vertex functions are first computed at finite quark mass before being extrapolated to the chiral limit. For each ensemble, we use unitary valence quark masses and extrapolate linearly in the quark mass. In the sea sector, the strange quark mass remains fixed to -or close to -its physical value. Since we do not observe any relevant quark mass dependence in our data, we neglect the systematic error associated to the fact that the sea strange quark mass is not extrapolated to zero.
We use partially-twisted boundary conditions to obtain momenta of the following form: wherem combines the Fourier mode with the twist angle θ The fact that these momenta all point in the same direction up to hypercubic rotations means that they lie upon a common continuum scaling curve (i.e. their a 2 dependence is the same), allowing us to unambiguously take the continuum limit.
For the 24I lattice, in addition to the momenta listed in [ For our new ensembles, we have considered only one value of the valence quark mass m sea l = m val l . Again, due to the modest chiral dependence previously observed for the non-exceptional schemes, we expect the associated systematic error to be negligible compare to the other sources of errors (in particular the perturbative matching).
As the 32ID ensemble is comparatively coarse, we renormalize at a lower scale µ 0 ∼ 1.4 GeV and use the non-perturbative continuum step-scaling factor σ (s 1 ,s 2 ) B K (µ, µ 0 ) to run to 3 GeV. This procedure is discussed in Ref. [5]. The step-scaling factor is obtained by performing a continuum extrapolation of the ratio computed on the 32I and 24I lattices.
Since the values of the lattice spacings have been updated, the numbers quoted here differ slightly from our previous work. The strategy is the following: we use the same 32ID lattice renormalization coefficient, Z (s 1 ,s 2 ) B K (µ 0 , a 32ID ), as used previously, but notice that the corresponding value of µ 0 obtained with the new lattice spacings is 1.4363 GeV rather than 1.426 GeV. As a result we must recompute the step-scaling factor. The results for Z (s 1 ,s 2 ) B K at µ 0 can be found in Table XXXVIII and our updated results for σ (s 1 ,s 2 ) B K are reported in Table XXXIX. For each scheme, the 32ID renormalization factor evaluated at µ = 3 GeV is then simply given by

a. Bilinears and quark mass renormalization
The values for the amputated vertex functionsΛ (normalized by the tree level value) at finite quark mass and in the chiral limit computed on the 24I ensemble are given in Tables XXVII and XXVIII for the SMOM γ µ and SMOM schemes respectively. The corresponding numbers for the 32I ensembles are given in Tables XXIX and XXX. Recall that we use only one choice of projector for the scalar and pseudoscalar vertices, specifically those given in Eq. (F7). The results for these vertices computed on the 24I and 32I ensembles are included in Tables XXVII and XXIX respectively.
In Table XXXI we presentΛ interpolated to 3 GeV using a polynomial ansatz in the momenta.
For the 24I lattice, since we have a very fine resolution, we take the five momenta quoted in the tables. For the 32I results we use q ∼ 2.77, 3.10 and 3.43 GeV in the interpolation.
We show the values of the quark mass renormalization in and V vertices. We observe that the differences between the vector and axial vector vertices are very small, and can therefore be neglected. The differences between the scalar and pseudoscalar vertices are slightly larger, but these correspond to only 0.01% changes if used in the computation of Z m , and can therefore be ignored. As discussed above, the systematic error associated with not taking the chiral extrapolation of the sea strange quark mass can also be ignored. Note that the  (7) 1.1165 (7) TABLE XXVIII. Projected, amputated vertex functionsΛ in the SMOM scheme computed on the two 24I ensembles, and in the chiral limit, at scales close to the chosen renormalization scale of 3 GeV.
uncertainties on the lattice spacings are incorporated in these quantities in the main analysis.

b. Renormalization of B K
We quote the results for the projected vertex functionΛ VV +AA at finite masses and in the chiral limit for various momenta on the 32I and 24I ensembles in Tables XXXIII and XXXIV. The corresponding values each computed at a single quark mass on the 48I, 64I and 32Ifine ensembles are given in Tables XXXV, XXXVI and XXXVII respectively.
To obtain the final results we construct the ratio given in Equation (92) at finite quark masses for a few momenta surround the desired scale, either µ 0 = 1.4363 GeV or µ = 3 GeV, take the chiral limit and then perform the interpolation with a polynomial ansatz. Similarly to the quark mass case, the procedure is very robust and does not depend on the order we perform these operations, in Table XXXVIII, and the continuum step-scaling factors used to run the 32ID renormalization factor to 3 GeV are quoted in Table XXXIX.
As with the quark mass renormalization the only significant source of systematic error on these results arises from the perturbative matching to MS, which we estimate using the full difference between our final predictions for B K determined via the two intermediate SMOM schemes. As above, we incorporate the uncertainties on the lattice spacings into our renormalization factors in the main analysis.

Appendix G: Random number generator
After all the data presented in this paper was generated, it was found that the U(1) noise generated from the freshly initialized random number generator (RNG) in CPS [71] is vulnerable to correlations, such that the expectation value of ∑ x ∑ N j=1 e −iθ (x) j 2 /V deviates from N. This correlation is not observed when U(1) noise is replaced with gaussian noise, for which the accept/reject procedure used in generating the gaussian random numbers appears to eliminate the observed correlation. We also confirmed that the U(1) noise generated from the CPS RNG for the thermalized gauge configurations on our previous ensembles do not show the correlation, due to the de-correlating effect of the gaussian RNG used for the pseudofermion fields.
To further test the robustness of gaussian random numbers generated from CPS RNG, we reproduced the 2+1 flavor DWF ensemble used in Ref. [72], with the RNG's replaced with the Mersenne Twister [73], implemented in C++11. Each 2 4 hypercube of lattice sites was initialized with randomized seeds. We confirmed that the plaquette agrees to within 1 standard deviation: 0.588064 (12) from 8460 MD units compared to 0.588052(9) from the configurations used in Ref. [72]. All the random numbers generated from CPS RNG for the work presented here were gaussian random numbers. The only exception are the Z(3) random numbers for Z 3 box source (γ µ , γ µ ) scheme, lowest momenta q/GeV used for the Ω baryon in Section III E, which was generated independently from the CPS RNG.
(γ µ , γ µ ) scheme q/GeV   (γ µ , γ µ ) 0.9161 (5)   we quote values at both 1.4363 GeV and 3 GeV, which are used to compute the step-scaling factor. For the coarse 32ID ensemble we only quote the value at the lower scale, and for the 48I, 64I and 32Ifine we do not quote the values at the lower scale as they are not needed for our analysis. These values do not include the effect of the uncertainty on the lattice spacing in their errors.
These values do not include the effect of the uncertainty on the lattice spacing in their errors.