Simulating rare kaon decays $K^{+}\to\pi^{+}\ell^{+}\ell^{-}$ using domain wall lattice QCD with physical light quark masses

We report the first calculation using physical light-quark masses of the electromagnetic form factor $V(z)$ describing the long-distance contributions to the $K^+\to\pi^+\ell^+\ell^-$ decay amplitude. The calculation is performed on a 2+1 flavor domain wall fermion ensemble with inverse lattice spacing $a^{-1}=1.730(4)$GeV. We implement a Glashow-Iliopoulos-Maiani cancellation by extrapolating to the physical charm-quark mass from three below-charm masses. We obtain $V(z=0.013(2))=-0.87(4.44)$, achieving a bound for the value. The large statistical error arises from stochastically estimated quark loops.


I. INTRODUCTION
The K + → π + + − ( = e, µ) decays are flavorchanging neutral current processes that are heavily suppressed in the standard model (SM), and thus expected to be sensitive to new physics.
Their branching ratios, taken from the latest PDG average [1], are Br [K + → π + e + e − ] = 3.00(9) × 10 −7 and Br [K + → π + µ + µ − ] = 9.4(6) × 10 −8 . This process is dominated by a single virtual-photon exchange (K → πγ * ), whose amplitude is predominantly described by long-distance, nonperturbative physics [2]. With tensions between the LHCb measurement [3] of and SM predictions for the ratio R K contributing to increased interest in lepton-flavor universality (LFU) violation, important tests of LFU in the kaon sector could also be provided by K + → π + + − decays [4]. The amplitude for the K → πγ * decay can be expressed in terms of a single electromagnetic form factor V (z) defined via [2,5] where µ is the photon polarisation index, z = q 2 /M 2 K , q = k − p, and k and p indicate the momenta of the K and π respectively. From analyticity, a prediction of V (z) is given by [2] V (z) = a + + b + z + V ππ (z), where a + and b + are free real parameters and V ππ (z) describes the contribution from a ππ intermediate state (detailed in [2]) with a π + π − → γ * transition. The free parameters have, until recently, only been obtained by * antonin.portelli@ed.ac.uk fitting experimental data. Having previously measured the K + decay channel for electrons and muons at the NA48 experiment at the CERN SPS [6], the follow-up NA62 experiment measured the K + → π + µ + µ − decay during the 2016-2018 Run 1 [7], with prospects for further measurements during the 2021-2024 Run 2 [8]. From the NA48 electron data, values of a + = −0.578 (16) and b + = −0.779(66) have been found [6], and the available NA62 muon data resulted in a + = −0.592 (15) and b + = −0.699(58) [7]. In parallel, the theoretical understanding of these processes is being improved. The authors of [9,10] construct a theoretical prediction of a + and b + by considering a two-loop low-energy expansion of V (z) in three-flavor QCD, with a phenomenological determination of quantities unknown at vanishing momentum transfer. From the electron and muon they find a + = −1.59 (8) and b + = −0.82 (6), in significant tension with the experimental data fit. The authors acknowledge that more work is being done to estimate more accurately the ππ and KK contributions.
The nonperturbative ab-initio approach of lattice QCD is well suited to study the dominant long-distance contribution to the matrix element of the K + → π + γ * decay. Methods with which such a lattice calculation could be performed were first proposed in [11], and additional details on full control of ultraviolet divergences were introduced in [12]. An exploratory lattice calculation [13], using unphysical meson masses, demonstrated a practical application of these methods.
This letter describes a lattice calculation following the same approach as [13], but using physical light-quark masses, thereby allowing for the first time a direct comparison to experiment.

II. EXTRACTION OF THE DECAY AMPLITUDE
The procedure and expressions in this section are largely a summary of the approach described in [13]. We wish to compute the long-distance amplitude defined as in Minkowski space, where q, k, and p are defined as above, J µ is the quark electromagnetic current and H W is a ∆S = 1 effective Hamiltonian density, given by [14] H where the C j are Wilson coefficients, and Q q 1 and Q q 2 are the current-current operators defined (up to a Fierz transformation) by [11] Q We renormalize the operators Q q i nonperturbatively within the RI-SMOM scheme [15] and then follow [16] to match to the MS scheme, in which the Wilson coefficients have also been computed.

A. Correlators and Contractions
The corresponding Euclidean amplitude-which is accessible to lattice QCD calculations-can be computed with the "unintegrated" 4pt correlator [12] where φ † P (t, k) is the creation operator for a pseudoscalar meson P at time t with momentum k. To obtain the decay amplitude we take the integrated 4pt correlator [12] (8) in the limit T a , T b → ∞. The exponential factor translates the decay to t J = 0, allowing us to omit any t J dependence in further expressions. HereΓ (4) µ is the "reduced" correlator, where we have divided out factors that are not included in the final amplitude, i.e.
The spectral decomposition of Eq. (7) has been discussed in detail in [13], in particular describing the presence of intermediate one-, two-, and three-pion states between the J µ and O W operators. As these states can have energies E < E K (k) they introduce exponentially growing contributions that cause the integral to diverge with increasing T a . These contributions do not contribute to the Minkowski decay width [12] and must be removed in order to extract the amplitude whereĨ µ is the integrated 4pt correlator with intermediate-state contributions subtracted. The methods used to remove the intermediate states follow the same steps as in [13], and are outlined in Section II B. The four classes of diagrams-Connected (C), Wing (W ), Saucer (S), and Eye (E)-that contribute to the integrated correlator are represented schematically in the supplementary material. The current can be inserted on all four quark propagators in each class of diagram, in addition to a quark-disconnected self-contraction. Diagrams of these five current insertions for the C class are also shown in the supplementary material. The 20 resulting diagrams need to be computed in order to evaluate Eq. (7).
When working on the lattice there are potentially quadratically divergent contributions that come about as the operators J µ and H W approach each other when the current is inserted on the loop of the S and E diagrams [11,12]. Since we perform our calculation with conserved electromagnetic currents the degree of divergence is reduced to, at most, a logarithmic divergence [11] as a consequence of U(1) gauge invariance and the resulting Ward-Takahashi identity. We emphasise that, due to exact gauge symmetry in lattice QCD there is a vector current, which is exactly conserved on each configuration, independent of any residual chiral symmetry breaking. The remaining logarithmic divergence is removed through the Glashow-Iliopoulos-Maiani (GIM) mechanism [17], implemented here through the inclusion of a valence charm quark in the lattice calculation.

B. Intermediate states
The contribution of the single-pion intermediate state can be removed by either of the two methods discussed in [13]. The first of these (method 1) reconstructs the single-pion state using 2pt and 3pt correlators to subtract its contribution explicitly. The relevant amplitude can be extracted with this method in several ways, including a direct fit of A µ and the intermediate state, the reconstruction of the intermediate states using fits to 2pt and 3pt correlators, a zero-momentum-transfer approximation and an SU(3)-symmetric-limit approximation, all of which are discussed in detail in [13].
The second method proposed in [13] (method 2) involves an additive shift to the weak Hamiltonian by the scalar densitysd [18] where the constant parameter c s is chosen such that Replacing O W with O W in Eq. (7) removes the contribution of the single-pion intermediate state.
As the scalar density can be written in terms of the divergence of a current, the physical amplitude is invariant under such translation [12]. The two-pion contributions are expected to be insignificant until calculations reach percent-level precision and the three-pion states are even more suppressed [12]. As we do not compute the rare kaon decay amplitude to such a precision, the two-and three-pion states are not accounted for in our studies.

III. DETAILS OF CALCULATION
This calculation is performed on a lattice ensemble generated with the Iwasaki gauge action and 2+1 flavors of Möbius domain wall fermions (DWF) [19]. The Möbius DWF action [20] was used to simulate the sea quarks, with a rational approximation used for the strange quark. In this calculation the light valence quarks make use of the zMöbius action [21], an approximation of the Möbius action where the sign function has had its L s dimension reduced by using complex parameters matched to the original real parameters using the Remez algorithm. This gives a reduced fifth-dimensional extent L s = 10, reducing the computational cost of light-quark inversions. The lowest 2000 eigenvectors of the Dirac operator were also calculated ("deflation"), allowing us to accelerate the light-quark zMöbius inversions further. We correct for the bias introduced by the zMöbius action with a technique similar to all-mode-averaging (AMA) [22] by computing light and charm propagators also using the Möbius action on lower statistics, using the Möbius accelerated DWF (MADWF) algorithm [23] with deflated zMöbius guesses in the inner loop of the algorithm for the light and a mixed-precision solver for the charm quarks. Further details are in the supplementary materials.
The GIM subtraction relies on a precise cancellation, in particular in the low modes of the light and charm actions, and it is paramount to use the same actions for those quarks. With the choice of zMöbius parameters for the light quark, the DWF theory breaks down for the physical charm-quark mass [24]. We instead perform the GIM subtractions using three unphysical charmquark masses, chosen to be am c1 = 0.25, am c2 = 0.30, am c3 = 0.35, and extrapolate the results to the physical point. The physical charm-quark mass was found to be am c = 0.510(1) by computing the three unphysical η cmeson masses and extrapolating to the physical η c mass. Previous work has demonstrated that, for the lattice parameters in use for this calculation, such an extrapolation is well-controlled [25].
We use Coulomb-gauge fixed wall sources for the kaon and pion. The pion and kaon sources are separated by 32 lattice units in time, with the kaon at rest at t K = 0 and the pion with momentum p = 2π L (1, 0, 0) at t π = 32. The electromagnetic current is inserted midway between the kaon and the pion at t J = 16, so that the effects of the excited states from the interpolating operators are suppressed. We omit the disconnected diagram, since it is suppressed by SU(3) flavour symmetry and 1/N c to an expected ∼ 10% of the connected-diagram contribution [13]. Given the error on our final result, the disconnected contribution is negligible. Control of the error is being explored in an ongoing project.
We use the Möbius conserved lattice vector current [19] with only the time component µ = 0, which is sufficient to extract the single form factor from Eq. (1).
To compute the loops in the S and E diagrams we use spin-color diluted sparse sources, similar to those used in [26], the structure of which is described in the supplemental material. We use the AMA technique [22] for our calculation of these diagrams, computing one hit of sparse noise with "exact" solver precision (10 −8 , 10 −10 , 10 −12 , and 10 −14 for the light, c 1 , c 2 , and c 3 quarks, respectively) and the same hit of sparse noise with "inexact" solver precision (10 −4 for all quarks). We then compute an additional 9 hits of sparse noise with inexact solver precision and apply a correction computed from the difference of the reciprocal noises.
We performed all correlation function calculations using dedicated software [27] based on the Grid [28,29] and Hadrons [30] libraries. All three are free software under GPLv2. The raw lattice correlators used in this work are publicly available online [31].

IV. NUMERICAL RESULTS
The 4pt functions for the lightest charm-quark mass are shown in Fig. 1, and Fig. 2 shows the T a dependence of the integrated correlator for fixed T b both before and after removing the exponentially growing contributions using method 2. We perform a simultaneous fit to the 2pt, 3pt and integrated 4pt functions, extracting matrix elements, energies, form factors and A 0 , using a covariance matrix with fully correlated 2pt and 3pt sectors and uncorrelated 4pt sector. From this fit, we obtain A 0 = 0.00022(172) with a χ 2 /dof = 0.996. Further details on the fitting procedure, including a discussion of the fit ranges which were used, are presented in the supplemental material. The error on A 0 is entirely statistical. Table I shows the results for A 0 using the three charmquark masses, extracted using the different methods detailed above. The results from method 2 have statistical errors compatible with method 1 results. As method 2 has the simplest fit structure, we use it to extrapolate to the physical charm-quark mass and to compute the form factor as our final result. We stress that method 1 remains an important cross-check on the analysis. Fig. 4 shows the extrapolation of the method-2 results to the physical charm-quark mass, giving a value of A 0 = 0.00035(180). From Eq. (1) we can relate our TABLE I. Fit results for A0 for the three unphysical charmquark masses and value found from extrapolating these to the physical point. The first four results are obtained using the various approaches to method 1, as described in Section II B, and the final result is obtained using method 2. result to the form factor to achieve V (z) = −0.87(4.44). For our choice of kinematics we have z = 0.013(2); we expect the b + z contribution to be ∼ 10 −2 assuming b + is O(1), and we estimate V ππ (z) = −0.00076(73) following [2]. We may therefore take our result for a + as an approximation for the intercept of the form factor.

V. CONCLUSION
We have carried out the first lattice QCD calculation of the K + → π + + − decay amplitude using physical pion and kaon masses. When using physical light- quark masses, even with unphysically light charm-quark masses, the contributions in the GIM loops statistically decorrelate, as shown in Fig. 3. This contributes to the unsatisfactory amount of noise in GIM subtraction, as can be seen in Fig. 1. Although sparse noises reduced the statistical error introduced by the single-propagator trace contribution to the Eye and Saucer diagrams, we are not able to obtain a well-resolved result for the amplitude.
The form factor that encapsulates the behavior of the long-distance amplitude of the rare kaon decay was found to be V (0.013 (2)) = −0.87 (4.44). When this is compared to experimental results, V exp (0) ≡ a exp + = −0.578 (16) from the electron and a exp + = −0.592 (15) from the muon, it can be seen that the error on our lattice result is about 8 times larger than the central value of the experimental result. However, our error is 3 times larger than the phenomenological central value obtained in [9,10], which suggests that lattice QCD calculations will be able to provide a competitive theoretical bound on a + in the coming years.
We would like to stress that since the noise emerges mainly from the lack of correlation in the GIM subtraction, the error obtained here has the potential to be reduced beyond square-root scaling by optimising the stochastic estimator used for the up-charm loops. Such problems have common elements with similar challenges in computing quark-disconnected diagrams, for example as discussed in [32]. Finally, it might also be possible to work in 3-flavor QCD, foregoing the calculation of the charm-quark loop [33], further reducing computational costs. This would require a new renormalization procedure which would be analogous to that of the K → πνν study that was performed by the RBC-UKQCD collaborations previously [34,35].
In conclusion, despite obtaining a first physical result with a large uncertainty, we believe that optimisation of the methodology, combined with the increased capabilities of future computers, should allow for a competitive prediction of the K + → π + + − amplitude within the next years. The spacetime distribution of a source may be treated stochastically, in order to decrease the effects of local fluctuations from the gauge fields. This is important for constructing lattice propagators of the form S (x, x), which are needed to calculate a disconnected diagram or a single-propagator trace contribution to a correlation function, needed for the Eye and Saucer diagrams (Fig. 7) contributing to the rare kaon decay amplitude. To create the propagators we depend on N stochastic sources κ i that fulfill the properties

This work used the DiRAC Extreme
One appropriate choice is the Z 2 source [36], where each element is randomly chosen from It is expected that the statistical error introduced from using stochastic sources scales as 1/ √ N . Each stochastic source here covers the full volume but we can also create "sparse sources", similar to those described in [26], to improve the 1/ √ N scaling of the statistical error. In ddimensional spacetime we create N = n d sparse sources where κ sparse (x) = κ Z2 (x) : x µ mod n = 0, µ = 0, 1, 2, 3 0 : otherwise (3) for the first source and we shift in each dimension to ensure that the N sources cover the entire volume with no overlap when combined, see Fig. 5 for a d = 2, n = 2 example. When investigating rare kaon decays we use N = 2 4 = 16 sparse sources for each hit of a propagator, S (x, x), that we compute. A cost-benefit analysis was performed using the quantity ∆X ∆Sparse N X NSparse , where ∆X is the statistical error of the result from method X and the root of the number of inversions N X tracks the computational cost of using method X. Fig. 6 shows the results of this cost-benefit analysis for the 3-point Saucer diagram with zero momentum. The loop in the diagram was computed using sparse sources, full volume sources and time-diluted allto-all vectors [37] with 2000 low modes, with the other propagators being computed with Coulomb-gauge fixed wall sources. This was performed on RBC/UKQCD's 48 3 × 96 Möbius domain wall fermion gauge ensembles [19]. It can clearly be seen that the sparse-noise approach is the most successful.

II. FURTHER RELEVANT CORRELATORS
Before giving details on the fit parameters that were used we outline the definitions of several relevant Euclidean correlation functions.

A. 2-point correlators
Given an interpolation operator φ P (t, p) for a pseudoscalar meson, P , with spacial momentum p at time t, for t 0 the 2-point function has the following behavior: where Z P (p) = 0| φ P (0, 0) |P (p) and E P (p) the meson energy M 2 P + p 2 . We calculated the pion and kaon 2pt functions using Coulomb-gauge fixed wall sources and both Coulombgauge fixed wall sinks and point sinks. Although we only require the wall-wall matrix elements in order to extract the decay amplitude the point-wall 2pt functions have a cleaner signal. Thus both the wall-wall and pointwall correlators can be used in a combined fit to obtain E P (p) with greater accuracy. All pseudoscalar/sink combinations are calculated for p = 2π L (0, 0, 0) and p = 2π L (1, 0, 0).

C. 3-point electromagnetic current correlator
The electromagnetic current 3pt function for a pseudoscalar meson, P , has the following asymptotic behavior for 0 t J t: where M P Jµ (p, k) = P (p)| J µ (0) |P (k) . This correlator is calculated for both the pion and the kaon.

III. VARIANCE REDUCTION TECHNIQUES
To compute the costly loop diagrams we use a variation of the all-mode-averaging (AMA) technique. On each configuration, for a given operator O, we have an estimator O e tj at "exact" solver precision (10 −8 , 10 −10 , 10 −12 , and 10 −14 for the light, c 1 , c 2 , and c 3 quarks, respectively) using T source times t j and a sparse noise n 0 . We furthermore have N estimators O ie tj ,ni on each source time t j at "inexact" solver precision (10 −4 for all quarks), computed for N − 1 sparse noise sources n i in addition to the noise source n 0 . The AMA estimator we construct from those estimators is where the superscript "zM" highlights that so far, the zMöbius action has been used. In a second AMA step, we compute a single estimator O M for the sparse noise source n 0 and source time t 0 at "exact" precision to get our final estimator The resulting expectation value O = O M , but at a much reduced variance. In practice, we used N = 10 and T = 6, where the 6 source times have been chosen to evenly interlace the time extent of our lattice (16 timeslices apart from each other).

IV. FIT PARAMETERS
Results of the decay amplitude are derived from global fits over all correlation functions involved in a specific fit strategy. A summary of the fit parameters for the 2pt and 3pt functions used, consistent across all fit strategies that they enter, is given in Table II. The range of T a and T b used in each case is given in Table III. In Fig. 9 we show the correlation matrix of the 2pt and 3pt functions, as well as slices of the integrated 4pt function, which highlights the high degree of correlation between elements of the integrated 4pt function. This correlation structure in the data makes the use of uncorrelated fits for the integrated 4pt function necessary. π P 0 π W 0 π P 1 π W 1 K P 0 K W 0 K P  9. The correlation matrix for the simultaneous fit used to extract A0. The correlators in this fit are: the 2pt pion correlator with zero momentum and a point-(π P 0 ) wall-sink (π W 0 ), the equivalent correlators with one unit of momentum (π P 1 , π W 1 ), the same again for the kaon (K P 0 , K W 0 , K P 1 , K W 1 ), the vector current inserted on the pion (3 pt π ) and kaon (3 pt K ), the 3pt weak Hamiltonian correlator with zero (H 0 W ) and one unit of momentum (H 1 W ), and the integrated 4pt correlator (I (4) 0 ). This shows the correlation matrix for a fully correlated fit. Due to the highly correlated nature of the matrix, off-diagonal elements for the integrated 4pt function were set to zero to make the unintegrated 4pt function "uncorrelated" to the other fit variables. TABLE II. Fit parameters for the various 2pt and 3pt correlators and methods used to extract the K + → π + + − decay amplitude. With the exception of 3pt cs each of these were correlated to each other for the relevant simultaneous fits with the integrated 4pt function. The cs parameter was fitted separately and used as an input to the cs shift and cs ×sd analyses. The 3pt HW and cs fit parameters are the same for the three unphysical charm-quark masses. "Thinning" is the stride between data points entering the fit within the fit range.