Lattice calculation of the pion transition form factor with $N_f=2+1$ Wilson quarks

We present a lattice QCD calculation of the double-virtual neutral pion transition form factor, with the goal to cover the kinematic range relevant to hadronic light-by-light scattering in the muon $g-2$. Several improvements have been made compared to our previous work. First, we take into account the effects of the strange quark by using the $N_f=2+1$ CLS gauge ensembles. Secondly, we have implemented the on-shell $\mathcal{O}(a)$-improvement of the vector current to reduce the discretization effects associated with Wilson quarks. Finally, in order to have access to a wider range of photon virtualities, we have computed the transition form factor in a moving frame as well as in the pion rest-frame. After extrapolating the form factor to the continuum and to physical quark masses, we compare our results with phenomenology. We extract the normalization of the form factor with a precision of 3.5\% and confirm within our uncertainty previous somewhat conflicting estimates for a low-energy constant that appears in chiral perturbation theory for the decay $\pi^0 \to \gamma\gamma$ at NLO. With additional input from experiment and theory, we reproduce recent estimates for the decay width $\Gamma(\pi^0 \to \gamma\gamma)$. We also study the asymptotic large-$Q^2$ behavior of the transition form factor in the double-virtual case. Finally, we provide as our main result a more precise model-independent lattice estimate of the pion-pole contribution to hadronic light-by-light scattering in the muon $g-2$: $a_{\mu}^{\mathrm{HLbL}; \pi^0} = (59.7 \pm 3.6) \times 10^{-11}$. Using in addition the normalization of the form factor obtained by the PrimEx experiment, we get the lattice and data-driven estimate $a_{\mu}^{\mathrm{HLbL}; \pi^0} = (62.3 \pm 2.3) \times 10^{-11}$.

We present a lattice QCD calculation of the double-virtual neutral pion transition form factor, with the goal to cover the kinematic range relevant to hadronic light-by-light scattering in the muon g − 2. Several improvements have been made compared to our previous work. First, we take into account the effects of the strange quark by using the N f = 2 + 1 CLS gauge ensembles. Secondly, we have implemented the on-shell O(a)-improvement of the vector current to reduce the discretization effects associated with Wilson quarks. Finally, in order to have access to a wider range of photon virtualities, we have computed the transition form factor in a moving frame as well as in the pion rest-frame. After extrapolating the form factor to the continuum and to physical quark masses, we compare our results with phenomenology. We extract the normalization of the form factor with a precision of 3.5% and confirm within our uncertainty previous somewhat conflicting estimates for a low-energy constant that appears in chiral perturbation theory for the decay π 0 → γγ at NLO. With additional input from experiment and theory, we reproduce recent estimates for the decay width Γ(π 0 → γγ). We also study the asymptotic large-Q 2 behavior of the transition form factor in the double-virtual case. Finally, we provide as our main result a more precise model-independent lattice estimate of the pion-pole contribution to hadronic light-by-light scattering in the muon g − 2: a HLbL;π 0 µ = (59.7 ± 3.6) × 10 −11 . Using in addition the normalization of the form factor obtained by the PrimEx experiment, we get the lattice and data-driven estimate a HLbL;π 0 µ = (62.3 ± 2.3) × 10 −11 .

I. INTRODUCTION
There is a long-standing discrepancy between the Standard Model estimate of the muon anomalous magnetic moment and its experimental determination [1]. Two new experiments, E989 at Fermilab [2] and E34 at J-PARC [3], plan to reduce the experimental error by a factor four in the near future. The theory error is completely dominated by hadronic contributions: the hadronic vacuum polarization (HVP), which enters at order α 2 e in the fine-structure constant α e , and the hadronic light-by-light (HLbL) scattering at order α 3 e . The former is usually obtained using dispersive methods which rely on the e + e − → hadrons cross sections, accessible from experiments [4,5]. Thus, a more accurate determination essentially relies on the availability of precise measurements. Lattice QCD is also a promising tool, and has made a lot of progress in recent years. Even if not yet competitive with the dispersive approach, it became a mature field where most sources of systematic uncertainties have now been addressed, see e.g. Ref. [6] for a recent review. As for the HLbL contribution, the situation is less favorable. Until recently, all estimates for the HLbL contribution were based on model calculations, leading to the Glasgow consensus [7] (see also [8]), for which errors are difficult to estimate. The single largest contribution to a HLbL µ is given by the pion-pole contribution [9] with a prescription for its evaluation that has been confirmed in the recently proposed dispersive approaches to HLbL [10,11]. Its determination relies on the knowledge of the neutral pion transition form factor (TFF). Two groups have also started the direct calculation of the HLbL scattering contribution, from first principles, using lattice QCD [12][13][14][15][16][17]. In the Mainz approach, the calculation involves the convolution of a QED kernel function, computed semi-analytically in infinite, continuous coordinate space, and a QCD four-point correlation function, computed on the lattice. The long-distance contribution is expected to be dominated by the pion-pole contribution. However, this region suffers from large statistical errors and is also more affected by finite-size effects. The TFF is therefore a key ingredient to first reduce the statistical error by constraining the tail of the integrand at large distances and, secondly, to estimate and correct for the dominant finite-size effects due to pions.
The TFF is also interesting from a theoretical point of view. First, in the low energy region, it is directly related to the Adler-Bell-Jackiw (ABJ) chiral anomaly [18,19]. Secondly, for large virtualities, the asymptotic behavior of the TFF is predicted by the Brodsky-Lepage analysis and the operator product expansion (OPE) in the single and double-virtual case respectively [20][21][22]. Testing these predictions is a remarkable test of QCD over a large range of length scales.
The normalization of the TFF has been measured by the PrimEx experiment with a precision of 1.4 % [23] and this error should be reduced by a factor of two by the PrimeEx-II experiment [24,25]. At finite virtualities, experimental data are only available in the single-virtual case and for rather large virtualities above 0.6 GeV 2 [26] but the BESIII experiment plans to have data in the region 0.3 -3 GeV 2 , relevant for the muon g − 2, in the near future [27]. Finally, no experimental data exists to date in the regime of two virtual photons.
In our previous work [28], we have shown that lattice QCD can provide a precise estimate of the TFF in the full spacelike region relevant for the pion-pole contribution to the muon (g − 2). This work is an update of our previous lattice calculation of the neutral pion TFF [28], and includes several major improvements. First, it is based on gauge configurations with N f = 2 + 1 dynamical flavors, which have been generated as part of the CLS initiative. Secondly, on-shell O(a)-improvement of correlation functions has been implemented to reduce discretization effects when approaching the continuum limit. Finally, in addition to the pion rest frame, a new frame where the pion carries one unit of momentum, typically in the range of 300 to 400 MeV, is considered. This allows us to probe larger photon virtualities, especially in the single-virtual case, where virtualities as high as 1.5 GeV 2 are now available. In all cases, we probe a much wider range of virtualities than in our previous study and have increased statistics significantly. We also address many sources of systematic errors, including finite-size effects, hypercubic lattice artifacts due to the broken rotational invariance down to the isometry group H(3) on the lattice and disconnected contributions.
As a benchmark of our calculation, we reproduce the anomaly constraint in the continuum and at the physical pion mass with a precision of 3.5%. From the pion mass dependence of the normalization of the form factor we extract the corresponding low energy constant (LEC) appearing in the chiral Lagrangian needed for the NLO calculation of the decay π 0 → γγ in chiral perturbation theory. We confirm previous estimates of this and a related LEC and obtain with additional input from theory and experiment the value Γ(π 0 → γγ) = 8.07 (10) eV in perfect agreement with recent results in the literature [29]. The improvements of our lattice calculation allow for a model-independent determination of the pion-pole contribution to hadronic light-by-light scattering in the muon (g − 2), expected to be numerically dominant, along with the η and η pseudoscalar mesons. Our final lattice result reads a HLbL;π 0 µ = (59.7 ± 3.6) × 10 −11 which corresponds to a precision of 6%. With the normalization of the form factor obtained from the measurement of the decay width π 0 → γγ by the PrimEx experiment, we get the lattice and data-driven estimate a HLbL;π 0 µ = (62.3 ± 2.3) × 10 −11 with a precision of 4%. This paper is organized as follows. In Sec. II, we describe our methodology to extract the pion TFF. In particular, we explain how O(a)-improvement is implemented. In Sec. III, we present our results, extrapolated to the physical point, and discuss potential sources of systematic errors. Then, in Sec. IV, after comparing our data with various phenomenological models, we use our result to compute relevant phenomenological quantities, including the pion decay π 0 → γγ and the pion-pole contribution in the hadronic light-by-light scattering contribution to the muon (g−2). We conclude in Sec. V with a summary of our work and present some possible improvements for future calculations.

A. Extraction of the transition form factor
In this section, we use the same notations as in Ref. [28] and recall only the main equations. In Minkowski spacetime, the TFF describing the interaction between a neutral pion with momentum p and two off-shell photons with momenta q 1 and q 2 is defined via the following matrix element, where J µ is the hadronic component of the electromagnetic current, p = q 1 + q 2 and µναβ is the fully antisymmetric tensor with 1 0123 = 1. In Euclidean spacetime, the TFF is obtained after analytical continuation. The latter is valid for q 2 1,2 < s 0 , with s 0 the treshold for hadron production in the vector channel [30,31] 2 , and reads where n 0 denotes the number of temporal indices carried by the two vector currents, ω 1 is a real free parameter such that q 1 = (ω 1 , q 1 ) and the superscript E stands for Euclidean. It is convenient to write Figure 1: Kinematic reach in the photon virtualities (q 2 1 , q 2 2 ) for the ensemble N200; see Table I for its parameters. Left: pion rest frame. Right: moving frame where the pion has one unit of momentum in the z-direction p = (2π/L)ẑ.

Eq. (2) as
where E π is the pion energy, Z π = 0|P (0)|π is the overlap of the pseudoscalar operator with the pion state 3 and A µν (τ ) is related to the three-point correlation function computed on the lattice, The three-point correlation function is defined as where τ = t i −t f is the time separation between the two vector currents and t π = min(t f −t 0 , t i −t 0 ) is the minimal time separation between the pion interpolating operator and the two vector currents. Finally, even if not explicitly written, the functions A µν (τ ), as well as the three-point correlation functions, depend on the photon spatial momenta q 1 and q 2 .

B. Orbits and kinematical reach
The TFF depends on the two virtualities q 2 1 and q 2 2 . Given that we use periodic boundary conditions in space, the kinematical range accessible on the lattice can be parametrized by Let the spatial momentum p of the pion be fixed. For a given q 1 , the second photon momentum q 2 = p− q 1 is fully determined and the system (6) describes a single curve in the q 2 1 , q 2 2 plane, parametrized by ω 1 . If we now consider a fixed value of | q 1 |, there is a finite number of realizations of q 1 , forming an orbit or a set of orbits 4 on the reciprocal cubic lattice. To this set of q 1 -values correspond in general several values of | q 2 | 2 , each associated with a separate curve in the (q 2 1 , q 2 2 ) plane. In our calculation, the three-point function evaluations obtained for a given set of (| q 1 |, | q 2 |) are averaged over in order to increase the statistical precision. The presence of hypercubic artefacts, due to the breakdown of rotational symmetry on the lattice, is discussed in Sec. III D 1. 3 Fixing the phase of the pion state via the relation 0|A a µ (x)|π b , p = iFπpµδ ab e −ip·x , with A a µ =ψγµγ 5 τ a 2 ψ the isovector axial current, the partially-conserved axial current (PCAC) relation implies that the overlap Zπ = − √ 2im 2 π Fπ/m of the operator P =ψγ 5 ψ is purely imaginary. Here Fπ 92 MeV is the pion decay constant and m is the average up/down quark mass. 4 For instance, the vectors n = (3, 0, 0) and n = (2, 2, 1), which have the same norm, do not belong to the same lattice orbit.
VMD LMD Figure 2: The functions A (1) (τ ) and A (2) (τ ) for two different orbits with (| q1| 2 , | q2| 2 ) = 2π L 2 (3, 2) and | p| = 2π/L for ensemble D200, whose parameters are given in Table I. Lattice data are in black. The blue and red lines correspond to a fit of the tail using the VMD and LMD models respectively, as explained in Sec. II D.
In Ref. [28], we chose the pion to be at rest, and the corresponding orbits for the ensemble N200, whose parameters are given in Table I, are shown on the left panel of Fig. 1. This setup is well suited to probe large virtualities in the double-virtual case, the kinematical region where no experimental data is available. However, because of their large eccentricity, these orbits are limited to rather low virtualities in the single-virtual case. Here we therefore include a second frame where the pion has one unit of momentum in the z-direction. In this case, we can access larger virtualities for the single-virtual form factor, as can be seen on the right panel of Fig. 1. In principle, even larger virtualities could be reached by increasing the pion momentum, but the signal-to-noise ratio would deteriorate rapidly. Moreover, since within our computational setup the pion interpolating operator is implemented using a sequential quark propagator, every new pion momentum p requires a new inversion of the Dirac operator, the most expensive part of the numerical simulation.

C. Decomposition of the integrand Aµν (τ )
The main quantity of interest in the lattice calculation is the function A µν (τ ) defined through Eq. (4) and directly related to the three-point correlation function. Starting from Eq. (3), which holds for all real values of ω 1 such that q 2 1,2 < s 0 , we consider the analytic continuation for complex values of ω 1 = i ω Again, we do not write the dependence on the spatial momenta explicitly: M E µν is a function of q 1 and q 2 and depends implicitly on ω. Relation (7) is valid as long as A µν (τ ) falls off exponentially for τ → −∞ 5 . If E π < √ s 0 , which holds in all cases considered in this paper, the relation is guaranteed to hold for all values of ( q 1 , q 2 ). The inversion of the Fourier transform then yields where the coefficients P µν and Q µν do not depend on ω 1 . Thus, we can write with P E µν = iP µν and Q E µν = (−i) n0 Q µν and where all the information is encoded into a single rotationally invariant function A (1) defined through 5 For τ → +∞, an exponential fall-off is guaranteed, irrespective of the kinematics.
Exact solution Figure 3: Numerical integration for data generated using a LMD model with lattice parameters close to N200. Blue (red) points correspond to the transition form factor F π 0 γ * γ * (−Q 2 1 , −Q 2 2 ) calculated using A (1) (τ ) ( A (2) (τ )) respectively. The exact result is represented by the black line. Top: using a trapezoidal integration rule. Bottom: using a Simpson integration rule.
In Eq. (11) it is understood that the arguments of the TFF are given by Eq. (6) with ω 1 set to i ω. In particular, the integral transform does not keep either virtuality fixed and q 2 2 even carries an imaginary part.
It is easiest to interpret the components of A µν (τ ) in a spatially covariant notation using the Euclidean metric. With and two unit vectors, we can write These are the master relations expressing A µν (τ ) in terms of an integral transform of the pion transition form factor. In the pion rest-frame, A kl (τ ) measures the transform A 1 (τ ) of the transition form factor. At non-zero p, A kl (τ ) measures a linear combination of A 1 (τ ) and its temporal derivative; in addition, the components A 0k (τ ) are proportional to A 1 (τ ). For convenience, with a pion carrying one unit of momentum in the z-direction, we use the notation A (2) (τ ) to write We remark that the matrix elements M µν , being proportional to the TFF, are real. Therefore, M E kl are real and M E 0k are imaginary. From Eq. (3), we conclude that A kl are imaginary and A 0k are real 6 . Thus, the two scalar functions A (1) (τ ) and A (2) (τ ) are real. For the VMD and LMD [32,33] models, explicit expressions for A µν (τ ) are given in Appendix A.
Finally, from Eqs. (8) and (9), we deduce the symmetry which corresponds to the Bose symmetry of the two photons coupling to the pion. Here we have written explicitly the dependence of A µν (τ ) on the spatial momenta. Equivalently, we have A (1) (τ ; q 1 , q 2 ) = A (1) (−τ ; q 2 , q 1 )e −Eπτ . The relation (14) can be checked explicitly in the case of the VMD and LMD models using the expressions provided in Appendix A. These symmetries are exploited in the analysis of the numerical data.

D. Numerical integration
In the moving frame with p = (2π/L)ẑ, both functions A (1) (τ ) and A (2) (τ ) are plotted in Fig. 2. The function A (1) (τ ) is always positive with a cusp at τ = 0, related to the OPE at short distances, as discussed in Ref. [28]. On the contrary, the sign of A (2) (τ ) changes at τ = 0 where the function is discontinuous. Therefore, the numerical integration is more challenging since large cancellations can occur in applying Eq. (3) to obtain the TFF. In particular, a naive replacement of the integral by a sum over discrete τ (trapezoidal rule) leads to noticeable numerical errors. A Simpson integration, however, reduces significantly this source of error as can be seen in Fig. 3 : here, the results correspond to fictitious data were the three-point correlation function is generated assuming an LMD model with parameters obtained from a fit to the ensemble N200. When p = 0, the function A (1) (τ ) is always positive and we do not observe any significant difference between the two integration schemes.
Due to the finite time-extent of the lattice, and because the signal deteriorates at large time separations τ , the integration in Eq. (3) cannot be performed up to infinity. However, the VMD model is expected to give a good description of the data in a wide τ window, as shown in Appendix A. Assuming a VMD parametrization, the functions A (1) (τ ) and A (2) (τ ), which depend on two parameters α and M V , can be computed explicitly from Eqs. (11, 12a, 12b) and the results are given in Appendix A. As in Ref. [28], we start by fitting our data using the VMD parametrization at large τ where the normalization of the TFF, α, is treated as a free fit parameter. The vector meson mass M V is set to the value extracted from the vector two-point correlation function evaluated on the same ensembles but with higher statistics. This fit is a global fit where all momenta and both A (1) (τ ) and A (2) (τ ) are fitted simultaneously. Then, we integrate over lattice data up to some cut-off τ c and use the fit to evaluate the remaining part of the integral with |τ | > τ c . The value of τ c 1.5 fm is chosen sufficiently large such that the two-point vector correlation function is dominated by the ground state. In practice, we only include points where more than 80 % of the integrand comes from lattice data. We have also repeated the analysis with the LMD model and the difference between the VMD and LMD models is used to estimate the systematic uncertainty associated with the treatment of the tail. This systematic error is always smaller than the statistical error. Typical fits for our lightest ensemble, D200, are depicted in Fig. 2.

E. Lattice ensembles, correlation functions and O(a)-improvement
This work is based on a subset of the N f = 2 + 1 Coordinated Lattice Simulations (CLS) ensembles [34] generated using the openQCD suite [35]. They use O(a)-improved Wilson fermions, with the nonperturbative coefficient c SW determined in Ref. [36], and tree-level O(a 2 )-improved Lüscher-Weisz action for the gauge field. To avoid the freezing of the topological charge when approaching the continuum limit [35,37], CLS ensembles use periodic boundary conditions in space and open boundary conditions in time. The parameters of the simulations are summarized in Table I. We use four different lattice spacings in the range 0.050 -0.086 fm and several pion masses down to 200 MeV to perform the continuum and chiral extrapolations. For the latter, the ensembles considered here have been generated keeping the average bare quark mass m av q = (2m l + m s )/3 fixed and using two degenerate light quarks m u = m d = m l . This chiral trajectory has the advantage of automatically keeping the O(a)-improved bare couplingg 0 constant [38]. All our ensembles included in the final analysis satisfy m π L > 4 such that finite-size effects are expected to be small. The scale setting was performed in Ref. [39] with a precision of 1% using a linear combination of the pion and kaon decay constants. Statistical errors are estimated using the Jackknife procedure and we propagate errors associated with the renormalization constant of the vector current, the scale setting, the pion mass m π and decay constant f π in the chiral extrapolations.
To reduce discretization effects and obtain a shorter continuum extrapolation than in [28], on-shell O(a)-improvement has been implemented. In addition to the improvement of the action, it requires the implementation of the renormalized O(a)-improved vector current. Two discretizations are used, the local (l) and the point-split (c) lattice vector currents, where λ a are the eight Gell-Mann matrices.
With the tensor current defined as Σ a µν (x) = − 1 2 ψ(x)[γ µ , γ ν ] λ a 2 ψ(x), the improved vector current is given by where the coefficient c V differs for the local and point-split vector currents and has been determined nonperturbatively in Ref. [40]. The point-split vector current is conserved on the lattice and does not Table I: Parameters of the simulations: the bare coupling β = 6/g 2 0 , the lattice resolution, the lattice spacing a in physical units extracted from [39], the light and strange hopping parameters κ l and κs, the pion mass mπ, the ground state vector mass, the number of gauge configurations and the number of sources per configuration for the three-point correlation function. Ensembles with an asterisk are not included in the final analysis but used to control finite size effects. need to be further renormalized. For the local vector current, the renormalization pattern reads [41] tr where Z V , b V and b V have been evaluated nonperturbatively in Ref. [40] and M q = diag(m l , m l , m s ) is the bare (subtracted) quark mass matrix. The renormalized coupling is given by . The coefficient f V starts at order O(g 6 ) in perturbation theory and is neglected here. For the electromagnetic current J µ with up, down and strange quarks, it is convenient to use the isospin decomposition where V a µ =ψγ µ λ a 2 ψ is the octet of vector currents and the hat means that the current is both improved and renormalized. In particular, Eq. (17) reduces tô with V 0,I µ = 1 2ψ γ µ ψ the flavor-singlet current and ) as an interpolating operator for the neutral pion. The renormalized and O(a)-improved three-point correlation function appearing in Eq. (5), obtained with two local vector currents, reads   Figure 4: Left: Influence of the discretization of the lattice derivative used to implement O(a)-improvement on the function A (1) (τ ). Black points correspond to ∆ A (1) (τ ) before improvement. Red, blue and green points correspond to ∆ A (1) (τ ) after improvement using respectively the symmetric, forward or backward derivative at the sink. Right: The function A (1) (τ ) for different discretizations of the vector current. Black and blue (red and orange) points correspond to local-local and local-conserved correlation function without (with) improvement of the vector currents using the symmetric lattice derivative at the sink. Results correspond to the ensemble H102.
where the first line corresponds to the connected part and the second and third lines to the disconnected parts. We did not explicitly write the improvement term proportional to c V . Here, G l and G s denote the light and strange quark propagators respectively. If f V is neglected, the connected part renormalizes proportionally to Z 2 . Similarly, with one local and one conserved vector currents, one finds where a summation over y and y is understood and where The lattice quark propagator is γ 5 -hermitian, G † = γ 5 Gγ 5 , and the linear operator V (y) µ is anti The three-point correlation function is computed using the same technique as in Ref. [28]. A pointsource is created on the timeslice y 0 with a random value of y and a sequential propagator is computed at x 0 , for two values of the pion spatial momenta p = 0 and p = (2π/L)ẑ. With our computational setup we obtain all values of the photon momenta q 1 and q 2 such that q 1 + q 2 = p without any new inversion of the Dirac operator. The number of sources per gauge configuration is given in Table I and the location of y 0 is discussed in Sec. III A. For all ensembles, we are able to probe photon virtualities up to 3 GeV 2 in the double-virtual case and up to 1.5 GeV 2 in the single-virtual case. This is a major improvement compared to our previous study, where we were limited to virtualities below 0.5 GeV 2 in the single-virtual case.
In Eq. (16), we have some freedom in the choice of the lattice derivative: different definitions differ by O(a) effects and therefore contribute only to order a 2 in correlation functions. We have considered the symmetric, the backward and the forward derivatives respectively given by For the vector current located at the source, we choose the forward derivative ∂ f µ f (x), since a symmetric derivative would require more inversions of the Dirac operator. For the vector current located at the sink, we have compared the three different discretizations. We define the ratio where A (1),lc (τ ) and A (1),ll (τ ), defined in Eq. (3), are respectively computed using one or two local vector currents and δ A (1),lc (τ ) is the statistical error associated with A (1),lc (τ ). In the left panel of Fig. 4, this ratio is plotted with and without improvement, using different discretizations of the lattice derivative.
After O(a)-improvement, the discrepancy between both discretizations is reduced as expected and the symmetric derivative turns out to be the best choice.
Finally, we note that on-shell improvement of correlation functions is not sufficient for τ = O(a). For τ = ±a, we do not use the symmetric derivative but the forward and backward derivatives respectively to avoid using the correlation function at τ = 0 where A (2) (τ ) is discontinuous (see right panel of Fig. 2)). In the right panel of Fig. 4 we plot A(τ ) using both discretizations ( A ll (τ ) and A lc (τ )) with and without improvement of the vector currents using the symmetric derivative. We observe that after improvement, with the exception of τ = 0, the two discretizations are in excellent agreement with each other.

A. Two-point pseudoscalar correlation function
The pion energy E π ( p) and its overlap Z π with our interpolating operator are extracted from the pseudoscalar two-point correlation function. In the limit of large source-sink time separation, and with both source and sink far from the boundary, the correlation function behaves as We want to extract E π ( p) and Z π in a region where the excited-state contribution and boundary effects are both small. We therefore place the source far from the boundary located at x 0 = 0. In Fig. 5, we show the result for the effective mass defined by For large source-sink time separation, we observe a plateau up to some time where the sink starts to be close to the boundary located at x 0 = T − a. In the case of open boundary conditions in time, and including only the first excited state, we first fit our data using the Ansatz where E π ( p) and Z π are the energy and the overlap of the first excited state with momentum p in the pseudoscalar channel and the third term includes the first boundary excited state: a finite volume twopion state with vanishing momentum. In the fits, we assume that E 2π = 2E π ( 0). Explicitly, denoting |B the boundary state, the prefactor A( p) is given by To reduce the number of fit parameters, both momenta are fitted simultaneously. In a second step, the data are fitted using a single-exponential ansatz in the region where excited state contributions are small compared to the statistical precision, and the results are given in Table II. We note that the pion energies are compatible with the relativistic dispersion relation. Finally, we have checked that the source is sufficiently far from the boundary such that the overlaps are correctly extracted: the last term in Eq. (28) contributes to Z π as less than half its statistical error. The same value of y 0 is used to compute the threepoint correlation function. The statistical error on the ratio E π /Z π in Eq. (3) lies between 0.5% and 0.8%.
The chiral extrapolation of the TFF is done using the dimensionless parameter y = m 2 π /(16π 2 f 2 π ). In order to propagate the error on m π and f π , we also compute the pion decay constant on the same gauge configurations for each ensemble. The bare matrix element of interest is extracted from where the axial-pseudoscalar two-point correlation function is given by The O(a)-improved axial current reads where the improvement parameter c A has been determined nonperturbatively in Ref. [42]. The ratio R(x 0 , y 0 ) is fitted using the Ansatz From this bare matrix element, the renormalized and O(a)-improved pseudoscalar decay constant is given by 7 where Z A is the renormalization factor of the axial current and b A , b A are improvement parameters [43,44].
The results are summarized in Table II. In practice, we correct these results for finite-size effects (FSE) using chiral perturbation theory (χPT) as described in Ref. [45]. These corrections are small and we find that they correctly account for FSE on the ensembles H105/N101, which were generated using the same action parameters but different lattice volumes. However the FSE correction fails on ensemble H200, which corresponds to our smallest volume (L ≈ 2 fm) and was not used to extract the pion TFF. As a consistency check, we have performed an extrapolation to the physical point assuming the functional form [GeV] [GeV]  (80) syst turns out to be in good agreement with the FLAG average [47].

B. The transition form factor
The results for the TFF at our lightest pion mass, corresponding to ensemble D200, are plotted in Fig. 6 for two different kinematics : the single-virtual case and the double-virtual case with Q 2 1 = Q 2 2 . Black and blue points correspond to the TFF obtained in the pion rest frame and moving frame respectively.
In the pion rest frame, results obtained using A (1) (τ ) are statistically more precise when both photons carry the same virtuality (right panel of Fig. 6). In this case ω 1 = m π /2 and the full integrand in Eq. (3) is symmetric. In the single-virtual case, ω 1 takes larger values and the integral probes further the tail of the function A (1) (τ ), where the noise over signal ratio increases rapidly with τ : the signal is lost for virtualities above 0.5 GeV 2 . In the moving frame, the situation is similar for A (1) (τ ) (even if the integrand is not exactly symmetric in the double-virtual case, unless | q 1 | 2 ≈ | q 2 | 2 ). For A (2) (τ ), since the sign of the function changes at τ = 0, the situation is the opposite: results are more precise in the single-virtual case where there is less cancellations between positive and negative contributions and we can reach higher virtualities (see right panel of Fig. 2).
We also point out that results obtained in both frames are fully consistent, confirming that sources of breaking of Lorentz invariance in the lattice calculation are small. In particular, since the two frames are affected by different sources of finite volume effects, it is a first hint that finite-size effects (FSE) are small. A more detailed study of FSE is done in Sec. III D 2.
C. Parametrization of F π 0 γ * γ * and extrapolation to the physical point In this section we propose a method to extract the TFF over the whole kinematical range, and at the physical point, based on its analytical properties. We introduce the conformal variables z 1 and z 2 defined through [48] which map the branch cut, starting at t c = 4m 2 π , onto the unit circle |z k | = 1. Here, t 0 is a free parameter representing the virtuality mapping to z k = 0. Since the TFF is analytic for |z k | < 1, one can write where the coefficients c nm = c mn are symmetric due to the Bose symmetry. Since |z k | < 1 one can expect a fast convergence of the sum, where only a few terms with n, m ≤ N are needed at a given accuracy. The optimal choice for t 0 , which reduces the maximum value of |z k | in the range [0, Q 2 max ], is given by This maximum is reached at z k = 0 and z k = Q 2 max . Using the values m π = 134.9 MeV and Q 2 max = 4 GeV 2 , one finds |z max | = 0.46 well below one. The coefficients c nm satisfy the relation n,m which ensures that the coefficients c nm are not only bounded but also decrease to zero for sufficiently large n, m. In practice, the TFF can be multiplied by any analytical function P (Q 2 1 , Q 2 2 ) and the resulting product expanded in powers of the z k . Some choices may improve the convergence rate of the series expansion. At short distances, the behavior of the TFF is predicted by the Brodsky-Lepage behavior in the single-virtual case and by the operator product expansion (OPE) in the double-virtual case [20][21][22], Therefore, it is convenient to consider where M V = 775 MeV is the vector meson mass. With this choice, the parametrization of the TFF at finite value of N decreases asymptotically as 1/Q 2 in all directions in the (Q 2 1 , Q 2 2 ) plane, in accord with the Brodsky-Lepage and the OPE behavior. This feature is one major reason for us preferring the z-expansion over the LMD+V model [49] used in [28], which e.g. for constant Q 2 1 = 0 and Q 2 2 → ∞ does not vanish, even though the corresponding integrals for the pion-pole contribution to the hadronic light-by-light scattering in the muon g − 2 are still convergent with the LMD+V model (see Sec. IV D). However, at the precision level we aim for, the proper high-energy behavior in all directions is important. It should be noted that we do not impose the explicit values of the coefficients on the right hand side of Eqs. (39) and (40), since they receive higher order corrections in perturbative QCD and at higher twist, see the discussion below.
The imaginary part of the TFF behaves as (q 2 − t c ) 3/2 near threshold (P-wave). This property is not fulfilled at finite N but 8 , as shown in Ref. [50], this constraint can be implemented by imposing which leads to the modified z-expansion Finally, to take into account the discretization effects and the dependence on the quark masses used in the simulations, the coefficients c nm are allowed to vary linearly with the variable y = m 2 π /(16π 2 f 2 π ) and quadratically with the lattice spacing,  [GeV] where d = 1, 2 stands for the two discretizations of the three-point correlation function. We perform a global fit and the results are summarized in Table III. Due to the large set of data, the correlation matrices are ill-conditioned and we perform uncorrelated fits. The error on the fit parameters is then obtained from a Jackknife procedure, using blocking to take into account autocorrelations. The results for N = 3 are shown in Fig. 7. The chi-square by degree of freedom is χ 2 /d.o.f. = 1.1 and the correlation matrix of the coefficients in the continuum and at the physical pion mass is In Appendix B, we study the systematic error associated with the truncation of the sum in Eq. (43) and we conclude that N = 3 is already sufficient for an accuracy at the percent level. We also checked that using P (Q 2 1 , Q 2 2 ) = 1 leads to compatible results in the range where we have lattice data. In the single-virtual case, our results are in good agreement with experimental data. In the single-virtual and in the doublevirtual case, we also find good agreement with the results obtained in the dispersive framework [51], see Fig. 7. A more detailed comparison with phenomenology is provided in the next section.
Finally, we remark that a different fit strategy would consist in fitting each ensemble independently and then extrapolating each coefficient c nm to the physical point. First, we found that the global fit procedure is more stable. Second, the continuum and chiral extrapolation of the individual coefficients c nm is non-trivial, as they are correlated. Nonetheless, we provide the z-expansion on three individual lattice ensembles in Appendix C to allow for comparisons prior to the chiral and continuum extrapolation.

Hypercubic effects
At finite lattice spacing, the O(3) rotational symmetry is broken down to the cubic subgroup H(3). Spatial momenta, equivalent in the continuum, but belonging to different H(3) orbits may be affected by different lattice artifacts. This has been observed in previous calculations of form factors (recently in Ref. [53] for example). As explained in Sec. II B, in the calculation of the TFF, we have averaged over all equivalent combinations of ( q 2 1 , q 2 2 ). In this section, we study the validity of this approach. Any polynomial function of the spatial momenta q, and invariant under H(3), can be expressed in terms of the three invariants q [n] = q n x + q n y + q n z , n = 2, 4, 6 . [GeV] [GeV] [GeV] Therefore, each H(3) orbit is uniquely characterized by the values of q [2] i , q [4] i and q [6] i with i = 1, 2. In the pion rest frame, both photons have back-to-back spatial momenta and q 1 = − q 2 . The first kinematical configuration with two orbits corresponds to | q 1 | 2 = 9( 2π L ) 2 with, for example, q 1 = 2π L (±3, 0, 0) (six elements) and q 1 = 2π L (±2, ±2, ±1) (twenty-four elements). When the pion has one unit of momentum in theẑ-direction, equivalent (q 2 1 , q 2 2 ) can also be obtained from different values of q 1 not related by H(3) symmetry but by the permutation of q 1 and q 2 .
In Fig. 8, we show the results for the transition form factor without averaging over equivalent momenta. We observe that, at our level of statistical precision, we are not yet sensitive to hypercubic artefacts.

Finite-size effects
Two sets of ensembles (H200/N202 and H105/N101, listed in Table I) have been generated using the same bare lattice parameters κ and β but with different volumes (L/a = 32 and 48). The ensembles N202 and N101 correspond to large volumes with m π L ≥ 6.5 where finite-size effects (FSE) are expected to be negligible. The two other ensembles satisfy m π L ≈ 4.0. The pion decay constant has been computed on all ensembles (see Table II) with a statistical precision below 1.0 % and finite-size effects are sizeable: we observe a 2-3 σ discrepancy between H200/N202 and FSE corrections using χPT [45] failed to explain the difference. However, for H105 and N101, the results are in perfect agreement after FSE corrections. Since finite-size effects strongly depend on the observable (and on the estimator), the TFF has been computed on these two set of ensembles. At our level of statistical precision, we do not observe any significant differences between different volumes and the results for H200, compared to N202, are shown in Fig. 9. Since H200 is our smallest ensemble, we neglect finite-size effects in the following and exclude the ensembles H200 from the analysis. To be conservative, we also exclude H105.
[GeV]  47) for the ensembles H105, N203, N200, D200 and N302. Ensembles at the same lattice spacing a = 0.064 fm are plotted using plain lines whereas ensembles at different lattice spacings are plotted using dashed lines. Right panel : TFF with and without including the disconnected contribution for our lightest pion mass ensemble D200.

Quark-disconnected contributions
The quark-disconnected contributions involve a single quark loop which couples to the vector current. Within the Mainz group, they have been coded [54] and computed as part of various physics projects [55][56][57] on a number of ensembles (H105, N203, N200, D200 and N302). These contributions vanish exactly for the three ensembles at the heaviest pion mass, where m l = m s . The single-propagator loops have been computed using two random sources of 512 hierarchical probing vectors [58] per configuration and for all spatial momenta with | q| 2 ≤ 12 × ( 2π L ) 2 . The two-point correlation functions between a pseudoscalar and a vector current have been computed using stochastic sources with U(1) noise as in Ref. [28]. We define the ratio where F conn π 0 γ * γ * + F disc π 0 γ * γ * and F conn π 0 γ * γ * correspond to the TFF obtained with or without including the disconnected contribution. The results are shown in the left panel of Fig. 10. We remark that the tail of the disconnected correlator was treated in the same manner as in the connected correlator. As expected, the disconnected contribution increases as we approach the physical pion mass: the CLS ensembles used in this work were generated using a constant value of the trace of the bare quark masses. Comparing ensembles generated at the same pion mass but with different lattice spacings, we do not observe significant discretization effects. For our ensemble with the lightest pion (D200), the TFF with and without including the disconnected contribution is depicted on the right panel of Fig. 10 : at our level of statistical precision, the disconnected contribution can be neglected. The impact of the disconnected contribution on the pion-pole contribution to the muon g − 2 is discussed in more details in Sec IV D 3.

IV. PHENOMENOLOGICAL APPLICATIONS
A. Parametrizing and comparing the TFF with phenomenological models As in Ref. [28], we compare our results with phenomenological models which have been applied to the pion-pole contribution to hadronic light-by-light scattering in the muon g − 2. In particular, we consider the vector meson dominance (VMD) model, the lowest meson dominance (LMD) model [32,33] and the LMD+V model [49]. These models are parametrized by [GeV] Figure 11: Results of the global LMD and LMD+V fits in the single-virtual case and for the ensembles N203 (left panel) and N200 (right panel). The data strongly favor the LMD+V model, with has the correct asymptotic behavior at these kinematics, over the LMD model. and we refer the reader to Ref. [28] for a more detailed description of the models and phenomenological values of the parameters. We just recall the main properties relevant for this study. All models satisfy the anomaly constraint at vanishing momenta [18,19] by setting α = α th = 1/(4π 2 F π ) = 0.274 GeV −1 . The VMD model does not fulfill the OPE constraint given by Eq. (40) but rather behaves as F VMD π 0 γ * γ * (−Q 2 , −Q 2 ) ∼ 1/Q 4 at large virtualities. The LMD model has the advantage to fulfill the OPE constraint but does not satisfy the BL behavior given by Eq. (39) Finally, the LMD+V model satisfies the short-distance constraints both for Q 2 1 = Q 2 2 → ∞ and for Q 2 1 = 0, Q 2 2 → ∞ if one sets h 1 = 0. We have performed global fits using the VMD, LMD and LMD+V Ansätze where the fit parameters are respectively (α, M V ), (α, β, M V ) and (α, h 0 , h 2 , h 5 , M V1 , M V2 ). As for the z−expansion, each fit parameter is assumed to depend linearly on the dimensionless variable y = m 2 π /(16π 2 f 2 π ) and quadratically on the lattice spacing a. The fits are uncorrelated. Since we have computed the TFF using two different discretizations of the vector current, we actually perform a combined fit such that any parameter has the functional form p(a, y) = p(0, y phys ) + γ m ( y − y phys ) + δ d a a β=3.55 In Ref. [28], we have shown that the VMD model fails to reproduce the lattice data at large Q 2 due to the wrong asymptotic behavior in the double-virtual case. Fitting our new data, we obtain χ 2 /d.o.f. = 4.8. However, we obtain a reasonable χ 2 if we restrict the fit to the single-virtual TFF and the result at the physical point reads with χ 2 /d.o.f. = 1.5. The value of α LMD is compatible with the anomaly constraint with a statistical precision of 2 %. However, the value of β is lower, in absolute value, than the OPE prediction at leading order β OPE = −F π /3 = −30.8 MeV. This point will be discussed in Sec. IV C: the OPE prediction neglects higher twists and O(α s ) corrections which are sizeable at virtualities accessible on the lattice. In Fig. 11 we show the quality of the fit for the ensembles N203 and N200 in the single-virtual configuration. At large Q 2 , we observe a significant deviation between the fit and the lattice data which explains the relatively bad χ 2 . This feature was not observed in our previous work [28] where only virtualities Q 2 < 0.5 GeV 2 where accessible in the single-virtual case. The fact that the LMD model fails to describe the lattice data in the single-virtual case is not unexpected since the model has the wrong asymptotic behavior. This explains the discrepancy between the TFF extrapolated to the physical point using the LMD model and the experimental data, as shown in Fig. 12.
Finally, for the LMD+V model we explicitly set h 1 = 0 to satisfy the OPE constraint. Furthermore, to reduce the number of fit parameters, and inspired by quark models, we assume a constant shift in the [GeV] Figure 12: Transition form factor extrapolated to the physical point using the three phenomenological models described in the main text. In the single-virtual case, we also compare the results with data from the CELLO and CLEO experiments [26].
with χ 2 /d.o.f. = 1.2. We note that the same fit would be unstable if only lattice data obtained in the pion rest frame were included. The coefficient h 0 , related to the OPE behavior, is in good agreement with the leading order OPE prediction h 0 = −F π /3. The phenomenological value h 2 = 0.327 GeV 2 [59] can be fixed by comparing with the higher-twist corrections in the OPE in Eq. (40) and our result turns out to be in good agreement as well. Finally, h 5 is consistent with the phenomenological prediction h 5 = −0.166 ± 0.006 GeV obtained in Ref. [49] by fitting the LMD+V model to the CLEO data in the single-virtual case. At our level of precision, this model provides a good description of our lattice data in the whole kinematical range covered by this study. It is also perfectly compatible with the available experimental data, as can be seen on the left panel of Fig. 12.
Recently, a new parametrization of the pion TFF over the whole space-like region has been proposed in [52]. The parametrization of the (Q 2 1 , Q 2 2 ) dependence relies on a single parameter, which can be obtained by adjusting the model to the experimental data in the single-virtual configuration. The model obeys by construction the leading prediction for the asymptotic behavior at large Q 2 , however, it does not accurately reproduce our lattice result at intermediate virtualities: the prediction underestimates our result by about 20% at Q 2 1 = Q 2 2 = 0.5 GeV 2 as can be seen in Fig 7. Nonetheless, in the future this model could perhaps be used as the factor P (Q 2 1 , Q 2 2 ) in the expansion (43).
In the chiral limit, the normalization of the TFF is exactly predicted by the ABJ chiral anomaly [18,19], where F is the pion decay constant in the chiral limit. At finite quark mass, this result receives corrections which can be computed in the framework of chiral perturbation theory (χPT) or directly using lattice QCD. The chiral expansion of the pion TFF at vanishing momenta is known up to next-to-next-to-leading (NNLO) order in χPT [29]. Chiral logarithms are absent at NLO when the result is expressed in terms of the physical value of F π [60]. They appear only at NNLO but were shown to contribute at the permille level and are negligible at our level of accuracy [29]. This motivates the extrapolation of the normalization of the form factor on the lattice using the Ansatz already used in a previous lattice calculation [63]. This functional form differs from the one assumed in the previous sections but both parametrizations lead to compatible results at our level of precision. Eq. (55) has the advantage to offer the possibility to extract the low-energy constant (LEC) C Wr 7 in the odd-intrinsic-parity sector of chiral perturbation theory at order p 6 [64] via the relation [29,61] by varying the quark masses and thus m π over a range of values on our lattice ensembles. As explained in Section II E, we note that our lattice ensembles lie in the chiral trajectory where the trace of the bare quark matrix is kept constant. If we repeat the analysis done in Section III C but using f π F π 0 γ * γ * (Q 2 1 , Q 2 2 ) instead of F π 0 γ * γ * (Q 2 1 , Q 2 2 ) as our primary observable, and limiting the fit range below 1 GeV, we obtain the result where the first error is statistical and the second error is an estimate of the systematic error due to the truncation on the z-expansion (see Appendix B). The choice N = 1 is already sufficient to get a χ/d.o.f. = 1.1. The normalization of the TFF is almost unchanged compared to the fit done in Section III C. The result in Table III,  in Eq. (57) is larger than the bound |C Wr 7 | < 0.06 × 10 −3 GeV −2 used in Ref. [29]. Of course with our current uncertainty we cannot exclude that this LEC vanishes. The bound is essentially based on estimates using a resonance Lagrangian with heavy pseudoscalars and the non-observation of the decay π(1300) → γγ [1,32,61]. The LEC C Wr 7 also vanishes exactly [49] in simple resonance Lagrangians with vector mesons only, like the VMD model for the TFF or the model proposed in Ref. [65], that do not obey short-distance constraints from the OPE. On the other hand, using the LMD model [32,33] or a resonance Lagrangian with vector and heavy pseudoscalar mesons (LMD+P) that obeys these short-distance constraints, one obtains the estimate C Wr 7 = 0.35(7) × 10 −3 GeV −2 [66]. The result is actually dominated by the LMD part and the additional contribution from the heavy pseudoscalar is again estimated to be very small. Note, however, that the error might be underestimated. All resonance estimates of LEC's are based on the assumption of narrow resonances in large-N c QCD and carry an intrinsic uncertainty of 20-30%. Under the assumption that the LEC's have a natural size of 0.5 × 10 −3 GeV −2 , one could argue that this implies rather an absolute error of ±(0.10 − 0.15) × 10 −3 GeV −2 for all LEC's. With the current precision, our value for C Wr 7 in Eq. (57) is also compatible with the larger LMD+P estimate.
As mentioned above, the precision obtained for the normalization of the TFF from the lattice cannot compete with the PrimEx or even PrimEx-II result. However, with our new estimate for C Wr 7 in Eq. (57), we can follow the same strategy as in Refs. [29,32,61] and use it together with the experimental decay width Γ(η → γγ) = 0.515(18) keV [1] to determine the other LEC C Wr 8 that appears in the expression for the decay π 0 → γγ at NLO in χPT with the result This result is almost identical to the estimate C Wr 8 = 0.58(20) × 10 −3 GeV −2 from Ref. [29] that was obtained by setting C Wr 7 = 0. We took over their estimate of 30% uncertainty from neglected terms of order m 2 s in the chiral expansion. Using the expression for the decay amplitude π 0 → γγ given in Ref. [29], including strong isospin breaking effects from m u = m d and electromagnetic corrections, these changes in the LEC's lead to the prediction Γ(π 0 → γγ) = 8.07(10) eV , confirming the value Γ(π 0 → γγ) = 8.09 (11) eV obtained in Ref. [29]. Therefore the tension with the preliminary result of the PrimEx-II experiment [25] remains. Another phenomenologically interesting quantity is the slope of the form factor at the origin. It is defined through Table IV: Normalization of the transition form factor obtained from the two sequences of Canterbury approximants C N N and C N N +1 . As before, chi-square correspond to uncorrelated fits.
where the first error is statistical and the second error is the systematic error associated to the zexpansion. Both determinations are compatible with each other and with the result obtained in Ref. [51] using a dispersive framework (within our convention, their result reads b π 0 = 1.73(5) GeV 2 ). The model independent determination is affected by a relatively large error which could be reduced in the future by adding large-volume ensembles which probe the low-Q 2 region. We also try the method of Canterbury approximants proposed in Ref. [67] in the context of the pion TFF. It is a generalization of the Padé approximants for bivariate functions and we refer the reader to Ref. [68] for an introduction to the subject. A general Canterbury approximant has the form with the convention b 00 = 1 and the relations a nm = a mn , b nm = b mn reflecting the Bose symmetry of the photons. We consider both the diagonal and the sub-diagonal sequences of approximants C N N and C N N +1 . In particular, the sequence C N N +1 automatically satisfies the short-distance behavior ∼ 1/Q 2 if one sets b N N = 0. The lowest order elements for both sequences are The main drawbacks of this method are, first, the absence of a proof of convergence for the pion TFF and, secondly, the rapid growth of the number of fit parameters. As a consequence, for large values of M , the denominator can become singular at large virtualities (presence of spurious poles in the spacelike domain) where there is no lattice data. In the following, we perform a global fit of the TFF, over the whole kinematic range, assuming a linear dependence of the coefficients a ij and b ij on the dimensionless parameter y and a quadratic dependence in the lattice spacing a. Both sequences already give a good χ 2 for N = 1. Using higher-order approximants only increases the statistical error or leads to unstable fits. The results for the two sequences are summarized in Table IV. We use as our final estimate the average between the approximants C 1 2 and C 1 1 and we take half the difference between the two sequences as an estimate for the systematic error : α = 0.261(12) stat (8) syst GeV −1 . This value is compatible with our determination via the z-expansion with similar statistical error, but a systematic error which is difficult to estimate.

C. Asymptotic behavior of the pion transition form factor
When at least one of the photon virtuality is large, and assuming (collinear) factorization, the TFF can be written as a convolution integral [20] where T H (x, Q 2 1 , Q 2 2 , µ 2 ) is a hard scattering kernel, calculable in perturbative QCD, and ϕ π (x; µ 2 ) is the nonperturbative pion distribution amplitude (DA) of twist two (normalized to one). The latter is scale dependent but otherwise universal and therefore plays a major role in the study of hard exclusive processes. The factorization and renormalization scales are chosen to be equal µ R = µ F = µ and of order Here, x and 1 − x are respectively the quark and anti-quark longitudinal momentum fractions of the meson's total momentum. Neglecting isospin breaking effects, the pion distribution amplitude is symmetric under the interchange of x and x = 1 − x such that ϕ π (x) = ϕ π (1 − x). At leading twist and leading order in perturbative QCD, the hard scattering kernel is given by [20] T LO where we have introduced the asymmetry parameter ω defined through In this section, the parameter ω should not be confused with the parameter ω 1 introduced Eq. (3). At next-to-leading order in perturbative QCD, the hard scattering kernel has been computed for all values of ω in the MS scheme in Refs. [69][70][71] and the result is where an explicit expression for t(x, w) with w = (1 − ω)/2 is given by Eq. (5.2) of Ref. [70]. Then, using C F = 4/3 and the symmetry properties of the pion DA, Eq. (65) reduces to The asymptotic expression of the twist two pion DA, ϕ as π (x) = 6x(1 − x), leads to the OPE and BL asymptotic predictions for the double and single off-shell form factors respectively [20,72]. However, this asymptotic result is expected to hold only at very large virtualities. Higher twist corrections, discussed in Refs. [22,73], are given by where the parameter δ 2 = 0.20(2) GeV 2 has been evaluated in Ref. [22] using QCD sum rules and more recently in Ref. [74] using lattice QCD.

Double-virtual form factor : symmetric case
When both photons share the same virtualities (ω = 0) one can see from Eq. (69) that the asymptotic value of the TFF is not sensitive to the shape of the pion DA (one can show that t(x, 1/2) = −3/2 is also independent of the quark longitudinal momentum fraction x) and the result reads Checking the validity of Eq. (71) at asymptotically large Q 2 provides a strong test of perturbative QCD.
Here we will ask whether the given functional form describes our lattice data at moderately large Q 2 .

Double-virtual form factor : general case
When both photons share different virtualities, the situation is more difficult because the result depends on the precise shape of the pion DA, which is largely unknown. However, as pointed out in Ref. [75], this dependence is small for small values of the asymmetry parameter ω. At leading twist, we have where a 2 is the second coefficient in the expansion of the pion distribution amplitude in terms of Gegenbauer polynomials. For Q 2 2 = 2Q 2 1 , where ω = 1/3, corrections to the asymptotic DA are of the order of 2 %. We can therefore fit our lattice data in a much wider range without being sensitive to the actual shape of the pion DA.

Fits
We perform a global fit using Eq. (71) where δ 2 is considered as a free fit parameter and is allowed to vary linearly with y and a 2 . We restrict the fit to virtualities with ω < 1/3 and Q 2 > Q 2 min and use the four-loop running strong coupling in the MS scheme [76]. Of course, one might question the applicability of perturbative QCD down to such low values of |Q min | = 1.2 GeV, even in Euclidean space, see Ref. [77] for an analysis of nonperturbative effects in the Adler function. The results at the physical point and for different values of Q 2 min are listed in Table V and we quote as our final estimate the value at Q 2 min = 2 GeV 2 This result is compatible with the sum rule determination [22] and the lattice determination [74] although with larger error.
D. Determination of the pion-pole contribution to HLbL scattering in the (g − 2)µ The hadronic light-by-light scattering contribution to the anomalous magnetic moment of the muon is one of two dominant sources of uncertainty along with the hadronic vacuum polarization. For the HLbL scattering, a model independent dispersive approach has been proposed recently [10] and the dominant contribution is expected to be the pion-pole contribution which requires the pion TFF as input. Until recently, most estimates were based on model calculations where errors are difficult to estimate. In our previous work [28], we provided the first lattice QCD calculation of the pion-pole contribution, with a statistical precision of about 12%. Recently, a data-driven determination based on the calculation of the TFF using dispersion theory was published [51]. The result is compatible with our previous determination but with reduced uncertainty. In this section, we propose to use our results to improve our estimate of the pion-pole contribution from lattice QCD.
Following [8], the pion-pole contribution to the hadronic light-by-light scattering in the muon (g−2) can be expressed as a three-dimensional integral involving two model-independent weight functions w 1 and w 2 and the product of a single-virtual TFF times a double-virtual TFF for arbitrary spacelike virtualities, The integrals run over the lengths Q i = |(Q i ) µ |, i = 1, 2 of the two Euclidean four-momentum vectors and the angle θ between them, Q 1 · Q 2 = Q 1 Q 2 cos θ, where we defined τ = cos θ. The analytical expressions for the dimensionless weight functions w i (Q 1 , Q 2 , τ ), i = 1, 2 can be found in Ref. [8]. In particular, it was shown in Ref. [78] that the relevant integration range involves space-like virtualities below 2 GeV 2 , precisely the kinematical region where we have lattice data.

Phenomenological models
A first estimate of the pion-pole contribution is obtained using the LMD+V model, which provides a good description of our lattice data. The parameters at the physical point are determined from the global fit procedure described in Sec. IV A and we obtain a HLbL;π 0 µ;LMD+V = (58.6 ± 2.7) × 10 −11 , where the error is statistical but includes the error from the continuum and chiral extrapolations. It can be compared with our previous estimate, a HLbL;π 0 µ;LMD+V = (65.0 ± 8.3) × 10 −11 , obtained with two dynamical quarks [28]. The slightly lower central value in Eq. (75) arises due to the slightly lower values of the parameters α and h 2 emerging from the fit, but the results are in agreement within the quoted uncertainties. We also checked that computing a HLbL;π 0 µ;LMD+V on each ensemble separately and then, in a second step, extrapolating the results to the continuum limit and at the physical pion mass using a fit linear in y and a 2 gives a similar result. In this case, we would obtain (58.4 ± 2.4) × 10 −11 . Since we are using a phenomenological model to describe the lattice data, the result could be biased and the error underestimated. Moreover, for the LMD+V model, we do not fit all the model parameters but made some assumptions on the second vector-resonance mass M V2 to stabilize the fit.

Canterbury approximants
A second possibility is to use the results obtained from the method of the Canterbury approximants presented in Sec. IV B. The short-distance constraints in Eqs (39) and (40) The next approximant leads to unstable fits. We note that this result is compatible with the LMD+V model determinations, which suggests that the model dependence of the central value is small. As for the LMD+V model, it is however difficult to assess a systematic error, especially because we are limited to rather low values of M .

Final result : z-expansion
Finally, the z-expansion provides a systematically improvable determination. The choice for the function P (Q 2 1 , Q 2 2 ) in Eq. (41) guarantees the 1/Q 2 fall-off of the TFF in all directions, and, using the results of Sec. III C with N = 3, we quote as our final result a HLbL;π 0 µ = (59.7 ± 3.4 ± 0.9 ± 0.5) × 10 −11 = (59.7 ± 3.6) × 10 −11 , where the first error is statistical, the second is the systematic error inferred from the study in Appendix B and the third error comes from the disconnected contribution. In particular, the first error includes the error on the lattice spacing, the renormalization of the vector currents and the extrapolation to the physical point. In the last step of Eq. (77), we have added the errors in quadrature. Since the TFF has units of GeV −1 , the relative scale-setting uncertainty on the pion-pole contribution is two times the relative error on the lattice spacing expressed in fm. The scale is known with a precision of 1% which translates in an uncertainty of about 2% on a HLbL;π 0 µ . The uncertainty on the renormalization of the vector current is negligible. Therefore, the statistical precision of the correlation functions and the extrapolation to the physical point are the dominant sources of uncertainties in our calculation. They could be improved in a future calculation, in particular by including a lattice ensemble at the physical pion mass in the analysis. In contrast to phenomenological analyses, our result for a HLbL;π 0 µ is significantly more accurate than our determination of F π 0 γ * γ * (0, 0).
In the left panel of Fig. 13, we show the difference ∆a HLbL;π 0 ;disc µ between the results obtained with and without including the disconnected contribution. At the SU(3) flavor symmetric point, the difference vanishes exactly, while it turns negative as the pion mass approaches its physical value. At a constant value of the trace of the quark-mass matrix, the disconnected contribution is proportional to m s − m l close to the SU(3) flavor symmetric point. In practice, we parametrize it as being linear in m 2 K − m 2 π and obtain ∆a HLbL;π 0 ;disc µ = −1.0(0.3) × 10 −11 at the physical point. To be conservative, and because our data does not allow us to perform a precise continuum extrapolation, we associate 50% uncertainty to this contribution.
Similarly to the case of the LMD+V model, we could perform the z−expansion on each ensemble separately and then extrapolate a HLbL;π 0 µ to the physical point. We would obtain (58.5 ± 4.0) × 10 −11 , compatible with the results from the global fit. We note that the continuum and chiral extrapolations are very mild, as can be seen in the right panel of Fig. 13. We prefer the global fit method which reduces the number of fit parameters and relies on the chiral extrapolation of the pion TFF itself instead of the pion-pole contribution.
Our final value given by Eq. (77) is compatible with our previous determination [28] using two dynamical quarks but with an improved accuracy. It is also in good agreement with the data-driven determination recently published in Ref. [51] and based on dispersion theory, with a similar precision and to results published in Ref. [67], based on a fit to experimental data using Canterbury approximants.

Combination of lattice and experimental data
The experimental precision on the normalization of the TFF, dominated by the PrimEx experiment [23], is still better than our lattice determination. It is thus possible to reduce the error on a HLbL Two comments on this results. First, this value is slightly higher than in Eq. (77). This can be explained by our lower value for the normalization of the TFF. Secondly, the statistical error has been reduced by a factor 1.7 which can be attributed to the small experimental error on the decay width, emphasizing the importance of precise lattice data at low virtualities. Thus, to further improve the lattice determination, it would be interesting to add large volume ensembles, which more tightly constrain the normalization of the TFF. The CLS ensemble E250 [79], at the physical mass and with a volume of 6.2 fm would be valuable.

V. CONCLUSION
We have computed the neutral pion transition form factor with N f = 2 + 1 dynamical quarks at the physical point. The results are summarized in Table III where we provide the coefficients of the modified z−expansion and the associated correlation matrix. Our results are in good agreement with experimental data from CELLO and CLEO in the single-virtual case. In the double-virtual case, where no experimental data exist yet, we can compare our result with the recent dispersive analysis [51]. Finally, when both virtual photons carry large virtualities, our result reproduces the asymptotic prediction from perturbative QCD once one includes first-order α s correction and higher-twist corrections.
In Sec. IV, we first compared our result with popular phenomenological models, often used to estimate the pion-pole contribution to hadronic light-by-light scattering in the muon g − 2. The LMD+V model, which satisfies both the OPE and the Brodsky-Lepage constraints at short distances, provides a good parametrization over the whole kinematic range covered by our lattice data. The fit parameters turn out to be close to their phenomenological values obtained in Ref. [80]. However, our lattice data show significant deviations from the VMD or LMD models. Secondly, we have extracted the normalization of the TFF using either the phenomenological models or the more systematic parametrization based on the z-expansion. We reproduce the experimental result with a precision of 3.5% : α = 0.264(9) GeV −1 . This is an important benchmark of our calculation. The precision is not yet competitive with the experimental estimate from the PrimEx Collaboration [23], but we were able to extract the low-energy constant (LEC) appearing in the odd-intrinsic-parity sector of χPT at order p 6 , C Wr 7 = 0.16(18) × 10 −3 GeV −2 . This value lies in-between some recent estimates based on various resonance Lagrangians that do or do not fulfill short-distance constraints from the OPE. Ref. [29] revisited the analysis of pion-decay π 0 → γγ at NLO in χPT and assumed that this LEC vanishes. It turns out that with our new result from the lattice, the other relevant LEC at NLO does not change much compared to the analysis in Ref. [29]. We get C Wr 8 = 0.56(17) × 10 −3 GeV −2 and essentially reproduce with Γ(π 0 → γγ) = 8.07(10) eV the result given in that reference, using similar input from experiment and theory. Finally, our model-independent lattice estimate for the pion-pole contribution to hadronic light-bylight scattering in (g − 2) µ , given by Eq. (77), reads a HLbL;π 0 µ = (59.7 ± 3.6) × 10 −11 and corresponds to a precision of 6%. This is our main result. The precision can be further improved by imposing the experimental constraint on the normalization of the TFF from the PrimEx experiment and we obtain in this way the lattice and data-driven estimate a HLbL;π 0 µ = (62.3 ± 2.3) × 10 −11 with a precision of 4%. This precision has already reached a sufficient level in view of the forthcoming experiment at Fermilab, for which a precision of 10% on the theory estimate of the full HLbL contribution is desired.
In the future, we plan to use our result to estimate the dominant finite-size effects in the full lattice calculation of the HLbL contribution to the muon g − 2. Since the QCD four-point correlation function will be computed on the same set of lattice ensembles, we should be able to constrain the tail of the integrand at large distances and thus reduce the statistical error. We also plan to include another lattice ensemble, with physical pion mass and large volume, to constrain even better the chiral and continuum extrapolations. The large volume should help us reach smaller virtualities and will constrain the normalization of the TFF even better. Finally, besides the dominant pion-pole contribution, it would be interesting to have a first principle estimate for the η and η pseudoscalar-pole contributions. According to model calculations [78], the size of these contributions is of the order of 20% of the pionpole contribution and they are therefore not negligible. The same lattice methodology can be used to extract the relevant pseudoscalar transition form factors, even though disconnected diagrams will play a more important role. In that case, the weight functions appearing in Eq. (74) are peaked at slightly larger virtualities. However, since a precision of 20 or 30% should suffice, a lattice calculation should be feasible. Such a study would have a high impact since only sparse experimental data are available in the double-virtual case for spacelike momenta [81].
In this appendix, we provide analytical expressions for the function A LMD µν (τ ), introduced in Eq. (3), for a general pion momentum p, assuming an LMD transition form factor [32,33]. The VMD case is simply obtained by setting β = 0 in the equations below. Using the LMD model and Eq. (10), we obtain where P E µν = i µν0i p i and Q E µν = µνi0 E π q i 1 − i µνij q i 1 p j are independent of ω. The integrand has four distinct simple poles where we used the notations k 1 = M 2 V + | q 1 | 2 and k 2 = M 2 V + | q 2 | 2 . Therefore Case τ < 0 The expression for A (1),LMD (τ ) is then obtained using Eq. (10). The function A LMD µν (τ ) is continuous for all values of τ when P E µν = 0 but is discontinuous at τ = 0 when P E µν = 0. However, the function has finite limits when τ → 0 ± respectively. A similar method can be easily applied to the LMD+V model.
Finally, we remark that Eqs. (A4) and (A5) indicate the type of intermediate states that contribute to the Euclidean three-point function C (3) µν defined in Eq. (5), according to the LMD model. Taking into account p = q 1 + q 2 and the relation (4) between A µν (τ ) and C (3) µν , we find that for τ > 0, either a vector meson with momentum q 1 or a two-particle state consisting of a pion with momentum p and vector meson with momentum − q 2 is propagating between the two vector currents; for τ < 0, either a vector meson with momentum q 2 or a two-particle state consisting of a pion with momentum p and a vector meson with momentum − q 1 is propagating. Results of the fits of the LMD+V model using the modified double z-expansion. The model is defined using Eq. (48c) and the fit parameters in Eq. (53). The exact values are α = 0.264 GeV −1 , bπ = 1.62 GeV −2 and a HLbL;π 0 µ = 59.2 × 10 −11 . Results for a HLbL;π 0 µ are given in units of 10 −11 . The last column, dmax, correspond to the maximum deviation in percent between the exact TFF and the fit.  to estimate the systematic error induced by this truncation. Therefore, as a test, we fit the LMD+V TFF with the z-expansion given by Eq. (43). This model satisfies both short-distance constraints given by Eqs. (39) and (40). For this purpose, we use a regular grid with 0 ≤ Q 2 k ≤ Q 2 max and a step size δQ 2 k = 0.025 GeV 2 . In Table VI, and for different values of N and Q 2 max , we provide the results of the fit for the normalization of the TFF α, the slope at the origin b π and the pion-pole contribution a HLbL;π 0 µ which can be compared to the exact known results. We also provide the maximum deviation between the exact TFF and the fit. The level of agreement is illustrated in Fig. 14.
We conclude that using N = 3 and Q 2 max = 4 GeV 2 is already sufficient to get a precision below 1% for the TFF in the range [0, Q 2 max ]. The normalization of the TFF and the pion-pole contribution, a HLbL;π 0 µ , are also recovered within a precision below 1%. The slope parameter is obtain with a precision of 6%. We point out that, using the optimal value t 0 , the parameters z k reach their maximal value at Q 2 k = 0, precisely where the normalization of the TFF is obtained. Therefore, fitting the data in a wide range of virtualities is not the best method to determine the normalization of the TFF. However, the minimal value of z k is obtained at Q 2 k = −t 0 ≈ 0.5 GeV 2 , an optimal choice for the pion-pole contribution since the integrand is peaked around this value.
Appendix C: Results of the z-expansion for some individual lattice ensembles In this appendix, we provide the coefficients of the z-expansion for three ensembles used in this work: N101, N200 and D200. The correlation matrices are respectively given by Eqs (C1), (C2) and (C3). The ensembles N200 and D200 have the same lattice spacing but different pion masses whereas N200 and N101 have similar pion masses but different lattice spacings. We stress that our main result in Sec. III C is based on a global fit, where all the ensembles are fitted simultaneously using the ansatz (44). Table VII: Coefficients of the z-expansion, in GeV −1 , defined in Eq. (43) and obtained from a fit on a single ensemble using local vector currents at the source and at the sink. The vector meson masses used in Eq. (41) are given in Table I.