Collins-Soper kernel from lattice QCD at the physical pion mass

This work presents a determination of the quark Collins-Soper kernel, which relates transverse-momentum-dependent parton distributions (TMDs) at different rapidity scales, using lattice Quantum Chromodynamics (QCD). This is the first lattice QCD calculation of the kernel at quark masses corresponding to a close-to-physical value of the pion mass, with next-to-next-leading logarithmic matching to TMDs from the corresponding lattice-calculable distributions, and includes a complete analysis of systematic uncertainties arising from operator mixing. The kernel is extracted at transverse momentum scales $240\,\text{MeV}\lesssim q_{T}\lesssim 1.6\,\text{GeV}$ with a precision sufficient to begin to discriminate between different phenomenological models in the non-perturbative region.

TMDs have a functional dependence on two scales: a virtuality scale µ and a rapidity scale ζ, which is related to the hadron momentum in a scattering process. While the renormalization group (RG) evolution of TMDs with µ is perturbative for perturbative scales µ and ζ, the evolution with ζ is inherently non-perturbative in certain regions of parameter space, even for perturbative µ. The ζ-evolution of TMDs is encoded in the Collins-Soper (CS) kernel [4][5][6], which can be defined as the rapidity anomalous dimension entering the relevant RG evolution equations (up to a conventional factor): where φ p (b T , x, µ, ζ) is a TMD, chosen here as a TMD wavefunction (TMD WF) encoding the transverse motion of a parton p ∈ {q, g} in a meson state [38][39][40][41]. The TMD WF is defined in a factorization formula valid in the limit of ultra-relativistic hadron momentum P and depends on the fraction x of the parton's momentum collinear with P, as well as the parton's momentum transverse to P as given by its Fourier conjugate b T , the transverse displacement. The CS kernel depends on µ, b T and parton type p, but is independent of x and the hadronic state.
Experimental DY and SIDIS data has been used to constrain phenomenological parameterizations of the quark CS kernel [42][43][44][45][46][47][48][49][50][51][52][53][54][55][56]. A number of parameterizations are in some tension in the region b T > ∼ 0.2 fm (at µ = 2 GeV), which may be partially understood to arise from different approaches to modelling non-pertubative effects. In the more recent analyses [55,56], the tensions have been reduced as larger sets of experimental data sensitive to the CS kernel in the non-perturbative regime [53,57] were included. Further improvements are expected with future data from the LHC [12] and the Electron-Ion Collider [17,18]. A direct way of constraining the kernel from cross-section ratios has also been proposed and demonstrated on synthetic data [58] and could be applied to experimental data in the future. A more precise determination of the non-perturbative CS kernel is important in particular for measurements of electroweak observables such as the W ± -boson mass [59] and especially for studies of nucleon and nuclear structure via deep inelastic scattering [17].
Using quasi-TMD WFs and LaMET, this work presents the first lattice QCD calculation of the quark CS kernel at valence quark masses corresponding to a closeto-physical value of the pion mass, m π = 148.8(1) MeV, thereby addressing the systematic uncertainty arising from the sensitivity of the kernel to the QCD vacuum structure [76] and reducing those arising from perturbative LaMET matching and proportional to m 2 π /(x|P|) 2 and m 2 π /((1 − x)|P|) 2 . Other b T -dependent systematic uncertainties associated with matching are better quantified relative to previous calculations. The matching is performed at next-to-next-to-leading order (NNLO) and next-to-next-to-leading logarithmic (NNLL) accuracies for the first time in a calculation of the CS kernel, using recent results of Refs. [77,78]. Moreover, previously dominant [28] systematic uncertainties from the Fourier transformation of quasi-TMDs are reduced in this work, and the associated model dependence is eliminated. Finally, renormalization-induced mixing effects for the non-local operators associated with quasi-TMDs are fully quantified for the first time in the RI/xMOM renormalization scheme [79][80][81]. Taken together, this work achieves sufficient control and precision to begin to discriminate in the non-perturbative region between phenomenological parameterizations [44,51,52,55,56] of the quark CS kernel and provides a better understanding of perturbative convergence in LaMET matching and the associated power corrections.

II. THE COLLINS-SOPER KERNEL FROM QUASI-TMD WAVEFUNCTIONS
The quark CS kernel can be computed in lattice QCD from ratios of matrix elements of non-local staple-shaped Wilson line operators in hadron states at different finite boost momenta P z 1 , P z 2 [41,63,65]: + δγ MS q (µ, x, P z 1 , P z 2 ) + p.c. ( Here the dependence on the lattice spacing, a, is suppressed. δγ MS q (µ, x, P z 1 , P z 2 ) denotes the perturbative matching correction defined at the end of this section, and p.c. denotes the associated power corrections that are power series in 1/(b T (xP z )) 2 , Λ 2 QCD /(xP z ) 2 , m 2 h /(xP z ) 2 , where m h is the meson mass and P z ∈ {P z 1 , P z 2 }, and analogous forms with x replaced by 1 − x. W (0) Γ (b T , b z , P z , ) denote ratios of bare quark quasi-TMD WFs (defined further below), such that As only quark quasi-TMD WFs are studied in this work, parton labels on WFs and WF ratios are omitted. Subscripts Γ ( ) denote Dirac structures; in the limit of infinite boosts P z 1 , P z 2 → ∞, quasi-TMD WFs with Γ ∈ {γ 3 γ 5 , γ 4 γ 5 } approach γ + γ 5 . Renormalization factors Z MS ΓΓ (µ) are 16 × 16 matrices, detailed further below, and the normalization factors N Γ (P z ) correspond to where E h (P zẑ ) and m h are the meson energy and mass, respectively.
Bare quark quasi-TMD WFs in position space are given by Euclidean equal-time correlation functions where |0 and |h(P z ) denote the QCD vacuum and a pseudoscalar meson state, respectively. The meson is taken to contain the isovector ud valence quark-antiquark pair, and the operator O Γ ud (b T , b z , y, ) is depicted in Fig. 1 FIG. 1. Diagrammatic representation of the non-local operator O Γ ud (bT , b z , y, ) defined in Eq. (6). The operator comprises a staple-shaped Wilson line of length +bT connecting a quarkantiquark pair ud separated by b = (bT , b z , 0) (blue). The origin is defined at the midpoint between the quark and the antiquark (red). and defined as where b = (b T , b z , 0), u(y) and d(y) denote up-and downquark fields, respectively, Wn(x; ξ) denotes a Wilson line of length ξ starting at x directed alongn,n T denotes a unit four-vector along b T , and denotes the total collinear length of the staple-shaped Wilson line. The transformation properties of these operators and quasi-TMD WFs under sign changes of b T and b z as well as other discrete symmetries are presented in Appendix A. Forming ratios in Eq. (3) cancels divergences logarithmic in a, as well as power divergences linear in /a and b T /a, in the quasi-TMD WFs [79,80,82]. Furthermore, forming the ratios eliminates dependence up to discretization artifacts and power corrections of order 1/(P z ) and b T / . This leads to finite → ∞ limits of infinite collinear staple length for the ratios W The 16×16 renormalization matrices Z MS ΓΓ (µ) appearing in Eq. (2) may be computed as where Here Tr denotes a spinor trace and Γ runs over the 16 Dirac matrices. Conversion from the RI/xMOM renormalization scheme [79][80][81] at the scale defined by p µ R and ξ µ R to the MS scheme at the scale µ is achieved with the conversion coefficient C MS RI/xMOM (µ, p R , ξ R ) computed in continuum perturbation theory [81]. Z RI/xMOM Λq,±z , where q ∈ {u, d}, are 4×4 matrices in spinor space renormalizing the corresponding Green's functions Λ q,±z (p, ξ) defined as where S q (p) denotes the momentum-space quark propagator. Z RI/xMOM Λq,±z in a fixed gauge, where Λ tree is the tree-level Green's function corresponding to Λ q,±z (p R , ξ R ). Further details are provided in Appendix B.
Since C MS RI/xMOM has no Dirac structure, it cannot change the mixing patterns encoded by Z RI/xMOM Λq,±z (p R , ξ R ) and their dependence on the auxiliary renormalization scales p µ R and ξ µ R . Moreover, if determined for any given p µ R and ξ µ R , C MS RI/xMOM simply cancels in the ratio of Eq. (2). However, in practice, if a calculation of Z MS ΓΓ (µ) is realized as an average over multiple auxiliary scales, conversion (in both the numerator and the denominator in Eq. (2) before averaging) may affect the value and systematic uncertainties in the lattice QCD determination of the CS kernel.
The matching correction δγ MS q (µ, x, P z 1 , P z 2 ) appearing in Eq. (2) is perturbative and given by where N k LO denotes a fixed-order accuracy,x ≡ 1−x, the renormalization scheme dependence is omitted for brevity, and C N k LO φ (µ, p z ), with p z ∈ {xP z 1 , xP z 2 ,xP z 1 ,xP z 2 } denote the TMD WF matching coefficients. The corresponding matching formula between physical and quasi-TMD WF receives power corrections as discussed around Eq. (2).
The C N k LO φ (µ, p z ) are computed perturbatively in the strong coupling α s (µ), with C LO φ = 1. The NLO contribution has been computed in Refs. [71,75]; the NNLO contribution may be inferred from the matching formula for quasi-TMD PDFs [77,78]. For further discussion, see Appendix C 1.
Fixed-order coefficients C N k LO φ (µ, p z ) may be resummed from initial scales (µ 0 , p z 0 ) = (2p z , p z ) as [68,72] where N k LL denotes a logarithmic accuracy and K N k LL φ µ 0 , µ is a resummation kernel. Since the µ dependence cancels in the ratio of quasi-TMD WFs (excluding the effects of conversion to the MS scheme which may arise in practice as discussed above), the CS kernel in Eq. (2) is dependent on µ only through perturbative corrections, and the above choice of µ 0 = 2p z further isolates the µ dependence to the resummation kernel. Resummations are independent of initial scale at infinite order but differ by higher-order terms at finite order. For any choice of µ 0 , variations around µ 0 provide a measure of the associated perturbative uncertainties. The resummed matching correction to the CS kernel is given by Eqs. (11) and (12) as where the logarithmic ratio is expanded perturbatively in α s (2p z 1 ) and α s (2p z 2 ). For further discussion, see Appendix C 2.
To partially account for the b T -dependent power corrections, a practical choice is to replace δγ N k LO q (µ, x, P z 1 , P z 2 ) in Eq. (2) with a b T -unexpanded correction: where the ellipsis denotes terms that are power-and exponentially suppressed in b T (xP z 1 ), b T (xP z 2 ), and the analogous terms with x replaced byx.
The NLO contribution may be inferred from the corresponding TMD PDF coefficients [66,75].
are conjectured in this work to reduce the b T -dependent power corrections relative to the resummed matching correction in Eq. (13) at the same accuracy, as is investigated numerically in the following section and in Appendix C 3; further study and a more systematic treatment of power corrections is left to future work.

III. NUMERICAL INVESTIGATION
The quark CS kernel is computed numerically using an ensemble of lattice gauge-field configurations produced by the MILC collaboration [83] with 2 + 1 + 1 dynamical quark flavors and four-volume V = L 3 × T = (48a) 3 × 64a with a = 0.12 fm. The one-loop Symanzik improved gauge action [84][85][86][87] and the highly improved staggered quark action with sea quark masses tuned to produce a closeto-physical pion mass [88][89][90] are used for gauge field generation. Gauge field configurations are subjected to Wilson flow with flow-time t = 1.0 [91] to enhance the signal-to-noise ratio in numerical results, and are gauge fixed to Landau gauge. Calculations are performed in a mixed-action setup with the tree-level O(a)-improved Wilson clover fermion action [92][93][94] used for propagator computation, with hopping parameter κ = 0.125 47 and clover term coefficient c sw = 1.0, resulting in a pion mass of m π = 148.8(1) MeV.
The following subsections detail the steps of the calculation of the quark CS kernel, including calculations of the bare quasi-TMD WF ratios and renormalization matrices, the Fourier transform to b T -space, and finally the extraction of the CS kernel from ratios of quasi-TMD WF ratios with perturbative matching corrections.

A. Bare quasi-TMD WF ratios
The CS kernel is computed according to Eq. (2), using quasi-TMD WF ratios with a pion |h(P z ) = |π(P z ) chosen as the hadronic state. The ratios W (3) are extracted from fits to pion two-point correlation functions. In particular, Euclidean correlation functions both with and without staple-shaped operators are constructed as and where P = P zẑ , t = y 4 , and pion states are created with momentum-smeared interpolating fields where the quasi-local quark fields are constructed using a Gaussian momentum smearing kernel F K with K = ±P/2 realized iteratively with n smear = 50 smearing steps and a smearing kernel width defined by ε = 0.2 [95]. These correlation functions have spectral representations and where E nπ denotes the energy of the n-th eigenstate of the LQCD transfer matrix with quantum number of the pion, denoted |π n , and in particular E π (P) ≡ E 0π (P). Staple-shaped operator matrix elements are defined as  the vacuum state are defined as In Eqs. (18) and (19), T denotes the temporal extent of the lattice, and the ellipses denote additional contributions where the vacuum state is replaced by finite-temperature excited states. These contributions are suppressed by factors of order e −2mπ(T /2) or smaller in comparison with the terms shown and are therefore neglected below. The ground-state overlap Z S π (P) ≡ Z S 0π (P) is guaranteed to be real-valued and positive up to discretization artifacts 1 . This ensures that Z S π (P) = |Z S π (P)| 2 can be extracted from fits to Eq. (18) and therefore that both the magnitude and phase of the complex-valued TMD WFφ Γ (b T , b z , P z , ) can be extracted from joint fits to Eqs. (18) and (19). The ± sign appearing in Eq. (19) depends on Γ as detailed in App. A and in particular is negative for γ 4 γ 5 and positive for γ 3 γ 5 . The operator geometries and number of configurations N cfg (P z ) used to compute the two-point correlation functions for each choice of pion momentum P z = 2π L n z are summarized in Table I. Correlation functions are computed with propagators calculated from sources on a 2 4 grid bisecting the lattice along each dimension for all of the 16 Dirac structures. 2 The operator geometries used, illustrated in Fig. 1, are such that for each b T /a ∈ {0, . . . 7} 1 A combination of the non-singlet axial Ward identity in the isospin limit and the partially conserved axial current relation (PCAC) guarantee that where mq is the renormalized light quark mass, P (x) is a local pseudoscalar interpolating field for an isovector pion, J µ5 (x) is the corresponding axial vector current, and fπ is the pion's decay constant [96]. The above applies to renormalized fields -for the bare pseudoscalar interpolating field, the pion overlap factor is therefore real and positive up to discretization artifacts from possible mixing with higher-dimension operators. This continues to hold for boosted pion states and if the quark fields in P (x) are smeared with a self-adjoint smearing kernel. 2 For n z = 10 measurements were performed on slightly fewer configurations (corresponding to at least 80% of the the N cfg (P z ) alongn T ∈ {±x, ±ŷ}, all possible staple asymmetries b z are constructed with the fixed values of /a specified, i.e., − /a ≤ b z ≤ /a, which are by construction restricted to be either even or odd integers for any fixed /a. This choice is convenient as power divergences are proportional to the total length of the Wilson line [79,80,82] in the operator, so all operator geometries computed for a given and b T have equal power divergences across all b z , simplifying renormalization. This is in contrast to the staple geometries chosen in the work of Refs. [24,25,28] where various geometries with a given b T were constructed with fixed values of 1 2 ( + b z ), leading to b z -dependent renormalization factors.
Correlation functions computed on each gauge-field configuration are averaged over sources, forward and backwards propagation in time, and operator structures witĥ n T ∈ {±x, ±ŷ} for C Γ 2pt . The bare quasi-TMD WF ratios W (3) are then determined using a multi-step fitting procedure: 1. Determination of E π (P) and Z S π (P) from a simultaneous fit to C π 2pt and the statistically most precise C Γ 2pt for a given P; 2. Determination ofφ Γ from fits to the t-dependence of combinations of C Γ 2pt using the results for E π (P) and Z S π (P), accounting for correlations between quasi-TMD WF ratios with different staple geometries using bootstrap resampling; 3. Construction of W Each of these steps is detailed in the following subsections.

Determination of Eπ(P) and Z S π (P)
As the exponential t-dependencies of both C π 2pt and C Γ 2pt are governed by E nπ (P), these correlation functions may be fit simultaneously to extract E nπ (P) and Z S nπ (P). In practice, only the statistically most precise C Γ 2pt for a given P is used, corresponding to the two-point function constructed with the operator geometry with the minimum value of /a computed, b T /a = 1, b z = 0 (even /a), or an average of b z /a = ±1 (odd /a), and Γ = γ 4 γ 5 . For each P, the two correlation functions C π 2pt and C Γ 2pt are jointly fit to the spectral representations of Eq. (18) and Eq. (19) for a variety of fit ranges using correlated χ 2 -minimization with the fitting procedures detailed in Refs. [25,97] and summarized here. Results using t ≤ t max are used for fitting, where t max is chosen to be the largest t for which a given correlation shown in Table I) for some Dirac structures that are found to make negligible contributions to the renormalized quantities studied here.
function has signal-to-noise ratio ≥ 1/3. Fits are performed with all possible fit windows [t min , t max ] such that t min ≥ 2 and t max −t min ≥ 3, where t min is chosen independently for C π 2pt and C Γ 2pt . 3 For each fit range, the covariance matrix is estimated using bootstrap resampling [98] with optimal linear shrinkage [99,100]. First, fits using one-state truncations of Eq. (18) and Eq. (19) are performed. For P z > 0, VarPro methods [101,102] are used in which the best-fit Z S 0π (P zẑ ), which enters χ 2 linearly, is determined using linear methods during each step of nonlinear optimization for E 0π (P zẑ ). For P z = 0, where there is negligible signal-to-noise degradation, VarPro methods lead to less efficient χ 2 -minimization and are not employed. Fits to two-state truncations of Eq. (18) and Eq. (19) are then performed analogously.
The Akaike Information Criterion (AIC) [103] is used to select whether one-or two-state fits are preferred for each fit range. To penalize overfitting, two-state fits are only accepted if they improve the AIC by at least 2 times the number of degrees of freedom and if excited-state contributions do not severely dominate over ground-state In cases where two-state fits are preferred, three-state fits are also performed but are not found to be preferred by the AIC in any case. Further selection cuts are then applied as described in Ref. [97]: fits are discarded for which two nonlinear optimizers disagree on the ground state by more than 10 −5 , the bootstrap median and mean disagree by more than 2σ, or correlated and uncorrelated fits disagree by more than 5σ.
Weighted averages of all results from fits passing these cuts are then used to determine the final results for Z S π (P zẑ ) and E π (P zẑ ). The same weights are used as in Refs. [25,97,104], which for each fit parameter (Z S nπ and E nπ ) correspond to the p-value of each fit divided by the variance of the fitted parameter. For each momentum, at least 6 fit ranges are found to lead to fits passing the cuts described above and are therefore included in these weighted averages. The same weights are also used to perform averages of the bootstrap samples of Z S π (P zẑ ) and E π (P zẑ ) generated using a common set of bootstrap ensembles for each fit range, which are used below to enable correlated determinations ofφ  22) and constructed using C π 2pt (squares, offset slightly on the horizontal axis) and the most statistically precise C Γ 2pt (triangles) for each choice of P z . The gray bands show the weighted average of Eπ(P zẑ ) from fits with all possible tmin using the number of excited states preferred by the AIC, as described in the main text. The colored bands show the corresponding highest-weight fit included in the average. correlation function as where the ellipsis denotes exponentially-suppressed corrections from excited states and the finite temporal extent of the lattice geometry. The momentum dependence of the choice of N cfg (P z ) and (P z ) leads to a complicated dependence of the statistical uncertainties of the determination of E π (P zẑ ) on P z . The momentum dependence of the extracted values of E π (P zẑ ) and Z S π (P zẑ ) is shown in Fig. 3. The continuum dispersion relation E π (P zẑ ) = E π (0) 2 + |P zẑ | 2 is also shown for comparison. The relative differences between E π (P zẑ ) and the continuum dispersion relation in order of increasing P z are 0.03 (2) zero aP z values studied. The increase in these differences with aP z is observed to be approximately linear, which is consistent with the expected form of lattice artifacts since the clover term has not been nonpertubatively tuned to remove O(a) chiral symmetry breaking effects. Further calculations at other values of the lattice spacing are required to study these lattice artifacts in more detail.

Determination of bare quasi-TMD WFsφ [Γ]
The results for E nπ (P) and Z S nπ (P), detailed in the previous section, are subsequently used to determinẽ 2pt , E π (P) and Z S π (P) are formed at the bootstrap level: which are fit to the appropriate spectral representations obtained by multiplying Eq.
. The same procedure described in Sec. III A 1 is used to choose t max for these fits; however, for some staple-shaped operator geometries C Γ 2pt is consistent with zero within the statistical precision of this work. Therefore, t max ≥ 9 is imposed in cases where the signal-to-noise criterion described above would lead to a smaller t max . The same procedure described above is then used to sample over possible values of t min , construct the bootstrap covariance matrix with optimal linear shrinkage for each choice of fit range, and determine weighted averages of the fit parameterφ Γ (b T , b z , P z , ) for each operator geometry. Examples of the resulting fits are shown in Fig. 4.

Construction of bare quasi-TMD WF ratiosW [Γ]
Bare quasi-TMD WF ratios are obtained at the boostrap level from bare quasi-TMD WFs via Eq. (3) for each Γ, P z , , b z , and b T combination considered. For the symmetric staple geometries used here, b z /a is necessarily odd (even) for odd (even) /a. For the geometries where /a and therefore b z /a are odd, b z = 0 matrix elements are replaced by averaging over those with b z /a = ±1. The replacement leads to differences in the normalization of even and odd /a matrix elements at non-zero lattice spacing; however, these differences vanish in the continuum limit and can be analyzed in conjunction with other lattice artifacts when the continuum limit is performed. The statistical precision of the quasi-TMD WF ratios diminishes with increasing b T , with the smallest signal-tonoise ratio observed for the largest computed b T /a = 7 for quasi-TMD WF ratios with the largest collinear length, /a = 32 at n z = 4 (P z = 0.86 GeV). Signal-to-noise ratio also decreases with increasing P z ; however, the use of smaller at larger P z with roughly constant P z leads to relatively mild signal-to-noise scaling with P z in the quasi-TMD WF results.
Both real and imaginary parts of W uncertainties, which is consistent with expectations given the symmetric definition of the staple-shaped operator's origin (shown in Fig. 1) placed at the midpoint between the quarks. For further discussion, see Appendix A.

B. Renormalized quasi-TMD WF ratios
The renormalization factors Z MS ΓΓ (µ) are determined from C MS RI/xMOM (µ, p R , ξ R ) and Z RI/xMOM Λ±z (p R , ξ R ) via Eq. (7) for Wilson lines along ±ẑ and the set of renormalization scales defined by Wilson line lengths ξ R /a ∈ {2, 3, 4} and off-shell quark momenta p µ R = 2π L (0, 0, n z , 0), with n z ∈ {8, 10, 12}. The coefficients C MS RI/xMOM (µ, p R , ξ R ) are computed perturbatively, with α s (µ) determined as prescribed in Ref. [105], setting µ = 2 GeV, and neglecting the running from the scale set by (a) The mixing matrix at the renormalization scale used in the analysis of the CS kernel. White disks denote off-diagonal elements with contributions expected at one-loop order in lattice perturbation theory [106]. Examples for other renormalization scales are provided in Fig. 16 of Appendix B.  components: The central values of Z RI/xMOM ΓΓ (p R , ξ R ) are calculated at ξ R /a = 3 and p R = 2.15 GeV, and systematic uncertainty for each pair Γ, Γ is estimated as half the difference between the maximum and the minimum of Z RI/xMOM ΓΓ (p R , ξ R ) over all values of p R and ξ R studied. Systematic and statistical uncertainties are added in quadrature. Fig. 5a At the level of renormalization constants as defined by Eq. (24), mixing effects for collinear configurations of p µ R and ξ µ R are consistent with constraints on stapleshaped operator mixing from C, P and T transformations [36,107], and the dominant contributions are as expected from lattice perturbation theory at one-loop order [106]. While non-collinear momentum configurations are not used in the determination of the kernel, an investigation of mixing effects using such a definition of the associated renormalization scales, summarized in Appendix B, reveals contributions to mixing in addition to those expected in lattice perturbation theory at one-loop order. The additional contributions may be understood as artifacts of an off-shell renormalization scheme.
The ratios of the MS-renormalized quasi-TMD WFs,

C. Fourier-transformed quasi-TMD WF ratios
The Fourier transform of the MS-renormalized positionspace quasi-TMD WF ratios is realized as a Discrete Fourier Transform (DFT), i.e., where b max z denotes the truncation point in position space andW MS Γ (b T , µ, b z , P z ) denotes a position-space quasi-TMD WF ratio whose real and imaginary parts have been averaged at each P z over ±b z and all values of (P z ) relevant for a given b z with weights proportional to the inverse variance of each contribution. As can be seen in Appendix D, the values that are averaged are in all cases consistent within ≈ 1σ. As demonstrated in Fig. 7, with , computed as described in Section III A. FIG. 6. Example comparison of quasi-TMD WF ratios before and after accounting for renormalization-induced mixing effects.
is expected to be real. Finally, since the LaMET matching coefficients to NLO are independent of Dirac structure, W MS Γ (b T , b z , P z , ) for Γ ∈ {γ 3 γ 5 , γ 4 γ 5 } are expected to agree up to power corrections. The magnitude of both real and imaginary parts of the quasi-TMD WFs are reduced outside of the physical region x ∈ [0, 1] as P z increases, which is consistent with expectations from the factorization formula [41,65,66,68,71,72,75]. Since the factorization scales are proportional to the hard parton momenta xP z and (1 − x)P z , the power corrections are always enhanced near the end-point regions x → 0 and x → 1, and lead to nonvanishing tails when P z is finite.

D. Perturbative matching
The final determination of the CS kernel in this work employs the b T -unexpanded resummed perturbative correction at NNLL accuracy, denoted uNNLL, In addition to uNNLL, corrections at several other accuracies are computed to study perturbative convergence: fixed-order NLO and NNLO corrections computed according to Eq. (11), uNLO corrections computed analogously, and NLL and NNLL resummations computed according to Eq. (13). In all comparisons beyond LO, for example that of NNLL and NLL illustrated in Fig. 9, the Re[δγ MS q (µ, x, P z 1 , P z 2 )] exhibit qualitative agreement between different accuracies for x ∈ [0.3, 0.7] at each pair (P z 1 , P z 2 ), with better agreement at larger momenta. When compared analogously, the Im[δγ MS q (µ, x, P z 1 , P z 2 )] exhibit worse agreement and are larger in magnitude than the real parts. This indicates different rates of perturbative convergence in real and imaginary parts of matching corrections. The same qualitative picture is observed for fixed-order corrections in Fig. 19 of Appendix C 1. Sensitivity to b T -dependent power corrections is also different between real and imaginary parts, as may be seen by comparing corrections expanded and unexpanded in b T , such as the comparison of NLO and uNLO illustrated in Fig. 10 and further examples provided in Appendix C 3. These comparisons reveal a b T -dependent sensitivity to power corrections which, for momenta used in this work, is significant for b T /a < ∼ 3 in the real part and across the entire range in b T /a in the imaginary part.

E. The Collins-Soper kernel
Using Eq. (2) and replacing integral Fourier transforms of quasi-TMD WF ratios with the DFTs defined in Eq. (26), the MS-renormalized quark CS kernel is determined via the estimator The CS kernel is determined from an average of Whereas the CS kernel is a real quantity, averages of Im γ MS q,Γ at different perturbative accuracies indicate a non-zero imaginary part as illustrated in Fig. 14. By comparing to the LO estimate, where the matching correction vanishes, it is clear that matching is a dominant source of the imaginary part. As discussed in Section III D, the imaginary part from the matching is attributed to b Tdependent power corrections enhanced at small b T and mitigated by uNLO and uNNLL corrections. Consistent with this explanation, for small b T , Im γ MS q,Γ at uNLO and uNNLL are reduced relative to all other orders of matching; as b T increases and power corrections are suppressed, they approach NLO and NNLL results, respectively. However, uNLO and uNNLL accuracies still do not lead to values of Im γ MS q,Γ that are consistent with zero within the accessible range of b T P z . This suggests that power corrections beyond those that have been accounted for by the unexpanded matching are relevant at the level of precision of this calculation.
Since matching corrections with smallest expected power corrections are given by uNNLL, this accuracy is used for the final estimate of the CS kernel. Furthermore, considering both the larger qualitative difference between Im γ MS q,Γ for different accuracies and momenta, as well as the parametrically larger estimates of b T -dependent power corrections compared to Re γ MS q,Γ , the central value of the CS kernel is determined from fits to Re γ MS, uNNLL q,Γ while Im γ MS q,Γ is not treated as a direct source of systematic uncertainty. Finally, scale variation in resummed corrections around µ 0 = 2p z , with p z ∈ {xP z , (1 − x)P z }, is not used to estimate the associated perturbative uncertainties. This choice is motivated by the range of p z used to determine the CS kernel, and in particular because results at scales µ 0 /2 are sensitive to non-perturbative effects. The significance of higher-order perturbative effects may instead be judged by comparing the final uNNLL CS kernel determination to those obtained with other accuracies, as shown in Fig. 13.
The final CS kernel results of this work are summarized in Table II. These results are shown as a function of b T and compared with phenomenological determinations of the CS kernel in Fig. 15.

IV. OUTLOOK
This work presents a numerical determination of the quark Collins-Soper kernel γ MS q (b T , µ = 2 GeV) in the non-perturbative range of b T corresponding to transverse momentum scales 240 MeV < ∼ q T < ∼ 1.6 GeV, through a lattice QCD calculation at a fixed lattice spacing and volume, quark masses corresponding to an approximately physical value of the pion mass m π = 148.8(1) MeV, and uNNLL perturbative matching power corrections in LaMET. Additionally, this work presents improved estimates of systematic uncertainties associated with perturbative matching from LaMET, the associated power corrections, and mixing effects in staple-shaped operators using the RI/xMOM renormalization scheme.
While a complete quantification of systematic uncertainties would require performing lattice QCD calculations at multiple lattice spacings and at larger boosts or higher-order perturbative matching, the precision and control over systematic uncertainties achieved in this work is sufficient to preliminarily compare the CS kernel determination with phenomenological parameterizations   of the kernel fit to experimental data. In Fig. 15 [56], as well as an older parameterization based on the work of Brock, Landry, Nadolsky and Yuan (BLNY) [44] and employed in recent code packages for resummation calculations relevant to precision electroweak measurements [110,111]. Within quantified uncertainties, the data agrees with all models in the range 0.12 fm < ∼ b T < ∼ 0.24 fm, with all but BLNY for 0.24 fm < ∼ b T < ∼ 0.6 fm, and with SV19, MAPTMD22 and ART23 for b T > ∼ 0.6 fm. Finally, for b T ≥ 0.6 fm, the results are consistent with a constant, as suggested for the large-b T behavior in Ref. [112]. Discretization artifacts and power corrections, both enhanced at small b T , will be studied in more detail in future work. More refined comparisons would also take into account the differences in the number of quark flavors and their masses between the lattice QCD determination and the global analyses, which lead to perturbative corrections described in Ref. [113]. The first-principles QCD calculations achieved in this work provide new constraints on the quark CS kernel, with better control of the associated systematic uncertainties. The results are complementary to those achieved experimentally and, once the continuum limit is taken, can be rigorously compared to phenomenological parameterizations of the CS kernel from current global analyses. Moreover, in future analyses, lattice QCD constraints could be used to constrain the parameterizations in joint fits with experimental data. Center (PSC) through allocation TG-PHY200036, which is supported by National Science Foundation grant number ACI-1548562, and facilities of the USQCD Collaboration, which are funded by the Office of Science of the U.S. Department of Energy. We gratefully acknowledge the computing resources provided on Bebop, a highperformance computing cluster operated by the Laboratory Computing Resource Center at Argonne National Laboratory. The Chroma [114], QLua [115], QUDA [116][117][118], QDP-JIT [119], and QPhiX [120] software libraries were used in this work. Data analysis used NumPy [121] and Julia [122,123], and figures were produced using Mathematica [124].

Appendix A: Constraints on quasi-TMD WF from discrete Lorentz transormations
The properties of quasi-TMD WFs under charge conjugation C, a product of reflections R T ≡ R 1 R 2 in the transverse directions, and time reversal T , follow from the properties of the relevant staple-shaped operators T , b z , y, , a) defined in Eq. (6). These operators transform as where R T (y) = (−y 1 , −y 2 , y 3 , y 4 ), T(y) = (y, −y 4 ), and the Dirac representation matrices M C , M R , and M T are defined by (A6) For further discussion of discrete transformations of stapleshaped operators, see Ref. [107].
These operator transformation properties constrain the unsubtracted bare quasi-TMD WFs φ Γ (b T , b z , P z , ). Using Eq. (A1) in the isospin limit, charge conjugation invariance of pion states, and u ↔ d exchange symmetry in the isospin limit gives (A7) Next considering transverse reflections, pion states are pseudoscalar and are therefore invariant under the product of reflections R T . Eq. (A2) can therefore be used to obtaiñ which provides the Γ-dependent signs with which correlation functions can be averaged over different staple orientations. Combining these results gives which establishes the symmetry properties of φ Γ (b T , b z , P z , ) under sign changes of b z . In particular, it follows from Eq. (A9) that φ γ4γ5 (b T , b z , P z , ) and φ γ3γ5 (b T , b z , P z , ) are both symmetric in b z .
Finally, Eq. (A3) and the T -odd transformations of pion interpolating operators χ † P (0) can be used to obtaiñ which provides the Γ-dependent signs with which correlation functions can be averaged over forward and backward propagation in time. Discrete transformation properties for renormalization factors can be derived analogously and ensure that renormalized quasi-TMD WFs share the same transformation properties as the bare quasi-TMD WFs with the corresponding Γ.

Appendix B: RI/xMOM renormalization scheme
As discussed in Section II, the renormalization condition of Eq. (10) includes a Green's function containing a Wilson line and gives all the mixing effects of the stapleshaped operator in the RI/xMOM scheme. This simplifies renormalization compared to other RI-type schemes, which involve Green's functions of the operator itself and depend on the geometry of the Wilson-line staple. Encoding all the mixing effects in Eq. (10) is possible by interpreting the Wilson lines in QCD as originating from propagators of free auxiliary fields ζ n (x) [79][80][81], where ζ n (x) denote auxiliary fields of scalar particles moving along straight space-like directions n µ and carrying color charge in the fundamental representation [79,80]. That is, the QCD action is augmented by ζ n (x) in a way that returns the original action when the field is integrated out and Eq. (B1) holds.
The staple-shaped operator in Eq. (6), non-local in QCD due to Wilson lines, may be recast in terms of local fields in the extended theory: where C n,n (x) ≡ζ n (x)ζ n (x) denote cusp operators, and Q q,n (x) ≡ζ n (x)q(x) denote composite spin-1/2 fields. The renormalization constant of the operator is thereby factorized into Z C n,n , Z Qq,n , Z q and Z ζn , renormalizing Cn ,n , Q q,n , quark, and ζ fields, respectively, as well as a factor of e −δm( +b T ) where δm denotes the mass of ξ fields induced by loop effects [79,80,82].
In practice, the corresponding renormalization conditions can be solved in QCD by integrating out the auxiliary fields. For example, while the Green's function in Eq. (9) may be written as (B3) it is still expressed in its original form when solving the renormalization condition in Eq. (10) numerically, and Eq. (B3) is only used to identify the corresponding renormalization factor as where α, α are spin indices. The remaining renormalization conditions in RI/xMOM are approached similarly. Altogether, using Eq. (B2), the renormalization factor of the staple-shaped operator may then be computed via the renormalization factors in the auxiliary-field description.
Moreover, when computing the CS kernel via Eq. (2), renormalization factors with no spin structure cancel in the ratio -it is therefore sufficient to find any combination of them that fully encodes the mixing effects. Z RI/xMOM Λq,n in Eq. (B4), as determined by solving Eq. (10), is one such combination.
For collinear configurations of p µ R and ξ µ R defined by (c) p R = 2.44 GeV and 0 < |z| < 1. Fig. 5a in the main text, for a set of renormalization scales defined by z ≡ p R · ξ R /(|p R ||ξ R |) and ξ R /a ∈ {2, 3}.

FIG. 16. As in
where y ≡ p R · ξ R , C F = 4/3 and Si(y) ≡  (7) is given by RI/xMOM C +z MS RI/xMOM . As mentioned in Section III B, the mixing effects induced by Z RI/xMOM ΓΓ (p R , ξ R ) on p µ R and ξ µ R receive additional contributions not expected in lattice perturbation theory at one-loop order for non-collinear configurations of p µ R and ξ µ R [125]. These mixing effects are illustrated Fig. 16. When p R · ξ R = 0, additional mixing contributions appear at 10%-level. When p µ R has components both collinear with and perpendicular to ξ µ R , the number of mixing contributions is larger, but the magnitude of each is reduced. Since RI/xMOM is an off-shell momentum scheme, contributions to mixing other than those induced by the staple-shaped operator renormalization itself are possible and may be relevant to explain the additional contributions [67,126,127]. Notably, the additional contributions are significantly smaller than those observed in the RI /MOM scheme in previous works [24].

Appendix C: Matching corrections
The quasi-TMD WF factorization formula from the discussion of power corrections in Section II is given by [41,71,75] where matching holds independently of the suppressed flavor indices, Dirac structure indices, and the renormalization scheme label up to power corrections, denoted p.c. 4 The reduced soft factor S r (b T , µ) [41] ensures that the infrared physics is the same as that of the physical TMD WF. The ± label denotes the ±ẑ displacement of the transverse Wilson line relative to the quarks in the staple-shaped operator used to define the quasi-TMD WF. Only the +ẑ displacement is shown in Fig. 1 and used in the determination of the CS kernel, and the ± label is omitted throughout the main text; the label is made explicit for completeness in the following discussion of the matching correction. The matching kernel H ± (x, P z , µ) is given by [128] H where the coefficients C ± φ can be derived from the matching of a heavy-to-light current in the heavy-quark effective theory to soft-collinear effective theory [128].

Fixed-order matching corrections
A fixed-order matching correction in Eq. (11) requires matching coefficients C φ (µ, p z ) computed in a perturbative expansion where a s (µ) ≡ α s (µ)/4π and α s (µ) is determined by running from α s (µ 0 = 2 GeV) as detailed in Appendix C 2. At NNLO, the logarithmic ratio of these coefficients in 4 Note that the CS evolution part in the matching formula differs from that in Refs. [41,71,75] by a suppressed imaginary part in the exponential, which depends on the soft factor subtraction. The imaginary part is suppressed here because it does not affect the extraction of the CS kernel. the matching correction is expanded as (C4) While taking a naive logarithmic ratio of NNLO matching coefficients, differs from the correction in Eq. (C4) only by higherorder terms, in the kinematic regime of this study the discrepancy is significant, as illustrated in Fig. 17. Consistent with the scaling of power corrections, δγ NNLO−I and δγ NNLO−II converge at larger momenta, but the rates of convergence and the sign and magnitude of x-dependent corrections differ between real and imaginary parts. The same conclusions apply to the NLO matching corrections, for which terms of order a 2 s (µ) are dropped in Eq. (C4) and Eq. (C5). As discussed further in Appendix C 2 and illustrated in Fig. 18, corrections δγ NLO−II and δγ NNLO−II are in a better agreement with results expected from the RG equations of C ± φ (µ, p z ). For this reason, the fixedorder results with a naive logarithmic ratio are not used in the determination of the CS kernel.
The difference between δγ NLO−II and δγ NNLO−II , illustrated in Fig. 19, indicates expected convergence in the real component of the matching correction at moderate x. However, matching corrections converge poorly in the imaginary component. This is in agreement with NLO results in Ref. [32] and may be explained by a larger sensitivity of Im(δγ q (µ, x, P z 1 , P z 2 )) to power corrections at small b T , as discussed further in Appendix C 3.
The matching coefficients needed for the NNLO matching correction are given explicitly below, with C F = 4/3, C A = 3, n f = 4, and ζ(n) denoting the Riemann zeta function. At NLO, C (1) φ (µ, p z ) has been calculated [71,75] to be The NNLO coefficients C ±,(2) φ (µ, p z ) for quasi-TMD WFs can be extracted from the corresponding results for quasi-TMD parton distribution functions (quasi-TMD PDFs), for which a factorization analogous to that in Eq. (C1) holds [65,66,68,72]. The matching kernel for quasi-TMD PDFs has been calculated at NLO [63,65,66,77] and recently at NNLO [77,78]. The real part of the coefficient C ± φ (µ, p z ) is equal to the square root of the matching kernel for the quasi-TMD PDF with the identification of ζ z = (2p z ) 2 . Obtained in this way, C

Resummation of momentum logarithms
The resummation of the matching coefficients discussed in Section II is enabled by their RG evolution equations [68,72], where γ ± µ (µ, p z ) and γ ± C (µ, p z ) are the virtuality and momentum anomalous dimensions of C ± φ (µ, p z ), respectively, γ µ (a s (µ)) and γ ± C (a s (2p z )) denote initial values in the solutions to the RG equations, and is the cusp anomalous dimension.
The anomalous dimension γ ± C (µ, p z ) in Eq. (C10) may be used to approximate the matching correction in Eq. (11) in the limit of P z 1 → P z 2 . As illustrated in Fig. 18, this approximation is used to select a fixed-order expansion of the matching correction in Eq. (C4) over that in Eq. (C5). Finally, the relation may be used to cross-check explicit perturbative results for γ ± C (µ, p z ) and γ ± µ (µ, p z ) detailed further below.
The resummed matching coefficients corresponding to K I φ and K II φ are given according to Eq. (13) by where the logarithmic ratio of initial-scale matching coefficients is expanded in a s (µ) as in Eq. (C4) for the fixed-order case, and (C20) Fig. 20 compares matching corrections in the two schemes at NNLL in the kinematic regime used in this work to determine the CS kernel. The differences between the resummations decrease at larger momenta, consistent with the decreasing α s . Since the ratios calculated from the lattice are renormalization group invariant and independent of the MS scale µ, the natural choice of the initial scales should be proportional to the hard parton momentum in the quasi-TMD WFs, e.g., (p z 0 , µ 0 ) = (p z , 2p z ). Therefore, in this work the resummed matching corrections are determined in scheme II.
To obtain the resummed matching corrections, all functions comprising K φ are computed perturbatively in a s (µ), A resummation of C ± φ (µ, p z ) from C ± φ (µ 0 , p z 0 ) of a given accuracy corresponds to a consistent set of loop orders chosen for C ± φ (µ 0 , p z 0 ) and the functions above, with a s (µ) run from a s (µ 0 = 2 GeV) as detailed further below. Examples for NLL and NNLL resummations are provided in Table III. Explicitly, the following perturbative results are used for the NLL and NNLL resummations. The β-function is given by where T F = 1/2. The cusp anomalous dimension Γ cusp , computed to four-loop order [129][130][131][132][133][134], is given by The non-cusp anomalous dimensions are given in terms of γ µ n and γ C± n ≡ γ C n ∓ iπγ C * n . Like the matching coefficients discussed in Appendix C 1, they can be extracted from the recently calculated NNLO matching kernel of the quasi TMD PDFs [77,78] and are given by and respectively, where the imaginary part is inferred from the logarithm L ± z in Eq. (C7) of the fixed-order result. The corresponding perturbative expressions of resummation kernels for the NNLL resummation are [135] K NNLL γµ (µ 0 , µ) and where r ≡ a s (µ 0 )/a s (µ) and the running coupling at µ is given at NNLO order by where X ≡ 1 + β 0 a s (µ) ln µ 2 0 /µ 2 , and α s (µ 0 = 2 GeV) ≈ 0.293 is determined as prescribed in Ref. [9]. N 3 LO terms require a s (µ) at NNLO, and NNLO terms at NLO. Finally, for the NNLL resummation, the logarithmic ratio of initialscale coefficients in Eq. (C19) is expanded as in Eq. (C4) to NLO.

Estimate of bT -dependent corrections
The validity of the factorization formula in Eq. (C1) requires that xP z b T 1 and (1 − x)P z b T 1. Within the kinematic range of P z and b T used in this work, such conditions are not sufficiently satisfied, especially at small b T , and considerable power corrections are expected.
Nonetheless, a factorization should exist for some range x ∈ [x min , x max ] for all values of b T , as long as xP z ,xP z Λ QCD . If P z b T 1, a factorization into TMDs applies; if P z b T 1, then it is reduced to a collinear factorization. One may conjecture a factorization formula that interpolates between collinear and TMD factorizations, written schematically at finite P z as where the matching kernel has a large P z b T expansion for P z b T > ∼ 1, where δC ± φ (b T , xP z ) and δH MS± φ (x, y;x,ȳ; xP z ,xP z , b T ) denote power-or exponentially-suppressed terms such as 1/(xP z b T ) and 1/(xP z b T ) or exp(−xP z b T ) and exp(−xP z b T ).
For the purposes of estimating the significance of the finite-b T correction, the contribution δH MS± φ to the above matching kernel is neglected and its study is left to future work. The matching kernel then reduces to H MS± φ (x, y;x,ȳ; xP z ,xP z , b T , µ) where the ( has a perturbative expansion analogous to that of C ± φ (µ, p z ) in Eq. (C3). and has been calculated at NLO in Refs. [66,75].
Explicitly, the NLO contribution C  In the real part, the corrections become negligible for b T > ∼ 0.4 fm, except for the pair of smallest momenta used in this work. In the imaginary part, the corrections are large for the entire kinematic range of this study. Examples of real parts of CS kernel estimatorsγ MS Γ (bT , x, P z 1 , P z 2 , µ), computed as described in Section III E with matching corrections at LO, for Γ = γ4γ5 and 0.12 fm ≤ bT ≤ 0.84 fm (Re[γ MS γ 4 γ 5 ] for bT = 0.48 fm is illustrated in Fig. 11 in the main text). The black dashed lines enclose the region in x used to determine the CS kernel. The notation n z = P z 1 /P z 2 displays momenta in lattice units.