Single-spin asymmetries at two loops

We find a novel mechanism for generating transverse single-spin asymmetry (SSA) in semi-inclusive deep inelastic scattering, distinct from the known ones which involve the Sivers and Collins functions, or their collinear twist-three counterparts. It is demonstrated that a phase needed for SSA can be produced purely within a parton-level cross section starting at two loops. We identify the complete set of two-loop diagrams for SSA, and discuss their gauge invariance and collinear factorization which features the $g_T$ distribution function. In the $k_T$ factorization framework, many more sources for SSA exist, and contributions from all possible two-parton transverse-momentum-dependent parton distribution functions are presented up to two loops and twist three.


I. INTRODUCTION
A study of single-spin asymmetry (SSA) in processes involving a transversely polarized nucleon is crucial for exploring the three-dimensional nucleon structure. Significant experimental signals of SSA have been observed in hadron production [1,2], which amount up to order of ten or more percent of unpolarized cross sections. Data on pion production have been very consistent, showing asymmetries up to pion transverse momenta of several GeV [3,4]. Despite decades of efforts, the origin of such significant asymmetries is not yet fully understood, due in part to large theoretical and experimental uncertainties. The future Electron-Ion Collider is expected to deliver very precise measurements, that will impose strong constraints on various theoretical approaches.
From the theoretical point of view, understanding SSA is a quest for a 'phase'. One is interested in the part of a cross section which depends linearly on the transverse spin vector S T of a nucleon. The spin vector usually comes with a factor i, so to make the cross section real, one has to find another factor i from involved diagrams. The first such attempt was made by Kane, Pumplin and Repko [5], who calculated the SSA for single hadron (pion) production from quark-quark scattering diagrams with a transversely polarized quark. They found that nonvanishing SSA for high p T reactions is proportional to a current quark mass. Although their calculation does not explain the measured large SSA, the observation with the result being proportional to a quark mass indicates that SSA is a twist-three effect in perturbative QCD. Subsequently, Efremov and Teryaev pointed out that nonvanishing SSA could be obtained as one goes beyond the leading power [6][7][8][9]. It is by now well known that sizable SSA can be generated through the combined effect of nonperturbative twist-three distributions of a nucleon, called the Efremov-Teryaev-Qiu-Sterman (ETQS) function [6,7,10,11], and the pole part of a propagator which provides the required phase. In this picture, the smallness of a current quark mass is no longer an issue, since the relevant mass scale is a nucleon mass. A similar twist-three effect has been implemented into fragmentation functions as an alternative source of SSA [12][13][14].
SSA has been also studied extensively in the k T factorization framework. Parton transverse momenta are incorporated either in transverse-momentum-dependent (TMD) parton distribution functions (PDFs) or in TMD fragmentation functions (FFs). The former is the Sivers function [15,16], which describes the spin-orbit correlation of partons inside a transversely polarized nucleon. The required phase arises from the pole of a propagator for Wilson lines. For the latter, the Collins function [17][18][19] governs the fragmentation of a polarized quark, in which the phase comes from final state interactions.
In this paper we will investigate the source of phases starting from a parton-level cross section up to two loops, taking the polarized semi-inclusive deeply inelastic scattering (SIDIS) as an example. On-shell internal particles in certain two-loop diagrams produce phases from different leading regions of particle momenta. The phase is then absorbed into the relevant piece in the factorization theorem for each leading region. In addition to the known Sivers (or ETQS) and Collins mechanisms, which are associated with the collinear regions of initial state and final state partons, respectively, we find a novel source of phases which goes into a hard kernel. The corresponding factorization formula contains the g T distribution function for a polarized nucleon, and the standard twist-2 FF for a final state hadron. Our result is reminiscent of the observations in [20][21][22][23]: the authors of [20,21] studied the same set of two-loop diagrams as proposed in this work, but for a transversely polarized quark target. The asymmetry is thus proportional to a current quark mass, and only factorizations into the known mechanisms (Sivers, ETQS,...) were examined. In [22,23], the authors found that multi-photon exchange between the leptonic and hadronic parts of inclusive deep-inelastic lepton-hadron scattering causes SSA. The two-photon-exchange diagrams considered in [22] has the same topology as our diagrams, but it turned out that their final formula does not contain the g T distribution function [24].
Once we are allowed to go to higher orders in a hard cross section, more twist-3 TMD PDFs and FFs from various spin projectors can contribute to SSA, resulting in abundant phenomenology to be explored. We will derive a complete set of subleading contributions to transverse SSA at two-parton twist-three accuracy in SIDIS up to two loops. Note that the proof of the factorization theorem at the twist-three level is highly nontrivial. Here we will adopt the twist-three factorization as a working hypothesis [25][26][27][28][29][30], and leave its rigorous proof to future projects. This paper is organized as follows. In Section II, we present the general formalism for SSA in the collinear factorization and check the QED and QCD gauge invariance. In Section III, the complete set of two-loop diagrams that should be included into the hard kernel introduced in Section II is identified. We analyze the various infrared divergences in the considered diagrams, and discuss how to handle these divergences in the collinear and k T factorizations in Section IV. A source of phase, which cannot be ascribed to the known mechanisms of SSA, will be highlighted. It thus represents a new contribution to SSA, and is our main result. Section V is the conclusion.

II. SEMI-INCLUSIVE DEEP INELASTIC SCATTERING
In this section we start with a general discussion of SSA in SIDIS in the collinear factorization framework mostly following the notations of [31] (see also [9,11]). The spin-dependent part of the e(l)p(P ) → e(l )h(P h )X cross section is given by where S ep ≡ (l + P ) 2 , Q 2 ≡ −q 2 = −(l − l ) 2 , L µν = 2(l µ l ν + l ν l µ ) − g µν Q 2 is the leptonic tensor, W µν is the hadronic tensor, and ν and µ are the polarization indices of the virtual photon in the amplitude and the complex-conjugate amplitude, respectively. The Bjorken variable is denoted as x B = Q 2 /(2P · q). We work in the so-called hadron frame, where the virtual photon and the proton move in the z direction with The incoming and outgoing leptons have the momenta where φ is the azimuthal angle relative to the z axis, and The hadronic tensor is expressed as a convolution of the reduced hadronic tensors w µν q,g and the quark and gluon FFs D 1q,g (z), which describe the processes q(P h /z), g(P h /z) → h(P h ), In the following we will suppress the flavor summation. The tensor w µν is represented by the sum of the two diagrams in Fig. 1, The hard matrix elements S (0) µν (k) and S (1)σ µν (k 1 , k 2 ), with σ being the polarization index of the attached gluon, can be computed in perturbation theory. The nonperturbative proton matrix elements M , as well as in color space (omitted for simplicity). Here S µ T = (0, S T , 0) is the spin vector of the transversely polarized proton with the normalization S 2 T = −1. The collinear factorization approach amounts to expanding the momentum k µ in S (0) and k µ 1,2 in S (1) around the collinear part proportional to P µ , After some manipulations, one finds (see Eqs. (31) and (42) of [31]) where α is transverse, σ (x 1 P, x 2 P ), and with n µ = δ µ − /P + . The authors in [31] have focused on the third line of Eq. (8), evaluating the corresponding one-loop hard kernel in perturbation theory and obtaining the soft gluon pole (SGP), soft fermion pole (SFP) and hard pole (HP) contributions. They have also shown that all the other lines in Eq. (8) vanish identically for these contributions.
However, all the lines in Eq. (8) can actually contribute to SSA in more general situations. It has been pointed out in [9,31] that the first line potentially contributes to SSA, if one picks up the g T distribution function with M N being the proton mass. The authors of [31] noted that if S (0) is calculated in the Born (one-loop) approximation, the asymmetry trivially vanishes, because there is no phase from the Born diagrams to cancel the i from the trace involving γ 5 (see the discussion in Sec. III). As we shall demonstrate later, certain two-loop diagrams for S (0) can generate a phase, which leads to a contribution to SSA proportional to g T . When this occurs, the second line of Eq. (8) provides the O(g) piece of the Wilson line in the definition of g T . To see this, note the QCD Ward-Takahashi (WT) identity for the contraction of a gluon of the momentum k 2 − k 1 , where a color matrix t a with the color index a is implicit on the right hand side. The above formula gives in the collinear limit. Upon the integration over x 1 or x 2 , the factor 1/(x 1 − x 2 + i ) becomes the θ-function that enters the Wilson line integralψ Differentiating Eq. (14) with respect to k 1,2 and then taking the collinear limit, one finds It means that the fourth line of Eq. (8) is non-vanishing, and the fifth line does not vanish either, as one can see by summing the two relations in Eq. (17). The crucial difference between the analysis of [31] and ours is whether the right hand sides of Eq. (17) vanish or not. Following [31], the hard kernel S (1) is defined as the sum of 'irreducible' diagrams without including the 'reducible' diagrams in which the k 2 − k 1 gluon merges with the incoming or returning quark line. With this definition, the right hand sides of Eq. (17), accounting for the contributions from those reducible diagrams, exist in general. See [14,32,33] for related discussions in the context of SSA. It turns out that, for the SGP, SFP and HP contributions at the Born level considered in [31], the right hand sides of Eq. (17), and the fourth and fifth lines of Eq. (8), all vanish. However, for the set of two-loop diagrams proposed in the next section, the right hand sides Eq. (14) do not vanish. The fourth and fifth lines of Eq. (8) do not vanish either, and they must be treated simultaneously for gauge invariance as elaborated below. Inserting Eq. (17) into Eq. (8), we observe that various terms organize themselves to form gauge invariant twistthree matrix elements. 1 Define dλ 2π dµ 2π e iλx1+iµ(x2−x1) P S T |ψ j (0)[0, µn]gF αβ (µn)n β [µn, λn]ψ i (λn)|P S T where the Wilson line [µn, λn] = P exp ig µ λ dtn · A(tn) renders the matrix elements gauge invariant, and the threeparton PDFs obey the symmetry property, The second term of the first line and the fourth line in Eq. (8) combine to give the covariant derivativeψD α T ψ, and the fifth line provides the Wilson line of this operator to make it gauge invariant. Equation (8) then becomes The above expression can be further simplified by using the identity [35] G where P denotes the principal value prescription. We shall omit P below to avoid confusion with the momentum P µ . The second equation can be regarded as the definition ofg(x), that is in fact related to g T (x), G F andG F through the QCD equation of motion (see Eq. (27) below). 2 We thus arrive at which will be the starting point of our two-loop analysis.

A. QED gauge invariance
Let us show that Eq. (23) respects the QED WT identity, which is actually nontrivial. The WT identity for S (0) is written as where / P = P + γ − , and q µ and q ν represent the outgoing and incoming photon momenta, respectively. It is obvious that the first line of Eq. (23) does not satisfy the WT identity by itself due to the presence of γ 5 / S T . (For unpolarized distributions, one has the spin projector γ − instead, and the WT identity is trivially satisfied.) In fact, only the sum of all lines in Eq. (23) obeys the WT identity. Similar observations have been made in the literature [9,14,36].
2g (x) is related to the first moment g 1T (x) from [34] has been used.
To verify it, we begin with a slight generalization of Eq. (24), for an on-shell, but not necessarily collinear momentum k. Differentiating Eq. (25) with respect to k α T and then taking the collinear limit, we get Furthermore, we need the following identity [31] g where theg(x) part combines with the second line of Eq. (23) to give the structure This combination vanishes when contracted with q µ or q ν , as can be easily checked by using Eq. (26). The G F andG F terms of Eq. (27) combine with the third line of Eq. (23) to give the structure Remembering that S (1) does not contain reducible diagrams, we have Using the following formulas together with Eqs. (20) and (24), one can show that both lines of Eq. (29) vanish, when contracted with q µ or q ν . This completes the proof of the QED WT identity.

B. QCD gauge invariance
Similarly, the QCD gauge invariance holds only for the sum of all terms in Eq. (23). Suppose that S (0) is evaluated in some gauge which involves a parameter ξ (here we have suppressed the subscripts µ, ν for simplicity). For instance, ξ can be the usual gauge parameter λ in the covariant gauge, or a vector n α in the axial gauge n · A = 0, in which the gluon propagator is proportional to respectively. We will show that Eq. (23) does not change under the variation of the gauge parameters ξ, concentrating on these two classes of gauges. To vary the λ or n dependence in diagrams at arbitrary orders, we apply the differential operator d/dλ or d/dn δ to each of the gluon propagators, yielding Starting with theg terms in Eq. (28), one writes the differentiated S (0) (k, ξ) as δS (0) (k, ξ). The momentum l α or l β appearing at one end of the differentiated gluon line (34) is contracted with a vertex the gluon attaches to. We select an ordinary gluon vertex denoted by α (without the contraction with its momentum) in the diagrams, and collect vertices which correspond to the attachments of another end denoted by β. Since all gluons are differentiated, the possible attachments of l β form a complete set of diagrams. Summing all the gluon attachments, one finds that the only uncanceled piece comes from the diagram with the momentum attaching to the outermost end of either the incoming or returning quark [37], as depicted in Fig. 2. One thus obtains δS (0) (k, ξ) = δS corresponding to these two possibilities. Clearly they satisfy which are entirely analogous to Eq. (25). It is then trivial to see that Eq. (28) with S (0) being replaced by δS vanishes. Therefore, theg part is gauge independent. Similarly, one can write the differentiated three-parton amplitude S (1) as for which the QCD gauge invariance holds for the sum of the reducible and irreducible diagrams. We then have which are again completely analogous to Eq. (30). Hence, Eq. (29) with S (0),(1) being replaced by δS vanishes. This completes the proof that Eq. (23) is QCD gauge invariant.

III. TWO-LOOP CONTRIBUTION TO PHASE
In this section, we identify the lowest order two-parton Feynman diagrams that produce nonvanishing contributions to Eq. (23) in the collinear factorization. It was pointed out [31] that the Born term, given by the one-loop box diagram in Fig. 3 (left), does not contribute. We can easily confirm this result by an explicit calculation as follows. The incoming quark has the momentum p 1 = xP with 1 ≥ x ≥ x B , and we write the virtual photon momentum as q = p 2 − p 1 with whose integrand, as contracted with γ 5 / S T , yields a factor i. In order to make the cross section real, the denominator must provide an imaginary part. However, this is clearly not possible, so the one-loop box diagram does not contribute to SSA.
Next, consider the virtual correction to the photon vertex in Fig. 3 in which the final state quark is on-shell with p + 2 = 0 (x = x B ). The loop integral over l needs to generate an imaginary piece in order to get a real contribution. Expressing (p 1 − l) 2 T , we see that l + must take a value in the range (0, p + 1 ) to get a nonvanishing contribution from the contour integration over l − . After picking up the pole l − = l 2 T /[2(l + − p + 1 )] + i , we need one more i from the remaining l or p 2 − l propagator. However, this is impossible due to l 2 = 2p 1 · l = 2p + 1 l − < 0 and (p 2 − l) 2 = 2(p 1 · l − p 2 · l) = 2(p + 1 l − − p − 2 l + ) < 0. Namely, neither the gluon nor the scattered quark can become on-shell, so this diagram does not contribute.
These observations apply to other one-loop diagrams, and we conclude that the asymmetry cannot be produced in a parton-level diagram at one loop.

: a case with two virtual gluons
We then move on to two-loop diagrams, starting with the diagram with two virtual gluons in Fig. 4 (see also footnote 2). Let the incoming quark carry the momentum p 1 − l 1 after emitting the first gluon, and p 1 − l 2 after emitting the second gluon of the momentum l 2 − l 1 . The scattered quark then carries the momentum p 2 − l 2 before receiving the second gluon and p 2 − l 1 before receiving the first gluon. Focus only on the propagator denominators entering the loop integrand for this diagram, and consider the poles of l − 1 and l − 2 (again, p + 2 = 0): It is easy to see that as long as one of the components l + 1 and l + 2 is greater than p + 1 , the integration over either l − 1 or l − 2 vanishes because the integration contour is not pinched. For example, if l + 1 , l + 2 > p + 1 , all the poles are in the lower-half plane except the one from the propagator (l 1 − l 2 ) 2 . The coefficient l + 1 − l + 2 is either positive or negative, and then the integration over either l − 1 or l − 2 vanishes. The same conclusion is drawn, as one of the components l + 1 and l + 2 is negative. We thus need to examine only the ranges 0 < l + 1,2 < p + 1 . We first investigate the case with l + 1 < l + 2 , for which we pick up the pole l − 2 = l 2 2T /[2(l + 2 − p + 1 )] + i from the incoming quark propagator. As for the pole of l − 1 , we pick up either l − 1 = l 2 1T /[2(l + 1 − p + 1 )] + i from the incoming quark propagator or from the gluon propagator with the momentum l 2 − l 1 . The first pole of l − 1 does not lead to any on-shell internal particles, which all have negative invariant masses as l + 1 < l + 2 . Indeed, the invariant masses of the scattered quark are given by ( For the second pole of l − 1 in Eq. (42), we just need to check the incoming quark of the momentum p 1 − l 1 : That is, this incoming quark does not go on shell.
We then analyze the case with l + 1 > l + 2 , for which we pick up the pole l − 1 = l 2 1T /[2(l + 1 − p + 1 )] + i from the incoming quark propagator. As to the pole of l − 2 , we pick up either l − 2 = l 2 2T /[2(l + 2 − p + 1 )] + i from the incoming quark or from the second gluon propagator. The discussion is completely analogous to the l + 1 < l + 2 case: one can show that none of the remaining propagators can go on-shell, so they cannot produce a phase. We conclude that Fig. 4 does not contribute to SSA. When one gluon is real and another is virtual, there is a chance to get an on-shell parton. Consider the diagram in Fig. 5 (left), which has the same assignment of momenta as in Fig. 4 but with a different cut. Because p 2 2 = 2p 1 · q(1 − x B ) ≥ 0, the scattered quark with the invariant mass (p 2 − l 2 ) 2 = p 2 2 − 2p 2 · l 2 may go on-shell and generate a phase. Hence, this diagram deserves a careful investigation.
The on-shell condition l 2 1 = 0 leads to l − 1 = l 2 1T /(2l + 1 ). The on-shell condition (p 2 − l 1 ) 2 = p 2 2 − 2p 2 · l 1 = 0 then yields two solutions for which the incoming quark is off-shell by ( which vanishes for l + 2 > p + 1 as before. For p + 2 < l + 2 < p + 1 , we pick up the pole l − 2 = l 2 2T /[2(l + 2 − p + 1 )] + i , that renders both the scattered quark and the virtual gluon off-shell with negative invariant masses. For l + 1 < l + 2 < p + 2 , we pick up the pole which makes the incoming quark of the momentum p 1 − l 2 off-shell with a negative mass. The invariant mass of the scattered quark approaches plus infinity as l + 2 → l + 1 from above, and −l 2 2T as l + 2 → p + 2 . That is, we have an on-shell internal particle, and an imaginary piece. However, this phase will be cancelled by a phase from the diagram with two real gluons, which we turn to next.
The diagram in Fig. 5 (right) with the final state cut on the outgoing quark of the momentum p 2 − l 2 and the gluons of the momenta l 1 and l 2 − l 1 is closely related to the previously considered diagram. The on-shell conditions (l 2 − l 1 ) 2 = 0 and (p 2 − l 2 ) 2 = 0 are equivalent to Eq. (48) and the vanishing of Eq. (49), respectively. To get an imaginary piece, the outgoing quark of the momentum p 2 − l 1 should go on shell, which then leads to the condition in Eq. (46). Therefore, this diagram can give rise to a phase from the same set of on-shell propagators as in the diagram of Fig. 5 (left). It has been known that the contributions from on-shell partons cancel between virtual and real corrections. A simple explanation for this cancellation is as follows: for l + 1 < l + 2 , the contour integration over the pole of the gluon propagator with the momentum l 2 − l 1 in the diagram on the left of Fig. 5 gives the metric tensor −g µν of the same sign as the real gluon in the diagram on the right. The other pieces in the loop integrands also contain the same sign between the two diagrams. The only difference comes from the sign of the scattered quark propagators: for the diagram on the left, the quark propagator with the momentum p 2 − l 2 is proportional to For the diagram on the right, the quark propagator with the momentum p 2 − l 1 is proportional to where we have used the on-shell conditions (p 2 − l 2 ) 2 = (l 1 − l 2 ) 2 = 0. Hence, the diagram on the right generates the same imaginary piece as the diagram on the left but with an opposite sign. Summing these diagrams, the imaginary pieces cancel. The same observation applies to other diagrams, where the real gluon of the momentum l 1 attaches to the incoming quark on the right hand side of the final state cut. In summary, the sum of the diagrams with two real gluons and those with one real gluon and one virtual gluon does not contribute to SSA.
C. Fig. 6: a two-loop box diagram Next we discuss a two-loop box diagram in Fig. 6, where two final state partons form a time-like invariant mass with rescattering between them via a virtual gluon with momentum l 2 − l 1 [38]. The plus and minus components of l 1 are fixed by the final state on-shell conditions as in Eq. (46). The contour integration over l − 2 has the structure For p + 2 < l + 2 < p + 1 , we have l + 2 − l + 1 > 0, as l + 1 < p + 2 implied by Eq. (46). In this case the pole l − 2 = l 2 2T /[2(l + 2 − p + 1 )] + i renders the outgoing quark p 2 − l 2 and the two virtual gluons all off-shell with negative invariant masses. In the range l + 1 < l + 2 < p + 2 , we pick up the contributions from two poles, Eq. (48) and l − 2 = l 2 2T /(2l + 2 ) − i . The former leads to an imaginary piece from the outgoing quark propagator p 2 − l 2 shown in Eq (49). For this pole, the incoming quark is off-shell by a negative invariant mass, and the virtual gluon of the momentum l 2 is off-shell by Following the same reasoning as before, the above imaginary piece will be canceled by the same type of diagram with the final state cut on the outgoing quark of the momentum p 2 − l 2 and the gluon of the momentum l 2 − l 1 (see Fig. 7). The contribution from the latter pole of l − 2 can be combined with the same pole in the range 0 < l + 2 < l + 1 , which makes the incoming quark off-shell by a negative invariant mass, and the virtual gluon of the momentum l 2 − l 1 off-shell by For this pole, the outgoing quark of the momentum p 2 − l 2 also generates an imaginary piece, since the on-shell condition (p 2 − l 2 ) 2 can be satisfied. The two solutions are given by meaning that the imaginary piece persists for arbitrary l 2 1T , l 2 2T < p 2 2 /4. Note that this contribution is not canceled by the same type of diagram with the final state cut on the outgoing quark of the momentum p 2 − l 2 and the gluon of the momentum l 2 (see Fig. 8). This diagram is just the complex conjugate of the considered diagram, and thus gives the identical contribution. The observation is that we need two final state partons to form a time-like invariant mass, which rescatter with each other via exchange of a virtual gluon. The diagram with the virtual gluon of the momentum l 2 − l 1 attaching to the incoming quark and the real gluon, displayed in Fig. 9, does not contribute an imaginary piece: as the first emitted gluon l 2 becomes on-shell, the second emitted gluon l 2 − l 1 is off-shell and the loop integral does not produce a phase.
There exists a class of diagrams, as displayed in Fig. 10, which have exactly the same set of on-shell propagators as in Fig. 8, and are equally important. The first and eighth diagrams can be directly obtained from the box diagram by changing the photon vertices. They guarantee that the imaginary piece alone respects the QED WT identity. 4 The other diagrams, such as the third and fourth diagrams in the first row, are obtained from the box diagram by changing the attachments of the l 2 gluon. They are thus crucial for the QCD gauge invariance. The sum of all these diagrams is written as the following compact formula, as depicted in Fig. 11, with the number of colors N c , and l 1 = p 2 − P h /z being determined by the overall momentum conservation. ∆ α α is the projector onto the physical polarization states for the final state gluon l 1 , with l = (l 0 , l) andl = (l 0 , − l). As long as we sum over all the terms in Eq. (23) to ensure the gauge invariance, we may replace ∆ αα by −g αα . The other factors in Eq. (56) are defined as and The two terms in Eq. (56) correspond to the two possible insertions of the final state cut (cf. Fig. 8). Taking the hermitian conjugate of the second term, one confirms that Eq. (56) is symmetric in the indices µ, ν.
It should be noted that, in the end, the final set of diagrams are identical to those considered in [21]. We have however provided a more complete analysis of diagrams, including the discussion of gauge invariance and various kinematic configurations. In particular, we have identified the roots in Eqs. (46) and (55) which are essential for the factorization of our new contribution to be highlighted in the next section.

D. Collinear splitting diagrams
There exists another class of two-loop diagrams, which contains an imaginary part and is characterized by the collinear splitting of an on-shell parton. An example is shown in Fig. 12, where the quark with the momentum p 2 − l 2 is on-shell, and splits into two on-shell partons, a quark with the momentum p 2 − l 1 − l 2 and a gluon with the momentum l 1 . This configuration is kinematically possible only if the three partons are all collimated to each other, and thus gets phase space suppression. Indeed, a simple analysis indicates that the imaginary part arises, only if l 2T is opposite in direction relative to l 1T and l 2 2T < l 2 1T . It means that this diagram is suppressed by l 2 1T /p 2 2 ∼ P 2 hT /Q 2 , namely, a higher twist effect. We therefore neglect these diagrams.

A. Collinear factorization
Since the momentum l 1 has been set to l 1 = p 2 − P h /z in the collinear factorization, we investigate only the infrared divergence from the integration over l 2T . First consider the (++) and (−+) cases, for which the radiative l 2 gluon is collimated to the initial proton in the collinear region The incoming quark of the momentum p 1 − l 2 is nearly on-shell, and the associated l 2T integral is logarithmically divergent like The l 2 − l 1 propagators for the (±+) combinations are written as There is an apparent divergence at l 1T → l 2T in the (++) case, but it is innocuous because the numerator of Eq. (56) vanishes as l 1 = l 2 . The last term of Eq. (58) is given by for which the (−+) combination appears problematic in the limit l 1T → −l 2T . Inspecting the numerator, we find that all components of p µ 2 − l µ 1 − l µ 2 go to zero simultaneously as l 1T → −l 2T , so this limit is in fact infrared finite. To determine the nature of the collinear configuration in the (±+) combinations, look at the potentially dangerous term in Eq. (56), In the small l 2T limit, l µ 2 has only the plus component. We then immediately see that the β = − component in Eq. (67) vanishes owing to (/ p 1 − / l 2 )γ − ∼ (γ − ) 2 = 0. As for the component β = +, we find from Eq. (58) which is a consequence of the QCD WT identity. When the longitudinal momentum l 1α goes into the final state cut, this contribution also vanishes. Therefore, we only need to worry about the case, where β in Eq. (67) is transverse. For transverse β, the singularity does survive. We argue that this can be absorbed into the HP contribution to SSA known in the literature. Indeed, since the collinear gluon with the momentum l 2 is transversely polarized and travels a long distance, we may deform Fig. 11 into Fig. 13, which is identical to Fig. 2 of [31]. As demonstrated in [31], this corresponds to the HP contribution associated with the three-parton ETQS function G F (x 1 , x 2 ), where the value of x 1 is set to the Bjorken variable x B : label the longitudinal momentum of the incoming quark by p + 1 − l + 2 = xP + − l + 2 = x 1 P + and the gluon momentum by l + 2 = (x 2 − x 1 )P + . The on-shell condition l + 2 ≈ p + 2 = (x − x B )P + then yields x 1 = x B . In practice, to absorb the collinear divergence into the ETQS function, we insert the projector (γ + )(γ − ) from the Fierz identity into the quark lines with the momenta p 1 − l 2 and p 1 on the left and right hand sides of the cut, respectively. The matrix γ + then appears as the spin projector in the definition of the ETQS function, and γ − is contracted to the corresponding one-loop three-parton hard kernel. This factorization has been explicitly demonstrated for a quark target model in [21]. We therefore subtract this divergence, as well as the finite part by scheme choice, from Eq. (56) as a known mechanism. Next we turn to the (+−) and (−−) combinations. The l 1 − l 2 and the p 2 − l 1 − l 2 propagator denominators have the forms as in Eqs. (65) and (66), respectively, which are infrared finite in the limits l 1T → l 2T and l 1T → −l 2T as explained above. Besides, the radiative l 2 gluon satisfies the hierarchy for these combinations, such that there is no infrared singularity in the p 1 − l 2 propagator. Hence, the corresponding phase cannot be absorbed into nonperturbative distribution functions. It thus represents a new perturbative origin of SSA purely attributed to a hard kernel, and this is the central observation of our work. In the collinear factorization framework, one can insert the projector (γ + )(γ − ) between the upper two blobs in Fig. 1 and (γ 5 γ i )(γ i γ 5 ) between the lower two blobs. The former leads to the standard collinear twist-two FF D 1 and the latter leads to the g T distribution function. We then arrive at a factorization formula where the proton spin has been assumed to be along the y direction. The superscript denotes the order to which various factors are evaluated. This is the explicit structure we advocated in Eq. (23). There is, however, another possibility. One can insert the projector (γ 5 σ i+ )(σ i+ γ 5 ) between the lower two blobs in Fig. 1 and the identity matrix (I)(I) between the upper two blobs. The former gives the twist-two transversity distribution function h 1 , and the latter gives the collinear twist-three FF E [39]. We thus acquire an additional contribution The FF E dropped out in the one-loop calculation of SSA in SIDIS [14], where it was denoted asê 1 , and also in pp collisions [13]. It first shows up at two-loops, and is naturally suppressed by a factor α s compared to the one-loop contributions to SIDIS in [14]. We point out that an analysis of the complete set of collinear FFs is considerably more complicated at twist-three level.
Of course, Eq. (71) is also parametrically suppressed by a factor α s compared with the known one-loop contributions from the ETQS (or Sivers) distributions [30]. The reason we nevertheless consider them worthwhile to study is because the g T distribution function has the Wandzura-Wilczek part [40] related to the twist-two polarized quark distribution function ∆q(x). This can be seen from Eq. (22) together with another identity (see Eq. (45) of [35]) As suggested in [41], the genuine twist-three distributions G F and G F , which are poorly constrained from the experimental data at present, may be numerically small. On the other hand, the polarized quark distributions, being purely twist-two quantities and well constrained by data, give a finite contribution to the proton spin. Hence, the apparent suppression by α s could be numerically compensated in practice. This possibility will be explored in future works [42].
The above argument suggests that only the (±−) roots is kept in the matrix elements S (0) in Eq. (23). Remarkably, however, we can include also the (±+) roots in this formula by inserting the Fierz identity into the p 1 quark lines, instead of the p 1 − l 2 and p 1 quark lines, as we have done in the (±−) case. It will be demonstrated that these divergences due to the alternative Fierz insertion cancel between the first two terms in Eq. (23). Substituting Eq. (27) into Eq. (23), we obtain the structures in Eqs. (28) and (29). We then notice that is free of the collinear divergence for an on-shell but not necessarily collinear momentum k: in the collinear region where k and l 2 are parallel, the numerator can be expressed as This gives a vanishing contribution when k β ∝ l β 2 , because of k 2 = 0 and M αβ l β 2 = 0. The differentiation of Eq. (74) with respect to S α T ∂/∂k α then immediately leads to the cancellation of the divergences in the Wandzura-Wilczek part of g T in Eq. (28).
Including the (±+) roots into S (0) and S (1) , which collects the diagrams with an additional valence gluon attaching to an internal line of S (0) , we find that the resulting collinear divergences do not cancel in Eq. (29). We argue that they should be absorbed into the renormalization of the G F andG F distributions associated with the one-loop HP contribution to SSA. Indeed, Eq. (29) can be redrawn as in Fig. 14 by inserting the Fierz identity at a different location. To achieve it, the projectors for the S (0) terms have been made the same as for the S (1) terms in the first and second lines of Eq. (29) via the replacements γ 5 / S T /x 1 = / P γ 5 / S T /(x 1 / P ) = i / P γ α α−+S T /(x 1 / P ) and / S T /x 1 = − / P S α T γ α /(x 1 / P ), respectively. In the above expressions γ α corresponds to the vertex located at the outermost end of the incoming quark in Fig. 14, and 1/(x 1 / P ) represents the quark propagator following this vertex. The lower parts of the diagrams on the right are then identified as the one-loop diagrams to renormalize the G F andG F distributions (see Fig. 7 of Ref. [43]). In principle, one is able to rederive the evolution equations of G F andG F this way. We leave it to a future work.

B. kT factorization
Next we come to the more complicated k T factorization, in which both the initial and final state partons can carry transverse momenta. As elaborated below, the transverse momenta l 1T , l 2T of the real gluons in the considered two-loop diagrams serve as these additional parton kinematic variables [44], independent of the momentum fractions x and z. For example, l 1T needs not to be equal to P hT /z associated with the produced hadron as in the collinear factorization. A parton is then off-shell by −l 2 T in the k T factorization, which is regarded as an infrared scale. That is, an infrared divergence in the k T factorization is represented by an infrared logarithm ln l 2 T . A factorization formula is expressed as a convolution of a hard kernel with TMD PDFs and TMD FFs in both longitudinal and transverse momenta. The analysis of the phase origin is the same as in Sec. III with the solutions of l + 1 and l + 2 being easily adapted from their collinear counterparts, given by Eqs. (46) and (55), respectively. Below we will discuss the k T factorization for the four combinations of (l + 1 , l + 2 ) separately. First consider the (++) case, for which the radiative gluons of the momenta l 1 and l 2 are both collimated to the initial proton under the hierarchy similar to Eq. (63). The two final state partons with the momenta p 2 − l 1 and l 1 move in the minus and plus directions, respectively. The incoming quark of the momentum p 1 − l 2 is nearly on-shell, and the associated l 2T integral produces an infrared logarithm from the collinear region l 2T ∼ l 1T as shown in Eq. (64). Besides, the l 2 − l 1 gluon with the invariant mass being of order l 2 1T as l 2T ∼ l 1T , is soft according to Eq. (65). On the other hand, the outgoing quark of the the momentum p 2 − l 2 moves mainly in the minus direction, namely, in the direction of the produced hadron. Since the attaching gluon momentum l 2 − l 1 is soft, the quark line with the momentum p 2 − l 2 can be eikonalized: if this gluon is longitudinally polarized. The resultant Wilson line contains the propagator 1/(l + 1 − l + 2 + i ), which generates a phase as l + 1 = l + 2 . The collinear logarithm together with this phase are then absorbed into the Sivers function by inserting the projector (γ + )(γ − ) from the Fierz identity in Eq. (69): the matrix γ + appears as the spin projector in the definition of the Sivers function, and γ − is contracted to the corresponding leading-order two-parton hard kernel. Under this factorization, the quark carries the momentum p 1 − l 1 before hard scattering, implying that the Sivers function depends on the longitudinal momentum p + 1 − l + 1 ≡ xP + and the transverse momentum l 1T . If the l 2 − l 1 gluon is transversely polarized, the collinear logarithm can be absorbed into the one-loop renormalization of the twist-three three-parton TMD PDF (the TMD version of the ETQS function). To achieve this factorization, we simply insert the projector (γ + )(γ − ) from the Fierz identity: γ + appears as the spin projector in the definition of the three-parton TMD PDF, and γ − is contracted to the corresponding leading-order three-parton hard kernel. After the factorization, the quark and the gluon on the left of the final state cut carry the momenta p 1 − l 2 and l 2 − l 1 before hard scattering, respectively, and the quark on the right of the final state cut carries p 1 − l 1 . It indicates that the three-parton TMD PDF depends on the longitudinal momenta p + 1 − l + 2 ≡ x 1 P + and p + 1 − l + 1 ≡ x 2 P + and on the transverse momenta l 1T and l 2T . The phase comes from the on-shell p 2 − l 2 propagator in the hard kernel, which corresponds to the SGP contribution observed in the collinear factorization as l 2T = l 1T , and to the HP contribution as l 2T = l 1T . We thus conclude that the (++) component does not lead to a new contribution to SSA.
Next we turn to the (−−) combination, for which both the radiative gluons of the momenta l 1 and l 2 follow the hierarchy similar to Eq. (70). Due to p − 2 p + 2 , the two final state partons as well as the momentum p 2 − l 2 are mainly in the minus direction. The incoming quark of the momentum p 1 − l 2 is highly off-shell by O(Q 2 ), so the collinear-to-proton divergence in Eq. (64) is absent. The l 2 − l 1 propagator develops a soft logarithm as l 2T ∼ l 1T , the same as in the (++) combination according to Eq. (65). Since l + 1,2 are soft, the two internal quark lines with the momenta p 1 − l 2 and p 1 − l 1 can be eikonalized. The resultant phase is absorbed by the twist-two FF, or the Collins function in the k T factorization framework. Note that, because the eikonalized p 1 − l 1,2 quark lines always remain off-shell, the Wilson lines involved in the definition of the Collins function do not produce a phase. This result differs from that for the Sivers function mentioned above. See also [45].
The factorization of the infrared logarithm into the Collins FF can be done by inserting the Fierz identity in Eq. (69) between the two-loop FF and the leading-order two-parton hard kernel (i.e., between the upper two blobs in Fig. 1). One picks up the (γ 5 σ i+ )(σ i+ γ 5 ) term, in which σ i+ γ 5 goes into the definition of the Collins function, and γ 5 σ i+ goes into the hard kernel. It implies that the same spin projector also enters the leading-order PDF of the polarized proton, defining the transversity distribution h 1 . The other Dirac structures lead to either vanishing or subleading (twist-three TMD) contributions. The final state quark carries the momentum p 2 − l 1 , so the Collins function depends on the longitudinal momentum p − 2 − l − 1 ≡ zP − h and the transverse momentum l 1T . In conclusion, the (−−) contribution also reduces to the known mechanism of SSA.
We then turn to the (−+) combination. It has been pointed out that the l 2 − l 1 propagator does not generate an infrared logarithm in this case (see Eq. (65)). The quark line p 1 − l 2 develops a collinear logarithm when the vertex β of the l 2 gluon is transverse, as explained in the previous subsection. The k T factorization of this infrared logarithm is similar to the collinear factorization: it is absorbed into the three-parton TMD PDF with the same spin projector. Under this factorization, the quark and the gluon on the left of the final state cut carry the momenta p 1 − l 2 and l 2 before hard scattering, respectively, and the quark on the right of the final state cut carries p 1 . It indicates that the three-parton TMD PDF depends on the longitudinal momenta p + 1 − l + 2 ≡ x 1 P + and p + 1 ≡ x 2 P + and on the transverse momentum l 2T . The phase comes from the on-shell p 2 − l 2 propagator in the one-loop three-parton hard kernel, which corresponds to the HP contribution observed in the collinear factorization. There is no SGP contribution, because of l 2 = l 1 for the (−+) combination.
At last, we investigate the (+−) combination, in which the phase cannot be absorbed into nonperturbative distribution functions. For this combination, there is no infrared singularity in the l 2 − l 1 and p 1 − l 2 propagators. The apparent singularity at l 1T = −l 2T from the last term of Eq. (58) does not exist either. Hence, we arrive at a factorization formula similar to Eq. (71), but with g T and D 1 being interpreted as the TMD PDF and the TMD FF, respectively.
Before closing this section, we briefly comment on the general structure of SSA at the two-parton twist-three level in the k T factorization framework. If we allow for k T -dependent distributions, there are more contributions than the TMD versions of Eqs. (71) and (72). For example, one can insert (γ i )(γ i ) between the upper two blobs and (γ 5 γ + )(γ − γ 5 ) between the lower two blobs. The former yields the twist-three TMD FF D ⊥ , while the latter yields the twist-two TMD PDF g 1T . (All the notations for the TMD PDFs and the TMD FFs follow [39].) Exhausting all possible combinations of the spin projectors for higher-order hard kernels, we derive the contributions to SSA up to the two-parton twist-three and two-loop level γ5γ − ,γ5γ y ⊗ G ⊥ + g 1T ⊗ H γ y ,γ + ⊗ D 1 + g T ⊗ H (2) where the functions labelled by * diminish for a massless produced hadron. The FF G ⊥ comes from the projector γ i γ 5 , and H ⊥ 1 from σ i+ γ 5 . The TMD transversity function h 1 denotes h 1 − (k 2 x − k 2 y )h ⊥ 1T /(2M 2 ) actually. For the h 1 piece, the hard kernel H (1) γ5σ y− ,γ5σ x+ may appear at one loop. It has been omitted in Eq. (77), because it is subleading compared to the term H (0) γ5σ y− ,γ5σ y+ . The nonperturbative spin-momentum correlation in the Sivers function and the Collins function are basically determined by fits to data. Including the numerous terms in Eq. (77), it is expected to make an impact on the determination of the Sivers function and the Collins function.
When we work in the collinear factorization, all the above terms vanish except for the ones which reduce to Eqs. (71) and (72). This emphasizes the importance of the parton transverse momentum for the existence of SSA. Among the many terms in Eq. (77), the one proportional to the distribution f T is particularly interesting. Since f T is T-odd, the corresponding contribution flips signs between SIDIS and Drell-Yan. Its definition involves the proton spin ψ γ α ψ ∼ αβ T S T β f T (x, k 2 T ), that combines with a factor of k x from the one-loop hard kernel H (1) to generate a SSA proportional to P x h S y T . If we stick to the leading order hard kernel, the k x dependence will disappear, and f T will contribute only to the SIDIS structure function associated with sin φ S (denoted by F sin φ S U T in [39]), where φ S is the azimuthal angle of the proton spin relative to the lepton plane. Because the first moment vanishes d 2 k T f T (x, k 2 T ) = 0, its k T dependence exhibits some nodes in k T . This may result in a node in SSA as a function of P hT , similarly to what was observed in [46,47].

V. CONCLUSION
In this paper, we have presented a detailed study of the two-loop diagrams that produce an imaginary phase in SIDIS and discussed their gauge invariance and collinear factorization properties. In addition to the known mechanisms for SSA, we have also identified an entirely new contribution proportional to the g T distribution function. While it is parametrically suppressed by a factor α s , g T has the Wandzura-Wilczek part related to the polarized quark distribution functions. Since this part is usually considered to be larger than the genuine twist-three one, our new contribution could be comparable in magnitude to those from the ETQS function. In a future publication [42], we plan to give a numerical estimate of the obtained results in this paper, and make comparisons with the existing data as well as predictions for the Electron-Ion Collider.
We note that there have been a lot of discussions on potentially dominant sources of SSA recently. There is an indication that the Sivers or ETQS contribution may be numerically small [48]. Instead, a successful fit of the RHIC data [41,49] suggests that the twist-three FFs may be the dominant source of SSA. In order to confirm this, the same FFs should be able to fit other observables [50][51][52][53]. In other words, a global analysis of many different data is necessary for understanding the above observations. The subleading contributions derived in the k T factorization with a more complete set of origins for SSA may provide such a theoretical framework. Because the momentum transferred involved in the relevant processes are not large enough, higher-order hard cross sections may give sizable corrections. Therefore, the rich subleading structures proposed in this work are phenomenologically important.