What does kinematical target mass sensitivity in DIS reveal about hadron structure?

We study the role of purely external kinematical approximations in inclusive deep-inelastic lepton-hadron scattering within QCD factorization, and consider factorization with an exact treatment of the target hadron mass. We discuss how an observed phenomenological improvement obtained by accounting for target mass kinematics could be interpreted in terms of general properties of target structure, and argue that such an improvement implies a hierarchy of nonperturbative scales within the hadron.


I. INTRODUCTION
Understanding the nature of confined systems of strongly interacting quarks and gluons (or partons), such as hadrons and nuclei, remains one of the most challenging problems in nuclear and particle physics. An essential tool in this quest has been the factorization of the short- and long-distance parts of scattering amplitudes, which has allowed the systematic study of hard scattering processes in terms of universal sets of parton distribution functions (PDFs) [1]. While this has proved an enormously successful paradigm when applied to reactions at high energies, where typical momentum transfers $Q$ are much greater than any hadronic mass scales, $Q \gg O(1\ \mathrm{GeV})$, delineating the extent to which factorization techniques may be applicable at lower energies has been a rather more formidable task.
The transition region at intermediate momentum transfers, $Q \sim 1$–$2$ GeV, where descriptions of phenomena in terms of parton degrees of freedom give way to nonperturbative dynamics, is still poorly understood. Here small-coupling quantum chromodynamics (QCD) techniques are often applicable, while at the same time hadronic mass effects are not always negligible. Phenomena specific to this regime, such as quark-hadron duality [2–6] and precocious scaling [7,8], have attracted much interest, and it has been the focus of dedicated experimental efforts at Jefferson Lab [9–11].
Extending standard perturbative QCD (and even general partonic pictures) to the low-Q region also presents theoretical challenges, particularly since certain mass effects that are normally treated as negligible in QCD processes with large Q may become important there.
Consider the basic statement of factorization for the inclusive lepton-nucleon (or any other hadron or nucleus) DIS cross section (see, e.g., Ref. [1]),

$$ \frac{d\sigma}{dx_{\rm Bj}\, dQ^2} = \int d\xi\, \frac{d\hat\sigma}{d\hat x_{\rm Bj}\, dQ^2}\, f(\xi; Q) + \text{p.s.}, \tag{1} $$

where $x_{\rm Bj} = Q^2/2P\cdot q$ is the Bjorken scaling variable, with $P$ and $q$ the target nucleon and exchanged virtual photon momenta, respectively, and for simplicity we omit explicit flavor dependence. The partonic differential cross section $d\hat\sigma$ is expressed in terms of the corresponding scaling variable $\hat x_{\rm Bj} = Q^2/2\hat k\cdot q$ for the target parton with momentum $\hat k$ in the subprocess (see Fig. 1 below). The function $f(\xi; Q)$ is to be interpreted as a probability distribution of partons with fraction $\xi = \hat k^+/P^+$ of the nucleon's light-cone momentum, with extra scale dependence induced by QCD evolution [footnote 1]. The first term in Eq. (1) is the end result of a sequence of canonical approximations which increase in accuracy as $Q$ increases with fixed $x_{\rm Bj}$ [12], while the second term represents power suppressed ("p.s.") errors that are proportional to powers of $1/Q^2$ relative to the first term. Factorization then describes the limit of large $Q$, with $x_{\rm Bj}$ fixed, where these error terms can safely be ignored.

[Footnote 1] We define a four-vector $v^\mu$ in terms of light-cone variables as $v^\mu = (v^+, v^-, \mathbf{v}_T)$.
Knowing the exact value of the correction term in Eq. (1) requires a much deeper understanding of complex QCD dynamics than what is treated by the usual factorization. However, there are certain standard approximations (see, e.g., Ref. [13, p. 95]) contributing to the error in Eq. (1) that deal only with the external kinematics of $P$ and $q$ and have nothing specifically to do with the dynamics of the deeply inelastic collision. These are what we will mean by "purely kinematical" approximations. The most common of these is a target mass approximation in inclusive DIS: if the target is moving in light-cone variables with large "+" momentum and zero transverse momentum, then $P^\mu = (P^+, M^2/2P^+, \mathbf{0}_T) \approx (P^+, 0, \mathbf{0}_T)$.
As will be discussed in detail below, the resulting errors are proportional to powers of $x_{\rm Bj}^2 M^2/Q^2$, where $M$ is the target nucleon mass. By contrast, the derivation of factorization uses approximations on internal partonic constituents, whose exact properties depend on complex details of QCD dynamics. The resulting error terms are suppressed by powers of $m^2/Q^2$, where $m$ here represents any of the scales associated with intrinsic dynamical properties of bound state partons, such as their virtualities. Since the factorization theorem is meant to describe the limiting behavior as $1/Q^2 \to 0$, the $x_{\rm Bj}^2 M^2/Q^2$ errors from the kinematical expansion are typically lumped with the dynamical $m^2/Q^2$ errors. We will, however, refrain from identifying the $O(x_{\rm Bj}^2 M^2/Q^2)$ terms as a contribution to the $O(m^2/Q^2)$ corrections in all our discussions so as to emphasize the different origins of these two types of errors.
Of course, all mass scales are ultimately fixed by the QCD scale parameter $\Lambda_{\rm QCD}^2$, so the internal scales we associate with $m^2$ should be understood to be proportional to $M^2$: $m^2 = \eta M^2$, with $\eta$ being a dimensionless proportionality factor. Another way to state the above is that we will consider expansions in powers of $\eta M^2/Q^2$ separately from powers of $M^2/Q^2$. This is explained in more detail in Secs. III and IV.
At moderate $Q$, a natural question is whether all of the various types of contributions to the error term in Eq. (1) are really so negligible and, if not, whether some improvement is possible. For instance, when $Q \sim 1$ GeV and $x_{\rm Bj}$ is not especially small ($x_{\rm Bj} \sim 1$), the $x_{\rm Bj}^2 M^2/Q^2$ purely kinematical errors may no longer be negligible. Since they arise only from kinematical approximations, it is reasonable to ask if these purely kinematical errors can be removed with minimal or no modification to the basic correctness of the factorization derivation for the first term in Eq. (1). In fact, as we will discuss in Sec. IV, the standard derivations do not actually require a massless target approximation. Setting the target mass to zero is an ancillary step, while keeping it nonzero leads naturally to Nachtmann scaling [14]. This was actually recognized some time ago by Aivazis, Olness and Tung (AOT) [15] in the context of heavy quark contributions in DIS.
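As a rough numerical illustration of when the purely kinematical errors become sizable, the following minimal sketch evaluates the parameter $x_{\rm Bj}^2 M^2/Q^2$ at a few kinematic points. The nucleon mass value and the sample points are illustrative assumptions and are not taken from any analysis in this paper.

```python
# Rough size of the purely kinematical expansion parameter x_Bj^2 M^2 / Q^2.
# The mass value and the sample (x_Bj, Q^2) points are illustrative assumptions.

M_NUCLEON = 0.938  # GeV, approximate nucleon mass


def kinematical_parameter(x_bj, q2, mass=M_NUCLEON):
    """Return x_Bj^2 M^2 / Q^2 for Q^2 in GeV^2 and mass in GeV."""
    return (x_bj * mass) ** 2 / q2


if __name__ == "__main__":
    for x_bj, q2 in [(0.01, 1.0), (0.3, 10.0), (0.6, 2.0), (0.9, 1.0)]:
        eps = kinematical_parameter(x_bj, q2)
        print(f"x_Bj = {x_bj:4.2f}, Q^2 = {q2:5.1f} GeV^2 -> {eps:.4f}")
```

The parameter is tiny at small $x_{\rm Bj}$ even for $Q \sim 1$ GeV, and only becomes of order one when $x_{\rm Bj}$ is large and $Q$ is low at the same time.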
Questions of interpretation remain, however. It must be established, for example, whether it is reasonable to expect that correction for kinematical mass errors will result in phenomenological improvements in applications of QCD factorization. That it should is not obvious since there is no reason a priori to assume one type of power correction is more important than another. The mass scales divided by Q 2 that contribute errors to factorization originate from nonperturbative features of the target hadron, so the effectiveness of target mass improvements must be tied to specific features of individual targets. Questions concerning the relevance of target mass kinematics therefore cannot generally be disentangled from questions about hadron structure.
In this paper we will argue that it is most natural to expect an improvement from the approach of AOT [15] if the structure of the target involves a hierarchy of nonperturbative scales. Keeping certain powers of 1/Q 2 while neglecting others makes sense only when there is a reasonably large variation in mass-squared factors in the numerators. Questions about the phenomenological usefulness of kinematical target mass corrections can then be reframed as questions about target structure. This is how we advocate addressing the issue of target mass kinematics more generally, as explained in more detail in Sec. V. Before this, in Sec. II we introduce the basic kinematics of the DIS process at finite energy, keeping all masses in the structure functions and the kinematic variables on which they depend. In Sec. III we introduce the massless target approximation, carefully defining projection operators and

II. DEEP-INELASTIC SCATTERING KINEMATICS
The reaction we will consider in the present work is inclusive lepton scattering from a target hadron, such as a nucleon,

$$ l(l) + N(P) \to l'(l') + X(P_X), $$

where $l^\mu$ and $l'^\mu$ are the incident and scattered lepton four-momenta, and $P^\mu$ and $P_X^\mu$ are the four-momenta of the target nucleon and hadronic final state $X$, respectively. The reaction will be assumed to proceed through the exchange of a virtual photon with four-momentum $q^\mu = l^\mu - l'^\mu$. To make the calculation more transparent, we work in a frame where the nucleon moves in the $+z$ direction, the exchanged virtual photon moves in the $-z$ direction, and both have zero transverse momentum. In this case the nucleon and photon four-momenta are conveniently parametrized in terms of light-front coordinates as

$$ P^\mu = \left( P^+, \frac{M^2}{2P^+}, \mathbf{0}_T \right), \qquad q^\mu = \left( -x_N P^+, \frac{Q^2}{2 x_N P^+}, \mathbf{0}_T \right), $$

where $Q \equiv \sqrt{-q^2}$, and $x_N$ is the Nachtmann scaling variable [14,16],

$$ x_N = \frac{2 x_{\rm Bj}}{1 + \sqrt{1 + 4 x_{\rm Bj}^2 M^2/Q^2}}, \tag{3} $$

so that the Bjorken variable can also be written

$$ x_{\rm Bj} = \frac{x_N}{1 - x_N^2 M^2/Q^2}. \tag{4} $$

In the Breit frame, where the photon has zero energy, the target has $P^+ = Q/(\sqrt{2}\, x_N)$ and the four-momenta simplify to

$$ P^\mu = \left( \frac{Q}{\sqrt{2}\, x_N}, \frac{x_N M^2}{\sqrt{2}\, Q}, \mathbf{0}_T \right), \qquad q^\mu = \left( -\frac{Q}{\sqrt{2}}, \frac{Q}{\sqrt{2}}, \mathbf{0}_T \right). $$

The total inclusive cross section is expressed as a contraction of the leptonic tensor and the totally inclusive hadronic tensor $W^{\mu\nu}$ [Eq. (8)], where E is the energy of the scattering lepton, $s = (l + P)^2$ is the invariant mass squared of the system, and $\alpha_{\rm em} = e^2/4\pi$ is the electromagnetic fine structure constant. In the hadronic tensor, the $\sum_X$ symbol represents a sum over all possible final states $|X\rangle$, including integrals. For spin-averaged, parity-conserving scattering the hadronic tensor can then be expanded into dimensionless structure functions $F_1$ and $F_2$ [Eq. (9)]. The structure functions take all Lorentz invariants formed by $P$ and $q$ as arguments. These include $P \cdot q$ and $Q^2$, while independent mass dependence is left implicit. Instead of $P \cdot q$ we choose $x_{\rm Bj}$ as the independent variable, although it turns out that $x_N$, in fact, is a more natural choice in the context of factorization.
We will continue to use the Bjorken variable x Bj , however, since that is the more traditional choice, but will write it in the form x Bj (x N , M 2 /Q 2 ), as a function of x N and M 2 /Q 2 explicitly. While this may appear cumbersome initially, it will help make later approximation steps unambiguous.
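The relation between $x_{\rm Bj}$ and $x_N$ can be checked numerically. The sketch below uses the standard Nachtmann variable, $x_N = 2x_{\rm Bj}/(1+\sqrt{1+4x_{\rm Bj}^2M^2/Q^2})$, and assumes the light-cone convention $v^2 = 2v^+v^- - v_T^2$ together with a photon momentum $q = (-x_N P^+, Q^2/(2x_N P^+), \mathbf{0}_T)$. The function names and numerical inputs are hypothetical choices made for illustration only.

```python
import math


def nachtmann_x(x_bj, m2_over_q2):
    """Nachtmann variable x_N from x_Bj and the ratio M^2/Q^2."""
    return 2.0 * x_bj / (1.0 + math.sqrt(1.0 + 4.0 * x_bj**2 * m2_over_q2))


def bjorken_x(x_n, m2_over_q2):
    """Inverse relation: x_Bj written as a function of x_N and M^2/Q^2."""
    return x_n / (1.0 - x_n**2 * m2_over_q2)


def lc_dot(a, b):
    """Dot product of (plus, minus) light-cone vectors with zero transverse part."""
    return a[0] * b[1] + a[1] * b[0]


def external_momenta(x_n, q2, m2, p_plus):
    """Target and photon momenta as (plus, minus) pairs in the collinear frame."""
    target = (p_plus, m2 / (2.0 * p_plus))
    photon = (-x_n * p_plus, q2 / (2.0 * x_n * p_plus))
    return target, photon


if __name__ == "__main__":
    q2, m2, x_bj = 2.0, 0.938**2, 0.6  # illustrative values in GeV^2
    x_n = nachtmann_x(x_bj, m2 / q2)
    # Breit-frame choice of P^+, for which the photon energy vanishes.
    breit_p_plus = math.sqrt(q2) / (math.sqrt(2.0) * x_n)
    P, q = external_momenta(x_n, q2, m2, breit_p_plus)
    print("x_N =", x_n)
    print("P^2 =", lc_dot(P, P), " q^2 =", lc_dot(q, q))
    print("Q^2/(2 P.q) =", q2 / (2.0 * lc_dot(P, q)))
    print("photon energy =", (q[0] + q[1]) / math.sqrt(2.0))
```

With this parametrization $P^2 = M^2$, $q^2 = -Q^2$ and $Q^2/(2P\cdot q) = x_{\rm Bj}$ hold for any $P^+$, which is why writing $x_{\rm Bj}$ as $x_{\rm Bj}(x_N, M^2/Q^2)$ loses no information.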
The structure functions $F_i$ ($i = 1, 2$) can be calculated from the hadronic tensor $W^{\mu\nu}$ using projection tensors [Eqs. (11)]. Up to this point all of the expressions for the cross sections and structure functions are for exact kinematics. In the next section we consider the limit in which the mass of the target is taken to be much smaller than the scale $Q$, $M/Q \ll 1$.

III. MASSLESS TARGET APPROXIMATION (MTA)
Purely kinematical approximations are those which can be defined in the context of Sec. II; that is, by considering only overall external momentum and with no reference to hadrons' constituents or other dynamical properties. A kinematical approximation replaces $P$ and $q$, and the arguments of the structure functions $F_i(x_{\rm Bj}(x_N, M^2/Q^2), Q^2)$, by different, approximated quantities, without changing anything about the functions in Eq. (9) themselves.
Let us define the natural approximate target hadron four-momentum $\tilde P$ in a frame where it is moving at relativistic speeds by setting the target mass to zero,

$$ \tilde P^\mu = (P^+, 0, \mathbf{0}_T). $$

The massless target approximation (MTA) is the kinematical approximation defined by the replacement $P \cdot q \to \tilde P \cdot q$, wherever this occurs in Eq. (8). To set up the approximation, it is convenient to first switch the structure function decomposition to a basis that uses $\tilde P$ instead of $P$, with structure functions $\tilde F_i$ and corresponding projection tensors [Eqs. (15)]. This is a more convenient basis if we ultimately want to neglect the minus component of $P$.
Note that it is $x_N$ that appears in the factors on the right side of Eqs. (15), and not $x_{\rm Bj}$. To relate structure functions in the two bases, we use Eq. (13).
Applying the projectors (15) gives Eqs. (17). We stress that no approximation has been made in the discussion up to this point. The coefficients in front of the structure functions in Eqs. (17) are, in fact, the same as those in the literature that are referred to as "ξ-scaling" [15, 17–20]. The first step in the MTA is the replacement in Eq. (18), which introduces the approximate hadronic tensor $\tilde W^{\mu\nu}$. In this approximation, Eq. (4) gives $x_{\rm Bj}(x_N, 0) = x_N$, so that $x_{\rm Bj}$ and $x_N$ are interchangeable in the MTA. The above discussion suggests a definition for the target mass approximated structure functions [Eq. (20)], where the script notation is a shorthand that means $x_{\rm Bj}(x_N, M^2/Q^2)$ is understood to be everywhere replaced by $x_{\rm Bj}(x_N, 0)$, so that kinematical dependence on the ratio $M^2/Q^2$ is neglected. Part of the MTA is to approximate structure functions defined in the "tilde" basis [Eq. (21)], where the approximation is to drop all the $x_{\rm Bj}^2 M^2/Q^2$ errors. In other words, assuming an exact hadronic tensor in Eq. (8), the MTA [Eqs. (14)-(18)] is equivalent to a set of natural argument replacements that are reasonable when $Q$ is very large or $x_{\rm Bj}$ is very small. This approximation is usually made implicitly in discussions of high energy scattering in the literature [13]; here we have made it very explicit so that it will be straightforward to reverse it. Each step in Eq. (21) can be traced back to the unapproximated hadronic tensor and structure functions. Operationally, it is implemented by the replacement in Eq. (18).
This completes our general discussion of the exact and target mass approximated structure functions, based on considerations of external kinematics alone. In the remainder of the paper we will specialize the discussion to the role of the target mass in collinear factorization.
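The size of the argument replacement made by the MTA can be illustrated with a minimal sketch. It assumes the standard Nachtmann variable and simply measures the relative shift $(x_{\rm Bj} - x_N)/x_{\rm Bj}$ incurred when the two variables are treated as interchangeable; the function names and sample kinematics are hypothetical.

```python
import math


def nachtmann_x(x_bj, m2_over_q2):
    """Nachtmann variable x_N from x_Bj and the ratio M^2/Q^2."""
    return 2.0 * x_bj / (1.0 + math.sqrt(1.0 + 4.0 * x_bj**2 * m2_over_q2))


def mta_relative_shift(x_bj, m2_over_q2):
    """Relative change (x_Bj - x_N) / x_Bj made by treating x_N as x_Bj."""
    return (x_bj - nachtmann_x(x_bj, m2_over_q2)) / x_bj


if __name__ == "__main__":
    m2 = 0.938**2  # GeV^2, illustrative nucleon mass squared
    for x_bj, q2 in [(0.1, 10.0), (0.3, 10.0), (0.6, 2.0), (0.9, 1.0)]:
        eps = x_bj**2 * m2 / q2
        shift = mta_relative_shift(x_bj, m2 / q2)
        print(f"x_Bj={x_bj:.1f} Q^2={q2:4.1f}: eps={eps:.4f} shift={shift:.4f}")
```

For small $x_{\rm Bj}^2M^2/Q^2$ the shift tracks that parameter closely, consistent with the statement that the MTA drops errors that are powers of $x_{\rm Bj}^2 M^2/Q^2$; with the mass ratio set to zero the shift vanishes identically.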

IV. THE MTA AND COLLINEAR FACTORIZATION
In this section we discuss how the MTA of the last section, combined with the standard factorization steps [12], leads to the well-known collinear factorization theorem of Eq. (1).
Again, we will present the steps in greater detail than is common in the literature, which will help later to unravel the source of purely kinematical mass sensitivity.
Before any factorization approximations are made, the exact parton momentum $k$ can in general have both a virtuality and transverse momentum,

$$ k^\mu = \left( k^+, \frac{k^2 + k_T^2}{2k^+}, \mathbf{k}_T \right). \tag{22} $$

The steps to obtain factorization approximate certain internal lines by exactly light-like ones. In particular, all lines entering and exiting the hard partonic scattering subprocess in Fig. 1 are taken to be massless and on-shell, so that in Eq. (22) both $|k^2|$ and $k_T^2$ can be taken to be $\sim O(m^2) \ll Q^2$ and hence dropped. The approximated parton momentum, $\hat k$, is then parallel to the hadron momentum,

$$ \hat k^\mu = (\xi P^+, 0, \mathbf{0}_T), $$

where $\xi = \hat k^+/P^+$ is the fraction of the target momentum carried by the struck parton.
These steps for approximating the partonic momenta are justified in the standard derivations of collinear factorization, as discussed for instance in Ref. [12]. The factorization approximations make no reference to the target mass, so none of the approximations of the previous section are necessary to move forward with a factorization derivation. The structure tensor for the target parton in the factorized subprocess has a form similar to that of Eq. (9), but with $P^\mu$ replaced by $\hat k^\mu$, where $\hat F_i$ are the corresponding structure functions for the parton. In analogy with the scaling variables for the hadron, here $\hat x_N$ is the partonic version of the Nachtmann variable $x_N$, as the natural generalization of Eq. (3), and $\hat x_{\rm Bj}$ is the obvious generalization of Eq. (4),

$$ \hat x_N = \frac{2 \hat x_{\rm Bj}}{1 + \sqrt{1 + 4 \hat x_{\rm Bj}^2 \hat k^2/Q^2}}, \qquad \hat x_{\rm Bj} = \frac{Q^2}{2 \hat k \cdot q} = \frac{\hat x_N}{1 - \hat x_N^2 \hat k^2/Q^2}. $$

Since for massless partons $\hat k^2 = 0$, the MTA is automatic for the partonic structure tensor, and $\hat x_N = \hat x_{\rm Bj}$. Using the notation of Eq. (20), but now for the partonic target, the partonic structure tensor can be written in terms of $\hat{\mathcal F}_i$, the partonic versions of the massless structure functions of Eq. (20). The factorization theorem, Eq. (1), now in terms of hadronic and partonic structure tensors, can be represented as in Eq. (28). For brevity here we have suppressed the dependence on the renormalization group scale $Q$ in the PDF $f(\xi)$, but have included the explicit $\xi$ argument of $\hat k(\xi)$ to emphasize that the plus component of the target parton is related to the hadron through the momentum fraction $\xi$.
Applying the projectors in Eqs. (11) allows factorization to be written in terms of structure functions, still without the MTA, where from Eq. (26) one has $\hat x_{\rm Bj}(\xi) = x_N/\xi$. For the lower limit of the $\xi$ integration, the minimum $\xi$ occurs when $(\hat k + q)^2 = 0$, which gives $\xi_{\rm min} = x_N$. Thus, without kinematical target mass approximations, the factorized expressions for the structure functions are those of Eqs. (30a) and (30b). The errors here arise entirely from assumptions about the smallness of intrinsic parton scales; there are no $x_{\rm Bj}^2 M^2/Q^2$ types of errors since no MTA has been made. The second lines of Eqs. (30a) and (30b) define the "AOT structure functions", $F_i^{\rm AOT}$, as the factorized structure functions with exact external kinematics [15], and this prescription for taking target masses into account will be referred to as the AOT method. (Note that the notation in Eqs. (30) differs from that in Ref. [15], whose focus was more on the treatment of heavy quark effects rather than on kinematical errors.) If, in addition, $x_N$ is expanded in powers of $x_{\rm Bj}^2 M^2/Q^2$, then Eqs. (30) become Eqs. (31). The expressions in Eqs. (30) are the most immediate results of a factorization derivation of the style of Ref. [12]. The first term on the right of Eqs. (31) will be referred to as the "factorized massless target approximation" (FMTA), since it just combines standard factorization with the MTA. If we wish to keep kinematical target mass effects, we will simply maintain Eqs. (30).
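The two kinematic statements used above, $\hat x_{\rm Bj}(\xi) = x_N/\xi$ and $\xi_{\rm min} = x_N$, follow directly from light-cone arithmetic and can be verified with a short sketch. It assumes a massless collinear parton $\hat k = (\xi P^+, 0, \mathbf{0}_T)$, a photon $q = (-x_N P^+, Q^2/(2x_NP^+), \mathbf{0}_T)$ and the convention $v^2 = 2v^+v^-$ for vectors without transverse components; the function names are hypothetical.

```python
def lc_dot(a, b):
    """Dot product of (plus, minus) light-cone vectors with zero transverse part."""
    return a[0] * b[1] + a[1] * b[0]


def parton_and_photon(xi, x_n, q2, p_plus):
    """Massless collinear parton and virtual photon as (plus, minus) pairs."""
    k_hat = (xi * p_plus, 0.0)
    photon = (-x_n * p_plus, q2 / (2.0 * x_n * p_plus))
    return k_hat, photon


def partonic_x_bj(xi, x_n, q2, p_plus):
    """Partonic Bjorken variable Q^2 / (2 k_hat . q)."""
    k_hat, photon = parton_and_photon(xi, x_n, q2, p_plus)
    return q2 / (2.0 * lc_dot(k_hat, photon))


def final_state_mass2(xi, x_n, q2, p_plus):
    """Invariant mass squared (k_hat + q)^2 of the partonic final state."""
    k_hat, photon = parton_and_photon(xi, x_n, q2, p_plus)
    total = (k_hat[0] + photon[0], k_hat[1] + photon[1])
    return lc_dot(total, total)


if __name__ == "__main__":
    x_n, q2, p_plus = 0.5, 2.0, 1.3  # illustrative values
    for xi in (0.5, 0.65, 0.8, 1.0):
        print(xi, partonic_x_bj(xi, x_n, q2, p_plus),
              final_state_mass2(xi, x_n, q2, p_plus))
```

The result is independent of $P^+$, and the final-state invariant mass squared, $(\xi - x_N)Q^2/x_N$, is non-negative only for $\xi \ge x_N$, which is the origin of the lower integration limit.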
In order to make the various approximations very explicit, the discussion in the last two sections of the basic theoretical set up has been much more detailed than what is usually found in the literature. This has required the introduction of a number of new notations for structure functions, which is useful to briefly summarize here:
• Hadronic structure functions, which are represented by the Roman font $F_i$, are functions of the independent variables $x_{\rm Bj}$ and $Q^2$; however, since it is ultimately convenient to express them in terms of $x_N$ and $Q^2$, we write $x_{\rm Bj}$ explicitly as a function of $x_N$ and $M^2/Q^2$ as in Eq. (9).
• The hadronic tensor can be re-expressed in a different basis of Lorentz vectors, by using $\tilde P^\mu$ rather than $P^\mu$ to define the corresponding structure functions $\tilde F_i$ in the massless basis, which we distinguish by the tilde ["~"] symbol.
• When this is combined with the approximation $x_{\rm Bj}(x_N, M^2/Q^2) \to x_{\rm Bj}(x_N, 0)$ we obtain the $\tilde F_i(x_{\rm Bj}(x_N, 0), Q^2)$ structure functions evaluated as in Eq. (18).
• The script notation for the structure functions $\mathcal{F}_i$ is an abbreviation for the special case when $M^2/Q^2$ is set to zero in $x_{\rm Bj}(x_N, M^2/Q^2)$, as in Eq. (20).
• A hat ["^"] on a structure function denotes a massless and on-shell partonic target.
Note that structure functions in Roman font with a hat ($\hat F_i$) and in script font with a hat ($\hat{\mathcal F}_i$) are identical, since $\hat k^2 = 0$. Also, partonic structure functions are identical with (the partonic analogues of) either the $W^{\mu\nu}$ [Eq. (9)] or $\tilde W^{\mu\nu}$ [Eq. (18)] bases, since the target parton in the hard part is always massless and on-shell. For many subsequent practical applications some of these notations will be redundant; however, since they make the different layers of conventions and approximations very explicit, they will be useful for our present purposes.
To conclude this section, let us also summarize the key observations: (1) There are two independent types of approximations. One is the purely kinematical approximation described in Sec. III, with errors suppressed by powers of $x_{\rm Bj}^2 M^2/Q^2$. It is independent of whatever theoretical techniques might be used to actually calculate the structure functions. The second approximation is the factorization theorem in Eq. (28), with errors suppressed by powers of $m^2/Q^2$, where $m^2$ is a typical scale associated with intrinsic dynamical properties of partons, such as their virtualities.
(2) The MTA is not necessary for deriving collinear factorization. The relation $\hat x_{\rm Bj} = x_N/\xi$ in Eq. (26) is usually automatically approximated to $x_{\rm Bj}/\xi$, but this is not needed. One may simply stop at Eqs. (30) and view the MTA application that leads to Eqs. (31) as ancillary.
(3) The standard factorization derivation, as embodied in the AOT method, automatically gives x N instead of x Bj as the natural scaling variable for the structure functions (neglecting logarithmic Q dependence from higher orders in α s ).
Before concluding, let us also mention that a number of other prescriptions for dealing with the effects of a nonzero target mass on kinematics have been proposed in the literature, but generally these impose extra assumptions on the dynamics. We discuss these in more detail in Appendix A. Having reviewed the mathematical statement of factorization in the presence of target masses in detail, and the corresponding expressions for the structure functions, in the next section we turn to the question of the physical interpretation of an observed improvement from target mass effects.

V. WHEN ARE TARGET MASS KINEMATICS RELEVANT?
The most straightforward and correct approach to computing the inclusive DIS structure functions is to simply avoid introducing unnecessary kinematical errors by choosing to keep target momentum exact and applying the AOT expressions for factorization in Eqs. (30).
A question of interpretation remains, however; without special knowledge of the target structure there is no reason a priori to expect the powers of x 2 Bj M 2 /Q 2 from purely kinematical approximations to be any more important than other power-suppressed corrections.

A. Scattering from subsystems
To interpret an observed phenomenological improvement obtained by using the AOT method instead of the FMTA, consider several generic scenarios for scattering from an extended target that could reveal a nontrivial relation between target mass effects and general properties of hadron structure. Consider, for instance, that if the target is a composite object (the precise nature of which need not be specified at this stage), then the sum of scattering amplitudes may be described as occurring off subsystems of the target, as depicted in Fig. 2. We leave the nature of the dynamics completely unspecified at this stage and only assume that diagrammatic arguments apply generally. To be completely general, we also allow for the possibility that the lower (nonperturbative) blob is empty so that scattering can occur off the entire target as a whole.
To be quantitative, we define the generic subsystem to have a momentum before the collision parametrized by the four-vector

$$ p^\mu = \left( X P^+, \frac{m_T^2}{2 X P^+}, \mathbf{p}_T \right), $$

where the squared transverse mass $m_T^2 \equiv p^2 + p_T^2$ denotes the sum of the virtuality $p^2$ (which could in principle be negative) and transverse momentum $p_T^2$ of the subsystem, and $X = p^+/P^+$ is the light-cone fraction of the target carried by the subsystem. The collision with the exchanged virtual photon produces another system of particles with invariant mass squared $v^2 \equiv (p + q)^2$. Such a system need not be physical and could be off-shell; for example, it could be a part of a hadronizing string. Without loss of generality, we may describe the total lepton scattering amplitude for the whole target $A_{\rm tot}(P, q, l')$, which in general depends on three variables (chosen here to be $P$, $q$ and $l'$), in terms of the amplitude for scattering off the subsystem, $A_p$. To connect to the total amplitude $A_{\rm tot}$, the subsystem amplitude needs to be integrated over all components of $p$, weighted by a function that characterizes the four-momentum distribution of the subsystem in the overall target.
To avoid confusion in what follows below, it is important not to view the diagram in Fig. 2 as the sort of "region" diagram common in factorization derivations [12], but rather as a topological representation in which the blobs are not necessarily characterized by any particular (small or large) momentum. The blobs simply denote an arbitrary subgraph assignment for some graphical contribution to the amplitude; some lines are routed through the (upper) photon-subsystem part of the graph, while others are diverted through the (lower) part of the graph connected to the target.
Such organization does not achieve much of interest until we pose questions about possible relationships between the total target and subsystem momenta, $P$ and $p$. If we find that there is an assignment in Fig. 2 such that $p_T^2, m_T^2 \ll Q^2$ for typical values of $p_T^2$ and $m_T^2$, then up to power-suppressed errors of order $m^2/Q^2$, where $m^2$ refers to $p_T^2$ or $m_T^2$, the amplitude for scattering from the subsystem becomes a function of $X$ only. The entire factorization derivation can then be performed for the sub-amplitude $A_p(X, q, l')$ rather than for the total amplitude $A_{\rm tot}(P, q, l')$.
In general the invariant mass squared $v^2$ varies between small values ($\approx 0$) and large values (of order $Q^2$ or larger). In the standard QCD factorization paradigm, large-$v^2$ behavior is describable by perturbative calculations. One can therefore define an approximate invariant mass squared $\tilde v^2$ of the final state subsystem which is calculated by approximate methods,

$$ v^2 = \tilde v^2 + \delta v^2, $$

where $\delta v^2$ is the correction needed to recover the exact $v^2$ value. The approximate invariant mass squared $\tilde v^2$ may vary from zero to $O(Q^2)$, while $\delta v^2$ is of the order of a typical small scale comparable to $p_T^2$ and $m_T^2$. Expanding $X$ in terms of these variables, we can write

$$ X = x_N \left[ 1 + \frac{\tilde v^2}{Q^2} + O\!\left(\frac{m^2}{Q^2}\right) \right], \tag{36a} $$

and, further expanding the Nachtmann variable $x_N$, the light-cone fraction becomes

$$ X = x_{\rm Bj} \left[ 1 + \frac{\tilde v^2}{Q^2} + O\!\left(\frac{x_{\rm Bj}^2 M^2}{Q^2}\right) + O\!\left(\frac{m^2}{Q^2}\right) \right], \tag{36b} $$

where $m^2$ stands for any of $p_T^2$, $m_T^2$ and $\delta v^2$. If the typical values of small mass scales associated with the interactions between subsystems ($p_T^2$, $m_T^2$ and $\delta v^2$) are totally negligible, but $x_{\rm Bj}^2 M^2$ is comparatively large, then the expansion in Eq. (36a) is an improvement over the expansion in Eq. (36b). In other words, in the limit of large $Q$,

$$ X \approx x_N \left( 1 + \frac{\tilde v^2}{Q^2} \right) \tag{39a} $$

provides a better approximation than

$$ X \approx x_{\rm Bj} \left( 1 + \frac{\tilde v^2}{Q^2} \right). \tag{39b} $$

In both of these cases, the connection between $X$ and external observables has lost any sensitivity to the details of interactions between subsystems. The only dependence on dynamics is through $\tilde v^2$, which is calculable in factorization and perturbation theory. Suggestively defining $1 + \tilde v^2/Q^2 \equiv 1/\xi$, Eq. (39a) takes the form $X = x_N/\xi$.

The $p_T^2, m_T^2 \ll x_{\rm Bj}^2 M^2$ condition cannot arise from a single, isolated perturbative quark or gluon, as these can emit large amounts of collinear and soft radiation. Moreover, a perturbative quark has virtuality that ranges up to $O(Q^2)$. A system of collinearly propagating quarks and gluons that are nearly massless and on-shell cannot be described purely in terms of short-distance, perturbative propagators. At the other extreme, the $p_T^2, m_T^2 \ll x_{\rm Bj}^2 M^2$ condition also cannot arise when all or most of the lines in Fig. 2 are routed through the upper part of the diagram, leaving the blob in the lower part of the diagram completely empty, which would correspond to $m_T \sim M$.
The only way, therefore, to consistently arrive at a scenario whereby $p_T^2, m_T^2 \ll x_{\rm Bj}^2 M^2$, and thus for Eq. (39a) (in terms of $x_N$) to be an improvement over Eq. (39b) (in terms of $x_{\rm Bj}$), is if the target consists of more than one separate, low-invariant mass (relative to $x_{\rm Bj}^2 M^2$) subsystem that can play the role of the lines entering the upper blob in Fig. 2. To avoid pushing $|p^2|$ too high, the interactions between subsystems need to be reasonably weak.
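The comparison between the $x_N$-based and $x_{\rm Bj}$-based estimates of $X$ can be made concrete with a minimal numerical sketch. It assumes the subsystem momentum $p = (XP^+, m_T^2/(2XP^+), \mathbf{p}_T)$, a photon with zero transverse momentum, $q = (-x_NP^+, Q^2/(2x_NP^+), \mathbf{0}_T)$, and $v^2 = (p+q)^2$, for which $P^+$ drops out and $v^2 = (X - x_N)(m_T^2/X + Q^2/x_N) - p_T^2$. For simplicity the exact $v^2$ is used in place of $\tilde v^2$ (that is, $\delta v^2 = 0$ is assumed), and all numerical inputs are hypothetical.

```python
import math


def nachtmann_x(x_bj, m2_over_q2):
    """Nachtmann variable x_N from x_Bj and the ratio M^2/Q^2."""
    return 2.0 * x_bj / (1.0 + math.sqrt(1.0 + 4.0 * x_bj**2 * m2_over_q2))


def final_state_v2(big_x, x_n, q2, m_t2, p_t2):
    """Exact v^2 = (p + q)^2 for a subsystem with light-cone fraction X."""
    return (big_x - x_n) * (m_t2 / big_x + q2 / x_n) - p_t2


def estimate_x(scaling_variable, v2, q2):
    """Estimate of X as (scaling variable) * (1 + v^2/Q^2)."""
    return scaling_variable * (1.0 + v2 / q2)


if __name__ == "__main__":
    q2, m2, x_bj = 2.0, 0.88, 0.6      # GeV^2; illustrative target kinematics
    true_x, m_t2, p_t2 = 0.7, 0.01, 0.005  # small subsystem scales (assumed)
    x_n = nachtmann_x(x_bj, m2 / q2)
    v2 = final_state_v2(true_x, x_n, q2, m_t2, p_t2)
    print("true X            :", true_x)
    print("estimate with x_N :", estimate_x(x_n, v2, q2))
    print("estimate with x_Bj:", estimate_x(x_bj, v2, q2))
```

When $m_T^2$ and $p_T^2$ are set to zero the $x_N$-based estimate reproduces $X$ exactly, while the $x_{\rm Bj}$-based estimate is off by a relative amount of order $x_{\rm Bj}^2M^2/Q^2$. If instead the subsystem scales are raised to be comparable to $x_{\rm Bj}^2M^2$, the advantage of $x_N$ is no longer guaranteed, which is the kinematical content of the hierarchy argument.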
While the individual subsystems necessarily need to have a small typical invariant mass |p 2 | relative to x 2 Bj M 2 , each subsystem can involve internal interactions that involve scales much larger than p 2 T , m 2 T , δv 2 , but still much smaller than Q 2 . Therefore, it is only the scales involved in the interactions between subsystems that need to be very small in order for the above argument for the usefulness of the AOT method to be valid.
Our general conclusion is that any observed improvement in the theoretical description of scattering that comes from using Eq. (39a) instead of Eq. (39b) is suggestive of a hierarchy of "clustered" structures within the target, representing correlated subsystems of strongly interacting particles. We stress that we are totally agnostic about what those clusters might be; our observation is simply that, kinematically, some sort of clustering is preferred. Thus, an improvement in the phenomenological description using the AOT method can be interpreted as evidence that scattering occurs off a collection of weakly interacting subsystems (since p 2 T , m 2 T and δv 2 must be small relative to x 2 Bj M 2 ), while a failure to observe any improvement suggests a more complicated type of scattering. (Some of this also echoes earlier discussions of TMCs in DIS at low energies, such as in Ref. [3], see pg. 325, where the scale M 0 there is analogous to the mass m used in the present work.) A subsystem can in general be any nonperturbative system, consisting of one or more interacting particles, whose internal interactions are stronger than interactions with other subsystems in the target. The subsystem could, for example, be colored or colorless; for the latter, we notice that for a nucleon target the region of kinematics where the x 2 Bj M 2 /Q 2 corrections are important corresponds to the nucleon resonance region, and the subsystems might be a collection of hadrons, such as nucleons and pions. However, the exact nature of the target or its subsystems and their interactions is not relevant to our discussion.
The above argument is very general, since it only relies on the kinematics of scattering off subsystems in a target, and the assumption that scattering from the composite object can be described in generally diagrammatic terms. In particular, it applies to arbitrary orders in perturbation theory. In fact, arriving at Eqs. (39) does not even require factorization or partonic degrees of freedom specifically. It only states that, if scattering occurs off weakly interacting light and nearly on-shell subsystems in a heavier target, then the cross section at a particular v 2 becomes a function of x N /ξ, where ξ is either 1 or is obtainable from large-ṽ 2 methods.
An example of such a scale hierarchy could be nuclear targets, where the subsystems correspond to nucleons; the hierarchy arises because interactions between nucleons are much weaker than the typical interactions binding quarks and gluons inside the nucleons [21,22].
Other examples may be nucleons coupled to soft pseudoscalar mesons through chiral dynamics, which can give rise to unique nonperturbative features in sea quarks in the proton [23][24][25][26][27].
A possible hierarchy with explicit color degrees of freedom could involve partons clustered into constituent quark-like subsystems [28,29]. Conversely, an example of a target where one would not expect an improvement would be the case of a hadron target whose mass comes almost entirely from a single point-like quark, such as a heavy quark hadron. We stress again, however, that our arguments here do not rely on any assumptions about dynamics of the composite object or the nature of its subsystems, but only on the kinematical considerations associated with target mass improvement.

VI. CONCLUSION
In this paper we have presented a detailed description of the basic structure function analysis of deeply inelastic scattering in the context of QCD factorization, fully taking into account hadronic masses in order to give clarity to the notion of "purely kinematical" mass effects. Even when clearly stated, however, the meaning of an improvement in the theoretical description of the scattering process from purely kinematical effects of the target mass begs for a physical interpretation.
The discussions in Secs. III-V make clear that an improvement is natural if factorization is understood to apply to scattering off a small invariant mass subsystem or cluster inside a composite target. Models of the nucleon with multiple scales and a clustering structure imply a particular kind of phenomenological prediction: that standard collinear QCD factorization, in the form of the AOT framework for treating target masses with exact external kinematics, can be extended to smaller $Q$ and larger $x_{\rm Bj}$ than might otherwise be expected from perturbative QCD arguments. In the limit of large $Q$, with all other scales fixed, and assuming $x_{\rm Bj} M \approx Q$, it is the first terms on the right hand sides of Eqs. (30a) and (30b) that give the asymptotic behavior. The clustering hypothesis suggests that, as $Q$ decreases, the power corrections initially come mainly from switching between $x_{\rm Bj}$ and $x_N$ in the usual factorized expressions, and also accounting for overall kinematic factors such as in Eq. (30b).
An interesting consequence is that the degree of purely kinematical improvement found by keeping the target mass can be viewed as probing the degree of clustering in the target.
To quantify this, it will be interesting to investigate how much improvement can be expected within specific models of the target. This way of viewing the target mass effects suggests a variety of future directions for research.
From phenomenological and global QCD analyses of deep inelastic lepton-nucleon scattering data, it is already well established that treatments of the target mass that switch x Bj to x N significantly improve the description of the data and extend the range of that description to lower Q and larger x Bj values [3,5,30-35]. On the other hand, clear room for refinement exists, for example to distinguish between the precise implementations of TMCs that have been proposed in the literature [18][19][20][36][37][38][39][40][41]. Also, upcoming experiments will allow for comparison between different target structures, including pions, kaons, and nuclei [6,42-44]. While the discussion in the present work has for simplicity been restricted to a single flavor, the generalization to the more realistic case of multiple flavors is straightforward. Moreover, the treatment of structure functions in Secs. II through IV can be directly extended to spin and polarization dependent structure functions. This will be important since the extraction of certain spin dependent effects can be especially sensitive to target mass effects [45][46][47][48][49][50]. We leave these interesting and important topics for future consideration.

ACKNOWLEDGMENTS
We thank J. C. Collins for useful discussions. This work was supported by the U.S.

Throughout this paper we have adopted what could be viewed as the most natural meaning of a "purely kinematical correction"; namely, a correction that is totally independent of any assumptions pertaining to the dynamics within the target. The MTA from Sec. III accounts for all such approximations that one encounters in the context of standard collinear factorization in DIS. The purely kinematical target mass correction is therefore uniquely of the form derived by AOT [15] (see Sec. IV), since this is merely the combination of the MTA and standard factorization, which is independent of target mass kinematics. Any other corrections must involve at least some set of additional assumptions about parton dynamics.
In the literature there exist a number of other prescriptions that are sometimes described as "purely kinematical" target mass corrections, but which in various ways differ from the AOT approach. Probably the best known of these is the treatment by Georgi and Politzer [36] based on the operator product expansion (OPE). (For extensions to the polarized case see Refs. [45][46][47][48].) Here the expressions for target mass corrected structure functions contain extra terms involving integrals of structure functions, which arise from additional constraints or assumptions that are beyond the purely kinematical corrections implicit in the AOT approach. As discussed by Ellis, Furmanski and Petronzio [38], and more recently by D'Alesio, Leader and Murgia [51], the origin of the additional integral factors is the constraint that the struck partons inside the target correlation function should be exactly massless and on-shell, for all longitudinal momenta and for all transverse momenta.
Absent some exotic dynamical mechanisms within the target, this appears to be a relatively strong assumption, which in itself is not a necessary one for the standard derivation of collinear factorization.
Another way to understand the difference between the AOT approach and the OPE-based prescription is to note that in the latter the kinematical TMCs that are kept are only those that are relevant for a leading twist treatment, while kinematical corrections associated with higher twists are neglected. This type of assessment of O(m^2/Q^2)-type errors runs the risk of entangling the O(x_Bj^2 M^2/Q^2) target mass corrections with those from other sources. By refraining from introducing O(x_Bj^2 M^2/Q^2)-type errors from the outset, the direct method used by AOT has the advantage of including all kinematical target mass effects regardless of twist. It is worth emphasizing here that modern derivations of factorization do not need to use the OPE, but rather can be formulated as direct, arbitrary-order expansions in powers of 1/Q^2 [12]. An added benefit of the direct method, which can be argued to be the more rigorous one, is that it does not a priori need to entail an MTA.
Still other TMC formalisms have been proposed that also differ from, or go beyond, AOT [19,38]. For example, the Accardi-Qiu prescription [19] uses collinear factorization together with the dynamical assumptions that well-defined target and jet directions exist at rather low Q^2 [52,53] and that the initial state baryon number flows only along one such direction [54]. This relies on a very literal matching between virtual partonic states and a particular final state distribution of hadrons, which goes beyond the standard factorization paradigm [1,12] but regulates the behavior near the kinematical threshold at x Bj = 1.
The direct factorization approach can also help to contextualize the so-called "threshold problem" [36], which is the observation that the structure function for nonzero target mass in the OPE derivation has support at x Bj = 1 (where kinematically only elastic scattering should contribute) and can be nonzero in the unphysical region x Bj > 1 (up to x N = 1) [55].
This has led to various proposals for modifying the target mass corrected structure functions such that they have support only in the physical region [40,41,51,55-57]. The solution to the "threshold problem" from the factorization perspective is simply that the conditions under which QCD factorization itself is valid break down as x Bj → 1. While the structure functions are defined through Eq. (10) for all x Bj ≤ 1, and the parton distribution f(ξ) exists for all parton momentum fractions ξ ∈ [0, 1], the factorization formulas in Eqs. (28) and (30) relating the two receive increasingly large corrections at large x Bj that render the perturbative expansion in powers of both α_s and 1/Q^2 no longer a useful one. Improvements beyond this require more sophisticated methods for treating the large-x Bj region than are available in the standard factorization treatment.
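The kinematics behind the "threshold problem" can be made concrete with a short numerical sketch. It again assumes that x N is the standard Nachtmann variable and uses an illustrative proton mass of 0.938 GeV; the helper names are hypothetical. The sketch evaluates x N at the elastic point x Bj = 1, and inverts the relation to find the value of x Bj at which x N reaches 1.

```python
import math

M_PROTON = 0.938  # GeV; assumed target mass, for illustration only


def x_n_at_elastic_point(q2, mass=M_PROTON):
    """Value of the Nachtmann variable x_N at x_Bj = 1 for Q^2 = q2 (GeV^2)."""
    return 2.0 / (1.0 + math.sqrt(1.0 + 4.0 * mass**2 / q2))


def x_bj_from_x_n(x_n, q2, mass=M_PROTON):
    """Invert the Nachtmann relation: x_Bj = x_N / (1 - x_N^2 M^2 / Q^2)."""
    denom = 1.0 - x_n**2 * mass**2 / q2
    if denom <= 0.0:
        raise ValueError("no finite x_Bj: x_N^2 M^2 / Q^2 must be below 1")
    return x_n / denom


if __name__ == "__main__":
    for q2 in (2.0, 10.0, 100.0):
        x_th = x_n_at_elastic_point(q2)
        x_max = x_bj_from_x_n(1.0, q2)
        print(
            f"Q^2={q2:6.1f} GeV^2  x_N(x_Bj=1)={x_th:.4f}  "
            f"x_Bj(x_N=1)={x_max:.4f}"
        )
```

Because x N stays below 1 at x Bj = 1 for any nonzero target mass, a parton distribution evaluated at x N need not vanish at the elastic point, and x N = 1 is reached only at x Bj = 1/(1 - M^2/Q^2) > 1. Both gaps close as Q^2 grows, in line with the statement that the difficulty is confined to the region where the factorization formulas receive large corrections.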