Unraveling Generalized Parton Distributions Through Lorentz Symmetry and Partial DGLAP Knowledge

.

Introduced in the 1990s [1][2][3], Generalized Parton Distribution (GPDs) are today at the core of hadron structure studies, both experimentally and theoretically.This enthusiasm can be understood, as GPDs encode the multidimensional structure of hadrons [4,5] (2D in the transverse plane and 1D in longitudinal momentum space).They are also connected to the energy-momentum tensor, allowing in principle to extract the contributions of quarks and gluons to the total angular momentum [6], and also the contributions to the pressure and shear forces within the hadrons [7].
Therefore, since the early 2000s, attempts to extract GPDs from experimental data have been performed (see the examples of [8,9]).However the task remains hard for multiple reasons.First, GPDs are connected to experimental data of deep exclusive processes, like deep virtual Compton scattering (DVCS), which are much harder to measure than inclusive ones, connected to regular parton distribution functions (PDFs).Further, the connection between experimental data and GPDs in standard deep exclusive processes like DVCS, reveals itself non invertible at fixed scale and mathematically ill-posed when evolution is turned on [10,11].The situation might be better for processes involving the production of an additional particle in the final state (see [12][13][14][15][16]).
The difficulty in extracting GPDs comes also from the fact that they have to obey a number of theoretical constraints whose fulfillment is a priori not granted using generic modeling techniques.Among those, let us mention polynomiality [17,18] and positivity [19][20][21][22].A way to fulfill systematically both at the same time was introduced in [23] based on the so-called covariant extension [24,25], and yielded the first evaluation of experimental feasibility of pion DVCS [26].The covariant extension technique was originally developed to recover the Efremov-Radyushkin-Brodsky-Lepage (ERBL) kinematic region from the Dokshitzer-Gribov-Lipatov-Altarelli-Parisi (DGLAP) one, in a unique way, up to a D-term 1 .However, as we will show in this paper, its application is in fact more general and allows one to uniquely continue a GPD known solely in the vanishing and low-ξ region to the entire kinematic domain.This application is important as it allows reconstruction for GPDs based on collider data (see for instance [27]) supplemented 2 with lattice QCD data at vanishing ξ (but finite values of t).In such case, the t-dependent PDFs (or equivalently vanishingξ GPDs) become key quantities, and they are typical matrix elements that can be assessed on the lattice (see for instance [29][30][31]).
In section 2, we start by recalling the mathematics behind the covariant extension, highlighting that both an incomplete x and ξ domain are allowed.In section 3, we check the actual feasibility using two different numerical techniques, Artificial Neural Networks (ANNs) and Finite-Elements Method (FEM), on two models.The first one is derived from lightfront wave functions computations [25] and the second one is the phenomenological parametrization by Goloskokov and Kroll [32].Finally in section 4, we conclude. 1The details about the ambiguities generated by the D-term can be found in ref. [24] and Sec.II. 2 The reader can refer to [28] for the impact of combining experimental and lattice data.

II. POLYNOMIALITY AND INVERSION OF THE RADON TRANSFORM
GPDs are defined as a lightfront projection of a non-diagonal hadronic matrix element of a bi-local operator [1][2][3], and they are usually expressed in terms of three kinematic variables.Namely, the lightfront momentum fraction, x; the skewness or lightfront momentum fraction transfer, ξ; and the squared momentum transfer, t.Concerning x and ξ, the kinematic domain of GPD defines its support with x, ξ ∈ [−1, 1]; the region |x| ≥ |ξ| (|x| ≤ |ξ|) being denominated DGLAP (ERBL) where the GPD can be expanded in Fock space only involving states with the same number N (different number N and N + 2) of particles [17].
This being given, it can be proven for any m-order GPD's Mellin moment that Lorentz symmetry implies [17,18,33] 1 where H (x, ξ, t) denotes the unpolarized quark GPD, which we use here as illustration 3 .This property is widely dubbed polynomiality and has both a deep origin and important implications.Particularly, if the GPD is known for all x and t in a given compact range of ξ, one can calculate Eq. (2.1)'s lhs in there and, a priori, determine the polynomial coefficients in rhs for any m-order Mellin moment.It becomes thus clear that, rooting on Lorentz covariance and on the uniqueness of Mellin moments, the GPD appears defined in its entire support from its knowledge within a restricted kinematic domain.Moreover, the GPD reconstruction can be supplementary constrained by profiting the highly non-trivial interplay between the structure of the DGLAP and ERBL regions expressed by the polynomiality property in Eq. (2.1).In the following, we shall elaborate further on this, and exploit it, capitalizing specially on the related mathematical literature devoted to the incomplete data problem in computerized tomography [35].
To the purpose of this work, ultimately capitalizing on the result from Eq. (2.1), the momentum transfer t can be fixed and taken to be a constant parameter all throughout the analysis; any conclusion would then be irrespective of its particular value.Thus, for the sake of simplicity, it will be considered implicit from now on and omitted in the notation.Furthermore, time reversal symmetry can be also invoked and seen to impose that any sensible GPD is an even function of ξ, thus entailing that no odd power is allowed in Eq. (2.1): c (m) 2n+1 = 0 for any integer n; and, particularly, c (n) n+1 = 0 for any even integer n.In the following we shall concentrate on the ξ ≥ 0 region.Then, as it has been carefully discussed in Ref. [24] (see also references therein), given a function D(z) with support z ∈ [−1, 1], defined by its Mellin moments m+1 , one can prove that any m-order Mellin moment of H(x, ξ) − sgn(ξ)D(x/ξ) is a polynomial of degree m on ξ; otherwise said, it fulfills the Ludwig-Helgason consistency condition entailing that it is in the range of a Radon transform R [36], where Ω = {(β, α) ∈ R 2 /|α| + |β| ≤ 1} is the support for the distribution F whose Radon transform RF is in the physical domain of (x, ξ).D is an odd function in its single argument, as it clearly comes from its definition in terms of Mellin moments which are zero for even order.Strictly speaking, the Ludwig-Helgason condition and the Radon transform are expressed4 in terms of the distance s and the polar angle ϕ (e.g., see Ref. [37]), which can be mapped into the usual GPD variables x and ξ through the transformations x = s/ cos ϕ and ξ = tan ϕ, such that the m-order Mellin moment of [H − sign(ξ)D]/ cos ϕ is an homogeneous polynomial of degree m with terms cos m−k ϕ sin k ϕ.This change from mathematical to physical variables has no consequence for our purposes.
One can then introduce a second distribution G(β, α) = δ(β)D(α) such that RG(β, α) = D(x/ξ)/|ξ|.Consequently, the GPD H is given as RF (x, ξ) + ξRG (x, ξ) = H (x, ξ), with F (G) being even (odd) in α.Then, owing to its fulfilling of condition (2.1), any sensible GPD can be represented as the Radon transform of the two 2-dimensional distributions F and G, widely named double distributions (DDs).However, as exposed in Refs.[38,39], the pair of DDs (F, G) does not offer a unique representation for the GPD H(x, ξ): there exists a family of transformations, usually called scheme transformations, that make possible to redefine F and G for the same GPD.A judicious choice of the transformation (see for instance appendix B of Ref. [24]) allows for the two DDs to rely on one single DD h, the GPD then reading which corresponds to the so-called P -scheme [40].At this point, it is worthwhile highlighting that, if (1 − x)Rh(x, ξ) fulfills the polynomiality condition (2.1), also does, where D − (z) is any odd function in its single argument, with support z ∈ [−1, 1].The latter is important for our purpose of facing the inverse problem, as H and H only differ by the contribution from the line β = 0 within the DD support, which impacts only on the GPD kinematic domain |x| < |ξ|.
Let us, at this point, consider a GPD for which its knowledge is restricted to a given kinematic domain.Then, rooting on a proof of uniqueness for a DD whose Radon transform reproduces the GPD on such a restricted domain, the inversion of the Radon transform can be featured as a sensible procedure for a covariant extension from restricted to full GPD kinematic ranges.So has been suggested and illustrated in Refs.[24,25,37], where GPDs known within their DGLAP kinematic domain were successfully extended to ERBL through the Radon transform of the DD previously obtained by inversion.
More generally, the problem of inverting the Radon transform has been exhaustively studied in the mathematical context of computerized tomography [35]; for which, in most realistic scenarios, the Radon transform is known only for hyperplanes belonging to a particular subset of the whole domain 5 .Although explicit inversion formulas remain mostly not available, several theorems prove the uniqueness of the inverse function, depending on the properties of the functions involved and given the data of the Radon transform corresponding to specific subsets of hyperplanes.This implies that if a particular function is found, whose Radon transform reproduces the data known for the subset of hyperplanes for which the theorems apply, then that function is guaranteed to be unique.The relevant theorem on which the GPD covariant extension relies is the support theorem by Boman and Todd-Quinto [41] that, for the sake of completeness, will be reproduced below: (2.5) This theorem applies to f ∈ E ′ (R n ), i.e. to distributions with compact support.It can therefore be applied to the Radon transform of a DD, whose domain is the compact skewed square Ω.For this two dimensional case, the argument of the function corresponds to the DD variables, i.e. z ≡ (β, α) and θ ∈ S 1 ≡ (cos ϕ, sin ϕ), which correspond to (θ, s), the Radon transform variables6 usually employed in mathematical literature and which convert to the standard GPD variables (x, ξ) as made above explicit.In this way, the line z • θ = s corresponds to x − β − αξ = 0.
What follows below aims at making transparent the physical implications of the above support theorem, exhibiting that it guarantees the uniqueness of the distribution F (β, α) from Eq. (2.2), if RF (x, ξ) is fixed by the knowledge of H(x, ξ) on a subset of the DGLAP region.Notice first that Eq. (2.2)'s lhs also includes a term D(x/ξ) which, anyhow, has to be evaluated outside its support for any DGLAP kinematic configuration and, hence, vanishes.However, this D-term raises a well-known ambiguity as, despite the uniqueness of F (β, α), the GPD H(x, ξ) remains undetermined within the ERBL region by the impact of such a term which, as shown above, corresponds to a contribution with support only on the line β = 0.The same is true for DDs in any scheme and, particularly, is made apparent by P -scheme Eqs.(2.3,2.4), which exhibit how DGLAP data cannot unambiguously fix the ERBL GPD.On the other hand, since H(x, −ξ) = H(x, ξ), we can consider ξ ≥ 0, implying ϕ ∈ [0, π 2 ); and expose that the whole DGLAP region |x| > ξ, also expressed as |s| > sin ϕ, is not a connected set.However, this makes no limitation for the application of the theorem, because it can be separately applied for the two disjoint, open connected regions {|x| > ξ} ∩ {x > 0} (particle contribution) and {|x| > ξ} ∩ {x < 0} (anti-particle contribution).
Indeed, without lack of generalization, we can focus on the region x > 0 and consider the following subsets (see the right plots of Fig. 1) The square being a graphical representation of the domain of GPDs, green triangles correspond to the ERBL region and red areas to the DGLAP one.The hatched area represents, for different values of λ in (2.6), the portion of the overall GPD domain which is taken as input in the reconstruction procedure.[Right panel] The skewed square being the domain over which DDs have support, the Radon transform Eq. (2.3) can be interpreted as an integration over straight-lines, α = (x − β) /ξ, with α0 = x/ξ and β0 = x as alpha-and β-intercept, respectively.The support theorem guarantees the uniqueness for the DD within the region swept by all the straight-lines such that (x, ξ) ∈ O λ ; namely, ξ/λ < x ≤ 1 with 0 ≤ ξ < λ.Then, for any fixed ξ, one is left with parallel lines of slope 1/ξ bounded by 1/λ < α0 ≤ 1/ξ and, correspondingly, ξ/λ < β0 ≤ 1; schematically displayed in the plot by the shadowed area between the two parallels crossing the bounding intercepts (red circles).Therefore, when varying ξ from 0 to λ, the entire DD domain (β > 0) is covered.
where λ ∈ (0, 1].One should realize that {β > 0} = {z • θ = s| (θ, s) ∈ O λ }.Namely, the set of lines on the (β, α) plane corresponding to O λ , i.e. the lines α = (x − β)/ξ, with ξ ∈ (0, λx), sweep the half plane β > 0 irrespective of the value of λ (this is graphically illustrated by the right panel of Fig. 1).The support theorem therefore implies that if the GPD H(x, ξ) is uniquely given on the DGLAP subset O λ , the DD is uniquely fixed on the half plane β > 0. By the same argument one can uniquely fix the DD on the β < 0 half plane by knowing the GPD on the corresponding O λ subsets with x < 0.
Then, for any λ ∈ (0, 1], the DGLAP region x > 0 fixes uniquely the DD for all β > 0 and, separately, the DGLAP region x < 0 does the same for all β < 0. Furthermore, one can simply consider a positive α in both cases, owing to the even parity of the DDs F (or h in this scheme).On the other hand, as above discussed, the axis β = 0 remains undetermined by the single implementation of DGLAP data.The choice of λ = 1 entails that the GPD is uniquely determined, with the exception of the Radon transform of terms with support only on β = 0, by its whole DGLAP knowledge.The covariant extension from DGLAP to ERBL kinematic ranges introduced and discussed in Refs.[24,25,37,42] relies on this remarkable output.However, more importantly and not thus far highlighted, is that the same is formally true for 0 < λ < 1. Namely, the GPD covariant extension can be made from its knowledge on proper subsets of the DGLAP region where the skewness parameter is restricted to an arbitrarily small range, i.e. ξ ∈ [0, λx], with a positive and arbitrarily small value of λ.Beyond its formal interest by itself, this is very relevant for physical purposes, since in actual experimental setups one usually has access only to small values of ξ; and can be also very helpful for the completion and GPD reconstruction from lattice data.
In the following sections we will numerically examine the soundness of this conclusion by trying to invert the GPD Radon transform having as inputs its data on these restricted subsets O λ and subsequently reconstructing the GPD on its whole domain.

III. PROOF OF CONCEPT AND FEASIBILITY
If a kinematic completion of GPDs is formally possible, a proof of concept and feasibility yet remains to be studied.We therefore assess this problem by following two approaches: (1) Applying the FEM discussed and tested in Refs.[23,24,37,43], Sec.III A; and (2) using ANNs for the extraction of DDs, Sec.III B. The main purpose for this double implementation, with two very different methods to invert the Radon transform, stems from our aim of discarding systematic effects related to a particular inversion procedure.To this goal, we shall test the results obtained with both against two benchmarking models for GPDs 7 (Tab.I): The algebraic model for the pion GPD first constructed in [25] and the renowned Goloskokov-Kroll model [32,[44][45][46].A thorough inter-comparison of the two approaches, as they apply to our problem, is out of the scope of the present work.
TABLE I: Syllabus -Summary of the models and strategies explored in this study.

A. Finite-Elements Method (FEM)
A suitable numerical approach to the Radon transform inverse problem is to proceed through discretization and interpolation in the space of DDss, what in the following will be referred to as FEM-strategy.This methodology was first presented in [24] and further elaborated in [23,43] where its usefulness in the kinematic completion of GPDs was exposed, setting the ground for a pioneering exploratory study of DVCS on pions [23].As an outcome of these studies, a C++ implementation of this tool is already available within the PARTONS framework [47].However this work employs a slight extension of that version.We thus find it advantageous to briefly expose in the following the main ideas behind this program.The interested reader can find detailed discussions in [24,26,37,43].
The starting point of the FEM-approach is to introduce a discretization of the DD domain, Ω = e Ω e , and embed in it a set of interpolating polynomials such that the DD may be approximated within the discretized domain.The accuracy of this approximation is essentially determined by the choice of the interpolating functions as well as the mesh itself.Inspired by the method of finite-elements widespread in e.g. the study of partial differential equations, we choose to perform piecewise polynomial interpolation8 , allowing to approximate the DD as where P e (β, α) are the polynomial interpolants chosen for each element Ω e , which can in turn be expressed as where k labels the items of a set of interpolation nodes n j ≡ (β j , α j ) appropriately distributed over Ω e ; and where v j (β, α) represents the Lagrange interpolating polynomials associated to the node n j .In this work we take them to be degree-two polynomials 9 .Finally, h j is the value of the DD at the interpolation node n j , so we may write the integral problem of Eq. (2.3) as Importantly, because the interpolating basis functions v j are simple second order polynomials, the integrals can be evaluated in closed form, turning the original problem into a set of algebraic equations connecting the DD and its associated GPD.Furthermore, choosing pairs (x i , ξ i ), a system of algebraic equations can be built to represent the Radon transform problem: . . .
In particular, one may choose the sampling points (x i , ξ i ) within the DGLAP region, where the GPD is assumed to be known.Or even better, on a subset of that domain such as the one considered in Eq. (2.6).When doing so, a system of algebraic equations where the only unknowns are the values of the DD at the interpolation nodes arises.As explained in [23] its solution can be found using a least squares minimization where boldface letters represent the vectors (matrices) of Eq. (3.4) in compact form.Thereupon, one may finally reconstruct the DD according to Eq. (3.1) (or Eq. (3.2)) and employ it to evaluate the GPD within the ERBL region through a Radon transform (in its discretized version): where R is a matrix representation of the Radon transform integral operator built following the procedure described before and evaluated for (x i , ξ i ) chosen within the ERBL region.
In this way, a FEM-inspired strategy allows to find a solution for the inverse Radon transform problem.This approach being grounded on the theoretical construction of Sec.II, the covariant extension of GPDss from the DGLAP to the ERBL region turns feasible.Nonetheless, even if the input DGLAP GPD is known to arbitrary precision10 , the procedure is affected by uncertainties.Provided that the solution to our problem exists (Sec.II), the construction of this section reveals (1) the choice of interpolating polynomials, (2) the mesh definition (in particular its size) and (3) the number/distribution of sampling lines (x i , ξ i ) in Eq. (3.4) as the main sources of possible deviation from the actual solution.In practice, we have tested three types of polynomial interpolants: Zero, first and second order polynomials.Optimization in the sense of trading performance of the implementation for accuracy of the solution lead us to use degree-two polynomials for all calculations.With regard to the mesh, a Delaunay triangulation was built over the DD domain [23,24], the average size of the elements being fixed to a model-specific optimal value found as to balance precision (expected for finer meshes) and stability (precluded in the same direction) [48].Coincidentally, the meshes revealing optimal for the study of the algebraic-and GK-model where found to be identical and made up from seventeen elements of average area 0.03 (arbitrary units).Finally, the number N of sampling lines was set according to the criterion of [23] where it was argued that the actual answer reveals in the least squares solution Eq. (3.5) when N sample is taken to be at least four times the number of elements in the mesh.Here we took N sample = 6n.These three sets of parameters being thus fixed, the solution to the inverse Radon transform problem can be found.From that point on there remains one single parameter whose effect on the solution has to be assessed: The distribution of sampling lines within the (restricted) DGLAP region.Here we follow once again the prescription of [23]  to draw them randomly following a uniform distribution x, ξ/x ∈ [0, 1].Therefore, the distribution of lines leading to the solution of the inverse Radon transform varies among runs.In lack of a criterion to fix them, such constitutes the main source of statistical uncertainty in our calculation.Here we decide to estimate it through the production of replicas following an strategy similar to that in [49]: We generate 250 sets of sampling lines and solve our problem for each of them.Thus one is given with the corresponding number of solutions to our problem, say DDss, called replicas.Then, for a given point, say the interpolation nodes, we check for outliers following a 5σ-rule [50].Finally, the replicas identified as outliers in any of the points are removed from the uncertainty estimate, which is given by the standard deviation of the population of replicas.Accordingly, all our results (Figs. 2 to 5) are given as a solid line, representing the mean value; and the corresponding one-sigma band.

Algebraic model
Employing the setup described in the previous paragraphs, the algebraic model can be extended from the DGLAP region, cf.Eq.A4, to the ERBL domain.As in Sec.III B, the case ξ = 0.5 has been used to illustrate our findings.Figs. 2 and 3 shows the results obtained as the solution to our problem using the region O λ as the input domain.Three values of λ have been explored, ranging for λ = 1 (corresponding to the entire DGLAP region) down to λ = 0.5 and 0.2.In all cases, the agreement between the actual GPD and the reconstructed form is astonishing, showing that indeed the FEM strategy allows to perform the covariant extension of GPDs from the DGLAP to the ERBL region with a minimal knowledge of the input GPD.These results agree with those found in Sec.III B using the ANN implementation, again supporting the thesis of this study.
The accuracy of the solution is remarkably good.We can get a better grasp about it by looking at the reconstructed DD h FEM (β, α) against the expectation, Eq. (A5): left panels of Figs. 2 and 3. Again, both are essentially indistinguishable.The reason for that resides in the structure of the DD generating the algebraic model, which is a second order polynomial in the kinematic variables, Eq. (A5); just as they are the interpolants P e (β, α) introduced within the FEM strategy.In that way, the exact solution is accidentally put by construction in the range of the discrete Radon transform of Eq. (3.4), meaning that in the absence of noise, the exact result can be found.In modern terms, this situation resembles that found in Bayesian reconstruction when a default model is introduced as prior information.In the language of Bayes theorem, such a default model represents the most probable answer in the absence of any data.When the chosen default model indeed represents the input data, the functional space is strongly dumped, and the chances of finding an accurate solution grow.

Goloskokov-Kroll model
With a proof of concept for the usefulness of the FEM construction in solving the inverse Radon transform problem in a particularly favorable case, challenging our implementation in more complicated scenarios becomes crucial.The case of the GK model constitutes an ideal test ground for three reasons: First, as remarked in Sec.A 2, the GK model is widely employed in phenomenological analyses of GPDs.Second, the coincidence between the basis functions and the actual expression observed in the algebraic model is no longer found.And third, the DD giving raise to the GK model exhibits an integrable divergence at low values of β, challenging even more our implementation.It is precisely this later feature that has triggered a slight modification of the construction employed in the case of the algebraic model, redefining the approximating DD, h FEM as so as to smooth the divergent behavior of h GK when applying the FEM reconstruction.
In practice we have found this redefinition to be key in stabilizing the problem's solution.Because the behavior of the underlying DD is generally unknown, this observation might seem a curse of the present approach.However, we believe that to be only apparent.Two main arguments can be highlighted here: On the one hand, although DDs are generally unknown, one can always rely on the behavior of the input DGLAP GPD to infer information about that of the structure of the underlying DDs.As a simple illustration, one expect that the PDFs present a diverging behavior in the the limit x → 0. One may argue, for instance using the Goloskokov-Kroll model, that such low-x divergences translate into a similar behavior of the DD as β approaches zero.Moreover, these singularities are expected to be integrable in the case of valence distributions (as those being considered here, thus triggering our choice r (β) = 1/ √ β).A second argument is of a practical nature.Indeed, the relevance of the smoothing suggested in Eq. (3.7) can be traced back to the structure of the interpolating basis functions: Here they are simple degree-two polynomials which are unlikely to accurately piecewise-approximate rapidly growing functions such as h GK in the limit of small β.In contrast, the approximating functions employed in our ANN approach (where no special attention was payed to this feature) are more complicated functions of the kinematic variables (β, α), thus hinting a modification of the interpolants P e to be a general possible solution to this problem.Of course, there is no reason why not to upgrade the interpolating polynomials P e to more sophisticated structures capable of accounting for these kind of effects; however we prefer here to keep things simple and proceed using Eq.(3.7).
This being clarified, we can operate as before and explore the covariant extension of the GK model from the DGLAP to the ERBL region.Again we use ξ = 0.5 as an illustration for our results; we study three possible sampling regions O λ , with λ = 1, 0.5, 0.2, as well.The corresponding results are shown in the right panels of Figs. 4 and 5. Again, the overall fidelity of the reconstructed GPD is noticeable.However, this time we see a growing deterioration, not that substantial at the level of the mean value but at that of the uncertainties, as the area covered by the input data set decreases.In particular, deviations seem to be more apparent in the region −ξ < x < 0. This can be traced back to numerical instabilities triggered by the smoothing factor √ β introduced in Eq. (3.7).Indeed, integration over β translates (by means of the delta distribution in Eq. (2.2)) into √ x − αξ, which shall be bounded from below.However, for values x < 0, if numerical precision languishes, the argument might show very small while non-zero values, that being at the origin of the observed impaired behavior.
Further insights can be obtained by again looking at the "intermediate" DD (left panels of Figs. 4 and 5).The results show general good agreement, specially when the entire DGLAP region is given as input information.However, the reconstructed signal rapidly deteriorates as λ decreases.Nevertheless, those deviations wash out after integration to reach the GPD domain, similarly to what will be seen in Sec.III B. This is indeed a very noticeable feature, as the quantity relevant for phenomenology and the one in Lattice QCD is, indeed, the integrated DD; i.e. the GPD.

B. Artificial Neural Newtworks (ANNs)
An entirely different approach to tackle the inverse Radon transform problem consists in fitting the DD by means of an ANN with one hidden layer.The advantages of this strategy are twofold: Firstly, the universal approximation theorem [51] guarantees that this simple network architecture is able to approximate, at arbitrary precision for sufficiently large width, any compactly supported continuous function, therefore providing an unbiased parametrization of the DD; secondly, the optimization algorithms that are used to train the network come equipped with regularization methods, introduced to avoid overfitting, that are particularly useful in this context to overcome the ill-posedness of the Radon transform inverse problem [52,53].
The DD h(β, α) is therefore approximated by the final output of the ANN whose architecture is sketched in Fig. 6, with the two variables β and α as input features.As a function of the input variables, the network's output is denoted h ANN (β, α).The explicit functional form, in terms of the weights (w) and biases (b) parameters is: where a re-scaled variable α ′ = α/1 − |β| has been introduced, that takes values between -1 and 1 (since |α| + |β| ≤ 1) which is supposed to better perform as an input to the sigmoid activation function σ(x) = (1 + e −x ) −1 .N is the number of hidden neurons which is tuned in order to obtain the best possible approximation.The total number of learning parameters is 4N + 1, divided between 3N weights and N + 1 biases.Notice that in Eq. (3.8) the traditional hidden output has been augmented with an identical term with w α → −w α , as it was already implemented in [49], which explicitly enforces the parity condition h ANN (β, −α) = h ANN (β, α).o (2)   . . .FIG. 6: ANN architecture with one hidden layer that parametrizes the DD h(β, α).Here o (L) i stem for output of the i-th neuron in the L-th layer of the network.The zeroth layer is implicitly identified as the input one.
The network is trained using GPD data.Actually, the inputs for the algorithm are the GPD variables (x i , ξ i ), randomly chosen with a uniform probability distribution from the DGLAP region.Each pair of values (x i , ξ i ) corresponds to a line α = −β/ξ i + x i /ξ i in the β-α plane, along which the Radon transform has to be evaluated (integrating h ANN (β, α)) to produce a predicted GPD value H (x i , ξ i ) that is to be compared with the true one H (x i , ξ i ).The network depicted in Fig. 6 may therefore be considered as embedded in a larger one, with two extra layers: An input layer that for each (x i , ξ i ) outputs the batch of (β k , α k ) values positioned along the corresponding line 11 ; and a final output layer that combines the batch of the h ANN (β k , α k ) values to produce H (x i , ξ i ).
The network parameters are updated at each iteration, using the adaptive gradient descent Adam algorithm 12 [54] to minimize the loss function, here chosen to be the Mean Squared Error (MSE): (3.9) A Dropout regularization is implemented [55], where hidden neurons are randomly turned off with a fixed probability rate.It is worth noticing that the ill-posedness of a problem like the Radon transform inversion has been commonly treated using a Tikhonov regularization.It has been shown, however, that these two approaches are essentially equivalent [52,53], the dropout regularization being more effective for deeper networks.
The reader can note that the number of neurons and the choice of activation function play here a similar role compared to the number of cells discretizing space and the choice of interpolating polynomials, in the FEM approach.Yet, one can expect that the convergence and smoothness properties might be different between the two approaches.

Algebraic model
In the case of the algebraic model, the ANN has been designed with N = 100 hidden neurons, tuned by trial and error, and trained with a sample of N sample = 10 4 GPD data points.Without loss of generality, both entries x and ξ are considered positive: The x values are generated within the interval [0, 1], which is the x domain for the quark sector in the DGLAP region, and the associated ξ may remain also positive while covering the entire DGLAP region owing to the parity of the GPD.Still, for the inversion of the Radon transform (2.3), the subset O λ in Eq. (2.6) is fixed by restricting the skewness parameter to the interval [0, λx], with 0 < λ ≤ 1, where the covariant extension is formally achievable, as discussed in Sec.II.
However, despite the support theorem guarantees the uniqueness of the inverted Radon transform from O λ with any λ > 0; in practice, the noise in the obtained DD increases as the range of the skewness parameter becomes smaller.This is exhibited in the left panels of Figs. 7 and 8, where three outcomes for the DD function are plotted, corresponding to the values of λ = {1, 0.5, 0.2}.The values of h(β, α) are shown as functions of the β variable for three different values of α.The error bands associated with the ANN results correspond to a standard deviation from the mean value estimated by training the network from scratch on 50 independent trials (replicas).
The errors increase as λ decreases, in particular closer to the boundary β = 1 − α, where the DD is discontinuous.These somewhat large deviations of the numerical predictions from the exact analytical expression become milder, however, once the DD is integrated over the lines in order to get the GPD predictions.This is shown in the right panels of Figs. 7 and 8, where the results displayed correspond to the particular value of ξ = 0.5, chosen for illustrative purposes.This is noteworthy especially because the main goal of the current application of the support theorem is the GPD reconstruction; namely, the extension of GPD's knowledge from a restricted DGLAP domain to that on its entire support, particularly the full ERBL region.Although achieved by inferring the functional form of the DD, it is actually its Radon transform to be of primary interest here.
The kinematic restriction for the GPD data of the training set O λ has been pushed down to a value as low as  7 with a DGLAP sampling given by ξ ∈ [0, λx] and λ = 0.5.λ = 0.2 in FEM.In this case, the quality of the results for both the inverted DD and the derived GPD is still good for λ = 0.5, as it is shown in Fig. 8, but it becomes strongly degraded for lower values of λ, these results not being then displayed.The main reason for this will be discussed below and comes as a drawback of the GPD behavior for the algebraic model approached with ANNs, presumably requiring further optimization.The GK model, in exchange, will be seen to admit, also for λ = 0.2, a very reliable GPD from a DD approximated with ANNs.

Goloskokov-Kroll model
In the case of the GK model, as shown in appendix A 2, the DD function h GK (β, α) relies on a profile function π N (β; α) as given by Eq. (A7), which fulfills the normalization condition (A9).In the aim of making easier the explicit implementation of this condition, we have chosen this profile to be approximated by π ANN as it results from the ANN output o (2) (see Eq. (3.8)) properly normalized.Thus, the GK DD is accordingly approximated by where f (β, µ) is the parton distribution function given by Eq. (A11).Notice that in Ref. [49], in addition to implementing this normalization condition, the ANN expression (3.8) was modified in order to impose the vanishing of the output along the border α = 1−β, a property which is indeed satisfied by the h GK (β, α) in (A12).We decided however not to implement this constraint explicitly and let the network learn it by itself, considering that the vanishing at the boundary is a particular feature of this model, unlike the normalization condition and the parity in the α variable, which are properties that come from first principles.This decision comes at the cost of a much slower convergence of the optimization algorithm, due to the larger functional space that has to be explored during the training phase.
As opposed to the previous example, the DD function that the network has to learn is now everywhere continuous, since it vanishes at the boundary of its support.This property helps the ANN to better converge to the exact solution, yielding more precise results, as it can be appreciated in Figs. 9 and 10.There, in the left panels, we have chosen to represent the profile function, π ANN (β, α), the direct output from ANNs which the DD follows immediately from as given in Eq. (3.10).Furthermore, a smaller network was used, with N = 25 hidden neurons, and a smaller training data sampling of N sample = 5 × 10 3 GPD values generated in the DGLAP region using Eq.(A13), derived from Eqs. (A11,A12).In the case of a kinematic restriction given by λ = 0.2, the prediction for the inverse function is rather imprecise, although the deviation from the exact result of its Radon transform is not substantial (see the bottom panels in Fig. 10).
As stated above, we do not aim at a careful and conclusive comparison of the two different approaches herein followed to invert the Radon transform; namely, ANNs and FEM.Both of them have been implemented instead to avoid drawing biased conclusions about our methodology that may stem for the suitability of a particular implementation to the models chosen as an illustration.To this aim, we have independently tuned the setups in both ANN and FEM approaches such that they deliver good results when they compare to the exact ones.In particular, we have presented results in which the ratio of ANN to FEM input data is roughly 100.
Notwithstanding, some exploratory studies in the context of approximating the GK model through the ANN approach evidence that still acceptable results, while poorer, could be achieved by reducing the number of input data to 250.However, similar studies around the algebraic model seem to point out an issue arising from the use of non-polynomial activation functions, requiring a large number of neurons to reproduce the polynomial behavior of the algebraic DD.The use of extra neurons being associated to the need of more training data, this observation suggests a limitation of the ANN-based method.One could instead consider different activation functions, better adapted to a polynomial DD, but this would be an ad-hoc improvement relying on prior knowledge of the DD.On the other hand, techniques of data augmentation which are commonly used in machine learning applications could be also implemented.Indeed, we have triggered an analysis with 250 data points and used standard interpolation procedures to generate enough additional data to invert the Radon transform for the algebraic model GPD, thus getting results similar to those obtained with FEM.Nevertheless, this analysis neglects the impact of errors and noise, which will become also strongly amplified by the data augmentation.A careful and systematic analysis, considering noise and ANNs optimization would be needed to perform a reliable comparison with FEM.This is anyhow out of the scope of the present work.

IV. SUMMARY AND CONCLUSIONS
In this work we have demonstrated that, at fixed Mandelstam t, Lorentz covariance as encoded through the renowned polynomiality imposes a strong enough constraint so that the sole knowledge of GPDss at zero-and very-low skewness is enough to have them determined over their entire kinematic range.The formal proof of this result relies entirely on a theorem by Boman and Todd-Quinto [41], which we have revisited here and made it more transparent for the field of GPDs by its formulation in the appropriate language.
We have thus concluded that the knowledge of GPDs for x ∈ [−1, 1] and ξ ∈ [0, λx] is enough to fully constraint them over their entire (x, ξ) range.Furthermore, this idea has been implemented in practice through two different algorithms: One based on ANNs, and a second one using a more traditional strategy using FEM.Both approaches have been tested against standard models for GPDs, showing that accurate reconstructions shall be obtained in practice with input information as restricted as to ∼ 20 % of the DGLAP region.Given the increasing interest in the study of GPDs, we believe this approach to be valuable in complementing the limited kinematic range over which we expect outcomes to be obtained from Lattice QCD simulations (limited by Ioffe-time and hadron momentum accessible, see e.g.[56]) and experimental data.So far, looking towards a practical application of this methodology, the FEM-strategy seems to perform better than the ANN one: The number of inputs required by the ANN approach is ten times larger than the one needed in the FEM approach.The reason is that in the former case, the required information roughly linearly scales with the size of the network.Nevertheless, optimization of the ANN approach can still be achieved, as well as the use of data augmentation techniques implemented.This work being intended at drawing the attention to this novel technique and not to a refined kinematic completion of actual data, added to the lack of reliable GPD extractions, we defer such a comparative study, optimization of the numerical techniques and testing of robustness against noise to a future work.

Algebraic model
Capitalizing on a long effort towards the development of models for parton distributions respecting all of the QCD's basic properties [24,57,[59][60][61][62], the algebraic model for the GPD of valence-quarks in a pion was first presented in Ref. [25].There the authors relied on the overlap representation of GPDs [21,34] to model the DGLAP region in a way consistent with their positivity property [20,22].To that end, the essential ingredient is a parametrization of the hadron's light-front wave function (LFWF) Ψ (x, k ⊥ ; µ); which for a two-body system reads [21] H σ being the possible quark-helicity combinations, k ⊥ the average momentum of the quark pair along the transverse direction, as defined by the hadron's (light-cone) momentum, p; and ∆ ⊥ the transverse components of the fourmomentum transfer between quarks.The light-front wave function giving rise to the algebraic model was obtained in Euclidean space from an appropriate projection of the pion's Bethe-Salpeter wave function [63], which was in turn constructed using a Nakanishi representation for the Bethe-Salpeter amplitude [64,65] where M is a mass-scale, supplemented with an Ansatz of the form for the quark-propagator.Such a parametrization allows for a fully algebraic treatment leading to the evaluation of the two possible light-front wave functions in closed form [57], thus the denomination of the resulting GPD model as "algebraic".Feeding the overlap representation Eq. (A1) with those LFWFs yields Here from, the companion ERBL region was constructed in [25] following the dictations of Lorentz covariance as captured by the renowned polynomiality property and implemented through the (inverse) Radon transform [24,38].As a result, the underlying DD is found [25] h Alg.As a final remark notice that no explicit reference has been made to the scale µ in the presentation of this model.Its setting is implicit in the expression Eq. (A1), where a meson is being represented by two-body LFWFs.A discussion about this subject is definitely outside the scope of the present work, the interested reader can find more information in e.g.[43] and references therein.
• The profile of a GPD along the x-direction is basically determined by that of f (x; µ) ≡ H (x, ξ = 0; µ), i.e. the parton distribution function.
• The shape of a GPD along the ξ-direction characterizes the spread of parton momentum induced by momentum transfer.
A simple way to combine these two assumptions is to model GPDs through DDs of the form: Strikingly, the α-dependence of such profile function is entirely controlled by a single parameter, N , producing a rather inflexible modeling 14 .
In [32,[44][45][46] this approach is followed to build the renowned Goloskokov-Kroll model.There, the valence-quark GPD in nucleons is designed choosing N = 1 in the profile function π N (β, α), which reproduces the expected asymptotic behavior for the quark distribution amplitude [32]; and employing a phenomenological Ansatz for the parton distribution function,

FIG. 2 :
FIG.2:(FEM) Algebraic model -The left panel shows the DD hFEM(β, α) obtained through the procedure described in Sec.III A (solid lines), displayed as a function of the β variable for three different values of α = {0.2,0.5, 0.7} (0 ≤ β ≤ 1 − α).For comparison, the exact analytical result of Eq. (A5) is drawn through dashed lines.The sampling of the DGLAP region corresponds to ξ ∈ [0, λx] with λ = 1 (the entire domain).The right plot displays the GPD H (x, ξ) at the illustration value ξ = 0.5, obtained as the Radon transform of the DD shown in the left, compared to the analytical expression, Eqs.(A4) and (A6).

FIG. 7 :
FIG. 7: (ANNs) Algebraic model -The left panel shows the DD hANN(β, α) obtained through the procedure described in Sec.III B (solid lines), displayed as a function of the β variable for three different values of α = {0.2,0.5, 0.7} (0 ≤ β ≤ 1 − α).For comparison, the exact analytical result of Eq. (A5) is drawn through dashed lines.The sampling of the DGLAP region corresponds to ξ ∈ [0, λx] with λ = 1 (the entire domain).The right panel displays the GPD H (x, ξ) at the illustration value ξ = 0.5, obtained as the Radon transform of the DD shown in the left, compared to the analytical expression, Eqs.(A4) and (A6).

FIG. 9 :
FIG.9:(ANNs) GK model -The left panel shows the profile function, πANN(β, α) (solid lines), the direct ANN output which the DD follows immediately from as given in Eq. (3.10), obtained through the procedure described in Sec.III B and displayed as a function of the β variable for three different values of α = {0.2,0.5, 0.7} (0 ≤ β ≤ 1 − α).It compares strikingly well with the exact expressions, Eq. (A10) with N = 1(dashed lines).The sampling of the DGLAP region corresponds to ξ ∈ [0, λx] with λ = 1 (the entire domain).The right plot displays the GPD H (x, ξ) at the illustration value ξ = 0.5, obtained as the Radon transform of the DD shown in the left, compared to the analytical expression, Eqs.(A13) and (A14).
and c j -parameters determined from a fit to the CTEQ6m PDF[44,68] (see Tab. II).Combining the parametrization above with Radyushkin's DD Ansatz generates the Goloskokov-Kroll model for the DD of valence-quarks within nucleons h GK (β, α; µ) ≡ h