Two-nucleon S-wave interactions at the SU(3) flavor-symmetric point with mud≈ msphys A first lattice QCD calculation with the stochastic Laplacian Heaviside method

Two-nucleon S-wave interactions at the SU(3) flavor-symmetric point with mud≈ msphys: A first lattice QCD calculation with the stochastic Laplacian Heaviside method. We report on the ﬁrst application of the stochastic Laplacian Heaviside method for computing multiparticle interactions with lattice QCD to the two-nucleon system. Like the Laplacian Heaviside method, this method allows for the construction of interpolating operators which can be used to construct a set of positive-deﬁnite two-nucleon correlation functions, unlike nearly all other applications of lattice QCD to two nucleons in the literature. It also allows for a variational analysis in which optimal linear combinations of the interpolating operators are formed that couple predominantly to the eigenstates of the system. Utilizing such methods has become of paramount importance to help resolve the discrepancy in the literature on whether two nucleons in either isospin channel form a bound state at pion masses heavier than physical, with the discrepancy persisting even in the SU(3)-ﬂavor-symmetric point with all quark masses near the physical strange quark mass. This is the ﬁrst in a series of papers aimed at resolving this discrepancy. In the present work, we employ the stochastic Laplacian Heaviside method without a hexaquark operator in the basis at a lattice spacing of a ≈ 0 . 086 fm, lattice volume of L = 48 a ≈ 4 . 1 fm and pion mass m π ≈ 714 MeV. With this setup, the observed spectrum of two-nucleon energy levels strongly disfavors the presence of a bound state in either the deuteron or dineutron channel.

(Received 14 October 2020; accepted 4 December 2020; published 19 January 2021) We report on the first application of the stochastic Laplacian Heaviside method for computing multiparticle interactions with lattice QCD to the two-nucleon system. Like the Laplacian Heaviside method, this method allows for the construction of interpolating operators which can be used to construct a set of positive-definite two-nucleon correlation functions, unlike nearly all other applications of lattice QCD to two nucleons in the literature. It also allows for a variational analysis in which optimal linear combinations of the interpolating operators are formed that couple predominantly to the eigenstates of the system. Utilizing such methods has become of paramount importance to help resolve the discrepancy in the literature on whether two nucleons in either isospin channel form a bound state at pion masses heavier than physical, with the discrepancy persisting even in the SU(3)-flavor-symmetric point with all quark masses near the physical strange quark mass. This is the first in a series of papers aimed at resolving this discrepancy. In the present work, we employ the stochastic Laplacian Heaviside method without a hexaquark operator in the basis at a lattice spacing of a ≈ 0.086 fm, lattice volume of L = 48a ≈ 4.1 fm and pion mass m π ≈ 714 MeV. With this setup, the observed spectrum of two-nucleon energy levels strongly disfavors the presence of a bound state in either the deuteron or dineutron channel. DOI: 10.1103/PhysRevC.103.014003

I. INTRODUCTION
Quantum chromodynamics (QCD), the fundamental theory of nuclear strong interactions, encodes the interactions of nearly massless quarks and massless gluons which are confined into protons and neutrons, the nucleons, with a mass of O(1) GeV. These nucleons, which form the basis of matter, have a residual strong interaction that leads to the formation challenges, and prospects of connecting our understanding of nuclear physics to the SM through a coupling of LQCD and effective theories, see the recent review articles in Refs. [1][2][3].
The first application of LQCD to the two-nucleon system was in the quenched approximation (infinitely massivē qq virtual pairs) 25 years ago [4]. The next calculation was performed in 2006 using dynamical quarks with pion masses ranging from 350 m π 600 MeV [5]. Since that time, there has been a measured growth in the application of LQCD to systems with two or more baryons, with the first calculation of a bound two-baryon system appearing in 2010 [6,7]. However, significant challenges, most notably the exponentially bad signal-to-noise (StoN) ratio [8], have prevented substantive progress: some 15 years later, even with all the growth in computing power and algorithmic advances, there are still no computations of two-nucleon systems utilizing the Lüscher method [9,10] with pion masses lighter than m π ≈ 300 MeV [11].
However, we have seen the emergence of LQCD calculations of light nuclei at m π ≈ 800 MeV (up to A = 4) [12,13] which have been used to match to a pion-less effective field theory of few-nucleon interactions and used to calibrate and predict nuclei up to A = 6 [14]. We have also seen the development of a new method which first constructs a two-nucleon potential, known as the HAL QCD potential [15][16][17][18][19], from which a Schrödinger equation is solved and can then be used to predict the scattering phase shifts. If the HAL QCD method can be demonstrated to numerically agree with the Lüscher method, then it offers a promising alternative for computing the interactions of baryons from LQCD.
However, there is controversy in the literature concerning the aforementioned agreement and, in turn, there is a discrepancy on whether or not two nucleons form a bound state at medium and heavy pion masses. In short, most calculations of two-nucleons that utilize the Lüscher method observe the presence of (deeply) bound deuteron and dineutron systems at pion masses larger than ≈300 MeV [13,[20][21][22][23][24]-with the exception of the Mainz group which found a bound dineutron to be unlikely at m π ≈ 960 MeV [25]-while the HAL QCD Collaboration, utilizing their potential method, concludes that there are no bound states in either channel [17,26]. For a more detailed discussion of the controversy, see the recent review in Ref. [1].
Some have found it tempting to think this disagreement is a demonstration that the HAL QCD method has uncontrolled systematic uncertainties. However, while the Lüscher method provides a rigorous mapping between the finite-volume energy spectrum and the infinite-volume scattering amplitudes, there are potential unresolved systematic uncertainties in the application of the method, particularly in properly identifying the multiparticle energy spectrum. All applications that observe the existence of bound two-nucleon systems rely upon a local hexaquark creation operator at the source and dilute, two-nucleon momentum-space annihilation operators at the sink. The HAL QCD Collaboration has shown that the extracted spectrum in many of these cases does not pass basic consistency checks, demonstrating that there are larger systematic uncertainties than have been reported [27]. Combined with the StoN challenges and the very small elastic scattering energy gaps, this has led HAL QCD to speculate the calculations which observe bound states have been misled by "false plateaux" in the effective masses of the system, which can arise with non-positive-definite correlation functions [28]. 1 These non-positive-definite correlation functions require the assumption that the overlap onto the eigenstates of the system are dominated by a single interpolating field constructed from the projection of each nucleon individually onto a state of definite momentum at the sink side. Consider the center of mass (CoM) for simplicity, such that where q is the relative interacting momentum, which is determined through the Lüscher quantization condition; p is given by a noninteracting plane-wave momentum mode allowed in the finite periodic volume, p = 2π L n with n vector of integers; and c(p) is a weight which can be chosen arbitrarily if a single momentum mode p dominates the overlap (in practice, the existing calculations have chosen c(p) = 1). For weakly interacting systems, such as I = 2 ππ scattering, this type of interpolating fields works reasonably well as demonstrated by the consistency between the results from NPLQCD [32] and HadSpec [33] which utilized this simplistic operator and a full variational basis, respectively. For strongly interacting systems, such as the two-nucleon system, the results in the literature are insufficient to draw a conclusion one way or the other as to how well this simplistic basis of interpolating fields couples cleanly to the spectrum.
In contrast, with a variational basis of interpolating fields, one is not restricted to this assumption and instead utilizes a linear combination of creation and annihilation operators, where now the c(p) coefficients are determined through a diagonalization of the set of interpolating fields used and constrained by the numerical values of the correlation function. Even with the variational basis, experience shows that it is still necessary to have a large basis of operators which provide sufficient overlap onto the various states of the system. For example, in the I = 1 ππ system, one must include operators that look both like local ρ operators (qγ μ q) as well as displaced two-pion operators to obtain a spectrum that is consistent with the expected ρ resonance [34]. A similar study of the negative parity nucleon found that a nonlocal Nπ operator is required [35]. In the case of the two-nucleon system, it could be that the hexaquark operator is important for coupling to a deeply bound state, as speculated in Ref. [22]. 2 In the present work we take a first step toward trying to resolve this discrepancy by performing the first LQCD calculation of the two-nucleon systems in both isospin channels using a positive-definite correlation matrix with a variational basis of operators. The first application of a variational basis to two-baryon systems was applied recently by the Mainz group to the H-dibaryon and dineutron systems [25,36] in which significant tension with the local hexaquark results from NPLQCD was observed [6,13], although some tension with the HAL QCD potential method exists as well [7,26]. One possible explanation for this is the use of only two dynamical quark flavors (with a "quenched" strange quark) from Mainz, giving rise to potentially large systematic effects in the determination of the binding energy as compared to HAL QCD and NPLQCD. Furthermore, all calculations were performed with a single lattice spacing with different lattice actions.
For the present work, we focus on the two two-nucleon channels with total isospin I = 0 and 1, which we refer to as the deuteron and dineutron channel, respectively. We utilize the stochastic Laplacian Heaviside method [37], which is a stochastic variant of the distillation method [38]. We will summarize the method in Sec. II. As discussed in Sec. III, our calculation strongly disfavors a bound state in either the deuteron or dineutron channel. We discuss the implications of this work as well as the limitations and provide an outlook in Sec. IV. For the broader community to have confidence in the application of LQCD to nuclear physics, it is of paramount importance to resolve the issue underlying the contradictory results in the literature.

II. STOCHASTIC LAPLACIAN HEAVISIDE METHOD
A successful computation of two-nucleon energies relies heavily on the construction of optimal operators. Unfortunately, this leads to several sources of computational difficulty. First, the six valence quarks present in two-nucleon correlation functions give rise to a large number of Wick contractions. Next, to maximize overlap onto the individual finite-volume two-nucleon states, both nucleon interpolating operators should be projected onto a definite spatial momentum at the source and sink. Finally, two-nucleon interpolators which transform irreducibly under the finite-volume remnant of rotational symmetry require a summation over two-nucleon momentum combinations which transform among themselves under the little group.
The considerations above necessitate a flexible and efficient treatment of all-to-all quark propagation in which the quark propagator is determined between all spatial lattice points. The stochastic Laplacian Heaviside (LapH) method, which is employed here, has been successful for two-meson [39,40] and meson-baryon [41] correlators. This method enables a particular choice of quark smearing in which the quark fields are projected onto the space spanned by the lowest N ev eigenmodes of the gauge-covariant three-dimensional laplace operator [38]. Stochastic estimators with N r noise sources are then introduced in this N ev × N spin dimensional LapH subspace rather than the entire spatial lattice, significantly improving the variance [37].
The stochastic estimators are improved by 'dilution' [42], in which each stochastic field is partitioned into N dil fields, each of which has support on a unique subset of the LapH subspace. For this work we employ N ev = 384, full spin dilution, and 12 LapH eigenvector projectors, so N dil = 4 × 12 = 48. 3 To ensure unbiased estimates of products of quark propagators, independent stochastic fields are required for each valence quark line, so that estimates for the quark propagators are given by where (r, d ) denote the noise and dilution indices, respectively, ρ(y) is a stochastic combination of LapH eigenvectors and φ = Qρ is the result of a linear system solve, which are solved efficiently in GPU accelerated nodes with QUDA [43,44]. The computation of two-nucleon correlation functions is also simplified with stochastic LapH. Each two-nucleon interpolator trasnforming irreducibly is given by with definite isospin (I, I 3 ), little group irrep , irrep row λ, and total momentum P. The additional identifier distinguishes multiple linearly independent operators of this type, while the labels 1,2 for the single nucleon operators denote the individual I 3 and momenta p 1,2 . Note that all terms have p 1 + p 2 = P, with the p 1,2 in different terms related via little group transformations. The coefficients c II 3 λ 1 2 are determined according to Ref. [45] and are available upon request.
When using stochastic LapH estimates for quark propagators, temporal correlators factorize into "source" and "sink" functions which depend on the fields in Eq. (3) at a given Euclidean time separation. For single nucleons, these fields are where we have used the shorthand i k = (r k , d k ) to combine the noise and dilution indices.
The rank-three tensors of Eq. (5) are contracted over the i k to project onto definite (I, I 3 ) and treat all Wick contractions for each of the terms in Eq. (4). To produce an unbiased stochastic estimate, each of the six valence quark lines in a two nucleon correlation function require a different r k . However, for a given set of stochastic sources, each permutation and combination of six r k produces a new (in principle correlated) estimate. However, even using the minimal number of noise sources but moderately increasing the number of permutations results in a scaling of the statistical errors consistent with 3 In practice only the upper two spin components are used for the computation of states propagating forward in time, reducing the effective N dil by a factor two in the correlator construction. independent measurements [46]. We use the maximal number of permutations of six noise sources, accounting for nucleonlevel symmetries, giving a total of 180 permutations. Further details of algorithmic improvements and the optimization of our developed code are given in Appendix B.

III. LATTICE CALCULATION
We employ an isotropic clover-Wilson action with N f = 2 + 1 dynamical fermions that matches the setup being used by the CLS Collaboration [47]. We have generated the new C103 ensemble with m u = m d = m s ≈ m phys s , using the openQCD code [48] on the BlueGene/Q machine at LLNL (Vulcan). The lattice spacing is a ≈ 0.086 fm [49] with a lattice extent V = 48 3 × 96, periodic boundary conditions in space and thermal boundary conditions in Euclidean time. The C103 ensemble has four thermalized replicas (streams) of about 400 configurations, and each replica is started from different thermalized configurations and with different random seeds. Each configuration is saved after 2 HMC trajectories of length τ = 2 in molecular dynamic time units. The bare parameters of the lattice action are provided in Table I. The present computation of two-nucleon correlation functions uses four time sources [cf. t in Eq. (5)] on 802 configurations spanning two of the replicas, for a total of 3208 time-sources.

A. Correlation functions
At low temperatures the spectral decomposition of a twopoint correlation function is given by where z i,n = |O i |n is the overlap of the nth enegy eigenstate onto the vacuum through the annihilation operator O i . If the creation and annihilation operators come from a Hermitian-conjugate basis, then this correlation function is positive definite such that all z i,nz † i,n = |z i,n | 2 0. This simple fact greatly simplifies the analysis of excited-state contamination to the ground-state contribution in Eq. (6). Specifically it eliminates the possibility of having a false plateau which could be generated by opposite sign contributions to Eq. (6) from the lowest lying states in the spectrum.
For single-hadron correlation functions, a calculation which uses local point or gaussian-smeared (Wuppertal [50,51]) quark sources for the hadron creation operator while using momentum-space annihilation operators is still positive definite, since translation invariance ensures that, up to a multiplicative constant arising from the Fourier transform, the creation and annihilation operators are still Hermitian conjugate to each other. If the annihilation operator of a two-hadron correlation function was constructed with a sin-gle total-momentum Fourier transform, then it would also be positive definite for the same reason, but it is well known that such operators do not provide enough control over the eigenstates of the system to reliably extract the multitude of energy levels corresponding to the two hadrons interacting at different values of relative momentum. Therefore, two-hadron correlation functions are typically computed with each of the two final-state hadrons separately Fourier transformed to a particular final-state momentum. Unfortunately, such annihilation operators are no longer Hermitian conjugates of the spatially local creation operators, and thus the correlation functions lose their positive-definite quality.
The sLapH (and LapH) methods allow for the construction of Hermitian-conjugate pairs of creation and annihilation operators in which each hadron at the source and sink can be separately Fourier transformed. The advantage is twofold: a volume-averaging effect at the source as well as the sink, improving the stochastic precision and a positive-definite matrix of correlation functions.
Another well-known feature of two-nucleon calculations is that a ratio of correlation functions constructed as provides the best way to estimate the interaction energy. The stochastic correlation between the two-nucleon and singlenucleon correlation functions, C NN and C N , is very strong, and the ratio R benefits from a large cancellation of the singlehadron inelastic excited states: the effective mass of this ratio correlation function R yields a precise estimate of the interaction energy. However, prior to a time separation when the single-hadron correlation function has relaxed to the ground state, this ratio correlation function can be susceptible to false plateaus: the Taylor expansion of the single-hadron correlators in the denominator leads to opposite sign contributions to the ratio correlation function which are precisely the kind of corrections that can lead to false plateaus.
To describe this feature more precisely, suppose the singlenucleon correlation function was described by just the ground state and a single excited state, In this simplistic model, the two-nucleon correlation function would be given by where the sums over q,q, and q run over the elastic scattering modes between two ground-state nucleons, a ground and excited state, and between two excited states, respectively, as allowed by the Lüscher quantization condition. The interaction energies E 00 , E 10 , and E 11 depend upon the relative momentum between the states (q) and are typically 014003-4 much smaller than the inelastic excited-state energy E 1 − E 0 as these elastic scattering energies must vanish as L → ∞ except in the case of a bound state. The large-time behavior of the ratio correlation function is then approximated by where the b 00,n , b 10,0 , and a 1 are ratios of overlap factors. The observed near-exact cancellation of inelastic excited states in the ratio manifests as near-exact cancellation between the b 10,0 and −2a 1 terms on the second line of Eq. (10). Such cancellations with opposite signs can lead to false plateaus early in Euclidean time before the time-separation in which the single-nucleon correlation function is saturated by the ground state. 4 To avoid this problem, the NPLQCD Collaboration has long advocated that a sufficient amount of statistics should be used such that the interaction energies can be precisely determined without the need of relying upon the ratio correlation function, but rather the two-nucleon and single-nucleon correlation functions can be fit independently and E 00 (q) can be extracted under jackknife or bootstrap resampling of the ground-state energies such determined [6,52,53].
For many calculations, including the present one, the statistical precision is insufficient to achieve a multisigma determination of the interaction energy from fits to the twonucleon and single-nucleon correlation functions separately. A simple measure of the feasibility of such a strategy is whether one can use the ratio correlation function R(t ) only at sufficiently late times that the single-nucleon has plateaued and still achieve a convincing energy extraction of the interaction energy [24]. This is almost the case in the present calculation, but we require the use of a few time slices [O(0.17-0.35) fm] prior to the ground-state saturation of the single-nucleon correlation functions.
The desire to leverage the positive-definite nature of the two-nucleon correlation functions with sLapH, and that, with the present stochastic precision, we must rely upon values of the correlation function prior to the single-nucleon being saturated by just the ground state, motivates the following set of correlation functions and their parametrizations. First, we factorize the spectral decomposition by pulling out the ground-state contribution as a prefactor. For a single nucleon of momentum q, we parametrize the correlation function as with an implicit sum over all excited states n > 0. The groundstate energy is E q 0 and The ground-state overlap factor is given by z q,0 , and the z q,n are the ratio of overlap factors of the nth state to the ground state which all satisfy the bound z q,n > 0.
To take advantage of the positive definiteness of the twonucleon correlation function and also the cancellation of excited states in the ratio correlation function, instead of fitting the two-nucleon correlator, we fit the ratio correlator but with the following functional form with implicit summations over the l, n, and m excited states. The various new terms in this expression are the energy gap between the lth two-nucleon excited-state and the two-nucleon ground-state energy. The lth energy gap can arise from either an elastic scattering state of the two groundstate nucleons or when one or both nucleons are in an inelastic excited state; , the ratio of the ground-state two-nucleon overlap factor to the product of singlenucleon overlap factors; (4) r 2 l = (z NN l /z NN 0 ) 2 > 0, the ratio of the lth two-nucleon overlap factors to the ground-state two-nucleon overlap factor, which are all positive.
With this fit function, Eq. (13), if an equal number of "inelastic" excited states are included in the numerator as in the denominator, as well as possibly extra "elastic" excited states, then the fit function can naturally capture the cancellation of the inelastic excited states from the single nucleon that are observed to also pollute the two-nucleon correlation functions at early times, without forcing this cancellation to be exact. In the next section, we will demonstrate the stability of the analysis with respect to the time-range and number of states used in the analysis.

The pion
We first look at the pion correlation function to estimate m π . A single operator was used to construct this correlation function which is fit to the cosh version of Eq. (6) to take into account wrap-around effects, Figure 1 shows an N-state stability plot of the ground-state pion mass versus t min with the chosen fit (given by the filled symbol) coming from N = 3 states and t min = 3. The fits were performed with a Bayesian constrained analysis [54], resulting in a determination of the pion mass in lattice units of

Single nucleon analysis
We then move on to study the mass of single nucleons. The single-nucleon correlation functions were fit with Eq. (11), also using a Bayesian constrained analysis [54]. The groundstate energy prior is estimated from the long-time behavior of the effective mass and the ground-state overlap factor is estimated from an effective overlap construction: The prior central values are taken from the mean of these at a late reference time of t = 10 and the prior widths are taken to be 10 times the uncertainties on the relative effective quantity at this time. For the excited-state energy splittings, we use a log-normal distributed prior such that the total energies are ordered. The mean values of the priors are estimated at twice the pion mass with a width that comes down a little lower than the first Nπ p-wave scattering state. The central value of the lth state energy and the excited-state overlap factors are then priored as where E q is the mean value and l = 1 is the first excited state. We use the notation (p c , p w ) to represent a prior with central value p c and width p w assuming that its distribution is Normal unless the prior name is ln(·), in which case a lognormal distribution is assumed. In Fig. 2, we show the resulting ground-state energy of the nucleon at rest versus t min and the number of excited states. It is sufficient to chose n = 3 states (2 excited states) to fit the single nucleon as early as t min = 2 to achieve an answer that is consistent with the general stability displayed. We observe very similar stability of the ground-state mass for all of the boosted single-nucleon correlation functions which are shown in the github repository accompanying this publication  [55]. In all cases, we observe an n = 3 fit from t min = 2 is in excellent agreement with the general stability of the ground state as well as n = 2 with t min = 5. We use these two choices for our analysis and to explore systematics associated with the choice of the number of states and fit range. We find in lattice units

Two-nucleon analysis
To determine the two-nucleon eigenstates, a correlation matrix, , is formed from the set of operators,Ô i (t ), which have been projected onto a given (P, ). Solutions to the following generalized eigenvalue problem (GEVP), for given reference times t d , t 0 , may then be used to rotate the correlation matrices, to a basis consisting of linear combinations of operators having optimal overlap (for the given basis) onto the eigenstates of the system. From this set of correlation functions we form the ratio R using Eq. (7) with single-nucleon correlators corresponding to momenta p n of the nearest noninteracting energy level for a given state, n. This ratio is then fit to the functional form of Eq. (13) with similar Bayesian methods as the single-nucleon case. Priors for the various parameters are chosen as follows: (1) E NN 0 : similar to the single nucleon, these are estimated from the effective mass of the ratio correlation function at a reference time of t = 10 with the prior mean estimated from the mean of the effective mass 014003-6 and a prior width that is 10 times larger than the uncertainty of the effective mass; (2) E NN l,0 : We add two towers of excited states, one corresponding to inelastic excited states with prior means and widths estimated as with the single nucleon inelastic excited states and a second tower with energy gaps estimated to arise from elastic scattering states. Since we expect the GEVP to remove the low-lying elastic scattering excited states, the gap to the first excited state is estimated to be several levels above the ground state with a prior width that allows it to be as small as the first anticipated excited scattering state or as large as an inelastic single nucleon excited state.

C. Phase shift analysis
The Lüscher finite-volume formalism [9,10], and its extension to moving frames [56,57] and various generalizations [58][59][60][61][62][63][64], allows one to faithfully connect the finite-volume two-particle spectrum to the corresponding infinite-volume scattering phase shifts at the momenta associated with those energies. The reduced hypercubic symmetry of the lattices, however, mixes the partial waves associated with spherical symmetry in infinite volume. Thus, a system which has been projected onto the given irreps of the hypercubic group will, in general, have nonzero overlap with an infinite number of partial waves, and therefore a truncation in the partial waves considered is required. Fortunately, at low energies, we expect contributions from a partial wave, l, to fall off as E −l , justifying a truncation to the lowest partial waves that couple to a given finite-volume irrep.
In this first look at NN interactions with sLapH (the present work), we ignore all partial-wave mixing induced by the finite, periodic volume and restrict ourselves to considering the swave interactions (a standard choice in the field so far for two-baryons). We also restrict the energies considered to those below the t-channel cut, q * m π /2 where q * is the magnitude of the momentum of each nucleon in the center-of-mass (CoM) frame. We will relax these restrictions and assumptions in a forthcoming paper where we explore the partial wave mixing and energies up to the inelastic pion-production threshold.
For the low energies considered here, the K-matrix for each partial wave is expected to be well described by a smooth polynomial in q * 2 , known as the effective range expansion (ERE), where δ(q * ) is the scattering phase shift, a is the scattering length, r 0 is the effective range, and r n , n > 0 are higher-order shape parameters which give the short-distance details of the potential. In terms of the potential, the convergence of the ERE is expected to be rapid for qR 1, where R is the range of the potential.
Under the assumption that partial wave mixing is negligible, the Lüscher quantization condition provides a one-to-one mapping between the spectrum and q * cot δ(q * ), which for the s-wave is where γ is the ratio of the energy to the CoM energy, γ = E /E * , and Z d 00 is a generalized ζ function defined in Ref. [56], characterized by the boost vector The input values q * are derived starting from the lattice extracted energies, E = E * 2 + |P| 2 , where E * is then related to q * via There are several ways to proceed in fitting the numerical results to extract the ERE parameters; here, we discuss three. In the first method, referred to as the determinant residual method, the Lüscher quantization condition (truncated to some maximum partial wave and parameterized appropriately) is used directly to form the residuals of the χ 2 function, which is subsequently minimized [65]. Using the quantization condition directly in the fitting procedure is a natural way to include multiple partial waves. A convenient feature of this method, as opposed to methods that directly solve the quantization condition, is that the generalized ζ functions can all be computed once before the minimization process starts. However, one cannot avoid recomputing the covariance matrix each time the parameters are adjusted during the fit, since the model cannot be separated from the data. In subsequent papers, we will explore this method in more detail when we consider the partial wave-mixing induced both by the finite volume as well as the physical mixing of the 3 S1 -3 D1 waves in the deuteron channel.
The second method, which we refer to as the q cot δ method, has been common in the application to two-baryon systems under the truncated partial-wave expansion (also considered here). First, one converts the energy levels, which are determined typically with Gaussian distributed noise, to values of the CoM momentum Eq. (24) which are used to determine the phase shift values through Eq. (22). These values of the q * cot δ(q * ) are then fit with the ERE Eq. (21), to determine the values of a, r 0 , and other shape parameters that describe the low-energy interactions.
As is well known, the ζ functions appearing in Eq. (22) have nonlinear dependence upon q * in the typical range over which the momenta can be determined. This transforms the roughly Gaussian distributed determination of E (and hence q * ) into a highly asymmetric distribution of q * cot δ(q * ). Moreover, it is common to perform the ERE fit by treating q * cot δ(q * ) data points as having uncertainties in both the x and y directions. However, under the assumption of no partial wave mixing, the Lüscher quantization condition, Eq. (22) 014003-7 provides a one-to-one mapping between the x (q * 2 ) and y [q * cot δ(q * )] values, such that there is really only a single variable with uncertainty. For sufficiently precise determinations of q * values such that a linear approximation to Eq. (22) describes the results, treating the pairs of [q * 2 , q * cot δ(q * )] points with correlated uncertainties is expected to faithfully reproduce the true uncertainty with the standard linear transformations for handling x and y uncertainty. However, when the nonlinearity of Eq. (22) is important, this method can produce biased results. See, for example, Ref. [27] for a treatment that enforces this constraint from Eq. (22).
We propose an alternative method that properly handles this nonlinear relationship, which we refer to as the spline/gradient method. Consider a bootstrap (BS) resampling of the values of (X i , pairs on irrep i. For a given BS sample, one can define the squared distance between this point and the intersection of the ERE function with the ith irrep as the distance along the curve defined through Eq. (22), which we denote Y i = f (X i ) for convenience, with the distance given by where f i is the derivative of f along the curve, z i is a generalized coordinate along the curve, z i,β is the location of intersection of the ERE function with the ith irrep, and z i,bs is the coordinate of the bsth sample of irrep i. These BS distances can be used to construct the objective function that penalizes data discrepancy between all irreps with the intersection of the ERE parametrization. Then, an uncorrelated, unweighted least-square penalty for BS sample bs would be given by i s i ( f i , z i,β , z i,bs ) 2 . To estimate the appropriate covariance, we leverage the delta method [66] that scales the covariance from X using the gradient of f (X ). For a normally distributed set of X variables of mean μ X , with X is the covariance of the X variables over the irreps, the delta method states for a differentiable f , the distribution of f (X ) follows where ∇ f (μ X ) is the vector of f i over the irreps. And thus, the correlated objective function we can minimize to estimate the ERE parameters (β), for a given BS sample is given bỹ where the inverse "covariance matrix" for sample bs is While the variance of X is fixed for each BS sample, the gradients are evaluated sample by sample. To estimate the distance and gradient along the Lüscher curve, a cubic spline is fit to each pair of values using the BS samples. The ERE parameters are then estimated with the resulting BS distribution ofβ BS . For more detail, see Appendix C and Ref. [67].
A third method we consider is essentially the "spectrum method" described in Ref. [65], which directly minimizes spectrum residuals, thus avoiding skewed q cot δ distributions. This method is composed into two steps: An outer step, effectively computing the spectrum as a function of ERE parameters and an inner root-finding step, solving for the value of q * 2 which satisfies the Eq. (21) = (22) for given q cot δ parametrization. This inner step performs a leastsquares minimization for fixed ERE parameters that minimize the residual of the predicted q * 2 values with those determined from the spectrum. The outer step computes a function f (a, r 0 , ...; irrep, P, n) that returns the value of q * 2 for a given irrep at boost P of the nth principle correlator, which are compared against the spectrum by q * 2 irrep,P,n = E * 2 /4 − m 2 N which is the numerical value of q * 2 for the same state. We then minimize the χ 2 with respect to the ERE parameters where β = {a, r 0 , . . . } and i, j are master indices running over the combinations of irrep, P, n. The covariance is constructed from the bootstrap distributions of q * 2 i with respect to the bootstrap means q * 2 i , There are many other variants of extracting the physical parameters from the two-particle spectrum which are discussed in the literature; see, for example, Refs. [68][69][70][71][72][73][74][75][76]. As we will show in Secs. III C 1 and III C 2, of the two methods used in this work, the spectrum method is less susceptible to outliers, since the q * 2 values determined are bound to finite intervals and have a near Gaussian distribution following from their parent E distributions, and therefore, the resulting uncertainty on the extracted ERE parameters is smaller. This is in contrast to the values of q cot δ determined with Eq. (22) as these distributions become highly nonsymmetric and heavy tailed. Nevertheless, the spline/gradient method we introduce reproduces the same values of the ERE parameters and is less susceptible to the heavy-tailed fluctuations than the more standard analysis of q cot δ values one finds in the literature.

Deuteron channel
To extract results for the deuteron channel, we consider all irreps whose lowest partial-wave contribution corresponds to s-wave scattering of nucleons with isospin I = 0 and spin s = 1. To determine the spectrum, we first perform a stability analysis of the two-nucleon correlation function as a function of t min and the number of "elastic" excited states used above and beyond the n = 2 states used for the single nucleon. In Fig. 3, we show sample stability plots for fits to the NN ratio correlation functions in two different irreps. In all irreps, we find that the choices lead to an optimal, or near optimal fit as measured by three factors: (1) Good quality of fit, Q; (2) For a given t min , the highest weight w = e logGBF as measured by the relative Bayes Factor, see Ref. [77] for further discussion on this point where fits with different amounts of data are also considered; (3) Consistency with the long time values of the effective mass of the ratio correlation function.
We opted to select the values of t min = 5 and n el = 0 to be the same for all irreps analyzed to minimize the chance of accidentally biassing the result through a more fine-grained optimization.
Stability plots for all irreps can be found with the git repository accompanying this publication [55]. In Appendix A in Table II, we list the irreps and the resulting groundstate energies of the two-nucleon system and corresponding boosted single nucleons as well as the processed values of q * 2 and q * cot δ in m π units used in this analysis.
In Fig. 4 we show the resulting values of (q * , q * cot δ) from all irreps along with an ERE fit using the spline/gradient method (left) and the spectrum method (right). To cleanly display the correlated distributions of (q * , q * cot δ) pairs, we bootstrap our energy results and show the resulting 68% confidence intervals in the data. We note that the results from different irreps agree nicely within their respective energy ranges. The purely s-wave contributions from each irrep are expected to be consistent with each other, with any discrepancies arising from mixing of higher partial waves. The smooth q cot δ behavior taken from multiple irreps thus gives some confidence that mixing from higher partial waves is negligible within our errors.
We find that the fits to the ERE to q * 2 , next-to-leading order (NLO) and q * 4 , next-to-next-to leading order (NNLO) give consistent results for the phase shift within our energy range at our given uncertainties. This, coupled with the smooth behavior of the data, strongly indicates a convergence of the expansion within the energies considered. Our results for the effective range parameters are as follows:

5.5
+1 resulting in the two solutions Taking the results from the more stable spectrum analysis, the plus solution is found to be In principle, this could correspond to a bound-state solution. However, this solution lies well outside the range where our results are constraining the amplitude (it is the crossing of our q cot δ and − −q 2 at large, negative value of q 2 ). However, this cannot be a physical bound state as the slope of the q cot δ curve is larger than the tangent of the −iq curve at this crossing, as discussed in detail in Ref. [27]. The negative solution lies in the range of our results, is purely imaginary with a negative sign, and thus corresponds to a virtual bound state. This state is expected physically for an attractive interaction with a large, negative scattering length. As the strength of the interaction increases, such that the system would form a 014003-9 TABLE II. Energy levels and phase shifts for the deuteron channel. These are determined with a 2-exponential fit to both the single-nucleon (t = [5,20]) and ratio two-nucleon correlation functions (t = [5,15]) defined in Eqs. (11) and (13). The total momentum is given by the boost vector d, Eq. (23). The state indicates the resulting principle correlation function after performing the GEVP. d 1,2 are the squared boost vectors of the individual nucleons used in the ratio correlation function which is fit to determine the interacting energy E NN and total energy E NN which is converted to the CoM frame E * NN and processed to get q * cot δ.  (24) bound state, the virtual bound-state solution would move toward zero and become a positive imaginary solution which is the bound state. There have been few previous identifications of virtual states with lattice QCD in the two-meson sector [78,79]. Such a bound-state solution would have a positive scattering length, such that the intercept of q cot δ at threshold (q 2 = 0) would be negative. This implies one should find negative values of q cot δ for small, positive q 2 , which we do not find with our results. Thus, the results of this computation strongly disfavor the existence of a bound deuteron at this pion mass, and with this particular action at finite lattice spacing.
These results are not sufficient to rule out a bound state in the system. For example, the operator basis we have chosen, which does not include a hexaquark operator, may not have sufficient overlap with a bound state to correctly extract energy levels. If this were the case, then all of our results would have to systematically shift downwards by several sigma with the inclusion of this otherwise missing operator. In a forthcoming publication, we will investigate the impact of including such a hexaquark operator in the basis, which has yet to be included due to its numerical cost.

Dineutron channel
For the dineutron channel, we similarly chose all irreps corresponding to I = 1, s = 0, having overlap onto the swave as the leading contribution at low energies for values of q * < m π /2. After performing a stability analysis, we also observe the same choice of t min = 5 and n el. = 2 provides an optimal or near-optimal fit for all irreps. The irreps and resulting energies and processed values of q * 2 and q * cot δ are given in Table III in Appendix A.
The resulting values of q * cot δ also suggest minimal partial wave-mixing and a smooth q * 2 dependence. The ERE analysis with the two methods described above is displayed in Fig. 5 and yields the following parameters:

Method
Order m π a m π r 0 m 3 π r 1 q cot δ NLO −6.6 Taken together, our results, while not conclusive, strongly disfavor the presence of a bound state in either the deuteron or dineutron channel.

IV. DISCUSSION AND OUTLOOK
We have presented the first lattice QCD calculation of two-nucleon systems using the stochastic Laplacian Heaviside method [37]. There are only two such two-baryon calculations using a variational operator basis in the literature, the other being an application to the H-dibaryon system and the dineutron system [25,36] using the more common "distillation" method [38]. In this work and Refs. [25,36], the pion mass is rather heavy. In our case it is set approximately equal to the physical strange quark mass resulting in m π ≈ 714 MeV, and in Ref. [25] it corresponds to m π ≈ 960 MeV in an N f = 2 calculation. 5 Even at the SU(3) flavor-symmetric point with a heavy pion mass, the pion is the lightest propagating degree of freedom emerging from QCD. It is therefore natural to measure other length scales with respect to the pion mass and possibly natural to expect that the range R of the potential would approximately be given by m −1 π . While the scattering length can take on any value (with the unitary limit, a → ∞, being the crossover between BEC and BCS like systems), the effective range is typically the size of the potential.
In the present work, we have found that in both the dineutron and deuteron channels, the effective range is r 0 m π ≈ 5-9, which is an unusually large value. The delta-nucleon mass splitting is another small energy scale, and it is found that the splitting decreases with increasing pion mass at a mild rate such that it is m − m N ≈ 200 MeV at the SU(3) flavorsymmetric point near the physical strange quark mass [80]. While this is a small energy scale compared to m π , it is not clear how this translates into a range of the two-nucleon potential, but one should keep in mind that QCD does naturally produce such an energy scale. NPLQCD similarly found large values of the effective range in their calculations at m π ≈ 800 MeV [21,24], though not quite as large.
Causality and unitarity can be used to place a bound on the size of the effective range in terms of the range of the potential [81] with corrections arising from a finite scattering length [82], 6 Since we do not know the range of the potential, R, as it is dynamically generated by QCD (and it is not an observable), we can invert this relation and use our determination of the effective range to place a lower bound on R. Using the NLO ERE parameters from the spectrum fit of the deuteron, the real Values of q cot δ in the deuteron channel in the present work compared with NPLQCD results at m π ≈ 800 MeV. While there is some communal expectation that discretization effects should be subdominant, one should be cautious to note that these computations have been performed with different lattice actions at only a single lattice spacing each: in the present case, the CLS clover-Wilson action [47] and with NPLQCD, a single-stout smeared [86], tadpole improved [87] clover-Wilson action [13]. Assuming the discretizaton effects are relatively small, and that the phase shift does not have a strong pion mass dependence, these results are in conflict. solution of Eq. (37) provides the limit which is roughly the same or larger than the size of the nucleon: as the pion mass increases, the pion cloud of the nucleon shrinks till the size of the nucleon roughly corresponds to a size r N ≈ −1 QCD , similar to this value. Perhaps the range of the potential is set by the nucleons coming "into contact" with each other.

A. Comparing with the literature
Several groups [13,[20][21][22][23][24][25] have used the Lüscher method to compute the scattering phase shifts of the two-nucleon systems, deuteron and dineutron, at pion masses larger than 300 MeV. In all cases, except the Mainz group, they have found (deeply) bound states with a reasonable degree of certainty. However, the HAL QCD Collaboration [17,26] has used their potential method to conclude that there are no bound states (again at higher than physical pion masses). Below we discuss possible sources of discrepancy.
We will focus our comparison with the results from NPLQCD at m π ≈ 800 MeV [21,24] as their results are the most similar to ours also being at the SU(3) flavor-symmetric point near the physical strange quark mass. 7 They have found that both the deuteron and dineutron channels form bound 7 NPLQCD also has results at m π ≈ 450 MeV with bound states, however, these results are self-inconsistent as pointed about by HAL QCD [27] as well as with a low-energy scattering analysis [84], and so we do not compare with them. The more recent update of the m π ≈ 450 MeV results [85] addresses these issues and leads to larger uncertainties in the constraint of the scattering parameters, though they still identify bound states in the di-nucleon channels.
states with a relatively large binding energy of B ≈ 20 MeV at the SU(3) flavor-symmetric point. In Fig. 6 we show our present determination of q cot δ in the deuteron channel along with the values from Ref. [24].
As is clearly visible from the figure, the results from NPLQCD and the present work are not compatible with each other: To have a bound state, there must be negative values of q cot δ at positive values of q 2 for the ERE to cross the − −q 2 line with a slope smaller than the tangent to this line [27]. Further, we have no evidence of such large negative values of q 2 as does NPLQCD (the clustering of (green) points around q 2 cm /m 2 π ≈ −0.08). One should always be cautious comparing results at finite lattice spacing, at least from calculations with different lattice actions. There is an expectation in the community that discretization effects are a subdominant source of systematic uncertainty. If this is found to be true (with future work), then there must be another unresolved systematic uncertainty.
While our results strongly disfavor the existence of a bound deuteron or dineutron at this pion mass, they are not sufficient to settle the discrepancy in the literature between HAL QCD [17,26], NPLQCD [13,21,31], Yamazaki et al. [11,20], and CalLat [22]. The possible source of the existing discrepancy can be any of the following: (1) There are larger systematic uncertainties in the HAL QCD method and/or the local two-nucleon creation operators typically used by NPLQCD and Yamazaki et al. than are currently understood. HAL QCD has speculated that these calculations suffer from "false plateaus" that arise through an unfortunate linear combination of elastic scattering states [28]; (2) Our variational calculation has not utilized a local hexaquark creation/annihilation operator. It is possible that such a local operator may couple to a deep bound state with a significantly larger overlap such that, without it, the operator basis is not sufficient to identify the state. If this were the case, then the addition of the hexaquark operator would have to shift the resulting spectrum in all the irreps presented in this work down in a coordinated way that does not spoil the otherwise very smooth q 2 dependence observed. (3) None of the calculations have been performed with more than a single lattice spacing and so there could be larger-than-expected discretization effects which prohibit the rigorous identification (or exclusion) of bound states.
All of these sources of potential systematic uncertainty may need to be explored in more detail to resolve the discrepancy. A good first start would be the use of all methods in the literature on the same set of gauge configurations such that one could eliminate all the systematic uncertainties aside from the method in the determination of the spectrum. Provided the dispersion relation is continuum like in the range of momentum considered, the Lüscher method remains valid. HAL QCD has performed the computation of the spectrum and interactions using their potential method and the Lüscher method with a local source [88], which led them to conclude 014003-12 the local source method has elastic excited-state pollution leading to a false plateau.
In a forthcoming publication, we will compare and contrast our present work (with more statistics) to the local source method used by NPLQCD and Yamazaki et al., as well as the displaced nucleons used by CalLat. With both methods, we will implement the HALQCD potential approach such that we can isolate possible sources of systematic uncertainty arising in the method. We also plan to implement a hexaquark operator into the basis to see how much it shifts the spectrum, if at all. Resolving this discrepancy is critical if we are to have confidence in the application of LQCD to multinucleon systems, and more importantly for the NP and HEP longrange science goals, to be able to compute the response of few-nucleon systems to SM and BSM currents. NPLQCD has invested significant effort in computing such matrix elements, see for example the recent review [89], but if the spectrum has been misidentified, it is not clear how much these systematic uncertainties would modify their results and conclusions.
The phase shift analysis code and resulting bootstrap results of the data presented here are included with the github repository [55]. The correlation functions will be released with a future publication that includes a larger number of correlation functions with a full partial-wave analysis.  The deuteron and dineutron irreps, extracted energies, and processed values of q 2 and q cot δ are provided in Table II and  Table III, respectively.

APPENDIX B: COMPUTATIONAL AND ALGORITHMIC OPTIMIZATION
Several of the kernels required in the stochastic LapH workflow have been implemented to run on NVIDIA V100 GPUs using the QUDA library [43]. We constructed new routines that compute the cross product and contraction of color vectors, as well as specialized routines that compute time-slice reductions. Due to the reduction strategy we employ, the bulk of these contractions is expressed in terms of BLAS3 (matrix-matrix) operations, automatically improving the arithmetic intensity of the computation. These operations take the form for dense matrices A, B, C, and batch index i. The matrix A 0 is constant with respect to the batch index. As such, we wrote interfaces in QUDA for the cuBLAS function stridedBatchZGEMM which minimizes data-transfer latency between the host and device by caching the A 0 matrix. We found that by using the 4 V100 accelerators on a single Lassen (LLNL) node we gained speed-up factors of ≈ 30x over using the host IBM Power-9 CPUs. All the code we constructed specifically for this computation is publicly available in the QUDA GitHub repository. The contraction, time-slice reductions, and cuBLAS interface components are in mainline QUDA.
Overall, correlation function construction requires a large number of tensor contractions, and therefore dedicated computational optimizations are vital. A total of 32 960 correlation functions is computed on each gauge configuration. A strategy to minimize the amount of computational work by optimizing the contraction order as well as re-using common subexpressions is described in Ref. [96]. Following the nomenclature of that reference, 200 370 960 diagrams are left to evaluate after consolidating duplicates in the initial set of 2 052 792 360 diagrams. Eliminating common subexpressions in the set of remaining diagrams reduces the number of computationally dominant contractions with N 4 dil scaling from 344 163 600 to 9 969 360 for a combined speedup by roughly a factor 350× compared to the naive evaluation of all tensor contractions. 014003-13 TABLE III. Energy levels and phase shifts for the dineutron channel. These are determined with a 2-exponential fit to both the singlenucleon (t = [5,20]) and ratio two-nucleon correlation functions defined in Eqs. (11) and (13).

Unbiasedness in weighted regression
In this section we offer a short refresher on the guarantees and assumptions behind classic regression. The statistics regression model specifies the relationship between data (X, Y ) and the associated parameters of interest β via an additive error, where Y is a n × 1 vector, X is a n × p matrix, β is a p × 1 vector, and is a n × 1 random vector. Intuitively, n is the number of data points and p is the number of coefficients in the regression. The usual inference task is to infer β using the noisy observed values of Y . Any estimate for β is often denoted asβ and an estimate is unbiased if E (β ) = β. The weighted regression estimate minimizes the squared loss between the Y values and the inferred line Xβ, β reg = arg min W 1/2 (Y − Xβ ) 2 = (X T W X ) −1 X T WY . We will show thatβ reg is unbiased if Eq. (C1) and the following assumption holds: The addition of mean 0 error in the Y dimension intuitively justifies why minimizing the squared loss in Y alone would yield the unbiased results. Mathematically the derivation further highlights the assumption on treating X as given implied in both Eqs. (C1) and (C2): The most common choice for W = −1 where Cov( |X ) = but its choice affects the variance forβ instead of its unbiasedness. It is also worth pointing out that Normality nor symmetry in are necessary for the unbiasedness to hold.

Assumptions for unbiasedness are not met
In this section we explain the possible errors in fitting the classic weighted least squares for our curve fitting exercise at hand. The first violation is the existence of error in the horizontal axis for each data point. Given the derivations above, if X is measured with error as well, our objective would incorporates both error in X and Y instead of solely minimizing errors in the vertical axis.
The second deviation from the classic setting is the strong nonlinear relationship between the errors in X and Y within each irrep. Although we have errors in both axes, knowing the error in one dimension allows us to infer the error in the other dimension. The seemingly two-dimensional error is therefore more appropriately modeled as having a single source of variability.

Our modified weighted least squares
Our modification essentially converts the problem at hand into the classical settings by redefining the error in terms of squared distance along the curve rather than the vertical axis. Figure 7 demonstrates this modified distance between a data point and any candidate regression line, implied byβ FIG. 7. Plotting the bootstrap samples within a single irrep, demonstrating the difference between the modified distance vs. the usual regression.

014003-14
is the distance along the blue curve between the red points X * i,β to X i, j . Classic regression, however, computes the vertical distance between Y i, j and X i, jβ which forces the regression line toward implausible values.
Let f i denote the functional relationship between X i,· and Y i,· for irrep i so that Y i,· = f i (X i,· ). The general equation for the length along a differentiable curve between 2 points is s( f i , a, b) = | b a 1 + f i (X ) 2 dx|. Let the point of intersection between f i and the regression line will be denoted as (X * i,β , X * i,ββ ). An unweighted least-square penalty would then be i s( f i , X i,β , X i, j ) 2 , where j denotes the index for different bootstrap values.
To weigh the different irreps, we estimate the covariance using a similar approach as in the delta method [66]. The delta method states that if (X − μ X ) ⇒ N (0, X ), then for a differentiable f we have We recycle the same f from calculating the lengths before, and we estimate the covariance of X treating each irrep as a different dimension, specifically: ∇ f (X ·, j ) = whereX i = 1 n j X i, j is the average over the bootstraps, k is the total number of irreps, and W is the weights we will use to scale the modified errors. It is worth noting that the delta method is more practical than directly estimating the covariance empirically because the the distance along the curve between several bootstrap samples on certain irreps are infinite if they are across the critical point of the cot function. These points make empirically estimating the covariance of Y infeasible.
The estimation of f i and f i were done using splines which are piecewise cubic polynomials that ensure the derivative is continuous.