Towards the Solution of the Many-Electron Problem in Real Materials: Equation of State of the Hydrogen Chain with State-of-the-Art Many-Body Methods

We present numerical results for the equation of state of an infinite chain of hydrogen atoms. Avariety of modern many-body methods are employed, with exhaustive cross-checks and validation. Approaches for reaching the continuous space limit and the thermodynamic limit are investigated, proposed, and tested. The detailed comparisons provide a benchmark for assessing the current state of the art in many-body computation, and for the development of new methods. The ground-state energy per atom in the linear chain is accurately determined versus bond length, with a confidence bound given on all uncertainties.


I. INTRODUCTION
One of the grand challenges in modern science is the accurate treatment of interacting many-electron systems. In condensed phase materials, the challenge is increased by the need to account for the interplay between the electrons and the chemical and structural environment. Progress in addressing this challenge will be fundamental to the realization of "materials genome" or materials by design initiatives.
Often the physical properties of materials and molecules are determined by a delicate quantitative balance between competing tendencies, so that accurate computations are required to predict the outcome. The theoretical framework for these calculations, the manyparticle Schrödinger equation, is known [1]. However, the solution of the Schrödinger equation in a many-electron system presents fundamental difficulties originating from combinatorial growth of the dimension of the Hilbert space involved, along with the high degree of entanglement produced by the combination of Fermi statistics * shiwei@wm.edu and electron-electron interactions. Computational methods need to reach beyond the incredible success afforded by density functional theory (DFT), and capture electron correlation effects with sufficient accuracy across different physical parameter regimes.
No general, numerically exact method presently exists that can treat many-electron systems with low computational cost. Except for special cases, known methods either have systematic errors which cannot be easily quantified, or the computational burden scales exponentially or as a very high power of the system size.
Recent years have witnessed remarkable progress in the development of new theories, concepts, methodologies, software and algorithms that have pushed the conceptual horizons and technical boundaries of computational many-body methods, and considerably improved our understanding of interacting electrons in solids and molecules. A vast suite of methods exist which have different strengths and weaknesses and different domains of applicability, and ever more are being developed.
It is imperative, under these circumstances, to develop systematic knowledge by detailed benchmark studies. Comparison of different methods allows characterization of relative accuracy and capabilities, which provides a survey of the state-of-the-art to guide applica-tions. Applying complementary methods synergistically to the same problem enables cross-check and validation, leading to a powerful new paradigm of attack on difficult problems. Cases where results from different methods agree provide valuable benchmarks against which new methods can be tested, thereby facilitating further development and accelerating progress.
Detailed benchmark studies of extended systems have been rare. A major reason is the nature of the problems involved: while it is reasonably straightforward to compare results obtained for finite systems (such as molecules), or at the thermodynamic limit in an independent-electron picture, it is challenging to make reliable calculations in the thermodynamic limit with many-body methods. A recent success is the benchmark study of the Hubbard model [2], and a subsequent multi-method study of the ground-state order in the celebrated 1/8-doped case [3]. With real materials, two more challenges arise. First, the general long-range Coulomb interaction must be treated accurately. Second, manybody calculations often require the use of incomplete one-electron basis sets, whose systematic errors must be removed in order to reach the continuous space, or complete basis set, limit for physical observables.
In this work, we undertake a comprehensive benchmark study of state-of-the-art many-body methods in a more realistic context. We choose the linear hydrogen chain -introduced in Ref. [4] and studied at finite lengths and finite basis sets by several groups [5][6][7][8][9][10] and investigate its ground state versus bondlength under the Born-Oppenheimer approximation of fixed nuclear positions, at the thermodynamic and complete basis set limits.
Hydrogen is the first element in the periodic table and the most abundant in nature. Studies of the H atom, H + 2 cation and H 2 molecule have served as landmarks in quantum physics and chemistry. Despite their deceptive simplicity, bulk H systems are rich and complex. The ground state properties of the hydrogen chain can differ significantly from those of simpler systems such as the Hubbard model, and are, in fact, not completely understood. The linear H chain captures key features that are essential to the generalization of model-system methods to real materials, in particular strong electron correlations of diverse nature arising as the H-H distance is increased, the need to treat the full physical Coulomb interaction, and to work in the continuous space and thermodynamic limits.
Compared to the one-dimensional Hubbard model, the hydrogen chain has multiple (in principle, infinite) orbitals per site, as well as long-range interactions. The use of basis sets defines models of the hydrogen chain of increasing complexity. In a minimal basis, there is only one band, and the problem resembles a one-dimensional Hubbard model with long-range interactions. Larger, more realistic, basis sets bring back characteristics of real materials. Thus one can neatly and systematically connect from a fundamental model of strong electron correlation to a real material system. On the other hand, the H chain eliminates complexities of other materials systems such as the need to separately treat core electrons or incorporate relativistic effects, and is thus accessible to many theoretical methods at their current state of development. As such, the linear hydrogen chain is an ideal first benchmark system for testing the ability of manybody theoretical methods to handle the challenges posed by real materials.
We study finite chains of increasing length, and crosscheck the results against calculations performed using periodic boundary conditions. We also present results from calculations formulated in the thermodynamic limit. Most of the methods employed use a one-particle basis set, and we investigate convergence by obtaining results for a systematic quantum chemistry sequence of basis sets of increasing size. These results are compared to methods formulated directly in real space. With extensive direct comparisons and cross-validations between different methods, we characterize the uncertainties in each approach in detail.
This study presents results obtained from more than a dozen many-body computational methods currently used or under development in physics and chemistry. A vast amount of data is produced, providing detailed information in finite-length chains and with finite basis sets. In the largest systems treated, the size of the Hilbert space exceeds 10 100 . We anticipate that our data, which are made available in the appendices and in online repositories [11], will be useful for benchmarking other existing and future electronic structure methods. In addition to the results and comparisons, we introduce a variety of methodological developments which were spurred by the benchmark, including new approaches. Combining the strengths of complementary methods, we are able to determine the energy per atom in the thermodynamic limit to sub-milli-Hartree accuracy. We hope that the results presented here will serve as a preview of what can be achieved in the predictive computation of the properties of real materials, and provide a firm basis for theoretical progress in condensed matter physics, quantum chemistry, materials science and related fields.
The rest of the paper is organized as follows. In Sec. II we introduce the linear H chain systems that will be studied and define notation. In Sec. III we give a brief overview of the many-body methods employed in the present work. More details on the methods and some of the computational details of each method are given in Appendix A. In Sec. IV we present results for a finite H 10 chain. A subsection summarizes the results within a minimal basis set, followed by one which presents the results in the complete basis set (CBS) limit, and by one that describes the extrapolation to the CBS limit from finite basis set results. In Sec. V we present results in the thermodynamic limit. The first two subsections contain results for the minimal basis and the CBS limit, respectively, while the last subsection presents finite-size results and discusses the procedure for reaching the ther-modynamic limit. A brief discussion and summary of our work, along with future prospects, is then given in Sec. VI. In the Appendices we include further descriptions of the methods, present tables that summarize our data, and provide further details of our benchmark results and procedures. A database of results is also made available electronically [11].

II. THE HYDROGEN CHAIN
We consider a system comprised of N protons, at fixed equispaced positions along a line, with N electrons. This system is described by the Hamiltonian where (r 1 . . . r N ) and (R 1 . . . R N ) are the coordinates of electrons and nuclei, respectively. We will use atomic units throughout, i.e., lengths are measured in Bohr ( a B = 2 /(m e e 2 )) and energies in Hartree (E Ha = e 2 /a B ). In the thermodynamic limit of infinite system size at zero temperature, which is our primary focus, such a system is characterized by only one parameter, the bondlength R separating two adjacent atoms. The electron coordinates are continuous in threedimensional space, while the nuclear coordinates are fixed on a line, e.g., R a = a R e z with R the inter-proton separation and a = 1...N . Most of our calculations are on such finite-size systems, referred to as open boundary conditions (OBC). We have also performed calculations using periodic boundary conditions (PBC), in which case a periodic supercell containing N atoms is treated.
Among the methods employed here, diffusion Monte Carlo (DMC) and variational Monte Carlo (VMC) operate in first-quantization and treat the Hamiltonian in Eq. (1) directly. The other methods use a finite oneelectron basis set, with a total of M orbitals {ϕ p } M p=1 , i.e., m ≡ M/N basis functions per atom (including occupied and virtual orbitals). The Hamiltonian is written in second-quantized form where the creation and annihilation operatorsâ † andâ obey fermionic anticommutation relations and the indices p,q,r,s run over all M single-electron basis functions. Most calculations were performed using standard Gaussian basis sets, but specialized density-matrix renormalization group (DMRG) calculations using a grid along the nuclear axis and Gaussians along the other two directions (sliced-basis) were also performed, as discussed in Sec. A 1 b. Within the Gaussian basis, the matrix elements {h, v} in Eq. (2) are readily obtained from standard quantum chemistry packages. The basis functions are centered on the protons. We use the correlationconsistent cc-pVxZ basis set, with x =D,T,Q, and 5, which correspond to m =5, 14, 30, and 55 orbitals per atom, respectively. For small N , explicit correlation using the F12 technique [12,13] was also considered to help ascertain the approach to the CBS limit. Our procedure for extrapolating to the CBS limit is described in Secs. IV C and V C.
H-chains with nearest-neighbor proton separation (bondlength) R of 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.4, 2.8, 3.2, and 3.6 Bohr are studied. We focus, in this work, on the ground-state energy E(N, R) for different chain sizes and lengths, and obtain the energy per atom, E(N, R) = E(N, R)/N , at the thermodynamic limit (TDL) Below when presenting results on finite chains, we will follow the chemistry convention and use the term potential energy curve (PEC), although we will always refer to energy per atom, E(N, R). When presenting results for the TDL, we will use the term equation of state (EOS) to refer to E TDL (R) vs. R at zero temperature. Most of the methods considered chains with N = 10-102 atoms. The procedure for extrapolating to N → ∞ is discussed in Sec. V C.

III. OVERVIEW OF COMPUTATIONAL METHODS
The methods employed in this work include: • AFQMC: auxiliary-field quantum Monte Carlo [14][15][16] • BDMC: bold diagrammatic Monte Carlo [17][18][19] • DMET: density matrix embedding theory [20][21][22] • DMRG: density matrix renormalization group with a quantum chemistry basis [23][24][25][26] • FCI: full configuration interaction, i.e., exact diagonalization • GF2: self-consistent second-order Green's function [27][28][29][30][31][32] • LR-DMC: lattice-regularized [33] diffusion Monte Carlo (DMC) [34] • MRCI, MRCI+Q: multireference configuration interaction without (MRCI) and with (MRCI+Q) the Davidson correction [35,36] • NEVPT2: partially (PC-) and strongly contracted (SC-) variants of the N-electron valence state second order perturbation theory [37] method deterministic basis set self-consistent variational scaling Wave-function CCSD yes b yes no  Table I. Summary of the methods employed in the present work. Methods are classified by type: wavefunction, embedding, or diagrammatic; the use of one-electron Gaussian basis sets (b), sliced-basis sets (sb), or continuous electron positions (cs); whether a self-consistency procedure is involved; whether the method is deterministic or stochastic in nature; and whether or not the method is variational. The scaling of the computational cost of key pieces of the algorithm is shown, versus the number of electrons (N ), basis set size (M ), etc. In DMRG, D is the bond dimension kept in the calculation. In SBDMRG, No is the number of basis function per slice and D(No) the compressed MPO size. In Green's function methods, nτ is the size of the imaginary time grid. In DMET, N f is the number of atoms in the fragment and D denotes the bond dimension kept by the DMRG solver. The first scaling corresponds to a DMET calculation with a single fragment using translational invariance, while the second corresponds to treating multiple fragments tiling the chain. In SEET, Nimp is the number of impurities, while ne denotes the number of electrons in the impurity and Ms = M A + M b the number of orbitals treated, where M b is the number of bath orbitals. The first scaling corresponds to SEET(CI/GF2), while the second corresponds to SEET(CI/HF). In BDMC, n denotes the diagrammatic order.
In Table I we summarize the methods using several characteristics or criteria. At a high level, the methods can be distinguished by general categories such as wavefunction, embedding, and diagrammatic. Wave-function methods (AFQMC, CC, DMRG, FCI, LR-DMC, MRCI, NEVPT2, and VMC) formulate an ansatz for the ground state, and compute expectation values of observables and correlation functions with respect to the wave function. The ansatz for the wave function can be explicit (as in VMC and most quantum chemistry methods), or reached via an iterative procedure (as in AFQMC, LR-DMC). The accuracy of a wave-function method is determined by the quality of the underlying ansatz (e.g., form of trial wave function in VMC, size of truncated space, order of perturbation) and by approximations (if any) in the realization of the ansatz (e.g., constraints in QMC) and in the evaluation of observables (e.g., non-variational estimators in CC methods, mixed estimate or back-propagation in QMC). Extrapolations in N (and, in many cases, basis set size M ) are needed for wave-function methods.
Embedding methods (DMET, SEET) evaluate the properties of a large system by partitioning it within a basis, e.g. the spatial or energy basis, into a collection of fragments, embedded in a self-consistently determined environment treated at a more approximate level. The accuracy of an embedding method is determined by a combination of several factors including: the size of the fragments, the level of accuracy in the treatment of the embedded fragments and environment, and the level of convergence of the self-consistency loop. Extrapolations in the fragment size (and in the basis set size M if a basis set is used) is needed for embedding methods.
Diagrammatic methods (BDMC, GF2, SC-GW) evaluate, either deterministically (GF2, SC-GW) or stochastically (BDMC), terms in a diagrammatic expansion of a system property. Diagrammatic expansions can be formulated either in a basis or directly in the continuum, and can be applied to finite systems or directly in the thermodynamic limit. GF2 provides an exact selfconsistent evaluation of all second order terms in a perturbative expansion in the interaction; SC-GW evaluates a small subset of diagrams to all orders in perturbation theory via the solution of self-consistent equations; BDMC accounts for higher-order vertex corrections within the skeleton expansion by performing a stochastic sampling of diagram topologies and internal integrations, and is limited to situations where the series converges. (We note an important ambiguity in the formulation of diagrammatic approximations: Hamiltonian terms that are identically zero because of the Pauli exclusion principle give rise to diagrams that may not sum to zero (self-interaction error) in approximations that do not consider all terms at a given order. The effect of these zero-terms on SC-GW results is illustrated in Appendix A 4.) Many other types of classification are possible. For example, one could characterize a method as deterministic or stochastic; whether a basis set is used and if so, what type; whether a self-consistent procedure is involved; whether the computed ground-state energy is variational; etc. Table I lists some of these classifiers, in addition to the computational scaling of key pieces in each algorithm. It is important to note that the classification is only meant as a general guide, and is in many cases fluid. For instance, embedding methods could also be classified by their solver for the embedding fragments, as wave function (DMET) or diagrammatic (SEET). Depending on the particular form of the solver, they could be deterministic (e.g., DMET with a DMRG solver as in the present study) or stochastic (if a QMC solver is used [54]). Various methods are shown as needing a Gaussian basis set, but can also be implemented using other bases (e.g., plane-wave with pseudopotentials for AFQMC, and SC-GW). The choice of basis set can affect the computational scaling. Note also that the meaning of selfconsistency can vary and depend on the type of methods.
In wave-function methods, we have used it to indicate whether a self-consistent procedure is involved, although this can still have ambiguity since there are sometimes multiple ways to obtain the solution. Finally, the scaling reported refers to the canonical implementation of these methods, without any specialized optimizations.

IV. RESULTS FOR THE H10 CHAIN
In this section, we present results on a finite chain of ten atoms. This relatively simple system provides a good intermediate step in the benchmark, as it removes one of the major challenges, namely the approach to the thermodynamic limit. Detailed comparisons can be made as in quantum chemistry, providing insights for the more challenging case of the TDL. We emphasize the approach to the continuous space limit, with extensive studies on the removal of any residual finite basis errors. In Sec. IV A, results in the minimal basis are given, for which exact results from FCI are available for detailed comparison. Final results are presented for the CBS limit in Sec. IV B. Then, in Sec. IV C, we include results using finite basis sets and discuss the approach to the CBS limit.
A. Benchmark in the minimal basis set Figure 1 shows a detailed comparison of the potential energy curve (PEC), E(N = 10, R) vs. R, obtained by a variety of methods, in the minimal STO-6G basis. For all methods, the PEC features a familiar short-range repulsion, due to the combined effect of Coulomb repulsion and Pauli exclusion, followed by a decrease to a minimum value E 0 , attained at the equilibrium bondlength R e . Beyond R e , the PEC monotonically increases to an asymptotic value E ∞ , the ground-state energy of a single H atom. The well depth gives a dissociation energy D e = (E ∞ − E 0 ). Owing to the small size of this chain and the STO-6G basis, the PEC can be calculated using the FCI method, giving the exact values R e = 1.786 a B , The overall agreement between all the many-body methods is quite good. Deviations from FCI are shown in the lower panel of Fig. 1 in a magnified view. We see that the different methods agree with each other and with the exact result at the level of ∼ 2 m E Ha , i.e. about 5% of the dissociation or cohesive energy. The agreement is better for R ≤ 2.0 a B and tends to worsen in many methods as R increases, because electronic correlations become more pronounced, increasing the multireference character of the system.
MRCI and especially MRCI+Q are seen to be uniformly accurate for this system, with discrepancies of µ E Ha or less from FCI. This is also confirmed in the larger cc-pVDZ basis set (see Sec  able. MRCI+Q can be conveniently carried out for even larger basis sets (but not for large N ), and we will use it as our reference in the following subsections for the 10-atom chain. Among the other approaches, AFQMC gives results accurate to within 0.2 m E Ha . Bias from the CP approximation is visible in the intermediate region, where the energy is slightly overestimated, and also at large bondlengths, where there is an underestimation. Coupled cluster methods, especially RCCSD(T), are very accurate near equilibrium. Although it is in principle possible to dissociate the H chain to the correct energy within RCCSD (as a product of dissociated H dimers) there can be multiple CC solutions, and in practice a correctly dissociating solution is hard to find [38]. UCCSD and UCCSD(T) provide accurate results at equilibrium and approach the correct result at longer bond lengths, but have large errors at the intermediate bond-lengths due to spin recoupling.
In embedding methods, extending the number X of embedded atoms (in DMET[X]) or the number Y of impurity orbitals (in SEET-m[Y]) leads to noticeable im-provement. In the minimal basis, DMET[X] uses X impurity orbitals because there is only one orbital per atom. The maximum error in DMET [2] is about 2 m E Ha while DMET [5] is exact by construction. While the SEET-m [4] curve is at a similar level of accuracy to DMET [2], a substantial improvement is obtained within the mixing scheme (SEET-m [6] curves), especially at large bondlengths.
As weak coupling methods, the diagrammatic GF2 and SC-GW methods have difficulties in the strong coupling regime at large bondlengths. Allowing methods to break spin symmetry may lead to an improvement of the energetics. As illustrated with GF2, using an unrestricted reference state provides a better estimate of the ground state energy in that regime but generates a spurious magnetization. Deviations at small distances (corresponding to the weak coupling regime) show that terms beyond the bare second order or screened first order approximation are needed to reach the accuracy of other methods. We also note that the cancellation of self-interaction error in SC-GW is subtle and depends on the treatment of exclusion principle violating terms in the Hamiltonian (see Appendix A 4 d).
B. Potential energy curve in the complete basis set limit In Figure 2, we show the final computed potential energy curves of H 10 in the continuous space (complete basis set) limit, including results obtained from VMC and LR-DMC, which work in continuous space.
For all our methods that require a basis set, we employ the correlation-consistent polarized valence x-zeta (cc-pVxZ) sequence [55], which is designed to include successively larger shells of polarization functions (x = 2, 3, 4, 5 corresponding to D,T,Q,5 respectively). The results are extrapolated to the CBS limit, following procedures described in Sec. IV C with further details given in the Appendix.
Our final results in this system give an equilibrium geometry R e = 1.801(1) and energy E 0 = −0.5665 (1). The computed PECs are tabulated in the Appendix. Deviations from the reference curve are shown in the bottom panel, where the combined uncertainties in the the reference curve (primarily from the extrapolation to the CBS limit) are indicated. Our reference curve for this system was obtained from MRCI+Q, extrapolated to the CBS with basis sets up to x = 5. This is confirmed by a separate extrapolation including an F12 correction, which gave results in agreement within the statistical uncertainties.
Trends similar to the minimal basis results are observed. LR-DMC, which works in continuous electron coordinate space, has only a weak dependence on basis sets originating from the representation of the Slater determinant in the trial wave function (and hence the position of the nodes). LR-DMC provides an upper bound for the ground-state energy. Its quality is determined by the nodal surface of the trial wavefunction. At large bondlength, the nodal structure is simpler, consistent with the more quasi-one-dimensional nature of the system. The LR-DMC results are very accurate in this regime, indicating that the DFT-LDA determinant gives a good description of the nodal structure. The AGP trial wave function allows a more sophisticated, multideterminant description of the many-body nodes. Improvement with the AGP trial wave function is only seen at the smallest bondlength. The excellent consistency between the LR-DMC results and the basis-set methods provides another assurance on the robustness of the approach to the CBS limit in the latter.

C. Reaching the complete basis set limit
For each method which utilizes a basis set, the computational cost grows as the basis size M is increased, in some cases very rapidly. In Fig. 3 we show the PEC yielded by several methods, in the cc-pVxZ bases, taking MRCI+Q as reference. The accuracy of MRCI+Q is further validated by its excellent agreement with DMRG. The general trends seen at the minimal basis level are mostly confirmed with the larger basis sets. Most methods show errors that remain consistent throughout this family of basis sets which, though not surprising, is reassuring. As mentioned, improvements of the results are possible within certain methods, via larger embedding clusters, using better trial wave functions, or going to higher orders. Examples are shown for DMET and SEET; for instance, at the cc-pVDZ level, increasing the number of embedded atoms from DMET [2] to DMET [5] reduces the maximum error from ∼ 2m E Ha to 0.5m E Ha . (In the cc-pVTZ and cc-pVQZ bases, cancellation of errors means that the maximum DMET [2] error is ∼ 1m E Ha .) Since SEET(CI/GF2) works in the energy basis, increasing the number of impurity orbitals results in significant improvements. The mixing scheme, which illustrates the strong correlation present in the active space, recovers results of NEVPT2 quality while solving impurity problems with only 6 orbitals.
In AFQMC, a multideterminant trial wavefunction is used in the dissociation region (last two points, R > 3) as discussed in Sec. A, which improved its accuracy by ∼0.36 m E Ha over that with the UHF trial wave function (shown as half-filled red circles). For CC methods, the improvement from the inclusion of the perturbative triples is systematic and evident. MRCI+Q energies for the shortest bondlengths and large basis sets relied on a correction as discussed in Appendix B.
Increasing the basis set size has a dramatic effect on the total energy, as seen in Fig. 4. The basis set dependence is stronger at short bondlengths, with an energy difference of ∼ 21 m E Ha between cc-pVDZ and CBS, compared to ∼ 3 m E Ha in the bond breaking regime.
The effect of the sliced basis used in the SBDMRG method is also illustrated in the main panel in Fig. 4. At the STO-6G level there are several competing effects which account for the deviations between the sliced basis and standard basis results. In the large R limit, the single basis function of STO-6G poorly describes an isolated hydrogen atom, and the increased flexibility in the chain direction of the sliced basis can partially compensate for this. At shorter distances, the overlapping basis functions between adjacent atoms of standard STO-6G give additional degrees of freedom in the transverse directions, which can improve the energy at both the Hartree-Fock level and in terms of transverse correlations. In contrast, the sliced basis set has nearly ideal resolution in the longitudinal direction. At very short distances, the STO-6G basis becomes nearly linearly dependent while the sliced basis does not and consequently performs significantly better. These complicated competing transverse and longitudinal effects make it unsurprising that the differences between the two energies changes sign as a function of R. For the cc-pVDZ bases, the sliced version is uniformly slightly better, probably because the dominant effect is  its improved longitudinal correlation. We extrapolate the finite basis set results to the CBS limit by standard procedures [56,57], taking care to reach large basis sets. We first fit the RHF energies E RHF,x (N, R) computed at the cc-pVxZ basis set level, to an exponential function The correlation energy is then fitted to a power law: The CBS result is taken as α(N, R) + A(N, R), with a combined uncertainty estimated from the fitting procedures. We find that using UHF as reference gives numerically indistinguishable results, except for very short bondlengths and large sizes, where convergence of UHF shows more sensitivity.
To deal with basis set linear dependence, which becomes relevant at the shortest bondlengths and largest basis sets, we apply a threshold to eigenvalues of the overlap matrix. Threshold values are given under method descriptions in the Appendix.
The final CBS results are verified with a separate set of MRCI+Q calculations augmented by F12 explicit correlation, as illustrated in the inset in Fig. 4. This data is extrapolated using the same procedure as above. That these results are consistent with those from the continuous-space LR-DMC provides a further validation of the procedure.

V. RESULTS IN THE THERMODYNAMIC LIMIT
In this section we present our results in the TDL. The section is structured similarly to the previous one. Results for the minimal basis, which makes the H chain resemble an extended Hubbard model, can be valuable for model studies and are described in some detail in Sec. V A. The final results for the hydrogen chain at the joint continuous space and thermodynamic limits are given in Sec. V B. Sec. V C discusses our procedures and cross-validations for approaching the thermodynamic limit. Basis set dependence of the PEC in H10 and extrapolation to the CBS limit. MRCI+Q and SBDMRG results are shown in the main figure for selected basis sets (to avoid cluttering). The inset shows the extrapolation of the correlation energy (from MRCI+Q) for two sequences of basis sets, cc-pVxZ and with F12, together with LR-DMC, for R = 1.8.

A. Benchmark in the minimal basis set
The minimal basis set hydrogen chain is similar to an extended Hubbard model. As such, the results in this basis provide a quantitative connection to model studies.
The computed EOS is shown in Fig. 5 for the STO-6G basis set. DMRG calculations can be carried out for large system sizes in this basis, and serve as a near-FCI quality benchmark. DMRG results for finite chains, after extrapolation to N → ∞, yield an equilibrium geometry R e = 1.831(3) a B and ground-state energy per atom of In the lower panel of Fig. 5, a detailed comparison is shown using DMRG as a reference. Most methods show similar behaviors as in finite chains. Coupledcluster methods display the same general trends, with RCCSD(T) in particular giving extremely accurate results before the breakdown at larger bondlengths (R > 2 a B ).
AFQMC yields energies accurate to within 0.15m E Ha per particle across the bondlengths.
SEET is extrapolated to the TDL with respect to the chain length, with the number of orbitals treated by an accurate method fixed to 6. With this constraint, SEET(CI/HF)-m(6,SAO) shows accuracy at the thermodynamic limit comparable to the 10-atom chain when FCI is used to treat the impurity and HF to treat environment, with a maximum error of ∼ 1m E Ha . For stretched distances, SEET-m results improve if HF is used instead of GF2 since the latter (SEET(CI/GF2)-m(6,SAO)) results in overcorrelation.
As discussed in Sec. A 3 a, two types of DMET calculations were performed for the minimal basis. DMET [5] -0.52 was from the first type, treating finite chains with fragment size N f = 5, followed by extrapolation of the chain size N similar to the procedure used by most other methods whose results are shown here; this gives a maximum error of ∼ 1m E Ha . DMET[∞] shows results from the second type, which worked directly in the large N limit, and extrapolated the fragment cluster size N f . Details of the extrapolation procedure are given in Sec. A  Our final results for the equation of state of the hydrogen chain are presented in Fig. 6. Detailed numerical data are tabulated and included in the Appendix. For these results, VMC and LR-DMC are extrapolated to the TDL, while basis-set methods are extrapolated to the joint TDL+CBS limit. We carry out extensive self-consistency and cross-checks in order to validate the extrapolations, as discussed in Secs. IV C and V C.
In the bottom panel of Fig. 6, AFQMC results are used as a reference, based on its accuracy from the systems which have been benchmarked. Large system sizes and basis sets can be reached to minimize the uncertainty in the extrapolation to the TDL and CBS limits. We can further quantify the residual systematic errors of the constraint in AFQMC from cross-checks with DMRG, by estimating their difference, E TDL,DMRG (R) − E TDL,AFQMC (R), at the cc-pVDZ basis level. This "correction" can be applied to the AFQMC equation of state at the CBS limit, E TDL,AFQMC (R). The result is shown by the empty circles and dashed lines in Fig. 6, while original AFQMC data are shown with solid circles and lines.
Agreement is seen between these results and that from RCCSD(T), which provides another consistency check. We find an equilibrium bondlength of R eq = 1.859(3) a B with an energy of E 0 = −0.5659(3)E Ha at R eq in the thermodynamic limit. The computed correlation energy, defined with respect to the RHF energy, is shown in the inset of Fig. 6.
At the smallest bondlengths (R ≤ 1.2), there is significant linear dependence in the basis sets. This causes an effective reduction in the size of the basis, which can render the usual ansatz for basis set extrapolations unreliable. We thus avoid performing CBS extrapolations. (All finite-basis data are listed in the appendix and repository. It will be valuable to develop specifically designed basis set sequences or correction methods in this regime.) LR-DMC results are shown, which provide an upper bound for the energy. Based on the results in Sec. IV B, the fixed-node error is estimated to be < 1 m E Ha per atom, which is indicated by the pink error bands on these two points.
LR-DMC results are obtained directly in real space, and provide an independent validation.
At large bondlengths, the fixed-node error in LR-DMC is mini-  Figure 6. Top: Computed equation of state of the hydrogen chain in the thermodynamic limit. The inset shows the corresponding correlation energy per particle. Bottom: Detailed comparison using AFQMC results as reference (LR-DMC for the shortest two bondlengths). The empty circles indicate AFQMC results after a correction is applied from the difference with DMRG at the cc-pVDZ level. The pink error bands indicate all statistical uncertainties. mal, as we have seen in the finite-chain benchmarks. Furthermore, we have performed PBC calculations using LR-DMC to provide a separate extrapolation to the TDL. The excellent agreement between LR-DMC and AFQMC at large R is thus a further indication of the robustness of our procedures for reaching the infinite basis set and thermodynamic limits. Note that the VMC results exhibit a different trend from the corresponding LR-DMC, suggesting that the variational many-body wave function is best at intermediate bondlengths. This is likely a reflection of the balance between the two parts that form the VMC ansatz, namely the LDA Slater determinant and the optimized Jastrow factor. The former becomes more accurate in describing the nodal surface as R in-creases, where the latter is evidently more effective at weaker correlation. Only the determinant part, via the nodes that it defines, impacts the DMC results.
The DMET [2] results provide an example of an embedding calculation at the thermodynamic and complete basis set limits, with a modest impurity size. The limitation on the impurity size is from the use of a DMRG impurity solver, which becomes expensive in the large basis set limit.
We comment that various correction schemes can be applied to our finite-basis and/or finite-size data to provide additional estimates from methods not included in Fig. 6. For example, a residual basis set correction could be obtained either from a different method or using a lower order theory (if available) of the same method, and applied to DZ or TZ basis results to estimate the CBS limit. These can be readily retrieved for assessment from the detailed data provided in the appendices and supplementary materials.

C. Reaching the complete basis set and thermodynamic limits
A key challenge in the ab initio computation of bulk materials is to remove finite-size and finite-basis effects so as to obtain results for the continuous and thermodynamic limits. This is important in order to make reliable predictions about materials properties and allow direct comparisons with experiments. Various choices exist in the calculation. These can be at the level of the type of many-body methods, for example the use of particular embedding approaches (versus those that treat a cluster only, whether finite or periodic), or the use of coordinate space methods like DMC versus basis set methods. They can also be common to classes of methods and decoupled from and independent of the details of the underlying many-body methods, for example the use of periodic supercells versus finite clusters, or the choice of basis sets, etc. By employing many state-of-the-art methods, we are able to investigate these factors extensively and with great care in the present work.
Many of our calculations are performed using OBC, i.e., treating a finite chain. We find that, somewhat surprisingly, OBC calculations show faster convergence to the TDL than PBC in the hydrogen chain for all but the smallest few bondlengths (see Appendix C 2). To extrapolate the finite-N results to the TDL, we assume that the PEC has the following size dependence: where k is a small integer. For k = 1, this gives the subtraction trick based on a division of surface and bulk terms, (omitting R), which has been used, for example, in DMRG calculations before [58]. In this work we typically used k = 2, employing N = 10, 30, 50 and, when necessary, N = 18, 22, 70 and 102. Under this choice, there are still multiple strategies for finite basis set methods to approach the combined limits. One could extrapolate to the CBS limit for each finite chain of fixed N following the procedure described in Sec. IV C, and then extrapolate the results in N to the TDL. Alternatively, one could extrapolate each basis set to the TDL, and then extrapolate to the CBS limit, or use a joint ansatz and extrapolate both simultaneously. As illustrated in Fig. 7, exchanging the order of the extrapolation leads to consistent and robust results.
With the exception of the minimal basis, the TDL extrapolation for the DMET data is performed using the same OBC size-dependence described above. In the minimal basis, we also carried out DMET calculations directly in the TDL, as mentioned earlier. Additional details on this extrapolation scheme and a comparison of the two is discussed briefly in the appendix. Calculations in PBC (including those with BDMC, which used a ring geometry, and LR-DMC) are extrapolated to the TDL using the form E(N, R) = A 0 (R) + A 2 (R) N −2 [59], and statistical error bars are propagated following standard procedures in the extrapolations.

VI. CONCLUSIONS
We have presented a comprehensive investigation of the hydrogen chain, deploying a vast suite of cutting-edge many-body numerical methodologies and obtaining a detailed and quantitative understanding of current computational capabilities for treating correlated quantum materials. We have shown how finite-size effects and finite spatial or other basis set resolutions can be systematically removed to reach the physically relevant infinite system size and complete basis set limits.
Through the synergistic use of complementary methods, we have accurately determined the ground-state energy as a function of interatomic distance. This serves as a proof of concept for a new mode of attack on correlated materials by ab initio calculations. The benchmark results will provide a reference on the state of the art in many-body computation of real materials.
Our study captures many of the salient features of predictive computations in real materials. The ability of each many-body method to correctly capture important physical properties will depend on the material system under study. For example perturbative or diagrammatic methods can have better or worse accuracy in systems with different amount of electron correlation, the qualities of the constraints on the sign problem in QMC methods can vary with the physical nature of the problem, the rate of convergence with the fragments or the requirement on the impurity solver in embedding methods can differ from material to material, the scaling and computational feasibility of DMRG can change, etc. More benchmark studies of this kind will be highly desirable to broaden the understanding and identify further limitations as well as opportunities of development.
The computational cost of each method depends on various factors, including the degree to which the algorithm and codes have been optimized, the level to which one wishes to take the calculation (the order in perturbative or diagrammatic methods, or the statistical accuracy in Monte Carlo methods), etc. The results in this benchmark were obtained with moderate computing (order of days on platforms ranging from local clusters to medium-sized supercomputers). The computational scaling, which is summarized in Sec. III, together with the corresponding accuracy achieved by each method in the benchmark, will provide a rule of thumb on their computational cost.
The benchmark results indicate that many of the methods tested here are capable of reaching an accuracy of five percent of the cohesive energy or better, across wide parameter regimes of strong electron correlation. A subset of these methods predict the equation of state systematically to sub-milli-Hartree accuracy. Further development may turn these into post-DFT methods of choice for ground-state studies, and a concerted effort to build open-source codes will be invaluable. Other techniques can more naturally address dynamical and thermodynamical properties, many of which are the outcome of recent research. Continued development along these lines will further improve their accuracy and time to solution. Further benchmark studies of dynamical and thermal effects, building on the work done here on the equation of state, would also be very desirable.
It is important to continue to expand the benchmark studies to more complex materials. Even in this relatively simple system of the hydrogen chain, important questions remain on the physics which are of strong in-terest and relevance to some of the key issues in strongly correlated systems in general. For example, how does the nature of the charge and magnetic orders vary with the bondlength? We are presently investigating these and related questions.
In this Appendix, we provide further descriptions of the individual methods used. Following the main text, we group methods into the following categories: deterministic wavefunction (CC, DMRG, MRCI, NEVPT2), stochastic wavefunction (AFQMC, LRDMC, VMC), embedding (DMET, SEET) and diagrammatic (SC-GW, GF2, BDMC). The categories are by no means rigid, as a method can fit into multiple groups; they are meant to provide a general guide and help with organization of the discussions.

Deterministic wave function methods
Deterministic wavefunction methods (HF, CC, DMRG, FCI, MRCI, NEVPT2) range in quality between the mean-field HF and the exact FCI. Correspondingly, their computational costs vary a great deal. These methods rely on different types of ansatz, the nature of which is ultimately responsible for their accuracy and computational cost.

a. Coupled Cluster (CC)
Coupled cluster (CC) theory [38][39][40] is widely applied, often providing accurate and systematically-improvable ground state energies when systems are neither too large nor too strongly correlated. The CC wavefunction is written as where |0 is a single-determinant reference state, and the cluster operator is given bŷ wheret µ creates an excited determinant |µ containing µ particle-hole pairs relative to |0 , with amplitude t µ . Standard CC theory constructs a similarity-transformed HamiltonianĤ and the energy and amplitudes t µ are obtained by solving the Schrödinger equation projectively In other words, CC theory diagonalizes the similarity transformed Hamiltonian in the space spanned by the mean field reference and excitation manifold. Becausê T is an excitation operator, the commutator expansion used to evaluate the amplitudes in A4 terminates after four nested commutators for all values of µ, because the Hamiltonian contains only one-and two-particle terms.
In this work, we limitT to µ ≤ 2, i.e. CC with single and double excitations (CCSD) on restricted Hartree-Fock (RCCSD) and unrestricted Hartree-Fock (UCCSD) references. [60,61] Since the quality of the reference determines the quality of the CC wavefunction, a reference obtained from a symmetry-broken mean field can be critical to getting good CC energies when systems become strongly correlated. When multiple Hartree-Fock solutions exist, the question of which one to use as a reference for the CC calculations can be subtle. In general, our CC calculations are performed on the lowest-energy Hartree-Fock determinants we could find. In some calculations, we have also perturbatively included triple-excitation effects, denoted CCSD(T) [62]. All CC calculations shown here are widely available in standard quantum chemistry packages [63]. To carry out the larger calculations, we used the high-performance implementation in the PySCF package. This uses an AO-driven implementation to reduce the IO costs associated with accessing integrals on disk [64]. We carried out RCCSD and RCCSD(T) calculations for H 30 with cc-pVQZ and cc-pV5Z basis and H 50 with cc-pVTZ and cc-pVQZ basis with this implementation. Even with the AO-driven technique, the RCCSD calculation for H 50 in the cc-pV5Z basis would require at least 4 TB of disk space. Although technically feasible, we did not perform it here. For the largest basis sets cc-pVTZ, cc-pVQZ and cc-pV5Z at geometries R < 1.6 a B and cc-pVDZ at R < 1.4 a B , the Gaussian basis was nearly linearly dependent. We removed linearly dependent vectors with an overlap threshold of 10 −7 . Although a smaller threshold could be used in the Hartree-Fock calculation, changing the energy by ∼ 10 −6 E Ha per atom, we found the resulting CCSD calculations with smaller thresholds to be numerically unstable.

b. Density Matrix Renormalization Group (DMRG, SBDMRG)
DMRG (density matrix renormalization group) is a low-entanglement wavefunction approximation [23]. The wavefunction can be written as where M denotes the number of orbitals in the system. Each A n is a D × D matrix of real numbers associated with a single-particle basis function, except for the boundary terms A n1 and A nM , which are length-D vectors. D denotes the bond dimension and controls the accuracy of the simulation; as D increases, the wavefunction converges to the exact correlated state. In linear systems such as the hydrogen chains considered here, provided that the gap of the system does not close, the bond dimension required for a given accuracy per atom stays close to a constant, independent of the number of atoms. The energy of the wavefunction may be stably computed and variationally optimized through the DMRG sweep algorithm.
In this work we considered two different kinds of singleparticle basis functions in the DMRG calculations. In the standard quantum chemistry formulation of DMRG [24][25][26], the single-particle basis is simply an orthogonal basis in the space of Gaussian orbitals of the system. This is what we will refer to as DMRG in the calculations in this work, and details can be found in standard references to DMRG in the quantum chemistry literature [65]. We carried out Gaussian based DMRG calculations using the implementation in the Block code, with the standard settings described in Ref. [66]. DMRG energies were computed for H 10 (STO-6G, cc-pVDZ, cc-pVTZ bases), H 30 (STO-6G, cc-pVDZ), and H 50 (STO-6G), up to a maximum bond dimension of D = 2000. For the STO-6G basis, the DMRG single-particle basis was the set of symmetrically orthogonalized AO orbitals. For all the other bases, we used split-localized molecular orbitals. The split-localized orbitals were ordered by the exchange Fiedler vector [66,67]. The estimated maximum uncertainty in the Gaussian based DMRG energies is less than 0.05 m E Ha per atom.
In addition, we have also introduced the sliced basis DMRG method (SBDMRG). Here, instead of a 3D Gaussian basis, one uses a grid in one direction (z), while using a Gaussian-derived basis in the two transverse directions (x,y). Formally, one can write the basis as φ jn (x, y, z) = ϕ jn (x, y) δ where n labels grid points along the z direction (with grid spacing a) and j labels the transverse basis function at that grid point (or "slice"). The 1 2 power on the Dirac delta function indicates that the basis functions are square-normalized. The kinetic energy terms in the Hamiltonian are approximated with a fourth order finite-difference second derivative approximation. For the data presented in Section V B, we used a grid spacing of a = 0.1, for which we estimate an error of about 0.1 m E Ha per atom. The transverse basis functions ϕ jn (x, y) are derived from a standard Gaussian, atomcentered basis set. Functions from the standard basis are projected onto each slice, then these functions are orthogonalized, keeping the most significant ones up to as many as the number of contracted orbitals on each atom. Compared to the original basis, a sliced basis has approximately the same transverse resolution, but its resolution in the z direction is essentially perfect. Thus energies in the sliced basis are generally lower than in the original basis, due to the improved z correlation. The key advantage of SBDMRG over the standard quantum chemistry formulation of DMRG is that the local support of the basis functions along the grid direction makes the number of Hamiltonian terms proportional to M 2 . Using matrix product operator compression techniques, which are quite simple to apply in the sliced basis formulation, the cost of SBDMRG is further reduced to M , which is the same scaling as applying DMRG to the 1D Hubbard model. For more details see the recent paper Ref. 42.

c. Multireference configuration interaction (MRCI)
MRCI (multireference configuration interaction) is a method that incorporates 1-and 2-external excitations on top of an active space wavefunction. It is a commonly used method for high accuracy simulations of multireference electronic structure in small molecules. Here we use a variant of internally contracted MRCI described by Werner and Knowles [35,36], implemented in the Molpro package [68]. We start with a CASSCF (complete active space self-consistent-field) wavefunction |Ψ 0 . The variational ansatz is then where |Ψ 0 is the CASSCF wavefunction, |Ψ I is a configuration state function (CSF) with a single external orbital, andÊ a i is the spin-summed excitation operator, σâ † aσâ iσ . The parameters c 0 , c I , and c ab ij are determined variationally.
We also considered the explicitly correlated (F12) MRCI approximation [69][70][71]. Explicit correlation accelerates convergence to the complete basis set limit by introducing 2-external amplitudes with explicit r 12 dependent functions. The associated integrals are computed through an auxiliary Gaussian basis. In this work, we used the default F12 settings and auxiliary bases in Molpro, including the singles corrections in the complementary auxiliary basis set space.
The MRCI wavefunction does not give an extensive energy. Defining the MRCI correlation energy as ∆E = Ψ|Ĥ|Ψ − Ψ 0 |Ĥ|Ψ 0 , we define the approximate sizeextensive correlation energy (Q) through the scaling ∆E → ∆E(1 − c 2 0 )/c 2 0 . With the above techniques, we computed MRCI+Q and MRCI-F12+Q wavefunctions and energies for H 10 in the STO-6G and cc-pVxZ (x=2-5) bases, using a (10, 10) CASSCF initial state. d. N -electron valence state perturbation theory N -electron valence state perturbation theory (NEVPT) [37] is a multireference 2nd order perturbation theory which is size-extensive and free of intruder state problems. It uses a zeroth order CASSCF wavefunction and Dyall's Hamiltonian [72] as the zeroth order Hamiltonian. From this starting point, the 1st order wavefunction and 2nd order energy are defined using the usual Rayleigh-Schrödinger perturbation theory. Typically the 1st order equation is not solved exactly, but rather in a restricted variational space.
In partially-contracted NEVPT2 (PC-NEVPT2), the 1st order wavefunction is expanded in the space of 1-and 2-external excitation operators acting on the ground-state wavefunction.
In strongly-contracted NEVPT2 (SC-NEVPT2), the expansion space of the 1st order wavefunction is restricted further such that the amplitudes can be determined without solving any linear equations, and simply from expectation values of the zeroth order wavefunction.
For the H 10 chain, we computed SC-NEVPT2 and PC-NEVPT2 using the Molpro package, starting from a (10, 10) CASSCF zeroth order state. For H 30 we carried out DMRG-SC-NEVPT2 calculations [73] using the PySCF package, starting from a (30, 30) DMRG-CASSCF zeroth order state computed with splitlocalized orbitals, using the Block package. The basis linear dependency threshold was set to 10 −8 . The DMRG-CASSCF calculation was carried out with bond dimension D = 1000, leading to an estimated energy error of less than 0.1m E Ha . The zeroth order wavefunction was constructed by compressing the DMRG-CASSCF wavefunction down to bond dimension D = 500, with a compression error in the total energy of less than 0.3m E Ha except at the shortest geometry, R = 1.0 a B , where it was 10m E Ha . The semi-internal components of the DMRG-SC-NEVPT2 wavefunction and energy were approximated using the MPS compression scheme described in [74,75], with a first order wavefunction bond dimension of D = 1500; these contributions were determined with an estimated accuracy of 0.1m E Ha .

Stochastic wave function methods
Stochastic wavefunction methods (AFQMC, VMC, LRDMC) rely on Monte Carlo sampling to construct an ansatz for the ground state of the system and compute expectation values of observables. AFQMC and LRDMC are both based on mapping the imaginary-time evolution onto a random walk. AFQMC is formulated in non-orthogonal Slater determinant space. LRDMC (and VMC) conducts the random walk in coordinate space. The fermion sign problem has different manifestations in the different manifolds, and the constraints to control them lead to different approximations.

a. Auxiliary-field quantum Monte Carlo (AFQMC)
The auxiliary-field quantum Monte Carlo (AFQMC) is a wavefunction method, which estimates the groundstate properties of a many-fermion system by statistically sampling the wavefunction e −βĤ |Ψ 0 ∝ |Ψ β → |Ψ G , where Ψ 0 is an initial wavefunction, non-orthogonal to the ground state Ψ G ofĤ [15,16]. For sufficiently large β, expectation values computed over Ψ β gives groundstate averages. AFQMC projects Ψ 0 towards Ψ G iteratively, writing e −βĤ = (e −δβĤ ) n where δβ = β n is a small imaginary-time step. The propagator is represented as e −δβĤ = dx p(x)B(x), whereB(x) is a mean-field propagator of the form of an exponential of one-body operators that are dependent on the vector x, and p(x) a probability distribution [76,77]. This representation maps the original interacting system onto an ensemble of non-interacting systems subject to a fluctuating potential. The imaginary-time projection can be realized as an open-ended random walk over the auxiliary-field (i.e., mean-field potential) configurations [15,16]. Sampling the trajectories of the random walk leads to a stochastic representation of Ψ β as an ensemble of Slater determinants. For general two-body interactions AFQMC has a sign/phase problem, which is controlled by a phaseless gauge constraint (CP) on the Slater determinants using a trial wavefunction Ψ T [16,78]. (For Hamiltonians that satisfy certain symmetry properties, e.g. the Hubbard model at half-filling, AFQMC is free of the sign problem). The trial wavefunction is typically taken from Hartree-Fock or DFT. A self-consistent constraint is possible [79] but is not used in this work. The accuracy of the CP AFQMC was extensively benchmarked in both real materials [5,80,81] and lattice models [2,79]. The CP AFQMC provides an alternative and complementary way to addressing the sign problem with respect to fixed-node DMC. The random walks take place in the overcomplete manifold of Slater determinants, in which fermion antisymmetry is by construction maintained in each walker. Applications have indicated that often this reduces the severity of the sign problem and, as a result, the phaseless approximation has weaker reliance on the trial wave function [82].
For ab initio materials computations, AFQMC can be carried out using either a plane-wave basis and pseudopotentials [16,83], localized basis sets such as standard Gaussian type orbitals [84], or general basis sets using DFT orbitals [85]. In this work, we apply AFQMC implemented for Gaussian basis sets to finite chains. In all our calculations we use a linear dependence threshold of 10 −7 in the one-electron basis. The two-body matrix elements v pqrs are decoupled into bilinear form with the modified Cholesky approach using a tolerance of δ ≤ 10 −5 [86]. Results are extrapolated to the TDL and CBS limits. The total projection time is typically β = 80 E Ha , although calculations with β = 220 E Ha were performed in some cases. The convergence error from the use of a finite β is negligible. Most calculations used δβ = 0.005 E Ha −1 . Extrapolations were performed where the associated Trotter error was greater than the other uncertainties. The reported QMC error bars are estimated as one standard deviation statistical errors. The CP bias leads to a non-variational estimator of the ground-state energy. In our calculations, the UHF ground state was taken to be Ψ T . For the two largest bondlengths (R = 3.2 and 3.6), motivated by the analogy between the H chain and the Heisenberg model, we employed in this work a new form of trial wavefunc-tion, a linear combination of spinon excitations on top of the UHF state, |Ψ T = s k=0 i1<···<i k C i1...i k |Ψ i1...i k . Spinon excitations Ψ i1...i k are constructed using atomic positions with antiferromagnetic spin ordering, and k pairs (i 1 , i 1 +1) . . . (i k , i k +1) of adjacent spins flipped, as initial conditions for the UHF self-consistence procedure, and coefficients C i1...i k are finally optimized variationally. We used k = 1, 2 for R = 3.2, 3.6 respectively.

b. Variational and Lattice-Regularized Diffusion Monte
Carlo (LR-DMC, VMC) The LR-DMC is a projection method [33] that uses the lattice regularization for applying the imaginary time propagator exp(−βĤ) to a trial function Ψ T defined in the continuous space. In this work, improvements including new use of localized basis sets were introduced, which drastically improved accuracy over previous results [10]. The main approximation is to write the Laplacian by means of its dicretized expression on a lattice with a grid with lattice space a, e.g. for a single electron wavefunction depending only on one variable x: ). The extension of ∇ a to higher dimensions is straightforward and the continuous space can be sampled ergodically even for finite a > 0 with a much simpler algorithm than the original proposal [33], namely by randomizing the directions along which the Laplacian is discretized [87]. For a → 0 and β → ∞ the exact ground state wavefunction Ψ G can be obtained if Ψ T |Ψ G = 0. However, due to the fermion "sign problem", an approximation is employed to achieve small and controlled statistical errors: it is required that, during the projection, the "sign" of the propagated wavefunction is constrained to the one of the chosen trial function: for any electronic configuration x where the spins and the electron positions are defined. For a → 0 the results coincide with the standard Fixed-Node approximation introduced long time ago [34]. This scheme is usually employed within the diffusion short time (∆) approximation of the propagator, in a way that in the ∆ → 0 limit, by applying it β ∆ times, the exact Fixed-Node projection scheme (A8) is recovered, with the well established variational property on the estimated energy. In this work we have used the lattice regularization, just because it is more conveniently implemented in the TurboRVB package [88], and also because the extrapolations for a → 0 are very well behaved and easily controlled in an automatic way.
This method is very weakly dependent on the dimension of the basis set chosen to represent the nodes of the Slater determinant, and we have verified that a negligible error in the DMC energy is obtained by using the standard cc-pVTZ basis where the largest Z 1s (Z 1s = 33.87) is removed. Indeed too large exponents are also not necessary in this approach because the cusp conditions are fulfilled by a one-body Jastrow factor of the type: This one body Jastrow is included also in the GTO basis set for the DFT (LDA) calculation. The DFT is also defined (within the TurboRVB package [88]) on a mesh of lattice space a = 0.1 or smaller, until convergence is reached within 0.001E Ha in the total energy. The use of the one body Jastrow factor drastically improves the convergence for a → 0 and the quality of the basis set as the DFT energy is much lower than the standard one in the original cc-pVTZ basis. For R < 1.4 we found that the cc-pVDZ basis can be significantly improved by adding p diffusive Gaussian orbitals with small exponents (Z 1p = 0.2, Z 2p = 0.05), allowing us to obtain the best variational LR-DMC estimates for R = 1.0 and R = 1.2, even better than with the larger cc-pVTZ basis. Within periodic boundary conditions in the direction of the chain, assumed to be along the z direction, we use a supercell of dimension L x × L x × L z with L x = L y = 40 a B , that is large enough for safely neglecting the interaction between the periodic images in the x, y directions (error less than 0.0001E Ha per atom). Moreover the basis set (standard for open systems) has been periodized according to the standard procedure described in [89], when PBC are applied. Before the application of the LR-DMC algorithm, a more accurate Jastrow factor is used to define the trial function Ψ T . This contains the so-called two-, three-and four-body contributions that are expanded in a localized basis different from the determinant one. All these terms are efficiently optimized, using the scheme described in [90]. Since the Jastrow is not affecting the results for a → 0 presented here, we do not describe in details its form and the standard optimization methods used [91].

Embedding theories
Quantum embedding theories (DMET, SEET) are based on the idea of combining two different types of quantum calculations: high-level calculations on one or more active regions of interest, called fragments, and lowlevel calculations on the environment surrounding fragments. In various methods, these fragments can be chosen either in the energy or in the local basis.
A quantum embedding theory determines the coupling between fragments and environment self-consistently, using a variable of interest to provide for feedback. DMET and SEET respectively use the one-particle density matrix and the self-energy as variables of interest. DMET [20,21] provides a framework to approximate expectation values of a large system from embedded calculations. A mean-field wavefunction Φ over the full system is used to define the embedding of a fragment defined in terms of a set of L local orbitals. The embedding of Φ splits the occupied and virtual orbitals into ones with and without weight in the local fragment. The occupied and virtual set with weight in the fragment fully span the L local orbitals used to define the embedding, as well as an additional set of (at most L) bath orbitals. In DMET, a high level calculation is carried out in the fragment + bath orbital space; as the size of the embedded fragment L approaches that of the full system, the resulting DMET energy converges to that of a high level calculation on the full system. While DMET can be used to study finite systems, it provides a natural framework to study systems directly in the TDL. In this work, we have used different DMET strategies for calculations in finite chains and in the TDL; we refer the reader to Section B3 for further details about the latter.
As outlined above, the splitting of the full system into fragments requires the introduction of a set of local orbitals. In this work, we use intrinsic atomic orbitals (IAOs) [92] to define the local basis in the valence space. In our finite chain calculations, we split the system into fragments of x atoms by considering the corresponding local valence orbitals. These local orbitals, through the embedding construction, generate a set of bath orbitals. To this embedding (fragment + bath) space, we further add a set of local virtual orbitals, built as projected atomic orbitals (PAOs), on the constituent atoms. We use the acronym DMET[X] for calculations using fragments of size X= 2, 5.
Expectation values in DMET (such as the energy and particle number) are computed by partial traces [93] (using the local orbitals in a given fragment) of the contraction of integrals with density matrices. A self-consistency loop can be used to uniquely define Φ [20,93]. In this work, however, we use the RHF wavefunction without further optimization. A global chemical potential is used to control the total number of electrons in the system.
Our DMET results are reported using FCI as a solver for STO-6G basis calculations and larger basis calculations with fragments of size 2. Other calculation use DMRG as a solver using a bond dimension of D = 1000. The error due to the DMRG solver is expected to be significantly smaller than the error due to the fragment sizes considered.
We have introduced new schemes for basis set and TDL extrapolations in this work, further details for which are provided in Section C 2. b. Self-energy embedding theory (SEET) The self-energy embedding theory (SEET) [46][47][48][49][50] relies on the assumption that orbitals in the system can be separated into S different intersecting or non-intersecting subsets A i , each containing M A i orbitals, while M R orbitals are contained in the the remainder R such that M A i ≪ M for each i. In SEET, orbitals within subsets are strongly correlated and treated non-perturbatively; on the other hand, inter-subset correlations can be treated either perturbatively or non-perturbatively. In a case of a perturbative treatment, the inter-set correlations between two different orbital sets A i and A j , where i = j, or correlations in the remainder R are assumed to be weaker. Various ways of choosing the orbital subsets are possible. New versions implemented in this work have lead to much improved accuracy. We employ a selection based on the occupancies of natural orbitals (NOs) as well one based on the spatial locality of symmetrically orthogonalized atomic orbitals (SAOs), and localized molecular orbitals (LMOs). For details concerning each procedure see [48,50].
In SEET, the solution of the whole physical system is approximated by an affordable but frequently not so accurate Φ-derivable method suitable for treating weakly correlated systems. Subsequently, this approximation is corrected within chosen strongly correlated orbital subspaces by a non-perturbative method. We have demonstrated that the general SEET functional can be written as where the contributions with ± signs are used to account correctly for the possible double counting, for details see [50]. In this paper, Φ tot weak is determined from GF2 or HF. In general, other choices such as the GW method [94] are also possible. Φ Ai i stands for all those terms in Φ with all four indices i, j, k, l of two-body interactions v ijkl contained inside orbital subspace A i . Here, Φ Ai weak is the solution for subset A i within the weak-coupling method, here GF2. Φ Ai strong is the solution in the M A i subspace evaluated using a higher-order method suitable for treating "strong correlation". We denote this way of performing SEET calculations as SEET(method strong/method weak)-m([M A o]/basis) since here self-energies from intersecting orbital subspaces with M A strongly correlated orbitals are "mixed" between each other. While in a general case, the self-energy has to be evaluated for M M A orbital subgroups, where M M A can be a fairly large number, in practice one can quite trivially reduce it by identifying most important subgroups containing M A orbitals that lead to the significant lowering of the ground state energy, see [50].
Since calculations can be performed either in the energy or spatial basis employing NOs or SAOs and LMOs, respectively, as basis functions, we denote these choices using the "basis" keyword, where basis=NO, SAO, or LMO.
In SEET, the self-energy is constructed as a functional derivative of the Φ SEET -functional and the total SEET self-energy contains diagrams from both the 'strong' and 'weak' coupling methods.
Consequently, each strongly correlated subspace selfenergy is embedded into a weakly correlated self-energy generated by all orbitals outside the strongly correlated subspace and accounting for all the non-local interactions on the strongly correlated orbital groups [95]. For details explaining how to evaluate Σ Ai strong and Σ Ai weak , we refer the reader to [46,49].
We converged the electronic energy to 10 −4 E Ha . The inverse temperature β was set at 100 E Ha depending on the geometry. The Matsubara freqency grid was generated using the splines interpolation [96] with the maximum number of points varying between 20,000 and 50,000. In this paper, for the weakly correlated method in SEET, we used GF2 as well as HF. The strongly correlated part of the SEET self-energy was evaluated from the Anderson Impurity model using FCI or versions of restricted active space CI (RASCI). Note that in all the calculations presented here, SEET is based on the RHF reference.

Diagrammatic methods
The diagrammatic methods discussed in this work (SC-GW, GF2, BDMC) evaluate, either deterministically (GF2, SC-GW) or stochastically (BDMC), a subset of the terms in a diagrammatic interaction expansion. The methods are distinguished by the subsets of diagrams or series terms that are included in the calculation. The diagrammatic methods discussed in this work are based on the Feynman diagrammatic technique formulated in terms of self-consistent propagators and bare or renormalized interactions [97]. They are formulated at finite temperature but evaluated at low enough temperature that the T → 0 limit can be taken. Many of the methods were implemented specifically for the present study. The BDMC n calculations are the first attempt to use highorder skeleton diagrams in materials, to systematically improve GW. a. Self-consistent second-order Green's function theory (GF2) The fully self-consistent second order Green's function theory (GF2) [27][28][29][30][31][32] includes all second-order skeleton diagrams dressed with the renormalized second-order propagators and bare interactions. GF2 is formulated as a low-level approximation to the exact Luttinger-Ward (LW) functional [43,98] and therefore is Φ-derivable, thermodynamically consistent, and conserving [43,99].
We solve all the non-linear equations self-consistently at non-zero temperature using on an imaginary-time mesh [96,100]. At each iteration, the self-energy, Green's function, and Fock matrix are updated until convergence is reached, so that the converged solution is referenceindependent.
In the weakly correlated regime, GF2 preserves the advantages of the second-order Møller-Plesset perturbation theory (MP2), while at the same time avoiding the divergences appearing in non-self-consistent zero-temperature formulations of finite-order perturbation theories.
We converged the electronic energy to the threshold of 10 −6 E Ha . The inverse temperature β was set at 100 E Ha −1 or 200 E Ha −1 depending on the geometry. The Matsubara freqency grid was generated using splines interpolation [96] with between 20,000 and 50,000 points on the Matsubara axis. The calculations presented here are based on an RHF or UHF reference. We use GF2 to denote the version that is based on RHF and does not allow for spin symmetry breaking. The acronym UGF2 is used to denote for a spin unrestricted version based on UHF.

b. Diagrammatic methods with renormalized interactions
The BDMC and SC-GW methods are diagrammatic approximations formulated in terms of renormalized propagators G and renormalized ("screened") interactions W . They can be written as approximations to the Luttinger-Ward (LW) functional Φ [43,98], which implies that they are thermodynamically consistent and conserving [43,99]. The methods require the self-consistent determination of propagators G, self-energies Σ, screened interactions W , and polarizations P . While the expressions for Σ and P are different in the individual methods, the Dyson equations determine G and W , where G 0 and V are the bare electronic propagator and interaction. c. Self-consistent GW (SC-GW) The self-consistent GW (SC-GW) approximation truncates the skeleton sequence at the lowest-order graph, so that only the first-order contribution in the renormalized interaction is considered and the second-order exchange diagram is neglected. We have implemented a deterministic procedure of this approximation closely following Refs. [27,[43][44][45]: The Green's function is initialized using the Hartree-Fock (HF) approximation result. We then construct the polarization P = GG and obtain W from Eq. (A11). After computing the GW selfenergy Σ = −GW , we obtain the updated G by solving Dyson's equation, thus closing the self-consistency loop.
The method is formulated in a grand canonical ensemble, but the chemical potential µ used in each step is chosen to preserve the desired electron number. After convergence, the total energy is computed from Σ and G.
While SC-GW benefits from the conservation of the average particle number, energy, momentum and angular momentum, the size and complexity of W call for appropriate controlled simplifications. Instead of introducing physically motivated approximations that may not respect the conserving properties, we perform systematic linear algebra decompositions and truncations on V , W and G which vastly reduce the numerical effort [101]. We converge our calculations to a relative precision of 10 −7 . Convergence is reached at inverse temperature β = 100 E −1 Ha , and for about 8000 Matsubara frequencies. A combination of power and uniform meshes, adaptive grids, and spline grids is used for imaginary time and Matsubara data [96]. The code is based on the ALPS libraries [102] and uses integrals generated by LIBINT [103].

d. Treatment of zero-terms in the Hamiltonian
We note an important ambiguity in formulating certain diagrammatic approximations: Hamiltonian terms that are identically zero because of the Pauli principle, and thus have no observable effect on the physical properties of the system, can be arbitrarily added to a Hamiltonian. While these terms will evaluate to zero in the exact solution, they may evaluate to non-zero values in approximations that do not consider all terms at a given order. The GW approximation is such an approximation, whereas the GF2 approximation does not suffer from this problem.
To illustrate the point, consider an ideal spin-polarized lattice Fermi gas and add a contact interaction term The system remains non-interacting, and, correspondingly, in the diagrammatic expansion based on the bare interaction vertex U , all diagrams of the same order cancel each other exactly. However, low-order self-consistent theories like GW may break this cancellation by including some, but not all, higher-order contributions in U . As a result, a low-order self-consistent method would produce different answers for Eq. (A12) with U = 0 and U = 0. When the interaction Hamiltonian is projected on the orbital basis, similar considerations apply to all terms that create or delete two electrons in the same state, meaning that one has a choice of keeping or dropping Hamiltonian terms based on matrix elements v abad and v bada ; in what follows we will call them "zero-terms". In calculations with realistic Hamiltonians, these terms are usually kept, and we will refer to this choice as Hamil-tonianĤ. In contrast, the lattice model Hamiltonian  community usually omits zero-terms explicitly (or nullify the corresponding matrix elements); we will refer to this choice as HamiltonianĤ ′ . In an exact solution of the problem, all physical properties ofĤ andĤ ′ are identical.
One question we answer in this work is how the SC-GW results depend on the Hamiltonian representation with respect to "zero-terms". The two curves labeled as SC-GW(Ĥ) and SC-GW(Ĥ ′ ) in Fig. 8 correspond to the outcomes of the SC-GW method when it is applied toĤ andĤ ′ , respectively. The effect of zero-terms is profound. While the SC-GW(Ĥ) curve at higher energy is more accurate for separations up to the equilibrium distance, it becomes less accurate than the SC-GW(Ĥ ′ ) curve at R > 2.4 a B , and SC-GW(Ĥ ′ ) appears to produce more consistent energy differences at the large separation range.
Given that the two SC-GW answers surround the variational estimate, the difference between them can be used as a crude estimate of the accuracy of the SC-GW approximation. This point of view is confirmed by our study of vertex corrections. When the second-order vertex corrections are accounted for, the result for Hamil-tonianĤ ′ shifts upwards by an amount comparable to the difference between the SC-GW(Ĥ) and SC-GW(Ĥ ′ ) curves. The BDMC result starts converging to the best variational estimate when higher-order corrections are included, as shown in Fig. 8. We have also developed a stochastic implementation of the G 2 W formalism that is able to go beyond the lowestorder diagrams. Within the bold diagrammatic Monte Carlo framework (see e.g. [17][18][19]), the configuration space of skeleton diagrams for Φ is sampled stochastically starting from vertex corrections to SC-GW. The method can be applied to any system at non-zero temperature with arbitrary dispersion relation (doped and undoped) and with arbitrary shape of the interaction potential [17][18][19][104][105][106]. Both Σ and P are computed as sums of skeleton graphs, up to order n; we denote these sums as Σ n and P n , and abbreviate the corresponding level of approximation as BDMC n . The lowest-order contributions to Σ 1 and P 1 are based on products of the G and W functions mentioned above, and BDMC 1 is identical to SC-GW. To obtain final answers we either perform an extrapolation to the n → ∞ limit, or observe good convergence with increasing the diagram order. (This was demonstrated for several Coulomb systems in [19,107]).
In the orbital representation, each interaction line depends on four site/atom indices {(i, j); (k, l)}, four orbital indices {(α, β); (γ, δ)}, and two spin indices {σ; σ ′ } (the Coulomb interaction vertex does not change spin); to simplify notations we will be also using a composite index a = (i, α, σ). It is worth mentioning that terms with u = v = 0, where u = i − j and v = k − l are relative "distances" between the orbitals, represent the "densitydensity" part of the interaction potential, and their contribution is dominating in the final answer. Accounting for nonzero values of (u, v) in the Dyson equation for W changes the answer at the sub-percent level, and the corresponding contribution quickly saturates when separation in space between orbital indices, limited by cutoffs u * and v * , is increased. For hydrogen atoms in the singleorbital case we find that energies per atom obtained with unrestricted summation over (u, v) and with u * = v * = 2 coincide at the level of ∼ 10 −5 in relative units even at the smallest values of lattice constant R considered in this work (the agreement is better at larger values of R).
Appendix B: Tables of results and additional  benchmark data   In Tables II, III, IV, V, and VI we include the numerical values of the results presented in some of the figures the main text, as well as additional finite-basis and finitesize data. Lengths are measured in Bohr and energies in Hartree. Data not included in the appendices will be available online [11].
A variety of finite-size and/or finite-basis-set data are available. Figure 9 shows the equation of state (EOS) extrapolated to the TDL at cc-pVDZ and cc-pVTZ level. At cc-pVDZ level, DMRG provides a highly accurate EoS, with equilibrium bondlength and energy R eq = 1.880(2) a B , E 0 = −0.5608(2)E Ha . The corresponding correlation energies, using RHF energies as reference, are shown in the middle panel of Fig. 9. Figure 10 shows the EOS extrapolated to the TDL using SBDMRG and DMRG for STO-6G and cc-pVDZ level basis sets.
Appendix C: Additional details on reaching the complete basis set and thermodynamic limits

Extrapolation to the CBS limit
Extrapolations of the UHF and UHF-based correlation energy to the CBS limit are illustrated in Fig. 11, for the representative bondlengths R = 1.8, 2.8 a B . Using RHF as references gives indistinguishable results. Figure 12 shows the effect of the F12 correction on MRCI+Q energies. As illustrated in the main text, extrapolations to the CBS limit obtained with and without F12 correction agree with each other to within the fitting uncertainties, confirming the robustness of the extrapolation procedure.

Extrapolation to the TDL
As mentioned, we find that the use of finite clusters (chains) tends to give better convergence to the TDL than with periodic boundary conditions, except for very short bondlengths. This is illustrated in Fig. 13. The faster convergence with OBC than PBC is somewhat surprising and counter to commonly held belief. The quasione-dimensional nature of the hydrogen chain is likely an important factor.
Extrapolations to the TDL are illustrated in the minimal basis in Fig. 14 for the bondlengths R = 1.0, 1.4, 1.8, 2.8. The importance of the A 2 (R)N −2 correction is evident in capturing the size effects. Note that the finite-size effects are larger at shorter bondlengths (ranging from roughly 50 m E Ha for R = 1.0 a B to less than 1 m E Ha at large separation as the chain turns into a collection of uncoupled H atoms). This is expected from the nature of the long-range Coulomb interaction.
In Fig. 15, we examine the robustness of the extrapolation. Results from the simple subtraction trick separating surface and bulk [i.e. k = 1 in Eq. (7)] are compared with the reference extrapolation using E N1,N2,N3 (R). Extrapolations agree with each other to well within 1 m E Ha , and approach E N1,N2,N3 (R) as N 1 , N 2 are increased. These results suggest that extrapolations E N1,N2,N3 (R) to the TDL have a resolution of the order of 0.1m E Ha per particle. For most methodologies, therefore, the uncertainty on the TDL extrapolation is one order of magnitude smaller than the bias due to the underlying approximations. It is also well within the uncertainty bound in our final best estimate of the EOS.
Our DMET strategy for calculations that directly access the TDL is somewhat different. We start from a R AFQMC DMET [2] FCI  RHF calculation on a large system (H 150 , H 300 , . . . ). A single fragment around the central H-atom is constructed, using x − 1 neighbors around it. The energy contribution due to the central H-atom is taken as the energy per atom, while the chemical potential is adjusted in the central H-atom such that its particle number contribution is 1.
In order to more efficiently access large impurity sizes, we truncate the embedding space such that the bath orbitals with a small norm (< 0.01) are excluded from the high level calculation. Our DMET results are re-ported using DMRG as a solver and a bond dimension of D = 1000. The error from the DMRG solver is less than 1 µE Ha in the energy per atom. To converge to the TDL it is necessary to converge both the full system as well as the fragment size. We find that it is necessary for the full system to be very large to converge to the TDL at short bond lengths. We carry out calculations on systems of increasing size (up to H 1950 at R = 1.0 a B ) using a fixed fragment size [22], until the change in the energy per atom is smaller than 0.01 mE Ha . Using this suitably defined full system size [108], we perform calcu-  Figure 9. Top: EOS in the thermodynamic limit computed with finite basis sets (cc-pVDZ and cc-pVTZ). Middle: detailed comparison using DMRG and AFQMC as reference. Error bars on the DMRG data indicate estimates of the TDL extrapolation uncertainties based on the results at the STO-6G level in Fig. 15. AFQMC+∆DMRGDZ is shown as reference for TZ (empty red circle), where the correction is obtained from the energy difference between DMRG and AFQMC at DZ. Bottom: Correlation energy per particle in the TDL, at cc-pVDZ (left) and cc-pVTZ (right) level.
lations on larger and larger fragments (see Fig. 17, panel b) and perform a quadratic extrapolation with the in-verse of fragment size. The fragment size extrapolation using the largest 4 or 5 fragment sizes yields the same limit to better than 0.02 mE Ha .     Figure 11. Illustration of the extrapolation to the CBS limit. Results are shown for H10 at bondlengths, R = 1.8 and 2.8 aB.
The unrestricted Hartree-Fock energy is fitted to an exponential function of the index x=2,3,4,5 of the cc-pVxZ basis, the correlation energy to the power law α + β x −3 .   Figure 14. Extrapolation of the EOS to the TDL limit, in the minimal STO-6G basis. The energy per particle is fitted to a second-order polynomial A0 + A1N −1 + A2N −2 in N −1 , reflecting the presence of a bulk and a boundary contribution.