Direct Comparison of Many-Body Methods for Realistic Electronic Hamiltonians Collaboration the Many-Electron Problem)

A large collaboration carefully benchmarks 20 first-principles many-body electronic structure methods on a test set of seven transition metal atoms and their ions and monoxides. Good agreement is attained between three systematically converged methods, resulting in experiment-free reference values. These reference values are used to assess the accuracy of modern emerging and scalable approaches to the many-electron problem. The most accurate methods obtain energies indistinguishable from experimental results, with the agreement mainly limited by the experimental uncertainties. A comparison between methods enables a unique perspective on calculations of many-body systems of electrons.


I. INTRODUCTION
A major challenge in condensed matter physics, materials physics, and chemistry is to compute the properties of electronic systems using realistic Hamiltonians. Efficient and accurate calculations could enable computational design of drugs [1] and other materials [2,3] and shed light on a number of physical questions, such as the origin of linear-T resistivity [4], high-temperature superconductivity [5], and many other effects that currently lack a satisfying explanation.
Many-body quantum calculations on classical computers are challenging, because the dimension of the Hilbert space increases dramatically with the number of particles. For example, in the simple case of a CuO molecule with a large (5z) basis, the Hilbert space is of dimension 10 44 for the S z ¼ 1 2 sector. A vector of this size cannot be represented in any computer; at the present time, the Oak Ridge machine Summit has approximately 250 petabytes of storage [6], which is still approximately 17 orders of magnitude too small to store a single vector. Modern techniques therefore use compression and other techniques to approximate the state vectors.
There are many, not always mutually exclusive, approaches to dealing with the dimensionality: truncation of the wave function space through wave function Ansätze, one-particle Green function approaches, density functional theory, Monte Carlo methods, and embedding techniques.
The techniques vary dramatically in their computational cost and accuracy. Most studies [7][8][9][10][11][12][13][14][15] judge the accuracy of the methods by comparing to experimental energies [16], which are computed by taking differences of total energies and are therefore subject to a fortuitous cancellation of error. Recently, it has been pointed out [17,18] that comparisons between numerical implementations can be extremely valuable, rather than making comparisons directly to experiment. In this study, we include three systematically improvable methods with sufficiently small prefactors that they yield almost exact total energies within the chosen basis set and serve as a benchmark for testing all other methods.
In this manuscript, we apply a diverse array of 20 established and emerging techniques to a test set of small, realistic transition metal molecules and atoms. Each technique is implemented by an expert and employs precisely the same Hamiltonian. This approach allows us to directly assess methodological differences without confounders such as different Hamiltonians and has been important for a previous benchmark study of the hydrogen chain [19] and helium atom [20,21]. For these systems, we achieve a convergence of exponentially scaling but systematically convergable methods at the order of 1 mhartree in the total energy, or about 300 K, establishing a reliable reference on realistic Hamiltonians with complex atoms. We then assess the accuracy of more approximate approaches for computing the total energy of atoms and molecules, which allows some assessment of the transferability of performance with an increasing system size. Finally, we study how errors in the total energies translate into errors of physical observables obtained as differences of total energies, and we make comparisons to experiments. These results provide an important reference for the development of techniques that can address the larger goal of computing electronic properties of realistic materials. Table I lists the methods tested in this work. It includes most of the common techniques to address the manyelectron problem, as well as some emerging methods. It also includes a few methods such as configuration interaction with singles and doubles (CISD) which are no longer commonly used but have historical relevance. The methods in this benchmark vary dramatically in their computational cost; the density functional theory methods require only a few minutes to complete the test set, while some of the more advanced techniques are not able to treat every basis for every system with the available amount of computer time. The methods also scale very differently, ranging from OðN 3 e Þ to exponential in the number of electrons N e . Of the three systematically converged methods [initiator full configuration interaction quantum Monte Carlo (iFCIQMC), density matrix renormalization group (DMRG), and semistochastic heat-bath configuration interaction (SHCI)], only SHCI is performed for all the systems in all the basis sets. Consequently, SHCI energies are used as the reference.

II. METHODOLOGY
Some of the other techniques are, in principle, systematically improvable, such as configuration interaction,  [22]. Column A lists the largest basis set employed by that method for at least one of the transition metal atoms, and column B lists the same for the monoxide molecules. The basis sets are abbreviated in order as d, t, q, 5, and c for complete basis set. coupled cluster, self-energy embedding theory, and the Monte Carlo methods, but convergence to better than 1 mhartree is not achieved on these systems for the level of the method employed. Some of the techniques give upper bounds to the exact energy, such as diffusion Monte Carlo (DMC), CISD, DMRG, and Hartree-Fock (HF). Finally, for completeness, it should be noted that the methods also require different levels of specification to define the approximations used. For example, some of the methods can be reproduced only by specifying the initial starting determinant; others require defining an initial multideterminantal wave function or the choice of partitioning between high-level and low-level methods. We consider transition metal systems, with the core electrons removed using effective core potentials [56][57][58]. These potentials accurately represent the core [59] in manybody simulations and allow all the methods considered in this work to use the same Hamiltonian. In addition, they provide an easy way to include scalar relativistic effects, needed for a meaningful comparison to experiment. These potentials are available for O, Sc, Ti, V, Cr, Mn, Fe, and Cu, which define our test set. We consider these atoms, their ions, and the corresponding transition metal monoxide molecules. To simplify the comparison, the molecules are computed at their equilibrium geometry.
Almost every electronic structure method (all the methods in this study except DMC) works in a finite basis. Here, we follow the chemistry convention of defining an ascending basis set denoted by the zðζÞ value, ranging from 2 to 5; i.e., dz, tz, qz, and 5z. For each system, we consider the first-principles Hamiltonian projected onto the basis, making for a total of 23 × 4 ¼ 98 calculations for each method. See Supplemental Material [22] for details on the precise basis sets used in this study. While the results are comparable to experiment only in the complete basis set limit (cbs), for each basis set there corresponds a projected Hamiltonian which also has an exact solution. We thus can compare methods within a basis, since the Hamiltonian is defined precisely.
In Table I, we list the methods considered in this work. The deviation in the total energy between two methods m and n is computed as where N is the total number of calculations performed in common between the methods. This number is a measure of how well the output total energies between two methods agree. It is possible for two methods with large σ to agree on energy differences if there is a significant cancellation of errors.
To compare total energies between methods and systems in a consistent way, we use the concept of percent of correlation, commonly used in quantum chemistry: where E HF is the Hartree-Fock energy, m stands for the method under consideration, and E SHCI is the total energy computed in the basis by the SHCI method. At 100% of the correlation energy, the exact result is obtained. This quantity is particularly useful, since methods tend to obtain similar percentages of the correlation energy across different basis sets and systems. Extrapolation to the basis set limit is done making the usual assumption that the correlation energy (difference between Hartree-Fock and the exact energy) scales as 1=n 3 , where n is the cardinal number of the basis set, and that the Hartree-Fock energy exponentially converges to the complete basis limit. Complete basis set extrapolation is necessary for a comparison of the finite basis set results to experiment, DMC, and density functional theory results. DMC works directly in the complete basis limit, whereas density functional methods are designed to reproduce complete basis set limit energies. The uncertainty in the extrapolation, judged from the variation between different fits to the extrapolation, is approximately 2-4 mhartree; for details, see Supplemental Material [22]. Thus, in this test set, the largest uncertainty in the complete basis set total energy is due to the extrapolation of finite basis set energies to the infinite limit.
The energy differences studied are the ionization potential of a transition metal atom M [IP ¼ EðM þ Þ − EðMÞ] and the dissociation energy of a metal oxide molecule MO [DE ¼ EðMÞ þ EðOÞ − EðMOÞ]. These quantities have been studied in detail for these systems in the past, e.g., Refs. [7][8][9][10][11][12][13][60][61][62][63], among others. However, none of these previous studies attain reference energies as well converged as the ones in this paper, and none compare energies from a large number of methods.

III. RESULTS
We show several views of the data collected in this study in the figures. Supplemental Material [22] contains various tables and the complete set of data (approximately 1200 calculations) on which these plots are based. Figure 1 establishes that several high-accuracy techniques are in agreement and establishes a reference technique SHCI. Figure 2 compares the performance of methods in computing the total energy as compared to the reference. Figure 3 compares the performance of methods in computing the ionization potential of the atoms and dissociation energy of the molecules. Figure 4 summarizes the cancellation of error for different techniques in computing the differences in energies. Finally, Fig. 5 compares calculations using methods found to be accurate to the experimental dissociation energies. In this section, we examine related methods in the context of these different views.
In Fig. 1, we show a cluster analysis of the total energies using Eq. (1), evaluated on the intersection of basis sets and systems available for both methods, as the distance metric. iFCIQMC, DMRG, and SHCI are converged to very high levels of accuracy. In fact, these three methods agree to approximately 1 mhartree for all systems and basis sets that are computed. Because of this threefold agreement, we can take any of these results as the exact ground state energy in a given basis set to within an rms error of less than 1 mhartree, which is approximately what is termed "chemical accuracy" in the context of energy differences. Here, we achieve 1 mhartree accuracy in the total energy of the ground state. However, as shown in Table I, iFCIQMC and DMRG calculations are feasible within the available computer time for only the smaller basis sets, so we use SHCI as the reference. For finite basis sets, the estimated uncertainty is approximately 1 mhartree, and for the complete basis set, the estimated uncertainty is approximately 2-4 mhartree due to the extrapolation uncertainty.
Density functional methods have a large spread across systems in the percent of correlation energy attained (Fig. 2). The gradient-corrected and the hybrid functionals (B3LYP, HSE06, PBE, and SCAN) improve the LDA. The most recently proposed of these, SCAN, is more consistent in the percent of correlation energy obtained at around 80%-90% of the correlation energy. Figure 4 shows that it also benefits more than the other functionals from a cancellation of errors between the atom and the molecule to give more accurate dissociation energies, although it has less cancellation of errors for the ionization potentials. Much of the improvement in accuracy of the hybrid functionals over PBE is in the cancellation of error.
The random phase approximation (RPA) and both versions of GW overestimate the correlation energy as shown in Fig. 2. While the total energy tends to be too low, those errors tend to cancel for QSGW applied to energy differences, as can be seen in Figs. 3 and 4. As can be seen in Fig. 2, CISD, a truncated determinant expansion technique well known to have size consistency defects, performs much better for the atoms than the molecules, which leads to rather poor predictions for the dissociation energy of the molecules (Fig. 3). The error is large enough that CISD is not included in Fig. 4 to improve readability of the more accurate numbers. We note that unrestricted coupled cluster with singles and doubles (UCCSD), which is size consistent, also performs worse on the molecules than the atoms, though to a lesser degree than CISD. This difference results in the underestimation of the dissociation energy (Fig. 3) and no cancellation of error in the dissociation energy but a significant cancellation in the ionization potential (Fig. 4).
Fixed node diffusion Monte Carlo with a single-determinant trial function [DMC(SD)] yields a lower bound to the extrapolated correlation energy, corresponding to an upper bound to the total energy, which is apparent in Fig. 2. The remaining energy is the fixed node error, the main approximation in the DMC calculations, which for a single Slater determinant nodal surface is much larger than the extrapolation uncertainty. With the single Slater determinant, DMC obtains 90%-95% of the correlation energy quite consistently, in line with previous benchmarks on smaller systems [68]. This consistency results in a significant cancellation of error (Fig. 4) in the dissociation energy and ionization potential.
Self-energy embedding theory with a full configuration interaction solver and GF2 embedding [SEET(FCI/GF2)] obtains results in good agreement with the reference total energy (Fig. 2), resulting in accurate energy differences (Fig. 3). Consequently, it lies very close to the x ¼ y line in Fig. 4 and does not benefit from an additional cancellation of error, as the energies are already accurate. The errors in the total energy are not strongly correlated with the atomic species; for example, the error in the Ti atom is not statistically similar to the error in the TiO atom, resulting in little cancellation of error.
The auxiliary field quantum Monte Carlo with a multiple determinant trial function [AFQMC(MD)] gives good agreement with the reference total energy, with an rms deviations of about 3-4 mhartree. The dissociation energies have an rms deviation of approximately 2.5 mhartree, which is consistent with the conclusion of a recent benchmark on a large set of transition metal diatomics [14]. The use of single-determinant unrestricted Hartree-Fock trial wave functions leads to less accurate results, roughly doubling the rms error in the total energy of the molecules (see Supplemental Material Sec. I A [22]).
Coupled cluster with singles, doubles, and perturbative triples [UCCSD(T)] performs very well on these systems, obtaining close to 100% of the correlation energy. For these problems, UCCSD(T) has a notably low cost for high performance. The accuracy of UCCSD(T) is likely due to the fact that these systems are not strongly multireference, in that, even in the near-exact wave functions, there is a single dominant determinant that makes a large contribution to the wave function. This contribution can be seen by examining the natural orbital occupations; for example, in UCCSD, the spin-resolved natural orbitals with large occupations have occupations of 0.96 or greater. The single-reference nature also explains the mediocre performance of the multireference methods such as MRLCC, which sacrifice some accuracy in the single-reference case to treat multireference situations more accurately. In general, active space techniques, which operate within an explicitly chosen subspace of the larger Hilbert space, are not very effective for these systems. We believe that the reference data produced computationally have lower uncertainties than the experiment for the purposes of benchmarking quantum calculation techniques. The ionization potential of the large-basis SHCI results is in agreement with the experiment with a mean absolute deviation of 0.2 mhartree, or 7 meV, so one could equivalently use experiment or the SHCI reference values, as can be verified in Table VI in Supplemental Material [22]. The experimental dissociation energy estimation is limited by the challenges of the measurements, and the experimental measures differ from one another by as much as 0.5 eV. In Fig. 5, the high-accuracy estimates of the dissociation energy of the molecules is shown, compared to experimental values with zero point energy removed [69][70][71][72][73][74][75]. For these systems, the experimental uncertainty of the dissociation energy is larger than the difference between the most accurate techniques in this benchmark. Remarkably, SHCI, UCCSD(T), and AFQMC(MD) agree to about 0.1 eV for all the molecules. We also should note that, since we use effective core potentials to standardize the benchmark, there may be some small errors in comparing directly to experiment. However, we see no evidence that the potentials used are limiting the accuracy; the most accurate methods obtain results well within the experimental uncertainty, with the possible exception of VO, for which most of the experimental values are slightly below the theoretical ones.
When computing differences of total energies, both methodological errors and errors due to finite basis sets tend to cancel. In Fig. 4, we quantify the methodological cancellation of errors in many of the techniques studied in this work. Considering basis set errors, the rms error in the total energy in the commonly used tz basis compared to the complete basis set limit is 75 mhartree, while the rms error in the ionization energy and dissociation energy for the same comparison are 1.6 and 6 mhartree, respectively, as can be seen in Table VIII in Supplemental Material [22].

IV. CONCLUSION
We survey 20 advanced many-electron techniques on precisely defined realistic Hamiltonians for transition metal systems. For a given basis set, we achieve approximately 1 mhartree agreement on the total energy between highaccuracy methods, which provides a total energy benchmark for many-body methods. To our knowledge, such an agreement is unprecedented for first-principles calculations of transition metal systems. Our accurate reference energies should enable the development of approximate, but more computationally efficient, many-body techniques as well as better density functionals, without the necessity of experimental reference values. These systems are also a useful test for future quantum computing algorithms. To enable such comparisons, we include pyscf scripts that can execute the benchmark for any density functional available in libxc [76] and can export the one-and two-body integrals needed for testing many-body methods.
We assess the state of the art in achieving high accuracy in realistic systems. The benchmark set includes systems with large Hilbert spaces of around 10 44 determinants. While these spaces are so large that a single vector cannot fit in any computer memory, the computations are feasible due to powerful compression of that space. The systematically converged techniques used in this work (DMRG, FCIQMC, and SHCI) are able to achieve excellent agreement but can be applied only to relatively small systems due to their computational cost. It is thus important to understand the errors in lower-scaling techniques that can be applied to larger systems and whether performance on small systems is transferable to larger systems. Our study takes a step in that direction, since we are able to achieve converged results for both correlated atoms and molecules, and indeed we observe that the accuracy of some techniques degrades with system size.
To avoid misinterpretation of the results, we make a comment here. In order to ensure high-quality results, it is necessary to limit the number of systems on which this benchmark is performed. While treating electron correlation accurately is important to obtain accurate results, these systems have a particular character of correlation. In a determinant expansion of the wave function, the systems chosen here have one determinant with a large weight and many determinants with small weights, rather than several determinants with large weights. For such systems, methods such as UCCSD(T) are accurate. The performance profile will likely be different for differently correlated chemical systems, so benchmarking efforts of similar quality in that realm would be highly valuable.