Accurate many-body electronic structure near the basis set limit: Application to the chromium dimer

We describe a method for computing near-exact energies for correlated systems with large Hilbert spaces. The method efﬁciently identiﬁes the most important basis states (Slater determinants) and performs a variational calculation in the subspace spanned by these determinants. A semistochastic approach is then used to add a perturbative correction to the variational energy to compute the total energy. The size of the variational space is progressively increased until the total energy converges to within the desired tolerance. We demonstrate the power of the method by computing a near-exact potential energy curve for a very challenging molecule: the chromium dimer.


I. INTRODUCTION
The evaluation of accurate energies for correlated manyelectron systems is one of the most important challenges for computational science. The difficulty arises from the fact that the number of many-electron states increases combinatorially with the number of single-electron states (orbitals) N orb and the number of up-and down-spin electrons N ↑ , N ↓ (N = N ↑ + N ↓ ) as N orb C N ↑ × N orb C N ↓ .
There exist a number of accurate methods for weakly correlated systems, which we define for the purpose of this paper as systems for which much of the wave-function amplitude resides on a relatively small number of many-electron basis functions (Slater determinants), all of which can be constructed by exciting electrons from the orbitals of a reference state to orbitals within a small set of "active" orbitals [1]. In that case it is possible to perform an exact diagonalization in the complete active space (CAS), i.e., the space spanned by all determinants reachable by any number of excitations among these active orbitals. If the orbitals are rotated to optimize the energy, the resulting method is called the complete active space self-consistent field (CASSCF) method [2,3]. The resulting energy can be improved by performing second-order perturbation theory to approximately include the contribution of additional states, resulting in the complete active space perturbation theory (CASPT2) method [4].
At the other end of the spectrum, very strongly correlated systems can be defined as those systems for which it is necessary to include a large fraction of all possible Slater determinants to get an accurate energy and other expectation values. For these systems there is no recourse other than exact diagonalization in the entire Hilbert space, which is feasible only for very small systems or very small basis sets.
In between these two extremes, moderately strongly correlated systems require a large number of important Slater determinants to obtain accurate results, but this number constitutes a vanishingly small fraction of the dimension of the Hilbert space. Further, these states do not have any obvious pattern (e.g., they do not all belong to a CAS space). Many ab initio Hamiltonians belong to this category. It is for these systems that selected configuration interaction plus perturbation theory (SCI+PT), first developed about 50 years ago [5,6], can be most useful. Recently, there has been renewed interest in these methods [7][8][9][10][11][12][13][14][15] and some interesting applications, particularly to excited states [14,16]. The recent development of a very efficient algorithm in the form of the semistochastic heatbath configuration interaction (SHCI) method by some of the authors of this paper [17][18][19][20][21] has now made it possible to perform calculations on a wider and more interesting set of systems. We next briefly describe the SHCI method and the main innovations that account for its efficiency. Then we apply the SHCI method to calculate the potential energy curve of a small but very challenging molecular system, the chromium dimer.

II. METHOD
Selected configuration interaction plus perturbation theory (SCI+PT) methods approximate the full configuration interaction (FCI) energy by selecting the most important determinants from a large Hilbert space. These methods contain two steps. In the first step, a set of important determinants, V, are selected and the Hamiltonian is diagonalized in the subspace of these determinants to obtain the lowest, or the lowest few, eigenstates. In the second step, a second-order perturbation theory is used to calculate the energy contributions of the determinants that are in a space P that is disjoint to V, but that have a nonzero Hamiltonian matrix element connecting them to at least one of the determinants in V. We will refer to V and P as the variational and perturbative spaces, respectively. The recently developed SHCI algorithm substantially reduces the computational time of performing both the variational calculation and the perturbative correction, and eliminates the memory bottleneck for the perturbative calculation. We describe these two innovations next.
The method used in this paper is an improved version of the one recently developed [21] by some of the authors of this paper. Straightforward SCI+PT implementations use an energetic criterion based on second-order perturbation theory, for selecting determinants D a , to be included in V. In Eq. (1) E is the energy of the variational wave function and E a is the energy of determinant D a . SHCI modifies the selection criterion to which greatly reduces the cost by taking advantage of the fact that most of the H ai matrix elements are two-body excitations, which depend only on the indices of the four orbitals whose occupations change and not on the other occupied orbitals of a determinant [17]. Thus, by presorting the absolute values of all possible matrix elements of the two-body excitations in descending order, the scan over determinants D a can be terminated as soon as |H ai | drops below 1 /|c i |. In this paper, a similar idea is used to speed up the selection of one-body excitations as well. This enables a procedure in which only the important determinants are ever looked at, resulting in orders-of-magnitude saving in computer time. SHCI uses an analogous procedure to efficiently select the important perturbative determinants in P, replacing the variational cutoff 1 in Eq.
(2) with a much smaller perturbative cutoff 2 1 . Even with this improvement, a straightforward evaluation of the perturbative correction has a very large memory requirement because all distinct determinants connected to V, but not in V, that meet the criterion in Eq. (2) with = 2 must be stored [22]. The total number of connected determinants is >10 15 (>10 13 distinct connected determinants) when the number of variational determinants is on the order of 10 9 , as is the case for the calculations in this paper. To solve this problem, we have developed a two-step [18], and later an improved three-step [21] semistochastic perturbative approach that overcomes this memory bottleneck, and is fast and perfectly parallelizable. A different efficient semistochastic perturbative approach has been used in Ref. [11]. These improvements allowed us to use 2 × 10 9 variational determinants [21], which is two orders of magnitude larger than the largest variational space of 2 × 10 7 determinants [11] used in any other SCI+PT method.
We choose 2 = 10 −6 1 , so by progressively reducing the single parameter 1 a systematic convergence to the full configuration interaction limit is obtained. The energy at the 1 = 0 limit is obtained using a quadratic fit to the energies versus the perturbative correction [19]. The convergence of the energy depends greatly on the choice of orbitals. Natural orbitals give faster convergence than Hartree-Fock orbitals. Orbitals that are optimized to minimize the SHCI energy [20] for a large value of 1 yield yet faster convergence, but the optimization typically requires many more optimization iterations than CASSCF optimizations require because of strong coupling between the orbital and CI parameters. In this paper, we greatly accelerate the convergence by using an overshooting method based on the angle between successive parameter updates.

III. POTENTIAL ENERGY CURVE OF Cr 2
The potential energy curve of the chromium dimer is very challenging for state-of-the-art quantum chemistry methods for several reasons. The 1 + g ground state of the molecule dissociates into two atoms in high-spin 7 S states, each with six unpaired 3d and 4s electrons. Thus, the molecule has a formal sextuple bond, and the minimal CAS space required for correct dissociation is CAS(12e,12o). Consequently, neardegeneracy correlation is very important, as evidenced by the fact that spin-unrestricted coupled cluster theory with single, double, and perturbative triple excitations [UCCSD(T)] predicts a dissociation energy that is much too small [23]. Simultaneously, dynamic correlation is also very important, as evidenced by the fact that CASSCF in a CAS(12e,12o) space gives a very weak minimum at a very large bond length. Thus, most of the calculations that have been performed employ CASPT2 [24][25][26][27] or the related n-electron valence state perturbation theory (NEVPT2) [28] to try to capture both near-degeneracy and dynamic correlation effects. These methods are sensitive to the choice of the CAS space, and in addition the CASPT2 method is sensitive to the choice of the ionization potential electron affinity (IPEA) shift. In fact, CASPT2 with a CAS(12e,12o) reference space and reasonable choices of IPEA shift yield well depths ranging from 1.1 to 2.4 eV [26]. Since conventional CASSCF calculations are limited to about CAS(18e,18o), the density matrix renormalization group (DMRG) [29,30] method has been employed [24,28] as a CAS space solver, allowing the use of the larger CAS(12e,22o), CAS(12e,28o), and CAS(28e,20o) reference spaces, which partially cures this problem. Despite this, these methods have been unable to provide a definitive potential energy curve (PEC) for Cr 2 .
Externally contracted multireference configuration interaction (MRCI) using determinants selected from a DMRG calculation in a CAS(12e,42o) as the reference space has also been used [31]. It gives a reasonably-shaped PEC shifted down by about 0.1 eV relative to experiment. Multireference averaged quadratic coupled cluster (MR-AQCC) [32] is another accurate method that has been used to compute the PEC of Cr 2 . It gives a well depth of 1.35 eV, and the shape of the PEC is in reasonable agreement with experiment.
Probably the most accurate method used for Cr 2 is the auxiliary field quantum Monte Carlo (AFQMC) [33]. Most AFQMC computations are performed using the phaseless approximation, the accuracy of which depends on the choice of the trial wave function. For Cr 2 , phaseless AFQMC is not sufficiently accurate with affordable trial wave functions. On the other hand, free-projection AFQMC is exact (aside from statistical error), but very computationally expensive. So, a hybrid approach was used wherein free-projection AFQMC was performed in the 3z basis and the complete basis set correction was computed by adding in the correction from phaseless AFQMC for 3z, 4z, and 5z basis sets for r < 2 Å, and by adding in the correction from free-projection AFQMC with only 12 rather than 28 correlated electrons in 3z and 4z basis sets for r > 2 Å.
Part of the interest in Cr 2 comes from the fact that an experimentally deduced PEC is available which can be used to some extent to test the accuracy of theoretical methods. The shape of the PEC comes from high-resolution photoelectron spectra of Cr − 2 , which showed 29 vibrationally resolved transitions to the neutral Cr 2 ground state [34]. However, there are gaps in the measured vibrational levels and the assignment of the higher levels is not unambiguous, so part of the PEC is not well constrained by the data. The vertical placement of the potential energy curve is determined from the dissociation energy. Experimental values vary considerably: 1.56 (26) [37], and 1.54(6) eV [38]. We will use the last number in most of our plots since it is more recent, but will keep in mind that it has considerable uncertainty. Since the zero-point energy is 0.03 eV [24], the potential energy curves we present are shifted so that the well depth is 1.57 eV. Recently, the experimental data of Casey and Leopold [34] have been reanalyzed by Dattani [39] using a more flexible fitting function and a fully quantum mechanical treatment to obtain a slightly different PEC from the original. We show both of these curves in all our figures.

IV. HAMILTONIAN
For the 3d transition metals it is important to include scalar relativistic effects, but the spin-orbit splitting is small. The two standard scalar relativistic Hamiltonians are the Douglas-Kroll and the x2c [40] Hamiltonians. In our work, we employ mostly the x2c Hamiltonian, but we have verified that the Douglas-Kroll Hamiltonian yields essentially the same PEC, though it gives a total energy for the molecule that is about 14.6 mHa higher. The one-and two-body integrals for the x2c Hamiltonian are obtained using the PYSCF package [41].

V. BASIS SETS
Quantum chemists have designed several different sets of standard single-particle basis functions for most of the elements in the periodic table [42]. The "correlation consistent" bases of Dunning and co-workers [43][44][45] are widely used and are designed to enable systematic extrapolation to the complete basis limit. These bases are designated cc-pVnZ, where n is referred to as the cardinal number of the basis set. They are designed for nonrelativistic calculations; the corresponding basis sets for relativistic calculations are designated cc-pVnZ-DK. We employ the cc-pVnZ-DK basis sets with n ranging from 2-5, and for brevity we designate these by 2z, 3z, 4z, and 5z. These have 86, 136, 208, and 306 basis functions for the dimer, respectively, which result in the same number of orbitals written as linear combinations of the basis functions. In the SHCI calculations we allow excitations to and from all these orbitals, keeping only a small number of core, and in some calculations semicore, orbitals doubly occupied. By using more than one basis set, we can extrapolate the UCCSD(T) and SHCI energies to the complete basis limit making the usual assumption that the binding energy converges as the inverse cube of the cardinal number n for n 3.
Cr 2 at a bond length of 1.5 Å in the Ahlrichs SV basis [46] has become a very popular system for testing the accuracy and efficiency of electronic structure methods, even though this basis is much too small to give even a qualitatively correct PEC [47]. In the Supplemental Material [48], we provide accurate energies for this basis, both with and without core excitations.

VI. CORRELATING 12 ELECTRONS
Molecular systems containing heavy atoms have orbitals with very different energies. Although core electron correlations make a large contribution to the total energy, they have only a relatively small effect on energy differences such as the potential energy curve (PEC) because the core contributions in the atoms and the molecule tend to cancel. In Cr, the 3d and 4s electrons are the valence electrons, the 3s and 3p electrons are semicore electrons, and the 1s, 2s, and 2p electrons are the core electrons. Early calculations of Cr 2 employed only valence electron excitations; later calculations included also semicore electron excitations.
The computed energies depend not only on which orbitals are allowed to excite, but also on the nature of the orbitals that are kept frozen (not allowed to excite). Figure 1 shows the PEC obtained from correlating only the 12 valence electrons by allowing excitations to all higher-lying orbitals, keeping the semicore and core electrons fixed either in Hartree-Fock (HF) orbitals, or in orbitals obtained by optimizing in a CAS(12e,12o) space. The two curves differ greatly from each other and from the experimentally deduced PECs.   1. Comparison of the SHCI potential energy curves correlating the 12 valence electrons with a HF core and a CAS core to experimentally deduced curves. Note that in the SHCI calculation excitations to all higher-lying orbitals are allowed. When correlating 12 electrons, the nature of the frozen orbitals has a large effect on the PEC.  2. Comparison of the SHCI potential energy curves, correlating the 12 valence electrons using a CAS core and 2z-4z basis sets, to experimentally deduced curves. It is apparent that correlating just the 12 valence electrons is insufficient to get an accurate PEC.
In Fig. 2 we employ the 2z, 3z, and 4z basis sets to study the basis set dependence of the PECs obtained again from correlating only the 12 electrons, using CAS(12e,12o) semicore and core orbitals. Although the PECs improve with increasing basis size, it is clear that correlating just 12 electrons is insufficient to get good agreement with experiment. This is in fact well known, but the precise PECs have not been published before.

VII. CORRELATING 28 ELECTRONS
The coupled cluster method with single, double, and perturbative triples [CCSD(T)] amplitudes gives very accurate energies for systems where a single determinant has a large amplitude, such as most organic molecules at equilibrium geometry. Here, we use the spin-unrestricted versions of HF and CCSD(T), denoted by UHF and UCCSD(T), respectively, meaning that the HF up-spin and down-spin orbitals, and the CCSD up-spin and down-spin amplitudes, need not be the same, since this allows for dissociation of the molecule into two high-spin atoms. On the other hand, in our SHCI calculations, up-and down-spin orbitals are the same, so that the SHCI wave function can be an eigenstate of S 2 . In Fig. 3 we show the PECs obtained from UCCSD(T) using PYSCF [41] and 2z through 5z basis functions. Of course, the total energies go down monotonically with increasing basis size, but very surprisingly the 2z PEC curve lies lower than the 3z, 4z, and 5z curves. The same behavior is observed also in SHCI calculations at equilibrium with 2z, 3z, and 4z bases. The infinite basis extrapolated UCCSD(T) curve, shown as the solid blue line, lies below the 2z curve at short distances and above the 2z curve at large distances. The extrapolation is done using the 4z and 5z curves, but almost the same extrapolated curve is obtained from 3z and 4z curves. The 28 correlated electron UCCSD(T) curves have shapes similar to those from the 12 correlated electron SHCI curves, but they agree even less well with experiment.
Although UCCSD(T) gives poor PECs, it can be used to provide a reasonable basis set correction to the SHCI  FIG. 3. UHF and UCCSD(T) potential energy curves correlating 28 electrons in bases ranging from 2z to 5z, and the complete basis limit. Note that the 2z curve lies lower than the 3z, 4z over the entire range, and lower than the 5z and complete basis curves over most of the range. curves that we present next. The accuracy of the correction has been checked at the equilibrium bond length, where we find that the correction to the 2z SHCI energy, using the (3z, 4z, 5z) UCCSD(T) energies agrees with that obtained from the (3z, 4z) SHCI energies within 0.1 eV. More precise agreement cannot be expected since the uncertainty of the SHCI 3z, 4z energies from the extrapolation in 1 is itself about 0.1 eV. The 4z SHCI calculations with 28 correlated electrons have a Hilbert space of ( 198 C 14 ) 2 ≈ 10 42 . One of the desirable features of the SHCI method is that although the Hilbert space increases by 10 orders of magnitude going from the 2z to the 4z basis, the cost of the calculation is only a   FIG. 4. Comparison of the SHCI potential energy curves correlating 28 electrons to experimentally deduced curves. The red curve is for the 2z basis, the orange dot for the 3z basis, the green dot for the 4z basis, and the blue curve is the complete basis limit using the correction from UCCSD(T). Similarly to UCCSD(T), when 28 electrons are correlated, the binding energies do not change monotonically with the basis cardinal number. The FP-AFQMC curve from Fig. 4 of Ref. [33] is also shown. few times larger. This desirable feature is even more evident when the increase in Hilbert space comes from correlating additional core orbitals. However, since the 2z calculations are already expensive, we have done the larger basis calculations only at equilibrium.
The PEC from SHCI in the 2z basis, correlating the 28 valence and semicore electrons, is shown as the red curve in Fig. 4. The blue curve is the PEC extrapolated to infinite basis size using the correction from UCCSD(T). It has a minimum of −1.55 eV at 1.679 Å, in agreement with the experimentally determined −1.57 eV [38] at 1.679 Å [34]. It agrees very well with experiment at bond lengths around equilibrium and also at long bond lengths. It differs a little from experiment in the shoulder region from 1.8 to 2.7 Å, which roughly coincides with the range of distances where the experimentally deduced curve is most uncertain because of missing vibrational levels, as also noted in Ref. [33]. This is also the region where the computed energies converge most slowly. The blue curve agrees well also with the curve labeled FP-AFQMC in Fig. 4 of Ref. [33], except that the FP-AFQMC curve is yet a bit lower than SHCI in the shoulder region.

VIII. CONCLUSIONS
The SHCI method enables systematic convergence to the exact energy for moderately strongly correlated systems with sizes of Hilbert space that were previously inaccessible. We demonstrated its power by computing the potential energy curve of a very challenging dimer Cr 2 . The size of the largest Hilbert space treated with SHCI is 10 42 . Nevertheless, energies, that we estimate are accurate to a few milliHartrees, were obtained from calculations that involve 10 9 variational determinants or fewer, and several trillion perturbative determinants. In future work, we plan to use an effective Hamiltonian that incorporates the effect of explicit interelectronic correlation [49] to reduce the magnitude of the basis set extrapolation error.