Systematic improvability in quantum embedding for real materials

Quantum embedding methods have become a powerful tool to overcome deficiencies of traditional quantum modelling in materials science. However, while these are systematically improvable in principle, in practice it is rarely possible to achieve rigorous convergence and often necessary to employ empirical parameters. Here, we formulate a quantum embedding theory, building on the methods of density-matrix embedding theory combined with local correlation approaches from quantum chemistry, to ensure the ability to systematically converge properties of real materials with accurate correlated wave~function methods, controlled by a single, rapidly convergent parameter. By expanding supercell size, basis set, and the resolution of the fluctuation space of an embedded fragment, we show that the systematic improvability of the approach yields accurate structural and electronic properties of realistic solids without empirical parameters, even across changes in geometry. Results are presented in insulating, semi-metallic, and more strongly correlated regimes, finding state of the art agreement to experimental data.


I. INTRODUCTION
Computational materials research suffers from a lack of systematic improvability. This feature is important to validate and trust each individual prediction with internal checks, without relying on a faithful comparison to experiment which may have other sources of error. Overwhelmingly, the choice of approach is density-functional theory (DFT) [1,2]. While this can be accurate, the uncontrolled approximations in the choice of exchangecorrelation (xc) functional mean that this approach cannot be systematically improved and qualitatively fails in a number of well-documented physical regimes [3]. Postmean-field methods, which explicitly account for the true Coulomb interaction between the electrons, have been shown to be accurate and systematically improvable [4][5][6][7][8][9][10]. However, the scattering induced by the two-body Coulomb operator explicitly couples different k-points of the reciprocal lattice, and therefore convergence to the bulk thermodynamic limit in materials is generally costly and limited in scope.
In this work, we demonstrate how we can lift this steep scaling with size of the computational cell, to rigorously return to a DFT-like scaling. This allows for converged results with highly accurate and systematically improvable quantum chemical methods for real materials, without recourse to the density-functional approximation. We demonstrate convergence both with k-point sampling and basis set, to compute electronic properties of insulating, metallic, and strongly correlated solids [11]. To achieve this, we introduce a fragmentation and locality approximation on the correlated physics-a concept which has been well-explored in both chemical and material frameworks previously [12][13][14][15][16][17]. In a material context these fragmentation approaches are most commonly * max.nusspickel@kcl.ac.uk † george.booth@kcl.ac.uk cast as 'quantum embedding methods', such as dynamical mean-field theory (DMFT) [18][19][20][21][22][23][24][25][26], and a number of other related methods [27][28][29]. While they are in principle systematically improvable by increasing the fragment size, it is rarely possible to achieve rigorous convergence in practice, or compute reliable ground-state energetic properties for ab-initio materials [30]. Indeed, in many applications of quantum embedding to realistic systems, one has to resort to introducing ad hoc parameters [31]. In contrast, in this work we define a single continuous parameter which defines the fidelity of the many-body environmental interactions, and show a rapid and systematic convergence in this parameter. As a defining feature, quantum embedding methods focus on a local subsystem of interest (often called "impurity" or "fragment") and couple to it a coarse-grained description of its environment, usually termed "bath". Since quantum embedding methods retain the environment's quantum nature, they can describe effects of delocalization and entanglement across the fragmentenvironment boundary, generally focusing on the onebody entanglement (hybridization) between the two [32]. Expansion of this fragment space is a route to improve results, but this is hard to define as a systematic convergence parameter in practice for real materials [33]. The focus on a local subsystem and truncation of its environment is also encountered in local correlation methods of quantum chemistry, albeit from a somewhat different perspective. Here, the hole and particle states are separately localized, and the excitation manifold between the two is systematically truncated based on locality criteria [16,34]. These approximations have been used to form low-scaling and accurate approaches for correlated (primarily molecular) systems, but it can be challenging to separately localize occupied and unoccupied reference states (especially in polarizable systems) [12,35] and defining the locality truncation on the excitation manifold can limit the applicability to more strongly correlated systems [36].
The quantum embedding of this work combines the strengths from these different perspectives and can be simply and systematically improved to exactness within the desired quantum chemical method, without requiring an expansion of the fragment space. This builds on the density-matrix embedding theory (DMET), which defines a bath space from the Schmidt decomposition of a mean-field wave function [17,23,37], and augments this (interacting) bath space with additional states inspired by the pair natural orbital (PNO) approach of local quantum chemistry methods [16,38,39]. This judiciously chosen bath expansion opens up a new single simulation parameter by which results can be systematically improved, and avoids the alternative approach to converge the important long-range physics in the embedding via an expansion of the fragment size. A comparison between these two axes of systematic improvability will be explored.
In this initial work, we exclusively converge results using the coupled-cluster method on the single-double level (CCSD), which has already been shown to provide good results as both a cluster solver for DMFT in correlated systems [40,41] and as a tool for real materials science [4,7,8]. However, we stress the generality of the method, ensuring it is also readily applicable to other levels of theory. We show convergence of properties across a diverse range of electronic characters, demonstrating the advantages of systematic improvability for quantitative accuracy and highlighting qualitative changes to traditional DFT results.

A. Definition of atomic fragments
Central to our method is the partitioning of the full system into atomic fragments, defined by the intrinsic atomic orbital (IAO) basis of each atom. This orbital basis only has the dimensionality of a minimal basis set, while simultaneously spanning (at least) the entire occupied space of a parent mean-field calculation without the need for numerical optimization [42,43]. Additionally, the IAOs retain the lattice symmetry of the mean-field and as a result only a single embedding calculation has to be performed for each symmetry-unique atom in the unit cell.
These atomic-like fragments are combined with bath orbitals. The Schmidt decomposition of the same reference mean-field (Hartree-Fock) state produces bath orbitals which ensure a rigorous reproduction of the onebody density-matrix (1DM) and Fock matrix over the fragment, and is the leading order bath space in an expansion of the full fragment hybridization [23,24,32,44,45]. These are the bath orbitals defined in DMET, and allow for the resulting 'DMET cluster' (defined as the union of the fragment and DMET bath orbitals) to be rigorously rotated into a space of occupied (hole) and unoccupied (particle) single-electron states. The number of these DMET bath orbitals is bounded from above by the number of fragment orbitals and independent of the full system size. Visualizations of these fragment and DMET bath orbitals can be found in appendix A for two systems of this work, graphene and SrTiO 3 .

B. Cluster-specific bath natural orbitals
Inclusion of this DMET bath space ensures reproduction of the fragment 1DM, in the sense that a mean-field calculation performed within the DMET cluster space yields the same fragment 1DM as the original calculation of the full system. However, this 'exact' embedding no longer holds for correlated wave function methods, which often require many more orbital degrees of freedom to accurately capture the correlation-induced virtual excitations and quantum fluctuations about the mean-field reference. While in principle these could be captured via increasing the size of the fragment, this can become hard to define for ab initio systems, where the choice of orbitals to include in the fragment space for a balanced description is unclear, especially for high-energy, virtual processes or dynamic correlation. A comparison of our proposed method to this fragment expansion is given in Sec. III A.
We instead choose an approach more familiar to local quantum chemical methods, expanding the local space about this DMET cluster in a form analogous to pair natural orbitals of the cluster [16,38,39]. This extends the bath space with additional occupied and unoccupied states, to systematically converge the local excitations necessary for a correlated fragment description, and an improved description of the fragment-environment entanglement. To define the most relevant space that these states should span, we consider the local interacting space from second-order (Møller-Plesset) perturbation theory (MP2). We perform two subspace MP2 calculations for each fragment where in the first, the occupied states of the excitations are constrained to be rigorously within the DMET cluster space, and in the second, the unoccupied states are constrained to be within the DMET cluster space. In both subspace calculations we use Gaussian density-fitting, defining the central double excitation ('T 2 ') amplitudes of MP2 as respectively, where i, j labels occupied orbitals, a, b labels unoccupied orbitals, and L labels auxiliary densityfitting functions, with the (ia|L) three-center integrals computed as defined in Ref. 46. The tilde above occupied or virtual indices indicates that these orbitals are restricted to the DMET cluster space and their orbital energies, ĩ and ã , do not generally match the Hartree-Fock orbital energies, but instead stem from the diagonalization of the Fock matrix within their respective subspace (see appendix B for more details). With these T 2 amplitudes, we calculate the approximate MP2 1DMs for the unoccupied space (first subspace) and occupied space (second subspace), defined from this correlated level of theory according to respectively, where we added a fragment index X to stress that these matrices are different for each fragment of the system. These 1DMs can be projected onto the environment of the cluster and diagonalized, by loose analogy to the PNO approach [16,39], yielding a set of ' cluster-specific' natural orbitals which span the corresponding unoccupied or occupied environment space of the system. The corresponding eigenvalues (occupation numbers) provide a natural ordering of these environment orbitals according to their importance in describing excitations out of or into the DMET cluster, and therefore implicitly consider the locality of the fragment space, without requiring this to be explicitly enforced. A subset of the natural orbitals with occupations numbers larger than some threshold η for the unoccupied orbitals, or smaller than 2−η for the occupied orbitals, can then be included together with the fragment and DMET bath orbitals as 'active' orbitals to define the final correlated cluster. The remaining occupied natural orbitals just contribute a static, one-particle Coulomb and exchange potential. Figure 1 shows the density of the resulting active cluster space at the example of a single carbon atomic fragment embedded in a two-dimensional graphene layer. As the bath threshold η is decreased from Fig. 1a to Fig. 1c, the cluster space grows in size, but remains predominantly localized around the atomic fragment, thus demonstrating the emergence of locality in the cluster construction. All steps of this bath construction scale at most as O(N 3 ) with respect to the system size N . Once this total cluster subspace is defined over the system for some threshold η, we project the bare, interacting Hamiltonian into this space (including the bath), thus ensuring that the coupling between fragment and environment goes beyond the one-body hybridization physics. This leads to an efficient and systematically improvable expansion of the correlated coupling of each fragment to its environment, with the accuracy determined by a single parameter, η, by which the longer-range physics of the embedding can be converged. Crucially, for any non-zero threshold η, the dimensionality of the resulting active cluster space is expected to remain constant beyond a certain point, as the system size is increased asymptotically.
This latter point is illustrated in Fig. 2, which compares the final number of cluster orbitals, i.e. the fragment and all bath orbitals, N cluster , for the embedding of a single carbon atom in the diamond lattice, for supercell sizes up to 6×6×6 at different thresholds η.
While the cluster size increases initially with larger supercells, no significant growth is observed for supercells larger than 3×3×3, exploiting the locality of the fragment and its virtual excitations into its environment. As a result, the subsequent correlated cluster calculation does not scale with system size for any fixed value η; in the case of a 6×6×6 cell with η = 10 −8 the embedding amounts to performing a single Γ-point calculation with 312 active cluster orbitals and 5304 frozen environmental orbitals. Importantly, due to the constant cluster size the correlated calculation does not scale with respect to the system size for any fixed threshold η, and the necessary projection of the interactions into this cluster (via the density-fitted three-center integrals) can be performed in the k-space of the primitive cell at O(N 2 k ) cost (see appendix C).

C. Fragment correlation energy
We calculate the total energy as a sum of the meanfield (Hartree-Fock) energy, and the correlation energy arising from the clusters. We can write this correlation energy for any wave function as where τ ab ij represents the amplitudes on all doubly excited configurations [47]. For coupled-cluster, these are given as t ab ij + t a i t b j . We further partition this correlation energy functional into contributions E X corr associated with fragment X, such that the total energy can be approximated as As a guiding principle for defining an expression for E X corr one should ensure that the canonical, full-system CCSD energy is reproduced as the bath spaces are expanded to completeness (i.e. Eq. (4) should become an identity as η → 0). To achieve this, we project the first occupied index of the τ -amplitudes of each cluster calculation onto the respective fragment space, according to whereC X represents the AO coefficients of the fragment orbitals (x) of fragment X, C X represents the  The black dotted lines marks the y=x-diagonal, denoting the total number of degrees of freedom in the system. Note that the x-axis is linear with respect to N 1/3 crystal , which is why the diagonal has the shape of a cubic function. Inset: Convergence of the embedded CCSD correlation energy as a function of cluster size for the 6×6×6 supercell.
The calculations were performed with GTH pseudopotentials and the GTH-DZVP basis set.
coefficients of the occupied cluster orbitals, and S is the AO overlap matrix (we use crystalline Gaussian atomic orbitals (AOs) [7]). The obtained 'projected' τ Xamplitudes are then inserted into Eq. (3), leading to fragment correlation energy contributions E X corr , which can be used to estimate the total energy. The exactness of this expression in the η → 0 limit relies on the choice of IAOs as the fragment space, which ensures that the full occupied orbital space is spanned by all fragments. However, this choice of local correlation energy functional is not unique, and other options are possible.

A. Comparison of fragment and bath expansion
There are two primary differences of the quantum embedding method presented in Section II from existing quantum embedding methodology, and most directly, from previous DMET applications [43]. Firstly, we develop a principled protocol to expand the bath space of each fragment due to the correlated physics at longer length scales, avoiding the need to converge calculations via enlarging the fragment space. This allows the accuracy to be systematically improved, while retaining minimal and physical fragment sizes corresponding to the space of valence atomic orbitals. Quantum chemical methods have recently been employed as impurity solvers in DMFT, but necessitate a finite-sized truncation of the bath space [40,41,48]. The size of the bath must then be expanded until the hybridization physics of the the fragment is converged to the desired level of accuracy. However, since the bath is non-interacting in DMFT, this expansion does not converge the correlated physics of the fragment, and is thus-opposed to the approach proposed here-not a route to systematic improvability.
Alternatives to enlarging fragment spaces in DMFT and embedding methods to converge long-range physics is an area of active research [33,[49][50][51][52]. It should also be noted that expansion of the fragment space beyond atomic partitioning can still be achieved in the current method if there is a compelling reason to include the full excitation space of multiple atoms (e.g. multiple strong correlation centers). The second principal difference is that we directly use the cluster wave function to calculate system properties, such as the correlation energy as given in Eq. 3. This contrasts with the use of cluster reduced density-matrices, which are combined by democratically partitioning their contributions between the fragment spaces, as used in previous DMET studies [25]. In this section, we compare the traditional interacting-bath DMET formulation to this current approach, to clearly demonstrate the value of these two  3. Convergence of the total energy per unit cell of diamond for a given number of correlated cluster orbitals, comparing an expansion of the fragment space to the proposed approach based on an expansion of the bath space via cluster-specific natural orbitals. Top plot shows convergence to the full-cell result in a 2×2×2 supercell and cc-pVDZ basis, while the lower plot shows the convergence in a 5×5×5 supercell and cc-pVTZ basis. The fragment expansion relies on inclusion of sets of IAO+PAO fragment orbitals on all included atoms, while the bath expansion has a fragment space of a minimal IAO space of a single carbon atom (Natom in the plot denotes the number of atoms included in the fragment) . Energies in the fragment expansion are computed from democratically partitioned density-matrices [25], requiring an additional chemical potential optimization, while the bath expansion energies are computed as defined in Sec. II C.

features.
The result of this comparison is summarized in Fig. 3, where the fragment expansion results (denoting fragments spanning all atoms at an increasing radial distance from a central atomic site) are compared to the proposed approach (labelled as 'bath expansion') where only a single minimal basis carbon atom is used for the fragment space in all cases. The convergence of the energy per unit cell as a function of the total number of orbitals in the correlated cluster calculation is shown, for a 2×2×2 and 5×5×5 supercell of diamond in varying basis sets. Within the 2×2×2 cell, convergence of both the fragment and bath expansion to the full-system CCSD result is demonstrated. The protocol for obtaining the fragment expansion results largely follows the procedure of Ref. 43, with some salient details worth highlighting in comparing the approaches.
Firstly, the democratically-partitioned DMET energy expression via the reduced density-matrices (used in the fragment expansion) requires that each atom included in the fragment also spans its entire local virtual space, as only in this case is it guaranteed that the calculation spans the entire computational basis in the limit that every atom of the unit cell is included in the fragment. This is achieved (as proposed in Ref. 43) by adding orthogonalized projected atomic orbitals (PAOs) to the IAO space of each carbon fragment atom. In contrast, our correlation energy functional of Sec. II C partitions the τ -amplitudes in their first occupied index amongst the fragments, and thus only the local occupied space needs to be spanned by each fragment in order to allow for systematic improvability to the full-cell limit. This condition is automatically satisfied by the choice of minimal basis IAO fragments, with the high-energy unoccupied states included as bath orbitals in a principled fashion from the construction of the cluster natural bath orbitals.
Secondly, it should be noted that the computational effort of the same sized cluster calculation is higher by roughly a factor of ten in the fragment expansion scheme compared to the proposed bath expansion. This is primarily due to the fact that when computing the total energy from partitioned density-matrices it is essential limit. The rate of convergence, however, is quite different with the MP2 bath expansion showing an exponential convergence, whereas the fragment expansion converges only very slowly to the exact result. This slow convergence is somewhat surprising and we attribute it to the fact that some degree of relatively delocalized virtual space is generally necessary in order to achieve accurate correlation energies, which is not efficiently described by purely local orbitals or the DMET bath construction.
In the lower plot of Fig. 3, we also show the results for a more physically realistic 5×5×5 supercell using the cc-pVTZ basis (7500 orbitals), in which case a full-cell CCSD calculation is no longer feasible. In this case, the fragment expansion appears to converge quickly, but to a (likely) spuriously high energy due to the neglect of appropriate longer-range unoccupied states. When compared to the energy obtained with a bath expansion, we converge to a significantly lower value, with similar profile to the 2×2×2 result [53]. This faster and more reliable convergence of the energy with respect to the number of cluster orbitals, as well as the ability to circumvent the computationally costly construction of reduced density-matrices and iterative chemical-potential optimization, make the bath expansion approach proposed significantly more attractive for systematic convergence in the description of realistic correlated materials.

B. Structural properties of diamond
We first apply our embedded CCSD approach to diamond, using an all-electron cc-pVTZ basis set and a 5×5×5 supercell, containing 250 atoms and 7500 AOs. All calculations were performed using a locally modified version of the PySCF quantum chemistry library [54,55], interfaced with our custom embedding package. The atomic fragment is constructed from five IAOs, corresponding to atomic 1s, 2s, and 2p orbitals. The diagonalization of the environment density matrix additionally leads to four strongly coupled DMET bath orbitals, reflecting the sp 3 hybridization and four-fold coordination of each carbon atom. As detailed above, the remaining environment is expanded in terms of MP2 bath natural orbitals, which are added to the DMET cluster according to the chosen threshold η, thus forming the final cluster for the CCSD calculation. All carbon atoms in the diamond lattice are symmetry equivalent and thus only a single atomic embedding problem is solved. A counterpoise procedure has been performed to correct for the basis set superposition error (see appendix D) [56,57]. The resulting total energies per unit cell as a function of the lattice constant are shown in Fig. 4a for different values of η. Despite the fact that the number of bath orbitals changes slightly as a function of lattice constant, the energy curves are smooth for calculations with η ≤ 10 −6 and can thus be readily fitted with an equation of states.
Here we use the Birch-Murnaghan equation of state [60] to perform a fit of seven data points with lattice con- stants between 3.4Å and 3.7Å (as shown in Fig. 4a). Both the equilibrium lattice constant and bulk modulus of the fitted equation of states, shown in Fig. 4b, converge smoothly with respect to the bath threshold η. The most accurate embedded CCSD calculations with η = 10 −8 results in a lattice constant of 3.557Å which is in excellent agreement with the experimental value (corrected for zero-point vibrational effects) of 3.553Å. This achieved with only around 420 active orbitals, compared to the 7500 orbitals of the full simulated cell. The bulk modulus of 467 GPa slightly overestimates the zero-point corrected, experimental value of 455 GPa by 2.6%. Despite this, the accuracy of these results is well within the scatter about experiment of different DFT results in the literature (as seen in the inset in Fig. 4) [58,59], as well as previous correlated CCSD results with smaller k-point meshes [7].

C. Graphene
As the embedding approach is built around local fragments, it raises the question if it can be applied to system with a delocalized electronic structure, such as metals or semi-conductors, which have proven difficult for other local correlation methods [12,13], as well as low-dimensional systems where correlation and screening are generally increased. Additionally, the MP2 method, used here to construct bath orbitals, is known to diverge for gapless systems [61]. In the following we apply the embedding approach to graphene, a semi-metallic, twodimensional system, using the all-electron def2-TZVP basis set and supercells up to a size of 10×10. The lattice constant is sampled between 2.4Å and 2.525Å and the calculated total energy, corrected for the basis set superposition error, is fitted to a quadratic equation of state (a plot of the fitted curves is shown in the appendix E). Figure 5 shows the convergence of the resulting equilibrium lattice constant as a function of the supercell size for two different bath thresholds, η = 10 −7 and η = 10 −8 . Both curves are converged with a 8×8 supercell at which point they differ by only about 0.0025Å. The resulting values of 2.459Å at η = 10 −8 is in good agreement with the experimental range [62,63], and once again within the scatter of different xc functionals of DFT.
It may appear surprising that a local embedding approach performs well for this semi-conducting system, but it is worth pointing out the following: 1) No full system MP2 calculation is performed, which would be plagued by a diverging denominator in Eq. (1) for gapless systems. Instead, in the subspace MP2 calculations either the occupied or unoccupied space is restricted to the DMET cluster space and the corresponding subspace HOMO or LUMO will be below or above the full system HOMO and LUMO, respectively, leading to a finite gap.
2) The subspace MP2 calculations are only used to identify the relevant environment space, which is then treated on the CCSD level. 3) While the initial fragment orbitals are local by construction, the DMET and MP2 bath nat-ural orbital are not restricted to be local. This means they are only as local as the electronic structure (on the HF and MP2 level, respectively) allows and are able to describe non-local parts of the system for the delocalized physics.

D. Electronic structure of SrTiO3
One of the strengths of coupled-cluster is that it is able to systematically include stronger correlation effects [4,40,41]. Strong electron correlation occurs, for example, in solids with partially filled d-and f-orbitals, which have a localized, atomic nature, and are not well described by most xc functionals. Another (related) wellknown failure of DFT occurs in charge-transfer complexes, where it generally overestimates the amount of transferred charge, which is a result of its tendency to over-delocalize orbitals [1]. This has to be contrasted with the HF method, which is known to have orbitals which are too local, and thus underestimates the amount of transferred charge [1].
The pervoskite SrTiO 3 contains both strongly correlated 3d orbitals at the Ti atom and exhibits charge transfer from the O 2− anions to the Ti 4+ cation within its TiO 6 octahedra, and is thus a challenging system for electronic structure methods. In order to remedy this, the LDA+U method is commonly applied to this system, adding a corrective Hubbard U potential to the 3d orbitals [64][65][66]. This potential raises the energy of 3d orbitals which are less than half-filled and in this way reduces the amount of charge-transfer and increases the fundamental gap of the system. The ability to converge the static electronic structure for these materials at the level of CCSD now provides a high-level reference for whether the charge rearrangement in these approaches is physical, providing benchmark values and a validation which cannot be obtained from experiment.
In the following, we compare these different methods (HF, DFT, LDA+U , and the embedded CCSD), for the resulting occupations of their Ti 3d orbitals in SrTiO 3 . We use a 3×3×3 supercell of SrTiO 3 in its cubic phase (the unit cell is shown in Fig. 6a) and the experimental lattice constant of 3.905Å. Deep lying core states are replaced by GTH pseudopotentials [67,68] (the 3s and 3p semi-core states of Ti are retained) and the GTH-DZVP-MOLOPT-SR basis is used [69,70]. The fragment of the embedded CCSD is spanned by the 4s and 3d IAOs of Ti and additional MP2 bath orbitals are added up to a threshold of η = 10 −8 (see appendix A and F for orbital plots and convergence with respect to η). Orbital occupations are obtained from the Löwdin population analysis [71], where we combine the contributions from the 3d and 4d shells on Ti, and separate them into their e g and t 2g symmetry groups. Note that while the absolute values calculated from a population analysis do not have a well-defined physical meaning and depend sensitively on the chosen analysis method, atomic projec- tor, and computational basis set, the relative trends in the occupations between different methods (or materials) are still meaningful, as long as the analysis is performed in the same way. Figure 6b shows the total number of electrons in the e g and t 2g orbitals of Ti for the different methods. As expected, Hartree-Fock significantly underestimates the amount of transferred charge thus leading to the lowest orbital occupation. The LDA functional has a much higher number of electrons in the 3d shell of Ti, indicating an overestimation of charge-transfer; this trend is reversed as a Hubbard U potential of U eff = 4.36 eV (established literature values for this system [64,65]) or U eff = 8 eV is added to the 3d states. With a value of U eff = 4.36 eV, the LDA+U 3d occupations are in better agreement with hybrid (B3LYP) DFT and embedded CCSD, but still overestimate the charge transfer. As the value of U is increased further, the number of electrons in t 2g orbitals will eventually be in good agreement with the embedded CCSD (at around 6 eV), but the occupation of the e g orbitals is actually increasing and thus deviating even more from the B3LYP and embedded CCSD value. This shows that the LDA+U approach is too simplistic to accurately represent the correlation-driven charge transfer in the 3d shell in SrTiO 3 . Hybrid DFT and embedded CCSD results are in reasonable agreement, but note that B3LYP contains an empirical amount of HF exchange, whereas the embedded CCSD approach is a true ab initio method, allowing for systematic convergence towards the full CCSD limit via a single parameter, η. Overall, embedded CCSD falls into the expected range between HF and DFT, indicating that it does not exhibit the same level of (de)localization error as these mean-field methods, and is thus well suited to describe solids with strong electron correlation and charge-transfer complexes.

IV. CONCLUSION & OUTLOOK
We present a systematically improvable formalism for quantum embedding in materials, with a single contin-uous parameter defining the fidelity of the interactions of a local region with its environment. While the framework is applied to converge high-level quantum chemical theory at the level of coupled-cluster for structural and static electronic properties, the extension to systems with non-perturbative correlations, magnetic order, and quantum phase transitions can also be envisaged, and will be considered in future work. Across insulating, low-dimensional, semi-metallic, and perovskite materials, we demonstrate the ability to converge all technical parameters of highly accurate wave function approaches, for reliable predictions with good agreement with experimental data where available. Overall, this expands the scope and prospects for explicitly correlated methodology applied to materials science problems, returning to a (hybrid) DFT scaling with system size in this locally correlated framework.  Figure 7 shows the IAO fragment orbitals for an embedded carbon atom in a graphene lattice and the corresponding DMET bath orbitals, required to ensure that the Hartree-Fock density-matrix of the fragment space is conserved from its lattice description. The IAOs can be rigorously associated with the fragment carbon atom, characterizing the 1s, 2s, and 2p states. Their slight polarization from these true atomic states reflects the requirement for the IAOs to both be orthogonal and to ensure that taken together, they span the occupied space of the lattice mean-field solution. We also show the fragment 4s and 3d IAOs for the embedding of Ti in SrTiO 3 in Fig. 8a and the corresponding DMET bath orbitals in Fig. 8b. Note that the DMET bath orbitals are localized at the six O atoms neighboring the Ti and predominantly represent linear combinations of 2p states.

Appendix B: Technical details of cluster-specific bath natural orbitals
In the following we present the technical steps of the cluster-specific bath natural orbital construction in detail. All steps are performed in the supercell, which is only sampled with a single k-point, the Γ-point; for brevity we will abstain from adding this Γ-label in the following equations. Note however that the integral transformation of the density-fitting integrals can be more efficiently performed using the k-point sampled primitivecell (see section C below).
Let us consider the fragment X, with fragment orbitals C X αx and the corresponding DMET bath orbitals with coefficients C X αp , where α denotes crystalline Gaussian atomic orbitals (AOs), x denotes fragment orbitals, and p denotes DMET bath orbitals. We can then project the one-body reduced density-matrix of the underlying Hartree-Fock mean-field calculation, γ αβ , onto the the combined fragment-⊕ DMET-bath space (the"DMETcluster" space), according to where S is the AO overlap-matrix and r, s are general DMET-cluster indices, enumerating both the fragmentand DMET-bath space of fragment X. The projected density-matrix of Eq. B1 can then be diagonalized, resulting in N D X occ eigenvalues 2 and N D X vir eigenvalues 0, where D X denotes the DMET-cluster space of Fragment X. The corresponding eigenvectors, R D X occ and R D X vir , define rotations of the DMET-cluster orbitals to a set of purely occupied and virtual cluster orbitals, according to where we usedĩ (ã) to denote occupied (virtual) cluster orbitals. With this we can now define four spaces: 1) the space of the occupied cluster orbitals of fragment X with coefficients C X αĩ , 2) the space of the virtual cluster orbitals of fragment X with coefficients C X αã , 3) the space of all occupied Hartree-Fock orbitals with coefficients C αi , and 4) the space of all virtual Hartree-Fock orbitals with coefficients C αa . In order to apply the noniterative MP2 method, we furthermore need to define a set of orbital energies . In the untruncated occupied and virtual spaces [3) and 4) above], these are simply the Hartree-Fock orbital energies. In the DMET cluster space, however, we define orbital energies by projecting the Fock matrix separately onto the occupied DMET cluster space and virtual DMET cluster space, and diagonalizing this projected Fock matrix. The eigenvalues of the resulting "pseudo-canonicalized" cluster orbitals will generally not be a subset of the Hartree-Fock energies (this would require that the off-diagonal block of the Fock matrix between the cluster and the remaining space vanishes).
We are now in the position to form the MP2 T 2 amplitudes in either the "cluster-occupied ⊕ full-virtual" or "full-occupied ⊕ cluster-virtual" subspace according to Eq. (1a) and (1b), respectively, and subsequently form the MP2 density-matrices (2a) and (2b). Diagonalizing these density-matrices directly would lead to orbitals with some overlap with the DMET cluster space. For this reason we first project each density-matrix onto its pure environment space according to where t, u (m, n) denote virtual (occupied) pureenvironment orbitals (these are the unentangled orbitals of the DMET bath construction, see Ref. 25). Diagonalizing Eqs. (B4a, B4b) yields the natural occupation numbers as eigenvalues and eigenvectors which define rotations of the pure-environment orbitals, leading to the final set of environment orbitals. This set of orbitals can be ordered according to their natural occupation number and truncated at the threshold η (or 2−η for occupied orbitals), resulting in the final cluster-specific bath natural orbitals.
Appendix C: Folding of k-point sampled quantities In order to converge calculations of periodic solids to the thermodynamic limit, one can either perform Γ-point calculations of increasingly large supercells, or instead only consider the primitive unit cell and increase the k-point sampling of its corresponding first Brillouin zone. In regular full system calculations, the latter approach is generally more efficient, since it is able to easily exploit the full translational symmetry of the system. In contrast, the embedding method presented in this work can make better use of the locality of electron correlation in the supercell formalism, as the corresponding fragment and bath orbitals only have the periodicity of the supercell lattice vectors, i.e. they are allowed to break any internal symmetries in the supercell.
For this reason, we perform the initial, canonical Hartree-Fock calculations using the primitive cell and a M 1 ×M 2 ×M 3 k-point sampling and then fold the orbital basis to the Γ-point of the corresponding M 1 ×M 2 ×M 3 supercell, via a basis transformation of the orbital coefficients according to where C k,αi denotes the k-sampled expansion coefficients of the crystal orbitals (CO) i in terms of crystalline Gaussian atomic orbitals (AO) α, N k is the number of kpoints M 1 M 2 M 3 , and R is a real space vector which points to the origin of each primitive cells within the supercell. Note that the index pairs (R,α) and (k,i) on the left hand side of Eq. (C1) enumerate the supercell, Γ-point AOs and COs, respectively. Matrix quantities with k-point sampling, such as the overlap-or Fock matrix, can also readily be folded to their supercell representation according to as available in PySCF.
A problem arises when folding the k-point sampled AO three-center integrals (L|kα, k β) used in the densityfitting method, as the dimensionality of the resulting supercell integrals (RL|R α, R β) scales as O(N 3 k ) with the number of k-points (or equivalently primitive cells). They thus become unfeasible to store even for modest k-point meshes. On the other hand, the AO three-center integrals are only needed as an intermediate in the orbital transformation to the three-center integrals in the final cluster basis (consisting of fragment and bath orbitals), the dimensionality of which (without pruning of the auxiliary basis) only scales as O(N k ). Instead of folding the AO three-center integrals, we thus unfold the cluster orbital coefficients back into a primitive cell k-point sampled representation, according to where we used the index p to denote a general cluster orbital. These coefficients can then be contracted efficiently with the primitive cell, k-point sampled three-center integrals, i.e.
with O(N 2 k ) scaling (note that k, k , and k are not independent, as a result of crystal momentum conservation). If desired, the k -point of auxiliary dimension can also be Fourier transformed to a real space vector of the supercell, rendering the three-center integrals real in the process.
Three-center integrals of the form of Eq. (C4) are also needed for the subspace MP2 calculations in order to determine bath orbitals (see Eq. (1) where the k -point has been subsumed into the index L). In this case, only elements of the mixed occupied-unoccupied block are needed, however in either calculation either the occupied or the unoccupied space span the entire supercell, while the other space remains restricted to the DMET cluster space. In this case, evaluation of Eq. (C4) requires an additional scaling factor of N k , leading to a final scaling of O(N 3 k ).

Appendix D: Correction of basis set superposition error
Gaussian type orbitals (GTOs) are a natural choice for quantum embedding calculations, as they are inherently local (a single atom centered GTO has a decay in real space that is faster than exponential), making them ideal to define local fragments. Additionally, GTO basis sets are often much more compact than a plane wave basis set and allow for all-electron calculations within a single framework. These benefits come with a downside, as atom-centered GTOs explicitly depend on the arrangement of atomic nuclei in the system, thus leading to a structure-dependent basis set incompleteness error. When comparing relative energy changes of different cell geometries or structures, the change of basis set completeness leads to a spurious energy contribution, which is termed basis set superposition error (BSSE) [56]. For solids in particular, the basis set becomes more complete, when the lattice constant is reduced and less complete when it is increased. As a result, the equilibrium constant is expected to be underestimated, if the BSSE is not accounted for.
In this work, we use the counterpoise (CP) method of Boys and Bernardi [57], to calculate an a posteri BSSE energy correction. To this end, we perform two atomic calculations without periodic boundary conditions for each symmetry-inequivalent atom in the primitive unit cell. In the first calculation, only the basis functions of the atom under consideration are included, whereas in the second we additionally add the GTOs of the other atoms in the unit cell, as well as those GTOs centered at atoms in the directly neighboring unit cells (26 cells in three-dimension or 8 cells in two-dimensions). Note that only the basis functions centered at these atomic positions are added, not the actual nuclei or associated electrons. In both cases, a canonical Hartree-Fock calculation is performed first, followed by an embedded CCSD calculation using the same bath threshold η which was used in the full lattice calculation. The difference in total energy of both calculations then yields an estimate for the BSSE associated with the given atom; the total BSSE can be recovered by adding all atomic contributions, multiplied by the respective symmetry factor, in the unit cell.
We observe that the BSSE correction is important in order to obtain smooth and quick convergence of observables which depend on energy differences at different geometries with respect to the bath threshold, η. Figure 9 compares the convergence of the CP-corrected lattice constant and bulk modulus of diamond which were shown in Fig. 4, to the uncorrected values. While the uncorrected quantities still differ substantially between η = 10 −7 and η = 10 −8 , the CP-corrected values are converged to good accuracy. This observation can be interpreted as follows: the additional bath orbitals, which are present in the η = 10 −8 calculation but not in the η = 10 −7 calculation, still lead to a change in correlation energy. However, this change in energy is also observed when reducing η in the same way in the atomic calculation, which utilizes the extended lattice basis, as part of the CP procedure. This means that this contribution to the correlation energy does not stem from physical interactions of an atom with the lattice environment, but is purely a result of the finite basis size. Since the finite basis error changes with the lattice geometry and introduces an bias towards compressed geometries, it is beneficial to remove this energetic contribution in order to accelerate the convergence of physical properties. Loosely speaking, the electron correlation between different parts of the solid converges more quickly with respect to the bath size than the resolution of the finite basis error.  10. Uncorrected and counterpoise (CP) corrected total energy of graphene for the def2-SVP (top) and def2-TZVP (bottom) basis set, for a 8×8 supercell, calculated with embedded CCSD using η = 10 −8 . The pentagons denote the equilibrium lattice constants from a quadratic fit of the data points, with the grey stripe showing the experimental uncertainty. Note the significantly different energy scale of the two plots. Figure 10 shows the effect that this BSSE correction has on the equation of state of graphene, calculated with both the def2-SVP and def2-TZVP basis sets. As expected, the uncorrected curves have their minimum at smaller lattice constant than the CP corrected curves.
Appendix E: Total energy of graphene Figure 11 shows the total energy of embedded CCSD for 10×10 graphene as a function of the lattice constant. The def2-TZVP basis set was used and the datapoints  were fitted with parabolas, shows as solids lines. The pentagons mark the minima, i.e. the equilibrium lattice constants of each fit, which are also shown in Fig. 5 as a function of supercell size.
Appendix F: Convergence of 3d occupations in SrTiO3 Figure 12 shows the convergence of the number of electrons in Ti the e g and t 2g orbitals, calculated with embedded CCSD, as a function of cluster orbitals. Even with the small bath expansion of η = 10 −4 , the number of electrons deviates substantially from the Hartree-Fock reference. Lowering η further, the e g occupation converges quickly, while the t 2g occupation still has a slight upwards trend and a change of around 0.01 electrons between η = 10 −7 and η = 10 −8 . Nevertheless, this difference is very small and does not affect the conclusions of the main text.