Multi-layered atomic relaxation in van der Waals heterostructures

When two-dimensional van der Waals materials are stacked to build heterostructures, moir\'e patterns emerge from twisted interfaces or from mismatch in lattice constant of individual layers. Relaxation of the atomic positions is a direct, generic consequence of the moir\'e pattern, with many implications for the physical properties. Moir\'e driven atomic relaxation may be naively thought to be restricted to the interfacial layers and thus irrelevant for multi-layered heterostructures. However, we provide experimental evidence for the importance of the three dimensional nature of the relaxation in two types of van der Waals heterostructures: First, in multi-layer graphene twisted on graphite at small twist angles ($\theta\approx0.14^\circ$) we observe propagation of relaxation domains even beyond 18 graphene layers. Second, we show how for multi-layer PdTe$_2$ on Bi$_2$Se$_3$ the moir\'e lattice constant depends on the number of PdTe$_2$ layers. Motivated by the experimental findings, we developed a continuum approach to model multi-layered relaxation processes based on the generalized stacking fault energy functional given by ab-initio simulations. Leveraging the continuum property of the approach enables us to access large scale regimes and achieve agreement with our experimental data for both systems. Furthermore it is well known that the electronic structure of graphene sensitively depends on local lattice deformations. Therefore we study the impact of multi-layered relaxation on the local density of states of the twisted graphitic system. We identify measurable implications for the system, experimentally accessible by scanning tunneling microscopy. Our multi-layered relaxation approach is not restricted to the discussed systems, and can be used to uncover the impact of an interfacial defect on various layered systems of interest.

Naturally, the relaxation itself is strongest at the defect interface, but we demonstrate here that it may also * These authors contributed equally extent beyond the two atomic layers at the interface. Therefore, the "Lego bricks" analogy put forward by Geim and Grigorieva [1], where heterostructures are composed of mechanically independent layers, though extremely helpful for illustration purposes, needs to be refined considerably to incorporate the flexible nature of each layer as well as the interplay of mechanical relaxation between different layers. We contrast the two paradigms in Fig. 1, where the left panel illustrates how to understand a stack of two multi-layered, mismatched materials in the picture disregarding relaxation. As no relaxation occurs there is a sharp transition from the bottom stack (green) to the top stack (purple) visualized by the difference in width. In contrast, a multi-layered relaxation leads to a smooth deformation of layers from the bottom to the top (right panel), with signatures of the moiré superlattice slowly decaying away from the interface (yellow vortices).
In fact, we show, elaborate domain formation even far away from the interface, stressing the problem is truly three dimensional in nature. From this follows that material properties and the electronic structure [57][58][59][60] may be substantially influenced as soon as domain formation occurs, not only in direct vicinity of the interface. Figure 2 summarizes experimental evidence for the fact that the domain formation picture is rather complex, and for its three dimensional nature. Figure 2 explore piezoresponse force microscopy (PFM) maps of a terraced few layer graphene (FLG) structure on top of a trilayer graphene (TLG) substrate at a twist angle of 0.66 • between the TLG and the FLG part of the stack. PFM is an atomic force microscopy (AFM) contact-mode variant which has been shown to be a powerfull large area imaging tool of twisted moiré superlattices [61]. It measures the deflection of the cantilever of a conductive tip at a given point on the sample, due to an oscillating voltage between the metallic tip and a grounded sample (more details in Appendix A). Simplistically, the amplitude of the signal can be interpreted as a measure of local deformation resulted by the electric field, i.e. a local piezoelectric or flexoelectric effect (See Ref. 61 for a dedicated study on the use of PFM for visualization of moiré superlattices in in van der Waals heterostructures). In the context of this work we consider the PFM contrast as a qualitative measure of the strength of the relaxation at the sample surface. The terraced FLG has monolayer graphene (MLG), bilayer graphene (BLG), and TLG segments, (highlighted over the optical image in Appendix A), and studied by PFM in Fig. 2(a-c) respectively. As the top layer thickness increases the PFM contrast quickly decays. On the other hand, when a similar measurement is performed for a smaller twist angle of 0.14 • (Fig. 2(d-f )) the moiré pattern can be visualized by PFM at the surface of a considerably higher stack, even up to 18 atomic layers above the twist interface ( Fig. 2(f )). A different consequence of the layered atomic relaxation emerges in atomically mismatched heterostructures. Such a case is presented in Fig. 2(g,h) for an epitaxially grown PdTe 2 over a Bi 2 Si 3 substrate [62]. In this system a moiré superlattice emerges from the lattice mismatch between the two materials. Surprisingly, the domain shapes and even the moiré period itself seem to depend on the layer thickness of the PdTe 2 stack. In fact, the observed moiré period exceeds the maximal (zero twist angle) predicted moiré period, solely considering the lattice mismatch. All of the above suggest that the relaxation process and the resulting domain structures are more involved than considered thus far. The spatial extent of the relaxation process and even the moiré period depend on the twist angle and material properties.
In this work we first develop the multi-layered relaxation framework in Section II. The framework allows a description of large systems (both in terms of thickness as well as moiré period) which cannot be accessed by atomistic models [63][64][65][66][67][68]. We will then use this framework to reproduce the data in Fig. 2 (cf. Section III), explore the three dimensional nature of the relaxation process (cf. Section IV), and study its consequences for the electronic structure using graphitic stacks as a prototypical example in Section V.

II. MULTI-LAYERED RELAXATION -THE APPROACH
In this section we develop the mathematical framework to allow the description of the atomic relaxation in a multi-layered stack of vdW materials. We follow a continuous relaxation approach, conceptually similar to Refs. 69-72. By continuous relaxation, we assume that the in-plane displacement fields change on a length scale larger than the atomic unit cell spacing. Using the above assumption for continuous models, the atomic relaxation can be formulated in terms of the energy functional being a sum of a stacking mismatch and an elastic energy term. The sought after solution is the displacement field that minimizes the total energy. Ref. 69 formulated the problem in configuration space for twisted bilayer systems. Ref. 71 introduced a generalized formalism to potentially treat multiple twisted interfaces, later implemented by Ref. 70 for a specific system of double twisted trilayer graphene. In Ref. 72 we solved the problem in real space, thus allowing an omission of the periodicity requirement imposed by the previous treatments in configuration space. It allowed treating experimentally realistic strain induced relaxation patterns. The derivation follows the lines and notation of Ref. 72, but instead of treating each flake of the heterostructure as a single elastic membrane, here we assume a stack of m layered materials labelled by j ∈ {1, 2, . . . , m} with a thickness of n j layers, respectively, where j = 1 denotes the lowermost and j = m the uppermost flake of the heterostructure. Crucially, we allow for individual relaxation in each layer with the in-plane displacement fields of layer k of flake j being described by   The interface between flakes j and j+1 is formed between layer k = n j of flake j and layer k = 1 of flake j + 1. The energy functional in our model is composed of the elastic energy density and the stacking energy density, We assume that the n 1 +n 2 +· · ·+n m displacement fields change on a length scale larger than the lattice constants α j in accordance with prior literature [69,71,72]. However, Ref. 72 was restricted to a single interlayer interface, where now we consider one interface from the twist or lattice constant mismatch between each set of flakes j and j + 1, which after relaxation induces n j − 1 interfaces within flake j and similarly n j+1 − 1 interfaces within flake j + 1. We show here that only such a modelling captures the important, finite penetration depth of the relaxation induced by stacking vdW heterostructures from one to a few to a large number of layers in the flakes. To this end, we will therefore assume that the stacking energy term can be described by two sets of GSFEs: one set covering the initial interfaces between flakes and another set describing the relaxation-induced interfaces within each of the m separate flakes. Note that the separation of the system into a sum over the different interfaces makes a further approximation. It neglects contributions to the stacking energy from neighboring layers beyond the nearest neighbor. Such an approximation would fail to capture subtle differences like the energy imbalance between the Bernal and rhombohedral phases in twisted double bilayer graphene [72] and could be relaxed in principle in the future. First, we write the elastic energy density E elastic (∇u) as a function of the gradients of the displacement fields ∇u: with ε (j,k) (∇u) the layer resolved elastic energy density, and K j and G j the bulk and shear moduli of flake j, respectively. The stacking energy density, E stacking (r, u), explicitly depends on position and displacement fields. The general expression is a function of the stacking configuration of each interface. As the stacking configuration is defined in terms of the underlying Bravais lattice, we denote the lattice vectors of flake j with α j a j 1,2 where a j 1,2 are normalized lattice vectors. Here, we choose the normalization factors α j to be the lattice constants. For the general case, the relative length of lattice vectors can be included in a j 1,2 , and α j acts as an overall length scale of flake j. Further, we can always define a two-dimensional matrix A j that transforms to the lattice vectors such that We then proceed to define the stacking configuration Ω 0 of each layer k in flake j as The origins of the unit cells of the layers within the m rigid flakes, respectively, can be chosen at r j = α j (la j 1 + ma j 2 ) with integers l and m, such that for zero displacement field Ω We assume that each unit area on each interface between two layers contributes to the stacking energy density according to the GSFE for that interface with the relative stacking configuration as input. So we define the relative stacking configuration Ω * for the intra-flake interfaces k j = 1, . . . , n j − 1 as The relative stacking configuration at the inter-flake interfaces can similarly be expressed as As we also want to explicitly treat isotropic expansion of each flake and constant stacking registry shifts of each layer (such as Bernal (AB) stacking in graphene), we define the total displacement of flake j and layer k as U (j,k) = u (j,k) + j r with j the expansion parameter and add the space independent constant ∆Ω (j,k) to the stacking configuration of each layer. These refinements yield the following expressions for the stacking configurations of all interfaces: In general, we formulate the stacking energy density of the system as a function of the relative stacking configurations Ω: For the specific physical systems under consideration in this work, we we can use the known form of the GSFE functional [69][70][71][72] for each of the interfaces: + c 3 cos(2v) + cos(2w) + cos(2v + 2w) where we set (v, w) = Ω as the components of the relative stacking configurations. The parameters c 0,...,5 depend on the type of interface. As the systems considered in this work consist of m = 2 flakes, we need three sets of parameters to calculate the stacking energy density, namely c  Table II for numerical values for the systems under consideration in this work).
To find the solution u (j,k) (r) and j , we minimize the total energy using standard unconstrained minimization techniques (trust region algorithm). Note that changes to j affect the size of the moiré unit cell, so one has to properly account for this effect in defining the grid and the strain terms.

III. MISMATCHED SYSTEM TEST CASE
Motivated by the experimental findings summarized in Fig. 2(g,h) we consider a PdTe 2 /Bi 2 Se 3 heterostructure as a specific example of a mismatched heterostructure. The underlying Bravais lattice of both materials is triangular, so we set the lattice vectors as The GSFE function parameters c 1,...,5 for each interface of this multi-layered PdTe 2 on a Bi 2 Se 3 substrate, and the bulk and shear moduli for both materials are taken from DFT (see Appendix D). For simplicity we fix the substrate and allow all PdTe 2 layers to relax individually. We further let the PdTe 2 flake expand isotropically via 2 . This degree of freedom is crucial in order to explain the observation of Fig. 2(g,h). Figure 3 summarizes the multi-layered relaxation results for different numbers of PdTe 2 layers. Figure 3(a,b) show how as the layer number drops the relaxation becomes stronger, leading to hexagonal domains and domain expansion. The domain expansion is further quantified in Fig. 3(c) covering the transition between the thick and thin PdTe 2 regimes. It is important to note that all the considered cases have shown a moiré period λ in excess of the accessible moiré periods as expected without including the isotropic expansion term (blue shaded region in Fig. 3(c)). Such an excess moiré period is in agreement with the experimental results. Our results highlight the multi-layered trade-off in search of an energy minimizing solution by the PdTe 2 flake: By conforming to the lattice constant of the Bi 2 Se 3 substrate the stacking energy is reduced, at the expense of a penalty of an elastic energy. As the number of layers increases the elastic energy term becomes too costly and the relaxation is suppressed. However, at the thin PdTe 2 limit due to the relative softness of PdTe 2 (see Table II) the elastic energy penalty due to stretching the PdTe 2 is not severe in comparison with the stacking energy benefit, and the result is an enhanced moiré period. Figure 3(c) also highlights the penetration of the relaxation field (here probed by the layer dependent moiré periodicity λ) induced by the PdTe 2 /Bi 2 Se 3 interface into the stack which decays exponentially with the number of layers n PdTe2 . To visualize the layer dependent moiré period, we assembled a video of the pattern in Fig. 3(a,b) varying n PdTe2 over time [73]. Finally, we note that the layer-dependent moiré wavelength in PdTe 2 /BiSe 2 is quantitatively reproduced by our multi-layered relaxation framework, considering our

IV. STRUCTURAL STUDY OF A GRAPHENE BASED SYSTEM
Next, we highlight the impact of employing the multilayered relaxation scheme by the analysis of a graphene based system compared to the experimental findings reported in Fig. 2(a-f ). To this end, we study the structural properties of two flakes of Bernal stacked multilayered graphene stacked with a twist at the interface of the two flakes. As in the previous section, the Bravais lattice of graphene is also triangular. However, we need to implement the twist angle θ as a two-dimensional rotation matrix in the transformation from flake 1 to 2: A 2 = R(θ)A 1 . Moreover, the GSFE parameters and shear and bulk moduli of graphene are presented in Appendix D.
As a primary result, we obtain the layer-and spaceresolved elastic energy density ε (j,k) (r). This can be viewed as a layer-resolved elastic energy tomograph scanning through the height of the stacks. plays this result for a small twist angle system (θ = 0.035 • ) and n 1 = n 2 = 60. The layer-resolved elastic energy tomograph emphasizes two main points: First, the relaxation pattern decays exponentially as a function of distance to the interface. Second, the system forms triangular relaxation domains separated by domain walls connecting the AB/BA regions of the interfacial moiré pattern. In Fig. 4(b), we plot the elastic energy density along the line cut across a domain wall (as indicated in the tomograph of Fig. 4(a)) for all layers. At the interface, these domains have very sharp domain walls. The width of the domain wall grows as the distance to the interface increases, while at the same time its height declines. In Fig. 4(c) we quantify this behaviour by plotting the full width at half maximum (FWHM) as a function of layer index k. We color-code the value of k consistently across panels (a-c).
Upon varying twist angle, we can examine how the relaxation behaviour evolves. To this end, we analyze the scaling of the atomic displacements u (j,k) (r) as a function of k in the second flake j = 2. To do so we quantify the strength of the relaxation in layer k of flake 2 by defining a layer-dependent scaling factor s(k) which is chosen such that the in-plane integrated squared difference of displacement fields of the k-th layer u (2,k) (r) and the first layer u (2,1) (r) is minimal. Figure 4(d) shows the scaling factor s(k) for five twist angles 0.035 • ≤ θ ≤ 0.107 • , with red curves corresponding to the small twist angle (large moiré wavelength λ) case and blue curves to the large twist angle (small λ) case. For each of the twist angles considered, we additionally set the number of layers of the second flake to n 2 = 30 (dashed lines) and n 2 = 60 (solid lines). Upto a cutoff when approaching the upper end of the stack (dependent on the height n 2 ), we find penetration of the relaxation fields into the stack which decays exponentially. The exponent decreases with the twist angle, yielding longer-ranged relaxation effects for smaller twist angle. We quantify the exponential decay of the relaxation effects by determining the penetration depth L relax. of the relaxation pattern. Here, we define L relax. through and calculate it using a fit for the case n 2 = 60 as a function of λ. The result is shown in Fig. 4(e) with consistent color-coding of λ and θ across panels (d-e).
We observe that the penetration depth L relax. (λ) scales linearly with λ in the small-θ and large-n 2 limit. For the two angles observed in the experiment ( Fig. 2(a-f )), we thus expect a penetration depth that is 0.66 • /0.15 • ≈ 4.7 times as large in the 0.15 • system than in the 0.66 • system. As the 0.66 • case cannot be assumed to be in the large-n 2 limit (n 2 = 3), we conclude that the theory qualitatively agrees with the experimental data -18 layers/3 layers is the same order of magnitude as 4.7.

V. MULTI-LAYERED RELAXATION'S IMPACT ON THE ELECTRONIC STRUCTURE OF A GRAPHITIC SYSTEM
As suggested by our experimental data and confirmed by using the multi-layered relaxation framework, the inplane relaxation can affect layers far from the interface of the flakes. For twisted graphitic systems this becomes particularly prominent at small twist angle (cf. Figs. 2  and 4). Such in-plane relaxations generate effective magnetic fields [74][75][76][77] and therefore may have a substantial impact on the electronic structure. Furthermore, using relaxed atomic positions has been shown to be crucial to explain low-energy properties of twisted graphitic systems [57-59, 78, 79]. Therefore, we analyze the effect of multi-layered relaxation on the electronic structure of twisted many-layer graphene systems in the following.
We focus on the case of several layers of Bernal stacked graphene on bulk graphite (corresponding to the small angle experimental PFM data shown in Fig. 2). We approximate the bulk by n 1 = 20 graphene layers and vary the number of layers in the top (twisted) flake from n 2 = 3 to n 2 = 20. We set the twist angle to θ = 0.1 • (the θ = 0.8 • case can be found in Appendix F) which implies a unit cell of ∼150 nm in size and ∼650,000 carbon atoms per layer. For a multi-layer geometry, the number of atoms is intractable for a tight-binding framework, and so we adopt the Bistritzer-MacDonald continuum model of twisted bilayer graphene [80][81][82][83][84] to include arbitrary in-plane relaxations (see Refs. 85 and 86) and Bernal multilayer hopping terms [65,[87][88][89].
For our analysis, we make use of the Bistritzer-MacDonald Hamiltonian, which can be viewed as a reciprocal moiré lattice vector expansion of the full tightbinding Hamiltonian of the system. The expansion as presented below is valid for SE-odd moiré structures [81], which applies to the graphitic systems studied in this work. With B 1,2 the reciprocal moiré lattice vectors, we expect the multi-layered Hamiltonian to be a matrix in the "indices" G = pB 1 + qB 2 , p, q ∈ Z. The expansion is then truncated at G < G c , where G c must be chosen large enough to ensure convergence. We further need to account for the sublattice (A, B), flake, layer, and valley ξ = ±1 degrees of freedom. For the sake of clarity of notation we drop flake indices and continuously index the layers with l = 1, . . . , n 1 + n 2 , where l ≤ n 1 corresponds to flake j = 1 and l > n 1 to flake j = 2.
Our starting point is the intralayer continuum Hamiltonian, which is composed of the following three parts: Only the standard Bernal intralayer Hamiltonian explicitly depends on momentum k (in the moiré Brillouin zone). As the low-energy states all are close to the individual graphene layers' K ξ -points, we shift and rotate the momentum vector: with R(θ l>n1 ) = R(θ) for the twisted layers in flake 2 and R(θ l≤n1 ) = 1 for the non-twisted layers in flake 1.
The K ξ -points of a given layer can be expressed by the reciprocal moiré lattice vectors as Using K v = ξK v,x + iK v,y , the Bernal intralayer Hamiltonian as a matrix in (A, B) sublattice space is then expressed as Here, v F denotes the Fermi velocity of graphene. We further include the intrinsic symmetric polarization δ l isp that is nonzero only in the central (twisted) layers: δ n1 isp = δ n1+1 isp = δ isp . Additionally, we account for sublattice polarization with δ p,A = δ p , δ p,B = 0 in all layers.
The other two parts of the intralayer continuum Hamiltonian Eq. (15) depend on the in-plane displacement field. As the in-plane displacement field is periodic in moiré lattice vectors and varies within the moiré unit cell, we apply a Fourier transform and obtain a representation in reciprocal moiré lattice vector space: First, we can write down the vector potential of the effective magnetic field generated by the displacement field as a function of G: This potential is added to the Hamiltonian with a coupling strength g 2 by standard substitution: with σ x and σ y Pauli matrices in sublattice space. Note that the initial dependency of the displacement field on the position vector r is translated to a (reciprocal moiré lattice) momentum transfer G − G . Second, the displacement field also generates a static potential that leads to the following contribution to the intralayer Hamiltonian: Here, g 1 is the static potential's coupling strength.
The intralayer Hamiltonian is accompanied by several interlayer coupling terms. For all non-twisted interfaces, we employ an effective graphite Hamiltonian. Its coupling from one to the next layer reads (as a matrix in (A l B l ; A l+1 B l+1 ) sublattice space) [65,89] H GG ,ξ,l l+1 The coupling to second-next layers is taken into account as shown in Ref. 87 and reads as a matrix in (A l B l ; A l+2 B l+2 ): For the values of the hopping parameters γ 1,...,5 see Table I. The interlayer Hamiltonian of the twisted interface between layers l = n 1 and l = n 1 + 1 encodes the interfacial moiré pattern and is therefore a matrix in G. The coupling function U ξ depends on the difference of in-plane displacement and is most easily formulated in real-space: Here, t 0 is the out-of plane hopping and M ξ j are matrices in sublattice (A n1 ,B n1 ; A n1+1 ,B n1+1 ) space: with c = 1.22333 the ratio of AB-stacking to AA-stacking interlayer tunneling due to out-of plane corrugations and w = exp(2πi/3). The unperturbed momentum transfer vectors δk j read and the displacement induced momentum transfer vectors Q ξ j are the center of the twisted and non-twisted K ξ -points of the original graphene lattices: where b 1,2 (b 1,2 ) denote the reciprocal lattice vectors of the non-twisted (twisted) graphene lattice. To arrive at a consistent description we further take the Fourier transform of the real-space interlayer tunneling U ξ (r) and obtain a coupling function U GG ,ξ .  The full single-particle continuum Hamiltonian for valley ξ follows as where δ ll bulk = 1 for l and l in the same flake and δ ll twist = 1 for l and l corresponding to the twisted interface coupling (l = n 1 to l = n 1 + 1). For all other combinations of layers, these "delta functions" are zero. As the full Hamiltonian requires several numerical parameters, we provide an overview of their values in Table I.
The application of the continuum description Eq. (32) to the θ = 0.1 • twisted many-layer graphene on graphite system is presented in the following. We inspect the local spectral density in the top layer of the twisted flake as an observable experimentally accessible by scanning tunneling microscopy. To isolate the effect of the relaxation we compare the rigid system (where u (j,k) (r) ≡ 0) to the relaxed one where we apply the method of Section II to find u (j,k) (r).
The left subpanels in Fig. 5 present the integrated (over space) top layer density of states (DOS) as a function of energy ω. Over all panels (a-f ), the top flake layer number n 2 is varied from n 2 = 8 to n 2 = 20 (the n 2 = 3 to n 2 = 6 cases are shown in Appendix F). We subdivide the rigid (a,c,e) and the relaxed (b,d,f ) cases. To highlight the spatial variations, we color the curves with the coefficient of variation of the top layer LDOS (over space) that is defined as c v (l, ω) = std r LDOS(l, ω, r) mean r LDOS(l, ω, r) .
Additionally, we add a gray background that corresponds to the minimum/maximum values taken by the LDOS of the outermost layer. By the DOS and its spatial variations we can identify regions in ω (for each n 2 ) where the effect of relaxation is most pronounced. We highlight these regions by vertical dashed lines colored from red to blue. The right subpanels in each row (a-f ) show the The frequencies are indicated in the left subpanels as dashed, colored vertical lines varying from red (low frequencies) to blue (high frequencies). The color-maps for the LDOS maps are chosen to be consistent with this coloring. Light (yellow) regions correspond to high intensity and dark (red → blue) regions to low intensity. For each configuration, the rigid (gray) and relaxed (black) panels in two subsequent rows share a color scale to make differences visible. The system's unit cell is indicated in each of the small subpanels. All plots on the right hand side share the scale bar plotted in the top left subpanel.
corresponding spatial map of the outermost layer LDOS for these specific ω and n 2 values. Note that two subsequent rows [e.g. (a) and (b)] share a color-map for each ω and n 2 such that differences from the relaxed to the rigid case become visible. The color-maps of the LDOS maps are chosen to correspond to the colors indicating ω in the DOS plot on the left. We provide the LDOS maps for all frequencies as videos in the supplemental material [73]. For n 2 = 8 and n 2 = 12, the LDOS maps for rigid atomic structures (a,c) clearly show less variation than the ones for relaxed atomic structures

VI. OUTLOOK
Our work highlights the effects of multi-layered relaxation in various vdW heterostructures. We have shown experimentally using PFM that for multi-layered graphene stacks with a twisted interface the atomic displacement can propagate deep into the flake for small twist angles [see Fig. 2 (a-f )]. We complement this experimental observation of the relevance of relaxation in twisted vdW heterostructures with an untwisted mismatched system: multi-layered PdTe 2 on a Bi 2 Se 3 substrate. In this system, multi-layered relaxation is of great importance as well and it becomes distinctively visible as a change in lattice constant as the number of PdTe 2 layers is increased [cf. Fig. 2 (g,h)]. We thus establish that multi-layered relaxation is highly relevant for the engineering of stacked vdW heterostructures in the twisted and non-twisted context. On the one hand this emphasizes the need of deepened understanding of relaxation and a refinement of the lego-like picture often employed in the field [1]. On the other hand, it adds yet another tuning knob to the thriving field of vdW heterostructure engineering. As exemplified here for PdTe 2 on a Bi 2 Se 3 substrate, the effective lattice constant can be tuned over a rather versatile range by multi-layered relaxation effects.
In tandem, we develop a machinery that allows us to simulate and understand these relaxation properties given the generalized stacking fault energy functional from ab-initio calculations. Within this approach, we treat each layer of the heterostructure stack as a separate membrane. Our simulations fully support the experimental evidence of three dimensional, layered relaxation patterns in both systems: First, we reproduce the stacking height dependent lattice constant evolution in a stack of PdTe 2 on Bi 2 Se 3 (Fig. 3). Second, we observe that the penetration depth of relaxation domains in twisted graphite stacks linearly increases with moiré wavelength (Fig. 4). These two examples emphasize how the three dimensional nature of multi-layered heterostructures needs to be taken into account when modeling their atomic structure.
Ultimately we predict differences in electronic properties arising from the atomic relaxation in multi-layered graphene twisted on graphite. Similarly to the notion of penetration depth of the relaxation pattern, we see that the local density of states in the outermost layer of the graphene stack sensitively depends on whether the atomic positions of the crystal are relaxed or not. For a small twist angle (θ = 0.1 • ), we demonstrate that the local variation of the electron density in the outermost graphene layer is much stronger when the multi-layered relaxation pattern is taken into account properly (Fig. 5). We propose to analyze such multi-layered (small angle) twisted graphene on graphite stacks using scanning tunneling microscopy in the future, as the local electronic density as a function of frequency can be used to probe the here predicted effects of relaxation on the electronic structure in precise manner. The strong effects of relaxation on the electronic properties of the stack highlight that also these properties, including their collective behavior, can be engineered by the relaxation of stacked vdW heterostructure even beyond the few layer paradigm.

ACKNOWLEDGMENTS
Nano-imaging research at Columbia is supported by DOE-BES grant DE-SC0018426. Research on atomic relaxation is supported by W911NF2120147. We acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under RTG 1995 and RTG 2247, within the Priority Program SPP 2244 "2DMP", under Germany's Excellence Strategy -Cluster of Excellence Matter and Light for Quantum Computing (ML4Q) EXC 2004/1 -390534769 and -Cluster of Excellence and Advanced Imaging of Matter (AIM) EXC 2056 -390715994. We acknowledge computational resources provided by the Max Planck Computing and Data Facility and RWTH Aachen University under project number rwth0716. This work was supported by the Max Planck-New York City Center for Nonequilibrium Quantum Phenomena. DNB is Moore Investigator in Quantum Materials EPIQS GBMF9455. DH was supported by a grant from the Simons Foundation (579913). We thank Oxford Instruments Asylum Research for performing the large scale tapping mode scan presented in Appendix A (Fig. 7).

Appendix A: Graphene experiment
Using an optical microscope, we identify exfoliated graphene flakes with tiered graphene thicknesses in close proximity to a bulk graphite flake from the same crystal. Samples are then assembled using a dry transfer technique using a slide with polycarbonate (PC) film on top of a polydimethyl siloxane (PDMS) dome [90]. We use the slide to first pick up a BN flake of thickness 30 − 40 nm, which is then placed into contact with the bulk graphite flake. After picking up the bulk graphite, we then rotationally misalign the transfer stage by a small angle of less than 1 degree before picking up the tiered graphene flake. The completed heterostructure remains on the polymer stamp slide during subsequent experimental measurements. Figure 6 presents optical images of the completed heterostructures studied in Fig. 2. Topography data of these samples are shown in Fig. 7. While the large scale topography maps of Fig. 7(a,d) indicate the existence of bubbles and wrinkles in the structures, the samples have an abundance of flat regions that were meticulously studied. Fig. 7(a) was acquired with a Bruker Dimension Icon with a Nanoscope V Controller, and Fig. 7(d) was acquired by a Jupiter XR AFM by Oxford Instruments Asylum Research. We use piezoresponse force microscopy (PFM) to image the moiré superlattice between the rotationally misaligned tiered graphene flake and the bulk graphite [61]. In PFM, we apply an AC bias between the conductive probe and the microscope stage (on which the sample is directly mounted, giving an approximate distance of about 500 µm between biased tip and ground), which induces a periodic deformation of the sample through the piezo-electric effect. Even though the sample is not contacted, and thus one may raise the concern of its potential being ill-defined, empirically the results of PFM studies do not seem to be sensitive to the sample being properly grounded. Apparently, due to the proximity to a grounded plane, the stage still provides a good dcreference for the oscillating fast perturbations made by the metallic tip (at hundreds of kHz). The phase and amplitude of the local deformation provide the strongest contrast to visualize moiré in our samples. AC bias magnitudes were 500−1000 mV with resonance frequencies in the range of 270 − 300 kHz for vertical and 750 − 850 kHz for lateral PFM. The experiments were performed in a Bruker Dimension Icon with a Nanoscope V Controller, using Oxford Instrument Asylum Research ASYELEC-01 Ti/Ir coated silicon probes. While the PFM signal is commonly interpreted as a measure of local deformation resulting from the electric field exerted by the tip, i.e. a local piezoelectric or flexoelectric effect (see Ref. 61) the exact physical mechanism responsible for the contrast is not necessarily purely of such origin. When looking at the contact mode scans of Fig. 7(b-c,e-f ), taken simultaneously with those of Fig. 2(a-b,d-e) respectively, one may notice faint periodic structure, which may reflect some topographic features of the moiré lattice. Regardless of the exact mechanism for the PFM contrast, we consider it to be correlated with the strength of the relaxation at the sample surface.
The PFM scans of twenty different locations in the ∼0.66 • sample revealed twist angles ranging from 0.63 • to 0.8 • , based upon the moiré wavelength. For the ∼0.1 • sample, we took scans in forty-eight different locations, showing a rather uniform twist angle of 0.11 • ± 0.02 • through the two to five top layer graphene and bulk top layer graphene regions. In both samples the thinner toplayers regions (MLG and to some extent the BLG as well) were more susceptible to strain, as indicated by the moiré superlattice.
One may wonder whether the observed moiré patterns may actually originate from hBN/graphene moiré interface. For the case of the single, bi-and trilayer graphene twisted at an angle of 0.66 • (cf. Fig. 2(a-c)), we note that we can exclude a potential moiré lattice formed by aligned hBN and the bottom multi-layered graphene as a source of PFM contrast due to the different length scales involved. At maximum, hBN and graphene form a moiré lattice with λ max hBN ≈ 14 nm [91], whereas the moiré wavelength observed experimentally is λ graphene = a graphene /(2 sin(θ/2)) ≈ 18 − 22 nm, which is significantly larger than 14 nm; and therefore corresponds to the mentioned graphene-graphene moiré lattice with twist angle of θ = 0.66 • . system with base pressure below 2 × 10 −10 mbar. The Bi 2 Se 3 was prepared by in-situ cleaving the surface and subsequent annealing to 250 • C for one hour to degas. Then, high-purity Pd (99.95%) and Te (99.9999%) were evaporated from an electron-beam evaporator and a standard Knudsen cell, respectively, with a flux ratio of 1:10. The deposition rate of Pd and Te atoms was monitored  Fig. 2(a-c), as reflected by the deflection error channel in contact mode. The layer markings are as in Fig. 6(a). (b-c) Topography maps in contact mode taken simultaneously with Fig. 2(a-b) respectively ((b-c) share a color-scheme and a scale-bar). (d) Topography map of the sample measured in Fig. 2(d-f ), as reflected by tapping mode scan. The layer markings are as in Fig. 6(b). by a quartz oscillator. The temperature of substrate was kept at 210 • C during the growth.
After the initial annealing of the Bi 2 Se 3 , the substrate was transferred in-situ to the Aarhus STM stage with a Tungsten tip. STM measurements were taken of the substrate to ensure the quality of the substrate. Once deposition of the Pd and Te was complete, the sample was transferred in-situ to the STM stage for surface topography mapping. While our atomic relaxation model is restricted to inplane relaxation, we argue that the most important contributions of out-of-plane strain are still captured. During the DFT calculations of the GSFE function, the inter-layer spacing is relaxed separately for each stacking configuration. The result is a map of the GSFE function V GSFE (v, w), but also a stacking dependent inter-layer spacing function d (v, w). The only effect with relax. w/o relax.
FIG. 8. Effects of curvature on atomic relaxation process in twisted bilayer graphene. Comparing total energy density (red) to the elastic energy term resulting from membrane bending (blue -more details in the text) before and after relaxation (dashed and solid lines respectively) as a function of original (before relaxation) stacking configuration (on a path in configuration space indicated in the inset). The out-of-plane displacement was taken to be the layer spacing at the local configuration as presented in Table III of Ref. [94] (and independently corroborated).
that is omitted in this calculation is the contribution to the elastic energy emanating from the bending of the membrane. The bending energy can be written as E b = d 2 r 1 2 κH 2 + κ G K [92], where H and K are the mean and Gaussian curvatures respectively and their respective rigidities κ and κ G . Similar to Ref. 92 we approximate H and K as H ≈ ∇ 2 f and K ≈ det (∂ i ∂ j f ), where f is the out of plane strain of the membrane. For the sake of the estimate we assume f ≈ d, namely, that the shape of the membrane is dictated by the local stacking configuration. Only if the curvature related terms of the elastic energy density are significant relative to the total energy density at a given stacking configuration can one argue that this effect is of importance in the atomic relaxation process. Fig. 8 compares the curvature terms using the above assumptions with the total energy omitting this effect, before and after relaxation. It is clear that the curvature contribution is secondary, therefore we omitted these terms, to greatly simply the model. However, there is no fundamental problem preventing from including this effect in the model. In the estimate of Fig. 8 we took κ = κ G = 10 eV following the calculations of Ref. 93 and the material parameters for graphene detailed in Table II. Spectral densities generally can be defined in an arbitrary basis of the Hamiltonian. The layer-resolved local density of states, i.e. LDOS(l, ω, r), is nothing else but the spectral density in a basis consisting of real-space-, layer-, sublattice-, valley-and momentum-indices. The sublattice (x), valley and momentum degrees of freedom are summed over to obtain a function of l and r, and frequency ω. As our model Eq. (32) is forumlated in reciprocal moiré lattice vector (G) space, we obtain the following Green's function: with b a band index and b ξ (k) the dispersion of band b of valley ξ at momentum k. Note that, since the Hamiltonian is diagonal in the valleys ξ = ±1, the Green's function carries only one valley index as well. For a realspace description, we need to transform the G indices to real-space: Here, the number of reciprocal moiré lattice vectors N G is needed for proper normalization. The representation in a diagonal basis, Eq. (E1), numerically facilitates the Fourier transform: Instead of transforming both indices in a double sum [cf. Eq. (E2)], we can directly transform the "Bloch"-functions u b G,ξ,l,x (k) to real-space: Then we perform the analytic continuation iω → ω+iη and trace out the sublattice-, valley-and momentumindices to arrive at LDOS(l, ω, r) = −Im The broadening parameter η determines the energy resolution. We set η = 5 meV for all simulations of the θ = 0.8 • systems and η = 0.8 meV for the θ = 0.1 • case. The momentum summation is carried out on an equispaced mesh with 14 × 14 points in a C 3 reduced wedge of the Brillouin zone. For numerical calculations of the system's electronic properties we employ specialized code operating on a hybrid CPU/multi-GPU (OpenMP/CUDA) architecture. Convergence of the inverse moiré lattice vector expansion dictates large matrix sizes for the Hamiltonian. The expansion's cutoff G c is set by magnitude: G < G c . In the case of θ = 0.1 • we require G c = 13 B 1/2 for convergence resulting in Hamilton matrix sizes of ∼50,000 (for n 1 = n 2 = 20). In the large angle case (θ = 0.8 • ) we set G c = 5 B 1/2 to converge the expansion.
FIG. 10. Overview over electronic structure of many-layer graphene twisted at θ = 0.8 • on graphite system Similarly to Figs. 5 and 9, we display the integrated top layer DOS and additionally color-code the coefficient of variation cv. Furthermore, we mark the minimum/maximum values taken by the top layer LDOS as gray background. The system considered here also consists of n1 = 20 layers of graphene as a bottom flake and n2 = 3, . . . , 20 layers of graphene as a top flake. In Panels (a-f ), we vary n2 from 3 over 4 to 6, with subsequent rows corresponding to the relaxed and rigid cases of the same system. The subpanels on the right show the LDOS maps at selected frequencies indicated as vertical dashed lines in the left subpanels. The color-maps correspond to the vertical dashed lines' colors. Two subsequent rows [e.g. (a) and (b)] share a color-map for each frequency to make differences apparent. (g-l) Same as panels (a-f ), but for n2 = 8, n2 = 12, and n2 = 20. As for these systems local variations in the electronic density of the top layer are very minuscule, we only show the LDOS maps at a single frequency. Apart from a small shift in absolute scale, these LDOS maps are very similar between each relaxed and rigid case in panels (g-l).
tom flake is fixed at n 1 = 20 and we vary the number of graphene layers in the top flake from n 2 = 3 to n 2 = 20. The results are presented in Figure 10. Overall, we observe that patterns in the LDOS of the top layer decay much faster as a function of increasing n 2 . For example, even the n 2 = 6 and n 2 = 8 cases display no notable difference in c v of the relaxed and rigid structures. For n 2 ≥ 8, the value of c v is too small to be visible on the selected scale (less than 1%). Furthermore, no such clear pattern of increased c v in the relaxed case over the rigid case can be found as for the small angle (θ = 0.1 • ) case. We interpret this behaviour as the relaxation decaying as fast -or faster than -the effects that the moiré interlayer coupling U G,G has on the top layer electronic structure.

Full frequency LDOS maps
As supplemental material [73], we provide all LDOS maps for the systems studied within this work as videos where we change ω with each frame. The color-maps evolve as a function of ω similar to the color-maps of the LDOS maps in Figs. 5, 9 and 10. The upper panel corresponds to the rigid structure and the lower panel to the relaxed one. In each frame, we normalize the color-map separately with bright (yellow) colors indicating maxima and dark (red/blue) colors indicating minima. To display only significant LDOS variations in the movies, we set the minimum relative difference that needs to be covered by the color-map to ±2% such that when the minimum and maximum values are closer than 2% to their mean value, we artificially increase the color-mapped minimum and maximum values to be 2% away from the mean value.