Origin of Mott insulating behavior and superconductivity in twisted bilayer graphene

A remarkable recent experiment has observed Mott insulator and proximate superconductor phases in twisted bilayer graphene when electrons partly fill a nearly flat mini-band that arises a `magic' twist angle. However, the nature of the Mott insulator, origin of superconductivity and an effective low energy model remain to be determined. We propose a Mott insulator with intervalley coherence that spontaneously breaks U(1) valley symmetry, and describe a mechanism that selects this order over the competing magnetically ordered states favored by the Hunds coupling. We also identify symmetry related features of the nearly flat band that are key to understanding the strong correlation physics and constrain any tight binding description. First, although the charge density is concentrated on the triangular lattice sites of the moir$\text{\'e }$ pattern, the Wannier states of the tight-binding model must be centered on different sites which form a honeycomb lattice. Next, spatially localizing electrons derived from the nearly flat band necessarily breaks valley and other symmetries within any mean-field treatment, which is suggestive of a valley-ordered Mott state, and also dictates that additional symmetry breaking is present to remove symmetry-enforced band contacts. Tight-binding models describing the nearly flat mini-band are derived, which highlight the importance of further neighbor hopping and interactions. We discuss consequences of this picture for superconducting states obtained on doping the valley ordered Mott insulator. We show how important features of the experimental phenomenology may be explained and suggest a number of further experiments for the future. We also describe a model for correlated states in trilayer graphene heterostructures and contrast it with the bilayer case.


I. INTRODUCTION
Superconductivity occurs proximate to a Mott insulator in a few materials. The most famous are the cuprate high-T c materials [1]; others include layered organic materials [2], certain fullerene superconductors [3], and some iron-based superconductors [4]. In these systems, there is a complex and often poorly understood relationship between the Mott insulator and the superconductor, which has spurred tremendous research activity in condensed matter physics in the past 30 years. Very recently, in some remarkable developments, both Mott insulating behavior and proximate superconductivity have been observed in a very different platform: two layers of graphene that are rotated by a small angle relative to each other [5,6].
Twisted bilayer graphene (TBG) structures have been studied intensely in the past few years [7][8][9][10][11][12][13][14][15][16][17][18]. The charge density is concentrated on a moiré pattern which forms (at least approximately) a triangular lattice [8,9,11,14,15]. The electronic states near each valley of each graphene monolayer hybridize with the corresponding states from the other monolayer. When the twisting angle is close to certain discrete values known as the magic angles, theoretical calculations show that there are two nearly flat bands (per valley per spin) that form in the middle of the full spectrum that are separated from other bands [12]. When the carrier density is such that the chemical potential lies within these nearly flat bands, interaction effects are expected to be enhanced. At a filling of 1=4 or 3=4 (denoted ν ¼ −2 and þ2, respectively, with full band filling denoted ν ¼ þ4) of these nearly flat bands, Ref. [5] reports insulating behavior at very low temperatures. At such fillings, band insulation is forbidden, which leads naturally to the expectation that these are correlation-driven (Mott) insulators. Doping the Mott insulator at 1=4 band filling-with either electrons or holes-reveals superconductivity at low T [6].
A number of other striking observations are made in Refs. [5,6] about both the Mott insulator and the superconductor from transport studies in a magnetic field. The Mott insulation is suppressed through the Zeeman coupling of the magnetic field at a low scale of approximately 5Troughly the same scale as the activation gap inferred from zero field resistivity. Quantum oscillations are seen in the hole-doped state with a frequency set (in the hole-doped side) by the density deviation from the Mott insulator. The degeneracy of the corresponding Landau levels is half of what might be expected from the spin and valley degrees of freedom that characterize electrons in graphene. The superconductivity occurs at temperatures that are high given the low density of charge carriers. Just like in other doped Mott insulators, there is a dome of superconductivity with T c reaching an "optimal" value at a finite doping. The superconductivity is readily suppressed in accessible magnetic fields-both perpendicular and parallel to the plane.
The observation of these classic phenomena in graphene gives new hope for theoretical progress in addressing old questions on Mott physics and its relationship to superconductivity. They also raise a number of questions. What is the nature of the insulators seen at these fractional fillings? How are they related to the observed superconductivity? On the theoretical side, what is an appropriate model that captures the essential physics of this system?.
In this paper, we make a start on addressing these questions. The two nearly flat bands for each valley found in the band structure have Dirac crossings at the moiré K points (but not Γ). We argue that these Dirac crossings are protected by symmetries of the TBG system. We show that this protection precludes finding a real-space representation of the nearly flat bands in terms of Wannier orbitals localized at the triangular moiré sites, in contrast to natural expectations. Thus, a suitable real-space lattice model is necessarily different from a correlated triangular-lattice model with two orbitals (corresponding to the two valleys) per site. We instead show that a representation that is faithful to the Dirac crossings is possible on a honeycomb lattice with two orbitals per site, but even this representation has some subtleties. First, one cannot implement a natural representation of all the important symmetries in the problem, which include spatial symmetries, time reversal, and a separate conservation of electrons of each valley [which we dub U v ð1Þ]. Second, since the charge density is concentrated at the moiré triangular sites (which appear as the centers of the honeycomb plaquettes), the dominant interaction is not an on-site Coulomb repulsion on the honeycomb sites. Rather, it is a "cluster-charging energy" that favors having a fixed number of electrons in each honeycomb plaquette, which makes this model potentially rather different from more standard Hubbard models with on-site interactions.
Armed with this understanding of the microscopics, we can begin to address the experimental phenomenology. We propose that this system spontaneously breaks the valley U v ð1Þ symmetry-we call the resulting order "intervalley coherent" (IVC). We discuss microscopic mechanisms that stabilize IVC symmetry breaking. We point out that, even when the IVC is fully polarized, it cannot, by itself, lead to a fully insulating state but rather leads to a Dirac semimetal. The development of a true insulator needs a further symmetry breaking (or some more exotic mechanism) to gap out the Dirac points. We show that, once the valley symmetry is spontaneously broken, the physics at lower energy scales can be straightforwardly formulated in terms of a real-space honeycomb lattice tight-binding model with a dominant cluster-charging interaction and other weaker interactions. We outline a number of different possible routes in which a true insulator [19] can be obtained in such an IVC-ordered system. A concrete example is a state that further breaks C 3 rotational symmetry. We show how doping this specific IVC insulator can explain the phenomenology of the experiments. We present a possible pairing mechanism due to an attractive interaction mediated by Goldstone fluctuations of the IVC phase. We describe and contrast features of other distinct routes by which the IVC state can become a true insulator at ν ¼ AE2. We propose a number of future experiments that can distinguish between the different routes through which an IVC can become a true insulator.
In addition, very recently, a heterostructure of ABCstacked trilayer graphene and boron nitride (TLG/hBN), which also forms a triangular moiré superlattice even at a zero twist angle, was studied [20]. This system also features nearly flat bands that are separated from the rest of the spectrum. Correlated Mott insulating states are seen at fractional fillings of the nearly flat band. Unlike the TBG, here the nearly flat band has no Dirac crossing, which makes the details of the two systems potentially rather different. In particular, the nearly flat band of the TLG/hBN can be modeled in real space as a triangular-lattice model with two orbitals per site, supplemented with interactions. However, the hopping matrix elements are, in general, complex (but subjected to some symmetry restrictions). We describe some properties of this model and suggest that this system offers a good possibility to realize novel kinds of quantum spin-orbital liquid states.

A. Setup
First, to establish the notation, let us consider a graphene monolayer, with lattice vectors A 1 and A 2 (see Appendix A for details). The honeycomb lattice sites are located at r 1;2 ¼ 1 2 ðA 1 þ A 2 Þ ∓ 1 6 ðA 1 − A 2 Þ, where the − and þ signs are, respectively, for the sites labeled by 1 and 2. Now consider the moiré pattern generated in the twisted bilayer problem. For concreteness, imagine we begin with a pair of perfectly aligned graphene sheets, and consider twisting the top layer by an angle θ relative to the bottom one. Now we have two pairs of reciprocal lattice vectors, the original ones B a and B 0 a ¼ R θ B a . Like Refs. [7,12], we approximate the moiré superlattice by the relative wave vectors, leading to a periodic structure with reciprocal lattice vectors b a ¼ B 0 a − B a ¼ ðR θ − IÞB a . For small θ, we can approximate this by b a ¼ θẑ × B a . Thus, the moiré pattern also has triangular-lattice symmetry, but it is rotated by 90°and has a much larger lattice constant. Note that, in this dominant harmonic approximation, questions of commensuration or incommensuration are avoided, since no comparison is made between the other sets of harmonics of the moiré superlattice that are not commensurate to the dominant one.
Let us now briefly review the low-energy electronic structure of monolayer graphene to set the notation. Parameterizing k ¼ P j¼1;2 g j B j for g j ∈ ð−1=2; 1=2, the Bloch Hamiltonian for the nearest-neighbor hopping model is Note that, as a general property of our present choice of Fourier transform, the Bloch Hamiltonian is not manifestly periodic in the Brillouin zone (BZ). Rather, for any reciprocal lattice vector B, we have Hðk þ BÞ ¼ η B HðkÞη † B , where η B ¼diagðe −iB·r a Þ. One can now pass to a continuum limit near each Dirac point K ¼ð2B 1 −B 2 Þ= 3 and −K. We then have the linearized Hamiltonian where ℏv F ¼ ffiffi ffi 3 p ta=2 in our simple nearest-neighbor model. Since HðqÞ is not periodic in the BZ, expanding about the other equivalent Dirac points leads to a slightly modified form of the Hamiltonian (due to conjugation by some η). In second quantized notation, we can write the continuum Hamiltonian: where the momentum integration is understood to be implemented near the Dirac point momentum by introducing a cutoff jkj ≤ Λ and d −2 k ¼ ½ðdk x dk y Þ=ð2πÞ 2 . The symmetry implementation on the continuum fields is tabulated in Appendix A. For example, C 3 -rotation symmetry is represented asĈ 3ψ AEμ;kĈ −1 Next, we couple the degrees of freedom in the two layers of graphene and arrive at a continuum theory for the twisted bilayer graphene system [7,12]. First, we note that the rotated Bloch Hamiltonian of a monolayer can then be identified as H φ ðkÞ ¼ HðR −φ kÞ. Linearizing about the rotated K point R φ K obtainsĥ AE ðφÞ, withĥ AE ðφÞ defined by replacing σ ↦ σ φ inĥ AE , where Focusing on a single valley, say, K, the continuum theory [7,12] of the twisted bilayer graphene system is described by the HamiltonianĤ cont ¼Ĥ Dirac þĤ T , wherê H Dirac ¼ĥ þ ðφ t Þ þĥ þ ðφ b Þ; þ symmetry-related terms; ð5Þ t and b, respectively, denote the top and bottom layers, and we set φ t ¼ θ=2 and φ b ¼ −θ=2. Here, we introduce q 1 ≡ R −θ=2 K − R θ=2 K, which characterizes the momentum transfer between the electronic degrees of freedom of the two layers [12]. Assuming T q 1 is real, the symmetries of the system, which we discuss in the following subsection, constrain T q 1 [21] to take the form T q 1 ¼ w 0 − w 1 σ 1 , where w 0;1 are real parameters [22]. Similarly, one can generate the omitted symmetry-related terms by applying symmetries on T q 1 .

B. Symmetries of the continuum theory
Let us discuss how the symmetries of the graphene monolayer are modified in the twisted bilayer problem within the dominant harmonic approximation. We see that, in addition to the moiré translation symmetry, we have C 6 rotation, time reversal, and a mirror symmetry. Furthermore, a U(1) valley symmetry that allows us to assign valley charge to the electrons emerges in the lowenergy limit. The generator of C 6 rotation and time reversal flips the valley charge, while reflection leaves it invariant.
Microscopically, the stacking pattern of the two layers can be specified as follows [12,21,23]: First, we align the two layers perfectly in a site-on-site manner, corresponding to the "AA-stacking" pattern, and then rotate the top and bottom layers about a hexagon center by angles θ=2 and −θ=2 clockwise, respectively; second, we shift the top layer by a vector d parallel to the plane. For generic values of θ and d, one expects that almost all of the spatial symmetries are broken.
However, within the dominant harmonic approximation, it is found that, on top of possessing moiré lattice translation symmetries, the effective theory is also insensitive to d [12]. This finding implies that, given θ, the effective theory will at least possess all the exact symmetries for any choice of d. A particularly convenient choice is when we take d ¼ 0. In this case, we can infer all the point-group symmetries of the system by focusing on the center of the hexagons [ Fig. 1(a)]. Aside from the rotational symmetries generated by the sixfold rotation C 6 , we see that there is an additional mirror plane M y , which, in fact, combines a mirror perpendicular to the 2D plane together with an inplane mirror which flips the top and bottom layers. Strictly speaking, this process leads to a twofold rotation in 3D space, but when restricting our attention to a 2D system it acts as a mirror.
In summary, the effective theory will at least have the following spatial symmetries: lattice translations, a sixfold rotation, and a mirror, which allows one to uniquely identify its wallpaper group (i.e., 2D space group) as p6mm (numbered 17 in Ref. [24]). Having identified the symmetries of the system, one can derive the model following a phenomenological approach by systematically incorporating all symmetry-allowed terms with some cutoff [21]. We tabulate the explicit symmetry transformation of the electron operators in Appendix A.
In the effective theory, the degrees of freedom arising from the microscopic K and K 0 points are also essentially decoupled [7,12], which is because, for a small twist angle satisfying j sin θj ≪ 1, we have jb a j ≪ jKj, and therefore the coupling between the K and K 0 points is a very highorder process. Hence, on top of the usual electron-charge conservation, the effective theory has an additional, emergent U v ð1Þ conservation corresponding to the independent conservation of charge in the two valleys K and K 0 . Henceforth, we refer to this as "valley conservation." The valley-charge operator is given bŷ Note that, as time reversal T interchanges the K and K 0 valleys, it is not a symmetry of a single valley. Similarly, C 6 also interchanges the two valleys; thus, We then see that their combined symmetry C 6 T is a symmetry in the single-valley problem. In fact, one can check that the symmetry of the single-valley problem is described by the magnetic space group 183.188 (BNS notation; Ref. [25]). We tabulate the generating symmetries in Table I.  [7]. (e) When C 3 rotation is broken, but the combined symmetry of twofold rotation C 2 and time reversal T is preserved, the Dirac points remain protected, although unpinned from K M and K 0 M . (f) When valley conservation U v ð1Þ symmetry is broken, one can no longer label the bands using their valley index. The gaplessness at charge neutrality is no longer symmetry required, although, depending on detailed energetics, there can still be remnant Dirac points. In contrast, at the quarter filling relevant for the observed Mott physics, there are necessarily Dirac points present in this case. The other symmetry-breaking patterns listed above also do not open band gaps at quarter filling. Formally, the continuum effective theory [7,12] we describe corresponds to an infinite-band problem for each valley. However, near charge neutrality it is found that, for some range of angles, the moiré potential can induce additional band gaps at a certain commensurate filling of the moiré unit cell. The "nearly flat bands" identified near the magic angle correspond to two bands per valley, separated from all other bands by band gaps, that form Dirac points at the K M and K 0 M points in the moiré BZ. These bands correspond to the relevant degrees of freedom for the correlated states observed in Refs. [5,6], and, in the following, we focus our attention to the properties of these bands. In this section, we always focus on a single valley, say, that corresponding to the K point in the microscopic description.

A. Symmetry-enforced band contacts
A salient feature of the effective theory is the presence of Dirac points at charge neutrality, whose velocity is strongly renormalized and approaches zero near the magic angle [7,12]. The stability of the Dirac points can be understood from symmetries: For a single valley, K M is symmetric under the (magnetic) point group generated by C 6 T . In particular, ðC 6 T Þ 2 ¼ C 3 , and therefore we can label each band at K M by its C 3 eigenvalue, which takes a value in f1; ω ¼ e −ið2π=3Þ ; ω Ã g. In particular, a band with C 3 eigenvalue ω is necessarily degenerate with another with eigenvalue ω Ã , as ðC 6 T Þ 2 ≠ 1 on these bands and enforces a Kramers-like degeneracy. The observed Dirac points at charge neutrality correspond precisely to this twodimensional representation [ Fig. 1(b)].
While we allude to the presence of C 6 T symmetry in explaining the stability of the Dirac points, these band contacts are actually locally stable so long as the symmetry ðC 6 T Þ 3 ¼ C 2 T is kept. This stability can be reasoned by noting that C 2 T quantizes the Berry phase along any closed loop to 0; π mod 2π, and a Dirac point corresponds precisely to the case of a nontrivial π Berry phase [26].
Let us now consider the effect of breaking the various symmetries (spontaneously or explicitly) in the system. First, as C 2 T is crucial in protecting the local stability of the Dirac points, once it is broken, the Dirac points can be immediately gapped out [ Fig. 1(c)]. However, as long as C 2 T symmetry is preserved, a small breaking of any other point-group symmetries will not lead to a gapped band structure at charge neutrality. For instance, the mirror M y maps K M to K 0 M , and its presence ensures only that the two inequivalent Dirac points are at the same energy. Therefore, even when a perpendicular electric field is externally applied such that M y is broken, as in the setup in Refs. [5,6], it can only induce an energy difference between the two Dirac points [7] [ Fig. 1(d)]. This result should be contrasted with the case of Bernal-stacked bilayer graphene, whose quadratic band touching at charge neutrality can be gapped by an external electric field [27]. Alternatively, if C 3 symmetry is broken, the Dirac points are unpinned from K M and K 0 M [ Fig. 1(e)]. As such, for a sufficiently strong C 3 breaking, a band gap might open at charge neutrality if the Dirac points could meet their oppositely charged partners and annihilate. (Though, as we argue later, this situation is impossible without further symmetry breaking [28].) Now consider the case when valley conservation is spontaneously broken by an IVC, i.e., the valley chargê I z is no longer conserved. In this case, we should first consider the full four-band problem consisting of both valleys. At, say, K m , the combined symmetry of M y T ensures that the Dirac points from the two valleys are degenerate. While such degeneracy is lifted in the presence of an IVC, as long as the remaining symmetries are all intact, we can only split the degeneracy according to Fig. 1(f)]. This remaining twofold degeneracy rules out an interpretation of the experimentally observed Mott insulator as a Slater insulator with a spatial-symmetryrespecting (ferro) IVC incorporated at the Hartree-Fock level. Instead, one must either introduce additional symmetry breaking, say, that of C 3 or lattice translations, or consider an IVC which also breaks some additional spatial symmetries. We elaborate on these points in Sec. VI. We also note that an essentially identical argument holds for the case of spontaneously ferromagnetic order leading to fully spin-polarized bands of I z ordering. In this way, it connects to the quarter-filled Mott insulator we are interested in.

B. Triangular versus honeycomb lattice
A conventional route for understanding the correlated states observed in Refs. [5,6] is to first build a real-space tight-binding model for the relevant bands and then incorporate short-range interactions to arrive at, say, a Fermi-Hubbard model. Typically, the orbital degrees of freedom involved in the tight-binding model can be identified from either applying chemistry insight or, more systematically, studying the projected density of states for the relevant bands, both of which are inapplicable to the current moiré potential problem; furthermore, an understanding of the structure of the wave functions is required. Indeed, as is noted in Refs. [9,23,29], the local density states for the flat bands are well localized to the AA regions of the moiré pattern, which form a triangular lattice. This theoretical prediction is also confirmed experimentally [8,11,14]. Based on this observation, it is natural to consider a real-space model starting from effective orbitals centered at the AA sites, which corresponds to a tightbinding model defined on the triangular lattice [5,6]. In addition, by treating the two valleys separately, one envisions a model with two orbitals localized to each of the triangular sites (i.e., AA regions of the moiré pattern).
From symmetry representations, however, we can immediately rule out such a model. This observation can be readily inferred from the computed band structure [7,12,21,23] [ Fig. 4(j)]: While the two bands are nondegenerate at Γ, as we have explained, they form symmetry-protected Dirac points at K M and K 0 M . Using such a pattern of degeneracies, one can infer the possible symmetry representations at these high-symmetry points, and from a real-space analysis [30][31][32] one finds that a triangular-lattice model always leads to the same symmetry representation at all three of the high-symmetry points; i.e., they are either all nondegenerate or are all Dirac points. This result is inconsistent with the observed pattern of degeneracies, which rules out all triangular-lattice models.
In fact, the degeneracy pattern described is familiarit corresponds exactly to the monolayer graphene problem. One can further check that this is the only possible solution using the methods described in Refs. [30,32]. Symmetrywise, this correspondence implies that any tight-binding model must correspond to orbitals forming a honeycomb lattice. To reconcile with the predicted and observed local density of states [8,9,11,14,23], however, these orbitals must have nontrivial shapes: Although each orbital is centered at a honeycomb site, which corresponds to the AB=BA region of the moiré pattern, the weight of the orbitals is mainly localized to the AA sites. Therefore, we expect the shape of the orbitals to resemble a (three-lobed) fidget spinner [Figs. 3(a) and 3(b)].

C. Obstructions to symmetric Wannier states
Our symmetry analysis suggests that one should model the system by orbitals centered at the AB=BA regions of the moiré potential, which form a honeycomb lattice. A minimal tight-binding model of a single valley would then bê whereĉ † r is an electron-creation operator centered at a honeycomb site (for a single valley) and ρ i connects two ith nearest-neighbor sites. Given that this equation describes a single valley which breaks time-reversal symmetry, the hoppings are, in general, complex unless constrained by a space-group symmetry.
A pedestrian approach involves optimizing the parameters ft ρ i ; ϕ ρ i g to reproduce the energy eigenvalues obtained from the continuum description. Would this be a good starting point for building up a real-space effective model upon which we can incorporate interaction terms? Contrary to usual expectations, we argue that such an approach has a serious flaw in capturing certain essential properties. Specifically, we show that, while the energy eigenvalues may be well approximated, the topology of the resulting Bloch wave functions will necessarily be incorrect. This result has important dynamical consequences, relating to the stability of band contacts under different symmetry assumptions, which, in turn, dictate whether an insulator will result at particular fillings. In particular, we find two symmetry obstructions to deriving a single-valley tight-binding model. The first concerns the symmetry representations of M y : We find that the two bands have opposite M y eigenvalues of AE1, whereas, from a real-space analysis [30][31][32], one can show that the two bands in a tight-binding model must have the same M y eigenvalue.
There is a second, more serious, obstruction: Aside from a quantized Berry phase of π for any closed loop encircling a single Dirac point, one can further define a Z-valued winding number [28]. In contrast to the conventional case of graphene, the two inequivalent Dirac points in the singlevalley model are known to have the same winding number [5,28,33]. As the net winding number of the Dirac points arising in any two-band tight-binding model would necessarily be zero, we can then conclude that there is an obstruction for a symmetric real-space description; i.e., there is an obstruction for constructing localized Wannier functions that reproduces just the two bands of interest, represents C 2 T naturally, and preserves valley quantum numbers. A more detailed description of this obstruction, by relating it to the anomalous surface state of a threedimensional topological phase, is contained in Appendix B. Essentially, this argument invokes three key ingredients: (i) a two-band model, (ii) C 2 T symmetry, and (iii) net winding of the Dirac points in the Brillouin zone.
We return to the question of tight-binding models in Sec. VIII, but, for the discussion below, we work directly in the momentum space in the manifold of states spanned by the nearly flat bands.

IV. INTERVALLEY COHERENT ORDER: PHENOMENOLOGICAL MOTIVATION
We first describe some important clues from experiments [5,6] on the nature of both the Mott state and the superconductor. We begin with the observation that-at optimal doping-an in-plane magnetic field suppresses the superconductivity when the Zeeman energy scale is of the order of the zero field T c . This observation shows that the superconductor has spin-singlet pairing. Upon hole doping the ν ¼ −2 insulator, quantum oscillations are seen with a frequency set by the density of doped holes in perpendicular B fields exceeding approximately 1T. This result tells us that the "normal" metallic state and the superconductor that emerges from it should be regarded as doped Mott insulators: The charge carriers that are available to form the normal state Fermi surface or the superconducting condensate are the doped holes. Thus, the hole-doped superconductor retains information about the Mott insulator. In contrast, electron doping this Mott insulator leads very quickly to quantum oscillations with a high frequency that is set by the deviation of the charge density from the charge neutrality point (ν ¼ 0). This result may indicate a first-order transition between a metal and Mott insulator on the electron-doped side. It will be important to search for signs of hysteresis in transport experiments as the gate voltage is tuned. As the superconductor is better developed and characterized on the hole-doped side, we restrict our attention to hole doping from now on.
A further important clue from the quantum oscillation data is that the Landau levels (per flux quantum) are twofold degenerate, whereas one would expect fourfold degeneracy coming from the spin and valley degeneracy. The doped holes have thus lost either their spin or valley quantum numbers (or some combination thereof). Losing spin makes it hard to reconcile with spin-singlet pairing that can be suppressed with a Zeeman field. Thus, we propose instead that the valley quantum number is lost. The simplest option [34] then is that the valley quantum number is frozen due to symmetry breaking, i.e., hIi ≠ 0. Here, we may define I using the electron operatorsĉðkÞ for the nearly flatband states: where a; b ¼ AE correspond to the valley index, α is the spin index, n labels the two bands for each valley, and τ denotes the standard Pauli matrices. A nonzero expectation value for I z breaks time-reversal symmetry, which leads to a sharp finite-temperature phase transition in 2d and would likely have been detected in the experiments. Given the absence of any evidence of a sharp finite-temperature transition, we propose that the ordering is in the pseudospin xy plane. These phenomenological considerations therefore lead us to an IVC-ordered state.
We note that, for IVC ordering to be useful to explain the quantum oscillations, it has to occur at a scale that is large compared to the scales set by the magnetic field. Specifically, the band splitting due to IVC ordering must be bigger than the Landau level spacing approximately 15-30 K at the biggest fields used (of the order of 5T), which means that the IVC order is much more robust than the superconductivity and occurs at a higher temperature scale. We further need the IVC order to be present already in the Mott insulator, so that upon doping it can impact the quantum oscillations.
Thus, our view is that the first thing that happens as the sample is cooled from a high temperature is IVC ordering. This order then sets the stage for other phenomena to occur at lower temperatures (the Mott insulation or the superconductivity).

V. SIMPLE THEORY OF THE IVC-ORDERED STATE
We now describe a mechanism that stabilizes IVC ordering and describe the properties of the resultant state.
Interestingly, to treat this stage of the problem, it is sufficient to work within a momentum-space formulation, which enables us to sidestep the difficulties elaborated in Sec. III C with a real-space tight-binding formulation.
Consider the nearly flat bands in the limit of strong Coulomb repulsion. Note that the dominant part of the interaction is fully SUð4Þ invariant. We expect that the Coulomb interaction prefers an SUð4Þ ferromagnetic state-similar to the SUð4Þ ferromagnetism favored by Coulomb interaction in the zeroth Landau level in monolayer graphene [35][36][37] or in the extensive literature on flat-band ferromagnetism [38]. Indeed, the difficulties with Wannier localization of the nearly flat bands also suggest that, when Coulomb interactions dominate, an SUð4Þ ferromagnetic ground state will be favored. The band dispersion, however, is not SUð4Þ symmetric, and hence there will be a selection of a particular direction of polarization in the SUð4Þ space. To address this, we consider the energies of different orientations of the SUð4Þ ferromagnet within a simple Hartree-Fock theory. Specifically, we compare a spinpolarized state, a pseudospin I z -polarized state, and the IVC state with I x polarization.
Assume a Hamiltonian with Similarly to before, a is the valley index, α is the spin index, and n labels the two bands for each valley. The dispersion ϵ an ðkÞ is independent of the spin, and, due to time reversal, ϵ an ðkÞ ¼ ϵ −an ð−kÞ. We assume a simple form of interaction: where N is the number of k points in the moiré Brillouin zone. Repeated indices are summed over here. This interaction actually has an SUð8Þ symmetry, but it is strongly broken down to SUð4Þ by the difference in dispersion between the two bands and eventually down to Uð2Þ × Uð2Þ by the asymmetry of the dispersion under k → −k. Each Uð2Þ factor corresponds simply to independent Uð1Þ charge and SUð2Þ spin-conservation symmetries of the two valleys. We also remark that Eq. (14) is overly simplified, for it does not incorporate form factors arising from the modulation of the Bloch wave functions over the BZ when projecting onto the nearly flat bands. With such form factors included, the interaction projected onto the nearly flat bands should be written as with the form factors given by the Bloch wave functions of the states in the nearly flat bands via where ju an ðkÞi is the Bloch wave function of a state in the nearly flat bands labeled by valley index a, band index n, and momentum k (it has no dependence on the spin indices). These form factors are potentially important for the present problem due to the nontrivial band topology present in the valley-resolved band structure. Our preliminary analysis suggests that the results of the Hartree-Fock calculation are modified in the ultra-flat-band limit, i.e., when the interaction term overwhelms the kinetic energy, whereas the key conclusions below are stable within a range of intermediate interaction strengths. In view of this analysis, in the following, we first pursue the simplified Hartree-Fock theory and leave the task of settling down the real ground state for future (numerical) studies; it is an interesting question answering whether the actual experimental systems demand a more sophisticated treatment. Details of the Hartree-Fock calculation are presented in Appendix E. In summary, we find that the IVC state has a lower energy than both spin-and I z -polarized states. The physical reason is that, for both the spin-and I z -polarized states, the order parameter is conserved, and, hence, there is a linear shift of the band when the order parameter is nonzero. In contrast, due to the k → −k dispersion anisotropy, the IVC order parameter does not commute with the Hamiltonian. IVC order thus does not simply shift the band but modifies it more significantly. Assuming a near full polarization in the Hartree-Fock Hamiltonian, the noncommutativity leads to an extra energy gain at second order in the IVC state compared to the spinpolarized or I z -polarized states.
Note that, in the presence of Uð2Þ × Uð2Þ symmetry, the spin-singlet IVC state is degenerate with states that have spin-triplet IVC ordering with an order parameter I x S. The selection between the singlet and triplet IVC order has to occur due to other terms in the Hamiltonian that have been ignored so far. We do not attempt to pin down the details of this selection in this paper and simply assume, as suggested by the phenomenology, that the spin-singlet IVC is preferred and discuss its consequences.
Next, we turn to a description of the properties of the IVC state. We assume that the order parameter is large and first study its effects on the band structure. In the absence of valley ordering, at the two Dirac points, there is a fourfold band degeneracy. As explained in Sec. III A, the valley ordering splits this fourfold degeneracy into two sets of twofold degenerate Dirac points. When the order parameter is large, the four nearly flat bands split into two sets of two bands [ Fig. 1(f)]. At quarter filling, we fill the bottommost band, which, however, results in not a Mott insulator but a Dirac semimetal. Thus, the IVC state by itself does not lead to a Mott insulator, and a further mechanism is needed. We discuss this in the next section. We note that the semimetals obtained from planar valley order versus I z order are rather different, the latter being similar to spin-ordered states. Furthermore, while additionally breaking C 3 symmetry alone can eventually gap the Dirac points of the IVC semimetal, the same is not true of spin-or I z -ordered semimetals, which need further symmetry breaking due to their Dirac points carrying the same chirality.
Going beyond the mean field, the universal properties of the IVC-ordered state are determined by its symmetry breaking. It has a Goldstone mode with linear dispersion at the longest scales. Furthermore, it has a finite-temperature Berezinskii-Kosterlitz-Thouless transition, which has weak signatures in standard experimental probes [39].

VI. INTERVALLEY COHERENT MOTT INSULATORS: GENERALITIES AND A CONCRETE EXAMPLE
We see that IVC ordering by itself gives us only a Dirac semimetal and not a Mott insulator. We now consider the physics below the IVC-ordering scale. First, we note that that, once U v ð1Þ symmetry is broken, there is no difficulty with writing down a real-space tight-binding model for the two lowest bands. This model lives on the honeycomb lattice and must be supplemented with interactions. Naively, we might imagine that the dominant interaction is an on-site Hubbard repulsion. However, we know that the orbital shapes are such that the actual charge density is concentrated at the original triangular sites, i.e., at the center of the hexagons of the honeycomb lattice. Now, if there is an electron at a honeycomb site r, its wave function is spread equally between the three hexagonal plaquettes that the site r is a part of. The integral of the modulus square of the wave function in any one such plaquette is 1=3. The total charge that is localized at the center R of each hexagon (i.e., the triangular moiré sites) is therefore Now, let us make the reasonable approximation that the primary Coulomb interaction is on site on the triangularlattice sites R. Then, in terms of the honeycomb model, the appropriate Coulomb interaction is a "cluster Hubbard" term: We also specialize to ν ¼ −2 when this honeycomb lattice is half filled. This interaction penalizes charge fluctuations on each honeycomb plaquette. Thus, a suitable model Hamiltonian at scales much smaller than the intervalley coherence scale takes the form For the usual Hubbard model with a strong on-site repulsion, the Mott insulating state has the usual twosublattice Néel order. However, when the cluster-charging energy is dominant, this is not obviously the case. We therefore allow ourselves to consider a few different possibilities for the Mott insulator. Naturally, in all these options, the charge gap of the insulator is much lower than the scale of IVC ordering. In the experiments, the charge gap is estimated to be about 5 K. The IVC ordering should then occur at a much higher scale, consistent with what we already concluded based on the phenomenology. In this section, to be concrete, we focus on a particular Mott insulator where the C 3 rotation symmetry is spontaneously broken while preserving other symmetries (Fig. 2).
In passing, we note that insulators driven by cluster charging have been considered in a number of different contexts before. For instance, Refs. [40,41] study models with extended Coulomb interactions as a route to access Wigner-Mott insulators. Cluster-charging insulators have also been long studied [42][43][44] as a platform for various fractionalized insulating phases of matter. In contrast to these earlier works, where a dominant cluster-charging interaction is simply postulated and the resulting physics explored, here we identify a natural mechanism for such an interaction.

A. C 3 -broken insulator
Breaking the C 3 symmetry allows gapping out the Dirac points and leads to an insulator. As the C 3 breaking order parameter increases, the two Dirac points move towards each other [ Fig. 1(e)] and eventually annihilate to produce a fully gapped insulator. This annihilation (and, correspondingly, the gap minimum just into the insulator) occurs at either the Γ or M point depending on the details of the dispersion. Note that, within this picture, the C 3 breaking also occurs at a scale bigger than the approximately 5 K charge gap of the Mott insulator. Clearly, the excitations above the charge gap are ordinary electrons, and their gap can be readily closed by a Zeeman field.
Upon doping this insulator, charge enters as ordinary holes and forms a small Fermi pocket. This pocket is centered at either Γ or M depending on the location of the minimum insulating gap. In either choice, due to the absence of C 3 symmetry, there is just a single such Fermi pocket which will accommodate the full density of doped holes. Because of the intervalley ordering, these holes are valley polarized in the I x direction. Naturally, this explains the quantum oscillation experiments-the frequency is set by the density of doped holes, and the Landau level degeneracy (per flux quantum) is only twofold (from the spin).
A natural pairing mechanism emerges from the coupling of the holes to Goldstone fluctuations of the intervalley order, as we now elaborate. In the presence of an intervalley condensate, an appropriate effective action is Here, ψ is a continuum electron field that represents the electrons in the low-energy nearly flat bands, θ is the phase of the intervalley condensate, and Φ 0 is its amplitude. Note that h k ¼ ϵ s ðkÞ þ ϵ a ðkÞτ z is a 2 × 2 matrix for each k point [45]. We allow for slow Goldstone fluctuations of the phase and obtain a convenient form of the electron-electron interaction induced by these fluctuations. To that end, we first define new fermion variables χ through This redefinition removes the θ dependence from S 1 , but S 0 now takes the form Here, J v is the contribution to the U v ð1Þ current from the fermions. It is conveniently written down in momentum space as Now we assume that Φ 0 is near maximum polarization and diagonalize the χ Hamiltonian obtained from S 0 ½χ þ S 1 ½χ. As discussed earlier, there are two sets of bands per spin (corresponding to I x ¼ AE1) that are well separated from each other. The low-energy electrons are those that have valley polarization I x ≈ 1. We wish to obtain the coupling of these electrons to the θ fluctuations. For the bands with I x ≈ 1, we write It follows thatχτ z χ ≈0 and similarlyχf½∂ϵ s ðkÞ=∂k i gτ z χ k ≈ 0. The only nonvanishing coupling, therefore, is to the contribution from ϵ a ðkÞ. We get Now we assume we have integrated out the fermions everywhere except in the close vicinity of the Fermi surface, which gives a long-wavelength, low-frequency effective action for the θ fluctuations of the form Here, K is the phase stiffness of the θ field, and v is the velocity of the linear dispersing θ fluctuations. We now integrate out θ to get an effective interaction between the c electrons: which is an attractive interaction. Anticipating that the important regime for pairing is jωj ≪ vq for an approximate treatment, we set ω ¼ 0 in the prefactor to get a simplified effective interaction: We emphasize again that-within our approximate treatment-the only contribution to J v i comes from the antisymmetric part of the "normal" state dispersion. This attractive interaction can now be treated within a BCS mean field and leads to a superconducting state.
Note that, in real space, since the large repulsion will be on the hexagon center and not on the honeycomb site, there is no particular reason to disfavor on-site s-wave pairing. Though we do not give a detailed description of the pairing symmetry, the route to superconductivity sketched above naturally leads to a spin-singlet superconductor. Furthermore, it forms out of a "normal" metal of ordinary holes through a BCS-like pairing mechanism. We expect then that Zeeman fields of the order of T c efficiently suppress the superconductivity except possibly at very low doping (where eventually phase fluctuations kill T c ). At low doping, and when one is near a high-symmetry point of the Brillouin zone (which is consistent with the fact that there is no additional degeneracy seen in quantum oscillations), the antisymmetric part of the dispersion is expected to be constrained by symmetry to be small. For example, near the Γ point, it vanishes as the cube of the crystal momentum, which would lead to a small valley current (the derivative of the antisymmetric dispersion with respect to momentum) and, hence, a weakening of the coupling to valley Goldstone modes, as the doping is reduced. However, if C 3 rotation symmetry is broken, the antisymmetric dispersion can include a term that is linear in momentum, leading to a nonvanishing valley current at small doping.
It is also important to ask if a conventional pairing mechanism due to coupling to phonons might be operative. We note that the bandwidths of the nearly flat bands (approximately 10 meV) are much smaller than the typical phonon energies in graphene. Thus, the magic-angle twisted bilayer graphene system is far from a regime in which phonon effects can be treated within the usual adiabatic approximation. Furthermore, the observation of superconductivity only in the vicinity of the correlated insulator appears unnatural within a phonon-based theory. Nevertheless, it is possible that phonon effects play a role in various aspects of our physics and contribute to the effective interactions between the electrons. We leave for the future a proper treatment of the electron-phonon coupling in these systems and focus instead here on electron-electron interaction effects, which clearly must play an important role given the proximity of the superconductor to a correlation-driven Mott insulator.

VII. OTHER POSSIBLE MOTT INSULATING STATES
The C 3 -broken insulator is a concrete example of how an intervalley condensate of the twisted bilayer system can eventually become a Mott insulator. However, given the current experimental information, it is not clear that this is uniquely dictated. Therefore, we sketch a few different Mott insulating states and present some of their phenomenological consequences.
(1) Translation-broken insulator.-Broken moiré translations-for instance, Kekule ordering on the effective honeycomb lattice-can also gap out the Dirac points. The properties of this state and its evolution into the doped superconductor are similar to the C 3 -broken insulator discussed above.
(2) Antiferromagnetic insulator.-This state is the familiar Mott insulator of the usual honeycomb Hubbard model. Upon doping, it is expected to evolve into a spin-singlet superconductor as seen in numerical studies of the t − J honeycomb model [46]. The pairing symmetry appears to be d þ id. It will be interesting to look for signatures of broken time-reversal symmetry if this scenario is realized. Furthermore, this state is known to have quantized spin and thermal Hall effects and associated gapless edge states [47,48]. Other properties related to this state are also discussed in the literature [49,50]. Upon doping, a natural possibility is that the charge goes in as bosonic holons (spinless charge-e quasiparticles) whose condensation leads to superconductivity. This process is the classic resonating valence bond (RVB) mechanism [1] for superconductivity in a doped Mott insulator. However, in this scenario, at low doping the superconducting T c will not have anything to do with the spin gap (measured by the Zeeman scale needed to suppress pairing). We do not attempt to decide between these different options in this paper. However, we outline experiments that can distinguish between them in Sec. X.

A. For TBG
As we argue, there is an obstruction for writing down any tight-binding model for the single-valley problem. A natural way out, therefore, is to instead consider the four-band problem consisting of both valleys. Our earlier argument requiring the orbitals to be centered on the sites of the honeycomb lattice still applies, but now with two orbitals associated with every site. In addition, to resolve the mirror-eigenvalue obstruction we describe, these two orbitals should transform oppositely under the mirror symmetry. These orbitals, however, cannot have definite valley charge, for otherwise the problem is reduced back to the earlier case with each valley considered separately. Instead, it is natural to demand each of the two orbitals to be a time-reversal singlet, which leads to a standard representation for the symmetry group of p6mm together with time reversal. However, due to the aforementioned anomaly, the representation of valley U v ð1Þ is necessarily complicated, and we address that later.
Forgetting about valley conservation for the time being, the construction of Wannier functions becomes a rather standard problem, and well-established protocols apply. In particular, we construct well-localized Wannier functions using the projection method [57], starting from a set of well-localized trial wave functions as the "seed" of the Wannier functions (Appendix C). Specifically, we start with the continuum theory of Ref. [12], as described around Eq. (5), with the parameters w 0 ¼ 110 meV and w 1 ¼ 120 meV. The success of the construction hinges crucially on having a nonsingular projection everywhere in the BZ, which can be monitored by ensuring that the overlap between the seed and the actual Bloch wave functions neither vanish nor diverge anywhere in the BZ [57]. Using this approach, we construct Wannier functions for a particular choice of parameters, detailed in Appendix C, for the four nearly flat bands near charge neutrality (spin ignored). Our trial wave functions attain a minimum and maximum overlap of 0.38 and 3.80, respectively, indicating a satisfactory construction. Indeed, the Wannier functions we obtain are quite well localized Having constructed the Wannier functions, one can readily extract an effective tight-binding modelĤ WF by first projecting the full Bloch Hamiltonian into the Wannier basis and then performing Fourier transform. Because of the exponential tail, however, the resulting tight-binding model has infinite-range hopping despite an exponentially suppressed amplitude. To capture the salient behavior of the model, it is typically sufficient to keep only the bonds with strength larger than some cutoff t c . In other words, t c serves as a control parameter, and one recovers the exact band structure in the limit of t c → 0, albeit at the cost of admitting infinite-range hoppings.
The obtained band structures for different value of t c are plotted in Figs. 4(a)-4(d). We find that a fairly long-range model (see Appendix D for parameters), keeping terms that connect sites up to two lattice constants apart, is needed to capture all the salient features of the energetics. It should be noted that the range of the approximate models generally depends on the localization of the Wannier function, and in this work we have not optimized the Wannier functions. It is therefore possible that, by further optimization, one may capture the energetics more faithfully using only shorterrange terms.

031089-11
Although spatial and time-reversal symmetries are respected in the tight-binding model, valley conservation is explicitly broken, which is because our Wannier functions cannot be chosen to represent valley conservation naturally, similar to the case of topological insulators [58], and, therefore, any truncation of the transformed Hamiltonian generically introduces explicit valleyconservation symmetry breaking. Furthermore, one can ask how the operator I z is represented. In particular, we want to construct the projection operators for the singleparticle problem, P AE ¼ 1 2 ð1 AE I z Þ, which project into the valleys. It would be most desirable if one can formulate I z , and hence P AE , directly in real space. Given that I z is also a free-electron Hermitian operator, this formulation amounts to finding a symmetric Wannier representation of I z . However, we find that there are again obstructions, which mirror exactly the obstructions we face when attempting to construct Wannier functions for the single-valley two-band model of the nearly flat bands, i.e., a mismatch in the mirror eigenvalues, as well as a nonzero net charge of the Dirac points. Such an inheritance of the obstructions is presumably a manifestation of the underlying anomaly of the single-valley description.
Towards recovering valley conservation.-It is desirable to restore valley conservation even approximately, for our truncated model, and we describe a method below. Recall that we denote the Hermitian valley-charge operator in the continuum effective theory byÎ z [Eq.  rotating into the Wannier function basis, we arrive at another Hermitian operator defined on the honeycomb sites, which we can simply interpret as yet another Hamiltonian-like object in our problem. Similar to the earlier discussion for the Hamiltonian, the effectiveÎ z operatorÎ WF z has infinite-range hopping with an exponentially suppressed amplitude, and it is natural to approximate it by truncation, keeping again only terms with a strength larger than some t 0 c . Such a truncation, however, introduces a deviation in the eigenvalues ofÎ z from the physical values of AE1 [ Fig. 4(e)]. To fix this deviation, therefore, we can further perform spectral flattening of the corresponding bands to AE1 in momentum space. This procedure is well defined so long as a band gap is sustained between the second and third bands ofÎ WF z , which is generically true as long as t 0 c is chosen to be reasonably small. We denote this flattened version byÎ WF z . Physically, this corresponds to an approximation of the actual representation of the valleycharge operator in our tight-binding model, and again our approximation can be made exact in the limit of t 0 c → 0. We can now restore valley conservation in our effective Hamiltonian. To this end, for n ∈ Z define the projection operatorP which projects into the sector withÎ WF z eigenvalue n (in the many-body Hilbert space) and satisfiesP nPm ¼P n δ n;m . We can now definễ for which valley conservation is restored. In essence, through this procedure we introduce a pair of Hermitian operators ðĤ WF ;Î WF z Þ, corresponding, respectively, to the Hamiltonian and the valley charge, that converge to the exact operators in the limit t c , t 0 c → 0. The valley projection procedure we describe applies equally well to an interacting HamiltonianĤ WF U obtained by projecting the microscopic interaction terms to our Wannier basis, which again would not be automatically valley conserving due to the truncation errors. For the free part of the Hamiltonian, however, the projection procedure can be greatly simplified, because the Bloch states ofÎ WF z , which equal those ofÎ WF z by definition, are known, and using which we can decompose H WF ðkÞ into the valleyconserving and valley-breaking parts. The projection then proceeds simply by retaining only the valley-conserving part. More concretely, write the Bloch "Hamiltonian" of the valley-charge operator asĨ z ðkÞ ¼ Ψ þ;k Ψ † þ;k − Ψ −;k Ψ † −;k , where Ψ AE;k are 4 × 2 matrices. Note that the columns of Ψ AE;k are simply the AE-valley-charge eigenstates. We can then perform the projection byH giving an easy way to perform the described projection. As is shown in Figs. 4(f)-4(i), the projected effective tight-binding model reexhibits all the symmetry features of the bands from the continuum theory [ Fig. 4(j)], for any choice of truncation parameter t c . In particular, Figs. 4(a) and 4(f) represent the simplest model which demonstrates the utility of our approach, with the valley projection alone converting an otherwise hopping-free Hamiltonian into one exhibiting the charge-neutrality Dirac points. We further remark that, although generically H WF does not respect U v ð1Þ, the effective Hamiltonian corresponding to Fig. 4(b) comes very close to being U v ð1Þ symmetric in terms of its energetics along the high-symmetry line. We provide an explicit tabulation of the bonds in this H WF and that of I WF z in Appendix D.

B. Nearly flat bands in trilayer graphene-boron nitride moiré superlattices
Recently, Mott insulating phases (but not superconductivity, at the time of writing) were observed [20] in a heterostructure of ABC trilayer graphene encapsulated in boron nitride (TLG/hBN), where a moiré superlattice is present even at zero twisting between the graphene layers. Four minibands are observed close to neutrality, whose bandwidth and separation can be tuned by a vertical electric field. Half filling one of the nearly flat bands results in a Mott insulator.
We remark that the symmetry setup for this trilayer heterostructure bears more resemblance to the Bernalstacked bilayer graphene than the TBG system we discuss above. In particular, the absence of Dirac points among the minibands suggests that no C 2 symmetry is present, and the system is potentially described by a wallpaper group for which the two sublattices of the honeycomb lattice are no longer symmetry related (say, the wallpaper group p3m1, No. 14 [24]). If that is indeed the case, one expects the valley-resolved band structure to admit a tight-binding model defined on the triangular lattice, although it remains to be checked whether or not the charge density profile exhibits any nontrivial features, akin to that found for the TBG system [ Fig. 3(c)]. It will be of great interest to derive a concrete real-space effective model for the TLG/hBN, but we leave this derivation as a future work.

IX. MODEL FOR CORRELATED STATES IN TRILAYER GRAPHENE HETEROSTRUCTURE
In this section, we briefly consider the case of triangular moiré superlattices in a trilayer graphene heterostructure. Correlated insulating states were observed very recently in this system [20]. Just like in the twisted bilayer, here, too, there are nearly flat bands that are separated from the rest of the spectrum. However, unlike the TBG, here there are no Dirac crossings in this nearly flat band, and the low-energy degrees of freedom are in the trivial representation of C 3 . In addition, it is known that a vertical electric field can induce a gap for ABC-stacked trilayer graphene, and, depending on the direction of the electric field, the band structure can have a zero Chern for one direction of the electric field or a nonzero Chern number for the other direction of the electric field [59]. It is thus reasonable that, in the case where the direction of the electric field is such that the nearly flat bands possess no net Chern number, the nearly flat band can be modeled in real space by a triangular-lattice model with two orbitals (corresponding to the two valleys) per site supplemented with interactions. However, some care is still necessary. Time reversal and C 2 both act by flipping the valley index. Thus, the band dispersion ϵ AE ðkÞ within a single valley is not symmetric under k → −k: is satisfied. A real-space tight-binding description on the triangular lattice therefore takes the form with t rr 0 real and positive. The phases ϕ rr 0 are, in general, nonzero. The phase ϕ even on nearest-neighbor bonds cannot be removed, as, in general, the symmetries permit a nonzero flux Φ for any single valley through an elementary triangle (and the opposite flux for the other valley). When the Coulomb interaction dominates, the SUð4Þ ferromagnet with a further selection of IVC order is once again a possibility. From a real-space point of view, the projection of the Coulomb interaction to the Wannier basis used to formulate the tight-binding model leads to an appropriate interaction Hamiltonian. If the Wannier orbitals are not tightly confined to each triangular site, then there will be a significant intersite ferromagnetic Hund's exchange which promotes SUð4Þ ferromagnetism with a further selection of IVC ordering.
It is interesting to consider the limit where the Wannier functions are sufficiently tightly localized that such a ferromagnetic intersite exchange is weak and can be ignored. In that limit, to obtain a minimal model for this system we restrict the hopping to just be nearest neighbor and include an on-site repulsion. The minimal model then takes the form with ϕ 12 þ ϕ 23 þ ϕ 31 ¼ AEΦ with a þ sign for up-facing triangles and a − sign for down-facing ones. Here, sites 1, 2, and 3 are assumed to be arranged counterclockwise on each triangle. N R is the total electron number at site R, and N 0 controls the filling factor. As in previous sections, this Hamiltonian has a Uð2Þ × Uð2Þ symmetry corresponding to independent Uð2Þ rotations of each valley in addition to the discrete symmetries described above. The model thus needs to be supplemented with further weaker interactions that break the continuous symmetry down to Uð2Þ, though we do not specify them here. Note that if the flux Φ ¼ 0, then the model actually has an even higher SUð4Þ symmetry. Also, this model has an extra C 2 rotational symmetry that flips the valleys, which should be viewed as an emergent symmetry of the model defined above, which should be broken by other terms. In particular, it should be differentiated from a microscopic C 2 symmetry; if such a microscopic symmetry was present, it would combine with time-reversal symmetry and protect an odd number of Dirac points in the single-valley trilayer graphene band structure, which suffers from a parity anomaly [60,61]. Our effective model does not suffer from this parity anomaly, because this microscopic C 2 symmetry is absent. The minimal model above allows for a discussion of the Mott insulator in the strongly correlated regime of large U at integer N 0 . In the experiments, Mott insulators at fillings N 0 ¼ 1, 2 have been reported. In the large-U limit, the effective model takes the form of a "spin-orbital" Hamiltonian on a triangular lattice that has four states per site: two spins and two valleys. A systematic t=U expansion is readily performed to yield this spin-orbital Hamiltonian. At Oðt 2 =UÞ, the "superexchange" is not sensitive to the flux Φ, and we end up with an SUð4Þ quantum antiferromagnet on the triangular lattice (This is not actually correct. As shown in Ref. [62], there is a second order process which depends on the phase of t. This leads to a term that breaks SUð4Þ symmetry. However we anticipate that the more interesting dependence on flux comes from the third order term identified below.). For N 0 ¼ 1, the SUð4Þ spins are in the fundamental representation, while for N 0 ¼ 2 they are in the six-dimensional representation.
Antiferromagnetic models of SUð4Þ spins have been studied on a variety of lattices with different motivations (for some representative recent papers, see Refs. [63][64][65]). It seems likely that they go into "paramagnetic" states that preserve SUð4Þ symmetry. However, a new feature in the present problem is the presence of the flux Φ in the underlying Hubbard model which breaks SUð4Þ to Uð2Þ × Uð2Þ. This feature modifies the spin-orbital model at Oðt 3 =U 2 Þ. In the experiments, the ratio of Coulomb interactions to the bandwidth of the nearly flat bands may be controlled by a perpendicular electric field, and it may thus be possible to tune the strength of these third-order terms relative to the second-order ones. In Appendix F, we derive the spin-orbital Hamiltonian to third order, showing how the flux Φ leads to new terms not present in an SUð4Þ-invariant model. We, however, leave for the future a detailed study of these interesting spin-orbital models.
At any rate, we emphasize that this trilayer system is thus qualitatively different from the twisted bilayer graphene where we argue that a real-space triangular-lattice description is not possible due to Dirac crossings within the nearly flat bands.

X. PROPOSED FUTURE EXPERIMENTS
As discussed in previous sections, the ideas presented in this paper suggest a number of experiments which will be extremely useful in revealing the physics. Here, we reiterate and elaborate on some of these suggestions.
A crucial clue from the existing experiments is that an inplane field suppresses the superconductivity-at optimal doping-when the Zeeman energy is of the order of the zero-field T c . This suppression indicates spin-singlet pairing and that T c at optimal doping is associated with the loss of pairing. It will be extremely useful to study this systematically as a function of doping. For the doped C 3 -broken insulator, the superconductivity may be driven by the pairing of a small Fermi surface of electrons. Then (except perhaps at very small doping), T c and the critical Zeeman scale continue to track each other as the doping is decreased. In contrast, if the pairing (in the form of singlet valence-bond formation) already happens in the Mott insulator-as in the usual RVB theory, or with the featureless Mott insulatorthen, with decreasing doping, T c and the critical Zeeman field should part ways significantly.
A second crucial clue from the experiments is the 2; 4; 6; 8; … degeneracy pattern of the Landau fan emanating from the Mott insulator. We propose that this pattern is due to the freezing of the valley degree of freedom, which can be distinguished from the alternate possibility that there is spin freezing by studying the quantum oscillations in a tilted field. Zeeman splitting, if it exists, should show up in a characteristic way as a function of the tilt angle.
Our proposal is the intervalley phase coherence at a scale higher than both the superconducting T c ≈ 1.5 K and the Mott insulating scale approximately 5 K. The valley symmetry is, as usual, related to translational symmetry of the microscopic graphene lattice. In the twisted bilayer, there is an approximate translation symmetry that holds at some short scale associated with translation by one unit cell of the microscopic graphene lattices. Under this approximate translation operation, electrons at the different valleys get different phases, which is a U v ð1Þ rotation. Therefore, intervalley ordering strongly breaks this approximate short translation symmetry. Within each moiré site, the density of states is uniform at the lattice scale when there is no intervalley ordering but oscillates once this order sets in. This difference may be detectable through scanning tunneling microscopy (though, if the bilayer graphene is fully encapsulated by boron nitride, it may be challenging to see the graphene layer).
Assuming there is intervalley ordering, if the undoped Mott insulator develops antiferromagnetic order, it appears likely that the doped superconductor is a spin-singlet d x 2 −y 2 þ id xy superconductor, which spontaneously breaks time-reversal symmetry. In contrast, for a doped C 3 -broken state, either s-wave or d þ id spin-singlet superconductivity seem possible. It is also useful to directly search for broken C 3 or moiré translational symmetry in the experiments. Finally, the very different behavior in quantum oscillations between electron and hole doping away from the Mott insulator suggests that there may be a first-order transition into the Mott state as it is approached from the charge neutrality point, which will lead to a hysteretic response as the gate voltage is tuned towards charge neutrality from the Mott insulator.

XI. CONCLUSION
In this paper, we addressed some of the theoretical challenges posed by the remarkable observations of Mott insulating states and proximate superconductivity in twisted bilayer graphene.
We proposed that both the Mott insulator and the superconductor develop out of a state with spontaneous intervalley coherence that breaks independent conservation of electrons at the two valleys. We described a mechanism for the selection of this order over other spin-or valley-polarized states owing to the peculiarities of the symmetry realization in the band structure. We showed that intervalley ordering by itself does not lead to a Mott insulator and described possible routes through which a Mott insulator can develop at a low temperature. A specific concrete example is a C 3 -broken insulator. We showed how doping such an insulator leads to an understanding of the quantum oscillation data and presented a possible pairing mechanism for the development of superconductivity. We described potentially useful experiments to distinguish the various possible routes to a Mott insulator from an intervalley coherent state.
Our work was rooted in a microscopic understanding of the twisted graphene bilayer. We showed that the momentumspace structure of the nearly flat bands places strong constraints on real space descriptions. In particular, contrary to natural expectations, we showed that a real-space lattice model is necessarily different from a correlated triangularlattice model with two orbitals (corresponding to the two valleys) per site. This difference is due to a symmetryenforced obstruction to constructing Wannier functions centered at the triangular sites that can capture the Dirac crossings of the nearly flat bands. We showed that a honeycomb lattice ORIGIN OF MOTT INSULATING BEHAVIOR AND … PHYS. REV. X 8, 031089 (2018) representation may be possible but requires a nonlocal implementation of valley U v ð1Þ symmetry. In our description of the intervalley ordered state and its subsequent lowtemperature evolution into the Mott or superconducting states, we sidestepped these difficulties by first treating the problem directly in momentum space and defining a real-space model only at scales below the intervalley ordering (when the obstruction to a honeycomb representation is gone). We also contrasted the bilayer system with trilayer graphene where Mott insulators have recently been observed. In the trilayer system, it is reasonable to construct a real-space triangularlattice two-orbital model, but the symmetries allow for complex hopping (with some restrictions). We argue that this system may offer a valuable platform to realize interesting quantum spin-orbital liquids.

ACKNOWLEDGMENTS
We thank Yuan Cao, Valla Fatemi, and Pablo Jarillo-Herrero for extensive discussions of their data and sharing their insights. We also thank Shiang Note added.-Recently, Ref. [66] appeared, which has significant differences from the present paper; Refs. [67][68][69], which discuss, in particular, the symmetries and constructions of Wannier functions, also appeared. Note that these discussions on Wannier functions disregard the presence of the (emergent) sixfold rotation symmetry, and, consequentially, the Dirac points observed at charge neutrality are not symmetry-protected features of these models. A more detailed discussion can be found in our subsequent work reported in Ref. [70].

APPENDIX A: LATTICE AND SYMMETRIES
In this Appendix, we document some details on the conventions and the symmetry transformations.
Consider a monolayer of graphene. We let the primitive lattice vectors A and reciprocal lattice vectors B be where a ¼ 2.46 Å is the lattice constant (some authors use a to denote the C-C bond length, which is a factor of ffiffi ffi 3 p smaller than the lattice constant we are using here). In this choice, we can choose the basis of the honeycomb lattice sites to be In momentum space, the K and K 0 points are given by AEðB 1 þ B 2 Þ=3 or, for the equivalent ones lying on the x axis, AEð2B 1 − B 2 Þ=3. Note that jKj ¼ 4π=ð3aÞ, as is well known. Furthermore, we take the Dirac speed v F to be 10 6 ms −1 . Besides, we choose the moiré lattice vectors to be In the main text, we list all the symmetries of the continuum theory (Table I). Here, we tabulate explicitly the symmetry transformations of the electron operators, which follow from that of the Dirac points in the monolayer problem:t where μ ¼ t, b. Note that M y is the only symmetry which flips the two layers, i.e., M y ½t ¼ b and vice versa. The symmetries listed in Eq. (A4) generate all the spatial symmetries of the continuum theory of the TBG [7,12,21] (in wallpaper group 17). In particular, we see that t ρ and M y preserve the valley index (K versus −K) whereas C 6 and T do not. However, their (pairwise) nontrivial products leave valley invariant, and it is helpful to also document their symmetry action explicitly (which are fixed by the above): Here, we make two remarks regarding the subtleties in the symmetry representation documented here: First, the momenta k appearing above are defined as the deviation from the original Dirac points in the monolayer problem. Generally, they correspond to different momenta in the moiré BZ. For instance, the Dirac point labeled by ðþ; tÞ, i.e., that of the K point in the top layer, is mapped to K M , whereas ð−; tÞ is mapped to K 0 M . Similarly, ðþ; bÞ and ð−; bÞ are, respectively, mapped to K 0 M and K M . If desired, one can also rewrite Eqs. (A4) and (A5) using a common set of momentum coordinates defined with respect to the origin of the moiré BZ.
Second, the representation of the translation symmetryt ρ has a subtlety in its definition, which is because the microscopic translation effectively becomes an internal symmetry for the slowly varying fields appearing in the continuum theory. As such, for a single layer one can deduce its representation only up to an undetermined phase, hence the appearance of ∝ in Eq. (A4). However, the relative momentum across the different slowly varying fields, say, the operators corresponding to the þ valley of the top and bottom layers, is a physical quantity. Consequentially, there is really only one common ambiguity across all the degrees of freedom appearing in the continuum theory.

APPENDIX B: VALLEY SYMMETRY AND WANNIER OBSTRUCTION
We argue here that the valley-symmetry-resolved band structure does not admit a Wannier representation. Note, since we ignore spin, that this is a two-band model, which will be crucial for what follows. If one were to include other bands, the arguments below would fail, although precisely what selection of bands would lead to localized Wannier functions (LWFs) remains to be determined. In some ways, it is not very surprising that a valley-resolved band structure does not admit a Wannier description, and a simple example is a single valley of monolayer graphene, which is just an isolated Dirac node. But in those cases the band structure does not terminate on raising the energy and, hence, does not form a band. In contrast, in our present problem for TBG, there is an isolated band, and so one may expect to capture the physics with LWFs. Nonetheless, we argue that there is an obstruction, as can be seen as follows.
We begin with three ingredients: (i) a two-band model; (ii) C 2 T symmetry; and a third ingredient which will be specified shortly. The two ingredients above enforce the following form on the momentum-space Hamiltonian: where there is no condition on the function ϵ i ðpÞ. Similar to the main text, we implement C 2 T by σ 1 K, where K denotes complex conjugation. Check that C 2 T leaves the Hamiltonian above invariant. Now, if we are interested in the band wave functions, they are independent of the first term in the Hamiltonian, and we could pass to the following one by imposing a constraint: The obvious constraint is to demand which is nothing but the chiral condition that specifies class AIII, and one can show that this condition can be implemented as an on-site symmetry for a two-band system. Now, we introduce the third ingredient: (iii) The two Dirac points at the middle of this band structure have the same chirality. This ingredient allows us to write down the following effective Hamiltonian close to neutrality: where we now have a four-component structure to include the two Dirac nodes. Note that there is no mass term that will gap out these nodes and also preserve the chiral condition (B3); hence, this Hamiltonian corresponds to the surface of a threedimensional topological phase in class AIII, with index ν ¼ 2, corresponding to the two Dirac nodes. Since this Hamiltonian is the surface state of a nontrivial 3D topological phase, it does not admit a Wannier representation. However, when combined with the opposite valley band structure, together the pair of band structures does admit LWFs but at the price of losing valley conservation symmetry. Finally, let us address a conundrum that the careful reader may be puzzled by. The valley-resolved bands are stated to be the anomalous surface states of a 3D topological phase; nevertheless, they appear as isolated bands, which seems to contradict the usual expectation that such anomalous bands cannot be separated in energy. The way this conundrum is resolved in the present case is through the two-band condition, which further allows us to map the problem to one with particle-hole symmetry (class AIII). The later problem can have anomalous surface states that are disconnected from the bulk bands, because they are forced to stick at a zero chemical potential, and hence cannot be deformed into the bulk. This mapping to class AIII holds only for the two-band model; hence, if we add bands or fold the Brillouin zone from translation symmetry breaking, the presented arguments no longer hold.

APPENDIX C: WANNIER FUNCTIONS
We construct Wannier functions using the projection method [57]. The method proceeds by first specifying a collection of well-localized, symmetric trial wave functions in real space, which serves as the seed for constructing a smooth gauge required in obtaining well-localized Wannier functions for the problem of interest.
Let us begin by considering the symmetry properties of a real-space wave function in our present problem. For simplicity, we let β be a collective index for the valleys and layers, i.e., β ¼ ðþ; tÞ; ð−; tÞ; ðþ; bÞ; ð−; bÞ. Define the real-space electron operatorŝ To construct a collection of well-localized, symmetric trial wave functions, one can follow the standard discussion concerning the symmetry representation associated with such a real-space basis, say, as reviewed in Supplemental Materials of Ref. [30]. We briefly sketch the main ideas below. Let W β h 0 ðrÞ be a two-component (column) vector localized to h 0 (the two components here originate from the sublattice degree of freedom in the microscopic problem).
and its associated momentum-space operator where a is a moiré lattice vector and we assume a periodic system with V moiré unit cells. For our purpose, we wantŴ † h 0 to serve as our seed for the construction of symmetric Wannier functions. To this end, suppose h 0 ¼ ða 1 − a 2 Þ=3, which corresponds to a honeycomb site in the moiré potential, i.e., an AB=BA region. We demandŴ † h 0 to be invariant under time reversal, the mirror M y , and the threefold rotation about h 0 , which we denote byC 3 . In addition, recall that the previously predicted charge density profile [8,9,11,14,23] suggests that the Wannier functions take the shape shown in Fig. 3(a). Therefore, it is natural to consider a trialŴ † h 0 taking the formŴ † whereŵ † 0 is localized to the unit-cell origin (an AA site). By definition,Ŵ † h 0 transforms trivially underC 3 , which one can verify leads to the correct C 3 representations for the nearly flat bands.
It remains to ensure thatŵ † 0 is symmetric under T and M y . In the spirit of Eq. (C2), we may writeŵ † 0 ¼ P βŵ β † 0 for β ¼ ðþ; tÞ; ð−; tÞ; ðþ; bÞ; ð−; bÞ. From the symmetry transformation in Eq. (A4), we set Similarly, to respect M y symmetry, we set where ζ M y ¼ AE1. As such, we reduce our degree of freedom on the specification of the trial wave function W † h 0 down to our choice ofŵ þt † 0 ¼ R d 2 rψ † β;r w þt 0 ðrÞ. Our only condition on w þt 0 ðrÞ is that it is a two-component wave function well localized to 0; however, we simply assume a Gaussian form: where ξ is the localization length and ϕ 0 is a constant twocomponent vector. Correspondingly, all the other twocomponent wave functions take a similar form, although they can be localized to a different point, say,C 3 0. Such a choice is particularly convenient, as its Fourier transform can be readily evaluated: which enables an efficient computation of the overlap between our trial and the Bloch wave functions. Thus far, we have focused on only one well-localized wave function in real space. In our problem, the two sites of the effective honeycomb lattice are related by C 6 ; i.e., we simply construct the wave function localized to Besides, to describe a four-band problem, we should have two orbitals on each of the honeycomb sites. These two orbitals are not symmetry related. However, to reproduce the M y representation in momentum space, we have to take the two orbitals to, respectively, correspond to ζ M y ¼ þ1 and ζ M y ¼ −1.
In our numerical construction of the Wannier functions, we take the localization length to be 0.15ja 1 j and the twocomponent vectors These choices are found simply by a search of the parameter space to optimize the minimum overlap between the trial and the Bloch wave functions. We check the overlap for approximately 1180 momenta along the highsymmetry line M M − K M − Γ M − M M , as well as for an additional 1000 randomly sampled points in the BZ. The minimum and maximum over the overlap are, respectively, found to be 0.38 and 3.80, indicating a satisfactory construction of the Wannier functions. In Fig. 3 in the main text, we present the results for the two symmetryrelated Wannier functions labeled by ζ M y ¼ þ1; the corresponding results for the pair with ζ M y ¼ −1 are shown in Fig. 5. Remarkably, the localization properties of the two sets are essentially identical.

APPENDIX D: TIGHT-BINDING MODEL
In this Appendix, we provide an explicit tabulation of the bonds in the H WF corresponding to Figs. 4(b) and 4(g) in the main text, as well as the associated I WF z , whose eigenvalues are plotted in Fig. 4(e).
In the following, we parametrize a "bond" by where a is a moiré lattice vector and To, Fr ¼ 1; …; 4 labels the four Wannier functions localized to each unit cell. Physically, orbital 1 corresponds to the one localized to h 0 with ζ M y ¼ þ1; orbital 2 is the one localized to h 1 symmetry related to orbital 1; orbital 3 is localized to h 0 with ζ M y ¼ −1; and orbital 4 is symmetry related to 3. The bonds in H WF with t c ¼ 63μeV are tabulated in Table II(a). Note that we tabulated only half of the bonds, in the sense that the Hermitian conjugates of the listed bonds are not included. In particular, for consistency we also halve the coefficient of the on-site chemical potential ∼tĉ † rĉr , which is Hermitian by itself. We in addition subtract the "trace part" of the chemical potential (i.e., we remove a constant energy offset) from the model. The effective valley-charge operator defined with t 0 c ¼ 0.15 is similarly tabulated in Table II(b).

APPENDIX E: HARTREE-FOCK THEORY FOR SELECTION OF IVC ORDERING
Here, we discuss a simple mean-field treatment to illustrate that an IVC state is favored by the system at ν ¼ −2, which is described by the following simplified Hamiltonian of Eq. (12): where the free Hamiltonian is with a the valley index, n the band index, and α the spin index. Notice that the dispersion ϵ an ðkÞ is independent of the spin and, due to time reversal, ϵ an ðkÞ ¼ ϵ −an ð−kÞ.
We assume a simple form of interaction: where n aα ðxÞ is the electron density with flavor a and spin α. Repeated indices are summed over here. This interaction has an SUð8Þ symmetry. As discussed in the main text, the more complete form of the interaction that takes into account the form factors arising from projecting the interactions onto the nearly flat bands should be where ju an ðkÞi is the Bloch wave function of a state in the nearly flat bands labeled by valley index a, band index n, and momentum k (it has no dependence on the spin indices). However, for simplicity, we first present the result from analyzing the simplified interaction (E3) and comment on the preliminary result from analyzing (E4) at the end of this Appendix. We factorize the interaction in a Hartree-Fock mean-field manner: ½hc † anα ðk 1 þ qÞc anα ðk 1 Þic † a 0 n 0 α 0 ðk 2 − qÞc a 0 n 0 α 0 ðk 2 Þ þ hc † a 0 n 0 α 0 ðk 2 − qÞc a 0 n 0 α 0 ðk 2 Þic † anα ðk 1 þ qÞc anα ðk 1 Þ − hc † anα ðk 1 þ qÞc a 0 n 0 α 0 ðk 2 Þic † a 0 n 0 α 0 ðk 2 − qÞc anα ðk 1 Þ − hc † a 0 n 0 α 0 ðk 2 − qÞc anα ðk 1 Þic † anα ðk 1 þ qÞc a 0 n 0 α 0 ðk 2 Þ − hc † anα ðk 1 þ qÞc anα ðk 1 Þihc † a 0 n 0 α 0 ðk 2 − qÞc a 0 n 0 α 0 ðk 2 Þi þ hc † anα ðk 1 þ qÞc a 0 n 0 α 0 ðk 2 Þihc † a 0 n 0 α 0 ðk 2 − qÞc anα ðk 1 Þi: ðE6Þ The first, second, and fifth terms are the Hartree contributions, while the other terms are the Fock contributions. The Hartree contribution is determined by the local total electron density alone and independent of ordering, so for our purposes they can be simply dropped. We thus focus on the Fock terms.
(1) Spin-polarized state.-We start with the SP state. In this case, the interaction Hamiltonian is replaced by its mean-field representative, which reads The total mean-field Hamiltonian is then f½ϵ an ðkÞ − 2gϕ 1n c † anþ ðkÞc anþ ðkÞ þ ½ϵ an ðkÞ þ 2gϕ 1n c † an− ðkÞc an− ðkÞg þ 4gNðϕ 2 11 þ ϕ 2 12 − n 2 1 − n 2 2 Þ: ðE14Þ Consider the limit where g is much larger than the bandwidth. In this limit, we expect the spin is fully polarized. Without loss of generality, assume ϕ 1n ≥ 0 for both n. In this case, all electrons are in the state with α ¼ þ and n ¼ 1. Self-consistency requires that which yields This is indeed a fully polarized state. In this fully polarized state, and further assuming that the lower flat band is strictly lower than the higher flat band, the total energy of this state is (2) Valley I z polarized.-Next, we turn to the state IZP. In this case, the interaction Hamiltonian is replaced by The total mean-field Hamiltonian is Suppose g is much larger than the bandwidth, the system wants to fully polarize along a ¼ þ, and both bands with n ¼ 1 and n ¼ 2 are occupied for a ¼ þ. Self-consistency requires that which yields This is indeed a fully valley-Z-polarized state.
In this fully polarized case, the total energy of this state is (3) IVC state.-Finally, we discuss the IVC state. In this case, the interaction Hamiltonian is replaced by The total mean-field Hamiltonian is Denoteε n ðkÞ ¼ f½ϵ þn ðkÞ þ ϵ −n ðkÞ=2g and δϵ n ðkÞ ¼ f½ϵ þn ðkÞ − ϵ −n ðkÞ=2g, and the spectrum of the above Hamiltonian is : ðE27Þ Again, consider the limit where g is much larger than the bandwidth; then the system tends to occupy the bands with energies E −1 . Self-consistency requires that In the limit where g is much larger than the bandwidth, ϕ 3n → 1=2, which means the system tends to fully polarize along the valley-XY direction. At the same time, n an → 1=2. The total energy of this state in this case is This energy is lower than E 2 in Eq. (E22): where time-reversal symmetry of the noninteracting band structure is used in the last step: P k δϵ 1 ðkÞ ¼ 0. Therefore, if the interaction strength is much larger than the bandwidth, the system tends to be fully polarized. Within this analysis, among different fully polarized states, the valley-XY-ordered state (IVC) has a lower energy than a spin-polarized state and a valley-Z-ordered state.
Finally, we comment on the effect of taking into account the form factors into the interaction term, as in Eq. (E4). It turns out that in this case which state has the lowest energy can potentially be changed from the IVC state if bands are too flat, and our preliminary calculations show that the spin-polarized state and the valley-Z-ordered state have a lower energy compared to the IVC state when the interaction strength is around 10 times the bandwidth of the nearly flat bands. A more reliable way to settle down the real ground state is to do a numerical calculation formulated in momentum space, using the Hamiltonian given by Eqs. (E1), (E2), and (E4).

APPENDIX F: SPIN-ORBITAL MODEL FOR MOTT INSULATORS IN TRILAYER GRAPHENE
In this Appendix, we derive an effective spin-orbital model applicable to the trilayer graphene described in Sec. IX, in the limit of U ≫ t. This effective model is applicable for both the cases of ν ¼ −1 and ν ¼ −2, and it is obtained by a systematic expansion in the large-U limit to the order of t 3 =U 2 .
Besides the translation symmetry of the triangular lattice, the system is assumed to have Uð2Þ × Uð2Þ symmetries, corresponding to charge and flavor Uð1Þ conservations, as well as spin conservation. In addition, there is a C 6 symmetry that maps one flavor into the other and a time-reversal symmetry that also maps one flavor into the other (while leaving the spin unchanged, so T 2 ¼ 1 for this time reversal).
We start from a Hubbard model with nearest-neighbor hopping and on-site Hubbard interactions. Consider three nearby sites (A, B, and C) forming an elementary triangle of this triangular lattice. The kinetic Hamiltonian is taken as where a ¼ AE labels the flavor, α labels the spin, and η a ¼ AE1 if a ¼ AE. The interaction Hamiltonian on each site is taken as where N ¼ P aα c † aα ðr i Þc aα ðr i Þ, and we are interested in both the cases with N 0 ¼ 2 and N 0 ¼ 1. All other terms of the many-body system can be generated by applying various symmetries.
We are looking for an effective spin-orbital model at large U to the order of t 3 =U 2 . We first present the result and discuss some simple physical consequences before presenting the details of the derivation. The final result is with and where h ð3;1Þ ¼ 8n 2 cos 3ϕðn 2 þ S A · S B þ S B · S C þ S C · S A Þðn 2 þ I z A I z B þ I z B I z C þ I z C I z A Þ þ 4n 2 ðn 2 þ S A · S B þ S B · S C þ S C · S A Þ · ½e iϕ ðI þ and h ð3; In the above, n ¼ 1 2 for N 0 ¼ 1, and n ¼ 1 for N 0 ¼ 2. We briefly comment on these results before turning to the detailed derivation. First, we notice the effective model indeed has the same set of symmetries as the original Hubbard model. Second, we note that H ð2Þ is actually SUð4Þ invariant, and H ð3Þ is also SUð4Þ invariant at ϕ ¼ 0, which is most easily seen by inspecting Eqs. (F19) and (F30). These SUð4Þ symmetric interactions can potentially make the system an SUð4Þ antiferromagnet. Third, there is a spin chirality term for the two valleys with opposite coefficients, which can potentially lead to interesting kinds of topological order, such as a double-semion state. Last, we note the system may develop an SUð4Þ antiferromagnetic order in the limit where U ≫ t at N 0 ¼ 2, although there is evidence that the system is disordered at N 0 ¼ 1. Suppose the SUð4Þ antiferromagnetic Heisenberg model indeed results in an SUð4Þ-broken state; we would like to understand how the SUð4Þ-breaking terms in the Hamiltonian affects the ground state.
To this end, we expand H ð3Þ for small ϕ and obtain h ð3;1Þ ¼ 8n 2 ðn 2 þ S A · S B þ S B · S C þ S C · S A Þðn 2 þ I A · I B þ I B · I C þ I C · I A Þ − 8½S A · ðS B × S C Þ · ½I A · ðI B × I C Þ þ 24ϕ½I z A I z B I z C þ n 2 ðI z A þ I z B þ I z C Þ · ½S A · ðS B × S C Þ − 8ϕn 2 ðn 2 þ S A · S B þ S B · S C þ S C · S A Þ½I y and h ð3;2Þ ¼ 8ðn 2 þ S A · S B Þðn 2 þ I A · I B Þ þðA → B → C → AÞ þ8ϕðn 2 þ S A · S B ÞðI y A I x B − I x A I y B Þ þðA → B → C → AÞ þOðϕ 2 Þ: ðF9Þ As we can see, the main effect of SUð4Þ-breaking interactions to the leading order of ϕ, besides giving rise to the spin chirality terms, is to introduce terms roughly in the following form to the Hamiltonian: Consider the I's as classical spins on the XY plane and parametrize I x A ¼ cos θ A and I A y ¼ sin θ A ; the above Hamiltonian becomes This term tends to introduce some canting of the I order to gain energy.
Below, we derive effective spin-orbital Hamiltonians of such systems in the large-U limit to the order of t 3 =U 2 , for N 0 ¼ 1 and N 0 ¼ 2 separately. We use the standard Van Vleck perturbation theory to derive the effective Hamiltonian, which involves two steps: writing down the matrix elements of the effective Hamiltonian within the low-energy manifold and expressing these matrix elements in terms of charge-neutral operators.
In this problem, the effective Hamiltonian can be written as with and where P is the projector into the ground-state manifold of V and 1. Mott insulator at N 0 = 1 We first discuss the effective Hamiltonian for N 0 ¼ 1. We first write the results in terms of some SUð4Þ generators and convert them into a form in terms of the S and I operators later. The final result in terms of SUð4Þ generators is where with i ¼ 1, 2, 3, 4, and j1i ¼ j þ ↑i, j2i ¼ j þ ↓i, j3i ¼ j − ↑i, and j4i ¼ j − ↓i. In terms of these states, the action of T ij is The details of the calculations are below.

a. Order t 2 =U
To get the effective Hamiltonian at the order t 2 =U, it is sufficient to consider a single bond of the triangular lattice, and all other terms can be generated by applying symmetries.
2. Mott insulator at N 0 = 2 Now we discuss the effective Hamiltonian for N 0 ¼ 2. In this case, there are six states on each site, which can be denoted as jiji ≡ c † i c † j j0i ¼ −jjii. Again, we first write the result in terms of some SUð4Þ generators and then convert it to a form in terms of the original S and I operators. The final result is where T ij ¼ c † i c j . And whereĨ z A gives the flavor of the particle that is acted by the T operators. For example, T lj A T jk B T kl C e 2iϕðĨ z A þĨ z B þĨ z C Þ ¼ T lj A T jk B T kl C e iϕðη j þη k þη l Þ ¼ e 2iϕðĨ z A þĨ z B þĨ z C Þ T lj A T jk B T kl C . The details of the calculations are below.

a. Order t 2 =U
To get the effective Hamiltonian at the order t 2 =U, it is sufficient to consider a single bond of the triangular lattice, and all other terms can be generated by applying symmetries. Now we first calculate the matrix elements of the effective Hamiltonian at the order of t 2 =U. Denote the two sites linked by this bond by A and B, and then jij; kli ¼ c † i ðAÞc † j ðAÞc † k ðBÞc † l ðBÞj0i. It is useful to distinguish three types of states: jij; kli, jij; iki, and jij; iji, where different letters denote different states. Clearly, there are no matrix elements between two states from two different types, and all we need is to calculate the matrix elements between states within the same type.
The independent matrix elements include hij; ijjH ð2Þ jij; iji ¼ 0; and h ð3; b. Effective Hamiltonian for N 0 = 2 For n ¼ 1 (N 0 ¼ 2), it is useful to first consider the general relation T ij T kl ¼ δ jk T il − c † i c † k c j c i . Restricting to twoparticle states, we can use this general relation to write down