Entanglement entropies of equilibrated pure states in quantum many-body systems and gravity

We develop a universal approximation for the Renyi entropies of a pure state at late times in a non-integrable many-body system, which macroscopically resembles an equilibrium density matrix. The resulting expressions are fully determined by properties of the associated equilibrium density matrix, and are hence independent of the details of the initial state, while also being manifestly consistent with unitary time-evolution. For equilibrated pure states in gravity systems, such as those involving black holes, this approximation gives a prescription for calculating entanglement entropies using Euclidean path integrals which is consistent with unitarity and hence can be used to address the information loss paradox of Hawking. Applied to recent models of evaporating black holes and eternal black holes coupled to baths, it provides a derivation of replica wormholes, and elucidates their mathematical and physical origins. In particular, it shows that replica wormholes can arise in a system with a fixed Hamiltonian, without the need for ensemble averages.


Introduction
Consider a quantum many-body system initially in a far-from-equilibrium pure state |Ψ 0 .
If the system is non-integrable, it should eventually approach a thermal equilibrium, in the following sense. For times t t s , where t s is a thermalization time scale, |Ψ(t) = U (t) |Ψ 0 can be associated with macroscopic thermodynamic quantities such as temperature, entropy, and free energy which obey the usual thermodynamic relations, and measurements of generic few-body observables in |Ψ(t) exhibit the same behavior as in the equilibrium density matrix ρ (eq) with those macroscopic parameters.
Even as the state equilibrates in the above sense, under unitary time-evolution it must go to a pure state, and therefore cannot become equal to the mixed state ρ (eq) . A natural question then is how we can tell an equilibrated pure state apart from an equilibrium density matrix. For instance, this question arises in the context of Hawking's information loss paradox [1,2], in trying to understand whether the evolution of a black hole formed from the gravitational collapse of a pure state is unitary. For this purpose, we can use the Renyi entropies S In a pure state, one must have for any subsystem A and its complementĀ S (A) n (t) = S (Ā) n (t), n = 1, 2, · · · .
Other than brute-force numerical simulations of individual cases, we currently do not have an efficient method for calculating S Here d A and dĀ are respectively the dimensions of the subsystems A andĀ, and the above expression is exact only in the limit where one of d A , dĀ is much larger than the other. When d A dĀ, these are equal to the entanglement entropies of a thermal state at infinite temperature. It can be checked that when the dimension of the Hilbert space is large, the standard deviation about the average (1.3) is small. Thus the right-hand side of (1.3) should provide a good approximation for the entanglement entropies of a typical pure state.
This observation has yielded many important results, such as the prediction of the Page curve for black evaporation [8]. Similarly, a Haar average over pure states has been used to predict when information in a black hole can be transferred to its Hawking radiation [9]. The random average idea, however, cannot be directly applied in cases where we expect equilibration to a finite temperature, or in field theories and other systems with infinite Hilbert space dimension, where a canonical and calculable average such as the Haar average does not seem to exist. Even for a finite-dimensional system that equilibrates to infinite temperature, in order to apply (1.3) when we have a fixed initial state and time-evolution operator, we need to make the highly non-trivial assumption that the system can evolve to a typical pure state. It would be useful to better understand the physical basis for this assumption, and have a systematic procedure with which we can improve upon (1.3) for such cases.
In this paper, we develop a general approximation method for calculating S (A) n for equilibrated pure states in systems with a fixed initial state and time-evolution operator, in the limit where the effective dimension of the Hilbert space (roughly, the dimension of the accessible part of the Hilbert space from the initial state) is large. The approximation scheme, which we will refer to as the equilibrium approximation, can be applied to finite temperatures, systems with infinite Hilbert space dimension, and field theories.
The method we propose builds on an observation in [10] that the Haar average in a finite-dimensional Hilbert space can be seen as a projection into a subspace P of the replica Hilbert space used for computing Renyi entropies. This means that for a system with a fixed Hamiltonian, the result (1.3) can be considered an approximation in which at leading order one ignores the contribution from the orthogonal subspace to P . The generalization to finite temperatures or systems with infinite-dimensional Hilbert spaces then boils down to identifying the appropriate subspace of the replica Hilbert space to project into to obtain the leading contribution. Equivalently, the method can be thought of as identifying the contributions from the most important subset of configurations in the Lorentzian path integrals for the Renyi entropies. Since we drop certain well-defined contributions to the time-evolved quantities in making this approximation, it can in principle be systematically improved by adding these contributions back. We also develop a self-consistent criterion to demonstrate the validity of the approximation.
Some important general features of the results from our approximation method are: 1. The expressions for S (A) n (t) in an equilibrated pure state are time-independent, and can be expressed solely in terms of partition functions and entropies of an equilibrium density operator ρ (eq) . They are thus independent of details of the initial state and capture the effects of equilibration.
2. The expressions are manifestly compatible with the constraint from unitarity in (1.2). n (t) for an equilibrated pure state can be expressed in terms of a sum of Euclidean path integrals when ρ (eq) has a Euclidean path integral representation. Each term in the sum has n replicas of the Euclidean path integrals of ρ (eq) connected in a certain way, determined by an element of the permutation group S n . The approximation thus provides a general physical mechanism for how Euclidean path integrals associated with the equilibrium density operator can arise as the dominant subset of contributions from intrinsically Lorentzian path integrals.

While S
Since the only input that goes into our approximation method is information about the equilibrium density matrix ρ (eq) , it can be used to obtain universal results for the entanglement entropies of a variety of quantum many-body systems when the initial state equilibrates to a given type of ensemble. We explicitly obtain the universal expressions for the microcanonical and canonical ensembles.
One important motivation for studying the entanglement entropies of equilibrated pure states is to understand the unitarity of black hole evolution. Consider a situation where the initial state |Ψ 0 describes a star, which under time-evolution collapses to form a black hole. For all practical purposes, a black hole looks like a thermal state: it emits thermal radiation at a certain temperature, and has an entropy which satisfies the standard thermodynamic relations. Furthermore, correlation functions of a finite number of few-body observables in the black hole geometry have the same behavior as in a thermal state. The black hole is thus in an equilibrated pure state if the time-evolution in gravity obeys the usual rule of unitarity in quantum mechanics. The formalism we developed can thus be applied to a black hole system to obtain its entanglement entropies in a way that respects unitarity. In particular, item 3 above implies that one can get expressions which are compatible with unitarity using Euclidean gravity path integrals.
Recently, there has been important progress in understanding the unitarity of black hole evolution through derivations of the Page curve [11][12][13] (see also ). 2 In particular, in order to obtain Renyi and von Neumann entropies compatible with unitarity, one needs to include certain "island" contributions [13] in the quantum extremal surface prescription [48]. In models for an evaporating black hole in [16] and for an eternal black hole coupled to a bath in [17], these island contributions were derived by including Euclidean gravity path integrals with replica wormholes in the calculation. By applying the equilibrium approximation to these models, we provide a derivation of the replica wormholes introduced in these references, explaining how such Euclidean configurations emerge from Lorentzian time-evolution at late times, and why they lead to answers which are consistent with unitarity constraints. Our discussion also clarifies an issue raised in [16], due to which the authors there suggested that an averaging procedure may be necessary to explain the results from including replica wormholes. We show that no average over theories is needed, and that the issue can be resolved within the framework of the equilibrium approximation. Applied to more general holographic systems, our results predict new bulk geometries that must be summed over in the calculation of Renyi entropies for states with black holes.
We also explore the underlying physical mechanism in the Heisenberg evolution of operators that underlies the emergence of the equilibrium behavior of entanglement entropies. In [43], we showed that for the second Renyi entropy, the random void distribution conjectured for quantum chaotic systems in [49] leads to the right-hand side of (1.3). Using the equilibrium approximation, we show that the behavior of the Renyi entropies can be seen as a special example of a general behavior of operator growth in chaotic systems, and derive a higher-moment generalization of the random void distribution.
The plan of the paper is as follows. In section 2, we develop the equilibrium approx-imation, discuss its justification and universal consequences, and apply it to a variety of equilibrated pure states. In section 3, we apply the approximation to gravity systems with holographic duals, explain how replica wormholes emerge from it, and make comments on the need for averaging based on this derivation. In section 4, we explain how equilibration to infinite temperature can be understood in terms of the random void distribution.
In section 5, we discuss the applicability of the equilibrium approximation to observables other than the Renyi entropies, and mention some open questions.
2 Universal behavior of entanglement entropies in equilibrated pure states In this section, we present an approximation for the entanglement entropies of equilibrated pure states, which can be expressed in a simple, universal form, and applies to a variety of systems. We first explain the physical reasoning behind this approximation and its mathematical structure, and then examine its consequences for a variety of equilibrated pure states.

Equilibrated pure states
Consider a quantum system in some far-from-equilibrium pure state |Ψ 0 at t = 0. At time t, it evolves to where U is the time-evolution operator for the system. For now, we will assume the system is compact, so that there exists a finite equilibration time scale t s such that for t t s , macroscopic properties of |Ψ can be well-approximated by some equilibrium density operator ρ (eq) . 3 We will refer to |Ψ as an equilibrated pure state. 4 5 We can write the equilibrium density matrix ρ eq in the form where I α is an un-normalized density operator, and α is a set of equilibrium parameters such as temperature, chemical potential, and so on, which can be determined from the expectation values of conserved quantities in |Ψ 0 . We require that I α commute with the evolution operator, i.e.
which can be viewed as a requirement for ρ (eq) to be an equilibrium state.
Here are some specific examples of I α : 3 The precise definition of time scale ts will not be important for our purpose. Since we are interested in quantum-informational quantities in chaotic systems, a likely candidate for ts is the scrambling time. In Sec. 2.8, we will discuss uncompact systems. 4 We caution that the fact that |Ψ resembles an equilibrium density operator ρ (eq) macroscopically does not mean that |Ψ Ψ| is close to ρ (eq) by measures like the trace distance. 5 A special situation is when |Ψ0 is an energy eigenstate of a chaotic system, which does not evolve with time, but exhibits thermal behavior as postulated by the eigenstate thermalization hypothesis (ETH) [50,51]. 1. The system has a finite-dimensional Hilbert space, and there is no constraint on accessible states from |Ψ 0 . In this case, the associated equilibrium state ρ eq does not need to be labelled with any parameters α, and I is the identity operator, where d is the dimension of the Hilbert space. Below, we will refer to this case as the infinite temperature case.
2. The system has a time-independent Hamiltonian H with energy eigenstates |n , and |Ψ 0 (and hence |Ψ ) involves only energy eigenstates localized in a narrow energy band I = (E − ∆E, E + ∆E). In this case, 5) where N I is the number of energy eigenstates in the energy band I.

The system has a time-independent
Hamiltonian H, and |Ψ 0 (and hence |Ψ ) involves energy eigenstates with a broader range of energies. In this case, where the inverse temperature β is determined by requiring I β has the same energy For (2.4) and (2.5), I α is a projector, I 2 α = I α , but this is not true for (2.6). In (2.5)-(2.6), one can view I α as an "effective identity operator" defining the accessible part of the Hilbert space, and the partition function Z(α) as the corresponding "effective dimension." It is clear that each choice of I α in (2.4)-(2.6) satisfies the requirement (2.3) of invariance under U .

Renyi entropies as transition amplitudes in a replicated Hilbert space
We are interested in quantum-informational properties of |Ψ at time scales t t s . The n-th Renyi entropy with respect to a subsystem A is given by Recall that in a quantum system with evolution operator U , the transition amplitude from an initial state |Ψ 0 to a final state |Ψ f has the path integral representation where φ(t) collectively denotes the dynamical variables of the system and S[φ(t)] is the corresponding action. 6 Equation (2.8) contains 2n U 's and thus can be written in terms of path integrals over 2n time integration contours, (2.10) Here φ i , φ i are respectively associated with the i-th contours going forward and backward in time, as in Fig. 1. {χ i , χ i } and {ψ i , ψ i } denote respectively the initial and final values of the dynamical fields. ψ iA denotes the value of ψ i restricted to subsystem A, and ψ (n+1)A = ψ 1A . The initial state ρ 0 determines the initial conditions for the integrations through its "wave function" ρ 0 [χ i , χ i ] while the contractions dictated by Tr A and TrĀ determine the final conditions for the integrals. From (2.10), and also intuitively from Fig. 1, we can view (2.8) as a transition amplitude in a new "replica" quantum system consisting of 2n copies of the original system, with an evolution operator given by (U ⊗ U † ) n . 7 This way of viewing Z (A) n will be particularly convenient for our discussion below. To write down the explicit form for Z (A) n in the replica system, let us first introduce some notations.
Suppose H is the Hilbert space of the original system. The Hilbert space of the replica system can be taken to be (H ⊗ H) n . If we use an orthonormal basis {|i } for the first copy of H in each H ⊗ H, then it is convenient to use a basis {|ī } for the second copy, defined as where T is an anti-unitary operator such that T U T −1 = U † . For example, T can be taken to be the time-reversal operator in systems with time-reversal symmetry, or CPT in more general systems. A basis for (H ⊗ H) n can then be written as {|i 1ī 1 i 2ī 2 · · · i nī n }.
For any operator O acting on H, we can define a set of n! states |O, σ ∈ (H ⊗ H) n , where σ is an element of the permutation group S n of n objects, and σ(i) denotes the image of i under σ. The states associated with the identity operator 1 are simply denoted as |σ We note that when H is infinite-dimensional, |σ are not normalizable and should be viewed as formal definitions rather than genuine states in the replica Hilbert space.
In the discussion below, we will often use the following properties of the inner products among these states: where k is the number of cycles in the permutation στ −1 and n s , s = 1, 2, · · · , k, are the lengths of the cycles. For H = H A ⊗ HĀ, the associated replica Hilbert space inherits a tensor product structure, and we can define the corresponding states for each tensor factor.
Using the above notation, we can now write (2.8) as where e is the identity element of S n and η is the element (n, n−1, · · · 1). |η A is in the space (H A ⊗ H A ) n and similarly |eĀ ∈ (HĀ ⊗ HĀ) n . The equivalence between (2.15) and (2.8) can be checked by inserting complete sets of states of H and (H ⊗ H) n respectively in (2.8) and (2.15), and using (2.12)-(2.13).

Proposal for a general equilibrium approximation
Consider the special set of configurations in the path integral (2.10) that satisfy For such configurations, the phase factor in the exponent of (2.10) vanishes identically. 8 Heuristically, one may imagine that for sufficiently late times, the contributions from configurations of φ i ,φ i which do not satisfy (2.16) will generically lead to large oscillations of the integrand in the path integral, so that the integral will evaluate to a small value.
It is thus natural to expect that configurations satisfying (2.16) dominate. An important feature of the configurations (2.16) is that they lead to contributions which are independent of t, which also makes it natural for them to describe the behavior of the system after reaching macroscopic equilibrium at t t s . However, naively setting (2.16) in the path integrals (which also sets χ i = χ σ(i) and ψ i = ψ σ(i) ) leads to divergences, for example even in a simple system of two harmonic oscillators. Physically, the divergences come from the fact that (2.16) includes unphysical configurations of arbitrarily large energies. Mathematically, such a procedure corresponds to replacing (U ⊗U † ) n in (2.15) with a projector onto the set of states |σ , σ ∈ S n associated with the identity operator 1. In a system of an infinite dimensional Hilbert space, |σ is not normalizable.
We will now present a systematic procedure which incorporates the idea of matching the configurations on forward evolutions with those on backward evolutions, but avoids such divergences by restricting to configurations which are accessible to the evolution. Our discussion does not depend on whether the system has a finite or infinite dimensional Hilbert space. Our proposal builds on an important observation in [10], that the Haar average in a finite-dimensional Hilbert space can be seen as a projection into the set of states |σ associated with the identity operator. This observation can be seen as the infinite-temperature case of our discussion.
Let us first introduce some further notation. Consider the set of states in (H ⊗ H) n associated with the effective identity operator I α of (2.2), which satisfy (using (2.14)) I α , τ |I α , σ = Z 2n 1 · · · Z 2n k , I α , σ|I α , σ = Z n 2 , Z n ≡ TrI n α , (2.18) where k is the number of cycles in the permutation στ −1 and n s , s = 1, 2, · · · , k are the lengths of the cycles. For notational simplicity, we have suppressed the α-dependence in Z n , and Z(α) in (2.2) is now referred to as Z 1 . It is convenient to define the metric for the normalized states corresponding to |I α , σ , Note that g τ σ is a symmetric matrix, and from (2.14) it is invariant under simultaneous multiplication of an element λ ∈ S n on σ and τ from the left or the right, as well as under simultaneously taking inverses of σ and τ . Moreover, we can show that g τ σ , the inverse of g τ σ , is also invariant under each of these operations. The projector onto the set of states spanned by {|I α , σ } then has the form One key physical input in our approximation is that due to the invariance (2.3) of I α under the action of U , the states |I α , σ are invariant under the action of the time-evolution operator (U ⊗ U † ) n in the replica Hilbert space, which in turn implies that Then, decomposing the identity on (H ⊗ H) n as where Q is the orthogonal projector of P α , we can rewrite the transition amplitude expression for the n-th Renyi entropy in (2.15) as n,Q .

(2.24)
Our proposal is that for t t s , for a chaotic system with a large effective dimension n,P and can be ignored, so that (2.25) We will refer to this approximation as the equilibrium approximation in the following discussion. In next subsection, we discuss a justification for the approximation. We have thus replaced the time-evolution operator (U ⊗ U † ) n of the replica Hilbert space with the projector onto the set of states spanned by {|I α , σ } in the expression for Z (A) n , which can be seen as a realization of the heuristic idea discussed at the beginning of this subsection. Equation (2.25) is time-independent, which is consistent with the proposal that it captures quantum-informational properties of pure state |Ψ after it has reached a macroscopic equilibrium. 9 We note that each of the steps in equations (2.21)-(2.24) applies to any choice of I α that is invariant under the action of U . The approximation that Z (A) n,Q is negligible should, however, only be valid for I α chosen such that it corresponds to the late-time equilibration of |Ψ 0 .
In particular, we can obtain a self-consistency condition from considering the implication of the approximation (2.25) for n = 1. In this case, equation (2.8) simply reduces to the trace of ρ 0 and we should find Z (A) n = 1. For n = 1, the only permutation is the identity e, and (2.25) then has the form A pure state never really stops evolving and one expects occasional deviations from the "equilibrium values" (2.25). However, for a macroscopic systems such instances should be very rare. See e.g. [55].
where we have used (2.18) and (2.14). For this result to reproduce the normalization of ρ 0 , we thus need Tr(ρ 0 I α ) = Z 2 Z 1 . (2.27) We will impose equation (2.27) as a self-consistency condition for |Ψ 0 to evolve to a state |Ψ which resembles 1 Z 1 I α macroscopically. 10 Let us now use the self-consistency condition (2.27) in our approximation for Z (A) n . First, using (2.14), we find that where we have used that since ρ 0 is a pure state, Equation (2.25) can then be written in a form independent of the initial state, From our usual intuition about statistical systems and the comments below (2.35) about the standard deviation of the Haar average in the infinite temperature case, we expect more generally that to suppress potential contributions from Z (A) n,Q in (2.24), we should always consider the regime that the "effective dimension" Z 1 = Tr I α is large. Since Z n contains only a single trace, we should then have where N is proportional to number of degrees of freedom 11 and is large, and g(β) > 0 is an O(1) monotonically decreasing function of β. From (2.19) we then find that We then obtain our final general expression for the approximation to Z (A) n for an equilibrated pure state:  11 The volume of the system is also included in N .
We will examine the mathematical structure of (2.34) further in subsection 2.5. In Sec. 2.6 we show that while (2.34) is expressed solely in terms of properties of equilibrium density operator I α , the unitarity constraint (1.2) is maintained. The resulting physical properties are discussed in Sec. 2.7-2.8.
The equilibrium approximation for the case where the initial state ρ 0 is a mixed state is discussed in Appendix A.

A justification of the equilibrium approximation
In the infinite-temperature case (2.4) with I = 1, the equilibrium approximation (2.25) yields results identical with those obtained from the Haar average over unitary matrices U acting on the full system, as it can be checked that [10] (U ⊗ U † ) n = P I=1 (2.35) where overline denotes the Haar average. With this interpretation, one can estimate the magnitude of Z n,P ) 2 , then for a randomly chosen time-evolution operator U , Z n,Q has high probability of being small compared with Z (A) n,P , and the approximation is justified for most systems, including those where U comes from a fixed Hamiltonian.
For a general I α , the projection to P α in (2.24) may not emerge from an over average time-evolution operators. As we will discuss later, our results for Z (A) n from the equilibrium approximation for the microcanonical and canonical ensembles agree with previous calculations based on averages over special sets of states [56,57], but in these cases there does not seem to be a clear way of interpreting the averages over states as averages over time-evolution operators from physically relevant Hamiltonians. Moreover, in the case of an infinite-dimensional Hilbert space, there does not exist a canonical averaging procedure over all physical time-evolution operators analogous to the Haar average.
Here we propose a self-consistent criterion for deciding whether (2.25) is a good approximation, which does not depend on whether an average exists. We consider the following quantity where subscript "eq app" denotes that we apply the equilibrium approximation (2.25) to the quantity inside the bracket. We will explain more explicitly in appendix B what is meant by applying the equilibrium approximation to Z n,P , then the approximation is self-consistent. The criterion can also be interpreted as the question of whether the equilibrium approximation is compatible with factorized form of Z n,P , it means that to a good approximation we have We can also extend the self-consistency criterion to higher powers: for the approximation for be valid, we need i.e. the equilibrium approximation is compatible with factorization for any power, In Appendix B, we calculate (2.37) explicitly, and show that ∆ is suppressed by at least a factor Z n,Q can be comparable to or larger than the next-to-leading term in Z (A) n,P . We further show that ∆ m is suppressed by at least a factor Z −1 1 compared with the leading contribution from (Z (A) n,P ) m in the limit of large Z 1 . Through analytic continuation, this leads to the equilibrium approximation for the Renyi entropies, Similarly, the approximation for the von Neumann entropy can be obtained by analytic continuation as . (2.42)

Diagrammatic structure and path integral representation
We now examine more closely the mathematical structure of (2.34). Using (2.12)-(2.13), the inner product in Z (A) n (τ ) can be written more explicitly as In the above expression, it should be understood that all indices i ka , i k b , i ka , i k b with k = 1, ..., n, each of which appears twice, are summed over. Equation (2.43) can be given a diagrammatic representation as in Fig. 2-4. The expression in the parentheses of the first line corresponds to the "future conditions" indicated in Fig. 2 (a), where the Kronecker deltas between indices in A are indicated with dashed lines, while those inĀ are indicated with solid lines. These future conditions are independent of τ , while the τ -dependent factors in the second line of (2.43) correspond to how the indices should be connected to each other in the interior of the diagram. We connect each i ka to i τ (k)a with a dashed line, and each i k b to i τ (k) b with a solid line, and read off a factor of i ka i k b |I α |i τ (k)a i τ (k) b from each such interior connection to obtain (2.43). An example of an interior connection is shown in Fig. 2(b), and some examples of diagrams associated with different τ are given in Fig. 3-4. n (τ ), coming from the factor in the first line of (2.43), for n = 6. In (b), we show an example of how to connect indices in the interior of the diagram for a permutation τ such that τ (1) = 3, and the factor of i 1a i 1 b |I α |i 3a i 3 b that comes from this interior connection.   In particular, each loop of solid lines in a diagram for a given permutation corresponds to a trace inĀ, and each loop of dashed lines to a trace in A. Qualitatively, we expect that a trace in A (Ā) should yield a factor which is of the order of the effective dimension of subsystem A (Ā). The number of A loops in the diagram associated with τ according to our prescription above is given by k(η −1 τ ), while the number ofĀ loops is k(τ ), where 1 > a 1 2 > ... > a 1 n 1 ), the cycle representation of the corresponding planar permutation is given by (a 1 1 , a 1 2 , ..., a 1 n 1 ) (a 2 1 , a 2 2 , ..., a 2 n 2 ) ... (a k 1 , a k 2 , ..., a k n k ). Fig. 4 contains some examples of nonplanar diagrams, corresponding to ⌧ that do not saturate (2.40).
Let us make some further observations on the structure of Z (A) n (⌧ ) which will be useful in the later discussion. (2.38) can be further simplified as Consider a partition of {1, 2, · · · n} and any four elements a < b < c < d. The partition is non-crossing if whenever a, c are in the same group and b, d are in the same group, the two groups coincide. k(σ) denotes the number of cycles for a permutation σ. Thus we expect that where d A , dĀ denote the effective dimension of subsystems A andĀ respectively. Note that one should view (2.44) as a heuristic equation, as in general (for example for both (2.5) and (2.6)) there is no precise definition of effective dimensions for A andĀ. Note that for any τ , We can understand this diagrammatically. Let us slightly redraw each of the diagrams in We then get diagrams similar to 't Hooft's double-line diagrams for large N matrix field theories [58]. With each such diagram, we can associate a polygon by replacing the double lines with single lines. If polygon can be drawn without crossing lines on a surface of minimum genus h, then the total number of loops in the double-line diagram is equal to the number of faces F of the polygon on this surface. The total number of loops in the original diagrams in Fig. 3-4 is one less than this, so where E is the number of edges of the polygon and V is the number of vertices, and we have used the theorem relating Euler's characteristic to F , E and V . But for all diagrams we consider, the number of vertices is 4n and the number of edges is 3n, and hence The largest value of k(η −1 τ ) + k(τ ) is thus n + 1, corresponding to planar diagrams. Two immediate examples of permutations that correspond to planar diagrams are τ = e and τ = η. More generally, there is a one-to-one correspondence between such planar permutations and "non-crossing partitions" of n elements. 12 Given a non-crossing partition where the elements of each subset in the partition are arranged in descending order (e.g. a 1 1 > a 1 2 > ... > a 1 n 1 ), we obtain a planar permutation with the cycle representation given by (a 1 1 , a 1 2 , ..., a 1 n 1 ) (a 2 1 , a 2 2 , ..., a 2 n 2 ) ... (a k 1 , a k 2 , ..., a k n k ). Equation (2.43) can be further written as In terms of path integrals, the approximation (2.49) corresponds to replacing the second line of (2.10) by a sum of Euclidean path integrals, each of which involves n copies of that for I α "coupled" together in a certain way specified by permutation τ . More explicitly, we now have is the Euclidean action for I α , i.e. Fig. 6 for an illustration. Since our approximation arises from projecting to a subspace of states in the replica theory (with Hilbert space (H ⊗ H) n ), it corresponds to isolating a subset of configurations from the Lorentzian path integrals (2.10), which can further be given a Euclidean formulation. We stress, however, that these Euclidean path integrals do not arise from analytically continuing the Lorentzian path integral. Also note that while the Lorentzian replica space has 2n copies of the original system, the Euclidean replica space in (2.50)-(2.51) has only n copies. Figure 6. Examples of the path integral representation of Z (A) n (τ ) in (2.51) for n = 3, in the case where I α = e −βH and the system is (1+1)-dimensional. In the i-th replica, the final state is labelled by ψ i (which we indicated by 1, 2, 3 in the figure) while the initial state is labelled by ψ i (which is indicated by 1 , 2 , 3 ). The shaded regions represent Euclidean path integrals from t = 0 to t = β between the initial and final states. For each τ , the final states ψ iA are identified with the initial states ψ τ −1 η(i)A , while the final states ψ iĀ are identified with the initial states ψ τ −1 (i)Ā , as indicated with the arrows in each case.
To conclude this subsection, we make some further observations on the structure of (2.49) which will be useful in the later discussion. Since i ka , i k b are dummy indices, that is, the inner product is invariant under independent left multiplications for A andĀ. The inner product is also invariant under a simultaneous right multiplication for A andĀ by the same permutation, that is, which simply corresponds to a reordering of the n factors the product (2.48).

Unitarity
We now examine the physical consequences of (2.34) further. It is first useful to note that two terms in the sum over τ in (2.34) have a simple physical interpretation where S (A,eq) n and (Ā,eq) n are respectively the n-th Renyi entropy with respect to A andĀ of the equilibrium density operator ρ eq . These two contributions are represented diagrammatically in Fig. 3(a) and Fig. 3(b). We can then write (2.34) as The first two terms in (2.57) are together manifestly symmetric under A ↔Ā. We now show thatZ (A) n is also invariant under A ↔Ā so that the full expression satisfies the constraint that must be obeyed in a pure state. For this purpose we writeZ where σ is a permutation satisfying σησ −1 = η −1 , which always exists as η and η −1 are in the same conjugacy class (and is in general non-unique). Then note that where in the first and third equalities we have used (2.53)-(2.54) repeatedly. Equation (2.59) can then be written as If the time-evolved state ρ = |Ψ Ψ| is pure, we should also have Tr ρ n = 1. Let us see how this is realized under our approximation (2.25). Following arguments exactly parallel those which led to (2.25) and further to (2.34), we obtain the following approximation where k = k(τ η −1 ), and n 1 , · · · n k , are the lengths of the cycles in τ η −1 . (2.62) can also be given a diagrammatic representation, as explained in figure 7. From (2.31), the dominant term is given by τ = η shown in Fig. 7(b), leading to Figure 7.

Universal behavior of Renyi entropies for equilibrated pure states
Before applying (2.34) to different classes of systems, here we make some general comments on its implications in various regimes: 1. Suppose A Ā (and hence d A dĀ). Then, from the argument around (2.44), the term with the maximal number ofĀ-loops, which corresponds to τ = e, should dominate in (2.34). In this case, from (2.55) we have (2.64) Similarly, whenĀ A (and hence dĀ d A ), the term with the maximal number of A-loops, which corresponds to τ = η, should dominate. In this case, from (2.56) (2.65) Thus, when one of the effective dimensions of A andĀ is much larger than the other, we have where S (A,eq) n and S (Ā,eq) n are respectively the equilibrium Renyi entropies with respect to A andĀ of ρ eq . Analytically continuing (2.66) to n = 1, one then finds that for the von Neumann entropy where S (A,eq) and S (Ā,eq) are respectively the "equilibrium entropy" for subsystems A andĀ for the system in the equilibrium state ρ eq .
2. In the regime where d A and dĀ are both large and comparable in size, from (2.44), the leading terms in (2.34) come from those τ 's which saturate (2.45), that is, from planar diagrams like in Fig. 3.
3. In the infinite temperature case, In general, I α does not factorize between A andĀ. Nevertheless, as we will see more explicitly below and in Sec. 3, for various situations of physical interest, one can have an approximate factorization Note that here we have allowed the parameters α to be different for A andĀ, which can happen if A andĀ interact only for a finite period of time. Below for notational simplicity, we will simply write I αĀ as I A , IĀ. Define for any integer m and (2.49) can be expressed as where k is the number of cycles of τ with n 1 , · · · n k the lengths of the corresponding cycles, and l is the number of cycles of τ η −1 with m 1 , · · · m l the lengths of the corresponding cycles.
We now consider more specifically the examples of I α discussed in (2.4)-(2.6).

Infinite temperature
Let us first consider the case (2.4). From (2.34) we have where the coefficients N (n, k) are the number of non-crossing partitions of n objects with k blocks, and are known as the Narayana numbers In the second line of (2.73), we have also made explicit that N (n, 1) = 1 (this comes from τ = η), and N (n, n) = 1 (from τ = e). The von Neumann entropy can be obtained by analytically continuing (2.73) to general real values of n and using (2.42). For this purpose, we note that (2.73) can be written as which can be continued to general n. The derivative with respect to n can be found by expanding the hypergeometric function 2 F 1 (a, b; c; z) as a power series of z. Since the power series is convergent for |z| ≤ 1, we should use the first expression in (2.75) for d A < dĀ, and the second one for d A > dĀ. We then find that This agrees with the result from Haar averages [3]. As discussed below (2.35), subleading corrections to (2.73) and (2.76) beyond the limit of large d will likely not be universal.

Microcanonical ensemble
Now we consider the case of the microcanonical ensemble (2.5). We expect the result derived below should also apply to a single energy eigenstate of a chaotic system (i.e. to systems satisfying the eigenstate thermalization hypothesis). In this case, I E is a projector, Z n = N I for all n. For |Ψ 0 lying in the subspace defined by the projector I E , equation (2.28) is exactly satisfied.
We can write the Hamiltonian of the system as where H AĀ denotes the interactions between A andĀ, and H A , HĀ only involve respectively degrees of freedom of subsystems A andĀ. Let us suppose H is local. Then with sufficiently large subsystems A,Ā, the contribution of H AĀ to the energy is much smaller than those of H A , HĀ in macroscopic states whose energies are proportional to the volume of the system. When A Ā orĀ A, we again have (2.64)-(2.67). Then using the standard argument of statistical mechanics, we can write Z (A,eq) n and Z (Ā,eq) n more explicitly as where the inverse temperature β is determined from the density of states as β = d log N I dE . Let us now consider the situation in which A andĀ are comparable in size. More explicitly, suppose the total system has volume V and V A /V = c < 1, with V → ∞. Using again the fact that the contribution of H AĀ to the energy is small, we can approximate the projector I E as where |n A , |m Ā are respectively eigenstates of H A and HĀ with energies E A n and EĀ m . We then have where E runs over the allowed values of energy in A that can be consistent with total energy E, d A E is the dimension of the subspace of A with energy E, and dĀ E−E is the dimension of the subspace ofĀ with energy E − E. Let us write 1−c . Equation (2.81) can then be written as (k 1 = k(τ η −1 ) and k 2 = k(τ )) The sum over A can now be performed by a saddle point approximation with the saddle point¯ A satisfying the equation Note that¯ A depends on k 1 , k 2 . We then find that To proceed further, let us take the system to be homogenous, so D A ( ) = DĀ( ) = f ( ). We will further take f ( ) to be described by a power law, i.e. f ( ) = C α for some exponent α. Conventional statistical systems have α < 1 and we will restrict to this case 14 , where we find that the dominant contribution to (2.85) comes from τ = e (or equivalently k 1 = 1, k 2 = n) when c < 1/2, and from τ = η (or k 1 = n, k 2 = 1) when c > 1/2. For c = 1/2, the τ = e and τ = η contributions are equal and both are dominant.
Note that equation (2.81) can also be obtained by averaging uniformly over all pure states in the subspace I, that is, by taking a fixed |ψ 0 ∈ I and averaging the value of Z (A) n over states U |ψ 0 , where U is a Haar-random unitary matrix acting within the subspace of energy E. Equivalently, it can be obtained from an average over the "ergodic bipartition" states described in [57].

Canonical ensemble
Let us now consider the situation where the effective identity operator I α is given by (2.6). Note that the result (2.34) with this value of I α can also be obtained by further manipulation of the average over random "canonical thermal pure quantum states" in [56].
Let us now consider the result in special regimes. When A Ā orĀ A, we again have (2.64)-(2.67). Let us now consider the case where A andĀ are comparable, that is, the total system has volume V and V A /V = c < 1, with V → ∞. With the Hamiltonian (2.77) and assuming local interactions, we can approximate, within the traces appearing in various quantities, that I β has a factorized form 15 , (2.86) Then Z (A) n is given by (2.70), and the quantities appearing in (2.70) have the form Comparing with (2.85), we see that while for A Ā , the Renyi entropies corresponding to the microcanonical and canonical ensembles have the same form, they differ in the regime where V A /V is finite. For a homogeneous system, from (2.32), the partition functions for A andĀ subsystems can be written as We then haveẐ To proceed further let us take g(β) to be a power law, that is, g(β) = λβ −α with α > 0. We then find that the τ = e term is dominant in this expression for c < 1/2, the τ = η term is dominant for c > 1/2, and for c = 1/2, both τ = e and τ = η give equal contributions, which are dominant. 14 α = 1 is the so-called Hagedorn spectrum, while α > 1 does not correspond to a stable equilibrium as the system has a negative specific heat. 15 One should not view the equation below as an operator equation, rather as a relation which holds within matrix elements among states who energies are proportional to the volume of the system.

Uncompact systems and subregion equilibration
In our discussion above, we have assumed that the whole system thermalizes after some finite time scale t s . For an uncompact system, such a time scale does not exist. Nevertheless, at a finite time t, subregions of certain finite sizes can thermalize, and we can apply the approximation of Sec. 2.3-2.5 to such subregions.
As an illustration, we consider an infinite (1+1)-dimensional system, which can be a spin chain or a quantum field theory. We assume for simplicity that the system is governed by a local Hamiltonian which results in a sharp light-cone, with speed c = 1. Suppose we are interested in the entanglement of a finite region A with its complementĀ at time t, as indicated in Fig. 8. Due to the causality constraint from the sharp light-cone, the region which is relevant for this purpose is J(A), the region at t = 0 which is causally connected with A. Evolution of the system in J(A) should not be relevant for finding S (A) n or S (Ā) n , and we can replace the time-evolution operator in this part of the system with the identity. See Appendix C for a more explicit argument.
Let us further suppose that the system is sufficiently strongly interacting and chaotic, such that at time t, the system is locally equilibrated in region J(A), i.e. the equilibration is maximally efficient as allowed by causality. In this case, we can apply (2.34), treating J(A) as the full system, and the region J(A) − A of length 2t as the complement of A. From the discussion of item 1 in Sec. 2.7 we immediately conclude that 16 where s eq n is the n-th equilibrium Renyi entropy density (n = 1 being the equilibrium entropy density). Here the entanglement velocity is given by v E = c = 1. Since it is expected on general grounds [59,60] that v E ≤ v B , where v B ≤ 1 is the velocity associated with the growth of operators, (2.90) implies that we must also have v B = 1, corresponding to the fact that operators grow at the fastest speed allowed by causality. Hence, we can see the maximally fast growth of operators as a necessary condition for the assumption of maximally efficient equilibration that we made above.

Gravity systems and replica wormholes
The equilibrium approximation discussed in the last section can be applied to gravity systems, with the assumption that they follow the usual rules of quantum mechanics. In this context, various quantities in (2.34) or (2.49) should be seen as amplitudes in an exact theory of quantum gravity. In particular, the Euclidean path integrals (2.50) emerge universally as an approximation to the Lorentzian path integral (2.10). Furthermore, different Euclidean replica gravity systems have to be "coupled" in specific ways. However, in our current understanding of quantum gravity, gravity path integrals can only be formulated at a semi-classical level, and hence a direct implementation of the prescription (2.51) may be subtle. For holographic systems, the amplitudes in equations (2.34) and (2.51) also have 16 When t ≈ |A|/2, the expression could be more complicated as other planar permutations besides τ = e and τ = η may become significant. a dual description in terms of the corresponding ones in the boundary system. Here, one has the benefit that the boundary version of path integrals (2.51) can be used to provide boundary conditions for formulating the corresponding bulk ones by using the standard rules of holography.
Intuitively, couplings among different replicas could lead to replica wormholes, that is, geometries connecting different replica manifolds. In this section, we will make this idea precise by applying the equilibrium approximation to two recently discussed models of black holes [16,17], and showing that the prescriptions proposed there for including certain replica wormholes in the calculation of entanglement entropies follow from (2.34) and (2.51). The earlier discussion of Sec. 2.6 then provides an explanation for why including replica wormholes leads to entanglement entropies that are consistent with unitarity. We will also comment on how the equilibrium approximation provides an alternative to the need for an averaged description discussed in [16], and briefly discuss the Renyi entropies for a big black hole in AdS formed from gravitational collapse of a pure state.

A model for black hole evaporation
Let us first briefly review the model of an evaporating black hole discussed in [16], where the black hole lives in a (1+1)-D spacetime with JT gravity and has an end-of-the-world (EOW) brane behind the horizon, see Fig. 9. The state |Ψ of the full system resulting  from the evaporation process is assumed to be such that if |i is an orthonormal basis of N states for the radiation subsystem R, then the matrix element (ρ R ) ij of the reduced density matrix for R can be calculated using Euclidean path integrals with the following rules, shown in Fig. 10. The boundary condition is given by a single open asymptotic boundary segment of JT gravity of length β for some inverse temperature β associated with the state. The endpoints of the segment are labelled by i and j, as shown in Fig. 10(a). In the corresponding bulk path integral in Fig. 10(b), the two endpoints are connected with an EOW brane. In addition to a gravity path integral indicated by the shaded region, this gives a factor of δ ij , indicated by the dotted line connecting i and j. We will now describe the evaluation of the quantities Z (A) n according to the equilibrium approximation in a more general class of quantum-mechanical systems related to the above model. We will then show that applying the standard rules of holography to the Euclidean path integrals in the resulting expression gives a derivation of the replica wormholes introduced with the ad hoc rules above. The final result matches precisely with that of [16].
Let us consider a situation where the initial state |Ψ 0 in (2.1) describes a star, which under time-evolution collapses to form a black hole and subsequently emits Hawking radiation. The full system at time t, described by the state |Ψ , consists of the black hole and the emitted radiation subsystem. The radiation subsystem has a Hilbert space of finite dimension N with no energy constraint, while the black hole can be associated with an inverse temperature β. We assume that the radiation separates from and no longer interacts with the black hole after being emitted. We can then write the effective identity operator I α corresponding to |Ψ in a factorized form where R and B denote respectively the radiation and black hole subsystems.
For an evaporating black hole, the Hamiltonian of the system cannot be strictly timeindependent. As a result, (3.1) may not strictly satisfy (2.21). However, (2.21) should still be valid to a very good approximation if the evaporation process happens slowly.
For comparison with the discussion of [16], we will assume that the black hole subsystem resembles that in Jackiw-Teitelboim (JT) gravity (or the SYK model at a sufficiently low temperature). That is, it has a large number of densely spaced states in the energy range accessible at inverse temperature β, such that where the "background" density of states e S 0 is large, and z m (β) are O(1) functions. We will be interested in the regime Applying (2.70) to (3.1) we find for the radiation subsystem Since the summand in (3.4) only depends on the cycle structure of τ , we can write the leading planar contribution as For k = 1, we only have one block consisting of all n elements, and N ({m i }) = 1. So far, our discussion is general and applies to any system with I α as in (3.1) and Z (B) m as in (3.2). The specific gravity description enters in the explicit evaluation of various partition functions Z (B) m (β) (or equivalently z m (β)), which can be expressed in terms of Euclidean gravity path integrals and evaluated using a saddle-point approximation. Let us now specify to the gravity system considered in [16]. This motivates us to write  [16]. 17 The contributions from terms corresponding to different τ in (3.4) saturating (2.45) can be captured precisely by the planar gravity diagrams of [16] like the ones shown in Fig. 11(b) and (c), which correspond respectively to τ = e and τ = η. We can show more explicitly that (3.6)-(3.7) agree precisely with the results obtained in the limit (3.3) in [16]. 18 In [16], the full expression for Z  [16] precisely has this structure. 18 Away from the limit (3.3), in the equilibrium approximation we get corrections from τ that do not through a generating functional called the resolvent, which can be used to obtain a recursion relation for Z (R) n . More explicitly, the trace of equation (2.27) of [16] can be written in our notation as (3.9) Equating coefficients of 1/λ n+1 on both sides, we find a recursion relation (3.10) One can check that (3.6)-(3.7) indeed satisfy (3.10). 19

Comments on averaging and replica wormholes
In the previous subsection, we demonstrated how the Euclidean gravity prescription for computing the Renyi entropies of an evaporating black hole can emerge as an approximation to the Lorentzian path integrals (2.10). One important implication of the discussion is that replica wormholes can arise in a system with a fixed Hamiltonian, and it not necessary to have an ensemble-averaged theory. We now further clarify an issue raised in [16], which was used there to interpret replica wormholes as arising from some averaging procedure.
Let us first recapitulate the issue. Consider a matrix element of the reduced density matrix for the radiation system (here |i denotes a basis for the R subsystem), Applying the same procedure as in (2.24) to (3.11), we find where ρ (eq) = 1 Z 1 I α and ∆ ij is the contribution from the Q projector. Using (3.1) and dropping ∆ ij , we obtain the equilibrium approximation for these matrix elements, which would imply (3.14) But (3.14) clearly contradicts (3.6). For example, for n = 2, (3.6) gives .
(3.15) saturate (2.45), and also from higher-order terms in the metric gστ , which can all be evaluated using Euclidean gravity path integrals. 19 We have checked up to n = 12.
Equations (3.14) and (3.15) are compatible only when N e S 0 , but the derivation of (3.13) uses only Z 1 ∼ N e S 0 1 and, in particular, does not need to assume any relative magnitude of e S 0 and N .
In [16], the same apparent disagreement was observed, and it was pointed out that (3.13) and (3.15) can be compatible if the Euclidean gravity prescription for computing them is interpreted as an average over an ensemble of theories, so that there is a difference between the averages (ρ R ) ij (ρ R ) * ij and |(ρ R ) ij | 2 . We now show that the conflict between (3.13) and (3.15) can be naturally resolved using the equilibrium approximation interpretation of the Euclidean gravity prescription, without the need for any averages.
For this purpose, let us estimate the term ∆ ij we dropped in reaching (3.13) using the analogous procedure to (2.37), which is computed in Appendix B. The results are Comparing (3.16) with (3.13), we see that ∆ ij is suppressed by a factor e − 1 2 S 0 compared with the leading order contribution (3.13). So the approximation (3.13) appears to be justified in the limit e S 0 → ∞. This still does not say anything about the relative magnitude between N and e S 0 , so the tension between (3.14) and (3.15) remains. But notice that (3.15). In (3.17), we have used (3.13) for the diagonal terms, and the second equation of (3.16) for the off-diagonal terms.
We thus see that although the off-diagonal elements |∆ ij | are higher order in e −S 0 compared with the diagonal elements, there are many more of them (O(N 2 )) than the number N of the diagonal terms. So Z (R) 2 can receive significant contributions from these off-diagonal terms outside the regime N e S 0 , and hence the corrections to the equilibrium approximation for the matrix elements are important while estimating Z (R) 2 . Since the first equation of (3.16) vanishes for off-diagonal elements, we conclude that the off-diagonal elements ∆ ij must be complex and likely time-dependent.
Our explanation here is consistent with an idea discussed in [16], that an average over time rather than an average over theories may also explain the disagreement between (3.13) and (3.15), as the equilibrium approximation should agree with an average over time at late times.
We would like to emphasize that the conceptual picture obtained here is different from certain possibilities proposed in [16]. It was suggested there that bulk geometry may only provide "an effective, coarse-grained description" of some "different, more fundamental degrees of freedom" that correspond to the boundary theory, and that such non-geometric degrees of freedom may need to be added to calculate quantities like (3.13) and (3.15) to an accuracy that allows us to avoid the apparent disagreement. Here, we emphasize that the conflict arises from dropping ∆ ij in (3.12), which is equivalent to approximating these quantities with Euclidean path integrals like in (2.50). It therefore arises even before we use semiclassical gravity to evaluate these Euclidean path integrals, and is thus not linked to using semiclassical gravity. In particular, this means that the approximation can in principle be improved within the framework of semiclassical gravity, if we are able to perform a Lorentzian rather than Euclidean calculation of these quantities.
For a different perspective on how replica wormholes can emerge without the need for ensemble averaging, see [30].

A model for an eternal black hole coupled to a bath
We now consider a quantum-mechanical system that corresponds to the gravitational system discussed in [17] (see also [14,15,19,62]), which consists of an eternal black hole in AdS 2 coupled to a flat (1+1)-dimensional bath system with speed of light c = 1, see Fig. 13(a). In the quantum-mechanical system, shown in Fig. 13(b), the eternal black hole is replaced by a boundary dual. Note that the bath system remains the same in two descriptions. The whole system is initially put in a pure state which does not have any entanglement between the black hole and bath subsystems. 20 We also assume the interactions between the black hole and the bath are local.  Figure 13. (a) shows the Penrose diagram for an eternal black hole in AdS 2 coupled to a bath, the system discussed in [17]. The shaded region coupled to JT gravity corresponds to the black hole, while the unshaded region with flat Minkowski space corresponds to the bath. (b) shows a quantum-mechanical system dual to this gravitational system. In this dual theory, the black hole is a (0+1)-dimensional system while the bath is (1+1)-dimensional. At time t, only the part of the time-evolution operator in the region J(B) is relevant for finding the entanglement entropies between the black hole and the bath.

AdS Flat Flat
The full system is uncompact with local interactions, as in the case discussed in Sec. 2.8. We can therefore apply the discussion of that subsection to the current context, replacing the subsystem A there by the subsystem describing the eternal black hole. In particular, for the purpose of studying entanglement between the black hole and the bath at some time t, it is enough to consider the finite region J(B) around the black hole determined by causality, as indicated in Fig. 13(b). When the bath is maximally efficient in thermalization as in our discussion of Sec. 2.8, we can immediately write down the Renyi entropies for either subsystem as a function of time, where S (BH) n are the thermal entropies for the black hole, and s eq n is the equilibrium entropy density for the bath. The entropies increase linearly and then saturate at t n = S (BH) n /s eq n . The expression is valid for t not close to t n , and agrees with the results for the entanglement entropy in [17].
Using the duality between the black hole and its boundary description, we can understand the two contributions in (3.18) more explicitly from the gravity perspective. For this purpose it is convenient to start with the path integral representation (2.51) for the boundary description of Fig. 13(b), with A = bath andĀ = BH. Then the linearly increasing contribution to (3.18) comes from the configuration in Fig. 6(a) with τ = e, while the saturation value for t > t n comes from the configuration in Fig. 6(b) with τ = η. For τ = e, one traces over the black hole subsystem within each replica copy. On the gravity side this corresponds to the standard evaluation of the black hole partition function using the Euclidean black hole geometry. For τ = η, the black hole subsystems for different replica copies are now connected, and on the gravity side this requires the introduction of replica wormholes. The contributions from other values of τ , such as the example in Fig. 6(c), involve other types of replica wormholes, and are only relevant in a relatively short time interval around the transition time t n between linear growth and saturation.
Our assumption that the bath is maximally efficient in thermalization is not important for obtaining the qualitative features of the results here. The specific physical nature of the bath system may affect the specific form of S (bath) n (t), and the time scale for saturation. However, it will not change the fact that entanglement entropies will saturate at 2S (BH) n due to the contribution from Fig. 6(b). In particular, the bath can in principle be a free theory instead of a chaotic system that leads to rapid thermalization, like one of the toy models considered in [43].

Unitarity of Renyi entropies in more general holographic systems
In the previous subsections, we considered situations where only one of the subsystems has a gravity description, and the path integral involving that subsystem could be turned into a gravity calculation involving replica wormholes. However, the unintuitive couplings between replicas should appear in more general contexts. Consider an initial pure state which subsequently undergoes gravitational collapse to form a big black hole in AdS d+1 , which is stable and does not evaporate. In the boundary language, the system settles into an equilibrated pure state corresponding to the black hole. The von Neumann and Renyi entropies of this final state should satisfy the unitarity constraint (1.2).
In holography, the von Neumann entropy S for a subregion A is found from the area of the HRT surface [63,64]. For a black hole formed from collapse, the HRT surface forĀ is the same as that for A, and thus the unitarity constraint S is automatically satisfied [65].
For the Renyi entropies, it was less well-understood how unitarity is maintained. From our discussion of Sec. 2.7, the Renyi entropies of a subsystem in the boundary field theory can be obtained by the path integrals in Fig. 6, and correspondingly the in the bulk calculation, the Renyi entropies can be obtained from certain Euclidean black hole geometries, despite the fact that a black hole from collapse does not have a Euclidean analytic continuation. However, the boundary conditions for these gravity path integrals include the unconventional ones specified by permutations τ , as indicated for example in Fig. 6 (b)-(c). From Sec. 2.7, at leading order in the large N expansion, one should consider two types of bulk geometries, one for the boundary conditions with τ = e and one for those with τ = η in Fig. 6 (a) and (b). Bulk manifolds with more exotic boundary conditions such as those in Fig. 6 (c) provide subdominant corrections which are exponentially suppressed in the large N limit.

Typicality and the random void distribution
In the discussion of the previous sections, we assumed that the time-evolution operator U can take a system from a far-from-equilibrium state to an equilibrated pure state, and furthermore that U is such that the contribution from Z (A) n,Q in (2.24) can be neglected. For a finite-dimensional system at infinite temperature, the approximation yields the same results as those obtained from the Haar-random averages of the quantities Z n,Q may be viewed as a dynamical criterion for the evolution of a system towards typical states in such systems. However, since we expect on general grounds that evolution to typicality should take place in chaotic systems, it would be good to understand more directly which aspects of chaos are responsible for it. In this section, we will offer some suggestions from the perspective of operator growth.
We conjectured in [49] that one general feature of operator evolution in chaotic systems at late times is the form of the probability of "void formation" in them, which we referred to as the random void distribution. In [43], based on studies of the second Renyi entropy, we argued that typicality is a direct consequence of the random void distribution.
In this section, we generalize the notion of the random void distribution to higher moments, and show that the behavior of all higher Renyi entropies of an equilibrated pure state can be seen as special cases of the generalized random void distribution. Below, we first review the notion of void formation in operator growth, and discuss its relevance for typicality. For simplicity, we will only consider a finite-dimensional Hilbert space with no energy constraint (that is, the infinite temperature case).

Random void distribution and typicality
Consider the Heisenberg evolution O(t) of an initial operator O. Since the identity operator does not evolve, in the discussion below we always assume that O does not have an identity part, i.e. Tr O = 0. With respect to a subsystem S, we can decompose O(t) as a sum of two parts, where 1 S is the identity operator for subsystem S andÕS is some operator inS. We refer to the presence of O (1) (t) in O(t) as void formation in the subsystem S. Using the following inner product between any two operators A and B, where d is the dimension of the full Hilbert space, we can define the weight (or "probability") that an operator O forms a void in the subsystem S at time t as Moreover, based on studies in random unitary circuits, it was conjectured in [49] that in a chaotic system, for a generic traceless initial operator O, at sufficiently late times the probability P which was referred to as the random void distribution. We note that (4.4) should apply essentially to all traceless operators, for example, in a spin chain, to local operators, basis operators which cover a finite region, and superpositions of non-trivial basis operators. We now show that if we assume (4.4) applies to the non-identity part of a density matrix for a pure state we obtain (2.73) for n = 2. The discussion below can be seen as a model-independent version of the argument in [43] for the derivation of the Page curve of black hole evaporation using the random void distribution. For this purpose, we decompose the initial density matrix as Given the initial state is pure, where the final statement is true in the large d limit. To find the reduced density matrix for a subsystem A, we further decomposeρ(t) ≡ Uρ 0 U † aŝ for some O A in subsystem A. Tracelessness ofρ implies that Tr A O A = 0. It then follows that the reduced density matrix for A has the form and the second Rényi entropy for ρ A (t) can be written as Now assuming the random void distribution (4.4) for operator O =ρ 0 and the subsystem S =Ā, we have where we have used (4.6). We thus find Equation (1.3) with n = 2 then follows immediately. The first term in (4.11) is the contribution from the identity, and the second term comes from processes of void formation inĀ under the action of U .

Higher moments of the random void distribution
Just like the behavior of the second Renyi entropy may be considered a direct consequence of the random void distribution, our discussion of the Renyi entropies in Sec. 2.7 can be considered a special case of the random void distribution for higher moments of general operators. More explicitly, consider the generalization of (4.3) to higher n: The n-th Renyi entropy for A corresponds to taking O = ρ and S =Ā. The general approximation scheme we developed in Sec. 2.7 can be used to find (4.12), assuming there exists a time scale such that the quantity P where we have separated the contribution from τ = e and τ = η explicitly. Recall that τ |O, e = Tr O n 1 · · · Tr O n k (4.14) where k = k(τ ) and n i are the lengths of the cycles of τ . If O is traceless, Heuristically, (4.4) and (4.16) suggest that the operator has become uniformly spread throughout the system, so that the probability that it is localized within any small subsystem is exponentially suppressed in the number of degrees of freedom in that subsystem.
Setting O in (4.13) to be ρ 0 and S =Ā, we then recover (2.73). Setting in (4.15) O to beρ 0 (i.e. the traceless part of ρ) and S =Ā, we obtain a generalization of (4.10) to higher n (in the regime of (2.72)) whereÑ (n, p) is the number of non-crossing partitions of n objects into p blocks such that each block has more than one element.Ñ (n, p) are called the Riordan numbers 21 . The consistency of (4.17) with (2.73) leads to a nice relation between the Riordan and Narayan numbers It is tempting to conjecture that (4.15)-(4.16) apply to a general chaotic system. In Appendix D we show that they hold in the local random unitary circuits of [52,53].

Conclusions and discussion
In this paper, we developed an approximation to calculate the entanglement entropies of an equilibrated pure state. The resulting expressions can be written solely in terms of the partition functions and thermodynamic entropies of the equilibrium density operator ρ (eq) , but at the same time are compatible with unitarity. One immediate implication is that a set of Euclidean path integrals for the equilibrium density operator emerge universally as an approximation to the Lorentzian path integrals for Renyi entropies, with a variety of boundary conditions specified by different permutations of the replica systems. We introduced a criterion for checking that the approximation is self-consistent, which at the same time provides an estimate of the contribution Z (A) n,Q we neglected. We also extracted the universal behavior of the entanglement entropies for various classes of equilibrated pure states.
Applied to two recently discussed models of black holes [16,17], the equilibrium approximation leads to a derivation of the prescriptions proposed in these papers for including replica wormholes in the calculation of entanglement entropies, and provides a general explanation for why such a prescription leads to results compatible with unitarity. Replica wormholes are thus one manifestation of a universal structure which appears in a large variety of thermalizing systems, including quantum-mechanical systems and quantum field theories without holographic duals. Our derivation can be used to see when and how replica wormholes should be included in more general gravity theories, and in particular it shows that they can arise in systems with a fixed Hamiltonian, without any need for an ensemble average.
We further discussed a mechanism for equilibration in the infinite-temperature case from the perspective of operator growth. The underlying property of operator growth, called the random void distribution, can be seen as a more direct manifestation of quantum chaos than the assumptions that went into the equilibrium approximation.
One important open question for the future is to understand better the role of the contribution Z (A) n,Q which we neglected, and to develop a further approximation scheme to capture its effects (see [10] for a discussion in the infinite-temperature case). A related question is about the time scales for which our equilibrium approximation should be valid. We expect it to be valid for time scales much longer than the thermalization scale t s , but perhaps not at very long time scales at which large fluctuations could become relevant.
It is also interesting to consider for which other observables the approximation is expected to work well, and for which observables it is expected to fail. Let us discuss two examples. For correlation functions of the form if the smallest time t m is much greater than t s , then we can apply the equilibrium approximation, which gives the equilibrium correlation functions associated with ρ (eq) (see Appendix B for details): Using an analogous criterion to the one discussed in Sec. 2.4, we show in Appendix B that (5.2) is valid provided that the minimal subsystem is contained is much smaller than its complementĀ. This is also consistent with what we expect based on the behavior of Renyi entropies in this regime (recall (2.66)-(2.67)). As a final example, let us consider the quantity where in the second equality we have expressed the quantity in the replica system, withTr denoting the trace in (H ⊗ H) n . This is the nth power of the spectral form factor studied in [67]. This quantity does not correspond to the equilibration of a far-from-equilibrium initial state or have an equilibrium value. We therefore intuitively expect that the equilibrium approximation should not make sense for (5.3). Indeed, for any choice of I α , on applying the equilibrium approximation to the above expression, we get Z ≈ n!, which does not satisfy the self-consistency criterion of Sec. 2.4. 22 Based on studies in random matrix theory and the SYK model, the spectral form factor is expected to have a linear "ramp" and eventually a constant "plateau" of order e S (where S is the thermodynamic entropy) at late times in chaotic systems with conserved energy, both of which are not captured by the equilibrium approximation. In the JT gravity calculations of [67,68], the ramp contribution to the spectral form factor is correctly captured by "trumpet" geometries, which involve bulk connections between disconnected closed boundaries. While these structures geometrically resemble the replica wormholes of [16], our observations above imply that these two kinds of connected geometries appearing in two different quantities have distinct physical origins. The replica wormholes of [16], appearing in the calculation of the Renyi entropies, can be fully explained within the framework of the equilibrium approximation. On the other hand, the ramp contributions to the spectral form factor, and hence the trumpet geometries associated with them, appear to be beyond the scope of the equilibrium approximation. The trumpet geometries lead to a disagreement between the direct evaluation of Tr[U ] Tr[U † ] and the product of Tr [U ] and Tr[U † ] in the gravity calculation, which is similar to the conflict between (3.13) and (3.15) discussed in section 3.2. However, while the equilibrium approximation can resolve the issue discussed in section 3.2, it does not seem to explain the factorization problem of the spectral form factor.
In a different future direction, it would also be interesting to understand how the mechanism for equilibration based on operator growth in the infinite-temperature case in section 4 generalizes to other choices of I α .

Acknowledgements
We would like to thank Ping Gao and Sam Leutheusser for helpful discussions. This work is supported by the Office of High Energy Physics of U.S. Department of Energy under grant Contract Number DE-SC0012567.

A Equilibrium approximation for a general initial density matrix
Here we discuss the generalization of the equilibrium approximation to a general initial density operator rather than a pure state which has been the focus of the main text.
Consider the quantities z n = Tr ρ n 0 = Tr ρ n = Tr(U ρ 0 U † ) n = η|(U ⊗ U † ) n |ρ 0 , e . (A.1) Applying the equilibrium approximation to the above expression, we have In the case where ρ 0 is a pure state, recall that A n = I α , τ |ρ 0 , e is independent of τ , and by imposing (2.27) we obtain z n ≈ 1 as discussed in (2.63). For a ρ 0 which is a mixed state, requiring the n = 1 equation to be satisfied again fixes For any quantity T that can be written as a transition amplitude in a replica Hilbert space, we can separate T as a sum of two parts, for some states |a , |b ∈ (H ⊗ H) n . The definition of T P , T Q are in complete analogue with (2.24) for the Renyi entropies. Note that in general T may not be real. To discuss the self-consistency of the equilibrium approximation for the quantity T , similar to (2.37), we will consider where |ā , |b ∈ (H ⊗ H) n are defined from |a , |b by the following procedure is the projector associated with I α in (H ⊗ H) n ⊗ (H ⊗ H) n . We have again assumed Z 1 is large and (2.33). We then have When τ ∈ S 2n can be written in a factorized form τ = σ ⊗ τ for σ , τ ∈ S n (meaning that it can be seen as a composition of a permutation σ involving only the first n elements with a permutation τ involving only the last n elements), the contribution from τ in the first term cancels with the contribution from the associated σ , τ in the second term. One can write down a similar expression for (T 2 Q ) eq app . Hence, where τ = σ ⊗ τ indicates that τ ∈ S 2n cannot be written in a factorized form. We now consider (B.7) or (B.8) for a few different observables.

B.1 Renyi entropies
For the n-th Renyi entropy we take |a = |ρ 0 , e and |b = |η A ⊗ eĀ , for which we have As in (2.44), we can estimate the magnitude of the above expression by counting the number of traces for A andĀ, with k 1 and k 2 can be interpreted respectively as the number of dashed and solid loops in the diagrams shown in Fig. 14. These diagrams are obtained by following exactly the same rules as those for Fig. 2-4. The contractions corresponding to the final condition η A ⊗ eĀ ⊗ η A ⊗ eĀ| are given in Fig. 14(a). Notice that the contractions for the first group of n elements and those for the second group of n elements respectively form separate subdiagrams in such a way that for any τ = σ ⊗ τ , we always get non-planar diagrams, as such a τ necessarily connects some elements of the first group to those of the second group. An example is Fig. 14(c), while τ for Fig. 14(b) is factorizable and does not contribute to (B.9). Like in the discussion around Fig. 5 in section 2.5, we can obtain a double-line diagram from each of these diagrams by adding an extra surrounding loop. The total number of loops in the double-line diagram is again equal to the number of faces F of the polygon associated with it, when it is placed on a manifold where it does not have crossing lines. In this case, the total number of edges of the polygon is E = 6n − 1 and the total number of vertices is V = 4n − 2, so if the diagram associated with τ can be drawn without crossings on a surface of minimum genus h, then (B.11) Since we always get non-planar diagrams with h ≥ 1 for τ = σ ⊗ τ , Also, since any τ = σ ⊗ τ is not equal to either e or ν, and for any τ = e, k(τ ) ≤ 2n − 1, Recall that for d A ∼ dĀ ∼ Z  Also note that in both cases d A ∼ dĀ ∼ Z 1 2 1 and d A dĀ ∼ Z 1 , the next-to-leading order correction to Z n,P is suppressed by Z −1 1 relative to the leading contribution, so that the contribution from Z Q is larger than this contribution.
For n = 1, we have (with η = (12) below) which is consistent with the other self-consistency condition (2.27) that we introduced earlier.
We see from the above discussion that the suppression of the correction Z Where τ = σ 1 ⊗σ 2 ...⊗σ m indicates a permutation in S mn that is not factorized among each of the m consecutive sets of n elements. The discussion completely parallels the m = 1, 2 cases discussed earlier, with , (B.17) k 1 and k 2 can again be seen as the number of dashed and solid loops in the diagrams of Fig. 15 (shown for the m = 3 case). Now for a diagram we have E = 3mn − m + 1, V = 2mn − 2m + 2. Hence, Examples of τ corresponding to h = 0 and h = 1 are shown respectively in Fig. 15 (b) and (c). For any τ = σ 1 ⊗ σ 2 ... ⊗ σ m , we get a non-planar diagram, so for all terms appearing in (B.16), Also, again since any τ = σ 1 ⊗ σ 2 ... ⊗ σ m is not equal to e or ν m , We then find that in both limits, d A ∼ dĀ ∼ Z  by at least a factor of Z −1 1 relative to the leading term in (Z n,P ) m .

B.2 Matrix elements and correlation functions
Let us now consider , and t m is the smallest time among t 1 , ..., t n . If t m t s , then we can apply the equilibrium approximation for U (t m ), with |a = |ρ 0 , e , |b = |O † , e and n = 1. We find where ∆ ij is dropped under the equilibrium approximation. Furthermore, where a, b denote indices for a basis ofĀ. Now suppose I α can be factorized I α = I A ⊗ IĀ, we then find that (B.27) Now as an illustration let us consider the example of Sec. 3.1 with (3.1) and A = R. We have have (B.28) where |i again denotes a basis for A. Suppose c ij are randomly picked numbers with comparable magnitude (which we can take to be 1) but random phases. From (B.22)

B.2.2 Equal-time correlation functions
We thus find the approximation (B.22) is good if

C Causality argument
Here we illustrate the causality argument of Sec. 2.8 by modeling the time evolution using discrete steps, that is, by a unitary circuit. More explicitly, we consider an infinite spin chain in 1+1-D with local Hilbert space dimension q, and its local time-evolution U is modeled by a circuit of two-site unitary operators, as shown in Fig. 16. We would like to show that the entanglement entropies of the region A shown in Fig. 8 are independent of the part of the time-evolution operator outside the region J(A) which is in causal contact with A. for all n. Let us act on the state |ψ(t) created by the circuit in Fig. 16 with (U − )Ā ⊗ 1 A , where the operator U − acting onĀ is constructed from local unitary operators as shown in Fig. 17. U − has t layers of local unitary operators like the original time-evolution operator U , and each local unitary in U − is equal to the inverse of the operator in U located at its mirror image with respect to the line at time t. By construction, the state prepared by the circuit in Fig. 17 is equal to the state prepared by that of Fig. 18 In this section, we show that the result (4.13) obtained from the equilibrium approximation holds for sufficiently late times in local random unitary circuits. We use the setup and methods of [52][53][54], where the system is a spin chain in (1+1) dimensions with local Hilbert space dimension q. The time-evolution operator is constructed from local unitary operators as shown in Fig. 19(a), where each V appearing in the circuit is an independent random  Figure 18. Due to the cancellation of inverses between local operators in U and U − A , the final state in Fig. 17 can also be produced by the circuit shown above, where the time-evolution operator is of the formŨ J(A) ⊗ 1 J(A) , withŨ given by the part of the circuit between the green dashed lines.
unitary matrix acting on two sites, drawn from the Haar measure of U (q 2 ). We denote the number of sites in the full system and in the subsystem A respectively as |L| and |A|.
Then d A = q |A| and dĀ = q |L|−|A| , with the subsystems A andĀ = B 1 ∪ B 2 as shown in Fig. 19(a). We will take both |A| and |L| − |A| to be large, and consider times t |A|, |Ā|. where we can express (U ⊗ U † ) n as a product of (V ⊗ V † ) n for the local unitaries V . As explained in [52][53][54], by using the Haar average of (V ⊗ V † ) n , each V in Fig. 19(a) can be associated with a "spin" σ taking values in the permutation group S n , and (D.1) becomes the partition function for a classical spin system which has the same lattice structure as the circuit in Fig. 19(a). See Fig. 19(b). The final state and initial state in (D.1) respectively determine the spin configurations at the top and bottom layers of the systems.
In the top layer, the A subsystem has e spins whileĀ has η spins. The bottom layer has a superposition of spin states determined from the matrix elements of operator O.
The spin system has interactions among the three spins on each shaded triangle in Fig.  19(b), characterized by a factor J(σ b , σ c ; σ a ). The contribution to (D.1) from a given spin configuration is given by the product of J(σ b , σ c ; σ a ) from all shaded triangles in Fig. 19(b), and a factor involving O that comes from the bottom layer. As in the usual Ising model, the partition function for this spin system can be obtained by summing over different domain wall configurations. A domain wall between spins σ and τ is labeled as σ −1 τ (See Fig. 20), and thus there are altogether n! different types of domain walls, one for each element of S n . We refer to domain walls associated with transpositions of two elements as elementary domain walls. Just like an element of S n can be decomposed into a product of transpositions, a domain wall can decomposed into compositions of elementary domain walls. For example, given that η = (n, n − 1, ... 1) can be decomposed as η = (1, 2) (1, 3) ... (1, n − 1) (1, n), an η-domain wall may be considered as a composite of n − 1 elementary domain walls associated with these transpositions.
When the difference in position ∆x and the difference in time ∆t between the initial and final points are both large for all domain walls relevant for a quantity of interest, there exists a coarse-grained description where we can characterize domain walls by their velocities ∆x/∆t, and collectively take into account the contributions from all detailed configurations that correspond to these velocities [54]. Then a domain wall with velocity v contributes a factor q −E (2) (v)∆t to the partition function, where E (2) (v) is independent of the type of the elementary domain wall. Furthermore, if there are l − 1 elementary domain walls traveling together at velocity v for time ∆t, we get a factor q −(l−1) E (l) (v)∆t . With finite q, for general v, E (l) (v) is different from E (2 (v) due to interactions among domain walls (they become equal in the limit q → ∞). It was argued in [69] based on the dynamics of entanglement growth in chaotic systems that the conditions should be satisfied for all l in any chaotic system. Here v B is the butterfly velocity of the system. The explicit form of E (l) (v) in random unitary circuits with finite q is not known for l > 2, but the condition E (3) (v B ) = v B was checked up to next-to-leading order in 1/q in [54]. From the spin configuration indicated at the top boundary of Fig. 19(b), we have an η −1 domain wall at the left edge of A, and an η domain wall at the right edge. As discussed above, each of these domain walls can be seen as composites of n−1 elementary walls which can in principle travel independently through the lattice. The lower end-points of each of these elementary domain walls in the lattice can be either on the left or right edges, or at the bottom. 23 One example of a possible configuration is Fig. 20 (b). Now suppose t |A|, |Ā|. Since the E (l) (v) are all O(1), in all configurations where any domain walls reach the lower boundary, we get factors exponentially suppressed in t relative to configurations where all domain walls meet in the middle or end at the edges of the system. So at such times, it is sufficient to consider configurations which do not reach the lower boundary. A general example of such a configuration is indicated in Fig. 21 (a). From (D.2), we see that for a pair of domain wall configurations which meet in the middle, the maximal contribution comes from the case where both have velocity v B . Similarly with those ending on the edges. Thus any domain wall which travels to the left or right must have velocity v B at leading order in our limit. For a decomposition of η where σ corresponds to the composite domain walls which meet in the middle and ν to those ending on the edge, the corresponding domain wall configuration contributes q −|σ| |A| q −|ν| |Ā| = q −(n−k(σ)) |A| q −(n−k(η −1 σ)) |Ā| .

(D.4)
In the regime we are interested in, the above expression is maximized when k(σ) + k(η −1 σ) is maximized, that is, they should saturate (2.45).
On the lower boundary, |O, e is attached to a σ −1 spin at each site, so that we get a factor of σ −1 |O, e . Putting together the contributions from such leading domain wall contributions for different choices of σ that saturate (2.45), we then obtain the result (4.13). Figure 21. (a) shows a general domain wall configuration that can contribute in the scaling limit and at late times. (b) shows the leading contribution in the scaling limit for a particular choice of σ, where each of the domain walls has velocity v B .
In the above derivation, we only made use of the fact that the average over local random unitaries could be expressed in terms of "spins" associated with permutations, and that there is a membrane tension associated with the domain walls between such spins in the scaling limit, which satisfies the conditions (D.2). The discussion of [10] implies that the above features are also present in the scaling limit in a variety of chaotic systems involving no random averaging, such as the floquet spin chains studied there. We therefore expect 23 In principle, there can also be cases where n − 1 elementary domain wall starting at the top boundary "split" into more than n − 1 elementary domain walls as they pass through the lattice, but as we explain later, in the limit we are considering, we will not need to take into account such possibilities. that it may be possible to show using the methods of [10] that the result (4.13) holds in the systems considered there.