Distinguishing Random and Black Hole Microstates

This is an expanded version of the short report [Phys. Rev. Lett. 126, 171603 (2021)], where the relative entropy was used to distinguish random states drawn from the Wishart ensemble as well as black hole microstates. In this work, we expand these ideas by computing many generalizations including the Petz Rényi relative entropy, sandwiched Rényi relative entropy, fidelities, and trace distances. These generalized quantities are able to teach us about new structures in the space of random states and black hole microstates where the von Neumann and relative entropies were insufficient. We further generalize to generic random tensor networks where new phenomena arise due to the locality in the networks. These phenomena sharpen the relationship between holographic states and random tensor networks. We discuss the implications of our results on the black hole information problem using replica wormholes, specifically the state dependence (hair) in Hawking radiation. Understanding the differences between Hawking radiation of distinct evaporating black holes is an important piece of the information problem that was not addressed by entropy calculations using the island formula. We interpret our results in the language of quantum hypothesis testing and the subsystem eigenstate thermalization hypothesis (ETH), deriving that chaotic (including holographic) systems obey subsystem ETH for all subsystems less than half the total system size.


1 Introduction and summary of results
A unifying idea spanning quantum information theory, quantum chaos and thermalization, and black hole physics is that of (in)distinguishability of quantum states. In quantum information theory, we would like to understand what the space of quantum states looks like.
In particular, how can we characterize which states are close or far away and endow the Hilbert space with a geometry? This notion of distinguishability is critical for storing and processing quantum information.
Quantum chaos and thermalization are fundamentally about distinguishability. A natural definition of quantum thermalization is that the state is indistinguishable (up to a certain error) from a completely thermal state, e.g., a Gibbs state. It is then important to characterize which systems thermalize and the mechanism by which thermalization occurs. We can begin with two states that are easily distinguishable, e.g., the "all spin up" and "all spin down" states of a quantum spin chain. If we evolve these states with a thermalizing Hamiltonian, they will become indistinguishable using "simple" measurements.
Similarly, the black hole information problem is most naturally framed in terms of thermalization and indistinguishability. Black holes can be formed in many different ways. Moreover, they have an extraordinary number of microstates [1,2]. Even so, using semiclassical calculations, Hawking showed that all black holes with identical thermodynamic quantities (mass, charge, and angular momentum) emit thermal radiation [2]. This means that at late times, after the black hole has evaporated, all of these microstates are completely indistinguishable, which is in sharp tension with the unitarity of quantum mechanics. To resolve this apparent paradox, different black hole microstates must be distinguishable directly from the radiation.
Remarkably, all three of these broad problems may be addressed using random matrix theory calculations of distinguishability measures. The purpose of this paper is to make this statement precise and elucidate the intricate and surprising connections, extending the analysis of Ref. [3].
The techniques of random matrix theory have become ubiquitous across far-ranging fields of physics. Originally used to characterize the spectra of heavy nuclei [4], random matrix theory has flourished in its applications to quantum information theory [5], quantum chaos and thermalization [6], and black hole physics [7,8]. Moreover, these fields are now understood to be deeply related to one another and, to some extent, inseparable.
In Section 2, we lay the foundation by reviewing the precise definitions of distinguishability in quantum information theory. In particular, we review various distinguishability measures that are used to characterize how well different states can be discriminated. This is formalized by the operational tasks of quantum hypothesis testing and state discrimination. These are the most fundamental information processing tasks and make precise the operational meaning of our subsequent results.
In Section 3, we undertake our main technical computations. We introduce the ensemble of random mixed states (Wishart ensemble) and a diagrammatic approach to evaluating moments of the reduced density matrices. We exactly compute the relative entropy, Petz Rényi relative entropies, sandwiched Rényi relative entropies, fidelities, and trace distance of random states in the limit of large Hilbert space dimensions. This characterizes the space of generic quantum states. In particular, we find that when the logarithm of the dimension of the sub-Hilbert space that we consider is less than half the logarithm of the total Hilbert space dimension, generic states are indistinguishable up to exponentially small terms in the system size. When the sub-Hilbert space is larger, we find that the states are completely distinguishable up to exponentially small terms. We find interesting O(1) crossover behavior. We compare these "large-N" results to finite-size numerics and find precise agreement.
In Section 4, we begin to interpret the random matrix theory results in the language of gravity. First, we show that in AdS/CFT, if one considers two different black hole microstates, the evaluation of distinguishability measures between the black hole microstates when an observer only has access to a subregion of the boundary is formally identical to the formulas presented in Section 3. This occurs in special states called "fixed-area states" [9,10]. With this realization, we characterize the distinguishability of black hole microstates in anti-de Sitter space or, equivalently, high-energy states in conformal field theories. We subsequently apply this formalism to a toy model of an evaporating black hole [11]. We conclude that before the Page time, an observer of an evaporating black hole is only able to distinguish different black hole microstates if they have O(e^{1/G_N}) copies of the radiation, meaning that the microstates are nearly completely indistinguishable. After the Page time, an observer of an evaporating black hole can easily distinguish microstates with a single copy of the radiation, though the needed measurement will be quite complex. In these calculations, replica wormholes play a central role.
In Section 5, we generalize our computations to random tensor networks. These represent new ensembles of random matrix theory and introduce a notion of locality into the quantum state. We find qualitatively new features in distinguishability with larger tensor networks being more distinguishable than smaller tensor networks and the Haar random states of Section 3.
In Section 6, we discuss thermalization in chaotic quantum many-body systems. Using an ansatz for the structure of high energy eigenstates in chaotic systems [12–14], we evaluate the various distinguishability measures. We interpret these results using the subsystem eigenstate thermalization hypothesis [15], a very strong version of thermalization. We determine that chaotic systems obeying the ansatz also obey subsystem ETH for subsystem sizes that are less than half the total system, after which subsystem ETH is violated. Furthermore, we find similar structures in gravity, generalizing the calculations of Section 4 to black hole microstates without fixed area. Analogous conclusions apply, and we conclude that holographic CFTs in generic dimensions obey subsystem ETH.
We relegate certain details and extensions to the appendices including alternative derivations using free probability theory in Appendix A.
2 Distinguishability measures and their use

Review of distinguishability measures
In this section, we review various distinguishability measures commonly used in quantum information theory. Each measure has an operational meaning and there are various relations between the measures. Readers familiar with distinguishability measures and hypothesis testing may move to Section 3 as there are no new results in this section.
Relative entropy We begin with the quantum relative entropy, which is arguably the most important quantity in quantum information theory, as many of the deepest results in the field are directly derivable from its fundamental properties. The classical relative entropy or Kullback–Leibler divergence is defined as
$$D_{KL}(P\|Q) := \sum_{x\in X} P(x)\log\frac{P(x)}{Q(x)}, \tag{2.1}$$
where P and Q are classical probability distributions over a set X. The quantum relative entropy is the noncommutative analog defined for two density matrices, ρ and σ, as
$$D(\rho\|\sigma) := \mathrm{Tr}(\rho\log\rho) - \mathrm{Tr}(\rho\log\sigma).$$
This is only well-defined when the support of ρ is contained within the support of σ.
Otherwise, the relative entropy is infinite. The relative entropy acts as a distinguishability measure, as can be seen from its basic properties. The first is positivity, D(ρ||σ) ≥ 0, with the inequality saturated if and only if ρ = σ. The second is referred to as the data processing inequality or monotonicity of relative entropy,¹ which states that the relative entropy is non-increasing under completely-positive trace-preserving (CPTP) quantum channels, 𝒩 [16]:
$$D(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)) \le D(\rho\|\sigma). \tag{2.3}$$
This property is crucial for a distinguishability measure because it asserts that if you are given two quantum states, performing operations on them can never make them easier to distinguish. A particularly important quantum channel is the partial trace on a bipartite Hilbert space H = H_A ⊗ H_B,
$$\rho_A := \mathrm{Tr}_B\,\rho_{AB}.$$
Under the partial trace, we lose all information about region B, making ρ harder to distinguish from other states that look similar on A. The partial trace will play a central role throughout the rest of the paper because we are generally interested in how to distinguish states when we only have access to a subregion. While the relative entropy characterizes the structure of the space of quantum states, importantly, it is not a metric. This is most obviously seen from the definition, which is not symmetric under the exchange of ρ and σ. This is a feature and not a bug, as can be seen from its operational meaning, which we will soon explore.
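As a numerical illustration of the definition of D(ρ||σ) and of the data processing inequality (2.3) under the partial trace, the following is a minimal sketch of our own (not code from this work). It assumes numpy, natural logarithms, and full-rank states so that the matrix logarithm exists; the helper names `random_density_matrix`, `matrix_log`, `relative_entropy`, and `partial_trace_b` are hypothetical.

```python
import numpy as np


def random_density_matrix(d, k, rng):
    """Induced random state rho = X X^dag / Tr(X X^dag), X a d x k complex Gaussian matrix."""
    x = rng.normal(size=(d, k)) + 1j * rng.normal(size=(d, k))
    w = x @ x.conj().T
    return w / np.trace(w).real


def matrix_log(rho):
    """Logarithm of a full-rank Hermitian positive-definite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(rho)
    return (v * np.log(w)) @ v.conj().T


def relative_entropy(rho, sigma):
    """D(rho||sigma) = Tr(rho log rho) - Tr(rho log sigma) in nats; assumes both are full rank."""
    return float(np.trace(rho @ (matrix_log(rho) - matrix_log(sigma))).real)


def partial_trace_b(rho_ab, d_a, d_b):
    """Tr_B of an operator on C^{d_a} (x) C^{d_b}, with A the first tensor factor."""
    return np.einsum("ibjb->ij", rho_ab.reshape(d_a, d_b, d_a, d_b))


rng = np.random.default_rng(0)
d_a, d_b = 3, 4
rho = random_density_matrix(d_a * d_b, d_a * d_b, rng)
sigma = random_density_matrix(d_a * d_b, d_a * d_b, rng)

d_full = relative_entropy(rho, sigma)
d_reduced = relative_entropy(partial_trace_b(rho, d_a, d_b), partial_trace_b(sigma, d_a, d_b))
print(f"D(rho||sigma) = {d_full:.4f} >= D(rho_A||sigma_A) = {d_reduced:.4f}")
```

For diagonal (commuting) inputs the same function reduces to the classical Kullback–Leibler divergence (2.1).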
The relative entropy is a parent quantity to many other central information-theoretic quantities, such as the von Neumann entropy
$$S(\rho_A) = \log d_A - D\left(\rho_A\,\Big\|\,\frac{\mathbb{1}_A}{d_A}\right),$$
where d_A is the Hilbert space dimension, the mutual information
$$I(A:B) = D(\rho_{AB}\|\rho_A\otimes\rho_B),$$
and the conditional entropy
$$S(A|B) = \log d_A - D\left(\rho_{AB}\,\Big\|\,\frac{\mathbb{1}_A}{d_A}\otimes\rho_B\right). \tag{2.8}$$
In these terms, the strong subadditivity of von Neumann entropy is a straightforward consequence of the data processing inequality applied to the partial trace over C,
$$I(A:BC) \ge I(A:B) \iff S(\rho_{AB}) + S(\rho_{BC}) \ge S(\rho_B) + S(\rho_{ABC}). \tag{2.10}$$

Rényi relative entropies Like the Kullback–Leibler divergence, the relative entropy can be generalized into Rényi relative entropies. However, because of the noncommutativity of density matrices, there are many inequivalent ways to generalize the relative entropy such that it reduces to the classical α-Rényi divergences, the unique set of quantities satisfying the five axioms of a generalized divergence [17],
$$D_{KL,\alpha}(P\|Q) := \frac{1}{\alpha-1}\log\sum_{x\in X}P(x)^\alpha Q(x)^{1-\alpha}, \tag{2.11}$$
where α is a non-negative real parameter. We will study the two complementary families that have found the most uses in quantum information theory. The first is the most obvious quantum analog of (2.11) and is referred to as the Petz Rényi relative entropy (PRRE) [18]
$$D_\alpha(\rho\|\sigma) := \frac{1}{\alpha-1}\log\mathrm{Tr}\left(\rho^\alpha\sigma^{1-\alpha}\right).$$
The PRRE satisfies various nice properties, such as reduction to the von Neumann relative entropy when α → 1. For α ∈ [0, 1), the PRRE is finite even when the support of ρ is not contained within the support of σ (provided the supports are not orthogonal). Most importantly, the PRRE satisfies the data processing inequality when α ∈ [0, 2] [18–20]. One particularly useful case is α = 1/2, which defines what has been called Holevo's "just-as-good fidelity" [21] or affinity [22],
$$F_H(\rho,\sigma) := \left(\mathrm{Tr}\sqrt{\rho}\sqrt{\sigma}\right)^2 = e^{-D_{1/2}(\rho\|\sigma)},$$
which, for most purposes, is just as (if not more) useful as the more widely used Uhlmann fidelity
$$F(\rho,\sigma) := \left\|\sqrt{\rho}\sqrt{\sigma}\right\|_1^2 = \left(\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\right)^2. \tag{2.14}$$
Both satisfy all of Jozsa's axioms for distinguishability measures [23] and define metrics on the space of quantum states: the Uhlmann fidelity gives the Bures distance and the Bures angle, while Holevo's just-as-good fidelity gives the Hellinger distance.
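The PRRE and the two fidelities are straightforward to evaluate numerically for small matrices. The sketch below is our own illustration, assuming numpy and the squared conventions F_H = (Tr √ρ√σ)² and F = (Tr √(√ρ σ √ρ))²; `psd_power`, `petz_renyi`, `holevo_fidelity`, and `uhlmann_fidelity` are hypothetical helper names. It checks that D_{1/2} = −log F_H and that F_H ≤ F.

```python
import numpy as np


def random_density_matrix(d, rng):
    """Full-rank induced random state rho = X X^dag / Tr(X X^dag), X a d x d complex Gaussian matrix."""
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    w = x @ x.conj().T
    return w / np.trace(w).real


def psd_power(a, p):
    """a**p for Hermitian positive (semi-)definite a; a negative p requires a to be full rank."""
    w, v = np.linalg.eigh((a + a.conj().T) / 2)
    w = np.clip(w, 0.0, None)  # remove tiny negative eigenvalues from round-off
    return (v * w**p) @ v.conj().T


def petz_renyi(rho, sigma, alpha):
    """Petz Renyi relative entropy D_alpha = log Tr(rho^alpha sigma^(1-alpha)) / (alpha - 1), alpha != 1."""
    q = np.trace(psd_power(rho, alpha) @ psd_power(sigma, 1.0 - alpha)).real
    return float(np.log(q) / (alpha - 1.0))


def holevo_fidelity(rho, sigma):
    """Just-as-good fidelity F_H = (Tr sqrt(rho) sqrt(sigma))^2."""
    return float(np.trace(psd_power(rho, 0.5) @ psd_power(sigma, 0.5)).real ** 2)


def uhlmann_fidelity(rho, sigma):
    """Uhlmann fidelity F = (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = psd_power(rho, 0.5)
    return float(np.trace(psd_power(s @ sigma @ s, 0.5)).real ** 2)


rng = np.random.default_rng(1)
rho, sigma = random_density_matrix(6, rng), random_density_matrix(6, rng)
f_h, f_u = holevo_fidelity(rho, sigma), uhlmann_fidelity(rho, sigma)
print(f"F_H = {f_h:.4f} <= F = {f_u:.4f};  D_1/2 = {petz_renyi(rho, sigma, 0.5):.4f} = -log F_H")
```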
The other quantum generalization of (2.11) we will study is the sandwiched Rényi relative entropy² (SRRE) [25,26]
$$\widetilde{D}_\alpha(\rho\|\sigma) := \frac{1}{\alpha-1}\log\mathrm{Tr}\left[\left(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\right)^\alpha\right].$$
It is clear that this is equivalent to the PRRE when ρ and σ commute, and that it reduces to the Uhlmann fidelity at α = 1/2, since D̃_{1/2}(ρ||σ) = −log F(ρ,σ). Like the PRRE, the SRRE reduces to the von Neumann relative entropy in the α → 1 limit and is only finite if either α ∈ [0, 1) or the support of ρ is contained within the support of σ. The most important property of the SRRE is that it satisfies the data processing inequality for α ∈ [1/2, ∞). In this way, it is complementary to the PRRE. Similar formulas for Rényi analogs of entropy, mutual information, and conditional entropies can be written in terms of the Rényi relative entropies.
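Trace distance The final distinguishability measure that we will study is the trace distance, defined as
$$T(\rho,\sigma) := \frac{1}{2}\|\rho-\sigma\|_1,$$
where ‖·‖₁ is the trace norm. The trace distance defines a metric on the space of quantum states and takes values between zero and one. However, unlike Holevo's just-as-good and Uhlmann fidelities, it does not descend from a relative entropy. The trace distance is non-increasing under quantum operations. It will play a central role in our discussion of eigenstate thermalization in Section 6.

There are various useful relations between the above distinguishability measures that we now list. First, we note that both the PRRE and SRRE are monotonically non-decreasing in α, while the SRRE lower bounds the PRRE,
$$\widetilde{D}_\alpha(\rho\|\sigma) \le D_\alpha(\rho\|\sigma). \tag{2.20}$$
Second, the trace distance is bounded on both sides by the fidelities (the Fuchs–van de Graaf inequalities and their analogs for the just-as-good fidelity) and from above by the relative entropy (the quantum Pinsker inequality),
$$1-\sqrt{F} \le 1-\sqrt{F_H} \le T \le \sqrt{1-F} \le \sqrt{1-F_H}, \qquad T \le \sqrt{\tfrac{1}{2}D(\rho\|\sigma)}.$$
These are strong results that we will use throughout the paper due to the difficulty in directly computing the trace distance. They are also important, nontrivial consistency checks of our results.

² We note that both PRRE and SRRE can be described as specific cases of the α-z-relative entropies defined as [24]
$$D_{\alpha,z}(\rho\|\sigma) := \frac{1}{\alpha-1}\log\mathrm{Tr}\left[\left(\sigma^{\frac{1-\alpha}{2z}}\rho^{\frac{\alpha}{z}}\sigma^{\frac{1-\alpha}{2z}}\right)^z\right],$$
which reduce to the PRRE for z = 1 and to the SRRE for z = α.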
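The following sketch (our own illustration, assuming numpy; all helper names are hypothetical) numerically checks that the SRRE lower bounds the PRRE, as in (2.20), and that the trace distance sits inside the standard Fuchs–van de Graaf window 1 − √F ≤ T ≤ √(1 − F), with F = e^{−D̃_{1/2}} the squared Uhlmann fidelity and F_H = e^{−D_{1/2}} the squared just-as-good fidelity.

```python
import numpy as np


def random_density_matrix(d, rng):
    """Full-rank induced random state rho = X X^dag / Tr(X X^dag)."""
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    w = x @ x.conj().T
    return w / np.trace(w).real


def psd_power(a, p):
    """a**p for Hermitian positive (semi-)definite a (full rank needed when p < 0)."""
    w, v = np.linalg.eigh((a + a.conj().T) / 2)
    return (v * np.clip(w, 0.0, None) ** p) @ v.conj().T


def petz_renyi(rho, sigma, alpha):
    q = np.trace(psd_power(rho, alpha) @ psd_power(sigma, 1.0 - alpha)).real
    return float(np.log(q) / (alpha - 1.0))


def sandwiched_renyi(rho, sigma, alpha):
    """log Tr[(sigma^{(1-a)/2a} rho sigma^{(1-a)/2a})^a] / (a - 1)."""
    s = psd_power(sigma, (1.0 - alpha) / (2.0 * alpha))
    return float(np.log(np.trace(psd_power(s @ rho @ s, alpha)).real) / (alpha - 1.0))


def trace_distance(rho, sigma):
    """T = ||rho - sigma||_1 / 2 from the eigenvalues of the Hermitian difference."""
    return 0.5 * float(np.abs(np.linalg.eigvalsh(rho - sigma)).sum())


rng = np.random.default_rng(2)
rho, sigma = random_density_matrix(5, rng), random_density_matrix(5, rng)
f_u = np.exp(-sandwiched_renyi(rho, sigma, 0.5))  # Uhlmann fidelity F
f_h = np.exp(-petz_renyi(rho, sigma, 0.5))        # just-as-good fidelity F_H
t = trace_distance(rho, sigma)
print(f"1 - sqrt(F) = {1 - np.sqrt(f_u):.4f} <= T = {t:.4f} <= sqrt(1 - F) = {np.sqrt(1 - f_u):.4f}")
```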

Operational interpretations in hypothesis testing
The most fundamental information processing tasks are quantum state discrimination (QSD) and quantum hypothesis testing (QHT). It should then be no surprise that this is where the most fundamental quantities, relative entropy and trace distance, find their operational meanings. In this section, we make precise what it means for states to be distinguishable by first introducing QSD and QHT, then stating what the distinguishability measures say about our ability to perform these tasks. For more details, we refer the reader to the literature, e.g., Refs. [29,30]. The general set-up is that we are given a state on H that is either ρ or σ, and we wish to determine which state we were given. We are allowed to use any positive operator-valued measure (POVM), which is a collection of positive semi-definite operators, {M_i}, that sum to the identity operator on H. Each subscript, i, corresponds to a measurement outcome. Because we are looking for a binary outcome (is our state ρ or σ?), we can consolidate the M_i's into just two elements. For outcomes i ∈ 𝒜, we conclude the state is ρ, while for outcomes i ∉ 𝒜, we conclude the state is σ. Our POVM is then {A, 1 − A}, where A := Σ_{i∈𝒜} M_i. There are many choices for A, and we want to optimize this choice so as to have the least error in our conclusions. There are two types of errors. The probability of mistakenly concluding that we have σ when we were really given ρ is
$$\alpha(A) := \mathrm{Tr}\left[\rho\,(\mathbb{1}-A)\right],$$
while the probability of mistakenly concluding that we have ρ when we were really given σ is
$$\beta(A) := \mathrm{Tr}\left[\sigma A\right].$$
These are referred to as the error probabilities of the first and second kind, respectively (or type I and II).
There are various ways of optimizing these errors.³ The symmetric way is called state discrimination. The smallest combined error is given by the trace distance between the states [32,33],
$$\min_{A}\left[\alpha(A)+\beta(A)\right] = 1 - T(\rho,\sigma),$$
where the optimization is taken over all POVMs. If the trace distance is large (close to one), we are able to choose a POVM that has very small error probabilities. If the trace distance is small (close to zero), then the combined error is close to one, the maximal optimized error, which can be saturated by taking A = 1. Likewise, when the two states are given with equal probability, the optimal probability that we correctly discriminate, P_+, is also given by the trace distance,
$$P_+ = \frac{1}{2}\left[1 + T(\rho,\sigma)\right].$$
State discrimination can be made easier if, instead of being given one copy of the state, we are given multiple, n, copies. This is the topic of asymptotic state discrimination. With these n copies, we can ask what the optimal POVM on H^{⊗n} is. The error probabilities are generalized in the obvious way,
$$\alpha_n(A_n) := \mathrm{Tr}\left[\rho^{\otimes n}(\mathbb{1}-A_n)\right],\qquad \beta_n(A_n) := \mathrm{Tr}\left[\sigma^{\otimes n}A_n\right]. \tag{2.27}$$
The sum of the optimized errors can be shown to be bounded above by Holevo's just-as-good fidelity,
$$\min_{A_n}\left[\alpha_n(A_n)+\beta_n(A_n)\right] \le \left(\mathrm{Tr}\sqrt{\rho}\sqrt{\sigma}\right)^n = F_H(\rho,\sigma)^{n/2}.$$
Unless the states are identical (F_H = 1), the error decays exponentially to zero as we are given a large number of copies. If the fidelity is small, we may only need one (or very few) copies to confidently discriminate the states. Asymptotically (n → ∞), this is strengthened to an equality by the quantum Chernoff bound [34,35],
$$\lim_{n\to\infty} -\frac{1}{n}\log \min_{A_n}\left[\alpha_n(A_n)+\beta_n(A_n)\right] = -\log\min_{0\le\alpha\le 1}\mathrm{Tr}\left(\rho^\alpha\sigma^{1-\alpha}\right) = \max_{0\le\alpha\le 1}(1-\alpha)D_\alpha(\rho\|\sigma). \tag{2.29}$$
The quantity on the right-hand side of this equation is called the quantum Chernoff distance. We now turn to the asymmetric treatment of this problem, quantum hypothesis testing. The asymmetric optimization is the task of minimizing one of the errors while keeping the other error below some fixed, finite threshold ε. We define
$$\beta_n^*(\epsilon) := \min_{A_n}\left\{\beta_n(A_n) : \alpha_n(A_n)\le\epsilon\right\}.$$
Quantum Stein's lemma [36,37] asserts that for any ε ∈ (0, 1), the type II error decreases exponentially with the rate given by the relative entropy,
$$\lim_{n\to\infty} -\frac{1}{n}\log\beta_n^*(\epsilon) = D(\rho\|\sigma).$$
With the above review, we have a thorough understanding of how to quantify the ability to discriminate between two quantum states.
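The symmetric results above can be checked numerically. In the sketch below (our own, assuming numpy; helper names are hypothetical), the optimal single-copy test is taken to be the Helstrom projector onto the positive eigenspace of ρ − σ, whose error sum should equal 1 − T, and the quantum Chernoff distance in (2.29) is approximated by minimizing Tr(ρ^s σ^{1−s}) over a grid of s values.

```python
import numpy as np


def random_density_matrix(d, rng):
    x = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    w = x @ x.conj().T
    return w / np.trace(w).real


def helstrom_error_sum(rho, sigma):
    """alpha(A) + beta(A) for the Helstrom test A = projector onto the positive part of rho - sigma."""
    w, v = np.linalg.eigh(rho - sigma)
    pos = v[:, w > 0]
    a = pos @ pos.conj().T
    type_1 = np.trace(rho @ (np.eye(len(w)) - a)).real  # conclude sigma although given rho
    type_2 = np.trace(sigma @ a).real                   # conclude rho although given sigma
    return float(type_1 + type_2)


def chernoff_distance(rho, sigma, grid=2001):
    """-log min_{0<=s<=1} Tr(rho^s sigma^(1-s)) on a grid of s; returns (distance, optimal s)."""
    wr, vr = np.linalg.eigh(rho)
    ws, vs = np.linalg.eigh(sigma)
    wr, ws = np.clip(wr, 0.0, None), np.clip(ws, 0.0, None)
    overlap = np.abs(vr.conj().T @ vs) ** 2  # |<r_i|s_j>|^2 between the two eigenbases
    s_vals = np.linspace(0.0, 1.0, grid)
    q = np.array([np.sum(overlap * np.outer(wr**s, ws ** (1.0 - s))) for s in s_vals])
    i = int(np.argmin(q))
    return float(-np.log(q[i])), float(s_vals[i])


rng = np.random.default_rng(3)
rho, sigma = random_density_matrix(4, rng), random_density_matrix(4, rng)
trace_dist = 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()
xi, s_opt = chernoff_distance(rho, sigma)
print(f"min error sum = {helstrom_error_sum(rho, sigma):.4f} = 1 - T = {1 - trace_dist:.4f}")
print(f"Chernoff distance = {xi:.4f} at s = {s_opt:.3f}")
```

The grid search is a simple stand-in for a proper one-dimensional convex minimization; Tr(ρ^s σ^{1−s}) is log-convex in s, so a finer grid or a bracketing method would also work.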
It would be desirable to generalize this to an arbitrary, finite number of states {ρ_i}. As we will see in Section 4, this is particularly important for the black hole information problem. In the case that we are discriminating many states, we no longer consolidate the M_i's into A and 1 − A. Rather, each measurement outcome i leads us to conclude that we have state ρ_i. If we are given the state ρ_i with probability p_i, the error probability is
$$P_{\rm err}(\{M_i\}) := \sum_i p_i\,\mathrm{Tr}\left[\rho_i(\mathbb{1}-M_i)\right],$$
whose optimized value we define as
$$P_{\rm err}^*(\{\rho_i\}) := \min_{\{M_i\}} P_{\rm err}(\{M_i\}).$$
Rather remarkably, building on the work of Refs. [42–45], the quantum Chernoff bound was generalized to the multiple state case, referred to as the multiple quantum Chernoff bound⁴ [49],
$$\lim_{n\to\infty}-\frac{1}{n}\log P_{\rm err}^*(\{\rho_i^{\otimes n}\}) = \min_{i\neq j}\left[-\log\min_{0\le\alpha\le1}\mathrm{Tr}\left(\rho_i^\alpha\rho_j^{1-\alpha}\right)\right].$$
The value on the right-hand side is referred to as the multiple quantum Chernoff distance. When comparing to (2.29), it is surprising that, when discriminating between arbitrarily many more states, all one needs to do is take the minimum of the pairwise Chernoff distances.
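To illustrate the structure of the multiple Chernoff distance (a minimum over pairs), the sketch below restricts to commuting states, represented as probability vectors, for which Tr(ρ^s σ^{1−s}) reduces to Σ_x p(x)^s q(x)^{1−s}. This is our own simplified illustration in plain Python; `pairwise_chernoff` and `multiple_chernoff` are hypothetical names, and the minimization over s is a grid search.

```python
import math
from itertools import combinations


def pairwise_chernoff(p, q, grid=1001):
    """-log min_s sum_x p(x)^s q(x)^(1-s): Chernoff distance of two commuting (diagonal) states."""
    s_vals = (k / (grid - 1) for k in range(grid))
    return -math.log(min(sum(pi**s * qi ** (1 - s) for pi, qi in zip(p, q)) for s in s_vals))


def multiple_chernoff(states):
    """Multiple Chernoff distance: the minimum pairwise Chernoff distance over distinct pairs."""
    return min(pairwise_chernoff(p, q) for p, q in combinations(states, 2))


states = [[0.7, 0.2, 0.1], [0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
print(multiple_chernoff(states))  # set by the closest pair, states[0] and states[2]
```

Only distinct unordered pairs are needed because the pairwise Chernoff distance is symmetric under s ↔ 1 − s.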
In the one-shot case, bounds can be placed on P*_err({ρ_i}), though, to our knowledge, an equality is not known. If we take the spectral decompositions of our POVM as M_i := Σ_{k=1}^{T_i} λ_{ik} Q_{ik}, an upper bound is given by [49]

where r is the total number of states and T := max_i T_i. The upper bound can be made more intuitive, though generally weaker, by noting

where, in the second line, we have removed the remaining sum to mimic the form of (2.37), even though this formula is strictly weaker when setting r = 2. In passing, we note that computing P*_err can be formulated as a semidefinite program [30,50], which means that it can be efficiently evaluated.

⁴ The quantum Sanov's lemma provides the analogous asymmetric multiple state hypothesis testing result [46,47]. We also note intriguing new multiple state divergences obeying the data processing inequality whose operational meaning is not yet fully understood [48].

3 Distinguishing random states
In this section, we consider Haar random states. This ensemble can be described in several ways. Perhaps the simplest is to consider an arbitrary reference state |0⟩ ∈ H and act with a random unitary matrix drawn from the Haar measure, the unique left-right invariant measure over U(dim H): |Ψ⟩ = U|0⟩. This ensemble is particularly nice because the average over α copies of a Haar random state is a sum over permutations, τ, of the α copies,
$$\mathbb{E}\left[\left(|\Psi\rangle\langle\Psi|\right)^{\otimes\alpha}\right] = \frac{\sum_{\tau\in S_\alpha} g_\tau}{\sum_{\tau\in S_\alpha}\mathrm{Tr}\,g_\tau}, \tag{3.1}$$
where g_τ is the matrix representation of τ and the denominator ensures that the state has unit norm. We are generally interested in ensembles of mixed states that are induced by taking a partial trace over a sub-Hilbert space. If H = H_A ⊗ H_B, the reduced density matrix on A satisfies
$$\mathbb{E}\left[\rho_A^{\otimes\alpha}\right] = \frac{\sum_{\tau\in S_\alpha}\mathrm{Tr}_B\left(g_{\tau,A}\otimes g_{\tau,B}\right)}{\sum_{\tau\in S_\alpha}\mathrm{Tr}\,g_\tau} = \frac{\sum_{\tau\in S_\alpha} d_B^{C(\tau)}\, g_{\tau,A}}{\sum_{\tau\in S_\alpha}\mathrm{Tr}\,g_\tau},$$
where the subscripts on the permutation elements mean that they only permute within a sub-Hilbert space. The trace of a permutation element is straightforward to work out: it equals the dimension of the Hilbert space, d_A d_B, raised to the power of the number of cycles in the permutation, C(τ). The denominator can then be written as
$$\sum_{\tau\in S_\alpha}(d_A d_B)^{C(\tau)} = \frac{\Gamma(d_A d_B+\alpha)}{\Gamma(d_A d_B)}, \tag{3.3}$$
which can be summed exactly because the number of permutations of α elements with k cycles is given by the (unsigned) Stirling number of the first kind. However, we can easily avoid this technical point because we will be interested in the regime where the Hilbert space dimension is large. Therefore, only the permutations that maximize C(τ) contribute at leading order. The unique permutation that maximizes C(τ) is the identity permutation, which has α cycles, so throughout this paper we approximate the denominator as
$$\sum_{\tau\in S_\alpha}(d_A d_B)^{C(\tau)} \approx (d_A d_B)^\alpha.$$
There is an alternative description of the same induced ensemble of density matrices that will be useful for us when generalizing to tensor networks. Rather than starting with a fiducial state |0⟩, we begin with complete bases, |i⟩_A and |J⟩_B, on H_A and H_B, respectively. The Haar random state is then represented as
$$|\Psi\rangle = \frac{1}{\sqrt{N}}\sum_{i,J}X_{iJ}\,|i\rangle_A|J\rangle_B,$$
where X is a d_A × d_B matrix of independent complex Gaussian random variables with zero mean and unit variance, and N is a normalization constant.
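The exact denominator in (3.3) and its large-dimension approximation can be checked by brute force. The plain-Python sketch below is our own illustration (`num_cycles` and the two `denominator_*` helpers are hypothetical names): it enumerates S_α, counts cycles, and compares Σ_τ d^{C(τ)} with the rising factorial d(d+1)⋯(d+α−1) = Γ(d+α)/Γ(d).

```python
from itertools import permutations
from math import prod


def num_cycles(perm):
    """Number of cycles C(tau) of a permutation given as a sequence tau[i] = image of i."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles


def denominator_by_enumeration(d, alpha):
    """sum over tau in S_alpha of d^{C(tau)}, by brute force over all alpha! permutations."""
    return sum(d ** num_cycles(tau) for tau in permutations(range(alpha)))


def denominator_closed_form(d, alpha):
    """Rising factorial d (d+1) ... (d+alpha-1) = Gamma(d+alpha) / Gamma(d)."""
    return prod(d + j for j in range(alpha))


for d, alpha in [(2, 3), (3, 4), (6, 5)]:
    print(d, alpha, denominator_by_enumeration(d, alpha), denominator_closed_form(d, alpha))
```

For large d the identity permutation dominates, so the ratio to d^α approaches one, which is the approximation used in the text.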
The reduced density matrix on H_A is then [51–53]
$$\rho_A = \frac{1}{N}XX^\dagger, \qquad N = \mathrm{Tr}\,XX^\dagger.$$
Ensemble averages over n copies are given by the same formula in terms of permutations, so at large dimensions, ρ_A ≈ XX†/(d_A d_B). This is the famous Wishart–Laguerre ensemble and is equivalent to the previously introduced Haar random states [54,55]. The advantage of working with random Gaussian states instead of Haar random states is that "Wick calculus" is simpler than "Weingarten calculus." The difference will appear for random tensor networks, although the ensembles will still be equivalent at large Hilbert space dimension [56]. Moreover, the class of random tensor networks used for holography involving projected Haar random states [57] precisely corresponds to the states we study even at finite Hilbert space dimension, as explained in Appendix C.
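A minimal sketch of the Gaussian construction, assuming numpy (the helper `induced_state` is a hypothetical name of ours): it samples ρ_A = XX†/Tr XX† with X a d_A × d_B matrix of unit-variance complex Gaussians and checks that each sample is a valid density matrix and that the sample mean approaches 𝟙/d_A, the exact ensemble average.

```python
import numpy as np


def induced_state(d_a, d_b, rng):
    """rho_A = X X^dag / Tr(X X^dag), with X a d_a x d_b matrix of i.i.d. standard complex Gaussians."""
    x = (rng.normal(size=(d_a, d_b)) + 1j * rng.normal(size=(d_a, d_b))) / np.sqrt(2.0)
    w = x @ x.conj().T  # Wishart matrix; Tr(w) is close to d_a * d_b at large dimension
    return w / np.trace(w).real


rng = np.random.default_rng(7)
d_a, d_b = 4, 64
samples = [induced_state(d_a, d_b, rng) for _ in range(4000)]
mean_rho = sum(samples) / len(samples)
print(np.round(mean_rho.real, 3))  # close to the identity divided by d_a
```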
We now introduce a diagrammatic approach for computing certain moments of the Wishart ensemble involving multiple states, building on Refs. [3, 58–60]. This will prove invaluable in the following calculations.
We represent the elements of the random global pure state as two vertical lines, where the solid line represents H_A and the dashed line H_B. To form the density matrix, we take the outer product (3.8). We will usually drop the index labels of the lines to avoid cumbersome notation. All matrix manipulations are done on the lower ends of the lines. For example, we can take a partial trace over H_B by connecting the dashed lines, square the matrix by taking two copies and connecting the bra of the first matrix with the ket of the second, and then take a trace by connecting the remaining solid lines to determine the purity (3.11). For every insertion of the density matrix, we include a factor of (d_A d_B)^{−1}. This gives the leading-order normalization factor computed in (3.3).
The ensemble averaging of the states is done on the upper ends of the lines. The rule here is that we must add up all diagrams contracting any bra with any ket. For α insertions of the density matrix, there will be α! diagrams, corresponding to the α! allowed permutations. Within each diagram, we count the number of loops, with each loop giving a factor of the Hilbert space dimension. One can see that this diagrammatic sum is precisely the numerator of (3.1).
We can now practice by computing the ensemble-averaged purity. There are two (2!) diagrams descending from (3.11), immediately leading to E Tr ρ_A² = d_A^{−1} + d_B^{−1}. Because we are interested in distinguishing density matrices that are independently sampled from the ensemble, we must extend the diagrammatic technique. We do this by introducing different colors for different density matrices. When ensemble averaging, bras of one color can only contract with kets of the same color. For example, the overlap between independent induced states ρ_A (black) and σ_A (red) looks similar to the purity, but the ensemble average includes only a single diagram, giving E Tr ρ_A σ_A = d_A^{−1}, because the second diagram would have connected the black and red indices, which is disallowed. With this formalism, we are now ready to compute each distinguishability measure using a replica trick.
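The two diagrammatic results above can be compared with a quick Monte Carlo estimate. The sketch below is our own illustration (numpy assumed; `induced_state` is hypothetical). For reference it also prints the exact finite-dimensional Haar average of the purity, (d_A + d_B)/(d_A d_B + 1), which agrees with d_A^{−1} + d_B^{−1} at leading order.

```python
import numpy as np


def induced_state(d_a, d_b, rng):
    """rho_A = X X^dag / Tr(X X^dag) for a d_a x d_b complex Gaussian X (normalized Haar state)."""
    x = rng.normal(size=(d_a, d_b)) + 1j * rng.normal(size=(d_a, d_b))
    w = x @ x.conj().T
    return w / np.trace(w).real


rng = np.random.default_rng(11)
d_a, d_b, n_samples = 8, 16, 2000
purities, overlaps = [], []
for _ in range(n_samples):
    rho, sigma = induced_state(d_a, d_b, rng), induced_state(d_a, d_b, rng)
    purities.append(np.trace(rho @ rho).real)    # two diagrams: 1/d_A + 1/d_B
    overlaps.append(np.trace(rho @ sigma).real)  # one diagram: 1/d_A
mean_purity, mean_overlap = float(np.mean(purities)), float(np.mean(overlaps))
print(mean_purity, 1 / d_a + 1 / d_b, (d_a + d_b) / (d_a * d_b + 1))
print(mean_overlap, 1 / d_a)
```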

Relative entropy
We begin with the von Neumann relative entropy, the topic of Ref. [3], both because it is the most fundamental quantity and because it is the simplest to compute using our techniques. This will illustrate the strategy that we use throughout. The relative entropy may be computed using a replica trick. That is, we first compute a certain series of moments of the ensemble and then analytically continue to arrive at the desired quantity. The replica trick for the relative entropy is given by [61]
$$D(\rho_A\|\sigma_A) = \lim_{\alpha\to1}\frac{1}{\alpha-1}\left[\log\mathrm{Tr}\,\rho_A^\alpha - \log\mathrm{Tr}\left(\rho_A\sigma_A^{\alpha-1}\right)\right]. \tag{3.15}$$
We compute the ensemble average of the two terms separately. The first term is the Rényi entropy and, as a diagram, looks like (3.16). While we make the dimensions of the sub-Hilbert spaces large, d_A, d_B ∝ N → ∞, we keep their ratio, d_A/d_B, finite. The leading diagrams maximize the total number of loops.
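These are the planar diagrams, as this double line notation corresponds to the standard large-N topological expansion. Planar diagrams correspond to the non-crossing permutations, NC_α, a well-studied object in enumerative combinatorics and probability theory. The ensemble-averaged Rényi purity is then given by
$$\mathbb{E}\,\mathrm{Tr}\,\rho_A^\alpha \approx \frac{1}{(d_Ad_B)^\alpha}\sum_{\tau\in S_\alpha}d_A^{C(\eta^{-1}\circ\tau)}d_B^{C(\tau)} \approx \frac{1}{(d_Ad_B)^\alpha}\sum_{\tau\in NC_\alpha}d_A^{C(\eta^{-1}\circ\tau)}d_B^{C(\tau)}, \tag{3.17}$$
where η is the cyclic permutation arising from the matrix multiplication and trace in (3.16). The non-crossing permutations maximize the total exponent, as they satisfy C(η^{−1}∘τ) + C(τ) = α + 1. A more refined statement is that the number of non-crossing permutations with C(η^{−1}∘τ) = k (and therefore C(τ) = α + 1 − k) is given by the Narayana number
$$N_{\alpha,k} = \frac{1}{\alpha}\binom{\alpha}{k}\binom{\alpha}{k-1}.$$
With this information, we can reorganize (3.17) as a sum over k instead of a sum over permutations,
$$\mathbb{E}\,\mathrm{Tr}\,\rho_A^\alpha \approx \frac{1}{(d_Ad_B)^\alpha}\sum_{k=1}^{\alpha}N_{\alpha,k}\,d_A^{k}d_B^{\alpha+1-k},$$
which can be rewritten again as a hypergeometric function⁵
$$\mathbb{E}\,\mathrm{Tr}\,\rho_A^\alpha \approx \begin{cases} d_A^{1-\alpha}\,{}_2F_1\!\left(1-\alpha,-\alpha;2;\frac{d_A}{d_B}\right), & d_A\le d_B,\\ d_B^{1-\alpha}\,{}_2F_1\!\left(1-\alpha,-\alpha;2;\frac{d_B}{d_A}\right), & d_A> d_B.\end{cases} \tag{3.20}$$

⁵ The two elements of the piecewise function are equivalent on the integers. The reason why we write it as a piecewise function is for ease of analytic continuation to non-integer values, because the hypergeometric series converge when the argument is less than one.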
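The combinatorial statements behind (3.17) can be verified by brute force for small α. The plain-Python sketch below is our own (helper names hypothetical): it enumerates S_α, keeps the permutations with C(η^{−1}∘τ) + C(τ) = α + 1, and compares the distribution of k = C(η^{−1}∘τ) with the Narayana numbers. It also evaluates the resulting leading-order moment, which for α = 2 reproduces the purity d_A^{−1} + d_B^{−1}.

```python
from itertools import permutations
from math import comb


def num_cycles(perm):
    """Number of cycles of a permutation given as a sequence perm[i] = image of i."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles


def narayana(n, k):
    """Narayana number N_{n,k} = binom(n, k) binom(n, k-1) / n."""
    return comb(n, k) * comb(n, k - 1) // n


def noncrossing_profile(alpha):
    """Count tau with C(eta^{-1} o tau) + C(tau) = alpha + 1, keyed by k = C(eta^{-1} o tau)."""
    eta_inv = [(i - 1) % alpha for i in range(alpha)]  # inverse of the cyclic shift i -> i + 1
    counts = {}
    for tau in permutations(range(alpha)):
        k = num_cycles([eta_inv[t] for t in tau])
        if k + num_cycles(tau) == alpha + 1:
            counts[k] = counts.get(k, 0) + 1
    return counts


def renyi_moment_leading(alpha, d_a, d_b):
    """Leading-order E Tr rho_A^alpha = (d_a d_b)^{-alpha} sum_k N_{alpha,k} d_a^k d_b^{alpha+1-k}."""
    total = sum(narayana(alpha, k) * d_a**k * d_b ** (alpha + 1 - k) for k in range(1, alpha + 1))
    return total / (d_a * d_b) ** alpha


print(noncrossing_profile(4))          # the Narayana numbers N_{4,k}
print(renyi_moment_leading(2, 8, 16))  # 1/8 + 1/16 = 0.1875
```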
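The A ↔ B symmetry of Rényi entropies of bipartite pure states is manifest. Taking the logarithm and analytically continuing to α = 1, we obtain Page's formula [62]
$$\lim_{\alpha\to1}\frac{1}{1-\alpha}\log\mathbb{E}\,\mathrm{Tr}\,\rho_A^\alpha = \begin{cases}\log d_A - \frac{d_A}{2d_B}, & d_A\le d_B,\\ \log d_B - \frac{d_B}{2d_A}, & d_A> d_B.\end{cases} \tag{3.21}$$
In writing this formula, we have assumed that the logarithm and the ensemble average commute.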
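The analytic continuation from (3.20) to Page's formula (3.21) can be reproduced numerically. The sketch below is our own plain-Python illustration: `hyp2f1_series` is a hypothetical helper that simply sums the Gauss series (adequate for 0 ≤ z < 1 or terminating cases), and the α → 1 limit is taken with a central finite difference, using the fact that the continued moment equals one at α = 1.

```python
import math


def hyp2f1_series(a, b, c, z, tol=1e-16, max_terms=100_000):
    """Gauss hypergeometric series 2F1(a, b; c; z); intended for 0 <= z < 1 or terminating series."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total


def log_renyi_moment(alpha, d_a, d_b):
    """Continued log E Tr rho_A^alpha = (1 - alpha) log d_a + log 2F1(1 - alpha, -alpha; 2; d_a/d_b), d_a <= d_b."""
    return (1.0 - alpha) * math.log(d_a) + math.log(hyp2f1_series(1.0 - alpha, -alpha, 2.0, d_a / d_b))


def entanglement_entropy(d_a, d_b, h=1e-5):
    """S = lim_{alpha->1} log E Tr rho_A^alpha / (1 - alpha), via a central difference at alpha = 1."""
    return -(log_renyi_moment(1 + h, d_a, d_b) - log_renyi_moment(1 - h, d_a, d_b)) / (2 * h)


d_a, d_b = 16, 64
print(entanglement_entropy(d_a, d_b), math.log(d_a) - d_a / (2 * d_b))  # agrees with Page's formula
```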
In Appendix B, we explain why this is true when the Hilbert space dimensions are large. The second term in (3.15) involves both ρ_A and σ_A,⁶
$$\mathbb{E}\,\mathrm{Tr}\left(\rho_A\sigma_A^{\alpha-1}\right). \tag{3.22}$$
Because there is only a single copy of ρ_A, when ensemble averaging we must contract the first density matrix with itself. There are no constraints on how to contract the red lines.
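This means that the S_α permutations are broken down to 1 × S_{α−1},
$$\mathbb{E}\,\mathrm{Tr}\left(\rho_A\sigma_A^{\alpha-1}\right) \approx \frac{1}{(d_Ad_B)^\alpha}\sum_{\tau\in 1\times S_{\alpha-1}}d_A^{C(\eta^{-1}\circ\tau)}d_B^{C(\tau)}. \tag{3.23}$$
We still need to maximize the exponent by choosing non-crossing permutations, though many such permutations are disallowed by the identity factor on the first matrix. The diagrams are topological, so we have (3.24). From this diagram, it is clear that the cardinality of the intersection of NC_α and 1 × S_{α−1} is given by the cardinality of NC_{α−1}, and the number of such non-crossing permutations with C(η^{−1}∘τ) = k is given by the Narayana number N_{α−1,k},
$$\mathbb{E}\,\mathrm{Tr}\left(\rho_A\sigma_A^{\alpha-1}\right) \approx \frac{1}{(d_Ad_B)^\alpha}\sum_{k=1}^{\alpha-1}N_{\alpha-1,k}\,d_A^{k}d_B^{\alpha+1-k}, \tag{3.25}$$
which can also be represented by a hypergeometric function,
$$\mathbb{E}\,\mathrm{Tr}\left(\rho_A\sigma_A^{\alpha-1}\right) \approx d_A^{1-\alpha}\,{}_2F_1\!\left(2-\alpha,1-\alpha;2;\frac{d_A}{d_B}\right). \tag{3.26}$$
Taking the α → 1 limit, we have, for d_A < d_B,
$$\lim_{\alpha\to1}\frac{1}{\alpha-1}\log\mathbb{E}\,\mathrm{Tr}\left(\rho_A\sigma_A^{\alpha-1}\right) = -\log d_A - 1 - \left(\frac{d_B}{d_A}-1\right)\log\left(1-\frac{d_A}{d_B}\right).$$

⁶ This can be generalized such that the auxiliary systems for σ and ρ are of different sizes, d_{B_1} and d_{B_2}. In the diagrammatics, this corresponds to assigning different weights to the black and red dashed lines. The resulting generalized sums are still tractable, though we do not currently have use for these calculations in our applications to black holes, because d_B corresponds to the size of the black hole, which is simple to measure by an outside observer, rendering σ_A and ρ_A easily distinguishable when d_{B_1} ≠ d_{B_2}. Some exact results for this set-up in the Wishart ensemble can be found in Refs. [63–65].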
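The counting claim behind (3.25), that restricting to 1 × S_{α−1} leaves Narayana-distributed non-crossing permutations of α − 1 elements, can again be checked by enumeration. In this plain-Python sketch (our own; helper names are hypothetical) the single ρ_A occupies slot 0, so τ must fix 0; the leading-order moment then reproduces the overlap d_A^{−1} at α = 2.

```python
from itertools import permutations
from math import comb


def num_cycles(perm):
    """Number of cycles of a permutation given as a sequence perm[i] = image of i."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return cycles


def narayana(n, k):
    """Narayana number N_{n,k} = binom(n, k) binom(n, k-1) / n."""
    return comb(n, k) * comb(n, k - 1) // n


def restricted_profile(alpha):
    """Non-crossing tau in 1 x S_{alpha-1} (tau fixes slot 0), keyed by k = C(eta^{-1} o tau)."""
    eta_inv = [(i - 1) % alpha for i in range(alpha)]
    counts = {}
    for tau in permutations(range(alpha)):
        if tau[0] != 0:
            continue  # the single rho_A must be contracted with itself
        k = num_cycles([eta_inv[t] for t in tau])
        if k + num_cycles(tau) == alpha + 1:
            counts[k] = counts.get(k, 0) + 1
    return counts


def cross_moment_leading(alpha, d_a, d_b):
    """Leading-order E Tr(rho_A sigma_A^{alpha-1}) = (d_a d_b)^{-alpha} sum_k N_{alpha-1,k} d_a^k d_b^{alpha+1-k}."""
    total = sum(narayana(alpha - 1, k) * d_a**k * d_b ** (alpha + 1 - k) for k in range(1, alpha))
    return total / (d_a * d_b) ** alpha


print(restricted_profile(5))           # the Narayana numbers N_{4,k}
print(cross_moment_leading(2, 8, 16))  # overlap E Tr(rho_A sigma_A) = 1/8
```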
Therefore, we find the ensemble average of the relative entropy to be
$$\langle D(\rho_A\|\sigma_A)\rangle = \begin{cases}1 + \frac{d_A}{2d_B} + \left(\frac{d_B}{d_A}-1\right)\log\left(1-\frac{d_A}{d_B}\right), & d_A< d_B,\\ \infty, & d_A> d_B.\end{cases} \tag{3.28}$$
This is a satisfying, simple answer. For small d_A/d_B, the relative entropy is given by
$$\langle D(\rho_A\|\sigma_A)\rangle \approx \frac{d_A}{d_B}.$$
If we think in terms of "number of qubits," N_A and N_B, this is 2^{−(N_B−N_A)}, exponentially small in the difference (N_B − N_A), meaning that the states will be very difficult to distinguish whenever we have access to substantially fewer than half of the qubits; the exponential decay rate of the optimal type II error, β*_n(ε), is then very small, meaning that we will need a number of copies exponential in (N_B − N_A) to identify the state with confidence. Equation (3.28) is also monotonically increasing in d_A/d_B, a consequence of the data processing inequality when we take the partial trace as the quantum channel. When d_A → d_B, the relative entropy approaches the curious value of 3/2. This value of 3/2 was also determined in Ref. [66] using very different techniques, which serves as an additional consistency check of our results.
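The closed form (3.28) is easy to tabulate. The plain-Python sketch below (our own; `avg_relative_entropy` is a hypothetical name) evaluates it, using `math.log1p` for accuracy at small d_A/d_B, and prints it next to the small-ratio estimate 2^{N_A − N_B} for a few qubit splits.

```python
import math


def avg_relative_entropy(d_a, d_b):
    """Large-dimension ensemble average of D(rho_A||sigma_A) from (3.28)."""
    if d_a > d_b:
        return math.inf  # rho_A is not supported within supp(sigma_A)
    x = d_a / d_b
    if x == 1.0:
        return 1.5  # limiting value as d_a -> d_b
    return 1.0 + x / 2.0 + (1.0 / x - 1.0) * math.log1p(-x)


for n_a, n_b in [(2, 10), (4, 8), (5, 7), (6, 6), (7, 5)]:  # numbers of qubits
    print(n_a, n_b, avg_relative_entropy(2**n_a, 2**n_b), 2.0 ** (n_a - n_b))
```

Expanding the logarithm gives x/2 + Σ_{k≥1} x^k/(k(k+1)) with x = d_A/d_B, a series with positive coefficients, which makes both the leading behavior x and the monotonic growth in d_A/d_B explicit.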
When d_A > d_B, every reduced state on H_A in the ensemble is rank deficient, with d_A − d_B zero eigenvalues. This is because the Wishart ensemble has rank at most min(d_A, d_B). Since the supports of two independent states, ρ_A and σ_A, are independent random d_B-dimensional subspaces, it is overwhelmingly unlikely that they coincide. In particular, the support of ρ_A will not be contained within the support of σ_A. This is why the relative entropy becomes infinite in this regime: there is a measurement we can choose (e.g., projecting onto the kernel of σ_A) that easily distinguishes ρ_A and σ_A.
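The support mismatch is easy to see numerically. The sketch below (our own illustration, assuming numpy; `induced_state` is hypothetical) draws two independent states with d_A > d_B, confirms that their rank is min(d_A, d_B), and computes the weight ρ_A places on the kernel of σ_A, i.e., the probability that the projector onto ker σ_A clicks on ρ_A even though it never clicks on σ_A.

```python
import numpy as np


def induced_state(d_a, d_b, rng):
    """rho_A = X X^dag / Tr(X X^dag) with X a d_a x d_b complex Gaussian matrix (rank <= min(d_a, d_b))."""
    x = rng.normal(size=(d_a, d_b)) + 1j * rng.normal(size=(d_a, d_b))
    w = x @ x.conj().T
    return w / np.trace(w).real


rng = np.random.default_rng(5)
d_a, d_b = 12, 5
rho, sigma = induced_state(d_a, d_b, rng), induced_state(d_a, d_b, rng)

rank_rho = np.linalg.matrix_rank(rho, tol=1e-10)
w, v = np.linalg.eigh(sigma)
kernel_sigma = v[:, w < 1e-10]  # orthonormal basis of the (d_a - d_b)-dimensional kernel of sigma
# Probability that the two-outcome measurement {P_ker, 1 - P_ker} fires on rho;
# on sigma it fires with probability zero (up to round-off).
weight = float(np.trace(kernel_sigma.conj().T @ rho @ kernel_sigma).real)
print(rank_rho, kernel_sigma.shape[1], weight)
```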

Petz Rényi relative entropy and Holevo's just-as-good fidelity
To understand more sophisticated structures in Haar random states, we progress to the computation of the PRRE. The PRRE has a tricky 1 − α exponent for σ A , so we use a replica trick with two replica parameters, α and m We will compute this for α, m ∈ Z + , only taking the limit to α, m ∈ R at the end of the calculation. The positive integer moments in diagrammatic form are where there are α black density matrices and m red density matrices. When ensemble averaging, we are only able to contract using the subgroup S α × S m ⊂ S α+m , leading to the sum over permutations (3.31) As can be seen by the diagram, even with the restricted sum, there are many ways to contract the lines that are non-crossing, hence maximizing the exponents. These are precisely the non-crossing permutations acting independently on the black and red indices, so the combinatorial factor will be given by the product of two Narayana numbers (3.32) The reason why there is an additional "−1" in the exponent of d A is that the black and red lines are connected at the bottom of the diagram due to the matrix multiplication. Note that this expression is a generalization of the replica trick used in the previous section for the relative entropy, (3.25), if we set α = 1 and m = α − 1. As before, the double sum can be expressed in terms of hypergeometric functions (3.33) Now that the sum that required m to be an integer is complete, it is safe to take the m → 1 − α limit (3.34) When d A = d B , this precisely agrees with a formula from Ref. [66]. Taking the logarithm leads to an exact closed-form expression for the PRRE in the large Hilbert space dimension limit This is a rare instance where we have an exact closed-form solution for relative entropies and can be thought of as the "Page formula" for PRRE. Importantly, this equation contains much more information about random quantum states than (3.28). A highlight is the finiteness of (3.35) for α < 1 in the d A > d B regime. 
This explains the approach of random quantum states to complete distinguishability. There are a few consistency checks that we can readily verify. Namely, we note that (3.35) reduces to (3.28) if we send α → 1, (3.35) is monotonically increasing in d A /d B (data processing inequality), and monotonically increasing in α.
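The consistency checks listed above can be probed numerically. The sketch below assumes that the large-d moments factorize into two Narayana generating functions, Tr ρ_A^α σ_A^m ≈ d_A^{1−α−m} 2F1(1−α, −α; 2; x) 2F1(1−m, −m; 2; x) with x = d_A/d_B < 1, which is our reading of the product of Narayana numbers described above, and then sets m = 1 − α. The function names (`hyp2f1`, `prre`, `relative_entropy`) are hypothetical, the series for 2F1 is only meant for 0 ≤ x < 1, and `relative_entropy` is the α → 1 limit obtained by expanding both series to first order in α − 1 under the same assumption.

```python
import math


def hyp2f1(a, b, c, x, tol=1e-16, max_terms=100_000):
    """Gauss hypergeometric series 2F1(a, b; c; x), summed term by term (0 <= x < 1)."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        # Ratio of consecutive terms of the hypergeometric series.
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x
        total += term
        if abs(term) < tol * abs(total):  # also stops terminating series (term == 0)
            break
    return total


def prre(alpha, x):
    """Assumed large-d PRRE D_alpha(rho_A || sigma_A) for x = d_A / d_B < 1 (alpha != 1)."""
    q = hyp2f1(1 - alpha, -alpha, 2, x) * hyp2f1(alpha, alpha - 1, 2, x)
    return math.log(q) / (alpha - 1)


def relative_entropy(x):
    """alpha -> 1 limit of prre(alpha, x), from expanding both series to O(alpha - 1)."""
    return 1 + x / 2 + (1 / x - 1) * math.log1p(-x)


x = 0.3
print([round(prre(a, x), 4) for a in (0.25, 0.5, 0.75, 1.5, 2.0)])  # increasing in alpha
print(prre(1 + 1e-6, x), relative_entropy(x))                       # alpha -> 1 check
```

Two easy cross-checks: at α = 2 the series sum in closed form to log[(1 + x)/(1 − x)], and for small x the expression behaves as D_α ≈ α d_A/d_B, i.e. linearly in d_A/d_B.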
An additional desirable property of (3.35) is that it is simple enough for us to perform the optimization needed to compute the quantum Chernoff distance, where the optimal value of α in (2.29) is found to be 1/2. This definitively establishes the error rate in quantum state discrimination for a measure-one set of quantum states. Because α = 1/2 is the optimal value, this adds to the usefulness of Holevo's just-as-good fidelity, which is given by (3.37). In order to evaluate the quantum multiple Chernoff distance, we need to characterize the fluctuations in the PRRE. To compute the variance, we must compute the ensemble average of the product of two copies of the moment. Only the diagrams that connect the two blocks contribute to the variance, because the disconnected diagrams are subtracted. For small d_A/d_B, these connected contributions can be evaluated after taking the relevant limit, and a Taylor expansion of the logarithm shows that the variance of the PRRE, σ², is of the same order. The higher central moments, and therefore the higher cumulants, are subleading because, in general, the n-th central moment is suppressed by further powers of the Hilbert space dimension. Therefore, the PRRE follows a normal distribution at subleading order.
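Why α = 1/2 is optimal can be seen directly if one assumes the large-d form Q_α = Tr ρ_A^α σ_A^{1−α} = 2F1(1−α, −α; 2; x) 2F1(α, α−1; 2; x), with x = d_A/d_B < 1: swapping α ↔ 1 − α just exchanges the two factors, so Q_α is symmetric about 1/2, and since log Q_α is convex in α the minimum sits at α = 1/2. The sketch below checks this with a deliberately naive grid search; `q`, `chernoff` and `holevo_fidelity` are hypothetical names, and the closed form is an assumption rather than a quotation of (3.35).

```python
import math


def hyp2f1(a, b, c, x, tol=1e-16, max_terms=100_000):
    """Gauss hypergeometric series 2F1(a, b; c; x) for 0 <= x < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def q(alpha, x):
    """Assumed large-d value of Tr rho_A^alpha sigma_A^(1 - alpha), x = d_A / d_B < 1."""
    return hyp2f1(1 - alpha, -alpha, 2, x) * hyp2f1(alpha, alpha - 1, 2, x)


def chernoff(x, grid=999):
    """Quantum Chernoff distance -log min_{0<alpha<1} Q_alpha, by grid search over alpha."""
    alphas = [(i + 1) / (grid + 1) for i in range(grid)]
    best = min(alphas, key=lambda a: q(a, x))
    return best, -math.log(q(best, x))


def holevo_fidelity(x):
    """Holevo's just-as-good fidelity (Tr sqrt(rho) sqrt(sigma))^2 = Q_{1/2}^2."""
    return q(0.5, x) ** 2


best_alpha, xi = chernoff(0.3)
print(best_alpha, xi, holevo_fidelity(0.3))
```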
For a normal distribution, the probability of a random variable X lying r standard deviations, σ, below the mean, µ, is P(X < µ − rσ) = ½ erfc(r/√2). Therefore, if we have W independent samplings of ρ_A and σ_A, the probability that the minimum relative entropy lies no more than r standard deviations below the mean is [1 − ½ erfc(r/√2)]^W. If we are discriminating between W states, the quantum multiple Chernoff distance is then within r standard deviations of its mean with correspondingly high probability. In order to be confident in the state discrimination (P*_err < ε), we need a number of copies of the state that scales inversely with this distance. Because σ is suppressed in the total Hilbert space dimension, this formula only mildly depends on W, even when W is of order the Hilbert space dimension. Thus, the multiple Chernoff bound is essentially just as tight as the two-state Chernoff bound.
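The mild dependence on W can be made concrete with a minimal sketch, assuming the W relative entropies are independent and exactly normal. The names `r_needed` and `delta` (the allowed failure probability) are illustrative and not the paper's notation. The number of standard deviations one must allow grows only like √(2 ln(W/δ)), so even astronomically large W costs a modest r.

```python
import math


def tail_prob(r):
    """P(X < mu - r*sigma) for a normally distributed X."""
    return 0.5 * math.erfc(r / math.sqrt(2))


def prob_min_within(r, W):
    """Probability that the minimum of W i.i.d. normal samples is >= mu - r*sigma."""
    # (1 - p)^W, evaluated through log1p so that tiny tail probabilities are not lost.
    return math.exp(W * math.log1p(-tail_prob(r)))


def r_needed(W, delta=0.01, r_max=50.0):
    """Smallest r (by bisection) such that all W samples lie above mu - r*sigma w.p. >= 1 - delta."""
    lo, hi = 0.0, r_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if prob_min_within(mid, W) >= 1 - delta:
            hi = mid
        else:
            lo = mid
    return hi


for W in (10, 1e6, 1e30):
    bound = math.sqrt(2 * math.log(W / 0.01))
    print(f"W = {W:g}: r = {r_needed(W):.2f}, sqrt(2 ln(W/delta)) = {bound:.2f}")
```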

Sandwiched Rényi relative entropy and Uhlmann fidelity
Continuing our progression in difficulty, we now compute the SRRE using a new replica trick requiring two replica indices, α and m. The associated diagrams are more complicated because the red and black lines are not cleanly partitioned. There are still many ways to contract the diagram without crossing lines, and we count them in two steps. First, the black lines must contract among themselves in a non-crossing manner. This gives a factor of d_B^{C(τ)}, where τ is the non-crossing permutation of the α black lines and C(τ) is its number of cycles. Depending on how the black lines are contracted, the allowed permutations for the red lines are restricted. This is why this computation is more complicated than for the PRRE, where the black and red permutations simply factorized as NC_α × NC_m. In order for the global permutation to be non-crossing, the red permutations must be non-crossing within each block partitioned off by the black permutations. For example, a black "rainbow" contraction restricts the red permutation to be of the form NC_{2m} × ⋯, while the identity permutation on a black density matrix places no additional restrictions. In terms of equations, the diagrams may be summed into an expression in which the product runs over the cycles of η^{−1} ∘ τ and |·| denotes the length of a cycle. We first focus on the d_A < d_B regime, where the inner sum may be computed as before; in doing so, we pull the factors of d_A and d_B out of the product by enforcing that the global permutation be non-crossing. This formula does not need m to be an integer, so it is now safe to take the m → (1 − α)/(2α) limit. The product over cycle structures still makes this formula very difficult. Fortunately, Kreweras solved exactly this combinatorial problem about cycle structure in his landmark paper on non-crossing partitions [67]. He found that the number of non-crossing permutations of {1, 2, …, α} with cycle structure 7 {m_i} is given by what we will call the Kreweras number [67,68]. Therefore, we can reorganize the sum such that there are no more references to permutations, only natural numbers. This formula still presents a daunting task to evaluate in terms of elementary functions for generic α, though it provides a tractable, controlled expansion in d_A/d_B, because for small d_A/d_B the hypergeometric functions are close to one. We then must consider the smallest values of Σ_i m_i. First, we take only the leading term with Σ_i m_i = 1 (a cyclic permutation). This is not terribly useful because, as explained above, to this order the right-hand side is exactly one, which would make the SRRE identically zero. To find a nontrivial result, we need the next term, Σ_i m_i = 2, which can be achieved in many ways: these are all the ways to write α as a sum of two integers between 1 and α − 1, where the floor function in the sum ensures that we do not double count. The corresponding Kreweras number is given in (3.53); the term j = α/2 only occurs when α is even. The exact forms of the hypergeometric functions in the sum are not important at this order because, for small d_A/d_B, they are all close to one; only the Kreweras number matters. We can easily compute the sum at this order for any integer α and find that this parity effect disappears, leading to the SRRE D̃_α in (3.55). Note that this agrees with the previously derived von Neumann relative entropy in the α → 1 limit. Moreover, it obeys the data processing inequality for all positive α if we take the quantum channel to be the partial trace. The Uhlmann fidelity is found 8 by setting α = 1/2 (3.56). At this order, the Uhlmann fidelity is identical to Holevo's just-as-good fidelity (3.37). The value of the fidelity exactly at d_A = d_B was found in Ref. [66] to be 9/16, and additional results may be found in Ref. [69].
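Since non-crossing permutations are in bijection with non-crossing partitions (read each cycle in increasing order), the Kreweras number can be checked by brute force. The sketch below uses the standard form n!/((n − k + 1)! ∏_i m_i!), with k = Σ_i m_i blocks, and compares it with an explicit enumeration for small n. It also checks that the Σ_i m_i = 2 terms (block sizes j and α − j with 1 ≤ j ≤ ⌊α/2⌋) add up to the Narayana number α(α − 1)/2. All function names are hypothetical helpers.

```python
import math
from collections import Counter


def kreweras(block_sizes):
    """Kreweras number: non-crossing partitions of {1..n} whose blocks have the given sizes."""
    n, k = sum(block_sizes), len(block_sizes)
    denom = math.factorial(n - k + 1)
    for m in Counter(block_sizes).values():  # m_i = number of blocks of size i
        denom *= math.factorial(m)
    return math.factorial(n) // denom


def set_partitions(n):
    """Yield set partitions of range(n) as restricted-growth label lists."""
    def rec(labels, n_blocks):
        if len(labels) == n:
            yield labels
            return
        for lab in range(n_blocks + 1):  # reuse an existing block or open a new one
            yield from rec(labels + [lab], max(n_blocks, lab + 1))
    yield from rec([], 0)


def is_noncrossing(labels):
    """True if there is no a < b < c < d with a, c in one block and b, d in another."""
    n = len(labels)
    for a in range(n):
        for b in range(a + 1, n):
            for c in range(b + 1, n):
                for d in range(c + 1, n):
                    if labels[a] == labels[c] and labels[b] == labels[d] and labels[a] != labels[b]:
                        return False
    return True


def brute_force_counts(n):
    """Count non-crossing partitions of range(n) by (sorted) block-size type."""
    counts = Counter()
    for labels in set_partitions(n):
        if is_noncrossing(labels):
            counts[tuple(sorted(Counter(labels).values()))] += 1
    return counts


for n in range(1, 7):
    counts = brute_force_counts(n)
    print(n, sum(counts.values()), all(kreweras(list(t)) == c for t, c in counts.items()))
```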
We can also evaluate the SRRE exactly for any integer moment using (3.50). Here, we work out the least tedious case, α = 2, which is also known as the collision relative entropy [70]. In this case, (3.52) is exact and contains no corrections. We only sum over j = 1, leading to the SRRE in (3.59). We now turn to the d_A > d_B regime. Once we have done the sums over the m permutations, we can safely take m → (1 − α)/(2α) and rewrite the sum in terms of Kreweras numbers. This is an exact formula, but it is difficult to evaluate away from limits. For large d_A/d_B, all of the hypergeometric functions are close to one, so all that matters is the total number of non-crossing permutations, which is given by the Catalan number. Therefore, at leading order, we obtain (3.62). Unlike for small d_A/d_B, there are no additional terms at leading order. It is important to note that the resulting SRRE is only well-defined for α < 1. This is to be expected because of rank deficiency. In the well-defined regime, the SRRE is monotonic in α and manifestly obeys the data processing inequality. Evaluating the asymptotic expression at α = 1/2 gives the Uhlmann fidelity. The prefactor comes from the Catalan number, which is nonintegral for noninteger α. The fidelity is inversely proportional to d_A/d_B, decaying to zero when subsystem A occupies most of the Hilbert space. The full spectrum of √σ_A ρ_A √σ_A, and hence the fidelity, may be evaluated using techniques of free probability theory. This is completed in Appendix A, where the answer is the free multiplicative convolution of two Marchenko-Pastur distributions.
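Evaluating "the Catalan number" at α = 1/2 presupposes a continuation to noninteger α. A natural choice, assumed here rather than taken from the text, is the Gamma-function form C_α = Γ(2α + 1)/(Γ(α + 1) Γ(α + 2)), which reproduces the integer Catalan numbers. At α = 1/2 it gives 8/(3π), which is nonintegral, as stated above. The function name `catalan` is a hypothetical helper.

```python
import math


def catalan(alpha):
    """Gamma-function continuation of C_alpha = (2 alpha)! / (alpha! (alpha + 1)!)."""
    return math.gamma(2 * alpha + 1) / (math.gamma(alpha + 1) * math.gamma(alpha + 2))


print([round(catalan(n)) for n in range(1, 7)])  # [1, 2, 5, 14, 42, 132]
print(catalan(0.5), 8 / (3 * math.pi))           # equal up to rounding
```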

Trace distance
The final distinguishability measure we discuss is the trace distance. This is the ideal measure for one-shot state discrimination (2.25). Generalizing (2.18), we can define an α-norm version of the trace distance in terms of the α-norm of an operator A. For even α and Hermitian A, we can dispose of the square root in this norm, and the trace norm is then obtained by analytically continuing the resulting expression away from even α. The replica trick, while only requiring a single replica parameter, is quite difficult, as we must compute all even powers of ρ_A − σ_A, which involve arbitrary mixing of ρ_A and σ_A [71]:
Tr(ρ_A − σ_A)^α = Σ_{S ⊆ {1, 2, …, α}} (−1)^{α−|S|} Tr[X_1 X_2 ⋯ X_α], with X_i = ρ_A for i ∈ S and X_i = σ_A otherwise,
where the sum runs over the power set of {1, 2, …, α} and |·| is the cardinality of the subset. Each term in the sum can be expressed as an appropriate summation over the symmetric group, though this is far from straightforward.
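The power-set expansion of Tr(ρ_A − σ_A)^α is easy to get wrong because ρ_A and σ_A do not commute, so each subset S must be kept as an ordered word. The sketch below checks the expansion against a direct matrix power for small random reduced states. `random_density_matrix` and `power_set_expansion` are hypothetical helpers, and the brute force over 2^α words is only meant for small α.

```python
import itertools

import numpy as np


def random_density_matrix(d_a, d_b, rng):
    """Reduced state on A of a Haar-random pure state on C^{d_a} (x) C^{d_b}."""
    g = rng.standard_normal((d_a, d_b)) + 1j * rng.standard_normal((d_a, d_b))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def power_set_expansion(rho, sigma, alpha):
    """Tr (rho - sigma)^alpha as a signed sum over subsets S marking the rho positions."""
    total = 0.0
    for in_s in itertools.product((True, False), repeat=alpha):
        prod = np.eye(rho.shape[0], dtype=complex)
        for is_rho in in_s:  # ordered product X_1 X_2 ... X_alpha
            prod = prod @ (rho if is_rho else sigma)
        total += (-1) ** (alpha - sum(in_s)) * np.trace(prod)
    return total.real


rng = np.random.default_rng(0)
rho, sigma = random_density_matrix(4, 8, rng), random_density_matrix(4, 8, rng)
for alpha in (2, 4, 6):
    direct = np.trace(np.linalg.matrix_power(rho - sigma, alpha)).real
    print(alpha, power_set_expansion(rho, sigma, alpha), direct)
```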
Consider the small d_A/d_B limit. In this case, the terms that maximize C(τ) dominate the sum; this happens when τ is the identity. This permutation is always present, regardless of S, and universally contributes d_A^{1−α}, which in the α → 1 limit is O(1). To see whether this survives in the overall sum, we need to account for the cardinalities. The number of subsets with cardinality k is given by the binomial coefficient (α choose k), so the identity contributes d_A^{1−α} Σ_{k=0}^{α} (−1)^{α−k} (α choose k) = 0. To get a nontrivial answer, we must therefore move beyond the identity permutation. This is to be expected, because the trace distance should be small for small d_A/d_B and not O(1). The next leading term is when C(τ) = α − 1, which corresponds to the identity on all sites except for two, which are swapped; this is always non-crossing. This contribution is suppressed by one power of d_A/d_B relative to the identity. The combinatorics are slightly more complicated. If |S| = k, then there are (k choose 2) + (α − k choose 2) ways to have a single pairing, because we can only choose pairings within the block of k ρ_A's or α − k σ_A's. Therefore, the contribution at this order is proportional to Σ_{k=0}^{α} (−1)^{α−k} (α choose k) [(k choose 2) + (α − k choose 2)]. This is not an analytic function of α, so the continuation to the trace norm is quite ambiguous. We are free to work to higher orders, though we will argue that this will not help our cause.
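The signed sums over subset sizes can be evaluated exactly for integer α. The sketch below tracks only the combinatorial factors described above (signs and counts), dropping the common powers of d_A and d_B. The identity piece cancels for every α, while the single-swap coefficient is 2 at α = 2 and zero for every larger even α. This Kronecker-delta behavior in α is one way to see why the continuation is ambiguous. The function names are hypothetical.

```python
from math import comb


def identity_term(alpha):
    """Signed subset count for tau = identity: sum_k (-1)^(alpha-k) C(alpha, k)."""
    return sum((-1) ** (alpha - k) * comb(alpha, k) for k in range(alpha + 1))


def single_swap_term(alpha):
    """Signed count of single transpositions inside the rho block (k) or sigma block (alpha - k)."""
    return sum((-1) ** (alpha - k) * comb(alpha, k) * (comb(k, 2) + comb(alpha - k, 2))
               for k in range(alpha + 1))


print([identity_term(a) for a in range(2, 13, 2)])     # [0, 0, 0, 0, 0, 0]
print([single_swap_term(a) for a in range(2, 13, 2)])  # [2, 0, 0, 0, 0, 0]
```

The vanishing for α ≥ 3 is the statement that the α-th finite difference of a quadratic polynomial in k is zero.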
In the small d_B/d_A regime, the expansion is more involved. The leading terms come from maximizing C(η^{−1} ∘ τ). We can only have C(η^{−1} ∘ τ) = α when S = {1, 2, …, α} or S = ∅, because otherwise τ = η is not an allowed permutation. The parity effect at this order arises from the exponent of the sign in the sum. Analytically continuing the even integers to one, we find that the trace distance is one at this order, meaning that the states are nearly maximally distant. To understand how the trace distance approaches one, we need to work at the next order. The non-crossing permutations that give C(η^{−1} ∘ τ) = α − 1 are those of the form η_{α_1} × η_{α_2}. This means that the ρ_A's and σ_A's must be in disjoint blocks, i.e., S contains only consecutive integers. There are α − 1 ways to partition α into nonzero integers α_1 and α_2 if we define the tuple (α_1, α_2) to be distinct from (α_2, α_1). There is an additional factor of α coming from the rotations of S to S + 1. Finally, when S = {1, 2, …, α} or S = ∅, we can again partition the elements into blocks of sizes α_1 and α_2, but this time (α_1, α_2) and (α_2, α_1) are indistinguishable. Therefore, for even α there are α/2 possibilities, while for odd α there are only (α − 1)/2 possibilities, in addition to the rotation factors 9. These counts give the next-order correction.
9 The rotation factor for even α when α_1 = α_2 is only α/2 because of indistinguishability.
Here, the odd terms are trivial because the S = {1, 2, …, α} and S = ∅ terms exactly cancel in the sum due to the power of the sign. Taking the α → 1 limit of even α, we find the trace norm at this order to be (3.75). The trace distance may be evaluated away from limits using the free probability techniques reviewed in Appendix A [72], giving (3.76). Interestingly, our asymptotic formula was exact. The trace distance at d_A = d_B was found in Ref. [66] to be 1/2 + 1/π. Without free probability, we can still use the bounds from Section 2 to place strong constraints on the trace distance. In particular, this is helpful for small d_A/d_B, where we were unable to find an analytic answer. First, we use Pinsker's inequality (2.21), with the relative entropy (3.28), to obtain an upper bound on the trace distance. This is only a useful bound when the right-hand side is less than one. For small d_A/d_B, the right-hand side scales as √(d_A/d_B). This means that the trace distance is very small, though without a lower bound, we cannot yet conclude that the expansion above fails to capture the leading-order trace distance. We can determine a lower bound using Holevo's just-as-good fidelity, which is, in general, stronger than the lower bound from the Uhlmann fidelity; we also include the corresponding upper bound, which is, in general, weaker than the upper bound from the Uhlmann fidelity. For small d_A/d_B, this gives an upper bound that is stronger than the one from Pinsker's inequality, though the scaling is still not nailed down, only constrained to lie between linear and square root in d_A/d_B. Because the scaling is at most linear, we cannot hope to find the leading-order behavior from the expansion at the beginning of this subsection, because the linear term was zero if we are to trust the "continuation." The Uhlmann fidelity for small d_A/d_B does not strengthen the upper bound at leading order.
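The "between linear and square root" window can be seen numerically. This sketch assumes the large-d value Tr √ρ_A √σ_A = 2F1(1/2, −1/2; 2; x)² with x = d_A/d_B < 1, which is the α = 1/2 case of the factorized Narayana form and not a quotation of (3.37). It uses the standard Holevo-type bounds 1 − Q ≤ T ≤ √(1 − Q²) with Q = Tr √ρ √σ. For small x the lower bound approaches x/4 and the upper bound approaches √(x/2). The helper names are hypothetical.

```python
import math


def hyp2f1(a, b, c, x, tol=1e-16, max_terms=100_000):
    """Gauss hypergeometric series 2F1(a, b; c; x) for 0 <= x < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def holevo_overlap(x):
    """Assumed large-d value of Tr sqrt(rho_A) sqrt(sigma_A) for x = d_A / d_B < 1."""
    f = hyp2f1(0.5, -0.5, 2, x)
    return f * f


def holevo_trace_distance_bounds(x):
    """Bounds 1 - Q <= T <= sqrt(1 - Q^2) with Q = Tr sqrt(rho) sqrt(sigma)."""
    q = holevo_overlap(x)
    return 1 - q, math.sqrt(1 - q * q)


for x in (1e-4, 1e-3, 1e-2):
    lo, hi = holevo_trace_distance_bounds(x)
    print(f"x = {x:g}: {lo / x:.4f} * x <= T <= {hi / math.sqrt(x):.4f} * sqrt(x)")
```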
We also want to characterize the d_A > d_B regime. While Pinsker's inequality does not help here because the relative entropy is infinite, Holevo's just-as-good fidelity places nontrivial upper and lower bounds, meaning that the states are almost as far from each other as possible, with the trace distance approaching one exponentially in N_A − N_B. The upper bound can be improved by the Uhlmann fidelity at leading order such that the scaling behavior is completely fixed (3.83). These bounds are consistent with the analytic expressions (3.75) and (3.76).

Small-N numerics
All of our computations thus far have been in the limit where both d A and d B are large.
It is important to ask whether these asymptotic results are accurate when d_A and d_B are finite. One motivation is whether these predictions can be observed in experiments with Noisy Intermediate-Scale Quantum (NISQ) technology [73]. Of course, the Hilbert space dimensions are exponentially large in the number of qubits, so there is hope that our results are predictive for small-scale experiments. In this section, we numerically compute the various distance measures and compare them to the asymptotic formulas. This serves as a further consistency check of our results, which we find to be extraordinarily accurate. In Fig. 1, we plot the von Neumann relative entropy, D_{3/2}, and D̃_2. All of these quantities are infinite for d_A > d_B due to the rank deficiency of the reduced density matrices. For this reason, we are able to sample very large Hilbert space dimensions, because the bottleneck on classical computers is d_A and not the total system size. We find very accurate agreement between the exact large-N predictions and the small-N numerics. The fluctuations in the entropies are noticeably larger for small d_A because of the subleading corrections that we have thus far ignored.
In Fig. 2, we investigate the other regime by plotting D_{1/2} and D̃_{1/2}. These quantities are related to Holevo's just-as-good and Uhlmann fidelities, respectively, and are therefore well-defined in the d_A > d_B regime. This limits the Hilbert space sizes we can probe, though we still find very accurate agreement with the large-N analysis.
Finally, in Fig. 3, we plot the trace distance, examining both the small d A /d B and large d A /d B regimes. The large-N expressions precisely agree with numerics and are bounded within the fidelities.
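A small-scale version of this comparison can be sketched in a few lines. The sketch samples reduced states of Haar-random pure states, averages Tr √ρ_A √σ_A over independent pairs, and compares D_{1/2} = −2 log Tr √ρ_A √σ_A with an assumed large-N form, 2F1(1/2, −1/2; 2; d_A/d_B)² for the overlap (the α = 1/2 case of the factorized Narayana moments). The dimensions, sample count and seed are arbitrary choices, and the helper names are hypothetical. This is not the code used for Figs. 1 to 3.

```python
import math

import numpy as np


def hyp2f1(a, b, c, x, tol=1e-16, max_terms=100_000):
    """Gauss hypergeometric series 2F1(a, b; c; x) for 0 <= x < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def d_half_formula(x):
    """Assumed large-N PRRE at alpha = 1/2: D_{1/2} = -2 log Tr sqrt(rho) sqrt(sigma)."""
    return -2 * math.log(hyp2f1(0.5, -0.5, 2, x) ** 2)


def reduced_state(d_a, d_b, rng):
    """Reduced density matrix on A of a Haar-random pure state on A (x) B."""
    g = rng.standard_normal((d_a, d_b)) + 1j * rng.standard_normal((d_a, d_b))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def psd_sqrt(m):
    """Square root of a Hermitian positive semidefinite matrix via eigh."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def sampled_d_half(d_a, d_b, samples=20, seed=0):
    """-2 log of the sample mean of Tr sqrt(rho_A) sqrt(sigma_A) over independent pairs."""
    rng = np.random.default_rng(seed)
    overlaps = [
        np.trace(psd_sqrt(reduced_state(d_a, d_b, rng)) @ psd_sqrt(reduced_state(d_a, d_b, rng))).real
        for _ in range(samples)
    ]
    return -2 * math.log(float(np.mean(overlaps)))


for d_b in (32, 64, 128):
    print(d_b, sampled_d_half(16, d_b), d_half_formula(16 / d_b))
```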

Distinguishing black holes
While studying random states is interesting in its own right, the physical implications of our results become significantly richer when we apply them to gravitational systems. We will explain how the connection between random states and gravitational systems is more than an analogy and is, in some ways, quantitatively identical.

Fixed-area states in holography
In quantum field theory, we compute the moments of reduced density matrices by evaluating the partition function on certain replica manifolds [74,75]. These are glued according to the relevant trace structure. If the quantum field theory is holographic, we may map the calculation to an evaluation of the gravitational path integral with boundary conditions prescribed by this trace structure. In the gravitational path integral, we are instructed to sum over all geometries with the given boundary conditions. In the derivation of the Ryu-Takayanagi formula [76], only replica symmetric geometries were considered. In contrast, we find that replica symmetry breaking saddles are important for the evaluation of relative entropies.
In general, it is very difficult to evaluate the gravitational path integral for multiple replicas, because the nontrivial coupling between the replicas leads to backreaction, changing the bulk geometry [77]. A great simplification in the gravitational path integral can be made if we focus on "fixed-area states" [9,10]. These are states where the area of one or more surfaces is fixed and not integrated over. The different replicas then do not backreact on one another, so we are left with copies of the original bulk geometry, except for potential conical singularities appearing at the locations of the fixed surfaces.
As an example, consider the Rényi entropies of a region on the boundary of a pure-state black hole background 10. There exist two extremal surfaces that are candidate Ryu-Takayanagi surfaces, γ_1 and γ_2, each wrapping the black hole horizon in a topologically distinct manner. Denote the areas of these two surfaces A_1 and A_2, respectively. The moments of the reduced density matrix are given by a ratio of gravitational path integrals, where the numerator is the gravitational path integral on the replicated geometry and the denominator is the path integral on a single copy, necessary for normalization. Because the replicated and single-copy geometries are identical away from the conical singularities, the numerator and denominator almost completely cancel. The nontrivial terms come from the actions of the conical singularities, which are determined by their opening angles.

10 On the CFT side of the duality, these states should be thought of as high-energy pure states.
Here, G_N is Newton's constant, the n_k's are the lengths of the cycles in τ, and ρ_b is the bulk state labeling the black hole microstate. These factors account for the bulk entropy term in the FLM formula [78]. The sum over the permutation group arises from all of the ways the replicas may be glued together in the codimension-one region bounded by the two fixed surfaces (see Fig. 4). We have chosen the bulk state to be pure, such that all of the bulk traces are one. This sum should now look familiar, as it is identical to the sum needed for the Rényi entropies of Haar random states, (3.17), once we identify d_A ↔ e^{A_1/4G_N} and d_B ↔ e^{A_2/4G_N}. In this way, entropies in fixed-area states in holography are identical to entropies in Haar random states 11. This connection becomes even richer when we consider more than one gravitational state in order to compute relative entropies. Consider the moments needed for the von Neumann relative entropy, Tr ρ_A σ_A^{α−1}. Both states have fixed areas and the same semiclassical geometry, but come from different black hole microstates, ρ_b and σ_b. In the language of Refs. [79,80], they are orthogonal states in the same code subspace. Just as before, the gravitational path integral instructs us to sum over all topologies, meaning that the region between the two fixed-area surfaces can be glued according to any S_α permutation (4.5). This seems different from the calculation in Haar random states, which only contained a sum over the subgroup 1 × S_{α−1} (3.23). However, because ρ_b and σ_b are orthogonal, Tr ρ_b σ_b^{n_1−1} is only non-zero if n_1 = 1. This reduces the sum to one over 1 × S_{α−1}, which is identical to (3.23) under the same identification. A nearly identical argument holds for the PRRE, SRRE, and trace distance.
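The identification d_A ↔ e^{A_1/4G_N}, d_B ↔ e^{A_2/4G_N} maps the fixed-area sum onto the Haar random sum Σ_{τ∈S_n} d_A^{C(η^{−1}τ)} d_B^{C(τ)}, normalized by (d_A d_B)^n. The brute-force sketch below checks that, for d_A < d_B, this sum is reproduced at leading order by the non-crossing (Narayana) terms, d_A^{1−n} Σ_k N(n, k) (d_A/d_B)^{k−1}. The dimensions 200 and 2000 are stand-ins for e^{A_1/4G_N} and e^{A_2/4G_N}. The cycle-counting conventions (η the cyclic shift) are our assumptions, and the helper names are hypothetical.

```python
import itertools
from math import comb


def cycles(perm):
    """Number of cycles of a permutation given as a tuple i -> perm[i]."""
    seen, count = [False] * len(perm), 0
    for start in range(len(perm)):
        if not seen[start]:
            count += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return count


def compose(p, q):
    """(p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)


def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)


def replica_sum(n, d_a, d_b):
    """Sum over S_n of d_a^C(eta^-1 tau) d_b^C(tau), normalized by (d_a d_b)^n."""
    eta_inv = inverse(tuple((i + 1) % n for i in range(n)))
    total = 0.0
    for tau in itertools.permutations(range(n)):
        total += d_a ** cycles(compose(eta_inv, tau)) * d_b ** cycles(tau)
    return total / (d_a * d_b) ** n


def narayana_form(n, d_a, d_b):
    """Leading (non-crossing) order: d_a^(1-n) sum_k N(n, k) x^(k-1), x = d_a / d_b."""
    x = d_a / d_b
    return d_a ** (1 - n) * sum(comb(n, k) * comb(n, k - 1) / n * x ** (k - 1) for k in range(1, n + 1))


for n in range(2, 6):
    print(n, replica_sum(n, 200.0, 2000.0) / narayana_form(n, 200.0, 2000.0))
```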
Therefore, we conclude that fixed-area states not only have the same entropies as Haar random states, but also identical Hilbert space geometries. These results have interesting implications for the distinguishability of black hole microstates. Namely, an asymptotic observer with arbitrarily little information about the state (small A) is able to distinguish between any two black hole microstates. This is surprising because we usually consider all black holes to look the same from outside the horizon to any observer, especially local observers. The catch is that the microstates are only distinguishable nonperturbatively in Newton's constant, O(e^{−1/G_N}). This is because all distinguishability measures are linear in d_A/d_B for small region A, which translates to being proportional to e^{(A_1−A_2)/4G_N}. Thus, while distinguishability is possible in principle, the error rates in state discrimination will be very high unless the observer has an exponentially large number of copies of the system. The distinguishability remains nonperturbatively small until region A is roughly one qubit less than half the boundary system, at which point it becomes O(1). When the observer has access to more than half of the boundary, the black hole microstates become completely distinguishable up to nonperturbatively small corrections.
We also note that these results represent nonperturbative corrections to the JLMS formula, which asserts that the boundary relative entropy equals the bulk relative entropy within the entanglement wedge [81]. We have considered bulk states that are pure, orthogonal, and localized between the two extremal surfaces; the bulk states are identical outside of the black hole 12. When A is sufficiently small, the black hole is not within its entanglement wedge, so the bulk states are identical, i.e., the bulk relative entropy is zero. Therefore, the JLMS formula asserts that the boundary relative entropy is zero. We have shown that there are nonperturbative corrections to this statement.

The PSSY model and replica wormholes
In a landmark achievement, the Page curve [7] for an evaporating black hole was computed for the first time in two independent papers [83,84]. The key mechanism that "fixed" Hawking's calculation was the inclusion of certain wormhole saddles in the gravitational path integral, referred to as "replica wormholes" [11,85]. Using the toy model of black hole evaporation presented in Ref. [11] (PSSY), we now show the role of replica wormholes in calculations of relative entropy. This elucidates how Hawking's assumptions fail. We refer to the resulting microstate dependence of the radiation as a violation of the no-hair theorem; it is a non-perturbative effect and is therefore not present in Hawking's calculation.
The PSSY model consists of two-dimensional Jackiw-Teitelboim gravity decorated with end-of-the-world (EOW) branes with k flavors. In the Euclidean action of Ref. [11], S_0 is the large ground state entropy, g (h) is the bulk (asymptotic boundary) metric with curvature R (K), φ is the dilaton, and µ is the tension of the EOW brane.
The EOW brane has k ≫ 1 internal microstates. The global states on the black hole and radiation that we consider are of the maximally entangled form (4.8), where |i⟩_R represents an orthonormal basis of the states of the radiation. Consider a second microstate (4.9), where i + k ∼ i is implied. The definitions of these states are not microscopic, in the sense that the |ψ_i⟩_B's are defined by a gravitational path integral and are not exactly orthogonal. As for the fixed-area state calculation, they may also be thought of as being orthogonal in the code subspace. This non-orthogonality is responsible for the Page curve of the von Neumann entropy [11]. Because of the non-orthogonality, the reduced states of (4.8) and (4.9) on the radiation are not a priori identical, even though the states appear to be related only by a local unitary transformation on B 13. The overlap between these two states is a sum of the black hole overlaps ⟨ψ_i|ψ_{i+1}⟩_B, each of which is given by a gravity amplitude (4.11). Because i ≠ i + 1, the boundary conditions on the brane in the gravity diagram are incompatible and the amplitude is zero. This means that |Ψ⟩ and |Ψ'⟩ are roughly orthogonal, but there are important caveats to this statement, because the overlap should be thought of as an ensemble-averaged statement. In particular, |⟨ψ_i|ψ_{i+1}⟩_B|² is non-zero. This is completely analogous to the Haar random story, where, on average, two independently chosen vectors have zero overlap, but the variance is non-zero. The analog of (4.11) for random matrices is (4.12). Because, when ensemble averaging, we cannot contract black and red indices, the average ⟨ψ_i|ψ_{j≠i}⟩_B equals zero. In complete analogy, the ensemble average of |⟨ψ_i|ψ_{j≠i}⟩_B|² is non-zero (though very small), because we may now contract red with red and black with black. We are interested in the relative entropy of the radiation for two different microstates. Hawking, and even the island formula papers, assumed that the radiation is seen as purely thermal 14 before the Page time, in accordance with the no-hair theorem, i.e., all black holes of the same mass, charge, and angular momentum look the same from the outside. After the Page time, while the island formula papers did not assume the radiation to be purely thermal, there was no difference between the calculations for different microstates of the black hole. From one perspective, this is great because unitarity can be realized without knowing the microscopic theory. On the other hand, it is disappointing because it bypasses the question of why all initial states appear to lead to the same final state. We resolve this part of the information problem within the PSSY model and believe analogous results should hold in more realistic models of black hole evaporation.
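The Haar random analogy for the overlaps can be checked with a quick Monte Carlo sketch. For two independent Haar-random unit vectors in C^d, the average overlap vanishes while E|⟨u|w⟩|² = 1/d exactly. The dimension d = 64, the sample size and the seed are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import numpy as np


def random_unit_vectors(n, d, rng):
    """n independent Haar-random unit vectors in C^d, stored as rows."""
    v = rng.standard_normal((n, d)) + 1j * rng.standard_normal((n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def overlap_statistics(d=64, n=5000, seed=1):
    """Sample means of <u|w> and of |<u|w>|^2 over n independent pairs (u, w)."""
    rng = np.random.default_rng(seed)
    u = random_unit_vectors(n, d, rng)
    w = random_unit_vectors(n, d, rng)
    overlaps = np.sum(u.conj() * w, axis=1)  # <u_k|w_k> for each pair k
    return overlaps.mean(), float(np.mean(np.abs(overlaps) ** 2))


mean, mean_sq = overlap_statistics()
print(abs(mean), mean_sq, 1 / 64)
```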
The reduced density matrix on the radiation for the first state is given by (4.14), and similarly for the second state by (4.15). From here on, we drop the subscripts labeling the Hilbert spaces, as they should be clear from context. We now compute the PRRE between the two states of the radiation using the replica trick (4.16).
Figure 5. Left: The boundary conditions for the path integral in (4.16). Center: An example of a legal way of filling in the geometry. This geometry is also planar, so it contributes at leading order. Right: An example of an illegal way of filling in the geometry, because the EOW brane with label i_1 cannot be connected to the EOW brane with label i_1 − 1. It is not hard to convince oneself that every diagram that connects the left n and right m boundaries will always lead to an inconsistency, as in the right diagram.
This is a more complicated but still tractable gravitational path integral. As shown in Fig. 5, the sum runs only over the non-crossing permutations in the S_α × S_m subgroup of S_{α+m}, because the boundary conditions on the EOW branes i_1 and i_1 − 1 are incompatible, as are those on i_{α+1} and i_{α+1} + 1. There are crossing permutations in S_{α+m} \ (S_α × S_m) that are compatible with the boundary conditions, but these are subleading. The only other difference from the random matrix theory diagrams is that the geometries are allowed to have additional handles. This, however, is unimportant because each handle contributes a factor of e^{−2S_0}, and we have assumed the ground state entropy to be large.
To compute the PRRE, we must evaluate the gravitational path integral on these replica geometries. For simplicity, we consider the case where the black hole is in the microcanonical ensemble, i.e., instead of fixing the lengths of the boundaries, we fix the energy, E. The path integral of an n-boundary wormhole is [11]
Z_n = e^S y(√(2E))^n, (4.17)
where S is the microcanonical entropy at energy E. Because of the simple power of n, the function y drops out of the final answer after normalizing the density matrix. All calculations are then identical to random matrix theory with the identification k ↔ d_A and e^S ↔ d_B 15. We choose to only write the PRRE (4.18).
15 For analogous reasons, the exact same analysis as in Section 3 can be made for the SRRE and trace distance, but we do not write these out explicitly to avoid repetition.
Figure 6. Holevo's just-as-good fidelity (blue) and one minus the fidelity (orange) are shown for the PSSY model following (4.21). Before the Page time (which occurs at log k = S_BH), the fidelity is exponentially close to one. After the Page time, the fidelity decays exponentially to zero.
The quantum Chernoff bound asserts that the optimal error probability for discriminating the two states of the radiation decays with the number of copies n at a rate set by the Chernoff distance, max_{0≤α≤1} (1 − α) D_α(ρ_R‖σ_R), where we found the maximum to be at α = 1/2, i.e., at Holevo's just-as-good fidelity. While (4.20) saturates at large n, the RHS holds as an upper bound for all integer n.
Holevo's just-as-good fidelity, plotted in Fig. 6, is given by (4.21). When observing a black hole from the outside, our task is not as simple as distinguishing two states. Rather, we need to distinguish between all of the exponentially many microstates of the black hole. On the face of it, this seems like an insurmountable task. However, using the multiple quantum Chernoff bound (2.37) and the normal distribution for relative entropies leading to (3.41), we determine that the asymptotic error in multistate discrimination is identical, at leading order, to that of two-state discrimination.
This has important implications for the nature of black hole evaporation that have not been addressed in calculations of the entropy. The island formula (or quantum Ryu-Takayanagi formula), stated below, was the main tool in recent calculations of the entropy of Hawking radiation [86,87]:
S_vN(ρ_R) = min_χ ext_χ [A_χ/(4G_N) + S_semi-cl(Σ_χ)],
where A_χ is the area of the codimension-two quantum extremal surface χ and S_semi-cl(Σ_χ) is the von Neumann entropy of the bulk quantum fields in the codimension-one region Σ_χ bounded by χ in the bulk, i.e., the entanglement wedge of the radiation, R. While this formula accurately computes the von Neumann entropy of the radiation, restoring consistency with unitarity, it leaves more to be desired. In particular, the bulk entropy term is completely semiclassical and given by a quantum field theory calculation in curved space, so the calculation is agnostic to the details of the black hole microstate. One of the central pieces of the apparent paradox was that Hawking radiation always looked the same on the outside regardless of the dynamics in the black hole interior, the phenomenon of no hair. In contrast, (4.21) tells us that there is detectable hair in the radiation even before the Page time. In fact, information about the particular microstate is present even in the first Hawking quantum. Because the fidelity of the radiation for any two different microstates is strictly less than one, we can always tell the difference between Hawking radiation coming from black holes in different microstates, even if they have the same macroscopic parameters: mass, charge, and angular momentum. The caveat is that the difference between states of the radiation coming from distinct black hole microstates is exponentially small in the black hole entropy, i.e., the deviation of the fidelity from one before the Page time is O(e^{−S}). This means that, while possible in principle, any reasonable observer will be hard-pressed to observe this difference.
If we want the probability of error in distinguishing the black hole microstates to be less than ε, we need O(e^S log(1/ε)) copies of the state of the radiation; more precisely, (4.23). After the Page time, there is a different caveat. The fidelity is exponentially close to zero, so the states are essentially fully distinguishable. Precisely, with just one copy of the state, the error probability is bounded above by an exponentially small quantity. The issue is that the amount of radiation needed to perform this discrimination is of order the size of the black hole. This means that the observer will have to perform a very complex computation, which again is not feasible in practice. Now, consider what we would have concluded if we did not include replica wormholes in the gravitational path integral. This is the analog of Hawking's calculation of the state of the radiation that led to information loss. Removing replica wormholes corresponds to only including the identity permutation in the sum (4.25), which makes all PRREs identically zero, regardless of how much radiation is collected. This is consistent with the initial paradox, where the radiation was thought to be in the same state regardless of the black hole microstate. In fact, this was clear from (4.14) and (4.15) because, if the states of the black hole are orthogonal, the reduced density matrices on R are identical. Finally, we note that the relative entropy between two states in the PSSY model was recently studied as a way to detect the violation of global symmetries in theories of quantum gravity [88]. The simpler quantity Tr[ρ_A σ_A] was evaluated as a proxy, with the full relative entropy computation left as an open question; (4.18) is the (generalized) solution to this question. While an O(1) answer was anticipated for the relative entropy after the Page time in Ref. [88], we conclude that the relative entropy is indeed infinite.
It is only O(1) slightly prior to the Page time and exponentially small but finite at earlier times.

Tensor networks
Tensor networks represent a generalization of the states we have considered, adding the ingredient of locality. As such, tensor networks have been particularly useful as toy models of holographic duality [57,89]. They are also of independent interest because they provide new classes of ensembles of random states with novel spectral properties [56]. In this section, we generalize the computations of Section 3 to generic random tensor networks, finding qualitatively new phenomena. A specific application of these results is to the random tensor networks used to model holography. We clarify which random states faithfully represent holographic states and which do not.
We begin with the warm-up example of a random tensor network with two tensors, $T_1$ and $T_2$, contracted together. This is the simplest generalization of the single-tensor network, i.e. the Haar random state. The two-tensor network has one additional degree of freedom, the dimension of the internal bond, $d_b$. For $T_1$ and $T_2$ independently Gaussian, it is straightforward to generalize the diagrammatic approach. In the diagram for the state, a dotted line represents the internal bond of dimension $d_b$ and is always contracted. The arrow indicates that the dotted lines must be connected so that all arrows have the same orientation; note the directions of the arrows in the diagram for the reduced density matrix. The normalization associated with each density matrix can be read off from the same diagrams. When taking the average of the moments, we now have a double sum over the permutation group, corresponding to the two random tensors. For example, with $T_1$ carrying the $A$ index and $T_2$ the $B$ index, the purity moments will be

$$\overline{\mathrm{Tr}\,\rho_A^{\alpha}} \propto \sum_{\tau_1,\tau_2\in S_\alpha} d_A^{C(\eta^{-1}\circ\tau_1)}\, d_b^{C(\tau_1^{-1}\circ\tau_2)}\, d_B^{C(\tau_2)}, \qquad (5.5)$$

where $C(\cdot)$ counts the cycles of a permutation and $\eta$ is the cyclic permutation. To solve this equation at leading order, we need to maximize the exponents. That is, we must find the set of permutations $\{\tau_1,\tau_2\}$ that maximizes $C(\eta^{-1}\circ\tau_1)\log d_A + C(\tau_1^{-1}\circ\tau_2)\log d_b + C(\tau_2)\log d_B$. This is already a significantly harder problem than for the single-tensor network, where the answer is that $\tau$ must be a non-crossing permutation. Interestingly, this maximization may be rephrased as a classical network flow problem [56]. We attach a "source" to $B$ and a "sink" to $A$ and determine the maximal flow, $w_{\text{max-flow}}$, of the network, where each edge has a weight given by the logarithm of its Hilbert space dimension. We apply the Ford-Fulkerson method in which, one at a time, we take a path from the source to the sink through the tensors, reducing the weight of each edge along the path by one (in units of $\log N$) [90]. Each such path is called an augmenting path. We repeat this process until no path from the source to the sink remains, leaving a residual network. The rules for each permutation are that
1. All $\tau_i$'s are non-crossing.
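The leading-order structure of this sum can be checked by brute force for small $\alpha$. The sketch below assumes the two-tensor sum has the form $\sum_{\tau_1,\tau_2} d_A^{C(\eta^{-1}\circ\tau_1)} d_b^{C(\tau_1^{-1}\circ\tau_2)} d_B^{C(\tau_2)}$ with $T_1$ attached to $A$ and $T_2$ to $B$, sets $d_A = d_b = d_B = N$, and records the largest power of $N$ together with the number of pairs $(\tau_1,\tau_2)$ that attain it. All helper names are illustrative. Since the unnormalized trace scales as $N^{3}$ per replica, a maximal exponent $2\alpha+1$ with degeneracy $FC_2(\alpha) = \binom{3\alpha}{\alpha}/(2\alpha+1)$ corresponds to normalized moments $FC_2(\alpha)N^{1-\alpha}$.

```python
import math
from itertools import permutations


def num_cycles(perm):
    """Number of cycles of a permutation stored as a tuple (perm[i] is the image of i)."""
    seen, count = [False] * len(perm), 0
    for start in range(len(perm)):
        if not seen[start]:
            count += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return count


def compose(a, b):
    """(a o b)(i) = a(b(i))."""
    return tuple(a[b[i]] for i in range(len(b)))


def inverse(a):
    inv = [0] * len(a)
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)


def leading_term(n):
    """Max of C(eta^-1 t1) + C(t1^-1 t2) + C(t2) over S_n x S_n, and how many pairs attain it."""
    eta_inv = inverse(tuple((i + 1) % n for i in range(n)))  # eta: cyclic shift i -> i + 1
    perms = list(permutations(range(n)))
    best, count = -1, 0
    for t1 in perms:
        c1, t1_inv = num_cycles(compose(eta_inv, t1)), inverse(t1)
        for t2 in perms:
            e = c1 + num_cycles(compose(t1_inv, t2)) + num_cycles(t2)
            if e > best:
                best, count = e, 1
            elif e == best:
                count += 1
    return best, count


def fuss_catalan(p, n):
    """FC_p(n) = binom((p + 1) n, n) / (p n + 1); p = 1 gives the Catalan numbers."""
    return math.comb((p + 1) * n, n) // (p * n + 1)


for alpha in range(1, 6):
    print(alpha, leading_term(alpha), fuss_catalan(2, alpha))
```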
At leading order, the moments then reduce to a sum over the set $\{\tau_1,\tau_2\}$ of permutations obeying the constraints, weighted by the dimensions with tildes, which are $O(1)$ because multiples of $N$, a large parameter, have been pulled out. In the special case that all $\tilde d_i$'s are one, the moments are proportional to $F_\alpha$, the number of paths satisfying the constraints. For example, consider the case where $d_A = d_B = d_b = N$. There will be a single augmenting path ($w_{\text{max-flow}} = 1$), so the residual network consists of disconnected tensors, with a single constraint relating $\tau_1$ and $\tau_2$. The number of such permutations is given by the second Fuss-Catalan number, $FC_2(\alpha) = \frac{1}{2\alpha+1}\binom{3\alpha}{\alpha}$, so the moments are $\overline{\mathrm{Tr}\,\rho_A^{\alpha}} \approx FC_2(\alpha)\,N^{1-\alpha}$ and the associated von Neumann entropy is $S(\rho_A) \approx \log N - 5/6$. This generalizes Page's formula. Had we instead taken, for example, $\sqrt{d_A} = d_B = d_b = N$, the same augmenting path would have left a residual network in which $T_1$ is still connected to the sink. Therefore $\tau_1$ will be set to $\eta$, while $\tau_2$ can be any non-crossing permutation, of which there are a Catalan number's worth, leading to moments weighted by the Catalan number $C_\alpha$. More generally, we can have a tensor network with $n$ tensors, $\{T_1, T_2, \ldots, T_n\}$. A set of indices of these tensors will be contracted; we refer to the dimensions of these indices by the tensors they connect, e.g. $d_{ij}$. There is also a set of uncontracted indices, which correspond to systems $A$ and $B$; we refer to their dimensions as $d_{A_i}$ and $d_{B_i}$, labeling the subsystem they belong to and the tensor they are indices of. The purity moments can then be expressed as a sum over $n$ permutation elements,

$$\overline{\mathrm{Tr}\,\rho_A^{\alpha}} \propto \sum_{\tau_1,\ldots,\tau_n\in S_\alpha}\ \prod_{i} d_{A_i}^{C(\eta^{-1}\circ\tau_i)}\, d_{B_i}^{C(\tau_i)} \prod_{i<j} d_{ij}^{C(\tau_i^{-1}\circ\tau_j)}.$$

Here, we must maximize this more complicated exponent, which can also be formulated as a network flow problem. (5.6) is generalized to a sum over the set $\{\tau_i\}$ of permutations obeying the updated rules. (5.7) still applies, though the combinatorics may become significantly more difficult. If we are not concerned with the $O(1)$ constant, we only need to determine the maximal flow.
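The two examples above can be reproduced with a small augmenting-path sketch. We use the Edmonds-Karp variant of Ford-Fulkerson (breadth-first search for augmenting paths, with reverse residual capacities so that the result is the true maximal flow) on an undirected network whose edge weights are the logarithms of bond dimensions in units of $\log N$. The node labels `A`, `B`, `T1`, `T2` and all helper names are our own. After the flow saturates, we list the nodes that can still reach the sink through residual capacity: with $d_A = d_b = d_B = N$ only the sink remains, so both tensors are disconnected, while with $\sqrt{d_A} = N$ the tensor $T_1$ stays attached to the sink, matching the discussion above.

```python
from collections import deque


def build_residual(edges):
    """Residual capacities for an undirected network: each edge carries its weight either way."""
    cap = {}
    for (u, v), w in edges.items():
        cap.setdefault(u, {}).setdefault(v, 0)
        cap.setdefault(v, {}).setdefault(u, 0)
        cap[u][v] += w
        cap[v][u] += w
    return cap


def max_flow(edges, source, sink):
    """Edmonds-Karp: push flow along shortest augmenting paths until none remain."""
    cap, flow = build_residual(edges), 0
    while True:
        parent, queue = {source: None}, deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:                  # no augmenting path: flow is maximal
            return flow, cap
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)  # bottleneck weight along the path
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push


def sink_component(cap, sink):
    """Nodes that can still reach the sink through positive residual capacity."""
    seen, queue = {sink}, deque([sink])
    while queue:
        v = queue.popleft()
        for u in cap:
            if u not in seen and cap[u].get(v, 0) > 0:
                seen.add(u)
                queue.append(u)
    return seen


# Two-tensor chain B - T2 - T1 - A; weights are log(dimension) / log(N).
equal = {("B", "T2"): 1, ("T2", "T1"): 1, ("T1", "A"): 1}
big_a = {("B", "T2"): 1, ("T2", "T1"): 1, ("T1", "A"): 2}
for name, edges in (("d_A = N", equal), ("d_A = N^2", big_a)):
    flow, residual = max_flow(edges, "B", "A")
    print(name, flow, sorted(sink_component(residual, "A")))
```

By the max-flow min-cut theorem, the returned flow also equals the weight of the minimal cut separating `B` from `A`.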
By the max-flow min-cut theorem, the maximal flow from the source to the sink is always equal to the minimal cut, $\gamma_A$, needed to separate the source and sink into disconnected components of the network [90,91]. The von Neumann entropy is then determined, at leading order, by the weight of $\gamma_A$. We can now generalize this, as before, to the relative entropy. We will explicitly compute the PRRE. This only changes the permutations allowed in the sum. The key difference between this replica trick and the one for Rényi entropies is that $\eta$ is not an allowed permutation in the sum. This affects all of the $C(\eta^{-1}\circ\tau_i)$ terms because they are maximized not by $\alpha+m$, but by $\alpha+m-1$, which occurs for $\tau_i = \eta_\alpha\times\eta_m \in S_\alpha\times S_m$. This changes rule (1) of the Ford-Fulkerson algorithm to "All $\tau_i$'s are in $NC_\alpha\times NC_m$" and rule (4) to "All $\tau_i$'s in the connected component of the sink in the residual network are set to $\eta_\alpha\times\eta_m$." The moments are then expressed in terms of $E_A$, the weight of the external $A$ edges before applying the Ford-Fulkerson algorithm, which in the single-tensor case simply equaled the maximal flow. In the special case where all $\tilde d_i$'s are one, the moments simplify as before. First, consider the two-tensor network when all dimensions equal $N$. There is a single augmenting path and $E_A = \gamma_A$. However, due to the restriction to $S_\alpha\times S_m$, $\tau_1$ and $\tau_2$ are restricted to be non-crossing within the subgroup, so the combinatorial factor becomes $FC_2(\alpha)FC_2(m)$, which is much smaller than $FC_2(\alpha+m)$. The von Neumann relative entropy is then $D(\rho_A\|\sigma_A) = 17/6$, which should be compared with the value $3/2$ found for the single-tensor network. Apparently, adding a random tensor makes the states more distinguishable. Generalizing this conclusion, if the tensor network is a string of $n$ tensors, the combinatorial factor is given by a product of $n$-th Fuss-Catalan numbers, $FC_n(\alpha)FC_n(m)$. The relative entropy is then

$$D(\rho_A\|\sigma_A) = H_n + \frac{n^2}{n+1},$$

where $H_n := \sum_{k=1}^{n} 1/k$ is the harmonic number. This function is monotonically increasing in $n$.
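The Fuss-Catalan statements can be checked numerically. The sketch below assumes, as in the equal-dimension examples, that a string of $n$ tensors gives $\overline{\mathrm{Tr}\,\rho_A^{\alpha}} \approx FC_n(\alpha)N^{1-\alpha}$ and $\overline{\mathrm{Tr}\,\rho_A^{\alpha}\sigma_A^{m}} \approx FC_n(\alpha)FC_n(m)N^{1-\alpha-m}$. It continues $FC_n$ to real arguments with Gamma functions and takes the replica limits by central finite differences. Under these assumptions, $\log N - S$ equals $H_{n+1} - 1$ and $D(\rho_A\|\sigma_A)$ equals $H_n + n^2/(n+1)$, reproducing Page's $1/2$ and the single-tensor value $3/2$ at $n = 1$. Helper names are illustrative.

```python
import math


def log_fc(p, x):
    """log FC_p(x), FC_p(x) = binom((p + 1) x, x) / (p x + 1), continued to real x via Gamma functions."""
    return math.lgamma((p + 1) * x + 1) - math.lgamma(x + 1) - math.lgamma(p * x + 2)


def central_diff(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)


def entropy_deficit(p):
    """log N - S, where S = -d/d(alpha) log[FC_p(alpha) N^(1 - alpha)] at alpha = 1."""
    return central_diff(lambda a: log_fc(p, a), 1.0)


def relative_entropy(p):
    """lim_{alpha -> 1} log[FC_p(alpha) FC_p(1 - alpha)] / (alpha - 1); the log vanishes at alpha = 1."""
    return central_diff(lambda a: log_fc(p, a) + log_fc(p, 1.0 - a), 1.0)


def harmonic(n):
    return sum(1.0 / k for k in range(1, n + 1))


for n in range(1, 6):
    print(n, entropy_deficit(n), harmonic(n + 1) - 1, relative_entropy(n), harmonic(n) + n * n / (n + 1))
```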
If we are not concerned with the $O(1)$ contribution, the PRRE is generally determined by $E_A$ and $\gamma_A$ alone. This implies that the quantum relative entropy is always divergent if the $A$ edges do not coincide with the minimal cut. We will come back to this point shortly. Similarly, Holevo's just-as-good fidelity is exponentially small in this case. This means that for many tensor networks, two independent states will be easily distinguishable, no matter the relative size of $A$ and $B$. Note that when $E_A = \gamma_A$, the $O(1)$ and subleading terms are very interesting; they were the main topic of this paper.
Recall that holographic random tensor networks are composed of random tensors arranged geometrically as a discretization of hyperbolic space (see Fig. 7) [57]. Due to the negative curvature of this space, the minimal surfaces for boundary regions always dip into the bulk. This means that we always have $E_A > \gamma_A$, so independent states will always be completely distinguishable. This seems to be in tension with the holographic results of Section 4. It appears that single-tensor networks, which have no built-in locality, exactly match holographic states, while the tensor networks that naively look like Anti-de Sitter space share none of the information-theoretic properties of holography except for the entropy.
At face value, the above conclusions are a bit unsettling. Fortunately, this can be remedied by stating more carefully how a tensor network should model holographic states.

Figure 7. A discretization of hyperbolic space is shown as a tensor network. For boundary subregion A, the minimal cut through the network, γ_A, always dips into the bulk and is smaller than the boundary cut E_A.

Tensor network models represent the holographic map as a quantum error correcting code where the bulk degrees of freedom play the role of "logical qubits" that are protected by being embedded in the larger boundary Hilbert space. The logical qubits live in a code subspace. In the random tensor networks we have been considering, the code subspace (the ensemble of states we are sampling from) is identical to the Hilbert space on the boundary. This equality between bulk effective field theory and boundary Hilbert space dimensions only occurs in AdS/CFT when one has a large black hole whose horizon approaches the asymptotic boundary of the space. This is the reason for the requirement that $E_A = \gamma_A$: all minimal surfaces in the large black hole geometry hug the asymptotic boundary. In order to model other holographic states using tensor networks, we must make the code subspace significantly smaller than the total Hilbert space. Additionally, the bulk density matrices should not be orthogonal. For example, when considering perturbations about vacuum AdS, the total Hilbert space dimension is $O(e^{1/G_N})$ while the code subspace dimension is $O(e^{G_N^0})$. In practice, this means that for the two states, $\rho$ and $\sigma$, we must take the random tensors to be correlated with each other, i.e. the measure for each random tensor only has support on a proper subset of the Hilbert space.
Another important class of random tensor networks is random unitary circuits. In these tensor networks, all tensors are random unitary operators drawn from the Haar measure. Such networks have been the focus of intense study because they present an exactly solvable minimal model of chaotic many-body dynamics that preserves only locality and unitarity. Using the replica trick and Weingarten calculus, many measures of entanglement and operator growth have been computed in terms of geometric quantities in these circuits [92][93][94][95][96][97][98][99][100][101]. Analogously, the distinguishability measures discussed in this paper will be computable. When a product state is evolved with a random unitary circuit, twice the time (depth of the circuit) plays the role of $\gamma_A$ when it is smaller than the length of the region $A$, $l_A$, which plays the role of $E_A$. We will therefore find that states are easily distinguishable for times $t < l_A/2$ and very hard to distinguish afterwards. This describes the process of thermalization, in which different initial states become indistinguishable at late times. The details of this calculation, including the precise approach to equilibrium, are left to future work.

Subsystem eigenstate thermalization
The eigenstate thermalization hypothesis (ETH) was a major development in understanding the emergence of thermal physics from isolated quantum many-body systems in pure states [102][103][104]. The statement of eigenstate thermalization is that, given two energy eigenstates, $|E_i\rangle$ and $|E_j\rangle$, and a "simple" few-body operator $O$, the matrix elements vary smoothly with macroscopic, thermodynamic quantities such as the energy:

$$\langle E_i|O|E_j\rangle = \mathcal{O}(\bar E)\,\delta_{ij} + e^{-S(\bar E)/2} f_O(\bar E,\omega)\,R_{ij},\qquad \bar E = \frac{E_i+E_j}{2},\quad \omega = E_i - E_j,$$

where $\mathcal{O}(\bar E)$ and $f_O(\bar E,\omega)$ are smooth functions of the energy, $S(E)$ is the thermodynamic entropy, and $R_{ij}$ is an $O(1)$ pseudorandom matrix. The ETH is expected to hold for generic nonintegrable systems and to be violated in integrable systems. In words, it states that expectation values of simple observables appear thermal, up to exponentially small corrections. Note that the standard, local ETH is a statement only about local or few-body operators. A significant strengthening of the ETH can be made by asserting that the entire reduced density matrix supported on a finite spatial region appears thermal. More precisely, the subsystem eigenstate thermalization hypothesis states that the reduced density matrices of eigenstates, $\rho_A(\psi)$, are exponentially close in trace distance to a universal thermal density matrix, $\rho_{\text{univ}}(E)$, that depends only on the total energy [15,105]. In addition, "off-diagonal" matrices are exponentially suppressed. These conditions imply the local ETH for all operators in region $A$ and are significantly stronger [106]. It is important to understand which systems obey the subsystem ETH. For any system, when $A$ is the entire system, the subsystem ETH completely fails because the distance between a pure state and a thermal state is $O(1)$. It is then nontrivial to determine at which point the subsystem ETH breaks down and thermal physics no longer applies. In the following, we show that holographic CFTs obey the subsystem ETH whenever $A$ is smaller than half the total system size.
More generally, we find generic chaotic Hamiltonian systems, whose eigenstate ansatzes were put forward in Refs. [12][13][14], obey the subsystem ETH.

Generic chaotic Hamiltonians
We use the following ansatz for the tensor product decomposition of energy eigenstates of energy $E$ for generic chaotic quantum many-body systems [12], where $\mathcal{N}_{E,\Delta}$ is the normalization, $|E_i\rangle_A$ and $|E_j\rangle_B$ are subsystem energy eigenstates (see footnote 16), and the coefficients are complex Gaussian random variables whose variance, with the proper normalization, is set to one.

Footnote 16: In fact, it is not quite correct to consider these as subsystem energy eigenstates, because the interaction terms in the Hamiltonian coupling A and B lead to correlations near the boundary, as emphasized in Ref. [14]. It is more accurate to consider the very similar "many-body Berry (MBB) states" of Ref. [14], which are constructed by perturbing an integrable Hamiltonian with an integrability-breaking term. In MBB states, the subsystem states are not energy eigenstates but local product states. The following calculation is unchanged.

The reduced density matrix follows directly from this ansatz. Using it, we perform the replica trick for the PRRE. In analogy with Refs. [12][13][14][107], where the Rényi and von Neumann entropies were evaluated for this ansatz, and with Section 3.2, we obtain the ensemble-averaged moments in the thermodynamic limit, together with the auxiliary definitions (6.8). We use the following ansatz for the thermodynamic entropies, (6.10),
where $s(u)$ is the entropy density, $V$ is the volume of the total system, and $f$ is the fractional volume of subsystem $A$. These ansatzes give saddle point equations for the main integral and for the normalization, and we can now evaluate the PRRE in various regimes. The saddle point equation for the normalization is simple to solve because $s'(u)$ is single-valued; this is not surprising, as the solution corresponds to a constant energy density, and it determines the normalization. We now specialize to $f < 1/2$, where we claim that $E_1 < E/2$. In this regime, we can expand the hypergeometric function, and the resulting correction term is exponentially small for $f < 1/2$. Therefore, the saddle point equation for $E_1$ can be treated as an expansion in a small deviation $\delta$ around $E_2$. Solving the leading-order saddle point equation for $\delta$, we find self-consistency with the claim that $E_1$ is very close to $E_2$. Inserting the saddle point solution into the moments shows that the PRRE is exponentially suppressed in the entropy for all values of $\alpha$, which places strict bounds on the trace distance. This provides further evidence for the refined subsystem eigenstate thermalization hypothesis of Ref. [15], which postulated how the trace distance scales with the size of subregion $A$: our upper bound precisely matches that scaling once $N_A$, the number of qubits in region $A$, is replaced by the subsystem thermodynamic entropy. Note also that these results generalize our random matrix theory result, which is recovered at infinite temperature, where the subsystem thermodynamic entropies reduce to the logarithms of the subsystem Hilbert space dimensions. Next, consider $f > 1/2$. Unfortunately, the maximum of the integral occurs right near the transition $S_A(E_1) = S_B(E-E_1)$, so we cannot simply make a saddle point approximation. However, it is straightforward to argue that the PRRE will be large for $\alpha < 1$ and infinite for $\alpha \geq 1$.
Note that the integrands of the numerator and denominator of $\mathrm{Tr}\,\rho_A^{\alpha}\sigma_A^{1-\alpha}$ are exponentially close for $S_A(E_1) < S_B(E-E_1)$, while the numerator is exponentially suppressed relative to the denominator for $S_A(E_1) > S_B(E-E_1)$. Because the saddle point for the denominator occurs when $S_A(E_1) > S_B(E-E_1)$, $\mathrm{Tr}\,\rho_A^{\alpha}\sigma_A^{1-\alpha}$ will be exponentially small, i.e. $\log\mathrm{Tr}\,\rho_A^{\alpha}\sigma_A^{1-\alpha}$ will be negative and of order the entropy. Due to the factor of $(\alpha-1)^{-1}$ in the PRRE, this implies that the PRRE for $\alpha < 1$ is of order the entropy and ill-defined (infinite) for $\alpha \geq 1$, in analogy with the random matrix theory result (3.35).
We have found that whenever $A$ is less than half the total system size, the PRRE, and therefore the subsystem trace distance between any two eigenstates of the same energy, is exponentially suppressed in the entropy. The trace distance is a metric on the space of density matrices, so these eigenstates lie within a ball of radius $O(e^{-S(E)})$. The universal density matrix must then also lie within this ball, such that (6.2) is satisfied.
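At infinite temperature the eigenstate ansatz reduces to Haar-random states, so the contrast between $f < 1/2$ and $f > 1/2$ can be seen in a few lines of NumPy. This sketch is an illustration we add, with arbitrary qubit counts and seed: it samples two Haar-random states of $n$ qubits, traces out the last $n - n_A$ qubits, and computes the trace distance between the reduced states. One expects it to be small, of order $\sqrt{d_A/d_B}$, for $n_A < n/2$ and close to one for $n_A > n/2$.

```python
import numpy as np


def haar_state(dim, rng):
    """Haar-random pure state: a normalized complex Gaussian vector."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)


def reduced_density_matrix(psi, d_a, d_b):
    """rho_A = Tr_B |psi><psi|, with A the leading tensor factor."""
    m = psi.reshape(d_a, d_b)
    return m @ m.conj().T


def trace_distance(rho, sigma):
    """(1/2) ||rho - sigma||_1 from the eigenvalues of the Hermitian difference."""
    return 0.5 * float(np.abs(np.linalg.eigvalsh(rho - sigma)).sum())


def subsystem_distance(n_qubits, n_a, seed=0):
    """Trace distance between the n_a-qubit marginals of two independent Haar states."""
    rng = np.random.default_rng(seed)
    d_a, d_b = 2 ** n_a, 2 ** (n_qubits - n_a)
    rho = reduced_density_matrix(haar_state(d_a * d_b, rng), d_a, d_b)
    sigma = reduced_density_matrix(haar_state(d_a * d_b, rng), d_a, d_b)
    return trace_distance(rho, sigma)


for n_a in range(1, 10):
    print(n_a, round(subsystem_distance(10, n_a), 4))
```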
For (6.3), we need to perform an additional computation. The off-diagonal ($i \neq j$) matrix for two random states, $\mathrm{Tr}_B[|\psi_i\rangle\langle\psi_j|]$, is represented diagrammatically. To compute the trace norm, we need its integer powers, (6.26). The moments are given by a new sum over permutations (6.27). Here, $S_{\text{odd}}$ represents the set of permutations where, within each cycle, the difference between consecutive numbers is always odd. For example, the cycle (1, 2, 5, 6) is allowed, but (2, 4, 5) is not. Crucially, the identity permutation is not an allowed permutation. If we want to maximize the number of dashed loops for small $d_A/d_B$, $\tau$ must be composed of $\alpha$ non-crossing two-cycles. The degeneracy is given by the Catalan number. The trace norm is obtained from the $\alpha \to 1/2$ limit. When $d_A/d_B$ is large, the cyclic permutation is an allowed permutation and will dominate. Taking the $\alpha \to 1/2$ limit then tells us that $\|\mathrm{Tr}_B[|\psi_i\rangle\langle\psi_j|]\|_1$ is exponentially close to one. For finite energy eigenstates, this translates to an analogous integral in which $G_\alpha(E)$ takes a modified form. Due to the factor of 3, $E_1$ will be larger than $E_2$. For sufficiently large $f$, the saddle point will fall in the other regime, such that the numerator and denominator are identical at $\alpha = 1/2$, leading to $\|\mathrm{Tr}_B[|\psi_i\rangle\langle\psi_j|]\|_1 \approx 1$ at leading order.
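The combinatorial claims in this paragraph are easy to check by enumeration. The sketch below encodes membership in $S_{\text{odd}}$ under our reading that the consecutive-difference rule also applies to the wrap-around step from the last entry of a cycle back to the first; this makes every 1-cycle (difference 0), and hence the identity, forbidden, consistent with the text. It then counts the permutations of $2\alpha$ elements built from $\alpha$ non-crossing two-cycles that lie in $S_{\text{odd}}$; the count is the Catalan number $C_\alpha$. Function names are hypothetical.

```python
import math


def cycles_allowed(cycles):
    """S_odd membership: every cyclically consecutive difference inside each cycle is odd.

    Assumption: the wrap-around step (last entry -> first entry) is included, so a
    1-cycle (difference 0) or any odd-length cycle is never allowed.
    """
    for cyc in cycles:
        k = len(cyc)
        if any((cyc[(i + 1) % k] - cyc[i]) % 2 == 0 for i in range(k)):
            return False
    return True


def pairings(points):
    """Yield every perfect matching of the sorted list `points` as 2-cycles (a, b) with a < b."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for i, partner in enumerate(rest):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail


def non_crossing(pairs):
    """True if no two chords (a, b), (c, d) interleave as a < c < b < d."""
    return not any(a < c < b < d for (a, b) in pairs for (c, d) in pairs)


def count_dominant(alpha):
    """Permutations of 1..2*alpha made of alpha non-crossing 2-cycles that lie in S_odd."""
    points = list(range(1, 2 * alpha + 1))
    return sum(1 for p in pairings(points) if non_crossing(p) and cycles_allowed(p))


print(cycles_allowed([(1, 2, 5, 6)]), cycles_allowed([(2, 4, 5)]), cycles_allowed([(1,), (2,)]))
for alpha in range(1, 6):
    print(alpha, count_dominant(alpha), math.comb(2 * alpha, alpha) // (alpha + 1))
```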

Holographic states
We could now simply posit that because holographic systems are believed to be chaotic, their eigenstates will also have a spatial decomposition according to (6.4) and thus will obey the subsystem ETH for $f < 1/2$. However, this line of reasoning is somewhat unsatisfying because it is not constructive. Instead, we implement a gravitational calculation, using the fixed-area state analysis from Section 4, to evaluate the PRRE in normal states without any areas fixed. Our strategy follows Ref. [107] in manipulating the gravitational path integral into a form identical to (6.7), thereby justifying the use of (6.4) for holographic eigenstates when computing the PRRE. Due to the similarities with Ref. [107], we keep the derivation brief and refer the interested reader to the original literature.
To begin, we make the assumption that black hole microstates can be represented as a random superposition of energy eigenstates in a microcanonical energy window, with $\beta$ an effective temperature. These states are believed to be holographically dual to black hole geometries with end-of-world (EOW) branes specifying the microstate lying behind the horizon [108][109][110][111][112], similar to the PSSY model. The corresponding density matrix is represented as the path integral on a strip of width $\beta$ with boundary conditions determined by $\hat c$. We will consider two microstates in the same energy window, corresponding to two independent sets of Gaussian random variables, $\hat c$ and $\hat d$. As argued in Ref. [107], after disorder averaging, the random variables match up the boundary conditions of the strips according to the same Wick contractions previously discussed for Haar random states. Therefore, the path integral is given by a sum over all allowed Wick contractions, with $M_i$ the corresponding replica manifolds. Using the holographic dictionary, this is a sum over bulk geometries with asymptotic boundary conditions $M_i$. In these bulk geometries, the EOW branes have disappeared due to the disorder averaging. This is an alternative way to see the reduction of bulk saddles from $S_{\alpha+m}$ to $S_\alpha\times S_m$. In general, solving the bulk equations of motion to evaluate the path integrals on shell is very difficult. We use the "double-defect" construction of Ref. [107] to separate the action into a bulk contribution $I_{\text{bulk}}(g,\phi,E)$ and actions $I_{\text{brane}}(\Sigma_1)$ and $I_{\text{brane}}(\Sigma_2)$ for cosmic branes $\Sigma_1$ and $\Sigma_2$ located at the two extremal surfaces, with $k_i$ playing the role of the number of cycles in the Wick contraction corresponding to $M_i$. The sum over $M_i$ may be done prior to the path integral, and we also take $m \to 1-\alpha$ to arrive at an expression involving $\Delta I_{\text{brane}} := I_{\text{brane}}(\Sigma_1) - I_{\text{brane}}(\Sigma_2)$.
Next, we make use of the fixed-area basis by postponing the integrals over the areas of the branes until the very end, such that the integral is rewritten in terms of $P(A_1, A_2)$, the (unnormalized) probability of being in the state with areas $A_1$ and $A_2$, given by a path integral $\int Dg\,D\phi$ at fixed areas. For high-energy eigenstates, $P(A_1, A_2)$ will localize to a trajectory where $A_2$ is a function of $A_1$ [107]. Finally, to compare with (6.7), we want to change the integration variable from $A_1$ to the energy density in region $A$, $\mathcal{E}$. Using the Bekenstein-Hawking formula [2], we write the areas in terms of entropy densities, where $A_\infty$ is the divergent piece of the Ryu-Takayanagi surface. This piece approximately cancels in all expressions because we are in the high-energy limit, where the surfaces are approximately purely radial until they reach the horizon and subsequently tightly wrap the horizon. $E(A_1, A_2)$ is the ADM energy, which is a function of the horizon area $A_{BH} \approx A_1 + A_2 - 2A_\infty$. In a saddle point approximation, the probability then becomes [107]

$$P(\mathcal{E}) \approx e^{S_A(\mathcal{E}) + S_B(E - \mathcal{E})}. \qquad (6.44)$$

In total, the resulting formula is evidently identical to (6.7), so we conclude that the PRRE for holographic theories is exponentially small in the entropy when $f < 1/2$, and subsystem eigenstate thermalization holds.

Discussion
We take the opportunity to now comment on future directions that are out of the scope of this work but deserve attention.
On the formal end, one may worry that our calculations have assumed finite Hilbert space dimensions and a tensor factorization of the Hilbert space into $H_A \otimes H_B$. Neither holds in quantum field theory, which is why reduced density matrices and von Neumann entropies are not well-defined there. However, the relative entropy and related quantities are well-defined in the continuum using modular theory (see, for example, Refs. [113,114]). For this reason, we expect that our calculations are accurate even though we ignored this subtlety. However, this expectation is not guaranteed. In particular, the rank deficiency in the reduced density matrices that led to infinite relative entropy has no analog in the continuum because subregions are described by Type III von Neumann algebras, which are roughly "full rank." More practically, we were only able to compute the relative entropies of the simple PSSY toy model of black hole evaporation. We expect that this model captures the essential features of the distinguishability of evaporating black holes, like the wormhole contributions that restore unitarity. It is an important future direction to complete analogous calculations in more realistic models of black hole evaporation. Though a complete calculation including all saddles, as in JT gravity, is most likely out of reach, one may be able to identify the saddles that lead to an $O(e^{-1/G_N})$ deviation of the fidelity from one prior to the Page time and an $O(1)$ deviation after the Page time.
For non-evaporating black holes, our calculation was limited to large black holes away from phase transitions. It is clearly of interest to generalize these holographic calculations to smaller, normal black holes. Moreover, there may be interesting cross-over behavior in relative entropies near the phase transitions. In the Page states, the transitions were $O(1)$ (e.g. for the relative entropy, $D(\rho_A\|\sigma_A) = 3/2$ at the transition). However, for finite energy states, there may be enhanced corrections analogous to the $O(\sqrt{V})$ corrections in the von Neumann entropy [13,107,115,116]. These should be visible in chaotic, nonholographic systems as well. Preliminary numerical results were given in Ref. [3], though a more systematic study of both integrable and chaotic Hamiltonians is warranted.
Finally, it would be interesting to study random tensor networks that act as quantum error correcting codes from "bulk" Hilbert spaces to "boundary" Hilbert spaces. These quantum error correcting codes are more closely related to semiclassical holographic states than the tensor network states studied in Section 5. Using these tensor networks, various interesting corrections to the JLMS formula [81] may be explored. We expect this to have important implications for approximate quantum error correction and entanglement wedge reconstruction in AdS/CFT.

From the Cauchy transform, one may extract the spectral measure using a Stieltjes transformation. Sometimes, we know that random variables $X$ and $Y$ have spectral measures $\mu_X$ and $\mu_Y$ but want to know the spectral measure of their sum, $X+Y$, or product, $XY$. For free $X$ and $Y$, the spectral measure of their sum is the free additive convolution $\mu_{X+Y} := \mu_X \boxplus \mu_Y$, while the spectral measure of their product is the free multiplicative convolution $\mu_{XY} := \mu_X \boxtimes \mu_Y$. In order to obtain the free additive convolution, it is convenient to introduce the R-transform. The R-transform of a sum of free random variables is given by the sum of their individual R-transforms, $R_{X+Y}(z) = R_X(z) + R_Y(z)$. For the free multiplicative convolution, it is convenient to introduce the S-transform. The S-transform of a product of free random variables is given by the product of their individual S-transforms, $S_{XY}(z) = S_X(z)S_Y(z)$. The key to the usefulness of free probability for us is that independent Wishart random matrices are asymptotically free as the dimensions become large. Their empirical spectral measure is given by the Marchenko-Pastur distribution

$$\mu_{MP}(x) = \left(1 - \frac{1}{c}\right)^{+}\delta(x) + \frac{\sqrt{(x_+ - x)(x - x_-)}}{2\pi c\,x}\,\mathbf{1}_{[x_-,x_+]}(x),\qquad x_\pm = (1\pm\sqrt{c})^2,$$

where $c := d_A/d_B$ is the rectangular parameter and $x := d_A\lambda$.
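Petz Rényi relative entropy
For the PRRE, we need to evaluate $\mathrm{Tr}\,\rho_A^{\alpha}\sigma_A^{1-\alpha}$. Because $\rho_A$ and $\sigma_A$ are asymptotically free random variables with Marchenko-Pastur distributions, the normalized trace factorizes and

$$\mathrm{Tr}\,\rho_A^{\alpha}\sigma_A^{1-\alpha} \approx \int dx\,\mu_{MP}(x)\,x^{\alpha}\int dy\,\mu_{MP}(y)\,y^{1-\alpha}.$$

These integrals may be evaluated to reproduce (3.34), the result from the replica trick.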
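A short quadrature sketch of this step, under two assumptions we make explicit: $c \leq 1$, so $\mu_{MP}$ has no atom at zero, and the factorization $\mathrm{Tr}\,\rho_A^{\alpha}\sigma_A^{1-\alpha} \approx \langle x^{\alpha}\rangle\langle x^{1-\alpha}\rangle$ with $x = d_A\lambda$. A cosine substitution removes the square-root endpoints, so a plain midpoint rule converges quickly. As a cross-check, we compare the $\alpha \to 1$ limit, $\langle x\log x\rangle - \langle\log x\rangle$, with the closed form $c/2 + 1 + \frac{1-c}{c}\log(1-c)$, which we obtained from Page's $\langle x\log x\rangle = c/2$ and the standard Marchenko-Pastur log-moment; it tends to $3/2$ as $c \to 1$. Function names are ours.

```python
import math


def mp_average(f, c, n=2000):
    """<f(x)> under the Marchenko-Pastur law with 0 < c <= 1 (no atom at zero).

    Density sqrt((x_+ - x)(x - x_-)) / (2 pi c x) on [x_-, x_+], x_pm = (1 pm sqrt(c))^2.
    With x = m + r cos(theta), |dx| sqrt(...) = r^2 sin^2(theta) d(theta), so the
    integrand is smooth and periodic and the midpoint rule converges fast.
    """
    lo, hi = (1 - math.sqrt(c)) ** 2, (1 + math.sqrt(c)) ** 2
    m, r = (hi + lo) / 2, (hi - lo) / 2
    total = 0.0
    for k in range(n):
        theta = math.pi * (k + 0.5) / n
        x = m + r * math.cos(theta)
        total += f(x) * (r * math.sin(theta)) ** 2 / (2 * math.pi * c * x)
    return total * math.pi / n


def prre(alpha, c):
    """Petz Renyi relative entropy from the factorized moments <x^alpha><x^(1 - alpha)>."""
    moment = mp_average(lambda x: x ** alpha, c) * mp_average(lambda x: x ** (1 - alpha), c)
    return math.log(moment) / (alpha - 1)


def relative_entropy(c):
    """alpha -> 1 limit of prre: <x log x> - <log x>."""
    return mp_average(lambda x: x * math.log(x), c) - mp_average(math.log, c)


def relative_entropy_closed_form(c):
    """Valid for 0 < c < 1."""
    return c / 2 + 1 + (1 - c) / c * math.log(1 - c)


c = 0.25
for alpha in (0.25, 0.5, 0.75, 0.9999):
    print(alpha, prre(alpha, c))
print(relative_entropy(c), relative_entropy_closed_form(c))
```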
Sandwiched Rényi relative entropy
For the SRRE, we need to evaluate the averaged moments of $\sigma_A^{\frac{1-\alpha}{2\alpha}}\rho_A\sigma_A^{\frac{1-\alpha}{2\alpha}}$. Using the replica trick, we succeeded for integer $\alpha$, but were unable to analytically continue to real-valued $\alpha$ in order to evaluate the Uhlmann fidelity at $\alpha = 1/2$. Here, we evaluate the spectrum of $\sigma_A^{1/2}\rho_A\sigma_A^{1/2}$ (equivalently, of $\rho_A\sigma_A$) to accomplish this goal. This is the free multiplicative convolution of Marchenko-Pastur laws, $\mu_{MP}\boxtimes\mu_{MP}$, and has been called a generalized Fuss-Catalan distribution [119]. The Fuss-Catalan distributions themselves were first derived in Ref. [120]. The S-transform for the Marchenko-Pastur distribution (in the variable $x$) is given by

$$S_{MP}(z) = \frac{1}{1+cz}.$$
Therefore, the S-transform for $\rho_A\sigma_A$ is

$$S_{\rho_A\sigma_A}(z) = S_{MP}(z)^2 = \frac{1}{(1+cz)^2}. \qquad (A.13)$$

Plugging this into (A.8), we find a cubic equation. Taking the correct root of this cubic equation and applying the Stieltjes transformation, we find the spectral measure. We compare the derived distribution with numerics in Fig. 8.

Figure 8. The probability density function for the generalized Fuss-Catalan distribution is shown (grey lines) in comparison with numerics. On the left, the blue, green and red dots correspond to $c = 2^0, 2^{-2}, 2^{-4}$, respectively. On the right, the blue, green and red dots correspond to $c = 2^0, 2^{2}, 2^{4}$, respectively. The total system size is $2^{16}$ and we disorder average over $10^3$ realizations.
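Trace distance
The trace distance is defined using the trace norm of the difference of $\rho_A$ and $\sigma_A$. This is tailor-made for a computation in free probability because of free additive convolution. The R-transform for the Marchenko-Pastur distribution (in the variable $x$) is $R_{MP}(z) = \frac{1}{1-cz}$. Because we are taking the difference, we must rescale the second R-transform, $R_{-\sigma_A}(z) = -R_{MP}(-z) = -\frac{1}{1+cz}$. Therefore, the R-transform of the difference is

$$R_{\rho_A-\sigma_A}(z) = \frac{1}{1-cz} - \frac{1}{1+cz} = \frac{2cz}{1-c^2z^2}.$$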

B Commutation of ensemble average and logarithm
We have been using the replica trick throughout the paper to compute relative entropies. This has involved evaluating ensemble averages of traces of powers of density matrices and then taking a logarithm. In general, the ensemble average and logarithm do not commute.
In this appendix, we show that the two operations approximately commute in the large-$N$ limit. To properly take the average of a logarithm, we need an additional replica trick,

$$\overline{\log X} = \lim_{q\to 0}\frac{\overline{X^{q}} - 1}{q}.$$

For illustration, we work with the PRRE, though the argument is the same for all other quantities. In diagrams, the necessary moments $\overline{\left(\mathrm{Tr}\,\rho_A^{\alpha}\sigma_A^{m}\right)^{q}}$ are represented by $q$ copies (blocks) of the diagram for $\mathrm{Tr}\,\rho_A^{\alpha}\sigma_A^{m}$ (B.2), where there are $q$ total blocks. As a sum over permutations, this is where in cycle notation (1 + q, 2 + q, . . . , α + m + q).

C Equivalence with Haar unitary tensor networks
Frequently, random tensor networks are constructed from projected Haar random states. This is in fact equivalent to the Gaussian random networks we use, for the following reason.
In the Haar random construction, every vertex of degree $k$ is a state of $k$ qudits projected onto a random state $U|0\rangle$, where $U$ is a Haar random unitary and $|0\rangle$ is any fixed state. This gives the state $\langle 0|^{\otimes k}U^\dagger|i_1\rangle\cdots|i_k\rangle$. Denoting the set of $k$ indices by one index $i$, this exactly corresponds to the Gaussian tensor network with the identification $U_{i,0} \leftrightarrow X_i^*$. Every edge corresponds to a maximally entangled pair in the projected Haar random network, which is just the index contraction in the Gaussian network. The projected unitaries indeed have a Gaussian distribution, since (see e.g. Ref. [121])

$$\overline{X^*_{i_1}\cdots X^*_{i_n}X_{j_1}\cdots X_{j_n}} = \overline{U_{i_1,0}\cdots U_{i_n,0}\,U^\dagger_{0,j_1}\cdots U^\dagger_{0,j_n}} = \sum_{\sigma,\tau\in S_n}\delta_{i_1,j_{\sigma(1)}}\cdots\delta_{i_n,j_{\sigma(n)}}\,\mathrm{Wg}(n,\tau\circ\sigma^{-1}) \propto \sum_{\sigma\in S_n}\delta_{i_1,j_{\sigma(1)}}\cdots\delta_{i_n,j_{\sigma(n)}}, \qquad (C.1)$$

and the average vanishes for unequal numbers of $X$'s and $X^*$'s, just as for Gaussian variables.
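A Monte Carlo sketch of the statement that projected Haar tensors behave like normalized Gaussian ones. The sampling recipe and all names are our choices: Haar unitaries are drawn by QR-decomposing a complex Ginibre matrix and absorbing the phases of $R$'s diagonal, and we compare $\overline{|U_{0,0}|^4}$ with the same moment of a normalized complex Gaussian vector and with the exact value $2/(d(d+1))$. Agreement illustrates that the column $U_{i,0}$ is distributed as a Gaussian vector divided by its norm, which is why (C.1) matches the Gaussian Wick structure up to an overall factor.

```python
import numpy as np


def haar_unitary(d, rng):
    """Haar-random d x d unitary: QR of a complex Ginibre matrix with R's diagonal phases absorbed."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))   # multiplies column j of q by the phase of r[j, j]


def fourth_moments(d=4, samples=20000, seed=1):
    """Estimate E|U_{0,0}|^4 for Haar U and for a normalized complex Gaussian vector."""
    rng = np.random.default_rng(seed)
    haar = np.mean([abs(haar_unitary(d, rng)[0, 0]) ** 4 for _ in range(samples)])
    g = rng.normal(size=(samples, d)) + 1j * rng.normal(size=(samples, d))
    gauss = np.mean(np.abs(g[:, 0] / np.linalg.norm(g, axis=1)) ** 4)
    return float(haar), float(gauss), 2 / (d * (d + 1))


print(fourth_moments())
```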

D Interpolating between QSD and QHT
In the main text, we characterized the asymptotic error rates in distinguishing states in the totally symmetric (QSD) and totally asymmetric (QHT) cases with the quantum Chernoff distance and the relative entropy, respectively. It is natural to ask if there is a way to interpolate between the two. This was addressed in Ref. [31], where the type II error $\beta(A)$ was optimized given the constraint that $\alpha(A) \leq \varepsilon^{1-s}\beta(A)^{s}$ for $s \geq 0$ and $\varepsilon > 0$. Note that this coincides with QHT for $s = 0$ and QSD for $s = 1$. This is referred to as s-hypothesis testing, and the error rate was proven to be given by the s-quantum divergence $\xi_s(\rho\|\sigma)$. It is instructive to examine the two familiar limits. When $s = 0$, the quantity being maximized is the PRRE. We know that the PRRE is monotonically increasing in $\alpha$, so $\xi_0(\rho\|\sigma)$ is given by the relative entropy, in accordance with our expectation. When $s = 1$, the right-hand side becomes the definition of the quantum Chernoff distance. Using (3.35), we can evaluate the s-quantum divergence for random states. We plot the value of $\alpha$ that maximizes the right-hand side as a function of $s$ in Fig. 9. This function decreases monotonically from one at $s = 0$ to zero at $s = \infty$, passing through $\alpha = 1/2$ at $s = 1$.

E Entanglement plateau
Ref. [3] was not the first attempt in the literature to use related information-theoretic quantities to try to distinguish black hole microstates. A selection of previous works includes Refs. [106,122-126]. In particular, Ref. [122] used the Holevo information to distinguish black hole microstates. The Holevo information of $A$ for an ensemble of density matrices $\{\rho_i\}$ with probabilities $\{p_i\}$ is given by the average relative entropy from each microstate to the ensemble average,

$$\chi(A) = \sum_i p_i\,D(\rho_{A,i}\|\bar\rho_A) = S(\bar\rho_A) - \sum_i p_i\,S(\rho_{A,i}),\qquad \bar\rho_A := \sum_i p_i\,\rho_{A,i}.$$

This is bounded above by the Shannon entropy, $-\sum_i p_i\log p_i$, which happens to be the black hole entropy, $S_{BH}$, in this case. The authors computed each piece holographically to leading order in Newton's constant using the Ryu-Takayanagi formula [127,128] and determined that there are three phases of $\chi(A)$. The microstate entropies were computed from the two extremal surfaces of the previous section, so the transition occurred when $A$ was half the total system size. In contrast, the entropy of the ensemble-averaged state (Gibbs state) does not transition to the second extremal surface at the same point. While this was not the perspective taken in the original paper, we attribute this to the black hole contributing to the bulk entropy in the FLM formula [78]. This transition occurs when $A$ is much larger than its complement and is discussed thoroughly in Ref. [129], where it is called the "entanglement plateaux." In summary, they found the Holevo information to be identically zero until the halfway point, then to increase linearly until a critical size at which it saturates to its maximal value, the black hole entropy. The relevant extremal surfaces and behavior of the Holevo information are shown in Fig. 10.
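The three phases can be mimicked in a toy random-state model; this is our own illustration, not the holographic computation of Ref. [122]. We take $2^k$ equiprobable Haar-random states of $n$ qubits as stand-ins for microstates, so the Shannon bound $k\log 2$ plays the role of $S_{BH}$, and compute $\chi(A)$ directly from the definition. From Page's formula one expects, at leading order, $\chi(A) \approx 0$ for $n_A < n/2$, $\approx (2n_A - n)\log 2$ in between, and $\approx k\log 2$ beyond $n_A = (n+k)/2$. Qubit counts, seed, and names are arbitrary.

```python
import numpy as np


def entropy(rho):
    """von Neumann entropy in nats, dropping numerically zero eigenvalues."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log(w)).sum())


def holevo_of_subsystem(n_qubits, n_a, k_bits, seed=0):
    """chi(A) = S(average rho_A) - average S(rho_A) for 2**k_bits equiprobable Haar states."""
    rng = np.random.default_rng(seed)
    d_a, d_b = 2 ** n_a, 2 ** (n_qubits - n_a)
    rhos = []
    for _ in range(2 ** k_bits):
        v = rng.normal(size=d_a * d_b) + 1j * rng.normal(size=d_a * d_b)
        m = (v / np.linalg.norm(v)).reshape(d_a, d_b)   # A = leading n_a qubits
        rhos.append(m @ m.conj().T)                      # rho_A = Tr_B |psi><psi|
    mean_rho = sum(rhos) / len(rhos)
    return entropy(mean_rho) - sum(entropy(r) for r in rhos) / len(rhos)


# Toy version of the three phases, in bits: about 0, then about 2 n_a - n, then about k.
n, k = 8, 2
for n_a in range(1, n):
    print(n_a, round(holevo_of_subsystem(n, n_a, k) / np.log(2), 3))
```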
Using fixed-area states, we can improve upon this analysis by computing nonperturbative corrections. As already reviewed in the previous section, all pure black hole microstates will have an entropy in which the first (second) subscript applies when $A_1$ ($A_2$) is minimal.

Figure 10. Left: Two competing extremal surfaces are shown for region A. In this case, there is no bulk entropy term because the black hole is in a pure state. Center: When the volume of A, $V_A$, becomes sufficiently large ($V_A > V_{\text{crit}}$), the area of the red curve plus the black hole entropy is smaller than the area of the blue curve. The black hole entropy can either be viewed as a bulk entropy term or an additional area term. Right: The Holevo information up to nonperturbative corrections as a function of the volume of A.
For the entropy in the Gibbs state, we must account for the bulk entropy, where $n_i$ are the lengths of the cycles of $\tau$. The bulk entropy terms manifestly go away when the black hole is in a pure state. For simplicity, let us take $\rho_b$ to be the maximally mixed state with dimension $e^{S_{BH}}$. Finite temperature corrections are not needed to demonstrate our main conclusions but may be interesting. In this case, the sum simplifies. This is just a renormalization of $A_2/4G_N$ to $A_2/4G_N + S_{BH}$. This makes sense because the alternative perspective would be to fix the area of the new extremal surface that includes the black hole horizon. Computing the sum as before, we find the Gibbs-state entropy. In total, the Holevo information is