Equilibration time scales of physically relevant observables

We address the problem of understanding from first principles the conditions under which a quantum system equilibrates rapidly with respect to a concrete observable. On the one hand, previously known general upper bounds on the time scales of equilibration were unrealistically long, with times scaling linearly with the dimension of the Hilbert space. These bounds proved to be tight, since particular constructions of observables scaling in this way were found. On the other hand, the computed equilibration time scales for certain classes of typical measurements, or under the evolution of typical Hamiltonians, turn out to be unrealistically short. However, neither class of results covers physically relevant situations, which up to now had only been tractable in specific models. In this paper we provide a new upper bound on the equilibration time scales which, under some physically reasonable conditions, gives much more realistic results than previously known bounds. In particular, we apply this result to the paradigmatic case of a system interacting with a thermal bath, where we obtain an upper bound for the equilibration time scale that is independent of the size of the bath. In this way, we find general conditions that single out observables with realistic equilibration times within a physically relevant setup.

Knowing the details of how systems approach equilibrium is a major topic within statistical mechanics. However, deriving results on equilibration time scales that are both general and applicable to physically relevant situations has proven to be a challenge, and remains one of the major open problems in understanding the equilibration of quantum systems. This paper addresses the time scales for reaching equilibrium in closed quantum systems. Recently there have been promising advances [1][2][3][4][5], which add to the vast literature studying these issues in more specific models [6][7][8][9][10][11][12][13][14][15][16][17][18][19] (for recent thorough reviews of this and related topics see [20], [21] and [22]). In particular, we have learnt that typical observables (when appropriately drawn at random) equilibrate rapidly [2,3,5], and that the same is true for evolution under typical Hamiltonians [9][10][11] and for systems starting from typical nonequilibrium subspaces [4]. Remarkably, this rapid equilibration has even been observed experimentally in certain systems [5]. Yet one can construct observables that take an extremely long time to approach equilibrium, up to a time proportional to the Hilbert space dimension of the system [2,3]. Note that by fast versus slow equilibration we do not mean picoseconds versus years: for the constructions found in the papers mentioned above, "slow" can refer to time scales longer than the age of the universe.
It is important to note that the above-mentioned results do not teach us a great deal about what happens for a given, physically relevant, observable. For instance, they do not tell us the time scales of equilibration of a system interacting with an environment. Meanwhile, the typical (in the mathematical sense) measurements considered will not in general represent physically relevant observables.
Moreover, the fact that one can always find mathematical constructions of observables that equilibrate in extremely long times, as in [2] and [3], implies that extra, more physical, conditions are fundamental in singling out the observables that equilibrate within reasonable time scales.
In this paper, we consider the following physically relevant scenario: measurements on a small system which is interacting with a large, highly mixed bath via a given (non-random) Hamiltonian. The main result will be to find sufficient conditions on the initial state, observable and Hamiltonian that ensure reasonably fast time scales; in particular, time scales that do not scale with the full dimension of the Hilbert space. We find that this is the case for sufficiently mixed initial states (as thermal states at not-too-low temperatures are), provided that some natural conditions on the off-diagonal matrix elements of the observable and initial state in the energy basis are met, which essentially ensure that a wide range of frequencies is involved in the evolution. This will be applied to the paradigmatic case of a small system interacting with a thermal bath in the microcanonical ensemble [23], where we obtain an upper bound on the equilibration time scale which does not depend on the dimension of the bath. Importantly, the results obtained here do not depend on particular details of the system under consideration.
We will say that a system equilibrates when it approaches some steady state, and remains close to it, for some reasonably long time interval [24]. Since finite-dimensional systems always exhibit revivals (times, in general very long, at which the system returns arbitrarily close to its initial state), in quantum mechanics one cannot have equilibration in the strict sense. Therefore, following [25], we will say that a system equilibrates if, for most times, its state is close to some fixed steady state. This fixed steady state is then called the equilibrium state.
Here, this closeness is assessed with respect to some particular observable A, so we say that equilibration takes place if A cannot distinguish the instantaneous state from the equilibrium one. Restricting to different kinds of observables leads to different notions of equilibration. For instance, an observable acting on a subsystem probes whether that subsystem has equilibrated, and what happens in the remainder of the closed system is only relevant insofar as it affects the evolution of this subsystem. However, taking other kinds of observables, for example A being some many-body observable, gives a different view of the process. These sorts of questions are particularly relevant since experiments are bringing mesoscopic quantum systems within reach of observation [26][27][28][29][30]. Notice that these situations are in general not described by master equations, and one usually needs to solve the actual evolution of the system in order to learn about time scales of equilibration.
We start in Section I by introducing the notions needed in this paper and a statement of the main result. Section II contains a general upper bound on the time-averaged distance between the instantaneous and equilibrium states, and an analysis of the time decay of this bound. Section III contains an expression for the time scale of equilibration, which depends on the observable, state and Hamiltonian under consideration; this is the first main proof of the paper. In Section IV we apply the result to the case of a system interacting with a thermal bath in the microcanonical ensemble, an important application of the previous part. We end in Section V with an analysis of the conditions necessary to obtain reasonably fast equilibration. All detailed calculations can be found in the Appendices.

I. SETTING AND SPECIAL CASES OF THE MAIN RESULTS
Consider a closed quantum system with Hamiltonian $H$ and initial state given by the density matrix $\rho_0$ on a Hilbert space $\mathcal{H}$. We start by focusing on a weak notion of distance between states, based on comparing the instantaneous expectation value of an observable $A$ with its equilibrium expectation value,

$$D_A(\rho_t,\omega) \equiv \frac{|\mathrm{Tr}(A\rho_t) - \mathrm{Tr}(A\omega)|}{2\|A\|}, \tag{1}$$

where the evolved state is $\rho_t = e^{-iHt}\rho_0 e^{iHt}$, and $\omega = \langle\rho_t\rangle_\infty$ is the equilibrium state [25,31], with $\langle f(t)\rangle_T = \frac{1}{T}\int_0^T f(t)\,dt$ denoting a time average (and $\langle\cdot\rangle_\infty$ its limit $T\to\infty$). Note that the equilibrium state is simply the initial state decohered in the energy basis, since the infinite time average removes all oscillating terms. The operator $A$ is assumed Hermitian, with $\|A\|$ denoting its spectral norm [32]. With this definition $0 \le D_A \le 1$. For simplicity we take units such that $\hbar = 1$.
Obviously, equilibration of expectation values does not imply equilibration in general, since very different distributions can have the same expectation value. However, even for this weak notion of equilibration no reasonable time scale bounds for physically relevant observables were known up to now. Furthermore, it is easy to extend our calculations to a stricter notion of equilibration, the distinguishability between $\rho_t$ and $\omega$ given a measurement of $A$ (for completeness we show this in Appendix A). In order to distinguish the quantity $D_A(\rho_t,\omega)$ from the actual distinguishability, we will call it the weak-distinguishability.
We can express the time average of the weak-distinguishability as

$$\langle D_A(\rho_t,\omega)\rangle_T = \frac{1}{2\|A\|}\left\langle\Big|\sum_{j,k}(\rho_{jk}-\omega_{jk})A_{kj}\,e^{-i(E_j-E_k)t}\Big|\right\rangle_T = \frac{1}{2}\left\langle\Big|\sum_{\alpha} v_\alpha e^{-iG_\alpha t}\Big|\right\rangle_T, \tag{2}$$

where energy levels are denoted by $E_j$, and the matrix elements of the initial state, equilibrium state, and observable in the energy basis are $\rho_{jk}$, $\omega_{jk}$, and $A_{jk}$ respectively [1]. The index $\alpha$ represents pairs $(j,k)$ of levels with distinct energies; we denote the corresponding energy gap by $G_\alpha = E_j - E_k$, and define the coefficients

$$v_\alpha \equiv \frac{\rho_{jk}A_{kj}}{\|A\|}. \tag{3}$$

Notice that only terms with non-zero energy gaps appear in the sum in (2), since $\omega_{jk} = \rho_{jk}$ for $E_j = E_k$. Our aim is to prove that the time average of the weak-distinguishability considered above becomes small. Since $D_A$ is a non-negative quantity, this would allow us to conclude that for most times the weak-distinguishability is small, showing that equilibration occurs. The main objective of this paper is to determine, or at least to upper bound, the time scale $T_{\rm eq}$ in which this decay happens.
The following normalised distribution will be crucial for our proofs:

$$p_\alpha \equiv \frac{|v_\alpha|}{Q}, \tag{4}$$

with the normalisation factor

$$Q \equiv \sum_\alpha |v_\alpha|. \tag{5}$$

The distribution $p_\alpha$ contains information on all the physical quantities relevant for the dynamics, namely the observable $A$, the initial state $\rho_0$ and the Hamiltonian $H$, and is a measure of the significance of the different energy gaps $G_\alpha$. Our main technical result is a general bound on equilibration times for observables when the initial state is highly mixed (Theorem 6). Before embarking on the proofs of our general technical results, it may be illuminating to see how they apply in certain special cases of physical interest. The first concerns a small system interacting with a bath that is in a maximally mixed state. The second is a version of our main physical theorem (Theorem 8), in which the bath is in a microcanonical state.
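A quick numerical check of (2) may help. The following Python sketch (assuming NumPy; the random Hamiltonian, state and observable are hypothetical test data, not a model from this paper) computes the coefficients $v_\alpha$ of (3) in the energy eigenbasis, together with $Q$ and $p_\alpha$ as defined in (4) and (5), and compares $D_A(\rho_t,\omega)$ evaluated directly from (1) with $\frac12|\sum_\alpha v_\alpha e^{-iG_\alpha t}|$ at a few times.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(d):
    """Random Hermitian matrix (hypothetical test data)."""
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (m + m.conj().T) / 2

def random_density_matrix(d):
    """Random full-rank density matrix (hypothetical test data)."""
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = m @ m.conj().T
    return rho / np.trace(rho).real

d = 6
H, A, rho0 = random_hermitian(d), random_hermitian(d), random_density_matrix(d)

# Energy eigenbasis of H (a generic random H has no degeneracies).
E, V = np.linalg.eigh(H)
rho_e = V.conj().T @ rho0 @ V
A_e = V.conj().T @ A @ V
norm_A = np.linalg.norm(A, ord=2)          # spectral norm ||A||
omega_e = np.diag(np.diag(rho_e))          # equilibrium state: rho0 dephased in the energy basis

def weak_distinguishability(t):
    """D_A(rho_t, omega) = |Tr(A rho_t) - Tr(A omega)| / (2 ||A||), eq. (1)."""
    rho_t = rho_e * np.exp(-1j * np.subtract.outer(E, E) * t)   # rho_jk e^{-i(E_j - E_k) t}
    return abs(np.trace(A_e @ (rho_t - omega_e))) / (2 * norm_A)

# Pairs alpha = (j, k) with distinct energies, gaps G_alpha and coefficients v_alpha.
pairs = [(j, k) for j in range(d) for k in range(d) if not np.isclose(E[j], E[k])]
G = np.array([E[j] - E[k] for j, k in pairs])
v = np.array([rho_e[j, k] * A_e[k, j] / norm_A for j, k in pairs])

def via_gaps(t):
    """|sum_alpha v_alpha e^{-i G_alpha t}| / 2, the integrand of eq. (2)."""
    return abs(np.sum(v * np.exp(-1j * G * t))) / 2

Q = np.sum(np.abs(v))
p = np.abs(v) / Q                           # normalised distribution p_alpha
for t in [0.0, 0.7, 3.1]:
    print(t, weak_distinguishability(t), via_gaps(t))
```

The two printed columns should agree to rounding error at every time, since the diagonal and degenerate terms cancel between $\rho_t$ and $\omega$.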
Let us first consider a small system $S$ of dimension $d_S$ interacting with a large bath of dimension $d_B$ in the maximally mixed state $\mathbb{1}_B/d_B$. We can then prove the following (this follows straightforwardly from Theorem 6 by taking $A = A_S\otimes\mathbb{1}_B$ and $Q$ bounded by (29)).
Theorem 1 (Bound for system interacting with maximally-mixed bath). For any system observable A = where and a(ǫ) and δ(ǫ) depend on the distribution p α and an arbitrary parameter ǫ > 0. They are described briefly below, and defined in Proposition 5.
Crucially, we will show in Section II that if the initial state ρ_0, observable A, and Hamiltonian H are such that p_α is spread over many different energy gaps and approximately unimodal, then we can choose ε such that δ(ε) ≪ 1 and a(ε) ∼ 1. We will argue that this is to be expected for a wide range of initial states in systems with interacting Hamiltonians, and in Appendix F we show it explicitly in a simulation of a spin ring, i.e. a 1D Ising model with a transverse magnetic field and periodic boundary conditions. Moreover, we will argue in Section IV B that we would expect δ(ε) to decrease as the size of the bath increases, hence the second term in eq. (6) becomes small for large baths, and we obtain that equilibration occurs for large enough times T.
We can think of Theorem 1 as describing the system coupled to an infinite-temperature bath. To extend the analysis to a more physically realistic finite-temperature bath (with inverse temperature $\beta$), we consider a bath which is initially in the microcanonical ensemble. Hence, the bath starts in the state $\rho_B = \mathbb{1}^\Delta_B/d^\Delta_B$, where $\mathbb{1}^\Delta_B$ is the projector onto some microcanonical window of width $\Delta$ and dimension $d^\Delta_B$. We can then prove the following (this is Theorem 8 in Section IV, applied to the special case in which $A = A_S\otimes\mathbb{1}_B$; Theorem 8 also applies to general observables).
Theorem 2 (Bound for system interacting with thermal bath). For any system observable $A = A_S\otimes\mathbb{1}_B$, initial state $\rho_0 = \rho_S\otimes\mathbb{1}^\Delta_B/d^\Delta_B$, Hamiltonian $H = H_B + H_S + H_I$, and any $K > 0$ and $\epsilon > 0$, the weak-distinguishability satisfies where $a(\epsilon)$ and $\delta(\epsilon)$ are defined in Proposition 5. For a bath with density of states proportional to $e^{\beta E}$ in the vicinity of the microcanonical window, Considering a sufficiently large bath and choosing the constant $K$ such that the last term is also small, we obtain that equilibration eventually occurs for large enough times $T$.
More precisely, we find that the system will be equilibrated with respect to A, in the sense described above, for times T ≫ T eq , where Crucially, note that if interactions between the system and bath are short-range, such that H I only couples the system to a finite region in the bath (e.g. nearest neighbour interactions in a spin lattice) then T eq does not scale with the size of the bath. Instead it depends on details of the system and its coupling to the environment, and can be easily calculated from the initial state, observable and Hamiltonian once a(ǫ) has been estimated.
In the next section we will show a general bound on the weak-distinguishability, setting the ground for proving the above results.

II. GENERAL BOUND ON AVERAGE DISTANCE
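Since $D_A$ is a non-negative quantity, it satisfies

$$\langle D_A(\rho_t,\omega)\rangle_T \le 2\pi\,\langle D_A(\rho_t,\omega)\rangle_{L_T}, \tag{11}$$

where

$$\langle f(t)\rangle_{L_T} \equiv \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{T\,f(t)}{T^2+t^2}\,dt$$

denotes the Lorentzian time average of the function $f$ [3]. The same inequality holds for any non-negative function of time (in particular for $D_A^2$), because the Lorentzian weight $T/[\pi(T^2+t^2)]$ is at least $1/(2\pi T)$ for $0 \le t \le T$. Upper bounding the sum in (2) by taking the absolute value of all of the terms, incorporating (11), and using the fact that $\langle e^{i\nu t}\rangle_{L_T} = e^{-|\nu|T}$, we get

$$\langle D_A(\rho_t,\omega)\rangle_T^2 \le \langle D_A(\rho_t,\omega)^2\rangle_T \le \frac{\pi}{2}\sum_{\alpha,\alpha'}|v_\alpha||v_{\alpha'}|\,e^{-|G_\alpha-G_{\alpha'}|T} = \frac{\pi Q^2}{2}\sum_{\alpha,\alpha'}p_\alpha p_{\alpha'}\,e^{-|G_\alpha-G_{\alpha'}|T},$$

with $p_\alpha \equiv |v_\alpha|/Q$ and $Q \equiv \sum_\alpha|v_\alpha|$. We are interested in the decay in time of $\langle D_A(\rho_t,\omega)\rangle_T$. For the normalised probability distribution $p_\alpha$ we define the function $\xi_p$ as follows.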
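The two facts about the Lorentzian average used above can be checked by sampling. A Lorentzian weight of width $T$ is a Cauchy distribution with scale $T$, so in the sketch below (assuming NumPy's `standard_cauchy` sampler; the test function `f` is an arbitrary illustrative choice) the Monte Carlo estimate of $\langle\cos(\nu t)\rangle_{L_T}$ should match $e^{-|\nu|T}$ (the imaginary part averages to zero by symmetry), and the finite-time average of a non-negative function should not exceed $2\pi$ times its Lorentzian average.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2.0
n = 1_000_000

# Times drawn with the Lorentzian weight T / (pi * (T**2 + t**2)), i.e. Cauchy with scale T.
t_samples = T * rng.standard_cauchy(n)

def lorentzian_average(f):
    """Monte Carlo estimate of the Lorentzian time average <f(t)>_{L_T}."""
    return np.mean(f(t_samples))

def finite_time_average(f, T, m=100_000):
    """<f(t)>_T = (1/T) * integral_0^T f(t) dt, midpoint rule."""
    s = (np.arange(m) + 0.5) * (T / m)
    return np.mean(f(s))

for nu in [0.0, 0.3, 1.0]:
    estimate = lorentzian_average(lambda s: np.cos(nu * s))
    print(nu, estimate, np.exp(-abs(nu) * T))

# A non-negative test function (a bump near t = 1), used to illustrate eq. (11).
f = lambda s: np.exp(-((s - 1.0) ** 2) / 0.1)
print(finite_time_average(f, T), 2 * np.pi * lorentzian_average(f))
```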
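Definition 3. Given any normalised probability distribution $p$ over the values of a real variable $Y$, we define $\xi_p(x)$ as the maximum probability of any interval of length $x$. In particular, when $Y$ is discrete, taking values $Y_\alpha$ with probabilities $p_\alpha$,

$$\xi_p(x) = \max_{y\in\mathbb{R}}\ \sum_{\alpha:\ y\le Y_\alpha\le y+x} p_\alpha.$$

We prove in Appendix B the following.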
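For a discrete distribution, $\xi_p(x)$ can be computed exactly with a sliding window over the sorted values, since an optimal interval can always be taken to start at a point of the support. The helper `xi` below is a hypothetical sketch of this (not code from the paper); on a toy distribution it also checks the linear bound $\xi_p(x) \le (1 + x/\epsilon)\,\xi_p(\epsilon)$ of Proposition 5.

```python
import numpy as np

def xi(values, probs, x):
    """xi_p(x): the largest total probability inside any closed interval of
    length x, for a discrete distribution (Definition 3)."""
    order = np.argsort(values)
    y = np.asarray(values, dtype=float)[order]
    p = np.asarray(probs, dtype=float)[order]
    best, window, left = 0.0, 0.0, 0
    for right in range(len(y)):
        window += p[right]
        # Drop points from the left until all points in the window fit in length x.
        while y[right] - y[left] > x:
            window -= p[left]
            left += 1
        best = max(best, window)
    return best

# Toy distribution (hypothetical numbers, for illustration only).
vals = [0.0, 1.0, 2.0, 5.0]
ps = [0.1, 0.2, 0.3, 0.4]
for x in [0.0, 1.0, 2.0, 3.0, 5.0]:
    print(x, xi(vals, ps, x))

# Proposition 5: xi_p(x) <= (1 + x/eps) * xi_p(eps) for every eps > 0.
ok = all(xi(vals, ps, x) <= (1 + x / eps) * xi(vals, ps, eps) + 1e-12
         for x in [0.0, 0.5, 1.0, 2.5, 4.0] for eps in [0.5, 1.0, 2.0])
print(ok)
```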
Proposition 4 (General bound). For any initial state ρ 0 , any Hamiltonian, and any observable A, The function ξ p (x) will in general be difficult to compute explicitly, but for small x it can be bounded (and well approximated) by a linear function. We will capture this behaviour in the following.
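Proposition 5. For any distribution $p$, any $\epsilon > 0$ and any $x \ge 0$,

$$\xi_p(x) \le \left(1 + \frac{x}{\epsilon}\right)\xi_p(\epsilon).$$

It will be convenient to re-express this as

$$\xi_p(x) \le a(\epsilon)\,\frac{x}{\sigma} + \delta(\epsilon),$$

where $\sigma$ is the standard deviation of the distribution, and we define

$$a(\epsilon) \equiv \frac{\sigma\,\xi_p(\epsilon)}{\epsilon}, \qquad \delta(\epsilon) \equiv \xi_p(\epsilon).$$

FIG. 1: $a$ and $\delta$ as functions of $\epsilon$ for a binomial distribution of $2\times10^6$ randomly chosen bits (mean $10^6$, standard deviation $\approx 707$).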
Proof. Take $(n-1)\epsilon \le x < n\epsilon$, with $n \ge 1$ a natural number. The function $\xi_p$ is non-decreasing, hence $\xi_p(x) \le \xi_p(n\epsilon)$. Since $\xi_p(\epsilon)$ quantifies the maximum probability that can fit in any interval of length $\epsilon$, and an interval of length $n\epsilon$ is the union of $n$ intervals of length $\epsilon$, we also have that $\xi_p(n\epsilon) \le n\,\xi_p(\epsilon)$, which results in

$$\xi_p(x) \le n\,\xi_p(\epsilon) \le \left(1 + \frac{x}{\epsilon}\right)\xi_p(\epsilon). \qquad ∎$$

We now discuss some general properties of $\xi_p$. For many distributions $p$ we would expect to be able to find an $\epsilon$ such that $a(\epsilon) \sim 1$ (in terms of its approximate order of magnitude) and $\delta(\epsilon) \ll 1$. To visualise how this can be so, consider the case in which the distribution has essentially a single peak, and the standard deviation $\sigma$ approximately quantifies the width of this peak. In such a case, a rough estimate for the maximum probability that can fit inside an interval of length $\epsilon$ is

$$\xi_p(\epsilon) \approx \frac{\epsilon}{\sigma}. \tag{19}$$

With this estimate we indeed get $a(\epsilon) \sim 1$. Figure 1 illustrates this for the case of a binomial distribution, where $0.2 < a(\epsilon) < 0.8$ for all $\epsilon > 1/2$ shown (for $\epsilon \gg \sigma$ one has $a(\epsilon) \approx \sigma/\epsilon$, which eventually becomes small). In general the above will work when the distribution $p$ is approximately unimodal, i.e. characterised by a single distinct peak. If, on the contrary, the distribution were composed of two or more peaks, the estimate in equation (19) might not hold, as Figure 2 exemplifies.
When (19) holds, taking $\epsilon \ll \sigma$ is also enough to ensure $\delta(\epsilon) \ll 1$. Note that in Figure 1, $a(\epsilon)$ diverges for small $\epsilon$, because for a discrete distribution $\xi_p(\epsilon) \ge \max_\alpha p_\alpha > 0$ as $\epsilon \to 0$ (a divergence of this kind does not arise for continuous distributions with a bounded density in the limit $\epsilon \to 0$). To avoid such behaviour, we would typically want to choose $\epsilon$ larger than the gaps between consecutive values of the variable. Overall, we would expect to be able to find an $\epsilon$ satisfying both $a(\epsilon) \sim 1$ and $\delta(\epsilon) \ll 1$ if the distribution is approximately unimodal and spread over many different values of the random variable. A bimodal distribution, by contrast, can violate the estimate $\xi_p(\epsilon) \sim \epsilon/\sigma$, simply because one can make the standard deviation arbitrarily large by placing the peaks further apart without changing the actual value of $\xi_p(\epsilon)$.

In our particular case, Proposition 4 refers to the distribution $p_\alpha$ over the energy gaps $G_\alpha$, which depends strongly on the distribution of energy gaps of the system. For large systems with typical energy ranges (e.g. finite positive temperatures), the energy levels tend to be more densely packed at larger energies, which leads to a much larger concentration of small gaps than of large gaps. For most $A$ and $\rho_0$ we would therefore expect the distribution $p_\alpha$ to be more peaked towards the centre and to decay for larger values of the energy gaps $G$, leading to an approximately unimodal distribution over a dense spectrum as considered above. Nevertheless, this will not always be the case, as we will discuss in Section V.
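The bimodal caveat of this section can be illustrated directly. In the sketch below (assuming NumPy; the distribution, made of two equal Gaussian-shaped peaks on the integers, is a hypothetical example), moving the peaks apart leaves $\xi_p(\epsilon)$ unchanged while $\sigma$, and hence $a(\epsilon) = \sigma\,\xi_p(\epsilon)/\epsilon$, grows roughly in proportion to the separation.

```python
import numpy as np

def xi_integer(pmf, eps):
    """xi_p(eps) for a distribution on consecutive integers: a closed interval of
    length eps holds at most floor(eps) + 1 consecutive integers."""
    m = min(int(np.floor(eps)) + 1, len(pmf))
    c = np.concatenate(([0.0], np.cumsum(pmf)))
    return np.max(c[m:] - c[:-m])

def bimodal(separation, width=20.0, span=200):
    """Two equal Gaussian-shaped peaks on the integers, `separation` apart
    (a hypothetical example distribution)."""
    x = np.arange(-span, separation + span + 1)
    w = (np.exp(-x**2 / (2 * width**2))
         + np.exp(-(x - separation)**2 / (2 * width**2)))
    return x, w / w.sum()

eps = 10.0
rows = []
for separation in [400, 2000, 10000]:
    x, p = bimodal(separation)
    mean = np.sum(x * p)
    sigma = np.sqrt(np.sum((x - mean)**2 * p))
    xi = xi_integer(p, eps)
    rows.append((separation, sigma, xi, sigma * xi / eps))
    print(f"separation={separation:6d}  sigma={sigma:8.1f}  xi={xi:.5f}  a={sigma * xi / eps:.2f}")
```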

III. OBSERVABLE DEPENDENT TIME SCALE BOUND
Propositions 4 and 5 lead to the following result.
Theorem 6 (Observable dependent bound). Given an initial state ρ 0 , observable A, Hamiltonian H, and any ǫ > 0, the time averaged weak-distinguishability satisfies where σ G is the standard deviation of energy gaps G α for the distribution p α , a(ǫ) and δ(ǫ) are as in Proposition 5, and d is the rank of ω.
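Proof. Since the distribution $p_\alpha$ is symmetric with respect to interchanging the indices $\{j,k\}$ while $G_\alpha$ is antisymmetric, its mean vanishes and its variance, denoted by $\sigma_G^2$, satisfies

$$\sigma_G^2 = \sum_\alpha p_\alpha G_\alpha^2 = \frac{1}{Q\|A\|}\sum_\alpha |\rho_{jk}||A_{kj}|(E_j-E_k)^2 \ge \frac{\big|\mathrm{Tr}\big([[\rho_0,H],H]\,A\big)\big|}{Q\|A\|},$$

where the inequality follows from the triangle inequality, since $\sum_{j,k}\rho_{jk}A_{kj}(E_j-E_k)^2 = \mathrm{Tr}([[\rho_0,H],H]A)$. Notice that for a local Hamiltonian and observable, and a known initial state, this expression (combined with the bound for $Q$ which soon follows) is much simpler to compute than $\sigma_G$, since it does not require detailed knowledge of the Hamiltonian's spectrum and eigenbasis, which is needed in order to construct the distribution $p_\alpha$ in the first place. Moreover, we find an upper bound for $Q$,

$$\begin{aligned}
Q &= \frac{1}{\|A\|}\sum_\alpha|\rho_{jk}||A_{kj}| \\
&\le \frac{1}{\|A\|}\sum_{j,k}|\rho_{jk}||A_{kj}| \\
&\le \frac{1}{\|A\|}\sqrt{\sum_{j,k}|\rho_{jk}|^2}\,\sqrt{\sum_{j,k\in\mathrm{supp}\,\omega}|A_{kj}|^2} \\
&= \frac{1}{\|A\|}\sqrt{\mathrm{Tr}(\rho_0^2)}\,\sqrt{\mathrm{Tr}(\Pi_\omega A\Pi_\omega A)} \\
&\le \sqrt{d\,\mathrm{Tr}(\rho_0^2)},
\end{aligned} \tag{21}$$

where $\Pi_\omega$ projects onto all the energy levels of the Hamiltonian which occur with non-zero probability in $\rho_0$ (this is given by the support of $\omega$). In the third line we restrict to this set of energy levels and use the Cauchy-Schwarz inequality. For the last line, notice that by the Cauchy-Schwarz inequality for the scalar product $\langle X,Y\rangle = \mathrm{Tr}(X^\dagger Y)$ we have $\mathrm{Tr}(\Pi_\omega A\Pi_\omega A) \le \mathrm{Tr}(A^2\Pi_\omega)$. Then, using that $\mathrm{Tr}(PR) \le \|P\|\,\mathrm{Tr}\,R$ for any two positive semidefinite matrices $P$ and $R$, we get $\mathrm{Tr}(A^2\Pi_\omega) \le \|A\|^2\,\mathrm{Tr}\,\Pi_\omega = d\|A\|^2$. Inserting the above into eqs. (14) and (16) proves our claim.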
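Both inequalities in this proof are easy to test numerically. The sketch below (assuming NumPy, with a random Hamiltonian, observable and rank-3 initial state as hypothetical test data) builds $p_\alpha$ over pairs with distinct energies and checks three things: that its mean vanishes, that $\sigma_G^2$ is at least $|\mathrm{Tr}([[\rho_0,H],H]A)|/(Q\|A\|)$, and that $Q^2 \le d\,\mathrm{Tr}(\rho_0^2)$ with $d$ the rank of $\omega$.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8

def random_hermitian(d):
    """Random Hermitian matrix (hypothetical test data)."""
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (m + m.conj().T) / 2

H, A = random_hermitian(d), random_hermitian(d)
m = rng.normal(size=(d, 3)) + 1j * rng.normal(size=(d, 3))
rho0 = m @ m.conj().T
rho0 /= np.trace(rho0).real                          # rank-3 initial state

E, V = np.linalg.eigh(H)
rho_e = V.conj().T @ rho0 @ V
A_e = V.conj().T @ A @ V
norm_A = np.linalg.norm(A, ord=2)

G = np.subtract.outer(E, E)                          # G_jk = E_j - E_k
distinct = ~np.isclose(G, 0.0)                       # pairs with distinct energies
weights = np.abs(rho_e) * np.abs(A_e.T) / norm_A     # |v_alpha| = |rho_jk||A_kj| / ||A||
Q = weights[distinct].sum()
p = weights[distinct] / Q
gaps = G[distinct]

mean_G = np.sum(p * gaps)
sigma_G2 = np.sum(p * gaps**2)

comm = rho0 @ H - H @ rho0                           # [rho0, H]
double_comm = comm @ H - H @ comm                    # [[rho0, H], H]
lower = abs(np.trace(double_comm @ A)) / (Q * norm_A)

rank_omega = np.linalg.matrix_rank(np.diag(np.diag(rho_e)))
purity = np.trace(rho0 @ rho0).real
print(mean_G, sigma_G2, lower)
print(Q**2, rank_omega * purity)
```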
If the second term on the right-hand side of equation (20) is small, the system will eventually equilibrate with respect to $A$. The time dependence is determined by the first term. In particular, the system will be equilibrated (in the sense described in Section I) for times $T \gg T_{\rm eq}$, where It is interesting to note the dependence of the above expression on $[[\rho_0,H],H]$, which is, up to a minus sign, the second time derivative of the state at $t = 0$, since $\frac{d^2\rho_t}{dt^2}\big|_{t=0} = -[[\rho_0,H],H]$. Therefore $T_{\rm eq}$ can be alternatively written as Remarkably, the denominator of this expression is what one would expect from a Taylor expansion of the distance for short times, assuming the system is initially as far from equilibrium as possible (then the first-derivative term is 0 and one is left with the second derivative as the leading order).
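The statement that $-[[\rho_0,H],H]$ is the second time derivative of the state at $t=0$ implies $\frac{d^2}{dt^2}\mathrm{Tr}(A\rho_t)\big|_{t=0} = -\mathrm{Tr}(A[[\rho_0,H],H])$. A central finite-difference check is sketched below, assuming NumPy and random test matrices (hypothetical data).

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5

def random_hermitian(d):
    """Random Hermitian matrix (hypothetical test data)."""
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (m + m.conj().T) / 2

H, A = random_hermitian(d), random_hermitian(d)
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
rho0 = np.outer(psi, psi.conj())            # pure initial state

E, V = np.linalg.eigh(H)

def expectation(t):
    """Tr(A rho_t) with rho_t = e^{-iHt} rho0 e^{iHt}."""
    U = V @ np.diag(np.exp(-1j * E * t)) @ V.conj().T
    return np.trace(A @ U @ rho0 @ U.conj().T).real

h = 1e-4
finite_diff = (expectation(h) - 2 * expectation(0.0) + expectation(-h)) / h**2
comm = rho0 @ H - H @ rho0                  # [rho0, H]
double_comm = comm @ H - H @ comm           # [[rho0, H], H]
exact = -np.trace(A @ double_comm).real
print(finite_diff, exact)
```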
We argued earlier that we would typically expect a(ǫ) ∼ 1 and δ(ǫ) ≪ 1. However, we still have to address the size of the bound for Q given by equation (21), which could greatly influence the speed of equilibration. Notice that in general the dimension d of the Hilbert space is extremely large, since it scales exponentially with the number of constituents of the closed system being considered. Therefore, in order for this bound to show rapid equilibration we would need a very mixed initial state, spread over a significant fraction of the Hilbert space.
Moreover, the constant Q appears in the second term in Theorem 6, along with δ(ǫ). In order to show equilibration at all this second term needs to be small too.
In the next section we consider an important physical scenario and then use our bound to show reasonably fast equilibration.

IV. SYSTEM INTERACTING WITH A BATH
We now turn to the paradigmatic case of a small system interacting with a large thermal bath. This situation corresponds to decomposing the closed system considered in the previous sections into a small system $S$ and a bath $B$. By assuming the observable $A$ to be of the form

$$A = A_S\otimes\mathbb{1}_B,$$

where $A_S$ acts on the system and $\mathbb{1}_B$ is the identity acting on the bath, one can focus on the system's behaviour.
The total Hamiltonian is denoted by

$$H = H_B + H_S + H_I,$$

where $H_S$ and $H_B$ are the system and bath Hamiltonians, and $H_I$ denotes the interaction between them. We assume that the system $S$ is initially in an arbitrary state $\rho_S$ and, for simplicity, uncorrelated with the initial state $\rho_B$ of the bath, that is $\rho_0 = \rho_S\otimes\rho_B$, corresponding to a system that is initially isolated and is suddenly allowed to interact with $B$ via $H_I$.
To show that such a situation can lead to a small value for $Q$, we first consider the case in which the bath is in a maximally mixed state, with

$$\rho_0 = \rho_S\otimes\frac{\mathbb{1}_B}{d_B}.$$

In this case, since $d \le d_S d_B$ and $\mathrm{Tr}(\rho_0^2) = \mathrm{Tr}(\rho_S^2)/d_B$, it is easy to see from equation (21) that

$$Q^2 \le d_S\,\mathrm{Tr}(\rho_S^2) \le d_S. \tag{29}$$

The remainder of this section extends this simple example (which can be understood as a system interacting with an infinite-temperature bath) to the more physical case of a system interacting with a finite-temperature bath.
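The following numerical sanity check of (29) is a sketch. It uses a generic coupled Hamiltonian on $d_S d_B$ levels, a system observable $A_S\otimes\mathbb{1}_B$ and a maximally mixed bath, all random hypothetical test data. The computed $Q$ should satisfy $Q^2 \le d_S\,\mathrm{Tr}(\rho_S^2) \le d_S$ regardless of $d_B$.

```python
import numpy as np

rng = np.random.default_rng(4)
d_S, d_B = 2, 16

def random_hermitian(d):
    """Random Hermitian matrix (hypothetical test data)."""
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (m + m.conj().T) / 2

H = random_hermitian(d_S * d_B)                     # generic system-bath Hamiltonian
A = np.kron(random_hermitian(d_S), np.eye(d_B))     # system observable A_S (x) 1_B
m = rng.normal(size=(d_S, d_S)) + 1j * rng.normal(size=(d_S, d_S))
rho_S = m @ m.conj().T
rho_S /= np.trace(rho_S).real
rho0 = np.kron(rho_S, np.eye(d_B) / d_B)            # maximally mixed bath

E, V = np.linalg.eigh(H)
rho_e = V.conj().T @ rho0 @ V
A_e = V.conj().T @ A @ V
distinct = ~np.isclose(np.subtract.outer(E, E), 0.0)
Q = (np.abs(rho_e) * np.abs(A_e.T))[distinct].sum() / np.linalg.norm(A, ord=2)
bound = d_S * np.trace(rho_S @ rho_S).real
print(Q**2, bound, d_S)
```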
In what follows, given a Hamiltonian $H$ we denote an energy window of width $\Delta$ centred around an energy $E$ by its corresponding Hilbert space $\mathcal{H}^{E,\Delta}_H$, defined as

$$\mathcal{H}^{E,\Delta}_H \equiv \mathrm{span}\{|E_j\rangle : |E_j - E| \le \Delta/2\},$$

where $|E_j\rangle$ are the eigenvectors of $H$ with eigenvalues $E_j$. We will consider the state of the bath from the microcanonical ensemble viewpoint. Consequently, we consider an energy window of the bath Hamiltonian of width $\Delta$ centred around $E_B$. The subspace that this defines, $\mathcal{H}^{E_B,\Delta}_{H_B}$, will be referred to as a microcanonical window, and its dimension will be denoted by $d^\Delta_B$. The initial state of the bath is then $\rho_B = \mathbb{1}^\Delta_B/d^\Delta_B$, with $\mathbb{1}^\Delta_B$ the projector onto $\mathcal{H}^{E_B,\Delta}_{H_B}$, and the initial state of system plus bath is

$$\rho_0 = \rho_S\otimes\frac{\mathbb{1}^\Delta_B}{d^\Delta_B},$$

which we will use from now on. The width of the microcanonical window is to be taken large enough that it contains many energy levels, in particular many more than the dimension of the system, yet small in comparison to the whole spectrum of the bath Hamiltonian $H_B$.

A. Truncation of the Hilbert space
Notice that the state $\rho_0 = \rho_S\otimes\mathbb{1}^\Delta_B/d^\Delta_B$, corresponding to a bath in the microcanonical ensemble, is quite mixed. This is good news for our bound $Q^2 \le d\,\mathrm{Tr}(\rho_0^2)$ given by equation (21), since the purity of the state will be a small number. However, the presence of the dimension $d$ implies that the bound for $Q$ could still be extremely large: $\mathrm{Tr}(\rho_0^2) = \mathrm{Tr}(\rho_S^2)/d^\Delta_B$, while $d$ can be as large as $d_S d_B$, with $d_B \gg d^\Delta_B$. In this section we show a truncation method for the state and the Hilbert space which allows us to reduce the relevant dimension significantly.
As The above reasoning is proved correct in Appendix C, where we show that the trace distance between the state ρ 0 and a truncated state Πρ 0 Π is small, where Π is a projector onto the truncated subspace. More precisely, we find the following. can be truncated to the state Πρ 0 Π, with where Π projects onto the subspace H EB ,∆+2 HS +η HB +HS +HI with a width extended by η = √ 8d S H I K.
As a straightforward corollary one obtains that where $\rho_t = e^{-iHt}\rho_0 e^{iHt}$ is the evolved state. This shows that, as long as we take $K$ large enough, the two states give similar evolutions. This truncation procedure will be particularly useful to us, since the dimension of the accessible Hilbert space $\mathcal{H}^{E_B,\Delta+2\|H_S\|+\eta}_{H_B+H_S+H_I}$ is in general much smaller than the full dimension.
We also find in Appendix C that, if the density of states of the bath is denoted by $\nu_B(E)$, the dimension of the truncated state, $d_{\rm trunc} = \mathrm{rank}(\Pi)$, satisfies Meanwhile, the dimension of the (unperturbed) microcanonical window of the bath is given by

$$d^\Delta_B = \int_{E_B-\Delta/2}^{E_B+\Delta/2}\nu_B(E)\,dE.$$

Typically, thermal baths have a (coarse-grained) density of states which grows approximately exponentially with energy. Thus, if we take

$$\nu_B(E) = N e^{\beta E},$$

where $\beta$ is the inverse temperature and $N$ a normalisation constant, it is easy to obtain Note that, given that the energy width of the microcanonical window grows as the number of constituents of the bath increases, in general $\beta\Delta \gg 1$ holds for a large enough bath, in which case the last inequality is a particularly good approximation.
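For the exponential density of states, the number of bath levels in a window of width $w$ centred at $E_B$ is $\int\nu_B\,dE = (2N/\beta)\,e^{\beta E_B}\sinh(\beta w/2)$. Widening a window of width $\Delta$ by an extra width $w'$ therefore multiplies the count by $\sinh(\beta(\Delta+w')/2)/\sinh(\beta\Delta/2)$, which approaches $e^{\beta w'/2}$ when $\beta\Delta \gg 1$. The sketch below checks these elementary integrals numerically. The parameter values are illustrative, and the sketch does not reproduce the paper's bound on $d_{\rm trunc}$.

```python
import numpy as np

def levels_in_window(beta, E_center, width, N=1.0, m=200_000):
    """Number of levels, integral of nu_B(E) = N exp(beta E) over
    [E_center - width/2, E_center + width/2], by the midpoint rule."""
    edges = np.linspace(E_center - width / 2, E_center + width / 2, m + 1)
    mid = (edges[:-1] + edges[1:]) / 2
    return np.sum(N * np.exp(beta * mid)) * (width / m)

def closed_form(beta, E_center, width, N=1.0):
    """(2N / beta) * exp(beta E_center) * sinh(beta width / 2)."""
    return 2 * N / beta * np.exp(beta * E_center) * np.sinh(beta * width / 2)

beta, E_B, Delta, extra = 1.0, 0.0, 20.0, 3.0     # illustrative values; `extra` widens the window
print(levels_in_window(beta, E_B, Delta), closed_form(beta, E_B, Delta))
ratio = closed_form(beta, E_B, Delta + extra) / closed_form(beta, E_B, Delta)
print(ratio, np.exp(beta * extra / 2))
```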

B. Time scales for a system in contact with a bath
Proposition 7 allows us to truncate the microcanonical state ρ 0 to Πρ 0 Π, since the error introduced is small. This greatly reduces the dimension of the relevant Hilbert space, and consequently the corresponding bound for the constant Q in Theorem 6.
However, this reasoning would also lead us to use the truncated state in the theorem itself, which would replace $\mathrm{Tr}([[\rho_0,H],H]A)$ by $\mathrm{Tr}([[\Pi\rho_0\Pi,H],H]A)$ in (20). This not only introduces additional complexity but could also significantly weaken the bound. Moreover, even if the Hamiltonian involved nearest-neighbour interactions, $\Pi$ could be highly non-local and we may have no way of computing it. Nevertheless, we prove in Appendix D that the time average of the weak-distinguishability can be bounded with a commutator involving the original state $\rho_0$ instead of the truncated one, while still having a relevant Hilbert space with much smaller dimension than the original space.
We finally have all the ingredients to apply Theorem 6 to the case of a system in contact with a thermal bath, which turns into the following.
Theorem 8. For any observable $A$, initial state $\rho_0 = \rho_S\otimes\mathbb{1}^\Delta_B/d^\Delta_B$, Hamiltonian $H = H_B + H_S + H_I$, and any $K > 0$ and $\epsilon > 0$, the weak-distinguishability satisfies where $a(\epsilon)$ and $\delta(\epsilon)$ are as in Proposition 5, and where $\nu_B(E)$ is the density of states of the bath. Moreover, if we take the density of states of the bath to be $\nu_B(E) \propto e^{\beta E}$ (as we would expect for a thermal bath in the vicinity of the microcanonical window), we obtain from eq. (37) that Taking a system observable $A = A_S\otimes\mathbb{1}_B$ we recover the main result in Section I (Theorem 2), since the Hamiltonian $H_B$ in eq. (38) commutes with $\rho_0$ and $A$. Let us consider this result more closely. To begin with, all time-independent terms have to be small for our theorem to imply equilibration in the first place. The factors involving $K$ in equations (38) and (40) come from the truncation procedure, and are small as long as the microcanonical window and the truncation window are large enough.
The other time-independent term is $\pi\delta(\epsilon)Q^2/2$, which we have neglected so far. As discussed in Section II, for distributions $p_\alpha$ that are approximately unimodal and sufficiently spread over different values, one can estimate that $\delta(\epsilon) \sim \epsilon/\sigma_G$. Notice that as the bath grows in size one would expect this to hold for smaller values of $\epsilon$, since the distribution $p_\alpha$ would be spread over more values. We could therefore take $\epsilon$ smaller and smaller and reduce $\delta(\epsilon)$. At the same time, the bound on $Q^2$ in equation (40) will generally not grow with the dimension of the bath. To see this, note that typically (e.g. for short-range interactions in a lattice system) $\|H_I\|$ will not increase significantly as the bath size increases, and that increasing the width $\Delta$ of the microcanonical window as the bath grows will cause the bound to become tighter. Therefore, in the limit of increasing bath sizes the term $\pi\delta(\epsilon)Q^2/2$ becomes negligible, as needed. The fact that the results in Theorem 8 do not depend on the dimension of the full Hilbert space is a key feature of this paper. This is in stark contrast with previously known general upper bounds on the time scale of equilibration [1], which essentially scale with the full Hilbert space dimension.
Finally, the first term in eq. (38) determines the time decay of the weak-distinguishability, and can be interpreted in the same way as the corresponding term in Theorem 6 (see the discussion following that theorem). Notice that, once $a(\epsilon)$ is estimated, the time dependence can in general be calculated analytically for a given initial state, Hamiltonian and observable. Moreover, performing this calculation is much simpler than solving the exact time evolution, which involves commutators of all orders between the initial state and the Hamiltonian and can only be done for simple models.
It is illuminating to ask how our bound behaves in a case where no equilibration occurs. Take, for example, a spin 1/2 in the pure initial state $|\Psi\rangle = (|{\uparrow}\rangle + |{\downarrow}\rangle)/\sqrt{2}$ as the system $S$, and a bath composed of $N$ other spins in the microcanonical ensemble. Furthermore, take the Hamiltonian $H = \Omega\sigma^S_z + H_B$ and the observable $A = \sigma^S_x\otimes\mathbb{1}_B$.
Since the system does not interact with the bath, it does not equilibrate with respect to the observable $A$. The key to understanding where our bound expresses this fact is the factor $\delta(\epsilon)$. The two levels of $\Omega\sigma^S_z$ are split by $2\Omega$, and it is easy to see that the distribution $p_\alpha$ is composed of only two values, corresponding to the gaps $2\Omega$ and $-2\Omega$, each with probability 1/2. This results in $\delta(\epsilon) \ge 1/2$ for any $\epsilon$, so the time-independent term in the bound never becomes small and the bound implies no equilibration at all, as expected.
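The two-gap structure of $p_\alpha$ in this example can be confirmed numerically. The sketch below assumes NumPy. The bath is a hypothetical random 8-level Hamiltonian, and the "microcanonical window" is simply its four middle levels. The code builds the non-interacting Hamiltonian, the state and the observable above, and collects the weight of $p_\alpha$ on each gap.

```python
import numpy as np

rng = np.random.default_rng(5)
Omega = 0.7
sz = np.diag([1.0, -1.0])
sx = np.array([[0.0, 1.0], [1.0, 0.0]])

# Hypothetical bath: a random real symmetric 8-level Hamiltonian.
d_B = 8
m = rng.normal(size=(d_B, d_B))
H_B = (m + m.T) / 2
e_B, W = np.linalg.eigh(H_B)
window = np.zeros(d_B, dtype=bool)
window[2:6] = True                                  # "microcanonical window": four middle levels
rho_B = W[:, window] @ W[:, window].T / window.sum()

psi = np.array([1.0, 1.0]) / np.sqrt(2)             # |Psi> = (|up> + |down>)/sqrt(2)
rho0 = np.kron(np.outer(psi, psi), rho_B)
H = Omega * np.kron(sz, np.eye(d_B)) + np.kron(np.eye(2), H_B)   # no interaction term
A = np.kron(sx, np.eye(d_B))

E, V = np.linalg.eigh(H)
rho_e = V.T @ rho0 @ V
A_e = V.T @ A @ V
G = np.subtract.outer(E, E)
w = np.abs(rho_e) * np.abs(A_e.T)                   # |rho_jk||A_kj|
keep = ~np.isclose(G, 0.0) & (w > 1e-12)            # distinct energies, non-negligible weight
p = w[keep] / w[keep].sum()
gaps = G[keep]
for g in np.unique(np.round(gaps, 8)):
    print(g, p[np.isclose(gaps, g)].sum())
```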

C. System interacting with an environment in a pure state: the typical behaviour
So far we have considered mixed initial states of the total closed system. Here we show that our results can be extended to the typical behaviour of pure initial states of the environment that interacts with the system. Let us consider the environment's initial state to be pure, and drawn at random from the microcanonical window. Any pure state from the microcanonical window can be written as $U|\psi\rangle$, where $|\psi\rangle \in \mathcal{H}^{E_B,\Delta}_{H_B}$ is some fixed state and $U$ is a unitary operator acting on $\mathcal{H}^{E_B,\Delta}_{H_B}$. By averaging over all possible $U$'s, drawn from the Haar measure, we obtain the typical behaviour for random pure states from the microcanonical subspace $\mathcal{H}^{E_B,\Delta}_{H_B}$. It turns out that taking the initial environment state to be a pure state chosen at random from a microcanonical window leads to results very similar to those for the environment starting in the microcanonical mixed state $\mathbb{1}^\Delta_B/d^\Delta_B$. More precisely, we show in Appendix E the following.
Proposition 9 (Evolution for typical initial states of the bath). The weak-distinguishability averaged over all possible initial pure states of the environment drawn from a microcanonical window of width $\Delta$ satisfies where $\rho^U_0 = \rho_S\otimes U|\psi\rangle\langle\psi|U^\dagger$, with corresponding evolved and equilibrium states $\rho^U_t$ and $\omega^U$, and $\rho_0 = \rho_S\otimes\mathbb{1}^\Delta_B/d^\Delta_B$, with corresponding evolved and equilibrium states $\rho_t$ and $\omega$.
Since the microcanonical window is assumed to contain many more levels than the system's dimension, $d^\Delta_B \gg d_S$, the above expression implies that for typical initial pure states of the bath the evolution is as if the initial state were the microcanonical state $\rho_0 = \rho_S\otimes\mathbb{1}^\Delta_B/d^\Delta_B$. It is straightforward to combine this Proposition with Theorem 8 and show that the upper bound for the typical time scale of equilibration for a system interacting with a bath in a pure state is the same as if the bath were in the microcanonical state.
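The Haar average used here has a simple first moment. Averaging $U|\psi\rangle\langle\psi|U^\dagger$ over Haar-random $U$ on a $d$-dimensional window gives $\mathbb{1}/d$, which is exactly the microcanonical state. A Monte Carlo sketch is given below. It assumes NumPy and the standard QR-based recipe for sampling Haar unitaries, with the phases of $R$'s diagonal fixed. The window dimension is an arbitrary illustrative choice.

```python
import numpy as np

def haar_unitary(d, rng):
    """Haar-random unitary from the QR decomposition of a complex Gaussian
    matrix, with the phases of R's diagonal absorbed into Q."""
    z = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases            # scales column j of q by phases[j]

rng = np.random.default_rng(6)
d_window = 6                     # dimension of a (hypothetical) microcanonical window
psi = np.zeros(d_window, dtype=complex)
psi[0] = 1.0
samples = 20_000
avg = np.zeros((d_window, d_window), dtype=complex)
for _ in range(samples):
    phi = haar_unitary(d_window, rng) @ psi
    avg += np.outer(phi, phi.conj())
avg /= samples
print(np.max(np.abs(avg - np.eye(d_window) / d_window)))
```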

V. DISCUSSION
From previous work we know that one needs to impose further conditions in order to prove reasonably fast equilibration, since extremely slow observables can always be constructed [2,3].
In this article we have found a set of sufficient conditions that ensure this. More precisely, when the distribution $p_\alpha \equiv \frac{1}{Q_A}|\rho_{jk}||A_{kj}|$ (which characterises the energy gaps most relevant to the particular state and observable under consideration) is approximately unimodal and spread over many different values, one expects $a(\epsilon) \sim 1$ and $\delta(\epsilon) \ll 1$. For a system interacting with a thermal bath, this implies equilibration time scales that do not grow with the size of the bath, as Theorem 8 shows.
Whether the above holds ultimately comes down to the values of the off-diagonal matrix elements of the observable and initial state in the energy basis, and to the distribution of energy gaps. Nevertheless, there are general arguments indicating that $a(\epsilon) \sim 1$ and $\delta(\epsilon) \ll 1$ might hold for a wide range of systems. Firstly, in typical situations one might expect a state to be spread roughly equally over a range of energies (as happens, for example, for thermal states), and, unless the observable $A$ is fine-tuned, its components to be spread relatively smoothly over this band of energies, at least in a coarse-grained sense. Secondly, for large systems the density of energy levels tends to grow exponentially with energy in the region of finite temperature. It is easy to check that if one assumes a density of states $\nu(E) \propto e^{\beta E}$, the corresponding density of gaps decreases exponentially, $\mu(|G|) \propto e^{-\beta|G|}$. Hence, for the resulting distribution to be characterised by a single peak, it is sufficient for the matrix elements of $A$ and of the initial state $\rho_0$ to grow sub-exponentially as a function of the energy gaps. This does not seem a particularly strong assumption. Finally, even when $p_\alpha$ is not unimodal, and is instead composed of a number of distinct peaks, Theorem 8 still gives reasonable equilibration times (in particular, times that are approximately independent of the size of the bath), except when the individual peaks get sharper as the size of the bath increases. Our intuition is that such behaviour is rare, and thus that the bound will have very general applicability.
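The exponential decay of the gap density quoted above can be recovered in a few lines. As an assumption for this sketch, take $\nu(E) = \nu_0 e^{\beta E}$ on an energy window $[E_{\min}, E_{\max}]$ and count pairs of levels separated by a gap $G \geq 0$:

```latex
\mu(G) = \int_{E_{\min}}^{E_{\max}-G} \nu(E)\,\nu(E+G)\,dE
       = \nu_0^2\, e^{\beta G} \int_{E_{\min}}^{E_{\max}-G} e^{2\beta E}\,dE
       = \frac{\nu_0^2}{2\beta}\left[e^{2\beta E_{\max}}\,e^{-\beta G}
         - e^{2\beta E_{\min}}\,e^{\beta G}\right].
```

The ratio of the second term to the first is $e^{-2\beta(E_{\max}-E_{\min}-G)}$, which is negligible when $\beta(E_{\max}-E_{\min}-G) \gg 1$, leaving $\mu(G) \propto e^{-\beta G}$; since $\mu(-G) = \mu(G)$, this gives $\mu(|G|) \propto e^{-\beta|G|}$. If the typical size of $|\rho_{jk}||A_{kj}|$ at gap $G$ is some function $f(|G|)$, the weight of $p_\alpha$ near $G$ scales roughly like $f(|G|)\,e^{-\beta|G|}$, which still decays at large $|G|$ whenever $f$ grows sub-exponentially, in line with the argument above.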
The remaining question is whether physically relevant cases will in general be of this form (satisfying $a(\epsilon) \sim 1$ and $\delta(\epsilon) \ll 1$, and therefore "reasonably fast equilibrating") or of the opposite kind (violating these conditions and therefore "slow equilibrating"). Appendix F illustrates the transition to approximate unimodality, and the conditions being met, as the environment size increases in a simulation of a 1D Ising model with a transverse magnetic field and periodic boundary conditions. However, proving that this occurs in general, and finding the physical conditions under which it happens, remains an interesting open problem for future study.
It is worth comparing our conditions with the assumptions made in previous work to prove equilibration of closed quantum systems. Equilibration can be proven by assuming that the effective dimension, defined as $d_{\rm eff} = 1/\sum_j \rho_{jj}^2$ (assuming for simplicity a non-degenerate spectrum), is large [25,31], and that the Hamiltonian does not have too many degenerate energy gaps [1]. Although we do not make these assumptions explicitly, we are in some sense implicitly assuming both of them. On the one hand, a high effective dimension is related to having many energy levels populated in the system, which is necessary for the distribution $p_\alpha$ to be spread over many different values. On the other hand, a highly degenerate energy gap results in a distribution $p_\alpha$ for which $\delta(\epsilon) \ll 1$ does not hold, as the simple example after Theorem 8 illustrates.
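To make the first point concrete, the sketch below evaluates $d_{\rm eff} = 1/\sum_j \rho_{jj}^2$ for a few illustrative sets of energy-basis populations: a uniform spread over $n$ levels gives exactly $n$, while a state concentrated on a single level gives 1. The Gibbs example uses a hypothetical equally spaced spectrum and inverse temperature chosen only for illustration; the function name `effective_dimension` is likewise ours.

```python
import numpy as np


def effective_dimension(populations):
    """d_eff = 1 / sum_j p_j^2 for energy-basis populations p_j = rho_jj."""
    p = np.asarray(populations, dtype=float)
    p = p / p.sum()                  # guard against small normalization errors
    return 1.0 / np.sum(p ** 2)


uniform = np.ones(8) / 8             # spread equally over 8 levels
single_level = np.eye(8)[0]          # all weight on one level
levels = np.arange(50) * 0.1         # hypothetical equally spaced spectrum
gibbs = np.exp(-2.0 * levels)        # Boltzmann weights at inverse temperature 2

print(effective_dimension(uniform),
      effective_dimension(single_level),
      effective_dimension(gibbs))
```

The Gibbs value lands between these two extremes: it counts, roughly, how many levels carry appreciable population.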
The present paper emphasizes the importance of the off-diagonal matrix elements of the observable and initial state for the study of equilibration time scales in closed quantum systems.
The Eigenstate Thermalization Hypothesis, introduced by Deutsch and Srednicki as a sufficient condition for thermalization [33,34], has motivated extensive work on the distribution of diagonal matrix elements of observables [35][36][37][38][39][40][41][42]. However, much less work is available on the distribution of the off-diagonal matrix elements, some examples being [43][44][45][46]. The recent papers [44,45] show, in certain models, a Gaussian distribution of these matrix elements for local observables, which supports our claim that a unimodal distribution $p_\alpha$ is to be expected in many situations. Moreover, [46] numerically verifies our predictions in an experimentally realizable setup consisting of an electron in a quantum dot interacting with a bath of nuclear spins. Remarkably, the authors find that, even though our results are model-independent and not tailored to this particular system, our new bounds fall within two orders of magnitude of the actual time scale.
Our paper focuses on the equilibration of a small system with respect to a pre-equilibrated bath, but many open questions remain regarding general equilibration time scales. One direction of particular interest is an equilibration time scale for the bath itself, and which properties the bath needs in order to play its usual thermodynamic role. We hope that the tools developed here will aid further study along these lines and help shed further light on this important topic.

ACKNOWLEDGEMENTS
We would like to thank Moritz Fuchs, Daniel Hetterich, and Björn Trauzettel for many illuminating discussions, and for sharing the findings of [46] prior to publication. Part of this work was supported by the COST Action MP1209 "Thermodynamics in the quantum regime". ASLM acknowledges support from the CNPq. AJS acknowledges support from the Royal Society. AW is supported by the EU (STREP "RAQUEL"), the ERC (AdG "IRQUAT"), the Spanish MINECO (grant FIS2013-40627-P) with the support of FEDER funds, as well as by the Generalitat de Catalunya CIRIT, project 2014-SGR-966.
Equilibration of the expectation value of an observable does not imply equilibration of the observable itself. Here we show how the results can be cast in terms of a stronger notion of equilibration, the distinguishability used in [3]. The distinguishability of states $\rho$ and $\sigma$ with respect to a measurement $M = \{P_1, P_2, \ldots, P_N\}$, where the $P_i$ form a complete set of projectors, is defined by $D_M(\rho,\sigma) = \frac{1}{2}\sum_i |\mathrm{Tr}(P_i\rho) - \mathrm{Tr}(P_i\sigma)|$, and it characterizes the probability of successfully guessing which of the two states was given (assuming they are given with equal probabilities), via $p_{\rm success} = \frac{1}{2} + \frac{1}{2}D_M(\rho,\sigma)$. By Jensen's and the Cauchy-Schwarz inequalities we can relate the distinguishability $D_M$ to the weak-distinguishability $D_{P_i}$ considered in this paper: Each term $\langle D_{P_i}(\rho_t,\omega)\rangle_T$ can be bounded via the results of Theorems 6 and 8, and therefore fast equilibration of the projectors implies fast equilibration of the distinguishability.
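As a sanity check of the operational meaning of $D_M$, the sketch below compares $\frac{1}{2} + \frac{1}{2}D_M$ with the success probability of the maximum-likelihood strategy (for each outcome $i$, guess whichever state makes it more likely). The qubit states and the computational-basis measurement are arbitrary illustrative choices, and the function names are ours.

```python
import numpy as np


def outcome_probabilities(state, projectors):
    """Born-rule probabilities Tr(P_i state) for each projector P_i."""
    return np.array([np.trace(P @ state).real for P in projectors])


def distinguishability(rho, sigma, projectors):
    """D_M(rho, sigma) = 1/2 * sum_i |Tr(P_i rho) - Tr(P_i sigma)|."""
    p = outcome_probabilities(rho, projectors)
    q = outcome_probabilities(sigma, projectors)
    return 0.5 * np.abs(p - q).sum()


def best_guess_probability(rho, sigma, projectors):
    """Maximum-likelihood success probability for equal prior probabilities."""
    p = outcome_probabilities(rho, projectors)
    q = outcome_probabilities(sigma, projectors)
    return 0.5 * np.maximum(p, q).sum()   # pick the likelier state per outcome


# Illustrative qubit states and a measurement in the computational basis.
rho = np.array([[0.8, 0.1], [0.1, 0.2]])
sigma = np.array([[0.3, 0.0], [0.0, 0.7]])
M = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

D = distinguishability(rho, sigma, M)
print(D, best_guess_probability(rho, sigma, M), 0.5 + 0.5 * D)
```

The two probabilities agree for any pair of states, since $\sum_i \max(p_i, q_i) = 1 + \frac{1}{2}\sum_i |p_i - q_i|$.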
where $|E^B_\lambda\rangle$ are the eigenvectors of $H_B$ and the coefficients $c_\lambda$ are positive and normalized (we do the calculation for an arbitrary bath state commuting with $H_B$, although in our case $c_\lambda = 1/d^\Delta_B$). The following result will be useful.
Lemma 10 (Gentle measurement [47]). For any state $\rho$ and positive operator $X$ such that $X \leq I$ and In order to apply the lemma to the state $\rho = \rho_0$ and the operator $X = \sqrt{X} = \Pi$ (valid because $\Pi$ is a projector), we start from an expression in which line 3 is obtained by taking absolute values and line 4 by the Cauchy-Schwarz inequality (with $\||\psi\rangle\| = \sqrt{\langle\psi|\psi\rangle}$ as usual), and where $\Pi^\perp$ denotes the orthogonal complement of $\Pi$. To upper bound this expression, we use Bhatia's perturbation theory result (Theorem VII.3.1 in [48]; for another very interesting application, to the problem of proving thermalization in closed quantum systems, see [49]).
Theorem 11. Let $O$ and $P$ be normal operators, let $S_1$ and $S_2$ be two subsets of the complex plane that are separated by a strip (or annulus) of width $\Delta$, and let $E$ ($F$) denote the orthogonal projection onto the subspace spanned by the eigenvectors of $O$ ($P$) whose eigenvalues lie in $S_1$ ($S_2$). Then, for every unitarily invariant norm $|||\cdot|||$, In our notation, this theorem implies where we have related the Euclidean vector norm on the left to the operator norm on the right. In the above expression $\Delta_{l,\lambda}$ is the distance between the supports of $|E^S_l, E^B_\lambda\rangle\langle E^S_l, E^B_\lambda|$ and $\Pi^\perp$. Note that this distance satisfies $\Delta_{l,\lambda} \geq \eta/2$.
Using equations (C6) and (C8), together with $\sum_\lambda c_\lambda = 1$, we get Choosing the truncation window with $\eta = \sqrt{8 d_S}\,K\|H_I\|$ and using the gentle measurement lemma leads to the main result. It is easy to extend this to the weak-distinguishability. Note that the trace distance is invariant under global unitaries, and in particular under the Hamiltonian evolution. Therefore, expectation values of an observable $A$ will be close, since $|\mathrm{Tr}(A\rho) - \mathrm{Tr}(A\sigma)| \leq \|A\|\,\|\rho - \sigma\|_1$ for any two states $\rho$ and $\sigma$. To conclude, it will be useful to relate the rank of $\Pi$ to the rank of a projector $P$ onto yet another extended subspace, $\mathcal{H}^{E_B,\,\Delta+2\|H_S\|+\eta+\eta'}_{H_B+H_S}$. Since the rank of $P$ is determined by the spectra of the unperturbed system and bath Hamiltonians, it will be simpler to calculate.
First, denoting by $P^\perp$ the projector orthogonal to $P$, we get by using the fact that $\|Q\|_1 \leq \mathrm{rank}(Q)\,\|Q\|$ for any operator $Q$, Bhatia's theorem, and setting $\eta' = 2K\|H_I\|$. The triangle inequality then leads to Recall that $d_{\rm trunc} = \mathrm{rank}(\Pi) = \|\Pi\|_1$. Hence we obtain Eq. (C16). Note that, since $P$ projects onto the subspace $\mathcal{H}^{E_B,\,\Delta+2\|H_S\|+\eta+\eta'}_{H_B+H_S}$ corresponding to a system that does not interact with the bath, we can denote the density of states of the bath by $\nu_B(E)$. The second line comes from using $\eta' = 2K\|H_I\|$ and $\eta = \sqrt{8 d_S}\,K\|H_I\|$.
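The gentle measurement lemma used in this appendix can be probed numerically. Its statement is not fully reproduced above, so this sketch assumes the commonly quoted form: if $0 \leq X \leq I$ and $\mathrm{Tr}(\rho X) \geq 1 - \varepsilon$, then $\|\rho - \sqrt{X}\rho\sqrt{X}\|_1 \leq 2\sqrt{\varepsilon}$. With $X = \Pi$ a projector, as in the text, $\sqrt{X}\rho\sqrt{X} = \Pi\rho\Pi$. The random states, random projectors and dimension are illustrative choices, and the helper names are ours.

```python
import numpy as np


def random_density_matrix(d, rng):
    """Random full-rank state rho = G G^dagger / Tr(G G^dagger)."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real


def random_projector(d, r, rng):
    """Projector onto a random r-dimensional subspace of C^d."""
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    V = Q[:, :r]
    return V @ V.conj().T


def trace_norm(M):
    """Trace norm of a Hermitian matrix: sum of |eigenvalues|."""
    return np.abs(np.linalg.eigvalsh(M)).sum()


rng = np.random.default_rng(seed=2)
d = 6
worst_gap = -np.inf   # largest value of (lhs - rhs) seen; should stay <= 0
for _ in range(500):
    rho = random_density_matrix(d, rng)
    Pi = random_projector(d, int(rng.integers(1, d + 1)), rng)
    eps = max(0.0, 1.0 - np.trace(rho @ Pi).real)   # Tr(rho Pi) = 1 - eps
    lhs = trace_norm(rho - Pi @ rho @ Pi)             # X = sqrt(X) = Pi
    worst_gap = max(worst_gap, lhs - 2.0 * np.sqrt(eps))
print(worst_gap)
```

A non-positive `worst_gap` means the assumed bound held in every sampled case; this is a spot check, not a proof.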
where we defined the maximum energy gap $G_{\max} = E_{\max} - E_{\min}$. We now impose that $x$ is such that which together with eq. (D6) already gives condition (i). From the equation above we obtain We now check that condition (ii) is also satisfied. Note that (which comes from the definition of $A$ and $\rho_0$), and the fact that $\Pi$ is orthogonal to $|j_{\min}\rangle$ and $|j_{\max}\rangle$.
The above equation justifies the approach of defining an auxiliary state and observable, since we have shown that they reproduce the predictions of the original state and observable.
The result can be translated into the weak-distinguishability between $\rho_t$ and $\omega$. By the triangle inequality, a calculation similar to the one above, and the invariance of the trace distance under unitary evolution, we see where $\rho_t = e^{-iHt}\rho_0 e^{iHt}$ and $\omega$ is the corresponding dephased state. This implies For the term $D_A(\rho_t,\omega)$ we can apply Propositions 4 and 5 and condition (i) to get The last inequality comes from our convenient construction of $\rho_0$ and $A$, specifically designed for this purpose.
3. The factor $Q$ for $\rho_0$ and $A$

It is easy to see that the factor $Q$ for the auxiliary state and observable satisfies where $Q_{\rm trunc}$ is the normalization constant of the distribution that results from $\Pi\rho_0\Pi$ and $A$. The above bound, together with equations (D13) and (D14), results in which, defining $Q_2 = Q_{\rm trunc} + 2/K$ to simplify notation, gives the first part of Theorem 8.
To finish the proof of the Theorem, we upper bound $Q_{\rm trunc}$. From equation (23) we see In the main text we found (eq. (37))

With the initial state we focus on the average over all unitaries $U$ within the subspace $\mathcal{H}^{E_B,\Delta}_{H_B}$ of $D_A(\rho^U_t, \omega^U)$: By shifting the time dependence to the observable $A$ we get
$$\mathrm{Tr}\left(\rho^U_t A\right) = \mathrm{Tr}\left(\rho^U_0\, e^{iHt} A e^{-iHt}\right) \equiv \mathrm{Tr}\left(\rho^U_0 A(t)\right). \tag{E3}$$
If $A_{\rm eq}$ is the infinite-time average of $A(t)$, we have $\mathrm{Tr}(\omega^U A) = \mathrm{Tr}(\rho^U_0 A_{\rm eq})$. Then where in the third line we have used $\rho^U$ as an observable acting on the microcanonical window of the bath Hilbert space.
Via the same calculations as used in the Appendix of [3], we get where $\Pi_s = (\mathbb{1}+\mathbb{F})/2$ and $\Pi_a = (\mathbb{1}-\mathbb{F})/2$ project onto the symmetric and antisymmetric subspaces, and $\mathbb{F}$ is the swap operator on $\mathcal{H}^{E_B,\Delta}_{H_B} \otimes \mathcal{H}^{E_B,\Delta}_{H_B}$, defined by $\mathbb{F}|\phi_1\rangle|\phi_2\rangle = |\phi_2\rangle|\phi_1\rangle$. Since we see that $\beta = 0$, and from $\langle \mathrm{Tr}(\rho^U_B \otimes \rho^U_B)\rangle_U = 1$ we obtain the remaining coefficient, which leads to the simple expression The operator $C$ is of the form $C = \mathrm{Tr}_S[O]$, with $O$ Hermitian and acting only on the microcanonical window, from its definition above. Any such operator can be written as where $a_{jk}$ are real coefficients, and $\{X_j\}$ and $\{Y_k\}$ are orthonormal bases of Hermitian operators on the system and on the microcanonical window, respectively [50]. They satisfy , while all other operators have trace 0. With these definitions we can write From this result, and the definition of $C$ above, . We can now shift the time dependence back to the state to get which proves our claim. The bound 1 is only for presentation reasons and does not change the result much, since $d^\Delta_B \gg d_S > 1$ in the regime we are interested in.
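The symmetric-subspace calculation invoked here rests on the standard second-moment identity for Haar-random pure states in a $d$-dimensional space, $\mathbb{E}_U\big[(U|\psi\rangle\langle\psi|U^\dagger)^{\otimes 2}\big] = (\mathbb{1} + \mathbb{F})/(d(d+1)) = 2\Pi_s/(d(d+1))$. The sketch below checks it by Monte Carlo, sampling Haar-random states as normalized vectors of i.i.d. complex Gaussians (a standard construction); the dimension, sample size and seed are arbitrary, and the helper names are ours.

```python
import numpy as np


def haar_state(d, rng):
    """Haar-random pure state: a normalized vector of i.i.d. complex Gaussians."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)


def swap(d):
    """Swap operator F on C^d (x) C^d, with F |a>|b> = |b>|a>."""
    F = np.zeros((d * d, d * d))
    for a in range(d):
        for b in range(d):
            F[b * d + a, a * d + b] = 1.0   # |a>|b> has index a*d + b
    return F


d, n_samples = 3, 20000
rng = np.random.default_rng(seed=1)
second_moment = np.zeros((d * d, d * d), dtype=complex)
for _ in range(n_samples):
    psi = haar_state(d, rng)
    P = np.outer(psi, psi.conj())            # |psi><psi|
    second_moment += np.kron(P, P) / n_samples

exact = (np.eye(d * d) + swap(d)) / (d * (d + 1))
print(np.max(np.abs(second_moment - exact)))
```

The printed deviation shrinks roughly like $1/\sqrt{n}$ with the number of samples. The identity says the averaged state lives entirely in the symmetric subspace, which is why the antisymmetric contribution drops out of the calculation above.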
Appendix F: The distribution $p_\alpha$ for a spin ring

To gain a deeper grasp of the behaviour of $p_\alpha$ we simulated $L$ interacting spin-1/2 particles, with a Hamiltonian given by where $\sigma^z_\lambda$ and $\sigma^x_\lambda$ are the Pauli $z$ and $x$ operators for spin $\lambda$, and we adopt the convention $\sigma^x_{L+1} = \sigma^x_1$. The spin $\lambda = 1$ is taken to represent the system $S$, and we focus on an observable $A_x$ and initial state $\rho_0$ given by $A_x = \sigma^x_1 \otimes \bigotimes_{\lambda=2}^{L} \mathbb{1}_\lambda$ and $\rho_0 = |1\rangle\langle 1| \otimes \bigotimes_{\lambda=2}^{L} \mathbb{1}_\lambda/2$, the latter representing a bath in a maximally mixed initial state and the system in the eigenvector $|1\rangle$ of $\sigma^z_1$ with eigenvalue 1. Figure 3 depicts the normalized distribution $p_\alpha = \frac{1}{Q_{A_x}}|\rho_{jk}||(A_x)_{kj}|$ as a function of the energy gaps $G_\alpha = E_j - E_k$ as the number of spins increases, illustrating the transition from a distribution with several distinct peaks to a unimodal distribution. In Figure 4 we plot $a(\epsilon)$ and $\delta(\epsilon)$ for different values of $L$, illustrating their decrease with increasing $L$ for most values of $\epsilon$. Thus, even for moderate bath sizes, one can find an interval $\epsilon$ such that $\delta(\epsilon) \ll 1$ and $a \sim 1$. The first condition is necessary for Theorem 1 and the subsequent results to imply that equilibration occurs, while the second is necessary to ensure that the equilibration time scale does not grow with the size of the bath (see the discussion after Theorem 8).
For example, in this model we find that for $L = 9$ one can take $\epsilon$ such that $a(\epsilon) = 1$ and $\delta(\epsilon) \approx 0.006$. Then Theorem 1 gives dictated by the observable, the initial state, and the system and interaction Hamiltonians, which is straightforward to calculate. Figure 5 shows similar behaviour of $a(\epsilon)$ and $\delta(\epsilon)$ for a spin ring with random couplings. We simulated a Hamiltonian where the couplings $K_\lambda$ are drawn at random from a Gaussian distribution with mean $\gamma$ and standard deviation $w$. For completeness, in Figure 5 we focus this time on an observable $A_z$ and the same initial state as above: $A_z = \sigma^z_1 \otimes \bigotimes_{\lambda=2}^{L} \mathbb{1}_\lambda$ and $\rho_0 = |1\rangle\langle 1| \otimes \bigotimes_{\lambda=2}^{L} \mathbb{1}_\lambda/2$.

[Figure: Distribution $p_\alpha$ for observable $A_x$ on a translationally invariant spin ring.]

FIG. 5: Plots of average $\delta(\epsilon)$ and average $a(\epsilon)$ and their respective standard deviations (over 1000 realizations) for an observable $A_z = \sigma^z_1 \otimes \bigotimes_{\lambda=2}^{L} \mathbb{1}_\lambda$ and initial state $\rho_0 = |1\rangle\langle 1| \otimes \bigotimes_{\lambda=2}^{L} \mathbb{1}_\lambda/2$, for a spin ring with random couplings $K_\lambda$ with mean $\gamma = \Omega$ and standard deviation $w = 0.2\Omega$. Once again, both $a(\epsilon)$ and $\delta(\epsilon)$ decrease as $L$ increases, for a fixed energy-gap interval $\epsilon$. Hence, as $L$ increases it becomes possible to find $\epsilon$ such that $a \sim 1$ and $\delta \ll 1$.
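The quantities in this appendix can be computed by exact diagonalization. The sketch below is a hypothetical reconstruction, not the code behind Figs. 3 to 5. The Hamiltonian $H = (\Omega/2)\sum_\lambda \sigma^z_\lambda + \sum_\lambda K_\lambda \sigma^x_\lambda \sigma^x_{\lambda+1}$ with $\sigma^x_{L+1} = \sigma^x_1$ is an assumed transverse-field Ising ring, because the explicit expression (including prefactors) is not reproduced above and may differ. As before, $\delta(\epsilon)$ is assumed to be the largest weight of $p_\alpha$ in any gap window of width $\epsilon$. The sketch uses random couplings and the observable $A_z$, as in Fig. 5; random couplings generically lift degeneracies, so pairs of eigenvectors label the gaps.

```python
import numpy as np
from functools import reduce

SX = np.array([[0.0, 1.0], [1.0, 0.0]])
SZ = np.array([[1.0, 0.0], [0.0, -1.0]])
I2 = np.eye(2)


def site_op(op, site, L):
    """Operator `op` acting on spin `site` (0-based) of an L-spin ring."""
    return reduce(np.kron, [op if s == site else I2 for s in range(L)])


def ring_hamiltonian(L, omega, K):
    """ASSUMED form: (omega/2) sum_l sz_l + sum_l K[l] sx_l sx_{l+1}, periodic."""
    H = sum(0.5 * omega * site_op(SZ, l, L) for l in range(L))
    for l in range(L):
        H = H + K[l] * site_op(SX, l, L) @ site_op(SX, (l + 1) % L, L)
    return H


def gap_distribution(H, rho, A, tol=1e-12):
    """Gaps E_j - E_k (j != k) and weights |rho_jk| |A_kj| / Q; H real symmetric."""
    E, V = np.linalg.eigh(H)
    w = np.abs(V.T @ rho @ V) * np.abs(V.T @ A @ V).T
    G = E[:, None] - E[None, :]
    off = ~np.eye(len(E), dtype=bool)
    G, w = G[off], w[off]
    keep = w > tol
    return G[keep], w[keep] / w[keep].sum()


def delta(G, p, eps):
    """Largest weight of p in any gap window [x, x + eps) (assumed definition)."""
    order = np.argsort(G)
    g, cum = G[order], np.concatenate(([0.0], np.cumsum(p[order])))
    right = np.searchsorted(g, g + eps, side="left")
    return float(np.max(cum[right] - cum[:-1]))


omega, L = 1.0, 6
rng = np.random.default_rng(seed=0)
K = rng.normal(loc=omega, scale=0.2 * omega, size=L)  # gamma = Omega, w = 0.2 Omega
H = ring_hamiltonian(L, omega, K)
up = np.diag([1.0, 0.0])                              # |1><1|, sigma_z eigenvalue +1
rho0 = np.kron(up, np.eye(2 ** (L - 1)) / 2 ** (L - 1))
A_z = site_op(SZ, 0, L)                               # sigma_z on spin 1 (the system)

G, p = gap_distribution(H, rho0, A_z)
for eps in (0.05, 0.2, 0.5):
    print(eps, delta(G, p, eps))
```

Repeating this over many coupling realizations and several values of $L$ gives averages of the kind plotted in Fig. 5; the quantity $a(\epsilon)$ is not computed here because its definition is not reproduced in this section.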