Approximating the long time average of the density operator: Diagonal ensemble

For an isolated generic quantum system out of equilibrium, the long time average of observables is given by the diagonal ensemble, i.e. the mixed state with the same probability for energy eigenstates as the initial state but without coherences between different energies. In this work we present a method to approximate the diagonal ensemble using tensor networks. Instead of simulating the real time evolution, we adapt a filtering scheme introduced earlier in [Phys. Rev. B 101, 144305 (2020)] to this problem. We analyze the performance of the method on a non-integrable spin chain, for which we observe that local observables converge towards thermal values polynomially with the inverse width of the filter.

For an isolated generic quantum system out of equilibrium, the long time average of observables is given by the diagonal ensemble, i.e. the mixed state with the same probability for energy eigenstates as the initial state but without coherences between different energies. In this work we present a method to approximate the diagonal ensemble using tensor networks. Instead of simulating the real time evolution, we adapt a filtering scheme introduced earlier in [Phys. Rev. B 101, 144305 (2020)] to this problem. We analyze the performance of the method on a non-integrable spin chain, for which we observe that local observables converge towards thermal values polynomially with the inverse width of the filter.

I. INTRODUCTION
When an isolated quantum system is initialized in a pure state out of equilibrium, the unitary character of the evolution ensures that the state remains pure at any later times. However, if observations are restricted to a subsystem, thermalization may occur, that is, the rest of the system can act as a bath for the observed region [1,2]. More explicitly, if expectation values reach and remain close to a certain value for an extended period of time, one talks about equilibration [3][4][5]. And thermalization occurs if those values correspond to the expectation values at the thermal equilibrium state consistent with the energy of the system [1,2,6,7].
For a generic Hamiltonian with non-degenerate spectrum, the long-time limit of time-averaged observables corresponds to the expectation value in the diagonal ensemble [8]. This mixed state, diagonal in the energy eigenbasis, can be seen as the average of the density operator of the system at all times. To decide whether the system can thermalize it is thus enough to compare the expectation values in the diagonal ensemble to those in thermal equilibrium at the same energy. But while the thermal state of a local Hamiltonian can be efficiently approximated using tensor networks [9][10][11], simulating the out-of-equilibrium dynamics, and thus directly constructing the diagonal ensemble, is a much harder problem [12,13].
Generally speaking, integrable systems, due to their extensive number of conserved local quantities, do not thermalize but are instead argued to relax or equilibrate to the so-called generalized Gibbs ensemble [2,8,14,15], compatible with all the constraints. In contrast, nonintegrable systems are typically expected to thermalize [2,5,[16][17][18][19][20]. It is thus especially interesting to identify non-integrable systems that fail to do so, as the current interest in systems with many body localization [21][22][23], quantum scars [24] or disorder-free localization [25][26][27] makes evident. Nevertheless, the (absence of) thermalization of non-integrable systems is hard to determine, since the applicability of analytical tools for such models is limited, and numerical simulations of out of equilibrium dynamics are restricted to small systems or short times.
In this paper, we present an alternative method to approximate the diagonal ensemble without resourcing to the explicit simulation of the dynamics. We make use of a recently introduced filtering procedure [28], devised to prepare pure states with reduced energy variance, and show how it can be adapted to filter out the off-diagonal components of a density operator with respect to the energy basis.
More concretely, we apply to the initial density matrix a Gaussian operator that filters out large eigenvalues of the Hamiltonian commutator. In the limit of vanishing width of the Gaussian, the result will converge to the diagonal ensemble, in the most generic case, when there are no degeneracies in the spectrum. Notice that if there were degenerate energy levels, the procedure would leave untouched the coherences in the corresponding energy subspace, and thus would still lead to the correct limit of the time-averaged density operator. As described in [28], the filter can be approximated as a sum of Chebyshev polynomials, and its application to an initial vector can be numerically simulated using matrix product states [29,30] (MPS) methods, at least for moderate widths. Here we carry out these simulations for a non-integrable spin chain and investigate how the values of local observables converge towards the thermal equilibrium.
The rest of the paper is organized as follows. In section II we review the filtering procedure and its application to the problem of the diagonal ensemble. We also discuss some properties of this specific application. Section III describes the main elements of our numerical simulations. Our results are shown in section IV, where we discuss how the application of the approximate filter to this problem resembles and differs that of reducing the energy variance, and analyze the convergence of local observables to their thermal values. Finally, in Section V we summarize our findings and discuss potential extensions of our work.

II. FILTERING THE DIAGONAL ENSEMBLE
Let us consider a system of size N governed by a (local) Hamiltonian H, and a pure initial state, which can be written in the energy eigenbasis as |Ψ 0 = n c n |E n , with the normalization condition n |c n | 2 = 1. We are interested in the long time average properties of the evolved state, i.e. given any physical observable where the first equality holds under the generic condition, which we assume in the following, that the spectrum is non-degenerate [31], and in the second one we have used the definition of the diagonal ensemble If the system thermalizes, the diagonal expectation value O D := tr (ρ D O) will be equal to the expectation value in the thermal equilibrium state, ρ th (β) = e −βH /tr(e −βH ), that corresponds to the mean energy of the initial state. Thus, an approximation to the diagonal ensemble would allow us to probe whether a given state thermalizes or not.
In the energy eigenbasis, the density matrix for the initial state can be written as ρ 0 = n,m c n c * m |E n E m |. Filtering out the off-diagonal matrix elements in this basis will result in the diagonal ensemble (2). We thus define an (unnormalized) Gaussian filter which acts on the mixed state as a superoperator whereĤ C is the commutator with the Hamiltonian, i.e. H C [ρ] = Hρ − ρH. Notice that F σ is a completely positive trace preserving map, i.e. a quantum channel. The effect of this filter is to suppress the off-diagonal matrix elements corresponding to pairs of states with different energies. As the width σ is reduced, and for a generic, non-degenerate, Hamiltonian, the application of the filter will converge to the desired result Notice that the filter would not affect the density operator components in a degenerate energy subspace. Thus, if the Hamiltonian has degenerate levels, the limit of the procedure is block diagonal, corresponding to the long time limit of the time-average of the evolved state.
Mapping the basis operators to vectors [32] as |E n E m | → |E n E m , we can write the density matrix as a vector of dimension 2 2N , on which the filter acts as a linear operator, and the problem becomes formally analogous to the energy filters used in [28,[33][34][35].
In this representation, the commutator corresponds to the linear operatorĤ C = H ⊗ 1 − 1 ⊗ H T , which, if H is local, is also a local Hamiltonian with eigenvectors |E n E m and corresponding eigenvalues E n − E m , for n, m = 1 . . . 2 N . We can then apply the filtering procedure for reducing the energy variance from a state with given mean energy described in [28]. For a product initial state |Ψ 0 , the (vectorized) initial density matrix |ρ 0 = |Ψ 0 ⊗ |Ψ 0 is also a product, and the scenario is very similar to the one discussed in that reference.
With respect to the HamiltonianĤ C , any physical state has mean value ρ 0 |Ĥ C |ρ 0 = tr ρ † 0 [H, ρ 0 ] = 0. The filter (3) preserves this property of the initial state while it reduces the corresponding (effective energy) variance, ρ|Ĥ 2 C |ρ = −tr [H, ρ] 2 , which measures precisely the off-diagonal part of the density operator in the energy basis.
A. Chebyshev approximation of the filter Formally, this filtering procedure is analogous to the one described in [28], and some of the properties can be directly translated to the current case. In particular, the Gaussian filter F σ can be approximated by a series of Chebyshev polynomials.
Any piece-wise continuous function f (x) defined in the interval x ∈ [−1, 1] can be approximated by a linear combination of the M lowest-degree Chebyshev polynomials [36]. In particular, the corresponding series for the delta function truncated to order M (and improved using the kernel polynomial method) is known to approximate a Gaussian of width √ π/M . We can thus use such series to order M ∝ N/σ to approximate the Gaussian filter F σ . This sum has the form where α is a rescaling constant to ensure that the spectrum of αĤ C lies strictly within the interval [−1, 1]. We use H C = αĤ C for the rescaled Hamiltonian commutator at the rest of the paper. T m (x) is the m-th Chebyshev polynomial of the first kind, defined by the recur- , and γ M m are the Jackson kernel coefficients [36], We will denote the result of applying the series expansion to order M as Notice that this vector has a different normalization than |ρ σ , because the sum in Q M approximates a normalized Gaussian distribution, unlike F σ from (3).
The off-diagonal width of the operator ρ M is determined by the corresponding variance of H C as B. Properties of the diagonal filter Notice that the filtering procedure described so far is general, as it does not make any assumption on the spatial dimension of the problem. In the following we will focus on a one-dimensional problem, for which we can use tensor networks in order to obtain numerical approximations. As in [28], we can use matrix product state (MPS) techniques [29,30] to simulate the application of this filter to an initial state. In this way we construct a matrix product operator (MPO) [37][38][39] approximation to the filtered ensemble. Also here, for large system sizes and narrow filters, the required bond dimension for the approximation can be bounded as D c ′ √ N D 1/δ 0 , where c ′ and D 0 are O(1) constants. Accordingly, the expression for the entanglement entropy, corresponds now to a bound for the operator space entanglement entropy (OSEE) [40]. The spectrum of H C exhibits however an exponential degeneracy in the subspace of eigenvalue zero, which imposes a significant difference. For each eigenstate |E n of H, |E n E n is eigenstate of H C with zero eigenvalue. Thus, even if the spectrum of H is non-degenerate and even if it fulfills the stronger assumption of nondegenerate gaps, the "zero energy" subspace of H C is always exponentially degenerate.
Hence the target diagonal ensemble states could in principle have arbitrarily small OSEE, even with vanishing width σ (an extreme case would be the maximally mixed state, with zero OSSE). This is in contrast to the Hamiltonian filtering, where the limit would generically have thermal (i.e. volume law) entanglement. Even if we expect that the general relations between energy fluctuations and entropy or bond dimension demonstrated in [28] still hold during the main part of the filtering procedure, eventually, as the width becomes negligible and the procedure converges to the diagonal ensemble, the OSSE can converge to a non-generic value that will depend on the initial state.
The scenario we discuss here also exhibits another fundamental difference regarding physical observables. For a local operator O, the expectation value is computed as where |O and |1 are respectively the vectorized observable and identity operators.
As an overlap between two vectors, this is a global quantity, and no longer local in space. Therefore, the considerations in [28] about the minimal entanglement of a subregion required for local observables to converge to thermal values do not immediately apply here.

C. Convergence of the off-diagonal components
The initial state is given by a physical density operator, normalized in trace, trρ 0 = 1, and also Frobenius norm, ρ 0 |ρ 0 = trρ 2 0 = 1. The filter (3) preserves the former, but not the latter. Instead, the norm of the filtered vector |ρ σ indicates the magnitude of the remaining offdiagonal components.
The state resulting from the application of the original Gaussian filter F σ on ρ 0 can be written as a sum of two mutually orthogonal components, The first term is precisely the diagonal ensemble, and the second one includes all off-diagonal components of the density operator. Denoting them by |∆ρ := |ρ σ − |ρ D , the (Frobenius) norm of the off-diagonal components is The magnitude of these components may be estimated using simple arguments. We consider as initial state ρ 0 a pure product state, for which the energy distribution, given by |c n | 2 , is peaked around the mean energy E ρ0 = tr(Hρ 0 ), and has variance O(N ). For large systems, this distribution behaves as a Gaussian [41] and we can approximate the norm of the vector |ρ σ by a double integral over energies, from which we obtain The norm of the diagonal component, equivalent to the inverse participation ratio of the initial state, ρ D |ρ D = n |c n | 4 is independent of σ. Typically, the number of energy eigenstates contributing to the sum will be exponentially large in the system size, unless the mean energy of the initial state E ρ corresponds to a region of exponentially small density of states. To see this, we can take again into account the aforementioned distribution of the weights for our initial states, and the fact that for large systems the density of states approaches also a Gaussian distribution [41,42]. The inverse participation ratio then decreases exponentially with the system size, Unless the width of the filter is exponentially small in N , the norm of the filtered state is dominated by the off-diagonal component, and we expect both of them to decrease proportionally to the width, for fixed size N , according to (11). Notice nevertheless that a bound on the (Frobenius) norm of |∆ρ is not enough to extract conclusions about the convergence of physical observables, a question that we explore numerically in section IV.

III. SETUP FOR THE NUMERICAL SIMULATIONS
We use numerical simulations to explore some of the questions in the previous section. In particular, we investigate whether the diagonal ensemble can be approximated by a MPO, and how the physical observables approach the diagonal expectation values as we filter out the off-diagonal matrix elements of the density matrix.

A. MPS approximation of the ensemble
We use matrix product operators (MPO) [37,38,43] to represent the density operators corresponding to the initial and filtered states. Once vectorized, they are represented by MPS with double physical indices, which can be manipulated using standard tensor network methods [29,30,44,45].
We find a MPS approximation for the action of the filter (4) on a given initial state. The method is completely analogous to the one presented in [28] for filtering out energy fluctuations, with the only difference that here the effective Hamiltonian is the commutator superoperator H C acting on the vectorized density matrices. For a local Hamiltonian H, the commutator H C can also be written as a MPO.
As in [28,35,[46][47][48][49] we can then take advantage of the fact that we do not need the full polynomials T m (H C ), which in our case are operators acting on a 2 2N dimensional vector space, but only the vectors resulting from their action on the initial state T m (H C )|ρ 0 . The latter satisfy the same recurrence relation as the polynomials and can be computed with lower computational cost.

B. Model and initial states
We focus our study in the non-integrable Ising spin chain with longitudinal and transverse fields, and choose parameters (J, g, h) = (1, −1.05, 0.5), which is far from the integrability limit.

IV. NUMERICAL RESULTS
We have applied the procedure described in the previous section to system sizes N ∈ {20, 60}, using MPS with bond dimensions 100 ≤ D ≤ 1500. Additionally, we cross-check results for small system sizes N ≤ 20 which can be explored with exact diagonalization.
A. Scaling We expect the off-diagonal width δ of our simulations to follow the scaling predicted in Ref. [28], namely δ 2 ∝ 1/M 2 , for large enough number of terms in the approximation of the filter, and provided that the truncation error is not significant. Thus, the decrease of the width with M provides us with a check that our simulations are in the expected regime. Figure 1 shows that this is indeed the case. The figure shows that for all system sizes, the converged data are well described by a power law fit δ 2 ∝ M −α (dotted lines) with exponents −2.13, −1.98, −1.97, −1.95, −1.96 for N = 20, 30, 40, 50, 60, respectively.
A further check is provided by the norm of the filtered state |ρ σ . As described in section II C, ρ σ |ρ σ should decrease as the inverse off-diagonal width. Since our algorithm applies the normalized filter (4), Q M ∼ 1 √ 2πσ 2 F σ , we expect, for the proper values of M and σ, To directly probe this relation, we plot the vector norm of our resulting state in figure 2, for system sizes N = 20 − 60, and find that our data agrees well with this prediction, except for the smallest values of M .

B. Convergence of local observables
As the filtered state approaches the diagonal ensemble, so will the values of physical observables. If the state thermalizes, such limit will agree with the thermal value corresponding to the initial energy, and thus comparing this to the converged values can be used to probe thermalization of the system. Here we are interested in the rate of convergence of the physical expectation values.
For the problem of reducing the energy variance of a pure state, it has predicted that for chaotic systems [50] a polynomial decrease of the variance with the system size is required for all local observables to converge to their thermal values. In Ref. [28] it was numerically observed for model (13) that an energy variance decreasing as 1/ log N or faster was sufficient for convergence in the thermodynamic limit. But as discussed in section II, these conclusions do not need to apply in our case, because the expectation value in the mixed state does not have the same local structure. We thus explore this question numerically by studying the local x and z magnetizations in the middle of the chain, O = σ [N/2] x,z , and analyzing how the expectation values vary as the width of the filter decreases. For systems of size N ≤ 12 we can compute the action of the filter exactly for any width, while for larger systems, up to N ≤ 60, we run MPS simulations up to the narrowest filter widths that we can reliably reach with a maximum bond dimension D = 1000.
For small systems, N ≤ 24, we can compare the filtered values to the exact compute the exact magnetiza- tions in the diagonal ensemble. For larger systems we do not have access to either the evolved state at long times or the exact diagonal ensemble, but we can approximate the thermal ensemble corresponding to the initial energy using MPO [37,38,51]. For the cases we study, there are analytical and numerical arguments in favor of thermalization [35,52], such that the thermal expectation values should be very close to the diagonal ones. Thus, for our analysis it is enough to use the thermal value as reference, since we are only exploring the variation of the expectation values, but our simulations for large systems do not reach full convergence (see subsection IV D for a more detailed discussion of the numerical errors). We plot the results for small and large system sizes in figures 3 and 4 for initial state |X+ , and in figures 5 and 6 for initial state |Z+ . In all cases we represent the absolute value of the difference between the expectation values in |ρ M and the diagonal (thermal, for large systems) values as a function of the off-diagonal width δ. In all cases, i.e. for the different initial states and different sizes, we observe that this absolute error, which is given exclusively by the off-diagonal part of ρ M , decreases at least as fast as 1/ √ δ (see insets). Moreover, the figures show that curves for different system sizes practically collapse on top of each other.

C. Entropy
Since we start with a product state |ρ 0 and evolve it with a local Hamiltonian H C , the same arguments used in the case of pure states [28,53] then imply that the OSEE can be bounded as a function of the off-diagonal width and the system size as given in Eq. (8). Figure 7 (upper) shows that indeed, the evolution of the OSEE while filtering out the off-diagonal components of the state satisfies a similar bound. The plot shows the OSEE corresponding to the middle cut of the approximate filtered state ρ M , as a function of the system size, for simulations in which the number of Chebyshev terms was chosen as different functions of the size M = f (N ), corresponding to a width δ(N ) ∝ 1/M . We observe that for M ∝ √ N , which corresponds to δ ∝ √ N , the OSEE does not grow with the system size, while for M ∝ N or M ∝ N log N (correspondingly δ ∼ const or δ ∝ 1/ log N ), it increases as log N . For faster growing M ∝ N 2 , also the increase in entropy is faster (compatible with it growing at most as N , as predicted by the argument in [28]). The asymptotic universal scaling of the entropy can be appreciated more explicitly in figure 7 (lower), which shows that 2 S ∝ √ N (D 1/δ 0 − 1) for all system sizes N ≥ 20 with a constant D 0 = 2.76.
The limit of the filtering procedure when the width vanishes is a mixed state in the exponentially degenerate null space of H C . This subspace supports states with zero OSEE (e.g. the maximally mixed state), and thus the final OSEE is not generic, but will be determined by the initial state, in contrast to the case of pure state filtering, where we could generically expect that the entanglement entropy converged to a thermal volume law. We can explore how the limit value is approached during the filtering by analyzing the results for small systems, as shown in figure 8. As illustrated in the figure for different initial states and sizes N ≤ 12, the entropy grows Behavior of the exponential of the entropy as predicted by Ref. [28] that we have shown in eq. 8. The dotted line indicates the linear fit where all data points locate on the same line as expected for large system sizes. D0 from fitting the data for all system size is 2.76 (40) and the slope of the fit is 1.
with 1/δ for moderate widths, but it reaches a maximum after a certain point, and then decreases towards the diagonal value. If we examine how this final value depends on the system size, we observe, that in all the cases studied the diagonal OSEE increases almost linearly with the size, although the values change considerably from one state to another, where the slope of each initial states are 0.06(78), 0.89 (17), 0.02 (37) for |X+ , |Y + , |Z+ , respectively.

D. Error Analysis
In our strategy, for a fixed order M of the Chebyshev expansion, the main source of error is the truncation error, namely approximating the action of each Chebyshev polynomial on the initial state by a MPS with limited bond dimension. We can quantify this error for a given order m using as reference the best approximation found for the corresponding term T m (H C )|ρ 0 (in our case, with D = 1000) and comparing it to its truncated versions with smaller bond dimensions. In this way we can extract the bond dimension required for fixed precision. In previous works that used MPS approximations of Chebyshev series [35,[46][47][48][49] it was observed that the required bond dimension for such terms increases polynomially with the degree m. Our results, illustrated in figure 9, seem to agree with such behavior, except for the smallest values of m. We have also observed, as in the recent work [35], that for fixed m the bond dimension required to maintain constant truncation error in T m (H C )|ρ 0 gets smaller for larger system sizes. Notice, however, that for larger systems, also polynomials of higher degree will be required to attain a constant width δ, since, as discussed in Sect. II A, the order of the expansion scales as M ∝ N/δ.

V. DISCUSSION
We have presented a method to approximate the diagonal ensemble corresponding to a quantum many-body state. By applying a Gaussian filter to the density operator, the off-diagonal components in the energy basis are suppressed and, in the limit of vanishing filter width, the result converges to the ensemble that represents the long time average of the time evolved state. For a Hamiltonian with non-degenerate spectrum, this is the diagonal ensemble.
Numerically, the filter can be approximated by a Chebyshev polynomial series, and applied using MPS standard techniques, in an analogous manner to what was already described in Ref. [28] for an energy filter. In our case, we obtain a MPO approximation to the filtered ensemble.
The method allows us to treat larger systems than exact diagonalization. However our results for small systems indicate that the operator space entanglement entropy of the diagonal ensemble scales as a volume law, which limits the system sizes for which the MPO can provide a reliable approximation. Still, we are able to simulate the effect of filters with moderate off-diagonal width and to analyze the convergence of local observables towards the thermal equilibrium.
We have applied this method to a non-integrable spin chain and several out of equilibrium product initial states for system sizes up to N = 60. We have numerically observed that local observables converge towards their thermal values as a power of the inverse off-diagonal width. Remarkably, this behavior is mostly independent from the system size. Even for moderate off-diagonal widths, the method provides in this way insight beyond exact diagonalization. In the future, it can be thus used to explore other one-dimensional models.