Prethermalization and Thermalization in Generic Isolated Quantum Systems

Prethermalization has been extensively studied in systems close to integrability. We propose a more general, yet conceptually simpler, setup for this phenomenon. We consider a --possibly nonintegrable-- reference dynamics, weakly perturbed so that the perturbation breaks at least one conservation law of the reference dynamics. We argue then that the evolution of the system proceeds via intermediate (generalized) equilibrium states of the reference dynamics. The motion on the manifold of equilibrium states is governed by an autonomous equation, flowing towards global equilibrium in a time of order $g^{-2}$, where $g$ is the perturbation strength. We also describe the leading correction to the time-dependent reference equilibrium state, which is, in general, of order $g$. The theory is well-confirmed in numerical calculations of model Hamiltonians, for which we use a numerical linked cluster expansion and full exact diagonalization.


I. INTRODUCTION
Prethermalization [1] has emerged over the last decade as an interesting and ubiquitous phenomenon in the dynamics of ultracold quantum gases in one-dimensional geometries [2][3][4][5][6]. In general, it refers to a separation of timescales: Some systems far from equilibrium quickly relax to long-lived (non-)thermal states (not true thermal equilibrium states) on short timescales, before eventually relaxing to the expected true thermal equilibrium states on much longer timescales.
There are some general instances of this phenomenon that are well understood, analytically and numerically. A first example are quenches in isolated noninteracting (integrable) systems, in which interactions (integrabilitybreaking perturbations) of strength g are turned on [7][8][9][10][11][12][13]. By dephasing, observables quickly settle to quasisteady states that are well described by generalized Gibbs ensembles (GGEs) [14][15][16][17][18] of the non-interacting systems. The observables then relax to the thermal equilibrium values (thermalize) in a much longer time scale ∝ g −2 , as predicted by kinetic Boltzmann-like equations. The latter can be derived applying time-dependent perturbation theory to the GGEs [19,20], and, physically, describe the effects of collisions between (quasi-)particles. It was recently shown numerically that, weakly breaking integrability in strongly interacting integrable systems also results in thermalization rates ∝ g −2 [21]. A second example are isolated weakly interacting driven systems, for which GGEs and kinetic Boltzmann-like equations are also relevant [22,23]. A third example are periodically driven systems at high frequency. In general, they quickly reach a time-periodic state that can be identified as a Gibbs state corresponding to an effective Hamiltonian, before relaxing to the thermal (infinitetemperature [24,25]) state in a time scale that is exponentially long in the frequency of the drive [26][27][28][29].
In this work, we argue that the first two examples above are instances of a more universal phenomenon, a phenomenon that occurs whenever an equilibrating dy-namics (to a thermal-or GGE-like state) is weakly perturbed so that, at least, one of the conserved quantities in the original dynamics is no longer conserved in the weakly perturbed one. A similar point of view was put forward in Refs. [30,31] for open quantum systems. We first present this conclusion in a loose manner in Sec. II, and then, in Secs. III and IV, we develop a systematic treatment of weakly perturbed systems in which a conservation law is broken. The remainder of the manuscript is dedicated to demonstrating, in the context of numerical experiments, the validity and accuracy of this systematic treatment. Section V, in which we introduce the models, quenches, observables, etc, is the preamble to the numerical experiments, which are reported in Sec. VI. A summary of our results is presented in Sec. VII, while the Appendices report details of our analytical and numerical calculations.

II. PREAMBLE
Setup -We consider isolated extended quantum systems that are translationally invariant. For the sake of concreteness, one can think of chains with L 1 sites. We have a pair of HamiltoniansĤ 0 andĤ, in which the latter is a small perturbation of the former where g small. We assume that the (reference) Hamil-tonianĤ 0 has a conservation law, sayQ, which is not shared byĤ, namely, [Ĥ 0 ,Q] = 0 but [V ,Q] = 0. We are interested in cases in whichĤ 0 ,Q,Ĥ, andV are extensive operators. In order to take the thermodynamic limit, it is helpful to deal with the intensive counterparts of those operators:ĥ 0 =Ĥ 0 /L,q =Q/L,ĥ =Ĥ, etc. For simplicity, we restrict our analysis to cases in whicĥ Q has a discrete spectrum;Q can be the total particle number, the total magnetization, etc.
Assumption of fast equilibration -We consider cases in which the dynamics generated byĤ 0 , even when the system is far from equilibrium, results in fast equilibration. Namely, we assume that, for experimentally relevant translational-invariant initial statesρ I , observables converge within a time τ * to the predictions of an ensemble (e.g., microcanonical) of statistical mechanicsρ e0,q , with (e 0 , q) = ( ĥ 0 ρ I , q ρ I ). The restriction "experimentally relevant" is put to avoid cases in which the initial state is a macroscopic superposition of states at different densities (e 0 , q) (see Ref. [32] for a discussion of this issue in the context of quantum quenches). The restriction to translation-invariant states is used to avoid having leading effects in the equilibration dynamics that are L-dependent, e.g., particle or energy transport, which would complicate the picture.
Initial prethermalization -Even though there is no standard perturbation theory around a genuinely inter-actingĤ 0 , it seems reasonable to assume that the effect of the perturbation is not felt on times τ 1/g, and that, for such short times, one can meaningfully approximate the dynamics underĤ by the reference dynamics generated byĤ 0 . Therefore, by the assumption of fast equilibration, one should observe a fast initial relaxation of observables toward the predictions ofρ e0,q . The latter state can be very different to the thermal equilibrium ensembleρ e associated toĤ.ρ e0,q differs fromρ e by the fact that q is an additional constraint [of course, it also differs because e 0 is the energy density ofĤ 0 and not that ofĤ, but that change is only O(g)].
The thermalization rate -To determine how fast the true equilibrium is approached, let us start fromρ e0,q and look at the change in Q (τ ) e0,q . We do this perturbatively: Q (τ ) e0,q = Q e0,q + igτ [V ,Q] e0,q + O(g 2 τ τ * L) (2) where we we used that [Ĥ 0 ,Q] = [Ĥ 0 ,ρ e0,q ] = 0. The precise error estimate O(·) will only be argued for later. The important point to be highlighted from Eq. (2) is that the leading order correction to Q e0,q vanishes because [V ,Q] e0,q = 0, by the cyclic property of the trace and the fact that [Q,ρ e0,q ] = 0. This suggests that the thermalization rate is ∝ g 2 . Indeed, carrying out the expansion one order further, we recognize Fermi's golden rule. Finally, if we had replacedQ by a general observ-ableÔ, the first order term does not vanish. It results in a universal deviation of Ô in the instantaneous state from that ofρ e0,q .

III. SLOW DYNAMICS OF APPROXIMATELY CONSERVED QUANTITIES
In this section, we present a derivation of an approximate autonomous equation governing the dynamics of the densities (e 0 , q). This is Eq. (23), or, at a more abstract level, Eq. (21). Our derivation is not mathemat-ically rigorous, it uses physical assumptions, and goes substantially beyond the heuristics presented above. Its validity and accuracy are confirmed by numerical calculations in Sec. VI.

A. Slow variables
We identify the densities (e 0 , q) as slow variables. To set up a controlled derivation, we need a projection map from statesρ to (e 0 , q). It turns out, however, that it is more natural to start with a projection P fromρ to a probability distribution p on (e 0 , q). Indeed, each such a p can be lifted to aρ bŷ and the physically most natural case is, of course, when p(e 0 , q) is a Dirac delta distribution δ(e 0 − e * 0 )δ(q − q * ), cf., the discussion on 'experimentally relevant states' in Section II. To construct the map P, a natural approach is to measureĤ 0 andQ: where P (Ĥ 0 ≈ e 0 L) is a spectral projection ofĤ 0 on the interval [e 0 L − δ E , e 0 L + δ E ] with a resolution δ E that is much larger than the level spacing, but smaller than any relevant energy scale (like, e.g., the energy per site). For P (Q ≈ qL), we can simply take the projection ofQ on the eigenvalue closest to qL. Then we can set where " de 0 dq" represents a sum with the aforementioned resolution.
With translationally invariant quantum many-body systems in mind, we use a more drastic definition of Pρ, which relies on insights from eigenstate thermalization for nonintegrable (quantum chaotic) systems [32][33][34], and generalized eigenstate thermalization for integrable systems [15,18,[35][36][37]. We do not distinguish between nonintegrable systems, for which the number of conserved quantities in the thermodynamic limit is O(1), from integrable systems, for which the number of conserved quantities in the thermodynamic limit is infinite. In translationally invariant (nonintegrable or integrable) quantum many-body systems, even a single eigenstate of H 0 andQ is a bonafide equilibrium ensemble, so we take the resolution δ E above to be zero. In that case, P is the projection onto the diagonal ensemble [32], its action on ρ results in where |E 0 α are the eigenkets ofĤ 0 andQ, and we have assumed that either there are no degeneracies in the many-body energy spectrum or that, if present, they are few and unimportant. This is generically the case in many-body interacting quantum systems [20].
We defineP = 1 − P, and note that both P,P are projectors, namely, P 2 = P andP 2 =P. A warning is in order, despite the fact that Pρ andPρ encode all the information about the microscopic distributions of E 0 and Q (whose widths are expected to be subextensive in E 0 and Q [32]), all that information is not needed and our analysis does not allow one to keep track of it in time. Our results only depend on, and keep track of, the distribution of densities p(e 0 , q).

B. 'Mori-Zwanzig' approach
Let us introduce the Liouville superoperator L = −i[Ĥ, ·]. Then, following Mori-Zwanzig [38,39], see Appendix A, the theory of linear ordinary differential equations (ODEs) gives us the following rewriting of the Pprojected Liouville equation ∂ τρ (τ ) = Lρ(τ ): whereρ I ≡ρ(τ = 0). To bring some structure to this equation, we now split and note the properties They follow from elementary considerations. This allows to recast the above equation of motion (EOM) as where we introduced The object A s represents a memory kernel. For completeness, we also report the expression forPρ(τ ), see Appendix A, It should be stressed that all the previous equations are exact, and, hence, they are not particularly useful. In the next section, we make some motivated approximations.

C. Equilibration
Our only assumption is that the dynamics generated byĤ 0 results in fast equilibration of observables to the equilibrium state that is characterized by the expectation value ofĤ 0 , E 0 = Tr(ρ IĤ0 ), and ofQ, Q = Tr(ρ IQ ). Once again, we stress that we do not distinguish between nonintegrable reference HamiltoniansĤ 0 , for which observables equilibrate to the thermal predictions (thermalize [20]), from integrable reference Hamiltonians, for which observables equilibrate to the GGE predictions (exhibit generalized thermalization [15]). For our purposes, it does not matter whether equilibration is towards a traditional ensemble of statistical mechanics or towards a GGE: in both cases it means that, for generic (fewbody) observablesÔ and initial statesρ I , whereÔ 0 (τ ) = e iτĤ0Ô e −iτĤ0 . We define an equilibration timescale τ * ( ), which is understood as the time τ at which f (τ ) ≈ for some small dimensionless . By fast equilibration, we mean that τ * ( ) We do not expect to be arbitrarily small because, even if the Hamiltonian and the initial state are translational invariant, the unavoidable coupling to hydrodynamic modes in interacting systems renders the approach of local observables to equilibrium polynomial, f (τ ) ∝ τ −d/2 , as argued in Ref. [40] on the basis of fluctuating hydrodynamics. In such a case, one expects Our numerical results suggests that, for the models and observables studied, hydrodynamics tails may set in when is very small, at times that are beyond the reach of our numerical calculations.
The way τ * enters our analysis is that we approximate, for anyρ, accepting an error O( ). This approximation amounts to assuming that the system has equilibrated after a time τ * with respect to the dynamics generated byĤ 0 . Hence, at time τ * , we replace the density matrix of the system by that of the diagonal ensemble. Of course, the usual caveats typical of irreversibility apply: this replacement can only be correct when dealing with local or few-body observablesÔ, or sums thereof, not for (special) manybody operators such as the spectral projectors ofĤ 0 [20].

D. Born approximation
We can now state our main assumption as a weak coupling condition, namely, To see this assumption at work, let us expand the exponential e s(L0+PL1) in the definition of A s in Eq. (11) In order for this expansion in powers of g to be meaningful, one needs to make sure that the series above can be resummed such that it is linear in L. For example, since every L 1 carries a factor L, it appears that the first and second term in the equation above are of order L 2 and L 3 , respectively. However, one can write these terms as sums of [and integrals over (e 0 , q)] truncated correlation functions V 0 (s)V c e0,q and V 0 (s)V 0 (s 1 )V c e0,q , see, e.g., the remark following Eq. (B4). These truncated correlation functions are of order L due to clustering properties of the (e 0 , q)-equilibrium ensembles. In higher orders, such considerations become more complicated and we refer the interested reader to mathematical work establishing a meaningful expansion [41,42]. Assuming that one recovers the linearity in L to all orders, for any τ ≥ τ * , (18) where the first term is O(g 2 Lτ * ). One then sees that the expansion is meaningful if which is, of course, equivalent to Eq. (16).

E. Markov approximation
The discussion above has shown that the lowest order approximation to ∞ 0 dsA s , namely, is O(g 2 τ * L), with the factor L originating from the spatial sum in L 1 . The physical meaning of this factor is that the quantities that change smoothly in time are the densities (e 0 , q) rather than (E 0 , Q) themselves, or, alternatively, local observables. We then see that the superoperator K is O(g 2 τ * ), and this is the rate at whicĥ ρ changes. Recalling that the time-integral defining K reaches its τ = ∞ value at τ ≈ τ * , leads to the conclusion that, in Eq. (10), we can approximate Pρ(τ − s) by Pρ(τ ), making an error O(g 2 τ * 2 ) (which is small by our weak-coupling assumption gτ * 1). A further simplification occurs by observing that Y τ is proportional to g, and vanishes fast as τ ≥ τ * , so one might set Y τ = 0 for times τ ≥ τ * . If we make these two approximations, then the EOM [Eq. (10)] reads A conservative look at the validity of Eq. (21), in particular to the approximation Y τ = 0, shows that one should trust Eq. (21) only for times τ τ tr (where the subindex 'tr' stands for 'transient') and τ tr is determined by g 2 τ * gf (τ tr ), as follows by comparing the magnitude of the two terms in Eq. (10).

F. The autonomous equation
The meaning of Eq. (21), in which K is a superoperator acting on density matrices, is simplified by the presence of P. As mentioned before, P projects onto distributions of densities p(e 0 , q), and so Eq. (21) is actually an evolution equation on such p. It is easier, and more natural, to guess the form of this evolution equation than to derive it from the formula above, so we will do the former here and relegate the latter to Appendix B.
Givenρ e0,q , we need to find the rate of change of (e 0 , q). The natural path is to use Fermi's golden rule, within which e 0 does not change in time. The rate of change of q, called the 'drift' d, is given by where the integral is understood as a sum and the delta function δ(E 0 α − E 0 α ) selects an interval of energies with a width smaller than any relevant energy scale but much larger than the level spacing. The state |E 0 α is taken to have (e 0 , q), so that, by means of eigenstate thermalization (or its generalized version for integrable systems), d = d(e 0 , q) does not depend on |E 0 α but only on (e 0 , q). Even thoughV = O(L), the drift d(e 0 , q) is O(1) because of decay of spatial correlations, see Appendix B. We hence obtain the autonomous equation ∂ τ e 0 (τ ) = 0.
The corresponding evolution equation for the distributions p τ (e 0 , q) is then which lifts naturally to an evolution on Pρ. The stationary solution of Eq. (23) as τ → ∞ is denoted by (e 0 , q * ), where q * = q * (e 0 ) is determined by Within Fermi's golden rule [Eq. (22)], this is recognized as a detailed balance condition, indicating that q * (e 0 ) is the equilibrium value of q given e 0 . In other words, q * is determined by maximizing the entropy at fixed e 0 and, hence, the resulting ensembleρ e0,q * is equivalent to one in which no constraint onq is imposed: This also shows that the asymptotic stateρ e0,q * is close to the global equilibrium stateρ e for which ĥ = e = e 0 + O(g). In Sec. IV, we quantify the O(g) difference betweenρ e0,q * andρ e .

G. Corrections
In principle, our scheme allows to compute higherorder corrections in g to the autonomous Eqs. (21) and (23). In particular, one can expand A s in a power series in terms of order (gτ * ) n L, n ≥ 1, with the terms for n = 1 and 2 given in Sec. III D. Integrating over s, one then gets a series for K, and hence for the drift coefficient d[e 0 , q]. It is, however, far from obvious that such corrections are useful, as: (i) we have made approximations at several points, not only in truncating A s , but also, e.g., replacing Pρ(τ − s) for s ≤ τ * by Pρ(τ ), and (ii) a not-so-fast decay of f (τ ) would imply that the Y τ term in Eq. (10) remains potentially important at long times, obscuring any precise corrections to the Markovian part.
Rigorous work in the context of open quantum systems [41][42][43] implies that one can meaningfully compute corrections, but only if the condition

IV. DYNAMICS OF LOCAL OBSERVABLES
So far, we have focused on the evolution of the quasiconserved quantityQ, which we found to be described by the autonomous Eq. (23). We now turn our attention to more general observables. Our main finding is that their time-dependent expectation values are described by a 'deformed equilibrium ensemble', with an O(g) difference fromρ e0,q(τ ) . This deviation has a universal form.

A. Correction to Pρ(τ )
At first sight, the evolution of generic observables is slaved by the evolution of the slow variables (e 0 , q). Indeed, the general picture is that the evolution of the density matrixρ(τ ) takes place in the space of equilibrium density matricesρ e0,q(τ ) , leading to the prediction In the last equality, we assumed that p(e 0 , q) is concentrated at a single value. Yet, strictly speaking, the previous section dealt with Pρ(τ ), rather than withρ(τ ), so Eq. (27) is in need of a justification. To provide it, we return to the formalism in Sec. III B, and write [see The first term in Eq. (28) is dominant for large τ ≥ τ tr . It amounts to the approximation resulting in Eq. (27). The second term in the last line of Eq. (28) is transient, decaying as gf (τ ), as discussed also in Sec. III E. The third term is the correction we are interested in. Let us write it explicitly, when paired with an observableÔ, namely τ 0 ds Tr[ÔPe sPL LPρ(τ −s)]. By expanding e sPL in powers of g, we get the leading contribution One can further simplify this expression by approximating Pρ(τ − s) by Pρ(τ ), justified by the reasoning in Sec. III E, and by again assuming that the distribution p(e 0 , q), corresponding to Pρ(τ ), is concentrated at a single density [e 0 , q(τ )]. Then the last line in Eq. (29) reads By the assumption of fast equilibration at s ≈ τ * , this expression is O(gτ * ), i.e., it is a small correction to Ô e0,q(τ ) . Even though it is subleading, this correction has a universal form, as we explain in the next section. Finally, note that Eq. (30), obtained following a naive first order perturbation theory, was already written in Sec. II forQ [see Eq. (2)]. In that case, the s-independent integrand vanishes identically.

B. Susceptibility
To understand the correction in Eq. (30), consider the equilibrium ensembleρ e0,q * with (e 0 , q * ) the τ → ∞ solution of Eq. (23), or, alternatively, with q * determined by maximizing the entropy at fixed e 0 . The ensembleρ e0,q * is a small perturbation of the real equilibrium ensemblê ρ e corresponding toĤ, with ĥ ρe = ĥ ρ e 0 ,q * +O(g). We can relate these two ensembles by linear response theory. Indeed, we can imagine starting withρ e0,q * at τ = 0, and switching on the perturbation gV . The system will then evolve precisely toρ e , and the change in the expectation value of observablesÔ should be described by linear response theory. In particular, the stationary change is described by the zero-frequency response coefficient (also known as the susceptibility), which is exactly Eq. (30).
This discussion should also clarify how thermalization with respect toĤ is reconciled with our treatment, which is based on equilibration with respect toĤ 0 . The global equilibrium stateρ e is obtained as a universal correction to the stateρ e0,q * .

C. Deformed equilibrium ensembles from Fermi's golden rule
In Sec. IV A, we used the last term in Eq. (28) to derive the O(g) correction to observables inρ(τ ) from their expectation values inρ e0,q(τ ) . Here, we point out that this is also what the autonomous Eq. (23) dictates.
Let us compute the time derivative of q(τ ): where, in the last equality, we used that [Ĥ 0 ,Q] = 0. Previously, we evaluated this time derivative in a different way: we approximatedρ(τ ) byρ e0,q(τ ) [assuming, again, that p(e 0 , q) is a Dirac delta function] and we arrived at This is a more abstract rendering of the autonomous Eq. (23). At this point, one can ask which form ofρ(τ ) would reconcile Eqs. (31) and (32). By the cyclic property of the trace, and using that [q,ρ e0,q(τ ) ] = 0, we note that Eq. (31) does not depend on the leading contributionρ e0,q(τ ) toρ(τ ). It depends only on the correction term. We then see that Eq. (31) reduces to Eq. (32) if we choose the correction term to be which is the correction we used in Sec. IV A to derive Eq. (30). A spectacular corollary of this reasoning is the observation that, ifρ(τ ) were exactly commuting withq, then the rate of change vanishes, i.e., ∂ τ q(τ ) = 0.

V. MODELS, QUENCHES, AND OBSERVABLES
In the rest of the manuscript, we test the previous ideas and analytical results by means of numerical calculations. We focus on the dynamics of nonintegrable models of strongly interacting hard-core bosons in onedimensional (1D) lattices (which can be mapped onto spin-1/2 Hamiltonians). We study the effects of breaking particle-number conservation [the U(1) symmetry in the corresponding spin models].

A. Models
Quantum dynamics are studied under timeindependent HamiltoniansĤ α of the form where the reference HamiltonianĤ 0 , which we take to be nonintegrable, commutes with the total particle-number operatorN , We takeĤ 0 to be the t-V -t -V Hamiltonian for hardcore bosons in 1D lattices [44,45], with nearest (nextnearest) neighbor hopping t (t ) and interaction V (V ): is the hard-core boson creation (annihilation) operator andn i =b † ibi is the number operator at site i. When t = V = 0,Ĥ 0 is integrable (in the spin language, it is the Hamiltonian of the spin-1/2 XXZ chain) [46]. Here, we focus on cases in which t = V = 0 so thatĤ 0 is nonintegrable [44,45]. We consider two perturbations g αVα , with α = 1, 2. The first one is It will be important later that the presence of nearest neighbor hopping terms inV 1 make E 0 α |V 1 |E 0 α = 0 for typical eigenkets ofĤ 0 .
The second perturbation we consider is This perturbation only contains terms that change the particle number. Hence, E 0 α |V 2 |E 0 α = 0 for all eigenkets |α 0 ofĤ 0 andN .

B. Initial states and description after equilibration
We study the quantum dynamics of initial statesρ I that are far from equilibrium with respect to bothĤ 0 andĤ α . This is achieved by choosingρ I to be thermal equilibrium states of initial HamiltoniansĤ I , such that [Ĥ I ,Ĥ α ] = 0 and [Ĥ I ,Ĥ 0 ] = 0. Dynamics are generated by quantum quenches in which, at time τ = 0, one suddenly changesĤ I →Ĥ α and lets the systems evolve unitarily. We consider systems that are translationally invariant before and after the quench.
The time-evolving density matrix after the quench can be written asρ(τ ) = e −iĤατρ I e iĤατ . We are interested in the dynamics of observablesÔ, whose expectation values are given by O(τ ) = Tr ρ(τ )Ô . At long times, one expects observables to equilibrate at the values predicted by the diagonal ensemble (DE), is the density matrix of the diagonal ensemble [32]. When written in the basis of eigenkets |E α i ofĤ α ,ρ DE takes the form For nonintegrable (quantum chaotic) systems, because of eigenstate thermalization [32][33][34], one expects the predictions of the diagonal ensemble to match those of traditional statistical mechanics ensembles, namely, we expect observables to thermalize [20]. For the t-V -t -V Hamiltonian for hard-core bosons in 1D lattices, eigenstate thermalization and thermalization were studied in Ref. [44], while quantum chaos was studied in Ref. [45]. This means that, in our systems, we can also describe observables after equilibration by means of the grand canonical ensemble (GE) characterized by a temperature T , and, when particle number is a conserved quantity, by a chemical potential µ. The grand canonical ensemble density matrices have the form When g α = 0, T is fixed by the (conserved) energy of the time-evolving state When g α = 0, T and µ are determined by the (conserved) energy [Eq. (40)] and by the (conserved) particle number in the time-evolving state Due to particle-hole symmetry in the Hamiltonianŝ H α , when g α = 0 (namely, in the absence of particlenumber conservation), our systems after equilibration are always at half filling irrespective of the initial filling n I . We consider initial fillings n I = 1/2, which means that the filling must change during the dynamics when g α = 0.
From our analysis in the previous sections, we expect that, for small values of g α , the dynamics of generic observables follow a two-step process towards thermalization: (i) fast relaxation driven byĤ 0 (prethermal dynamics), and (ii) slower, nearly exponential, relaxation to the thermal equilibrium predictions (thermalization dynamics). At long times, close to thermal equilibrium, hydrodynamics may become dominant and algebraic relaxation is expected to take place [40]. That regime is not resolved in this work.
As discussed in Sec. IV, the near-exponential dynamics following prethermalization can be described by projectingρ(τ ) in the basis of the eigenkets ofĤ 0 , up to an O(g) correction. This projected state is a diagonal ensemble ofĤ 0 , whose density matrix [see Eq. (6)] we denote aŝ where |E 0 i are the eigenkets ofĤ 0 .

C. Numerical Linked Cluster Expansion
In what follows, we use the numerical linked cluster expansion (NLCE) approach introduced in Ref. [21] (see Refs. [47,48] for NLCE studies in two dimensions) to study the quantum dynamics of various observables in our translationally invariant 1D systems in the thermodynamic limit.
NLCEs were originally introduced to study systems in thermal equilibrium [49], and allow one to obtain the expectation value of extensive observables per site (O = O/L), in the thermodynamic limit (L → ∞), as sums over the contributions of the connected clusters that can be embedded in the lattice. Given the connected clusters c, which can be embedded in the lattice in M (c) ways per site and have weights W O (c), one obtains O in the following way W O (c) is computed, for each cluster c, from the expectation value of the observableÔ in the cluster (O c ) using the inclusion exclusion principle: where the sum is over all connected sub-clusters of c.
, whereρ c is the density matrix of the relevant ensemble in the cluster. In NLCEs, O c is calculated exactly numerically using full exact diagonalization. In our calculations, the density matrix of the initial state in each cluster,ρ c I , is taken to be the grand canonical density matrix set by the initial Hamiltonian in each clusterĤ c I (our initial states are in thermal equilibrium with respect toĤ I ). In all quenches, we takeĤ I to be the t-t -V -V model in Eq. (35). Since [Ĥ c I ,N c ] = 0, wherê N c is the total particle number operator in the cluster, Also, sinceĤ c I is particle-hole symmetric, we need µ I = 0 in order to have initial states with filling n I = 1/2.
To calculate the time evolution of the expectation values of the observables, O(τ ), where τ denotes the time after the quench, the density matrix of each cluster is evolved with the Hamiltonian after the quenchĤ c and the NLCE calculation is carried out as usual [21]. Our Hamiltonians after the quench have the form in Eq. (34). Similarly, in order to obtain NLCE results after the quench for the diagonal ensemble, the grand canonical ensemble, and in the projected basis ofĤ 0 , we useρ c DE ,ρ c GE , andρ c 0 (τ ) from Eqs. (38), (39), and (42), respectively, for each cluster c.
NLCEs have been used to study quenches in the t-t -V -V model [Eq. (35)] to understand the dynamics [21], and the description of observables after equilibration [50], at the integrable point t = V = 0 and away from it t = V = 0. The presence of next-nearest neighbor hoppings and interactions makes it possible to have different building blocks to construct the clusters in the NLCE. In Ref. [51], it was shown that maximally connected clusters -built adding contiguous sites and all possible bonds, starting from one site -are optimal for studying quenches in this model [51]. Here, as in Refs. [21,50], we use that NLCE in our calculations (there is only one maximally connected cluster with a given number of sites). The number of sites in the largest cluster considered defines the order of NLCE, and we denote the value of an observable O(τ ) evaluated with NLCE to order l as O l (τ ).

D. Observables
We study three observables which have properties that make them qualitatively distinct in the context of dynamics and description after equilibration.
The first observable is the total particle number whose value per site, the particle filling, is denoted as n(τ ). This is a conserved quantity with respect to the reference HamiltonianĤ 0 , i.e., it only changes during dynamics underĤ α after the quench if g α = 0. The second observable is the nearest neighbor hopping correlation function whose value per site is denoted as k(τ ), wherê k(τ ) is a local observable, closely related to the kinetic energy per site. It changes during the dynamics independently of whetherN is conserved or not. The third observable is the distribution function whose value per site is the momentum distribution function, denoted as m k (τ ).M k is the unnormalized (to make each k component extensive) Fourier transform of the one-body density matrix, given bŷ m k (τ ) is a non-local observable that changes during the dynamics independently of whetherN is conserved or not. This observable is of particular interest because it is regularly measured (using time-of-flight expansion) in experiments with ultracold quantum gases [52]. m k (τ ) was the observable used in Ref. [2] to show lack of thermalization in 1D Bose gases with contact interactions, and in Ref. [6] to study prethermalization and thermalization 1D Bose gases with dipolar interactions.

E. Parameters used in the calculations
The initial state is taken to be a thermal equilibrium state at temperature T I = 10 and chemical potential µ I = 2 (similar results are obtained for other values of T I , not too low, and µ I ), for an initial HamiltonianĤ I with nearest and next-nearest neighbor coupling parameters t I = 0.5, V I = 1.5, and t I = V I = 0.7. After the quench,Ĥ α has coupling parameters t = V = 1 (these set the energy scale in our calculations), t = V = 0.7, and g α ∈ (0, 0.12). For these parameters after the quench, H 0 [44,45] andĤ α are quantum chaotic and the system thermalizes for all values of g α (see Ref. [21] for an NLCE study of the quench dynamics when g α = 0).
For n(τ ) and k(τ ), we carry out the NLCE up to the 17th order for quenches with g = 0 (the largest Hamiltonian sector that needs to be diagonalized has 65792 states after exploiting reflection symmetry), and, thanks to particle number conservation, up to the 19th order for quenches with g = 0 (the dimension of the largest Hamiltonian sector that needs to be diagonalized is 46252). For m k (τ ), the NLCE is carried out to one order lower than for n(τ ) and k(τ ), namely, up to the 16th order for g = 0 and up to the 18th order for g = 0. This is because of the overhead generated by the calculation of the dynamics of all the matrix elements of the one-body density matrix [see Eqs. (50) and (51)].

VI. NUMERICAL RESULTS
A. Dynamics of the particle filling In Fig. 1(a), we show the evolution of the particle filling underĤ 1 for three values of g 1 . Results are shown for the last two orders of the NLCE up to τ = 100. In those quenches, we expect the convergence errors for n(τ ) to be below 0.01% for times τ < ∼ 4, and to remain low (below 1%) up to times τ ≈ 16 (see Appendix C). For τ > ∼ 16, the results for the last two orders of the NLCE can be seen to (slightly) deviate from each other in Fig. 1(a). In all the plots in Fig. 1(a), n(τ ) can be seen to approach n DE = 1/2. For g 1 = 0.12, n(τ = 100) ≈ n DE .
The approach of n(τ ) to n DE = 0.5 is exponential. This is apparent in Fig. 1(b), where we plot the normalized "distance" to equilibrium We fit δ DE l [n(τ )] to an exponential function ∝ exp[−Γ NLCE l (g 1 ) τ ] to obtain the thermalization rate Γ NLCE l (g 1 ). In order to gain an understanding of the accuracy of the obtained rates, we carry out fits in the time interval τ ∈ [1,16] for l = 17 [the corresponding fits are shown in Fig. 1(b) as thin continuous lines], and in the interval τ ∈ [1, 6] for l = 16. The rates obtained in those calculations are shown in Fig. 1(c), as NLCE-17 and NLCE-16, respectively. They agree with each other within the errors of the fits. This suggests that our calculation of Γ NLCE (g 1 ) is robust. A power law fit to the rates obtained for l = 17 is also shown in Fig. 1(c). We find that Γ NLCE 17 (g 1 ) ∝ g β 1 with β = 1.99, in agreement with the analytical results in Sec. III.
In closing Sec. IV C, we argued that the rateq(τ ) = 0 whenever the stateρ(τ ) commutes exactly withQ. The rate also vanishes ifρ(τ ) is time-reversal invariant about τ . Both conditions apply to our initial statesρ I . As a result, there is a narrow plateau in n(τ ) for τ ≤ 1. This plateau is best seen in Fig. 7. This is why, to obtain the rates reported in Fig. 1(c), we fit n(τ ) at times τ ≥ 1.
Next, we show that the values obtained for Γ NLCE (g 1 ) are in agreement with the ones predicted by Fermi's golden rule. From Eq. (22), changingQ →N and V →V α , one can writė To evaluate Eq. (53) numerically, we replace the delta function by a sum over energies E 0 j that lie within a small energy window E 0 i − ∆E/2, E 0 i + ∆E/2 (see Appendix D).
The thermalization rates are estimated by computing When |n(τ ) − n DE | n DE , Γ Fermi (g α ) becomes independent of τ and n(τ ) relaxes exponentially. This condition is, to a good degree, satisfied for our choice of initial state, for which δ DE l [n(τ )] < 0.09 [see Fig. 1(b)]. Our calculations of Γ Fermi (g α ) are done using full exact diagonalization in chains with L = 17 and 18 sites, and periodic boundary conditions. We identify a range of values for ∆E and τ for which the results for Γ Fermi (g α ) are robust against the choice of ∆E, τ , and L (see Appendix D).
In Fig. 1(c), we report our results for Γ Fermi (g 1 ). They are in excellent agreement with Γ NLCE (g 1 ). We should add that, for quenchesĤ I →Ĥ 2 , Eq. (53) predicts the same leading O(g 2 α ) dynamics as underĤ 1 . This is the case because the terms that change the total particles number are same inV 1 andV 2 . Hence, n(τ ) is the same for both Hamiltonians up to corrections O(g 3 α ).

B. Dynamics of the nearest neighbor hopping
Next, we study the dynamics of the nearest neighbor hopping: k(τ ) [see Eq. (49)]. In contrast to the particle filling studied in Sec. VI A, the nearest neighbor hopping exhibits dynamics even if g α = 0 after the quench.  The predictions of the diagonal and grand canonical (horizontal contiguous line) ensembles are very close to each other, indicating thermalization. This is expected be-causeĤ 0 after the quench is nonintegrable. The small difference between the diagonal and the grand canonical ensemble results are due to the lack of convergence of the NLCE for the former (the latter is fully converged) [50]. Those differences decrease with increasing the order of the NLCE. In what follows, we use the grand canonical ensemble predictions to probe thermalization.
In Fig. 3, we show k(τ ) after quenchesĤ I →Ĥ 1 (left panels) andĤ I →Ĥ 2 (right panels). The dynamics are qualitatively similar in both cases. They can be split in two regimes: (i) fast (prethermal) dynamics driven byĤ 0 (note that, at times τ < ∼ 10, dynamics in Fig. 3 are nearly identical to those in Fig. 2), and (ii) a slower (thermalization) dynamics controlled by the strength of the perturbation. During the latter regime, the system approaches [and reaches for g 1 = g 2 = 0.12 and τ = 100, see Figs. 3(b) and 3(d)] the prediction of the grand canonical ensembleρ GE corresponding toĤ α after the quench (with the temperature set by the initial state).
The slow approach to thermal equilibrium can be well described using the projected stateρ 0 (τ ) from Eq. (42). In Fig. 3, we show the results for the projected dynamics along with those for the actual dynamics. As follows from the discussion in Sec. IV, the results for the projected dynamics approach those of a thermal equilibrium state ofĤ 0 , which we denote asρ GE0 . We compute the temperature T 0 inρ GE0 using the expectation value of H 0 in the thermal equilibrium stateρ GE ofĤ α The chemical potential inρ GE0 is µ 0 = 0 because, for g α = 0, the systems after equilibration are at half filling.
In the left panels in Fig. 3, for quenchesĤ I →Ĥ 1 , one can see the advanced offsets (see Sec. IV) between the dynamics and the projected dynamics. The offsets are much smaller for quenchesĤ I →Ĥ 2 , whose results are shown in the right panels in Fig. 3. The offsets between the actual and projected dynamics remain constant at long times and are, essentially, the difference between the predictions ofρ GE andρ GE0 .
The difference in the offsets generated by quencheŝ H I →Ĥ 1 andĤ I →Ĥ 2 can be understood using Eq. (30), replacingÔ →K [K is defined in Eq. (49)] andV →V α . SinceK andρ 0 (τ ) are block diagonal in the particle number basis, only the presence of terms in V α that do not change the particle number can produce an O(g) correction. Such terms are present inV 1 (the hopping terms), see Eq. (36), but are absent inV 2 , see Eq. (37). This means that, to leading order, ∆k 1 (τ ) ∝ g 1 while ∆k 2 (τ ) ∝ g β 2 with β ≥ 2. In Fig. 4, we show the long-time offsets between the actual and projected dynamics as functions of g α (for positive and negative values of g α ) in the quenchesĤ I →Ĥ 1 [ Fig. 4(a)] andĤ I →Ĥ 2 [ Fig. 4(b)]. The offsets are computed as the difference between the predictions ofρ GE andρ GE0 for the equilibrated results (see the horizontal lines in Fig. 3). Those predictions are converged to ma- chine precision in the 17 th order of the NLCE shown in Fig. 4. Figure 4(a), for quenchesĤ I →Ĥ 1 , makes apparent the presence of a leading linear correction and of a subleading quadratic one. Figure 4(b) shows the absence of the linear correction for quenchesĤ I →Ĥ 2 . There, the leading correction is quadratic, and our numerical results allow us to identify a subleading cubic correction, which leads to a weak asymmetry about g 2 = 0. Comparing the results reported in Fig. 1(a), and in Figs. 3(a) and 3(b), for g 1 = 0.06 and 0.12, it becomes apparent that k(τ ) equilibrates (reaches the long-time horizontal line prediction) faster than n(τ ). This can be understood to be the result of the nearest neighbor hopping being particle-hole symmetric (as the Hamilto-niansĤ α are). This means that, close to equilibrium, the difference between k(τ ) and k GE can only be a function of even powers of the difference between n(τ ) and n = 1/2. Say the leading even power in the difference between n(τ ) and n = 1/2 entering in k(τ ) is 2, theṅ k(τ ) 2Γ Fermi (g α ) k(τ ). Our numerical results for the thermalization rates of k(τ ), not shown, support the correctness of this simple analysis. can be split into a fast prethermalization dynamics, and a slower relaxation to the thermal equilibrium prediction at a rate controlled by the strength of the perturbation. In Fig. 5, we show results for the 15 th (NLCE-15) and 16 th (NLCE-16) orders of the NLCE for m k (τ ) at four times (τ = 0, 2, 10, and 100) after quenches in which g 1 = 0.03 (left panels in Fig. 5) and g 1 = 0.12 (right panels in Fig. 5). In all panels in , the results of the three equilibrium ensembles are nearly indistinguishable from each other. They differ from those of the dynamics at τ = 100 (they all become nearly indistinguishable from each other at later times). For g 1 = 0.12 [ Fig. 5(h)], the results for the three equilibrium ensembles and for the dynamics at τ = 100 agree with each other. This, in contrast to the results for g 1 = 0.03, makes apparent that m k (τ ) thermalizes faster with increasing the magnitude of g 1 (as expected). Also, as expected from our discussion in Sec. IV and for k(τ ) (see Fig. 3) in Sec. VI B, there is an O(g 1 ) offset between the results for m k (τ ) in the dynamics and in the projected dynamics. The magnitude of this offset is momentum dependent, as made apparent by the inset in Fig. 5(d), and is too small to be resolved at the scales used in the main panels of Fig. 5.
The operator corresponding to the momentum distribution function, unlike the one for the nearest neighbor hopping, is not particle-hole symmetric. This implies that, as m k (τ ) approaches its equilibrium value in the diagonal ensemble m DE k , to leading order Thus, we expect m k to thermalize with the same rate Γ Fermi (g 1 ) given by Eq. (54). To quantify the "distance" to equilibrium for m k (τ ), we compute In Fig. 6 Fig. 6(a) is not as good as for δ DE l [n(τ )] in Fig. 1(b). This is understandable because: (i) we are able to calculate one order lower for m k (τ ) than for n(τ ), and (ii) m k (τ ) probes correlations at all distances while n(τ ) is local, and is a thermodynamic quantity.
We fit δ DE l [m(τ )] to an exponential function ∝ exp[−Γ NLCE l (g 1 ) τ ] to obtain the thermalization rates Γ NLCE l (g 1 ) for the momentum distribution function. The fits are carried out in the interval τ ∈ [2, 6] for NLCE-16 [shown as thin continuous lines in Fig. 6(a)], and in the interval τ ∈ [2, 5] for NLCE-15. The rates Γ NLCE l (g 1 ) are reported in Fig. 6(b) for g 1 ∈ [0.03, 0.12]. In Fig. 6(b), we also plot the rates Γ Fermi (g 1 ) obtained by evaluating Fermi's golden rule [see Eqs. (53) and (54)] using full exact diagonalization in chains with L = 18 sites and periodic boundary conditions (see Sec. VI A and Appendix D). [The rates Γ Fermi (g 1 ) were also reported in Fig. 1(c).] Figure 6(b) shows that, as advanced, the thermalization rates for the momentum distribution function are the same (within our computational errors) as the ones for the particle filling. A power law fit to the rates Γ NLCE 16 (g 1 ) is also shown in Fig. 6(b). We find that Γ NLCE 16 (g 1 ) ∝ g β 1 with β = 2.00, in agreement with the numerical results in Sec. VI A, and with the analytical ones in Secs. III and IV.

VII. SUMMARY
We have put forward a conceptually simple scenario for prethermalization and thermalization in generic isolated quantum systems with a weakly broken conservation law. This scenario applies equally to noninteracting and strongly interacting integrable systems in which integrability is weakly broken, an example of the latter was recently shown numerically to exhibit a thermalization rate that is O(g 2 ) [21], as well as to nonintegrable ones in which a conservation law is weakly broken. The weak perturbation allows the system to equilibrate to a state that is a (generalized) thermal equilibrium state of the unperturbed system. The properties of such a state are determined by the slowly changing value of the quasiconserved quantity. The separation of time scales leads to a universal description, two aspects of which stand out: (i) The dynamics of the quasi-conserved quantity is described by an autonomous equation that can be constructed from Fermi's golden rule in unperturbed equilibrium ensembles. This equation is the generalization of the nonlinear Boltzmann equation appearing in weakly interacting quantum system. (ii) The deviation of observables in the instantaneous state from the prediction of the unperturbed equilibrium ensemble is described by first order perturbation theory. This generalizes the concept of 'deformed GGE' that was described in Ref. [11] for integrable systems in the presence of a weak integrability breaking perturbation.
Our theoretical results, as well as several special behaviors related to the initial state selected, properties of the perturbations that break the conservation law, and properties of the observables studied, were validated by our numerical calculations. To stress the algebra of the problem, we use a notation different from that in the main text. We consider a linear space V with a projector P : V → V , P 2 = P, and a linear evolution equatioṅ Then, we write P 1 = P, P 2 = (1 − P), a i = P i a, and M ij = P i M P j . As the projector is time-independent, the evolution equation can be cast as a system of two coupled equations: We first formally solve the second of these equations See Eqs. (7) and (12) in the main text. The relevance of this framework to irreversible phenomena was realized in Refs. [38,39].
in more detail. Each L 1 contains a commutator −ig[V , ·], so in total there are four terms, depending on whether gV in each L 1 -term acts on the left or on the right. We decompose where the 'gain' operator contains the two cases in which theV 's act on different sides, and the 'loss' operator contains the two cases in which they act on the same side.
For both operators, we can recast the two cases into a single formula by extending the integration range of s from [0, ∞) to (−∞, ∞). One then has that while for K loss acting onρ one has where we abbreviatedP e0,q = P (Ĥ 0 ≈ e 0 L)P (Q ≈ qL) (see Section III A), and we used the invariance of Pρ and P e0,q under the evolution generated byĤ 0 . Note that we can assume, without loss of generality, thatV satisfies V e0,q = 0 for all (e 0 , q), because a term with nonzero mean would have canceled out in the commutators [alternatively, it would cancel between Eqs. (B3) and (B4)].
For this reason, we can replace all correlation functions below by truncated correlation functions.
Recall that we identified the space of Pρ with distributions p on (e 0 , q). Indeed, any Pρ is of the formρ p with distribution p, see Eq. (3). Therefore, we can now abuse notation and interpret K as a kernel on the space of distributions p. We then have Kp(e 0 , q ) = de 0 dqK(e 0 , q; e 0 , q )p(e 0 , q), and, from the above formulas, we identify To unravel this further, we introducê whereP Q are spectral projections ofQ and the sums runs over the eigenvalues ofQ. Note that the admissible values of δQ are O(1) becauseV is a sum of O(1) local terms.
For natural examples ofQ, such as the total particle number operator, we see that alsoV δQ is a sum of local terms. In that case Eq. (B7) is manifestly of order O(L), as a truncated correlation function in equilibrium. The first term in Eq. (B6) can be written as with e 0 = e 0 and q − q = δQ/L. Indeed, the condition e 0 = e 0 is enforced by the integral over s. The second term in Eq. (B6) has a contribution at e 0 = e 0 , q = q only (by the cyclic property of the trace, the projector P e 0 ,q is put next to the density matrixρ e0,q ) and its value is δQ r e0,q (δQ).
One could also have guessed this value because the process generated by K conserves probability, which translates to dqde 0 K(e 0 , q; e 0 , q ) = 0.
We have now explicitly interpreted the process generated by K as a jump process, with jump rates r e0,q (δQ) ≥ 0 for jumps in the density q of order 1/L. The link to the 'drift' computed in Sec. III F is by The expressions in Sec. III F are recovered by writing Eq. (B8) in terms of eigenkets ofĤ 0 , and this eventually yields Eq. (24).

Appendix C: Convergence of NLCE and exact diagonalization
All the numerical results reported in the main text, but the relaxation rates Γ Fermi computed using Fermi's golden rule and full exact diagonalization, were obtained using NLCE calculations. The basics of NLCEs was summarized in Sec. V C, and relevant parameters for the NLCE calculations (orders, largest Hilbert spaces involved, etc) were mentioned in Sec. V E. Here we discuss the convergence of the NLCE calculations and finite-size effects in the full exact diagonalization calculations.
All our full exact diagonalization calculations are carried out in chains with periodic boundary conditions. We use translational symmetry to block-diagonalize the Hamiltonian, which allows us to study larger chains than within the NLCE calculations. In the exact diagonalization calculations when g α = 0, in the absence of particle number conservation, the largest Hamiltonian sector diagonalized has 14602 states (for L = 18). When g α = 0, in the presence of particle number conservation, the largest Hamiltonian sector diagonalized has 9252 states (for L = 20). We only report exact diagonalization results for g α = 0 in Fig. 9(a). The matrices involved in our full exact diagonalization calculations are complex for sectors with total quasi-momentum k = 0 and π, and the results reported contain the contribution from all L quasi-momentum sectors.
In Fig. 7, we show the evolution of the particle filling n(τ ) for g 1 = 0.12 (the fastest changing case studied) as obtained in the last four orders of the NLCE, and in the three largest chains solved using full exact diagonalization. In the scale of the figure, all the results are nearly indistinguishable from each other up to τ ≈ 10. Beyond that time, but not too far from it, the NLCE results can be seen to oscillate in the order of the expansion, even orders of the NLCE are above the odd orders (see the inset in Fig. 7). With increasing the order of the NLCE, one finds that the amplitude of the oscillation decreases and the results converge at longer times. This is similar to what happens in thermal equilibrium calculations, in which the NLCEs converge at lower temperatures as one increases the order of the expansion [49]. The exact diagonalization results, on the other hand, can be seen to approach the NLCE ones monotonically with increasing the chain size. In Fig. 7, the n(τ ) results in the last order of the NLCE and in the largest periodic chain diagonalized are nearly indistinguishable up to τ ≈ 20.
To gain a more quantitative understanding of the convergence of the particle filling n l (τ ) calculations within NLCE, where l is the order of the expansion, and of finitesize effects in the exact diagonalization calculations of n L (τ ), where L is the chain size, we compute the relative differences between the NLCE results at order l and the last order calculated l = 17, and between the results for chains with L sites and the largest chain L = 18 diagonalized. We also compute the relative differences between the last order of the NLCE and the largest periodic chain diagonalized In Eqs. (C1)-(C3), n I is the initial particle filling, which is obtained within machine precision at the 17 th order of the NLCE, and 1/2 is the filling in the diagonal ensemble.
In Fig. 8, we plot the relative differences defined in Eqs. (C1) and (C2) for l = 15 and 16 of the NLCE, and for L = 16 and 17 in the exact diagonalization calculations, also for g 1 = 0.12 as in Fig. 7. Some points to be highlighted from the plots in Fig. 8   results for l = 16 are likely converged within machine precision up to τ ≈ 2, while the exact diagonalization ones for L = 17 are likely converged within machine precision only up to about one half of that time (τ ≈ 1).
(ii) The relative differences between the various orders of the NLCE and between the various exact diagonalization calculations are below 0.01% for τ < ∼ 4. (iii) At times τ ≈ 16, the relative differences are in all cases below 2% (they are smaller between the exact diagonalization results than between the NLCE ones).
The results for the relative difference ∆ NLCE-ED [n(τ )] are qualitatively similar to those for ∆ NLCE l [n(τ )] and ∆ ED L [n(τ )]. We find that ∆ NLCE-ED [n(τ )] ≤ 0.01% for times τ ≤ 5. We use times τ ≤ 5 for the exact diagonalization calculation of the rates from Fermi's golden rule in Appendix D. We also find that ∆ NLCE-ED [n(τ )] < ∼ 0.5% for times τ ≤ 16. We use times τ ≤ 16 in the fits to obtain the rates from the 17 th order of the NLCE dynamics in Sec. VI B.
Next, for the nearest neighbor correlations k(τ ), we discuss the convergence of the NLCE calculations k l (τ ), and finite-size effects in the full exact diagonalization calculations k L (τ ). k(τ ), being a correlation function, is more challenging to obtain accurately than n(τ ), which is a thermodynamic quantity.
In Fig. 9(a), we show dynamics after the quenchĤ I → H 0 for k l (τ ) with l = 17, 18 and 19 (the latter two are also reported in Fig. 2) and for k L (τ ) with L = 18, 19 and 20. k l (τ ) and k L (τ ) are almost indistinguishable from each other up to times τ ≈ 5 (the earliest times are not shown to gain dynamical range in the y axis  than the exact diagonalization results for the diagonal ensemble (DE ED-20 ). The differences between them are due to lack of convergence of the diagonal ensemble within NLCEs and finite-size effects for the diagonal ensemble within exact diagonalization. The NLCE results for the grand canonical ensemble are converged within machine precision. Consequently, at long times, the NLCE is expected to be more accurate than exact diagonalization. At intermediate times 5 < ∼ τ < ∼ 15, k l=16 (τ ) and k l=17 (τ ) are almost indistinguishable from each other, while k L (τ ) shifts upward with increasing L, toward the NLCE predictions. This suggests that NLCE is also more accurate than exact diagonalization at intermediate times.
In Fig. 9(b), we show dynamics after the quenchĤ I → H 1 (g 1 = 0.12) for k l (τ ) with l = 15, 16, and 17 (the latter two are also reported in Fig. 3) and for k L (τ ) with L = 16, 17, and 18. k l (τ ) and k L (τ ) are almost indistinguishable from each other up to times τ ≈ 4 (again, the earliest times are not shown to gain dynamical range in the y axis). For τ > ∼ 4, the NLCE and exact diagonalization results again split, as for the quenchĤ I →Ĥ 0 , despite the fact that both equilibrate to diagonal ensemble results that are very close to each other. k DE l=17 (DE NLCE-17 ) and k DE L=18 (DE ED-18 ) in Fig. 9(b) are almost indistinguishable from each other and from the grand canonical ensemble result k GE l=17 (GE NLCE-17 ). At times 5 < ∼ τ < ∼ 15, k l=16 (τ ) and k l=17 (τ ) are almost indistinguishable, while k L (τ ) shifts upward toward the NLCE predictions with increasing L. As for the quenchesĤ I →Ĥ 0 , these results suggest that NLCE is more accurate than exact diagonalization at intermediate times in quenchesĤ I →Ĥ 1 .
The discrepancy between the NLCE and exact diagonalization results at intermediate times after quencheŝ H I →Ĥ 1 is likely a manifestation of finite-size effects in k L (τ ) that result from the "prethermal" dynamics seen in Fig. 9(a), which equilibrates to a diagonal ensemble value (k DE L=20 ) that is lower than the expected grand canonical ensemble one. The results in Fig. 9(b) also make apparent that, because of finite-size effects, the thermalization rates for k(τ ) are smaller in exact diagonalization, and increasing with increasing L, than in NLCE.
Appendix D: Thermalization rates from Fermi's golden rule We use full exact diagonalization in chains with L sites and periodic boundary conditions to evaluate Eq. (53). For the numerical calculation, the delta function is replaced by a coarse-graining procedure leading to the following modified version of Eq. (53) where |E 0 i (|E 0 j ) are the eigenkets ofĤ 0 with energy E 0 i (E 0 j ), N i = E 0 i |N |E 0 i , and P i (τ ) = E 0 i |ρ(τ )|E 0 i . In Fig. 10, in Fig. 10(a) and τ = 5 in Fig. 10(b), and for three values of g 1 at each time. Our main finding in Fig. 10 is that, in the interval ∆E/L ∈ [0.03, 0.09], f ∆E (τ ) is nearly independent of ∆E, and is approximately the same in chains with L = 17 and L = 18. Similar results were obtained at times τ ∈ [0, 5], for which we showed in Appendix C that our best NLCE and exact diagonalization results for n(τ ) differ by less than a 0.01%.
Having identified an appropriate range of values of ∆E, we compute the rate which is defined following Eq. (54). Figure 11 shows Γ ∆E,τ (g 1 ) vs τ for three values of g 1 , and for three values of ∆E/L for each value of g 1 . The rates for each value of g 1 decrease slowly with increasing τ , and are very close to each other for the three values of ∆E/L shown. Given the results for n(τ ) at times τ ≤ 1 in Fig. 9, which exhibit a plateau-like behavior discussed in Sec. VI A, the rates Γ Fermi (g α ) for τ ≤ 1 are not meaningful. Hence, the rates Γ Fermi (g α ) reported in Sec. VI A were obtained averaging Γ ∆E,τ (g α ) over the results for τ = 1, 1.5, . . . , 5 (9 values), and for ∆E/L = 0.03, 0.032, . . . , 0.09 (31 values), for a total of 279 values entering each average. In Fig. 11, we report the averages as horizontal lines. For each average, we also compute the standard deviation. In Fig. 1(c), we report the averages, and the standard deviations (as errorbars), for different values of g 1 and for L = 17 and L = 18.