Thermalization near integrability in a dipolar quantum Newton's cradle

Isolated quantum many-body systems with integrable dynamics generically do not thermalize when taken far from equilibrium. As one perturbs such systems away from the integrable point, thermalization sets in, but the nature of the crossover from integrable to thermalizing behavior is an unresolved and actively discussed question. We explore this question by studying the dynamics of the momentum distribution function in a dipolar quantum Newton's cradle consisting of highly magnetic dysprosium atoms. This is accomplished by creating the first one-dimensional Bose gas with strong magnetic dipole-dipole interactions. These interactions provide tunability of both the strength of the integrability-breaking perturbation and the nature of the near-integrable dynamics. We provide the first experimental evidence that thermalization close to a strongly interacting integrable point occurs in two steps: prethermalization followed by near-exponential thermalization. Exact numerical calculations on a two-rung lattice model yield a similar two-timescale process, suggesting that this is generic in strongly interacting near-integrable models. Moreover, the measured thermalization rate is consistent with a parameter-free theoretical estimate, based on identifying the types of collisions that dominate thermalization. By providing tunability between regimes of integrable and nonintegrable dynamics, our work sheds light both on the mechanisms by which isolated quantum many-body systems thermalize, and on the temporal structure of the onset of thermalization.

Isolated quantum many-body systems with integrable dynamics generically do not thermalize when taken far from equilibrium. As one perturbs such systems away from the integrable point, thermalization sets in, but the nature of the crossover from integrable to thermalizing behavior is an unresolved and actively discussed question. We explore this question by studying the dynamics of the momentum distribution function in a dipolar quantum Newton's cradle consisting of highly magnetic dysprosium atoms. This is accomplished by creating the first one-dimensional Bose gas with strong magnetic dipole-dipole interactions. These interactions provide tunability of both the strength of the integrability-breaking perturbation and the nature of the near-integrable dynamics. We provide the first experimental evidence that thermalization close to a strongly interacting integrable point occurs in two steps: prethermalization followed by near-exponential thermalization. Exact numerical calculations on a two-rung lattice model yield a similar two-timescale process, suggesting that this is generic in strongly interacting near-integrable models. Moreover, the measured thermalization rate is consistent with a parameter-free theoretical estimate, based on identifying the types of collisions that dominate thermalization. By providing tunability between regimes of integrable and nonintegrable dynamics, our work sheds light both on the mechanisms by which isolated quantum many-body systems thermalize, and on the temporal structure of the onset of thermalization.

I. INTRODUCTION
In classical physics, chaos and the approach to thermal equilibrium are intimately related: the irregular spacefilling trajectories of a chaotic system sample all of phase space. An integrable system, on the other hand, executes simple closed orbits. Systems that are nearly but not strictly integrable (such as the famous Fermi-Pasta-Ulam chain [1]) have a rich multiple-timescale dynamics, and equilibrate extremely slowly. Classical thermalization near integrability is understood in terms of Kolmogorov-Arnold-Moser (KAM) theory [2] and related concepts. Classical chaos and KAM theory are based on the notion of phase-space trajectories, whereas quantum chaotic dynamics and thermalization are understood in terms of a different conceptual framework, involving random matrix theory and the eigenstate thermalization hypothesis [3][4][5][6][7][8]. Within this framework, there is no general theory of thermalization in near-integrable systems, though it has been widely discussed [9][10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26]. Moreover, numerical exploration of such questions is challenging because the achievable system sizes are quite small if one wishes to simulate to arbitrarily long times [11,12].
Experimental studies are far less limited by finite-size concerns. In a pioneering experiment [27], oppositely moving bunches of ultracold bosonic atoms were confined to an array of one-dimensional (1D) tubes; atoms in this quantum Newton's cradle collided repeatedly, yet did not thermalize as atoms in a 3D trap would. Rather than ex-hibiting thermalization or revivals [1], a nonthermal momentum distribution persisted to long times. Such longlived, nonthermal states are often termed prethermal states, and are naturally present in nearly integrable systems; they have been experimentally observed in weakly interacting, quasi-1D quantum gases [28][29][30]. The question of how such prethermal states eventually thermalize, once integrability is broken in the presence of strong interactions, remains unexplored. In particular, there is no theoretical consensus even on the basic question of whether relaxation involves two distinct timescales or three [13,14,[17][18][19].
Motivated by these findings, we explore the onset of thermalization in a nearly integrable, strongly interacting system-an array of dipolar quantum Newton's cradles consisting of dysprosium atoms-subject to an integrability-breaking perturbation of tunable strength, namely the magnetic dipole-dipole interaction (DDI); see Fig. 1(a) [31]. The tunability of our system enables us to systematically map out how the dynamics of observables changes as the system moves away from integrability; this has never before been done experimentally. We focus on an observable, the momentum distribution of the interacting dysprosium atoms, that exhibits nontrivial dynamics even in the integrable limit (because of the presence of contact interactions and confining potentials). We find that the dynamics of the momentum distribution exhibits two temporal regimes: rapid dephasing followed by a nearly exponential approach to the thermal distribution. This is similar to numerical results Application of an optical phase grating (not shown) kicks atoms along the tubes and the weak harmonic confinement induces periodic collisions. The highly magnetic dysprosium atoms (silver spheres) are trapped in a 2D optical lattice (red) defining the tubes. (b) DDI strength between the central atom and atoms in neighboring tubes in one quadrant of the square lattice when θ = 90 • , where θ is the angle between the B-field in the xz-plane and the axis of the 1D tubes alongx. The pattern of strengths in the other three quadrants are the same by symmetry. The DDI strength is tunable via changing θ. Numbers labeled on the atoms and the tubes are pair-wise DDI strength and integrated DDI along the tubes, respectively, in Hz. Blue indicates large positive, while red large negative strength. (c) Dependence of integrability-breaking dipolar interaction strength and γ on θ. Solid curves: Integrability-breaking contributions of the DDI energy UDDI (added in quadrature and defined in Appendices A-C) versus θ. Shown are the total, total intertube, and total integrability breaking intratube DDI energies. While the intratube DDI is maximally repulsive (attractive) for θ = 90 • (0 • ), it vanishes among intratube atoms for θ = 55 • . Integrability-breaking DDI contributions come not just from the intratube 1D DDI alongx, but also from the 3D DDI between atoms in all neighboring tubes alongŷ andẑ. This dilutes the tunablility of the DDI, reducing the contrast to a factor of ∼1.5 between θ = 0 • and 90 • . Dashed curve: Lieb-Liniger parameter γ(θ); see Appendix D for calculation. obtained in weakly interacting systems near a noninteracting limit [19,24] even though our integrable limit is strongly interacting. We corroborate the generality of these findings using exact diagonalization calculations of a two-rung hard-core boson model with inter-rung nearest neighbor interactions.
Furthermore, the thermalization rate extracted experimentally can be quantitatively captured by a simple physical picture: the thermalization mechanism involves an effective three-body collision, consisting of an intratube s-wave scattering event (the strength of which controls the parameters of the integrable model) together with an intertube dipolar scattering event (which serves as the dominant integrability-breaking perturbation). Both couplings are sensitive to the DDI. Based on our experimental observations, we argue that the thermalization rate depends not only on the strength of the integrability-breaking perturbation, but on the parameters of the integrable model itself.

II. DIPOLAR QUANTUM NEWTON'S CRADLE
The dipolar quantum Newton's cradle consists of ultracold bosonic dysprosium atoms, which have a magnetic DDI ∼100× stronger than, e.g., Rb's. The Bose-Einstein-condensed (BEC) atoms are tightly confined in 1D potentials created by a 2D optical lattice. The atoms are kicked into motion using an optical phase grating, and two packets of atoms in opposite momentum states collide twice each period of motion along the weakly confined direction of the 1D tubes. The integrabilitybreaking interaction strength mediated by the DDI is tuned by changing the angle θ that the dipoles of the atoms (set by the magnetic field orientation) make with respect to the 1D tube axis; see Fig. 1(b). We now describe the experimental details.

B. 2D optical lattice
We adiabatically load the BEC into the 2D lattice by simultaneously turning on the two lattice beams using a 150-ms exponential ramp. The atoms are confined to ∼700 parallel one-dimensional (1D) tubes, with ∼50 atoms in each tube. The 2D lattices are formed by retroreflecting a pair of beams in theŷ andẑ directions. Both beams are red-detuned from the Dy narrowline λ = 741-nm transition [33] by 13.7 GHz. The waist radii of theẑ-lattice beam and theŷ-lattice beam at the BEC position are 195 µm and 150 µm, respectively. Both beams are linearly polarized, and the polarization direction is chosen to be perpendicular to the applied magnetic field (confined to the xz-plane) such that the total AC Stark shift is maximal, including the tensor shift [34]. Theẑ-lattice beam is polarized alongŷ, such that the total light shift is constant for any θ. The polarization of theŷ-lattice beam lies in the xz-plane and is set by a half waveplate to be perpendicular to the field direction for each θ setting. The lattice depth is calibrated using the Kapitza-Dirac diffraction method [35]. We experimentally verified that the depth of theẑ lattice is independent of θ. For theŷ lattice, we experimentally find the optimum waveplate angle and calibrate the lattice depth for each θ setting.
We used a lattice depth of V 0 = 18.0(3)E R , leading to a transverse trap frequency ω ⊥ = k R 2V 0 /m = 2π × 19.0(2) kHz [36], where k R = 2π/λ is the recoil momentum and E R = (hk R ) 2 /2m. To achieve V 0 = 18E R , the power of theẑ-lattice beam is set to 250 mW and that of theŷ-lattice beam is tuned between 130-170 mW as θ is changed. This power tuning is required to compensate for both the θ-dependent change in the tensor part of the atomic light shift and the loss of power through polarization-dependent optics as the laser's polarization is rotated to follow θ. The Gaussian intensity profile of the lattice beams, though broader than the ODTs, increases ω x to 2π × 60(1) Hz at ω ⊥ = 2π × 19.0(2) kHz. The atoms oscillate within each tube with a frequency 1/T = 60(1) Hz.

C. Kicking the cradle in motion
After loading into the 2D lattice, we split the gas into two equal but opposite |±2hk D momentum states by applying a precisely timed double-pulse 1D optical phase grating along the tube direction [27,37,38]; The phase grating beams are also redtuned 13.7 GHz from the 741-nm transition. The two beams are linearly polarized alongẑ and are oriented along (x +ŷ)/ √ 2 and (−x +ŷ)/ √ 2 directions. Large momentum collisions can occur every T /2 = 8.3(1) ms; the maximum collision energy between a pair of atoms is up to E c = 2(2hk D ) 2 /(2m) = h × 9.0 kHz. This energy is three-times-lower than that required for transverse motional excitations due to the large transverse trap frequency ω ⊥ /2π = 19 kHz [39]. Atomic motion is therefore restricted to 1D [40,41].
We experimentally observe that kicking the gas at different θ leads to different populations of undiffracted atoms. These atoms are manifest in the momentum distribution as a small central peak in the dephased momentum distribution. This central peak, though small, has a shape and height that varies with θ and therefore biases the distance-to-thermalization (DT) metric of the dephased distribution. (DT is defined in Sec. II F below.) Among the reasons for this effect may be the dependence on the shape of the initial momentum distribution on θ due to a dependence of the diffraction efficiency on DDI strength. To mitigate this systematic, after kicking the gas, we allow the distribution to evolve with θ fixed to 35 • for 5 periods of oscillation before we rotate the field to the desired θ setting. This rotation takes 20 ms using a linear ramp. The ramp time is much shorter than the thermalization timescale of interest. Appendix H shows data demonstrating that this procedure results in a dephased momentum distribution that exhibits no systematic variation in DT versus the target θ setting. Moreover, data are shown that demonstrate that the time chosen for the rotation also does not affect the subsequent thermalization rate.

D. Thermalization tunability
To control thermalization, we break integrability through collisions mediated by the angle-tuned DDI. The effect of the DDI can be understood perturbatively as follows. In 1D, two-particle collisions only swap momenta between particles, leaving the overall momentum distribution invariant. In integrable systems, three-and moreparticle collisions also have the same property: they are "non-diffractive." The non-zero-range DDI breaks integrability by inducing diffractive three-particle collisions, which simultaneously change three momenta; the three particles involved need not reside in the same tube. For example, two particles in the same tube can collide via short-range interactions while interacting with a third particle in a nearby tube via the long-range DDI. This should lead to thermalization of the momentum distribution [41].
The DDI's anisotropic nature, proportional to 1 − 3 cos 2 (θ), provides control of the DDI strength through tuning of θ; see Figs. 1(b) and 1(c) and Appendices A-C. Several experimental imperfections can also break integrability, though none in the strongly θ-dependent fashion we observe. Chief among these are heating and atom loss from spontaneous emission due to absorption of the optical trap confinement light [42]. Neither of these effects dominates thermalization at the employed trap depth; see Appendices E and F. Tunneling between the tubes also breaks integrability; however, we estimate its contribution to the observed thermalization is negligible; see Appendix I. Lastly, virtual excitation of transverse motion can mediate diffractive three-body interactions and the longitudinal confinement can break integrability. Both contributions are expected to be small for our system [43][44][45][46].
We note that dipolar effects were far weaker in the Rb-based experiment of Ref. [27]. Dy has a dipole moment µ that is 10 times larger than Rb's. Since the ther- malization rate is proportional to the dipolar interaction squared (as we demonstrate in Sec. III B), and therefore to µ 4 , the contribution to the thermalization rate due to dipolar interactions was ∼10 4 -slower in the Rb experiment.

E. Oscillation evolution and observation of momentum distribution
After we allow the state to dephase following the initial kick, we rotate the field to the target angle θ and hold constant the power of the lattice beams and the optical dipole trap beams for a duration of varying integer multiples of oscillation half-periods, T /2. To measure the evolved momentum distribution alongx, we first deload the lattice using a 500-µs exponential ramp, and then suddenly turn off (in <10 µs) the ODT beams. The lattice deloading time is slow compared to the bandexcitation timescale (∼50 µs), but fast compared to the thermalization timescale in the 3D trap [∼100 ms, see 3D thermalization data in Fig. 2(a)]. Therefore, this deloading procedure constitutes a band-mapping operation [47] that adiabatically transfers the quasimomentum distributions in the lattice confinement directions (ŷ andẑ) into real momentum distributions, but does not affect the momentum distribution along the tube directionx, the direction of interest.
We image the gas alongŷ after 14 ms of time-of-flight using absorption imaging at the 421-nm transition. The images are the sum of the contributions from all tubes. We integrate the 2D distribution alongẑ to obtained a 1D distribution p(x) because the momentum distribution of interest is alongx and the band-mapping procedure produces an approximately flat distribution alongẑ within the first Brillouin zone.
We observe no atomic population outside the lowest, ground-state band inẑ, verifying that the 2D lattice confinement realizes an effective 1D environment for the atoms. We cannot directly observe the expanded atomic distribution alongŷ, the imaging direction, but we expect atoms also remain in the ground band due to the identical depth and deloading procedure used for both lattices. We also note that a time-of-flight expansion without transverse 1D confinement also eliminates complications arising from interaction effects during expansion. For measuring thermalization in a 3D trap as in Fig. 2(a), we diffract the BEC without loading into the lattice and allow the gas to evolve in the crossed ODT. The oscillation period in thex-direction is 14.8(1) ms in this trap. The 3D gas thermalizes within seven oscillation periods.
F. Distance-to-thermalization metric Figure 2 shows the momentum distribution evolution of a kicked gas in a 3D dipole trap as well as the evolution for 1D gases at different θ's. We quantify the distanceto-thermalization (DT) of a measured momentum distribution p(x) by fitting p(x) to a Gaussian distribution f (x) = ae −x 2 /(2σ 2 ) + mx + b, where the last two terms accounts for background gradient and offset of the image, respectively. We then compute the quadrature sum of the fit residuals, is the fitted distribution, i is the pixel index, and t is the holding time. See Appendix F for a discussion of the spontaneous emission heating analysis and Appendix G for comments regarding other DT metrics. The detection noise causes DT (t) to decrease to a finite positive value rather than zero when p(x) becomes thermal: At long holding times DT (t) reaches a constant, as evident in Figs. 3 and 11. We use the mean and standard deviation of all the DT (t) values in the constant region across all measurements as the mean and uncertainty of the noise floor, respectively. The natural log of the DT is plotted in Fig. 3 for these θ's.
The post-kick dephasing of oscillations (which constitute regime I of evolution discussed below) reduces the initial density, allowing the gas to achieve a larger γ(θ) = 2.2-7.4, placing the system in the crossover to the TG; see Fig. 1(c) for a plot of γ(θ) and Appendix D for more details. However, the post-kick kinetic energy scale is also much larger, and whether fermionization transiently persists during the far-from-equilibrium, post-kick evolution is a priori unclear [54]. Once thermalized, the gas is classical in nature.
One can estimate post-kick interaction effects as follows: the characteristic length-scale of the nonequilibrium state is given by the wavelength of the standingwave phase-grating pulse: λ = λ/ √ 2 ≈ 520 nm. A dimensionless ratio of this scale to a total 1D (θ) = 2h 2 /[mg total 1D (θ)], defined as γ ≡ 2λ /a total 1D (θ), generalizes the zero-temperature quantity γ to this far-fromequilibrium situation. The logic is the same as when defining γ, or generally when considering whether a problem involves weak or strong correlations: one considers the ratio of the interaction strength [55] to kinetic energy. However, since the system is far from equilibrium, the kinetic energy is no longer set by the density, but is in general much larger. We find that γ (θ) ranges from 0.9 to 3.1.

III. THERMALIZATION OBSERVATIONS
We now describe the two regimes of thermalization evolution in the experimental results. The evolution of the kicked, bimodal distribution to a dephased, flattop distribution at a time 7T is shown in Fig. 2(b) for the example of θ = 35 • . Figure 3 shows the full evolution for this θ, where the vertical dashed line at 10T demarcates the boundary between regime I and II. See Appendix H for more details.

A. Regime I evolution
The first regime, characterized by a fast decay in log(DT), is governed by dephasing effects, which brings the system to a prethermal state. Dephasing is dominated by dynamics arising from the inhomogeneous trapping potential in the presence of interactions. There are two distinct dephasing processes due to the trap: (1) dephasing of oscillations between different harmonic tubes, owing to their different natural frequencies and subsequent ensemble averaging over tubes with different T during the imaging process; and (2) dephasing of the oscillations of the gas in a single tube, owing to its anharmonicity. Both processes were discussed in Ref. [27]. These processes correspond to different physics: process (1) yields an approximately stationary state as an artifact of averaging over tubes, while process (2) causes dephasing in each individual tube.
We have quantified these trap-induced technical dephasing processes by a collisionless classical simulation of the particle dynamics in each anharmonic tube, averaged over the inhomogeneous tubes; see Appendix J. This allows us to use knowledge of trap parameters to predict the dephasing timescale of processes (1) and (2). The simulation shows that the momentum distribution completely dephases in approximately 150 ms. It also shows that the contribution from process (2) is as important as process (1).
We note that the simulated dephasing time is slightly longer than that observed in the experiment; see Fig. 2(b). The discrepancy is likely due to the lack of interactions in the simulation, though could also be the result of an imperfect modeling of the trap arising from uncharacterized distortions to the beam shapes and overlap of the beam focus. Interactions are expected to rapidly broaden the initial momentum peaks [56] and, hence, to speed up dephasing. Indeed, our exact diagonalization simulations in Fig. 6 show that a rapid decay due to interacting integrable dynamics ensues after the quench even in the absence of technical dephasing. (Section IV and Appendix L describe these simulations in more detail.) However, we remark that in a strictly harmonic trap, integrable interactions alone are not expected to yield a stationary distribution as we observe in our anharmonic system: processes (1) and (2) together with interactions are important in the experiment.
To gain further understanding of the interplay between technical dephasing and interaction effects in our system, we performed an experimental study involving a singlesided kick measurement. This measurement reveals the effect of technical dephasing in the absence of high-energy collisions, i.e., head-on collisions. See Appendix K for experimental details and comments. We observe that the dephasing time, where the initial fast decay of the DT transitions to a much slower decay, is ∼70 ms and is similar to the dephasing time observed in our doublesided data in Fig. 2(b). We conclude that dephasing due to anharmonicity and inhomogeneity in the presence of interactions, but in the absence of large-momentum collisions, explains regime I evolution.
We note that integrable dynamics immediately after a quench are often referred to as prethermalization. During prethermalization, observables not directly related to the conserved quantities dephase; see, e.g., Refs. [25,28,83]. In this experiment, the observable is the momentum distribution of the atoms, while what is conserved at integrability is the distribution of the so-called rapidities. At zero density in interacting systems, or in noninteracting systems, the rapidities are the same as the momenta of the particles (in systems that are translationally invariant). However, at nonzero densities in interacting systems the rapidities are not easily related to the momenta of the atoms [58]. As a result, even though the distribution of rapidities does not change, physical observables such as the momentum distribution function of for definition of thermalization rate 1/τ th . Gray curve is the scaling estimate γ 2 (θ)U 2 total (θ)/(Ec) with no free parameters or offset. Vertical bars and light blue band indicate standard error; atom number noise and a3D uncertainty dominate the latter. Evidently, the thermalization rate in the dipolar quantum Newton's cradle is well-described by terms dependent on both the long-range and short-range parts of the DDI, the former through total (inter-plus intratube) U 2 total (θ) and the latter through the intratube DDI dependence of a intra 1D (θ) in γ 2 (θ).
the atoms can change and do change in this experiment.
Regime I is the prethermalization regime in our experiment. The dephased state at the end of regime I can be described using a generalized Gibbs ensemble [59] arising from the combination of all three mechanisms. Physically, mechanisms (1) and (2), and integrable interactions, give rise to a dephased state through independent prethermalization processes, and each dephased state may be described by a generalized Gibbs ensemble. We therefore refer to the final dephased state at the end of regime I as a "prethermal state" regardless of its prior history. We now turn to the thermalization of this prethermal state.

B. Regime II evolution and thermalization rate
To determine the thermalization rate for the regime II slow-decay evolution data shown in Fig. 3, we fit the regime II decay to (1) which is asymptotically a single exponential decay characterized by a rate 1/τ th at short times and becomes a constant noise value DT 0 at long time. Here τ th and t th are free parameters, and DT 0 is determined from the data.
The fitted rates versus θ are plotted in Fig. 4. The rates are corrected for spontaneous-emission heating; see Ap-pendix F. Comparing to the total DDI and γ(θ) plotted in Fig. 1(c), we see that the slowest (fastest) thermalization rate occurs at small (large) θ where both the DDI and γ are smallest (largest), with a monotonic increase from low to high θ. While there is no ab initio theory we can yet invoke to explain either this trend or magnitude, we can provide a simple estimate. We expect the thermalization rate to scale as the square of both the contact and dipolar interactions, as the largest integrability-breaking perturbation involves both an s-wave collision and a twobody dipolar collision. The matrix element giving rise to thermalization is linear in both the s-wave collision rate and the DDI, and thus by Fermi's Golden Rule, the rate is quadratic in both quantities. An appropriate measure of contact interactions in the nonequilibrium state is γ , as argued in Sec. II G above. This suggests that the rate should scale as γ 2 (θ)U 2 total (θ)/E c , where E c = 2E k is the collision energy of two intratube atoms and E k = (2hk D ) 2 /2m. This simple estimate, plotted in Fig. 4, is in good quantitative agreement with the data.

IV. EXACT DIAGONALIZATION CALCULATIONS
In what follows, we relate the observation of a twotimescale evolution to the dynamics obtained in exact diagonalization calculations of a two-rung lattice model of hard-core bosons.

A. Setup
The lattice consists of two identical 1D chains, with each chain described by a t-t -V Hamiltonian with nearest neighbor hopping t, next-nearest neighbor hopping t , and nearest neighbor interaction V . The two chains interact along the rungs, with a strength set by V r , to mimic the intertube DDI in the experiment (see Fig. 5). The Hamiltonian can be written aŝ whereb † ,i (b ,i ) is the creation (annihilation) operator at site i in chain (=1, 2), andn ,i =b † ,ib ,i is the site occupation operator. L denotes the total number of sites in the lattice, which has L/2 sites per chain. Periodic boundary conditions along the chains are imposed by the conditions ( , L/2 + 1) = ( , 1) and ( , L/2 + 2) = ( , 2).
The hard-core boson creation-annihilation operators obey bosonic commutation relations [b ,i ,b ,j ] = FIG. 5. Two-rung lattice made of two identical chains with nearest neighbor hopping (t), interaction (V ), and nextnearest neighbor hopping (t ). The two chains interact along the rungs (Vr).
,i 2 = 0 to prevent multiple occupancy of the lattice sites. When t = V r = 0, the Hamiltonian reduces to that of two disconnected integrable chains (the spin-1/2 XXZ Hamiltonian in the spin language) and can be solved using the Bethe ansatz [60]. For V = t = V r = 0, the chains become the lattice analogue of the Tonks-Girardeau gas, and the Hamiltonian can be mapped onto that of noninteracting spinless fermions [60]. Given t = 0 and V = 0, the Hamiltonian is nonintegrable for t = 0 and/or V r = 0. We are mostly interested in dynamics when V r V (t = 0), so that integrability is weakly broken.
We take our initial states to be in thermal equilibrium, as described by the grand canonical ensemble (GE), for the initial HamiltonianĤ Î whereN = ,in ,i is the total number of particle operator, T I is the initial temperature (we set the Boltzmann constant to 1), and µ I is the initial chemical potential. We set µ I = 0 in all our calculations, which results in the lattices being at half filling because of the particle-hole symmetry of Hamiltonian (2).
The system is taken out of equilibrium by a sudden quench in whichĤ I is changed toĤ F , such that [Ĥ F ,Ĥ I ] = 0. The system is assumed to be isolated so that the ensuing dynamics is unitary. The density matrix at time τ after the quench is given by (we seth = 1) where |n and E n are the energy eigenkets and eigenvalues ofĤ F , respectively. Our observable of interest is, as in the experiments, the momentum distribution (m k ) along the chains (see also Appendix L 1) The time dependence ofm k is studied by computing m k (τ ) = Tr[m kρ (τ )], while the expectation value of this observable after relaxation can be obtained from the infinite-time average [6] m k = lim In the absence of degeneracies, which is ensured in our calculations by breaking down the Hamiltonian into its symmetry irreducible sectors, the infinite-time average agrees with the prediction of the so-called diagonal ensemble (DE) [5]: whereρ A central question we address with the exact diagonalization calculations is how the momentum distribution equilibrates after the quench. For that, we compute a "distance-to-equilibrium" as the RMS deviation of the momentum distribution function at each time from the DE prediction: See Appendix L 2 for a discussion of thermalization.

B. Numerical results
We set t = 1 (our energy scale) before and after the quench (and set our unit of time toh/t = 1). As mentioned before, our quenches start from an initial state in thermal equilibrium. We take the temperature to be T I = 5t I (qualitatively similar results were obtained for other temperatures) for an initial HamiltonianĤ I that has t I = 50 and V I r = 0. A large t inĤ I is chosen to create an initial momentum distribution that peaks at k = 0 and k = π (see Fig. 16). This is done to resemble the post-kick bimodal initial state created in the experiment. After the quench, t inĤ F is set to 0 and V r is set to various nonzero but small values, so that the evolution occurs under a (in most cases) weakly nonintegrable Hamiltonian. Exploiting translation symmetry, particle-hole symmetry, number conservation per chain in the two-rung system, as well as parity under space reflection, we perform exact diagonalization calculations in systems with up to L = 22 sites. The value of V is kept constant during the quench and is selected to be V = 1.6. Figure 6 shows the "distance-to-equilibrium" δ DE plotted as a function of time for four values of the strength of the integrability breaking inter-rung coupling V r . Like in The symbols show results for a quench in which the system is initialized in a state with a two-peaked momentum distribution (created through an initial Hamiltonian with strong next-nearest neighbor coupling t = 50), and the integrabilitybreaking interaction is turned on post-quench. Dashed lines show results for evolution under the same final Hamiltonian, but from an initial state that has already dephased under the fast integrable dynamics. Specifically, the initial state is a diagonal-ensemble state generated by a quench in which t = 50 → t = 0 is changed but dipolar interactions are absent. The fast dephasing at short times in the simulation depends weakly on the strength of the integrability-breaking perturbation.
the experiments, one can see that the exact diagonalization results exhibit two-timescale dynamics. Prethermalization occurs for times < ∼h /t, a time-scale set by V ∼ t. The near-exponential approach to the diagonal ensemble result occurs in a longer timescale, which is set by V r .
The experiment strives to use the same initial state to study the approach to thermalization when the strength of the DDI (set by θ) is changed. The initial state is taken to be the one after the short-time dephasing for a particular value of θ (θ is changed after that). We can emulate such a procedure in our numerical calculations by "splitting" our single quench in which t is set to zero, and V r is made nonzero, into a two-step quench. In the first quench, t is set to zero (this is a quench to the integrable part ofĤ F ) and the system is allowed to equilibrate. One can then take the equilibrated state as the initial state for a second quench in which the integrability-breaking interaction V r is turned on. Alternatively, one can take the diagonal ensemble after the first quench as the initial state for the second quench. Both procedures produce indistinguishable relaxation rates (see Appendix L 3).
Using the diagonal ensemble after the first quench as the initial state for the second quench allows us to separate out the effects of the integrable and the nonintegrable parts of the Hamiltonian. The ensuing dynamics, shown as dashed lines in Fig. 6, are indistinguishable from those of the original quench after the short-time de-phasing, making apparent that regime II is entirely due to the integrability-breaking interactions.
The similarity between the results in Fig. 3 and in Fig. 6 is striking considering that the systems studied experimentally and theoretically are microscopically very different. By doing so, it highlights the robustness of our findings about the relaxation dynamics close to a strongly interacting integrable point [61].

V. CONCLUSIONS
In summary, we explored the far-from-equilibrium dynamics of a strongly interacting nearly-integrable system as it is systematically tuned away from integrability. We provide the first experimental demonstration that observables in such systems thermalize in a two-step process: prethermalization followed by near-exponential thermalization. A similar behavior is observed in exact numerical calculations of a strongly interacting lattice model. We have also shown that the thermalization rate in our experiments is well-described by a DDI-dependent scaling function that is consistent with perturbative expectations: The scaling is quadratic in the effective intratube contact interactions, and also in the intra-and intertube dipolar interactions.
Our ability to control the strength of integrabilitybreaking perturbations opens a new venue to explore quantum thermalization in strongly interacting systems. Many questions remain, such as how thermalization depends on the "quantumness" of the system, which we can also control by changing the amount of energy deposited in the initial state. Our detailed characterization of the approach to the thermal regime can also play an important role in the development and benchmarking of quantum Boltzmann approaches that could be used in other areas of physics, such as heavy ion collisions.
The dipole moment µ of Dy is 9.93µ B . The effective 1D dipole-dipole interaction (DDI) has been derived in the single-mode approximation to be [62][63][64]: where and u = x/l ⊥ , l ⊥ = h/mω ⊥ , and erfc(u) is the complementary error function. The δ-function term in Eq. (A1) comes from the point limit of an extended dipole [65] and has an opposite sign to V 1D DDI (u). For large distances |x| l ⊥ , V 1D DDI (u) → 4/|u| 3 , just like the DDI in 3D. However, V 1D DDI (u) assumes a finite value at the origin, becoming more sharply peaked for smaller l ⊥ . This behavior resembles that of a δ function and allows one to define an effective δ-function potential for V 1D DDI (u) at short distance [63].
We note that the intratube DDI is suppressed as atoms approach within a few l ⊥ by a factor of 4/|u| 3 To understand this reduction in 1D, consider θ = 90 • . While most of the DDI between atoms alongx is repulsive (i.e., dipoles lying abreast), there remains a small attractive contribution (i.e., dipoles lying head-to-tail) from the part of their wavefunctions that extend transversely by l ⊥ . In general, if the DDI interaction between two dipoles is repulsive when they are separated in the longitudinal direction (side-by-side), their interaction will be attractive when separated in the transverse direction (head-to-tail), and vice versa, reducing the strength of the DDI in either case. See Ref. [63] for details.
In the following discussions, we use the superscript to denote the interaction range ("sr" for short-range and "lr" for long-range) and the subscript to denote the nature of the interaction ("intra" for intratube and "inter" for intertube).

Short-range part of the intratube dipolar interaction
The magnitude of the short-range part of the 1D DDI is given by the sum of the term proportional to the δ function − 8 3 δ(u) in Eq. (A1) and the δ-function-like part of V 1D DDI (u) in Eq. (A3). We calculate the strength of the V 1D DDI (u) by integrating it over a suitably chosen spatial domain inx. Reference [63] determines this range to be ± √ 2πl ⊥ , which is sufficiently smaller than the interparticle spacing inside the tube such that the longrange 1/r 3 tail of the DDI is not double-counted. Taking u ∈ [− √ 2π, + √ 2π] as in Ref. [63], we find the normalized strength A of the short-range part of the interaction

Long-range part of the intratube dipolar interaction
The long-range (i.e., 1/r 3 -scaling) part of the intratube DDI is given by However, not all of long-range intratube DDI contributes to the integrability-breaking perturbation; only the momentum-dependent part can lead to momentum randomizing collisions. To find the leading momentumdependent part, we expand the Fourier transform of V 1D DDI (u) up to O(k 2 ), which provides the terms associated with the DDI-induced virtual interactions leading to integrability breaking; higher-order terms would contribute less to thermalization. The k-space form of the DDI is V 1D , where σ = (2k D l ⊥ ) 2 /2 ≈ 0.2 and Γ(0, σ) is the incomplete Gamma function. The result is η = ( γ + log σ)σ, where γ = 0.577... is the Euler-Mascheroni (Euler-Gamma) constant. The integrability-breaking term from the intratube DDI is therefore ηV (θ)B.

Appendix B: Intertube dipolar interaction
Due to the lack of spatial correlations between atoms in nearby tubes after splitting, the intertube DDI should be calculated as that between an atom in one tube and the integral over all x positions in the nearby tube. More explicitly, for a tube located at (y, z), where Here r = (x, y, z) denotes the atomic position vector and B the direction of the magnetic field. With the geometry of our experimental setup, we can parameterize the two vectors as where a = λ/2 = 371 nm is the lattice constant and i, j are integer indices that denote the location of each tube. Dimer bound states are predicted to form between pairs and arrays of tubes for any negative interaction U y,z inter (θ) < 0 [66,67]. However, these complexes would have binding energies far lower than the post-kick atomic collision energy, and so are unlikely to survive the kicking process. We therefore do not expect intertube spatial atomic correlations to arise from pre-kick dimer formation.
Appendix C: Calculation of U 2 total (θ) We calculate U 2 total , the quadrature sum of the integrability-breaking DDI contributions, using where i, j are the tube indices, and The magnitudes of the integrability-breaking intra-and intertube DDI energies, ηU lr intra (θ) and i,j [U i,j inter (θ)] 2 for i, j ≤ 2, are plotted in Fig. 1(c).

Appendix D: Lieb-Liniger parameter γ(θ) calculation
In the absence of a DDI, the dimensionless coupling parameter γ due to the Van der Waals interaction is defined as [68] γ where n 1D is the 1D particle density and the interparticle interaction along the tube axis is well approximated by an effective potential U 1D = g VdW 1D δ(x). The 1D Van der Waals interaction strength is g VdW [49]. The effective 1D scattering length is given by where a 3D = 141 (17) Bohr is the weighted-average swave scattering length of 162 Dy as measured in two previous experiments [69][70][71] and l ⊥ = h/mω ⊥ = 57.3(3) nm.
With a DDI present, γ is where g total 1D (θ) = g DDI 1D (θ) + g VdW 1D and g DDI 1D (θ) is given in Eq. (A5). Confinement-induced resonances modify this expression for a 1D through an additional factor of 1 − Ca3D √ 2l ⊥ = 0.87(2), where C ≈ 1.46 [49,72]. This correction does not significantly change the shape or magnitude of the theory curve in Fig. 4. Moreover, this factor could be modified by the presence of the DDI to a value that has not been either measured or uniquely determined by theories of dipolar confinement-induced resonances [62,[73][74][75][76]. Given this uncertainty, we choose to use the simple expression in Eq. (D2) for a 1D .
To find a weighed averaged γ avg (θ), we calculate the number of atoms in each tube by assuming a Thomas-Fermi density distribution n TF for the BEC: where N = dr 3 n TF is the total atom number, R i is the Thomas-Fermi radius, and i = x, y, z. The TF approximation is justified given the weak dependence of γ ∼ N 2/3 on atom number [27].
We then obtain a 2D density distribution of the BEC in the yz-plane by integrating along the tube direction: n(y, z) = n TF (r)dx (D5) To find the number of atoms loaded into each tube N i,j , we assume each tube collects atoms in a square cross section with length a = λ/2, equal to the lattice site spacing, at a local density n(y, z) with the atom number given by N i,j = a 2 n(y i , z j ), where y i and z i denote the tube position. This calculation neglects rearrangements of atoms during the lattice loading procedure, i.e., tunneling when the lattice is still shallow, but this assumption is justified by the weak dependence of γ on atom number. We calculate the peak atomic density of each tube using the 1D Thomas-Fermi distribution before the gas is excited. Since γ is only weakly dependent on atom number, we use the mean-field result rather than the full TG result, as in Ref. [77].
Before exciting the gas, each tube has a γ i,j 0 (θ) ∝ 1/n TF 0 , and for the ensemble of tubes, we calculate an average γ avg 0 (θ) weighed by atom number in each tube: Note that for each tube, γ i,j 0 (θ) has a weak dependence on atom number: i,j . For our experimental conditions, we load into approximately 70 × 10 tubes, with ∼50 atoms in the central tubes, resulting in an ensemble averaged density n avg 0 = 3.1 µm −1 . This yields an ensemble averaged initial γ VdW,avg 0 = 1.5 (2). Including the g DDI 1D (θ) term, γ avg 0 (θ) varies from 0.6(1) at 0 • to 1.9(2) at 90 • . The gas dephases at a time ∼100 ms after being diffracted. The dephasing reduces the density in each tube because the narrow, counterpropagating packets of atoms spread throughout the entire tube length with a higher classical turning point due to addition of the large energy from the momentum kick. Our classical non-interacting dynamics simulation, discussed in Sec. III A, shows that the dephased density distribution is approximately uniform, and we therefore estimate the dephased density to be n i,j  Fig. 1(c).
In its ground state, a system at such values of γ would be in the crossover to the TG regime, in which the microscopic bosons exhibit antibunching, as free fermions would [48,51]. This antibunching occurs because the interaction strength dominates the zero-point energy. Whether fermionization persists in the high-energy, farfrom-equilibrium post-kick evolution is a priori unclear, though in equilibrium, at the post-kick energy density, there is no antibunching [54]. Since the dephased state is far from equilibrium, these results cannot be applied directly, but are suggestive (as one might expect typical high-energy-density states to be thermal in some respects [11]).

Appendix E: Atom number variation
Any atoms that flip spin due to spontaneous emission from the optical trap and lattice beams are immediately lost from dipolar relaxation collisions and do not lead to heating of the gas [78]. Feshbach resonances are avoided by tuning to 1.58(1) G, which lies within a resonance-free region between 0.5 G and 2.5 G [79]. We ensure that the B-field remains at this field within 10 mG at every angle. We do not observe any confinement-induced resonances or dipolar confinement-induced resonances since we do not observe resonant atom loss at any θ angle investigated [52,62,[72][73][74][75][76]80].
We do not observe significant atom loss in the data sets presented. Atom number as a function of time is shown in Fig. 7 for θ = 0 • , 55 • , and 90 • , with γ VdW,avg d = 5.7 (7). For the longest observation time of 2.8 s, we lose about 25% of the total atoms, which increases γ i,j 0 by just 16% according to the γ i,j ∼ N 2/3 i,j scaling relation. Aside from atom number loss during the observation time, there is also a slight variation of atom numbers between data taken for different θ. For all angles used in the γ VdW,avg d = 5.7(7) measurement, the mean atom number is 15(2) × 10 3 , corresponding to a 13% variation, which is smaller than the 25% variation in atom number over the time evolution at a fixed θ. We therefore conclude that it is reasonable to treat γ VdW,avg sus time is in contrast to the rapid increase in γ observed in Ref. [27] due to large atom loss rates.

Appendix F: Heating measurements and simulations
Heating from the lattice beams can affect the momentum distribution evolution. The lattice lasers can induce heating in two ways: 1) Intensity noise at certain frequencies can parametrically heat the gas or excite atoms to higher lattice bands; 2) Spontaneous emission imparts photon recoil momentum onto the atoms, whose projection along the tube direction leads to heating [81,82]. We now show that the second mechanism is the dominant heating source in our system before discussing its effect on the DT in more detail with the aid of a collisionless Monte Carlo simulation.
We measure the heating rate in our system by loading a BEC into the lattice and measuring its momentum distribution versus lattice hold time. The procedure is identical to the DT measurements, though without splitting the gas. At short hold time (t < 0.5 s at 18E R ), we observe a distribution alongx that is similar to that reported in Ref. [27]: a broad Gaussian centered about a narrower Gaussian. At longer times, the measured distribution fits well to a single Gaussian. The fitted width of the single Gaussian increases linearly with time, and the best-fit slope corresponds to the heating rate. We verify that the dominant heating mechanism in our system is spontaneous emission by observing that the heating rate of an unkicked gas decreases as 1/∆ when we vary the detuning ∆ from atomic resonance at constant lattice depth V 0 . Engineering a TG system requires the deepest lattice possible. However, too much heating from a large V 0 would obscure the dynamics of interest. We therefore search for a V 0 with the slowest thermalization rate. We experimentally determine this optimal depth by measuring the DT at a fixed holding time for a range of V 0 values at θ = 90 • , the angle with the largest DDI. The results for three different holding times, t = 10T , 15T , and 20T , are shown in Fig. 8. The slowest thermalization occurs near V 0 = 18E R , which is the lattice depth we use for our measurements, yielding γ VdW,avg d = 5.7(7). We measured the heating rate at V 0 = 18E R for the twelve θ values used in our thermalization rate data. The results are shown in Fig. 9. The heating rates are similar among all angles with little-to-no systematic variation. The highest rate, 17(1) nK/s at 0 • , is still ∼5× slower than the slowest rate observed in Ref. [27]. The low heating rates versus those in Ref. [27] are achieved through the use of lower 2D lattice depths and 5-10× smaller n 1D . Nevertheless, in Ref. [27] the ratio between collision energy and transverse trap frequency is E c /(hω ⊥ ) = 0.45, whereas we have 0.47-essentially the same. However, the recoil momentum of the lattice k R used in Ref. [27] is √ 2× larger than ours. In addition, the mass of their atomic species, Rb, is twice lighter than Dy's. Therefore, their ω ⊥ has to be four times larger than in our experiment, and so a much deeper lattice is required to remain in the 1D regime, leading to larger heating rates. On the other hand, our shallower lattice results in a faster intertube tunneling rate J. However, we estimate in Sec. I that the thermalization rate associated with tunneling is ≥10× smaller than our lowest measured thermalization rate and is therefore negligible.
We use the measured heating rate of a BEC at equilibrium to simulate the effect of spontaneous emission heating on the DT evolution of an experimentally measured dephased distribution. A Monte Carlo method is employed that accounts for the heating effects described in Refs. [81,82], and we find that the dominant heating process is from one-body spontaneous emission. Following Ref. [81], we consider those changes in vibrational state in the transverse direction with n = n y + n z → n ± 1 due to both absorption and emission of lattice photons. Atoms with n ≥ 3 are considered lost from the trap, since we expect intertube tunneling for atoms in these states to become non-negligible because their vibrational energy approaches the transverse lattice depth. Atom loss can also occur in the axial direction when the total axial energy for an atom exceeds the axial trap depth V 0 = mω 2 w 2 0 /4, where ω and w 0 are the trap frequency and Gaussian beam waist in the direction of interest, respectively. As in the experiment, we observe little atom loss in the simulation: The typical loss is 3% in 8 s, which is over two times longer than the longest thermalization time measured in the experiment.
In addition to one-body loss due to spontaneous emission, two-body collisions after a spontaneous emission event can lead to heating. In particular, Ref. [82] considers seven two-body transverse state-changing collisional processes that are energetically allowed and permitted by parity selection rules. As the vibrational levels of the scattered atoms are modified, there is a finite probability for the transverse energy to be deposited in the axial direction, leading to an axial momentum kick. Since the rates of such transitions depend on the population of the relevant n = 0 states, these collisions are secondorder; the atoms are initialized in the (n y , n z ) = (0, 0) state and spontaneous emission is the only mechanism to excite them to higher vibrational levels. Indeed, by using the worst-case reflection and transmission probabilities [? ], we find within the experimental timescale that the simulated momentum distributions exhibit negligible deviation compared to those without two-body collisions. Therefore, we need only consider one-body heating processes in our analysis. This is fortunate, as dipolar twobody collisions could have led to θ-dependent heating.
We can now use these heating rates as inputs to simulations of the DT evolution. This is done in order to account for how heating affects the rate of change of DT so that we may then account for heating in our measured thermalization rates. To do so, we introduce to the simulated momentum distributions the measured heating rate and a Gaussian white noise background that is matched to the experimentally measured noise level. The DT and noise floor are then computed in the same way as described in Sec. II F. To reduce Monte Carlo sampling noise, we average twenty simulated distributions so that their noise is negligible compared to the added detection noise. The simulations yield thermalization rates between 0.156(5) s −1 and 0.25(1) s −1 versus θ. This shows that the fastest spontaneous-emission-limited heating rate is slower than the slowest measured thermalization rate, and therefore this heating rate is never larger than our measured thermalization rates, and indeed is much smaller than those rates for θ's above 30 • . Finally, to deduce the thermalization rate due to the integrabilitybreaking physics alone, i.e., the rates plotted in Fig. 4 thermalization rates from the experimental thermalization rates.

Appendix G: Comments on DT metric
We have considered other metrics for distance-tothermalization (DT): (1) Kurtosis is a common measure of deviation from a Gaussian for a given distribution, but does not work well for our data due to its high sensitivity to noise. 2) We compared p(x) to a thermal distribution with the same total energy at each recorded time step, including both initial kinetic energy imparted on each atom and heating from spontaneous emission. This method is less reliable because the energy summation from p(x) is sensitive to noise in the high-momentum wings.
Appendix H: Measurement of DT for different θ and rotation times The kicking procedure described in Sec. II C ensures that the boundary between regime I and II appears at log(DT) ≈ 1.5 regardless of the final θ setting. We recall that this is because, to eliminate systematics due to the splitting processes, we fix the system to evolve under θ = 35 • in regime I before rotating θ to its final setting at the beginning of regime II. Figure 10 shows data exhibiting no systematic variation in DT versus θ at the end of regime I.
While for some θ there can be an additional dephasing evolution after the rotation time-e.g., the 0 • data in the inset of Fig. 3 takes longer to reach regime II (log(DT) ≈ 1.5) than the 90 • data-we have verified that waiting longer to rotate does not affect the subsequent thermalization rate: We took data at two different rotation times for θ = 0 • to demonstrate that the time chosen for the rotation also does not affect the subsequent thermalization rate. We choose θ = 0 • because it exhibits the slowest decay time so that we can best test the difference between these rotation times. These data are shown in Fig. 11, and we find that the slow decay rate is approximately unchanged within experimental resolution.

Appendix I: Tunneling between tubes
The tunneling rate is approximately given by where s = V 0 /E R is the dimensionless lattice depth. This formula agrees with the exact value of J to better than 10% accuracy for s > 15 [84]. For our lattice, s = 18 and J = 0.004E R , which corresponds to a J/h = 2π × 9 Hz tunneling rate. Tunneling between tubes can break integrability by allowing effectively 2D scattering [41]: To leading order, two particles in the same tube collide while one atom scatters to a neighboring tube via tunneling. Such a scattering event conserves total momentum and energy, but not momentum along the tubes due to a finite lattice bandwidth 4J, leading to thermalization.
We estimate the thermalization timescale set by tunneling in the following manner. The initial momentum k i = 2k D and final momentum k f of an atom in a twoparticle scattering event that involves tunneling can be related by from which we obtain the relative change in momentum ∆k = k i − k f ≈ (4J/2E R )k D . Such scattering events involving tunneling then lead to a random walk in momentum space with a step size of ∆k. Thermalization requires a change in momentum on the order of k i , and the time T th that it takes for this process is given by where the collision rate is twice the longitudinal trap frequency f l = T −1 = ω x /(2π) and the square root on the left side arises from the random walk process. The factor of RN/2 is the total number of collisions given N atoms in each tube with reflection coefficient R = (2k D a 1D ) −2 = 1/28 per atom [27]. For our parameters, T th ≈ 600 s. This estimated time scale is two orders of magnitude larger than the longest measured thermalization time. We also note that tunneling cannot be the source of the angular dependence we observe in the data. Therefore, we conclude that tunneling between tubes, while not completely negligible, is a much smaller thermalization mechanism than either the DDI or the spontaneous heating caused by the lattice lasers.

Appendix J: Collisionless classical dynamics simulation
As mentioned in Sec. III A, there are two dephasing processes due to the trap: (1) dephasing of oscillations between different harmonic tubes due to their different natural frequencies; and (2) dephasing of the oscillations of the gas in a single tube due to its anharmonicity. To quantify their relative contributions to dephasing, we simulate the momentum distribution evolution using a classical dynamics model that does not take account for interactions.
We numerically solve the classical equation of motion for atoms in each tube given its longitudinal Gaussian potential. The parameters of the Gaussian potential of each tube are determined by the tube location with respect to the center of the crossed ODT trap. We simulate an array of 70×12 tubes in the yz-plane. The ODT beam parameters are given in Sec. II A. We assume the foci of the two crossed ODT beams overlap perfectly and neglect the axial trapping contribution from the lattice beams. We note that any slight imperfection in beam alignment and beam shape distortion only increases the effect of (1) and (2); consequently, the simulation provides an upper bound to the dephasing time.
We initialize the simulation by distributing 15 × 10 3 atoms into the tubes using the density calculation described in Appendix D. All atoms are located in the center of each tube and to match the measured initial momentum spread, we assign an initial momentum to each atom by sampling from a Gaussian distribution with standard deviation σ = 0.2hk D . We then add a ±2hk D momentum kick to the atoms in each tube. We solve the trajectory for each atom at a time step of T /30, where T = 16 ms is the trap period of the central tube. Doing so allows us to keep track of the momentum and position of each atom at every time step.
Unlike in the experiment, the momentum distribution produced by this collisionless simulation is not station-  (2) is present (circle), and second when both (1) and (2) are present (triangle). Inset shows the dephased momentum distribution.
ary within a period. However, the averaged distribution over a period reaches a steady state, as shown in the inset of Fig. 12. The figure's main panel shows the RMS difference between the averaged distribution over each period and this steady distribution. We call this metric the distance-to-dephasing (DD). This metric is more appropriate than DT because we are interested in the time to the end of technical dephasing. The figure shows two simulation cases: the first (shown as light blue triangles) only includes process (2); the second (shown as darker blue circles) includes both processes (1) and (2). We see in Fig. 12 that if only process (2) is present, the momentum distribution completely dephases in about 600 ms, while the dephasing time reduces to ∼150 ms when process (1) is included. Both processes are important, and the non-negligible contribution of process (2) means it is reasonable to assume an equilibrated density profile when calculating dipolar interactions and γ.

Appendix K: Details of single-sided kick measurement
We impart a single-sided momentum kick p ≈ −2hk D to the atoms while keeping all other trap settings identical to that employed to take the thermalization data in Fig. 3. We then measure the time at which the resulting distribution dephases. The single-sided kick is achieved using a double-pulse sequence similar to that used for creating the symmetric | ± 2hk D splitting. The spatial symmetry is broken by introducing a small initial momentum k s to the BEC. Using a numerical optimization algorithm, we find that nearly all atoms can be transferred to the |k s − 2hk D state using the following parameters: k s = −0.21k D , a phase grating lattice depth of 11.1E r /2, a first pulse with duration τ 1 = 60 µs, followed by τ 2 = 93 µs of free-evolution, and a second pulse with duration τ 3 = 90 µs. The calculated time evolution of the populations of the lowest two diffraction orders and the undiffracted order are shown in Fig. 13. Populations in the higher diffraction orders are negligible.
To compare the single-sided dephasing time to our thermalization data, and to facilitate the pinpointing of the dephasing time, we add to the single-sided distribution its own mirror image to emulate the situation where there are two packets of atom oscillating symmetrically in the tube. We then use the same analysis procedure as described earlier to find the DT (t). The results are shown in Fig. 15.
The observed single-sided momentum distribution evolution is shown in Fig. 14. We note that the value of DT at this dephasing time, log (DT) ≈ 1.5, is consistent with the choice of division between regime I and II in the thermalization data of Fig. 3.
The experimental sequence ensures that the atoms experience the same level of anharmonicity and inhomogeneity as in the measurements starting with a symmetric |±2hk D distribution, but removes the effects of high-momentum interactions, i.e., head-on collisions. In this configuration, the collisions alone (in the absence of anharmonicity) cannot give rise to a stationary momentum distribution, and so the stationary momentum distribution must arise from the technical dephasing processes. (The generalized Kohn theorem [85] guarantees that the oscillatory center of mass motion in a strictly harmonic trap-and hence the oscillations of the momentum distribution-are unaffected by interactions.)  In Fig. 16, we show examples of momentum distribution functions of: (i) initial states, (ii) the diagonal ensemble (DE) after the quench to the integrable part ofĤ F , (iii) the DE after the quench toĤ F , and (iv) the grand canonical ensemble (GE) prediction for the thermal momentum distribution after the quench (see Sec. L 2). Note that the initial state and the DE after the quench to the integrable part ofĤ F exhibit a peak at k = π, while such a peak is absent in the thermal predictions-there is no "memory" of the initial state distribution. The thermal predictions are almost k-independent because of the high energy density of the initial state inĤ F (as in the experiments), which results in a high temperature T F . The DE predictions can be seen to approach those of the GE with increasing V r . As we argue next, the differences between these two ensembles in Fig. 16 are due to finite-size effects. They vanish in the thermodynamic limit.

Thermalization
An important question that was not discussed in the main text in the context of the exact diagonalization calculations is whether the momentum distribution thermalizes, namely, whether the equilibrated momentum distribution function is that of a system in thermal equilibrium. Observables in integrable systems are expected to equilibrate but not thermalize, while in nonintegrable ones they are expected to thermalize [6]. In order to determine whether the momentum distribution thermalizes, we first need to compute the GE prediction m k (GE) = Tr[m kρGE ] at the same energy and number of particles as in the time-evolved state. The density matrix of the grand canonical ensemble that describes thermalized observables iŝ where T F and µ F are found solving for the two equations: Since our systems are always at half filling, µ F = 0. We then compute a distance-to-thermalization metric It is only in the thermodynamic limit that the diagonal ensemble (DE) predictions become identical to those of the GE in nonintegrable systems [6,16]. Because of finite-size effects, δ DE (τ ) and δ GE (τ ) are different in our We show the momentum distributions for an initial state (labeled as "initial state"), the diagonal ensemble after the quench to the integrable part ofĤF (labeled as "initial DE"), the diagonal ensemble after the quench toĤF (labeled as "DE"), and the grand canonical ensemble prediction for the thermal momentum distribution after the quench (labeled as "GE"). For the quenches, we set V = 1.6 and (a) Vr = 0.2 and (b) Vr = 0.8. The momentum distribution of the initial state and of the diagonal ensemble after the quench to the integrable part ofĤF are the same in (a) and (b), only the diagonal and grand canonical ensemble predictions change due to the change of Vr.
calculations away from integrability. To check that thermalization takes place in our nonintegrable systems, we also compute a distance between the diagonal and the grand canonical ensemble and explore its behavior with changing system size. In Fig. 17(a), we plot the distance-to-thermalization δ GE (τ ) for quenches with fixed V = 1.6 but different values of V r . At any given time, one can see that δ GE (τ ) is larger the closer the system is to integrability. One can also see that, for the largest value of V r , δ GE (τ ) converges to a nonvanishing value at long times. This is the result of finite-size effects. In Fig. 17(b), we plot δ GE (τ ) for a fixed value of V r in chains with different number of sites. The plots show that the saturation value of δ GE (τ ) at long times decreases with increasing system size. This suggests that, in the thermodynamic limit, the time-evolving momentum distribution function approaches the thermal prediction during the equilibration dynamics. Further evidence to support this expectation is presented in the inset in Fig. 17(b), in which we plot the distance between the diagonal and grand canonical ensemble predictions δ(DE-GE) as a function of L. The results are consistent with δ(DE-GE) vanishing exponentially with increasing L. The distance is computed, in all cases, from the DE result for the single quench. Results are shown when the second quench is carried out at τ = 0.8 and 9.0 following the first quench, and at τ = 0 when starting with the DE of the integrable system after the first quench (labeled as "initial DE"). The solid lines at long times depict exponential behavior indicating the same relaxation rate in all cases. (b) Distance to equilibration in single quenches for different systems sizes L. These results show the finite-size effect on the relaxation rates. Figure 18(a) shows the distance to the diagonal ensemble of the single quench case, taking a state at two different times after the short-time dephasing following the first quench, and taking the diagonal ensemble after the first quench. They all can be seen to result in an exponential decay at long times to the diagonal ensemble of the single quench case. The exponentially decaying part exhibits nearly the same relaxation rate in the three curves (see fits). This shows that the thermalization rate is not affected by the choice of time to switch on V r . The near-exponential relaxation can then be understood as generated by the time evolution of the DE of the integrable part ofĤ F under the nonintegrableĤ F . The main limitation of our exact diagonalization results is, as mentioned before, finite-size effects. Figure 18(b) shows the evolution of the distance-toequilibration δ DE (τ ) in the single quench protocol as one changes the system size (L = 16, 18, 20, and 22). The near-exponential relaxation is apparent in all cases, but the relaxation rate can be seen to be affected by finite-size effects. Nevertheless, the trends manifest in the simulations qualitatively match those in the experiment, which has a far larger system size.