Variational quantum dynamics of two-dimensional rotor models

I. INTRODUCTION
Nonequilibrium quantum many-body physics has been at the forefront of condensed matter, atomic physics and chemistry research for over a decade [1,2]. The field is driven by remarkable progress in our ability to coherently control matter at the atomic scale. This control has resulted in the creation of novel phases of matter, including observations of light-induced superconductivity [3], cavity-enhanced chemical reactions [4] and dynamical phase transitions [5].
Progress in precisely controlled modern quantum experiments and hardware [6-9] is becoming increasingly limited by our ability to numerically simulate the real-time evolution of quantum systems. At its core, the problem is related to fast entanglement growth in systems out of equilibrium, which forces one to keep track of all the intricate correlations that build up in the system. While there has been considerable progress [10-14], challenges remain, in particular if one moves away from one-dimensional spin models.
Recently, it has been proposed that methods inspired by classical and quantum machine learning might alleviate some of these problems [15-19]. In practice, however, it has been difficult to achieve reliable results due to numerical instabilities resulting from a combination of Monte Carlo noise and the flatness of the quantum geometry of modern neural-network wave functions [17,20-23].
In this work, we present an approach for capturing long-time dynamics of two-dimensional (2D) lattice models with continuous degrees of freedom, using a combination of methods that were previously unexplored in the field of variational simulations - the Hamiltonian Monte Carlo sampler, a tailored variational ansatz and proper regularization of the projected dynamics. We focus on the quantum rotor model, with direct applications to arrays of coupled Josephson junctions, and explore previously unreachable system sizes and evolution times, up to 8 × 8 square lattices.
The paper is organized as follows. First, we introduce the physics of the quantum rotor model and the variational wavefunction. Then, we outline the Hamiltonian Monte Carlo sampler and its connection to the time-dependent variational Monte Carlo algorithm. Finally, we present results for the two-dimensional model, showing magnetization, vorticity and the Loschmidt echo converging to appropriate equilibrium values. Our Monte Carlo results are substantiated by self-consistency checks when key hyperparameters are changed and by comparing our approach to tensor-network calculations in one and two spatial dimensions.

II. MODEL AND METHODS
Consider a system of continuous planar rotors, whose angles θ_k (with respect to an arbitrary axis) could, for example, represent superconducting phases of adjacent Josephson junctions on a lattice Λ with N sites. We use the basis |θ⟩ ≡ |θ_1, . . ., θ_N⟩ for the Hilbert space H. We start with an effective Hamiltonian that captures the relevant physics of superconducting Josephson junctions [24-26]:

H = (Jg/2) Σ_k L_k² − J Σ_⟨k,l⟩ n_k · n_l,    (1)

where L_k = −i ∂_θ_k and n_k = (cos θ_k, sin θ_k) in the continuous basis |θ⟩ of choice. The Hamiltonian in Eq. 1 is often called the quantum rotor model (QRM). Its equilibrium properties [27] have been studied using variational

FIG. 1. Top: The ansatz ψα(θ) architecture used for simulations of two-dimensional QRM systems. It amounts to a two-layer convolutional neural network with an activation function given by Eq. 15. To enforce periodicity and improve expressivity, we precalculate sines and cosines of input angles, which are treated as different input channels by the CNN. The final layer outputs a single channel and all of its components are summed into a single complex number (because of complex parameters α ∈ C^P) that we then interpret as ln ψα(θ). Bottom: An illustration of the Hamiltonian Monte Carlo algorithm. Dummy momentum variables are introduced and sampling of the given N-dimensional probability distribution is rewritten in 2N-dimensional phase space with an artificial effective Hamiltonian H. Samples are collected as snapshots of solutions of Hamilton's equations of motion.
Monte Carlo (VMC) [28] and other quantum Monte Carlo (QMC) [29] methods. Perhaps most notably, the quantum critical point separating the disordered and O(2) broken phases has been predicted at g_c ≈ 4.25.
However, as noted in the introduction, real-time evolution properties of the QRM have barely been explored. This is mainly due to the lack of suitable methods that can access experimentally relevant times t ≫ J⁻¹ at large system sizes in two dimensions. The ability to simulate relatively large system sizes is not only of theoretical interest but has technological applications in the study of dynamics of arrays of coupled Josephson junctions [30].
The evolution equation for a state Ψ = Ψ(θ, t), in the continuous basis |θ⟩, reads

i ∂_t Ψ(θ, t) = H Ψ(θ, t),    (2)

with appropriate periodic boundary conditions for each rotor k. Eq. 2 is prohibitively expensive to solve exactly even for a handful of interacting rotors. The continuous nature of the |θ⟩ basis exacerbates the problem.

A. Variational simulation
We represent a quantum state using a wavefunction ψ_α(θ), where α ∈ C^P is a set of P complex variational parameters. Since any |ψ⟩ ∈ H admits an expansion in terms of |θ⟩, we define the following un-normalized variational quantum state (VQS):

|ψ_α⟩ = ∫ dθ ψ_α(θ) |θ⟩,    (4)

where ψ_α(θ) = ⟨θ|ψ_α⟩. Building on previous work on continuous systems [16], our simulation of the real-time dynamics of the state given in Eq. 4 is based on the time-dependent variational Monte Carlo (t-VMC) method [15,31]. The core assumption that allows us to approximately solve Eq. 2 is that of time dependence of parameters α = α(t). The dynamics follows from making the time-dependent variational principle (TDVP) action

S[α] = ∫ dt ⟨ψ_α| (i d/dt − H) |ψ_α⟩ / ⟨ψ_α|ψ_α⟩    (5)

stationary with respect to variations of α(t), yielding equations of motion for the parameters:

Σ_ν S_µν α̇_ν = −i g_µ,  S_µν = ⟨O_µ* O_ν⟩ − ⟨O_µ*⟩⟨O_ν⟩,  g_µ = ⟨O_µ* E_L⟩ − ⟨O_µ*⟩⟨E_L⟩,    (6)

where O_µ(θ) = ∂_α_µ ln ψ_α(θ) are logarithmic derivatives, S is the quantum geometric tensor (QGT) and E_L is the local energy defined below.
Since quantum averages over an exponentially large Hilbert space H in the TDVP Eq. 6 cannot be computed exactly, Markov chain Monte Carlo (MCMC) sampling methods are often employed [33,34]. In VMC calculations, it is common to rewrite quantum averages, such as those in Eq. 6, as expressions amenable to estimation through sampling. For example, in the case of the Hamiltonian H, we obtain the local energy E_L:

⟨H⟩ = E_θ∼p [E_L(θ)],  E_L(θ) = ⟨θ|H|ψ_α⟩ / ψ_α(θ),    (7)

where p(θ) ∝ |ψ_α(θ)|² is the Born probability distribution associated with ψ_α. For more details about the specific sampling algorithm employed in this work, we refer the reader to Sec. II B and Appendix A 1.
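The local-energy trick above can be checked on a toy problem where E_L is known in closed form. The sketch below is an illustration, not the rotor-model implementation: it uses the 1D harmonic oscillator with its exact ground state as the trial wavefunction, for which E_L(x) = 1/2 identically, so the Monte Carlo estimate of ⟨H⟩ is exact sample by sample:

```python
import numpy as np

def local_energy(x):
    """Local energy E_L(x) = (H psi)(x) / psi(x) for the 1D harmonic
    oscillator H = -1/2 d^2/dx^2 + 1/2 x^2 with trial psi(x) = exp(-x^2/2).
    Since psi''(x) = (x^2 - 1) psi(x), E_L(x) = 1/2 for every x."""
    kinetic = -0.5 * (x**2 - 1.0)   # -(1/2) psi'' / psi
    potential = 0.5 * x**2
    return kinetic + potential

# Samples from p(x) ∝ |psi(x)|^2 = exp(-x^2), i.e. a normal with variance 1/2.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, np.sqrt(0.5), size=10_000)

energy = local_energy(samples).mean()  # MC estimate of <H>; exactly 1/2 here
```

For a nontrivial ansatz the local energy fluctuates from sample to sample, and its variance (zero here, since the trial state is an eigenstate) controls the statistical error of the estimate.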
After computing the matrix S and the vector g at time t, one can formally define α̇ = −i S⁻¹ g and use any ordinary differential equation (ODE) integrator (see Appendix A 3) to obtain the next set of parameters at time t + δt. However, the inverse is often ill defined.
One reason is that Monte Carlo estimates of matrix elements are noisy. Accumulated noise can render the matrix singular by making small eigenvalues vanish. Therefore, quickly and efficiently obtaining many uncorrelated samples from p(θ, t) ∝ |ψ_α(t)(θ)|² is crucial. The other reason is that the specific choice of ψ_α introduces redundancy between different parameters, producing linearly dependent or vanishing rows and columns in S. Therefore, choosing an efficiently parameterized trial wavefunction is equally important. In practice, adding more parameters to the wavefunction can sometimes unexpectedly reduce accuracy by making S ill conditioned.
In order to move forward with the algorithm, regularization schemes must be used. For ground-state optimization tasks, simply replacing S → S + ϵ1, for some small positive constant ϵ, often suffices to diminish the effect of small eigenvalues.
However, in this work, we regularize the S matrix by diagonalization S = U ΣU† at each time step. Having obtained eigenvalues σ²_µ such that Σ = diag(σ²_1, . . ., σ²_P), we define the pseudoinverse as

S⁻¹_λ = U Σ⁻¹_λ U†,  (Σ⁻¹_λ)_µµ = f(σ²_µ) / σ²_µ,    (9)

where f is a smooth cutoff function satisfying f(σ²) → 1 for σ² ≫ λ² and f(σ²) → 0 for σ² ≪ λ². We heuristically find that the smooth cutoff with a hyperparameter λ² in Eq. 9 is superior to traditional pseudoinverses when using adaptive integrators for updating parameters α. For more details on regularization, see Appendix A 2.
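A minimal sketch of such a spectrally regularized pseudoinverse is below. The specific smooth cutoff f(σ²) = 1/(1 + (λ²/σ²)⁶) is an assumption chosen only to satisfy the stated limits, not necessarily the functional form used in the paper:

```python
import numpy as np

def regularized_pinv(S, lam2):
    """Pseudoinverse of a Hermitian PSD matrix S via eigendecomposition
    with a smooth spectral cutoff. The cutoff f below (f -> 1 for
    sigma^2 >> lam2, f -> 0 for sigma^2 << lam2) is one plausible choice."""
    sigma2, U = np.linalg.eigh(S)
    sigma2 = np.clip(sigma2, 1e-300, None)    # guard against exact zeros
    f = 1.0 / (1.0 + (lam2 / sigma2) ** 6)    # smooth cutoff at scale lam2
    return (U * (f / sigma2)) @ U.conj().T
```

As λ² → 0 the cutoff becomes transparent and the expression reduces to the exact inverse, mirroring the λ² → 0 limit discussed in Appendix A 2.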
After calculating averages in Eq. 6 and appropriately regularizing the QGT inverse S⁻¹, one can use any external ODE integrator to perform time stepping in the top-level equation α̇ = −i S⁻¹ g. In this work, we use the embedded Bogacki-Shampine adaptive solver RK3(2) from the Runge-Kutta family [35-37].

B. Hamiltonian Monte Carlo
Hilbert-space averages defined in Eq. 6 cannot be evaluated analytically for an arbitrary ψ_α. To perform this task in an efficient and scalable way, we employ Hamiltonian Monte Carlo (HMC) [38,39] to obtain samples from the distribution p(θ, t) at each time step t. We make this choice because HMC offers a systematic way of making large steps in MCMC proposals while still keeping acceptance probabilities high, unlike more conventional approaches like random-walk Metropolis (RWM). This results in a Markov chain with considerably lower autocorrelation times, allowing for treatments of larger systems with less overall runtime spent on sampling.
For a generic probability distribution p(θ), HMC augments the configuration space with artificial momentum variables π = (π_1, . . ., π_N) ∼ N(0, M):

p(θ, π) ∝ p(θ) exp(−π^T M⁻¹ π / 2) ≡ e^{−β H(θ,π)},    (10)

for some choice of a positive-definite mass matrix M. Interpreting the exponent in Eq. 10 as an effective classical Hamiltonian β H(θ, π) inducing a Boltzmann weight e^{−βH}, Monte Carlo updates can be defined through numerical integration of the relevant Hamilton's equations.
Owing to insights from statistical physics, we know that a large number of particles in equilibrium following classical equations of motion have precisely this desired Boltzmann distribution. Given θ(0), π(0) and a small step size ε, a common choice is the leapfrog integrator:

π(τ + ε/2) = π(τ) − (ε/2) ∇_θ V(θ(τ)),
θ(τ + ε) = θ(τ) + ε M⁻¹ π(τ + ε/2),    (11)
π(τ + ε) = π(τ + ε/2) − (ε/2) ∇_θ V(θ(τ + ε)),

where V(θ) = − ln p(θ) and τ is the fictitious HMC time variable, unrelated to t in Eq. 5. This specific integrator is chosen because of its symplectic property [37,38] - it conserves phase-space volume exactly and energy (probability) up to small discretization errors, allowing for large jumps in θ space while keeping acceptance probabilities high. We note that higher-order symplectic integrators can be used as well.
After integrating for L steps, the new configuration (θ(Lε), π(Lε)) is proposed as the next sample in the Markov chain. It is common to apply the Metropolis-Hastings accept-reject step [33,34] even though the exact Hamiltonian flow would leave the energy (probability) of the configuration unchanged. This offsets the effects of unwanted numerical errors in the leapfrog scheme, usually improving overall performance for many samples [38,39].
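The full HMC update described above (momentum resampling, leapfrog integration, Metropolis-Hastings correction) can be sketched as follows for a generic target distribution. The Gaussian toy target, the scalar mass and all numerical values below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def hmc_step(theta, log_p, grad_log_p, eps, L, rng, mass=1.0):
    """One HMC proposal: leapfrog-integrate Hamilton's equations for the
    potential V(theta) = -log p(theta), then Metropolis accept/reject
    to correct for discretization error. Scalar mass assumed."""
    pi = rng.normal(0.0, np.sqrt(mass), size=theta.shape)
    H0 = -log_p(theta) + 0.5 * np.sum(pi**2) / mass

    q, p = theta.copy(), pi.copy()
    p += 0.5 * eps * grad_log_p(q)            # initial half kick
    for _ in range(L - 1):
        q += eps * p / mass                   # drift
        p += eps * grad_log_p(q)              # full kick
    q += eps * p / mass                       # final drift
    p += 0.5 * eps * grad_log_p(q)            # final half kick

    H1 = -log_p(q) + 0.5 * np.sum(p**2) / mass
    if np.log(rng.uniform()) < H0 - H1:       # accept with prob min(1, e^{H0-H1})
        return q, True
    return theta, False

# Toy target: standard normal in 2D.
log_p = lambda x: -0.5 * np.sum(x**2)
grad_log_p = lambda x: -x

rng = np.random.default_rng(0)
theta, accepted = np.zeros(2), 0
for _ in range(500):
    theta, ok = hmc_step(theta, log_p, grad_log_p, eps=0.2, L=10, rng=rng)
    accepted += ok
rate = accepted / 500
```

Because leapfrog nearly conserves the effective energy, acceptance rates stay close to one even though each proposal moves a distance of order Lε, which is the source of the low autocorrelation times mentioned above.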
Eqs. 11 simulate a swarm of effective classical particles whose positions and momenta follow the desired joint Boltzmann distribution in Eq. 10. Discarding all π samples is equivalent to marginalizing the distribution in Eq. 10. In practice, randomness is injected by sampling the normal distribution π(0) ∼ N(0, M) each time initial conditions are required for numerical integration.
Choosing the mass matrix M, the time step ε and the integration length L carefully is crucial for efficient exploration of the configuration space. In this work, we chose to set M and ε automatically, using heuristically proven [39-41] algorithms operating on samples from an extended warmup phase for each Markov chain individually. The integration length L was treated as a hyperparameter. For more details and specific values, see Appendix A 1.

C. The trial wavefunction
In this work, we use a variant of the standard convolutional neural network (CNN) architecture [42,43] to model ψ_α(θ). Our approach is built on those of Refs. [17,44]. Specifically, we set

ln ψ_α(θ) = Σ_k (w_D * h_{D−1})_k,    (12)

where * denotes a convolution over lattice indices k. Features h_{D−1} are the output of a (D − 1)-layer CNN defined by:

h^c_d = f_d(b^c_d + Σ_{c'} w^{cc'}_d * h^{c'}_{d−1}),    (13)

with an elementwise nonlinear activation function f_d, biases b_d, weights w_d at layer d and channel indices c, c'. We include all weights and biases in the set of trainable parameters α and use automatic differentiation (AD) techniques to obtain all derivatives O_µ required for evaluation of Eqs. 6. For CNN inputs h_0, we concatenate the following features:

h_0 = (cos θ, sin θ, cos 2θ, sin 2θ, . . ., cos Kθ, sin Kθ),    (14)

along the channel axis c = 1, . . ., 2K, as illustrated in Fig. 1. This construction allows us to include a limited number of higher Fourier modes a priori, improving ansatz expressivity in a controlled way. In this work, we set D = 2, K = 4 for larger two-dimensional (8 × 8) experiments and K = 1 for smaller systems.
To maintain analytic dependence on parameters α, we restrict the CNN nonlinearities f_d to polynomial functions. The Taylor expansion of the logarithm of the zeroth-order modified Bessel function of the first kind is used:

f(z) = ln I_0(z) = z²/4 − z⁴/64 + O(z⁶),    (15)

truncated to a low-order polynomial. This particular activation function choice is motivated by the appearance of I_0 in the version of the restricted Boltzmann machine (RBM) adapted to the QRM in Ref. [28]. It has the advantage of maintaining the holomorphic dependence of ψ_α on α and preserving the form of Eqs. 6.
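A minimal sketch of the periodic input features of Eq. 14 and the polynomial activation of Eq. 15 is below; truncation of the activation at fourth order is an assumption, chosen because it reproduces ln I_0 to O(z⁶) at small arguments:

```python
import numpy as np

def fourier_features(theta, K):
    """Stack cos(k*theta), sin(k*theta) for k = 1..K as 2K input channels,
    enforcing 2*pi periodicity of the ansatz in every rotor angle."""
    ks = np.arange(1, K + 1)[:, None]
    return np.concatenate([np.cos(ks * theta[None, :]),
                           np.sin(ks * theta[None, :])], axis=0)

def log_bessel_activation(z):
    """Truncated Taylor expansion of ln I0(z): z^2/4 - z^4/64 + O(z^6).
    Polynomial, hence holomorphic in complex network parameters."""
    return z**2 / 4.0 - z**4 / 64.0
```

In the paper's setting these features would feed the first convolution of Eq. 13; here they are exercised standalone for clarity.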
In this work, we focus on a simple two-layer CNN ansatz to control the number of parameters P. In addition to nontrivially affecting the QGT inverse (see subsection II A), the cost to diagonalize the QGT in order to regularize the inverse in Eq. 9 grows as O(P³). Heuristically, we also find that introducing more parameters α requires more Monte Carlo samples to correctly resolve the relevant averages in Eq. 6 and does not significantly contribute to simulation accuracy in our case. A systematic investigation of larger neural-network architectures is left for future work.

III. RESULTS
In this section, we study dynamical properties of several observables of the QRM, focusing on the two-dimensional model. A series of benchmarks in one and two dimensions can be found in section III A.
We simulate the effects of instantaneous quenches of the coupling constant g in Eq. 1. Specifically, we initialize parameters α of the ansatz ψ_α illustrated in Fig. 1 to the ground state of the QRM Hamiltonian with g = g_i using imaginary-time variational Monte Carlo (VMC) [31,43] methods. We then simulate real-time dynamics under g = g_f. In this work, we focus on quenches from the ordered phase to the disordered one: g_i < g_c < g_f. In Fig. 2, we choose a square 8 × 8 lattice, tracking the dynamics of the potential energy density and the average magnetization magnitude M along with its x, y components, defined by M = N⁻¹ Σ_k ⟨n_k⟩_t. Averages ⟨•⟩_t are performed with respect to the ansatz state at time t. In addition, corresponding circular variances were defined as Var(θ_k) = −2 ln |⟨n_k⟩_t| and averaged over the lattice index k.
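The circular variance above is straightforward to estimate from angle samples. A small helper is sketched below; the sample arrays in the usage are hypothetical stand-ins for Monte Carlo samples of a single rotor angle:

```python
import numpy as np

def circular_variance(theta_samples):
    """Var(theta) = -2 ln |<n>|, where n = (cos theta, sin theta) and <.>
    is the sample mean; vanishes for perfectly aligned rotors and grows
    without bound as the angular distribution approaches uniform."""
    m = np.abs(np.mean(np.exp(1j * theta_samples), axis=0))
    return -2.0 * np.log(m)
```

For a narrow wrapped-normal distribution of width σ, this estimator returns approximately σ², matching the usual variance of an unwrapped angle.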
These observables were chosen as a proxy for thermalization. Across a wide range of quenches, we observe convergence to their respective equilibrium values at g = g_f, see Fig. 2. We observe two distinct dynamical regimes in relation to the quantum critical point g_c ≈ 4.25, when g_i < g_c. For small quenches (left column of Fig. 2) we see the expected outcome - slower equilibration with only small fluctuations in the direction of the magnetization. However, for moderate to large quenches in Fig. 2, we observe a (transient) demagnetization of the sample and convergence to a new equilibrium state.
In addition, we define a measure of average vorticity over a surface A with edge ∂A on the lattice. Using Stokes' theorem, we rewrite the expression as a contour integral over ∂A in the positive direction, v(A) = (2π)⁻¹ ∮_∂A ∇θ · dl, evaluated on the lattice as a sum of wrapped nearest-neighbor phase differences. In Fig. 3 (right panel), we plot v(A) averaged over all n_ℓ square ℓ × ℓ surfaces:

v_ℓ = n_ℓ⁻¹ Σ_A v(A).    (17)

As expected, we find almost zero vorticity for quenches in the ordered phase, while larger fluctuations are generated for quenches across the critical point. We postpone a detailed analysis for future work.
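On the lattice, the contour integral reduces to summing wrapped phase differences around the plaquette boundary. A sketch for a single plaquette follows; the single-vortex test configuration is chosen purely for illustration:

```python
import numpy as np

def wrap(dtheta):
    """Wrap angle differences into [-pi, pi)."""
    return (dtheta + np.pi) % (2.0 * np.pi) - np.pi

def winding_number(loop_angles):
    """Winding of the rotor field around a closed loop: sum of wrapped
    nearest-neighbor phase differences along the contour, divided by 2*pi."""
    closed = np.append(loop_angles, loop_angles[0])
    return np.round(np.sum(wrap(np.diff(closed))) / (2.0 * np.pi))

# A single vortex: rotor angles at the corners of a plaquette enclosing
# the vortex core, traversed counterclockwise.
corners = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
d = corners - np.array([0.5, 0.5])      # displacement from the core
theta = np.arctan2(d[:, 1], d[:, 0])    # vortex field theta = arg(d)
```

Reversing the traversal direction flips the sign of the winding, and a uniform (vortex-free) configuration gives zero, as expected for quenches within the ordered phase.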
Aside from local observables, such as energy and magnetization, one also has access to global observables such as the Loschmidt echo. The latter has some interesting properties in the context of dynamical phase transitions [45] and quantum chaos [46]. The Loschmidt echo expresses the quantum state overlap between the initial state and some time-evolved state. In general, the fidelity F(Ψ, Φ) between two generic normalized quantum states Ψ and Φ is defined as F(Ψ, Φ) = |⟨Ψ|Φ⟩|. During the evolution, we expect the fidelity F(Ψ(t = 0), Ψ(t)) to decay as a function of time t, for any given initial state |Ψ(t = 0)⟩.
To evaluate this quantity using Monte Carlo sampling of unnormalized ansatz wavefunctions ψ(θ, t) = ψ_α(t)(θ), we rewrite the fidelity definition as:

F²(t) = ⟨ψ(θ, t)/ψ(θ, 0)⟩_θ∼p(θ,0) × ⟨ψ(θ, 0)/ψ(θ, t)⟩_θ∼p(θ,t),    (19)

following Refs. [47,48]. The expression in Eq. 19 is manifestly independent of the normalization factor. In practice, we take the real part of Eq. 19 to discard the small nonzero imaginary part coming from finite-sample estimates of the two factors. In addition, we calculate and store both factors in log space to preserve accuracy and maintain numerical stability.
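A log-space estimator of this kind might be sketched as below; the argument names and the max-shift stabilization are illustrative choices, not the paper's exact implementation:

```python
import numpy as np

def log_mean_exp(log_w):
    """Numerically stable log of the mean of exp(log_w); complex-safe,
    shifting by the largest real part before exponentiating."""
    shift = np.max(log_w.real)
    return shift + np.log(np.mean(np.exp(log_w - shift)))

def fidelity_squared(logpsi0_s0, logpsit_s0, logpsi0_st, logpsit_st):
    """MC estimator of F^2 = <psi_t/psi_0>_{p(.,0)} * <psi_0/psi_t>_{p(.,t)},
    with wavefunction values passed as complex logarithms ln psi evaluated
    on samples drawn from p(.,0) (suffix _s0) and from p(.,t) (suffix _st)."""
    a = log_mean_exp(logpsit_s0 - logpsi0_s0)
    b = log_mean_exp(logpsi0_st - logpsit_st)
    return np.exp(a + b).real
```

Working entirely with log-ratios keeps the two factors well conditioned even when the unnormalized wavefunctions span many orders of magnitude.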
As expected, we find that the return probability (or fidelity, in short) decays quickly with time, as illustrated in Fig. 3 (left panel). For smaller quenches, the fidelity shoots back up to a nonzero value, suggesting a finite overlap between the initial state and the long-time "equilibrium" state after the quench. The latter may be interpreted as a signature of quenching between two Hamiltonians in the ordered phase.
As a measure of the fidelity decay, we introduce another time scale τ_1/2, defined as the time needed for the fidelity to decrease by 50%. We observe that τ_1/2 scales linearly with the quench g_f. This result matches basic estimates given by the second-order short-time expansion of F(t) and the uncertainty relation ∆E ∆t ≥ 1/2. Therefore, the fidelity decay time can be lower bounded by ∆E⁻¹, estimated using samples from the initial state ψ_α(0) [49]. Reference points from this calculation are presented in Fig. 3 (left, inset). This comparison demonstrates that the t-VMC method can be used to estimate quantities of experimental interest for system sizes unreachable by other wavefunction-based methods.
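The ∆E lower bound can be estimated from local-energy samples alone. A sketch is below, with the factor of 1/2 taken from the stated uncertainty relation and synthetic local energies used purely for illustration:

```python
import numpy as np

def fidelity_halftime_bound(local_energies):
    """Lower bound on the fidelity decay time from dE * dt >= 1/2:
    tau >= 1 / (2 dE), where dE is the standard deviation of
    local-energy samples evaluated on the initial state."""
    dE = np.std(local_energies)
    return 1.0 / (2.0 * dE)

# Synthetic local energies with mean 10 and spread dE = 2 (illustrative).
rng = np.random.default_rng(0)
bound = fidelity_halftime_bound(rng.normal(10.0, 2.0, size=100_000))
```

Only the spread of the local energies matters here; their mean (the energy itself) drops out of the bound.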

A. Benchmarks
To substantiate our results, we perform a series of benchmarks and compare results to tensor-network simulations for one- and two-dimensional versions of the model. In particular, we benchmark the results against the time-evolving block decimation (TEBD) [50,51] algorithm. For all benchmarks, states were initialized to the coherent superposition of all basis states |ψ(0)⟩ ∝ ∫ dθ |θ⟩ by explicitly setting the final convolution kernel w_D (Eq. 12) to zero. All presented tensor-network simulations have been performed with a fixed singular value cutoff. Convergence within the matrix product state (MPS) variational manifold has been confirmed by repeating simulations with larger cutoff values.
We organize numerical benchmarks as follows. First, we compare t-VMC results with TEBD for an extended one-dimensional and a smaller two-dimensional system. Practical error estimates are defined. Then, we turn to examining effects of key hyperparameters in the t-VMC approach and show evidence of self-consistent convergence.
Following Refs. [17,52], we use the following figure of merit:

r²(t) = D²(ψ_α(t+δt), e^{−iHδt} ψ_α(t)) / D²(ψ_α(t), e^{−iHδt} ψ_α(t)),    (20)

where |ψ(t)⟩ = |ψ_α(t)⟩ and D(•, •) represents the Fubini-Study distance on the Hilbert space H. We estimate r²(t) at each time t using HMC samples from the ansatz (see Ref. [17] and Appendix D). Intuitively, r²(t) measures an appropriately normalized deviation between the full state e^{−iHδt}|ψ(t)⟩ after one time step δt and its projection onto the variational manifold, ψ_α(t+δt). We plot the integrated error

R²(t) = ∫₀ᵗ r²(s) ds,    (21)

to reflect error propagation through time as accurately as possible. We remark that the integrated-squared error in Eq. 21 should be interpreted as an upper bound on the square of the integrated error R(t) = ∫₀ᵗ r(s) ds due to the triangle inequality.
In Fig. 4 (left), we show that this algorithm performs well on a one-dimensional system of N = 64 rotors, where the growth of the so-called bond dimension χ is limited. Convergence to appropriate equilibrium values is reached for both methods, with good agreement at intermediate times for the dynamics of the potential energy density ϵ_p(t) and the Loschmidt echo F(t). The integrated residual R²(t) grows more rapidly for lower values of g. This is expected because the initial state ψ(0) has lower energy for larger values of g in the QRM Hamiltonian, Eq. 1, representing a more typical state in the disordered phase.
In contrast to the one-dimensional (1D) case, in Fig. 4 (center), we observe that the TEBD method exponentially grows the MPS bond dimension χ past the cutoff χ_max = 1000 at relatively short times. We plot the number of parameters P_MPS in the MPS as a function of time in the right panel of Fig. 4, in units of the number of parameters P_CNN in the CNN ansatz presented in this work. We see qualitative agreement between the two methods for early times, before χ grows to the point where further simulation is numerically prohibitively expensive.
In Fig. 5 we show evidence that the variance of observables is controllable through the most important Monte Carlo (HMC) hyperparameters, while the bias is mostly controlled by the regularization of the S-matrix inverse (Eq. 9). In the top panel of Fig. 5, we see that the standard deviation of the estimator for total magnetization M(t) scales with the number of HMC samples N_s in the expected way, σ ∝ N_s^{−1/2}, for three different times during the evolution.
In addition, we report that heuristically varying the number of leapfrog integration steps L increases estimator variances the most around segments of trajectories with higher curvature, as evidenced by the middle panel of Fig. 5. Intuitively, in the limit of L → 1 and small leapfrog step sizes ε, HMC approaches random-walk Metropolis sampling (see Ref. [39] and Appendix A 1), which suffers from lower acceptance rates and longer mixing times in cases of sharply peaked target distributions. We observe that even a moderate increase to L ≈ 10, accompanied by the automatic hyperparameter tuning described in Sec. II B, considerably reduces variance.
Finally, we explore the effects of S-matrix regularization (Eq. 9). In practice, we set λ² itself in an adaptive manner each iteration:

λ² = max(a_c, r_c σ²_max),

depending on the S-matrix spectrum through its largest eigenvalue σ²_max. In the bottom panel of Fig. 5, we see that, for a fixed a_c = 10⁻⁵, increasing r_c increases the estimator bias. Excluding relevant eigenvalues from participating in time evolution through Eq. 9 can lead to a failure to capture parts of the relevant physics. Overall, both t-VMC and TEBD algorithms predict similar dynamical behavior of the potential energy density (Eq. 16) and the fidelity (Eq. 19), as shown in Fig. 4. However, the number of parameters in the MPS grows exponentially due to entropy buildup during time evolution. Tensor-network real-time evolution algorithms [53,54] based on MPS or two-dimensional architectures such as projected entangled pair states (PEPS) [12,55] face several challenges when extended to late times and higher dimensions. Incorporating continuous degrees of freedom exacerbates the problem - tensor-network algorithms are limited to using the locally truncated eigenbasis of the angular momentum operator L_k in the QRM Hamiltonian in Eq. 1, in contrast to the t-VMC method (see Appendix B).

IV. CONCLUSION
We present a method to approximate unitary dynamics of continuous-variable quantum many-body systems, based on custom neural-network quantum states. The approach employs Hamiltonian Monte Carlo sampling and custom regularization of the quantum geometric tensor. The method was benchmarked on quench dynamics of two-dimensional quantum rotors. We indicated that our calculations are able to access nonlocal quantities like the return probability. Good agreement was found with tensor-network-based TEBD simulations for the case of one-dimensional systems of comparable size. Finally, we showed evidence that the method is controlled by a handful of key hyperparameters. Our approach paves the way for accurate nonequilibrium simulations of continuous systems at previously unexplored system sizes and evolution times, bridging the gap between simulation and experiment.
V. ACKNOWLEDGEMENTS

M. M. acknowledges insightful discussions with Filippo Vicentini about t-VMC regularization and with Bob Carpenter about the role of circular geometry in Monte Carlo sampling and Hamiltonian Monte Carlo details. In addition, discussions with Sandro Sorella about the infinite variance problem and James Stokes about different ansätze were very helpful for fine tuning simulations. MM also acknowledges support from the CCQ graduate fellowship in computational quantum physics. The Flatiron Institute is a division of the Simons Foundation. D. S. was supported by AFOSR: Grant No. FA9550-21-1-0236 and NSF: Grant No. OAC-2118310.

Software libraries
The code used in this work has been packaged into an installable library and is publicly available to reproduce any results in this work or explore new ones: github.com/Matematija/continuous-vmc.
It was built on JAX [56] for array manipulations, automatic differentiation for sampling and optimization, and GPU support; Flax [57] for neural-network construction and manipulation; and NumPy [58] and SciPy [59] for CPU array manipulations. Matplotlib [60] was used to produce figures.
TABLE A1. Relevant HMC hyperparameters.

ε | Step size | Dynamically adapted | The leapfrog integrator step size.
M | Mass matrix | Dynamically adapted | The covariance (metric) tensor of the dummy momentum variables π.
L | Number of leapfrog steps | - | The number of leapfrog steps taken before proposing a sample. (If γ > 0, we relabel L → L0.)
- | Target acceptance rate | - | Used for optimization of ε by the algorithm in Ref. [40].
Nw | Length of warmup phase | 800 | Total number of MC samples used for extended warmup.
Np | Number of slow windows | 5 | Total number of slow adaptation windows during warmup.
Ns | Number of samples | - | Number of MC samples collected for Eq. 6 after warmup.

During warmup, the diagonal of the mass matrix is adapted from the sampled circular variables as m_k = Var(θ_k), using the appropriate formula for the variance of periodic random variables presented in the main text, Sec. III.
After initializing each θ_k ∼ Uniform(−π, π), we begin the warmup phase with a single fast window of length Nw/12, followed by five slow windows. The first slow window is Nw/36 steps long, with each subsequent slow window doubling in size. Finally, we end the warmup by running an additional fast window for the remaining Nw/18 steps. After each window, the HMC transition kernel (the leapfrog ODE solver) is updated with adapted values for ε and, for slow windows, M. After the final fast window, all hyperparameters are locked in and the actual collection of the N_s samples for Eq. 6 begins. The full list of relevant hyperparameters can be found in Table A1.
We use automatic differentiation (using JAX [56]) to obtain numerically exact gradients ∇_θ ln p(θ, t) needed to run the leapfrog integrator. To avoid loss of accuracy or numerical instabilities through exponentiation, we employ the following identity:

∇_θ ln p(θ, t) = 2 Re [∇_θ ln ψ_α(t)(θ)],

when the logarithm of the wavefunction is parameterized instead of the wavefunction itself.
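The identity is easy to verify numerically. The toy complex log-wavefunction below is an arbitrary illustration, with the analytic gradient checked against a central finite difference:

```python
import numpy as np

def grad_log_p_from_logpsi(grad_logpsi):
    """Since p(theta) ∝ |psi|^2 = exp(2 Re ln psi),
    grad ln p = 2 Re grad ln psi."""
    return 2.0 * np.real(grad_logpsi)

# Toy complex log-wavefunction ln psi(theta) = a * cos(theta), a complex.
a = 0.7 + 0.3j
logpsi = lambda th: a * np.cos(th)
log_p = lambda th: 2.0 * np.real(logpsi(th))  # up to a normalization constant

th0, h = 0.4, 1e-6
fd = (log_p(th0 + h) - log_p(th0 - h)) / (2.0 * h)   # finite difference
analytic = grad_log_p_from_logpsi(-a * np.sin(th0))  # d/dth ln psi = -a sin
```

In the actual code this gradient would come from automatic differentiation of the network's log-amplitude rather than a closed form.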
For completeness, we note that a common precaution against leapfrog integration getting stuck in regions of high curvature was used in this work. Instead of fixing the integration length to a specific value L = L_0, it is randomly chosen between (1 − γ)L_0 and (1 + γ)L_0 each time the integrator is called, with a new hyperparameter 0 ≤ γ < 1. This jittering of trajectory lengths can help HMC walkers move away from regions of high curvature if they get stuck [38,39,61].
Finally, to collect more independent samples by utilizing modern massively parallel GPU hardware, we run N c such chains in parallel, each one warmed up independently.
Finally, we note that the HMC proposal outlined in Eq. 11 approaches the RWM update

θ' = θ + √Σ η,  η ∼ N(0, 1),

in the limit of few leapfrog integrator steps, L → 1. Indeed, for L = 1 and small step sizes ε, Eq. 11 becomes

θ(ε) = θ(0) + ε M⁻¹ π(0) + O(ε²),

where ε² M⁻¹ plays the role of the covariance Σ and π(0) ∼ N(0, M) by construction in Eq. 10.

Numerical regularization schemes
After evaluating the averages in Eq. 6 at time t, one needs to solve the linear system i S α̇ = g to obtain the α̇ needed to progress to time t + δt. Since the S matrix is singular in most cases of interest, a robust regularization scheme is needed. As pointed out in the main text, replacing S → S + ϵ1 is often enough in the case of ground-state searches (imaginary-time evolution). We remark that this is equivalent to the L2-regularized least-squares solution of i L α̇ = h:

α̇ = argmin_x ( ∥i L x − h∥²₂ + ϵ ∥x∥²₂ ),

where L† L = S is the Cholesky decomposition of the S matrix (assuming S is positive definite), L† h = g, and ∥•∥₂ is the standard Euclidean 2-norm on C^P.
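The stated equivalence can be checked numerically: the Tikhonov-regularized linear solve coincides with the ridge least-squares solution of the stacked system. The random matrices below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
P, eps = 6, 1e-2
L = rng.normal(size=(P, P)) + 1j * rng.normal(size=(P, P))
h = rng.normal(size=P) + 1j * rng.normal(size=P)

S = L.conj().T @ L          # so that L^dag L = S
g = L.conj().T @ h          # and   L^dag h = g

# Regularized TDVP solution: alpha_dot = -i (S + eps*1)^{-1} g.
direct = -1j * np.linalg.solve(S + eps * np.eye(P), g)

# Same solution from the L2-regularized least squares
# min_x ||i L x - h||^2 + eps ||x||^2, written as a stacked system.
A = np.vstack([1j * L, np.sqrt(eps) * np.eye(P)])
b = np.concatenate([h, np.zeros(P)])
ridge = np.linalg.lstsq(A, b, rcond=None)[0]
```

The agreement follows from the normal equations: A†A = S + ϵ1 and A†b = −i g, reproducing the Tikhonov solve exactly.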
As outlined in the main text, we instead adopt a regularization scheme based on the spectrum of the S matrix, S = U ΣU†, where Σ = diag(σ²_1, . . ., σ²_P). Our definition of the pseudoinverse is

S⁻¹_λ = U Σ⁻¹_λ U†,  (Σ⁻¹_λ)_µµ = f(σ²_µ) / σ²_µ,    (A5)

with a smooth cutoff function f(σ²) that approaches 1 for σ² ≫ λ² and 0 for σ² ≪ λ². In the limit of λ² → 0, we recover the actual matrix inverse. As opposed to the more traditional choice of the step function f(σ²) = θ(σ² − λ²), we find that choosing a smooth functional form for f(σ²) in Eq. A5 makes the adaptive time stepping in the top-level integration routine (see Appendix A 3) more stable.
As noted in the main text, we set λ² to

λ² = max(a_c, r_c σ²_max)

each iteration, where σ²_max is the largest eigenvalue of S, with a_c = 10⁻⁴ and r_c = 10⁻² chosen for 2D calculations and a_c = 10⁻⁵ and r_c = 10⁻⁴ for 1D benchmarks. To track potential over-regularization and as a measure of ansatz expressivity, we define the effective rank ρ(S) = Σ_µ f(σ²_µ). Intuitively, since 0 < f(σ²) < 1 for all eigenvalues σ², ρ(S) can be interpreted as the effective number of eigenvalues that have not been set to zero by the regularization function f. In other words, it corresponds to the number of parameters in α that get updated at time t.
We plot ρ(S) as a function of time in Fig. 6 for some simulated quenches. In all cases, we see that the effective rank fraction ρ/P increases rapidly toward 1 at intermediate times that, for larger quenches, correspond to rapid oscillations and the onset of vorticity. In those cases, it is natural to interpret this regime as almost all parameters α being important to capture the relevant physics. At later times, ρ/P converges to values below 10%, indicating equilibration and less oscillatory behavior.
For completeness, we mention that alternative regularization techniques have been explored as well. For example, the method of Schmitt and Heyl in Ref. [17], based on the signal-to-noise ratio (SNR) for each eigenvalue in σ², represents a computationally and physically well-motivated approach. However, it did not bring any measurable performance improvement in our case.

The time-dependent variational principle and ODE integrators
To make use of the TDVP action in Eq. 5 in the main text, to propagate the variational parameters forward in time, one must construct the corresponding Euler-Lagrange equations. To this end, we first manipulate the action into a more transparent form.

Local basis truncation

We work in the eigenbasis of the angular momentum operators L_k, ⟨θ_k|m_k⟩ = e^{i m_k θ_k}/√(2π), which is identical to eigenfunctions for a particle on a circle at each lattice site - we have a product basis |m⟩ = |m_1, . . ., m_N⟩. The Hamiltonian given in Eq. 1 can then be expressed in this basis. After inserting the identity 1 = ∫ dθ |θ⟩⟨θ| into the second term and simple integration, we obtain

⟨m|H|m'⟩ = (Jg/2) Σ_k m_k² δ_{m m'} − (J/2) Σ_⟨k,l⟩ (δ_{m_k, m'_k+1} δ_{m_l, m'_l−1} + δ_{m_k, m'_k−1} δ_{m_l, m'_l+1}) Π_{j≠k,l} δ_{m_j m'_j},    (B3)

where δ_•• is the Kronecker δ symbol. The structure of Eq. B3 suggests rewriting the original Hamiltonian as

H = (Jg/2) Σ_k L_k² − (J/2) Σ_⟨k,l⟩ (U_k U_l† + U_k† U_l),

where U_k is the raising operator, U_k |m_k⟩ = |m_k + 1⟩.
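For a small chain, the truncated-basis Hamiltonian implied by Eq. B3 can be built explicitly. The function below is an illustrative sketch, assuming open boundary conditions and a dense matrix for simplicity:

```python
import numpy as np
from itertools import product

def qrm_hamiltonian_1d(N, M, g, J=1.0):
    """Dense QRM Hamiltonian for an open 1D chain of N rotors in the
    truncated angular-momentum basis m_k in {-M, ..., M}.
    Diagonal: (J*g/2) sum_k m_k^2. Off-diagonal: -(J/2) for each bond
    flipping (m_k, m_l) -> (m_k +- 1, m_l -+ 1), i.e. the matrix
    elements of -J cos(theta_k - theta_l)."""
    ms = list(product(range(-M, M + 1), repeat=N))
    index = {m: i for i, m in enumerate(ms)}
    H = np.zeros((len(ms), len(ms)))
    for i, m in enumerate(ms):
        H[i, i] = 0.5 * J * g * sum(mk**2 for mk in m)
        for k in range(N - 1):                 # bond (k, k+1)
            for s in (+1, -1):
                mp = list(m)
                mp[k] += s
                mp[k + 1] -= s
                j = index.get(tuple(mp))       # None if outside truncation
                if j is not None:
                    H[i, j] -= 0.5 * J
    return H
```

Since the hopping term conserves total angular momentum, the truncation only affects states near the edges of the local basis, which is why moderate M already suffices (see the convergence study below in this appendix).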

Convergence and cutoffs
To reinforce TEBD results as a benchmark in two dimensions, we further study the dependence of obtained results on the singular value cutoff c and the local basis truncation parameter M defined in subsection B 1. In both cases, we perform a series of calculations on a 4 × 4 system for a set of different cutoffs and for two different values of the coupling constant g in the QRM Hamiltonian (Eq. 1).
In the case of the singular value cutoff c, we examine a range of values between 10⁻⁴ and 10⁻¹¹ in Fig. 7. We observe that the plotted curves converge only for c ≤ 10⁻⁹. Satisfactory energy conservation is likewise reached for c ≈ 10⁻⁹, as is convergence of the measured observables, with the exception of the highest value c = 10⁻⁴. In that case, almost all singular values are discarded, extinguishing any nontrivial dynamics. Any further decrease in c has the undesirable side effect of rapidly growing the MPS bond dimension χ, making longer simulations prohibitively expensive.
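The role of the cutoff can be illustrated on a single bond: after an SVD of a two-site tensor, only singular values above c are retained, which bounds the bond dimension χ. The relative normalization s_i / s_0 used below is a common convention and an assumption here, not necessarily the exact one of our TEBD implementation.

```python
import numpy as np

def truncate_bond(theta, c=1e-9, chi_max=1000):
    """SVD-truncate a two-site tensor 'theta' (reshaped as a matrix across
    the bond): drop singular values with s_i / s_0 < c, cap rank at chi_max."""
    U, s, Vh = np.linalg.svd(theta, full_matrices=False)
    keep = s / s[0] >= c                  # relative singular-value cutoff
    chi = min(int(keep.sum()), chi_max)   # new bond dimension
    return U[:, :chi], s[:chi], Vh[:chi, :]

# A rank-deficient example: only the dominant values survive c = 1e-3.
theta = np.diag([1.0, 1e-2, 1e-12])
U, s, Vh = truncate_bond(theta, c=1e-3)
```

Lowering c keeps more singular values and therefore grows χ, which is exactly the cost trade-off discussed above.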
We explore a range of values for the local basis truncation parameter M as well. For both reference values of the coupling constant g, we observe fast convergence towards self-consistent time evolution. These results indicate a lower sensitivity to the value of M, as long as it lies above a threshold of M ≳ 3 (M = 5 was used in the main text). Therefore, for short to intermediate evolution times, Jt ∼ 1−10, we expect the numerical cost of increased bond dimensions to remain dominated by the singular value cutoff c.
In all cases, the bond dimension χ increases quickly and saturates at the cutoff value χ_max, as is characteristic of MPS-based calculations in two dimensions. Therefore, to estimate any additional errors coming from the choice of χ_max (set to 10³ in the main text), we perform an additional set of independent calculations for a range of values at g = 8.0. Results are presented in Fig. 9, where we plot the estimated errors in the reference observables. We find that the relevant errors are negligible for values as low as χ_max ≈ 600, at least for intermediate times, with the reference observables quickly converging to the desired precision.

FIG. 2. Results for different quenches from the initial value g_i = 3 on a two-dimensional 8 × 8 square lattice. Left: Potential energy, magnetization and angular variance as functions of real time. For the small quench to g_f = 4.5, we observe the expected behavior, with a slower approach to the new ordered equilibrium state; the convergence is similar to an adiabatic change. The moderate quench to g_f = 6.0 exhibits a sharp increase in the rotor-angle variance, accompanied by a single flip (right panel) of the average magnetization at t ≈ J⁻¹. For the large quench to g_f = 9.0, many rotor flips occur after the first one, indicating a much more detailed exploration of the underlying Hilbert space; convergence to the new equilibrium starts taking place only for t ≳ 5J⁻¹. Right: A parametric plot of the mean rotor direction. We observe a more thorough exploration of the magnetization sphere for larger quenches.

FIG.4.One-and two-dimensional benchmarks and comparison with tensor-network data.Evolution was performed starting from a coherent superposition state |ψ(0)⟩ ∝ dθ |θ⟩.Results are compared with the TEBD tensor-network algorithm evolving a matrix product state (MPS) in the conjugate angular-momentum eigenbasis (see Sec. III A and Appendix B).Left: A onedimensional benchmark on a chain with N = 64 rotors and open boundary conditions.Center: A two-dimensional benchmark of the t-VMC method on a 4 × 4 lattice and open boundary conditions.We note that disagreement between t-VMC and TEBD results appears as the maximum bond dimension χmax is reached.Singular value cutoff of 10 −12 was used.Right: The growing number of MPS parameters PMPS associated with the increasing bond dimension χ is plotted in units of the number of the CNN parameter count PCNN as a function of time.One-and two-dimensional cases are compared.A cutoff of χmax = 1000 was reached in the 2D system for the singular value cutoff of 10 −9 .

FIG. 5. Effects of key hyperparameters on magnetization measurements. All experiments were performed on a one-dimensional chain with N = 32. Top: Effects on magnetization estimates of varying the number of HMC samples N_s. Errors were estimated using bootstrap resampling independently at different times and show the expected scaling σ_M ∝ N_s^{−1/2}.

To perform tensor-network calculations, we truncate the local basis states to {|−M⟩, . . ., |M⟩}, so that L⁺ |M⟩ = L⁻ |−M⟩ = 0, and set up the corresponding matrix product operator (MPO) representation of the Hamiltonian. We use M = 5 throughout. For use in real-time evolution through the TEBD algorithm in Sec. III A of the main text, we initialize the trial wavefunction as a matrix product state (MPS) [63] (see Appendix C 2). We exploit |ψ(t = 0)⟩ ∝ ∫ dθ |θ⟩ = ⊗_{i∈Λ} |m_i = 0⟩ (B7) to match the initial state given in Sec. III A of the main text by initializing the corresponding MPS to have bond dimension χ = 1.
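As a sketch of the truncated local basis, the (2M + 1)-dimensional angular-momentum operators can be built as plain matrices. The convention L± |m⟩ = |m ± 1⟩ with hard walls at ±M is assumed here, matching L⁺ |M⟩ = L⁻ |−M⟩ = 0.

```python
import numpy as np

def rotor_ops(M):
    """Truncated single-site operators in the basis {|-M>, ..., |M>},
    with basis index i corresponding to m = i - M (dimension 2M + 1)."""
    d = 2 * M + 1
    m = np.arange(-M, M + 1)
    Lz = np.diag(m.astype(float))        # L |m> = m |m>
    Lp = np.diag(np.ones(d - 1), k=-1)   # L+ |m> = |m+1>, with L+ |M> = 0
    Lm = Lp.T                            # L- |m> = |m-1>, with L- |-M> = 0
    return Lz, Lp, Lm

M = 5
Lz, Lp, Lm = rotor_ops(M)
psi0 = np.zeros(2 * M + 1)
psi0[M] = 1.0                            # |m = 0>: the chi = 1 initial state
```

From these building blocks, the on-site L² term and the nearest-neighbor L⁺L⁻ hopping terms of the rewritten Hamiltonian can be assembled into an MPO in the usual way.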
FIG. 3. Fidelity and vorticity as functions of time. Left: The time-dependent many-body fidelity F(t) defined in Eq. 19, for a number of quenches. For trajectories quenching to values of g_f in the same equilibrium phase, we see convergence to nonzero values at late times. Conversely, trajectories with g_f > g_c converge to F(t → ∞) = 0. Additionally, τ_{1/2} (the time it takes for the fidelity to decrease by 50%) is shown to scale linearly with g, in agreement with the appropriate uncertainty relation ∆E∆t ≥ 1/2. Right: The onset of vorticity (defined in Eq. 18) for three quenches of increasing magnitude.

TABLE I. The list of relevant hyperparameters for the Hamiltonian Monte Carlo algorithm with their values used in this work.
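For reference, the core of a Hamiltonian Monte Carlo update (leapfrog integration followed by a Metropolis accept/reject step) can be sketched as below. The step size and number of leapfrog steps are stand-ins for the hyperparameters listed in Table I, and the sketch samples a toy Gaussian target rather than |ψ|².

```python
import numpy as np

def hmc_step(theta, log_prob, grad_log_prob, rng, eps=0.1, n_leap=10):
    """One HMC update: draw unit-mass momenta, integrate Hamiltonian
    dynamics with the leapfrog scheme, then accept/reject."""
    p = rng.standard_normal(theta.shape)
    th, g = theta.copy(), grad_log_prob(theta)
    H0 = 0.5 * p @ p - log_prob(theta)            # initial "energy"
    for _ in range(n_leap):                       # leapfrog integration
        p = p + 0.5 * eps * g
        th = th + eps * p
        g = grad_log_prob(th)
        p = p + 0.5 * eps * g
    H1 = 0.5 * p @ p - log_prob(th)
    if rng.random() < np.exp(min(0.0, H0 - H1)):  # Metropolis test
        return th, True
    return theta, False

# Toy target: standard normal in 2D, log p(theta) = -|theta|^2 / 2 + const.
rng = np.random.default_rng(0)
theta = np.zeros(2)
samples = []
for _ in range(500):
    theta, _ = hmc_step(theta, lambda t: -0.5 * t @ t, lambda t: -t, rng)
    samples.append(theta.copy())
```

Because the leapfrog integrator is symplectic and time-reversible, the energy error H1 − H0 stays small and acceptance rates remain high even for long trajectories, which is what makes HMC effective for the high-dimensional parameter distributions sampled in this work.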