Lattice real-time simulations with learned optimal kernels

We present a simulation strategy for the real-time dynamics of quantum fields, inspired by reinforcement learning. It builds on the complex Langevin approach, which it amends with system-specific prior information, a necessary prerequisite to overcome this exceptionally severe sign problem. The optimization process underlying our machine learning approach is made possible by deploying inherently stable solvers of the complex Langevin stochastic process and a novel optimality criterion derived from insight into so-called boundary terms. This conceptual and technical progress allows us both to significantly extend the range of real-time simulations in 1+1d scalar field theory beyond the state of the art and to avoid discretization artifacts that plagued previous real-time field-theory simulations. Limitations of the approach and promising future directions are discussed.


I. INTRODUCTION
What unites many of the pressing open questions in modern physics, irrespective of whether they relate to eV (condensed matter), MeV (nuclear) or GeV (particle physics) energy scales, is the need to access the dynamics of strongly correlated quantum many-body systems in Minkowski time. Concretely, as e.g. outlined in the recent Snowmass community review [1,2], an ab-initio understanding of transport properties of nuclear matter at high temperature and density, as well as the scattering of showers of high-energy partons, still remains out of reach of state-of-the-art analytic and numerical Monte-Carlo methods. First-principles insight into real-time transport of non-relativistic fermions [3] and their interaction with gauge fields is a key puzzle piece in understanding high-temperature superconductivity (e.g. in the Hubbard model [4]). Fission and fusion dynamics [5], too, remain currently out of reach of fully ab-initio field-theoretic approaches, requiring model input.
Vital ab-initio insight into the static (thermodynamic) properties of strongly correlated many-body systems has been achieved in the past through Monte-Carlo simulations of Feynman's path integral [6]. These numerical techniques rely on analytic continuation to an unphysical Euclidean time. In turn, the extraction of relevant real-time dynamics becomes an ill-posed inverse problem [7], which severely affects the accurate determination of central quantities of interest: from transport coefficients [8,9], to in-medium decay rates [10,11], to vacuum parton distribution functions [12,13]. Developing a direct simulation approach in Minkowski time is thus called for.
Direct simulations of real-time dynamics suffer from the so-called sign problem [14,15]. Feynman's path integral is formulated as a sum over field configurations weighted by a complex phase. A minute signal emerges from the sum of a vast number of almost cancelling phases, overwhelming otherwise efficient Markov-chain-sampling based approaches. Some sign problems have been proven [16] to belong to the class of NP-hard computational problems, which entails that no generic solution method in polynomial time exists on a classical computer. Various approaches have been proposed to tackle the sign problem, such as reweighting (RW), extrapolation (EX) [17-20], density of states (DS) [21-23], tensor networks (TN) [24,25], Lefschetz thimbles (LT) [26-28] and complex Langevin (CL) [29,30]. They all propose a system-agnostic recipe for the estimation of observables in the presence of a sign problem. Without a system-specific component, each of these methods is destined to eventually fail, be it that their computational cost scales unfavorably when applied to systems in realistic volumes in 3+1 dimensions (RW, TN, DS, LT) or that they suffer from convergence to an incorrect solution (CL).
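To make the severity of the phase cancellations concrete, the following self-contained toy (our illustration, not part of the simulations in this work) estimates an oscillatory one-variable "path integral" by reweighting a real Gaussian ensemble. The surviving signal, the average phase, shrinks as the oscillation frequency σ grows; for N independent degrees of freedom the average phase would be this modulus to the power N, i.e. exponentially small.

```python
import numpy as np

rng = np.random.default_rng(0)

def average_phase(sigma, n=200_000):
    """Reweighting estimate of <exp(i*sigma*phi^2/2)> with respect to
    the real Gaussian measure exp(-phi^2/2)."""
    phi = rng.standard_normal(n)
    return np.mean(np.exp(0.5j * sigma * phi**2))

# Exact result: 1/sqrt(1 - i*sigma), whose modulus (1 + sigma^2)^(-1/4)
# shrinks as the phase oscillates faster -- the reweighting signal fades.
for sigma in (0.5, 2.0, 8.0):
    print(f"sigma={sigma:4.1f}  |<phase>| = {abs(average_phase(sigma)):.3f}"
          f"  (exact {(1 + sigma**2) ** -0.25:.3f})")
```

The Monte-Carlo error of the estimator stays O(1/√n) while the signal itself decays, which is precisely why reweighting-type strategies fail for large systems.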
Quantum computing offers a different angle of attack on the sign problem [31], as in principle it can compute the unitary time evolution of a spin system. The mapping of a realistic field theory to spin systems remains an open challenge, especially if gauge degrees of freedom are involved [2]. The necessity to derive a Hamiltonian for implementation on a quantum computer, to date, requires truncation of the continuous state space and, in the case of photons and gluons, fixing to a particular gauge. It is acknowledged in the quantum computing community (see e.g. [32,33]) that with near-future noisy intermediate-scale devices, many physical systems of interest remain too complex to be modelled with quantum circuits.
Therefore, progress in the short term requires innovation among system-specific real-time simulation techniques on classical computers.

II. REAL-TIME COMPLEX LANGEVIN
Here we build upon the complex Langevin approach, which is one of the complexification strategies for the sign problem, with similarities but important differences to contour deformations (LT) (see the discussion in [34,35]). In conventional stochastic quantization [36,37] one proves that the expectation values of a Euclidean quantum field theory, ⟨O⟩ = (1/Z) ∫ Dϕ_E O(ϕ_E) exp[−S_E(ϕ_E)], can be reproduced by simulating a stochastic process in an additional Langevin-time direction τ_L, using the Langevin equation

∂ϕ_E(x, τ_L)/∂τ_L = −δS_E[ϕ_E]/δϕ_E(x, τ_L) + η(x, τ_L),   (1)

with Gaussian noise normalized as ⟨η(x, τ_L)η(x′, τ′_L)⟩ = 2δ(x − x′)δ(τ_L − τ′_L). Quantum field theory with a mixed initial density matrix in Minkowski time constitutes an initial value problem and is formulated on the Schwinger-Keldysh contour C_SK with a forward and a backward branch, housing the fields ϕ_1 and ϕ_2 respectively. In a thermal system at T = 1/β we have ρ_β ∼ exp[−βH], and sampling over initial conditions can be written as a path integral on a compact imaginary-time domain of length β, connecting the real-time branches into the closed contour C_SKt, on which expectation values take the form

⟨O⟩ = (1/Z) ∫ Dϕ O(ϕ) exp[iS(ϕ)].   (2)

We parametrise C_SKt in the complex time plane with the real contour parameter γ: t(γ). Cauchy's theorem allows us to deform the integration contour, and we choose the convention sketched in fig. 1, where the downward portion of C_SKt is divided into two pieces at t = t_max and t = t_0.
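As a minimal illustration of stochastic quantization at work (a one-degree-of-freedom Python/NumPy sketch of ours, not the lattice code of this work), the following Euler-Maruyama evolution of the Langevin equation reproduces ⟨ϕ²⟩ of a toy quartic "Euclidean" weight:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "Euclidean" theory with a single d.o.f. and quartic coupling,
# S_E(phi) = phi^2/2 + lam*phi^4/24, sampled via its Langevin equation
# d(phi)/d(tau_L) = -S_E'(phi) + eta,  <eta eta> = 2*delta.
lam = 1.0
dS = lambda p: p + lam * p**3 / 6.0

def langevin_moment(n_walkers=20_000, n_steps=2_000, dt=5e-3):
    """Explicit Euler-Maruyama evolution of an ensemble of walkers."""
    phi = np.zeros(n_walkers)
    for _ in range(n_steps):
        phi += -dS(phi) * dt + np.sqrt(2 * dt) * rng.standard_normal(n_walkers)
    return np.mean(phi**2)

# Reference value from direct quadrature of the Boltzmann weight.
x = np.linspace(-10.0, 10.0, 20_001)
w = np.exp(-(x**2 / 2 + lam * x**4 / 24))
exact = np.sum(x**2 * w) / np.sum(w)

print(f"Langevin <phi^2> = {langevin_moment():.3f}, quadrature = {exact:.3f}")
```

The ensemble average over Langevin time histories converges to the path-integral expectation value up to O(dτ_L) discretization and O(1/√N) statistical errors.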
Naive CL proposes to complexify the field d.o.f., ϕ_c = ϕ_R + iϕ_I, and to carry out the following coupled stochastic process [37] for ϕ_R and ϕ_I,

∂ϕ_R/∂τ_L = Re[ iδS[ϕ]/δϕ |_{ϕ=ϕ_c} ] + η,   ∂ϕ_I/∂τ_L = Im[ iδS[ϕ]/δϕ |_{ϕ=ϕ_c} ],   (3)

estimating observables ⟨O⟩ from the ensemble average over analytically continued observables O[ϕ_c]. While significant progress has been made in applying CL to various model systems [38,39] and even to the theory of strong interactions at finite baryon chemical potential [40-43], the simulation of real-time dynamics has so far been hampered by various hurdles: divergences (runaways) and convergence to incorrect solutions as the real-time extent of the contour is increased [44-46]. The runaway problem leads to a breakdown of the numerical solver for the Langevin equation when the process explores regions of the complexified manifold far from the origin. It has been shown in ref. [47] that it can also be understood as a consequence of the stiffness of the non-linear CL dynamics. This practical problem is solved either by using adaptive step-size control [48] or through the use of inherently stable implicit discretization schemes for eq. (3), such as the implicit Euler-Maruyama scheme [47]. The inherent regularization provided by the implicit scheme also makes it possible to simulate with CL directly on the real-time axis of the C_SKt contour, without tilt.
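The convergence of naive CL in a benign setting can be demonstrated with a one-variable Gaussian toy model (an illustration of ours, not the field theory studied in this work): for the weight exp[−aϕ²/2] with Re a > 0, the exact ⟨ϕ²⟩ = 1/a is reproduced by the complexified process, with real noise acting on ϕ_R only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Complex Gaussian toy weight exp(-a*phi^2/2) with a = 1 + i,
# for which <phi^2> = 1/a = 0.5 - 0.5i is known exactly.
a = 1.0 + 1.0j

def cl_moment(n_walkers=20_000, n_steps=3_000, dt=2e-3):
    """Naive complex Langevin: complexify phi = phi_R + i*phi_I and
    evolve with the analytically continued (holomorphic) drift -a*phi;
    the noise stays real and therefore only shifts phi_R."""
    phi = np.zeros(n_walkers, dtype=complex)
    for _ in range(n_steps):
        drift = -a * phi                      # holomorphic drift -dS/dphi
        noise = np.sqrt(2 * dt) * rng.standard_normal(n_walkers)
        phi = phi + drift * dt + noise        # real noise on phi_R
    return np.mean(phi**2)

est = cl_moment()
print(f"CL <phi^2> = {est:.3f}, exact = {1/a:.3f}")
```

Here the sampled distribution P[ϕ_R, ϕ_I] is well localized, so the analytic continuation of observables is justified; the real-time hurdles described above arise precisely when this localization is lost.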
Important insight into the convergence properties of CL has been gained in [49,50] through analysis of the relation between the real probability distribution P[ϕ_R, ϕ_I] sampled by eq. (3) and the complex Feynman weights exp[iS] in eq. (2). The connection is made via the real Fokker-Planck operator L and its complex generalization L̃. For CL to correctly reproduce expectation values, the sampled distribution P[ϕ_R, ϕ_I] must fall off sufficiently fast in ϕ_I to enable integration by parts. At the same time the spectrum of L̃ must have negative real parts. Based on this insight, criteria for correct convergence have been developed, based on so-called boundary terms [51,52], and two improvements to complex Langevin have been proposed: gauge cooling [53], where gauge freedom is exploited to keep the d.o.f. close to the original real-valued manifold, and dynamic stabilization [54], which introduces an additional drift term into eq. (3). The drawback of the latter is that the new drift term is non-holomorphic and thus at odds with the proof of convergence of CL; it might introduce a bias in the results.
It has long been known [37] that real Langevin dynamics can be modified by a kernel K without changing its stationary distribution. This freedom has been exploited to improve autocorrelation times in Euclidean theories [55]. In complex Langevin one may introduce a complex kernel [56-58]. K must be a holomorphic function and factorizable as K = H^T H, but can otherwise be an arbitrary (matrix) function of the fields. It can encode transformations [59] such as changes of coordinates, contour deformations or redefinitions of variables. The general kernelled CL evolution equation for discretized spatial coordinates ϕ_c(x_j) = ϕ_c^j reads

∂ϕ_c^j/∂τ_L = i K^{jk}[ϕ_c] δS/δϕ_c^k + δK^{jk}[ϕ_c]/δϕ_c^k + H^{jk}[ϕ_c] η^k,   (4)

with summation over repeated indices. In the past, a few system-specific transformations have been found that soften (see e.g. [60,61]) or even avoid (see e.g. [57-59]) the sign problem in model systems (see also reformulation strategies, e.g. [62-64]). However, for realistic systems success has been limited and no systematic recipe is known to extend results from simpler systems. This study instead uses machine-learning (ML) techniques to systematically learn optimal kernels, based on system-specific prior information.
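A one-variable caricature (ours, for illustration) of how a kernel rescues CL: for the Fresnel weight exp[−iϕ²/2], naive CL samples an ever-broadening distribution in the complexified plane, while the constant kernel K = −i (with H = e^{−iπ/4}, K = H²) rotates the purely oscillatory drift into a damped one and recovers the exact ⟨ϕ²⟩ = −i.

```python
import numpy as np

rng = np.random.default_rng(3)

# Fresnel weight exp(-i*phi^2/2): a maximally oscillatory toy integral
# with <phi^2> = 1/i = -i.  Naive CL (K = 1) has no normalizable
# stationary distribution here; the kernel K = -i damps the drift.
a = 1j

def kernelled_cl(K, n_walkers=20_000, n_steps=3_000, dt=3e-3):
    """Kernelled CL for a constant scalar kernel: drift -K*a*phi,
    noise multiplied by H = sqrt(K)."""
    H = np.sqrt(K)
    phi = np.zeros(n_walkers, dtype=complex)
    for _ in range(n_steps):
        eta = rng.standard_normal(n_walkers)
        phi = phi - K * a * phi * dt + H * np.sqrt(2 * dt) * eta
    return np.mean(phi**2), np.mean(np.abs(phi) ** 2)

mom, width = kernelled_cl(K=-1j)
print(f"kernelled CL: <phi^2> = {mom:.3f} (exact -1j), <|phi|^2> = {width:.3f}")
_, width0 = kernelled_cl(K=1.0)
print(f"naive CL: <|phi|^2> = {width0:.1f} (grows ~ 2*tau_L, no stationary limit)")
```

The width ⟨|ϕ_c|²⟩ of the sampled distribution directly signals whether the falloff condition for integration by parts can hold, which is the property the learned kernels of this work are designed to restore.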

III. MACHINE-LEARNING ASSISTED KERNELLED LANGEVIN
Our machine learning strategy for kernelled Langevin is inspired by reinforcement learning (RL) [65]. RL underlies recent advances in diverse fields, from beating computer games to steering autonomous vehicles. It is based on an agent, endowed with a set of limited actions, placed in a predefined environment. Success of the agent is encoded in a cost/policy functional defined from environment variables and the internal state of the agent. A mathematical representation of the actions of the agent allows the use of differentiable-programming techniques [66] to evaluate the gradients of the cost functional w.r.t. those actions. Challenges are the robust detection of failure modes of the agent, and the trade-off between generality of the actions of the agent and learning efficiency.
Specializing to real-time simulations, we define our agent as the controller of the kernel K, which allows it to explore the abstract space of stationary distributions of the stochastic process eq. (4). A crucial ingredient is our use of system-specific prior information to define the cost functional p[K], used to assess the success of CL convergence. As was shown e.g. in [35], the failure of convergence of CL on the SK contour occurs globally, i.e. it affects correlators on all branches. In a thermal setting, time-translation invariance requires equal-time correlation functions to be constant along the whole complex time contour. Conventional simulations in the Euclidean domain in addition give access to their values, as well as to the Euclidean unequal-time correlators.
Deviations from this prior knowledge are easily assessed within the CL simulations. To test for successful convergence, we deploy a cost functional p[K] which consists of two likelihood terms involving the covariance matrices C_Re/Im of the CL equal-time correlators Re/Im⟨ϕ_c²⟩. It thus assesses both their constancy and their agreement with the a priori known values. p[K] makes reference to expectation values involving the fields ϕ_c and thus implicitly depends on K. To obtain robust gradients w.r.t. the entries of K we must take derivatives over the whole stochastic dynamics. While adjoint [67] and shadowing methods [68] are promising for computing gradients, we find from the CL Lyapunov exponents [69] that the dynamics actually becomes chaotic, degrading the performance of conventional differentiable-programming techniques.
Instead we use a low-cost optimization functional l[K], which provides gradients ∇_K l[K] that significantly reduce the values of the actual cost functional p[K]. We find that the functional proposed in [70] offers the best performance in minimizing p[K], compared to the earlier choices in [35]. This improvement relies on its relation to boundary terms and to the recently derived improved correctness criterion of [71].
We restrict ourselves to the simplest type of kernel, independent of the fields and of τ_L. Note that even though the optimization functional may contain non-holomorphic terms, the kernel does not, and thus eq. (4) remains compatible with the proof of correct convergence. The potentially costly Jacobian δK/δϕ_c also does not need to be computed.
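The learning loop can be caricatured in one degree of freedom (our sketch: the scalar kernel parametrization K = e^{−iθ}, the use of the distribution width ⟨|ϕ_c|²⟩ as a stand-in low-cost functional, and finite-difference gradients with frozen noise are all illustrative assumptions, not the functional l[K] of [70] or the Julia autodiff setup used in this work). For the Fresnel weight exp[−iϕ²/2] the width is minimized exactly at the convergence-restoring kernel K = −i, i.e. θ = π/2.

```python
import numpy as np

a = 1j  # toy weight exp(-i*phi^2/2), exact <phi^2> = -i

def width(theta, seed=7, n_walkers=4_000, n_steps=1_200, dt=7.5e-3, n_tail=200):
    """Stand-in low-cost functional: width <|phi|^2> of the kernelled CL
    ensemble for K = exp(-i*theta), averaged over the trajectory tail.
    A frozen seed (common random numbers) makes it smooth in theta."""
    rng = np.random.default_rng(seed)
    K = np.exp(-1j * theta)
    H = np.exp(-0.5j * theta)     # K = H^2
    phi = np.zeros(n_walkers, dtype=complex)
    acc = 0.0
    for step in range(n_steps):
        eta = rng.standard_normal(n_walkers)
        phi = phi - K * a * phi * dt + H * np.sqrt(2 * dt) * eta
        if step >= n_steps - n_tail:
            acc += np.mean(np.abs(phi) ** 2)
    return acc / n_tail

# Finite-difference gradient descent on theta; with frozen noise the
# loss landscape is effectively deterministic, sidestepping the chaotic
# gradient problem in this trivial setting.
theta, lr, h = 0.7, 0.4, 0.05
for _ in range(25):
    grad = (width(theta + h) - width(theta - h)) / (2 * h)
    theta = np.clip(theta - lr * grad, 0.2, np.pi - 0.2)

print(f"learned theta = {theta:.2f} (optimum pi/2 ~ 1.57, i.e. K = -i)")
```

The point of the caricature is the structure of the loop: simulate with the current kernel, evaluate a cheap proxy that penalizes broad distributions (the analogue of boundary terms), and update the kernel parameters along its gradient.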

IV. NUMERICAL RESULTS
Let us apply our machine-learning assisted complex Langevin approach to thermal scalar field theory in 1+1 dimensions with m = 1 and a quartic self-coupling (λ/4!)ϕ⁴ with λ = 1 at βm = 4/10, a benchmark also used in [72]. The field is discretized on the SK contour sketched in fig. 1 with N_t = 32 points along each of the real-time branches and N_τ = 4 steps along the imaginary-time direction. The spatial dimension is resolved with N_x = 8 points. As lattice spacings we use a_s m = 2/10 and a finer a_t m = 1/10, to avoid discretization artifacts from the corners of the SK contour. The discretized action is identical to that of ref. [72] (code available at [73]). Using adaptive step size with maximum Langevin step dτ_L = 10⁻³, we simulate at a real-time extent of t_max = 3.2, which lies deep in the region where naive CL (K = I) fails to converge correctly, as shown by the gray triangles in fig. 2, denoting the real and imaginary parts of the unequal-time, momentum-zero correlator C(t) = ⟨ϕ_c(t, p = 0)ϕ_c(0, p = 0)⟩. The failure manifests itself in the deviation of C(0) at K = I from the value of the equal-time correlation function F(γ) = ⟨ϕ_c(γ, p = 0)ϕ_c(γ, p = 0)⟩ at γ = 0. Its values are known from conventional HMC simulations and indicated by the black dashed line (in 1+1d, F(0) only carries a minute lattice-spacing dependence).
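The contour bookkeeping of this setup can be checked in a few lines (our sketch, using only the parameters quoted above): stepping forward with N_t steps of a_t, backward with N_t steps of −a_t and down with N_τ steps of −ia_τ must reach t_0 − iβ, and yields 2N_t + N_τ = 68 sites per spatial point.

```python
import numpy as np

# Closed Schwinger-Keldysh contour C_SKt with the parameters quoted above
# (m = 1 units): N_t = 32 steps of a_t = 0.1 on each real-time branch,
# N_tau = 4 steps of a_tau = 0.1 covering beta = 0.4.
N_t, N_tau = 32, 4
a_t, a_tau = 0.1, 0.1

steps = ([a_t] * N_t               # forward branch:  t_0 -> t_max
         + [-a_t] * N_t            # backward branch: t_max -> t_0
         + [-1j * a_tau] * N_tau)  # Euclidean stub:  t_0 -> t_0 - i*beta
t = np.concatenate(([0.0], np.cumsum(steps)))  # t(gamma) at the contour sites

print(f"t_max = {max(t.real):.1f}, endpoint = {t[-1]:.2f}, "
      f"sites per spatial point = {len(t) - 1}")
```

The endpoint t_0 − iβ closes onto t_0 through the thermal boundary condition, and the 68 contour sites times N_x = 8 give the 544-dimensional field vector on which the constant kernel acts.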
We parameterize a fully dense, constant, complex kernel via H = A + iB with real matrices A and B, each of which has [(2N_t + N_τ)N_x]² entries. Learning of the optimal kernel starts from the trivial choice H = 1, i.e. K = H^T H = 1. The resulting stiff dynamics is solved with an implicit Euler-Maruyama integrator, for which we use the implementation in the DifferentialEquations.jl library [74] of the Julia language. After a Langevin time of τ_L^opt = 5 we compute the gradient of the discretized low-cost functional l[K] using the autodiff capability of Julia. Based on the Adam algorithm with learning rate r_l = 10⁻³, the entries of A and B are iteratively updated, reducing the initial p[K = 1] ≈ 1200 to p[K_opt] ≈ 7.2. Observables for K_opt are obtained from three streams with τ_L^obs = 5000. The central result of our study, the unequal-time correlation function C(t) from optimal learned kernels in 1+1d, is shown as blue crosses (Re[C]) and green stars (Im[C]) in fig. 2. We reach a real-time extent of t_max = 3.2, which is twice that previously achieved in the literature using contour deformations [72]. Note that our simulation results for Re[C](0) and Im[C](0) both agree with F(0) from HMC, given by the dashed black lines.
We emphasize that the advantageous scaling properties that our approach inherits from CL enable us to deploy a finer grid here. Implicit methods do not pose a problem, as highly optimized solver implementations are readily available. Since we restrict ourselves to field-independent kernels so far, we avoid the need for Jacobians, whose computational cost is a central limiting factor for contour deformation methods.

FIG. 2. Re[C] and Im[C] in 1+1d field theory from naive (gray triangles, stars) and optimal kernel CL (blue crosses, green stars) with N_t = 32. Result from contour deformation [72] as black squares with N_t = 8. The value of the correlator at t = 0 from hybrid Monte-Carlo is given as gray solid line.
We showed in [35] that the equal-time correlator F(γ) is more difficult to reproduce in CL than the unequal-time correlator C(t), and thus plot the former against the contour parameter γ in fig. 3. Note that both Re[F](γ) and Im[F](γ), while showing minute oscillations, agree with the a priori known values (gray dashed lines). The optimal kernel CL results are in stark contrast to naive CL (K = I), for which F(γ) (gray triangles, rhombi) clearly deviates from the HMC values. This crosscheck provides convincing support for the correctness of convergence.
Further support for correct convergence is provided by the absence of boundary terms, such as B_1 (see [51,52]) for the ⟨ϕ²⟩ observable. As shown in fig. 4, B_1 exhibits a power-law dependence on dτ_L, consistent with vanishing boundary terms in the continuum limit.
In the top row of fig. 5 we plot the values of the optimal learned kernel underlying figs. 2 and 3. The color coding resolves values |K_opt| ≤ 0.4, sufficient for all but the diagonal entries Re[K_opt^diag] ≈ 1.025. Each pixel corresponds to a component of K_opt that connects two spacetime points, ordered such that in each spatial slice x_i, denoted by a gray arrow, the parameter γ traverses the full contour C_SKt. Re[K_opt] is dominated by the diagonal alone, while Im[K_opt] shows a distinct banded structure in space with diminishing amplitude farther away from the diagonal, which reflects the spatially non-local nature of the transformation implemented by K_opt. In the two insets we highlight the behavior of the kernel in a single spatial slice, where the corresponding sections of the SK contour connected by the kernel entries are indicated by the gray arrows. Similar to the results in 0+1d, we find a characteristic finite-difference-like behavior, where entries of opposite sign on the sub- and superdiagonals accompany those on the diagonal, indicating a Fourier filter.
Our ML approach is able to identify a much simpler structure than what a naive extension of the free theory kernel suggests (c.f.[35]), a testament to the efficacy of the learning strategy, which holds potential for analytic insight into convergence restoring transformations.
While the success of the ML approach is encouraging, it too will fail at larger real-time extents. Establishing an exact range of validity is work in progress. One reason is that we only optimize based on the low-cost functional l[K] rather than the full cost functional p[K]; the development of robust gradient estimators for chaotic stochastic systems (see e.g. NILSAS for the Lorenz system [69]) is called for. Another reason is the thimble structure of the theory (see also [75]). One-d.o.f. models tell us that parameter regimes exist in which a field-dependent kernel is needed to capture the physics of multiple contributing thimbles. In that case the kernel can be systematically expanded via a rational approximation involving ϕ_c, limiting computational cost. Transfer learning between kernels of different expressivity will be key to keeping cost in check.
In summary, we have presented a machine learning approach to direct real-time simulations on the lattice, in which system-specific prior information is incorporated into CL via iterative ML of an optimal field-independent kernel. Using a novel low-cost functional for the computation of gradients for learning, we achieve efficient convergence in 1+1d field theory up to at least twice the real-time extent previously accessible. Due to the efficiency of the approach, we can access fine grids to avoid the discretization artifacts that affected previous studies. Work is ongoing to extend the results to realistic (3+1)d settings, and we are exploring the inclusion of field-dependent kernels.
FIG. 5. (top) Entries of the real (left) and imaginary part (right) of the optimal kernel underlying figs. 2 and 3. (bottom insets) Zoom-in of (left) Re[K_opt] and (right) Im[K_opt] on a single spatial slice, with the subsections of C_SKt labelled on gray arrows (see text for a detailed discussion).

FIG. 4. Magnitude of the dominant boundary term B_1, based on the ⟨ϕ²⟩ observable. Note the best-fit power-law dependence on (dτ_L^max)^0.66, consistent with correct convergence.