Relativistic continuous matrix product states for quantum fields without cutoff

I introduce a modification of continuous matrix product states (CMPS) that makes them adapted to relativistic quantum field theories (QFT). These relativistic CMPS can be used to solve genuine 1+1 dimensional QFT without UV cutoff and directly in the thermodynamic limit. The main idea is to work directly in the basis that diagonalizes the free part of the model considered, which allows to fit its short distance behavior exactly. This makes computations slightly less trivial than with standard CMPS. However, they remain feasible and I present all the steps needed for the optimization. The asymptotic cost as a function of the bond dimension remains the same as for standard CMPS. I illustrate the method on the self-interacting scalar field, a.k.a. the $\phi^4_2$ model. Aside from providing unequaled precision in the continuum, the numerical results obtained are truly variational, and thus provide rigorous energy upper bounds.


I. INTRODUCTION
Tensor network states (TNS) have become the tool of choice for many-body quantum systems on the lattice [1,2]. They provide a sparsely parameterized manifold of wavefunctions that is well suited for the approximation of the low energy states of local Hamiltonians [3,4]. The bond dimension D controls the size of the manifold and thus the expressiveness of the class of states. In 1 + 1 spacetime dimensions, matrix product states (MPS) [5] are the simplest instance of TNS. They underpin the earlier density matrix renormalization group [6][7][8] and provide one of the most efficient numerical methods to study quantum chains. Their extension for 2 + 1 dimensional problems, the projected entangled pair states (PEPS) [9], while more difficult to optimize, have been used successfully in a wide range of non-trivial instances including the Hubbard model. Aside from their numerical use, tensor network states are powerful theoretical tools [10], e.g. for the classification of topological [11,12] and symmetry protected [13][14][15][16] phases of matter.
In light of the progress they enabled in the discrete, it is tempting to use TNS to solve problems in the continuum, and attack quantum field theories (QFTs). This can be done in two ways, (i) by discretizing the QFT first to then use the standard lattice TNS toolbox and finally extrapolate the results (ii) by taking the continuum limit of TNS first to apply them to the continuum model without extrapolation needed. The first option has already provided numerically accurate results in many quantum field theories [17][18][19][20][21][22]. However, working directly in the continuum offers crucial advantages: spatial symmetries are preserved and the perilous continuum extrapolations of the results are avoided. It is this neater second option, which consists in defining a manifold of states directly in the continuum, that I am interested in here.
In 2010, Verstraete and Cirac introduced continuous matrix product states (CMPSs) [23], which are a proper * antoine.tilloy@mpq.mpg.de continuum limit of MPS particularly adapted to 1 + 1 dimensional non-relativistic field theories (like the Lieb-Liniger model). The corresponding numerical toolbox was then developed and extended in the past ten years [24][25][26]. More recently, an extension to d + 1 (d ≥ 2) dimensions pushing PEPS to the continuum, was put forward in [27], but without a general numerical toolbox yet.
The previous continuum ansatz are adapted to nonrelativistic theories only, and cannot be used for relativistic QFTs without an additional momentum cutoff Λ. There are many ways to understand this limitation which I summarize in sec. II B. For relativistic QFTs, this partially defeats the purpose of going to the continuum in the first place, since one still needs to extrapolate the final results as Λ → +∞. This is particularly disappointing given that, at least in 1 + 1 dimensions, relativistic QFTs are straightforward to define "all the way down", without any cutoff [28,29].
My main objective in this paper is to introduce a modification of CMPS, the relativistic CMPS (RCMPS), that is adapted to relativistic QFT in 1 + 1 dimension, and does not require any additional cutoff, UV or IR. In a nutshell it is a state |Q, R parameterized by two D × D complex matrices Q, R and defined as (1) I will explain this formula in more detail later. Let me just mention here that a † (x) is the Fourier transform of the momentum creation operator a † k diagonalizing the free part of the relativistic Hamiltonian under consideration and |0 a the associated Fock vacuum. Crucially, a † (x) is not the creation operator ψ † (x), local in the canonically conjugated fields, and used for nonrelativistic CMPS. Using a instead of ψ is the only difference between CMPS and RCMPS. We will see later why using a is a good idea and the consequences this choice has.
I can mention one advantage already. Because RCMPSs allow to work directly with the true theory arXiv:2102.07741v2 [quant-ph] 5 Jul 2021 without cutoff or extrapolation, the results one obtains are genuinely variational, and thus provide rigorous energy upper bounds. Further, while results based on a discretization or with a finite UV cutoff have a limited validity at high momenta, the RCMPS are exact in the UV.
The price to pay is that RCMPS are obtained from CMPS via change of basis ψ, ψ † → a, a † that is not local. In the new basis, the Hamiltonian density of a relativistic QFT is no longer strictly local, and only exponentially decaying. In practice, the lack of strict locality of the Hamiltonian density introduces minor complications in the computations. However, and perhaps surprisingly, these complications do not change the asymptotic scaling of the cost compared to regular CMPS, which remains ∝ D 3 . Further, the energy density seems to converge fast as a function of D as one would expect [30], while the cost which means RCMPS provide an efficient class of states for relativistic QFTs.
In addition to their direct use for relativistic QFT in 1 + 1 dimensions, demonstrated in this paper, RCMPS could be used as an auxiliary step to solve non-relativistic QFTs in 2 + 1 dimensions. Indeed, continuous generalizations of PEPS currently lack a general purpose optimization algorithm. The missing routine is a function to solve a relativistic QFT in one dimension less, that is in 1 + 1 space-time dimensions [27]. The relativistic nature of this auxiliary theory comes from the need for Euclidean invariance in the 2 space dimensions of the original nonrelativistic theory [27]. For that task, RCMPS would fit the bill better than CMPS and preserve exact Euclidean invariance at short distances.
The present paper is structured as follow. I first discuss standard CMPS and their limitations for relativistic theories in sec. II before introducing and discussing the RCMPS ansatz in sec. III. I first explain how to compute expectation values in sec. IV and then show how to optimize the state to find ground states in sec. V. The numerical results for the ground energy and correlation functions, which result from this optimization, are presented for the φ 4 model in sec. VI. I finally discuss some extensions in sec. VII, namely a slight variation of RCMPS with more general basis, and explain how one could obtain more general observables by extending known CMPS techniques. I conclude with a general discussion of the method in sec. VIII, comparing it with renormalized Hamiltonian truncation (RHT). A quicker presentation of the results, emphasizing the importance and difficulty of the variational method, is presented in a companion letter [31].

A. Definition and main properties
Continuous matrix product states (CMPS) were introduced by Verstraete and Cirac [23] in 2010. They extend the successful matrix product states ansatz from the lattice to the continuum. Concretely, for a bosonic quantum field theory on the closed line [−L, L] with (nonrelativistic) creation and annihilation operators (ψ † , ψ), a CMPS is the state where by definition [ψ(x), ψ † (y)] = δ(x − y), P is the path ordering operator, and |0 ψ is the Fock vacuum annihilated by all the ψ(x). The state is parameterized by two D × D matrices Q and R, where D is the bond dimension, and which contain all the variational freedom. The trace is taken over the associated D dimensional Hilbert space, and implements periodic boundary conditions (more general boundary conditions are possible by inserting a matrix B in the trace).
CMPS can be derived from a genuine continuum limit of MPS and thus share many of their properties [32]. In particular, all the correlation functions of local operators can be computed explicitly and efficiently. This is seen by introducing the generating functional Z j ,j : which can be used to compute all normal-ordered correlation N -point functions of local operators, e.g.
Using Wick's theorem, one can show [32] that this generating functional has an exact expression: is the transfer operator and the trace is now taken over the tensor product of two copies of the original D dimensional Hilbert space. For example, for the simple two-point function of eq. (4), using the exact expression gives, for x ≥ y: The state |Q, R nr is parameterized in a redundant way, in the sense that different Q, R pairs give the same state. In particular, conjugating both matrices with an invertible matrix U keeps the state invariant. This freedom can be used to choose a gauge in which only the anti-Hermitian part K of Q is free [33]: In this gauge, the transfer operator T is of the Lindblad form, hence negative, and generically has a single largest eigenvalue λ 0 = 0 with associated left and right eigenvectors 0 | and |r 0 . Using this form provides convenient expressions in the thermodynamic limit, as e LT → |r 0 0 | when L → +∞. For example, eq. (6) simplifies to: In what follows, and in particular for the upcoming relativistic extension, I will always work directly in the thermodynamic limit. A CMPS is typically used to find the ground state of a non-relativistic QFT. The archetypal example is the Lieb-Liniger model with Hamiltonian H LL : With the generating functional, it is possible to compute the expectation value of the 3 terms of the Hamiltonian density h LL Q,R = f (Q, R), where f is a simple function containing expectation values of products of Q and R as in eq. (8). One then minimizes the energy density over the matrices Q (or K) and R to get close to the ground state: The approximation can get arbitrarily good as the bond dimension (and thus the size of the variational manifold) is increased because CMPS are dense in the Fock space [32]. Further since the method is variational, we also know that the energy we find for finite D always upper bounds the true ground energy In practice, one can simply feed the expression of the energy density as a function of Q and R to a standard numerical minimizer as was done originally in [23]. However it is typically much more efficient, especially for large D, to use more elaborate tangent space methods [34]. I explain how they work in sec. V. In a nutshell, the latter require computing the exact gradient with respect to the variational parameters as well as the natural metric on the tangent space of CMPS induced by the real part of the Hilbert space scalar product. The optimization is then done through gradient descent on the corresponding differentiable manifold. At this stage, all that matters for us is that it can be done and that it is efficient: one can easily optimize CMPSs of large bond dimensions [24] (without barren plateaus or other pathology).

B. UV problems in relativistic theories
CMPS are well suited for non-relativistic theories, which they approximate well "all the way down", without short distance cut-off. However, once we move to relativistic QFT, it becomes necessary to introduce a UV cut-off. This is best understood on the free boson, which was discussed in the CMPS context in [35]. The free boson Hamiltonian is where m is the mass and π, φ are canonically conjugated [φ(x), π(y)] = iδ(x−y). To deal with such a Hamiltonian with a CMPS, it is tempting to express the field operator and its conjugate as a function of a non-relativistic creation-annihilation pair ψ † , ψ: which introduces a new arbitrary mass scale Λ. In this basis, the Hamiltonian H fb is still the integral of a local density, and thus it is straightforward to evaluate the energy density on a CMPS |Q, R nr . This unfortunately leads to mild and serious divergences. The first mild one is that H fb is a priori not normal ordered in the ψ † , ψ operator basis. This is simply solved by considering : H fb : ψ instead. The serious divergence comes from the fact that : H fb : ψ not only contains the standard non-relativistic kinetic energy ∂ x ψ † ∂ x ψ but also ∂ x ψ∂ x ψ + h.c.. The latter is divergent when evaluated on a generic CMPS [33]. This second divergence can be cured by adding an adapted UV regulator to the Hamiltonian and considering [35] which kills the problematic terms. To reach a fixed precision, one then needs to increase the bond dimension D as the cutoff Λ is lifted. Similarly for fermionic theories, it was observed in [36] that one needs to add a UV cutoff to the Hamiltonian, this time not to ensure the finiteness of the results, but to make the optimization problem well behaved. In a nutshell, without a cut-off, the CMPS reduces the energy density by approximating larger and larger momentum modes as the optimization proceeds, completely missing the IR which contributes to the energy only in a subleading way. Independently of CMPS, this sensitivity to high frequencies in variational methods was considered by Feynman as one of the main difficulties preventing its use in relativistic QFT [37]. Zooming out from the CMPS peculiarities, this situation is not surprising, since we are working in the wrong operator basis. Indeed, the operator : H fb : ψ is not bounded from below with this specific ordering. Hence even if we manage to have a finite energy density when computing CMPS expectation values, we are always infinitely far from the true ground state. Further, the free boson is a conformal field theory (CFT) at short distances (the massless free boson). It is thus no surprise that a CMPS, which is adapted to systems with a gap and exponentially decaying correlation functions, completely fails to capture the short distance behavior of relativistic models. Yet another way to see the problem is that the ground state of the free boson has an infinite density of non-relativistic particles, whereas a CMPS always gives a finite density.
Note that these UV problems are not made easier if one adds a relevant interaction, as the latter becomes negligible at short distances. The relativistic continuous matrix product state ansatz will solve these short distance problems already present in the free theory, but also allow to deal with interactions without additional difficulty.

A. Intuition
In the standard textbook approach to quantum field theory [38], the problem of infinite particle density is solved at the very beginning by changing of basis (and in fact even of Hilbert space). Typically, one expands the field operator in new creation-annihilation modes that diagonalize the Hamiltonian: where . In condensed matter physics, this would be called a Bogoliubov transform. In this new basis, the Hamiltonian is diagonal Then, normal ordering H fb →: H fb : a removes the infinite vacuum energy contribution and one finds that the ground state is simply the Fock vacuum annihilated by all the a k 's: |ground = |0 a . The Hamiltonian : H fb : a further is a well defined operator on the Fock space generated by creating relativistic particles from the vacuum. The crucial observation is that in 1+1 dimensions, this normal ordering procedure is typically 1 sufficient to cure all the divergences that can appear, even when adding interactions. In particular, the φ 4 Hamiltonian H which we will consider in more detail later, is a perfectly legitimate and regular Hamiltonian on the free Fock space. From now on, I will omit the subscript on the normal ordering which will always be done with respect to a, a † unless otherwise stated. The Fock space generated by acting with a † k on |0 a is much more adapted to relativistic theories than the Fock space generated by acting with ψ † on |0 ψ , because the former solves the scale invariant short distance behavior of the theory exactly with its vacuum. However, the operators a k , a † k are not adapted to CMPS because the latter encode the state in real space, and not momentum space. Further, the Hamiltonian H is not translation invariant in momentum, which is the situation CMPS are most adapted for. A tempting workaround is to simply Fourier transform a k to get a(x) which verifies [a(x), a † (y)] = δ(x − y). Note the crucial fact that ψ(x) = a(x), as the Fourier transform (21) does not contain the factors ω k . Intuitively, a † (x)|0 a corresponds to a (bare) relativistic particle localized in x. Working with this new operator basis is the key to make CMPS adapted to relativistic theories.

B. Definition and basic properties
The previous discussion naturally leads us to introduce the relativistic CMPS (RCMPS) ansatz: is the Fourier transform of the creation operator diagonalizing the free part of the relativistic Hamiltonian under consideration. Note that we choose to work directly in the thermodynamic limit, and thus the state |Q, R has no cut-off, UV or IR (although the latter is trivial to reintroduce).
The properties of the state are the same as before if one replaces ψ by a, as these operators obey the same algebra. A large part of the theory of CMPS can thus be reused. In particular, all normal-ordered N -point correlation functions of a, a † can be computed efficiently using the same generating functional. Additionally, we have that RCMPS are dense in the appropriate QFT Fock space (the manifold is maximally expressive) and thus that the precision can be arbitrarily refined by increasing D.

C. Consequence on the Hamiltonian density
The main difference with the non-relativistic case is that, once written as a function of a(x), the Hamiltonian of a (massive) relativistic theory is not strictly local, but only exponentially decaying. Indeed a relativistic Hamiltonian is local in the field φ(x) and its conjugate, which do not have a local expression as a function of a(x). More precisely, where J(x) is a smooth kernel for x = 0 (and not a Dirac distribution) which I will make explicit later. Consequently, the Hamiltonian density of a relativistic theory with terms of order up to n in the field φ can be written as a function of a(x) with n integrals, a bit informally: where a ( †) is a compact notation to convey the fact that both creation and annihilation occur in a sum, and K is a kernel that decays exponentially as a function of the difference of its arguments. Is this non-locality a problem? Technically, it certainly induces complications in evaluating the energy density: we have exact expressions for N point functions of a, a † which we then have to integrate over some kernel (instead of taking them at equal point for a local density). This is however not insurmountable, and as I will show in the next section, one can still compute the Hamiltonian density at a cost ∝ D 3 .
More crucially, and provided we can optimize them, do we expect the expressive power of RCMPS to be lower for such Hamiltonians? There is no reason to think so, and in fact CMPS have already be applied to Hamiltonians with exponentially decaying interactions in the non-relativistic context [39]. Intuitively, we are merely introducing a new length scale m −1 that replaces the lattice scale in lattice models. This is more physical than introducing a much smaller and arbitrary cut-off scale Λ −1 m −1 as was done previously. Other choices of length-scales that could further improve precision are discussed in sec. VII.

A. Naive direct evaluation
The operators a † (x), a(y) have the same commutation relations as the ψ † (x), ψ(y) of standard non-relativistic CMPS and thus all the formulas follow. In particular, one can straightforwardly compute the exact expectation values of normal-ordered products of a † and a using the generating functional (3).
Obtaining simple functions of the field φ, e.g. expectation values of normal-ordered monomials : φ n : is less straightforward because the field φ is obtained from a, a † with a convolution (23). Hence, to compute the expectation value of : φ n : , one a priori needs to compute n integrals (in fact n − 1 using translation invariance) of exact a, a † correlation functions This is feasible for low bond dimension and can be used as a sanity check, but is prohibitively expensive for a full optimization of the state 2 .

B. Vertex operators
For efficient computations of functions of the field φ, the starting point is to compute expectation values of vertex operators: which we may evaluate in x = 0 without loss of generality because of translation invariance. Such an expectation values seems even more difficult to compute than field monomials at first sight. Indeed, using the naive approach above and expanding the exponential (26) provides n integrals at each order n. However, it turns out one can directly compute the expectation value without expanding the exponential.
To this end, we simply express the vertex operators as a function of a(x) where J is a real function = K 1/4 (|x/m|) 2 9/4 √ π Γ(5/4) |x/m| 1/4 (29) and K ν (x) (not to be confused with the matrix K) is the modified Bessel function of the second kind. We see from (27) the crucial fact that the expectation value of a vertex operators is simply the generating functional itself, with bJ as source, i.e. V b = Z bJ,bJ , thus explicitly (30) Inside the trace, this is simply the solution of an ordinary differential equation that can be solved numerically.
To reduce the computational cost, one can use the standard trick of matrix product states which consists in mapping the tensor product Hilbert space C D ⊗ C D to the space of matrices ρ acting on C D . Introducing the super-operators we have where ρ ss is the stationary state of L normalized with trace 1. Rewriting the path-ordered exponential as the solution of an ODE we have where lim x→−∞ ρ x = ρ ss and The limit (34) is well defined because J(x), and thus R(x), decrease exponentially fast at infinity and are integrable in 0. Because of this fast decay at infinity, one could in fact use any density matrix as initial state. Using a simple ODE solver, e.g. a backward differential formula (BDF) solver, one can obtain the limit in (34) to arbitrary precision with only a reasonable number of subdivisions. The total computational cost is proportional to the cost of applying the super-operator L + bR(x) on a density matrix and thus scales ∝ D 3 only.

C. Field monomials
To compute field monomials, one can differentiate vertex operators with respect to their exponent b This allows to obtain : φ n : by directly differentiating the ODE (35). Doing so yields with the convention that ρ (0) x ≡ ρ ss and for k > 0, ρ (k) −∞ = 0. Solving the ODE above numerically provides arbitrarily accurate approximations of : φ n : at a cost ∝ n × D 2 .

D. Kinetic term
In addition to exponentials and polynomials of the field φ, it is important to be able to compute the expectation value of the free part of the Hamiltonian.
For convenience, we consider directly the Hamiltonian for the massive free boson, but since the mass term m 2 : φ 2 : is also computable, it could be subsequently subtracted to obtain the pure kinetic term. The free boson Hamiltonian can be expressed as a function of the momentum space creation and annihilation operators a † k , a k and reads The corresponding Hamiltonian density h fb (x) is As before, we would like to write this density as a derivative of a vertex operator. If we try to mirror the reasoning of the previous subsection, we face the issue that the natural sourceJ that appears now is the Fourier transform of √ ω k . This is not a function but only a distribution.
To get a true function, we can divide and multiply by ω 2 k = m 2 + k 2 , and interpret the k 2 term on the numerator as a ∂ x ∂ y derivative. This gives We are back to an expression that depends on the source J(x) that we introduced before (28) : except this time derivatives of a, a † appear as well. This gives where Y j ,j is the generating functional of normal ordered correlation functions of ∂ x a † (x), ∂ y a(y). This generating functional also has an exact expression, which is easily derived by differentiating correlation functions obtained from Z with respect to position and σ x := σ b1b2 x with initial solution σ −∞ = ρ ss and dynamics We further introduce notations for the partial derivatives ρ (1,0) := ∂ b1 ρ| b1,2=0 , ρ (0,1) := ∂ b2 ρ| b1,2=0 and ρ (1,1) Hence the expectation value of the massive free boson Hamiltonian density : h fb : can be computed by solving 2 systems of 3 coupled matrix ODEs, and thus can be obtained to arbitrary precision at a cost ∝ D 3 .

A. Failure of naive optimization
Using the results in the previous section, one obtains an expression for the energy density of the form h = f (Q, R) where f is a function of the matrices R and Q (in practice K) that can be evaluated efficiently on a classical computer at a cost ∝ D 3 . One may thus simply input this function to a standard minimizer, that will typically use a gradient computed by finite differences and hope for the best. This is what was done in the original paper on CMPS applied to the Lieb-Liniger model [23]. For our model, this approach works reasonably well up to D = 4, after which all the standard optimizers (e.g. L-BFGS or conjugate gradient) get stuck in plateaus. To understand why it happens and go beyond such small values of D, we need to do better, and use a tangent space approach.

B. Tangent space approach
It is well known that the notion of steepest descent in optimization depends on a choice of metric. More precisely, if we want to minimize a function f (x) where x = {x µ } N µ=1 is a vector of parameters (think of the coefficients of R, Q), we can go down the steepest descent direction u defined by This scalar product on the tangent space u, v x = g µν (x)u µ v ν and associated metric g µν are a priori arbitrary. The notion of "steep" depends on a metric, and what is steep for the "right" metric g µν may look like a plateau for the naive δ µν metric if g µν is singular. If there are many parameters, the naive metric has no reason to be good. But what is the right metric? It is one where the distance between parameter values is proportional to how much they change the function one optimizes. An excellent choice of metric is thus given by the Hessian Hess µν := ∂ µ ∂ ν f of the function one is optimizing. Taking this metric gives the descent direction u ∝ −[Hess −1 ] µν ∂ ν f where [Hess −1 ] µν Hess νρ = δ µ ρ , which corresponds to the famous Newton method. This matrix is costly to estimate for RCMPS, because it requires computing ∝ D 4 derivatives of the energy density, instead of ∝ D 2 if we only compute the gradient.
There is another natural option that comes from the fact that, in our case, the tangent space is also a Hilbert space [40]. Indeed, let us write |x = |Q, R a state in the manifold of RCMPS. Then a natural tangent space metric is simply the Hilbert one It provides a notion of distance between parameter values associated to how much they change the quantum state (instead of the energy). Further, in our case, it can be computed straightforwardly (it is instantaneous in comparison with the computation of the gradient). A more physical justification for the use of this metric is that it corresponds to (approximate) imaginary time evolution [40], which converges exponentially fast for a gapped system and an expressive enough state manifold. Indeed, upon an infinitesimal imaginary time evolution dτ an RCMPS |x evolves into |x − dτ H|x . This latter state no longer belongs to the RCMPS manifold, and to get an approximate evolution we need to project down the evolution to the tangent space. More precisely, we want to find a direction u ∈ R N in the tangent space such that u µ ∂ µ |x −H|x . It is obtained by minimizing u µ ∂ µ |x + H|x 2 , which gives provided g µν is invertible which we will assume here. This projected imaginary time evolution corresponds to the (imaginary) time dependent variational principle (TDVP) in the tensor network context [34]. In my opinion, the advantage of seeing imaginary TDPV simply as gradient descent with a different metric is that it makes it obvious the time step does not need to be infinitesimal, and can be chosen optimally with a line search. In practice, I observed for D ≤ 4 that quasi-Newton methods, which try to estimate the best metric (the Hessian) from the gradient at different iterations, are still reasonably efficient. For larger D, I found that the metric g µν becomes very singular near the ground state, which may explain why quasi-Newton methods fail to estimate the Hessian (which is likely very singular as well) and get stuck in plateaus. However, as we will see in V E, the tangent space approach I presented converges fast even for large values of D as one would expect. For the optimization RCMPS in moderately large D, it is thus better to have an exact "good" metric, than an approximation of the best metric.

C. Computing the metric
The metric can be computed easily following [34]. The first step is to define the tangent space vectors The complex matrices V, W parameterize the direction in the tangent space, and Q, R the point on the RCMPS manifold. We work in the translation invariant case, where Q, R are taken position independent at the end, but the position argument x in (54) is necessary to know how operators are ordered. A crucial fact is that the tangent space is overparameterized and, with the left canonical choice (7) we took for Q, i.e. Q = −iK − 1 2 R † R, one is free to fix V = −R † W without losing a linearly independent direction [34]. We may thus drop V as a parameter, as it is fixed by W . With this choice, one can show [34] that the overlap between tangent vectors takes the particularly simple form where ρ ss is the (normalized) stationary state of the Lindbladian L defined in (31). We thus have 2D 2 directions on the tangent space corresponding to the real and imaginary parts of the coefficients W αβ . The metric is simply the bilinear map taking two W and outputing the real part of (55). Note that the metric depends on the state only and is cheap to compute. Indeed, it does not require the resolution of ODEs which are the numerical bottleneck of RCMPS.

D. Computing the gradient with an adjoint method
To compute the gradient of the energy density in the 2D 2 independent directions, one a priori needs ∝ 2D 2 computations of expectation values each with a cost ∝ D 3 . However, using standard adjoint methods (a.k.a. backpropagation), one can compute the complete gradient with the same asymptotic cost as computing the energy, hence ∝ D 3 . In principle, this could be done by using a complex ODE solver compatible with automatic differentiation. In practice, because the ODEs involved have a special form, it is easy, efficient, and illuminating to implement the adjoint method directly. I illustrate the idea on the example of the computation of the gradient of a vertex operator, but it applies equally easily to the gradients of field monomials and kinetic term. The expectation value of a vertex operator on a RCMPS is Let us consider the gradient in the W direction Differentiating directly (56) yields (58) with the notation L b (x) = L + bR(x) and recalling again that V = −R † W . This gradient of L b (y) appears in (58) between two evolution super-operators. Because a trace is taken at the end, we can replace the last part of the evolution from y to +∞ by its adjoint acting on the identity where the adjoint L b * (y) of L b (y) is defined as Writing as before ρ x = Pe This can be further simplified exploiting the expression of ∇ W L b (y) (59), and one obtains all the components of the gradient from 2 matrices M W and M W † In practice, one computes ρ y and O y by solving the corresponding ODEs. The matrices M W and M W † are then obtained by evaluating the integrals in (64) with an efficient numerical method like the tanh-sinh quadrature [41]. This gives an algorithm with a cost ∝ D 3 to compute the full gradient of the expectation value of a vertex operator. The gradients of the kinetic term and of other potentials can be computed efficiently with the same method.

E. Algorithm
We now have all the pieces to understand the optimization algorithm. The first step is to start from an initial guess. One can certainly do much smarter, but I started from uniformly random R and K matrices. This gives a very high starting energy density, but it fortunately decreases fast enough that initialization is a secondary concern for the bond dimensions I probed. The second step is to compute the descent direction, obtained by acting with the inverse metric on the gradient. The gradient is computed with the backpropagation method described before, and is the most costly step, while the computation of the inverse metric is an immediate algebraic operation using (55).
The third step is to move R and Q in the descent direction. The step size need not be small, as in imaginary time evolution, and it is chosen to approximately yield the maximal energy decrease at each step. In practice, I used a backtracking line search with Armijo-Goldstein condition to find the the right amount to move at each step. Note that this is very similar to what was done by Ganahl et al. in [24] for standard CMPS and the Lieb-Liniger model. Like them, I observed that this approach speeds up the optimization by roughly 2 orders of magnitude compared to imaginary time evolution with an optimal but fixed time step. Typically, results are converged after ∼ 10 2 − 10 4 iterations depending on the coupling (convergence is slower near criticality), bond dimension (convergence is slower for large D), and random initial seed.

A. The model
To assess the soundness of RCMPSs, we consider the simplest non-trivial QFT in 1 + 1 dimensions, the selfinteracting scalar field with Hamiltonian The normal ordering is again done with respect to the creation and annihilation operators a † k , a k which diagonalize the quadratic part of the Hamiltonian and are defined in (17).
This model is a good case study because it is simple to define, even rigorously, as H is a genuine renormalized Hamiltonian (self-adjoint, finite energy density). Yet, the model is not integrable, and carrying accurate computations out of the perturbative regime is non-trivial. The self-interacting scalar has been studied with a wide variety of methods: renormalized Hamiltonian truncation (without space-time discretization but finite size) [42], infinite matrix product states (with space discretization but no finite size cutoff) [17,43], Monte-Carlo (with space-time discretization and finite size) [44][45][46], tensor network renormalization (with space-time discretization and finite size) [21,22], and, of course, (resummed) perturbative expansions (without cutoff, but perturbative) [47].
Out of these works, let us mention two that are particularly relevant for our study as they are carried in the Hamiltonian formalism. The study of Milsted et al. [17] is the the closest, in terms of method used, to what we shall do: the authors discretize the model in space, and find the ground state with translation invariant matrix product states, thus without IR cutoff, reaching unbeatable precision at the time. The drawback is the need to extrapolate the continuum limit, i.e. the UV cutoff. On the other hand, the remarkably pedagogical Hamiltonian truncation study of Rychkov and Vitale [42] is carried directly in the continuum with the Hamiltonian (65). In a nutshell, the authors introduce an IR and an energy cutoff that make the Hilbert space finite, and exactly diagonalize the resulting finite Hamiltonian matrix. In addition, they introduce a smart renormalization procedure which makes the convergence of the results faster as the cutoffs are lifted. The drawback is that the size of the Hilbert space (and thus the cost) is exponential in both cutoffs (I discuss it further in VIII). Further, while the renormalization procedure and careful extrapolations drastically improve precision, they destroy the variational nature of the results.
My objective is not so much to improve upon these studies in terms of raw numerical precision than in terms of conceptual simplicity, robustness, and scaling: with RCMPS one can in principle find the ground state of (65) directly in the continuum limit, without UV or IR cutoff, and in a variational way (that is, with rigorous energy upper bounds).

B. Ground energy density
The ground state energy density is finite and negative for the model we consider. This is clear given that the Fock vacuum for the free part |0 a , which is no longer the ground state for g = 0, already gives a zero energy because the Hamiltonian is normal ordered. Optimizing the RCMPS for m = 1 indeed shows that the energy density is negative and decreases as the coupling is increased (see Fig. 1), at first quadratically in the coupling constant g (as expected from perturbation theory [42]).
Importantly, the convergence in bond dimension is fast for all values of the coupling, even deep in the nonperturbative regime (which kicks in roughly at g ≥ 0.1). As we can see in Fig. 1 the energy as a function of g is already qualitatively correct for D = 5, and the points at larger bond dimensions (D = 10,15,20) are essentially indistinguishable on this plot. At large coupling (g ≥ 3 the results are substantially below those obtained with renormalized Hamiltonian truncation (RHT) [42], which means they are more accurate since the method is truly variational.
To estimate the error, it is not possible to compare with an exact solution (φ 4 2 is not integrable), nor with earlier numerical results, e.g. RHT, which are less precise even in their latest high precision development [48]. To get an accurate point of comparison, I simply considered a large D estimate of the energy density as reference to estimate the error at lower bond dimensions. For D = 32, I obtained the rigorous bounds h g=1 ≤ −0.039354 and h g=2 ≤ −0.157214, which can be used as fairly good estimates of the true values. The resulting errors for lower bond dimensions are shown in Fig. 2 for g = 1 and g = 2 and provide hints that the convergence to the ground energy is close to exponential in the bond dimension, or at least faster than any power law as one expects from the discrete [30]. Retrospectively, this fast convergence implies that our D = 32 points of comparison are exact enough, with an error to the true ground state much smaller than that of the lower D points whose error is estimated.
Note that the energy density obtained with RCMPS is the renormalized one and thus would be very difficult to estimate with a similar precision with lattice methods. Indeed, the latter give access to the total energy density, which diverges in the continuum limit. One would need to subtract the diverging tadpole part to get the finite renormalized contribution. As one gets closer to the continuum limit, obtaining this finite correction to a fixed precision requires a prohibitively high relative precision on the total energy.
Finally, note that a second order phase transition occurs for g c 2.77 (for m = 1) [22]. Just like Rychkov and Vitale [42] with RHT, and as expected from the similarity with the 2d Ising model, we see no sign of this transition in the ground state energy density. As we will see, the transition appears more clearly once we look at observables like the expectation value of the field φ (which behaves like the Ising magnetization).

C. Observables
Once an RCMPS is optimized to approximate the ground state of a given Hamiltonian, expectation values of operators come essentially for free. As an illustration, I show the expectation values of φ (Fig. 3) and : φ 2 : (Fig. 4) but one could equally easily consider : φ 42 : or : cosh(φ) :.
The expectation value of φ, which is the equivalent of the Ising magnetization for the φ 4 model, is the most instructive. It shows a clear spontaneous symmetry breaking around the expected coupling g c 2.77. To locate it more precisely (with a precision closer to the lattice extrapolations [22]) and estimate the critical exponents, one would need to compute more data points near the critical point, for a wide range of D, and use finite entanglement scaling techniques [35,43,49]. This is left for future work. Consequently, the exceptional numerical accuracy of the ansatz is so far limited to the gapped phases on both sides of the critical point.

A. Adjustable characteristic length-scale
The core idea of RCMPS compared to CMPS is to use creation operators such that the theory is exactly solved at short distance. We used the pair a(x), a † (x) associated to the free part of the theory with mass m. As we discussed, this introduces a length-scale m −1 for the exponentially decaying support of the Hamiltonian density, which is reminiscent of the lattice scale for lattice models. But this scale is arbitrary in our case and we could have chosen a different pair of creation-annihilation operators as long as the UV behavior is still exact.
The simplest extension is to consider mass-adjustable RCMPS, that is states of the form and where am are the operators diagonalizing the free theory of massm, and |0 m is the associated Fock vacuum. More explicitly, the field operators can be expanded in this new operator basis as where ω(k) = √ k 2 +m 2 . At short distances, large k, the state |Q, R,m still solves the theory exactly just like the RCMPS |Q, R = |Q, R, m , but the variable mass or inverse length-scalem = m gives an additional degree of freedom one can optimize to better fit the IR.
The computations with the mass variable RCMPS are more difficult than with the standard RCMPS, and the complications mainly come from the fact that the Hamiltonian density is not normal-ordered for the operators am, a † m . Evaluating the Hamiltonian density is thus tedious but doable, and I could carry the optimization. However, for the few values of coupling I tried, I obtained rather underwhelming results, only marginally improving the precision at the cost of a substantial increase in complexity and slower optimization.
A more thorough study should be done in future works. One could expect that carefully optimizing the lengthscalem −1 near criticality would yield more meaningful improvements, since the gap becomes much smaller than the mass m appearing in the Hamiltonian (corresponding to the one-loop renormalized mass). A more systematic exploration of Bogoliubov transformations would be interesting as well. In principle, one can change the a, a † into new operators b, b † by replacing ω k in the mode expansion (17) with any function Ω k with the same large k behavior so that expectation values are still UV finite. This should provide a substantial gain in expressiveness at fixed bond dimension, but the computations would be more involved.
Finally, I would like to emphasize that this idea of nonlocal change of basis to make the continuum limit well behaved or even simply increase expressiveness, which comes at the cost of making the Hamiltonian exponentially decaying instead of local, could be used on the lattice as well.

B. Excitation spectrum and beyond
Once optimized, a RCMPS can be used to compute all correlation functions at equal time for no additional minimization cost. As I argued, this also gives some dynamical information because of Lorentz invariance, but can one get more?
In principle, one can use TDVP in real-time to evolve states which means we have access to all dynamical properties. Note however in that case that one cannot use the trick of taking optimally large time steps, since, in real time, we no longer have a global minimization problem.
Using standard tangent-space CMPS techniques, one also has access to the excitation spectrum [34]. In a nutshell, the idea is to diagonalize the Hamiltonian in the tangent space |V, W Q,R which is a vector space orthogonal to the ground state (in the gauge we chose). This would give approximations only to the excited states with zero momentum, but simple local modifications allow to target states of non-zero momenta as well [34]. Carrying such computations requires evaluating matrix elements of the form V , W |h|V, W Q,R , which should be doable. A comparison of this spectral data from the one one could extract from the two-point function would be interesting.

C. Other quantum field theories
I illustrated the use of RCMPS on the self-interacting scalar field only, but it can in principle be applied to almost all theories in 1 + 1 dimensions. Other bosonic theories with polynomial interactions (e.g. : φ 6 :) can be treated without new technique. Scalar theories with exponential potentials, like the Sine-Gordon and Sinh-Gordon models can also be dealt with immediately since expectation values of vertex operators are straightforward to compute.
Fermionic theories could also be dealt with directly, with a minor subtlety related to regularity conditions [32], that is conditions on R, Q one has to impose to make expectation values finite. The Gross-Neveu model [50], which has already been studied with various UV cutoffs with tensor networks [36,51], would be an interesting candidate to probe the behavior of a "just renormalizable" theory. Alternatively, a large class of Fermionic models, like the Thirring model, can already be dealt using bosonization and the results of the present paper.
There are however serious difficulties remaining to extend the method to relativistic QFT in 2 + 1 and 3 + 1 dimensions. The first is specific to the Hamiltonian formalism, as renormalized Hamiltonians are more difficult to define in higher dimensions: normal ordering is no longer sufficient and the renormalized Hamiltonian no longer acts on the free Fock space [52]. Nonetheless, recent progress was made using renormalized Hamiltonian truncation [53] and lightcone conformal truncation [54], with promising numerical results, showing that the Hamiltonian approach is still reasonable in 2 + 1 dimensions and in the continuum.
The second difficulty is related to continuous tensor network states themselves, that is the higher dimension equivalent of CMPS. In the non-relativistic setting, these states have been proposed in [27], but evaluating correlation functions is so far efficient only for Gaussian states [55]. Indeed, the equivalent of the finite transfer matrix T of CMPS in 2 + 1 dimensions is an operator acting on (two copies of) the Fock space of a 1 + 1 dimensional relativistic QFT. In principle, one could bootstrap the present approach and evaluate correlation functions (or the energy density) in 2+1 using a boundary RCMPS approach. In a nutshell, one would find the stationary state of the transfer matrix as a (large bond dimension) RCMPS. Every evaluation of an expectation value in 2+1 would be done at the cost of a full optimization in 1 + 1. This is the typical dimensional reduction obtained with tensor network approaches, where a physical dimension is traded for a variational optimization. Clearly, optimizing RCMPS is so far orders of magnitudes too slow to be used as a routine to be called many times in a full optimization process. I hope the present work can stimulate work in this direction.

VIII. DISCUSSION
In this paper, I have presented a new class of states, the relativistic continuous matrix product states, that is adapted to relativistic quantum field theories. It is usable directly in the thermodynamic limit (no IR cutoff) and is exact at short distances, all the way down (no UV cutoff). The bond dimension D controls the expressiveness of this class, and a state has 2D 2 independent parameters.
The comparison between RCMPS and Hamiltonian truncation is a good illustration of the difference between linear and non-linear methods. Hamiltonian truncation (up to its renormalization refinements) is a linear approach. The candidate ground state is expanded in a truncated Hilbert space. The energy is quadratic in the coefficients, and thus minimized efficiently as a linear problem. The price to pay is a lack of extensivity, the size of the Hilbert space grows exponentially as the size of the system increases for a fixed energy cutoff (and thus fixed precision). A side effect is that the number of parameters needs to be huge to reach good precision, with the candidate ground state written as a linear superposition of typically 10 4 to 10 6 states [48]. In contrast, using a continuous tensor network ansatz as we did, we work with a manifold of states, and the energy density is a highly non-linear function to optimize. Nonetheless, it can still be done efficiently as I showed in V. Further, having a non-linear class of states allows us to have an extensive ansatz, where going to the thermodynamic limit does not increase the number of parameters (in the translation invariant case). The results at D = 5 provide an approximation to the ground state with only 2D 2 = 50 independent real parameters which is already more accu-rate than RHT at strong coupling.
This parsimonious encoding of the state translates into a better asymptotic behavior of the approximation. The error of Hamiltonian truncation decreases as a power law in the truncation energy E T , while the cost is exponential in E T (see e.g. [48]). In contrast, for RCMPS, the cost of the optimization is polynomial in D while, at least for the model I considered, the error seems to converge exponentially fast to zero as a function of D. Rigorous results for MPS [30] lead one to expect a convergence at least faster than any power law. This favorable scaling should be explored with more careful numerical simulations, and it would be interesting to see if it can be proved rigorously. Nonetheless, even without such a proof, RCMPS already provide rigorous and rather tight energy upper bounds. These rigorous results are made possible because we work directly in the continuum, without the need for extrapolations.
Naturally, there is a lot of work to do to generalize the RCMPS to a wider a range of quantum field theories. But I hope the present paper will motivate others to pursue this necessary exploration.