NLO Renormalization in the Hamiltonian Truncation

Hamiltonian Truncation (a.k.a. Truncated Spectrum Approach) is a numerical technique for solving strongly coupled QFTs, in which the full Hilbert space is truncated to a finite-dimensional low-energy subspace. The accuracy of the method is limited only by the available computational resources. The renormalization program improves the accuracy by carefully integrating out the high-energy states, instead of truncating them away. In this paper we develop the most accurate ever variant of Hamiltonian Truncation, which implements renormalization at the cubic order in the interaction strength. The novel idea is to interpret the renormalization procedure as a result of integrating out exactly a certain class of high-energy"tail states". We demonstrate the power of the method with high-accuracy computations in the strongly coupled two-dimensional quartic scalar theory, and benchmark it against other existing approaches. Our work will also be useful for the future goal of extending Hamiltonian Truncation to higher spacetime dimensions.


Introduction
Developing reliable and efficient techniques for computations in strongly coupled quantum field theories (QFT) remains one of the critical challenges of modern theoretical physics. In this paper we will be concerned with one such technique-the Hamiltonian Truncation (HT). 1 This method became popular after the work of Yurov and Zamolodchikov [1,2] in the late 80's-early 90's. 2 By now it's an established technique with many nontrivial results (see [4] for a recent review).
The HT is applicable to QFTs whose Hamiltonian can be split in the form H = H 0 + V where H 0 is exactly solvable. H 0 may be a free theory or an interacting integrable theory, such as an integrable massive QFT, or a solvable conformal field theory (CFT). V describes additional interactions. 3 The total Hamiltonian H is in general not exactly solvable and is treated numerically. To set up the calculation, one needs to know the energy eigenstates of H 0 in finite volume and the matrix elements of V among them. Then one represents H as an infinite matrix in the Hilbert space of H 0 eigenstates. This matrix is truncated to the subspace of low-energy eigenstates below some energy cutoff E T and diagonalized numerically. This procedure represents a natural adaptation of the Rayleigh-Ritz method from quantum mechanics to QFT.
The HT method is non-perturbative and a priori works for interactions V of arbitrary strength. It works best if the interaction switches off fast at high energy (in technical language, if V is strongly relevant). In this case the method converges rapidly, and accurate results can be obtained with low E T cutoff and with truncated Hilbert spaces of modest size. If on the other hand V is only weakly relevant, then the convergence is poor, as the truncated results exhibit significant E T cutoff dependence even for the highest numerically affordable E T 's. This is a limitation of the method.
Another, related, limitation is that so far most applications were in d = 2 spacetime dimensions (although in principle the method can be set up in any d [9]). The reason is that in d = 2 there are many physically interesting integrable QFTs and CFTs, which can play the role of H 0 . Many of these systems possess perturbations V which are strongly relevant-a favorable situation according to the above-mentioned convergence criterion. On the contrary, in d > 2 the only exactly solvable H 0 's are basically free theories, and the available interactions are typically weakly relevant or even marginal, so that the convergence is poor.
Motivated by the need to overcome these limitations, much recent work has focused on improving the convergence of the method. One natural idea is to construct a renormalized truncated Hamiltonian, whose couplings are corrected to take into account the effect of states above the cutoff which are truncated away. The renormalized truncated Hamiltonian is still diagonalized numerically, but its eigenvalues exhibit a smaller dependence on the cutoff. This method was developed in [10,9,11] where renormalization corrections of leading (quadratic) order in the interaction V have been considered. Leading-order (LO) renormalization has been successfully used to improve convergence in several HT studies [9,[11][12][13][14][15][16][17].
A natural hope [11,4] is that one can improve convergence even further by consider nextto-leading (NLO) order renormalization corrections. Previous work on this problem [18] led to somewhat pessimistic conclusions: it was found that the most straightforward NLO renormalization performs poorly. The goal of our paper will be to present a different implementation of NLO renormalization which overcomes the difficulty found in [18] and improves convergence compared to the LO methods. A short exposition of our results has appeared in [19].
The paper is structured as follows. In section 2 we review previous work on the renormalized HT and describe our approach to NLO renormalization. Our construction is completely general and is presented as such. In the rest of the paper we apply NLO-renormalized Hamiltonian Truncation (NLO-HT) to one particular strongly coupled QFT-the φ 4 theory in two spacetime dimensions. This is a field theory interesting both in its own right, and as a benchmark model for testing the HT method. This theory has been studied by renormalized HT in our prior work [11,14,18], 4 and so it will be easy to compare the performance.
In section 3 we remind the setup of the HT method as applied to (φ 4 ) 2 . We then explain how our general NLO-HT construction from section 2 can be implemented for this theory. In section 4 we present numerical results. We study the spectrum dependence on the Hilbert space cutoff and show that the convergence is both smoother and more rapid for NLO-HT than for the LO renormalized HT. We discuss the spectrum dependence on the volume L and the extrapolation to the infinite volume. Finally, we study the dependence of the spectrum on the quartic coupling g, and determine the critical coupling where the theory transitions to the phase of spontaneously broken Z 2 symmetry. Then we conclude.
The interested reader will find much further useful information in the appendices. Appendices A,B,C are devoted to conceptual issues: general considerations and numerical experiments regarding the structure of interacting eigenstates in finite volume (in particular how the orthogonality catastrophe is avoided), problems with naive implementations of renormalization corrections, and connections of the renormalized HT with the time-honored Brillouin-Wigner and Schrieffer-Wolff constructions of effective Hamiltonians. The rest of the appendices are more technical (see the table of contents). 4 It has also been recently studied by Coser et al [20] using a variant of the Truncated Conformal Space Approach (TCSA) [1], by Bajnok and Lájer [21] using the HT, and in [22][23][24] via the light front quantization. These papers did not use renormalization improvement. The Hamiltonian H 0 is assumed to have an exactly solvable discrete spectrum of eigenstates, which form a basis in the Hilbert space H. The matrix elements of V among H 0 eigenstates are assumed known, so that we can view H as an infinite matrix acting in H. In many applications V is an integral of a local operator: For a concrete example, think of H 0 describing a free massive scalar field φ in 1 + 1 dimensions, H the Fock space, and O = : φ 4 : the quartic interaction. This example will be considered in detail below. For the moment we would like to stay general.
Let us now pick an energy cutoff E T and divide the Hilbert space into the low-and high-energy subspaces: where H l is spanned by basis states with H 0 -eigenvalue E E T . 6 Notice that one could in principle consider different types of cutoff, which depend not only on E T but on other conserved quantum numbers which may be present in the integrable Hamiltonian H 0 , for example, occupation numbers of individual momentum modes for free H 0 . It's a tantalizing but little-explored possibility that significant improvement can be achieved by considering alternative cutoffs (see appendix A).
The HT method constructs the "truncated Hamiltonian", which is the Hamiltonian H restricted to the finite-dimensional subspace H l . The truncated Hamiltonian is diagonalized numerically, producing "raw" [9] spectrum. We will assume that the scaling dimension of the perturbing operator O is below d/2. In this case the raw spectrum converges to the exact finite volume spectrum for E T → ∞ [26,9]. However, in practice one cannot push to very high E T as the dimension of H l grows exponentially (see appendix A). In many practically interesting cases one finds that the convergence error is still non-negligible at the maximal numerically accessible cutoffs. This calls for improvements. 5 In relativistic QFTs one can also quantize on surfaces of constant light-cone coordinate. This light front quantization [5] is also used in numerical solutions of strongly coupled QFTs via a version of HT; some recent work is [7,8,22,23,25,24]. The structure of the unperturbed Hilbert space is different from the equal time case, which leads to important differences in the numerical procedure. All technical claims in this work will refer exclusively to the equal time quantization. 6 E T = E max in the notation of [11].

Integrating out versus truncating
A natural way to reduce the convergence error is to integrate out the high energy states rather than to simply truncate them away. This can be done rigorously as follows. The eigenvalue equation for the full Hamiltonian in the full Hilbert space is: Let c = (c l , c h ) be the low-and high-energy components of the eigenvector c. We have: 7 (2.6) From the second equation we have Substituting this into the first equation we obtain H eff .c l = Ec l , c l ∈ H l , (2.8) where (2.10) The eigenvalue equation (2.8) in the truncated Hilbert space is exactly equivalent to the original eigenvalue equation (2.4) in the full Hilbert space. The term ∆H takes into account the removal of the high energy states. Needless to say, ∆H cannot be found exactly in any situation of interest, because E − H hh is impossible to invert exactly. However, one can hope that it can be found approximately, and that using these approximations and diagonalizing H eff one can reduce the convergence error compared to the raw truncation at the same cutoff value. This will be discussed below.

Historical comment
The above effective Hamiltonian construction was first brought to bear on the problem of renormalized HT in [9,11]. However, in the general quantum mechanics context, it goes back at least as far as the work of Feshbach [27,28] and Löwdin [29] around 1960. It is also used in quantum chemistry, see e.g. [30,31]. There, the procedure of dividing the Hilbert space is called 'partitioning', H l and H h the 'model' and the 'outer' space, and H ll + ∆H(E) the 'intermediate' Hamiltonian. The approximation (2.12), see below, is also commonly used.
See also appendix C for parallels between the renormalized HT and two other expansions used previously in quantum physics (the Brillouin-Wigner series and the Schrieffer-Wolff transformation).

Leading-order renormalized HT
The simplest method to reduce cutoff effects and improve convergence of the HT is the local LO renormalization, first argued in [10]. It is easy to implement in practice and it has been used in several recent HT studies [9,[11][12][13][14][15][16][17].
The method is best justified by viewing it as a particular approximation to ∆H [9,11]. Earlier work on the renormalized HT idea includes [32][33][34]. We disagree with these papers and with [10] on several conceptual points, and especially on the treatment of subleading effects, as discussed in [9], section 5.4.
Consider a formal expansion of ∆H in powers of V hh (2.11) Let us keep only the first term in this expansion (thus we approximate H hh ≈ H 0 hh in (2.10)): Although the matrix in the denominator is now diagonal and easy to invert, the definition still involves an infinite sum over all high energy states, and some approximation is required in order to compute it. The simplest and the most widely used is the local approximation [9,11], which adds small corrections to local couplings: Here O i are some local operators of the theory (the original interaction O will be typically one of them). Coefficients κ i (E T ) can be given analytically, using the operator product expansion (OPE) [10,9,11]. The (φ 4 ) 2 theory case will be treated in detail below.
Eq. (2.13) can be motivated as follows. By the effective field theory intuition, the local approximation can be expected to work well for the matrix elements (∆H 2 ) ij if the energies of the external states E i,j are much below E T , the lowest intermediate energy summed over in Eq. (2.12). These are the most important matrix elements, because the states with E i E T dominate the lower energy interacting eigenstates (see appendix A). The matrix elements among states close to the cutoff are not well reproduced by the local approximation, but those states are unimportant.
Replacing ∆H by ∆H loc 2 in H eff gives the local LO renormalized truncated Hamiltonian. Solving the eigenvalue equation (2.8) numerically, we obtain the "local renormalized" [11] spectrum. Empirically, this spectrum does show a smaller E T cutoff dependence than the raw spectrum, obtained by direct diagonalization of the truncated Hamiltonian H ll .

Beyond local leading-order approximation?
One modest improvement of the local LO approximation is the "local subleading" approximation discussed in [9,11]. For states well below E T , it partially takes into account subleading dependence of the matrix elements (∆H 2 ) ij on their energy. It performs slightly but not dramatically better than the local one. So it is important to look for further improvements.
Our goal will be to develop an NLO approximation, taking the cubic term ∆H 3 into account. Naive NLO would be to use the first two terms in (2.11) ∆H ≈ ∆H 2 + ∆H 3 (naive NLO) . (2.14) However, there is a difficulty in following this route [18]. To recognize it, let us go back to the LO approximation (2.12) and mention a subtlety glossed over in that discussion.
Notice first of all that while the local approximation (2.13) is convenient and natural, technically we are not forced to use it. The local approximation is good for E i , E j E T , but if we really wanted, we could actually compute ∆H 2 with reasonable accuracy for all energies below the cutoff, by splitting the infinite sum into two parts, treating one of them exactly, and the other approximately [18] (see section 3.2.1). Suppose we did it. Would we get better results for the spectrum using ∆H 2 instead of ∆H loc 2 ? Surprisingly, the answer is no. The explanation is as follows. When we replace ∆H by ∆H 2 , we already make an error. This error is small for E i , E j E T , but it turns out that it is very large for energies close to the cutoff. There, ∆H 2 overestimates certain matrix elements by many orders of magnitude. As we said, states close to the cutoff appear with tiny coefficients in the interacting low-energy eigenstates. So a moderate error involving the matrix elements among those states would not be important. However, the behavior of ∆H 2 near the cutoff turns out to be so bad that it ruins the spectrum. In this respect, using ∆H loc 2 instead of ∆H 2 is a blessing. While it adds another small error for E i , E j E T , it also regularizes the extremely bad behavior of ∆H 2 near the cutoff. Of course ∆H loc 2 remains inaccurate near the cutoff, but this inaccuracy is order one and does not affect the spectrum appreciably. Now consider the naive NLO proposal (2.14). The described problem with ∆H 2 is just the first sign that the series expansion (2.11) is inadequate for the matrix elements of ∆H involving states close to the cutoff E T (see appendix B). Given this problem, what can we do? To mitigate the bad behavior near the cutoff, we could try to treat ∆H 3 in (2.14) via a local approximation. However, to match the expected increase in accuracy, we would have to treat ∆H 2 better than in the local or the local subleading approximation, and at the same time regularize the bad behavior near the cutoff. It's not obvious what such an approximation might be.
In the next section we will present a modified approach to NLO renormalization, which neatly avoids all mentioned difficulties. Another approach, to be explored in the future, is outlined in appendix B.1.

NLO renormalization which works: NLO-HT
We will now describe our modified approach to NLO renormalization. Let us revisit the effective Hamiltonian construction in section 2.1.2. Let's focus on the key equation (2.7), which expresses the "tail", i.e. the high energy part c h of the eigenvector, in terms of its low-energy part c l . If we simply diagonalize H ll , we forget about these tails. On the other hand, the correction ∆H in the effective Hamiltonian takes the tails into account.
Our approach will take the tails into account in a slightly different way, motivated by the already mentioned connection between the Hamiltonian Truncation and the Rayleigh-Ritz (RR) method. In the RR method, one diagonalizes the Hamiltonian truncated to a subspace H RR of the full Hilbert space. For example, the raw HT method corresponds to H RR = H l . The cornerstone of the RR method is the variational characterization of the truncated eigenvalues provided by the min-max principle. It implies, in particular, that as the subspace H RR is enlarged, the truncated eigenvalues approach the exact eigenvalues monotonically from above.
The raw HT enlarges H RR by raising the energy cutoff E T , but this is exponentially expensive. A more efficient way to enlarge H RR would be to add new basis elements capable of reproducing the entire tails (2.7). This is the idea of our approach. Formally, we will proceed as follows. We will be applying the RR method in the subspace H RR of the form: where H l is the same as above with a certain cutoff E T , and H t is a finite-dimensional subspace of H h spanned by "tail states" defined below. Since this H RR is strictly larger than H l , we are guaranteed to do better than the raw truncation. How much better will depend on the choice of tail states.
Let |i be the Fock state basis of H l , i = 1 . . . D = dimH l . The tail states |Ψ i will be vectors in the high-energy Hilbert space H h . The "optimal" choice for |Ψ i would be (would-be optimal tails). (2.16) Since c l in (2.7) is a linear combination of |i , using these optimal tail states we could reproduce c h exactly, and so the RR eigenvalues would be equal to the exact eigenvalues.
The optimal tails cannot be found and manipulated exactly, for the same reason that ∆H in (2.10) cannot be found exactly. Instead, we will use a simple approximation to the optimal tail states: Here we replaced the exact eigenvalue E by some reference energy E * which will be eventually chosen close to a given eigenvalue of interest. We also replaced H hh by H 0 hh . We will see that these simpler tail states are tractable. We will also see that the RR method using the simpler tails performs significantly better than both the raw truncation and the LO renormalization procedures. This is a sign that the simpler tails do approximate the optimal tails reasonably well.
So, subspace H t in (2.15) will be spanned by |Ψ i defined in (2.17). In the numerical calculations of this work we will always include the full set of tails T = {1 . . . D}. However, a priori we can include tail states corresponding to any subset i ∈ T ⊂ {1 . . . D} of low-energy states. In this section we will develop the theory for such a general case. 8 The reader may be wondering what all this has to do with the NLO renormalization. This will become clear later, once we formalize the procedure. Consider the eigenvalue equation (2.4) truncated to the H RR subspace (2.15). In operator form we have P RR HP RR |ψ = E RR |ψ , (2.18) where |ψ ∈ H RR , P RR is the corresponding projector, and E RR is the RR eigenvalue. We will call it E from now on, although it's only an approximation to the exact eigenvalue appearing in (2.4) and (2.8). In matrix form the equation becomes where c = (c l , c t ) are the components of |ψ when expanded in the basis of H RR :

20)
H RR is the matrix of H in the same basis, and G RR is the Gram matrix. Since the tail states live in H h , the Gram matrix has the block-diagonal form: The part G tt = G tt (E * ) is nontrivial because the tail states are not orthogonal; it is given by: Consider now the block structure of H RR : Here H ll is the usual Hamiltonian truncated to H l . The other blocks must be worked out using the definition of tail states. It turns out that they can be conveniently expressed in terms of ∆H 2 and ∆H 3 discussed in the previous section: (2.27) See section 3.2.3 for a discussion of how one could proceed to find the spectrum directly from these equations and of computational advantages it could bring (in the context of the φ 4 theory).
In this paper we will instead transform the problem to an equivalent form by eliminating the tail components c t and deriving an effective equation involving only c l . While this step is not strictly speaking necessary, it will bring additional physical insight on the method. So, expressing c t from the second equation and substituting into the first, we get an analogue of (2.8): (2.30) We emphasize the notation: every time a matrix has a subscript l (resp. t) it means that the corresponding index runs over the full {1 . . . D} (resp. over the subset T ).
In our computations we will always choose E * sufficiently close to E for the states of interest (which will be the lowest energy states in both parity sectors), and neglect the last term in the denominator. 9 Also let us specialize to the case when T is the full set of tails, as will be in all numerical computations below. In this case we obtain a simplified expression: where all matrices have indices running over the full basis of H l . This is our main theoretical result. In the rest of the paper we will test how this correction performs, in the context of the two dimensional φ 4 theory.
Finally let us clarify the relation with NLO. Performing a formal power series expansion of ∆ H in ∆H 3 up to the first order, we obtain: For E ≈ E * , these are the same two terms as in the naive NLO correction (2.14). We see that our approach based on ∆ H will capture O(V 3 ) corrections, unlike the studies in [9,11,14,18] based on ∆H 2 . For this reason we will refer to ∆ H as "NLO renormalization correction". In practice we will of course use the full expression (2.31) without expanding.
Of course, ∆ H is not identical to the naive NLO correction, differing by the higher order . . . terms in (2.32). That's good because naive NLO fails, as discussed in section 2.1.4. On the other hand our NLO approach is guaranteed not to fail. This is because we arrived at our ∆ H via a variational route. Since the Hilbert space (2.15) is strictly larger than the raw truncated Hilbert space H l , our NLO renormalization is guaranteed to perform better than the raw truncation. As we will see, it also performs better than the local LO renormalization from section 2.1.3.

NLO-HT for (φ 4 ) 2 theory
In the previous section we gave a general description of NLO renormalized Hamiltonian Truncation (NLO-HT). In the rest of the paper we will apply this method to one particular strongly coupled QFT: the φ 4 theory in d = 2 spacetime dimensions. In this section we describe implementation of the method, and in the next one the numerical results. As we have already studied the (φ 4 ) 2 theory in [11,14,18] using the LO renormalization, it will be very instructive to compare.

The (φ 4 ) 2 theory
We give here only the minimal information, see [11] for the details. The theory is defined by the normal-ordered Euclidean action (3.1) We quantize it canonically on a cylinder with periodic boundary conditions, expanding the field into creation and annihilation operators: Here x is the coordinate along the spacial circle of length L, while τ ∈ R is the Euclidean time along the cylinder.
In terms of normal-ordered operators, the Hamiltonian is a sum of the free piece and the quartic interaction, plus finite-volume corrections, The E 0 (L) and z(L) terms are exponentially suppressed in the limit Lm 1. They are discussed in [11] and defined in Eqs. (2.10), (2.18) of that paper, which we do not reproduce here. Introduction of these terms is necessary for putting the theory correctly in finite volume. For example, E 0 (L) can be understood as the Casimir energy. In [11] these contributions were described, but then neglected in the numerical analysis. In this work they will be kept, as the numerical error will be sometimes smaller in comparison, allowing us to analyze these exponentially suppressed effects.
The Hamiltonian H acts in the free theory Fock space H Fock in finite volume L (we will consider volumes up to 10m −1 ). There are three conserved quantum numbers: total momentum P , spatial parity P (x → −x), and field parity Z 2 (φ → −φ). As in [11,14,18], we will focus on the invariant subspaces H ± consisting of states with P = 0, P = +, Z 2 = ±. The states in H + (resp. H − ) contain even (resp. odd) number of free quanta. The basic problem is to find eigenstates of H belonging to H ± . The two subspaces don't mix and the diagonalization can be done separately.
The lowest eigenstate in H + is the ground state in finite volume (the interacting vacuum). The interpretation of the lowest eigenstate in H − depends on the phase of the theory, namely if the Z 2 symmetry is spontaneously broken in infinite volume or not. The Z 2 -preserving phase is realized for moderate quartic couplings g/m 2 < g c , where the critical coupling was measured as g c = 2.97 (13) in [11], while here we will find a smaller but compatible value g c ≈ 2.8. In the Z 2 -preserving phase, the lowest H − eigenstate is the one-particle excitation at zero momentum. Excitation energy over the ground state then measures the physical particle mass m ph . In the Z 2 -broken phase at g/m 2 > g c , the lowest H − eigenstate is the second vacuum, exponentially degenerate with the first one at finite L [14,21].
In this paper we will focus on the Z 2 -preserving phase, below g c . We will use the NLO-HT method to measure the physical mass m ph as a function of the quartic coupling. We will also measure g c , as the point where m ph goes to zero. It will be instructive to compare with [11] where these measurements were done using the LO renormalized HT.

NLO-HT implementation outline
Here and below we will fix the units of energy by setting the mass to m = 1.
In our python code, we first build the Fock state basis of H = H ± up to a fixed energy cutoff E T . For example, we will use E T = 20 for L = 10, corresponding to order 10 4 states. We then evaluate the matrix elements of H between these states (i.e. the matrix H ll ) directly from the definition (3.4). This matrix is sparse, and it is important to organize this computation exploiting this sparsity maximally efficiently. Our current algorithm improves on [11]; it is described in appendix I. The subsequent steps are the computation of ∆ H and the numerical diagonalization; they are discussed below.

∆H 2
We need to evaluate the matrix element (∆H 2 ) ij between any two H l states. Recall that ∆H 2 is defined by (2.12) which is an infinite sum over intermediate states in H h . The choice of E * will be described below; for now let's keep it as a free parameter.
We introduce a new cutoff E L > E T ('L' for 'local approximation') and split this sum into "moderately high" states in the range E T < E k E L and "ultrahigh" ones of energy E k > E L [18]: The number of "moderately high" states, which contribute to ∆H < 2 , is large but finite. We will choose E L not excessively large, so that this finite sum can be done exactly; see appendix I for the algorithmic details. On the other hand, while the number of ultrahigh states contributing to ∆H > 2 is infinite, all of these states have energy significantly higher than the external energies E i,j . For this reason we will be able to approximate the matrix ∆H > 2 by a sum of local operators: The expected accuracy of the local approximation (3.8) is (E T /E L ) 2 . In principle we need E L E T , but in practice we will choose E L ≈ 3E T and we will check that it already gives a reasonable approximation (see appendix G). The local approximation can be justified using the operator product expansion (OPE) as in [11]; it can also be connected with the diagram technique (see appendix E). The coefficients κ N are given by [11]: 9) where µ N can be conveniently expressed as the relativistic phase-space integrals [18]. They can be computed in an m/E expansion and the leading terms are [11]: 10 (3.10)

∆H 3
The evaluation of ∆H 3 follows the same strategy as for ∆H 2 . We introduce an intermediate cutoff E L (in general different from E L ) and split the definition into four sums depending if the exchanged 10 µ N = µ 44N in the notation of [11]. In obtaining Eq. (3.10), the infinite length limit L → ∞ was taken. This is a good approximation for the volumes that we consider later in the numerical study. While we will keep exponentially suppressed term in the zeroth-order Hamiltonian (3.4), keeping such terms in renormalization corrections is unimportant at the current level of accuracy.
states k, k are moderately high or ultrahigh with respect to E L : We compute ∆H << 3 by evaluating and multiplying the involved finite matrices; see appendix I. For ∆H >> 3 we use a local approximation: This involves local operators up to V 6 as well as bilocal operators with up to eight fields, whose appearance is a novelty first observed here (see section E.3 and appendix F for details). 11 Concerning ∆H <> 3 , its definition can be rewritten as a finite sum over moderately high k: The ∆H > 2 here is the piece of ∆H 2 receiving the contribution from the ultrahigh states; it is given by (3.7) with E L → E L . For (3.7) we could use a local approximation since both external energies were much below the cutoff, but here we cannot do this right away, since E k may be close to the cutoff E L . To deal with this nuisance, we introduce a further cutoff E L > E L . Then, in the sum defining (∆H > 2 ) kj , the part over the intermediate states below E L is performed explicitly, and for the part above E L the local approximation (3.8) is used (with E L → E L ).
Let us now make some remarks on the computational cost of evaluating ∆H 3 , which is the most expensive step in the procedure. For the choice of parameters L, E T , E L , E L which we will use in section 4, the expressions (3.12) and (3.14) involve double sums over tens of millions of high-energy states. We are able to take advantage of the sparsity of the matrices to perform these sums relatively efficiently (see appendix I for the details). Still, this step limits the value of the local cutoffs and/or the number of tails that can be included. In the future, one may have to devise more efficient approximate procedure to evaluating the matrix ∆H 3 . One simple option would be to discard tail states that are not important for modeling the high-energy part of the eigenvectors c h in (2.7) (see note 8). Alternatively, one could consider varying degrees of approximation for the different matrix entries of (∆H 3 ) ij . For instance, if for a pair of states i, j it happens that (∆H 2 ) ij (∆H 3 ) ij , one might be justified in discarding altogether the smaller contribution for this matrix element. It would be very interesting to explore these and other possibilities. We leave this for future work, while here we will stick to the simple prescription described so far.
This finishes a rough outline of how the needed matrices will be evaluated. We would like to emphasize one feature of the proposed algorithm: the systematic split of all sums into moderately high and ultrahigh parts. The moderately high sums are done by simply evaluating and multiplying the needed finite matrices, while in the ultrahigh parts the local approximation can be used. In appendix D we will review a diagrammatic technique of [18], which in principle provides a different way of organizing the computation of ∆H n . Since that technique is not easily automatizable, we will not use it here for the moderately high region computations. However, it will be instrumental for analyzing the local approximation for the ultrahigh parts (appendices E, F).

Idea for the future
We would like to record here a promising idea which occurred to us late in this project, so that we had not had the chance to test it in detail. In the setup outlined in section 3, in which the full set of tails is added to the variational ansatz, we can raise the cutoff up to E T = 20, beyond which computing the matrix ∆H 3 becomes too expensive. On the other hand the raw truncation can be implemented up to E T = 35. We could further increase the accuracy of our procedure combining the two, i.e. by considering E T = 35 but introducing an incomplete set of tails for states below E t = 20 < E T (see note 8). We think this combination may be affordable if we analyze this problem directly via (2.27), without integrating out the tails, as we explain in section 3.2.4.

Diagonalization
We used an iterative Lanczos method diagonalization routine scipy.sparse.linalg.eigsh (based on ARPACK), with the parameter which='SA', intended for computing algebraically smallest eigenvalues. With this choice of parameter the matrix is not inverted and diagonalization times are smallest. Notice that this routine works both for sparse and non-sparse matrices. In our problem, the matrices H ll , ∆H 2 , ∆H 3 are sparse, but the matrix ∆H is not sparse because of the matrix inversion involved in its definition.
With the same routine one could implement the idea outlined in section 3.2.3, by passing the inverse of the Gram matrix G RR and solving directly (2.27). In that case every large matrix needed for the numerical algorithm will be kept in the sparse format, while the only non sparse matrix will be G −1 tt , of modest size. Below we will also compare NLO-HT to the raw truncation at much higher cutoff, up to E T = 35 when the Hilbert space contains millions of states. In this work, the full needed matrix is always evaluated and saved in memory, and then the diagonalization routine is called. When the involved matrices are sparse, it might be possible to use the option of evaluating the needed matrix elements 'on the fly', as opposed to prior evaluation and storage of the whole matrix. We have not explored this option in this work.

Numerical results
In the previous section we described how to set up the NLO-renormalized HT method for the (φ 4 ) 2 theory in finite volume. In this section we will present the numerical results which come out of this implementation. Recall that we are working in the units in which m = 1.
Our code is written in python and was run on a cluster with 100 Gb RAM nodes. As an example of required computational resources, one NLO-HT data point in figure 1 for L = 10 and E T = 20 requires 40 CPU hours and about 80 Gb RAM. Running time and memory requirements quickly decrease with E T . The whole scan for E T = 10 -20 in steps of 0.5 for a given g takes about 140 CPU hours. For the raw and leading LO renormalized HT, the maximal attainable E T was limited by available RAM, while the running time was faster than for the NLO-HT.

E T dependence
The numerical accuracy of the NLO-HT method is determined by the cutoff E T of the low-energy Hilbert space, and by the auxiliary "local" cutoffs E L , E L , E L introduced in section 3.2. The latter cutoffs are used in the computation of ∆H; here we will fix them relative to E T as E L = 3E T , E L = 2E T , E L = 3E T . This is high enough so that ∆H is approximated sufficiently well (see also the checks in appendix G). The E * parameter in (2.31) will be fixed as follows. At each E T , we will choose E * equal to the energy of the lowest state in each Z 2 parity sector, as computed for the same E T in the local LO renormalized approximation (section 2.1.3).
With the auxiliary cutoffs fixed as above, E T remains the only free parameter. Therefore, the numerical error will be estimated just by varying E T .
In Fig. 1 we plot the NLO-HT vacuum energy E 0 and the physical mass E 1 − E 0 as a function of E T , for g = 1 and 2 and L = 10. Notice that g = 1, 2, while being smaller than the critical coupling g c ≈ 2.8, are well above the window g 0.2 where perturbation theory is accurate. 12 We could push the NLO-HT cutoff up to E T = 20 for L = 10, corresponding to ∼ 10 4 states. The main numerical bottleneck which prevents us from going higher is the evaluation of ∆H 3 .
For the sake of comparison, in the same figure we overlay the numerical results obtained by two of the methods described in [11]. These are the raw truncation, in which the correction term ∆H(E) in (2.8) is simply thrown away, and the local LO renormalized (referred to as simply "local" below) procedure, in which ∆H(E * ) is replaced by the simpler correction term ∆H loc 2 (E * ) computed in a fully local fashion, as discussed in section 2.1.3. 13 As one can see, we are able to push the cutoff E T much higher for these simpler methods, up to E T = 34 for L = 10, corresponding to ∼ 10 7 states.
The first observation is that the raw HT and the NLO-HT are variational procedures, and hence always provide upper bounds on the eigenvalues, which become monotonically more accurate with increasing E T . This is visible in the figure. On the contrary local renormalization is not variational 12 See appendix B of [11]. 13 In this case E * is taken from raw HT. We do not include the results obtained by the "local subleading" method of [11], which are only marginally more accurate than the local ones. and does not have to be monotonic.
From Fig. 1 it is evident that the raw HT is by far the least accurate, therefore we will not report results of this method in the rest of the discussion. We will keep showing local results as a baseline to judge the relative advantages of the NLO-HT, and to justify its additional complexity. Fig. 2 compares the rate of convergence of these two methods. In this figure the vacuum energy density E 0 /L and mass E 1 − E 0 are shown for L = 6, 8, 10 and g = 1 and g = 2. These plots are consistent with the expectation that both methods converge to the same asymptotic values as E T → ∞. Notice that the local data are plotted versus 1/E 2 T , while the NLO-HT data versus 1/E 3 T . At asymptotically large E T , both methods appear to have linear convergence with respect to these two variables. Notice that for the smaller values of L we could push the cutoff higher than for L = 10, due to larger gaps in the free spectrum.
Naively, we may have expected faster convergence with the cutoff: 1/E 3 T for the local and 1/E 4 T for the NLO-HT. For example, the coefficients of the local correction terms given in (3.9) behave as 1/E 2 T times logarithms. If these coefficients were to correct the 1/E 2 T behavior fully, we would have remained with an error decreasing one power of E T faster. Apparently this does not happen. Similarly, in the NLO-HT case, the largest local coefficient at the cubic order decreases as the cubic power of the cutoff, see Table 3, and again this does not seem sufficient to fully correct the 1/E 3 T behavior of the spectrum. While we don't understand why the naive expectations concerning the convergence rate fail, 14 it remains true that the observed convergence for NLO-HT is much faster than for the local (which in turn is much faster than for the raw HT).
The local data in Figs. 1 and 2 show significant fluctuations on top of the 1/E 2 T approach, especially pronounced for the mass. The origin of these fluctuations lies in the discreteness of the spectrum. For a continuously increasing E T , the truncated Hilbert space changes discontinuously when the high-energy states fall below the cutoff. At the same time, the local correction term ∆H loc 2 (E * ) varies continuously with the cutoff, and so is unable to compensate the effects of discreteness. 15 On the other hand, the correction term ∆H in the NLO-HT method adjusts itself discontinuously with the cutoff, because the sum over states just above the cutoff if performed exactly and not in the local approximation. For this reason the NLO-HT provides a much smoother dependence on E T , as Figs. 1 and 2 demonstrate. This makes the NLO-HT data well amenable to a fit. We tried various fitting procedures, and the one which seemed to worked best is to fit the NLO-HT points by a polynomial in 1/E T of the form From these fits we extract predictions for the eigenvalues at E T = ∞, with error estimates, which will be used in the subsequent sections. For more details on the fitting procedure see appendix H.
The reader may notice that some points in the left panels of figure 2 violate monotonicity in E T by a small amount, which is in apparent contradiction with was what stated earlier about the variational nature of the NLO-HT procedure. These fluctuations are numerical artifacts having negligible impact on the accuracy of the method. Their presence is explained by the following two reasons. First, as explained in section 3.2, the ultrahigh energy contributions to the matrices ∆H 2 and ∆H 3 in (2.31) have been computed in the local approximation, rather than exactly. Second, in our prescription, we choose the parameter E * in ∆H to depend on E T , as explained above, implying that increasing E T does not strictly correspond to enlarging the variational ansatz.

L dependence
In this section we study the dependence of the numerical eigenvalues on the volume L. Finite volume effects in quantum field theory are very well understood theoretically [35][36][37]. This will allow us to perform interesting consistency checks of our results, and to devise a procedure for extracting infinite volume predictions.
Let us discuss first the theoretical expectations for the vacuum energy density and for the physical particle mass in finite volume. The vacuum energy at L 1/m ph should behave as where Λ is the infinite volume vacuum energy density (the cosmological constant) and m ph is the physical mass of the lightest particle. This formula is valid in any massive quantum field theory in 1+1 dimensions in absence of bound states (i.e. particles with mass below 2m ph ). See the discussion in [11] after Eq. (4.4), as well as [38], Eq. (90) and later. Free bosons/fermions have a = ±1. For interacting theories we expect a = O(1). This is satisfied by the fits below.
The physical mass in finite volume is defined as E 1 − E 0 where E 1 is the lightest excited energy level at zero momentum. The large L corrections to this quantity can be understood as contributions to the one-particle self-energy arising from virtual particles traveling around the cylinder representing (spatial circle)×(time) [35,37]. In a 1+1 dimensional theory with unbroken Z 2 symmetry they can be expressed as: 16 where S(θ) is the S-matrix for 2 → 2 scattering, with θ the rapidity difference. The third term in (4.3) is given by contributions in which virtual particles travel around the cylinder multiple times.
While the S-matrix can be measured in the HT approach by studying the L dependence of two particle states [2], this will not be done in this work. Instead, we will parametrize our ignorance of the S-matrix replacing S(θ + iπ/2) with a Taylor series expansion around θ = 0. This is reasonable because the integral in ∆m is dominated by small θ. We obtain: The Bessel function here would be the exact answer for a constant S(θ), while the second term comes from θ 2 in S(θ + iπ/2) doing the integral via the steepest descent (the linear term vanishes in the integral). Further corrections are suppressed by additional powers of 1/(m ph L).
In Fig. 3 we present the numerical data: the vacuum energy density E 0 /L and the physical mass E 1 −E 0 as functions of L for three values of the coupling g = 0.2, 1, 2. We include the NLO-HT data points at the highest E T we could reach for the given L (blue), the NLO-HT data fit-extrapolated to E T = ∞ as discussed in the previous section (red error bars), and the local data at its highest E T (yellow). 17 Let us interpret this data theoretically, starting with with weak coupling g = 0.2 which lies at the boundary of the region where fixed order perturbation theory ceases to be reliable [11]. We fit the E T = ∞ data for the physical mass using Eq. (4.3) where we neglect the third term and approximate ∆m by Eq. (4.6). The fit has three parameters: m ph , b, c. The fit works well in the whole range of L and allows us to extract the value of m ph reported in Table 1. The uncertainty on m ph was determined by fitting the upper and lower ends of the error bars.
We next fit the E T = ∞ data for the vacuum energy using Eq. (4.2) with Λ, m ph , a as fit parameters. Including the error term ∝ a is not very important to achieve a good fit for this low value of g, but it's important for g 2 considered below. We checked that a very good fit can be obtained with m ph in the range determined from E 1 − E 0 . The final determination of Λ reported in Table 1 is obtained using a constrained fit 18 restricting m ph to that range. Notice that it is crucial for this test not to neglect the corrections E 0 (L), z(L) in the Hamiltonian (3.4) whose decrease rate e −mL is close to the e −m ph L effects we are trying to observe. Passing to g = 1, 2, for these stronger couplings there is much more difference between the three curves. The NLO-HT data at the maximal attainable cutoff do not show dependence on L compatible with theoretical expectations. However, the same data extrapolated to E T = ∞ can be fitted very well. We use the same fitting procedures as for g = 0.2. The fits are good and the physical mass from the two determinations agrees within errors. See Table 1 for the extracted m ph and Λ.   Let us comment on the local data in Fig. 3. For g = 1, 2 they are unsuitable to perform the fit, just as the non-extrapolated NLO-HT data. 19 While the local LO data can be pushed to a much higher E T , it is more difficult to extrapolate them to E T = ∞ than the NLO-HT due to pronounced fluctuations within the asymptotic 1/E 2 T convergence rate. We tried extrapolating the local data and obtained results largely consistent with NLO-HT but with larger error bars. Also, we remark that in higher dimensions, where the Hilbert space grows more quickly with the cutoff, and the convergence is slower, we expect the local LO approximation to perform even worse than here with respect to the NLO-HT approach, as the cutoff cannot be pushed as high.
In all the above fits the third term in (4.3) was neglected. We checked that this assumption gives a reasonable fit up to g = 2.6. As g is increased further it gets close to the critical coupling g c ≈ 2.8. On the one hand, fitting finite volume data in this region becomes more difficult as the physical mass approaches zero and the neglect of subleading terms suppressed by higher powers of e −m ph L is no longer justified. On the other hand, we know that at g = g c the φ 4 theory should flow to the critical Ising model. So, for g near g c , the flow must lead to the Ising field theory (IFT)-the critical Ising perturbed by the operator, up to irrelevant corrections which go to zero as g → g c and which we will neglect in the subsequent discussion. 20 The IFT is integrable and its finite volume partition function is known exactly. In particular, the functional dependence of E 1 (L)−E 0 (L) and E 0 (L)−E 0 (∞) on m ph L is known. One could use this information to improve our fitting procedure for g near g c . For instance, the coefficients b and c in (4.6) in that region would be fixed to the values to 2/π and 0 [39], rather than being fitted from the data. This reasoning also explains why neglecting the third term in (4.3) works even for relatively small values of m ph , since in the IFT σ = 3, above the generic value √ 3. Using the IFT predictions would lead to more accurate estimates of Λ and m ph for g close to g c . However, in the present work we will be content with our simplified analysis, not using explicitly this additional piece of information.

g dependence and the critical coupling
In the previous sections we explained how NLO-HT data can be extrapolated to E T = ∞ and then to L = ∞. We will now use these procedures to study the spectrum dependence on g.
In Fig. 5 we show the NLO-HT data for the vacuum energy density and the physical mass for g ∈ [0, 3] in steps of 0.2. Green error bars refer to L = 10 NLO-HT data extrapolated to E T = ∞, while red error bars are the infinite volume estimates (we only perform the latter for g 2.6, i.e. not too close to the critical point).
These plots should be compared to Fig. 5 in [11], taking into account that in those figures we did not attempt to extrapolate to infinite E T and L and did not provide error estimates. The current results are clearly superior in that these sources of systematic error are properly taken into account.
There is not much structure in the vacuum energy plot except that it is a monotonically 19 The g = 1 vacuum energy data could perhaps be fitted. Notice that the fluctuations in this data are much smaller than in Fig. 7 (left) of [11], because of higher E T cutoff. 20 For many although not all purposes the IFT can be thought of as the theory of free massive Majorana fermions. Our fits prefer negative values for a in Eq. (4.2) close to g = g c , as appropriate for fermionic excitations. : Left: the vacuum energy density as a function of g. The dashed line joining the points is not a fit; it is included to guide the eye. We also show errors divided by g 2 . Right: the physical mass as a function of g. The line is a fit described in the text. We also show fit residuals divided by g 2 .
decreasing function of g. The physical mass plot is more interesting. We see by eye that the mass gap vanishes somewhere close to g ≈ 2.8. This is in accord with the previous theoretical [40] and numerical [41,11,42,43] studies, which found that our theory undergoes a second order phase transition at a critical value of the coupling. To give a more accurate estimate of g c , we perform a fit of the red data points in the range g ∈ [0, 2.6]. We use a rational function: , with fit parameters a, g 1 , g 2 , g 3 , g c , and ν. We demand that g 1 , g 2 , g 3 > 0 so that m fit (g) has poles at the negative real axis. We see that f (g c ) = 0 by construction. Performing the fit, we get our final estimate for the critical coupling, reported in Table 2.
The ν parameter in the above fit is a critical exponent, and assuming the Ising model universality class for the phase transition, we expect ν = (2 − ∆ ) −1 = 1 using ∆ = 1, the dimension of the most relevant non-trivial Z 2 -even operator of the critical Ising model [11]. In our fit we fixed ν to this exact value. Relaxing this assumption gives the same prediction with somewhat larger error bars.
The rationale behind introducing the poles into the ansatz f (g) is that they are supposed to approximate the effect of a branch cut along the negative real axis, which the analytically continued function m ph (g) may be expected to have. In fact it's impossible to get a good fit using a purely polynomial approximation. The number of poles is somewhat arbitrary. Three poles as in (4.7) gives a good fit, and we checked that increasing the number of poles does not change the prediction for g c appreciably.
The ansatz f (g) = 1 + O(g 2 ) by construction. We checked that the g 2 and g 3 coefficients of our best fit are roughly consistent with the perturbation theory prediction (appendix B of [11]) Using a slightly more complicated ansatz , (4.9) we could find a fit which agrees with perturbation theory precisely. The g c estimate from such a fit comes out nearly identical with the one provided above. This is not surprising because most of the constraining power of the fit relevant for determining g c comes from the region 1 g 2 where perturbation theory is anyway not adequate.
Year  In Table 2, we compare our estimate for g c with other recent results in the literature. Our original HT estimate in [11] was a bit high, evidently because the effects of the extrapolating to E T → ∞, L → ∞ were not taken into account. It's reassuring that our current estimate agrees well with the HT estimate from [21], obtained approaching the critical point from the other side, i.e. from within the Z 2 -broken phase.
The last four results in the table are based on studies of latticized φ 4 models, such as lattice Monte Carlo simulations of the euclidean model [41,43] or matrix product states approach to the latticized Hamiltonian formulation [42]. Lattice considerations also enter [44] which determines the critical coupling via resummed perturbation theory. It should be pointed out that matching to the continuum limit is particularly subtle in the two dimensional lattice φ 4 theory, because of the presence of an infinite number of relevant and marginal operators [11]. The above lattice studies do not perform careful matching, and use the simplest possible discretization. The agreement with HT is good, and so this simplest discretization seems to have the right continuum limit. It would be interesting to understand why this is so.
Recently, the two dimensional φ 4 theory was also studied using the light front quantization [22][23][24] 22 using a wavefunction basis superior to the old discrete light cone quantization work [45]. The light front quantization scheme is different from the equal-time quantization scheme used here. This difference is apparent already at the perturbative level, since certain diagrams contributing to vacuum energy and mass renormalization are absent in the light front scheme. The vacuum 21 The Z 2 -broken phase of the theory was studied, using minisuperspace treatment for the zero mode (as in [14]). Their estimate for the critical coupling has been translated to our convention using the Chang duality [40,11]. 22 See also [25] for an application to the three dimensional φ 4 theory at large N . energy cannot be compared between the two schemes as it is set identically zero in the light front scheme. On the other hand, it is believed that the physical mass can be compared between the two schemes, with an appropriate non-perturbative coupling redefinition. 23 A method to perform such a coupling redefinition was recently proposed in [23,46] (see [47] for previous related work). We refer to those works for the comparison of the critical coupling estimates obtained using the two methods.
We conclude this section with a rough check that our method reproduces the physics of the phase transition at criticality. Conformal field theory predicts that at g = g c the energy levels should vary with L as where ∆ I are operator dimensions in the critical Ising model. In Fig. 5 we test this relation for the first three energy levels above the vacuum, which should correspond to the operators with dimensions ∆ σ = 1/8, ∆ = 1, ∆ ∂ 2 σ = 2 + 1/8. The bands correspond to varying g in the range 2.76(3). We see reasonable agreement for σ and , while it is possible that the agreement for ∂ 2 σ will be reached at higher values of L. This figure can be compared to Fig. 6 in [11] and Figs. 22, 23 of [21], which show similar behavior.

Conclusions and outlook
In this work we have addressed several conceptual and practical issues regarding the renormalization improvement of the Hamiltonian Truncation (HT) technique. This led us to propose the NLO-HT, a variant of the HT using a variational correction term to the Hamiltonian, of nextto-leading-order accuracy in the interaction. The NLO-HT method puts on a firmer theoretical footing the renormalization theory in the context of Hamiltonian Truncation, and at the same time rigorously improves the numerics with respect to previous work.
In the second part of the paper, we tested the NLO-HT in the context of the two-dimensional φ 4 theory. We also benchmarked the NLO-HT against the simpler existing versions of the HTthe raw truncation and the local leading-order renormalization. Compared to these, the NLO-HT results exhibit smoother and more rapidly convergent dependence on the Hilbert space cutoff E T . Therefore, they lend themselves to more accurate extrapolations to E T = ∞ and ultimately provide more accurate determinations of the true eigenvalues.
In this work, we focused on the massive region where the Z 2 symmetry is preserved, and on the critical region, where the mass gap vanishes. We computed the mass gap and vacuum energy density over the whole range of couplings, as well as the critical exponents at the critical point. In the future it will be interesting to use NLO-HT to also study the region beyond the phase transition, where the Z 2 symmetry breaks spontaneously. That region was previously studied in [14,21] using the local LO renormalized and raw Hamiltonian Truncation.
The implementation of the NLO-HT method required a refinement of the local approximation of the counterterms, which formed the basis of the previously used local LO renormalization. We have discussed and addressed novel issues arising in the local approximation at the cubic level, such as the presence of bilocal operators. Following [18], we used the local approximation only to approximate the "ultrahigh" energy parts of the correction terms, while the moderately high parts were evaluated exactly. This required additional computational effort, but as a result all matrix elements of the correction terms were accurately taken into account. At present, the evaluation of counterterms presents a computational bottleneck demanding significant time and memory resources. This step is the main limiting factor in the performance of the method. In this regard, we outlined several directions for future development. One promising idea was already mentioned in sections 3.2.3, others are scattered in the main text, see e.g. note 8 and section 3.2.2. Other interesting questions for developing the method include: • Is it worth it/possible to enrich the variational ansatz to allow for an even more accurate reproduction of the would-be optimal tails (2.16)?
• Can we deal more efficiently with the states with high occupation numbers, which at present occupy a fraction of the Hilbert space disproportionally large compared to their total weight? See appendix A.
However, while further improvements in the method are welcome, they are not strictly speaking necessary. The NLO-HT is already one of the most advanced implementations of Hamiltonian Truncation currently available. It would be great to see it applied in further HT studies of the φ 4 theory or of other strongly coupled QFTs. We will be happy to share our code upon request. One φ 4 application we are currently thinking about is to investigate the analytic structure of m ph and Λ for the complexified quartic coupling g. The Hamiltonian Truncation seems to be the only non-perturbative technique currently suitable for this task.
Finally, we believe that Hamiltonian Truncation is now in a much better shape to attack strongly coupled renormalization group (RG) flows in higher dimensions. For instance, as the next step one could study models of the Landau-Ginzburg or Yukawa type in three dimensions, and their RG flow either to a gapped or to a conformal phase. Furthermore, one could apply the renormalization procedure described in this work in the context of TCSA, in order to deform interacting fixed points directly. For instance, it would be interesting to study the temperature and/or magnetic deformation in the 3D Ising model, in which the UV data (OPE coefficients and scaling dimensions) for the low-lying primary operators are known to high accuracy [48][49][50].

A Structure of the interacting eigenstates
Much of the motivation underlying the HT method is based on the idea of decoupling-that interacting eigenstates in finite volume are dominated by the low-energy non-interacting states. In this appendix we will show some plots demonstrating the validity of this idea, in the context of the (φ 4 ) 2 theory (see also the related discussion in [4], Section VII.B). We will also discuss, and resolve, the apparent contradiction with the "orthogonality catastrophe".
All plots in this appendix will correspond to m = 1, L = 10, and cutoff E T = 20. We will be showing data for the raw truncated Hamiltonian eigenstates -as we are interested here in the qualitative features, it's not crucial to include renormalization corrections.
We start by showing the composition of the Z 2 even truncated Hilbert space subject to the constraints P = 0, P = 0. In Fig. 6 we plot the distribution of the number of states per particle number (0,2,4,. . . ) and per interval [E, E + 1) of energy. As this plot illustrates, the total Hilbert space dimension (dashed line) grows exponentially with the cutoff. We expect that the leading exponential asymptotics will be the same as in the massless scalar boson CFT, ∼ exp √ x, x = where |n runs over the basis of H l described in appendix I.1. We will call w n = |c n | 2 the weight of the given basis state inside |E . We assume |E is unit normalized so the weights sum to one. The most important basis states are those which carry most weight. Which are those states?
We will now show a series of plots concerning the weight composition of the interacting vacuum (the lowest eigenstate in the Z 2 even sector). 24 We will choose three representative values of the coupling g = 1, 2, 3. These couplings are all strong, and g ≈ 3 roughly corresponds to the end of the Z 2 invariant phase [11].  The energy E and the total occupation number N are two principal parameters of a basis state. How do they correlate with the weight? Starting with the occupation number, let w(N ) be the total weight of all states of occupation number N . As is clear from Fig. 7 (left), w(N ) decreases exponentially with N . This tendency is especially pronounced at g = 1, 2, but it is noticeable at g = 3 as well. The free vacuum (N = 0) dominates the interacting ground state for all three couplings (for the reference, its weight w 0 = 0.96, 0.80, 0.54 for g = 1, 2, 3 respectively). 25 We next study the distribution of weights in energy. Let w(E), E = 0, 1, 2, . . . be the total weight of states whose H 0 energy belongs to the interval [E, E + 1). It turns out that this distribution also decreases, although not exponentially, but rather like a powerlaw ∼ E −2 for large E. This is clear from Fig. 7 (right), where we plot w(E) multiplied by (E + 1) 2 .
Next let us combine Figs. 7 and see how the weight is distributed both in energy and in the occupation number. Let w(E|N ) be like w(E) from the previous plot, but limited to states of fixed total occupation number N = 0, 2, 4, . . .. This set of distributions is shown in Fig. 8, where we take g = 2, the other values of the coupling being similar. Like in Fig. 7 (right), we multiply by (E + 1) 2 . This plot reveals that for every N the function w(E|N ) follows the same powerlaw ∼ E −2 (the only exception is N = 2, where the decrease with E seems faster). The total weight per N decreases rapidly with N , consistently with Fig. 7 (left).  Fig. 7 (right)) shows the total weight per the same energy interval.
In the above histograms we grouped states by energy or by occupation number or both. It's important to realize that there is further significant variation of individual weights within the histogram bins. This is clear from Fig. 9 (left) which shows each state separately for the interacting ground state at g = 2. For example, weights of 4-particle states (golden points) with nearby energies fluctuate by as much as two orders of magnitude.
It's instructive to try to understand this plot using Eq. (2.7) which expresses the high energy part of the eigenvector in terms of the low-energy components. For the purpose of this exercise "low" will denote all energies below 5 (say), and "high" all energies between 5 and 20. In the spirit of our approximate tail formula (2.17), we will also approximate H hh by H 0 hh in (2.7). Let then c l be the part of the raw eigenvector corresponding to states of energies E 5, and define c h by  the formula: where E 0 is the raw eigenvalue. The resulting c h is shown in Fig. 9 (right). Comparing to the left panel, we see that the periodic variation of the 4-particle component is largely reproduced. This variation is explained by the spread of the V hl matrix elements. The order of magnitudes of N = 2 and N = 6 weights are also reproduced (although not the change of sign of c n in the N = 2 component which is responsible for the dip at E = 9). The N = 8 component is captured poorly, which is not surprising given that too few 4-particle states have been included into c l .
The observed exponential decoupling of high occupation numbers N is asking to be explained. Is it related to the fact that N changes by at most a finite amount (four) in each φ 4 interaction? At the moment there is no proof. 26 The exponential decoupling is also asking to be exploited. Can we take different energy cutoffs in each occupation number sector? Looking at Fig. 8, it would seem natural to increase the cutoff in the 4-particle sector and reduce the cutoff for N 6. One possible rule is that the near-cutoff states in each sector should contribute comparably. There is no guarantee that that this will work, given that some Hamiltonian matrix elements grow with N . Still, this is something that needs to be explored.
In this work, as in [11,14,18], we took a common energy cutoff for all sectors. This was convenient for implementing the renormalization corrections. The price to pay is that there's a huge number of states in the Hilbert space -those with high occupation numbers -which have very little weight in the interacting eigenstates. Notice however that we cannot neglect them altogether because their integrated weight is not negligible. mentioned in note 8. Suppose that we pick a weight threshold . How many states are there whose weight is < , and how large is the cumulative weight of all remaining states? The answers can be read off Fig. 10. The solid lines plot the sequence w n ordered from large to small weights. The dashed lines represent the total weight of all states in this ordered list subsequent to the n-th.

A.1 On the orthogonality catastrophe
The orthogonality catastrophe 27 is the notion that infinite volume interacting eigenstates have zero overlap with the non-interacting ones. Since the HT works in a finite but large volume, one may have thought that we will see exponentially small overlaps, while we have seen in the above plots that overlaps remain O(1) even for Lm = 10 − 20. We would like to discuss how this apparent contradiction gets resolved.
Consider the overlap between the interacting vacuum |Ω and the perturbative vacuum |0 in a finite but large volume L. In general we expect that in 1 + 1 dimensions it will go to zero as where α = α(g/m 2 ) is expected to be order 1 for moderate couplings. The 1/2π has the usual phase space origin, since the suppression originates from the accumulation of normalization factors of different momentum modes. A toy example is the free massive scalar perturbed by the φ 2 interaction, which amounts to a change in mass. This example can be solved in finite volume via a Bogoliubov transformation. The interacting ground state is a kind of a coherent state. The overlap with the free vacuum can be computed exactly, and α = O(1) confirmed. 28 27 Early examples were considered by van Hove [53] and Anderson [54]. This discussion is also related to Haag's theorem [55]. In this context, for a formal proof of unitary non-equivalence of two free massive scalars fields with masses m 1 = m 2 in infinite volume, see Theorem X.46 in [56]. 28 In the notation of [11], section 3.4, we have In HT we are not interested in taking the mathematically strict infinite volume limit-it suffices to have a volume large enough so that we can extract infinite volume limits of physical quantities, like the particle spectrum. 29 By Lüscher's theorems [35], corrections to stable particle masses in 1 + 1 dimensional field theories on a finite circle of length L go as where m ph is the physical particle mass one is trying to extract, and β = √ 3/2 or 1 depending on whether the particle appears as a pole in its own 2 → 2 scattering amplitude or not. 30 Suppose now we stay away from the critical point so that m ph is order m (this is satisfied for g 2 for the (φ 4 ) 2 theory). Comparing (A.3) with (A.5) we see that there is a "sweet window" where the spectrum is already accurate, but the interacting eigenstates are still dominated by the low-energy non-interacting states. 31 This is the range where the HT is expected to work best, and the g = 1 and g = 2 plots from this appendix fall precisely into this range. So we see that the above mentioned apparent contradiction is explained by the extra α/(2π) in the overlap exponent. We expect α = O(g/m 2 ) for small g, so that the window in (A.6) widens, while for moderately large couplings we expect α = O(1).
This discussion brings to mind the following question (more theoretical than practical). Suppose that we computed volume L eigenstates, with L in the sweet window. Is it then possible to "exponentiate" them and construct approximate eigenstates in any volume L L, which would then exhibit the orthogonality catastrophe? We do not know.

B Problems with the naive truncation
In this appendix we elaborate on the difficulties found when trying to approximate accurately the operator ∆H by truncating the series expansion (2.11), which we copy here: where the sum is taken over all states j i above the cutoff E T .
Consider this series in the (φ 4 ) 2 theory. The naive dimensional analysis suggests that each next term in the series is suppressed by O(g/E 2 T ). However, this expectation turns out incorrect. There are some intermediate states which violate this power-counting. Because of these states, For x = 0.1, 1, 10 we get α(x) = 0.003, 0.15, 1.8. 29 For another recent discussion of the infinite volume limit in HT see [57]. 30 Note that for a massive QFT in d+1 dimensions compactified on a flat torus, Eq. (A.5) remains valid as written while in Eq. (A.3) one has to change Lm/(2π) → (Lm/(2π)) d in the exponent. In particular, the sweet window (A.6) is expected to survive. 31 For Lm 2π/α, we expect that the maximum of the distribution of weights of interacting eigenstates will shift to nonzero occupation numbers. It would be interesting to explore this phenomenon in more detail. matrix elements [∆H n (E * )] rs exhibit anomalous growth with n. This growth first becomes visible for states r, s just below the cutoff E T , but for sufficiently large n it propagates to all external states. As a result the expansion does not converge; it is only asymptotic.
These effects were first discussed in [18], and we will review them here. The culprits are intermediate states with large occupation numbers N . An oscillator acting on such a state gives an extra factor of ∼ √ N , and the accumulation of such factors skews the asymptotics. We will demonstrate the phenomenon using the states |N consisting of N 1 particles at rest. 32 In this section we use ∼ to denote order of magnitude estimates.
As a first example, consider equal initial and final states r = s = |N . We choose N = E T /m so that this state is at or just below the cutoff. Then We see in particular that for any g there exists a large enough N such that the perturbation V is not suppressed with respect to H 0 in this matrix element. As we will see now, ∆H 2 will pick up a further factor of f N . Indeed, the state |N will be connected by V to the states |N + 2 and |N + 4 which lie above E T . The connecting matrix elements are of the same order as V ss . Taking into account the contribution of just these states to ∆H 2 , we get: Notice that all terms entering the expression for [∆H n ] rs have the same sign, namely (−1) n−1 , as long as E * < E T as we assume. This is because the matrix elements V ij are positive by inspection, and all denominators are negative. So if we focus on just some intermediate states, we obtain a lower bound on the absolute value, as in (B.3). Going to higher orders, we will keep getting the same relative factor: totally unlike the naively expected suppression by powers of g/E 2 T . For sufficiently large N (i.e. for sufficiently large E T ) we will have f N > 1 and the series for this matrix element will then diverge.
The above example can be generalized to show that the situation is in fact even worse, namely that the series diverges for any E T and for any nonzero matrix element. For this we argue as follows. We pick s, r in the same Z 2 sector, for definiteness even. Pick an even N so large that the state |N is above E T and that f N > 1. It's easy to see that any even state can be connected to the state |N by a finite sequence of intermediate Fock states |j which are above E T and are obtained by repetitive actions of V , i.e. so that the matrix elements V j i+1 j i are nonzero. 33 If n s and n r are the number of steps necessary to connect s, r to |N , then starting from n = n s + n r each 32 Similar phenomena will happen for other intermediate states with large occupation numbers, e.g. containing N/2 particle pairs of momenta k, −k. 33 Here's one way to do this. Recall that we assume that s, r have zero momentum. There are four stages: (1) Act on s with V once just to get above E T ; (2) Pick one particle, say of momentum p, and act on it with (a † 0 ) 2 a † p a p monomial inside V repeatedly, increasing the zero momentum occupation number up to N ; (3) Eliminate the nonzero momentum particles by acting repeatedly with a † 0 a † p2+p1 a p1 a p2 , picking particle pairs with |p 2 | |p 1 | and p 2 , p 1 of opposite sign. (4) Annihilate unnecessary zero momentum particles.
following [∆H n ] rs will pick up at least a factor of f N by the same argument as the one leading to (B.4). Since f N > 1 the series will diverge.
The above effects show that the strategy of systematically improving the accuracy of the spectra by truncating (B.1) at increasingly higher orders n max is problematic. So what can we do about all this? It's important that we are mostly interested in the low-energy eigenstates, and as discussed in appendix A, those have large overlap mostly with the low-energy noninteracting states. In particular, the states with large occupation numbers, like the state |N close to the cutoff, have contributions which are exponentially suppressed (see Fig. 7). One could hope that the problem of the overall divergence of the ∆H n series is irrelevant if one is mostly interested in the low-energy entries of the Hamiltonian matrix and if one truncates the series at low n. One could also hope that even though ∆H 2 is not small for some states close to the cutoff, this is not important because those states contribute very little to the interacting eigenstates. In other words, Hope: the series (B.1), truncated at low n, approximates the low-energy part of the matrix ∆H well, and the part close to the cutoff which is not well-approximated is unimportant.
However, as numerical experimentation shows, this hope does not seem to materialize, at least for the values of E T which are computationally feasible. For example, if one truncates the expansion at n = 2, computes ∆H 2 exactly, and then uses it to diagonalize H + ∆H 2 in (2.8), then one finds the following. First of all, one finds spurious eigenvectors which live close to the cutoff. Since ∆H 2 is negative and large near the cutoff, these spurious eigenvectors have eigenvalues smaller than the physical eigenvalues. Even if one eliminates these and focuses on the eigenvectors which can be interpreted as corrections to the raw truncated eigenvectors, one finds that the corrections are erratic and not always small. The conclusion is that the matrix ∆H 2 near the cutoff is really too large compared to the true ∆H, and this messes up the physical spectrum. 34 This problem did not influence the results of [11,14] because in those papers the local approximation was used for ∆H 2 , suppressing the anomalously large matrix elements near the cutoff. The problem was instead realized by the authors of [18], who were the first to compute ∆H 2 exactly. The problem was dealt with in [18] by introducing an auxiliary cutoff E W E T /2 and setting ∆H 2 to zero above this cutoff. This temporary solution did not allow to fully take advantage of the exactly known ∆H 2 , nor to include the ∆H 3 corrections.

B.1 Taming the divergence?
We would like to describe here an idea which might tame the divergence of the perturbative series (B.1). The idea was not used in this paper, but in the future it may be used either as an alternative to NLO-HT from section 2.2 or in combination with that method.
The key observation is that the problematic growth of (B.3) and (B.4) for large occupation numbers can be cured if one performs an expansion not around H 0 but aroundH 0 ≡ H 0 + diag V , with diag V the diagonal part of the potential V . So, consider splitting the Hamiltonian as where we introduced the notationV = V − diag V . This is a reasonable split becauseH 0 is still an exactly solvable Hamiltonian, diagonal in the same Fock space in which H 0 is diagonal. On the other hand, by moving the diagonal part of V intoH 0 , one can hope that the series for the correction term will be better behaved. 35 The derivation (2.4)-(2.10) goes through with the corresponding substitutions, so that one obtains the formal expansion (2.11) for the correction term with the replacements These replacements produce higher powers of the occupation numbers in the denominators in such a way that the r.h.s. of (B.3) gets replaced by which is at most order V ss no matter how high N is.
Similarly, in the mechanism for the divergence of any matrix element we will no longer encounter arbitrary large factors f N . At most we get O(1) factors starting from n = n s + n r . Notice that those factors will not come from the diagonal matrix elements N |V |N , since those are zero, but one can get similar factors oscillating between |N and |N + 2 , say. Actually, we believe the series is still divergent (as our numerical experiments and the study of the anharmonic oscillator example show), but it diverges much more slowly and one could think that the above-stated Hope perhaps has a chance to be true in this modified setup.
Concerning the technical realization of this possible solution, notice that the HamiltonianH 0 , although diagonal in the free Fock space, does not allow a natural formulation in terms of fields. In particular, the increase in the energy of a state acted upon by the oscillator depends on the initial energy and not only on the oscillator frequency. Still, diagrammatic rules from appendix D apply with appropriate changes. Namely, those vertices with two lines to the left and two to the right which correspond to diag V are forbidden, and the energy of the states between any two vertices gets replaced by theH 0 eigenvalue. Initial numerical tests of the described procedure looked promising (in particular we had nice results truncating to n max = 2 and evaluating the correction term ∆H 2 exactly). A more complete exploration is left for the future.

C Relations to other expansions C.1 Brillouin-Wigner series
One particular case where the equation for the effective Hamiltonian (2.8) is used in quantum mechanics is when the low Hilbert space H l consists of a single element, of non-interacting energy E 1 , say. In this case there is nothing to diagonalize, and Eq. (2.8) directly expresses the interacting eigenvalue as a solution of the non-linear equation: We can then expand the denominator in V hh and obtain the analogue of Eq. (2.11): This perturbative series is called the Brillouin-Wigner (BW) series [58,59] and represents a way to organize quantum-mechanical perturbation theory which is somewhat different from the usual Rayleigh-Schrödinger (RS) series. Of course, if we further expand E in the denominator and solve the equation order-by-order in V , we get back to the RS series. But if we truncate the BW series at a certain finite order and then solve the resulting equation for E exactly, we will get an approximation to the true eigenvalue which is different from the same-order RS approximation.
As proved by Wigner [59], the BW approximations of odd order allow the variational interpretation. Namely, let E = E 2N +1 , N 1, be an odd-order BW approximation, i.e. the smallest solution of the truncated equation Then there exists a wavefunction ψ = ψ 2N +1 such that This wavefunction can be given explicitly: The proof consists in plugging (C.5) into (C.4). Various cancellations and simplifications occur as a consequence of (C.3) and the identity follows. If the eigenvalue in question is the ground state, the existence of the variational interpretation (C.4) implies that the BW approximations are always overestimates. Notice that there is no claim that the accuracy of approximation increases with N , as unlike in the RR method the trial Hilbert space is not enlarged.
It's instructive to compare the above discussion with our section 2.2. To allow for the comparison, we specialize section 2.2 to the case when H l consists of a single state, to which we add a tail.
The effective Hamiltonian correction ∆ H, which is simply the eigenvalue correction in the single state case, is then given by (see Eq. (2.31), where ∆H 2 , ∆H 3 are numbers in the case at hand) This can be compared to the BW correction for N = 1, which takes the form: Both our correction and the BW correction have a variational interpretation. For the BW it's (C.5) with N = 1. For us it's the same equation except that the normalization of the tail, which is the second term in (C.5), is not kept fixed to 1 but is a free parameter which is determined dynamically (see Eq. (2.20), where c t and c l are independent variables). This means that our procedure is bound to give a better approximation. In the case of the ground state, the variational interpretation implies that our correction has to be always smaller than BW. This can also be seen formally from the above equations: for the ground state ∆H 2 < 0 and so ∆ H ∆H BW independently of the sign of ∆H 3 .

C.2 Schrieffer-Wolff transformation
The renormalization correction ∆H 2 in (2.12) is closely related to another kind of renormalized effective Hamiltonian used in condensed matter physics. Let us describe briefly the idea behind it.
Consider a Hamiltonian having the following block structure where the interaction V that mixes the low and high energy Hilbert spaces spanned by the eigenvalues E i of the free Hamiltonian. H L and H H act on the low and high Hilbert spaces, respectively. V is assumed small, in the sense specified below, and the method will involve an expansion in V .
We want to derive an effective Hamiltonian in the low energy Hilbert subspace. The idea then is to perform a canonical transformation to H to bring it into block diagonal form Since (C.10) is block diagonal, H eff is the renormalized effective Hamiltonian that describes the low energy physics taking into account the mixing with the states in the high energy Hilbert space. A practical way to find the unitary transformation matrix U is to plug in the ansatz U = e S , (C.11) with S antihermitean, in (C.10) and solve perturbatively for S = S (1) + S (2) + . . . , where S (i) = O(V i ), by requiring U † HU to be block-diagonal [60]. At leading order S ≈ S (1) and (C.10) is solved by Projecting e S (1) He −S (1) in the low Hilbert space gives with the Schrieffer-Wolff (SW) correction given by where the sum over k is over the high energy Hilbert space. This has to be compared to ∆H 2 in (2.12), which we recall here: The key difference is that (C.15) corresponds to the two terms in ∆H SW 2 with E replaced by the free energies E i and E j . In fact the Hamiltonian H eff constructed via the SW procedure is E-independent, unlike (2.10) which was the starting point of our discussion.
The perturbative solution to the canonical transformation (C.10) was worked out by Schrieffer and Wolff in [60]. There it was used to relate the Anderson impurity model to the Kondo model. The Anderson impurity model describes the interaction of conducting electrons in a metal with localized atoms in it (impurities) that can lead to localized magnetic moments. The highest atomic states of the Anderson model can be integrated out, following for instance the procedure just reviewed. This leads to an effective Hamiltonian that couples the spin density of the conducting electrons with a localized spin, namely the Kondo effective model. In this physical system the use of (C.14) and the truncation of the series is well justified because the energy difference in the denominators is large, namely the energy gap of the atomic transition into excited states. In other words the dimensionless expansion parameter V ki /(E k − E i ) 1. This has to be contrasted with QFT applications that we have in mind in this paper. In the QFT context, the spectrum is dense at the cutoff and there is no parametric separation between the low and high energy Hilbert spaces. If we introduce a cutoff E T and take states E i and E k just below and just above the cutoff, the ratio V ki /(E k − E i ) can be arbitrarily large. For this reason the SW procedure does not seem adapted for our problem.

D Diagram technique
This appendix reviews the diagram technique [18] for a systematic expansion for the matrices ∆H n in Eqs. (2.11), (B.1). Although only n = 2, 3 is needed for this work, we will consider general n.
The diagram technique follows from Wick's theorem. We represent each V insertion by a vertex with 4 lines exiting. Lines exiting towards right (resp. left) will represent a k (resp. a † k ). 36 Momentum and time flows from right to left and there is momentum conservation in each vertex. There is also a factor of 1/ √ 2Lω k for each line. In this notation V is shown in Fig. 11. + 4 + 6 + 4 + Figure 11: The vertices representing the quartic interaction; see the text.
To construct a diagram we put n vertices time-ordered from right to left in the same order as in (B.1). Some lines exiting from a vertex can be contracted with lines entering into a later vertex. These are produced by oscillator contractions when using Wick's theorem. The uncontracted lines are extended to the ends of the diagram; they correspond to the remaining creation and annihilation operators. This is better illustrated by examples rather than formalized. E.g. the diagram in Fig. 12 produces the operator with momenta subject to More precisely, the constraint of momenta conservation is imposed by a delta function times the length L of the cylinder circle. For instance we have the factor Lδ k 3 +k 2 +q 1 −q 4 for the leftmost vertex in Fig. 12. The internal momenta should be summed over. There is also a scalar factor 1/ 2Lω q for each external and 1/(2Lω k ) for each internal line. We also have to multiply by factors 1/(E * − E j i ) produced by the corresponding insertions in (B.1). Here E j i are energies of the intermediate states, which can be found in terms of the final and initial energies E r and E s , taking into account that each vertex changes the flowing energy by the total frequency of all creation minus all annihilation operators. For example, in Fig. 12 we have two intermediate states denoted by vertical dashed lines in Fig. 13, and their energies are related to E r,s by: Figure 13: The same diagram as in Fig. 12 where we indicated the energies of the external and the intermediate states.
where δV i are energy changes in the vertices.
One way to write a compact solution for these energy conservation constraints, for any diagram, is as follows. Denote by W j the sum of frequencies of all oscillator lines crossing the dashed line j (which can be an intermediate or external state line). We can move from a external state to an intermediate state in a number of steps, and every time we have to subtract W j and add W j+1 . When we add all the steps all increments except the first and the last cancel. Thus the energy of an intermediate state j i is related to the external state energies by Taking the difference of these two equations we obtain a more symmetric expression [18] We have to impose the constraints that all of these intermediate state energies are above E T . This will translate into the restrictions on the internal line momenta. For some diagrams this constraint cannot be satisfied at all, and such diagrams won't contribute. One example is the diagram in Fig. 14, for which the energy of the intermediate state is Finally, there is a combinatorial factor for each diagram which is computed as usual.
The original derivation of the diagram technique [18] was different. It used an auxiliary operator ∆ H n defined as in (B.1) but summing over all j i (not just those with E j i > E T ). This ∆ H n is expressed as an iterated integral of a time-ordered n-point correlation function of the : φ 4 : interaction: 37 Wick's theorem is then used at the level of fields, giving rise to diagrams. Only then one passes from ∆ H n to ∆H n , imposing the restriction that all intermediate states be above E T . This is neatly achieved by considering the analytic dependence of any diagram on E, viewed as a fiducial variable. The needed terms are those for which the poles in E are above E T .
The derivation given here, based on Wick's theorem for oscillators, is more direct than the one in [18]. Both derivations have virtues. The original derivation of [18] has a useful spinoff by allowing to focus directly on the diagrams giving rise to the local approximation, as discussed in appendix E.2. Also, as emphasized in [18], the poles of ∆ H n at E * < E T , although not needed for renormalization, can be used to set up an efficient test for the code. On the other hand, the derivation given here is useful if one wants to play with splitting H into the diagonal and off-diagonal part differently from (3.4), see appendix B.1.

D.1 Bound on the intermediate energies for ∆H 2
In this section we will prove an auxiliary result which will be needed in appendix E.1. Consider the diagrams contributing to ∆H 2 . Some of these have loops, others are tree-level or disconnected. We claim that: there is an upper bound 2E T + m on the intermediate state energy E j for tree-level and disconnected diagrams contributing to ∆H 2 . The proof is based on the formula (D.8): where we use the notation E rs = 1 2 (E r + E s ). Consider first the disconnected diagrams. For such diagrams we have W j W r + W s (with equality if all lines from the left vertex go right, and all lines from the right vertex go left). So E j E rs + (W r + W s )/2. To have a nonzero matrix element, we must have W r E r , W s E s , since all particles acted upon by the oscillators must be present in the initial and final state. So the nonzero matrix elements have E j 2E rs 2E T , which is even stronger than the claimed bound. The proof for the tree level diagrams is slightly more difficult as one has to keep track of the line connecting the vertices. It will be convenient to condense diagrams into "thick line" diagrams, carrying the essential information. For this we replace all lines entering or exiting the vertex from the same direction by a "thick line" carrying the momentum and energy equal to sum of united line momenta and energies. For a thick line carrying momentum Q we will denote the energy carried by it E(Q). Although this energy depends not only on Q but also on how the momenta are distributed, this information will not be needed in the proof and is omitted.
The most general thick line diagram corresponding to a tree-level ∆H 2 diagram is: For example the diagram (D.14) Some of the thick lines may be missing. E.g. for the diagram k q 2 q 3 q 1 q 4 q 5 q 6 (D.15) the thick line diagram would be missing the Q 1 thick line, since there are no thin lines coming into the vertex from that direction. In order not to treat such cases separately, we represent them by the same thick line diagram (D.11) with the understanding that the "missing" lines have associated momentum Q = 0 and energy E(Q) = 0. Now to the proof. We have (D. 16) We also have In the first equation, E(−Q 1 − Q 2 ) stands for the total energy of the constituents of the s state apart from those which are acted upon by the oscillators in the diagram. Their momentum is −Q 1 − Q 2 since the total state momentum is zero. The second equation is analogous.
Using the above equations in (D.10) and eliminating E(Q 1 ) and E(Q 3 ), we obtain: We claim that Indeed, in the l.h.s. we have a sum of energies of a group of particles whose momenta sum to k. Using convexity properties of the function ω k it's not hard to prove that which implies (D.20), in its stronger version without −m in the r.h.s. This −m is needed in the special case when the l.h.s. of (D.20) is actually empty because all three groups of particles are empty (in particular if Q 2 and Q 4 are "missing lines"). If this happens then k = 0 and adding −m we restore the inequality.
Eq. (D.20) and its analogue for E(−Q 3 − Q 4 ) imply that δ −m, and so as claimed

E Local approximation
The diagram technique from appendix D leads to exact expressions for the matrix elements of ∆H n , but evaluating these exact expressions can be demanding. There are many diagrams, and diagrams with loops involve sums over intermediate momenta, with the cutoff that the intermediate energies be above E T . Each diagram corresponds to a product of a certain number of creation and annihilation operators, but the coefficients have a complicated dependence on their frequencies.
As a result, ∆H n cannot be exactly expressed as an integral of an operator local in the field φ; as we say, it's a non-local operator.
However, if we are interested in matrix elements between low-energy states, then one can hope that ∆H n may be approximated by a local operator. In fact, the calculation of the diagrams can be greatly simplified when the energy exchanged between the different vertices of the diagrams is much larger than the frequencies of the external particles. In this limit, according to the usual effective field theory intuition, we may expect that the processes described by the non-local diagrams can be approximated by collapsing the loops over ultrahigh momenta into point-like interactions, i.e. local operators. It's definitely true for ∆H 2 [11,18], but as we will see there are subtleties for ∆H 3 . It's instructive to proceed carefully and see how the local operators arise as a good approximation starting from the diagrams. We will focus on n = 2, 3 as needed in this work.

E.1 Local approximation for ∆H 2 via diagrams
The diagrams for ∆H 2 have two vertices. For the quartic interaction case considered here, depending on the number of contractions, the resulting terms have 0,2,4,6, or 8 oscillators; see appendix C.1 of [18] for the full list.
To see how the local approximation arises, we start by considering the diagram with 2 external legs, hence 2 oscillators. There are four such diagrams: Let's start with the first of these. By the rules of appendix D, it corresponds to the operator 96g 2 Here E j is the energy of the intermediate state, subject to E j > E T . We have E j = E s + ω q 1 + ω k 1 + ω k 2 + ω k 3 by Eq. (D.6). As in appendix D, E r and E s denote energies of the external states in the considered matrix element, q's are the external and k's the internal momenta. Momentum always flows from right to left. The combinatorial factor for this diagram is 96 = 4 3 2 3!.
As explained in section 3.2, we will introduce another energy scale E L > E T (we would like it to be much larger than E T but in practice we can afford E L = (2 -3)E T ). We will split ∆H 2 as in Eq. (3.5) depending on whether the intermediate energy is below or above E L . The part of diagram (E.2) with E j E L will be included in ∆H < 2 and will be evaluated exactly. Here we will be concerned with the part with E j > E L , included in ∆H > 2 . In this case we will approximate (E.2) by dropping the q 1 dependence in the momentum conserving δ-functions, and also by neglecting E s and ω q 1 in E j with respect to ω k 1 + ω k 2 + ω k 3 . In this way we conclude that the E j > E L part of the diagram is approximated by where φ + (x) = q a † q / 2Lω q e −iqx is the positive-frequency part of φ(x), and C is just a constant without q-dependence: So we see that the first diagram in (E.1) produced a piece of φ 2 , and it's not hard to guess that the remaining pieces will come from the remaining three. Their exact expressions differ from (E.2) in how q's and ω q 's enter into the momentum conserving δ-functions and into E j . However, when we consider the ∆H > 2 parts and carry out the approximation described above, these differences disappear. So each of these diagrams is approximated by another piece of φ 2 (φ + φ − for the second and fourth, [φ − ] 2 for the third), times the same constant C as above. The pieces combine neatly when we sum the diagrams, producing Hence the ultrahigh energy part of these four diagrams renormalizes the local operator : φ 2 :.
When the above procedure is carried out systematically for other classes of diagrams, it gives rise to the approximate expression (3.8). To be precise, the ultrahigh energy part of diagrams with p contractions, p = 2, 3, 4, renormalizes V 8−2p . The coefficients are given by: with s p = 4 p 2 p! the combinatorial factor. This can be rewritten in terms of the relativistic phase space in finite volume: in (3.9). In finite volume, the spectrum is discrete and phase spaces Φ p (E) are sums of δ-functions.
We will be mostly interested in the L → ∞ limit, Lm 1. For the purposes of evaluating the integral (E.7) we can then replace Φ p (E) by its infinite-volume limit: Eqs. (3.10) arise from the leading terms of (E.10) in the m/E expansion. These expressions can be obtained by the Laplace transform method [11]. For p = 2, 3 one can also expand the known exact expressions for the infinite-volume phase space [18].
It remains to discuss the diagrams with one and no contractions (six and eight external legs). These diagrams cannot be approximated by local operators, because the energy exchanged between the vertices is of always of the same order as the frequencies of the external particles. Consider for instance (E.11) The intermediate state energy is Since there are no free loop momenta, the intermediate state energy can never become parametrically large compared to the external energies. So there is no way to approximate this diagram by local operators; it has to be computed exactly.
The same is true for the rest of the diagrams with six or eight external legs: the intermediate energies is never much larger than E T . As shown in appendix D.1, the maximal possible energy is 2E T for the disconnected diagrams and 2E T + m for the tree-level diagrams.
Our strategy will therefore be as follows. In the "moderately high" energy range E T < E j E L our procedure of computing ∆H < 2 exactly (by multiplying matrices) will amount to taking into account all diagrams, including the 'non-local' ones like (E.11), without making any approximations. In the "ultrahigh" range E j > E L we will take into account the diagrams with 2,3,4 contractions in the local approximation (3.8). For this to be a reasonably good approximation we will take E L (2 -3)E T . The 'non-local' diagrams like (E.11) can be ignored when considering the "ultrahigh" range, in view of the discussed upper bound on their intermediate state energy.

E.2 Local approximation for ∆H 2 via correlation functions
In the previous section we explained very concretely how the local approximation arises via the diagrams. We will now review an alternative derivation which produces all the relevant terms quickly without having to sift through the diagrams. This will be especially helpful when we move to ∆H 3 where the number of diagrams is even larger. It also has other uses, e.g. if one wants to compute or understand the sub-leading corrections in the E T /E L expansion.

Consider the operator
which differs from ∆H 2 in that we sum over all intermediate states, not just over those above E T . We will first analyze ∆ H 2 (E * ). Then, we will obtain ∆H > 2 (E * ) by picking up the terms in ∆ H 2 (E * ) rs which have poles in E * located at E * > E L . This is the trick of [18].
Eq. (E.13) is the n = 2 case of (D.9), except that we shifted V 's to the symmetric time configuration, which explains the change E r → E rs = (E r + E s )/2 in the exponent. 38 This will be convenient, as the linear terms in τ will vanish when doing the local expansion around τ = 0.
We compute ∆ H 2 by applying Wick's theorem to express the operator product under the integral sign as a sum of normal-ordered terms: where G L (x, τ ) is the Euclidean propagator in finite volume, for positive times given by Recall that Eq. (E.14) can be used as a starting point to produce the diagrammatic expansion of [18], as we explained in appendix D. Each diagram has a series of poles in E * , which are the intermediate energies.
Restricting the diagrams so that all poles be above E T gives ∆H 2 . Here we would like to emphasize a different fact, namely that Eq. (E.14) can be also used as a starting point to produce the local approximation, bypassing the diagrams.
The local approximation takes into account the contributions of high-energy intermediate states. Since high energy corresponds to short times, it should be possible to pick up these contributions by studying correlation functions in the τ → 0 limit, using the operator product expansion [9,11]. So we Taylor-expand the operator insertions of Eq. (E.14) around x, τ = 0. Keeping only the leading term (the subleading O(x 2 , τ 2 ) terms can be used to study the m/E expansion) we get where the coefficients are given bŷ Plugging in (E.15) and performing the integral, we obtain The local approximation coefficients (E.6) are obtained from this by two simple and natural operations. First, we drop E rs in the denominator, since the external energies were totally neglected in (E.6). Second, we should add a θ-function restricting summation to intermediate states above E L . This is the operation which passes from ∆ H 2 to ∆H > 2 . Notice that if we apply these operations toκ 8−2p with p = 0, 1 we get zero. This is not surprising, since we already know that the diagrams with 0 or 1 contractions do not allow local approximation. Soκ 6 andκ 8 are unphysical and should be simply dropped.
To summarize, the correlation function method for deriving the local approximation for ∆H 2 proceeds as follows. Write down ∆ H 2 , and use the OPE under the integral sign. This gives an expansion in local operators with coefficients given by integrals of products of Green's functions. Do the integrals, drop the external energies, and insert θ-functions to enforce the intermediate energy thresholds. Use a bit of diagrammatic intuition to eliminate terms which come from diagrams without such high energy intermediate states (i.e. diagrams with 0 or 1 contractions).

E.3 Local approximation for ∆H 3 : general strategy
According to Eq. (3.11) we organize the calculation of ∆H 3 by splitting it into the <<, <> and >> parts. The << part will be evaluated exactly by multiplying matrices. In the language of diagrams, this means that contributions of all diagrams, including tree-level and disconnected ones, is taken into account. On the other hand, a local approximation will be used when evaluating ∆H <> 3 and ∆H >> 3 . The corresponding cutoffs should be chosen high enough so that the local approximation is accurate.
The calculation of ∆H <> 3 follows the logic explained after Eq. (3.16). It involves the matrix ∆H > 2 , which will be approximated by local operators, as reviewed in the preceding section. Recall though that the coefficients are evaluated at E L which will be fixed at E L /E L ∼ 1.5. We hasten to add that the introduced scales E L and E L are arbitrary. The final exact result should not depend on them. In practice the use of the local approximation introduces some dependence, but we check that it is quite negligible (appendix G).
We next discuss the calculation of ∆H >> 3 . Recall that both intermediate states in ∆H >> 3 have the H 0 -energies restricted to E j > E L . As we will now explain, for E L E T , ∆H >> 3 is well approximated by the local and bilocal operators in (3.15). In practice it will be sufficient to take The appearance of bilocal operators is one of several new issues encountered for ∆H >> 3 compared to the ∆H > 2 case. The derivation can use any of the two methods explained in sections E.1 or E.2 for ∆H > 2 . The first method starts from the exact diagrams, neglects the energies of the external states, and collects all the pieces that combine into the local operators. Here we will follow the second, equivalent, method which start from the correlation functions and uses the OPE. We consider the operator where φ x,t = φ(x, t), G ij is the Green's function (E.15) joining points i and j, and the symmetry factor is The next step would be to perform the OPE as in the step from (E.14) to (E. 16). This sets all points at the same time, and would seem to produce a local operator. However, one has to be careful. The leading term will indeed have all three points at the same time, but depending on the Wick contraction pattern, not all operators may end up at the same spatial point. First consider the fully connected contraction patterns, such as e.g. m = 2, n = 2, p = 1: These indeed force all three operators to live near the same x and t, giving rise to a local operator (: φ 2 : in this example). But what about not fully connected patterns? Most of these don't contribute to ∆H >> 3 , because the intermediate energy constraints are not satisfied. However, those which do contribute can give rise to bilocal operators. There are two patterns for which this happens. The first one is m = n = 0, p = 3: This clearly does not represent a local operator. Indeed, the six momenta are split into two groups, 4+2, which sum to zero independently. A moment's thought shows that the corresponding operator is : V 2 V 4 :. The second case is m = n = 0, p = 2: , (E. 24) which gives rise to the operator : V 4 V 4 :. Another not fully connected pattern which contributes to ∆H >> 3 is m = n = 0, p = 4: .
(E. 25) However, this one does give rise to a local operator V 4 (formally because 1.V 4 = V 4 ).
In this way we arrive at the local approximation shown in Eq. (3.15), with the coefficients related to the diagrams representing the various Wick contraction patterns. The λ-coefficients depend on E L , since we enforce the constraint that both intermediate state energies be above E L . Further details will be provided in appendix F.

F Local approximation for ∆H 3 : gory details
In this appendix we analyze in detail the local approximation (3.15) for ∆H >> 3 . The correlation function and OPE method presented in section E.3 gives rise to terms (E.20), corresponding to the various Wick contraction patterns. It's not difficult to reconstruct from which diagrams these terms would come if we started from the diagrammatic expansion rather than from the correlation functions. For example, pattern (E.22) would correspond to the four diagrams where the external lines could extend left or right, like in (E.1). The exact expressions for the diagrams would be sensitive to this information, but in the local approximation we just get an overall coefficient, common for the four diagrams and represented by the Wick contraction pattern. oscillator momenta and energies, as well as the energies of the external states. We will introduce a few rules to save space in the writing: • The external oscillators with their 1/ √ 2Lω k factors are not written, as they are included into V N . This also concerns one momentum conserving delta-function Lδ k i , or two of those if we are dealing with a bilocal operator. • In this section q i , p i and k i will denote the momenta connecting the left and central vertices, the central and right, and the left and right, respectively. Momenta always flow from right to left as in appendix D.
• The θ-functions imposing intermediate state energies above E L are understood but not written. They are always uniquely reconstructible, as one intermediate energy involves the sum of ω q 's and ω k 's, and the other the sum of ω p 's and ω k 's.
• Each diagram involves a sum over all finite volume momenta (2π/L)Z, subject to the shown δ-functions. We will define the following summation symbol that includes the relativistic normalization and has a finite infinite volume limit. If there are n momenta P i (be that q's, p's or k's) to sum over, we will write: .

F.1 Coefficients
λ 0 receives contribution from just one pattern (click on λ's to go to the asymptotic analysis of the corresponding diagram in section F.2): λ 2 receives contributions from patterns with 5 contractions. Up to left-right reflection (denoted by h.c.), there are 4 such patterns: λ 4 receives contributions from 6 patterns with 4 contractions: (F.12) λ 6 receives contributions from just two patterns: Finally, as explained in section E.3, coefficients of the bilocals are given by the patterns: A comment is in order concerning the diagrams for λ 4.6 and λ 6.2 . They have the middle vertex joined to the rest by a single propagator (the horizontal line). Strictly speaking, this invalidates the local approximation. Indeed, let p be the momentum flowing through this line, which is the sum of momenta entering the middle vertex. The original diagrams will depend on p through the propagator, and also through the energy of the right intermediate state. In (F.13) and (F.15) this dependence is neglected: p → 0, so that ω p → m. This is not a problem in the intermediate state, whose energy is dominated by the other energetic particles. But in the propagator this replacement is problematic, as it changes 1/(2ω p ) → 1/(2m) and overestimates the matrix elements unless p = 0.
A moment's thought shows that the diagrams for λ 4.6 and λ 6.2 should be more properly approximated by the following bilocal operators: with N = 1 and 3, respectively. However, in this paper we will not try to correct this small error.
For numerical evaluation, expressions (F.3)-(F.17) will be further simplified by taking the formal infinite volume limit L → ∞. This will be done by replacing The validity of this approximation, for the volumes L that we consider in our computations, will be justified below.
So we face the task of evaluating 15 coefficients corresponding to the L → ∞ limits of each diagram. It would be great if we could find analytic expressions for the spectral densities for both intermediate states, similar to (3.10). This would allow us to reduce these computations to twodimensional integrals in the energies of those intermediate states, similar to the one-dimensional integrals in (3.9). For 6 diagrams (4.1, 4.5, 4.6, 6.2, 2|4, 4|4) the spectral densities trivially reduce to products of spectral densities (3.10). For example, the spectral density for λ 4.1 is For the other diagrams we were not able to find analytic spectral densities. For those diagrams we evaluate the original multidimensional integral, for each needed E L and E * , numerically via Monte Carlo integration (we use vegas-3.2 in python).

F.2 L → ∞ limit and the asymptotic estimates
In this section we will carry out a rough asymptotic analysis to determine how various λ's scale with E L , m, L. The accuracy of these asymptotic approximations would be insufficient for practical computations, for which as mentioned we have to resort to numerical integration. Still, this exercise is instructive. It will also help understand the validity and limitations of the described formal L → ∞ limit which replaces sums over momenta by integrals. For brevity of presentation, we will not keep track of E * dependence. I.e. we assume E * E L and set E * → 0.  Table 3: Representative values for λ's and the asymptotic behavior for E = E L m, Lm 1. We only give leading-log asymptotics. The approximate numerical values are given for g = 1, m = 1, L = 10, E * = 0, in units of 10 −3 . For comparison the last three lines report κ's from (3.9) in the same format. See the text for the meaning of the last column. Click on λ's to go back to the drawn diagrams in section F.1.
The results of this analysis are summarized in Table 3. Below we explain how the entries of this table are obtained. We start from the simple diagrams and proceed to the more complicated ones. E 1 and E 2 will denote the energies of the two intermediate states, counting from the left. Depending on the context, the symbol ∼ in this section means proportionality, asymptotic equality, or leading-log asymptotics. Click on diagram's number to go back to its drawing in section F.1. The spectral densities are just two particle spectral densities, expressed in terms of two particle phase space Φ 2 (E), see (E.10), which in the limit L → ∞ is given by We will write Φ(E) = Φ 2 (E) if the total pair momentum is P = 0. This is the case for diagram 4.1 since we neglect the external momenta. So omitting the prefactors and setting E * → 0 we get where we used that Φ(E) ∼ 1/E 2 for E m. Notice that the L → ∞ approximation is justified. Indeed, since both intermediate states have large energy and the pair momenta is zero, it follows that both pair components have large momentum, and the spectrum is dense in that region. Thus it's clearly justified to replace sums by integrals.
Diagram 4.2. In this case the intermediate state E 1 is made of two groups of two particles, one of which is E 2 . So E 1 > E 2 . The joint spectral density is Φ(E 1 − E 2 )Φ(E 2 ), and we get The crucial question is what's the typical value of Then we can rescale E 1 = E 2 (1 + x) and write (F.23) where we used the asymptotics Φ(E) ∼ 1/E 2 for both Φ's. However, the end result is inconsistent since the integral over x diverges at x = 0. This means that in fact the leading contribution to λ 4.2 comes from the region where E 1 − E 2 is very close to the two particle threshold. In this region the approximation Φ(E) ∼ 1/E 2 is invalid. Instead we denote E = E 1 − E 2 and approximate where we used that ∞ 0 dE Φ(E) ∼ 1/m. It's important that this latter integral converges at the upper limit, otherwise we would not be able to approximate E 1 ≈ E 2 in the measure dE 1 /E 1 .
The main lesson is that singularities at the boundary of the phase space give rise to 1/(E 3 m) dependence where naive dimensional analysis ignoring the m scale would predict 1/E 4 .
One might worry about the validity of the naive L → ∞ approximation (replacing all sums by integrals) for this diagram, since as we have seen the dominant contribution involves a two particle state close to the threshold. There are not so many such states in finite volume, and one might worry about higher sensitivity to finite L effects compared say to diagram 4.1. However, a closer inspection of the exact expression for the relevant integral in finite volume (see (E.8)) shows that finite L effects are exponentially suppressed: This is not accidental. In fact the sum can be expressed as an integral of the propagator (E.15): 26) and the finite and infinite volume propagators differ in position space by exponentially small "winding" terms. Similar reasoning will apply for the other diagrams, and in the end we will show that the L → ∞ approximation is justified for all of them.
There is however another effect related to the importance of low momenta states for this diagram, which is not so innocuous. This concerns the dependence on the external momenta, marked by in the last column of the table. Denote by P ext = P the momentum flowing into the diagram through the middle vertex. Naively if P = O(m) E L it can be neglected (and it was neglected above). However for this diagram this neglect is not valid, because the small loop is very sensitive to this momentum. If P = 0 we must replace Φ(E) by Φ(E, P ) in (F.24). Since we see that even P = O(m) leads to an O(1) change (suppression) of this diagram. So strictly speaking it's not allowed to neglect the P ext dependence.
We will see below several other diagrams exhibiting P ext dependence, by the same mechanism (4.3, 2.2, 2.3), or by a slightly different one (4.6, 6.2). Although it's certainly possible to include this dependence in our numerical calculations, it's a bit tedious, and in this paper we will not take it into account, setting P ext → 0. This can be improved in the future work if needed. where we neglect the particle mass. The invariant mass of the two particle state is If we denote E 2 = E 1 x and use the approximation Φ(s) ∼ 1/s we get (E = E 1 ) .
The integral over x is log-divergent at the lower limit, and must be cut off at x ∼ m 2 /E 2 because of the cutoff s > 4m 2 which we ignored so far. So Although the leading contribution involves two particle states with small invariant mass, their total momentum was large. As a result this diagram will not be particularly sensitive to finite L and P ext effects. λ 2.4 is similar but involves the three particle phase space, with an extra log in the asymptotics.
Diagrams 4.5, 2|4, 4|4 where we used the leading-log four particle phase space asymptotics Φ 4 (E) ∼ (log E/m) 2 /E 2 The overall factor L arises because the diagram is disconnected. The other two diagrams are fully analogous, except three and two particle spectral densities are involved, and the factor L is absorbed into the bilocal operator.
Diagrams 4.6 and 6.2 where we used the leading-log three particle phase space asymptotics, and P is the external momentum flowing in through the central vertex (we assume ω P E L ) The expressions in (F.13) and in the table correspond to ω P → m which neglects the P ext dependence and overestimates the diagram. λ 6.2 is similar but involves the two particle phase space. Notice that the mechanism for P ext dependence of these two diagrams is different and simpler than for 2.2, 2.3, 4.2, 4.3.
Diagram 6.1. Denoting by p the momentum going around the loop we have where we neglected m in the second approximation. The bottom line is forced to carry a large momentum, which is different from λ 4.3 and λ 2.3 where the main contribution came from soft bottom line momenta. As a result this diagram clearly has no finite L or P ext sensitivity.
While naively one may have expected 1/E 4 L asymptotics, there are two regions of phase space which give an enhanced contribution. The first one is that of small k, whose contribution is where we cut off the log-divergent k integral at k ∼ E L , where the small k approximation breaks down.
The second enhanced region is 2k ∼ E 1 ∼ E 2 , where the invariant masses of the two particle states are small. Consider the half of the integral where E 2 > E 1 . Introduce E 2 = xE 1 , x > 1 and ω k = yE 1 /2, 0 < y < 1. Neglecting the m 2 in the r.h.s. of (F.39) for the moment, and using the approximation Φ(s) ∼ 1/s, which will be adequate to pick the leading-log part, we get Notice that I(y) has a log singularity as y → 1 but has a finite limit as y → 0. Substituting I(y) into (F.40) and recalling the effective cutoff m 2 /E 2 for y near 1, we get that the contribution of this region is doubly log-enhanced.
The invariant masses of the two particle phase spaces are: The dominant region will be E, p = O(m) E 2 ∼ E L . Contribution from this region is We can equivalently write I as I 3 , where There will also be sensitivity to P ext for this diagram.
Diagram 0. We have three groups of two particles, each of the same total momentum P = q 1 + q 2 = p 1 + p 2 = −(k 1 + k 2 ). Let E be the energy of the q 1 , q 2 group, then the other two have energies E 1 − E and E 2 − E 1 + E. We have We can equivalently write I as (F.45) with N = 4, from which the rest of the argument follows.

F.3 General lessons
One important lesson of the careful discussion in this section is that one must be cautious applying naive dimensional analysis to predict how the coefficients of the local approximation scale with E L . For the situation at hand, naive dimensional analysis fails as often as it is successful, because other scales with the dimension of energy, m and L −1 , come in and change the scaling.
It would be interesting to develop a local approximation procedure appropriate for the renormalization at the cubic order in the context of TCSA, in which H 0 describes a CFT. In the φ 4 case the role of the scale m was to regulate IR divergences, and power counting may be simpler in the TCSA case when no IR divergences are present. However, as mentioned in note 11, we do expect bilocal operators to appear in the TCSA case as well.

G Local approximation: checks of accuracy
As explained in section 3.2 and appendix E, we have the scales E L , E L , E L which control the accuracy of the local approximation used to compute the ultrahigh pieces of ∆H 2 and ∆H 3 . In this appendix we present some numerical checks of how accurate the local approximation is. For illustrative purposes, we pick a rather low E T .
In Fig. 15 we plot two matrix elements of ∆H 2 as function of E L : 0|∆H 2 |0 on the left and 0|∆H 2 |2 0 on the right, with |0 the free theory vacuum and |N 0 denoting N particles at rest. These matrix elements are computed as explained in (3.5), i.e. by splitting the calculation as ∆H 2 = ∆H < 2 +∆H > 2 . Recall that ∆H < 2 includes the contribution from states in the range (E T , E L ]. Instead, ∆H > 2 includes those in the range (E L , ∞) and is computed in the local approximation in the L → ∞ limit, using the expressions in (3.9 Figure 15: Two low-energy matrix entries of ∆H < 2 and of ∆H < 2 + ∆H > 2 as a function of E L . ∆H < 2 is computed exactly and ∆H > 2 in the local approximation. on the arbitrary scale E L . The only local operator in (2.13) that can connect the state |0 with itself is V 0 . Thus, the left plot tests the coefficient κ 0 . Instead, the right plot tests κ 2 since V 2 is the only operator in (2.13) with non-zero matrix element 0|V N |2 0 . In these plots E T was fixed to 10, but in fact this test does not depend on E T since shifting E T just adds a constant to both curves. We did other similar plots for different matrix entries (in particular testing κ 4 ), showing equally good behavior.
Analogously, in Fig. 16 the matrix elements 0|∆H 3 |0 (left) and 0|∆H 2 |6 0 (right) are plotted as a function of the scale E L , fixing E L = 1.5E L . These plots are a numerical test of (3.11). The steepest solid line of both plots includes only the nonlocal piece ∆H << 3 of (3.12), including the states in the range E T < E k E L . Instead, the flatter dashed (dotted) lines add to the solid ones the operators ∆H <> 3 (and ∆H >> 3 ) in (3.11). The matrix ∆H <> 3 in (3.14) is computed as explained after (3.16), i.e. the contribution of the states between E L and E L is computed exactly doing matrix multiplication while in the range (E L , ∞) we use the local approximation for ∆H 2 . The matrix ∆H >> 3 in (3.13) is calculated entirely in the local approximation, taking the L → ∞ limit.
The left plot in Fig. 16 is a check of λ 0 in (F.3). Instead the right plot tests the λ coefficients of those (bi-)local operators in (3.15) that can connect the vacuum |0 with the six-particle state |6 0 . These are the operators V 6 and : V 2 V 4 :. Hence, the right plot is a check of the diagrams λ 6.1 , λ 6.2 , λ 2|4 . We did similar plots for other matrix elements of ∆H 3 in order to test the rest of the λ's, and we obtained similar results to the ones shown in Fig. 16.
Lastly, in Fig. 17 we show two plots of the vacuum energy. In the left plot we vary E L keeping fixed E L = 1.5E L = 2E T , while on the right we vary E L keeping E L = 1.5E L and fixed E L = 3E T . We use the full ∆H 2 or just its ∆H < 2 part on the left, and the full ∆H 3 or just its ∆H << matrix elements with energies E i close to E T . However, as we stressed several times, such states have a relatively low impact on the lowest excited states, even for moderately strong couplings g. This must be the reason why the spectrum varies so little even for low values of E L and E L . Nevertheless, in the main text we were conservative and took relatively large values of E L , E L , E L , so that all matrix elements of ∆H 2,3 are well approximated.
As mentioned in section F.2, some terms in the local expansion of ∆H 3 (those marked with in Table 3) are sensitive to the momenta of the external states P ext . In our way of approximating those terms, the magnitude of the corresponding matrix entries is overestimated. As the scale E L is increased those matrix entries decrease. Perhaps this can be used to explain why the dashed line in the right plot of Fig. 17 shows some residual growth. Namely, at leading order, the correction to the vacuum due to off-diagonal elements in ∆H 3 is negative, due to the usual level-splitting. Then, as E L is increased the value of the vacuum energy should indeed somewhat increase. Although this is a perturbative argument, perhaps there is some truth to it.
In any case, in the future it could be interesting to take better care of P ext dependence, as explained in section F.2. This should reduce the residual E L -dependence of the spectrum. Perhaps one can then lower further the value of E L needed to achieve a given accuracy, saving a significant amount of computational resources.

H Fit procedure
In this appendix we give further details on the fitting procedure.

H.1 Infinite cutoff extrapolation
After we compute the numerical NLO-HT mass and vacuum energy at finite E T , we try to extrapolate them to E T = ∞, by fitting the data points with a function of the form (4.1). 40 The central value and error bars are computed as follows. For each n = 0, 1, 2, 3, we remove n points in the low E T part of the data sample, specifically 10 E T 12.5, where in total 5 data points are present for our choice of discretization of E T . Several subsamples are generated, by removing n points according to all possible combinations. Fitting the model (4.1) for each subsample, we obtain a series of "fit models" F i (E T ) and the corresponding asymptotes α i . Then, we take the mean, max and min of the α i 's as the central value, the upper bound and the lower bound estimate.Furthermore, to account for fluctuations for the higher values of E T , we provide alternative estimates for the error bars, as follows. We compute the maximum absolute difference between the data points and the mean of the F i (E T ) in the range E max − 5 E T E max , where E max is the maximum cutoff we attain at a given L. The final error bars are the largest between the two methods.

H.2 Estimate of the critical coupling
The critical value of g where the theory undergoes a phase transition is determined from the right plot in Fig. 4. The red data points of the plot are fitted with the rational function in (4.7), minimizing over g 1 , g 2 , g 3 , a and g c the "log-likelyhood function" formed for N data points: (H.1) 40 For g 1, we instead set γ = 0. This is motivated by the fact that by eye the dependence is predominantly linear in 1/E 3 T , and that the linear fit is more robust to the fluctuations of the data around the smooth curve. While this procedure may seem ad hoc, we tested it, and it works well. In the future one can think of more complicated fitting procedures.
The central value g c = 2.76 reported in Table 2 corresponds to the smallest χ 2 , call it χ 2 (2.76). The uncertainty was determined through the following procedure. We fix g c close to 2.76 and fit the same ansatz (4.7), minimizing the log-likelyhood only over g 1 , g 2 , g 3 and a. The error interval reported in Table 2 corresponds to those g c for which the root mean square normalized error is within factor 3 of what it is at g c = 2.76, i.e. χ 2 (g c )/N 3 χ 2 (2.76)/N . (H.2) We believe that this error determination is conservative.

I Algorithmic details
We will describe here some details of the basis and matrix generation algorithms used in this work, highlighting key improvements over the code used in [11]. It will be important to control both the time and memory complexity of the computation. The core component of our code is a routine 41 F : |i → {V ji |j | E j E max , V ji = 0} , (I.1) taking as input a state and returning all the states |j and coefficients V ji such that V ji = 0, in a given energy range. This routine will be described in more detail in section I.2.

I.1 Basis generation and storage
We use two different data structures to represent the Fock states in the Hilbert space. The reason to do so will become clear below. Each Fock state is represented in one of the following ways: 1. as a list of tuples [(n, Z n ), . . .] where n represents wavenumber and Z n occupation number (only Z n > 0 are included). The list is ordered in n. E.g. [(−1, 3), (0, 2), (3, 1)] is a state in this representation. This representation is convenient to use as input for the routine (I.1), but it's relatively expensive in memory.
2. as a fixed-length list of all occupation numbers [Z −nmax , Z −nmax+1 , . . . , Z nmax ], including the zeros. E.g. [0, 0, 3, 2, 0, 0, 1] with n max = 3 is the above state. The state which is part of the output of the routine (I.1) is efficiently computed in this representation. It can be stored cheaply in memory as a byte sequence (bytes in python).
As in [11], we restrict ourselves to the truncated Hilbert space with total momentum P = 0. Furthermore, we are interested in the part of the Hilbert space which is P-invariant (where P is spatial parity). Its basis is formed by the states which are either P-invariant Fock states or have the form (|ψ + P|ψ )/ √ 2, (I.2) where |ψ is a Fock state such that |ψ = P|ψ . In the latter case the state is represented in the basis by storing either |ψ or P|ψ (but not both), choosing between the two arbitrarily. Finally, we work separately in the sectors Z 2 = ±1 of the field parity φ → −φ. The finite-dimensional Hilbert space that is stored numerically is composed of several parts: • "Low energy" states, i.e. all states with energy E E T . This chunk of the Hilbert space is stored both in Representation 1 and 2. In this work it typically contains ∼ 10 4 elements.
• "Moderately high" states with energy E T < E E L . Typically, in this work we choose E L ∼ 2E T . These states are summed over in the computations of ∆H << 3 in in (3.12) and of ∆H > 2 in (3.14). Notice that these are not all the states in the given energy range, as we only need those states j for which there is a nonzero V matrix element connecting them to a low energy state: This distinction is important, as the number of all states in the given range grows exponentially with E L (for fixed E T ), while the number of those respecting the condition (I.3) only polynomially. To generate them, we do the following. As mentioned, we have routine (I.1) which, given a state |i in Representation 1 as an input, returns all the states |j such that V ji = 0 in Representation 2. We apply this routine (with E max = E L ) over all the states below E T , and save the results in a hashset (set in python), which has constant lookup time.
In this way, states are not overcounted. Finally, the states j so generated are stored in both representations. In this work, their number is usually of the order 10 6 .
• States with energy E L < E max(E L , E L ), which are summed over either in ∆H < 2 in (3.6) or in ∆H > 2 in (3.16). In this work we typically choose E L ∼ E L ∼ 3E T . These states are generated analogously to the "moderately high" states above, but they are not saved in the Representation 1 format, because it is not necessary to act on these states with V anymore. This saves a significant amount of memory, as there can be around 10 7 − 10 8 states in this chunk of the Hilbert space.

I.2 Computation of matrix elements
We will now describe some details of the routine (I.1), and of how the matrices ∆H 2 , ∆H 3 are computed. Suppose we want to find all the non-vanishing matrix elements V ji between all the states |i ∈ H I , |j ∈ H J , where H I , H J are subsets of the Hilbert space with maximal energies E I max and E J max . We assume E I max E J max without loss of generality. The procedure is described below.
First, the local operator V must be represented efficiently, by decomposing it into sums of elementary terms. For generality we consider V = φ n , with n arbitrary. In this way, the code can be used to construct both the "non-local" and "local" parts of ∆H 2 , ∆H 3 , where all the even powers of n 6 appear. 42 Schematically, V is a sum of products of oscillators where n c is the number of creation operators. This sum is infinite, but for given finite E I max , E J max only a finite subset will contribute nontrivially to the matrix elements we wish to compute. These relevant terms are selected and stored in memory as follows: • For each n c , we cycle over all the states in H I , creating a set of all the possible (n − n c )dimensional tuples {q i } of momenta which are present in at least one state. We will only need terms in (I.4) for which {q i } is such a tuple, since all other terms annihilate all states.
• We iterate over this set of tuples, and for each tuple we generate a list of all the possible n c -dimensional tuples of momenta {k i }, subject to the constraints This list is then sorted in energy. Clearly we only need terms in (I.4) for which {k i } is such a tuple, since any other term will either violate the zero momentum condition or raise the energy of the state above the threshold E J max we are interested in.
• For each n c , we create a hash table (dict in python) mapping the tuples of annihilation momenta to the sorted lists of tuples of creation momenta. It is useful to use this data structure as it has constant lookup time. Also, we construct a similar hash table of the same size, containing all prefactors (including the factors 1/ √ 2ωL and the combinatorial factors) for each pair of creation-annihilation sets of operators. These coefficients, multiplying the terms in (I.4) (not shown in that equation for simplicity), are precomputed for efficiency.
Next, we cycle over H J and create a lookup hash table (dict) of all the associations {|j : j} between the states |j and the row indices of the matrix V ji .
Finally, we enter the core routine (I.1), which makes use of the data structures defined above, and works as follows: • We iterate over H I , select a state |i with energy E i and generate a list of all the sets of momenta {q i } than can be annihilated at each value of n c .
• We iterate over this list, selecting a tuple {q i }, and get the corresponding list of tuples {k i } previously computed in the hash table.
• We iterate over the list of {k i }. This inner loop is the most expensive part of the computation, and it has been optimized using the cython extension. We act on the state |i with the given sets of creation and annihilation operators and generate a new state |j in Representation 2 and partial coefficient V ji . The state is looked up in the hash table to get the index j.
• We add the partial coefficient to the column i of the matrix, and repeat through the previous points, until the column i of V ji is entirely computed. We add this column to the full matrix in the sparse format (for maximum efficiency we use the coo format in scipy.sparse).
At the end of this cycle one obtains the full matrix V ij over the subspaces H I , H J . V is then converted from the coo to the csc format to allow for fast algebraic operations.
Some of the tricks describe above reduce the time complexity by orders of magnitude. We do not report other tricks which speed up the computation by factors of a few. For example, many quantities, such as the energies of the states, can be precomputed and stored. Also, if H I = H J and V is hermitian, only half of the terms in (I.4) related to each other by conjugation can be retained.

I.3 Evaluation of ∆H 2 , ∆H 3
The matrices ∆H 2 and ∆H 3 are computed by summing over basis states with energy above E T . As explained in section 3.2, they are decomposed into a "non-local" part, where the Fock states are summed over exactly, and a "local" part, where the sum is approximated analytically. Here we describe in detail how to evaluate efficiently the non-local part of the matrices. This step represents the bottleneck of the entire numerical computation.
The sum (3.6) has to be evaluated. To do so, it is most convenient to to apply the routine (I.1) over the basis states with energy E E T , to construct the matrix V kj in (3.6) and its transpose. Then, ∆H < 2 is easily evaluated by multiplying those.

3
One has to compute the sum (3.12). The matrices V ik and its transpose are constructed as above. Instead, V kk is sometimes too large to be stored in memory, even if it's sparse. If this happens, we divide the set of basis states with energy E T E E L into chunks and compute blocks of V kk one at a time, summing over them sequentially.

3
The non-local contribution to ∆H <> 3 in (3.14) is evaluated analogously to ∆H << 3 . To save resources, it is important to evaluate the matrix elements V kk cycling over the states with energy E T E E L and acting on them with V , rather than cycling over the more numerous states in the range E L E E L .