Variational optimization algorithms for uniform matrix product states

We combine the Density Matrix Renormalization Group (DMRG) with Matrix Product State tangent space concepts to construct a variational algorithm for finding ground states of one dimensional quantum lattices in the thermodynamic limit. A careful comparison of this variational uniform Matrix Product State algorithm (VUMPS) with infinite Density Matrix Renormalization Group (IDMRG) and with infinite Time Evolving Block Decimation (ITEBD) reveals substantial gains in convergence speed and precision. We also demonstrate that VUMPS works very efficiently for Hamiltonians with long range interactions. The new algorithm can be conveniently implemented as an extension of an already existing DMRG implementation.


I. INTRODUCTION
The strategy of renormalization group (RG) techniques to successively reduce a large number of microscopic degrees of freedom to a smaller set of effective degrees of freedom has led to powerful numerical and analytical methods to probe and understand the effective macroscopic behavior of both classical and quantum many body systems. 1-4 However, it was not until the advent of White's celebrated Density Matrix Renormalization Group (DMRG) 5,6 that variational RG methods reached unprecedented accuracy in numerically studying strongly correlated one dimensional quantum lattice systems at low temperature. The underlying variational ansatz of Matrix Product States (MPS) 7-13 belongs to a class of ansatzes known as Tensor Network States. 11,14,15 These variational classes encode the many body wavefunction in terms of virtual entanglement degrees of freedom living on the boundary and thus satisfy an area law scaling of entanglement entropy by construction. As such, they provide a natural parameterization for the physical corner of Hilbert space, in which low energy states of quantum many body systems ought to live. 16,17 MPS in particular are especially well suited to studying ground states of strongly correlated one dimensional quantum systems with local interactions. 18-20 The variational parameters of an MPS are contained within local tensors associated with the individual sites of the lattice system. For homogeneous systems, the global wave function can then be captured using just a single (or a small number of) such tensors, independent of the system size. MPS consequently offer very natural access to the thermodynamic limit, providing a clear advantage over other numerical approaches such as Exact Diagonalization or Quantum Monte Carlo.
On finite lattices, (one-site) DMRG implements the variational principle (energy minimization) by exploiting the fact that the quantum state is a multilinear function of the local tensors. By fixing all but one tensor, the global eigenvalue problem is transformed into an effective eigenvalue problem for the local tensor. 5,6,12,21-23 Using a translation invariant parameterization instead gives rise to an energy expectation value with a highly non-linear dependence on the tensor(s). Two different algorithms are widely used to obtain such an MPS in the thermodynamic limit. Infinite system DMRG (IDMRG) 5,6,24 proceeds by performing regular DMRG on a successively growing lattice, inserting and optimizing over new tensors in the center of the lattice only in each step, effectively mimicking an infinite lattice by using a finite, albeit very large, lattice. After convergence the most recently inserted tensors in the center are taken as a unit cell for an infinite MPS approximation of the ground state. An alternative approach is known as infinite time evolving block decimation (ITEBD). 25,26 It works directly in the thermodynamic limit and is based on evolving an initial state in imaginary time by using a Trotter decomposition of the evolution operator.
We present a new variational algorithm, inspired by tangent space ideas, 13,27,28 that combines the advantages of IDMRG and ITEBD and addresses some of their shortcomings. As such it is directly formulated in the thermodynamic limit, but at the same time optimizes the state by solving effective eigenvalue problems, rather than employing imaginary time evolution. We find that it leads to a significant increase in efficiency in all of our test cases. The following section introduces MPS notations and definitions and presents our variational algorithm, heuristically motivated from the perspective of finite size DMRG. Sec. III illustrates the performance of our algorithm on various test cases, and compares to conventional IDMRG and ITEBD results. After the conclusion in Sec. IV, we provide further technical details in the appendices. Appendix A contains additional theoretical background: we derive the self-consistent conditions that characterize the variational minimum and provide additional motivation for our algorithm from the perspective of the MPS tangent space. Appendix B presents a suitable strategy to expand the bond dimension of translation invariant MPS. Appendix C explains how to construct effective Hamiltonians in the thermodynamic limit. These involve infinite geometric sums of the transfer matrix, which are further studied in Appendix D.


II. THE ALGORITHM

In this section we introduce a variational algorithm for optimizing MPS directly in the thermodynamic limit. Because the algorithm strongly resembles conventional DMRG, we explain it by describing a single iteration step from the viewpoint of DMRG and show that only a few additional ingredients are needed to arrive at our variational algorithm. We only briefly motivate these extra ingredients for the sake of readability, and refer to Appendix A for additional explanations and more rigorous theoretical motivations.
As such, the new algorithm can easily be implemented as an extension to an already existing (I)DMRG implementation.
We start by considering a setting familiar from conventional DMRG: a finite homogeneous one dimensional quantum lattice system, where every site corresponds to a d level system. We label the sites by an integer n and thus have a basis {|s⟩_n, s = 1, . . . , d} for the local Hilbert space on site n. The total Hilbert space is spanned by the product basis |s⟩ = ⊗_n |s⟩_n. We assume the dynamics of the system to be governed by a translation invariant Hamiltonian H.
We further consider a variational parameterization of a ground state approximation of the system, for now in terms of a finite size (site dependent) MPS, but we will ultimately be interested in the thermodynamic limit. DMRG proceeds to find the best variational ground state approximation by employing an alternating least squares minimization: It starts from some initial state and successively optimizes each of the individual MPS tensors site by site by solving effective (Hamiltonian) eigenvalue problems, in a sweeping process through the lattice until convergence, where each iteration depends on already optimized tensors from previous iterations (see e.g. Refs. 5, 6, 12, 21, and 23).
We are now however interested in the thermodynamic limit n ∈ Z (but will ignore the technical complications involving a rigorous definition of a Hilbert space in that limit). In that case the MPS ground state approximation will be given in terms of a translation invariant uniform MPS, described by a single MPS tensor (or a unit cell of N tensors), repeated on all sites. Two immediate difficulties arise: Firstly, conventional DMRG updates the variational state site by site, thus breaking translation invariance. Secondly, the effective Hamiltonian for a single-site optimization has to be constructed from an infinite environment.
After briefly introducing the variational class of uniform MPS and introducing necessary notation and conventions (for further details see Sec. A 2), we describe how the new algorithm modifies DMRG accordingly to exactly account for these two issues in order to arrive at a variational ground state algorithm directly formulated in the thermodynamic limit.

A. Uniform MPS
A uniform MPS (uMPS) of bond dimension D defined on an infinite translation invariant lattice is parameterized by a single collection of d matrices A^s ∈ C^{D×D}, s = 1, . . . , d. The overall translation invariant variational state is then given by

|Ψ(A)⟩ = ∑_{{s}} ( ∏_{n∈Z} A^{s_n} ) |{s}⟩ .   (1)

Exploiting the invariance of (1) under local gauge transformations A^s → X A^s X^{-1}, with X ∈ C^{D×D} invertible, the state can be cast into certain favorable representations, among them the left and right canonical representations in terms of tensors A_L and A_R satisfying

∑_s (A_L^s)† A_L^s = 𝟙 ,   ∑_s A_L^s R (A_L^s)† = R ,
∑_s A_R^s (A_R^s)† = 𝟙 ,   ∑_s (A_R^s)† L A_R^s = L .   (2)

Here L and R correspond to the left and right reduced density matrices of a bipartition of the state, respectively. We henceforth refer to A_L (A_R) as a left (right) isometric tensor, or just a left (right) isometry.
Defining the left and right transfer matrices

T_L = ∑_s A_L^s ⊗ Ā_L^s ,   T_R = ∑_s A_R^s ⊗ Ā_R^s ,   (3)

where the bar denotes complex conjugation, and using the notation (x| and |x) for vectorizations of a D × D matrix x in the D²-dimensional "double layer" virtual space the transfer matrices act upon, the gauge conditions (2) are equivalent to

(𝟙| T_L = (𝟙| ,   T_L |R) = |R) ,   (L| T_R = (L| ,   T_R |𝟙) = |𝟙) ,   (4)

i.e. 𝟙 and R are the left and right dominant eigenvectors of T_L, while L and 𝟙 are the left and right dominant eigenvectors of T_R.
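As a small illustration of these objects (a Python/NumPy sketch of our own, not code from the paper), the snippet below builds a random left-isometric tensor via a QR decomposition and verifies the first gauge condition in (4), namely that the identity matrix is a left fixed point of T_L:

```python
import numpy as np

def random_left_isometry(d, D, seed=0):
    # Stack the d matrices A^s into a (d*D) x D matrix and QR-orthogonalize;
    # Q^dag Q = 1 then enforces the left gauge condition sum_s (A^s)^dag A^s = 1.
    rng = np.random.default_rng(seed)
    M = rng.normal(size=(d * D, D)) + 1j * rng.normal(size=(d * D, D))
    Q, _ = np.linalg.qr(M)
    return Q.reshape(d, D, D)          # A_L[s] is a D x D matrix

def transfer_left(AL, x):
    # Action of the left transfer matrix on a D x D matrix x:
    # (x| T_L  <->  sum_s (A^s)^dag x A^s
    return sum(A.conj().T @ x @ A for A in AL)

d, D = 2, 8
AL = random_left_isometry(d, D)
# (1| T_L = (1| : the identity is a left fixed point of T_L.
print(np.allclose(transfer_left(AL, np.eye(D)), np.eye(D)))  # True
```

The same construction with the index roles exchanged yields a right isometry with T_R |𝟙) = |𝟙).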
As is standard practice in DMRG, we mix both of these representations and cast the state into the mixed canonical representation

|Ψ(A)⟩ = ∑_{{s}} ( ⋯ A_L^{s_{-1}} A_C^{s_0} A_R^{s_1} ⋯ ) |{s}⟩   (5a)
       = ∑_{{s}} ( ⋯ A_L^{s_0} C A_R^{s_1} ⋯ ) |{s}⟩ .   (5b)

Here we have defined the center site tensor A_C^s = A_L^s C = C A_R^s (known as the single-site wave function Ψ^s in DMRG) in terms of the bond matrix C, which constitutes the (invertible) gauge transformation relating A_L and A_R via A_L^s = C A_R^s C^{-1}. The singular values of C then encode the entanglement spectrum of the state. Indeed, using A_L^s C = C A_R^s we can verify that the left and right reduced density matrices in (2) are given by L = C†C and R = CC†. Furthermore, A_L^s C = C A_R^s ensures that (5a) and (5b) are translation invariant and that A_C and C can be shifted around arbitrarily. Normalization of the state, as well as of the reduced density matrices L and R, corresponds to the single condition ‖C‖₂² = Tr(CC†) = 1. For ease of notation we further introduce the partial states |Ψ_L^α⟩ and |Ψ_R^β⟩, built from the half infinite parts of the lattice to the left and right of an arbitrary site n, and use them to define the reduced basis states |Ψ^{(α,s,β)}⟩ = |Ψ_L^α⟩ |s⟩ |Ψ_R^β⟩.

B. Effective Hamiltonian
The use of the mixed canonical representation (5a) in DMRG is of significant importance for stability, as it reduces the minimization of the (global) energy expectation value ⟨Ψ|H|Ψ⟩/⟨Ψ|Ψ⟩ with respect to A_C to a standard (hermitian) eigenvalue problem, instead of a generalized one. The effective Hamiltonian for this eigenvalue problem is the system Hamiltonian H projected onto the degrees of freedom of A_C, and is known as the "reduced" or "superblock" Hamiltonian in DMRG.
We define the thermodynamic limit version of this reduced single-site Hamiltonian, H_{A_C}, acting on A_C. Additionally, we define an effective Hamiltonian H_C acting on the bond matrix C, which does not appear directly in the context of DMRG, but will be needed later for a consistent update of the state without breaking translation invariance. It can be interpreted as a "zero site" effective Hamiltonian, which would feature in an optimization of the global energy expectation value with respect to the Schmidt values.
In an efficient implementation, these effective eigenvalue problems are typically solved using an iterative eigensolver, so that we only need to implement the action of H_{A_C} and H_C on A_C and C, respectively.
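This matrix-free approach can be sketched as follows (assuming SciPy; the random hermitian map below is only a stand-in for the true action of H_{A_C}, which would be assembled from the Hamiltonian and environment terms derived in this section):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

d, D = 2, 8
n = d * D * D                      # number of entries of A_C
rng = np.random.default_rng(1)
M = rng.normal(size=(n, n))
H = (M + M.T) / 2                  # random hermitian stand-in for H_{A_C}

def apply_HAC(v):
    # A real implementation would reshape v into the tensor A_C and contract
    # the local Hamiltonian and environment terms with it, never forming the
    # (dD^2 x dD^2) matrix H_{A_C} explicitly.
    return H @ v

op = LinearOperator((n, n), matvec=apply_HAC, dtype=np.float64)
E, V = eigsh(op, k=1, which='SA')  # lowest eigenpair via Lanczos
AC = V[:, 0].reshape(d, D, D)      # updated center-site tensor
```

Only the matrix-vector product is ever needed, so the O((dD²)²) cost of building the effective Hamiltonian as a matrix is avoided.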
While the energy expectation value is extensive and thus divergent in the thermodynamic limit, the effective Hamiltonians H_{A_C} and H_C are well defined and finite in the thermodynamic limit if one properly subtracts the current energy expectation value from the Hamiltonian H. We demonstrate this procedure for the case of nearest neighbor interactions H = ∑_n h_{n,n+1}, where the two-site Hamiltonian h_{n,n+1} acts on neighboring sites n, n + 1 only. We refer to Appendix C for the case of long range interactions and for general Hamiltonians given in terms of Matrix Product Operators (MPOs).
In the case of nearest neighbor interactions, the action of H_{A_C} on A_C splits up into four individual contributions, which follow from the decomposition |Ψ⟩ = ∑_{α,s,β} A_{C,(α,β)}^s |Ψ_L^α⟩ |s⟩ |Ψ_R^β⟩ (left block containing sites n < 0, center site n = 0, and right block containing sites n > 0). The action of H_{A_C} on A_C is given by

(H_{A_C}(A_C))^s = ∑_{u,u',s'} h^{us,u's'} (A_L^u)† A_L^{u'} A_C^{s'} + ∑_{v,v',s'} h^{sv,s'v'} A_C^{s'} A_R^{v'} (A_R^v)† + H_L A_C^s + A_C^s H_R ,   (11)

with matrix elements h^{us,u's'} = ⟨u s|h|u' s'⟩, where the first two terms correspond to the Hamiltonian terms h_{-1,0} and h_{0,1} coupling the center site to the left and right block, respectively, and H_L and H_R sum up the contributions of all the Hamiltonian terms h_{n,n+1} acting strictly to the left and to the right of the center site.
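These four contributions can be transcribed directly into tensor contractions. The sketch below (Python/NumPy, our own transcription under the assumed conventions h[u, s, u2, s2] = ⟨u s|h|u2 s2⟩ and A[s] a D × D matrix) implements the action (11):

```python
import numpy as np

def apply_HAC(AC, AL, AR, h, HL, HR):
    """Action of the effective Hamiltonian H_{A_C} on the center-site tensor.

    AC, AL, AR : arrays of shape (d, D, D)
    h          : two-site term, h[u, s, u2, s2] = <u s| h |u2 s2> (assumed layout)
    HL, HR     : left/right block environments, arrays of shape (D, D)
    """
    # h_{-1,0}: couples the center site to the left block through A_L
    t1 = np.einsum('usvt,uab,vac,tcd->sbd', h, AL.conj(), AL, AC)
    # h_{0,1}: couples the center site to the right block through A_R
    t2 = np.einsum('svtw,tab,wbc,vdc->sad', h, AC, AR, AR.conj())
    # all Hamiltonian terms strictly inside the left / right block
    t3 = np.einsum('ab,sbc->sac', HL, AC)
    t4 = np.einsum('sab,bc->sac', AC, HR)
    return t1 + t2 + t3 + t4
```

For hermitian h, HL and HR this defines a hermitian map on the dD² components of A_C, which is exactly what an iterative eigensolver requires.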
The environments H L and H R are usually constructed iteratively while sweeping through the (finite) lattice in conventional DMRG, or grown successively in every iteration in IDMRG. In the thermodynamic limit, these terms consist of a diverging number of individual local interaction contributions h n,n+1 , and care needs to be taken in their construction.
Indeed, the k-th contribution to (H_L| comes from the Hamiltonian term h_{-k-1,-k} and is given by (h_L|[T_L]^k, where (h_L| is the vectorization of

h_L = ∑_{u,u',s,s'} h^{us,u's'} (A_L^{u} A_L^{s})† A_L^{u'} A_L^{s'} .   (12)

Summing up all such local contributions gives rise to the infinite geometric sum of the transfer matrix

(H_L| = ∑_{k=0}^{∞} (h_L| [T_L]^k ,   (13)

and likewise for |H_R) with T_R. The transfer matrix T_L has a dominant eigenvalue of magnitude one, with corresponding left and right eigenvectors (𝟙| and |R). The projection (h_L|[T_L]^k|R) = (h_L|R) is the energy density expectation value e = ⟨Ψ|h_{-k-1,-k}|Ψ⟩ and is independent of k. Subtracting the energy density from the Hamiltonian, h̃ = h − e𝟙, we can write (h_L| = (h̃_L| + e(𝟙|. The second term is exactly proportional to the left eigenvector with eigenvalue 1 and therefore gives rise to a diverging contribution in the geometric sum, corresponding to the total energy of the left half infinite block. Since this contribution acts as the identity in the effective Hamiltonian H_{A_C} [Eq. (11)], we can however safely discard it without changing the eigenvectors of H_{A_C}. This corresponds to an overall energy shift of the left half infinite block such that (H_L|R) = 0. For the remaining part (h̃_L| the geometric sum converges. With |h̃_R) = |h_R) − e|𝟙) the same comments apply to the construction of |H_R).
We can evaluate H_L and H_R recursively as

(H_L^{(k+1)}| = (h̃_L| + (H_L^{(k)}| T_L ,   |H_R^{(k+1)}) = |h̃_R) + T_R |H_R^{(k)}) ,   (14)

with initializations (H_L^{(0)}| = (h̃_L| and |H_R^{(0)}) = |h̃_R). We can repeat these recursions until e.g. the norm of the difference between successive iterates drops below some desired accuracy ε_S. This strategy is conceptually simple and closely resembles the successive construction of the environments in the context of (I)DMRG, but is not very efficient, as its performance is comparable to that of a power method.
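Both the recursion (14) and the linear-system alternative of Appendix D can be sketched in a few lines (Python/NumPy, our own minimal setup in real arithmetic; the dense solve stands in for the iterative linear solver one would use in practice, and the local term h_L is a random hermitian stand-in):

```python
import numpy as np

d, D = 2, 6
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(d * D, D)))
AL = Q.reshape(d, D, D)                        # left-isometric tensor

def T_left(x):                                 # (x| T_L = sum_s A^sT x A^s
    return sum(A.T @ x @ A for A in AL)

def T_right(x):                                # T_L |x) = sum_s A^s x A^sT
    return sum(A @ x @ A.T for A in AL)

# dominant right eigenvector R of T_L via power iteration, normalized Tr R = 1
R = np.eye(D) / D
for _ in range(2000):
    R = T_right(R); R /= np.trace(R)

hL = rng.normal(size=(D, D)); hL = hL + hL.T   # stand-in local contribution
hLt = hL - np.trace(hL.T @ R) * np.eye(D)      # subtract e so that (h~_L|R) = 0

# (i) power-method-like recursion (14), projected so that (H_L|R) = 0
HL = hLt.copy()
for _ in range(10**5):
    HLn = hLt + T_left(HL)
    HLn -= np.trace(HLn.T @ R) * np.eye(D)
    done = np.linalg.norm(HLn - HL) < 1e-12
    HL = HLn
    if done:
        break

# (ii) regularized linear system: (H_L| [1 - T_L + |R)(1|] = (h~_L|
def lhs(x):
    return x - T_left(x) + np.trace(x.T @ R) * np.eye(D)

n = D * D
M = np.zeros((n, n))
for i in range(n):                             # dense matrix, column by column
    e = np.zeros(n); e[i] = 1.0
    M[:, i] = lhs(e.reshape(D, D)).ravel()
HL2 = np.linalg.solve(M, hLt.ravel()).reshape(D, D)
print(np.allclose(HL, HL2, atol=1e-8))         # both routes agree
```

In a production code the dense matrix would never be formed; only the action `lhs` is handed to an iterative solver such as GMRES or BiCGSTAB.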

A more efficient approach is to formally perform the geometric sums in (13) explicitly, and to iteratively solve the resulting two systems of equations for (H_L| and |H_R) to precision ε_S, as explained in Appendix D. So far, we have discussed the action of H_{A_C}. The action of H_C on C follows simply from (11) by projecting onto A_L or A_R. Using the defining property of H_L or H_R, the result simplifies to

H_C(C) = ∑_{u,u',v,v'} h^{uv,u'v'} (A_L^u)† A_L^{u'} C A_R^{v'} (A_R^v)† + H_L C + C H_R .   (16)

The first two terms of (11) can be applied in O(d⁴D³) operations, 29 and the last two terms in O(dD³) operations. For (16) the first term can be applied in O(d⁴D³) operations, and the last two terms in O(D³) operations. To generate the necessary terms for (11) and (16) we have to iteratively evaluate two infinite geometric sums, involving O(D³) operations (when iteratively solving (15) the solutions from the previous iteration can be used as starting vectors to speed up convergence). A pseudocode summary for obtaining the necessary explicit terms of H_{A_C} and H_C and their application onto a state is presented in Table I.

C. Updating the State

We want to construct an alternative scheme that applies global updates in order to preserve translation invariance at all times. Global updates can most easily be applied with an explicit uniform parameterization in terms of a single tensor A. On the other hand, DMRG experience teaches us that stability is greatly enhanced when applying updates at the level of A_C and C, which are isometrically related to the full state.
We therefore calculate the lowest eigenvector Ã_C of H_{A_C} as in DMRG, but additionally also the lowest eigenvector C̃ of H_C. We then globally update the state by finding new Ã_L and Ã_R as the left and right isometric tensors that minimize

ε_L = min_{A_L} ‖Ã_C − A_L C̃‖₂ ,   ε_R = min_{A_R} ‖Ã_C − C̃ A_R‖₂ ,   (18)

respectively. These minimization problems can be solved directly (not iteratively) and without inverting C̃ (see below). As shown in Appendix A, at the variational optimum the values of these objective functions go to zero, and the current A_C and C are the lowest eigenvectors of H_{A_C} and H_C, respectively.
For the remainder of this section we omit tildes and use the following matricizations of the 3-index tensors,

(A_C^ℓ)_{(αs),β} = (A_C^r)_{α,(sβ)} = A_{C,(α,β)}^s ,

and similarly A_L^ℓ for A_L and A_R^r for A_R. We thus want to extract updated A_L and A_R from updated A_C and C by solving

min_{(A_L^ℓ)† A_L^ℓ = 𝟙} ‖A_C^ℓ − A_L^ℓ C‖₂ ,   min_{A_R^r (A_R^r)† = 𝟙} ‖A_C^r − C A_R^r‖₂ .   (19)

In exact arithmetic, the solution of these minimization problems is known: A_L^ℓ is the isometry in the polar decomposition of A_C^ℓ C†. From the singular value decompositions A_C^ℓ C† = U_ℓ Σ_ℓ V_ℓ† and C† A_C^r = U_r Σ_r V_r† we thus obtain

A_L^ℓ = U_ℓ V_ℓ† ,   A_R^r = U_r V_r† .   (20)

Notice that close to (or at) an exact solution A_C^s = A_L^s C = C A_R^s, the singular values contained in Σ_{ℓ/r} are the squares of the singular values of C, and might well fall below machine precision. Consequently, in finite precision arithmetic, the corresponding singular vectors will not be accurately computed.
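Eq. (20) translates directly into code. In the sketch below (NumPy, our own transcription), an exactly consistent pair A_C = A_L C is constructed and the procedure recovers A_L:

```python
import numpy as np

def extract_AL_svd(AC, C):
    # Eq. (20): A_L^l = U_l V_l^dag from the SVD  A_C^l C^dag = U_l S_l V_l^dag.
    d, D, _ = AC.shape
    ACl = AC.reshape(d * D, D)              # matricization (alpha s), beta
    U, S, Vh = np.linalg.svd(ACl @ C.conj().T, full_matrices=False)
    return (U @ Vh).reshape(d, D, D)

d, D = 2, 5
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(d * D, D)) + 1j * rng.normal(size=(d * D, D)))
AL0 = Q.reshape(d, D, D)                    # exact left isometry
C0 = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
C0 /= np.linalg.norm(C0)
AC = np.einsum('sab,bc->sac', AL0, C0)      # consistent center-site tensor
print(np.allclose(extract_AL_svd(AC, C0), AL0))  # recovers AL0
```

Near convergence, however, the singular values of A_C^ℓ C† are the squares of those of C; once they drop below machine precision the computed singular vectors, and hence A_L, become inaccurate, which motivates the more robust variant of Eq. (22).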
An alternative that has proven to be robust and still close to optimal is given by directly using the left and right polar decompositions

A_C^ℓ = U_{A_C}^ℓ P_{A_C}^ℓ ,   C = U_C^ℓ P_C^ℓ ,   A_C^r = P_{A_C}^r U_{A_C}^r ,   C = P_C^r U_C^r ,   (21)

where the matrices P are hermitian and positive, to obtain

A_L^ℓ = U_{A_C}^ℓ (U_C^ℓ)† ,   A_R^r = (U_C^r)† U_{A_C}^r .   (22)

Alternative isometric decompositions might be considered in Eq. (21), though it is important that they are unique (e.g. QR with positive diagonal in R) in order to have P_{A_C}^{ℓ/r} ≈ P_C^{ℓ/r} close to convergence.
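A minimal sketch of the polar-decomposition update (22) for the left isometry follows (the right case is analogous; helper names are ours, not the paper's):

```python
import numpy as np

def polar_unitary(M):
    # Isometric factor of the left polar decomposition M = U P, with
    # P = V S V^dag hermitian positive; computed from the SVD M = U S V^dag.
    U, S, Vh = np.linalg.svd(M, full_matrices=False)
    return U @ Vh

def extract_AL_polar(AC, C):
    # Eq. (22): A_L^l = U^l_{A_C} (U^l_C)^dag. No singular values of C get
    # squared here, which keeps the construction well conditioned.
    d, D, _ = AC.shape
    U_AC = polar_unitary(AC.reshape(d * D, D))
    U_C = polar_unitary(C)
    return (U_AC @ U_C.conj().T).reshape(d, D, D)

d, D = 2, 5
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(d * D, D)) + 1j * rng.normal(size=(d * D, D)))
AL0 = Q.reshape(d, D, D)
C0 = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
C0 /= np.linalg.norm(C0)
AC = np.einsum('sab,bc->sac', AL0, C0)
print(np.allclose(extract_AL_polar(AC, C0), AL0))  # recovers AL0
```

At an exact solution both (20) and (22) return the same isometry; the polar route simply avoids dividing information between nearly vanishing singular values.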

D. The Algorithm: VUMPS
We are now ready to formulate our variational uniform MPS (VUMPS) algorithm. As shown in Appendix A, a variational minimum (vanishing energy gradient) in the manifold of uMPS is characterized by tensors A_L, C and A_R satisfying the conditions

H_{A_C} A_C = E_{A_C} A_C ,   (23a)
H_C C = E_C C ,   (23b)
A_C^s = A_L^s C = C A_R^s ,   (23c)

where in (23a) and (23b) the effective Hamiltonians are understood as matrices acting on the vectorized tensors A_C and C, and E_{A_C} and E_C are their lowest eigenvalues. 31 When iterating the steps outlined in the previous subsections, convergence is obtained when these conditions are satisfied. In particular, starting with a properly orthogonalized initial trial state |Ψ(A)⟩ of some bond dimension D, we begin by solving the two eigenvalue problems for the effective Hamiltonians H_{A_C} and H_C. Since we are still far from the fixed point, the resulting lowest energy states Ã_C and C̃ will in general not satisfy the gauge condition (23c) together with the current A_{L/R}.
Following the procedure of the previous section we can however find optimal approximations Ã_L^s and Ã_R^s for (23c) to arrive at an updated uMPS. Conversely, Ã_C and C̃ will not be the correct lowest energy eigenstates of the new effective Hamiltonians H_{Ã_C} and H_{C̃} generated from Ã_{L/R}. We then use the updated state and reiterate this process of alternately solving the effective eigenvalue problems and finding optimal approximations for A_L and A_R to update the state. For a pseudocode summary of this algorithm, see Table II.
We now elaborate on the various steps of the VUMPS algorithm. Firstly, extracting new Ã_{L/R} from updated Ã_C and C̃ can be done using the theoretically optimal (but numerically often inaccurate) Eq. (20) or the more robust Eq. (22), depending on the magnitude of the smallest singular value in C̃. As a good uMPS approximation will always involve small singular values, Eq. (22) is preferable most of the time, except perhaps during the first few iterations.
The maximum of the error quantities (18) provides an error measure for the fixed point condition in Eq. (23c) and is used as a global convergence criterion.

1: procedure VUMPS(h, A_L, A_R, C, ε)
2: while ε_prec > ε do
3: Evaluate L = C†C and R = CC†
4: (optional) Dynamically adjust bond dimension following Appendix B
5: Calculate explicit terms of effective Hamiltonians H_{A_C}, H_C ← HeffTerms(h, A_L, A_R, L, R, ε_S ≤ ε_prec) from Algorithm 1, 5 or 6
6: Calculate ground state Ã_C of effective Hamiltonian H_{A_C} to precision ε_H < ε_prec using an iterative eigensolver, calling ApplyHAC(A_C, H_{A_C}) from Algorithm 1, 5 or 6
7: Calculate ground state C̃ of effective Hamiltonian H_C to precision ε_H < ε_prec using an iterative eigensolver, calling ApplyHC(C, H_C) from Algorithm 1, 5 or 6
8: Calculate new Ã_L and Ã_R from Ã_C and C̃ using (20) or (22), depending on singular values of C̃
9: Evaluate new ε_L and ε_R from (18)
10: (optional) Calculate current expectation values
11: Set ε_prec ← max(ε_L, ε_R) and replace A_L ← Ã_L, A_R ← Ã_R and C ← C̃
12: end while
13: return A_L, A_R, C
14: end procedure

Table II. Pseudocode of the VUMPS algorithm described in Sec. II D. Terms within step 5 involving the evaluation of infinite geometric sums usually require the left dominant eigenvector L of T_R and the right dominant eigenvector R of T_L, for which L = C†C and R = CC† with the current C are a good enough approximation to the current precision ε_prec (see main text). Notice that this algorithm is free of any possibly ill-conditioned inverses and therefore has no convergence issues in the presence of small Schmidt values. It also does not require expensive reorthogonalizations of the state at intermediate iterations.
It measures the precision of the current uMPS ground state approximation. Within every iteration, we use iterative methods (e.g. some variation of Lanczos) to find the eigenvectors Ã_C and C̃ of the hermitian operators H_{A_C} and H_C. As the goal is to drive the state towards the fixed point relations in Eqs. (23a) and (23b), it is not necessary to solve these eigenvalue problems to full machine precision. Rather, it is sufficient to use a tolerance ε_H chosen relative to ε_prec. 32 A value of ε_H of the order of ε_prec/100 has proven to work well in practice. It is also worthwhile to use tensors from the previous iteration as initial guesses for the iterative solvers to speed up convergence.
As the main part of the algorithm works at fixed bond dimension (i.e. it is a single-site scheme in DMRG terminology), one might choose to increase the bond dimension D before starting a new iteration. We have developed a subspace expansion technique that works directly in the thermodynamic limit and is explained in Appendix B.
While the true comparison of this algorithm with IDMRG 5,24 and ITEBD 26 takes place in Sec. III on the basis of actual numerical simulation results, we can already compare the theoretical properties of these algorithms. Neither IDMRG nor ITEBD truly solves the variational problem in the sense of directly trying to satisfy the fixed point conditions Eqs. (23). IDMRG closely resembles regular DMRG on a successively growing lattice, as it inserts and optimizes over new tensors in the center of the lattice in each step. Tensors from previous steps are not updated, as this would render the cost prohibitive. When this approach converges, the resulting fixed point tensors in the center can be assumed to specify the unit cell of an infinite MPS. VUMPS has the immediate advantage that i) it directly works in the thermodynamic limit at all iterations and ii) it completely replaces the entire state after every iteration, thus moving faster through the variational manifold. In contrast, IDMRG keeps memory of earlier iterations and cannot guarantee a monotonically decreasing energy that converges to an optimum associated with a translation invariant MPS in which the effects of the boundary have completely disappeared. The advantages of VUMPS come with a greater computational cost per iteration, as two eigenvalue problems (for A_C and for C) and, in the case of nearest neighbor interactions, two linear systems (for H_L and H_R) have to be solved. IDMRG only solves a single eigenvalue problem and builds H_L and H_R step by step in every iteration. The latter approach is analogous to a power method for eigenvalue problems and, while very cheap, is expected to require many iteration steps to converge, especially for systems with large correlation lengths (e.g. close to criticality). ITEBD 26 is based on evolving an initial state in imaginary time by using a Trotter decomposition of the evolution operator.
Like VUMPS, ITEBD works in the thermodynamic limit at any intermediate step, typically with a unit cell that depends on how the Hamiltonian was split into local terms in order to apply the Trotter decomposition. Furthermore, as every application of the evolution operator increases the virtual dimension of the MPS, truncation steps are required to restore the original (or any suitable) value of the bond dimension. While VUMPS can take big steps through the variational space, time steps in ITEBD have to be chosen sufficiently small (especially in the final steps of the algorithm) to eliminate the Trotter error, which negatively affects the rate of convergence. (Ref. 33 however proposes a scheme to effectively obtain a larger time step). Furthermore, the Trotter splitting essentially limits the applicability of ITEBD to short-range interactions and dictates the size of the unit cell of the resulting MPS, e.g. in the most common case of nearest neighbor interactions a two-site unit cell is obtained. (The approach of Ref. 34 to obtain a translation invariant MPS is restricted to certain Hamiltonians, but see Ref. 35 for an alternative proposal that can in fact also deal with long range interactions.) Finally, we can also compare VUMPS to the more recent time dependent variational principle (TDVP), 27 which was implemented as an alternative approach to simulate real and imaginary time evolution within the manifold of MPS by projecting the evolution direction onto the MPS tangent space. This approach can be applied to translation invariant MPS, independent of the type of Hamiltonian. When used to evolve in imaginary time, it can be identified as a covariant formulation of a gradient descent method, in that it evolves the state in the direction of the gradient of the energy functional, preconditioned with the metric of the manifold. 
As such, the energy decreases monotonically, and at convergence an exact (local) minimum is obtained, as characterized by the vanishing gradient. However, in its original formulation, TDVP was not formulated in a center site form and was therefore unstable and restricted to small time steps. For finite systems, a different formulation of the TDVP algorithm was provided in Ref. 28, which allows taking the imaginary time step to infinity, whereupon it becomes provably equivalent to the single-site DMRG algorithm. VUMPS can be motivated from these developments, as explained in Appendix A.
We conclude this section by elaborating on how to incorporate symmetries in the algorithm. The construction of uMPS that are explicitly invariant under onsite unitary symmetries proceeds as in (I)DMRG 12,23 and (I)TEBD 36,37, and it is immediately clear that the various steps of VUMPS have a corresponding covariant formulation. The same comments apply to time reversal symmetry, in which case everything can be implemented in real arithmetic, or to reflection symmetry, in which case C and A_C^s will be symmetric matrices and A_R^s = (A_L^s)^T (which implies that H_L and H_R are also related). In all of these cases, the computational cost is reduced. However, explicitly imposing the symmetry in the MPS requires caution, as the physical system might exhibit spontaneous symmetry breaking, or, more subtly, might be in a symmetry protected topological phase where the symmetries cannot be represented trivially on the MPS tensor.
In the case of spontaneous symmetry breaking, MPS algorithms tend to converge to maximally symmetry broken states for which the entanglement is minimal. This is also the case for VUMPS. One can control which state the algorithm converges to by suitably biasing the initial state or by adding small perturbation terms to the Hamiltonian which explicitly break the symmetry, and which are switched off after a few iterations. Explicit conservation of translation symmetry was the very first requirement in the construction of VUMPS. In the case of spontaneous breaking of translation symmetry down to N -site translation symmetry (as e.g. in the case of a state with antiferromagnetic order), enforcing one-site translation symmetry would result in a (non-injective) equal weight superposition of all symmetry broken uMPS ground state approximations. In order to reach an optimal accuracy with a given bond dimension, such a superposition of N states is however undesirable, as the effective bond dimension is reduced to D/N . In the case where this situation cannot be amended by a simple unitary transformation that restores one-site translation symmetry (such as e.g. flipping every second spin in the case of an antiferromagnet), it is preferable to choose an MPS ansatz with a N -site unit cell, such that the state can spontaneously break translation symmetry. The generalization of the algorithm to multi-site unit cells is described in the next section.

E. Multi Site Unit Cell Implementations
We now generalize the VUMPS algorithm of the previous section from one-site translation invariant uMPS to the setting of translation invariance over N sites. Such a uMPS ansatz is parameterized by N independent tensors A(k) ∈ C^{D×d×D}, k = 1, . . . , N, which define the unit cell tensor

A^s = A(1)^{s_1} A(2)^{s_2} ⋯ A(N)^{s_N} ,   (24)

where s = (s_1, . . . , s_N) is a combined index. We can then write the variational state |Ψ(A)⟩ in terms of this unit cell tensor, in analogy to (1). The left and right orthonormal forms are given by tensors A_L(k) and A_R(k) satisfying

∑_s (A_L(k)^s)† A_L(k)^s = 𝟙 ,   ∑_s A_R(k)^s (A_R(k)^s)† = 𝟙 ,   (25)

where it is understood that N + 1 ≡ 1 and 0 ≡ N. Defining the bond matrices C(k) as the gauge transformations that relate left and right canonical form via

A_L(k)^s C(k) = C(k − 1) A_R(k)^s ,   (26)

we can then also cast |Ψ(A)⟩ into a mixed canonical form similar to (5a), with the center site tensors given by A_C(k)^s = A_L(k)^s C(k) = C(k − 1) A_R(k)^s. The variational minimum within this set of states is characterized by the following 3N fixed point relations:

H_{A_C(k)} A_C(k) = E_{A_C(k)} A_C(k) ,   (27a)
H_{C(k)} C(k) = E_{C(k)} C(k) ,   (27b)
A_C(k)^s = A_L(k)^s C(k) = C(k − 1) A_R(k)^s ,   k = 1, . . . , N .   (27c)

Notice that due to (27c), the relations for different k are connected. There are several possible strategies for constructing algorithms which obtain states satisfying these conditions. In the following we present two approaches which have shown good performance and stable convergence, which we shall term the "sequential" and "parallel" methods. But let us first elaborate on computing effective Hamiltonians for multi-site unit cells, which works similarly in both methods. We again restrict to the case of nearest neighbor interactions, such that the effective Hamiltonians are constructed similarly to Sec. II B. To construct e.g. the left block Hamiltonian H_L, we first collect all local contributions from a single unit cell in h_L, before performing the geometric series of the transfer matrix, which now mediates a translation over an entire unit cell.
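The unit cell tensor (24) is formed by contracting the site tensors along the virtual dimension. A small NumPy sketch (our own helper, with each A(k)[s] stored as a D × D matrix):

```python
import numpy as np

def unit_cell_tensor(As):
    # Combine site tensors A(1), ..., A(N), each of shape (d, D, D), into the
    # unit-cell tensor A^(s1...sN) = A(1)^{s1} A(2)^{s2} ... A(N)^{sN}.
    # The combined physical index (s1, ..., sN) is flattened in row-major order.
    A = As[0]
    for Ak in As[1:]:
        A = np.einsum('uab,vbc->uvac', A, Ak)
        A = A.reshape(-1, A.shape[-2], A.shape[-1])
    return A
```

For N = 2 and d = 2 this returns a (4, D, D) tensor whose slice (s1, s2) is the matrix product A(1)^{s1} A(2)^{s2}.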

Sequential Algorithm
The sequential algorithm is inspired by finite size DMRG, in that we sweep through the unit cell, successively optimizing one tensor at a time while keeping the tensors on the other sites fixed. Notice that at site k we however need two updated bond matrices, C̃(k)_L = C̃(k − 1) and C̃(k)_R = C̃(k), in order to calculate updated Ã(k)_L^s and Ã(k)_R^s. We thus have to amend steps 5, 6 and 7 of the single-site algorithm in Table II by constructing and solving for two effective Hamiltonians H_{C(k−1)} and H_{C(k)} instead of a single one. The newly optimized tensors then get replaced in all unit cells of the infinite lattice, and contributions to the effective Hamiltonians have to be recalculated accordingly, before moving on to the next site. For a pseudocode summary see Algorithm 3 in Table III.
One could now try to argue that, e.g. in a left to right sweep, it is enough at site k to calculate updated Ã(k)_C and C̃(k)_R = C̃(k) only, and to use C̃(k − 1)_R from the previous step at site k − 1 as C̃(k)_L for calculating Ã(k)_R. This approach however fails, as the effective Hamiltonian used for calculating Ã(k)_C already contains updated Ã(k − 1)_{L/R}, while the effective Hamiltonian used for calculating C̃(k − 1)_R does not, and we cannot determine Ã(k)_R from Ã(k)_C and C̃(k − 1)_R. Rather, C̃(k)_L has to be recalculated using an updated effective Hamiltonian, which leads exactly to the sequential Algorithm 3.
There is an additional subtlety that needs to be considered in order for all tensors to fulfill the gauge constraints (27c) to current precision. The bond matrices C̃(k) are calculated as lowest-energy eigenvectors of the effective Hamiltonians H_C(k) and are therefore only determined up to a phase. Consider C(k), defined between sites k and k + 1. At step k it is updated as C̃(k)_R and used to calculate Ã(k)^s_L. In the next step k + 1, however, it is recalculated as C̃(k + 1)_L (with an updated effective Hamiltonian) and used to determine Ã(k + 1)^s_R. At the fixed point we should then have C̃(k)_R = C̃(k + 1)_L = C(k), but this only holds if there is no phase ambiguity; otherwise a phase mismatch between Ã(k)_L and C(k) arises after step k + 1. This issue does not pose a problem for algorithm convergence (during calculations, the matrices C(k) always appear in products of the form C(k)†C(k) or C(k)C(k)†, so mismatching phases cancel out), but it can easily be circumvented by employing a phase convention when calculating the updated C̃(k).
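One simple such convention (our own choice for illustration; any consistent convention works) rotates each computed eigenvector so that its largest-magnitude entry becomes real and positive, which makes C and e^(iφ)C map to the same representative:

```python
import numpy as np

def fix_phase(C):
    """Remove the overall phase ambiguity of an eigenvector/bond matrix.

    Convention (one possible choice): rotate C so that its
    largest-magnitude entry becomes real and positive.  Then C̃(k)_R at
    step k and C̃(k+1)_L at step k+1 agree at the fixed point, with no
    phase mismatch between the derived tensors.
    """
    idx = np.argmax(np.abs(C))          # flat index of largest entry
    pivot = C.flat[idx]
    return C * (np.abs(pivot) / pivot)  # multiply by e^{-i arg(pivot)}
```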

Parallel Algorithm
In the parallel approach, we update the entire unit cell at once, using effective Hamiltonians generated from the same current state. To that end, we first generate all terms necessary for all H_AC(k) and H_C(k). In the case of nearest neighbor interactions, the contributions H_L and H_R to the left and right environments outside the unit cell can be shared, so that the corresponding geometric sum only needs to be computed once; contributions inside the unit cell are then obtained through successive applications of transfer matrices.
Next, we simultaneously and independently solve for the ground states Ã(k)_C and C̃(k) of all 2N effective Hamiltonians at once. Once these are obtained, we again simultaneously and independently determine all updated Ã(k)_L and Ã(k)_R, which concludes one iteration updating the entire unit cell. For a pseudocode summary see Algorithm 4 in Table III.

Juxtaposition of Both Approaches
Several comments on the two presented algorithms are in order. First, the parallel algorithm requires substantially less computational effort, since the construction of the different effective Hamiltonians H_AC(k) can recycle the calculation of the infinite geometric sum. Updating an entire unit cell therefore only requires evaluating two infinite geometric sums and solving 2N effective eigenvalue problems. In the sequential algorithm, updating the environment after every tensor update requires reevaluating the geometric sum, leading to 2N infinite geometric sums and 3N effective eigenvalue problems for updating the complete unit cell. Additionally, the parallel approach offers the possibility of parallelizing the solution of all 2N eigenvalue problems in one iteration, while in the sequential approach only 3 eigenvalue problems can be solved in parallel at each site. However, while sweeping through the unit cell in the sequential approach, initial guesses for solving the infinite geometric sums can easily be generated from the previous iterations, and are usually much better than the initial guesses in the parallel algorithm. Likewise, the updated C̃(k) obtained at site k is a very good initial guess for its recalculation with an updated environment at site k + 1. Overall, the parallel update is still much cheaper per unit-cell update, albeit by less than one might expect.

Algorithm 3: sequential variational uMPS algorithm for multi-site unit cells (relevant steps)
7: Calculate the ground state Ã_C of the effective Hamiltonian H_AC(n) to precision ε_H < ε_prec using an iterative eigensolver, calling ApplyHAC(·, H_AC(n)) from Algorithm 1, 5 or 6.
8: Calculate the ground state C̃_L of the effective Hamiltonian H_C(n − 1) to precision ε_H < ε_prec using an iterative eigensolver, calling ApplyHC(·, H_C(n − 1)) from Algorithm 1, 5 or 6. To ensure gauge consistency, employ a phase convention for C̃_L.
9: Calculate the ground state C̃_R of the effective Hamiltonian H_C(n) to precision ε_H < ε_prec using an iterative eigensolver, calling ApplyHC(·, H_C(n)) from Algorithm 1, 5 or 6. To ensure gauge consistency, employ a phase convention for C̃_R.
10: Calculate the new Ã_L from Ã_C and C̃_R using (20) or (22), depending on the singular values of C̃_R.
11: Calculate the new Ã_R from Ã_C and C̃_L using (20) or (22), depending on the singular values of C̃_L.
12: Evaluate the new L(n) and R(n) from (18a) and (18b).

Algorithm 4: parallel variational uMPS algorithm for multi-site unit cells (relevant steps)
7: Calculate the ground state Ã(n)_C of the effective Hamiltonian H_AC(n) to precision ε_H < ε_prec using an iterative eigensolver, calling ApplyHAC(·, H_AC(n)) from Algorithm 1, 5 or 6.
8: Calculate the ground state C̃(n) of the effective Hamiltonian H_C(n) to precision ε_H < ε_prec using an iterative eigensolver, calling ApplyHC(·, H_C(n)) from Algorithm 1, 5 or 6.
11: Calculate the new Ã(n)_L from Ã(n)_C and C̃(n) using (20) or (22), depending on the singular values of C̃(n).
12: Calculate the new Ã(n)_R from Ã(n)_C and C̃(n − 1) using (20) or (22), depending on the singular values of C̃(n − 1).
13: Evaluate the new L(n) and R(n) from (18a) and (18b).

Table III. Pseudocode for the two multi-site unit cell implementations described in Sec. II E. Algorithm 3 sweeps through the unit cell and sequentially updates tensors site by site, replacing updated tensors in all unit cells before moving on to the next site. Algorithm 4 updates the entire unit cell at once by independently updating the tensors on each site.
On the other hand, state convergence in terms of iterations is generally substantially faster in the sequential approach. This seems reasonable, as the optimization on a current site takes into account all previous optimization steps, whereas in the parallel approach, the optimizations on different sites within one iteration are independent of each other. This effect gets amplified with increasing unit cell size N , and the performance of the parallel approach decreases, while the performance of the sequential approach seems more stable against increasing N .
In conclusion, while updating the entire unit cell is computationally cheaper in the parallel approach, the sequential algorithm usually requires a substantially smaller number of iterations due to faster convergence. While there are instances where one approach clearly outperforms the other by far, such cases are rare and strongly depend on initial conditions, and generally both approaches show comparable performance. For comparison benchmark results see Sec. III B 5.

III. TEST CASES AND COMPARISON
In this section we test the performance of the new algorithm on several paradigmatic strongly correlated lattice models in the thermodynamic limit, with nearest neighbor as well as long range interactions. In Sec. III A we introduce and discuss the models under consideration. In Sec. III B we then test the convergence and stability of the single-site and multi-site implementations of the new algorithm. Lastly, we compare its performance against established conventional MPS methods for ground state search in Sec. III C.

A. Models
As examples of spin chain models with nearest neighbor interactions we study the spin S = 1/2 transverse field Ising (TFI) model and the XXZ model for general spin S. Here X, Y and Z are spin-S representations of the generators of SU(2). The ground state energy is known exactly for the TFI model,38 and for S = 1/2 also for the XXZ model.39 For the S = 1 XXZ model we focus on the isotropic antiferromagnetic case ∆ = 1 and take the result of Ref. 27 for the ground state energy at D = 1024 as a quasi-exact reference. As a further example of a system with nearest neighbor interactions we also study the Fermi-Hubbard model, where c_σ,j and c†_σ,j are annihilation and creation operators of electrons with spin σ on site j, and n_σ,j = c†_σ,j c_σ,j and n_j = n_↑,j + n_↓,j are the particle number operators. Again, the exact ground state energy is known.40,41 Finally, as an example of an exactly solvable model with (algebraically decaying) long range interactions we consider the Haldane-Shastry model,42,43 where X, Y and Z are again spin S = 1/2 representations of the generators of SU(2). In order to efficiently compute the terms of the effective Hamiltonian (see Appendix C 1), we expand the distance function f(n) = n^−2 in a sum of K = 20 exponentials, with a maximum residual of less than 10^−6 for a fit over N = 1000 sites.
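The exponential expansion of the interaction profile can be illustrated with a naive linear least-squares fit over fixed, log-spaced decay rates (our own simplification for illustration; the actual fit, which also optimizes the rates themselves, reaches the quoted maximum residual below 10^−6 with K = 20):

```python
import numpy as np

# Fit f(n) = 1/n^2 on n = 1..N by sum_i c_i * lam_i**n with K fixed,
# log-spaced decay rates lam_i = exp(-mu_i); the coefficients c_i then
# follow from a linear least-squares problem.  A dedicated fit would
# optimize the rates as well, improving the residual considerably.
N, K = 1000, 20
n = np.arange(1, N + 1)
f = 1.0 / n**2
mu = np.logspace(-3, 1, K)          # decay rates between 1e-3 and 10
lam = np.exp(-mu)
M = lam[None, :] ** n[:, None]      # (N, K) design matrix
c, *_ = np.linalg.lstsq(M, f, rcond=None)
residual = np.max(np.abs(M @ c - f))
```

Even this crude version captures the algebraic tail with a small number of exponentials, which is what makes long range interactions tractable in the effective Hamiltonian.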

B. Performance Benchmarks
We performed convergence benchmarks for several instances of the models introduced in the previous section, using simple implementations of VUMPS for single and multi-site unit cells as presented in Algorithms 2, 3 and 4, without explicitly exploiting any symmetries. First, we consider the error in the variational energy density as a function of the number of iterations, where e_exact is the exact analytic (or quasi-exact numerical) ground state energy density of the model under consideration. Notice that at the point where the energy has already converged to machine precision, the gradient is still quite far from zero, and the state is thus still some distance from the variational optimum.

VUMPS as formulated in Table II has its internal convergence measure ε_prec, used to determine when to stop the iteration loop, as well as to set the tolerance of the iterative solvers used within every single outer iteration. However, as a more objective quantity measuring the distance to the variational minimum, we also compute the norm of the energy gradient; it is an absolute measure of convergence, independent of any prior iterations, as opposed to relative changes in e.g. the energy or the Schmidt spectrum between iterations. We denote this quantity as ‖B‖, the two-norm of a D × d × D tensor B, which can be worked out explicitly (see Appendix A 3).

The efficient and accurate computation of the gradient norm is further discussed in Appendix A 4. To obtain the energy gradient norm ‖B‖ of an N-site unit cell, it is equivalent to determine the gradients B(k) for each site independently and to calculate the norm of the concatenation of all N gradients. A well-known property of the variational principle is that the energy expectation value converges quadratically faster than the state: when the state has converged to some accuracy ‖B‖, the energy density has already converged to precision O(‖B‖²), which can therefore be well beyond machine precision. The convergence measure ‖B‖ does however dictate the convergence of other observables which are not diagonal in the energy eigenbasis. Note that we are here referring to convergence towards the value at the variational optimum, not towards the exact value. The error between the variational optimum and the exact ground state can be quantified using e.g. the energy variance or, in the context of DMRG, the truncation error; both quantities are also discussed in Appendix A 4. We show results for the truncation error further below in Sec. III B 4, and when comparing VUMPS to IDMRG and ITEBD in Sec. III C.

Out of the gapped systems, only the antiferromagnetic ground state of (c) physically breaks translation invariance, by spontaneously breaking the Z_2 spin-flip symmetry; we therefore choose a two-site unit cell in this case. The critical systems show no physical spontaneous symmetry breaking. For uMPS ground state approximations, however, it is often energetically beneficial to artificially break symmetries (which are restored in the limit of infinite bond dimension). In all three cases, the optimal uMPS ground state approximation artificially breaks an SU(2) symmetry and develops antiferromagnetic order, breaking translation invariance. We therefore choose a two-site unit cell for the Hubbard model (e) and the Haldane-Shastry model (f).
In the case of the Heisenberg antiferromagnet (d), translation invariance can be restored through a unitary transformation rotating every second spin by π around the z-axis, which transforms H_XXZ(∆) → −H_XXZ(−∆); the artificially symmetry-broken ground state then becomes ferromagnetically ordered along the x and y directions. We can therefore choose a single-site unit cell for (d), and the staggered magnetization along z is then zero. A similar approach could be used to restore translation invariance for the gapped antiferromagnet (c) and the Hubbard model (e) as well, but we refrain from doing so for purposes of demonstration. To summarize, we used the single-site Algorithm 2 for (a), (b) and (d), and the sequential Algorithm 3 with a two-site unit cell for (c), (e) and (f). For a comparison between the sequential and parallel approaches see Sec. III B 5.

General Convergence
Above all we observe that VUMPS shows unprecedentedly fast convergence, both in the energy density e and in the norm ‖B‖ of its gradient, and excellent accuracy of the final ground state approximation. Observe in Fig. 1 that in all cases the energy is already converged to machine precision after O(10-50) iterations, while the state is still quite some distance from the variational optimum, according to the gradient norm ‖B‖. Optimizing further, this quantity can also be converged essentially to machine precision (even in the presence of small Schmidt values), while the energy virtually no longer changes. The resulting final state then corresponds to the variationally optimal state for the given bond dimension. This is very useful in cases where the quantum state itself is required to high precision, e.g. when used as a starting state for real time evolution, or as a starting point to compute excited states and scattering thereof.[44][45][46] The Schmidt spectrum of the ground state of the S = 1 Heisenberg antiferromagnet at D = 120, converged to gradient norm ‖B‖ < 10^−15, is depicted in Fig. 2. It can be seen that the degeneracies are reproduced perfectly to the same precision, without explicitly exploiting any symmetries in the implementation of the algorithm.
In cases where the final desired bond dimension D_final is not known beforehand, one can successively enlarge a state of some small initial bond dimension every few iterations, until the state fulfills the desired criteria: current bond dimension above some threshold, truncation error (see below) or smallest Schmidt value below some threshold, etc. This strategy is particularly useful in implementations exploiting physical symmetries of the system, such as conservation of magnetization or particle number, as the correct number and size of the required symmetry sectors in the MPS tensors is generally not known beforehand.47 If, on the other hand, D_final is known beforehand, it generally appears to be more efficient to start immediately from an initial state with D = D_final; the gain in computational time due to the cheaper initial iterations at small bond dimension is usually outweighed by the considerable number of additional iterations required. Nevertheless, for some hard problems (e.g. the Hubbard model), stability and convergence speed can profit from a strategy of sequentially increasing the bond dimension from some small initial value.
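The enlargement step itself can be realized in several ways; one simple choice (entirely our own sketch, with hypothetical names) is to embed the current tensors into larger ones padded with small random entries, so that subsequent iterations can populate the new variational directions:

```python
import numpy as np

def expand_bond(A, C, D_new, eps=1e-10, rng=None):
    """Embed a uMPS tensor A of shape (D, d, D) and bond matrix C of
    shape (D, D) into bond dimension D_new > D, padding with small
    random noise (scale eps) so the new directions are not exactly
    zero and can be activated by later optimization steps."""
    rng = np.random.default_rng() if rng is None else rng
    D, d, _ = A.shape
    A_new = eps * rng.standard_normal((D_new, d, D_new))
    C_new = eps * rng.standard_normal((D_new, D_new))
    A_new[:D, :, :D] = A   # old tensor sits in the upper-left block
    C_new[:D, :D] = C
    return A_new, C_new
```

Up to the eps-sized noise, the embedded state represents the same physical state, so the optimization continues smoothly from where it left off.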
To conclude the discussion of general convergence, we plot in Fig. 3 the evolution of the Schmidt spectrum, the gradient norm ‖B‖ and various observables vs. iteration number during a ground state optimization of the TFI model (28) in the ferromagnetic phase at h = 0.45. During the simulation we used a sequence of bond dimensions D = 9, 19, 33, 55, where we started from a random initial state with D = 9 and increased the bond dimension to the next value as soon as ‖B‖ dropped below 10^−14. We chose this set of bond dimensions in order not to cut any degenerate multiplets of Schmidt values (see also Sec. III B 3). It can be seen that the high-lying Schmidt values converge quite quickly, while most of the computational time goes into converging the low-lying Schmidt values. Moreover, there is considerable rearrangement of the small Schmidt values every time the gradient norm ‖B‖ reaches a local maximum during a phase of non-monotonous evolution (see also the next subsection). Lastly, the evolution of the errors of the local observables X and Z makes apparent that they require a substantially higher bond dimension of D = 55 to reach the same accuracy as the energy, which is already correct to machine precision at D = 19.

Different Regimes of Gradient Norm Convergence
Depending on the complexity of the model, the gradient norm shows a period of irregular non-monotonous behavior before entering a regime of monotonous convergence. This can be understood as the (random) initial state having to adapt its initial structure (e.g. the Schmidt spectrum) to the requirements of the best variational ground state approximation. Monitoring the Schmidt values during this period nicely shows how groups of Schmidt values slightly rearrange into the correct structure (see e.g. Fig. 3). This period is usually more pronounced in critical systems, as can be seen in Fig. 1 (d)-(f), where it takes O(50-100) iterations, and of course strongly depends on the chosen initial state. One could argue that the jumps in parameter space caused by the algorithm during this period are too big for the state to find the correct structure quickly, hindering a fast crossover to the regime of monotonous convergence. However, preconverging the state using smaller steps through parameter space, e.g. by means of imaginary time evolution with moderate time steps, has proven to be even slower in all cases we tried. The best choice is thus still to use VUMPS during the entire optimization process. We want to emphasize that we have never observed stagnation of the algorithm during this initial regime; in all cases the algorithm eventually reached the monotonous regime, and instances where it remains in the irregular regime for an unusually long time are rare and only occur for particularly hard problems.
As soon as the gradient norm reaches the monotonous regime, it always converges exponentially fast. Surprisingly, this is true even for critical systems, where one would in principle expect algebraic convergence. This can be qualitatively understood from the theory of finite entanglement scaling, [48][49][50] which states that the MPS approximation itself introduces a small perturbation away from criticality, and thus a finite gap. However, as VUMPS improves convergence speed over existing methods (see also Sec. III C), it is ideally suited to study critical systems via the theory of finite entanglement scaling, which still requires that one finds the optimal MPS approximation in the first place.

Degenerate Schmidt Values
In the presence of multiplets of degenerate Schmidt values, the convergence rate is severely affected if the smallest few Schmidt values are part of an incomplete multiplet, i.e. if the last multiplet is "cut". In that case the algorithm still shows stable convergence, albeit at a greatly reduced rate; for an example in the TFI model see Fig. 4. This issue can easily be circumvented by ensuring that the smallest few kept Schmidt values complete a multiplet when dynamically increasing the bond dimension, or by choosing a suitable fixed initial bond dimension (reducing it from some given value if necessary).
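A simple heuristic for choosing a cut that respects multiplets might look as follows (a sketch of our own, with an arbitrary relative-gap tolerance; the paper does not prescribe a specific procedure):

```python
import numpy as np

def truncate_keep_multiplets(schmidt, D_max, rel_gap=1e-3):
    """Return the largest D <= D_max such that the cut does not split a
    degenerate multiplet: we only cut between successive Schmidt values
    s[D-1] and s[D] whose relative difference exceeds rel_gap."""
    s = np.sort(np.asarray(schmidt))[::-1]   # descending order
    for D in range(min(D_max, len(s) - 1), 0, -1):
        if (s[D - 1] - s[D]) / s[D - 1] > rel_gap:
            return D
    return min(D_max, len(s))
```

For example, with a spectrum containing a doublet, a triplet and a final doublet, requesting D_max = 4 falls back to D = 2 rather than cutting through the triplet.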

Energy Convergence with Bond Dimension
In a careful MPS study, variational energies obtained for different bond dimensions D are compared in order to extrapolate to the exact D → ∞ limit. This can be done by plotting the energy e(D) against the inverse bond dimension 1/D, fitting a power-law form, and extrapolating to 1/D → 0. In DMRG, another popular measure of the quality of an MPS approximation is the truncation error or discarded weight ρ, defined in Eq. (A28). The variational energy is found to scale linearly with ρ,51,52 and an extrapolation to ρ → 0 is thus generally easier and more stable. For further details on assessing the quality of the ground state approximation we refer to Appendix A 4.
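Both extrapolation schemes can be illustrated on synthetic data (the assumed power-law form and all numbers below are purely illustrative, not results from the paper):

```python
import numpy as np

e_inf_true = 0.25 - np.log(2.0)       # exact Heisenberg value, ~ -0.443147
Ds  = np.array([16., 32., 64., 128., 256.])
rho = 1e-3 * Ds**-2.0                 # synthetic truncation errors
e_D = e_inf_true + 0.05 * Ds**-2.0    # synthetic energies: assumed power law

# (i) linear extrapolation in the truncation error: e = e_inf + k * rho
k, e_inf_rho = np.polyfit(rho, e_D, 1)

# (ii) power-law extrapolation in 1/D: scan the exponent b and, for each
#      b, solve the then-linear fit e = e_inf + a * D**-b; keep the best b.
best = (np.inf, None)
for b in np.linspace(0.5, 4.0, 176):       # grid includes b = 2 exactly
    X = np.vstack([np.ones_like(Ds), Ds**-b]).T
    coef, *_ = np.linalg.lstsq(X, e_D, rcond=None)
    r = np.sum((X @ coef - e_D)**2)
    if r < best[0]:
        best = (r, coef[0])
e_inf_D = best[1]
```

On real data the linear fit in ρ is typically the more stable of the two, as the text notes.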
We show an example of both extrapolation schemes for the isotropic S = 1/2 Heisenberg antiferromagnet in Fig. 5. The exact ground state energy is e_exact = 1/4 − ln 2 ≈ −0.443147. Comparing to e_exact, we observe that the truncation-error extrapolation e_T has an error ∆e_T ≈ 3 × 10^−9, while the bond-dimension extrapolation e_D has an error ∆e_D ≈ 8 × 10^−10.

Multi Site Unit Cell Implementations
Lastly we discuss and compare the performance of the sequential and parallel algorithms for multi-site unit cells presented in Sec. II E. As the two methods differ in their convergence with the number of iterations, as well as in the computational effort for each iteration of updating the entire unit cell, we compare the rate of convergence with absolute computing time t in seconds. To that end we only time operations that are absolutely necessary for each algorithm, i.e. we do not time measurements, data storage etc. We further start from the same (random) initial state and keep the bond dimension fixed throughout the entire simulation for both methods to make the simulations as comparable as possible. All calculations are performed using a non-parallelized MATLAB implementation on a single core of a standard laptop CPU.
We find that for gapped systems the sequential approach generally outperforms the parallel approach, while for critical systems no definite statement can be made. There are instances where one algorithm takes substantially longer than the other to reach the regime of monotonous convergence, but such cases are rare and strongly depend on the model and the initial state. In Fig. 6 we show two extreme examples of such behavior for the critical Hubbard model (30) with U = 5, D = 65 and an N = 2 site unit cell. In the left plot the sequential approach is clearly more efficient than the parallel approach, while in the right plot the opposite is the case; the only difference between the two cases is the chosen initial state. Overall, however, both approaches show comparable performance, with the sequential approach appearing to be slightly more stable and reliable in the cases we considered. Once both algorithms are in the monotonous regime, convergence speed in terms of absolute computing time is also similar, with the sequential approach generally taking longer per iteration, but the parallel approach generally requiring more iterations to reach convergence.

C. Comparison with IDMRG and ITEBD
We further benchmark the performance of VUMPS against a standard two-site IDMRG implementation5,6,24 and a standard two-site ITEBD implementation,26,36 and compare the rate of convergence of the energy error ∆e and of the gradient norm ‖B‖ between the three methods. For VUMPS, we solve the effective eigenvalue problems in each iteration to precision ε_H = ε_prec/100, with ε_prec the current precision according to (24). For IDMRG we solve the effective two-site eigenvalue problem in each iteration to precision ε_H = (1 − F)/100, with F the current orthogonality fidelity (see Sec. III.A of Ref. 24). For ITEBD we employ a fourth-order Suzuki-Trotter decomposition and measure every 10 time steps. We use a sequence of time steps δt ∈ {10^−1, 10^−2, 10^−3, 10^−4, 10^−5, 10^−6}, where we decrease the time step as soon as the change in Schmidt values per unit of imaginary time drops below a certain threshold. Naturally, the strategy of time step reduction should be optimized carefully for each model under consideration; here, however, we choose the same strategy for all example cases to maintain comparability.
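For reference, a standard fourth-order Suzuki-Trotter composition (the Forest-Ruth/Yoshida scheme; we do not know which specific fourth-order decomposition the benchmark implementation uses) can be sketched and checked on a toy two-term Hamiltonian:

```python
import numpy as np

def expmh(H, t):
    """exp(t*H) for a Hermitian matrix H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(t * w)) @ V.conj().T

# toy split Hamiltonian H = A + B (Pauli X and Z; purely illustrative)
A = np.array([[0., 1.], [1., 0.]])
B = np.array([[1., 0.], [0., -1.]])

def S2(t):
    """second-order symmetric Trotter step exp(tA/2) exp(tB) exp(tA/2)"""
    return expmh(A, t / 2) @ expmh(B, t) @ expmh(A, t / 2)

# Forest-Ruth / Yoshida: compose three second-order steps into a
# fourth-order one; the coefficients satisfy 2*w1 + w0 = 1.
w1 = 1.0 / (2.0 - 2.0 ** (1.0 / 3.0))
w0 = 1.0 - 2.0 * w1

def S4(t):
    return S2(w1 * t) @ S2(w0 * t) @ S2(w1 * t)

def err(t):
    """deviation from the exact propagator; should scale as O(t^5)"""
    return np.linalg.norm(S4(t) - expmh(A + B, t))
```

Halving the time step should reduce the local error by roughly a factor of 2^5 = 32, which is what makes small δt so expensive when high precision is required.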
We also explicitly calculate all necessary quantities for obtaining a truly variational energy (e.g. by reorthogonalizing the unit cell) and for measuring the energy gradient, even if these quantities are not necessary for the respective algorithm itself.
As the three methods differ quite substantially in the number of iterations required for convergence, as well as the computational effort for each iteration, we again compare convergence against absolute computing time t in seconds, where we only time operations that are absolutely necessary for each algorithm and we do not time measurements, data storage, reorthogonalizing, etc. We further start from the same (random) initial state and keep the bond dimension fixed throughout the entire simulation for all three methods to make the simulations as comparable as possible. Again, all calculations were performed using a non-parallelized MATLAB implementation on a single core of a standard laptop CPU.
We show example comparisons for two gapped and two critical models in Fig. 7, similar to the cases studied in the previous section. Specifically, we show results for (a) the gapped isotropic S = 1 Heisenberg antiferromagnet, i.e. the S = 1 XXZ model (29) at ∆ = 1, (b) the S = 1/2 XXZ model (29) in the gapped symmetry broken antiferromagnetic phase at ∆ = 2, (c) the critical isotropic S = 1/2 Heisenberg antiferromagnet, i.e. the S = 1/2 XXZ model (29) at ∆ = 1, and (d) the critical Fermi Hubbard model (30) at U = 5 and half filling. We plot the energy error ∆e on the left and the gradient norm B on the right, vs. absolute computing time t in seconds. For VUMPS we used a single-site unit cell for (a) and (c), and a two-site unit cell for (b) and (d).
Above all we observe that VUMPS clearly outperforms both IDMRG and ITEBD by far, both in convergence speed and in accuracy of the final state, especially for critical systems. In all shown cases, the final energy errors ∆e of the three algorithms differ only by a few percent; VUMPS however always yields the best variational energy, often already after a few seconds, and thus converges in energy much faster than IDMRG or ITEBD; in the case of critical systems even by orders of magnitude. Observe that especially for the two critical systems (c) and (d), a large part of the computational time of IDMRG and ITEBD goes into converging the last few digits of the energy (see also the insets in Fig. 7). For (d) in particular, the final energy error obtained by IDMRG is still almost 10% higher than the value obtained by VUMPS.
In terms of convergence of the energy gradient ‖B‖, we observe that IDMRG and ITEBD perform quite poorly. Surprisingly, IDMRG usually stagnates at some value ‖B‖ > 10^−7. ITEBD on the other hand would in principle be capable of converging ‖B‖ essentially to machine precision, albeit at prohibitively long simulation times, as the limiting factor appears to be the Trotter error, requiring very small time steps; we therefore only reach values of ‖B‖ ≳ 10^−10 with ITEBD within reasonable simulation times.53 VUMPS on the other hand is always capable of converging ‖B‖ essentially to machine precision, and does so, contrary to the other methods, exponentially fast and with unprecedented speed. For instance, in the case of the Hubbard model in example (d), ITEBD only reached a gradient norm of ‖B‖ = 1.3 × 10^−7 after ≈ 60 hours of absolute computing time, while VUMPS already reached this value after only ≈ 30 seconds, and converged further to ‖B‖ < 10^−14 in ≈ 90 seconds. IDMRG on the other hand stagnated at the quite high value ‖B‖ ≈ 2 × 10^−4.

Observables
We also measure and compare the regular and staggered (averaged) magnetizations m_r and m_s of the final state after convergence for the Hubbard model in example (d), as in this case all three methods use a two-site unit cell. The exact ground state is SU(2) symmetric and thus has zero magnetization; a finite-D ground state approximation however artificially breaks this symmetry. The final values of the regular magnetization m_r are zero to machine precision for both VUMPS and IDMRG, but m_r = 8 × 10^−12 for ITEBD. The staggered magnetization is m_s = 0.011162 for VUMPS, m_s = 0.080768 for IDMRG, and m_s = 0.034797 for ITEBD. The staggered magnetization is thus smallest for the final state obtained from VUMPS; for IDMRG it is highest, but there the regular magnetization is zero, which in turn is finite for ITEBD. This result is not surprising, as VUMPS yields the best variational state of the three methods.

Truncation Error
As a last figure of merit, popular in DMRG studies as a measure of the quality of the MPS ground state approximation and used for extrapolations to the exact infinite D limit, we also calculate the truncation error or discarded weight ρ of the final state, defined in Eq. (A28). In the case of the Hubbard model in example (d), we obtain a truncation error ρ = 2.54438 × 10 −6 from IDMRG, and a slightly lower ρ = 2.45138 × 10 −6 from VUMPS.

IV. CONCLUSION AND OUTLOOK
We have introduced a novel algorithm for calculating MPS ground state approximations of strongly correlated one dimensional quantum lattice models with nearest neighbor or long range interactions, directly in the thermodynamic limit. It combines ideas from conventional DMRG and tangent space methods by variationally optimizing a uniform MPS through successive solutions of effective eigenvalue problems. The algorithm can easily be implemented by extending an existing single-site (I)DMRG implementation with routines for i) calculating effective Hamiltonian contributions from infinite environments and ii) solving an effective "zero site" eigenvalue problem in addition to the usual single-site problem. The new algorithm is free of any ill-conditioned inverses and therefore does not suffer from small Schmidt values, contrary to other tangent space methods such as TDVP. Additionally, as it does not rely on imaginary time evolution, it is especially fit for studying systems with long range interactions.

Figure 7. VUMPS reaches convergence orders of magnitude faster than IDMRG or ITEBD, especially for critical systems. Notice also that, while ∆e differs only by a few percent between the different algorithms (up to ≈ 10% for (d)), VUMPS always manages to also converge ‖B‖ essentially to machine precision, whereas IDMRG and ITEBD stagnate at substantially higher values, remaining quite far from the variational optimum.
We described and benchmarked implementations for uniform MPS with both single-site and multi-site unit cells. We observed that the new algorithm clearly outperforms existing methods such as IDMRG and ITEBD, both in convergence speed and in accuracy of the final state at convergence. The energy converges with unprecedented speed after O(10-50) iterations, even in critical systems (where this is orders of magnitude faster than conventional methods). The algorithm further proceeds to converge the state to the variational optimum by minimizing the energy gradient essentially to machine precision; it does so exponentially fast, even for critical systems, contrary to other methods. The new algorithm is thus the perfect choice for studying critical systems. Additionally, a state converged to the variational optimum is particularly useful in cases where the quantum state itself is required to high precision, e.g. when used as a starting state for real time evolution or for a variational calculation of elementary excitations.[44][45][46] It is straightforward to include physical symmetries that come with good quantum numbers (such as conserved magnetization or particle number) after a proper definition of a symmetric uniform MPS unit cell, in which absolute (diverging) values of these quantum numbers are replaced by densities. All steps of the algorithm then immediately carry over to MPS tensors with good quantum numbers. Symmetric ground states obtained this way are an excellent starting point for obtaining elementary excitations with well defined quantum numbers following Ref. 44, which for instance enables one to target elementary excitations that lie within a multi-particle continuum.47

Within the same framework it is also very natural to recover real or imaginary time evolution, by replacing the effective Hamiltonian ground state problems (23a) and (23b) with small finite time evolution steps, which yields the thermodynamic-limit version of the time evolution algorithm presented in Ref. 28. This enables one e.g. to efficiently study the real time evolution of quantum states in systems with long range interactions in the thermodynamic limit.54,55

We believe that the ideas presented in this paper should be relevant for other classes of tensor network states as well. Specifically, in the case of projected entangled pair states (PEPS), designed to capture ground states of two-dimensional quantum lattices, the further development of efficient variational algorithms, as an alternative to ITEBD-inspired approaches, is still much desired.56,57 In particular, our results motivate the search for (approximate) canonical forms for PEPS, which would enable a translation of the VUMPS algorithm to the two-dimensional setting.

If, in the case of nearest neighbor interactions, we only substitute h → h̃ in the construction of H_L and H_R, but not in the local terms, we will have E_AC = 2E_C = 2e.32 Further approximations to comparable accuracy can be made within the construction of the effective Hamiltonians, e.g. when determining H_L and H_R to precision ε_S. There, the approximations R̃ = CC† and L̃ = C†C for the true L and R needed for some of these operations are good enough if ε_S is roughly of the same order of magnitude as

Appendix A: Variational Principle on Manifolds
The variational principle in quantum mechanics characterizes the ground state of a given Hamiltonian H as the state |Ψ⟩ which minimizes the normalized energy expectation value E = ⟨Ψ|H|Ψ⟩ / ⟨Ψ|Ψ⟩. If (typically for computational reasons) we only have access to a subset of Hilbert space, the variational principle still provides a way to find an approximation to the true ground state, namely by solving the minimization problem within the restricted set. If this subset is a linear subspace spanned by a number of basis vectors {|i⟩, i = 1, . . . , N}, we obtain the generalized eigenvalue problem ∑_j ⟨i|H|j⟩ c_j = E ∑_j ⟨i|j⟩ c_j. This is known as the Rayleigh-Ritz method, and by orthonormalizing the basis it clearly amounts to projecting the full time-independent Schrödinger equation into the variational subspace.
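As a concrete illustration, the Rayleigh-Ritz procedure can be sketched in a few lines of Python; the Hamiltonian and the (non-orthogonal) basis below are random stand-ins, not taken from the paper:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Full Hamiltonian on a 50-dimensional Hilbert space (Hermitian stand-in).
dim = 50
H = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
H = (H + H.conj().T) / 2

# Non-orthogonal variational basis {|i>, i = 1..N} with N = 8 vectors.
N = 8
V = rng.normal(size=(dim, N)) + 1j * rng.normal(size=(dim, N))

# Generalized eigenvalue problem: sum_j <i|H|j> c_j = E sum_j <i|j> c_j.
Heff = V.conj().T @ H @ V          # <i|H|j>
Neff = V.conj().T @ V              # Gram (overlap) matrix <i|j>
ritz_vals, _ = eigh(Heff, Neff)

# The variational principle: the lowest Ritz value upper-bounds the
# true ground state energy.
E0_exact = eigh(H, eigvals_only=True)[0]
assert ritz_vals[0] >= E0_exact - 1e-12
```

After orthonormalizing the basis (Neff = 1), the same call reduces to an ordinary eigenvalue problem in the variational subspace.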
If, more generally, we have a variational ansatz |Ψ(A)⟩ which depends analytically on a number of complex parameters, as encoded in the complex vector A, a variational minimum |Ψ(A*)⟩ is characterized by a vanishing gradient of the energy expectation value, i.e. ⟨∂_i Ψ(A*)| (H − E) |Ψ(A*)⟩ = 0 for all i. (A1) Eq. (A1) can be interpreted as a Galerkin condition: it forces the residual (H − E)|Ψ⟩ of the full Schrödinger eigenvalue equation (which does not have an exact solution in the variational subset) to be orthogonal to the space spanned by the states |∂_i Ψ(A*)⟩. If the variational subset is a manifold, these states can be interpreted as a basis for the tangent space of the manifold at the point of the variational optimum. Hence, geometrically, the residual has to be orthogonal to the manifold (i.e. to its tangent space) at the point of the variational optimum. Interpreting Eq. (A1) as a Galerkin condition on the ground state eigenvalue problem is useful because it generalizes to other eigenvalue problems which do not necessarily have a variational characterization (and thus no gradient), for example when the operator is non-Hermitian. Indeed, an approach similar to the one developed here was described for finding fixed points of transfer matrices, encoded as matrix product operators, in Ref. 58. However, before discussing Eq. (A1) in the context of MPS, let us conclude this section by relating it to the time-dependent variational principle (TDVP). 27 Geometrically, the TDVP also amounts to an orthogonal projection of the equation of motion (the time-dependent Schrödinger equation) onto the tangent space of the variational manifold. In the case of imaginary time evolution, it can be written as ∑_j g_{ī,j} (dA_j/dτ) = −⟨∂_i Ψ(A)| (H − E) |Ψ(A)⟩, (A2) where g_{ī,j} = ⟨∂_i Ψ(A)|∂_j Ψ(A)⟩ is the Gram matrix of the tangent vectors and thus the metric of the manifold. The right hand side of Eq. (A2) is again the gradient of the objective function, and the TDVP will thus converge when it reaches a variational optimum A* where Eq. (A1) is satisfied.
However, the metric g_{ī,j} on the left hand side shows that the TDVP equation is not a plain gradient flow, but rather a covariant gradient flow that takes the geometry of the manifold and its embedding into Hilbert space into account. We can thus also associate a quantum state with the gradient, which is given by ∑_{i,j} |∂_i Ψ⟩ g^{i,j̄} ⟨∂_j Ψ|(H − E)|Ψ⟩ = P_{T|Ψ⟩M} (H − E)|Ψ⟩, where we have omitted the arguments A and Ā, g^{i,j̄} is the inverse of the metric, and P_{T|Ψ⟩M} is the projector onto the tangent space at the point |Ψ⟩ in the variational manifold M. This latter expression is only valid when the variational parameters are proper coordinates for the manifold (i.e. a bijective mapping). While this is not the case for MPS because of gauge freedom (see below), the geometrical interpretation of the Galerkin condition remains valid; the correct expression for the MPS tangent-space projector will be discussed in more detail in the following section. Independent of whether a variational algorithm is based on a gradient flow, the Hilbert space norm of the gradient ‖P_{T|Ψ⟩M}(H − E)|Ψ⟩‖ provides an objective measure for the convergence of the state towards the variational optimum. Note that this is different from the standard Euclidean norm of the naive gradient vector with components ⟨∂_i Ψ|(H − E)|Ψ⟩.

Manifold of Uniform MPS and its tangent space
We have already introduced the set of uniform MPS and some of its properties in Sec. II A. For completeness, we restate the definition as |Ψ(A)⟩ = ∑_{{s}} v_L† [∏_{n∈Z} A^{s_n}] v_R |{s}⟩. Such states can be rigorously defined in the thermodynamic limit and correspond to the so-called purely generated finitely correlated states of Ref. 7. However, for all practical purposes, we can interpret this state as a large but finite MPS, which happens to be uniformly parameterized (and therefore translation invariant) in the bulk.
We have now introduced boundary vectors v_L, v_R living at ±∞. They represent just one way of closing the matrix product near the boundaries, but any other choice is equally fine and will not affect the bulk properties, provided that the MPS tensor A has the property of injectivity. 10 This condition is generically fulfilled and is in one-to-one correspondence with the MPS transfer matrix having a unique eigenvalue of largest magnitude, with corresponding eigenvectors that can be reinterpreted as full rank positive D×D matrices. A proper normalization of the state is obtained by rescaling the tensor A such that this largest magnitude eigenvalue is 1. We can then use gauge invariance to transform the tensor A into the left and right canonical representations (2) discussed in the main text. These representations are themselves related via the bond matrix C as A^s_L C = C A^s_R, which allows one to write the state in the mixed canonical representation (5) familiar from DMRG. We can also make contact with the representation commonly used in TEBD 25,26 by additional unitary gauge transformations: we make C diagonal by considering the SVD C = U λ V† and the gauge transforms Ã^s_L = U† A^s_L U and Ã^s_R = V† A^s_R V. The singular values λ are then the Schmidt values of a bipartition of the state, and in this gauge the fixed points become L = R = λ². It was proven in Ref. 59 that the set of injective uMPS constitutes a (complex) manifold. We now construct the tangent space projector P_{|Ψ(A)⟩} ≡ P_{T|Ψ(A)⟩M} for a uMPS |Ψ(A)⟩ in the thermodynamic limit.
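The normalization and left canonical gauge described above can be sketched numerically. The following Python fragment is a minimal illustration: a random tensor stands in for A, and simple power iteration stands in for a proper iterative eigensolver. It rescales A so that the dominant transfer matrix eigenvalue is 1 and gauges the tensor into left canonical form:

```python
import numpy as np

rng = np.random.default_rng(1)
D, d = 6, 2
A = rng.normal(size=(D, d, D)) + 1j * rng.normal(size=(D, d, D))
As = [A[:, s, :] for s in range(d)]

def left_action(l):
    # Left action of the transfer matrix: (l| -> sum_s A^s† l A^s.
    return sum(M.conj().T @ l @ M for M in As)

# Power iteration for the dominant left fixed point (a positive matrix).
l = np.eye(D, dtype=complex)
for _ in range(1000):
    l = left_action(l)
    lam = np.linalg.norm(l)
    l /= lam

# Normalize the state: rescale A so the dominant eigenvalue becomes 1.
As = [M / np.sqrt(lam) for M in As]
l = (l + l.conj().T) / 2           # the fixed point is Hermitian positive

# Gauge to left canonical form: with l = L†L, the transformed tensor
# A_L^s = L A^s L^{-1} satisfies sum_s A_L^s† A_L^s = 1.
L = np.linalg.cholesky(l).conj().T
Linv = np.linalg.inv(L)
AL = [L @ M @ Linv for M in As]
assert np.allclose(sum(M.conj().T @ M for M in AL), np.eye(D), atol=1e-6)
```

In production code one would replace the power iteration by an Arnoldi eigensolver and avoid the explicit inverse, but the gauge logic is the same.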
Let us first discuss generic tangent vectors |Φ⟩ in this tangent space. To define them via the partial derivatives of |Ψ(A)⟩, we require that the latter is represented using a uniform parameterization or gauge choice of the tensor A throughout. By applying the chain rule, the derivative gives rise to a uniform superposition of states where a single A tensor is replaced by a new tensor B. 13,59 But it is clear that we can afterwards change gauges again and absorb the gauge factors in the tensor B that parameterizes the tangent vector, so as to obtain the most general tangent vector representation |Φ(B)⟩. The multiplicative gauge freedom of the MPS translates into an additive gauge freedom in the tangent space, i.e. a transformation B^s → B^s + A^s_L X − X A^s_R with X ∈ C^{D×D} leaves |Φ(B)⟩ invariant, as can readily be verified by explicit substitution. We can exploit these gauge degrees of freedom to impose e.g. the left tangent space gauge condition ∑_s (A^s_L)† B^s = 0, or the analogous right gauge condition ∑_s B^s (A^s_R)† = 0. Strictly speaking, these conditions can only be imposed for tangent vectors |Φ(B)⟩ ⊥ |Ψ(A)⟩. This is no restriction, as we can always evaluate the contribution in the direction of |Ψ(A)⟩ separately. Under either of these two gauge constraints, we indeed have ⟨Ψ(A)|Φ(B)⟩ = 0, but more importantly, the overlap between two tangent vectors simplifies to ⟨Φ(B′)|Φ(B)⟩ = |Z| ∑_s tr[(B′^s)† B^s]. (A5) This corresponds to a Euclidean inner product for the B tensors and thus to an orthonormal basis for the tangent space. The diverging factor |Z| arises because a tangent vector contains a sum over all lattice sites, i.e. its norm is extensive. Fortunately, this diverging factor drops out in all computations. Given Eq. (A5), we need to derive the explicit form of the projector onto the tangent space, a derivation that was written down in Ref. With the additional constraint, the solution is still straightforward. It can easily be verified that the correct value of the Lagrange multiplier is such that the additional term acts as a projection if we would have chosen the right gauge of Eq. (A11).
By inserting the solution for B back into Eq. (A9), we can read off the tangent space projector. While the value of B depends on the gauge condition, the resulting projector is of course gauge independent. We can represent the tangent space projector in terms of partial projectors P_L(n) and P_R(n), and verify that P_{|Ψ(A)⟩}² = P_{|Ψ(A)⟩} by using

P_L(m) P_L(n) = P_L(max(m, n)), (A17a)
P_R(m) P_R(n) = P_R(min(m, n)). (A17b)

Gradient and Effective Hamiltonians
As discussed in the beginning of this section, a variational optimum can be characterized geometrically by the condition P_{|Ψ(A)⟩}(H − E)|Ψ(A)⟩ = 0. Applying the tangent space projection of the previous section to the state |Ξ⟩ = H|Ψ(A)⟩ gives rise to a tangent vector of the form of Eq. (A9), with a tensor B built from new tensors A′_C and C′, where A′_C originates from applying P_{A_C}(n) and C′ from applying P_C(n). By writing |Ψ(A)⟩ itself in a compatible gauge for every individual term, we obtain diagrammatic expressions showing that A′_C and C′ are obtained by acting with the effective Hamiltonians H_AC and H_C introduced in (9) and (10) in the main text onto A_C and C. Even without subtracting the energy, the two choices of B (which are related by the additive gauge transform with X = C′) will be finite in the thermodynamic limit. However, the individual tensors A′_C and C′ will have a divergent contribution proportional to A_C and C, respectively. Indeed, as discussed in the main text for the case of nearest neighbor interactions, the effective Hamiltonians H_AC and H_C have a divergent contribution corresponding to the total energy times the identity operator. It is thus by subtracting H → H̃ = H − E (or h → h̃ = h − e for the local terms) that these divergences are canceled. Appendix C provides a detailed description of the construction of the effective Hamiltonians for other types of interactions and illustrates explicitly that the diverging contributions cancel exactly.
A variational extremum is characterized by |Φ(B)⟩ = 0, which leads to B = 0 for either gauge choice, as these choices completely fix the gauge freedom. This gives rise to the simultaneous conditions A′_C = A_L C′ = C′ A_R. However, because the gauge transformation that relates A_L and A_R is unique up to a factor (for injective MPS), C′ and C have to be proportional, and we have actually obtained the eigenvalue equations H_AC A_C = E_AC A_C and H_C C = E_C C. As we are looking for a variational minimum, the eigenvalues E_AC and E_C should be the lowest eigenvalues of the effective Hamiltonians H_AC and H_C. Depending on how we have regularized the divergent contributions, these eigenvalues might differ. If we have completely subtracted the energy expectation value from every term, we have E_C = E_AC = 0.

Convergence and error measures
While neither VUMPS nor IDMRG or ITEBD directly uses the gradient itself, the Hilbert space norm of |Φ(B)⟩ can be used as an objective convergence measure indicating how far the current state is from the variational optimum. For either choice of B, we obtain ‖|Φ(B)⟩‖ = √N ‖B‖, with N the diverging number of sites and ‖B‖ the 2-norm of the tensor B. Its square is given by ‖B‖² = ‖A′_C‖² − ‖C′‖² = ‖A′_C − A_L C′‖² = ‖A′_C − C′ A_R‖², where the equalities follow from C′ = ∑_s (A^s_L)† A′^s_C = ∑_s A′^s_C (A^s_R)†. Note that none of these expressions is well suited for numerically evaluating the norm, as they involve subtracting quantities that become almost equal when the state is close to convergence.
An alternative strategy for evaluating ‖B‖ uses the matrix notation for tensors of Eq. (17). Since A_L is an isometry, we can extend it to a dD × dD unitary matrix U = [A_L N_L], where N_L contains an orthonormal basis for the (d − 1)D-dimensional null space of A_L†, i.e. A_L† N_L = 0. As the 2-norm is unitarily invariant, we can write ‖B‖ = ‖U† B‖ = ‖N_L† B‖, where the second equality follows from A_L† B = 0 together with the null space property of N_L. We then obtain ‖B‖ as the Frobenius norm of a single matrix, which can be calculated accurately as a sum of strictly positive numbers. For further reference, we reshape N_L into a D × d × (d − 1)D tensor and similarly introduce a right null space tensor N_R associated with A_R. While the norm of the gradient provides a measure for the quality of approaching the variational minimum, it does not provide any information about the quality of the (u)MPS approximation to the true ground state itself. In the context of (two-site) DMRG schemes, a popular measure is the truncation error, as it is naturally accessible throughout the algorithm (see e.g. Refs. 12, 51, and 52). But also within VUMPS we can compute this quantity, by first writing the state |Ψ(A)⟩ in a mixed canonical form with a two-site center block. The two-site center tensor A^{ss′}_{2C} = A^s_L C A^{s′}_R (known as the two-site wave function ψ^{ss′} in standard DMRG) has an associated effective Hamiltonian H_{A2C}. We can compute its lowest eigenvector Ã_2C and compute its singular value decomposition (by first reshaping it to a dD × dD matrix) Ã^{ss′}_{2C} = U^s S V^{s′}. The truncation error then corresponds to the discarded weight when truncating the inner bond dimension of this two-site tensor to its original value D.
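The numerically stable evaluation of ‖B‖ via the null space can be illustrated as follows; the isometry A_L and the tensor B are random stand-ins, and the null space N_L is obtained here from a QR decomposition:

```python
import numpy as np

rng = np.random.default_rng(2)
D, d = 4, 2

# Random left isometry A_L (dD x D) and null space N_L (dD x (d-1)D),
# obtained here from the QR decomposition of a random square matrix.
Q, _ = np.linalg.qr(rng.normal(size=(d * D, d * D)))
AL, NL = Q[:, :D], Q[:, D:]
assert np.allclose(AL.conj().T @ NL, 0)          # A_L† N_L = 0

# A stand-in gradient tensor B in the left gauge, i.e. with A_L† B = 0.
B = NL @ rng.normal(size=((d - 1) * D, D))

# Unitary invariance of the 2-norm: ||B|| = ||U† B|| = ||N_L† B||,
# evaluated as the Frobenius norm of a single matrix.
norm_direct = np.linalg.norm(B)
norm_stable = np.linalg.norm(NL.conj().T @ B)
assert np.isclose(norm_direct, norm_stable)
```

The stable variant sums strictly positive contributions and therefore avoids the cancellation problem of the difference-based expressions near convergence.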
A more generic measure for the error in the variational approximation is given by the energy variance (per site) ⟨Ψ(A)|(H − E)²|Ψ(A)⟩/N. The advantage of this representation is again that it facilitates the calculation of the norm, as ‖|Φ⁽²⁾(B₂)⟩‖² = N ‖B₂‖², with N the diverging number of sites. The projection of (H − E)|Ψ(A)⟩ onto this space can be worked out similarly as for the tangent space, and leads to a general result valid for any Hamiltonian. We now also apply the completeness relation (A^s_R)† A^t_R + (N^s_R)† N^t_R = δ_{s,t} 1 to the right hand side and recognize A′^s_C, which at the variational minimum yields the expression for B₂. We can also relate ‖B₂‖² to the truncation error defined in the previous paragraph. For the truncation error arising in the context of two-site DMRG schemes, the lowest eigenvector Ã_2C of the two-site effective Hamiltonian is used. When the DMRG algorithm has converged, the rank-D approximation of Ã_2C should again be the original two-site center tensor A^{st}_{2C} = A^s_L C A^t_R. But this means that we can construct N_L and N_R exactly from the singular vectors corresponding to the (d − 1)D singular values that were truncated away, and thus that the truncation error is given by ε = ‖∑_{st} (N^s_L)† Ã^{st}_{2C} (N^t_R)†‖². This definition is close to ‖B₂‖², except that the latter uses the tensor A′_2C arising from applying the two-site effective Hamiltonian once. As A_2C and Ã_2C are anyway close, we can think of A′_2C as providing the leading order correction from A_2C to Ã_2C in the sense of a Krylov scheme. Indeed, in the first iteration of the Lanczos method, the eigenvector Ã_2C would be approximated in the form α A_2C + β A′_2C. Since the first term drops out when projecting onto N_L and N_R, the DMRG truncation error and ‖B₂‖² will be of the same order of magnitude. Note, however, that ‖B₂‖² only captures the full energy variance (per site) for nearest neighbor Hamiltonians, whose action on |Ψ(A)⟩ is completely contained within the space of two-site variations, as noticed above.
In that case, we can see that the only term that survives in A′_2C after projection onto N_L and N_R is the local term, where h̃ acts on the two-site center tensor; we can thus also write ‖B₂‖ using only this local contribution. We can also relate this to the truncation step in (I)TEBD, where we would apply exp(−Δt h) to every two-site block of the state. The resulting truncation would lead to a discarded weight of order Δt² ‖B₂‖². The considerations regarding the projection of (H − E)|Ψ(A)⟩ onto the space of two-site variations can also be used to devise a scheme for expanding the bond dimension of the uMPS. This approach is presented in the next section.

Appendix B: Dynamic Control of the Bond Dimension
A characteristic feature of two-site implementations of conventional MPS methods, such as (I)TEBD or (I)DMRG, is that the bond dimension D of the MPS is automatically increased in every iteration and has to be truncated in order to remain at a finite maximum bond dimension. This truncation step lies at the basis of why such schemes for finding ground states never truly converge to the variational minimum up to machine precision, as observed in the results. Indeed, even in finite size simulations, two-site DMRG is typically used to initialize the state and one-site DMRG to obtain final convergence. However, the truncation step in two-site methods has the advantage that the bond dimension can be dynamically increased (or decreased) according to some quality constraint, such as the magnitude of the smallest Schmidt value or the discarded weight. Especially in the presence of symmetry, this is important to automatically obtain the correct symmetry sectors within the virtual MPS space.
The VUMPS algorithm presented in the main text is variational from the start and therefore works at fixed bond dimension D, i.e. it is a one-site scheme in DMRG terminology. Alternative subspace expansion strategies for dynamically increasing the bond dimension in such one-site schemes have been proposed. 22,60 These methods use information from acting with the global Hamiltonian onto the current state to either add a tiny perturbation to the current MPS or to generate a larger basis in which the effective eigenvalue problem is solved.
We have developed a similar subspace expansion technique that works for a uMPS in the thermodynamic limit. It is based on projecting the full action of the Hamiltonian (H − E)|Ψ(A)⟩ onto the space of two-site variations, as developed in the previous section, where we found the corresponding representation. Even when we have not yet reached the variational minimum, the first term (on line 1) or the first two terms (on line 2) are captured in the tangent space, and only the last term (on either line) contains a new search direction. To capture it completely, we would need to expand the bond dimension from D to dD. If we instead want to expand to a new dimension D̃ = D + ΔD, we can use a singular value decomposition to compute the rank-ΔD approximation of ∑_{st} (N^s_L)† A′^{st}_{2C} (N^t_R)† = U S V. By keeping only the ΔD largest singular values, U and V are left and right isometries of size (d − 1)D × ΔD and ΔD × (d − 1)D, respectively. As remarked in the previous section, in the case of nearest neighbor interactions, the projection of A′_2C onto N_L and N_R does not require the full two-site effective Hamiltonian but reduces to the local term.
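The expansion step can be sketched as follows; the projected two-site tensor is replaced by a random stand-in matrix, since only the linear algebra of the truncated SVD and the enlarged isometry is illustrated here:

```python
import numpy as np

rng = np.random.default_rng(3)
D, d, dDelta = 4, 2, 2            # expand the bond dimension by ΔD = 2

# Left isometry A_L and its null space N_L, as in the previous appendix.
Q, _ = np.linalg.qr(rng.normal(size=(d * D, d * D)))
AL, NL = Q[:, :D], Q[:, D:]

# Stand-in for the projected two-site tensor sum_st N_L† A'_2C N_R†.
M = rng.normal(size=((d - 1) * D, (d - 1) * D))

# Rank-ΔD approximation: keep only the ΔD largest singular values
# (numpy returns them in descending order).
U, S, Vh = np.linalg.svd(M)
U = U[:, :dDelta]

# Enlarged left isometry: the added columns N_L U stay orthogonal to A_L,
# so the result is again an isometry with bond dimension D + ΔD.
AL_new = np.hstack([AL, NL @ U])
assert np.allclose(AL_new.conj().T @ AL_new, np.eye(D + dDelta), atol=1e-10)
```

Because the new columns live in the null space of A_L†, the enlarged tensor leaves the current state invariant and only opens up new variational directions.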
We do not directly update the current MPS, but rather write it in an expanded basis in a mixed canonical form with suitably enlarged matrices. With these initial tensors, we can now start a new iteration of VUMPS. Note that we can straightforwardly update the environments used to construct the effective Hamiltonians into this expanded basis, which is necessary if we want to use them as initial guesses.

Appendix C: Explicit Construction of Effective Hamiltonians
In this section we describe how to efficiently apply the effective Hamiltonians H_AC and H_C onto the center site tensor A^s_C and the bond matrix C, and how the necessary individual terms are explicitly constructed. Such a procedure is needed for solving the effective eigenvalue problems (23a) and (23b) by means of an iterative eigensolver. The case of systems with nearest neighbor interactions has already been discussed in Sec. II B. In the following we consider Hamiltonians with long range interactions in Sec. C 1 and general Hamiltonians given in terms of Matrix Product Operators (MPOs) in Sec. C 2.
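As a minimal illustration of the interface such a procedure has to provide, the following Python sketch solves an effective eigenvalue problem with an iterative eigensolver that only accesses the Hamiltonian through matrix-vector products; the dense random matrix is a stand-in for the actual contractions with H_AC:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

rng = np.random.default_rng(5)
D, d = 8, 2
n = d * D * D                      # dimension of the A_C eigenproblem

# Dense Hermitian stand-in for H_AC; the real algorithm never forms it.
Hdense = rng.normal(size=(n, n))
Hdense = (Hdense + Hdense.T) / 2

def apply_HAC(v):
    # In the real algorithm this would contract H_L, H_R and the local
    # terms with the reshaped tensor A_C at O(D^3) cost per call.
    return Hdense @ v

Heff = LinearOperator((n, n), matvec=apply_HAC, dtype=np.float64)
E, AC = eigsh(Heff, k=1, which='SA')   # lowest eigenpair only

assert np.isclose(E[0], np.linalg.eigvalsh(Hdense)[0])
```

Only `apply_HAC` changes between the nearest neighbor, long range, and MPO cases; the surrounding eigensolver call stays identical.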

Long Range Interactions
Consider Hamiltonians with long range interactions of the form H = ∑_{j∈Z} h_j, where h_j is itself an infinite sum h_j = ∑_{n>0} f(n) o_j o_{j+n}. The generalization to Hamiltonians containing several terms of this form is straightforward. Furthermore, we assume distance functions f(n) that are bounded in the sense of ∑_{n>0} |f(n)| < ∞, such that ‖h_j‖ < ∞, and that can be well approximated by a sum of K exponentials, i.e. f(n) ≈ ∑_{k=1}^{K} c_k λ_k^n
with |λ_k| < 1 and n > 0. In practice, for an infinite system, we fit f(n) with a suitable number K of exponentials over a distance N large enough that f(N) and the largest fit residuals are below some desired threshold. Examples of Hamiltonians that fall in this class are the transverse field Ising (TFI) and XXZ models with power-law interactions, 62-64 as well as the famous Haldane-Shastry model, 42,43 for which the ground state is exactly known.
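Such a fit can be sketched with a standard nonlinear least squares routine; the target f(n) = 1/n³, the number of exponentials K, and the bounds below are illustrative choices, not values from the paper:

```python
import numpy as np
from scipy.optimize import least_squares

# Target distance function: a power law f(n) = 1/n^3 on n = 1..100.
n = np.arange(1, 101)
f = 1.0 / n**3

K = 4  # number of exponentials in the ansatz f(n) ≈ sum_k c_k λ_k^n

def residual(p):
    c, lam = p[:K], p[K:]
    return (c[None, :] * lam[None, :] ** n[:, None]).sum(axis=1) - f

# Initial guess and box bounds keeping every λ_k inside the unit interval.
p0 = np.concatenate([np.ones(K), np.array([0.2, 0.5, 0.8, 0.95])])
lo = np.concatenate([-10 * np.ones(K), np.zeros(K)])
hi = np.concatenate([10 * np.ones(K), 0.999 * np.ones(K)])
sol = least_squares(residual, p0, bounds=(lo, hi))
c, lam = sol.x[:K], sol.x[K:]

assert np.all(np.abs(lam) < 1)                      # every mode decays
assert np.max(np.abs(residual(sol.x))) < 1e-2       # fit quality
```

Increasing K systematically lowers the largest residual, at the price of one additional environment term per exponential in the effective Hamiltonians below.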
Similar to the case of nearest neighbor interactions in Sec. II B, the effective Hamiltonians factorize into a number of terms which can all be applied efficiently. For H_AC there are five such terms, four of which are already familiar from the case of nearest neighbor interactions. Two of these are the left and right block Hamiltonians H_L and H_R, with infinitely many local contributions from h_j acting on sites strictly to the left or right of the current center site, and the other two are the terms containing interactions between the center site and the left and right blocks, respectively, i.e. where h_j partially acts on A_C. For long range interactions we have one additional term, containing infinitely many interaction terms between the left and the right block only, without involving the center site, i.e. terms where o_j acts to the left of the current center site and o_{j+n} acts to the right.
To construct all these terms we start by defining the operator transfer matrices (C2). The current energy density expectation value e = ⟨Ψ(A)|h_j|Ψ(A)⟩ can then be written as in (C3), or, using (C2), as a collection of geometric series (C4). Since |λ_k| < 1, the geometric series converge and we can perform them explicitly. We proceed by defining the corresponding left and right partial contractions in (C6). These terms can either be calculated recursively, by explicitly evaluating the geometric sums term by term until convergence, or more efficiently by iteratively solving the corresponding systems of linear equations. We represent these terms by diagrams and collect all of them into single left and right environment contributions O_L and O_R (C8), and further into (h_L| and |h_R) (C9). We can then write the energy density in terms of these environments, and comparing with (C4) fixes their definitions. With these definitions at hand, we can write the left and right block Hamiltonians in exactly the same form as Eq. (13) for the case of nearest neighbor interactions, but with different (h_L| and |h_R). We can thus evaluate the geometric sums recursively or by solving a linear system iteratively, as explained in Sec. II B. Note that we again start by applying an energy shift (h_L| → (h̃_L| (and similarly for |h_R)), such that (h̃_L|R) = (L|h̃_R) = 0.
We are now ready to formulate the action of H_AC onto A^s_C explicitly. The additional factor of λ_k in the sum in the last term arises because A^s_C adds an additional site between the left and right operators o. Similarly, the action of H_C onto C follows. If additional simple single-site or nearest neighbor two-site terms are present in the Hamiltonian, appropriate terms as described in Sec. II B can be added. A pseudocode summary for obtaining the necessary explicit terms of H_AC and H_C for Hamiltonians with long range interactions, and for their application onto a state, as required for solving the effective eigenvalue problems using an iterative eigensolver, is given in Table IV.

General Hamiltonians in MPO Form

Consider now general Hamiltonians given in terms of a matrix product operator (MPO), H = ŵ_L (∏_{j∈Z} Ŵ[j]) ŵ_R, where Ŵ[j] contains the operators acting on site j only and ŵ_L and ŵ_R are operator valued boundary vectors.
An example of such an MPO decomposition is given in (C15) for the Transverse Field Ising (TFI) Hamiltonian with exponentially decaying long range interactions. An application of the MPO transfer matrices T^{[W]}_{L/R} onto the quasi fixed points will accumulate an additional term contributing to the extensive global energy expectation value. Similar to the terms in Eq. (13) or Eq. (C12) in the previous cases involving infinite geometric sums, we can however safely discard these diverging contributions, which is equivalent to setting the energy expectation values of the semi-infinite left and right halves of the system to zero (see below).
In the following we briefly reiterate the procedure of Ref. 69. Without loss of generality, we assume Ŵ to be of lower triangular form, i.e. Ŵ_ab = 0 for all b > a. Furthermore, we assume the typical case of any nonzero diagonal elements being proportional to the identity, i.e. Ŵ_aa = λ_a 1, where λ_a ≤ 1 and usually λ_1 = λ_{d_W} = 1, as is e.g. the case in (C15). The quasi fixed points can then be determined component by component (a = 1, . . . , d_W). A concrete evaluation (see below) of the discarded terms in these cases shows that they correspond to contributions to the energy density expectation value, i.e. discarding them is equivalent to a constant shift in energy such that the energy density is zero; for the components a = 1 and a = d_W the fixed point relations then only hold up to an additive diagonal correction. These corrections correspond to the energy density expectation value e = (Y_{L,1}|R) = (L|Y_{R,d_W}) (C27) and can in fact be used for its evaluation. As a concrete example, these corrections can be evaluated explicitly for the long range TFI Hamiltonian given by the MPO (C15).

The application of the necessary terms of H_AC and H_C can be performed in O(d d_W D³) + O(d² d_W² D²), respectively O(d_W D³), operations. In total we also have to perform an iterative inversion for each diagonal element of Ŵ.
This framework is very flexible, general and powerful once a routine for determining quasi fixed points of general MPO transfer matrices has been implemented. The effective Hamiltonians of Sec. II B and Sec. C 1 are contained within it as special cases. A pseudocode summary for obtaining the necessary explicit terms of H_AC and H_C for Hamiltonians given in terms of an MPO, and for their application onto a state, as required for solving the effective eigenvalue problems using an iterative eigensolver, is presented in Table V.
Appendix D: Evaluation of Infinite Geometric Sums

Expressions of the form ∑_{n≥0} T^n typically arise in situations where one sums up contributions of successive applications of the transfer matrix T onto some fixed virtual boundary vector x, with the initial contribution being the boundary vector x itself. This is reflected in the above expression by summing from n = 0 and using the definition T⁰ = 1. We assume a spectral decomposition of the transfer matrix given by T = ∑_j λ_j |j)(j|, where the left and right eigenvectors are mutually orthonormal, i.e. (j|k) = δ_{jk}. Note that T is in general not Hermitian and thus (j| ≠ |j)†. For a generic injective normalized state, T has a unique eigenvalue of largest magnitude given by λ_0 = 1, whereas all other eigenvalues lie strictly inside the unit circle (|λ_{j>0}| < 1).
We can safely perform the geometric sum for all eigenvalues |λ_{j>0}| < 1, while λ_0 = 1 contributes a formally diverging term proportional to |0)(0|; the remaining finite part is identified with the second term in (D5). Usually it is not necessary to calculate the full matrix expression of ∑_n T^n, but only to act with it onto some (x| or |x). The diverging contributions can typically be safely discarded, as they correspond to a constant (albeit infinite) offset of some extensive observable (e.g. the Hamiltonian). The action of the finite remaining part can be calculated efficiently by iteratively solving linear systems of the type (1 − T)|y) = Q|x) or (y|(1 − T) = (x|Q, with inhomogeneities Q|x) and (x|Q, where Q = 1 − |0)(0| projects out the dominant eigenvectors. One can then efficiently compute |y) and (y| by employing an iterative Krylov subspace method such as bicgstab 70 or gmres. 71 For such methods only the implementation of a (left or right) action of (1 − T) onto a vector is necessary, which can be done efficiently with O(dD³) operations. If the transfer matrix is in left or right canonical form, we recover the linear systems in Eqs.
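The regularized linear problem can be illustrated on a small example. Below, T is the transfer matrix of a random left-canonical MPS (so λ_0 = 1 exactly, with left eigenvector vec(1)); for this small size we solve the regularized dense system directly and check it against the term-by-term geometric sum, whereas in practice one would use gmres or bicgstab with only matrix-vector products:

```python
import numpy as np

rng = np.random.default_rng(4)
D, d = 4, 2

# Random left-canonical MPS tensor: Q†Q = 1 gives sum_s A^s† A^s = 1.
Qfull, _ = np.linalg.qr(rng.normal(size=(d * D, D)))
As = [Qfull[s * D:(s + 1) * D, :] for s in range(d)]

# Dense transfer matrix on vec(X): T vec(X) = vec(sum_s A^s X A^s†).
T = sum(np.kron(M, M.conj()) for M in As)

# The dominant eigenvalue is exactly 1; the left eigenvector (0| is vec(1).
w, V = np.linalg.eig(T)
i0 = int(np.argmax(np.abs(w)))
assert np.isclose(w[i0], 1.0)
r = V[:, i0]                       # right eigenvector |0)
l = np.eye(D).reshape(-1)          # left eigenvector (0|
P0 = np.outer(r, l) / (l @ r)      # spectral projector |0)(0|
Qp = np.eye(D * D) - P0            # Q projects out the diverging mode

x = rng.normal(size=D * D)
b = Qp @ x

# Finite part of sum_n T^n Q|x): solve the regularized, non-singular
# system (1 - T + |0)(0|) y = Q|x), whose solution satisfies (0|y) = 0.
# (For large D one solves (1 - T)|y) = Q|x) with gmres/bicgstab instead,
# using only matrix-vector products with T at O(dD^3) cost.)
y = np.linalg.solve(np.eye(D * D) - T + P0, b)

# Cross-check against the geometric series summed term by term.
y_sum, term = np.zeros(D * D, dtype=complex), b.astype(complex)
for _ in range(5000):
    y_sum += term
    term = T @ term
assert np.allclose(y, y_sum, atol=1e-6)
```

The recursive summation converges only linearly in the subdominant eigenvalue magnitude, which is why the Krylov space linear solvers mentioned above are preferred in practice.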