Rényi free energy and variational approximations to thermal states

We propose the construction of thermodynamic ensembles that minimize the Rényi free energy, as an alternative to Gibbs states. For large systems, the local properties of these Rényi ensembles coincide with those of thermal equilibrium, and they can be used as approximations to thermal states. We provide algorithms to find tensor network approximations to the 2-Rényi ensemble. In particular, a matrix-product-state representation can be found by using gradient-based optimization on Riemannian manifolds, or via a non-linear evolution which yields the desired state as a fixed point. We analyze the performance of the algorithms and the properties of the ensembles on one-dimensional spin chains.


I. INTRODUCTION
From the point of view of thermodynamics, thermal states describe the equilibrium properties of a system. Given a Hamiltonian H, the Gibbs state ρ_G(β) = e^{−βH}/Z describes the state of the system at inverse temperature β. On the other hand, thermal states arise from the principle of maximum entropy [1,2]: for a given energy, the thermal ensemble is the one that maximizes the von Neumann entropy S_G(ρ) = −Tr(ρ log ρ). Equivalently, this can be formulated as the minimization of the free energy, ρ_G(β) = argmin_{ρ ≥ 0, Tr ρ = 1} [Tr(Hρ) − S_G(ρ)/β], for some fixed value of β. To keep the notation light, we do not explicitly write the dependence of ρ_G on this parameter. One should keep in mind that the minimum is taken with respect to density operators, i.e. positive-semidefinite operators ρ ≥ 0 with a chosen normalization, typically Tr(ρ) = 1. This optimization is not very convenient in practice, since the entropy S_G is often difficult to compute, as it requires information about the entire spectrum of ρ. In a quantum many-body setting, this would require diagonalizing an exponentially large operator, because of the inherent tensor-product structure of the Hilbert space.
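For small systems, the variational characterization above can be checked directly by exact diagonalization. The following sketch (illustrative only, not from the paper; all names are ours) builds the Gibbs state and verifies that it attains a lower free energy Tr(Hρ) − S_G(ρ)/β than a competing density operator:

```python
import numpy as np

def gibbs_state(H, beta):
    """Gibbs state exp(-beta*H)/Z via exact diagonalization."""
    w, V = np.linalg.eigh(H)
    p = np.exp(-beta * (w - w.min()))  # shift for numerical stability
    p /= p.sum()
    return V @ np.diag(p) @ V.conj().T

def free_energy(H, rho, beta):
    """F(rho) = Tr(H rho) - S_G(rho)/beta, with S_G the von Neumann entropy."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-14]
    S = -np.sum(p * np.log(p))
    return np.real(np.trace(H @ rho)) - S / beta

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 6))
H = (M + M.T) / 2               # a random Hermitian "Hamiltonian"
beta = 1.2

rho_G = gibbs_state(H, beta)
rho_mix = np.eye(6) / 6         # maximally mixed comparison state
# the Gibbs state minimizes the free energy over all density operators
assert free_energy(H, rho_G, beta) <= free_energy(H, rho_mix, beta) + 1e-12
```

The assertion is exactly the variational principle stated in the text, restricted to one competitor state.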
In the quantum many-body setting, numerical approaches to thermal equilibrium do not try to solve the optimization above explicitly, but resort to different strategies to approximate Eqs. (1). Monte Carlo methods use sampling to estimate the physical properties of Eqs. (1) very efficiently, but they encounter difficulties in scenarios where a sign problem appears, as can happen for fermionic models or frustrated systems. A different approach is based on tensor networks (TNs), where the total state corresponds to the contraction of low-rank tensors, which allows for a local description of the physics. This is motivated by the fact that thermal states of a local Hamiltonian obey an area law for the mutual information [3,4], and hence there is strong theoretical evidence that a tensor network description should be efficient at approximating thermal states [4][5][6][7][8].
In practice, TNs are extremely successful for studying thermal equilibrium. In one spatial dimension, matrix product states (MPSs) can be used to construct a representation of the (mixed) Gibbs state [9][10][11][12][13] or, combined with sampling, to construct minimally-entangled thermal states [14,15]. Alternatively, the partition function can be represented as a two-dimensional TN, and its contraction can be approximated using tensor renormalization group approaches, for instance, as originally proposed in Refs. [16][17][18]. The algorithms can also be generalized for two-dimensional systems [19][20][21].
In this paper, we study alternative thermodynamic ensembles that, instead of the von Neumann entropy, maximize the α-Rényi entropy [22], S_α(ρ) = log[Tr(ρ^α)]/(1 − α), at a fixed energy. In the limit α → 1, S_α reduces to the von Neumann entropy. By replacing the von Neumann entropy in Eq. (2) by a Rényi entropy, we define the Rényi free energy f_α(ρ) = Tr(Hρ) − S_α(ρ)/β_α. We would like to stress that, in general, the extremizer ρ_α of this function is not the thermal ensemble. However, as we will show in Sec. II, this ensemble nonetheless reproduces all local expectation values in the thermodynamic limit. The parameter β_α is not, in general, related to the conventional inverse temperature β, but should be treated as a constant for the optimization. From a TN perspective, the definition in Eq. (5) offers the possibility of directly performing a minimization, since the Rényi entropies in Eq. (4) are efficiently computable, at least for small integer values of α. In this paper, we analyze the properties of such ensembles, in particular, how they approximate the thermal properties, and present several variational algorithms which can be used to compute them. (Preprint: arXiv:2012.12848v2 [quant-ph], 7 Jun 2021.)
For practical purposes, we will often consider the most convenient case α = 2, for which Eq. (5) becomes f_R(ρ) = Tr(Hρ) + log[Tr(ρ²)]/β_R, where the subscript R denotes α = 2. In other words, optimizing Eq. (6) is equivalent to finding the most mixed state at a chosen energy. In applied mathematics, the optimization of such a function is known as a non-linear semidefinite program and can be tackled with interior-point methods. However, in many-body quantum physics, the dimension of ρ increases exponentially with the system size, making such approaches impractical for large systems.
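For a toy spectrum, the α = 2 optimization can be carried out in closed form. As a hedged sketch (the explicit form of Eq. (6) is not reproduced in this text, so we use the closely related quadratic variant f̃_R(ρ) = Tr(Hρ) + Tr(ρ²)/β_R, whose minimizer has the same cutoff structure), the KKT conditions give linear weights below a threshold, fixed by normalization:

```python
import numpy as np

rng = np.random.default_rng(1)
E = np.sort(rng.uniform(-2.0, 2.0, size=16))   # mock energy spectrum
beta_R = 1.0

def f_R(p):
    # quadratic variant of the 2-Renyi free energy on diagonal ensembles
    return np.dot(p, E) + np.dot(p, p) / beta_R

# KKT conditions give p_k = (beta_R/2) * max(gamma - E_k, 0);
# fix gamma by bisection so that the weights sum to one.
lo, hi = E.min(), E.max() + 2.0 / beta_R + 1.0
for _ in range(200):
    gamma = 0.5 * (lo + hi)
    s = 0.5 * beta_R * np.clip(gamma - E, 0.0, None).sum()
    lo, hi = (gamma, hi) if s < 1.0 else (lo, gamma)
p_star = 0.5 * beta_R * np.clip(gamma - E, 0.0, None)
p_star /= p_star.sum()

# this ensemble beats every Gibbs ensemble on the quadratic objective
for beta in np.linspace(0.0, 5.0, 21):
    q = np.exp(-beta * E)
    q /= q.sum()
    assert f_R(p_star) <= f_R(q) + 1e-9
```

The loop checks the "most mixed state at a chosen energy trade-off" claim against a family of Gibbs distributions on the same spectrum.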
This paper is organized as follows. In Sec. II, we provide an analytical solution to Eqs. (6), expressed in the eigenbasis of the Hamiltonian. Since the eigenbasis of a many-body system is not always accessible, we propose an optimization strategy based on uniform MPSs, to approximate the purification of ρ R directly in the thermodynamic limit. This non-linear optimization can be accelerated using state-of-the-art techniques [23] by restricting it to the Grassmann manifold. This is discussed in detail in Sec. III A, and accompanying numerical experiments to benchmark the algorithm are presented. Moreover, we present an alternative technique, based on a non-linear evolution of the density operator in Sec. III B, which flows toward the desired ensemble. To conclude, we discuss possible developments in Sec. IV.

A. Maximal Rényi ensemble
We now show the analytical form of the extremizer of Eq. (5), which has been previously derived for classical distributions [24][25][26]. We can use this result in the quantum case, noticing that the state that minimizes Eq. (5) must be diagonal in the energy eigenbasis {|E_k⟩}, and thus its eigenvalues are equivalent to a probability distribution.
To find the coefficients {p_k} in the density operator ρ = Σ_k p_k |E_k⟩⟨E_k|, ρ ≥ 0, which maximizes the Rényi entropy Eq. (4) under the constraints Tr ρ = 1 and Tr(Hρ) = Ē, we introduce the Lagrange multipliers β_α and γ_α, so that the functional reads L(ρ) = S_α(ρ) − β_α [Tr(Hρ) − Ē] − γ_α [Tr(ρ) − 1]. At the stationary point, the parameter γ_α can be eliminated [24], and we obtain the maximal Rényi ensemble (MRE) ρ_α = (1/Z_α) { Π_{E⊥} [1 − β_α (α − 1)/α (H − Ē)] Π_{E⊥} }^{1/(α−1)}, (8) where Z_α is a normalization factor and Π_{E⊥} = Θ(E⊥ − H) is a projector onto the eigenspaces below a cutoff energy E⊥ := α/[β_α(α − 1)] + Ē [27], with Θ(·) the Heaviside function.
[Fig. 1: (a),(b) Distributions of the canonical and 2-Rényi ensembles, and the same distributions weighted with the corresponding density of states D(E), from the approximation in Ref. [28]. Below, the von Neumann (c) and 2-Rényi (d) entropies for the canonical (solid line) and 2-Rényi (dashed line) ensembles are compared at a given mean energy density, for the same system size and Hamiltonian. In both cases, the asymptotic behaviors lim_{β→0} S_G = lim_{β_R→0} S_R = N log 2 and lim_{β→±∞} S_G = lim_{β_R→±∞} S_R = 0 are recovered. The branch with negative (positive) mean energy density corresponds to β > 0 (β < 0), corresponding to a solution with a projector onto energies below (above) the cutoff energy E⊥.]
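For α = 2, the MRE weights are simply linear in energy below the cutoff. A minimal numerical sketch (mock spectrum and cutoff chosen by us) constructs the distribution and recovers β_R from the cutoff relation E⊥ = Ē + 2/β_R:

```python
import numpy as np

rng = np.random.default_rng(2)
E = np.sort(rng.normal(size=32))       # mock many-body spectrum
E_perp = 0.5                           # chosen cutoff energy

# alpha = 2 MRE: p_k proportional to (E_perp - E_k) * Theta(E_perp - E_k)
p = np.clip(E_perp - E, 0.0, None)
p /= p.sum()                           # 1/Z_2 normalization

E_mean = np.dot(p, E)                  # resulting mean energy Ebar
beta_R = 2.0 / (E_perp - E_mean)       # from E_perp = Ebar + 2/beta_R

assert abs(p.sum() - 1.0) < 1e-12
assert E_mean < E_perp and beta_R > 0  # mean energy lies below the cutoff
assert np.all(p[E >= E_perp] == 0.0)   # no weight above the cutoff
```

This is the β_R > 0 branch; the β_R < 0 branch discussed in the figure caption would instead keep weight above the cutoff.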
To illustrate the behavior of Eq. (8), we show in Fig. 1 some characteristics of the different ensembles in a particular finite case. In Figs. 1(a) and 1(b), we show the distribution of ρ relative to the eigenbasis. The MRE has a distinctive cutoff energy, beyond which the distribution is zero, and it is therefore fairly different from the canonical ensemble. However, in a many-body system, we have to consider that the density of states is not uniform but becomes increasingly peaked in the middle of the spectrum. The distributions, weighted by the density of states, then become much more similar, as seen in Fig. 1(b).
Another way of visualizing the relation between the canonical and the Rényi ensembles is to compare their entropies for the same mean energy Ē. In Figs. 1(c) and 1(d), we explicitly show the comparison of the von Neumann and 2-Rényi entropies for the ensembles that maximize each of them over the whole energy range for a small system size. While the behavior is qualitatively similar, both ensembles only coincide in the limiting cases Ē = 0, when the state is maximally mixed (corresponding to the Gibbs ensemble at infinite temperature, β = 0), and Ē = E_min (E_max), when the ensemble reduces to the ground (maximally excited) state, corresponding to β → +∞ (−∞). To study the behavior at large system sizes, we chose an exactly solvable case, the results of which are shown in Fig. 2. For this Hamiltonian, the density of states becomes Gaussian and the arguments in Appendix A hold. While we expect that local observables for both ensembles coincide as the system size increases, the same need not hold for non-local quantities, such as the entropies. It is interesting to notice that the Rényi ensemble has a von Neumann entropy which approaches that of the Gibbs state, and hence will have a free energy [see Eq. (2)] which increasingly approaches its minimal value. However, the same cannot be said for the Rényi free energy introduced in Eq. (5).

B. Equivalence of local observables
We now consider a one-dimensional quantum system described by a local Hamiltonian H, an operator on the complex Hilbert space H = (C^d)^{⊗N}, i.e. the tensor product of N local Hilbert spaces. The Hamiltonian is restricted to be local, i.e. it can be written in the form H = Σ_n h_n, where each h_n acts non-trivially only on sites n, . . . , n + ℓ − 1, and has finite operator norm. Additionally, we will assume that almost all local terms satisfy ‖h_n‖_op > 0, such that the spectrum of H is extensive. We mostly consider infinitely large systems, but, when considering finite systems, we specify either open boundary conditions (OBC) or periodic boundary conditions (PBC).
In this setting, it is straightforward to see that the density of states has a standard deviation which scales as O(√N). For specific models, such as strictly one-local Hamiltonians, it can be shown that D(E) becomes Gaussian in the thermodynamic limit. Under the assumption of a Gaussian density of states, we can then compute the variance of the energy when we take into account the energy distribution of the ensemble. In the case of the 2-Rényi entropy, it turns out that this can be computed exactly. As described in Appendix A, in both cases the variances (∆H)² scale as O(N). Hence, if we think about the normalized energy spectrum, both distributions will be increasingly peaked around the same Ē = ⟨H⟩, with a standard deviation O(1/√N) for large N. Hence, the expectation values of local observables become equivalent in the thermodynamic limit. This derives from the correspondence between microcanonical and canonical ensembles [29]. While there exist counterexamples to this correspondence, a sufficient condition for it to hold is that the energy per site converges to a constant [30,31]. Note that while this argument has been carried out for a Gaussian density of states, we believe that it can be extended to the general case as long as the Hamiltonian is local.
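The scaling of the width of the density of states is easy to verify on the simplest one-local example: for H = Σ_i σ^z_i the spectrum is E = N − 2k with binomial degeneracies, so D(E) has mean zero and variance exactly N (a check, not from the paper):

```python
from math import comb

# Toy check: for H = sum_i sigma^z_i the spectrum is E = N - 2k with
# degeneracy C(N, k), so the density of states is binomial with
# variance exactly N (standard deviation sqrt(N)).
for N in (4, 8, 16, 32):
    energies = [N - 2 * k for k in range(N + 1)]
    weights = [comb(N, k) / 2**N for k in range(N + 1)]
    mean = sum(w * e for w, e in zip(weights, energies))
    var = sum(w * e**2 for w, e in zip(weights, energies)) - mean**2
    assert abs(mean) < 1e-9
    assert abs(var - N) < 1e-9
```

Per site, the standard deviation of the energy density is therefore √N / N = 1/√N, matching the peaking argument above.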
As a final note, we wish to remark that, at least in the case of α = 2, we find a correspondence β_R → β which holds in the thermodynamic limit. This holds asymptotically for large β_R, and the range of validity of this approximation increases with system size. Hence, the β_R for which the Rényi ensemble has the same energy density Ē as a Gibbs ensemble turns out to be the same as the inverse temperature β. This can be shown in the case of a Gaussian density of states (see Appendix A), and is observed numerically in both integrable and non-integrable models (see Sec. III A). This is somewhat surprising, since a priori there is no connection between the parameters describing the two different ensembles. From a practical point of view, however, this correspondence is convenient to approximate a thermal ensemble, since we may as well take β_R to be the inverse temperature.

III. VARIATIONAL ALGORITHMS FOR APPROXIMATING THE RÉNYI ENSEMBLE
In this section, we introduce two different possibilities to numerically obtain the Rényi ensemble in Eq. (8).
Although we have a closed form for the exact solution, its use in a many-body setting is impractical because it would require knowledge of the full energy eigenbasis or of the projector in Eq. (8). This motivates the formulation of methods compatible with TN techniques. In Sec. III A, we explore how uniform MPSs can be used to form a purification which represents the density matrix, and its individual tensors can be optimized directly by using techniques from Riemannian optimization. In Sec. III B, instead, we propose a non-linear evolution which has Eq. (8) as a fixed point, so any arbitrary state can be brought to the desired one by simulating this evolution for a sufficient amount of time.

A. Minimization on the MPS manifold
The optimization problem in Eqs. (6) can be restricted to the manifold of states described by some class of tensor networks. In particular, matrix product states (MPSs) are arguably the most effective ansatz to represent ground states of local, gapped Hamiltonians in one dimension [5,[32][33][34]. We consider a uniform MPS |Ψ(A)⟩, written in the conventional diagrammatic notation as a chain of copies of a single tensor A (Eq. (13)). Hence, given a local basis {|s_1, . . . , s_N⟩}, with s_n = 1, . . . , d, each A is a rank-3 tensor with a physical index of dimension d and two virtual indices contracted with the neighboring tensors, each of dimension D [35][36][37]. In this section, we focus on uniform MPSs for simplicity, but the method can be applied to finite MPSs as well.
A natural generalization of MPSs to quantum-mechanical operators is a matrix product operator (MPO) [9,10,38], which is composed of rank-4 tensors contracted sequentially (Eq. (14)). The issue with this construction is that it is hard to ensure positivity (if the tensors are over the field C or R), which is a necessary and physical property for objects like density operators. The problem is that positivity is a global property, which cannot be captured in the local tensors [39][40][41]. Although an MPO ansatz has been used successfully to approximate the stationary points of dissipative dynamics [42,43], it is problematic for a variational method, since there is no way to vary the local tensors without compromising positivity. An alternative is to introduce a locally purified state [9,39,44], which guarantees the positivity of the operator for any local tensor. The construction goes as follows. One considers a pure state where each site has twice the degrees of freedom, which we call system and ancilla, so the local tensor carries both a system and an ancilla index (Eq. (15)). By tracing out the ancillary degrees of freedom, we obtain a ladder-like TN which represents the density matrix ρ = Tr_anc |Ψ⟩⟨Ψ| (Eq. (16); in the diagrams, shaded boxes represent complex conjugation). It is simple to see that this TN is positive semidefinite by construction. The price to pay is that we have introduced a non-linearity in ρ with respect to the local tensors A^s, so even if the objective function is quadratic in ρ, as in Eqs. (6), it will be quartic in the local tensors. Hence we cannot use linear algebra to iteratively optimize the local tensors, as in the case of the density matrix renormalization group (DMRG) [36]. Nonetheless, we can consider the problem in Eq. (6) as a non-linear optimization over the tensors of an MPS.
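The built-in positivity of the purification can be made concrete on a small finite chain (an illustrative sketch with our own tensor conventions, not the paper's code): whatever the local tensors are, tracing the ancillas of the pure state always yields a positive-semidefinite ρ.

```python
import numpy as np

# Locally purified state on a small open chain. Each site tensor
# A[l, s, t, r] carries a system leg s and an ancilla leg t; tracing out
# the ancillas yields rho = Tr_anc |Psi><Psi|, PSD for ANY local tensors.
rng = np.random.default_rng(3)
N, d, a, D = 3, 2, 2, 3
bonds = [1, D, D, 1]                   # open boundary conditions
tensors = [rng.normal(size=(bonds[n], d, a, bonds[n + 1]))
           for n in range(N)]

# contract the chain into the full pure state
psi = np.ones((1, 1))
for A in tensors:
    psi = np.tensordot(psi, A, axes=([-1], [0]))
psi = psi.reshape([d, a] * N)          # legs: s1, t1, s2, t2, ...
psi = psi.transpose(list(range(0, 2 * N, 2)) + list(range(1, 2 * N, 2)))
M = psi.reshape(d**N, a**N)            # group system legs vs ancilla legs

rho = M @ M.conj().T                   # trace over the ancillas
rho /= np.trace(rho)                   # normalize
assert np.linalg.eigvalsh(rho).min() > -1e-12   # PSD by construction
```

Writing ρ = M M† makes the positivity argument from the text explicit: the ladder contraction is a Gram matrix.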
The parametrization of the state in Eq. (13) has an inherent redundancy, since we can perform a gauge transformation on the virtual degrees of freedom of the form A^s → X A^s X^{−1}, for any invertible matrix X. This gauge redundancy of the MPS parametrization allows us to choose the tensors to fulfill the left-gauge condition Σ_s (A^s)† A^s = 1. (17) For the rest of this paper, we will often not draw the ancillary degree of freedom, but implicitly assume it is part of the physical leg of each tensor. The (positive-semidefinite) right fixed point of the transfer matrix encodes the Schmidt values [36]. Hence, we can view the tensor A as a linear map from the right virtual leg to the left virtual and physical legs, which is isometric. We will use W : C^D → C^d ⊗ C^D to denote this specific mapping. Alternatively, we can think of W as a matrix, so we can use the notation W†W = 1 to unambiguously specify the isometricity condition.
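The left gauge can be reached in practice with a QR decomposition of the grouped tensor, as in standard MPS canonicalization (a generic sketch, not tied to the paper's implementation):

```python
import numpy as np

# Bring an arbitrary MPS tensor A[l, s, r] into the left gauge. Grouping
# (l, s) into rows gives a (D*d) x D matrix; a thin QR decomposition
# returns an isometry W with W^dagger W = 1.
rng = np.random.default_rng(4)
D, d = 4, 3
A = rng.normal(size=(D, d, D)) + 1j * rng.normal(size=(D, d, D))

W, R = np.linalg.qr(A.reshape(D * d, D))        # W is (D*d) x D
assert np.allclose(W.conj().T @ W, np.eye(D))   # isometry condition

A_left = W.reshape(D, d, D)                     # left-gauged tensor
# left-gauge condition: sum_s (A^s)^dagger A^s = 1
acc = sum(A_left[:, s, :].conj().T @ A_left[:, s, :] for s in range(d))
assert np.allclose(acc, np.eye(D))
```

The residual unitary freedom A^s → U† A^s U mentioned below is exactly a change of basis on the D-dimensional column space of W.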
Hence we can restrict a generic optimization of an MPS to the optimization of tensors over the Stiefel manifold St(Dd, D) = {W ∈ C^{Dd×D} : W†W = 1} [45]. In reality, since there is a unitary freedom remaining in Eq. (17), namely A^s → U† A^s U, one can restrict the manifold even further to the Grassmann manifold. The Grassmann manifold should be understood as a quotient manifold, namely all W satisfying the isometricity condition up to a basis rotation, and it is often denoted as Gr(n, p) = St(n, p)/U(p) [45].
To optimize a generic function f(A) : Gr(Dd, D) → R using any gradient-based method, we must be able to compute the gradient with respect to the parameters in A and project it onto the tangent space of the Grassmann manifold. The optimization of differentiable functions on Riemannian manifolds has been the object of extensive study in mathematics, and recently these techniques have been applied to TNs [23]. To keep this paper self-contained, we summarize the key ingredients of this optimization in Appendix B.
For our application, the objective function is given by Eqs. (6). For the uniform MPS of Eq. (13), it reduces to f(A) = ε + log(η)/β_R, where ε = Tr(Hρ)/N is the energy per site and η = [Tr(ρ²)]^{1/N} is the purity per site. Both these terms are computable with standard TN routines in polynomial time, for uniform MPSs as well as finite MPSs. The gradient of Eq. (19) with respect to A combines the two derivatives ∂ε/∂A and ∂η/∂A; as for Eq. (C1), both quantities are simple to obtain, as described in Appendix C. We thus use this gradient information to perform the optimization on the Riemannian manifold using the l-BFGS algorithm [46,47]. An open-source implementation of the non-linear optimization in Julia is available online [48].
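The purity per site η can be extracted from the leading eigenvalue of a doubled transfer matrix. The sketch below (our own dense construction, practical only for tiny D, whereas the paper's contractions are of course done without ever forming the D⁴ × D⁴ matrix) checks it on the infinite-temperature purification, where ρ = 1/d^N and hence η = 1/d:

```python
import numpy as np

def purity_per_site(A):
    """Leading eigenvalue of the Tr(rho^2) transfer matrix of a uniform
    locally purified MPS with tensor A[l, s, t, r] (s: system, t: ancilla).
    Assumes A is normalized so the norm transfer matrix has eigenvalue 1."""
    D = A.shape[0]
    # contract a bra and a ket layer over the ancilla leg: one rho layer
    M = np.einsum('istj,kptl->ikspjl', A, A.conj())
    # two rho layers with physical legs contracted crosswise for Tr(rho^2)
    T = np.einsum('ikspjl,mopsnq->ikmojlnq', M, M)
    vals = np.linalg.eigvals(T.reshape(D**4, D**4))
    return max(abs(vals))

# infinite-temperature check: system maximally entangled with its ancilla,
# so rho = 1/d^N and the purity per site is eta = 1/d
d = 2
A_inf = np.zeros((1, d, d, 1))
for s in range(d):
    A_inf[0, s, s, 0] = 1.0 / np.sqrt(d)
assert abs(purity_per_site(A_inf) - 1.0 / d) < 1e-12
```

The four virtual legs on each side of T are what drives the elevated contraction cost discussed in the conclusions.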
To conclude, we note that gradient methods cannot guarantee convergence toward the global minimum, but only toward some local minimum. While Eqs. (6) have a unique solution in the cone of positive operators, the same cannot be said on a uniform MPS manifold of fixed bond dimension.

Numerical experiments
For our numerical experiments, we consider the Ising model H = −Σ_i (σ^x_i σ^x_{i+1} + h_z σ^z_i + h_x σ^x_i). (21) When the parallel field vanishes (h_x = 0), the model is integrable, and local observables and correlations have a closed form [49,50]. We use this model to perform the optimization of Eqs. (6) as described in Sec. III A. The parameter β_R is fixed to different values in the interval β_R ∈ [0, 2], and the uniform MPS is optimized until the gradient is sufficiently small [51]. The results of the optimization are shown in Fig. 3, where we plot local observables such as the magnetization ⟨σ^z_i⟩ and the nearest-neighbor correlation Γ as a function of the mean energy density of the ensemble. By increasing the bond dimension, we increase the number of free parameters, and the numerical results converge toward the thermal ones. Additionally, the comparison of the thermal observables obtained by setting β_R = β is shown in Fig. 4. Up to β_R ≲ 2, we observe that there is a correspondence between the two ensembles at β_R = β. For β_R ≳ 2, the optimization of Eq. (19) converges to the ground space exactly, especially at small bond dimensions. To study the physics of low temperatures, it is therefore more convenient to re-express the optimization problem in Eqs. (6) by introducing a Lagrange multiplier λ that penalizes deviations from a target energy (Eq. (22)). The gradient (see Appendix C) can be modified accordingly, and the non-linear optimization can be performed in a similar way. This objective function removes the dependence on β_R, and one can directly choose an energy to target, since lim_{λ→∞} Tr(Hρ*) = Ē. However, if one wishes to explore the behavior of some observable with respect to Ē, it is not necessary to perform the extrapolation λ → ∞; a finite λ is sufficient to obtain an energy in the vicinity of the desired value [52].
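For small chains, the model can be built densely with Kronecker products (the exact couplings and signs of Eq. (21) are not reproduced in this text, so the convention below, with σ^x σ^x coupling, transverse field h_z, and parallel field h_x, is our assumption, chosen to be consistent with the integrability and criticality statements in the text):

```python
import numpy as np

X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])
I2 = np.eye(2)

def op_at(op, n, N):
    """Embed a single-site operator at site n of an N-site chain."""
    out = np.ones((1, 1))
    for m in range(N):
        out = np.kron(out, op if m == n else I2)
    return out

def ising(N, hz, hx):
    """Assumed convention: H = -sum X_i X_{i+1} - hz sum Z_i - hx sum X_i,
    with open boundary conditions; hx is the parallel field."""
    H = np.zeros((2**N, 2**N))
    for n in range(N - 1):
        H -= op_at(X, n, N) @ op_at(X, n + 1, N)
    for n in range(N):
        H -= hz * op_at(Z, n, N) + hx * op_at(X, n, N)
    return H

H = ising(3, hz=0.0, hx=0.0)
assert np.allclose(H, H.T)                              # Hermitian
assert abs(np.linalg.eigvalsh(H).min() + 2.0) < 1e-12   # -(N-1) at zero field
```

At h_x = 0 the model is the transverse-field Ising chain, integrable as stated; the zero-field check confirms the coupling normalization.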
We also wish to remark that the method is completely general and does not depend on whether the system is integrable or not. To complete our benchmarks, we present in Fig. 5 a comparison for the case where a parallel field is introduced, making the system non-integrable. In this case, exact results are not known, but our results are compared to those of an MPS approximation to the Gibbs state purification obtained with a traditional imaginary-time-evolution method [9,53]. Since the model does not have a finite-temperature phase transition, the method will behave similarly for any value of the fields. If one chooses h_x = 0 and h_z = 1, we expect that the required bond dimension increases as β → ∞, as the critical ground state is approached [54][55][56][57]. In this regime, the cost function in Eq. (6) is dominated by the energy term. Hence the algorithm reduces to an energy minimization, and we expect it to behave equivalently to other variational methods, such as the one proposed in Ref. [23].

B. Non-linear evolution
In Ref. [58], the authors introduced a non-linear evolution for approximating the thermal ensemble with Gaussian states. Here we generalize this idea to the Rényi entropies, which gives rise to an evolution that is efficiently computable with TN techniques. We consider a non-linear evolution of a density operator ρ_τ, depending on a real parameter τ and generated by an operator J_τ (Eq. (23)). The operator J_τ can be chosen such that the fixed point of this evolution gives rise to the MRE. For example, choosing J_τ as the gradient of the Rényi free energy, shifted by a multiple of the identity to conserve the trace (Eq. (24)), gives rise to the same density operator as Eq. (6). The proof follows similarly from Ref. [58], and it is sufficient to show that the operator J_τ in Eq. (24) satisfies the following criteria: Tr ρ_τ = 1 ∀τ ∈ R (trace conservation, 25a); ρ_τ ≥ 0 ∀τ ∈ R (positivity conservation, 25b); ∂f_R(ρ_τ)/∂τ ≤ 0 (free-energy decrease, 25c).
Hence, choosing an appropriate initial density operator ρ_0 and integrating Eq. (23) over a sufficiently long interval, we obtain the solution to Eqs. (6), since the free energy can only decrease with time. There is no guarantee of reaching the global minimum, and indeed eigenstates of H do not evolve under Eq. (23), but a random choice of the initial state should be sufficient in most cases. We present some numerical experiments on small system sizes in Fig. 6, where the energy eigenbasis is available. In all cases, the numerically integrated density operator converges to the ensemble in Eq. (8). The evolution is discretized by expanding Eq. (23) to first order in the time step (Eq. (26)). If the time step is chosen to be sufficiently small, this evolution converges to the desired fixed point. This is confirmed by the fact that the Rényi entropy reaches the theoretical maximum for each mean energy, as shown in Fig. 6. As a proof of concept, we also perform the integration using MPSs, in particular using the TDVP scheme [59,60] to update the state at each time step. In practice, however, we observe that the time step required to obtain accurate results scales unfavourably with the system size, and we have yet to fully understand whether the evolution becomes ill-conditioned for large system sizes. Notwithstanding, it is possible that different integration schemes allow for larger time steps without compromising the stability of the evolution. We leave this as an avenue for future work.
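A minimal dense-matrix sketch of such a flow (illustrative only: the precise J_τ of Eq. (24) is not reproduced in this text, so we take J_τ as the gradient of the quadratic form of the α = 2 objective, shifted by a multiple of the identity, and arrange the first-order step in a manifestly positive form):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.normal(size=(4, 4))
H = (M + M.T) / 2                      # mock Hamiltonian
beta_R = 1.0

def f_R(rho):
    # alpha = 2 objective in the quadratic form Tr(H rho) + Tr(rho^2)/beta_R
    return np.trace(H @ rho) + np.trace(rho @ rho) / beta_R

rho = np.eye(4) / 4                    # start from the maximally mixed state
dt = 0.005
f0 = f_R(rho)
for _ in range(4000):
    G = H + 2.0 * rho / beta_R              # gradient of the objective
    J = G - np.trace(G @ rho) * np.eye(4)   # shift fixes the trace to first order
    X = np.eye(4) - dt * J
    rho = X @ rho @ X                  # first-order step, manifestly positive
    rho /= np.trace(rho)               # absorb the O(dt^2) trace drift

assert abs(np.trace(rho) - 1.0) < 1e-12
assert np.linalg.eigvalsh(rho).min() > -1e-12   # positivity preserved
assert f_R(rho) < f0                   # Renyi free energy has decreased
```

The three assertions mirror the criteria (25a)-(25c); at the fixed point the surviving eigenvalues are linear in energy, i.e. of the MRE form of Eq. (8).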

IV. OUTLOOK
In this paper, we have introduced an approach to compute thermal expectation values. Instead of attempting to approximate the minimum of the free energy, we construct an ensemble that maximizes the 2-Rényi entropy for the same mean energy, and-in the thermodynamic limit-reproduces local observables of the corresponding Gibbs ensemble.
We have shown that this ensemble can be efficiently approximated using TNs and have presented variational algorithms to obtain such an approximation. It is possible to work directly in the thermodynamic limit and use an MPS representation of the ensemble, which optimizes the objective function in Eqs. (6). Despite the simple form of this function, the optimization is non-linear and must be tackled with gradient-based methods. The fundamental reason is that the positivity constraint in TNs is highly non-local, and one way of enforcing it is via a purification. The convergence can be accelerated with techniques from manifold optimization, but a fundamental limitation is the high contraction cost. Indeed, for a purification of bond dimension D, the time complexity involved in computing the purity (see Appendix C) is O(D^5), which is significantly higher than the typical O(D^3) of other popular MPS algorithms, such as time evolution or ground-state search. Coincidentally, this is the same leading cost as the original formulation of DMRG with PBC [61]. Although the time complexity is higher, we observe that a moderate bond dimension captures well the ensemble and its local properties, in both integrable and non-integrable models.
As an alternative to gradient-based optimization, we also propose a method based on a non-linear evolution of the density operator. Under this evolution, the objective function in Eqs. (6) is monotonically decreasing, and hence the state flows to the MRE.
Despite these limitations, we believe more efficient cost functions could be devised. Additionally, the ideas outlined here could be applied to other wave-function ansätze. For example, in recent works [62][63][64][65], neural networks have been optimized with variational Monte Carlo to describe the steady state of dissipative dynamics. Such techniques could be adapted to perform the optimization described in this paper.

Appendix A: Gaussian density of states
We consider a density of states which is Gaussian, of the form D(E) = (2πσ²N)^{−1/2} exp[−E²/(2σ²N)], (A1)
where N is the system size and σ is a constant independent of N. Additionally, without loss of generality, let us assume it is centered at E_mid = 0. For local Hamiltonians of the form of Eq. (11), it was shown that the density of states weakly converges to a Gaussian in the thermodynamic limit, as a consequence of Lyapunov's central limit theorem [66][67][68]. As an alternative proof, one can take an ancillary copy of the system and consider the state |Ξ⟩, defined as the tensor product of maximally entangled pairs between system and ancilla. In the doubled system, the state |Ξ⟩ is a product state, and one can directly apply the theorem in Ref. [66] to obtain the desired result. However, the finite-size corrections to the central limit theorem decay more slowly than O(1/√N), and one should take them into account when computing expectation values. Hence, we can think of Eq. (A1) as a toy model of actual local Hamiltonians, and derive results under this assumption.
For the Gibbs ensemble, the partition function is Z_G(β) = ∫ dE D(E) e^{−βE} = e^{β²σ²N/2}. This leads to Ē = −∂ log Z_G/∂β = −βσ²N. Naturally, these results hold only in the region around the peak of the Gaussian, and break down when one tries to take the limit β → ∞.
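The relation Ē = −βσ²N for a Gaussian density of states can be verified by direct numerical integration (a sanity check we add here, with arbitrarily chosen parameters):

```python
import numpy as np

# Numerical check of the Gaussian-DOS relation Ebar = -beta * sigma^2 * N
N, sigma, beta = 10, 1.0, 0.3
E = np.linspace(-60.0, 60.0, 200001)
dE = E[1] - E[0]
D = np.exp(-E**2 / (2 * sigma**2 * N)) / np.sqrt(2 * np.pi * sigma**2 * N)

w = D * np.exp(-beta * E)        # Boltzmann-weighted density of states
Z = np.sum(w) * dE               # partition function Z_G(beta)
E_mean = np.sum(E * w) * dE / Z  # Ebar = <E> under the weighted distribution
assert abs(E_mean + beta * sigma**2 * N) < 1e-5
```

With these parameters the prediction is Ē = −3, reproduced to the quadrature accuracy.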
For the 2-Rényi distribution, we cannot use the trick of differentiating the partition function with respect to β, since we cannot interpret it as a generating function. We can, however, express everything in terms of the truncated moments Φ_m of the density of states. The upper integration limit is related to the mean energy and β_R as E⊥ = Ē + 2/β_R. These moments enjoy a recurrence relation of the form Φ_{m+2} = σ² ∂Φ_m/∂σ. Additionally, Φ_1 is analytic because the integrand is the derivative of a Gaussian. This allows us to establish identities among the Φ_m. By dividing the partition function by β/2, we can then compute the mean energy for this ensemble. Equating this result to Ē allows us to express Φ_0 in terms of Φ_1. Using this last relation, we can write the variance accordingly. For the Gibbs ensemble, notice that β and Ē are collinear, Ē = −βσ²N. At infinite temperature (Ē = 0), we have trivially β = β_R = 0. Expanding ρ_G around β = 0, we obtain ρ_G ≈ 1 − βH. Comparing this with the form of the MRE, we can easily conclude that in this limit β_R ≈ 2β. However, one should take into account the thermodynamic limit. Indeed, as shown in Fig. 7, at non-zero Ē, increasing the system size leads to an equation of state which asymptotically approaches Ē = −β_R σ²N. This is due to the fact that the cutoff E⊥ becomes proportional to β_R. Indeed, at E⊥ = 0, one has that β_R(E⊥ = 0) = O(1/√N). The point E⊥ = 0 also corresponds to a stationary point of ∂Ē/∂β_R. Taking derivatives, one obtains a relation between Ē and β_R only (A10). As the system size is increased, the derivative converges toward a constant, as shown in Fig. 7. This allows us to conclude that, for a Gaussian density of states and β_R ≫ 1/√N, we have β_R ≈ β.

Appendix B: Technical details on Grassmann manifolds
The gradient of a function on a Riemannian manifold belongs to the tangent space of the manifold itself. A generic tangent vector to a uniform MPS is a linear combination of the partial derivatives with respect to the single tensors; it can be seen as a vector |∆(B)⟩ embedded in the physical Hilbert space (Eq. (B1)), where the sum runs over all physical sites. A tangent vector parametrized by a tensor B has an inherent gauge freedom to it: the explicit transformation that leaves the vector invariant is B^s → B^s + X A^s − A^s X, for any D × D matrix X. Indeed, the set of derivatives |∂_μ Ψ(A)⟩ forms an overcomplete basis. Hence, by introducing the orthogonal complement V_A of A [23] (Eq. (B2)), we can parametrize the tangent vectors in terms of a D(d − 1) × D matrix Z (Eq. (B3)). This parametrization arises quite naturally if one considers the tangent vectors to be embedded in the original Hilbert space. In this case, the choice of Eq. (B3) corresponds to imposing orthogonality of the tangent space, ⟨Ψ(A)|∆(B)⟩ = 0. We remark that Eq. (B3) is exactly the parametrization of the tangent space for Grassmann manifolds, as derived traditionally [45]. Hence we can consider the problem in Eq. (6) as an optimization of a tensor A over the Grassmann manifold. One particularity arises from the choice of metric on the tangent space. In Riemannian manifold optimization, one usually chooses the Euclidean metric (Eq. (B4)) [45]. However, this is not the most natural choice in this setting, since the underlying physical Hilbert space prescribes the metric induced by the overlap of tangent vectors (Eq. (B5)). Note that, as opposed to Eq. (B4), this metric depends on the current point |Ψ(A)⟩ of the manifold. In practice, we notice that the choice of the metric is not very important for the optimization, and the Euclidean metric poses the advantage of not having to invert a potentially ill-conditioned fixed point when projecting onto the manifold. Additionally, the use of the Euclidean metric is not necessarily deleterious since, compared to Eq. (B5), it will magnify the importance of small Schmidt values of the state |Ψ(A)⟩. In Ref. [23], the authors have proposed a non-linear preconditioner that acts as a compromise between these two metrics. Regardless of our choice, the metric allows us to project arbitrary Hilbert-space vectors onto the tangent space; the projection operator for the Euclidean metric in Eq. (B4) is given in Eq. (B6) [69].
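The tangent-space parametrization of Eq. (B3) can be illustrated concretely (a generic matrix-level sketch, with our own variable names): build the orthogonal complement of the isometry W and check that the resulting tangent vectors are orthogonal to the state direction.

```python
import numpy as np

# Tangent-space parametrization a la Eq. (B3): tangent vectors B = V_A Z,
# with V_A the orthogonal complement of the isometry W, so the tangent
# space is orthogonal to the current point.
rng = np.random.default_rng(6)
D, d = 3, 2
n, p = D * d, D

W, _ = np.linalg.qr(rng.normal(size=(n, p)) + 1j * rng.normal(size=(n, p)))
U, _, _ = np.linalg.svd(W, full_matrices=True)
V_A = U[:, p:]                          # (Dd) x (D(d-1)) orthogonal complement

assert np.allclose(V_A.conj().T @ W, 0)      # complement annihilates W
Z = rng.normal(size=(n - p, p))              # the D(d-1) x D parametrization
B = V_A @ Z                                  # a tangent vector
assert np.allclose(W.conj().T @ B, 0)        # orthogonal to the state
```

The last assertion is the matrix-level counterpart of the condition ⟨Ψ(A)|∆(B)⟩ = 0 discussed above.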