Quantum-classical eigensolver using multiscale entanglement renormalization

We propose a variational quantum eigensolver (VQE) for the simulation of strongly-correlated quantum matter based on a multi-scale entanglement renormalization ansatz (MERA) and gradient-based optimization. This MERA quantum eigensolver can have substantially lower computation costs than corresponding classical algorithms. Due to its narrow causal cone, the algorithm can be implemented on noisy intermediate-scale quantum (NISQ) devices and still describe large systems. It is particularly attractive for ion-trap devices with ion-shuttling capabilities. The number of required qubits is system-size independent, and increases only to a logarithmic scaling when using quantum amplitude estimation to speed up gradient evaluations. Translation invariance can be used to make computation costs square-logarithmic in the system size and describe the thermodynamic limit. We demonstrate the approach numerically for a MERA with Trotterized disentanglers and isometries. With a few Trotter steps, one recovers the accuracy of the full MERA.


I. INTRODUCTION
The complexity of quantum many-body systems makes it a formidable challenge to understand the properties of quantum matter, in particular in strongly correlated regimes where perturbative approaches fail. Hence, powerful classical simulation techniques like quantum Monte Carlo [1][2][3][4] and tensor networks states (TNS) [5][6][7][8][9][10] have been developed. A strength of TNS techniques is that they are also applicable for frustrated quantum magnets and fermionic systems [11][12][13][14][15], where quantum Monte Carlo is hampered by the negative-sign problem [16,17]. These classes of systems include candidate spin liquid materials [18][19][20][21][22], fractional quantum Hall physics [23,24], and high-temperature superconductors [25,26]. Consider a lattice system with N sites, each associated with a site Hilbert space of dimension d such that the total Hilbert space has dimension d N . The idea of TNS is to approximate the many-body state by a network of partially contracted tensors. The tensors may carry physical indices that label site basis states and additional bond indices of dimension χ which are contracted with corresponding indices of other tensors. The structure of the network and the required bond dimension χ are adapted to the entanglement structure in the system. Typically, the more entangled a system is, the larger χ needs to be in order to achieve a desired approximation accuracy. To approximate the ground state of a given modelĤ by a TNS |Ψ⟩, one minimizes the energy ⟨Ψ|Ĥ|Ψ⟩/∥Ψ∥ 2 with respect to the tensor elements. The beauty of the approach is that computation costs for optimization steps are reduced from exponential in N to polynomial in N . In particular, they are linear in N for matrix product states (MPS) [5,6,[27][28][29], projected entangled pair states (PEPS) [7,8,[30][31][32], and the multi-scale entanglement renormalization ansatz (MERA) [9,33]. For homogeneous MERA, one can reduce the cost to O(log N ) and even access the thermodynamic limit N → ∞. However, the classical computation time may scale with a high power of the bond dimension χ. While it is only O(χ 3 ) for MPS in one-dimensional (1D) systems [6,28], it is O(χ 7...9 ) for 1D MERA [34], O(χ 10... 12 ) for 2D PEPS [35,36], and O(χ 16... 28 ) for 2D MERA [37,38]. Hence, practicable χ are usually rather small, which limits the approximation accuracy.
In this work, we propose and analyze a hybrid quantum-classical variational eigensolver [39] to overcome these limitations, where many-body ground states are approximated by adapted MERA states and (small) quantum computers are employed to efficiently execute tensor contractions. In this context, MERA have four advantages over other TNS: (i) MERA can be applied for systems with any number of spatial dimensions, (ii) all tensors are unitary or isometric, which allows for a rather direct implementation on quantum computers, (iii) MERA expectation values for local operators depend only on narrow causal cones such that they can be evaluated exactly and large systems can be simulated on noisy intermediate-scale quantum (NISQ) devices, and (iv) sets of MERA are closed which implies that optimizers always exist [40]. Also, MERA optimizations are not hampered by barren plateaus [41,42].
Quantum algorithms using MPS for 1D systems were recently suggested in Refs. [43][44][45][46][47]. Two prominent quantum-computing platforms are superconducting qubits [48,49] and ions in electromagnetic traps [50,51]. Ion-trap systems with qubit-shuttling capabilities [52][53][54] are particularly interesting for the quantum MERA scheme.  (triangles). Contraction lines between the tensors correspond to renormalized site vector spaces with dimension χ. The shaded region indicates the causal cone for a twosite operatorĥi. The cone has width three, i.e., contains at most three renormalized sites in each layer. Only causal-cone states like |Ψi⟩, associated withĥi, need to be generated on the quantum computer. (b) Isometries can be realized as unitaries where some input qubits are initialized in a reference state like |0⟩ (open circles). Such |0⟩ qubits can be moved in and others (filled circles) can be moved out of the quantum register or reset after applying the layer-transition maps. (c) A Trotter structure with t steps is imposed on each MERA tensor, making it a circuit of two-qubit gates. Here, t = 3 and χ = 16, i.e., q = 4 qubits per contraction line. (d) Each Trotter gate can be implemented using CNOTs and single-qubit rotations or, equivalently, single and two-qubit rotations. The specified Pauli operators generate the corresponding rotations (1), andσ refers to a general single-qubit rotation.
cess, states that are not important for the representation of the ground state are discarded. With a branching ratio of b, this process ends with one or a few renormalized sites after T ∼ log b N steps, and the resulting few-site problem can be solved exactly. While the physical site Hilbert spaces have dimension d, Hilbert spaces of renormalized sites have dimension χ. Seen in reverse, the renormalization group scheme defines a many-body state |Ψ⟩. This state is a MERA with bond dimension χ. It consists of T layers, each comprising the unitary disentanglers and isometries of a renormalization step. Optimizing the tensor elements to minimize the energy expectation value E = ⟨Ψ|Ĥ|Ψ⟩, one obtains a groundstate approximation.
In principle, it is straightforward to prepare a MERA |Ψ⟩ on quantum computers. Assume χ = 2 q such that every renormalized site corresponds to q qubits. A disentangler that acts on n (renormalized) sites is a χ n × χ n unitary acting on nq qubits. It can be decomposed into a circuit of O(4 nq ) single-qubit and CNOT gates [58][59][60]. An isometry that maps n sites into m > n can be implemented as a unitary acting on nq qubits and (m − n)q additional ones initialized in state |0⟩. It requires O(2 (n+m)q ) single-qubit and CNOT gates [61].
For simplicity, we assume a HamiltonianĤ = ĥ i with finite-range interaction termsĥ i . As exemplified in Fig. 1a, many tensors cancel in expectation values ⟨Ψ|ĥ i |Ψ⟩ due to their isometric property. The causal cone ofĥ i comprises all tensors that can influence the expectation value, and we define |Ψ i ⟩ as the corresponding causal-cone TNS such that ⟨Ψ|ĥ i |Ψ⟩ = ⟨Ψ i |ĥ i |Ψ i ⟩. For a binary 1D MERA, disentanglers act on n = 2 sites and the cost to evaluate ⟨Ψ i |ĥ i |Ψ i ⟩ would scale in q as O(4 2q ). Hence, the cost for the evaluation of an energy gradient would scale as O(4 4q = χ 8 ). This is only a modest improvement over the scaling O(χ 9 ) of the classical computation time. As discussed in Appx. A, the differences are generally more pronounced in higher dimensions. For example, the quantum and classical gradient evaluation costs for the 2D 2 × 2 → 1 MERA of Ref. [37] scale as O(4 8q = χ 16 ) and O(χ 28 ), respectively. In any event, one also needs to account for the required number of measurement samples in the quantum case, and the quantum computational complexity can be reduced drastically by imposing further structure on the MERA.

III. TROTTERIZED TENSORS
There are many options for substructures. Here, we choose to impose a Trotter structure on the MERA tensors. In particular, they shall consist of t Trotter steps, each comprising local unitary gates that act on, say, two nearest-neighbor qubits; see Fig. 1c. For tensors that act on n (renormalized) sites, each Trotter step consists of O(nq) local gates. For a Trotterized MERA (TMERA) with T layers, the measurement of a local expectation value ⟨Ψ i |ĥ i |Ψ i ⟩ then requires O(T t) time on the quantum computer. We will see that, using translation invariance, the measurement of the energy gradient requires O(T 2 t 2 nq) time. As we will also see in benchmark simulations, the local unitary gates approach identities when increasing the number t of Trotter steps. This establishes a connection to Trotterization as used in time evolution problems [62][63][64][65].

IV. HYBRID OPTIMIZATION ALGORITHM
In classical computations, MERA states are optimized by evaluating the so-called environment for each tensor and updating tensors one by one [66]. On a quantum computer, we can only measure observables, and the tensor environment is not accessible. The hybrid algorithm works as follows. The Trotter gates can be written as small circuits, parametrized through the angles θ = (θ 1 , θ 2 . . . ) of rotationŝ with respect to Hermitian unitary operatorsσ like the Pauli matrices {1,σ x ,σ y ,σ z } or tensor products thereof. A standard choice is depicted in Fig. 1d. It comprises three CNOT gates, oneσ z and twoσ y singlequbit rotations, as well as four general single-qubit gates [67,68]. The number of angles per Trotter gate agrees with dim SU(4) = 15 and reduces to nine angles when exploiting the unitary gauge freedoms in the TMERA. The energy gradient ∂ θ E can be evaluated by measuring where all angles except for θ j are kept fixed [69][70][71]. A derivation is given in Appx. C. The energy can now be minimized by a gradient-based algorithm like L-BFGS [72,73]. In experiments, two-qubit gates are typically much more costly than single-qubit gates. For ion-trap and superconducting systems, Refs. [74][75][76][77] specify typical single-qubit gate times of ∼ 10 µs and ∼ 30 ns, respectively, whereas two-qubit gates require ∼ 100 µs and ∼ 200 ns, respectively. The CNOT parametrization ( Fig. 1d) of the Trotter gates has the drawback that CNOT gates require two-qubit rotations with large angles. In the ion-trap and superconducting systems, CNOT is implemented using an effective Isingσ α ⊗σ α interaction with rotation angle θ = π/2 [77][78][79][80]. A better choice is then the canonical (CAN) parametrization in Fig. 1d that comprises three nativeσ α ⊗σ α rotations (α = x, y, z) and four general single-qubit rotations [81,82]. The benchmark simulations, discussed below, show that the occurring two-qubit angles for this parametrization are rather small. Furthermore, we find that the optimization actually works best in a parametrization-free fashion. Such a Riemannian quasi-Newton method on quantum circuits is described in Appx. C.2.

V. TRANSLATION INVARIANCE
For translation-invariant systems, the interaction termsĥ i are translates of the same operatorĥ. Correspondingly, we can reduce the number of variational parameters. A homogeneous MERA has translationinvariant layers, i.e., each layer consists of repeating iden-tical groups of tensors. A binary 1D MERA, for example, is then characterized by a single disentangler and a single isometry for each layer. For a heterogeneous system, the derivatives (2) can be evaluated by measuring expectation values for all terms h i that have the tensor of angle θ j in their causal cone. In total, this requires O(N T = N log b N ) measurements. With translation invariance, this can be reduced to O(T ) by either using classical random bits or introducing auxiliary qubits: For the 1D case illustrated in Fig. 1, there are two unitary transition mapsÛ L andÛ R . Either of them has to be applied to progress in the preparation of the causal-cone state |Ψ i ⟩ from layer τ to τ − 1. The specific sequence depends on the location i of the interaction term. In order to evaluate the energy density e := 1 N ⟨Ψ|Ĥ|Ψ⟩ = 1 N i ⟨Ψ i |ĥ i |Ψ i ⟩ for the entire system at once, we can replaceÛ L,R by their convex combination such that we obtain the state on layer τ − 1 aŝ For experiments, such quantum channels can be implemented by randomly selectingÛ L orÛ R in each transition. Practically, this constant reprogramming of the hardware in gradient evaluations can be slow. So, alternatively, the channels can be lifted to fixed unitary evolutions on a larger Hilbert space [83,84]. For Eq. (3), adding a single auxiliary qubit per layer is sufficient. Initializing it in the state (|0⟩ + |1⟩)/ √ 2 and applyingÛ L or U R conditioned on the auxiliary qubit realizes the channel (3). Appendix B gives details on the efficient realization of layer-transition maps for various MERA.
Homogeneous MERA are necessarily defined with periodic boundary conditions. Let the linear system sizes L x , L y . . . be large enough such that the causal cone of any local interaction termĥ i does not close upon itself along any of the spatial dimensions. Then, repeating the MERA tensor network in any spatial direction (L α → nL α ) with accordingly adapted boundary conditions defines families of homogeneous MERA, which all have the same energy density e. In particular, the results capture the thermodynamic limit N → ∞.

VI. QUBIT RESETS AND ION SHUTTLING
An attractive feature of the proposed quantumclassical TMERA algorithm is that, while we can simulate large systems, at any stage, only a system-sizeindependent number of qubits need to be acted upon with the unitary gates. When evaluating observables or gradients, as we progress from layer to layer, only the qubits inside the causal cone need to be in the quantum register. These are, e.g., 4q qubits for 1D binary and ternary MERA, and 14q qubits for the 2D 2 × 2 → 1 MERA of Ref. [37]. In every layer transition, some contraction lines (groups of q qubits) leave the causal cone. The same number of (new) qubits, initialized in state |0⟩, are needed to realize the isometries of the next MERA layer. When space efficiency is the highest priority, one can reset the qubits [87][88][89][90][91] that exit the causal cone to |0⟩ for reuse. Auxiliary qubits can be reset as well. When one wants to minimize execution times, one can employ quantum amplitude estimation (QAE) [92,93] in the gradient evaluations. For a preparation |Ψ i ⟩ =Û i |0, . . . , 0⟩ of the causal-cone state, QAE requires application of powers (Û i ) m . In this case, one cannot employ the (nonunitary) mid-circuit qubit resets. While the total number of required qubits is then logarithmic in the system size, still, only causal-cone qubits need to reside in the quantum register, and the others can be moved to a quantum memory. In ion-trap systems, this can be accomplished by shuttling as demonstrated in Refs. [52][53][54].

VII. BENCHMARK SIMULATIONS AND SCANNING
To demonstrate and benchmark TMERA, we simulate the 1D transverse-field Ising model It has a critical point at g = 1 with the paramagnetic phase for g > 1 and the ferromagnetic phase for g < 1. Figure 2 shows results for homogeneous TMERA with the modified binary network structure [34], using an L-BFGS optimization. The TMERA energy densities e are compared to the exact infinite-system value e ∞ gs . The left panel shows the convergence for g = 1.25, which quickly reaches a high accuracy. The right panel, shows TMERA accuracies for 0.75 ≤ g ≤ 1.25. Local minima are avoided through scanning, i.e., starting at g = 1.25, g is lowered in steps, and the converged TMERA of the previous step is used to initialize the optimization of the next. Upon reaching g = 0.75, we start scanning back to g = 1.25.
The numerical results confirm that a few Trotter steps t are sufficient to reach accuracies comparable to the full (non-Trotterized) MERA. In particular, t = 2 gives already excellent results for χ = 2 3 . For the experimental implementation, the Trotter gates can be expressed in the CAN representation. Figure 3 shows distributions of the rotation angles in the converged TMERA at different g. They are peaked at small angles. This remains true even for the critical point g = 1. The fact that most angles are small implies that these quantum gates can be executed quickly or at correspondingly higher fidelity.

VIII. COMPUTATION COST AND ACCURACY
Exploiting translation invariance, the O(T tq) components (2) of the energy gradient can be evaluated by preparing the corresponding causal-cone states, which costs O(T t) time, and then projectively measuring the local interaction termĥ. With N s samples per term, the statistical error of the gradient and, hence, the achievable energy accuracy scale as ϵ ∝ 1/ √ N s . Thus, the quantum cost for each TMERA optimization step is O(T 2 t 2 q/ϵ 2 ). Using QAE [92,93], the cost reduces to O(T 2 t 2 q log(1/ϵ)/ϵ) while increasing the circuit depth by a factor O(1/ϵ). Our simulations show that the error ϵ decreases according to a power law ϵ ∼ (t 2 q) −α . For fixed q, ϵ decreases until reaching the accuracy of the MERA with full tensors (fMERA) of bond dimension χ = 2 q . Upon approaching the saturation, one should increase q. The fMERA computation cost also follows a power law O(T χ r ) = O(ϵ −r/β ), where exponent r is determined by the contraction cost and exponent β by the relation between χ and accuracy ϵ. For 1D MERA, model-dependent exponents β ≈ 3.8 . . . 6.8 have been reported [34]. In simulations of the critical bilinearbiquadratic spin-1 chain [94][95][96][97][98] with modified binary MERA (r = 7), we find that the QAE cost O(ϵ −1−1/α ) is already lower than the classical fMERA cost O(ϵ −r/β ), providing a polynomial advantage; see Appx. A.5. As the exponent r for the classical fMERA cost is very large for higher-dimensional systems (r ≥ 16), the quantum algorithm should substantially outperform the classical simulations for models in ≥ 2 spatial dimensions. It is numerically very expensive to determine the scaling exponents for higher-dimensional systems and further investigations on this subject are needed.

IX. DISCUSSION
The presented TMERA quantum eigensolver allows for the approximation of many-body ground states with a system-size independent number of qubits, and it can substantially outperform classical MERA simulations. The appendices provide details on the computational complexity for different MERA network structures (Appx. A), the realization of layer-transition maps for homogeneous TMERA (Appx. B), optimization methods (Appx. C) including a Riemannian version of the L-BFGS algorithm, and the influence of different Trotter gate parametrizations (Appx. D).
Reference [99] discusses DMERA, which are a special type of TMERA, where the number of Trotter steps D in each layer is directly linked to the width ∼ 2D of the causal cone. The TMERA that we consider here have more structure, which allows one to tune these quantities independently, and the imposed structure admits a more direct comparison with the typical MERA used in classical simulations. The optimization based on the simultaneous perturbation stochastic approximation (SPSA), suggested in Ref. [99], is considerably less efficient because the energy derivative is evaluated only along random directions in the high-dimensional search space. Our gradient-based approach and the described utilization of translation invariance can be applied for any TMERA, including DMERA. Conversely, the robustness to noise as analyzed in Ref. [99] also applies generally to TMERA.
The decomposition of the MERA tensors into layers of nearest-neighbor Trotter gates is natural but not necessary. In future research, one could explore other network topologies to leverage, e.g., the all-to-all connectivity of ion-trap systems [100,101] and to increase the expressiveness of TMERA at fixed cost. One could also consider other gate types, especially those that are naturally available in prominent quantum-computing architectures. An example are multi-qubit Mølmer-Sørensen gates [102,103]. For an implementation on present-day devices, small two-qubit rotation angles are desirable. Hence, it will be interesting to explore how the angles and the TMERA accuracy are affected by adding large-angle penalty terms to the energy functional. See the follow-up paper Ref. [104] for further analysis of TMERA.
Note added. -During the long review of this paper, Refs. [105] and [106] appeared. The first studies the expressiveness of TMERA numerically; the second is an experimental demonstration, measuring critical correlations in preoptimized TMERA with bond dimension χ = 4 using an ion-trap system. Let us discuss the quantum computational complexity for different TMERA in one and two dimensions and compare to the corresponding classical simulation costs. Figure 4 shows the considered MERA networks: the 1D binary MERA, a modified 1D binary MERA [34], the 1D ternary MERA, a 2D 2 × 2 → 1 MERA [37], and a 2D 3 × 3 → 1 MERA [66]. We use the following labels: • b denotes the MERA branching ratio.
• T denotes the number of layers.
• χ = 2 q denotes the bond dimension with q being the corresponding number of qubits per renormalized site. • A denotes the cross section of the causal cone for local operators, defined as the maximum number of renormalized sites inside the causal cone at any layer interface (τ → τ ± 1).
• t denotes the number of Trotter steps for each tensor in TMERA (or an upper bound).
For heterogeneous MERA, the total number of sites is denoted by N and assumed to be ∼ b T .

A.1. Classical time complexity
The costs for optimizing MERA on classical computers are determined by the cost of computing the so-called environment of a tensor. This is in turn determined by the cost of applying a renormalization step (τ → τ + 1) to a local interaction term or, equivalently, for propagating a reduced density matrix inside the causal cone in the preparation direction (τ → τ − 1). In each step, one needs to contract the tensors of disentanglers and isometries and trace out sites that leave the causal cone. The costs for these operations, which one obtains by optimizing the contraction sequence, are given in Table I. Generally speaking, it is favorable to have a narrow causal cone, which explains why the classical costs for the 1D modified binary and ternary MERA are smaller than those of the plain binary MERA. On the other hand, for a given bond dimension χ, the binary MERA can encode more entanglement than the other two 1D MERA types and generally achieves higher accuracy.
The costs shown in Table I refer to one evaluation of the global energy gradient on a classical computer, or equivalently, a single update of all tensors in the Evenbly-Vidal algorithm [66]. For a homogeneous MERA, this cost, like the number of different tensors, is linear in T . For heterogeneous MERA, the total number of tensors and the cost are proportional to T τ =1 b τ ∼ N . The table shows the classical costs for MERA with full tensors (fMERA) because, on classical computers, there is not much to gain by exploiting the Trotter structure of TMERA tensors unless one is willing to introduce approximations.

A.2. Quantum computation time complexity
For the hybrid quantum-classical TMERA algorithm, the width of the causal cone is not as decisive for the computation costs. It determines primarily the number of qubits that need to be simultaneously in the interaction zone of the computer. Also, the specific network structure inside the cone, which influences the classical contraction costs, does not affect the scaling of the quantum computation costs. As discussed in Sec. III, the quantum costs for evaluating the TMERA expectation value of a local interaction termĥ i is proportional to tT . For translation-invariant systems and homogeneous   The comparison to the classical computation costs (Sec. A.1) is not trivial. In particular, one needs to take into account how many measurement samples are needed to reach a certain accuracy ϵ. As described in Sec. VIII, the comparison can be done by expressing the bond dimension χ as well as qt 2 in terms of ϵ. For critical models, they are related by power laws. The comparison for a specific model is discussed in Appx. A.5.
The classical component of the hybrid TMERA eigensolver controls the gradient evaluation and steers the gradient-based energy minimization. Its time complexity is always subleading to the quantum time complexity: For a TMERA with N different Trotter gates, the classical component operates on a vector space of dimension O(N ). The classical time complexity for one iteration of gradient descent or L-BFGS is just O(N ).
Although it is much less efficient than TMERA, one can also optimize fMERA using a quantum computer. A disentangler that acts on n renormalized sites can be decomposed exactly into a circuit of O(2 2nq ) singlequbit and CNOT gates [58][59][60]. An exact representation of an isometry that maps n sites into m > n requires O(2 (n+m)q ) single-qubit and CNOT gates [61]. The time cost for the energy gradient measurement of a homogeneous fMERA is then obtained by squaring the largest number of gates per tensor and multiplying by T 2 .

A.3. Quantum space complexity with resets
Concerning the quantum space complexity, we have to distinguish two settings. In the first, qubits corresponding to renormalized sites that leave the causal cone are reset to the reference state |0⟩ in order to reuse them for the implementation of isometries in the next layer transition τ → τ − 1. The distribution of measurement results in the evaluation of energy gradients and local observables like ⟨Ψ i |ĥ i |Ψ i ⟩ is the same with and without such mid-circuit resets. Qubit rests can be implemented through a projective measurement followed by a subsequent π rotation conditioned on the measurement result or through driven-dissipative reset schemes [87][88][89][90][91].
The number of qubits needed in the register is primarily determined by the cross section A of the causal cone. More precisely, from the sequence of contractions inside each layer of the causal cone, one can determine how many renormalized sites (groups of q qubits) are needed at any point in time. One can reduce this number by not insisting on parallel execution of gates, but shifting the MERA tensors of a layer-transition map temporally so that some qubits can already be reset before the application of further gates. The results for the considered MERA networks are shown in Table I.
As discussed in Sec. V, a major computation time reduction for the homogeneous MERA is achieved by taking appropriate convex combinations of the layertransition maps likeÛ L andÛ R such that the energy density for the full (infinite) system is obtained in one go. This can be done either with classical randomness or, if one wants to avoid the corresponding reprogramming of pulse sequences in the experimental evaluation of gradients, by introducing auxiliary qubits. In the latter approach, the number of required auxiliary qubits per MERA layer is proportional to log 2 of the number of transition maps. The auxiliary qubits are also amenable to qubit rests.

A.4. Quantum space complexity without resets
The experimental evaluation of observables and energy gradients can be made more time efficient using quantum amplitude estimation (QAE) [92,93]. For the preparation |Ψ i ⟩ =Û i |0, . . . , 0⟩ of a causal-cone state, QAE requires application of powers (Û i ) m . In this approach, one cannot employ the resetting of qubits that leave the causal cone, because subsequent factorsÛ i will again act on them. Without the resets, every layer transition τ → τ − 1, requires (b − 1)Aq new qubits, initialized in the reference state |0⟩ in order to realize the isometries. For a TMERA with T layers, one then needs a total of qubits, i.e., a number that grows logarithmically in the total system size N . In ion-trap systems, one can use shuttling [52][53][54] to move currently used qubits in and out of the quantum register.

A.5. Comparison of time complexities for critical spin-1 chains
To directly compare the time complexities of the classical fMERA and the quantum-classical TMERA algorithms, one has to take into account the number of measurement samples needed in each optimization step of the TMERA algorithm. As discussed in Sec. VIII, the required number of samples scales with the energy accuracy ϵ. Employing QAE [92,93], the total time complexity per iteration is O(t 2 q log(1/ϵ)/ϵ). The time complexity per iteration for a classical fMERA simulation is O(χ r ) = O(2 rq ), where the exponent r depends on the MERA type as shown in Table I. Figure 5 provides numerical results for the bilinearbiquadratic spin-1 chain [94][95][96][97][98] at the critical Uimin-Lai-Sutherland point ϑ = π/4. We choose this model because it corresponds to a conformal field theory with central charge c = 2 [107] and, hence, features significant entanglement. The figure shows the energy accuracy ϵ as a function of the classical fMERA and quantum-classical TMERA computation cost per iteration. In both cases, the energy accuracy follows a power law with a moderate polynomial advantage for TMERA. For the TMERA curves with fixed q, ϵ decreases until reaching the accuracy of the fMERA with bond dimension χ = 2 q . In practical simulations, one should hence increase q upon approaching the saturation, in order to follow the power-law decay as indicated by the dashed lines. The fMERA computation cost can be written as O(χ r ) = O(ϵ −r/β ), where the factor 1/β in the exponent captures the relation between bond dimension χ and accuracy ϵ. As the exponent r is very large (r ≥ 16) for 2D MERA, the quantum-classical TMERA algorithm will substantially outperform the classical fMERA simulations for models in ≥ 2 spatial dimensions.

Appendix B: Layer-transition maps for homogeneous MERA
Let us discuss in more detail, how classical sampling or auxiliary qubits can be employed to realize convex combinations of layer-transition maps in the preparation. In this way, the spatially averaged A-site density matrices of homogeneous TMERA can be prepared on the quantum computer in O(tT ) time. Hence, the energy density or a component of the energy gradient can be obtained in O(tT ) time. the least significant bit. For the choice shown in Figs. 4a and 6a, i τ = 0 or 1 means that we progress from layer τ to layer τ − 1 by applying transition mapÛ L ≡Û 0 or U R ≡Û 1 , respectively. Specifically, the causal-cone state for sites A i reads where state |T ⟩ for a single renormalized site defines the so-called top tensor of the MERA. Here and in the following, we do not explicitly denote the τ dependence of the layer-transition maps and use the convention that operators act on the A active sites in the causal cone and the remaining inactive sites are left untouched as indicated in Fig. 6b. For quantities like the energy density, we wish to evaluate the spatial averageρ = 1 N iρ i of the reduced density matricesρ i := Tr A ⊥ i |Ψ i ⟩⟨Ψ i | for the blocks of sites A i . The partial trace is over the inactive sites (outside the causal cone) and will not be denoted explicitly in the following. Starting fromρ (T ) := (|T ⟩⟨T |) ⊗A , we can recursively definê such that U L andÛ R are related to each other by a site permutation.
Hence, the quantum channel (B3a) can be implemented through a Stinespring dilation [83,84] with a single auxiliary qubit per layer. The auxiliary qubit is initialized in the state (|0⟩ + |1⟩)/ √ 2 and used for a controlled site permutation such that, after tracing out the auxiliary qubit, one obtains the channel (B3a).
Alternatively, one can use classical random numbers. Selecting in each layer transitionÛ L orÛ R with equal probability, also implements the channel (B3a). Mathematically, this scheme is completely equivalent to the approach using auxiliary qubits and results in the same distribution of measurement results in the evaluation of observables and energy gradients: Whether one prepares states |Ψ 1 ⟩ and |Ψ 2 ⟩ with 50:50 probability or prepares (|Ψ 1 ⟩ ⊗ |0⟩ + |Ψ 2 ⟩ ⊗ |1⟩) / √ 2 with an auxiliary qubit, the probability to observe outcome i associated with projec-torP i is in both cases ⟨Ψ 1 |P i |Ψ 1 ⟩ + ⟨Ψ 2 |P i |Ψ 2 ⟩ /2. Experimentally, the measurements need to be repeated until reaching a required accuracy ϵ. Practically, the approach using classical sampling involves a reprogramming of the experimental pulse generator hardware for each new sequence of transition maps. If this reprogramming is slow, the scheme with auxiliary qubits may be preferable as it employs the same quantum circuit in every measurement iteration.
Also the 2D 2 × 2 → 1 MERA can be treated similarly. Now, we use binary representations x 1 x 2 x 3 . . . and y 1 y 2 y 3 . . . for the x and y coordinates of site i on the square lattice with x τ , y τ ∈ {0, 1}. There are four layertransition mapsÛ T L ≡Û 0,0 ,Û T R ≡Û 1,0 ,Û BL ≡Û 0,1 , U BR ≡Û 1,1 , whereÛ T L corresponds to the causal cone continuing along the inner (top-left corner) 3 × 3 square indicated in Fig. 6e. The layer-transition channel readŝ Its Stinespring dilation can be implemented with two auxiliary qubits, each initialized in (|0⟩+|1⟩)/ √ 2, to realize the controlled site permutations that relate the mapŝ U xτ ,yτ to each other. The first auxiliary qubit can be used to permute rows of sites and the second to permute columns of sites on the inner 4 × 4 square in Fig. 6e.

B.4. 1D modified binary MERA
The cases of the 1D modified binary MERA and the 2D 3 × 3 → 1 MERA are bit more involved.
First, note that the layer-transition channel (B3a) for the (unmodified) 1D binary MERA can also be explained as follows: Let A Now, the spatially averaged density matrixρ (τ −1) = b τ −1 N jρ (τ −1) j is obtained by averaging over all sites i in layer τ and the two corresponding sites in layer τ − 1. Thus, we recover Eq. (B3a). The 1D modified binary MERA with A = 2 can be addressed similarly. In this case, not all bonds are equivalent. Depending on whether we are on an even or an odd bond, we apply either one of the three layer-transition mapsÛ L ,Û C ,Û R or the mapÛ o , respectively, as shown in Fig. 6d. WhileÛ L andÛ R move the causal cone from an even to an odd bond,Û C maps from an even to an even bond, andÛ o maps from an odd to an even bond. Letρ (τ ) e denote the bond density matrix averaged over all even bonds for layer τ andρ (τ ) o the average over all odd bonds. Then, the same considerations as above lead to the layer transitionŝ For the evaluation of energy densities and gradients we want to prepare the spatially averaged density matriceŝ ρ To realize this in the approach using auxiliary qubits, we can first introduce an additional flag qubit in the register of the quantum computer which indicates whether we are on an even or odd bond, i.e., progressing in the preparation direction, we want to prepare the stateŝ For these, the layer-transition channel readŝ It can be implemented by a Stinespring dilation that employs two auxiliary qubits per layer as specified in the second row of Table I.

Appendix C: Gradient evaluation and Riemannian optimization
The goal of the variational quantum eigensolver is to minimize the energy expectation value E = ⟨Ψ|Ĥ|Ψ⟩ over a TMERA variety {|Ψ⟩}, where the TMERA is characterized by the network structure, bond dimensions, and the tensor Trotterization as previously discussed. This minimization can be carried out by evaluating energy gradients and employing them in gradient descent methods or, preferably, quasi-Newton methods like the limitedmemory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm [72,73].

C.1. Gradients in the CNOT and CAN parametrizations
In the CNOT and CAN parametrizations, the Trotter gates are expressed in terms of single and two-qubit ro-tationsRσ(θ) = e −iθσ/2 [Eq. (1)]. The rotation angles θ parametrize the TMERA variety. To compute the energy derivative for one of these angles, we can write the energy expectation value in the form where the Hermitian operatorsÂ andB comprise the remaining tensors of ⟨Ψ|, |Ψ⟩, and the Hamiltonian. For brevity of notation, in the following, we will drop the "⊗1 ⊥ ", which indicates thatRσ(θ) only acts on a subspace corresponding to one or two qubits. The derivative is For the Hermitian and unitary operatorsσ,Rσ(± π 2 ) = (1 ∓ iσ)/ √ 2 and, hence, . (C3) such that Eq. (2) follows. In this way, the gradient can be evaluated on the quantum computer by measuring energy expectation values [69][70][71].
For homogeneous TMERA, the same rotation occurs multiple times as tensors are repeated in the translationinvariant MERA layers. Applying the product rule, this just means that the derivative ∂ θ E will contain one term for each occurrence of the rotationR(θ). This sum can be evaluated efficiently as described in the main text and Appx. B.
With the gradient in hand, one can apply standard implementations of gradient descent or quasi-Newton methods like L-BFGS.  2)). For the simplicity of notation, we stick to the full U(4) in the following.
For the optimization, we can regard M as embedded in the Euclidean space which is the space of the Trotter gates without the unitarity constraint. Let u ∈ M ⊂ E denote the vector that contains the matrix elements of all gates and E(u) = ⟨Ψ(u)|Ĥ|Ψ(u)⟩ the energy functional. To apply gradient-based optimization algorithms in this setting, we need to compute the derivative ∂ u E(u), project it onto the tangent space T u of M at u to obtain the gradient direction, construct retractions for line search, and vector transport to be able to sum gradient vectors from different points on the manifold. This is the program of Riemannian optimization as discussed generally in Refs. [108,109] and recently demonstrated for MERA in Refs. [110,111]. In the following, consider a single unitaryû ∈ U(n), with n = 4 for the considered Trotter gates, and we employ the Euclidean metric (real part of the Hilbert-Schmidt inner product) (û,û ′ ) := Re Tr(û †û′ ). (C6) The extension to the product manifold M is straightforward. As in Eq. (C1), let us write the energy expectation value in the form where "⊗1 ⊥ " indicates thatû only acts on an ndimensional subspace. Then, the energy gradient in the embedding space End(C n ) iŝ where Tr ⊥ is the partial trace over the subspace thatû does not act on. The gradientd fulfills ∂ ε E(û+εŵ)| ε=0 = (d,ŵ) for allŵ. An elementŵ of the tangent space Tû for U(n) atû needs to obey (û + εŵ) † (û + εŵ) = 1 + O(ε 2 ), i.e.,û †ŵ +ŵ †û = 0. So,û †ŵ needs to be skew-Hermitian and, hence, The Riemannian energy gradientĝ for the manifold U(n) atû is obtained by projectingd onto the tangent space such that (ŵ,ĝ) = (ŵ,d) for allŵ ∈ Tû. This giveŝ For a line search on the manifold, we need a retraction, i.e., a curverû ,p (τ ) on the manifold that starts fromû = rû ,p (0) in directionp = ∂ τrû,p (τ )| τ =0 ∈ Tû. We usê rû ,p (τ ) := e τpû †û ∈ U(n).
In analogy to Eq. (2), the Riemannian gradient (C10) can be obtained on the quantum computer by measuring energies of TMERA where one Trotter gate is modified: The tangent space Tû is a real n 2 -dimensional vector space, and we can choose a basis {iûσ j | j = 1, . . . , n 2 } with Hermitian unitariesσ j , i.e.,σ j =σ † j andσ 2 j = 1.
With (σ j ,σ k ) = δ j,k n, we can expand the energy gradient in the formĝ = i n 2 j=1 α jûσj /n and evaluate the expansion coefficients α j using the energy expectation values (C7), For the third line, we have used Eq. (C3) andRσ(θ) = e −iθσ/2 . To minimize TMERA energies, we can employ a Riemannian version of the L-BFGS algorithm. For the following, we return to the global optimization problem on the product manifold (C4) with the embedding space (C5), vectors u ∈ M comprising the matrix elements of all Trotter gates, and the Euclidean metric (u, u ′ ) = Re(u † u ′ ). Similarly, gradientsĝ, retractionsr etc. are now written in a vectorized form. In the Newton method, one generates a sequence of points u 1 , u 2 , · · · ∈ M that converges quadratically fast to a minimum of E(u). In each step, one obtains a second-order model of E(u) using the Riemannian gradient g k and the inverse Hessian H k at u k . The vector p k := −H k g k ∈ T u k that points from u k to the minimum of the quadratic model is used for an inexact line search, and one chooses the next point u k+1 = r u k ,p k (τ k ) on the corresponding line (retraction curve) with τ k ∈ R such that the Wolfe conditions are obeyed. The latter require that both the function value and the gradient norm decrease sufficiently, where, in the Riemannian version, one rather considers the energy derivatives in the search direction. The BFGS algorithm [72,108] modifies this procedure, avoiding the costly evaluation of the Hessian. Instead, one updates a positive definite approximationH k ∈ End(T u k ) of the inverse Hessian.H k+1 is determined by requiring that the gradient of the new quadratic model at u k+1 , evaluated at u k , should agree with the actual g k . This is equivalent to the secant equation s k =H k+1 y k with s k := T k τ k p k and the gradient change y k := g k+1 − T k g k , which are both elements of the tangent space T u k+1 as T k := T u k ,p k (τ k ) denotes the vector transport. From the solution space of the secant equation, one chooses the matrixH k+1 that is closest toH k in a suitable metric. Specifically, the BFGS  update readsH with V k := (1 − ρ k s k y † k )T k and ρ k := 1/(y k , s k ). Finally, the L-BFGS algorithm [72,108] avoids the increasing cost of operating withH k by keeping the ℓ most recent triples (ρ k , s k , y k ) in memory and computing an approximation of the inverse Hessian from them in each iteration.
The Riemannian L-BFGS algorithm that we employ, adapted from Ref. [108], is shown in Fig. 7. Note that, for numerical stability, after every retraction (C11), it may be necessary to project the resulting point onto the manifold in order to avoid the accumulation of small numerical errors. For example, this can be done by a singular value decomposition for every Trotter gate and setting all singular values to one. If this is done, the finite numerical precision for the vector transport (C12) should have negligible effects, i.e., need not be corrected.
The (classical) computation costs for each iteration of the Riemannian L-BFGS algorithm are linear in the total number N of Trotter gates and linear in the number ℓ of retained terms (rank ofH k ). The latter does not need to be scaled with the problem size, and we choose ℓ = 9 in all computations. Retractions (C11), vector transport (C12) etc. can be applied separately for every Trotter gate, each corresponding to 2n 2 entries of the full u vector with n = 4. The cost is hence indeed O(n 3 ℓN ) = O(N ).  9. XX-Trotterization. Left: In the TMERA considered so far, tensors are Trotterized into regular circuits of general two-qubit Trotter gates ∈ U(4). For 1D systems, one Trotter step consists of such gates applied on all odd bonds and then on all even bonds (or vice versa). In the CAN parametrization, each U(4) Trotter gate is realized by a sequence of two single-qubit gates, followed by three Ising rotations generated byσ α ⊗σ α , and two final single-qubit gates. In every Trotter step, causal cones widen by at most four sites. Right: In the XX-TMERA, layers of single-qubit gates alternate with layers ofσ x ⊗σ x Ising gates on even and odd bonds, respectively. One XX Trotter step contains two layers of Ising gates. In terms of the number of Ising gates, one U(4) Trotter step corresponds to three XX Trotter steps. For the latter, causal cones grow correspondingly faster.
Appendix D: Different Trotter-gate parametrizations and XX-TMERA The CNOT and CAN parametrizations for the Trotter gates of the TMERA are equivalent to the parametrization-free representation as U(4) unitaries, and, up to an irrelevant phase factor, one can transform between the parametrizations as described in Refs. [68,81]. This equivalence is tested and confirmed by optimizing TMERA for the 1D transverse-field Ising model (4) while scanning forth and back on the parameter interval g ∈ [0. 75, 1.25]. Starting at g = 1.25, the optimization was initialized by a product state with |ϕ⟩ =Rσz ( π 4 )Rσy ( π 4 )Rσz ( π 4 )|0⟩ on every site. Figure 8 shows the accuracies of the energy densities e during the scanning procedure for TMERA in the CNOT and CAN parametrizations as well as the parametrizationfree form ("Riemannian"). The results are compared to the corresponding fMERA optimization. The Euclidean L-BFGS algorithm [72,73] was employed for the TMERA in the CAN and CNOT parametrizations and the Riemannian L-BFGS algorithm, as discussed in Appx. C.2, was employed for the parametrization-free TMERA and fMERA. In all cases, the L-BFGS parameters were chosen as ε = 10 −12 , ℓ = 9, c 1 = 0.1, and c 2 = 0.9.
Although different TMERA parametrizations show differing energies in the early stages of the scanning procedure, they quickly converge. Without scanning, the parametrization-free TMERA is somewhat favorable. The explicit parametrizations in terms of rotation angles are more prone to getting stuck in local minima, whereas the local minima and saddle points for the parametrization-free form are entirely due to the structure of the TMERA manifold and the Hamiltonian. For random initial states, the parametrization-free form shows better convergence than the CNOT and CAN forms. As a simple example, note that the product state |0⟩ ⊗N is a stationary point in the energy landscape of the transverse Ising model (4) in the CAN parametrization. The data in Fig. 2 was obtained by Riemannian optimization.
For the 1D TMERA, we chose the Trotter circuit of each tensor to consist of steps with U(4) gates on all odd qubit bonds and all even qubit bonds, alternatingly, as shown in Fig. 1c and in the left panel of Fig. 9. The goal of this choice is to admit the generation of entangle-ment between any pair of qubits in a few Trotter steps. However, in the CNOT and CAN parametrizations, each Trotter gate features three elementary two-qubit gates; CNOT andσ α ⊗σ α Ising rotations with α = x, y, z, respectively. We have explored a different Trotterization approach, where layers of generic single-qubit gates alternate with XX Ising rotations. One XX Trotter step contains two layers of XX Ising rotations on odd and even bonds, respectively. For the same computational cost, characterized by the total number of two-qubit Ising rotations, causal cones in the XX Trotterization of the MERA tensors grow substantially faster than in the U(4) Trotterization. However, the benchmark simulations in Fig. 8 show no enhanced approximation accuracy. The energies for six XX Trotter steps per MERA tensor converge to approximately the same values as the energies for two U(4) Trotter steps per tensor. Improvements along these lines are a topic for future work.