Preparation of matrix product states with log-depth quantum circuits

We consider the preparation of matrix product states (MPS) on quantum devices via quantum circuits of local gates. We first prove that faithfully preparing translation-invariant normal MPS of $N$ sites requires a circuit depth $T=\Omega(\log N)$. We then introduce an algorithm based on the renormalization-group transformation to prepare normal MPS with an error $\epsilon$ in depth $T=O(\log (N/\epsilon))$, which is optimal. We also show that measurement and feedback leads to an exponential speedup of the algorithm, to $T=O(\log\log (N/\epsilon))$. Measurements also allow one to prepare arbitrary translation-invariant MPS, including long-range non-normal ones, in the same depth. Finally, the algorithm naturally extends to inhomogeneous MPS.

One of the most important tasks in many-body physics and quantum information science is the preparation of useful or relevant states.This has spurred a large effort to find ways to prepare states, for example, adiabatically [1], dissipatively [2,3], or using quantum circuits.A natural class of states to consider are matrix product states (MPS), because they efficiently approximate ground states of gapped local Hamiltonians [4][5][6].Moreover, many paradigmatic states can neatly be expressed as MPS, such as the cluster [7], GHZ [8], W [9] and AKLT states [10,11].
Several ways are known to prepare MPS.Using unitary quantum circuits with strictly local gates, all MPS can be prepared using a sequential quantum circuit of depth T ∝ N [12].This is provably optimal for long-range correlated states such as the GHZ state [13].However, for so-called normal MPS [14], which have short-range correlations, shorter depths are possible.Indeed, when allowing for a small error ϵ, they can be obtained by acting on a product state with a constant-depth circuit of quasilocal gatesgates whose support grows (poly-)logarithmically with system size [15,16].However, such quasilocal gates have to be compiled into gates with strictly local support, and in the worst case such a compilation leads to circuits with a depth scaling exponentially in the support, and thus as poly(N ).However, since normal MPS all lie in the topologically trivial phase, one can construct adiabatic paths with a guaranteed gap [17], which means normal MPS can provably be prepared adiabatically in T = O(polylog(N/ϵ)) [18] (also see [19]).
Despite of these results, it remains unclear if the scaling of the state-of-the-art algorithm [18] is optimal, or if there exist even faster algorithms to prepare normal MPS.Proving optimality requires finding a tight lower bound on the depth, or, equivalently, its complexity, which is believed to be difficult in general [13,20,21].After blocking, we approximate the state through its RG fixed point (nearestneighbor entangled pairs for normal tensors [Eq.( 12)]) combined with isometries V that encode the local structure of the state.(b) We use RG to construct an efficient circuit for V , which can be further expressed with a low-depth circuit of local gates.(c) Our algorithm extends to non-normal (i.e., long-range correlated) MPS with GHZ-like fixed points [Eq.(19)] depicted here.Using quantum circuits assisted by measurements, both the fixed-point states and the isometry V can be implemented efficiently.
Here we first resolve the question of asymptotically optimal preparation of normal translation-invariant (TI) MPS.We prove that any circuit faithfully preparing them requires a depth T = Ω(log N ), i.e., it has to scale at least logarithmically with N .We then introduce an algorithm that saturates this bound and prepares all normal TI-MPS in a circuit depth using strictly local gates.This is asymptotically faster than the previously fastest known algorithm (adiabatic preparation [18]) and also asymptotically optimal.Moreover, the algorithm naturally extends to inhomogeneous MPS that are suitably short-range correlated.
If one has additionally access to measurements and feedback, it is known that MPS can be prepared ex-arXiv:2307.01696v2[quant-ph] 25 Jan 2024 actly in a depth T = O(log N ) by expressing them in terms of the multiscale entanglement renormalization ansatz (MERA) [22].Including measurements also yields a speedup for our algorithm, and allows us to extend it to non-normal MPS, such that all TI-MPS can provably be prepared in depth T = O(log log(N/ϵ)). ( This is exponentially faster than the best known measurement assisted protocol [22].It also shows that our lower bound can be violated with access to measurements.As a byproduct, our work also proves that the finiterange MERA [23] can approximate normal TI-MPS in O(log log(N/ϵ)) layers.
Our algorithm fundamentally builds on the renormalization-group (RG) transformation.
The RG procedure consists of blocking neighboring sites and discarding short-range correlations.After consecutive RG transformations, the state asymptotically converges to its fixed-point state [24], which has only nearest-neighbor correlations for normal TI-MPS [24,25].This happens rapidly since it suffices to block only O(log(N/ϵ)) sites to approximate well the fixed-point state [16].Our algorithm first prepares this fixed point, and subsequently reintroduces the short-range correlations by applying an isometry of support log(N/ϵ) [cf.Fig. 1(a)].Our key contribution is that we can prove through an explicit construction (inspired by earlier works [12,22,24]) that this isometry can be implemented with a strictly local circuit of depth T = O(log(N/ϵ)) [cf.Fig. 1(b)].When assisted by measurements [cf.Fig. 1(c)], the depth of the isometry can be further reduced, while the GHZ-like fixed point of long-range correlated MPS can be prepared in constant depth [16,26].Together, this lead to the circuit depth T = O(log log(N/ϵ)) to prepare almost arbitrary (including all TI) MPS.
Preliminaries.-For simplicity, we first consider (normalized) TI-MPS, and later extend to the inhomogeneous case.Above A i are D × D matrices (D is the bond dimension) with i = 1, . . ., d (physical dimension).We will extensively use graphical notation and identify To each tensor A we associate its transfer matrix A tensor is called normal, if (i) it is irreducible (A i have no nontrivial common invariant subspace), and (ii) E A has a unique largest eigenvalue λ 1 = 1 and no other of the same magnitude [4,14].Its correlation length is defined via the subleading eigenvalue ξ = −1/ ln(|λ 2 |).After a gauge transformation [14], E A of a normal tensor can be brought into the form where the leading right eigenvector ρ > 0 (Hermitian and positive definite) [14,27], ⟨1|ρ⟩ = Tr(ρ) = 1, and R has spectral radius less than 1.Blocking q sites together yields a new tensor with physical dimension d q , the same bond dimension D, and transfer matrix E B = E q A .E B approaches its fixed point in the limit q → ∞ [24,25] which, for normal tensors, is E ∞ = |ρ⟩⟨1|.
Our goal is to devise an algorithm that approximates the target and | ϕ N ⟩ is prepared using a local quantum circuit.Our first result is that it is impossible to approximate well normal TI-MPS in depth o(log N ).Subsequently, we provide an explicit algorithm with the asymptotically optimal depth O(log(N/ϵ)).
Lower bound.-Given(i) {|ϕ N ⟩}, a sequence of normalized TI-MPS on N sites, generated by a normal tensor A, with finite correlation length ξ > 0 [cf.Eq. ( 3)], and (ii) {|ψ N ⟩}, a sequence obtained from depth-T local quantum circuits applied to product states, we are interested in determining how fast T has to grow in order to approximate the MPS well, as measured by the error ϵ = ϵ(ϕ N , ψ N ).We prove here that no quantum circuit with depth T = o(log N ) can faithfully approximate this class.
The proof can be found in [28].To establish this result, we use the fact that |ψ N ⟩ have a strictly finite light cone, whereas in a normal TI-MPS |ϕ N ⟩ correlation functions decay only exponentially.This leads to a mismatch in the expectation value of correlators outside the light cone, which gives a lower bound on the error between the two states.We additionally use the fact that sufficiently distant parts of the system are statistically independent, such that the error accumulates with increasing system size N , unless the circuit depth grows sufficiently quickly.
The algorithm.-Wenow present the key steps for our algorithm.We will (i) approximate |ϕ N ⟩ by | ϕ N ⟩, then (ii) show that | ϕ N ⟩ can be efficiently prepared, and (iii) prove that the approximation error decays sufficiently fast with N .We begin with the case of normal TI-MPS and return to the general case later.
Approximation through the fixed-point state.-Tomake the approximation, we follow the steps of the RG transformation [24].After blocking q sites, we perform a polar decomposition on the blocked tensor B, interpreting it as a map from the D 2 -dimensional virtual space to the d q -dimensional physical space 1 .This way we can write B = V P where V is an isometry with V † V = 1 D 2 and P > 0 is positive definite.Thus The approximation consists of replacing P by its fixedpoint version P ∞ in the tensor B, while keeping the isometry V intact.Graphically, Later we will assign meaning to the approximation sign in Eq. ( 9) by bounding the global error between the MPS |ϕ N ⟩ and | ϕ N ⟩ resulting from the two tensors, B and B. To obtain a vanishing error in the thermodynamic limit we will need q ∝ log N , which we assume for now and justify subsequently.
Preparing the approximate state.-Theapproximate state | ϕ N ⟩ can be prepared by acting on the fixed-point state with a product of unitaries of support q (for simplicity D = d in the illustration) The unitary is constructed such that it implements the required isometry when acting on a product state 2 |0⟩ ⊗ℓ over the "central" region (ℓ = 2 in the illustration) 1 To do a polar decomposition we need q large enough that d q ≥ D 2 .Later, we will have q scaling with the system size, so here we assume that B is injective [14].This is always true for normal tensors after blocking finite (independent of N ) sites [29]. 2 From dimension counting D 2 d ℓ ≥ d q thus ℓ ∼ q.
Note that for normal TI-MPS the fixed-point state |Ω⟩ = ⊗ N/q i=1 |ω⟩ RiLi+1 is a tensor product of entangled pairs, |ii⟩ RiLi+1 (12) each with support over the "right" and "left" Hilbert spaces of neighboring sites (dim It can thus be prepared from a product state with a constant-depth circuit.So far, it is not obvious that the resulting circuit can be expressed efficiently in terms of strictly local gates, because the unitaries in Eq. (11) are only quasilocal, i.e., having support q ∝ log N .While a naive bound on the circuit depth would be poly(N ), here we use the fact that U comes from an MPS to show that in reality it can be implemented in T = O(q).We do this by providing two explicit and exact decompositions of U in terms of gates with constant support, the "sequential-RG" and the "tree-RG".
The sequential-RG circuit.-Wecan express the unitary in Eq. (10) in terms of the original MPS by applying the inverse of P to its virtual legs3 , where in the last step we set A ′i = A i ⊗1 D and contracted P −1 with the rightmost A ′ to obtain C. As in sequential preparation of MPS [12] and in the left-canonical form [30], we can now iteratively apply singular value decompositions, starting from the tensor on the left and moving right, but stopping before the last tensor 4 .This defines a new set of tensors that describe the same isometry V but now each tensor is a local isometry (arrows indicate isometry direction, q = 4 in illustration) with every V i an isometry Since this sequential circuit comprises q sites, its depth is O(q).This scaling is unchanged if we additionally take into account that the inputs of the unitary in Eq. ( 13) are separated by O(q) sites, which requires one to implement SWAP gates.
The tree-RG circuit.-Blockingtwo neighboring sites followed by a polar decomposition is the basis for the real-space RG transformation and halves the correlation length [24,31,32].Instead of directly blocking q sites, we can repeatedly apply this transformation k ∼ log 2 q times to the same effect (illustration below, and in Fig. 1(b)).This generates a treelike circuit with k layers, in which each layer but the lowest consists of isometries from dimension D 2 to D 4 (below A k is obtained from A by blocking k sites, and q = 8) V P ( 1) V P (2)   .
In Eq. ( 16), the lowest layer is again the part that is replaced by the fixed-point state in our algorithm, i.e., a product of |ω⟩ [cf.Eq. ( 12)].In this scheme, the lowest isometry V (k) acts across a distance q.Though not strictly local, this can be done in a depth O(q) utilizing SWAP gates.Subsequent isometries act over distances q/2, q/4 and so forth, leading to an overall circuit depth T = O(q).Approximation error.-Sofar, we constructed efficient circuits for preparing | ϕ N ⟩, having assumed that we block q sites.The scaling of q is a consequence of the following Lemma, which is adapted from Ref. [16].
Lemma 1.Given a sequence of TI-MPS generated from a normal tensor, and for all γ < 1/2, The proof can be found in [28].Using Lemma 1, it follows that q = O(log(N/ϵ)).In particular, blocking We also numerically illustrate the exponential decay of Eq. ( 17) in [28] for preparing the 1D AKLT state [10,11] and an MPS family with tunable correlation length [33], which demonstrates that the circuit is also efficient in practice.
Inhomogeneous short-range correlated MPS.-Our results can be straightforwardly extended to MPS that have a finite correlation length, but are not TI.The setting here is that we are given a sequence of MPS {|ϕ N ⟩} with bond dimension at most D. We define such a sequence to have finite correlation length if, after blocking q = O(log N ) times, the resulting states can be approximated up to quasi-local isometries by a state consisting of nearestneighbor entangled pairs |Ω⟩ = N/q i=1 |ω i ⟩ RiLi+1 , with an error ϵ(Ω, ϕ pos ) → 0 as N → ∞.Here arises after blocking q sites and keeping the positive part of the decomposition of |ϕ N ⟩.If the finite correlation assumption is satisfied, then the preparation scheme consists of preparing |Ω⟩ and implementing the isometry, decomposed with either of the two methods.The resulting total depth is again O(log(N/ϵ)) with error ϵ(Ω, ϕ pos ), as in the TI case.
In the Supplemental Material [28] we numerically show that this protocol can prepare inhomogeneous random MPS [34][35][36][37] efficiently.For that, we use a simple extension of the Evenbly-Vidal algorithm [38,39], which efficiently variationally finds |Ω⟩.This illustrates that our finite correlation length assumption, as defined earlier, is satisfied in a practical setting.
Tree-RG circuit with measurements.-Localmeasurements and conditional local unitaries are the standard framework to perform quantum teleportation [45,46], which can be used to reduce the depth of the tree-RG circuit.Isometries appearing in Eq. ( 16) act on a constant number of sites which, although spatially separated, can be teleported at neighboring registers with a constant overhead.This can be achieved by creating nearest-neighbor entangled pairs, then performing simultaneous measurements, and correcting (without postselection) based on the measurement outcomes [47] (this process is also detailed in Ref. [22]).
Therefore every isometry in Eq. ( 16) takes constant time using measurement.Crucially, however, the tree-RG circuit requires only O(log log(N/ϵ)) layers (in contrast to Ref. [22]).Since the fixed-point state can be prepared in constant time as before, this gives a preparation algorithm for short-range correlated MPS with depth O(log log(N/ϵ)).
Long-range MPS using measurements.-Anotherconsequence of including measurements is that the creation of GHZ-like states |χ M ⟩ = b i=1 α i |i⟩ ⊗M becomes possible in only constant depth [16,26,48].These states are closely related to the fixed points of TI-MPS 5 which, up to an isometry, take the form [25] The normal case corresponds to b = 1 for which |Ω ′ ⟩ = |Ω⟩ while, in general, b is upper bounded by the number of blocks in the canonical form and α (N ) j may depend on N [25].
In [28] we show how to explicitly obtain a state of the form Eq. ( 19) that approximates well the target |ϕ N ⟩ up to local isometries by blocking q ∝ log(N/ϵ) sites.As a result, following the same steps as in the tree-RG circuit with measurements we have a scheme that approximates all TI-MPS (short-or long-range correlated) with depth T = O(log log(N/ϵ)) [cf.Eq. ( 2)].If instead measurements are only used for the preparation of |χ N/q ⟩, the depth is O(log(N/ϵ)).
Our construction generalizes to inhomogeneous longrange correlated MPS exactly as in the short-range case.
Connection to MERA.-The circuit in the tree-RG scheme can be interpreted as a finite-range MERA with O(log log N ) layers, namely a shallow tensor tree acting on the fixed-point state.Specifically, the isometries V (i) [cf.Eq. ( 16)] are identified with the isometries in finiterange MERA, and all disentanglers are the identity, save for the first layer, which is identified with the single layer of unitaries that prepare the fixed-point state.Hence, within the approximation error ϵ, normal TI-MPS ⊂ finite-range MERA with O(log log N ) layers.(20) Discussion and outlook.-Ourresults also imply that MPS in the same phase can be transformed into each other using a log-depth circuit, in contrast to the wellknown quasilocal evolution corresponding to polylogarithmic depth circuit [17,[50][51][52].It would be interesting to explore whether our results could be exploited for applications other than state preparation.Specifically, a number of protocols [39,[53][54][55][56][57] implicitly or explicitly depend on the ability to prepare (or disentangle) MPS using a sequential circuit.It may be possible to replace the sequential circuit with ours to reduce the circuit depth in these protocols.Another direction would be to extend our lower-bound proof and the preparation algorithm to prepare certain higher dimensional tensor network states [49].
where c N > 0 is the normalization constant.After a gauge transformation, every tensor A can be expressed in terms of a basis of normal tensors [25] where the normal A j are in canonical form II and produce orthogonal vectors in the thermodynamic limit [25].Without loss of generality we assume |µ j,k | ≤ 1 with at least one of them having magnitude exactly one.The normal case thus corresponds to b = 1 and a single |µ 1,1 | = 1.From Eq. (S2), it follows that the general form of a TI-MPS is where and |v j ⟩ is the (unnormalized) MPS generated by the normal tensor A j (i.e., Eq. (S1) without c N ).
Let us now define the approximate state | ϕ N ⟩, which generalizes Eq. (10).For that, as in the normal case, we block q sites and perform a polar decomposition of the tensor B i1...iq = A i1 . . .A iq .This results in B = V P , where P : C D 2 → C D 2 is positive-semidefinite and the isometry satisfies V † V = Π for Π the projector onto the image of P .Since V is an isometry, P inherits the block structure of where i 1 , i 2 = 1, . . ., D and all P j are normal tensors.We can therefore express where |v pos,j ⟩ is the unnormalized MPS generated by P j .
The approximate state | ϕ N ⟩ is defined by replacing each normal tensor P j with its fixed-point counterpart P j,∞ (analogous to Eq. ( 8)).Equivalently, we replace each |v pos,j ⟩ by the corresponding fixed-point state |Ω j ⟩.That is, where |Ω j ⟩ = N/q i=1 |ω j ⟩ RiLi+1 are (normalized) nearestneighbor entangled pairs over C D 2 .Importantly, since a basis of normal tensors was used for the decomposition of Eq. (S2), they satisfy local orthogonality ⟨ω j |ω j ′ ⟩ = δ jj ′ [25].Equation (S7) also justifies the form of Eq. ( 19), for which where we explicitly added a superscript (N ) to remember that β j may have a decaying contribution from |µ j,k | < 1 [Eq.(S4)].This contribution vanishes in the limit of blocking q → ∞, but here is taken into account, because neglecting it leads to an unwanted additional error contribution.Because of this, j β j |Ω j ⟩ in Eq. (S7) may strictly speaking not be a fixed point of the RG transformation in the non-normal case.
We now turn to the error estimate.For that, the key lemma comes from Ref. [16], where it was shown that for the normal tensor for all 0 < γ < 1/2 (see Eq. (S29) in the Supplemental Material of Ref. [16]).Here denotes the associated correlation length, i.e., λ the subleading eigenvalue of the transfer matrix of the normal tensor A j .
We are now ready to state and prove our result.
Lemma 1' (Approximation error).Consider a sequence of TI-MPS |ϕ N ⟩ generated by the tensor A. Then for all 0 < γ < 1/2: (ii) For a general non-normal A, and q = o(N ) where ξ diag = max j ξ jj .
Proof.(i) Let us start with the case of a normal tensor.
As detailed in the main text, Then, by the triangle inequality The first term is exactly equal to the LHS of Eq. (S9).
Using Cauchy-Schwarz and that c N = tr E N A , the second is O(exp(−N/ξ)).Since q = o(N ), the first term dominates.
(ii) We now move on to the non-normal case.Here, another set of length scales ξ jj ′ plays a role, which is defined as follows.Consider the inner product of two MPS over N sites, ⟨v j |v j ′ ⟩, where |v j ⟩, |v j ′ ⟩ are generated by normal tensors A j , A j ′ that belong to different basis elements [cf.Eqs.(S2) and (S3)].Then where where τ max the spectral radius of the mixed transfer matrix E jj ′ = i A i * j ⊗ A i j ′ and τ max < 1 (see Lemma A.2 in [25]).
Using triangle and Cauchy-Schwarz inequalities, For the first term, we have , where we used that c 2 N = j |β j | 2 .To bound the first fraction, we use Eq.(S9) and get O N q exp(−γq/ξ diag ) where ξ diag = max j ξ jj .By Eq. (S14) the second fraction is O(exp(−N/ξ off−diag )) where ξ off−diag = max j̸ =j ′ ξ jj ′ .The remaining term is The terms of the first sum are O(exp(−N/ξ jj )), while those of the second sum O(exp(−N/ξ jj ′ )).

Proof of Theorem 1
Before we present the proof, let us introduce the following lemma, which we will use to distinguish the states based on the mismatch between states with strictly finite correlation length and states with exponentially decaying correlations.
Lemma 2 (Exponentially decaying correlations).Let {|ϕ N ⟩} be a sequence of TI-MPS generated by an injective tensor A with finite correlation length ξ > 0 [cf.Eq. (S1)].Then, we can always find two local operators O 1 , O ′ s acting on spins 1 and s with ||O|| = ||O ′ || = 1 such that for any integer s > 1 and sufficiently large N , where c > 0 is independent of N, s.
Proof.Consider the connected correlation function where O 1 and O s are two (potentially different) operators placed at sites 1 and s.We have where the normalization c N = Tr E N 1 , and where d is the physical dimension of the MPS.Given the spectrum of E 1 we can always take N sufficiently large so that we can approximate with an arbitrarily small error, where D is the bond dimension and we have written Since the tensor A is injective [14], we can always choose O (and O ′ ), such that the corresponding transfer matrix E O = |A⟩⟨B| for arbitrary A, B (up to a normalization constant).In particular, we can use this to impose that The first line ensures (S17a), while the second and third ensure (S17b) for sufficiently large N , with c = c ′ /2, where 1/2 is an arbitrary constant chosen for concreteness.Now, we can prove Theorem 1.Let {|ϕ N ⟩} be a sequence of TI normalized MPS on N sites generated by a normal tensor A, and {|ψ N ⟩} a sequence of states obtained by applying a depth-T local quantum circuit to a product state and define the error ϵ = 1 − |⟨ϕ N |ψ N ⟩|.
Theorem 1 (restated).If T = o(log N ) there is some N 0 such that for all N > N 0 we have ϵ > 1/2.
Proof.Let us assume that T = o[log(N )] and T > 2ξ, since we can always add layers of identity operators to increase the depth of the circuit.We approximate {|ϕ N ⟩} through {| ϕ N ⟩} [cf.Eq. ( 10)] obtained by blocking q N = ⌈2(1 + η)ξ ln N ⌉ with η > 0 and use Lemma 1 to bound the error as for some constant c 0 independent of system size.We take N such that we have a large number of blocks, all of the same size, q N , except for the last one, which may be larger.This is always possible, as q N = O(log N ).We also take N large enough to ensure q N > T .We thus have where d(ρ, σ) = ||ρ − σ|| 1 /2 is the trace distance [47], as well as an upper bound on trace distance from fidelity combined with Eq. ( S23) In the following, we will find a lower bound to the first term in Eq. (S24) to make the difference larger than 1/2.We will also drop the subscript N to simplify notation.
To obtain a bound on the distance of |ψ N ⟩ and | ϕ N ⟩, let us consider instead a suitable subsystem.To that end, we divide the chain into ⌊N/(2q)⌋ blocks of size 2q each, with the last block potentially smaller than 2q.We then trace over all 2q spins at the sites contained in the intervals [4mq + 1, 2(2m + 1)q], with m = 0, 1, . . . in both states | ϕ⟩ and |ψ⟩.In case the last block we constructed is smaller than 2q, we trace it as well.If we perform such an operation on | ϕ⟩⟨ ϕ|, we obtain a product state which follows from the definition of | ϕ N ⟩ and the fact that it is invariant under translation by q sites.We have Analogously, applying the same trace to |ψ⟩⟨ψ| we also obtain a product state, because q > T , Using the fact that the trace distance is contractive under tracing, and bounding it in terms of the Uhlmann fidelity, we have [47] with the Uhlmann fidelity between two density matrices ρ and σ defined as Given that ρ and σ are product states, we have where δ = min i d(ρ 0 , σ i ) 2 , and where we have used another bound between the fidelity and the trace distance [47].Next we will lower bound δ using Lemma 2. However, we have to be a bit careful since this lemma applies to ϕ instead of ϕ.Fortunately, we can use Eq.(S25) to replace one with the other.For the sake of concreteness, we will bound d(ρ 0 , σ 1 ) but the same analysis applies to every σ i .
Let us take s = 2T + 1, and the operators O, O ′ from Lemma 2 to define where According to Lemma 2, a O1 = a O ′ s = 0, and In order to use Eq.(S34), we need to connect a Q to a Q .Using Eq. (S25), we choose sufficiently large N to obtain d( ϕ, ϕ) ≤ µ/3.This immediately implies that Moreover, since ψ is created from a product state by a depth-T circuit, every connected correlation for operators at a distance larger than 2T vanishes.Since We now show that for any choice of b O1 , b O ′ s , the distance d(ρ 0 , σ 1 ) is bounded below by a constant.Thus, we minimize Eq. (S34) with respect to (S36) From this it immediately follows that δ > µ 2 /9.

Numerical results
Here we provide numerical evidence that our proposed strategies are capable of preparing relevant short-range correlated (inhomogeneous) MPS under open boundary conditions, which are of the form are tensors on the boundary, while the tensors in the bulk are defined similarly to those in Eq. (3).To prepare such states, we construct the isometries analytically, as delineated in the main text, and use a simple extension of the Evenbly-Vidal algorithm [38,39] to variationally find the fixed-point state of the form in Eq. ( 19) that maximize the fidelity between the approximate state [cf.Eq. ( 10)] and the target state (see more details in Appendix 4).Since the variational space comprises only (superpositions of) product states of entangled pairs [cf.Eq. ( 19)], this optimization scheme is found to be highly efficient.Moreover, this local optimization strategy can effectively encapsulate the inhomogeneity present in the target state, which makes it especially suitable for preparing non-TI MPS and states with open boundary conditions, and can be directly extended to the case of long-range MPS.
In the following, we numerically study the performance of our algorithm for preparing three types of short-range correlated MPS: (1) the 1D AKLT state [10,11], (2) an MPS family with tunable correlation length [33], and (3) inhomogeneous random MPS [34], which illustrate that our algorithm succeeds in general inhomogeneous settings, too.
1. Preparation of AKLT state and the MPS family [33] The 1D AKLT state is a paradigmatic state in condensed matter physics, with important application in measurement-based quantum computation [59].The 1D AKLT state can be formed by first having a product state of singlets consisting of virtual qubits that connect neighboring sites of the 1D chain, then projecting the virtual qubits of two neighboring pairs to their symmetric subspace (with spin S = 1).In our calculation, we consider the spin S = 1 at each site is formed by the symmetric subspace of two actual qubits in the quantum device, and the resulting AKLT state is an MPS of bond dimension D = 2 and physical dimension d = 4.
Figure S1(a) illustrates the scaling of the error per block ϵ/M (where the number of blocks M = N/q)[cf.Eq. ( 17)] as a function of blocking range q (measured in units of the correlation length ξ, with ξ AKLT = 1/ ln 3) for the AKLT state.In our calculations, we chose the number of blocks M = 200.As expected, both our circuit constructions have comparable performance, with ϵ/M exhibiting an exponential decay with q/ξ, in accordance with the bound of Eq. ( 17).This verifies the predicted scaling of T = O(log(N/ϵ)).
To further assess the efficacy of our algorithm for MPS with varying correlation lengths ξ, we also investigate the MPS class of bond (physical) dimension D = 2 (d = 2), with matrices in Eq. (S39) of the form [33] A 0 .The results on the scaling of the error per block ϵ/M is shown in Fig. S1(a).We see that, for the AKLT state and the MPS class of various correlation lengths ξ ≈ 4, 16, the scaling of ϵ/M show almost the same behavior as ϵ/M ∼ exp(−γ num q/ξ) with γ num ≈ 2, where the number of blocks M = N/q.Notably, γ num is much larger than the analytically derived value 0 < γ < 1/2 [cf.Eq. ( 17)].Therefore, for these two classes of states, in practice, one can prepare them faster than predicted in the worst-case bound Eq. ( 17).

Scaling of CNOT depth for various schemes
To make the scaling T = O(log N/ϵ) of our protocol more relevant to the current devices where the multi-qubit unitaries are decomposed into CNOT gates and singlequbit rotations, we present a simple comparative study for the scaling of the CNOT depth T CNOT required to prepare the MPS class [cf.Eq. (S40)] with correlation length ξ ≈ 4 with the fidelity F = |⟨ϕ N | ϕ N ⟩| 2 = 0.9.We compare four different schemes: (1) sequential-RG scheme, (2) tree-RG scheme, (3) tree-RG scheme assisted by measurements, and (4) the sequential scheme [12].We present the result directly here and provide an explanation of our estimation of T CNOT later.
Figure S1(b) shows T CNOT for these four different schemes.As anticipated, for both the sequential-RG and the tree-RG schemes, we observe the overall scaling T = O(log N ), and these methods result in a significantly smaller T CNOT compared to that of the sequential method, particularly for large system sizes N .Additionally, due to the fact that the blocking range q can only increase discretely, we observe plateau-like features in the scaling of T CNOT in Fig. S1(b).Moreover, when the tree-RG scheme further assisted by measurements, the T CNOT is further reduced compared to the stand-alone tree-RG scheme, yielding the smallest CNOT among all schemes, with only T CNOT ≈ 100 when creating this state of N = 10 6 qubits.

a. Estimation of the CNOT depth
This technical subsection elaborates on how we estimate T CNOT for different schemes studied in Fig. S1(b).For the sake of simplicity, our focus lies on the MPS with the bond (physical) dimension D = 2 (d = 2), which aligns with the results presented in Fig. S1(b).Estimation for MPS with varying physical or bond dimensions can be accomplished in a similar manner.
In general, the schemes considered in Fig. S1(b) involve two types of gates: (1) Isometries mapping from m qubits to n ≥ m qubits and (2) SWAP gates utilized to change the qubit location of the fixed-point state in the sequential-RG scheme and to implement long-range isometries in the tree-RG scheme.17)] plotted as a function of the blocking range q/ξ, for preparing states in the MPS family [33] and the AKLT state.(b) Estimated CNOT depth TCNOT for preparing the MPS family of correlation length ξ ≈ 4 with fidelity F = 0.9 using different schemes.The 'sequential-RG' denote the sequential-RG circuit, 'tree' denote the tree-RG circuit.The 'tree + meas.' denote the measurement-assisted tree-RG circuit, and 'sequential' refers to the sequential circuit [12].
It is well known that a SWAP gate can be decomposed as three CNOT gate.For the isometries (1), we estimate the CNOT depth of each isometry, denoted by T iso (m, n), using its theoretical lower bound [60] T This theoretical limit could potentially be reached by existing gate decomposition approaches [61].Specifically, for preparing MPS with d = D = 2, we have T iso (1, 2) = 2 for the sequential scheme, T iso (2, 3) = 10 for the sequential-RG scheme, and T iso (2, 4) = 26 for the tree-RG scheme.
In the following we count the gates used in each scheme for preparing MPS of system size N and bond (physical) dimension D = 2(d = 2): • The sequential scheme uses one layer of 0-to-2 qubit isometry on the boundary and N − 2 layers of 1-to-2 qubit isometries in the bulk [12,62].
• The sequential-RG scheme with blocking range q uses q − 2 layers of SWAP gates and q − 2 layers of 2-to-3 qubit isometries.
• The tree-RG scheme iteratively blocks the chain for ∼ log(q) times.The first blocking (blocking to injectivity) produce a layer of two-qubit gates.After that, the m-th blocking (m ≥ 2) produce a layer of 2-to-4 qubit isometries, where the largest distance between qubits within the same isometry is 2 m .We implement such long-distance isometries using local gates by first swapping the qubits to the center region of the isometry, then local implementing the 2-to-4 isometry, and finally swapping the qubits back.This leads to an additional (2 m − 4) layers of SWAP gates for each m.
• The measurement-assisted tree-RG scheme simply eliminates the SWAP cost in the aforementioned tree-RG scheme, since now the long-distance isometries can be executed by gate teleportation [45,46].It's noteworthy that there are also Bell-state ancilla preparation, measurement, and postprocessing costs involved in this scheme, but our focus here is solely on the circuit depth of the scheme.
Based on the above components, we can directly estimate T CNOT as a function of the system size N and the required blocking range q.Here, q can be obtained from the scaling of the error per block (analogous to that in Fig. S1(a)) and the required state preparation fidelity F, thereby resulting in the T CNOT shown in Fig. S1(b).

Preparation of inhomogeneous random MPS
Here we illustrate our algorithm in the fully inhomogeneous case of random MPS.Since any MPS can be brought to a canonical form with isometric tensors, a natural way to define the corresponding ensemble is by choosing each tensor randomly according to the Haar measure of the unitary group U (dD) [34].As these states correspond to the ground states of disordered local Hamiltonians [49], they can be considered as representative states of the trivial topological phase [17,63], and it is of interest to explore various properties of this class [34][35][36][37].
Since random MPS are expected to be short-range correlated [37], we anticipate that our protocol can efficiently prepare this class of states.In Fig. S2, we display the scaling of the error per block ϵ/M with the blocking range q for randomly sampled 1000 states of d = D = 2 using the tree-RG protocol, and observe the asymptotic scaling ϵ/M ∼ exp(−cq) (S42) in all instances (note that the number of blocks M = N/q), with c varying only slightly between different individual cases.This scaling is reminiscent of the behavior predicted in Eq. ( 17), and it directly implies that our protocols can prepare such inhomogeneous random MPS efficiently.
S2. Error per block as a function of the blocking range q for inhomogeneous random MPS of bond dimension D = 2.The gray lines represent a total of 1000 individual samples, while the blue line illustrates the average behavior (averaged over c in Eq. (S42)).

The local variational optimization
Here we provide more details of the optimization algorithm developed in Ref. [38,39], which is used here for preparing inhomogeneous MPS.
Our task is to prepare the MPS |ϕ N ⟩ obc [cf.Eq. (S39)] of a finite physical dimension d and bond dimension D. As described in the main text, we first block each neighboring q site to analytically construct the isometries.This leaves us with |ϕ pos ⟩ [cf.Eq. ( 18)], which we aim to approximate with a state |Ω⟩ consisting of nearest-neighbor entangled pairs The optimization algorithm aims to maximize the fidelity F = |⟨ϕ pos | Ω⟩| 2 by optimizing the form of nearestneighbor entangled pairs {|ω i ⟩ RiLi+1 } in |Ω⟩.For this, we write each |ω i ⟩ RiLi+1 as a local two-qudit unitary W i acting on the product state, as To optimize the i-th unitary W i , we write the overlap between |ϕ pos ⟩ and |Ω⟩ as (S45) where E i = |Ω i ⟩⟨ϕ pos | is the environment of the unitary W i .By absorbing the overall phase factor into the unitary W i , the fidelity can be expressed as the square of the real part of the overlap, as Therefore, one can directly find the unitary W i to maximize F pos by doing a singular value decomposition of the environment E i = X i S i Y † i , and choosing W i = Y i X † i .By iterating through all the gates {W i } and sweeping back and forth until the fidelity F converges, we find the |Ω⟩ that best approximate |ϕ pos ⟩ [38,39].
Since there are N/q commuting two-qudit unitaries {W i } used for creating |Ω⟩ and the target |ϕ pos ⟩ is an MPS of finite physical dimension and bond dimension D, the construction of environment {E i } and its SVD decomposition are computationally efficient, with the computational effort of constructing each E i scales as O(N D 4 ).Moreover, we do not encounter the problem of falling into local minima for obtaining the results shown in Figs.S1 and S2, thanks to the small number of variational parameters in |Ω⟩.Overall, this allows the variational optimization to efficiently find the best approximation |Ω⟩ to |ϕ pos ⟩.We also point out that there are more advanced methods for optimizing isometries [64], which might be useful for the optimization task here as well.Finally, for the case of long-range correlated MPS, the fixed point |Ω ′ ⟩ [cf.Eq. ( 19)] can be created by applying a single layer of local two-qudit unitaries on the underlying GHZtype state, and the optimization algorithm described here can be directly extended to that case to optimize those two-qudit unitaries.

FIG. 1 .
FIG. 1. Algorithm for MPS preparation.(a)After blocking, we approximate the state through its RG fixed point (nearestneighbor entangled pairs for normal tensors [Eq.(12)]) combined with isometries V that encode the local structure of the state.(b) We use RG to construct an efficient circuit for V , which can be further expressed with a low-depth circuit of local gates.(c) Our algorithm extends to non-normal (i.e., long-range correlated) MPS with GHZ-like fixed points [Eq.(19)] depicted here.Using quantum circuits assisted by measurements, both the fixed-point states and the isometry V can be implemented efficiently.

g 0 0
, ∀j ∈ (2, ..., N − 1), (S40) and the boundary tensors are chosen as the 2 × 2 identity matrix.The correlation length of this MPS class can be tuned by the parameter g as ξ = ln 1−g 1+g −1 FIG. S1. (a)The error per block ϵ/M [cf.Eq. (17)] plotted as a function of the blocking range q/ξ, for preparing states in the MPS family[33] and the AKLT state.(b) Estimated CNOT depth TCNOT for preparing the MPS family of correlation length ξ ≈ 4 with fidelity F = 0.9 using different schemes.The 'sequential-RG' denote the sequential-RG circuit, 'tree' denote the tree-RG circuit.The 'tree + meas.' denote the measurement-assisted tree-RG circuit, and 'sequential' refers to the sequential circuit[12].