Local variational quantum compilation of a large-scale Hamiltonian dynamics

Implementing time evolution operators on quantum circuits is important for quantum simulation. However, the standard way, Trotterization, requires a huge numbers of gates to achieve desirable accuracy. Here, we propose a local variational quantum compilation (LVQC) algorithm, which allows to accurately and efficiently compile a time evolution operators on a large-scale quantum system by the optimization with smaller-size quantum systems. LVQC utilizes a subsystem cost function, which approximates the fidelity of the whole circuit, defined for each subsystem as large as approximate causal cones brought by the Lieb-Robinson (LR) bound. We rigorously derive its scaling property with respect to the subsystem size, and show that the optimization conducted on the subsystem size leads to the compilation of whole-system time evolution operators. As a result, LVQC runs with limited-size quantum computers or classical simulators that can handle such smaller quantum systems. For instance, finite-ranged and short-ranged interacting $L$-size systems can be compiled with $O(L^0)$- or $O(\log L)$-size quantum systems depending on observables of interest. Furthermore, since this formalism relies only on the LR bound, it can efficiently construct time evolution operators of various systems in generic dimension involving finite-, short-, and long-ranged interactions. We also numerically demonstrate the LVQC algorithm for one-dimensional systems. Employing classical simulation by time-evolving block decimation, we succeed in compressing the depth of a time evolution operators up to $40$ qubits by the compilation for $20$ qubits. LVQC not only provides classical protocols for designing large-scale quantum circuits, but also will shed light on applications of intermediate-scale quantum devices in implementing algorithms in larger-scale quantum devices.

Implementing time evolution operators on quantum circuits is important for quantum simulation. However, the standard way, Trotterization, requires a huge numbers of gates to achieve desirable accuracy. Here, we propose a local variational quantum compilation (LVQC) algorithm, which allows to accurately and efficiently compile a time evolution operators on a large-scale quantum system by the optimization with smaller-size quantum systems. LVQC utilizes a subsystem cost function, which approximates the fidelity of the whole circuit, defined for each subsystem as large as approximate causal cones brought by the Lieb-Robinson (LR) bound. We rigorously derive its scaling property with respect to the subsystem size, and show that the optimization conducted on the subsystem size leads to the compilation of whole-system time evolution operators. As a result, LVQC runs with limited-size quantum computers or classical simulators that can handle such smaller quantum systems. For instance, finite-ranged and short-ranged interacting L-size systems can be compiled with O(L 0 )-or O(log L)-size quantum systems depending on observables of interest. Furthermore, since this formalism relies only on the LR bound, it can efficiently construct time evolution operators of various systems in generic dimension involving finite-, short-, and long-ranged interactions. We also numerically demonstrate the LVQC algorithm for one-dimensional systems. Employing classical simulation by time-evolving block decimation, we succeed in compressing the depth of a time evolution operators up to 40 qubits by the compilation for 20 qubits. LVQC not only provides classical protocols for designing large-scale quantum circuits, but also will shed light on applications of intermediate-scale quantum devices in implementing algorithms in larger-scale quantum devices.

I. INTRODUCTION
Implementing time evolution operators under a largescale Hamiltonian is one of the most important tasks in noisy intermediate-scale quantum (NISQ) devices [1] and larger fault-tolerant quantum computers to exploit their computational power. The task is computationally hard for classical computers; despite the enormous effort toward its efficient computation, it generally takes resources that are exponential to the system size. On the other hand, quantum computers are capable of executing it in polynomial time [2]. It is also important for computing eigenvalues and eigenstates of a system on a quantum computer; the quantum phase estimation algorithm [3][4][5] uses controlled time evolution operators to generate them. Recent hardware with tens of qubits has realized its proof-of-principle demonstrations for systems such as Fermi-Hubbard models [6], discrete time crystals [7,8], and various equilibrium and nonequilibrium phenomena [9][10][11].
Trotterization is one of the simplest implementations, which has been extensively investigated theoretically * mizuta.kaoru.65u@st.kyoto-u.ac.jp [2,[12][13][14][15][16][17] and employed in various experiments such as Refs. [6,9,11,18,19]. Despite recent developments, it may involve a huge number of gates when applied to a large scale problem with over 50 qubits. For example, Ref. [17] estimated that we need 10 15 -10 18 gates to perform time evolution of simple molecules. Even for the simpler Heisenberg model, it is estimated that 10 6 -10 7 elementary rotation gates are needed [15]. These estimates are well beyond the reach of current quantum devices whose gate infildelities are on the order of 1%. Moreover, it is problematic even for an ideal fault-tolerant quantum computer because execution of 10 15 gates would require years even if it can perform 10 8 gates per second.
It is therefore vital to develop methods that can compress the circuits for time evolution. The so-called qubitization technique [20] has achieved an optimal scaling in the number of gates needed, but requires many ancilla qubits for its implementation (see e.g. Ref. [21]). When focusing on algorithms that requires no or few ancilla qubits, Refs. [22,23], for example, have presented depth-compression methods for Trotter expansion based on some algebraic structures. Another promising approach is to use the framework of variational quantum algorithms [24]. They are exemplified by variational quantum simulation [25][26][27][28][29][30][31], and quantum compilations . We can directly implement a large-scale time evolution operator with the optimal parameter θopt. LVQC can be completed by classical simulation with some approximation or NISQ devices, without implementing the target exp −iH (L) τ itself.
employing variational quantum diagonalization [32][33][34]. Among other methods, quantum-assisted quantum compiling (QAQC) [35,36] and its variant [37] are one of the promising ways to obtain approximate time evolution operators with compressed circuit depth. It uses a variational quantum circuit V to approximate a target unitary U . Importantly, they employed a local cost function instead of the naive global fidelity measure Tr U † V to avoid the barren plateau problem. While QAQC is available for generic target unitary gate U on L qubits, it seems to be problematic for depth compression that the target U itself should be accurately implemented on quantum circuits.
In this paper, we develop a local variational quantum compilation (LVQC) protocol to search an accurate and efficient quantum circuit for constructing a large-scale local Hamiltonian dynamics with limited-size quantum devices or possibly with classical simulation of such limitedsize quantum circuits. To formulate the protocol, we focus on Lieb-Robinson (LR) bound [38], which dictates that the dynamics under a local Hamiltonian has approximate causal cones. We compose of subsystem cost functions for every subsystem which measure the local difference between the target unitary gate and the ansatz. Exploiting the LR bound, we rigorously derive their scaling, which is validated when the subsystem size is as large as the approximate causal cone. These results lead to our LVQC protocol as described in Fig. 1; we optimize a local-compilation cost function, corresponding to the average of the subsystem cost functions over subsystems. This cost function can be computed with a at-most 2Lqubit quantum device or a corresponding classical simulator, whereL (< L: system size) denotes the scale of the causal cone size. Finally, we construct a quantum circuit that approximates the target time evolution operator for the system size L based on the resulting optimal parameters.
We also conduct classical numerical demonstration of LVQC to compress the depth of the ideal time evolution operators. We adopt a one-dimensional Heisenberg model, and optimize the cost function for subsystems by approximately computing it with time-evolving block decimation (TEBD) [39,40]. We successfully compose of a 5-depth time evolution operator for 40 qubits by the local compilation for 20-qubit systems. This achieves the average gate fidelity 0.9977, which is much better than that of the same-depth Trotter decomposition, 0.8580. In addition, by computing the stroboscopic dynamics of ferromagnetic states with local excitations or domain walls, the optimal ansatz obtained by LVQC reproduces the dynamics with size-and time-scales twice and ten times as large as those used in the compilation, respectively.
We emphasize some advantages of LVQC. First, it requires at-most 2L-qubit quantum devices as large as the causal cone size, which is comparably smaller than the whole-system size L. There is no need for preparing the ideal target unitary gate for the size L during our protocol. Second, our formulation relies only on the existence of the LR bounds. LVQC is available for broad systems involving finite-ranged, short-ranged, and long-ranged interactions in generic dimension, with the help of the recent developments in the LR bound [41][42][43][44][45][46][47][48][49]. We expect that LVQC can be applied for executing large-scale time evolution operators in the following ways; future, and hence our results will contribute to bridging the gap between NISQ devices and larger-scale quantum computers. The rest of this paper is organized as follows. In Sec. II, we introduce QAQC and the LR bound as the preliminaries for our results. We devote Sec. III, Sec. IV and Sec. V to provide the main results. In Sec. III, we introduce the subsystem cost function from the local cost functions of QAQC, and rigorously prove its scaling property by the LR bounds. In Sec. IV, we formulate the LVQC protocols respectively for translationally-invariant systems and other generic systems. The above scaling yields the local compilation of a large-scale Hamiltonian dynamics for both cases, while the protocol is simplified in the former case. Finally, we show its numerical verification in Sec. V and conclude this paper in Sec. VI.

II. PRELIMINARIES
In this section, we review some preliminary studies in order to derive our results on LVQC for a large-scale Hamiltonian dynamics.
A. Quantum-assisted quantum compiling (QAQC) Quantum-assisted quantum compiling (QAQC) [35] is a quantum-classical hybrid algorithm to obtain a variational quantum circuit V (θ) with parameters θ, which approximates a target unitary operator U . They have introduced several cost functions C(U, V ), that should be minimized, to to obtain an optimal parameter θ opt such that U V (θ opt ). The cost functions C(U, V ) should satisfy the following properties; 1. (Computability) We can efficiently compute C(U, V ) with a quantum computer.
2. (Faithfulness) C(U, V ) is always non-negative, and it becomes 0 if and only if U and V are equivalent.
3. (Operational meaning) C(U, V ) provides constraints on some operationally meaningful value.
The first cost function is a global one defined by when U and V are defined on an L-qubit lattice Λ. This can be measured by Hilbert-Schmidt Test (HST). In HST, we use an 2L-qubit lattice Λ A ∪ Λ B (each of Λ A and Λ B is a copy of Λ), and initialize the state by the Bell state |Φ + AB , defined by The state |Φ + Aj Bj represents the Bell pair of the j-th sites A j and B j respectively in Λ A and Λ B . Then, we apply U and V * respectively to the subsystems A and B, resulting in the state and perform the Bell measurements for every j-th pair A j and B j . It is equivalent to measure Π 1 Π 2 . . . Π L , where Π j is defined by Finally, since Eq. (1) can be rewritten as we can efficiently compute C HST (U, V ) with a 2L-qubit quantum device. The term Tr is schematically depicted by Fig. 2 (a). The cost function C HST (U, V ) is faithful in that it satisfies 0 ≤ C HST (U, V ) ≤ 1 and that it becomes 0 if and only if there exists ϕ ∈ R such that U = e iϕ V . The second one is a local cost function defined by where each term is given by for j = 1, 2, . . . , L. They satisfy 0 ≤ C LHST (U, V ) ≤ 1 and 0 ≤ C LHST (U, V ) ≤ 1 by their definitions. We can compute them on a 2L-qubit quantum device by Local Hilbert-Schmidt Test (LHST), in which we perform Bell measurement of the j-th pair A j and B j on the state ρ AB (U, V ) for C (j) LHST (U, V ) and take its average for C LHST (U, V ). The term Tr[Π j ρ AB (U, V )] is described by Fig. 2 (b). In terms of faithfulness, C (j) LHST (U, V ) satisfies the following property, (9) where I {j} denotes the identity operator acting on j-th qubit. This indicates that the action of U corresponds to that of V on the j-th site. Thus, the cost function In QAQC in Ref. [35], the authors employ either or the combined cost function (10) with 0 ≤ α ≤ 1. It is faithful, and possesses an operational meaning in terms of the average gate fidelity, defined bȳ Haar random state. (11) This indicates the expected fidelity between U |ψ and V |ψ averaged over a Haar random state |ψ , and it is bounded from below by the resulting cost functions as follows [35,50,51], where |Λ| denotes the number of sites in the lattice Λ. The cost functions of QAQC can be efficiently computed on a 2L-qubit quantum device based on Eqs. (6) and (8). Alternatively, we can nontrivially reduce the resource for cost evaluation to L qubits by the following lemma, which we prove in Appendix A.
Lemma 1. C HST (U, V ) and C LHST (U, V ) for L-qubit unitaries U and V can be evaluated effciently within an additive error with O(1/ 2 ) runs of an L-qubit device.
It should be noted that the algorithm to achieve Lemma 1 involves a Monte-Carlo sampling and induces increased (however constant) overhead compared to the case where we use 2L qubits. In any cases, the bottleneck of QAQC for compressing time evolution operators is to implement the target U itself on at-least L-qubit quantum systems for cost evaluation. Our protocol can avoid this problem by compiling with smaller quantum systems with the sizẽ L, as large as the approximate causal cone by the LR bound, as discussed in Sec. IV,

B. Lieb-Robinson bound
Lieb-Robinson (LR) bound dictates that any local observable cannot spread out faster than a certain finite velocity (called Lieb-Robinson velocity) under a local Hamiltonian [38]. This can be interpreted as the emergence of approximate causal cones in quantum mechanics.
Let us describe it more precisely. We focus on a local Hamiltonian on a lattice Λ, given by where h X denotes a term nontrivially acting on a domain X ⊆ Λ. Let · denote the operator norm. Here, we assume 1. (Extensiveness) Local energy scale at every site is bounded by a finite value g; X;X j h X ≤ g, for any j ∈ Λ.
3. (Range of interactions) Interactions are finite- Let us consider local observables O j and O j acting on j and j respectively, and assume that they are normalized as O j = O j = 1. Then, the inequality, holds for a fixed time τ . Here, the constant velocity v and the constant length ξ are determined only by the extensiveness g, the locality k, and the range d H , while the constant C depends on τ in addition (C typically increases linearly in τ [38]). This suggests that U (τ ) † O j U (τ ) approximately acts on the domain inside the approximate causal cone {j ∈ Λ | dist(j, j ) ≤ vτ }, and that the components outside of it are exponentially suppressed in the distance from j. As a result, it can be expected that U (τ ) † O j U (τ ) is well reproduced by the local Hamiltonian inside the approximate causal cone. Let H (L ,j) denote the local Hamiltonian composed of h X whose support X has distance from j smaller than L /2 [See Eqs. (23) and (24) for the exact definition]. In fact, we can derive the following inequality from the LR bound (See Ref. [47] and Appendix B); The brickwork-structured ansatz V (L) (θ), defined by Eq. (22). For translationally-invariant systems, we choose the variational parameter set θ = {θ i,k , θ i,k } i,k so that θ i,k and θ i,k respectively become independent of the position k.
The yellow region represents the j-centeredL-size domain Λ (L,j) , which is utilized for composing of the restricted ansatz V (L,j) (θ) based on Eq. (32).
where l 0 is defined by L = 2(l 0 + d H + vτ ). The integration comes from the summation all over the lattice out of the approximate causal cone. This relation enables us to approximate the local cost function (7) for a large size L by that for the smaller size L with an arbitrarily small error e −O(l0/ξ) when L is sufficiently large compared to vτ .

III. APPROXIMATION OF LOCAL COST FUNCTIONS BY LIEB-ROBINSON BOUND
In this section, we provide the first main result, where we compose of the subsystem cost functions and show their scaling property by the LR bound. The subsystem cost functions are obtained by the restriction of systems to smaller subsystems for the local cost function C LHST . We clarify the approximate causal cone from the LR bound and the exact causal cone from the ansatz in the local cost functions. They lead to two formulas, which are respectively raised as Propositions 2 and 3 below. As a result, we obtain how the error between the subsystem cost functions and C LHST scales in the subsystem sizeL, and validate the approximation of C LHST by the subsystem cost functions with properL. As we will see in Sec. IV, these results enables the LVQC protocol for the whole-system Hamiltonian dynamics.
First of all, we specify the setup and the notation. We consider a local and extensive Hamiltonian with finiteranged interactions, H, on a lattice Λ [See Eqs. (15)- (17)]. Throughout the main text, we focus on an Lqubit one-dimensional system, as Λ = {1, 2, . . . , L}, but the extension to other cases is straightforward (See Appendix B). We explicitly write the system size L like H (L) , and consider the target time evolution operator U (L) = exp −iH (L) τ . For simplicity, we employ a brickwork-structured ansatz with the depth d in the form of as described in Fig. 3. Here, V (2) j,j represents an arbitrary parametrized two-qubit gate on the neighboring sites j and j , and the parameter set θ is composed of {θ i,k } i,k and {θ i,k } i,k . Now, we derive two rigorous relations on the local cost function for each j-th site, C (j) LHST (U (L) , V (L) ), using the approximate causal cone from the LR bound and the exact causal cone from the locality of ansatz. The first one, coming from the LR bound, validates the evaluation of the cost function with a local Hamiltonian acting only on qubits around j-th cite. To be precise, when we define the j-centered L -size domain Λ (L ,j) and the restricted Hamiltonian H (L ,j) by for the L-qubit Hamiltonian H (L) = X;X⊆Λ h X , they are related to the local cost functions C with a tunable parameter l 0 , the range of the Hamiltonian d H , and the LR velocity v. Then, the time evolution operator under the restricted Hamiltonian, defined by provides the following inequality, Here, the term ε LR is defined by Eqs. (20) and (21), and it is exponentially small in the tunable parameter l 0 as ε LR = e −O(l0/ξ) .
Proof.-From the definition Eq. (8), we obtain Considering that the projection to the Bell state is expanded by the right hand side of Eq. (28) is bounded by The above inequality comes from Eq. (20), the LR bound for the local observable. Finally, we obtain the relation which implies the inequality Eq. (27).
This proposition says that the restriction of the Hamiltonian to a smaller region hardly alters the local cost functions. The difference is bounded by the LR bound error ε LR . Equivalently, the diagram of Fig. 2 (b), which gives C LHST , can be approximated by that of Fig. 4 (a), which gives the restricted version. We note that this proof relies only on the existence of the LR bound, and hence Proposition 2 is valid also for generic locally-interacting systems in any dimension. For onedimensional systems with finite-ranged interactions, we have ε LR = exp(−O(l 0 /ξ)) with L = 2(l 0 +d H +vτ ) from Eq. (21). Based on this proposition, we can accurately determine the upper bound of the local cost function C from the d-depth ansatz V (L) (θ) of Eq. (22). Here, the symbols Π (L,j) k represent the product over k such that the support of V (2) 2k,2k+1 (θ i,k ) (for the first one) or V (2) 2k−1,2k (θ i,k ) (for the second one) is included in the domain ΛL ,j (See Fig. 3). Then, we obtain the following proposition.
Proposition 3. We consider the same situation as that of Proposition 2. We assume 4d ≥ L for the d-depth Lsite ansatz V (L) (θ), and rewrite the depth as d = L /4+d (d ≥ 0 is chosen so that d becomes an integer). For L satisfyingL ≥ L + 2d + 1, where the right hand side represents the size of the approximate causal cones, the d-depthL-site ansatz V (L,j) (θ) satisfies the following equality; Here,Ũ (L ,j) represents the restriction of U (L ,j) to the domain ΛL ,j , which is given bỹ Remark.-The assumption 4d ≥ L is not essentially required for proving this proposition. It rather serves as a guideline to construct the ansatz V (L) . When we employ the brickwork-structured ansatz given by Eq. (22), a local observable acting on a single qubit generally spreads to 4d-qubit operators. Hence, we should use d such that 4d ≥ L to capture the correlation within the LR bound and thereby accurately approximate the time evolution. It is straight-forward to generalize the above proposition to smaller d with a slight modification ofL.
Proof.-We employ the causal cones of quantum circuits here. Let us focus on Tr[Π j ρ AB (U L ,j , V (L) )], which can be schematically depicted by Fig. 4 (a). To visualize the causal cone, we pick up a part of the circuit belonging to the right half in the figure (the light-green region), which results in Fig. 4 (b). The light-blue squares in Fig. 4 (b) represent local operators on j-th sites composing Π j , given by Eq. (29). We also note that V Fig. 4 (a) is translated into V (L) † since its input and output are exchanged.
Each local two-qubit gates in the ansatz V (L) † can be classified into three groups by its effect on the local cost function. The first group is depicted by the white (nonpainted) two-qubit gates in Fig. 4 (b). Since these local gates and the corresponding ones in V (L)T B cancel each other by the contraction in the lower layer of Fig. 4 (a), they do not affect Tr[Π j ρ AB (U (L ,j) , V (L) )]. This cancellation is due to the locality of Π j and independent of U (L ,j) A appearing in the upper layer. The second group, composed of the gray two-qubit gates, are also inactive, because they can be contracted to identity in the upper layer. In contrast to the first group, its cancellation originates from the size restriction of the Hamiltonian H (L) to H (L ) , validated by the LR bound. The last one is composed of the yellow gates residing within the causal cones. Only these two-qubit gates are relevant for Tr[Π j ρ AB (U (L ,j) , V (L) )], which can be schematically depicted as Fig. 4 (c).
Finally, we determine the proper compilation sizeL. The active region, composed of the two causal cones spreading from the left-and the right-side [See (i) and (ii) in Fig. 4

(c)], is designated by
When the compilation sizeL surpasses its height, that is, whenL the restricted ansatz V (L,j) (θ) includes all the two-qubit gates in the active region. Therefore, we have where we use the fact that the contraction over Λ\ΛL ,j gives identity for the last equality. By using the definitions of the local cost function C LHST (Ũ (L ,j) , V (L,j) ), which can be measured by a 2L-qubit quantum device or aL-qubit quantum device with a Monte-Carlo sampling based on Lemma 1. Propositions 2 and 3 yield that the local cost function C (j) LHST (U (L) , V (L) ) can be approximated by the subsystem cost function, and they also dictate the scaling property of the subsystem cost function in the subsystem sizẽ L. Importantly,L ≥ L +2d +1 = 2(l 0 +d H +vτ )+2d +1 can be independent of the whole-system size L and significantly smaller than L. We note that the coefficient of the depth d inL comes from the brickwork structure of the ansatz V (L) . We can obtain the same result for any other ansatz with changing the coefficient inL as long as it is local.

IV. LOCAL VARIATIONAL QUANTUM COMPILATION OF A LARGE-SCALE HAMILTONIAN DYNAMICS
In this section, we formulate the local variational quantum compilation (LVQC) of a large-scale Hamiltonian dynamics as the second main result. In our protocol, we construct an approximate time evolution operator for the large size L by optimizing the cost functions defined on the smaller sizeL. Based on Propositions 2 and 3, we provide two different formulations for translationallyinvariant cases (Sec. IV A) and generic cases (Sec. IV B).

A. Local compilation for translationally-invariant systems
We first deal with translationally-invariant cases under periodic boundary conditions (PBC). Throughout this section, we denote such a translationally invariant Hamiltonian and its time evolution operator for the size L as H Theorem 4. We define the local-compilation cost function by, for a certain α ∈ [0, 1], which is defined on anL-size translationally-invariant systems under PBC. Assume that, after the minimization of C (L) α (θ), the optimal parameter set θ opt gives the upper bound of the local and global cost functions as When we choose the smallest even number larger than 2(l 0 + d H + vτ ) + 2d + 1 as the compilation sizeL, the time evolution operator for an L-qubit system (L ≥L) is approximated as with the usage of the same parameter set θ opt .
Proof.-We first derive Eq. (41) from Eq. (39). We combine translation symmetry with the scaling property of the subsystem cost functions, represented by Propositions 2 and 3. As a result, we obtain the relation for any j, Here,Ũ (L ,j) and V (L,j) are constructed from U PBC . To recover the PBC, we take the following strategy. Figure  5 gives a schematic picture of a part of gates composing Tr[Π j ρ AB (Ũ L ,j , V (L,j) )], similar to Fig. 4 (b). First, we add two-qubit gates V PBC . Since local gates outside of the causal cones does not alter the local cost function at all, we obtain the following relation; (44) We also recover the PBC of the target unitaryŨ (L ,j) . Let us consider the two Hamiltonians H for any local normalized observable at a j-th site, O j . This implies that we can apply Propositions 2 and 3 with substituting U Combining this inequality with Eq. (44), we arrive at the relation of C LHST , given by Eq. (41). Next, we derive Eq. (42), which gives an upper bound of the global cost function C HST . We employ the following inequality [35], PBC (θ opt )) ≤ ε HST + 3 2 ε LR . Finally, considering |Λ| = L for a one-dimensional system, the second inequality in Eq. (47) implies Eq. (42).
This theorem tells us that the optimal parameter set θ opt for theL-size local-compilation cost function can be directly employed to construct the approximate largerscale time evolution by U PBC (θ opt ). Its accuracy can be guaranteed by Eq. (41) or Eq. (42). The error consists of two parts: the first terms, ε LHST and ε HST , are due to a limited expressive power of the ansatz V (L) PBC ; the second term, ε LR is the intrinsic error induced by this LVQC protocol. They can be improved by using more expresive ansatz and using larger compilation sizẽ L, respectively. Now, we discuss what compilation size should be used to achieve an accuracy of O(ε) for a quantity of interest. When we focus on some local observables under the approximate time evolution V (L) PBC (θ opt ), the local cost function C LHST plays a significant role since it guarantees the local equivalence with U (L) PBC by Eq. (9). To be more precise, C LHST = O(ε) implies additive error O(ε) in the expectation values of local observables. We wish to choose the compilation sizeL = 2 l 0 + d H + vτ + d + 1/2 so that ε LR = e −O(l0/ξ) can be neglected. Therefore, in this case,L can be taken as O(ξ log(1/ )) + 2d H + 2vτ + 2d , which is independent of the whole-system size L.
On the other hand, in the cases where we require the accuracy in terms of global observables, the average gate fidelityF has the operational meaning. 1−F = O(ε) implies an accuracy of O(ε) in the expectation values of any observables. When theL-size optimization is achieved as Eqs. (39) and (40), the combination with Eqs. (12) or (13) ensures its lower bound as α (θ) until ε LHST or ε HST becomes much smaller than O(L −1 ), and then we can obtain preferable accuracy.
To summarize, our protocol starts with choosing a proper compilation sizeL.L should taken to be comparable to the approximate causal cone size by the LR bound, 2(ξ + d H + vτ + d ), or a bit larger than it, depending on the desired error. After minimizing the localcompilation cost function C (L) α (θ) which can be evaluated using classical simulator or quantum device with at leastL qubits, we can directly apply the optimal parameter set θ opt to obtain the approximate time evolution U (L) PBC V (L) PBC (θ opt ). This reduction in the size makes NISQ devices or classical simulators employing some approximation (See Sec. V) suitable for the compilation. LVQC can be employed for various purposes such as depth compression and calibration of U (L) , without implementing the target U (L) itself but only with the one for smaller systems U (L) . This is clearly one of the largest advantages in our protocol. In addition, by repeating the application of V (θ opt ), we can approximately simulate the stroboscopic dynamics at t = nτ (n ∈ N). Thus, LVQC with the size-scaleL and the time-scale τ can be applied to reproduce the dynamics of larger scales both in space and time. We summarize the results in Fig. 1.

B. Local compilation for generic systems without translation-invariance
Here, we develop the LVQC protocol for onedimensional finite-ranged systems without translation-invariance. The result does not essentially alter from translationally-invariant cases, but they have different cost functions.
We directly use Propositions 2 and 3 to derive the protocol. For the brickwork-structured ansatz V (L) (θ) (not necessarily translationally-invariant), we define the localcompilation cost function for generic cases by where we directly use the subsystem cost functions C We also use the relation Eq. (47), which results in Therefore, we obtain the following theorem, which designates the protocol for generic cases.
Theorem 5. We variationally minimize the localcompilation cost function C (L) (θ). When the optimal parameter set θ opt gives CL(θ opt ) ≤ ε LHST , the cost functions for the size L is bounded by The average gate fidelity is bounded from below as follows: Based upon this theorem, we can perform the local compilation in a similar way to translationally-invariant systems, while the cost function is replaced by Eq. (49). We have the same compilation sizeL = 2 l 0 + d H + vτ + d + 1/2 with l 0 such that ε LR or Lε LR becomes sufficiently small. After the local optimization that achieves ε LHST 1 or Lε LHST 1, we use the optimal parameter set θ opt for the L-size time evolution operator as schematically shown in Fig. 1.
We also remark extension of our protocol to other generic cases. Our protocol relies only on the existence of the LR bound, given by Eq. (20), and the locality of the ansatz. Thus, the extension to higher-dimensional systems is straightforward, in which we change the form of ε LR and replace the coefficient L in Eqs. (42) or (53) by |Λ| ∼ L D . We can also consider short-ranged, or longranged interactions since they respectively show an exponential or polynomial decay of the error ε LR (Note that we require additional conditions when considering longranged interactions for the existence of the LR bound, as discussed in Appendix B 4). The compilation sizeL increases at-most in O(log L) (for finite-ranged, shortranged interactions in generic dimension) or in O(L σ ) with σ < 1 (long-ranged interactions in generic dimension). We can expect significant reduction in the compilation size for a broad class of locally-interacting systems to compile large-scale time evolution operators. See Appendix B for the detailed discussion.

V. NUMERICAL DEMONSTRATION OF DEPTH COMPRESSION
Here, we numerically demonstrate LVQC, and in particular, we try to compress the depth of a large-scale time evolution operator by the compilation. For simplicity, we concentrate on one-dimensional systems and rely on classical simulation by time-evolving block decimation (TEBD), based on matrix product states (MPS) [39,40,52,53].
We first introduce the model and the ansatz. We adopt an anti-ferromagnetic (AFM) Heisenberg model on a onedimensional lattice, defined by We employ OBC to make it easier to simulate by MPS. The target of the depth compression is the time evolution AFM τ with a fixed time τ . On the other hand, we give the ansatz V (L) (θ) by the brickwork-structured circuit under OBC, designated by Eq. (22). We parameterize each of two-qubit gates in it by in the basis of {|00 , |01 , |10 , |11 }, where η, ζ, χ, γ, and φ denote the variational parameters. This form is chosen so that V j,j+1 can represent any two-qubit gate preserving the total Z-spin which is a symmetry of H AFM [6,10]. Here, we expect that, when the system has more than tens of qubits, its boundaries hardly affect the results. Reflecting this approximate translation symmetry, we employ a single parameter set (η, ζ, χ, γ, φ) within each layer. Upon this setup, the number of the independent variational parameters becomes 10d for the d-depth ansatz.
We examine whether we can approximate U (L) by the shallow-depth circuit V (L) as U (L) V (L) (θ opt ), with the optimal parameter set obtained in the smaller sizeL. Based on the approximate translation symmetry, we apply the protocol for translationallyinvariant systems under PBC. To be precise, based on Theorem 4, we minimize the local cost function C (L/2) LHST (U (L) , V (L) (θ)), which is expected to approximate the local-compilation cost function C (L) α=0 (θ). Then, with the optimal parameters θ opt , we compute the cost functions C LHST (U (L) , V (L) (θ opt )) and C LHST (U (L) , V (L) (θ opt )) to evaluate how well the ansatz where H odd [H even ] represents terms composed of interactions between (2k − 1)-th and 2k-th sites [2k-th and (2k+1)-th sites] in H AFM . We variationally minimize the cost function by the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method implemented in SciPy [54] with maximum iteration set to 128. The initial parameter set θ is chosen as θ d trot so that the ansatz V (θ d trot ) becomes equivalent to the Trotter decomposition with the same depth, U trot,d , except for the global phase.  )) with various d, as described by the dashed lines. For each depth d = 3, 4, 5, the ansatz with the resulting optimal parameter set θ opt overwhelms the same-depth Trotter decomposition. For instance, the 5-depth ansatz V (L) (θ opt ) provides the cost value 7.80 × 10 −5 , which is as large as that for the 40-depth Trotter decomposition, 8.48 × 10 −5 . In other words, we successfully compress the time evolution operator from depth 40 to depth 5 under the compilation sizeL = 20.
Next, we examine how the larger-scale time evolution operator U (L) is approximated by our protocol. Hereafter, we concentrate on the 5-depth ansatz, and employ the corresponding optimal parameter set as θ opt . Considering the approximate translation invariance, the size-extended ansatz V (L) (θ opt ) is constructed by copying the two-qubit gate of V (L) (θ opt ) in the spatial directions. We again approximate U (L) by the large-depth Trotter decomposition U (L) trot,d=100 , and compute the cost functions as described in Fig. 6 (b). As Theorem 4 says, the local cost functions C LHST and C remains sufficiently small compared to 1. Any cost function for the ansatz with θ opt is comparably smaller than that with θ d=5 trot , the parameter set for reproducing the Trotter decomposition with the same depth d = 5 (See the blue, orange, and light-green solid lines).
We also assess the average gate fidelity. Based on Eqs. (12) and (13), the ansatz extended to L = 40 qubits is ensured to haveF (U (L) , V (L) (θ opt )) ≥ 0.9977, while the same-depth Trotter decomposition provides F (U (L) , V (L) (θ d=5 trot )) ≥ 0.8580. Therefore, our protocol succeeds in implementing the time evolution operator for the larger-scale 20 ≤ L ≤ 40 with the limited depth by exploiting the local compilation on the sizeL = 20.
Finally, we demonstrate how well the compressed time evolution operator V (L) (θ opt ) reproduces the dynamics of larger-scale systems under the accurate one U (L) . By applying V (L) (θ opt ) or its inverse repeatedly, we can approximately simulate the stroboscopic dynamics at the time t ∈ τ Z, which is larger than the original time scale τ , with a smaller-depth circuit. Furthermore, it should be noted that our protocol can capture larger-scale phenomena in the size L despite the compilation inL < L.
To confirm this numerically, we simulate the stroboscopic dynamics which involves the time scale and the size scale respectively larger than τ = 0.5 andL = 20.
As the simplest cases, we prepare the following two initial states, |ψ (L) |ψ (L) for the size L = 40. They respectively represent ferromagnetic states having two local excitations (for |ψ LE (0) ) and two domain walls (for |ψ DW (0) ) with dis-tanceL = 20. Then, we evaluate the expectation value of Z L/2 evolving under the Hamiltonian H AFM , and the central site j = L/2 observes their collisions. Thus, the change in the expected value of Z L/2 can be employed as a diagnosis for the larger-scale dynamics involving at-leastL + 1 sites, which is larger than the compilation size. Figure 7 shows the numerical results for the approximate stroboscopic dynamics obtained by the compilation. With the 5-depth ansatz V (L) (θ opt ) obtained by the optimization in the sizeL = 20, we compute the state and its local observable, given by We employ MPS with the bond dimension 60 for simulating the dynamics from the initial states |ψ LE (0) or |ψ DW (0) , which are depicted as the orange dots respectively in Fig. 7 (a) and (b). We also compute the dynamics under the large-depth Trotter decomposition U (L) trot,d=100 as the accurate dynamics for the comparison (See the blue solid lines). In both cases, the compilation results well reproduce the accurate dynamics up to t ≤ 10τ = 5 with the mean square errors 5.27 × 10 −6 and 1.29 × 10 −6 [55]. We conclude that our prescription exploiting the intermediate-sizeL and the fixed time τ provides an appropriate shallow-depth time evolution operator useful for larger-scale quantum systems both in space and time. We also remark that the optimal parameter obtained here is expected to be useful for even largerscale quantum simulations beyond the size considered in this work from the size-dependence in Fig. 6 (b). Our numerical results suggest the feasibility of the classical local compilation to design large-scale quantum circuits, in addition to the possible quantum local compilation by NISQ devices.

VI. DISCUSSION AND CONCLUSION
In this paper, we develop the local variational quantum compilation (LVQC), in which we conduct local optimization for intermediate-scale quantum systems designated by the Lieb-Robinson bound, and obtain an approximate time evolution operator of larger-scale quantum systems. Since the approximation error of the local cost function supporting our protocol relies only on the Lieb-Robinson bound, it has broad applicability to finite-ranged, shortranged, and long-ranged interacting large-scale systems in generic dimension. LVQC begins with the local compilation by intermediate-scale quantum devices or corresponding classical simulators, and ends up with the quantum execution of the compiled larger-scale dynamics. Therefore, not only it unveils a classical approach to design large-scale quantum circuits, but also it will play a significant role in bridging NISQ device technique to the practical use of larger quantum devices as the long-term goal.
We finish this article with providing some future directions. The first one is to seek for the possibility of the local compilation in classical ways. While we refer to our protocol as "quantum" compilation, Theorems 4 and 5 are not limited to the context of variational quantum algorithms where we optimize parametrized quantum circuit in a quantum-classical hybrid manner. Our numerical demonstration based on TEBD involving up to 40 qubits is indeed a good example for using classical simulator for LVQC. Other sophisticated techniques (e.g. tensor-network-based methods for 2D systems) will also be important for executing our protocol on classical computers. We might also be able to use exact bruteforce classical simulators for LVQC in future. This is because, for finite-ranged or short-ranged systems with vτ = O(L 0 ), LVQC ensures the classical efficient evaluaion of the cost function in time e O(L) = poly(L) given that it is sufficient to takeL = O(log L) in this case. Although current classical devices are still not capable of simulating quantum systems with sizeL, which typically becomes more than tens of qubits, LVQC without resorting to approximate simulators may be available in future. Note that this does not contradict with the existing result that states the evaluation of the cost functions C LHST and C HST in polynomial accuracy with respect to the system size L for general unitaries is a DQC1hard problem [35] (DQC1; efficiently solvable problems by one clean qubit and other noisy qubits [56]), since we restrict ourselves to certain short-time local Hamiltonian dynamics and shallow depth ansatzes (See Appendix C for detail).
The second significant task for future is to accumulate benchmark results by both classical and quantum simulation, including higher-dimensional cases, shortranged interacting cases, and long-ranged interacting cases. Several programmable quantum simulators, such as superconducting qubits [57] and Rydberg atoms [58], have recently achieved a few hundred qubits with highcontrollability and two-dimensionality, and they will be available for both the local compilation and the quantum execution of the compressed time evolution. For instance, as an immediate task to be tackled, it may be possible to observe long-time dynamics beyond the current coher-ence time on such compiled quantum simulators by the classical local compilation for tens of qubits. One of the ultimate goals is to compile time evolution operators for huge quantum chemistry materials such as molecules and crystals. Although long-ranged Hamiltonians of electrons from the first principles are out of scope with the current knowledge of the LR bound, our protocol is expected to be valid for various materials under the reorganization of approximate models based on their structures. As for including the improvement for long-ranged cases, we leave it as future work. Noting that (U A ⊗ V * B ) |Φ + AB = (U A V † A ⊗ I B ) |Φ + AB and Π j can be decomposed as a sum of Pauli operator by Eq. (29), it is sufficient to evaluate for arbitrary L-qubit unitary W A and Pauli operator P A and P B , we can obtain Π j . Here, we provide an efficient algorithm to estimate Eq. (A3). First, we observe the following equality holds: where we used the definition of the Bell state |Φ is real. Now, a Monte-Carlo approach can be employed to evaluate the sum of Eq. (A5).
The algorithm we propose is as follows. First, sample x from the uniform distribution on {1, 2, ..., 2 L }. Let α x |y x = P B |x where |y x and α x ∈ {±1, ±i} is a computational basis and a coefficient determined by x and P B . Then, we estimate y x | W † A P A W A |x on an L-qubit quantum device within an addtive error . This can be achieved by utilizing the following equalities that holds for an arbitrary observable O: where |± x,y := (|x ± |y )/ √ 2 and |±i x,y := (|x ± i |y )/ √ 2. More precisely, for a given pair (x, y x ), we first evaluate expectation values ± x,yx | W † A P A W A |± x,yx or ±i x,yx | W † A P A W A |±i x,yx using N 1 samples each and then combine them according to the above formula. Let an estimator of y x | W † A P A W A |x obtained by this procedure beP A,x . Importantly, Var[P A,x ] = O(1/N 1 ). Finally, we construct an estimator of F (W A , P A , P B ) aŝ when α x is real (imaginary). Note thatF (W A , P A , P B ) is defined by two random variables x andP A,x . To see thatF (W A , P A , P B ) is indeed an efficient unbiased estimator, we analyze its expectation value and variance. Let us assume that, for a fixed x, the random variable Re[α xPA,x ] follows a probability distribution p x (a). The probability thatF (W A , P A , P B ) takes a specific value f is given by x p x (f )/2 L . Then, we can calculate E[F (W A , P A , P B )] and E[F (W A , P A , P B ) 2 ] as follows: Equation (A9) shows thatF (W A , P A , P B ) is an unbiased estimator of F (W A , P A , P B ), the desired quantity. Combining the above with where is the variance of this protocol when we can exactly esti- . This implies that a sample mean of N 2 independent samples ofF (W A , P A , P B ), which requires N = N 1 N 2 runs of quantum devices for its construction, has variance O(1/ (N 1 N 2 )) + O(1/N 2 ). Therefore, it is sufficient to take N 1 = O(1), N 2 = O(1/ 2 ) and thus N = O(1/ 2 ) to obtain an estimate of F (W A , P A , P B ) within an additive error with high probability.
The same strategy can be taken to evaluate C HST (U, V ). In this case, the task is to estimate the expectation value of Π 1 Π 2 · · · Π L with respect to (U A ⊗ V * B ) |Φ + AB . We use the fact that Π 1 Π 2 · · · Π L can also be expanded as a sum of Pauli operator: where c P = 1 when P has even number of Y and c P = −1 otherwise. This decomposition has exponential number of Pauli operators, and a naive approach where we estimate expectation values of every Pauli operator takes exponential time to L. However, we can take an Monte-Carlo approach to evaluate this sum by interpreting the coeffcient 1/4 L as a proability. The algorithm for evaluating C HST (U, V ) is as follows. First, we pick up a Pauli operator P ∈ {I, X, Y, Z} ⊗L randomly. Then, we estimate the expectation value of P ⊗ P using the algorithm in the proof of Lemma 1. Repeating the above procedure N 3 = O(1/ 2 ) times while setting N 1 , N 2 = O(1), we obtain C HST (U, V ) within an additive error with high probability using N = N 1 N 2 N 3 = O(1/ 2 ) samples in total.

Appendix B: Extension to other cases
In the main text, we mainly focus on one-dimensional systems with finite-ranged interactions. Here, we discuss the extensions of our results to other cases in terms of the range of interactions and the dimension of systems.
From the derivation of Theorems 4 and 5 in the main text, the range of interactions and the dimension affect our results only via ε LR in Eq. (20), coming from LR bound. To be precise, we should change the choice of the intermediate size L = 2(l 0 +d H +vτ ) orL ≥ L +2d +1, which designates the restriction of the Hamiltonian and the ansatz, so that the bound ε LR can be ignored. Thus, after deriving ε LR caused by the Hamiltonian restriction in Appendix B 1, we devote the following sections B 2-B 4 to discuss an appropriate choice of the size for finiteranged, short-ranged, and long-ranged cases in generic dimension.

Hamiltonian restriction by Lieb-Robinson bound
We first discuss the error bound ε LR in Eq. (20), caused by the restriction of Hamiltonian to a local terms around a site j. Let us assume that a Hamiltonian H has the LR bound designated by for local observables O X and O Y , whose supports are respectively the subsets of the lattice, X and Y (⊆ Λ). The distance between domains is defined by We also define the distance between a site j and a domain Y by dist(j, Y ) = dist(X = {j}, Y ). Assuming the existence of the LR bound, we consider the dynamics of local observables. We define the restriction of the Hamiltonian H (L) = X h X for generic Ddimensional systems by where L and L (≤ L) respectively represent the linear scales of the lattices Λ and Λ L ,j . It is expected that the dynamics of local observables, e iH (L) τ O j e −iH (L) τ , is well described by the restricted Hamiltonian H (L ,j) for sufficiently large L , and in fact, it has been proved by Refs. [41][42][43] for finite-ranged and short-ranged cases.
In order to cover long-ranged cases and make our paper self-contained, we summarize and rederive the result in a slightly different way below. After that, we derive proper choice of the compilation sizeL for finite-ranged, shortranged, and long-ranged cases in generic dimension.
Lemma 6. We assume the existence of the LR bound in the form of Eq. (B1) on the Hamiltonian H (L) , and define the size of a domain X ⊆ Λ by When the function C(r, t) is monotonically decreasing in the distance r and monotonically increasing in the time τ , the inequality is satisfied, where the length scale r H is an arbitrary value satisfying 0 ≤ r H ≤ L /2, and the constants C 1 and C 2 are independent of L and L .
Introducing an arbitrary length scale r H , satisfying 0 ≤ r H ≤ L /2, the summation over X, which is not a subset of Λ L ,j , can be divided in the following way, where each of X A , X B (r H ), and X C (r H ), is defined by Using the triangular inequality of the operator norm, Eq. (B12) is further bounded by We now evaluate the upper bound of ε AB (r H ) and that of ε C (r H ), respectively. For the first term ε AB (r H ), we use the fact that a domain X, which belongs to X A ∪ X B (r H ), satisfies dist(j, X) ≥ L /2 − r H from their constructions Eqs. (B15) and (B16). Using the LR bound Eq. (B1) for the integrand, ε AB (r H ) is bounded by In the first inequality, we employ the monotonicity of C(r, t), which validates the replacement by C(dist(j, X), t) ≤ C(dist(j, j ), τ ) for X j and t ≤ τ . For the second inequality, we use Eq. (15). Concerning the summation over j in the last line, the number of sites j satisfying dist(j, j ) r is proportional to the surface area S D r D−1 under the finite density ρ. Thus, the summation j ;dist(j,j )≥L /2−r H is expected to be approximated by ∞ L /2−r H drρS D r D−1 . As a matter of fact, following this intuition, when C(r, t) is monotonically decreasing in r and the number of sites per volume is finite, there exists a positive constant C 3 such that for generic D-dimensional systems [47]. Here, the constant C 3 depends only on the dimension and the density of the lattice, but not on L and L . Defining the constant C 1 by C 1 = gτ O j C 3 , ε AB (r H ) is bounded from above by the first term in the right hand side of Eq. (B7).
For the second term ε C (r H ), we soon arrive at where we use the definition of X C (r H ), Eq. (B17), to derive the second inequality. When we choose a constant C 2 by 2τ O j , which is independent of L and L , ε C (r H ) is bounded by ε(r H ) [See Eq. (B8)] from above. Combining these upper bounds for ε AB (r H ) and that of ε C (r H ), we obtain the bound f (τ ) ≤ ε LR with taking ε LR by Eq. (B7), thereby completing the proof of Lemma 6.
Let us discuss in what conditions we can extend our results to other cases. The change in the dimension and the range of interactions only affects the proper choice the partial system size L , which designates the linear scale of the Hamiltonian restriction. Once L is determined, the remaining protocol is completely same as that of the onedimensional finite-ranged cases; we compile the dynamics using a quantum system with sizeL ≥ L + 2d + 1 [d = L /4+d : depth of the variational quantum circuit V (θ)]. Therefore, it is sufficient to make ε LR small enough with a proper sizeL based on Theorems 4 and 5. Depending on what kind of observables is focused on, we have different conditions. When considering local observables under the approximate circuit V (L) (θ opt ), we require ε LR 1 to keep the local cost functions small according to Eqs. (41) or (52). In this case, to extend our results, it is thus sufficient to choose sufficiently large L that makes ε LR 1 while keeping L /L < 1 so that the compilation size is smaller than L. On the other hand, when a near-unity average gate fidelity is required for global observables, we demand that |Λ|ε LR ∼ L D ε LR 1 based on Eqs. (48) and (54). As a result, the sufficient condition in that case is to achieve L D ε LR 1 with sufficiently-large L while keeping L /L < 1. In the following subsections, we derive how ε LR scales with respect to L in finite-ranged, shortranged, and long-ranged interacting cases to confirm that our protocol can be applied to these setups.

Finite-ranged cases in generic dimension
We consider finite-ranged cases in generic dimension. As introduced in Eq. (17), we here assume where d H designates the range of interactions. Finiteranged interacting systems have the LR bound C(r, t) = C exp{−(r − vt)/ξ} under a fixed time t, with some constants C, v, and ξ, as introduced in Eq. (18) [38]. Let us evaluate the bound ε LR . We set L = 2(l 0 +d H + vτ ) with a tunable scale l 0 , and choose the parameter r H in Eq. (B7) by r H = d H (≤ L /2). From the assumption of the range of interactions, ε(r H ), defined by Eq. (B8), vanishes. This results in the bound, reproducing Eq. (21) in the main text. With some elementary integration using the gamma functions, we arrive at (B26) Since the term in the summation is a polynomial of degree D − 1 in l 0 + vτ , there exists a positive constant C 4 satisfying Since ε LR exponentially decays in l 0 with polynomial corrections, both ε LR and L D ε LR can be arbitrarily small with sufficiently large L such that L /L < 1. Thus, we can apply the LVQC protocol to finite-ranged cases including high-dimensional systems.
Next, let us discuss how to choose the appropriate compilation sizeL. When focusing on local observables, we demand ε LR 1, which results in the following choice; 1. Choose l 0 so that can be ignored compared to 1.
To make Eq. (B28) small enough, l 0 should be at least larger than ξ, which is the localization length of the LR bound. Thus, our protocol typically requires the linear scaleL 2(ξ + d H + vτ + d ) for evaluating the cost functions. High-dimensional cases with D ≥ 2 have logarithmic corrections in its exponent. Although larger linear scale is required compared to one-dimensional cases, still we can expect much decrease in the size. On the other hand, when considering global observables, we demand L D ε LR 1. This brings an additional exponent D log L to Eq. (B28). As a result, the typical size for compilation becomesL 2(ξ + d H + vτ + d + Dξ log L) to ensure high average gate fidelity for larger quantum systems.

Short-ranged cases in generic dimension
Let us discuss short-ranged interacting systems in generic dimensions. In these cases, the range of interactions is infinite but their strength is suppressed exponentially in the distance as X;X j,j h X ≤ h exp (−dist(j, j )/ζ) , ∀ j, j ∈ Λ, (B29) with some positive constants h and ζ, for the Hamiltonian H (L) = X h X . The LR bound is the same as that of finite-ranged cases, C(r, t) = C exp{−(r − vt)/ξ} [41][42][43]. Now, we evaluate the bound ε LR for short-ranged cases. We choose the size L by L = 2(l 0 + r H + vτ ) with two tunable parameters l 0 and r H . The first term of ε LR in Eq. (B7) is the same as that of finite-ranged cases, resulting in the bound in Eq. (B27). The second term ε(r H ) is then bounded by We can again replace the summation over i and i by the integration over the D-dimensional real space like the derivation of Eq. (B22) from Eq. (B21). With the use of a proper positive constant C 5 , independent of L and L , we arrive at the following bound; Finally, using the relation L = 2(l 0 + r H + vτ ), ε LR satisfies the following inequality; where C 4 and C 6 are some positive constants independent of L and L .
Similar to finite-ranged cases, both ε LR and L D ε LR can be arbitrarily small with properly increasing l 0 and r H under L /L < 1. When we focus on local observables for larger-scale dynamics demanding ε LR 1, we should choose the compilation sizeL in the following way.
1. Choose l 0 so that can be ignored compared to 1.

2.
Choose r H so that (B34) can be ignored compared to 1, under the above choice of l 0 .
In contrast to finite-ranged cases, the error ε LR always has logarithmic corrections in its exponent, and has two independent tunable parameters for the scaleL. To make both Eqs. (B33) and (B34) sufficiently small, the compilation sizeL should be at least larger than 2(ξ+ζ+vτ +d ) (ζ: the typical range of interactions), which gives the typical size scale of short-ranged cases. When the high average gate fidelity is required, we replace the protocol by adding D log L to the exponents of Eqs. (B33) and (B34), to achieve L D ε LR 1. Then, the typical compilation size scale becomesL 2{ξ + ζ + vτ + d + D(ξ + ζ) log L}.

Long-ranged cases in generic dimension
The last case we consider is a long-ranged Hamiltonian in generic dimension. Here, we assume power-law interactions, satisfying for any sufficiently large distance R > 0, where h and α denote some positive constants. One of the simplest cases is the long-ranged transverse Ising model defined by on a D-dimensional lattice Λ. While a series of recent studies have succeeded in extending the LR bound to long-ranged cases in different ways [44][45][46][47][48][49], we hereby focus on one of their results, derived in Ref. [47]. When the power α is larger than the dimension D, there exist positive constants v, C 7 , and C 8 , such that for any σ satisfying (D + 1)/(α + 1) < σ < 1. Here, f σ (x) is a monotonically increasing function in x independent of L, and can be regarded as a positive constant for fixed τ and σ.
We compute the upper bound of ε LR based on Eq. (B7). The intermediate size L is again given by L = 2(l 0 + r H + vτ ) with two tunable parameters l 0 and r H . Substituting the above LR bound into Eq. (B7), the first term of Eq. (B7) is bounded by The first integral in the right hand side is computed by the substitution of s = r 1−σ − (l 0 + vτ ) 1−σ , which results in [The first term in the r.h.s of Eq. (B38)] with n Dσ = D/(1 − σ) ∈ N. As we derive Eq. (B27) from Eq. (B25) using the gamma functions, there exists a positive constant C 9 , which is dependent only on D and σ, such that is satisfied. On the other hand, considering D − 1 − σα < −1 from (D + 1)/(α + 1) < σ < 1, the second integral in the right hand side of Eq. (B38) is easily computed as We define a positive constant C 10 by C 10 = We note that this bound is independent of L, and vanishes with increasing l 0 → ∞. When the tunable parameter r H is sufficiently large, the second term ε(r H ), defined by Eq. (B8), immediately satisfies the following inequality, where we use the assumption of long-range interactions, Eq. (B35). Considering that the volume |Λ L ,j | is proportional to (L ) D , there exists a positive constant C 11 such that ε(r H ) ≤ C 11 (l 0 + r H + vτ ) D · (r H ) −α . From the assumption of α > D, this bound vanishes under r H → ∞ when the other parameter l 0 is fixed. Summarizing the results in Eqs. (B42) and (B43), we obtain the bound of ε LR for long-ranged cases in generic dimension as In contrast to finite-ranged and short-ranged cases, the bound ε LR shows polynomial decay inL, which leads to the absence of characteristic length. In addition, this also alters applicability of the LVQC protocol depending on which we focus on local or global observables for largerscale systems. When we are interested in local observables, ε LR 1 is demanded. Since ε LR is independent of L, we can make ε LR arbitrarily small by increasing l 0 and r H under the constraint L /L < 1. We can apply the LVQC protocol as long as the LR bound exists (e.g. α > D is required when we employ the LR bound in Ref. [47]). The proper compilation sizeL is organized by the following steps; 1. Choose l 0 so that both of and (l 0 + vτ ) D−σα become sufficiently small compared to 1.

2.
Choose r H so that (l 0 + r H + vτ ) D /(r H ) α can be ignored compared to 1, under the above choice of l 0 .
Here, we have options in the parameter σ satisfying (D + 1)/(α + 1) < σ < 1. Since the constants C 9 and C 10 are divergent for σ around its lower and upper bounds [See Eqs. (B39) and (B41)], a possible good choice may be σ = {(D + 1)/(α + 1) + 1}/2. When we are interested in global observables, we demand L D ε LR 1. The protocol to choose L is largely the same as the above one, where each term in ε LR is replaced by the corresponding term in L D ε LR . However, due to the polynomial decay of ε LR in l 0 and r H , we should impose additional conditions on the exponents α and D. Let us discuss asymptotic behaviour of the compilation size by defining the scaling l 0 ∼ L β and r H ∼ L δ with β, δ < 1. Multiplying the right hand side of Eq. (B44) by L D , we have three terms that should decay. The first term decays sub-exponentially in l 0 but polynomially increases in L. It can thus be made arbitrarily small by choosing sufficiently large l 0 . With regard to the second term, we demand the convergence of L D (l 0 + vτ ) D−σα ∼ L D+β(D−σα) (Here we assume vτ is constant). As a result, the inequalities, σα − D > 0 and D σα − D < β < 1 (B46) should be satisfied. The relation, β < 1, ensures reduction in the compilation size. The above inequality implies α > 2D and σ > 2D/α must be satisfied for successful size reduction. Finally, the third term scales as L D max(β,δ)−αδ+D . Taking the above constraints on β and σ, the sufficient condition for the vanishing third term is to satisfy σD σα − D < δ < 1.
To summarize, when demanding the high average gate fidelity, we can apply the LVQC to long-ranged interacting systems with the exponent α > 2D, which is stricter than what is required for the existence of the LR bound. Then, the compilation sizeL is at-least proportional to L D/(σα−D) with 2D/α < σ < 1. Let us finally discuss concrete examples of systems where we can apply LVQC successfully. With the usage of the LR bound for long-ranged cases derived in Ref. [47], the constraint for local observables, α > D, tells us the availability of the LVQC to various systems, such as 1d systems with dipole-type interactions (α = 2, D = 1) and 1d/2d systems with van der Waals interactions (α = 5, D = 1 or α = 4, D = 2). On the other hand, the constraint on global observables, α > 2D, implies the applicability to limited cases, such as 1d systems with van der Waals interactions (α = 5, D = 1) within the above examples. In both cases, the application to long-ranged Hamiltonians of electrons from firstprinciples (i.e. α = 1 − D by Coulomb potentials) seems to be difficult with the current knowledge of the LR bound. Anyway, we expect applicability of the LVQC to broader systems with the usage of other formulations on the LR bound [44-46, 48, 49] or as its further development. In this section, we discuss how the LVQC protocol is related to computational complexity of QAQC. According to Ref. [35], the determination of the cost functions belongs to DQC1-hard problems. This indicates that efficient QAQC by classical computers is difficult. On the other hand, our LVQC enables efficient evaluation of the cost functions with a restricted sizeL, and in some cases, we can efficiently complete the protocol by MPS like Sec. V. Here, we resolve this apparent contradiction.
We first introduce the complexity class, DQC1 (deterministic quantum computation with one clean qubit) [56]. Here, we concentrate on a one-dimensional system (extension to higher-dimensional systems is straightforward). In the DQC1 model, we prepare an (L + 1)-qubit initial state, composed of one clean qubit and the other qubits lying in a maximally-mixed state, as Then, we apply a unitary gate U with the depth up to poly(L), and obtain the following probability by measuring the first clean qubit, p z = Tr[(|z z|) 1 U ρU † ], z = 0, 1.
We refer to the problem of determining the probability p z with a multiplicative error ε < 1 as the DQC1 models. DQC1 models are originally introduced to evaluate the power of nuclear magnetic resonance quantum computes. Famous examples of DQC1-complete problems are estimating spectral density [56], trace of unitary matrices [59], and the Jones polynomials [60]. Importantly, Ref. [61] proves that, if the probability p z can be sampled with poly(L)-time classical algorithms, the polynomial hierarchy will collapse to the second level. This implies that efficiently simulating the DQC1 models in classical ways is unlikely. Recently, Ref. [35] has revealed that the determination of the global cost function C HST or the local one C LHST with an error ε < 1/poly(L) is DQC1-hard for poly(L) depth unitaries U and V ; any DQC1 model can be reduced to the problem of determining the above cost functions. Based on this fact, quantum compilation with the cost functions C HST or C LHST is also expected to be difficult by classical computation. The LVQC seems to give contradictory results by the size reduction. Let us consider one-dimensional systems with finite-ranged interactions, and assume that the compilation sizeL = 2 l 0 + d H + vτ + d + 1/2 satisfies L ∝ log L. We can classically compute the cost function C (L) (θ) with accuracy 1/poly(L) by employing matrices whose dimension is e O(L) ∼ poly(L) based on Eqs. (8) and (49). It takes at-most poly(L) time for its classical evaluation. Considering that ε LR is suppressed as ε LR < e −O(L) = 1/poly(L), Propositions 2 and 3 (or the proof for Theorem 5) say |C LHST (U (L) , V (L) (θ)) − C (L) (θ)| < 3 4 ε LR = 1/poly(L).
(C3) Therefore, we can classically determine the local cost function C LHST (U (L) , V (L) (θ)) with polynomial time in the system size L. Does this imply the collapse of the polynomial hierarchy or the fault of the LVQC formalism? As the discussion below, the LVQC protocol concludes neither of them.
We resolve the discrepancy depending on the size of the causal cones brought by the LR bound, vτ . The first case is where the time τ is constant. Then, the time evolution operator U (L) = e −iH (L) τ is not universal under the locality. The LR bound allows to regard it as a O(L 0 )-depth circuit in terms of the local observable C LHST . Therefore, while the local cost function C LHST (U (L) , V (L) (θ)) can be actually obtained by poly(L)-time classical computation, this case is not problematic. The second case is vτ ∝ L κ , where we can expect the size reduction if we assume 0 < κ < 1. In that case, the compilation sizeL = 2 l 0 + d H + vτ + d + 1/2 is proportional to L κ , and cannot scale as log L. Thus, the above discussion predicting the poly(L)-time classical evaluation is precluded, which results in the consistency of the LVQC with the DQC1-hardness of determining the cost function C LHST . Similarly, the LVQC appears to allow classicallyefficient evaluation of the global cost function C HST , but there exists no conflict with its DQC1-hardness.
We emphasize some points through this discussion. First, in some cases, there remains possibility of the local compilation by classical computers. For finite-ranged or short-ranged interacting systems under vτ = O(L 0 ), the LVQC can be completed with poly(L)-time classical computation. While we employ an approximate classical algorithm relying on MPS in Sec. V, we expect that high-performance classical computers in the future will achieve the compilation for the sizeL ∼ log L without resorting to any approximation. On the other hand, we also note that intermediate-scale quantum devices still play a significant role in the local compilation. While the compilation sizeL scales as log L in the above cases under L → ∞, the remaining constant term is not so small for current classical computers. For instance, as the numerical simulation in Sec. V, a typical 1d spin chain with finite-range interactions requiresL = 20, resulting in the compilation using 40-qubit quantum systems. It will be necessary to prepare hundreds or thousands of qubits for higher-dimensional systems involving finite-, short-, and long-ranged interactions. Since the DQC1hardness denies poly(L)-time classical simulation of the local compilation, NISQ devices will be essential to compile larger-scale time evolution operators.