Optimization schemes for unitary tensor-network circuit

An efficient representation of a quantum circuit is of great importance both for achieving a quantum advantage on current noisy intermediate-scale quantum (NISQ) devices and for the classical simulation of quantum many-body systems. Quantum circuits are the key ingredient in the performance of variational quantum algorithms and in simulating quantum dynamics in problems of physics and chemistry. In this paper, we study the role of the network structure of a quantum circuit in its performance. We discuss the variational optimization of a quantum circuit (a unitary tensor-network circuit) with different network structures. The optimization is based on a generalization of the well-developed multiscale entanglement renormalization algorithm and on the conjugate-gradient method with an effective line search. We present benchmarking calculations for different network structures by studying the Heisenberg model in a strongly disordered magnetic field and a tensor-network QR decomposition. Our work can help get the most out of NISQ hardware and advance the classical development of isometric tensor-network states.


I. INTRODUCTION
The field of tensor-network formalism has emerged as a novel toolbox to address the long-standing problems of modern condensed-matter physics [1,2]. It offers an unbiased variational ansatz to simulate the physics of strongly entangled many-body systems, such as frustrated spins [3][4][5][6][7][8][9] and interacting fermions [10][11][12]. Tensor-network variational methods are not hampered by the exponential growth of the Hilbert space or by the so-called sign problem (occurring in quantum Monte Carlo algorithms); the only essential limiting parameter is the amount of entanglement in the system. Since the low-energy states of physical systems obey the entanglement area law [13][14][15], they can be efficiently simulated by a tensor-network ansatz in polynomial time. However, tensor-network methods are not limited to the numerical simulation of many-body systems; they are rapidly being extended to other areas of research such as the classification of novel phases of matter [16][17][18][19], machine learning and quantum computation [20][21][22][23][24][25][26], and variational (nonperturbative) approaches to quantum field theories [27][28][29].
While tensor-network states are usually the underlying key ingredient in a variational ansatz, in many physical systems a unitary tensor-network circuit (uTNC) plays that key role. In a system that exhibits quantum many-body localization (MBL) [30][31][32], violating the eigenstate thermalization hypothesis (ETH) [33,34], the full energy spectrum can be efficiently represented by a uTNC, since (almost) all eigenvectors obey an entanglement area law [35][36][37][38][39]. One can efficiently approximate the unitary that diagonalizes an MBL Hamiltonian [40,41] by a local unitary circuit of finite depth. Additionally, in the context of quantum computation, a quantum algorithm is performed by a unitary quantum circuit composed of gates. One can think of this circuit as a uTNC and use well-developed tensor-network techniques, such as efficient exact contractions, entanglement-truncation techniques, and optimization algorithms, to systematically manipulate the circuit as desired [23,25,26]. Furthermore, in the context of projected entangled-pair states (PEPS) algorithms, a canonical form has recently been proposed based on a tensor-network QR decomposition. The tensor network Q is made of a circuit of unitary and isometric tensors, representing an isometric tensor-network circuit (iTNC) [42,43].
The optimization of a uTNC is usually performed by minimizing a cost function using conjugate-gradient-based methods [44,45]. The computational cost of the ansatz scales linearly with the system size, whereby larger system sizes can be simulated. In the first proposal of the variational uTNC ansatz [40], aimed at addressing MBL, the authors utilized a set of two-body unitary gates (rank-four tensors) arranged in a regular network to minimize the energy variance, which served as the cost function. It was shown that the accuracy of the ansatz rapidly improves with the depth of the network, denoted τ. Recently, a different scenario has been discussed [41] that could significantly improve the accuracy: the main idea is to use l-body unitary gates in the same regular architecture as before, but with the number of layers fixed at τ = 2. To reduce the computational cost, the authors use a cost function based on integrals of motion (instead of the energy variance).
There remain many possible avenues to improve the variational uTNC ansatz and establish a standard method. The primary drawback of this approach is the poor convergence rate of conjugate-gradient algorithms, which significantly increases the computational cost. Specifically, for highly entangled excited states many iterations are required to obtain a converged result, limiting the control parameters τ and l. Additionally, it is unknown what role different network structures play in the performance of the uTNC ansatz. One might utilize novel structures (designed to address entanglement more efficiently) to improve the accuracy without increasing the computational cost. Furthermore, a notable advantage of this method, compared to energy-targeting methods [46][47][48][49], is the ability to simulate the real-time dynamics of the system, which is of great interest.
In this paper, we address the aforementioned possibilities by introducing new network structures and efficient optimization protocols, similar to those of the multiscale entanglement renormalization ansatz [50] (MERA), to improve upon previous approaches. We use well-developed MERA algorithms [51] to show that the energy variance (cost function) can be evaluated locally and efficiently. The energy variance is then minimized by two optimization protocols which significantly accelerate the convergence rate: (i) a linearizing algorithm used in MERA and (ii) a conjugate-gradient algorithm with an efficient line-search method [52]. Both algorithms reduce the computational cost as the system size and/or the number of variational parameters increase. We further discuss that the MERA-like network structure not only takes into account longer-range correlations (resulting in better accuracy) but can also simply be used to study real-time dynamics.
We numerically benchmark the variational uTNC ansatz for the Heisenberg model with random magnetic fields [38,53] and for a tensor-network QR decomposition [42] to check the validity of these methods. We compare the accuracy of the different networks with that of previous ones and analyze the improvement in computational cost. A sample PYTHON source code for the algorithms presented in this paper is available in Ref. [54].

II. UNITARY TENSOR-NETWORK CIRCUIT ANSATZ
The main idea in tensor-network formalism is to efficiently represent a quantum state/operator in terms of local tensors (an object with several indices) connected by so-called virtual bonds. Tensor network representations allow us to manipulate the quantum state/operator through the individual tensors even for a large number of particles (N → ∞). The tensors are connected through a specific network structure usually determined by intuition from the physical properties of the system. The network structure is designed to capture, at the least, the main global physics of the system such as scaling of entanglement entropy and correlation functions [55]. For instance, to faithfully describe a 1D gapless system, a 2D holographic network structure (produced by MERA) is required to generate correctly logarithmic entanglement scaling and algebraic fall-off of the correlation functions. On the other hand, 1D gapped systems respecting entanglement area law require a simple 1D tensor-network structure, produced by matrix product state (MPS). Importantly, the tensor-network structure plays a key role in the efficiency and accuracy of the resulting tensor-network ansatz.
In this section, we aim to use a MERA-like network structure to build a unitary tensor-network circuit Ū_{u,w} that approximates the unitary U diagonalizing a Hamiltonian H, i.e., U† H U = D, where D is a diagonal matrix whose diagonal elements are the eigenvalues of the Hamiltonian H. The uTNC Ū_{u,w} is composed of local unitary tensors {u, w}, as shown in Figs. 1(a) and 1(b). The tensors {u, w} are then connected through a MERA network structure to construct Ū_{u,w}, as depicted in Fig. 1(c). Its matrix elements are obtained by the tensor contraction of the network, denoted F, where |τ⟩ = |τ_1 ⋯ τ_N⟩ forms a complete basis of the 2^N states (and similarly for |τ′⟩). Since Ū_{u,w} approximately diagonalizes the Hamiltonian, the eigenstates are given by |ψ_τ⟩ = Ū_{u,w}|τ⟩, which reveals that all eigenstates |ψ_τ⟩ are represented by a finite-range nonhomogeneous MERA. In other words, Ū_{u,w} encodes all eigenstates into a finite-range nonhomogeneous MERA. In this context, the tensors {u} are the so-called unitary disentanglers, removing short-range entanglement, while the tensors {w} transform a block of three spins into a single superspin. The tensors {u, w} define a coarse-graining transformation which maps the spins at the τth layer, with Hilbert-space dimension χ_τ, into superspins at the (τ+1)th layer, with a larger Hilbert-space dimension χ_{τ+1} = χ_τ^3. To avoid this growth of the Hilbert space, the number of layers must be kept finite, hence the finite-range MERA.
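As a concrete brute-force illustration on small N, one can assemble a depth-τ circuit of two-site unitary gates into a full 2^N × 2^N matrix and verify that the resulting Ū is unitary. This sketch uses a simple brick-wall layout and hypothetical helper names; it is not the paper's tensor-network implementation, which never forms the full matrix:

```python
import numpy as np

def random_unitary(d, rng):
    """Haar-like random unitary via QR of a complex Gaussian matrix."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(a)
    return q * (np.diag(r) / np.abs(np.diag(r)))  # fix column phases

def embed(gate, site, n_sites):
    """Embed a two-site (4x4) gate acting on (site, site+1) into the 2^n space."""
    return np.kron(np.kron(np.eye(2**site), gate), np.eye(2**(n_sites - site - 2)))

def brickwall_circuit(n_sites, depth, rng):
    """Depth-tau circuit of two-site unitaries; full matrix, so small n only."""
    U = np.eye(2**n_sites, dtype=complex)
    for layer in range(depth):
        for site in range(layer % 2, n_sites - 1, 2):  # alternate layer offsets
            U = embed(random_unitary(4, rng), site, n_sites) @ U
    return U
```

Since every gate is unitary, the full circuit is automatically unitary for any depth; this is the constraint the optimization in Sec. III must preserve.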
All tensor-network variants of MERA, such as the so-called binary, modified binary, ternary, and tree tensor-network structures, can be used as a network in the uTNC [56], obtained by replacing the isometric tensors by unitary ones. Some of the network structures used in this paper as a uTNC ansatz are shown in Figs. 1(c)-1(f). The irregular uTNC, made of two binary MERAs (one of them rotated) connected through long bonds, has an important feature: it allows the light cone to grow exponentially with the uTNC's depth τ, whereas in regular uTNCs the light cone grows linearly. It is thus expected that the irregular network structure could better capture long-range properties.
The proposed uTNCs provide varying levels of accuracy and computational cost, so that the most efficient ansatz can be chosen empirically. Furthermore, the MERA optimization methods can be explicitly generalized to the uTNC ansatz, as we describe in Sec. III. For the sake of simplicity, we use the ternary uTNC, depicted in Fig. 1(c), to discuss the algorithms and the evaluation of the real-time dynamics.

III. OPTIMIZATION ALGORITHMS
We discuss some "improved" optimization methods to approximate the unitary U ≈ Ū_{u,w} by minimizing a cost function. As we aim to obtain the full spectrum of a particular Hamiltonian H, we consider the energy variance σ² as the cost function [40] and seek to minimize it with respect to the tensors {u, w}. The cost function σ² reads

σ² = Σ_i (Ū† H² Ū)_{i,i} − Σ_i [(Ū† H Ū)_{i,i}]²,

where the symbol (i, i) stands for the diagonal elements of a matrix. The cost function σ² is the energy variance summed over all approximate eigenstates. Note that σ² = 0 strictly indicates that Ū_{u,w} exactly represents all true eigenstates. The first term, equal to tr(H²), does not play any role in the optimization procedure, so we ignore it and seek to maximize the second term, which can be efficiently calculated (the computational time scales linearly with the system size N). This is done by the so-called ascending and descending superoperators (similar to those appearing in standard MERA algorithms), which are used to move local operators up and down between layers, as illustrated in Fig. 2(a) and detailed in the following section.
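On small systems, the cost function above can be checked by brute force, without any tensor-network contraction. A minimal numpy sketch (the function name is illustrative):

```python
import numpy as np

def energy_variance(H, U):
    """sigma^2 = sum_i (U^dag H^2 U)_{ii} - sum_i [(U^dag H U)_{ii}]^2."""
    HU = U.conj().T @ H @ U
    first = np.real(np.trace(U.conj().T @ H @ H @ U))  # constant: tr(H^2)
    second = np.sum(np.real(np.diag(HU)) ** 2)         # the term to maximize
    return first - second
```

If U is the exact diagonalizing unitary, the two terms coincide and σ² vanishes; for any other unitary σ² is non-negative.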

A. Efficient representation of cost function
We assume that the Hamiltonian H only includes short-range interactions, i.e., H = Σ_i h_{i,i+1}. The local terms h can be mapped to upper layers τ by applying the ascending superoperator A, as shown in Fig. 2(b). This produces a sequence of operators, h^{(0)} → h^{(1)} → ⋯ → h^{(T)}, each defined on a different layer, where the upper index specifies the layer τ (0 ≤ τ ≤ T). The term h^{(0)} ≡ h is the original local interaction, while h^{(τ)} represents a two-body interaction defined on two adjacent superspins with local Hilbert-space dimension χ_τ. By using the superoperator A, we simply find that Ū† H Ū is a sum of the two-body terms h^{(T)} acting on adjacent superspins at the topmost layer. The square of this quantity, which appears in the cost function, then takes a simple form in terms of pairs of the operators h^{(T)}; see Fig. 2(d). The tensor contraction F entering this expression is defined in Fig. 2(e) and is slightly different from the matrix trace. The main reason for such a simplification is the unitary constraint (annihilating most tensors to the identity), which makes the computational time linear in the system size. In this form of the cost function σ², only operators at the topmost layer T appear, but in the optimization procedure we need a similar expression for each layer τ. To that end, we define the counterpart of the ascending superoperator A that moves operators to lower layers, i.e., the descending superoperator D, as depicted in Fig. 2(c). Similarly, we obtain a sequence of operators on the different layers, h^{(T)} → h^{(T−1)} → ⋯ → h^{(τ)}. By using both superoperators A and D, we can finally express the cost function at layers τ < T in terms of matrix traces, where the symbol "tr" stands for the matrix trace. The leading computational cost (in all steps) scales as O(χ_{T−1}^9) for the ternary uTNC and as O(χ_{T−1}^{10}) for the binary uTNC.
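The reason these contractions stay local is the causal-cone structure of the circuit: conjugating a local term through a layer enlarges its support only within the light cone, while gates outside the cone cancel against their adjoints. This can be checked by brute force on N = 6 sites (all names are illustrative; the layer ordering below is chosen so the overlapping layer acts first under conjugation):

```python
import numpy as np

def embed(op, site, n):
    """Embed a k-site operator starting at `site` into the 2^n space."""
    k = int(round(np.log2(op.shape[0])))
    return np.kron(np.kron(np.eye(2**site), op), np.eye(2**(n - site - k)))

def random_unitary(d, rng):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(a)
    return q * (np.diag(r) / np.abs(np.diag(r)))

rng = np.random.default_rng(1)
N = 6
# Ubar = L1 @ L2: inner layer L2 on (1,2),(3,4), outer layer L1 on (0,1),(2,3),(4,5)
circuit = np.eye(2**N, dtype=complex)
for site in (1, 3):        # layer 2 (inner)
    circuit = embed(random_unitary(4, rng), site, N) @ circuit
for site in (0, 2, 4):     # layer 1 (outer)
    circuit = embed(random_unitary(4, rng), site, N) @ circuit

h = embed(random_unitary(4, rng), 2, N)   # local term on sites (2,3)
h_up = circuit.conj().T @ h @ circuit     # "ascended" through both layers
```

Here h_up is supported only inside the causal cone {1,2,3,4}; it commutes with any operator on sites 0 and 5, which is exactly what keeps the evaluation of the cost function linear in N.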

B. Conjugate-gradient method with efficient line-search algorithm
With an efficient method for computing the cost function in hand, we now need to minimize it by sequentially optimizing the local unitary tensors. An iterative strategy provides an efficient way to do this: at each step, one tensor is optimized while the others are held fixed, repeating this for all tensors until the cost function no longer changes significantly. However, the nonlinear nature of the cost function and the unitary constraint on each tensor make this optimization procedure challenging. Well-developed algorithms exist to handle this optimization, specifically variants of the gradient method, such as steepest-descent (SD), conjugate-gradient (CG), and quasi-Newton algorithms.
The basic idea of the SD method is to minimize the cost function along the direction of the gradient. In each iteration i, the new unitary tensor u_{i+1} is obtained by

u_{i+1} = e^{−α g_i} u_i,

where α is the step-size parameter and g_i is the Riemannian gradient direction (a skew-Hermitian matrix),

g_i = Y_{u_i} u_i† − u_i Y_{u_i}†,

where the tensor Y_{u_i} is the so-called environment tensor, obtained by contracting all tensors around u_i. In our case, it has a simple tensor-network representation, thanks to the simple form of the cost function obtained in Sec. III A, and is calculated once in each step i. After obtaining the gradient direction, the next step is to find the optimal value of α, the line-search parameter. The standard approach is to minimize the cost function σ²(α) using the above update while systematically decreasing α according to certain criteria, such as the Armijo condition. Unfortunately, in practice, finding the optimal α requires multiple evaluations of the cost function, making the line-search algorithm the main computational bottleneck of both the SD and CG methods. The gradient optimization can be improved by a better choice of search direction. In the SD method, the minimum is usually approached along a zig-zag route (taking 90° turns at every iteration), whereas in the CG method the minimum is often reached via a relatively direct path. There, the new search direction is a proper combination of the current gradient and the previous search direction, h_i = g_i + β_i h_{i−1}, and the new unitary tensor is obtained by u_{i+1} = e^{−α h_i} u_i. The parameter β_i is determined by the Polak-Ribière formula (or other prescriptions), β_i = tr[g_i†(g_i − g_{i−1})] / ‖g_{i−1}‖², where ‖·‖ denotes the matrix norm. In principle, the CG algorithm requires an accurate search direction, so the line-search algorithm plays a crucial role in its efficiency and accuracy.
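A single SD update can be sketched in a few lines. We stress that the update rule u_{i+1} = exp(−α g_i) u_i with g_i = Y u† − u Y† is our reconstruction of the standard Riemannian-gradient scheme of Refs. [44,45], and all names are illustrative; the matrix exponential of a skew-Hermitian g is computed through an eigendecomposition so that the update stays exactly unitary:

```python
import numpy as np

def riemannian_gradient(u, Y):
    """g = Y u^dag - u Y^dag; skew-Hermitian by construction."""
    return Y @ u.conj().T - u @ Y.conj().T

def unitary_exp(g, alpha):
    """expm(-alpha * g) for skew-Hermitian g, via eigh of the Hermitian i*g."""
    lam, V = np.linalg.eigh(1j * g)           # g = -i V diag(lam) V^dag
    return V @ np.diag(np.exp(1j * alpha * lam)) @ V.conj().T

def sd_step(u, Y, alpha):
    """One steepest-descent step staying exactly on the unitary manifold."""
    return unitary_exp(riemannian_gradient(u, Y), alpha) @ u
```

Because g is skew-Hermitian, e^{−αg} is unitary for any α, so no re-unitarization is needed after the step.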
A strategy to make the line-search method more efficient, avoiding multiple evaluations of the cost function, is to directly approximate the first-order derivative of the cost function and solve the equation ∂_α σ²(α) = 0; the roots of this equation are the candidate optimal step sizes α. We use a low-order polynomial approximation (up to pth order) and take the corresponding smallest positive root as the step-size value, which significantly reduces the computational time by avoiding cost-function evaluations [52]. This method provides accurate results provided the cost function (and its derivatives) is almost periodic with respect to α. The function σ²(α) is called ε-almost periodic if there exists a real number T such that |σ²(α + T) − σ²(α)| < ε for all α. The steps to find an optimal step size are the following (for more detail, see Ref. [52]).
(a) Period of the cost function. Compute the eigenvalue of g_i with the largest magnitude, |w_max|. The largest polynomial degree of u_i appearing in the cost function determines the order of the cost function, denoted q; the parameter q is equal to 4 for the energy-variance cost function and to 2 for the tensor-network QR decomposition. The period of the cost function is then estimated as T = 2π/(q|w_max|).
(b) Sampling the cost function at equispaced points. Choose the points μ_k = kT/p, k ∈ {0, ⋯, p}, where p is the low-order polynomial approximation parameter controlling the accuracy. For each k ∈ {0, ⋯, p}, obtain the value of the first-order derivative of the cost function, η_k = ∂_α σ²(μ_k).
(c) Polynomial coefficients. Build the p × p matrix z_{mn} = μ_m^n and the p × 1 vector b_m = η_m − η_0, with indices m, n ∈ {1, ⋯, p}, and compute the coefficients a = z^{−1} b.
(d) Step size. By finding the smallest real positive root ρ_min of a_0 + a_1 x + ⋯ + a_p x^p = 0, where a_0 = η_0 and a_1, ⋯, a_p are obtained from the previous step (c), we can estimate the step size.
If there is no solution to the polynomial equation, we need to increase T or use alternative line-search algorithms to find the step size α. In practice, we find that this algorithm works quite well (even with small values of the polynomial order, p ∼ 3-4) for the type of cost functions we are dealing with. The PYTHON code for this implementation is available in Ref. [57].
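Steps (a)-(d) amount to a small root-finding routine. The sketch below assumes the derivative ∂_α σ² is supplied as a callable (all names are illustrative) and is tested on the toy derivative −sin α, i.e., the almost-periodic "cost" cos α, whose first minimum sits at α = π:

```python
import numpy as np

def effective_line_search(dcost, w_max, q, p=4):
    """Estimate the step size via steps (a)-(d): fit a degree-p polynomial
    to d(sigma^2)/d(alpha) and take its smallest positive real root."""
    T = 2 * np.pi / (q * abs(w_max))              # (a) period of the cost function
    mu = np.linspace(0.0, T, p + 1)               # (b) equispaced sample points
    eta = np.array([dcost(m) for m in mu])        #     derivative samples eta_k
    z = np.vander(mu[1:], p + 1, increasing=True)[:, 1:]  # z_mn = mu_m^n, n = 1..p
    a = np.linalg.solve(z, eta[1:] - eta[0])      # (c) coefficients a_1..a_p
    c = np.concatenate(([eta[0]], a))[::-1]       # a_0 = eta_0; highest degree first
    c = c[np.argmax(np.abs(c) > 1e-12 * np.abs(c).max()):]  # drop tiny leading terms
    roots = np.roots(c)                           # (d) smallest positive real root
    real_pos = [r.real for r in roots if abs(r.imag) < 1e-8 and r.real > 1e-6]
    return min(real_pos) if real_pos else None
```

The routine evaluates only p + 1 derivative samples per sweep, which is the source of the speed-up over Armijo-type searches that repeatedly evaluate σ² itself.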

C. Linearizing algorithm
We describe an alternative method, based on a linearizing approximation, to optimize the cost function. The difficulty in minimizing the cost function mostly lies in its nonlinear dependence on the unitary tensors and in the unitary constraint. The cost function σ² is a quartic function of u_i, containing at most fourth-degree polynomial terms. The basic idea to simplify the optimization procedure is to temporarily make the cost function "linear" with respect to u_i, holding all other tensors fixed. To do that, we follow a strategy similar to the one adopted in MERA simulations: we rewrite the cost function in the linearized form σ² ≃ tr(u_i Y_{u_i,u_i†}) + const, where Y_{u_i,u_i†} is again the environment tensor. Note that Y_{u_i,u_i†} strictly depends on {u_i, u_i†}, but the basic idea is to temporarily treat Y_{u_i,u_i†} as independent of them and minimize the cost function σ² accordingly. The exact solution of this linearized minimization problem is u_{i+1} = −v†w, where w and v are determined by the singular value decomposition of the environment tensor, Y_i = w†sv. We then repeat this process until u_i converges. The steps of the linearizing algorithm are as follows: (a) Environment tensor. Compute the environment tensor of u_i by contracting all tensors in σ², excluding u_i, then perform a singular value decomposition to split it into Y_{u_i,u_i†} = w†sv.
(b) Update. Set u_{i+1} = −v†w and replace u_i with u_{i+1}. (c) Evaluation. Evaluate the cost function σ² and return to step (a) if the convergence criteria are not met.
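The SVD update in step (b) is a one-liner in practice. The sketch below minimizes the linearized cost Re tr(u Y) over unitaries u, returning u = −v†w from Y = w†sv (the linear form is our reading of the linearization; names are illustrative):

```python
import numpy as np

def linearized_update(Y):
    """Return the unitary u = -v^dag w minimizing Re tr(u Y), with Y = w^dag s v."""
    w_dag, s, v = np.linalg.svd(Y)    # numpy convention: Y = w_dag @ diag(s) @ v
    return -(v.conj().T @ w_dag.conj().T)
```

At this optimum, Re tr(uY) = −Σ_k s_k, the smallest value attainable over unitaries, which is why each such update can only lower the linearized cost.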
The computational bottleneck of the linearizing method is computing the environment tensor. Since the method does not require a line-search algorithm, it significantly improves the convergence rate. However, it is not generally guaranteed to provide accurate results; hence its accuracy and validity should be examined empirically, as we observe that in some cases it fails to find global minima, getting stuck in local minima; see Sec. V.

IV. TIME-EVOLUTION ALGORITHM
In this section, we study the long-time dynamics of a local quantity by using the ternary uTNC. We expect Ū (after the optimization procedure) to accurately represent U, hence e^{−iHt} ≈ Ū e^{−iDt} Ū†. Our goal is to study the expectation value ⟨ψ(0)| e^{iHt} Ô e^{−iHt} |ψ(0)⟩, where Ô stands for a local operator, e.g., defined on one site, and |ψ(0)⟩ represents the initial quantum state at time t = 0. The diagonal matrix D contains 2^N eigenvalues, a number that grows exponentially, so further simplification is still required. By using the ascending superoperator A, as shown in Fig. 3(a), we map Ô to Ô^{(T)}, which is still a local operator defined on one superspin at the last layer T. In addition, we find that e^{iDt} Ô^{(T)} e^{−iDt} has a simple form, Ō^{(T)}(t), defined on three adjacent superspins. Note that {τ_{j−1}, τ_j, τ_{j+1}} represent the Hilbert spaces of three adjacent superspins, each of dimension χ_T. The key point enabling such a simplification is that D is expressed in terms of local operators; therefore, the time evolution of the local operator has a simple tensor-network representation, shown in Fig. 3(b), where most of the unitary tensors are annihilated to the identity. Similarly, one can find an efficient tensor-network representation for the time evolution of the wave function, e^{−itH}|ψ(0)⟩.
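On system sizes amenable to exact diagonalization, the central approximation e^{−iHt} ≈ Ū e^{−iDt} Ū† can be verified directly; a sketch using the exact Ū from eigendecomposition (names illustrative):

```python
import numpy as np

def evolve(H, psi0, t):
    """Apply e^{-iHt} to |psi(0)> as U e^{-iDt} U^dag |psi(0)>."""
    w, U = np.linalg.eigh(H)                       # H = U diag(w) U^dag
    return U @ (np.exp(-1j * w * t) * (U.conj().T @ psi0))
```

The same three-step structure (rotate into the eigenbasis, apply diagonal phases, rotate back) is what the uTNC performs with local contractions instead of dense matrices.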

V. NUMERICAL RESULTS
We analyze the presented algorithms by studying the Heisenberg chain with random magnetic fields, given by the Hamiltonian H = Σ_i S⃗_i · S⃗_{i+1} + Σ_i h_i S_i^z, where the S⃗_i are spin-1/2 operators. The fields h_i are drawn from a uniform distribution [−W, W], where W is the disorder strength inducing different many-body phases. Exact-diagonalization studies [38,53] (N ≤ 20) have predicted an ETH phase for W ≲ 3.5 and an MBL phase for W ≳ 3.5. Additionally, we test the iTNC ansatz in a tensor-network QR decomposition of a PEPS column M; see Figs. 4(a) and 4(b). The tensors in Q are represented by a circuit made of a few layers of unitary gates with a layer of isometric tensors on top. We use the network structures and optimization schemes introduced in this paper to benchmark the accuracy of the tensor-network QR decomposition and study possible improvements. Our initial PEPS is constructed from the PEPS ground state of the two-dimensional Heisenberg model on an l_y × l_y square lattice [58]. The PEPS bond dimension is denoted by D̄.
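For small N, the disordered Heisenberg Hamiltonian can be assembled by brute force to cross-check the tensor-network results (a sketch; helper names are illustrative, and the spin operators are S = σ/2):

```python
import numpy as np

Sx = np.array([[0.0, 1.0], [1.0, 0.0]]) / 2
Sy = np.array([[0.0, -1j], [1j, 0.0]]) / 2
Sz = np.array([[1.0, 0.0], [0.0, -1.0]]) / 2

def embed(op, site, n):
    """Embed a single-site operator at `site` into the 2^n space."""
    return np.kron(np.kron(np.eye(2**site), op), np.eye(2**(n - site - 1)))

def heisenberg_random_field(n, W, rng):
    """H = sum_i S_i . S_{i+1} + sum_i h_i S_i^z, with h_i ~ U[-W, W]."""
    H = np.zeros((2**n, 2**n), dtype=complex)
    for i in range(n - 1):
        for S in (Sx, Sy, Sz):
            H += embed(S, i, n) @ embed(S, i + 1, n)
    for i in range(n):
        H += rng.uniform(-W, W) * embed(Sz, i, n)
    return H
```

A quick sanity check: at W = 0 and N = 2, the spectrum of S⃗_1 · S⃗_2 is the singlet at −3/4 and the threefold-degenerate triplet at +1/4.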

A. Comparing the convergence rate of the optimization methods
In this section, we benchmark the optimization methods introduced in Secs. III B and III C. We begin by studying the energy variance σ² as a function of the iteration number. The results are for the Heisenberg model with disorder strength W = 6, located in the MBL phase. Note that the optimization methods can generally be applied to different types of geometric networks and cost functions.
We apply the optimization methods to a regular architecture, see Fig. 1(f), with circuit depth τ = 5. We first initialize the two-body unitary tensors to the identity, {u} = {I}, then start the optimization by sweeping sequentially through the local tensors. We plot the results in Fig. 5(a), where we observe that the linearizing algorithm and CG with an effective line search reach the same accuracy level as CG/SD with the Armijo line search, but with far fewer iterations. In this case (for our type of cost function), we do not observe a significant difference between CG and SD with the same line-search algorithm. In Fig. 5(b), we examine how the CG method performs with different values of the polynomial order p for a binary uTNC. It shows that a small polynomial order, p = 3, is enough to efficiently reach converged results. In this case, we observe that the linearizing method gets stuck in a local minimum. Empirically, we find that the linearizing method might not be able to find converged results for the ternary and binary uTNCs (where the number of variational parameters per tensor increases rapidly with τ), but it is accurate for the regular and irregular uTNCs. The CG with the effective line search always seems to converge to the actual minimum. This is an important advantage of the CG with the effective line search compared to the other algorithms: it provides a huge speed-up without sacrificing accuracy.
To estimate the computational speed-up of the improved optimization schemes, we plot the actual running time t (measured directly on a PC with a six-core processor), averaged over 50 realizations, versus the number of layers τ. Empirically, we notice that the linearizing algorithm is faster than CG with the Armijo line search. This behavior remains the same for all structures, as the convergence rate and accuracy are almost independent of the tensor-network architecture. As expected, by increasing the depth, we obtain a larger speed-up factor compared to the CG/SD with the Armijo line search, up to ∼8-10; we expect to obtain even larger factors by increasing the system size N and depth τ. In our calculations, we have set the polynomial order p ∼ 3, 4, since, as previously noted, a small polynomial order is enough to obtain converged results in most cases. In challenging cases, we might need to increase it up to p ∼ 4, 5 to ensure the results are converged.

B. The accuracy of the tensor-network architectures
We study the performance of the tensor-network architectures through the energy variance σ². In Fig. 6(a), it is seen that the binary and ternary uTNCs outperform the regular and irregular ones. The main reasons for the better performance of the binary and ternary structures are (i) the larger number of variational parameters (per tensor) and (ii) the effective connectivity of the tensors, which efficiently captures the localization length of the system [41,55,56].
Further, the binary uTNC can outperform the ansatz introduced in Ref. [41]. To this end, we present results for the cost function versus the disorder strength for the regular uTNC with two layers, τ = 2, and varying block sizes l (exactly the setup used in Ref. [41]) and for a binary uTNC with χ = 4, shown in Fig. 1(d). The results are presented in Fig. 6(b), where we observe that the binary ansatz provides better accuracy than the previous ansatz [41], even for block size l = 6. Empirically, we observe that the running time of the binary uTNC is almost the same as that of the regular uTNC with τ = 2 and l = 4. The better performance of the binary ansatz can be understood from its deeper layers of unitary gates, which mimic the localization length more accurately than a simple regular structure. We now compare the tensor-network structures shown in Figs. 1(c)-1(f) by studying the energy variance and the tensor-network QR decomposition. In Table I, we present the results for disorder strength W = 8 and for PEPS columns with length l_y = 10 and bond dimension D̄ = 3. The ternary uTNC provides the best accuracy, as it includes more variational parameters (with an entangled structure) than the other ones. The number of iterations needed to obtain converged results is related to the number of variational parameters; thus the running time of the ternary uTNC is a factor of ∼2 larger than that of the binary uTNC, while it remains (almost) the same for the binary, regular, and irregular uTNCs. Surprisingly, the irregular and regular uTNCs provide the same level of accuracy, whereas one might expect the irregular structure (due to its nonlocal connection of tensors) to perform better. A global energy optimization might change these results (compared to the local optimization technique used here), as the irregular structure includes nonlocal connections.
We end this section by presenting some simple benchmark results for time evolution. We study the time evolution of the entanglement entropy after a local quench. We choose the initial state to be a product state in the S^z basis with a spin-flip operator applied to the ith site, i.e., S_i^x |0⟩^{⊗N}. We use the methods developed in Sec. IV to study the dynamics of the entanglement entropy: we plot the time-dependent entanglement entropy S(t), obtained from the von Neumann entropy of e^{−itH}|ψ⟩, as a function of time t in Figs. 6(c) and 6(d). The result is presented for the Heisenberg chain with random magnetic fields with W = 5 (a single realization). We observe that by increasing the accuracy of the uTNC ansatz, we can reproduce the exact results. A relative error of order σ²/2^N ∼ 10^{−3} is enough to qualitatively mimic the exact time-evolution result.
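The entanglement entropy used in this benchmark is obtained from a Schmidt decomposition of the evolved state across the middle cut; a minimal sketch (names illustrative):

```python
import numpy as np

def half_chain_entropy(psi, n):
    """Von Neumann entropy -sum_k p_k ln p_k of the left half of an n-site chain."""
    schmidt = np.linalg.svd(psi.reshape(2**(n // 2), -1), compute_uv=False)
    p = schmidt**2                   # Schmidt weights across the middle cut
    p = p[p > 1e-12]                 # drop numerically zero weights
    return -np.sum(p * np.log(p))
```

For a product state the entropy vanishes, while a maximally entangled pair across the cut gives ln 2.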

VI. CONCLUSION
In this paper, we have studied optimization methods for uTNCs with different network structures. We discussed in detail an efficient local optimization of the energy variance based on MERA techniques. We studied a CG method with an effective line search and showed empirically that it provides the most efficient scheme compared to the linearizing algorithm and the Armijo-based algorithm. The method speeds up the ansatz by a large prefactor that increases linearly with the system size and circuit depth. Empirically, we find that the uTNCs with a MERA-like structure provide the best performance compared to previously proposed structures, as observed in the benchmarking results presented for a tensor-network QR decomposition and for estimating the eigenspectra of a system exhibiting MBL. We also presented a time-evolution algorithm based on the uTNC to evaluate the time-dependent entanglement entropy of a system exhibiting MBL after a local quench, as studied for the Heisenberg model in a disordered magnetic field.
Future research may focus on developing global optimization schemes for (nonlocal) network structures (especially in two dimensions) and their applications in the canonical PEPS ansatz, as well as on studying phase transitions in MBL systems and real-time evolution. The network structures and optimization schemes introduced in this paper could be used to improve quantum circuit algorithms on noisy intermediate-scale quantum (NISQ) computers [23,25,26].