Efficient representation of fully many-body localized systems using tensor networks

We propose a tensor network encoding the set of all eigenstates of a fully many-body localized system in one dimension. Our construction, conceptually based on the ansatz introduced in Phys. Rev. B 94, 041116(R) (2016), is built from two layers of unitary matrices which act on blocks of $\ell$ contiguous sites. We argue this yields an exponential reduction in computational time and memory requirement as compared to all previous approaches for finding a representation of the complete eigenspectrum of large many-body localized systems with a given accuracy. Concretely, we optimize the unitaries by minimizing the magnitude of the commutator of the approximate integrals of motion and the Hamiltonian, which can be done in a local fashion. This further reduces the computational complexity of the tensor networks arising in the minimization process compared to previous work. We test the accuracy of our method by comparing the approximate energy spectrum to exact diagonalization results for the random field Heisenberg model on 16 sites. We find that the technique is highly accurate deep in the localized regime and maintains a surprising degree of accuracy in predicting certain local quantities even in the vicinity of the predicted dynamical phase transition. To demonstrate the power of our technique, we study a system of 72 sites and we are able to see clear signatures of the phase transition. Our work opens a new avenue to study properties of the many-body localization transition in large systems.


I. INTRODUCTION
Many-body localization (MBL), a phenomenon conjectured by Anderson in 1958 for disordered, interacting quantum particles 1 , occurs in an isolated quantum system when it fails to reach thermal equilibrium.It was shown to exist within perturbation theory for shortranged interacting models with sufficiently strong disorder for states even at a finite energy density 2,3 .Strikingly, in one dimensional models the entire many-body spectrum can be localized 4,5 , known as full many-body localization (FMBL).As opposed to a thermalizing system where the eigenstates exhibit volume law entanglement and satisfy the eigenstate-thermalization hypothesis (ETH) 6,7 , for a one-dimensional system exhibiting FMBL, all the eigenstates of the Hamiltonian are expected to obey an area law 8,9 .
The breakdown of thermalization lends itself to several interesting phenomena which are absent in a thermalizing system 10 .Topological and symmetry breaking orders, which are destroyed by thermal fluctuations at equilibrium, can be extended to highly excited states at a finite energy density due to MBL [11][12][13][14][15] .Logarithmic growth of entanglement in the FMBL phase may allow the construction of logical qubits in an interacting system which can serve as robust quantum memories 16,17 .Even the quantum phase transition between the thermal and MBL phases is not described by any of the conventional theories of phase transition [18][19][20][21] .Recent developments in cold atoms and various forms of synthetic quantum matter have allowed the experimental study of the phenomena of thermalization and its breakdown in a controlled manner [22][23][24] .
Many of the features of FMBL can be understood in terms of an extensive set of emergent quasi-local, exact integrals of motion (qLIOM) [25][26][27][28] .The phrase quasi-local here indicates that if we trace over the operator within a region of size x around where the operator is localized, we should obtain an operator proportional to the identity up to corrections that decay as e −x/ξ L with localization length ξ L .(We use the term quasi-local as compared to strictly local which would mean that outside of a sufficiently large finite sized region we obtain an operator exactly proportional to the identity.) A proof of the existence of qLIOMs for a strongly disordered, one-dimensional spin-1/2 model shows that this characteristic of the non-ergodic phase is true at least deep in the MBL phase 29,30 .This emergent integrability is successful in capturing much of the phenomenology in 1D, developed based on exact diagonalization of small systems.But in the absence of numerics on sufficiently large systems or a mathematical proof, its generalization to weaker disorder or higher dimensions remains under intense investigation 31,32 .
In the FMBL phase, the entire spectrum of the manybody Hamiltonian can be described in terms of the quantum numbers of the qLIOMs, as opposed to the case where there is a many-body mobility edge in the spectrum [33][34][35][36] .As a consequence, all the eigenstates obey an area-law of entanglement 8,9 , which then allows the use of highly efficient approximations involving tensor networks [37][38][39][40][41][42] .The efficiency of these techniques is exemplified for states with area-law entanglement which can be described numerically using exponentially fewer parameters, than are required for an arbitrary state in Hilbert space.Such techniques are impressively (and provably) efficient for describing gapped ground states in one dimension 38,42 .Increasingly, similar techniques have been computationally effective in two dimensions as well 41,43 .For FMBL systems the area law holds not only for the ground state, but for the entire spectrum.
Exploiting this area-law entanglement, excited eigen-states of FMBL systems can be approximated efficiently as matrix product states (MPS) 44 .Furthermore, the unitary operator diagonalizing the entire Hamiltonian can be represented as a tensor network, known as a spectral tensor network 45 .The algorithm to construct such a spectral tensor network proposed by Pekker and Clark does not scale efficiently with system size 46 .Constructing the unitary using a Wegner-Wilson flow approach also appears to be limited to small system sizes 47 .A proposal with an efficient scaling was given by Pollmann et al. using stacked layers of unitaries, i.e., a quantum circuit, by minimizing the fluctuations in the total energy 48 .It was suggested that the accuracy of the approximation for a given chain length can be increased by increasing the number of layers (the depth of the quantum circuit).
Compared to the methods targeting eigenstates within an energy window [49][50][51][52] , this procedure is constructed to efficiently represent all eigenstates with sufficient accuracy, providing access to dynamical properties of local observables.
In this work we improve upon the ansatz from Ref. [48]  by increasing the size of the block of spins acted upon by the unitaries, while keeping the number of layers fixed at two.This corresponds to a quantum circuit of fixed depth with gates acting on several qubits.We provide analytic arguments and show numerically that this gives rise to an exponential improvement of the computational time and memory requirements.Our scheme constitutes the first scalable representation of the full set of eigenstates of FMBL systems by tensor networks: Local observables can be approximated with an error that decreases like an inverse polynomial of the computational cost.We use a figure of merit which is directly related to the qLIOMs and motivated by a procedure introduced by Kim et al. to identify slow operators in disorder-free non-integrable models 53 .This strongly reduces the computational cost of the tensor network (TN) contractions needed to optimize our unitaries compared to using the variance as a figure of merit (as in Ref. [48]).For concreteness, we consider the one dimensional random field Heisenberg model.We compare the numerical performance of our scheme and the one originally proposed by Ref. [48] even extending their study to four layers (albeit with our figure of merit to improve computational efficiency).We find our strategy to be both more accurate and computationally efficient.
Specifically, we quantify the performance of our scheme by minimizing the commutator of the Hamiltonian with the approximate, local integrals of motion defined through our TN ansatz.As we show, this figure of merit decomposes into strictly local parts, which allows us to evaluate it with linear cost in the system size, thus enabling us to reliably assess the performance of our ansatz in the regime where exact diagonalization is unavailable.We corroborate this by comparing the optimized TN with exact diagonalization results for 16 sites, where we observe that the numerical value of the figure of merit indeed reflects how well the real MBL energy spectrum is approximated.We find a very high accuracy of our ansatz for unitaries acting on eight contiguous sites, and thus use the same procedure to tackle a chain with 72 sites as a function of the disorder strength.Remarkably, the ansatz fares extremely well for local observables at weak disorder and close to the MBL-to-thermal phase transition in this model.We use the fluctuations in the half-cut entanglement entropy calculated with this ansatz to estimate the location of the transition 36 which is in agreement with the exact diagonalization studies.
In Sec.II we define the model used to perform our calculations and also highlight the phenomenological features of the FMBL phase in one dimension.In Sec.III and IV, we give a detailed description of the tensor network ansatz and the figure of merit used to diagonalize the full Hamiltonian efficiently.The numerical results and their comparison to exact diagonalization are presented in Section V.The scaling of the procedure with the total number of spins and its performance close to the MBL-thermal transition is also discussed in this section.In Sec.VI we present a summary of the results and future directions for the method.

II. MODEL AND ITS PHENOMENOLOGY
We consider the canonical random-field Heisenberg model defined on a spin-1/2 chain 4 of N sites with open boundary conditions, with S i = 1 2 σ i and each of the h z i is chosen from the uniform distribution bounded between [−W, W ], where W is called the disorder strength.The model is known to have a dynamical phase transition into the MBL phase where all states are localized for disorder strength greater than W c ≈ 3.5 4,33 .
In the FMBL regime the bare physical spins in the model (also known as 'p-bits') can be be unitarily transformed into an extensive set of mutually-commuting quasi-local effective spins τ z i (also known as 'l-bits') which are expected to commute exactly with the Hamiltonian. [H, where τ z i = U σ z i U † .U is the unitary operator which exactly diagonalizes the Hamiltonian.In the localized phase, the unitary transformation U can be decomposed into a sequence of local unitaries so that the l-bits develop exponentially decaying tails away from site i 29 .More mathematically, with positive constants a, ξ L for N r.We use the 1-norm A 1 = jk |A jk | and consider the matrix representation of τ z i − σ z i in a fixed basis.ξ L can be defined to be the localization length of the MBL system where the trace is taken over the collection of spins within a distance r of site i.It is important to note that the definition of the localization length is not unique.There can even be multiple localization lengths and some of them may not diverge at the MBL-thermal transition 25 .According to Eq. ( 2), the Hamiltonian and the set of lbits {τ z j } can be simultaneously diagonalized where every eigenstate is a product state in the l-bit basis.Each eigenstate can be uniquely labelled by the eigenvalues i j = ±1 of the set of l-bit operators {τ z j } (j = 1, . . ., N ), |ψ i1i2...i N .In the l-bit basis the Hamiltonian can be expressed in the following form, where the coefficients J ijk... typically decay exponentially with the largest distance between two spins |i − j| occurring in a particular cluster in the expansion.The probability of a coefficient being substantially larger than the typical value expected from this exponential decay, is also exponentially small 29 .

III. TENSOR NETWORK ANSATZ
Tensor network states are believed to provide an efficient representation of the ground states of local gapped Hamiltonians.That is, as the system size is increased, the number of parameters required to approximate the ground state wave function with a certain fidelity (e.g., with at least 99 % overlap) increases only polynomially with the system size.For MBL systems with sufficiently strong disorder (i.e., in the absence of a mobility edge), the whole spectrum of eigenstates fulfills the area law and thus, can be efficiently represented by MPS 44 .However, since the number of eigenstates is exponential in the system size, for large N one can only tackle the eigenstates in a certain energy window using MPS (see e.g.Refs.[49-51]).On the other hand, spectral tensor networks are meant to encode an approximation to all eigenstates at once, which is a desirable property if one aims to calculate dynamical properties of local observables in MBL systems.We build on the tensor network ansatz proposed in Ref. [48].It defines a unitary matrix Ũ , which approximately diagonalizes the Hamiltonian, in terms of many 4 × 4 unitaries, which are stacked in several layers and contracted as shown in Fig. 1.
In the following, we argue that for this tensor network the approximation of local observables with a given accuracy requires the computational resources to grow superexponentially with the localization length.In contrast, for the tensor network we will suggest, they scale only exponentially with the localization length.In addition, for a fixed localization length, the error of local observables is expected to decrease as the inverse of a polynomial function in computational cost.
In Ref. 48, the best tensor network approximation is found by minimizing the sum of the energy variances of all approximate eigenstates | ψi1...i N .The computational cost for the calculation of this quantity scales as 2 5n , where n is the number of layers (note that we will introduce a figure of merit below for which the minimization would only require a computational cost of order 2 3n ).However, it appears that n needs to grow exponentially with the localization length ξ L in order to keep the accuracy of the approximation fixed on average: Within distances smaller than ξ L there are no particular restrictions on the elements of the unitary U which diagonalizes the Hamiltonian.The number of real parameters required to describe the unitary within that range is expected to typically scale as 2 2ξ L .Therefore, in order to reproduce local observables with a given accuracy, of the order of 2 2ξ L N parameters are required.Since the number of parameters of the tensor network in Fig. 1 is only 4 2 N n 2 = 8N n, n is required to grow as 2 2ξ L .It follows that the computational resources are required to grow superexponentially with ξ L for a fixed accuracy.This makes it hard to approach the transition into the delocalized phase using the multi-layer ansatz.
We propose to overcome this problem by increasing the range of sites acted on by the building block unitaries, instead of varying the number of layers.Thus, we stick to two layers of unitaries with lower and upper "legs" ( is even) contracted as shown in Fig. 2. Each unitary has 2 2 real parameters, i.e., the total number of parameters is 2 2 +1 N/ .Hence, needs to grow only linearly with ξ L in order to keep the accuracy fixed.The contraction cost of the tensor network arising in the variational optimization of its unitaries scales only exponentially in (as discussed in the following section), which is an exponential improvement over the the multi-layer ansatz.Moreover, for a fixed localization length ξ L , we anticipate the error of our approximation to decrease as exp(− /ξ L ) due to Eq. ( 3), allowing us to describe eigen- states more accurately closer to the MBL transition.As the computational cost is exponential in , this corresponds to a decrease in the error of local observables as the inverse of a polynomial function in computational resources, which is typical for ground states using tensor network states.On the other hand, for the multilayer ansatz 48 , based on the above argument we expect an error of order exp(−2 log 2 (n)/ξ L ), which is computationally less efficient.If one chooses both and n larger than 2, the computational cost is approximately of order exp(− log 2 (n)/ξ L ), since the above scaling argument also holds if the unitaries act on several sites within the multilayer ansatz: The number of parameters of the tensor network would increase only linearly in the number of layers and thus, reduce the error of the approximation only as the inverse of a polynomial while increasing the computational cost exponentially.Hence, keeping the number of layers fixed at two and investing computational resources only into longer unitaries fares substantially better.
Finally, note that in both the ansatz of Ref. [48] and in our approach, the number of parameters required to represent the exact diagonalizing matrix U with a given accuracy increases linearly with N in the FMBL phase.

IV. FIGURE OF MERIT
In order to find the unitary Ũ as described by our tensor network which is as close as possible to the unitary U exactly diagonalizing the MBL Hamiltonian, we define a figure of merit which reflects the deviation between the two.This can be achieved by defining the approximate l-bits corresponding to Ũ , τ z i = Ũ σ z i Ũ † .If they were the exact l-bits, they would commute with the Hamiltonian and with each other.The latter property is fulfilled by construction, so we define the error in our approximation as the sum of the (squared) trace norms of the individual  5).The multiplications from left to right in Eq. ( 6) correspond to top to bottom in the figure.The indices of the lower wiggly lines are to be contracted with those of the corresponding upper wiggly lines.For a given position i of the σ z operator and arbitrary positions j, k of the two-body Hamiltonian terms, all unitaries of the lower layer (i.e., un,1) apart from the ones directly connected to the σ z operators cancel with their adjoints and can be replaced by identities (i.e., straight vertical lines).In the example in the figure this corresponds to all unitaries un,1 for n = x.Furthermore, all unitaries of the second layer (un,2) which are not directly connected to the remaining ones of the first layer cancel and can be substituted by identities.This implies that the x-th summand in Eq. ( 6) depends only on the unitaries ux+1,1, ux,2, ux+1,2.The contraction corresponding to this term is shown in Fig. 4.
commutator of τ z i with the Hamiltonian, In the following, we call f the sum of the commutator norms (SCN), which will be our figure of merit.In order to minimize f , we evaluate the right hand side of Eq. ( 5), which may naively appear exponentially hard in the number of sites N .However, it is possible to break it down into a sum of local terms, rendering the com-putational complexity linear in the system size.To that end, we first express the Hamiltonian as a sum of terms h i acting on two neighboring sites i, i + 1, H = N i=1 h i .Then, the last term of Eq. ( 5), N i=1 tr (τ z i H) 2 , can be easily written as a sum of tensor networks, see Fig. 3.This term can be further decomposed into local parts as depicted in Fig. 4 (using τ ) itself is a sum of tensor networks which only depend on u x,1 , u x−1,2 , and u x,2 , the Hamiltonian terms h j and the σ z i operators which are connected to those unitaries.Those tensor networks can be contracted by multiplying matrices of size up to 2 +1 ×2 .As explained in Appendix A, this results in a computational cost for the calculation of each f x (u x,1 , u x−1,2 , u x,2 ) which scales as 3 2 3 , giving rise to an overall scaling of N 2 3 2 (as there are N terms f x ).This scaling law is a result of the figure of merit we chose.On the other hand, the minimization of the variance as in Ref.48 requires the contraction of a matrix product operator (MPO) and scales as LD 5 χ 2 d 4 (see their appendix), where L is the number of tensors of the MPO, D is its bond dimension, χ is the bond dimension of the MPO representing the Hamiltonian (which does not depend on the block size ) and d is the physical dimension of the MPO tensors.
The most efficient way to obtain such an MPO in our case is to cut each unitary vertically through the middle by performing a singular value decomposition, giving rise to a number of singular values and bond dimension of D = 2 .Afterwards, one blocks upper and lower layers together.The physical dimension per tensor is then d = 2 /2 , leading to a scaling of N 2 7 (since L = N 2 ).Our figure of merit thus has a much lower computational complexity as is increased.Finally, we note that the extension of our approach to higher dimensions is still numerically efficient, since then our figure of merit still decomposes into local contributions.In contrast, calculating the variance would require the contraction of a projected entangled pair state 43 , which cannot be done exactly for large system sizes.

A. Optimization method
In the following section, we will approximate the eigenstates of the Hamiltonian defined in (1).The model possesses U (1) symmetry (it conserves the total spin-z component), [H, N i=1 S z i ] = 0. Furthermore, the Hamiltonian is real in the σ z -basis.In conventional tensor net-  6).Again, the indices of the lower wiggly lines are to be contracted with those of the corresponding upper wiggly lines.The shown tensor network is obtained after replacing mutually cancelling unitaries in Fig. 3 by identities (vertical lines).Those forming closed loops yield a factor of 2 each, which results in the prefactor 2 −N +2 +2 of fx shown in the figure.Terms j, k where hj or h k are not connected to ux−1,2 or ux,2 yield contributions which are independent of all unitaries ux,y and can thus be neglected in the local definition of our figure of merit.Note that the precise positions of σ z i , hj and h k depend on the indices i, j, k that are being summed over, and thus the graphic depicts one example configuration.For details of our contraction scheme, see Appendix A.
work states, symmetries of the model can be imposed on the individual tensors 54,55 : Any tensor network state that is invariant under a symmetry can be written as a (possibly different) tensor network state, where all its individual tensors form a projective representation of the corresponding symmetry group, that is, they are invariant up to a phase under the action of the symmetry.In doing so, the dimensions of the tensor indices might have to be increased by a factor that is independent of the system size.The cost of variational optimization of the tensor network states usually reduces tremendously by imposing such symmetries on the tensors, as they become sparse and have fewer variational parameters.We implement a similar procedure for our spectral TN: We impose it to be real by taking all tensors as real, i.e., its unitaries are orthogonal matrices.To ensure that the total spin-z component is conserved, each individual ten-sor u x,y is assumed to leave the total spin-z of the block invariant, i.e. [u x,y , q=1 Ŝz q ] = 0, where Ŝz q is defined in the same Hilbert space as u x,y .Graphically speaking, this means that the sum of the spin-z components on the lower legs of each tensor has to equal the sum of the spin-z components of the upper legs (remember that all indices have dimension two, corresponding to spin-1/2 particles).All tensor entries whose indices do not fulfill this requirement are forced to be zero.This leads to a block structure of the matrix which is obtained by grouping upper and lower legs together into one single index each.Each of these blocks, say u B , can be parameterized by an antisymmetric real matrix A B , u B = e A B , making the unitaries real and U (1) symmetric.
In order to carry out the optimization, we pick initial values for the antisymmetric matrices A B parameterizing the unitaries and optimize the unitaries individually by sweeping from the left end of the chain to the right and back, until convergence is achieved.Crucially, each such minimization step requires only the evaluation of a few terms in the sum of Eq. ( 6).As it turns out, faster convergence is achieved by always optimizing two connected unitaries at once.
We use a quasi-Newtonian routine supplied with the gradient with respect to the parameters contained in the matrices A B .This gradient comes almost for free in the contraction of the tensor network of Fig. 4 if one contracts its tensors in the right order, as explained in more detail in Appendix B.
As it turns out, the final SCN figure of merit depends on the choice of the initial unitaries.For = 2, best results are obtained by initializing the unitaries as identities and for larger if they are initialized according to the optimal tensor network obtained for smaller blocks.For = 4, 8, that is where 1 is the 2 /2 -identity matrix.Note that the obtained unitaries are also real and invariant under U (1) symmetry.For = 6, we could only initialize the unitaries with the blocked optimal = 2 unitaries for a given disorder realization.This corresponds simply to choosing u =6 x,y = u =2 3x−3+y,y ⊗ u =2 3x−2+y,y ⊗ u =2 3x−1+y,y .

B. Comparison to exact diagonalization
In order to demonstrate the precision of our method for efficiently representing U in the FMBL regime, we performed the optimization defined in Sec.V A for a system of size N = 16 with disorder strength W = 6 and 10 different disorder realizations using unitaries with = 2, 4, 8 legs.We compare our results to the energies and the eigenstates of the Hamiltonian obtained using exact diagonalization (which was performed taking advantage of the U (1) symmetry of the model).The results are shown in Figs. 5 and 6.
In Fig. 5 the distribution of the differences between the (ordered) diagonal elements of the matrix Ũ † H Ũ and the exact energies (defined to be ∆E) are plotted for the chosen values of .The distribution narrows tremendously with increasing with a sharp peak at ∆E = 0.The mean of the optimized value of the SCN figure of merit was f =2 /2 N = 1.011, f =4 /2 N = 0.194, f =8 /2 N = 0.0203., showing a rapid decay with .(For an explanation of the normalization factor 2 −N , see subsection V C.) The values for the individual disorder realizations are shown in Table I.The calculation time for a single disorder realization is of the order of 15 seconds for = 2, 10 minutes for = 4 and 4 days for = 8 on a single CPU.We also computed the mean variance of the approximate eigenstates (the figure of merit used in Ref. 48), ∆H 2 averaged over the different disorder realizations was ∆H 2 =2 = 0.2476, ∆H 2 =4 = 0.0405 and ∆H 2 =8 = 0.0035, decaying in a very similar way as the SCN.
We find that the SCN reflects reliably the accuracy of our approximation method and thus, captures the extent to which the Hamiltonian is diagonalized by the optimal unitary matrix Ũ .Therefore, for larger systems, where exact diagonalization is unavailable, we can use the SCN in order to assess the quality of the approximation by our tensor network.
As a further corroboration, we computed the overlaps between the exact eigenstates and the approximate ones for = 8.The overlaps are in general very high, see Fig. 6(a): More than 99 % of them have more than 60 % overlap, with a strong peak close to an overlap of 1, which is an extremely high accuracy given that the Hilbert space dimension is 2 16 = 65, 536.To show that the local properties of the eigenstates also match to high degree of accuracy, we compare the distribution over all sites of the expectation value of σ z i evaluated in all the eigenstates.In Fig. 6b the distributions resulting from exact diagonalization and the spectral tensor network overlap to a remarkable precision, showing that the method has indeed converged to the eigenstates with the appropriate local features.
For comparison, we also optimized the unitaries using the ansatz in Fig. 1 with four layers for the same 10 disorder realizations.This corresponds to the scheme proposed in Ref. [48], extending their explicit numerical study of a network of two layers to four layers (and also using our figure of merit rather than theirs to reduce computational time).We initialized the unitaries in one series of calculations as identities and in another one with the optimized two-layer result (choosing the remaining unitaries as identities) and took the best 4-layer result in each individual case.The histogram of the error in energy is shown in Fig. 7.The U (1) symmetry and the fact that the unitaries are real (orthogonal matrices) imply that there is only one variational parameter per unitary.We find f /2 N = 0.746 and thus little improvement compared to the two-layer case (Fig. 5a).The computation time per disorder realization is of the order of 20 minutes.
We also carried out such an optimization without imposing any symmetry on the unitaries, i.e., they are parameterized by an arbitrary Hermitian 4×4 matrix H x,y , u x,y = e iHx,y with 16 variational parameters per unitary.We found that the figure of merit hardly improves over its initial value, using a standard quasi-Newtonian minimization (and minimizing four or more connected uni- taries forming a column at once).This is due to the minimization algorithm being stuck in (bad) local minima, which appears to be another problem of the multilayer approach.We managed to overcome this obstacle by the following procedure: At each step of the sweep, several completely random initial choices for the parameters of the unitaries that are being varied are optimized individually (along with their original parameters) and ranked against each other, one is able to escape these local minima and obtain much better results.By carrying out 32 of such minimizations at each sweep step, we obtained f /2 N = 0.388, which is a significant improvement over the two-layer case.However, the four layer calculations without imposed symmetries are much more computationally intense, requiring of the order of 1 week per disorder realization.This is of the same order of magnitude as the = 8 calculations with two layers (note f =8 /2 N = 0.0203), while yielding worse accuracies than for = 4 (where f =4 /2 N = 0.194)!Hence, the potential landscape of the multi-layer ansatz, of which the algorithm has to find the global minimum, is tormented by many local minima of poor quality.As expected, we obtain much better numerical results if computational resources are invested into "longer" unitaries as compared to increasing the number of layers.For a comparison of the figure of merit and the variance of the individual disorder realizations, see Table .I.
In summary, despite our efforts, we did not succeed in making the multi-layer = 2 network calculation as accurate or computationally efficient as our multi-leg ansatz.Thus, the numerical results corroborate our analytic arguments that the FMBL system cannot be approximated as efficiently and accurately by increasing the number of layers compared to increasing the number of legs per unitary.We also note that on relaxing the restriction of the unitaries to be real and U(1)-symmetric for four layers, a significant improvement is achieved, as the additional parameters can partially compensate for the lack of parameters that we conjectured.

C. Scaling with the system size
One of the primary objectives of this work is to establish our ansatz for the description of fully many-body localized systems.For a given point in the MBL phase, increasing the system size does not require an increase in to approximate the local properties of eigenstates with a constant accuracy (averaged over disorder realizations).Therefore, the SCN should detect a constant mismatch per lattice site and thus, increase linearly with the system size.However, recall that it is defined as a trace of an operator in the 2 N -dimensional Hilbert space, i.e. a mismatch that is not affected by the sites far away is multiplied by the trace over the identity operator corresponding to them, which grows as 2 N .As a result, the SCN averaged over many disorder configurations should ).The = 4 ansatz performs better for all disorder realizations than the multi-layer approach.Bottom: Comparison of the variance ∆H (Eq.( 9)) for the same categories (apart from the asymmetric 4-layer case, where we were not able to compute the variance due to memory constraints).We also gather that the values of the variance and the SCN are very closely related.
grow as N 2 N .We corroborated this by optimizing our tensor network ansatz for = 2, 4, 6 and 8, and system sizes in the range between N = 12 and 72 for 100 disorder realizations for l = 2, 4, 6 and 10 for l = 8, as shown in Fig. 8.We gather that on average f /2 N indeed increases linearly with system size for all choices of .
The results in subsection V B show that the MBL eigenstates are well-represented by the optimized tensor network, i.e., by minimizing the SCN we obtain an overall unitary matrix Ũ that approximately diagonalizes the Hamiltonian to a very high accuracy for = 8.The linear dependence on N of the optimized SCN (divided by the dimension of the Hilbert space (2 N )) suggests that, in the localized region, expectation values of local observables in any eigenstate can be approximated with an error that depends only on and not on the size of the system.Hence, our method is able to approximate local properties of eigenstates for large system sizes, where exact diagonalization is not available.We note that while the full set of eigenstates is encoded approximately in our tensor network, computing the eigenspectrum from it would be exponentially hard, but this is not required in practical calculations.

D. Scaling with the block length
The disorder dependence of the figure of merit (scaled by 2 N ) is shown in Fig. 9 for the different values of at N = 72.The improvement of the accuracy with increasing can be gathered from Fig. 10: Deep in the localized phase the relevant quantity decays almost by an order of magnitude from = 2 to = 4 and again from = 4 to = 6.For = 8 the improvement is slightly less, presumably because our algorithm tends to get stuck in local minima.(Note that the number of parameters per unitary is 6307 for = 8.) Nevertheless, we get by far the most accurate results for = 8.As can be gathered from Fig. 10, the results point to a polynomial decay with in the delocalized phase (W < 3), whereas in the localized phase (W > 6) the decay with is exponential, as

expected.
Besides the fact that the approximation becomes worse as one approaches the transition, the SCN does not show any signature of the phase transition.In the following subsection, we investigate in more detail the accuracy of the eigenstates in the weakly disordered regime and the effects of the approaching phase transition into the thermal phase.

E. Approaching the many-body localization transition
Deep in the MBL phase, existence of qLIOMs makes it amenable to approximate the eigenstates of large systems with very high accuracy using TNs.As we approach the transition into the thermal phase at weaker disorder, the eigenstates become more entangled.The ansatz with a larger number of legs is able to capture the regions of high local entanglement, allowing the method to perform appreciably well even close to the phase transition.In Fig. 11 the distributions of the local observable σ z i evaluated in all the eigenstates of an N = 12 system for 100 disorder realizations, at disorder strengths W = 2, 3, 4 and 6 are shown for = 6.The average optimized SCN figure of merit f =6 /2 N was 0.134, 0.078, 0.054 and 0.023, respectively.
The distribution evaluated using the approximate eigenstates from the tensor network ansatz matches remarkably well with the exact diagonalization results in FIG.10.Optimized SCN averaged over disorder realizations as in Fig. 9 shown as a function of for various disorder strengths W .The error bars denote the error of the mean calculated from the distribution over disorder realizations.The inset shows the same data on a log-log plot (with symbol size at least as big as the error bars).For W = 8, 16 the decay of the SCN is approximately exponential for = 2, 4, 6 but deviates for = 8, probably because the algorithm tends to get stuck in local minima.For W = 0.5, 2 the decay with is to a good approximation an inverse power law (exponents −1.3 and −1.9, respectively).the vicinity of the MBL transition.The comparison with the data from exact diagonalization is good even at disorder strength W = 2, which is expected to be on the 'thermal' side of the phase transition.However, we cannot expect this to be the case if N is increased, as opposed to the localized phase, where local observables can be reproduced with a constant accuracy for fixed .Instead, would need to be scaled with N to keep the accuracy fixed 48 .In this regime the eigenstates from the tensor network ansatz have larger weight in the distribution at σ z i ≈ ±1, which suggests that the ansatz does not fully capture the local features of the eigenstates.The finite number of legs in the local unitaries of our tensor network ansatz enforces the qLIOMs to be always approximately conserved, but strictly local.Thus, our TN ansatz cannot resolve whether there are exactly conserved qLIOMs in the vicinity of the phase transition.
We finally turn to an extremely sensitive test of the approximate method's ability to reproduce subtle details of the MBL system.We evaluate the fluctuation in the half-cut entanglement entropy σ S (the standard deviation over disorder realizations of the entanglement entropy), where the entropy was averaged over all approximate eigenstates for = 2, 4, 6 and N = 12 for the TN and exact diagonalization.We performed these calculations over a wide range of disorder strengths.The quantity was evaluated using 100 disorder realizations.In exact diagonalization studies, this quantity has a peak at the MBL-ETH transition which is expected to diverge with system size 36 .Although our ansatz cannot represent any volume-law entangled states, it is expected to capture the entanglement structure at length scales of FIG. 13.Standard deviation of the eigenstate-averaged halfcut entanglement entropy, followed by an average over entanglement cuts, σS cuts (defined in the text), as a function of disorder strength.The system size is N = 72 and the calculations were performed on 100 disorder realizations using the TN with = 2, 4, 6 and 10 disorder realizations using = 8.For = 6 and = 8, the entanglement entropy was sampled over at least 1% of all eigenstates leading to an estimated relative error of about 5% (marked by error bars) of the mean and standard deviation of the entropies.The inset shows the corresponding eigenstate-averaged entanglement entropy, followed by an average over entanglement cuts ( S cuts).The errors in the inset are at most as big as the size of the symbols.
order 2 .In Fig. 12 we indeed see a broad peak close to the value of disorder strength where exact diagonalization gives a relatively narrow peak.As expected, at strong disorder the exact diagonalization and the TN (for = 4, 6) tend towards the same value.We also calculated the entanglement entropy averaged over the approximate eigenstates as given by our tensor network for system size N = 72 as a function of disorder strength W .For a specific disorder realization, the computational cost to calculate such an entropy is independent of N and only depends on : This is due to the fact the partial trace of | ψi1...i N ψi1...i N | gives rise to a reduced density matrix whose non-zero eigenvalues are the same as the ones of a reduced density matrix defined only on the sites which are at most one tensor block away from the entanglement cut, cf.Appendix C.This makes it possible to average over all eigenstates efficiently.
In Fig. 13 we show the statistical mean and standard deviation over 100 (10) disorder configurations of the eigenstate-averaged entanglement entropy as a function of W for = 2, 4, 6 ( = 8).The curves of σ S cuts and S cuts shown are obtained after averaging over different entanglement cuts to improve smoothness.The positions for the respective entanglement cuts have been chosen such that they are at least one tensor block away from the boundaries of the system.We observe maxima in the region 2.5 ≤ W ≤ 3.5.Averaging over entanglement cuts combined with the increased decay of σ S cuts at larger disorder strength makes the peaks much more pronounced than the N = 12 case.
In the insets of Figs. 12 and 13, the average entanglement entropy ( S) increases as the disorder strength goes down but for N = 12 this increase is still slower compared to the exact eigenstates.In the quantum critical regime, there are suggestions that the transition is driven by a subcluster of spins which are weakly entangled 56 .With increasing system size, on the thermal side of the transition the size and the entanglement of the subcluster grows with N , while on the localized side of the quantum critical regime, the entanglement remains small.By varying and N in our TN ansatz, it may be feasible to access this regime numerically which is a question suitable for future work.

VI. CONCLUSIONS AND OUTLOOK
In this work we have made several significant advances in efficiently representing the entire set of eigenstates of fully many-body localized systems.Besides improving upon the tensor network ansatz proposed in Ref. [48], we also optimize the network by minimizing a different figure of merit (the SCN) given by the magnitude of the commutator of the Hamiltonian and the approximate qLIOMs produced by the tensor network ansatz.This figure of merit can be evaluated by decomposing into strictly local terms leading to a much better scaling than the previous figure of merit (cf.Fig. 4).
We have extended the 2-leg, multi-layer tensor network ansatz for FMBL systems 48 to unitaries with several legs while keeping the number of layers fixed at two.We have shown that compared to increasing the number of layers, the extension to multiple legs ( -legs) is far more computationally efficient -obtaining (exponentially) higher accuracies for the same system size and computational cost.
By comparing the energies and eigenstates evaluated using a TN to exact diagonalization for a chain of 16 sites, we demonstrated that our figure of merit (SCN) reflects the accuracy of our method.In the regime where the figure of merit is small, the energy eigenvalues from the TN and exact diagonalization match extremely well.Furthermore, the distribution of expectation values of local observables in the eigenstates also matches very well with the exact diagonalization calculation.Therefore, this method is able to represent all eigenstates simultaneously to a very high degree of accuracy.
We observed that the SCN (normalized by 2 N ) increases linearly with the system size.This shows that our method only incurs a constant error per lattice site, i.e. on implementing our scheme to larger systems, for fixed , local observables can be calculated with a sizeindependent accuracy.Hence, our approximation can be readily used for large system sizes.
In the strongly disordered regime, the error as measured by the SCN decreases exponentially with the number of legs per unitary.At weaker disorder, on approaching the MBL-ETH eigenstate transition, the local properties of the eigenstates are well approximated even close to the transition.For a large system of N = 72 sites, we observed peaks in the eigenstate-averaged fluctuation of the entanglement entropy for = 6, 8 at disorder strengths which are slightly lower than the critical disorder strength W c ≈ 3.5, which is predicted to be the critical point using exact diagonalization.This might indicate that exact diagonalization mildly overestimates the critical disorder strength due to finite size effects.
The accurate construction of all eigenstates in the FMBL phase and in the vicinity of the MBL-ETH transition for large system sizes opens the door to study several fascinating phenomena associated with the subject.As a by-product of the procedure, using our optimized unitary one directly obtains the approximate qLIOM operators in the localized phase.The ability to vary study eigenstates in the vicinity of the MBL-ETH transition suggests that our procedure may be able to capture some of the scaling properties on the localized side of the quantum critical regime of the transition.Given the efficiency of the method, it may be feasible to scale the procedure to numerically address the question of manybody localization in two dimensions.Since, MBL of Floquet systems have a structure similar to that of static Hamiltonians, our method might be generalized to study the spectrum of Floquet systems exhibiting MBL as well.side, we contract the local tensor network of Fig. 4 as shown in Fig. 15 and cut out the tensor the derivative is taken of at the very top or the very bottom, respectively, before taking the overall trace indicated by wiggly lines.The contraction is again most efficiently carried out using the same blocking as in Fig. 14 (where the block from which the unitary has been taken has to be modified).Since we cut out a tensor with lower and upper legs, the result of the contraction is a 2 × 2 matrix, say M .This matrix can be used to obtain both f x by putting back the missing tensor, f x (u x,1 , u x−1 , u x,2 ) = tr(M u x,y ), and the desired deriva- ) without the need for any additional contractions.and given by S = −tr(ρ ln(ρ)).ρ corresponds to the tensor network contraction shown in Fig. 16a and can be simplified by replacing unitaries which are contracted with their adjoints by identities.We obtain a representation for ρ in terms of the unitaries {u 1,1 , u

FIG. 1 .
FIG. 1. Tensor network Ũ as proposed in Ref.48 with n layers of 4 × 4 unitaries {ux,y}.Since the unitary Ũ is supposed to approximately diagonalize the Hamiltonian, the corresponding approximate eigenstates | ψi 1 ,...,i N are the states obtained by fixing the lower open indices in the figure to be i1, . . ., iN .

FIG. 2 .
FIG. 2. Construction of the unitary Ũ in terms of unitaries ux,1, ux,2 acting on sites (in this example = 4).Again, the approximate eigenstates | ψi 1 ,...,i N are the states obtained by fixing the lower open indices in the figure to be i1, . . ., iN .

FIG. 3 .
FIG.3.Sum of tensor network contractions which yields the second term, N i=1 tr (H τ z i ) 2 , in Eq. (5).The multiplications from left to right in Eq. (6) correspond to top to bottom in the figure.The indices of the lower wiggly lines are to be contracted with those of the corresponding upper wiggly lines.For a given position i of the σ z operator and arbitrary positions j, k of the two-body Hamiltonian terms, all unitaries of the lower layer (i.e., un,1) apart from the ones directly connected to the σ z operators cancel with their adjoints and can be replaced by identities (i.e., straight vertical lines).In the example in the figure this corresponds to all unitaries un,1 for n = x.Furthermore, all unitaries of the second layer (un,2) which are not directly connected to the remaining ones of the first layer cancel and can be substituted by identities.This implies that the x-th summand in Eq. (6) depends only on the unitaries ux+1,1, ux,2, ux+1,2.The contraction corresponding to this term is shown in Fig.4.

FIG. 4 .
FIG. 4. Decomposition of the figure of merit (5) into local terms resulting in Eq. (6).Again, the indices of the lower wiggly lines are to be contracted with those of the corresponding upper wiggly lines.The shown tensor network is obtained after replacing mutually cancelling unitaries in Fig.3by identities (vertical lines).Those forming closed loops yield a factor of 2 each, which results in the prefactor 2 −N +2 +2 of fx shown in the figure.Terms j, k where hj or h k are not connected to ux−1,2 or ux,2 yield contributions which are independent of all unitaries ux,y and can thus be neglected in the local definition of our figure of merit.Note that the precise positions of σ z i , hj and h k depend on the indices i, j, k that are being summed over, and thus the graphic depicts one example configuration.For details of our contraction scheme, see Appendix A.

FIG. 5 .
FIG. 5. Comparison of the optimized tensor network Ũ for = 2 (a), = 4 (b) and = 8 (c) with exact diagonalization for N = 16 and ten disorder realizations at W = 6.The energy differences ∆E were obtained by ordering the diagonal elements of Ũ † H Ũ in each spin-z sector and subtracting them from the ordered exact eigenvalues of the Hamiltonian H in the corresponding spin-z sector.The plots show the concatenation of the data of all spin-z sectors and disorder realizations.The optimized SCN figure of merit was on average f =2 /2 N = 1.011, f =4 /2 N = 0.194, f =8 /2 N = 0.0203.

FIG. 6 .
FIG. 6.Comparison of the optimized tensor network Ũ for = 8 with exact diagonalization for N = 16 showing the data for 10 disorder realizations at W = 6.The approximate eigenstates are given by the columns of Ũ .(a) Distribution of the overlap of the exact and the matched eigenstates.(b) Distribution of the expectation value of σ z i over the sites and the eigenstates obtained from exact diagonalization (light brown) and the TN (blue).

FIG. 8 .
FIG.8.Scaling of the SCN figure of merit as a function of system size N for W = 6 and 100 different disorder realizations optimized using unitaries of block sizes = 2, 4, 6 and ten disorder realizations for = 8.For given N , the same hundred (ten) disorder realizations were taken for all values of .As discussed in the main text, we expect f /2 N ∝ N , which is consistent with the numerical results.The error bars denote the error of the mean calculated from the distribution over disorder realizations.The inset shows an enlargement of the data for = 6 and = 8.There, the symbol size is at least the size of the error bars.

FIG. 9 .
FIG. 9. Figure of merit, SCN, for N = 72 as a function of disorder strength W for = 2, 4, 6, 8.The same hundred (ten) disorder configurations were taken for = 2, 4, 6 ( = 8) and all choices of W , adjusting only the overall prefactor of the random magnetic fields.The error bars denote the error of the mean calculated from the distribution over disorder realizations.

FIG. 11 .
FIG. 11.Comparison of the eigenstates from the optimized TN Ũ for = 6 and exact eigenstates from exact diagonalization for N = 12 and 100 disorder realization at disorder strengths (a) W = 2, (b) W = 3, (c) W = 4 and (d) W = 6.We present the distribution of the expectation value of σ z i over the sites, eigenstates and disorder realizations from exact diagonalization (blue) and the TN (light brown).

FIG. 12 .
FIG.12.Standard deviation of the half-cut entanglement entropy (σS) for 100 disorder realizations averaged over the (approximate) eigenstates as a function of disorder strength for the TN with = 2, 4, 6, and exact diagonalization.The system size is N = 12.The inset shows the average entanglement entropy ( S) for the four cases.

FIG. 15 .
FIG.15.The shown tensor network contraction results in 2 −N +2l+2 M , with the 2 × 2 matrix M described in the main text if the unitary of which the derivative is being taken, is cut out on the very top or very bottom (marked in red).The tensor network can be contracted with the same computational scaling as before ( 3 2 3 ) by using the same blocking as in Fig.14.The resulting blocks A (n) j , B (m) k etc. are contracted in such an order that the one which contains the unitary to be varied is contracted with last.E.g., if ux,1 is being varied (which is contained in the upper Qi), one contracts first A (n) j with A (m) k and the resulting tensor with the lower Qi and subsequently with the contraction of B (n) j and B (m) k .
Appendix C: Calculation of the entanglement entropy for large systems The von Neumann entropy of an approximate eigenstate | ψi1...i N for an entanglement cut through the middle of the system is defined via its reduced density matrix ρ = tr N/2+1,...,N | ψi1...i N ψi1...i N | (C1)