Hyper-optimized approximate contraction of tensor networks with arbitrary geometry

Tensor network contraction is central to problems ranging from many-body physics to computer science. We describe how to approximate tensor network contraction through bond compression on arbitrary graphs. In particular, we introduce a hyper-optimization over the compression and contraction strategy itself to minimize error and cost. We demonstrate that our protocol outperforms both hand-crafted contraction strategies in the literature as well as recently proposed general contraction algorithms on a variety of synthetic and physical problems on regular lattices and random regular graphs. We further showcase the power of the approach by demonstrating approximate contraction of tensor networks for frustrated three-dimensional lattice partition functions, for dimer counting on random regular graphs, and for accessing the hardness transition of random tensor network models, in graphs with many thousands of tensors.


I. INTRODUCTION
Tensor network contraction, a summation over a product of multi-dimensional quantities, is a common computational structure. For example, this computation underlies quantum circuit simulation [1-9], quantum many-body simulations [10-17], evaluating classical partition functions [18-25], decoding quantum error correcting codes [26-32], counting solutions of satisfiability problems [25, 33-39], statistical encoding of natural language [40-43], and many other applications. The cost of exact contraction scales, in general, exponentially with the number of tensors. However, there is evidence, for example in some many-body physics applications, that tensor networks of interest can often be approximately contracted with satisfactory and controllable accuracy, without necessarily incurring exponential cost [44, 45]. Many different approximation strategies for tensor network contraction have been proposed [12, 13, 17, 20, 23, 46-50]. Especially in many-body physics contexts, the approximate contraction algorithms are usually tied to the geometry of a structured lattice. In this work, we consider how to search for an optimal approximate tensor network contraction strategy, within an approach that can be used not only for structured lattices, but also for arbitrary graphs. We view the essential prescription as the order in which contractions and approximate compressions are performed: this sequence can be summarized as a computational tree with contraction and tensor bond compression steps. Within this framework, sketched in Fig. 1, the problem reduces to optimizing a cost function over such computational trees: we term the macro-optimization over trees "hyper-optimization". As we will demonstrate in several examples, optimizing a simple cost function related to the memory or computational cost of the contraction also leads to an approximate contraction tree with small contraction error. Consequently, our hyper-optimized approximate contraction enables the efficient and accurate simulation of a wide range of graphs encountered in different tasks, bringing the possibility of eliminating, or otherwise improving on, formal exponential costs. In addition, in the structured lattices arising in many-body physics simulations, we observe that we can improve on the best physically motivated approximate contraction schemes in the literature.
FIG. 1. Overview. The approximate contraction is specified by a sequence of contractions and compressions, expressed as an ordered tree. The strategy optimizes a cost function over such trees. A: The hyper-optimization loop. Approximate contraction trees Υ on the graph G are suggested by the tree generator. The tree characteristics are controlled by heuristic parameters θ and maximum bond dimension χ. The hyper-optimizer minimizes a cost function M or C (peak memory or computational cost). B: The approximate contraction tree. The tensor network T is shown at the bottom. Moving upwards, pairs of tensors are contracted (blue lines), and singular value compressions are performed between tensors (orange lines). By the top of the tree, one obtains a scalar output Z, using resources ∼ M or C.

FIG. 2. A: Pairwise exact contraction of a tensor network, with the unordered contraction tree Υ_unordered indicating the contractions. Each intermediate (green node) corresponds to a pair of parentheses in the expression. B: An approximate contraction tree Υ for the same network. Since compression steps do not commute, this tree is ordered. Here, the compressions (orange lines) take place at steps 4 and 6. C: The sequence of contractions and compressions associated with the tree in B. Newly contracted tensors are in green, tensors with compressed bonds in orange.

A tensor T is a multi-index quantity (i.e. a multi-dimensional array). We use lower indices to index into the tensor, e.g. T_{i_1 i_2 ... i_n} is an element of an n-index T, and upper indices to label a specific tensor, e.g. T^[1], T^[2], ..., out of a set of tensors. A tensor network contraction sums (contracts) over the (possibly shared) indices of a product of tensors,

Z_{{e_out}} = Σ_{{e}\{e_out}} Π_v T^[v]_{{e_v}},   (1)

where {e} is the total set of indices, {e_out} is the subset that is left uncontracted, and {e_v} is the subset of {e} for tensor T^[v]. We can place the tensors at the vertices v of a network (graph), with the bonds (edges) corresponding to the indices {e}. An example of a contraction is shown in Fig. 2A.
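To make Eq. (1) concrete, here is a minimal numpy sketch of contracting a tiny tensor network (a three-tensor ring) to a scalar; the tensors and dimensions are illustrative only.

```python
import numpy as np

# Three random tensors on a ring, sharing indices i, j, k; Eq. (1)
# with {e_out} empty, so the contraction yields a scalar Z.
D = 4
T1 = np.random.rand(D, D)  # indices (i, j)
T2 = np.random.rand(D, D)  # indices (j, k)
T3 = np.random.rand(D, D)  # indices (k, i)

# Z = sum_{i,j,k} T1_{ij} T2_{jk} T3_{ki}
Z = np.einsum('ij,jk,ki->', T1, T2, T3)
```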
In practice the sum in Eq. (1) is performed as a sequence of pairwise contractions, and the order of contraction greatly affects both the memory and time costs. Much recent work has been devoted to optimizing contraction paths in the context of simulating quantum circuits [6-9]. Parametrized heuristics that efficiently sample the space of contraction paths, for example by graph partitioning, are crucial, and optimizing the parameters of such heuristics (hyper-optimization) to minimize the overall cost has proven particularly powerful, leading to dramatic reductions in contraction cost (i.e. many orders of magnitude).
Here we extend the ideas of hyper-optimized tensor network contraction to the setting of approximate tensor network contraction. As discussed above, approximate contraction has a long history in many-body simulation, but such work has focused on regular lattices. Although several recent contributions have addressed arbitrary graphs [30, 51, 52] with a fixed contraction strategy, they do not focus on optimizing the strategy itself. In part, this is because there is a great deal of flexibility (and thus many components to optimize) when formulating an approximate contraction algorithm, and because an easily computable metric of quality is not clear a priori.
We proceed by first formulating the search space of approximate tensor network contraction algorithms, which we identify as a search over approximate contraction trees. To reduce the search space, we define simple procedures for gauging bonds and for deciding when to compress them within the tree. We then discuss how to sample the large space of trees, by optimizing the hyperparameters of a contraction tree generator with respect to the peak memory or computational cost. We use numerical experiments to establish the success of the strategy, comparing to existing algorithms designed for structured lattices and for arbitrary graphs. Finally, using the hyper-optimized approximate contraction algorithms, we showcase the range of computational problems that can be addressed, in many-body physics, computer science, and complexity theory, illustrating the power of approximate tensor network computation.

A. Components of approximate contraction
In an exact tensor network contraction, the computational graph, specified by the sequence of pairs of tensors which are contracted, can be illustrated as a computational contraction tree. This is illustrated in Fig. 2A, where the tensor network is shown by the black lattice at the bottom, and the contractions between pairs occur at the green dots in the tree, Υ_unordered. Note that the value and cost of the exact tensor network contraction do not depend on the order in which the contractions are performed [53], thus the contraction tree is unordered. The problem of optimizing the cost of exact contraction is thus a search over contraction trees to optimize the floating point and/or memory costs.
In the process of contracting tensors, one generally creates larger tensors, which share more bonds with their neighbors. In approximate contraction we aim to reduce the cost of exact contraction by introducing an approximation error. The most commonly employed approximation is to compress the large tensors into smaller tensors (with fewer or smaller indices); this is the type of numerical approximation that we also consider here. The simplest notion of compression arises in matrix contraction: e.g. given two D × D matrices A, B, the contraction AB ≈ ĀB̄, where Ā is of dimension D × χ and B̄ is of dimension χ × D, and the approximation is an example of a low-rank matrix factorization. The singular value decomposition (SVD) is an optimal (with respect to the Frobenius norm) low-rank matrix factorization. Singular value decomposition is also at the heart of compressing tensor network bonds. For example, if we have two tensors connected by bonds (Fig. 3A), we can view the bonds as performing a matrix contraction (Fig. 3B), and use SVD to replace the connecting bonds by one of dimension χ (Fig. 3D). In the general tensor network setting, however, things are more complicated, because when compressing a contraction between two tensors, one should consider the other tensors in the network, which affect the approximation error. The effect of the surrounding tensors on the compression of a given bond is commonly known as including the "environment" or "gauge" into the compression. We consider how to perform bond compression, including a simple way to include environment effects into the bond compression for general graphs, in Sec. II C.
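As a minimal illustration of the matrix case, the following numpy sketch compresses the product of two D × D matrices to inner dimension χ via a truncated SVD (here the product is formed explicitly for clarity; Sec. II C describes how QR reduction avoids this):

```python
import numpy as np

D, chi = 64, 8
A, B = np.random.rand(D, D), np.random.rand(D, D)

# truncated SVD of the product: optimal rank-chi approximation
# of AB in the Frobenius norm
U, s, Vh = np.linalg.svd(A @ B)
A_bar = U[:, :chi] * np.sqrt(s[:chi])            # D x chi
B_bar = np.sqrt(s[:chi])[:, None] * Vh[:chi]     # chi x D

err = np.linalg.norm(A @ B - A_bar @ B_bar)      # truncation error
```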
Given a compression method, we view the approximate tensor network contraction as composed of a sequence of contraction and compression steps. Compressions do not commute with contractions (or each other), thus a contraction tree with compression (an approximate contraction tree) is an ordered tree. An example tree Υ is shown in Fig. 2B, where in addition to the contraction operations (the green dots), we see compressions of bonds between tensors (the orange lines). The ordered sequence of contractions and compressions is visualized in Fig. 2C. If we work in the setting where the compressed bond dimension χ is specified at the start, then once the approximate contraction tree is written down, the memory or computational cost of the contractions and compressions can be computed. Optimizing the approximate contraction for such costs thus corresponds to optimizing over the space of approximate contraction trees.
The space of trees to optimize over is extremely large. We tackle this in two ways: by defining the position of compression steps in the tree entirely in terms of where the contractions take place (discussed in Sec. II D), which means we only need to optimize over the order of the contractions; and by using the hyper-optimization strategy, where families of trees are parameterized by a small set of heuristic parameters, constituting a reduced-dimensionality search space (described in Secs. II G and II H).
Ideally we wish to minimize the error of the approximate contraction as well as the cost, but the error is not known a priori. It can only be examined by benchmarking the errors of our hyper-optimized contraction trees; this is the subject of Sec. III.
Note that other ingredients could also be included in an approximate tensor network contraction algorithm, for example, the use of factorization to rewire a tensor network, generating a graph with more vertices [20, 52]; or inserting unitary disentanglers to reduce the compression error [54, 55]. We do not currently consider these ingredients in our algorithm search space, although the framework is sufficiently flexible to include such ingredients in the future. We also note that some of these additional ingredients are targeted at the renormalization of loop correlations in tensor networks to yield a proper RG flow [56]. We discuss the corner double line (CDL) model and show that it is accurately contractible with our approximate strategy in the SI [57].
To aid in our discussion of the ingredients of the approximate contraction algorithm, and how to examine our choices, we will use a set of standard benchmark models, which we now discuss.

B. Models for testing
To assess our algorithmic choices, we will consider two families of lattices and two tensor models. (Note that these are only the tensor networks we use for testing the algorithm; Sec. IV further considers other models to demonstrate the power of the final protocol.) The two types of lattices we consider are (i) the 2D square and 3D cubic lattices, which reflect the structured lattices commonly found in many-body physics applications, and (ii) 3-regular random graphs (graphs with random connections between vertices, where each vertex has degree 3). On these lattices, the two types of tensors we consider are (i) (uniaxial) Ising model tensors, at inverse temperature β close to the critical point, and (ii) tensors with random entries drawn uniformly from the distribution [λ, 1] (we refer to this as the URand model). Changing λ allows us to tune between positive tensor network contractions and tensors with random signs, the latter case being reminiscent of some random circuit tensor networks. In all models, the dimension of the tensor indices of the initial tensor network will be denoted D, while the dimension of compressed bonds will be denoted χ; we refer to the value of the tensor network contraction as Z, and the free energy per site f = −ln Z/(βN), where N is the number of spins. More discussion of these models (as well as a treatment of corner double line models [56]) is in the SI [57].

FIG. 4. Overview of the tree gauge for improving bond compression accuracy, suitable for arbitrary local geometry. A: Given the bond e_AB connecting tensors A and B, we want to take into account information from the surrounding environment E, shown in (i). In (ii) we form a local spanning tree (shaded bonds) up to distance r = 2 from A and B. If we consider 'loop' bonds (colored orange) that are not part of the spanning tree as cut, then the resulting local tree environment shown in (iii) can be optimally compressed as a proxy target. B: Depiction of the gauging process for a local tree. In (i) tensors at distance r = 2 from A and B are QR decomposed, and the R factors (yellow circles) are accumulated towards the bond e_AB, see Fig. 3E. In (ii) the same happens for r = 1 tensors, and finally in (iii) the R factors from A and B after accumulating all the outer gauges, R_A and R_B, are contracted to form R_AB. C: The Frobenius norm (squared) of an r = 1 local region in the tree gauge is shown in (i). The norm of the local network with loop bonds (orange) cut is shown in (ii), which is exactly encoded, due to the isometric tensors, as Tr(R†_AB R_AB). Performing a truncated SVD on R_AB is thus only r-locally optimal, up to the presence of such loop bonds.

C. Bond compression strategies
We first define how to compress the shared bonds {e_AB} between tensors A, B. We can matricize these by grouping the indices as {e_A}\{e_AB}, {e_AB}, and {e_B}\{e_AB}, with effective dimensions D_A, D_AB and D_B respectively (see Fig. 3A). Generally D_AB < D_A and D_AB < D_B, so AB is already low-rank and we can avoid forming it fully. Instead we perform QR decompositions of the matricized A, B, giving

A = Q_A R_A,   B = R_B Q_B,   (2)

where the Q matrices satisfy the canonical conditions

Q_A† Q_A = 1,   Q_B Q_B† = 1,   (3)

with the canonical direction indicated by an arrow in the graphical notation shown in Fig. 3C (detailed in the SI [57]). Then, performing the truncated SVD

R_AB = R_A R_B = UσV† ≈ U_χ σ_χ V_χ†,   Ā = Q_A U_χ σ_χ^{1/2},   B̄ = σ_χ^{1/2} V_χ† Q_B,   (4)

we obtain the compressed Ā, B̄ by truncating to the χ maximal singular values in σ. Because of the canonical nature of the Q matrices, truncating the SVD of R_AB achieves an optimal compression in the matrix Frobenius norm of AB.

Usually T will contain additional tensors connected to A, B. We refer to this additional network of connected tensors as the environment E, with

T = Σ_{{e}\{e_out}} A_{{e_A}} B_{{e_B}} E_{{e_E}}

(Fig. 4A). To compress the bond e_AB optimally, we must account for E. We first consider the case when E forms a tree around the bond e_AB (Fig. 4A(iii)). Then, we can perform QR decompositions inwards from the leaves of the tree, pushing the R factors towards the bond (Fig. 4B(i)-(iii)). This is a type of gauging of the tensor network (i.e. it changes the tensors but does not change the contraction T), and we refer to this as setting the bond e_AB in the tree gauge; alternatively, we can say the tensors in the tree are in the canonical form centered around bond e_AB. This results in a similar matricized R_AB, one which has accumulated the products of R factors from all tensors to the left and right of bond e_AB (Fig. 4B(iii)). Then, the truncated SVD of R_AB in (4) similarly achieves an optimal compression of e_AB with respect to error in T.
More generally, T may contain loops, which extend into the environment (Fig. 4A), and a similarly optimal gauge is hard to compute [58, 59]. However, by cutting loops in the environment E (i.e. not contracting some of the bonds in the loops) we obtain a tree of tensors around bond e_AB, e.g. a spanning tree out to a given distance r. (There are multiple ways to cut bonds to obtain a spanning tree; the specific spanning tree construction heuristic is given in the SI [57].) Placing e_AB in the tree gauge (of distance r), we can then perform the same compression by truncated SVD, but without the guarantee of optimality, since we are neglecting loop correlations, see Fig. 4C. However, this type of tree gauge compression is easy to use in the general graph setting, and thus will be the main bond compression scheme explored in this work.
One can show [57, 60-62] that performing the truncation in Eq. (4) is equivalent to inserting the projectors

P_A = R_B V_χ σ_χ^{−1/2},   P_B = σ_χ^{−1/2} U_χ† R_A

on the bond e_AB. As such, having computed R_A and R_B from the local spanning trees, we can form and contract P_A and P_B directly into our original tensor network without affecting any tensors other than A and B, but still include information from distance r away. In other words, the steps depicted in Fig. 4 are performed virtually, which avoids having to reset the gauge after compression.
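The following numpy sketch illustrates this compression primitive under our notational assumptions (A matricized as D_A × D_AB, B as D_AB × D_B): QR-reduce both sides, SVD the small product of reduced factors, and return the projectors to be inserted on the bond. It is a minimal illustration, not the paper's production routine.

```python
import numpy as np

def compress_bond(A, B, chi):
    """Return projectors P_A (D_AB x chi), P_B (chi x D_AB) such that
    A @ P_A @ P_B @ B approximates A @ B, without ever forming A @ B."""
    Q_A, R_A = np.linalg.qr(A)        # A = Q_A R_A
    Q_Bt, R_Bt = np.linalg.qr(B.T)
    R_B = R_Bt.T                      # B = R_B Q_B with Q_B = Q_Bt.T

    # truncated SVD of the small product of reduced factors
    U, s, Vh = np.linalg.svd(R_A @ R_B)
    U, s, Vh = U[:, :chi], s[:chi], Vh[:chi]

    s_isqrt = 1.0 / np.sqrt(np.clip(s, 1e-16, None))  # guard tiny sigmas
    P_A = (R_B @ Vh.conj().T) * s_isqrt               # = R_B V sigma^-1/2
    P_B = (s_isqrt[:, None] * U.conj().T) @ R_A       # = sigma^-1/2 U+ R_A
    return P_A, P_B
```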

D. Early versus late compression
In practice, compression must be performed many times during a tensor network contraction. It might seem natural to perform compression immediately after two tensors are contracted to form a tensor larger than some size threshold, here given by a maximum bond dimension χ (early compression). This is illustrated in Fig. 5A. However, as discussed above, including information from the environment is important for the quality of compression. Early compression means that tensors in the environment are already compressed, decreasing the quality of the environment information. An alternative strategy is to compress a bond between tensors only when one of them (exceeding the size threshold) is about to be contracted (late compression), as illustrated in Fig. 5B. By delaying the compression, more bonds/tensors in the environment are left uncompressed, which can potentially improve the quality of the contraction. However, late compression will also increase the cost/memory of the contraction (as there are more large tensors to consider). This means that it is most efficient to use late compression when the associated gain in accuracy is large.
In Fig. 5C, we assess the effect of early versus late compression when contracting a 2D 16 × 16 lattice (D = 4, URand model with tensor entries ∈ [−0.5, 1]). All compressions are performed using the tree gauge (out to some distance r; several tree distances r are shown), and we show the relative error of the contraction ∆Z as a function of the maximum allowed bond dimension χ. We see in this case that late compression is more accurate than early compression, and that this improvement increases when using larger tree gauge distances, reflecting the fact that the gauging is incorporating more environment information. In Fig. 5D, we similarly compare early versus late compression using the tree gauge in a 3D lattice (using a hyper-optimized Span tree as described later). In contrast to the 2D result, here we see a smaller improvement from performing late versus early compression and from increasing the tree gauge distance. This suggests that incorporating the effect of the environment requires a more sophisticated gauging strategy in 3D. In general, we summarize our findings as follows: late compression is preferred when trying to maximize accuracy for a given bond dimension χ or size of the largest single tensor operation, while early compression can be better when optimizing total computational cost or memory for a given accuracy. In our subsequent calculations, we will indicate the choice of early or late compression in the simulations.

FIG. 6. A: Error in free energy, ∆f, as a function of bond dimension, χ, for different gauging and environment treatments for the 2D Ising model at the critical point. B: Contraction error, ∆Z, for the same settings but on a D = 4 URand model with λ = −0.5. All contractions proceed from the boundary row by row, thus all bond compressions are for bonds on the boundary. However, different gauging is performed before the compressions. None: no gauging. Tree: bonds are placed in the tree gauge up to distance r, followed by 'late' compression. Boundary: bonds are placed in the canonical form of the MPS boundary before compression. Full: the environment around tensors A, B is explicitly contracted using a counter-propagating MPS of the same bond dimension, and the bond between A and B is then truncated to minimize the error in Tr BEA. Note that since E is itself only approximate and many truncations are compounded, the overall error is not guaranteed to be smaller than that of another method, as seen for some small χ points here. C: Illustration of the different environments with respect to which a single compression step is optimal.

E. Comparison of the tree gauge to other gauges
To evaluate the quality of the tree gauge compared to other gauging/environment treatments in the literature, we consider contractions on a 2D lattice. To isolate the comparison to only the choice of gauge, we use the same approximate contraction tree as used in boundary contraction, namely contraction occurs row by row starting from the bottom, and compression occurs left to right after the entire row is contracted. We then use 4 different gauges/environment treatments during the compression: None, Tree, Boundary, and Full. None corresponds to no gauging. Tree is the tree gauge discussed above (up to distance r). Boundary corresponds to the standard MPS boundary gauging [44, 50], where, after the new row of tensors has been contracted into the boundary, the boundary MPS is canonicalized around the leftmost tensor and then compressed left to right in an MPS compression sweep (see Fig. 3 of the SI for an illustration). Full corresponds to explicitly computing the environment E by approximate contraction (using the standard MPS boundary contraction algorithm to contract rows from the top). Then, for the tensors A, B sharing the bond e_AB to be compressed, the scalar value of the tensor network is Z = Tr BEA (where A, B, E have been matricized). Using the eigenvector decomposition BEA = LσR†, where L, R are the left and right eigenvectors respectively, e_AB is optimally compressed by defining B̃ = L†B, Ã = AR, where L, R are the eigenvectors corresponding to the eigenvalues of largest absolute magnitude [22, 24]. Note that the full environment gauge is expensive, as it requires an estimate of E from all the tensors in the network.
The numerical performance of the different strategies is shown in Fig. 6 for two problems: a 32 × 32 lattice (2D Ising model, near critical) and a 16 × 16 lattice (D = 4, URand model with entries ∈ [−0.5, 1]). In all cases, we see that including some environment information is better than not including any environment ("None"). In the 2D Ising model, as the tree distance r increases, tree gauge compression converges in quality to the MPS boundary environment scheme ("Boundary"); the two are related, as the MPS boundary corresponds to setting an infinite tree distance r for a tree that grows only along the boundary. In the 2D URand model, even for small r, the tree gauge already improves on the boundary environment. The full environment treatment yields the best compression quality for larger χ, but this is achieved at larger cost.
Our numerical results in 2D suggest that the tree gauge is a reasonable compromise between accuracy and efficiency, equaling or outperforming the common boundary environment strategy, while being well-defined for more general graphs.

F. Approximate contraction algorithm
Given a choice of late or early compression, and using the tree gauge, we can explicitly write down a simple pseudocode version of the core approximate contraction function, Algorithm 1, which implements the sequence of contractions and compressions illustrated in Fig. 2C; each of its inner functions is detailed in the SI [57]. An alternative, that might be useful in some contexts, is to use the compression locations to transform a tensor network into an approximately equivalent but exactly contractible form, by inserting a set of explicit projectors; this is also detailed in the SI [57].
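A minimal Python sketch of such a routine is given below. It is not the paper's Algorithm 1 verbatim; the tensor network object `tn` and its helper methods (`bonds_incident`, `insert_projectors`, `contract_pair`, etc.) are hypothetical stand-ins for the operations described in the text.

```python
def approximate_contract(tn, tree, chi, late=True):
    """Execute an ordered contraction tree with compressions.

    tree: ordered list of pairwise contractions (i, j) -> i.
    late: compress oversized bonds just before a tensor is contracted
          ('late'), otherwise just after it is formed ('early').
    """
    for (i, j) in tree:
        if late:
            for bond in tn.bonds_incident(i) | tn.bonds_incident(j):
                if tn.bond_size(bond) > chi:
                    tn.insert_projectors(bond, chi)  # tree-gauge P_A, P_B
        tn.contract_pair(i, j)          # new intermediate stored at i
        if not late:
            for bond in tn.bonds_incident(i):
                if tn.bond_size(bond) > chi:
                    tn.insert_projectors(bond, chi)
    return tn.scalar()                  # the approximate value Z
```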

G. Generating contraction trees
After fixing the choice of early or late compression, the location of compressions in the contraction tree is determined purely by the contraction order. This is a major simplification because, when optimizing over approximate contraction trees, we need only optimize the order of contractions. Nonetheless, the space of ordered trees is still extremely large and hard to sample fully.
To simplify the search, we work within a lower-dimensional parameterization of the search space by introducing tree generators. These heuristics generate trees within three structural families we term Greedy, Span, and Agglom. The specific tree instance within each family is defined by a set of hyperparameters that can then be optimized. Here we describe the heuristic generators at a high level (with a more detailed description in the SI [57]). The input to the generators is only the tensor network graph, the bond sizes, and χ; the tensor entries are not considered.
The Greedy tree generator assigns a score to each bond in the TN. It then chooses the highest scoring bond, generates a new TN by simulating the bond contraction and compression (i.e. computing the new sizes and network structure), and repeats the process, building an ordered tree. The bond score is a combination of the tensor sizes before and after (simulated) compression and contraction, a measure of the centrality of the tensors (their average distance to every other tensor), and the subgraph size of each intermediate (i.e. how many tensors were contracted to make the current tensor); the hyperparameters are the linear weights of each component in the score.
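A hedged sketch of the Greedy generator follows; the `graph` helper methods are hypothetical stand-ins for the simulated size/structure bookkeeping described above, and w_size, w_cent, w_sub are the linear score weights exposed as hyperparameters.

```python
def greedy_tree(graph, chi, w_size, w_cent, w_sub):
    """Build an ordered contraction tree by repeatedly contracting
    the highest-scoring bond (structure only; no tensor entries)."""
    tree = []
    while graph.num_nodes() > 1:
        def score(bond):
            i, j = graph.ends(bond)
            return (w_size * graph.simulated_size_change(i, j, chi)
                    + w_cent * (graph.centrality(i) + graph.centrality(j))
                    + w_sub * graph.subgraph_size_after(i, j))
        best = max(graph.bonds(), key=score)
        i, j = graph.ends(best)
        graph.simulate_contract(i, j, chi)   # update sizes/structure only
        tree.append((i, j))
    return tree
```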
The Span tree generator is inspired by boundary contraction. It generates a directed spanning tree of the original graph, and the contraction order is chosen to contract the leaves inwards. Contracting simultaneously along all the branches of the tree defines an effective contraction boundary that sweeps towards the root. The algorithm begins by choosing either the most or least central node in the TN, i_0, as an initial span, S = {i_0}. It then greedily expands to a connected node j, adding the contraction (i, j) → i in reverse order to the path, where i ∈ S, j ∉ S. The algorithm repeats by then considering all neighbors of the newly expanded span S → S ∪ {j}. A few local quantities (connectivity to S, dimensionality, centrality, and distance to i_0) are combined into a score used in the greedy selection of the next node in the tree, and the combination weights are the hyperparameters.
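In the same hedged style, a sketch of the Span generator (again with hypothetical graph helpers):

```python
def span_tree(graph, seed, weights):
    """Grow a spanning set S outward from a seed node; recording the
    expansion steps in reverse yields a contraction order that sweeps
    the leaves of the spanning tree inwards towards the seed."""
    S = {seed}
    path = []
    while len(S) < graph.num_nodes():
        candidates = [(i, j) for i in S
                      for j in graph.neighbors(i) if j not in S]
        # weighted combination of connectivity to S, dimensionality,
        # centrality and distance to the seed (the hyperparameters)
        i, j = max(candidates,
                   key=lambda ij: graph.local_score(ij, S, weights))
        path.append((i, j))     # contraction (i, j) -> i
        S.add(j)
    return list(reversed(path))
```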
The previous approaches grow ordered trees locally. The Agglom tree generator explicitly considers the full TN from the start and is inspired by renormalization group contraction strategies in the literature [20]. Given a community size K, the generator performs a balanced partitioning over the N tensors in T to find N/K roughly equal subgraphs. These subgraphs then define intermediate tensors, and the tensors within each subgraph are contracted using the Greedy algorithm with default parameters. After simulating the sequence of compressions and contractions, the network of intermediate tensors defines a new "coarse grained" tensor network for which the agglomerative process can be repeated. In this work, Agglom uses the KaHyPar graph partitioner [63, 64], treating the community size K, imbalance, partitioning mode, and objective as the tunable hyperparameters.

Some sample ordered contraction trees generated by the above heuristics are shown in Fig. 7 for a 2D 8 × 8 lattice. In particular, we observe the boundary-like contraction order of the Span tree (contracting row by row from the bottom) and the hierarchical RG-like structure of the Agglom tree (forming increasingly large clusters); the Greedy tree contracts simultaneously from all 4 corners inwards, rather than from one side like the Span tree. Note that the Agglom tree tends to perform more contractions before compressions than the Span tree, because it constructs many separate clusters simultaneously, and the Greedy tree exhibits behavior intermediate between the two.

H. Optimizing the contraction trees
We optimize the trees by tuning the hyperparameters that generate them with respect to a cost function. Since we also wish to sample many different trees, it is important that the cost function is cheap to evaluate. We perform the optimization over the hyperparameter space using Bayesian optimization [65, 66], which is designed for gradient-free, high-dimensional optimization. The overall process is shown in Fig. 1A, with more detailed pseudocode in the SI [57].
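Schematically, the loop looks as follows; we use optuna here purely as a representative Bayesian-style optimizer (the paper does not prescribe a specific library), and `generate_tree`/`peak_memory` are hypothetical stand-ins for the tree generator and cost model.

```python
import optuna

def objective(trial):
    params = {
        "w_size": trial.suggest_float("w_size", -1.0, 1.0),
        "w_cent": trial.suggest_float("w_cent", -1.0, 1.0),
        "w_sub": trial.suggest_float("w_sub", -1.0, 1.0),
    }
    tree = generate_tree(graph, chi, **params)  # hypothetical helper
    return peak_memory(tree, graph, chi)        # cost function M (or C)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=4096)        # tree budget as in text
best_tree = generate_tree(graph, chi, **study.best_params)
```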
Depending on the computational resources available, we can choose the cost function to be the peak memory usage M or the computational (floating point) cost C. For C we include the cost of contractions and of the QR and SVD decompositions.
We optimize the contraction trees over the hyperparameters in each of the three families of ordered tree generators. In all results with optimized trees, we used a budget of 4096 trees, though in practice a few hundred often achieves the same result. The practical effect of the hyper-optimization time is considered in the SI.

I. Quality of hyper-optimization
To test the quality of the hyper-optimization and the tree search space, we first consider a small (but nonetheless nontrivial-to-contract, due to large D and χ) square TN of size 6 × 6. In this case, the number of tensors is small enough that it is possible to perform an exhaustive search over all ordered contraction trees using a branch and bound algorithm ('B&B'; see the SI); here we minimize peak memory M. In Fig. 8 we compare the performance of the hyper-optimized Greedy algorithm against the exhaustive branch and bound search (using tree gauging for compression in each case, for a fair comparison). The standard boundary contraction algorithm is also shown as a comparison point.
As shown in Fig. 8A, hyper-optimizing over the space of Greedy trees produces performance quite similar to the B&B search and very different from the standard boundary contraction. This indicates that the hyper-optimization is doing a good job of searching the approximate contraction tree space for graphs of this size. In the top panel, we can see that the optimal contraction strategy found by B&B produces a very different contraction order to boundary contraction, exploiting the finite size of the graph and the targeted χ to significantly reduce M.
We can also verify that optimizing M leads to reduced error. In Figs. 8B and C, we show the contraction error for the URand model with λ = −0.8, where it can be seen that for equivalent error (∼10⁻⁴, indicated by the yellow circles) the peak memory or cost of using the hyper-optimized Greedy or B&B approximate contraction trees is indeed much lower than that of boundary contraction. Interestingly, the heavily optimized B&B tree does not improve on the error of the Greedy tree for a given peak memory M.

III. BENCHMARKING HYPER-OPTIMIZED APPROXIMATE CONTRACTION TREES
A. Summary of hand-coded strategies for regular lattices

In our benchmarking below, when considering regular lattices, we will compare to a range of hand-coded contraction strategies used in the literature in many-body physics applications, namely boundary contraction, the corner transfer matrix renormalization group (CTMRG) [67], and higher-order TRG (HOTRG) [68]. We briefly summarize the hand-coded strategies here. Boundary contraction (as already used above) is a standard method in 2D, but has not been widely applied in 3D. We define a 3D version of (PEPS) boundary contraction on a cube that first contracts from one face of the cube towards the other side, leaving a final 2D PEPS tensor network that is contracted by 2D boundary contraction (with the same χ). CTMRG is usually applied in 2D and to infinite systems. Here we apply CTMRG to the finite lattice by using a finite number of CTMRG moves [57]. Finally, HOTRG has been applied to both 2D and 3D infinite simulations; here we perform a limited number of RG steps appropriate for the finite lattice. For both CTMRG and HOTRG, we also compute and insert different projectors for each compression, since we are dealing with generically inhomogeneous systems. Illustrations of all the algorithms are given in the SI [57].

B. Cost scaling with graph size
In Fig. 9 we show the computational cost (peak memory and floating point cost) of hyper-optimized trees in the Greedy, Span, and Agglom classes for a 2D square lattice of size L×L, a 3D cubic lattice of size L×L×L, and for 3-regular random graphs with |V| vertices, using the early compression strategy and a given bond dimension χ. We compare against the cost of contraction trees generated by boundary contraction, CTMRG, and HOTRG.
From this and other examples, we can make some general observations. First, Span trees yield good costs for simple lattices which have a regular local structure, the Agglom tree is superior for random graphs, and the Greedy tree works well for both sets. The markedly different performance of the Agglom and Span trees on random versus simple lattices suggests that these are two good limiting cases for testing contraction heuristics. Interestingly, in the 3D cubic case, the hyper-optimized Span tree performs a boundary-like contraction, but rather than contracting from one face across to the other side, it can find a strategy that contracts all faces towards a point, as visualized in Fig. 10. This substantially improves over the hand-coded boundary PEPS strategy in terms of cost. Similar observations apply to the Greedy tree, which is similar to or outperforms both Span and the hand-coded algorithms for smaller structured lattices, although its performance degrades for larger lattices. We also find that Greedy trees optimize the cost function less well in other instances of large lattices. This suggests that the search space generated by the Greedy tree generator is limiting at larger lattice sizes. In the 2D square lattice, we find that compared to the hand-coded algorithms, the Span and Greedy trees with early compression are superior with respect to memory and cost, even beating out the most widely used boundary MPS strategy. At smaller system sizes, the superior performance of Span and Greedy over boundary MPS reflects the ability of these algorithms to exploit edge and boundary effects. Interestingly, CTMRG is also superior to boundary MPS at small system sizes, and the optimized strategies seem to interpolate between a more CTMRG-like and a more MPS boundary-like contraction. The HOTRG algorithm exhibits similar performance to the Agglom tree, as expected from its real-space RG motivated ordering of contractions. At larger sizes, boundary, CTMRG, Span, and Greedy show similar asymptotic cost, with the optimized strategies retaining a modest asymptotic improvement in memory.

FIG. 13. Error versus cost of hyper-optimized approximate contraction using optimized Span, Greedy, and Agglom trees, in comparison to boundary contraction, CTMRG, and HOTRG, for large TNs where no exact reference is available. As a proxy for error, we monitor the rate of change of the log contraction value or free energy with bond dimension, d ln|Z|/dχ or df/dχ, respectively. We plot this against both contraction cost and peak memory, where the trees have been optimized separately for each. The hyper-optimized methods use gauging settings of r = 6 in 2D and r = 2 in 3D, both with early compression.

C. Error versus bond dimension
As discussed above, we optimize over the generated contraction trees for a given bond dimension χ. In Fig. 11, we plot the relative error in the contraction value, ∆Z, or free energy per site, ∆f, for 2D and 3D Ising and random tensor models, for the hyper-optimized contraction and hand-coded strategies, as a function of bond dimension.
It is natural to expect the error of an approximate contraction to decrease as we increase χ, since in the limit χ → ∞ the algorithm becomes exact. For all the models and algorithms investigated we find a roughly polynomial suppression of the error with inverse χ. What is perhaps less obvious is whether approximate contraction trees with given χ should yield comparable errors regardless of the cost of the particular tree, M or C. We see that this is in fact the case for the hyper-optimized trees, i.e. the error correlates reasonably well with the compressed bond dimension χ, independent of the choice of tree. Thus, by choosing the optimized tree with lowest cost for a given χ, we are not paying a price in terms of accuracy.
On the other hand, the hand-coded algorithms do not follow this observation: e.g. CTMRG in the 2D lattice, and boundary PEPS and HOTRG in the 3D lattice, exhibit considerably larger errors than the hyper-optimized strategies for given χ. For fixed bond dimension, the hyper-optimized contraction trees appear to use the computational resources (memory and cost) in a more effective way to reduce error than the hand-coded strategies.

D. Error versus cost
We next consider the error obtained for a given peak memory or computational cost. In Fig. 12 we show the relative error in the contraction value, ∆Z, or free energy per site, ∆f, for 2D and 3D Ising and random tensor models, for the hyper-optimized contraction and hand-coded strategies, plotted against peak memory usage or contraction cost (depending on which was used as the cost function to optimize the trees). In this figure, the sizes of the problems were chosen so that the exact value of Z or f can be computed by exact TN contraction. In Fig. 13 we consider the same models, but now for problem sizes too large for exact contraction. In these cases, we use d ln|Z|/dχ and df/dχ as metrics of the convergence of the calculation.
From these plots we observe a few features. In the 2D models, the optimized Span, Greedy, and standard boundary contraction algorithms generally all achieve quite similar performance, and are the best performing algorithms. (We note that although Span shows a consistent advantage over boundary contraction in the error versus χ plots in Sec. III C, it does not do so when the overall cost/memory is considered, because this depends on additional details besides χ, such as the number of large tensors, the order in which they are contracted, etc.) CTMRG, HOTRG, and Agglom also perform similarly to each other, and all perform much worse than the Span, Greedy, and standard boundary contraction algorithms on this regular 2D lattice.
In 3D, the PEPS boundary, CTMRG, and HOTRG algorithms all perform quite poorly, while Span performs well. Greedy performs well in the smaller examples, but degrades in the larger lattice, presumably again because of the limited contraction tree space generated by the Greedy algorithm. As noted in Sec. III C, hyper-optimized Span trees choose a quite different contraction path than the PEPS boundary algorithm, while still taking advantage of the boundary, and this is key to the improved performance.
Taken in total, the comparisons in the last three subsections illustrate how the optimized approximate contraction trees are competitive with, and can even exceed, the performance of standard contraction strategies in the simple lattices studied in many-body physics applications.

E. Comparison to another strategy for general graphs
We next turn to a comparison of our hyper-optimized approximate contraction strategy to another recently proposed technique for arbitrary graphs. Ref. [52] proposed an algorithm to automatically contract arbitrary geometry tensor networks with good performance across a range of graphs. For convenience we refer to that algorithm as CATN. As we cannot trace through CATN in the same way as in our previous performance comparisons, we measure the contraction time directly on a single CPU. Although CATN is formulated in a geometry independent manner, a critical difference with the current work is that CATN does not optimize over families of approximate contraction trees.
In Fig. 14 we compare CATN against hyper-optimized Span trees for the 2D/3D Ising model at approximately the critical point, as a function of accuracy versus contraction time. For the Span trees, we sweep over χ for different choices of tree gauge distance r (i.e. each line corresponds to one r, while χ is swept over). The performance of CATN is considered as a function of the two bond dimensions D_max and χ (each line corresponds to a given D_max, while χ is swept over). We do not include the time to find the tree for the hyper-optimized contraction, since one generally re-uses a tree many times. However, as a rough guide, for the lattices in Fig. 14 the search converges to a good tree in 10-20 seconds. Further details and comparisons are in the SI [57]. We see clearly that in both the 2D and 3D cases (Figs. 14A and B respectively) the hyper-optimized Span trees achieve a better accuracy versus contraction time trade-off than the CATN algorithm. Given that CATN itself has an ordering of compressions, an interesting question is to what extent the strategy of CATN might also be optimized.

IV. THE POWER OF HYPER-OPTIMIZED APPROXIMATE CONTRACTION
We now illustrate the power of the hyper-optimized approximate contraction protocol defined above in a further range of interesting problems.

A. Ising partition function on the pyrochlore lattice
We consider a tensor network contraction corresponding to the Ising partition function on large, finite, pyrochlore lattices with up to 4000 sites. (We consider the version of the model where all spins are aligned along the same axis; see Ref. [69].) The highly frustrated geometry makes it harder to compute a low complexity contraction path for this tensor network than in simpler lattices. In Fig. 15A we show the peak memory for the optimized Span algorithm as a function of side length L. The total lattice size is L × L × L × 4, thus the largest calculation (L = 10) is a contraction of 4000 tensors. For L > 6 we see the peak memory start to saturate. By fitting f(χ) = A + B/χ to our data for parameters A and B, we can accurately estimate both the free energy and its error as the fitted value and square root of the variance of the parameter A = f(∞). We show the result of this in Fig. 15B, in the vicinity of the critical point [70, 71]. We note that while Ising systems like this can be studied using Monte Carlo techniques, the partition function itself is tricky to estimate, requiring methods [72] beyond the standard Metropolis algorithm [73]. Fig. 15D shows the second derivative of (1/N) ln Z with respect to inverse temperature, ∂²(−βf)/∂β², which displays a growing peak as a function of system length L, illustrating the critical point.
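The extrapolation step can be reproduced with a standard least squares fit; the sketch below uses synthetic stand-in data for the measured free energies and reports A = f(∞) with its one-sigma uncertainty from the fit covariance.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(chi, A, B):
    return A + B / chi

chis = np.array([8.0, 16.0, 32.0, 64.0, 128.0])
# synthetic stand-in for the measured free energies per site f(chi)
fs = model(chis, -2.0, 0.3) + 1e-4 * np.random.randn(chis.size)

popt, pcov = curve_fit(model, chis, fs)
f_inf = popt[0]                  # extrapolated f(infinity)
f_err = np.sqrt(pcov[0, 0])      # one-sigma uncertainty on f(infinity)
```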
The largest exactly contractible tensor network corresponds to size L = 6; it is visualized in the inset of Fig. 15C. For this size we can investigate the free energy error of the approximate contraction scheme, ∆f, and this is shown in Fig. 15E. We see that increasing χ reliably decreases the error, and for the largest χ considered, the relative free energy error is only 10⁻⁴.

B. Random 3-regular graphs and dimer coverings
We next study a problem defined on 3-regular random graphs. Here the Agglom algorithm performs best, and we show the resulting complexities in terms of peak memory, M, in Fig. 16A. Here, because the length of loops in the graph grows with increasing number of vertices, |V|, we still find an exponential scaling, even when compressing to fixed χ. Nonetheless, we can usefully push much beyond the exactly contractible limit of |V| ∼ 300 (illustrated in Fig. 16C). To study the accuracy, we consider the problem of counting dimer coverings on these graphs. This is equivalent to so-called positive #1-IN-3SAT [36, 74, 75]. Each edge (i.e. index) is considered a potential dimer, and by placing an appropriate tensor on each vertex (given explicitly in the SI), we enforce that every tensor be "covered" by a single dimer only, for any configuration to be valid. The decision version of this problem is NP-Complete [76, 77], and on 3-regular random graphs specifically the problem is known to be close to, but just on the satisfiable side of, the hardest regime in terms of the ratio of clauses to variables [74, 75]. The contraction of the above tensor network gives the number of configurations, Z, at zero temperature, and a corresponding 'residual' entropy, S = ln Z. We plot S in Fig. 16B. Considering the entropy per site, S/|V|, and performing a least squares fit with a quadratic function of inverse size, 1/|V|, and bond dimension, 1/χ, with fitted parameters s_∞, c_{1...5}, we can estimate the infinite size entropy per site as s_∞ = 0.1429(2). The theoretical value can be computed using [78] when |V| ≳ 5×10¹¹ (see the SI), yielding 0.1438, suggesting that a small systematic error remains from the finite size. In Fig. 16D we consider the relative error in S when compared to exact contraction results for |V| = 300, where again we see that increasing χ reliably improves the error.

C. Hardness transition in random tensor networks
The final problem we consider is one where the hardness derives from the tensor entries themselves rather than the geometry. We take the URand model, with tensor entries sampled uniformly ∈ [λ, 1], and consider two lattices which are amenable to contraction with relatively large χ: the square and diamond lattices. We take sizes 16×16 with D = 4 and 6×6×6 (×2) with D = 2 respectively, both of which are at the limit of what is contractible exactly. In Fig. 17A we show the relative error in the approximately contracted value, ∆Z, as a function of λ across 20 random instances. There is a clear transition in hardness at λ ∼ −0.7: above this, even moderate χ is sufficient to contract the tensor network with very good accuracy. Below this, however, there is no improvement to the error at all with increasing χ; the contracted value remains essentially impossible to approximate. An obvious question is how Z itself changes with λ. In Fig. 17B we show the fraction of instances whose exact value Z is negative, as well as the average magnitude of Z. The problem varies from smaller magnitude values (compared to the total number of terms in the sum, ∼10³⁰⁰) evenly split between negative and positive, to large, always positive values. We also consider the same model but embedded in a 3D diamond geometry in Fig. 17C. The same transition in hardness occurs at a slightly different value of λ. In the hard regime, there remains some small ability to approximate Z with large χ, probably because this is approaching exact contraction. The complexity of contraction of the random tensor network is thus closely related to the positive nature of the tensor entries. This is likely related to the conjectured low entanglement of typical positive tensor networks [79], as well as the hardness of approximating complex valued Ising models [80-82].

V. CONCLUSIONS
We have introduced a framework for approximate contraction of tensor networks defined on arbitrary graphs, based on hyper-optimizing over ordered contraction trees with compression steps. In particular, our work attempts to optimize over the many choices and components in such an algorithm, ranging from the manner in which compressions are performed, to the sequence and ordering of compressions and contractions. Interestingly, we observe that by minimizing a cost function associated with memory or computational cost, we simultaneously generate an approximate contraction tree that yields small contraction error. In many cases, we find that the optimization produces significantly cheaper and more accurate contraction strategies than hand-crafted approximate contraction algorithms, even in well studied regular lattices. While we cannot claim that our final algorithm is optimal, the purpose of the framework is to allow an optimization over compression strategies, and such an optimization can be extended should, for example, other metrics of approximate contraction quality be introduced. We envisage that the many constituent parts of our protocol can be separately improved in future works.
We have discussed different regimes of computational advantage for approximate contraction over exact contraction. Firstly, for locally connected graphs and "non-hard" tensor entries, we expect approximate contraction to display an exponential benefit over exact contraction, as shown here for the pyrochlore lattice. Secondly, for certain geometries with long range interactions, we expect approximate contraction to still scale exponentially, but with a usefully reduced pre-factor, as shown here for 3-regular random graphs. Finally, we expect some classes of tensor entries to be essentially impossible to approximately contract, regardless of geometry, as shown here for certain distributions of random tensors. Whether the latter result corresponds to the hardness of contraction expected for generic random quantum circuits is an interesting question. Similarly, the application of these techniques to quantum circuit and ansatz expectation values is a natural direction that we leave for future work.

To compute the projectors from the reduced factors on either side of bond e_AB, we write the identity on the bond as 1 = R_A^{-1}(R_A R_B)R_B^{-1}. If we resolve the identity on either side in this way, we can form the product R_A R_B in the middle and perform a truncated SVD on this combined reduced factor, yielding UσV†. This gives P_L = R_B V_χ σ_χ^{-1/2}, and likewise P_R = σ_χ^{-1/2} U_χ† R_A. This form of the projectors makes explicit the equivalence to CTMRG and HOTRG [60-62], for which R_A and R_B contain only information about the local plaquette. Note in general that we just need to know R_A and R_B (not Q_A or Q_B) to compute P_L and P_R, but we can include in these the effects of the distance-r tree gauge in order to perform the truncation locally without modifying any tensors but A and B.
Rather than dynamically performing the approximate contraction algorithm using the ordered contraction tree, one can also use it to statically map the original tensor network, T, to another tensor network, T_P, which has the sequence of projectors lazily inserted into it (i.e. each A P_L P_R B is left uncontracted). Exact contraction of T_P then gives the approximate contracted value of T. Such a mapping may be useful for relating the approximate contraction to other tensor network forms [50], or for performing some operations such as optimization [22]. Here we describe the process.
To understand where the projectors should be inserted, we just need to consider the sub-graphs that the intermediate tensors correspond to. At the beginning of the contraction, each node corresponds to a sub-graph of size 1, containing only itself. We can define the sub-graph map S(i) = {i} for i = 1...N. When we contract two nodes i, j to form a new node k, the new sub-graph is simply S(k) = S(i) ∪ S(j). When we compress between two intermediate tensors i and j, we find all bonds connecting S(i) to S(j), and insert the projectors P_L and P_R, effectively replacing the identity linking the two regions with the rank-χ operator P_L P_R. Finally, we add the tensor P_L to the sub-graph S(i) and P_R to the sub-graph S(j). A sketch of this bookkeeping is given below.
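A minimal Python sketch of this bookkeeping, with a hypothetical tensor network object `tn` and a list of ordered tree operations, might look as follows:

```python
def build_projected_tn(tn, tree_ops, chi):
    """Statically insert projectors to build T_P from T."""
    S = {i: {i} for i in tn.nodes()}          # S(i) = {i} initially
    for op in tree_ops:
        if op.kind == "contract":             # (i, j) -> k
            S[op.k] = S[op.i] | S[op.j]       # union of sub-graphs
        else:                                 # compress between i and j
            bonds = tn.bonds_between(S[op.i], S[op.j])
            # replace the identity on these bonds by rank-chi P_L P_R
            P_L, P_R = tn.make_projectors(bonds, chi)
            S[op.i].add(tn.add_tensor(P_L))
            S[op.j].add(tn.add_tensor(P_R))
    return tn    # now exactly contractible; contraction approximates T
```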
Grouping all the neighboring tensors on one side of the bonds as an effective matrix A and those on the other side as B (note that these might generally include projectors from previous steps), the form of P L and P R can be computed as above.
An example of the overall geometry change from performing this explicit projection transformation, for the full set of compressions of a cubic tensor network approximate contraction, is shown in Fig. 18. Note that the dynamic nature of the projectors, which depend on both the input tensors and the contraction tree, is what differentiates a tensor network which you contract using approximate contraction from, for instance, directly using a tree- or fractal-like ansatz such as T_P.

Ising Model

While many quantities in such models can be estimated efficiently by sampling, we note that the partition function and free energy are typically much more challenging [72]. Regardless of geometry, we assume the spins are oriented in the same direction, i.e. the uniaxial Ising model. Typically one converts Z into a 'standard' tensor network with a single tensor per spin (or equivalently per vertex of G), by placing the tensor

T^[v]_{i_1...i_k} = Σ_{σ_v} Π_{n=1}^{k} W_{σ_v i_n}

on each vertex v of G, where the matrix W is defined by (W²)_{i,j} = exp(βσ_i σ_j). For β > 0 we can define W as real and symmetric using

W = H diag(√(2 cosh β), √(2 sinh β)) H,   H = (1/√2) [[1, 1], [1, −1]].

This is equivalent to splitting the matrix on each bond and then contracting each factor into a COPY-tensor placed on each vertex. We note that while this yields a tensor network with the exact geometry of the interaction graph G, one could factorize the COPY-tensor in other low-rank ways. Indeed, for the exact reference results, the TN in Eq. (F2) is contracted directly by interpreting every spin state index as a hyper index (i.e. appearing on an arbitrary number of tensors). The relative error in the free energy is given by

∆f = |f − f_exact| / |f_exact|,

with •_exact results obtained via exact contraction. Depending on the geometry, the Ising model undergoes a phase transition at a critical temperature β_c in the thermodynamic limit, and it is in this vicinity that ∆f generally peaks for finite systems. For example, on the 2D square lattice the exact value is known, β_c = ln(1 + √2)/2 ≈ 0.44 [85, 86].
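The construction of W and of the vertex tensor can be checked directly in numpy; the sketch below builds the tensor for a vertex of given degree and verifies the two-spin partition function.

```python
import numpy as np

def ising_vertex_tensor(beta, degree):
    """COPY tensor with one W absorbed into each leg, so that
    contracting the network over the graph G reproduces Z."""
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    d = np.diag(np.sqrt([2 * np.cosh(beta), 2 * np.sinh(beta)]))
    W = H @ d @ H                        # (W @ W)[i, j] = exp(beta*s_i*s_j)

    T = np.zeros((2,) * degree)          # COPY tensor delta_{i1...ik}
    T[(0,) * degree] = T[(1,) * degree] = 1.0
    for leg in range(degree):
        T = np.tensordot(T, W, axes=([leg], [0]))
        T = np.moveaxis(T, -1, leg)      # keep the leg ordering fixed
    return T

# check: two degree-1 vertices joined by one bond give the two-spin
# partition function sum_{s1, s2} exp(beta * s1 * s2)
beta = 0.3
Z2 = np.tensordot(ising_vertex_tensor(beta, 1),
                  ising_vertex_tensor(beta, 1), axes=1)
assert np.isclose(Z2, 2 * np.exp(beta) + 2 * np.exp(-beta))
```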

URand Model
While the Ising model varies in difficulty depending on β, it seems always relatively easy to approximate to some extent using approximate contraction. On the other hand, we expect there to be tensor networks which are exponentially difficult to approximate even for simple geometries. Here we introduce the URand model, which allows us to continuously tune between very hard and very easy regimes. This is achieved simply by filling each tensor with random values sampled uniformly from the range [λ, 1]. When λ ≥ 0, every term in the TN sum is non-negative and the sum becomes very easy to approximate. As λ becomes more negative, however, the sum increasingly contains terms of opposite sign which 'destructively interfere', making the overall contracted value Z hard to approximate. Choosing an intermediate λ allows us to generate 'moderately hard' contractions where the different gauging and tree generating strategies have a significant effect. For the URand model we consider the relative error in Z directly,

∆Z = |Z − Z_exact| / |Z_exact|,

with Z_exact computed via exact contraction.
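Generating a URand tensor is a one-liner; for reference:

```python
import numpy as np

def urand_tensor(lam, D, degree, rng=None):
    """Tensor with entries drawn uniformly from [lam, 1]; lam >= 0
    gives an easy positive network, while negative lam introduces the
    destructive sign interference described above."""
    rng = rng or np.random.default_rng()
    return rng.uniform(lam, 1.0, size=(D,) * degree)
```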

Dimer covers / positive #1-IN-3SAT
In this model we want to compute the entropy per site of dimer coverings of a graph G with number of vertices |V|. Here, a valid configuration is one in which every vertex of G is 'covered' by exactly one dimer. Counting all valid configurations is done by enumerating every combination of placing a dimer on a bond (setting the corresponding index to 1) or not (setting the index to 0), which can be formed as a tensor network with the following tensor on each vertex:

T_{i_1...i_k} = 1 if Σ_{n=1}^{k} i_n = 1, and 0 otherwise.

The total number of valid configurations is then the contraction

Z = Σ_{{e}} Π_v T^[v]_{{e_v}}.

This is also equivalent to the counting problem positive #1-IN-3SAT [36, 74, 75], the decision version of which is NP-Complete [76, 77].

FIG. 24. Illustration of a single coarse graining step of HOTRG for a finite 3D lattice. For brevity we only show coarse graining in the x-direction, but the full algorithm coarse grains each of the three dimensions in succession. Note that once two or more directions have been coarse grained, all bonds will be of size χ and so subsequent rounds start with D = χ. Note also that the projectors (pink and yellow) are not identical across the lattice but are computed specifically for the local tensors, to allow for finite inhomogeneous systems.
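The dimer vertex tensor above is easy to construct and sanity check; in the following numpy sketch, contracting three such tensors around a triangle gives Z = 0, since an odd cycle has no perfect matching.

```python
import numpy as np

def dimer_tensor(degree):
    """1 when exactly one incident index is set to 1, else 0."""
    T = np.zeros((2,) * degree)
    for leg in range(degree):
        idx = [0] * degree
        idx[leg] = 1
        T[tuple(idx)] = 1.0
    return T

T = dimer_tensor(2)                     # each triangle vertex has degree 2
Z = np.einsum('ab,bc,ca->', T, T, T)    # edges a, b, c of the triangle
assert Z == 0.0
```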

FIG. 3. Primitive tensor operations for approximate contraction. A: Grouping of indices into left, shared, and right sets, giving a matricization of the product AB, with rank D_AB. B: Graphical depiction of contracting two tensors AB → C. C: Graphical depiction of an isometric tensor, Q, such that when dimensions with incoming arrows are grouped, Q†Q = 1. D: Compression of the shared bonds between two tensors A and B, via QR reduction and truncated SVD, to a new shared bond dimension χ. E: Gauging of the bond between A and B to generate an isometric tensor Q_A.

FIG. 5. A: Schematic of 'early' compression, where after each pairwise contraction, any shared bonds of total size > χ are truncated. B: Schematic of 'late' compression, where before each pairwise contraction, any shared bonds of total size > χ are truncated. C: Error ∆Z of contracting a 16×16, D = 4 TN with uniform random entries ∈ [−0.5, 1], as a function of χ, tree-gauging distance r, and either early or late compression. The TN is contracted using the standard MPS boundary contraction algorithm. Line (band) shows median (interquartile range) over 50 instances. D: The same, but for an approximate contraction of a 5×5×5, D = 2 tensor network with uniform random entries ∈ [−0.4, 1]. The 3D TN is contracted using a hyper-optimized Span tree.
In the tree gauge, one accumulates the products of R factors from all tensors to the left and right of bond e_AB (Fig. 4B (iii)).
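For a chain of tensors, this accumulation of R factors toward the bond can be sketched as follows (a minimal matrix version with our own naming; in practice the tensors on either side of e_AB are matricized first):

```python
import numpy as np

def accumulated_R(tensors):
    """Sweep a chain of matrices from its far end toward the bond e_AB,
    QR-decomposing as we go, so that only the accumulated R factor
    (the gauge) needs to be carried to the bond."""
    R_acc = None
    for T in tensors:
        M = T if R_acc is None else R_acc @ T   # absorb the previous R
        _, R_acc = np.linalg.qr(M)
    return R_acc  # to be absorbed into A (or B) before compressing e_AB
```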

FIG. 7. Example approximate contraction trees for a 2D 8×8 square lattice TN. A, B, and C show the ordered contraction trees with compressions (orange, using the 'late' strategy) explicitly marked, for the Span, Agglom and Greedy methods respectively. The computation proceeds from bottom to top. D, E, and F show the same contraction trees as hierarchical communities, with the contraction ordering proceeding from the smallest pink bands to the largest blue bands.

FIG. 8. Performance of hyper-optimized approximate contraction for a 6×6 square TN with all bonds of size D = 16. The TN is filled with uniformly random entries ∈ [λ = −0.8, 1.0] (2D URand model). We test three different contraction trees: standard MPS boundary contraction, optimization over Greedy trees, and brute-force optimization over all approximate contraction trees (B&B). The upper insets show snapshots illustrating the 'Boundary' contraction, an example Greedy contraction, and the optimal 'B&B' contraction for χ = 32. A: The peak memory usage, M, of the standard MPS 'Boundary' method, compared to the Greedy and 'B&B' algorithms, which have been optimized for each χ. B and C: the error for each, plotted against peak memory, M, and computational cost, C, respectively, averaged over 20 instances of the URand model. The compressions are performed late, using a tree-gauge distance r = 1. The yellow circles correspond to approximately equal error ∆Z ≈ 10^−4 using the 'Boundary' and Greedy trees, to enable the ratio of costs to be determined; e.g. the observed speed-up of Greedy over 'Boundary' at this error is 120×.

FIG. 9. Contraction peak memory and cost of approximate contraction trees in different families, versus exact contraction cost and the CTMRG, HOTRG and boundary contraction algorithms, for different geometries (insets show sample geometry). The tree-gauging distance is set to r = 0 for this comparison, with 'early' compression, and we also turn off gauging in the Boundary algorithm. The hyper-optimized trees are optimized for M and C separately. A and B: peak memory, M, and cost, C, of contracting a 2D square TN with D = 4 and χ = 32 as a function of side length L. C and D: peak memory, M, and cost, C, of contracting a 3D cube TN with D = 4 and χ = 32 as a function of side length L. E and F: peak memory, M, and cost, C, of contracting 3-regular random graphs with initial bond dimension D = 2 and χ = 4 as a function of the number of vertices, |V|. The line and shaded band show the median and interquartile range across 20 instances, respectively.

FIG. 11. Error versus compressed bond dimension χ of hyper-optimized approximate contraction using optimized Span, Greedy, and Agglom trees, in comparison to boundary contraction, CTMRG, and HOTRG, for medium-size TNs (insets show sample geometry). In terms of gauging settings for the hyper-optimized methods, in 2D we use r = 6 with late compression, and for the others we use r = 3 and early compression. A: Relative error in the free energy of the 2D Ising model on a 32 × 32 square lattice close to the critical point. B: Relative error in the contracted value of the 2D URand model on a 16 × 16 square lattice with D = 4, in the intermediate-hardness regime of λ. C: Relative error in the free energy of the 3D Ising model on a 6 × 6 × 6 cubic lattice close to the critical point. D: Relative error in the contracted value of the 3D URand model on a 5 × 5 × 5 cubic lattice with D = 2, in the intermediate-hardness regime of λ. E: Relative error in the free energy of the Ising model on 3-regular random graphs with |V| = 300, close to the critical point (lines and bands show median and interquartile range across 20 instances). F: Relative error in the contracted value of the URand model on 3-regular random graphs with D = 2, in the intermediate-hardness regime of λ (lines and bands show median and interquartile range across 20 instances).

FIG. 12. Error versus cost of hyper-optimized approximate contraction using optimized Span, Greedy, and Agglom trees, in comparison to boundary contraction, CTMRG, and HOTRG, for medium-size TNs where exact reference values are available. Here the error is plotted against either the total cost of the contraction, C, or the peak memory requirement, M, as computed by tracing through the computation. The trees are optimized for each separately. Four different lattice and model combinations are shown: A: 2D Ising model; B: 2D URand model; C: 3D Ising model; D: 3D URand model. The lines are annotated with the value of χ. The gauging settings used for the hyper-optimized methods are r = 6 with late compression in 2D, and r = 2 with early compression in 3D.

FIG. 14. Performance comparison of hyper-optimized approximate contraction (current work) and the algorithm of Pan et al. [52] for computing the free energy of the Ising model at approximately the critical point of A: a square lattice, and B: a cubic lattice. For both algorithms χ is varied, and the points are labeled with its value. The insets show the geometry and specific sizes of the lattices.

FIG. 15. Approximate contraction of the ferromagnetic Ising model on the pyrochlore lattice. A: peak memory, M, of the Span algorithm on the pyrochlore lattice with D = 2 as a function of side length L and χ. B: the free energy, f, near the critical point, estimated by extrapolating approximate contractions in χ. A tree-gauge distance of r = 2 was used, and the error bars show fit uncertainty. C: an example instance of the pyrochlore geometry for L = 6, corresponding to 864 sites. D: the second derivative of −βf with respect to β, showing a diverging peak around the critical point for increasing L. E: relative error in the free energy near the critical point as a function of χ, for L = 6 and r = 2.

FIG. 16. Approximate contraction for counting dimer coverings of random 3-regular graphs. Quantities are averaged over 20 instances. A: peak memory, M, of the Agglom algorithm on random 3-regular graph instances with D = 2 as a function of the number of vertices, |V|. B: configuration entropy, S = ln W, where W is the number of valid configurations, as a function of |V| and χ. The red dotted line shows the constant term from a least-squares fit to a quadratic function of inverse |V| and χ (see main text). C: an example random regular graph with |V| = 300. D: relative error in S as a function of χ, for |V| = 300.

FIG. 17. Hardness transition in approximately contracting tensor networks with random uniform entries ∈ [λ, 1]. A: relative error, ∆Z, in the approximately contracted value of the URand model on the square lattice using the Greedy algorithm, as a function of λ and χ, with r = 2. Lines and bands show median and interquartile range across 20 instances. B: distribution of actual values Z for the square-lattice URand model, in terms of the fraction of negative instances (green, left axis) and the average absolute magnitude (purple, right axis). Error bars denote the error on the mean. C: relative error, ∆Z, in the approximately contracted value of the URand model on the diamond lattice using the Greedy algorithm, as a function of λ and χ, with r = 2. Lines and bands show median and interquartile range across 20 instances. D: distribution of actual values Z for the diamond-lattice URand model, in terms of the fraction of negative instances (green, left axis) and the average absolute magnitude (purple, right axis). Error bars denote the error on the mean.

FIG. 18. An example of transforming a tensor network, T, into an exactly contractible tensor network, T_P, using the explicit projector form of an approximate contraction tree. Here we take a 6×6×6 cube with D = 2 and use an optimized contraction tree from the Span generator, for χ = 8. Each node in the original tensor network is colored uniquely. The grey square nodes in the right-hand diagram represent the inserted projectors, with thicker edges denoting the compressed bonds of size χ. Arrows indicate the orientation of the projectors (i.e. the order of the compressions).

FIG. 20. Overview of a single step of the manual 2D boundary contraction method, which uses an MPS to sweep across the square.

FIG. 23. Illustration of a single step of the manual 3D boundary contraction method, which uses a PEPS to sweep across the cube. When L_x, L_y or L_z = 1, the scheme becomes equivalent to MPS boundary contraction.

FIG. 26. A: Accuracy of both manual and hyper-optimized approximate contraction schemes for the 2D corner double line (CDL) model, with a lattice size of 64 × 64 and an effective D = 2 × 2 = 4. B: schematic of the OBC CDL model.