Tensor Networks contraction and the Belief Propagation algorithm

Belief Propagation is a well-studied message-passing algorithm that runs over graphical models and can be used for approximate inference and approximation of local marginals. The resulting approximations are equivalent to the Bethe-Peierls approximation of statistical mechanics. Here we show how this algorithm can be adapted to the world of PEPS tensor networks and used as an approximate contraction scheme. We further show that the resultant approximation is equivalent to the ``mean field'' approximation that is used in the Simple-Update algorithm, thereby showing that the latter is a essentially the Bethe-Peierls approximation. This shows that one of the simplest approximate contraction algorithms for tensor networks is equivalent to one of the simplest schemes for approximating marginals in graphical models in general, and paves the way for using improvements of BP as tensor networks algorithms.


I. INTRODUCTION
There is a natural connection between classical probabilistic systems of many random variables and quantum many-body systems.In both cases the description of a generic state of a system requires an exponential number of parameters in the size of the system, or the number of physical units that compose it.For example, a general probability distribution over n bits requires the specification of 2 n − 1 non-negative numbers, while a classical description of a quantum state over n qubits requires 2 n − 1 complex numbers.However, in both cases, states that are relevant to us are often subject to many local constraints, which, in turn may lead to a succinct description of the system.A good example is tensor networks (TNs) [1], where the 2 n coefficients of a quantum state are given by the contraction of a set of local tensors.As a probabilistic analog, consider the Gibbs distribution of a classical spin system on a lattice H = a,b h ab (x a , x b ).Also here, the multivariate probability distribution P (x 1 , . . ., x n ) = 1 Z e −βH(x1,...,xn) = 1 Z a,b e −βh ab (xa,x b ) can be compactly described by specifying the local interactions h ab (x a , x b ).This is an example of a graphical model [2][3][4], in which the global probability distribution is given by a product of local factors.Tensor networks and graphical models are therefore two frameworks that provide a compact description of the state of the system, which in principle can be used to simulate it.
Both frameworks also face similar challenges.In both cases, calculating the expectation value of a local observable can be an NP-hard problem [5,6], as it (at least naively) involves summation over an exponential number of terms.In addition, when the underlying graph that describes the model is a tree, this can be done efficiently using dynamical programming (via the sum-product algorithm [2] for graphical models or directly by the results of Ref. [7] for tensor-networks), but when there are loops, the problem becomes hard and one usually resorts to approximations.In the world of tensor networks this is known as the problem of approximate con-traction, whereas in the world of graphical models this is known as the problem of approximated inference and local marginals.
Over the years many different algorithms and techniques have been suggested to this problem for both frameworks.Some of them have been adopted and adjusted to the other framework.For example, the corner transfer method (CTMRG) [8,9] for approximate tensornetwork contraction has its roots in Baxter's work in statistical mechanics [10,11], as well as ideas of using Monte-Carlo sampling for TN contraction [12][13][14].From the other side, the Tensor Renormalization Group algorithm (TRG) for the contraction of tensor networks can be used for highly accurate approximations of classical statistic mechanical quantities such as partition functions and magnetization [15] (see also Ref. [16]).
In this paper we show how an important class of inference and marginalization algorithms for graphical models, called Belief Propagation (BP) algorithms, can be adapted and used for approximate tensor network contraction.This idea was suggested in Ref. [17], in the context of a general mapping between tensor-networks and graphical models.Here, by using a slightly different mapping, we show how this approximation is in fact equivalent to the basic contraction approximation that is at the heart of the simple-update algorithm of tensor networks.As we discuss later, since the underlying approximation in the BP algorithm is the Bethe-Peierls approximation, our results imply that this type of approximation is also at the center of the simple-update method.It also motivates the study of various improvements of the BP algorithms as potential tensor-network contraction algorithms.

II. A BP ALGORITHM FOR TENSOR NETWORKS
Belief Propagation (BP) [18] is a statistical inference algorithm on graphical models, that can also be used to approximate their marginals [2][3][4].It is also known as the sum-product algorithm in the context of cod- ing theory [19], and can also be viewed as an iterative way to solve the Bethe-Peierls equations of statistical physics [20,21].
In what follows, we present a BP variant on a PEPS tensor-network.For an alternative approach, which first maps the PEPS to a graphical model, and then uses the BP on that graphical model, please see Appendix A.
We consider a PEPS |ψ , in which the physical particles sit on the vertices (nodes) of some graph G = (V, E) (Fig. 1a).Each node a ∈ V is associated with a tensor T a that has one physical index (leg) of bond dimension d and an index of dimension D for each adjacent edge, which is also called a 'virtual leg'.Virtual legs of the same edge in G are contracted together.
Consider a double layer TN that corresponds to the scalar ψ 2 = ψ|ψ (Fig. 1b).When G = (V, E) is a tree (like an MPS, for example), it can be contracted efficiently using dynamical programming.One way to preform it is as follows.Given two incident nodes, a, b, the edge that connects them divides the system into two parts, see Fig. 1a,b.We define the "message" m a→b (x, x ) to be the tensor that results from contracting the branch of ψ|ψ that is connected to a, where (x, x ) are the indices of the open ket-bra edges connected to node a.Similarly, the message m b→a (x, x ) is related to the contraction over the branch of b.See Fig. 1a,b.If we consider them as matrices of the indices (x, x ), they are positive semi-definite, due to the fact that they are a result of a contraction of a branch with its complex conjugate.Crucially, these messages satisfy a recursive relation.If N a is the set of nodes that are incident to a, then the tensor FIG. 2. In a tree, the converged BP messages can be used to calculated the local RDMs.In this example, a 2-local RDM.The same formulas are used to approximate the local RDMs also when the underlying PEPS is not a tree.m a→b is given by the contraction where Tr(•) denotes contraction of joint indices.For example, If b, c, d are incident to a, then the message m a→b (x, x ) is given in terms of the messages m c→a (x, x ) and m d→a (x, x ), as shown in Fig. 1c.
In principle, we can pick any node which is not a leaf, define it as a root, and use Eq. ( 1) to calculate the messages from the leaves to the root.Using the messages that lead to the root, we can calculate ψ 2 .This calculation can also be done differently.Instead of forcing a particular causality order between the messages, we can try to solve Eq. ( 1) for all messages simultaneously.This can be done by solving Eq. ( 1) iteratively: starting from a set of random positive semi-definite (PSD) messages {m (0) a→b (x, x )} for all incident nodes a, b, we define the set of messages at step t + 1 using the messages of step t: Equation ( 2) is the BP equation for PEPS tensornetwork.It is a natural extension of the BP equations for graphical models [2][3][4].The equation also guarantees that if the messages at t are PSD when viewed as a matrix whose (x, x ) element is m a→b (x, x ), then so would the messages at t + 1 be.The fixed point of this iterative process will solve Eq. ( 1), and give us all the m a→b (x, x ) messages.It is a well-known fact that the BP iterations on tree graphical models have a unique fixed point to which they converge in linear number of steps [22].The same arguments easily generalizes also to our case.Once we have the messages, we can use the fact that they are contractions over branches, and use them to calculate local reduced density matrices (RDM).For example, the calculation of 2-local RDMs are shown in Fig. 2. The PSD property of the messages guarantees the resultant RDMs are also PSD.Thus far, the BP equations (2) might seem no more than an elegant method for contracting tree PEPS.Things become interesting when we consider graphs with loops.In such case, the messages can no longer be defined as contraction of branches, since an edge in the graph no longer partitions it into two distinct branches.Yet we can still define them as solutions to Eq. ( 1), and try to find them iteratively using Eq. ( 2).If the iterations converge to a fixed point, we can use the messages to estimate local marginals, using the same expressions as we did in the case of trees, which are illustrated in Fig. 2.This procedure is called loopy BP.Evidently, this is an uncontrolled approximation.In fact, there is no guarantee the BP iterations will converge to a fixed point, or that there is a unique fixed point.Nevertheless, in the world of graphical models, the BP method often provides surprisingly good results, in particular when the graph looks locally like a tree and there are no long-range correlations.A famous example are problems related to the decoding of error correction codes, in which BP performs extremely well [23].As we shall see, also in the world of tensor-networks loopy BP often performs well on problems with short-range entanglement.
While a general theory to explain the performance of the BP algorithm is still lacking, there are some partial results in this direction.An important result is due to Yedidia et al [24], who established the correspondence between fixed points of the BP algorithm and the Bethe-Peierls approximation [20].The Bethe-Peierls approximation is an approximation scheme for classical statistical mechanics, in which one treats the system as if its defined on a tree (a Bethe lattice).This assumption implies that the Gibbs distribution of the system, as well as the free-energy functional can be written as functions of the local marginals.When the actual interaction graph of the system is not a tree, this is an uncontrolled approximation.Nevertheless, also in these cases, one can take the Bethe free-energy functional and look for locally consistent set of marginals that minimizes it.This procedure often gives surprisingly good approximations.Yedidia et al [24] showed that there is a 1-to-1 correspondence between extermum points of the Bethe free-energy and fixed points of the BP equations.The marginals obtained from the messages at a fixed point minimizes the Bethe free-energy, and conversely, from the marginals at the extremum point one can derive fixed-point messages of BP.
As we show in Appendix A, this result naturally generalizes to our case.The idea is that our BP algorithm for PEPS is the usual BP algorithm that is applied to a graphical model with complex entries, obtained from the tensor-network of ψ|ψ .Such graphical models were studied in Ref. [25], under the name Double Edge Factor Graph (DEFG), where it was argued that the BP algorithm corresponds, like in the usual case, to the extreme points of the Bethe free-energy.
At this point it seems tempting to benchmark the BP algorithm for PEPS contraction, and compare it to other methods.Indeed, we first used the BP algorithm as a contraction subroutine in an imaginary time evolution algorithm on various 2D models, and compared to the performance of the simple update method [26].Surprisingly, the final energies of both algorithms were suspiciously close to each other.As we show next, this can be explained theoretically; the mean field approximation at the heart of the simple-update method and the BP algorithm are equivalent.

III. THE SIMPLE UPDATE METHOD
The simple-update method [27] is a direct generalization of the TEBD [28][29][30] algorithm for 1D real and imaginary times evolution to higher dimensions.It calculates the dynamics of many-body spin systems that sit on a lattice, and are described by a (quasi-) canonical PEPS tensor-network.The method is efficient and numerically stable, but often results in poor accuracy due to its oversimplified representation of local environments.
At its core lies a quasi-canonical form of the PEPS that allows a crude approximation of local TN environments.This approximation is often referred to as mean field approximation.When the PEPS is in the shape of a tree, this form is truly canonical: it corresponds to consecutive Schmidt decompositions of the many-body quantum state.Every edge in the graph connects two disjoint branches of the system, which correspond to the Schmidt decomposition between these two branches |ψ = D i=1 λ i |L i ⊗ |R i .Specifically, the TN consists of two types of tensors: tensors T a that are connected to the physical legs, and diagonal λ tensors that sit in the middle of every edge and correspond to the Schmidt weights, as illustrated in Fig. 3.The orthonormality of the Schmidt decomposition, and its correspondence with the TN structure implies that contraction over branches of the tree is given by a simple Kronecker delta function (Fig. 3b).Therefore, the RDMs of a given region can be readily computed using only the tensors that surround it (Fig. 3c), and in addition the tensors must satisfy local canonical conditions shown in (Fig. 3b).This type of approximation is often referred to as mean field approximation of the environment in the TN literature.
When the underlying graph has loops, the canonical form is no longer well-defined; removing an edge from the graph no longer divides it into two parts, and so it cannot be associated with a Schmidt decomposition between two branches.Nevertheless, we can still define a PEPS to be quasi-canonical if it satisfies the local canonical constraints of Fig. 3b2.In such case, the expression for local RDMs (e.g., Fig. 3c) is no longer exact.Nevertheless, when the graph looks locally like a tree, or when the quantum state has only short-range correlations, this approximation is often reasonable.
In the simple-update algorithm one usually starts from a quasi-canonical PEPS and then applies local gates that performs real or imaginary time evolution according the Trotter-Suzuki decomposition.For example, in the imaginary time evolution, if the Hamiltonian interaction term between the neighboring sites a, b is h ab , the operator U ab = e −δτ h ab will be applied, where δτ is a small Trotter-Suzuki time step.Once U ab is applied, a local SVD decomposition is performed, as shown in Fig. 4b-f, which guarantees that: (i) the resultant tensor network can be reshaped into a local PEPS with T a , λ, T b replaced by T a , λ , T b (ii) a truncation is performed so that the bond dimension of new tensors does not increase, and (iii) some of the local canonical conditions are (approximately) satisfied -see Fig. 4g.
When the operator U ab is not unitary (e.g., in the case ofs imaginary time evolution), or when truncation is performed, the resultant TN will no longer be quasicanonical.Some of the canonical conditions will be satisfied, not all of them.However, also in that case, the λ weights still provide a reasonable approximation for the local environments, as is evident by the success of the SU algorithm in many cases.In particular, in the imaginary time case, if the system reaches a fixed point, (the approximate ground state), it is also a fixed point of all the local SU steps.This state satisfies all the local canonical conditions, and is therefore a quasi-canonical state.
We conclude this section by noting that the SU steps can be easily turned into an algorithm for finding a quasicanonical form of a PEPS TN.Starting from an arbitrary PEPS, we can apply a "trivial" SU step with U ab = 1, without the truncation step.In other words, we essentially perform an SVD decomposition of the fused tensor in Fig. 4d, which yields tensor T a , λ , T b satisfying the local canonical condition of Fig. 4g.Repeating this step on all edges, if a fixed point is reached, it is by definition a quasi-canonical PEPS, and can be used to calculate local RDMs via the λ weights (see Fig. 3c).We call this algorithm trivial-SU, and note that it has already been used previously (see, e.g., the quasi-orthogonalization in Sec.B of Ref. [31], or the super-orthogonalization of Ref. [32]).See also interesting parallels between our BP construction, the trivial-SU and the NCD theory of Ref. [33] (see also Chapter 5 in Ref. [34]).

IV. BP-SU EQUIVALENCE
The trivial-SU algorithm and the BP algorithm for TN are two different algorithms for approximate contraction of PEPS.They originate from two very different places: the trivial-SU algorithm is a natural algorithm for TN, which relies on the Schmidt decomposition, whereas the BP algorithm is a message-passing algorithm for graphical models that originated from inference problems and the Bethe-Peierls approximation.It might therefore come as a surprise that these two algorithms are equivalent.In hindsight, this could have been anticipated, as both are iterative algorithms that are exact on trees.We prove Theorem IV.1 Every trivial SU fixed point corresponds to a BP fixed point such that the local RDMs computed in both methods are identical.
As a simple corollary of this theorem we conclude that if the trivial SU equations and the BP equations have a unique fixed point, both algorithms will yield the same RDMs.In light of this equivalence, the success of the imaginary time SU algorithm for many models (see, for example, models analyzed in Ref. [35]), is another example of the success of the Bethe-Peierls approximation.
To prove Theorem IV.1, we note that in the BP algorithm the TN remains fixed, while the BP messages evolve to a fixed point.In the trivial-SU, there are no messages, but the local tensors that make up the TN evolve until they converge to a quasi-canonical fixedpoint, without changing the underlying quantum state.In both cases, the evolution is done via local steps.Our proof uses two lemma: Lemma IV.2 Let T , T be two tensor-networks that represent the same state |ψ , such that T is obtained from T using a single trivial SU step on tensors T a , T b and the λ weight between them.Then every BP fixed-point of T has a corresponding fixed-point of T with the same RDMs, and vice-versa.
The idea of the proof is to show that the fixed point messages of the new TN can be constructed from the fixed point messages of the old TN, except for the local place of change, where the messages are adapted to fit the new tensors.The full proof of the lemma is given in Appendix B.
Using this lemma repeatably along the trivial-SU iterations, we conclude that the BP fixed points of an initial TN is equivalent to those of its quasi-canonical representation.To finish the proof, we show that the BP fixed point of a quasi-canonical PEPS yields the same RDMs as the λ weights do: Lemma IV.3 Given a TN in a quasi-canonical form (i.e., a fixed point of the trivial SU algorithm), it has a BP fixed point that gives the same RDMs estimates as those of the quasi-canonical form based on the λ weights.
The idea of the proof is that after "swallowing" a √ λ of each λ tensor in its two adjacent T a , T b tensors, we reach a PEPS for which the messages m a→b (x, x ) = λ x δ x,x are a BP fixed-point.The full proof is found in Appendix B.
In this section we present some numerical tests that compare the BP method to the trivial simple-update method.The asymptotic complexity of one step in both algorithms is similar.For example, on a square grid with virtual bond dimension D and physical bond dimension d, both steps take O(dD 5 ) basic arithmetic operations.What might be more interesting is the number of iterations needed for convergence, which can be tested numerically.We compared these numbers on two types of PEPS on finite square grids.One type was a random PEPS with normal complex entries and the other was an approximate ground state of the anti-ferromagnetic Heisenberg model on a square lattice with random, nearest-neighbors coupling.For every model we ran the BP algorithm and the trivial-SU algorithm on 20 − 50 random instances to calculate the 2-body RDMs.For each instance we calculated the ratio of the convergence time (i.e., number of iterations until convergence) T BP /T tSU .The results are given in Table I.While for the random PEPS instances the BP method seems slightly faster, in the AFH models, where correlations are of longer range, the trivial-SU seems to converge faster, in particular as the bond dimension D increases.The full histogram of the results, as well as the full numerical details can be found in Appendix C.

VI. DISCUSSION
In this work we established a bridge between the world of graphical models and tensor networks.We have defined the Belief Propagation method for PEPS contraction, which can be viewed as the ordinary BP method applied for Double Edge Factor Graphs (DEFG) [25] that are derived from the PEPS tensor network.Just as in the ordinary graphical models case, the fixed points of the BP iterations correspond to extreme points of the Bethe free-energy, which is defined for the underlying PEPS TN.We have shown that the BP algorithm is equivalent to the trivial SU algorithm on PEPS tensor networks, which leads to a quasi-canonical form.This correspondence has some interesting implications.First, since the fixed points of the imaginary time SU algorithm is a quasi-canonical PEPS, our result implies that its SU approximate environments correspond to a Bethe-Peierls approximation.The success of the SU algorithm can therefore be seen as another example of the power of the Bethe-Peierls approximation.Second, it shows that one of the simplest algorithms for approximating marginals in the world of graphical models is equivalent to one of the simplest approximate contraction algorithms in the world of tensor networks.
The equivalence of these two methods, which come from different fields and are derived from different principles, is interesting for several reasons.From the practical point of view, we can try to "import" other, more sophisticated algorithms for marginal approximations to the world of tensor networks.A natural candidate is the Generalized Belief-Propagation (GBP) algorithm [24], which generalizes the BP algorithm by considering messages from larger regions in the graph.Just as the BP algorithm converges to extreme points of the Bethe free-energy, the GBP algorithm converges to extreme points of the Kikuchi free-energy of the cluster variation method [36,37].As shown in Ref. [24], this algorithm provides a much better approximation of the marginals, at the price of a higher computational cost.It would be interesting to compare the performance of this algorithm, as well as other BP improvements [38][39][40][41][42], when acting on TNs to that of more accurate contraction algorithms, such as the corner transfer method (CTMRG) [8,9], the tensor-network renormalization group (TRG) [15,43] and boundary MPS (bMPS) [44], to name a few.
From the theoretical prospective, it would be interesting to better understand the physical and mathematical role of the complex Bethe free-energy that we have derived.In particular, we know that for tree-tensor networks, it is related to the Schmidt decomposition.Can we somehow relate it to the underlying entanglement structure also when the underlying graph has loops?Another interesting question is whether the BP equations can be used to analytically analyze models for which the ground state is a known PEPS, such as the AKLT model.and in coding theory as 'sum-product algorithm' [19].The name 'belief propagation' was coined by J. Pearl, who used it in the context of Bayesian networks [18,45].
The main objects in the BP algorithm are "messages" between factor and nodes and vice versa.To write them, let us define N i as the set of factors in which x i participates, and similarly N a to be the set of variables of the factor f a (i.e., adjacent nodes to f a ).A message from a factor f a to an adjacent node i ∈ N a is a non-negative function m a→i (x i ) and a message from node i to factor a is a non-negative function m i→a (x i ).The BP algorithm starts by initializing the messages (say, randomly), and then at each step, the messages are updated from the messages of the previous step by the local rules (see also Fig. 6): If the messages converge to a fixed point, they can be used to estimate the marginals on subsets of nodes.For example, the marginal of a single variable x i is given by where N is a normalization factor.The marginal over the variables of a factor are given by These two expressions are demonstrated in Fig. 7 For tree graphical models, the BP messages are promised to converge to a unique fixed point in linear time, and formulas (A3, A4) give the exact marginals [22].
When the underlying graph has loops, the BP algorithm is called 'loopy BP ', and the formulas for the marginals become, essentially, uncontrolled.Moreover, it is not known how fast the algorithm will converge, if ever, or if it has a unique fixed point.Nevertheless, in many practical cases, the loopy BP provides surprisingly good result.
While, a general theory to explain the performance of BP algorithm is still lacking, there are some partial results in this direction.An important result is due to Yedidia et al [24], who highlighted the correspondence between the Bethe-Peierls approximation and fixed points of the BP algorithm, which we now explain briefly.The starting point are models defined on tree graphs.A simple observation is that for these models, the global probability distribution can be written in terms of its local marginals: where P a (x a ) is the marginal on the nodes adjacent to a ∈ F, and P i (x i ) is the marginal of x i .Finally, is the number of factors that are adjacent to x i .Using Eq. (A5), we can write the energy in terms of the local marginals: The above expression is called the Bethe free-energy.When the underlying graph is not a tree, the Bethe freeenergy is still well defined, but no longer equal the exact free-energy.In such case, we can use it to approximate the local marginals.We write the Bethe free-energy as a function of unknown marginals {q a (x a ), q i (x i )}, and then we estimate the real marginals {P a (x a ), P i (x i )} by finding the {q a (x a ), q i (x i )} that minimize the Bethe free-energy.This procedure is exact on trees, where the Bethe free-energy is equal to the exact free energy, but on loopy graphs it is essentially an uncontrolled approximation; the resultant q a (x a ), q i (x i ), may be far from the actual marginals, and in fact, they might not be marginals of any underlying global distribution.Nevertheless, decades of experience in statistical mechanics has shown that this is often a good approximation that gives better results than simple mean field.In Ref. [24] it was showen that there is a one-to-one connection between the fixed-points of the BP equations and the extreme points of the Bethe free-energy.The Lagrange multipliers used to minimize the latter become the fixed-point BP messages, and the local marginals coincide.This connection between a message-passing inference algorithm, and a variational approach gave rise to a plethora of other message-passing algorithms, such as generalized belief propagation (GBP), which are based on more sophisticated free energies, such as Kikuchi's cluster variation method [36] 2. Mapping a PEPS tensor network to a graphical model In this section we present a mapping that takes a PEPS TN to a graphical model.Relations and dualities between graphical models and tensor networks have been studied over the years by several authors [17,46,47].Our approach shares some similarities with these works, but in particular builds on the Double Edge Factor Graph (DEFG) formalism of Ref. [25].This allows us to transform a tree tensor-network into a tree graphical model, and it also has the desirable property of messages being positive semi-definite matrices.
The mapping between PEPS and DEFG is illustrated in Fig. 8. Let |ψ be the many-body quantum state that is described by our TN, and consider the tensor network corresponding to ψ|ψ , in which, we clump every edge in |ψ with its equivalent edge in ψ| (see Fig. 8b).We call such pairs of edges 'double edges'.They run over D 2 values of the double indices (x, x ) of the ket and the bra TN.We map this TN into a graphical model as follows: • We associate every double edge with a node so that its double indices (x i , x i ) now become a single variable in the graphical model.We denote this pair by a single variable z i = (x i , x i ), and notice that it runs over D 2 discrete values.• We associate the contraction of every pair T a , T * a of bra-ket tensors along their physical leg with a factor.See Fig. 8b,c.Specifically, let T µ a;x1,...,x k be the PEPS tensor at node a with µ being the physical leg, then the resultant factor is given by See Fig. 9.As in the body of the paper, we write x a = (x 1 , . . ., x k ), x a = (x 1 , . . ., x k ), and z a = (z 1 , . . ., z k ) for the variables of the factor a. With this notation, we may write f a (z a ) = f a (x a , x a ).Definition A7 immediately implies that as a matrix, f a (x a , x a ) is positive semi-definite.
• Graphically, variable nodes are denoted by circles, and factors by squares.Adjacent variables and factors are connected by double lines (edges) that correspond to the double variable z i = (x i , x i ) that they represent.See Fig. 8.
With these definitions, the resultant graphical model is called a DEFG and describes the function Writing P (z) as P (x, x ), the positivity of the individual f a (x a , x a ) implies that also P (x, x ) is a positive semidefinite function.We can therefore interpret it as the density matrix of some fictitious quantum states that "lives on the edges of the PEPS", although it has a non-conventional normalization because Tr P = x,x P (x, x) is not necessarily equal to 1 (instead, it is x,x P (x, x ) = 1).
Once the factor graphical model is defined, we can run the BP iterations on it, which are simply the usual BP equations (A1, A2) with x i replaced by the double-edge variable z i .It is easy to see that these equations are equivalent to Eq. ( 2) in the paper by noting that every node i is adjacent to exactly two factors a, b (because it corresponds to an edge in the PEPS connecting two vertices), and therefore by Eq. (A9), a→i (z i ), which we identify with m (t+1) a→b from Eq. ( 2).Moreover, the summation za\{zi} in Eq. (A10) is exactly the contraction of the virtual legs in Eq. ( 2) and Fig. 1c.Finally, note that as f a (x a , x a ) are semi-definite, then Eqs.(A9, A10) imply that if the messages at time t are positive semi-definite then so are the messages at t + 1.
The above discussion shows that like in the ordinary graphical models, also here fixed points of the BP iterations are solving a Bethe-Peierls type of approximation.In particular, defining the local "marginals" We can write a complex Bethe free-energy which is defined by first choosing a specific branch of the logarithmic function.Note that in this case, because there are always exactly two adjacent factors to each variable z i , and so It is not very hard to show that the even though P a (z), P i (z i ), f a (z a ) might take complex values, F Bethe must be real.In Ref. [25] it was argued that also in this case fixed points of the BP iterations correspond to extremum points of the above functional.We note, however, that unlike the ordinary case, we see no reason why the complex Bethe free-energy should be positive.Interestingly, in all of our numerics, it was positive.Assume a trivial-SU step changes the TN T to T by locally changing the adjacent tensors T a , λ, T b to T a , λ , T b , while keeping the rest of the tensors fixed (see Fig. 4a-f with trivial U ab = 1).To simplify the book-keeping, we "swallow" the λ tensors in the T a tensors by splitting every λ tensor into λ = √ λ • √ λ and contracting each √ λ with its adjacent T a tensor, see Fig. 10.We denote the resulting tensor networks by F, F , and note that their local tensors are identical except for the F a , F b and F a , F b which are equal to the T a , T b , T a , T b tensors contracted with the appropriate √ λ tensors.The fact that the contraction of (T a , λ, T b ) is equal to the contraction of (T a , λ , T b ) implies that the contraction of (F a , F b ) is equal to the contraction of (F a , F b ).
Let {m a→b (x, x )} be fixed-point BP messages of F. We will use these messages to construct fixed-point BP messages {m a→b (x, x )} of F that give the same RDMs.All messages except for the a → b and b → a messages remain the same.The a → b and b → a messages are defined by the BP iterative equations using the new tensors F a , F b so that they will satisfy them.For example, if tensor F a is connected also to tensors F c , F d in addition to F b , then m a→b (x, x ) is given by the diagram in Fig. 1c, replacing T tensors by corresponding F tensors.To finish the proof, we need to show that this new set of messages (i) is a BP fixed-point, and (ii) produces the same RDMs according to the BP formula (see Fig. 2).Clearly, for adjacent vertices that have nothing to do with a, b, both conditions hold trivially, as the relevant messages and underlying tensors are unchanged.Let us then verify these points for vertices in the vicinity of a, b.
(3) a. Checking point (i): By definition, the a → b and b → a messages satisfy the BP equations.So we only need to verify that other messages from a or b (but not between them) satisfy the BP equations.Consider, for example, the message b → e in Fig. 11a.We need to verify that m b→e (x, x ) is indeed a BP fixed-point, given as the appropriate expression of m a→b , m f →b (see Fig. 1c for the BP equations).This is proved in Fig. 11b in a series of 5 simple equalities (see the caption for full explanation), which rely on the fact that the original messages are fixed point of the BP equations, and that the contraction of F a , F b is equal to the contraction of F a , F b .
b. Checking point (ii): By definition if we are interested in 2-body RDMs on vertices that are different from both a and b, then the RDM estimate will remain the same because neither the relevant messages, nor the tensors changed.We only need to verify for the RDM ρ ab and RDMs that contain a or b with other adjacent node, such as ρ be .For the former, ρ ab = ρ ab because it depends on the incoming messages to the a, b nodes (which remain the same), together with the contraction of F a , F b , which by assumption is identical to that of F a , F b .For the latter, the proof uses the same idea as in point (i).Using the assumption that the contraction of F a , F b is identical to that of F a F b , and that F e = F e , it is easy to show that the 3-body RDM ρ abe is identical to that of ρ abe , from which we deduce that ρ be = ρ be .This concludes the proof of Lemma IV.2.

Proof of Lemma IV.3
As in the first lemma, we first define F to be an equivalent TN in which every λ weight tensor in T was split into √ λ • √ λ and the √ λ tensors are contracted into the T a tensors to give the F a tensors (see Fig. 10).Next we define a set of messages for every two adjacent vertices a, b, where λ is the weight on the ab edge in the original T tensor.We claim that (i) these messages are BP fixed-point on the F TN, and (ii) they give the same 2-body RDMs as those of the trivial SU method of quasi-canonical T .Both claims are immediate.Claim (i) follows by writing the BP equation for the a → b message in terms of the F a tensor, and noticing that this expression is equal to it gives the λ x δ x,x using canonical condition on T .This is illustrated in Fig. 12. Claim (ii) follows from definitions of the 2-body RDMs of the BP method and the SU method (see Fig. 3e, and Fig. 2). FIG.
12. The proof of Lemma IV.3: defining the BP messages by Eq. (B1) and using the canonical condition shows that these messages are fixed point of the BP equations.analysis.
In the random PEPS configurations, the tensor entries where chosen as a + ib, where a, b were uniformly distributed in (−1, 1).In the AFH configurations, we used random couplings J ab uniformly distributed in the interval (−1, 0).To obtain the ground states of these models, we ran an imaginary time evolution with simple-update, with decreasing values of imaginary time steps by δτ = 0.1, . . ., 0.0001.After obtaining an approximation to the ground state, we applied a random local gauge change on every bond in order to get a TN that is far away from a canonical form.Specifically, for every virtual edge (a, b), we drew a random matrix V ab which was a product of a random unitary with a diagonal with random entries between (0.5, 2).We then inserted the identity V −1 ab V ab = 1 in the middle of the edge, absorbing V −1 ab in T a and V ab in T b .This way, the resultant TN was far from quasi-canonical, yet represented the same approximate ground state.
The convergence criteria for both BP and trivial-SU was taken with respect to the averaged trace distance of 2-body RDMs between consecutive iterations (see Fig. 3c for trivial-SU and Fig. 2

FIG. 1 .
FIG. 1. BP messages on a tree PEPS TN.(a) A local patch of a PEPS defined on a tree.The (a, b) edge defines a bipartition of the system into two branches.(b) The TN that corresponds to ψ|ψ .Also here the (a, b) edge defines two branches.The tensors that result from the contraction of each branch are the messages m a→b (x, x ) and m b→a (x, x ).(c) The messages satisfy a recursion relation, which are used to define the BP equations (2).

FIG. 3 .
FIG. 3. The properties of a canonical representation of a tree PEPS.(a) An example of a canonical representation of a tree PEPS and its relation to the Schmidt decomposition.The empty circles are diagonal tensors that correspond to the Schmidt weights λ. (b1) The orthonormality of the Schmidt bases implies a simple formula for the contraction of the left and right branches and (b2) a local canonical condition on the PEPS tensors.(c) A local expression for the reduced density matrices.

FIG. 4 .
FIG.4.The Simple-Update steps of applying a "gate" Uij on the tensors {Ti, λ, Tj}, and updating them to {T i , λ , T j }.(a) The original tensors.(b) Applying Uij.(c) The Ta, T b tensors and all surrounding λ weights are contracted into one big tensor, which is reshaped as a matrix.(d) An SVD is performed on the matrix (e) and (f) a trivial λiλ −1 i pairs are inserted to the external legs and define the new T a , λ , T b tensors.At this step, one can truncate the smallest weights of λ to reduce the bond dimension back to D. (g) If no truncation was done, the resulting tensors satisfy some of the local canonical conditions.

FIG. 7 .
FIG.7.Calculating the local marginals from the BP messages by the formulas in Eqs.(A3,A4).These formulas give the exact marginals when the underlying graphical model is a tree.

FIG. 8 .
FIG. 8. Mapping a tensor network to a graphical model of type double edge factor graph.

FIG. 10 .
FIG.10.Swallowing the λ weights in the T tensors and obtaining an equivalent TN with F tensors.The empty circles denote a Simple-Update weight tensor λxδxy and the red circle denote its square root: √ λxδxy.

FIG. 11 .
FIG. 11.(a) The description of the TN.canonical condition.(b) Proving that m b→e is given by the BP propagation of messages m a→b and m f →b : Equality (1) follows from definition, m b→e = m b→e .Then in (2) we use the assumption that m b→e is a fixed point of the BP equation, and similarly in (3) we use that assumption on m a→b .In (4) we use the fact that the contraction of Fa, F b is equal to the contraction of F a , F b , together with the definitions that all the new messages are equal to the old messages, except for the a ↔ b messages.Finally, in(5) we use the definition of m a→b which was designed to satisfy the BP equations.

4 FIG. 13 .
FIG.13.The ratio of the number of iterations that takes the BP algorithm to converge by that of the trivial-SU algorithm.The tests were on 4 different systems: a 4 × 4 and 10 × 10 anti-ferromagnetic Heisenberg models with random coupling, as well as 4 × 4 and 10 × 10 random PEPS.For one of these 4 cases, PEPS were used with bond dimension D = 2, 3, 4, and the statistics was generated using 20 − 50 different realizations.More details on the numerical procedure can be found the text body.

ab 1 <
10 −6 , where a, b denotes nearest-neighbors nodes and m is the total number of such neighbors.