Capturing exponential variance using polynomial resources: applying tensor networks to non-equilibrium stochastic processes

Estimating the expected value of an observable appearing in a non-equilibrium stochastic process usually involves sampling. If the observable's variance is high, many samples are required. In contrast, we show that performing the same task without sampling, using tensor network compression, efficiently captures high variances in systems of various geometries and dimensions. We provide examples for which matching the accuracy of our efficient method would require a sample size scaling exponentially with system size. In particular, the high variance observable $\mathrm{e}^{-\beta W}$, motivated by Jarzynski's equality, with $W$ the work done quenching from equilibrium at inverse temperature $\beta$, is exactly and efficiently captured by tensor networks.

Introduction.-Dynamical stochastic processes are used throughout the natural and social sciences when inaccessible degrees of freedom are well represented by random variables [1,2]. To calculate expected observable values, numerical methods are usually required. Out of equilibrium, the typical method is the dynamical Monte Carlo method [3][4][5][6][7][8]. Essentially, averaging over randomly sampled paths provides an unbiased estimate for the expected value of an observable. To obtain a fixed expected fractional error, the number of paths sampled must scale linearly with the variance divided by the square of the expected value. For a multitude of important observables, such as those appearing in the estimation of free energies [9] and likelihoods of rare events [10][11][12][13][14][15], this ratio is large: such observables are said to have high variance and sampling methods struggle when applied to them.
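To make this scaling concrete, the following minimal sketch (a toy example of our own, not taken from any particular simulation) estimates $\langle \mathrm{e}^X \rangle$ for Gaussian $X$ by naive sampling; the fractional error stays roughly fixed only if the sample size grows in proportion to the variance over mean squared.

```python
import numpy as np

rng = np.random.default_rng(0)

# High-variance observable: O = exp(X) with X ~ Normal(0, s^2).
# Exact mean is exp(s^2/2); variance over mean squared is exp(s^2) - 1,
# which grows rapidly with s, standing in for growth with system size.
def fractional_error(s, n_samples):
    x = rng.normal(0.0, s, size=n_samples)
    estimate = np.exp(x).mean()
    exact = np.exp(s**2 / 2)
    return abs(estimate - exact) / exact

# For a fixed fractional error, the required sample size scales as
# v = Var[O]/<O>^2, i.e. n ~ v / epsilon^2.
for s in (0.5, 1.0, 1.5):
    v = np.exp(s**2) - 1.0
    n = int(100 * v)   # n proportional to v targets a fixed error ~ 0.1
    print(f"s={s}: v={v:.2f}, fractional error with n={n}: "
          f"{fractional_error(s, n):.3f}")
```

Since $v$ here grows as $\mathrm{e}^{s^2}$, holding the error fixed requires exponentially many samples, which is the regime where sampling struggles.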
Here, we present an approach that is very different from sampling. We simultaneously follow all paths, which is made efficient by using controlled data compression, usually approximate but exact in special cases, based on tensor networks. While tensor networks have previously been used in conjunction with stochastic processes [16-20], the question of how their performance relates to variance has remained unanswered. Understanding this is crucial if we are to know whether or not tensor networks, which have had a revolutionary effect in simulating quantum systems [21-27] and have been used to great effect in solving partial differential equations [28-31], provide a useful and perhaps essential complementary technique to sampling in stochastic processes.
In this Letter, we address this question and our answer is very clear: high variance does not limit the accuracy of tensor network compression, and tensor networks can be applied efficiently to tackle problems, even out of equilibrium, for which sampling-based methods struggle. This opens the door for the use of tensor network methods on a wide variety of nonequilibrium stochastic systems for which capturing high variance is essential. In particular, we show that a distribution of weighted expectation values of the high-variance observable $\mathrm{e}^{-\beta W}$, with $W$ the work done quenching from equilibrium at inverse temperature $\beta$, is represented exactly by a highly compressed tensor network.
We focus on an Ising system, an example of which is shown in Fig. 1(a). It comprises $N$ nodes, labeled by $l$, the configuration $z_l$ of each taking one of $d = 2$ discrete values $z_l \in \{-1, 1\}$. The configuration of all $N$ nodes is given by the $N$-tuple $z = (z_1, \ldots, z_N)$ and the probability of being in configuration $z$ is $P(z)$. Tensor networks best suit systems for which crucial quantities, like energy, are $n$-bodied with small $n$; our example is the two-bodied Ising energy

$$E(t) = -J \sum_{\langle l l' \rangle} z_l z_{l'} - \lambda(t) \sum_l z_l, \qquad (1)$$

where the first sum runs over the $N_E$ edges $\langle l l' \rangle$ of the interaction graph, $J$ is the coupling strength, and $\lambda(t)$ is a bias, with the corresponding Gibbs distribution $P_E(z) = \mathrm{e}^{-\beta E(z)}$ at inverse temperature $\beta$. A distribution $P(z)$ is represented by a tensor network

$$P(z) = \sum_i \prod_l A^{[l] z_l}_{i_l}, \qquad (2)$$

that shares the same geometry as the interactions. Here, $A^{[l]}$ is a tensor associated with node $l$. It has a physical index $z_l$ and $k_l$ auxiliary indices $i_l$, one for each edge connected to $l$, and each taking one of $\chi$ values, which may in principle be large. The sum is over the values taken by all auxiliary indices, which is just a sum over an $N_E$-tuple $i$ of indices. The Gibbs distribution is important because the tensor network representation is exact for $\chi = d$ (see Supplemental Material [32]). This implies that $\sum_l d^{k_l + 1}$ numbers may be used to represent $d^N$ others, providing a significant yet exact compression if the degrees $k_l$ are limited, as in lattice systems with local interactions.

FIG. 1 (color online). Tensor network compression. (a) An Ising system whose degrees of freedom (blue circles with arrows) interact, in this case, with a two-dimensional lattice geometry (red lines). (b) The probability distributions $P_E(z)$, $P(z,t)$, and $Q(z,t)$ (see main text) at any time $t$ are compressed by representing them (approximately or, in special cases, exactly) by a contraction of tensors (green circles) with the same geometry as the interactions. Each black leg corresponds to an index of a tensor, and the joining of two legs represents the contraction of the two corresponding indices.
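A minimal sketch of this exact $\chi = d$ compression for an open 1D chain (illustrative couplings of our own; the bond index simply records the previous spin, which is one of several equivalent constructions):

```python
import itertools
import numpy as np

# Exact chi = d = 2 matrix-product representation of the unnormalised
# Gibbs weights P_E(z) = exp(-beta*E(z)) for an open Ising chain with
# E(z) = -J*sum_l z_l z_{l+1} - lam*sum_l z_l (illustrative parameters).
N, beta, J, lam = 6, 1.0, 0.7, 0.3
spins = (-1, 1)

def site_tensor(first):
    # A[z] is a length-chi vector on the first site, else a chi x chi matrix;
    # the bond index remembers the spin of the previous site.
    A = {}
    for z in spins:
        if first:
            v = np.zeros(2)
            v[spins.index(z)] = np.exp(beta * lam * z)
            A[z] = v
        else:
            m = np.zeros((2, 2))
            for i, zprev in enumerate(spins):
                m[i, spins.index(z)] = np.exp(beta * (J * zprev * z + lam * z))
            A[z] = m
    return A

tensors = [site_tensor(l == 0) for l in range(N)]

def weight_mps(z):
    vec = tensors[0][z[0]]
    for l in range(1, N):
        vec = vec @ tensors[l][z[l]]
    return vec.sum()

def weight_exact(z):
    E = -J * sum(z[l] * z[l + 1] for l in range(N - 1)) - lam * sum(z)
    return np.exp(-beta * E)

for z in itertools.product(spins, repeat=N):
    assert abs(weight_mps(z) - weight_exact(z)) < 1e-12 * weight_exact(z)
print("chi=2 tensor network reproduces all", 2**N, "Gibbs weights exactly")
```

Here $2 \times 2 + (N-1) \times 2 \times 2 \times 2$ numbers encode all $2^N$ weights, the polynomial-for-exponential trade the text describes.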
As well as compression, tensor networks offer a means of calculating the partition function $Z_E$, since contracting all indices of the network in Eq. (2), physical as well as auxiliary, yields $Z_E = \sum_z P_E(z)$ [33-36]. For a one-dimensional (1D) chain, the partition function $Z_E$ relates to a product of transfer matrices and requires $O(N d^2)$ or $O(N d^3)$ resources to compute for open or periodic boundaries, respectively. In higher dimensions (if the tensor network has a large treewidth [37,38]) the tensor contractions cannot, in general, be performed both exactly and efficiently, but efficient strategies exist to perform them approximately. Levin and Nave [35] demonstrated that this can be done accurately for two-dimensional (2D) noncritical lattice systems using the tensor renormalization group [39,40].
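The $O(Nd^2)$ open-boundary contraction can be sketched as follows (illustrative parameters; the cost is $N-1$ vector-matrix products, checked against the $O(d^N)$ explicit sum for small $N$):

```python
import itertools
import numpy as np

# Partition function of an open Ising chain via transfer matrices:
# Z = b^T T^{N-1} 1, evaluated as N-1 vector-matrix products, i.e. with
# O(N d^2) resources instead of the O(d^N) explicit sum.
N, beta, J, lam = 10, 1.0, 0.7, 0.3
spins = np.array([-1.0, 1.0])

T = np.exp(beta * (J * np.outer(spins, spins) + lam * spins[None, :]))
b = np.exp(beta * lam * spins)      # boundary weight of the first spin

vec = b
for _ in range(N - 1):
    vec = vec @ T                   # one O(d^2) step per site
Z_transfer = vec.sum()

# Brute-force check, O(d^N):
Z_brute = sum(
    np.exp(beta * (J * sum(z[l] * z[l + 1] for l in range(N - 1))
                   + lam * sum(z)))
    for z in itertools.product((-1, 1), repeat=N)
)
print(Z_transfer, Z_brute)
assert abs(Z_transfer - Z_brute) < 1e-8 * Z_brute
```

For periodic boundaries one instead traces a product of matrices, giving the quoted $O(Nd^3)$ cost.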
Contrastingly, estimating the partition function $Z_E$ directly by evaluating the sum $Z_E = \sum_z P_E(z)$ through random sampling is made difficult by the fact that, in general, the variance of observables requiring estimation grows quickly with system size [9]. The tensor network representations of $P_E(z)$ and $Z_E$ show that high variance does not imply difficulty in equilibrium, away from criticality.
Nonequilibrium.-Out of equilibrium, the dynamics of a Markovian system [41] depends only on its current configuration, and the evolution of the distribution $P(z,t)$ is described by a master equation of the form

$$\frac{\mathrm{d}}{\mathrm{d}t} P(z,t) = \sum_{z'} H(z,z',t)\, P(z',t). \qquad (3)$$

Each non-negative off-diagonal element $H(z,z',t)$ for $z \neq z'$ is the Poisson rate of a transition from $z'$ to $z$ at time $t$, and together these fix the nonpositive diagonals $H(z,z,t) = -\sum_{z' \neq z} H(z',z,t)$ such that the normalization of $P(z,t)$ is conserved. $H$ is commonly referred to as the Hamiltonian.
To simulate such dynamics using nonequilibrium tensor network methods, we represent $P(z,t)$ at any time by a tensor network, as in Eq. (2), with a small $\chi$. Doing so assumes that this representation, while not necessarily exact, is accurate. There is no guarantee of this accurate compressibility on all occasions, but it is expected in many situations. For example, consider a quench from one Hamiltonian $H(z,z',0) = H_0(z,z')$ to another $H(z,z',\tau) = H_1(z,z') \neq H_0(z,z')$, where the system begins in the stationary state $P_0(z)$ satisfying $\sum_{z'} H_0(z,z') P_0(z') = 0$. After a time long compared to the convergence time scale $h^{-1}$ (where $h$ is some Hamiltonian-specific convergence rate), the system will converge to another stationary state $P_1(z)$ satisfying $\sum_{z'} H_1(z,z') P_1(z') = 0$. Numerous examples have revealed that stationary states of local stochastic processes are accurately compressible via tensor network representations [16-20]. Thus, in such quenches both initial and long-time distributions $P_0(z)$ and $P_1(z)$ are accurately compressible. Unlike for quantum systems [42], compression errors are limited even when a system is driven away from equilibrium. The example on which we focus here is that of a thermalizing (equilibrating) Hamiltonian $H(z,z',t) = H_E(z,z')$ for which the Gibbs distribution is stationary, $\sum_{z'} H_E(z,z') P_E(z') = 0$, and it is the energy $E(t)$ that is quenched by varying the bias $\lambda(t)$ appearing in Eq. (1).
The probability distribution $P(z,t)$ over configurations contains only partial information about the full probability distribution over the possible paths through configuration space taken by the stochastic process. As such, the expected values of only certain observables may be calculated from $P(z,t)$. These include observables whose values $O(z)$ depend on the configuration $z$ of the system at a single time $t$, thus having expected value $\langle O(t)\rangle = \sum_z O(z) P(z,t)$. We call such observables configuration dependent and use the example of the magnetization $M(z) = \sum_l z_l$. The values of some other observables depend on the full path taken by the system and their expected values cannot be calculated from $P(z,t)$. We call such observables path dependent, and use the example of the work done $W(t) = -\int_0^t \mathrm{d}s\, M(z(s))\, \dot{\lambda}(s)$ by varying $\lambda(t)$ between times $0$ and $t$.
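For a configuration-dependent observable, the expectation is a single sum against $P(z,t)$; a minimal sketch with the magnetization and a hypothetical product distribution (the bias $0.5$ is an illustrative choice, not a parameter from the text):

```python
import numpy as np

# Configuration-dependent observables need only P(z, t): here the
# magnetisation M(z) = sum_l z_l for N = 3 binary nodes, with
# configurations encoded as bit strings (bit l set  <->  z_l = +1).
N = 3
dim = 2**N
M = np.array([2 * bin(c).count("1") - N for c in range(dim)])

# Any normalised distribution will do for the sketch; e.g. a product
# (independent-node) distribution biased towards positive magnetisation,
# P(z) proportional to exp(0.5 * M(z)).
w = np.exp(0.5 * M)
P = w / w.sum()

# <M> = sum_z M(z) P(z); for independent nodes this equals N * tanh(0.5).
print("<M> =", M @ P)
```

Path-dependent observables like $W(t)$ have no such formula in terms of $P(z,t)$ alone, which is what motivates the distribution $Q(z,t)$ introduced next.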
Although not previously considered in the literature, the expected values of some path-dependent observables can indeed be calculated using tensor networks and, as we will show, provide us with a stark example of exact compressibility in the face of high variance out of equilibrium. The idea is to represent the relevant path-dependent information locally in time, not with $P(z,t)$, but through the distribution of weighted conditional expected values $Q(z,t) = P(z,t)\langle O(z,t)\rangle$. Here, $\langle O(z,t)\rangle$ is the expected value an observable has accumulated by time $t$ conditioned upon the system arriving at configuration $z$ at that time. The expected value of interest $\langle O(t)\rangle = \sum_z Q(z,t)$ is then obtained from $Q(z,t)$. It follows from Eq. (3) that the distribution $Q(z,t)$ evolves as

$$\frac{\mathrm{d}}{\mathrm{d}t} Q(z,t) = \sum_{z'} H'(z,z',t)\, Q(z',t), \qquad (4)$$

with $H'(z,z',t) = H(z,z',t) + \dot{o}(z,t)\,\delta(z,z')$, where $\dot{o}(z,t)$ is the rate of increase of the natural logarithm of the observable at configuration $z$ and time $t$.
It is desirable to predict, as we have for $P(z,t)$, the accuracy of compressing $Q(z,t)$ at any time during its evolution using tensor networks. Consider the quench in $\lambda(t)$ between times $0$ and $\tau$, starting from equilibrium. The distributions are initially equal, $Q(z,0) = P(z,0)$; thus, the accurate compressibility of the latter implies the same of the former. Additionally, $\dot{o}(z,t)$ is only nonzero for times $t < \tau$ and the stochastic evolution is ergodic. Thus, after a sufficiently long time $t \gg \tau$ (relative again to the convergence time scale $h^{-1}$), the configurations will have mixed such that $\langle O(z,t)\rangle = \langle O(t)\rangle$ is independent of $z$ and thus once again $Q(z,t) = \langle O(t)\rangle P(z,t)$ is as accurately compressible as $P(z,t)$.
Numerical examples.-We demonstrate these behaviors for a system undergoing thermalizing Glauber dynamics [43] via local transitions $H_E(z,z') = h(z,z')/(1 + \mathrm{e}^{\beta[E(z) - E(z')]})$ for $z \neq z'$. Here, $h(z,z') = h(z',z)$ are symmetric rates equaling a nonzero rate $h$ only where $z$ and $z'$ differ by the configuration of a single node. The energy is quenched via the parameter $\lambda(t)$ varying from $\lambda_0$ to $\lambda_1$ over time $0 \leq t \leq \tau$ according to a smoothed tanh ramp (see Supplemental Material [32]), as drawn in Fig. 2(b). We focus on the configuration-dependent observable $\mathrm{e}^{M(t)}$, and the path-dependent observables $\mathrm{e}^{-2\beta W(t)}$ and $\mathrm{e}^{-\beta W(t)}$. All have variance over mean squared growing exponentially with system size $N$. Initially, we consider the Ising nodes to be in an open 1D chain, illustrated in Fig. 2(a).
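Such thermalizing dynamics can be sketched for a small chain, assuming single-node Glauber rates of the logistic form $h/(1 + \mathrm{e}^{\beta \Delta E})$ (which satisfy detailed balance), and checking that the Gibbs distribution is indeed stationary; the parameters are illustrative:

```python
import numpy as np

# Glauber generator for an open Ising chain: single-node flips at rate
# h / (1 + exp(beta * dE)). Detailed balance then implies stationarity of
# the Gibbs distribution: sum_z' H_E(z, z') P_E(z') = 0 for every z.
N, beta, J, lam, h = 4, 1.0, 0.7, 0.3, 1.0
dim = 2**N

def spin(c, l):
    return 1.0 if (c >> l) & 1 else -1.0

def energy(c):
    z = [spin(c, l) for l in range(N)]
    return -J * sum(z[l] * z[l + 1] for l in range(N - 1)) - lam * sum(z)

H = np.zeros((dim, dim))
for c in range(dim):
    for l in range(N):
        cf = c ^ (1 << l)                        # flip node l
        H[cf, c] = h / (1.0 + np.exp(beta * (energy(cf) - energy(c))))
np.fill_diagonal(H, -H.sum(axis=0))              # conserve normalisation

P_E = np.exp(-beta * np.array([energy(c) for c in range(dim)]))
P_E /= P_E.sum()
assert np.allclose(H @ P_E, 0.0, atol=1e-12)
print("Gibbs distribution is stationary under Glauber dynamics")
```

The stationarity $H P_E = 0$ holds term by term, since each flip and its reverse cancel pairwise under detailed balance.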
To assess the accuracy of compression, we exactly calculate the distributions $P(z,t)$ and $Q(z,t)$ for small $N = 8$ at time $t$, then calculate the error in the expected value of observables induced by compressing the distributions as a tensor network. The errors shown in Figs. 2(c) and 2(d) for $\mathrm{e}^{M(t)}$ and $\mathrm{e}^{-2\beta W(t)}$, respectively, show that, despite large variances, expected values are relatively unaffected by tensor network compression. The distributions are exactly compressible at $t = 0$ and thus no error occurs, as expected. The errors due to compression initially increase as $\lambda(t)$ varies, then decrease exponentially to small values again on a time scale $\sim h^{-1}$. Interestingly, errors begin to decrease even at times $t < \tau$, prior to the end of the quench. For $\chi \gtrsim 4$, the compression is near exact at all times. We arrive at similar conclusions for the other types of variation tried, e.g., linear ramps, variations in $J$, and other observables.
A striking example is found in the path-dependent observable $\mathrm{e}^{-\beta W(t)}$. This observable has received particular attention due to its featuring in several nonequilibrium identities in statistical physics, such as that by Jarzynski [44]. Crucially for our discussion, the nonequilibrium distribution $Q(z,t) = P(z,t)\langle O(z,t)\rangle$ for this special case has an equilibrium structure $Q(z,t) = P_t(z) Z_t(\beta)/Z_0(\beta)$ [44], where we have used shorthands of the form $P_t(z)$ for the Gibbs distribution corresponding to $\lambda(t)$ and $Z_t(\beta)$ for the corresponding partition function (where from now on we normalize the Gibbs distribution to 1). It immediately follows from our discussion of systems in equilibrium that $Q(z,t)$, despite containing information about nonequilibrium high-variance observables, has an exact highly compressed $\chi = d$ tensor network representation at all times $t$. This can be used to efficiently and accurately calculate not only $\langle O(t)\rangle$ but a range of properties of the work distribution during such dynamics. Note that this exact behavior is particular to $\mathrm{e}^{-\beta W(t)}$ and does not even extend to its square $\mathrm{e}^{-2\beta W(t)}$.
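This equilibrium structure can be checked by brute force on a tiny chain (dense distributions with no tensor network compression, forward-Euler time stepping, a linear ramp, and illustrative parameters of our own): evolving $Q(z,t)$ under the tilted generator of Eq. (4) with $\dot{o}(z,t) = \beta M(z)\dot{\lambda}(t)$, the sum $\sum_z Q(z,\tau)$ should reproduce $Z_\tau(\beta)/Z_0(\beta)$, i.e. the Jarzynski result.

```python
import numpy as np

# Evolve P(z,t) via Eq. (3) and Q(z,t) via Eq. (4) for O = exp(-beta*W(t)),
# where W(t) = -int_0^t ds M(z(s)) lambdadot(s), so odot = beta*M*lambdadot.
N, beta, J, h = 4, 1.0, 0.5, 1.0
lam0, lam1, tau = 0.0, 1.0, 5.0
dim = 2**N

spins = np.array([[1.0 if (c >> l) & 1 else -1.0 for l in range(N)]
                  for c in range(dim)])
M = spins.sum(axis=1)                                  # magnetisation M(z)
EJ = -J * (spins[:, :-1] * spins[:, 1:]).sum(axis=1)   # coupling energy

def generator(lam):
    """Glauber generator for the instantaneous energy E = EJ - lam*M."""
    E = EJ - lam * M
    H = np.zeros((dim, dim))
    for c in range(dim):
        for l in range(N):
            cf = c ^ (1 << l)
            H[cf, c] = h / (1.0 + np.exp(beta * (E[cf] - E[c])))
    np.fill_diagonal(H, -H.sum(axis=0))
    return H

def gibbs(lam):
    w = np.exp(-beta * (EJ - lam * M))
    return w / w.sum(), w.sum()

P, Z0 = gibbs(lam0)            # start in equilibrium, so Q(z,0) = P(z,0)
Q = P.copy()
dt = 1e-3
steps = int(round(tau / dt))
lamdot = (lam1 - lam0) / tau                       # linear ramp
for n in range(steps):
    lam = lam0 + lamdot * (n + 0.5) * dt           # midpoint of the step
    H = generator(lam)
    P = P + dt * (H @ P)
    Q = Q + dt * (H @ Q + beta * lamdot * M * Q)   # tilted generator H'

_, Ztau = gibbs(lam1)
print("<e^{-beta W}> =", Q.sum(), "   Z_tau/Z_0 =", Ztau / Z0)
```

The agreement is limited only by the finite time step, mirroring the situation described below for TEBD, where compression of this particular $Q(z,t)$ is exact.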
We have so far demonstrated accurate single-time compressibility. We next examine how this extends to a dynamical tensor network simulation, where compression of $P(z,t)$ or $Q(z,t)$ occurs not only at a single time but at all times during their evolution. While one might expect the compression errors at single times to accumulate, we find this is mitigated by the ergodicity of the evolution (unlike in quantum systems). For example, errors in $P(z,t)$ will not change the distribution to which it converges, and thus the significance of transient errors diminishes, rather than accumulates, in time. In what follows, the specific algorithm we use to perform the evolutions of Eqs. (3) and (4) is time-evolution block decimation (TEBD) [20,45,46]. The TEBD algorithm uses a time step $\delta t$, resulting in an error, beyond that due to compression, of $O(N \delta t^2)$, and requires time $O(N \chi^3 \delta t^{-1})$ (see the Supplemental Material [32]).
We first calculated $\langle \mathrm{e}^{M}\rangle$ and $\langle \mathrm{e}^{-2\beta W}\rangle$, where $M = M(\tau)$ and $W = W(\tau)$ are values at the end of the quench $t = \tau$.
There are no exact values available to compare against for large systems, but the compressibility expected from Figs. 2(c) and 2(d) is confirmed by our TEBD results in Figs. 3(a) and 3(b), respectively. The calculated expected values converge approximately exponentially with increasing $\chi$, reaching acceptably converged values by $\chi \simeq 5$, despite the variance over mean squared being very large for the $N = 200$ considered.
We next calculated $\langle \mathrm{e}^{-\beta W}\rangle$. Since compression is exact for this observable if $\chi \geq 2$, the error $\epsilon_3$ is purely due to the finite time step $\delta t$ and, as stated above, scales as $\epsilon_3 \propto N$. Meanwhile, the estimated variance over mean squared $v_3$ scales as $\mathrm{e}^{O(N)}$, as shown in Fig. 3(c). It is then particularly clear with this example that, to achieve the same fractional accuracy (namely, $\epsilon_3 \propto N$) as we efficiently obtain, a naïve sampling method would require a sample size, and thus time, $\mathrm{e}^{O(N)}$, in contrast to the TEBD algorithm, which requires time $O(N)$. Explicitly, our $N = 200$, $\chi = 2$, $h\delta t = 10^{-4}$, $h\tau = 10$ calculation takes less than an hour and achieves an error $\epsilon_3 \approx 10^{-6}$. We do not compare this against the CPU time of any one sampling algorithm, as this choice is likely to be unrepresentative. Instead, we note that, since $v_3 \approx 10^{11}$, our accuracy would require $\approx 10^{17}$ samples to reproduce, and so matching our CPU time would require each path to be sampled in $\approx 10^{-13}$ s.
As our final example, we demonstrate both the diversity of information stored in the distribution $Q(z,t)$ and the application of our method to 2D lattice geometries, specifically, an $N = 64 \times 64$ square periodic lattice. We consider $\langle O(t)\rangle_S = \sum_{z \in S} P(z,t)\langle O(z,t)\rangle / \sum_{z \in S} P(z,t)$, the expected value of $O(t) = \mathrm{e}^{-\beta W(t)}$ given that the system's configuration $z$ is in subset $S$ at time $t$, where $S$ is the set of configurations in which one node has value $z_1 = 1$. The observable has a very large variance. Its conditional expected value can be rewritten $\langle O(t)\rangle_S = \sum_{z \in S} Q(z,t) / \sum_{z \in S} P(z,t)$. The numerator $\sum_{z \in S} Q(z,t)$ corresponds to a high-variance observable but can nevertheless be efficiently evaluated using equilibrium techniques for Gibbs distributions, due to the equilibrium structure of the nonequilibrium distribution $Q(z,t)$. In this case, we use the tensor renormalization group method [35] with intermediary dimension $\chi = 3$ (see Supplemental Material [32]), which suffices as the model parameters are far from criticality. The denominator $\sum_{z \in S} P(z,t)$ is simply the low-variance expected value of an observable taking value unity when $z(t) \in S$ and zero otherwise. This can be accurately calculated using our tensor network methods or the more common dynamical Monte Carlo methods [5-7]. We used the latter, with sample size $\sim 1.5 \times 10^4$ (see Supplemental Material [32]). The result is plotted in Fig. 3(d). Also plotted is $r_S(t)$, the ratio of the expected value $\langle O(t)\rangle_S$ to its value were the system in equilibrium at all times. This emphasizes that, despite exploiting an equilibrium structure, the dynamics being simulated are truly irreversible.
Discussion.-We have shown that tensor networks provide a way to overcome the challenges faced by sampling methods when estimating expected values of high-variance observables out of equilibrium, even finding an exactly compressible nonequilibrium example. While advanced techniques for variance reduction exist, such as sequential importance sampling [10-13] and branching methods [14,15], using these techniques usually requires judicious, model-specific choices based on prior intuition about the process. No such intuition or choices are needed for a tensor network calculation. However, whether tensor networks remain accurate and efficient for geometries beyond 1D and 2D lattices, and for other models, e.g., those describing frustration or disorder, must still be established.
Finally, let us comment on how our findings relate to the wider use of tensor networks. While outstanding efficiency is possible using dynamical tensor network algorithms for finite 1D pure quantum systems, there is an ongoing effort from the community to reach larger dimensions $\chi$ and sizes $N$ in 2D. Reference [47] gives a state-of-the-art demonstration in which the ground state of an $N = 21 \times 21$ system with $d = 2$ is calculated using $\chi = 8$. Meanwhile, we have seen here that often very small dimensions $\chi$ are required to simulate classical systems, even during real-time dynamics. Further, since only one copy of $P(z)$ is needed in classical algorithms, compared to two copies of the wave function in quantum algorithms, those that take time, e.g., $O(\chi^6)$ for quantum systems, will instead take time, e.g., $O(\chi^3)$ for classical systems. It may therefore be the case that classical stochastic systems are currently in an even better position than quantum systems to benefit from current high-dimensional tensor network algorithms.

FIG. 3 caption (fragment): (c) The variance over mean squared $v_3$ of $\mathrm{e}^{-\beta W}$ as a function of $h\tau$ and $N$, obtained by calculating $\langle \mathrm{e}^{-2\beta W}\rangle$ using $\chi = 10$ and $\delta t = 10^{-2}$, and using an exact result for $\langle \mathrm{e}^{-\beta W}\rangle$. (d) For a periodic $N = 64 \times 64$ system, the expectation value $\langle \mathrm{e}^{-\beta W(t)}\rangle_S$, conditional on one spin taking value $z_1 = 1$ at $t$ (full line). Also, the ratio $r_S(t)$ of this value relative to that for a reversible quench (dashed line). Unless stated otherwise, the parameters used are $\beta\lambda_0 = 0$, $\beta\lambda_1 = 1$, $h\tau = 10$, $\beta J = 1$, $N = 200$, and $h\delta t = 10^{-3}$.