Machine Learning Statistical Gravity from Multi-Region Entanglement Entropy

The Ryu-Takayanagi formula directly connects quantum entanglement and geometry. Yet the assumption of static geometry lead to an exponentially small mutual information between far-separated disjoint regions, which does not hold in many systems such as free fermion conformal field theories. In this work, we proposed a microscopic model by superimposing entanglement features of an ensemble of random tensor networks of different bond dimensions, which can be mapped to a statistical gravity model consisting of a massive scalar field on a fluctuating background geometry. We propose a machine-learning algorithm that recovers the underlying geometry fluctuation from multi-region entanglement entropy data by modeling the bulk geometry distribution via a generative neural network. To demonstrate its effectiveness, we tested the model on a free fermion system and showed mutual information can be mediated effectively by geometric fluctuation. Remarkably, locality emerged from the learned distribution of bulk geometries, pointing to a local statistical gravity theory in the holographic bulk.


I. INTRODUCTION
The holographic duality [1][2][3][4][5] is a duality between boundary d-dimensional quantum field theories and bulk (d + 1)-dimensional gravitational theories in asymptotically anti-de Sitter (AdS) space. It provides an appealing explanation for the emergence of spacetime geometry from quantum entanglement [6][7][8][9][10][11][12][13][14][15][16]. The connection is manifested in the Ryu-Takayanagi (RT) formula [17,18] S(A) = 1 4G N min γ A |γ A | that relates the entanglement entropy S(A) of a boundary region A to the area of the extremal surface γ A in the bulk that is homologous to the same region A. Progress has been made to reconstruct the bulk geometry from the boundary data in terms of geodesic lengths [19][20][21][22], extremal areas [23][24][25] or entanglement entropies [26,27]. A majority of the effort has been focused on reconstructing a classical geometry from single-region entanglement entropies (or independent extremal surfaces). However, multi-region entanglement entropies further encode the correlation among multiple extremal surfaces, which could reveal how the bulk geometry fluctuations around its classical background (assuming a semiclassical description of the bulk gravity). In this work, we will explore the possibility to extract information about fluctuating holographic bulk geometries from multi-region entanglement entropies of a quantum system using generative models in machine learning.
A feature of the holography entanglement entropy based on the RT formula is that the mutual information I A:B = S A + S B − S AB vanishes between two disjoint boundary regions A and B that are far separated from each other [28,29], because the minimum surface enclosing the combined region AB will be a disjoint union of γ A and γ B such that the entropies simply add up as S AB = S A +S B , leaving no room for mutual information. While the vanishing mutual information is a correct feature of holographic conformal field theories (CFT), it is not generally the case for many other quantum systems (e.g. free-fermion CFT). One idea to remedy the problem is to introduce bulk matter fields to mediate the mutual information [30][31][32]. Another possibility is to consider statistical fluctuations of bulk geometries such that γ A and γ B are correlated to produce the finite mutual information. The statistical gravitational fluctuation may be viewed as an effective description arising from tracing out bulk matter fields. We will further explore the second possibility of fluctuating geometry using a concrete model of random tensor network (RTN) [33,34] with fluctuating bond dimensions. The bond dimension fluctuation translates to the bulk geometry fluctuation in the context of tensor network holography [35][36][37], which is presumably governed by some statistical gravity model. However, it is unclear what should be the appropriate bulk statistical gravity model that best reproduces the entanglement feature of a given quantum system on the boundary. To address this challenge, we propose to apply data-driven and machine-learning approaches to uncover the statistical gravity model behind the observational data of quantum many-body entanglement. What needs to be learned is the joint probability distribution of bond dimensions (or bulk geometries). 
Generative models [38][39][40][41][42][43][44] in machine learning provide us precisely the tool to learn unknown distributions from data. In particular, we apply a deep generative model [39,41] to describe the bulk geometry fluctuation. We train the model by matching the model predictions of multi-region entanglement entropies with their actual values evaluated in the given quantum many-body state. After training, the generative model should tell us the statistical gravity model that emerges from learning.
The approach developed in this work extends the general idea of entanglement feature learning [26], which aims to reconstruct the bulk geometry by learning from the entanglement data on the boundary. Compare to the previous work, this study makes significant progress in including the gravitational fluctuation in the model, which will enable us to learn an emergent gravity theory rather than a static classical geometry. We will focus on (1+1)D quantum systems, and assume that the system admits an approximate semiclassical geometry description in the arXiv:2110.01115v1 [hep-th] 3 Oct 2021 holographic bulk. Based on a random tensor network model with fluctuating bond dimensions, we first establish a holographic model for quantum entanglement involving a scalar matter field on a statistically fluctuating spatial geometry. Applying our approach to a freefermion CFT state with a large central charge, we uncover a statistical gravity model governed by Weyl field fluctuations propagating on the hyperbolic background geometry. We show that the Weyl field fluctuation has the emergent bulk locality by studying its bulk correlation. We further analyze the spectrum and the leading collective modes of the emergent gravity theory. We also show that the matter field mass gets renormalized by the gravitational fluctuation as expected.

A. Random Tensor Network Model
The random tensor network (RTN) model is an intuitive toy model for holographic duality, which directly connects quantum states and emergent geometries. The original proposal [33] of RTN assumes a fixed bond dimension on every link of the tensor network. It can be generalized to include bond dimension fluctuations (or more precisely, bond entanglement fluctuations) [34,45]. The generalized RTN model in consideration is defined as follows: (i) A planar graph G = (V, E) is given to describe the background network geometry, where V denotes the vertex set and E denotes the edge set. V = V blk ∪ V bdy is divided into two subsets: the bulk V blk and the boundary V bdy sets, see Fig. 1(b). (ii) A local Hilbert space H e v is associated with each pair (v, e) of vertex v ∈ V and its adjacent edge e ∈ E (for e not adjacent to v, the associated Hilbert space is considered trivial H e v ∼ = C), see Fig. 1 The probability measure of |Ψ in the RTN ensemble E RTN is given by P (|Ψ ) = P (|ψ )P (|φ ). The vertex state distribution P (|ψ ) = v∈V blk P (|ψ v ) is assumed to be factorized, and on each vertex, the distribution P (|ψ v ) is taken to be the Haar measure (i.e. uniform random states in H v ). The edge (link) state distribution P (|φ ) is generally a nontrivial joint distribution depending on all |φ e on all edges, which allows the quantum entanglement across different edges to fluctuate collectively. For any operator O (k) defined in k copies of the boundary Hilbert space H ⊗k bdy , its expectation value in the product state |Ψ ⊗k is defined to be We assume that the correlation between denominator and numerator is not important (which is generally valid in the semiclassical regime when fluctuations are weak), so that we can approximate the ensemble average of the ratio by the ratio of separate averages, where N k = E|Ψ ∈ERTN Ψ|Ψ k is the kth moment of the state norm squared. 
For example, the 2nd Rényi entanglement entropy S A (or more precisely, the purity e −S A ) of RTN states in a boundary region A can be calculated by taking k = 2 and O (k) = X A (the swap operator supported in region A), We will suppress the Rényi index throughout this work, and use S A to denote the 2nd Rényi entropy. The RTN model provides an effective description of entanglement entropies of typical quantum states on the holographic boundary, given the background geometry G together with fluctuations of states |ψ v , |φ e in the holographic bulk. It worth mention that in modeling the 2nd Rényi entanglement entropy by Eq. (4), the average over the RTN ensemble E RTN is taken neither on the state vector level (i.e. not a pure state superposition E|Ψ ∈ERTN |Ψ ), nor on the density matrix level (i.e. not a mixed state superposition E|Ψ ∈ERTN |Ψ Ψ|), but on the double density matrix level (as E|Ψ ∈ERTN (|Ψ Ψ|) ⊗2 ). The same average strategy commonly appeared in random tensor network/quantum circuit literatures [26,33,34,[46][47][48]. Such average may not have direct physical realization, nevertheless it defines a RTN model for entanglement entropy which can produce (i) positive mutual information I A:B that does not vanish between distant regions and (ii) possibly negative tripartite information I A:B:C (see Appendix A for a perturbative proof). These features indicate that the generalized RTN model is expressive enough to describe quantum chaotic states with information scrambling [49,50] and to capture mutual information between distant entanglement regions, which goes beyond holographic CFT states.

B. Ising and Dual Ising Models
Evaluating the ensemble average in Eq. (4) following the approach developed in Ref. 33, the RTN purity e −S A can be map to the partition function of an Ising model on the graph G with fluctuating coupling constants with P [σ|J] given by and P [J] given by δ J e − S(|φ e ) P (|φ ).
Here σ v = ±1 is the Ising variable defined on every vertex v ∈ V , J e ≥ 0 is the ferromagnetic coupling strength on every edge e ∈ E. J e is determined by S(|φ e ), the 2nd Rényi entropy of the state |φ e (entangled between the Hilbert spaces H v+e and H v−e where v ± are the two vertices on the boundary of e). J e characterizes how much the tensors are entangled with each other across the edge e in the tensor network. It corresponds to the notion of bond dimension when |φ e is maximally entangled. The distribution P [J] describes the how the effective bond dimension (bond entanglement) fluctuates in the RTN ensemble. Finally, the partition function is subject to the boundary condition that is set by the boundary region A of S A , ∀v ∈ V bdy : which is denoted as δ[σ bdy ⇔ A] in Eq. (5). The partition function Z[J] properly normalizes the Boltzmann weight of the Ising model, such that S A = 0 when the entanglement region A = ∅ is empty. Given that G is a planar graph [51], we can use the Kramers-Wannier duality to rewrite the Ising model Eq. (5) on the dual latticeG = (Ṽ ,Ẽ), as shown in Fig. 1(c), whereṼ corresponds to the set of faces in G andẼ ∼ = E. The dual Ising model takes the similar form with P [σ|J] given by  The RT formula can be recovered in the classical limit when the RTN bond dimensions are large and fixed, which corresponds to the deep ferromagnetic phase of the original Ising model (J e 1) or equivalently the deep paramagnetic phase of the dual Ising model (Jẽ 1). In such limit, the dual Ising correlation decays exponentially with the geodesic distance σ 1σ2 ∝ e −|γ12|/ξ , as illustrated in Fig. 2(a), which reproduces the RT formula S A = |γ 12 |/ξ with some appropriate choice of the correlation length ξ. Multi-region entanglement entropies will correspond to higher-point correlations functions, such as e −S AB ∼ σ 1σ2σ3σ4 ∼ e −|γ12|/ξ e −|γ34|/ξ in Fig. 
2 Allowing the dual Ising couplingJ to fluctuate collectively will introduce perturbations to the geodesic distance |γ 12 | → |γ 12 | + δ|γ 12 | in a correlated manner, such that Thus the correlated geometric fluctuation provides an effective mechanism to generate the mutual information between far-separated regions A and B (beyond the classical RT formula). Therefore we anticipate the fluctuating RTN model to be a more expressive holographic model for entanglement entropies. However, it is not clear how the dual Ising couplingJ (or the effective bond dimension J) should fluctuate precisely in order to quantitatively reproduce all multi-region entanglement entropies of a given quantum many-body state. The remaining task is learn the distribution P [J] (or other equivalent distributions) from data.

C. Effective Statistical Gravity Model
Suppose the fluctuation ofJ is small around its static background configuration, such that there is a meaningful notion of background geometry in the bulk. The dual Ising model can be described by an effective field theory in the continuum limit where the dual Ising variableσṽ is coarse-grained to a massive real scalar field φ(x), as the Ising model universally flows to this massive Gaussian fixed point in the paramagnetic phase. The theory is defined in the holographic space (without time dimension). The fluctuating Ising couplingJ can be translated to a fluctuating bulk metric tensor g around a reference background geometryḡ [52], since a stronger local coupling creates a larger local correlation, which effectively reduces the local distance measure ds 2 = g ij dx i dx j between the correlated Ising variables. Therefore the purity of RTN state Eq. (9) can be effectively described by a statistical gravity model , where the gravity is "quenched" in the sense that the metric configuration is generated with a probability dis-tribution P [g] independent of the scalar field φ configuration.
In two-dimensional space, the metric tensor has three independent components. However two of them can be removed by gauge transformation g ij → g ij +∇ i ξ j +∇ j ξ i . We can choose the conformal gauge where the metric tensor g ij (x) is parametrized by a Weyl field ω(x) that rescales a fixed backgroundḡ ij (x) such that each Weyl field configuration represents a physically distinct geometry. As a result, the integration in Eq. (14). The unknown joint distribution P [ω] will be what we aim to learn from the entanglement entropy data.
To numerically evaluate the multi-point scalar field correlation, we can place the bulk field theory back on a lattice, say on the dual graphG = (Ṽ ,Ẽ). Using Regge calculus [53] to discretize the action, where xy can be interpreted as the geodesic distance between two vertices x and y on the background geometry. A x and A xy are the areas associated to the vertex x and the edge xy respectively. xy , A x , A xy are all fixed according to the choice of background metric, which will be specified later. The statistical variables in the model are the scalar field φ x and the Weyl field ω x in the holographic bulk. The model predicts the entanglement entropy on the holographic boundary by which is the underlying lattice model that will be used in the machine learning algorithm. The unknown distribution P [ω] will be parameterized by a generative model. By matching the model prediction with the actual data of entanglement entropies calculated from a quantum state, the algorithm can reconstruct the distribution P [ω] and infer the statistical gravity model behind the entanglement structure.

A. Generative Modeling
Generative modeling is about learning probability distributions [54]. We will apply the simplest latentvariable generative model [55] in this work. The basic idea is to start with a easy-to-sample prior distribution, such as a Gaussian distribution. Draw a random vector z ∈ R n (as a collection of latent variables) from the prior distribution P (z). Then transform the latent variables z by a deep neural network G ϑ (parametrized by some variational parameter ϑ) to the designated random variable ω, i.e. z → ω = G ϑ (z). The mapping G ϑ is called the generator, which defines the distribution of generated samples A large batch of ω can be sampled efficiently in parallel, when hardware accelerators (e.g GPU or TPU) are available. If the neural network G ϑ is expressive enough, Eq. (18) will provide a sufficiently expressive probability model P ϑ [ω] for the Weyl field ω configuration. The distribution P ϑ [ω] defines the model prediction of the purity based on Eq. (17) We will use S A | ϑ to denote the Rényi entropy predicted by the machine learning model as it depends on the model parameters ϑ. In Eq. (19), we introduced the short-hand notation to denote the scalar field correlation on a background Weyl field configuration. The conditional distribution P [φ|ω] is defined in Eq. (17) with S[φ|ω] given in Eq. (16). The scalar field correlation ∂A φ ω can be efficiently evaluated when S[φ|ω] is a Gaussian action (which is the case here). The task is to learn the optimal Weyl field distribution P ϑ [ω] that gives the best prediction of the purity data based on Eq. (19). The dataset will contain the purity {e −S A } of a quantum state in different regions A. The distribution P ϑ [ω] can be learned by optimizing model parameters ϑ to minimize the following loss function (to be explained later) As illustrated in Fig. 3, the training initiates from randomly choosing a batch of entanglement regions A. On one hand, we query the dataset to get the ground truth of S A . 
On the other hand, a collection of Weyl field configurations are sampled from the generative model, based on which the model prediction S A | ϑ is estimated. Then the loss function is calculated by comparing S A | ϑ with S A , and the gradient signal propagates back to train the parameters via gradient descent ϑ → ϑ−r∂ ϑ L ϑ . After some iterations, the parameters are expected to converge. In the following, we will explain different modules in Fig. 3 in detail.

B. Entanglement Dataset
While efficient experimental approaches [56,57] have been developed to estimate Rényi entropies from randomized measurements, which enables the acquisition of a large amount of entanglement data to drive the entanglement feature learning, preparing an entanglement dataset by numerically computing entanglement entropies from a given quantum many-body state remains difficult in general. As a proof of concept, we choose to use the ground state of a free fermion system for demonstration, on which entanglement entropies can be efficiently calculated.
Consider N copies of the (1+1)D massless Majorana fermion chain, described by the Hamiltonian where {χ i,a , χ j,b } = δ ij δ ab . Let |Ψ be the ground state of H. The 2nd Rényi entropy can be efficiently computed from the fermion correlation function, where C A,ij = Ψ|χ i,a χ j,a |Ψ (for i, j ∈ A) is the twopoint correlation function (matrix) of Majorana fermions restricted inside the entanglement region A. The quantum system is critical and is described by the free-fermion CFT at low energy.
To construct the dataset, we will take the Majorana fermion chain of 32 sites, and randomly sample a large collection of single-region, two-region, and three-region subsets. We then compute the entanglement entropy using Eq. (23) for every region and record the results in the entanglement dataset.

C. Bulk Model Solver
The bulk model solver is expected to calculate the scalar field correlation given the Weyl field background ω and the entanglement region A that specifies the scalar field inserting position on the boundary. We will use the lattice model specified by the action in Eq. (16), which describe a free scalar field φ. The action can be written as the bilinear form where x, y label the vertices on the dual graphG = (Ṽ ,Ẽ) on which the holographic model is defined. The kernel matrix takes the form of being the mass term, and being the discrete Laplace operator on the dual graph. The length xy and area A x , A xy constants are fixed and are set by the background geometry, as to be specified soon. The two-point correlation is given by the inverse of the kernel matrix, where a trainable constant β is introduced to take care of the field renormalization. Higher-point correlations follow from Wick's theorem. For example, We will treat β and m 2 as trainable parameters, which will be optimized (together with other model parameters for P [ω]) to fit the entanglement data.
Since we intend to apply our approach to entanglement data collected from CFTs, following the idea of AdS/CFT correspondence, it is natural to choose the two-dimension hyperbolic geometry (the spatial slice of AdS 3 ) as the background geometry. We use the following background metric where 0 ≤ θ < 2π and ρ ≤ ρ bdy (the UV cutoff scale is set by ρ bdy , which is another parameter to learn). The geodesic distance any two points on the boundary ρ = ρ bdy separated by θ is given by |γ|(θ) = arccosh(1 + 2 sinh 2 ρ bdy sin 2 (θ/2)) e ρ bdy 1 − −−−−− → 2 ln(sin(θ/2) + ρ bdy ). y   FIG. 4. Triangular lattice discretization of hyperbolic space in (ρ, θ) coordinate. The lattice is divided into different layers along the the radius direction. The ith layer corresponds to the radius ρi.
Without loss of generality, we chose to discretize space using a triangular lattice with periodic boundary condition along the θ-direction. All vertices in the same layer are of the same ρ-coordinate and their θ-coordinates are uniformly spaced, see Fig. 4. The geodesic distance xy between two vertices x and y is given by The area of an elementary triangle in the ith layer reads tan which defines the vertex and edge areas in a barycentric scheme. Specifically, the vertex area A x is given by for ρ x = ρ i . The edge area A xy is given by These equations defines xy , A x and A xy used in the lattice model Eq. (16), which all rely on the values of ρ i for different layers. The discretization scheme in the radial dimension is specified by how ρ i is spaced from 0 to ρ bdy . A bad choice of the discretization scheme may cause some triangle elements to have high aspect ratios, reducing the quality of the triangulation in approximating the continuous background geometry. We will take a data-driven approach to learn the optimal discretization scheme by treating {ρ i } as trainable parameters.
To summarize, the bulk model solver contains the following parameters: the normalization β and the squared mass m 2 associated with the scalar field dynamics, and the radial coordinates {ρ i } associated with the discretization of background geometry. These parameters will be trained together with other neural network parameters (see Sec. III D) to optimize the model prediction of the entanglement entropy data.

D. Neural Network Design
The central goal is to learn the Weyl field distribution P [ω] using a latent-variable generative model P ϑ [ω], recall Eq. (18). The key component of the generative model is a generator G ϑ that maps the latent variable z to a Weyl field configuration ω = G ϑ (z). The generator is realized as a deep neural network consists of consecutive layers of simpler maps where each layer g n (z) is an affine transformation followed by some non-linearity such as ReLU [58]. The weight and bias parameters are introduced to parametrize the affine transformations, which constitute part of the training parameters ϑ. It is both practical and theoretically motivated to enforce the neural network's architecture such that the learned distribution P ϑ [ω] will respects certain symmetries, i.e. to construct an equivariant neural network [59]. Let Q be a symmetry transformation that we wish to impose. The sufficient condition for the generated distribution to be symmetric (i.e. P ϑ [Qω] = P ϑ [ω]) is to require (i) Qg n (z) = g n (Qz) and (ii) P (Qz) = P (z). The symmetries in consideration are The translation symmetry can be imposed by parameter sharing between relation-related weights and biases, making the affine transformation in each layer effectively a convolution along the translation direction. The reflection symmetry can be imposed by using a reflection symmetric convolution kernel. The prior distribution P (z) = x P (z x ) automatically satisfies the symmetry condition as it factorizes to identical independent Gaussian distributions on every site.
We would like to emphasize that although each layer looks like a convolutional layer under the symmetry constraint, we do not restrict the convolution kernel to be local (the kernel size extends to the whole lattice), because we do not want to impose locality by hand. As we will see, a sense of locality could emerge in the neural network as the holographic model gets trained, which corresponds to the emergent locality in the bulk gravity theory.

E. Loss Function Design
The loss function is designed to evaluate the average difference between the purity e −S A | ϑ predicted by the holographic model and the purity e −S A given by the entanglement data. A straightforward option would be the mean squared error (MSE) loss The model prediction e −S A | ϑ should be evaluated according to Eq. (19), which involves the ensemble expectation E[ω]∼P ϑ . In practice, the expectation value can only be estimated by sampling a finite number of Weyl field configurations from the generative model P ϑ and take the average where N ω denotes the number of Weyl field samples.
With the help of modern GPU, ∂A φ ω can be computed in parallel efficiently. The sample size N ω is thus ultimately limited by the GPU memory. In our case, N ω ranges from 512 to 2048.
For any finite sample size N ω , the finite average e −S A | ϑ will have a finite variance, which bias the MSE loss causing the parameter to converge to a wrong saddle point. The bias can be corrected by assigning a larger weight to the prediction with a higher precision, i.e.
which can also be argued from the maximum-likelihood estimation. The variance is generally proportional to the square of the purity var e −S A | ϑ ∝ (e −S A | ϑ ) 2 , which leads to the mean squared relative error (MSRE) loss We numerically test the loss function by generating some data using a model with known parameters, and train new models with different loss function on the generated data to see if the parameter converges to the known result. Our test shows that Eq. (39) indeed converges better compare to Eq. (35). Therefore, we will use the MSRE loss function to train the model, as mentioned in Eq. (21).

A. Fitting Entanglement Data with Static and Fluctuating Geometry
We apply the proposed machine learning approach to learn the entanglement feature of a Majorana fermion chain of 32 sites (16 unit cells) with a relatively large central charge c = 8. The entanglement data is partitioned into the training set and the test set that does not overlap. Within the training/test set, the data can be further classified by the number of subregions of the entanglement region, including the single-region, doubleregion, and triple-region entanglement. To demonstrate the effect of introducing gravitational fluctuations, we will compare two holographic models: (i) the fluctuating model, i.e. the model e −S A | ϑ = E[ω]∼P ϑ ∂A φ ω proposed in Eq. (19) with fluctuating geometries, (ii) the static model, i.e. the model e −S A | ϑ = ∂A φ ω≡0 with a fixed static geometry. We train both models using the MSRE loss in Eq. (21). The algorithm is implemented in the TensorFlow [61] framework using the ADAM [62] optimizer. Upon convergence, the MSRE loss is evaluated on the test set to characterize the performance of the model.  If we train the static geometry model with singleregion data only, the model can easily achieve high accuracy (MSRE ∼ 10 −5 ) in predicting single-region entanglement, as also shown in Fig. 5(a). But the prediction of multi-region entanglement is rather inaccurate (MSRE ∼ 10 −1 ), meaning that the static geometry model overfits the single-region data and can not be generalized to multi-region data. If we include the double-region data in the training set, and train the static geometry model with both single-and double-region entanglement, the model will learn to predict double-region entanglement better at the price of losing the accuracy in predicting single-region entanglement, with the MSRE saturates at the ∼ 10 −2 level. This implies an intrinsic conflict for the static geometry model in modeling the single-and multi-region entanglement simultaneously.
However, by introducing gravitational fluctuations to the model, the fluctuating geometry model achieves one order of magnitude improvement in the prediction accuracy of both single-and double-region entanglement, as the MSRE drops to the ∼ 10 −3 level, which is also manifest in Fig. 5(b). This indicates that the gravitational fluctuation indeed helps to reconcile the conflict between single-and multi-region entanglements in the classical gravity model (RT formula) (see Appendix B for an analytic analysis of how the conflict can be reconciled in principle). Moreover, the prediction accuracy in tripleregions is also improved significantly, even if the model is never trained on the triple-region data. This speaks for the better generalizability of the fluctuating geometry model. As argued previously, the static geometry model suffers from the problem of vanishing mutual information between far separated regions. One motivation to introduce gravitational fluctuations is to mediate the mutual information between distant regions through the holographic bulk. Indeed, as shown in Fig. 6, by allowing the geometry to fluctuate, the model can better capture the behavior of mutual information. In particular, the static model fails to produce the non-vanishing mutual information between distant regions, which is most obviously seen in Fig. 6(a), where the regions are most far separated (compare to their sizes). However, the fluctuating model fixes this problem, demonstrating the importance of introducing the geometric fluctuation in modeling the multi-region entanglement.

B. Weyl Field Correlation and Effective Bulk Gravity Theory
After training, we want to open up the model and see what bulk gravity theory has been learned. With the trained generative model P ϑ [ω] that describes the statistical fluctuation of the Weyl field ω, we can explore various statistical properties of the distribution P ϑ [ω] to gain a deeper understanding of the optimal bulk gravity theory that emerges from learning the boundary entanglement data. xy of the Weyl field (in logarithmic scale) v.s. the bulk geodesic distance dxy, where x, y points are taken from the same layer. The distance between points on a larger radius (or a higher layer) appears farther due to the hyperbolic background geometry, but the inverse correlation length ∆ (the slope) remains roughly on the same order of magnitude across layers. We first study the covariance function Σ (ω) xy of the Weyl field ω, defined as We observe that its diagonal elements Σ (ω) xx (i.e. the local covariance) grows with the radius ρ x coordinate (as x approaches the boundary), see Fig. 7(a). This is because the discretization scale is changing along the radius direction. In our discretization scheme as shown in Fig. 4, the hyperbolic space is finer discretized towards the center of the bulk, therefore the field ω will appear to be stiffer near the bulk center, and hence its covariance is smaller. To eliminate this influence of the discretization scheme, we normalize (standardize) the covariance and define the correlation function We found that the Weyl field correlation C (ω) xy decays exponentially with respect to the geodesic distance d xy in the holographic bulk where the inverse correlation length ∆ remains almost the same across different layers in the bulk, as shown in Fig. 7(b). The short-ranged nature of the Weyl field correlation is more obviously shown in Fig. 8, which is an unequivocal sign of locality. 
In other words, the machine has learned from the entanglement data that the Weyl field fluctuation can be described by a local model (as the correlation is short-ranged) in the bulk. This emergent locality is remarkable, since locality was never explicitly built into the generative model at the architecture level: the neural network in the generator G_ϑ is fully connected, which in principle allows non-local (long-ranged) correlations of ω across the bulk, yet a short-ranged correlation emerges from learning the entanglement data. Furthermore, we can learn about the leading modes of gravitational fluctuation in the machine-learned distribution P_ϑ[ω] by computing the spectral decomposition of the covariance function, Σ^{(ω)}_{xy} = Σ_α λ^{(α)} ω^{(α)}_x ω^{(α)}_y, where λ^{(α)} is the αth eigenvalue and ω^{(α)} is the corresponding eigenmode. The result is shown in Fig. 9. The long-wavelength collective fluctuations emerge automatically as the leading (low-energy) modes of gravitational fluctuation. Using the covariance function Σ^{(ω)}_{xy}, one can approximately reconstruct the effective gravitational action to quadratic order (at the Gaussian level). In this way, the machine-learning model helps us extract a statistical gravity theory (in terms of the Weyl field theory) from the entanglement data, demonstrating a data-driven approach to establishing the holographic duality.
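The two steps above — spectral decomposition of Σ^{(ω)} and reconstruction of the Gaussian-level action S[ω] ≈ (1/2) ωᵀ K ω with K = (Σ^{(ω)})^{-1} — can be sketched as follows. The covariance used here is an illustrative exponentially correlated matrix, not the one learned in the paper:

```python
import numpy as np

# Illustrative Weyl-field covariance with short-ranged (exponential) correlation.
n = 32
d = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
Sigma = np.exp(-0.5 * d)

# Spectral decomposition: Sigma_xy = sum_alpha lam[alpha] * modes[x, alpha] * modes[y, alpha]
lam, modes = np.linalg.eigh(Sigma)       # eigh returns ascending eigenvalues
lam, modes = lam[::-1], modes[:, ::-1]   # reorder so the leading mode comes first

# Quadratic (Gaussian-level) effective action: S[omega] = (1/2) omega^T K omega
K = np.linalg.inv(Sigma)
```

For such a short-ranged covariance the leading eigenmode is the smoothest, most collective one (it has no sign changes across the bulk), mirroring the long-wavelength leading modes seen in Fig. 9.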

C. Matter Field Mass Renormalization Effect
As we have seen, geometric fluctuation effectively induces interactions of the bulk scalar field φ, which generates the desired behavior of mutual information. As a consequence, the bare mass m of the scalar field should also be renormalized by the gravitational interaction. Remarkably, we can observe such a renormalization effect in our holographic model by comparing the static model (without geometric fluctuation) and the fluctuating model (with geometric fluctuation). The mass parameter m is trainable in both models, but their optimal values differ due to the renormalization effect. We train the static model on the single-region entanglement data, and the fluctuating model on both the single- and double-region entanglement data. Over the range of total central charge 11/2 ≤ c ≤ 8 studied, we observe that the trained value of the (bare) mass m in the fluctuating model is systematically larger than the value m_0 in the static model, as shown in Fig. 10.
The mass renormalization effect can be understood heuristically by considering a single-region entanglement. In the static model, the entanglement entropy is modeled by e^{-S_A} ∼ e^{-m_0 |γ_A|}, where γ_A is the geodesic connecting the entanglement cuts of A through the static bulk. With geometric fluctuation |γ_A| → |γ_A| + δ|γ_A| (where δ|γ_A| is the additional geodesic length due to the Weyl field ω), the entanglement entropy will instead be modeled by e^{-S_A} ∼ ⟨e^{-m(|γ_A| + δ|γ_A|)}⟩_ω ≈ e^{-m|γ_A| + (m²/2)⟨δ|γ_A|²⟩}, treating δ|γ_A| as approximately Gaussian with zero mean. The fluctuation term enhances e^{-S_A}, i.e. it lowers the predicted entropy at fixed mass, so for the two models to match, we must have m > m_0, which qualitatively explains our observation.
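The key step in this heuristic is the Gaussian identity ⟨e^{-mδ}⟩ = e^{m²σ²/2} > 1 for zero-mean fluctuations δ with variance σ². A quick Monte-Carlo check (with illustrative values of m and σ, not fitted parameters from the paper) confirms it:

```python
import numpy as np

rng = np.random.default_rng(1)

# Check the cumulant identity behind the mass renormalization argument:
# for delta ~ N(0, sigma^2), <exp(-m * delta)> = exp(m^2 sigma^2 / 2) > 1,
# so geodesic-length fluctuations lower the effective entropy at fixed m.
m, sigma = 1.0, 0.3                       # illustrative values
delta = rng.normal(0.0, sigma, size=1_000_000)
lhs = np.mean(np.exp(-m * delta))         # Monte-Carlo fluctuation average
rhs = np.exp(0.5 * m**2 * sigma**2)       # second-cumulant prediction
```

Since the average exceeds 1, the fluctuating model with the same mass would overestimate e^{-S_A}; the trained bare mass must increase to compensate, consistent with m > m_0 in Fig. 10.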

V. SUMMARY AND DISCUSSION
We present a machine-learning approach to extract the holographic statistical gravity theory from multi-region entanglement entropy data of a quantum many-body system. Our work advances both the field of tensor network holography and the field of machine learning holography. (i) On the tensor network holography side, we generalize the random tensor network (RTN) model to incorporate bond dimension fluctuation, which makes the model more expressive in capturing features of multi-region entanglement. We derive the holographic bulk theory for the RTN with bond dimension fluctuation and show that the dual gravity theory consists of a massive scalar field on a fluctuating background geometry. The use of Ising duality in the derivation also provides an alternative view of the bulk theory that, as far as we are aware, has not been presented in the literature. (ii) On the machine learning holography side, our work goes beyond the previous approaches [26,[63][64][65][66][67][68] of inferring only a static background geometry from the boundary quantum data. By modeling the bulk geometric fluctuation with a generative model, our approach can extract a statistical gravity theory from the quantum entanglement data. Remarkably, we found that the machine-constructed gravity theory exhibits an emergent locality, which reveals the hidden bulk locality behind the non-local quantum entanglement on the boundary.
Our work provides a novel data-driven approach to exploring emergent gravity from quantum entanglement. Combined with the recent development of efficient numerical methods to simulate entanglement dynamics in quantum many-body systems [47,48,69], we can further explore the corresponding gravity dynamics in the holographic bulk, which will deepen our understanding of emergent gravity from quantum entanglement. On the practical side, our algorithm will boost the efficiency of modeling the entanglement structure of quantum many-body systems, which will find applications in quantum algorithm optimization and quantum circuit design.
We denote the set of perfect matchings of a graph G by M[G] (where each perfect matching is a subset of edges such that every vertex is covered by exactly one edge). We define the bare propagators (the covariance functions) Σ^{(φ)} and Σ^{(ω)} from the inverses of the bare kernels K^{(φ)} and K^{(ω)} for the φ and ω fields respectively,

Σ^{(φ)}_{xy} = ((K^{(φ)})^{-1})_{xy},  Σ^{(ω)}_{xy} = ((K^{(ω)})^{-1})_{xy}. (A6)

Using perturbative field theory (treating g in Eq. (A1) as a perturbation), to second order in g (and keeping only the tree-level diagrams), we can calculate the covariance functions in Eq. (A7)-Eq. (A9). Substituting the correlation functions Eq. (A7)-Eq. (A9) into Eq. (A5), we find (to order g²) that whenever Σ_{xy} ≥ 0 (which is typically the case), we can ensure I_{A:B} ≥ 0 and I_{A:B:C} ≤ 0. This result proves that the random tensor network model can produce a negative tripartite information I_{A:B:C}, which is a unique feature of quantum many-body entanglement that cannot be achieved in classical systems. A negative tripartite information is an indication of quantum information scrambling and chaotic quantum dynamics in the quantum system. Although the bulk theory is a classical statistical gravity model, it can still capture the quantum chaotic entanglement features on the holographic boundary, which speaks to the strong expressive power of the random tensor network model.
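The tripartite information referenced here is I_{A:B:C} = S_A + S_B + S_C − S_{AB} − S_{AC} − S_{BC} + S_{ABC}. As an independent illustration (not the RTN calculation of this appendix), the sketch below evaluates it for a Haar-random pure state of 8 qubits, a standard example of a scrambled state, where I_{A:B:C} is expected to be negative:

```python
import numpy as np

rng = np.random.default_rng(42)

def entropy(psi, keep, n):
    """Von Neumann entropy of the reduced state on qubits `keep`
    of an n-qubit pure state psi, via singular values of the bipartition."""
    keep = sorted(keep)
    rest = [q for q in range(n) if q not in keep]
    T = psi.reshape([2] * n).transpose(keep + rest).reshape(2 ** len(keep), -1)
    p = np.linalg.svd(T, compute_uv=False) ** 2   # Schmidt probabilities
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log(p)))

# Haar-random pure state on n = 8 qubits (proxy for a scrambled state).
n = 8
psi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
psi /= np.linalg.norm(psi)

A, B, C = [0, 1], [2, 3], [4, 5]
S = lambda R: entropy(psi, R, n)
I3 = S(A) + S(B) + S(C) - S(A + B) - S(A + C) - S(B + C) + S(A + B + C)
```

For this random state I3 comes out negative, matching the sign that the RTN model reproduces at order g² in the appendix.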