AdS/CFT as a deep Boltzmann machine

We provide a deep Boltzmann machine (DBM) for the AdS/CFT correspondence. Under the philosophy that the bulk spacetime is a neural network, we give a dictionary between those, and obtain a restricted DBM as a discretized bulk scalar field theory in curved geometries. The probability distribution as training data is the generating functional of the boundary quantum field theory, and it trains neural network weights which are the metric of the bulk geometry. The deepest layer implements black hole horizons, and an employed regularization for the weights is an Einstein action. A large $N_c$ limit in holography reduces the DBM to a folded feed-forward architecture. We also neurally implement holographic renormalization into an autoencoder. The DBM for the AdS/CFT may serve as a platform for studying mechanisms of spacetime emergence in holography.


I. INTRODUCTION
Deep Boltzmann machines [1] are a particular type of neural network in deep learning [2][3][4] for modeling the probability distributions of data sets. They are equipped with deep layers of units in their neural network architecture, and are a generalization of Boltzmann machines [5], which are one of the fundamental models of neural networks. Deepening the architecture enlarges the representation power of the models, and recent advances in training deep models in machine learning were initiated by analogues of the deep Boltzmann machines.
The neural network of a deep Boltzmann machine consists of visible units and hidden units. Binary variables live on those units, and they interact with each other under a Hamiltonian called an energy function. Thus the deep Boltzmann machine is basically an Ising model in which only the spins at a boundary layer are visible (observable), and the Hamiltonian allows inhomogeneity and nonlocality. For a given probability distribution of the observed spin configurations at the boundary layer, the Ising bond strengths (called "weights") in the model Hamiltonian are trained to approximate the given distribution; that is the deep learning of the deep Boltzmann machine. The training determines the weights automatically, and a structure of the Hamiltonian emerges. Efficient algorithms for the training [6] accelerated the progress in deep learning.
In this paper we study a relation between the deep Boltzmann machines and the AdS/CFT correspondence [7][8][9] in quantum gravity. The AdS/CFT correspondence is a holographic duality between a $(d+1)$-dimensional quantum gravity and a $d$-dimensional quantum field theory (QFT) without gravity. The latter lives at the boundary of the gravitational spacetime of the former. From the viewpoint of the QFT, the direction perpendicular to the boundary surface is an "emergent" space direction. Therefore, the aforementioned structure of the deep Boltzmann machines suits the scheme of the AdS/CFT, once we identify their visible layers with the QFT and the hidden layers with the bulk spacetime. See Fig. 1. The trained weights are interpreted as the metric function of the bulk geometry. We detail the relation between the two schemes, both of which are renowned independently in different sciences.
A motivation to bring them together also comes from recent progress in discretization of the AdS/CFT. Popular toy models of the AdS/CFT à la quantum information use MERA [10] and other tensor networks [11]. In the first place, quantum gravity has a long history of Regge calculus [12] and dynamical triangulation [13], where spacetimes are approximated by networks. For formulating quantum gravity, we need a dynamical network whose structure is determined in a self-organized manner. In that view, neural network architecture may provide a novel platform for quantum gravity and the emergent spacetime.
We show that the AdS/CFT correspondence naturally fits the scheme of the deep Boltzmann machines, where the bulk spacetime geometry is reinterpreted as a sparse neural network. We construct explicitly a deep Boltzmann machine architecture which represents an example of the AdS/CFT correspondence. Previously, in [17,18] a possibility of relating hidden variables of Boltzmann machines to bulk fields was mentioned [40]. In [14], the entanglement feature of a free fermion chain was trained on a random tensor network as a deep Boltzmann machine. The holographic interpretation of feed-forward deep neural networks was proposed and studied in [15,16] for training QFT and QCD linear response functions, and the obtained emergent bulk spacetime for large $N_c$ QCD exhibits interesting physical properties and computes other observables as predictions. While our work here naturally relates to these works, in this paper we concentrate on deep Boltzmann machines as an AdS/CFT correspondence.
Although at first sight the two schemes look similar, in detail they possess different characteristics. For example, since deep Boltzmann machines have constraints on their architecture and trainability, we need a careful discretization of bulk field theories. In addition, the AdS/CFT correspondence is well understood in the large $N_c$ limit, while that limit has not been studied in the Boltzmann machines. Furthermore, generalization in deep learning is attributed to degenerate sets of trained weights, while in the AdS/CFT that degeneracy is not expected. In this paper we address these basic questions raised in relating the two schemes. We provide a concrete expression of a deep Boltzmann machine which satisfies the standard constraints, and find that the large $N_c$ limit brings the Boltzmann machine to a folded feed-forward architecture. We propose an Einstein action as a regularization of training to distinguish sets of weights to be interpreted as a smooth spacetime.
The organization of this paper is as follows. In Sec. II, we briefly review deep Boltzmann machines. In Sec. III, we provide a dictionary between the deep Boltzmann machines and the AdS/CFT, and construct a deep Boltzmann machine for a bulk scalar field theory in generic curved geometry. Discretization of the fields and the spacetime, and also the properties at the deepest layer, are studied. In Sec. IV, we apply the standard large $N_c$ limit (saddle point approximation) to the deep Boltzmann machine and see the consistency with the holographic linear response. In Sec. V, we propose how we identify weights interpreted as a spacetime, through a regularization using the Einstein action. Sec. VI is devoted to our summary and discussions. In Appendix A, we provide an autoencoder-like neural network architecture for holographic renormalization.

II. BRIEF REVIEW OF BOLTZMANN MACHINE
Boltzmann machines in machine learning are a network model for giving a probability distribution $P(v_i)$ of the input variables $v_i \in \{0, 1\}$ for $i = 1, \cdots, n$. The probability $P(v_i)$ of the Boltzmann machine is defined by
$$P(v_i) \propto \exp[-E(v_i)] \tag{1}$$
with an energy function $E(v_i)$ given by
$$E(v_i) \equiv \sum_i a_i v_i + \sum_{i,j} w_{ij} v_i v_j. \tag{2}$$
Here $a_i$ and $w_{ij}$ are real parameters, and are called bias and weight, respectively. The structure of the Boltzmann machine is specified by a network graph (see Fig. 2 Left) in which the $n$ units, each of which takes the value $v_i$, are denoted by circles, while the weights $w$ are denoted by lines connecting the circles. In physics, $P$ is the Boltzmann distribution of a canonical ensemble of a classical Ising model, after which the Boltzmann machine is named. As (2) is quadratic in $v_i$, even if one varies the biases and the weights, the class of the probability distributions obtained by (2) is quite limited. Making the network deep enlarges the representation power of the architecture. Adding hidden variables $h_i$, we have
$$P(v_i) \propto \sum_{\{h_i\}} \exp[-E(v_i, h_i)] \tag{3}$$
with the energy function
$$E(v_i, h_i) \equiv \sum_i a_i v_i + \sum_i b_i h_i + \sum_{i,j} w_{ij} v_i h_j. \tag{4}$$
The layer consisting of $v_i$ ($h_i$) is called the visible (hidden) layer, and here the weights connecting units in the same layer are set to zero, so the network graph is restricted to have only limited connection lines (see Fig. 2 Center). This (4) is called a restricted Boltzmann machine. Suppose one has measured events and obtained many sets of $\{v_i\}$. From that one can calculate a statistical probability distribution $P_{ev}(v_i)$. In machine learning, one trains the machine to mimic $P_{ev}(v_i)$. In the function $P(v_i)$ of the Boltzmann machine, the weights and biases are the trained parameters. The difference measure between the two probability distributions, called the error function, is the relative entropy (alternatively called the Kullback-Leibler (KL) divergence in machine learning), and one tries to minimize it by changing the parameters. When this divergence is minimized, the machine is well trained.
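To make the definitions above concrete, the following is a minimal sketch (not the implementation used in this paper) that evaluates the visible distribution of a tiny restricted Boltzmann machine by brute-force enumeration and compares it with a target distribution through the KL divergence. The sign convention $P \propto e^{-E}$, the array names `a`, `b`, `w`, and the made-up target `p_ev` are assumptions for illustration only.

```python
import itertools
import numpy as np

def rbm_energy(v, h, a, b, w):
    """Energy a.v + b.h + v^T w h (sign convention is an assumption)."""
    return a @ v + b @ h + v @ w @ h

def rbm_visible_distribution(a, b, w):
    """Exact P(v): sum exp(-E) over all hidden states (tiny models only)."""
    n_v, n_h = w.shape
    visibles = [np.array(s) for s in itertools.product([0, 1], repeat=n_v)]
    hiddens = [np.array(s) for s in itertools.product([0, 1], repeat=n_h)]
    unnorm = np.array([
        sum(np.exp(-rbm_energy(v, h, a, b, w)) for h in hiddens)
        for v in visibles
    ])
    return unnorm / unnorm.sum()

def kl_divergence(p_ev, p_model):
    """Relative entropy sum_v p_ev log(p_ev / p_model); zero-weight terms skipped."""
    mask = p_ev > 0
    return float(np.sum(p_ev[mask] * np.log(p_ev[mask] / p_model[mask])))

rng = np.random.default_rng(0)
a, b, w = rng.normal(size=2), rng.normal(size=3), rng.normal(size=(2, 3))
p_model = rbm_visible_distribution(a, b, w)
p_ev = np.array([0.1, 0.2, 0.3, 0.4])  # made-up "measured" distribution
loss = kl_divergence(p_ev, p_model)    # quantity a training loop would reduce
```

Enumeration scales exponentially with the number of units, so this only serves to illustrate what the error function measures; practical training relies on sampling-based algorithms.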
The restriction in the graph of the Boltzmann machines is important because of conditional independence. In (4), when $\{h_i\}$ is given, $E$ is linear in $v_i$, so the probability $P$ factorizes into a product over the units $v_i$, and the training then requires far fewer computational resources. Note that without losing this conditional independence we can add a term $\sum_{i,j} w_i \delta_i^j h_i h_j$; the Kronecker delta $\delta_i^j$ means that this is a self-interaction within the same unit. What is not allowed for the conditional independence is a term like $h_i h_j$ or $v_i v_j$ with $i \neq j$ in the same layer.
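The factorization can be checked numerically on a small example. The sketch below assumes the energy $a\cdot v + b\cdot h + v^T w h$ with $P \propto e^{-E}$ (the sign convention and all variable names are illustrative choices) and compares the brute-force conditional $P(v|h)$ with the product of single-unit probabilities.

```python
import itertools
import numpy as np

def conditional_joint(h, a, w):
    """P(v | h) from exp(-E) by brute force over all visible states."""
    n_v = len(a)
    states = [np.array(s) for s in itertools.product([0, 1], repeat=n_v)]
    # Terms depending only on h cancel in the conditional, so they are dropped.
    unnorm = np.array([np.exp(-(a @ v + v @ w @ h)) for v in states])
    return states, unnorm / unnorm.sum()

def conditional_factorized(v, h, a, w):
    """Product over units of P(v_i | h), each given by a sigmoid."""
    p_on = 1.0 / (1.0 + np.exp(a + w @ h))  # P(v_i = 1 | h)
    return float(np.prod(np.where(v == 1, p_on, 1.0 - p_on)))

rng = np.random.default_rng(1)
a, w = rng.normal(size=3), rng.normal(size=(3, 2))
h = np.array([1, 0])
states, p_joint = conditional_joint(h, a, w)
max_diff = max(abs(p - conditional_factorized(v, h, a, w))
               for v, p in zip(states, p_joint))
```

A same-layer term $v_i v_j$ with $i \neq j$ would spoil this agreement, whereas a self-coupling within a single unit would only shift that unit's own probability.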
Due to the hidden variables, the representation power of the restricted Boltzmann machines is greater. It is proven that with a sufficiently large number of hidden units any probability distribution is well approximated [20], which is called a universal approximation theorem for the restricted Boltzmann machines. Adding more hidden layers can help the representation power [41], and the following is called a deep Boltzmann machine [1],
$$E(v_i, h^{(k)}_i) \equiv \sum_i a_i v_i + \sum_{i,j} w^{(0)}_{ij} v_i h^{(1)}_j + \sum_{k=1}^{N-1} \sum_{i,j} w^{(k)}_{ij} h^{(k)}_i h^{(k+1)}_j + \sum_{k=1}^{N} \sum_i b^{(k)}_i h^{(k)}_i, \tag{6}$$
where the index $k$ labels the hidden layers $k = 1, 2, 3, \cdots, N$. The hidden variables $h^{(k)}_i$ in the $N$ hidden layers, taking binary values, are summed as in (3):
$$P(v_i) \propto \sum_{\{h^{(k)}_i\}} \exp[-E(v_i, h^{(k)}_i)]. \tag{7}$$
The visible layer, consisting of units whose values are the input $v_i$, may be thought of as $k = 0$, which means $v_i = h^{(0)}_i$. The weights are again restricted as in the restricted Boltzmann machines, see Fig. 2 Right.
Although the number of units in each hidden layer may not be equal to each other, in this paper we consider the case of having the same number of units in each layer, for a structural simplicity.

III. ADS/CFT AS A BOLTZMANN MACHINE

A. Dictionary
Let us first describe the similarity between the AdS/CFT correspondence and the deep Boltzmann machine, and construct a dictionary between the two schemes. In the AdS/CFT correspondence [7], the fundamental formula relating the boundary and the bulk is the GKP-W relation [8,9], which is
$$Z_{QFT}[J] = \exp\left(-S[\phi]\right)\Big|_{\phi(z=0)=J}. \tag{8}$$
This expression is for the large $N_c$ limit of the QFT with its generating functional $Z_{QFT}[J]$, while for finite $N_c$ this expression should be replaced by [42]
$$Z_{QFT}[J] = \int_{\phi(z=0)=J} \mathcal{D}\phi \, \exp\left(-S[\phi]\right). \tag{9}$$
Here $z$ is the emergent bulk coordinate, and $z = 0$ is the boundary of the asymptotically AdS bulk, where the boundary condition $\phi(z = 0) = J$ is imposed on the bulk field $\phi(x, z)$.
The deep Boltzmann machine approximates a given probability distribution by its defining formula with the energy function (6). The similarity between the quantum gravity version of the GKP-W relation (9) and the defining equation of the deep Boltzmann machine is obvious. The identification rules are as follows: the source function $J(x)$ is the input value $v_i$ of the visible layer, the bulk field $\phi(x, z)$ is the hidden variables $h^{(k)}_i$, the emergent bulk coordinate $z$ is the label $k$ for the hidden layers, the generating functional $Z[J]$ of the QFT is the probability distribution $P(v_i)$, and the bulk action is the energy function $E$. The correspondence is summarized in the accompanying table. The path integral of the bulk field $\phi$ is replaced by the summation over the hidden variables $h^{(k)}_i$, so in general the quantum bulk of the AdS/CFT correspondence is a deep Boltzmann machine.
The resemblance is basically the fact that the deep Boltzmann machine tries to reproduce the probability distribution whose input is the values at the visible layer, and in the AdS/CFT in the same manner, the bulk path integration tries to reproduce the generating functional of the boundary QFT where the input is the boundary value of the bulk field.

AdS/CFT                        Deep Boltzmann machine
Bulk coordinate $z$            Hidden layer label $k$
QFT source $J(x)$              Input value $v_i$
Bulk field $\phi(x, z)$        Hidden variables $h^{(k)}_i$
In order to give the QFT generating functional a probability interpretation, we normalize it as
$$P_{QFT}[J] \equiv \frac{Z_{QFT}[J]}{\int \mathcal{D}J \, Z_{QFT}[J]}. \tag{10}$$
Then, using the deep Boltzmann machine representation of the bulk, the bulk theory can be trained to reduce the error function, which is given by the Kullback-Leibler divergence between the QFT probability and the model probability $P$ of the Boltzmann machine,
$$D_{KL}(P_{QFT} \| P) = \int \mathcal{D}J \, P_{QFT}[J] \log \frac{P_{QFT}[J]}{P[J]}. \tag{11}$$
As the deep Boltzmann machine allows arbitrary architecture for its neural network, it is naturally expected that the AdS/CFT correspondence may be included as an example of the Boltzmann machine. Below we shall demonstrate that a typical AdS/CFT model allows a deep Boltzmann machine architecture.

B. Bulk as a neural network
The simplest bulk action (12) is that of a free massive scalar field in an asymptotically AdS$_{d+1}$ bulk geometry. We chose the Euclideanized signature so that the AdS/CFT correspondence can fit the scheme of Boltzmann machines. The $d$-th coordinate $\tau \equiv x^d$ is the Euclideanized time coordinate. Local interaction terms such as $\phi^n$ can be treated similarly below, but in this paper we consider only the free case.
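We assumed for simplicity that the metric depends only on the bulk emergent direction $z$ and is diagonal, and assumed also that the spacetime is homogeneous along the boundary spatial directions $x^I$. The metric then enters the action only through the functions $a(z)$, $b(z)$, $c(z)$ and $d(z)$, and there exists a relation among them. In the standard Poincaré coordinate system, the asymptotically AdS$_{d+1}$ geometry is characterized by the AdS radius $L$, which fixes the behavior of these functions near the AdS boundary $z \sim 0$.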
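As an illustration of how four such metric functions can arise, one parametrization consistent with the later discussion of the hard wall and the horizon is sketched below. The overall normalization of the action and the precise definitions are assumptions made here for concreteness and may differ from the original equations; $g$ denotes the determinant of the diagonal metric.

```latex
% Assumed form of the free scalar action for a diagonal metric depending only on z
S = \int d^dx\,dz\,\Big[\, a(z)\,(\partial_z\phi)^2
      + b(z)\sum_{I=1}^{d-1}(\partial_I\phi)^2
      + d(z)\,(\partial_\tau\phi)^2
      + c(z)\,m^2\phi^2 \,\Big],
\qquad
a = \sqrt{g}\,g^{zz},\quad
b = \sqrt{g}\,g^{II},\quad
d = \sqrt{g}\,g^{\tau\tau},\quad
c = \sqrt{g}.

% Relation among the four functions (follows from g^{zz}(g^{II})^{d-1}g^{\tau\tau} = 1/g)
a\,b^{\,d-1}\,d = \frac{(\sqrt{g})^{\,d+1}}{g} = (\sqrt{g})^{\,d-1} = c^{\,d-1}.

% Pure AdS in Poincare coordinates, ds^2 = (L^2/z^2)(dz^2 + dx^I dx^I + d\tau^2)
\sqrt{g} = \Big(\frac{L}{z}\Big)^{d+1}
\;\;\Rightarrow\;\;
a = b = d = \Big(\frac{L}{z}\Big)^{d-1},\qquad
c = \Big(\frac{L}{z}\Big)^{d+1}.
```

With these definitions, a divergent $g_{zz}$ together with a vanishing $g_{\tau\tau}$ gives $a \to 0$ and $d \to \infty$ with $a\,d$ finite, which matches the horizon behavior described in Sec. III D.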
Let us discretize the action (12) so that it takes the form of the energy function $E$ of the deep Boltzmann machine. [43] First, the bulk geometry is discretized to a regular lattice whose sites are labeled by $(k, i, l)$. The label $k$ refers to the discretized bulk emergent direction, $z = k\Delta z$, where $\Delta z$ is the lattice spacing. In the same manner, we discretize $x^I$ and $\tau \equiv x^d$ by the lattice spacings $\Delta x$ and $\Delta\tau$, giving the labels $i$ and $l$ respectively, as $x_{i,l}$. This simplest regularization scheme replaces the integration $\int d^dx\,dz$ by a sum $\sum_{k,i,l}$. The bulk field $\phi(x, z)$ at the sites is written as $h^{(k)}_{i,l} \equiv \phi(x_{i,l}, z = k\Delta z)$. Thus the bulk scalar field is the variables in the hidden units. Naturally, we identify the label $k$ as the label for the layers of a deep Boltzmann machine. We define our visible layer as the AdS boundary value of the scalar field, i.e. the first ($k = 0$) component of $h$, $v_{i,l} \equiv h^{(0)}_{i,l}$. The $z$-derivative term in the bulk Lagrangian is replaced by the difference $(h^{(k+1)}_{i,l} - h^{(k)}_{i,l})/\Delta z$ between adjacent layers. As for the derivative terms concerning $\partial_\tau$ (and similarly for $\partial_I$), we choose the discretization (24), which depends on the label $k$; the reason we chose this discretization will be clear below.
The background metric functions are discretized in the same manner, i.e. they are evaluated at $z = k\Delta z$.

FIG. 3: The architecture of the deep Boltzmann machine for the AdS/CFT. The thick lines mean weights. The difference from the standard Boltzmann machines in Fig. 2 is that we allow a weight connecting the same unit, denoted as a curved line just above each unit in the figure. We omit drawing the $l$-direction.
Then the bulk action is written in terms of the hidden variables and recast into the Boltzmann machine form (27), with the weights given in (28) and (29). These weights are symmetric. The path integral over the bulk field $\phi(x^I, \tau, z)$ is equivalent to the integration over all the hidden variables $h^{(k)}_{i,l}$, so the relation (9) is written as an integral of $\exp[-E]$ over the hidden variables, where $E$ is defined by (27). Through the identification (22) of the visible layer with the boundary value of the field, this is a deep Boltzmann machine representation of the AdS/CFT correspondence. See Fig. 3 for our architecture.
The background metric appears as the weights of the Boltzmann machine. As is understood from (28) and (29), the weights are not all independent. They form quite a sparse neural network. The trained variables are (17). The bulk scalar field appears as the hidden variables to be summed, and the boundary value of the bulk scalar field is identified with the visible units.
Note that, because we chose the discretization scheme (24), the weights w connecting the units in the same layer are completely diagonal (of the form δ j i δ m l ) as seen in (29).As explained earlier, this does not violate the conditional independence of the units in the same layer, which is important for the training of the Boltzmann machine.
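The sparsity pattern described above can be checked on a toy lattice. The sketch below builds the quadratic form $E = h^T Q h$ for a two-dimensional $(z, x)$ lattice and verifies that same-layer couplings are purely diagonal and that only adjacent layers are coupled. The skewed difference $(h^{(k+1)}_{i+1} - h^{(k)}_i)/\Delta x$ used for the $x$-derivative is an assumed stand-in with the stated property, not necessarily the choice (24) of the text, and the metric profiles are illustrative.

```python
import numpy as np

def build_quadratic_form(a, b, c, m2, n_x, dz, dx):
    """Return Q with E = h^T Q h on a lattice of len(c) layers and n_x sites.

    a[k], b[k] weight the z- and x-difference terms between layers k and k+1;
    c[k] weights the mass term in layer k. The x-difference is taken between
    site i in layer k and site i+1 in layer k+1 (an assumed discretization).
    """
    n_layers = len(c)
    idx = lambda k, i: k * n_x + (i % n_x)  # periodic in the x direction
    Q = np.zeros((n_layers * n_x, n_layers * n_x))

    def add_square(p, q, coeff):
        # Adds coeff * (h_p - h_q)^2 to the quadratic form.
        Q[p, p] += coeff
        Q[q, q] += coeff
        Q[p, q] -= coeff
        Q[q, p] -= coeff

    vol = dz * dx
    for k in range(n_layers):
        for i in range(n_x):
            Q[idx(k, i), idx(k, i)] += vol * c[k] * m2
            if k + 1 < n_layers:
                add_square(idx(k + 1, i), idx(k, i), vol * a[k] / dz**2)
                add_square(idx(k + 1, i + 1), idx(k, i), vol * b[k] / dx**2)
    return Q

n_layers, n_x = 4, 5
z = 0.5 + np.arange(n_layers)  # illustrative positions away from z = 0
Q = build_quadratic_form(a=1 / z, b=1 / z, c=1 / z**3, m2=1.0,
                         n_x=n_x, dz=1.0, dx=1.0)
blocks = Q.reshape(n_layers, n_x, n_layers, n_x)
same_layer_offdiag = max(
    np.abs(blocks[k, :, k, :] - np.diag(np.diag(blocks[k, :, k, :]))).max()
    for k in range(n_layers))
non_adjacent = max(
    np.abs(blocks[k, :, kp, :]).max()
    for k in range(n_layers) for kp in range(n_layers) if abs(k - kp) > 1)
```

Had the $x$-difference been taken within a single layer, the same-layer blocks would acquire off-diagonal entries and the conditional independence of units in a layer would be lost.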

C. Discretized values of the bulk field
Standard Boltzmann machines allow binary values for the variables h, while the AdS/CFT correspondence requires continuous values for the bulk field φ(x, z).To bridge these two, we need to discretize also the field value space.Suppose that typical values necessary for training the Boltzmann machines are in the range |φ| < A. Then a natural discretization of the values is given as where In this discretization, we have 2u 0 different values for φ to take.
To bring them to a set of binary-valued variables, we introduce the binary variable s (ũ) ∈ {0, 1} as Here we divided the single entry φ to u 0 entries s (ũ) with In effect, each unit referring to h in the Boltzmann machine is split into u 0 different units s.All of those split units need to share the same weight for every original connection with different unit h.In this manner, binary-valued Boltzmann machines can be constructed from the continuous-valued Boltzmann machines.

D. The deepest layer is the end of space
In the AdS/CFT correspondence, the IR end of the geometry is important, as it directly reflects the properties of allowed spectra of the QFT.Popular holographic geometries are confining geometries and black holes, and they have specific boundary conditions at the IR end of the geometry.Except for the cases of conformal field theories as the boundary QFT, the bulk geometry naturally terminates at some IR scale z = z IR .In the terminology of the deep Boltzmann machines, this means that the layers terminate at k = N with N ≡ z IR /∆z.Let us rephrase those geometric boundary conditions to the treatment around the deepest layer k = N of the deep Boltzmann machine.
First of all, the layers actually terminate at $k = N$, and there is no additional layer at $k = N + 1$. In terms of the weights, this condition means that the weights which would connect the layer $k = N$ to a layer $k = N + 1$ vanish. The confining geometry refers to the Dirichlet boundary condition for the bulk field $\phi$, as it simply means that the bulk field $\phi$ needs to vanish in the region of the spacetime specified by $z > z_{IR}$. This location is called a "hard wall" in holography. In general, the condition of the hard wall means that the metric function which the scalar field feels has a special behavior there. In fact, to impose $\phi(z_{IR}) = 0$ we just need the mass $c(z)m^2$ at $z = z_{IR}$ to diverge. So, in this case we can rephrase the Dirichlet boundary condition in terms of the metric function: $c(z_{IR}) = \infty$, which is the confining condition (35). Next, consider the black hole horizon condition instead. At the black hole horizon the $zz$ component of the metric diverges, while the temporal $dd$ component vanishes. Thus $a(z) = 0$ and $d(z)$ diverges, while $b(z)$, $c(z)$, and $a(z)d(z)$ remain finite and nonzero. Therefore, the black hole boundary condition (36) is that, at the deepest layer, $a$ vanishes and $d$ diverges with their product kept finite, for infinitesimally small $\Delta z$. The confining condition (35) and the horizon condition (36) are examples of more general constraints. We can impose other boundary conditions if they are consistent with the large $N_c$ limit of the AdS/CFT, as we shall study in the next section.
For the pure AdS geometry, there is no IR end of the space, and the z direction is extended to z = ∞.So, to host all possible asymptotically AdS spacetimes in our Boltzmann machine architecture, we need to prepare infinitely deep Boltzmann machines.[44]

IV. SADDLE POINT OF BOLTZMANN MACHINE
The AdS/CFT correspondence has been studied in the large $N_c$ limit of the QFT, because it is the classical limit of the bulk, which is the only reliable gravity calculation in the absence of a satisfactory formulation of quantum gravity. The large $N_c$ limit, or the classical limit of the gravity theories, is equivalent to the zero temperature limit of the Boltzmann machine, with $E$ replaced by $E/T$ and $T \to 0$. In this limit, the gravity theory can be well approximated by saddle points, namely the solutions of the classical equations of motion, and the on-shell action is simply substituted into the right hand side of (8).
The zero-temperature limit of Boltzmann machines has not been studied extensively, because the hidden/visible variables in ordinary Boltzmann machines take only binary values and the saddle approximation is not effective.In our case, as described in Sec.III C, we consider a certain limit of binary-valued Boltzmann machines to acquire continuous-valued variables.There the equations of motion, and the saddle points, make sense.In this section, we study consistency conditions of the classical limit (equivalently, the zero temperature limit, or the saddle point approximation) of the deep Boltzmann machine given in the previous section.For simplicity we treat the variables h as continuous variables.
First, let us consider the standard restricted Boltzmann machine (4) with continuous-valued variables, and how the classical limit causes an inconsistency. The saddle point equation, obtained by varying $E$ with respect to $h_i$, is
$$b_i + \sum_j w_{ji} v_j = 0.$$
Since $b_i$ and $w_{ji}$ are the parameters to be fixed after the training with various sets of $\{v_i\}$, this equation cannot be satisfied for all inputs. Therefore, restricted Boltzmann machines with continuous hidden variables do not allow the saddle point approximation, contrary to the physical intuition.
Adding more hidden layers can resolve the issue. Suppose we add another hidden layer to the restricted Boltzmann machine,
$$E = \sum_i a_i v_i + \sum_i b_i h^{(1)}_i + \sum_i c_i h^{(2)}_i + \sum_{i,j} w^{(0)}_{ij} v_i h^{(1)}_j + \sum_{i,j} w^{(1)}_{ij} h^{(1)}_i h^{(2)}_j. \tag{38}$$
Then the saddle point equations are
$$b_i + \sum_j w^{(0)}_{ji} v_j + \sum_j w^{(1)}_{ij} h^{(2)}_j = 0, \qquad c_i + \sum_j w^{(1)}_{ji} h^{(1)}_j = 0.$$
The first equation determines $h^{(2)}_j$ for any given training value of $v_i$, so it gives a consistent saddle point equation. The second equation simply shows that the middle layer variable $h^{(1)}$ takes the fixed value $-[[w^{(1)}]^T]^{-1} c$. So, substituting these into the original energy function (38), we obtain the saddle point approximation of the restricted Boltzmann machine,
$$E = \sum_i \Big(a_i + \sum_j w^{(0)}_{ij} h^{(1)}_j\Big) v_i + \sum_i b_i h^{(1)}_i, \qquad h^{(1)} = -[[w^{(1)}]^T]^{-1} c.$$
Here it should be noted that the obtained energy function is linear in $v$, so it does not have the form of the standard Boltzmann machines, whose energy functions are bilinear in $v$. The reason for the linearity is that the saddle point equations for the $k$-odd and the $k$-even layers decouple from each other.

Instead of adding more layers, we can introduce a self-coupling $\delta_i^j h_i h_j$ as described earlier in Sec. II. For the case with just a single hidden layer with a uniform self-coupling weight, which is (44), the saddle point equation determines the value of the hidden unit $h_i$ in terms of the input $v_i$, so it gives a consistent solution. Substituting the solution back, the on-shell value of the energy function is, as expected, bilinear in $v_i$.

Keeping these results in mind, we consider the deep Boltzmann machine which we defined in the previous section. The saddle point condition, obtained by varying the energy function with respect to the hidden variables at the layer $k$, relates the variables at the layer $k$ to those of the layer $k + 1$ and of the layer $k - 1$. The equation has the properties of both the cases (38) and (44).
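The self-coupling case can be illustrated numerically. The sketch below assumes the energy $E = a\cdot v + b\cdot h + v^T w h + \tilde{w}\sum_i h_i^2$, where the symbol $\tilde{w}$ (`w_self` in the code) and the normalization of the self-coupling are choices made here, not taken from the text. It solves the saddle point equation for $h$ and checks that the on-shell energy agrees with a closed form containing a bilinear term in $v$.

```python
import numpy as np

def saddle_hidden(v, b, w, w_self):
    """Solve dE/dh = 0 for E = a.v + b.h + v^T w h + w_self * sum(h^2)."""
    return -(b + w.T @ v) / (2.0 * w_self)

def energy(v, h, a, b, w, w_self):
    """Energy with a uniform self-coupling on the hidden units (assumed form)."""
    return a @ v + b @ h + v @ w @ h + w_self * np.sum(h**2)

def onshell_energy(v, a, b, w, w_self):
    """Closed form after eliminating h: constant + linear + bilinear in v."""
    const = -(b @ b) / (4.0 * w_self)
    lin = a - (w @ b) / (2.0 * w_self)
    quad = -(w @ w.T) / (4.0 * w_self)
    return const + lin @ v + v @ quad @ v

rng = np.random.default_rng(2)
n_v, n_h, w_self = 3, 4, 0.7
a, b = rng.normal(size=n_v), rng.normal(size=n_h)
w = rng.normal(size=(n_v, n_h))
v = rng.normal(size=n_v)  # continuous-valued input
h_star = saddle_hidden(v, b, w, w_self)
residual = b + w.T @ v + 2.0 * w_self * h_star  # saddle equation residual
e_direct = energy(v, h_star, a, b, w, w_self)
e_closed = onshell_energy(v, a, b, w, w_self)
```

The effective weight matrix `quad` couples different visible units, which is the bilinear structure that the model without self-coupling cannot produce at the saddle point.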
Let us study the consistency with the IR boundary condition at the deepest layer. For simplicity, to look at the consistency, we consider the case with a homogeneous $\phi$ in $x^I$ and $x^d$, which is equivalent to ignoring the terms with $b(z)$ and those with $d(z)$. The architecture of the deep Boltzmann machine is shown in Fig. 4. At the deepest layer $k = N$, the saddle point equation gives $h^{(N-1)} \sim h^{(N)}$, where the symbol $\sim$ denotes a linear relation whose coefficients are given by weights. Similarly, the saddle point equation at $k = N - 1$ gives a linear relation among $h^{(N-2)}$, $h^{(N-1)}$ and $h^{(N)}$, where we omit the coefficients. Then altogether, they give $h^{(N-2)} \sim h^{(N-1)}$. Repeating this backwards in layers, we finally obtain $h^{(1)} \sim h^{(0)}$, which can also be written as $h^{(1)} - h^{(0)} \sim h^{(0)}$. In the continuum limit, this relation is $\partial_z \phi|_{z=0} \sim \phi|_{z=0}$. In the boundary QFT of the AdS/CFT correspondence, this relation is equivalent to the linear response relation [22], $\langle \mathcal{O} \rangle \sim J$. Thus, the deep Boltzmann machine is found to be consistent with the standard analysis in the classical bulk side of the AdS/CFT correspondence.
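The backward chain of linear relations can be sketched numerically for a homogeneous field. The code below assumes the discretized energy $E = \sum_{k=0}^{N-1} a_k (h^{(k+1)} - h^{(k)})^2/\Delta z + \sum_{k=0}^{N} \Delta z\, c_k m^2 (h^{(k)})^2$ and AdS-like profiles for $a_k$ and $c_k$; both are illustrative assumptions rather than the exact expressions of the text. Starting from the deepest layer, it solves the saddle point equations layer by layer and reads off the linear relation between $h^{(1)}$ and $h^{(0)}$.

```python
import numpy as np

def backward_profile(a, c, m2, dz, h_deep=1.0):
    """Solve the saddle equations from the deepest layer back to the boundary."""
    N = len(c) - 1
    h = np.zeros(N + 1)
    h[N] = h_deep
    # Deepest layer: there is no layer N+1, so h[N-1] is fixed by h[N] alone.
    h[N - 1] = h[N] * (1.0 + dz**2 * c[N] * m2 / a[N - 1])
    for k in range(N - 1, 0, -1):
        h[k - 1] = h[k] + (dz**2 * c[k] * m2 * h[k]
                           - a[k] * (h[k + 1] - h[k])) / a[k - 1]
    return h

def hidden_gradient(h, a, c, m2, dz):
    """dE/dh[k] for the hidden layers k = 1..N (layer k = 0 is the input)."""
    N = len(h) - 1
    grad = np.zeros(N)
    for k in range(1, N + 1):
        g = 2.0 * a[k - 1] * (h[k] - h[k - 1]) / dz + 2.0 * dz * c[k] * m2 * h[k]
        if k < N:
            g -= 2.0 * a[k] * (h[k + 1] - h[k]) / dz
        grad[k - 1] = g
    return grad

N, dz, m2 = 10, 0.1, 1.0
z = dz * (np.arange(N + 1) + 1)   # lattice positions, shifted to avoid z = 0
a, c = 1.0 / z[:N], 1.0 / z**3    # AdS-like profiles, assumed for illustration
h = backward_profile(a, c, m2, dz)
response = h[1] / h[0]            # coefficient of the relation h^(1) ~ h^(0)
```

Because every step is linear, rescaling the value at the deepest layer rescales the whole profile, so `response` depends only on the weights. This is the discrete counterpart of a linear response coefficient.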
It is intriguing that the saddle point approximation provides explicitly the relation between the variables at the adjacent layers. This relation is expected for neural networks of the feed-forward type. So, we find that the saddle point approximation of the deep Boltzmann machine provides a feed-forward architecture. A subtle difference from the standard feed-forward architecture is that the linear relation starts at the deepest layer, not at the visible layer. Indeed, looking at only the first hidden layer we find that the relation is just like (52), so it is not a linear relation between just the two adjacent layers. The scalar field equation is a second-order differential equation, so there is a backward wave in addition to the forward wave. These two waves satisfy the consistency condition at the deepest layer. Therefore, the saddle point approximation provides a "folded feed-forward" structure. Unfolding the folded structure is possible, and in Appendix A we provide an architecture of the unfolded type, which looks like an autoencoder.

V. REGULARIZATION AND EINSTEIN ACTION
In this section we study the condition for the trained weights to be interpreted as a bulk spacetime. The training should be performed in the following manner. First, prepare a quantum field theory for which one wants to know whether a gravity dual exists or not. Then calculate $Z_{QFT}[J]$ and its probability interpretation $P_{QFT}[J]$ by (10) [45]. Prepare the deep Boltzmann architecture given in Sec. III B, and train it by updating the weights to reduce the KL divergence (11). Once the KL divergence has decreased to sufficient accuracy, we say that the bulk is learned.
The metric function is encoded in the sparse weights of the deep Boltzmann machine, given in (28) and (29). Although it can be easily reconstructed, there is one issue: generically, the training ends up with various different sets of weights, because the error function may have many almost degenerate local minima. Each of the local minima can approximate $P_{QFT}$ very well, which is related to the notion of "generalization" in machine learning.
We are looking for a gravity dual. For the trained Boltzmann machine weights to be interpreted as a bulk spacetime, we need a criterion to pick out a certain set of the weights among the degenerate local minima. The criterion is simple: use an Einstein action as a regularization of the deep Boltzmann machine. [46] Generic trained weights take quite scattered values, and they are not a smooth function of $z$ in the continuum limit $\Delta z \to 0$. For those configurations of weights, the Einstein action takes a large value. On the other hand, smooth metric functions of $z$, and so a set of weights whose values do not vary drastically as one sweeps the depth of the layers, have lower values of the Einstein action. Therefore, the Einstein action can be used for selecting a proper set of weights which has a bulk spacetime interpretation.
A proposed regularization term is a discretization of the Einstein action with a negative cosmological term. To obtain the explicit discretization, for simplicity we consider a conformal spacetime, in which the pure AdS metric is multiplied by a conformal factor parametrized by a function $\alpha(z)$. When $\alpha(z) = 0$, the metric reduces to the pure AdS metric. So the asymptotically AdS spacetimes allow only $\lim_{z \to 0} \alpha(z) = 0$. This ansatz expresses the previous $a(z)$, $b(z)$, $c(z)$ and $d(z)$ in terms of $\alpha(z)$. So, the discretization of the $z$ direction as $z = z_k$ provides a lattice on which $\alpha_k \equiv \alpha(z_k)$ is defined for $k = 0, 1, 2, \cdots$. For $d = 3$ as an example, the Einstein action is written in terms of $\alpha(z)$ as (60). Discretizing this action, we obtain the regularization term $E_{reg}$ in the error function, (61). Here $c$ is a positive constant, and we ignored an additive constant term which is irrelevant for the training. The additive constant comes from the first term in (60), that is, the cosmological constant for the pure AdS spacetime.
It is easy to see that this regularization in fact favors a smoother distribution of the weights, due to the second term in (61). Using this regularization, during the training, the Boltzmann machine tries to minimize also the Einstein action at the same time. When the error decreases to a satisfactorily small value, the weights can be interpreted as an Einstein spacetime. [47] When $E_{reg} = 0$, we have a pure AdS geometry. In a generic AdS/CFT correspondence, the bulk action can take various forms; it may have more supergravity fields, and it may receive higher derivative terms coming from quantum gravity corrections or stringy corrections. Therefore, in general, the regularization needs to allow more generic actions, such as a sum of powers of the curvature with coefficients $c_i$, or more with tensorial structures. This can be discretized by the same method, and we obtain a more general Einstein regularization. Note that here the coefficients $c_i$ are trained variables. In general we do not know the bulk gravity action, so we need to allow a general action. When we say that the bulk is a spacetime, it means that it reduces the value of this general action. When the powers of the Riemann tensors stop at some fixed value, a low energy effective spacetime interpretation is possible.
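The selection mechanism can be illustrated with a deliberately simple stand-in. The penalty below, $c\,\Delta z \sum_k [\alpha_k^2 + ((\alpha_{k+1} - \alpha_k)/\Delta z)^2]$, is not the discretized Einstein action of this section; it is a hypothetical regularizer that shares the two properties stated in the text, namely that it vanishes for $\alpha_k = 0$ (pure AdS) and that it grows for profiles that vary abruptly from layer to layer. The function names and test profiles are likewise illustrative.

```python
import numpy as np

def smoothness_penalty(alpha, dz, coeff=1.0):
    """Stand-in regularizer: coeff * dz * sum(alpha^2 + (d alpha / dz)^2).

    NOT the discretized Einstein action; only a toy with the same
    qualitative behavior (zero for alpha = 0, large for scattered alpha).
    """
    alpha = np.asarray(alpha, dtype=float)
    grad = np.diff(alpha) / dz
    return coeff * dz * (np.sum(alpha**2) + np.sum(grad**2))

def regularized_loss(kl_value, alpha, dz, coeff=1.0):
    """Total error: KL divergence plus the regularization term."""
    return kl_value + smoothness_penalty(alpha, dz, coeff)

rng = np.random.default_rng(3)
z = np.linspace(0.0, 1.0, 21)
dz = z[1] - z[0]
alpha_smooth = 0.1 * z**2                                       # vanishes at z = 0
alpha_noisy = alpha_smooth + 0.1 * rng.normal(size=z.size)      # scattered values
penalty_smooth = smoothness_penalty(alpha_smooth, dz)
penalty_noisy = smoothness_penalty(alpha_noisy, dz)
```

Two weight configurations with the same KL value are then separated by the penalty, and the smoother one is preferred when `regularized_loss` is minimized.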

VI. SUMMARY AND DISCUSSIONS
In this paper, we have shown that the standard AdS/CFT correspondence can be regarded as a deep Boltzmann machine. The neural network architecture, once properly defined, is interpreted as a bulk spacetime geometry. The network depth is the emergent direction in the bulk, and the network weights are metric components. Hidden variables correspond to discretized fields in the bulk, and the probability distribution given by the Boltzmann machine is the generating functional of the QFT dual to the bulk gravity.
For the mapping we used a bulk scalar field theory in curved geometries. The IR boundary conditions of the bulk, such as the black hole horizon or the hard wall, can be implemented in the behavior of the weights around the deepest layer of the Boltzmann machine. The large $N_c$ limit of the AdS/CFT is discussed in the scheme of the Boltzmann machine, consistently giving an organized set of linear equations among weights.
Among many degenerate vacua of the deep Boltzmann machine, a set of weights which allows a spacetime interpretation is selected by a regularization in the error function in addition to the KL divergence. We have introduced a natural regularization based on the Einstein action and its generalization.
Our study provides a relation between the AdS/CFT correspondence and the deep Boltzmann machine. In view of the history of quantum gravity, introducing a discretization of the spacetime is natural, and we hope that more concepts from Boltzmann machines and deep learning can be imported to quantum gravity, so that they may shed light on the mystery of the bulk emergence in the holographic principle.
Several clarifications and comments are in order. First, the discretization of the spacetime used in this paper favors a certain coordinate system, and thus the general coordinate transformation of the gravity theory is not seen in our framework. Furthermore, even the isometry transformation, which is the scale transformation in the QFT, is difficult to implement in our formulation. A hyperbolic network (as used in [14]) is more consistent with the isometry, but it is difficult to find its continuum limit. It would be interesting to seek a more desirable discretization scheme. In fact, a well-known approach to quantum gravity uses dynamical triangulation [12], in which the connection bond topology (which is the axon topology in neural networks) is dynamical. On the other hand, standard neural networks have a fixed architecture while the weights are variable. We may need a refined discretization architecture to have a more unified view of quantum gravity and deep learning. Generic quantum gravity may include even a non-geometric landscape, to which machine learning has been applied [24][25][26][27][28][29][30], and a possible relation to our approach of having a neural network as a spacetime would be interesting.
Second, we have introduced the saddle point approximation in the evaluation of the deep Boltzmann machine, based on the standard large $N_c$ argument of the AdS/CFT correspondence. In this limit, linear relations among weights at layers close to each other are derived, and the information at the visible layer is processed through the bulk, as if it propagates. This means that the saddle point of the deep Boltzmann machine brings it to a folded feed-forward type deep neural network. The AdS/CFT interpretation of a feed-forward neural network was studied in [15,16], and the trained weights exhibit an interesting physical picture.
On the other hand, at finite $N_c$ (beyond the classical limit of the bulk), the relation between the AdS/CFT and the deep Boltzmann machine is a little ambiguous: in the bulk, only the scalar field ($\sim$ hidden variables $h$) is path-integrated while the metric ($\sim$ weights $w$) is not. This situation can be interpreted as the scalar field being that of a probe brane in the bulk. What is the metric path integral in the deep Boltzmann machine? It is a statistical summation over the network weights, which has been studied as statistical neural networks [31]. It would be interesting to see more connections between the holographic principle and statistical neural networks. In fact, in [32] a conformal transformation of data space was found in layer-to-layer propagation, and it may allow a holographic interpretation.
Finally, we comment on a relation to quantum information. It is known that the AdS/CFT correspondence has a close relation to quantum information; in particular, AdS/CFT toy models based on tensor networks have been studied. The structure of MERA [10] has a bulk hyperbolic space interpretation, and tensor networks using perfect tensors [11] provide a quantum correspondence between the bulk and the boundary in the AdS/CFT. Since it is known [23] that any quantum code allows a deep Boltzmann machine interpretation, the AdS tensor networks can be mapped to deep Boltzmann machines. In general, the resulting machine architecture tends to be complicated (since the number of quantum gates necessary to reproduce an $A$-leg tensor is $\sim O(2^{2A})$), so the continuum limit that yields a continuum field theory in the bulk, which we studied in this paper, is difficult to take. Further studies bridging the holographic principle and deep learning are desired.
Note added: while this manuscript was being prepared, we noticed that Ref. [33], which interprets the bulk as a deep generative model, was recently submitted to arXiv.
These two equations with $f_\pm$ govern the non-normalizable and the normalizable modes, respectively. Therefore, once a spacetime bulk metric is given, we find two functions $f_\pm(-\ )$, and use them to define the neural network by discretizing the $\eta$ direction as $\eta = n\Delta\eta$.
Noting the discretization of $\eta$, and interpreting the discretized spatial dependence as a convolution in the neural network, we find that the neural network is defined by the network weights (A11). Note that we also discretize the $(d-1)$-dimensional space of the boundary QFT, as in (A9), where the covariant Laplacian can be identified as a convolution in the neural network. Generically, spatial derivatives in field equations are identified as a combination of weights connecting nearby units. The locality of the bulk field theory is a constraint on the weights of the neural network. In this way, we can always include the spatial dependence of the external field and the response, as a convolutional neural network. So, (A11) defines a convolutional neural network equivalent to (A2), with a trivial activation function.
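As a minimal sketch of the identification of spatial derivatives with weights between nearby units, the following NumPy snippet writes a one-dimensional lattice Laplacian as a convolution with the stencil $[1, -2, 1]$ and uses it in one layer-to-layer update with a trivial (identity) activation. The update rule $\phi \to \phi + \Delta\eta\,(c\,\phi + d\,\nabla^2\phi)$, the periodic lattice, and the names `laplacian_conv`, `layer_step`, `c`, `d`, `d_eta` and `dx` are illustrative assumptions and are not meant to reproduce (A11).

```python
import numpy as np

def laplacian_conv(phi, dx=1.0):
    """Discrete 1D Laplacian as a convolution with stencil [1, -2, 1].

    A periodic lattice is assumed here purely for illustration.
    """
    kernel = np.array([1.0, -2.0, 1.0]) / dx**2
    # Pad periodically so that every site has a left and a right neighbour.
    padded = np.concatenate([phi[-1:], phi, phi[:1]])
    # The kernel is symmetric, so convolution and correlation coincide.
    return np.convolve(padded, kernel, mode="valid")

def layer_step(phi, c, d, d_eta, dx=1.0):
    """One linear layer with identity activation (assumed form):

    phi_next = phi + d_eta * (c * phi + d * Laplacian(phi)).
    Only nearby units are connected, which encodes locality.
    """
    return phi + d_eta * (c * phi + d * laplacian_conv(phi, dx))

# Example: propagate a single Fourier mode through one layer.
phi = np.cos(2 * np.pi * np.arange(16) / 16)
phi_next = layer_step(phi, c=0.1, d=-0.05, d_eta=0.1)
```

Because the stencil couples each unit only to its nearest neighbours, the weight matrix of such a layer is banded; this is the sense in which locality of the bulk theory constrains the weights.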
The left-hand side of the neural network is governed by the propagation weight (A11) for the non-normalizable mode. The input data $J(x)$ is placed at the initial layer $\eta = \eta_{\rm ini} \sim \infty$. It propagates with $W_-$. Then we identify the output $\phi_-(x, \eta = \infty)$ as $\langle O(x)\rangle$. We call this whole network, shown in Fig. 5, a holographic autoencoder.
In practice, for the training we may focus on a slowly varying external field and use the low-momentum expansion $f_\pm = c_\pm(\eta) + d_\pm(\eta) + \cdots$, which determines the weights. We train the coefficient functions $c_\pm$ and $d_\pm$ under the constraint that both consistently solve (A6).
Using the horizon behavior $h(\eta) \sim 1/\eta$ and $g(\eta) \sim \text{const.}$, we find that (A6) has a universal solution $c_\pm \sim \eta m^2/2$, $d_\pm \sim -\eta/2$ (A14). This means that the weight $W$ effectively vanishes near the black hole horizon (except for the trivial "1+" part), due to the redshift factor entering via $h(\eta)$. Therefore, the effective dimension of the data space decreases around the central part of the holographic autoencoder, which fits the name "autoencoder" as usually used in machine learning.
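To illustrate the vanishing of the nontrivial part of the weights, the short sketch below evaluates the leading near-horizon coefficients as printed in (A14) at a few values of $\eta$ approaching the horizon. The function name `near_horizon_coefficients`, the sample values of $\eta$, and the choice $m^2 = 1$ are hypothetical and serve only as an illustration.

```python
import numpy as np

def near_horizon_coefficients(eta, m2):
    """Leading near-horizon behaviour as quoted in (A14):

    c ~ eta * m^2 / 2,  d ~ -eta / 2.
    Valid only close to the horizon, where h(eta) ~ 1/eta.
    """
    c = 0.5 * eta * m2
    d = -0.5 * eta
    return c, d

# Sample points approaching the horizon (eta -> 0); values are illustrative.
etas = np.array([1.0, 0.1, 0.01])
c, d = near_horizon_coefficients(etas, m2=1.0)

# Both coefficients shrink linearly with eta, so only the trivial
# part of the layer-to-layer weight survives at the neck of the network.
magnitudes = np.abs(c) + np.abs(d)
```

The monotonic decrease of `magnitudes` toward the horizon corresponds to the thinning of the lines toward the neck of the network in Fig. 5.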

FIG. 1: Top: the AdS/CFT correspondence. The horizontal coordinate z is the emergent spatial direction. Bottom: a deep Boltzmann machine. Circles are units and lines are weights. Double circles are visible units. It has a layered structure, and "deep" means that there are many layers toward the right in the figure.

FIG. 4: The simplified deep Boltzmann machine, in which the i- and l-dependence is ignored.

FIG. 5: Holographic autoencoder. The depth of the lines shows the average weights, which decrease toward the black hole horizon (the neck part of the neural network).