Deep Learning black hole metrics from shear viscosity

Based on the AdS/CFT correspondence, we build a simple deep neural network for the unsupervised learning of black-hole metrics from the complex frequency-dependent shear viscosity. The network architecture provides a discretized representation of the holographic renormalization group flow of shear response functions and is applicable to a large class of strongly coupled field theories. Given the existence of the horizon and guided by the continuity of spacetime, we show that the Schwarzschild and Reissner-Nordström metrics can be learned accurately. Moreover, we illustrate that the generalization ability of the deep neural network can be excellent, which indicates that, using the black hole spacetime as a hidden medium, a wide spectrum of the shear viscosity can be generated from a narrow frequency range. Our work might not only suggest a data-driven way to study holographic transport, but also shed new light on the emergence mechanism of black hole spacetimes from field theories.

Introduction.-Renormalization group (RG) is a physical scheme for understanding various emergent phenomena through iterative coarse graining [1][2][3][4]. Deep learning (DL) is the core driving force of the recent wave of artificial intelligence [5]. It has been suggested that RG and DL might share a common logic [6], and their relation has attracted a lot of interest [7][8][9][10][11][12][13][14]. In particular, Mehta and Schwab [15] constructed a mapping between variational RG and a deep neural network (DNN) based on restricted Boltzmann machines, from which they claimed that DL algorithms may implement a generalized RG scheme. Later, it was pointed out [16] that the standard renormalization process is essentially feature extraction by supervised learning and cannot be generated by general unsupervised learning unless the network structure is specific, such as the multiscale entanglement renormalization ansatz (MERA) [17]. Furthermore, it was shown that by choosing the loss function to maximize the mutual information instead of the usual Kullback-Leibler divergence, unsupervised learning can identify the relevant degrees of freedom and execute an RG process [18].
RG is believed to be one of the key elements for understanding quantum gravity. In particular, by the anti-de Sitter/conformal field theory (AdS/CFT) correspondence [19][20][21][22], a strongly coupled quantum critical theory in d-dimensional spacetime is reorganized along the RG scale, inducing a classical theory of gravity in (d+1)-dimensional AdS spacetime. RG is not the only connection between DL and gravity. Through the study of tensor networks [23][24][25], especially MERA, it has been realized that the emergence of geometry from field theories usually involves a network and an optimization, which are two important ingredients of DL.
One can expect at least two benefits from connecting DL and gravity: it helps to understand how spacetime emerges, and a data-driven phenomenological model can be built for strongly coupled field theories. Work in the second direction was initiated by [28], where the inverse problem of AdS/CFT is studied: how to reconstruct the spacetime metric from given field theory data with a DNN that implements AdS/CFT. Subsequently, the so-called AdS/DL correspondence was applied to learn the bulk metric from lattice QCD data of the finite-temperature chiral condensate. Interestingly, the emergent metric exhibits both a black hole horizon and an IR wall with finite height, signaling the crossover of the QCD thermal phases [29].
In the prototype of AdS/DL (i.e. the first numerical experiment in [28]), the architecture of the DNN follows the discretized equation of motion of the real φ⁴ theory minimally coupled to Einstein gravity. The data is the one-point function and the conjugate source, with a label determined by the near-horizon scalar field, and the target is the metric of the Schwarzschild black hole. A key technique is to devise a regularization that ensures the continuity of the learned metric. The DNN performs better near the boundary than near the horizon, where the relative error is around 30%. In [33], it has been attempted to learn the Reissner-Nordström (RN) metric by AdS/DL, but the mean square error (MSE) lies in the range from O(10⁻³) to O(10⁻¹). Importantly, it is revealed that the form of the regularization term must be fine-tuned for different metrics. This suggests that the DNN cannot find the target metric if it is not known beforehand, because one cannot judge which of the metrics obtained under different regularizations is closer to the target.
In this letter, we address these technical problems and extend the physical range of AdS/DL. We also hope to gain insight into the emergence mechanism of spacetimes. Our strategies are as follows. First, AdS/CFT is almost tailor-made for computing the transport properties of strongly coupled quantum critical systems at finite temperature [34], and in particular the application of holography is anchored partially in the calculation of the shear viscosity [35]. We therefore adopt the complex frequency-dependent shear viscosity as the given field theory data. Second, we propose to build the DNN according to the holographic RG flow of the shear response function. This flow plays an essential role in the well-known holographic membrane paradigm [36,37], which interpolates smoothly between the standard AdS/CFT correspondence and the classical black hole membrane paradigm [38,39]. Third, our goal is to learn the black hole metric accurately. By selecting suitable coordinates, the metric at the horizon can be fixed to zero, which reduces the difficulty of learning. Fourth, the systematic error in [28] comes from adding labels to the data and introducing the regularization. Because the horizon value of the response function is completely determined by the regularity analysis at the horizon, we can transfer the data from the IR to the UV, which is opposite to [28]. Thus, our DNN can carry out an unsupervised algorithm, meaning that there are no artificially given labels. Fifth, we still use a regularization to guide the network toward a continuous metric. However, our training process has two stages and the regularization is only required in the first stage, so we can choose any regularization term as long as it induces a smaller loss in the second stage. Finally, we discuss possible extensions and physical implications.
From RG flow to DNN.-Suppose that an isotropic and translation-invariant strongly coupled field theory is dual to the 3+1-dimensional classical Einstein gravity, which allows a static black-hole solution with the metric ansatz $ds^2 = -g_{tt}(r)\,dt^2 + g_{rr}(r)\,dr^2 + g_{xx}(r)\,d\vec{x}^2$. (1) When the black-hole background is perturbed by time-dependent sources, the shear mode (δg) x1 x2 = h(r)e^{−iωt} of the gravitational wave is controlled by the equation of motion ... In the Hamiltonian form, it can be written as ... where Π is the momentum conjugate to the massless field h. Consider the foliation in the r-direction and define the shear response function on each cutoff surface ...
FIG. 1. The architecture of the DNN. The green and blue nodes have N layers, which update the response functions from IR to UV by the discretized RG flow equations (11). The arrows indicate the direction of data transfer.
Substituting Eq. (3) into Eq. (4), one can obtain a flow equation ... Note that this equation has been derived in [36], where the focus is on the DC limit. We will study the frequency-dependent behavior.
Applying the regularity of χ on the horizon, one can read off the horizon value of χ directly ... where r_h is the horizon radius. Taking Eq. (7) as the IR boundary condition, the flow equation can be integrated to the UV. However, it should be pointed out that the response function χ on the UV boundary is not equal to the shear viscosity η of the boundary field theory. In the Supplementary Material (SM), we clarify the relationship between them using the Kubo formula of the complex frequency-dependent viscosity tensor [43,44] and the holographic renormalization of the Einstein-Maxwell theory [45].
In the metric ansatz (1), g_xx can be fixed as r² without loss of generality, but g_tt is independent of g_rr in general. However, there are many black holes which share the feature g_tt g_rr = 1, indicating that the radial pressure is the negative of the energy density [46]. For simplicity, we study this situation first and return to the more general case later. For the simple situation, the metric ansatz can be reduced to ... where we have selected the coordinates so that the horizon is located at z = 1 and the boundary at z = 0. Accordingly, Eq. (6) is simplified as ... We will build a DNN according to this flow equation. A schematic diagram of the DNN is plotted in FIG. 1. The N deep layers are located by discretizing the radial direction ... where z_b is the UV boundary, z_h is the IR boundary, and the integer n belongs to [1, N]. The trainable weights of the network represent the discretized metric. The input of the network is the given value of the response at the IR boundary. The output is the response at the UV boundary. The data is transferred from the Nth layer (IR) to the 1st layer (UV). The update rule of the response data between layers is determined by the discretized representation of Eq. (9), Re χ(z + ∆z) = Re χ(z) + ∆z ... Re χ(z) Im χ(z), ... Here we have separated the discretized flow equation into real and imaginary parts for convenience in DL.
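The layer-by-layer update described above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the network used in this work. We assume the reduced metric ds² = z⁻²[−f(z)dt² + dz²/f(z) + dx²], the flow equation ∂_z χ = (iω/f)(z⁻² − z²χ²) that follows from the massless-scalar wave equation in this metric with the normalization χ(z = 1) = 1, and a uniform grid z_n = z_b + n∆z with ∆z = (z_h − z_b)/N. The explicit equations are not reproduced in this text and may differ in normalization. The names `layer_positions`, `euler_step`, `forward`, `f_schwarzschild` and `f_rn` are hypothetical.

```python
def f_schwarzschild(z):
    # Assumed blackening function with the horizon at z = 1.
    return 1.0 - z ** 3


def f_rn(z, q):
    # Assumed RN blackening function; q is the charge density.
    return 1.0 - (1.0 + q * q / 4.0) * z ** 3 + (q * q / 4.0) * z ** 4


def layer_positions(z_b=0.01, z_h=0.99, n_layers=10):
    # Assumed uniform grid: layer n sits at z_b + n * dz, so layer N is at z_h.
    dz = (z_h - z_b) / n_layers
    return [z_b + n * dz for n in range(1, n_layers + 1)], dz


def euler_step(re_chi, im_chi, z, dz, omega, f_z):
    # One forward-Euler step of the assumed flow
    #   d(chi)/dz = (i * omega / f) * (1 / z^2 - z^2 * chi^2),
    # split into real and imaginary parts.
    pref = omega / f_z
    d_re = pref * 2.0 * z * z * re_chi * im_chi
    d_im = pref * (1.0 / (z * z) - z * z * (re_chi ** 2 - im_chi ** 2))
    return re_chi + dz * d_re, im_chi + dz * d_im


def forward(omega, weights, z_b=0.01, z_h=0.99):
    # weights[n-1] plays the role of the metric function f at layer n.
    zs, dz = layer_positions(z_b, z_h, len(weights))
    re_chi, im_chi = 1.0, 0.0  # IR boundary condition chi(z_h) = 1
    # Data flows from the Nth layer (IR) down to the 1st layer (UV).
    for z, f_z in zip(reversed(zs), reversed(weights)):
        re_chi, im_chi = euler_step(re_chi, im_chi, z, -dz, omega, f_z)
    return complex(re_chi, im_chi)
```

In this sketch the list `weights` is what a training loop would adjust. With only N = 10 layers the forward-Euler step is coarse, so the output should be read as the response of the discretized network rather than as an accurate solution of the continuous flow.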
We train this neural network to learn the mapping from frequencies to boundary response functions. Once the training succeeds, we can extract the discretized black hole metric from the trained neural network. The loss function we choose is the L2-norm up to a regularization term, if present, ... Here χ denotes the boundary response generated by the DNN with the target metric, which is compared with the learned result. We need a regularization term which can guide the DNN to find a continuous metric with a horizon. The form of the regularization term can be arbitrary as long as it reduces the final loss. In practice, our regularization term can be specified as ... where the two parts are designed for the continuity and the horizon, respectively. They involve three hyperparameters c_1, c_2 and c_3.
Data, training, and results.-We specify the discretized RG flow and hence the DNN by setting z_b = 0.01, z_h = 0.99, and N = 10. Using Eq. (11) with the IR boundary condition χ(z_h) = 1 and the Schwarzschild metric ... or the RN metric ... where q is the charge density, we generate 2000 data points (ω, χ(ω, z_b)) from ω = 0 to ω = 1 with even spacing, see FIG. 2. The training set and validation set account for 90% and 10% of the data, respectively. We have not labeled the data, so our learning algorithm is unsupervised.
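The data layout and the unregularized part of the loss can be illustrated as follows. This is a sketch with several assumptions that the text does not fix: the split is taken to be a seeded random shuffle, the L2 loss is taken to be the mean of |χ_learned − χ_target|² over the data, and the function names `make_frequencies`, `split` and `l2_loss` are hypothetical. The regularization term is not included because its explicit form is not reproduced here.

```python
import random


def make_frequencies(n=2000, lo=0.0, hi=1.0):
    # n evenly spaced frequencies from lo to hi, endpoints included.
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def split(samples, val_fraction=0.1, seed=0):
    # Assumed random 90% / 10% split into training and validation sets.
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_val = int(round(val_fraction * len(samples)))
    val = [samples[i] for i in idx[:n_val]]
    train = [samples[i] for i in idx[n_val:]]
    return train, val


def l2_loss(chi_learned, chi_target):
    # Assumed normalization: mean squared modulus of the complex residual,
    # which sums the squared errors of the real and imaginary parts.
    total = 0.0
    for p, t in zip(chi_learned, chi_target):
        total += abs(p - t) ** 2
    return total / len(chi_target)
```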
We train the network in two stages. First, the initial weights are randomly selected from (0, 2). The loss function is given by the sum of Eq. (12) and Eq. (13). We adopt the RMSProp optimizer [47]. Second, the initial weights of the DNN are replaced by the trained weights of the first stage. The loss function is reset to Eq. (12) without the regularization. Then the network is trained again with the Adam optimizer [48].
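The logic of the two stages can be illustrated on a toy problem. The sketch below is not the training code of this work: the loss is a direct least-squares fit of ten weights to a target profile rather than the L2-norm of the boundary response, the regularizer is a hypothetical smoothness penalty c Σ(w_{n+1} − w_n)², and the learning rates and step counts are arbitrary choices. The RMSProp and Adam update rules are written out in their standard forms.

```python
import math
import random


def data_loss(w, target):
    return sum((a - b) ** 2 for a, b in zip(w, target))


def data_grad(w, target):
    return [2.0 * (a - b) for a, b in zip(w, target)]


def smooth_grad(w, c):
    # Gradient of the hypothetical penalty c * sum_n (w[n+1] - w[n])**2.
    g = [0.0] * len(w)
    for i in range(len(w) - 1):
        d = w[i + 1] - w[i]
        g[i] -= 2.0 * c * d
        g[i + 1] += 2.0 * c * d
    return g


def rmsprop(w, grad_fn, steps, lr=0.01, rho=0.9, eps=1e-8):
    w, v = list(w), [0.0] * len(w)
    for _ in range(steps):
        for i, gi in enumerate(grad_fn(w)):
            v[i] = rho * v[i] + (1.0 - rho) * gi * gi
            w[i] -= lr * gi / (math.sqrt(v[i]) + eps)
    return w


def adam(w, grad_fn, steps, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    w, m, v = list(w), [0.0] * len(w), [0.0] * len(w)
    for t in range(1, steps + 1):
        for i, gi in enumerate(grad_fn(w)):
            m[i] = b1 * m[i] + (1.0 - b1) * gi
            v[i] = b2 * v[i] + (1.0 - b2) * gi * gi
            m_hat = m[i] / (1.0 - b1 ** t)
            v_hat = v[i] / (1.0 - b2 ** t)
            w[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


def two_stage(target, c=1.0, steps=2000, seed=0):
    rng = random.Random(seed)
    w0 = [rng.uniform(0.0, 2.0) for _ in target]  # initial weights in (0, 2)

    def stage1_grad(w):
        # Stage 1: data loss plus regularization.
        return [a + b for a, b in zip(data_grad(w, target), smooth_grad(w, c))]

    def stage2_grad(w):
        # Stage 2: data loss only, warm-started from stage 1.
        return data_grad(w, target)

    w1 = rmsprop(w0, stage1_grad, steps)
    w2 = adam(w1, stage2_grad, steps)
    return w1, w2
```

In this toy setting the regularizer biases the first-stage weights away from the target, and the second stage removes that bias, which mirrors the reduction of the unregularized loss reported for the second stage.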
After the training of each stage, one can read off the loss, extract the weights, and calculate their error. We find that after the second stage of training, the performance of the DNN is usually improved. In particular, the loss (without regularization) of the first stage can be reduced by several orders of magnitude. In addition, tuning the regularization factors in the first stage can improve the performance of the DNN in the second stage. With these in mind, we scan the parameter space of regularization factors carefully and train the DNN in two stages. Finally, the results with minimum loss are collected. More details of the training scheme are given in the SM. Importantly, the predictions of the DNN are defined by a statistical average, which suppresses the fluctuations due to the randomized initialization of the network.
In TABLE S.1 of the SM, we list the final training reports after two stages and the statistical averages of the various numerical experiments in this letter. Among others, it is shown that from the data with ω ∈ [0, 1], the Schwarzschild and RN metrics can be learned with high accuracy: the mean relative error (MRE) is around O(0.1%) (footnote 2). The target and learned metrics are plotted in FIG. 3(a).
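For concreteness, the two error measures quoted here can be computed as below. The precise definitions used for the reported numbers are not spelled out in this text, so the layer-averaged forms in this sketch are an assumption, and the function names are hypothetical.

```python
def mean_relative_error(learned, target):
    # Assumed definition: average over layers of |learned - target| / |target|.
    # Requires a nonzero target at every layer, which holds for z_h < 1.
    ratios = [abs(l - t) / abs(t) for l, t in zip(learned, target)]
    return sum(ratios) / len(ratios)


def mean_squared_error(learned, target):
    # Assumed definition: average over layers of (learned - target)**2.
    squares = [(l - t) ** 2 for l, t in zip(learned, target)]
    return sum(squares) / len(squares)
```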
So far, we have almost naively selected the frequency range of the data as ∆ω = 1. One important question in DL is how well the model generalizes. To proceed, we consider different datasets with the narrow frequency range ∆ω = 10⁻² and keep 2000 data points in each of them. Interestingly, we find that both the Schwarzschild and RN metrics can still be learned well, although the error increases when the frequency window is close to zero, especially when the charge density is large. This is shown by the MRE of the metrics learned from two typical windows, see the right half of FIG. 3(b). Furthermore, it suggests that the generalization ability of the DNN can be excellent. Indeed, in the left half of FIG. 3(b), we illustrate that using the metric learned from the data with ∆ω = 10⁻², one can generate the data with ∆ω = 1 very accurately: in the best performing example, the MRE of the generated data reaches the order of one millionth. We also note that the examples with relatively large errors in FIG. 3(b) can be expected, because the DC limit of the shear viscosity is determined solely by the physics at the horizon, and for the extremal RN black hole with q = √12, the IR CFT associated with the AdS₂×R² geometry dominates the low-frequency physics [49,50]. Similarly, we do not expect that the DNN can learn well from a very high frequency window, where the UV CFT should dominate (footnote 3).
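The extremal value q = √12 can be checked in one line, assuming that the RN blackening function takes the standard AdS₄ form below in coordinates with the horizon at z = 1. The explicit RN metric is not reproduced in this text, so this form of f(z) is an assumption rather than a transcription.

```latex
f(z) = 1 - \left(1 + \frac{q^2}{4}\right) z^3 + \frac{q^2}{4}\, z^4 , \qquad f(1) = 0 ,
\\
f'(1) = -3\left(1 + \frac{q^2}{4}\right) + q^2 = \frac{q^2 - 12}{4} ,
\qquad
T = \frac{|f'(1)|}{4\pi} = \frac{12 - q^2}{16\pi} .
```

Under this assumption the temperature vanishes at q² = 12, where the horizon becomes a double zero of f(z). This is the regime in which the near-horizon AdS₂×R² region controls the low-frequency response.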
Footnote 2: The MSE is around O(10⁻⁶).
Footnote 3: We have checked that the performance of the DNN declines when the frequency of the window is high enough. In fact, it has been observed in [63,64] that the retarded Green function for the shear stress operator at infinite frequency is determined by the energy density. We thank Matteo Baggioli for the discussion on this point.
Two metric components.-A more general black hole metric has two independent metric components g_tt and g_rr. From Eq. (6), one can find that they appear in the form of the joint factor g_rr/g_tt. Therefore, the DNN can be applied to learn the joint factor, but in general the two components cannot be learned separately from the shear response. Nevertheless, if there is another way to determine one of them, the other can be obtained by the DNN. For example, there is evidence that entanglement plays an important role in weaving the spacetime [51][52][53][54][55]. Among others, it has been shown that the holographic entanglement entropy S(l) can be used to fix the bulk metric wherever the extremal surface reaches [56]. Here l is the scale of the boundary entangling region on which the bulk extremal surface is anchored [52]. Under the present metric ansatz, the holographic entanglement entropy only depends on g_rr, so it can complement the shear viscosity to determine the two metric components.
Conclusion and discussion.-With simple DL, we studied an inverse problem of AdS/CFT: given the complex frequency-dependent shear viscosity of boundary field theories at finite temperature, can the metrics of bulk black holes be extracted? We showed that the Schwarzschild and RN metrics can be learned by the DNN with high accuracy. The network architecture can be taken as a discretized representation of the holographic RG flow of the shear response, and is hence a new application of the holographic membrane paradigm.
We emphasize that the architecture is universal for any isotropic and translation-invariant field theory which is dual to the 3+1-dimensional Einstein gravity. Extensions to symmetry-breaking theories, higher spacetime dimensions, and modified theories of gravity should be plausible. Among others, we note that the wave equation of the shear mode has been built for a general isotropic theory without translational symmetry [57]. The equation involves a radially varying function describing the graviton mass. Accordingly, one can construct the RG flow and the DNN, where the graviton mass is encoded into new trainable weights. It would be interesting to see whether the DNN can learn the metric and the mass simultaneously. In addition to various extensions, there are two open questions which deserve to be mentioned. (i) Is there a better ansatz for the regularization term? Note that the regularization in this work is not meant to prevent overfitting, as is usual in machine learning. Instead, it is a guide to the minimum loss. We might need a deeper physical understanding of the regularization. (ii) How can the continuous RG flow be realized by the DNN? A direct increase in the number of layers requires very powerful computing capabilities. A more efficient method might be to apply the recently proposed DNN models of ordinary differential equations [58]. We ultimately hope that our work could suggest a data-driven way to study holographic transport.
Moreover, we found that the complete black hole metric from IR to UV can be learned well from data with narrow frequency ranges. We have also checked that randomly deleting several data points in our numerical experiments does not qualitatively change the performance of the DNN. These two facts indicate that the shear viscosity encodes the spacetime in a very different way from the entanglement entropy. The latter probes deeper into the spacetime only through S(l) with a larger l, so every data point is necessary to reconstruct the spacetime. Perhaps we can describe this difference concisely as follows: the non-local observable (entanglement entropy) on the boundary probes the bulk spacetime locally, while the local observable (shear viscosity) probes it non-locally.
Furthermore, this non-locality leads to the excellent generalization ability of the DNN, which should be important in applications to experimental data that may be collected only in a part of the spectrum. Theoretically, from the perspective of machine learning, it usually implies that the data is highly structured (footnote 4). This structure is often important but obscure (footnote 5), due to the black-box problem of machine learning. However, here the structure is nothing but the higher-dimensional black hole spacetime. It also suggests that a strongly coupled field theory with a gravity dual could exhibit another feature of the hologram in addition to encoding the higher dimension: the local (a small piece of the hologram) can reproduce the whole, see the schematic diagram in FIG. 4 for this insight.
Acknowledgments
Footnote 4: Another possibility is that the network has some symmetry, see [59] for an example.
Footnote 5: For example, using the generative adversarial network (GAN), approximate statistical predictions have been made recently in the string theory landscape [60], where an accurate extrapolation capability has been exhibited in simulating Kähler metrics. It was speculated that this is the first evidence of Reid's fantasy: all Calabi-Yau manifolds with fixed dimension are connected.
Supplementary material for 'Deep Learning black hole metrics from shear viscosity'

FREQUENCY-DEPENDENT SHEAR VISCOSITY
Compared to the shear viscosity in the zero-frequency limit, studies of its frequency-dependent counterpart are rare. We therefore begin by reviewing the definition of the full complex frequency-dependent viscosity tensor. In [43], the Kubo formula for the stress-stress response function at zero wavevector is derived from first principles, and the frequency-dependent viscosity tensor is then defined. The approach given in [43] involves the response to a uniform external strain and a microscopic Hamiltonian. In [44], an alternative field-theory approach is proposed, by which the Ward identity of viscosity coefficients in [43] is retrieved and extended. Here we follow [44] to give the definition of the frequency-dependent shear viscosity.
For a theorist, the viscosity can be measured by sending a gravitational wave through the system [66]. Suppose that the two-dimensional flat space is perturbed by δg_kl. The response tensor can be defined by ... Here the subscript r and the subscript a below indicate that we have invoked the closed time-path formalism to discuss the real-time response [61,62,65]. The elastic modulus and the viscosity tensor can be further defined by separating the right-hand side of Eq. (S.1) into two parts, ... The stress tensor can be derived by the variation of the generating functional W with respect to the metric, ... (S.3) and the second variation leads to the retarded correlator
$G^{ij,kl}_{ra}(x) \equiv \frac{4\,\delta^2 W}{\delta g_{a\,ij}(x)\,\delta g_{r\,kl}(0)} = \delta^{kl}\langle T^{ij}\rangle\,\delta(x) - \lambda^{ijkl}(x) - \partial_t \eta^{ijkl}(x)$. (S.4)
The elastic modulus is the stress response up to the zeroth order in time derivatives, which can be determined by the constitutive relation of a perfect fluid. In the hydrodynamic expansion, one has
$\delta\langle T^{ij}(x)\rangle_r = -\left(P\,\delta^{ik}\delta^{jl} + \tfrac{1}{2}\,\delta^{ij}\delta^{kl}\,\kappa^{-1}\right)\delta g_{r\,kl}(x)$, (S.5)
where P is the pressure and κ⁻¹ is the inverse compressibility. Then the elastic modulus can be given by
$\lambda^{ijkl}(x) = \left[P\left(\delta^{ik}\delta^{jl} + \delta^{il}\delta^{jk}\right) + \delta^{ij}\delta^{kl}\,\kappa^{-1}\right]\delta(x)$. (S.6)
The viscosity tensor can be decomposed as ... where we have assumed that the system is isotropic in two spatial dimensions. The coefficients ζ, η, η_H denote the bulk, shear, and Hall viscosities, respectively. Substituting the last two equations into Eq. (S.4), one can obtain ... Note that this formula applies to any frequency. In contrast, the shear viscosity in most of the literature is introduced through the constitutive relation of hydrodynamics, so strictly speaking its Kubo formula is only applicable at low frequencies, although its form happens to be the same as Eq. (S.9).