Deep Quantum Geometry of Matrices

We employ machine learning techniques to provide accurate variational wave functions for matrix quantum mechanics, with multiple bosonic and fermionic matrices. The variational quantum Monte Carlo method is implemented with deep generative flows to search for gauge-invariant low-energy states. The ground state (and also long-lived metastable states) of an SUðNÞ matrix quantum mechanics with three bosonic matrices, and also its supersymmetric “mini-BMN” extension, are studied as a function of coupling and N. Known semiclassical fuzzy sphere states are recovered, and the collapse of these geometries in more strongly quantum regimes is probed using the variational wave function. We then describe a factorization of the quantum mechanical Hilbert space that corresponds to a spatial partition of the emergent geometry. Under this partition, the fuzzy sphere states show a boundary-law entanglement entropy in the large N limit.


I. INTRODUCTION
A quantitative, first-principles understanding of the emergence of spacetime from nongeometric microscopic degrees of freedom remains among the key challenges in quantum gravity. Holographic duality has provided a firm foundation for attacking this problem; we now know that supersymmetric large N matrix theories can lead to emergent geometry [1,2]. What remains is the technical challenge of solving these strongly quantum mechanical systems and extracting the emergent spacetime dynamics from their quantum states. Recent years have seen significant progress in numerical studies of large N matrix quantum mechanics at nonzero temperature. Using Monte Carlo simulations, quantitatively correct features of emergent black hole geometries have been obtained; see, e.g., Refs. [3][4][5]. To grapple with questions such as the emergence of local spacetime physics, and its associated short-distance entanglement [6,7], new and inherently quantum mechanical tools are needed.
Variational wave functions can capture essential aspects of low-energy physics. However, the design of accurate many-body wave-function ansatze has typically required significant physical insight. For example, the power of tensor network states, such as matrix product states, hinges upon an understanding of entanglement in local systems [8,9]. In contrast, we are faced with models where there is an emergent locality that is not manifest in the microscopic interactions. This locality cannot be used a priori; it must be uncovered. Facing a similar challenge of extracting the most relevant variables in high-dimensional data, deep learning has demonstrated remarkable success [10][11][12], in tasks ranging from image classification [13] to game playing [14]. These successes, and others, have motivated tackling many-body physics problems with the machine learning toolbox [15]. For example, there has been much interest and progress in applications of restricted Boltzmann machines to characterize states of spin systems [16][17][18][19].
In this work, we solve for low-energy states of quantum mechanical Hamiltonians with both bosons and fermions, using generative flows (normalizing flows [20][21][22] and masked autoregressive flows [23][24][25], in particular) and the variational quantum Monte Carlo method. Compared with spin systems, the problem we are trying to solve contains continuous degrees of freedom and gauge symmetry, and there is no explicit spatial locality. Recent works have applied generative models to physics problems [26][27][28] and have aimed to understand holographic geometry, broadly conceived, with machine learning [29][30][31]. We use generative flows to characterize emergent geometry in large N multimatrix quantum mechanics. As we have noted above, such models form the microscopic basis of established holographic dualities.
We focus on quantum mechanical models with three bosonic large N matrices. These are among the simplest models with the core structure that is common to holographic theories. The bosonic part of the Hamiltonian takes the form ½X i ;X j ½X i ;X j þ 1 2 ν 2 X i X i þ iνϵ ijk X i X j X k : ð1:1Þ Here, the X i are N-by-N traceless Hermitian matrices, with i¼1, 2, 3. The Π i are conjugate momenta, and ν is a mass deformation parameter. The potential energy in Eq. (1.1) is a total square: VðXÞ ¼ 1 4 tr½ðνϵ ijk X k þ i½X i ; X j Þ 2 . The supersymmetric extension of this model [32], discussed below, can be thought of as a simplified version of the BMN matrix quantum mechanics [33]. We refer to the supersymmetric model as mini-BMN, following Ref. [34]. For the low-energy physics we are exploring, the large N planar diagram expansion in this model is controlled by the dimensionless coupling λ ≡ N=ν 3 . Here, λ can be understood as the usual dimensionful 't Hooft coupling of a large N quantum mechanics at an energy scale set by the mass term (cf. Ref. [35]).
The mass deformation in the Hamiltonian (1.1) inhibits the spatial spread of wave functions-which will be helpful for numerics-and leads to minima of the potential at ½X i ; X j ¼ iνϵ ijk X k : ð1:2Þ In particular, one can have X i ¼ νJ i , with the J i being, for example, the N-dimensional irreducible representation of the suð2Þ algebra. This set of matrices defines a "fuzzy sphere" [36]. There are two important features of this solution. First, in the large N limit, the noncommutative algebra generated by the X i approaches the commutative algebra of functions on a smooth two-dimensional sphere [37,38]. Second, the large ν limit is a semiclassical limit in which the classical fuzzy sphere solution accurately describes the quantum state. In this semiclassical limit, the low-energy excitations above the fuzzy sphere state are obtained from classical harmonic perturbations of the matrices about the fuzzy sphere [39]. See also Ref. [40] for an analogous study of the large-mass BMN theory. At large N and ν, these excitations describe fields propagating on an emergent spatial geometry. By using the variational Monte Carlo method with generative flows, we obtain a fully quantum mechanical description of this emergent space. This result, in itself, is excessive given that the physics of the fuzzy sphere is accessible to semiclassical computations. Our variational wave functions will quantitatively reproduce the semiclassical results in the large ν limit, thereby providing a solid starting point for extending the variational method across the entire N and ν phase diagram. Exploring the parameter space, we find that the fuzzy sphere collapses upon moving into the small ν quantum regime. We consider two different "sectors" of the model, with different fermion number R. The first will be purely bosonic states, with R ¼ 0. The second will have an R ¼ N 2 − N. In this latter sector, the fuzzy sphere state is supersymmetric at large positive ν, so we refer to it as the "supersymmetric sector." In the bosonic sector of the model, the fuzzy sphere is a metastable state, and it collapses in a first-order large N transition at ν ∼ ν c ≈ 4. See Figs. 2 and 3 below. In the supersymmetric sector of the model, where the fuzzy sphere is stable, the collapse is found to be more gradual. See Figs. 6 and 7. In Fig. 8, we start to explore the small ν limit of the supersymmetric sector.
Beyond the energetics of the fuzzy sphere state, we define a factorization of the microscopic quantum mechanical Hilbert space that leads to a boundary-law entanglement entropy at large ν. See Eq. (5.14) below. This factorization both captures the emergent local dynamics of fields on the fuzzy sphere and also reveals a microscopic cutoff to this dynamics at a scale set by N. The nature of the emergent fields and their cutoffs can be usefully discussed in string theory realizations of the model. In string-theoretic constructions, fuzzy spheres arise from the polarization of D branes in background fields [41][42][43][44]. A matrix quantum mechanics theory such as Eq. (1.1) describes N "D0 branes"-see Ref. [32] and the discussion section below for a more precise characterization of the string theory embedding of mini-BMN theory-and the maximal fuzzy sphere corresponds to a configuration in which the D0 branes polarize into a single spherical D2 brane. There is no gravity associated with this emergent space; the emergent fields describe the lowenergy worldvolume dynamics of the D2 brane. In this case, the emergent fields are a Maxwell field and a single scalar field corresponding to transverse fluctuations of the brane. In the final section of the paper, we discuss how richer, gravitating states may arise in the opposite small ν limit of the model.

II. THE MINI-BMN MODEL
The mini-BMN Hamiltonian is [32] H The ijk and ABC indices are freely raised and lowered. Lower αβ indices are for spinors transforming in the 2 representation of SO(3), while upper indices are for2. We will not raise or lower spinor indices. The full Hamiltonian can then be written as where λ α † A ≡ðλ Aα Þ † and fλ α † A ;λ Bβ g¼δ AB δ α β are complex fermion creation and annihilation operators. This Hamiltonian is seen to have four supercharges, States that are invariant under all supercharges therefore have vanishing energy. Matrix quantum mechanics theories arising from microscopic string theory constructions are typically gauged. This means that physical states must be invariant under the SUðNÞ symmetry. In particular, physical states are annihilated by the generators ð2:5Þ

A. Representation of the fermion wave function
The mini-BMN wave function can be represented as a function from bosonic matrix coordinates to fermionic states ψðXÞ ¼ fðXÞjMðXÞi. Here, X denotes the three bosonic traceless Hermitian matrices. The function fðXÞ ≥ 0 is the norm of the wave function at X, while jMðXÞi is a normalized state of matrix fermions. A fermionic state with definite fermion number R is parametrized by a complex tensor M ra Aα such that where j0i is the state with all fermionic modes unoccupied. The definition (2.6) is parsed as follows: For any fixed r and a, η ra † ¼ P αA M ra Aα λ α † A is the creation operator for the matrix fermionic modes, where A runs over some orthonormal basis of the suðNÞ Lie algebra and α ¼ 1, 2 for two fermionic matrices. Then, Q a η ra † j0i is a state of multiple free fermions created by η † . The final summation over r in Eq. (2.6) is a decomposition of a general fermionic state into a sum of free fermion states. Such a representation is seen to be completely general (but not unique) if we have the number of free fermion states D sufficiently large.
For purely bosonic models, jMðXÞi is simply the phase of the wave function.

B. Gauge invariance and gauge fixing
The generators (2.5) correspond to the following action of an element U ∈ G ¼ SUðNÞ on the wave function: In other words, the group acts by matrix conjugation. The wave function is required to be invariant under the group action; i.e., Uψ ¼ ψ for any U ∈ G.
Gauge invariance allows us to evaluate the wave function using a representative for each orbit of the gauge group. Let X be the representative in the gauge orbit of X. Gauge invariance of the wave function implies that there must exist functionsf andM such that fðXÞ ¼fðXÞ; The functionsf andM take gauge representatives as inputs or may be thought of as gauge-invariant functions. The wave function we use will be in the form (2.8). The functionsf andM will be parametrized by neural networks, as we describe in Sec. III. We proceed to describe the gauge fixing we use to select the representative for each orbit, as well as the measure factor associated with this choice. The SUðNÞ gauge representativeX will be such that (1) X i ¼ UX i U −1 for i ¼ 1, 2, 3 and some unitary matrix U.
iðiþ1Þ is purely imaginary, with the imaginary part positive for i ¼ 1; 2; …; N − 1. The third condition is needed to fix the Uð1Þ N−1 residual gauge freedom after diagonalizing X 1 . The representativeX is well defined except on a subspace of measure zero where the matrices are degenerate. Then,X can be represented as a vector in R 2ðN 2 −1Þ with a positivity constraint on some components. The change of variables from X toX leads to a measure factor given by the volume of the gauge orbit: jX 2 iðiþ1Þ j: ð2:10Þ Keeping track of this measure (apart from an overall prefactor) will be important for proper sampling in the Monte Carlo algorithm. The derivation of (2.10) is shown in Sec. A of the Supplemental Material (SM) [45].

III. ARCHITECTURE DESIGN FOR MATRIX QUANTUM MECHANICS
In this work, we propose a variational Monte Carlo method with importance sampling to approximate the ground state of matrix quantum mechanics theories, leading to an upper bound on the ground-state energy. The importance sampling is implemented with generative flows. The basic workflow is sketched as follows: (1) Start with a wave function ψ θ with variational parameters θ. In our case, θ will characterize neural networks. (2) Write the expectation value of the Hamiltonian to be minimized as In the mini-BMN case, X denotes three traceless Hermitian matrices (indices omitted), and H X ½ψ θ is the energy density at X. Notationally, E X∼pðXÞ is the expectation value, with the random variable X drawn from the probability distribution pðXÞ. (3) Generate random samples according to the wavefunction probabilities X ∼ p θ ðXÞ ¼ jψ θ ðXÞj 2 , and evaluate their energy densities H X ½ψ θ . The variational energy (3.1) can then be estimated as the average of energy densities of the samples. (4) Update the parameters θ (via stochastic gradient descent) to minimize E θ : where t ¼ 1; 2; … denotes the steps of training and the parameter α > 0 sets the learning rate. The gradient of energy is estimated from Monte Carlo samples: The method is applicable even if the probabilities are available only up to an unknown normalization factor. (5) Repeat steps 3 and 4 until E θ converges. Observables of physical interest are evaluated with respect to the optimal parameters after training. In the following, we discuss details of parametrizing and sampling from gauge-invariant wave functions with fermions. Technicalities concerning the evaluation of H X ½ψ θ are spelled out in the SM, Sec. B [45]. More details concerning the training are given in the SM, Sec. D [45]. Benchmarks are presented at the end of this section.

A. Parametrizing and sampling the gauge-invariant wave function
We first describe how gauge invariance is incorporated into the variational Monte Carlo algorithm. As already discussed, an important step is to sample according to X ∼ jψðXÞj 2 . From Eq. (2.8), for a gauge-invariant wave function, jψðXÞj 2 ¼ jfðXÞj 2 . However, in samplingX, we must keep track of the measure factor ΔðXÞ in Eq. (2.10). We do this as follows: (1) SampleX according to pðXÞ ¼ ΔðXÞjfðXÞj 2 .
(3) Output samples X ¼ UXU −1 . The correctness of this procedure is shown in the SM, Sec. A [45].
Conversely, at the evaluation stage, ψðXÞ can be computed in the following steps for gauge-invariant wave functions (2.8): (1) Gauge fix X¼UXU −1 as discussed in the last section.
(2) ComputeMðXÞ andfðXÞ. Details of the structure of M andf will be discussed below. (3) Return ψðXÞ ¼fðXÞjUMðXÞU −1 i according to Eq. (2.8). We now describe the implementation ofM andf as neural networks. The basic building block, a multilayer fully connected (also called dense) neural network, is an elemental architecture capable of parametrizing complicated functions efficiently [12]. The neural network defines a function F∶x ↦ y mapping an input vector x to an output vector y via a sequence of affine and nonlinear transformations: θ is an affine transformation, where the weights M 1 θ and the biases b 1 θ are trainable parameters. The hyperbolic tangent nonlinearity then acts elementwise on A 1 θ ðxÞ. We experimented with different activation functions; the final result is not sensitive to this choice. Similar mappings are applied m times, allowing M i θ and b i θ to be different for different layers i, to produce the output vector y. The mapping F∶x ↦ y is nonlinear and capable of approximating any square integrable function if the number of layers and the dimensions of the affine transformations are sufficiently large [46]. The functionMðXÞ is implemented as such a multilayer fully connected neural network, mapping from vectorized X toM in Eq. (2.6), i.e., R 2ðN 2 −1Þ → R DR2ðN 2 −1Þ . The implementation offðXÞ is more interesting, as both evaluatingfðXÞ and sampling from the distribution pðXÞ¼ΔðXÞjfðXÞj 2 are necessary for the Monte Carlo algorithm. Generative flows are powerful tools to efficiently XIZHI HAN and SEAN A. HARTNOLL PHYS. REV. X 10, 011069 (2020) 011069-4 parametrize and sample from complicated probability distributions. The functionfðXÞ ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi pðXÞ=ΔðXÞ p , so we can focus on sampling and evaluating pðXÞ, which will be implemented by generative flows.
Two generative flow architectures are implemented for comparison: a normalizing flow and a masked autoregressive flow. The normalizing flow starts with a product of simple univariate probability distributions pðxÞ ¼ p 1 ðx 1 Þ… p M ðx M Þ, where the p i can be different. Values of x sampled from this distribution are passed through an invertible multilayer dense network as in Eq. (3.4). The probability distribution of the output y is then The masked autoregressive flow generates samples progressively. It requires an ordering of the components of the input, say, where the parameter depends only on previous components. Thus, x 1 is sampled independently, and for other components, the dependence F i is given by Eq. (3.4). The overall probability is the product When p i ðx i Þ are chosen as normal distributions, both flows are able to represent any multivariate normal distribution exactly. Features of the wave function (such as polynomial or exponential tails) can be probed by experimenting with different base distributions p i ðx i Þ. Choices of the base distributions and performances of the two flows are assessed in the following benchmark subsection and also in the SM, Sec. D [45]. We use both types of flow in the numerical results of Sec. IV.

B. Benchmarking the architecture
In Ref. [34], the Schrödinger equation for the N ¼ 2 mini-BMN model was solved numerically. Comparison with the results in that paper allows us to benchmark our architecture before moving to larger values of N. In Ref. [34], the Schrödinger equation is solved in sectors with a fixed fermion number and total SO(3) angular momentum j ¼ 0; 1=2. We do not constrain j, but we do fix the number of fermions in the variational wave function.
The variational energies obtained from our machine learning architecture with R ¼ 0 and R ¼ 2 are shown as a function of ν in Fig. 1. We take negative ν to compare with the results given in Ref. [34], which uses an opposite sign convention. (There is a particle-hole symmetry of the Hamiltonian (2.2) via ν → −ν, λ → λ † , λ † → λ, and X → −X). The masked autoregressive flow yields better (lower) variational energies. These energies are seen to be close to the j ¼ 0 results obtained in Ref. [34]. The variational results seem to be asymptotically accurate as jνj → ∞, while remaining a reasonably good approximation at small ν. Small ν is an intrinsically more difficult regime, as the potential develops flat directions (visualized in Ref. [34]), and hence the wave function is more complicated, possibly with long tails. In the supersymmetric R ¼ 2 sector, where quantum mechanical effects at small ν are expected to be strongest, further significant improvement at the smallest values of ν is seen with deeper autoregressive networks and more flexible base distributions, as we describe shortly. Analogous improvements in these regimes will also be seen at larger N in Sec. IV C and the SM, Sec. D [45].
In Fig. 1, the base distributions p i ðx i Þ, introduced in the previous subsection, are chosen to be a mixture of s generalized normal distributions: FIG. 1. Benchmarking the architecture: Variational groundstate energies for the mini-BMN model with N ¼ 2 and fermion numbers R ¼ 0 and R ¼ 2 (shown as dots) compared to the exact ground-state energy in the j ¼ 0 sector, obtained in Ref. [34] (shown as the dashed curve). Uncertainties are at or below the scale of the markers; in particular, the variational energies slightly below the dashed line are within the numerical error of the line. NF stands for normalizing flows and MAF for masked autoregressive flows. As described in the main text, the numbers in the brackets are, first, the number of layers in the neural networks and, second, the number of generalized normal distributions in each base distribution.
Here, the k i r are positive weights for each generalized normal distribution in the mixture. In Eq. (3.8), the k i r , α i r , β i r , and μ i r are learnable (i.e., variational) parameters. For autoregressive flows, these parameters further depend on Because of the gauge fixing conditions 2 and 3 in Sec. II B, some components x i are constrained to be positive. In the normalization flow, this case is implemented by an additional map x i ↦ expðx i Þ. For the autoregressive flows, we have a more refined control over the base distributions; in this case, for components x i that must be positive, we draw from Gamma distributions instead: where again the k i r , α i r , and β i r depend on x j , with 1 ≤ j < i, according to Eq. (3.4).
In Fig. 1, we have shown mixtures with s ¼ 1, 3, 5 distributions. The number of layers in Eq. (3.4) has been increased with s to search for potential improvements in the space of variational wave functions. As noted, the only improvement within the autoregressive flows in going beyond one layer and one generalized normal distribution is seen at the smallest values of ν with R ¼ 2. On the other hand, the gap between the variational energies of the two types of flows in Fig. 1 suggests that the wave function is complicated in this regime, so the more sophisticated MAF architecture shows an advantage. The recursive nature of the MAF flows means that they are already "deep" with only a single layer. The complexity of the small ν wave function should be contrasted with the fuzzy sphere phase at large positive ν as discussed in Sec. IV and shown in, e.g., Figs. 2 and 3. The wave function in this semiclassical regime is almost Gaussian, and indeed, the NF(1, 1) and MAF(1, 1) flows give similar energies when initialized near fuzzy sphere configurations. The NF architecture, in fact, gives slightly lower energies in this regime, so we have used normalizing flows in Figs. 2 and 3 for the fuzzy sphere.
The numerics above and below are performed with D ¼ 4 in Eq. (2.6), so the fermionic wave function jMðXÞi is a sum of four free fermion states for each value of the bosonic coordinates X. In the SM, Sec. D [45], we see that increasing D above 1 lowers the variational energy at small ν, indicating that the fermionic states are not Hartree-Fock in this regime.

IV. EMERGENCE OF GEOMETRY
A. Numerical results, bosonic sector The architecture described above gives a variational wave function for low-energy states of the mini-BMN model. With the wave function in hand, we can evaluate observables. We start with the purely bosonic sector of the model (i.e., R ¼ 0). Then, we add fermions. An important difference between the bosonic and supersymmetric cases is that the semiclassical fuzzy sphere state is metastable in the bosonic theory but stable in the supersymmetric theory. Figure 2 shows the expectation value of the radius for runs initialized close to a fuzzy sphere configuration (solid circles) and close to zero (open circles). For large ν, a fuzzy sphere state with large radius is found, in addition to a "collapsed" state without significant spatial extent. Below ν c ≈ 4, the fuzzy sphere state ceases to exist. The nature of the transition at ν c can be understood from the variational energy of the states, plotted in Fig. 3. The bosonic semiclassical fuzzy sphere state is seen to be metastable at large ν, as the collapsed state has lower energy. For ν < ν c , the fuzzy sphere is no longer even metastable. We gain a semiclassical understanding of this transition in Sec. IV B. Figures 2 and 3 show that the radius and energy of the fuzzy sphere state are accurately described by semiclassical formulas (derived in the following section) for all ν > ν c . In particular, E=N 3 and r=N are rapidly converging towards their large N values. Figure 4 further shows that the probability distribution for the radius r becomes strongly peaked about its semiclassical expectation value at large ν.
Analogous behavior to that shown in Figs. 2 and 3 has previously been seen in classical Monte Carlo simulations of a thermal analogue of our quantum transition [47][48][49]. These papers study the thermal partition function of models similar to Eq. (1.1) in the classical limit, i.e., without the Π 2 kinetic energy term. The fuzzy geometry emerges in a first-order phase transition as a low-temperature phase in these models. We see that, in our quantum mechanical context, the geometric phase is associated with the presence of a specific boundary-law entanglement.

B. Semiclassical analysis of the fuzzy sphere
The results above describe the emergence of a (metastable) geometric fuzzy sphere state at ν > ν c . In this section, we recall that in the ν → ∞ limit, the fluctuations of the geometry are classical fields. For finite ν > ν c , the background geometry is well defined at large N, but fluctuations will be described by an interacting (noncommutative) quantum field theory.
In the large ν limit, the wave function can be described semiclassically [39,40]. We now briefly review this limit, with details given in the SM, Sec. C [45]. These results provide a further useful check on the numerics and will guide our discussion of entanglement in Sec. V.
The minima of the classical potential occur at ½X i ; X j ¼ iνϵ ijk X k : ð4:2Þ These are supersymmetric solutions of the classical theory, annihilated by the supercharges (2.3) in the classical limit; therefore, they have vanishing energy. The solution of Eq. (4.2) is where the J i are representations of the suð2Þ algebra, ½J i ; J j ¼ iϵ ijk J k . Here, we are interested in maximal, N-dimensional irreducible representations. (Reducible representations can also be studied, corresponding to multiple polarized D branes.) The suð2Þ Casimir operator suggests a notion of "radius" given by Indeed, the algebra generated by the X i matrices tends towards the algebra of functions on a sphere as N → ∞ [37,38]. At finite N, a basis for this space of matrices is provided by the matrix spherical harmonicsŶ jm . These harmonics obey X 3 i¼1 ½J i ;½J i ;Ŷ jm ¼ jðjþ1ÞŶ jm ; ½J 3 ;Ŷ jm ¼mŶ jm : ð4:5Þ We construct theŶ jm explicitly in the SM, Sec. C [45]. The j index is restricted to 0 ≤ j ≤ j max ¼ N − 1. The space of matrices therefore defines a regularized or "fuzzy" sphere [36]. Matrix spherical harmonics are useful for parametrizing fluctuations about the classical state (4.3). Writing the classical equations of motion can be perturbed about the fuzzy sphere background to give linear equations for the parameters y i jm . The solutions of these equations define the classical normal modes. We find the normal modes in the SM, Sec. C [45], proceeding as in Refs. [39,40]. The normal mode frequencies are found to be νω, with Here, j is the "orbital" angular momentum, and the 1 is due to the vector nature of the X i . We will give a field theoretic interpretation of these modes shortly. The modes give the following semiclassical contribution to the energy of the fuzzy sphere state: This energy is shown in Fig. 3. The scaling as N 3 arises because there are N 2 oscillators, with maximal frequency of order N. This semiclassical contribution will be canceled out in the supersymmetric sector studied in Sec. IV C below. The normal modes (4.7) can be understood by mapping the matrix quantum mechanics Hamiltonian onto a noncommutative gauge theory. The analogous mapping for the classical model has been discussed in Ref. [50]. We carry out this map in the SM, Sec. C [45]. The original Hamiltonian (1.1) becomes the following noncommutative U(1) gauge theory on a unit spatial S 2 [setting the sphere radius to 1 in the field theory description will connect easily to the quantized modes in Eq. (4.7)]: The noncommutative star product ⋆ is defined in the SM [45] and where the derivatives generate rotations on the sphere L i ¼ −iϵ ijk x j ∂ k and ½f; g ⋆ ≡ f⋆g − g⋆f. In Eqs. (4.9) and (4.10), the vector potential a i can be decomposed into two components tangential to the sphere, which become the two-dimensional gauge field, and a component transverse to the sphere, which becomes a scalar field. This decomposition is described in the SM, Sec. C [45]. The normal modes (4.7) are coupled fluctuations of the gauge field and the transverse scalar field. The zero modes in Eq. (4.7) are pure gauge modes, given in Eq. (4.11) below. In Eq. (4.10), the effective coupling controlling quantum field theoretic interactions is seen to be 1=ðNνÞ 3=2 . The extra 1=N arises because the commutator ½a i ; a j ⋆ vanishes as N → ∞; see the SM [45]. Corrections to the Gaussian fuzzy sphere state are therefore controlled by a different coupling than that of the 't Hooft expansion (recall λ ¼ N=ν 3 ). The SUðNÞ gauge symmetry generators (2.5) are realized in an interesting way in the noncommutative field theory description. We see in the SM [45] that upon mapping to noncommutative fields, the gauge transformations become Here, n is the normal vector and yðθ; ϕÞ a local field on the sphere. The first term in Eq. (4.11) is the usual U(1) transformation. The second term describes a coordinate transformation with infinitesimal displacement n × ∇y. Indeed, it is known that noncommutative gauge theories mix internal and spacetime symmetries, which, in this case, are area-preserving diffeomorphisms of the sphere [51,52]. The emergent U(1) noncommutative gauge theory thereby realizes the large N limit of the microscopic SUðNÞ gauge symmetry, as area-preserving diffeomorphisms [37,38].
The fluctuation modes about the fuzzy sphere background allow a one-loop quantum effective potential for the radius to be computed in the SM, Sec. C [45]. The potential at N → ∞ is shown in Fig. 5. At large ν, the effective potential shows a metastable minimum at r ∼ Nν=2. For ν < ν 1-loop c;N¼∞ , this minimum ceases to exist. The large N, one-loop analysis therefore qualitatively reproduces the behavior seen in Figs. 2 and 3. The quantitative disagreement is mainly due to finite N corrections. The transition is only sharp as N → ∞.

C. Numerical results, supersymmetric sector
We now consider states with fermion number R ¼ N 2 − N. The fuzzy sphere background is now supersymmetric at large positive ν [32]. The contribution of the fermions to the ground-state energy is seen in the SM, Sec. C [45], to cancel the bosonic contribution (4.8) at one loop: In Fig. 6, the variational upper bound on the energy of the fuzzy sphere state remains close to zero for all values of ν. Figure 7 shows the radius as a function of ν. Probing the smallest values of ν requires a more powerful wave-function ansatz than those of Figs. 6 and 7. We will consider that regime shortly. In contrast to the states with zero fermion number in Fig. 3, here the fuzzy sphere is seen to be the stable ground state at large ν. However, the fuzzy sphere appears to merge with the collapsed state below a value of ν that decreases with N, which is physically plausible: While the classical fuzzy sphere radius r 2 ∼ ν 2 N 2 decreases at small ν, quantum fluctuations of the collapsed state are expected to grow in space as ν → 0. This is because the flat directions in the classical potential of the ν ¼ 0 theory, given by commuting matrices, are not lifted in the presence of supersymmetry [53]. Eventually, the fuzzy sphere should be subsumed into these quantum fluctuations. This smoother large N evolution towards small ν (relative to the bosonic sector) is mirrored in the thermal behavior of classical supersymmetric models [54,55].
Indeed, exploring the small ν region with more precision, we observe a physically expected feature. In Fig. 8, we see that as ν decreases towards zero, the radius not only ceases to follow the semiclassical decreasing behavior but turns around and starts to increase. The variance in the distribution of the radius is also seen to increase towards small ν, revealing the quantum mechanical nature of this regime. These behaviors (nonmonotonicity of radius and increasing variance) are expected-and proven for N ¼ 2-because the flat directions of the classical potential at ν ¼ 0 show that the extent of the wave function is set by purely quantum mechanical effects in this limit.
The small ν regime here is furthermore an opportunity to test the versatility of our variational ansatz away from semiclassical regimes. In the SM, Sec. D [45], we see that for small ν, MAFs achieve much lower energies than NFs. Increasing the number of distributions in the mixture and the number D of free fermions states in Eq. (2.6) further lowers the energy. These facts mirror the behavior we found in our N ¼ 2 benchmarking in Sec. III B at small ν, increasing our confidence in the ability of the network to capture this regime for large N also. The error in a variational ansatz is, as always, not controlled, and therefore, further exploration of this regime is warranted before very strong conclusions can be drawn. We plan to revisit this regime in future work, to search for the possible presence of emergent "throat" geometries, as we discuss in Sec. VI below.

V. ENTANGLEMENT ON THE FUZZY SPHERE
In this section, we show that the large ν fuzzy sphere state discussed above contains boundary-law entanglement. To compute the entanglement, one must first define a factorization of the Hilbert space. For our emergent space at finite N and ν, the geometry is both fuzzy and fluctuating, and hence lacks a canonical spatial partition. The fuzziness of the sphere is captured by a toy model of a free field on a sphere with an angular-momentum cutoff. Recall from Sec. IV that the noncommutative nature of the fuzzy sphere amounts to an angular-momentum cutoff j max ¼ N − 1. We will start, then, by defining a partition of the space of functions with such a cutoff.

A. Free field with an angular-momentum cutoff
Consider a free massive complex scalar field φðθ; ϕÞ on a unit two-sphere with the following Hamiltonian: Here, π is the field conjugate to φ. We impose a cutoff j ≤ j max on the angular momentum, rending the quantum mechanical problem well defined. The fields can therefore be decomposed into a sum of spherical harmonic modes: a jm Y jm ðθ; ϕÞ: ð5:2Þ The wave functional of the quantum field φðθ; ϕÞ is then a mapping from coefficients a jm to complex amplitudes. The ground-state wave functional of the Hamiltonian (5.1) is To calculate entanglement for quantum states, a factorization of the Hilbert space H ¼ H 1 ⊗ H 2 is prescribed. To motivate the construction of such a factorization in the fuzzy sphere case, we now review the general framework of defining entanglement in (factorizable) quantum field theories.
In quantum mechanics, a quantum state is a function from the configuration space Q to complex numbers, and the Hilbert space of all quantum states is commonly that of square integrable functions H ¼ L 2 ðQÞ. In quantum field theories, the space Q is furthermore a linear space of functions on some geometric manifold M, and thus an orthogonal decomposition Q ¼ Q 1 ⊕ Q 2 induces a factorization of H ¼ L 2 ðQ 1 Þ ⊗ L 2 ðQ 2 Þ, which can be exploited to define entanglement.
To define entanglement, it then suffices to find an orthogonal decomposition of the space of fields on the fuzzy sphere. Without an angular-momentum cutoff, i.e., with j max → ∞, there is a natural choice for any region A on the sphere, which sets Q 1 to be all functions supported on A, and Q 2 all functions supported onĀ, the complement of A. Any function f on M can be uniquely written as a sum of Here, χ A is the function on the sphere that is 1 on A and 0 otherwise. Note that the map of multiplication by χ A , f ↦ fχ A , acts as the projection Q 1 ⊕ Q 2 → Q 1 . Conversely, given any orthogonal projection operator P∶Q → Q, we can decompose Q ¼ imP ⊕ ker P.
When the cutoff j max is finite, multiplication by χ A will generally take the function out of the subspace of functions with j ≤ j max . However, we can still do our best to approximate the projector P ∞ A of multiplication by χ A , as defined in the previous paragraph, with a projector P j max A that lives in the subspace with j ≤ j max . Formally, let Q j max be the space of functions on the sphere spanned by Y jm ðθ; ϕÞ, with j ≤ j max . Define the orthogonal projector P j max A ∶Q j max → Q j max to minimize the distance kP j max A − P ∞ A k. The projector P j max A annihilates all functions in the orthogonal complement of Q j max , when viewed as an operator acting on Q ∞ . It is convenient to choose k·k to be the Frobenius norm, and in the SM, Sec. E [45], an explicit formula for P j max A is obtained. The projector P j max A then defines a factorization of the Hilbert space L 2 ðQ j max Þ ¼ L 2 ðimP j max A Þ ⊗ L 2 ðker P j max A Þ for any region A, and entanglement can be evaluated in the usual way. In particular, the second Rényi entropy of a pure state jψi on a region A is where x A ¼ Px and xĀ ¼ ðI − PÞx are integrated over imP and ker P, for P ¼ P j max A , and x A and xĀ can be more compactly combined into a field x with j ≤ j max . Note that the various x's in Eq. (5.4) denote functions on the sphere.
The projector P j max A is found to have two important geometric features: (1) The trace of the projector, which counts the number of modes in a region, is proportional to the size of the region. Specifically, at large j max , trP j max A ∝ j 2 max jAj as is seen numerically in Fig. 9 and understood analytically in the SM, Sec. E [45]. FIG. 9. Trace of the projector versus fractional area of the region (a spherical cap with polar angle θ A ), with different angular-momentum cutoffs j max . A linear proportionality is observed at large j max . The discreteness in the plot arises because the finite j max space of functions cannot resolve all angles.
(2) The second Rényi entropy defined by the projector follows a boundary law. At large j max , with the mass fixed to μ ¼ 1, the entropy S 2 ≈ 0.03j max j∂Aj as is seen numerically in Fig. 10 and understood analytically in the SM, Sec. E [45]. This boundary entanglement law in Fig. 10 is of course precisely the expected entanglement in the ground state of a local quantum field [6,7]. As the cutoff j max is removed, the entanglement grows unboundedly.
The partition we have just defined can now be adapted to the fluctuations about the large ν fuzzy sphere state in the matrix quantum mechanics model. We do this in the following subsection. Intuitively, we would like to replace the jðj þ 1Þ þ μ 2 spectrum of the free field in the wave function (5.3) with the matrix mechanics modes (4.7). Recall that the matrix modes are cut off at angular momentum j max ¼ N − 1.

B. Fuzzy sphere in the mini-BMN model
Now, we address two additional subtleties that arise when adapting the free field ideas above to the mini-BMN fuzzy sphere. First, the mini-BMN theory is an SUðNÞ gauge theory. It is known that entanglement in gauge theories may depend upon the choice of gauge-invariant algebras associated with spatial regions [56]. Different prescriptions correspond to different boundary or gauge conditions [57]. However, for a fuzzy geometry, the boundaries of regions and gauge edge modes are not sharply defined. To introduce the fewest additional degrees of freedom, we choose to factorize the physical Hilbert space, instead of an extended one [58,59], to evaluate entanglement in the mini-BMN model. This method is similar to the "balanced center" procedure in Ref. [56], where edge modes are absent [60].
Second, the emergent fields include fluctuations of the geometry itself. The factorization that we have discussed in the previous subsection is tailored to a region on the sphere and does not need to approximate a spatial region in other geometries. The partition is even less meaningful in nongeometric regions of the Hilbert space. The variational wave function we have constructed can be used to compute entanglement for any given factorization of the Hilbert space, but it is unclear if preferred factorizations exist away from geometric limits. In this work, we focus on the entanglement in the ν → ∞ limit where the fields are infinitesimal and hence do not backreact on the spherical geometry. In this limit, the factorization is precisely-up to issues of gauge invariance-that of the free-field case discussed in the previous subsection.
The matrices corresponding to the infinitesimal fields on the fuzzy sphere are, cf. Eq. (4.6), which should be thought of as living in the tangent space at X i ¼ νJ i . At large ν, the wave function is strongly supported on the classical configuration, and hence in this limit, the infinitesimal description is accurate. Gauge transformations then act as where ϵ is infinitesimal and Y is an arbitrary Hermitian matrix. The ϵ½Y; A i term is omitted in Eq. (5.6) as it is of higher order. Gauge invariance of the state is manifested as Physical states are wave functions on gauge orbits ½A i , the set of infinitesimal matrices differing from A i by a gauge transformation (5.6). Similarly to the discussion of free fields above, a partition of the space of gauge orbits is specified by a projector P. We now explain how this projector is constructed. Given a projector P 0 acting on infinitesimal matrices A i , a projector acting on gauge orbits can be defined as However, for P to be well defined, P 0 must preserve gauge directions: for any A i , Y and some Y 0 dependent on Y. Let V be the subspace of gauge directions: V ¼ fi½Y; J i ∶Yis Hermitiang: ð5:10Þ Then, Eq. (5.9) is equivalent to the requirement that P 0 ðVÞ ⊂ V. The strategy for finding the projector P is to solve for the projector P 0 that minimizes kP 0 − χ A k subject to the constraint that Eq. (5.9) is satisfied. Then, P is defined via P 0 as in Eq. (5.8).
The problem of minimizing kP 0 − χ A k for orthogonal projectors P 0 such that P 0 ðVÞ ⊂ V is exactly solvable as follows. The condition that P 0 ðVÞ ⊂ V is equivalent to imposing that P 0 ¼ P V ⊕ P V ⊥ , where P V is some projector in the subspace V and P V ⊥ in its orthogonal complement Via the correspondence between matrix spherical harmon-icsŶ jm and spherical harmonic functions Y jm ðθ; ϕÞ-in the SM, Sec. C [45]-both of these minimizations become the same problem as in the free field case, with a detailed solution in the SM, Sec. E [45].
The second Rényi entropy, in terms of gauge orbits, is evaluated similarly to Eq. (5.4): where Δ are measure factors for gauge orbits and ψ inv ð½AÞ ¼ ψðνJ þ AÞ. Recall that ψ is gauge invariant according to Eq. (5.7). The formula (5.11) as displayed does not involve any gauge choice. However, there are some gauges where evaluating Eq. (5.11) is particularly convenient. The gauge we choose for this purpose, which is different from that in Sec. II B, is that A ∈ V ⊥ ; i.e., the fields are perpendicular to gauge directions. In this gauge, measure factors are trivial, and the projector is simply P V ⊥ , which minimizes kP V ⊥ − χ A j V ⊥ k: where ψ ⊥ ðAÞ is defined as ψðνJ þ AÞ for A ∈ V ⊥ [61]. The bosonic fuzzy sphere wave function can be written in the ν→∞ limit as follows. As in Eq. (4.6), the perturbations can be decomposed as A i ¼ P a δx a P jm y i jmaŶjm , where the y i jma diagonalize the potential energy at quadratic order in A so that V ¼ ðν 2 =2Þ P a ω 2 a ðδx a Þ 2 þÁÁÁ (see SM, Sec. C [45]). The wave function is then, analogously to Eq. (5.3), ð5:13Þ The frequencies are given by Eq. (4.7), excluding the pure gauge zero modes. Using this wave function, the Rényi entropy (5.12) can be computed exactly and is shown as a solid line in Fig. 11. As N → ∞, these curves approach a boundary law Here, j∂Aj ¼ 2π sin θ A is again the circumference of the spherical cap A [in units where the sphere has radius one, consistent with the field theoretic description in Eq. (4.9)]. The result (5.14) is the same as that of the toy model in Fig. 10, with j max now set by the microscopic matrix dynamics to be N − 1. (A simpler instance of entanglement revealing the inherent graininess of a spacetime built from matrices is two-dimensional string theory [62,63]). This regulated boundary-law entanglement underpins the emergent locality on the fuzzy sphere at large N and ν. Recall from the discussion around Eq. (4.9) that there are only two emergent fields on the sphere: a Maxwell field and a scalar field. The perpendicular gauge choice we have made translates into the Coulomb gauge for the emergent Maxwell field, cf. the discussion around Eq. (4.11) above. The factor of N in Eq. (5.14) is due to the microscopic cutoff at a scale L fuzz ∼ L sph =N. Previous work on the entanglement of a free field on a fuzzy sphere involved similar wave functions but a different factorization of the Hilbert space, which was inspired instead by coherent states [64][65][66][67]. Those results did not always produce boundary-law entanglement. Here, we see that the UV/IR mixing in noncommutative field theories does not preclude a partition of the large N and large ν Hilbert space with a boundary-law entanglement.
We can also evaluate the entropy (5.12) using the large ν variational wave functions, without assuming the asymptotic form (5.13). The results are shown as dots in Fig. 11. However, we stress that only the ν → ∞ limit has a clear physical meaning, where fluctuations are infinitesimal. The variational results are close to the exact values in Fig. 11, showing that the neural network ansatz captures the entanglement structure of these matrix wave functions.
The results in this section are for the bosonic fuzzy sphere. The projection we introduced in order to partition the space of matrices can be extended in a similar, but more involved, way to factorize the fermionic Hilbert space.

VI. DISCUSSION
We have seen that neural network variational wave functions capture, in detail, the physics of a semiclassical spherical geometry that emerges in the mini-BMN model (2.1) at large ν. Away from the semiclassical limit, the spherical geometry either abruptly or gradually collapses towards a new state. In Fig. 8, we saw that in the "supersymmetric" sector, this new state was characterized by an increase in both the expectation value and quantum mechanical variance of the radius as ν → 0. To understand the physics of this process, and to start thinking about the nature of the collapsed state as ν → 0, it is helpful to consider the string theoretic embedding of the model.
The mini-BMN model can be realized in string theory as the description of N D-particles in an AdS 4 spacetime. Let us review some aspects of this realization [32]. The parameter 1 ν 3 ∼ g s L AdS L s 3 :

ð6:1Þ
Here, L AdS is the AdS radius, L s is the string length, and g s is the string coupling. The proportionality in Eq. (6.1) depends on the volume, in units of the string length, of internal cycles wrapped by the branes in the compactification down to AdS 4 . In particular, the mass of a single Dparticle is proportional to 1=g s times the wrapped internal volume. The strength of the gravitational backreaction of N coincident D-particles is then controlled by G N × N=g s . Here, G N ∼ g 2 s is the four-dimensional Newton constant, where we have suppressed a factor of the volume of the compactification manifold. Therefore, if we keep the AdS radius fixed in string units, gravitational backreation becomes important when g s N ∼ N=ν 3 ≳ 1. Up to factors of the volume of compactification cycles, this is equivalent to the statement that the dimensionless 't Hooft coupling λ ¼ N=ν 3 , introduced below Eq. (1.1), becomes large.
For N=ν 3 ≲ 1, then, the D-particles can be treated as light probes on the background AdS spacetime. The fuzzy sphere configuration describes a polarization of the Dparticles into spherical "dual giant gravitons." From the string theory perspective, this polarization is driven by the fourform flux Ω ∼ 1=L AdS supporting the background AdS 4 spacetime. Together with the discussion in the previous paragraph on the strength of the gravitational interaction, we can write the heuristic relation N=ν 3 ∼ gravity=flux. At large ν, the flux wins out, and semiclassical fuzzy spheres can exist; however, at small ν, gravitational forces cause the spheres to collapse. The entanglement and emergent locality that we have described in this paper is that of the polarized spheres, whose excitations are described by the usual gauge fields and transverse scalar fields of string theoretic D-branes.
For N=ν 3 ≫ 1, it is possible that the strongly interacting, collapsed D-particles will develop a geometric "throat," in the spirit of the canonical holographic correspondence [1]. It is not well understood when such a throat would be captured by the mini-BMN matrix quantum mechanics. The variational wave functions that we have developed here provide a new window into this problem. In particular, we hope to investigate the small ν collapsed state in more detail in the future, with the objective of revealing any entanglement associated with emergent local dynamics in the throat spacetime. If the emergent dynamics includes gravity, there are two potentially interesting complications. First, the entanglement of bulk fields may be entwined with entanglement due to the "stringy" degrees of freedom that seem to be manifested in the Bekenstein-Hawking entropy of black holes as well as in the Ryu-Takayanagi formula [68][69][70][71]. Second, and perhaps relatedly, it may become crucial to understand the edge-mode contribution to the entanglement, which we have avoided in our discussion here [72,73].
More generally, the methods we have developed will be applicable to a wide range of quantum problems of interest in the holographic correspondence. The benefit of the variational neural network approach is direct access to properties of the zero-temperature quantum mechanical state. Optimizing the numerical methods and variational ansatz further, and with more computational power, it should not be difficult to work with larger values of N. In addition to understanding the emergence of spacetime from first principles, it should also be possible to study, for example, the microstates and dynamics of quantum black holes.
The code used in the paper is available online [74].