Quantum Entanglement in Neural Network States

Machine learning, one of today's most rapidly growing interdisciplinary fields, promises an unprecedented perspective for solving intricate quantum many-body problems. Understanding the physical aspects of the representative artificial neural-network states has recently become highly desirable in the applications of machine learning techniques to quantum many-body physics. Here, we study the quantum entanglement properties of neural-network states, with a focus on the restricted-Boltzmann-machine (RBM) architecture. We prove that the entanglement of all short-range RBM states satisfies an area law for arbitrary dimensions and bipartition geometry. For long-range RBM states, we show by an exact construction that such states can exhibit volume-law entanglement, implying a notable capability of RBMs to represent quantum states with massive entanglement efficiently. We further examine generic RBM states with random weight parameters. We find that their averaged entanglement entropy obeys volume-law scaling but at the same time deviates strongly from the Page entropy of completely random pure states. We show that their entanglement spectrum has no universal part associated with random matrix theory and exhibits Poisson-type level statistics. Using reinforcement learning, we demonstrate that an RBM is capable of finding the ground state (with power-law entanglement) of a model Hamiltonian with long-range interactions. In addition, we show, through the concrete example of the one-dimensional symmetry-protected topological cluster states, that the RBM representation may also be used as a tool to compute the entanglement spectrum analytically. Our results uncover the unparalleled power of artificial neural networks in representing quantum many-body states, which paves a novel way to bridge computer-science-based machine learning techniques and outstanding problems in quantum condensed matter physics.


I. INTRODUCTION
Understanding the behavior of quantum many-body systems beyond the standard mean-field paradigm is a central (and daunting) task in condensed matter physics. One challenge lies in the exponential scaling of the Hilbert space dimension with system size [1][2][3]. In principle, a complete description of a generic many-body state requires an exponential amount of information, rendering the problem intractable even numerically. Fortunately, physical states usually occupy only a tiny corner of the entire Hilbert space and can often be characterized with far fewer classical resources. Constructing efficient representations of such states is thus of crucial importance in tackling quantum many-body problems. Notable examples include quantum states with area-law entanglement [4], such as ground states of local gapped Hamiltonians [5] or the eigenstates of many-body localized systems [6], which can be efficiently represented in terms of matrix product states (MPS) [7][8][9] or, more generally, tensor-network states [10][11][12]. These compact representations play a vital role in tackling a variety of many-body problems, ranging from the classification of topological phases [13,14] to the construction of the AdS/CFT correspondence [15,16]. They are also the backbone of a number of efficient classical algorithms for solving intricate many-body problems, e.g., DMRG (density-matrix renormalization group) [9,17,18], TEBD (time-evolving block decimation) [19], PEPS (projected entangled pair states) [10,12], and MERA (multiscale entanglement renormalization ansatz) methods [20,21]. Recently, a novel neural-network representation of quantum many-body states has been introduced [22] for solving many-body problems with machine learning techniques. However, the entanglement properties of these neural-network states, which are crucial for the renowned MPS/tensor-network representations, remain unknown.
In this paper, we fill this crucial gap by studying the entanglement properties of these many-body neural-network quantum states both analytically and numerically. Our work provides an important connection between the physical properties of many-body quantum entanglement and the computer-science properties of neural-network-based machine learning.
arXiv:1701.04844v3 [cond-mat.dis-nn] 11 May 2017

Machine learning is the core of artificial intelligence and data science [23]. It powers many aspects of modern society, and its applications have become ubiquitous throughout science, technology, and commerce [24,25]. In fact, perhaps because of the dominant presence of big data in our modern world, the terms artificial intelligence, machine learning, neural networks, deep learning, etc. have entered the lexicon of the broader culture, well outside the technical world of computer science where they originated, and often appear in the everyday press and in popular articles; for example, the software technology underlying self-driving cars depends crucially on artificial intelligence and machine learning. Within physics, machine-learning techniques have recently been applied in various contexts such as gravitational wave analysis [26,27], black hole detection [28], material design [29], and classification of classical liquid-gas transitions [30]. Very recently, these techniques have been introduced to many-body quantum condensed-matter physics, raising considerable interest across different communities [22,[31][32][33][34][35][36][37][38][39][40][41][42][43][44]. Exciting progress has been made in identifying quantum phases and the transitions among them (either conventional symmetry-broken [33,35,37,38] or topological phases [32]), modeling thermodynamic observables [36], constructing decoders for topological codes [39], accelerating Monte Carlo simulations [41,42], and establishing connections to renormalization group techniques [45,46]. In addition, machine-learning ideas have been explored for measuring quantum entanglement and for wavefunction tomography through the analysis of data extracted from quantum gas microscopes in cold-atom experiments [47].
The fledgling field of machine learning applications in physics appears to be in its rapidly growing early phase with many expected future breakthroughs as it matures.
From the numerical perspective, applications of machine-learning techniques to many-body problems rely vitally on the underlying data structures of the artificial neural networks, and the connections between these structures and the entanglement features of the corresponding quantum states are particularly desirable to address. In this paper, we study the entanglement properties, such as the entanglement entropy and spectrum, of neural-network states. We focus on quantum states represented by the restricted Boltzmann machine (RBM), a stochastic artificial neural network with widespread applications [40,[48][49][50]. We first prove the general result that all short-range RBM states obey an entanglement area law, independent of dimensionality and bipartition geometry. Since the one-dimensional (1D) symmetry-protected topological (SPT) cluster states and the toric code states (in both 2D and 3D) have exact short-range RBM representations [51], it follows immediately that they all have area-law entanglement. For long-range RBM states, calculating the entanglement entropy and spectrum analytically is very challenging (if not impossible), and we thus resort to numerical simulations. We randomly sample the weight parameters of the RBM states and compute their entanglement entropy and spectrum. We find that their entanglement entropy exhibits a volume-law scaling in general. Surprisingly, however, their entropy is noticeably less than the Page entropy of random pure states, and their entanglement spectrum has no universal part associated with random matrix theory and exhibits Poisson-type level statistics. This indicates that RBM states with random weight parameters live in a very restricted subspace of the entire Hilbert space (in spite of their volume-law entanglement entropy) and are not irreversible in the sense of Ref. [52]; namely, there exists an efficient algorithm to completely disentangle these states [52].

FIG. 1: A 2D pictorial illustration of artificial-neural-network quantum states in the restricted-Boltzmann-machine architecture. The yellow balls (green cubes) denote the neurons on the visible (hidden) layer, corresponding to the physical (auxiliary) spins. The brown lines show the connections between visible and hidden neurons, with weight parameters denoted by $W_{r'r}$ (only a small portion of the connections is shown for visual clarity). We also show a typical bipartition of the system into two subsystems A and B, used to study the entanglement properties of the neural-network states.
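The Poisson-versus-random-matrix distinction can be illustrated with the ratio of consecutive level spacings, $r_n = \min(s_n, s_{n+1})/\max(s_n, s_{n+1})$, a standard statistic that requires no unfolding of the spectrum. The sketch below is our own illustration in Python with NumPy; the function name `mean_spacing_ratio` is hypothetical, and it is applied to synthetic levels rather than to RBM entanglement spectra. Uncorrelated (Poisson) levels give a mean ratio of $2\ln 2 - 1 \approx 0.386$, whereas random-matrix (Wigner-Dyson) spectra give noticeably larger values.

```python
import numpy as np

def mean_spacing_ratio(levels):
    """Mean of r_n = min(s_n, s_{n+1}) / max(s_n, s_{n+1}) over consecutive spacings.

    Poisson (uncorrelated) levels: about 2 ln 2 - 1 = 0.386.
    GOE random-matrix levels: about 0.53 (level repulsion pushes r upward).
    """
    e = np.sort(np.asarray(levels, dtype=float))
    s = np.diff(e)                          # consecutive level spacings
    s1, s2 = s[:-1], s[1:]
    mask = np.maximum(s1, s2) > 0           # skip exactly degenerate pairs
    r = np.minimum(s1, s2)[mask] / np.maximum(s1, s2)[mask]
    return r.mean()

rng = np.random.default_rng(0)
poisson_levels = rng.uniform(0.0, 1.0, size=20000)   # independent random levels
goe = rng.normal(size=(400, 400))
goe = (goe + goe.T) / 2.0                             # real symmetric (GOE-like) matrix
goe_levels = np.linalg.eigvalsh(goe)
print(mean_spacing_ratio(poisson_levels), mean_spacing_ratio(goe_levels))
```

Applied to the eigenvalues of $H_{\mathrm{ent}}$, a mean ratio near 0.386 would signal Poisson-type statistics.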
In addition, we analytically construct a family of RBM states with maximal volume-law entanglement. These states cannot be described in terms of matrix product states or tensor-network states with a computationally tractable bond dimension. In sharp contrast, their RBM representation is remarkably efficient, requiring only a small number of parameters that scales linearly with the system size. This shows, exactly and explicitly, the unparalleled power of neural networks in describing many-body quantum states with large entanglement: unlike for MPS/tensor-network states, entanglement is not the limiting factor for the efficiency of the neural-network representation. As an important consequence, we are able to calculate (through a reinforcement-learning scheme [22,53]) the ground state of a spin Hamiltonian with long-range interactions, whose entanglement scales as a power law in the system size. Finally, we show that the RBM representation can also be used as a tool to compute analytically the entanglement entropy and spectrum of certain quantum states with short-range RBM descriptions, using the 1D SPT cluster states as a concrete example. Our results not only demonstrate explicitly the exceptional power of artificial neural networks in representing quantum many-body states, but also reveal some crucial aspects of their data structures, which provide a valuable guide for the emerging field at the intersection of machine learning and many-body quantum physics.

II. NEURAL-NETWORK REPRESENTATION AND QUANTUM ENTANGLEMENT: CONCEPTS AND NOTATIONS
An artificial-neural-network representation of quantum many-body states was recently introduced by Carleo and Troyer in Ref. [22], where they demonstrated the remarkable power of a reinforcement-learning approach for calculating the ground state or simulating the unitary time evolution of complex quantum systems with strong interactions. We show elsewhere that this representation can be used to describe topological states, even those with long-range entanglement [51]. To start, let us briefly introduce this representation in the RBM architecture. We consider a quantum system with N spins living on a d-dimensional cubic lattice, with spin configuration $\Xi = (\sigma^z_{r_1}, \sigma^z_{r_2}, \cdots, \sigma^z_{r_N})$. Correspondingly, for the spins in a subsystem Y we write $\Xi_Y = \{\sigma^z_r : r \in Y\}$. The geometric details of the lattice do not matter; we choose cubic lattices and focus on spin-1/2 (qubit) systems for simplicity. An RBM neural network contains two layers: a visible layer with N nodes (visible neurons) corresponding to the physical spins, and a hidden layer with M auxiliary nodes $(h_{r'_1}, h_{r'_2}, \cdots, h_{r'_M})$ (hidden neurons). The neurons in the hidden layer are connected to those in the visible layer, but there are no connections among neurons in the same layer (see Fig. 1 for a 2D illustration). The RBM neural-network representation of a quantum state is obtained by tracing out the hidden neurons [22]:

$$\Phi_M(\Xi;\Omega) = \sum_{\{h_{r'}\}} \exp\Big(\sum_r a_r \sigma^z_r + \sum_{r'} b_{r'} h_{r'} + \sum_{r r'} W_{r'r}\, h_{r'} \sigma^z_r\Big), \tag{1}$$

where $\{h_{r'}\} \in \{-1, 1\}^M$ denotes the possible configurations of the hidden spin variables and the weights $\Omega = (a_r, b_{r'}, W_{r'r})$ are parameters to be trained to best represent the many-body quantum state. Note that the RBM state defined in Eq. (1) is a variational state whose amplitude and phase are specified by $\Phi_M(\Xi;\Omega)$.
The actual quantum state should be understood as (up to an irrelevant normalization constant) $|\Psi(\Omega)\rangle \equiv \sum_{\Xi} \Phi_M(\Xi;\Omega)\,|\Xi\rangle$, similar to the Laughlin-like description of the resonating-valence-bond ground state of the exactly solvable Haldane-Shastry model [54,55].
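To make the definition concrete, the following Python/NumPy sketch evaluates $\Phi_M(\Xi;\Omega)$ by literally summing the exponential of Eq. (1) over all $2^M$ hidden configurations, then assembles the normalized vector $|\Psi(\Omega)\rangle$. The function names, the array layout `W[j, i]` (hidden neuron j, visible neuron i), and the basis ordering are our own conventions, and the brute-force loop is only practical for a handful of spins.

```python
import itertools
import numpy as np

def rbm_amplitude(sigma, a, b, W):
    """Phi_M(Xi; Omega) of Eq. (1), summing explicitly over all 2^M hidden configurations.

    sigma: visible spins sigma^z_r in {+1, -1}, shape (N,)
    a: visible biases, shape (N,); b: hidden biases, shape (M,)
    W: couplings, shape (M, N); W[j, i] links hidden neuron j to visible neuron i.
    Parameters may be complex, so Phi_M carries both amplitude and phase.
    """
    total = 0.0j
    for h in itertools.product((-1, 1), repeat=len(b)):
        h = np.array(h)
        total += np.exp(a @ sigma + b @ h + h @ W @ sigma)
    return total

def rbm_state(a, b, W):
    """Normalized |Psi(Omega)> = sum_Xi Phi_M(Xi; Omega)|Xi> as a vector of length 2^N.

    Basis convention (ours): spin +1 -> bit 0, spin -1 -> bit 1,
    first spin = most significant bit.
    """
    configs = itertools.product((1, -1), repeat=len(a))
    psi = np.array([rbm_amplitude(np.array(s), a, b, W) for s in configs])
    return psi / np.linalg.norm(psi)

# Usage: a small random complex RBM with N = 4 visible and M = 3 hidden neurons
rng = np.random.default_rng(1)
N, M = 4, 3
a = 0.2 * (rng.normal(size=N) + 1j * rng.normal(size=N))
b = 0.2 * (rng.normal(size=M) + 1j * rng.normal(size=M))
W = 0.2 * (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N)))
psi = rbm_state(a, b, W)
print(psi.shape, np.linalg.norm(psi))
```

With all weights set to zero every amplitude equals $2^M$, so the normalized state is the uniform superposition over all $2^N$ configurations.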
We remark that RBMs can be trained in either supervised or unsupervised ways, and in the machine-learning community RBMs have been applied successfully to classification [50], dimensionality reduction [48], feature learning [56], and collaborative filtering [49]. Mathematically, the ability of an RBM to approximate any many-body state is guaranteed by representability theorems [57][58][59]. Nevertheless, the approximation may require a huge number (exponential in system size) of neurons and parameters, rendering the representation impractical, especially in numerical simulations. A question of both theoretical and practical interest is therefore: which many-body quantum states can be described efficiently by RBMs with a numerically feasible number of neurons and parameters? It is now established that entanglement plays a crucial role in determining whether a quantum state can be efficiently represented by an MPS/tensor network. Quantum states with volume-law entanglement cannot be described efficiently by MPS/tensor networks and thus cannot be simulated efficiently by DMRG, PEPS, or MERA. In sharp contrast, as we show in the following sections, RBMs are capable of efficiently describing certain quantum states with volume-law entanglement, which suggests great potential for simulating such states numerically with new RBM-based machine learning algorithms.
In this paper, we study the quantum entanglement properties of RBM states. In particular, we investigate the entanglement entropy, the Rényi entropy [60], and the entanglement spectrum [61], which are three of the most broadly used quantities for characterizing the entanglement of a pure many-body quantum state. These quantities are defined as follows. Given a pure many-body quantum state $|\psi\rangle$, we divide the system into two subregions, A and B (a typical bipartition of a 2D system is shown in Fig. 1). We then construct the reduced density matrix of subsystem A by tracing out the degrees of freedom in B: $\rho_A(|\psi\rangle) = \mathrm{Tr}_B(|\psi\rangle\langle\psi|)$. The α-th order Rényi entropy is defined as

$$S^A_\alpha = \frac{1}{1-\alpha}\log \mathrm{Tr}\big(\rho_A^\alpha\big).$$

The zeroth-order (α = 0) Rényi entropy is the logarithm of the rank of $\rho_A$, namely, of the number of its nonzero eigenvalues (equivalently, singular values). When α → 1, the Rényi entropy reduces to the von Neumann entropy,

$$S^A_1 = -\mathrm{Tr}\big(\rho_A \log \rho_A\big).$$

In the literature, the entanglement entropy usually means the von Neumann entropy. Throughout this paper, however, we do not differentiate between the entanglement entropy and the Rényi entropy, since most of our results are valid for the Rényi entropy of all orders. To define the entanglement spectrum, we first define the entanglement Hamiltonian by taking the logarithm of $\rho_A$:

$$H_{\mathrm{ent}} = -\log \rho_A,$$

and the entanglement spectrum is then defined as the spectrum of $H_{\mathrm{ent}}$. Entanglement is nowadays a central concept in many branches of quantum physics. In condensed matter physics, entanglement entropy and spectrum have proven to be powerful tools for characterizing topological phases [4,[61][62][63][64], quantum phase transitions [65][66][67], and many-body localization [68][69][70]. A number of theoretical proposals have been introduced to measure entanglement entropy [71][72][73][74] and spectrum [75] in many-body systems. Notably, the second-order Rényi entropy has been measured in recent cold-atom experiments in optical lattices [76,77].
We expect that our study of the entanglement properties of neural-network states will also provide new inspiration in this context.
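The definitions of the Rényi entropy, von Neumann entropy, and entanglement spectrum translate directly into a few lines of NumPy via the Schmidt decomposition: the eigenvalues of $\rho_A$ are the squared singular values of the state vector reshaped into a $2^{N_A} \times 2^{N - N_A}$ matrix. This is a minimal sketch; the function names and the numerical cutoff `tol` for discarding zero eigenvalues are our own choices, and region A is taken to be the first qubits in the chosen basis ordering.

```python
import numpy as np

def rho_a_eigenvalues(psi, n_a, n_total):
    """Eigenvalues of rho_A = Tr_B |psi><psi|, with A = the first n_a qubits.

    Schmidt decomposition: the eigenvalues of rho_A are the squared singular
    values of psi reshaped into a 2^n_a x 2^(n_total - n_a) matrix.
    """
    mat = np.asarray(psi).reshape(2**n_a, 2**(n_total - n_a))
    p = np.linalg.svd(mat, compute_uv=False) ** 2
    return p / p.sum()

def renyi_entropy(p, alpha, tol=1e-12):
    """S_alpha = log(Tr rho_A^alpha) / (1 - alpha); alpha = 0 and 1 handled as limits."""
    p = p[p > tol]                        # discard numerically zero eigenvalues
    if alpha == 0:
        return np.log(len(p))             # log of the rank of rho_A
    if alpha == 1:
        return -np.sum(p * np.log(p))     # von Neumann entropy
    return np.log(np.sum(p**alpha)) / (1.0 - alpha)

def entanglement_spectrum(p, tol=1e-12):
    """Eigenvalues of H_ent = -log rho_A (on the support of rho_A), ascending."""
    return np.sort(-np.log(p[p > tol]))

# Usage: a Bell pair on qubits (0, 1) times |0> on qubit 2, with A = qubit 0
bell = np.zeros(8)
bell[0b000] = bell[0b110] = 1 / np.sqrt(2)
p = rho_a_eigenvalues(bell, 1, 3)
print([renyi_entropy(p, al) for al in (0, 1, 2)], entanglement_spectrum(p))
```

For the Bell pair every Rényi entropy equals log 2 and the entanglement spectrum is two degenerate levels at log 2, illustrating why a flat spectrum makes all orders coincide.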

III. AREA-LAW ENTANGLEMENT FOR SHORT-RANGE NEURAL-NETWORK STATES
We start with short-range RBM states and prove that they obey an entanglement area law; namely, the amount of entanglement between a subsystem and its complement scales at most as the surface area (boundary) rather than the volume of the subsystem [4]. Historically, the study of entanglement area laws was inspired by the holographic principle in black hole physics, where the Bekenstein-Hawking entropy of a black hole is believed to scale with its boundary area rather than its volume [78]. It has been argued that the origin of black hole entropy is the quantum entanglement between the inside and the outside of the black hole [79][80][81]. Although it is intuitive that entanglement in "natural" quantum systems should mostly reside near the boundary, and many numerical simulations indeed support this intuition, rigorously proving the area law for a given family of quantum states is notoriously challenging and often involves sophisticated mathematical techniques [4]. A breakthrough was first made by Hastings in Ref. [5], who proved an entanglement area law for ground states of 1D gapped local Hamiltonians by using the Lieb-Robinson bound [82]. More recently, this proof has been simplified and generalized to ground states with finite degeneracy by a combinatorial approach based on Chebyshev polynomials [83,84]. Unfortunately, neither the Lieb-Robinson bound approach nor the combinatorial approach seems likely to carry over to higher dimensions. Establishing area-law entanglement for ground states of gapped Hamiltonians in more than one dimension remains a major open problem (arguably the most important one) in the field of Hamiltonian complexity [3].

FIG. 2: A sketch of the proof of area-law entanglement for short-range restricted-Boltzmann-machine states. The system is divided into two subsystems A and B, with the red line showing the interface boundary. To show that the Rényi entropy $S^A_\alpha$ obeys an area law for any α, subsystem A (B) is further divided into three parts A1, A2, and A3 (B1, B2, and B3). Since the neural network is short-range, the $\Gamma_{r'}$-factors with $r' \in A_1 \cup A_2$ ($r' \in B_1 \cup B_2$) are independent of the spin configurations in region B (A), so we can group the spins in regions A1 and B1 with their $\Gamma_{r'}$-factors. The entropy of the reduced density matrix $\rho_A$ then depends only on the degrees of freedom in region $A_2 \cup A_3$, whose size is proportional to the surface area of region A. This gives a clear geometric picture of why $S^A_\alpha$ is upper bounded by the surface area of A, up to an unimportant scaling constant, as given in Eq. (2).
Here we prove that short-range RBM states obey the entanglement area law in any dimension and for arbitrary bipartition geometry. To be precise, we call an RBM state an R-range RBM state if each hidden neuron is connected only to those visible neurons within its R-neighborhood, i.e., $W_{r'r} = 0$ if $|r - r'| > R$. For instance, in Ref. [51] we demonstrated that both the 1D SPT cluster states and the toric code states (in 2D and 3D) can be represented exactly by RBMs whose hidden neurons are connected only to the nearest visible neurons. These states are 1-range RBM states. For general R-range RBM states, we have the following theorem:

Theorem 1. For an R-range RBM state, the Rényi entropy of all orders satisfies

$$S^A_\alpha \le 2\, S(A)\, R \log 2, \tag{2}$$

where S(A) denotes the surface area of subsystem A. This area law is valid in any dimension and for arbitrary bipartition geometry.
Proof. Since an RBM has no intra-layer connections between neurons, we can explicitly sum out the hidden variables and rewrite $\Phi_M(\Xi;\Omega)$ in product form:

$$\Phi_M(\Xi;\Omega) = \prod_r e^{a_r \sigma^z_r} \prod_{r'} \Gamma_{r'}(\Xi_{r'}),$$

where $\Xi_{r'}$ denotes the configuration of the visible spins in the local subregion $\{r : |r - r'| \le R\}$ around the hidden neuron at $r'$, and $\Gamma_{r'}(\Xi_{r'}) = 2\cosh\big(b_{r'} + \sum_r W_{r'r}\sigma^z_r\big)$ is a local function. We call $\Gamma_{r'}(\Xi_{r'})$ the $\Gamma_{r'}$-factor of the hidden neuron at $r'$. By the definition of an R-range RBM, the value of each $\Gamma_{r'}$-factor depends only on the configuration of the visible neurons (physical spins) within the R-neighborhood of $r'$, denoted $N_{r'}(R)$. We can thus simplify it as $\Gamma_{r'}(\Xi_{r'}) = 2\cosh\big(b_{r'} + \sum_{r \in N_{r'}(R)} W_{r'r}\sigma^z_r\big)$. This locality of the $\Gamma_{r'}$-factors is the origin of the area law.
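The product form rests on the elementary identity $\sum_{h=\pm 1} e^{h x} = 2\cosh x$, applied to each hidden neuron independently because hidden neurons do not couple to one another. A quick numerical check in Python with NumPy (the random complex parameters are our own illustrative choice) compares the brute-force sum of Eq. (1) with the product of $\Gamma_{r'}$-factors for every visible configuration:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
N, M = 5, 4
a = 0.3 * (rng.normal(size=N) + 1j * rng.normal(size=N))
b = 0.3 * (rng.normal(size=M) + 1j * rng.normal(size=M))
W = 0.3 * (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N)))  # W[j, i]: hidden j, visible i

def phi_bruteforce(sigma):
    # Eq. (1): explicit sum over all 2^M hidden configurations h in {-1, 1}^M
    return sum(np.exp(a @ sigma + b @ np.array(h) + np.array(h) @ W @ sigma)
               for h in itertools.product((-1, 1), repeat=M))

def phi_product(sigma):
    # Product form: prod_r e^{a_r sigma_r} * prod_{r'} 2 cosh(b_{r'} + sum_r W_{r'r} sigma_r)
    return np.exp(a @ sigma) * np.prod(2.0 * np.cosh(b + W @ sigma))

max_err = max(abs(phi_bruteforce(np.array(s)) - phi_product(np.array(s)))
              for s in itertools.product((-1, 1), repeat=N))
print(max_err)  # agreement up to floating-point round-off
```

The product form costs O(NM) per configuration instead of O(2^M N M), which is what makes RBM amplitudes cheap to evaluate.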
To exploit the locality of the $\Gamma_{r'}$-factors, we further divide subregion A (B) into three parts, A1, A2, and A3 (B1, B2, and B3), as illustrated in Fig. 2. Explicitly, the set of sites in A that are directly coupled (via one hidden neuron) to lattice sites in B is defined to be A3, the set of sites in A directly coupled to A3 is defined to be A2, and the rest of A is A1. The subregions B1, B2, and B3 are defined correspondingly. One can regard the subregions A2, A3, B2, and B3 as hypersurfaces whose thickness is of order R. Denoting $Y \equiv A_2 \cup A_3 \cup B_2 \cup B_3$, the state can be written as

$$|\Psi(\Omega)\rangle = \sum_{\Xi_Y} \Big[\prod_{r' \in A_3 \cup B_3} \Gamma_{r'}(\Xi_{r'})\Big]\, |\phi_A\rangle \otimes |\phi_B\rangle, \tag{3}$$

where, at fixed $\Xi_Y$, $|\phi_A\rangle$ ($|\phi_B\rangle$) collects the sum over the spin configurations of A1 (B1), weighted by the on-site factors and the $\Gamma_{r'}$-factors with $r' \in A_1 \cup A_2$ ($r' \in B_1 \cup B_2$). In Eq. (3), there are at most $2^{|Y|}$ terms in the summation, with |Y| denoting the number of spins in region Y, and each term is a tensor product of orthogonal states of A and B. This gives the upper bound in Eq. (2) after tracing out the degrees of freedom in region B. We stress two crucial aspects of Eq. (3): (i) $|\phi_A\rangle$ is independent of the spin configurations in region B, and $|\phi_B\rangle$ is independent of the spin configurations in region A; (ii) the coefficients $\prod_{r' \in A_3 \cup B_3}\Gamma_{r'}(\Xi_{r'})$ of the orthogonal components $|\phi_A\rangle|\phi_B\rangle$ are independent of the spin configurations outside Y. Both (i) and (ii) are crucial for the validity of the proof, and they are made evident by the deliberate partition of both A and B into three smaller parts.
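The bound can be probed numerically in the simplest setting: a 1D open chain with one hidden neuron per site, where A is a left block. For a cut between sites l−1 and l, only hidden neurons within R of the cut touch both sides, and the A-sites they touch are the 2R sites nearest the cut, so the Schmidt rank (that is, $e^{S^A_0}$) cannot exceed $2^{2R}$. The sketch below, assuming NumPy and random complex weights of our own choosing (any weights obey the bound), checks this for every cut position.

```python
import itertools
import numpy as np

def r_range_rbm_state(N, R, rng, scale=0.5):
    """Random complex R-range RBM on a 1D open chain, one hidden neuron per site.

    W[j, i] (hidden j, visible i) is zeroed unless |i - j| <= R. The weight
    distribution is our illustrative choice.
    """
    a = scale * (rng.normal(size=N) + 1j * rng.normal(size=N))
    b = scale * (rng.normal(size=N) + 1j * rng.normal(size=N))
    W = scale * (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
    dist = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    W[dist > R] = 0.0
    S = np.array(list(itertools.product((1, -1), repeat=N)))  # all 2^N configurations
    # Product form: prod_i e^{a_i s_i} * prod_j 2 cosh(b_j + sum_i W_ji s_i)
    psi = np.exp(S @ a) * np.prod(2.0 * np.cosh(b + S @ W.T), axis=1)
    return psi / np.linalg.norm(psi)

def schmidt_rank(psi, n_a, N):
    """Rank of rho_A for A = the first n_a sites, i.e. exp(S_0^A)."""
    return np.linalg.matrix_rank(psi.reshape(2**n_a, 2**(N - n_a)))

rng = np.random.default_rng(3)
N = 10
for R in (1, 2):
    psi = r_range_rbm_state(N, R, rng)
    ranks = [schmidt_rank(psi, l, N) for l in range(1, N)]
    print(R, ranks, "bound 2^(2R) =", 2 ** (2 * R))
```

The rank saturates at $2^{2R}$ in the bulk of the chain however many qubits A contains, which is the 1D face of the area law.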
We emphasize that the above proof does not specify the dimensionality or the geometry of the bipartition; it works for any dimension and any bipartition of the system. Thus, it might shed new light on the important and challenging problem of proving the entanglement area law for local gapped Hamiltonians in higher dimensions [5,[83][84][85], given the possibility that all ground states of such Hamiltonians may be representable by short-range RBMs, although whether this holds remains an open question. Intuitively, one can increase the number of hidden neurons to increase the number of local weight parameters. With enough free parameters, the corresponding RBMs should be able to represent the ground states of general local gapped Hamiltonians. This would work because of a crucial aspect of our proof: it places no restriction on the numbers of hidden neurons and weight parameters, as long as the connections are finite-ranged.
As shown in our previous work [51], the 1D SPT cluster states, the toric code states in both 2D and 3D, and the low-energy excited states with abelian anyons of the toric code Hamiltonians can all be represented exactly by short-range RBMs with R = 1. An immediate corollary of the above theorem is that the entanglement of all these states fulfills an area law. In fact, based on the RBM representation, one can even compute the entanglement spectrum of the 1D SPT cluster state analytically, as we show in a later section (see Sec. V). It is important to clarify that although these RBMs are short-range, the quantum states they represent can capture long-range entanglement. The RBM representations of the toric code states (in both 2D and 3D), which have intrinsic topological order (long-range entanglement), are such examples. Thus the area law of short-range RBM states does not imply short-range entanglement; the distinction between short range in the RBM sense and short range in the entanglement sense is an important one.
In 1D, the area-law bound in Eq. (2) gives rise to an interesting relation between the RBM and MPS representations of quantum many-body states. It has been proved that bounded Rényi entropies of all orders in 1D guarantee an efficient MPS representation [86] (note that counterexamples exist if only the von Neumann entropy is bounded [87]). As a result, our area-law results imply that all 1D short-range RBM states can be efficiently described in terms of MPS. However, whether the converse holds is unknown. It would be interesting to find out whether all MPS with small bond dimensions have efficient RBM representations and, if so, what the general procedure is for recasting MPS as RBM states. It would also be interesting to investigate the relations between higher-dimensional RBM states and tensor-network states such as PEPS or MERA. Nonetheless, it is worth emphasizing that the entanglement scaling of RBM states is sharply distinct from that of MPS: the maximal entanglement entropy of an R-range RBM state scales linearly with R, whereas the entanglement entropy of an MPS with bond dimension χ is bounded by log χ. This implies that even if an RBM state can generically be converted to an MPS, the RBM parameterization is much more efficient for representing highly entangled quantum states.
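A back-of-the-envelope version of this comparison, assuming a 1D R-range RBM state whose Schmidt rank across some cut reaches $2^{2R}$ (the largest value allowed when only the 2R sites of A nearest the cut share hidden neurons with B; generic states need not reach it), and using that an MPS of bond dimension χ has Schmidt rank at most χ across every cut:

```latex
% Schmidt rank across the cut: at most chi for an MPS, assumed 2^{2R} for the RBM state
\log\chi \;\ge\; S^A_0 \;=\; 2R\log 2
\quad\Longrightarrow\quad
\chi \;\ge\; 2^{2R} \;=\; 4^{R}
```

The required bond dimension thus grows exponentially in R, whereas each hidden neuron of an R-range RBM carries at most 2R + 1 couplings plus one bias, a number that grows only linearly in R.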
We remark that our rigorous proof of the entanglement area law for short-range RBMs provides a valuable guide for practical numerical calculations. For instance, if we know that a problem involves only a small amount of entanglement (an area law), we may use short-range, rather than long-range, RBMs to reduce the number of parameters and consequently speed up the calculations (we have tested this in a numerical experiment of finding the ground state of the transverse-field Ising model via reinforcement learning, and a considerable speedup was indeed obtained). On the other hand, if the problem involves large entanglement (as in some quantum critical or quantum dynamics problems), then short-range RBMs cannot work and we should choose a long-range RBM from the outset.

[Figure 3 image: visible layer and hidden layer of the RBM]

FIG. 3: The k-th hidden neuron is connected to two visible neurons at sites k and k + 1 (or, for the nonlocal hidden neurons, at sites k + 1 − N and k + 1 − N/2), with connection weight parameters equal to iπ/4. The on-site potentials of the visible neurons are chosen to be zero, $a_k = 0$ (∀k). For the hidden neurons, $b_k$ is chosen as in Eqs. (4)-(6). The scissors show a cut of the system into two subsystems (A and B) of equal size; for this bipartition the Rényi entropy is $S^A_\alpha = \frac{N}{2}\log 2$, proportional to the system size. This is also the maximal entropy possible for such a bipartition of N qubits.

IV. VOLUME-LAW ENTANGLEMENT IN LONG-RANGE NEURAL-NETWORK STATES
In the last section, we proved that all short-range RBM states satisfy an entanglement area law. What about RBM states with long-range neural connections? From the linear-in-R scaling of the entanglement entropy of R-range RBM states derived in the last section, we would anticipate that long-range RBM states could exhibit volume-law entanglement. In this section, we explicitly show that this is indeed true, through a rigorous exact construction and a numerical benchmark.

A. Exact construction of maximal volume-law entangled neural-network states
Here we analytically construct families of neural-network states with volume-law entanglement. These states are exact and have unified closed-form RBM representations. More strikingly, their RBM representation is surprisingly efficient: the number of nonzero parameters scales only linearly with the system size. We stress that efficient representations of quantum states play a vital role in solving many-body problems, especially when numerical approaches are employed. A prominent example is the use of the MPS representation in the DMRG [9] (for ground states), TEBD [19] (for time evolution), and DMRG-X [88] (for highly excited eigenstates of local Hamiltonians deep in the many-body-localized regime) algorithms. Yet, the MPS/tensor-network representation is efficient only for describing quantum states with area-law entanglement and thus faces serious practical limitations in problems involving volume-law entangled states. As discussed in the previous section, the construction philosophy of neural-network states is very different from that of MPS/tensor-network states. This opens the possibility for neural networks to represent quantum states with volume-law entanglement efficiently and to solve problems involving them. We also stress that our exact results here provide an important anchor point for future theoretical and numerical studies and should have far-reaching implications for applying machine learning techniques to currently intractable many-body problems. In Subsection C, we indeed use an RBM, trained by reinforcement learning, to find the ground state (with massive power-law entanglement) of a modified Haldane-Shastry model with long-range interactions.
We first give a 1D example. Consider a 1D system of N qubits. The goal is to construct an RBM state with maximal volume-law entanglement entropy. To this end, we introduce an RBM with N visible and $M = \lfloor 3N/2 \rfloor - 1$ hidden neurons. Here, the floor function $\lfloor x \rfloor$ denotes the largest integer less than or equal to x. The weight parameters of $\Phi_M(\Xi;\Omega)$, which characterize the RBM as defined in Eq. (1), are chosen to be […] (Eqs. (4)-(6)), where S is a set of paired integers defined by S ≡ {(i, j) : […]}. The ceiling function $\lceil x \rceil$ denotes the smallest integer greater than or equal to x. A pictorial illustration of this RBM is shown in Fig. 3. We now show that the quantum states described by this RBM have volume-law entanglement entropy for any contiguous region no larger than half of the system. More precisely, we have the following theorem:

Theorem 2. For a 1D RBM state with weight parameters specified by Eqs. (4)-(6), if we divide the system into two parts A and B, with A consisting of the first l ($1 \le l \le \lfloor N/2 \rfloor$) qubits and B the rest, then the Rényi entropy of $\rho_A$ is

$$S^A_\alpha = l \log 2 \tag{7}$$

for all orders α.

Proof. As mentioned in Sec. III, since an RBM has no intra-layer connections between neurons, we can explicitly sum out the hidden variables and rewrite $\Phi_M(\Xi;\Omega)$ in product form, $\Phi_M(\Xi;\Omega) = \prod_{k=1}^{N} e^{a_k\sigma^z_k} \prod_{k=1}^{M} \Gamma_k$. From Eq. (4), $a_k = 0$ for all $k \in [1, N]$, so the first product $\prod_{k=1}^{N} e^{a_k\sigma^z_k}$ equals one and can be omitted from $\Phi_M(\Xi;\Omega)$. Consequently, the variational wavefunction $\Phi_M(\Xi;\Omega)$ depends only on the $\Gamma_k$-factors, which correspond to the hidden neurons. As shown in Fig. 3, for $1 \le k \le N-1$ the factor $\Gamma_k$ connects its hidden neuron at site k to the two nearest-neighbor visible neurons at sites k and k + 1, and $\Gamma_k$ takes only two possible values: […]. For $N \le k \le M$, $\Gamma_k$ connects its hidden neuron at site k to two far-apart visible neurons at sites k + 1 − N and […]. These features of the $\Gamma_k$-factors are crucial in the following proof of Eq. (7).
For convenience, we define two sets of integers, $B_1$ and $B_2$ […], and note that $B = B_1 \cup B_2$. Using the features of the $\Gamma_k$-factors discussed above, the RBM state reduces to $|\Psi(\Omega)\rangle = […]$, with C a positive constant. Tracing out the degrees of freedom in region B and restoring the normalization constant, we obtain the reduced density matrix $\rho_A = I/2^l$, where I is the $2^l \times 2^l$ identity matrix. Since all $2^l$ eigenvalues of $\rho_A$ equal $2^{-l}$, we have $S^A_\alpha = l\log 2$ for every α. This completes the proof.
Note that subregion A does not have to be at the left end. In fact, A can be any contiguous region of length l, and Eq. (7) remains valid, although the details of the proof change slightly in this situation; we choose A to be at the left end just for convenience. In the limit N → ∞, the entanglement entropy of any contiguous region thus scales linearly with the size of the region: a volume law.
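The following NumPy sketch checks a version of this construction numerically for even N. Two ingredients are our assumptions, not the values of Eqs. (4)-(6): we read Fig. 3 as pairing site j with site j + N/2 through the nonlocal hidden neurons, and we set every hidden bias to $b_k = i\pi/4$. With these choices each factor $2\cosh\big(i\pi/4 + i\pi/4\,(\sigma^z_i + \sigma^z_j)\big)$ equals $-\sqrt{2}$ when both spins are up and $+\sqrt{2}$ otherwise, so it takes only two values, and the state is a graph state with one controlled-Z phase per connected pair. The helper names are hypothetical.

```python
import itertools
import numpy as np

def volume_law_rbm_state(N):
    """Sketch of the 1D construction for even N, with ASSUMED biases b_k = i*pi/4.

    Hidden neurons couple nearest neighbours (k, k+1) and nonlocal pairs
    (j, j + N/2), 0-based. All nonzero couplings are i*pi/4 and a_k = 0.
    """
    assert N % 2 == 0
    pairs = [(k, k + 1) for k in range(N - 1)] + [(j, j + N // 2) for j in range(N // 2)]
    M = len(pairs)                                   # 3N/2 - 1 hidden neurons
    W = np.zeros((M, N), dtype=complex)
    for k, (i, j) in enumerate(pairs):
        W[k, i] = W[k, j] = 1j * np.pi / 4
    b = np.full(M, 1j * np.pi / 4)                   # assumption, see lead-in
    S = np.array(list(itertools.product((1, -1), repeat=N)))
    psi = np.prod(2.0 * np.cosh(b + S @ W.T), axis=1)  # a_k = 0: no on-site factor
    return psi / np.linalg.norm(psi)

def reduced_density_matrix(psi, sites, N):
    """rho_A for an arbitrary list of sites (qubit 0 = most significant bit)."""
    t = psi.reshape([2] * N)
    rest = [q for q in range(N) if q not in sites]
    m = np.transpose(t, list(sites) + rest).reshape(2 ** len(sites), -1)
    return m @ m.conj().T

N = 8
psi = volume_law_rbm_state(N)
for sites in ([0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [2, 3, 4]):
    l = len(sites)
    rho = reduced_density_matrix(psi, sites, N)
    print(sites, np.allclose(rho, np.eye(2**l) / 2**l))  # maximally mixed?
```

Under these assumptions, both left blocks and a block in the middle of the chain come out maximally mixed, consistent with Eq. (7); whether the biases of Eqs. (4)-(6) coincide with this choice is not established here.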
For the 2D case, we can construct volume-law entangled RBM states in a similar manner. We consider a system of N qubits living on an $L_x \times L_y$ square lattice denoted Λ. We assume that $L_x$ and $L_y$ are even integers for simplicity (one can use the floor and ceiling functions to treat odd sizes, at the cost of more cumbersome notation). We label each vertex of the lattice by a pair of indices $(k_x, k_y)$ ($1 \le k_x \le L_x$ and $1 \le k_y \le L_y$) and attach a qubit to it, so that $N = L_x \times L_y$. We construct an RBM with N visible and $\frac{5}{2}L_xL_y - L_x - L_y$ hidden neurons. The hidden neurons are divided into three groups. The first (second) group, denoted $\mathcal{X}$ ($\mathcal{Y}$), has $(L_x - 1) \times L_y$ ($L_x \times (L_y - 1)$) neurons that connect nearest visible neurons along the x (y) direction. The third group, denoted $\mathcal{Z}$, contains $L_x \times \frac{1}{2}L_y$ hidden neurons that connect visible neurons nonlocally. In analogy with the 1D example, the neurons in groups $\mathcal{X}$ and $\mathcal{Y}$ connect nearest visible neurons and correspond to the first N − 1 hidden neurons in the 1D case, while those in group $\mathcal{Z}$ correspond to the remaining ones. The hidden neurons in $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{Z}$ are labeled by $x = (x_1, x_2)$, $y = (y_1, y_2)$, and $z = (z_1, z_2)$, respectively. Following the 1D example, the weight parameters can be chosen as […], where $S^{(\mathcal{X})}_{2D}$, $S^{(\mathcal{Y})}_{2D}$, and $S^{(\mathcal{Z})}_{2D}$ are the three sets that specify the connections between the visible neurons and the three groups of hidden neurons. They are defined as […]. Following the proof of Theorem 2, it is straightforward to verify that the entanglement entropy of any small regular contiguous subregion A scales linearly with the volume of A and is maximal, $S^A_\alpha = N_A \log 2$, where $N_A$ denotes the number of qubits inside region A.
Similar constructions carry over straightforwardly to higher dimensions. For a system defined on a simple cubic lattice in d dimensions with $N = L^d$ qubits, our construction requires $M = \frac{2d+1}{2}L^d - dL^{d-1}$ hidden neurons and 3M nonzero weight parameters. Both the number of hidden neurons and the number of parameters scale only linearly with the system size. In contrast, if we express these RBM states in terms of MPS/tensor networks, the bond dimension grows exponentially with the system size and the problem quickly becomes intractable. This explicitly demonstrates a unique advantage of RBMs in representing quantum many-body states with massive entanglement.
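The neuron counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below counts the three 2D groups directly and compares the result with the closed-form M. All function names are hypothetical. For general d, we assume by analogy with the 1D and 2D cases that the construction uses one hidden neuron per nearest-neighbor bond ($dL^{d-1}(L-1)$ in total) plus $L^d/2$ nonlocal neurons. This decomposition is our reading of the formula, not something stated explicitly in the text.

```python
def hidden_neurons_2d(lx: int, ly: int) -> int:
    """Direct count of the three hidden-neuron groups on an lx x ly lattice (lx, ly even)."""
    group_x = (lx - 1) * ly   # nearest-neighbor bonds along x
    group_y = lx * (ly - 1)   # nearest-neighbor bonds along y
    group_z = lx * ly // 2    # nonlocal neurons
    return group_x + group_y + group_z


def hidden_neurons_nd(length: int, d: int) -> int:
    """Assumed d-dimensional generalization: one neuron per bond plus N/2 nonlocal ones."""
    bonds = d * length ** (d - 1) * (length - 1)
    return bonds + length**d // 2


def hidden_neurons_formula(length: int, d: int) -> int:
    """Closed form M = (2d+1)/2 * L^d - d * L^(d-1); an exact integer for even L."""
    return (2 * d + 1) * length**d // 2 - d * length ** (d - 1)


if __name__ == "__main__":
    L = 10
    for d in (1, 2, 3):
        M = hidden_neurons_formula(L, d)
        print(f"d={d}, N={L**d}: M={M} hidden neurons, {3 * M} nonzero weights")
```

Both counts grow linearly in $N = L^d$, which is the point of the construction.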

B. Entanglement benchmarking
For a general RBM state with long-range connections, the entanglement entropy cannot be calculated analytically, so we resort to numerical simulations. We study the entanglement properties of RBM states with random weight parameters. We consider a 1D system with N qubits. The corresponding RBM has N visible and M hidden neurons, with the weight parameters chosen randomly and independently. For each random sample, we numerically calculate the coefficients for all $2^N$ possible spin configurations and normalize them to obtain the corresponding quantum state in the computational basis. We then make an equal bipartition and calculate the reduced density matrix $\rho_A$ of subsystem A. We diagonalize $\rho_A$ to compute the desired entanglement entropy and spectrum. The number of samples used ranges from $10^6$ (N = 6) to $10^3$ (N = 22). Although we focus only on 1D systems, we expect some of the entanglement features found here to carry over to higher dimensions as well; extensive higher-dimensional RBM-based numerics are left for future studies.
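The numerical procedure described above can be sketched in Python with NumPy. This is a minimal illustration, not the code used for Fig. 4. We assume that Eq. (1) has the standard form with the hidden spins summed out, $\Psi(\Xi) \propto e^{\sum_k a_k\sigma^z_k}\prod_j 2\cosh(b_j + \sum_k W_{kj}\sigma^z_k)$. The text does not specify the distribution of the random weights, so we use zero biases and complex Gaussian weights as a placeholder choice. The helper names are hypothetical.

```python
import numpy as np


def rbm_amplitudes(a, b, W):
    """Normalized RBM wavefunction on all 2^N spin configurations.

    Assumed form (hidden spins h_j = +/-1 summed out):
    Psi(s) ~ exp(sum_k a_k s_k) * prod_j 2 cosh(b_j + sum_k W_kj s_k).
    Spin k is read from bit N-1-k of the configuration index.
    """
    n = W.shape[0]
    idx = np.arange(2**n)
    bits = (idx[:, None] >> np.arange(n - 1, -1, -1)) & 1
    spins = 1 - 2 * bits                          # bit 0 -> +1, bit 1 -> -1
    theta = b + spins @ W                         # effective field on each hidden neuron
    # Work with log-amplitudes so the product of cosh factors cannot overflow.
    log_psi = spins @ a + np.sum(np.log(2 * np.cosh(theta)), axis=1)
    psi = np.exp(log_psi - log_psi.real.max())
    return psi / np.linalg.norm(psi)


def renyi_entropy(psi, n_a, alpha=1.0, tol=1e-12):
    """Renyi entropy S_alpha (natural log) of the first n_a spins; alpha=1 is von Neumann."""
    n = int(round(np.log2(psi.size)))
    m = psi.reshape(2**n_a, 2 ** (n - n_a))       # rows: subsystem A, columns: subsystem B
    rho_a = m @ m.conj().T                        # reduced density matrix of A
    p = np.linalg.eigvalsh(rho_a)
    p = p[p > tol]                                # discard numerical zeros
    if alpha == 1.0:
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p**alpha)) / (1.0 - alpha))


def average_half_chain_entropy(n, gamma, samples, rng, alpha=1.0, scale=1.0):
    """Average S_alpha over random RBMs with M = round(gamma * N) hidden neurons."""
    m = max(1, int(round(gamma * n)))
    total = 0.0
    for _ in range(samples):
        # Placeholder distribution (an assumption): zero biases, complex Gaussian weights.
        W = rng.normal(scale=scale, size=(n, m)) + 1j * rng.normal(scale=scale, size=(n, m))
        psi = rbm_amplitudes(np.zeros(n), np.zeros(m), W)
        total += renyi_entropy(psi, n // 2, alpha)
    return total / samples


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (6, 8, 10):
        print(n, average_half_chain_entropy(n, gamma=1.0, samples=50, rng=rng))
```

Enumerating all $2^N$ amplitudes is what limits this approach to N of order 20, consistent with the system sizes quoted above.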
In Fig. 4(a), we plot the scaling of the averaged entanglement entropy with system size. Here, γ = M/N denotes the ratio between the numbers of hidden and visible neurons. When γ is small (γ = 1, 2, 3), the averaged entanglement entropy scales linearly with the system size, i.e., it obeys a volume law. (This is another indication that entanglement is not the limiting factor for RBMs in representing quantum many-body states.) However, as γ increases, the entanglement curve bends downwards and appears to saturate at large N. This seems surprising at first sight: increasing γ increases the number of connections between visible neurons, so intuitively the entanglement should increase as well. The bending of the curve at large γ can be understood by looking at the original RBM representation in Eq. (1). Since we choose $W_{kk'}$ randomly, on average $\Phi_M(\Xi;\Omega)$ becomes less and less dependent on the spin configuration Ξ as γ increases. In other words, the differences between the coefficients of the components of the represented many-body wavefunction become smaller and smaller. The state thus approaches a product state, and the entanglement decreases. This is further supported by the inset of Fig. 4(b), where the von Neumann entropy $S_1^A$ is shown as a function of γ for different system sizes: $S_1^A$ reaches its maximum at a critical value γ* ≈ 0.7, independent of system size, and decreases as γ increases beyond γ*. It is also worth mentioning that when M is fixed at a finite number (so that γ → 0 in the thermodynamic limit N → ∞), $S_1^A$ is upper bounded by $M \log 2$, regardless of the system size. This can be understood heuristically by imagining all the hidden neurons being grouped into subsystem B: subsystem A can then have at most $2^M$ degrees of freedom entangled with B, so $S_1^A \le M\log 2$. This explains the numerical observation in the inset of Fig. 4(b) that for M = 1 (smallest γ), $S_1^A \approx \log 2$, independent of the system size.
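The $M\log 2$ bound can be made precise with a short Schmidt-rank argument. The sketch below assumes that Eq. (1) is the standard RBM ansatz. The notation is assumed here: visible biases $a_k$, hidden biases $b_j$, weights $W_{kj}$, and hidden spins $h_j = \pm 1$. For every fixed hidden configuration $\mathbf h$, the summand factorizes between A and B:

```latex
\begin{aligned}
\Phi_M(\Xi;\Omega)
  &= \sum_{\mathbf h \in \{\pm 1\}^M}
     \exp\Big(\sum_k a_k\sigma^z_k + \sum_j b_j h_j + \sum_{k,j} W_{kj} h_j \sigma^z_k\Big) \\
  &= \sum_{\mathbf h}
     \underbrace{e^{\sum_j b_j h_j}\,
       e^{\sum_{k\in A}\left(a_k + \sum_j W_{kj} h_j\right)\sigma^z_k}}_{\phi_A^{\mathbf h}(\Xi_A)}
     \;
     \underbrace{e^{\sum_{k\in B}\left(a_k + \sum_j W_{kj} h_j\right)\sigma^z_k}}_{\phi_B^{\mathbf h}(\Xi_B)}, \\
|\Psi\rangle &\propto \sum_{\mathbf h} |\phi_A^{\mathbf h}\rangle \otimes |\phi_B^{\mathbf h}\rangle .
\end{aligned}
```

The state is therefore a sum of at most $2^M$ product states, so its Schmidt rank across the cut is at most $2^M$. It follows that $S_\alpha^A \le \log(\text{Schmidt rank}) \le M\log 2$ for every Rényi index α ≥ 0, independent of N.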
To compare RBM states with random parameters with generic random pure states, we also calculate the so-called Page entropy [89]. The Page entropy is the averaged entanglement entropy of pure states drawn randomly from the entire Hilbert space of the system. It provides an estimate of the entanglement in extended thermal states [90] and has been widely used in the context of quantum chaos [91], black-hole information [92,93], and many-body localization [94,95]. From random matrix theory, it can be computed as $S_{\rm Page} = \sum_{k=d_B+1}^{d_A d_B} \frac{1}{k} - \frac{d_A - 1}{2 d_B}$ (for $d_A \le d_B$), where $d_A$ and $d_B$ denote the Hilbert-space dimensions of subsystems A and B, respectively [89]. An interesting observation in Fig. 4(a) and the inset of Fig. 4(b) is that the entanglement entropy is always smaller than the Page entropy for all γ. This implies that the entanglement pattern of RBM states with random parameters is distinct from that of random pure states, consistent with the fact that RBM states live in a very small, restricted subspace of the entire Hilbert space. It also indicates that a random state in the Hilbert space is probably not efficiently described by an RBM. In Fig. 4(b), we plot the entanglement distribution for different γ. As γ increases, the distribution becomes broader and its peak shifts towards smaller values, in agreement with the observation in Fig. 4(a) that the entanglement decreases as γ increases. Figure 4(c) shows the results for the Rényi entropies of order two and one half. As expected, $S_2$ behaves very similarly to $S_1$. For $S_{1/2}$, we find a similar volume-law scaling, but the bending feature does not show up at γ = 4, which we attribute to finite-size effects.
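The Page formula quoted above is easy to evaluate directly. The sketch below uses the natural logarithm, and `page_entropy` is a hypothetical helper name. It relies on the symmetry $S_A = S_B$ of pure states to order the dimensions so that $d_A \le d_B$. For an equal bipartition with $d_A = d_B = 2^{N/2}$, the result approaches $\frac{N}{2}\log 2 - \frac{1}{2}$ for large N.

```python
import math


def page_entropy(d_a: int, d_b: int) -> float:
    """Page's average entanglement entropy (natural log) of a Haar-random pure state."""
    if d_a > d_b:
        d_a, d_b = d_b, d_a  # S_A = S_B for pure states, so order the dimensions
    harmonic_tail = sum(1.0 / k for k in range(d_b + 1, d_a * d_b + 1))
    return harmonic_tail - (d_a - 1) / (2.0 * d_b)


if __name__ == "__main__":
    for n in (6, 10, 14):
        d = 2 ** (n // 2)
        # Exact Page value next to its large-N asymptote N/2 * log 2 - 1/2.
        print(n, page_entropy(d, d), (n // 2) * math.log(2) - 0.5)
```

The direct harmonic sum costs $O(d_A d_B)$ operations, which is still cheap for the system sizes considered here (about $10^6$ terms at N = 20).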
The entanglement entropy studied above provides a wealth of information about the data structure of RBM states. Yet, as has been realized in a number of physical contexts, the entanglement entropy cannot capture the full entanglement structure of a system [96][97][98][99][100][101]; considerably more information can be extracted from the entanglement spectrum. To obtain a more comprehensive understanding of the data structure of RBM states with random weight parameters, we have therefore also calculated their entanglement spectra and the level statistics of their entanglement Hamiltonians. In Fig. 5(a), we plot the averaged entanglement spectrum for different γ. The entanglement spectrum of the RBM states is completely different from the Marchenko-Pastur distribution derived from random matrix theory [102]. The Marchenko-Pastur distribution describes the asymptotic average density of eigenvalues of a Wishart matrix, i.e., a matrix of the form $Y = XX^\dagger$ with X a random rectangular matrix. It was recently shown in Ref. [97] that the entanglement spectrum of highly excited eigenstates in the delocalized phase has a two-component structure:

(i) a universal part associated with random matrix theory, i.e., a model-independent tail that follows the Marchenko-Pastur distribution;

(ii) a model-dependent, nonuniversal part that dominates the weights in the spectrum.

In the localized phase, the universal part of the spectrum disappears in the thermodynamic limit, leaving only the nonuniversal part, which leads to an area-law scaling of the entanglement entropy. For RBM states with random weight parameters, the universal part is completely absent even for system sizes as small as N = 20. In this sense, these RBM states are less random than the highly excited eigenstates in both the delocalized and localized phases.
This further shows that, although these RBM states obey a volume law of entanglement entropy on average, they live in a very restricted subspace of the entire Hilbert space. In Fig. 5(b), we plot the density of states of the entanglement Hamiltonian of these RBM states. As γ increases, the distribution becomes broader and its peak moves to the right, consistent with the surprising result of Fig. 4 that the entanglement decreases as γ increases.
Another quantity that is useful for understanding the data structure of RBM states is the adjacent gap ratio, defined as $r_n = \min(\delta_n, \delta_{n-1}) / \max(\delta_n, \delta_{n-1})$, where $\delta_n$ is the level spacing between the n-th and (n−1)-th eigenvalues of the entanglement Hamiltonian. The distribution of r has been studied broadly in various contexts. In quantum chaos [103], it is argued that whereas the level statistics of integrable quantum Hamiltonians obey a Poisson law [104], those of Hamiltonians with chaotic dynamics follow one of the three classical ensembles of random matrix theory [105]: the Gaussian orthogonal ensemble (GOE), the Gaussian unitary ensemble (GUE), and the Gaussian symplectic ensemble (GSE). These three ensembles correspond to Hermitian random matrices whose entries are independently distributed random real, complex, and quaternionic variables, respectively [106]. In many-body localization, it is generally believed, and supported by extensive recent numerical calculations, that Hamiltonians in the delocalized and localized regimes exhibit GOE (or GUE) and Poisson level statistics, respectively [68]. The level statistics of entanglement Hamiltonians in this context were also studied recently in Ref. [98]. That work showed that, in the thermal phase, the entanglement spectrum shows level statistics in agreement with random matrix theory, governed by the same random matrix ensemble as the energy spectrum. In the many-body localized phase, by contrast, the entanglement spectrum shows a semi-Poisson distribution, whereas the energy spectrum follows a Poisson law. For the RBM states with random weight parameters studied in this section, we find that the entanglement spectra follow a Poisson distribution, as shown in Fig. 5(c).
The averaged value of $r_n$ (over $10^4$ random samples) for N = 20 and γ = 3 is 0.378, in good agreement with the Poisson prediction $2\ln 2 - 1 \approx 0.386$ [107]; the small deviation can be attributed to finite-size effects. Thus, on average, these RBM states are distinct from the eigenstates of Hamiltonians in either the delocalized or the localized phase. We also remark that the Poisson behavior and the lack of a universal part in the entanglement spectra of these RBM states imply that they are not irreversible, namely, there exists an efficient algorithm to completely disentangle them [52]. Finding such a disentangling algorithm would provide insight into the nature of neural-network quantum states and is an interesting topic for future investigation.
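The adjacent gap ratio and its Poisson value can be illustrated with a short sketch in Python with NumPy; the function name is hypothetical. For an RBM state, the input levels would be the entanglement energies $\xi_i = -\ln\lambda_i$ obtained from the nonzero eigenvalues $\lambda_i$ of $\rho_A$. As a self-contained check, we instead feed in uncorrelated levels with exponentially distributed spacings, for which $\langle r\rangle$ should approach $2\ln 2 - 1$. Exactly degenerate levels would give 0/0 and should be removed beforehand.

```python
import numpy as np


def mean_gap_ratio(levels):
    """Mean of r_n = min(delta_n, delta_{n-1}) / max(delta_n, delta_{n-1})."""
    e = np.sort(np.asarray(levels, dtype=float))
    delta = np.diff(e)                                  # level spacings delta_n
    r = np.minimum(delta[1:], delta[:-1]) / np.maximum(delta[1:], delta[:-1])
    return float(np.mean(r))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Poisson spectrum: cumulative sum of independent exponential spacings.
    poisson_levels = np.cumsum(rng.exponential(size=200_000))
    print(mean_gap_ratio(poisson_levels), 2 * np.log(2) - 1)
```

Because r is a ratio of neighboring spacings, it does not require unfolding the spectrum. This is one reason it is convenient for entanglement spectra, whose density of states varies strongly.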
C. Reinforcement learning of ground states with power-law entanglement
The above discussion shows that, unlike for MPS/tensor-network states, entanglement is not the limiting factor for the efficiency of the RBM representation. As an important consequence, RBMs might be capable of solving some quantum many-body problems that involve massive entanglement. To demonstrate this capability, in this section we consider the problem of finding the ground state (with power-law entanglement) of a spin-1/2 Hamiltonian with long-range interactions, through a reinforcement-learning scheme [22,53]. We consider N spin-1/2 particles living on a ring (see Fig. 6) with a modified Haldane-Shastry [54,55] Hamiltonian $H_{\rm MHS}$ given by …, where $d_{ij} = \frac{N}{\pi}|\sin[\pi(i-j)/N]|$ is the so-called chord distance. Since the interactions are long-range with a power-law decaying strength, we expect the ground state to have power-law entanglement. Although a rigorous proof is still lacking and seems very hard to obtain, we can verify the entanglement power law numerically. In Fig. 7(a), we plot the von Neumann entropy of the ground state of $H_{\rm MHS}$ calculated from exact diagonalization (ED). It is indeed well fitted by a power law in the system size.
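The chord distance and the power-law fit mentioned above can be sketched as follows, in Python with NumPy; the helper names are hypothetical. The chord distance reduces to $|i-j|$ for $|i-j| \ll N$ and reaches its maximum, $N/\pi$, for diametrically opposite sites. The fit is an ordinary least-squares line on a log-log scale, which is one common way to test for power-law scaling. We do not know which fitting procedure was used for Fig. 7(a).

```python
import numpy as np


def chord_distance(i, j, n):
    """d_ij = (N / pi) * |sin(pi * (i - j) / N)| for sites i, j on an N-site ring."""
    return n / np.pi * np.abs(np.sin(np.pi * (i - j) / n))


def fit_power_law(sizes, entropies):
    """Fit S = c * N**p by linear least squares on log S vs log N; returns (c, p)."""
    p, log_c = np.polyfit(np.log(sizes), np.log(entropies), 1)
    return float(np.exp(log_c)), float(p)
```

Fitting in log-log space weights all system sizes equally in relative error, which is usually appropriate when the entropies span a modest range.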
We now show that an RBM can faithfully and efficiently represent the ground state of $H_{\rm MHS}$, and that the representing RBM can be efficiently obtained via reinforcement learning, despite the large amount of entanglement in the ground state. Since the Hamiltonian has lattice translation symmetry, we can use this symmetry to reduce the number of variational parameters. For integer hidden-variable density (γ = 1, 2, …), the weight matrix then takes the form of feature filters $W_j^{(f)}$ with $f = 1, \ldots, \gamma$, as described in Ref. [22]. In Fig. 7(b), we plot different spin correlations obtained via reinforcement learning for small system sizes and compare them with the ED results. As shown in this figure, the RBM results match the ED results very well. The accuracy of the RBM results can be improved by increasing γ and the number of iterations in the training process. In Fig. 7(c), we show the feature maps after a typical reinforcement-learning process with γ = 4 and N = 20. The accuracy of the trained RBM can be quantified by the relative error of the ground-state energy, $\epsilon_{\rm rel} = |(E_{\rm RBM} - E_{\rm exact})/E_{\rm exact}|$ [22]. For the parameters shown in Fig. 7(c), we find $\epsilon_{\rm rel} \sim 10^{-5}$. We then calculate the correlation functions and ground-state energy density for larger system sizes, which are far beyond the reach of ED. We plot some of the results for N = 100 in Fig. 7(d). The correlation $\langle\sigma_1^z\sigma_{1+j}^z\rangle$ has a sharp jump at j = 2, which is also seen in our ED calculations for smaller system sizes, as shown in Fig. 7(b).
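The translation-invariant parametrization and the accuracy measure above can be illustrated with a minimal sketch in Python with NumPy; all helper names are hypothetical. The sketch follows the feature-filter idea of Ref. [22] as we understand it: each filter $W^{(f)}$ of length N generates N hidden neurons, one for each lattice shift s, through $W_{k,(f,s)} = W^{(f)}_{(k-s) \bmod N}$. Only γN independent weights remain instead of γN². We also assume a single visible bias and one hidden bias per filter.

```python
import numpy as np


def translation_invariant_weights(filters):
    """Expand gamma filters (shape gamma x N) into the full N x (gamma*N) weight matrix."""
    gamma, n = filters.shape
    W = np.empty((n, gamma * n), dtype=filters.dtype)
    for f in range(gamma):
        for s in range(n):
            # Column (f, s) is filter f shifted by s sites: W[k, f*n + s] = filters[f, (k - s) % n].
            W[:, f * n + s] = np.roll(filters[f], s)
    return W


def rbm_log_amplitude(spins, W, visible_bias, hidden_bias_per_filter):
    """log Psi for one spin configuration, with the hidden spins summed out."""
    n = spins.size
    hidden_bias = np.repeat(hidden_bias_per_filter, n)   # same bias for every shift
    theta = hidden_bias + spins @ W
    return visible_bias * spins.sum() + np.sum(np.log(2 * np.cosh(theta)))


def relative_energy_error(e_rbm, e_exact):
    """epsilon_rel = |(E_RBM - E_exact) / E_exact|."""
    return abs((e_rbm - e_exact) / e_exact)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, gamma = 12, 2
    filters = 0.1 * (rng.normal(size=(gamma, n)) + 1j * rng.normal(size=(gamma, n)))
    W = translation_invariant_weights(filters)
    spins = rng.choice([-1, 1], size=n)
    b = 0.05 * rng.normal(size=gamma)
    # Translating the spins only permutes hidden neurons within each filter block.
    print(np.isclose(rbm_log_amplitude(spins, W, 0.0, b),
                     rbm_log_amplitude(np.roll(spins, 1), W, 0.0, b)))
```

Shifting the spin configuration by one site only permutes the hidden neurons belonging to each filter. The amplitude is therefore translation invariant by construction.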
We remark that DMRG/MPS-based simulations are particularly challenging for the above problem and would presumably require a substantially larger number of variational parameters than the RBM approach [9,108]. In this regard, the reinforcement-learning-based RBM technique has apparent advantages when large entanglement and long-range interactions are involved. Moreover, as pointed out in Ref. [22], the RBM approach also works in higher dimensions and for dynamical problems. We also note that $H_{\rm MHS}$ might be realized with trapped ions in a ring geometry [109,110], so the numerically calculated correlations could be verified experimentally in the future.

V. AN ANALYTICAL RBM RECIPE FOR CALCULATING ENTANGLEMENT
In Sec. III, we proved that all short-range RBM states obey an entanglement area law. Can we calculate the entanglement entropy and spectrum analytically? For a general many-body state, this is an outstanding challenge, especially for a finite system. In fact, most past works focus on the thermodynamic limit and compute the entanglement entropy asymptotically, often using complicated mathematics [4]. For instance, the Fisher-Hartwig formula has been used to evaluate the asymptotic behavior of the entanglement entropy for the critical XX model and other isotropic models [111][112][113][114]. Another notable approach is conformal field theory, with which some universal properties of the entanglement entropy have been established for critical (1+1)-dimensional systems [115][116][117]. For calculating the exact entanglement entropy of a finite system, we note the quotient group method, which has been used to calculate the entropy of an arbitrary bipartition of the 2D toric code states [118,119]. Here, we introduce an alternative approach and show that the RBM representation can also help in analytically calculating the entanglement entropy and spectrum. As an example, we consider the 1D SPT cluster state $|\Psi_{\rm cluster}\rangle$. It is the ground state of the cluster Hamiltonian defined on a 1D lattice with periodic boundary conditions, $H_{\rm cluster} = -\sum_k \sigma^z_{k-1}\sigma^x_k\sigma^z_{k+1}$. This state is a topological state protected by $Z_2 \times Z_2$ symmetry [120]. It serves as a simple toy model for studying SPT phases and has important applications in measurement-based quantum computation [121][122][123]. An exact and efficient RBM representation of the 1D cluster state was given in Ref. [51]. This representation has N hidden neurons, each connecting only locally to the visible neurons within distance one.
The weight parameters are specified in Eq. (9): …, where $\omega_\mu$ (μ = 1, 0, −1) are positive real numbers given by $(\omega_1, \omega_0, \omega_{-1}) = \frac{\pi}{4}(2, 3, 1)$. In product form, the normalized 1D cluster state reads

$$|\Psi_{\rm cluster}\rangle = \sum_{\Xi}\prod_{k=1}^{N}\Gamma_k(\Xi)\,|\Xi\rangle, \qquad (10)$$

where each $\Gamma_k$ factor depends only on the configuration of three neighboring visible neurons, $\Gamma_k(\Xi) = \Gamma_k(\sigma^z_{k-1}, \sigma^z_k, \sigma^z_{k+1}) = \cos[\frac{\pi}{4}(1 + 2\sigma^z_{k-1} + 3\sigma^z_k + \sigma^z_{k+1})]$. (Note that we have restored the normalization constant and rescaled all the $\Gamma_k$ factors by 1/2.)
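To see explicitly why the product in Eq. (10) is the cluster state, one can evaluate the sign of each $\Gamma_k$ factor. The short derivation below is our own check. The occupation variables $n_k \in \{0,1\}$ are introduced here for convenience. The derivation uses only the expression for $\Gamma_k$ given above and periodic boundary conditions.

```latex
\begin{aligned}
&\sigma^z_k = 1 - 2n_k:\qquad
 1 + 2\sigma^z_{k-1} + 3\sigma^z_k + \sigma^z_{k+1} = 7 - 4n_{k-1} - 6n_k - 2n_{k+1}
 \quad(\text{always odd}), \\
&\Gamma_k(\Xi) = \tfrac{\sqrt{2}}{2}\,(-1)^{\,n_{k-1} + n_{k+1} + n_k n_{k+1}}
 \quad(\text{checked on all eight configurations}), \\
&\prod_{k=1}^{N}\Gamma_k(\Xi) = 2^{-N/2}\,(-1)^{\sum_k n_k n_{k+1}}
 \quad\text{since } \textstyle\sum_k (n_{k-1} + n_{k+1}) = 2\sum_k n_k .
\end{aligned}
```

The last line is the computational-basis amplitude of $\prod_k CZ_{k,k+1}|+\rangle^{\otimes N}$, where CZ denotes the controlled-Z gate. This is the standard cluster state, which satisfies $\sigma^z_{k-1}\sigma^x_k\sigma^z_{k+1}|\Psi_{\rm cluster}\rangle = |\Psi_{\rm cluster}\rangle$ for every k. It is also automatically normalized, since $2^N \times 2^{-N} = 1$.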
To study its entanglement properties, we consider an arbitrary bipartition of the system into two parts, A and B, and aim to calculate the entanglement entropy and spectrum of subsystem A analytically. For convenience, we further divide subregion A (B) into two parts, $A_1$ and $A_2$ ($B_1$ and $B_2$), with $A_2$ ($B_2$) containing only the four sites $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ ($\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$), as shown in Fig. 8. The RBM is short-ranged and $\Gamma_k(\Xi) = \pm\sqrt{2}/2$. Using these facts, we can rewrite $|\Psi_{\rm cluster}\rangle$ in a form, Eq. (11), in which the subregions A and B show up explicitly through the states $|\Psi_A\rangle$ and $|\Psi_B\rangle$. Equation (11) is crucial for calculating the entanglement entropy and spectrum: compared with Eq. (10), it contains only $2^8$, rather than $2^N$, terms in the summation. $|\Psi_A\rangle$ ($|\Psi_B\rangle$) depends only on the spin configurations within subregion $A_2$ ($B_2$) and … . Noting this, one can perform a unitary transformation $U_A$ ($U_B$) within subregion A (B) to rotate the basis of the Hilbert space $\mathcal{H}_A$ ($\mathcal{H}_B$) of A (B). This rotation does not affect the entanglement entropy or spectrum. In the new basis, $|\Psi_A(\Xi_{A_2})\rangle$ ($|\Psi_B(\Xi_{B_2})\rangle$) is simply a basis vector of $\mathcal{H}_A$ ($\mathcal{H}_B$).

FIG. 8: A sketch for analytically computing the entanglement entropy and spectrum of the 1D symmetry-protected topological cluster state through the corresponding exact short-range restricted-Boltzmann-machine representation. The scissors show a cut of the system into two subsystems, A and B. We calculate the entanglement entropy and spectrum from the reduced density matrix $\rho_A$. To exploit the short-range structure of the restricted Boltzmann machine, we further divide A (B) into $A_1$ and $A_2$ ($B_1$ and $B_2$). It is important that the $\Gamma_k$ factors in subregion $A_1$ ($B_1$) are independent of the spin configurations in B (A). The summations over spin configurations in $A_1$ and $B_1$ are then interchangeable and can be factorized out explicitly, as shown in Eq. (11).
By tracing out the degrees of freedom in subregion B and plugging in the parameter values of Eq. (9), we find a very simple expression for the reduced density matrix $\rho_A$ in the new basis, Eq. (12): … . Here $M_1 = \frac{1}{4}|\psi_1\rangle\langle\psi_1|$ and $M_2 = \frac{1}{4}|\psi_2\rangle\langle\psi_2|$ are four-by-four matrices with $|\psi_1\rangle = \frac{1}{2}(|0\rangle + |1\rangle)\otimes(|0\rangle - |1\rangle)$ and $|\psi_2\rangle = \frac{1}{2}(|0\rangle - |1\rangle)\otimes(|0\rangle - |1\rangle)$. The symbol 0 denotes a zero matrix of dimension $2^{N_{A_1}} \times 2^{N_{A_1}}$, with $N_{A_1}$ the number of spins in subregion $A_1$. The eigenvalues of $\rho_A$ follow readily from Eq. (12): $\rho_A$ has only four nonzero eigenvalues, which are degenerate and equal to 1/4. As a result, the Rényi entropy is $S_\alpha^A = 2\log 2$ for all α.
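The analytic result can be cross-checked numerically for small rings. The sketch below, in Python with NumPy, builds $|\Psi_{\rm cluster}\rangle$ directly from the product of $\Gamma_k$ factors in Eq. (10); the helper names are hypothetical. It confirms that the state is a +1 eigenstate of every $\sigma^z_{k-1}\sigma^x_k\sigma^z_{k+1}$. It then diagonalizes $\rho_A$ for a contiguous block A consisting of the first $N_A$ sites. The sketch enumerates all $2^N$ configurations by brute force, so it is meant only as a check, not as a replacement for the analytic argument.

```python
import numpy as np


def _spin_table(n):
    """All 2^n configurations as +/-1 arrays; spin k is bit n-1-k of the index."""
    idx = np.arange(2**n)
    bits = (idx[:, None] >> np.arange(n - 1, -1, -1)) & 1
    return 1 - 2 * bits


def cluster_state_rbm(n):
    """Amplitudes prod_k cos[pi/4 (1 + 2 s_{k-1} + 3 s_k + s_{k+1})] on a ring."""
    s = _spin_table(n)
    s_prev = np.roll(s, 1, axis=1)    # s_prev[:, k] = s[:, k-1] (periodic)
    s_next = np.roll(s, -1, axis=1)   # s_next[:, k] = s[:, k+1] (periodic)
    gamma = np.cos(np.pi / 4 * (1 + 2 * s_prev + 3 * s + s_next))
    return np.prod(gamma, axis=1)


def stabilizer_expectation(psi, n, k):
    """<psi| sigma^z_{k-1} sigma^x_k sigma^z_{k+1} |psi> for real amplitudes psi."""
    s = _spin_table(n)
    flipped = np.arange(2**n) ^ (1 << (n - 1 - k))   # sigma^x_k flips spin k
    zz = s[:, (k - 1) % n] * s[:, (k + 1) % n]
    return float(np.sum(psi * zz * psi[flipped]))


def reduced_spectrum(psi, n, n_a, tol=1e-12):
    """Nonzero eigenvalues of rho_A for A = the first n_a sites, in descending order."""
    m = psi.reshape(2**n_a, 2 ** (n - n_a))
    p = np.linalg.eigvalsh(m @ m.conj().T)
    return np.sort(p[p > tol])[::-1]


if __name__ == "__main__":
    n = 10
    psi = cluster_state_rbm(n)
    print(np.linalg.norm(psi), [stabilizer_expectation(psi, n, k) for k in range(n)])
    lam = reduced_spectrum(psi, n, 4)
    print(lam, -np.log(lam))   # entanglement energies, i.e. eigenvalues of -ln rho_A
```

For any contiguous block with at least two sites on each side of the cut, the four nonzero eigenvalues should all equal 1/4. The corresponding entanglement energies are then $2\log 2$.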
For the entanglement spectrum, the entanglement Hamiltonian $H_{\rm ent}$, defined through $\rho_A = e^{-H_{\rm ent}}$, has a four-fold degenerate lowest level with eigenvalue $2\log 2$, while the remaining eigenvalues are infinite. The four-fold degeneracy is a signature of SPT phases [61,124,125,126]. We expect this RBM approach to carry over to the calculation of the entanglement entropy and spectrum of the 2D and 3D toric code states, whose RBM representations were given in Ref. [51], although the calculation will be more technically involved. Like all analytical methods for calculating entanglement, our RBM approach has obvious limitations and cannot be applied to an arbitrary short-range RBM state. For one thing, given a specific quantum many-body system, there is so far no systematic way to write down its wave function in terms of an RBM. For another, certain symmetries (such as translational symmetry) are needed to substantially simplify the equations. Thus, this approach works only in specific circumstances. However, we emphasize that our method does not require sophisticated mathematics and, to our knowledge, has not been considered before in the literature.

VI. CONCLUSION AND OUTLOOK
In summary, we have studied the entanglement properties of neural-network quantum states in the RBM architecture. In particular, we have proved that all short-range RBM states satisfy an area law of entanglement for arbitrary dimensions and bipartition geometry. This immediately implies an area law for the entanglement of the 1D SPT cluster states and the 2D/3D toric code states (with or without anyonic excitations). It also sheds light on the open problem of proving the entanglement area law for the ground states of local gapped Hamiltonians in higher dimensions. For generic long-range RBM states with random parameters, we numerically studied their entanglement entropy and spectrum and found the following:

(i) the averaged entanglement entropy follows a volume law but is significantly smaller than the Page entropy of random pure states;

(ii) their entanglement spectrum has no universal part associated with random matrix theory and exhibits Poisson-type level statistics.

In addition, we analytically constructed families of RBM states (in both 1D and 2D) with maximal volume-law entanglement, which cannot be represented efficiently in terms of matrix product states or tensor-network states. For these states, the RBM representation is remarkably efficient, requiring only a small number of parameters that scales linearly with the system size. These results show explicitly and exactly the remarkable power of artificial neural networks in describing quantum states with massive entanglement. Unlike for MPS or tensor-network states, entanglement is not the limiting factor for the efficiency of the neural-network representation of quantum many-body states. Through reinforcement learning of a modified Haldane-Shastry model, we have shown that an RBM is capable of finding the ground state, which has power-law entanglement; the corresponding ground-state energy and correlations can also be obtained efficiently.
Finally, we also demonstrated, through a concrete example, that the RBM representation can be used as a tool to analytically compute the entanglement entropy and spectrum of finite systems. Our results reveal some crucial aspects of the data structure of neural-network quantum states and provide a useful guide for practical applications of machine learning techniques to quantum many-body problems.
There remain many open questions. First, what are the limiting factors for RBMs in efficiently representing quantum states? In the future, it would be interesting and important to find the necessary and sufficient conditions under which a many-body state can be represented efficiently by neural networks. It would also be important to find out how to convert a general quantum state satisfying these conditions into an RBM. This would help develop new machine-learning algorithms for solving many-body problems and advance the understanding, from a physical perspective [127], of the power of machine learning itself. It would also be interesting to study entanglement properties in other types of artificial-neural-network states [33,34]. Another direction worth further investigation is the relation between the MPS/tensor-network representation and the neural-network representation. In this context, we note a recent work on supervised machine learning with quantum-inspired tensor networks [128]. From Sec. III, we now know that all short-range RBM states in 1D can be represented in terms of MPS. What about higher dimensions and the converse statement? Can all states with an MPS/tensor-network representation of small bond dimension be rewritten in terms of short-range RBMs? These are among the questions worth future exploration.