Neural-Network Quantum States, String-Bond States, and Chiral Topological States

Neural-Network Quantum States have been recently introduced as an Ansatz for describing the wave function of quantum many-body systems. We show that there are strong connections between Neural-Network Quantum States in the form of Restricted Boltzmann Machines and some classes of Tensor-Network states in arbitrary dimensions. In particular, we demonstrate that short-range Restricted Boltzmann Machines are Entangled Plaquette States, while fully connected Restricted Boltzmann Machines are String-Bond States with a nonlocal geometry and low bond dimension. These results shed light on the underlying architecture of Restricted Boltzmann Machines and their efficiency at representing many-body quantum states. String-Bond States also provide a generic way of enhancing the power of Neural-Network Quantum States and a natural generalization to systems with larger local Hilbert space. We compare the advantages and drawbacks of these different classes of states and present a method to combine them. This allows us to merge the entanglement structure of Tensor Networks and the efficiency of Neural-Network Quantum States into a single Ansatz capable of targeting the wave function of strongly correlated systems. While it remains a challenge to describe states with chiral topological order using traditional Tensor Networks, we show that Neural-Network Quantum States and their String-Bond States extension can describe a lattice Fractional Quantum Hall state exactly. In addition, we provide numerical evidence that Neural-Network Quantum States can approximate a chiral spin liquid with better accuracy than Entangled Plaquette States and local String-Bond States. Our results demonstrate the efficiency of neural networks at describing complex quantum wave functions and pave the way towards the use of String-Bond States as a tool in more traditional machine-learning applications.


I. INTRODUCTION
Recognizing complex patterns is a central problem that pervades all fields of science. The increased computational power of modern computers has allowed the application of advanced methods to the extraction of such patterns from very large amounts of data, and we are witnessing an ever-increasing effort to find novel applications in numerous disciplines. This has led to a line of research now called Quantum Machine Learning [1], which is divided into two main branches. The first tries to develop quantum algorithms capable of learning, i.e. to exploit speed-ups from quantum computers to make machines learn faster and better. The second, which we consider in this work, tries to use classical machine-learning algorithms to extract insightful information about quantum systems.
The versatility of machine learning has allowed scientists to employ it in a number of problems, ranging from quantum control [2][3][4] and error-correcting codes [5] to tomography [6]. The last few years have also seen interesting developments for some central problems in condensed matter, such as quantum phase classification/recognition [7][8][9][10], improvement of dynamical mean-field theory [11], enhancement of Quantum Monte Carlo methods [12,13] or approximations of thermodynamic observables in statistical systems [14].
An idea that has received a lot of attention from the scientific community is to use neural networks as variational wave functions to approximate ground states of many-body quantum systems [15]. These networks are trained/optimized by the standard Variational Monte Carlo (VMC) method and, while a few different neural-network architectures have been tested [15][16][17], the most promising results so far have been achieved with Boltzmann Machines [18]. In particular, state-of-the-art numerical results have been obtained on popular models with Restricted Boltzmann Machines (RBM), and recent efforts have demonstrated the power of Deep Boltzmann Machines to represent ground states of many-body Hamiltonians with polynomial-size gap and quantum states generated by any polynomial-size quantum circuit [19,20].
Other seemingly unrelated classes of states that are widely used in condensed matter physics are Tensor-Network States. In 1D, Matrix Product States (MPS) can approximate ground states of physical Hamiltonians efficiently [21,22], and their structure has led both to analytical insights into the entanglement properties of physical systems and to efficient variational algorithms for approximating them [23][24][25]. The natural extension of MPS to higher-dimensional systems are Projected Entangled Pair States (PEPS) [26], but their exact contraction is #P-hard [27] and algorithms for optimizing them need to rely on approximations. Another approach to define higher-dimensional Tensor Networks consists in first dividing the lattice into overlapping clusters of spins. The wave function of the spins in each cluster is then described by a simple Tensor Network. The global wave function is finally taken to be the product of these Tensor Networks, which introduces correlations among the different clusters. This construction for local clusters parametrized by a full tensor gives rise to Entangled Plaquette States (EPS) [28][29][30], while taking one-dimensional clusters of spins, each described by a MPS, leads to a String-Bond State (SBS) Ansatz [31,32]. These states can be variationally optimized using the VMC method [31,33] and have been applied to 2D and 3D systems.
arXiv:1710.04045v3 [quant-ph] 7 Mar 2018
All these variational wave functions have been successful in describing strongly correlated quantum many body systems, including topologically ordered states. The Toric code [34] is a prototypical example which can be written exactly as a PEPS [35], an EPS [30], a SBS [31] or a short-range RBM [36]. This shows that in some cases Tensor Networks and Neural-Network Quantum States can be related. Indeed it was recently shown that local Tensor Networks can be represented efficiently by Deep Boltzmann Machines [19,20,37]. Not every topological state can however easily be represented by local Tensor Networks. A class of states for which this is challenging are chiral topological states breaking time-reversal symmetry. Such states were first realized in the context of the Fractional Quantum Hall (FQH) effect [38] and significant progress has since been made towards the construction of lattice models displaying the same physics, either in Hamiltonians realizing fractional Chern insulators [39][40][41][42][43][44] or in quantum anti-ferromagnets on several lattices [45][46][47][48]. One approach to describe the wave function of these anti-ferromagnets is to use parton constructed wave functions [49][50][51][52]. It has also been suggested to construct chiral lattice wave functions from the FQH continuum wave functions, the paradigmatic example being the Kalmeyer-Laughlin wave function [53]. Efforts to construct chiral topological states with PEPS have been undertaken recently [54][55][56][57][58], but the resulting states are critical. In the non-interacting case it has moreover been proven that the local parent Hamiltonian of a chiral fermionic Gaussian PEPS has to be gapless [55].
In this work we show that there is a strong relation between Restricted Boltzmann Machines and Tensor-Network States in arbitrary dimension. We demonstrate that short-range RBM are a special subclass of EPS, while fully-connected RBM are a subclass of SBS with a flexible non-local geometry and low bond dimension. This relation provides additional insight into the geometric structure of RBM and their efficiency. We discuss the advantages and drawbacks of RBM and SBS and provide a way to combine them. This generalization in the form of non-local String-Bond States leverages both the entanglement structure of Tensor Networks and the efficiency of RBM. It allows for the description of states with larger local Hilbert space and has a flexible geometry. It can moreover be combined with more traditional Ansatz wave functions that serve as an initial approximation of the ground state.
We then apply these methods to the challenging problem of approximating chiral topological states. We prove that any Jastrow wave function, and thus the Kalmeyer-Laughlin wave function, can be written exactly as a RBM. We moreover show that a remarkable accuracy can be achieved numerically with far fewer parameters than are required for an exact construction. We numerically evaluate the power of EPS, SBS and RBM to approximate the ground state of a chiral spin liquid for which the Laughlin state is already a good approximation [45] and find that RBM and non-local SBS are able to achieve lower energy than the Laughlin wave function. By combining these classes of states with the Laughlin wave function, we are able to reach even lower energies and to characterize the properties of the ground state of the model.
The paper is organized as follows: in Section II we introduce the Variational Monte Carlo method and how it can be used to optimize both Tensor-Network and Neural-Network States. In Section III the mapping between RBM, EPS and SBS is derived and its geometric implications are discussed. Finally we apply these techniques to the approximation of chiral topological states in Section IV.

II. VARIATIONAL MONTE CARLO WITH TENSOR NETWORKS AND NEURAL-NETWORK STATES
A. The Variational Monte Carlo method
Given a general Hamiltonian $H$, one of the main challenges of quantum many-body physics is to find its ground state $|\psi_0\rangle$ satisfying the Schrödinger equation $H|\psi_0\rangle = E_0|\psi_0\rangle$. This eigenvalue problem can be mapped to an optimization problem through the variational principle, which states that the energy of any quantum state is at least as high as the energy of the ground state. A general pure quantum state on a lattice with $N$ spins can be expressed in the basis spanned by $|s_1, \ldots, s_N\rangle$, where the $s_i$ are the projections of the spins on the $z$ axis, as
$$|\psi\rangle = \sum_{s_1,\ldots,s_N} \psi(s_1,\ldots,s_N)\,|s_1,\ldots,s_N\rangle. \tag{1}$$
Finding the ground state amounts to finding the exponentially many parameters $\psi(s_1,\ldots,s_N)$ minimizing the energy, which can only be done exactly for small sizes. Instead of searching for the ground state in the full Hilbert space, one may restrict the search to an Ansatz class specified by a particular form for the function $\psi_w(s_1,\ldots,s_N)$ depending on polynomially many variational parameters $w$. The Variational Monte Carlo (VMC) method [59,60] provides a general algorithm for optimizing the energy of such a wave function. One can compute the energy by expressing it as
$$E_w = \frac{\langle\psi_w|H|\psi_w\rangle}{\langle\psi_w|\psi_w\rangle} = \sum_s p(s)\,E_{\mathrm{loc}}(s), \tag{2}$$
where $s = s_1,\ldots,s_N$ is a spin configuration, $p(s) = |\psi_w(s)|^2 / \sum_{s'}|\psi_w(s')|^2$ is a classical probability distribution and the local energy $E_{\mathrm{loc}}(s) = \sum_{s'} \langle s|H|s'\rangle\, \psi_w(s')/\psi_w(s)$ can be evaluated efficiently for Hamiltonians involving few-body interactions. The energy is therefore an expectation value with respect to a probability distribution $p$, which can be evaluated using Markov Chain Monte Carlo sampling techniques such as the Metropolis-Hastings algorithm [61,62].
The second ingredient required to minimize the energy with respect to the parameters $w$ is the gradient of the energy, which can be expressed in a similar form since
$$\frac{\partial E_w}{\partial w_i} = 2\sum_s p(s)\,\Delta_{w_i}(s)^{*}\,\big(E_{\mathrm{loc}}(s) - E_w\big), \tag{3}$$
where we have defined $\Delta_{w_i}(s) = \frac{1}{\psi_w(s)}\frac{\partial \psi_w(s)}{\partial w_i}$ as the log-derivative of the wave function with respect to some parameter $w_i$. This is also an expectation value with respect to the same probability distribution $p$ and can therefore be sampled at the same time, which allows for the use of gradient-based optimization methods. At each iteration, the energy and its gradient are computed with Monte Carlo, the parameters $w$ are updated by small steps in the direction of the negative energy derivative ($w_i \leftarrow w_i - \alpha\,\partial E_w/\partial w_i$) and the process is repeated until convergence of the energy. The VMC method, in its simplest form, only requires the efficient computation of the ratio $\psi_w(s')/\psi_w(s)$ for two spin configurations $s$ and $s'$, as well as of the log-derivative of the wave function $\Delta_w(s)$. More efficient optimization methods can be used, such as conjugate-gradient descent, Stochastic Reconfiguration [63,64], the Newton method [65] or the linear method [66][67][68].
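The loop described above (evaluate the energy, evaluate its gradient from log-derivatives, take a small step) can be sketched on a toy problem. The example below is a minimal illustration under assumptions that are not in the text: a 4-site transverse-field Ising chain, a one-parameter real Ansatz $\psi_w(s) = \exp(w\sum_i s_i s_{i+1})$, and exact enumeration of all $2^N$ configurations in place of Metropolis sampling, so that the result is deterministic. All function names are hypothetical.

```python
import itertools
import math

N = 4          # number of spins (illustrative choice)
H_FIELD = 1.0  # transverse field strength (illustrative choice)

def bond_sum(s):
    """Sum of s_i * s_{i+1} over a periodic chain."""
    return sum(s[i] * s[(i + 1) % N] for i in range(N))

def psi(s, w):
    """Toy one-parameter wave function psi_w(s) = exp(w * bond_sum(s))."""
    return math.exp(w * bond_sum(s))

def local_energy(s, w):
    """E_loc(s) = sum_{s'} <s|H|s'> psi_w(s') / psi_w(s) for the Ising chain."""
    energy = -bond_sum(s)                      # diagonal Ising term
    for i in range(N):                         # each sigma^x_i flips one spin
        flipped = s[:i] + (-s[i],) + s[i + 1:]
        energy -= H_FIELD * psi(flipped, w) / psi(s, w)
    return energy

def energy_and_gradient(w):
    """Exact E_w and dE_w/dw from a sum over all 2^N configurations."""
    configs = list(itertools.product((-1, 1), repeat=N))
    weights = [psi(s, w) ** 2 for s in configs]
    norm = sum(weights)
    probs = [x / norm for x in weights]
    e_loc = [local_energy(s, w) for s in configs]
    energy = sum(p * e for p, e in zip(probs, e_loc))
    # Log-derivative of this Ansatz: Delta_w(s) = bond_sum(s).
    grad = 2.0 * sum(p * bond_sum(s) * (e - energy)
                     for p, s, e in zip(probs, configs, e_loc))
    return energy, grad

def optimize(w=0.0, alpha=0.02, steps=200):
    """Plain gradient descent: w <- w - alpha * dE_w/dw."""
    for _ in range(steps):
        _, grad = energy_and_gradient(w)
        w -= alpha * grad
    return w
```

In an actual VMC calculation the two sums over `configs` would be replaced by averages over configurations sampled from $p(s)$; the structure of the update is otherwise unchanged.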
At this point one has to choose a special form for the wave function $\psi_w$. One of the traditional variational wave functions for a many-body quantum system is a Jastrow wave function [59,69], which consists in its most general form of a product of wave functions for all pairs of spins:
$$\psi_w(s) = \prod_{i<j} f_{ij}(s_i, s_j), \tag{4}$$
where each $f_{ij}$ is fully specified by its four values $f_{ij}(s_i, s_j)$, $s_i, s_j \in \{-1, 1\}$. Such an Ansatz does not presuppose a particular local geometry of the many-body quantum state: in general this Ansatz can be non-local due to the correlations between all pairs of spins (Fig. 1a). A local structure can be introduced by choosing a form for $f_{ij}$ which decays with the distance between positions $i$ and $j$.

B. Variational Monte Carlo method with Tensor Networks
In condensed matter physics, important assets to simplify the problem are the geometric structure and locality of physical Hamiltonians. In 1D, it has been proven that for ground states of gapped local Hamiltonians the entanglement entropy of a subsystem grows only like the boundary of the subsystem [21]. States satisfying such an area law can be efficiently approximated by Matrix Product States (MPS) [22]. Matrix Product States are one-dimensional Tensor-Network States whose wave function for a spin configuration reads
$$\psi_w(s) = \mathrm{Tr}\Big(\prod_{j=1}^{N} A_j^{s_j}\Big). \tag{5}$$
For each lattice site $j$ and spin value $s_j$, the matrix $A_j^{s_j}$ of dimension $D \times D$, where $D$ is called the bond dimension, contains the variational parameters. Matrix Product States can be efficiently optimized using the Density Matrix Renormalization Group (DMRG) [70], but the previously described VMC method can also be applied [31,33] by observing that the ratio of two configurations is straightforward to compute, and that the log-derivative with respect to some matrix $A_k^{s'_k}$ is given by
$$\Delta_{A_k^{s'_k}}(s) = \frac{\delta_{s_k s'_k}}{\psi_w(s)}\,\Big(A_{k+1}^{s_{k+1}} \cdots A_N^{s_N} A_1^{s_1} \cdots A_{k-1}^{s_{k-1}}\Big)^{T}. \tag{6}$$
In some cases, this method is less likely to be trapped in a local minimum than DMRG, since all coefficients can be updated at once. In addition, the cost only scales as $O(D^3)$ in the bond dimension for periodic boundary conditions.
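The two quantities needed by VMC for a MPS, the amplitude and the log-derivative with respect to one matrix, can be sketched as follows. This is a minimal illustration in plain Python with matrices stored as nested lists; the data layout (`mats[j][s_j]`, sites counted from 0) and the helper `random_mps` are assumptions made for the example, not part of the original method.

```python
import random

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    d = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(d)) for c in range(d)]
            for r in range(d)]

def identity(d):
    return [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]

def chain_product(mats, s, sites):
    """Ordered product of the matrices A_j^{s_j} for j in `sites`."""
    out = identity(len(mats[0][1]))
    for j in sites:
        out = matmul(out, mats[j][s[j]])
    return out

def mps_amplitude(mats, s):
    """psi_w(s) = Tr(A_1^{s_1} ... A_N^{s_N}) with periodic boundaries."""
    prod = chain_product(mats, s, range(len(s)))
    return sum(prod[r][r] for r in range(len(prod)))

def mps_log_derivative(mats, s, k):
    """Log-derivative with respect to the entries of A_k^{s_k}.

    Entry [a][b] is (1/psi) * d psi / d (A_k^{s_k})[a][b], i.e. the transposed
    environment A_{k+1} ... A_N A_1 ... A_{k-1} divided by psi.
    """
    n = len(s)
    env = chain_product(mats, s, list(range(k + 1, n)) + list(range(k)))
    amp = mps_amplitude(mats, s)
    d = len(env)
    return [[env[b][a] / amp for b in range(d)] for a in range(d)]

def random_mps(n, d, seed=0):
    """Hypothetical helper: near-identity random matrices per site and spin."""
    rng = random.Random(seed)
    def mat():
        return [[(1.0 if r == c else 0.0) + 0.1 * rng.uniform(-1, 1)
                 for c in range(d)] for r in range(d)]
    return [{1: mat(), -1: mat()} for _ in range(n)]
```

Because the amplitude is linear in each entry of $A_k^{s_k}$, the log-derivative can be checked against a simple finite difference, which is a convenient sanity test before plugging such a function into an optimizer.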
In higher dimensions, Matrix Product States can be defined by mapping the system to a line (Fig. 1b). The problem with this construction is evident from Fig. 1b: spins which sit close to each other on the lattice might be separated by a long distance on the line. The Ansatz thus fails to reproduce the local structure of the state, which leads to an exponential scaling of the required computing resources with the system size [71]. The natural extension of MPS to 2D systems are Projected Entangled Pair States (PEPS) [26], for which the wave function can be written as a contraction of local tensors on the 2D lattice. While PEPS have been successful in describing strongly correlated quantum many-body systems, their exact contraction is #P-hard [27] and their optimization cannot rely on the standard VMC method without approximations. In the following we will instead consider other classes of tensor-network states in more than one dimension for which the exact computation of the wave function is efficient, which allows for the direct use of the VMC method.
One approach consists in cutting a lattice into $P$ small clusters of $n_p$ spins, or plaquettes, and constructing the wave function exactly on each plaquette. The wave function of the full quantum system is then taken to be the product of the wave functions in each plaquette, in a mean-field fashion. Choosing overlapping plaquettes allows one to go beyond mean-field and include correlations between different plaquettes (Fig. 1c). The wave function of such an Entangled Plaquette State (EPS, also called a Correlated Product State) is written as [28][29][30]:
$$\psi_w(s) = \prod_{p=1}^{P} C_p^{s_p}, \tag{7}$$
where a coefficient $C_p^{s_p}$ is assigned to each of the $2^{n_p}$ (for spin-1/2 particles) configurations $s_p = s_{a_1}, \ldots, s_{a_{n_p}}$ of the spins on the plaquette $p$. Each $C_p$ can be seen as the most general function on the Hilbert space corresponding to the spins in plaquette $p$. The accuracy can be improved by enlarging the size of the plaquettes and the Ansatz is exact once the size of the plaquettes reaches the size of the lattice (which can only be achieved on small lattices). Moreover, once the spin configuration $s_p$ is fixed, the log-derivative of the wave function with respect to the variational parameters is simply
$$\Delta_{C_p^{s'_p}}(s) = \frac{\delta_{s_p s'_p}}{C_p^{s_p}}, \tag{8}$$
which is efficient to compute. EPS are limited to small plaquettes since for each plaquette the number of coefficients scales exponentially with the size of the plaquette. However, one can generalize this Ansatz by describing the state of clusters of spins by a MPS, avoiding the exponentially many coefficients needed. The lattice is now cut into overlapping 1D strings which can mediate correlations over longer distances than local plaquettes (Fig. 1d).
The resulting Ansatz is a String-Bond State (SBS) [31] defined by a set of strings $i \in S$ (each string $i$ is an ordered subset of the set of spins) and a MPS for each string:
$$\psi_w(s) = \prod_{i \in S} \mathrm{Tr}\Big(\prod_{j \in i} A_{i,j}^{s_j}\Big). \tag{9}$$
The descriptive power of this Ansatz is highly dependent on the choice of strings: for example, by using small strings covering small plaquettes and a large bond dimension it includes EPS, whereas a single long string in a snake pattern includes MPS in 2D. In 3D, it has been used by choosing strings parallel to the axes of the lattice [32]. Since the form of the wave function is a product of MPS, the log-derivative with respect to some elements present in one of the MPS is simply the log-derivative for the corresponding MPS (Eq. (6)). The VMC procedures for optimizing SBS and MPS thus have the same cost. In addition, the ratio of two configurations which differ only by a few spins can be computed by considering only the strings including these spins, which speeds up the computation considerably. Let us note that a SBS can be mapped analytically to a MPS, but that the resulting MPS would have a bond dimension exponential in the number of strings.
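As a concrete illustration of the product-of-strings structure and of the locality of the amplitude ratio, the sketch below evaluates a SBS on a 2 × 2 lattice with two horizontal and two vertical strings. The data layout (a list of `(sites, mats)` pairs), the lattice, the bond dimension and the helper `random_string` are all assumptions chosen for the example.

```python
import random

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    d = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(d)) for c in range(d)]
            for r in range(d)]

def string_amplitude(sites, mats, s):
    """Trace of the ordered product of A_{i,j}^{s_j} along one string."""
    prod = mats[0][s[sites[0]]]
    for pos in range(1, len(sites)):
        prod = matmul(prod, mats[pos][s[sites[pos]]])
    return sum(prod[r][r] for r in range(len(prod)))

def sbs_amplitude(strings, s):
    """psi_w(s) = product over strings of the MPS amplitude of each string."""
    amp = 1.0
    for sites, mats in strings:
        amp *= string_amplitude(sites, mats, s)
    return amp

def flip_ratio(strings, s, site):
    """psi_w(s') / psi_w(s), where s' is s with the spin at `site` flipped.

    Strings that do not contain `site` cancel in the ratio and are skipped.
    """
    flipped = list(s)
    flipped[site] = -flipped[site]
    ratio = 1.0
    for sites, mats in strings:
        if site in sites:
            ratio *= (string_amplitude(sites, mats, flipped)
                      / string_amplitude(sites, mats, s))
    return ratio

def random_string(sites, d, rng):
    """Hypothetical helper: near-identity random matrices for one string."""
    def mat():
        return [[(1.0 if r == c else 0.0) + 0.1 * rng.uniform(-1, 1)
                 for c in range(d)] for r in range(d)]
    return (sites, [{1: mat(), -1: mat()} for _ in sites])

# 2 x 2 lattice, sites numbered row by row: horizontal and vertical strings.
rng = random.Random(1)
STRINGS = [random_string(sites, 2, rng)
           for sites in [(0, 1), (2, 3), (0, 2), (1, 3)]]
```

With this layout, `flip_ratio(STRINGS, s, 2)` only evaluates the strings `(2, 3)` and `(0, 2)`, which is the saving referred to in the text for configurations that differ by a few spins.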

C. Variational Monte Carlo method with Neural Networks
Recently, it was realized that the VMC method can be viewed as a form of learning, which motivated the use of another class of seemingly unrelated states for describing the ground state of many-body quantum systems: Neural-Network Quantum States [15] are quantum states for which the wave function has the structure of an artificial neural network. While a few different networks have been investigated [6,[15][16][17]], the most promising results so far have been obtained with Boltzmann Machines [18]. Boltzmann Machines are a kind of generative stochastic artificial neural network that can learn a distribution over the set of their inputs. In quantum many-body physics, the inputs are spin configurations and the wave function is interpreted as a (complex) probability distribution that the network tries to approximate. Boltzmann Machines consist of two sets of binary units (classical spins): the visible units $v_j$, $j \in \{1, \ldots, N\}$, corresponding to the configurations of the original spins in a chosen basis, and the hidden units $h_i$, $i \in \{1, \ldots, M\}$, which introduce correlations between the visible units. The whole system interacts through an Ising interaction which defines a joint probability distribution over the visible and hidden units as the Boltzmann weight of this Hamiltonian:
$$P(v, h) = \frac{1}{Z}\, e^{-H(v,h)}, \tag{10}$$
where the Hamiltonian $H$ is defined as
$$H(v,h) = -\sum_j a_j v_j - \sum_i b_i h_i - \sum_{j<j'} c_{jj'} v_j v_{j'} - \sum_{i,j} w_{ij} h_i v_j - \sum_{i<i'} d_{ii'} h_i h_{i'}, \tag{11}$$
and $Z$ is the partition function. The marginal probability of a visible configuration is then given by summing over all possible hidden configurations:
$$P(v) = \frac{1}{Z} \sum_h e^{-H(v,h)}, \tag{12}$$
and we take this quantity as Ansatz for the wave function: $\psi_w(s) = P(s)$. The variational parameters are the complex parameters of the Ising Hamiltonian. In the case where there are interactions between the hidden units (Fig. 2a), the Boltzmann Machine is called a Deep Boltzmann Machine.
It has been shown that Deep Boltzmann Machines can efficiently represent ground states of many-body Hamiltonians with polynomial-size gap, local tensor-network states and quantum states generated by any polynomial-size quantum circuit [19,20,37]. On the other hand, computing the wave function $\psi_w(s)$ of such a Deep Boltzmann Machine in the general case is intractable, due to the exponential sum over the hidden variables, so the VMC method cannot be applied to Deep Boltzmann Machines without approximations. We therefore turn to the investigation of Restricted Boltzmann Machines (RBM), which only include interactions between the visible and hidden units (as well as the one-body interaction terms which correspond to biases). In this case, the sum over the hidden units can be performed analytically and the resulting wave function can be written, up to normalization, as (here we take the hidden units to have values $\pm 1$):
$$\psi_w(s) = e^{\sum_j a_j s_j} \prod_i \cosh\Big(b_i + \sum_j w_{ij} s_j\Big), \tag{13}$$
where $j$ runs over the $N$ visible units and $i$ over the $M$ hidden units. RBM can represent many quantum states of interest, such as the toric code [36], any graph state, cluster states and coherent thermal states [19]; however, the very fact that $\psi_w(s)$ can be computed efficiently prevents RBM from approximating all PEPS and all ground states of local Hamiltonians [19]. On the other hand, since computing $\psi_w(s)$ and its derivative is very efficient, RBM can be optimized numerically via the VMC method.
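The analytic summation over the hidden units can be checked numerically on a small example. The sketch below compares the closed cosh form with an explicit sum over all hidden configurations $h \in \{-1,+1\}^M$; the two differ only by the constant factor $2^M$. The function names and the convention that `w[i][j]` couples hidden unit `i` to visible spin `j` are assumptions of this illustration.

```python
import cmath
import itertools

def rbm_amplitude(a, b, w, s):
    """Closed form: exp(sum_j a_j s_j) * prod_i cosh(b_i + sum_j w_ij s_j)."""
    amp = cmath.exp(sum(a_j * s_j for a_j, s_j in zip(a, s)))
    for b_i, w_i in zip(b, w):
        amp *= cmath.cosh(b_i + sum(w_ij * s_j for w_ij, s_j in zip(w_i, s)))
    return amp

def rbm_brute_force(a, b, w, s):
    """Explicit sum of the Boltzmann weights over all hidden configurations."""
    total = 0.0
    for h in itertools.product((-1, 1), repeat=len(b)):
        exponent = sum(a_j * s_j for a_j, s_j in zip(a, s))
        exponent += sum(b_i * h_i for b_i, h_i in zip(b, h))
        exponent += sum(w[i][j] * h[i] * s[j]
                        for i in range(len(b)) for j in range(len(s)))
        total += cmath.exp(exponent)
    return total

# Small illustrative network: N = 3 visible spins, M = 2 hidden units,
# with arbitrary complex parameters.
A = [0.1 + 0.2j, -0.3j, 0.05]
B = [0.2, -0.1 + 0.4j]
W = [[0.3, -0.2j, 0.1], [0.1 + 0.1j, 0.2, -0.3]]
```

The brute-force sum costs $2^M$ terms per configuration while the closed form costs $O(NM)$, which is what makes the restricted architecture usable within VMC.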

III. RELATIONSHIP BETWEEN TENSOR-NETWORK AND NEURAL-NETWORK STATES
While the machine learning perspective which leads to the application of Boltzmann Machines to quantum many-body systems seems quite different from the information-theoretic approach to the structure of tensor-network states, we will see that they are in fact intimately related. It was recently shown that while fully connected RBM can exhibit volume-law entanglement, contrary to local tensor networks, all short-range RBM satisfy an area law [72]. Moreover short-range and sufficiently sparse RBM can be written as a MPS [37], but doing so for a fully-connected RBM would require an exponential scaling of the bond dimension with the size of the system. In this section we show that there is a tighter connection between RBM and the previously introduced tensor networks in arbitrary dimension.
A. Jastrow wave functions, RBM and the Majumdar-Ghosh model
Before turning to tensor networks, let us first consider the simple case of the Jastrow wave function (Eq. (4)). Boltzmann Machines including only interactions between the visible units lead to a wave function which has the form of a product of functions of pairs of spins, and is thus a Jastrow wave function. More generally, semi-restricted Boltzmann Machines including interactions between visible units as well as between hidden and visible units are a product of a RBM and a Jastrow factor. Nevertheless, one may ask whether a RBM alone is enough to describe a Jastrow factor. We first rewrite the RBM, up to normalization, as
$$\psi_w(s) = \prod_j A_j^{s_j} \prod_i \Big( B_i \prod_j W_{ij}^{s_j} + \frac{1}{B_i \prod_j W_{ij}^{s_j}} \Big),$$
where we have redefined the parameters with uppercase letters as the exponentials of the original parameters ($A_j = e^{a_j}$, $B_i = e^{b_i}$, $W_{ij} = e^{w_{ij}}$), thus removing the exponentials in the hyperbolic cosine. This form will be convenient for the numerical simulations presented later. Since Jastrow wave functions are a product of functions of all pairs of spins, let us show that a RBM with one hidden unit can represent any function of two spins. It then follows that a RBM with $M = N(N-1)/2$ hidden units, each representing a function of one pair of spins, can represent a Jastrow wave function with polynomial resources. We thus have to solve a system of four non-linear equations, with $s_1, s_2 \in \{-1, 1\}$ and $f$ the most general function of two spins: $\psi_w(s_1, s_2) = f(s_1, s_2)$. This system is solved in Appendix A, providing an analytical solution for the parameters of the RBM to represent the Jastrow wave function exactly, or to arbitrary precision if $f(s_1, s_2) = 0$ for some spins.
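As an application, we use this result to write the ground state of the Majumdar-Ghosh model [73] exactly as a RBM. The Majumdar-Ghosh model is defined by the following spin-1/2 Hamiltonian:
$$H = \sum_i \Big( \vec{S}_i \cdot \vec{S}_{i+1} + \frac{1}{2}\, \vec{S}_i \cdot \vec{S}_{i+2} \Big).$$
The ground state wave function is a product of singlets formed by neighboring pairs of spins:
$$|\psi\rangle = \prod_{n=1}^{N/2} \frac{1}{\sqrt{2}} \Big( |\uparrow\rangle_{2n-1} |\downarrow\rangle_{2n} - |\downarrow\rangle_{2n-1} |\uparrow\rangle_{2n} \Big).$$
This wave function can also be expanded in the computational basis as
$$\psi(s) \propto \prod_{n=1}^{N/2} f(s_{2n-1}, s_{2n}), \qquad f(s_1, s_2) = \frac{s_1 - s_2}{2},$$
so that $f$ takes the values $+1$ and $-1$ on the two antiparallel configurations and vanishes on parallel ones. Using the previous result, each function of two spins $f$ can be written as a RBM using one hidden unit, which leads to a RBM representation of the ground state with $M = N/2$ hidden units. We also find numerically on small systems that a RBM using fewer than $M = N/2$ hidden units has higher energy than the ground state, which suggests that $M = N/2$ could be optimal.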
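The statement that one hidden unit per pair suffices can be made concrete with explicit parameters. The sketch below uses one hand-picked choice (it is an assumption of this example and not necessarily the solution of Appendix A): visible biases $a_j = \pm i\pi/4$ alternating along the chain, hidden biases $b_n = 0$ and weights $w = i\pi/4$ from hidden unit $n$ to the two spins of pair $n$. With this choice the cosh vanishes on parallel spins, since $\cosh(\pm i\pi/2) = 0$, and each pair contributes a factor $i\,(s_1 - s_2)/2$. Sites are counted from 0 in the code.

```python
import cmath
import math

def mg_rbm_amplitude(s):
    """RBM amplitude with one hidden unit per pair (2n, 2n+1), sites from 0."""
    n_sites = len(s)
    amp = 1.0 + 0.0j
    for j in range(n_sites):
        # Visible biases: +i*pi/4 on even sites, -i*pi/4 on odd sites.
        bias = 1j * math.pi / 4 if j % 2 == 0 else -1j * math.pi / 4
        amp *= cmath.exp(bias * s[j])
    for n in range(n_sites // 2):
        # Hidden bias b_n = 0, weights i*pi/4 to the two spins of pair n.
        amp *= cmath.cosh(1j * math.pi / 4 * (s[2 * n] + s[2 * n + 1]))
    return amp

def singlet_product(s):
    """Unnormalised product of singlets: prod_n (s_{2n} - s_{2n+1}) / 2."""
    amp = 1.0
    for n in range(len(s) // 2):
        amp *= (s[2 * n] - s[2 * n + 1]) / 2
    return amp
```

For a chain of $N$ spins the two functions agree up to the global phase $i^{N/2}$ (and up to floating-point rounding of $\cos(\pi/2)$), which does not affect the physical state.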
B. Short-range RBM are EPS
Let us now turn to the specific case of RBM with short-range connections (sRBM). This encompasses all quantum states that have previously been written exactly as a RBM, such as the toric code or the 1D symmetry-protected topological cluster state [36]. In such states the weights connecting visible and hidden units are local: each hidden unit is connected to a local region with at most $d$ neighboring spins. If we divide the lattice into $M$ subsets $p_i$, $i \in \{1, \ldots, M\}$, where $p_i$ contains the spins connected to hidden unit $i$, the wave function can be rewritten as (we omit here the biases $a_j$, which are local one-body terms):
$$\psi_w(s) = \prod_{i=1}^{M} \cosh\Big(b_i + \sum_{j \in p_i} w_{ij} s_j\Big) = \prod_{i=1}^{M} C_i^{\mathbf{s}_i},$$
where $\mathbf{s}_i$ is the spin configuration in the subset $p_i$. This is the form (Eq. (7)) of an EPS (Fig. 3a). For translationally invariant systems, the short-range RBM becomes a convolutional RBM, which corresponds to a translationally invariant EPS. The main difference between a short-range RBM and an EPS is that the RBM considers a very specific function among all possible functions of the spins inside a plaquette, hence EPS are more general than short-range RBM. This also directly implies that the entanglement of short-range RBM follows an area law. The main advantage of short-range RBM over EPS is that, due to the exponential scaling of the number of EPS coefficients with the size of the plaquettes, larger plaquettes can be used in short-range RBM than in EPS. Since in practice for finite systems it is possible to work directly with fully-connected RBM, we argue that EPS or fully-connected RBM should be preferred to short-range RBM for numerical purposes.
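The mapping from a short-range RBM to an EPS amounts to tabulating the cosh factor of each hidden unit over the configurations of its plaquette. A minimal sketch follows, assuming real parameters for brevity and a hypothetical data layout in which each hidden unit is a pair `(b_i, {site: w_ij})`; the 4-site chain with three overlapping 2-site plaquettes is an arbitrary example.

```python
import itertools
import math

def hidden_unit_to_plaquette(b_i, weights):
    """Tabulate C_i^{s_i} = cosh(b_i + sum_j w_ij s_j) over one plaquette.

    `weights` maps each site connected to hidden unit i to its weight w_ij.
    Returns the plaquette sites and a table indexed by spin configurations.
    """
    sites = tuple(sorted(weights))
    table = {}
    for config in itertools.product((-1, 1), repeat=len(sites)):
        field = b_i + sum(weights[j] * c for j, c in zip(sites, config))
        table[config] = math.cosh(field)
    return sites, table

def eps_amplitude(plaquettes, s):
    """EPS amplitude: product of one tabulated coefficient per plaquette."""
    amp = 1.0
    for sites, table in plaquettes:
        amp *= table[tuple(s[j] for j in sites)]
    return amp

def srbm_amplitude(hidden_units, s):
    """Short-range RBM amplitude (visible biases omitted, as in the text)."""
    amp = 1.0
    for b_i, weights in hidden_units:
        amp *= math.cosh(b_i + sum(w * s[j] for j, w in weights.items()))
    return amp

# Chain of 4 spins with three hidden units, each seeing two neighbouring spins.
HIDDEN_UNITS = [(0.1, {0: 0.3, 1: -0.2}),
                (-0.4, {1: 0.5, 2: 0.1}),
                (0.2, {2: -0.3, 3: 0.7})]
PLAQUETTES = [hidden_unit_to_plaquette(b, w) for b, w in HIDDEN_UNITS]
```

The table of a plaquette with $d$ spins has $2^d$ entries while the hidden unit has only $d + 1$ parameters, which is the trade-off between generality and plaquette size discussed above.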

C. Fully-connected RBM are SBS
Fully-connected RBM, on the other hand, do not always satisfy an area law [72] and hence cannot always be approximated by local tensor networks. Nevertheless, one can express the RBM wave function as (here we also omit the biases $a_j$):
$$\psi_w(s) = \prod_i \cosh\Big(b_i + \sum_j w_{ij} s_j\Big) \propto \prod_i \mathrm{Tr}\Big(\prod_j A_{i,j}^{s_j}\Big),$$
where
$$A_{i,j}^{s_j} = \begin{pmatrix} e^{b_i/N + w_{ij} s_j} & 0 \\ 0 & e^{-b_i/N - w_{ij} s_j} \end{pmatrix}$$
are diagonal matrices of bond dimension 2. This shows that RBM are String-Bond States, as the wave function can be written as a product of MPS over strings, where each hidden unit corresponds to one string. The only difference between the SBS as depicted in Fig. 1d and the RBM is the geometry of the strings. In a fully-connected RBM, each string goes over the full lattice, while SBS have traditionally been used with smaller strings and with at most a few strings overlapping at each lattice site.
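The identity behind this rewriting is that the trace of a product of diagonal $2 \times 2$ matrices is the sum of the products of their diagonal entries, here $e^{x} + e^{-x} = 2\cosh(x)$ with $x = b_i + \sum_j w_{ij} s_j$. The following sketch checks it numerically; the function names and the example parameters are assumptions of the illustration.

```python
import cmath

def hidden_unit_as_string(b_i, w_i, s):
    """Tr prod_j diag(e^{b_i/N + w_ij s_j}, e^{-b_i/N - w_ij s_j})."""
    n = len(s)
    upper, lower = 1.0, 1.0
    for w_ij, s_j in zip(w_i, s):
        upper *= cmath.exp(b_i / n + w_ij * s_j)
        lower *= cmath.exp(-b_i / n - w_ij * s_j)
    return upper + lower  # trace of the product of diagonal matrices

def rbm_as_sbs(b, w, s):
    """Product over hidden units (strings) of the diagonal-MPS traces."""
    amp = 1.0
    for b_i, w_i in zip(b, w):
        amp *= hidden_unit_as_string(b_i, w_i, s)
    return amp

def rbm_cosh_form(b, w, s):
    """prod_i cosh(b_i + sum_j w_ij s_j), visible biases omitted."""
    amp = 1.0
    for b_i, w_i in zip(b, w):
        amp *= cmath.cosh(b_i + sum(w_ij * s_j for w_ij, s_j in zip(w_i, s)))
    return amp

# Arbitrary complex parameters for N = 3 spins and M = 2 hidden units.
B = [0.2 - 0.1j, 0.4j]
W = [[0.3, -0.2j, 0.1], [0.1 + 0.1j, 0.2, -0.3]]
```

The two forms differ by the constant $2^M$, which is absorbed in the normalization of the wave function.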

D. Generalizing RBM to non-local SBS
In the SBS language, a RBM consists of many strings overlapping on the full lattice. The matrices in each string in the RBM are diagonal, hence commute, so they can be moved in the string up to a reordering of the spins. This means that each string does not have a fixed geometry and can adapt to stronger correlations in different parts of the lattice, even over long distances. This motivates us to generalize RBM to SBS with diagonal matrices in which each string covers the full lattice (Fig. 3b). In the following we denote these states as non-local dSBS. This amounts to relaxing the constraints on the RBM parameters to the most general diagonal matrix and enlarging the bond dimension of the matrices. For example, taking the matrices
$$A_{i,j}^{s_j} = \begin{pmatrix} a_{i,j}^{s_j} & 0 & 0 \\ 0 & b_{i,j}^{s_j} & 0 \\ 0 & 0 & c_{i,j}^{s_j} \end{pmatrix}$$
with different parameters $a_{i,j}^{s_j}$, $b_{i,j}^{s_j}$, $c_{i,j}^{s_j}$ for each string, lattice site and spin direction leads to the wave function (here $D = 3$):
$$\psi_w(s) = \prod_i \Big( \prod_j a_{i,j}^{s_j} + \prod_j b_{i,j}^{s_j} + \prod_j c_{i,j}^{s_j} \Big).$$
Note that even for $2 \times 2$ matrices, the non-local dSBS is more general than a RBM, since the coefficients in each of the two matrices corresponding to one spin are independent from each other, which is not the case in the RBM. Generalizing such a wave function to spins larger than spin-1/2 is straightforward, since the spin $s_j$ is just indexing the parameters. This provides a natural generalization of RBM which can handle systems with larger physical dimension. For instance, it can be applied to spin-1 systems, for which a naive construction of a RBM with spin-1 visible and hidden units leads to additional constraints, and it can be used to approximate bosonic systems by truncating the local Hilbert space of the bosons.
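A non-local dSBS amplitude is cheap to evaluate because only the diagonals need to be multiplied. The sketch below stores, for every string $i$, site $j$ and local state $s_j$, the list of $D$ diagonal entries of $A_{i,j}^{s_j}$; the local state is used purely as a dictionary key, so spin-1 labels $\{-1, 0, 1\}$ work without modification. The layout and the helper `random_dsbs` are assumptions made for this example.

```python
import random

def dsbs_amplitude(params, s):
    """Amplitude of a non-local dSBS.

    params[i][j][s_j] is the list of D diagonal entries of A_{i,j}^{s_j};
    each string contributes the sum over d of prod_j params[i][j][s_j][d].
    """
    amp = 1.0
    for string in params:
        bond_dim = len(string[0][s[0]])
        diag = [1.0] * bond_dim
        for j, s_j in enumerate(s):
            diag = [x * y for x, y in zip(diag, string[j][s_j])]
        amp *= sum(diag)  # trace of the product of diagonal matrices
    return amp

def random_dsbs(n_sites, n_strings, bond_dim, local_states, seed=0):
    """Hypothetical helper: random positive diagonal entries."""
    rng = random.Random(seed)
    return [[{state: [rng.uniform(0.5, 1.5) for _ in range(bond_dim)]
              for state in local_states}
             for _ in range(n_sites)]
            for _ in range(n_strings)]

# Spin-1 example: 4 sites, 2 strings, bond dimension D = 3.
PARAMS = random_dsbs(4, 2, 3, (-1, 0, 1))
```

Since diagonal matrices commute, relabelling the sites of every string together with the spins leaves the amplitude unchanged, which is one way to see that these strings carry no fixed geometry.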
A further way to extend this class of states is to include non-commuting matrices. This fixes the geometry of each string by defining an order and also makes it possible to represent more complicated interactions. In the following we will refer to SBS in such a geometry as non-local SBS. The advantage of this approach is that it can capture more complex correlations within each string, while introducing additional geometric information about the problem at hand. It comes, however, at a greater numerical cost than non-local dSBS or RBM due to the additional number of parameters. In practice, one can use an already optimized RBM or dSBS as a way of initializing a non-local SBS.
In some cases, the SBS representation is more compact than the RBM/dSBS representation. Let us consider again the ground state of the Majumdar-Ghosh Hamiltonian, which we previously wrote as a RBM with $M = N/2$ hidden units. The ground state of the Majumdar-Ghosh Hamiltonian can also be written as a simple MPS with bond dimension 3, both for periodic and for open boundary conditions, with matrices $A^{s_n}$ of the form given in Ref. [24]. Since this state is a MPS, it is also a SBS with 1 string. The RBM representation of the same state requires $N/2$ strings. In practice the number of non-zero coefficients is comparable, since in both cases the representation is sparse, but for numerical purposes a fully-connected RBM needs of the order of $N^2$ parameters before finding the exact ground state, while a MPS or SBS with one string will need $O(N)$ parameters for both open and periodic boundary conditions. Another example is the AKLT model [74], defined by the following spin-1 Hamiltonian in periodic boundary conditions:
$$H = \sum_{i=1}^{N} \Big[ \vec{S}_i \cdot \vec{S}_{i+1} + \frac{1}{3} \big( \vec{S}_i \cdot \vec{S}_{i+1} \big)^2 \Big].$$
Its ground state has a simple form as a MPS of bond dimension 2. It can also be written as an exact RBM by mapping the system to a spin-1/2 chain, but the number of hidden units needed for an exact representation scales as $O(N^2)$ in the system size [75]. We have numerically optimized the spin-1 extension of a RBM with form Eq. (27) (see Appendix B for the details of the numerical optimization) and found that already for small sizes of the chain a much higher number of parameters is required to approach the ground state energy as compared to a SBS with non-commuting matrices, which is exact with one string of bond dimension 2 (Fig. 4). We will also show in Section IV that in some other cases the RBM needs fewer parameters than a SBS to obtain a similar energy. This demonstrates that both RBM and SBS have advantages and that their efficiency depends on the particular model that is investigated.
It remains an open question whether there exist MPS or SBS which can provably not be efficiently approximated by a RBM (for which the RBM would need exponentially many parameters). To be able to use both the advantages of RBM (efficient to compute, few parameters) and of SBS (complex representation, geometric interpretation), one can use the flexibility of SBS by including some strings that have a full MPS over the whole lattice, some strings which include only local connections and that will ensure that the locality of the system is preserved, and some strings that have the form of an RBM and that can easily capture large entanglement and long-range correlations. In many cases of interest, an initial approximation of the ground state can be obtained, either by optimizing simpler wave functions or by first applying DMRG to optimize a MPS. This initial approximation can then be used in conjunction with the previous Ansatz classes by multiplying an Ansatz wave function with the initial approximation. For the resulting wave function the ratio of the wave function on two configurations as well as the log-derivatives depend only on the respective ratio and log-derivatives of each separate wave function, making the application of the VMC method straightforward. This procedure has the advantage of reducing the number of parameters necessary for obtaining a good approximation to the ground state and making the optimization procedure more stable, since the initial state is not a completely random state. Such a procedure provides a generic way to enhance the power of more specific Ansatz wave functions tailored to particular problems, as we will demonstrate in the next section. A similar technique has been used to construct tensor-product projected states with tensor networks in Ref. 76 and more generally it can be used to project the wave function of an initial reference state in a Fock space and is thus also suitable to describe fermionic systems.
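The remark that ratios and log-derivatives of a product wave function follow from those of its factors can be sketched generically. In the example below the "reference" factor is an arbitrary fixed positive function and the "correction" factor is a one-parameter exponential; both, as well as the helper name `make_product_ansatz`, are placeholders introduced for illustration and are not the states used in the paper.

```python
import math

def make_product_ansatz(amplitudes, log_derivatives):
    """Combine factors psi(s) = prod_k psi_k(s).

    amplitudes:      list of functions s -> psi_k(s)
    log_derivatives: list of functions s -> list of log-derivatives of psi_k
                     with respect to its own parameters (empty if it has none)
    """
    def amplitude(s):
        out = 1.0
        for f in amplitudes:
            out *= f(s)
        return out

    def ratio(s_new, s_old):
        # The ratio of the product is the product of the individual ratios.
        out = 1.0
        for f in amplitudes:
            out *= f(s_new) / f(s_old)
        return out

    def log_derivative(s):
        # Each parameter belongs to one factor only, so the log-derivatives
        # of the product are the concatenated log-derivatives of the factors.
        out = []
        for g in log_derivatives:
            out.extend(g(s))
        return out

    return amplitude, ratio, log_derivative

PARAMS = {"w": 0.3}  # single variational parameter of the toy correction

def bonds(s):
    return sum(s[i] * s[i + 1] for i in range(len(s) - 1))

def reference(s):
    """Fixed initial approximation (arbitrary positive function)."""
    return 1.0 + 0.5 * s[0] * s[-1]

def correction(s):
    """Variational factor exp(w * sum_i s_i s_{i+1})."""
    return math.exp(PARAMS["w"] * bonds(s))

amplitude, ratio, log_derivative = make_product_ansatz(
    [reference, correction],
    [lambda s: [], lambda s: [float(bonds(s))]],
)
```

The reference factor enters the ratio but contributes no log-derivatives, so it changes the sampled distribution without adding parameters to the optimization.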

IV. APPLICATION TO CHIRAL TOPOLOGICAL STATES
In this section we turn to a practical application to a problem that is challenging for traditional tensor-network methods, namely the approximation of a state with chiral topological order. While chiral topological PEPS have been constructed, the resulting states are critical. Moreover, the local parent Hamiltonian of a chiral fermionic Gaussian PEPS has to be gapless [55]. In the following we investigate whether this obstruction carries over to the tensor-network and neural-network states that we have introduced previously.

A. RBM can describe a Laughlin state exactly
Let us consider a lattice version of the Laughlin wave function at filling factor 1/2, defined for a spin-1/2 system as
$$\psi_{\text{Laughlin}}(s_1, \ldots, s_N) = \delta_s \prod_k \chi_{s_k} \prod_{i<j} (z_i - z_j)^{\frac{1}{2} s_i s_j}, \qquad (33)$$
where $\delta_s$ fixes the total spin to 0, the $z_i$ are the complex coordinates of the positions of the lattice sites, and the phase factors are defined as $\chi_{s_k} = e^{i\pi(k-1)(s_k+1)/2}$, ensuring that the state is a singlet. This wave function is equivalent to the Kalmeyer-Laughlin wave function in the thermodynamic limit and has been shown to describe a lattice state sharing the topological properties of the continuum Laughlin states on several lattices [77][78][79]. In addition, it can be written as a correlator of conformal fields, which has enabled the exact derivation of parent Hamiltonians for this state on any finite lattice [80].
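As a concrete illustration, the amplitudes of such a lattice Laughlin state can be tabulated on a very small lattice and the singlet property checked numerically. The sketch below assumes the Jastrow form $\delta_s \prod_k \chi_{s_k} \prod_{i<j} (z_i - z_j)^{s_i s_j/2}$ with $s_k = \pm 1$ and the principal branch of the complex power; the function names and the 2 × 2 lattice are choices made for this example only.

```python
import itertools
import numpy as np


def laughlin_amplitude(spins, z):
    """Amplitude of the assumed lattice Laughlin form for spins s_k = +-1."""
    spins = np.asarray(spins)
    if spins.sum() != 0:  # delta_s: total spin must vanish
        return 0.0j
    n = len(spins)
    k_minus_1 = np.arange(n)  # (k - 1) for sites k = 1..N
    amp = np.exp(1j * np.pi * np.sum(k_minus_1 * (spins + 1) / 2))
    for i in range(n):
        for j in range(i + 1, n):
            amp *= (z[i] - z[j]) ** (0.5 * spins[i] * spins[j])
    return amp


def laughlin_state(z):
    """Normalized state vector; basis index 0 of each site is spin up."""
    configs = itertools.product((1, -1), repeat=len(z))
    psi = np.array([laughlin_amplitude(c, z) for c in configs])
    return psi / np.linalg.norm(psi)


def total_spin_squared(n):
    """Dense operator (sum_j S_j)^2 for n spin-1/2 sites."""
    paulis = [
        0.5 * np.array([[0, 1], [1, 0]], dtype=complex),
        0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex),
        0.5 * np.array([[1, 0], [0, -1]], dtype=complex),
    ]
    s2 = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for op in paulis:
        total = np.zeros_like(s2)
        for site in range(n):
            term = np.eye(1, dtype=complex)
            for k in range(n):
                term = np.kron(term, op if k == site else np.eye(2))
            total += term
        s2 += total @ total
    return s2


# 2 x 2 square lattice, sites labelled row by row.
z_2x2 = np.array([x + 1j * y for y in range(2) for x in range(2)])
psi_2x2 = laughlin_state(z_2x2)
```

Applying `total_spin_squared(4)` to `psi_2x2` should give a vector of negligible norm if the state is a singlet, as stated in the text.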
The Laughlin wave function has the structure of a Jastrow wave function, and we have shown in Section III A that any Jastrow wave function can be written as a RBM with $M = N(N-1)/2$ hidden units. It follows that RBM and non-local SBS can represent a gapped chiral topological state exactly. This is in sharp contrast to local tensor-network states, for which no exact description of a (non-critical) chiral topological state is known. This difference is due to the non-local connections in the RBM and Jastrow wave function, which allow them to easily describe a Laughlin state. We note that a chiral p-wave superconductor is another example of a gapped chiral topological state, which has recently been written as a (fermionic) quasi-local Boltzmann Machine [20].
The previous construction is however not satisfactory, in the sense that the RBM requires a number of hidden units scaling as $O(N^2)$, which is too large for numerical purposes except on extremely small lattices. We thus turn to the approximate representation of the Laughlin wave function using a RBM.

B. Numerical approximation of a Laughlin state
The lattice Laughlin wave function we consider has an exact parent Hamiltonian $H_{\text{parent}}$ on a finite lattice [80], which is expressed in terms of the coefficients $w_{ij} = \frac{z_i + z_j}{z_i - z_j}$ and of the spin operators $\mathbf{S}_j = (S^x_j, S^y_j, S^z_j)$ at site $j$. We specialize to the square lattice with open boundary conditions and minimize the energy of different wave functions with respect to this Hamiltonian by applying the VMC method presented in Section II B with a Stochastic Reconfiguration optimization, which is equivalent to natural gradient descent [63,81,82] (details of the numerical optimization can be found in Appendix B). Results are presented in Table I.
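For orientation, a single Stochastic Reconfiguration step can be sketched as follows. This is a generic textbook form of the update, not the implementation described in Appendix B: the diagonal regularization `shift`, the learning rate and the function name are assumptions made for this example.

```python
import numpy as np


def sr_update(log_derivs, local_energies, learning_rate=0.05, shift=1e-3):
    """One Stochastic Reconfiguration step from Monte Carlo samples.

    log_derivs:     array (n_samples, n_params), O_k(s) = d ln psi(s) / d theta_k
    local_energies: array (n_samples,), E_loc(s)
    Returns the parameter change delta_theta.
    """
    o = np.asarray(log_derivs, dtype=complex)
    e = np.asarray(local_energies, dtype=complex)
    n_samples = o.shape[0]

    d_o = o - o.mean(axis=0)  # centred log-derivatives
    d_e = e - e.mean()        # centred local energies

    # Covariance matrix S_kl = <O_k^* O_l> - <O_k^*><O_l>
    s_matrix = d_o.conj().T @ d_o / n_samples
    # Forces F_k = <O_k^* E_loc> - <O_k^*><E_loc>
    forces = d_o.conj().T @ d_e / n_samples

    # Assumed regularization: a small diagonal shift keeps S invertible.
    s_reg = s_matrix + shift * np.eye(s_matrix.shape[0])
    return -learning_rate * np.linalg.solve(s_reg, forces)
```

Preconditioning the energy gradient with the covariance matrix of the log-derivatives is what makes the step a natural-gradient step; when the local energy is constant over the samples, as for an exact eigenstate, the forces and hence the update vanish.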
We find that EPS with plaquettes of size up to 3 × 3 have an energy difference with the Laughlin state of the order of $10^{-2}$, which is better than a short-range RBM (denoted sRBM) on 3 × 3 plaquettes with up to $M = 4$ hidden units per plaquette, while the energy of a fully connected RBM with $M = 2N$ hidden units is within $10^{-5}$ of the energy of the ground state. The resulting RBM uses far fewer hidden units than would be required for it to be exact, yet reaches an overlap of 99.99% with the Laughlin wave function. This result shows that the fully-connected structure of the RBM is an advantage for describing this state and that EPS can be used instead of short-range RBM. We have moreover found that EPS are easier to optimize numerically than a short-range RBM: they are more stable, since each coefficient is considered separately, no exponentials or products that lead to unstable behavior are present, and the derivatives have a very simple form (Eq. (8)).

C. Numerical approximation of a chiral spin liquid
The previous results indicate that RBM might be useful for approximating chiral topological states numerically, but the simulations are limited to relatively small sizes due to the non-local nature of the parent Hamiltonian, which includes interactions between all triplets of spins on the lattice. In Ref. 45 a local Hamiltonian stabilizing a state in the same class as the Laughlin state was obtained by restricting $H_{\text{parent}}$ to local terms and setting the long-range interactions to zero. This leads to the Hamiltonian
$$H_l = J \sum_{\langle i,j \rangle} \mathbf{S}_i \cdot \mathbf{S}_j + J_\chi \sum_{\langle i,j,k \rangle} \mathbf{S}_i \cdot (\mathbf{S}_j \times \mathbf{S}_k), \qquad (35)$$
where $\langle i,j \rangle$ indicates indices of nearest neighbours on the lattice and $\langle i,j,k \rangle$ indicates indices of all triangles of neighboring spins, with vertices labelled in the counter-clockwise direction. We focus on the case $J = 1$, $J_\chi = 1$, for which the ground state of $H_l$ has above 98% overlap with the Laughlin wave function (Eq. (33)) on a 4 × 4 lattice. We minimize the energy of different classes of states on a 4 × 4 and a 10 × 10 square lattice with open boundary conditions. For optimizing wave functions with tens of thousands of parameters we use a batch version of Stochastic Reconfiguration which optimizes a random subset of the parameters at each iteration (see Appendix B). We consider several Ansatz wave functions, including EPS with plaquettes of size 2 × 2, 3 × 2, 4 × 2 and 3 × 3, local SBS covering the lattice with horizontal, vertical and diagonal strings and increasing bond dimension, RBM with increasing number of hidden units, and non-local SBS with diagonal matrices (denoted dSBS) or with non-commuting matrices of bond dimension 2 and different numbers of strings covering the full lattice. We observe that while the optimization of EPS and SBS is particularly stable, the optimization of RBM can lead to numerical instabilities that are resolved by writing the RBM in the form presented in Eq. (14).
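To make the structure of such a local Hamiltonian concrete, the sketch below builds a dense matrix for a Heisenberg term plus a scalar-chirality term on a small open square lattice. It assumes the form $J \sum \mathbf{S}_i \cdot \mathbf{S}_j + J_\chi \sum \mathbf{S}_i \cdot (\mathbf{S}_j \times \mathbf{S}_k)$ and takes the triangles to be the four counter-clockwise triangles of each elementary plaquette; both the triangle enumeration and the function names are assumptions of this example.

```python
import numpy as np

SX = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
SY = 0.5 * np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)


def spin_operators(n):
    """List of (Sx, Sy, Sz) dense operators for each of n spin-1/2 sites."""
    ops = []
    for site in range(n):
        triple = []
        for op in (SX, SY, SZ):
            full = np.eye(1, dtype=complex)
            for k in range(n):
                full = np.kron(full, op if k == site else np.eye(2))
            triple.append(full)
        ops.append(triple)
    return ops


def heisenberg(si, sj):
    """S_i . S_j"""
    return sum(a @ b for a, b in zip(si, sj))


def chirality(si, sj, sk):
    """S_i . (S_j x S_k)"""
    cross = [
        sj[1] @ sk[2] - sj[2] @ sk[1],
        sj[2] @ sk[0] - sj[0] @ sk[2],
        sj[0] @ sk[1] - sj[1] @ sk[0],
    ]
    return sum(a @ c for a, c in zip(si, cross))


def local_hamiltonian(lx, ly, j=1.0, j_chi=1.0):
    """Dense H_l on an lx x ly square lattice with open boundaries."""
    n = lx * ly
    s = spin_operators(n)
    h = np.zeros((2 ** n, 2 ** n), dtype=complex)

    def idx(x, y):
        return x + lx * y

    for y in range(ly):
        for x in range(lx):
            if x + 1 < lx:
                h += j * heisenberg(s[idx(x, y)], s[idx(x + 1, y)])
            if y + 1 < ly:
                h += j * heisenberg(s[idx(x, y)], s[idx(x, y + 1)])
    for y in range(ly - 1):
        for x in range(lx - 1):
            # Plaquette corners in counter-clockwise order.
            c = [idx(x, y), idx(x + 1, y), idx(x + 1, y + 1), idx(x, y + 1)]
            for t in range(4):
                h += j_chi * chirality(s[c[t]], s[c[(t + 1) % 4]], s[c[(t + 2) % 4]])
    return h


def total_spin_squared(n):
    """(sum_j S_j)^2, used to check SU(2) invariance."""
    s = spin_operators(n)
    return sum(sum(s[i][a] for i in range(n)) @ sum(s[i][a] for i in range(n)) for a in range(3))
```

Dense matrices are only practical for a handful of sites; the 4 × 4 lattice of the text already has a $2^{16}$-dimensional Hilbert space and would call for sparse methods.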
Since we use the same optimization procedure for all Ansatz wave functions and since the required time (and memory) to perform the optimization is mainly a function of the number of parameters and of the accuracy, we can compare the Ansatz classes by comparing how many parameters are needed to obtain a similar energy.
We first focus (Fig. 5a) on a 4 × 4 lattice, for which the exact ground state can be obtained using exact diagonalization. Local SBS have an energy higher than the Laughlin state and the energy saturates with increasing bond dimension, which means that the pattern of horizontal, vertical and diagonal strings is not enough to capture all correlations in the ground state. While a large 4 × 4 plaquette would make EPS exact on this small lattice, this would require $2^{16}$ parameters. The energy of the Laughlin state is already reached for 3 × 3 plaquettes. RBM with a number of hidden units larger than $N$ and non-local SBS with a corresponding number of strings have lower energy than the Laughlin state or the Jastrow wave function. When the number of strings grows, the energy decreases even further. On a larger 10 × 10 lattice (Fig. 5b) the exact ground state energy is unknown, but we can compare the energy of the different Ansatz wave functions and observe similar results. Only the Jastrow wave function, non-local SBS and RBM have an energy comparable to the Laughlin state. Notice that non-local SBS have a constant factor more parameters than a RBM for the same number of strings. On the one hand, this allows SBS to achieve better energy than RBM with the same number of strings. On the other hand, this comes with the drawback that we can only optimize fewer strings, and on the large lattice we are numerically limited to non-local dSBS with up to $N$ strings. We can conclude that RBM are particularly efficient in this example, since they require significantly fewer parameters than SBS for attaining the same energy. This has to be contrasted with the previous examples of the Majumdar-Ghosh and AKLT models, where the opposite was true. Therefore each class of states has advantages and drawbacks depending on the model we are looking at.
We note in addition that a non-local SBS can be initialized with the results of a previous optimization with a RBM, which could provide a way of reducing the difficulties of optimizing a large number of parameters.
As noted previously, we can also use an initial approximation of the ground state in combination with the previous Ansatz classes. In the case of the Hamiltonian $H_l$, the analytical Laughlin wave function can be used as our initial approximation in Eq. (32). We denote by l-EPS (resp. l-SBS, l-RBM) a wave function that consists of a product of the Laughlin wave function and an EPS (resp. SBS, RBM) and minimize the energy of the resulting states. This allows us to obtain lower energies for each Ansatz class (Fig. 5c). Once

$S^{(2)}_A = -\ln \operatorname{Tr} \rho_A^2$ of each subregion using the Metropolis-Hastings Monte Carlo algorithm with two independent spin chains [83,84]. The topological entanglement entropy is then defined as [85,86]
$$S_{\text{topo}} = S^{(2)}_A + S^{(2)}_B + S^{(2)}_C - S^{(2)}_{AB} - S^{(2)}_{AC} - S^{(2)}_{BC} + S^{(2)}_{ABC}$$
and is expected to be equal to $-\ln \sqrt{2} \approx -0.347$ for the Laughlin state [87]. The results we obtain are presented in Table II and provide additional evidence that the ground state of $H_l$ has the same topological properties as the Laughlin state. The Hamiltonian $H_l$ was recently investigated on an infinite lattice using infinite-PEPS [88] and further evidence was provided that the ground state is chiral. The PEPS results suggest the presence of long-range algebraically decaying correlations that may be a feature of the model or a limitation of PEPS in the study of chiral systems. The correlations at short distances agree with the correlations that we can compute on our finite system (Fig. 7a), but our method does not allow us to make claims about the long-distance behavior of the correlation function. We also observe that a fully-connected RBM cannot be defined directly in the thermodynamic limit without a truncation of the distance of the interaction between visible and hidden units, thus transforming the RBM into a short-range RBM (albeit of larger range than an EPS). In Ref. [72] it was observed that the entanglement entropy of some specific short-range RBM can be computed analytically from the weights of the RBM. The method we use here works in the general case and also for a fully-connected RBM, but requires Monte Carlo sampling of the wave function. The optimized RBM weights encode all the information about the wave function; it would thus be interesting to understand more precisely which quantities can be extracted directly from them. Whether direct information about the phase of the system can be obtained in this way without requiring Monte Carlo sampling remains an interesting open problem for future work.
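The two-replica estimate of $\operatorname{Tr} \rho_A^2$ can be sketched on a small state vector. In this illustration the two copies are sampled directly from $|\psi|^2$ with NumPy instead of with a Metropolis-Hastings chain, and the state is stored as a dense vector; both are simplifications made for the example, and the function names are hypothetical.

```python
import numpy as np


def renyi2_exact(psi, dim_a):
    """S^(2)_A = -ln Tr rho_A^2 from the reduced density matrix."""
    m = (psi / np.linalg.norm(psi)).reshape(dim_a, -1)
    rho_a = m @ m.conj().T
    return -np.log(np.real(np.trace(rho_a @ rho_a)))


def renyi2_swap_estimate(psi, dim_a, n_samples, rng):
    """Estimate S^(2)_A by swapping the A-parts of two independent samples.

    Tr rho_A^2 = < psi(a2, b1) psi(a1, b2) / (psi(a1, b1) psi(a2, b2)) >,
    where (a1, b1) and (a2, b2) are drawn independently from |psi|^2.
    """
    m = (psi / np.linalg.norm(psi)).reshape(dim_a, -1)
    dim_b = m.shape[1]
    prob = np.abs(m.ravel()) ** 2
    prob = prob / prob.sum()

    # Direct sampling of two replicas (stand-in for two Markov chains).
    s1 = rng.choice(prob.size, size=n_samples, p=prob)
    s2 = rng.choice(prob.size, size=n_samples, p=prob)
    a1, b1 = np.divmod(s1, dim_b)
    a2, b2 = np.divmod(s2, dim_b)

    ratio = m[a2, b1] * m[a1, b2] / (m[a1, b1] * m[a2, b2])
    return -np.log(np.real(ratio.mean()))


# Example: a Bell pair has S^(2)_A = ln 2 for A = first spin.
bell = np.array([1.0, 0.0, 0.0, 1.0], dtype=complex) / np.sqrt(2.0)
```

The estimator only requires amplitude ratios, which is why it applies to any Ansatz that can be sampled, including a fully-connected RBM. A topological entanglement entropy would then be assembled from such estimates for the different subregions.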

V. CONCLUSION
We have shown that there is a strong connection between Neural-Network Quantum States in the form of Boltzmann Machines and some Tensor-Network states that can be optimized using the Variational Monte Carlo method: while short-range Restricted Boltzmann Machines are a subclass of Entangled Plaquette States, fully connected Restricted Boltzmann Machines are a subclass of String-Bond States. These String-Bond States are however different from traditional String-Bond States due to their non-local structure, which connects every spin on the lattice to every string. This enabled us to generalize Restricted Boltzmann Machines by introducing non-local (diagonal or non-commuting) String-Bond States, which can be defined for larger local Hilbert space and with additional geometric flexibility. We compared the power of these different classes of states and showed that while there are cases where String-Bond States require fewer parameters than fully-connected Restricted Boltzmann Machines to describe the ground state of a many-body Hamiltonian, there are also cases where the additional parameters in each string make String-Bond States less efficient to optimize numerically. We applied these methods to the challenging problem of describing states with chiral topological order, which is hard for traditional Tensor Networks. We showed that every Jastrow wave function, and thus a Laughlin wave function, can be written as an exact Restricted Boltzmann Machine. In addition we gave numerical evidence that a Restricted Boltzmann Machine with a much smaller number of hidden units can still give a good approximation to the Laughlin state. Finally we turned to the approximation of the ground state of a chiral spin liquid and showed that Restricted Boltzmann Machines achieve a lower energy than the Laughlin state and the same topological entanglement entropy.
We argued that combining different classes of states allows one to take advantage of the initial knowledge of the model and of the particularities of each class. This was demonstrated by combining a Jastrow wave function with Tensor Networks and Restricted Boltzmann Machines, which allowed us to obtain lower energies than the initial states and to characterize the ground state.
Our work sheds some light on the representative power of Restricted Boltzmann Machines and establishes a bridge between their optimization and the optimization of Tensor Network states. On the one hand, the methods developed in this work can be used to target the ground state of other Hamiltonians, and it would be interesting to know whether similar results can be achieved, for example, for non-Abelian chiral spin liquids [89,90] or generalized to fermionic systems of electrons in the continuum displaying the Fractional Quantum Hall effect. On the other hand, we also showed that some tools used in machine learning can be rephrased in Tensor Network language, thus providing additional physical insights about the systems they describe. Matrix Product States have already been used as a tool for supervised learning [91,92], and our work opens up the possibility of using not only Restricted Boltzmann Machines, but also String-Bond States to represent a probability distribution over some data while encoding additional information about its geometric structure.
Note added. After the completion of this manuscript, related independent work came to our attention. Y. Nomura et al. [93] combine RBM with pair product wave functions and apply them to the Heisenberg and Hubbard models. S. R. Clark [94] constructs a mapping between RBM and EPS/Correlator Product States. R. Kaubruegger et al. [95] give further analytical and numerical evidence supporting the application of RBM to chiral topological states such as the Laughlin state.
Let us show that a RBM with one hidden unit can represent any function $f$ of two spins. It then follows that a RBM with $M = N(N-1)/2$ hidden units, each representing a function of one pair of spins, can represent a Jastrow wave function. We parametrize $f$ by its four values $F_{s_1 s_2}$ on two spins $s_1, s_2 \in \{-1, 1\}$ and solve a system of four non-linear equations, one for each spin configuration, where we have set $B_1 = B_2 = 1$. The RBM is well defined when all parameters are non-zero, and we change variables by defining $X = W_1 W_2$, $Y = W_1/W_2$, $A = A_1 A_2$, $B = A_1/A_2$, obtaining a new set of equations in these variables. We first suppose that the values $F_{s_i s_j}$ are non-zero. These quadratic equations all have non-zero analytical solutions in the complex plane, which we denote $A_0$, $B_0$, $X_0$, $Y_0$. The original parameters are then the solutions of $W_1 W_2 = X_0$, $W_1/W_2 = Y_0$, $A_1 A_2 = A_0$, $A_1/A_2 = B_0$, which is again a set of quadratic equations with non-zero analytical solutions. If $F_{11} = F_{-1-1} = 0$ (resp. $F_{1-1} = F_{-11} = 0$), the exact solution is given directly by $A_0 = 1$, $X_0 = i$ (resp. $B_0 = 1$, $Y_0 = i$). In the remaining cases where some $F_{s_i s_j}$ are zero, the equations do not always have an exact solution, but the function can still be approximated to arbitrary precision. This case corresponds to strong restrictions on the part of the Hilbert space that is used to write the wave function, and these constraints can also be imposed on the states directly by adding a delta function to the wave function which is equal to 1 only when the constraints on the spins are satisfied. Having a Markov Chain Monte Carlo sampling which does not visit these states then allows for a more efficient sampling.
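The construction for non-zero values can be checked numerically. The sketch below assumes that a single hidden unit acting on two spins is parametrized as $f(s_1, s_2) = A_1^{s_1} A_2^{s_2} \left( W_1^{s_1} W_2^{s_2} + W_1^{-s_1} W_2^{-s_2} \right)$, with the hidden bias set to 1; this parametrization and the function names are assumptions made for the illustration.

```python
import cmath


def rbm_two_spin(s1, s2, a1, a2, w1, w2):
    """Assumed one-hidden-unit RBM on two spins s1, s2 = +-1 (hidden bias 1)."""
    w = w1 ** s1 * w2 ** s2
    return a1 ** s1 * a2 ** s2 * (w + 1 / w)


def solve_rbm(f):
    """Find (A1, A2, W1, W2) reproducing the four non-zero values
    f[(s1, s2)] of a function of two spins."""
    # In the variables A = A1 A2, B = A1 / A2, X = W1 W2, Y = W1 / W2:
    #   F_{11}   = A (X + 1/X),   F_{-1-1} = (X + 1/X) / A,
    #   F_{1-1}  = B (Y + 1/Y),   F_{-11}  = (Y + 1/Y) / B.
    a0 = cmath.sqrt(f[(1, 1)] / f[(-1, -1)])
    b0 = cmath.sqrt(f[(1, -1)] / f[(-1, 1)])
    cx = f[(1, 1)] / a0   # X + 1/X
    cy = f[(1, -1)] / b0  # Y + 1/Y
    # Roots of X^2 - cx X + 1 = 0 are never zero (their product is 1).
    x0 = (cx + cmath.sqrt(cx * cx - 4)) / 2
    y0 = (cy + cmath.sqrt(cy * cy - 4)) / 2
    # Back to the original parameters: W1^2 = X0 Y0, A1^2 = A0 B0.
    w1 = cmath.sqrt(x0 * y0)
    w2 = x0 / w1
    a1 = cmath.sqrt(a0 * b0)
    a2 = a0 / a1
    return a1, a2, w1, w2


# Example: an arbitrary complex function of two spins.
target = {(1, 1): 0.7 + 0.2j, (1, -1): -1.3j, (-1, 1): 2.0, (-1, -1): -0.4 + 1.1j}
params = solve_rbm(target)
```

Evaluating `rbm_two_spin(s1, s2, *params)` on the four configurations should reproduce `target` up to rounding errors. Each step only involves square roots of non-zero complex numbers, in line with the quadratic equations described above.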