Simplicity of mean-field theories in neural quantum states

The utilization of artificial neural networks for representing quantum many-body wave functions has garnered significant attention, with enormous recent progress for both ground states and non-equilibrium dynamics. However, quantifying state complexity within this neural quantum states framework remains elusive. In this study, we address this key open question from the complementary point of view: Which states are simple to represent with neural quantum states? Concretely, we show on a general level that ground states of mean-field theories with permutation symmetry require only a limited number of independent neural network parameters. We analytically establish that, in the thermodynamic limit, convergence to the ground state of the fully-connected transverse-field Ising model (TFIM), the mean-field Ising model, can be achieved with a single parameter. Expanding our analysis, we explore the behavior of this one-parameter ansatz under breaking of the permutation symmetry. For that purpose, we consider the TFIM with tunable long-range interactions, characterized by an interaction exponent $\alpha$. We show analytically that the one-parameter ansatz for the neural quantum state still accurately captures the ground state for the whole range $0\le \alpha \le 1$, implying a mean-field description of the model in this regime.


I. INTRODUCTION
Representing the wave function of complex quantum matter is exceedingly difficult. Addressing this challenge has prompted the proposal of various techniques and approximations [1][2][3][4][5][6]. However, each method encounters distinct difficulties. For instance, exact diagonalization becomes computationally intractable for large systems due to the exponentially growing basis. Quantum Monte Carlo methods can encounter the sign problem [7] for many interesting systems. Tensor networks experience an exponentially increasing bond dimension for systems with volume-law scaling of entanglement entropy [8,9]. Recently, neural quantum states (NQSs) have been introduced as an alternative method that leverages the expressive power of neural networks to represent the quantum wave function [10].
This approach has delivered remarkable results in discovering ground states and describing the dynamics of quantum many-body systems in regimes previously inaccessible to other methods [11][12][13][14][15]. One key feature that distinguishes NQS from tensor networks is the ability to represent volume-law entangled states [16].
Despite demonstrations of the expressive power of NQS [17,18], there is a critical need for a metric of its complexity, similar to how bond dimension and circuit depth are utilized for characterizing tensor networks and quantum circuits, respectively. The outstanding question is: What criteria can be employed to quantify the complexity of a neural quantum state? As a first step, we approach this question from the complementary side: Which states can be easily described by a neural quantum state? Much like how we know that area-law states can be efficiently described using a tensor network, recognizing the states that NQS can easily describe might be crucial in advancing our understanding of NQS complexity.
A natural choice is to relate the complexity to the number of independent variational parameters, K, that a trained NQS requires to describe a specific quantum state accurately. In this work, we characterize states that have minimal complexity in this sense within the neural quantum states formalism. In particular, we show that neural quantum states can describe mean-field theories with permutation symmetry with a very small number of parameters. We show on a general level that this leads to a reduction in required network parameters from K × L to just K, where K and L denote the number of hidden neurons and the number of physical spins, respectively; see Fig. 1 for an illustration.
We demonstrate this using the fully-connected transverse-field Ising model. Most importantly, we find that for this model convergence in the thermodynamic limit is achieved even with a single parameter, i.e., K = 1. This makes the mean-field solution of the TFIM as simple as possible for NQS. As a next step, we study the behavior of the network beyond mean-field theory by breaking the permutation symmetry. For that purpose, we consider the long-range interacting TFIM with power-law decaying interactions. This allows us to tune the deviation from mean-field behavior through the interaction exponent α in a controlled way. We show analytically that a neural network with a single parameter is sufficient to obtain the ground state for α ≤ 1 in the thermodynamic limit. As a consequence, we find that the ground state of the long-range interacting TFIM is still described by a mean-field theory (MFT) for this range of values of α.

II. NEURAL QUANTUM STATES AND PERMUTATION INVARIANCE
Neural quantum states offer a framework that leverages the expressive capabilities and generalization power of neural networks for representing quantum wave functions. This approach relies on expressing a wave function within a complete basis set denoted as $|s\rangle$, characterized by the expansion $|\Psi\rangle = \sum_s \Psi_W(s)\,|s\rangle$. Here, $\Psi_W(s)$ represents the neural network ansatz, which depends on a set of variational parameters W that we learn according to a prescription of choice. In this work, we consider a 1D lattice of size L with periodic boundary conditions and spin-1/2 degrees of freedom. As the basis $|s\rangle$, we take the spin configurations in the computational basis.
We are particularly interested in the structure of W for permutation-invariant Hamiltonians, as they appear naturally for mean-field theories. For this, we approximate the wave function by an artificial neural network. Concretely, we consider a general feed-forward neural network with L input units, K hidden neurons, no bias, and a single output neuron that returns the value of $\log(\Psi_W(s))$. A schematic depiction of the initial architecture can be found in Fig. 1A.
One important technique to minimize the number of parameters needed to describe the wave function in NQS is to take advantage of symmetries at the level of the weight matrix [10], reducing the number of independent parameters. As a consequence, since the network does not need to find the symmetry on its own, the computational cost of NQS can be greatly reduced without compromising the accuracy of the model.
The variational ansatz for our architecture can be expressed as $\log \Psi_W(s) = \sum_{i=1}^{K} f(y_i(s))$, where $y_i(s) \equiv \sum_j W_{ij} s_j$, f denotes the activation function, and $W_{ij}$ represents the weight matrix elements. However, to ensure that the output remains invariant under any permutation of the input configuration, we impose a constraint between the elements of $W_{ij}$. For this, we consider the set of all possible permutations of the form $\pi : s_j \to s_{\pi(j)}$, i.e., $\pi(s) \equiv (s_{\pi(1)}, \ldots, s_{\pi(L)})$, and we require that $\Psi_W(\pi(s)) = \Psi_W(s)$ for every $\pi$. Given that, due to the nature of permutations, $\pi$ is a bijective mapping and hence there is a unique $\pi^{-1}$, we can rewrite our constraint in the more convenient form $\sum_j \left( W_{ij} - W_{i\pi^{-1}(j)} \right) s_j = 0$. Since this is supposed to hold for any permutation and to be independent of the values of $s_j$, it follows that the quantity in parentheses has to vanish, i.e., $W_{ij} = W_{i\pi^{-1}(j)}$. Given that we include all possible permutations of our input configurations, this immediately implies that all elements along each row are constrained to be exactly identical, $W_{ij} = W_i$ for all j. Therefore, our weight matrix consists of K constant rows. Enforcing the permutation symmetry thus reduces the number of independent parameters from L × K in the general weight matrix to K in the constrained case. This is reminiscent of the implementations of translational invariance in [10,19,20], where the weight matrix is reduced to a set of circulant matrices [21] and each matrix can be treated as a convolutional kernel applied to all translated versions of the spin configuration s. It is worth mentioning that permutation symmetry imposes a much stronger constraint than translation invariance.
As a key consequence of the above considerations, we can map the symmetry-imposed weight matrix to a new feed-forward network that takes as input the total magnetization $M_s = \sum_i s_i$ and returns the value of $\log \Psi(s)$ (see Fig. 1B). Using this structure, the symmetry-imposed output wave function can be written as $\log \Psi_W(s) = \sum_{k=1}^{K} f(W_k M_s)$. It is important to note that this neural network wave function is not generally equivalent to a product-state ansatz. Therefore, it opens the possibility to capture finite-size effects or correlations that could not be accounted for otherwise. Product states can still be captured by our ansatz, as we will show in Section III, where in the thermodynamic limit this ansatz becomes a product state asymptotically. Therefore, when aiming to describe the ground state of a permutation-invariant Hamiltonian, Eq. (6) provides an ansatz that accurately approximates the exact state for sufficiently many parameters $W_k$. This is guaranteed by the fact that multi-layer feed-forward neural networks have been shown to be universal function approximators [22]. Thus, for any targeted error ϵ, there exists a K such that the approximation error is bounded by ϵ [Eq. (7)]; that is, we may bound our approximation error by an arbitrary amount ϵ by tuning the value of K. The natural question is: How many parameters are necessary, and how does this number depend on system size? We discuss this in detail in the remainder of the manuscript.
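The reduction above is easy to check numerically. The following sketch (plain NumPy; the random weights and the log cosh activation are illustrative choices, not values from the paper) builds a constrained weight matrix with $W_{ij} = W_i$ and verifies that the network output is invariant under permutations of the input and depends on s only through $M_s$:

```python
import numpy as np

rng = np.random.default_rng(0)
L, K = 10, 4
W_rows = rng.normal(size=K)                 # one independent weight per hidden neuron
W = np.repeat(W_rows[:, None], L, axis=1)   # constrained matrix: W_ij = W_i for all j

def log_psi(s, weights):
    # log Psi_W(s) = sum_i f(sum_j W_ij s_j), here with f = log cosh
    return float(np.sum(np.log(np.cosh(weights @ s))))

s = rng.choice([-1, 1], size=L)
perm = rng.permutation(L)

out_original = log_psi(s, W)
out_permuted = log_psi(s[perm], W)          # unchanged under any permutation
out_reduced = float(np.sum(np.log(np.cosh(W_rows * s.sum()))))  # depends on M_s only
```

The last line is exactly the reduced network of Fig. 1B: K parameters acting on the single scalar input $M_s$.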

III. LEARNING THE FULLY-CONNECTED TFIM GROUND STATE
We proceed by benchmarking the ansatz proposed in the previous section for a particular permutation-invariant Hamiltonian. This class of systems, which give rise to mean-field models with permutation symmetry, can be cast as long-range spin or boson models [23]. Concretely, we consider the fully-connected TFIM, whose Hamiltonian in terms of the Pauli matrices $S^\rho$ (ρ = x, y, z) takes the form $H = -\frac{J}{L} \sum_{i \neq j} S^z_i S^z_j - g \sum_i S^x_i$. This model exhibits a quantum phase transition at $g_c = 2J$ separating a ferromagnetic from a paramagnetic phase [24]. This critical point, as well as the ground and excited states, has been studied previously in [23,25,26]. In the remainder of this work, we fix the value of J = 1.
Although we expect that the choice of activation function is not crucial, we take as ansatz the one defined in Eq. (6) with $f(x) = \log(\cosh(x))$ as the activation function, since this choice maps our model to a restricted Boltzmann machine, a common reference point for NQS [10,11,16]. Therefore, the wave function ansatz we use in the following is given by $\Psi_W(s) = \exp\left[\sum_{k=1}^{K} \log\cosh(W_k M_s)\right] = \prod_{k=1}^{K} \cosh(W_k M_s)$. In order to numerically obtain the ground state of this model, we use stochastic reconfiguration [27,28] to minimize the variational energy E(W) (see Eq. (13)). It is of particular interest to study how the convergence of the model is affected by the choice of K. That is, for a given system size L, how many parameters do we need to converge to the exact ground state?
To evaluate the accuracy of this ansatz in finding the ground state of the fully-connected TFIM, we train the model and track the relative energy $\epsilon = |E(W) - E_{ED}|/|E_{ED}|$, where the expectation values are defined as $\langle \ldots \rangle \equiv \langle \Psi_W| \ldots |\Psi_W\rangle$, with appropriate normalization of the wave function, and $E_{ED}$ denotes the ground-state energy obtained by exact diagonalization. As a further measure of the accuracy of our obtained wave function, we also study the energy fluctuation density, defined as $\sigma^2(H) \equiv \left(\langle H^2 \rangle - \langle H \rangle^2\right)/L$. If our variational wave function were the targeted ground state, we would have $\sigma^2(H) = 0$ on fundamental grounds. In Fig. 1C we show the results of the ground-state search training for several choices of K at a fixed small system size of L = 12. The averages were computed by summing over the full Hilbert-space basis configurations, which we refer to as exact sampling. We find that as one increases the number of independent variational parameters K, the model is able to lower the error further. This coincides with our claim emerging from the universal function approximator theorem in Eq. (7), according to which we can reduce our error arbitrarily by taking more parameters K.

FIG. 2. Relative energy for one variational parameter K = 1 under training iterations for several system sizes L. We used exact sampling when averaging over spin configurations.
At this point, one further question arises: How do the parameter requirements scale with system size? To address this question, in Fig. 2 we show the relative energy for several L using only one parameter, K = 1. Here, we used exact diagonalization to compute the ground-state energy and compare it to our variational ansatz. It appears that as L grows, the K = 1 result improves, signaling that convergence in fact requires fewer parameters as we increase the system size. To further study this apparent behavior, in the following section we attempt to understand analytically how far we can get with a single-parameter ansatz.
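This finite-size benchmark can be reproduced with a brute-force sketch. The code below builds the fully-connected TFIM for a small chain, using the normalization $H = -(J/L)\sum_{i\neq j} S^z_i S^z_j - g\sum_i S^x_i$ (an assumption chosen to be consistent with the quoted critical point $g_c = 2J$), and compares the K = 1 ansatz $\Psi_W(s) = \cosh(W M_s)$, optimized here by a simple grid scan over W instead of stochastic reconfiguration, against exact diagonalization:

```python
import numpy as np

L_sites, J, g = 8, 1.0, 0.5            # deep in the ferromagnetic phase (g < 2J)
dim = 2 ** L_sites
bits = (np.arange(dim)[:, None] >> np.arange(L_sites)) & 1
M = (2 * bits - 1).sum(axis=1).astype(float)   # total magnetization M_s per basis state

# diagonal Ising part: sum_{i != j} s_i s_j = M_s^2 - L
H = np.diag(-(J / L_sites) * (M ** 2 - L_sites))
for i in range(L_sites):                        # -g S^x_i flips spin i
    H[np.arange(dim), np.arange(dim) ^ (1 << i)] -= g

E_ed = np.linalg.eigvalsh(H)[0]                 # exact ground-state energy

def var_energy(W):
    psi = np.cosh(W * M)                        # K = 1 ansatz, exact sampling
    return psi @ H @ psi / (psi @ psi)

E_var = min(var_energy(W) for W in np.linspace(0.01, 2.0, 500))
rel_err = (E_var - E_ed) / abs(E_ed)
```

By the variational principle, E_var can only lie above E_ed; the residual relative error at this small L reflects the finite-size corrections that shrink with growing system size, as in Fig. 2.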

A. Thermodynamic limit
In this section, we demonstrate that the ground state of the fully-connected TFIM can be exactly described with only one variational parameter in the thermodynamic limit, building upon the observed trend in finite-size systems in Fig. 2. We argue that this result signals the low complexity of permutation-symmetric models for neural quantum states. Taking the one-parameter case (K = 1), we use the ansatz $\Psi_W(s) = \cosh(W M_s)$. Because this ansatz has only one variational parameter, we can compute quantities such as the magnetization, the energy, and the energy fluctuations analytically. This allows us to show the asymptotic convergence to the ground state in the thermodynamic limit. To proceed with this computation, we restrict our system, without loss of generality, to the ferromagnetic phase and assume that L is sufficiently large. Under these assumptions, the typical configurations s are such that $|M_s| \sim L$, and hence we can approximate the log cosh function as $\log(\cosh(W M_s)) \approx W M_s$ for $W M_s \sim L \gg 1$. We emphasize that these analytics apply to any activation function exhibiting asymptotically linear behavior, which underscores the broader applicability of the results.
Details on all the following computations can be found in App. A. We first compute the energy as a function of W [10], obtaining $E(W) = -L\left[J \tanh^2(2W) + g/\cosh(2W)\right]$ to leading order in L. Minimizing Eq. (14) yields the value of W in the ground state, $W = \frac{1}{2}\cosh^{-1}(2J/g)$. We can go one step further and relate this single parameter directly to the mean-field order parameter. Computing the magnetization $M \equiv L^{-1}\sum_i \langle S^z_i \rangle$, one finds that the parameter W is related to the magnetization by $M = \tanh(2W) = \sqrt{1 - g^2/(2J)^2}$, where the second equality was obtained by substituting the ground-state value of W from Eq. (15). It is worth mentioning that this relation is identical to the self-consistency equation for the mean-field solution of the nearest-neighbor Ising model at zero temperature, so we can identify $W \sim M$. Hence, it serves as an example of how the network parameters can be directly related to physical quantities, and studying their structure may be beneficial for understanding NQS complexity. As a next step, we show that this state becomes an eigenstate of our Hamiltonian asymptotically in L. To do this, we make use of the energy fluctuations, for which we obtain an analytical expression in App. A. This result indicates that the scale of the energy fluctuations asymptotically vanishes as $1/\sqrt{L}$, implying that, in the thermodynamic limit, our ansatz becomes an exact eigenstate of the Hamiltonian. The one-parameter ansatz also provides plenty of information on the physics of the model. For instance, from Eq. (15) one may also extract the critical g separating the ferromagnetic and paramagnetic regions: the $\cosh^{-1}$ becomes undefined for g > 2J.
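These relations can be checked numerically. The sketch below uses the reconstructed thermodynamic-limit energy density $E(W)/L = -J\tanh^2(2W) - g/\cosh(2W)$ (our reading of Eq. (14); it reproduces both $g_c = 2J$ and the $\cosh^{-1}$ form of the optimal W) and verifies that its minimizer matches Eq. (15) and the magnetization relation of Eq. (16):

```python
import numpy as np

J, g = 1.0, 1.0   # ferromagnetic phase requires g < 2J

def energy_per_site(W):
    # reconstructed thermodynamic-limit variational energy E(W)/L
    return -J * np.tanh(2 * W) ** 2 - g / np.cosh(2 * W)

Ws = np.linspace(1e-4, 3.0, 300001)
W_numeric = Ws[np.argmin(energy_per_site(Ws))]
W_analytic = 0.5 * np.arccosh(2 * J / g)   # Eq. (15); undefined for g > 2J
magnetization = np.tanh(2 * W_analytic)    # Eq. (16): the mean-field order parameter
```

At g = J = 1 the minimum sits at $W \approx 0.659$ with $M = \sqrt{3}/2$, and the minimizer ceases to exist once g exceeds 2J, exactly the mean-field critical point.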

IV. BREAKING PERMUTATION SYMMETRY
We now study the effect of deviating from mean-field behavior by breaking the permutation symmetry. For this, we study the convergence to the ground state of the long-range interacting TFIM with periodic boundary conditions (PBC). The Hamiltonian for this model is given by $H = -\frac{J}{N(L,\alpha)} \sum_{i \neq j} \frac{S^z_i S^z_j}{|i-j|^{\alpha}} - g \sum_i S^x_i$, where N(L, α) is the so-called Kac normalization factor used to ensure that the energy is extensive. We take |i − j| to be the minimal distance between two lattice sites under PBC. When α = 0, the model is equivalent to the fully-connected TFIM studied before, while as α → ∞, the model reduces to the nearest-neighbor transverse-field Ising model. This model can be realized experimentally in trapped-ion simulators for 0 ≤ α ≲ 3.5 [29][30][31][32][33][34][35] and is known to go through several phase transitions as a function of α [36][37][38]. This setup is ideal for our purposes because we can adjust the deviation from mean-field behavior by tuning the value of α, observing the eventual breakdown of our ansatz.
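As an illustration of how α interpolates between the two limits, the snippet below constructs a Kac-normalized coupling matrix on a PBC ring. The specific convention $N(L,\alpha) = \sum_r d_r^{-\alpha}$, summed over the neighbors of a single site, is an assumption made for this sketch (the paper's exact normalization is not reproduced here); it is chosen so that α = 0 recovers uniform fully-connected couplings:

```python
import numpy as np

def couplings(L, alpha, J=1.0):
    # pairwise minimal distances on a ring with periodic boundary conditions
    i = np.arange(L)
    d = np.abs(i[:, None] - i[None, :])
    d = np.minimum(d, L - d).astype(float)
    r = d[0][d[0] > 0]                      # distances from one site to all others
    kac = np.sum(r ** -alpha)               # assumed Kac factor N(L, alpha)
    Jmat = np.zeros((L, L))
    mask = d > 0                            # no self-coupling
    Jmat[mask] = J / (kac * d[mask] ** alpha)
    return Jmat

J_flat = couplings(10, 0.0)      # alpha = 0: uniform couplings J / (L - 1)
J_steep = couplings(10, 10.0)    # large alpha: nearest neighbors dominate
```

For α = 0 every pair carries the same coupling (the fully-connected model), while for large α the nearest-neighbor coupling exceeds the next-nearest one by a factor $2^{\alpha}$, approaching the short-range TFIM.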
To verify the convergence to the ground state of this model as we vary the interaction range α, we compute the energy fluctuation density $\sigma^2(H)$ of the one-parameter ansatz. We deduce an analytical expression that allows us to establish its system-size dependence; details of this computation can be found in App. A. Given that we are interested in the thermodynamic-limit behavior, we rewrite the two remaining sums in terms of the generalized harmonic numbers $H_{n,r} \equiv \sum_{k=1}^{n} k^{-r}$. For r > 1, we make use of the fact that, in the limit n → ∞, $H_{n,r}$ converges to the Riemann zeta function, defined as $\zeta(r) \equiv \sum_{k=1}^{\infty} k^{-r}$. This allows us to obtain the energy fluctuations in the thermodynamic limit directly. By analyzing the convergence of the ratios of generalized harmonic numbers as a function of α in Eq. (20), we find that, in general, for α ≤ 1 the energy fluctuations vanish. This result implies that even though we have broken the permutation symmetry, our ansatz is still able to approach the exact ground state for large enough systems, which is remarkable.
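The behavior of the generalized harmonic numbers that controls these limits is easy to verify numerically: for r > 1 they converge to ζ(r), for r = 1 they diverge logarithmically, and for r < 1 they diverge as a power law, $H_{n,r} \sim n^{1-r}/(1-r)$:

```python
import numpy as np

def gen_harmonic(n, r):
    # H_{n,r} = sum_{k=1}^{n} k^{-r}
    k = np.arange(1, n + 1, dtype=float)
    return float(np.sum(k ** -r))

n = 10 ** 6
h_2 = gen_harmonic(n, 2.0)      # converges to zeta(2) = pi^2 / 6
h_1 = gen_harmonic(n, 1.0)      # diverges like ln(n) + Euler-Mascheroni gamma
h_half = gen_harmonic(n, 0.5)   # diverges like n^{1-r} / (1 - r) = 2 sqrt(n)
```

It is precisely the competition between these convergent and divergent scalings in the numerator and denominator of the fluctuation expression that makes $\sigma^2(H)$ vanish for α ≤ 1.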
For the case of α > 1, we no longer have diverging sums and can write the entire expression as a function of ratios of Riemann zeta functions. With this, we obtain the precise scaling with which the energy fluctuations decay as a function of α in the thermodynamic limit. In Fig. 3, we plot the expression for σ² for different system sizes. Remarkably, we observe that the fluctuations vanish asymptotically as $1/\sqrt{L}$ for α ≤ 1/2, precisely as in the α = 0 mean-field case. In contrast, for 1/2 < α ≤ 1 the fluctuations still vanish asymptotically, but with a slower decay. In this interval, the decays at the edges (α = 1/2, 1) follow the scalings log(L)/L and ζ(2)/log(L), respectively. For values of α > 1, we also plot the limiting curve, where we see that the energy fluctuations steadily increase until saturating at an upper bound of g²/4J. This value comes from evaluating the energy fluctuation density at the ground state with the W given by Eq. (15).
The one-parameter ansatz consequently captures the ground state exactly for a finite range of α values. Interestingly, this means that the one-parameter ansatz can exactly describe quantum states in the thermodynamic limit even in the absence of permutation symmetry. Furthermore, it indicates that the long-range interacting TFIM has a simple ground state within the range 0 < α ≤ 1.
This type of long-range Ising model is known to exhibit quantum phase transitions as a function of α, including a Kosterlitz-Thouless transition at α = 1, and for α ≥ 2 one recovers short-range behavior [29,[36][37][38]. However, the details of the ground-state properties, in particular in the regime of low α, have not been fully settled so far. For instance, in Ref. [37] it has been found that even for larger α < 5/3 the ground state might be described by a long-range mean-field theory. It is unclear, however, to what extent the epsilon expansion utilized in this regime of larger α remains accurate.
For a comprehensive overview of prior findings related to this model, we direct the reader's attention to the recent review in Ref. [39]. It is essential to emphasize that while our conclusion of mean-field-type behavior may appear intuitive, a formal proof establishing that this model's ground state is captured by a one-parameter ansatz has been notably absent. While previous computations, employing for instance linear spin-wave theory, have indeed yielded results consistent with our own observations, a crucial difference remains. Within linear spin-wave theory it has been shown that no inconsistencies are generated [39], which likely makes the approach reliable, but it still rests on unproven assumptions. With our approach, no such assumption is required, providing a conclusive and rigorous proof.

V. CONCLUSION
In this work, we have investigated a class of models that exhibit minimal complexity within the framework of NQS, in the sense of the number of parameters required to describe the ground state. By imposing permutation symmetry on our networks, we have found that NQSs are particularly well-suited to describe permutation-invariant mean-field states. To validate this claim, we have investigated the ground state of the fully-connected TFIM. In particular, we have shown that, even for finite systems, the ground state can be accurately captured using a small number of parameters, and that this approximation improves as the system size increases.
An important advantage of our approach is its ability to capture finite-size effects, entanglement, and quantum correlations, as it does not necessarily assume a product state. This is particularly relevant because, as discussed in detail for instance in Ref. [23], permutation symmetry does not always imply a product-state structure, especially for finite-size systems or for dynamics. For the latter case, it is also important to note that such permutation-invariant Hamiltonians are key for the creation of spin-squeezed states through one-axis twisting [40]. Let us note, however, that product states, such as those emerging in the thermodynamic limit of our model, can still be captured. Specifically, by opting for an activation function that behaves linearly at least asymptotically, we witness the emergence of a product-state structure for the ground state in the thermodynamic limit.
Motivated by these observations, we have proposed a one-parameter ansatz and have analytically shown that the ground state can be described by a single variational parameter in the thermodynamic limit. Furthermore, we have examined the impact of breaking permutation symmetry by studying the ground state of the long-range interacting TFIM while tuning the interaction range. Surprisingly, we have found that the one-parameter ansatz remains effective for values of α up to 1, suggesting that the model is still described by mean-field theory in this regime and that the ansatz is robust against weak symmetry breaking.
A potential direction for future research lies in exploring the possibility of learning potential simplifications, such as the one discussed in this work for permutation symmetry, directly from the network parameters within the NQS parametrization. Such an approach may shed light on the overall complexity of the network in more complicated scenarios. Additionally, an intriguing angle to consider is whether we can uncover underlying physics in the network parameters, such as the relation found here between the one-parameter model and the magnetization. In other words, can we demystify the black-box nature of the neural network and gain insights into the specific physical principles it exploits to represent the quantum state effectively?
While our work has shed some light on which quantum states are simple to capture with NQSs, it remains an open question to identify a general and physical characterization of states that are difficult for NQSs. Recent developments have highlighted that even ground states of frustrated quantum magnets appear to be manageable within NQS [12], although very deep networks seem to be necessary in order to be quantitatively accurate. This suggests that the intricate sign structure of such frustrated quantum magnets might be what is difficult to represent within NQSs; the same would apply to fermionic quantum matter. Ultimately, this question will have to be explored in further detail in order to arrive at a physical understanding of NQS complexity.

DATA AVAILABILITY
The data shown in the figures is available on Zenodo [41].
APPENDIX A

Energy

We are interested in computing the local energy, which is defined as $E_{loc}(s) = \frac{\langle s|H|\Psi\rangle}{\langle s|\Psi\rangle}$. For our model, the local energy takes the form $E_{loc}(s) = -\frac{J}{L}\sum_{i \neq j} s_i s_j - g \sum_i \frac{\langle s'(s_i)|\Psi\rangle}{\langle s|\Psi\rangle}$, where $s'(s_i)$ denotes the configuration with the spin on the i-th site flipped. Using the fact that $\log\langle s|\Psi\rangle = \log\cosh(W \sum_i s_i)$ for our ansatz, we can rewrite the ratio in the second term as $\cosh(W(M_s - 2s_i))/\cosh(W M_s)$. In the ferromagnetic phase, we can use the approximation $\log\cosh x \approx x$ for $x \sim L \gg 1$. Making this approximation, the ratio reduces to $e^{-2W s_i}$. To eliminate the exponential and render all dependence on $s_i$ polynomial, we make use of the identity $e^{\pm 2W s_i} = \cosh(2W) \pm \sinh(2W)\, s_i$, valid for $s_i = \pm 1$. This simplification may not be immediately obvious, but it is helpful when summing over all spin configurations. Using this identity, the local energy becomes $E_{loc}(s) = -\frac{J}{L}\sum_{i \neq j} s_i s_j - g \sum_i \left[\cosh(2W) - \sinh(2W)\, s_i\right]$. With this local energy, we can compute the total energy of the system as $E = \sum_s |\langle s|\Psi\rangle|^2 E_{loc}(s) / \sum_s |\langle s|\Psi\rangle|^2$. The normalization can be readily computed from our ansatz and the exponential identity, $\sum_s e^{2W M_s} = (2\cosh(2W))^L$. Substituting into the expression for the total energy, the importance of the polynomial dependence on $s_i$ becomes evident. Specifically, in the first term, two factors of the product interact with the external $s_i$ and $s_j$, respectively, namely $(\cosh(2W) s_i + \sinh(2W))$ and $(\cosh(2W) s_j + \sinh(2W))$. Thus, the first term contains two factors that give $\sinh(2W)$ and L − 2 factors that give $\cosh(2W)$, so that the overall expression simplifies to $E = -J(L-1)\tanh^2(2W) - \frac{gL}{\cosh(2W)}$. (A13)
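This derivation can be cross-checked by brute force for a small system. The sketch below evaluates $E_{loc}(s)$ for the K = 1 ansatz via single-spin-flip ratios (without the asymptotic log cosh linearization) and confirms that the exact-sampling average $\sum_s p(s) E_{loc}(s)$, with $p(s) \propto |\Psi(s)|^2$, equals the Rayleigh quotient; the Hamiltonian normalization $H = -(J/L)\sum_{i\neq j}S^z_iS^z_j - g\sum_i S^x_i$ is an assumption consistent with $g_c = 2J$:

```python
import numpy as np

L_sites, J, g, W = 8, 1.0, 0.5, 0.7
dim = 2 ** L_sites
spins = 2 * ((np.arange(dim)[:, None] >> np.arange(L_sites)) & 1) - 1
M = spins.sum(axis=1).astype(float)           # M_s for every basis configuration

psi = np.cosh(W * M)                          # K = 1 ansatz amplitudes
# single-flip ratios <s'(s_i)|Psi> / <s|Psi>: flipping s_i sends M_s -> M_s - 2 s_i
ratios = np.cosh(W * (M[:, None] - 2 * spins)) / psi[:, None]
E_loc = -(J / L_sites) * (M ** 2 - L_sites) - g * ratios.sum(axis=1)

p = psi ** 2 / np.sum(psi ** 2)               # exact-sampling probabilities
E_from_local = float(np.sum(p * E_loc))

# cross-check against the Rayleigh quotient with a dense Hamiltonian
H = np.diag(-(J / L_sites) * (M ** 2 - L_sites))
for i in range(L_sites):
    H[np.arange(dim), np.arange(dim) ^ (1 << i)] -= g
E_rayleigh = float(psi @ H @ psi / (psi @ psi))
```

Here the diagonal Ising term uses $\sum_{i\neq j} s_i s_j = M_s^2 - L$, the same simplification exploited in the analytical calculation.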

Energy fluctuations
The other quantity we need to compute analytically is the energy fluctuation density, $\sigma^2(H) \equiv \left(\langle H^2 \rangle - \langle H \rangle^2\right)/L$. For the fluctuations, all that remains to be computed is the first term, $\langle H^2 \rangle$, since the second term is precisely the energy computed above, $\langle H \rangle = E$. In particular, a useful simplification is to rewrite the $H^2$ term in terms of the local energy, $\langle H^2 \rangle = \sum_s |\langle s|\Psi\rangle|^2 |E_{loc}(s)|^2 / \sum_s |\langle s|\Psi\rangle|^2$. The form of $E_{loc}(s)^2$ follows by squaring the expression above, and we may rewrite all sums in the same way so as to apply the same configuration-summation method as in the energy calculation. Following this process, one obtains the expectation value of $H^2$.
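The identity $\langle H^2\rangle = \sum_s p(s)\,|E_{loc}(s)|^2$ with $p(s) \propto |\Psi(s)|^2$, which underlies this computation, can likewise be verified against dense matrices for a small system (using the same assumed Hamiltonian normalization as in the energy check above):

```python
import numpy as np

L_sites, J, g, W = 8, 1.0, 0.5, 0.7
dim = 2 ** L_sites
spins = 2 * ((np.arange(dim)[:, None] >> np.arange(L_sites)) & 1) - 1
M = spins.sum(axis=1).astype(float)

psi = np.cosh(W * M)                          # K = 1 ansatz amplitudes
ratios = np.cosh(W * (M[:, None] - 2 * spins)) / psi[:, None]
E_loc = -(J / L_sites) * (M ** 2 - L_sites) - g * ratios.sum(axis=1)

p = psi ** 2 / np.sum(psi ** 2)               # exact-sampling probabilities
E = float(np.sum(p * E_loc))
H2_local = float(np.sum(p * E_loc ** 2))      # <H^2> from |E_loc|^2
sigma2 = (H2_local - E ** 2) / L_sites        # energy fluctuation density

H = np.diag(-(J / L_sites) * (M ** 2 - L_sites))
for i in range(L_sites):
    H[np.arange(dim), np.arange(dim) ^ (1 << i)] -= g
H2_matrix = float(psi @ (H @ (H @ psi)) / (psi @ psi))
```

The identity holds because $\langle \Psi|H^2|\Psi\rangle = \sum_s |\langle s|H|\Psi\rangle|^2 = \sum_s |\Psi(s)|^2 |E_{loc}(s)|^2$.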

FIG. 3. Normalized energy fluctuations of the long-range interacting transverse-field Ising model as a function of the interaction range α for various system sizes. The fluctuations are normalized by the coupling J due to the inaccessibility of the gap in the thermodynamic limit for non-zero values of α. In red we show the limiting curve in the thermodynamic limit.