Discovering Quantum Phase Transitions with Fermionic Neural Networks

Deep neural networks have been extremely successful as highly accurate wave function ans\"atze for variational Monte Carlo calculations of molecular ground states. We present an extension of one such ansatz, FermiNet, to calculations of the ground states of periodic Hamiltonians, and study the homogeneous electron gas. FermiNet calculations of the ground-state energies of small electron gas systems are in excellent agreement with previous initiator full configuration interaction quantum Monte Carlo and diffusion Monte Carlo calculations. We investigate the spin-polarized homogeneous electron gas and demonstrate that the same neural network architecture is capable of accurately representing both the delocalized Fermi liquid state and the localized Wigner crystal state. The network is given no \emph{a priori} knowledge that a phase transition exists, but converges on the translationally invariant ground state at high density and spontaneously breaks the symmetry to produce the crystalline ground state at low density.

The correlated motion of electrons in condensed matter gives rise to rich emergent phenomena. Although these are governed by fundamental quantum mechanical principles known for almost a century, they remain difficult to understand and even harder to predict theoretically or computationally. One of the major themes of modern condensed matter physics is the study of phase transitions caused by electron correlation.
The difficulty of solving the Schrödinger equation scales exponentially with particle number in general, so exact solutions for interacting many-electron systems are rarely accessible. This explains why approximate numerical techniques have become such vital tools in the search for exotic zero-temperature phases, providing accurate predictions of experimentally observable quantities in phases already understood qualitatively. Most computational approaches, however, encode prior assumptions about the appropriate phase, which poses a substantial difficulty in predicting previously unknown electronic states. Changes in symmetry or topology are rarely discovered computationally before they have been seen experimentally or proposed on theoretical grounds.
In this Letter, we introduce a neural-network-based approach to predicting the qualitative nature of electronic ground states in condensed matter. We utilize a representation of the wave function, the fermionic neural network (FermiNet) [1], which is capable of representing any antisymmetric state [2], and requires no a priori knowledge of the system being studied. Guided by the quantum mechanical variational principle alone, without reference to experimental data, the FermiNet can learn the ground state of a many-body interacting Hamiltonian. Phase * g.cassella20@imperial.ac.uk transitions are seen by studying changes in the ground state as the parameters of the system are varied.
A significant body of recent work has used machine learning to detect phase transitions in simulated classical [3][4][5] and quantum [6][7][8] systems, but these studies required a source of external data, looking for patterns characteristic of different phases. Our approach requires only the Hamiltonian. There has also been work using neural network ansätze to study lattice models and spin systems, including their phase transitions [9][10][11][12][13], but for applications to many real systems, the wave function must be treated, as in the present work, in continuous space.
The flexibility of the FermiNet hinges on the universal approximation property of neural networks [14,15], which makes them a versatile tool for approximating high-dimensional functions and has led to radical advances in many computational fields [16][17][18][19]. This success has motivated the application of neural networks to solving problems across the physical sciences, including quantum mechanics [9,[20][21][22]. Several neural-networkbased wave functions in both first-quantized [1,[23][24][25][26] and second-quantized [27] representations have recently been used to compute the ground-state energies of molecules to a level of accuracy rivaling, or in some cases exceeding, sophisticated quantum chemistry methods such as coupled cluster with singles, doubles, and perturbative triples [28]. The FermiNet and ansätze derived from it are the most accurate of these so far, gaining an advantage over second-quantized neural and most quantum chemical approaches because they are basis-set free. Choosing an appropriate basis set for a given system requires some understanding of the qualitative nature of the ground-state wave function. Freedom from this requirement, coupled with the flexibility of the neural rep-resentation, enables the application of the FermiNet to generic phases of matter.
We extend the FermiNet, which has previously only been applied to atoms and molecules [1,23,29,30] [31], to periodic systems. Recent work has used neural network ansätze to study periodic systems in continuous space, but has either focused on bosonic systems [32] or used small basis sets [33], restricting their accuracy.
We demonstrate the flexibility of the periodic Fer-miNet by studying the quantum phase transition between the Fermi liquid and Wigner crystal [34] in the threedimensional interacting homogeneous electron gas (HEG) [35]. Two-dimensional Wigner crystals were very recently imaged for the first time [36][37][38], but three-dimensional Wigner crystals have not yet been observed in electronic systems and are thus less well understood. The zerotemperature properties of the three-dimensional HEG depend on a single dimensionless parameter, r s , defined as the ratio of the radius of a sphere that contains one electron on average to the Bohr radius. At high density (small r s ), the ground state is a weakly interacting Fermi liquid. At low density (large r s ), the correlations are stronger and the translational symmetry breaks spontaneously, giving rise to a spatially ordered Wigner crystal [34]. We find that the same neural network architecture learns the appropriate ground-state wave function either side of the Wigner phase transition, spontaneously breaking continuous translational symmetry when the crystal phase is stable. As we give the network no information about the nature of the ground state, the degree of inductive bias in the determination is very low.
The Hamiltonian for a finite HEG of N electrons subject to periodic boundary conditions is where the indices i label the N electrons in the simulation cell and U Coulomb is the Coulomb energy per simulation cell of an infinite periodic lattice of identical copies of that cell In practice, the Coulomb energy is evaluated using the Ewald method [39,40]. We work in Hartree atomic units, where energies are measured in Hartrees (1 Ha ≈ 27.211 eV) and distances in Bohr radii. The wave function represented by a FermiNet is a sum of determinants of many-electron (not one-electron) functions [1,23]: where x = (r, α) labels the spatial and spin coordinates of an electron, and the set {x /j } includes all-electron coordinates except x j . Multiplicative coefficients are not required as they can be absorbed into the determinants. The many-electron orbital ψ k i (x j ; {x /j }) depends on the coordinates x j of the j-th electron, and, in a permutation-invariant fashion, on the set of all other electron coordinates. The use of many-electron orbitals makes a FermiNet determinant much more flexible than a Slater determinant of one-electron orbitals, and it has been shown [2] that a single determinant of this form can represent any antisymmetric function. The proof of this theorem depends upon the construction of discontinuous functions that cannot be represented in practice by a finite network of a reasonable size. Nevertheless, a linear combination of a small number of FermiNet determinants has a much greater representational capacity than a linear combination of an equal number of Slater determinants [1].
It is convenient to work with spin-assigned wave functions [45], replacing Ψ(r 1 , α 1 ; . . . , r N , α N ) with a function of position alone: Ψ(r 1 , . . . , r N ) Ψ(r 1 , ↑; . . . r N ↑ , ↑ ; r N ↑ +1 , ↓; . . . , r N ↓ , ↓), where N = N ↑ + N ↓ is the number of electrons and N ↑ − N ↓ = 2S z is the spin polarization. The spin-assigned wave function is only antisymmetric on interchange of the position coordinates of electrons of the same spin, but assigning the spins has no effect on expectation values of spin-independent operators. Relabeling the electron positions according to the assigned spins, a FermiNet determinant becomes (in block-matrix form, determinant label k dropped for clarity), .
(3) The original FermiNet architecture assumed that the determinant above was block diagonal. We have found that removing this constraint provides a small but noticeable variational improvement. Dense determinants are hence used unless otherwise stated. A comparison between results obtained using dense and block-diagonal determinants can be found in the Supplementary Material.
The FermiNet uses a neural network to approximate the many-electron orbitals appearing in the determinants [1]. The network consists of two parallel streams, for processing one-electron and two-electron information. The one-electron stream is constructed of repeating blocks, where each block contains a nonlinear layer and a permutation-equivariant function. The twoelectron stream is a comparatively small fully-connected feed-forward network. The outputs of the one-and two-electron streams at each layer are fed into the permutation-equivariant functions. The multiple outputs of the one-electron stream are fed through a final linear layer to produce the required number of manyelectron functions, {φ kα i }. This may be generalized to complex-valued functions by doubling the output dimension of the final linear layer and taking pairs of the resulting outputs to represent the real and imaginary components of φ kα i . All results were obtained with real wave functions unless otherwise noted. Finally, the network I: Correlation energy of the spin unpolarized N = 14 HEG with simple cubic boundary conditions. The i-FCIQMC energies [41] were calculated using a basis of 778 plane-wave orbitals for r s = 5.0 or 2378 plane waves otherwise, corresponding to Hilbert spaces of 10 24 and 10 31 Slater determinants, respectively. The extrapolation of the i-FCIQMC results to the complete basis set limit may yield correlation energies that are 1-2 mHa too negative [42]. The Slater-Jastrow-backflow (SJB) VMC and DMC results were calculated using the casino program [43,44] with a single-determinant SJB trial wave function optimized using variance minimization and then energy minimization. The DMC results were extrapolated to zero time step. The FermiNet results were obtained as explained in the text. outputs are multiplied by a parameterized multiplicative envelope, f , to produce the many-electron orbitals ψ kα i (r) = f kα i (r)φ kα i (r). The electron position vectors r i and norms r i , and the electron-electron separation vectors (r i − r j ) and norms r i − r j , are supplied as inputs to the network, the output of which is the value of the many-electron wave function corresponding to those inputs. Full details of the network architecture are given in Ref. [1] and the Supplementary Material.
To adapt the FermiNet architecture to periodic systems, it is sufficient to modify the input features to ensure that periodic boundary conditions are satisfied. Periodic input features are most easily expressed in the basis {a 1 , a 2 , a 3 } of primitive Bravais lattice vectors of the simulation cell. An arbitrary vector r is written as s 1 a 1 + s 2 a 2 + s 3 a 3 and the periodic input features corresponding to r are obtained from the fractional coordinates s i via the component-wise transformation s i → (sin(2πs i ), cos(2πs i )). A periodic analogue of the Euclidean norm may be defined in terms of fractional coordinates as where S ij = a i · a j acts as a metric tensor in the fractional coordinate system. This definition of the norm is smooth, periodic with respect to the simulation cell, and proportional to the Euclidean norm as s → 0. Unlike the simpler norm introduced in [32], it retains these properties for non-cubic simulation cells. These changes are sufficient to satisfy the periodic boundary conditions, but we have found that convergence speed and asymptotic convergence are improved by including an envelope of the form for real wave functions, or for complex wave functions. The k m are simulation-cell reciprocal lattice vectors up to the Fermi wavevector of the noninteracting electron gas, and ν kα im , µ kα im are learnable parameters. Finally, when simulating the electron gas, the absence of nuclei (and hence electron-nuclear cusps) removes the need to include the norms of the electron positions as inputs.
The FermiNet wave function is optimized using the variational Monte Carlo (VMC) method [45]: the parameters of the network are adjusted to lower the expectation value of the energy, which is calculated using Metropolis-Hastings Monte Carlo integration over the 3N -dimensional space of electron positions. Gradients of the energy are obtained using standard back-propagation techniques, and the network parameters are updated using the Kronecker-factored approximate curvature algorithm [46], which approximates natural gradient descent [47] in a way that scales to large neural networks. Natural gradient descent is equivalent, up to a normalization constant, to the stochastic reconfiguration method [48] frequently used in VMC [1,49]. Unless specified, all calculations used the same hyperparameters as in Ref. [1] which are given in the Supplementary Material. Table I shows the results of FermiNet calculations of the total energy of a 14-electron simple cubic simulation cell of unpolarized HEG at four different densities. This system is sufficiently small that near-exact initiator full configuration interaction quantum Monte Carlo (i-FCIQMC) benchmarks are available [41]. The i-FCIQMC method [50] performs a stochastic diagonalization in a finite basis set, enabling the study of Hilbert spaces far larger than with exact diagonalization [50,51]. However, the fermion sign problem in FCIQMC increases rapidly with r s , rendering i-FCIQMC calculations at low densities with large basis sets impractical; the calculations at r s = 5 were ∼10 4 times more expensive than those at r s = 1 [41]. Table I also includes VMC results calculated using a conventional Slater-Jastrow-backflow (SJB) wave function, and fixed-node diffusion Monte Carlo (DMC) results based on the VMC-optimized SJB wave function. The parameters of the VMC SJB wave function were optimized using variance minimization and then energy minimization, as implemented in the casino code [43,44]. The DMC results were extrapolated to zero time step.
Although FermiNet is a VMC method, it achieves an accuracy similar to that of SJB-DMC, with both approaches obtaining 99% of the i-FCIQMC correlation energy extrapolated to the complete basis set limit (which may be 1-2 mHa too large [42]). FermiNet obtains a similar fraction of the correlation energy for molecular systems with a comparable number of electrons [1]. Again as in molecular systems, calculations using sixteen Fer-miNet determinants are noticeably better than calculations using one FermiNet determinant.
To assess the performance of FermiNet as the strength of the correlation increases, we study the N =27 electron spin-polarized HEG in the density range from r s = 1 to 90. Prior work [52,53] had found Wigner crystallization to occur in the interval r s = [100, 110], although a recent study [54] lowers this estimate substantially. The 27-electron system studied here is very small and there are substantial finite-system-size effects that broaden the phase transition and move it to a much higher density. Ground-state energies obtained using VMC with a Fer-miNet wave function and using VMC and DMC with SJB wave functions targeted at gas and crystal states are compared in Fig. 1(a) for the 27-electron system. FermiNet energies for r s ≤ 1 were obtained using complex wave functions, as discussed below. The precise form of the SJB ansätze used to describe gases and crystals are detailed in the Supplementary Material and Ref. [53]. Fer-miNet VMC calculations produce a tighter variational lower bound than both the SJB gas and crystal wave functions at all densities. Furthermore, FermiNet outperforms fixed-node DMC calculations based on a SJB gas wave function across the entire density range, even at r s ≤ 1. In the low-density regime, fixed-node DMC calculations using the SJB crystal wave function give slightly better results than our FermiNet VMC calculations. These results suggest that the nodal surface of the SJB crystal wave function is highly accurate but that the shape of the wave function away from the nodal surface is captured better by the FermiNet.
The Wigner crystal ground state of the HEG in the low-density limit is expected to be body-centered cubic (bcc) as this structure minimizes the packing density and has the lowest Madelung energy [34]. The emergent localization of the wave function due to Wigner crystallization can be seen by accumulating the expectation value of the one-electron density operator, where the expectation value is taken over samples of oneelectron coordinates. An order parameter for the brokensymmetry state is the Fourier component of ρ(r) corresponding to any primitive reciprocal lattice vector, b W i , of the emergent crystal: Scans of the one-electron density corresponding to the optimized FermiNet wave functions at r s = 10 and 70 are shown in Fig. 1(b). The figure shows the density in the (a 2 , a 3 ) plane, which is normal to the (011) direction of the conventional bcc cell. Fig. 1(c) shows the order parameterρ as calculated from VMC simulations using the FermiNet and SJB gas and crystal wave functions. These results show that FermiNet is capable of learning wave functions in both the gas and Wigner crystal states to very high accuracy without any hand-crafted features indicating whether the wave function should be localized or diffuse, any specific designation of crystal sites, or any other information that a transition should occur. We emphasize again that, unlike the gas and crystal SJB trial wave functions required to describe the gaseous and crystalline states accurately, the form of the FermiNet ansatz is identical across the entire density range.
For r s ≤ 1, real-valued FermiNets often become trapped in local minima during optimization, with energies typically ∼0.1% higher than the SJB-DMC benchmarks. However, complex-valued FermiNets do not become trapped in local minima at the same densities. The Hamiltonian here is real-valued, so the ground state can be taken to be a real function, and the converged complex FermiNets yield real-valued wave functions multiplied by a trivial (uniform in space) complex phase. This indicates that using a complex wave function improves optimization, rather than simply increasing representational capacity. Optimization also frequently becomes stranded in local minima for densities close to the phase transition in the crystalline regime (r s = 2, 3, and 5); however, this is reliably avoided by utilizing a higher learning rate, detailed in the Supplementary Material.
The center-of-mass coordinate of the electrons in the HEG separates and the wave function can be factored into a center-of-mass term, which is constant in the ground state, and a term that depends only on the vector separations of electrons. The one-electron density of the true ground state is thus uniform, not crystalline as we have found, and the crystalline order at low density appears in the pair-correlation function, not the oneelectron density. This is known as a "floating crystal" state [55,56]. As the size of the simulation cell tends to infinity, the cost of localizing the center of mass reduces to zero, the floating crystal becomes degenerate with the corresponding state of broken translation symmetry, and the phase transition is believed to become first order [54], with the order parameter jumping from zero to a finite value at the critical density.
In Refs. [53,54], it is shown (by considering a Slater determinant of Gaussian orbitals, with widths given by an empirical formula) that the energy difference between the fixed and floating crystal is approximately While the FermiNet differs from the Slater-type wave functions used to derive ∆E, we expect a similar reduction in kinetic energy. At low r s , ∆E is large (20mHa at r s = 2), so we would expect FermiNet to learn the floating crystal state. Fig. 1 (b, c) show that FermiNet instead learns the fixed crystal. The notion of a fixed origin can be removed by removing the one-electron features, however we find this increases the energy obtained. This suggests that the two-electron stream is insufficiently flexible to fully describe the two-electron correlations in the Wigner crystal without help from the oneelectron stream. Improving the flexibility of the twoelectron stream will be the focus of future work. We do not believe that these issues impact the central conclusion of the present work, and stress that in real condensed matter systems the Hamiltonian does not possess continuous translational symmetry.
To summarize, we have extended the FermiNet neural wave function to calculations with periodic boundary conditions. This we accomplished by making minimal, physically-motivated, modifications to render the input features periodic, and by adding a periodic envelope function. As proof of concept, we have demonstrated the accuracy of the modified architecture on the N =14 HEG, where we obtained ∼99% of the correlation energy and slightly outperformed VMC and DMC calculations using conventional one-determinant SJB trial wave functions. For the N =27 HEG, we see that the FermiNet is capable of learning the localized Wigner crystal phase a priori, producing energies in excellent agreement with SJB trial wave functions which encode the qualitative nature of the ground state in their construction. This suggests that the FermiNet may be capable of determining novel quantum phases in condensed matter given only the Hamiltonian.
To study quantum phase transitions in realistic strongly-correlated electronic systems, it will be necessary to scale to larger numbers of electrons to overcome finite-size effects. This may require additional innovations in the neural network architecture employed. The FermiNet could also be used as a trial wave function for DMC calculations in periodic boundary conditions, an approach that yields small improvements in molecular systems [29]. More generally, we believe that the flexibility and accuracy offered by neural networks make them promising tools for studying complex correlation effects and other emergent phenomena. The advantages of neural-network-based methods are most compelling when the phenomena in question are unexpected or not yet understood.
This work was undertaken with funding from the UK Engineering and Physical Sciences Research Council (EP/T51780X/1) (GC) and the Aker scholarship (HS). Calculations were carried out with resources provided by the Baskerville Accelerated Compute Facility through a UK Research and Innovation Access to HPC grant. Via his membership of the UK's HEC Materials Chemistry Consortium, which is funded by EPSRC (EP/R029431), Foulkes used the UK Materials and Molecular Modelling Hub for computational resources, MMM Hub, which is partially funded by EPSRC (EP/T022213). Our SJB-DMC Wigner crystal calculations were performed using the Lancaster University's High-End Computing cluster, and the ARCHER2 UK National Supercomputing Service (http://www.archer2.ac.uk) via our membership of the UK's HEC Materials Chemistry Consortium, which is funded by EPSRC (EP/R029431).

SUPPLEMENTARY FERMIONIC NEURAL NETWORKS
The FermiNet architecture maps a set of input features derived from the electron coordinates, {r α j }, where α labels the spin of the electron, to the set of functions ψ kα i (r j ; {r /j }). In the original FermiNet, this set of inputs to each layer of the network is with capitalized subscripts referring to atomic coordinates, and . the Euclidean norm. These input features are updated by consecutive transformations, The first transformation (one subscripted index) is referred to as the one-electron stream, and the second (two subscripted indices) the two-electron stream. The outputs from the L th transformation are subject to a final, spin-dependent, linear transformation and multiplied by (in open boundary conditions) an exponentially decaying envelope [23] which enforces the decay of the wave function as r i → ∞, where and The functions ψ kα i are used as the inputs to the determinants, (Eq. (2)), in the main text. Note that the pooling operations in (Eq. (10)) are chosen such that this feature vector is only permutation-invariant with respect to the exchange of electrons of the same spin. Thus, the desired fermionic exchange statistics are enforced even with a dense determinant. To construct complex wave functions, the number of functions φ kα i output from the network is doubled, and the inputs to the determinant become The set of parameters, are all learnable. Pretraining these parameters to minimize the deviation between FermiNet orbitals and Hartree-Fock orbitals is possible, but we find that it is often unnecessary to achieve a well converged result and it is not performed in any calculations in the current study. The linear transformations specified by V l and W l are known as hidden layers. For a more extensive description of the FermiNet architecture, see Pfau et al. [1].
FermiNets are trained via the variational Monte Carlo (VMC) method, a detailed description of which is provided by Foulkes et al. [45]. The parameters θ are optimized via gradient descent to minimize H . This guides the wave function Ψ θ toward the ground state as a result of the variational principle. Working in log-space, the gradient of H with respect to the parameters θ is where E L (r) = Ψ −1 (r)HΨ(r) and the expectation value is evaluated for samples of r taken from the probability amplitude |Ψ(r)| 2 . The kinetic component of the local energy is calculated in log-space via, We employ the Kronecker-factored approximate curvature algorithm [46] which uses an approximation to the Fisher information matrix to carry out natural gradient descent [47]. For complex wave functions, the inverse Fisher matrix is not strictly the appropriate choice of metric for natural gradient descent, and one should instead use the Fubini-Study metric [58]. In the present work, we found that natural gradient using the usual inverse Fisher was capable of finding the ground state, possibly because the Hamiltonian is real-valued. A full treatment of the optimization of complex wave functions is beyond the scope of the present work.

PERIODIC BOUNDARY CONDITIONS
The ground state wave function of an interacting system possesses a macroscopically large number of degrees of freedom n, due to the many-body interactions between all of the charges in the system. Solving for the manybody wave function Ψ(r 1 , . . . , r n ) in R 3n is intractable for n approaching the-effectively infinite on a computational scale-number of electrons found in a real solid.
To approximate real solids by simulating a small number of electrons we employ periodic boundary conditions: a finite-sized simulation cell is embedded in a periodic array of images of all charges in the simulation cell. The resulting Hamiltonian possesses discrete translational symmetry: displacing any charge by a simulation cell lattice vector leaves the system invariant. The many-body eigenfunctions then have the property [59] Ψ(r 1 , . . . , r i , . . . , r n ) = Ψ(r 1 , . . . , r i + R S , . . . , r n ), (22) where R S is a simulation cell lattice vector. As a result, the problem of finding eigenfunctions on R 3n for extremely large n has been reduced to a problem of finding eigenfunctions on the torus T 3n where n is a small number. The errors arising due to this approximation are known as finite-size effects. A full treatment of finite-size effects are beyond the scope of the present work, and do not alter the conclusions of the comparisons presented as all systems being compared are utilizing the same finitesize Hamiltonian.

FERMINET WITH PERIODIC BOUNDARY CONDITIONS
To impose the constraint (Eq. (22)) on the FermiNet with it is sufficient to choose an alternative set of input features to the first layer of the FermiNet which are invariant under the translation of any one electron coordinate by a simulation cell lattice vector. In the following sections we describe modifications to the coordinate and distance features which fulfill this requirement.

Fractional coordinates
Any vector in real space can be expressed as a linear combination of primitive simulation cell lattice vectors (a 1 , a 2 , a 3 defining s = (s 1 , s 2 , s 3 ) ∈ R 3 , which is an equally valid representation of a position on the lattice that we will refer to as fractional coordinates. There is a one-to-one mapping between positions in real space and positions in fractional coordinates, where, is a matrix whose columns consist of the primitive simulation cell lattice vectors. The simulation cell is a parallelepiped in real space but a unit cube in fractional coordinates. As a result, it is simple to construct maps that are periodic under translations by simulation cell lattice vectors using trigonometric functions. All vectors v in the original set of input features [Eqs. (11) and (12)] are replaced via the component-wise mapping v n → (sin(2πs n ), cos(2πs n )) .
An additional subtlety is introduced by the fact that any continuous, unique labeling of points on a unit circle requires two numbers. This necessitates the use of both sine and cosine input features for each spatial dimension by recognizing that the torus T 3n decomposes into a product of unit circles, S n × S n × S n . Displacing v by a simulation cell lattice vector leaves the value of the right-hand side of (Eq. (26)) invariant as desired.

Periodic norm
The periodic analogue of ||v|| must retain a cusp as v → 0, resembling the Euclidean norm. The inclusion of the Euclidean norm features is known to have a substantial impact on the accuracy of the FermiNet [1]: the network is incapable of introducing discontinuities into the wave function, and thus cannot satisfy the Kato cusp conditions [60] on the derivatives of the wave function as electrons approach nuclei and each other without cusps being included explicitly in the input features. Similarly, the periodic norm must be continuous everywhere except at the cusps, as the network will be unable to remove these discontinuities from the wave function, resulting in an unphysical contribution to the kinetic energy. In summary, we require a function of s which behaves like |v| = v 2 x + v 2 y + v 2 z as v → R, is periodic on the domain [0, 1] 3 , and whose derivative vanishes at the simulation cell boundaries to ensure continuity.
We proceed by considering the definition of the Euclidean norm as the Euclidean inner product of a vector with itself, where We conjecture by analogy that the norm in terms of the periodic co-ordinates (Eq. (26)) should be This definition of the norm possesses all of the properties that we desired: as s i → 0 this expression reduces to the Euclidean norm [Eq. (27)] by considering the firstorder Taylor expansions of sine and cosine; the periodicity is obvious as the expression is invariant to translations s i → s i ± 1; and, as a result of reducing to the Euclidean norm at the origin, this function retains the desired cusps, while also being differentiable in the rest of the unit cell. All distances, ||.||, in the original set of input features [Eqs. (11) and (12)] are replaced by the periodic norm.

Periodic multiplicative envelope
For the periodic envelope function introduced in Eqs. (5) and (6), ν k im and µ k im are strictly positive learnable parameters, optimized during training. The envelope eases the representation of highly oscillatory functions, while still being trivially capable of representing any function representable by the network with no envelope because k 0 = 0. All ν k im and µ k im are initialized to small random values except ν k i0 = 1. Other initialization schemes may be more appropriate, but have not been studied here. Fig. 2 demonstrates the improved training performance due to the envelope. Four A100 GPUs were used for all calculations presented. Calculations were carried out using single precision floating point numbers, as we found statistically identical results were achieved using double precision at the cost of approximately doubling runtime. The modifications to FermiNet were implemented using the JAX Python library [61], extending a development version of the FermiNet [62]. Optimization used a JAX implementation of the Kronecker-factored approximate curvature (KFAC) gradient descent algorithm [23,46,63]. In the N =14 electron gas, a FermiNet with 4 layers of 256 units in the one-electron stream and 32 units in the two-electron stream was used for the 1 and 16 determinant calculations. In all calculations for the N =27 HEG, we used a FermiNet of four layers with 512/64 units in the one/two-electron streams respectively with 16 determinants. For both systems, the wave function was optimized over 3e5 training iterations and 5e4 additional samples of H with the wave function parameters frozen were taken to obtain the final energies. The standard error associated with these energies was evaluated using a reblocking method [64] to account for sequential correlations introduced by the Monte Carlo sampling strategy.
For the majority of calculations, we employ a set of network and training hyperparameters identical to those previously used to obtain results on molecular systems [1]. For the N =27 HEG at r s = 2, 3, and 5, we find that an initially more aggressive learning rate (base value 1e-2 versus the default 1e-4) is required to reliably avoid local minima. This is accompanied by an increased KFAC norm constraint and damping of 1.0 and 1e-1, respectively.

Slater-Jastrow-backflow calculations
The Slater-Jastrow-backflow (SJB) wave function ansatz used to produce the benchmark VMC and DMC calculations in the main text takes the form Ψ(r) = exp(J(r))S(x(r)). (30) The Slater determinant S is composed of one-electron functions and enforces the fermionic antisymmetry of the wave function, just as in the FermiNet. This determinant is evaluated at coordinates which are modified by a backflow transformation, and multiplied by a Jastrow factor exp(J) which is a permutation-invariant function of the electronic coordinates. Here we will only provide a brief overview of the terms incorporated into these factors. A much more detailed account is provided in Ref. [54]. All SJB VMC and DMC calculations were performed using the casino program [43,44] For the Fermi fluid, the one-electron orbitals in the Slater determinant are the Hartree-Fock orbitals for the homogeneous electron gas, where the {k} are the N/2 (spin unpolarized) or N (spin polarized) smallest simulation cell reciprocal lattice vectors. In the crystal, periodic one-electron orbitals are evaluated as sums over periodic images of site-centered Gaussian functions, where R P is a primitive-cell lattice point within the simulation cell, R S is a simulation cell lattice point, and C is an optimizable parameter controlling the width of the Gaussian. This sum is truncated when the contributions of the images of the Gaussian basis functions become smaller than 10 −7 at the edge of the simulation cell in which the orbital is being evaluated. There are N primitive-cell lattice points within the simulation cell. The Jastrow exponent consists of a sum of three terms, where u is a power series in the electronic separations which includes fixed terms to impose the Kato cusp conditions [60]. This term is smoothly cut off at a radius less than or equal to the radius of the largest sphere that can be inscribed in the Wigner-Seitz cell of the simulation cell. The p term is where A consists of shells of simulation cell reciprocal lattice vectors, and A + excludes one from each pair of vectors which are related by inversion symmetry. Similarly, where B consists of shells of Wigner crystal primitive cell reciprocal lattice vectors. The q term is omitted from the Fermi fluid wave function, as it does not retain continuous translational invariance with respect to the electronic center of mass. The backflow transformation consists of two terms, where η is mathematically identical to the Jastrow u term, and π has the form of the gradient of the Jastrow p term: The coefficients a A , b B , and c A are all optimizable parameters.
All of these terms are evaluated using a minimum image convention.

BLOCK DETERMINANTS
In the main text we introduce the concept of dense determinants in the FermiNet architecture. Here we provide a comparison with energies for the unpolarized N =14 HEG, obtained using block diagonal determinants. These values are presented in Table III, alongside the results obtained using dense determinants, copied from the main text for comparison. In all cases dense determinants offer a small variational improvement in the correlation energy obtained.   IV: Total energy per electron of the N =27 HEG in a body-centered cubic simulation cell at a range of densities, obtained via variational Monte Carlo. SJB values were obtained from our calculations using the casino package. Here, "gas" refers to a Slater determinant of plane-wave orbitals, and "crystal" refers to a Slater determinant of Gaussian orbitals. FermiNet results for r s ≤ 1 used complex wave functions.