Active Learning a One-dimensional Density Functional Theory

Density functional theory underlies the most successful and widely used numerical methods for electronic structure prediction of solids. However, it has the fundamental shortcoming that the universal density functional is unknown. In addition, the computational result---energy and charge density distribution of the ground state---is useful for electronic properties of solids mostly when reduced to a band structure interpretation based on the Kohn-Sham approach. Here, we demonstrate how machine learning algorithms can help to free density functional theory from these limitations. We study a theory of spinless fermions on a one-dimensional lattice. The density functional is implicitly represented by a neural network, which predicts, besides the ground-state energy and density distribution, density-density correlation functions. At no point do we require a band structure interpretation. The training data, obtained via exact diagonalization, feeds into an active learning scheme which minimizes the computational costs for data generation. We show that the network results are of high quantitative accuracy and, despite learning on random potentials, capture both symmetry-breaking and topological phase transitions correctly.

Introduction.- Materials with strong electronic correlations host a variety of intriguing phenomena and quantum phases. Modeling and understanding these systems are among the greatest challenges in theoretical condensed matter physics. For quantitative and predictive results, numerical calculations are indispensable. The most widely and successfully used numerical approach to the electronic structure problem is based on density functional theory (DFT). In condensed matter physics, DFT is often linked to band structure calculations, while it is in principle much more powerful than that. The Hohenberg-Kohn theorems guarantee that a (potentially correlated) many-body ground state is uniquely determined by its energy and charge density distribution [1]. However, for practical implementations and a physical interpretation of calculated results, the Kohn-Sham ansatz is commonly used, producing the band structure of a different, non-interacting system with the same energy and density [2]. The implicit assumption is that this band structure captures the essential physics of the original system, at least if correlations are weak enough.
A critical shortcoming of DFT is that its eponymous functional is not known; instead, approximations at various levels of complexity are commonly employed [3]. It is important to emphasize that most of the functional is universal, representing the many-particle Schrödinger equation. The only non-universal inputs in a DFT calculation for a crystal are the potential landscape within the unit cell, induced by the ions and core electrons, and the particle number, neither of which affects the universal part of the functional.
In this work, we take an approach that follows three guiding principles: (i) Implicit knowledge representation is the key strength of neural networks. Therefore, we use a neural network to implicitly represent the (minimized) DFT functional. (ii) We aim at solving for phases of quantum matter beyond the band structure paradigm. To that end, we train the neural network to directly output correlation functions [32], which can be used to characterize phases and phase transitions. (iii) Training the neural network is the key challenge, as data, whether theoretical or experimental, is precious. We set up an algorithm in which the neural network can be trained with data from different system sizes. This is the basis for an active learning scheme, via which the neural network requests costly training data for larger systems only in situations where it detects large finite-size effects.
Figures 1 (a) and (b) summarize our model, neural network, and workflow. We choose to work with a one-dimensional lattice model of spinless fermions. The hopping and interaction terms of the model define our 'universal Schrödinger equation' and are therefore left unaltered throughout the study. Input to the neural network is the problem-specific potential and particle number. Its output is the ground state energy $E_{GS}$ as well as the density-density correlation function. We start by introducing the active learning scheme and demonstrate the quantitative accuracy of network predictions after training on random potentials. The active training shows superior performance compared to conventional training: we obtain a mean squared error on the energy of $3.08 \times 10^{-4}$ in units of the hopping integral. Finally, we apply the trained network to a topological and a symmetry-breaking phase transition. Our results demonstrate a scalable architecture, able to capture interacting lattice models, with successful applications to structured phases.
Model.- While density functional theory was originally formulated as a continuum theory, it has also been successfully applied to lattice models [33]. We consider a Hamiltonian for spinless fermions on a one-dimensional lattice with sites labelled by $i = 1, \dots, L$ under periodic boundary conditions,
$$\hat H = -t \sum_i \left( \hat c_i^\dagger \hat c_{i+1} + \mathrm{h.c.} \right) + U \sum_i \hat n_i \hat n_{i+1} + U' \sum_i \hat n_i \hat n_{i+2} + \sum_i V_i \hat n_i, \qquad (1)$$
where $\hat c_i^\dagger$ and $\hat c_i$ are the fermion creation and annihilation operators on site $i$ and $\hat n_i = \hat c_i^\dagger \hat c_i$ is the corresponding density operator. Nearest-neighbor hopping is parametrized by $t$, which will serve as the energy unit throughout. The particles are subject to a repulsive interaction on nearest- and next-nearest-neighbor sites, which we fix to $U = 1$ and $U' = 0.5$ so as to model a lattice analogue of the Coulomb interaction. This parameter choice places the system in a metallic, but strongly correlated phase in the absence of a potential $V_i$, even at half filling [34].
Motivated by the Hohenberg-Kohn theorems, we consider the kinetic term and the electron-electron interactions as universal, such that the external or ionic potential $\hat V_{ext} = \sum_i V_i \hat n_i$ together with the filling uniquely determines the ground state and all of its properties. We only consider potentials with a periodicity of four sites. That is, the four values $V_i$, $i = 1, \dots, 4$, completely specify the Hamiltonian for any lattice size $L = 4 N_{uc}$, with $N_{uc}$ the number of unit cells and $V_i = V_{i+4}$ for all $i$. This four-site unit cell can be thought of as the discretized unit cell of a periodic crystal, with $V_i$ the ionic potential in this analogy. We restrict it to the range $|V_i| \le 4$. We further denote by the real number $0 < n_e < 4$ the particle filling per unit cell. We emphasize that despite imposing this periodicity on the potential, our approach is able to capture phases which spontaneously break the four-site translation symmetry.
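For concreteness, the following sketch illustrates how training data for this model could be generated by exact diagonalization in the occupation-number basis. It is a minimal, unoptimized implementation assuming SciPy's sparse eigensolver, not the production code used in this work.

```python
import numpy as np
from itertools import combinations
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import eigsh

def ed_ground_state(L, N, V, t=1.0, U1=1.0, U2=0.5):
    """Ground state of the spinless-fermion chain of Eq. (1) with periodic
    boundary conditions: L sites, N particles, four-site potential V.
    Returns the energy E_GS and the correlator <n_i n_j>."""
    states = list(combinations(range(L), N))      # occupied-site tuples
    index = {s: k for k, s in enumerate(states)}
    H = lil_matrix((len(states), len(states)))
    for k, occ in enumerate(states):
        occ_set = set(occ)
        # diagonal part: potential plus nearest/next-nearest repulsion
        H[k, k] = (sum(V[i % 4] for i in occ)
                   + U1 * sum((i + 1) % L in occ_set for i in occ)
                   + U2 * sum((i + 2) % L in occ_set for i in occ))
        # hopping with fermionic sign (-1)^(occupied sites hopped over)
        for i in occ:
            for j in ((i + 1) % L, (i - 1) % L):
                if j in occ_set:
                    continue
                lo, hi = min(i, j), max(i, j)
                sign = (-1) ** sum(lo < s < hi for s in occ_set)
                new = tuple(sorted((occ_set - {i}) | {j}))
                H[index[new], k] += -t * sign
    E, psi = eigsh(H.tocsr(), k=1, which='SA')
    # <n_i n_j> accumulated from the ground-state amplitudes
    corr = np.zeros((L, L))
    for k, occ in enumerate(states):
        ij = np.array(occ)
        corr[np.ix_(ij, ij)] += psi[k, 0] ** 2
    return E[0], corr
```

A call such as `ed_ground_state(16, 4, V)` would correspond to a quarter-filled four-unit-cell system; the basis size grows combinatorially with $L$, which is precisely why the active learning scheme introduced below rations large-system solves.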
Learning.- The supervised-machine-learning algorithm we use bypasses the Kohn-Sham scheme by directly learning the map from the external parameters $n_e$ and $V_i$ to the corresponding ground-state energy and density-density correlators $\langle \hat n_i \hat n_j \rangle_{GS}$. The density-density correlators are calculated for two adjacent unit cells [35]. The chosen fully connected neural network [36] consists of four hidden layers which increase in size towards the output, as depicted in Fig. 1 (a) [37].
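To make the architecture concrete, the following is a minimal sketch in PyTorch. The input layout (four potential values plus the filling), the 8x8 correlator output over two unit cells, and the Softplus/linear activations follow the text, while the hidden-layer widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DFTNet(nn.Module):
    """Dense network mapping (V_1..V_4, n_e) to the ground-state energy
    and the flattened 8x8 density-density correlator (65 outputs)."""
    def __init__(self, hidden=(32, 64, 128, 256)):
        super().__init__()
        layers, width = [], 5                      # 4 potentials + filling
        for h in hidden:                           # hidden layers grow
            layers += [nn.Linear(width, h), nn.Softplus()]
            width = h
        layers.append(nn.Linear(width, 1 + 64))    # linear output layer
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        out = self.net(x)
        energy, corr = out[:, :1], out[:, 1:].reshape(-1, 8, 8)
        return energy, corr
```

Splitting the single linear output into an energy head and a correlator block accommodates the different ranges of the two targets on a common linear scale, as noted in the supplement.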
A central challenge in machine learning is unbiased and efficient data generation; one usually deals with limited computational or experimental resources. Here, we generate data by finite-size exact diagonalization (ED) of systems with randomly chosen $n_e$, $V_i$. In order to reduce finite-size effects, these examples should naively be generated with as large systems as possible. However, the computational cost for data generation with ED grows exponentially with system size. For this reason, we employ an active learning procedure, performing costly large-system ED, as depicted in Fig. 1 (b), only if necessary. Using random values for $n_e$ and $V_i$, the scheme iteratively computes larger systems until the finite-size deviation between ground state energies lies below a threshold $\theta$ chosen in advance. Correspondingly, the fast computation of smaller systems is used as often as possible, while more accurate data is provided in critical cases. The lower significance of inaccurate samples is accounted for by a sample weight [37]. The samples are further augmented by applying translations within the unit cell and inversion, allowing the network to capture the symmetries of the universal part of the Hamiltonian.
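In pseudocode form, the size-escalation logic could look as follows. This is a sketch under stated assumptions, not a transcription of our implementation: it reuses the `ed_ground_state` solver from the sketch above, compares energies per unit cell so that different sizes are commensurable, and uses the size range $N_{uc} = 3$ to $6$ detailed in the supplement; the rounding of the particle number is a simplification.

```python
import numpy as np

def active_sample(theta, rng, n_uc_min=3, n_uc_max=6):
    """Draw a random (V, n_e) and escalate the system size only while the
    finite-size deviation of the energy per unit cell exceeds theta."""
    V = rng.uniform(-4.0, 4.0, size=4)            # |V_i| <= 4
    n_e = rng.uniform(0.5, 3.5)                   # filling per unit cell
    n_uc = n_uc_min
    # the particle number must be an integer, so in practice the
    # admissible fillings depend on the system size; we round here
    E, corr = ed_ground_state(4 * n_uc, round(n_e * n_uc), V)
    while n_uc < n_uc_max:
        E2, corr2 = ed_ground_state(4 * (n_uc + 1),
                                    round(n_e * (n_uc + 1)), V)
        converged = abs(E2 / (n_uc + 1) - E / n_uc) < theta
        n_uc, E, corr = n_uc + 1, E2, corr2
        if converged:                             # stop escalating early
            break
    return V, n_e, n_uc, E, corr
```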
We contrast the active learning approach outlined above with a passive learning scheme using training data generated for systems of fixed size $N_{uc} = 5$, with filling $n_e$ and on-site potentials $V_i$ chosen randomly. This system size is still efficiently solvable by ED, while sufficiently reducing finite-size effects. Both learning procedures were run with an equal time budget to ensure comparability.
A mean absolute error loss function is then optimized to obtain the weights and biases of the actively (ALN) and passively (PLN) learning neural networks. The resulting performance is evaluated on unseen data, consisting of 20 % of the full data set. Overfitting was avoided for both networks by suitable hyperparameter choices [37]. The absence of significant deviations in the energy correlation plot in Fig. 1 (c) shows that the ALN performs well on random data, with an absolute error distribution peaked at $1.2 \times 10^{-2}$. Similarly, Fig. 1 (d) shows only small errors in the correlator prediction, with an overall mean absolute error of $1.8 \times 10^{-3}$. The PLN performs worse in predicting energy values and correlators, as shown in the inset of Fig. 1 (e) and in Fig. 1 (f), with errors at least twice as large as those of the ALN. This comparison shows the advantage of intelligent data generation at a fixed computational time budget.
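As an illustration, a sample-weighted mean absolute error of the kind described above could be implemented as follows; the equal weighting of energy and correlator terms is an assumption of this sketch, and `w` stands for the sample weights of the active scheme.

```python
import torch

def weighted_mae(pred_E, pred_corr, true_E, true_corr, w):
    """Sample-weighted MAE combining the energy output and the flattened
    8x8 correlator output; w downweights small-system samples."""
    err = (pred_E - true_E).abs().squeeze(-1) \
        + (pred_corr - true_corr).abs().flatten(1).mean(dim=1)
    return (w * err).mean()
```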
Random potentials rarely represent a relevant physical scenario. We therefore move on to investigate how the randomly trained ALN and PLN perform for structured systems, where the $V_i$ obey further symmetries.

FIG. 2. Neural network results for a transition between different atomic-limit insulators. (a) Compressibility $\kappa$ for various potential strengths at quarter filling, as calculated from the actively and passively learned neural networks, exact diagonalization (ED), and density matrix renormalization group (DMRG) for several system sizes. (b) Schematic depiction of the potential in the four-site unit cell: depending on its strength and sign, an obstructed atomic-limit, a metallic, or an atomic-limit phase can be realized. (c) Corresponding observable $C$ as calculated from the 8x8 density-density correlator for the same numerical methods as used for the compressibility. The insets show the correlator as obtained from the ALN in the atomic limit (lower left) and the obstructed atomic limit (upper right), with the unit cell depicted in red.
Learnability of obstructed atomic limits.- We consider the model introduced in Eq. (1) for a potential choice $(V_1, V_2, V_3, V_4) = (0, V, V, 0)$ at quarter filling ($n_e = 1$). The system is metallic for $V = 0$, which separates two distinct insulating phases for $V > 0$ and $V < 0$. For $V > 0$ ($V < 0$), Wannier functions are localized between the unit cells (in the middle of the unit cell). This corresponds to two topologically distinct atomic-limit insulators, with the intra-unit-cell hopping being effectively reduced (enhanced) compared to the inter-unit-cell hopping. The insulating nature of these phases can be shown by calculating the compressibility
$$\kappa = \left( \frac{\partial^2 E_{GS}(n_e)}{\partial n_e^2} \right)^{-1},$$
where $n_e$ is the electron filling and $E_{GS}(n_e)$ the corresponding ground state energy. Figure 2 (a) shows that, as one approaches the critical metallic state around $V = 0$, $\kappa$ increases rapidly. We emphasize that since $\kappa$ involves the second derivative of the energy, it is extremely susceptible to errors. Note, further, that the ED data shows a strong even-odd effect in $N_{uc}$. We also calculated $\kappa(V)$ with the density matrix renormalization group algorithm (DMRG) [38][39] for $N_{uc} = 28$. Compared with these exact results, the ALN produces a meaningful $\kappa(V)$ with a peak value $\kappa(V = 0)$ close to the DMRG result. The PLN, on the contrary, performs worse, with a less pronounced and non-symmetric peak.
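As a sketch, $\kappa$ can be estimated from network predictions by a central second difference in the filling; `predict_energy` is a hypothetical wrapper around the trained ALN, and the step size `dn` is illustrative.

```python
def compressibility(predict_energy, V, n_e=1.0, dn=0.05):
    """kappa = (d^2 E_GS / d n_e^2)^{-1}, estimated by a central second
    difference of the predicted ground-state energy."""
    d2E = (predict_energy(V, n_e + dn)
           - 2.0 * predict_energy(V, n_e)
           + predict_energy(V, n_e - dn)) / dn ** 2
    return 1.0 / d2E
```

Because the second difference amplifies prediction noise, as emphasized above, small steps `dn` trade discretization error against error amplification.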
The location of Wannier centers can be used to differentiate between the two phases. Defining $C = (\langle \hat n_2 \hat n_2 \rangle - \langle \hat n_2 \hat n_3 \rangle) - (\langle \hat n_4 \hat n_4 \rangle - \langle \hat n_4 \hat n_5 \rangle)$ [40], the trivial atomic limit with localization in the unit cell is obtained for $C > 0$, the phase transition happens at $C = 0$, and the non-trivial atomic limit has $C < 0$. Figure 2 (c) highlights that both networks are able to capture $C$ across the transition well, but the ALN results are markedly more accurate than the PLN results when compared with the ED and DMRG data. This supports the statement that active learning delivers quantitatively better results.
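Given the 8x8 correlator block predicted by the network, $C$ reduces to four matrix entries; the zero-based index convention below (site $k$ at index $k-1$) is an assumption of this sketch.

```python
def wannier_diagnostic(corr):
    """C = (<n_2 n_2> - <n_2 n_3>) - (<n_4 n_4> - <n_4 n_5>) from the 8x8
    density-density correlator over two adjacent unit cells."""
    return (corr[1, 1] - corr[1, 2]) - (corr[3, 3] - corr[3, 4])
```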
Learnability of spontaneously symmetry-broken phases.- Spontaneous breaking of translation symmetry can be triggered by introducing a potential of the form $(V_1, V_2, V_3, V_4) = (-V, V, -V, V)$ at quarter filling ($n_e = 1$). This phase arises from the competition between the next-nearest-neighbor interaction $U'$ and the increasing potential $V$ and spontaneously breaks the four-site translation symmetry at $V_c \approx 1.8$ [37]. The metallic system shows a smooth dependence of $E_{GS}$ on $n_e$ around quarter filling [Fig. 3 (a)]. With increasing $V$, $E_{GS}(n_e)$ develops a kink at $n_e = 1$, signalling the emergence of the symmetry-broken charge-density wave (CDW) insulator [Fig. 3 (b)]. Both neural networks represent the different phases very well; the deviations at small fillings are attributed to the limited amount of training samples in this limit.
Figure 3 (c) shows that the correlation in the metallic phase is short-ranged and fast decaying, whereas the symmetry-broken phase possesses a distinct order. The corresponding order parameter $C_{SSB} = \frac{1}{N_{uc}} \sum_i (-1)^i \langle \hat n_{2i+1} \rangle$ is, however, zero, since the two degenerate ground states have opposite imbalance in electron density between the first and third site of each unit cell. Instead, the order can be diagnosed by computing the square of the order parameter from the density-density correlation functions. Its nonzero value is implied by the inequivalence between off-diagonal terms in the density-density correlator, highlighted by the red and green squares in Fig. 3 (c). This behaviour is well captured by the ALN, producing quantitatively accurate correlations in both phases.
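Restricted to the two unit cells available from the network output, the square of the order parameter can be assembled from the correlator entries on the odd sites; the normalization and index convention here are assumptions of this sketch.

```python
import numpy as np

def squared_cdw_order(corr):
    """<C_SSB^2> estimated from the 8x8 correlator <n_i n_j>: alternating
    double sum over the odd sites 1, 3, 5, 7 (zero-based 0, 2, 4, 6),
    normalized by the square of the number of unit cells in the window."""
    odd = np.array([0, 2, 4, 6])
    signs = np.array([1.0, -1.0, 1.0, -1.0])
    n_cells = 2                                   # two unit cells
    return signs @ corr[np.ix_(odd, odd)] @ signs / n_cells ** 2
```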
Conclusion.- We presented a supervised learning approach for lattice DFT, bypassing the Kohn-Sham solution scheme. Employing an active learning procedure allowed us to improve our results at fixed computational cost for data generation. Focussing on correlation functions on a subsystem and taking only the potential landscape in the unit cell and the particle number as input results in a scalable architecture. Besides verification of our algorithm on unseen random potentials, we demonstrated that the trained networks reliably solve for different structured phases.
Looking ahead, it is highly desirable to construct similar implicit (neural network) representations of DFT for systems in continuous space and higher dimensions, in particular to attack the electronic structure problem in strongly correlated regimes. The main challenge is the generation of valid and balanced data sets, and the incorporation of data from various sources, including conventional DFT, Monte Carlo calculations, experiments, and future quantum simulation devices. Two concepts on which our study relies, (1) the focus on correlation functions instead of quantum states and (2) the use of active learning, should prove useful in this future venture.

I. NETWORK PARAMETERS
The supervised-machine-learning algorithm we propose in this paper uses dense neural networks of identical architecture for both the active and passive learning schemes. The layers are connected by $\mathrm{Softplus}(x) = \ln(1 + e^x)$ activation functions, except for the output layer. The last layer takes into account that correlator values and ground state energies have different ranges by employing a linear activation function.

II. DATA GENERATION

The choice of training examples is crucial in a machine learning setting, as data is precious. An intelligent data generation procedure is advantageous if computational time is finite, removing the necessity to always calculate as large systems as possible. Naively, the latter is the way to go in order to avoid finite-size biases in the training examples. We contrast these two approaches as active and passive learning schemes.
The naive approach generates data with exact diagonalization of a 5-unit-cell system, with electron fillings $n_e$ and potentials $V_i$ in the unit cell chosen randomly. The potentials are in the range $|V_i| \le 4$, and the filling can assume values between $0.6 \le n_e \le 3.4$ electrons per unit cell. Data augmentation is then used to enlarge the dataset and allow the neural network to capture the underlying symmetries of the physical problem; this means applying translations $V_i \to V_{i+1}$ and inversion within the unit cell, as sketched below. The resulting dataset consists of 12,020 pairs of fillings and external potentials, mapped to the corresponding ground state energies and density-density correlators. The split into training, validation, and test sets is illustrated in Tab. II. However, large-system diagonalization is not always necessary to obtain accurate data. This is the basis for an active learning scheme, which performs the diagonalization of large systems only if strong finite-size effects are detected. The input to this procedure is the random choice of electron fillings $n_e$ and potentials $V_i$ in the unit cell, the former assuming values between $0.5 \le n_e \le 3.5$ electrons per unit cell and the latter being chosen in the range $|V_i| \le 4$. The training data for both networks was restricted to $|V_i| \le 4$ in order to ensure that the transition to the symmetry-broken phase lies within the trained potential range.
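The augmentation by translations and inversion could be implemented along the following lines; the wrap-around relabelling of the 8x8 correlator block assumes that the sampled state retains the four-site periodicity, which is a simplification of this sketch.

```python
import numpy as np

def augment(V, corr):
    """Symmetry-related copies of a sample: cyclic translations within the
    four-site unit cell and inversion, with the 8x8 correlator over two
    unit cells relabelled accordingly."""
    samples = []
    for shift in range(4):
        Vs = np.roll(V, shift)
        perm = (np.arange(8) - shift) % 8   # new site k <- old site k-shift
        samples.append((Vs, corr[np.ix_(perm, perm)]))
        # inversion: reverse the potential and the site labels
        samples.append((Vs[::-1], corr[np.ix_(perm[::-1], perm[::-1])]))
    return samples
```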
If the ground state energies for $N_{uc} = 3$ and $N_{uc} = 4$ unit cell systems, calculated with exact diagonalization, deviate by more than a threshold $\theta$ chosen in advance, a system with one additional unit cell is calculated. This procedure is repeated up to $N_{uc} = 6$ unit cells if necessary. This means that the scheme queries a larger system whenever the deviation of ground state energies exceeds the chosen threshold $\theta$, thereby reducing finite-size effects. A sample weight is used to capture the lower importance of small systems whenever finite-size effects are detected. Data augmentation is again used not only to enlarge the dataset to 74,500 input-output data pairs (see Tab. III), but also to allow the network to capture the underlying symmetries.

Footnotes to Tab. III: (a) this weight is increased to 3 if no 20- or 24-site system had to be calculated for this sample; (b) this weight is increased to 3 if no 24-site system had to be calculated for this sample.
The choice of the threshold $\theta$ is a crucial parameter of the active learning scheme. We therefore investigate the implications of different thresholds (see Tab. IV) and test the performance of the trained models on unseen examples. These examples are not part of the training set and were generated with random fillings $n_e$ and potentials $V_i$ in the range $|V_i| \le 4$ for systems of $N_{uc} = 5, 6$. The resulting mean absolute error is presented in Fig. 1, indicating that the error decreases with increasing $\theta$. When considering random potentials, it is therefore advantageous to use training data generated with as large a $\theta$ as possible, resulting in the largest possible dataset; this, however, means that larger systems with $N_{uc} = 5, 6$ are never calculated.
However, the random data generation with potentials in the range $|V_i| \le 4$ mostly produces situations where the electrons are localized in the potential landscape, owing to large potentials $V_i$. In such cases, finite-size deviations are negligible. This means that good performance on this data does not necessarily correspond to a good elimination of finite-size effects in a realistic physical potential. Figure 1 highlights this with the evaluation of a dataset with potentials $|V_i| \le 1$, for which the minimum of the obtained mean absolute error shifts towards smaller values of $\theta$.
Consequently, a trade-off is needed between the total number of training samples and the number of samples with $N_{uc} = 5, 6$. The best compromise is reached for $\theta = 0.0025$, which yields a large enough dataset while still containing significant information about larger systems. This parameter was therefore used to generate the final dataset for training the actively learning neural network in this paper.

III. TRAINING PERFORMANCE
The actively and passively constructed datasets are split into training, validation, and test sets. Achieving consistent performance on the optimized and unseen data indicates that the network has not been overfitted. The corresponding results for both training schemes are presented in Tabs. Va and Vb, highlighting that the intended mapping from electron fillings and potentials to ground state energy and density-density correlator is well captured.

IV. DETERMINATION OF THE POINT OF SPONTANEOUS SYMMETRY BREAKING
The four-site translational symmetry of the considered extended Hubbard model can be spontaneously broken by the introduction of an external potential $\hat V_{ext} = \sum_{i=1}^{4} V_i \hat n_i$. Choosing a potential of the form $V_1 = -V_2 = V_3 = -V_4 = -V$ causes a competition between the next-nearest-neighbor repulsion $U'$ and the potential $V$ at quarter filling. As a result, a symmetry-breaking phase with two degenerate ground states emerges, corresponding to the electrons ordering either on site one or on site three in each unit cell.
We consider two approaches to identify the potential strength at which the spontaneous symmetry breaking occurs. For each of these, we investigate a finite-size scaling plot to derive the extrapolated point of the phase transition in the thermodynamic limit.
As stated above, the symmetry-broken phase possesses a distinct order in the density-density correlator, occupying only sites with negative potential. By adding a small potential on one site, e.g. $V_1 = -V - \delta V$ with $\delta V \ll V$, one of the degenerate ground states is favoured. Consequently, the occupation $\langle \hat n_i \rangle_{GS}$ will concentrate on the first lattice site of each unit cell. We therefore investigate the difference in the occupation of sites one and three, $C = \langle \hat n_1 \rangle_{GS} - \langle \hat n_3 \rangle_{GS}$. In the metallic phase, for a vanishing potential $V = 0$, all sites are equally occupied despite the small offset $\delta V$. This changes around the point of the phase transition, where one of the symmetry-breaking ground states is explicitly selected and $C$ becomes nonzero.
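A sketch of this diagnostic, assuming the ED/DMRG ground-state density profile is available as an array over the $4 N_{uc}$ lattice sites:

```python
import numpy as np

def ssb_diagnostic(density, n_uc):
    """C = <n_1> - <n_3>, averaged over unit cells, from the ground-state
    density profile on 4*n_uc sites (site one first in each cell)."""
    cells = np.asarray(density).reshape(n_uc, 4)
    return float(np.mean(cells[:, 0] - cells[:, 2]))
```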

FIG. 1. (a) Schematic of the dense neural network used to learn the map from unit-cell potentials and filling to ground-state energy and density-density correlators. (b) Active learning allows checking the energy deviations between different system sizes and continuing to larger systems only if necessary. (c) Exact versus predicted ground-state energies on the test data set of the actively learning network (ALN). The inset shows the absolute error on the test-energy values as a density histogram $n$. (d) Mean absolute error per correlator entry on the test data set for the ALN. (e) Exact versus predicted ground-state energies for the network trained on a single system size (PLN), with the inset showing the density $n$ of absolute errors on the test data set. (f) Mean absolute error per correlator entry on the test data set for the PLN.

FIG. 3. Neural network results for a spontaneous symmetry-breaking phase. (a) Ground state energy for $V = 0$ for several electron fillings $n_e$, as calculated from the actively and passively learned neural networks and ED. The inset displays the non-interacting band structure. (b) Ground state energy as a function of the electron filling $n_e$ in the symmetry-broken phase ($V = 4$). The kink at $n_e = 1$ signals an interaction-induced incompressible phase. The inset reveals the band flattening of the non-interacting system. (c) Phase diagram, schematic of the potential, and density-density correlation functions. The latter are obtained from the ALN for $V = 0$ (left) and $V = 4$ (right). Off-diagonal terms, whose inequivalence signals the symmetry-breaking phase (right), are highlighted in red and green.
TABLE IV: Number of samples for different choices of the threshold $\theta$ ($\theta = 0.0$, $0.001$, $0.0025$, $0.005$, $0.01$, $1$), resolved by $N_{uc}$ and calculated with exact diagonalization and an equal time budget for each parameter choice.

FIG. 1: Mean absolute error of neural network predictions for various datasets, trained on examples generated with different thresholds $\theta$ as listed in Tab. IV. Random potentials were used to generate the evaluation data: a $N_{uc} = 5$, $|V_i| \le 4$ dataset consisting of 5600 samples; a $N_{uc} = 6$, $|V_i| \le 4$ dataset containing 220 data points; and 1205 $N_{uc} = 5$, $|V_i| \le 1$ input-output pairs. The evaluation sets were not used in the training process of the neural networks.

FIG. 2: (a) Finite-size scaling plot of the critical potential $V_c$, obtained with a small potential $\delta V$ to break the ground state degeneracy. Density matrix renormalization group (DMRG) calculations with open boundary conditions are combined with the exact diagonalization (ED) of a 16-site system. (b) Schematic of the potential landscape in the unit cell. (c) Finite-size scaling plot of the critical potential $V_c$, obtained with an additional half unit cell and electron to break the ground state degeneracy. DMRG calculations with open boundary conditions are combined with the ED of an 18-site system. (d) Schematic of the potential landscape with two additional lattice sites.

TABLE I: Relevant parameters used to create the neural networks in this paper.

TABLE II: Data for the passively trained neural network.

TABLE III: Data for the actively trained neural network.