Order-$N$ orbital-free density-functional calculations with machine learning of functional derivatives for semiconductors and metals

Orbital-free density functional theory (OFDFT) offers a challenging way of electronic-structure calculations scaled as $\mathcal{O}(N)$ computation for system size $N$. We here develop a scheme of the OFDFT calculations based on the accurate and transferrable kinetic-energy density functional (KEDF) which is created in an unprecedented way using appropriately constructed neural network (NN). We show that our OFDFT scheme reproduces the electron density obtained in the state-of-the-art DFT calculations and then provides accurate structural properties of 24 different systems, ranging from atoms, molecules, metals, semiconductors and an ionic material. The accuracy and the transferability of our KEDF is achieved by our NN training system in which the kinetic-energy functional derivative (KEFD) at each real-space grid point is used. The choice of the KEFD as a set of training data is essentially important, because first it appears directly in the Euler equation which one should solve and second, its learning assists in reproducing the physical quantity expressed as the first derivative of the total energy. More generally, the present development of KEDF $T[\rho]$ is in the line of systematic expansion in terms of the functional derivatives $\delta^{\ell_1} T/\delta \rho^{\ell_1}$ through progressive increase of $\ell_1$. The present numerical success demonstrates the validity of this approach. The computational cost of the present OFDFT scheme indeed shows the $\mathcal{O}(N)$ scaling, as is evidenced by the computations of the semiconductor SiC used in power electronics.


I. INTRODUCTION
Density Functional Theory (DFT) [1] proves that the ground-state total energy E of an interacting electron system is a unique universal functional G[ρ] of the electron density ρ(r) plus the electrostatic energy V ext [ρ] under the external potential v ext (r), opening a possibility to compute physical properties of real materials by solving an Euler equation δE[ρ]/δρ(r) = µ, where µ is the Lagrange multiplier that enforces density normalization. Various attempts have been made by introducing virtual non-interacting electron systems, in which the electron densities are identical to those in corresponding real materials, and then decomposing G to the kinetic energy of the noninteracting system T s [ρ], the classical electronelectron interaction energy E H [ρ], and the remaining exchange-correlation energy E xc [ρ] [2]. In this scheme, T s is expressed as a sum of the kinetic-energy contribution from each orbital φ i (r) (Kohn-Sham orbital) as, and thus the original Euler equation in DFT, becomes a set of Schrödinger-like equations (Kohn-Sham equations) which in turn determine φ i self-consistently. A numerous number of works adopting this Kohn-Sham (KS) scheme (KSDFT) has been applied to a various materials and achieved unprecedented success [3,4], depending on the level of the approximation to the exchange-correlation functional (Jacob's ladder) [5]. However, solving the Kohn-Sham equations for all the occupied orbitals in the system is a computational burden scaling with the system size N as O(N 3 ), thus restricting the applicability of DFT. The scheme with lowerorder scaling is highly demanded in materials science and also in advancing DFT. One of the solutions in a legitimate way is the orbital-free density-functional theory (OFDFT) in which T s [ρ] is expressed as a functional of ρ, the kinetic energy density functional (KEDF), and the Euler equation Eq. (2) remains as a single equation for ρ, thus OFDFT being expected to be O(N ) scheme.
Such OFDFT approach free from the orbitals is in the heart of the original DFT [1] and was initiated much before DFT, being known as Thomas-Fermi (TF) theory for the homogeneous electron gas [6,7] and von Weizsäcker (vW) gradient expression [8]. Based on these approaches, the kinetic energy functional T s is generally written as where τ TF (r) is the kinetic energy density in the TF approximation, τ TF = (3/10)(3π 2 ) 2/3 ρ 5/3 , and F is so called the enhancement factor. There have been a lot of efforts to develop the enhancement factor [9,10] either in a semilocal [11][12][13][14] or a nonlocal form [15][16][17][18][19][20][21][22] to reproduce the KS kinetic energy T KS as accurate as possible. Among the various semilocal forms, recent two functionals, PGSLβ (Pauli-Gaussian second order and Laplacian with a parameter β) [12,13] and LKT (Luo-Karasiev-Trickey) [14], reproduce experimental structural properties successfully within the error of less than a few percent for the lattice constants and of about ten percent for the bulk moduli. Semilocal KEDFs have been usually developed in the form of the generalized gradient approximation (GGA) and satisfy some of the exact conditions for (a) the small limit of the density gradient in the gradient expansion (GE), (b) the large limit of the density-gradient in GE, (c) the positivity of Pauli potential [23], and (d) the linear response function in the homogeneous density limit, namely being equal to the Lindhard function [24]. PGSLβ satisfies the first three conditions (a)-(c). The parameter β is fitted so as to minimize the mean absolute relative errors of physical quantities such as the cell volume, the bulk modulus, the total energy at the equilibrium volume, and the electron density, with respect to the corresponding values obtained in the KS scheme, leading to the value of β = 0.25 [12,13]. LKT satisfies the conditions (b) and (c), and is left with a single parameter "a". The allowable range of "a" is determined by the condition (c) for atomic densities generated by pseudopotentials. The obtained range is 0.0 ≤ a ≤ 1.3 and the value of a = 1.3 is typically used [14]. Karasiev et al. [25] proposed another semilocal parameter-free KEDF that satisfies the conditions (a)-(c). However, The bulk moduli from this functional are around 50 % higher than the reference KS values. Hence, we will not include the results by this functional in our benchmark in the present paper.
Inclusion of the nonlocal effects in the enhancement factor indeed improves the performance of KEDF [18,19,22,26,27]. The nonlocal functional typically reproduces the structural properties within the error of less than 1 % for the lattice constants and 10 % for the bulk moduli [27]. Nonlocal KEDFs such as WGC (Wang-Govind-Carter) [19] and related nonlocal KEDFs [15-18, 20, 21] were constructed to satisfy the conditions (a), (b), and (d). HC (Huang-Carter) [22] functional improves the performance of WGC and its relatives in semiconductors by using two parameters adjusted to reproduce bulk moduli, equilibrium volumes, and equilibrium energies by KSDFT. EvW-WGC (enhanced von Weizsäcker Wang-Govind-Carter) [26] functional is an another extension of WGC, which is a linear combination of vW, TF and WGC, where a system-dependent parameter determines the portions of each KEDFs. EvW-WGC is more accurate than HC if an optimally adjusted parameter is used. MGP (Mi-Genova-Pavanello) [27] functional uses a unique way of imposing the condition (d), namely the functional integration of the inverse of the Lindhard function in homogeneous density limit. In spite of the better performance generally observed, the nonlocal scheme requires the heavier computational cost, which scales as O(N log N ) and, if possible, the true O(N ) scheme enabled by the semilocal scheme is desired.
In this paper, to establish an alternative scheme that follows strictly the O(N ) scaling, we propose a new enhancement factor in KEDF by neural-network (NN) machine learning. Without resorting to the reproducibility of the structural properties of target materials, we focus on reproducing the electron density which is the quantity of basics in DFT. We show that our NN which is trained only with the electron density in diamond generates a KEDF and successfully reproduces structural properties of a variety of materials, thus demonstrating the potential of the machine learning for further developments of the density functional.
Coming back to the DFT itself, the energy as a functional of the electron density, KEDF in the present case, is primary. Its functional derivative defines the Euler equation and, on the other hand, provides the physical properties related to the first order derivative of the total energy. The higher derivatives are obviously important to describe the linear and nonlinear responses of materials. It is thus desirable to construct KEDF as well as its functional derivatives by term-by-term conformation of the kinetic-energy functional-derivative (KEFD) δ ℓ1 T /δρ ℓ1 to the KS derivative through progressive increase of ℓ 1 . This is combined with the variational optimization of KEDF as a functional of both ρ and its spatial derivatives d ℓ2 ρ/dr ℓ2 with increasing ℓ 2 . This systematic approach was formidable in the past but now may be practical using the machine learning. We here demonstrate a success of our first attempt with ℓ 1 = 1 and ℓ 2 = 2.
The organization of the present paper is as follows. In II, we explain the way of constructing our NN KEDFs. Computational details are also explained. The obtained NN-KEDFs are applied to 24 systems which include 7 atoms, 6 diatomic molecules and 11 solids ranging from metals, semiconductors to an ionic solid in III. The accuracy of our NN KEDFs is assessed in detail and discussed. In III, we also demonstrate the O(N ) computational time scaling of our orbital-free implementation achieved for the system with thousands of atoms. We summarize our findings in IV.

II. METHODS
A. Neural network for developing kinetic energy density functional The neural network (NN) in general consists of an input layer, multiple hidden layers, and an output layer. Each of those layers with an index l (l = 0, . . . , N ) is composed of neurons with indices k (k = 0, . . . , D l ), where N −1 is the number of hidden layers and D l is the number of neurons in the l-th layer (Fig. 1). The output from the j-th neuron in the l-th layer (1 ≤ l ≤ N − 1) is generally written as where σ (l) is the activation function with the variable a . . .
, are called weights. For each layer, by using additional inputs z j0 works as the bias parameter.
In order to develop our NN KEDF, we seek for the enhancement factor F NN as a functional of dimensionless quantities derived from the gradient and Laplacian of the electron density, i.e., s 2 and q, (namely, up to ℓ 2 = 2) where s = |∇ρ|/[2(3π 2 ) 1/3 ρ 4/3 ] and q = ∇ 2 ρ/[4(3π 2 ) 2/3 ρ 5/3 ]. The KEDF with this enhancement factor, satisfies the uniform scaling condition [28]. Here, the generic notation W ≡ {W i ; i = 1 . . . n W } represents the set of weight parameters in the NN, { W (l) jk } in Eq. (4). Figure 1 shows the schematic structure of the NN with two inputs s 2 and q. The output is the enhancement factor F NN (s 2 , q) introduced in Eq. (5). The inputs and outputs are of course spatial dependent and thus the functions of the real-space position r.
The explicit formula of the enhancement factor in the present NN KEDF is, e.g., for D l = D and N = 4 (see subsection II C for determination of the structure of our present NN), 2 ) = (s 2 , q). In the present paper we adopt the activation function defined as the exponential linear unit (ELU) [29] σ(a (l) for each layer (1 ≤ l ≤ N − 1) and an identity function for an output layer (l = N ).
We adopt the enhancement factor F NN (s 2 , q) as the output variable z (N ) 1 . However, F NN (s 2 , q) is not directly learned as the training data. Instead, the KEFD with ℓ 1 = 1 from the KS scheme, δT KS /δρ, is used as the training data since we adopt the cost function defined in terms of the KEFD [Eq. (8) below] is used as a criterion of the performance of our NN in the present paper.
B. Density-functional computation scheme Actual Density-Functional (DFT) computations in the construction of our NN and its validation have been performed by our real-space code RSDFT [30][31][32] which is highly optimized for the massively parallel architecture. In RSDFT scheme, all the quantities are computed on grid points in real space and the converged results are obtained by reducing the grid spacing systematically. Thanks to the real-space treatments, the scheme is essentially free from Fourier Transform, which releases heavy communication burden in the massively-parallel architecture of computers. For the exchange-correlation functional, we use generalized gradient approximation by Perdew, Burke, and Ernzerhof (PBE) [33].
In usual first-principles DFT calculations, orbitaldependent nonlocal pseudopotentials (NLPSs) are used to simulate nuclei and core electrons in real materials [34,35]. Yet in the OFDFT scheme, the orbitals are unavailable so that we need to construct ab-initio local ionic pseudopotentials (LIPSs) to simulate nuclei and core electrons. We have newly generated such LIPSs following the scheme by Carter and collaborators [22,[36][37][38]. Our scheme to generate LIPS consists of (i) the generation of the electron density of the target element in KSDFT scheme, (ii) obtaining the effective potential from the above density by solving the inverse problem with the Kadantsev-Stott method [39], and (iii) proper skeletonizing of the effective potential (Appendix A for details). It should be emphasized that our construction scheme is totally free from adjustable parameters. We have generated LIPSs for 7 elements, Li, C, Na, Al, Si, Cl, and Cu. The validity and transferability of the obtained LIPSs are evidenced by the electron densities and the structural properties of 8 various materials, bcc-Li, fcc-Al, fcc-Cu, bcc-Na, NaCl, ds-Si, ds-C, and zincblende(3C)-SiC (Appendix A).

C. Training of neural network toward kinetic energy functional derivative
In order to develop the KEDF by our NN that best reproduces the KS-KEDF, T KS [ρ], within the framework of ℓ 1 = 1, we minimize during the training process the cost function for KEFD, which is the mean-squared error between the kinetic energy functional derivative (KEFD), δT NN /δρ, obtained by our NN and KEFD, δT KS /δρ, from the KS orbitals {φ i (r)}. Here, N t is the total number of the training data at the real space position r n . The analytical expressions of KEFDs are given by (see Appendix B for the derivation), with c 0 = 3(3π 2 ) 2/3 /10, and where ε HOKS is the highest-occupied KS eigenvalue, and ε i and f i are the KS eigenvalue and occupation number of the i-th KS orbital, and ρ KS = i f i |φ i | 2 , respectively [23]. Instead of directly learning F NN (s 2 , q) from the training data, we minimize the cost function Eq. (8) between KEFDs because we observe that the functional derivative is poorly reproduced when KEDF is trained [40] and this training fails to optimize the electron density in some cases. This problem in the functional derivative leading to the erroneous solution of Euler equation is also reported in the previous efforts to develop KEDF with machine learning [41][42][43][44][45][46][47], where the training set is the KEDF itself. As the training and test sets in the deep learning, we adopt the kinetic energy functional derivative (KEFD) at each real-space grid point. It is noteworthy that the KEFD, [δT [ρ]/δρ](r), appears directly in the Euler equation Eq. (2) and thus assures accuracy of the solution of the Euler equation, which is crucial to reproduce the target electron density ρ(r).
The training set consists of the KEFD at 13,824 realspace grid points obtained for diamond. For efficient training, we adopt the stochastic natural gradient descent (SNGD), which is a mini-batch training based on natural gradient method [48]. In SNGD, we randomly select N b training data, then update the i-th weight W i at t-th epoch as where G is a metric tensor defined by δρ .
(12) Here the learning rate η is fixed at η = 0.1, and n W is the total number of the NN weight parameters, { W k }. To achieve efficient convergence in Eq. (11), we blend the natural gradient and the ordinary gradient by introducing a weighted diagonal matrix ν(t)tr(G)δ jk , where the scheduling ν(t) = ν 0 /(1 + bt) with b = 0.01 is employed during the epochs being decreased from ν 0 = 10 −5 . The determination of those training parameters is explained in Appendix C.
The calculations of ∂L/∂W and Eq. (12) are performed by the backpropagation [49,50]. The algorithm of the backpropagation in the present paper is briefly explained in Appendix D. The training data set was divided into 90% for the training and 10% for the validation. We have examined the dependence of the accuracy of the obtained δT NN /δρ on the structure of our NN, namely the number of hidden layers and the number of neurons in each layer. The accuracy is assessed by the deviation from the training set {δT KS /δρ} at grid points in the real space. The 13824 grid points in a unit cell of diamond are chosen to construct the training set. Table I shows the root mean square error (RMSE) of δT NN /δρ from δT KS /δρ obtained in several NNs. Hereafter we abbreviate the atomic unit of energy (hartree) as Ha. From this examination, we have adopted the NN with three hidden layers and five neurons in each layer in the present paper. The actual weight parameters W of our NN is available upon request. The trend that the RMSE decrease with increasing the number of weight parameters observed in Table I is indicative of further improvement of our NN for better KEDF in future.

D. Augmentation of the enhancement factor
The enhancement factor in the present paper is further augmented by requiring rigorous limits for s → 0 and s → ∞: When s → 0, it should be the second-order gradient expansion [51] as 1 + (5/27)s 2 , whereas when s → ∞ it should be equal to the vW KEDF [8], F → (5/3)s 2 . A form which satisfies those limits within the first order of q 2 is, with α being 40/27. Here, β was previously adjusted to reproduce structural properties of target materials [12,13]. Instead, we here determine β so that the second functional derivative of KEDF derived from Eq. (13) in the small-s limit reproduces the Lindhard function [24]. We have found that β = 0.382 well satisfies this homogeneous-limit condition (for the fitting procedure, see Appendix E).
Then we finally propose our enhancement factor, where X(q) = exp(−Aq 4 ) is an interpolation function between the small-q and large-q subsystems. This is to enhance the accuracy especially for large q obtained by the flexibility of the NN, following the concept of subsystem functionals in DFT [52][53][54][55]. Our final KEDF,T NN , is given by Eq. (5) with F NN being replaced byF NN .
The parameter A can be optimized by minimizing the cost function L by expanding the metric tensor G jk to (n W + 1) × (n W + 1) dimensions by adding the components related to ∂[δT NN (r p )/δρ]/∂A. The optimized A, however, shows multi-minima structure with nearly flat dependence of L on A in the range 10 < A < 32, causing uncertainty in the optimization. Then we additionally minimize the difference between the electron density from our KEDF and the KS electron density, which more reliably settles A at 31.62 (see Appendix F for details).  . The abscissa is the reduced density gradient s and the enhancement factor F (s 2 , q = 0). vW, GE2, and NN denote the enhancement factors of the vW KEDF [8], the second-order gradient expansion [51], andF NN (the present paper Eq. (14)), respectively. and 4. At q = 0,F NN recovers the limiting form F (0) as expected. The behavior ofF NN is substantially different from the limiting form F (0) at q = 4, which is ascribed to the form of NN KEDF, Eq. (6), and relevant to its accuracy. within a typical range of s (0 ≤ s ≤ 2.0). The abscissa is the reduced density gradient s and the enhancement factor F (s 2 , q = 4). vW, GE2, and NN denote the enhancement factors of the vW KEDF [8], the second-order gradient expansion [51], andF NN (the present paper Eq. (14)), respectively.

Electron densities
We assess the accuracy of KEDFs with the selfconsistent-field (SCF) density obtained by minimizing the total energy with the developed KEDF. One way is the direct minimization with iterative techniques [56][57][58][59]. The other way is to solve the Euler equation Eq. (2) by the matrix diagonalization, which we adopt in this work. By introducing vW KEDF [8], , the Euler equation becomes a Schrödinger-type equation for √ ρ [60]. However, it is recognized that the diagonalization of such Schrödinger-type equation suffers from ill convergence [61]. We have found that this difficulty can be circumvented in all KEDFs we considered by introducing a parameter λ and by performing the rearranged diagonalization scheme. Previously, this sort of rearrangement has been applied to the TF(λ)vW functional with λ = 1/5 or 1/9 [62]. We first express KEDF as the sum of scaled vW functional and the rest part: Then Eq. (2) becomes where v KS = v ext + δE H /δρ + δE xc /δρ is the usual KS potential. In our scheme, we have found that λ = 10 leads to a satisfactory convergence in the SCF calculations. Note that any choice of λ offers the same correct SCF solution, if it converges.  Figure 5 shows calculated SCF electron density obtained by our NN KEDF. For comparison, the computed densities using the KEDFs in the past, i.e., PGSL0.25 [12] defined by Eq. (13) with β = 0.25, LKT [14], and the conventional TF(1/5)vW (TF(λ)vW KEDF with λ = 1/5) [63] are also shown. Figure 5 (a) is the electron density along [111] direction in diamond-structured(ds)-Si and Fig. 5 (b) shows its difference from the KS density. The obtained electron density shows overall superiority of our NN KEDFs against PGSL0.25 and TF(1/5)vW most clearly visible at the nuclear site (d = 4.47 bohr). When compared with LKT, NN tends to be more accurate in the bonding region (0 < d < 4.47 bohr). We have also calculated the electron densities with our NN KEDF of other 23 systems (10 solids, 6 diatomic molecules and 7 atoms), and found that the obtained densities satisfactorily reproduce the corresponding KS densities (Supplemental Material [64] for details). It is noteworthy that our NN KEDF obtained by the training data of only ds-C shows high transferability for various systems.
The superiority of our NN KEDF in reproducing the KS densities is quantified by the RMSE of the SCF density ρ scf with respect to the KS density ρ KS for all the 24 systems examined in the present paper. Table II shows the RMSE for periodic systems, including semiconductors, metals and an ionic material with some of their polytypes: diamond, graphene, ds-Si, face-centered-cubic(fcc)-Si, β-tin Si, zincblende(3C)-SiC, body-centered-cubic(bcc)-Li, fcc-Al, fcc-Cu, bcc-Na, and NaCl. For comparison, the results obtained with the KEDFs in the past are also shown. We also present the results by the uninterpolated NN enhancement factor F NN (i.e., A → ∞ in Eq. (14)) (labelled as NN [bare] ) trained by the same scheme as NN. The overall good performance of NN is demonstrated by the RMSE for each material. The RMSE averaged over all 11 materials (the right end column "ratio" in Table II) clearly shows the superiority of the present NN KEDF to other KEDFs in the past.
The RMSE of the SCF density has been also computed for 13 isolated systems, Li, C, Na, Al, Si, Cl, and Cu atoms, and Li 2 , C 2 , Na 2 , Al 2 , Si 2 , and Cl 2 molecules, and then averaged over all the atoms and molecules, as shown in Table III and IV. The same RMSE "ratio" indicates that NN and TF(1/5)vW are most accurate in atoms, whereas NN and NN [bare] are most accurate in molecules.
These results indicate that we have succeeded to construct NN KEDFs that outperform previous ones without resorting to system-dependent parameters. The electron density calculated with NN [bare] shows better performance than NN in some cases. However, physical quantities shown below indicate the limitation of NN [bare] . The present NN up to ℓ 1 = 1 makes the agreement with KS at the first-order derivative of T , while the density is a quantity related at the first order level ρ(r) = −δE[ρ]/δµ(r) where µ(r) is the local chemical potential. Hence the success in reproducing the KS density is intrinsic to the present scheme.
The NN adopted in the present paper produces the minimum value of the cost function L min = 3.519 × 10 −2 Ha 2 . We have examined other NNs which produce the cost function of L min < L ≤ 1.8L min Ha 2 . We have confirmed that the obtained RMSE of the electron density with such NN increases less than 1% at most.

Structural properties
To further examine the validity of our OFDFT scheme with the NN KEDF, we have calculated structural properties and energetics for test systems: the lattice constants a 0 , bulk moduli B 0 and cohesive energies E coh of 11 solids, and the bond lengths r 0 of 6 molecules.
Tables V and VI show the lattice constants and bulk moduli for 11 solids along with the corresponding values obtained using other KEDFs in the past. We compare the relative errors with respect to the KS values (numbers in the parentheses in Tables V and VI). Our NN KEDF provides the smallest relative errors in 10 cases (a 0 for β-tin Si, fcc-Al, fcc-Cu, graphene; B 0 for diamond, fcc-Si, β-tin Si, fcc-Al, fcc-Cu, NaCl), whereas the number of the cases with the smallest errors are 6 for LKT and 5 for PGSL0. 25. The overall quantitative index of the superiority is evaluated as the mean absolute relative errors (MAREs) with respect to the KS values for those quantities. The NN KEDF clearly shows the minimum MAREs for both a 0 and B 0 (Table VI), indicating its superior performance. For all structural properties, NN [bare] produces larger MAREs than the NN functional. This indicates the importance of augmenting the enhancement factor and the validity of concept of the subsystem DFT [52]. It is noteworthy that our NN KEDF functional is trained by a part of the data in diamond and reproduces KSDFT results reasonably for the structural properties of a variety of materials including metals and ionic crystals with no adjustable parameters, where the machine learning parameters are determined uniquely in an ab initio fashion after minimizing the cost function. Table VII shows the bond lengths of 6 molecules. The MAREs with respect to the KS values show that NN KEDF performs overall better than other KEDFs. However, the obtained MAREs of the bond lengths of the molecules are certainly larger than the MAREs of the lattice constants of the solids. This is presumably due to the fact that we have trained our NN using the data of the solid, diamond-structured carbon. This demonstrates the difficulty to develop the KEDF quantitatively valid for both localized and delocalized systems. Table VIII shows the calculated cohesive energies (E coh ) for 11 solids. The chemical trend obtained in the KS scheme is reproduced by our NN KEDF. However, the quantitative reproduction of the KS values is not satisfactory. In fact, all the KEDFs including those in the past provide MAREs of the cohesive energy larger than 20 % (the right-end column). Among them, the NN KEDF keeps the smallest MAREs.
The lattice constant and the bulk modulus are obtained by the behavior of the total energy E as a function of the volume V around its minimum point: a 0 is obtained as the zero point of δE/δV and B 0 requires the second derivative δ 2 E/δV 2 . In our OFDFT with the NN KEDF with ℓ 1 = 1, the first derivative of E with respect to the density is trained. Hence the high reproducibility of a 0 may be intrinsic in the present scheme, while it is natural that the MAREs for B 0 are worse than those for a 0 . Along this line, the machine learning up to ℓ 1 = 2 is desired for physical quantities associated with the second derivative of E such as B 0 . Since B 0 calculated with ℓ 1 = 1 already shows fair agreements with that by KS-DFT and experiments, the higher-order machine learning has potential to provide us with systematically more accurate KEDF.

C. Order N DFT computations
Finally, we have analyzed the computational time of both KSDFT and OFDFT in order to demonstrate our O(N ) scheme. We decompose the computational time for a single SCF iteration (t SCF ) as t SCF = t 1 +t 2 +t 3 +t others , where t 1 is the time for the subspace diagonalization (SD), conjugate-gradient minimization (CG) and Gram-Schmidt (GS) orthonormalization, t 2 for updating density and calculating the total energy, t 3 for the mixing procedure to obtain the new input potential, and t others for other procedures such as MPI gathering and broadcasting eigenvectors. The test systems are 4H-SiC supercells with sizes of 576, 1024, 1600, 2400, and 4704 atoms. Only gamma point is sampled for the Brillouin zone integration in KSDFT. The grid-spacing is chosen as 0.39Å. We use 36 eigenvectors for the diagonalization of Eq. (16). Figure 6 (a) shows the results. The predominant computational time during a single SCF iteration in KSDFT is t 1 , whereas it is t 2 or t 3 in OFDFT. The latter scales with O(N 1.2 ) in our numerical confirmation. The overall computational time in our OFDFT scheme and in our KS-scheme using RSDFT code [30][31][32], which may be the fastest available code, is shown in Fig. 6(b), indicating that the present OFDFT scheme has now achieved essentially the O(N ) scaling.
Several O(N ) density-functional calculations have been proposed and developed in the past. One primitive way is to introduce localized-orbital basis sets to   [65,66]. This is obviously not based on a legitimate principle but relies on the incompleteness of the basis set practically. Another scheme is based on the "nearsightedness principle" [67] of many-electron systems, which states that principal quantities to describe physical properties are essentially local. One of such quantities may be the density matrix and the O(N ) scheme with the density matrix has been developed [68]. Yet in the actual computations, one need to truncate the density matrix in real space, which is actually a system dependent procedure. On the other hand, OFDFT does not require such system-dependent procedure once the suitable kinetic energy functional is developed. In this sense, OFDFT has a potential to become the most legitimate and practical O(N ) scheme, which is applicable to a broad range of materials on the equal footing.

IV. CONCLUSION
We have developed a scheme of the orbital-free densityfunctional-theory (OFDFT) calculations based on the accurate and transferrable kinetic-energy density functional (KEDF) which is created in an unprecedented way using appropriately constructed neural network (NN). Our OFDFT scheme has reproduced the electron density obtained in the state-of-the-art DFT calculations and then provided accurate structural properties of 24 different systems, ranging from atoms, molecules, metals, semiconductors and an ionic material. The accuracy and the transferability of our KEDF have been achieved by our NN training system in which the kinetic-energy functional derivative (KEFD) at each real-space grid point in diamond-structured carbon is used as the training data. The choice of the KEFD as the training data is essential in the following sense: First, it appears directly in the Euler equation which one should solve and it allows us a transparent and intuitive understanding, where the density and its derivatives are the primary and fundamental quantities in the spirit of DFT. Second, its leaning assists in reproducing the physical quantity expressed as the first derivative of the total energy. More generally, the present development of KEDF T [ρ] is in the line of systematic expansion in terms of the functional derivatives δ ℓ1 T /δρ ℓ1 through progressive increase of ℓ 1 . The present numerical success has demonstrated the validity of this approach with the detailed results for case of ℓ 1 = 1. The computational cost of the present OFDFT scheme for the system size N has indeed shown the scaling of O(N ) inherent to OFDFT, as is evidenced by the computations of SiC consisting of thousands of Si and C atoms which are important in developing power-electronics devices.

ACKNOWLEDGMENTS
This work was partly supported by the projects conducted under MEXT Japan named as "Priority Issue on Post-K computer" and "Program for Promoting Research on the Supercomputer Fugaku". In the latter, we are involved in the two subprojects, "Basic science for emergence and functionality in quantum matter: innovative strongly-correlated electron science by integration of Fugaku and frontier experiments" and "Multiscale simulations based on quantum theory toward the developments of energy-saving next-generation semiconductor devices". The JSPS grants-in-aid (Grant Nos. 16H06345 and 18H03873) also support the present work.
Computations were performed with the resources provided by HPCI System (Project ID: hp180170, hp180226, hp190145, hp190172, hp200122 and hp200132) and by Supercomputer Center at Institute for Solid State Physics, University of Tokyo and at Institute for Molecular Sciences, Natural Institute of Natural Sciences In this appendix, we explain our scheme to generate ab-initio local ionic pseudopotentials (LIPSs), its application to 7 elements, lithium, carbon, sodium, aluminum, silicon, chlorine, and copper, and the validity and the transferability of the generated NLPSs.
Local pseudopotentials combined with OFDFT are developed in the pioneering work by Carter and her collaborators [22,[36][37][38]. They are called "bulk-derived local pseudopotential" (BLPS) since the potentials contain parameters which are optimized to reproduce structural properties of target bulk materials: The parameters are (1) the value of the non-Coulombic part of the BLPS in reciprocal space at G = 0 and (2) the cutoff radius r c beyond which the Coulombic tail is imposed on the BLPS in real space. In the construction of BLPS, the KSDFT calculations with NLPS are first performed to obtain the bulk modulus B 0 , equilibrium volumes V 0 , and also the energy ordering of various phases. Then the above parameters in BLPS are adjusted to reproduce those structural characteristics obtained by the KSDFT calculations. Such fitting is also improved by minimizing the difference between forces obtained by the BLPS with OFDFT and those by an NLPS with KSDFT [38]. Currently, these BLPSs are available for Al, As, Ga, In, Li, Mg, P, Sb, Si [69]. After the appropriate fitting, the bulk properties calculated from the BLPSs are in good agreement with those by NLPSs [37].
In spite of relatively good performance of BLPSs, the above mentioned fitting process is manual and cumbersome. The iterative improvement of the BLPS is unavoidable until required target properties can be obtained within acceptably small errors. Furthermore, the previous procedure to solve KS equation inversely and obtain the effective potential [22,36,37] using the Wang-Parr method [70] or the Wu-Yang method [71] require good initial guess for the effective potential.
In our work, we aim to eliminate the fitting parameters and to circumvent the iterative fitting procedure. In our approach, the LIPS can be treated solely in reciprocal space and thus there is no need to perform Fourier-Bessel transform which requires elaborate interpolation in reciprocal space.
The first step is to find an effective potential V eff (r) that reproduces a target electron density generated by the KSDFT calculations using NLPSs. Among many ways to invert the KS equation, we adopt the Kadantsev-Stott method [39] based on the Haydock-Foulkes variational principle [72], which does not require a particularly good initial guess for the local pseudopotential. In this method, a functional Υ[V eff (r)] is defined as where ǫ i are eigenvalues corresponding to the KS orbitals {φ i }, and ρ target (r) is the target electron density obtained by the KSDFT calculation with NLPSs, which is to be reproduced here. The Haydock-Foulkes variational principle ensures that the true effective potential corresponding to the target density satisfies the stationary condition δΥ[V eff ]/δV eff = 0.
In the k-th iteration of the minimization of Υ[V ], the KS equation is solved inversely using the V (k−1) eff (r) in the previous iteration to obtain the density ρ (k) (r). We note that the functional derivative of Υ satisfies an identity δΥ[V ]/δV = −ρ(r) + ρ target (r). Hence in the k-th iteration, V (k) eff (r) is explored along a line defined by and α 1 is determined by the line search. We iteratively continue this process until σ, becomes less than the preset value. We have implemented this variational minimization method in the RS-DFT code [30][31][32].
In actual computations, we first generate the target bulk electron densities by solving the KS equations using NLPSs. We then invert the KS equation using these target bulk electron densities to obtain the local effective potentials V eff (r) for these targets using the Kadantsev-Stott method explained above. The settings of k-point grids and real-space grid spacing used in the inversion are the same as those used in generating the target bulk electron densities.
Next, the Hartree potential V H (r) and exchangecorrelation potential V xc (r) among valence electrons are subtracted from V eff (r) to extract the local ionic potential for the bulk: We then convert V ion bulk (r) into the atom-centered local ionic pseudopotential in reciprocal spaceṼ ion loc (G) as follows. First, V ion bulk (r) is Fourier-transformed into reciprocal space as where Ω is the unit cell volume. Then using the structure factor S(G) of the target bulk material, we obtain the Fourier transformed atom-centered ionic pseudopotential: Finally, by spherically averaging V ion (G), we obtain the atom-centered local ionic pseudopotential in reciprocal space,Ṽ ion loc (G): where N G is the number of G vectors with the length being G. The procedure to constructṼ ion loc (G) explained above ensures that the resultingṼ ion loc has the proper Coulombic tail due to the nucleus and core electrons in real space. Hence we fitṼ ion loc (G) numerically obtained above to the form, which corresponds to the Fourier-inversed-transform, where c core 1 + c core 2 = 1 [35,73]. The KSDFT calculations to obtain the target bulk densities were performed using the RSDFT code [30][31][32]. We adopted the generalized gradient approximation of Perdew, Burke, and Ernzerhof [33] for the exchangecorrelation functional. We used Troullier-Martins (TM) NLPSs [74,75]. As target densities, we chose the densities of ds-Si, ds-C, bcc-Li, bcc-Na, fcc-Al, fcc-Cu, and NaCl with the lattice constants of 5.466Å, 3.560Å, 3.430 A, 4.212Å, 4.051Å, 3.634Å, and 5.694Å, respectively, all of which are equilibrium values within the KSDFT scheme using NLPSs. The k-point sampling grids were taken to be 4 × 4 × 4 in all calculations. The real-space grid spacing was 0.190Å for Si, 0.178Å for C, 0.172Å for Li, 0.211Å for Na, 0.169Å for Al, 0.151Å for Cu, and 0.190Å for NaCl.
We now assess the quality of our LIPSs. We compare the electron densities of bcc-Li, fcc-Al, fcc-Cu, bcc-Na, NaCl, ds-Si, ds-C, and zincblende(3C)-SiC calculated with the NLPS and our LIPS. We denote the density obtained by the NLPS as ρ NLPS and that by our LIPS as ρ LIPS . Figures 7,8,and 9  . For Al and Si, the data from BLPS is also available. Hence we plot the density obtained by the BLPS, labelled as ρ BLPS in Figs. 7 and 9. Figures 7, 8, and 9 clearly show that the electron density obtained by our LIPS well reproduces the density by NLPS. The density by BLPS also seems to be good enough. For Al [ Fig. 7 (b)] and Si [ Fig. 9 (a)], it appears that |ρ LIPS − ρ NLPS | is generally smaller than |ρ BLPS − ρ NLPS |. As for diamond C and SiC, the calculated ρ LIPS satisfactorily reproduce the characteristic features of ρ NLPS : The peculiar double peaks between the C-C bonds in diamond C and the substantial iconicity in SiC (Fig. 9). The maximum deviation of ρ LIPS − ρ NLPS for diamond is 4.38 × 10 −2 bohr −3 at the C nuclear site for diamond C. For SiC it is 4.49 × 10 −2 bohr −3 again at the C nuclear site.
We also evaluate the quality of our LIPSs by calculating the total energy as a function of the volume E(V ) for ds-Si, ds-C, bcc-Li, bcc-Na, fcc-Al, fcc-Cu, NaCl, and zincblende(3C)-SiC. The obtained E(V ) is fitted to Murnaghan's equation of state [76] to deduce the equilibrium lattice constant a 0 and the bulk modulus B 0 . The results are shown in Table X along with those obtained by the NLPSs. The results from the BLPS for Li, Al, and Si are also tabulated.
It is clear that the structural properties of the 8 different materials produced by our LIPSs are as accurate as those from the NLPSs. The electron densities and the structural properties obtained above certainly assures the reliability and the transferability of the frozen-core approximation (pseudopotential scheme) using our LIPSs.

Appendix C: Training details
We here determine the training hyperparameters, η and ν, appearing in the NN weight updating formula Eq. (11). We have first performed a grid search for optimum constant values of η and ν in the range of 10 −4 ≤ η ≤ 10 and 10 −8 ≤ ν ≤ 1, and found that a pair of constants, (η = 0.1, ν = 10 −5 ), provides the smallest RMSE √ 2L of 0.228. Since the ordinary gradient scheme represented by ν is helpful only at early stages of training, we introduce proper time scheduling of ν, keeping η = 0.1 constant. We consider three options for ν(t), ν(t) = ν 0 /(1 + bt), ν 0 / √ 1 + bt, and ν 0 exp(−bt), where t is the number of the training epoch and ν 0 = 10 −5 . We have found that ν = ν 0 /(1 + bt) with b = 10 −2 provides the smallest RMSE and adopted this scheduling in the present work. Figure 10 shows variation of the cost function L during the optimization for the case of the fixed A = 10 1.5 (see Appendix F below). It is noteworthy that the training of only 100 epochs makes the cost function small by three-orders-of-magnitude, being less than 0.1 Ha 2 . It is shown that the cost function becomes smallest at the 1400-th epoch. We thus adopt the NN weight at the 1400-th epoch as the final trained weight. The total computational time for the optimization of the NN weights is typically less than 30 minutes on usual house computers, which is the orders-of-magnitudes shorter than the computational time for the SCF calculations. We note that 3000 training points are sufficient to minimize the cost function to the value less than 0.06 Ha 2 .

Appendix D: Functional derivative training
In order to calculate ∂L/∂W , we need to obtain several derivatives, ∂F NN /∂W , ∂(∂F NN /∂(s 2 ))/∂W and ∂(∂F NN /∂q)∂W . In this work, the first derivative is obtained by the conventional backpropagation, and the second and the third derivatives are obtained by the backpropagation for the derivatives [49,50]. Here we briefly overview the algorithm of the backpropagation in order to explain how to compute the partial derivative of NN outputs with respect to the NN weights W . The number of the neuron in the final layer is D N = 1 in our case. In the conventional backpropagation, our aim is to compute ∂z Using the quantity δ (l) j that satisfies the recursive formula we can obtain ∂z By introducing a quantity ζ (l) jr that satisfies the recursive formula we can compute ∂γ ji starting from ζ (N ) jr = 0 since we take the activation function in the output layer as an identity function, z Appendix E: Determination of the enhancement factor F (0) (s 2 , q) We have optimized β in the enhancement factor F (0) (s 2 , q), Eq. (13), so that the inverse of the response function derived from this enhancement factor in the homogeneous-gas limit, reproduces the Lindhard function: Here η = k/(2k F ) with k F being (3π 2 ρ) 1/3 . To this end, we have discretized η and defined 10 4 points η i ∈ [10 −5 , 4]. Then, we have performed the least square fitting that minimizes the cost function C which results in the optimized value, β = 0.382. The obtained χ 0 (η) is compared with χ Lind (η) in Fig. 11.

Appendix F: Determination of subspace decomposition parameter A
The parameter A which defines our subsystem DFT functional (Eq. (14)) can be determined by minimizing the cost function L (Eq. (8)). In the case, the metric tensor G ik (Eq. (12)) is expanded to (n W + 1) × (n W + 1) dimension by adding the components related to ∂[δT NN (r p )/δρ]/∂A. We have indeed minimized the cost function with the expanded metric tensor and determined the parameter A. In the actual minimization, we have performed grid search for the initial values of A m = 10 0.5m with m being integers from 0 to 9 (a typical logarithmic grid) and the subsequent neural-network minimization of the cost function L. We have found that the optimized value A opt depends on the initial choice A m as shown in Table XI, indicating that the cost function L shows multi-stability as a function of A. This is presumably due to a fact that the NN weights {W i } rather than A are decisive to determine the cost function. The computed RMSE of our KEFD with respect to the KS-KEFD, i.e., √ 2L itself, for each value of A opt are tabulated in Table XI. For A = 10.549, √ 2L has the minimum value of 0.269 Ha although it increases only by 7% for A = 31.76.
Observing the insensitivity of the cost function to the parameter A, we have also performed the minimization of the cost function with the original metric tensor with the dimension of n W with the fixed value of A = A m . The obtained √ 2L is shown in Table XI. The √ 2L values are comparable with those obtained for A opt and the minimum value 0.264 Ha for A m = 10 is even smaller than the value 0.269 Ha for A = 10.549 above. This is presumably because we have succeeded to optimize {W i } better with the fixed choice of A. This may indicate the limitation of the present SNGD optimization scheme for the two parameter sets with different mathematical structures.
Having observed that the √ 2L values are insensitive to the value of A, we introduce another criterion to determine the A value. That is to minimize the difference of the electron density ρ SCF obtained by the SCF calculations in our OFDFT scheme with each A and {W i } values from that ρ KS obtained by the KSDFT scheme. The minimum RMSE 1.186 ×10 −2 bohr −3 of those two densities are obtained for the A value of 10 1.5 . This set of parameters, A and {W i } in turn produces value of 0.274 Ha for √ 2L which is just 3.8 % larger than the minimum √ 2L explained above. Recalling that the electron density is the fundamental quantity in DFT, we adopt the value of A = 10 1.5 in the present work. Further examination of the A value remains in future.      In this supplemental material, we show the calculated Self-Consistent-Field (SCF) electron densities obtained by our NN KEDF along with those obtained by using the KEDFs in the past, i.e., PGSL0.25, LKT, and the conventional TF(1/5)vW [TF(λ)vW with λ = 1/5] KEDFs. As is demonstrated by RMSE of the SCF densities with respect to the KS density in Tables II, III, and IV in the main text, our NN KEDFs outperform the previous KEDFs. The SCF density of diamond-structured (ds-) Si shown in Fig. 5 in the main text also shows the superiority of the present NN KEDF. Here Figs. S1-S23 represent the SCF densities of other 23 systems which corroborate the superiority of the NN KEDF: The densities of ds-C, graphene, fcc-Si, β-tin Si, zincblende(3C)-SiC, bcc-Li, fcc-Al, fcc-Cu, bcc-Na, NaCl, Li2, C2, Na2, Al2, Si2, Cl2, Li atom, C atom, Na atom, Al atom, Si atom, Cl atom, and Cu atom.