Quantum deep field: data-driven wave function, electron density generation, and atomization energy prediction and extrapolation with machine learning

Deep neural networks (DNNs) have been used to successfully predict molecular properties calculated based on the Kohn--Sham density functional theory (KS-DFT). Although this prediction is fast and accurate, we believe that a DNN model for KS-DFT must not only predict the properties but also provide the electron density of a molecule. This letter presents the quantum deep field (QDF), which provides the electron density with an unsupervised but end-to-end physics-informed modeling by learning the atomization energy on a large-scale dataset. QDF performed well at atomization energy prediction, generated valid electron density, and demonstrated extrapolation. Our QDF implementation is available at https://github.com/masashitsubaki/QuantumDeepField_molecule.

Quantum chemical simulations, such as Kohn-Sham density functional theory (KS-DFT) calculations have been recently approximated by machine learning (ML) techniques such as kernel methods [1][2][3]. Very recently, deep neural networks (DNNs) [4,5] have been used to successfully predict molecular properties, such as the atomization energy, HOMO, and LUMO, on large-scale datasets. Although this prediction is fast and accurate, there is a problem: a DNN model for KS-DFT must be consistently based on an understanding of the underlying physics and must not only predict the properties but also provide the fundamental quantum characteristics (i.e., the wave function/orbital and electron density) of a molecule. Most existing DNN models, however, cannot provide the electron density because they consider only the atomic coordinates and ignore the molecular field. Furthermore, they are mainly interested in predicting the final output, i.e., the interpolation accuracy of a molecular property within a benchmark dataset, and do not focus on capturing the fundamental characteristics of molecules. This does not lead to learning a physically meaningful model and extrapolation, i.e., prediction for totally unknown molecules in terms of its size and structure that do not appear in the dataset. Extrapolation is important in not only molecular science but also real applications for transferring to other molecules or crystals and predicting their properties [6,7] in materials informatics.
In this letter, we present a simple framework called the quantum deep field (QDF) that provides the electron density of molecules by learning their atomization energies on a large-scale dataset. Crucially, our data-driven QDF framework requires only the molecule-energy pairs (e.g., the QM9 dataset [8]) and does not require the molecule-density pairs for training; in other words, QDF generates the electron density indirectly or in an unsupervised fashion with end-to-end physics-informed modeling. The QDF model involves three linear/nonlinear components: (1) a linear combination of atomic orbitals (LCAO): φ → ψ, where φ is the atomic basis function given by the Gaussian-type orbital (GTO) and ψ is the KS molecular orbital, (2) a nonlinear energy functional: ψ → E, where E is the atomization energy, and (3) a nonlinear Hohenberg-Kohn (HK) map: ρ → V , where ρ is the electron density and V is the external potential. We optimize all parameters of the LCAO, energy functional, and HK map, in which the latter two are implemented by simple DNNs, simultaneously using the backpropagation and stochastic gradient descent (SGD) [9]. We emphasize that learning ψ → ρ → V serves as the physical constraint on learning ψ → E (FIG. 1); this allows QDF to provide valid ψ and ρ (FIG. 2) and leads to high prediction and better extrapolation performance on E.
Initially, a molecule is denoted by , where a m is the mth atom, R m is the 3D coordinate of a m , and M is the number of atoms in M. Given M, a set of the molecular orbitals (or wave functions) is denoted by {ψ 1 (r), ψ 2 (r), · · · , ψ N (r)} = {ψ n (r)} N n=1 , where r is a position in the field and N is the number of orbitals. The LCAO (or the superposition of wave functions) provides the nth molecular orbital: basis function whose origin is R i , and N is the number of basis functions. As the basis function, we use the GTO: where D i = ||r−R i ||, q i is the principle quantum number, ζ i is the orbital exponent, and Z(q i , ζ i ) is the normalization term. Additionally, we use the 6-31G basis set and represent {ψ n (r)} N n=1 = ψ(r) with the N -dimensional vector: where ψ(r) ∈ R N has ψ n (r) as its nth element and c i ∈ R N has c ni as its nth element. Note that the orbital exponents {ζ 1 , ζ 2 , · · · , ζ N } = {ζ i } N i=1 and the coefficient vectors {c 1 , c 2 , · · · , c N } = {c i } N i=1 are randomly initialized and then learned/optimized for predicting the atomization energy using the backpropagation and SGD.
For ψ, we express a DNN-based energy functional F DNN as follows: where E M is the predicted atomization energy of M.
Herein we use a simple feedforward architecture for the implementation of F DNN , in which the DNN models the interaction between ψ n and ψ m . Finally, we minimize the loss function: where E M is the atomization energy of M in the QM9 dataset. Details about F DNN and its optimization are described in Supplementary. Unfortunately, only minimizing L E does not lead to learning a physically meaningful model; because the DNN has a strong nonlinearity, such a powerful F DNN [ψ] will output the correct atomization energy E M even if ψ is not valid. In other words, the model does not guarantee the KS orbitals ψ(r), which means that ψ(r) cannot provide the correct electron density by ρ(r) = N n=1 |ψ n (r)| 2 . To address this problem, we impose a constraint on ψ(r) based on the HK theorem, which ensures that the external potential V (r) is a unique (i.e., a nonlinear but one-to-one correspondence) function of the electron density ρ(r), i.e., V (r) ↔ ρ(r). Specifically, we implement the constraint by a nonlinear map ρ(r) → V (r), which we refer to as the HK map [2,10] and learn the nonlinearity using a simple DNN.
Formally, we consider a Gaussian external potential [2,11]: where Z m is the nuclear charge of a m . We assume V M (r) to be the correct external potential of M; that is, V M (r) is used as a target for minimizing loss in the model. Additionally, the electron density is given by For ρ(r), we express a DNN-based HK map HK DNN as follows: where V M (r) is the predicted external potential of M. This letter uses a simple feedforward architecture for the implementation of HK DNN . Finally, we minimize the loss function: Details about HK DNN and its optimization are described in Supplementary.
Note that the HK map used in [2] learns V (r) → ρ(r) by a supervised kernel method, where V (r) is the input potential and ρ(r) is the target density to be learned. In contrast, our HK map is different; that is, the direction is opposite, i.e., ρ(r) → V (r), where the input density is  [5]. We note that these results can vary and SchNet may outperform QDF with careful tuning of its hyperparameters; however, our main aim herein is not to build a competitive model with regard to the interpolation performance within a single benchmark dataset. ρ(r) = N n=1 |ψ n (r)| 2 , ψ n (r) is obtained by LCAO, and V (r) is the target potential to be learned.
Furthermore, please note that as a total learning algorithm of QDF, we minimize Eq. (5) and Eq. (9) alternately in an end-to-end fashion and optimize all parameters of the LCAO, F DNN , and HK DNN using the backpropagation and SGD. We believe that this algorithm involving the HK map constraint allows QDF to function as a self-consistent learning machine for KS-DFT.
To evaluate the QDF performance, we first describe the learning/prediction results of atomization energy E. For training and testing, we used the QM9under14atoms dataset, which is a subset (15,000 samples of relatively small-sized molecules) of the full set (130,000 samples) of the QM9 dataset (see Supplementary). FIG. 3(a) displays the prediction errors, mean absolute error (MAE, where lower is better) in units of kcal/mol, of our QDF and a baseline model in the form of a learning curve. As the baseline, we implemented a variant of graph neural networks (GNNs) [12], which uses the molecular structure (i.e., the types and 3D coordinates of constituent atoms) alone, does not consider the molecular field, and has hierarchical structure and strong nonlinearity in the modeling phase of the molecular structure. Compared to the GNN, our QDF was able to predict E and its er- FIG. 4. The electron densities, generated by our QDF and calculated by the B3LYP simulation, on the chemical bonds. We extracted 1,000 points between the two atoms on the xaxis and displayed the intensity (i.e., the normalized density or potential value) on the y-axis.
ror was much closer to chemical accuracy (1.0 kcal/mol). Additionally, FIG. 3(b) describes how the accuracy improves as the number of training samples increases. The resulting curve was almost linear, and this result is useful for estimating how many training samples are required to achieve a desired accuracy. TABLE I shows the final prediction errors and model sizes of the GNN, QDF, and others [4,5] as references. SchNet, which is a variant of the earlier proposed deep tensor neural network (DTNN), is now a standard stateof-the-art deep learning model. We argue that our implemented GNN is not a weak baseline because it achieved a reasonable performance that was competitive with that of DTNN. Additionally, we believe that QDF outperformed (or was competitive with) SchNet in terms of the prediction error; both MAEs were close to 1.20 kcal/mol. In terms of the model size, however, SchNet has more than 1.5 million learning parameters; in contrast, QDF, which has less than half a million parameters, is much more compact.
From the test dataset, we extracted 10 molecules and visualized their electron density maps in FIG. 2 using Mayavi [13]. We first observed that the electron density ρ is higher at points around C, N, O, and F, which have high electronegativity (red to green) than that at points around H, which has low electronegativity (blue), indicating that ρ transfers from H to C, N, O, and F were qualitatively reproduced by learning the atomization energy E. Additionally, ρ increases between two atoms; for example, ρ between two carbon atoms in C 2 H 2 , C 6 H 5 OH, and other molecules are high, indicating that the formation of double and triple bonding was also reproduced. Furthermore, ρ is much higher on the bonding of F atoms. We believe that QDF has the potential to generate valid ρ even if the model is trained only with respect to E.
We compared the electron densities obtained by our QDF and a hybrid-functional (B3LYP) simulation, in which we focused on the density profiles along the chem- ical bonds of ethane (CH 3 CH 3 ) and benzene (C 6 H 6 ) shown in FIG. 4. We first found that the QDF density (red line) could capture the characteristic double peak features of the B3LYP density (yellow dotted line) on the C -C bond in CH 3 CH 3 and the C --C bond in C 6 H 6 . Additionally, comparing CH 3 CH 3 and C 6 H 6 , the difference with respect to the sharpness of these double peaks could also be reproduced.
Furthermore, we compared the net charge of an atom in a molecule obtained by our QDF with that obtained by the B3LYP simulation and estimated by Bader analysis [14]. Indeed, Bader analysis cannot be directly applied to the current QDF because we generated the grid points around the atoms assuming a spherical distribution (see FIG. 2 and Supplementary). Therefore, we estimated the net charge of the C atom by summing the electron densities inside a sphere with a specific radius, which is half the length of the C -C bond in CH 3 CH 3 (0.76Å) and the C --C bond in C 6 H 6 (0.70Å). Each estimated net charge is as follows: C in CH 3 CH 3 : 4.05 (B3LYP by Bader), 3.87 (QDF by sphere), and its error is −4.44%; C in C 6 H 6 : 4.11 (B3LYP by Bader), 3.89 (QDF by sphere), and its error is −5.35%. These indicate that our QDF could quantitatively reproduce the electron density distribution when compared with that calculated by the B3LYP.
Here, note that we simply followed the work of Brockherde et al. [2] and used the Gaussian external potential as a target for learning the HK map. Although the external potential V is not limited to a Gaussian, the current V between two atoms (blue dotted line in FIG. 4) seems to be reasonable for reproducing the double peak of the B3LYP density. We also believe that this Gaussian potential should be improved because our results could not reproduce the density close to the nucleus (green circle in FIG. 4). Hence, another V and atomic orbital, e.g., a Slater-type orbital (STO), would be more suitable. Overall, FIG. 4 demonstrates the viability of QDF for generating valid ρ in an unsupervised fashion.
Lastly, using a more practical evaluation setting, we present evidence that QDF can capture the molecular orbital/wave function and electron density, i.e., the fundamental characteristics of molecules, from a large-scale dataset. We believe that the following result is an interesting finding of this study that has the potential to facilitate extrapolation, which is difficult to solve by general ML approaches in principle.
We assume that if an ML model could capture the fundamental characteristics of data, the model can be used to conduct a prediction for totally unknown data, i.e., perform an extrapolation. This study evaluated an extrapolation as follows: we trained a model with small molecules and then tested it with large molecules. Specifically, we trained the QDF model with small molecules consisting of fewer than 14 atoms (15,000 samples; the prediction performance was already shown in FIG. 3(a) and TABLE I) and then tested it with large molecules consisting of more than 15 atoms (115,000 samples), which is the remainder of the QM9 dataset (see FIG. 5(a)).
FIG . 5(b) shows the results of this large-scale extrapolation evaluation. The accuracy achieved by GNN is the same as (or superior to) that of the QDF in interpolation; however, the QDF could maintain this accuracy even when the molecular size increases in extrapolation, whereas the GNN could not. Actually, the GNN and its variants have strong nonlinearity in the modeling phase of the molecular structure. By eliminating such nonlinearity using LCAO and imposing the HK map constraint, QDF does not suffer from overfitting. We believe that this is evidence that QDF can capture the fundamental quantum characteristics of molecules that are indepen-dent of the system. FIG. 5(c) shows the error analysis on QDF. We found that the molecules with large errors (over 10 kcal/mol) are often sterically strained or have a cage-like structure and are polarized. However, the number of such molecules in the QM9 dataset is very small, e.g., the ratio of polarized molecules is less than 0.3%. In contrast, although the ratio of molecules including F atom(s) is less than 1.5%, their errors were not large, approximately 0.1 kcal/mol, because these molecules have a stable ring structure and are not polarized; this shows the robustness of QDF.
This study proposed the QDF framework, which is different from other deep learning approaches; in other words, QDF is not an extension of existing DNN models. Our aim was to design a simple DNN model without incorporating complicated techniques/architectures. To model the electron density, some ML approaches have been proposed [15][16][17][18][19]; in particular, we were inspired by Grisafi et al. [15]. They (1) created an electron density dataset including 1,000 configurations of a few kinds of small molecules (C 4 H 6 and C 4 H 10 ), (2) learned a Gaussian process (GP) model for ρ (i.e., supervised learning) of these molecules, and (3) evaluated the transferability of the learned model by predicting the ρ of large molecules (C 8 H 10 and C 8 H 18 ). In contrast, our QDF can be viewed as an unsupervised model to reproduce ρ using only a large-scale dataset of atomization energy E (not ρ). Additionally, we demonstrated the large-scale extrapolation in predicting E. A supervised model as GP would be relatively easy to train and superior to QDF in predicting ρ because the unsupervised QDF is a model closer to generative adversarial networks [20] and would be unstable when generating ρ.
Furthermore, QDF can also be viewed as one of the approaches such as the physics-informed, Hamiltonian, Fermionic neural networks [21][22][23][24]; these solve the physical problems and equations using physically meaningful modeling. QDF is designed as a self-consistent machine to solve the KS equation with minimal (three) learning and physical constraint components: LCAO, F DNN [ψ], and HK DNN (ρ). We believe that integrating a supervised model with a dataset of the electron density [25] (i.e., ρ in FIG. 1 is given as a target) and an unsupervised but physically informed and meaningful model with a dataset of the atomization energy, HOMO-LUMO gap, and other properties [26] will yield an interesting hybrid ML model. QDF will admit many extensions (e.g., for crystals [27]) and applications (e.g., for transfer learning to solve more practical problems in materials informatics [6,28,29]), and our research position and future directions could prove useful (see Supplementary).

SUPPLEMENTARY Dataset
The QDF model was trained using the QM9 dataset [8], which contains approximately 130,000 samples of small, stable organic molecules made up of H, C, N, O, and F atoms, along with 13 quantum chemical properties (e.g., the atomization energy, HOMO, and LUMO) for each molecule. These molecular properties were obtained by a DFT calculation (Gaussian 09) at the B3LYP/6-31G(2df,p) level of theory. However, the model could not be trained using all 130,000 molecules owing to the computational cost of processing a large number of grid points in each molecular field. Therefore, this study used a subset of the QM9 dataset with a limited number of atoms M 14 per molecule, which we refer to as the QM9under14atoms dataset in the main text. The number of samples in the QM9under14atoms dataset was approximately 15,000 molecules. We randomly shuffled and split the QM9under14atoms into training/validation/test sets as 8/1/1, in which the validation set was used to tune the model hyperparameters (see the Hyperparameters section below). We calculated the MAE in TA-BLE I (in the main text) by taking the mean of all results obtained by 10-fold cross validation (CV) on the QM9under14atoms dataset. Additionally, we chose a result from the CV results and display its learning curve in FIG. 3(a) in the main text. For the extrapolation evaluation, we used the QM9over15atoms dataset (i.e., M 15), in which the number of samples was approximately 115,000 molecules, which is 10 times more than that of the QM9under14atoms (training) dataset.
We emphasize that since QDF requires only the molecule-energy (or other properties) pairs for training, we can use the popular QM9 dataset. This has the advantage that existing large-scale datasets can be easily leveraged and we could use another one such as the Alchemy dataset [26]. Very recently, a large-scale electron density dataset was created [25] for supervised learning; however, we believe that QDF, which is an unsupervised method for learning electron density, is a reasonable proposition, and integrating the supervised/unsupervised approaches with the density/energy datasets could lead to an interesting hybrid model in the future.

Molecular field definition
Given a molecule M = {(a m , R m )} M m=1 , we consider spheres with a radius of SÅ, where each sphere covers each atom centred on R m , and then divide the sphere into grids (or meshes) in intervals of GÅ. This process yields many grid points in M, as shown in FIG. 2 in the main text. The grid-based field of M is denoted by where r j is the 3D coordinate of the jth point and F M is the number of points in F M .

Architectures of F DNN and HK DNN
To implement F DNN described in the main text, given {ψ(r j )} F M j=1 , where each ψ(r j ) denotes the KS orbitals obtained by LCAO on r j in the above defined grid field, we first consider the following feedforward architecture: where = 1, 2, · · · , L is the number of hidden layers (ψ (1) (r j ) = ψ(r j ) and L is the final layer), ReLU is the nonlinear activation function ReLU(x) = max(0, x), W ( ) E ∈ R N ×N is the weight matrix in layer , and b ( ) E ∈ R N is the bias vector in layer . We then sum over {ψ (L) (r j )} F M j=1 and output an atomization energy with the following vanilla linear regressor: where w E ∈ R N is the weight vector and b E ∈ R is the bias scalar. FIG. 6 illustrates the architecture of F DNN . As this figure makes clear, each layer models the interaction between the nth and mth KS orbitals in the vector. We implement HK DNN described in the main text using a similar feedforward architecture: Thus, HK DNN maps a scalar ρ to another scalar V . In this map, each hidden layer h ∈ R N is an N -dimensional vector, where N is the number of hidden units and a hyperparameter of the model. FIG. 7 illustrates the architecture of HK DNN .

Optimization
Using the backpropagation and an iterative SGDbased learning algorithm, we minimize the loss function L E in the main text; in other words, The set of KS orbitals obtained by LCAO 6. Architecture of the DNN-based energy functional FDNN described in the main text.
we update the set of learning parameters where L E M k is the atomization energy loss value of the kth molecule M in the training dataset, α is the learning rate, and B is the batch size. Additionally, we also minimize the loss function where L V M k is the external potential loss value of the kth molecule M in the training dataset. In practice, the SGD in this study used the Adam optimizer [9]. Note that we update Θ E and Θ V alternately and the learning parameters in LCAO, i.e., {ζ i } N i=1 and {c i } N i=1 , are shared in Θ E and Θ V .

Normalization
In LCAO, we consider the normalization for the coefficients, i.e., N i=1 c 2 ni = 1. This can be implemented by updating c n in the iterative SGD-based learning algorithm as follows: where c n ∈ R N is not the nth row vector (i.e., c n ) but the nth column vector of the coefficient matrix (see our implementation). Additionally, the normalization term in GTO is calculated as follows: Note that because each orbital exponent ζ i is a learning parameter of the model, Z(q i , ζ i ) is recalculated every time the model parameters are updated. Furthermore, we must consider the total electrons: ρ(r)dr ≈ where N elec is the total electrons in M. We implemented this by updating ψ n in the iterative algorithm as follows: The set of KS orbitals obtained by LCAO where ψ n ∈ R F M is not the jth row vector (i.e., ψ(r j ) ∈ R N ) but the nth column vector of the matrix in FIGs 6 and 7 (see our implementation).

Hyperparameters
All model/optimization hyperparameters and their values used in this letter are listed in TABLE II (see also our source code). It is important to note that N , which is the dimensionality of coefficient vector or the number of atomic basis functions in LCAO, is actually different for each molecule in quantum simulations. However, an ML model needs to set a common global N for all molecules in a dataset and the QM9 dataset contains only small organic molecules made up of less than 30 H, C, N, O, and F atoms. Considering the maximum size of molecules in the QM9 dataset and a standard 6-31G basis set, we set N = 200.

Limitations and future directions
Our current QDF framework also comes with some limitations relative to the DFT calculations and other ML/DNN approaches.
QDF and other ML/DNN approaches using such as the Coulomb matrix [1] and SchNet [5] require a 3D (e.g., the DFT-relaxed) structure of molecule in the first place; in other words, these do not involve the molecular structure optimization process and cannot be a complete replacement of the DFT calculations. We believe the development of ML/DNN models including the molecular structure optimization process is interesting but very challenging in terms of learning the model and optimizing the structure; we leave it for future work.
To avoid the use of the 3D structure required in the Coulomb matrix, SchNet, and our QDF, we can use the molecular descriptors based on chemical composition data. Indeed, Faber et al., 2017 [3] conducted a comprehensive evaluation of various ML models with various descriptors on the QM9 dataset. In this evaluation, the prediction performance for the atomization energy of the kernel ridge regression (KRR) as the ML model and the molecular fingerprints as the descriptor was very poor; the error was 4.25 eV = 98.0 kcal/mol. Faber et al. also reported that the performance of the GNN [12] using the molecular graph was significantly better than that of KRR. However, the molecular graph is based on the adjacency (i.e., binary) matrix describing the chemical bonds (e.g., C --O and N -H); this implies that GNNs consider the atomic wave/basis functions as the binary (i.e., 0 or 1) values. Additionally, the deep nonlinearity in GNNs destroys the linearity in LCAO. Furthermore, as we have described in the main text, these characteristics degrade the extrapolation performance compared to our QDF. From these observations, we believe that the 3D structure, not the fingerprint and the graph, of molecules is required to improve prediction and extrapolation performance.
Thus, while it is better to achieve high prediction and extrapolation performance using only the fingerprint and the graph in terms of cost and simplicity, current ML models struggle to achieve this without the 3D structure. However, we believe that our QDF can be adapted to the molecular graph based on the atomic distances focused on the chemical bonds obtained by SMILES and a software such as RDKit (https://www.rdkit.org/). Using the atomic distances or chemical bond lengths, our QDF can learn the atomization energy; this will be a "compromise" between the current QDF with the DFT-relaxed 3D structure and the GNN with the molecular graph, which can be called a graph-based QDF. We believe that if the graph-based QDF achieves reasonable performance compared to the current QDF, our QDF framework can be used as a practical high-throughput prediction method without the 3D structure.
While our implemented QDF model fulfills the normalization condition in the atomic and molecular orbitals (precisely, the basis functions and the KS orbitals), the model does not satisfy the orthogonality of the orbitals. We believe that it is difficult for an ML model to consider the orthogonality in terms of learning cost, e.g., we would additionally need to learn the orthogonality for all orbital pairs. Fortunately, however, we obtained qualitatively valid electron density from the current orbitals, so this study did not consider the orthogonality.
In QDF, the set of orbital exponents {ζ i } N i=1 and coefficient vectors {c i } N i=1 of LCAO are the common global learning parameters for all molecules; on the other hand, these actually differ for each real molecule (i.e., characterized with the atomic environment) in quantum simulations. This is considered in GNN variants (e.g., the interaction blocks in DTNN and SchNet) and ignoring this characteristic may be a limitation of the current QDF. However, we also believe that such global parameters prevent the model from being too flexible for each molecule. Considering this, our implementation would be reasonable in terms of reducing the model parameters and complexity, leading to robust extrapolation.
The selection of the basis set is critical for the pre-diction accuracy of QDF, which is the same as for the computational accuracy of DFT. This study used the 6-31G basis set considering the level of theory of the QM9 dataset; of course, QDF can use other basis sets, such as the 6-311G, 6-31+G(d,p), and STO not GTO, for improving the performances of atomization energy prediction/extrapolation and electron density generation. Additionally, although this study focused on the isolated molecule, for modeling the crystal structure QDF must use the plane wave basis set. Recently, the crystal graph convolutional neural network (GCNN) has been proposed [27] and successfully learned and predicted the formation energy, band gap, and other properties of crystals of over 60,000 samples in the Materials Project database [7]. The crystal GCNN and its variants, however, do not consider the plane waves and LCAO (i.e., tight-binding approximation); we believe that QDF can be extended to crystals and will replace the GCNN. In the crystal modeling, the selection of the plane wave basis set is also critical for the prediction accuracy of QDF. The nonlinear components in the current implementation are most straightforward; in other words, the model used in this study is the simplest special case of the QDF framework. As shown in FIGs 6 and 7, we implemented F DNN and HK DNN by a feedforward architecture; however, these can be further extended. Indeed, both DNNs do not use any additional architectures and techniques, e.g., residual networks, drop out, and batch normalization, and these will be effective for QDF. Additionally, the current HK map is V = HK DNN (ρ), that is, this is a local density approximation (LDA) fashion and this study used it as a first step. We emphasize that even using this LDA-like HK map, we achieved high accuracy in terms of predicting/extrapolating the atomization energy and generating the electron density. Of course, this can be improved. For example, instead of V = HK DNN (ρ), we can consider a generalized gradient approximation, i.e., the GGA-like HK map V = HK DNN (ρ, ∇ρ). Furthermore, as we have described in the main text (see also comparison of the potentials in FIG. 8), the Gaussian external potential should also be improved and this can be addressed relatively easily because we only consider and set another external potential and learn it as a new target.
QDF requires much more training time and memory than GNN, because of the 6-31G basis set and all points in the fine grid fields of molecules. Indeed, GNN can use a large batch size (e.g., B = 64 and B = 128); however, QDF can use B = 4 at most and this did not allow us to train the model on all 130,000 molecules of the QM9 dataset. We believe that training a QDF model on a larger number of data samples will require the efficient use of dozens of GPUs.
Although QDF has the above limitations (e.g., as a replacement model of the DFT calculations), the important point is that QDF has many extensions and applications; in particular, we believe that QDF will provide transfer learning applications to solve practical problems in materials informatics. For example, the pre-trained QDF model with the atomization energy of the largescale QM9 dataset can be used for transfer learning and predicting of other properties (e.g., the HOMO, LUMO, and their gap) universally. The reason is that the model can encode information about the fundamental quantum characteristics (i.e., ψ and ρ) of molecules by learning the HK map constraint, which is shown by the electron density comparison with the DFT calculation and the extrapolation evaluation in predicting the atomization energy of large molecules. We now plan to transfer the pretrained QDF model to other datasets and properties such as the HOMO-LUMO gap of much larger molecules, i.e., polymers [6], extend the current implementation to crystal structure data in the Materials Project database [7], and evaluate the extrapolation in predicting the polymer and crystal properties.

Data and code availability
Our QDF implementation in PyTorch, including the preprocessed QM9 dataset, is available at https:// github.com/masashitsubaki. Model extensions can be created by forking this source code.