Active learning and element embedding approach in neural networks for infinite-layer versus perovskite oxides

Combining density functional theory simulations and active learning of neural networks, we explore formation energies of oxygen vacancy layers, lattice parameters, and their correlations in infinite-layer versus perovskite oxides across the periodic table, and place the superconducting nickelate and cuprate families in a comprehensive statistical context. We show that neural networks predict these observables with high precision, using only 30-50% of the data for training. Element embedding autonomously identifies concepts of chemical similarity between the individual elements in line with human knowledge. Based on the fundamental concepts of entropy and information, active learning composes the training set by an optimal strategy without a priori knowledge and provides systematic control over the prediction accuracy. This offers key ingredients to considerably accelerate scans of large parameter spaces and exemplifies how artificial intelligence may assist on the quantum scale in finding novel materials with optimized properties.

Over the last years, artificial intelligence (AI) algorithms have attracted increasing attention in computational materials science. Machine learning techniques [1][2][3][4][5][6][7][8][9][10] such as deep learning [11][12][13][14] allow for a variety of intriguing and often unconventional approaches, ranging from applications in molecular dynamics [15] and the unsupervised identification of latent knowledge in scientific literature [16] to the understanding of chemical trends from materials data [17,18]. In parallel, increasing computational resources have driven high-throughput searches to identify novel materials with enhanced properties, which resulted in the emergence of different materials databases [19][20][21][22]. However, screening large parameter spaces by quantum-scale materials simulations, e.g., employing density functional theory (DFT), is still impeded by high energy and time consumption.
Aiming for a more efficient strategy, here we complement systematic first-principles simulations across the periodic table with deep learning of artificial neural networks (NNs). We use the topical infinite-layer oxides (IL, ABO2) [23][24][25][26][27][28][29][30][31][32][33][34][35][36][37][38][39] and the respective perovskites (P, ABO3) to show that NNs are capable of learning the formation energies of oxygen vacancy layers as well as the lattice parameters of the individual compounds. These observables act as a fingerprint of the reduction reaction. Hence, despite the complexity of these two materials classes and their relations, as evidenced by detailed statistical analysis, NNs autonomously unravel the systematics of their quantum-chemical bonding using just 30-50% of the data for training. Subsequently, they predict the properties of all compounds, even those they have never seen, with high accuracy, well within the error bars of DFT itself. Interestingly, it turns out to be sufficient to provide only the A- and B-site element names as input to the NNs, without any further atomic properties. Element embedding [17,40] leads to the emergence of a unique AI understanding of the chemical relations between the individual elements that mirrors the conventional picture of the periodic table. Finally, we show that combining these techniques with active learning [2,41] allows for an efficient screening of the materials parameter space, being clearly superior to a randomly selected training set and providing systematic accuracy control. We provide detailed visual insight into the algorithm's working mechanisms and its performance, exemplifying the potential of AI to considerably accelerate high-throughput materials optimization.

Methodology.
We performed first-principles simulations [42][43][44][45] to construct a database of ground-state energies and optimized lattice parameters for 4692 combinations of different elements at the A and B sites (as detailed below) for both the P and the IL oxides, which were modeled using cubic [8] and tetragonal [28,30] unit cells, respectively. We adopted the DFT+U standards of the Materials Project database [19,46,47]. As a difference, rare-earth 4f electrons were consistently frozen in the core [24,30,34,38,48]. For the elemental bulk references, we used the Materials Project ground-state crystal structures and energies, recalculating 4f and finite-U compounds to ensure consistency. NNs were realized in Keras/TensorFlow 2 [49,50], and the active-learning algorithm was developed in Python 3. The formation energies of the oxygen vacancy layers are determined from DFT ground-state energies by E_f^VO = E(ABO2) + (1/2) E(O2) − E(ABO3), where (1/2) E(O2) models the oxygen-rich limit [51]. The heats of formation of the P phase from the constituent bulk elements read E_f^P = E(ABO3) − E(A) − E(B) − (3/2) E(O2). All energies are given per formula unit.
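These two energy definitions amount to simple per-formula-unit bookkeeping of DFT ground-state energies, which can be sketched as follows. The numerical values in the example are purely illustrative placeholders, not DFT results from this work:

```python
def vacancy_formation_energy(E_IL, E_P, E_O2):
    """E_f^VO = E(ABO2) + 1/2 E(O2) - E(ABO3), per formula unit.

    The (1/2) E(O2) term models the oxygen-rich limit: one oxygen atom
    is removed per formula unit upon the P -> IL reduction.
    """
    return E_IL + 0.5 * E_O2 - E_P


def heat_of_formation_P(E_P, E_A_bulk, E_B_bulk, E_O2):
    """E_f^P = E(ABO3) - E(A, bulk) - E(B, bulk) - 3/2 E(O2), per formula unit."""
    return E_P - E_A_bulk - E_B_bulk - 1.5 * E_O2


# Hypothetical ground-state energies (eV), for illustration only:
E_f_VO = vacancy_formation_energy(E_IL=-30.0, E_P=-35.0, E_O2=-9.0)   # -> 0.5 eV
E_f_P = heat_of_formation_P(E_P=-35.0, E_A_bulk=-3.0, E_B_bulk=-8.0,
                            E_O2=-9.0)                                 # -> -10.5 eV
```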
Data exploration and statistical analysis. We begin by providing an overview of the data set from a thermodynamic, a structural, and a statistical perspective. Fig. 1(a) displays the entire data in an E_f^VO vs. E_f^P phase diagram, comparing the relative stability of the IL and the P structure, as well as their stability with respect to the constituent bulk elements. This is motivated by recent experiments on IL oxides that attracted considerable attention, specifically superconducting nickelates [23,29,35], which are initially stabilized as P films on SrTiO3(001) via heteroepitaxy, followed by a topotactic reduction of the apical oxygen ions. E_f^VO ranges from −8 to +7 eV, while E_f^P covers almost 30 eV. The plot reveals an overall linear trend, correlating the P stability and its reduction energy. However, the data scatters broadly around the regression line E_f^VO = −0.36 E_f^P − 1.37 eV. Superimposing this plot with the Goldschmidt tolerance factor t = (r_A + r_O) / [√2 (r_B + r_O)], calculated from the ionic radii [Fig. 1(b)], reflects that the P stability (moving from right to left) increases with t, reducing again for t > 1. Again, we find that the data scatters broadly around this well-known trend. Structural analysis [Fig. 1(c)] shows that most materials exhibit the tendency to contract vertically upon reduction (up to 50%), expanding simultaneously in the plane (up to 10%) with reduced volume, particularly those materials where the reaction is exothermic (E_f^VO < 0). For some very stable compounds, the changes are rather modest (center of the plot). In sharp contrast, a few materials expand massively in the apical direction (c_0^IL/a_0^IL ∼ 2-3) with 10-20% basal contraction.
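The tolerance factor used above is a one-line computation from ionic radii. The example values (in Å, for SrTiO3) are standard Shannon radii quoted here only for illustration:

```python
import math


def goldschmidt_t(r_A, r_B, r_O=1.40):
    """Goldschmidt tolerance factor t = (r_A + r_O) / [sqrt(2) (r_B + r_O)].

    r_O defaults to the Shannon radius of O2- (~1.40 A).
    """
    return (r_A + r_O) / (math.sqrt(2.0) * (r_B + r_O))


# SrTiO3: r(Sr2+, XII-coordinated) ~ 1.44 A, r(Ti4+, VI) ~ 0.605 A.
# t close to 1 indicates a near-ideal cubic perovskite.
t_STO = goldschmidt_t(1.44, 0.605)
```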
Figure 1(a) places the formally d9 IL nickelates and cuprates in an interesting context (cf. Table I). The nickelates appear as a compact family in the phase diagram, exhibiting a stable P phase while being simultaneously close to the IL regime; palladates [33] and platinates are even more easily reduced. In contrast, the cuprate family extends widely over the IL region. This reflects the naturally preferred four-fold coordinated plaquette structure typical for high-Tc cuprate superconductors. Continuing this series, the formally d9 alkali-metal Zn/Cd/Hg oxides extend this trend even further into the IL region.

Table I. Energies and lattice parameters for selected systems (cf. Fig. 1).

Figures 2(a) and (b) show averaged E_f^VO for either a fixed A or B site, respectively, unraveling site- and element-resolved trends in the relative stability of P and IL phases across the periodic table. At the A site, most of the central transition metals induce strong tendencies towards the planar IL configuration, particularly W. The remaining elements generally stabilize P, specifically Ca, Sr, Ba, Sc, Y, Pb, and the rare-earth metals. We observe a decreasing trend of E_f^VO across the rare-earth metals from 3.2 (La) to 2.2 eV (Lu), and when shifting from the Sc group (including the rare-earth metals, A3+) to the alkali metals (A1+). The B site exhibits a much higher contrast among the different elements: Alkali metals, particularly K, induce the IL phase. The late transition metals (Ni, Cu, Zn groups) largely display an increasingly negative E_f^VO as well, which highlights their tendency towards the planar IL geometry discussed above [Fig. 1(a)]. In contrast, the P phase is clearly preferred by the early transition metals as well as by the aluminates. Also Si favors the formation of P oxides such as Mg^2+ Si^4+ O^2-_3 and Ca^2+ Si^4+ O^2-_3, which are abundant in the lower part of the Earth's mantle [52].
The symmetric matrix in Fig. 2(c) displays the Pearson product-moment correlation coefficients between different observables, ranging from atomic properties of the A- and B-site elements to the energies and lattice parameters as determined from first principles. E_f^P shows a modest dependence on the A site, whereas E_f^VO lacks significant correlations apart from being anticorrelated with E_f^P (−0.8), which reflects the linear trend observed in Fig. 1. a_0^P and a_0^IL correlate predominantly with the B site, particularly r_B (0.8), and are also significantly intercorrelated (0.9). In sharp contrast, c_0^IL exhibits almost no correlations with the other quantities, at most with the Goldschmidt tolerance factor t. While optimized descriptors [6,7,9] may enhance the correlation, this indicates that a nonlinear methodology is required to reliably predict this quantity, which turns out to be challenging, as shown below.
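The quoted anticorrelation can be reproduced schematically with synthetic data scattered around the regression line of Fig. 1(a); the noise amplitude below is a hypothetical choice tuned to yield a coefficient near −0.8, not a fit to the actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for E_f^P (arbitrary units) and E_f^VO scattered around the
# regression line E_f^VO = -0.36 E_f^P - 1.37 eV reported in the text.
E_P = rng.normal(size=500)
E_VO = -0.36 * E_P - 1.37 + 0.27 * rng.normal(size=500)

# Pearson product-moment correlation coefficient, as in Fig. 2(c)
r = np.corrcoef(E_P, E_VO)[0, 1]
```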
Active learning of neural networks. The interesting question arises whether the insights presented so far would have been possible without explicitly calculating the entire data set, but only a fraction of it. We address this aspect by implementing an active learning (AL) algorithm, which constitutes a form of semi-supervised learning [2,41]. Two NNs are trained in parallel [Fig. 3(a)]. They take the names of the elements at the A and B sites as categorical input, which are one-hot encoded and subsequently processed by a 16-dimensional embedding layer [17]. Such element embedding is inspired by word embedding [40], a technique used in language processing to represent words in a semantically insightful way in a vector space of compact dimension. Optionally, the NNs feature a parallel numerical input channel to complement the output of the embedding layer by the atomic radii r_A,B and the electronegativities χ_A,B, which turned out to be largely redundant in view of the more powerful embedding technique. This input layer is followed by a sequence of hidden layers, featuring 512, 256, and 128 densely connected neurons, respectively. We explored different NN architectures and found the present one to yield optimal results. The output layer provides energies or lattice parameters. We apply error backpropagation on the training set (a small subset of the parameter space, ∼20%) to automatically adapt the weights that connect the individual neurons, until an optimal mapping from input to output is achieved. Given the observables x_i^{1,2} as predicted by NN 1 and NN 2 and the respective DFT ground truth x_i^DFT (either energies or lattice parameters), we define by averaging over i the inter-NN deviation Σ(A, B) = ⟨|x_i^1(A, B) − x_i^2(A, B)|⟩_i and the mean absolute error MAE(A, B) = ⟨|x_i(A, B) − x_i^DFT(A, B)|⟩_i. In each AL iteration, the training set is updated by those materials that exhibit the highest Σ(A, B), followed by further NN training. Interestingly,
this quantity represents an estimate of the local entropy in the parameter space, which would read H(A, B) ∼ Σ_i log σ_i(A, B) in case the predictions x_i followed uncorrelated normal distributions with σ_i(A, B) ∼ |x_i^1(A, B) − x_i^2(A, B)|. In this spirit, the present AL algorithm statistically maximizes the information entailed in the training set. From the definition of Σ(A, B) it follows that the DFT ground truth is not required by the AL algorithm to select interesting materials candidates; we use it only a posteriori to analyze the AL performance [Figs. 3(b) and (c)].
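A minimal sketch of this selection step, assuming the two NNs' predictions over the (A, B) grid are already available as arrays (random stand-ins here, with hypothetical grid dimensions):

```python
import numpy as np

rng = np.random.default_rng(1)
n_A, n_B, n_obs = 20, 30, 3                # hypothetical A/B grid, 3 observables

pred1 = rng.normal(size=(n_A, n_B, n_obs))                 # stand-in for NN 1
pred2 = pred1 + 0.1 * rng.normal(size=(n_A, n_B, n_obs))   # NN 2 deviates slightly

# Sigma(A, B): absolute disagreement between the two NNs, averaged over i.
# No DFT ground truth enters this quantity.
Sigma = np.mean(np.abs(pred1 - pred2), axis=-1)

# Update step: pick the k compounds with the highest disagreement.
k = 25
top = np.argsort(Sigma, axis=None)[::-1][:k]
A_idx, B_idx = np.unravel_index(top, Sigma.shape)
```

Only these k compounds are then calculated ab initio and appended to the training set before the next round of training, so the ground truth never enters the selection itself.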
Fig. 4 provides an impression of the NN accuracy. The prediction of the basal lattice parameters a_0^P (not shown) and a_0^IL proved to be straightforward, whereas c_0^IL turned out to be challenging. This can be traced back to the sparse data available for vertically expanding materials [Fig. 1(c)] and the only weak correlations of c_0^IL with other observables [Fig. 2(c)]. Here, AL significantly enhances the prediction accuracy as compared to a randomly chosen training set (Fig. 4). As an example, boron at the B site, combined with a post-transition-metal element at the A site, tends to induce a large vertical expansion. Already in the first iteration, these unconventional compounds are automatically identified and included in the training set [Fig. 3(c)].
Iterating the AL cycle towards ∼50% training set size, we already obtain an MAE of ∼0.1 eV for E_f^VO per vacancy (Fig. 4). Relative to its range of ∼15 eV, this corresponds to < 0.7%. The heats of formation are predicted even more accurately, reaching 25 meV/atom (not shown), which is comparable to recent work on perovskites (20-34 meV/atom [12]); for elpasolites, a heat-of-formation accuracy of 150 meV/atom was obtained [17]. This reflects that E_f^VO is a fingerprint of the complex reduction reaction and thus more demanding to predict. As a reference, the DFT accuracy can be considered to be ∼0.1 eV [11,53]. An MAE of 0.2 eV is achieved already at a training set size of around 35%. A similar trend can be seen for the lattice parameters [Fig. 3(b)]. In general, we observed that ensemble-averaged predictions of multiple NNs are more accurate than predictions by the individual NNs, attaining < 1% relative error for ∼35% training set size.
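The benefit of ensemble averaging can be illustrated with a toy model of several predictors whose errors are independent; all numbers below are hypothetical and only serve to show the mechanism:

```python
import numpy as np

rng = np.random.default_rng(2)
truth = rng.uniform(-8.0, 7.0, size=1000)            # E_f^VO-like targets (eV)
preds = truth + 0.15 * rng.normal(size=(8, 1000))    # 8 NNs with independent errors

mae_single = np.abs(preds - truth).mean(axis=1)              # per-NN MAE
mae_ensemble = np.abs(preds.mean(axis=0) - truth).mean()     # MAE of the average
```

With independent errors, the ensemble error shrinks roughly as 1/sqrt(N) with the number N of networks, which is consistent with the observation that the averaged predictions outperform each individual NN.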
Figure 5 explores the automatically generated NN element embedding vectors by using stochastic neighbor embedding (t-SNE) [54]. The nontrivial projection of a 16-dimensional space to two dimensions reveals that the NNs develop a very unique understanding of the chemical similarity between the individual elements, mirroring the conventional picture of the periodic table. This is even more compelling as the NNs are agnostic about concepts such as the atomic number or the group of a particular element. In addition, we observed that this approach increases the accuracy as compared to directly passing the high-dimensional one-hot encoded element vectors to the densely connected hidden layers.
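Such a projection can be sketched with scikit-learn's TSNE on stand-in embedding vectors (random here; in practice they would be read off the trained 16-dimensional embedding layer):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(3)
emb = rng.normal(size=(40, 16))    # stand-in for 16-dim element embedding vectors

# Project 16 -> 2 dimensions for visualization; perplexity must stay
# well below the number of samples (here 40 hypothetical elements).
proj = TSNE(n_components=2, perplexity=5.0, init="random",
            random_state=0).fit_transform(emb)
```

Plotting `proj` with the element symbols as labels then yields a map analogous to Fig. 5, where chemically similar elements cluster together.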
The AL algorithm can be stopped when the desired accuracy is reached [Fig. 3(b)], establishing the latter as a systematic control parameter. Moreover, only the autonomously selected materials need to be calculated ab initio in each iteration. These aspects lead to a substantial gain in performance and energy efficiency as compared to conventional high-throughput calculations. The presented methodology can be straightforwardly generalized to efficiently predict and enhance a broad scope of observables, e.g., the thermoelectric performance [55][56][57], across a large variety of interesting materials classes.

Figure 1. (a) Reduction of the apical oxygen ions in P oxides (n = 3) results in the emergence of the anisotropic IL structure (n = 2). This reaction is associated with the energy E_f^VO. The phase diagram compares the relative stability of the IL versus the P structure as a function of the respective P heat of formation for the entire data set. A number of interesting compounds is highlighted. (b) Superposition of (a) with the Goldschmidt tolerance factor t. (c) Structural perspective on the data, comparing apical to basal changes upon reduction and superimposing them with E_f^VO.

Figure 2. Statistical analysis of IL and P oxides. Panels (a) and (b) display trends of E_f^VO across the periodic table, fixing either the A or the B site and subsequently averaging over the other site, indicating in a site-resolved manner which elements tend to stabilize which of the two phases. (c) The correlation matrix unravels the interdependence of the different observables, including atomic properties (atomic number Z, periodic-table group g, atomic radius r, and electronegativity χ), the Goldschmidt tolerance factor t, and different energies and lattice parameters as determined from first principles (blue labels).

Figure 3. (a) Active learning (AL) cycle. (b) Evolution of Σ and the MAE with the number of AL iterations, shown exemplarily for lattice constant prediction. (c) Parameter-space maps monitor consecutive AL iterations, the chemical elements being ordered alphabetically. For Σ (blue), the color scale adaptively maximizes the contrast in each case, highlighting the materials selected for updating the training set (black). The MAE (red) is not used during AL, but can be exploited to trace the performance.

Figure 4. Prediction of E_f^VO, a_0^IL, and c_0^IL by a single NN versus the DFT ground truth, using only ∼50% of the data as training set (blue points, seen by the NN). The red points represent the test set, which has never been presented to the NN before. Contrasting the AL results with those obtained for a randomly chosen training set of equal size reveals the advantages of AL. The predictions for a_0^P and the heats of formation are even more accurate (not shown).

Figure 5. Stochastic neighbor embedding (t-SNE) analysis of the element embedding vectors (16 → 2 dimensions) shows that the NNs automatically develop their own concept of chemical similarity.