Graph-based machine learning beyond stable materials and relaxed crystal structures

There has been a recent surge of interest in using machine learning to approximate density functional theory (DFT) in materials science. However, many of the most performant models are evaluated on large databases of computed properties of, primarily, materials with precise atomic coordinates available, and which have been experimentally synthesized, i.e., which are thermodynamically stable or metastable. These aspects provide challenges when applying such models on theoretical candidate materials, for example for materials discovery, where the coordinates are not known. To extend the scope of this methodology, we investigate the performance of the Crystal Graph Convolutional Neural Network (CGCNN) on a data set of theoretical structures in three related ternary phase diagrams (Ti,Zr,Hf)-Zn-N, which thus include many highly unstable structures. We then investigate the impact on the performance of using atomic positions that are only partially relaxed into local energy minima. We also explore options for improving the performance in these scenarios by transfer learning, either from models trained on a large database of mostly stable systems, or a different but related phase diagram. Models pre-trained on stable materials do not significantly improve performance, but models trained on similar data transfer very well. We demonstrate how our findings can be utilized to generate phase diagrams with a major reduction in computational effort.


I. INTRODUCTION
Discovering new materials is a driving force for new technologies. With the increase in computational resources, there has been a surge in available data from automated high-throughput materials simulations, obtained using supercomputers. The computed properties have been made available in online databases where one can search for materials with specific desired properties, which enables new applications [1].
A central concern in the design of a new material is its thermodynamical stability. This is determined by the formation energy of a phase in relation to that of other phases at the same composition, as well as the most stable combination of competing phases it can decompose into. Hence, an extensive, in principle exhaustive, search of the corresponding chemical space is needed to properly determine stability. A common computational method for this purpose is density functional theory (DFT) [2,3], which on the semi-local level of theory is usually considered to have a fairly low computational cost compared to other quantum-mechanics-based methods, but which nevertheless limits how many materials can be screened. Using machine learning as an approximation of conventional simulation methods such as DFT has recently gained interest [4-6]. Many methods are reported with impressive results, with prediction errors close to or lower than that of DFT [7], and their low computational effort makes them highly interesting for complementing or replacing quantum-mechanics-based calculations for, e.g., computing formation energies.
Much effort has been put into finding suitable representations, so-called descriptors, of crystal structures for use in machine learning methods. Early works include, for example, smooth overlap of atomic positions [8], three descriptors inspired by the Coulomb matrix description of molecules [9], descriptors based only on stoichiometry [10], and the many-body tensor representation [11].
An alternative approach to predicting material properties from a descriptor of the crystal structure is to learn an interatomic potential (see, e.g., Refs. [12-15]). The use of interatomic potentials is perhaps more associated with molecular dynamics simulations, but they can also be used to relax the atomic positions and predict the formation energies of competing phases in a phase diagram [16].
Recently, much work has been directed towards so-called feature learning, which aims to find machine learning models that can learn both suitable descriptors and the target value from these descriptors. In particular, the use of neural networks for graph data, so-called graph neural networks (GNNs), has received more interest in chemistry and physics. One motivation for this is that molecules can naturally be described as graphs [17,18], but a pioneering work by Xie and Grossman [19] shows that this type of model is also very appealing for crystals. Xie and Grossman demonstrate the predictive performance of their Crystal Graph Convolutional Neural Network (CGCNN) by predicting multiple properties of crystals, including the formation energies of 9350 crystals from the Materials Project database [20]. A number of works have followed (see, e.g., Refs. [21-23]), which all develop GNNs for crystal structures and which are evaluated using data from the Materials Project.
However, most of the available databases of computational materials properties, such as the Materials Project, are based on structures found experimentally, e.g., as reported in the Inorganic Crystal Structure Database [24] or Crystallography Open Database [25-27]. This gives a bias towards stable, or close to stable, structures, meaning evaluation of models on these databases is confined to a restricted subspace of all possible phases. Hence, the performance for generated, possibly highly unstable, theoretical structures cannot be assumed to match this performance.
Candidate materials for a phase diagram are typically created by substituting atoms into the crystal structures of other, already known, materials. The atomic positions of these candidate materials can be far away from the local energy minima. If one seeks the formation energy of the relaxed structure, the model has to, in principle, perform two tasks. Not only does it have to predict the formation energy, but it needs to do so while compensating for the unrelaxed input structure. It has been shown that this issue indeed affects the predictive performance of GNNs, which rely on knowledge about the structure [23,28].
The performance of machine learning methods on highly unstable material phases, as needed to predict phase diagrams, has so far not been considered in great detail. Neither has it been thoroughly studied how the predictive performance of a model is affected when predicting formation energies of structures which are not completely relaxed, and how it is improved when the structure moves closer to the relaxed geometry. To investigate this, we perform an empirical evaluation, using the CGCNN model and a data set with theoretical structures from three related ternary phase diagrams, Hf-Zn-N, Ti-Zn-N, and Zr-Zn-N, generated by Tholander et al. [29]. First, we investigate the performance when CGCNN is trained with this data set of mostly unstable materials. Then, we investigate how the performance changes when the atomic positions have only been partially relaxed into local minima. When the performance in these settings has been established, we explore options for reducing the need for training data by using transfer learning. This is done by pre-training a model using either a large database with a variety of mostly stable materials, or data from one of the related phase diagrams. The pre-trained models are then fine-tuned with data from the phase diagram of interest.
We conclude our findings by using CGCNN to speed up the generation of phase diagrams. CGCNN predicts the formation energies of partially relaxed structures between steps in the relaxation process, and structures far from the convex hull are discarded and excluded from further relaxation. Thus, the total computational cost is reduced.

A. Composition phase diagrams and thermodynamical stability
A composition phase diagram shows the stable phase of a material as a function of its composition. A phase is stable if its formation energy is lower than any linear combination of competing phases. This means that the stability of each phase can be determined by constructing a convex hull of the phases in the phase diagram [30,31]. If a phase belongs to the convex hull it is stable; otherwise, it will decompose into the linear combination of phases on the hull that decreases the total energy. To generate a phase diagram without any prior knowledge of which phases are stable, a large number of potential materials needs to be considered, since one in principle needs to determine the formation energy of every possible phase at every composition. This type of phase diagram is highly relevant in materials design since finding materials that are stable is a central aspect. Additionally, such diagrams solve the crystal structure prediction problem, i.e., for a given composition, the corresponding crystal structure can be determined.
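The convex hull construction can be illustrated with a minimal sketch for the simpler binary case A-B, where each phase is a point (x, E) with x the fraction of B and E the formation energy per atom. The function names and the numbers below are hypothetical and chosen only for illustration; the ternary diagrams studied in this work require a hull in one more dimension.

```python
def lower_hull(points):
    """Lower convex hull of (x, E) points (Andrew's monotone chain)."""
    hull = []
    for p in sorted(set(points)):
        # Drop the last hull point while it lies on or above the line
        # from the second-to-last hull point to the new point p.
        while len(hull) >= 2:
            (x1, e1), (x2, e2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - e1) - (e2 - e1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def energy_above_hull(x, e, hull):
    """Distance in energy from a phase (x, e) down to the hull at composition x."""
    for (x1, e1), (x2, e2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            e_hull = e1 + (e2 - e1) * (x - x1) / (x2 - x1)
            return e - e_hull
    raise ValueError("composition outside the range spanned by the hull")


# Made-up formation energies (eV/atom); the end points are the pure elements.
phases = [(0.0, 0.0), (0.25, -0.30), (0.5, -0.40), (0.5, -0.10),
          (0.75, -0.15), (1.0, 0.0)]
hull = lower_hull(phases)
for x, e in phases:
    print(x, e, round(energy_above_hull(x, e, hull), 3))
```

Phases with zero energy above the hull are the stable ones; a positive value measures how far a phase is from stability at its composition.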
Computational methods, in particular DFT, are often used to predict formation energies of unknown phases. To generate the potential materials, other elements are substituted into already known materials [32,33]. This procedure generates hypothetical materials for which the exact geometry is unknown. Hence, before their formation energy can be calculated, they have to go through a relaxation procedure which typically consists of a number of steps where the cell volume, cell shape, and atom positions are changed into a local minimum of the formation energy.

B. Graph neural networks
Graph neural networks (GNNs) are a special kind of artificial neural network designed to handle graph-structured data. Let $G$ be a graph defined by $(V, E)$ where $V$ is the set of nodes and $E$ is the set of edges. A node $v \in V$ is associated with a feature vector $x_v$ and an edge $(v, w) \in E$ with a feature vector $e_{vw}$. A GNN is then a parametrized function $f_\theta(\cdot)$ that computes some target value $y = f_\theta(x)$, where $x = \{x_v : v \in V\} \cup \{e_{vw} : (v, w) \in E\}$. In short, the function $f_\theta(\cdot)$ is based on a message passing scheme where a hidden representation $h_v$ of a node $v \in V$ is iteratively updated by aggregating information from its neighborhood, with each iteration corresponding to a layer in the neural network. Following the notation introduced by Gilmer et al. [34], the hidden representation of a node $v \in V$ after iteration $t$ is called $h_v^t$. It is updated by first computing a message
$$m_v^{t+1} = \sum_{w \in N(v)} M_t(h_v^t, h_w^t, e_{vw}),$$
and then computing the node representation as
$$h_v^{t+1} = U_t(h_v^t, m_v^{t+1}).$$
Here, $N(v)$ represents the neighborhood of $v$, and the message function $M_t(\cdot)$ and update function $U_t(\cdot)$ are parametrized by $\theta$ and learned via, e.g., stochastic gradient descent. The process of propagating information to the neighborhood is illustrated in Fig. 1.

FIG. 1: Illustration of how the hidden node representations are updated. Information is aggregated from the neighborhood of each node, meaning that after some iterations, the node feature vector contains information about its initial state, but also about its neighborhood. (a) Initial graph. (b) After t iterations. Normally, a node is represented by a high-dimensional vector, and the colors are just for illustrative purposes.
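As a minimal sketch of the message passing scheme described above, the following toy example uses scalar node features and fixed, hand-written message and update functions. In an actual GNN these functions are neural networks with learned parameters; all names and values here are hypothetical.

```python
def message_passing_step(h, edges, message, update):
    """One iteration: aggregate messages over neighbors, then update every node.

    h:      dict node -> hidden representation (a float in this toy example)
    edges:  dict (v, w) -> edge feature e_vw, one entry per direction
    """
    messages = {v: 0.0 for v in h}
    for (v, w), e_vw in edges.items():
        messages[v] += message(h[v], h[w], e_vw)   # sum over w in N(v)
    return {v: update(h[v], messages[v]) for v in h}


# Path graph 0 - 1 - 2; only node 0 carries a nonzero initial feature.
h0 = {0: 1.0, 1: 0.0, 2: 0.0}
edges = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 1.0, (2, 1): 1.0}

message = lambda h_v, h_w, e_vw: e_vw * h_w   # toy message function M_t
update = lambda h_v, m_v: h_v + m_v           # toy update function U_t

h1 = message_passing_step(h0, edges, message, update)
h2 = message_passing_step(h1, edges, message, update)
print(h1)
print(h2)
```

After one iteration node 2 is still unaware of node 0; after two iterations the information has propagated across the graph, which is the effect illustrated in Fig. 1.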
In this work we use the Crystal Graph Convolutional Neural Network (CGCNN) [19]. A crystal is modeled from the unit cell as a graph with the atoms as nodes and edges between the closest neighboring atoms within a certain distance, taking into account the periodic boundary conditions. The initial atom representation $x_v$ is a 92-dimensional vector with one-hot encoded physical properties of the atom, and the vectors $e_{vw}$ encode the distance between atoms, expanded with a radial basis function similar to SchNet [35].
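To make the radial basis expansion of interatomic distances concrete, the following is a minimal sketch using Gaussians on a uniform grid of centers. The grid range, spacing, and width are illustrative assumptions and are not taken from this work; the function name is hypothetical.

```python
import math


def gaussian_expansion(distance, d_min=0.0, d_max=8.0, step=0.2, width=None):
    """Expand a scalar distance (in Angstrom) into a vector of Gaussian features.

    The defaults for d_min, d_max, step, and width are assumptions made
    for this illustration only.
    """
    width = step if width is None else width
    n_centers = int(round((d_max - d_min) / step)) + 1
    centers = [d_min + i * step for i in range(n_centers)]
    return [math.exp(-((distance - c) ** 2) / width ** 2) for c in centers]


features = gaussian_expansion(2.0)
print(len(features), features.index(max(features)))
```

Each edge feature vector is thus a smooth, localized encoding of the distance, peaking at the grid center closest to the actual bond length.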

C. Ensembles and uncertainty quantification
An ensemble combines multiple models by averaging the predictions made by the individual ensemble members into a final prediction. One reason to use ensembles is that it has been shown to improve the predictive performance [36]. Furthermore, it provides a way of estimating the total uncertainty of the predictions. This can be divided into aleatoric and epistemic uncertainty. Aleatoric refers to the underlying natural randomness. In machine learning terms, it is the uncertainty that remains even if the model parameters $\theta$ are perfectly known. Epistemic instead refers to uncertainty due to lack of knowledge. This arises in machine learning since a model is trained from a finite training data set, resulting in (epistemic) uncertainty about the parameter values themselves. For example, an ensemble could consist of $N$ different neural networks, trained on the same data but with random initializations and stochastic optimization. Thus, $N$ plausible parameter configurations $\{\theta_i\}_{i=1}^{N}$ are obtained, which gives $N$ different predictions. Using the spread of the predictions of the ensemble members has proven to be an efficient way of quantifying epistemic uncertainty for neural networks [36,37].

D. Data sets
In this work we use a data set of Tholander et al. [29] (TAATA, after the names of the authors). This data set contains DFT calculations, including structure relaxation, for a large number of material phases in the phase diagrams Hf-Zn-N, Ti-Zn-N, and Zr-Zn-N. These three phase diagrams make up the three subsets TAATA-Hf, TAATA-Ti, and TAATA-Zr. The three subsets consist of 4839 (TAATA-Hf), 4328 (TAATA-Ti), and 3650 (TAATA-Zr) materials, each with three different relaxation steps: preprerelax, prerelax, and final relax, with final relax being the most accurate level. These relaxations were performed using the projector augmented wave method (PAW) [38] and the Perdew-Burke-Ernzerhof generalized gradient approximation (PBE-GGA) [39] as exchange-correlation functional, implemented in the Vienna Ab-initio Simulation Package [40,41] (VASP) (v5.4.1). All calculations used a 600 eV plane wave energy cutoff. In the first relaxation step, preprerelax, only the unit cell volume is relaxed, which requires on average approximately 5 core minutes/phase. In the prerelax and final relaxation steps, volume, cell shape, and the internal degrees of freedom describing the atomic coordinates are relaxed, with the final step done with higher precision (the VASP setting PREC set to "Accurate" instead of "Normal", which controls the number of grid points in the Fourier transform grids and the accuracy of the PAW projector representation in real space). These two steps require approximately 60 core minutes/phase (prerelax) and 540 core minutes/phase (final).
We have also used a model trained on a data set extracted from Materials Project by Xie and Grossman [19] (the model can be found together with the CGCNN code [42]). This training data set consists of roughly 37 000 materials with completely relaxed structures, covering 87 elements, with each material consisting of up to seven different elements. Apart from covering a different chemical space, this data set has not been produced with the same computational parameters as TAATA.

E. Transfer learning
A machine learning model requires data from DFT calculations to learn to approximate DFT, and typically improves with more data. This means that there is a trade-off between the computational expense of producing DFT simulations (training data) and model accuracy. Relying on training data that, in some sense, is expensive to obtain is common in machine learning (e.g., images annotated by humans according to what they depict), and much effort has therefore been put into investigating methods to reduce the need for data. One such method is transfer learning [43]. The idea is simple: a model is pre-trained on one problem, after which it is fine-tuned by training on data from a different, but similar, problem. This can be seen from the perspective of feature learning as assuming that the two problems could be represented well with more or less the same features. If the problem of interest is suffering from low availability of training data, these features can be learned better from a different problem where more data is available, and then transferred to the new problem domain.

III. METHOD
We use an ensemble of five ($N = 5$) CGCNNs. Each network is trained on the different subsets of the TAATA data set using an L1 loss function and the Adam optimizer [44]. Similarly to the original work, the optimized hyperparameters are the learning rate, number of layers, and weight decay, using a grid search with completely relaxed structures from all three TAATA subsets. The combination of hyperparameters with the best MAE on a validation set was then chosen for all subsequent investigations. For the other hyperparameters, the default values provided in the CGCNN code [42] were used.
The individual ensemble members are trained with access to the same training data. However, some of the training data is reserved to be used only as validation data to avoid overfitting by early stopping. To capture the uncertainty of model parameters (i.e., the epistemic uncertainty), the split of training/validation data is different for the different members, in addition to initializing the model parameters randomly and using stochastic-gradient-based training. When using CGCNN to aid in the generation of a phase diagram (described later in this section), we estimate the total uncertainty as $\sigma = \sqrt{\sigma_a^2 + \sigma_e^2}$, where $\sigma_a^2$ and $\sigma_e^2$ are the aleatoric and epistemic uncertainty, respectively. We estimate $\sigma_a^2$ as the average of the mean squared errors (MSE) of the ensemble members when predicting formation energies of the validation set during training, and $\sigma_e^2$ as the variance of the predictions made by the individual ensemble members.
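The uncertainty estimate described above can be sketched as follows for a single material. The function name and the numbers are hypothetical, and the use of the population variance (rather than the sample variance) across ensemble members is an assumption made for this illustration.

```python
import math
import statistics


def ensemble_prediction(member_predictions, member_validation_mse):
    """Return the ensemble mean and the total uncertainty sigma for one material.

    member_predictions:    formation energies predicted by each member (eV/atom)
    member_validation_mse: validation MSE of each member ((eV/atom)^2)
    """
    mean = statistics.fmean(member_predictions)
    # Epistemic part: spread of the members' predictions (population variance
    # is an assumption of this sketch).
    sigma_e2 = statistics.pvariance(member_predictions)
    # Aleatoric part: average validation MSE over the members.
    sigma_a2 = statistics.fmean(member_validation_mse)
    return mean, math.sqrt(sigma_a2 + sigma_e2)


# Made-up values for an ensemble of five members.
preds = [-0.50, -0.48, -0.52, -0.49, -0.51]
val_mse = [0.0025] * 5
mean, sigma = ensemble_prediction(preds, val_mse)
print(round(mean, 3), round(sigma, 4))
```

Note that the aleatoric term is the same for all materials predicted by a given ensemble, whereas the epistemic term varies from material to material.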
To investigate transfer learning, we have compared two different scenarios. In the first scenario, data is assumed to be available for a phase diagram that is related to an unknown phase diagram of interest. We want to use this data to improve the predictive performance for the unknown phase diagram. This situation could appear, for example, when a study is expanded into a larger chemical space. In the second scenario, related phase diagrams are not available, but we instead try to utilize data from a large database of materials involving all elements, like Materials Project. We explore and compare both these scenarios by (i) transferring a model pre-trained on one TAATA phase diagram by fine-tuning with data from another; (ii) transferring a model trained by Xie and Grossman [19] on Materials Project data (available online [42]) by fine-tuning with data from one of the TAATA phase diagrams.
Finally, we illustrate how our findings can be used to accelerate the generation of a phase diagram with CGCNN using the following scheme: Start by creating a set of candidate structures, $S$, e.g., by substituting desired chemical elements in structure prototypes available in present materials databases. This set is divided into a training set $S_\mathrm{train}$ and an evaluation set $S_\mathrm{eval}$. The structures in $S_\mathrm{train}$ are fully relaxed, through the prepre-, pre-, and final relaxation steps, and their formation energies are obtained. Using the partially relaxed training structures and their final formation energies, two (ensembles of) CGCNN models are trained for the preprerelax and prerelax steps, respectively. Then, all the structures in $S_\mathrm{eval}$ are relaxed to just the preprerelax step. The preprerelax-level CGCNN model predicts the final formation energies of the partially relaxed evaluation structures, and a phase diagram is constructed based on these predictions together with all the structures in the training set. Then, all structures in $S_\mathrm{eval}$ which, in the predicted phase diagram, have an energy above the resulting convex hull of stability $E_\mathrm{hull} > 2\sigma$ are removed (with $\sigma$ being the total uncertainty of the prediction described above). The remaining structures go through the prerelax step, and the procedure of predicting a phase diagram and removing structures far from the convex hull is repeated. The remaining structures are then relaxed through the final step, and the phase diagram is constructed using the energies computed by DFT for the structures in $S_\mathrm{train}$ and the non-discarded structures in $S_\mathrm{eval}$.
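The filtering step of the scheme above can be sketched as follows. The sketch assumes that the predicted energy above the hull and the total uncertainty have already been computed for every evaluation structure; the function name, structure labels, and numbers are hypothetical.

```python
def keep_for_further_relaxation(predictions, n_sigma=2.0):
    """Select the structures that continue to the next relaxation step.

    predictions: dict label -> (predicted energy above hull, total sigma),
                 both in eV/atom. A structure is discarded when its
                 predicted energy above hull exceeds n_sigma * sigma.
    """
    return [label for label, (e_hull, sigma) in predictions.items()
            if e_hull <= n_sigma * sigma]


# Made-up predictions after the preprerelax step.
predicted = {
    "phase_A": (0.00, 0.05),   # on the predicted hull: kept
    "phase_B": (0.08, 0.05),   # above the hull but within 2 sigma: kept
    "phase_C": (0.30, 0.05),   # far above the hull: discarded
}
kept = keep_for_further_relaxation(predicted)
print(kept)
```

The same filter is applied a second time after the prerelax step, using the predictions of the prerelax-level model.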

IV. RESULTS
Test sets for evaluating the performance were created by setting aside 20 % of the materials for each TAATA subset. For the partially relaxed structures, structures used for training and testing have been relaxed equally far. However, the target formation energy is always the final formation energy, obtained at the last relaxation step. There are no major differences between the three phase diagrams, and the errors for the different levels of relaxation are ordered as expected.

A. Training on different relaxation levels
Test MAEs for models trained from scratch are shown in Fig. 2. For the largest training set size, i.e., all of the data points not used for testing (2920, 3460, and 3870 materials for TAATA-Zr, TAATA-Ti, and TAATA-Hf, respectively), the test MAE in eV/atom is 0.038 for TAATA-Ti, 0.043 for TAATA-Hf, and 0.046 for TAATA-Zr when using fully relaxed structures. The corresponding test errors for prerelax are 0.11 eV/atom (TAATA-Hf and TAATA-Ti) and 0.12 eV/atom (TAATA-Zr), and for preprerelax 0.20 eV/atom (TAATA-Hf and TAATA-Ti) and 0.21 eV/atom (TAATA-Zr). It is clear that the performance improves with the amount of training data, and in general worsens with more inaccurate descriptions of the atomic positions, as expected.
Comparing these results with using the sine-based Coulomb matrix together with kernel ridge regression (KRR) [9], the Coulomb matrix seems to be very robust with respect to the relaxation levels. For example, using the largest training size of 3870 training materials from the TAATA-Hf data set, the test MAE is 0.33 eV/atom for preprerelax, 0.32 eV/atom for prerelax, and 0.31 eV/atom for the final relaxation level. However, these errors are still much larger than those of the CGCNN model. It also seems like the Coulomb matrix with KRR suffers from overfitting for small training sizes, with a test MAE of more than 0.9 eV/atom for TAATA-Hf when using 60 training data points (for full details on the results with KRR, see supplemental material [45]). To put the results in some perspective, a model always predicting a formation energy of 0 eV/atom would yield a test MAE of roughly 0.5 eV/atom. (A model that instead predicts according to the mean of the largest training set, roughly −0.2 eV/atom for all three subsets, improves that MAE by less than 0.05 eV/atom.) We further investigate how the predicted formation energies compare to the actual formation energies in the predicted-versus-target scatter plot in Fig. 3 (since the performance is similar between the different phase diagrams, we have only separated the predictions according to relaxation level and not according to phase diagram). Apart from again demonstrating that more accurate structures give better predictions (as can be seen also in Fig. 2), Fig. 3 shows a small tendency of the model to underestimate high formation energies and overestimate lower formation energies for the less accurate relaxation levels.

FIG. 3: Predicted vs actual formation energy for models trained with all data except for the test data (3870, 3460, and 2920 data points for TAATA-Hf, TAATA-Ti, and TAATA-Zr, respectively). The overall slope of the points vs. the ideal line (black) suggests a tendency to underestimate the formation energy at higher values, and overestimate at lower values.
All predictions using CGCNN have, so far, been the average of the five ensemble members. To demonstrate the improved performance enabled by using an ensemble, we compare the test MAE of the ensemble with those of the individual ensemble members, using the TAATA-Hf data set. The results can be seen in Fig. 4, and demonstrate the power of using ensembles; the MAE of the ensemble is consistently lower than that of the individual members, except for the smaller training sets where individual members occasionally outperform the ensemble. On the other hand, in the case of small training sets, some of the members are also far worse. We obtain the same qualitative results for the other data sets; see the supplemental material [45].

TABLE I: Test MAEs (eV/atom) for models pre-trained on different data sets when predicting formation energies of materials from TAATA-Hf. The relaxation levels indicate which structures have been used for fine-tuning (and hence, testing). TAATA models are pre-trained on the same relaxation levels as transferred to, whereas the MP models are transferred from the same model.

B. Transfer learning
Test errors when transferring models to the TAATA-Hf data set are shown in Fig. 5. Apart from pre-training on a different TAATA subset, we have also used a pre-trained CGCNN model provided by Xie and Grossman [19], trained on data from Materials Project (MP) (available online [42]). The results when training a model from scratch are provided as a comparison. For the pre-trained models, test MAEs before being fine-tuned are presented in Table I. To put the poor performance of the pre-trained models without fine-tuning into perspective, we reiterate that simply guessing the formation energy as 0 (alternatively, as the mean of the training data) gives a test MAE of roughly 0.5 eV/atom. However, as shown in Fig. 5, as soon as the models are fine-tuned on a small training data set from the actual phase diagram of interest, the errors drop significantly. This is particularly apparent for the models pre-trained on the related TAATA data set.
The model pre-trained on Materials Project data has been trained on completely relaxed structures but still, before being fine-tuned, performs similarly to or worse than guessing 0 eV/atom. This may seem surprising since the model has been trained on a large data set of relaxed structures. It should, however, be kept in mind that the computations for the materials in Materials Project have been performed with different computational parameters than those used for TAATA. To investigate the performance more closely, the predicted formation energies compared to the actual formation energies for the TAATA-Hf data set are shown in Fig. 6a. It is clear that without any fine-tuning the model severely underestimates the formation energy for many structures, leading to the high MAE. Fine-tuning with 60 data points (Fig. 6b) appears to compensate for this underestimation. We also compare this fine-tuning with training the model from scratch with 60 data points (Fig. 6c). When transferring to the other two data sets, we observe the same qualitative behavior; see the supplemental material [45].

C. Using CGCNN for generating a phase diagram
We now demonstrate the scheme described in Sec. III for predicting the full phase diagram of Hf-Zn-N, using a model transferred from Zr-Zn-N by fine-tuning with 60 data points. The development of the phase diagram during structure relaxation is shown in Fig. 7. The final convex hull of stability is the same as the one obtained by relaxing and computing formation energies for all candidate structures with DFT. However, by following the suggested scheme, there is a significant reduction in the number of such calculations that are needed. There are 4779 structures in the initial evaluation set, which are relaxed at the preprerelax step. After predicting the initial phase diagram (Fig. 7a), 2227 structures are kept and go through the prerelax step. From the predicted phase diagram at this level (Fig. 7b), 1210 structures are kept and are completely relaxed. Including the training data, 1270 structures are completely relaxed, instead of all of the 4839 candidate structures. Using the average computation times for the different steps, an equivalent of 1417 complete structure relaxations have been performed, which is less than 30 % of the initial number of candidate structures.
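The cost estimate can be approximately reproduced from the structure counts above and the rounded average times per step quoted in the description of the data set (5, 60, and 540 core minutes/phase). This is only a rough check under the assumption that the 60 training structures pass through all three steps; with the rounded times it gives about 1400 equivalent full relaxations, slightly below the reported 1417, which presumably reflects the rounding of the average times. The variable names are hypothetical.

```python
# Rounded average cost per relaxation step, in core minutes per phase.
core_minutes = {"preprerelax": 5, "prerelax": 60, "final": 540}

n_train = 60  # training structures, relaxed through all three steps
relaxed_at_step = {
    "preprerelax": 4779 + n_train,
    "prerelax": 2227 + n_train,
    "final": 1210 + n_train,
}

total_cost = sum(core_minutes[s] * relaxed_at_step[s] for s in core_minutes)
cost_of_full_relaxation = sum(core_minutes.values())   # 605 core minutes
equivalent_full_relaxations = total_cost / cost_of_full_relaxation
fraction_of_candidates = equivalent_full_relaxations / 4839

print(total_cost, round(equivalent_full_relaxations), round(fraction_of_candidates, 3))
```

Either figure corresponds to just under 30 % of the cost of fully relaxing all 4839 candidate structures.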

A. Prediction performance on the TAATA data set
The results in Fig. 2 show that CGCNN performs well when predicting formation energies of the materials in the TAATA data set. The models trained on the largest training sets of relaxed structures show accuracies close to that of the original CGCNN work [19], where a test MAE of 0.039 eV/atom was reported for formation energies of structures in Materials Project. However, the size of the training set for that model was much larger, roughly 37 000 materials. This indicates that this type of data, where materials are confined to only a few elements and for the most part are highly unstable, is indeed manageable for this model. CGCNN also outperforms KRR with the sine-based Coulomb matrix descriptor, further emphasizing that it is a powerful method, and that GNNs in general are an interesting way to further improve the application of machine learning to materials science.
The worsened performance of the model when the materials are not fully relaxed is not surprising, considering that the model relies on information about the geometry. Nevertheless, the model still seems to be able to extract useful information and learn also from these inaccurate descriptions, since the performance improves as the training set grows. Additionally, it seems like the error has not saturated at our largest training sets, indicating that the performance could further improve with more data. This can be seen as an indication that the model is able to overcome the fact that the materials are not relaxed and, in some sense, jointly learn the combination of structure relaxation and prediction of the formation energy.

B. Transfer learning
A model pre-trained on one phase diagram performs very poorly when applied to a different phase diagram without fine-tuning, which is perhaps surprising considering the similarity of the phase diagrams in the TAATA data set. We speculate that the reason for this is that the pre-trained model has only seen three different atom types, and in particular has not seen data with the new atom type introduced in the data set it is transferred to, meaning that it is unable to create a good representation of this element. This suspicion is strengthened by the fact that only a small amount of data for fine-tuning improves the performance considerably. However, we see the same behavior with poor performance before fine-tuning of the model pre-trained on the Materials Project data set, which has seen a large variety of atoms. In Fig. 6a we see that in this case, most of the prediction error stems from the model underestimating the formation energy. We believe that one reason for this is the fact that the model is pre-trained with mostly stable materials, which in general will have lower formation energies. This would then explain the underestimation in the predicted formation energies for TAATA, which mostly contains unstable materials. However, another reason could be the difference in computational parameters used to produce the data, which could lead to systematic errors. Either way, using only a small amount of training data for fine-tuning improves the performance by compensating for the underestimation of formation energies. We therefore conclude that while CGCNN is capable of handling also unstable structures, it is important that it is exposed to such structures during training.

FIG. 7: For the preprerelax and prerelax steps, formation energies are predicted using CGCNN for structures not included in the training set. Dark green squares are structures above the convex hull, but with energy above hull within the threshold $2\sqrt{\sigma_a^2 + \sigma_e^2}$, and have therefore been further relaxed. Light red squares are structures above hull with energy above hull outside the threshold, and have thus not been further relaxed. For the final level, all formation energies have been computed with DFT, and red squares indicate structures above the convex hull. The phase diagrams have been plotted with pymatgen [46].
Another observation that can be made in Fig. 6 is that after fine-tuning, the errors seem higher for materials with high formation energies, and that this difference is more prominent for the partially relaxed structures (Fig. 6b). This does not seem to be the case when training from scratch (Fig. 6c), and a reason for this could be that the effect of pre-training on relaxed structures with lower formation energies lingers even after fine-tuning. One could think of combining a model trained from scratch with a model pre-trained on Materials Project to obtain the best performance for all formation energies.
The fact that a small amount of extra data gives these significant improvements suggests that, if possible, it can be worthwhile to generate a small annotated training data set for the specific task at hand, instead of relying solely on models trained on some generic database. It seems that when the target data set is a certain phase diagram, pre-training on a data set like Materials Project improves over training a model from scratch, at least when little data is available. However, if possible, pre-training on a data set similar to the one under study improves the performance substantially. For example, when predicting formation energies of relaxed structures from TAATA-Hf, the test MAE is 0.39 eV/atom when the model is trained from scratch using 60 data points. When using a model pre-trained on Materials Project and fine-tuning it with 60 data points, the error decreases to 0.23 eV/atom. If we instead use a model pre-trained on the TAATA-Zr data set and fine-tuned with 60 data points, the test MAE is as low as 0.049 eV/atom. Screening a large chemical space of similar compounds is not an uncommon application, and our results show that very large gains are possible for such applications by using transfer learning.
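To make the size of these gains concrete, the following minimal sketch computes the relative reduction in test MAE with respect to training from scratch, using only the three values quoted above. The dictionary keys and the helper `relative_reduction` are illustrative names and not part of any CGCNN code.

```python
# Test MAEs (eV/atom) quoted in the text for relaxed TAATA-Hf structures,
# each obtained with 60 training data points from the target data set.
mae = {
    "scratch": 0.39,
    "pretrained_materials_project": 0.23,
    "pretrained_taata_zr": 0.049,
}


def relative_reduction(baseline, value):
    """Fractional decrease of `value` relative to `baseline`."""
    return 1.0 - value / baseline


for name, value in mae.items():
    reduction = relative_reduction(mae["scratch"], value)
    print(f"{name}: {value:.3f} eV/atom, {100 * reduction:.0f}% below scratch")
```

With these numbers, pre-training on Materials Project lowers the error by roughly 41%, while pre-training on TAATA-Zr lowers it by roughly 87%.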

C. Using CGCNN for generating phase diagrams
Our method for generating a phase diagram with the help of carefully constructed CGCNN models is an example of how our findings can be utilized. We acknowledge that the methodology in this example can be further improved; we have, for example, simply chosen the training set at random, and our threshold for when a structure is removed from the relaxation process is chosen more or less arbitrarily. However, we think that the results are promising and indeed demonstrate the potential of using machine learning in combination with DFT in high-throughput screening.
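The filtering step of such a workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not our actual implementation: we assume that each candidate comes with a predicted energy above hull and two uncertainty estimates, here called `sigma_a` and `sigma_e`, and we read the threshold in the caption of Fig. 7 as 2·sqrt(sigma_a² + sigma_e²). The `Candidate` class, the function `keep_relaxing`, and the example values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    e_above_hull: float  # predicted energy above hull (eV/atom)
    sigma_a: float       # first uncertainty estimate (eV/atom), assumed
    sigma_e: float       # second uncertainty estimate (eV/atom), assumed


def keep_relaxing(c, factor=2.0):
    """Return True if the candidate should be relaxed further.

    Assumed criterion: the predicted energy above hull lies within
    factor * sqrt(sigma_a^2 + sigma_e^2).
    """
    threshold = factor * math.sqrt(c.sigma_a ** 2 + c.sigma_e ** 2)
    return c.e_above_hull <= threshold


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("A", e_above_hull=0.05, sigma_a=0.03, sigma_e=0.04),
    Candidate("B", e_above_hull=0.50, sigma_a=0.03, sigma_e=0.04),
]
survivors = [c.name for c in candidates if keep_relaxing(c)]
print(survivors)
```

Only the surviving candidates would be passed on to the next, more expensive relaxation level. The factor of 2 is exposed as a parameter because, as noted above, the threshold is chosen more or less arbitrarily.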

VI. CONCLUSIONS
We have demonstrated that CGCNN can be trained to predict formation energies of materials that belong to a single ternary phase diagram and which for the most part are far from the convex hull. In this scenario, we obtain test MAEs in the range 0.038 to 0.046 eV/atom for three different ternary phase diagrams when using fewer than 4 000 training data points from the respective phase diagram. We have also shown that CGCNN can make informative predictions when the structures are not fully relaxed, although with a loss of performance compared to fully relaxed structures. Using transfer learning, we are able to transfer a model from one phase diagram to another using very little additional data; in our case 60 materials are sufficient to get highly accurate predictions. We also compare this with transferring a model pre-trained on the much larger Materials Project data set. Interestingly, before fine-tuning, a model pre-trained on a different data set performs very poorly, often worse than predicting a constant formation energy for all materials. However, fine-tuning the pre-trained model with 60 training data points can improve the performance drastically, especially when the data set used for pre-training is similar to the target data. From this we also conclude that even though CGCNN can handle unstable materials from a single ternary phase diagram, it needs to be exposed to such materials during training to perform well. Finally, we have illustrated how these insights could be used to aid in the generation of a phase diagram by, during structure relaxation, using carefully constructed CGCNN models to predict the final formation energies and only continuing the relaxation of structures that are predicted to lie close to the convex hull of stability. Even though it is a simple approach, it emphasizes that machine learning has great potential for speeding up high-throughput screening.

FIG. 2: Test MAE for different relaxation levels and phase diagrams in the TAATA data set. There are no major differences between the three phase diagrams, and the errors for the different levels of relaxation are ordered as expected.

FIG. 5: Test MAE when transferring different pre-trained models to the TAATA-Hf data set for relaxation levels final (left), prerelax (middle), and preprerelax (right), compared with training a model from scratch. Training set size corresponds to the number of data points used for fine-tuning and does not include training data for pre-training.

FIG. 7: Development of the predicted phase diagram during structure relaxation after the (a) preprerelax, (b) prerelax, and (c) final relaxation level. Blue circles are structures on the convex hull. For the preprerelax and prerelax steps, formation energies are predicted using CGCNN for structures not included in the training set. Dark green squares are structures above the convex hull whose energy above hull is within the threshold $2\sqrt{\sigma_a^2 + \sigma_e^2}$, and which have therefore been relaxed further. Light red squares are structures above the hull whose energy above hull is outside the threshold, and which have thus not been relaxed further. For the final level, all formation energies have been computed with DFT, and red squares indicate structures above the convex hull. The phase diagrams have been plotted with pymatgen [46].