A collinear-spin machine learned interatomic potential for Fe 7 Cr 2 Ni alloy

,


INTRODUCTION
Austenitic stainless steels are key structural materials with high mechanical strength and corrosion resistance.They have many applications, from everyday tools such as household appliances and cars, to structural components of buildings, bridges and industrial machines [1].Their desirable properties come from alloying Fe with a concentration of 16-20% Cr for rust resistance, and about 10% Ni to maintain a crack-resistant face-centred-cubic crystal structure at all temperatures where it remains solid (< 1800 K) [2].Specific properties of the steel can be fine tuned by adding small concentrations (< 1%) of other solutes such as Mo, Mn, P, C etc, depending on the application of interest [3].However the core properties come from the Fe-Cr-Ni mix, so in this work we focus on the prototypical austenitic steel with composition Fe 7 Cr 2 Ni.
A key application of austenitic steels is in nuclear power plants.The austenitic steel grades 304 and 316, which have a composition very close to the model alloy Fe 7 Cr 2 Ni, are used in the primary containment barriers of light-water nuclear reactor pressure vessels (RPV) [4].The RPV interiors are exposed to constant neutron bombardment and a wide range of thermal fluctuations from cryogenic temperatures during maintenance to operating temperature of ∼ 600 K.Over time, these harsh conditions trigger the formation of radiation-induced point defects in the RPV interiors, which eventually cluster to form larger defects such as voids and dislocation loops.Macroscopically, these manifest as creep, swelling, stress * lakshmi.shenoy@warwick.ac.uk corrosion, and other ageing effects which are safety concerns [5].To improve RPV designs and facilitate mitigation efforts, there is interest in understanding the mechanisms behind these ageing phenomena.
Experimental data for aging phenomena are limited due to the challenges in conducting controlled experiments in extreme conditions, and also challenges in measuring atomic quantities experimentally [6][7][8].We therefore require computational methods to simulate these phenomena across length scales, to fill in the gaps between the available experimental data, and to explain the observed aging effects.
There are various tiers of first principles methods available for materials modelling, with higher accuracy being accompanied by (prohibitively) higher computational costs.While more complex phenomena such as nuclear cascades or spectroscopic excitations require higher levels of theory, the first principles method of choice for modelling mechanical properties as in our case is density functional theory (DFT), in which the electronic structure of materials is calculated from approximations of quantum mechanics [9].There have been efforts to study the Fe 7 Cr 2 Ni alloy using DFT [5,[10][11][12].However, as the computational cost of DFT calculations scales cubically with system size, these studies were restricted to supercells containing upto 256 atoms.These supercells are too small to model extended defects such as voids or dislocations.Also, as DFT calculations are computationally expensive, these studies were restricted to limited statistics, which is not ideal for modelling the variations due to chemical complexity of concentrated alloys.
Computationally more affordable alternatives for atomistic modelling of materials that allow us to simulate larger length and time scales are classical interatomic potentials.These models have a fixed mathematical form and are fit to experimental and DFT data for the material properties of interest.For metallic systems, the main classical models used are the embedded atom models (EAM) [13], the modified EAM [14], and the Finnis Sinclair model [15].While there have been many efforts to study alloys using classical interatomic potentials [16][17][18][19][20], these studies are restricted to just a few properties owing to the fixed mathematical form of empirical models.For instance, Bonny et.al. developed a FeCrNi EAM for plasticity [18] and another FeCrNi EAM for point defects [19] (referred to as EAM-13 throughout this paper), but neither can simulate both types of defects as they were trained for their respective target properties.We would ideally like to have a model that can simulate multiple defects and their interactions.
A successful emerging approach that can address the shortcoming of both DFT and EAM approaches is machine learned interatomic potentials (MLIPs) [21].MLIPs are trained on a database of first-principles data, generally from DFT.Their highly flexible mathematical forms allow them to interpolate the training data smoothly and so fit the potential energy surface with near DFT accuracy, while at the same time being much cheaper to use for predictions than first-principles methods directly.A major advantage of the MLIPs over classical potentials like EAM is that they can be trained on multiple defects, by including representative configurations in the training database.Moreover, the accuracy and scope of an MLIP can be improved iteratively by appending new training configurations to its training database, chosen either manually using knowledge about the system of interest or using active learning schemes [22].
A range of MLIP approaches have enjoyed considerable success, including kernel methods [23] linear and non-linear expansions of the potential in a polynomial basis such as the atomic cluster expansion (ACE) [24] or moment tensor potential (MTP) [25] frameworks, and most recently message-passing neural networks [26][27][28].In this study, we use the Gaussian approximation potential (GAP) framework, a kernel based method [23].The GAP model has physically motivated kernel choices and hyperparameters, making it intuitive to understand.It uses Gaussian process regression to fit the potential energy surface; the Bayesian interpretation of this procedure allows us to derive error estimates for GAP predictions, which is a useful feature of this choice of MLIP.The GAP model has been fitted and validated for many elemental materials such as Si [29], W [30], Fe [31] and C [32], demonstrating the effectiveness of this model.More recently, it has also been used for binary alloys [33] and the ternary alloy Ge-Sb-Te [34].In this work, we fit and validate a GAP model for Fe 7 Cr 2 Ni.We also extend the GAP descriptors to incorporate the effects of the spin-polarised electronic structure of the alloy in terms of collinear spins associated with the Fe atoms, and demonstrate how this Spin GAP model improves predictions compared to the standard GAP model.FIG.1: Kernel principal component analysis (KPCA) of training data in the 3-species SOAP descriptor space.This approach allows us to identify gaps in the training configuration space; e.g.DB7-8 were included to sample the gap between the DB1-4 and DB5-6 clusters.
This paper is organised as follows: Section II outlines how the training database was assembled, and includes details of the standard GAP and Spin GAP fitting procedures.Section III discusses the main results for bulk and point defect properties of Fe 7 Cr 2 Ni using the Spin GAP.The results are validated against DFT and experiments, and compared with EAM and standard GAP predictions.

II. METHODOLOGY A. Training Database
The first step to training a MLIP is to assemble a training database of high accuracy data relevant to properties of interest.Our training database comprises configurations relevant to the bulk, finite temperature and point defect properties of the alloy.It can be categorised into eight sub-databases (DBx) based on configuration type, as summarised in Table I.Fig. 1 shows the kernel principal component analysis (KPCA) of the training database configurations using the SOAP kernel [35], coloured by subdatabases.The horizontal axis KPCA component can be interpreted as being roughly correlated with the temperature of the configurations, while the vertical axis KPCA component is correlated with strain.
Perfect-lattice bulk and mono-vacancy supercells (containing 256 and 255 atoms respectively) with different element distributions that capture relevant short-range ordering were generated using an effective medium theory (EM) as described in subsection II B. These were fully relaxed as shown in the schematic of Fig. 2, using DFT calculations as described in subsection II C, to generate the training subdatabases DB1 and DB3 respectively.Random strains were applied to a subset of DB1 samples, to generate DB2, aimed at providing reference material which can inform an accurate description of the alloy's elastic behaviour.Snapshots from the DFT geometry optimisation trajectory of the EM monovacancy structures were compiled to form DB4, to provide details of the potential energy surface close to 0 K.
The EM structures were generated to capture shortrange ordering at temperatures of 600 K, 1000 K, 1500 K and 2000 K respectively.We ran ab-initio molecular dynamics (MD) for the 1000 K-2000 K configurations from DB1 and DB3 at the temperatures they were generated for, and uncorrelated snapshots from these MD trajectories were compiled to form DB5 and DB6 respectively.In Fig. 1, we can see that the high temperature DB5-6 samples sit quite far apart from the low temperature DB1-4 samples along the first KPCA component.To aid the GAP model with better interpolation of training data, we need to populate such gaps between training data clusters along the principal axes of the kernel space.To this end, we extend the database by including snapshots from the geometry optimisation of a few bulk and monovacancy MD structures, comprising DB7 and DB8 respectively.These configurations are seen to sample the space between low and high temperature clusters in the KPCA plot Fig. 1, and their inclusion decreased the errors of our models.The GAP potentials (both with and without the Fe spin extensions) were trained on this combined database DB1-8 as described in section II D.

B. Effective Medium Theory
Past DFT studies on FeCrNi have modelled the alloy using special quasirandom structures (SQS) [5,10,11].A limitation of using SQS for magnetic materials is that in aiming to produce local environments reflective of the random alloy, they neglect atomic short-range order.It would be more effective to train a model using configurations in which temperature dependent short-range order is present, especially in the case of point defects where the energetics are correlated to the concentration of species decorating the defect [5].
To obtain atomic configurations with physically motivated atomic short-range order, we draw samples from equilibrated, lattice-based Monte Carlo (MC) simula-tions, as depicted by the first step in Fig. 2. A configuration is described by a set of lattice site occupancies, {ξ iα }, where ξ iα = 1 if site i is occupied by an atom of chemical species α, and 0 otherwise.The internal energy is described by a simple pairwise Bragg-Williams Hamiltonian [36,37], where the atom-atom interchange parameter, V iα;jα ′ , describes the energy associated with an atom of species i on site α interacting with an atom of species j on site α ′ .We assume these interchange parameters are isotropic, homogeneous, and have finite range, which simplifies Eq. 1.
The V iα;jα ′ are obtained using the S (2) theory for multicomponent alloys [38], the details and implementation of which have been discussed extensively in earlier works [39][40][41][42].The theory uses the Korringa-Kohn-Rostoker (KKR) formulation of DFT, with disorder described via the coherent potential approximation (CPA) [43], producing an effective medium representing the electronic structure of the disordered alloy.Our method captures the effects of an ordered magnetic state, in our case the single-layer antiferromagnetic (AFM) state, on the short range order state [41].It is also possible to treat vacancies in this formalism at low concentrations by including a chemical species with no associated electrons, and no nuclear charge.
In this work, we used the HUTSEPOT code [44] to construct the self-consistent potentials of DFT within the KKR-CPA formalism.The interchange parameters are fitted to the first four coordination shells, and they are tabulated in Appendix A for reference.The MC simulations used a cells of 256 atoms with periodic boundary conditions applied in all three directions.Simulations were equilibrated at the desired temperature and then decorrelated samples were drawn 25,600 MC steps apart, i.e. 100 MC steps per lattice site.The samples drawn from high temperature simulations will have little to no short-range order, while those drawn at lower temperatures will have a degree of short-range order present.These lattice-based EM configurations are then relaxed with DFT using setting described in the next section, to get the cell distortion and off-lattice atomic displacements as depicted in the last step of Fig. 2.

C. DFT Settings
The Vienna Ab-initio Software Package (VASP) [45][46][47] was used to compute ground state DFT structure, energy, atomic forces and cell stresses for the training database.MLIPs can only be as accurate as the data they are trained on, so the choice of DFT settings cap the achievable accuracy.A key choice when setting up DFT calculations is which exchange-correlation (XC) functional to use.For structural properties of transition metals, the Perdew-Burke-Ernzerhof [48]  generalised gradient approximation (GGA) functionals is well established in the literature as a good choice, and so this is the XC functional we use.However, for more complex phenomena involving electronic excitations, we would require beyond-GGA approaches such as hybrid exchange, or even beyond-DFT methods such as the GW approach, and so we cannot hope to model such phenomena using a MLIP trained on GGA-based DFT.
Depending on the range of atomic interactions that we hope to model, the choice of pseudopotential would also affect the DFT accuracy.For example, modeling nuclear cascades would require very short-range interactions that benefit from all-electron DFT codes or semi-core pseudopotentials [49].However, since we are targeting mechanical properties, the key players are the outer shell electrons of the alloying elements, so the accuracy we require is not compromised by using softer pseudopotentials for the core electrons.We use the standard VASP pseudopotentials based on the projector augmented wave (PAW) method [50], with eight, six and ten valence electrons for Fe, Cr and Ni respectively.
As Fe and Ni are magnetic materials, and Cr has shown a spin-density wave at low temperatures [51,52], it is essential to turn on the spin setting for more accurate DFT calculations of this alloy.Piochaud et al. reported that their tests with non-collinear spins generally relaxed to collinear arrangements [5], so we constrain our systems to collinear spin.Also, while austenitic steel is paramagnetic (PM) at operation temperatures of nuclear reactors, single-layer anti-ferromagnetic (AFM) ordering was found to be the most stable ground state [5].To ensure we get ground state properties right, we impose an AFM ordering for DFT relaxations of our training samples.Initial magnetic moment magnitudes are overestimated to allow for relaxation, and are set at 3.0, 2.0 and 1.0 µ B for Fe, Cr and Ni respectively.These atomic magnetic moments evolve during DFT electronic minimisation, and are computed in VASP by integrating the spin density within a sphere centred around the atom of interest.
A Monkhorst-Pack k-point mesh [53] of size 3 × 3 × 3 was used to sample the Brillouin zone for all the training samples.Training data was computed with a planewave energy cutoff of 600 eV and an electronic selfconsistency criterion of 10 −7 eV.Preliminary steps, i.e. initial geometry optimisations and ab-initio MD trajectories, were done using a lower self-consistency criterion of 10 −4 eV.Electronic minimisation were performed using the preconditioned RMM-DIIS algorithm, while ionic position optimisations were done using the conjugate gradient method.The ab-initio MD trajectories were run for the NVT ensemble, using the Nose-Hoover thermostat [54,55] and timesteps of either 2.5 fs (for T=1000 K,1500 K) or 5 fs (for T=2000 K).The initial ∼ 200 steps (exact number depending on when the total energy stabilises) of the trajectories were discarded, and uncorrelated samples were taken from the remaining 1000-1500 steps of each trajectory.These were then evaluated with the higher precision DFT settings before including in the training database.

D. Potential Fitting
This section outlines how the GAP fits the Born-Oppenheimer potential energy surface (PES) by mapping the DFT training structures to their DFT energies, forces and stresses.As with other interatomic potentials, the GAP assumes that the energy E N of a configuration with N atoms can be written as where ϵ(q i ) is the energy contribution from atom i in the configuration, and qi is a descriptor vector that captures the unique features of the atom i environment.The atomic energies are modelled as a linear combination of kernels [56] given by where the kernel basis functions K(q j , qi ) capture the similarity between environment i and the M environments of the basis set.As GAP training databases are generally large (eg.N ∼ 160, 000 atomic environments in this work), the usual strategy is to use only a small portion M ≪ N of the training set for the basis set.This makes the fit computationally tractable, while still retaining accuracy as the training database contains many groups of similar configurations that can be sparsified without much loss of information.These M sparse points are selected by CUR decomposition, which searches for maximally dissimilar environments [29], ensuring that the chosen basis is representative of the variety across the full training database.
Moving to the left-hand side of Eq. 3: DFT does not give direct access to the atomic energies ϵ, but rather, it gives us total energies E N of the training configurations, along with atomic forces and virial stresses.As forces and stress are properties that can be derived from atomic energies (e.g.∂/∂x for an atomic force along the Cartesian x direction), we can back-calculate the unknown N component vector ϵ of atomic energies, from the known D component vector y of DFT data, by the relation y = L T ϵ, where L is a linear differential operator L [30].This allows us to derive the covariance kernel of the unknown target atomic energies (K N N ) from the covariance kernel of the known DFT training data (K DD ) using the relation K DD = L T K N N L. The sparse ridge-regression solution to Eq. 3 regularised by a diagonal tolerance matrix Λ = σ 2 ν I is With these optimised coefficients α, we can make predictions for new atomic environments q * using Eq. 3. In this work, we use the Smooth Overlap of Atomic Positions (SOAP) descriptor where c nlm are expansion coefficients for the atomic neighbour density power spectrum [35].For computational tractability, this sum is truncated at radial-basis functions up to order n < n max and spherical harmonics up to degree l < l max .The SOAP descriptor is suitable for describing atomic environments since it is translationally, rotationally and permutationally invariant.[35].
The corresponding covariance kernel is where δ is a factor that corresponds to the spread in energy of the training dataset, and ζ is tuned to amplify variations in the dot product, to pick out fine features.While the SOAP kernel is good at capturing smoothly varying details of the PES, it is not as effective in capturing sharp features as well, such as exchange repulsion when atoms get close to each other, or phase transitions [30].It has been found to be useful to use multiple kernels targeting different scales, so that each kernel does not have to simultaneously fit vastly different features [29].We hence use simple two-body kernels as well, to roughly capture the main trends of the PES, so that the SOAP kernels can focus on finer details.The two-body descriptor is just the pairwise separation between atoms, and we use the squared-exponential form for the two-body kernel, where θ j is a lengthscale hyperparameter.For a 3-species alloy, we require 6 two-body kernels (one for each pair of species), and 3 SOAP kernels (one for each species).The kernel hyperparameters of Eq. 5-6 and Eq. 7 that we use are summarized in Table II.In addition these, we σ energy ν = 0.001 eV/atom, σ force ν = 0.1 eV/ Å, and σ virial ν = 0.05 eV/atom for the regularization described in Eq. 4, values chosen based on the accuracy of our DFT training data.The MPI parallel gap fit code from the QUIP software package [57,58] was used to fit the GAP on the full dataset of Table I.
The cumulative distribution functions of the training and test energy errors of the resulting non-magnetic GAP are shown in Fig. 3. Similar plots for the force and stress errors have been included in Appendix B. The GAP has energy training and test error within 2 meV/atom and 6 meV/atom respectively, which are comparable to the differences in DFT energies due to choice of AFM layering direction.The train and test force errors are below 200 meV/ Å for low temperature configurations and 400 meV/ Å for high temperature configurations.The relatively large force errors for low-temperature configurations are because the GAP cannot fit the directional spininduced face-centered tetragonal (FCT) ground state structure, as the GAP does not have spin degrees of freedom.The stress training and test errors are within 2 GPa and 4 GPa respectively, which are small as they are less than 2% of the typical elastic constant values that are about ∼200 GPa.Error in energy per atom (meV/atom)

E. Spin Model
As the SOAP and two-body descriptors do not have any information on spin, the standard GAP is spinagnostic and cannot capture the tetrahedral deformation of AFM layering, and this limits the accuracy of its predictions.To address this, we introduce a fictitious fourth species to the GAP model, such that we have different descriptors for up spin (Fe↑) and down spin (Fe↓).This expanded descriptor set gives the model the flexibility to learn the tetrahedral deformation of AFM layering, and could be extended to the PM phase too.We henceforth refer to this 4-species potential as the Spin GAP.
As the SOAP descriptor length scales quadratically with number of species, slowing down training and evaluation times prohibitively, we needed to use descriptor compression methods to make the training computationally tractable, such as those introduced in [59].We use combined radial and species compression with tensor product coupling across 300 mixed radial and species channels.To check the effects of compression, a compressed version of the 3-species GAP was also fit.Fig. 3 show that the compressed GAP has comparable errors to the standard GAP, verifying that there is no significant loss of accuracy when compression is used for this system.On the other hand, the Spin GAP, has significantly improved training and test errors over those of the standard GAP.Typical energy errors decrease by 1 meV/atom, force errors by about 50 meV/ Å, and stress training errors decrease by about 0.2 GPa.

A. Bulk properties
The ground state bulk properties of the FeCrNi alloy predicted by the GAP and the Spin GAP are summarised in Table III, along with the corresponding DFT, EAM and experimental values for comparison, as well as DFT for pure Fe in an AFM spin state.The DFT results for the ground state structure of both AFM-Fe and AFM-FeCrNi are FCT, with the longer lattice constant corresponding to the AFM layering direction.This tetragonal distortion is more significant in pure AFM Fe (∼ 0.26 Å) compared to the alloy (∼ 0.08 Å), as the high concentration of Cr and Ni in the alloy lattice reduce the magnetism-induced geometry effects.Relaxation of EM structures with EAM-13 gives a lattice constant within the DFT and experimental range; however, EAM-13 fails to predict the lattice distortion, leading to a symmetric face centred cubic (FCC) lattice instead, as seen from the first set of bars in Fig. 4a.
In comparison, the standard GAP predicts that a distorted structure is more energetically favourable than a symmetric FCC structure, as it is trained on relaxed DFT structures that have FCT lattices with zero atomic forces.However, since the standard GAP has no spin degrees of freedom, it does not have the framework to process the spin inputs required to elongate one direction over the other two.Instead, the GAP training correlates the distortion to minor differences in the alloy composition along the different axes in the relaxed training set geometries, and so predicts a face-centred orthohombic (FCO) relaxed lattice based on composition along the different axes of test structures.As seen from the second set of bars in Fig. 4a, the three lattice constants of the GAP relaxed FCO structure are equally spaced in a similar range to that of the two DFT relaxed FCT lattice constants.The Spin GAP on the other hand, has a more flexible model form that can capture the FCT distortion associated with alternating Fe↑ and Fe↓ layers.The third set of bars in Fig. 4a show that the Spin GAP successfully predicts lattice constants in excellent agreement with DFT, and the experimental values in Table III.
The cumulative distribution functions of the non-zero elastic matrix components of EAM-13, Spin GAP and DFT are shown in Fig. 4b.As the EAM predicts an FCC structure, it has only 3 distinct non-zero elastic constants {C 11 , C 12 , C 44 }.The DFT and Spin GAP predict an FCT structure, and so have 6 distinct non-zero elastic constants {C 11 , C 33 , C 12 , C 13 , C 44 , C 66 }.The standard GAP on the other hand has 9 distinct non-zero elastic constants as it predicts an FCO structure, the mean values of which are in Table III.
As seen in the first two sets of bars in Fig. 4b, the Spin GAP slightly overestimates C 11 and C 33 by ∼8% each, but gets their relative magnitude right.This is a significant improvement from EAM-13 which overestimates the DFT C 11 by 30%.The EAM-13 C 11 is in Given that the Spin GAP predicts the different elastic constants well, it unsurprisingly also predicts the equation of state reasonably well.Fig. 5a compares the three Spin GAP E-V curves (AFM spin layering along X, Ŷ and Ẑ directions) of a 256-atom test configuration with respect to its DFT EV curves.We can see than the Spin GAP EV curves fall within the range of the DFT EV curves, even if they do not match exactly.The Spin GAP also gets the relaxed volume (E-V curve minimum) right, whereas the EAM-13 overestimates the relaxed volume as it is restricted to the FCC lattice.The bulk modulii computed from fitting an equation of state to these Spin GAP E-V curves are in very good agreement with DFT, overestimating it by just 4 GPa (∼ 2%), as seen in Fig. 5b (green distribution vs black dashed DFT reference), whereas the EAM-13 (pink distribution) overestimates the bulk modulus by 21%.
As the Spin GAP is orders of magnitude faster than DFT, it allows for much more sampling of different random configurations, or larger supercells than the 256atom ones used for DFT.Fig. 6 shows the mean and spread (one standard deviation) of the Spin GAP relaxed energy distributions at different supercell sizes, computed over 40 different configurations for each size.We see that the Spin GAP relaxed energy converges for supercells with more than 200 atoms, and its converged en- ergy is in good agreement with the DFT reference values (black crosses) within the converged size range.For smaller sizes, as is the case when applying DFT to random alloys, finite size effects are significant leading to unconverged energy predictions.Also, the distributions at smaller sizes are broader, because each configuration's energy is computed over a smaller set of random local environments, leading to a larger variance between different configurations.
For the cell size used for training (256-atoms), the full distribution of Spin GAP relaxed energies is in Fig. 10a, and we can see that its peak agrees very well with the DFT mean.The spin-agnostic GAP on the other hand underestimates the ground state energy by about 2 meV/atom.We attribute this underestimation to the fact that, while the GAP correlates the tendency to distort to minor changes in composition, it associates FCT directionality to noise as it does not have spin degrees of freedom.The GAP training regularizes the forces of the low energy DFT training structures, and extrapolates to predicting FCO as a lower energy structure than FCT.The Spin GAP overcomes this model form error by associating the tetragonal distortion with the AFM layering.

B. Chemical potentials
An essential ingredient for accurately computing energetics of point defects and their clusters is the chemical potential µ of each element in the alloy.This is the energy required to move an atom of the element from vacuum (infinitely far away from any interactions) to its bulk lattice site in the alloy supercell.The energy E A→B twice the standard deviations of Spin GAP and EAM-13 vacancy formation energies E vac of 200 different EM configurations respectively, split by vacancy element type.The DFT mean and range of E vac are taken from [5].required to substitute an element A with another element B at a lattice site in an alloy is equal to This gives three equations for A,B ∈ {Fe,Cr,Ni}, of which only two are independent.To solve for the three unknown chemical potentials, we require three independent equations, and we can get this third equation from the sum of chemical potentials of the N atoms in a configuration, which gives the bulk energy E N of that configuration.For Fe 7 Cr 2 Ni, this is expressed by the equation The chemical potentials µ Fe , µ Cr and µ Ni can be determined for any lattice site in a cell by computing the two relevant substitution energies, and then solving the system of equations defined by Eq. 8-9.For the Spin GAP, we solve this 3-species system of equations twice: first for {Fe↑,Cr,Ni}, and then for {Fe↓,Cr,Ni}, and we then take the average of the two resulting chemical potentials per species to get µ Fe , µ Cr and µ Ni .
This procedure was carried out for randomly chosen lattice sites of 800 different configurations using the Spin GAP.The distributions of substitution energies are shown in Fig. 7a, with reference DFT minimum values E A→B subs from [5].We see that the minimum substitution energies from literature (using the same DFT settings as this work) agree well with the minima of the corresponding Spin GAP substitution energy distributions.The resulting chemical potential distributions are shown in the violin plots of Fig. 7b.DFT chemical potentials averaged over 5 configurations are also labelled using black crosses.We can see that the Spin GAP ⟨µ Fe ⟩ agrees very well with DFT.The Spin GAP ⟨µ Ni ⟩ and ⟨µ Cr ⟩ are slightly offset from the corresponding DFT averages by +2% and −2% respectively; this is probably a result of not accounting fully for the magnetic nature of Ni and Cr in this alloy.

C. Monovacancies
To compute vacancy formation energies, we start with a relaxed bulk configuration of N atoms and energy E N , remove one atom of a given species 'el', and then carry out a fixed-cell relaxation of the atomic positions to get the energy E el N −1 of the vacancy configuration.As the vacancy configuration has one atom less than the corresponding bulk system, the comparison of the two configurations requires the chemical potential µ el of the atom removed to form the vacancy, giving the formula where quantities (µ el +E el N −1 ) and E N both have N atoms each.We first compute the intermediate step (E el N −1 − E N ) for 800 different configurations using the Spin GAP and the results are shown in Fig. 7c.In this case, the Spin GAP result for Cr agrees well with DFT, whereas those for Fe and Ni are underestimated.These offsets are consistent with the Spin GAP underestimating E Fe→Ni slightly, as seen previously in Fig. 7a.
In fact, the small species-dependent errors in µ el and E el N −1 − E N cancel each other out to give a consistent behaviour in vacancy energy E vac , irrespective of which element was removed to form the vacancy.This is evident from the consistent mean and spread of the E vac energy distributions denoted by the bars in Fig. 7d.This cancellation of errors makes sense as µ el involves introducing an atom into the bulk whereas E el N −1 E N involves the opposite procedure of removing an atom from the bulk structure.The resulting prediction of E SpinGAP vac =1.80 eV is in good agreement with the DFT value of E DFT vac =1.96 eV from the literature [5].This prediction for E vac is significantly more accurate than the standard spin-agnostic GAP trained on the same database which predicts E GAP vac =1.6 eV.The Spin GAP also performs marginally better than EAM-13, which gives a mean prediction of E vac =1.77 eV as seen in Fig. 7d.We attribute the remaining error of -0.2 eV in the Spin GAP prediction to the incomplete magnetic model, as we do not account for spin polarisation of the electron density of Cr and Ni or for variations in Fe magnetic moment magnitudes near to defect sites.A DFT study on this alloy finds that the DFT vacancy formation energy drops from 1.98 eV down to 1.80 eV when they switch from imposing paramagnetic spins to ferrimagnetic spins on the same chemical composition, showing that changes in the magnetic state alone can indeed lead to an order of 0.2 eV changes in E vac [12], further suggesting that the Spin GAP's underestimation of E vac is likely due to limitations of our spin model.On the other hand, a recently reported fully noncollinear magnetic atomic cluster expansion (ACE) model for iron also makes similar magnitudes errors in vacancy formation energies (errors of 0.4 eV and 0.2 eV when trained without and with extended defect subdatabases respectively), despite being a much more rigorous magnetic model [63].In their case, the errors probably come from challenges in sampling their multiplicatively larger spin space (as spin magnitudes and interspin angles are included).Whereas in our case, our simpler spin model comes with a relatively constrained spin space that is much easier to sample well.It is a promising result that we get comparable performance, suggesting that our spin model captures the main magnetic contributions despite being relatively simple.It would be interesting to try relaxing a few of the assumptions of our simple spin model while still keeping the spin space manageable, but that is beyond the scope of this paper and will be investigated in future works.

D. Atomic Short Range Order
DFT studies [5,10]  The computational expense of calculating chemical potentials (two fixed-cell relaxations of element substitutions Eq. 8 for each vacancy) makes it impractical to compute it for every vacancy using DFT, but this is now possible to run cheaply with the Spin GAP.Fig. 8 from left to right shows the trends in energy difference E el N −1 − E N , the corresponding chemical potentials, and the vacancy formation energies with respect to number of Cr neighbours as predicted by the Spin GAP.These results show that the chemical potential is also correlated to the number of Cr neighbours, with the opposite slope to the energy difference E el N −1 − E N .Computing vacancy formation energy involves adding up these two quantities (Eq.10) and so their opposing slopes offset each other, reducing the overall magnitude of positive correlation between E vac and number of Cr neighbours.Therefore, while we get consistent gradient of 0.02 eV/Cr for our E el N −1 − E N and the E vac from the literature (see values in Table IV), the inclusion of variations in chemical potentials decreases the overall correlation.This makes sense as the chemical potential accounts for majority of the heterogeneity in the vacancies neighbourhood.The remnant positive correlation is likely because of the volume of the vacancy which has previously been seen to affect stability [10]: as Cr atoms are the smallest species in the alloy, they relax less into the vacancy, leading the larger vacancies than are less stable and have a higher formation energy.
The inverse trends (but to a weaker extent) of the three subplots of Fig. 8

E. Alloy Composition
In the 304 and 316 austenitic stainless steel grades, the typical Cr and Ni concentrations can vary upto ±4% from the model 70:20:10 ratio of Fe, Cr and Ni in our training set.It would hence be useful to check if our Spin GAP can extrapolate to small variations in the alloying ratios.Fig. 9 shows the results for bulk relaxation for configurations within ±4% variations in Cr (blue) and Ni (orange) concentrations.We see very good agreement with the DFT reference values in this range, validating that the Spin GAP can indeed be used to extrapolate to compositions about the training ratio.This is because the GAP framework breaks down each configuration into The gradients of the linear trends in Fig. 9 are 0.0132 eV/%Cr and 0.0280 eV/%Ni.These convert to 1.32 eV and 2.80 eV for single atom Cr→Fe and Ni→Fe swaps respectively, which are consistent with the respective mean substitution energies of Fig. 7a.This linear trend indicates that for small variations in alloying concentrations, the substitution energies do not change much.The chemical potential however would vary, since their computation involves the bulk energy as well via Eq. 9, and these bulk energies vary linearly as seen in Fig. 9. Since we have accurate bulk and substitution energies for the ±4% concentration range, we can expect all derivative properties computed in the previous results sections to follow through reliably.

F. Paramagnetism
Although the Spin GAP was trained only on AFM DFT data, in this section we assess how well it extrapolates to the paramagnetic (PM) state.Comparing the magnetic ground state energies, Fig 10a shows that the Spin GAP correctly predicts the PM ground state of this alloy to be of higher energy, and hence less stable at 0K, than the AFM ground state.The unshaded orange distribution comprises energies of the starting configurations initialized with perfectly random 50% up and down spin iron atoms, whereas the shaded orange distribution comprises their equilibrated energies after using Metropolis Monte Carlo with spin flip moves to optimise their para- magnetic spin states.We see that the equilibrated PM energy distribution agrees very well with the mean PM energy from DFT (orange dashed line in Fig. 10a), showing that the Spin GAP can be used to generate reliable PM spin state configurations despite not being specifically trained for it.Fig. 10b shows the direct correlation plot for energies of the 13 paramagnetic DFT test structures, where Spin GAP was used to evaluate these test structure with the same spin state as that from DFT.We see that the Spin GAP reproduces the DFT energies of these paramagnetic configurations very well, as all the points lie along the diagonal with the maximum deviation being 6 meV/atom.For the PM ground state structure, DFT predicts a FCC structure with an average lattice constant of 3.53 Å.The Spin GAP accurately predicts the paramagnetic FCC structure, with lattice constants that agree remarkably well with DFT, as seen from the orange bars of Fig. 11.The lattice constants of all samples were sorted in ascending order and assigned to a, b and c, respectively, and Fig. 11 shows their mean values.The small ∼0.005 Å difference between the three PM lattice constants is probably due to minor variations in ratio of the three species along the three axes, and this too is consistent between the Spin GAP and DFT.As seen from the green bars in Fig. 11, the Spin GAP agrees reasonably well with the paramagnetic elastic constants as well, agreeing within 5-8% to their respective DFT reference values from the study by Antillon et.al.[12].The fact that the Spin GAP reproduces these properties is a promising result, as the paramagnetic state is relevant to modelling the temperature range of most applications of Fe 7 Cr 2 Ni based austenitic steels.

IV. CONCLUSION
In summary, we have developed two machine learning interatomic potentials -a standard GAP and a collinear Spin GAP -for a model Fe 7 Cr 2 Ni alloy representative of austenitic steels.They are each trained on the same DFT database containing 159k atomic environments, and validated against as independent test set.We demonstrate the shortcomings of the standard GAP in not being able to capture the ground state tetragonal distortion due to AFM layering observed in this alloy, as the standard GAP formalism cannot account for magnetic spins.We then proposed an extended model incorporating spin to correct for this issue, and verified that the Spin GAP predicts the ground state structure, energy, elastic properties and vacancies in very good agreement with DFT.
Atomistic modelling using this Spin GAP could be used to compute high accuracy inputs to larger scale microstructural models.Our Spin GAP is seen to extrapolate well to small variations in alloy composition making is useful for atomistic studies of more variety in austenitic steel grades.The spin model is also seen to describe the paramagnetic state well, which is relevant to modelling the alloy at operating temperatures of the steel components in nuclear and industrial machinery.Beyond the properties validated in this paper, it could also be extended via iterative re-training to model more phenomena of interest such as diffusion, phase transformations or responses to radiation.For instance, by adding and testing the description of grain boundary structures, the Spin GAP could be used to study segregation of Cr at grain boundaries under radiation.Another avenue for future work would be to extend this Spin GAP to study hydrogen embrittlement in austenitic steel, relevant for FIG. 2: Protocol for generating bulk and vacancy training structures that sample different elemental configurations, with realistic short-range ordering derived from effective medium (EM) theory.In this case, the DFT relaxation off-lattice displacements shown by the red arrows of step 3 are for AFM layering along the x axis (AFM-X).

B
FIG. 5: (a) Energy volume (E-V) curves predicted by DFT (crosses) and Spin GAP (solid lines) with AFM layering in x, ŷ and ẑ directions, and the corresponding spin-agnostic EAM-13 (dotted pink line).(b) Bulk modulus computed from a Birch-Murnaghan equation of state fit to the E-V curves.

FIG. 6 :
FIG.6: Convergence of Spin GAP ground state energy with respect to supercell size.The green dots and bars show the mean and standard deviation respectively, of the Spin GAP energy distribution of 40 configurations at each cell size.The Spin GAP energy converges to 1 meV/atom for cells with >200 atoms, and the converged energy agrees well with DFT (black crosses).

FIG. 7 :
FIG.7:(a) Distribution of substitution energies for elemental swaps predicted by Spin GAP (100 configurations each) and corresponding DFT minima (dashed lines) computed from 40 DFT calculations performed by Piochaud el al.[5].Distribution of (b) chemical potentials µ and (c) energy differences E N −1 − E N predicted by the Spin GAP from 800 configurations (256-atom) with respect to DFT means from 5 configurations (256-atom).(d) Mean and twice the standard deviations of Spin GAP and EAM-13 vacancy formation energies E vac of 200 different EM configurations respectively, split by vacancy element type.The DFT mean and range of E vac are taken from[5].

FIG. 10 :
FIG.10:(a) DFT mean values (dashed lines) and Spin GAP distributions for ground state energies for AFM (green) vs PM states using uniform random 50% up-down iron spins (unshaded orange) and after Monte Carlo optimisation with spin flip moves (shaded orange), comprising 400, 50 and 50 configurations respectively.(b) Correlation plot of Spin GAP relaxed energies for 13 PM reference DFT configurations with respect to their DFT relaxed energies.

FIG. 11 :
FIG. 11: Paramagnetic lattice constants (orange bars) and elastic constants (green bars) of the Spin GAP averaged over 50 configurations, compared to their respective DFT reference values (black bars).The DFT reference for lattice constants are an average over our 13 DFT PM test configurations, while those for the elastic constants are from the study by Antillon et.al.[12].

TABLE I :
Number of training and testing configurations for each subgroup DBx, comprising either 256-atom bulk or 255-atom vacancy configurations.

TABLE II :
Parameters for the collinear-spin GAP.

TABLE III :
Lattice parameters and elastic constants of the FeCrNi alloy computed using DFT, GAP, Spin GAP and EAM-13 at 0 K.For comparison, DFT results for AFM Fe at 0 K and experimental values for finite temperatures have also been included.Dashes indicate that the value is equal to the previous value by symmetry.
4b, while EAM-13 predicts an averaged value between the DFT [C 44 , C 66 ] range.Overall, the GAP elastic constants agree well with DFT.The 8-10% overestimation of modes {C 11 , C 33 , C 13 } indicate that there are still errors in getting the behaviour of the long vs short axis of the spin-induced tetrahedral deformation.This is likely because our spin model is still quite a simple approximation to the full physics of the alloy -it excludes contributions from Cr and Ni spins and ignores spin magnitudes.

TABLE IV :
[5,10] correlation of vacancy formation energy with respect to number of first nearest neighbours (#1nn) of type Cr and Ni respectively, from the literature[5,10](column 2-3) and our work (column 5), and for (E Fe N −1 − E N ) from our work (column 4).
[10]lude that vacancies in the Fe 7 Cr 2 Ni are more stable in Ni-rich environments and less stable in Cr-rich environments.The gradients reported in the literature for the linear correlation between E vac and number of Cr and Ni neighbours respectively have been included in Table IV for reference.The two studies agree well on the correlation with Ni neighbours, whereas Manzoor et.al.find a weaker correlation with respect to Cr neighbours compared to Piochaud et.al., which they attribute to the fact that they have four times the amount of sampling[10].In both cases however, they use constant pre-computed values for the chemical potentials, so trends in their vacancy formation energy Eq. 10 are effectively proportional to trends in just E el N −1 − E N .Linear correlation between (a) energy difference of vacancy vs bulk configurations E Fe N −1 − E N for vacancies generated by removing an iron atom each from the 200 different bulk configurations, (b) corresponding iron chemical potentials µ Fe for those vacancy sites, and (c) corresponding vacancy formation energy E vac computed by adding the quantities of (a) and (b), all with respect to number of Cr atoms amongst the 12 nearest neighbours of the vacancy.
are seen with respect to number of Ni neighbours, the gradients of which are reported in Table IV.The Spin GAP predicts a smaller negative correlation of -0.01 eV/Ni for E el N −1 − E N compared to the -0.04 eV/Ni values in the literature.Similar to the discrepancy between the two DFT studies for Cr trends, our lower correlation could be due to more variety in sampling.Piochaud et.al.generate all their vacancy structure by removing different atoms from one 256-atom SQS structure, and Manzoor et.al.do the same over two 256atom SQS structures, leading to a large overlap in the neighbourhood of the vacancies they consider.On the other hand, our Spin GAP results were computed for vacancies introduced in 200 different starting structures from effective medium theory, and so sample a larger variety in long range vacancy environments.Despite the weaker correlations, we do still get a small negative relation of -0.004 eV/Ni for E vac and number of Ni neighbours.Similar to the previous reasoning on stability of vacancy volumes, Ni atoms are the largest species in the alloy, so having more Ni neighbours leads to smaller vacancy volumes which are marginally more stable.