Multilayer atomic cluster expansion for semi-local interactions

Traditionally, interatomic potentials assume local bond formation supplemented by long-range electrostatic interactions when necessary. This ignores intermediate range multi-atom interactions that arise from the relaxation of the electronic structure. Here, we present the multilayer atomic cluster expansion (ml-ACE) that includes collective, semi-local multi-atom interactions naturally within its remit. We demonstrate that ml-ACE significantly improves fit accuracy compared to a local expansion on selected examples and provide physical intuition to understand this improvement.

Recent years have seen tremendous progress in modelling atomic interactions [1][2][3][4][5][6][7]. State-of-the-art machine learning interatomic potentials interpolate reference data from high-throughput electronic structure calculations with errors on the order of meV/atom [8][9][10]. Commonly, the energy or other atomic quantities are represented as a function of the local atomic environment enclosed within a cutoff radius centered on each atom. Contributions to the energy from electrostatics cannot be partitioned into local atomic environments, and methods to incorporate such long-range interactions efficiently have been developed 11, including self-consistent models that mimic the charge transfer of the underlying electronic structure calculations 12,13.
However, the true potential energy from electronic structure calculations contains contributions that evade a local chemical description and cannot be captured by long-range electrostatic models either, even if self-consistent charge transfer is included. We introduce the term "semi-local" for interactions that reach significantly beyond the local atomic environment but are not directly associated with long-range charge transfer or directed bond formation. Semi-local interactions are ubiquitous in density functional theory (DFT) and arise from the relaxation of the electronic structure, yet they have not been discussed in the context of machine learning potentials. Examples are the change of interactions in small clusters with size, which approach bulk interactions only slowly; intra-atomic occupation changes upon hybridization, such as the promotion of electrons in carbon from s to p states, which alter the carbon bonding characteristics; variations in atomic hybridization in different atomic environments that induce metal-insulator transitions, with consequences for the decay of the density matrix and related bond formation; and electronic states along one-dimensional chains that can extend far beyond the local chemical environment.
We build our analysis of semi-local interactions on a local description of the electronic structure and expand the DFT energy with respect to modifications of the density matrix [14][15][16][17], with Hamiltonian matrix elements $H_{i\alpha j\beta} = \langle i\alpha|\hat{H}|j\beta\rangle$, where we further took the orbitals $|i\alpha\rangle$ to be orthonormal and complete. This enables the density matrix to be reconstructed as a linear combination of products of Hamiltonian matrix elements of varying order 7,19,20, where the response functions $\chi_N = \chi_N(E_F)$ depend on the Fermi level $E_F$. In the tight-binding approximation, off-diagonal Hamiltonian matrix elements $H_{i\alpha j\beta}$ with $i \neq j$ depend only weakly on electronic redistribution, while the diagonal elements $H_{i\alpha i\alpha}$ follow the effective one-particle potential and adjust to optimize the energy for hybridization and charge transfer 15. The detailed change $\Delta H_{i\alpha i\alpha}$ is a function of the local atomic environment of atom $i$, which may be understood from the moments theorem, Eq.(2), applied to the local density of states $n_{i\alpha i\alpha}(E)$. For example, in a metal charge transfer is often negligible and $\Delta H_{i\alpha i\alpha}$ adjusts to variations in the local density of states to keep the number of electrons on an atom constant. Therefore an electronic relaxation $\Delta H_{k\gamma k\gamma}$ on atom $k$ will affect the density matrix between atoms $i$ and $j$ to lowest order as $\Delta\rho_{i\alpha j\beta} \sim \chi_3\, H_{i\alpha k\gamma}\, \Delta H_{k\gamma k\gamma}\, H_{k\gamma j\beta}$. In general the off-diagonal Hamiltonian elements decay rapidly with the distance between atoms $i$ and $j$, so that the direct interactions remain local, while the cascade of on-site relaxations can propagate from atom to atom and thereby mediate semi-local interactions.

Here, we present a general framework for integrating semi-local interactions from electronic structure calculations efficiently into machine-learned potentials (MLPs). This is achieved by first describing the atoms according to their local atomic environment before modelling the chemical bond formation between the atoms. In a first step, atoms condense information about their local environment into an indicator field. Next, atoms condense the indicator fields of their neighbors into their own indicator field, see Fig. 1.
This is repeated and with each layer information from atoms at larger distances is incorporated, mimicking the electronic structure relaxation cascade in self-consistent calculations and enabling the description of collective interactions that extend multiple times beyond the local cutoff radius.
The Atomic Cluster Expansion (ACE) 6,21-23 provides a complete and efficient local representation of the energy as a sum over atomic contributions, $E = \sum_i E_i$ with $E_i = \mathcal{F}\big(\varphi_i^{(1)}, \dots, \varphi_i^{(P)}\big)$, where $\mathcal{F}$ is a general non-linear function. Each atomic property $\varphi_i^{(p)}$ is given by a linear expansion, $\varphi_i^{(p)} = \sum_{\boldsymbol{v}} \tilde{c}^{(p)}_{\boldsymbol{v}} A_{i\boldsymbol{v}}$, with expansion coefficients $\tilde{c}^{(p)}_{\boldsymbol{v}}$ and multi-atom basis functions $A_{i\boldsymbol{v}}$.
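The evaluation pattern just described, single-bond basis functions summed into an atomic base, multiplied into product basis functions, contracted linearly, and passed through a non-linear readout, can be sketched as follows. This is a minimal illustration of the ACE structure, not the paper's implementation; the function names, the toy basis, and the coefficient values are all assumptions.

```python
import numpy as np

def atomic_energy(neighbour_states, phi, products, coeffs, F):
    """Sketch of one atomic ACE contribution E_i = F(varphi_i).

    neighbour_states: per-bond states sigma_ji of the neighbours of atom i
    phi: callable returning the vector of single-bond basis values phi_v
    products: index tuples (v1, ..., vN) defining the multi-atom product basis
    coeffs: expansion coefficients c_v, one per product basis function
    F: non-linear embedding function (identity gives a linear ACE model)
    """
    # atomic base A_iv: sum of single-bond basis functions over neighbours
    A = sum(phi(s) for s in neighbour_states)
    # product basis A_iv = prod_n A_ivn of various body order
    A_prod = np.array([np.prod([A[v] for v in vs]) for vs in products])
    varphi = coeffs @ A_prod        # linear atomic property
    return F(varphi)                # non-linear readout

# illustrative toy basis: phi_0 = r, phi_1 = r^2 for a scalar bond feature r
phi = lambda r: np.array([r, r ** 2])
E_i = atomic_energy([1.0, 2.0], phi, [(0,), (0, 1)],
                    np.array([1.0, 0.5]), lambda x: x)
```

The body order of each product basis function equals the number of atomic-base factors it contains, which is why low product orders already encode many-body information.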
Other variables than the atomic positions may be taken into account 24; for example, magnetic moments may be assigned to the atoms and the resulting energy then depends on the values of these variables. To this end the state $\boldsymbol{\sigma}$ of the system is introduced that collects all necessary variables and comprises edges and vertices. We extend ACE to include semi-local interactions by extending the description of the state of an atom by an indicator field $\boldsymbol{\theta}$. The field is general: it may consist of different contributions and comprise several scalar, vectorial or tensorial elements. For an expansion on atom $i$, the state of a neighboring atom $j$ is defined as $\sigma_{ji} = (z_j, \boldsymbol{r}_{ji}, \boldsymbol{\theta}_j)$, while the state of atom $i$ is given by $\sigma_i = (z_i, \boldsymbol{r}_i, \boldsymbol{\theta}_i)$, where $z_i$, $\boldsymbol{r}_i$ denote, respectively, the chemical element and position of atom $i$. For simplicity we left out possible dependencies on charge, magnetism, etc. These variables are absorbed into a complete set of single-particle, respectively single-bond, basis functions $\phi_v(\sigma_{ji})$, from which the atomic base $A_{iv} = \sum_j \phi_v(\sigma_{ji})$ is computed. Products $A_{i\boldsymbol{v}} = \prod_n A_{iv_n}$ of various order form a complete set of basis functions and enable Eq.(6) to represent any atomic function of $\boldsymbol{\sigma}$. The expansion remains essentially unchanged when carried out for vectorial or tensorial objects 24.
Often it is pragmatic to choose product basis functions of the form $\phi_v(\sigma_{ji}) = \phi_{v'}(z_j, \boldsymbol{r}_{ji})\, T_k(\boldsymbol{\theta}_j)$, where $v = (v', k)$ collects the necessary indices and the functions $T_k$ are complete in the space spanned by the indicator fields 24. The indicator field of an atom in turn depends on the indicator fields of other atoms, such that a recursive dependence is established, as illustrated in Fig. 1.
For an expansion with several layers, the indicator fields $\boldsymbol{\theta}_j^{(n)}$ and expansion coefficients $\tilde{c}^{(p,n)}$ differ from layer to layer, and atomic properties of layer $n$ are represented as $\varphi_i^{(p,n)}$. The indicator field in the next layer $n+1$ is then obtained from a non-linear function as in Eq. (5), $\boldsymbol{\theta}_i^{(n+1)} = \mathcal{H}\big(\varphi_i^{(1,n)}, \dots, \varphi_i^{(P,n)}\big)$. Layer $n_{\max}$ is the output layer, while the input layer $n = 0$ is initialized via $T_k(\boldsymbol{\theta}^{(0)}) = 1$. The indicator fields in general carry particular symmetries, most importantly covariance under rotation and inversion, which requires symmetrization. 21,24 We denote the resulting model the multilayer Atomic Cluster Expansion (ml-ACE).
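The layered update can be sketched for the simplest case of one scalar indicator per atom. This toy sketch (not the tensorpotential implementation) assumes the indicators are initialized to zero and that each layer mixes a geometric pair feature with the neighbour's indicator through per-layer weights; all names and weights are illustrative. With n_max layers, information propagates up to n_max cutoff radii away from the central atom.

```python
import math

def ml_ace_layers(neighbour_lists, pair_feature, weights, n_max, H=math.tanh):
    """Toy multilayer update: theta_i^(n+1) = H(varphi_i^(n)).

    neighbour_lists[i]: indices of atoms within the cutoff of atom i
    pair_feature(i, j): scalar geometric feature of the i-j bond (assumed)
    weights[n]: per-layer pair (w_geo, w_theta) mixing geometry and the
                neighbour's indicator field
    """
    n_atoms = len(neighbour_lists)
    theta = [0.0] * n_atoms                    # layer 0: trivial indicator
    for n in range(n_max):
        w_geo, w_theta = weights[n]
        # each atom condenses its neighbours' states into a property varphi
        varphi = [
            sum(w_geo * pair_feature(i, j) + w_theta * theta[j]
                for j in neighbour_lists[i])
            for i in range(n_atoms)
        ]
        theta = [H(p) for p in varphi]         # indicator for layer n + 1
    return theta

# three-atom chain: atom 1 is the middle atom
theta_out = ml_ace_layers([[1], [0, 2], [1]], lambda i, j: 1.0,
                          [(0.1, 0.0), (0.0, 1.0)], n_max=2)
```

After the first layer the end atoms already differ from the middle atom through their coordination; after the second layer each atom has absorbed information from second neighbours via the indicators, mirroring the relaxation cascade described in the text.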
By adding new features in the form of indicator fields, ml-ACE becomes high-dimensional, and sparse basis sets are required for converging ml-ACE. This may be achieved, e.g., with sparsification algorithms, but in the present work we are guided by physical intuition and a hierarchical analysis. The further a layer is from the output layer, the smaller its impact on the final prediction, i.e. the details are often lost in the distance. This means that the complexity of the indicator field needs to be varied across the layers for best performance and efficient convergence. Because of the layered structure of ml-ACE, force gradients may be obtained efficiently, and the computational expense grows at most linearly with the number of layers, as some basis functions, in particular the spherical harmonics, can be reused.
The ml-ACE may be adapted to represent various message passing network architectures. In fact, general message passing networks may be obtained as special cases of ml-ACE, see Batatia et al. 25 and Nigam et al. 26. For example, the network of Thomas et al. 27 and the closely related NequIP network 28 may be cast in the form of a particular low-order ACE on each layer.
In the following we present results for a basic version of ml-ACE with scalar (invariant) indicator functions. In simple tight-binding models the differences of the onsite levels are fixed, and their shifts may be characterized by the first moment of the atomic density of states, $\mu_i^{(1)} = \sum_\alpha H_{i\alpha i\alpha}$, using Eq.(2). We therefore take a single indicator variable, which in addition we assume to be rotationally invariant, and which is represented as a function of the local atomic environment by ACE as in Eq.(6). We expect a linear change of the energy with the indicator field for small changes, Eq.(4). To make contact with traditional neural networks, we further choose $\mathcal{H} = \tanh$, so that the indicator is transformed as $\theta_i = \tanh(\varphi_i)$. The order parameter becomes part of ACE by a suitable choice of single-bond basis functions, and for a single order parameter we choose the basis functions $T_k$ as Chebyshev polynomials of the first kind.
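The choice above is numerically convenient: tanh maps the linear ACE property into (-1, 1), which is exactly the natural domain of Chebyshev polynomials of the first kind. A minimal sketch of this transformation, with an illustrative value for the atomic property:

```python
import math

def chebyshev_basis(theta, k_max):
    """T_0..T_k_max via the recurrence T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x)."""
    T = [1.0, theta]
    for _ in range(k_max - 1):
        T.append(2.0 * theta * T[-1] - T[-2])
    return T[:k_max + 1]

varphi_i = 0.7                  # illustrative value of the linear ACE property
theta_i = math.tanh(varphi_i)   # bounded scalar indicator in (-1, 1)
basis = chebyshev_basis(theta_i, k_max=3)
```

The resulting values T_k(theta_i) then enter the next layer's single-bond basis functions in the same way as any other bond feature.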
We demonstrate the performance of ml-ACE with examples of two distinct cases, namely small metallic Cu clusters and ten small organic molecules from the revMD17 dataset. 29 In a small cluster of N atoms to which one more atom is added, the extra atom can significantly change the interactions between all atoms in the cluster, implying that (N+1)-body interactions are necessary, and (N+2)-body interactions when a further atom is added. One expects a slow convergence to bulk interactions, as $N^{-1/3}$, simply because in a compact cluster the number of surface atoms (with a significantly modified density of states) is proportional to $N^{2/3}$. We employ a dataset of nearly 70,000 small Cu clusters containing from 2 to 25 atoms. The energies and forces of the cluster configurations were computed with FHI-aims 30,31 using a tight basis set. Most of the reference clusters were generated by randomly placing atoms inside a sphere of a given radius while ensuring that the distance between any pair of atoms is not smaller than 80% of the bulk nearest-neighbor distance. In addition, cuts from various crystalline bulk phases were used, as well as cluster geometries obtained with empirical potentials. For clusters of up to four atoms the positions were varied systematically to sample the complete configuration space. The distribution of the sizes and energies of the clusters is shown in Fig. 3a. All cluster structures were distinctly different, none of the clusters were relaxed, and in particular we did not use data along molecular dynamics trajectories, to avoid bias from the long de-correlation times. We randomly selected 10% of the dataset for training and the remaining 90% of the clusters for testing.

Fig. 3 shows the convergence of the mean absolute error (MAE) of forces and energies of ml-ACE for a single, scalar invariant indicator variable. A major improvement of nearly a factor of two is achieved compared with the standard single-layer ACE model.
Further increases in model depth yield smaller yet consistent improvements, in accordance with the expected rapid convergence with the number of layers 20. Small clusters with up to four atoms and larger clusters with about 15 or more atoms show errors of only a few meV. The description of clusters with up to four atoms benefits from the accurate description of many-body interactions up to body order four enabled by ml-ACE, while clusters from about 15 atoms benefit from the accurate description of semi-local collective interactions. The largest errors are observed for cluster sizes between about 5 and 10 atoms.
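The slow size convergence invoked above follows from simple counting: if the number of surface atoms in a compact cluster scales as N^(2/3), the fraction of atoms with a bulk-unlike density of states decays only as N^(-1/3). A short numerical check (the cluster sizes used here are illustrative, not taken from the dataset):

```python
# Fraction of surface atoms in a compact cluster of N atoms:
# N_surface / N ~ N^(2/3) / N = N^(-1/3), i.e. very slow decay with size.
for n in (25, 200, 1600):
    surface_fraction = n ** (2 / 3) / n    # identical to n ** (-1/3)
    print(f"N = {n:5d}: surface fraction ~ {surface_fraction:.3f}")
```

Even at N = 1600 roughly one atom in twelve still sits at the surface, which is why cluster energetics approach bulk behaviour so slowly.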
We illustrate the correlation between indicator field and electronic structure in Fig. 2, where we show the first moment of the atomic density of states along a 19-atom linear chain of Cu atoms and compare it to the values of the scalar indicator field of a two-layer ACE model. The indicator field picks up the largest deviations at the boundary of the chain. We note that atomic linear chains were not part of the training set.
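The boundary sensitivity seen in Fig. 2 can be rationalized from the moments theorem in a toy orthonormal tight-binding chain (an illustration, not the FHI-aims data): the N-th moment of the local DOS is the diagonal of H^N. With zero onsite levels the first moment vanishes, so we use the second moment, which simply counts self-returning hops and is halved for the two edge atoms; the chain length and hopping value are assumptions.

```python
import numpy as np

# 19-atom nearest-neighbour chain with hopping t and zero onsite levels
n_atoms, t = 19, -1.0
H = np.zeros((n_atoms, n_atoms))
for i in range(n_atoms - 1):
    H[i, i + 1] = H[i + 1, i] = t

# moments theorem: mu_i^(N) = (H^N)_ii for an orthonormal basis
mu2 = np.diag(H @ H)
# edge atoms have one neighbour -> mu2 = t^2; bulk atoms two -> mu2 = 2 t^2
```

In a self-consistent calculation the onsite levels relax in response to exactly this kind of environmental difference, which is the signal the indicator field learns to track.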
Extended interactions also contribute to bond formation in small molecules. Table I shows the convergence of ml-ACE models for the ethanol molecule from the revised MD17 dataset. 29,32 Similar to small clusters, adding a second layer gives the biggest performance gain, while every further layer results in smaller yet consistent improvements. Thus, for all other molecules we trained four-layer ACE models, and the corresponding performance metrics are summarised in Table II (model details are given in the Supplementary Materials). The resulting models outperform most machine learning potentials [32][33][34], with the exception of the equivariant neural network potential NequIP 28. Note that we only compare to models trained on the revised MD17 dataset, and Table II shows the comparison to the closely related linear ACE model and the best performing model, NequIP. For other related models see Refs. 34-37. The performance improvement of ml-ACE via semi-local information is evidenced by the comparison to the linear ACE models: these contain an order of magnitude more independent basis functions, yet they are outperformed by ml-ACE for all molecules. On the other hand, the NequIP models improve over ml-ACE for almost all molecules, with varying performance differences. The largest absolute difference in force MAE (∼6.5 meV/Å) is observed for the aspirin molecule, while the smallest difference is observed for benzene (0.1 meV/Å), with the ml-ACE model being slightly more accurate. This discrepancy can be understood from the structure of the molecules. The charge density is nearly uniform in the benzene molecule, and therefore small variations in the atomic environments during dynamics do not introduce significant charge redistribution, so the semi-local information provided by a scalar invariant indicator is sufficient. In contrast, in the aspirin molecule the charge distribution is highly non-uniform and changes significantly during dynamics 38.
This leads to a variation of s-p hybridization on the atoms and non-uniform onsite level modifications for the s and p orbitals. The NequIP potential benefits from propagating additional non-scalar equivariant information across the layers, which the scalar invariant indicators of the current implementation of ml-ACE are unable to capture (see also Supplementary Materials for more details).
To conclude, semi-local interactions are an integral contribution to the electronic structure calculations that are used for training machine learning potentials. Here we introduce ml-ACE, which efficiently captures the electronic structure relaxation of self-consistent schemes such as DFT. We show that the indicator fields of ml-ACE can be understood from physical and chemical intuition, and note that message passing networks may be cast in the form of ml-ACE.
We demonstrate ml-ACE numerically for the special case of a single scalar invariant per atom. This allows us to reduce errors for small metallic clusters and molecules to about 50% of the single-layer values. Employing only a single scalar invariant is also the main limitation of the numerical implementation presented here. Including several indicator variables with non-scalar rotational characteristics, corresponding to p, d, etc. onsite Hamiltonian elements, is the necessary next step for reducing the remaining errors further. As the focus of our work is on semi-local interactions, our implementation further neglected long-range interactions due to charge transfer. Both contributions, equivariant, angularly dependent indicator fields associated with p- and d-valent onsite levels as well as charge transfer, need to be taken into account in our next implementation of ml-ACE.

Multilayer ACE models are trained by minimizing a loss function of the form

$\mathcal{L} = \sum_{n=1}^{N_{\text{struct}}} \Big[ \kappa\, w_n^{(E)} \big(E_n - E_n^{\text{ref}}\big)^2 + (1-\kappa)\, w_n^{(F)} \sum_{i=1}^{n_{\text{at},n}} \big|\boldsymbol{F}_{ni} - \boldsymbol{F}_{ni}^{\text{ref}}\big|^2 \Big],$

where $\kappa$ is a trade-off between the energy and force contributions, $N_{\text{struct}}$ is the number of structures employed in the parameterization, $n_{\text{at},n}$ the number of atoms in structure $n$, and $w_n^{(E)}$ and $w_n^{(F)}$ are per-structure and per-atom weights for the energy and force residuals, which were set to 1 for every structure and normalized by the number of structures and atoms, respectively. For small molecules, we select structures for fitting according to the first split of the revMD17 dataset. 1 Multilayer ACE models were implemented and fitted within the tensorpotential package 2 and Table I summarizes the model hyperparameters used for each system.
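The weighted energy-force loss described above can be sketched as follows. This follows the notation of the text with the stated choice of unit weights normalized per structure and per atom; the dictionary-based data layout is an assumption for illustration, and the actual tensorpotential loss may differ in detail.

```python
import numpy as np

def ml_ace_loss(structures, kappa):
    """Weighted energy-force loss, kappa trading off the two terms.

    structures: list of dicts with predicted/reference energies "E"/"E_ref"
    (scalars) and forces "F"/"F_ref" (arrays of shape (n_at, 3)).
    """
    n_struct = len(structures)
    loss_E, loss_F = 0.0, 0.0
    for s in structures:
        n_at = len(s["F_ref"])
        w_E = 1.0 / n_struct               # per-structure energy weight
        w_F = 1.0 / (n_struct * n_at)      # per-atom force weight
        loss_E += w_E * (s["E"] - s["E_ref"]) ** 2
        loss_F += w_F * np.sum((s["F"] - s["F_ref"]) ** 2)
    return kappa * loss_E + (1.0 - kappa) * loss_F
```

With this normalization, structures of different sizes contribute comparably to the force term, and kappa can be tuned without rescaling the dataset.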

II. SMALL MOLECULES
To understand the difference in performance of the multilayer ACE models for different molecules, we consider two cases: the best performing model, for the benzene molecule, and the worst performing model, for the aspirin molecule. We compare the first moments of the atomic DOS for each atom type, computed for 15 randomly selected molecules from the training set using FHI-aims 3,4 with tight settings. As expected, the electronic distribution in benzene is rather uniform and does not change significantly during dynamics, which is illustrated by the narrow window of values of the first moment. Thus, the invariant scalar indicator is able to capture these small differences by propagating the semi-local information through the local atomic environments, as illustrated by the linear correlation between the values of $\mu_i^{(1)}$ and $\theta_i^{(3)}$. Fig. 2 shows the correlation for 15 randomly selected aspirin molecules, recomputed with FHI-aims with tight settings. Data are grouped according to atom type and position in a particular functional group. However,