Graph Atomic Cluster Expansion for semilocal interactions beyond equivariant message passing

The Atomic Cluster Expansion provides local, complete basis functions that enable the efficient parametrization of many-atom interactions. We extend the Atomic Cluster Expansion to incorporate graph basis functions. This naturally leads to representations that enable the efficient description of semilocal interactions in physically and chemically transparent form. Simplification of the graph expansion by tensor decomposition results in an iterative procedure that encompasses current message-passing machine learning interatomic potentials. We demonstrate the accuracy and efficiency of the graph Atomic Cluster Expansion for a number of small molecules and clusters and for a general-purpose carbon model. We further show that the graph Atomic Cluster Expansion scales linearly with the number of neighbors and with the layer depth of the graph basis functions.


I. INTRODUCTION
Semilocal interatomic interactions beyond the reach of Hamiltonian matrix elements ⟨iα|Ĥ|jβ⟩ between orbitals α, β located on atoms i and j are ubiquitous in quantum mechanical calculations. Semilocal interactions involve contributions of different origin. First, diagonalization of the Hamiltonian induces interactions that reach multiple times beyond the range of the Hamiltonian matrix, as can be seen from a simple series expansion that results in contributions of the form Σ_kγ ⟨iα|Ĥ|kγ⟩⟨kγ|Ĥ|jβ⟩ and higher orders^1. Second, relaxation of the electronic structure in self-consistent calculations induces interactions beyond the direct range of the Hamiltonian matrix elements^2. Third, direct interactions induced by electronic correlations, such as dispersion corrections in density functional theory calculations, extend beyond the reach of the Hamiltonian matrix elements.
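The reach-doubling of the second-order term can be made concrete with a toy example. The following sketch (our illustration, not from the paper) builds a nearest-neighbor hopping matrix on a five-site chain and shows that its square, the second-order term of the series expansion, already couples next-nearest neighbors:

```python
# Illustrative sketch: a nearest-neighbor "Hamiltonian" on a 5-site chain,
# stored as a dense matrix of hopping elements.
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n = 5
t = -1.0  # hypothetical nearest-neighbor hopping amplitude
H = [[t if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

H2 = matmul(H, H)  # second-order term of the series expansion

# H itself couples only nearest neighbors ...
assert H[0][2] == 0.0
# ... but (H^2)_ij = sum_k H_ik H_kj already couples next-nearest neighbors,
# so diagonalization induces interactions beyond the Hamiltonian range.
assert H2[0][2] != 0.0
```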
We base our analysis of local and semilocal contributions on the Atomic Cluster Expansion (ACE), which enables accurate and efficient parametrization of many-atom interactions. The basis functions of ACE are complete and can represent other local descriptors^3,4. For example, the smooth overlap of atomic positions (SOAP) descriptor^5, the Spectral Neighbor Analysis Potential (SNAP)^6, the atom-centered symmetry functions (ACSF)^7 and many other descriptors can be cast in the form of ACE. By expanding Cartesian coordinates in spherical coordinates, other models such as the moment tensor potentials (MTP)^8 can also be represented^4,9.
ACE is efficient to evaluate and linear scaling with the number of basis functions irrespective of the body order of the expansion^10,11. This means that higher body-order interactions are captured efficiently, and machine learning frameworks for non-linearly transforming descriptors to energies, such as neural network potentials, Gaussian process regression or kernel methods, are no longer necessary but optional for achieving accurate models. In fact, it was demonstrated in a number of publications that ACE exceeds the accuracy and numerical efficiency of more traditional machine learning interatomic potentials^10,12-14.
The ACE formalism allows for the seamless incorporation of additional features, such as atomic magnetic moments and atomic or orbital charges, as well as the representation of vectorial and tensorial outputs^9,15. ACE is bound neither to atomic interactions nor to three-dimensional space and was extended to jet tagging^16, Hamiltonian matrix representations^17 and wave functions^18,19. The fact that ACE builds on a basis representation enables efficient uncertainty prediction, active learning and exploration^20,21. ACE may therefore be seen to provide a general and unified representation of atom-centered approaches^7,8,22-27.
In the past years, graph and message-passing representations^28-41 were developed in parallel to atom-centered representations, and only recently unified views on atom-centered and message-passing approaches emerged^2,42,43. Here ACE-based message passing^44 provided the most accurate models^45,46, as these incorporate semilocal interactions^1,2,47. On each atom, messages in the form of ACE are evaluated and passed to neighboring atoms, which use them to construct the next layer of ACE messages, until after some layers energy and forces are evaluated. Each message thereby condenses the representation of atomic environments, and one has to ensure that the iteration of messages is as descriptive as possible.
Here we extend local ACE to incorporate graphs. We show that graph ACE encompasses atom-centered local ACE as well as multi-atom and multi-layer message-passing architectures. We demonstrate that graph ACE generalizes and outperforms current local and semilocal machine learning interatomic potentials with respect to accuracy and efficiency. We start by introducing graph-based cluster basis functions in Sec. II. We then show that invariance and equivariance with respect to translation, rotation, inversion and permutation can be achieved along the same lines as in local ACE in Sec. III, before we compare local and global, graph ACE in detail in Sec. IV. Here we use the term global as the graph basis functions can in principle extend over all atoms, but as we will see this does not imply that the number of atoms in a model is fixed. We discuss the quantum mechanical foundation of graph ACE in Sec. V. The global, graph ACE is significantly more complex than local ACE and we introduce a series of simplifications in Sec. VI. This leads to recursive evaluation within the dandelion approximation in Sec. VII. We then show that the dandelion approximation is closely related to message-passing architectures with multi-atom messages and in fact provides a derivation of the multi-ACE framework in Sec. VIII. After having established that message-passing architectures can be viewed as a recursive evaluation of graph ACE, we further show that graph ACE exceeds the numerical accuracy and efficiency of other potentials for molecules, clusters and solids in Sec. IX. We conclude in Sec. X.
ACE is used by different research groups in different contexts and under different designations, which has led to a family of ACE models whose relations are sometimes hard to grasp. For example, the performant implementation of ACE in LAMMPS^48 is called PACE^10, and PACEmaker^49 is software for the parametrization of ACE. PACEmaker enables non-linear and linear models as well as radial basis optimization. Software for obtaining linear ACE models is available as ACEsuit^50 and FitSNAP^51. ACE-related representations for Lie groups other than the rotation group or for global expansions were termed G-equivariant CE^44 or boost-invariant polynomials (BIP)^16, respectively, and it is tempting to call the tensor reduction of ACE coefficients^52 trACE. Message-passing multilayer ACE was abbreviated as multi-ACE^43, its predecessor with scalar messages was called ml-ACE^2, and today's leading implementation of multi-ACE was designated MACE^45. The abbreviation of multi-ACE in the MACE implementation should not be confused with magnetic ACE^15. Finally, it seems natural to refer to graph ACE as grACE^53.
In this paper we show how several of the different ACE variants are encompassed by a single, straightforward generalization of ACE. To discriminate between different ACE flavors we specify the model details after the basic name ACE, which we hope makes the connections between ACE variants more transparent.

II. GRAPH ATOMIC CLUSTER EXPANSION
ACE^3 builds on a decomposition of the energy, or any other scalar, vectorial or tensorial quantity, into atomic contributions E = Σ_i E_i, where i = 1, ..., N indexes the atoms. Such a decomposition is always possible and in itself not an approximation. An approximation is introduced, however, by limiting direct pairwise interactions to a cut-off distance r_cut, which confines ACE to the local atomic environment of each atom. Here we refer to this ACE as local ACE^(l), in contrast to the global, graph ACE^(g) that we introduce next. For ACE^(g) we do not assume a decomposition into atomic quantities from the very start, and avoiding this leads us to incorporate semilocal interactions naturally and efficiently. As we will see, local and global ACE are closely related, with ACE^(l) being a subset of ACE^(g).
We are interested in the energy (or another scalar, vectorial or tensorial property) of a system of N atoms,

E = E(σ). (1)

The energy depends on the atomic positions r_1, r_2, ..., r_N, the species of the atoms µ_1, µ_2, ..., µ_N and potentially other properties such as atomic magnetic moments m_i, atomic charges q_i, dipole moments, etc. that we do not specify in detail. The ACE framework allows one to incorporate these different dependencies seamlessly in a single coherent model^9. Here we assume that µ_i collects all atomic variables on atom i. The features are combined in the configuration σ = (r_1, µ_1, r_2, µ_2, ..., r_N, µ_N) that fully specifies the state of the N-atom system.

A. Configurations
In local ACE^(l), for the evaluation of the energy E_i, or other scalar, vectorial or tensorial quantities, the configuration σ is centered on atom i, σ_i = (r_1i, µ_1, r_2i, µ_2, ..., r_Ni, µ_N), with r_ji = r_j − r_i. The vector r_ii = 0 of the central atom is ignored; therefore σ_i has only N − 1 entries of position vectors. The configuration is centered on every atom and the energy expressed as E = Σ_i E_i(σ_i). Centering the configuration on the atoms brings two advantages: first, it facilitates a local expansion of E_i and second, any function of σ_i is invariant under translation by construction.
We illustrate atom-centered configurations for N = 4 atoms. These are given by

σ_1 = (µ_1, r_21, µ_2, r_31, µ_3, r_41, µ_4), (2)
σ_4 = (r_14, µ_1, r_24, µ_2, r_34, µ_3, µ_4). (3)

For N atoms there are N different configurations. These are the configurations that are used in ACE^(l). There are, however, many additional configurations that provide a full description of atomic positions up to a translation. We give a few examples for four atoms, such as the chain σ = (µ_1, r_21, µ_2, r_32, µ_3, r_43, µ_4), see Fig. 1. Each of these configurations is complete in the sense that atomic positions can be reconstructed up to a translation. The configurations are not arbitrary; for example, (r_13, µ_1, µ_2, r_31, µ_3, r_42, µ_4) is not an admissible configuration, as it does not allow for a reconstruction of all atomic positions up to a translation. At first glance the additional configurations appear less attractive than the configurations employed for ACE^(l), as they are not localized on an atom. As we will see, however, for this very reason these configurations are suitable for the description of semilocal interactions. Global, graph ACE^(g) builds on all admissible configurations.
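As an illustration of the construction above, a minimal sketch of an atom-centered configuration σ_i and its built-in translation invariance; the container format is our assumption for illustration:

```python
# Minimal sketch: an atom-centered configuration stores relative vectors
# r_ji = r_j - r_i, so any function of it is translation invariant by
# construction.
def centered_configuration(positions, species, i):
    """Return (mu_i, ((r_ji, mu_j) for all j != i)) for center atom i."""
    xi, yi, zi = positions[i]
    rel = [((x - xi, y - yi, z - zi), mu)
           for k, ((x, y, z), mu) in enumerate(zip(positions, species))
           if k != i]  # r_ii = 0 is dropped: only N - 1 position entries
    return species[i], tuple(rel)

positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
species = ["C", "H", "H", "O"]

sigma_1 = centered_configuration(positions, species, 0)

# Rigidly translating all atoms leaves the configuration unchanged.
shifted = [(x + 5.0, y - 2.0, z + 0.5) for x, y, z in positions]
assert centered_configuration(shifted, species, 0) == sigma_1
assert len(sigma_1[1]) == len(positions) - 1
```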

B. Single-particle basis functions
We attach to each atom i basis functions

ϕ_iu(r) = ϕ_u(Δr), (10)

with Δr = r − r_i and where u is a basis function index. Atom-centered basis functions help to ensure translational invariance and introduce a natural, distance-dependent interaction hierarchy, but in principle different basis functions are possible, too. We further assume that the basis functions approach zero at some cut-off distance r_cut. The cut-off distance can vary from atom to atom, but for ease of notation we just use r_cut here. We ask that the basis functions are orthonormal and complete,

∫ ϕ*_u(r) ϕ_u'(r) dr = δ_uu', (11)
Σ_u ϕ*_u(r) ϕ_u(r') = δ(r − r'). (12)

Extensions to non-orthogonal basis functions are easily possible, as well as to continuous basis functions or different forms of the inner product^3,9. We further attach to the atoms basis functions χ_iκ(µ) that depend on the species of the atom and/or on other variables. These basis functions can depend on discrete variables, for example, different atomic species, or continuous variables such as magnetic moments with magnitudes and directions. We ask these basis functions to be orthonormal and complete on each atom, too,

∫ χ*_iκ(µ) χ_iκ'(µ) dµ = δ_κκ', (13)
Σ_κ χ*_iκ(µ) χ_iκ(µ') = δ(µ − µ'), (14)

where for discrete variables in Eq. (13) the integration is replaced by a summation and on the right-hand side of Eq. (14) the Dirac delta function becomes a Kronecker delta δ_µµ'. As we are aiming at representations with well-defined transformation under given group operations, we work with basis functions that are also basis functions of the irreducible representations of the group in question, i.e.
for the rotation group we use

ϕ_iu(r) = R_nl(r) Y_lm(r̂), (15)

with the multi-index u = nlm, distance-dependent radial functions R_nl with r = |r − r_i| and spherical harmonics Y_lm that depend on the direction r̂ = (r − r_i)/|r − r_i|. Extension of the formalism to many other Lie groups with accessible irreducible representations^44 or to basis functions in Cartesian coordinates^8 is possible. We note that by intuition it should be possible to represent chemistry in terms of distance-dependent basis functions of the form Eq. (10) only, if these basis functions are made to depend on atomic species. In turn, basis functions χ_iκ(µ) that depend on chemistry would not be necessary. This approach is followed, for example, by MTP^8. We will make the relation to species-dependent basis functions explicit in Sec. VI B.
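For concreteness, a minimal sketch of a single-particle basis function ϕ_u = R_n(r) Y_lm(r̂); the polynomial-times-cosine radial form and the restriction to l ≤ 1 are illustrative assumptions, not the paper's choice:

```python
import math

# Hedged sketch of phi_u(dr) = R_n(r) Y_lm(rhat) with a smooth cutoff.
R_CUT = 3.0

def radial(n, r):
    """Hypothetical radial function: polynomial times a smooth cutoff."""
    if r >= R_CUT:
        return 0.0  # basis functions vanish at the cut-off distance
    fc = 0.5 * (1.0 + math.cos(math.pi * r / R_CUT))  # smooth cutoff envelope
    return (r ** n) * fc

def real_sph_harm(l, m, rhat):
    """Real spherical harmonics up to l = 1 (closed forms)."""
    x, y, z = rhat
    if l == 0:
        return 0.5 / math.sqrt(math.pi)
    c = math.sqrt(3.0 / (4.0 * math.pi))
    return c * {-1: y, 0: z, 1: x}[m]

def phi(n, l, m, dr):
    """phi_u(r - r_i) for the multi-index u = (n, l, m)."""
    r = math.sqrt(sum(c * c for c in dr))
    rhat = tuple(c / r for c in dr)
    return radial(n, r) * real_sph_harm(l, m, rhat)

# A bond longer than r_cut does not contribute:
assert phi(1, 1, 0, (0.0, 0.0, 4.0)) == 0.0
```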

C. Cluster basis functions
Next we generate cluster basis functions from products of the single-particle basis functions as

Φ_αv(σ_{i_1 i_2 ... i_N}) = Π_{k=1}^{N} ϕ_{i_k u_k}(r_k) χ_{k κ_k}(µ_k), (16)

with 1 ≤ i_k ≤ N and k = 1, ..., N. The indices i_1, i_2, ..., i_N label the configuration σ with distance vectors r_{k i_k} = r_k − r_{i_k}. In every configuration exactly one entry fulfills i_k = k, and for this entry ϕ_{k u_k}(r_k) = 1, as only N − 1 vectors are required to reconstruct the atomic positions of N atoms up to a translation. The products are limited to order N, as in a system with N atoms at most N atoms interact.
To aid a local interpretation we ask ϕ_i0 = 1 and χ_i0(µ_i) = 1. The cluster α collects all atoms i_k with basis function indices u_k ≠ 0 or κ_k ≠ 0. When all basis function indices on all atoms are zero, the cluster is empty, α = 0, with Φ_0 = 1. The single-particle basis function indices of cluster α are given by v = (u, κ). The lowest product-order basis functions take the form

Φ = χ_iκ(µ_i), (17)
Φ = ϕ_iu(r_j), (18)
Φ = ϕ_iu(r_j) χ_jκ(µ_j), (19)

where these examples show physically and chemically meaningful basis functions, as will be discussed in the next Sec. II D.
For a given configuration σ_{i_1 i_2 ... i_N}, by construction the cluster basis functions are orthonormal and complete,

⟨Φ_αv | Φ_βv'⟩ = δ_αβ δ_vv', (20)
Σ_αv Φ*_αv(σ) Φ_αv(σ') = δ(σ − σ'), (21)

where the sum and integral in the inner product are carried out over discrete and continuous configuration variables, respectively. The algebra is identical to that of local ACE^(l)^3,9 and closely related to the cluster expansion in alloy theory^54. Different from local ACE^(l), the basis functions are in general not localized on a single atom only. Completeness Eq. (21) holds for every configuration individually, intuitively as it is possible to reconstruct atomic positions up to a translation from each configuration.
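The product structure of the cluster basis can be checked on a toy discrete example: if the single-site basis functions are orthonormal, their products are orthonormal under the combined inner product over all configuration variables. The two-state "species" variable here is a hypothetical stand-in for illustration:

```python
import itertools

# Orthonormal basis on one two-valued site variable (normalized counting
# measure): chi_0 = (1, 1), chi_1 = (1, -1).
chi = {0: {0: 1.0, 1: 1.0}, 1: {0: 1.0, 1: -1.0}}

def cluster(kappas, mus):
    """Product basis function: one chi factor per atom."""
    out = 1.0
    for k, mu in zip(kappas, mus):
        out *= chi[k][mu]
    return out

def inner(k_a, k_b, n_atoms):
    """Inner product summed over all discrete configurations."""
    states = list(itertools.product([0, 1], repeat=n_atoms))
    return sum(cluster(k_a, mus) * cluster(k_b, mus) for mus in states) / len(states)

# Orthonormality of the product (cluster) basis follows from the factors:
for ka in itertools.product([0, 1], repeat=3):
    for kb in itertools.product([0, 1], repeat=3):
        expected = 1.0 if ka == kb else 0.0
        assert abs(inner(ka, kb, 3) - expected) < 1e-12
```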
As the expansion is complete, for any configuration the energy or other scalar, vectorial or tensorial properties can be represented as a linear combination of the cluster basis functions,

E(σ) = Σ_αv J_αv Φ_αv(σ). (22)

The expansion coefficients are formally obtained as

J_αv = ⟨Φ_αv | E⟩, (23)

where the integral, or sum for discrete variables, is taken over all variables. However, as a configuration in ACE^(g) is in general not localized on an atom, different from ACE^(l), Eqs. (22) and (23) should be seen as formal results. Without further analysis they tell us little about the convergence and effectiveness of the expansion.
In the following we work with all possible configurations simultaneously, and not only with a single configuration per atom as in ACE^(l). As every configuration formally provides a complete basis, working with all possible configurations necessarily leads to a globally overcomplete set of basis functions. However, as we will see, this gives us freedom for physical interpretation and allows us to select sensitive basis functions, which ultimately leads to more accurate models. In order to prepare for this, we analyse the cluster basis functions in more detail in the following.

D. Graph structure and reduction
The cluster basis functions Eq. (16) are evaluated for a given cluster α from an admissible configuration σ. The clusters are graphs with directed edges that correspond to bonds i → j decorated with single-particle basis functions ϕ_iu(r_j), and where atoms correspond to nodes decorated with basis functions χ_iκ(µ). Considerable simplifications and reductions of the cluster basis functions are achieved by the following observations.

Topology
For a given configuration, the expansion coefficient J_αv associated with cluster basis function Φ_αv is independent of atomic positions, as follows from Eq. (23). Therefore only the graph topology and, for a given topology, edge orientation and basis function indices need to be considered for the classification of the cluster basis functions and their expansion coefficients J_αv, while the positions of the nodes only affect the numerical values of the cluster basis functions. (For determining graph topology we ignore edge directions. The reason for this will become clear with the introduction of root nodes.)

Connected graphs
A cluster basis function can only contribute if it is possible to reach any node in the graph from any other node by a walk along graph edges, irrespective of edge orientations. If a cluster basis function consists of two or more graph fragments that are not connected by at least one graph edge, then the graph fragments can be transported rigidly to infinite distance from each other without changing the numerical value of the cluster basis function. Therefore the corresponding cluster basis function is non-local and not suitable for the description of local properties that arise from the interaction of atoms. See the illustration in Fig. 2(b).

Edges shorter than cut-off distance
Cluster basis functions on a graph with an edge i → j that is longer than r_cut vanish identically, as ϕ_iv(r_j) = 0. Therefore these cluster basis functions cannot contribute, as illustrated in Fig. 2(c).

Maximum one incoming edge per node
Up to a maximum of N − 1 directed edges may come out of a node, while at most a single directed edge may enter any node, as every atomic position is present at most once in any of the cluster basis functions. This limits graphs with n vertices to n − 1 edges. The graphs are necessarily acyclic, as a cycle would imply redundant geometrical information that by construction may not be present in admissible configurations. We determine the root node of a graph as the node from which any other node may be reached by directed walks involving edges that point outwards only. In Fig. 3 we illustrate possible graphs and show graphs up to order six. The graphs that are part of local ACE^(l) are stars with all edges attached to a single node.
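The rules above translate into a short admissibility check. The edge-list encoding is a hypothetical choice for illustration:

```python
def check_admissible(n_nodes, edges):
    """Return the root node if the directed edge list forms an admissible
    tree (n - 1 edges, at most one incoming edge per node, all nodes
    reachable from a unique root), otherwise None."""
    if len(edges) != n_nodes - 1:
        return None  # n nodes require exactly n - 1 edges
    indegree = [0] * n_nodes
    children = [[] for _ in range(n_nodes)]
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        if indegree[child] > 1:
            return None  # at most a single directed edge may enter any node
    roots = [v for v in range(n_nodes) if indegree[v] == 0]
    if len(roots) != 1:
        return None
    # with in-degree <= 1, reachability from the unique root also rules
    # out cycles: a cycle would be unreachable and leave nodes unvisited
    seen, stack = {roots[0]}, [roots[0]]
    while stack:
        v = stack.pop()
        for c in children[v]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return roots[0] if len(seen) == n_nodes else None

# A star graph (local ACE) and a chain are both admissible trees:
assert check_admissible(4, [(0, 1), (0, 2), (0, 3)]) == 0  # star, root 0
assert check_admissible(4, [(0, 1), (1, 2), (2, 3)]) == 0  # chain, root 0
# Two edges entering the same node are not admissible:
assert check_admissible(3, [(0, 2), (1, 2)]) is None
```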

E. Trees and subtrees
Admissible graphs are trees. Each of the nodes in a tree can be a root node. As all edges point outwards from the root node, specification of the root node fully determines the edge directions, which means that in a tree with n nodes, n configurations with different edge orientations exist.
Often several subtrees are attached to a root node. In the following we break trees into subtrees that are attached to the root node. We use a basic topological classification to describe trees and subtrees. For each node the distance from the root node is measured in numbers of edges. The distances are ordered to describe the paths to reach each node from small to large as far as possible, i.e. (123234) is a subtree that branches at node 2 into (23) and (234). In Fig. 4 the topological classification for subtrees with up to five nodes is shown.
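One plausible reading of this classification as code; the children-dictionary encoding and the lexicographic branch ordering are our assumptions:

```python
# Sketch: label a rooted subtree by listing node depths in depth-first order,
# visiting lexicographically smaller branches first.
def classify(children, node=0, depth=1):
    """Return the depth string of the subtree rooted at `node`."""
    branches = sorted(classify(children, c, depth + 1)
                      for c in children.get(node, []))
    return str(depth) + "".join(branches)

# The example from the text: (123234) branches at node 2 into (23) and (234).
tree = {0: [1, 3], 1: [2], 3: [4], 4: [5]}
assert classify(tree) == "123234"

# A single node and a simple two-branch star:
assert classify({0: []}) == "1"
assert classify({0: [1, 2]}) == "122"
```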

F. Atomic properties
Atoms form the nodes of the graphs of the cluster basis functions. For the computation of properties it seems sensible to associate the contribution of a particular cluster basis function with one or several atoms, with the aim of achieving a decomposition into atomic quantities of the form E = Σ_i E_i. It is evident that this decomposition is a matter of choice. For example, the contribution J_αv Φ_αv of cluster basis function Φ_αv could be split equally among the graph nodes. Alternatively, the contribution J_αv Φ_αv could be added to a particular atom. Formally there are infinitely many ways to define atomic properties E_i that all lead to the same sum E = Σ_i E_i, which also implies that atomic properties E_i are not directly useful for training^20. Further constraints are required for a unique definition of E_i, such as, for example, symmetrization of many-body interactions and assigning their contributions equally to the nodes^55.
Here we choose to evaluate atomic properties by adding the contribution of each graph to its root node. To this end we introduce atomic bases for tree graphs that generalize the atomic base of local ACE^(l),

A^(1)_{iκv_1} = χ_iκ(µ_i) Σ_{j_1} ϕ_{iv_1}(j_1),
A^(11)_{iκv_1v_2} = χ_iκ(µ_i) Σ_{j_1≠j_2} ϕ_{iv_1}(j_1) ϕ_{iv_2}(j_2),
A^(12)_{iκv_1v_2} = χ_iκ(µ_i) Σ_{j_1} ϕ_{iv_1}(j_1) Σ_{j_2≠i,j_1} ϕ_{j_1v_2}(j_2),

etc. for higher order, with v_k = (u_k, κ_k), the shorthand ϕ_iv(j) = ϕ_iu(r_j) χ_jκ(µ_j), and where the superscript indicates the graph topology. The first terms for each order, A^(1), A^(11), A^(111), A^(1111), ..., are the basis functions of local ACE^(l). The summation over atoms has to be carried out over values j_1 ≠ j_2, j_1 ≠ j_3, ..., j_2 ≠ j_3, ... to avoid self-interactions. In Sec. IV B we discuss self-interactions that arise from unconstrained summation, i.e. summation that does not respect j_n ≠ j_k. The short-hand notation with square brackets provides a computable form of the graph topology that we use for coding ACE^(g). Each left square bracket implies advancing across one edge along the tree and summation over neighbors of the following basis function, while closing a square bracket means retreating one edge. The numbers of left and right square brackets are the same; therefore the root node is always reached at the end. For example, the atomic base of the cluster basis function on tree (123234) with indices v_1 ... v_6 reads [ϕ_{v_1}[ϕ_{v_2}[ϕ_{v_3}]][ϕ_{v_4}[ϕ_{v_5}[ϕ_{v_6}]]]]. We can now write the expansion of a property E in terms of atomic contributions E = Σ_i E_i. We order the graphs in the expansion by the number of nodes, corresponding to the body order. Terms including up to five nodes are given as

E_i = Σ_{κv_1} c^(1)_{κv_1} A^(1)_{iκv_1} + Σ_{κv_1v_2} c^(11)_{κv_1v_2} A^(11)_{iκv_1v_2} + Σ_{κv_1v_2} c^(12)_{κv_1v_2} A^(12)_{iκv_1v_2} + ..., (34)

where the ellipsis includes all remaining tree topologies with up to five nodes. The star graphs (1), (11), (111), (1111) are part of local ACE^(l); all others are new additions from graph ACE^(g).
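The role of the constrained sums can be illustrated numerically with a scalar toy function in place of the single-particle basis (an assumption for brevity): for the star graph (11), the self-interaction-free double sum equals the unconstrained product minus a self-interaction term:

```python
# Toy sketch: a hypothetical scalar stand-in for the single-particle basis.
def f(x):
    return 1.0 / (1.0 + x * x)

neighbors = [0.5, 1.0, 1.5, 2.2]
n = len(neighbors)

A1 = sum(f(x) for x in neighbors)  # A^(1): single-edge atomic base

# Constrained double sum j1 != j2, as required to avoid self-interaction:
A11 = sum(f(neighbors[i]) * f(neighbors[j])
          for i in range(n) for j in range(n) if i != j)

# The same value from the unconstrained product with a correction term:
corrected = A1 * A1 - sum(f(x) ** 2 for x in neighbors)
assert abs(A11 - corrected) < 1e-12
```

Deeper tree graphs evaluate analogously by nesting such sums, which is what the square-bracket notation encodes.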
Clearly, global ACE generates a great number of different basis functions and associated coefficients, and the main effort in the remainder of this paper is to reduce the complexity of the expansion. Before we get to this, we discuss transformation under rotation and inversion in the following section.

III. EQUIVARIANCE
For a model of the interatomic interaction we request invariance with respect to translation, rotation, inversion and permutation (TRIP). If the interest is in the expansion of vectorial or tensorial quantities, one requires TRIP equivariance, i.e. rotating the input graphs or rotating the vectorial output leads to the same result. Specifically, we require equivariance with respect to the group E(3), which is the semidirect product group of translations T(3) and rotations and inversions O(3) = SO(3) × C_2, where SO(3) is the group of rotations in three dimensions and C_2 the cyclic group of order two for inversion.
Invariance with respect to permutation of identical atoms, i.e. of the same chemical species, is built into ACE^(g) as the cluster basis functions are formed from products of the single-particle basis functions, Sec. II C. Invariance under translation is ensured as all coordinate inputs are formed as differences that are unchanged by translation. To ensure equivariance under rotation and inversion, we rely on the properties of spherical harmonics that are basis functions of the irreducible representations of the rotation group. Evaluation of cluster basis functions on the various graphs leads to products of spherical harmonics. The products of spherical harmonics can be reduced to irreducible representations with generalized Clebsch-Gordan coefficients, as discussed in detail in Refs. 4, 9, 49. The mechanism for the reduction to invariant or suitably equivariant basis functions is identical for the star graphs of local ACE^(l) and the tree graphs of global ACE^(g). Spherical harmonics further possess well-defined properties under inversion, which makes it easy to characterize the behavior of the cluster basis functions under inversion.
We therefore follow ACE^(l) to achieve equivariance with respect to rotation and inversion. It suffices to express the expansion coefficients in the form^4,9,10

c_κnlm = Σ_L c̃_κnlL C^LM_lm, (35)

where v_i = (κ_i n_i l_i m_i) are the indices of the cluster basis function, κnlm = v = (v_1, v_2, ...), and c_κnlm are the expansion coefficients in Eq. (34). The generalized Clebsch-Gordan coefficients are given by C^LM_lm, with angular momenta L for the different internal couplings of the coefficients, and the expansion coefficients c̃_κnlL are independent and trainable parameters. For simplicity we omitted the indices for graph topology, as these are unaffected by the transformation and identical for c and c̃. Note that here we exchanged c and c̃ compared to previous work^9,10, simply so that we do not have to write expansion coefficients as c̃ everywhere.
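As a minimal numerical illustration of such a reduction (our example, with real three-vectors standing in for l = 1 equivariant features): coupling two l = 1 features to total L = 0 with Clebsch-Gordan coefficients is, up to a constant factor, the dot product, which is invariant under rotation:

```python
import math

def rotate_z(v, angle):
    """Rotate a 3-vector about the z axis."""
    x, y, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def couple_to_invariant(a, b):
    """L = 0 channel of the l=1 x l=1 product (up to a constant factor):
    for real vectors this is just the dot product."""
    return sum(ai * bi for ai, bi in zip(a, b))

a, b = (1.0, 2.0, 3.0), (-0.5, 0.4, 1.0)
before = couple_to_invariant(a, b)
after = couple_to_invariant(rotate_z(a, 0.7), rotate_z(b, 0.7))
assert abs(before - after) < 1e-12  # rotation invariance of the L = 0 channel
```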
The analysis for the group O(3), which trivially extends to E(3) for the simulation of matter in three dimensions, is directly transferable to other groups for which irreducible representations and generalized Clebsch-Gordan coefficients are available or can be generated^44, so that ACE^(g) and ACE^(l) are directly applicable in other symmetries and dimensions.

IV. LOCAL AND GLOBAL ACE
The derivation of local ACE^(l) starts by assuming a decomposition of the property of a system into its atomic constituents, E = Σ_i E_i. For each of the atomic constituents E_i, an ACE^(l) is then carried out.
To this end, ACE^(l) employs configurations centered on each atom. The derivation of global, graph ACE^(g) is different. ACE^(g) does not decompose a priori into atomic contributions E_i but provides graph-based cluster basis functions for the complete, global property E. By assigning the contribution of the cluster basis functions to their root nodes, the decomposition of E into atomic properties E_i is accomplished only after the formal ACE^(g) construction. Thereby ACE^(g) builds on all possible configurations, and not only on one atom-centered configuration per atom as ACE^(l). It is evident that local ACE^(l) is a subset of global ACE^(g).
The ACE^(l) is obtained by limiting the ACE^(g) cluster basis functions to stars only. This also implies that for general radial basis functions, ACE^(l) and ACE^(g) are identical up to three-body interactions and differ from four-body interactions onward.

A. Completeness and sensitivity
Both local and global ACE expansions are complete by construction. Local ACE^(l) basis functions are complete for each atom; the basis functions suffice to discriminate between all different environments that an atom can have. None of the basis functions are redundant, as every basis function contributes information and a description of the local environment of an atom that is orthogonal to all other basis functions.
In contrast to the local, atomic perspective on completeness of local ACE^(l), graph ACE^(g) provides an (over)complete set of tree-graph-based cluster basis functions for the complete system. ACE^(g) cluster basis functions have n − 1 edges for n nodes. Edges can connect the same set of nodes in different ways, which results in cluster basis functions with different topology.
Fig. 5 further exemplifies the sensitivity of tree cluster basis functions of ACE^(g) in comparison to stars of ACE^(l). The narrow opening angles at the root node of the left-hand star graph make it numerically challenging to discriminate the details of atomic positions, and one may expect that a relatively large number of spherical harmonics and radial functions is required for an accurate portrayal of the associated four-body interaction. In contrast, the angles between edges in the tree graph on the right-hand side are large and the edge lengths are comparable. One can therefore assume that the tree cluster basis functions are more sensitive to small displacements of the atomic positions than the star cluster basis functions, which in turn should give the tree basis function a greater importance. This argument is further supported by the quantum mechanical analysis in Sec. V, where we discuss the role of direct pairwise connectivities for the representation of energies and forces in quantum mechanical systems.
Another obvious distinction between local and global, graph ACE is introduced by the cut-off distance. While ACE^(l) is limited to star graphs with an interaction range of at most twice the cut-off distance, the ACE^(g) tree graphs can extend over several cut-off distances. For example, if one assumes that the cut-off distance limits single-particle basis functions to the short pair distances in Fig. 5, then the contribution of the left-hand star graph must vanish while the right-hand tree graph can contribute.
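A one-dimensional sketch of this range argument; the hard cutoff and the chain geometry are our assumptions for illustration:

```python
# Atoms on a line, spaced 1.0 apart, with a hard cutoff just above the spacing.
R_CUT = 1.1

def fc(r):
    """Hard cutoff for illustration: 1 inside r_cut, 0 outside."""
    return 1.0 if r < R_CUT else 0.0

positions = [0.0, 1.0, 2.0, 3.0]

# Star graph (local ACE): all edges attached to the root atom 0.
star = (fc(abs(positions[1] - positions[0]))
        * fc(abs(positions[2] - positions[0]))
        * fc(abs(positions[3] - positions[0])))

# Chain tree graph 0 -> 1 -> 2 -> 3: each edge spans less than r_cut,
# so the tree propagates one cutoff distance per edge.
chain = (fc(abs(positions[1] - positions[0]))
         * fc(abs(positions[2] - positions[1]))
         * fc(abs(positions[3] - positions[2])))

# The star cannot reach atom 3, the chain can:
assert star == 0.0 and chain == 1.0
```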
Self-interactions appear to be more difficult to remove in global ACE^(g), as it is not obvious for all graphs that incorporate self-interactions, i.e. when edges connect nodes twice or multiple times, how these can be represented by graphs that are admissible in ACE^(g) as discussed in Sec. II D. Here we rely on the (over)completeness of the ACE^(g) cluster basis functions, which implies that any graph-based function can indeed be represented by a linear combination of ACE^(g) cluster basis functions. Fig. 6 illustrates self-interacting graphs. On the left-hand side a self-interacting star graph is shown together with a star graph with one edge fewer that enables self-interaction correction. The right-hand side shows a self-interacting graph that is not admissible in ACE^(g) and, in grey, a graph that can contribute to correcting self-interactions.
We argue in the next Sec. V that some of the self-interactions can be interpreted from quantum-mechanical considerations and can in turn be beneficial for the accuracy of ACE^(g). We will further see in Sec. VII that efficient implementations of ACE^(g) involve self-interactions, just as in ACE^(l), and illustrate this point numerically in Sec. IX A.

FIG. 6. Example of two graphs with self-interacting contributions (black) and two graphs that can contribute to remove these self-interactions (grey).

V. QUANTUM MECHANICAL FOUNDATION
Graph ACE^(g) closely resembles quantum mechanical electronic structure models, which helps to explain why ACE^(g) is able to model reference data from electronic structure calculations very well. However, as ACE^(g) is not conceived as a quantum mechanical model, not all of its contributions have a direct counterpart in quantum mechanics.
To make the connection to quantum mechanics explicit, we first simplify to the two-center approximation^56, with local orbitals |κnlm⟩ = R^(κ)_nl Y_lm on two atoms i and j oriented along the bond axis r_ij = r_i − r_j. As other atoms in the environment of the bond do not contribute, one assumes invariance under rotation about the bond axis, which implies that the distance-dependent two-center Hamiltonian matrix elements in bond orientation can be written as

H^(2c)_{l_1m_1 l_2m_2}(r_ij) = h_{l_1 l_2 |m_1|}(r_ij) δ_{m_1m_2}, (36)

with the bond length r_ij. One rotates the matrix elements to the global coordinate system as

H_{l_1m_1 l_2m_2}(r_ij) = Σ_{m'_1m'_2} D^(l_1)*_{m_1m'_1} H^(2c)_{l_1m'_1 l_2m'_2}(r_ij) D^(l_2)_{m_2m'_2}. (37)

Spherical harmonics are rotated with Wigner D-matrices D^(l)_{mm'}(αβγ), and as rotation about the bond axis is not required, only two angles α, β are needed. Therefore the Wigner D-matrices reduce to spherical harmonics, and the transformation matrix J(l_1 m_1 l_2 m_2 m'_1) can be expressed as a linear combination of spherical harmonics; the same holds for the Hamiltonian in the global coordinate system. See, for example, Eq. (17) in Ref. 57 for an explicit expression.
The distance-dependent function φ(r_ij) collects the different pre-factors of the spherical harmonics. This means that the two-center matrix elements can be expressed as linear combinations of spherical harmonics with distance-dependent pre-factors, which provides a direct link to the ACE^(g) basis functions ϕ_iv(j), Eq. (15), and one can in general represent the two-center Hamiltonian matrix elements from linear combinations of ACE^(g) basis functions ϕ_iv(j). Conceptually we could therefore replace the ACE^(g) basis functions ϕ_iv(j) with two-center Hamiltonian matrix elements (obviously at the cost of a more complex disentangling of angular contributions for rotational covariance), or we can just take the view that the ACE^(g) basis functions ϕ_iv(j) are indeed Hamiltonian matrix elements. This is the view that we take for the further discussion. However, it should also be noted that not all basis functions ϕ_iv(j) automatically qualify as Hamiltonian matrix elements. Quantum mechanical operators are Hermitian, Ĥ = Ĥ†, which imposes symmetries that are not fulfilled by ACE^(g) basis functions in general. We note in passing that overlap matrices are two-center matrices, too, which further reveals a relation between the ACE^(g) basis functions and the Overlap Matrix descriptor of Li et al.^58

Several linear-scaling Density Functional Theory and Tight-Binding methods, such as recursion^59, Fermi-Operator Expansion^60, Bond-Order Potentials^1,61 and the Kernel Polynomial Method^62, rely directly or indirectly^63,64 on the evaluation of the moments of the density of states from the atomic coordinates via the moments theorem^65,

µ^(M)_iα = ⟨iα|Ĥ^M|iα⟩ = Σ_{j_1β_1 ... j_{M-1}β_{M-1}} H_{iα j_1β_1} H_{j_1β_1 j_2β_2} ⋯ H_{j_{M-1}β_{M-1} iα}, (38)

with the matrix elements H_{iαjβ} = ⟨iα|Ĥ|jβ⟩ of the orthonormal basis functions, ⟨iα|jβ⟩ = δ_{iα,jβ}, with orbitals α and β associated to atoms i and j, respectively. Moments of order M of the local density of states of orbital α on atom i are obtained from all self-returning hopping paths, i.e.
cycles with M edges, where each edge is given by a Hamiltonian matrix element H_{iαjβ}. Self-interactions in a cycle must be taken into account. Due to self-interactions as well as non-zero onsite matrix elements H_{iαiα}, some contributions to the M-th moment involve fewer than M atoms. Fig. 7 illustrates contributions to the fourth moment that differ by topology and shows their representation in cluster basis functions. For simplicity, on-site matrix elements were ignored, H_{iαiα} = 0. Note that the leftmost pair contribution is identical to the self-returning three-body contribution to the right of it, implying a self-interacting pair basis function, depicted in grey.
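The moments theorem can be sketched numerically: the M-th moment of the local density of states is the diagonal element (Ĥ^M)_{iα,iα}, i.e. the weighted sum over all self-returning hopping paths with M edges. A minimal sketch; the toy tight-binding chain, its hopping value and system size are our own illustrative choices, not taken from the paper:

```python
import numpy as np

def local_moments(H, i, M):
    """Moments mu_m = (H^m)_{ii} of the local density of states on orbital i,
    equal to the weighted count of self-returning hopping paths with m edges."""
    moments = []
    Hp = np.eye(H.shape[0])
    for m in range(1, M + 1):
        Hp = Hp @ H          # Hp now holds H^m
        moments.append(Hp[i, i])
    return moments

# toy 4-site chain with hopping t = -1 and zero onsite matrix elements
H = np.zeros((4, 4))
for i in range(3):
    H[i, i + 1] = H[i + 1, i] = -1.0

print(local_moments(H, 0, 4))  # [0.0, 1.0, 0.0, 2.0]
```

The fourth moment of the end site is 2 because exactly two self-returning four-edge paths exist there (0→1→0→1→0 and 0→1→2→1→0), matching the path-counting picture above; odd moments vanish on this bipartite chain.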
A detailed analysis of tight-binding models 55,66-68 showed that fourth-moment cycle contributions are small compared to effective pair and three-body contributions. This is broadly due to different degrees of self-cancellation in the graphs in Fig. 7. The effective pair contribution of the fourth moment has no angular contributions, and therefore no cancellation can occur when the sum over neighbors is taken. The effective three-body graph has one opening angle, and when the sum is taken over the neighbors of the root atom, the angular dependence may lead to some self-cancellation. For the cycle the sum is taken over three neighbors, where interfering angular dependencies can lead to substantial self-cancellation, rendering the cycle contribution less important than the pair and three-body contributions to the fourth moment.

VI. SIMPLIFICATIONS

A. Tensor decomposition
In the following we discuss possible simplifications of the expression Eq. (34). We will assume at various places that tensors can be decomposed into products of lower-rank factors, which further implies that decompositions involving tensors of different sizes are also possible. A low-rank decomposition with a joint summation index k is evidently a key advantage, but not strictly necessary: by trivially expanding λ^(k) = Σ_{k_2 k_3 k_4} λ^(k) δ_{k k_2} δ_{k_2 k_3} δ_{k_3 k_4} and absorbing the Kronecker deltas into factors t^{k_l}_{n_l} δ_{k_l k_{l+1}} from Eq. (41), one arrives at an equivalent form. In the following we decompose the cluster basis function expansion coefficients as seems best for an efficient representation of graph ACE (g).
Tensor decomposition has been used in the context of ACE recently by Darby et al. 52 to eliminate the combinatorial scaling of the number of coefficients with the number of chemical elements. Here we build on this work, but take a slightly different route. We start by limiting tensor decomposition to chemical indices only. In fact, we deliberately keep the chemical index of the root atom in the associated expansion coefficient for the moment, as this appears physically and chemically intuitive. We further discuss in Appendix A how to remove all chemical indices completely, and we also employ this reduced representation in our numerical examples.
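The decomposition over chemical indices can be illustrated with a rank-K, CP-style factorization of a third-order coefficient tensor. A minimal sketch; the species count S, rank K and random factors are illustrative, not the paper's trained values:

```python
import numpy as np

rng = np.random.default_rng(0)
S, K = 4, 8   # number of chemical species and decomposition rank (illustrative)

# rank-K factors: a weight lam_k and one vector t^(k) of length S per index
lam = rng.normal(size=K)
t = rng.normal(size=(K, S))

# reconstructed tensor c_{mu1 mu2 mu3} = sum_k lam_k t^k_{mu1} t^k_{mu2} t^k_{mu3}
c = np.einsum('k,ka,kb,kc->abc', lam, t, t, t)

# storage: S**3 entries in the full tensor vs K*(S+1) numbers in the factors
print(c.shape, S**3, K * (S + 1))
```

For a fixed rank K the factorized storage grows linearly with the number of species and with the correlation order, instead of combinatorially, which is the point of the chemical-index decomposition discussed above.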

B. Atomic species
For removing atomic onsite basis functions we limit the onsite atomic degrees of freedom to the atomic species, i.e., we assume that χ_iκ(µ_i) is a function of the atomic species µ_i on node i only. The expansion coefficients c_{κv_1v_2v_3…} of ACE (g) in Eq. (34) with v = (u, κ) are rewritten accordingly, and so on for higher orders. We then modify basis functions of the form Eq. (15) to have radial functions that depend on chemistry. Next, we combine the indices k and n of the radial functions into a joint index n, and we do the same for the expansion coefficients, with v_i = (n_i l_i m_i), which essentially means that we are hiding k in the indices n_1, n_2, n_3, …. Then, by choosing χ_iκ(µ_i) = δ_{κµ_i}, we arrive at simplified expressions for Eq. (24) and following, with the superscript index inherited from the weights w, and so on for higher-order and other graphs. We see that the chemistry dependence of the atomic interaction has been fully incorporated into the pairwise, chemistry-dependent radial functions, and that in this representation the decoration of the nodes with basis functions χ_iκ(µ_i) is not required. In practice we do not need to know the decomposition Eq. (45), as we optimize the radial functions during training, and for this we only need to know the variables t, µ_j, n, l and r_ji that determine the radial functions. Note that the radial functions depend only on the chemical species of the neighboring atom j, i.e., R^(t)_{µ_j nl}(r_ji). From our quantum mechanical considerations in Sec. V it may be appropriate to extend this to radial functions that depend on the species of both atoms. We have used radial basis functions of the type R_{µ_j µ_i nl}(r_ji) previously 3,49, but ultimately this is a matter of choice and design and not dictated by the formulas that we have derived here. Also, as mentioned in Sec. II B, the Moment Tensor Potentials 8 start from chemistry-dependent radial functions and without spanning chemical space with basis functions χ_iκ(µ_i), and
we have demonstrated here explicitly that employing chemistry-dependent radial functions or chemical space basis functions can lead to identical representations.
The expression for atomic properties in ACE (g), Eq. (34), remains essentially unchanged, but v no longer contains the chemical index κ, which means that the number of entries in the expansion coefficients c_{µv_1v_2v_3} in multi-component systems does not suffer from combinatorial explosion. In this way the expansion Eq. (34) is written in reduced form. The reduced representation and the full representation Eq. (34) are identical if sufficiently many terms are used in the tensor decomposition Eq. (44). However, while the parameterization of multi-component systems is hardly possible with the full representation, the reduced representation enables it.
In the following we further keep Eq. (51) so that it can be read in two ways. If we take the indices as v_1 = (n_1 l_1 m_1), v_2 = (n_2 l_2 m_2), v_3 = (n_3 l_3 m_3), …, we allow for different radial functions. We can also choose v_1 = (n l_1 m_1), v_2 = (n l_2 m_2), v_3 = (n l_3 m_3), …, i.e., a single joint index n, as briefly summarized in Appendix A. It is a matter of choice and numerical considerations which set of indices is superior, and we delay making this choice until the numerical implementation for the examples in Sec. IX.

C. Star decomposition and layers
We make contact with local ACE (l) by conceptually decomposing trees and subtrees into stars. The stars are categorized by their distance from the root node, i.e., the number of edges required to reach the star along the (sub)tree, and by their number of outgoing edges. We call the number of edges that are required to reach the star from the root node the layer of the star. Star (t, p) is located on the t-th layer from the root node and has p outgoing edges. A floret in the following is an outgoing edge together with the node at its outgoing end, where this node has no outgoing edges of its own. See Fig. 8 for an illustration.
We next employ tensor decomposition and write expansion coefficients as products of star expansion coefficients. For example, the expansion coefficient of graph (12332333) is represented as a product of star coefficients, where the basis function indices are labeled as v_tor, with t the layer index, o the index of the star node in layer t, and r the index of the floret in star o. Generalization to arbitrary tree structures is obvious. This representation is general and does not limit the interactions associated with trees and subtrees. The representation further takes into account the symmetry of the graph, as illustrated, for example, by the (1233233) subtree, which must be invariant with respect to exchange of its two (2, 2) stars. As will be discussed next, the decomposition of the expansion coefficients into products of star coefficients, Eq. (52), has the advantage that the expansion coefficients of all subgraphs of a graph can immediately be written in analogous form.

D. Floret picking and subgraph expansion coefficients
Starting from a given tree, one can generate all possible trees that are subgraphs of the starting tree by picking one floret after another. For example, picking a floret in the (2, 3) star of the (12332333) tree results in the (1233233) tree with one node less. Fig. 9 further illustrates floret picking in a (12333333) tree.
In general, when a floret on node o in layer t is removed, the number of edges p in the corresponding star is reduced by one. To a removed floret we assign the index v_top = 0 and denote the corresponding expansion coefficient accordingly, and so on until all florets of the star on the node have been removed. For the example graph of the previous section, picking florets on the (2,3) star generates the interaction coefficients of the corresponding subgraphs. A star with zero florets is a floret itself and can be picked to reduce the tree further. Thus, if we pick the new floret in the above (12233) graph, we arrive at a further reduced coefficient, and if we next pick the two florets on the (2,2) star, the resulting expansion coefficient follows in the same way. The key observation is that by picking florets in all possible ways, all subtrees and their corresponding expansion coefficients can immediately be obtained and represented. We will exploit this next for the recursive evaluation of a tree together with all of its subtrees.
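Floret picking can be sketched as leaf removal on a rooted tree: repeatedly deleting a leaf edge in all possible orders enumerates every subtree. A minimal sketch on labeled edges; the edge-set representation and node labels are our own illustrative choices:

```python
def pick_florets(tree):
    """Generate all subtrees of a rooted tree (each as a frozenset of
    (parent, child) edges) by removing one leaf ('floret') at a time,
    in all possible orders."""
    seen = set()

    def leaves(edges):
        parents = {p for p, c in edges}
        children = {c for p, c in edges}
        return [c for c in children if c not in parents]

    def recurse(edges):
        if edges in seen:
            return
        seen.add(edges)
        for leaf in leaves(edges):
            recurse(frozenset(e for e in edges if e[1] != leaf))

    recurse(frozenset(tree))
    return seen

# (122) tree: root 0 with child 1, which has the two children 2 and 3
subtrees = pick_florets({(0, 1), (1, 2), (1, 3)})
print(len(subtrees))  # 5: (122), two labeled (12) subtrees, (1), and the bare root
```

Counting labeled subtrees distinguishes the two (12) subtrees obtained by removing node 2 or node 3; grouping them by topology recovers the (122) → (12) → (1) hierarchy of expansion coefficients described above.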

E. Recursive evaluation
Because all expansion coefficients of subtrees of a tree can be expanded as part of the representation of the tree expansion coefficient, efficient recursive evaluation becomes possible. We illustrate this here for a small (122) graph; the extension to more complex graphs is obvious and will be exploited in the following section.
From Eq. (51) we obtain the contribution of a (122) tree, including its (12) and (1) subtrees, with the corresponding expansion coefficients and basis functions. Here the sums over j, j_2, j_3 are unrestricted, so that self-interactions are incorporated in the expansion; see the discussions in Secs. IV B, V and IX A.
We next define a local ACE (l) on layer 1 of the expansion. On the first layer we evaluate the corresponding atomic base and another local ACE (l), from which the layered structure can be identified. We see that graph ACE (g) can be evaluated by layer-wise evaluation of local ACE (l) and summation of all channel-k contributions at the end. In the following section we will employ this for the evaluation of more complex trees, including all of their subgraph contributions.

VII. DANDELION APPROXIMATION
In the previous analysis we ordered graphs by their number of nodes. In the following we drop this hierarchy, as this enables efficient recursive evaluation of the cluster basis functions. We start by considering trees that are identical up to one node from which different numbers of edges emerge, i.e., trees characterized by topology strings that are identical up to one number that is repeated a different number of times, for example (12), (123), (1233), (12333), (123333), (1233333) and (12333333). We call the largest of these trees, from which all others can be obtained by floret picking, a dandelion and abbreviate it as (123^6), see Fig. 9 for an illustration.
We focus on dandelions of the form (12…T^P) that have a depth of T layers, with every node having P outgoing edges, see the lower part of Fig. 9. The regularity of the dandelion trees makes the notation easier and more transparent, while on the other hand the dandelions are sufficiently general to accommodate many other trees as subtrees.

A. Dandelion expansion coefficients and recursion
We start by decomposing the dandelion expansion coefficient into star contributions following Sec. VI C. [Fig. 9: (1232), (1233), (1234) dandelion graphs (lower panel). The root node is not shown and the subtrees emerge from the root node, which fully characterizes the directions of the edges; therefore edge directions are not shown.] The
expansion coefficient of a (12…T^P) dandelion can in general be represented as a product of star coefficients, where we did not write out the indices on the left-hand side. This formula is best understood for small values and then expanded to general T and P. For example, for two layers T = 2 and P = 2 outgoing edges per node, the expansion coefficient can be compared directly to the examples given in Sec. VI C. Following the discussion in Sec. VI E, the contribution of the (12…T^P) dandelion tree and all its subtrees can be evaluated recursively. The recursion is initialized by a local ACE (l) in the last layer T, with the usual atomic base and ACE (l). The layers t = T−1, T−2, …, 2, 1, 0 are iterated downwards by forming an atomic base that pulls in information from the neighbors of each atom in the form of a local ACE (l). The atomic base is then used to set up an effective ACE (l) on the next layer, and the iteration is terminated on layer 0. In practice one may want to stop the ACE (l) on each layer at a body order K smaller than P, implicitly assuming c^(t,k) = 0 for k = K+1, …, P for the corresponding star expansion coefficients. One can understand the local ACE (l) on each layer, φ_jk, as messages that are attached to the atoms j and transferred to their neighbors i during construction of the effective atomic base A^(t)_iv. If desired, the expansion coefficients of the local ACE (l) on each layer can be tensor-decomposed further, using the approach discussed for ACE (l) in Sec. VI A and Appendix A or Ref. 52.
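The layer-wise recursion can be sketched for invariant (scalar) channels only; all angular, chemical and Clebsch-Gordan structure is omitted, and the radial function, star weights and toy geometry below are illustrative assumptions, not the paper's parametrization:

```python
import numpy as np

def dandelion_scalar(R, neighbors, radial, weights, T, P):
    """Scalar-channel sketch of the dandelion recursion: each layer forms an
    effective atomic base A_i by summing pairwise radial functions weighted
    by the previous layer's local ACE phi_j, then builds a local ACE as a
    polynomial in A_i up to body order P."""
    n_atoms = len(neighbors)
    phi = np.ones(n_atoms)                      # initialization on layer T
    for t in range(T, 0, -1):                   # iterate layers downwards
        A = np.zeros(n_atoms)
        for i in range(n_atoms):
            for j in neighbors[i]:              # effective atomic base
                A[i] += radial(R[i, j]) * phi[j]
        # local ACE on layer t: products of the atomic base up to order P
        phi = sum(weights[t][p] * A**p for p in range(1, P + 1))
    return phi

# toy system: three atoms on a line, nearest-neighbor connectivity
rng = np.random.default_rng(1)
pos = np.array([0.0, 1.5, 3.0])
R = np.abs(pos[:, None] - pos[None, :])
nbrs = [[1], [0, 2], [1]]
w = {t: rng.normal(size=4) for t in (1, 2)}    # star coefficients, illustrative
phi0 = dandelion_scalar(R, nbrs, lambda r: np.exp(-r), w, T=2, P=3)
print(phi0)
```

The cost of one pass is proportional to T times the number of neighbor pairs, consistent with the linear scaling in T and N discussed below; the symmetric end atoms of the toy chain receive identical values, as expected.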
Further, for a scalar expansion one can take ACE (g) directly, or alternatively, in analogy to our work on ACE (l) 3,10,12,49, it can be efficient to compute several expansions for each atom, φ_i1, φ_i2, φ_i3, …, and compute a scalar property as a non-linear function F of these expansions, for example of two expansions.
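A non-linear readout F over several per-atom expansions can be sketched as follows; the Finnis-Sinclair-type square-root embedding is one common choice in the ACE literature and is used here as an illustrative stand-in, not necessarily the exact F of this paper:

```python
import numpy as np

def readout(phis, kind="fs"):
    """Scalar property from several per-atom expansions phi_i1, phi_i2, ...
    stacked along the last axis. 'fs' is a Finnis-Sinclair-type embedding
    F = phi1 + sqrt(phi2) (illustrative); 'linear' simply sums the channels."""
    phi1, phi2 = phis[..., 0], phis[..., 1]
    if kind == "fs":
        return phi1 + np.sqrt(np.abs(phi2))  # abs guards against negative density
    return phis.sum(axis=-1)

phis = np.array([[0.5, 4.0], [1.0, 1.0]])    # two atoms, two expansions each
print(readout(phis))  # [2.5 2.]
```

The square root introduces non-linearity in the second channel while keeping the first channel linear, which is the basic mechanism by which a non-linear F enlarges the model class beyond a single linear expansion.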
With each layer in the recursion the effective interaction range is extended by r_cut. If an atom has N neighbors within r_cut, naively one would expect that the effort for evaluating ACE (g) in dandelion approximation scales as T^3 or even N^T. However, as shown in App. E, where we also provide explicit formulas for gradients, the effort for evaluating ACE (g) and its gradients scales linearly with the number of layers T and linearly with the number of neighbors N.
For an interatomic potential, in layer 1 only contributions with l = m = 0 and even parity p = 1 are relevant, but for vectorial or tensorial expansions other values of l and p are of interest 9.
During the iteration, products of (generalized) Clebsch-Gordan coefficients are implicitly taken. These products could in principle be reduced to remove some linearly dependent functions 4,49,69.

Coupling radial channels
While angular coupling seems advisable to maintain clean angular momentum channels, an analogous coupling is also possible for the radial and chemical indices. This can be achieved by introducing additional, layer-dependent weights W^(t) in Eq. (87). This coupling has the advantage that the effective atomic base has only one radial, respectively chemical, channel, as in the original ACE (l) formalism. The coupling W^(t) brings some freedom, and its dependence on the indices n p l n_1 p_1 l_1 n_2 p_2 l_2 can be explored for numerical efficiency and reduced if necessary. Here, full tensor decomposition implies n = n_1 = n_2. Other variants, such as requiring n_1 = n_2 while keeping n different, are also possible, and we employ this for the numerical examples in Sec. IX.
The effective ACE (l) is then evaluated accordingly. We also note that, while recent graph and message-passing interatomic potentials highlighted the importance of non-scalar, equivariant messages, in the recursive evaluation of ACE (g) equivariant, vectorial and tensorial intermediate ACE (l) emerge naturally from the angular character of the coupling in the ACE (g) basis functions.

VIII. COMPARISON TO OTHER METHODS
There are two main approaches that claim unification of message-passing networks with atom-centered many-atom expansions 42,43. In particular, the multi-ACE framework is close to the results of our analysis, as it builds on ACE (l) messages. We therefore discuss the multi-ACE framework in some detail first.

A. Multi ACE
The multi-ACE framework 43 has unified ACE (l) with message-passing graph networks. Multi-ACE shares many features with the recursive evaluation of the dandelion graph of ACE (g) in Sec. VII if the approximate product reduction of angular contributions (Appendix B) is employed. In fact, the multi-ACE framework is fully contained in the graph ACE (g) design space, which allows for some freedom due to the different choices that one can make when tensor-decomposing the ACE (g) expansion coefficients. A key difference between multi-ACE and ACE (g) is, however, that we derived the recursive evaluation of the dandelion approximation starting from the global (over)completeness of graph ACE (g), while the multi-ACE development was guided by merging ACE (l) and NequIP 32. For example, message passing in multi-ACE is understood as a chemically inspired sparsification of a local ACE (l) with a very long cutoff. As discussed in Sec. IV, the cluster basis functions of ACE (l) are a subset of those of ACE (g), and the graph ACE (g) cluster basis functions enter the recursive evaluation of the dandelion approximation. This also means that multi-ACE cannot be obtained from local ACE (l) directly, but only from ACE (g). Put this way, ACE (g) provides the foundation and derivation of multi-ACE.
Multi-ACE makes a distinction between features h and messages m. The messages are essentially identical to the local ACE (l) on each layer in the form of Eq. (88). The features are obtained from a linear transformation of the messages and are multiplied with the single-particle basis functions. In the dandelion approximation of ACE (g) the distinction between features and messages is not required explicitly, but combined into the computation of the effective atomic base in Eq. (90) and the local ACE (l) on each layer. Furthermore, multi-ACE has a slightly different way of handling angular channels. It keeps two angular indices for the effective atomic base and does not mix one of the two angular channels, in contrast to the mixing of the angular channels in the effective atomic base of the dandelion approximation. It should be emphasized that both approaches are part of the design space of the dandelion approximation and both make use, implicitly or explicitly, of the approximate product reduction of angular contributions discussed in Appendix B.
The multi-ACE manuscript 43 explicitly discusses design choices for several message-passing models and how these models can be understood from a multi-ACE perspective. This analysis immediately carries over to graph ACE (g), and we therefore limit our discussion to selected representatives of the multi-ACE framework.

B. NequIP
NequIP 32 pre-dates multi-ACE. It builds on tensor field networks 31 for an accurate representation of the atomic interaction. In retrospect, NequIP may be understood as a specific graph ACE (g) realization, limited to low body-order ACE (l) messages. Alternatively, multi-ACE may be understood as a generalization of NequIP to higher body-order ACE (l) messages. The limitation to low body-order messages helps to explain why NequIP requires several layers for accurate representations.

C. Multilayer ACE
The multilayer ACE, abbreviated ml-ACE, is a multilayer message-passing interatomic potential 2. It is limited to scalar but non-linear update functions and is a subset of multi-ACE and, of course, of ACE (g). The ml-ACE was inspired by electronic structure relaxation that induces semilocal interactions beyond the immediate neighbor shell of an atom.

D. MACE
MACE 45,46 is a version and an accurate implementation of multi-ACE. MACE was built on the e3nn software 70 and made a few specific design choices within the multi-ACE framework. For example, MACE used a rank-six tensor for representing radial functions despite not indexing chemistry explicitly. This provides some freedom in the multi-ACE design space but cannot be justified directly from our ACE (g) analysis. MACE used a multilayer perceptron to represent the radial functions, a choice that we followed in our numerical implementation of ACE (g). As in the multi-ACE framework, MACE also made a distinction between features and messages and mixes both contributions from one layer into the features of the next layer, while in ACE (g) this distinction is not necessary. The ACE (l) on each layer in MACE is constructed recursively, see, e.g., Sec. A3.3 in Ref. 45. The recursion is defined to reduce the number of basis functions with many edges. Furthermore, MACE did not incorporate all possible parity couplings. As discussed in Sec. IX, we observe that MACE requires about one order of magnitude more parameters than ACE (g), which may be attributed to some of these design choices.

E. Allegro
Allegro 40 is a many-body potential based on iterated tensor products of equivariant representations. The structure of Allegro has been compared in detail to local ACE (l) in Ref. 40. For example, the body order of ACE (l) has its analogy in the number of layers in Allegro. We speculate that ACE (g) limited to the same direct neighbor atoms as Allegro has a similar mechanism for maintaining flexible trainable functions, while in addition benefiting from graphs other than the stars of ACE (l), see the discussion in Sec. IV A.

F. Unified atom-centred and message-passing schemes
Nigam et al. 42 unified atomic-density-based descriptors and message-passing networks by combining the two into multi-centered atomic density correlations. The atomic density correlations have similarities to the graph cluster basis functions that we introduce here, but appear not to be limited to admissible configurations only and, in turn, also not to acyclic graphs.

IX. APPLICATIONS
We consider three application examples. First, a model system of four-atom Si clusters. The four-atom clusters are sufficiently simple to demonstrate and compare various graph architectures, including the direct construction of the ACE (g) basis functions, Eq. (24) and following. At the same time, due to the bonding and structural diversity of the reference data, it presents a challenge for any machine learning potential. Second, we consider the frequently used revised MD17 dataset 39,71,72 of 10 small organic molecules, as well as a dataset of the flexible 3BPA molecule 13. And third, a large carbon dataset that was recently used for a general-purpose ACE (l) 12.
From these examples, we assess the general applicability of ACE (g) , its computational performance and its accuracy compared to other models.Details on training and model parameters are given in Appendix C.

A. Analysis for four-atom Si clusters
We first consider a dataset of four-atom Si clusters. The dataset consists of a total of 16180 clusters. The clusters were computed with DFT using the PBE functional 73 with the FHI-aims code 74,75 using tight settings. The calculations were non-magnetic to ensure a single energy hypersurface. The clusters were not relaxed and comprise completely randomized atomic positions. The clusters were constructed such that all distances between atoms are smaller than the cut-off radius of 8 Å. In this way we ensure that none of the star graph contributions vanish due to limited interaction range, and we can instead compare the sensitivity of the various graph contributions. Of the clusters, 90% were taken for training and the remaining 10% for testing. We compare three different ACE models: local ACE (l) limited to (1) stars, an explicit graph basis with star- and tree-type contributions constructed explicitly and without self-interactions, and global ACE (g) models in various forms of the dandelion approximation that incorporate self-interactions. Fig. 10 shows the test metrics of the considered models as a function of the maximum number of graph edges. For ACE (l) models the number of edges is the same as the product order. The ACE (l) results are indicated by filled circles. Star and tree graphs correspond to Eqs. (25), (26), (27) and (30). ACE (g) models were limited to tree graphs and included (12), (1212), (121212) maximum graphs (diamonds), (122) and (122122) maximum graphs (squares), and (123) and (123123) maximum graphs (triangles). As discussed in Sec. IV, ACE (l) and ACE (g) start to differ only from three edges in the graph. Therefore, different models with two edges, i.e., (11) or (12) graphs, lead to numerically identical results.
For a maximum of three edges, corresponding to four-body interactions, the (111) star contribution without self-interactions and the third-order contributions of the atomic base A^(1), which incorporate self-interactions, are identical, as expected. Furthermore, we observe that the (123) tree is more accurate than the (111) star, which highlights the importance of tree sensitivity as illustrated in Fig. 5. The dandelion approximation with (122) or (123) largest graphs, which incorporates self-interactions, improves over the explicit (123) graph. The reason for this is that the dandelion recursion brings in additional degrees of freedom, cf. Eq. (90), due to the approximate product reduction of angular contributions. These additional numerical parameters lead to a further lowering of the test error as compared to the (123) tree, and utilizing these parameters is therefore beneficial. We note that if Eq. (87) is used instead, the results for the (123) dandelion model closely match the (123) tree (not shown).
A maximum of three edges, corresponding to four-body interactions, should be sufficient for the description of four-atom clusters. In fact, if self-interaction contributions are not permitted, graphs with more than four nodes vanish identically for four-atom clusters. Thus all graphs with four or more edges cannot contribute without self-interactions. However, as clearly visible in Fig. 10, increasing the number of edges further lowers the errors for ACE (l) and ACE (g). This is due to the fact that the basis functions with up to three edges were not exhaustively increased but kept constant, so that further self-interacting basis functions with more edges can contribute to lowering the error. The rationale behind this is our observation that adding a few more basis functions on graphs with additional edges improves the model more efficiently than adding more basis functions to a given graph.
At a maximum of six edges, ACE (g) improves significantly over ACE (l). The number of layers, i.e., two or three, does not appear to be decisive: more accurate models can be achieved by either increasing the number of layers or, alternatively, the body order of the ACE (l) in each layer. This explains, for example, why MACE works well with only two layers and many-body ACE (l) messages, while NequIP and Allegro require three layers.
We remark that in Fig. 10 we focus on the relative errors of different models. Changing hyperparameters, including non-linear embeddings or other forms of the radial functions, would shift the complete graph to lower errors, but without modifying the above analysis and conclusions.

B. Small molecules
The MD17 dataset 72 consists of configurations from ab initio molecular dynamics simulations of 10 small molecules. For each molecule there are 100000 configurations, 1000 of which were randomly selected for training, with the rest used for testing. The graph ACE (g) mean absolute errors for energies and forces are shown in Table I in comparison to the best available and ACE-related ML models. Results of many other models can be found, for example, in Refs. 13,32,45. The graph ACE (g) shows the best performance for all 10 molecules in comparison to all other state-of-the-art models. A further test, for which the models were trained on only 50 configurations from the training set, is shown on the right side of Table I. Here, too, ACE (g) outperforms the other models for every molecule, indicating excellent data efficiency. The 3BPA dataset was introduced in Ref. 13 to assess the extrapolation capability of machine-learned interatomic potentials. The training set consists of 500 configurations of the flexible drug-like molecule 3-(benzyloxy)pyridin-2-amine, obtained from ab initio molecular dynamics at 300 K, and the test set contains a series of molecular dynamics calculations at 300 K, 600 K and 1200 K as well as relaxed configurations of so-called dihedral slices that consist of conformer configurations far from the training data. The energy and force RMSE of ACE (g) for these tests are shown in Table II together with results from other state-of-the-art models. ACE (g) outperforms all other models in all cases. ACE (g) is also computationally more efficient than other methods.
The evaluation time of the ACE (g) models on an NVIDIA A100 GPU is around 2.3 ms/molecule, which is about 10 times faster than MACE and 45 times faster than NequIP, based on timings reported for the same GPU 45. Also, models like MACE, Allegro and NequIP utilize several million parameters to achieve the reported performance, while the largest ACE (g) model uses around 162k parameters.

C. Carbon
The carbon dataset 12 is particularly challenging, as it was designed for a general-purpose potential and therefore consists of structures with various atomic distances and bonding characteristics. The structures can be split into five groups based on bonding character and structure, namely sp2- and sp3-bonded, amorphous and liquid configurations, bulk structures of various crystals, and clusters with two to six atoms. The dataset contains in total 19205 structures and, as in the reference, 17293 were taken for fitting and 1912 for testing. We fit ACE (g) and MACE models and compare to the original ACE (l) result 12 in Fig. 11. Both ACE (g) and MACE provide a considerable improvement over ACE (l), especially for clusters. While both ACE (g) and MACE show similar performance, with ACE (g) showing slightly lower errors in total, ACE (g) is able to achieve this with an order of magnitude fewer parameters, i.e., ACE (g) has around 105k parameters, while MACE has almost 2.5M. We also assess the evaluation time on an NVIDIA A100 GPU for molecular dynamics simulations with LAMMPS. For a supercell with 2016 atoms in the equilibrium diamond structure we obtain 200 µs/atom for ACE (g) versus 400 µs/atom for MACE (see App. D for more details). Moreover, on the GPU with 80 GB of memory we were able to run simulations with the MACE model only for up to 2016 atoms, whereas for ACE (g) this limit was 12672 atoms.

X. CONCLUSION
We introduced the graph Atomic Cluster Expansion (grACE or ACE (g)). Different from ACE (l), for the derivation of ACE (g) we did not assume a decomposition into atomic quantities. Therefore, by construction the ACE (g) cluster basis functions for each configuration are complete and orthonormal in the space of N interacting atoms. The basis functions from different configurations are categorized by graph topology and their radial, angular, chemical, magnetic, etc. character. A decomposition into atomic contributions is achieved by assigning the contributions of cluster basis functions to the root node of their graphs.
We show that local ACE (l) is a subset of global, graph ACE (g), obtained by limiting ACE (g) to star graphs only. We further highlight the relation of ACE (g) to quantum mechanics. By employing tensor decomposition we achieve an in-principle lossless simplification of ACE (g) that enables efficient recursive evaluation of tree graphs including the contribution of all subgraphs. In passing, this allows us to illustrate that the success of recently developed equivariant message-passing models is neither connected directly to message passing nor to equivariance, but should be seen as a consequence of including graph basis functions in ACE (g) that are more sensitive than the stars of ACE (l) and have a longer reach, which makes them well suited for modelling semilocal interactions. We show that graph ACE (g) encompasses multi-ACE and its implementation in MACE, and demonstrate the numerical accuracy and efficiency of graph ACE (g) in dandelion approximation for molecules, clusters and solids. In all our tests ACE (g) is more accurate than the currently most accurate machine-learned interatomic potentials, while ACE (g) is also faster and an order of magnitude more parameter efficient.
Appendix C: Model and training details

a. Basis functions

We call this the approximate product angular reduction, as a more elaborate analysis must involve the appropriate decomposition of the Clebsch-Gordan coefficients.
The generalized Clebsch-Gordan coefficients can be generated from products of the Clebsch-Gordan matrices. The products of the Clebsch-Gordan coefficients are overcomplete in the sense that contraction with spherical harmonics in general leads to linearly dependent functions. Therefore not all possible angular momenta couplings in the generalized Clebsch-Gordan coefficients are admissible and selected couplings need to be removed 4,49,69 . Thus if one approximates generalized Clebsch-Gordan coefficients from products of smaller generalized Clebsch-Gordan coefficients, one expects to create too many couplings, of which some lead to linearly dependent functions. On the other hand, however, none of the possible angular couplings are missed, and in this sense the approximate product angular reduction is overcomplete. We employ basis functions, Eq. (49), that separate chemistry from radial functions, with trainable parameters W_n(µ_j) for differentiating the edge chemistry.

b. Radial functions
Radial functions R_nl(r), Eq. (15), in ACE (g) models are represented as expansions in radial basis functions g_k(r). We use two types of expansion, namely linear and multilayer perceptron (MLP). The linear expansion is given by R_nl(r) = Σ_k c_nlk g_k(r), with the radial expansion coefficients c_nlk.
For the MLP, a layer transforms inputs x_k to outputs h_n as h_n = a(Σ_k w_nk x_k), where a is the activation function and w_nk are trainable parameters. For the radial function R_nl(r), the inputs x_k = g_k(r) are transformed via three hidden layers with 64 units each and the SiLU activation function. In the last layer no activation is applied. Two types of radial basis functions were utilized, the simplified spherical Bessel functions 49,76 and the Bessel functions with a polynomial envelope function 28,32,40 with p = 5.
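The two parametrizations above can be sketched as follows. This is an illustrative sketch, not the actual tensorpotential API: the function and parameter names are hypothetical, and the polynomial envelope is assumed to take the common form used in the cited Bessel-basis literature.

```python
import numpy as np

def bessel_basis(r, k_max, r_cut, p=5):
    """Bessel-type radial basis sin(k*pi*r/r_cut)/r with a polynomial
    envelope that smoothly sends the basis to zero at r_cut
    (assumed envelope form; names are illustrative)."""
    k = np.arange(1, k_max + 1)
    d = r / r_cut
    # envelope u(d) of degree p+2 with u(1) = 0
    u = (1 - (p + 1) * (p + 2) / 2 * d**p
         + p * (p + 2) * d**(p + 1)
         - p * (p + 1) / 2 * d**(p + 2))
    g = np.sqrt(2.0 / r_cut) * np.sin(k * np.pi * d) / r
    return np.where(r < r_cut, u * g, 0.0)

def silu(x):
    return x / (1.0 + np.exp(-x))

def radial_linear(r, c_nlk, r_cut):
    # linear expansion R_nl(r) = sum_k c_nlk g_k(r)
    return c_nlk @ bessel_basis(r, c_nlk.shape[-1], r_cut)

def radial_mlp(r, weights, r_cut):
    # hidden layers with SiLU activation; no activation on the last layer
    h = bessel_basis(r, weights[0].shape[1], r_cut)
    for W in weights[:-1]:
        h = silu(W @ h)
    return weights[-1] @ h
```

In the text the MLP variant uses three hidden layers with 64 units each, which corresponds to a `weights` list of four matrices here.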

c. Non-linear embedding
The atomic energy is expressed as a non-linear function, Eq. (83), of the ACE (g) inputs, Eq. (81). Here we use a single-layer MLP with SiLU activation as a non-linear embedding and vary the number of units in the layer and the number of inputs.
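A minimal sketch of such an embedding, assuming the single-hidden-layer SiLU term is simply added to a linear embedding of the same inputs (consistent with the model descriptions in this appendix; all names are illustrative):

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def energy_embedding(x, W_hidden, w_out, w_lin):
    """Atomic energy from ACE(g) inputs x: single hidden layer with
    SiLU, combined additively with a linear embedding (assumed form)."""
    return float(w_out @ silu(W_hidden @ x) + w_lin @ x)
```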

d. Loss functions
For optimizing our ACE (g) models we used the loss function

L = κ_E Σ_n w_n^(E) (ΔE_n)² + κ_F Σ_n w_n^(F) Σ_i |ΔF_ni|² + Δ_L2 ,

where κ_E and κ_F weight the contributions of the energy and force errors, the sums over n run over the N_struct structures employed in the parametrization, and w_n^(E) and w_n^(F) are per-structure and per-atom weights for the energy and force contributions of structure n, which were set to one for every structure and normalized by the number of structures and force components, respectively. For the carbon dataset, the energy residual is in addition normalized by the number of atoms n_at,n in a structure. Δ_L2 = κ_L2 Σ_j c_j² is the regularization term that penalizes the magnitude of the ACE (g) expansion coefficients c in Eq. (91), where κ_L2 is the regularization weight parameter.
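The loss above can be sketched as follows. This is a simplified illustration, not the training code: the normalization conventions follow the text only approximately, and all names are hypothetical.

```python
import numpy as np

def loss(dE, dF, coeffs=None, kappa_E=1.0, kappa_F=500.0, kappa_L2=5e-5,
         w_E=None, w_F=None):
    """Weighted energy/force loss with L2 coefficient regularization.
    dE: per-structure energy residuals, shape (N_struct,)
    dF: list of per-structure force residuals, each of shape (n_at, 3)
    """
    N = len(dE)
    w_E = np.ones(N) if w_E is None else w_E
    w_F = np.ones(N) if w_F is None else w_F
    # energy term, normalized by the number of structures
    L_E = kappa_E / N * np.sum(w_E * dE**2)
    # force term, normalized per structure by the number of force components
    L_F = kappa_F / N * sum(w * np.mean(f**2) for w, f in zip(w_F, dF))
    # L2 penalty on the expansion coefficients
    L2 = 0.0 if coeffs is None else kappa_L2 * np.sum(coeffs**2)
    return L_E + L_F + L2
```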

e. Model parameters
For the Si clusters the radial functions are produced from a linear expansion of eight radial basis functions in the form of simplified spherical Bessel functions with a cutoff distance of 8 Å, and the radial functions are shared across the layers. We further use n_max = 8 and l_max = 4 for the atomic base on each layer and all intermediate couplings. A linear embedding was used for the energy. For training we used the L-BFGS-B algorithm as implemented in scipy 77 with κ_E = 1, κ_F = 100 and κ_L2 = 5 · 10⁻⁸.
For the small molecules, we use models of dandelion type (1222 4 ) with variable radial functions for two layers. A linear expansion of 8 radial basis functions in the form of Bessel functions 40 with n_max = 8 is used for the first layer, while an MLP expansion of the same basis is used on layer zero with n_max = 32. For the MD17 dataset the cutoff for the radial basis is set to 4 Å, and to 4.5 Å for the 3BPA dataset. For all models l_max = 3 for the atomic base and all intermediate couplings of the basis functions, as well as for the second-layer ACE (l) expansion. A single-layer MLP with two inputs and 16 hidden units is used for the energy embedding alongside the linear embedding. The L-BFGS-B algorithm was used for training with κ_E = 1, κ_F = 500 and κ_L2 = 5 · 10⁻⁵ for all models except benzene, for which κ_F = 1000.
For carbon a (1222 3 ) model was used, with radial functions in the form of an MLP expansion with Bessel function 28 inputs, n_max = 16 and a cutoff of 5 Å, which is shared across layers. We further set l_max = 4 for the atomic base and intermediate couplings on layer zero. The first-layer ACE (l) expansion and the intermediate couplings of the basis functions were set to l_max = 3. We utilize a single-layer MLP with 16 inputs and 32 hidden units for the energy embedding together with the linear embedding. For training we used the AMSGrad version of the Adam 78 optimizer with batch size 10, learning rate 5 · 10⁻³ and κ_E = 1, κ_F = 500. During optimization, the learning rate was reduced in steps by a factor of 0.6 down to 6 · 10⁻⁴. With each reduction the fit was restarted and the energy and force weights in the loss function were gradually adjusted to κ_E = 10, κ_F = 1. Finally, the fit was further refined with the L-BFGS-B algorithm.
For training the carbon MACE model we use the same train and test split as above, 256 feature channels, l_max = 3, messages with L_max = 2 and a radial cutoff of 5 Å. Optimization is performed with the AMSGrad version of the Adam optimizer with learning rate 0.01 and batch size 5. The energy and force weights in the loss function were initially set to 80 and 1000, respectively, followed by switching the energy weight to 1000 and the force weight to 10.
The ACE (g) models and fitting algorithm are implemented within the tensorpotential package 49 , and GPU acceleration is enabled via TensorFlow 79 . All ACE (g) fits and evaluations were performed in double precision.

Appendix D: Comparison of computational performance
A comparison of the computational performance of the ACE (g) and MACE models is presented in Fig. 12. All simulations were performed using LAMMPS 48 with custom pair styles for ACE (g) and MACE, without domain decomposition. Evaluation of MACE was carried out exclusively on the GPU, with the KOKKOS package for neighbor-list construction and MD trajectory integration. In contrast, the current ACE (g) implementation in LAMMPS calls the TensorFlow graph on the GPU to compute energies and forces, while all other calculations run on a single CPU core, which requires CPU-GPU data transfer in both directions. Therefore, the current ACE (g) implementation can be substantially improved in the future.

Appendix E: Efficient gradients for graph ACE

ACE (g) graph basis functions extend over T layers, where each layer covers a cutoff distance r_cut. Therefore, in a homogeneous system with N neighbors around each atom, one can assume that the root atom interacts with T³N atoms, and one may expect the effort for evaluating ACE (g) to scale as T³ with the number of layers. Alternatively, one could argue that in each layer connections are made to each of the N neighbors, and from this expect an even steeper scaling of N^T.
Here we show that the evaluation of ACE (g) , including gradients, can be achieved with an effort that scales only linearly with the number of neighbors N and linearly with the number of layers T. This is achieved by a straightforward change of summation order, in close analogy to the linear scaling of ACE (l) with N.
We note that the implementation that we used to carry out the numerical examples and to establish the timings presented here exploits the automatic differentiation available from TensorFlow and does not employ the following formulas for gradients directly. The reason for this is that the optimization of the loss function requires gradients with respect to positions for the forces and subsequent derivatives of the forces with respect to the expansion coefficients. It will be interesting to compare the efficiency of TensorFlow automatic gradients with an explicit implementation of the following gradients for force evaluation.
We give explicit formulas for the case of coupled radial channels; other variants of ACE (g) follow analogously. The atomic base is computed as a sum over N_i, where N_i contains all atoms within the cutoff distance r_cut around atom i but not atom i itself. Local ACE (l) on the next layer is a polynomial of the atomic base. This iteration is initialized with φ_i^(T) = 1 and iterated down to layer 0, where the resulting ACE is computed. For a scalar quantity like the energy, non-linear functions of several expansions may be computed, and the sum is taken over all N particles in the system. The force on atom k is obtained as the gradient of the total energy with respect to its position. In analogy to the computation of E_i we start by iterating for the computation of ∇_k E_i for all atoms j in N_i. The derivatives can be obtained efficiently by recursive evaluation 10 and will therefore not be discussed here. We next collect prefactors, defined separately for k ≠ i and for k = i. This allows us to compute the required derivative recursively. Eq. (E15) extends the range of the interaction with every iteration by r_cut, so that a naive implementation requires a neighbor list for atoms within T r_cut from the root atom i. We show next how to avoid using large neighbor lists and poorly scaling summations.
To this end we write out the recursion explicitly, Eq. (E17). The iteration terminates with ∇_k φ_j^(T) = 0. The complexity of the recursive evaluation stems from the iterative summation, which in a direct implementation leads to a scaling as N^T. This poor scaling can be avoided altogether by collecting and iterating prefactors, which we define for t = 1, . . ., T − 1. The evaluation of the prefactors is linear in the number of neighbors N and linear in the number of layers T. The resulting expression for the force requires one more summation over the N neighbors and the T layers.
Therefore the poor scaling with N^T summations has been transformed to linear scaling, with an effort proportional to T N for the energy and force evaluation. However, the iterative construction means that h_i^(t) needs to be available on neighboring atoms, for example on 'ghost' atoms in the LAMMPS domain decomposition.
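The change of summation order can be illustrated on a linear toy analogue of the recursion: take a scalar "message" φ_j^(t) and edge weights W_ij in place of the actual ACE (g) basis functions (both hypothetical simplifications; the real recursion involves polynomials of the atomic base and angular functions). Expanding the recursion naively enumerates all N^T neighbor paths, while iterating layer by layer costs only T sweeps over the neighbor lists:

```python
import numpy as np
from itertools import product

def phi_naive(W, i, T):
    """Expand the recursion explicitly: sum over all length-T neighbor
    paths starting at atom i, i.e. N**T terms."""
    N = W.shape[0]
    total = 0.0
    for path in product(range(N), repeat=T):
        w, prev = 1.0, i
        for j in path:
            w *= W[prev, j]   # edge weight along the path
            prev = j
        total += w
    return total

def phi_iterative(W, T):
    """Same quantity for all atoms at once, iterating phi^(T) = 1,
    phi^(t) = W @ phi^(t+1): T sweeps over the neighbor lists."""
    phi = np.ones(W.shape[0])
    for _ in range(T):
        phi = W @ phi
    return phi
```

Both routes give identical results; only the cost differs, which is the essence of collecting and iterating the prefactors.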

Algorithm
For the algorithm we assume that bonds are stored as i → j for quantities ϕ_ji and that the index n_{j→i} of the inverse bond j → i can be looked up, i.e. n_{j→i} = INV(n_{i→j}). If this is not available, the values of ϕ_ji^(t) and f_ji^(t) need to be computed and stored for the inverse bonds, too. We furthermore assume that tensors can be attached to atoms and communicated to ghost atoms. Finally, we use the recursive evaluation of ACE (l) as implemented in Ref. 10.
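As a minimal illustration, the inverse-bond lookup n_{j→i} = INV(n_{i→j}) can be precomputed from the directed bond list with a dictionary (a hypothetical helper sketching the lookup described above, not the actual implementation):

```python
def build_inverse_bond_index(bonds):
    """Given a directed bond list [(i, j), ...], return INV such that
    INV[n] is the index of bond (j, i) for bond n = (i, j),
    or -1 if the reverse bond is not stored."""
    index = {b: n for n, b in enumerate(bonds)}
    return [index.get((j, i), -1) for (i, j) in bonds]
```

A -1 entry signals that ϕ^(t) and f^(t) for the inverse bond must be computed and stored explicitly.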
The weights, Eq. (E29), are combined with the effective force gradients f_k for k ≠ i, Eq. (E30), for the computation of the forces, Eq. (E31).

FIG. 2. Illustration of the locality constraint. The left-hand graph a) is admissible, while the two other graphs are not. The middle graph b) is not a single connected graph, but consists of two disconnected graph fragments. The right-hand graph c) includes an edge longer than the cutoff distance.

FIG. 3. Illustration of an admissible graph with n − 1 edges and n nodes (top). Possible graphs with up to six nodes (bottom). Star graphs that are part of local ACE (l) are marked by a grey background. Graphs that have the same topology but different edge orientations are indicated by dashed boxes. For graphs with five and six nodes, edge directions are not specified.

FIG. 4. Basic subtrees up to order five, including topological classification based on distances from the root node.

FIG. 5. Example of two cluster basis functions on the same nodes but with different topology. The left graph is a star and part of ACE (l) .

FIG. 7. Different contributions to the fourth moment (top row) and their graph representations (bottom row). The rounded arrows represent Hamiltonian matrix elements. Onsite matrix elements were ignored. The three-body fourth-moment contribution folded onto itself implies self-interactions in ACE (g) , as indicated in grey. The four-node cycle graph is self-interacting.

FIG. 8. Star decomposition of a subtree with topology classification (123344344455). Numbers in the subtree index the distance from the root node. The decomposition is indicated by circles that contain the stars. The stars are classified by their distance from the root node and the number of florets.

FIG. 9. Illustration of the (1236) graph and floret picking (upper panel, from right to left) and of the (1232), (1233), (1234) dandelion graphs (lower panel). The root node is not shown and the subtrees emerge from the root node, which fully characterizes the directions of the edges; therefore edge directions are not shown.

FIG. 10. Comparison of different ACE (l) and ACE (g) models for four-atom Si clusters with (⃝) and without (∅) self-interactions. See text for details.

FIG. 11. RMSE for energy and forces for the different groups of structures in the carbon dataset.

FIG. 12. Computational performance of the ACE (g) and MACE models for carbon. See text for details.

TABLE I. Energies (E, meV) and forces (F, meV/Å) MAE of ACE (g) models for the revised MD17 dataset in comparison to the best published models as well as ACE-related models. Values to the left of the vertical line show metrics for training on 1000 structures; values to the right are for training on 50 structures. In both cases, metrics are obtained from the 99000 test structures for each molecule. Some of the potentials we reference used a slightly different procedure; for example, for MACE the test-set size was only 1000 structures. Columns: Molecule; ACE (g) , ml-ACE 2 , MACE 45 , Linear ACE 13 , NequIP 32 , Allegro 40 (1000 training structures); ACE (g) , Linear ACE, NequIP 45 , MACE (50 training structures).

TABLE II. Energies (E, meV) and forces (F, meV/Å) RMSE of ACE (g) for the 3BPA dataset in comparison to other published models. In total five ACE (g) models were trained; the standard deviation is given in parentheses.