StrainTensorNet: Predicting crystal structure elastic properties using SE(3)-equivariant graph neural networks

Accurately predicting the elastic properties of crystalline solids is vital for computational materials science. However, traditional atomistic-scale ab initio approaches are computationally intensive, especially for complex materials with many atoms in a unit cell. We introduce a data-driven approach that efficiently predicts the elastic properties of crystal structures using SE(3)-equivariant graph neural networks (GNNs). This approach yields important scalar elastic moduli with accuracy comparable to recent data-driven studies. Importantly, our symmetry-aware GNN model also enables the prediction of the strain energy density (SED) and the associated elastic constants, fundamental tensorial quantities that are strongly influenced by a material's crystallographic group. The model consistently distinguishes independent elements of the SED tensors in accordance with the symmetry of the crystal structures. Finally, our deep learning model possesses meaningful latent features, offering an interpretable prediction of the elastic properties.


I. INTRODUCTION
The elastic properties of crystalline solids, such as the elastic constants, bulk modulus, and shear modulus, are important macroscopic quantities that determine materials' mechanical characteristics. Computational studies of these quantities provide theoretical guidelines for understanding various phenomena in solid materials, e.g., the mechanical stability of crystal structures [1,2], pressure-driven deformation and phase transitions of materials [3,4], the response of materials to sound wave propagation [5,6], and the hardness of materials [7-9], to name a few. From an atomistic-scale description, ab initio approaches based on density functional theory have been employed to investigate the macroscopic elastic properties of materials, yielding, for example, elastic constants and elastic moduli in agreement with experimental results [10-13].
Three atomistic-scale computational methods are usually adopted to calculate the elastic constants of crystal structures: an energy-based method [14], a stress-strain method [10,15], and a calculation from group velocities [16-18]. The energy-based and stress-strain methods are more standard and derive the elastic constants from, respectively, an energy and a stress tensor acquired from an ab initio calculation. Although both methods yield comparable predictions of elastic constants, the energy-based method is computationally less efficient, requiring a larger plane-wave basis set and denser k-point meshes for the predicted elastic constants to converge to reliable values [19]. Indeed, the more efficient stress-strain method is commonly employed to determine elastic constants by established software packages such as VASP, Quantum Espresso, and CASTEP [20-22]. The stress-strain method has also been used to create databases of the elastic properties of inorganic compounds [23,24].

* Correspondence to: thiparatc@gmail.com
However, atomistic-scale simulations can be computationally prohibitive, especially when the number of atoms in a unit cell grows large. Such a constraint limits an ab initio method's capability to investigate more complex crystalline solids. On the other hand, advances in machine learning have brought about alternative approaches to computational materials science. This data-driven paradigm can reasonably predict the elastic properties of crystal structures, provided there is sufficient training data from an ab initio method [25,26]. Even when the number of atoms in a unit cell is large, such approaches can efficiently predict the bulk and shear moduli of complex materials such as alloys [27,28]. Applying machine learning together with ab initio calculations is also potentially useful in searching for novel superhard materials [29,30].
Machine learning models based on graph neural networks (GNNs) have received increasing attention in studies of solid materials and molecules. With GNNs, it is natural to encode atomistic-scale descriptions of solids into a computational framework; atomic locations and the associated atomic attributes can be directly embedded into the node attributes of a graph, whereas pairwise interactions among atoms can be encoded into the edge attributes. Efficient GNN training procedures have also been proposed [31-33], enabling GNNs to learn representations of complex relationships between an input and its associated prediction [34]. These neural networks can also be endowed with translation-rotation equivariance, so that input atomic locations of molecules (or point clouds) in R^3 that differ only in their orientation or their centroid can be identified. Enforcing SE(3) equivariance helps the networks extract a more compact (translation- and rotation-independent) relationship between the inputs and their predictions. Due to these appealing features, variations of SE(3)-equivariant GNNs have been developed for materials science [35].
In this work, we adopt a data-driven approach using graph neural networks (GNNs) to predict the elastic properties of crystal structures. Accounting for the symmetry of crystalline solids, our SE(3)-equivariant GNNs take as input atomistic descriptions of strained materials, such as atomic locations and their associated atomic attributes, and predict the strain energy density (SED) of the materials. The prediction of the SED, which is the energy stored in a crystal structure when distorted by either tensile or compressive strains in particular directions, can be obtained efficiently and relatively accurately, given sufficient training data. The model thus provides an alternative to the standard ab initio energy-based prediction method.
After the SED is computed, we can calculate the elastic constants and construct the elastic tensor with a simple analytical expression; see Sec. II A. Other macroscopic (scalar) elastic properties, including the bulk modulus, shear modulus, Young's modulus, and Poisson's ratio, immediately follow from the elastic constants. Sec. IV reports our model's predictions of these elastic properties. The prediction performance for the scalar elastic properties is comparable to that of recent data-driven work [25,26]. Importantly, however, the data-driven prediction of the SED and the associated elastic constants, which are fundamental tensorial quantities that depend on the crystallographic group, is first reported here. The trained model consistently reveals the independent components of strain energy tensors dictated by the symmetry of the crystal structures, as evidenced by the degeneracy structure of the distance metrics computed in Sec. IV B. Our symmetry-aware GNN architecture as well as the dataset used to train the model are described in Sec. III. Also, as opposed to a black-box deep learning prediction, we show in Sec. IV C that the latent features (data representations) that the model learned are physically meaningful, rendering our GNNs explainable. We conclude this work in Sec. V and provide supplementary results and necessary derivations in the Appendices.

II. LINEAR ELASTICITY BACKGROUND
A. The strain energy tensor and the elastic tensor

Assuming linear elastic deformation and no external stress, the strain energy density (SED) of a linear elastic material can be expressed as [36]

    U(ϵ_1, ϵ_2, ..., ϵ_6) = (1/2) Σ_{i,j=1}^{6} C_ij ϵ_i ϵ_j,    (1)

where C_ij is an elastic constant and ϵ_i is a strain component, i ∈ {1, 2, ..., 6}, in Voigt notation. With the goal of obtaining the elastic constants, it suffices to focus on the SED tensor of rank 2, which we now show. Denote U_ij ≡ U(ϵ_i, ϵ_j) for i ≠ j and U_ii ≡ U(ϵ_i), which are the energies stored in a distorted crystal structure when the lattice is strained by at most two strain components (two Voigt indices). In this study, we consider U_ij in units of eV per atom. Since the elastic tensor is symmetric, the SED tensor is also symmetric and can be expressed as

    U_ij = U_ii + U_jj + C_ij ϵ_i ϵ_j   for i ≠ j,   and   U_ii = (1/2) C_ii ϵ_i^2.    (2)

Thus, the elastic constants can be analytically obtained as a function of the SED:

    C_ij = [(1 + δ_ij) U_ij − (1 − δ_ij)(U_ii + U_jj)] / (ϵ_i ϵ_j),    (3)

where δ_ij is the Kronecker delta.
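As a numerical sanity check of this inversion, the following sketch computes the SED tensor from a known elastic tensor and recovers the constants with the analytic relation above. The helper names (`sed_from_constants`, `recover_elastic_constants`) and the common strain magnitude `eps` for all components are our illustrative assumptions, not the paper's code.

```python
import numpy as np

def sed_from_constants(C, eps=0.01):
    """SED tensor U_ij for strains of a common magnitude eps (Eq. (2))."""
    U = np.zeros((6, 6))
    for i in range(6):
        for j in range(6):
            if i == j:
                U[i, i] = 0.5 * C[i, i] * eps**2
            else:
                U[i, j] = 0.5 * (C[i, i] + C[j, j]) * eps**2 + C[i, j] * eps**2
    return U

def recover_elastic_constants(U, eps=0.01):
    """Invert the relation (Eq. (3)): C_ij from the SED tensor."""
    C = np.zeros((6, 6))
    for i in range(6):
        for j in range(6):
            if i == j:
                C[i, i] = 2.0 * U[i, i] / eps**2
            else:
                C[i, j] = (U[i, j] - U[i, i] - U[j, j]) / eps**2
    return C

# Round trip on a random symmetric elastic tensor
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
C_true = (A + A.T) / 2 + 6 * np.eye(6)
U = sed_from_constants(C_true)
assert np.allclose(recover_elastic_constants(U), C_true)
```

The round trip confirms that the diagonal (i = j) and off-diagonal (i ≠ j) branches of the formula are mutually consistent.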
Because the SED tensor of interest and the elastic tensor are symmetric tensors of rank 2 with 6 × 6 components, there are (6 + 1) · 6/2 = 21 independent components. In matrix form, we denote the SED tensor as

        [ U_11 U_12 U_13 U_14 U_15 U_16 ]
        [      U_22 U_23 U_24 U_25 U_26 ]
    U = [           U_33 U_34 U_35 U_36 ]    (4)
        [                U_44 U_45 U_46 ]
        [                     U_55 U_56 ]
        [                          U_66 ]

where only the upper-triangular elements are shown (the lower-triangular elements are left blank for brevity). For crystalline solids, however, the upper-triangular elements are not completely independent; their relations depend on the symmetry of the crystal structure. For instance, the SED tensor of a cubic lattice is

        [ U_11 U_12 U_12 U_14 U_14 U_14 ]
        [      U_11 U_12 U_14 U_14 U_14 ]
    U = [           U_11 U_14 U_14 U_14 ]    (5)
        [                U_44 U_45 U_45 ]
        [                     U_44 U_45 ]
        [                          U_44 ]

with only five distinct components. Note that some components of the elastic constants C_ij are zero in many crystal structures; e.g., a cubic lattice has 12 zeros among the 21 independent components. However, due to the property of the SED from Eq. (2), U_ij is never 0. For the purpose of machine learning regression, working with the SED helps avoid zero-inflation problems in the training dataset. Fig. E1 shows the distributions of C_ij and U_ij, which cover different ranges depending on the indices. Importantly, the distribution of elastic constants can concentrate around zero, but this is not the case for the SED. As an illustrative example of how the SED avoids a zero-inflation problem, consider the elastic constants of a diamond cubic crystal structure.
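The diamond example can be made concrete with a short sketch. We assume approximate literature values for diamond's cubic elastic constants (C11 ≈ 1079, C12 ≈ 124, C44 ≈ 578 GPa) and a strain magnitude of 1%; these numbers are illustrative, not from the paper.

```python
import numpy as np

# Approximate literature elastic constants of diamond (GPa) -- an assumption
C11, C12, C44 = 1079.0, 124.0, 578.0
C = np.zeros((6, 6))
C[:3, :3] = C12                                   # off-diagonal block C12
C[np.arange(3), np.arange(3)] = C11               # tensile diagonal
C[np.arange(3, 6), np.arange(3, 6)] = C44         # shear diagonal

eps = 0.01
U = np.empty((6, 6))
for i in range(6):
    for j in range(6):
        if i == j:
            U[i, j] = 0.5 * C[i, i] * eps**2
        else:
            U[i, j] = 0.5 * (C[i, i] + C[j, j]) * eps**2 + C[i, j] * eps**2

iu = np.triu_indices(6)
print("zero C_ij components:", int(np.sum(C[iu] == 0)))   # 12 of 21
print("zero U_ij components:", int(np.sum(U[iu] == 0)))   # 0 of 21
```

Many upper-triangular C_ij vanish by cubic symmetry, but every U_ij stays strictly positive, which is exactly the zero-inflation argument above.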

B. Strain operations
The unit cell of a crystal can be compactly parametrized by a lattice matrix L = [a b c], where a, b, and c are the lattice vectors in Cartesian coordinates. If a strain is applied to the system, the lattice matrix is deformed as

    L′ = (I + ε) L,    (6)

where ε is the strain matrix

        [ ϵ_1    ϵ_6/2  ϵ_5/2 ]
    ε = [ ϵ_6/2  ϵ_2    ϵ_4/2 ]    (7)
        [ ϵ_5/2  ϵ_4/2  ϵ_3   ]

In this work, we assume the crystal structures are deformed by at most two strain components, so that L′ = L′(ϵ_i, ϵ_j).
Due to an applied strain, the atomic coordinates r must also be transformed accordingly as

    r′ = L′ r_f,

where r_f is the fractional coordinate of an atom in the unstrained lattice. Note that r_f typically shifts from its equilibrium value when the strain is applied (say, in density functional theory or in experiments); however, this atomic relaxation is neglected here for simplicity.
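These two strain operations can be sketched in a few lines. The symmetric splitting of the shear components and the helper names (`voigt_strain_matrix`, `strain_lattice`) are our assumptions for illustration.

```python
import numpy as np

def voigt_strain_matrix(strains):
    """Symmetric 3x3 strain matrix from Voigt components {i: eps_i}, i in 1..6.
    Shear components (4, 5, 6) are split symmetrically (assumed convention)."""
    e = {i: strains.get(i, 0.0) for i in range(1, 7)}
    return np.array([
        [e[1],       e[6] / 2.0, e[5] / 2.0],
        [e[6] / 2.0, e[2],       e[4] / 2.0],
        [e[5] / 2.0, e[4] / 2.0, e[3]],
    ])

def strain_lattice(L, strains):
    """Deformed lattice L' = (I + eps) L, plus the coordinate map r' = L' r_f."""
    Lp = (np.eye(3) + voigt_strain_matrix(strains)) @ L
    return Lp, lambda r_f: Lp @ r_f

# Example: cubic 3 Angstrom cell, 1% tensile strain along x (Voigt index 1)
L = 3.0 * np.eye(3)                        # lattice vectors as columns
Lp, to_cart = strain_lattice(L, {1: 0.01})
print(Lp[0, 0])                            # 3.03
print(to_cart(np.array([0.5, 0.5, 0.5])))  # strained Cartesian position
```

Keeping the fractional coordinate fixed while deforming L reproduces the no-relaxation assumption stated above.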

III. MACHINE LEARNING WITH SE(3)-EQUIVARIANT GRAPH NEURAL NETWORKS A. Crystal graphs
A crystal structure can be represented as a multigraph whose nodes and edges, respectively, encode atoms and their pairwise connections. A pair of nodes describing an atom of type m and an atom of type n can be connected by multiple edges, encapsulating the interactions between the atom of type m in a unit cell and atoms of type n in the unit cell as well as in other periodic cells of consideration; see Fig. 1. An atom of type m can interact with an atom of the same type in periodic cells if the interaction range of consideration is large enough, represented by self-loops at the node m of the crystal graph. Each atom's location and its atomic features (e.g., atomic mass, electronegativity, polarizability, atomic radius, etc.) in a unit cell are accounted for by the attributes of a single node.
More formally, a crystal graph G = (V, E) consists of a set of nodes (vertices) V and a set of edges E defined as

    V = { v_n = (f_n, r_n) : f_n ∈ R^M, r_n = L r_{f_n} },
    E = { Δr^{(T)}_{mn} = (r_n + T) − r_m },

where m and n are indices of the two atoms in the unit cell, f_n is the vectorized atomic feature of an atom of type n, M is the number of atomic features, r_{f_n} is the fractional coordinate of an atom of type n, and T is a translation vector within the space of interest. Each node embeds two kinds of information: atomic attributes f and atomic positions r in the unit cell. From this definition, the unit cell and its periodic images are compactly represented as a multigraph; only a finite number of nodes describing the atoms in the unit cell is required, and the number of edges joining two nodes grows linearly with the number of periodic images of the atoms in the structure of interest (see Fig. 1).
A crystal graph describing the whole crystal structure would require infinitely many edges. In practice, one typically embeds only partial information of the structure using the description of the unit cell and a finite number of edges encoding interactions with neighboring atoms [37]. One method to construct the edges is to connect a target atom to all other atoms within a finite radius; however, this method generates an excessive number of edges, which is computationally infeasible for GNN machine learning. Alternatively, some algorithms determine the connections between atoms using, e.g., the Voronoi tessellation, generating a moderate number of edges [38,39]. However, for layered materials, the Voronoi tessellation leaves out the edges connecting atoms between layers. These interlayer connections are crucial for differentiating a structure strained in the out-of-plane direction from structures strained in other directions. In this work, we keep an edge between a pair of atoms Δr (dropping the translation-vector superscript and the atom-index subscripts for brevity) only if each component Δr_β, where β ∈ {x, y, z}, satisfies the following spatial cutoff criterion:

    |Δr_β| ≤ s · max(|a_β|, |b_β|, |c_β|) + δ,

where a_β, b_β, and c_β are the β components of the corresponding lattice vectors constituting the lattice matrix L, s is a cutoff distance, and δ is a small cutoff extension of our choice. This reduces the number of edges relative to the hard radial cutoff criterion and ensures the connections of atoms between layers for layered materials with an appropriate choice of s and δ.
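A minimal edge-construction sketch illustrates the component-wise cutoff idea. The exact threshold form and the enumeration over a single shell of periodic images (`n_images=1`) are our assumptions for illustration.

```python
import numpy as np
from itertools import product

def build_edges(L, frac_coords, s=1.0, delta=0.1, n_images=1):
    """Keep an edge dr between atoms (and their periodic images) only if
    every Cartesian component passes a component-wise cutoff (a sketch of
    the paper's criterion; the threshold form is an assumption)."""
    cart = (L @ np.asarray(frac_coords).T).T
    # Per-component cutoff built from the lattice-vector components
    cutoff = s * np.max(np.abs(L), axis=1) + delta
    edges = []
    for (m, rm), (n, rn) in product(list(enumerate(cart)), repeat=2):
        for T in product(range(-n_images, n_images + 1), repeat=3):
            if m == n and T == (0, 0, 0):
                continue                       # skip the zero-length self edge
            dr = rn + L @ np.array(T, dtype=float) - rm
            if np.all(np.abs(dr) <= cutoff):
                edges.append((m, n, dr))
    return edges

# Two-atom cubic cell: every kept edge satisfies the cutoff component-wise
L = np.diag([3.0, 3.0, 3.0])
edges = build_edges(L, [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]], s=1.0, delta=0.1)
assert all(np.all(np.abs(dr) <= 3.1) for _, _, dr in edges)
```

The self-loop edges (m == n with a non-zero translation T) are exactly the periodic-image interactions described for the multigraph above.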

B. Rotational and translational equivariance
If two crystal graph inputs, constructed from an identical crystal structure strained by two different sets of strain components (ϵ_i, ϵ_j) and (ϵ_k, ϵ_l), are equivalent up to a 3D rotation and a translation, then we expect our machine learning model to predict the same SED. This symmetry-aware property can be implemented with geometric deep learning models [40]. We equip a latent representation of our model with this symmetry-aware (equivariant) property, which is defined as follows.
A latent representation ϕ : V → Y is equivariant if for each transformation T_g : V → V, with g an element of an abstract group G, there exists a transformation S_g : Y → Y such that the following condition holds:

    S_g(ϕ(v)) = ϕ(T_g(v))   for all g ∈ G, v ∈ V.

In our case, the group G of interest is SE(3), providing the latent representation with 3D rotation and translation equivariance. This latent feature ϕ can be achieved with SE(3)-equivariant GNNs known as Tensor Field Networks (TFNs) [41], and with a TFN with an appropriate attention mechanism, known as the SE(3)-Transformer [42]; see more detailed recipes of these two models in Appendix A.
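The equivariance condition can be checked numerically on a toy latent map. Here ϕ is simply "positions relative to the centroid" (our hypothetical stand-in, not the paper's network), T_g is a rotation plus translation of the input, and S_g is the same rotation acting on the output.

```python
import numpy as np

def phi(points):
    """Toy latent map: positions relative to the centroid.
    Translation-invariant and rotation-equivariant by construction."""
    return points - points.mean(axis=0)

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

rng = np.random.default_rng(1)
v = rng.normal(size=(5, 3))        # 5 atomic positions
R = rotation_z(0.7)
t = np.array([1.0, -2.0, 0.5])

lhs = phi(v @ R.T + t)             # phi(T_g(v)): rotate and translate the input
rhs = phi(v) @ R.T                 # S_g(phi(v)): rotate the latent output
assert np.allclose(lhs, rhs)       # the equivariance condition holds
```

The trained SE(3)-equivariant layers satisfy the same identity by construction, with S_g realized by Wigner-D matrices instead of a plain rotation matrix.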
The key idea enabling the SE(3)-equivariant property of these two GNNs is that the message-passing kernel is constructed from translation-invariant spherical harmonic bases, with the structure-preserving transformation S_g given by the Wigner-D matrices (see Appendix A). Under a 3D rotation, each degree-J spherical harmonic basis function Y_J of the kernel transforms according to

    Y_J(R_g x) = D_J(g) Y_J(x),

where D_J(g) is the J-th Wigner-D matrix and R_g is the rotation matrix associated with g ∈ SO(3), making the learned latent representation ϕ equivariant [41]. The multi-head attention mechanism in the SE(3)-Transformer also induces an equivariant property on the product between each key and value pair, rendering the whole message-passing algorithm equivariant under SE(3) [42].
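For J = 1 this transformation rule can be verified directly: the real l = 1 spherical harmonics are proportional to (y, z, x)/|x|, so the Wigner-D matrix reduces to the rotation matrix conjugated by a coordinate permutation. The basis ordering below is our assumption for illustration.

```python
import numpy as np

def y1(x):
    """Real l = 1 spherical harmonics (up to a constant): (y, z, x)/|x|."""
    x = np.asarray(x, dtype=float)
    return np.array([x[1], x[2], x[0]]) / np.linalg.norm(x)

# Permutation mapping (x, y, z) -> (y, z, x)
P = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])

def wigner_d1(R):
    """Wigner-D matrix for J = 1 in this real basis: D = P R P^T."""
    return P @ R @ P.T

theta = 0.4
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
x = np.array([0.3, -1.2, 0.8])
# Check Y_1(R x) = D_1(R) Y_1(x)
assert np.allclose(y1(R @ x), wigner_d1(R) @ y1(x))
```

Higher-degree channels obey the same identity with (2J + 1)-dimensional Wigner-D matrices, which is what libraries implementing TFN-style kernels compute internally.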
In this work, we use the SE(3)-Transformer to first build a compressed representation of the orientation of a strained crystal structure; this learned latent representation is then passed to other networks with high expressive power to perform the prediction (regression) task for the SED. The following section describes the entire framework for the SED prediction.

C. Model architecture and training procedure
We now describe our GNN-based model that predicts the SED tensor. The architecture of our model, which we term StrainTensorNet, is illustrated in Fig. 2. The model is designed to receive two types of input: a crystal graph representing a strained crystal structure, and a one-hot vector of dimension 6(6 + 1)/2 = 21 [43] indicating the strained components, with the value 1 in the ij component on which the strain operation is applied and the value 0 otherwise.

FIG. 2. With a strained crystal graph and the strained-component one-hot vector as input, StrainTensorNet employs a self-supervised approach combining both classification and regression networks to predict both the degeneracy class vector and U_ij. Here f^0_in,n ≡ f_n denotes the input feature of the n-th node, where the superscript 0 indicates input into the l = 0 channel of the spherical harmonics in the SE(3)-Transformer, i.e., a channel whose feature is invariant under rotation (transforms like a scalar quantity). Δr′_mn is the edge connecting a target node m and a neighbor node n. The degeneracy class vector from the classification network is concatenated with the latent representation of the strained crystal graph (f^0_reg,n) for the final regression network to predict the SED.
The crystal graph of a strained crystal structure is the input feature of the SE(3)-equivariant GNN building block, which generates two latent features. The first latent feature (f^0_class,n in Fig. 2) is fed into a classification neural network that predicts a vector of dimension 21, identifying all the upper-triangular elements of the SED tensor whose values are exactly identical (by symmetry) to the component indicated by the input one-hot vector. This inherent classification task teaches the model to discern the degeneracy structure of the SED tensor, which depends on the symmetry of the input crystal graph (see Fig. 3). For example, for a cubic lattice strained in the direction 11, the input one-hot vector (1, 0, ..., 0)^T together with the latent feature f^0_class,n should give a prediction of a vector with values close to 1 in the indices ij = 11, 22, 33, and values close to 0 in the other 18 indices. Finally, the predicted degeneracy class vector (DCV), together with the second latent feature (f^0_reg,n in Fig. 2), is fed into the final neural network that predicts (regresses on) the SED.
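The one-hot input and the DCV target for the cubic example can be sketched explicitly. The fixed ordering of the 21 Voigt pairs and the grouping of the cubic degeneracy classes are our assumptions, following the cubic SED pattern discussed in Sec. II A.

```python
import numpy as np

# Fixed ordering of the 21 upper-triangular Voigt pairs
PAIRS = [(i, j) for i in range(1, 7) for j in range(i, 7)]
INDEX = {p: k for k, p in enumerate(PAIRS)}

def one_hot(i, j):
    v = np.zeros(21)
    v[INDEX[(min(i, j), max(i, j))]] = 1.0
    return v

# Degeneracy classes of the cubic SED tensor (assumed, per cubic symmetry)
CUBIC_CLASSES = [
    {(1, 1), (2, 2), (3, 3)},
    {(1, 2), (1, 3), (2, 3)},
    {(4, 4), (5, 5), (6, 6)},
    {(4, 5), (4, 6), (5, 6)},
    {(i, j) for i in range(1, 4) for j in range(4, 7)},
]

def cubic_dcv(i, j):
    """Target DCV: 1 on every pair whose U value equals U_ij by symmetry."""
    v = np.zeros(21)
    cls = next(c for c in CUBIC_CLASSES if (i, j) in c)
    for p in cls:
        v[INDEX[p]] = 1.0
    return v

dcv = cubic_dcv(1, 1)   # strained in direction 11
assert dcv[INDEX[(1, 1)]] == dcv[INDEX[(2, 2)]] == dcv[INDEX[(3, 3)]] == 1.0
assert dcv.sum() == 3.0
```

For the 11 strain direction this target is exactly the "1 in indices 11, 22, 33, 0 elsewhere" vector described in the text.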
To generate an SE(3)-equivariant latent representation of an input crystal graph, as alluded to in the previous section, we first use the SE(3)-Transformer, which receives the strained crystal graph input. The material structure from the database is first transformed according to Eq. (6) by a strain of a fixed magnitude and then converted into the strained crystal graph input. Note that in an ab initio calculation, the atomic coordinates in the strained lattice would be optimized to their equilibrium positions under the applied strain; in this work, however, the fractional coordinates are kept at their pre-strain values. The strained crystal graph input is given by

    G′ = (V′, E′),

where the prime indicates that the atomic positions and the translation vector are those of the strained lattice.
The training data for {U_ij} are computed, via Eq. (2), from {C_ij} extracted from the Materials Project database [24] (see Appendix E). For molecules and materials, if the amount of training data is large enough, atomic numbers alone are sufficient node (atomic) attributes [44]. Nevertheless, in this work, features such as atomic mass, electronegativity, polarizability, atomic radius, atomic number, and the ij indices are used as node attributes. Polarizability data are obtained from [45], whereas the other attributes are obtained from the Materials Project database. The atomic number is vectorized through an embedding layer of dimension 512, which is then concatenated with the other four node attributes (see Fig. 2). Note that the elastic constants are influenced by the atomic mass, since the phonon frequencies in sound waves, particularly the acoustic modes, are inversely proportional to the square root of the atomic mass. Additionally, the resistance to changes in bond length correlates with both electronegativity and polarizability. These three atomic properties are thus incorporated into the node attributes, and they significantly improve the model's prediction accuracy.
Since the atomic features are scalar attributes, the input feature vector is fed into the l = 0 channel of the transformer network, as that channel transforms like a scalar (see Appendix A). We denote each node's input feature vector as f^0_in,n = f_n. For computational feasibility, we restrict the output feature vector of the SE(3)-Transformer to 4 channels f^l_hid,n with l ∈ {0, 1, 2, 3}. These latent features are fed into two different SE(3)-equivariant GNNs without attention (TFNs), outputting a node-wise classification feature vector f^0_class,n and a node-wise regression feature vector f^0_reg,n; see Fig. 2. Note that the attention mechanism is not deployed in the last GNN layer, as this yields better prediction accuracy.
To train the model to discern different orientations of a strained crystal graph and classify its appropriate degeneracy class vector, in each epoch we draw a new realization of the crystal graph in a different orientation and a new one-hot vector from the same class as the original strained component. Specifically, a new crystal graph input is sampled at a uniformly random orientation (uniformly random Euler angles) around the center of mass of the unit cell of the original strained crystal graph. A new one-hot vector of the strained components is chosen such that the element 1 is drawn uniformly from one of the non-zero components of the DCV. For example, if the original strained component is 11 for a cubic material, a new one-hot vector in each epoch is drawn from the situations where exactly one of the strained components 11, 22, or 33 is 1, while the other components are 0.
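One epoch's worth of this augmentation can be sketched as follows. The QR-based rotation sampler (instead of Euler angles) and the helper names are our assumptions; only the rotate-about-the-centroid and resample-from-DCV-support logic mirrors the procedure described above.

```python
import numpy as np

def random_rotation(rng):
    """Uniform random rotation via QR decomposition of a Gaussian matrix."""
    A = rng.normal(size=(3, 3))
    Q, R = np.linalg.qr(A)
    Q = Q * np.sign(np.diag(R))       # fix column signs for uniformity
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1.0               # ensure a proper rotation (det = +1)
    return Q

def augment(positions, dcv, rng):
    """One epoch's augmentation: rotate the structure about its centroid and
    resample the strained-component one-hot uniformly from the DCV support."""
    com = positions.mean(axis=0)
    Q = random_rotation(rng)
    rotated = (positions - com) @ Q.T + com
    one_hot = np.zeros_like(dcv)
    one_hot[rng.choice(np.flatnonzero(dcv))] = 1.0
    return rotated, one_hot

rng = np.random.default_rng(2)
pos = rng.normal(size=(4, 3))
dcv = np.zeros(21)
dcv[[0, 6, 11]] = 1.0                 # e.g. the cubic class {11, 22, 33}
new_pos, oh = augment(pos, dcv, rng)
assert oh.sum() == 1.0 and dcv[np.argmax(oh)] == 1.0
```

Because the rotated graph and the resampled one-hot are redrawn every epoch, the network sees many SE(3)-equivalent realizations of each training sample.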
Our self-supervised approach combines the regression and classification errors in the global loss function:

    L = (1/N) Σ_{n=1}^{N} [ (Û_ij^{(n)} − U_ij^{(n)})^2 + λ L_class^{(n)} ],

where the SED prediction Û_ij is the output of the regression network, L_class is the binary cross-entropy loss for multi-label classification, λ specifies the relative weight of the classification and regression errors, and N is the total number of training samples. We stop the training when the gradient of the loss is relatively small over multiple epochs. Training was performed on NVIDIA A100 GPUs.

D. Choosing a subset of strained crystal structures for training data

By virtue of rotational equivariance, we expect StrainTensorNet to efficiently predict the 21 components of U using training data consisting of only non-degenerate components. The number of non-degenerate components selected for training depends on the crystal system, i.e., cubic, tetragonal, hexagonal, trigonal, monoclinic, or triclinic. For cubic materials, only 6 components are used for training (the numbers of non-degenerate components for the other crystal systems can be found in Table I). Recall that, as discussed earlier, the cubic lattice has 5 distinct U_ij, as shown in Eq. (5); however, the strain tensor transforms the cubic structures into 6 distinct structures (according to the symmetry in Laue class m-3m), whose crystal systems can be tabulated for each strained lattice L′(ϵ_i, ϵ_j); there, T, O, and M denote tetragonal, orthorhombic, and monoclinic lattices, respectively, and different numeric subscripts label different structures. Despite the fact that U_14 (C_14) equals U_15 (C_15) by cubic symmetry, L′(ϵ_1, ϵ_4) and L′(ϵ_1, ϵ_5) possess different lattice symmetries, namely orthorhombic and monoclinic, respectively. This is because ϵ_1 strains the structure in the x direction, while ϵ_4 strains the structure in both the y and z directions, and ϵ_5 strains it in both the x and z directions. Since L′(ϵ_1, ϵ_4) and L′(ϵ_1, ϵ_5) are not equivalent up to a rotation, the SE(3) kernel regards the two as different inputs. By exploiting the SE(3) kernel to identify inputs that are equivalent up to a rotation, for cubic materials it suffices to use training data consisting of the distinct input structures strained in only 6 directions, i.e., 11, 12, 14, 15, 44, and 45.
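The combined objective described above can be sketched in numpy. The exact weighting and reduction used in training are the paper's; the per-label sigmoid and the function name `combined_loss` are our illustrative assumptions.

```python
import numpy as np

def combined_loss(u_pred, u_true, dcv_logits, dcv_true, lam=1.0):
    """Self-supervised objective (a numpy sketch): mean-squared SED
    regression error plus lambda times the binary cross-entropy of the
    multi-label DCV classification."""
    reg = np.mean((u_pred - u_true) ** 2)
    p = 1.0 / (1.0 + np.exp(-dcv_logits))     # per-label sigmoid
    eps = 1e-12                               # numerical floor for the logs
    bce = -np.mean(dcv_true * np.log(p + eps)
                   + (1.0 - dcv_true) * np.log(1.0 - p + eps))
    return reg + lam * bce

rng = np.random.default_rng(3)
u_true = rng.normal(size=8)
dcv_true = (rng.random((8, 21)) < 0.2).astype(float)
# Perfect regression and near-saturated correct logits give a near-zero loss
loss = combined_loss(u_true, u_true, 50.0 * (2.0 * dcv_true - 1.0), dcv_true)
assert loss < 1e-6
```

In a deep learning framework the same objective is the sum of an MSE loss and a `BCEWithLogits`-style multi-label loss weighted by λ.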

E. The distance between SED components
To evaluate our model's capability to predict the symmetry-dependent degeneracy pattern of the SED tensor, we use the Canberra distance between two components of the SED tensor as a metric:

    d(U_ij, U_kl) = (1/N) Σ_{n=1}^{N} |U_ij^{(n)} − U_kl^{(n)}| / (|U_ij^{(n)}| + |U_kl^{(n)}|),

where N represents the number of samples, and ij and kl are Voigt indices. With this metric, the distance between U_ij and U_kl is zero if they belong to the same degeneracy class (i.e., their values are identical). The top rows of Figs. 3 and 4 are averaged over the samples with the same crystal system in the dataset. Hence, the distance pattern of low-symmetry crystals, such as monoclinic and triclinic, will strongly depend on the dataset.
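A short sketch of this metric shows why degenerate components produce the dark (zero-distance) blocks in the figures; the sample-averaged normalization follows the definition above, and the zero-denominator guard is our addition.

```python
import numpy as np

def canberra(u_a, u_b):
    """Sample-averaged Canberra distance between two SED components."""
    u_a, u_b = np.asarray(u_a), np.asarray(u_b)
    denom = np.abs(u_a) + np.abs(u_b)
    # Guard against 0/0 when both values vanish (distance contribution is 0)
    return np.mean(np.abs(u_a - u_b) / np.where(denom == 0, 1.0, denom))

rng = np.random.default_rng(4)
u11 = np.abs(rng.normal(1.0, 0.2, size=100))
u22 = u11.copy()                 # same degeneracy class: identical values
u12 = np.abs(rng.normal(0.5, 0.1, size=100))

assert canberra(u11, u22) == 0.0     # degenerate components: zero distance
assert canberra(u11, u12) > 0.0      # distinct components: positive distance
```

Computing this distance for all 21 × 21 pairs of components yields the distance matrices shown in Figs. 3 and 4.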

IV. RESULTS

A. Rationale for the StrainTensorNet model architecture
The underlying principle for using an equivariant network as a core component is to construct a compact representation of the SED tensor, such that crystal structures strained by different sets of ϵ_i and ϵ_j that are equivalent up to rotation (and translation) possess the same latent representation. Fig. G1 shows our earlier model development toward this goal, beginning from the simplest, less expressive models and ending with StrainTensorNet, which represents the SED tensor compactly and rather accurately. All these trial GNN-based models were trained on crystal structures with fewer than 500 edges. The simplest model (Fig. G1(a)) takes as input only the crystal graph with node attributes, i.e., atomic mass, atomic radius, electronegativity, polarizability, and the embedded atomic number. This minimal equivariant model does not yield a sufficiently accurate representation of U_ij; we found that the model is biased towards predicting SED tensors whose distance-matrix pattern is akin to that of cubic crystals. This could be because the model is not sufficiently expressive and thus attempts to fit the majority of the training dataset, which comprises more higher-symmetry structures (see Table IV for the crystal system statistics of our curated dataset).
To improve the minimal model's expressiveness, we introduced the degeneracy class vector (DCV) as an additional input to inform the model about the symmetry-dependent class of the SED tensor. The DCV is first embedded by fully-connected layers and then concatenated with the global max-pooled features or the node attributes, as shown in Fig. G1(b) and (c), respectively. These symmetry-informed models yield lower MAE and RMSE for the SED tensor than the minimal model, with the model in Fig. G1(b) performing slightly better than that in Fig. G1(c) (see Table VI). Moreover, the predicted distance patterns are less biased towards those of the cubic lattice. Specifically, for crystal structures with lower symmetry, such as hexagonal, tetragonal, trigonal, and orthorhombic, the predicted distance patterns are relatively similar to their ground-truth distance patterns. As expected, informing the model about the DCV of the input helps force the distance between U_ij and U_kl belonging to the same degeneracy class closer to zero.
While these models with the DCV as an input give good predictions of the SED tensor, they require knowledge of the prestrained crystal symmetry a priori (to properly assign the value of the DCV). To overcome this limitation, we use the self-supervised method shown schematically in Fig. 2 to modify the model to predict, rather than require, the DCV. The classification neural network that predicts the DCV takes as input a one-hot vector of the strained ij components, representing the ϵ_i and ϵ_j components in Eq. (7). This self-supervised technique enables StrainTensorNet to predict U_ij without any prior knowledge of the prestrained crystal symmetry. Importantly, StrainTensorNet expresses the degeneracy class of the SED tensor relatively well; see Figs. 3 and 4.

B. StrainTensorNet's prediction results
Table II summarizes StrainTensorNet's prediction errors on the test set. To the best of our knowledge, StrainTensorNet provides the first data-driven prediction of U_ij with reasonable accuracy. Using these predicted SED tensors together with Eq. (3), the elastic tensors of materials can also be calculated. Our model's prediction accuracy depends significantly on the elemental composition of the compounds. Fig. 5 depicts the MAE of U_ij and C_ij, averaged over the 21 ij components and categorized by element. It is interesting to note that compounds containing period-2 elements (Be, B, C, N, O, and F), some period-6 transition metals (Ta, W, Re, and Os), and some actinides (Th, Pa, and Pu), which have high bulk and shear moduli on average, exhibit a higher MAE for U_ij than the overall MAE of the entire test set. As a result, their MAE for C_ij is also significantly higher than the overall MAE of the test set. In contrast, compounds containing lanthanides (except Gd), which exhibit medium bulk and shear moduli on average, demonstrate a considerably lower MAE than the overall MAE of the test set for both U_ij and C_ij.
To further investigate the prediction results for the elastic constants, we compare the predictions (obtained from the predicted SED tensors together with Eq. (3)) against the ground-truth elastic constants in Fig. 6. Fig. 6(a) shows the prediction results for every C_ij component of all crystal structures, whereas Figs. 6(b)-(f) show the prediction results for C_22, C_23, C_26, C_55, and C_56 of cubic materials. For cubic materials, the model is trained only on a non-degenerate subset of the 21 U_ij components, specifically the components 11, 12, 14, 15, 44, and 45. Notably, the model is able to predict the other, unseen components reasonably well. However, predicting the elastic constant components that are exactly zero is challenging. For example, the ground-truth values of C_26 and C_56 of cubic materials are exactly 0 GPa, while the means and standard deviations of these components from our model predictions are 0.69 and 4.45 GPa, and -0.43 and 4.85 GPa, respectively (see Fig. G3 for histograms of the prediction results). Although the standard deviations are relatively large, the means are less than 1 GPa in magnitude. It is worth noting that first-principles calculations also make some errors in these values, but they are usually not computed since they are known to be exactly 0 GPa by symmetry.
To see how well StrainTensorNet discerns the geometric relationship between different strained crystal graph inputs, we compare the Canberra distance metrics of the predicted and ground-truth SED tensors in Fig. 3. The predicted distance metric of cubic materials, shown in the left column of Fig. 3, closely resembles the ground-truth distance metric, although there are minor discrepancies among values belonging to the same degeneracy class. For instance, the distances between the predicted U_11, U_22, and U_33 are 4.8-6.2 µeV/atom, corresponding to percentage errors of 1.83-2.43%. For hexagonal materials, U_11 is identical to U_22, but not to U_33. The predicted distance metric of hexagonal materials accordingly forms a dark blue box in the 11 and 22 components. The percentage error between the predicted U_11 and U_22 is 2.22%. For monoclinic materials, the values of each U_ij are not necessarily identical to one another, and the distance metric is averaged over the monoclinic materials in the dataset; the resulting distance pattern may therefore differ significantly depending on the dataset. These agreements between the predicted and ground-truth distance metrics for various crystal structures show that StrainTensorNet gives an excellent prediction of the degeneracy structure of the SED tensors of strained crystalline materials.

Lastly, the model can also predict other elastic properties, including Voigt's bulk modulus (B_V), Voigt's shear modulus (G_V), Young's modulus (Y), and Poisson's ratio (ν), using the predicted C_ij together with Eqs. (B1)-(B4). Table II summarizes the MAE and RMSE of our predicted elastic properties compared to previous works. The MAE and RMSE of B_V are 12.59 and 20.83 GPa, respectively, which are comparable to those of Mazhnik et al. [25]. The MAE and RMSE of G_V are 10.49 and 18.95 GPa, respectively, with the RMSE of G_V being comparable to that reported by Zhao et al.
[26]. Our work also yields smaller errors for ν than Mazhnik et al. [25]. Note that our dataset (see Sec. E) differs from the datasets used in the related works [25,26]. Fig. G4 presents the MAE of B_V, G_V, Y, and ν, categorized by elements. The model produces a high MAE for compounds containing period-2 elements, mainly due to their larger MAE in U_ij and C_ij (see Fig. 5). On the other hand, the model achieves a smaller MAE for lanthanide compounds, as their MAE in U_ij and C_ij (see Fig. 5) are smaller.
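Eqs. (B1)−(B4) are not reproduced in this excerpt; the sketch below uses the standard Voigt-average expressions for B_V, G_V, Y, and ν from a 6×6 stiffness matrix, which is presumably what those equations contain. The diamond constants used in the check are illustrative.

```python
import numpy as np

def voigt_properties(C):
    """Voigt-average elastic properties from a 6x6 stiffness matrix C (GPa).
    Standard Voigt expressions; B_V, G_V, Y in GPa, nu dimensionless."""
    C = np.asarray(C, dtype=float)
    B_V = (C[0, 0] + C[1, 1] + C[2, 2]
           + 2.0 * (C[0, 1] + C[0, 2] + C[1, 2])) / 9.0
    G_V = (C[0, 0] + C[1, 1] + C[2, 2]
           - C[0, 1] - C[0, 2] - C[1, 2]
           + 3.0 * (C[3, 3] + C[4, 4] + C[5, 5])) / 15.0
    Y = 9.0 * B_V * G_V / (3.0 * B_V + G_V)             # Young's modulus
    nu = (3.0 * B_V - 2.0 * G_V) / (2.0 * (3.0 * B_V + G_V))  # Poisson's ratio
    return B_V, G_V, Y, nu

# Cubic diamond: C11 = 1054, C12 = 126, C44 = 562 GPa
C = np.zeros((6, 6))
C[0, 0] = C[1, 1] = C[2, 2] = 1054.0
C[0, 1] = C[0, 2] = C[1, 2] = C[1, 0] = C[2, 0] = C[2, 1] = 126.0
C[3, 3] = C[4, 4] = C[5, 5] = 562.0
B_V, G_V, Y, nu = voigt_properties(C)
```

For diamond this gives B_V ≈ 435 GPa, G_V ≈ 523 GPa, Y ≈ 1120 GPa, and ν ≈ 0.07, consistent with the well-known stiffness of diamond.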

C. Interpretability of StrainTensorNet's latent features
To investigate how the latent features facilitate the successful prediction of the SED tensor, we employ a diffusion map to faithfully visualize a three-dimensional representation of the high-dimensional latent features of the entire test set [46,47]. Figs. 7 and 8 show the dimensionality reduction of the latent features in the diffusion coordinates (φ_1, φ_2, φ_3). Two different latent features are considered: the global max-pooling layer from the GNN representation of the input crystal graph (denoted as A in Fig. 2), and the concatenation of A with the embedded DCV (denoted as B in Fig. 2). The low-dimensional representations in Figs. 7 and 8 are colored by the energy scale (left column) and by the strained component ij (middle and right columns).
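A minimal NumPy sketch of the diffusion-map construction used for such visualizations is given below. The kernel bandwidth `eps` and the synthetic input data are illustrative, not the paper's settings, and the normalization follows the basic Coifman-Lafon recipe rather than whatever variant [46,47] may use.

```python
import numpy as np

def diffusion_map(X, eps, n_components=3):
    """Minimal diffusion-map embedding.
    X: (n_samples, n_features) array; eps: Gaussian kernel bandwidth."""
    # Pairwise squared Euclidean distances and Gaussian affinities
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / eps)
    d = W.sum(axis=1)
    # Symmetric conjugate of the Markov matrix P = D^{-1} W
    S = W / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]           # eigenvalues, descending
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of P; drop the trivial constant mode (eigenvalue 1)
    psi = vecs / np.sqrt(d)[:, None]
    return vals[1:n_components + 1] * psi[:, 1:n_components + 1]

# Example: embed synthetic features into diffusion coordinates (phi_1..phi_3)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
phi = diffusion_map(X, eps=10.0)   # shape (50, 3)
```

Each column of `phi` is an eigenvalue-weighted eigenvector of the diffusion operator, playing the role of the coordinates (φ_1, φ_2, φ_3) in Figs. 7 and 8.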
What does the max-pooled feature A of the crystal graph input represent? Fig. 8(c) shows that the crystal graph latent representations of the same material strained in different ij components are almost identical; in fact, they share greater resemblance than the representations of different materials belonging to the same symmetry class. This is consistent with the result of Bronstein et al., which shows that the latent representation of an original input in SE(3)-equivariant GNNs is very similar to those of mildly spatially distorted inputs and, interestingly, less similar to those of inputs translated spatially further away [40]. Hence, the latent feature A seems to uniquely encode the identity of the input material rather than the symmetry of the input. Thus A alone does not suffice to meaningfully represent the symmetry-dependent SED tensor (Figs. 8(a) and (b)).
On the other hand, with the embedded DCV in the latent feature B, the network can differentiate different strained components within the same material class. When strained in the 21 different components, the latent features A of three example materials (the trigonal LuTlTe_2 phase, the tetragonal YMnSi phase, and the cubic Ta_3Ru phase), which were not differentiable within the same material class (Fig. 8(c)), are now clearly segregated in the φ_2 variable with the help of the embedded DCV; see Fig. 7(c). In fact, the latent feature B, which combines the input crystal graph representation and the embedded DCV, offers an interpretable latent representation of the SED (Fig. 7), which we now discuss.
The latent feature B is organized such that the (φ_1, φ_2) coordinates encode the strain energy density, which varies from smaller values in the (+, +) quadrant to larger values in the (−, −) quadrant (see Fig. 7(a)). Since the U_ij stored by a tensile strain (i, j ∈ {1, 2, 3}) is typically higher than the U_ij stored by a shear strain (i, j ∈ {4, 5, 6}), Figs. 7(a) and (b) consistently show that the latent features of materials strained in the 11, 22, or 33 component have a negative φ_2 value, whereas those of materials strained in the 44, 55, or 66 component have a positive φ_2 value. In addition, since U_ij for i ≠ j is computed from the sum of C_ii, C_jj, and C_ij, it is typically larger than U_ii and U_jj. Figs. 7(a) and (b) also consistently show that the latent features of materials strained in the 44, 45, 11, and 12 components are organized along the φ_2 coordinate from more positive to more negative values, respectively. Additionally, Figs. 7(a) and (c) show that materials with a higher (lower) average SED are represented by a more negative (more positive) φ_1 value. In summary, the max-pooled feature A from the graph neural networks encodes the material information, and, together with the information in the DCV, the concatenated latent feature B effectively encapsulates both the material information and its strained-component-dependent SED.

V. CONCLUSIONS
We have demonstrated a novel data-driven framework for predicting the elastic properties of crystal structures using SE(3)-equivariant graph neural networks (GNNs). By leveraging SE(3)-equivariant GNNs as building blocks, our self-supervised deep learning model predicts Voigt's bulk modulus, Voigt's shear modulus, Young's modulus, and Poisson's ratio with accuracies comparable to other, non-equivariant deep learning studies [25,26].
A key contribution is the prediction of the strain energy density (SED) and its associated elastic constants, tensorial quantities that depend on a material's crystallographic group. The similarity between the distance metrics of the ground-truth and predicted SED components demonstrates our model's capability to identify different symmetry groups of strained crystal structures. Requiring only a strained crystal graph and the strained component as input, our approach offers an efficient alternative to standard ab initio methods for designing new materials with tailored elastic properties.
The interpretability of the model is also a notable feature. The learned latent representations, which take into account the degeneracy class vector, are organized in a physically meaningful structure for predicting the SED tensors. This interpretability enhances the transparency of the model's predictions, enabling one to judge whether a prediction is physically relevant.
The combination of interpretability and the consideration of crystallographic groups sets our model apart from recent data-driven methods for predicting the elastic properties of materials. We hope this work serves as a stepping stone toward an efficient data-driven approach to materials discovery and design, and opens up avenues for more challenging tensor prediction tasks, e.g., predicting dielectric and piezoelectric tensors, which are second-rank and third-rank tensors, respectively.

where the channel-l value message, v^l_ij, is the same as that of the above TFN, defined as

v^l_ij = Σ_{k≥0} W^{lk}(x_j − x_i) f^k_{in,j},

and the attention on the edge ij (which is invariant under SO(3) [42]) is

α_ij = exp(q_i^⊤ k_ij) / Σ_{j'} exp(q_i^⊤ k_{ij'}),

with a query vector on a node i, q_i, and a key vector on the edge ij, k_ij, given by, respectively,

q_i = ⊕_{l≥0} Σ_{k≥0} W^{lk}_Q f^k_{in,i},   k_ij = ⊕_{l≥0} Σ_{k≥0} W^{lk}_K (x_j − x_i) f^k_{in,j},

where ⊕_{l≥0} denotes the direct sum over the l channels. Fig. A1 depicts the message passing with the attention mechanism of the SE(3)-Transformer model. A detailed numerical implementation of these models can be found at https://github.com/trachote/predict_elastic_tensor.

Orthorhombic, Monoclinic, and Triclinic
Strain energy tensors of orthorhombic, monoclinic, and triclinic lattices do not possess any degenerate structure and are parametrized by the most general tensor of the form (4) in the main text. Because the equivariance of the SE(3)-Transformer should allow it to identify strained crystal graph inputs that are equivalent up to a rotation, we can reduce the number of training data by requiring only one independent ij component of the SED tensor for each degeneracy class vector. The other strained components are incorporated during the randomized update of the one-hot strained component vector within the same degeneracy class in each training epoch; see Sec. III C. The minimal set of ij indices for each crystal system can be chosen from the degenerate structure of the SED tensor given in Appendix C. Our chosen set is presented in Table I.
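The randomized update of the one-hot strained component vector can be sketched as follows. The class mapping shown is a hypothetical, partial example for the cubic system (only the unambiguous tensile, mixed-tensile, and shear classes stated in the text are listed); the actual classes per crystal system are those tabulated in Appendix C, and the function names are illustrative.

```python
import random

# Hypothetical (partial) degeneracy classes of the cubic SED tensor in
# Voigt ij labels; each trained component stands in for its whole class.
CUBIC_CLASSES = {
    "11": ("11", "22", "33"),
    "12": ("12", "13", "23"),
    "44": ("44", "55", "66"),
}

# The 21 independent Voigt ij labels of a symmetric 6x6 tensor
COMPONENTS = ["11", "12", "13", "14", "15", "16", "22", "23", "24", "25",
              "26", "33", "34", "35", "36", "44", "45", "46", "55", "56", "66"]

def resample_one_hot(trained_ij, rng=random):
    """Swap the trained ij label for a random member of its degeneracy
    class and return the corresponding 21-dimensional one-hot vector."""
    ij = rng.choice(CUBIC_CLASSES.get(trained_ij, (trained_ij,)))
    return [1.0 if c == ij else 0.0 for c in COMPONENTS]
```

Calling `resample_one_hot` once per training epoch exposes the model to every member of a degeneracy class while only one member's SED label is stored in the training set.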

Appendix E: Dataset
Material data are pooled from the Materials Project database [24]. There are 12,105 materials in the database that have an elastic tensor; however, only a subset is used in this work due to memory restrictions. Memory usage increases with the number of edges, so the computationally feasible data are restricted to materials with at most 2,000 edges as constructed by a cell-radii approach. There are 9,296 out of the 12,105 materials in the database with this property (about 76.79%). Fig. E2 shows that the selected materials have bulk and shear moduli distributed in the range of 0−600 GPa, making them a good representative of the whole database. In addition, we have found that the validation errors do not decrease significantly even when the maximum number of edges is increased to 3,000. The statistics of the crystal systems curated for the training and test sets of these 9,296 materials are shown in Table IV. The dataset is partitioned into training and validation sets using the stratified method (in scikit-learn) with random seed 0.
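The stratified partition the text describes can be sketched with scikit-learn's `train_test_split`. The crystal-system labels and the 10% validation fraction below are synthetic stand-ins; the paper specifies only the stratified method and random seed 0.

```python
from sklearn.model_selection import train_test_split

# Synthetic stand-in: one crystal-system label per material
labels = (["cubic"] * 500 + ["hexagonal"] * 300
          + ["tetragonal"] * 200 + ["monoclinic"] * 100)
ids = list(range(len(labels)))

# Stratified split: each crystal system keeps its proportion in both sets
train_ids, val_ids = train_test_split(
    ids, test_size=0.1, stratify=labels, random_state=0)
```

Stratifying by crystal system prevents rare systems (e.g., triclinic) from being under-represented in the validation set.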
The hyperparameters that most significantly affect the model performance are the number of convolution layers, the number of l-channels in the SE(3) kernel, and the number of hidden units of the learnable radial function ϕ in the kernel. The number of convolution layers is 4; the model performance improves only slightly with more layers. The learnable radial function ϕ is a feedforward network consisting of an input layer fed into a fully connected hidden layer with a ReLU activation, followed by another fully connected layer that outputs a hidden feature vector. We use 128 hidden units in the hidden layer of ϕ to avoid over-squashing; the performance improves only marginally beyond 128 hidden units. The dimension of the hidden feature vectors f^l_hid,n is 32, whereas the dimensions of f^l_reg,n and f^l_class,n are 128. The dimension of the input atomic number embedding is 512. The maximum degree of l is 4; the model performance improves further with a higher degree, especially for low-symmetry crystal structures, but increasing the maximum degree of l beyond 4 is infeasible on our computational infrastructure with the other hyperparameters fixed. The number of transformer heads is 2. We use the Adam optimizer and a cosine annealing scheduler with an initial learning rate of 10^-4.
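The cosine annealing schedule can be sketched as below, assuming the standard single-cycle form (the paper does not state its cycle length or minimum learning rate, so `total_steps` and `lr_min` here are illustrative).

```python
import math

def cosine_annealed_lr(step, total_steps, lr_max=1e-4, lr_min=0.0):
    """Cosine annealing from lr_max down to lr_min over total_steps
    (standard single-cycle schedule, as in PyTorch's CosineAnnealingLR)."""
    cos = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)
```

The learning rate starts at the initial value 10^-4, decays slowly at first, fastest at mid-training, and flattens out near `lr_min` at the end.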

FIG. 1 .
FIG. 1. A 2-dimensional lattice and its strained crystal graph embedding. Grey dashed lines are periodic boundaries. The whole structure can be spanned by the 3 atoms in 3 different bases in the unit cell, represented by 3 different nodes. Black lines are (undirected) edges connecting two neighboring nodes. The strained lattice is represented as the multigraph on the right, which is the crystal graph input to our SE(3)-equivariant GNN building block. Note that only three nodes are required to describe the crystal graph for the unit cell and the atoms on its boundary. The multi-edges between each pair of nodes are distinguished by different crystallographic directions.

FIG. 3 .
FIG. 3. The distance metrics d(U_ij, U_kl) computed from the test set for the ground-truth SED (top row) and the predicted SED (bottom row). The columns are organized by crystal structure: from left to right, cubic, hexagonal, and monoclinic materials. The model prediction shows an excellent agreement with the ground truth for crystals with high symmetry (cubic), and a good agreement for crystals with lower symmetry (hexagonal and monoclinic).
Such a symmetry-dependent SED tensor for specific crystal structures is demonstrated in the ground-truth distance patterns d(U_ij, U_kl) of Fig. 3 (top); StrainTensorNet approximates these symmetry-dependent patterns well, see Fig. 3 (bottom).

FIG. 4 .
FIG. 4. The distance metric (computed from the test set) of tetragonal, trigonal, orthorhombic, and triclinic materials (top to bottom respectively) for (left) the ground-truth and (right) StrainTensorNet's prediction.

FIG. 5 .
FIG. 5. The bar charts display the MAE of the predicted U_ij and C_ij components, averaged over the 21 ij components. The charts are categorized by the elements present in the compounds in the test set and are colored according to the average bulk modulus in (a) and (c), and the average shear modulus in (b) and (d), as indicated in the legend below. The dashed lines represent the MAE of each property, averaged over the data in the test set. Most elements whose compounds have high bulk and shear moduli exhibit errors in U_ij and C_ij that exceed the dataset's MAE.

FIG. 6 .
FIG. 6. The predicted values (vertical axes) versus the ground-truth values (horizontal axes) of the elastic constants. (a) All components of the elastic constants for the whole test set across all crystal systems. The results for cubic materials only are shown in (b) C22, (c) C23, (d) C26, (e) C55, and (f) C56; note that the 22, 23, 26, 55, and 56 indices did not appear in the input of the training set for cubic materials.

FIG. 8 .
FIG. 8. Three-dimensional diffusion map representation of the global max-pooled latent features of the graph neural networks (i.e., the latent feature A in Fig. 2). (a) is colored by the SED scale, and (b) is colored by the strained component ij. (c) highlights the latent features of 3 example materials, colored by the strained component ij as in (b); the features of other materials are shown in transparent grey. The 3 materials are the trigonal LuTlTe2 phase ( ), the tetragonal YMnSi phase (△), and the cubic Ta3Ru phase (□), which have low, medium, and high elastic constants, respectively. See the discussion and interpretation of these plots in Sec. IV C.

2. Tetragonal II (for Laue class 4/m)

Appendix D: The minimal set of strained components for each crystal system in a training data set

FIG. G1 .
FIG. G1. Earlier trial models with SE(3)-equivariant GNNs as the fundamental building block. (Top) The building block consists of a strained crystal graph input that is fed into SE(3)-equivariant GNNs (an SE(3)-Transformer followed by a TFN, similar to the building block in StrainTensorNet), whose output is the global max-pooled feature from the TFN. (a) A minimal equivariant GNN model for SED regression. (b) An SED regression network that takes both the known degeneracy class vector and the minimal equivariant GNN's representation as input. (c) The same network as (a), but the input node attributes also incorporate the degeneracy class vector.
FIG. G2. From top to bottom, the rows show the distance metric (of the materials whose numbers of edges are less than 500) of the ground truths, of the model in Fig. G1(a), of the model in Fig. G1(c), and of the model in Fig. G1(b). The last row suggests that concatenating the strained crystal graph latent representation with the degeneracy class vector significantly increases the expressiveness of the SED regression network, motivating the self-supervised network architecture in the main text.

FIG. G3 .
FIG. G3. Statistics of the prediction results (vertical axes) versus the ground truths (horizontal axes) of the elastic constant components that are concentrated near 0 GPa. (a) The predicted C26 of the cubic system has a mean and standard deviation of 0.69 and 4.45 GPa. (b) The predicted C56 of the cubic system has a mean and standard deviation of -0.43 and 4.85 GPa. (c) The predicted C26 of all crystal systems has a mean and standard deviation of 0.99 and 6.08 GPa. (d) The predicted C56 of all crystal systems has a mean and standard deviation of -0.20 and 9.22 GPa. The ground-truth distributions of C26 and C56 are centered around 0 GPa with vanishing widths. Although some of the predicted C26 and C56 are not exactly 0 GPa (for the cubic system), most of the predicted values are near 0 GPa.
For diamond, C_11, C_12, C_14, C_44, and C_45 are 1054, 126, 0, 562, and 0 GPa, respectively. Eq. (2) then gives U_11, U_12, U_14, U_44, and U_45 subjected to a 2% strain as 7.506, 16.807, 11.509, 4.002, and 8.005 meV/atom, respectively. Note that, in this study, we consider U_ij in units of eV per atom instead of per volume. The magnitude of U_12 is the largest, as it is the sum of contributions from C_11, C_22 (= C_11 for cubic diamond), and C_12, while U_14 and U_45 are smaller (but non-zero) because C_14 and C_45 are zero.
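These numbers can be reproduced from the quadratic SED form (U in eV/atom, C in GPa), assuming a volume per atom of about 5.70 Å³ for diamond; the exact value of Eq. (2) is not reproduced in this excerpt, so the sketch below uses the relation implied by the text (U_ii from a single strained component, U_ij from two components strained together).

```python
EV = 1.602176634e-19   # J per eV
eps = 0.02             # 2% strain
V_atom = 5.704e-30     # assumed volume per atom of diamond, m^3
s = 0.5 * eps**2 * V_atom / EV * 1e9   # (eV/atom) per GPa

def U_single(C_ii):
    """SED stored when only Voigt component i is strained (eV/atom)."""
    return s * C_ii

def U_pair(C_ii, C_jj, C_ij):
    """SED stored when Voigt components i and j are strained together."""
    return s * (C_ii + C_jj + 2.0 * C_ij)

C11, C12, C14, C44, C45 = 1054.0, 126.0, 0.0, 562.0, 0.0
U11 = U_single(C11)             # ~7.506 meV/atom
U12 = U_pair(C11, C11, C12)     # ~16.807 meV/atom
U14 = U_pair(C11, C44, C14)     # ~11.509 meV/atom
U44 = U_single(C44)             # ~4.002 meV/atom
U45 = U_pair(C44, C44, C45)     # ~8.005 meV/atom
```

All five values match the text to within a few thousandths of a meV/atom, which also shows why U_14 and U_45 remain non-zero even though C_14 = C_45 = 0: the diagonal constants still contribute.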

TABLE III .
Statistics of the number of edges for the 12,105 materials created by the cell-radii and CrystalNN methods.

TABLE IV .
The number of data for training and test sets categorized by crystal systems.

TABLE V .
The MAE and RMSE of the predicted elastic properties using the model in Fig. G1(b). The dataset for this model consists of materials whose number of edges is less than 2,000.