Finding Symmetry Breaking Order Parameters with Euclidean Neural Networks

Curie's principle states that "when effects show certain asymmetry, this asymmetry must be found in the causes that gave rise to them". We demonstrate that symmetry equivariant neural networks uphold Curie's principle and that this property can be used to uncover the symmetry breaking order parameters necessary to make input and output data symmetrically compatible. We prove these properties mathematically and demonstrate them numerically by training a Euclidean symmetry equivariant neural network to learn the symmetry breaking input needed to deform a square into a rectangle.


INTRODUCTION
In the 1894 paper "On the symmetry of physical phenomena", Pierre Curie articulated the following [1,2]: "When certain effects show certain asymmetry, this asymmetry must be found in the causes that gave rise to them." This observation, known as Curie's principle, is as useful now as it was at the turn of the 20th century. Many physical phenomena are now understood to be consequences of symmetry breaking [3]: the mechanism that generates mass [4][5][6], superconductivity [7], phase transitions leading to ferroelectricity [8], and many others. Identifying sources of symmetry and symmetry breaking will undoubtedly continue to play a vital role in physicists' endeavors to model complex physical systems.
Machine learning techniques such as neural networks are data driven methods for building models that have been successfully applied to many areas of physics, particularly fields that employ computationally expensive calculations and simulations, such as quantum matter, particle physics, and cosmology [9,10]. An important consideration in building machine learned models of physical processes is how to incorporate the strong axioms we have about the symmetry properties of physical systems. Building these axioms into the data featurization, training method, or the model itself prevents the model from learning undesirable and unphysical bias that violates these axioms [11][12][13][14][15][16][17][18]. An important feature of these models, which we prove in this Letter, is that they are unable to fit data that is not compatible by symmetry (e.g. when the input has higher symmetry than the output). If we hope to use learned models to gain physical insight, this is a crucial feature, not a bug; many discoveries in physics have been made when symmetry implied something was missing (e.g. the first postulation of the neutrino by Pauli [19]).
In this Letter, we show how we can harness both the flexibility of neural networks and the rigor of symmetry to identify when our data (input and output) are not compatible by symmetry (violating Curie's principle). We prove that neural networks that automatically have the same symmetry properties as physical systems, symmetry equivariant neural networks, exhibit Curie's principle. Thus, symmetry equivariant neural networks can be used as "symmetry compilers": these models will be unable to preferentially fit to output that is not symmetrically compatible with the input; they will instead automatically weigh all symmetrically degenerate possibilities equally. Furthermore, we can use gradients of the error between predicted and known output to determine the form (representations) of symmetry breaking information missing from the input data.
We organize this paper as follows: First, we provide background to how these symmetry equivariant networks are constructed. Then, we prove the symmetry properties of the output and gradients of Euclidean symmetry equivariant neural networks. Finally, we demonstrate these properties numerically by training a Euclidean neural network to deform a square into a rectangle. To conclude, we discuss the applicability of these techniques to more complex examples.

BACKGROUND
A neural network is a function f that maps a vector space $V_1$ to another vector space $V_2$ and is parameterized by weights $W \in \mathbb{R}^N$, i.e. $f : V_1 \times \mathbb{R}^N \to V_2$. The performance of the neural network is evaluated by use of a loss function. Common loss functions are the mean absolute error (MAE) and mean squared error (MSE). The weights W are updated by taking gradients of the loss L with respect to W, $W_{i+1} = W_i - \eta \, \partial L / \partial W_i$, where η is the learning rate. It is also possible to compute gradients of L with respect to the input x, which we will use in our final experiment.
In this work, we use Euclidean neural networks, which are parameterizable functions that are equivariant to elements of 3D Euclidean symmetry (3D rotations, 3D translations, inversion, and any composition of those operations such as mirrors, screws, and glides). A function $f : V_1 \to V_2$ is equivariant under a group G if, for the group representations $D_1$ and $D_2$ acting on the vector spaces $V_1$ and $V_2$, respectively, $f(D_1(g)x) = D_2(g)f(x)$ for all $x \in V_1$ and all $g \in G$.
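As a quick numerical illustration of this definition, the sketch below checks the equivariance condition under a random 3D rotation, with $D_1 = D_2$ taken to be the rotation matrix itself. The functions `cubic_radial_map` and `shifted_map` are hypothetical toy maps chosen for illustration; they are not part of any Euclidean neural network.

```python
import numpy as np

def random_rotation(rng):
    """Draw a proper rotation matrix from the QR decomposition of a random matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:      # flip one column so that det = +1
        q[:, 0] = -q[:, 0]
    return q

def cubic_radial_map(x):
    """Hypothetical equivariant map: scales x by its rotation invariant squared norm."""
    return np.dot(x, x) * x

def shifted_map(x):
    """Hypothetical non-equivariant map: adds a fixed vector that singles out a direction."""
    return x + np.array([1.0, 0.0, 0.0])

rng = np.random.default_rng(0)
rot = random_rotation(rng)
x = rng.normal(size=3)

# Equivariance condition f(D(g) x) == D(g) f(x) with D(g) = rot.
equivariant_ok = np.allclose(cubic_radial_map(rot @ x), rot @ cubic_radial_map(x))
shifted_ok = np.allclose(shifted_map(rot @ x), rot @ shifted_map(x))
print("cubic_radial_map equivariant:", equivariant_ok)
print("shifted_map equivariant:", shifted_ok)
```

The second map fails the check because the added constant vector does not rotate with the input, which is the kind of built-in directional bias an equivariant network excludes by construction.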
If an object or pattern can be identified by a Euclidean neural network in one orientation, it is guaranteed to be identified with the same accuracy in another orientation. This general class of networks has been explored by multiple groups [15][16][17] and combines representation theory with work in deep learning on building equivariances into convolutional neural networks [20][21][22].
The success of convolutional neural networks at a variety of geometric tasks (whether the inputs are images, point clouds, or graphs) is due to them having translation equivariance (e.g. if a pattern can be identified in one location by the filters of the network, it will also be identified if it appears in another location). Euclidean neural networks are a subset of convolutional neural networks where the filters are constrained to be equivariant to 3D rotations. To accomplish this, the filter functions are defined to be separable into a learned radial function and spherical harmonics, $F_{lm}(\vec{r}) = R^{(l)}(|\vec{r}|) \, Y_{lm}(\hat{r})$, analogous to the separable nature of the hydrogenic wavefunctions. For simplicity and without loss of generality, in this work, we use real spherical harmonics.
An additional consequence of Euclidean equivariance is that all "tensors" in a Euclidean neural network are geometric tensors and we must combine input and filter geometric tensors according to the rules of tensor algebra, using Clebsch-Gordan coefficients or Wigner 3j symbols (they are equivalent) to contract representation indices. We choose to express these geometric tensors in an irreducible representation basis and use spherical harmonics compatible with our chosen conventions.
For the experiments in this Letter, we use continuous convolutions over point set geometries. Unlike most convolutional neural networks, our filter functions are defined over 3D space, not on a pre-specified grid. These methods and results generalize to images. To conduct our experiments, we use the e3nn framework [23] for 3D Euclidean equivariant neural networks, which is written with PyTorch [24]. The Jupyter [25] notebooks used for running the experiments and creating the figures for this Letter are made available at Ref. [26] under the heading Simple Tasks and Symmetry.

MATHEMATICAL FORMULATION
In this section, we prove symmetry properties of the gradients of a G-invariant scalar loss (such as the MSE loss) evaluated on the output of a G-equivariant neural network f(x) and ground truth data $y_{\mathrm{true}}$, e.g. $\partial (f(x) - y_{\mathrm{true}})^2 / \partial x$. In the section Experiments, we demonstrate that these properties allow us to learn symmetry breaking changes to the input that allow us to find "missing data" implied by symmetry.
For a group G and a representation (V, D), the symmetry group of $x \in V$ is defined as
$$\mathrm{Sym}(x) = \{ g \in G : D(g)x = x \}. \qquad (1)$$
For instance, consider a square x at the origin acted on by the representation D of G = SO(3). The symmetry group of the square is the dihedral group, a subgroup of SO(3). For contrast, now consider another square, $D(h)x$, rotated with respect to the first one by some $h \in G$. Its symmetry group is also a dihedral group; however, the two groups are not identical since they are related by a rotation. In general, $\mathrm{Sym}(x) \neq \mathrm{Sym}(D(h)x)$; they are two different groups.
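To make the definition of the symmetry group concrete, the following sketch computes Sym(x), the set of group elements g with D(g)x = x, by brute force for a square and for a rotated copy of it. As an assumption made purely so that the group can be enumerated, the finite group of the 24 proper rotations of a cube stands in for SO(3); the helper names are hypothetical.

```python
import itertools
import numpy as np

def octahedral_rotations():
    """All 24 proper rotations that map the coordinate axes onto each other."""
    rotations = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product([1.0, -1.0], repeat=3):
            mat = np.zeros((3, 3))
            for row, (col, sign) in enumerate(zip(perm, signs)):
                mat[row, col] = sign
            if np.isclose(np.linalg.det(mat), 1.0):
                rotations.append(mat)
    return rotations

def same_point_set(a, b, tol=1e-9):
    """True if every point of a coincides with some point of b and vice versa."""
    def contained(p, q):
        return all(np.any(np.all(np.abs(q - point) < tol, axis=1)) for point in p)
    return contained(a, b) and contained(b, a)

def symmetry_group(points, group):
    """Indices of the elements g with D(g) x = x, where x is an unordered point set."""
    return {i for i, g in enumerate(group) if same_point_set(points @ g.T, points)}

group = octahedral_rotations()
square = np.array([[1, 1, 0], [-1, 1, 0], [-1, -1, 0], [1, -1, 0]], dtype=float)
h = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float)  # 90 degree rotation about x
rotated_square = square @ h.T                                   # now lies in the xz plane

sym_square = symmetry_group(square, group)
sym_rotated = symmetry_group(rotated_square, group)
print("order of Sym(x):", len(sym_square))
print("order of Sym(D(h)x):", len(sym_rotated))
print("identical as sets of group elements:", sym_square == sym_rotated)
```

Both symmetry groups have the same order and structure, but they contain different rotations: the four-fold axis of the first square is z, while that of the rotated square is y.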
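If $D_2$ is the trivial representation ($D_2(g) = 1$), then the function is said to be invariant to G, which is a special case of equivariance. Let $f : V_1 \to V_2$ be equivariant to group G. We first want to prove that
$$\mathrm{Sym}(x) \subseteq \mathrm{Sym}(f(x)). \qquad (3)$$
Proof: For $g \in \mathrm{Sym}(x)$ (i.e. $D_1(g)x = x$),
$$D_2(g) f(x) = f(D_1(g)x) = f(x) \;\Rightarrow\; g \in \mathrm{Sym}(f(x)).$$
Next, we want to prove
$$\mathrm{Sym}(x) \cap \mathrm{Sym}(y) \subseteq \mathrm{Sym}(\alpha x + \beta y).$$
Proof: For $g \in \mathrm{Sym}(x) \cap \mathrm{Sym}(y)$,
$$D(g)(\alpha x + \beta y) = \alpha D(g)x + \beta D(g)y = \alpha x + \beta y.$$
Finally, we prove that if f is a differentiable and invariant function, then its gradient is equivariant:
$$f : V \to \mathbb{R} \ \text{invariant} \;\Rightarrow\; \nabla f : V \to V \ \text{equivariant}.$$
Proof: For $x \in V$, let $x = \sum_\mu x_\mu e_\mu$, where the set $\{e_\mu\}_\mu$ is an orthonormal basis. We first recall the definition of the derivative and the following equalities for how the vector x and its basis $e_\mu$ transform under the representation D, which we take to be orthogonal in this basis, $D(g^{-1}) = D(g)^T$:
$$\frac{\partial f}{\partial x_\mu}(x) = \lim_{\epsilon \to 0} \frac{f(x + \epsilon e_\mu) - f(x)}{\epsilon}, \qquad [D(g)x]_\mu = \sum_\nu D_{\mu\nu}(g) x_\nu, \qquad D(g) e_\mu = \sum_\nu D_{\nu\mu}(g) e_\nu.$$
To show the equivariance of the gradient, by the invariance of f,
$$\frac{\partial f}{\partial x_\mu}(D(g)x) = \lim_{\epsilon \to 0} \frac{f(D(g)x + \epsilon e_\mu) - f(D(g)x)}{\epsilon} = \lim_{\epsilon \to 0} \frac{f(x + \epsilon D(g^{-1}) e_\mu) - f(x)}{\epsilon} = \sum_\nu D_{\nu\mu}(g^{-1}) \frac{\partial f}{\partial x_\nu}(x) = \sum_\nu D_{\mu\nu}(g) \frac{\partial f}{\partial x_\nu}(x),$$
i.e. $\nabla f(D(g)x) = D(g) \nabla f(x)$.
To recapitulate, for a G-equivariant function, the output has equal or higher symmetry than the input. When training a neural network, one uses a loss function to compute gradients of the loss with respect to the network parameters or, alternatively, with respect to the input (data) to the network. If this loss is a G-invariant function and the network is a G-equivariant function, then the gradients are G-equivariant. Thus, the gradients have equal or higher symmetry than the input to the loss function. If the input to the loss function is a linear combination (such as the difference) of the network output and the ground truth output, the symmetry group of the gradients will be a superset of the intersection of the symmetry groups of the network output and the ground truth output.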
Finally, if the symmetry of the ground truth output is lower than the input to the network, the gradients can have symmetry lower than the input, allowing for the use of gradients to update the input to the network to make the network input and output symmetrically compatible. This procedure can be used to find symmetry breaking order parameters missing in the original data but implied by symmetry.
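The equivariance of the gradient of an invariant function can also be checked numerically. The sketch below is a minimal illustration under stated assumptions: `pair_energy` is a hypothetical rotation invariant scalar function of a point set (not a loss used in this work), and its gradient is estimated with central finite differences rather than automatic differentiation.

```python
import numpy as np

def pair_energy(points):
    """Hypothetical invariant scalar: sum of Gaussians of all pairwise distances."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    upper = np.triu_indices(len(points), k=1)
    return np.sum(np.exp(-sq_dist[upper]))

def numerical_gradient(func, points, eps=1e-6):
    """Central finite difference estimate of d func / d points."""
    grad = np.zeros_like(points)
    for idx in np.ndindex(*points.shape):
        step = np.zeros_like(points)
        step[idx] = eps
        grad[idx] = (func(points + step) - func(points - step)) / (2 * eps)
    return grad

def rotation_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotation_x(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

rng = np.random.default_rng(1)
points = 0.5 * rng.normal(size=(4, 3))
rot = rotation_z(0.7) @ rotation_x(1.1)

# Gradient evaluated at the rotated input versus the rotated gradient.
grad_of_rotated = numerical_gradient(pair_energy, points @ rot.T)
rotated_grad = numerical_gradient(pair_energy, points) @ rot.T
max_deviation = np.max(np.abs(grad_of_rotated - rotated_grad))
print("max deviation from equivariance:", max_deviation)
```

The deviation is at the level of finite difference error, consistent with $\nabla f(D(g)x) = D(g)\nabla f(x)$ for an orthogonal representation.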

SYMMETRY AND INTERPRETATION OF NETWORK INPUT AND OUTPUT
3D space has Euclidean symmetry, denoted as E(3). Objects (e.g. geometry and geometric tensor fields) in 3D space lower that symmetry to a subgroup of E(3), as defined by Eqn. 1. The fact that Euclidean neural networks are equivariant to all Euclidean symmetry operations implies that the network will preserve any subgroup of Euclidean symmetry: point groups, space groups, subperiodic groups, and other subgroups that are not traditionally tabulated.
Properties in Euclidean space take the form of geometric tensor fields, geometric tensors defined over space (or on a specific geometry). These are also the input to Euclidean neural networks. In practice, we articulate this as two separate inputs: the point geometry of our system expressed as 3D coordinates (which is used by the convolutional filters) and features on that geometry (which are combined with the convolutional filters via tensor products). We express these tensors and our convolutional filters in an irreducible basis of O(3); the irreducible representations of O(3) are indexed by angular frequency L and how they transform under parity (odd or even parity). Some examples of input features would be scalars such as mass or charge, which transform in the same manner as L = 0 with even parity (no change under inversion), or vectors such as velocity or acceleration, which transform in the same manner as L = 1 with odd parity (all components change by a factor of -1 under inversion).
Geometric tensors can represent many things; numerical properties (e.g. mass, velocity, and elasticity) are the most familiar. We can also use geometric tensors to express spatial functions.
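A function on the sphere can be projected onto the spherical harmonics (typically up to some maximum L),
$$f(\hat{r}) \approx \sum_{L=0}^{L_{\max}} c_L \cdot Y_L(\hat{r}),$$
where $Y_L : S^2 \to \mathbb{R}^{2L+1}$ are the spherical harmonics and $c_L \in \mathbb{R}^{2L+1}$ are the projection coefficients. This is the angular equivalent of a Fourier transform.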
The coefficients of this projection form a geometric tensor in the irreducible basis. We can additionally encode radial information in this projection by either adding radial functions to the projection procedure or interpreting the magnitude of the function on the sphere as a radial distance from the origin.
For example, we can project a local point cloud onto spherical harmonics at a specified origin and store the projection as a feature on a point; this is a common step in calculating rotation invariant descriptors of local atomic environments [13]. In this work, to project a local point cloud onto a specified origin, we treat the point cloud as a set of δ functions at corresponding angles around the origin and weigh the projection of each point by its radial distance from the origin.
We additionally re-scale this signal to account for finite basis effects by ensuring the max of the function corresponds to the original radial distance ($f_{\vec{r}}(\vec{r}/|\vec{r}|) = |\vec{r}|$). See Figure 1, where this method is used to project the vertices of a tetrahedron onto an origin; the function magnitude is plotted to be proportional to radial distance. Because these projection coefficients form irrep tensors at specified locations in space (the projection origins), they can be the input or output of a Euclidean neural network.
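The projection described above can be sketched in a few lines. This is an illustration under stated assumptions, not the e3nn implementation: the real spherical harmonics are written out by hand only up to L = 2 in one common normalization and ordering (both are conventions and may differ from those used in this work), the re-scaling step is omitted, and the function names are hypothetical.

```python
import numpy as np

def real_sph_harm_l_max_2(directions):
    """Real spherical harmonics for L = 0, 1, 2 on unit vectors; returns shape (n, 9)."""
    x, y, z = directions[:, 0], directions[:, 1], directions[:, 2]
    c0 = 0.5 * np.sqrt(1.0 / np.pi)
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    c2 = 0.5 * np.sqrt(15.0 / np.pi)
    return np.stack([
        c0 * np.ones_like(x),                                 # L = 0
        c1 * y, c1 * z, c1 * x,                               # L = 1, m = -1, 0, 1
        c2 * x * y, c2 * y * z,                               # L = 2, m = -2, -1
        0.25 * np.sqrt(5.0 / np.pi) * (3.0 * z ** 2 - 1.0),   # L = 2, m = 0
        c2 * x * z, 0.5 * c2 * (x ** 2 - y ** 2),             # L = 2, m = 1, 2
    ], axis=1)

def project_point_cloud(points):
    """Treat each point as a delta function on the sphere, weighted by its radius."""
    radii = np.linalg.norm(points, axis=1)
    directions = points / radii[:, None]
    return radii @ real_sph_harm_l_max_2(directions)

def evaluate_signal(coeffs, directions):
    """Evaluate the band-limited signal sum_L c_L . Y_L on the given unit vectors."""
    return real_sph_harm_l_max_2(directions) @ coeffs

tetrahedron = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float)
coeffs = project_point_cloud(tetrahedron)
print(np.round(coeffs, 6))
```

For the tetrahedron, all L = 1 and L = 2 coefficients vanish by symmetry, so with this truncated basis only the scalar component survives; the first non-scalar contribution compatible with tetrahedral symmetry appears at L = 3, which is one reason a larger maximum L is needed in practice.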
EXPERIMENTS
Two simple tasks. One learnable; one not.
According to Eqn. 3, since Euclidean neural networks are equivariant to Euclidean symmetry, the symmetry of the output can only be of equal or higher symmetry than the input. To demonstrate this, we train two neural networks to deform two arrangements of points in the xy plane into one another, one with four points at the vertices of a square, and another with four points at the vertices of a rectangle.
We interpret our network output as the projection of where we want each point to move. The procedure for generating these projections is described in the previous section. The predicted displacements for each point are articulated as a spherical harmonic signal and trained to match the spherical harmonic projection (for 0 ≤ L ≤ 5) of the desired displacement vector or final point location.
We could have alternatively used an L = 1 output to indicate a 3D vector displacement, but as we will show, the spherical harmonic signal is more informative in cases where the fit is poor. First, we train a neural network to deform the rectangle into the square. The network is able to accomplish this quickly and accurately. Second, we train another neural network to deform the square into the rectangle. No matter the amount of training, the network cannot accurately perform the desired task. This is because a square has higher symmetry than a rectangle, with point groups $D_{4h}$ and $D_{2h}$, respectively. By Eqn. 3, the output of the network has to have equal or higher symmetry than the input.
In Figure 2, we show output of the trained networks for both cases. On the right, we see that the model trained to deform the square into the rectangle is producing symmetric spherical harmonic signals each with two maxima. Due to being rotation equivariant and the input having the symmetry of a square, the network cannot distinguish distorting the square to form a rectangle aligned along the x axis from a rectangle along the y axis. The best prediction it can make is to provide an averaged output for both outcomes. Had we articulated our displacements as L = 1 vectors, the output in the high symmetry case would be the average of degenerate displacement vectors, in this case, a vector with zero magnitude.
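The cancellation of degenerate L = 1 displacements can be illustrated with a few lines of arithmetic. The sketch assumes, hypothetically, that the rectangle is obtained by stretching the square by a fraction `delta` along one axis and compressing it by the same fraction along the other; the exact geometry used in the experiments is not specified here.

```python
import numpy as np

square = np.array([[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])
delta = 0.3  # hypothetical distortion amplitude

# Two symmetrically degenerate targets: long axis along x or along y.
rectangle_along_x = square * np.array([1 + delta, 1 - delta])
rectangle_along_y = square * np.array([1 - delta, 1 + delta])

disp_x = rectangle_along_x - square
disp_y = rectangle_along_y - square

# A model that cannot tell the two targets apart can at best predict their average.
averaged = 0.5 * (disp_x + disp_y)
print(averaged)
```

Under this assumption, the two candidate displacement vectors of each vertex are equal in length and cancel exactly, so a vector-valued output carries no hint of the two degenerate outcomes, whereas the spherical harmonic signal retains both maxima.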

Fixing task two with symmetry breaking
When going from the point group $D_{4h}$ to $D_{2h}$, 8 symmetry operations are lost: two four-fold axes, two two-fold axes, two improper four-fold axes, and two mirror planes. In the character table for $D_{4h}$, there is a 1-dimensional irreducible representation that breaks all these symmetries, $B_{1g}$, which has an L = 2 basis function $x^2 - y^2$ in the coordinate system where the z axis lies along the highest symmetry axis and x and y are aligned with two of the mirror planes. To lower the symmetry of the square to $D_{2h}$, we add a non-zero contribution to the $Y_{2,2}$ component (proportional to $x^2 - y^2$) of all the point features. When this term is added, the model is immediately able to learn the task with the same accuracy as a model trained to distort the rectangle into the square.
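The claim that $x^2 - y^2$ breaks exactly the operations lost in going from $D_{4h}$ to $D_{2h}$ can be verified directly. The sketch below applies one representative of each class of $D_{4h}$ operations to a generic point and compares the sign change of $x^2 - y^2$ with the $B_{1g}$ characters, assuming the axis convention stated above (z along the four-fold axis, x and y along two-fold axes).

```python
import numpy as np

def x2_minus_y2(p):
    return p[0] ** 2 - p[1] ** 2

# (matrix, expected B1g character) for representative D4h operations.
operations = {
    "C4 about z": (np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]), -1),
    "C2 about z": (np.diag([-1.0, -1.0, 1.0]), +1),
    "C2 about x": (np.diag([1.0, -1.0, -1.0]), +1),
    "C2 about x=y": (np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, -1.0]]), -1),
    "inversion": (np.diag([-1.0, -1.0, -1.0]), +1),
    "S4 about z": (np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, -1.0]]), -1),
    "mirror x -> -x": (np.diag([-1.0, 1.0, 1.0]), +1),
    "mirror x <-> y": (np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]), -1),
}

point = np.array([0.3, -1.2, 0.7])  # generic point, so x^2 - y^2 is non-zero
results = {
    name: bool(np.isclose(x2_minus_y2(mat @ point), sign * x2_minus_y2(point)))
    for name, (mat, sign) in operations.items()
}
for name, ok in results.items():
    print(f"{name}: matches B1g character = {ok}")
```

The operations with character -1 (four-fold rotations, improper four-fold rotations, and the diagonal two-fold axes and mirrors) are precisely those that do not map the rectangle onto itself.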

Learning Symmetry Breaking Input
The situation of having a dataset where the "inputs" are higher symmetry than the "outputs" is one that has occurred many times in scientific history when there is missing data: an asymmetry in the system waiting to be discovered. For example, neutrinos were first postulated by Pauli as undetected particles that would account for missing angular momentum and energy in measurements of the process of beta decay [19].
In the context of phase transitions as described by Landau theory [8], symmetry-breaking factors are called order parameters. The process of determining the representation of symmetry breaking quantities can be nontrivial, particularly for complex systems.
To perform backpropagation, a neural network is required to be differentiable, such that gradients of the loss can be taken with respect to every parameter in the model. This technique can be extended to the input.
In this section, we demonstrate we can learn candidate order parameters because of the symmetry properties of Euclidean neural networks according to Eqns. 6 and 8.
In this example, we require the input to be the same on each point. Additionally, because we want to recover the minimal input needed to break symmetry, we add a component-wise mean absolute error (MAE) loss on each L > 0 component of the input feature to encourage sparsity. It is important to note that we only apply the component-wise MAE loss to the input and not the network output. While the component-wise MSE loss is rotation invariant, the component-wise MAE loss is not, and encouraging sparsity of the symmetry breaking parameter in one coordinate frame does not guarantee sparsity in general. We have chosen to train the network in the coordinate frame that matches the conventions of point group tables, so the irreducible representation basis functions can be directly compared.
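The frame dependence of the component-wise MAE penalty is easy to see on a single L = 1 feature. The sketch below is a minimal illustration with a hypothetical feature vector; it compares the mean squared and mean absolute values of the components before and after a 45 degree rotation about z.

```python
import numpy as np

def mean_squared(v):
    return np.mean(v ** 2)

def mean_absolute(v):
    return np.mean(np.abs(v))

angle = np.pi / 4
c, s = np.cos(angle), np.sin(angle)
rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

feature = np.array([1.0, 0.0, 0.0])   # sparse in this frame: a single non-zero component
rotated = rot @ feature               # same geometric object expressed in a rotated frame

print("MSE before/after:", mean_squared(feature), mean_squared(rotated))
print("MAE before/after:", mean_absolute(feature), mean_absolute(rotated))
```

The squared penalty is unchanged by the rotation, while the absolute penalty grows when the same vector is spread over two components, which is why the sparsity it encourages is tied to the chosen coordinate frame.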
In this task, the setup is identical to task two, but we modify the training procedure. We first train the model normally until the loss no longer improves. Then we alternate between updating the parameters of the model and updating the input using backpropagation of the loss.
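The alternating procedure can be sketched on a deliberately tiny stand-in model. Everything in this example is a hypothetical toy, not the e3nn network used in this work: the "model" has three weights, the "input" has two candidate symmetry breaking components (one of $x^2 - y^2$ type and one of $xy$ type), gradients are written out analytically, and the sparsity penalty is omitted for brevity.

```python
import numpy as np

P = np.array([[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])  # square vertices
MODE_B1G = P * np.array([1.0, -1.0])   # (x, -y): distortion of x^2 - y^2 type
MODE_B2G = P[:, ::-1].copy()           # (y, x): distortion of xy type
DELTA = 0.3                            # hypothetical distortion amplitude
TARGET = DELTA * MODE_B1G              # displacements taking the square to a rectangle

def predict(w, s):
    """Toy model: symmetric radial term plus modes switched on by the input s."""
    return w[0] * P + w[1] * s[0] * MODE_B1G + w[2] * s[1] * MODE_B2G

def loss_and_grads(w, s):
    """MSE loss with analytic gradients for the weights w and the input s."""
    resid = predict(w, s) - TARGET
    loss = np.mean(np.sum(resid ** 2, axis=1))
    def overlap(mode):
        return 2.0 * np.mean(np.sum(resid * mode, axis=1))
    g_b1g, g_b2g = overlap(MODE_B1G), overlap(MODE_B2G)
    grad_w = np.array([overlap(P), s[0] * g_b1g, s[1] * g_b2g])
    grad_s = np.array([w[1] * g_b1g, w[2] * g_b2g])
    return loss, grad_w, grad_s

def train(lr=0.1, warmup=100, alternations=300):
    w = np.array([0.5, 0.5, 0.5])
    s = np.zeros(2)                    # start with a fully symmetric input
    for _ in range(warmup):            # phase 1: ordinary training, input held fixed
        _, grad_w, _ = loss_and_grads(w, s)
        w = w - lr * grad_w
    plateau, _, _ = loss_and_grads(w, s)
    for _ in range(alternations):      # phase 2: alternate weight and input updates
        _, grad_w, _ = loss_and_grads(w, s)
        w = w - lr * grad_w
        _, _, grad_s = loss_and_grads(w, s)
        s = s - lr * grad_s
    final, _, _ = loss_and_grads(w, s)
    return w, s, plateau, final

weights, order_parameter, plateau_loss, final_loss = train()
print("loss before input updates:", plateau_loss)
print("loss after input updates:", final_loss)
print("learned input components:", order_parameter)
```

In this toy, the loss stalls while the input is symmetric, and once the input is allowed to change only the $x^2 - y^2$ type component acquires weight; the $xy$ type component receives no gradient because the target contains no distortion of that symmetry. The sign of the learned component depends on the weight initialization.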
As the loss converges, we find that the input for L > 0 consists of non-zero order parameters comprising only components that transform as the $D_{4h}$ irrep $B_{1g}$, such as the spherical harmonic $Y_{2,2}$, proportional to $x^2 - y^2$, and the spherical harmonic $Y_{4,2}$, proportional to $(x^2 - y^2)(7z^2 - r^2)$. See Fig. 3 for images of the evolution of the input and output signals during the model and order parameter optimization process. Modulo an arbitrary sign that depends on initialization, the resulting order parameter matches the $x^2 - y^2$ that we found in the previous section (using the character table of point group $D_{4h}$ and the knowledge of which symmetries we wanted to break). Additionally, we find that the spherical harmonic $Y_{4,2}$ also transforms as $B_{1g}$, which can be confirmed with simple calculations.
FIG. 3: Input parameters (top row) and output signal (bottom row) for one of the square vertices at (from left to right) the start, middle, and end of the model and order parameter optimization. The starting input parameter (on the left) is only a scalar of value 1 ($Y_{0,0} = 1$), hence it being a spherically symmetric signal. As the optimization procedure continues, the symmetry breaking parameters become larger, gaining contribution from components other than $Y_{0,0}$, and the model starts to be able to fit to the target output. When the loss converges, the input parameters have gained weight on the $Y_{2,2}$ and $Y_{4,2}$ components, with other non-scalar components close to zero, and the model is able to fit to the target output.
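One way to carry out the "simple calculations" for $Y_{4,2}$ is to enumerate all 16 operations of $D_{4h}$ and compare the sign change of $(x^2 - y^2)(7z^2 - r^2)$ with the $B_{1g}$ characters. The sketch below generates the group from three generators and assumes the same axis convention as above, in which the $D_{2h}$ subgroup consists of the diagonal matrices and carries character +1 while the remaining operations carry -1; the helper names are hypothetical.

```python
import numpy as np

def generate_group(generators):
    """Close a set of matrices under multiplication (finite groups only)."""
    elements = [np.eye(3)]
    frontier = [np.eye(3)]
    while frontier:
        new_elements = []
        for g in frontier:
            for h in generators:
                candidate = h @ g
                if not any(np.allclose(candidate, e) for e in elements):
                    elements.append(candidate)
                    new_elements.append(candidate)
        frontier = new_elements
    return elements

def y42_polynomial(p):
    """Unnormalized polynomial proportional to the real spherical harmonic Y_{4,2}."""
    x, y, z = p
    r2 = x ** 2 + y ** 2 + z ** 2
    return (x ** 2 - y ** 2) * (7 * z ** 2 - r2)

def b1g_character(g):
    """+1 on the D2h subgroup (diagonal matrices), -1 on the other D4h operations."""
    return 1 if np.allclose(g, np.diag(np.diag(g))) else -1

c4_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
c2_x = np.diag([1.0, -1.0, -1.0])
inversion = -np.eye(3)
d4h = generate_group([c4_z, c2_x, inversion])

point = np.array([0.3, -1.2, 0.7])  # generic point where the polynomial is non-zero
transforms_as_b1g = all(
    np.isclose(y42_polynomial(g @ point), b1g_character(g) * y42_polynomial(point))
    for g in d4h
)
print("group order:", len(d4h))
print("transforms as B1g:", transforms_as_b1g)
```

The factor $7z^2 - r^2$ is invariant under every operation of $D_{4h}$, so the polynomial inherits its transformation behavior entirely from the $x^2 - y^2$ factor.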

OUTLOOK
We have used an archetypal example to demonstrate that symmetry equivariant neural networks exhibit Curie's principle: the output of the neural network must have equal or higher symmetry than the symmetry of the input. Using this property, we can uncover missing symmetry breaking information that must exist due to Curie's principle but is not provided in the data. We emphasize that while our experiments were specific to Euclidean neural networks, our results generalize to other symmetry equivariant neural networks such as permutation equivariant neural networks which are a subset of graph convolutional neural networks.
Euclidean symmetry equivariant neural networks provide a systematic way of finding symmetry breaking order parameters of arbitrary isotropy subgroups of E(3) without any explicit knowledge of the symmetry of the given data. We can even find order parameters that satisfy certain conditions by articulating those conditions in how we construct the input and loss function.
While our example involves only the most elementary of point groups, these methods can be applied to arbitrary geometric tensor fields. For these networks, there is no computational difference between treating these cases, whereas traditionally to arrive at these symmetry insights, one must derive character tables, compatibility relationships, and functional forms of irreducible representations from scratch. The same procedures demonstrated in this paper can be used to find order parameters of real physical systems: phonon modes of structural phase transitions in crystalline systems (e.g. order parameters describing the octahedral tilting of perovskites), missing environmental parameters of an experimental setup (e.g. anisotropies in the magnetic field of an accelerator magnet), or identifying other undetected or otherwise missing quantities necessary to preserve symmetry.
As useful as symmetry is, it is a challenging tool to master. For example, using symmetry in the context of crystallography requires being aware of many conventions and understanding advanced language and concepts for describing symmetry relations; such information is tabulated in the growing 8 volumes of the International Tables for Crystallography [27]. Euclidean neural networks have symmetry built in; this allows for symmetry aware operations to be encoded without expert knowledge in representation theory, expanding the accessibility of these mathematical principles.