Machine learning a manifold

We propose a simple method to identify a continuous Lie algebra symmetry in a dataset through regression by an artificial neural network. Our proposal takes advantage of the $\mathcal{O}(\epsilon^2)$ scaling of the output variable under infinitesimal symmetry transformations of the input variables. As symmetry transformations are generated post-training, the methodology does not rely on sampling of the full representation space or binning of the dataset, and the possibility of false identification is minimised. We demonstrate our method in the SU(3)-symmetric (non-)linear $\Sigma$ model.

Introduction - Symmetry principles have drastically simplified the description of particle physics in the twentieth century. Famously, the 8-fold way [1] of organizing pions and kaons into a representation of an approximate SU(3) flavor symmetry led to the development of the quark model. In the same vein, future discovery experiments would primarily have access to the low-energy particle content of theories beyond the standard model (BSM): in the case of a broken approximate global symmetry, this includes the pseudo-Nambu-Goldstone bosons (pNGBs), transforming under the adjoint representation of the unbroken symmetry (see e.g. [2] for a relevant review). If the BSM theory is confining, the symmetries of the low-energy theory provide a window into the structure of the high-energy theory through the barrier of the strong-coupling regime. However, the pNGB representation need not have a small dimensionality or define a simple topology. It may also be broken both spontaneously and explicitly, and the dataset may be noisy. Identifying residual (approximate) symmetries is therefore an interesting problem.
Motivated by this problem, we investigate the use of artificial neural networks (NN) to identify a symmetry in a dataset. We work with a simplified version of the problem: a function V(φ) symmetric under a transformation of coordinates φ → f(φ), i.e. V(φ) = V(f(φ)). To interpolate between datapoints we use a NN (recently discussed in the context of high energy physics in [3]), which allows us to test the local properties of the manifold and deduce the presence of a symmetry - or rather, eliminate the possibility of its absence - from its topology.
Detection of symmetry with the use of machine learning has a long history [4], though most attempts focus on mirror or rotational symmetries in image data within the domain of computer vision [5-8]. In recent years there has been increased interest in learning invariant transformations of input data which do not change the output of a specific machine learning task [9-16]. This is useful as the construction of invariant or equivariant NNs reduces the number of samples of input data required for generalization. Machine learning has been used to explore various features of conformal field theories, including to distinguish between scale invariance and conformal symmetry [17]. It has also been demonstrated that computations of tensor products and branching rules of irreducible representations are machine-learnable [18]. Furthermore, a recent work has investigated using generative adversarial networks to learn transformations that preserve the measured probability density function of a random process [19]. Here we are interested in a variation of this problem: testing for the presence of a symmetry in a dataset that samples a patch of a function whose domain has a high dimensionality.
The use of NNs for the detection of symmetries in such a context has previously been considered in [20,21] for translations, discrete symmetries, and SO(N) ≃ SU(N − 1) with N < 3. The methodology in this paper differs from the approaches taken in Refs. [20,21] in two important aspects. Firstly, points related by a symmetry transformation are generated post-training. This implies that the local properties of the manifold can in principle be studied without global knowledge of the manifold, or a large number of close neighbors in the tangent space. Both of these implications may prove to be a marked advantage in datasets with large dimensionality. For example, no pre-training stage of narrow bin definition and data categorization is necessary. The fraction of data which is related by the action of a single generator roughly scales like $1 - (\Delta y/y)^d$, where ∆y is a narrow bin width in the output variable y and d is the dimensionality of the dataset. A further difference is that the methodology here can be used to demonstrate the absence of a symmetry, such that the probability of mis-identification is minimized. With sparser sampling, the assumptions made about the symmetry transformation (for example its direction) may play an increasingly important role, potentially leading to the false identification of an SO(N) symmetry. We demonstrate in particular that our methodology can be used to show the absence of SO(8) (and the presence of SU(3)) using the non-linear sigma model.
Methodology - To detect the Lie algebra, we take advantage of the fact that the symmetry is continuous and locally defined. In the presence of a symmetry, an infinitesimal transformation of the fields of the form φ_i → φ'_i = φ_i + ε T_{ij} φ_j leads to a change in the effective action of $\mathcal{O}(\epsilon^2)$:

$V(\phi'_i) = V(\phi_i) + \mathcal{O}(\epsilon^2)$.  (1)

The $\mathcal{O}(\epsilon^2)$ and higher terms remain because the Lie algebra lives in the tangent space of the Lie group's manifold. For simplicity focusing on a single multiplet without derivative interactions, the terms are

$\Delta V = \epsilon \, T_{ij}\phi_j \frac{\partial V}{\partial \phi_i} + \frac{\epsilon^2}{2} T_{ij}\phi_j T_{kl}\phi_l \frac{\partial^2 V}{\partial \phi_i \partial \phi_k} + \mathcal{O}(\epsilon^3)$,  (2)

where the $\mathcal{O}(\epsilon)$ term vanishes when the transformation generates a symmetry. A neural network can be used to interpolate a dataset and make predictions for the transformed fields. Then, if the symmetry is present, we should find

$(\Delta V)_{NN} \equiv \frac{|V_{NN}(\phi'_i) - V(\phi_i)|}{|V(\phi_i)|} = \mathcal{O}(\epsilon^2) + \mathcal{O}(\bar{E}_\%)$,  (3)

where $\bar{E}_\%$ is the absolute percentage error of the NN on the validation set, φ_i is a datapoint in the validation set, φ'_i its image under the transformation to be tested, and $V_{NN}(\phi'_i)$ is the NN prediction for the transformed field. As φ_i is part of the dataset, V(φ_i) is known and does not need to be predicted by the network. Importantly, the $\epsilon^2$ scaling is independent of the normalization and of the a priori unknown coefficients in the expansion.
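As a toy illustration of this scaling test (our own sketch, not code from the analysis above), take two fields with an SO(2)-symmetric potential and an asymmetric one: shrinking ε by a factor of 10 shrinks ∆V by a factor of ~100 in the symmetric case, but only ~10 in the asymmetric case.

```python
# Toy check of the O(eps^2) criterion: an SO(2)-invariant potential changes
# at O(eps^2) under an infinitesimal rotation, an asymmetric one at O(eps).
import numpy as np

def V_sym(phi):               # depends only on |phi|^2: rotation-invariant
    return (phi[0]**2 + phi[1]**2)**2

def V_asym(phi):              # no rotational symmetry
    return phi[0]**4 + 0.5 * phi[1]**2

T = np.array([[0.0, -1.0],    # so(2) generator: infinitesimal rotation
              [1.0,  0.0]])

def delta_V(V, phi, eps):
    return abs(V(phi + eps * T @ phi) - V(phi))

phi = np.array([0.7, -0.3])
ratio_sym  = delta_V(V_sym,  phi, 1e-2) / delta_V(V_sym,  phi, 1e-3)
ratio_asym = delta_V(V_asym, phi, 1e-2) / delta_V(V_asym, phi, 1e-3)
print(ratio_sym, ratio_asym)  # ~100 (quadratic) vs ~10 (linear)
```

In a realistic analysis the exact potential is unknown and `delta_V` is evaluated with the trained network in place of the true V, as in eq. (3).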
Models - We use as inspiration a BSM scenario with a new scale of spontaneous symmetry breaking (SSB) that leaves behind Nambu-Goldstone boson (NGB) fields. 1 These would generically be the lightest fields and a reasonable guess for the earliest indication of the new physics. The symmetries exhibited by NGB interactions would then be a probe of the structure of the theory at or above the symmetry breaking scale. The interactions of the NGBs are parameterized by low-energy effective theories of spontaneously broken symmetries. We will focus on two such benchmark models, the linear and non-linear Σ models.
Non-linear Σ model - The non-linear Σ model (NLΣM) is given by

$\mathcal{L}_{NL\Sigma M} = \frac{f^2}{4} \mathrm{Tr}\left[\partial_\mu \Sigma^\dagger \partial^\mu \Sigma\right], \qquad \Sigma = \exp\left(2 i \pi^a T^a / f\right)$,

which has a non-linearly realized SU(N)_L × SU(N)_R chiral symmetry and a preserved SU(N)_F flavor symmetry below the SSB scale f. As the flavor symmetry is manifest order by order in f, we can expand to $\mathcal{O}(1/f^2)$ to obtain

$\mathcal{L} = \frac{1}{2}\partial_\mu \pi^a \partial^\mu \pi^a + \frac{1}{6 f^2} f^{abe} f^{cde}\, \pi^a \pi^c\, \partial_\mu \pi^b \partial^\mu \pi^d + \mathcal{O}(1/f^4)$.  (6)

The pions of (6) are in the adjoint representation of SU(N)_F and transform as

$\pi^a \to \pi^a + \epsilon\, f^{abc}\, \Theta^b \pi^c$,  (7)

where Θ^a gives a set of infinitesimal transformation parameters. The f^{abc} are the structure constants of SU(N), which form the Lie algebra. Under the transformation in (7), the potential changes as V → V + $\mathcal{O}(\epsilon^2)$. Note that for N ≤ 2, SU(N) ≃ SO(N² − 1). Our goal is to identify the SU(N) flavor symmetry of the NLΣM, and in general SU(N) will not be isomorphic to any SO group. Consider the NLΣM with the lowest SU(N > 2) flavor symmetry. In this case the pions form an 8-plet in the adjoint representation of SU(3), but could also be rotated under SO(8). Acting with an SO(8) transformation on these pions,

$\pi^a \to \pi^a + \epsilon\, \Theta^b (T^b)^{ac} \pi^c$,  (8)

where T^b are the generators of the SO(8) Lie algebra, yields V → V + $\mathcal{O}(\epsilon)$, as one would expect for an infinitesimal transformation not associated with a symmetry of the theory. The ability to disentangle SU(3) from SO(8) is thus required to detect the correct symmetry present in the NLΣM.
To summarize, the potential of the NLΣM with an N = 3 flavor symmetry changes as

$V \to V + \mathcal{O}(\epsilon^2)$  (SU(3): symmetry present),
$V \to V + \mathcal{O}(\epsilon)$  (SO(8): symmetry absent),  (11)

under SU(3) and SO(8) transformations of the π^a fields respectively. This behavior will be exploited in our symmetry detection strategy below.
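The SU(3)/SO(8) distinction above can be seen in a small numerical sketch (ours, not the paper's code): the cubic SU(3) invariant $V = d^{abc}\pi^a\pi^b\pi^c$, built from the Gell-Mann d-symbols, changes at $\mathcal{O}(\epsilon^2)$ under the adjoint SU(3) transformation (7) but at $\mathcal{O}(\epsilon)$ under a generic SO(8) rotation, here taken in the (π³, π⁸) plane.

```python
# Adjoint SU(3) vs generic SO(8) action on an 8-plet, probed with the
# cubic invariant V = d^{abc} pi^a pi^b pi^c.
import numpy as np

# Gell-Mann matrices (standard basis of su(3)).
lam = np.zeros((8, 3, 3), dtype=complex)
lam[0][0, 1] = lam[0][1, 0] = 1
lam[1][0, 1] = -1j; lam[1][1, 0] = 1j
lam[2][0, 0] = 1; lam[2][1, 1] = -1
lam[3][0, 2] = lam[3][2, 0] = 1
lam[4][0, 2] = -1j; lam[4][2, 0] = 1j
lam[5][1, 2] = lam[5][2, 1] = 1
lam[6][1, 2] = -1j; lam[6][2, 1] = 1j
lam[7] = np.diag([1, 1, -2]) / np.sqrt(3)

# Tr(la lb lc) = 2 (d_abc + i f_abc)
tr3 = np.einsum('aij,bjk,cki->abc', lam, lam, lam)
f = tr3.imag / 2.0     # totally antisymmetric structure constants
d = tr3.real / 2.0     # totally symmetric d-symbols

rng = np.random.default_rng(1)
pi = rng.normal(size=8)        # a generic 8-plet configuration
theta = rng.normal(size=8)     # generic transformation parameters

def V(p):
    return np.einsum('abc,a,b,c->', d, p, p, p)

def dV(generator, eps):
    return abs(V(pi + eps * generator @ pi) - V(pi))

su3 = np.einsum('abc,b->ac', f, theta)   # adjoint su(3) action on pi
so8 = np.zeros((8, 8))
so8[2, 7] = 1.0; so8[7, 2] = -1.0        # rotation in the (pi3, pi8) plane

ratio_su3 = dV(su3, 1e-3) / dV(su3, 1e-4)
ratio_so8 = dV(so8, 1e-3) / dV(so8, 1e-4)
print(ratio_su3, ratio_so8)   # ~1e2 (quadratic) vs ~1e1 (linear)
```

Note that the adjoint action is itself an SO(8) rotation (it is antisymmetric), which is precisely why a generic, non-adjoint SO(8) generator must be used to expose the difference.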
Linear Σ model - The same symmetry pattern SU(N)_L × SU(N)_R → SU(N)_F can be described by the linear Σ model (LΣM), given by

$\mathcal{L}_{L\Sigma M} = \mathrm{Tr}\left[\partial_\mu \Sigma^\dagger \partial^\mu \Sigma\right] - V(\Sigma)$,  (12)

where

$\Sigma = \frac{\varphi + i\eta}{\sqrt{2N}}\,\mathbb{1} + (X^a + i\pi^a)\, T^a$.  (13)

Working from the assumption that the NGB fields will be the lightest, we integrate out the heavy X, ϕ fields associated with the unbroken generators. For simplicity, we assume sufficient symmetry breaking effects to lift the mass of the η field enough that it may also be integrated out. 2 This leaves only the pion field interactions (14). This potential is again invariant under the SU(N) transformations of (7). Unlike the NLΣM, however, the potential in (14) is also invariant under an SO(8) symmetry for N = 3 flavors: under the SO(8) transformation (8), $V_{L\Sigma M} \to V_{L\Sigma M} + \mathcal{O}(\epsilon^2)$. We therefore expect to be able to detect the presence of both symmetries. It will be useful to contrast a symmetry transformation with a non-symmetric transformation. For this purpose we use a simple transformation (16) of the n π fields, where n is the number of π fields. This transformation does not correspond to any symmetry and changes the potential as $V_{L\Sigma M} \to V_{L\Sigma M} + \mathcal{O}(\epsilon)$. We will refer to this transformation as arb(8) in the rest of this paper.

2 If (13) describes the low-energy behavior of QCD-like confinement of some non-Abelian gauge field, the corresponding η would generically acquire a mass of the order of m_{X,ϕ} due to explicit U(1)_A breaking from instanton effects.
Neural network - The methodology proposed above uses a sequential feed-forward neural network to perform regression. By the universal approximation theorem (UAT) [22,23], there is no theoretical limit to the accuracy with which a neural network with a single hidden layer and enough neurons can approximate a continuous function. Moreover, as was recently demonstrated in [3], additional hidden layers can increase the interpolating abilities of the NN (the LΣM (13) and NLΣM (6) SU(3) potentials contain 80 and 143 terms from 8- and 16-dimensional input respectively). In this section we report the neural network architecture and hyperparameters used in the analysis below. We motivate these choices in the supplementary material.
To create our neural networks we used the Keras [24] library. The neural networks used had 8 hidden layers with 512 neurons each, with hyperparameters as in Table I, but we observed no strong dependence on this architecture. We found the best performance with an adaptive-learning-rate optimizer and a small initial learning rate. No markers of overtraining were observed.
The training data was generated using uniform sampling in |φ|^{1/4}, where φ = {π, ∂_μπ} represents an input field. This distribution of input points was chosen to obtain an approximately Gaussian distribution in V(φ). We found that the network's Ē_% performance scaled monotonically with the training set size, as expected. Notably, for batch sizes above 8 the performance was inversely correlated with batch size, which we attribute to the network effectively averaging out important features of the manifold.
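One way to realize this sampling (a sketch; the exact per-component scheme is our assumption) is to draw u = |φ_i|^{1/4} uniformly together with a random sign for each component, which concentrates the training points at small field values:

```python
# Sample field components such that |phi_i|^(1/4) is uniformly distributed;
# small |phi| values are then sampled densely.
import numpy as np

rng = np.random.default_rng(0)

def sample_fields(n_points, dim, phi_max=1.0):
    u = rng.uniform(0.0, phi_max ** 0.25, size=(n_points, dim))
    sign = rng.choice([-1.0, 1.0], size=(n_points, dim))
    return sign * u ** 4   # |phi| = u^4, so |phi|^(1/4) is uniform

phi = sample_fields(10_000, 8)   # e.g. the 8 pion fields of the NLSigmaM
print(np.abs(phi).max(), np.median(np.abs(phi)))
```

With this scheme the median |φ_i| is (1/2)^4 ≈ 0.06 of the maximum, so the quartic terms in the potential do not dominate the sampled range of V(φ).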
Symmetry detection - After training, we use the neural network to predict (∆V)_NN (3) for the validation data. In the presence of a symmetry, the converged neural network should predict $(\Delta V)_{NN} \propto \epsilon^n$ with n ≥ 2 at leading order in ε; in its absence, the leading term is n = 1. We can therefore deduce the presence of a symmetry from the absence of linear scaling over a large enough ε-range of predictions.
The noise due to the neural network loss function is typically correlated with the magnitude of the input vectors |φ| and depends on details of the sampling. 3 As the magnitude ε of the transformation is chosen independently of the input data, the neural network noise is in principle uncorrelated with ε. The "error" in the prediction (3) for a converged network is then

$(\Delta V)_{NN} = \mathcal{O}(\bar{E}_\%) + \mathcal{O}(\epsilon^n)$,

with n ≥ 2 in the presence of a symmetry. We point out in particular that in the case of a symmetry transformation, no linear scaling is introduced in the error. Furthermore, we expect the scaling to become flat in ε for $\epsilon^n \lesssim \bar{E}_\% / 100\%$.
We demonstrate the scaling of our converged network with a simple polynomial fit $(\Delta V)_{NN} = a_0 + a_1 \epsilon + a_2 \epsilon^2$ in Fig. 1. We give the resulting fits for both models (NLΣM and LΣM) in Table II. Both Fig. 1 and the fit coefficients in Table II demonstrate that there is a constant error in (∆V)_NN which approximately corresponds to the value of Ē_% for the network. Even by eye one can identify the linear or quadratic scaling in ∆V from Fig. 1. We find that we can correctly show that $a_1 \ll a_2$ for the SU(3) transformations in the NLΣM, and for both SU(3) and SO(8) in the LΣM. We also correctly exclude $a_1 = 0$ for SO(8) in the NLΣM, and for the arbitrary transformation (16) in the LΣM.
We also check that linear scaling in ε does not appear over the ε-range we consider. To test this, we construct a sliding window in ε with a width corresponding to an order of magnitude in log-space. On this window we evaluate our simple polynomial fit on our data points for ∆V. In Fig. 2 we plot the resulting values for $a_1$ as a function of ε. In both the NLΣM and LΣM we find that for transformations that preserve a symmetry, both $(\Delta V)_{truth}$ and $(\Delta V)_{NN}$ are consistent with $a_1 = 0$ for all ε ≪ 1, whereas for other transformations $a_1 = 0$ is excluded for a significant range of ε ≪ 1.
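The sliding-window fit can be sketched as follows (our reimplementation on synthetic scaling data, not the paper's pipeline): fit $(\Delta V) = a_0 + a_1\epsilon + a_2\epsilon^2$ inside a one-decade window and inspect the linear coefficient $a_1$.

```python
# Sliding-window fit: a_1 is consistent with zero for quadratic (symmetric)
# scaling, and recovers the linear coefficient otherwise.
import numpy as np

rng = np.random.default_rng(2)
eps = np.logspace(-4, -1, 200)
noise = 1e-9 * rng.normal(size=eps.size)      # stand-in for the NN error floor
dV_sym  = 3.0 * eps**2 + noise                # symmetry present: no O(eps) term
dV_asym = 0.5 * eps + 3.0 * eps**2 + noise    # symmetry absent

def window_a1(eps, dV, lo):
    mask = (eps >= lo) & (eps <= 10 * lo)     # one decade wide in log-space
    a2, a1, a0 = np.polyfit(eps[mask], dV[mask], 2)
    return a1

a1_sym  = window_a1(eps, dV_sym,  1e-3)
a1_asym = window_a1(eps, dV_asym, 1e-3)
print(a1_sym, a1_asym)
```

Sliding `lo` across the sampled range then traces out $a_1(\epsilon)$ as in Fig. 2.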
Results and discussion - In this work we have proposed and demonstrated a method to detect a Lie group symmetry in a dataset using regression by an artificial NN. The NN was trained to replicate V(φ, ∂_μφ) given training data in the form of {φ, ∂_μφ, V}. The symmetry was then tested by measuring the NN response to an $\mathcal{O}(\epsilon)$ transformation of the input fields according to the Lie algebra associated with the Lie group symmetry, effectively augmenting the dataset. We used this method to test for SO(8) and SU(3) symmetries in the NLΣM and LΣM, see Fig. 1. As expected, we found that the NLΣM is symmetric under SU(3) transformations, but is not invariant under SO(8). For the LΣM, we detected the presence of both the SU(3) and SO(8) symmetries. The method presented here takes advantage of the fact that the Lie algebra lives in the tangent space of the group's manifold. This mitigates the importance of perfect interpolation as well as exact invariance under the full symmetry group: a symmetric system's true potential will not be exactly invariant under the Lie algebra transformation, but will instead exhibit $\mathcal{O}(\epsilon^2)$ scaling. In contrast, a system that lacks the symmetry will exhibit linear $\mathcal{O}(\epsilon)$ scaling. By ruling out $\mathcal{O}(\epsilon)$ scaling, we can rule out the absence of a symmetry.
The power of the neural network lies in the ability to extend this method to more realistic scenarios in which the symmetry is obscured. The next steps are to apply this technique to recover the same symmetry from more realistic data limited by minimal experimental signals or contaminated by noise. Data from more realistic experimental signals would not in general provide an ordering for the NGB fields. The generators of SU(N) do not commute with the operator that shuffles these fields, and so this method would only recover the symmetry in one of the (N² − 1)! combinations of shuffled NGB fields. 4 For SU(3), we are able to reorder the shuffled fields by exploiting properties of members of the Cartan subalgebra, T₃ and T₈. This trick may be formalized and extended to general SU(N) or even general Lie groups, but we leave this for future study. In future work we will also study the use of this method to recover approximate symmetries in the presence of explicit breaking.

4 We note that the possibility of indistinguishable pNGB fields is not particularly worrisome. The assumption that these fields are massive implies the presence of explicit symmetry breaking of the SSB group, generally allowing the NGB fields to be distinguished. This is the case for pions and kaons transforming under the SU(3) flavor symmetry of QCD, and played a key role in the discovery of the 8-fold way.

SUPPLEMENTARY MATERIAL
Neural network optimization - Before the analysis in the main paper was carried out, a number of optimization studies were performed, which we describe here. Initial studies included linear regression and decision tree methods. Ultimately, for the models discussed in this work, better convergence was found with the discussed sequential feed-forward neural networks. In our optimization studies, we fixed all but one (or in some cases two) hyperparameters. This optimization was sufficient to produce networks performing at the sub-percent level in validation mean absolute percentage error (Ē_%). The hyperparameters in Table I were used as a default from which to improve, and the range of hyperparameter testing is given in Table III. Network performance was assessed with the minimum value of Ē_%. The minimum Ē_% was recorded for ten networks of each configuration, and the corresponding mean and standard error were calculated.
The optimal shape of the network was found at seven hidden layers with 410 ± 25 neurons per layer. However, the variance of the network performance effectively plateaued, and the stochastic nature of network training dominated the variation in Ē_% for networks with more than three hidden layers and two hundred neurons. Typically, increased network size increases the danger of over-fitting; however, over-fitting was only seen with small numbers of input training data points. Training time and forward propagation time can be reduced by using smaller networks. Thus, a network with three hidden layers and two hundred neurons per layer is advised. However, when using more training data points, one can see performance improvements from the increased degrees of freedom.
The Adam optimizer performs very well across a wide variety of tasks. This problem was no exception, with Adam providing modest improvements over Nadam and Adamax. Notably, SGD failed to converge, implying that per-parameter historical weighting was essential. In addition, the use of a learning rate reduction after a set number of epochs was seen to improve Ē_% with large datasets; however, this was not tested rigorously. Adjustment of the β₁ and β₂ hyperparameters was investigated, though network performance was insensitive to both parameters for values above 0.8. Although the Adam optimizer possesses an adaptive learning rate, extreme learning rate values yielded the expected poor convergence. A learning rate in the region of [4 · 10⁻⁴, 2 · 10⁻²] provided a consistent Ē_%.
Network performance should scale with training data size, and this was observed. The network performance exhibited power-law scaling with training data size, so increasingly many data points are required to improve performance by the same amount. Thus, training time becomes the primary constraint on performance improvement. Network performance was negatively correlated with batch size. The optimal performance of the network was found with mini-batches of size 8; below this, network convergence was unreliable. The preference of the network for lower batch sizes is in line with other typical training results. Training time is drastically affected by lowering the batch size, as parallelization is reduced. However, the performance improvements are significant enough to suggest a batch size of 8.