Unsupervised Machine Learning of Quenched Gauge Symmetries: A Proof-of-Concept Demonstration

In condensed matter physics, one of the goals of machine learning is the classification of phases of matter. The consideration of a system's symmetries can significantly assist the machine in this goal. We demonstrate the ability of an unsupervised machine learning protocol, the Principal Component Analysis method, to detect hidden quenched gauge symmetries introduced via the so-called Mattis gauge transformation. Our work reveals that unsupervised machine learning can identify hidden properties of a model and may therefore provide new insights into the models themselves.

Introduction - Machine learning (ML) has in recent years proven to be a powerful pattern recognition tool with applications in various branches of science. These techniques have shown their ability to extract, identify, and even propose descriptive patterns found in the input data. In condensed matter physics in particular, the application of ML techniques began with the use of the Principal Component Analysis (PCA) method [1] and neural networks [2] to identify the ferromagnetic and paramagnetic phases of the Ising model on a square lattice. Since then, the field has grown rapidly, with a wide variety of ML applications [3][4][5]. These techniques and applications can be broadly grouped into two categories: supervised ML (SML), in which the input data is labelled to train the machine [6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21]; and unsupervised ML (UML), in which the input data is unlabelled and the machine proposes its own classification scheme [19][20][21][22][23][24][25][26][27][28][29][30][31][32]. As a major task of the condensed matter physicist, the classification of phases in various models has remained central among these applications. Evidence is accumulating that the machine learning of phases can be guided by physical insights into the model or system, such as symmetries. This has been most clearly demonstrated by exploiting properties such as locality and translational symmetry via convolutional neural networks [2], or by taking advantage of symmetry-breaking to extract order parameters for hidden orders [7].
In light of the benefits that these physically-inspired shortcuts provide, one may ask a question of foremost importance for the usage of ML in physics: is it possible for ML to provide theoretical insight into the hidden or unknown properties of a model itself? A fitting testing ground for this question is provided by physical models possessing gauge symmetries, as these models can be simplified by a suitable mathematical transformation. Our question then becomes whether ML can detect the gauge symmetry of these models without prior knowledge. Doing so would show that ML is capable of learning fundamental mathematical details of the studied model and not just thermodynamic quantities. This ability offers clear benefits for various branches of physics, including the aforementioned exploitation of symmetries for phase classification. Furthermore, the controlled mathematical nature of these gauge-symmetric models suggests their use as a probe of how ML methods work and what they are truly learning.
To explore this question, we therefore require (i) a model that seems complex but can be simplified by some gauge transformation, and (ii) a UML method whose self-determined classification scheme can be exposed. In light of (i), we study the Mattis Ising Spin Glass (MISG) [33,34] and the Mattis XY Gauge Glass (MXYGG) models [34]. At first glance, the MISG and MXYGG models look prohibitively complex: the Hamiltonians for these models possess almost arbitrary bond interactions, which make an analytical approach seem intractable. A visual snapshot of their ground state configurations displays no recognizable pattern but instead appears completely disordered. However, the MISG and MXYGG models can be transformed into the regular ferromagnetic Ising and XY models, respectively, under a (Mattis) gauge transformation [34]. Regarding (ii), an important consideration is the trade-off between interpretability and scalability. We therefore use PCA [35], which is highly interpretable and simple to apply, as opposed to neural-network-based methods, which may be more powerful but are not as open to interpretation.
The outline of the paper is as follows. We first describe the MISG and MXYGG models, as well as the Mattis gauge transformation, in the Models section. We then give a brief introduction to PCA in the Methods section. In the Results section, we demonstrate that PCA is able to identify the gauge variables that quantify the Mattis gauge transformation. PCA additionally finds that the bond-disordered MISG and MXYGG models are simply disguised versions of the regular Ising and XY models, classifying the phases in the former gauge-transformed models in exactly the same manner as it would with the regular models. Our work suggests that interpretable ML methods can therefore be used to reveal hidden features in the models themselves, giving a positive answer to our above question. We conclude by discussing the implications of our findings for investigations into other models and for other ML applications.
Models - The MISG model [33,34] on a square lattice is defined by the Hamiltonian

$$H_{\rm MISG} = -\sum_{\langle ij\rangle} J_{ij}\,\sigma^z_i\sigma^z_j, \tag{1}$$

where the spin variables are $\sigma^z_i = \pm 1$ and the sum runs over nearest-neighbour bonds. The couplings $\{J_{ij}\}$ are free to take the values $\pm J$ randomly, with the imposed constraint that the product $P$ of the couplings around a square plaquette is positive:

$$P \equiv \prod_{\langle ij\rangle \in \square} \frac{J_{ij}}{J} = +1. \tag{2}$$

This constraint enforces a non-frustrated ground state in the system and allows a so-called Mattis gauge transformation to be applied [33,34]. This gauge transformation re-expresses the interaction couplings as $J_{ij} = \epsilon_i\epsilon_j J$, where $\{\epsilon_i\}$ are random site (gauge) variables that take values of $\pm 1$. Through this transformation, the Hamiltonian (1) becomes

$$H_{\rm MISG} = -J\sum_{\langle ij\rangle} \tau^z_i\tau^z_j, \tag{3}$$

where $\tau^z_i \equiv \epsilon_i\sigma^z_i = \pm 1$ are new Ising variables. It is now clear that this system possesses a well-defined order parameter given by the Ising model "τ-magnetization",

$$m_\tau = \frac{1}{N}\sum_{i=1}^{N} \tau^z_i, \tag{4}$$

illustrating that the MISG model is nothing but an Ising model in disguise. Further information about this mapping is given in the Appendix.
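The Mattis mapping can be checked numerically. The following NumPy sketch assumes a periodic $L \times L$ square lattice, $J = 1$, and hypothetical helper names (`bond_couplings`, `plaquette_products`, `energy`). It draws random gauge variables $\epsilon_i$, builds the couplings $J_{ij} = \epsilon_i\epsilon_j J$, and confirms that every plaquette satisfies the constraint $P = +1$ and that the MISG energy of an arbitrary configuration equals the ferromagnetic Ising energy of the corresponding configuration $\tau^z_i = \epsilon_i\sigma^z_i$.

```python
import numpy as np

def bond_couplings(eps, J=1.0):
    """Mattis couplings J_ij = eps_i eps_j J on a periodic L x L lattice.
    Jh[r, c] couples (r, c)-(r, c+1); Jv[r, c] couples (r, c)-(r+1, c)."""
    Jh = J * eps * np.roll(eps, -1, axis=1)
    Jv = J * eps * np.roll(eps, -1, axis=0)
    return Jh, Jv

def plaquette_products(Jh, Jv, J=1.0):
    """Product of J_ij / J around each elementary square with corner (r, c),
    using the bonds Jh[r, c], Jv[r, c+1], Jh[r+1, c] and Jv[r, c]."""
    return Jh * np.roll(Jv, -1, axis=1) * np.roll(Jh, -1, axis=0) * Jv / J**4

def energy(spins, Jh, Jv):
    """H = -sum_<ij> J_ij s_i s_j over all nearest-neighbour bonds."""
    return -(np.sum(Jh * spins * np.roll(spins, -1, axis=1))
             + np.sum(Jv * spins * np.roll(spins, -1, axis=0)))

rng = np.random.default_rng(0)
L, J = 8, 1.0
eps = rng.choice([-1, 1], size=(L, L))       # hidden gauge variables
Jh, Jv = bond_couplings(eps, J)              # disordered-looking +-J couplings

sigma = rng.choice([-1, 1], size=(L, L))     # arbitrary MISG configuration
tau = eps * sigma                            # gauge-transformed Ising variables

# Regular ferromagnetic Ising couplings for comparison
Jh_ferro, Jv_ferro = np.full((L, L), J), np.full((L, L), J)
E_misg = energy(sigma, Jh, Jv)
E_ising = energy(tau, Jh_ferro, Jv_ferro)
P = plaquette_products(Jh, Jv, J)
print(E_misg == E_ising, np.all(P == 1))     # prints: True True
```

Because each gauge variable appears twice around a plaquette, the constraint holds by construction; conversely, the MISG ground state is simply $\sigma^z_i = \epsilon_i$ (all $\tau^z_i = +1$).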
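Similarly, the MXYGG model is described by an XY model with random phase factors $\{A_{ij}\}$ [36][37][38],

$$H_{\rm MXYGG} = -J\sum_{\langle ij\rangle} \cos\left(\Delta\phi_{ij} - A_{ij}\right), \tag{5}$$

where $\Delta\phi_{ij} = \phi_i - \phi_j$ is the difference between the on-site angular variables $\phi_i \in [0, 2\pi)$. This is equivalent to an XY model with random Heisenberg exchange $J_{ij} \equiv J\cos(A_{ij})$ and Dzyaloshinsky-Moriya interactions $D_{ij} \equiv J\sin(A_{ij})$. This Hamiltonian is unfrustrated as long as the phase factors around a plaquette add to a multiple of $2\pi$, i.e. $P_{XY} = \left(\sum_{\langle i,j\rangle \in \square} A_{ij}\right) \bmod 2\pi = 0$ [34]. A Mattis gauge transformation can then be applied by defining random gauge variables $b_i \in [0, 2\pi)$ such that $A_{ij} = b_i - b_j$. The Hamiltonian then becomes

$$H_{\rm MXYGG} = -J\sum_{\langle ij\rangle} \cos\left(\theta_i - \theta_j\right), \tag{6}$$

where $\theta_i \equiv \phi_i - b_i$, so that the order parameter is the magnetization of the transformed angles,

$$\mathbf{M} = (M_x, M_y) = \frac{1}{N}\sum_{i=1}^{N}\left(\cos\theta_i,\ \sin\theta_i\right). \tag{7}$$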
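The XY version of the mapping can be checked in the same way. This sketch assumes a periodic $L \times L$ lattice, $J = 1$, the sign convention $A_{ij} = b_i - b_j$ on bonds oriented from $i$ to $j$, and gauge variables drawn from the five-value set used in the Methods; the function names `energy_gauge_glass` and `energy_xy` are hypothetical. The phase factors are reduced modulo $2\pi$ so that they look random, yet their oriented sum around every plaquette remains a multiple of $2\pi$.

```python
import numpy as np

rng = np.random.default_rng(1)
L, J = 8, 1.0
b = 2 * np.pi * rng.integers(1, 6, size=(L, L)) / 5   # gauge variables on {2*pi*n/5}

# Phase factors A_ij = b_i - b_j on oriented bonds, reduced mod 2*pi
Ah = np.mod(b - np.roll(b, -1, axis=1), 2 * np.pi)    # bond (r, c) -> (r, c+1)
Av = np.mod(b - np.roll(b, -1, axis=0), 2 * np.pi)    # bond (r, c) -> (r+1, c)

# Oriented sum around each plaquette: (r,c)->(r,c+1)->(r+1,c+1)->(r+1,c)->(r,c);
# traversing a bond backwards contributes -A_ij
P_xy = Ah + np.roll(Av, -1, axis=1) - np.roll(Ah, -1, axis=0) - Av

def energy_gauge_glass(phi):
    """H = -J sum_<ij> cos(phi_i - phi_j - A_ij)."""
    return -J * (np.sum(np.cos(phi - np.roll(phi, -1, axis=1) - Ah))
                 + np.sum(np.cos(phi - np.roll(phi, -1, axis=0) - Av)))

def energy_xy(theta):
    """H = -J sum_<ij> cos(theta_i - theta_j)."""
    return -J * (np.sum(np.cos(theta - np.roll(theta, -1, axis=1)))
                 + np.sum(np.cos(theta - np.roll(theta, -1, axis=0))))

phi = rng.uniform(0, 2 * np.pi, size=(L, L))   # arbitrary MXYGG configuration
theta = phi - b                                 # gauge-transformed XY angles
print(np.isclose(energy_gauge_glass(phi), energy_xy(theta)))
print(np.allclose(np.cos(P_xy), 1.0))           # plaquette sums are multiples of 2*pi
```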
Methods - PCA is a dimensional reduction technique that identifies which linear combinations of the input data best characterize the full dataset. The input data for this method is defined as n sets of configurations $\{x_i(T_j)\}$ of an N-site system, where $x_i$ is some variable (e.g. $\sigma^z_i$) associated with the $i$th site and sampled at a temperature $T_j$ ($j = 1, \ldots, n$). The full dataset can then be formatted as an $n \times N$ data matrix,

$$X_{\rm data} = \begin{pmatrix} x_1(T_1) & x_2(T_1) & \cdots & x_N(T_1) \\ x_1(T_2) & x_2(T_2) & \cdots & x_N(T_2) \\ \vdots & \vdots & \ddots & \vdots \\ x_1(T_n) & x_2(T_n) & \cdots & x_N(T_n) \end{pmatrix}. \tag{8}$$

After each column is centered by subtracting its mean value, the covariance matrix, defined as $X^T_{\rm data} X_{\rm data}$, is diagonalized. The normalized eigenvalues and eigenvectors obtained are the so-called explained variance ratios $\{\lambda_k\}$ and principal components $\{\mathbf{u}^{(k)}\}$, respectively. Note that the eigenvectors can be rescaled by any convenient factor, such as the system size. The projection $p^{(k)}(T_j)$ of the $j$th configuration $\{x_i(T_j)\}$ onto the $k$th principal component $\mathbf{u}^{(k)}$ takes the form

$$p^{(k)}(T_j) = \sum_{i=1}^{N} u^{(k)}_i\, x_i(T_j). \tag{9}$$

These linear combinations $\{p^{(k)}(T_j)\}$ are the new quantities used by PCA to characterize the full dataset, and their relative importance is given by the values of their explained variance ratios. By construction, the $i$th component of any principal component is the coefficient multiplying the variable $x_i$ in every projection $p^{(k)}(T_j)$; therefore, the components of the eigenvector $\mathbf{u}^{(k)}$ directly contain site-dependent information. Plotting different projections against each other visually reveals how PCA "clusters" the input data and along which projections the data is most or least correlated. These clusters are composed of points representing configurations with similar values of the projections. PCA is applied to the MISG model by using spin configurations $\{\sigma^z_i\}$ as the input data […]. The gauge variables $\{b_i\}$ of the MXYGG model are randomly drawn from the discrete distribution $\{2\pi n/5 \mid n = 1, \ldots, 5\}$ [39]. In order to sample uncorrelated data, $3 \times 10^4$ thermalization sweeps and $5 \times 10^4$ measurement sweeps are used at every temperature for both models; 50 different temperatures are selected.
Sampling is done every 50 (100) measurement sweeps for the MISG (MXYGG) model, producing $n = 5 \times 10^4$ ($n = 2.5 \times 10^4$) configurations. In both the MISG and MXYGG cases, PCA has no information about the gauge variables $\{\epsilon_i\}$ or $\{b_i\}$. The objective is to determine whether PCA can nevertheless identify these gauge variables and the underlying Ising or XY models, to which we now turn.
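The PCA steps described above can be sketched in a few lines of NumPy: center each column of the $n \times N$ data matrix, diagonalize $X^T_{\rm data} X_{\rm data}$, normalize the eigenvalues into explained variance ratios, and project. The toy Ising-like data below are hypothetical stand-ins for MC configurations, the function name `pca` is our own, and the $1/\sqrt{N}$ rescaling of $\mathbf{u}^{(1)}$ is just one convenient choice that puts $p^{(1)}$ on the scale of the magnetization per site.

```python
import numpy as np

def pca(X):
    """Minimal PCA sketch: center each column (site) of the n x N data matrix,
    diagonalize X^T X, and sort by decreasing eigenvalue.
    Returns (explained_variance_ratios, U); column k of U is u^(k+1)."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(Xc.T @ Xc)   # eigh: symmetric matrix, ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    return evals / evals.sum(), evecs

# Hypothetical toy data standing in for MC configurations: each row has a random
# global sign s = +-1, and each spin is then flipped with probability 0.1
rng = np.random.default_rng(2)
n, N = 200, 64
s = rng.choice([-1, 1], size=(n, 1))
X = s * np.where(rng.random((n, N)) < 0.1, -1, 1)

ratios, U = pca(X)
u1 = U[:, 0] / np.sqrt(N)    # rescaled so p^(1) is on the scale of the magnetization per site
p1 = X @ u1                  # p^(1)(T_j) = sum_i u_i^(1) x_i(T_j)
m = X.mean(axis=1)           # magnetization per site of each configuration
slope = np.polyfit(m, p1, 1)[0]
print(ratios[0], slope)      # slope is close to +1 or -1 (the sign of u^(1) is arbitrary)
```

For ordered configurations the leading component is nearly uniform, so $p^{(1)}$ reduces to the summed spins, which is the mechanism behind the linear relation between $p^{(1)}$ and the magnetization discussed in the Results.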
Results for the MISG Model - PCA is applied to the configurations sampled through MC simulations for the MISG model. Plotting $p^{(1)}$ versus $p^{(2)}$ reveals a central high-temperature cluster and two adjacent low-temperature clusters, which is precisely how PCA clusters the input data of the regular ferromagnetic Ising model [1] (see Fig. 8 of the Appendix). The similarity in this clustering suggests that PCA is characterizing the input data according to the Ising magnetization order parameter, as it did in the regular case, thereby detecting the underlying Ising model.
To verify this quantitatively, the projection $p^{(1)}$ of the input data onto the first (and most important) principal component is compared with the τ-magnetization calculated within MC simulations using Eq. (4), as shown in Fig. 1. The resemblance of the projection in Fig. 1a to the Ising magnetization in Fig. 1b demonstrates that PCA is learning this order parameter. This is further confirmed when this projection is plotted against the Ising magnetization in Fig. 1c, revealing a linear relationship with a slope of 1. Since $p^{(1)}$ is equivalent to the τ-magnetization even when PCA was only provided with the $\{\sigma^z_i\}$ configurations, […] with the ones produced with the known gauge variables used in the MC simulation. Moreover, a remarkable result comes from the distribution of the square plaquette values $\{P\}$, shown in Fig. 2c: this distribution is centered near the value $P = 1$, which defines the plaquette constraint used in the MC simulation. Further evidence of PCA's ability to learn the values of the gauge variables is provided by MC simulations: when the learned gauge variables $\{\epsilon_i\}$ are used within MC simulations, the resulting energy per spin and specific heat curves are equivalent to the original curves obtained with the known gauge variables, as detailed in Fig. 7 of the Appendix. After extracting the gauge variables, we can apply this learned gauge transformation to other quantities to confirm that the MISG model is transformed into the regular Ising model. For example, the second principal component of the regular Ising model has been computed and plotted on the associated lattice sites [24]; by multiplying, site by site, the second principal component of the MISG model by the first, as shown in Fig. 3, the known regular Ising result is reconstructed. This operation is therefore equivalent to applying the gauge transformation to go from the MISG model to the regular Ising model.
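Both operations can be illustrated on synthetic data. In this sketch, which uses hypothetical toy τ configurations rather than MC data, the MISG spins are generated as $\sigma^z_i = \epsilon_i\tau^z_i$. Since the centered MISG data matrix is the Ising one multiplied column-wise by $\epsilon_i$, each MISG principal component equals $\epsilon_i$ times the corresponding Ising component (up to sign), so the site-wise product $u^{(2)}_i u^{(1)}_i$ is the same for both models. Reading the learned gauge variables off as the signs of $\mathbf{u}^{(1)}$ is an assumption that holds when the magnetization dominates the first component; the learned $\epsilon_i$ are fixed only up to a global sign, which cancels in every coupling. The helper names `pca` and `couplings` are hypothetical.

```python
import numpy as np

def pca(X):
    """Center each column, diagonalize X^T X, return (ratios, U) sorted by decreasing eigenvalue."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(Xc.T @ Xc)
    order = np.argsort(evals)[::-1]
    return evals[order] / evals.sum(), evecs[:, order]

def couplings(e, L):
    """Mattis couplings J_ij = e_i e_j (J = 1) on horizontal and vertical bonds of a periodic L x L lattice."""
    e = e.reshape(L, L)
    return e * np.roll(e, -1, axis=1), e * np.roll(e, -1, axis=0)

rng = np.random.default_rng(3)
L, n = 8, 400
N = L * L
eps = rng.choice([-1, 1], size=N)            # hidden gauge variables (flattened lattice)

# Hypothetical toy tau configurations: half ordered (global sign, 10% flipped spins), half random
n_ord = n // 2
s = rng.choice([-1, 1], size=(n_ord, 1))
tau = np.vstack([s * np.where(rng.random((n_ord, N)) < 0.1, -1, 1),
                 rng.choice([-1, 1], size=(n - n_ord, N))])
sigma = tau * eps                            # MISG spins: the only input PCA receives

_, U_misg = pca(sigma)
_, U_ising = pca(tau)                        # reference only; not available in practice

# Learned gauge variables: signs of u^(1), defined up to a global sign
eps_learned = np.sign(U_misg[:, 0]).astype(int)
same_couplings = all(np.array_equal(a, c) for a, c in zip(couplings(eps_learned, L), couplings(eps, L)))

# Site-wise product u^(2) * u^(1): the factors eps_i cancel because eps_i**2 = 1
undressed_misg = U_misg[:, 1] * U_misg[:, 0]
undressed_ising = U_ising[:, 1] * U_ising[:, 0]
print(same_couplings)
```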
Altogether, this comparison of the learned and known gauge variables and thermodynamic quantities demonstrates PCA's ability to identify the correct values of these quenched gauge variables.
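The MC comparison with learned gauge variables can be sketched with single-spin-flip Metropolis updates. Because the couplings depend only on the products $\epsilon_i\epsilon_j$, gauge variables recovered up to a global sign generate exactly the same Hamiltonian, and the energy and specific heat must coincide. The sketch below assumes $J = k_B = 1$, a small periodic $8 \times 8$ lattice, a single temperature, very short runs compared with the production runs described in the Methods, and hypothetical function names (`metropolis_misg`, `couplings`); the specific heat per spin is estimated as $c = (\langle E^2\rangle - \langle E\rangle^2)/(N T^2)$.

```python
import numpy as np

def metropolis_misg(Jh, Jv, T, n_therm, n_meas, rng, sigma0):
    """Single-spin-flip Metropolis sketch for H = -sum_<ij> J_ij s_i s_j on a periodic
    L x L lattice (J = k_B = 1). Jh[r, c] couples (r, c)-(r, c+1) and Jv[r, c] couples
    (r, c)-(r+1, c). Returns (energy per spin, specific heat per spin) at temperature T."""
    sigma = sigma0.copy()
    L = sigma.shape[0]
    N = L * L
    E = -(np.sum(Jh * sigma * np.roll(sigma, -1, axis=1))
          + np.sum(Jv * sigma * np.roll(sigma, -1, axis=0)))
    energies = []
    for sweep in range(n_therm + n_meas):
        for _ in range(N):
            r, c = divmod(int(rng.integers(N)), L)
            # Local field on (r, c) from its four neighbours, each weighted by its bond coupling;
            # negative indices wrap around, implementing periodic boundaries
            h = (Jh[r, c] * sigma[r, (c + 1) % L] + Jh[r, c - 1] * sigma[r, c - 1]
                 + Jv[r, c] * sigma[(r + 1) % L, c] + Jv[r - 1, c] * sigma[r - 1, c])
            dE = 2.0 * sigma[r, c] * h
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                sigma[r, c] = -sigma[r, c]
                E += dE
        if sweep >= n_therm:
            energies.append(E)
    energies = np.asarray(energies, dtype=float)
    return energies.mean() / N, energies.var() / (N * T**2)

def couplings(eps):
    """Mattis couplings J_ij = eps_i eps_j on horizontal and vertical bonds."""
    return eps * np.roll(eps, -1, axis=1), eps * np.roll(eps, -1, axis=0)

L, T = 8, 1.0
eps_known = np.random.default_rng(4).choice([-1, 1], size=(L, L))
eps_learned = -eps_known    # e.g. gauge variables recovered up to a global sign

results = {}
for label, eps in [("known", eps_known), ("learned", eps_learned)]:
    Jh, Jv = couplings(eps)
    # Start from the Mattis ground state sigma_i = eps_i and reuse the same seed for both runs
    results[label] = metropolis_misg(Jh, Jv, T, 100, 500, np.random.default_rng(5), eps)
print(results)
```

Scanning `T` over a grid of temperatures would produce the energy and specific heat curves compared in Fig. 7; any site where the learned and known $\epsilon_i$ disagree (beyond a global sign) would change some couplings and, in general, the curves.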
Results for the MXYGG Model - We now turn to the more complex case of the MXYGG model. As in [22,24], we first perform PCA on the full dataset $\{\{\cos(\phi_i)\}, \{\sin(\phi_i)\}\}$ generated from MC simulations. When this data is projected onto the first two principal components, which are equally important and the most important, the resulting clusters have the same U(1) symmetry as the ones reported for the regular XY model (see Fig. 2 of [22] and the discussion therein). This similarity in the clusters suggests that PCA is characterizing the full datasets of the MXYGG and XY models in the same fashion, i.e. according to the magnetization vector [22]. However, if PCA is performed only on the X dataset ($\{\cos\phi_i\}$) or only on the Y dataset ($\{\sin\phi_i\}$) of the MXYGG model, the resulting clusters still reveal a U(1) symmetry, as shown in Fig. 4; this is in contrast to the results for the regular XY model (see Fig. 9 of the Appendix). This difference indicates that PCA identifies some feature that differentiates the MXYGG model from the regular XY model, suggesting that PCA has detected the Mattis gauge transformation, which must also be present in the full dataset. Now that the presence of the gauge transformation has been identified, we return to the principal components calculated from the full dataset. As in the regular XY model [22], the first two principal components, $\mathbf{u}^{(1)}$ and $\mathbf{u}^{(2)}$, have the largest explained variance ratios. These two eigenvectors describe the non-zero magnetization components observed in the finite system [22]. The projections of the data onto the first and second principal components take the form

$$p^{(k)} = \sum_{i=1}^{N}\left[u^{(k)}_{x,i}\cos\phi_i + u^{(k)}_{y,i}\sin\phi_i\right], \qquad k = 1, 2, \tag{10}$$

where we have defined

$$u^{(k)}_{x,i} \equiv u^{(k)}_{i}, \qquad u^{(k)}_{y,i} \equiv u^{(k)}_{N+i}, \qquad i = 1, \ldots, N, \tag{11}$$

to match the separation of cosines and sines in the full dataset $\{\{\cos(\phi_i)\}, \{\sin(\phi_i)\}\}$. Since we know that PCA is characterizing the data according to the magnetization vector, we identify the projections $p^{(1)}$ and $p^{(2)}$ with the components of the magnetization in Eq. (7).
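Assuming the convention $A_{ij} = b_i - b_j$ (so that $\theta_i = \phi_i - b_i$) and the magnetization $\mathbf{M} = N^{-1}\sum_i(\cos\theta_i, \sin\theta_i)$, a short expansion sketches why this identification exposes the gauge variables. Writing the magnetization components in terms of the raw angles gives

$$\begin{aligned} N M_x &= \sum_{i=1}^{N}\left[\cos b_i\,\cos\phi_i + \sin b_i\,\sin\phi_i\right],\\ N M_y &= \sum_{i=1}^{N}\left[-\sin b_i\,\cos\phi_i + \cos b_i\,\sin\phi_i\right]. \end{aligned}$$

Comparing term by term with the projections $p^{(k)} = \sum_i [u^{(k)}_{x,i}\cos\phi_i + u^{(k)}_{y,i}\sin\phi_i]$, and allowing for a common global rotation angle $\alpha$ of the axes along which PCA measures the magnetization, the first two principal components are expected to take the form (up to normalization, with the overall sign of $\mathbf{u}^{(1)}$ absorbed into $\alpha$ and the sign of $\mathbf{u}^{(2)}$ arbitrary)

$$u^{(1)}_{x,i} \propto \cos(b_i + \alpha), \quad u^{(1)}_{y,i} \propto \sin(b_i + \alpha), \quad u^{(2)}_{x,i} \propto -\sin(b_i + \alpha), \quad u^{(2)}_{y,i} \propto \cos(b_i + \alpha),$$

so that $b_i + \alpha = \operatorname{atan2}\big(u^{(1)}_{y,i}, u^{(1)}_{x,i}\big) \bmod 2\pi$. With unit prefactors, $p^{(1)} = N(M_x\cos\alpha + M_y\sin\alpha)$ and $p^{(2)} = N(M_y\cos\alpha - M_x\sin\alpha)$: the two projections are the magnetization components along a pair of axes rotated by $\alpha$.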
Through this identification, the values of the gauge variables $\{b_i\}$ are extracted from the principal components, as detailed in the Appendix. The distribution of the extracted gauge variables $\{b_i\}$ is shown in Fig. 5, revealing five equally spaced peaks, as expected for the five equally spaced choices of gauge variables. PCA is therefore able to calculate the transformation that maps the MXYGG model onto the regular XY model.
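The extraction of $\{b_i\}$ can be illustrated on synthetic data. This sketch builds hypothetical toy θ configurations (not MC data), forms the full dataset from $\phi_i = \theta_i + b_i$, and reads $b_i + \alpha$ off the first principal component with `arctan2`. Removing the unknown global rotation $\alpha$ through the fifth harmonic $\langle e^{5i(b_i+\alpha)}\rangle$ is a shortcut assumed for this sketch, valid only because the gauge variables sit on five equally spaced values; it is not the Appendix procedure, and it leaves a residual global shift by a multiple of $2\pi/5$.

```python
import numpy as np

def pca(X):
    """Center each column, diagonalize X^T X, return (ratios, U) sorted by decreasing eigenvalue."""
    Xc = X - X.mean(axis=0)
    evals, evecs = np.linalg.eigh(Xc.T @ Xc)
    order = np.argsort(evals)[::-1]
    return evals[order] / evals.sum(), evecs[:, order]

rng = np.random.default_rng(6)
N, n = 100, 2000
b = 2 * np.pi * rng.integers(1, 6, size=N) / 5      # hidden gauge variables on the 5-value grid

# Hypothetical toy theta configurations: half ordered around a random global angle psi
# (Gaussian spread 0.3 rad), half completely disordered
n_ord = n // 2
psi = rng.uniform(0, 2 * np.pi, size=(n_ord, 1))
theta = np.vstack([psi + 0.3 * rng.standard_normal((n_ord, N)),
                   rng.uniform(0, 2 * np.pi, size=(n - n_ord, N))])
phi = theta + b                                      # MXYGG angles: the only input PCA receives

X_full = np.hstack([np.cos(phi), np.sin(phi)])       # full dataset {{cos phi_i}, {sin phi_i}}
_, U = pca(X_full)
u1_x, u1_y = U[:N, 0], U[N:, 0]                      # cosine and sine halves of u^(1)

# arctan2 returns b_i + alpha, with alpha an unknown global rotation
b_plus_alpha = np.arctan2(u1_y, u1_x)

# Assumed shortcut for this toy setup: on a 5-value grid exp(5i b_i) = 1, so the fifth
# harmonic of the learned angles fixes alpha modulo 2*pi/5
alpha_est = np.angle(np.mean(np.exp(5j * b_plus_alpha))) / 5
b_learned = np.mod(b_plus_alpha - alpha_est, 2 * np.pi)

# Grid index of each learned b_i; a histogram of b_learned shows five equally spaced peaks
print(np.unique(np.round(b_learned / (2 * np.pi / 5)) % 5))
```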
Conclusion - We have applied PCA to two spin models with random interactions, the MISG and MXYGG models on a square lattice. PCA was able to determine that each spin model can be related to a simpler model, namely the regular Ising and XY models, respectively. This was accomplished by (1) recognizing the similarities between the projections of the input data onto the principal components of the regular and gauge-transformed models, (2) identifying that PCA characterizes the data using the same thermodynamic quantity (i.e. the magnetization), and (3) verifying that the gauge variables calculated by PCA were consistent with the ones selected within MC simulations. These results should generalize readily to other gauge-symmetric spin models with random interactions, such as spin glass models with O(3) gauge symmetry [34,40]. Our work suggests that UML is capable of more than just classifying data: interpretable UML methods could learn hidden features of an underlying model, such as symmetries and gauge transformations. For the physicist, this means UML could reveal previously unknown insights into a simulated model. It is of interest to investigate how other UML methods beyond PCA fare in this regard (e.g. autoencoders, which share some similarities with PCA but are capable of nonlinear fitting and therefore possess greater descriptive power [25]). Such methods may not be as interpretable as PCA; hence, using them to discover the hidden properties of a model might be a more complicated task. However, even in such cases, our work indicates that UML could at least "see through" nontrivial characteristics such as gauge symmetries. This suggests that UML methods could alternatively be used to efficiently label data for subsequently applied SML methods, which may explain how PCA and a neural network together learned the SU(2) gauge theory order parameter [21].
The generalization of this idea to other, more powerful UML methods may therefore expedite the learning process of a neural network, which makes this an avenue worth pursuing in its own right and especially for classifying phases. Lastly, gauge-symmetric models represent a class of models with known mathematical simplifications. Applying UML methods to these models may therefore provide a deeper understanding of how these methods work and what exactly they learn.
We thank W. Jin, C. X. Cerkauskas, K. Chung, A. Golubeva, R. G. Melko, and S. J. Wetzel for helpful discussions. This work was supported by the Canada Research Chair program (M.J.P.G., Tier 1) and by the NSERC of Canada CGS-M program (D.P.).

Definition of Plaquettes
A plaquette in the lattice is defined as the smallest region enclosed by a closed loop of neighbouring sites. On the square lattice, the resulting plaquettes are composed of four sites. For the Mattis transformation, we introduce a gauge variable $\epsilon_i$ on every site to define the coupling constant $J_{ij} = \epsilon_i\epsilon_j J$ on every nearest-neighbour bond. This procedure is sketched in Fig. 6.

MC Simulation with the Learned Gauge Variables
After applying PCA to the MISG model, we study the faithfulness of the learned gauge variables. We performed an MC simulation on the MISG model as before, but used the learned gauge variables in place of the known gauge variables. The thermodynamic quantities obtained with this simulation are then compared with those obtained using the known gauge variables, as shown in Fig. 7. As can be seen, the energy and specific heat curves obtained for both simulations are identical, supporting PCA's ability to learn the gauge variables of the MISG model.

[…] models are used here. PCA is applied to the spin configurations from both sets of data, which are formatted in the same manner as the input data of the MISG and MXYGG models. The clusters identified by PCA for the regular Ising model are shown in Fig. 8. For the regular XY model, PCA is applied to either the X dataset ($\{\cos\phi_i\}$) or the Y dataset ($\{\sin\phi_i\}$). The projections onto the first two principal components of the X dataset alone or the Y dataset alone are shown in Fig. 9.

Firstly, Fig. 9 should be compared with Fig. 4 of the main text. Although the clusters that PCA identifies for the regular XY and MXYGG models look the same when PCA is provided with the full dataset, there is a clear difference when PCA is provided with only the X or Y dataset. This difference indicates that PCA has identified a feature which is not present in the regular XY model. Secondly, Fig. 9 should be compared with Fig. 8. Previous work on the regular Ising model [1] has shown that the central high-temperature cluster and the two adjacent low-temperature clusters correspond to the paramagnetic and ferromagnetic phases, respectively, which PCA determines by summing the spin configurations. Fig. 9 can be interpreted similarly: PCA characterizes the input data of the regular XY model by directly summing the spin configurations along two orthogonal directions.
This explains why the PCA clusters for the X and Y datasets of the regular XY model look like those of the regular Ising model. However, since the spin variables in the regular XY model are continuous rather than discrete ±1 values as in the regular Ising model, the low-temperature projection forms one continuous line rather than two separate clusters. Note that the two orthogonal directions along which PCA determines the magnetization are not necessarily the x and y directions chosen in the MC simulation, owing to the global U(1) rotational symmetry of the regular XY model. The determination of this global rotation is the focus of the next section.

[…] for a global rotation angle α. Comparing this expression with Eq. (10), the components of the principal component eigenvectors in Fig. 10 therefore indicate the global rotation α along which PCA learns the magnetization. By accounting for this global rotation, the components of these principal component eigenvectors can be used to determine the value of α analytically; the histogram for this extraction is shown in Fig. 11. When this global rotation is accounted for, the principal eigenvectors take values of only 1s or 0s, as shown in Fig. 12. This shows that PCA is learning the magnetization of the XY model along two orthogonal directions; the same analysis can be applied to the principal components of the MXYGG model to extract the local rotations produced by the gauge variables $\{b_i\}$, giving the histogram in Fig. 5.

[Figure caption fragment: … $\mathbf{u}^{(1)}$ and $\mathbf{u}^{(2)}$, for the regular XY model after applying a global rotation. The two branches of each graph correspond to coefficients for the $\{\cos\phi_i\}$ or $\{\sin\phi_i\}$ data.]