Unsupervised learning of topological phase transitions using Calinski-Harabaz index

Machine learning methods have been recently applied to learning phases of matter and transitions between them. Of particular interest is the topological phase transition, such as in the XY model, which can be difficult for unsupervised learning such as the principal component analysis. Recently, authors of [Nature Physics \textbf{15},790 (2019)] employed the diffusion-map method for identifying topological order and were able to determine the BKT phase transition of the XY model, specifically via the intersection of the average cluster distance $\bar{D}$ and the within cluster dispersion $\bar\sigma$ (when the different clusters vary from separation to mixing together). However, sometimes it is not easy to find the intersection if $\bar{D}$ or $\bar{\sigma}$ does not change too much due to topological constraint. In this paper, we propose to use the Calinski-Harabaz ($ch$) index, defined roughly as the ratio $\bar D/\bar \sigma$, to determine the critical points, at which the $ch$ index reaches a maximum or minimum value, or jump sharply. We examine the $ch$ index in several statistical models, including ones that contain a BKT phase transition. For the Ising model, the peaks of the quantity $ch$ or its components are consistent with the position of the specific heat maximum. For the XY model both on the square lattices and honeycomb lattices, our results of the $ch$ index show the convergence of the peaks over a range of the parameters $\varepsilon/\varepsilon_0$ in the Gaussian kernel. We also examine the generalized XY model with $q=2$ and $q=8$ and at the value away from the pure XY limit. Our method is thus useful to both topological and non-topological phase transitions and can achieve accuracy as good as supervised learning methods previously used in these models, and may be used for searching phases from experimental data.


I. INTRODUCTION
Exploring phases of matter and phase transitions is a traditional but still active research direction in statistical physics [1,2], partly due to new phases of matter that have been uncovered. In recent years, this field of research is revived thanks to the introduction of artificial intelligence and machine learning methods to recognize phases and transitions [3]. Among the various methods, supervised learning methods are used to train the network using prior labeled data generated by Monte Carlo methods in various classical systems, such as the Ising model and its variants [3][4][5], XY models [6][7][8][9] with BKT phase transitions [10,11], Dzyaloshinskii-Moriya ferromagnets [12], skyrmions [13], Potts models [14] and percolation models [9]. The properties of quantum systems, such as the energy spectrum, entanglement spectrum [15], or configuration in the imaginary time [16,17] are also used as inputs to the neural networks for the training. On the other hand, unsupervised learning methods can provide unbiased classification, as they do not need prior knowledge of the system so as to classify phases and obtain the phase transition. These include methods such as the principal component analysis [18], autoencoders [19], t-SNE [20], clustering with quantum mechanics [21]. A key feature of these is to determine the sought-after properties (such as phase transition) by examining indicators from the scattered plot in the reduced space. Beyond equilibrium statistical physics, nonequilibrium and dynamical properties [22][23][24] are also obtained by machine learning methods. In addition, there are other newly developed schemes applied to the phases of matter, such as the adversarial neural networks [25], confusion method and its extension [26], the super resolving method [27], and even applications to experimental data [28,29]. Other developments in this area can be found in Refs. [30][31][32][33][34][35][36][37][38][39][40][41] and the review article [42].
It is well known that in traditional continuous phase transitions global symmetry is broken and these transitions can be described by the Landau theory. However, the topological phase transitions have no broken symmetry and therefore it is of great interest to understand how the transitions emerge and how to locate the transition points. Recent developments of machine learning offers new tools for this endeavor [6][7][8][9], based on supervised learning. However, the learning-by-confusion scheme when applied to the XY model predicts a transition temperature set by the value located at the max-imum of the specific heat [8]. In Ref. [7], it was found that significant feature engineering of the raw spin states is needed to relate vortex configurations and the transition. Moreover, a single-hidden-layer Fully Connected Networks (FCN) could not correctly learn the phases in the XY model, whereas the Convolutional Neural Networks (CNN) was successfully employed to learn the BKT transition [7], and the classification was later extended to the generalized XY model [9]. However, for unsupervised learning with the PCA method, it was claimed to be difficult to identify the transition [6]. Recently, Rodriguez-Nieva and Scheurer used the diffusion map method [43] invented by R.R. Coifman and S. Lafon [44] and related to quantum clustering [21], and devised an unsupervised learning method for identifying topological orders and successfully locating the BKT transition. They divided the configurations into several topological sectors with different winding numbers, then calculated the eigenvector Ψ and eigenvalues λ of the so-called diffusion matrix P (see Sec. II A). From the intersections of the average cluster distanceD and within-cluster dispersionσ, or equivalently the intersection of ∆λ (the jump in the eigenvalues) and σ λ (the standard deviation of the eigenvalues), the phase transition of the XY model on the square lattice can be obtained very well.
Our motivation of this study is to examine whether or not the diffusion map (DM) method of Rodriguez-Nieva and Scheurer (RNS method) is suitable beyond the pure XY model, such as the generalized XY model. Indeed we find that the DM method works for some topological phase transitions, but it fails to locate the phase transition in the generalized XY model in other regimes. Specifically, the RNS method for determining the transition point relies on the intersection of the two curves (such as the average cluster distanceD and within-cluster dispersionσ), and in the q = 8 generalized XY model, as illustrated in Sec. V, we cannot find an intersection there. The thermal fluctuation is not strong enough and the scattering clusters with different winding as numbers do not mix close to T c , i.e.D does not decrease substantially. The question arises: are there other quantities that can be used to locate the phase transition points?
In this work, we mainly use the Calinski-Harabaz (ch) index score [45] defined by ch t /ch b , related to the ratio ofD/σ, which means that if the variation ofD can be negligible due to strong topological constraints, the variation ofσ itself can help to determine the phase transition point. We also use the Silhouette coefficient (sc) [46] or its components to compare with the results.
The outline of this work is as follows. Sec. II shows the DM methods, the definition of the indices ch, sc and their components, their advantage and prior knowledge for using the indices. Sec. III shows the signature of ch and sc for the two-dimensional Ising model from configurations using the Swendsen-Wang algorithms [47]. In Sec. IV, the critical phase transition points of the XY model on the square and the honeycomb lattices are obtained using the DM method. In Sec. V, for the q = 2 and q = 8 generalized XY models, the global phase diagrams are obtained by the DM method assisted by PCA or k-PCA methods. Other techniques are discussed in Sec. VI regarding the effect of higher dimensions considered in the k-means method and how to automatically find the hyperparameter ε/ε 0 . Concluding comments are made in Sec.VII. In Appendix A, the simplest example 1D XY model is discussed, and the comparison between PCA and k-PCA is shown in Appendix B. Finally, a total of 11 indices used in unsupervised learning are listed in Appendix C.

A. The review of diffusion map method
Here, we explain the DM method of Rodriguez-Nieva and Scheurer [43]. Assume that we have M configurations {x l }, where l = 1, . . . M . The connectivity between x l and x l ′ is denoted by the elementary Gaussian kernel The normalization of K ε (x l , x l ′ ) can be done by performing the sum over l ′ , We can also perform the normalization along the direction of l (i.e., the sum over l). The eigenvalue equation of the diffusion matrix P l,l ′ is P ψ k = λ k ψ k , where ψ k 's are the right eigenvectors, with the corresponding eigenvalues λ k ≤ 1, for k = 0, 1, . . . , m − 1.
To find the phase transition point, Ref. [43] and its earlier preprint version [48] use two different ways, respectively: (a) Intersection of the mean distance of cluster centers D/2n, where n is the number of clusters, and the dispersion σ of the data points in each cluster. The quantities D/2n and σ are obtained from the scatter plot, where n is the number of topological sectors.
After obtaining the eigenvectors of the P matrix given by the authors project them unto a two-dimensional space, namely, a two-column matrix, then D/2n and the dispersion σ can be obtained from the scatter plot of the two column vectors or their modified version. The detailed application to the one-dimensional XY model is reproduced in Appendix A.
(b) Intersection of the mean fluctuation σ λ and gap of eigenvalues ∆λ, where and the gap of eigenvalues between the topological sectors n and n − 1, B. The indices ch and sc We propose to use indices instead of intersections. Based on the first two leading eigenvectors Ψ 0 and Ψ 1 of the PCA, kernel-PCA (kPCA) or the second and third vectors Ψ 1 and Ψ 2 of the DM method, for a manually chosen cluster numbers k, the ch index is given in terms of the following ratio: where B k is the between-group dispersion matrix and W k is the within-cluster dispersion matrix and they are defined as follows, where N is the number of data points, C q is the set of points in cluster q, c q is the center of cluster q, and c denotes the average center of all cluster centers {c q }, and n q the number of points in cluster q. The sc index of the i-th sample is where a(i) is the mean distance between sample i and all other data points in the same cluster, b(i) is the mean dissimilarity of point i to some cluster C expressed as the mean of the distance from i to all points in C (where C = C i ). The mean sc = sc(i)/N over all points of a cluster is a measure of how tightly grouped all the points in the cluster are. Sometimes sc a = a(i)/N or sc b = b(i)/N are also very useful [49].
The performance of a total of 11 indices is shown in Appendix C and the results of indices ch, dn, pbm and Ii applied to the q = 8 generalized XY model are all reasonable choices. Here, we choose two representative indices ch and sc as examples.

C.
Advantage and prior knowledge required using ch Fig. 1 draws the flowchart of the RNS method and the place of our modifications. Depending on whether or not we are considering topology, we perform dimensional reduction by DM or PCA (k-PCA) respectively, to get the features of the data, i.e., the eigenvectors of the diffusion matrix P or the covariance matrix. Then we apply k-means to cluster the two-dimensional or higher dimensional scattering points. If we do not choose "ch", then intersections of ∆λ and σ λ , or intersections ofD/2n and σ [43,48] can be used instead.
To test whether the clustering is good or bad, if "ch" is chosen, then the peaks of ch or its components ch t and ch b will be used directly. For the topological phase transition, the index ch or its component ch t and ch b can give signatures of phase transition when it is not easy to determine the intersection of the RNS method or when there is no intersection.
The index ch can also be applied to non-topological phase transitions, such as the order-disorder phase transition of the Ising model. This does not require any prior knowledge except for the configurations generated by e.g. Monte Carlo methods or from real experiments.
For topological phase transitions, such as the XY and generalized XY models, although this type of unsupervised learning is not similar to supervised learning (such as fully-connected layers, or convolutional neural networks), it still needs labels of configurations, i.e., the topological winding number and the number of possible phases. However, the label of the topological winding number does not mean the label of phases, and essentially, this method is still an unsupervised learning method.

III. THE 2D ISING MODEL
To test the ability to locate the phase transition point T c , we calculate ch for the simplest model, i.e., the Ising model, where J is the ferromagnetic interaction between a pair of nearest neighborhood spins, and S i = ±1. The unsupervised learning of Ising model has been studied before, (see e.g., Refs. [18,19]). We use a two-dimensional 64 × 64 lattice and generate N s = 2000 samples for each temperature T i and analyze them by PCA, using the scattering data of the first two leading eigenvectors Ψ 0 and Ψ 1 . Moreover, we calculate ch(k = 2) and sc(k = 2) for each T i according to Eqs. (6) and (8). Figure 2 shows the main results for the Ising model. In Fig. 2 (a), sc itself and sc b have a sharp decrease whereas sc a has peaks around T c (here we re-scale the results so the maximum value is 1). In Figs. 2 (b) and (c), the ch b index is peaked around 2.3; ch t and ch also have a sharp jump at the phase transition points.
To understand these results, the scatter plot of Ψ 1 and Ψ 2 are shown in Figs. 2 (d)-(f) for temperatures T = 1.5, 2.3 and 2.9, respectively. At low temperature, the clusters identified by the two colors separate from each other in the reduced space and finally mix together at high temperatures.
In general, the ch index is large when the clusters are well separated and the points in each cluster are well aggregated. From this viewpoint, in the low temperature phase, the ch index becomes large because all up states and all down states can be easily distinguished if the analysis is successful.
The index ch and their components perform well in detecting the transition, and the position of the peak or the jump is the largest at T = 2.3. Around the phase transition point, configurations possess the properties from both phases (paramagnetic and ferromagnetic) and the fluctuation is the largest there.
Comparing to traditional Monte Carlo results with the same size L = 64 as shown in Fig.2 (g)-(i), we find that the position of the peak is located at around 2.295 consistent with our ch or sc results with numerical intervals of 0.01. For the purpose of reference, the thermodynamic limit transition T c = 2.269 is marked.
It should be noted that if we simulate the Ising model by the Metropolis Monte Carlo algorithm [50], which flips one spin each time, in the low temperature limit, almost all spins choose to sit in the initial state (i.e., 111111, spin up). The scatter plot will simply have one group. However, this is not wrong because the state still obeys the Boltzmann distribution. This is the reason of small ch at low temperatures of using the Metropolis algorithm to generate configurations. However, we can still observe the signature at the phase transition point. Using the Swendsen-Wang algorithm instead [47], i.e., a global updating Monte Carlo method, the spin states will all be spin up or spin down in lower temperatures, which is not dependent of the initial state.

IV. THE 2D XY MODEL
The Hamiltonian of the classical XY model is given by where i, j denotes a nearest-neighbor pair of sites i and j, and θ in (0, 2π] is a classical variable defined at each site describing the angles of spin directions in a twodimensional spin plane. The sum in the Hamiltonian is over nearest-neighbor pairs on the square lattice (L × L) with the periodic boundary condition; other lattices can be also considered.

A. The 2D XY model on square lattices
Now we analyze the first example, i.e., the twodimensional XY model on the square lattice. Firstly, we generate the configurations with five fixed winding number pairs at ν = (ν x , ν y ) = (0, 0), (0, 1), (1, 0), (0, −1) and (−1, 0). For each fixed winding number pair, it should be noted that, for the two-dimensional geometry, the winding number component ν x = 1 means that the spins in each row form a winding number of 1 rather than just the spins on one row randomly selected. Cluster simulation algorithms [47,51] are not suitable to update the spins because the global flips break the topological winding number easily and therefore the Metropolis algorithm is used while trying to rotate the spin vector with a very small step each time so as to preserve the winding number.
For each topological sector, we generate m = 500 configurations. Combining all configurations {x l , l = 1, · · · , 5m} from all five sectors, we construct the kernel K ε . The elements between K ε (x l , x l ′ ) is defined in Eq. (1) and the normalized matrix P ε (x l , x l ′ ) is obtained, obeying its eigenvalue equation P Ψ k = λ k Ψ k . Using the scatter plot of the second and third leading eigenvectors Ψ 1 and Ψ 2 of P ε , the ch index are obtained and displayed in Figs. 3 (a)-(d).
There is a parameter ε in the above matrices and in order to deduce consistent results, we need to make sure the results are converging for a finite range of ε/ε 0 . We see that this is indeed the case for different values of ε/ε 0 in Figs. 3(a)-(d). For ε/ε 0 = 2.5, 3, 3.5 & 4, T c is less than 0.9, and then becomes 0.93 when ε/ε 0 = 4.5, 5, 5.5, 6, 6.5 and 7. Finally, the peaks move left and deviate from 0.93 again when ε/ε 0 = 8 or larger. It should be noted that here, 10, 000 or more Monte Carlo steps, are used in order to reach the equilibrium of systems.
The estimated points are labeled by the circles in Fig. 3 (e) and they all distribute nearby or on the red lines representing the latest result critical temperature T c = 0.8935 [52]The intersections ofD andσ are labeled by the gray regions in the critical regimes T c = 0.9 ± 0.1 from Ref. [43]. It appears that it is easier to use ch to locate the phase transition as we only need to identify the peak location. The results from Ref. [43] have greater uncertainty than those by using the ch index.
The histogram of our estimated T c is shown in Fig. 3 (f), which helps to determine the hyperparameter ε/ε 0 (see Sec. VI).
In Fig. 3 (g)-(i), the finite size effect of T c is checked using the ch with two smaller sizes L = 8 and L = 16 with ε/ε 0 = 5.5, 6 & 6.5. Combining the estimated T c (L) with L = 8, 16, 32, we use two different ways of (linear, exponential) extrapolation to get T c (∞) in the thermodynamic limit. The results are 0.88(5) and 0.89(5), respectively.

B. 2D XY model on honeycomb lattices
Here, we study the second example , i.e., the pure XY model on the honeycomb lattice, as the critical point is known exactly at T c = 1/ √ 2 ≈ 0.707 [53]. The geometry of the honeycomb lattice is equivalent to the 8 × 8 brick-wall lattice shown in Fig. 4, where every spin has three nearest-neighbor spins. To initialize the configurations with a fixed winding number (ν x , ν y )=(0, 1), the spins connected by solid gray lines are defined as forming ν y = 1 in the vertical direction. Specifically, if we start a spin at position (2,1), and then go left to → (1,1) → (1,2) · · · (2,8) and go back to (2,1) through the red  dashed lines, connecting the sites at the boundaries for periodic boundary conditions, the spins sweep an angle of 2π counter-clockwise.
It should be noted that two spins, such as (1,1) and (2,1), connected by the horizontal gray lines will contribute to the winding number ν y , and they also contribute to ν x . This poses a problem when fixing ν x and ν y independently. Fortunately, this problem can be solved. Specifically, in the first row, labeled by 1 in the vertical (y) direction, the relation of angles obeys 1) ). During the simulation, small fluctuations are allowed if they do not break the winding number.
In Fig. 5, using configurations constrained in five topological sectors on the 32 × 32 lattices, we find that the peaks are located at the exact value T c = 1 √ 2 ≈ 0.7 for several different values of ε/ε 0 in the interval [2,6] with intervals of 0.5. We also calculate the intersections by ∆λ and σ λ at ε/ε 0 = 5, 6, 7. The intersection can also arrive at 0.7 when ε/ε 0 = 7, but not 5 and 6. It indicates that the ch performs better than the intersection as the intersection method may not give a high confidence of the transition.

V. 2D GENERALIZED XY (GXY) MODELS
Here, we apply our method to the generalized XY models [54,55], whose Hamiltonian is given by where ∆ is the relative weight of the pure XY model, and q is another integer parameter that can drive the system to form a nematic phase. For both ∆ = 0 and ∆ = 1 the model reduces to the pure XY model (redefining q as 1 in the first case), and hence the transition temperature is identical to that of the pure XY model. However, such a redefinition is not possible with ∆ = 1. The phase diagrams of the GXY models depend on the integer parameter q, and they have been explored extensively.
A. q=2 GXY model The q = 2 GXY model has a richer phase diagram than the XY model and has an additional new phase [9]. Thus, it is a good candidate model to test our method away from the pure XY limit. The phase diagram is illustrated in Fig. 6 (a) on the ∆ − T plane, and we also show the results from our unsupervised method and those from PCA as comparison. The symbols N , F , P represent nematic, ferromagnetic and paramagnetic phases, respectively. The dashed lines are data from the MC simulations [54] mainly of L = 50 up to L = 300. The color indicates the value of ch index obtained from simulation with the system size L = 32. We now discuss the F -P , N -F and N -P transitions as follows.
(i) The F -P phase transition: Interestingly, the positions of the peaks by ch are consistent with the dashed line of the phase boundary in the whole region ∆ > 0.4.
The index ch performs very well where ∆ is away from the pure XY model limit. The essential nature of the F -P phase transition is still BKT.
(ii) The N -P phase transition: In the regime ∆ = 0, the ch peaks around T = 0.7, which is 0.2 less than 0.9. This discrepancy is likely due to the nature of halfvorticity in the N phase. We can improve the result by limiting the configurations to have the half-winding number ν x(y) = 1/2 as the topological constraint in our Monte Carlo simulation. The half-vortex physics has been discussed in Ref. [54].
Thus, we only consider (ν x ,ν y ) in the four types of combinations (±1/2, 0) and (0, ±1/2). To form (ν x = 1/2,0), the difference between a pair of spins located at the left most and right most boundaries is fixed as π and we assign each spin using the Eq. (A2). With the half vortex constraint, our results illustrated by the red symbols move closer to the dashed lines of the N − P transition in Fig. 6 (a)  (iii) The N -F phase transition: When we realize that the topological constraint makes the peaks (features) of the distribution for the spin directions implicit in the F and P phases, we use the configurations without any constraint in this case. The results are labeled by purple symbols with legend 'pca, L32'.
The behavior of ch depends on the structure of the sample points, and sometimes ch emerges as a sharp jump at the phase transition point such as in the Ising model discussed in the main text. Sometimes it is a local minimum value at the phase transition point. Take the q = 2 GXY model at T = 0.5 as an example, in Fig. 7, both ln(ch t ) and ln(ch b ) increase as a function of ∆. However the difference α = ln(ch t ) − ln(ch b ) = −gap decreases first and then increases around ∆ 2 = 0.22, where the gap has a positive value.
According to the following equation: clearly, min(ch) ⇒ max(gap) and the local minimum of the ch is at the location of ∆ 2 . For the N -F phase transition in Fig. 6 (a), the PCA is stronger than the DM method. Here, we would like to give a physical discussion about this because such a difference is related to the advantage of the proposed method, and thus a more detailed discussion should be made.
The reasons for PCA being stronger for the N − F transition can be explained as follows: (i) The N − F phase transition is not a topological phase transition [55]. The DM method designed here is to determine the topological phase transition. It is still interesting to see the ch of DM by inputting Ising configurations without any topological constraint. In Fig. 9, using k-PCA and the DM method with complete same configurations, the signatures of phase transition emerge and the results are consistent at T c = 2.3. However, the signature of k-PCA method is clearer.
(ii) From the view of the data, it is also understandable that PCA is stronger. Without any topological constraint, in the nematic (N ) phase, the spins prefer two dominant directions and their histograms obey a double peaks structure. In the Ferromagnetic (F ) phase, one main direction remains and one peak emerges for the histograms. Fig. 8 (b1)-(b3) show the typical histograms in three phases. The main feature difference between the phases is obvious.
However, with a topological constraint, the spin directions are mainly distributed according to the winding numbers. For example, the spin angles in each row are θ x = 2πx/L with additional fluctuations, where x and L are the number index and total number, respectively, in each row. Therefore, the distribution sits almost in the range [0, 2π] as shown in Figs. 8(a1)-(a3). The feature difference of spin angles disappears when using the constraint. Therefore, the DM method cannot get transition points using the configurations with constraints.

B. q=8 GXY model
For the q = 8 GXY model [55], Fig. 10 (a) shows the global phase diagram using the values of the ch index. The phases N , F 2 , F , P are obtained and the distributions are also shown.
In the new F 2 phase, the distribution of the spin vectors has 8 peaks, but is dominated by 4 possible angles. The 'X' shape dashed lines are from Refs. [54,55]. The orange color represents the values of ch by the DM method.
Let us first discuss the F (F 2 )-P transition. Clearly, for F -P transition in the range 0.5 < ∆ < 1, the peak positions of ch align well with the dashed line. In the center of 'X', the F 2 and P transition are also consistent with MC result i.e., T c = 0.5.
However, we could not use the intersection of the cluster average distanceD and within-cluster dispersionσ as described in Ref. [43] becauseD does not vary too much and there is no intersection as shown in Figs. 10 (e)-(g). The five different colors therein represent the five typical topological sectors. However, fortunately, when zooming in the figures in Figs. 10 (g)-(i), we find that, near the transition region, the shape of a fixed cluster shrinks because the data points gather closer together, and hence the within-cluster dispersionσ is smaller. The index ch = cht ch b thus develops a peak around T c = 0.5 as shown in Fig. 10 (c), with ch t and ch b also displayed in Fig. 10 (d). In contrast, Fig. 10 (b) shows that sc cannot be used to signal the transition temperature T c because it evaluates the difference, sc b − sc a , but sc a ≪ sc b in spite of the fact that sc a has a local minimum.
It should be mentioned that for the N -P transition, the use of either the integer or half vortex constraint is   Figs. 12 (a)-(d) show the detail of phase transition F -P , F 2 -F , N -F 2 , respectively. For the F -P transition, in Fig. 12 (a), fixing ∆ in the interval [0.5, 1] at steps of 0.1, the peaks of ch are located at 0.5 and 1 respectively, completely matching the dashed lines in the global phase diagram in Fig.10.
In Fig. 12 (b), for the F 2 -F transition, by fixing ∆ = 0.8, the peaks of ch are located at 0.2 with L = 8, 16, 32, 48 and 64. The other values of ∆ are not shown. In Fig. 12 (c), fixing T =0.2, 0.3 and 0.4, the positions ∆ of local minimum of ch located at 0.2, 0.3 and 0.4 respectively.
In Fig.12 (d), for ∆ = 0, using the 1/8 vortex constraint results in the most accurate critical points. The peak position is at 0.9 more closely to 1 than 0.7 by the integer vortex constraint. The error bars are obtained by the bootstrap method using 400 randomly chosen configurations between the total 2000 configurations and total of 20 bins.

VI. OTHER TECHNICAL MODIFICATIONS
In the above sections, during the use of the k-means method, we apply two output eigenvectors of the diffusion matrix P and then get the values of the ch index. The conclusion is that using two leading vectors leads to the best accuracy. At the same time, it is possible to design a way to determine the super parameter ε/ε 0 automatically. In this section, we will focus on such issues for the completeness of our method. Here, the effects of higher-dimensional features will be discussed using k-means, namely, whether or not the accuracy is enhanced when including more features will be clarified. The answer is that retaining more dimensions of the eigenvectors, the result may deviate from the right critical points or even predict wrong critical points.
Assuming the covariance matrix of the PCA method is C, and its eigenvectors are {Ψ d }, where d = 1, · · · , d max (∆d = d max ). For the Ising model, in Sec. III, the first and second vectors {Ψ d }, are used, where d max = 2, or ∆d = 2. Here, we consider vectors with ∆d = 3, 4 and 5. The difference is found between the results of two-dimensional vectors and higher dimensional data, as shown in Figs. 13(a)-(c). The estimated position of peaks are 2.31, 2.31 and 2.33 respectively. The result T c = 2.3 for ∆d = 2 is the closest to the peak of the specific heat.
For the generalized XY model (q = 8, ∆ = 0.5), with the DM method, the vectors with index d = 2, · · · , d max (∆d = d max − 1) are used as the first vector ignored. In Fig.13 (d), with ∆d = 3 or 4, the position of peaks are T c = 0.5 as the same as that of ∆d = 2. However, if higher-dimensional data are used, such as ∆d = 5 and 6, the ch score does not yield the correct signature of the phase transition as shown in Figs.13 (e) and (f).
Another issue discussed here is whether one can design an automatic method to determine the hyperparameter ε/ε 0 . Since its value has been tried many times for some reasonable choices, one may expect to find a way to determine its optimal value automatically.
To the best of our knowledge, at present, there are no such automatic determination of the hyperparameter ε/ε 0 . Here we devise a simple analysis from the statistics (histogram) of T c obtained by various ε/ε 0 : i. Calculate ch(T ) as a function of temperature T with various ε/ε 0 in a range from ε min /ε 0 to ε max /ε 0 by using the configurations with topological constraints; ii. Store the position of ch peaks, i.e., T c and the times they appear as shown in Fig. 3 (e) and (f). Sometimes, the peaks may not be not obvious, and a plateau emerges. The two ends of the plateau will be stored.
The histogram of T c vs. ε/ε 0 can be used to give the best estimate for the transition. Therefore, corresponding to the highest position T c = 0.96, the values 4 ≤ ε/ε 0 ≤ 6 are acceptable.
Another possible approach is to calculate a location dependent σ for each data point instead of selecting a single scaling parameter [57]. Then, the kernel matrix between a pair of points can be written as where σ i and σ j are the local scale parameters for x i and x j , respectively. The selection of the local scale σ i is determined by the local statistics of the neighborhood of point x i . For example, the scale can be set as where x K is the K-th nearest neighbor. However, here, K is also a hyperparameter to be chosen. By comparing Eq. (11) and Eq. (1), 2N ε = σ i σ j , the obtained ε/ε 0 ≈ 30 is about several times larger than the acceptable regimes.

VII. DISCUSSION AND CONCLUSION
In summary, we use the Calinski-Harabaz (ch) index to successfully locate phase transitions of a few classical statistical models, including the Ising, XY and the generalized XY models. From the scatter data, for which we use the ch index to obtain the phase transitions for the Ising, XY and GXY models on lattices. This combines the advantages of the PCA and DM methods, as the scattering data can be based on either the first two leading eigenvectors of the PCA Kernel-PCA or the second and third vectors from DM method.
The advantages of using the ch index are less steps, wider applications and better convergence. After kmeans applied to the eigenvectors of the diffusion matrix P , using the ch index, we do not need to maximize the visibility of cluster as proposed in RNS method. For some phase transitions, it may not be easy to find the intersections ofD and σ. In Figs. 10 (e)-(g), ch index could capture the signatures of both quantities. In Fig. 5, for the pure XY model on the honeycomb lattices, the exact critical point is at T c = 0.707. Using ch, when ε/ε 0 is in the interval [2,6], the estimated results of T c are all close to 0.7.
In the pure XY limit, we have tested that ch can locate the phase transition for the XY model on both the square and honeycomb lattices, similar to the DM method of Rodriguez-Nieva and Scheurer. For the q = 2 GXY model, the values of the ch index in the whole phase diagram matched the boundary between the ferromagnetic phase to paramagnetic phase very well even away from the pure XY limit. Close to the N -P transition, the accuracy can be improved by using the 1/2-vortex constraint in generating the Monte Carlo configurations. For the q = 8 GXY model, using intersections ofD and σ cannot be used to locate the transition point, such as at ∆ = 0.5, but the ch index can still work to locate the phase transition point due to its incorporation of the fluctuation of samples within each cluster. Moreover, the N -P transition can also be identified by using the 1/8 vortex constraint.
We have also compared the results from other indices (see Table I), but we find that the ch index works best overall for the models considered in this work. In the future, it will be desirable to systematically study the utility of various parameters in models of statistical physics that exhibit different natures of transitions.
The development of applying machine learning to physics could develop new methods for studying unknown phases. For the Ising model or similar models, the transition point is well known. This situation will help to check our proposed method.
If we get the first-hand data from experiments, and do not know the details of the Hamiltonian, our unsupervised method can help deduce the number of possible classes (phases) in the data. Furthermore, using the ch index we could identify phase transition points. Therefore, the method of using the ch index is very useful for future non-topological phase transitions. For the XY model and Hamiltonians similar to the GXY model, by using the ch index, topological phase transition points were obtained without using the traditional method of measuring correlations [55] and spin stiffness [52]. Of course, some reasonable prior knowledge is needed such as the possible winding numbers. The idea of DM can be applied to topological quantum systems (see Refs. [58,59]). In principle, the topological quantum phases and transitions between them may be probed using the ch or sc indices.
The Hamiltonian of the pure XY model reads where J is the coupling strength of the nearest neighboring pair of spins i, i + 1 and throughout this work, we set J = 1 for simplicity and use the periodic boundary condition θ N +1 = θ. For simplicity, following Ref. [43], we generate the spin vectors according to the following, where ν is the winding number, l is the identification number of the configuration {θ i }, with i varying from 1 to the total length N . The first term 2πν (l) i/N is used to define the winding number ν = i ∆ i /2π, where the ∆ i is in the range [−π, π) by the so-called saw function [7] obtained by replacing ∆ i with ∆ i ± 2π if it is not in the target range. The term δθ (l) i obeys the Gaussian fluctuation and θ (l) is generated randomly between [0, 2π). We consider two types of generated configurations with winding numbers ν = 0 and ν = 1. The histogram of the values of the first component of the diffusion map ψ 1 is shown in Fig. 14 (a). The histogram of ψ 1 has values at ±1 and ψ is a vector with size m × 1, therefore the values are equal to ±1 when ψ 1 is re-scaled by √ m. Fig. 14 (b) shows the largest 20 eigenvalues of the transition probability matrix P l,l ′ in Eq. (2). Two maximum eigenvalues are found equal to unity. Following Ref. [43], we also test ν = 7 according to the following equation, where ν = {0, ±1, ±2, ±3}. We find that ψ 1 has seven district values ranging from −0.03 to 0.03 , corresponding to seven winding numbers marked by the symbols in Fig. 14 (c). Eigenvalues λ k vs. k is also shown in Fig. 14  (d). Clearly, the plateau of eigenvalues appears in k ≤ 7. Figs. 16, represent some results of PCA and kPCA on simple and generalized datasets in the appendix of Ref. [43].
The simple datasets are configurations of onedimensional XY model with size N = 256, m = 1000, and σ θ = 0.3 and with topological winding numbers ν = 0 and ν = 1 according to Eq. (A2). Figure 16 show results of linear PCA, PCA with Gaussian kernels, and PCA with polynomial kernels, respectively (the top left, top middle and top right panels). Obviously, the classifications of PCA with a nonlinear kernel are much more clear for the XY models. The three dimensional visualization is based on the three reduced components. However, the above method fails for data generated with slight modification of Eq. (A2) with an additional term η (l) [1 − cos(2πi/N )], where η (l) is random in the range [−η 0 , η 0 ]. The results are shown in Fig. 16 at the bottom left, bottom middle and bottom right panels, respectively.

Appendix C: Other indices
As listed in Ref. [49], we check the 11 indices listed in Table (I) for validating the classifications. Take the q = 8 GXY model as an example, the value of the parameters such as the temperature T and ∆ are fixed as those in Fig. 10 (b). We find only four indices, presented in Fig.17 (a)-(d) produce signals at the critical points. These are ch, Ii, dn and pbm, whose full names are listed in Table  (I). The other indices could not give correct signals to locate the phase transition points.