Decoding conformal field theories: from supervised to unsupervised learning

We use machine learning to classify rational two-dimensional conformal field theories. We first use the energy spectra of these minimal models to train a supervised learning algorithm. We find that the machine is able to correctly predict the nature and the value of critical points of several strongly correlated spin models using only their energy spectra. This is in contrast to previous works that use machine learning to classify different phases of matter, but do not reveal the nature of the critical point between phases. Given that the ground-state entanglement Hamiltonian of certain topological phases of matter is also described by conformal field theories, we use supervised learning on Rényi entropies and find that the machine is able to identify, to a high degree of accuracy, which conformal field theory describes the entanglement Hamiltonian with only the lowest few Rényi entropies. Finally, using autoencoders, an unsupervised learning algorithm, we find a hidden variable that has a direct correlation with the central charge and discuss prospects for using machine learning to investigate other conformal field theories, including higher-dimensional ones. Our results highlight that machine learning can be used to find and characterize critical points and also hint at the intriguing possibility of using machine learning to learn about more complex conformal field theories.


I. INTRODUCTION
Conformal field theories (CFTs), which are quantum field theories with conformal invariance, appear in many areas of physics including condensed matter, statistical physics, and string theory [1,2]. Conformal symmetry turns out to be especially powerful in two spacetime dimensions (one spatial dimension and one temporal dimension), where the conformal group is infinite-dimensional, and certain two-dimensional CFTs may be classified by a finite number of primary fields [1,2]. These CFTs, which are realized in a number of physically relevant systems, including the low-energy theory of the quantum critical point of the transverse-field Ising model [3], the edge states (along with the ground-state entanglement Hamiltonian) of fractional quantum Hall systems [4,5], and the Polyakov action describing the world sheet in string theory [6], are important for being rare examples of analytically tractable strongly interacting quantum field theories. Therefore, given some data of a quantum system, it is important to identify whether that system is described by a CFT. This data, obtained from either experimental measurements or numerical simulations, could be the lowest few energy levels of a given Hamiltonian or Rényi entanglement entropies, which can be measured by probing multiple copies of the system's state [7][8][9][10][11]. In particular, it is interesting to ask if this information can be used to detect whether a system is at criticality.

Machine learning has recently been used to study a wide range of problems in different areas of physics [12]. Notable examples include classifying phases of matter [13][14][15], studying nonequilibrium dynamics of physical systems [16][17][18], studying the string theory landscape [19] and the AdS/CFT correspondence [20], simulating dynamics of quantum systems [21], quantum state tomography [22,23], and augmenting the capabilities of quantum devices [24,25].
In this paper, we use both unsupervised and supervised machine learning to investigate various two-dimensional CFTs, as sketched in Fig. 1. For supervised learning, we use a deep neural network. Our first training data set is the lowest energy levels of exactly solvable two-dimensional CFTs. The chosen CFT models include the well-known Ising critical point and the SU(2)_k anyonic-chain parafermionic models (see Table I for a full list). We then ask the machine to locate and predict the nature of critical points of quantum spin chains to high accuracy. By looking at the confidence of the network, we are able to correctly identify the value of the critical point. Remarkably, our approach requires a single system size, whereas common methods (such as entanglement scaling [26]) require finite-size scaling. We further elaborate on the advantages of our method compared to conventional methods and provide examples in the Appendix. Given that the entanglement spectra (ES) of various topological phases of matter are also described by CFTs [5,27], we also train our network with the lowest few Rényi entropies. While the relationship between Rényi entropies and the ES is nonlinear, and solving for the ES exactly requires all Rényi entropies, we are able to extract the CFT that describes the ES of two different spin-ladder systems to high accuracy with access to only the lowest few Rényi entropies. Finally, we use the autoencoder algorithm [28], an unsupervised learning algorithm, and find that the value of the hidden variable is directly related to the central charge. This hints that the machine can detect the complexity of CFTs.

FIG. 1. Schematic illustration of the machine-learning algorithms used to identify different CFTs. The preprocessed energy spectrum or Rényi entropies are stored in a vector that serves as the input to the algorithms. We consider two scenarios: supervised learning and unsupervised learning. In the former, labels of the CFT class are provided to a neural network classifier, which predicts the CFT that describes the given data. In the latter, we use an autoencoder neural network, which learns an efficient representation of the energy spectra. The first half of the network acts as an encoder that maps the input to a single scalar variable ω, and the second half decodes ω and reconstructs the original input. We find that ω is directly correlated with the central charge.

II. MACHINE LEARNING AND CFT BASICS
We first review the CFT knowledge needed to generate our training data (see Ref. [1] for a detailed review of CFTs and Ref. [28] for machine learning), which is taken to be the lowest 20 energy levels of a finite-size model. We take our system to have periodic boundary conditions, although our approach can be readily generalized to other boundary conditions. In this paper, we restrict ourselves to rational CFTs (RCFTs), which contain only a finite number of primary fields, and we furthermore focus on CFTs with field content such that they are modular invariant (see the Appendix for definitions and details). Our methods may easily be applied to CFTs with nonmodular-invariant field content. The discrete energy levels (in units of 2π/L, where L is the length of the system and ℏ = 1) of a generic finite one-dimensional model that flows to a CFT are given by [1]

E^{(m_L, m_R)}_{h_L, h_R} = E_0 L + E_1 + \frac{2\pi v}{L}\left(h_L + h_R + m_L + m_R - \frac{c}{12}\right),  (1)

where E_1, E_0, and v are nonuniversal constants and c is the central charge of the CFT. We are also omitting subleading dependence on L due to corrections to the scaling limit. Here, h_L and h_R are the conformal dimensions of the primary fields (their sum is the scaling dimension), and m_L and m_R are non-negative integers describing the descendant fields.
As a definite example, we now discuss the structure of the primary and descendant fields for the critical Ising model, the simplest nontrivial CFT and an example of a Virasoro minimal model. With modular invariance imposed, there are three primary fields for this model, with h_{L,R} = 0, 1/16, and 1/2. The number of descendant fields can be calculated by expanding the so-called character function, as reviewed in the Appendix. Upon doing so, one finds that the lowest ten energy levels of this CFT (with the ground-state energy set to zero) are (2πv/L) × {0, 1/8, 1, 9/8, 9/8, 2, 2, 2, 2, 17/8}. The energy spectra of some of the other models we consider are discussed in the Appendix, and the list of all CFTs we consider is given in Table I. We stress that this is by no means a complete list of RCFTs, as there are in fact infinitely many.
We use a neural network to classify input CFT spectra into their corresponding CFT classes (see Fig. 1). Specifically, we use a multilayer perceptron whose input is the first 15 energy levels and whose output is the corresponding CFT class label from the 13 CFT classes in Table I (see the Appendix for a discussion of the stability of the network with respect to different numbers of classes). For the training, we optimize the categorical cross entropy over samples of the preprocessed energy spectra of different CFT classes with their corresponding labels. The preprocessing step, which shifts the ground-state energy to zero and rescales the other energies so that the largest energy level is 1, is crucial, as it removes the contributions of the nonuniversal constants from the input data (see the Appendix).
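The preprocessing step described here can be sketched as follows (a minimal illustration; the function name and the use of plain Python lists are our own, not the paper's implementation):

```python
def preprocess(spectrum):
    """Shift the ground-state energy to zero and rescale so that the
    largest energy level equals 1. This removes the nonuniversal
    constants E_0, E_1, v, and L from the input data."""
    e0 = min(spectrum)
    shifted = [e - e0 for e in spectrum]
    top = max(shifted)
    return [e / top for e in shifted]

# Example: a spectrum with an arbitrary offset and overall scale
print(preprocess([5.0, 7.0, 9.0, 13.0]))  # [0.0, 0.25, 0.5, 1.0]
```

Because only energy differences relative to the ground state survive, two spectra that differ only in their nonuniversal constants preprocess to the same vector.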
Note that conventional methods for identifying CFTs require data from different system sizes to extract the central charge and the universal part of the spectrum. In particular, the central charge can be obtained from the scaling of the ground-state entanglement entropy with the system size. The CFT class can then be identified by visually comparing the energy spectrum of the system with the predicted spectra of the different CFT classes consistent with the obtained central charge [26,29]. In contrast, our method requires only the spectrum at a fixed system size, encompasses all of the above steps in a single black box, and returns the CFT class corresponding to the input data. Note that with this method the CFT type is not uniquely identified by the spectrum at a fixed system size: the output of the algorithm is the most likely CFT in its library that describes the input spectrum (see also the Appendix for a more detailed comparison).

FIG. 2. The numbers in the legend refer to the CFT classes listed in Table I.

III. CRITICAL SPIN CHAINS
We now use the network trained on ideal CFT energy spectra with added noise to make predictions for two physical models. Specifically, we feed the machine energy spectra from two many-body quantum spin models obtained using exact diagonalization. By analyzing the output of the machine, we are able to predict the location of the critical point and the type of CFT that describes it.
We first consider the transverse-field Ising model,

H_I = -\sum_i \sigma^x_i \sigma^x_{i+1} - h \sum_i \sigma^z_i,

where σ^α_i are the Pauli matrices on site i and h is a global magnetic field. For h = 1, the low-energy theory of H_I is a CFT with central charge c = 1/2 [the minimal model (A_3, A_2)]. We perform exact diagonalization (for L = 22) for different h and feed the energy spectrum into the network. In Fig. 2(a), we observe that the entropy S of the output layer, S = −∑_i p_i ln p_i, where p_i is the ith value of the output softmax layer, is minimal at h = 1, where the spectrum has a large probability of being described by the minimal model (A_3, A_2). This indicates that the network has correctly predicted not only the location of the critical point of this model but also its nature, i.e., the CFT class, the central charge, and the primary fields describing the critical point.
Before moving on to the next model, we discuss another quantitative approach for identifying CFTs from energy spectra. We consider a clustering algorithm using the Gaussian kernel of the Euclidean distance, i.e., d^2(x, c_m) = e^{−‖x − c_m‖_2^2}, where x is the input spectrum and c_m is the center of the mth cluster [28]. In our case, the ideal cluster centers are known from CFT theory. Therefore, given an energy spectrum, we calculate and compare the kernel between the rescaled input and the cluster center of each CFT. In Fig. 2(b), we plot d^2 for the Ising model and various CFTs. We observe that d^2 is peaked around the critical point only for the Ising CFT [Fig. 2(b)], similar to the neural-network approach [Fig. 2(a)].

FIG. 3. The numbers in the legend refer to the CFT classes listed in Table I.

However,
we believe the neural network approach will be more reliable as it does not rely on a single energy spectrum.
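As a sketch, the kernel comparison above might be implemented as follows (the function names are ours; in the paper the cluster centers are the ideal preprocessed CFT spectra, whereas here the vectors are toy values):

```python
import math

def gaussian_kernel(x, center):
    """Gaussian kernel of the Euclidean distance between an input
    spectrum x and a cluster center: exp(-||x - center||_2^2).
    Values close to 1 indicate a close match."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist)

def best_class(x, centers):
    """Return the index of the cluster center with the largest kernel."""
    scores = [gaussian_kernel(x, c) for c in centers]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example with two hypothetical preprocessed spectra as centers
ising_like = [0.0, 0.0625, 0.5, 1.0]
other = [0.0, 0.3, 0.6, 1.0]
print(best_class([0.01, 0.07, 0.48, 1.0], [ising_like, other]))  # 0
```

Unlike the trained network, this comparison is against a single ideal spectrum per class, which is why noise in the input degrades it more quickly.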
We now move to a more complicated model, which has two critical points described by different CFTs and contains both two-body and three-body interactions. The Hamiltonian of this model, originally introduced in Ref. [30], is H_T = 2H_I(1) + λH_3, where H_I(1) denotes the Ising Hamiltonian at h = 1 and H_3 contains three-spin interactions (see Ref. [30] for its explicit form). For λ = 0, this model is described by the minimal model (A_3, A_2), as discussed above. When λ ≈ 0.856, the low-energy theory of this model is described by a different minimal model, (A_4, A_3). Again, we feed the network the many-body energy spectrum for various λ. In Fig. 3(a), we observe that the machine correctly identifies the location and underlying CFTs of the two critical points with high accuracy. Similar results are obtained with the Gaussian kernel method [Fig. 3(b)].

IV. RÉNYI ENTROPIES
We now consider training with Rényi entropies. This is motivated by the fact that the (bipartite real-space) entanglement Hamiltonian, H_e, of two-dimensional topological phases is often described by (either chiral or nonchiral) one-dimensional CFTs [5,27]. Unfortunately, it is hard to experimentally measure the eigenvalues of H_e, i.e., the ES (although there are various theoretical proposals for how to do so [31]). Instead, one typically measures the Rényi entropy by preparing multiple copies of the state and interfering them [9]. Furthermore, one can calculate S_n with quantum Monte Carlo, making the calculation of entanglement more manageable for larger systems [11,32]. We will demonstrate that, given a critical H_e [33], one can train neural networks with Rényi entropies to correctly identify the underlying CFT.
The nth Rényi entropy is defined as S_n = \frac{1}{1-n} \ln \mathrm{Tr}\, ρ_A^n, where ρ_A is the reduced density matrix and n is a positive integer not equal to 1. The ES can, in principle, be obtained with knowledge of S_n for all n. In practice, one can obtain an estimate of the ES with only a finite number of S_n [7,32]. In these approaches, the ES is obtained from the roots of a polynomial equation whose coefficients are related to the Rényi entropies through Newton's identities. However, root-finding algorithms are sensitive to errors in the coefficients, making such schemes unstable in the presence of errors in the measured S_n [34]. We approach this problem using machine learning.
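To illustrate the root-finding scheme described above (this is the conventional approach, not the paper's method), here is a toy reconstruction of a two-level entanglement spectrum from its power sums p_k = Tr ρ_A^k; the variable names and the two-level restriction are our own simplifications:

```python
import math

def eigenvalues_from_power_sums(p1, p2):
    """Recover two eigenvalues of rho_A from the power sums
    p1 = Tr rho_A and p2 = Tr rho_A^2 via Newton's identities:
    e1 = p1, e2 = (p1^2 - p2)/2, then solve t^2 - e1 t + e2 = 0."""
    e1 = p1
    e2 = (p1 ** 2 - p2) / 2.0
    disc = math.sqrt(e1 ** 2 - 4 * e2)
    return sorted([(e1 + disc) / 2.0, (e1 - disc) / 2.0], reverse=True)

# Eigenvalues 0.8 and 0.2 give p1 = 1 and p2 = 0.64 + 0.04 = 0.68
print(eigenvalues_from_power_sums(1.0, 0.68))
```

Small errors in p_2 shift the discriminant and hence the recovered roots nonlinearly, which is precisely the instability that motivates the machine-learning approach.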
If H_e is a CFT, the ith ES level ξ_i takes the form of Eq. (1), and the Rényi entropies follow from

S_n = \frac{1}{1-n} \ln \sum_i e^{-n ξ_i},  (2)

with the spectrum normalized so that ∑_i e^{−ξ_i} = 1. We restrict the sum in Eq. (2) to the lowest 100 ES levels. For training, we consider a finite range of v/L ∈ (0.2, 10). This range is chosen to exclude large (small) v/L, where the excited-state information is washed out (where the choice of cutoff plays an important role). Also, note that for larger n, S_n becomes less dependent on the cutoff by definition. Thus, in the chosen range of v/L, the choice of cutoff, i.e., simply truncating the sum, has little effect on our results.
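A minimal sketch of evaluating Eq. (2) from a truncated entanglement spectrum (our own illustration; the function name and toy levels are not from the paper):

```python
import math

def renyi_from_es(xi_levels, n):
    """Compute S_n = (1/(1-n)) ln sum_i p_i^n, after normalizing the
    truncated spectrum so that p_i = exp(-xi_i) / sum_j exp(-xi_j)."""
    z = sum(math.exp(-xi) for xi in xi_levels)
    probs = [math.exp(-xi) / z for xi in xi_levels]
    return math.log(sum(p ** n for p in probs)) / (1 - n)

# Two degenerate ES levels give a maximally mixed rho_A,
# so S_n = ln 2 for every n
print(renyi_from_es([0.0, 0.0], 2))
```

In the training data, the ξ_i are generated from the CFT level structure of Eq. (1) for each class and each sampled value of v/L.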
Instead of each sample being a vector of energy levels as in the previous section, each is now a vector of Rényi entropies, (S_2, S_3, …). Here, we include up to 28 Rényi entropies, starting with S_2. We train our machine with 10 000 different samples for each CFT class (the same classes used for the energy-spectrum training). We generate the data uniformly by randomly choosing v/L. We obtain a training accuracy of up to 94%, depending on the number of S_n included; generally, the accuracy increases as more S_n are included (see Fig. 4). See the Appendix for details of the networks and the training. We test our model on two exactly solvable systems studied in Ref. [27]. These are quantum spin ladders, which we refer to as the square ladder model (SLM) and the triangle ladder model (TLM). The ground states of these two models may be written down exactly [27], from which we obtain the reduced density matrix at the critical point. The critical theory of the SLM (TLM) is described by the minimal model (A_3, A_2) [(D_4, A_4)]. We numerically calculate S_n for L = 18 and use these numerical results as input to our trained neural network. We find that the neural network correctly predicts the CFT that describes H_e for both models with high accuracy. As expected, this accuracy generally increases as one increases the number of S_n included in the training set (see Fig. 4).

V. UNSUPERVISED LEARNING
We now turn to using unsupervised learning to explore two-dimensional CFTs. Our data consist of three families of CFTs (see Fig. 5 for the list of CFTs used for unsupervised training). We use autoencoders [35] to find a compressed representation of the CFTs (see Fig. 1). Previously, autoencoders have been able to detect the order parameter, i.e., the magnetization, in the Ising model [14]. The autoencoder comprises an encoder function ω = f(x) and a decoder function r = g(ω), where the hidden variable ω encodes a compressed representation of the input x. The hidden variable is used by the decoder to produce the reconstruction r. By restricting the dimension of ω, the network only approximately reconstructs the input; however, it learns the important features of the training data and encodes them in ω. Each class has 100 examples, which consist of the lowest 100 energies of the CFT (with the same noise added as in our energy-based classification). We train different autoencoders on sets of energies corresponding to different CFT classes by minimizing

C = \sum_{m=1}^{N_m} \| x_m - r_m \|_2^2,

where the sum is taken over the N_m examples in the training set, x_m is the input of the first layer, and r_m is the output of the last layer (see the Appendix).
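A minimal numerical sketch of such an autoencoder (a single linear encoder and decoder with a one-dimensional bottleneck, trained by plain gradient descent; the sizes, data, and names are our own simplifications, not the architecture used in the paper, and we assume numpy is available):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100-dimensional "spectra" varying along a single direction,
# mimicking a family of spectra controlled by one parameter
direction = rng.normal(size=100)
data = np.outer(rng.uniform(0.5, 2.0, size=64), direction)

# Scalar hidden variable omega: linear encoder w_e and decoder w_d
w_e = rng.normal(scale=0.1, size=100)
w_d = rng.normal(scale=0.1, size=100)

def loss():
    omega = data @ w_e            # encode each sample to a scalar
    recon = np.outer(omega, w_d)  # decode back to 100 dimensions
    return ((recon - data) ** 2).sum() / len(data)

initial = loss()
lr = 1e-3
for _ in range(2000):
    omega = data @ w_e
    err = np.outer(omega, w_d) - data
    grad_d = 2 * (err.T @ omega) / len(data)
    grad_e = 2 * (data.T @ (err @ w_d)) / len(data)
    w_d -= lr * grad_d
    w_e -= lr * grad_e
final = loss()
print(final < initial)  # True: the bottleneck learns the family parameter
```

Because the data vary along one direction, the scalar ω suffices for a good reconstruction; this mirrors how a one-dimensional hidden variable can track a single parameter (here, a stand-in for the central charge) across a CFT family.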
We consider the simplest case of a one-dimensional hidden variable (h = 1) and show the value of ω for different CFT spectra in Fig. 5. We observe that, within a single family of CFTs, the magnitude of the hidden variable is positively correlated with k, and hence with the central charge.
Finally, we note that recent work used supervised machine learning to investigate CFT correlation functions and the emergence of conformal invariance [36]. It has also been demonstrated that, for a specific conformal field theory such as the Ising CFT, one can use unsupervised learning to classify its phases without dimensional reduction [37,38]. However, in those works the location of the critical point must be known in advance, in contrast to our approach: we characterize the conformal field theory through the hidden variable, when restricting to a single family, without knowing the critical point in advance. In the future, it would be interesting to include correlation functions in our unsupervised training to see whether we could distinguish different families of CFTs.

VI. DISCUSSION
There are several directions in which our paper can be readily extended. For example, the entanglement Hamiltonian of a CFT can always be written in terms of the boost operator [39,40], to which our methods may be applied. For critical one-dimensional systems, the ES for particular entanglement cuts corresponds to the spectra of boundary CFTs [41][42][43], which are also known [44][45][46]. We believe the methods developed in this paper can be straightforwardly extended to these CFTs.

APPENDIX A: TWO-DIMENSIONAL CONFORMAL FIELD THEORIES
In this Appendix, we summarize the important aspects of the two-dimensional CFTs relevant to the results presented in this paper. Detailed discussions of two-dimensional CFTs can be found in Refs. [1,2,47].
The Virasoro minimal models are the complete set of unitary CFTs with a finite number of irreducible representations under the Virasoro algebra; however, if a CFT is also invariant under a larger symmetry group, it may be an RCFT by having a finite number of irreducible representations under the extended symmetry algebra. This is the case for parafermionic models and superconformal minimal models, which contain conserved parafermionic and fermionic currents, respectively.
The two-dimensional CFTs we consider may be specified by a central charge, c, and a finite set of holomorphic and antiholomorphic fields, denoted by φ_{h_L}(z) and φ_{h_R}(z̄), respectively. Here, we use the complex coordinates z = x + it and z̄ = x − it to parametrize the two-dimensional coordinates (x, t). The numbers (h_L, h_R), called the conformal dimensions of the associated primary fields, are real numbers that are generically independent. With this data, it is known that the finite-size energy spectrum of a two-dimensional CFT (in units of 2π/L) is given by Eq. (1), where the lowest states correspond to primary fields and the higher states are known as descendants. However, the degeneracy of the states corresponding to primary operators and their descendants can be nontrivial [48].
We now review the degeneracy structure of the energy spectrum. In this paper, we simply present the result for the partition function and refer the reader to Ref. [1] for details. We consider an RCFT on a torus with complex-valued periods ω_1, ω_2 and define the modular parameter of the torus as τ = ω_2/ω_1. Then we can write the partition function of an RCFT on the torus as [1]

Z(τ) = \sum_{h_L, h_R} M_{h_L, h_R} \, χ_{h_L}(q) \, \bar{χ}_{h_R}(\bar{q}),

where

χ_h(q) = \mathrm{Tr}_h \, q^{L_0 - c/24}

are the so-called characters associated with a given primary operator φ_h. Here, M_{h_L,h_R} counts the number of occurrences of the primary φ_{h_L}(z) × φ_{h_R}(z̄) in the CFT, and we use the parametrization q = e^{2πiτ}. The reason for considering the partition function on the torus is to demand that Z(τ) be invariant under the modular transformations τ → τ + 1 and τ → −1/τ. This strongly constrains the structure of the spectrum investigated in the main body of the paper. It is believed that only modular-invariant CFTs can be realized by a one-dimensional quantum lattice model, although nonmodular-invariant CFTs may arise as boundaries of two-dimensional lattice theories with bulk topological order [49]. In the following, we discuss the form of χ_{h_L} for the CFT families we are interested in.

Virasoro minimal models
In Virasoro minimal models, the central charge of the Virasoro algebra takes values of the form [29]

c = 1 - \frac{6(p-q)^2}{pq},

where p, q are coprime integers with p, q ≥ 2. The allowed conformal dimensions of the (anti)holomorphic representations are

h_{r,s} = \frac{(pr - qs)^2 - (p-q)^2}{4pq},

where 1 ≤ r ≤ q − 1 and 1 ≤ s ≤ p − 1. The (p, q) and (q, p) models are the same. From the previous discussion, we know that the allowed values of (h_L, h_R) and their degeneracies can be inferred from the set of modular-invariant partition functions on the torus. The complete set of such partition functions has been entirely worked out for the unitary minimal models using the so-called ADE classification [50].
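As a quick numerical illustration of these formulas (exact arithmetic via Python fractions; the function names are ours):

```python
from fractions import Fraction

def central_charge(p, q):
    """c = 1 - 6(p-q)^2 / (pq) for the minimal model (p, q)."""
    return 1 - Fraction(6 * (p - q) ** 2, p * q)

def kac_dimension(p, q, r, s):
    """h_{r,s} = [(pr - qs)^2 - (p-q)^2] / (4pq)."""
    return Fraction((p * r - q * s) ** 2 - (p - q) ** 2, 4 * p * q)

# The (4, 3) minimal model is the Ising CFT: c = 1/2 with
# primary dimensions {0, 1/16, 1/2}
print(central_charge(4, 3))                  # 1/2
print({kac_dimension(4, 3, r, s)
       for r in (1, 2) for s in (1, 2, 3)})  # {0, 1/16, 1/2}
```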
As a definite example, we consider the Ising CFT (c = 1/2), for which there is only a single modular-invariant choice of operators. If one expands the partition function in terms of the parameters q = e^{2πiτ} and \bar{q} = e^{−2πi\bar{τ}}, the full energy spectrum and its degeneracies can be read off from the coefficients and powers of the expansion. For the Ising CFT, the partition function turns out to be diagonal, meaning one only allows fields of the form φ_{h_L}(z) × φ_{h_R}(z̄) with h_L = h_R, and we can read off the spectrum from the q dependence alone. Just giving the q dependence and keeping the first several terms, the expansion is

q^{c/12} Z_{\rm Ising} = 1 + q^{1/8} + q + 2q^{9/8} + 4q^2 + 3q^{17/8} + \cdots.

This expression should be read as follows: with energies measured with respect to the ground state and in units of 2πv/L, we have unique states with E = 0, 1/8, and 1, a twofold-degenerate state with E = 9/8, a fourfold-degenerate state with E = 2, a threefold-degenerate state with E = 17/8, etc. This explains how the energy-level structure of the Ising model given in the main text was obtained.

Z k parafermion CFTs
Z_k parafermion CFTs have a central charge given by c = 2(k − 1)/(k + 2) [29]. The conformal dimensions of the primary fields of these CFTs are

h^r_s = \frac{r(r+2)}{4(k+2)} - \frac{s^2}{4k},

where r = 0, 1, ..., k and s = −r + 2, −r + 4, ..., r. The associated characters can be written in terms of the so-called string functions c^r_s(q) and the Dedekind eta function η(q); we refer the reader to Ref. [51] for the explicit expressions. The partition function is then assembled from these characters as in the general expression above. One may show that the theories for k = 1, 2, 3 correspond to Virasoro minimal models, but for k ≥ 4 we obtain new RCFTs.
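As a quick exact-arithmetic check of these formulas (our own illustration): for k = 2 and k = 3 the parafermion dimensions reproduce the Ising spin field, h = 1/16, and the three-state Potts spin field, h = 1/15.

```python
from fractions import Fraction

def pf_central_charge(k):
    """Z_k parafermion central charge: c = 2(k-1)/(k+2)."""
    return Fraction(2 * (k - 1), k + 2)

def pf_dimension(k, r, s):
    """h^r_s = r(r+2)/(4(k+2)) - s^2/(4k)."""
    return Fraction(r * (r + 2), 4 * (k + 2)) - Fraction(s * s, 4 * k)

print(pf_central_charge(2))   # 1/2  (Ising)
print(pf_central_charge(3))   # 4/5  (three-state Potts)
print(pf_dimension(2, 1, 1))  # 1/16
print(pf_dimension(3, 2, 2))  # 1/15
```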
In this paper, we include parafermionic theories with k = 4, 5, 6, 7. The energy spectra of these models can be obtained from the q expansion of the corresponding partition function.

N = 1 superconformal minimal models
The N = 1 superconformal minimal models have central charge c = 3/2 − 12/[k(k + 2)], with k ≥ 2 an integer [29]. The conformal dimensions of the primary fields are

h_{r,s} = \frac{[(k+2)r - ks]^2 - 4}{8k(k+2)} + \frac{1 - (-1)^{r-s}}{32},

where 1 ≤ r ≤ k − 1 and 1 ≤ s ≤ k + 1. Fields with r + s even (odd) belong to the Neveu-Schwarz (Ramond) sector, for which the last term vanishes (equals 1/16). The characters and partition function for this case are much more involved, and we refer the readers to Refs. [52,53] for their explicit expressions.
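Again as an exact-arithmetic illustration (ours): the k = 3 superconformal minimal model has c = 7/10, the tricritical Ising point discussed in the main text.

```python
from fractions import Fraction

def sc_central_charge(k):
    """N=1 superconformal minimal model: c = 3/2 - 12/(k(k+2))."""
    return Fraction(3, 2) - Fraction(12, k * (k + 2))

def sc_dimension(k, r, s):
    """h_{r,s} with the Ramond-sector shift of 1/16 for r+s odd."""
    h = Fraction(((k + 2) * r - k * s) ** 2 - 4, 8 * k * (k + 2))
    if (r + s) % 2 == 1:
        h += Fraction(1, 16)
    return h

print(sc_central_charge(3))   # 7/10
print(sc_dimension(3, 1, 2))  # 3/80
print(sc_dimension(3, 1, 4))  # 7/16
```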

APPENDIX B: DETAILS OF THE NEURAL NETWORK ARCHITECTURES AND TRAINING
In this section, we present the details of the neural networks used in this paper and explain the training process.

Supervised learning with energy spectra
For the supervised-learning approach that led to the results shown in Figs. 2 and 3 in the main text, we use a multilayer perceptron whose input is the first 15 energy levels and whose output is the corresponding CFT class label from the 13 CFT classes in Table I in the main text. To train the network, we take samples of the energy spectra of different CFT classes and add a noise term drawn randomly from the uniform distribution on (−ε, ε). This is physically motivated by the existence of experimental measurement errors or subleading corrections to Eq. (1); it also serves as a form of data augmentation that can prevent overfitting [28]. We also preprocess the input such that the ground-state energy is set to zero and the other energies are rescaled so that the largest energy level is 1. This removes the contributions of the nonuniversal constants from the input data. We then optimize the categorical cross entropy over 3000 samples for each class with ε = 0.1. The optimization is performed using the Adam optimizer with the hyperparameters given in Ref. [54] over 2000 epochs with the batch size set to 128.
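The noise-augmentation and preprocessing steps described here can be sketched together as follows (a minimal illustration with our own function name; ε = 0.1 as in the text):

```python
import random

def make_training_sample(ideal_spectrum, eps=0.1):
    """Add uniform noise in (-eps, eps) to an ideal CFT spectrum,
    then preprocess: ground state to zero, largest level to one."""
    noisy = [e + random.uniform(-eps, eps) for e in ideal_spectrum]
    e0 = min(noisy)
    shifted = [e - e0 for e in noisy]
    top = max(shifted)
    return [e / top for e in shifted]

random.seed(1)
sample = make_training_sample([0.0, 0.125, 1.0, 1.125, 1.125])
print(min(sample), max(sample))  # 0.0 1.0
```

Repeating this for each class yields many distinct noisy samples per ideal spectrum, which is what prevents the classifier from memorizing the ideal levels.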

Supervised learning with Rényi entropies
For the results shown in Fig. 4 in the main text, we use a similar network architecture whose input dimension is n, the number of Rényi entropies we use. Similar to the classification of the energy spectra, we train the network by optimizing the cross entropy using the Adam optimizer [54], this time with 500 epochs and the batch size set to 128.

Unsupervised learning
For the autoencoder used to obtain the results shown in Fig. 5, we use an architecture whose bottleneck dimension is h, the dimension of the hidden variable ω. We train the network by optimizing C using the Adam optimizer, with 2000 epochs and the batch size equal to 256.

APPENDIX C: PREPROCESSING PROCEDURE
In this section, we discuss how to eliminate the nonuniversal constants E_0, E_1, L, and v. We refer to this procedure as preprocessing.

FIG. 6. We use a similar neural-network model as in Fig. 2(a) in the main text, but train with a different number of classes. The peak at h = 1 is stable across the different models.

From Eq. (1), we have the relation

E_i = E_0 L + E_1 + \frac{2\pi v}{L}\left(H_i - \frac{c}{12}\right),  (C1)

where H_i denotes the ith value of h_L + h_R + m_L + m_R. Notice that the lowest level, H_0, is zero for all CFTs investigated in this paper. Defining {X_0, X_1, ..., X_n} as our nonuniversal energies, i.e., the values of E_i in Eq. (C1), we form the energy differences from the ground state,

X_i - X_0 = \frac{2\pi v}{L}(H_i - H_0).

We then rescale the highest shifted energy to be 1. This gives a set of n preprocessed energies, {x_0, x_1, ..., x_n}, given by

x_i = \frac{X_i - X_0}{X_n - X_0} = \frac{H_i - H_0}{H_n - H_0}.

We see that x_i is independent of E_0, E_1, L, and v.
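The invariance of the preprocessed energies under changes of E_0, E_1, v, and L can be checked numerically (a sketch with our own function names and arbitrary parameter values):

```python
import math

def raw_spectrum(H_values, E0, E1, v, L, c):
    """Nonuniversal finite-size energies of Eq. (C1)."""
    return [E0 * L + E1 + (2 * math.pi * v / L) * (H - c / 12)
            for H in H_values]

def preprocess_energies(energies):
    """x_i = (X_i - X_0) / (X_n - X_0)."""
    x0 = energies[0]
    shifted = [e - x0 for e in energies]
    return [e / shifted[-1] for e in shifted]

H = [0.0, 0.125, 1.0, 1.125, 2.0]  # universal content only
a = preprocess_energies(raw_spectrum(H, E0=-1.3, E1=0.7, v=2.0, L=20, c=0.5))
b = preprocess_energies(raw_spectrum(H, E0=4.2, E1=-0.1, v=0.5, L=36, c=0.5))
print(all(abs(x - y) < 1e-12 for x, y in zip(a, b)))  # True
```

Both parameter sets yield x_i = H_i / H_n, confirming that only the universal data survive preprocessing.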

APPENDIX D: COMPARISON OF MODELS TRAINED USING DIFFERENT CLASSES
We train our neural network (with the same architecture) using the same spectral data, but change the number of classes in the training set. The rest of the parameters are kept the same as those used in Fig. 2(a) in the main text. In Fig. 6, we observe that the critical point (h = 1) is visible in all cases and is stable. The less significant peaks, however, are not stable.

APPENDIX E: COMPARISON WITH THE CONVENTIONAL METHOD
Here we review two conventional methods (entanglement-entropy scaling with system size and rescaled-energy comparison) for extracting the central charge from simulation or experimental data. These methods can be used separately or in combination with each other.
The first scheme relies on the entanglement-entropy scaling of the ground state. Specifically, it has been shown that the entanglement entropy S of a subsystem of size ℓ in the ground state at (1+1)-dimensional conformal critical points scales universally as [26,55,56]

S = \frac{c}{3} \ln\left[\frac{L}{\pi a} \sin\left(\frac{\pi \ell}{L}\right)\right] + C_A,

where c is the central charge, L is the length of the spin chain, a is the lattice constant, and C_A is a nonuniversal constant. Therefore, by finding the ground state, e.g., using numerical methods, and examining S at different values of L for a subsystem at a fixed relative length ℓ/L, one can obtain the central charge c from the slope of the linear-log plot of S versus L, as shown in Fig. 7(a). After extracting the central charge, we can find the corresponding CFT class by visually comparing the given energy spectrum with the CFT prediction of Eq. (C1). We describe this approach in more detail in the following.
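A sketch of extracting c from this slope (synthetic data generated from the scaling formula itself, so the fit must recover the input value; all names are ours):

```python
import math

def entropy(L, c=0.5, a=1.0, const=0.4, ratio=0.5):
    """Ground-state entanglement entropy at a fixed subsystem
    fraction l/L, following the universal scaling form above."""
    return (c / 3.0) * math.log((L / (math.pi * a))
                                * math.sin(math.pi * ratio)) + const

# At fixed l/L, S = (c/3) ln L + const', so a least-squares fit of
# S against ln L has slope c/3
sizes = [8, 12, 16, 20, 24, 28, 32]
xs = [math.log(L) for L in sizes]
ys = [entropy(L) for L in sizes]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
print(3 * slope)  # ≈ 0.5, the Ising central charge
```

With real numerical or experimental data, the same fit is applied to the measured S(L), and subleading corrections limit the accuracy at small L.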
To visually compare the spectral data with the CFT prediction, one first needs to rescale the data. We follow Ref. [29] and rescale the energies such that the energy of the first excitation in the CFT matches that of the data. As an example, in Fig. 7(b), we compare the spectral data of H_T = 2H_I(1) + λH_3 at λ ≈ 0.856, shown as gray lines, with various CFT classes, shown as colored dots. As shown in the main text, this model is described by the minimal model (A_4, A_3) (CFT class 1 in Fig. 7). Moreover, the central-charge scaling can correctly identify this model, as there is no other CFT with c = 7/10 among the CFT classes we considered. However, we observe that, in the absence of the central-charge value, it is not straightforward to find the correct CFT by visually inspecting Fig. 7(b). In contrast, we use a different rescaling method and, with the machine-learning approach, can quantify the similarity of the data to different CFT classes and identify the correct one (as shown in Figs. 2 and 3 in the main text) using the energy spectrum at a fixed system size.