Deep learning of topological phase transitions from entanglement aspects

The one-dimensional $p$-wave superconductor proposed by Kitaev has long been a classic example for understanding topological phase transitions through various methods, such as examining the Berry phase, the edge states of open chains and, in particular, aspects of the quantum entanglement of ground states. In order to understand the amount of information carried in the entanglement-related quantities, here we study topological phase transitions of the model with emphasis on the deep learning approach. We feed different quantities, including Majorana correlation matrices (MCMs), entanglement spectra (ES) or entanglement eigenvectors (EE) originating from block correlation matrices (BCMs), into deep neural networks for training, and investigate which one is the most useful input format in this approach. We find that the ES is indeed too compressed a representation compared to the MCM or EE. The MCM and EE provide abundant information to recognize not only the topological phase transitions in the model but also phases of matter with different $U(1)$ gauges, which is not achievable using the ES alone.

Introduction - Going beyond the Ginzburg-Landau theory of phase transitions [1], a topological phase transition (TPT) can occur even when no symmetry is broken in a physical system. Since the discovery of the integer quantum Hall effect [2], the very first example of a TPT, various materials potentially exhibiting TPTs have been proposed in recent years, although only some of them have been confirmed experimentally [3,4]. Among them, the one-dimensional topological p-wave superconductor proposed by Kitaev [5] has become one of the most interesting proposals because its edge modes can be viewed as "Majorana fermions", particles that are their own anti-particles. In particular, they are essential components of practical fault-tolerant quantum computers [6]. To realize such a superconducting state, clever combinations of topological materials or semiconductors with an ordinary s-wave superconductor through the proximity effect have been proposed [7]. In the past few years, Majorana fermions/zero modes have been claimed to be found at the edges of certain systems, either directly by observing STM images [8] or indirectly by measuring the 4π periodicity of Josephson junction currents [9].
In fact, Majorana zero modes can alternatively be detected by measuring quantum entanglement, a concept from quantum information. For instance, the von Neumann entanglement entropy of a subsystem A embedded in an environment B can be obtained from the density matrix after tracing out the degrees of freedom in B, i.e., the sub-entropy $S_A = -\mathrm{Tr}\,\rho_A \log_2 \rho_A$ with the reduced density matrix $\rho_A = \mathrm{Tr}_B |\Psi_{A\cup B}\rangle\langle\Psi_{A\cup B}|$ [10-14]. To reveal the topological nature of the system, one can further analyze the eigenvalues of the entanglement Hamiltonian deduced from $\rho_A$: the presence of Majorana zero (edge) modes can then be inferred from a corresponding degeneracy in the entanglement spectrum. In other words, the spectrum preserves topological information. Moreover, for any system with a quadratic Hamiltonian, the whole computation simplifies to finding the eigenvalues of the correlation function matrix, also known as the one-particle entanglement spectrum [15,16]. Majorana zero modes then show up as doubly degenerate eigenvalues, 1/2, in the spectrum [16]. This example reflects the power of entanglement-related quantities as good topological indicators, but computing them can be time-consuming and measuring entanglement is experimentally difficult. Hence, a more efficient tool for identifying TPTs might be necessary.
Machine learning (ML) has recently become a rapidly growing field of computer science owing to the availability of large-scale datasets and advances in computation hardware. Its applications have become ubiquitous in our daily life, from automated machine translation, vision and speech recognition, and matching news items, to email spam filters and so on [17]. By feeding in a large amount of data (or "features"), ML algorithms can "learn" to condense them into a more accessible/meaningful form, such as distinguishable classes or patterns. In particular, a neural-network-based learning method called deep learning (DL) is composed of several simple but non-linear modules and is able to effectively learn suitable representations from complex raw data and distill the essential information. Therefore, as a straightforward application to quantum matter, DL has so far been employed by physicists to classify different phases of matter and identify phase boundaries [18-22]. Moreover, the remarkable capability of this method is also shown in considering topological [23,24] or out-of-equilibrium systems [25], where no obvious local order parameter is available.
Despite recent progress in using DL for identifying phase transitions, many studies are based on straightforward wave functions (with or without certain manipulations) as the input data for learning, while relatively few are based on the aspect of quantum information. Since quantum information is also known to be useful when no local order parameter is available in a system, one pioneering work takes the entanglement spectrum (ES), used to compress the ground state information, as the input data and trains a neural network to distinguish the topological phase from the trivial one [18]. However, one should ask whether the ES is too compressed for a given quantum system and whether there are better ways to represent quantum information for the purpose of the DL technique. Therefore, in this work we study topological phase transitions in the 1D p-SC via the DL approach and systematically examine which quantum-information-related quantities could better represent the features of a quantum system, and hence serve as better inputs for DL. Concretely, we find that the block correlation matrix in the Majorana representation not only pinpoints the phase boundaries but can even offer extra information beyond the ES.
Model - The p-wave superconducting system of spinless fermions [5] in one dimension (1D) is described by the Hamiltonian

$$H = \sum_j \left[ -t\left(c_j^\dagger c_{j+1} + c_{j+1}^\dagger c_j\right) + \Delta\left(c_j c_{j+1} + c_{j+1}^\dagger c_j^\dagger\right) - \mu\left(c_j^\dagger c_j - \tfrac{1}{2}\right)\right], \qquad (1)$$

where t is the nearest-neighbor hopping amplitude, ∆ is the superconducting pairing potential, and µ represents the on-site chemical potential. With translational invariance, Eq. (1) can be rewritten as

$$H = \frac{1}{2}\sum_k \Psi_k^\dagger\, \vec{d}(k)\cdot\vec{\sigma}\, \Psi_k,$$

where the Pauli matrices $\vec{\sigma} = (\sigma_x, \sigma_y, \sigma_z)$, $\Psi_k = (c_k, c_{-k}^\dagger)^T$, and $\vec{d}(k) = \left(0,\; 2\Delta\sin k,\; -2t\cos k - \mu\right)$. Note that this system preserves particle-hole symmetry, while it breaks time-reversal symmetry and hence chiral symmetry; therefore it belongs to class D in the ten-fold way classification of symmetry-protected topological systems [26]. It can be characterized by a $Z_2$ topological invariant.
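The bulk dispersion of the model, $E(k) = \sqrt{(2t\cos k + \mu)^2 + (2\Delta\sin k)^2}$, closes its gap only at µ = ±2t, which marks the topological phase transitions. A quick numerical scan confirms this; the following is a minimal sketch with our own function name, not part of the original work:

```python
import numpy as np

def bulk_gap(t, delta, mu, nk=2001):
    """Minimum of the BdG dispersion E(k) over the Brillouin zone."""
    k = np.linspace(-np.pi, np.pi, nk)  # grid includes k = 0 and k = +-pi
    E = np.sqrt((2 * t * np.cos(k) + mu) ** 2 + (2 * delta * np.sin(k)) ** 2)
    return E.min()

# Gap closes at mu = -2t (at k = 0) and mu = +2t (at k = pi);
# deep inside a phase the spectrum stays gapped.
gap_critical = bulk_gap(1.0, 1.0, -2.0)
gap_topo = bulk_gap(1.0, 1.0, 0.0)
gap_trivial = bulk_gap(1.0, 1.0, -3.0)
```

At the sweet spot t = ∆, µ = 0 the dispersion is flat, E(k) = 2t, so the gap is exactly 2t there.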
To make the physics more transparent, we define Majorana operators $d_{2j-1} = c_j + c_j^\dagger$ and $d_{2j} = -i\left(c_j - c_j^\dagger\right)$, and Eq. (1) becomes

$$H = \frac{i}{2}\sum_j \left[ -\mu\, d_{2j-1} d_{2j} + (t+\Delta)\, d_{2j} d_{2j+1} + (\Delta - t)\, d_{2j-1} d_{2j+2} \right].$$

As shown in Fig. 1(a), when |µ| > 2t the system Hamiltonian can be adiabatically connected to the form $H = -\frac{i\mu}{2}\sum_j d_{2j-1} d_{2j}$, where t = |∆| = 0 and µ < 0. It is then straightforward to see that the ground state is composed of Majorana fermions paired at the same site, resulting in no Majorana edge modes and hence topologically trivial phases (phases III and IV). On the other hand, when |µ| < 2t, H is adiabatically connected to the special case $H = it\sum_j d_{2j} d_{2j+1}$, where t = |∆| > 0 and µ = 0: most Majorana fermions from neighboring sites are paired together while the system leaves the edge Majorana modes alone (unpaired), thus corresponding to the nontrivial phases (phases I and II).
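The Majorana rewriting can be checked numerically: the single-particle spectrum of the antisymmetric Majorana coupling matrix must coincide with that of the real-space BdG matrix. Below is a sketch under our own conventions (periodic boundary conditions, $d_{2j-1} = c_j + c_j^\dagger$, $d_{2j} = -i(c_j - c_j^\dagger)$, and $H = \frac{i}{4}\sum_{ab} A_{ab} d_a d_b$); function names are ours:

```python
import numpy as np

def bdg_matrix(N, t, delta, mu):
    """Real-space BdG matrix in the (c_1..c_N, c^dag_1..c^dag_N) basis, PBC."""
    h = -mu * np.eye(N)
    P = np.zeros((N, N))
    for j in range(N):
        k = (j + 1) % N
        h[j, k] += -t
        h[k, j] += -t
        P[k, j] += delta   # from the Delta c^dag_{j+1} c^dag_j pairing term
        P[j, k] += -delta
    return np.block([[h, P], [-P, -h]])

def majorana_matrix(N, t, delta, mu):
    """Antisymmetric A with H = (i/4) sum_ab A_ab d_a d_b, PBC.
    0-based index 2j is d_{2j-1}, index 2j+1 is d_{2j} of site j."""
    A = np.zeros((2 * N, 2 * N))
    def add(a, b, val):
        A[a, b] += val
        A[b, a] -= val
    for j in range(N):
        jp = (j + 1) % N
        add(2 * j, 2 * j + 1, -mu)         # on-site: -mu d_{2j-1} d_{2j}
        add(2 * j + 1, 2 * jp, t + delta)  # bond: (t+Delta) d_{2j} d_{2j+1}
        add(2 * j, 2 * jp + 1, delta - t)  # bond: (Delta-t) d_{2j-1} d_{2j+2}
    return A

# The two single-particle spectra should agree (eigenvalues of iA are real
# since A is real antisymmetric).
e_bdg = np.sort(np.linalg.eigvalsh(bdg_matrix(8, 1.0, 0.7, 0.5)))
e_maj = np.sort(np.linalg.eigvalsh(1j * majorana_matrix(8, 1.0, 0.7, 0.5)))
```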
As mentioned in the introduction, quantum information of the system can serve as a useful diagnostic for topological phase transitions; in particular, the entanglement correlations should encode the whole information about the state of interest. In this paper, we consider two common correlators. First, by separating the system into blocks A and B [see Fig. 1(b)] and working in terms of Majorana operators, the Majorana correlation matrix (MCM) between Majorana fermions at different sites within the subsystem A can be defined as $\mathrm{MCM}_{i,j} \equiv i\,\mathrm{Tr}\left(\rho_0\, d_{2i-1} d_{2j}\right)$, where $\rho_0$ represents the density matrix of the ground state. The other correlators are either the conjugate of this one or proportional to the trivial identity when the system Hamiltonian is quadratic. Second, the block correlation matrix (BCM) for subsystem A is defined as $\mathrm{BCM}_{i,j} = \mathrm{Tr}\left(\rho_0\, \hat{c}_i \hat{c}_j^\dagger\right)$ with $\hat{c}_i \equiv (c_i, c_i^\dagger)^T$ and i, j being sites of the finite block A. This matrix is intimately connected to a more familiar quantity, the reduced density matrix of block A, $\rho_A = \bigotimes_m \left[\lambda_m |0_m\rangle\langle 0_m| + (1-\lambda_m) |1_m\rangle\langle 1_m|\right]$, where the $\lambda_m$ are simply the eigenvalues of the BCM (coming in pairs $\lambda_m$, $1-\lambda_m$), also known as the one-particle entanglement spectrum (OPES). Therefore, the eigenvalues of the BCM and their corresponding eigenvectors can both be considered as potential representations of the quantum information in the system for our deep learning purpose.

DL-based approach - In order to correlate the phases outlined in the previous section with various quantum-information-inspired input representations, we employ convolutional neural networks (CNNs), nonlinear functions particularly designed for efficiently recognizing patterns in image-type data [27,28]. As we explain below in more detail, our choice of CNNs is based on a natural interpretation of the input representations as "images".
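Because the Hamiltonian is quadratic, the BCM of a block can be computed directly from the BdG ground state as a projector onto positive-energy quasiparticle states. The following is a minimal sketch (assuming periodic boundary conditions; function name ours) that also checks the textbook result quoted earlier: at the sweet spot t = ∆, µ = 0, the OPES of a block contains exactly two eigenvalues 1/2, giving entanglement entropy 1:

```python
import numpy as np

def block_correlation_matrix(N, L, t, delta, mu):
    """BCM over a block A of L sites in a periodic N-site Kitaev chain,
    built from the BdG ground state as <Psi_a Psi_b^dag>."""
    h = -mu * np.eye(N)
    P = np.zeros((N, N))
    for j in range(N):
        k = (j + 1) % N
        h[j, k] += -t
        h[k, j] += -t
        P[k, j] += delta
        P[j, k] += -delta
    HBdG = np.block([[h, P], [-P, -h]])
    w, V = np.linalg.eigh(HBdG)
    Vp = V[:, w > 0]                       # positive-energy eigenvectors
    Pplus = Vp @ Vp.conj().T               # ground-state correlation projector
    idx = np.concatenate([np.arange(L), N + np.arange(L)])
    return Pplus[np.ix_(idx, idx)]         # restrict to block A (Nambu doubled)

# OPES at the sweet spot: two entanglement zero modes at 1/2, rest 0 or 1.
opes = np.sort(np.linalg.eigvalsh(block_correlation_matrix(40, 10, 1.0, 1.0, 0.0))).real
lam = opes[opes > 1e-12]
entropy = float(-(lam * np.log2(lam)).sum())   # von Neumann entropy of block A
```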
Using Keras [29], we build a deep CNN architecture as depicted schematically in Fig. 1(c). The architecture is composed of two main parts: convolutional layers followed by fully-connected neural networks. The convolution part processes the input data by two consecutive convolutional layers, both with filters of kernel size 3 × 3 and rectified linear unit (ReLU) activation functions. The number of filters (depth) is 64 for the first layer and 128 for the second one. We do not insert pooling layers here, to avoid missing subtle information due to our small image size (usually smaller than 100 × 100). After convolution, the processed data is fed into a classifier made of a fully-connected, ReLU-activated layer with 512 neurons and a four-neuron, fully-connected softmax layer. The final outputs after softmax activation sum to unity and can thus be interpreted as the probabilities that the input data belongs to each of the four phases shown in Fig. 1(a).
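As a sanity check on the sizes involved, the shape and parameter bookkeeping of the described architecture (two "valid" 3 × 3 convolutions with 64 and 128 filters, a 512-neuron dense layer, and a 4-way softmax) can be traced in plain Python for a 20 × 20 single-channel input. The helper names are ours; the counts follow the standard conventions for convolution and dense layers:

```python
def conv2d_shape(h, w, c_in, filters, k=3):
    """'Valid' k x k convolution: output spatial size and parameter count."""
    params = (k * k * c_in + 1) * filters   # weights plus one bias per filter
    return h - k + 1, w - k + 1, filters, params

def dense_params(n_in, n_out):
    """Fully-connected layer parameter count (weights plus biases)."""
    return n_in * n_out + n_out

# Trace a 20 x 20 single-channel "image" through the network.
h, w, c, p1 = conv2d_shape(20, 20, 1, 64)    # first conv:  18 x 18 x 64
h, w, c, p2 = conv2d_shape(h, w, c, 128)     # second conv: 16 x 16 x 128
flat = h * w * c                             # flattened feature vector
p3 = dense_params(flat, 512)                 # 512-neuron hidden layer
p4 = dense_params(512, 4)                    # 4-way softmax classifier
total = p1 + p2 + p3 + p4
```

Most parameters sit in the dense layer after flattening, which is one practical reason the input "images" are kept small.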
At the supervised training stage, we train the CNN on a dataset composed of $O(10^4)$ representative "images" in "gray scale" (typically of 20 × 20 or 40 × 40 pixels), generated around six µ values (in a window of width 0.05t) corresponding to the four possible phases deep inside the phase diagram [see Figs. 1(a) and 3]. Note that we collect "image" data in each topological phase around two different µ values for training, to inform the model about the two possible phase boundaries. Setting the train-validation split ratio to 0.2, the optimization of our model is performed by the ADAM algorithm [30] at learning rate $10^{-3}$ with cross entropy as the loss function. Typically, after training over 15 epochs both training and validation losses are less than $10^{-6}$, indicating that the resulting model is reliable [see Fig. 2(b)]. Once training is done, at the inference stage we fix all parameters in the trained model and feed in new data for prediction.
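The labeling step can be sketched as follows. The phase-assignment rule is read off from Fig. 1(a) (phases I/II for |µ| < 2t, distinguished by the sign of ∆; phase III for µ > 2t; phase IV for µ < −2t), while the specific sample centers below are merely illustrative, not the ones used in the paper:

```python
import numpy as np

def phase_label(mu, delta, t=1.0):
    """Supervised label from the Fig. 1(a) phase diagram."""
    if mu < -2 * t:
        return "IV"
    if mu > 2 * t:
        return "III"
    return "I" if delta > 0 else "II"

# Draw samples in narrow windows of width 0.05t around representative
# mu values deep inside each phase (illustrative centers).
rng = np.random.default_rng(0)
centers = [-2.5, -1.0, 1.0, 2.5]
samples = [(c + 0.05 * (rng.random() - 0.5), 1.0) for c in centers]
labels = [phase_label(mu, d) for mu, d in samples]
```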
Results - We now take the aforementioned deep learning approach to study the topological phase transitions occurring in the 1D p-SC and examine various quantum-information-inspired input features, in order to provide a better compressed representation than the naive ground state wave function. We first prepare the training input "images" with labels by calculating a thousand MCMs with subsystem size L (block A), under periodic boundary conditions of the full system, around a given µ for each phase. Each MCM can be viewed as an L × L "image" in one (gray) channel, with the entries of the MCM representing pixel values. For simplicity, we fix ∆ = t in the training set (except for phase II, where ∆ = −t). After training, we find that the model easily learns how to distinguish the different phases for the given dataset and is ready to generalize to unseen data points.
As shown in Fig. 2(a), the neuron output corresponding to phase IV goes from probability 1 at µ/t = −2.5 to 0 at µ/t = −1.5. This curve crosses, at µ*(L = 20)/t ≈ −2.01, the curve corresponding to phase I, which behaves in just the opposite way, indicating that our CNN model indeed recognizes a phase transition. In addition, the curves corresponding to the other phases are never activated here. Similarly, at µ*(L = 20)/t ≈ 2.01 the two curves corresponding to phases I and III also cross each other, again suggesting the occurrence of another phase transition (not shown).
Note that for a given finite size L, µ*(L) is identified as the point where both crossing curves have equal probability 0.5, i.e., the point at which our trained model is unable to distinguish between the two phases. The non-abruptness of the phase transition seen in the probability curves is due to the finite-size effect. As one can see in Fig. 2(c), the transition region becomes sharper as L grows, and the finite-size trend of µ*(L)/t approaches µ*/t = −2 in the thermodynamic limit L → ∞. Alternatively, we next consider taking the BCM as our training input. For a finite subsystem A of size L, there are at least three ways to represent each BCM (now of size 2L × 2L due to the Nambu notation) as an "image": (i) view the BCM itself as a "gray image"; (ii) arrange all eigenvalues of the BCM in ascending order into a diagonal matrix, which can be viewed as an "image"; (iii) diagonalize the BCM and then arrange each eigenvector as one of the columns of a new matrix M. M is again of size 2L × 2L and can be viewed as a "gray image". In the DL approach, each way of representing the ground state of interest encodes a different level of entanglement information, likely leading to a distinguishable ability in recognizing different phases, which is what we want to examine now. By the definition of the MCM, way (i) of treating the BCM should behave similarly to the MCM case, and hence we do not repeat it here but focus only on the latter two ways.
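The three encodings can be sketched generically for any Hermitian matrix standing in for a BCM; the function name is ours, and a random symmetric matrix is used as a stand-in for a physical BCM:

```python
import numpy as np

def bcm_images(bcm):
    """Three 'image' encodings of a Hermitian BCM:
    (i) the matrix itself, (ii) sorted eigenvalues on a diagonal,
    (iii) the matrix M whose columns are the eigenvectors."""
    w, V = np.linalg.eigh(bcm)          # eigenvalues in ascending order
    img_raw = bcm.real                  # way (i): gray image of the matrix
    img_es = np.diag(np.sort(w))        # way (ii): diagonal spectrum image
    img_ev = V.real                     # way (iii): eigenvector columns
    return img_raw, img_es, img_ev

# Stand-in Hermitian matrix in place of a physical BCM.
rng = np.random.default_rng(1)
X = rng.random((8, 8))
H = (X + X.T) / 2
a, b, c = bcm_images(H)
```

Way (ii) keeps only the spectrum, while way (iii) keeps the full eigenbasis; this difference in retained information is exactly what the comparison below probes.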
In fact, in way (ii), utilizing the diagonal matrices built from all eigenvalues of the BCMs as input "2D images" is not efficient, because the off-diagonal part contains no information (like a black background). Therefore, we simply feed all eigenvalues of a BCM, namely the entanglement spectrum, as a 1D input to a simple feed-forward neural network composed of 3 consecutive fully-connected layers (with ReLU activation) having 32, 64, and 256 neurons, respectively, followed by a 4-neuron dense layer with softmax activation as the final output. Taking the same training procedure (skipping phase II training samples) and waiting until the training and validation losses converge, we show the final predictions for unseen data points at ∆/t = 1, L = 20 as a function of µ in Fig. 3. It is clear that phase I is recognized very well, while the outputs corresponding to phases III and IV are not: the predicted probabilities are 0.5 and 0.5, respectively, indicating that these two phases confuse the network due to their similar eigen-spectra. If one examines phases III and IV carefully, one sees that they indeed belong to the same superconducting phase, but very likely with different U(1) gauges. The overall results for this case thus suggest that the representation using the entanglement spectrum as input compresses too much quantum information and may not be effective for determining the global phase diagram via the DL approach.
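A minimal numpy sketch of such a feed-forward classifier (untrained random weights, just to fix the shapes and show that the softmax outputs behave as probabilities; all names are ours):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())   # shift for numerical stability
    return e / e.sum()

def mlp_forward(es, weights):
    """Forward pass: ReLU dense layers, then a 4-way softmax output."""
    h = es
    for W, b in weights[:-1]:
        h = relu(W @ h + b)
    W, b = weights[-1]
    return softmax(W @ h + b)

# Layer widths as described: input of 2L = 40 eigenvalues (L = 20),
# hidden layers of 32, 64, 256 neurons, 4 output classes.
rng = np.random.default_rng(2)
sizes = [40, 32, 64, 256, 4]
weights = [(rng.normal(size=(m, n)), rng.normal(size=m))
           for n, m in zip(sizes[:-1], sizes[1:])]
probs = mlp_forward(rng.random(40), weights)
```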
To retain more quantum information, we finally investigate way (iii) to prepare the input dataset for training the CNN model mentioned before. Once training is complete, Figs. 4(a) and 4(b) depict the model predictions for the unseen data points as a function of µ at ∆/t = 1, L = 20. In the case of Fig. 4(a), we make each input "image" by including all eigenvectors of each BCM, while in the case of Fig. 4(b) we include only the two middlemost eigenvectors. Clearly, both cases show the ability of the CNN model to recognize the global phase diagram regardless of the different U(1) phases that phases III and IV may take, and similar results are also observed when the unseen data points vary along ∆ with fixed |µ| ≤ 2 (i.e., phases I and II). One notices, however, that the case of Fig. 4(a) is much worse at determining the phase boundaries than that of Fig. 4(b).
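Selecting the "middlemost" eigenvectors amounts to picking those whose OPES eigenvalues lie closest to 1/2, i.e., the entanglement zero-mode candidates. A sketch with a stand-in diagonal BCM (function name ours):

```python
import numpy as np

def middlemost_eigenvectors(bcm, n=2):
    """Return the n eigenvalues of a Hermitian BCM closest to 1/2
    and their eigenvectors (candidates for entanglement end modes)."""
    w, V = np.linalg.eigh(bcm)
    order = np.argsort(np.abs(w - 0.5))
    return w[order[:n]], V[:, order[:n]]

# Stand-in spectrum: a dimerized-limit BCM with two exact 1/2 modes.
lam = np.array([0.0, 0.0, 0.5, 0.5, 1.0, 1.0])
vals, vecs = middlemost_eigenvectors(np.diag(lam))
```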
The underlying reason behind this result can be explained as follows. Let us examine some selected input "images" made of all eigenvectors at µ/t = −2.5, −0.625, 0.625, 2.5, respectively, in Fig. 4(c). The essential feature distinguishing a nontrivial topological phase from the trivial one is the presence of end modes (with entanglement eigenvalue 0.5) in the middlemost region of the figure along x. Importantly, this indicates that our model can differentiate the end modes (µ/t = −0.625, 0.625 cases) from the boundary eigenstates caused by the finiteness of the subsystem A (µ/t = −2.5, 2.5 cases), which is difficult by eye. However, when all eigenvectors are taken to make the "images", the bulk eigenvectors occupy a large portion of each image, which may force our model to pay more attention to this portion and hence weaken its ability to determine the phase boundaries.
Discussion and conclusion - There is a compelling reason for an independent check of our aforementioned results: comparing with those from the seemingly unrelated 1D transverse-field Ising model. In fact, the transverse-field Ising model in a magnetic field λ, written as $H_{tI} = -\sum_i \sigma^x_i \sigma^x_{i+1} - \lambda \sum_i \sigma^z_i$ with Pauli matrices $\sigma^\alpha$, can be transformed into the 1D p-wave superconductor with t = −∆ = 1 and µ = 2λ [see Eq. (1)] by a non-local Jordan-Wigner transformation. Given the correlation function matrices, the entanglement spectra, or the entanglement eigenstates as possible forms of input, the transition point λ = 1 still stands out via the proposed deep learning approach, which distinguishes the ferromagnetic phase (λ < 1) from the paramagnetic one (λ > 1). This justifies the effectiveness of using quantum information (in particular, entanglement aspects) to encode quantum phases in the deep learning process.
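The mapping can be verified numerically for a small open chain by comparing the exact spin-model ground energy with the free-fermion result from the corresponding BdG matrix. This is a sketch under our own Jordan-Wigner conventions ($\sigma^z_i = 1 - 2c_i^\dagger c_i$, so the precise signs of the fermionic parameters relative to Eq. (1) depend on convention); function names are ours:

```python
import numpy as np

def tfim_ground_energy(N, lam):
    """Exact diagonalization of H = -sum sx.sx - lam sum sz, open chain."""
    sx = np.array([[0, 1], [1, 0]], float)
    sz = np.array([[1, 0], [0, -1]], float)
    I2 = np.eye(2)
    def op(o, i):
        m = np.array([[1.0]])
        for j in range(N):
            m = np.kron(m, o if j == i else I2)
        return m
    H = np.zeros((2 ** N, 2 ** N))
    for i in range(N - 1):
        H -= op(sx, i) @ op(sx, i + 1)
    for i in range(N):
        H -= lam * op(sz, i)
    return np.linalg.eigvalsh(H)[0]

def kitaev_ground_energy(N, lam):
    """Jordan-Wigner image of the open TFIM: unit hopping and p-wave
    pairing, on-site term 2*lam. Ground energy = -(1/2) sum_{E>0} E
    (the additive constants from the mapping cancel)."""
    h = 2 * lam * np.eye(N)
    P = np.zeros((N, N))
    for i in range(N - 1):
        h[i, i + 1] = h[i + 1, i] = -1.0
        P[i + 1, i], P[i, i + 1] = 1.0, -1.0
    HBdG = np.block([[h, P], [-P, -h]])
    E = np.linalg.eigvalsh(HBdG)
    return -0.5 * E[E > 0].sum()
```

Both sides of the mapping should agree to machine precision for any λ and chain length, on either side of the transition at λ = 1.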
Moreover, among the quantum-information-related quantities used in this study, since a 2N × 2N BCM has N redundant variables, we notice that using an N × N MCM is sufficient for recognizing the topological phase transitions. The fact that a relatively small MCM is already effective in our approach reflects the nature of BCMs, which indeed include the essential quantum information of the infinite chain in its ground state.
In summary, we demonstrate how to adopt a deep learning approach assisted by entanglement aspects to discover topological phase transitions. Several quantum-information-related quantities, such as the MCM, ES or EE (from the BCM), are fed into deep neural networks for training. While the ES can only locate the phase transition points, the MCM and EE contain abundant information to find not only the critical points but also phases of matter with different U(1) gauges. Our work emphasizes utilizing quantum information, instead of naive wave functions, as input in the deep learning approach, and it may prove useful in higher-dimensional systems as well.

FIG. 1: (Color online) (a) Topological phase diagram of the 1D p-wave superconductor, where the chain-like insets show schematically the Majorana representation for each phase as described in the main text. These indicate that phases I and II are topological, while the others are not. (b) The infinite system is divided into a finite subsystem A with L sites and an environment B. (c) Schematic illustration of the convolutional neural network used in this work.

FIG. 2: (Color online) (a) Each neuron output of the final softmax layer, corresponding to the probability of each phase, as a function of µ/t (unseen data) with ∆/t = 1, L = 20. Although the training sets from MCMs are far beyond the µ/t region shown here, the CNN can still recognize a topological phase transition near −2.0. The dashed line indicates the theoretical value of the transition. (b) The validation loss follows the trend of the training loss well, suggesting that no overfitting occurred. (c) The transition step becomes sharper as L grows, showing the finite-size effect of the subsystem A.

FIG. 3: (Color online) The neuron output "phase diagram" as a function of µ/t with ∆/t = 1, L = 20. The training sets from the eigen-spectra of BCMs are prepared around µ/t = −13, −0.3, 0.3, 13 within a window of width 0.05 (red dots). The phase boundaries are clearly recognized for unseen data points, but the probabilities corresponding to phases III and IV are equal at |µ/t| > 2, suggesting that they confuse the neural network.

FIG. 4: (Color online) (a) The neuron output "phase diagram" as a function of µ/t with ∆/t = 1, L = 20. The training sets from the whole set of eigenvectors of the BCMs are prepared in the same µ/t regions mentioned in Fig. 3 (red dots). The phase boundaries can still be recognized for the unseen data points, but with some deviation from the exact values. (b) Similar diagram as in (a), but with only the two middlemost eigenvectors making up the input "images". The phase boundaries show up sharply. (c) Representative input "images" for the CNN at given µ/t values in case (a). Boundary modes or topology-induced end modes are present at the middle-top and middle-bottom parts of each image.