Unsupervised learning using topological data augmentation

Unsupervised machine learning is a cornerstone of artificial intelligence, as it provides algorithms capable of learning tasks, such as classification of data, without explicit human assistance. We present an unsupervised deep learning protocol for finding topological indices of quantum systems. The core of the proposed scheme is a ‘topological data augmentation’ procedure that uses seed objects to generate ensembles of topologically equivalent data. Such data, assigned dummy labels, can then be used to train a neural-network classifier that sorts arbitrary objects into topological equivalence classes. Our protocol is explicitly illustrated on two-band insulators in 1d and 2d, characterized by a winding number and a Chern number, respectively. By using the augmentation technique in the classification step as well, we can achieve accuracy arbitrarily close to 100%, even for objects with indices outside the training regime.

Neural-network-based approaches typically require external supervision and labeled data before they become capable of predicting relevant features, which severely limits their applicability. Unsupervised methods are much scarcer but have greater potential.
By now several unsupervised ML tools have been put forward, such as principal component analysis [38-41], variational autoencoders [40-42], self-organizing maps [43], and advanced clustering algorithms [44,45]. One particular method, learning by confusion [1], has recently been formulated by van Nieuwenburg et al. and shown to act as a universal feature extractor. The limitation of this approach lies in the need for human assistance in interpreting the results: the extracted features may not be physically interesting, or some of the relevant features may be missed altogether. In this paper, as explained below, we advance the learning-by-confusion approach to predicting topological quantum state equivalences.
Topologically nontrivial phases of matter have been at the forefront of active research in condensed matter physics for several years now [46,47], with many questions still open. In this paper we apply unsupervised ML to the task of detecting topological quantum phase transitions. We put a novel twist on the conventional training procedure, inspired by the data augmentation techniques widely applied in image recognition for expanding datasets [48].
The idea is to employ a specially designed data augmentation procedure that preserves the essential topological features while erasing all others: the training is performed on derivative states obtained from parent quantum states via topology-preserving random deformations. We then propose to use this data, in combination with a modified confusion learning scheme, to predict topological state equivalences without any prior knowledge of the topological invariants. We show that the predictive outcome of NNs contains a unique signature generically present whenever topologically distinct states are compared. The procedure is demonstrated on simple examples of quantum states in 1d, for which we correctly reproduce the known topological equivalences.
The paper is organized as follows. In Sec. II our method for topological classification is sketched on the example of 2d geometric objects. We then turn to the central part of this work and discuss the neural-network-based topological classification of quantum states in 1d: first, we give a simplified description in Sec. III A and then a deeper analysis in Sec. III B. A short summary of our findings is given in the last section.

II. A TOY MODEL: TOPOLOGICAL CLASSIFICATION OF GEOMETRIC OBJECTS
To set the stage, let us first sketch our unsupervised scheme for detecting topological inequivalence on the simple example of geometric objects in 2d. Here we skip many of the details and aim only to outline the idea, without any numerical implementation. For simplicity, let us focus on just three simple cases: a solid circle, a solid rectangle, and a hollow circle, Fig. 1. We present a procedure that, without any prior knowledge of the topological invariants, indicates that the solid circle and the rectangle are topologically equivalent but distinct from the hollow circle. The only things required to be specified a priori are the space the objects live in, 2d, and the types of continuous deformations that define the concept of topological distinction in this space. The scheme is divided into two main parts: Exploration and Prediction. During the exploration stage we collect ensembles of geometric shapes topologically equivalent to the original samples. They are obtained by performing a large number of random continuous deformations on the original object. In this way we get a random sampling of the topological equivalence space to which the object belongs. One then performs the exploration procedure independently for each of the three objects and aims to determine whether the corresponding ensembles cover the same topological equivalence spaces, a process that we dub the prediction stage. To do so we suggest using learning by confusion [1]: we take two sampling ensembles corresponding to two geometric objects and train a network to distinguish them. If the objects are topologically equivalent and the corresponding equivalence space has been sufficiently explored, then the network will always fail and the classification accuracy will stay close to 50%: the network will look for regularities in two featureless datasets that correspond to the same part of the topological space, explored randomly.
In contrast, topologically distinct objects are well separated and the network will ideally reach 100% classification accuracy. The procedure described above is summarized in Fig. 1. Of course, instead of classifying geometric shapes in 2d, one can follow exactly the same procedure for any abstract objects, and we suggest using it for detecting topological phase transitions, as discussed in detail below. Before moving further, let us briefly summarize the concrete steps of the unsupervised computational scheme presented above: one first picks two objects and creates two distinct datasets by applying a large number of random local deformations to them, the process we named the exploration stage. These two datasets are then used in the prediction stage for determining whether the objects are topologically equivalent: we train a deep neural network to distinguish the datasets, and the objects are topologically equivalent (distinct) if it fails (succeeds) in doing so, meaning that the classification accuracy is around 50% (100%). Note that this procedure does not rely on any prior knowledge of the topological invariants or of the structure of the objects; the only information used is the characterization of the topological space they belong to, which determines the range of continuous transformations allowed.
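The confusion-based prediction stage can be sketched in a few lines. The following toy demonstration uses a linear classifier and synthetic point clouds in place of a deep network and real ensembles; all distributions, sample sizes, and function names here are illustrative assumptions, not the paper's actual data.

```python
# Minimal sketch of the prediction stage: train a classifier to tell two
# ensembles apart; held-out accuracy near 50% suggests topological
# equivalence, accuracy near 100% suggests distinction. Purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def confusion_accuracy(ensemble_a, ensemble_b):
    """Fit a classifier on half the samples, report accuracy on the rest."""
    X = np.vstack([ensemble_a, ensemble_b])
    y = np.r_[np.zeros(len(ensemble_a)), np.ones(len(ensemble_b))]
    idx = rng.permutation(len(X))
    X, y = X[idx], y[idx]
    n_train = len(X) // 2
    clf = LogisticRegression().fit(X[:n_train], y[:n_train])
    return clf.score(X[n_train:], y[n_train:])

# Two samplings of the same distribution -> the classifier is "confused".
same_a = rng.normal(0, 1, (500, 2))
same_b = rng.normal(0, 1, (500, 2))
# Two different distributions -> easily separable.
diff_a = rng.normal(-2, 1, (500, 2))
diff_b = rng.normal(+2, 1, (500, 2))
```

The decision rule of the scheme then reduces to reading off whether `confusion_accuracy` sits near 0.5 or well above it.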

III. TOPOLOGICAL CLASSIFICATION OF QUANTUM STATES IN 1D
Here we explicitly demonstrate the applicability of the proposed method and generically identify topological quantum state equivalences in 1d. Any two quantum states are said to be topologically distinct if they cannot be transformed into each other by continuous transformations respecting some underlying symmetries. We focus on the 1d topological classification in symmetry class AIII of the standard ten-fold classification, known to contain topologically inequivalent phases labeled by an integer topological invariant, the so-called winding number ω [46].
Any gapped two-band system from this symmetry class can be represented by a momentum-periodic Hamiltonian H(k) = h_x(k) σ_x + h_y(k) σ_y, with real functions h_x(k) and h_y(k) and Pauli matrices σ_x and σ_y. The winding number ω then counts how many times the vector (h_x(k), h_y(k)) winds around zero as a function of k ∈ [0, 2π). Before turning to our main procedure, it is practical to normalize (h_x(k), h_y(k)) to unit length for each k and to consider from now on the space of normalized H(k). Note that this normalization neither induces any topological phase transitions nor requires any prior knowledge about the winding number. It is done purely for efficiency, as it makes the exploration and prediction stages cheaper by reducing the size of the initially unbounded topological space. In this formulation, any quantum state can be represented by a continuous set of unit vectors (h_x(k), h_y(k)) with k ∈ [0, 2π).
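On a discretized Brillouin zone the winding number can be evaluated by summing wrapped angle increments of (h_x(k), h_y(k)). A minimal numerical sketch (the grid size and function name are our own choices; the paper does not prescribe an implementation):

```python
import numpy as np

def winding_number(hx, hy):
    """Winding of the vector (hx, hy) around zero over a periodic k-grid."""
    theta = np.arctan2(hy, hx)
    # angle increments, including the wrap-around from k = 2π back to 0,
    # each mapped into (-π, π]
    d = np.diff(np.append(theta, theta[0]))
    d = (d + np.pi) % (2 * np.pi) - np.pi
    return int(round(d.sum() / (2 * np.pi)))

# discretized momenta, as used in the text
k = np.linspace(0, 2 * np.pi, 100, endpoint=False)
```

For example, (h_x, h_y) = (cos k, sin k) gives ω = 1, while a constant vector gives ω = 0.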

A. Quantum states in 1D: Basic description
For a simple illustration of the proposed method we consider several 1d quantum states corresponding to the same or different winding numbers and show that our algorithm can capture their topological equivalence or distinction. In contrast to the original "learning by confusion" protocol [1], we do not rely on parametrized state generation from a given Hamiltonian, but instead create the ensembles of data using random continuous deformations. First, let us take a look at three cases corresponding to winding numbers 0, 1, and 2, shown in Fig. 2. For numerical purposes we discretize momentum space into 100 equally spaced sites. To explore the equivalence space of all topologically equivalent states we perform the following deformation: randomly select a site in momentum space and rotate the corresponding unit vector by a random angle φ ∈ [−π, π). The deformation is then smeared out by also rotating the adjacent vectors, by an angle gradually decreasing with distance, see Fig. 3. By repeating this procedure many times we obtain a random representative of the topological equivalence space; collecting such representatives yields an ensemble of states sampling the same topological equivalence space. For the numerical results the deformation decay was chosen to be described by a Gaussian with a standard deviation of 10 discrete momentum sites. Each state was deformed 50 times before being saved to the corresponding ensemble. In total, we collected 10^4 data samples for each ensemble of topologically equivalent quantum states.
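A minimal sketch of one such exploration deformation, under the assumption that each state is stored as an angle θ_k per momentum site, so that (h_x, h_y) = (cos θ, sin θ); the Gaussian width of 10 sites and the 50 deformations per sample follow the text, while the array layout and function name are our conventions:

```python
import numpy as np

rng = np.random.default_rng(0)

def deform_once(theta, sigma=10.0):
    """Rotate the vector at a random site by a random angle and smear the
    rotation over neighbors with a Gaussian envelope (periodic in k)."""
    N = len(theta)
    site = rng.integers(N)
    phi = rng.uniform(-np.pi, np.pi)
    dist = np.abs(np.arange(N) - site)
    dist = np.minimum(dist, N - dist)  # periodic distance in momentum space
    return theta + phi * np.exp(-dist**2 / (2 * sigma**2))

# seed state with winding number 1, deformed 50 times as in the text
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
for _ in range(50):
    theta = deform_once(theta)
```

Repeating this loop with fresh random numbers produces the ensemble of derivative states.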
For determining the topological equivalences by confusion, we train a neural network to pairwise classify the corresponding ensembles of states obtained during the exploration process: failure (success) indicates topological equivalence (distinction). Here we employ a standard convolutional neural network with two convolutional layers of 16 and 8 feature maps with a receptive field of size 2 by 2, followed by a fully connected layer of size 50 and an output classification layer of size 2. In total there are 80 752 trainable parameters. The activation function was chosen to be a rectified linear unit in every layer except the output classification layer, which uses a softmax activation. The training was performed over at most 20 epochs. We found that dropout layers, max-pooling, batch normalization, and other kinds of regularization intended to reduce overfitting affected the results only insignificantly and are therefore skipped here. Also, our classification task is quite different from the conventional image recognition problem, where one is interested in reaching the highest prediction accuracy on data unknown to the network, thereby mimicking the realistic conditions of image recognition in daily use. Instead, here we check whether the network is capable of learning to separate two sets of data and do not specify at which stage of the training the best performance should be obtained. The classification accuracy of interest is then defined as the maximum classification accuracy obtained at any stage of the training. For calculating the accuracy we create a separate set of samples that is not used at any stage of the training.
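The quoted count of 80 752 trainable parameters is consistent with an input of shape 100 (momentum sites) × 2 (vector components) × 1 channel, 'same' padding in both convolutional layers, and bias terms everywhere; these shape details are our inference, not stated in the text. The bookkeeping:

```python
# Parameter count for the network described above, assuming 'same' padding
# and an input of 100 x 2 x 1 (assumptions inferred from the quoted total).

def conv2d_params(kh, kw, c_in, c_out):
    return kh * kw * c_in * c_out + c_out  # kernel weights + biases

def dense_params(n_in, n_out):
    return n_in * n_out + n_out            # weights + biases

p = 0
p += conv2d_params(2, 2, 1, 16)   # conv layer 1: 16 feature maps, 2x2 field
p += conv2d_params(2, 2, 16, 8)   # conv layer 2: 8 feature maps, 2x2 field
flat = 100 * 2 * 8                # 'same' padding keeps the 100x2 grid
p += dense_params(flat, 50)       # fully connected layer of size 50
p += dense_params(50, 2)          # softmax classification layer
print(p)  # 80752
```

The total 80 + 520 + 80 050 + 102 = 80 752 reproduces the number quoted above.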
Here we take it to contain 10^3 samples from each of the two considered ensembles of states. In Fig. 4 we present the accuracies for classifying the ensembles of states shown in Fig. 2 and their complementary states. The complementary states were obtained by rotating the corresponding vectors at each momentum value by 180 degrees. Clearly, this process does not change the topological invariants, and therefore the network should be confused when asked to discriminate the pairs of complementary states. The results are in perfect agreement with the discussion above: the states corresponding to different topological indices are successfully classified by the network, but not the complementary ones, for which the classification accuracy always stays around 50%.
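In the angle representation the complementary state is simply θ_k → θ_k + π, and the invariance of the winding number under this global rotation can be checked numerically; a quick sketch under that representation (our convention):

```python
import numpy as np

def winding(theta):
    """Winding number from wrapped angle increments on a periodic k-grid."""
    d = np.diff(np.append(theta, theta[0]))
    d = (d + np.pi) % (2 * np.pi) - np.pi
    return int(round(d.sum() / (2 * np.pi)))

k = np.linspace(0, 2 * np.pi, 100, endpoint=False)
# winding of each seed state and of its 180°-rotated complement
windings = [(winding(w * k), winding(w * k + np.pi)) for w in (0, 1, 2)]
```

A constant shift leaves every relative angle unchanged, so the pairs agree for all three seed states.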

B. Quantum states in 1D: Detailed analysis
The procedure described above illustrates the main idea of our method well; however, in this formulation it lacks some very important details without which it is not suited for more general quantum state classification. There are two crucial things to address: one has to ensure that no topological phase transitions occur during the exploration stage, which is possible due to the discretized nature of the deformations, and that the obtained ensembles of derivative states become indistinguishable for any two topologically equivalent states. The former issue can be solved most straightforwardly by simply monitoring the changes in the quantum state and forbidding discontinuous deformations. The latter issue is of a more fundamental nature. To confuse the neural network, the ensembles of data have to be sufficiently randomized; we should not leave any features that would help the network distinguish the two datasets. Clearly, if we apply too few continuous deformations, the ensemble of derivative states will still contain some information about the parent state, which the network will in general use to differentiate the datasets. Therefore, we need to design an unsupervised tool that signals when the produced derivative states are randomized well and thus ready to be used for the topological classification. There may exist multiple criteria indicating a lack of sufficient exploration, but here we suggest the following idea: topological indices are not invariant under discontinuous transformations, and any pair of topologically distinct well-explored ensembles is anticipated to become topologically indistinguishable under discontinuous changes. We thus apply generic discontinuous deformations to the ensembles of states and expect the classification accuracy to stay at (in the case of topological equivalence) or drop quickly to (in the case of topological distinction) approximately 50%. If this does not happen (as exemplified in Figs. 5c and 5d), it is a sign that other features are still present and more continuous deformations are needed to wipe them out. We implement the discontinuous deformations by simply removing a random momentum-space section of the states. Clearly, this is in general a discontinuous process, so that topological features are expected to be lost very quickly with growing size of the removed pieces. We are not aware of any non-topological features that would behave in a similar way; they are most probably very rare, perhaps even non-existent, and even if encountered in practice they would be very unlikely to persist after the exploration process based on random deformations.

FIG. 6. Examples of nonhomogeneous 1d quantum states from the AIII symmetry class: 4. h_x(k) = cos(k), h_y(k) = sin(k) for k ∈ [0, π) and h_x(k) = cos(−k), h_y(k) = sin(−k) for k ∈ [π, 2π), with ω = 0; 5. h_x(k) = 1, h_y(k) = 0 for k ∈ [0, π) and h_x(k) = cos(k), h_y(k) = sin(k) for k ∈ [π, 2π), with ω = 1; 6. h_x(k) = 1, h_y(k) = 0 for k ∈ [0, π) and h_x(k) = cos(2k), h_y(k) = sin(2k) for k ∈ [π, 2π), with ω = 2.
By using this recipe of progressive discontinuous deformations, any two ensembles of explored states now fall into one of three categories: topologically equivalent, topologically distinct, or inconclusive, where the latter means that the classification accuracy does not rapidly approach 50%, implying that the states were insufficiently deformed. In Figs. 5a and 5b we present typical idealized dependencies of the classification accuracy vs. the number of removed sites for two well-explored topologically equivalent (Fig. 5a) and distinct (Fig. 5b) states. In Figs. 5c and 5d we show the outputs of our procedure performed on two concrete case studies and illustrate how the dependencies change with an increasing number of random deformations: one can clearly see that the datasets corresponding to 200 random deformations (blue curves) contain distinguishing features that survive the removal of very large sections, signaling that this data falls into the inconclusive category. At 1000 random deformations (black curves) such features disappear and our classification procedure correctly reproduces the topological equivalences.
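The generic discontinuous deformation, removing a block of adjacent momentum sites from every state, can be sketched as follows (the angle-array representation and function name are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def remove_sites(theta, n_k):
    """Delete n_k adjacent momentum sites at a random position, wrapping
    around the periodic Brillouin zone -- a generically discontinuous change."""
    N = len(theta)
    start = rng.integers(N)
    idx = (start + np.arange(n_k)) % N
    return np.delete(theta, idx)
```

Sweeping `n_k` from small to large and recording the classification accuracy at each value yields the curves of Fig. 5.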
To illustrate the generic unsupervised scheme, we determine the topological equivalences between the challenging examples of Fig. 6. The single-site deformations described above were found to be insufficient to explore the corresponding equivalence spaces well, and therefore the following modification of the exploration process was applied: instead of just one site, we now randomly pick 1 to 40 adjacent sites, rotate all of them by the same random angle φ ∈ [−π, π), and then smear out this deformation by adding decaying tails of Gaussian form with standard deviation 10. In this way we include deformations of all distance ranges, making the exploration stage more efficient. In total we collected ensembles of 10^4 derivative states for each parent state from Figs. 2 and 6. Each state in an ensemble was produced by deforming the parent state 10^3 times, except for Figs. 5c and 5d, where we also considered ensembles whose states were generated by 200 deformations. Continuity was ensured by explicitly checking that the state did not go through a discontinuous change after a deformation: we monitored the relative angles θ_{i,i+1} ∈ [−π, π) between neighboring vectors and forbade any deformation satisfying |θ_{i,i+1} + ∆θ_{i,i+1}| > π for any i, where ∆θ_{i,i+1} = φ_{i+1} − φ_i, with φ_i the deformation angle applied at momentum site i. In the prediction stage we employed the same convolutional neural network as in Sec. III A and a validation set consisting of 2×10^3 states unused at any stage of the training. For implementing discontinuous changes we simply removed n_k randomly selected adjacent unit vectors from all derivative quantum states. The topological classification of the quantum states of Fig. 2 vs. those of Fig. 6 is shown in Fig. 7: remarkably, the obtained data correctly predict the topological equivalences and distinctions in all of the considered cases. We shall make a few important remarks here.
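The continuity check can be sketched as an accept/reject step on each proposed deformation, again in the angle representation (array layout and names are our conventions; the criterion on the relative angles follows the text):

```python
import numpy as np

def wrapped_diff(x):
    """Relative angles between neighboring sites, mapped into (-π, π]."""
    d = np.diff(np.append(x, x[0]))  # periodic: include the wrap-around pair
    return (d + np.pi) % (2 * np.pi) - np.pi

def accept_deformation(theta, phi):
    """Accept a proposed deformation phi (rotation angle per site) only if
    every relative neighbor angle of theta + phi stays inside (-π, π),
    so the winding number cannot jump discontinuously."""
    d_theta = wrapped_diff(theta)                 # current relative angles
    d_phi = np.diff(np.append(phi, phi[0]))       # ∆θ_{i,i+1} = φ_{i+1} − φ_i
    return bool(np.all(np.abs(d_theta + d_phi) < np.pi))
```

Smooth smeared rotations pass this test, while a deformation that flips a single vector nearly by π relative to its neighbors is rejected.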
The noisy fluctuations in the graphs are purely numerical and anticipated to decrease upon increasing the number of states in the ensembles (10^4). The curves corresponding to topologically distinct states are expected to become sharper as one increases the number of deformations (10^3) and of sites (N_k = 100). More importantly, one notices that in many topologically distinct cases the classification accuracy always stays well below 100%. This simply reflects the fact that our neural network was not able to fully learn the topological invariant. The classification accuracy can be improved by manually tuning the network's architecture, i.e., changing the design or increasing its complexity. However, even if the result is less than 100% (but well above 50%), we may still make definite statements about the topological distinctions, as the datasets are distinguishable by the network. Note that deep neural networks are universal nonlinear classifiers, and we conjecture that cases in which two topologically distinct datasets, being well separable by construction, appear totally indistinguishable to them are extremely rare or even non-existent.

IV. SUMMARY
We have developed a neural-network-based procedure for determining topological equivalences between quantum states. In short, the scheme is based on creating ensembles of data derived from the original states of interest and classifying them by employing learning by confusion [1]. The novelty lies primarily in how the training data is generated, namely by exploring the space of topologically equivalent states without relying on a parametrized Hamiltonian. In contrast to other ML studies of topological properties [17-21], our method does not require training on a priori labeled data, making it considerably more practical. More significantly, the proposed approach appears to be universal, and we anticipate that it can be used for spotting topological quantum equivalences in higher dimensions, in more exotic symmetry classes (e.g. crystalline TIs [49]), under time-periodic external drives [50,51], and in the presence of interactions [46,47]. All these directions would be interesting to explore, with the last being the most challenging and valuable.