Interpreting machine learning of topological quantum phase transitions

There has been growing excitement over the possibility of employing artificial neural networks (ANNs) to gain new theoretical insight into the physics of quantum many-body problems. "Interpretability" remains a concern: can we understand the basis for the ANN's decision-making criteria in order to inform our theoretical understanding? "Interpretable" machine learning in quantum matter has to date been restricted to linear models, such as support vector machines, due to the greater difficulty of interpreting non-linear ANNs. Here we consider topological quantum phase transitions in models of Chern insulator, $\mathbb{Z}_2$ topological insulator, and $\mathbb{Z}_2$ quantum spin liquid, each using a shallow fully connected feed-forward ANN. The use of quantum loop topography, a "domain knowledge"-guided approach to feature selection, facilitates the construction of faithful phase diagrams. Due to the relative simplicity of the ANN, its learning can be interpreted in each of the three cases. To identify the topological phases, the ANNs learn physically meaningful features, such as topological invariants and deconfinement of loops. The interpretability in these cases suggests hope for theoretical progress based on future uses of ANN-based machine learning on quantum many-body problems.

I. INTRODUCTION
There has been much recent activity in the quantum matter community applying ANN-based machine learning to synthetic [1][2][3][4][5][6][7][8][9][10][11][12][13] and experimental [14][15][16] quantum matter data. These efforts exploit the ability of ANN models to provide effective approximations of functions in high-dimensional spaces. ANNs can thus represent either many-body wave functions or complex mappings between many-body Hilbert space and associated emergent properties. Indeed ANN-based variational studies [1][2][3][4][5][6][7] and phase detection [8][9][10][11][12][13] have successfully reproduced known results. Even the simplest ANN with a single hidden layer can approximate any target function with a sufficiently wide hidden layer [17]; though in practice, such an architecture would be difficult to train efficiently, so multi-layer architectures are preferred. Insofar as the goal of theoretical quantum matter physics is some fundamental understanding of its underlying properties, however, it would be disappointing if the ANN operated as a black box, giving little insight into the basis for its predictions or results. More generally, "interpretability" is an important challenge in many areas of algorithmic Artificial Intelligence, and is of particular importance in the use of ANNs as tools for scientific research.
The more expressive a machine learning model is, the harder it can be to interpret. It is difficult to characterize a function on a high-dimensional domain if it has an enormous number of parameters and no obvious symmetries to permit easy visualization. If an ANN trained on a large number of labeled samples is able to predict with high precision the expected properties of new sample data, then we have certainly made progress. But if we are unable to extract from the ANN the specific features of the data, and combinations thereof, that it uses to make those predictions, then we have not achieved a deeper insight into the physics of the system. Our goal in physics is to develop some more compact formulation of the crucial degrees of freedom, derive from that some more general intuition into the system's behavior, and use that to develop analytic methods or physical laws that can be extended to a wide variety of related phenomena. A black box predictor that works only on a specific class of examples, and gives no insight into how it makes predictions, would be fundamentally unsatisfying as a tool for theoretical physics.
To date, "interpretable" machine learning in problems of quantum matter has been restricted to support vector machines (SVMs). Specifically, SVMs have detected features of the order parameters in various spin models [18][19][20], the Hamiltonian constraints in gauge theories [18], and the level statistics in the many-body localization transition [21]. SVMs are intrinsically linear classifiers, based on finding a hyperplane to separate an already curated feature set, and are hence more easily interpreted. The most general ANN, on the other hand, can take an unmanageably large set of raw features, and transform and combine them in arbitrarily complex ways by way of millions or billions of learned parameter values. Its predictions are opaque unless we can determine how it has rearranged, amplified, and combined those features into effective degrees of freedom that govern the phenomena of interest.
Here we consider three investigations of topological quantum phase transitions, each using a shallow fully connected feed-forward neural network. The quantum phase transitions are between topologically trivial states and three distinct topological phases in two spatial dimensions: a time-reversal symmetry breaking Chern insulator (CI), a time-reversal invariant $\mathbb{Z}_2$ topological insulator (TI), and a $\mathbb{Z}_2$ quantum spin liquid (QSL). Of the three cases, the ANN-based phase diagrams for the Chern insulator [11] and the $\mathbb{Z}_2$ quantum spin liquid [12] were previously obtained by two of us. The ANN-based phase diagram for the $\mathbb{Z}_2$ TI is, as far as we know, first obtained here in section III, although the model of Kane and Mele [22] is well known. In all three cases, the topological order is detected using only a simple shallow ANN, by using the physically motivated features introduced in Ref. [11], designated quantum loop topography (QLT). The QLT consists of a semi-local gauge-invariant product of two-point functions from (variational) Monte Carlo instances. The specific geometry of the QLT is guided by characteristics of the phase itself, i.e., it is based on "domain knowledge". Given the simplicity of our ANN, and its ability to interpolate between QLT and the topological phases of interest, it is plausible that insight into this physics can be derived by probing the "interior" of the ANN to illuminate properties of the function it has learned.
In this paper, we probe trained ANNs that yield correct topological quantum phase diagrams for the three cases of interest. We find robust interpretations of the "learning" of these ANNs to fall in two classes: (1) a linear function corresponding to a topological invariant, and (2) non-linear functions that build non-local and non-linear observables from our QLT inputs. The CI case in section II falls into class (1), and the $\mathbb{Z}_2$ TI and the $\mathbb{Z}_2$ QSL cases fall into class (2). The remainder of this paper is organized as follows. In section II, we review the QLT and ANN-based phase detection for the Chern insulators, and interpret the trained ANN. In section III, we obtain an ANN-based phase diagram for the $\mathbb{Z}_2$ TI using a new QLT, and again interpret what the trained ANN has learned about the system. In section IV, we examine an ANN trained to detect the $\mathbb{Z}_2$ quantum spin liquid phase, and interpret the methodology it has learned. We close with a summary and concluding remarks in section V.

II. INTERPRETING LINEAR ML: CHERN INSULATORS
QLT for CI assigns a $D(d_c)$-dimensional vector of complex numbers to each lattice site $j$, thereby forming a quasi-two-dimensional "image". The elements of the vector associated with the site $j$ are chained products such as

$$\tilde{P}_{jk}|_{\alpha}\, \tilde{P}_{kl}|_{\beta}\, \tilde{P}_{lj}|_{\gamma}, \qquad (1)$$

where $k$ and $l$ are two sites that form a triangle with site $j$. In Eq. (1), each $\tilde{P}_{jk}|_{\alpha} \equiv \langle c^{\dagger}_j c_k \rangle_{\alpha}$ is a variational Monte Carlo sample of the two-point correlations associated with sites $j$, $k$ evaluated at Monte Carlo step $\alpha$, and $\beta$, $\gamma$ label different Monte Carlo steps. The length $D(d_c)$ of the vector is set by the total number of triangles anchored at the site $j$ with lateral distance $d \le d_c$, where $d_c$ is the cutoff scale that can remain close to the lattice constant for a gapped system (see Fig. 1).
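As an illustration of the chained products described above, the following is a minimal sketch (not the implementation used for the results in this paper). It assumes an L x L periodic square lattice, restricts to the four d = 1 triangles with k = r ± x and l = r ± y, and stores each Monte Carlo sample as a dictionary from site pairs to complex numbers. The names `smallest_triangles`, `qlt_vector`, `toy_sample`, and `gauge_transform` are hypothetical, and the toy samples are random numbers rather than physical correlators.

```python
import cmath
import random

def smallest_triangles(site, L):
    """Four d = 1 triangles (j, k, l) anchored at `site`: k = r +/- x, l = r +/- y."""
    x, y = site
    triangles = []
    for sx in (1, -1):
        for sy in (1, -1):
            k = ((x + sx) % L, y)
            l = (x, (y + sy) % L)
            triangles.append((site, k, l))
    return triangles

def qlt_vector(site, L, p_alpha, p_beta, p_gamma):
    """Chained products P_jk|alpha * P_kl|beta * P_lj|gamma, one per triangle.

    Each p_* is one Monte Carlo sample: a dict mapping (site, site) -> complex.
    """
    return [p_alpha[(j, k)] * p_beta[(k, l)] * p_gamma[(l, j)]
            for (j, k, l) in smallest_triangles(site, L)]

def toy_sample(L, rng):
    """Random stand-in for a sampled two-point function (illustration only)."""
    sites = [(x, y) for x in range(L) for y in range(L)]
    return {(a, b): complex(rng.gauss(0, 1), rng.gauss(0, 1))
            for a in sites for b in sites}

def gauge_transform(sample, phases):
    """P_ab -> conj(g_a) * P_ab * g_b for site-dependent unit phases g."""
    return {(a, b): phases[a].conjugate() * value * phases[b]
            for (a, b), value in sample.items()}
```

Because every site appears once as a first index and once as a second index around the closed triangle, the site-dependent phases cancel in each product, which is the gauge invariance the loop construction is meant to guarantee.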
The above choice of QLT for CI is motivated by the characteristic response function that defines the CI, the Hall conductivity for free fermion systems [11]:

$$\sigma_{xy} = \frac{e^2}{h}\cdot\frac{1}{N}\sum_{jkl} 4\pi i\, P_{jk} P_{kl} P_{lj}\, S_{\triangle jkl}, \qquad (2)$$

where $P_{ij} \equiv \langle c^{\dagger}_i c_j \rangle$ is the equal-time two-point correlation between sites $i$ and $j$, $S_{\triangle jkl}$ is the signed area of the triangle $jkl$, and $N$ is the total number of sites [23,24]. Hence the QLT in Eq. (1) provides input that could contribute to the Hall conductivity, albeit with noisy single Monte Carlo instance data $\tilde{P}_{jk}|_{\alpha} \equiv \langle c^{\dagger}_j c_k \rangle_{\alpha}$. More importantly, since it is constructed from loops, QLT provides only gauge-invariant data to the ANN.

We now train an ANN using QLT from a model that exhibits a topological quantum phase transition (TQPT) between a trivial insulator and a Chern insulator [25], and probe the ANN to interpret what it has learned. The model Hamiltonian $H(\kappa)$, Eq. (3), is a tight-binding model on a two-dimensional square lattice [11], with sites labeled by $\mathbf{r} = (x, y)$ and a tuning parameter $0 \le \kappa \le 1$. The $\kappa = 1$ limit is the $\pi$-flux square lattice model for a quantum Hall insulator with Chern number $C = 1$, while the $\kappa = 0$ limit reduces to decoupled two-leg ladders. $H(\kappa)$ interpolates between the quantum Hall insulator and the normal insulator with a TQPT at $\kappa = 0.5$. We study a system of size $12 \times 12$ lattice spacings.

Fig. 1 shows the architecture of our single hidden layer ANN. The trained ANN with weights $w^{(1)}$, $w^{(2)}$ and biases $b^{(1)}$, $b^{(2)}$ transforms a QLT input vector $x$ into the output $y$ given by

$$y = \sum_j w^{(2)}_j\, \sigma\Big(\sum_i w^{(1)}_{j,i}\, x_i + b^{(1)}_j\Big) + b^{(2)}, \qquad (4)$$

where $\sigma(z)$ is the non-linear activation function applied to the neurons, here taken as rectified linear units (RELU), i.e.,

$$\sigma(z) = \max(z, 0). \qquad (5)$$

(The RELU choice has proven more efficient to train in the context of deep networks, and also affords a simpler interpretation than the sigmoid function $S(z) = [1 + \exp(-z)]^{-1}$ used in Ref. [11].)

FIG. 2. [...] the RELU ANN reliably provides consistent determination of the phase. $p$ is the probability that the ANN assigns a sample state to be in the quantum Hall phase. The vertical red line is the critical value $\kappa = 0.5$ between the two phases.

Fig. 2 shows the typical phase recognition by a successfully trained ANN. Here, the network confidence $p$ of the ground state being a Chern insulator at the given model parameter $\kappa$ is assessed by taking the average of the neural network outputs for 500 independent input samples at the given parameter value. An ANN trained at the two marked training points reliably detects the transition from trivial insulator to CI at $\kappa = 0.5$.
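The forward pass of a single-hidden-layer RELU network and the sample-averaged confidence can be sketched as follows. This is a minimal illustration under stated assumptions: weights are plain nested lists, and the mapping of the network output to a probability is taken to be a logistic function, which is an assumption of this sketch rather than a detail given in the text. The function names `forward` and `confidence` are hypothetical.

```python
import math

def relu(z):
    """Rectified linear unit: max(z, 0)."""
    return max(z, 0.0)

def forward(x, w1, b1, w2, b2):
    """Output of a shallow network: sum_j w2[j] * relu(w1[j] . x + b1[j]) + b2."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

def confidence(samples, w1, b1, w2, b2):
    """Average over samples of a logistic squashing of the network output.

    The logistic squashing is an assumption made for this sketch.
    """
    probs = [1.0 / (1.0 + math.exp(-forward(x, w1, b1, w2, b2)))
             for x in samples]
    return sum(probs) / len(probs)
```

In this form, averaging over many independent input samples at one parameter value (500 in the text) is just a call to `confidence` with the corresponding list of QLT vectors.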
Our interpretation of a trained ANN begins with inspection of the final weights $w^{(2)}_j$ for each neuron $j$. As shown in Fig. 3(b), we often find the $w^{(2)}_j$ to be largely concentrated on a single hidden layer neuron, which we label $j_{\max}$. With the ANN output determined by this single neuron, we can ignore the rest of the hidden layer neurons, and trace the firing of the output neuron back to the firing condition of $j_{\max}$, in turn encoded in the weights $w^{(1)}_{j_{\max},i}$ and bias $b^{(1)}_{j_{\max}}$. Here $i = (\mathbf{r}, t)$ labels the inputs according to their lattice sites $\mathbf{r}$ and the triangles $t \in [1, D(d_c)]$. The real and imaginary parts are also treated as separate inputs. Fig. 3(d) shows the distribution of $w^{(1)}_{j_{\max},i}$ for the four smallest triangles and the rest for each lattice site $\mathbf{r}$. By inspection of $w^{(1)}_{j_{\max},i}$ for all $i$, we find that the neuron $j_{\max}$ ended up weighting as most significant the imaginary parts of $\tilde{P}_{jk}|_{\alpha}\tilde{P}_{kl}|_{\beta}\tilde{P}_{lj}|_{\gamma}$ coming from the smallest triangles $jkl$, namely, $j = \mathbf{r}$, $k = \mathbf{r} \pm \hat{x}$, and $l = \mathbf{r} \pm \hat{y}$. Moreover, $w^{(1)}_{j_{\max},i}$ is approximately evenly distributed across all four $d = 1$ triangles and across all real-space positions $\mathbf{r}$, as shown by the four colors for the four triangles in Fig. 3(d). For all of the other inputs, including all real parts, the associated weights $w^{(1)}_{j_{\max},i}$ are close to zero, as shown in magenta in Fig. 3(d).
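The first step of this inspection, locating the hidden neuron that carries most of the output weight, can be sketched in a few lines. The helper name `dominant_neuron` and the use of the share of total absolute weight as a concentration measure are assumptions of this illustration, not quantities defined in the text.

```python
def dominant_neuron(w2):
    """Return (j_max, share): the index of the largest |w2[j]| and its
    fraction of the total absolute output weight."""
    magnitudes = [abs(w) for w in w2]
    j_max = max(range(len(w2)), key=lambda j: magnitudes[j])
    return j_max, magnitudes[j_max] / sum(magnitudes)
```

A share close to one indicates that the output is effectively controlled by a single neuron, so that the remaining hidden neurons can be neglected when reading off the firing condition.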
The observation of significant separation between the weights $w^{(1)}_{j_{\max},i}$ of the smallest triangles and the rest implies that we can approximate the firing condition by concentrating on those smallest triangles. Moreover, since the weights for the smallest triangles cornered at each site are roughly the same, we can approximate the firing condition for the Chern insulator by the following criterion:

$$w^{(2)}_{j_{\max}} \max\Big[\bar{w}^{(1)}_{j_{\max}} \sum_{\mathbf{r}} \sum_{\pm\hat{x},\pm\hat{y}} \mathrm{Im}\, \tilde{P}_{\mathbf{r}\pm\hat{y},\mathbf{r}}\, \tilde{P}_{\mathbf{r},\mathbf{r}\pm\hat{x}}\, \tilde{P}_{\mathbf{r}\pm\hat{x},\mathbf{r}\pm\hat{y}} + b^{(1)}_{j_{\max}},\ 0\Big] + b^{(2)} > 0. \qquad (6)$$

In the above, we have replaced the $w^{(1)}_{j_{\max},i}$ over all positions $\mathbf{r}$ and four triangles (labeled by the relative position of the other two vertices $\pm\hat{x}$ and $\pm\hat{y}$) with their average $\bar{w}^{(1)}_{j_{\max}}$. Reading off the weights and biases from a learned ANN and inserting their values, Eq. (6) becomes an explicit numerical threshold on the summed imaginary parts [Eq. (7)], where $N = L^2 = 144$. Considering $S_{\triangle jkl} = 1/2$ for the $d_{jkl} = 1$ triangles, the above criterion implies that the ANN relied on the imaginary part of the QLT input associated with the smallest triangles, and diagnosed the system to be a Chern insulator when their contribution to the Chern number was substantial. This is a reasonable and efficient diagnosis, given the exact formula Eq. (2) for the invariant in the position basis.

Our successful interpretation of the ANN's learning in the case of the Chern insulator was thus enabled by two aspects of our approach: (i) the QLT was effective in providing the relevant features, and (ii) the exact topological invariant was known in the local basis and so guided our interpretation. The effectiveness of the QLT is reflected in the ANN learning a linear function based on a single $j_{\max}$ neuron.
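To make explicit why a single dominant RELU neuron amounts to a linear criterion, the sketch below evaluates the single-neuron firing condition and rearranges it into a threshold on the summed imaginary parts, here called `s`. It assumes a positive averaged input weight, a positive output weight, and a negative output bias; these sign assumptions, the function names, and any numbers passed in are hypothetical illustrations, not values read off a trained network.

```python
def fires(s, w_bar, b1, w2, b2):
    """Single-neuron firing condition: w2 * max(w_bar * s + b1, 0) + b2 > 0."""
    return w2 * max(w_bar * s + b1, 0.0) + b2 > 0.0

def linear_threshold(w_bar, b1, w2, b2):
    """Threshold s* such that the neuron fires exactly when s > s*.

    Valid under the assumed signs w_bar > 0, w2 > 0, b2 < 0: then the RELU
    must be in its active branch, and the condition reduces to
    w_bar * s + b1 > -b2 / w2.
    """
    if not (w_bar > 0 and w2 > 0 and b2 < 0):
        raise ValueError("sketch assumes w_bar > 0, w2 > 0, b2 < 0")
    return (-b2 / w2 - b1) / w_bar
```

Under these assumptions the non-linearity plays no role in the decision: the network output changes sign exactly where a linear function of the smallest-triangle inputs crosses a fixed value.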

III. NON-LINEAR ML: THE TWO-DIMENSIONAL QUANTUM SPIN HALL INSULATORS
We now turn to a topological insulator with no known expression for the topological invariant in the position basis: the quantum spin Hall (QSH) insulator. A QSH insulator is defined as a two-dimensional, time-reversal invariant topological insulator with a quantized spin Hall conductance and a vanishing charge Hall conductance. The characteristic $\mathbb{Z}_2$ topological invariant is only known as a loop integral of the phase winding of the Pfaffian $\mathrm{Pf}(\mathbf{k}) = \mathrm{Pf}[\langle u_n(\mathbf{k})|\Theta|u_m(\mathbf{k})\rangle]$ over a contour in momentum space enclosing half the Brillouin zone [22] [Eq. (8)], where $n$ and $m$ are band labels, and $\Theta$ is the time-reversal operator. A position-basis expression for the $\mathbb{Z}_2$ index $I$, the counterpart of Eq. (2) for the Chern number, is not known in general.

FIG. 3. (a,b) The distribution of (the absolute values of) the weights $w^{(2)}_j$ between the hidden layer neurons and the output neuron in an ANN with RELU neurons, (a) before, and (b) after training. In the case shown, $j_{\max}$ is the very first neuron with $j = 1$. (c,d) The distribution of weights $w^{(1)}_{j_{\max},i}$ associated with the input $i$ for the dominant neuron $j_{\max}$, (c) before, and (d) after training. Here $i$ labels the position $\mathbf{r} = (x, y)$, the type of triangle, and the real and imaginary parts of the contributions. The major contribution to the output comes from the imaginary parts of the correlations from the four $d = 1$ triangles (see Fig. 1), whose weights are depicted in red, yellow, green and blue, respectively. In comparison, the weights associated with other inputs are much closer to zero even after training, as illustrated by the $w^{(1)}_{j_{\max},i}$ distributions in magenta.
In the presence of spin $s_z$ conservation, a quantum spin Hall insulator is equivalent to two copies of Chern insulators: a quantum Hall insulator with $\sigma^{\uparrow}_{xy} = 1$ for the spin-up ($s_z = \uparrow$) electrons and an antichiral quantum Hall insulator with $\sigma^{\downarrow}_{xy} = -1$ for the spin-down ($s_z = \downarrow$) electrons. Hence Eq. (2) for each spin component will serve as a position-basis expression for the $\mathbb{Z}_2$ index. It is known, however, that the QSH state is well defined through the momentum-space expression for the $\mathbb{Z}_2$ index, Eq. (8), even when the Rashba spin-orbit coupling breaks $s_z$ conservation.
We turn to the physical response of an effective spin current to a transverse electric field: the spin-Hall conductivity. Shi et al. [26] introduced the effective spin current in spin-orbit coupled systems, in the absence of average torque, as a time derivative of the spin-displacement operator: $\hat{\mathbf{J}}_s = d(\hat{\mathbf{r}}\hat{s}_z)/dt = (i/\hbar)[H, \hat{\mathbf{r}}\hat{s}_z]$. The flat-band Hamiltonian is defined as $\hat{H} = 1 - \hat{P}$, where $\hat{P}$ is the projection operator onto the valence band of $H$. $\hat{H}$ is adiabatically connected to the model Hamiltonian $H$, and thus shares the topological properties such as the spin-Hall conductivity and the $\mathbb{Z}_2$ topological index. According to the Kubo formula, the spin-Hall conductivity of $H$ is

$$\sigma^{s}_{xy} \propto \mathrm{tr}\, \hat{P}\, [\hat{P}, \hat{x}\hat{s}_z]\, [\hat{P}, \hat{y}] = \sum_{jkl,\, s^z_j, s^z_k, s^z_l} P_{j,s^z_j;k,s^z_k}\, P_{k,s^z_k;l,s^z_l}\, P_{l,s^z_l;j,s^z_j}\, \big(\hat{s}^z_j x_j - \hat{s}^z_k x_k\big)(y_k - y_l), \qquad (9)$$

where $P_{j,s^z_j;k,s^z_k} \equiv \langle c^{\dagger}_{j,s^z_j} c_{k,s^z_k} \rangle$ are the two-point correlators of $H$ and the summation is over all triangles, with vertices $j$, $k$ and $l$. Since there is no spin-quantization direction in the presence of Rashba spin-orbit coupling, we propose to use the following QLT to probe the quantum spin Hall (QSH) effect:

$$\hat{s}^{a}_j\, \tilde{P}_{jk}|_{\alpha}\, \tilde{P}_{kl}|_{\beta}\, \tilde{P}_{lj}|_{\gamma}, \qquad a = x, y, z. \qquad (10)$$

As in the last section, the $\tilde{P}$'s are to be evaluated at independent Monte Carlo steps, and we focus only on the smallest triangles $jkl$, which should account for the major contributions, due to the exponentially decaying correlations in a gapped system.

We now employ the QLT in Eq. (10) to train an ANN to recognize the QSH phase in the two-parameter phase space of the Kane-Mele model of Ref. [22]:

$$H = t \sum_{\langle ij \rangle} c^{\dagger}_i c_j + i\lambda_{SO} \sum_{\langle\langle ij \rangle\rangle} \nu_{ij}\, c^{\dagger}_i s^z c_j + i\lambda_R \sum_{\langle ij \rangle} c^{\dagger}_i (\mathbf{s} \times \hat{\mathbf{d}}_{ij})_z c_j + \lambda_{\nu} \sum_i \xi_i\, c^{\dagger}_i c_i, \qquad (11)$$

where $t = 1$ is the nearest neighbor hopping amplitude, $\lambda_{SO} = 0.1$ is the spin-orbit coupling between the next nearest neighbors, the sign $\nu_{ij} = \pm 1$ depends on whether the direction is along or against the arrow (see Fig. 4 upper panel), $\lambda_{\nu}$ is a staggered potential (with $\xi_i = \pm 1$ on the two sublattices, as in Ref. [22]), and $\lambda_R$ is the Rashba term (with $\hat{\mathbf{d}}_{ij}$ the unit vector along the bond, as in Ref. [22]). For $\lambda_R = 0$, $s_z$ is a good quantum number, and the model reduces to two independent copies of quantum Hall insulators [25].
In the presence of a finite Rashba term $\lambda_R$, however, $s_z$ is no longer a good quantum number, and there is no longer a conserved spin. For small ratios of $\lambda_R/\lambda_{SO}$ and $\lambda_{\nu}/\lambda_{SO}$, the model Eq. (11) is known to realize a quantum spin Hall insulator, and otherwise a normal insulator [22]. The exact phase diagram in Fig. 4(b) is obtained through an explicit evaluation of the $\mathbb{Z}_2$ index given in Eq. (8). Here we attempt to reproduce the phase diagram using QLT-based machine learning. We consider variational Monte Carlo samples of $s^x_j P_{jk} P_{kl} P_{lj}$, $s^y_j P_{jk} P_{kl} P_{lj}$ and $s^z_j P_{jk} P_{kl} P_{lj}$ over the three smallest types of triangles (see Fig. 4(a)) and feed the corresponding QLT inputs into an ANN with a single hidden layer of RELU, as in the previous section. The training set consists of QLTs obtained from exact wave functions at the training points marked in Fig. 4(b). To avoid approximate conservation of spin $s_z$, we randomly cycle $s_x$, $s_y$, and $s_z$ of the training samples during supervised machine learning. The trained ANN is then applied to QLT samples obtained from the phase space between $\lambda_{\nu} \in [0, 0.9]$ and $\lambda_R \in [0, 0.6]$ to assess the likelihood that the input belongs to a quantum spin Hall insulator. The resulting phase diagram obtained by the ANN is shown in Fig. 5, and is consistent with the exact phase diagram. The benefit of using the QLT-based local input is that this approach allows the investigation of systems with disorder. With the original definition of the $\mathbb{Z}_2$ invariant requiring a momentum space integral, there was no framework to assess whether a realistic system with disorder is a QSH insulator (although due to its topological nature one would expect the QSH insulator to be immune to small disorder). Our success in using the QLT-based machine learning to recognize the QSH insulator paves the way for studying realistic models with disorder to discover more QSH insulators in nature.
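The random cycling of the spin components in the training samples can be pictured as a simple data augmentation. The sketch below assumes that each sample is stored as a triple of feature blocks, one each for the $s^x$, $s^y$ and $s^z$ QLT inputs, and that "cycling" means a cyclic relabeling of these blocks by a random shift; both the storage layout and this reading of the procedure are assumptions of the illustration, and the function names are hypothetical.

```python
import random

def cycle_spin_axes(sample, shift):
    """Cyclically relabel the (s^x, s^y, s^z) feature blocks by `shift` positions."""
    blocks = list(sample)
    shift %= 3
    return tuple(blocks[(a + shift) % 3] for a in range(3))

def augment(samples, rng):
    """Apply an independent random cyclic relabeling to every training sample."""
    return [cycle_spin_axes(s, rng.randrange(3)) for s in samples]
```

With this relabeling, no input channel is consistently tied to $s_z$, so the network cannot rely on an approximately conserved spin component.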
To interpret the ANN's learning from the QLT for the QSH, Eq. (10), we again plot the modulus of $w^{(2)}_j$ between the hidden layer RELUs and the output neuron. We now find all the neurons to have substantial weight in the trained ANNs (see Fig. 6). Contrasting the weight distribution in Fig. 6 for the QSH insulator with that for the CI (see the right column of Fig. 3), we see that the ANN for the QSH learns a more complex function of the QLT input. The earlier concentration of $w^{(2)}_j$ on one neuron $j_{\max}$ for the CI means that the ANN effectively expressed a linear function of the QLT as the threshold for that neuron, because the QLT features were already so effective for the CI. That the ANN forms a non-linear function of the QLT inputs for the QSH implies that the position-based expression for the $\mathbb{Z}_2$ invariant is not a simple linear combination of the QLT in Eq. (10). This further motivates the use of the ANN-based approach for the QSH. It is moreover plausible that we can gain insight into the presently unknown position-based expression for the $\mathbb{Z}_2$ invariant by studying the weights and biases of the function learned by the ANN. It might also be possible to adapt conventional machine learning methods such as the Lasso [27], or more general $L_1$ regularizations, to enforce sparsity as in Fig. 3(b), and thereby facilitate interpretability.

IV. NON-LINEAR ML: Z2 QUANTUM SPIN LIQUID
In this section, we turn to strongly interacting models of lattice gauge theory. Previously, ANN-based machine learning has been used to improve the efficiency of lattice QCD simulations [28] and to detect phase boundaries of topological quantum phase transitions [8,12,18], using the mapping between the $T = 0$ quantum problem in two dimensions and the lattice gauge theory in three dimensions. Here we revisit the ANN-based phase diagram of Ref. [12], where two of us successfully trained an ANN to recognize the $\mathbb{Z}_2$ quantum spin liquid phase, the deconfined phase for the corresponding three-dimensional $\mathbb{Z}_2$ lattice gauge theory.
The goal is to interpret what the ANN learns from the semi-local QLT training data. Specifically, the question is how the ANN, using the behavior of small loops as features, can detect a confining phase, ordinarily signaled by the area vs. length behavior of large loops, as explained below [29]. For simplicity, we consider the $\mathbb{Z}_2$ lattice gauge theory given by the Hamiltonian $H(\lambda_p, \lambda_b)$ [Eq. (12)] on a three-dimensional cubic lattice, where $S_j = \pm 1$ lives on the bonds of the cubic lattice, $p$ denotes the square plaquettes, and $\lambda_p$ and $\lambda_b$ are the couplings. For small values of $\lambda_b$, the system has a phase transition at the critical value $\lambda_p \sim 0.76$ between a deconfined phase at large $\lambda_p$ and a confined phase at small $\lambda_p$.

An important physical concept in the lattice gauge theory is the Wilson loop: the path-ordered gauge field integrated around a closed loop. Since the Wilson loops are gauge invariant, they provide meaningful measurements of the lattice gauge theory and a natural dataset for machine learning [28]. For the $\mathbb{Z}_2$ lattice gauge theory, the Wilson loop around a given loop $C$ is defined as

$$W_C = \prod_{j \in C} S_j. \qquad (13)$$

The deconfined phase of the lattice gauge theory is not distinguished by any broken symmetries, but rather by the limiting behavior of the Wilson loop in the thermodynamic limit [Eq. (14)]: in the deconfined phase, $\langle W_C \rangle$ decays exponentially with the perimeter of $C$ (perimeter law), whereas in the confined phase it decays exponentially with the area enclosed by $C$ (area law).

In what follows, we will take advantage of the Abelian nature of $\mathbb{Z}_2$, which permits small Wilson loops to fuse in a unique channel and form larger ones, consisting of the products of the smaller ones, as in Fig. 7. In a single Monte Carlo snapshot, the sampled values of all Wilson loops can thus be obtained from those of the smallest Wilson loops, $W_p = \prod_{j \in p} S_j$, around the square lattice plaquettes. In accord with this observation, we use the classical Monte Carlo samples of the smallest Wilson loops $W_p$ on the $L = 12$ dual lattice as the inputs to the ANN, see Fig. 8.
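The fusion of small Wilson loops into larger ones can be checked directly on a snapshot of bond variables. The following minimal sketch works on a single two-dimensional periodic slice rather than the full three-dimensional cubic lattice, which is a simplification made here for brevity, and draws the bond variables at random instead of from a Monte Carlo simulation. All function names are hypothetical.

```python
import random

def random_bonds(L, rng):
    """Z2 bonds on an L x L periodic square lattice.

    h[x][y] sits on the bond (x, y) -> (x+1, y); v[x][y] on (x, y) -> (x, y+1).
    """
    h = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    v = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    return h, v

def plaquette(h, v, x, y):
    """Smallest Wilson loop W_p: product of the four bonds around plaquette (x, y)."""
    L = len(h)
    return h[x][y] * v[(x + 1) % L][y] * h[x][(y + 1) % L] * v[x][y]

def wilson_rectangle(h, v, x0, y0, w, ht):
    """Product of the bonds along the boundary of a w x ht rectangle."""
    L = len(h)
    prod = 1
    for dx in range(w):
        prod *= h[(x0 + dx) % L][y0 % L]          # bottom edge
        prod *= h[(x0 + dx) % L][(y0 + ht) % L]   # top edge
    for dy in range(ht):
        prod *= v[x0 % L][(y0 + dy) % L]          # left edge
        prod *= v[(x0 + w) % L][(y0 + dy) % L]    # right edge
    return prod

def fused_plaquettes(h, v, x0, y0, w, ht):
    """Product of all plaquette loops W_p inside the same rectangle."""
    L = len(h)
    prod = 1
    for dx in range(w):
        for dy in range(ht):
            prod *= plaquette(h, v, (x0 + dx) % L, (y0 + dy) % L)
    return prod
```

Every interior bond is shared by two neighboring plaquettes and therefore contributes $S_j^2 = 1$, so only the boundary bonds survive in the product. This is why the plaquette values in one snapshot determine the value of any larger loop in that snapshot.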
We use normalized Monte Carlo samples of the set of $W_p$ for the two distinct phases above ($\lambda_p = 0.83$) and below ($\lambda_p = 0.68$) the critical value as the training set, with $\lambda_b = 0.1$, to perform the supervised learning. In Ref. [12], the phase diagram mapped out for the $\lambda_b/\lambda_p$ plane by the optimized ANN was found to be in good agreement with the value of $T_c$ determined by finite-size scaling of Monte Carlo data, giving confidence in that phase diagram. As in the earlier sections, we use a shallow ANN with RELU neurons (see Fig. 8). An inspection of an ANN trained to correctly distinguish phases of the $\mathbb{Z}_2$ lattice gauge theory ($\mathbb{Z}_2$ topological order) shows multiple hidden layer neurons to participate. This is shown in the distribution of weights $w^{(2)}_j$ in Fig. 9. As in the case of QSH (see section III), this implies that the decision boundary criterion for the $\mathbb{Z}_2$ lattice gauge theory is also a non-trivial function of the inputs.

To gain insight into the function that the ANN learns, we introduce non-linearity within the preprocessing so that the target function can be represented approximately linearly. For this, we include asymptotically higher-order terms of the normalized inputs $x_i$ as new inputs, in exchange for reducing the width of the hidden layer. This way, we aim to determine the nature of the non-linearity captured in Fig. 9. In particular, we can include higher-order terms $x_i x_j$ in the input for further training steps when the inputs $x_i$ and $x_j$ show a strong correlation at the current step, for instance as measured by $y(x_i, x_j) + y(x_j, x_i) - 2y(x_i/2 + x_j/2,\ x_i/2 + x_j/2)$ with all other inputs omitted in the expression. For simplicity, we limit ourselves to the quadratic order of the original inputs, related to $W_p$ for the study of the $\mathbb{Z}_2$ lattice gauge theory. In the meantime, we gradually reduce the hidden layer width of the RELU neural network to approach the linear limit, and eliminate newly included inputs that do not contribute significantly to the output, reducing the width while maintaining performance, see Fig. 10. When the supervised machine learning of the RELU neural network finally converges with a small hidden layer width, the ANN can be interpreted as in the linear function formalism of Sec. II.

FIG. 11. Higher order terms of the original inputs showing strong correlations in the machine learning of the deconfinement of the $\mathbb{Z}_2$ lattice gauge theory are mostly the products of local plaquettes that give rise to larger Wilson loops. Such quadratic order terms are included as new inputs, progressively to preprocess the non-linearity, so that the complexity of the RELU ANN can be reduced for interpretation. The numbers below are the relative $w^{(1)}$ weights averaged over equivalent inputs under translations and rotations.
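The pairwise probe quoted above and the subsequent enlargement of the input can be sketched as follows. This is a minimal illustration: "all other inputs omitted" is read here as setting the remaining inputs to zero, which is an assumption, and the function names `pair_probe` and `augment_inputs` are hypothetical. The learned function `y` is passed in as any callable acting on a list of inputs.

```python
def pair_probe(y, x, i, j):
    """Evaluate y(x_i, x_j) + y(x_j, x_i) - 2 y(m, m) with m = (x_i + x_j) / 2.

    All inputs other than i and j are set to zero (an assumption of this sketch).
    The combination vanishes identically when y is linear in x_i and x_j.
    """
    def eval_pair(a, b):
        z = [0.0] * len(x)
        z[i], z[j] = a, b
        return y(z)
    m = 0.5 * (x[i] + x[j])
    return eval_pair(x[i], x[j]) + eval_pair(x[j], x[i]) - 2.0 * eval_pair(m, m)

def augment_inputs(x, pairs):
    """Append the quadratic terms x_i * x_j for the selected pairs as new inputs."""
    return list(x) + [x[i] * x[j] for (i, j) in pairs]
```

For a function containing a term $a\, x_i x_j$, the probe evaluates to $-a (x_i - x_j)^2 / 2$, so a non-zero value flags a pair whose product is a candidate new input. For plaquette inputs $W_p = \pm 1$, the appended product of two neighboring plaquettes is the value of the larger Wilson loop formed by fusing them.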
The above iterative approach singles out the products of neighboring inputs (see Fig. 11) as the new inputs that both permit the hidden layer to shrink to as few as three neurons and contribute with the most weight. Here we averaged over the weights for the inputs of identical geometry to obtain the relative contributions $\bar{w}^{(1)}_{j_{\max}}$ for each type of higher-order input. The selection of the higher-order inputs and their weights offer much insight into the learning of the ANN. Firstly, it is notable that products of more distant inputs are left out. The new higher-order inputs are exclusively those that combine two smaller loops from the initial input to form a larger loop. This feature is consistent with the expectation for the Abelian gauge theory, for which small Wilson loops fuse to form larger Wilson loops. Secondly, the fact that the new larger loops acquire larger weight compared to the original small loops indicates that the ANN's criteria are consistent with the expectation for a deconfinement transition of the $\mathbb{Z}_2$ lattice gauge theory. For a rigorous identification of the transition, the expectation value of the Wilson loop needs to be calculated for a large loop in the thermodynamic limit, a challenging task for any computational approach. But we have demonstrated that the ANN can discover the deconfinement transition by using small loops, together with slightly larger loops obtained through multiplication, to arrive at a rudimentary yet physically sound judgment.

V. CONCLUSIONS
To summarize, we studied the interpretability of machine learning in the context of three distinct topological quantum phase transitions learned by shallow, fully connected feed-forward ANNs. The quantum phase transitions of interest are between topologically trivial states and a Chern insulator (CI), topological insulator (TI), and quantum spin liquid (QSL), respectively. As is well known in the machine learning literature [31], the more expressive the machine learning architectures are, the more opaque are their decision-making criteria. To date, "interpretable" machine learning in problems of quantum matter has been restricted to linear models. The relatively simple and minimally non-linear structure of our ANN was nevertheless able to learn topological phases, aided by the quantum loop topography (QLT): a physically motivated feature selection scheme. For the CI, the criteria the ANN learned amounted to evaluating a noisy version of the relevant topological invariant, which was linear in the QLT inputs. For the TI and the QSL, on the other hand, the ANNs based their criteria on non-linear functions of the inputs. For the QSL, we determined that the ANN fused the neighboring Wilson loops of the corresponding $\mathbb{Z}_2$ gauge theory in the QLT to form larger Wilson loops.
Our successful interpretation of QLT-based machine learning gives us confidence in the ANN's phase detection by confirming that its decision criterion is guided by key physical properties of the target phases. Our results should serve to encourage wider application of QLT-based machine learning. We note the important role of physical insight that guided the design of QLT feature selection which, in turn, enabled interpretable machine learning. The shallow ANN depth, combined with physical insight, powered the QLT-enabled interpretation of the learning for the CI. Understanding of deconfinement in the thermodynamic limit was critical to the successful interpretation of fully non-linear machine learning of the QSL. Our approach to interpreting the learning of QSL bears similarity to the variational auto-encoder [32,33], and further investigating this similarity could be an interesting future direction. These results also provide hope that interpretable machine learning in the future can instead inform our physical insight, when our prior understanding is not sufficient to craft the necessary informative features. We could imagine instead such informative composite features emerging in later layers of a deep neural network fed only naive features, and whose interpretation would then lead to a better theoretical understanding of the underlying physics.