Towards Novel Insights in Lattice Field Theory with Explainable Machine Learning

Machine learning has the potential to aid our understanding of phase structures in lattice quantum field theories through the statistical analysis of Monte Carlo samples. Available algorithms, in particular those based on deep learning, often demonstrate remarkable performance in the search for previously unidentified features, but tend to lack transparency if applied naively. To address these shortcomings, we propose representation learning in combination with interpretability methods as a framework for the identification of observables. More specifically, we investigate action parameter regression as a pretext task while using layer-wise relevance propagation (LRP) to identify the most important observables depending on the location in the phase diagram. The approach is put to work in the context of a scalar Yukawa model in (2+1)d. First, we investigate a multilayer perceptron to determine an importance hierarchy of several predefined, standard observables. The method is then applied directly to the raw field configurations using a convolutional network, demonstrating the ability to reconstruct all order parameters from the learned filter weights. Based on our results, we argue that due to its broad applicability, attribution methods such as LRP could prove a useful and versatile tool in our search for new physical insights. In the case of the Yukawa model, it facilitates the construction of an observable that characterises the symmetric phase.

Machine learning has the potential to aid our understanding of phase structures in lattice quantum field theories through the statistical analysis of Monte Carlo samples. Available algorithms, in particular those based on deep learning, often demonstrate remarkable performance in the search for previously unidentified features, but tend to lack transparency if applied naively. To address these shortcomings, we propose representation learning in combination with interpretability methods as a framework for the identification of observables. More specifically, we investigate action parameter regression as a pretext task while using layer-wise relevance propagation (LRP) to identify the most important observables depending on the location in the phase diagram. The approach is put to work in the context of a scalar Yukawa model in (2+1)d. First, we investigate a multilayer perceptron to determine an importance hierarchy of several predefined, standard observables. The method is then applied directly to the raw field configurations using a convolutional network, demonstrating the ability to reconstruct all order parameters from the learned filter weights. Based on our results, we argue that due to its broad applicability, attribution methods such as LRP could prove a useful and versatile tool in our search for new physical insights. In the case of the Yukawa model, it facilitates the construction of an observable that characterises the symmetric phase.

I. INTRODUCTION
Lattice simulations of quantum field theories have proven essential for the theoretical understanding of fundamental interactions from first principles, perhaps most prominently so in quantum chromodynamics. However, an in-depth understanding of the emergent dynamics is often difficult. In cases where such an understanding remains elusive, it may be instructive to search for so far unidentified structures in the data to better characterise the dynamics.
In this quest towards new physical insight, we turn to machine learning (ML) approaches, in particular from the subfield of deep learning [1]. These methods have proven capable of efficiently identifying high-level features in a broad range of data types-in many cases, such as speech or image recognition, with spectacular success [2][3][4][5]. Accordingly, there is growing interest in the lattice community to harness the capabilities of these algorithms, both for high energy physics and condensed matter systems. Applications include predictive objectives, such as detecting phase transitions from lattice configurations, as well as generative modeling . We recommend [36] as an introduction to ML for physicists and [37] as a general review for ML applications in physics.
One ansatz for the identification of relevant observables from lattice data is through representation learning, i.e. by training on a pretext task. The rationale behind this approach is that the ML algorithm learns to recognise patterns which can be leveraged to construct observables from low-level features that characterise different phases. However, solving a given task does by itself not lead to physical insights, since the inner structure of the algorithm typically remains opaque. This issue can at least partially be resolved by the use of "explainable AI" tech-niques, which have recently attracted considerable interest in the ML community and beyond. In this work, we focus on layer-wise relevance propagation (LRP) [38]. It is one of several popular post-hoc attribution methods that propagate the prediction back to the input domain, thereby highlighting features that influence the algorithm towards/against a particular classification decision.
We test this approach in the context of Yukawa theory in (2+1) dimensions, using inference of an action parameter as a pretext task in order to identify relevant observables. In a first step, we demonstrate that this is at least partially possible by training a multilayer perceptron (MLP) on a set of standard observables. Here, we show that the relevance of features in different phases, as determined by LRP, agrees with physical expectations. We benchmark our results with a similar method based on random forests. Subsequently, we demonstrate that the action parameter can be inferred directly from field configurations using a convolutional neural network (CNN). We use LRP to identify relevant filters and discuss how these align with physical knowledge. This also allows us to construct an observable that appears to be a distinctive feature of the paramagnetic phase.
The paper is organised as follows. In Section II we briefly review scalar Yukawa theory on the lattice and define important quantities. Section III serves as an introduction to the topic of explainable AI and discusses LRP in order to convey the rationale behind our approach. Numerical results for the MLP and a random forest benchmark are presented in Section IV A. In Section IV B we conduct an analysis of the CNN and subsequently demonstrate how all order parameters, as well as the aforementioned observable relevant for the paramagnetic phase, can be extracted from the filters. We discuss our findings and possible future work in Section V.

II. YUKAWA THEORY
We consider a scalar Yukawa model defined on a (2+1)d cubic lattice with periodic boundary conditions. The theory is comprised of a real-valued scalar field with quartic self-interaction coupled to Dirac fermions. The action for the scalar field can be cast into the following dimensionless form, where Λ denotes the set of all lattice sites. Here, κ is called the hopping parameter and λ takes the role of the coupling constant.
In order to ensure positivity of the partition function, one needs a minimum of two degenerate fermion flavors. Due to their bilinear contributions to the action, the fermionic d.o.f. can be integrated out, yielding the determinant of the discretised Dirac operator, as a multiplicative contribution to the statistical weight. The Euclidean Dirac γ-matrices are absorbed by the staggered transformation, yielding the scalars η µ (n) that mix the spatial and spinor degrees of freedom. They are given by η 1 (n) = 1 and η l (n) = (−1) n1 · · · (−1) n l−1 . M f denotes the fermion mass and g is the Yukawa coupling to the bosonic field. The expectation value of an observable O can then be expressed as the path integral where Z denotes the partition function. Important observables characterising phases and critical phenomena in scalar φ 4 -theory include the magnetisation, as well as the staggered magnetisation which is relevant for negative κ. The scalar part of Equa- tion (1) features the additional staggered symmetry which connects both magnetisations. The fermionic part explicitly breaks this symmetry. A slice of the phase diagram at fixed Yukawa coupling is shown in Figure 1. The theory exhibits an interesting structure, with two broken phases of ferromagnetic (FM) and antiferromagnetic (AFM) nature, where M and M s respectively acquire non-zero expectation values. They are separated by a symmetric, paramagnetic (PM) phase, where both quantities vanish.
We also consider the connected two-point correlation function While the expectation value of the magnetisation can be estimated from a single field configuration at reasonable lattice sizes, signals of n-point correlators are naturally much more suppressed due to statistical noise and cannot be reasonably approximated from one sample. Therefore, we introduce the time-sliced correlator G c (t), which is defined by where the sum runs over spacelike components. It measures correlations only in the temporal direction, which leads to a better signal-to-noise ratio due to the averaging procedure. Some aspects of the derivation and simulation are given in Appendix A. For a comprehensive treatment of Yukawa theory on the lattice, we recommend [39].

III. INSIGHTS FROM EXPLAINABLE AI
Simple methods from statistics and ML often lack the capability to model complex data, whereas sophisticated algorithms typically tend to be less transparent. A commonly used example is principal component analysis (PCA). It has been successfully applied to the extraction of (albeit already known) order parameters for various systems [6,9,12]. However, its linear structure prohibits the identification of complex non-linear features, e.g. Wilson loops in gauge theories. Hence, we require tools capable of modeling non-linearities, such as deep neural networks [17]. They allow for a more comprehensive treatment of complex systems, which has been demonstrated e.g. for fermionic theories in [7,11]. The approach also enables novel procedures, such as learning by confusion and similar techniques, to locate phase transitions in a semi-supervised manner [15,34]. For lattice QCD, action parameters can be extracted from field configurations [24]. Overall, deep learning tools seem particularly well-suited to grasp relevant information about quantum field dynamics in a completely data-driven approach, by learning abstract internal representations of relevant features.
However, their lack of transparency is frequently a major drawback of using such methods, which prohibits access to and comprehension of these representations. A unified understanding of how and what these architectures learn, and why it seems to work so well in a wide range of applications, is still pending. To better understand the processes behind neural network-driven phase detection in lattice models, multiple proposals have been made, such as pruning [8,25,31], utilising (kernel) support vector machines [16,32], and saliency maps [33].
Also, in the broader scope of ML research, there has been growing interest in interpretability approaches, most of them focusing on post-hoc explanations for trained models. So-called attribution methods typically assign a relevance score to each of the input features that quantifies-depending on the attribution algorithmwhich features the classifier was particularly sensitive to, or influenced the algorithm towards/against an individual classification decision. In the domain of image recognition, such attribution maps are typically visualised as heatmaps overlaying the input image. The development of attribution algorithms is a very active field of research in the ML community. Therefore, we refer to dedicated research articles for more in-depth treatments [40,41]. Very broadly, the most important types of such local interpretability methods can be categorised as: 1. Gradient-based, such as saliency maps [42] obtained by computing the derivative of a particular class score with respect to the input features or integrated gradients [43]. 2. Decomposition-based, such as layer-wise relevance propagation (LRP) [38] or DeepLift [44]. 3. Perturbation-based, as in [45], investigating the change in class scores when occluding parts of the input features.
In this work, we focus on LRP, as mentioned a particu-FIG. 2: Sketch of LRP through the last two layers of a classification network that predicts one-hot vectors. Relevance is indicated by arrow width. The conservation law requires the sum of widths to remain constant during backpropagation. Diagram adapted from [46]. lar variant of decomposition-based attribution methods. However, we stress that the qualitative finds do not rely on the specific choice of method. The general idea of LRP is to start from a relevance assignment in the output layer and subsequently propagate this relevance back to the input in a layer-wise fashion using certain propagation rules, see the sketch in Figure 2 and Appendix C for details. In this way, the method assigns a relevance score to each neuron, where positive (negative) entries strongly influence the classifier towards (against) a particular classification decision.

IV. RESULTS
In this section, numerical results are presented which corroborate our rationale. We train a multilayer perceptron (MLP) and a convolutional neural network (CNN) to infer the associated hopping parameter κ from a set of known observables (Approach A), as well as solely from the raw field configurations (Approach B), akin to [24]. In the first case, without providing any prior knowledge of the phase boundaries, LRP manages to reveal the underlying phase structure and returns a phase-dependent importance hierarchy of the observables that is in accordance with physical expert knowledge. In the second case, by calculating the relevances of the learned filters, we can associate each of them with one of the physical phases and thereby extract the known order parameters. Moreover, it facilitates the construction of an observable that characterises the symmetric phase. Both variants of our strategy are sketched in Figure 3. Predicting action parameters from samples is an ill-conditioned inverse problem, since the same field configuration can in principle be sampled from different combinations of couplings. Hence, the optimisation objective is formulated in terms of maximum likelihood estimation. Assuming a Gaussian distribution with fixed variance, this objective reduces to minimising the mean squared error (MSE), which we use as loss function in the following. In addition, we apply weight regularisation, see Appendix E for details.

Configurations
Action parameters FIG. 3: Sketch of our strategy to learn meaningful structures from the simulation data by analysing the networks trained for action parameter inference. Field configurations used for training are either preprocessed into observables for the MLP (Approach A) or directly operated upon with a CNN (Approach B). Obtaining accurate predictions for the parameters indicates approximate cycle consistency in the above diagram, which supports the notion that the networks have successfully identified characteristic features. These can then be extracted in a subsequent interpretation step using LRP.

A. Importance Hierarchies of Known Observables
In Section II we introduced a set of standard observables, consisting of the normal and staggered magnetisation as well as the time-sliced two-point correlation function. 1 It seems reasonable to assume that much of the relevant information characterising the phase structure and dynamics of the theory is encoded in these quantities. To check this, we create an ordered dataset of measurements of these quantities at various, evenly spaced values of κ (see Appendix B for details on the dataset) and use it to perform a regression analysis. We employ a MLP, also called fully-connected neural network (see Appendix E for details on the specific architecture). The method is compared against a random forest regressor as a baseline, which is a standard method based on the optimisation of decision trees [47] (see Appendix D for details). The results for both approaches, shown in Figures 4 and 5, will be discussed in the following.
We observe qualitatively similar accuracy on the training and test data in the broken FM and AFM phases. This is expected, since we know from Figure 1 that always one of the two types of magnetisations is strictly monotonic in the respective phase and can therefore determine κ uniquely. However, both approaches yield at best mediocre performance in the symmetric PM phase. Here, both magnetisations tend to zero and therefore do not contain much relevant information. Moreover, the two-point correlator exhibits approximately symmetric properties around κ = 0. Therefore, it also does not provide a unique mapping. This issue is resembled in the prediction for both methods. The random forest yields a symmetric discrepancy around κ = 0. In comparison, the MLP shows an improved performance for κ < 0, albeit at the price of a larger variance for κ > 0. At this point, we can already see that the chosen set of observables suffices to characterise the theory only in the broken phases, whereas in the symmetric phase, additional information appears to be necessary.
Before we embark on the search for the missing piece, let us first examine the results further to verify that the learned decision rules conform to the physical interpretation given above. We begin with the relevances as determined by LRP, shown in Figure 4 (bottom), and later compare to the random forest benchmark below. As expected, M and M s are relevant in the FM and AFM phases, respectively. There, considerable relevance is also assigned to the observable G c (t = 0). However, the contribution appears to diminish when going deeper into the broken phases. Its comparably large relevance in the symmetric PM phase shows that it contains most of the information used for the noisy prediction. As de- scribed above, the mediocre performance in this phase indicates that although the network seems to find weak signals, the chosen set of observables cannot be optimal. The interpretation sketched above is further supported by the results obtained through random forest regression [47]. Analogously to the previously introduced relevance for LRP, we can determine nominal contributions of input features to the prediction and hence a measure of local feature importance (see Appendix D for details), which is shown in Figure 5 (bottom). In the broken FM and AFM phases, the respective contributions of M and M s demonstrate a linear dependence on κ. Again, this clearly indicates that these quantities characterise the associated phases. For the symmetric PM phase, the situation appears more challenging, since no such clear dependence is observed for any of the observables. The non-zero contributions of features in the PM phase imply that they add some valuable information to the decision here. However, this has to be weighted against the observation that the accuracy in this region is poor. This further confirms our previous conclusion that relevant information to characterise this phase is largely lost in the preprocessing step, assuming that it was initially present in the raw field configurations. It is worthwhile stressing that this analysis represents an independent confirmation of the results obtained above. Both algorithms (MLP vs. random forest) rely on fundamentally different principles. We use a model-intrinsic interpretability measure for the random forest, whereas for the MLP we rely on LRP, i.e. a post-hoc attribution method.

B. Extracting Observables from Convolutional Filters
In the previous section, we used a dataset of known observables to reconstruct κ. Calculating such quantities corresponds to heavy preprocessing of the highdimensional field configuration data. The resulting lowdimensional features are far less noisy, implying distillation of relevant information. This is a common procedure in the field of data science, and may become unavoidable for large lattices and/or theories with more degrees of freedom. E.g. in state-of-the-art simulations of lattice QCD, the required memory to store a single field configuration can easily reach O(10 9 ) floating point numbers. Nevertheless, using preprocessed data in the form of standard observables introduces strong biases towards known structures. If our perception of the problem or generally our physical intuition is flawed, machine learning cannot help us-the relevant information may very well be lost in the preprocessing step. In the present case specifically, it appears that important features in the PM phase are neglected by this procedure, assuming that structures characterising this phase do in fact exist. Therefore, it is instructive to search for signals of such structures by training neural networks directly on field configurations. As a starting point for this search, we first perform a PCA on the field configuration dataset. As previously mentioned, this has been done before with promising results [6,9,12], albeit not in exactly the same physical setting. PCA immediately identifies the normal and staggered magnetisations as dominant features, essentially reproducing the work of [9]. All higher order principal components show a vanishing explained variance ratio, implying that no other relevant, purely linear features are present in the data. This observation indicates that, if a quantity exists which parametrises the symmetric PM phase, it cannot simply be a linear combination of the field variables.
Our improved approach is based on a convolutional neural network (CNN). The training procedure is largely equivalent to that for the MLP in the previous section, with the observable dataset replaced by the full field configurations. We train a CNN using five convolutional filters with a shape of 2×2×2 and a stride of 1. In order to support explainability, we encourage weight sparsity by adding the L 1 norm to the loss-also known as LASSO regularisation-as suggested in [33] (see Appendix E for details). Due to the nature of the convolution operation, learned filters have a direct interpretation i.t.o. first-order linear approximations of relevant observables. Hence, we expect the CNN to reproduce the PCA results at the very least, and aim for the identification of other, non-linear quantities, which the network can encode in subsequent layers. It is important to understand this difference between the approaches, even though both extract only linear signals in a first approximation.
The model predictions are shown in Figure 6 (top). We can immediately observe a superior performance in the PM phase compared to our previous results. The CNN succeeds to consistently infer κ from the field configuration data with high accuracy. This indicates that it indeed manages to construct internal representations suitable not only to discern the different phases, which would be sufficient for classification purposes, but also for an ordering of data points within each phase.
In order to interpret the predictions and extract knowledge about the learned representations, we have to customise LRP to our needs. In image recognition, as previously mentioned, one mostly aims at highlighting important regions in the input domain, leading to superimposed heatmaps. This is based on the inherent heterogeneity common to image data, where relevant features are usually localised. For field configurations on the lattice, due to the translational symmetry of the action and the resulting homogeneity, no particularly distinguished, localised region should be apparent in any given sample. However, each convolutional filter encodes an activation map that is in fact sensitive to a specific feature present in a lattice configuration. In contrast to the usual ansatz, the spatial homogeneity promotes global pooling over the relevances associated with each filter weight. Hence, instead of assigning relevances to input pixels, we are interested in the cumulative filter relevance which indicates their individual importance for a particular prediction. Analogously to the rationale of the previous section, we can use this approach to build importance hierarchies of filters, thereby facilitating their physical interpretation as signals of relevant observables. Figure 6 (bottom) shows each filter relevance as a function of κ. We can recognise some similarities to the relevances in Figure 4, highlighting the underlying phase structure of the Yukawa theory. It appears that the model can parametrise each phase individually using one or a small subset of filters, while the others show small or insignificant relevances in the respective region. The learned weight maps are shown in Figure 7, where we also assign names to the filters depending on the corresponding associated phase, with the exception of filter no.0 because it exhibits completely vanishing weights and relevance. It seems to have been dropped entirely by the network, indicating that four filters are sufficient to characterise all phases seen in the data. This reduction is an effect of the regularisation, and constitutes a recurring pattern also when more filters are initially used, providing a first hint towards the number of independent quantities utilised by the network.
Let us begin by examining the results that directly correspond to known quantities. We observe that the FM1 and FM2 filters have entries of roughly uniform magnitude with a globally flipped sign. Accordingly, we can identify them as signals of the negative and positive branches of the magnetisation M , respectively. This is corroborated by their dominating relevances in the FM phase. The AFM filter exhibits alternating entries of uniform magnitude and therefore corresponds to the staggered magnetisation M s , which accordingly dominates the AFM phase. Hence, both order parameters can be explicitly reconstructed from the CNN. The appearance of two filters for the magnetisation is easily understood by inspection of the network architecture in Table III, the crucial point being the application of a ReLU activation after the convolution operation. Consider the action of a positively-valued filter to negatively magnetised field configurations, or vice versa. The resulting negative activation map is subsequently defaulted to zero by the ReLU. Hence, in order to take both branches of M into account, two equivalent filters with opposing signs are required. The comparably large error bars in this region stem from the presence of positively and negatively magnetised samples in the dataset, which lead to a higher per-filter variance. Therefore, we additionally plot the cumulative relevance of both filters.
We now discuss the main object of interest, namely the PM filter. It supplies the dominant signal for the characterisation of this phase. A linear application of this filter to the configurations, as done for the FM and AFM filters, does not produce a monotonic quantity, which would be required for a unique ordering. This further supports the aforementioned evidence gathered by PCA for the absence of an additional, purely linear observable. Hence, the simple reconstruction scheme outlined in the previous paragraphs cannot be applied in this case. Instead, we undertake a heuristic attempt to reconstruct the relevant quantity. To this end, we note that the ReLU activation applied to the convolutional layer's output can effectively correspond to the absolute value function, albeit with less statistics, if the entries of the activation map are distributed accordingly. Inspired by this observation, we define the following observable, − φ(n +μ 2 +μ 3 ) + φ(n +μ 1 +μ 2 +μ 3 ) . (9) As with M and M s , we obtain the corresponding staggered form O s PM by applying the transformation given in Equation (6). The resulting pair of quantities is visualised by the following idealised filters. The observable O PM defined in Equation (9) is the sum over all lattice sites of the lattice derivative in the diagonalμ 2 +μ 3 direction of blocks in theμ 1 direction. This already explains the modulus, as otherwise O PM would be the sum over all sites of a total derivative, which vanishes identically. We also remark that O PM can be made isotropic by summing over all directions.
We now discuss the properties of the theory that are measured by O PM : In the continuum limit, O PM naively tends towards the volume integral over |∇φ|. Due to The blocking in theμ 1 -direction leads to a sensitivity of O PM to sign flips of nearest-neighbours. While no continuum observable is sensitive to these sign flips, the continuum limit of O PM maintains this information. Accordingly, O PM exhibits a distinct behavior in the presence of localised, (anti-)magnetised regions, even if the expectation values vanish globally. Possible local field alignments resulting in different values of O PM , but not of the standard derivative, are visualised in Figure 11.
The construction and discussed sensitivities of O PM demonstrates again the usefulness of LRP: we can identify the learned representation as a feature of the dataset arising from the lattice discretisation. O PM and O s PM as functions of κ are shown in Figure 9 together with the other reconstructed observables and their respective analytical counterparts. A monotonic, roughly linear dependence is observed in the PM phase, indicating that the quantity indeed provides a unique mapping which aids the κ inference. In fact, if O PM is included in the set of predefined observables for the inference approach detailed in the previous section, the prediction accuracy of the MLP accordingly becomes comparable to the CNN in this phase.
In conclusion, we find that the CNN characterises the PM phase by additionally measuring kinetic contributions in the described manner, rather than only expectation values of the condensate like in the broken phases. Still, M and M s are being utilised as well, judging from the comparably large relevances of the FM filters in this region. Due to the opacity of the fully-connected layers following the convolution, some ambiguity remains regarding the precise decision rules that the network implements based on these quantities. This residual lack of clarity can likely be resolved by manually enforcing locality in the internal operations, e.g. by introducing artificial bottlenecks into the network. Of course, the form of O PM is also not exactly equivalent to the operations of the CNN, even though they share many important features. In particular, there is a mismatch between the averaging procedure and the MaxPool layer. Effects associated with the choice of different activation functions and pooling layers, which may be tailored more specifically towards certain types of observables, should be investigated in the future. However, our analysis shows that the overlap with the learned internal representation is significant.

V. CONCLUSIONS AND OUTLOOK
We have investigated the application of interpretability methods to deep neural network classifiers as a generalpurpose framework for the identification of physical features from lattice data. The approach facilitates an interpretation of a network's predictions, permitting a quantitative understanding of the internal representations that the network learns in order to solve a pretext task-in this case, inference of action parameters. This culminates in the extraction of relevant observables from the data, leading to insights about the phase structure.
First, both types of magnetisations and the time-sliced, connected two-point correlator were used as training data for a MLP (see Figure 4). Inference of the hopping parameter was shown to work in each of the two broken phases, respectively. However, in the symmetric phase, the network was observed to suffer from bad accuracy. This indicates that the amount of relevant information present in the dataset is insufficient for the network to fully capture the dynamics of the theory. Using layerwise relevance propgation, we determined a κ-dependent importance hierarchy of the observables. Using this approach we were able to confirm our physics expectations about order parameters being relevant within their associated phases. Moreover, while the two-point correlation function is sensitive to the PM phase, this singal is insufficient for attaining high accuracy for the MLP. Our numerical results and interpretation thereof were further verified by a random forest regression benchmark performed on the same dataset, which demonstrated qualitatively comparable accuracy (see Figure 5).
Next, we trained a CNN directly on the field configurations. In contrast to aforementioned results, the CNN was shown to yield superior accuracy for the same inference task (see Figure 6). Therefore, the set of observables chosen previously must have neglected important information, which the network managed to distill from the raw data. Employing LRP, a cumulative relevance was assigned to the individual convolutional filters, revealing a distinctive pattern that explains the decision process. In particular, we observed that the network specifically assigned filters to the each of the phases of the theory, with small to vanishing relevances in the remaining phases. This also indicates where phase transitions are located. We confirmed that the learned filters corre-spond to representations of the known order parameters by examining the weight maps (see Figures 7 and 8), essentially reproducing previous results.
Guided by the filter analysis, we constructed an observable that characterises the symmetric phase. In a heuristic attempt to find the exact form of this quantity, we defined O PM in Equation (9) and showed that it exhibits several interesting properties (see Figure 9). We interpreted this quantity as a particular measure of local fluctuations that is also sensitive to nearest-neighbour sign flips. This further validates our physical intuition, since in the PM phase, we expect that relevant information for its characterisation is encoded by kinetic contributions. As discussed in detail below Equation (9), the naive continuum limit of O PM is simply the volume integral of |∇φ|, Hence, it has lost the information about nearestneighbour sign flips, while the continuum limit of its expectation value, O PM , keeps its sensitivity towards this property. Accordingly, the construction of this observable guided by the filter analysis is non-trivial evidence for the potential power of the present approach: the results demonstrate that we can identify relevant structures which may otherwise stay hidden. At this point, LRP has indeed facilitated a deeper understanding of the CNN, by explaining the origin of its comparably high accuracy w.r.t. the MLP. With these results, we have conclusively established the value of interpretability methods in deep learning analyses of lattice data.
In the present work, the emphasis was put on the methodological aspects of the analysis in order to form a comprehensive basis for future efforts. Many interesting aspects, such as an investigation of the fermionic sector, were barely discussed. Instead, we have focused on the inference of the hopping parameter. Including other action parameters into the labels, such as the Yukawa coupling or chemical potential, is a promising endeavour for the future, as it will likely lead to an improvement in comparison to the current results. This is necessary in order to pave the way towards an application to more interesting scenarios, such as QCD at finite density or competing order regimes in the Hubbard model. Moreover, the introduced ML pipeline has the potential to provide insight also in various other areas of computational physics. All field configurations composing the datasets used in this work are generated with the parameters listed in Table I. A single, labeled sample is given by the mapping In order to explicitly enforce Z 2 symmetry onto the neural networks, we use the same configurations twice in the dataset, just with a globally flipped sign. This raw data is directly used to train the CNN. For the MLP, the samples are preprocessed by computing the chosen set of observables for each configuration, In this case, we can simply take the modulus of the magnetisations without losing information, since only two branches with exactly opposite signs are present in the phase diagram. Due to the finite expectation value of the staggered magnetisation, the AFM phase contains unphysical negative correlations. In order to remove these lattice artifacts, we adapt the usual time-sliced two-point correlator to Generally, LRP is designed for classification problems. Therefore, we discretise κ to facilitate the formulation of the inference objective as a classification task. All values of κ are transformed into individual bins and the networks are tasked to predict the correct bin. In order to retain a notion of locality, the true bins are additionally smeared out with a Gaussian distribution, resulting in the target labels (B4) Here, b denotes the bin number, and the variance was set to σ = 3∆κ. In combination with a MSE loss, we obtain qualitatively similar prediction results compared to a standard regression approach.
connected sequentially with so-called branches. A single decision tree is grown iteratively from a root node to multiple leaf nodes. A concrete prediction corresponds to a unique path from the root to a single leaf. Each node on the path is associated with a specific feature. Hence, we can sum up the contributions to the decision separately for each feature by moving along the path, Here, the bias corresponds to the average prediction at the root node. We employ the scikit-learn implementation [48] in combination with a TreeInterpreter extension [49]. The latter reference also provides an excellent introduction to the concept of feature contributions. The random forest is initialised with 10 trees and a maximum tree depth of 10. This parameter is essential for regularisation, since an unconstrained depth causes overfitting and thus results in poor generalisation performance.

Appendix E: Network Architectures and Implementation Details
We use the PyTorch framework [50]. The machinery of LRP is included by defining a custom torch.nn.Module and equipping all layers with a relevance propagation rule. Furthermore, all biases are restricted to negative values in order to ensure the existence of a root point. For training, we employ the Adam optimiser [51] with default hyperparameters and an initial learning rate of 0.001, using a batch size of 16.
For both networks, the first layer undergoes least absolute shrinkage and selection operator (LASSO) regularisation during training, which encourages sparsity and thereby enhances interpretability. This corresponds to simply adding the L 1 norm of the respective weights w ij to the MSE loss, which accordingly takes the form Here, y f ,ŷ f denote the prediction and ground truth labels, and i, j the input and output nodes of the first layer. The quantity λ Lasso parametrises the strength of the regularisation.
The network architectures used in this work are given in the following tables.