Hamiltonian Learning for Quantum Error Correction

The efficient validation of quantum devices is critical for emerging technological applications. In a wide class of use cases the precise engineering of a Hamiltonian is required both for the implementation of gate-based quantum information processing and for reliable quantum memories. Inferring the experimentally realized Hamiltonian through a scalable number of measurements constitutes the challenging task of Hamiltonian learning. In particular, assessing the quality of the implementation of topological codes is essential for quantum error correction. Here, we introduce a neural-network-based approach to this challenge. We capitalize on a family of exactly solvable models to train our algorithm and generalize to a broad class of experimentally relevant sources of errors. We discuss how our algorithm scales with system size and analyze its resilience towards various noise sources.


I. INTRODUCTION
While finding an eigenstate |Ψ⟩ of a given Hamiltonian H might be a daunting task, the problem is nevertheless well defined and can often be solved by a set of existing tools. The inverse question, the identification of a Hamiltonian H given the knowledge of a state |Ψ⟩, is a much more complicated one and without a unique answer in the general case.
Hamiltonian engineering is one of the central objectives in the context of constructing quantum information processing and storage devices [1][2][3]. Despite significant progress of quantum technologies over the last decades, the validation of the engineered Hamiltonian will always be an essential step for any quantum device. Moreover, the implemented Hamiltonian cannot be accessed directly, but only through the measurements performed on the system. Given these measurements one would like to verify that the desired Hamiltonian has been implemented correctly. In addition to that, this verification should ideally be done in an efficient and scalable way.
Reconstructing the Hamiltonian governing the system's dynamics directly from the measurement results constitutes the inverse problem we address. Beyond not being unique, in the quantum setting this problem poses an additional challenge due to the exponential size of the Hilbert space: reconstructing even just the state alone via full quantum tomography is in general exponentially costly [4] (though there are notable cases where tomography can be done efficiently [5][6][7]). However, even with one eigenstate fully reconstructed, we still do not have enough information to determine which Hamiltonian it belongs to in the general setting.
There are two aspects one can exploit to simplify the inverse problem. First, we are typically not interested in the full Hamiltonian. For many purposes we only need to make sure we implement a Hamiltonian which possesses a ground state with desired properties, such as topological ground state degeneracies [8]. In other words, we only aim at finding a possible parent Hamiltonian to this ground state family. Second, the implemented Hamiltonian we try to reconstruct will typically be in the vicinity of the sought after parent Hamiltonian. This property can render the reconstruction feasible and will typically not require a full tomography.
In this work, we develop a method to recover a parent Hamiltonian for a generic example important in the context of quantum devices: the family of so-called stabilizer Hamiltonians [9] that lie at the heart of the theory of fault-tolerant quantum computation [10,11]. A canonical example of this family is the toric code model [8]. The toric code is a quantum spin-1/2 model defined on a square lattice with periodic boundary conditions. Its ground state manifold is exactly known and its structure can be used for a stable encoding of quantum states.
In particular, we design a machine learning driven method for the validation of the implementation of the toric code. Our approach addresses all of the aforementioned challenges in the solution of the inverse problem: (i) We find a minimal set of local measurements performed on a ground state that allows us to learn the system's Hamiltonian, i.e., we do not need full tomography. (ii) Capitalizing on the proximity to the targeted toric code Hamiltonian, we can train the neural network on a restricted class of exactly solvable models. Using the power of generalization of machine learning algorithms, we can then apply these trained networks to the general problem. (iii) The need for only a very restricted set of measurements and the use of efficient computing algorithms allow us to scale the solution beyond the system sizes required for future experiments. Moreover, the generalization power of neural nets equips our approach with a certain resilience to experimental noise.
The ability to efficiently validate Hamiltonians with topologically protected ground state degeneracies has direct implications for quantum error correction [10,31]. One way of using the toric code as a quantum memory is to prepare the ground state in order to encode logical qubits in its ground state manifold [8]. In practice, this means designing the many-body interactions that the stabilizer Hamiltonian contains. While many-body interactions can be difficult to implement in an experiment, there are theoretical proposals and experimental progress towards achieving the four-body interaction needed for the toric code in a variety of platforms [32][33][34][35][36][37][38][39]. However, only through the reliable validation of these schemes can one lift these proposals to a potential avenue for topologically protected quantum memories. Here we provide such a validation protocol.
In an alternative approach to using the toric code, one arrives at a ground state without the need to engineer any Hamiltonian: an arbitrary state is successively projected into the desired state by a series of projective measurements [40,41]. The performance of these methods depends on the frequency, correlations and types of errors [42][43][44][45][46], and we comment below on how our algorithm relates to these approaches.

This paper is organized as follows: In Sec. II we discuss the family of exactly solvable stabilizer Hamiltonians and the phase transitions they manifest. Sec. III addresses the neural network based Hamiltonian learning scheme we designed for this class of topological models. In Sec. IV we present the numerical analysis of the convergence and errors of the model and elaborate on its resilience to experimental noise. We discuss our results in the broader context of the field and their implications for future research directions in Sec. V.

II. SOLVABLE TOPOLOGICAL MODELS
Kitaev's toric code model [8] is defined on a k×k square lattice with periodic boundary conditions and spin-1/2 degrees of freedom located on the edges. The Hamiltonian consists of the sum of four-spin interaction terms,

H_TC = − Σ_s A_s − Σ_p B_p,    (1)

where the stabilizer operators are defined as

A_s = Π_{i∈s} σ^x_i,    (2)
B_p = Π_{i∈p} σ^z_i.    (3)

Here, s denotes the set of four spins around a vertex and p the set of four spins around a plaquette, respectively (see Fig. 1). Since the stabilizer operators, A_s and B_p, mutually commute, the ground state manifold is exactly known. It corresponds to the eigenstates of all operators A_s and B_p with maximal eigenvalue +1. The ground state degeneracy depends on the topology of the manifold the model is defined on. On a torus, there are four degenerate ground states |GS_i⟩, i ∈ {0, 1, 2, 3}. Moreover, the topological order of these ground states results in stability of the degeneracy against arbitrary local perturbations. A qubit state can then be encoded through a suitably chosen superposition

|Ψ⟩ = Σ_i c_i |GS_i⟩.    (4)

Quantum operations on this state can be performed by so-called non-contractible loops, open strings of σ^z or σ^x operators across the lattice. This property of the toric code ground state provides an exceptionally stable encoding for the state of a logical qubit: a lattice-size-long string of qubits needs to be flipped for the state of the logical qubit to change. For the purpose of our Hamiltonian learning we can restrict the discussion to one of the four states and we will use

|GS_TC⟩ = (1/√|G|) Σ_{g∈G} g|0⟩,    (5)

where |0⟩ is a reference state of the lattice with all spins up and G is the group generated by products of the vertex operators A_s (see Appendix A). We comment below on why our method is insensitive to the choice made here.

A question can be posed about how the ground state changes if the stabilizer Hamiltonian is perturbed. In Refs. [47,48] it was shown that there exists a way to leave the topologically ordered toric code phase while keeping the model analytically solvable. In particular, we can add σ^z-terms to the star operators, A_s, in a way that slightly modifies the ground state, while keeping its degeneracy untouched. We write this Hamiltonian as

H = − Σ_s (A_s − e^{−β Σ_{i∈s} λ_i σ^z_i}) − Σ_p B_p,    (6)

where λ_i ∈ [−1, 1]. The positive, real-valued parameter β drives the phase transition out of the topological phase. The particular position of the phase transition depends on the distribution of fields given by {λ_i}. The ground state of H can be expressed in terms of the original toric code ground state, |GS_TC⟩, as

|GS⟩ = (√|G|/√Z) e^{(β/2) Σ_i λ_i σ^z_i} |GS_TC⟩.    (7)

The normalization factor Z is given and explained in Appendix A.

The above described modification is not the only approach for keeping the Hamiltonian (1) analytically solvable while driving the phase transition. For example, we can symmetrically modify the plaquette terms, B_p, by σ^x-terms. In particular, adding exp{−β Σ_{i∈p} λ_i σ^x_i} to each plaquette term would yield an analogous analytical ground state. The added perturbations are of experimental relevance, as they simplify to small σ^z (or σ^x) background fields when Eq. (6) is expanded to lowest order in β. Similarly, one obtains σ^z (or σ^x) background fields if the field configuration {λ_i} is sufficiently sparse (see Appendix A).

[Fig. 2 caption: Phase diagram of the modified toric code model. The field strength β (blue axis) and the configuration {λ_i} are encoded as amplitude and angle in the phase diagram. Important field configurations are marked at the phase diagram boundary with an example field configuration and a small diagram. The small diagrams show the percentages of fields λ_i being equal to +1, 0 or −1. A phase transition out of the topologically ordered phase (TO) into the non-topologically ordered phase (non-TO) occurs at the black phase boundary. The phase of the corresponding 2D classical spin model is denoted by PM (paramagnetic), FM (ferromagnetic) or AFM (antiferromagnetic). If all fields are equal to zero, the system corresponds to the pure toric code model (1). In the phase diagram, these configurations are indicated by a red dashed line with label "TC line".]
This effect could result from imperfections or systematic errors in an engineered interaction.
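The combinatorial structure underlying this solvability is easy to check numerically: every star and plaquette share an even number of edges, so all stabilizers commute, and periodic boundary conditions make the product of all star operators the identity. A minimal sketch (the edge indexing below is our own, not taken from the text):

```python
from collections import Counter

# Toy check of the toric code stabilizer structure on a k x k torus.
# Edges carry the spins; h(x, y) is the edge from (x,y) to (x+1,y),
# v(x, y) the edge from (x,y) to (x,y+1) (indices modulo k).
k = 4

def h(x, y):  # horizontal edge id
    return (x % k) + k * (y % k)

def v(x, y):  # vertical edge id, offset by k*k
    return k * k + (x % k) + k * (y % k)

def star(x, y):
    # four edges meeting at vertex (x, y): support of A_s (sigma^x terms)
    return {h(x, y), h(x - 1, y), v(x, y), v(x, y - 1)}

def plaquette(x, y):
    # four edges around the face with lower-left corner (x, y): support of B_p
    return {h(x, y), h(x, y + 1), v(x, y), v(x + 1, y)}

# An X-type and a Z-type Pauli string commute iff their supports overlap
# on an even number of edges.
overlaps = [len(star(xs, ys) & plaquette(xp, yp))
            for xs in range(k) for ys in range(k)
            for xp in range(k) for yp in range(k)]
assert all(n % 2 == 0 for n in overlaps)  # every star/plaquette pair commutes

# Periodic boundary conditions: every edge lies in exactly two stars,
# so the product of all A_s is the identity.
count = Counter(e for x in range(k) for y in range(k) for e in star(x, y))
assert all(c == 2 for c in count.values())
```

The same parity argument is what guarantees that the ground state manifold is exactly known for any lattice size.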
For the sake of Hamiltonian learning, understanding the phase transition of model (6) is of central importance. Given the analytical form (7), it seems obvious that from suitably chosen measurements one should be able to infer the parameters of the Hamiltonian. However, this one-to-one correspondence might be obstructed by the set of degenerate ground states. It turns out that as long as we are in a phase adiabatically connected to the pure toric code, the Hamiltonian can be recovered independently of the specific superposition of degenerate ground states.
We show the position of the phase transition in the perturbed toric code Hamiltonian (6) pictorially in Fig. 2. The phase transition is found via a mapping to classical spin models [47,48]. In particular, if the corresponding classical model is in its paramagnetic phase, we are in the phase of the toric code. Transitions out of this quantum phase are indicated by the ordering of the classical spins. Details about this mapping are provided in App. B. Stability against local perturbations is a known property of topological order. The phase diagram in Fig. 2 shows that topological order in Kitaev's toric code model is also stable with respect to a large class of non-local perturbations.
These considerations lead to an immediate application to quantum error correction. Let us now assume the four-body interactions of the toric code Hamiltonian (1) are implemented in a chosen physical system. Consequently, the system will eventually arrive at its ground state. The sources of possible errors and inaccuracies are then systematic errors and noise inherently present in the experiment. We argue that error correction can then be performed at the level of Hamiltonian learning by finding a minimal set of measurements that has to be performed on the state at hand in order to deduce the Hamiltonian. Once the Hamiltonian is found, it can be corrected in order to arrive at the ideal Hamiltonian H_TC.

III. HAMILTONIAN LEARNING
In order to solve the inverse problem we need to identify a minimal set of measurements that allows us to infer the faulty Hamiltonian. This requirement is intrinsically bound to the question of how we deduce the Hamiltonian from such measurements. In this section we outline how we approach this problem and how we can extend our scheme to the case of Hamiltonians where we do not have access to an exact ground state.
The measurements we need to perform to reconstruct the Hamiltonian H should determine the field configuration {λ_i} together with the parameter β. Since only their product enters the ground state, we introduce the parameters {b_i}, b_i = βλ_i, to simplify the notation.
Using translational invariance of the underlying lattice we find that the expectation values

⟨A_i⟩, ⟨A_{i+1}⟩, ⟨A_i A_{i+1}⟩,

where A_i and A_{i+1} denote the two star operators "touching" the spin at the chosen lattice site i, contain a sufficient amount of information to recover the parameter b_i. This observation implies that for each lattice site we need to evaluate correlation functions of the star operators adjacent to that site. The function that maps these expectation values onto the Hamiltonian parameters is implemented via a small neural network, see Fig. 1. For each spin we input the expectation values ⟨A_i⟩, ⟨A_{i+1}⟩, ⟨A_i A_{i+1}⟩ and obtain the parameter b_i that determines the field strength on the given spin.
We denote with b^z_i the field strength of the σ^z fields and with b^x_i the field strength of the σ^x fields (which are determined from the expectation values of the plaquette terms B_p). As a consequence of the symmetry under the exchange of vertices and plaquettes as well as σ^x and σ^z, the neural network trained to identify the field parameter b^z_i also succeeds in identifying the parameter b^x_i.

Assuming access to many copies of the system, one can evaluate the expectation values described above. Note that the total number of expectation values needed scales linearly with the number of parameters to be estimated and hence linearly with the number of spins in the lattice.

We train a small neural net on the exact ground states of the family of Hamiltonians (6). Owing to the analytical solution we are able to simulate lattices of almost arbitrary size. We choose a particular spin in the lattice and evaluate the expectation values ⟨A_i⟩, ⟨A_{i+1}⟩, ⟨A_i A_{i+1}⟩ for a range of different field distributions on the surrounding spins and a range of values of b^z_i. We restrict ourselves to field configurations {b^z_i} such that the system is in the toric code phase, cf. Fig. 2. Consequently, it is possible to recover the Hamiltonian H even if the system is in an arbitrary state in the ground state manifold, while the training set is restricted to a specific ground state. The reason for this simplification lies in the nature of the toric code ground state: the degenerate ground states cannot be distinguished by local measurements and hence the measurement outcome of the stabilizer expectation values is identical for all states in the ground state manifold.
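For small lattices, such training inputs can be generated exactly rather than by sampling: with the ground state written as a sum over group elements g (Appendix A), stabilizer expectation values reduce to weighted classical averages over vertex-flip patterns. The sketch below applies this reduction on a 2×2 torus; the identity ⟨A_s⟩ = ⟨exp(−β Σ_{i∈s} λ_i σ^z_i(g))⟩ and the edge indexing are our own reconstruction, not code from the paper:

```python
import itertools
import math

k = 2  # 2x2 torus: 4 vertices, 8 edge spins

def h(x, y): return (x % k) + k * (y % k)          # horizontal edge id
def v(x, y): return k * k + (x % k) + k * (y % k)  # vertical edge id

# endpoints (vertex indices) of every edge
verts = {(x, y): x + k * y for x in range(k) for y in range(k)}
ends = {}
for x in range(k):
    for y in range(k):
        ends[h(x, y)] = (verts[(x, y)], verts[((x + 1) % k, y)])
        ends[v(x, y)] = (verts[(x, y)], verts[(x, (y + 1) % k)])

def star_edges(x, y):
    return [h(x, y), h(x - 1, y), v(x, y), v(x, y - 1)]

def expect_A(beta, lam, sx, sy):
    # <A_s> = sum_g w(g) exp(-beta sum_{i in s} lam_i sigma_i(g)) / sum_g w(g),
    # w(g) = exp(beta sum_i lam_i sigma_i(g)), sigma_i(g) = t_a * t_b for the
    # two vertices a, b adjacent to edge i and vertex-flip pattern t.
    num = den = 0.0
    for t in itertools.product((1, -1), repeat=k * k):
        sigma = {e: t[a] * t[b] for e, (a, b) in ends.items()}
        w = math.exp(beta * sum(lam[e] * sigma[e] for e in ends))
        num += w * math.exp(-beta * sum(lam[e] * sigma[e]
                                        for e in star_edges(sx, sy)))
        den += w
    return num / den

lam = {e: 1.0 for e in range(2 * k * k)}
assert abs(expect_A(0.0, lam, 0, 0) - 1.0) < 1e-12  # pure toric code: <A_s> = 1
assert expect_A(0.3, lam, 0, 0) < 1.0               # fields reduce <A_s>
```

For large lattices the same averages are estimated by Monte Carlo sampling of the vertex-flip patterns instead of the brute-force enumeration used here.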
We use the set of expectation values ⟨A_i⟩, ⟨A_{i+1}⟩, ⟨A_i A_{i+1}⟩ as the input for our neural net. The label we train the network to associate with these inputs is the field strength on the given spin, b^z_i (Fig. 1). We find that a dense neural network of 3 layers with 128, 150 and 128 neurons, respectively, is able to reliably approximate this mapping. We detail the architecture and the training in App. C.
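In framework-free form, the architecture amounts to three ReLU hidden layers of widths 128, 150 and 128 with a linear scalar output. The sketch below shows only the forward pass, with random placeholder weights standing in for the trained model (activation choice and initialization are our assumptions, see App. C for the actual training details):

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense network mapping the three expectation values
# (<A_i>, <A_{i+1}>, <A_i A_{i+1}>) to the field parameter b_i.
# Layer widths follow the 128-150-128 architecture described in the text.
sizes = [3, 128, 150, 128, 1]
params = [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def predict(x):
    """Forward pass: ReLU hidden layers, linear output (regression on b_i)."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

batch = rng.uniform(-1.0, 1.0, (5, 3))  # five spins' expectation values
assert predict(batch).shape == (5, 1)   # one b_i estimate per spin
```

One network is trained once and then applied to every spin in parallel, which is what makes the inference step trivially scalable in lattice size.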
We evaluate the performance of the network on states that lie outside of the manifold of ground states of exactly solvable Hamiltonians. This strategy would in principle allow us to apply the trained models on measurements of a large quantum computer or simulator. Above, we introduced two examples of toric-code Hamiltonian modifications, one through σ^z and one through σ^x fields,

H = − Σ_s (A_s − e^{−β Σ_{i∈s} λ^z_i σ^z_i}) − Σ_p (B_p − e^{−β Σ_{i∈p} λ^x_i σ^x_i}).    (8)

Separately, these modifications do not violate the exact solvability of the model. However, in the small-β limit this combined Hamiltonian covers a very general set of errors that may occur in an actual quantum device. Therefore, we showcase our method on this Hamiltonian. The above Hamiltonian cannot be solved for large systems. However, we can use the network trained on the solvable Hamiltonian. We then apply it to the ground state of (8), which for small system sizes can be obtained by brute-force diagonalization.
Let us introduce an iterative algorithm designed to evaluate the fields that characterize Hamiltonian (8). We take one of the above described ground states, evaluate the required correlators for each of the spins, and use them as input to the trained neural network. For each spin we obtain the magnitude of the field strength in the σ^x and σ^z directions. We follow this step by numerically removing the found fields from the original input Hamiltonian. While in a numerical simulation this is a straightforward task, in a quantum simulation this would be implemented by adjusting the interaction parameters to cancel the fields and re-initializing, or by an adiabatic transition between two Hamiltonians. An implementation of the corresponding set of gates is also possible. We use the "corrected" state as the new input for the same neural network, remove the resulting fields, and repeat the procedure iteratively until convergence. The iterative application is not completely straightforward, as β and −β result in the same expectation values of A_s and B_p. Hence, the iterative procedure includes a decision tree for the sign of the field. With these adjustments (addressed in detail in Appendix C) the correct fields (and therefore the correct Hamiltonian) are found within five iterations. We illustrate the algorithmic field removal in Fig. 3.
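The convergence behaviour of such an iterative removal can be illustrated with a toy model in which each pass recovers most, but not all, of the residual field; the 20% systematic underestimate and the noise level below are arbitrary assumptions made purely for illustration, not properties of the trained network:

```python
import random

random.seed(1)
true_fields = [random.uniform(-0.3, 0.3) for _ in range(18)]  # 18-spin lattice

def estimate(fields):
    # Stand-in for the trained network: recovers each residual field with
    # a 20% systematic underestimate plus small readout noise (toy model).
    return [0.8 * b + random.gauss(0.0, 0.005) for b in fields]

residual = list(true_fields)
for it in range(5):
    found = estimate(residual)
    # "remove" the found fields from the Hamiltonian
    residual = [r - f for r, f in zip(residual, found)]

# geometric convergence: the residual field is strongly suppressed
assert max(abs(r) for r in residual) < 0.05
```

As long as each pass removes a fixed fraction of the remaining field, the residual shrinks geometrically, which is consistent with the convergence within five iterations reported above.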

IV. RESULTS
We define two quantities to characterize the performance of our method. First, we introduce a measure at the level of the resulting corrected quantum ground states: the probability that physical qubits incur spin-flip or phase errors after the projection onto the stabilizers. Second, we put forward a criterion at the level of the corrected Hamiltonian. This measure is simply a distance between the ideal and the corrected Hamiltonian in a suitably chosen metric.
To formulate the error measure for the corrected states we draw the connection to standard error correction strategies: one starts from an arbitrary quantum state and implements a projection onto the eigenspaces of the operators A_s = Π_{i∈s} σ^x_i and B_p = Π_{i∈p} σ^z_i by measuring the state of ancillary qubits that are entangled with the star and plaquette operators [40]. This projection results in an eigenstate of Hamiltonian (1). Using the measurement outcomes, a decoder identifies single spin operations that map the state to the ground state of Hamiltonian (1). The decoder will succeed if the probability of spin and phase flip errors between the projective measurements is lower than a given threshold. This threshold ranges anywhere between 2% and 11% depending on the chosen error model [42][43][44][45][46][49][50][51].
For our case we do not start from an arbitrary state, but from the ground state of the corrected parent Hamiltonian. We can relate the precision of our method to standard decoding techniques by calculating the probability that a single physical qubit will flip its spin or phase when projecting into the stabilizer eigenspace. The calculation of a single qubit spin or phase flip can be deduced from the stabilizer expectation values A s and B p and is detailed in App. D.
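For a single stabilizer with eigenvalues ±1, the projection statistics follow directly from its expectation value: the measurement yields the outcome −1 with probability (1 − ⟨A_s⟩)/2. A minimal helper illustrating this relation (the full per-qubit calculation of App. D is more involved):

```python
def error_probability(expectation):
    """Probability of outcome -1 when measuring a stabilizer with
    eigenvalues +/-1 and the given expectation value."""
    assert -1.0 <= expectation <= 1.0
    return (1.0 - expectation) / 2.0

# perfect toric code ground state: all stabilizers are +1 with certainty
assert error_probability(1.0) == 0.0
# a slightly perturbed state with <A_s> = 0.98 flags an error 1% of the time
assert abs(error_probability(0.98) - 0.01) < 1e-12
```

This is why the measured stabilizer expectation values are sufficient to compare the corrected states against standard decoder thresholds.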
In Fig. 4, we show the probability of a single bit or phase flip error as a function of the iteration of our algorithm. The iteration axis also contains pictorial illustrations of the absolute field strengths on the state. When projecting onto the stabilizer eigenstates from the original faulty eigenstate we find errors well above the decoding thresholds (typically above 12%). We are thus able to show that even a small systematic error present in the Hamiltonian engineering can potentially confuse a decoder. After five iterations we arrive at a single qubit error probability of the order of 10^−2 %. In other words, if we attempted to decode the initial faulty ground state, the decoder may fail; after our correction procedure it is highly likely to succeed.

Another measure that quantifies the precision of Hamiltonian learning can be implemented on the level of the Hamiltonian directly [30]. This method relies on expanding the family of Hamiltonians we would like to estimate in a suitable basis and then measuring the distance of the estimated coefficients from the ideal ones. In particular, the Hamiltonian is expressed as H = Σ_m c_m S_m, where {S_m} is an operator basis with expansion coefficients c_m. Once we make sure that the chosen basis {S_m} is independent of the parameters to be estimated, the Hamiltonian error simply translates into the distance

ε_H = ||ĉ_true − ĉ_recovered||,

where ĉ_true and ĉ_recovered are the normalized vectors of exact and recovered coefficients, respectively. In our case ĉ_true/recovered are non-linear functions of b_i = βλ_i, the parameters the neural network is estimating. We show this functional dependence together with the expansion of the Hamiltonian in App. D. The resulting Hamiltonian error is shown as a function of iterations of the protocol in Fig. 4. We normalize the Hamiltonian error such that the maximal value is one. After five iterations of the protocol, the distance between expansion coefficients decreases to the order of 10^−3.
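The coefficient distance itself is straightforward to compute; a sketch (without the additional rescaling to a maximal value of one used in Fig. 4):

```python
import math

def hamiltonian_error(c_true, c_recovered):
    """Euclidean distance between normalized coefficient vectors."""
    def normalize(c):
        n = math.sqrt(sum(x * x for x in c))
        return [x / n for x in c]
    return math.dist(normalize(c_true), normalize(c_recovered))

# identical Hamiltonians up to an overall scale give zero error
assert hamiltonian_error([1.0, 2.0], [2.0, 4.0]) < 1e-12
# orthogonal coefficient vectors give a distance of sqrt(2)
assert abs(hamiltonian_error([1.0, 0.0], [0.0, 1.0]) - math.sqrt(2)) < 1e-12
```

Normalizing both vectors first makes the measure insensitive to an overall energy scale of the Hamiltonian, which is unobservable in the ground state.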
Both the state and Hamiltonian errors explained up to this point were evaluated under the assumption of perfect experimental readout. We now briefly discuss how our method performs when this assumption is relaxed. We assume that the experimental input for the neural network is in the form of the measured expectation values. Experimental noise would then enter through these expectation values not being evaluated correctly. Additionally, estimating an expectation value using a finite number of samples induces a statistical uncertainty. We simulated this scenario by adding Gaussian noise to our numerically evaluated inputs ⟨A_s⟩, ⟨B_p⟩. We show the behavior of the single qubit errors as well as the Hamiltonian error in Fig. 5. We observe that even in the presence of Gaussian noise the single qubit error rate after projection onto the stabilizers is reliably reduced below 5%.
All simulations so far were conducted for a lattice with 18 spins, as a full quantum simulation is required to simulate the correction procedure for the non-solvable model [see Eq. (8)]. However, this restriction is not necessary when it comes to the training of the neural network. In an experiment, the presented Hamiltonian learning procedure can therefore be applied to almost arbitrary lattice sizes. We demonstrate the scaling behavior by training the network separately for different lattice sizes while the size of the training set is fixed. The computation time to generate the training set via Monte Carlo sampling only increases linearly with the number of qubits, as the number of sweeps through the lattice is kept constant. The correction process itself can only be simulated by keeping the model solvable. We want to emphasize that this restriction appears purely for testing purposes and is not of relevance in an experimental setup. In Fig. 6 the results of our simulations are shown: the error measure outcomes are independent of lattice size.

V. DISCUSSION
In this work we have introduced a machine-learning driven method for scalable quantum Hamiltonian learning based on a toy model relevant for quantum information processing, the toric code. Scalability was achieved by training the neural network on exactly solvable models. We have shown that our method performs well even for states that lie outside this class of analytical solutions. Using knowledge of topological phases and the stability of the toric code to a large class of non-local perturbations, we justified the restriction of the correction to the topological phase. However, an expansion of the correction process outside of the topological phase should be a straightforward procedure.
Our work is complementary to standard approaches to quantum error correction. More specifically, the performance of a standard decoder is significantly improved when combined with the procedure introduced here. When the Hamiltonian is engineered precisely, logical errors are suppressed. We provided a tool for precise engineering based on measuring a minimal set of local expectation values scaling linearly with the number of qubits. Implementing a practical protocol that incorporates both the Hamiltonian learning techniques presented here and stabilizer-measurement-based decoders would therefore be a natural step towards running fault-tolerant quantum computations.

Appendix A: Ground state of the disordered toric code model

The ground state structure in limiting cases of the disordered toric code model introduced throughout this work is detailed in this section.

Normalization factor
The ground state of the disordered Hamiltonian defined in Eq. (6) is of the form

|Ψ⟩ = (1/√Z) e^{(β/2) Σ_i λ_i σ^z_i} Σ_{g∈G} g|0⟩.

To explain the normalization factor Z, we re-write the toric code ground state as

|GS_TC⟩ = (1/√|G|) Σ_{g∈G} g|0⟩,

where G denotes the abelian group whose elements g are all possible spin-flip operations defined by the action of products of vertex operators on an initial spin configuration [47]. In the toric code, periodic boundary conditions in all directions are imposed, leading to the relation Π_s A_s = 1. The group elements of the group G are therefore only determined modulo the factor Π_s A_s. As a consequence, the number of elements in the group is equal to half the number of possible products of vertex operators. We can re-write the ground state |Ψ⟩ using the introduced notation,

|Ψ⟩ = (1/√Z) Σ_{g∈G} e^{(β/2) Σ_i λ_i σ^z_i(g)} g|0⟩.    (A5)

Here, σ^z_i(g) can take the values ±1. More concretely, σ^z_i(g) corresponds to the eigenvalue of the operator σ^z_i on the eigenstate g|0⟩. The normalization factor (partition function) has the following form:

Z = Σ_{g∈G} e^{β Σ_i λ_i σ^z_i(g)}.    (A6)

Disorder in the limit of small fields
In particular cases, the disorder defined in Eq. (6) simplifies to σ^z (σ^x) single-spin fields. In the limiting case of small fields (β ≪ 1), we expand the added term as

e^{−β Σ_{i∈s} λ_i σ^z_i} ≈ 1 − β Σ_{i∈s} λ_i σ^z_i.

Terms of higher order in β, which contain two-, three- and four-spin interactions, are neglected in this approximation. Hence, we arrive at a model containing only σ^z fields acting on single spins, and interactions between spins vanish. More specifically, since every spin belongs to two vertices, the Hamiltonian of the disordered model in this limit is given by [47]

H ≈ H_TC − 2β Σ_i λ_i σ^z_i + const.

Disorder in the limit of sparse fields
A similar approximation can be made for sufficiently sparse fields. We denote with "sufficiently sparse" a field configuration where at most one field parameter λ_i per vertex is not equal to zero. If this condition is met, all interactions between spins vanish and only fields acting on single spins remain:

e^{−β Σ_{i∈s} λ_i σ^z_i} = cosh(βλ_{s_0}) 1 − sinh(βλ_{s_0}) σ^z_{s_0}.
The index s_0 denotes the spin in vertex s for which λ_{s_0} ≠ 0. We insert this simplification into the Hamiltonian defined in Eq. (6) and arrive at

H = H_TC + Σ_s [cosh(βλ_{s_0}) 1 − sinh(βλ_{s_0}) σ^z_{s_0}].

Here, U is defined as the sparse set of spins i with field strength λ_i ≠ 0. We conclude that, up to a constant energy shift, we arrive at a model consisting of the toric code model with σ^z fields acting only on the spins in U and no additional interactions.
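Both limiting cases rest on the operator identity e^{−aσ^z} = cosh(a) 1 − sinh(a) σ^z, which holds because σ^z has eigenvalues ±1. A quick numerical check of the identity on both eigenvalues:

```python
import math

def exp_pauli(a, sigma):
    """Eigenvalue of exp(-a * sigma^z) on a sigma^z eigenstate (sigma = +/-1),
    evaluated via the cosh/sinh decomposition."""
    return math.cosh(a) - math.sinh(a) * sigma

# the decomposition reproduces exp(-a * sigma) on both eigenvalues
for a in (0.0, 0.1, 0.7, -1.3):
    for sigma in (1, -1):
        assert abs(exp_pauli(a, sigma) - math.exp(-a * sigma)) < 1e-12
```

Taking products of such factors over the four spins of a vertex is what generates the multi-spin terms that vanish in the small-field and sparse-field limits.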

Appendix B: Topological phases
We analyze topological order and phase transitions of the disordered toric code model in this section. For this purpose, we map the system to a classical spin model as done in [47]. The ground state of the modified toric code (6) is given by [Eq. (A5)]

|Ψ⟩ = (1/√Z) Σ_{g∈G} e^{(β/2) Σ_i λ_i σ^z_i(g)} g|0⟩.    (B1)

It should be noted that each group element g, written as a product of vertex operators, is only determined modulo the product of all vertex operators, due to periodic boundary conditions. We can map every product of vertex operators to a pseudo-spin configuration, defined as follows: artificial degrees of freedom θ_s ∈ {−1, 1} are introduced on all vertices s. The value of θ_s determines whether the vertex s is flipped (i.e., contained in the product of vertex operators we map the pseudo-spin configuration to). More specifically, the vertex s is flipped if θ_s = −1. We can now make a change of variables. The eigenvalue σ^z_i(g) of the spin i depends on the two adjacent vertices s, s' being flipped or not. As a consequence, this eigenvalue is given by σ^z_i(g) = θ_s θ_s', resulting in a mapping between the spin configuration {σ^z_i(g)} and the pseudo-spin configuration {θ_s} with a gauge freedom of an overall factor of (−1). The mapping is further illustrated in Fig. 7.
Let us define an Ising model on the pseudo-spin configuration with the Hamiltonian given by [47,48]

H_Ising = − Σ_{⟨s,s'⟩} J_{s,s'} θ_s θ_s'.

The connection to the field parameters β and {λ_i} is given by the coupling constant J_{s,s'}. In particular,

J_{s,s'} = λ_i,

where i denotes the edge shared by the adjacent vertices s and s', and β plays the role of the inverse temperature of the classical model. By comparing to the partition function of the disordered toric code model (A6) we observe that Z_Ising = 2Z [52]. We can show that the phase transition of the disordered toric code is mapped to the phase transition of the Ising model by considering a well-known measure to detect quantum phase transitions, the fidelity susceptibility [53]. The fidelity susceptibility is defined as

χ_F(β) = lim_{δβ→0} [−2 ln F(β, β+δβ)] / (δβ)²,    F(β, β+δβ) = |⟨Ψ(β)|Ψ(β+δβ)⟩|.

Here, the state |Ψ(β)⟩ is a ground state of a given Hamiltonian depending on the parameter β; for our case the state is defined in Eq. (7). A quantum phase transition is indicated via a maximum or divergence in the fidelity susceptibility χ_F [54,55]. In particular, this property has been shown for generic second-order symmetry-breaking quantum phase transitions [56]. It has further been suggested by numerical studies [52] that the fidelity susceptibility is also able to detect a topological phase transition. For the disordered toric code model the overlap takes the form F(β, β+δβ) = Z(β+δβ/2)/√(Z(β)Z(β+δβ)), so the fidelity susceptibility is calculated as

χ_F = (1/4) ∂² ln Z / ∂β².

We can understand the connection between the phase transition of the quantum toric code model and the phase transition of the classical spin model by examining the heat capacity of the mapped Ising model,

C = k_B β² ∂² ln Z / ∂β².

We observe that the heat capacity is proportional to the fidelity susceptibility of the disordered toric code model, C = 4 k_B β² χ_F. A similar relation has already been found in [52] for the fidelity metric, an equivalent measure to the fidelity susceptibility regarding the characterization of quantum phase transitions. For the disordered toric code model, both fidelity measures simplify to the same expression.
As both the heat capacity and the fidelity susceptibility indicate a phase transition via a maximum or divergence, respectively, we conclude that a phase transition in the classical spin model at a critical temperature T = T_c indicates a quantum phase transition in the disordered toric code model at the critical field strength β_c = 1/(k_B T_c). It remains to examine whether the indicated quantum phase transition is topological, i.e. whether the transition leads to a topologically trivial phase. This question has already been discussed in [47,48]. In particular, it has been shown that the paramagnetic phase of the Ising model maps to the topological phase of the perturbed toric code model, whereas the ferromagnetic phase maps to the topologically trivial phase. The antiferromagnetic phase also maps to the topologically trivial phase due to sublattice symmetry. In the case of arbitrary field strengths, the argument for the mapping of Ising phases to topological phases holds only deep in the corresponding phases. Combined with the discussion of the fidelity susceptibility, we can conclude that it also holds close to the phase transition and that the topological phase transition is sharp.
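The proportionality between heat capacity and fidelity susceptibility can be verified numerically for any partition function of the form Z(β) = Σ_g e^{βW(g)}, using the overlap F(β, β+δβ) = Z(β+δβ/2)/√(Z(β)Z(β+δβ)). A finite-difference sketch on a toy set of classical weights (the values of W below are arbitrary, chosen only for illustration):

```python
import math

W = [2.0, 0.0, -2.0, 0.0]  # toy classical "energies" W(g)

def logZ(beta):
    return math.log(sum(math.exp(beta * w) for w in W))

def fidelity(beta, d):
    # overlap of the normalized thermal-form states at beta and beta + d
    return math.exp(logZ(beta + d / 2) - 0.5 * logZ(beta) - 0.5 * logZ(beta + d))

beta, d = 0.7, 1e-4
chi_f = -2.0 * math.log(fidelity(beta, d)) / d ** 2       # fidelity susceptibility
d2logZ = (logZ(beta + d) - 2 * logZ(beta) + logZ(beta - d)) / d ** 2

# chi_F = (1/4) d^2 lnZ / d beta^2, so C/k_B = beta^2 d^2 lnZ = 4 beta^2 chi_F
assert abs(chi_f - d2logZ / 4.0) < 1e-3
heat_capacity = beta ** 2 * d2logZ                        # C / k_B
assert abs(heat_capacity - 4.0 * beta ** 2 * chi_f) < 1e-3
```

Since the second derivative of ln Z is the variance of W, both quantities peak where the classical fluctuations are largest, i.e. at the classical phase transition.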
Let us examine how the field configuration {λ_i} influences the position of the phase transition at the critical field strength β_c. A complete description is in general a daunting task due to the number of degrees of freedom related to the field configuration {λ_i}. However, we can gain insight into the stability of the topological phase by focusing on a more conceptual {λ_i}–β phase diagram (see Fig. 2). In particular, we aim to show for which field configurations phase transitions out of the topological phase occur at all.
In order to analyze the relation between field configuration and phase transition, we examine two limiting cases. We start with the configuration λ_i = 1 ∀i. This configuration is mapped to an Ising model with equal bond strengths, for which the phase transition occurs at the critical field strength β_c ≈ 0.44. Let us ask whether we can modify the field configuration such that no phase transition occurs at all. As long as all fields are strictly greater than 0, a critical field strength can always be found such that the system leaves the topological phase. More concretely, β can always be chosen large enough that every product βλ_i exceeds β_c ≈ 0.44; therefore, a phase transition must occur. Due to sublattice symmetry, the same statement holds when all fields are strictly smaller than 0.
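The value β_c ≈ 0.44 quoted above is the exact critical coupling of the isotropic square-lattice Ising model (Onsager). A minimal numerical sketch of this value and of the sufficiency argument for strictly positive fields (function names are our own illustration):

```python
import math

def ising_critical_coupling() -> float:
    # Onsager: sinh(2 * beta_c) = 1  =>  beta_c = ln(1 + sqrt(2)) / 2
    return 0.5 * math.log(1.0 + math.sqrt(2.0))

def beta_driving_out_of_topological_phase(lambdas) -> float:
    # Sufficient condition from the text: choose beta so large that every
    # effective bond beta * lambda_i exceeds beta_c (all lambda_i > 0);
    # the smallest field sets the required strength.
    beta_c = ising_critical_coupling()
    return beta_c / min(lambdas)

print(round(ising_critical_coupling(), 4))  # -> 0.4407
```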
There are two options for arriving at a configuration where no phase transition occurs; both produce a configuration that contains neither only strictly positive nor only strictly negative fields. One option is to set a fraction of the fields to 0, arriving at a mixture of strictly positive and vanishing fields (or strictly negative and vanishing fields). Here, we can examine the critical concentration of fields that have to be removed such that no phase transition occurs. In other words, we are interested in how sparse the configuration has to be such that the topological phase is never left. To answer this question, we consider a well-known probabilistic classical spin model: the Ising model with random bond dilution. The Hamiltonian of the diluted Ising model is given by

H = −Σ_{⟨s,s'⟩} J_{s,s'} τ_s τ_{s'}, with P(J_{s,s'}) = q δ(J_{s,s'}) + (1−q) δ(J_{s,s'} − J),

corresponding to

P(λ_i) = q δ(λ_i) + (1−q) δ(λ_i − λ).

Here, P(J_{s,s'}) is the probability distribution of a bond strength, translating into the probability distribution P(λ_i) of a field λ_i, with i denoting the edge connecting the sites s and s'. The critical fraction of fields set to zero hence corresponds to a critical dilution probability q. The model has been well studied [57][58][59][60] and the critical concentration found to be q_c = 0.5. In other words, a background field present on fewer than half of the spins cannot drive the system out of the topological phase, regardless of the field strength.
Another way one can modify the field configuration in order to preserve topological order is to flip the signs of a fraction of the fields. We can make use of a probabilistic Ising model that has already been studied in connection with the disordered toric code model [48]. In this model, we consider random sign flips of the bonds, determined by the probability distribution

P(J_{s,s'}) = p δ(J_{s,s'} + J) + (1−p) δ(J_{s,s'} − J),

where p is the probability that the sign of a bond strength is flipped. The critical value of p has been found to be p_c ≈ 0.12 [57][58][59][60]. Due to sublattice symmetry, the phase diagram of this model is symmetric with respect to p = 0.5. Hence, a second critical value occurs at p_c = 0.88. The J–T phase diagram of both models is well known and shown in Fig. 8.
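The two disorder distributions above can be sampled directly; a minimal sketch, with function names and the sampling interface being our own illustration:

```python
import random

# Dilution: a field is removed (set to 0) with probability q.
def sample_diluted_fields(n, q, lam=1.0, rng=random):
    return [0.0 if rng.random() < q else lam for _ in range(n)]

# Sign flip: the sign of a field is flipped with probability p.
def sample_sign_flipped_fields(n, p, lam=1.0, rng=random):
    return [-lam if rng.random() < p else lam for _ in range(n)]

Q_C = 0.5    # critical dilution concentration [57-60]
P_C = 0.12   # approximate critical sign-flip probability [57-60]

# For q > Q_C, more than half of the fields vanish on average and the
# system can no longer be driven out of the topological phase.
fields = sample_diluted_fields(10_000, q=0.6)
print(fields.count(0.0) / len(fields) > Q_C)
```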
The relations found between field configuration and phase transition are summarized in a conceptual phase diagram, shown in Fig. 2. The field configuration {λ_i} and the field strength β are encoded as angle and amplitude in the phase diagram. The field distribution {λ_i} is varied continuously with the angle between the configurations explicitly marked in the diagram. The depicted configurations either mark a phase transition or correspond to a limiting case of field strengths. An exemplary field distribution and a small diagram indicating the percentages of occurring field strengths illustrate the depicted configurations. The field configurations on the left-hand side do not contain fields of different signs simultaneously. On the right-hand side, all fields are different from 0. The phase diagram is symmetric with respect to its horizontal axis as a result of sublattice symmetry.
Appendix C: Neural Network

Network training and architecture
The architecture and training process of the artificial neural network used for the Hamiltonian learning procedure are detailed in this section. In the main text we introduced a class of Hamiltonians of the form

H = −Σ_s A_s − Σ_p B_p − Σ_i (b_i^z σ_i^z + b_i^x σ_i^x).

Our algorithm aims at determining the parameters {b_i^z} and {b_i^x}. We found the measurement set used as input for the network heuristically, by minimizing the number of measurements that still contain sufficient information to determine the Hamiltonian parameters. The result of this search is the set of expectation values ⟨A_s⟩, ⟨A_{s'}⟩ and ⟨A_s A_{s'}⟩, which proves sufficient to determine the fields |b_i^z|. By symmetry, the equivalent set of plaquette expectation values ⟨B_p⟩, ⟨B_{p'}⟩ and ⟨B_p B_{p'}⟩ determines the fields |b_i^x|. As a result, when the trained network receives the vertex stabilizer expectation values as input, it outputs the field |b_i^z| on the qubit i with adjacent vertices s and s'. Analogously, when given the inputs ⟨B_p⟩, ⟨B_{p'}⟩ and ⟨B_p B_{p'}⟩, it returns |b_i^x| for the qubit i with adjacent plaquettes p and p'. We can combine the ability of neural networks to generalize with the symmetry of the model under exchange of vertices and plaquettes. It follows that it is sufficient to train on the solvable model containing fields in only one direction, and to use the network for the evaluation of the general, non-solvable model. Even though the training is restricted to either vertex or plaquette stabilizer expectation values, the symmetry ensures that both types of expectation values can be taken as input for evaluation, and consequently both field configurations {b^z} and {b^x} can be determined. We choose to train on the solvable model with σ^z-type fields,

H = −Σ_s A_s − Σ_p B_p − Σ_i b_i^z σ_i^z.

Thus, the network is trained on the inputs ⟨A_s⟩, ⟨A_{s'}⟩ and ⟨A_s A_{s'}⟩.
It is furthermore sufficient to train for a field at a specific position (spin), as the measurements for a field at a different position are related by translation invariance. In other words, we choose a single spin at which to evaluate the expectation values and create the training set by varying the size of the lattice and the distribution of the background fields.
We design a neural network with three input neurons, three hidden layers and one output neuron. The network architecture is depicted in Fig. 9. As we estimate a continuous parameter, we use as loss function the mean squared distance between the estimated value b^output and the correct field strength b^label,

L = (1/BS) Σ_{j=1}^{BS} (b_j^output − b_j^label)²,

where the vectors b^output and b^label correspond to the network output and the labeled data, respectively, and BS denotes the batch size. The network is trained separately for each lattice size k, as the stabilizer expectation values vary with lattice size. Training data are generated by calculating the vertex stabilizer expectation values via Monte Carlo sampling for sufficiently distinct field configurations {b_i^z}, such that the network learns to neglect the contribution to the expectation values from fields we do not aim to determine. More concretely, every field strength is drawn uniformly at random from the interval [−b_max, b_max], and the label is chosen as the absolute value of the field strength at position i. The maximal field strength b_max = 1.7 is selected such that the training set includes field configurations inducing a single qubit error probability, after projection into the stabilizer eigenspace, larger than the error threshold of standard decoders. We use a training set of 7450 examples. The size of the training set is identical for all lattice sizes; we impose this restriction in order to be able to analyze the scaling behavior (see Fig. 6).
In Fig. 10, we show that training and evaluation loss (50 evaluation examples) both converge after fewer than 10^4 training steps. The training results for the lattice length k = 16 are plotted. We consider the statistical uncertainty in the input expectation values to be the main source of the nonzero loss the training converges to, which is similar for all tested lattice sizes. In particular, the neural network has been trained for the lattice sizes k = 3, 4, 8, 12, 16, 20, 24.

Determining the sign
We use a trained neural network to determine the absolute values of b_i^z and b_i^x by feeding in stabilizer expectation values. These expectation values do not contain sufficient information to determine the signs of b_i^z and b_i^x. In order to find the sign, we additionally measure the expectation values ⟨σ_i^z⟩ and ⟨σ_i^x⟩ for each spin. We show in the following that the sign of the field strength can be approximated as the sign of the corresponding expectation value ⟨σ_i^z⟩ or ⟨σ_i^x⟩. In particular, we demonstrate this approximation for a model containing fields in both directions with an additional restriction ensuring solvability. Exact diagonalization and application of the complete Hamiltonian learning process to the analytically non-solvable model discussed throughout this work confirm that this decision mechanism for determining the sign also holds approximately for the generalized model. Let us examine the restriction added to the model to keep it solvable. For this purpose, we introduce the sets I_z and I_x, which denote the sets of spins for which b_i^z ≠ 0 and b_i^x ≠ 0, respectively. We demand I_z ∩ I_x = ∅; in other words, only a field in one direction can be present on each spin. The Hamiltonian is then given by

H = −Σ_s A_s − Σ_p B_p − Σ_{i∈I_z} b_i^z σ_i^z − Σ_{i∈I_x} b_i^x σ_i^x.

It can straightforwardly be verified that the un-normalized ground state takes a closed form analogous to Eq. (7), with the exponential weights restricted to the fields in I_z and I_x. As we aim to determine a sign, we can ignore overall factors; hence, we will not discuss normalization here. We calculate the sign of the expectation value ⟨σ_i^z⟩ measured on this ground state under the restriction I_z ∩ I_x = ∅ and obtain

sgn⟨σ_i^z⟩ = sgn(A_1 e^{2 b_i^z} − A_2 e^{−2 b_i^z}).

Any overall factors occurring during the calculation are eliminated by the sign function sgn. In particular, all dependencies on the parameters b_i^x can be factored into an overall term using the condition I_z ∩ I_x = ∅ and therefore vanish. In addition, we introduced the group G_1 as the subgroup G_1 ⊂ G containing all elements g for which σ_i^z(g) = 1.
Analogously, the group G_2 is defined such that σ_i^z(g) = −1 for g ∈ G_2. The factors A_1 and A_2 are given by

A_{1/2} = Σ_{g∈G_{1/2}} exp(2 Σ_{j∈I_z, j≠i} b_j^z σ_j^z(g)).
We can relate the sign of the expectation value ⟨σ_i^z⟩ to the sign of the parameter b_i^z by introducing the approximation A_1 ≈ A_2. In addition, we make use of the fact that A_1, A_2 > 0. Combining the two relations, we arrive at

sgn⟨σ_i^z⟩ ≈ sgn(e^{2 b_i^z} − e^{−2 b_i^z}) = sgn(b_i^z),

where we dropped the overall factor A_1; henceforth, the relations hold approximately. Symmetry implies the analogous relations for the parameter b_i^x and the expectation value ⟨σ_i^x⟩.

We can summarize the Hamiltonian learning process in two steps. In the first step, the stabilizer expectation values ⟨A_s⟩, ⟨A_{s'}⟩, ⟨A_s A_{s'}⟩ and ⟨B_p⟩, ⟨B_{p'}⟩, ⟨B_p B_{p'}⟩ are measured for each spin and the results are fed into the trained neural network, which outputs the absolute values of the field configurations. In the second step, ⟨σ_i^z⟩ and ⟨σ_i^x⟩ are measured for each spin and the signs of the parameters are determined accordingly. With the obtained field strengths, the system can be corrected towards the pure toric code. The process can be iterated to achieve optimal results.

On the level of quantum states, we have introduced as an error measure the probability that a single qubit flips its spin or phase after projection into the stabilizer eigenspace. In particular, this probability allows us to assess the improvement our method can in principle bring to quantum error correction. More concretely, if the calculated probability is reduced below the decoder-specific threshold, the performance of a standard decoder applied to the system is significantly improved. Let us make two clarifications to refine the relation to quantum error correction. The single qubit error threshold is usually defined with respect to the probability of a single qubit error per correction cycle [31,[42][43][44][45]], where a correction cycle corresponds to measurements of stabilizers and applied single-qubit gates. Here, we examine a single cycle, and hence it is sufficient to calculate the probability of a single qubit error.
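The combination of the two steps amounts to multiplying the network's magnitude estimate by the sign of the measured single-spin expectation value. A minimal sketch (function names are our own illustration):

```python
import math

# Step 1 gives |b| from the network; step 2 fixes the sign from <sigma_z>.
def estimate_field(abs_b_from_network: float, sigma_z_expectation: float) -> float:
    sign = 1.0 if sigma_z_expectation >= 0 else -1.0
    return sign * abs_b_from_network

# The sign rule rests on sgn<sigma_z> ~ sgn(e^{2b} - e^{-2b}) = sgn(b):
def sign_model(b: float) -> float:
    if b == 0:
        return 0.0
    return math.copysign(1.0, math.exp(2 * b) - math.exp(-2 * b))

print(estimate_field(0.8, -0.3))  # -> -0.8
```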
In addition, the disorder examined throughout this work is not of the form of single qubit spin- or phase-flip errors. Nevertheless, any disorder is translated into spin- and phase-flip errors by projective measurements of the stabilizers. Consequently, we explain in this section how to calculate the single qubit error probability after projection into the toric code eigenspace.
Instead of simulating the projective measurements and "counting" the single qubit errors, we calculate the single qubit error probability in a way that does not rely on full quantum simulation and is hence also accessible for large lattice sizes.
We determine the probability of a physical phase flip (spin flip) by first calculating the probability for an arbitrary vertex (plaquette) to be flipped and then deducing the single qubit error probability. We explain the calculation for the example of vertex flips and single qubit phase-flip errors; it can be repeated analogously for plaquette flips and single qubit bit-flip errors.
a. Calculating the vertex flip probability. The vertex stabilizer expectation values are already available from the Hamiltonian learning procedure. Every vertex operator has the eigenvalues {±1}. Thus, we can make an approximation and rewrite a vertex expectation value as

⟨A_s⟩ = (1 − p_s) − p_s = 1 − 2 p_s.

Here, the probability of the vertex A_s being flipped (projected into an eigenstate with eigenvalue −1) is given by p_s. We invert the relation to find the probability

p_s = (1 − ⟨A_s⟩)/2.

To obtain an average vertex flip rate p, we average over all vertices.

b. Calculating the error rate. We now understand how to obtain the probability of a vertex flip. We proceed by using the vertex flip probability to deduce the probability of a single qubit phase flip e_r as follows.
We calculate the vertex flip rate p for different probabilities of a single qubit phase flip by numerical sampling. In particular, we start with a lattice where no phase is flipped and flip each phase with probability e_r. The vertex flip probability p is obtained by repeating the procedure, counting the number of flipped vertices and averaging. We arrive at a function p(e_r). The inverse function e_r(p), and as a result the probability of a single qubit error, can be found by numerical regression.
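The sampling procedure can be sketched as follows, with qubits on the edges of a k × k square lattice with periodic boundaries; a vertex counts as flipped when an odd number of its four incident edges carry a phase flip (function and variable names are our own illustration):

```python
import random

def vertex_flip_rate(k: int, e_r: float, samples: int = 200, rng=random) -> float:
    """Monte Carlo estimate of the vertex flip probability p(e_r)."""
    flipped_vertices = 0
    for _ in range(samples):
        # horizontal[x][y], vertical[x][y]: True if the phase on that edge flipped
        horizontal = [[rng.random() < e_r for _ in range(k)] for _ in range(k)]
        vertical = [[rng.random() < e_r for _ in range(k)] for _ in range(k)]
        for x in range(k):
            for y in range(k):
                incident = (horizontal[x][y] + horizontal[(x - 1) % k][y]
                            + vertical[x][y] + vertical[x][(y - 1) % k])
                flipped_vertices += incident % 2  # odd parity -> flipped vertex
    return flipped_vertices / (samples * k * k)

# For small e_r, p(e_r) ~ 4 e_r to leading order: a vertex touches four
# qubits, and simultaneous multi-flips on one vertex are rare.
```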
The function p(e_r) is independent of lattice size in the parameter range we consider here. We verified this by calculating the vertex flip probability for the single qubit phase-flip probabilities e_r ∈ [0, 0.2] for the lattice sizes k = 8, 12, 16, 20, 24, 28, 32, where the total number of spins equals 2k² (see Fig. 11). We conclude that it is sufficient to find a numerical fit for the function at a single lattice size. To minimize uncertainties caused by the statistical error of the simulation, we pick the largest simulated lattice size, k = 32, to fit the curve. We use a polynomial ansatz of degree 4 and minimize the mean squared distance. The curve is fitted up to a mean squared loss of l_ms ≈ 10^{-7} by the function

e_r(p) = 0.2187 p + 0.72419 p² − 2.5398 p³ + 4.90118 p⁴. (D4)

Both the numerically simulated data and the fit are shown in Fig. 12. The obtained function returns the probability of a single qubit phase flip given the vertex stabilizer expectation values, and it applies analogously to single qubit bit-flip errors given plaquette expectation values. Full quantum simulation for k = 3 confirms these results.
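The fitted polynomial can be applied directly to measured data; a sketch chaining it with the vertex flip relation p = (1 − ⟨A_s⟩)/2 from above (helper names are our own illustration):

```python
# Quartic fit e_r(p), Eq. (D4); coefficients as quoted in the text.
def single_qubit_error_rate(p: float) -> float:
    return 0.2187 * p + 0.72419 * p**2 - 2.5398 * p**3 + 4.90118 * p**4

# An average measured vertex expectation value <A_s> translates into the
# vertex flip rate p = (1 - <A_s>) / 2 and then into the phase-flip rate.
def error_rate_from_expectation(a_s_mean: float) -> float:
    return single_qubit_error_rate((1.0 - a_s_mean) / 2.0)

print(single_qubit_error_rate(0.0))  # -> 0.0
```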

Measure for Hamiltonian error
An error measure on the level of Hamiltonian learning is given by the reconstruction error ∆_H [30]. In this section, we detail the calculation of this Hamiltonian error for the disordered toric code model. As a reminder, ∆_H is given by Eq. (9). Each term in the expansion of the Hamiltonian is a product of the form c_i S_i, with c_i a coefficient and S_i an operator. The dependence on the parameters b_i^z and b_i^x is fully contained in the coefficients c_i; the operator basis is independent of the parameters. More specifically, the operator basis resulting from the Hamiltonian expansion is a combination of the unit matrix and products of Pauli matrices. We make the separation into coefficients and basis explicit by rewriting all occurring operators as S_i, with i an (arbitrary) numbering of the basis elements. The expansion of the Hamiltonian can be divided into two parts: a sum over all vertices s and a sum over all plaquettes p.
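The separation into coefficients c_i and Pauli basis elements S_i can be illustrated on a toy two-qubit Hamiltonian, using the orthogonality of Pauli products, c_i = Tr(S_i H)/2^n. The toy Hamiltonian and helper names are our own illustration:

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=float)
Z = np.array([[1, 0], [0, -1]], dtype=float)

def pauli_basis_2():
    # All 16 two-qubit Pauli products, keyed by their label.
    paulis = {"I": I, "X": X, "Z": Z, "Y": np.array([[0, -1j], [1j, 0]])}
    return {a + b: np.kron(paulis[a], paulis[b])
            for a in paulis for b in paulis}

def expand(H):
    # Coefficients c_i in H = sum_i c_i S_i, via c_i = Tr(S_i H) / 2^n (n = 2).
    return {name: np.trace(S.conj().T @ H).real / 4
            for name, S in pauli_basis_2().items()}

# Toy example: H = -Z(x)Z - 0.3 X(x)I, so only c_ZZ and c_XI are nonzero.
H = -np.kron(Z, Z) - 0.3 * np.kron(X, I)
coeffs = expand(H)
print(coeffs["ZZ"], coeffs["XI"])  # -> -1.0 -0.3
```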