Maximum-Likelihood-Estimate Hamiltonian learning via efficient and robust quantum likelihood gradient

Given the recent developments in quantum techniques, modeling the physical Hamiltonian of a target quantum many-body system is becoming an increasingly practical and vital research direction. Here, we propose an efficient strategy combining maximum likelihood estimation, gradient descent, and quantum many-body algorithms. Given the measurement outcomes, we optimize the target model Hamiltonian and density operator via a series of descents along the quantum likelihood gradient, which we prove yields a negative semi-definite change of the negative-log-likelihood function. In addition to such optimization efficiency, our maximum-likelihood-estimate Hamiltonian learning respects the locality of a given quantum system and therefore extends readily to larger systems with available quantum many-body algorithms. Compared with previous approaches, it also exhibits better accuracy and overall stability with respect to noise, fluctuations, and temperature, which we demonstrate with various examples.


I. INTRODUCTION
Understanding the quantum states and the corresponding properties of a given quantum Hamiltonian is a crucial problem in quantum physics. Many powerful numerical and theoretical tools have been developed for such purposes and have made compelling progress [1][2][3][4][5]. On the other hand, with the rapid experimental developments of quantum technology, e.g., near-term quantum computation [6,7] and simulation [8][9][10][11][12][13][14], it is also vital to explore the inverse problem, i.e., Hamiltonian learning: optimizing a model Hamiltonian characterizing a quantum system with respect to the measurement results. Given prior knowledge and assumptions about a target system, researchers have achieved many resounding successes modeling quantum Hamiltonians with physical pictures and phenomenological approaches [15,16]. However, such subjective perspectives risk biases and are commonly insufficient for detailed quantum devices. Therefore, the exploration of objective Hamiltonian learning strategies has attracted much recent attention [17][18][19][20][21][22][23][24][25].
There are mainly two categories of Hamiltonian-learning strategies, based upon either quantum measurements on a large number of (identical copies of) quantum states, e.g., Gibbs states or eigenstates [17][18][19][20][21][22], or the time-evolution dynamics of initial states [23][24][25][26], corresponding to the target quantum system. For example, given measurements of the correlations of a set of local operators, the kernel of the resulting correlation matrix offers a candidate model Hamiltonian [17][18][19]. On the other hand, while established theoretically, most approaches suffer from elevated costs and are limited to small systems in experiments or numerical simulations [19,20,27,28]. Besides, there remains much room for improvement in stability towards noise and temperature ranges.

* frankzhangyi@gmail.com

Maximum likelihood estimation (MLE) is a powerful tool that parameterizes and then optimizes the probability distribution of a statistical model so that the given observed data is most probable. MLE's intuitive and flexible logic makes it a prevailing method for statistical inference. Adding to its wide range of applications, MLE has been applied successfully to quantum state tomography [29][30][31][32][33], providing the most probable quantum states given the measurement outputs.
Inspired by MLE's successes in quantum problems, we propose a general MLE Hamiltonian learning protocol: given finite-temperature measurements of the target quantum system in thermal equilibrium, we optimize the model Hamiltonian towards the MLE step-by-step via a "quantum likelihood gradient". We show that such a quantum likelihood gradient, acting collectively on all operators present, yields a negative semi-definite change of the negative-log-likelihood function and thus provides efficient optimization. In addition, our strategy may take advantage of the locality of the quantum system, therefore allowing us to extend studies to larger quantum systems with tailored quantum many-body ansatzes such as Lanczos, quantum Monte Carlo (QMC), density matrix renormalization group (DMRG), and finite temperature tensor network (FTTN) [34,35] algorithms in suitable scenarios. We also demonstrate that MLE Hamiltonian learning is more accurate, less restrictive, and more robust against noise over broader temperature ranges. Further, we generalize our protocol to measurements on pure states, such as the target quantum systems' ground states or quantum chaotic eigenstates. Therefore, MLE Hamiltonian learning enriches our arsenal for cutting-edge research and applications of quantum devices and experiments, such as quantum computation, quantum simulation, and quantum Boltzmann machines [36].
We organize the rest of the paper as follows: In Sec. II, we review the MLE context and introduce the MLE Hamiltonian learning protocol; in particular, we show explicitly that the corresponding quantum likelihood gradient leads to a negative semi-definite change of the negative-log-likelihood function. Via various examples in Sec. III, we demonstrate our protocol's capability, especially its robustness against noise and across temperature ranges. We generalize the protocol to quantum measurements of pure states in Sec. IV and Appendix D, with consistent results for exotic quantum systems such as quantum critical and topological models. We summarize our studies in Sec. V with a conclusion on our protocol's advantages (and limitations), potential applications, and future outlooks.

II. MAXIMUM-LIKELIHOOD-ESTIMATE HAMILTONIAN LEARNING
To start, we consider an unknown target quantum system Ĥ_s = Σ_j μ_j Ô_j in thermal equilibrium, and measurements of a set of observables {Ô_i} on its Gibbs state ρ̂_s = exp(−βĤ_s)/tr[exp(−βĤ_s)], where β is the inverse temperature. Given a sufficient number N_i of measurements of the operator Ô_i, the occurrence count f_{λ_i} of the λ_i-th eigenvalue o_{λ_i} approaches:

p_{λ_i} = f_{λ_i}/N_i → tr[ρ̂_s P̂_{λ_i}],    (1)

where p_{λ_i} denotes the statistics of the outcome o_{λ_i}, and P̂_{λ_i} is the corresponding projection operator onto the o_{λ_i} sector. Our goal is to locate the model Hamiltonian Ĥ_s for the quantum system, which commonly requires the presence of all of Ĥ_s's terms in the measurement set {Ô_i}.
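As a concrete and entirely toy illustration of Eq. 1, the following sketch builds the Gibbs state of a hypothetical 2-site spin-1/2 Hamiltonian and the outcome statistics that the measured frequencies converge to; the couplings and the measured observable are our own choices, not the paper's:

```python
import numpy as np
from scipy.linalg import expm

# Toy illustration of Eq. 1 (our own couplings and observable, not the
# paper's): the Gibbs state of a hypothetical 2-site spin-1/2 Hamiltonian and
# the outcome statistics tr[rho_s P_{lambda_i}] of one measured observable.
sx = np.array([[0., 1.], [1., 0.]]) / 2
sz = np.array([[1., 0.], [0., -1.]]) / 2
I2 = np.eye(2)

H_s = 0.7 * np.kron(sz, sz) + 0.3 * np.kron(sx, I2)  # hypothetical H_s
beta = 1.0
rho_s = expm(-beta * H_s)
rho_s /= np.trace(rho_s)

O_i = np.kron(sz, I2)                    # measured observable: S^z on site 1
vals, vecs = np.linalg.eigh(O_i)
probs = {}
for v, u in zip(np.round(vals, 12), vecs.T):
    P = np.outer(u, u)                   # rank-1 piece of projector P_{lambda}
    probs[v] = probs.get(v, 0.0) + np.trace(rho_s @ P).real

total = sum(probs.values())              # sector probabilities sum to one
```

Summing the rank-1 pieces per distinct eigenvalue yields exactly the sector projectors P̂_{λ_i} of the text.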
Following previous MLE analysis [29][30][31][32][33], the statistical weight of any given state ρ̂ is:

L(ρ̂) = Π_i Π_{λ_i} tr[ρ̂ P̂_{λ_i}]^{f_{λ_i}},    (2)

up to a trivial factor, where N_tot = Σ_i N_i is the total number of measurements. For Hamiltonian learning, we search for (the set of parameters μ_j of) the MLE Hamiltonian Ĥ, whose Gibbs state ρ̂ maximizes the likelihood function in Eq. 2. The maximum condition for Eq. 2 can be re-expressed as:

R̂(ρ̂) ρ̂ = ρ̂,   R̂(ρ̂) = (1/N_tot) Σ_i Σ_{λ_i} f_{λ_i} P̂_{λ_i}/tr[ρ̂ P̂_{λ_i}];    (3)

see Appendix A for a detailed review. Solving Eq. 3 is a nonlinear and nontrivial problem, for which many algorithms have been proposed [30][31][32][33]. For example, we can employ iterative updates ρ̂_{k+1} ∝ R̂(ρ̂_k) ρ̂_k R̂(ρ̂_k) until Eq. 3 is fulfilled [31]. These algorithms mostly center around the parameterization and optimization of a quantum state ρ̂, whose cost is exponential in the system size. Besides, such iterative updates do not guarantee that the quantum state ρ̂ remains of the Gibbs form, especially when the measurements are insufficient to determine the state uniquely (e.g., under large noises or small numbers of measurements, many quantum states satisfy Eq. 3). Consequently, extracting Ĥ ∝ −(1/β) ln ρ̂ from ρ̂ further adds to the inconvenience.
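The R̂ρ̂R̂-type tomography iteration mentioned above can be sketched for a single qubit measured in the z and x bases; the frequencies below are made up but consistent with a physical (mixed) state:

```python
import numpy as np

# Sketch of the MLE tomography iteration rho_{k+1} ∝ R(rho_k) rho_k R(rho_k)
# [31] for one qubit measured in the z and x bases; frequencies are made up
# but consistent with a physical state (Bloch vector x = 0.4, z = 0.7).
Pz0, Pz1 = np.diag([1., 0.]), np.diag([0., 1.])   # z-basis projectors
Pxp = np.full((2, 2), 0.5)                        # x-basis projectors
Pxm = np.array([[0.5, -0.5], [-0.5, 0.5]])
projs = [Pz0, Pz1, Pxp, Pxm]
freqs = np.array([0.85, 0.15, 0.70, 0.30]) / 2    # f_{lambda_i} / N_tot

rho = np.eye(2) / 2                               # maximally mixed start
for _ in range(300):
    R = sum(f * P / np.trace(rho @ P).real for f, P in zip(freqs, projs))
    rho = R @ rho @ R
    rho /= np.trace(rho).real                     # restore unit trace
```

At the fixed point R̂ρ̂ = ρ̂, the state reproduces the observed statistics; note that nothing in the iteration keeps ρ̂ of Gibbs form, which is precisely the drawback discussed above.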
Considering that the operator R̂(ρ̂) has the same operator structure as the Hamiltonian, we take an alternative stance for the Hamiltonian learning task and update the candidate Hamiltonian Ĥ_k, i.e., the model parameters, collectively and iteratively. In particular, we integrate the corrections to the Hamiltonian coefficients into the operator R̂(ρ̂), which offers such a quantum likelihood gradient (Fig. 1):

Ĥ_{k+1} = Ĥ_k − γ R̂_k,   ρ̂_{k+1} = e^{−βĤ_{k+1}}/tr[e^{−βĤ_{k+1}}],    (4)

where γ > 0 is the learning rate, a small parameter controlling the step size. We denote R̂_k ≡ R̂(ρ̂_k) for short hereafter. Compared with previous Hamiltonian extractions from MLE quantum state tomography, the update in Eq. 4 possesses several advantages in Hamiltonian learning. First, we can utilize the Hamiltonian structure (e.g., locality) to choose suitable numerical tools (e.g., QMC and FTTN) and even calculate within subregions; we circumvent the costly parametrization of the quantum state ρ̂. Also, the update guarantees a state in its Gibbs form. Last but not least, we will show that for γ ≪ 1, the quantum likelihood gradient in Eq. 4 yields a negative semi-definite contribution to the negative-log-likelihood function, guaranteeing the MLE Hamiltonian (up to a trivial constant) at its convergence and an efficient optimization toward it.

Theorem: For γ ≪ 1, γ > 0, the quantum likelihood gradient in Eq. 4 yields a negative semi-definite contribution to the negative-log-likelihood function M(ρ̂_{k+1}) = −(1/N_tot) log L(ρ̂_{k+1}).

Proof: We note that up to linear order in γ ≪ 1:

e^{−βĤ_{k+1}} = e^{−β(Ĥ_k − γR̂_k)} ≈ e^{−βĤ_k} [1 + βγ Σ_{j≥0} (β^j/(j+1)!) ad^j_{Ĥ_k} R̂_k] = e^{−βĤ_k} [1 + βγ ∫_0^1 e^{βsĤ_k} R̂_k e^{−βsĤ_k} ds],    (5)

where ad^j_Â B̂ = [Â, ad^{j−1}_Â B̂] and ad^0_Â B̂ = B̂ are the adjoint actions of the Lie algebra. The first and second equalities are based on the Zassenhaus formula [37] and the Baker-Hausdorff formula [37], respectively, with terms above the linear order of γ neglected.
Following this, we can re-express the quantum state in Eq. 4 as:

ρ̂_{k+1} = ρ̂_k [1 + βγ ∫_0^1 e^{βsĤ_k} R̂_k e^{−βsĤ_k} ds] / (1 + βγ),    (6)

where we have used tr[ρ̂_k R̂_k] = 1 as a direct consequence of R̂_k's definition in Eq. 3. Subsequently, after introducing the quantum likelihood gradient, the negative-log-likelihood function becomes:

M(ρ̂_{k+1}) = M(ρ̂_k) + βγ (1 − tr[ρ̂_k (∫_0^1 e^{βsĤ_k} R̂_k e^{−βsĤ_k} ds) R̂_k]) + O(γ²),    (7)

where we keep terms up to linear order of γ in the log expansion.
On the other hand, we can establish the following inequality:

tr[ρ̂_k (∫_0^1 e^{βsĤ_k} R̂_k e^{−βsĤ_k} ds) R̂_k]
= (1/Z_k) ∫_0^1 tr[e^{−β(1−s)Ĥ_k} R̂_k e^{−βsĤ_k} R̂_k] ds
= (1/Z_k) ∫_0^1 ||e^{−β(1−s)Ĥ_k/2} R̂_k e^{−βsĤ_k/2}||_F² ds
≥ (1/Z_k) ∫_0^1 (tr[e^{−βĤ_k} R̂_k])²/Z_k ds
= (tr[ρ̂_k R̂_k])² = 1,    (8)

where Z_k = tr[e^{−βĤ_k}] is the partition function, ||A||_F = (tr[A†A])^{1/2} is the Frobenius norm of matrix A, and the non-negative definiteness of ρ̂_k allows ρ̂_k = (ρ̂_k^{1/2})². The inequality in the fourth line follows from the Cauchy-Schwarz inequality |tr[Â†B̂]|² ≤ ||Â||_F² ||B̂||_F², with Â = e^{−βĤ_k/2} and B̂ = e^{−β(1−s)Ĥ_k/2} R̂_k e^{−βsĤ_k/2}.
We note that the equality, the convergence criterion of our MLE Hamiltonian learning protocol, is established if and only if:

e^{−β(1−s)Ĥ_k/2} R̂_k e^{−βsĤ_k/2} ∝ e^{−βĤ_k/2},    (9)

which implies the conventional MLE optimization target R̂ρ̂ = ρ̂ in Eq. 3. We can also establish such consistency from our iterative convergence [38] following Eq. 4: at convergence, Ĥ_{k+1} and Ĥ_k differ at most by a trivial constant, so that ρ̂_{k+1} = ρ̂_k and R̂_k ρ̂_k = ρ̂_k, where we have used the commutation relation [R̂_k, Ĥ_k] = 0 between the Hermitian operators R̂_k and Ĥ_k following Eq. 9. Finally, combining Eq. 7 and Eq. 8, we have shown that M(ρ̂_{k+1}) − M(ρ̂_k) ≤ 0 is a negative semi-definite quantity, which proves the theorem.
We conclude that the quantum likelihood gradient in Eq. 4 offers an efficient and collective optimization towards the MLE Hamiltonian, modifying all model parameters simultaneously. For each step of the quantum likelihood gradient, the most costly calculation is that of ρ̂_{k+1}, or more precisely, the expectation values tr[ρ̂_{k+1} P̂_{λ_i}] from Ĥ_{k+1}. Fortunately, this is a routine calculation in quantum many-body physics and condensed matter physics, with various tailored candidate algorithms under different scenarios. For example, we may resort to the FTTN or the QMC approaches, which readily apply to much larger systems than brute-force exact diagonalization. Thus, we emphasize that MLE Hamiltonian learning works with evaluations of the expectation values of quantum states instead of the more expensive quantum states themselves in their entirety.
Interestingly, MLE Hamiltonian learning also allows a more local stance. For a given Hamiltonian, the necessary expectation value of its Gibbs state takes the form:

tr[ρ̂ P̂_{λ_i}] = tr_A[ρ̂^A_eff P̂_{λ_i}],   ρ̂^A_eff ≡ tr_Ā[ρ̂] = e^{−βĤ^A_eff}/tr_A[e^{−βĤ^A_eff}],    (11)

where ρ̂^A_eff is the reduced density operator defined upon a relatively local subsystem A still containing P̂_{λ_i}. The effective Hamiltonian Ĥ^A_eff = Ĥ_A + V̂^A_eff of the subregion A contains the existing terms Ĥ_A within the subsystem and the effective interaction terms V̂^A_eff from the trace operation [39]. According to the conclusions of quantum belief propagation theory [39,40], the interaction in the latter term is local, with ||V̂^A_eff(l_A)||_F ∝ (β/β_c)^{l_A/r}, where β (β_c) denotes the current (model-dependent critical) inverse temperature, l_A is the distance between a specific site in the bulk of A and the boundary of the subregion A, and r is the maximum acting distance (diameter) of a single operator in the original Hamiltonian (similar to the k-locality in the next section). Thus, when β < β_c (especially when β ≪ β_c), V̂^A_eff is exponentially localized around the boundary of A, and the effective Hamiltonian in the bulk of A remains the same as that of the original Ĥ_A of the entire system. Therefore, we may further boost the efficiency of MLE Hamiltonian learning by redirecting the expectation-value evaluations of the global quantum system to a series of local patches, as we will show in the next section.
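The quasi-locality statement can be checked numerically on a toy model: for a small transverse-field Ising chain (our own choice, not the paper's example), the effective Hamiltonian of a two-site subregion obtained from the exact reduced density operator differs from the bare restriction Ĥ_A only weakly at high temperature, with the correction growing as β increases:

```python
import numpy as np
from scipy.linalg import expm, logm

# Toy check of the quasi-locality claim: on a 4-site transverse-field Ising
# chain (our own choice), compare the effective Hamiltonian
# H_eff^A = -(1/beta) log tr_B[rho] of subregion A = {sites 0, 1} with the
# bare restriction H_A; the mismatch V_eff^A grows with beta.
sx = np.array([[0., 1.], [1., 0.]]) / 2
sz = np.array([[1., 0.], [0., -1.]]) / 2

def op(single, site, L=4):               # embed a single-site operator
    out = np.array([[1.]])
    for j in range(L):
        out = np.kron(out, single if j == site else np.eye(2))
    return out

H = sum(op(sz, i) @ op(sz, i + 1) for i in range(3)) + sum(op(sx, i) for i in range(4))
H_A = np.kron(sz, sz) + np.kron(sx, np.eye(2)) + np.kron(np.eye(2), sx)

def traceless(M):                        # both sides defined up to a constant
    return M - np.trace(M) / M.shape[0] * np.eye(M.shape[0])

def V_eff_norm(beta):
    rho = expm(-beta * H)
    rho /= np.trace(rho)
    rho_A = np.einsum('abcb->ac', rho.reshape(4, 4, 4, 4))  # trace out sites 2, 3
    H_eff = -logm(rho_A).real / beta
    return np.linalg.norm(traceless(H_eff) - traceless(H_A))

v_small, v_large = V_eff_norm(0.1), V_eff_norm(0.5)
```

The comparison is made on traceless parts, since both the effective and the bare Hamiltonians are only defined up to an additive constant.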
In summary, given the quantum measurements of a thermal (Gibbs) state, i.e., {Ô_i}, N_i, and f_{λ_i}, we can perform MLE Hamiltonian learning to obtain the MLE Hamiltonian via the following steps (Fig. 1):
0) Initialize the model Hamiltonian Ĥ_0, e.g., as a random Hamiltonian or an identity Hamiltonian.
1) Update: carry out the quantum likelihood gradient Ĥ_{k+1} = Ĥ_k − γ R̂_k, where R̂_k is defined in Eq. 3 or Eq. 16.
2) Evaluate the properties tr[ρ̂_k P̂_{λ_i}] of the quantum state ρ̂_k = e^{−βĤ_k}/tr[e^{−βĤ_k}] with suitable numerical methods.
3) Check for convergence: loop back to step 1) to update, k → k + 1, if the relative entropy M(ρ̂_k) − M_0 remains above a given threshold ϵ; otherwise, terminate the process, and the final Ĥ_k is the result for the MLE Hamiltonian. Here, M_0 is the theoretical minimum of the negative-log-likelihood function:

M_0 = −(1/N_tot) Σ_i Σ_{λ_i} f_{λ_i} log p_{λ_i}.

In practice, R̂_k in Eq. 4 is singular for small values of tr[ρ̂_k P̂_{λ_i}] and may become numerically unstable, which requires a minimal or dynamical learning rate γ to maintain the range of the quantum likelihood gradient properly. Instead, we may employ a re-scaled version of R̂_k:

R̃_k = Σ_i (N_i/N_tot) Σ_{λ_i} f_g(p_{λ_i}/tr[ρ̂_k P̂_{λ_i}]) P̂_{λ_i},    (16)

where f_g is a monotonic tuning function (Eq. 17) which maps its argument in (0, ∞) to a finite range (0, g).
Such a re-scaled R̃_k regularizes the quantum likelihood gradient and allows a simple yet relatively larger learning rate γ for more efficient MLE Hamiltonian learning. We also have f_g(1) = 1, therefore R̃_k → R̂_k as we approach convergence. We will mainly employ R̃_k for our examples in the following sections.
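The steps above can be sketched end-to-end on a 2-site toy system via exact diagonalization. The measured frequencies are replaced by exact Gibbs probabilities (the N_i → ∞ limit), the couplings and operator content are our own choices, and the tuning function below is a hypothetical stand-in that merely realizes the stated properties (monotone, f_g(1) = 1, range (0, g)), since the paper's Eq. 17 is not reproduced here:

```python
import numpy as np
from scipy.linalg import expm

# End-to-end sketch of steps 0)-3) on a 2-site toy system (exact
# diagonalization). Frequencies are replaced by exact Gibbs probabilities,
# i.e., the N_i -> infinity limit; the couplings, operator content, and the
# tuning function f_g are our own hypothetical choices.
s = {'x': np.array([[0., 1.], [1., 0.]]) / 2,
     'y': np.array([[0., -1j], [1j, 0.]]) / 2,
     'z': np.array([[1., 0.], [0., -1.]]) / 2}
ops = [np.kron(s[a], np.eye(2)) for a in 'xyz'] \
    + [np.kron(np.eye(2), s[a]) for a in 'xyz'] \
    + [np.kron(s[a], s[b]) for a in 'xyz' for b in 'xyz']

rng = np.random.default_rng(7)
mu_s = rng.uniform(-1, 1, len(ops))      # target parameters
beta, gamma, g = 1.0, 0.2, 2.0
w = 1.0 / len(ops)                       # equal weights N_i / N_tot

def gibbs(H):
    rho = expm(-beta * H)
    return rho / np.trace(rho).real

def projectors(O):                       # spectral projectors P_{lambda_i}
    vals, vecs = np.linalg.eigh(O)
    out = {}
    for v, u in zip(np.round(vals, 10), vecs.T):
        out[v] = out.get(v, 0) + np.outer(u, u.conj())
    return list(out.values())

proj = [projectors(O) for O in ops]
rho_s = gibbs(sum(m * O for m, O in zip(mu_s, ops)))
p = [[np.trace(rho_s @ P).real for P in Ps] for Ps in proj]

def f_g(x):                              # hypothetical tuning function:
    return g * x / (x + g - 1.0)         # monotone, f_g(1) = 1, range (0, g)

def nll(rho):                            # M(rho): negative-log-likelihood/N_tot
    return -sum(w * pi * np.log(np.trace(rho @ P).real)
                for Ps, ps in zip(proj, p) for P, pi in zip(Ps, ps))

M0 = -sum(w * pi * np.log(pi) for ps in p for pi in ps)

H = np.zeros((4, 4), dtype=complex)      # step 0: identity-like initialization
m_hist = []
for k in range(3000):
    rho = gibbs(H)                       # step 2: evaluate tr[rho_k P]
    m_hist.append(nll(rho))
    R = sum(w * f_g(pi / np.trace(rho @ P).real) * P   # re-scaled R_k, Eq. 16
            for Ps, ps in zip(proj, p) for P, pi in zip(Ps, ps))
    H = H - gamma * R                    # step 1: quantum likelihood gradient
    H -= np.trace(H) / 4 * np.eye(4)     # drop the trivial identity drift
```

The explicit removal of the identity component implements the "up to a trivial constant" freedom of the converged Hamiltonian without affecting the Gibbs state.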
In addition to the negative-log-likelihood function M(ρ̂_k), we also consider the Hamiltonian distance as another criterion for the quality of Hamiltonian learning:

∆μ⃗_k = ||μ⃗_s − μ⃗_k||,

where μ⃗_s and μ⃗_k are the (vectors of) coefficients [41] of the target Hamiltonian and the learned Hamiltonian after k iterations, respectively. However, we do not recommend ∆μ⃗_k (∆μ for short) as a convergence criterion, as μ⃗_s is generally unknown aside from benchmark scenarios [28].

III. EXAMPLE MODELS AND RESULTS
In this section, we demonstrate the performance of the MLE Hamiltonian learning protocol. For better numerical simulations, we consider k-local Hamiltonians, with operators acting non-trivially on no more than k contiguous sites in each direction. For example, for a 1-dimensional spin-1/2 system, a k-local operator for k = 2 takes the form Ŝ^α_i Ŝ^β_{i+1} or Ŝ^α_i, α, β ∈ {x, y, z}, where Ŝ^α_i denotes the spin operator. In particular, we focus on general 1D quantum spin chains with k = 2, taking the following form:

Ĥ_s = Σ_i Σ_{αβ} J^{αβ}_i Ŝ^α_i Ŝ^β_{i+1} + Σ_i Σ_α h^α_i Ŝ^α_i,    (10)

where Ŝ^α_i denotes the spin operator on site i, α, β ∈ {x, y, z}. There are 12L − 9 2-local operators under the open boundary condition, where L is the system size. We generate the model parameters μ⃗_s = {J^{αβ}_i, h^α_i} randomly following a uniform distribution in [−1, 1]. This Hamiltonian Ĥ_s, specifically the model parameters μ⃗_s, will be our target for MLE Hamiltonian learning. As the protocol's inputs, we simulate quantum measurements of all 2-local operators {Ô_i} on the Gibbs states of Ĥ_s, numerically via exact diagonalization for small systems and FTTN for large systems. For the latter, we use a tensor network ansatz called the "ancilla" method [34], where we purify the Gibbs state with auxiliary qubits, ρ̂_s = tr_aux |ψ_s⟩⟨ψ_s|, and obtain |ψ_s⟩ = e^{−βĤ_s/2} |ψ_0⟩ from a maximally-entangled state |ψ_0⟩ via imaginary time evolution. In addition, given a large number n of Trotter steps, the imaginary-time-evolution operator e^{−βĤ_s/2} is decomposed into a product of Trotter gates as (Π_i e^{−μ_i Ô_i β/2n})^n + O(β²/n). Here, we set the Trotter step δt = β/n ∈ [0.01, 0.1], for which the Trotter errors of order O(β²/n) show little impact on our protocol's accuracy. Without loss of generality, we employ the integrated FTTN algorithm in the ITensor numerical toolkit [35], and set the number of measurements N_i = N for all operators in our examples for simplicity.
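The operator bookkeeping for Eq. 10 can be sketched as follows; the labeling scheme is our own, and the assertion checks the 12L − 9 count quoted above:

```python
import itertools

# Bookkeeping sketch for the 2-local content of Eq. 10 on an open chain:
# 9 two-site couplings S^a_i S^b_{i+1} per bond and 3 on-site fields S^a_i
# per site give 9(L-1) + 3L = 12L - 9 operators, matching the quoted count.
def two_local_labels(L):
    labels = [("J", i, a, b)                 # coupling S^a_i S^b_{i+1}
              for i in range(L - 1)
              for a, b in itertools.product("xyz", repeat=2)]
    labels += [("h", i, a)                   # on-site field S^a_i
               for i in range(L) for a in "xyz"]
    return labels

counts = {L: len(two_local_labels(L)) for L in (2, 10, 100)}
```

The same labels index the random parameter vector μ⃗_s drawn uniformly from [−1, 1].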
As we demonstrate in Fig. 2, MLE Hamiltonian learning obtains the target Hamiltonians with high accuracy and efficiency under various settings of system sizes and inverse temperatures β. Besides, instead of the original quantum likelihood gradient with R̂_k in Eq. 3, we may obtain a faster convergence with the re-scaled R̃_k in Eq. 16 and a larger learning rate, as we discuss in Appendix B. In the following numerical examples, we use the re-scaled quantum likelihood gradient R̃_k and set g = 2 for the tuning function in Eq. 17. Within the given iterations, not only have we achieved results (Hamiltonian distance ∆μ ∼ O(10^−12) and relative entropy M(ρ̂_k) − M_0 ∼ O(10^−16)) comparable to, if not exceeding, previous methods [19] for L = 10 systems and β = 1 straightforwardly, but we have also achieved satisfactory consistency (∆μ ∼ O(10^−2) and M(ρ̂_k) − M_0 ∼ O(10^−9)) for large systems of L = 100 and low temperatures of β = 3 that were previously inaccessible.
MLE Hamiltonian learning is also relatively robust against temperature and noise, two key factors impacting accuracy in Hamiltonian learning. For illustration, we add random errors δ⟨Ô_i⟩, following a Gaussian distribution with zero mean and standard deviation δ, to all quantum measurements: ⟨Ô_i⟩ → ⟨Ô_i⟩ + δ⟨Ô_i⟩. We note that such δ may also depict the quantum fluctuations [19,21] from a finite number of measurements, δ ∝ N_i^{−1/2}. We also focus on smaller systems with L = 10 and employ exact diagonalization to avoid confusion from potential Trotter errors of the FTTN ansatz [34]. We summarize the results in Fig. 3.

FIG. 3. The performance of MLE Hamiltonian learning holds up relatively well against noises and, especially, broader temperature ranges. Left: the Hamiltonian distance versus the inverse temperature β shows a broader applicable temperature range. Each data point contains 10 trials. Right: the performance (left figure's data averaged over temperature) versus the noise strength δ shows the impact of noises and the protocol's relative robustness against them. The slope of the straight line is ∼ 1, indicating a linear relationship between ∆μ and δ. Note the log scale log(∆μ) for the vertical axis. We set L = 10 for the system size, and learning rate γ = 1.
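The injected measurement noise can be sketched as follows; the array values are hypothetical, and only the Gaussian error model itself is taken from the text:

```python
import numpy as np

# The Gaussian noise model used for the robustness test, as we read it: each
# measured expectation value is shifted independently by N(0, delta^2),
# mimicking shot noise delta ~ N_i^{-1/2} from N_i repetitions.
rng = np.random.default_rng(0)

def add_measurement_noise(expectations, delta):
    expectations = np.asarray(expectations, dtype=float)
    return expectations + rng.normal(0.0, delta, size=expectations.shape)

clean = np.array([0.31, -0.12, 0.05])    # hypothetical expectation values
noisy = add_measurement_noise(clean, delta=1e-3)
```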
Most previous algorithms for Hamiltonian learning have rather specific applicable temperature ranges. For example, the high-temperature expansion of e^{−βĤ} only works in the β ≪ 1 limit [42,43]. Besides, gradient descent on the log partition function, despite being a convex optimization, performs well only in a narrow temperature range [20]. The gradient of this algorithm is proportional to the inverse temperature, so the algorithm's convergence slows at high temperatures. Also, the gradient descent algorithm cannot extend to the β → ∞ limit, i.e., the ground state, while our protocol is directly applicable to the ground states of quantum systems, as we will generalize and justify later.
MLE Hamiltonian learning is also more robust to noise, with an accuracy of Hamiltonian distance ∆μ ∼ O(10^−11) across a broad temperature range at noise strength δ ∼ O(10^−12). Such a noise level is hard to realize in practice; nevertheless, it is necessary to safeguard the correlation matrix method [17,19,44]. Even so, due to the uncontrollable spectral gap, the correlation matrix method is susceptible to high temperatures, and its accuracy drastically decreases to ∆μ ∼ O(10^−3) at β = 0.01. In comparison, MLE Hamiltonian learning is more versatile, with an approximately linear dependence between its accuracy ∆μ and the noise strength δ across a broad range of temperatures and noise strengths, saturating the previous bound [20]; see the right panel of Fig. 3. We also provide more detailed comparisons between the algorithms in Appendix C.
Despite the efficient quantum likelihood gradient and applicable quantum many-body ansatzes, the computational cost of MLE Hamiltonian learning still increases rapidly with the system size L. Fortunately, as stated above in Eq. 11, we may resort to calculations on local patches, especially in low dimensions and at high temperatures due to the quasi-Markov property. In particular, when β < β_c (T > T_c), the difference between the cutoff Hamiltonian Ĥ_A and the effective Hamiltonian Ĥ^A_eff in a local subregion A, V̂^A_eff, should be weak, short-ranged, and localized at A's boundary [39,40]; therefore, for those operators P̂_{λ_i} adequately deep inside A, we can use ρ̂_A, the Gibbs state defined by Ĥ_A, to estimate the corresponding tr[ρ̂ P̂_{λ_i}]; see the illustration in the upper panel of Fig. 4. For example, we apply MLE Hamiltonian learning to L = 100 systems, where we iteratively calculate the necessary expectation values on different local patches of size L_A = 10, 16. We also choose different cutoffs Λ, and evaluate tr[ρ̂_A P̂_{λ_i}] only for those operators at least Λ away from the boundaries and sufficiently deep inside the subregion A, so that the effective potential V̂^A_eff may become negligible. We also employ a sufficient number of local patches to guarantee full coverage of the necessary observables; operators outside A or within Λ of its boundary are obtainable from another local patch B, as shown in the upper panel of Fig. 4, and so forth. Both the L = 100 target system and the local patches for MLE Hamiltonian learning are simulated via FTTN. We have no problem achieving convergence, and the resulting Hamiltonians' accuracy, the Hamiltonian distance ∆μ versus the inverse temperature β, is summarized in the lower panel of Fig. 4. Indeed, the local-patch approximation is more reliable at higher temperatures, as well as with larger subsystems and cutoffs, albeit with rising costs. We also note that with local patches, we can reach much larger systems than the L = 100 we have demonstrated.
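One way to organize the local-patch covering is sketched below; this is our own bookkeeping with hypothetical parameters, keeping from each window only the bonds buffered by at least Λ sites from an interior cut (physical chain ends need no buffer):

```python
# Bookkeeping sketch (our own, hypothetical parameters) of the local-patch
# covering for an open chain: windows of size L_A tile the chain, and each
# window contributes only the bonds at least Lam sites away from an interior
# cut; the stride guarantees every bond is covered by some patch.
def local_patches(L, L_A, Lam):
    stride = L_A - 1 - 2 * Lam            # width of each trusted bond window
    assert stride > 0
    starts, s = [], 0
    while True:
        s = min(s, L - L_A)               # clamp the last window to the edge
        starts.append(s)
        if s == L - L_A:
            break
        s += stride
    patches = []
    for s in starts:
        lo = s if s == 0 else s + Lam     # physical edges need no buffer
        hi = s + L_A - 2 if s + L_A == L else s + L_A - 2 - Lam
        patches.append((s, list(range(lo, hi + 1))))
    return patches

covered = set()
for start, bonds in local_patches(100, 10, 2):
    covered.update(bonds)                 # union of trusted bonds
```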

IV. MLE HAMILTONIAN LEARNING FOR PURE EIGENSTATES
In addition to Gibbs states, MLE Hamiltonian learning also applies to measurements of certain eigenstates of target quantum systems: 1. The ground states are essentially the β → ∞ limit of the Gibbs states. However, due to the order-of-limit issue, the γ → 0 requirement of the theorem on Gibbs states forbids a direct extension to the ground states. In Appendix D, we offer a rigorous proof of the effectiveness of the quantum likelihood gradient based on ground-state measurements, along with several nontrivial MLE Hamiltonian learning examples on quantum critical and topological ground states. We note that Ref. 45 offers preliminary studies on pure-state quantum state tomography, inspiring this work. 2. A highly-excited eigenstate of a (non-integrable) quantum chaotic system Ĥ_s is believed to obey the eigenstate thermalization hypothesis (ETH), i.e., its density operator ρ̂_s = |ψ_s⟩⟨ψ_s| behaves locally indistinguishably from a Gibbs state ρ̂_{s,A} in thermal equilibrium [46]:

tr_Ā[|ψ_s⟩⟨ψ_s|] ≈ ρ̂_{s,A} = tr_Ā[e^{−β_s Ĥ_s}]/tr[e^{−β_s Ĥ_s}],

where β_s is an effective inverse temperature determined by the energy expectation value ⟨ψ_s|Ĥ_s|ψ_s⟩ = tr[e^{−β_s Ĥ_s} Ĥ_s]/tr[e^{−β_s Ĥ_s}]. As MLE Hamiltonian learning only engages local operators, its applicability directly generalizes to such eigenstates |ψ_s⟩ following the ETH.
3. In general, ETH applies to eigenstates in the center of the spectrum of quantum chaotic systems, while low-lying eigenstates are too close to the ground state to exhibit ETH [47]. However, in the rest of the section, we demonstrate numerically that MLE Hamiltonian learning still works well for low-lying eigenstates.
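Solving the energy-matching condition for the effective inverse temperature β_s is a one-dimensional root-finding problem, sketched here on a hypothetical dense random matrix standing in for the chaotic Hamiltonian:

```python
import numpy as np
from scipy.optimize import brentq

# Solving the ETH energy-matching condition <psi_s|H_s|psi_s> =
# tr[e^{-beta H} H]/tr[e^{-beta H}] for beta_s, on a hypothetical dense
# random matrix standing in for the chaotic Hamiltonian.
rng = np.random.default_rng(3)
A = rng.normal(size=(64, 64))
E = np.linalg.eigvalsh((A + A.T) / 2)    # spectrum of the toy Hamiltonian

def canonical_energy(beta):
    x = -beta * E
    w = np.exp(x - x.max())              # shift keeps exp finite for any beta
    return float(w @ E / w.sum())

E_target = E[40]                         # energy of a mid-spectrum eigenstate
beta_s = brentq(lambda b: canonical_energy(b) - E_target, -5.0, 5.0)
```

Since the canonical energy decreases monotonically in β, a sign-changing bracket always exists for mid-spectrum energies; eigenstates above the spectral mean correspond to negative β_s.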
We consider the 1D longitudinal-transverse-field Ising model [46,47] as our target quantum system:

Ĥ_s = J Σ_i σ̂^z_i σ̂^z_{i+1} + Σ_i (g_x σ̂^x_i + g_z σ̂^z_i),

where the system size is L = 80. We set J = 1, g_x = 0.9045, and g_z = 0.8090. The quantum system is strongly non-integrable under such settings. Previous studies mainly focused on eigenstates in the middle of the energy spectrum. In contrast, we pick the first excited state, a typical low-lying eigenstate considered asymptotically integrable and ETH-violating [47], for quantum measurements (via DMRG) and then perform MLE Hamiltonian learning for its candidate Hamiltonian (via FTTN).
We summarize the results in Fig. 5. Further, the model Hamiltonian we establish is approximately equivalent to the target quantum Hamiltonian at an (inverse) temperature β_s ≈ 4 [46], which we have absorbed into the units of our Ĥ_k. Therefore, we have accurately established the model Hamiltonian and derived an effective temperature consistent with previous results [46] for a low-lying excited eigenstate not necessarily following the ETH. The physical reason for the quantum likelihood gradient's applicability to such states is an interesting problem that deserves further study.

V. DISCUSSIONS
We have proposed a novel MLE Hamiltonian learning protocol to obtain the model Hamiltonian of a target quantum system based on quantum measurements of its Gibbs states. The protocol updates the model Hamiltonian iteratively with respect to the negative-log-likelihood function from the measurement data. We have theoretically proved the efficiency and convergence of the corresponding quantum likelihood gradient and demonstrated it numerically on multiple non-trivial examples, which show higher accuracy, better robustness against noise, and weaker temperature dependence. Indeed, the accuracy is almost linear in the imposed noise amplitude, and thus inversely proportional to the square root of the number of samples, the asymptotic upper bound [20]. Further, MLE Hamiltonian learning directly rests on the Hamiltonians and their physical properties instead of direct and costly access to the quantum many-body states. Consequently, we can resort to various quantum many-body ansatzes in our systematic quantum toolbox and even the local-patch approximation when the situation allows. These advantages allow applications to larger systems and lower temperatures with better accuracy than previous approaches. On the other hand, while our protocol is generally applicable for learning any Hamiltonian, its advantages are most apparent for local Hamiltonians, where various quantum many-body ansatzes and the local-patch approximation shine. Despite such limitations, we note that physical systems are characterized by local Hamiltonians in a significant proportion of scenarios.
In addition to the Gibbs states, we have generalized the applicability of MLE Hamiltonian learning to eigenstates of the target quantum systems, including ground states, ETH states, and even selected cases of low-lying excited states. We have also provided a theoretical proof of the quantum likelihood gradient's rigor and convergence in Appendix D, along with several other numerical examples.
Our strategy may apply to the entanglement Hamiltonians and the tomography of the quantum states under the maximum-likelihood-maximum-entropy assumption [48]. Besides, our algorithm may also provide insights into the quantum Boltzmann machine [36] -a quantum version of the classical Boltzmann machine with degrees of freedom that obey the distribution of a target quantum Gibbs state. Instead of brute-force calculations of the loss function derivatives with respect to the model parameters or approximations with the gradients' upper bounds, our protocol provides an efficient optimization that updates the model parameters collectively.
Acknowledgement: We thank Jia-Bao Wang for insightful discussions.
We acknowledge support from the National Key R&D Program of China.

Appendix B: The re-scaled quantum likelihood gradient

In this appendix, we compare the original quantum likelihood gradient based upon R̂_k with the re-scaled counterpart R̃_k. As we state in the main text, R̃_k regularizes the gradient, allowing us to employ a larger learning rate γ = 1, which leads to a faster convergence (Fig. 6) and a higher accuracy (Tab. I) given an identical number of iterations.

Appendix C: Comparisons between Hamiltonian learning algorithms
In this appendix, we compare different Hamiltonian learning algorithms, including the correlation matrix (CM) method [17,19], the gradient descent (GD) method [20], and the MLE Hamiltonian learning (MLEHL) algorithm, by examining their numerical results and performance. We consider the general 2-local Hamiltonians in Eq. 10 in the main text for demonstration, with measurements {Ô_i} over all the 2-local operators (instead of all 4-local operators as in Ref. [19]).
We summarize the results in Fig. 7: the accuracy of CM is unstable and highly sensitive to temperature; while GD performs similarly to the proposed MLEHL algorithm at low temperatures, its descending gradient becomes too small at high temperatures to allow a satisfactory convergence within the given maximum iterations.
We also compare the convergence rates of the MLEHL and GD algorithms at the same learning rate. As shown in Fig. 8, the MLEHL algorithm exhibits faster convergence and hence a smaller overall computational cost, the per-iteration cost being similar for both algorithms. We also include noises following a normal distribution with zero mean and O(10^−12) standard deviation. We employ a learning rate of γ = 1 for both GD and MLEHL. We set the system size L = 7 and the (inverse) temperature β = 1.
Appendix D: MLE Hamiltonian learning for ground states

Given N_i measurements of the observable Ô_i on the ground state |ψ_s⟩ of the target system Ĥ_s, we obtain a number of outcomes as the λ_i-th eigenvalue of Ô_i as:

p_{λ_i} = f_{λ_i}/N_i → ⟨ψ_s|P̂_{λ_i}|ψ_s⟩,    (D1)

where P̂_{λ_i} is the projection operator of the eigenvalue o_{λ_i}. Our MLE Hamiltonian learning follows the iterations:

Ĥ_{k+1} = Ĥ_k − γ R̂_k,   R̂_k = (1/N_tot) Σ_i Σ_{λ_i} f_{λ_i} P̂_{λ_i}/⟨ψ^gs_k|P̂_{λ_i}|ψ^gs_k⟩,    (D2)

where |ψ^gs_k⟩ is the non-degenerate ground state of Ĥ_k.

Theorem: For γ ≪ 1, γ > 0, the quantum likelihood gradient in Eq. D2 yields a negative semidefinite contribution to the negative-log-likelihood function M(|ψ^gs_{k+1}⟩) = −(1/N_tot) log L(|ψ^gs_{k+1}⟩) following Eq. 2 in the main text.
Proof: At the linear order in γ, we may treat the addition of −γR̂_k to Ĥ_k at the k-th iteration as a perturbation:

|ψ^gs_{k+1}⟩ = |ψ^gs_k⟩ + γ Ĝ_k R̂_k |ψ^gs_k⟩ + O(γ²),    (D3)

where Ĝ_k is the Green's function in the k-th iteration:

Ĝ_k = Q̂_k (Ĥ_k − E^gs_k)^{−1} Q̂_k = Σ_{l≠gs} |ψ^l_k⟩⟨ψ^l_k|/(E^l_k − E^gs_k),    (D4)

where Q̂_k = I − |ψ^gs_k⟩⟨ψ^gs_k| is the projection operator orthogonal to the ground space |ψ^gs_k⟩⟨ψ^gs_k|, and E^gs_k is the ground-state energy. Keeping terms up to the linear order of γ in the log expansion of the negative-log-likelihood function, we have:

M(|ψ^gs_{k+1}⟩) = M(|ψ^gs_k⟩) − 2γ ⟨ψ^gs_k| R̂_k Ĝ_k R̂_k |ψ^gs_k⟩ + O(γ²),    (D5)

where the difference takes the form:

M(|ψ^gs_{k+1}⟩) − M(|ψ^gs_k⟩) = −2γ Σ_{l≠gs} |⟨ψ^l_k|R̂_k|ψ^gs_k⟩|²/(E^l_k − E^gs_k) ≤ 0.    (D6)

Here, E^l_k > E^gs_k because E^l_k denotes the energy of eigenstates other than the ground state. Our iteration converges when the equality in Eq. D6 is established. This happens when |ψ^gs_k⟩ is an eigenstate of R̂_k, consistent with the MLE condition R̂|ψ⟩ = |ψ⟩ (or R̂ρ̂ = ρ̂).
One potential complication to the proof is that Eq. D3 assumes there is no ground-state level crossing or degeneracy after adding the quantum likelihood gradient. A potential remedy is to keep some low-lying excited states together with the ground state and compare them for maximum likelihood, especially for steps with singular behaviors. Otherwise, we can only hope such transitions are sparse, especially near convergence, and that they establish a new line of iterations heading toward the same convergence. A more detailed discussion is available in Ref. 45.

Here, we consider the spinless 1D Majorana fermion chain model of length 2L as an example [49]:

Ĥ_s = Σ_j (i t γ̂_j γ̂_{j+1} + g γ̂_j γ̂_{j+1} γ̂_{j+2} γ̂_{j+3}),    (D7)

where γ̂_j is the Majorana fermion operator obeying:

γ̂†_j = γ̂_j,   {γ̂_i, γ̂_j} = 2δ_{ij},    (D8)

and t and g = −1 are model parameters. This model presents a wealth of nontrivial quantum phases under different t/g. We focus on the model parameters in t/g ∈ (−2.86, −0.28), where the ground state of Eq. D7 is a c = 3/2 CFT composed of a critical Ising theory (c = 1/2) and a Luttinger liquid (c = 1).

FIG. 11. The relative entropy, the fidelity f_gs = ⟨ψ_s|ψ^gs_k⟩, and the Hamiltonian distance ∆μ (inset) show distinct convergence behaviors in the iterations of MLE Hamiltonian learning for a 2D topological CSL system. Our system size is 4 × 4, and we set the learning rate γ = 0.1.
Specifically, we consider the chiral spin liquid (CSL) on a triangular lattice:

Ĥ_s = J_1 Σ_{⟨ij⟩} Ŝ_i · Ŝ_j + J_2 Σ_{⟨⟨ij⟩⟩} Ŝ_i · Ŝ_j + K Σ_{i,j,k∈△} Ŝ_i · (Ŝ_j × Ŝ_k),    (D13)

where the first and second terms are nearest-neighbor and next-nearest-neighbor Heisenberg interactions, and the last term is a three-spin chiral interaction over the elementary triangles. Previous DMRG studies have established Ĥ_s's ground state as a CSL under the model parameters J_1 = 1.0, J_2 = 0.1, and K = 0.2 [50], which we set as the parameters of the target Hamiltonian. Here, we employ exact diagonalization on a 4 × 4 system. Based upon entanglement studies of the lowest-energy eigenstates, we verify that both the modular SU matrix corresponding to C_6 rotations and the entanglement entropy fit well with a CSL topological phase [51]. Subsequently, we perform MLE Hamiltonian learning based on quantum measurements of the ground state, focusing on the operators present in Ĥ_s. We summarize the results in Fig. 11. The Hamiltonian distance indicates a stable converging accuracy, yet the relative entropy and the fidelity f_gs = ⟨ψ_s|ψ^gs_k⟩ exhibit certain instabilities. Indeed, being a topological phase implies ground-state degeneracy: competing low-energy eigenstates with global distinctions yet similar local properties.