Quantum computation of molecular structure using data from challenging-to-classically-simulate nuclear magnetic resonance experiments

We propose a quantum algorithm for inferring the molecular nuclear spin Hamiltonian from time-resolved measurements of spin-spin correlators, which can be obtained via nuclear magnetic resonance (NMR). We focus on learning the anisotropic dipolar term of the Hamiltonian, which generates dynamics that are challenging-to-classically-simulate in some contexts. We demonstrate the ability to directly estimate the Jacobian and Hessian of the corresponding learning problem on a quantum computer, allowing us to learn the Hamiltonian parameters. We develop algorithms for performing this computation on both noisy near-term and future fault-tolerant quantum computers. We argue that the former is promising as an early beyond-classical quantum application since it only requires evolution of a local spin Hamiltonian. We investigate the example of a protein (ubiquitin) confined in a membrane as a benchmark of our method. We isolate small spin clusters, demonstrate the convergence of our learning algorithm on one such example, and then investigate the learnability of these clusters as we cross the ergodic to non-ergodic phase transition by suppressing the dipolar interaction. We see a clear correspondence between a drop in the multifractal dimension measured across many-body eigenstates of these clusters, and a transition in the structure of the Hessian of the learning cost-function (from degenerate to learnable). Our hope is that such quantum computations might enable the interpretation and development of new NMR techniques for analyzing molecular structure.


I. INTRODUCTION
Quantum computing researchers are struggling to find near-term, 'beyond-classical' applications of quantum computers: problems whose solution has scientific or commercial value but that cannot be solved on classical devices alone. Fault-tolerant (FT) quantum computers able to solve valuable beyond-classical problems in chemistry [1][2][3][4] and materials science [5][6][7] are predicted to be years away, leaving us in the noisy intermediate-scale quantum (NISQ) era [8]. And, while initial quantum experiments beyond classical [9,10] or on the beyond-classical boundary [11,12] have proven to be of great interest, it has been difficult to extend these to solve practical problems in other fields. This is due to the large error rates on quantum computers, and to the overhead that comes from mapping non-native problems onto a quantum computer. Here, non-native means problems beyond studying statics and dynamics of local spin systems, such as fermionic quantum simulation, linear algebra, or optimization. Algorithms to solve these problems incur costs in circuit compilation [13][14][15][16] and measurement [17][18][19][20], creating a gap in hardware requirements before these can achieve beyond-classical results. Before this gap is crossed, it makes sense to focus on those practical applications that do not require this overhead.
Nuclear magnetic resonance (NMR) spectroscopy has been a lauded cornerstone of analytical and organic chemistry since its discovery 80 years ago [21,22], being applicable to any molecule or material containing atoms with non-zero nuclear spin. In an NMR experiment, these spinful nuclei are excited by a radio-frequency pulse and allowed to evolve for some period of time under a typically strong (multiple Tesla) magnetic field, yielding a small time-dependent response in said magnetic field. This response, or free induction decay, contains information about its generating nuclear spin Hamiltonian, which itself contains information about the chemical structure of the molecule or material under probe. The nuclear spin Hamiltonian is typically a strongly interacting quantum Hamiltonian due to its strong dipolar coupling. From accurate knowledge of the dipolar couplings between different spins, one can infer the real-space molecular structure. However, when molecules are free to quickly rotate (e.g., when in solution) this interaction averages out, leaving only chemical shifts (local fields), a weak electron-mediated Heisenberg coupling term [23], and residual incoherent (classical) processes. Such Hamiltonians can be easily classically analysed (sometimes even by intuition), one of the reasons why NMR has shown such success to this day. However, NMR spectra are considerably more complex for systems that are not free to tumble in all directions (as they are in solution [24]): e.g., solid-state materials [25,26], molecules in gels [27], stuck on surfaces [28] or in membranes [29,30]. Spectrum prediction can also pose a challenge for experiments operating in low magnetic field, such as 'zero-field NMR' experiments [31][32][33][34], which promise more affordable and practical NMR technologies not requiring huge magnetic fields. These systems present a region of parameter space where data cannot be analysed by classical computers, yielding a potential area for beyond-classical quantum computation. This ability to analyze classically intractable data sets could enable new types of NMR techniques to characterize previously difficult-to-characterize systems.
Quantum dynamics are generated by a system's Hamiltonian; from a sufficient set of experimental data, it should be possible to learn the Hamiltonian that generated it. Hamiltonian learning is the inverse problem to predicting experimental outcomes given a system's Hamiltonian [35], and is well-established in the field of quantum information as an approach to device characterization. Sufficiently-small devices may be characterized via classical post-processing of experimental data via Bayesian [36,37] or machine-learning methods [38][39][40]. Exact classical methods are intractable in large systems whenever the forward problem becomes beyond-classical, but this can be avoided when possible by careful experiment design. For example, given sufficient control one can dynamically decouple a small subsystem from its environment, even in the presence of large background magnetic fields, and then learn the global structure piece-by-piece [41][42][43]. It is also possible to learn Hamiltonians from expectation values of thermal or long-time average states, which are easier to classically approximate [44,45]. These methods also allow one to learn a Hamiltonian from highly accurate measurements at ultra-short periods of time, where $e^{iHt} \approx 1 + iHt$ is a nearly-exact approximation. However, when none of the above methods are viable, the Hamiltonian learning problem becomes classically challenging, giving a potential beyond-classical quantum computing application. This still requires that the experiment yield sufficient data that the Hamiltonian can be inferred.
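The ultra-short-time regime mentioned above can be made concrete: to first order in t, the signal slope $\mathrm{d}S/\mathrm{d}t|_{t=0} = i\,\mathrm{Tr}([H,\rho]O)$ is linear in the Hamiltonian parameters, so they follow from ordinary linear least squares. A minimal numpy sketch for a hypothetical two-spin model (the operators, states, and couplings here are illustrative assumptions, not taken from this work):

```python
import numpy as np

# Pauli matrices
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Hypothetical two-spin basis terms V_n and hidden couplings h_n
V = [np.kron(Z, I2), np.kron(I2, Z), np.kron(X, X), np.kron(Z, Z)]
h_true = np.array([0.7, -0.3, 0.5, 0.2])
H = sum(h * v for h, v in zip(h_true, V))

def short_time_slope(A, rho, O):
    """dS/dt at t=0 for S(t) = Tr[e^{iAt} rho e^{-iAt} O]; purely real."""
    return np.real(1j * np.trace((A @ rho - rho @ A) @ O))

# Random experiments (rho_x, O_x); each gives one linear equation in the h_n
rng = np.random.default_rng(7)
rows, vals = [], []
for _ in range(12):
    psi = rng.normal(size=4) + 1j * rng.normal(size=4)
    rho = np.outer(psi, psi.conj()) / np.vdot(psi, psi).real  # pure state
    M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    O = (M + M.conj().T) / 2                                   # Hermitian observable
    rows.append([short_time_slope(v, rho, O) for v in V])
    vals.append(short_time_slope(H, rho, O))

h_est, *_ = np.linalg.lstsq(np.array(rows), np.array(vals), rcond=None)
print(np.round(h_est, 6))  # recovers h_true in this noiseless setting
```

Because the commutator is linear in H, the slope of the hidden Hamiltonian is exactly a linear combination of the slopes of the basis terms, which is why a handful of generic experiments suffices here.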
If a quantum system is chaotic, or ergodic, following some perturbation its state will explore its entire Hilbert space, showing little dependence on the precise Hamiltonian parameters and washing out long-time correlation functions [46]. This suggests that the criterion for a system to be learnable is a breakdown of ergodicity. This breakdown has been well-studied in many-body physics for many years, most famously in the case of many-body localization [47,48], and corresponds to many interesting phenomena such as area-law entanglement scaling [49,50] and the emergence of fractal many-body wavefunctions [51][52][53][54][55]. To the best of our knowledge, little has been done to tie these fundamental physics concepts to the notion of learnability of a quantum system.
When considering quantum Hamiltonian learning as a beyond-classical experiment, there are actually two classically-intractable quantum computations being performed. The first is the Hamiltonian evolution itself (e.g., the NMR experiment), which is an analog experiment being performed by the spectrometer and the sample. This produces a set of data which is beyond-classical whenever the experiment is beyond-classical. (Beyond-classical does not necessarily require the experiment to be BQP-hard; many of the algorithms that we will consider in this work lie in the DQC-1 complexity class [41,56].) A key part of this work lies in identifying those NMR experiments that are classically challenging to simulate, which is one quality distinguishing our work from previous suggestions to study classically tractable NMR signals on a quantum device [57]. The second classically-intractable quantum computation is to learn the quantum Hamiltonian from the experimental data, which can be executed on a digital quantum computer. We assume that for the foreseeable future the connection between the spectrometer and the computer is classical. This prevents the quantum Hamiltonian learning techniques of Refs. [58][59][60], or algorithms that require access to the quantum state [61,62], from being implemented. (Access to a quantum connection between spectrometer and computer would be of immense interest if it could be achieved, as it could provide additional exponential speedups [62].) Alternatively, one can approach this problem by generating classically hard spectra (from a quantum simulation or NMR experiment) and using these to train a classical machine agent to infer Hamiltonians from spectra in the same phase (in a manner similar to Ref. [63]). However, this requires access to this additional training data.
Given a sample and an NMR spectrometer, chemists have many techniques at their disposal to infer molecular structure without requiring quantum computing assistance: magic angle spinning [64,65], dipolar decoupling and recoupling pulse schemes [66][67][68][69], polarization transfer methods [70], and choices of different spin species and use of heteronuclear NMR couplings [71]. However, these methods throw away or approximate information about a system that is potentially valuable for characterization. Here we show that quantum computers can provide an extra tool in the NMR toolbox, and in doing so open up a realm of novel NMR experiments that have not been available before.
In this paper we propose a protocol to learn the nuclear spin Hamiltonian of a molecular or material system using a digital quantum computer and time-resolved measurements from an NMR spectroscopy experiment. We expect this protocol to present a beyond-classical quantum computation (in lieu of classical access to additional data, such as a training set for a machine-learning algorithm) when the dataset from the NMR experiment is hard to classically simulate. In section II, we design quantum algorithms to estimate the cost function, Jacobian and Hessian of the learning problem. We attempt to identify those systems and situations where a beyond-classical application can be found, following some general discussions of learnability in section III. In section IV, we describe and cost circuits to implement our quantum algorithms in both fault-tolerant and NISQ cost models. We find that in both cost models the effect of integration error can be logarithmically suppressed or better, and that the ability to run deep coherent circuits in FT yields polynomial speedups in terms of various problem parameters: the duration of and error in the experiment to be simulated, and the desired error in the final gradient itself. In section V, we identify NMR spectroscopy of proteins within cell walls or other membranes as one potential application, as the physical pinning of these systems within a membrane prevents the tumbling that would wash away strong correlations in solution. In section V A, we study an example protein, ubiquitin, as a benchmark with known molecular structure. We identify sets of clusters of 1H spins within this molecule with strong intra-cluster coupling and weak coupling to the environment, which should produce a strongly-coupled signature able to be studied by a quantum device. We calculate the multifractal dimension of small clusters to study their ergodic to non-ergodic phase transition as the dipolar term is suppressed (e.g. by magic angle spinning or decoupling pulse schemes), and find the 'quantum-feasible' region to require a suppression factor between around α = 5 and α = 100 (assuming a background magnetic field of 23.5 T, corresponding to a proton frequency of 1 GHz). This gives a large window within which quantum computers could be expected to assist in NMR interpretation. We demonstrate the application of our learning algorithm to small clusters within this region, demonstrating its convergence on a small spin cluster in the presence of sampling noise. Finally, we show a direct correspondence between the loss of ergodicity (as measured in the multifractal dimension of ubiquitin spin clusters as their dipolar coupling is suppressed) and the onset of learnability (as measured by the analytical Hessian of our learning problem at the global minimum). To the best of our knowledge, this is the first demonstration of a clear connection between the notion of fractal eigenstates in a quantum system and its learnability by quantum or classical means.
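The multifractal dimension used as an ergodicity diagnostic can be computed for small clusters directly from exact eigenstates via the participation entropy, $D_2 = -\ln(\sum_i |\psi_i|^4)/\ln(\dim)$. A sketch for a toy disordered spin chain (the model, size, disorder strength, and suppression factors α below are illustrative stand-ins for the ubiquitin clusters studied in this work):

```python
import numpy as np

def paulis():
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    return X, Y, Z

def op_at(op, site, L):
    """Embed a single-site operator at `site` in an L-spin chain."""
    out = np.array([[1.0 + 0j]])
    for i in range(L):
        out = np.kron(out, op if i == site else np.eye(2))
    return out

def chain_hamiltonian(L, alpha, rng):
    """Heisenberg chain with random z fields; couplings suppressed by 1/alpha."""
    X, Y, Z = paulis()
    H = np.zeros((2**L, 2**L), dtype=complex)
    for i in range(L - 1):
        for P in (X, Y, Z):
            H += (1.0 / alpha) * op_at(P, i, L) @ op_at(P, i + 1, L)
    for i in range(L):
        H += rng.uniform(-2, 2) * op_at(Z, i, L)
    return H

def mean_d2(H):
    """Average multifractal dimension D_2 over mid-spectrum eigenstates."""
    _, vecs = np.linalg.eigh(H)
    dim = H.shape[0]
    mid = vecs[:, dim // 3: 2 * dim // 3]
    ipr = np.sum(np.abs(mid) ** 4, axis=0)        # inverse participation ratio
    return float(np.mean(-np.log(ipr) / np.log(dim)))

rng = np.random.default_rng(0)
L = 8
d2_ergodic = mean_d2(chain_hamiltonian(L, alpha=1.0, rng=rng))     # weak suppression
d2_localized = mean_d2(chain_hamiltonian(L, alpha=50.0, rng=rng))  # strong suppression
print(d2_ergodic, d2_localized)  # D_2 drops as the interaction is suppressed
```

Here D₂ lies between 0 (eigenstates concentrated on few basis states) and 1 (fully ergodic); suppressing the coupling relative to the disorder drives it downward, which is the qualitative transition tracked in the ubiquitin clusters.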

II. QUANTUM-ASSISTED HAMILTONIAN LEARNING
We now consider the problem of learning a Hamiltonian H of some system from a set of time-resolved experimental data S_x(t), where x indexes different sets of experiments. As the Hamiltonian of a system dictates the time dynamics, this is a natural thing to learn from time series data; we will discuss later how one can infer molecular structure from a Hamiltonian of nuclear spins. Each experiment consists of an initial state preparation ρ_x, time evolution by H plus an external time-dependent driving field H_x(t), and final measurement of some observable O_x. The signal S_x(t) is then given by

$$S_x(t) = \mathrm{Trace}\left[U_x(t,0)\,\rho_x\,U_x^{\dagger}(t,0)\,O_x\right], \qquad (1)$$

where $U_x(t_2, t_1)$ is the time evolution operator generated by the Hamiltonian,

$$U_x(t_2,t_1) = \mathcal{T}\exp\left(i\int_{t_1}^{t_2} \mathrm{d}\tau\,\left[H + H_x(\tau)\right]\right), \qquad (2)$$

where $\mathcal{T}$ is the time-ordering operator. Note that one may consider ρ_x = ρ and O_x = O independent of the experiment x by encoding preparation and measurement terms into the driving Hamiltonian H_x(t). (As we will discuss later, this is often an accurate description of a real-world NMR experiment.) Alternatively, if the driving Hamiltonian H_x(t) is only used for preparation and measurement, this may be encoded entirely in ρ_x and O_x, setting $H_x(t) = 0$ and $U_x(t_2,t_1) = U(t_2,t_1) = e^{iH(t_2-t_1)}$.
To define our learning problem, we write our system Hamiltonian in the form

$$H = \sum_n h_n V_n, \qquad (3)$$

where the $h_n$ are a set of parameters and the $V_n$ a set of Hermitian operators. We then consider the case where some or all of the $h_n$ are unknown, and we wish to estimate these by a set $\{\bar h_n\}$; this defines our learning problem. Given some prior $\{h_n^{(0)}\}$ with standard deviations $w_n$, and assuming that each datapoint $S_x(t)$ is drawn from a normally-distributed experimental population with variance $\sigma^2_{x,t}$, the maximum-likelihood estimate of the true parameters can be found by minimizing the cost function

$$C[\bar H] = \sum_n \frac{(\bar h_n - h_n^{(0)})^2}{2 w_n^2} + \sum_{x,t} \frac{(\bar S_x(t) - S_x(t))^2}{2\sigma_{x,t}^2}, \qquad (4)$$

where $\bar S_x(t) = \mathrm{Trace}[\bar U_x(t,0)\,\rho_x\,\bar U_x^{\dagger}(t,0)\,O_x]$ is the estimated signal with our estimates of the parameters $\bar h_n$. (Throughout this work, we use bars to denote quantities derived from estimated parameters rather than hidden ones.) Though $\bar S_x(t)$ cannot be estimated on a classical device, implementing it on a quantum computer simply requires a circuit to simulate the time evolution $\bar U_x(t,0)$. However, performing such an optimization gradient-free on a high-dimensional surface is a costly endeavour. The first key result of this work is to give a practical form for the gradient and Hessian of Eq. 4,

$$\frac{\partial C}{\partial \bar h_n} = \frac{\bar h_n - h_n^{(0)}}{w_n^2} + \sum_{x,t} \frac{\bar S_x(t) - S_x(t)}{\sigma_{x,t}^2}\,\bar J^n_x(t), \qquad (5)$$

$$\frac{\partial^2 C}{\partial \bar h_n \partial \bar h_m} = \frac{\delta_{n,m}}{w_n^2} + \sum_{x,t} \frac{1}{\sigma_{x,t}^2}\left[\bar J^n_x(t)\,\bar J^m_x(t) + \left(\bar S_x(t) - S_x(t)\right)\bar K^{n,m}_x(t)\right], \qquad (6)$$

where we define the Jacobian of the signal,

$$\bar J^n_x(t) := \frac{\partial \bar S_x(t)}{\partial \bar h_n} = \int_0^t \mathrm{d}s\; \bar j^n_x(t,s), \qquad \bar j^n_x(t,s) = i\,\mathrm{Trace}\left[\left[\bar V_{n,x}(t,s),\, \bar\rho_x(t)\right] O_x\right], \qquad (8)$$

with $\bar V_{n,x}(t,s) = \bar U_x(t,s)\, V_n\, \bar U_x^{\dagger}(t,s)$ and $\bar\rho_x(t) = \bar U_x(t,0)\,\rho_x\,\bar U_x^{\dagger}(t,0)$, and its Hessian,

$$\bar K^{n,m}_x(t) := \frac{\partial^2 \bar S_x(t)}{\partial \bar h_n \partial \bar h_m} = \int_0^t \mathrm{d}s \int_0^t \mathrm{d}r\; \bar k^{n,m}_x(t,s,r), \qquad (10)$$

where the integrand $\bar k^{n,m}_x(t,s,r)$ is a nested-commutator analogue of $\bar j^n_x(t,s)$, which may be obtained at no extra cost to the gradient estimation (assuming $\bar S_x(t)$ and $\bar J^n_x(t)$ are measured to the same relative precision). This is important, as we can approximate the covariance matrix Σ of our final estimate of the $\{\bar h_n\}$ as the inverse of the (approximate) Hessian,

$$\Sigma_{n,m} \approx \left[\frac{\delta_{n,m}}{w_n^2} + \sum_{x,t}\frac{\bar J^n_x(t)\,\bar J^m_x(t)}{\sigma_{x,t}^2}\right]^{-1}_{n,m}, \qquad (11)$$

obtained by dropping the $\bar K$ term from Eq. 6. These equations may be alternatively derived via optimal control theory, which yields a conjugate field to the state $\bar\rho_x(t)$ that is generated by deviations $\bar S_x(t) - S_x(t) \neq 0$ and propagates backwards in time via the Schrödinger equation.
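For intuition, this cost function and its Gauss-Newton (Levenberg-Marquardt) minimization can be emulated classically for a tiny system, with the Jacobian taken here by finite differences rather than by the quantum circuits developed below. All model details (operators, couplings, state, and measurement times) are illustrative assumptions:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Hypothetical two-spin model H = sum_n h_n V_n with hidden couplings
V = [np.kron(Z, I2), np.kron(I2, Z), np.kron(X, X)]
h_true = np.array([0.8, -0.4, 0.6])

def evolve(H, t):
    w, P = np.linalg.eigh(H)
    return (P * np.exp(1j * w * t)) @ P.conj().T    # U = e^{iHt}

psi = np.array([1, 1, 0, 1], dtype=complex); psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())
O = np.kron(Z, I2)
ts = np.linspace(0.2, 4.0, 25)

def signals(h):
    H = sum(hn * v for hn, v in zip(h, V))
    return np.array([np.real(np.trace(evolve(H, t) @ rho
                     @ evolve(H, t).conj().T @ O)) for t in ts])

data = signals(h_true)                      # stands in for the measured S_x(t)
h = h_true + np.array([0.08, -0.05, 0.06])  # perturbed initial guess

for _ in range(15):                         # Levenberg-Marquardt iterations
    r = signals(h) - data                   # residuals S_bar - S
    J = np.empty((len(ts), len(h)))
    for n in range(len(h)):                 # finite-difference Jacobian
        dh = np.zeros_like(h); dh[n] = 1e-6
        J[:, n] = (signals(h + dh) - signals(h - dh)) / 2e-6
    A = J.T @ J + 1e-9 * np.eye(len(h))     # Gauss-Newton Hessian + damping
    h = h - np.linalg.solve(A, J.T @ r)

print(np.round(h - h_true, 8))  # near zero: couplings recovered
```

The update solves the normal equations built from the J̄ᵀJ̄ approximation to the Hessian; on a quantum device the same loop would run with J̄ supplied by the circuits of section IV instead of finite differences.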

III. QUANTUM AND CLASSICAL LEARNABILITY
In the following sections, we will describe how the procedure outlined in section II may be developed into a complete quantum-assisted algorithm to learn Hamiltonian parameters via gradient optimization. However, there are two issues that limit the usefulness of our proposal to implement these algorithms on a quantum computer. We summarize these issues, and give a sketch of the region in parameter space where the techniques we develop in this work are relevant, in Fig. 1.

FIG. 1. Cartoon of the "phases of learnability" of a quantum Hamiltonian. In the red region, the set of experimental data is insufficient to distinguish candidate Hamiltonians. This becomes increasingly likely for systems beyond a localization transition (red dashed line), as after this point experiments such as local correlators tend to provide little information about the system structure. In the blue region, the experimental data is sufficient to learn the system's structure, but the data processing may be achieved classically, rendering a quantum computer unnecessary. This classical processing may either be achieved by the system being well-approximated by a classically-computable model, or by the experiment having sufficient control (grey dashed line) to isolate smaller subsystems. At the limit of good control, an experiment has the ability to dynamically decouple individual terms (blue dashed line), after which techniques such as those in Refs. [41,43] can be used to estimate terms individually. A simpler limit is to simply have the ability to address individual spins (or small frequency regions of spins) in the system (green dashed line), which renders the system more learnable.

The first issue is that the experimental data taken may not contain enough information to learn the desired couplings at all. A large amount of nature is chaotic, or ergodic, where the system tends to approximately explore its entire phase space. In such ergodic systems, properties such as local correlation functions die off quickly without any dependence on the internal structure [11,46]. When this is the case, the set of experimental data (e.g. local spin-spin correlations) generated by two different Hamiltonians may be indistinguishable up to corrections smaller than any experimental noise. This makes learning impossible. Given sufficient local disorder (relative to the strength of spin-spin interactions), systems tend to localize, yielding a non-ergodic many-body localized regime characterized by an absence of transport [47]. This lack of transport prevents local correlators decaying, resulting in a time-resolved measurement that may be used to distinguish between, and thus learn, different Hamiltonians. In certain systems, an intermediate non-ergodic regime has been proposed between the fully localized and ergodic regimes, where the system explores a large fraction of its Hilbert space but does not completely thermalize [51][52][53][54]. In these situations, some learning should also be possible. The transition to ergodicity and the loss of learnability may be mitigated somewhat given the ability to perform more complicated experiments (e.g. magic angle spinning [64,65], decoupling pulses [69], or other composite pulse sequences [72]). We summarize these notions going from left to right in Fig. 1.
Here, the x-axis denotes a rough measure of ergodicity of an arbitrary system (e.g. the localization length left of the localization transition, and the rate of entanglement growth on the right). This can in principle be changed by experimental control (i.e. by suppressing the dipolar interaction strength), which suggests that the localization transition should not be a vertical line.
The second issue we face is that a quantum computer may not be required to interpret the spectrum of a given Hamiltonian: it may be entirely possible to solve the learning problem with a classical device [73,74]. While this reduction in complexity is a good thing for the experiment in question, it limits the utility of quantum computers in that context. The Hamiltonian learning problem may be solved classically for two reasons. Firstly, the Hamiltonian itself may be classically simulatable, or approximately simulatable. For example, in a many-body localized system the forward-scattering approximation or other perturbative expansions may be sufficiently accurate for learning. This suggests that a quantum computer will find the most relevance studying either intermediate non-ergodic phases that are not completely localized, or the region in the proximity of a direct many-body localization transition where the localization length is too large for classical simulation. The second reason why the Hamiltonian learning problem may be solved classically is if the experiment is controllable enough to isolate smaller subsystems or otherwise simplify the system. This can, for instance, be achieved if one has the ability to spatially resolve individual spins with a magnetic field. In this case it is possible to apply dynamical decoupling pulse sequences that isolate local Hamiltonian terms while cancelling out the remainder in an experiment [41][42][43], making local characterization possible. We summarize these notions along the y-axis of Fig. 1; as one gains more control over an experiment it becomes possible to learn Hamiltonians in more systems classically, until the barrier of dynamical decoupling is reached and all Hamiltonians are classically learnable.
These two concerns leave our quantum Hamiltonian learning algorithm with only a "Goldilocks" zone of applicability. We are interested in those experiments where we have some control over our input state and Hamiltonian, but not those where enough control is available to isolate individual terms. We are also interested in those experiments where our system is somewhat delocalized, but not completely. Experimentally, this can be summarized by saying that we can study those systems where some signal can be extracted, but where that signal is complicated by features (e.g. spectral line shifts) that cannot be easily understood perturbatively.

A. Robustness of learning
Assuming access to the derivatives in Sec. II, one may in principle solve the Hamiltonian learning problem using many well-known gradient-based or Hessian-based optimization techniques. (Using the approximate Hessian in Eq. 11 for minimization results in the well-known Levenberg-Marquardt algorithm [75,76].) One may ask whether this can be made robust, to avoid getting stuck in local minima. Similar questions have been asked and answered previously for single parameter estimation [41], quantum phase estimation [77], and device calibration [78,79]. We give a sketch of an argument here for the robustness of our algorithm given a time-independent Hamiltonian ($H_x(t) = 0$), which works under the assumption that we start with an initial guess $h_n^{(0)}$ sufficiently close to our true parameters $h_n$. We stress that this is not a proof, and examining the landscape around the global minimum of our learning problem is a clear target for future work.
In this case, we may work in the eigenbasis $|\xi_a\rangle$ of the system Hamiltonian, $H|\xi_a\rangle = E_a|\xi_a\rangle$. Inserting two resolutions of the identity, our signal then takes the form

$$S_x(t) = \sum_{a,b} X^x_{a,b}\, e^{i(E_a - E_b)t}, \qquad X^x_{a,b} = \langle\xi_a|O_x|\xi_b\rangle\langle\xi_b|\rho_x|\xi_a\rangle.$$

If our estimate deviates in some parameter, $h_n \to \bar h_n = h_n + \delta$, then to lowest order in perturbation theory our estimated signal takes the form

$$\bar S_x(t) = \sum_{a,b} X^x_{a,b}\, e^{i(E_a - E_b)t}\, e^{i\delta\left(\langle\xi_a|V_n|\xi_a\rangle - \langle\xi_b|V_n|\xi_b\rangle\right)t}.$$

Note here that X is independent of δ. The second term in our cost function (Eq. 4) then oscillates as a function of δ with a frequency bounded by $4 t_{\max} \max_a |\langle\xi_a|V_n|\xi_a\rangle| \le 4 t_{\max}\|V_n\|$, which is independent of the system size. This implies that local minima in our parameter space must be separated by at least $\sim \pi/(2\|V_n\| t_{\max})$. After converging on this data we may estimate the variance of our parameter guess (using Eq. 6), refine our estimate of δ, and increase the range of allowed t. Assuming that estimation at each $t_{\max}$ yields an error $\delta_{\mathrm{new}} \le c\pi/(4\max_n\|V_n\| t_{\max})$ for some $c < 1$, repeating this procedure over multiple orders of magnitude of $t_{\max}$ will converge to some final error $\epsilon$ in $O(\log(1/\epsilon))$ iterations. If our initial uncertainty in the $h_n$ grows with the system size (which can simply be due to the increase in the number of parameters), this will necessarily shrink the initial choice of $t_{\max}$ by the same amount. However, for a local Hamiltonian we expect the number of parameters and the magnitude of terms to only grow linearly in N.
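The refinement loop sketched above can be illustrated numerically on a one-parameter toy model (a single oscillation frequency standing in for $S_x(t)$; the prior, window, and schedule constants are illustrative): each round, $t_{\max}$ is chosen so that the nearest competing local minimum lies outside the current uncertainty window, and the uncertainty halves.

```python
import numpy as np

h_true = 1.234            # hidden coupling of a hypothetical one-parameter model
def signal(h, ts):        # S(t) = cos(h t): a single-frequency stand-in for S_x(t)
    return np.cos(h * ts)

h_est, uncert = 1.0, 0.3  # prior guess and prior standard deviation
for _ in range(8):
    t_max = np.pi / (4 * uncert)    # keep competing minima outside the window
    ts = np.linspace(0, t_max, 40)
    grid = np.linspace(h_est - uncert, h_est + uncert, 2001)
    costs = [np.sum((signal(g, ts) - signal(h_true, ts)) ** 2) for g in grid]
    h_est = grid[int(np.argmin(costs))]   # converge within the current window
    uncert *= 0.5                          # c = 1/2 in the notation above
print(abs(h_est - h_true))  # error shrinks geometrically with the rounds
```

Note that $t_{\max} \cdot \mathrm{uncert} = \pi/4$ at every round, so the cost restricted to the window always has a single minimum, matching the separation argument in the text.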
The ability to estimate out to large times allows us to avoid learnability issues in localized systems where small long-range couplings contribute mostly to the off-diagonal part of the Hamiltonian (and thus do not affect the eigenstructure significantly). However, it is of no help in ergodic systems where signals $S_x(t)$ disappear quickly with t. As we will see, in those (ergodic) systems learning anything would be a significant challenge.

IV. QUANTUM ALGORITHMS
In this section, we outline algorithms to efficiently calculate our cost function $C[\bar H]$ and its first and second derivatives using a quantum computer. Quantum algorithm optimization is significantly different when targeting noisy near-term vs. fault-tolerant long-term devices; we will present algorithms for both situations. In either case, we will discretize the integral in Eq. 8,

$$\bar J^n_x(t) = \int_0^t \mathrm{d}s\; \bar j^n_x(t,s) \approx \sum_{i=1}^{I} z_i\, \bar j^n_x(t, s_i), \qquad (19)$$

where the weights $z_i > 0$ are chosen such that $\sum_i z_i = t$.
(A 2-dimensional discretization is similarly possible for the integral $\bar K^{n,m}_x(t)$.) There are many possible methods for choosing both the weights and the points $s_i$; one may use a simple trapezoidal or midpoint rule, or more complicated Gaussian quadrature methods, or one may take a Monte Carlo approach and choose the points at random. Each method incurs a discretization error that goes to 0 as $I \to \infty$. For both the FT and NISQ quantum methods that we propose, the cost depends primarily on $\sum_i z_i = t$, and has at most a logarithmic dependence on I. This allows us to choose our method of integration for simplicity or ease of circuit design.
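As a sanity check, the discretized integral can be compared against a finite-difference derivative of the signal on a classically simulable toy model (two spins; all operators and parameters below are illustrative). A midpoint rule with $\sum_i z_i = t$ already converges quickly:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Toy model: H = h1 Z1 + h2 X.X, with learnable term V = X.X
Vn = np.kron(X, X)
def ham(h):
    return h[0] * np.kron(Z, I2) + h[1] * Vn

def expiHt(H, t):
    w, P = np.linalg.eigh(H)
    return (P * np.exp(1j * w * t)) @ P.conj().T

psi = np.array([1, 0.5, 0.5j, 0], dtype=complex); psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())
O = np.kron(Z, I2)
h, t = np.array([0.9, 0.5]), 2.0

def S(hvec):
    U = expiHt(ham(hvec), t)
    return np.real(np.trace(U @ rho @ U.conj().T @ O))

# Midpoint discretization of J = i * int_0^t ds Tr([V(t,s), rho(t)] O)
H0 = ham(h)
Ut = expiHt(H0, t)
rho_t = Ut @ rho @ Ut.conj().T
I_pts = 200
z = t / I_pts                                # equal weights summing to t
J_quad = 0.0
for i in range(I_pts):
    s = (i + 0.5) * z                        # midpoint nodes
    W = expiHt(H0, t - s)
    Vbar = W @ Vn @ W.conj().T               # perturbation in the Heisenberg frame
    J_quad += z * np.real(1j * np.trace((Vbar @ rho_t - rho_t @ Vbar) @ O))

eps = 1e-5                                   # finite-difference reference
J_fd = (S(h + [0, eps]) - S(h - [0, eps])) / (2 * eps)
print(abs(J_quad - J_fd))  # small discretization error, shrinking as I grows
```

The discretization error here scales as $O(I^{-2})$ for the midpoint rule, consistent with the claim that the quantum cost is dominated by $\sum_i z_i = t$ rather than by I.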

A. Algorithms for near-term quantum computers
In the current NISQ era we want to run the shortest quantum circuits possible for any application. To this end, we propose estimating the signal $\bar S_x(t)$ and the integrands $\bar j^n_x(t,s)$ and $\bar k^{n,m}_x(t,s,r)$ on a quantum computer, and performing the integration and summation to yield $\mathrm{d}C[\bar H]/\mathrm{d}\bar h_n$ and $\mathrm{d}^2C[\bar H]/\mathrm{d}\bar h_n\mathrm{d}\bar h_m$ classically. The signal $\bar S_x(t)$ is already in a form where it can be read as the expectation value of a quantum state following the application of a unitary circuit (Fig. 2, top). This requires that it take the form $\mathrm{Trace}[U \Xi U^{\dagger} M]$ for a quantum state Ξ, unitary U, and Hermitian operator M. (This can be compared directly to Eq. 1.) The integrands $\bar j^n_x(t,s)$ (Eq. 8) and $\bar k^{n,m}_x(t,s,r)$ (Eq. 10) are not quite of this form. However, they may be put in the correct form by adding control qubits to enlarge the Hilbert space, and then using the so-called generalized Hadamard test; this results in the middle and bottom circuits of Fig. 2. This circuit construction uses the identity

$$\mathrm{Trace}\left[(Y \otimes M)\,(c\mathrm{-}V)\left(|+\rangle\langle+| \otimes \Xi\right)(c\mathrm{-}V)^{\dagger}\right] = -\frac{i}{2}\,\mathrm{Trace}\left[\left[V, \Xi\right] M\right] \qquad (20)$$

to transform a commutator into the above form (with the state, unitary, and measurement operator being $|+\rangle\langle+|\otimes\Xi$, $c\mathrm{-}V$, and $Y\otimes M$ respectively). Here the symbol $c\mathrm{-}U := I \oplus U$ denotes the unitary U controlled by the control qubit, and Y is the Pauli-Y operator on that qubit. These circuits require only local control of the $V_n$ unitary (see below) and uncontrolled time evolution, making them rather NISQ-friendly (using e.g. the randomized Trotterization methods of Ref. [80] to simulate the time evolution). We assume in Fig. 2 that the terms $V_n$ are unitary (i.e., tensor products of Pauli operators for pairs of spins), but if this is not the case one may write $V_n$ as a linear combination of unitary operators, execute the circuits for each unitary component separately, and sum the resulting expectation values to yield the desired result [81].
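The identity behind the generalized Hadamard test is easy to verify numerically: attaching a control qubit in $|+\rangle$, applying a controlled-V, and measuring Y on the control alongside M on the system returns $-(i/2)\,\mathrm{Tr}([V,\Xi]M)$. A single-qubit sketch (the specific ρ, V, and M below are illustrative):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Illustrative system state rho, perturbation V, and measurement operator M
psi = np.array([0.8, 0.6j]); rho = np.outer(psi, psi.conj())
V = X
M = 0.3 * Y + 0.7 * Z

# Control qubit in |+>, then a controlled-V on the system register
plus = 0.5 * np.ones((2, 2), dtype=complex)               # |+><+|
cV = np.kron(np.diag([1, 0]).astype(complex), I2) \
   + np.kron(np.diag([0, 1]).astype(complex), V)
state = cV @ np.kron(plus, rho) @ cV.conj().T

lhs = np.real(np.trace(state @ np.kron(Y, M)))            # <Y (x) M>
rhs = np.real(-0.5j * np.trace((V @ rho - rho @ V) @ M))  # -(i/2) Tr([V,rho]M)
print(lhs, rhs)  # the two expressions agree
```

In the actual circuits, V is the (Pauli-product) perturbation $V_n$, the uncontrolled time evolutions are absorbed into the state and measurement frames, and the commutator expectation is read out by repeated preparation and measurement.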
The circuits in Fig. 2 assume the ability to prepare the initial states $\rho_x$. In the applications we consider in this paper, these will be mixed diagonal states in the computational basis. The measurement operators $O_x$ will similarly be diagonal in the computational basis. Preparing mixed states requires that we average over many pure state preparations. Consider the case where we prepare a computational basis state $|n\rangle$, then perform a circuit U and measure the expectation values of a set of $O_x$ in parallel, yielding a set of estimates of $\langle n|U^{\dagger} O_x U|n\rangle$. If we have repeated this independently for all computational basis states, and each $\rho_x$ is diagonal in the computational basis, we can write

$$\mathrm{Trace}\left[U \rho_x U^{\dagger} O_x\right] = \sum_n \langle n|\rho_x|n\rangle\, \langle n|U^{\dagger} O_x U|n\rangle.$$

FIG. 2. Circuits to estimate the signal $\bar S_x(t)$ (top) and the integrands $\bar j^n_x(t,s)$ (middle) and $\bar k^{n,m}_x(t,s,r)$ (bottom) that are required to calculate the first and second derivatives of our cost function $C[\bar H]$ (Eq. 4). For ease of viewing, we suppress labels in the circuits themselves (see legend for clarification). Circuits assume access to a preparation of ρ, and a means to simulate $\bar U_x(t,s)$ (without control) and to implement controlled perturbations $V_n$. The desired integrand can be found as the expectation value of the product of the indicated operators, which (in NISQ) must be read out by repeated preparation and measurement.
As we know the initial distributions $\langle n|\rho_x|n\rangle$, the estimates of $\langle n|U^{\dagger} O_x U|n\rangle$ may be used to compute the target trace. In practice we do not need to prepare all states; it suffices to sample from a distribution proportional to $|\langle n|\rho_x|n\rangle|$. In state-of-the-art quantum experiments this presents a small difficulty, as uploading a new pulse sequence to re-prepare each state may be impractical. One solution may be to initially prepare qubits in the $|+\rangle$ basis and measure them prior to performing a simulation, which results in a new preparation each time.
Repeating the above procedure at multiple points $s_i$ and $r_i$ allows for parallel estimation of $\bar J^n_x(t)$ or $\bar K^{n,m}_x(t)$ via Eq. 19 (for fixed n and m). In principle, in NISQ we are free to draw these times at random from the ranges [0, t] and [0, s], and to choose the initial state. (In practice changing evolution times is more difficult than choosing initial states; making this scheme more practical is an important direction for future work.) In this case, each choice of starting state and time produces an independent random variable, and Hoeffding's inequality may be applied. For the estimation of $\bar J^n_x(t)$, M repetitions of our experiment yield an estimator that satisfies

$$\mathrm{Pr}\left[\left|\bar J^{n,\mathrm{est}}_x(t) - \bar J^n_x(t)\right| \ge \epsilon\right] \le 2\exp\left(-\frac{M\epsilon^2}{8\|O_x\|^2 t^2}\right),$$

and the number of samples required to estimate this at a constant failure rate scales as $M = \tilde O(\epsilon^{-2}\|O_x\|^2 t^2)$ (where we use $\tilde O$ to denote asymptotic complexity suppressing polylogarithmic factors). Similarly, for the estimation of $\bar K^{n,m}_x(t)$, M repetitions of our experiment yield an estimator satisfying an analogous bound, and the number of samples required to estimate it at a constant failure rate scales as $M = \tilde O(\epsilon^{-2}\|O_x\|^2 t^4)$. This scaling in t is to be expected, as the integrals themselves tend to scale as $\bar J^n_x(t) \sim t$ and $\bar K^{n,m}_x(t) \sim t^2$. (To see this, note that the diagonal terms of $\int_0^t \mathrm{d}s\, \bar V_{n,x}(t,s)$ in the Hamiltonian eigenbasis grow linearly in t [82,83].) This implies that we may estimate $\bar J^n_x(t)$ and $\bar K^{n,m}_x(t)$ to constant relative error with a number of samples independent of t.
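The Monte Carlo variant (drawing the times $s_i$ uniformly from [0, t]) and the resulting Hoeffding sample count can be sketched with a bounded toy integrand standing in for $\bar j^n_x(t,s)$ (the integrand, its bound, and the tolerances below are all illustrative):

```python
import numpy as np

t = 2.0
def integrand(s):             # stand-in for the bounded integrand j_x^n(t, s)
    return np.cos(3 * s)

exact = np.sin(3 * t) / 3     # J = int_0^t integrand(s) ds, known here
b = 1.0                       # |integrand| <= b, so samples lie in [-b, b]

eps, delta = 0.05, 0.05       # target error and failure probability
# Hoeffding: Pr[|t * mean - J| >= eps] <= 2 exp(-M eps^2 / (2 b^2 t^2))
M = int(np.ceil(2 * b**2 * t**2 * np.log(2 / delta) / eps**2))

rng = np.random.default_rng(3)
s_samples = rng.uniform(0, t, size=M)
J_mc = t * np.mean(integrand(s_samples))  # weights z_i = t / M sum to t
print(M, abs(J_mc - exact))   # error within eps (with probability >= 1 - delta)
```

The required M grows as $t^2/\epsilon^2$, matching the sample-count scaling quoted above; the polylogarithmic factor comes from the $\log(2/\delta)$ dependence on the failure rate.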
We now give an analysis of the complexity of this sampling approach, where we assume quantum simulation is performed under a Hamiltonian query model. The no-fast-forward theorem [84] requires a number of queries Ω(t) to execute each of the circuits in Fig. 2, so the total gate count to estimate $\bar J^n_x(t)$ and $\bar K^{n,m}_x(t)$ to error $\epsilon$ using these methods scales at best as $\tilde O(\epsilon^{-2}\|O_x\|^2 t^3)$ and $\tilde O(\epsilon^{-2}\|O_x\|^2 t^5)$ respectively. By comparison, the total query count to estimate $\bar S_x(t)$ to error $\epsilon$ using the circuit in Fig. 2 is bounded asymptotically as $\tilde O(\epsilon^{-2}\|O_x\|^2 t)$. These estimates need to be combined to estimate the derivative in Eq. 5. To compute the total cost to estimate this to constant error, let us assume that $\bar J^n_x(t) \sim t$, $\bar S_x(t) \sim 1$, that $\sigma_{x,t} \sim \sigma$ and $\|O_x\| \sim 1$ are independent of x and t, and that we can optimize the number of repetitions of each experiment to minimize the total query count. Let us also assume that each experiment involves preparation and measurement in a commuting basis, and let us assume no covariance between parallel measurements. Then we find the total number of queries required to estimate single derivative terms to error $\epsilon$ is bounded asymptotically by (see App. A for details)

$$\tilde O\left(\sigma^{-4}\epsilon^{-2} N_x \Big[\sum_t t^{3/2}\Big]^2\right),$$

where $N_x$ is the number of distinct experiments performed. The evaluation of the sum over t depends on whether the $S_x(t)$ are sampled logarithmically sparsely (in which case $\sum_t t^{3/2} \sim T^{3/2}\log(T)$) or densely (in which case $\sum_t t^{3/2} \sim T^{5/2}$), where $T = \max(t)$. In the former case, the total number of queries is bounded asymptotically by $\tilde O\left(\sigma^{-4}\epsilon^{-2} N_x T^3\right)$, whilst in the latter it is bounded asymptotically by $\tilde O\left(\sigma^{-4}\epsilon^{-2} N_x T^5\right)$.

B. Improved estimation of the gradient term on a fault-tolerant quantum computer
In a fault-tolerant cost model, it is preferable to perform the integration, multiplication, and summation over t and x in the second term of Eq. 5 entirely coherently. This is because on an error-corrected quantum computer the key resource to minimize is the total number of gates (i.e. the sum of the gate counts of all circuits applied) rather than just the depth of the longest circuit. We now outline how this may be achieved for the gradient term. Assuming that ρ_x = W_x|0⟩⟨0|W†_x, and using the fact that Trace(A) · Trace(B) = Trace(A ⊗ B), we may rewrite the gradient term in a form whose only approximation comes from our numerical integration. One can confirm that U_0(x, t, s) is unitary as long as V_n and O_x are unitaries (and if this is not the case, they may be decomposed as linear combinations of unitaries themselves). The second part of the second term in Eq. 5 requires multiplying by the experimental signal S_x(t). This signal needs to be loaded onto the device; if done naively, this could easily become the dominant cost in our circuit. To lower this cost, we make the reasonable assumption that S_x(t) consists of a small number N_ω ≪ T of Fourier components, S_x(t) = Σ_k a_{x,k} cos(tω_{x,k} + φ_x), where φ_x = 0 or φ_x = π/2 and a_{x,k} > 0 is expected from the t = 0 behaviour of our signal. Then, writing cos(tω_{x,k} + φ_x) = ½(e^{i(tω_{x,k}+φ_x)} + e^{−i(tω_{x,k}+φ_x)}), we obtain a second unitary U_1(x, t, s), where we have labeled the operations acting on the different control qubits 0 and 1. Under the above assumptions, U_1(x, t, s) is also a unitary operator. In the above, the s_{l,t} and z_{l,t} are our integration points and weights respectively (following Sec. IV A), but allowing for the fact that the limits of integration (and thus both the points we should sample over and the total width we need to multiply by) depend on t. Both summations may then be block encoded using standard LCU techniques [85]. These require control registers |x⟩, |t⟩, |l⟩, and |k⟩ to encode the summation variables (as well as some additional registers we will introduce later), together with SELECT and PREPARE unitaries. (We assume here that our times |t⟩ have some finite binary representation.) These registers contain in turn n_x ∼ log(N_x), n_t ∼ log(KT), n_l ∼ log(L), and n_k ∼ log(N_ω) qubits, where 1/K is the precision to which we store our times t; see Appendix B 2 for a detailed analysis of the truncation and discretization error.
The SEL_0 unitary selects the correct U_0(x, t, s_{l,t}) unitary to implement based on the control register. In Fig. 3, we show how this can be implemented using oracular access to U_x(t, s), O_x, and W_x (for which we will shortly give implementations). The PREP_a unitaries prepare the corresponding control states from an initial state |0⟩ on the control register. (Note that the absolute values in Eq. 31 and Eq. 33 are technically unnecessary, as all summands are positive.) We omit garbage registers in these steps for simplicity. Given these, one can check that the desired quantities are encoded as overlaps of circuits acting on the combined system and control register set. For a = 0, 1, we may then use the overlap estimation algorithm of Ref. [86] to estimate these values to error ε_a with confidence 1 − δ using O(log(δ^{-1}) ε_a^{-1}) queries to PREP_a and SEL_a; see Appendix B 1 for a review of this algorithm. The number of additional gates used in Ref. [86] is negligible compared to the cost of block encoding. Setting ε_a = O(ε/λ_a) allows us to obtain an estimate of dC[H]/dh_n that is within ε with confidence 1 − δ.
Note that Σ_l z_{l,t} = t when we have sampled at time t, so if we assume that |S_x(t)| ≤ 1 and σ_{x,t} = σ, we have λ_1 = λ_2 = (N_x/σ^2) Σ_{sampled t} t. The number of oracle calls to PREPARE and SELECT then scales as in Eq. 36 (in comparison to Eq. 24). To make a heuristic comparison to the NISQ results, we again consider an oracular model. Our SELECT oracles require time evolution by up to T = max(t), so a quantum simulation algorithm with linear scaling in the evolution time would make O(T) queries to the Hamiltonian oracle. Thus, in the sparse sampling case the total number of oracle calls is bounded by Õ(σ^{-2} ε^{-1} N_x T^2) (a saving of σ^{-2} ε^{-1} T), while in the dense sampling case the total number of oracle calls is Õ(σ^{-2} ε^{-1} N_x T^3) (a saving of σ^{-2} ε^{-1} T^2). We expect much larger savings for simulating concrete Hamiltonians using fault-tolerant quantum algorithms, as NISQ approaches [80, 81] typically cannot achieve linear scaling in the simulation time and also have worse scaling in the target precision.
FIG. 3. Circuit diagrams of the fault-tolerant oracles SEL_0, SEL_1, PREP_0, and PREP_1 described in the text. See text for details. Black circles on multi-qubit registers denote complex control procedures. Square boxes denote classical input to the system via QROM and coherent alias sampling (CAS) [15]. Subscripts are omitted from gates for ease of reading. The dashed circles on the control for U in the SEL_a circuits indicate control that is only needed if the time evolution during an experiment changes between experiments.
The complexity of the SELECT unitaries is dictated by the need to implement the controlled time evolutions Σ_{x,t,l} |x⟩|t⟩|l⟩⟨l|⟨t|⟨x| ⊗ U_x(s_{l,t}, 0) and Σ_{x,t,l} |x⟩|t⟩|l⟩⟨l|⟨t|⟨x| ⊗ U_x(t, s_{l,t}). (We will discuss the initial preparation Σ_x |x⟩⟨x| ⊗ W_x later.) There is a wide range of fast quantum algorithms for simulating time evolution [84, 87-90]; which choice is optimal depends on the details of the Hamiltonian being studied. Here, we give an example implementation of the SELECT unitaries using higher-order product formulas, following the analysis developed in Ref. [91]. We expect higher-order formulas to provide the fastest approach for simulating many spin Hamiltonians [91, 92], at least asymptotically. We assume for practical purposes here that our time evolution is experiment-independent: U_x(t, s) = U(t, s) = e^{iH(t−s)}. This removes the need to consider the |x⟩ register in our implementation of SELECT. Our implementation requires that we fix the discretization of the integral in Eq. 8.
We choose L points s_{l,t} = tl/L, with uniform weights w_l = t/L. The error in this approximation can be bounded in the standard way for a Riemann sum. We implement the time evolution by dividing the total time interval [0, t] into R Trotter steps and implementing a higher-order product formula in each step. To ensure that the simulation has error at most η, we take a number of Trotter steps R as prescribed in Ref. [91], where the lower-case o(1) in the resulting bound represents a constant that can be taken arbitrarily small, and the X_1 and X_2 coefficients depend on the system size and graph connectivity. For a linear chain of N qubits, we have X_1 ∼ 1 and X_2 ∼ N [93]. By contrast, assuming a model of clustered Hamiltonians [94], in which the inter-cluster terms H_{k,l} (k ≠ l) are much weaker than the intra-cluster terms H_{k,k}, we have X_1 ∼ Λ_ind and X_2 ∼ Λ, where Λ and Λ_ind are the corresponding cluster interaction strengths. Alternatively, we can apply a partial Trotter decomposition without splitting the terms within each cluster [94]. This reduces the Trotter error to instead scale with X_2 ∼ Λ_int, the inter-cluster interaction strength. Each cluster can then be simulated using either product formulas or more advanced quantum algorithms. We expect that such a hybrid approach can improve the runtime of our approach, but a detailed study of this improvement is out of the scope of the present paper and is left as a subject for future work.
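The Riemann discretization s_{l,t} = tl/L with weights t/L obeys the standard left-endpoint error bound max|f′| · t²/(2L), which is consistent with needing L ∼ O(t²) points at fixed accuracy. A minimal numerical check of our own (the integrand sin is purely illustrative):

```python
import numpy as np

def riemann_quadrature(t, L):
    """The L integration points s_l = t*l/L with uniform weights w_l = t/L."""
    return t * np.arange(L) / L, np.full(L, t / L)

def integrate(f, t, L):
    s, w = riemann_quadrature(t, L)
    return float(np.sum(w * f(s)))

# Left-Riemann error for an integrand with |f'| <= c is at most c*t^2/(2L),
# so holding the error fixed as t grows requires L ~ t^2 points.
t, L = 4.0, 1000
err = abs(integrate(np.sin, t, L) - (1.0 - np.cos(t)))
```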
To analyze how the error of quantum simulation affects the accuracy of the overlap estimation, we use the block-diagonal structure of Eqs. 37 and 38. We see that the controlled time evolution has error at most η provided that the quantum simulation is performed with accuracy η.
To achieve an accuracy of ε in the estimate of the overlap, we set η_a = O(ε/λ_a) for a = 0, 1, respectively. This sets the minimum number of Trotter steps R in Eq. 39.
We now explain how to add the double control by the |l⟩ and |t⟩ registers to a general product formula S_p. (The single control by the |t⟩ register, also required for the SEL_a oracles, can be implemented by the following techniques as well.) The near-linear dependence of R on the evolution time and the requirement that L ∼ O(t^2) imply that we need L ≥ R. This in turn implies that lt/L is not necessarily an integer multiple of t/R. For simplicity, we assume that L and R are powers of 2, and write lt/L = rt/R + q for 0 ≤ q < t/R. We write q′ = qR/t < 1, and then lR/L = r + q′ gives the number of integer Trotter steps (r) and the fractional remainder (q′). Because L and R are powers of two, these numbers r and q′ are already stored in the first log(R) and the last log(L) − log(R) bits of the l register, and may be identified by renaming l as (r, q′). The integer part r determines the number of times S_p(t/R) needs to be applied: controlling S_p^{2^{b_r}}(t/R) by the b_r-th bit of the |r⟩ control register and the |t⟩ register implements the required unitary, where the |t⟩ register dictates the angle of rotation of each component of the product formula. For example, we would directly implement Σ_t |t⟩⟨t| ⊗ e^{itθZ_i} bit-wise, using the b_t-th bit of the t register to control a rotation by e^{i 2^{b_t} θ Z_i / 2^{n_t}}. This has a gate complexity polylogarithmic in the input parameters, and so we neglect it. We can similarly implement the final fractional Trotter step controlled by the |q′⟩ register; i.e., we implement the unitary Σ_{q′,t} |q′⟩|t⟩⟨t|⟨q′| ⊗ S_p(tq′/R).
This also has a cost that is polylogarithmic in the input parameters. The final scaling of our doubly-controlled time evolution is then identical, up to logarithmic factors, to the cost of implementing the Trotter evolution without control. Our above implementation of controlled quantum simulation is developed and optimized specifically for product formulas. Another possible circuit implementation, which works not only for product formulas but also for more advanced quantum simulation algorithms, is to use a binary representation of the evolution time and simulate for time 2^k with integer k; see Ref. [95] for details. In any case, the complexity only scales logarithmically with the input parameters and the overhead is negligible. This justifies the comparison between the oracular models in Eq. 36 and Eq. 24, as long as the PREPARE and controlled-W_x circuits have lower costs than SELECT.
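The renaming of the |l⟩ register as (r, q′) is pure bit arithmetic when L and R are powers of two. The sketch below (the function name is ours) extracts the integer Trotter-step count r and the fractional remainder q′ exactly as described above:

```python
def split_l_register(l, L, R):
    """Rename the l register as (r, q'): with L, R powers of two and L >= R,
    l*R/L = r + q', where the integer part r (number of full Trotter steps)
    sits in the top log2(R) bits of l and the fraction q' in the remaining
    log2(L) - log2(R) bits."""
    assert L % R == 0 and L & (L - 1) == 0 and R & (R - 1) == 0
    step = L // R                      # 2**(log2(L) - log2(R))
    shift = step.bit_length() - 1      # number of low bits encoding q'
    r = l >> shift                     # integer Trotter steps
    q_frac = (l & (step - 1)) / step   # fractional final step, in [0, 1)
    return r, q_frac
```

For example, with L = 16 integration points and R = 4 Trotter steps, l = 5 gives l·R/L = 5/4, i.e. one full step plus a quarter-length fractional step.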
A naive implementation of the Trotter steps requires that we exponentiate all the terms in the Hamiltonian. For the clustered model in Eq. 40, this implies a gate complexity of O(N^2) per Trotter step. However, this may be improved by truncating Hamiltonian terms with very small magnitudes or by switching to a more advanced quantum simulation algorithm. We also need to synthesize the rotation gates with respect to a fault-tolerant gate set, but the overhead of this circuit synthesis is asymptotically negligible.
The PREP_0 and PREP_1 oracles require loading the coefficients in Eqs. 30 and 32, respectively, onto a quantum register. As we have chosen a uniform integration measure, the integration weights for both PREP_0 and PREP_1 are independent of the value of the l register, which may therefore be prepared by simply applying a Hadamard gate to every qubit. The remainder of our PREPARE oracles relies heavily on the QROM and coherent alias sampling (CAS) techniques of Ref. [15], which can be used to perform mappings of the form |j⟩|0⟩ → |j⟩|a_j⟩ using a number of Toffoli gates proportional to the number N_d of unique datapoints or indices j. This is important, as we do not assume that our times t are chosen uniformly, so preparing the |t⟩ register is non-trivial. If we index our times by some uniform index j, the |t⟩ register may be prepared via CAS at a cost equal to the number of unique datapoints. Combining this with the prepared |l⟩ register above yields the PREP_0 oracle. If σ_{x,t} = σ_t (i.e. all separate experiments are performed with the same error, which is a reasonable assumption), this has an identical cost of N_d. We assume that N_d scales at worst linearly in T (i.e. for dense sampling), and so the cost of implementing the PREP_0 oracle is bounded by O(T) and dominated in the block encoding by the additive cost of the SEL_0 oracle. The PREP_1 oracle differs from the PREP_0 oracle only by the additional amplitudes a_{x,k}. These may be mapped onto the device using coherent alias sampling at a cost scaling as N_x N_ω. This cost is additive to the N_d cost above, and as we expect N_x N_ω ≲ N_d, we expect this oracle to also be dominated by the cost of SEL_1.
As an additional part of the SEL_1 subroutine, we need to implement the controlled Z rotation e^{−iZ(tω_{x,k}+φ_x)/2}. The classical values ω_{x,k} and φ_x need to be loaded onto the quantum device. To do this, we rely on the QROM technique of Ref. [15]. Given the set of N_ω classical datapoints f_{x,k} = 2πω_{x,k} to some fixed precision, QROM can be used to perform the mapping |k⟩|x⟩|0⟩ → |k⟩|x⟩|f_{x,k}⟩ using O(N_ω N_x) Toffoli gates. We can similarly map φ_x onto a single qubit, |x⟩|0⟩ → |x⟩|b_x⟩, where b_x = 0 if φ_x = 0 and b_x = 1 if φ_x = π/2, at a cost of O(N_x) Toffoli gates. These mappings are more natural to implement during the PREP_1 step, so we insert them there. Then, in the SEL_1 subroutine, we may assume access to these registers, in which case the controlled Z rotation may be implemented by arithmetic of the same form as in the product formulas above, at a cost polylogarithmic in the sizes of the |f_{x,k}⟩ and |t⟩ registers. We may alternatively use the phase gradient method described in Ref. [96].
It remains to describe a preparation scheme for ρ_x. This is especially important to consider, as ρ_x is not a pure state, so it is impossible to prepare it from an initial register with a unitary operation on an N-qubit quantum register. We require W_x to be a unitary operation for the expectation value estimation algorithm, as that algorithm requires repeated access to W_x and W†_x (or equivalently the ability to reflect around ρ_x). To solve this problem, we expand the size of our system register and prepare a purified state |ψ_x⟩ = W_x|0⟩ such that, for all observables O within our original system, expectation values on |ψ_x⟩ match those on ρ_x. This requires that we at most double the number of qubits N of our system, and in all operations other than W_x and W†_x we can ignore the additional qubits. As we will see below, for applications in NMR we are mostly interested in preparing states that are maximally mixed on all qubits except qubit j_x. To achieve this with N additional qubits, we begin with N copies of the Bell state (|00⟩ + |11⟩)/√2, which can be prepared using only Clifford gates. We then perform a Toffoli gate with the x register and the j_x-th qubit as controls and the (j_x + N)-th qubit as target, followed by a Hadamard gate on the j_x-th qubit controlled by x. Each controlled Hadamard can be implemented using a single Toffoli gate [4, FIG. 17]. This prepares a state with the desired properties. More generally, any thermal state of the classical 1D Ising model can be prepared as a 2N-qubit thermofield double state with perfect fidelity using a depth-N/2 circuit [97].
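The purification argument can be checked directly in the maximally mixed case: tracing the ancilla half out of N Bell pairs returns the identity over 2^N, so every system observable has the same expectation value on |ψ_x⟩ as on the mixed state. A small numpy sketch of our own (function names ours; the controlled Toffoli/Hadamard dressing for qubit j_x is omitted):

```python
import numpy as np

def purified_maximally_mixed(n):
    """|psi> = 2^{-n/2} sum_b |b>_sys |b>_anc: n Bell pairs pairing system
    qubit j with ancilla qubit j + n, preparable with Clifford gates alone."""
    d = 2**n
    return np.eye(d).reshape(d * d) / np.sqrt(d)   # sum_b |b>|b> / sqrt(d)

def reduced_system_state(psi, n):
    """Density matrix of the first n qubits after tracing out the last n."""
    d = 2**n
    m = psi.reshape(d, d)          # row index: system, column index: ancilla
    return m @ m.conj().T
```

The reduced state is exactly I/2^n, confirming that the thermofield-double construction reproduces the infinite-temperature state with perfect fidelity.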

V. APPLICATION TO NUCLEAR MAGNETIC RESONANCE SPECTROSCOPY
We now focus on the application of our quantum learning algorithm to NMR spectroscopy. In an NMR experiment, a sample of a molecule, crystal, or other material is placed in a strong magnetic field, which interacts with the magnetic moment of any spinful nucleus. These moments also couple to each other through dipole-dipole and electron-mediated interactions, yielding the Hamiltonian of Eq. 48. Here, r_{i,j} = r_i − r_j is the vector between the ith and jth nuclei, S_i is the spin vector, B is the external magnetic field, χ_i = γ_i δ_i is the shielded magnetogyric ratio of the ith nucleus (with δ_i the chemical shift and γ_i the unshielded magnetogyric ratio), Γ_{i,j} = µ_0 γ_i γ_j ℏ / (8π|r_{i,j}|^3) is the dipole-dipole interaction strength (with µ_0 the vacuum permeability and ℏ the reduced Planck constant), and J_{i,j} is the electron-mediated coupling strength. Assuming that our nuclei are spin-1/2, S_i = ½(X_i, Y_i, Z_i). As the coupling constants Γ_{i,j} depend directly on the physical distance |r_{i,j}| between the nuclear spins, knowledge of these distances is sufficient to infer the molecular geometry (modulo global translations, rotations, and reflections) [99]. An NMR spectroscopy experiment follows the protocol outlined in Sec. II. The system begins at thermal equilibrium, is perturbed by one or more magnetic field pulses, and has its free-induction decay read out (which is equivalent to measuring Σ_i X_i). In this formalism the starting state ρ_x and measurement operator O_x are the same for all experiments (which differ in their choice of H_x(t)). However, the external perturbation is often chosen to polarize the initial state and flip spins at the end, making the signal S_x(t) a local spin-spin correlation measurement. (We will discuss how to implement this in a strongly correlated system shortly.) Whether an NMR Hamiltonian lies within the ergodic, classically-feasible, or quantum-feasible regimes in Fig. 1 depends on the relative energy scales of the terms in Eq.
48. The typical magnitudes of these energy scales are set by the inverse temperature β of the system and the mean shielded magnetogyric ratio χ = (1/N) Σ_i χ_i. Only the dipolar term (Γ_{i,j}) and the electron-mediated interaction (J_{i,j}) generate classically challenging dynamics, so one of them must be large for us to lie outside the classically feasible region of Fig. 1. However, in solution a molecule tumbles rapidly, averaging the dipolar term to 0. In a strong magnetic field, the remaining Hamiltonian can be treated perturbatively and solved classically, rendering our quantum learning algorithm unnecessary. Previous proposals [57] that suggested using quantum computers to learn the structure of molecules in solution suffer from this applicability issue. (Note that in zero- or ultra-low-field NMR [31-34] Eq. 49 does not hold; in these experiments Bχ ≪ J_{i,j} and quantum computers may have a role to play in learning structure.) We propose to instead study systems where molecules are out of solution and not free to move. Indeed, NMR is used in a wide range of systems out of solution: gels [27], surfaces [28], proteins in membranes [29, 30], and in the solid state [24]. In these systems the uniform magnetic field interaction term Bχ is still the dominant energy scale, and terms that do not commute with it cancel out, leaving the secular Hamiltonian of Eq. 50, where φ_{i,j} is the angle between B and r_{i,j}. (This is commonly known as the secular approximation.) This approximation requires φ_{i,j} to be well-defined (e.g.
by stacking membranes so that all proteins are similarly aligned with the magnetic field). Simulating randomly scattered molecules would require classical averaging over many such alignments, presenting an additional simulation challenge (and in practice broadening out spectral lines). In solid-state NMR experiments one typically removes the dipolar coupling by magic-angle spinning: spinning the sample at high frequency around an axis at an angle θ ≈ 54.74° to the magnetic field. This spinning suppresses the dipolar term by a factor (3cos^2(θ) − 1) ≈ 0, as long as the spinning frequency is much higher than the dipolar coupling strength. Combining this with frequency-selective dipolar recoupling [100] has achieved remarkable success in biochemistry (see, e.g., Ref. [101]). However, this is not typically achievable to high precision for proton NMR, where dipolar couplings are typically of the order of 30-40 kHz [102] (by comparison, the highest-frequency centrifuges are around 100 kHz [29]). Even when magic-angle spinning is combined with decoupling pulse schemes, significant residual coupling in proton-NMR spectra can be observed [29, 103, 104]. (Moreover, suppressing the dipolar coupling term removes valuable structural information about a system.) Proton NMR in solution has achieved great success in biochemistry, but the folding of proteins can be quite different in vitro versus in vivo. For example, a large number of proteins in cell walls and membranes are folded precisely according to this external environment, and lose their shape in solution [29, 30]. This suggests that learning the structure of proteins in membranes via proton NMR is a potentially valuable beyond-classical quantum computing application.
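As a concrete illustration of the secular Hamiltonian of Eq. 50, the sketch below assembles a dense matrix for a few spins. The prefactors and sign conventions here are illustrative only (we set Γ_ij = 1/|r_ij|³ with all physical constants dropped), not the paper's exact normalization; the structural points are the angular factor (3cos²φ_ij − 1) and the conservation of total Z magnetization:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def op_on(op, i, n):
    """Embed a single-qubit operator on spin i of an n-spin register."""
    out = np.array([[1.0 + 0j]])
    for k in range(n):
        out = np.kron(out, op if k == i else np.eye(2, dtype=complex))
    return out

def secular_dipolar_hamiltonian(positions, chis, B, b_axis=(0.0, 0.0, 1.0)):
    """Schematic secular (high-field) dipolar Hamiltonian: a Zeeman term plus
    pair couplings Gamma_ij (3 cos^2 phi_ij - 1)(2 Z_i Z_j - X_i X_j - Y_i Y_j)/4,
    with Gamma_ij illustratively set to 1/|r_ij|^3 (physical constants dropped)."""
    b_axis = np.asarray(b_axis) / np.linalg.norm(b_axis)
    n = len(positions)
    H = sum(0.5 * B * chis[i] * op_on(Z, i, n) for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            r = np.asarray(positions[i], float) - np.asarray(positions[j], float)
            cos_phi = np.dot(r, b_axis) / np.linalg.norm(r)
            gamma = 1.0 / np.linalg.norm(r) ** 3
            pair = (2 * op_on(Z, i, n) @ op_on(Z, j, n)
                    - op_on(X, i, n) @ op_on(X, j, n)
                    - op_on(Y, i, n) @ op_on(Y, j, n))
            H = H + 0.25 * gamma * (3 * cos_phi**2 - 1) * pair
    return H
```

A pair whose internuclear vector sits at the magic angle to the field (cos²φ = 1/3) has its coupling vanish identically, which is the effect magic-angle spinning engineers for every pair at once.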
Proposing to study systems with strong dipolar coupling presents a challenge in designing realistic state preparation and measurement schemes. As the temperature is larger than all energy scales in our experiment, our initial state is a thermal state, and we must perturb it in order to generate any signal at all. (The high-temperature approximation here is quite good: even for a 23.5 T background magnetic field, βBχ ∼ 10⁻⁴.) We have two external handles with which to perturb our system: the ability to apply time-dependent RF pulses to modulate the background magnetic field B, and magic-angle spinning [64, 65]. The latter is a double-edged sword; we cannot alter the direction of the magnetic field nor of the spinning sample on the timescale of our system, so it is only practical to use this to suppress the dipolar coupling by a fixed amount for the entire experiment. However, this suppression may be adjusted during the experiment via dipolar decoupling and recoupling pulse schemes [66-69, 100]. As a simple example, consider the 4-step WAHUHA scheme [105, 106], which consists of (1) a rest of time dt and a π/2 pulse around the x axis, (2) a rest of time dt and a −π/2 pulse around the y axis, (3) a rest of time 2dt and a π/2 pulse around the y axis, and (4) a rest of time dt, a −π/2 pulse around the x axis, and a final rest of time dt. Performing a Magnus expansion to first order in dt on the combined unitary of this scheme, the dipolar term averages to zero. (This is true of any scheme that rotates the X, Y, Z Pauli axes onto the z axis of the magnetic field for equal periods of time.) Once the dipolar field is sufficiently suppressed, individual spins may be flipped by, e.g., applying a low-amplitude magnetic field oscillating at a frequency ω, which addresses spins for which Bχ_i = ω. With this simple scheme it is likely difficult to achieve sensitivity below ∼1 kHz, but this may be improved by frequency-selective pulsing schemes [66, 107] such as the DANTE (Delays Alternating with Nutations for Tailored Excitation)
scheme [108]. After applying such a scheme, the perturbation to the system is roughly that of Eq. 52, where δω gives the accuracy of the technique. In an NMR experiment the signal is rescaled by the factor β in Eq. 52.
However, in a quantum computation we may divide by β before performing our estimation, which makes the error requirements in our final signal independent of β.Approximations in the above can be accounted for in our learning scheme by adjusting the starting state, or by incorporating the pulse sequence into the unitary U x , giving an additional advantage over classical Hamiltonian inference.
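The zeroth-order average-Hamiltonian argument for WAHUHA-type sequences can be verified in a few lines: in the toggling frame the z axis spends equal time along x, y, and z, and averaging the secular pair coupling 3Z_iZ_j − S_i·S_j (Pauli convention, overall prefactors dropped) over the three axes gives exactly zero. A minimal check of our own:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Secular dipolar coupling for one spin pair, prop. to 3 Z_i Z_j - S_i . S_j
SdotS = sum(np.kron(P, P) for P in (X, Y, Z))
dipolar = 3 * np.kron(Z, Z) - SdotS

# WAHUHA toggling frame: the quantization axis spends equal time along
# x, y and z, so the zeroth-order average Hamiltonian is the axis average,
# and 3 <A_i A_j> - S.S = (XX + YY + ZZ) - S.S = 0.
average = sum(3 * np.kron(P, P) - SdotS for P in (X, Y, Z)) / 3
```

The individual coupling is far from zero, yet the cycle average vanishes identically, which is exactly the claim made for any scheme that symmetrizes the three axes.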
A. Determining the structure of ubiquitin as an example application

A significant body of literature on protein structures already exists, which we can use as a benchmark for our quantum learning algorithm. The protein ubiquitin, named for its abundance throughout eukaryotic organisms, was discovered in 1975 [109, 110]. The structure of ubiquitin is well-known, making it a good benchmark. The protein contains over 600 protons, but these tend to cluster (in terms of their relative coupling strengths, which correspond to their locations in space); we propose to divide the full molecule into smaller clusters that may be studied individually. This corresponds to assuming that our experimental signal decomposes as a sum of cluster contributions (Eq. 53); we can then either subtract the contribution of individual pieces from the total signal, or learn this linear combination in a single step. The latter has a linear overhead in the number of pieces, and does not require larger quantum computers or longer circuits. In practice we do not need to determine the clustering ahead of time; clusters will present themselves as minibands in the spectrum that cannot be separated as in Eq. 53, and we may account for these in our learning algorithm by allowing our optimization to adjust the number of spins in any given band.
To investigate this clustering, we represent the ubiquitin molecule as a graph with edges weighted by the dipolar interaction strength. We take the atomic coordinates of ubiquitin in solution from previous NMR data (protein data bank ID 1D3Z), and proton chemical shifts from the Biological Magnetic Resonance Data Bank (BMRB), entry 17769 (both in turn taken from Ref. [98]). We then define a cluster as any connected subgraph in which all edges have higher weight than any edge (in the larger graph) pointing out of the subgraph. (This definition in principle removes some edges from the subgraph / couplings from the cluster, but upon identification of the cluster we consider all couplings between its spins regardless of their size.) Such subgraphs may be found by thresholding: setting a truncation threshold V_min and eliminating all couplings below this threshold in the nuclear spin Hamiltonian cuts the graph of spins into a set of disconnected subgraphs, each of which is a cluster. In the ubiquitin molecule (Fig. 4), by adjusting this V_min we can first split off a 466-spin central core of the molecule (left), then a 238-spin backbone (middle top), and finally a 60-spin cluster (middle bottom). Given its size, we expect the problem of learning the 60-spin cluster Hamiltonian to lie around the beyond-classical boundary.

FIG. 6. Plot of the multifractal dimension of small clusters of spins in the ubiquitin molecule as the dipolar term is suppressed in a background 23.5 T field. Points in the main plot are extracted from a linear fit of the mean participation entropy of the different clusters as a function of the system size, as demonstrated in the two insets for suppression factors 2 and 100. From these fits the multifractal dimension can be estimated. The colouring of the plot indicates the expected quantum and classical learnability of the system as the dipolar term is suppressed, corresponding to the different regions in Fig. 1.

To investigate the cluster's coupling to its environment, we plot the distribution of coupling strengths (Fig. 4, right) within the full molecule, within the 60-spin cluster, and between the cluster and the rest of the molecule. We see that the mean coupling within the ubiquitin cluster is around two to three times the mean coupling to the environment, and the majority of couplings to the environment are more than ten times smaller than the couplings within the cluster. We believe that this coupling is sufficiently weak that it may be treated perturbatively, though verifying this is a clear task for future study.
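The thresholding step described above is ordinary connected-component extraction on the weighted spin graph. A self-contained sketch of our own (union-find; the function name is hypothetical):

```python
def clusters_by_threshold(couplings, v_min):
    """Drop all couplings with |Gamma_ij| < v_min and return the connected
    components of the remaining spin graph, one cluster per component.
    `couplings` is a symmetric matrix of dipolar interaction strengths."""
    n = len(couplings)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for i in range(n):
        for j in range(i + 1, n):
            if abs(couplings[i][j]) >= v_min:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())
```

Sweeping v_min from large to small merges clusters one by one, which is how the 466-, 238-, and 60-spin pieces above peel off at successive thresholds.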
We now demonstrate our Hamiltonian learning algorithm for a small cluster of 6 spins in the ubiquitin molecule (Fig. 5). We assume that we have access to magic-angle spinning or dipolar decoupling techniques to suppress our dipolar field by a factor α = 10, and that the background field is 23.5 T. To simulate the proposal that we know the protein backbone and are focused on learning long-range couplings, we start from a Hamiltonian (Eq. 50) in which all couplings larger than some V_min are known precisely, and set ourselves the task of learning the smaller couplings. This leaves 12 couplings to learn (we treat XX + YY and ZZ couplings independently), which we initialise at 0. To simulate sampling noise from the quantum computer, we add to each query of the device for S_x(t) or the gradient a normally-distributed error term with standard deviation 10⁻³. Using the conjugate gradient optimization algorithm implemented in scipy [111], we find that our learning problem converges to a total error of 0.008 kHz in only 11 iterations (a relative error of 0.2%).
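To illustrate the shape of this optimization loop, the toy below fits a single coupling J from a cosine "signal", with Gaussian noise of scale 10⁻³ injected into every cost and gradient query, using scipy's conjugate-gradient minimizer as in the text. This is a stand-in model of our own construction (all names and values are illustrative), not the paper's 6-spin cluster computation:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
t_grid = np.linspace(0.1, 5.0, 40)     # sampled evolution times
j_true = 0.8                           # hidden coupling to learn
signal = np.cos(j_true * t_grid)       # stand-in for the measured S_x(t)
noise = 1e-3                           # sampling-noise scale per query

def cost_and_grad(h):
    """Least-squares cost and gradient; every query of the simulated signal
    or its derivative carries additive Gaussian sampling noise."""
    j = h[0]
    model = np.cos(j * t_grid) + noise * rng.standard_normal(t_grid.size)
    dmodel = -t_grid * np.sin(j * t_grid) + noise * rng.standard_normal(t_grid.size)
    resid = model - signal
    return 0.5 * np.sum(resid**2), np.array([np.sum(resid * dmodel)])

res = minimize(cost_and_grad, x0=[0.3], jac=True, method="CG")
```

Despite the noisy gradients, the conjugate-gradient iteration settles near the hidden coupling, limited only by the noise floor, mirroring the fast convergence reported above for the 6-spin example.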

B. Learnability of spin clusters in ubiquitin
It remains to demonstrate that spin clusters in ubiquitin in a membrane or cell wall will generate an NMR signal within the region of quantum feasibility in Fig. 1. To study this, we investigate the participation entropy [55] of eigenstates |ψ⟩ = Σ_α ψ_α|α⟩ of small clusters in the ubiquitin protein. By adjusting our threshold V_min in the clustering protocol described above, we identify a collection of 127 clusters of N = 5-12 spins in ubiquitin. We do not expect the spectrum of the ubiquitin Hamiltonian to be represented by a disjoint sum of some of these clusters (which would make it classically simulatable): the truncation is simply performed to give an ensemble to study, and we expect that the spectra of some of these clusters embedded in the larger spin environment would differ significantly from those of the truncated pieces. The dipolar couplings in these Hamiltonians may be suppressed relative to their chemical shifts by magic-angle spinning or decoupling pulse schemes; we simulate this numerically by dividing the dipolar term by a variable suppression factor α. For the Hamiltonian of each cluster (Eq. 50) we calculate the mean participation entropy S₁ across the middle half of the spectrum at P = N/2 half-filling as we increase the dipolar term suppression factor α. (For odd-sized clusters of N = 5, 7, 9, 11 spins, we take P = (N−1)/2-filling.) In an ergodic system, where nearly all computational basis states contribute nearly equally to each eigenstate, the participation entropy scales with the logarithm of the dimension of the relevant symmetry sector; for the systems considered this is roughly S₁^(ergodic) ∼ 0.63N. By comparison, a completely localized system has a constant (or logarithmically-growing) participation entropy. As we increase α and the system becomes non-ergodic, the participation entropy follows a trend S₁ ∼ D₁S₁^(ergodic), where the multifractal dimension D₁ = D₁(α) characterizes the fraction of the Hilbert space explored by an eigenstate [55].
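The participation entropy and the multifractal-dimension fit can be sketched in a few lines (function names ours; D₁ is extracted as the slope of the mean S₁ against the log of the sector dimension):

```python
import numpy as np

def participation_entropy(psi):
    """Shannon participation entropy S1 = -sum_a p_a ln p_a, p_a = |psi_a|^2,
    of a normalized state expanded in a fixed (computational) basis."""
    p = np.abs(np.asarray(psi)) ** 2
    p = p[p > 1e-300]                 # drop exact zeros before taking the log
    return float(-np.sum(p * np.log(p)))

def multifractal_dimension(sector_dims, mean_entropies):
    """Slope D1 from the fit S1 ~ D1 ln(D) across sector dimensions D:
    D1 = 1 for fully ergodic states, D1 -> 0 for fully localized ones."""
    return float(np.polyfit(np.log(sector_dims), mean_entropies, 1)[0])
```

Sanity limits: a computational basis state has S₁ = 0 (fully localized), while a uniform superposition over D states has S₁ = ln D (fully ergodic, D₁ = 1).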
In Fig. 6 we plot the multifractal dimension of our 127 spin clusters as we suppress the dipolar term by a factor α. Our quantum-feasible region corresponds to a multifractal dimension D₁ < 1, while our classically-feasible region corresponds to a multifractal dimension D₁ ≪ 1. We see a clear region between suppression factors of α ∼ 5 and α ∼ 100 where our system begins to localize and the multifractal dimension quickly drops, but the system is not yet completely localized and classically simple. In the thermodynamic limit this transition is discontinuous [55], but as we are interested in finite system sizes, we believe that the observed trend of D₁ as we cross this localization transition is relevant to our situation. On either side of the localization transition, local disorder will make individual clusters either more ergodic or more localized than the mean, which implies that the boundaries between degenerate, quantum-feasible, and classically-feasible are not sharp as a function of the dipolar suppression. (This can be seen in the insets of Fig. 6.) The ergodic to non-ergodic phase transition observable in the Hamiltonian eigenstructure maps immediately onto the learnability of the NMR spin Hamiltonian. This can be studied in the Hessian of the Hamiltonian learning problem at complete convergence, given by Eq. 11.
As the Hessian corresponds to the Fisher information of the system, small eigenvalue-eigenvector pairs (λ, v) correspond to 'floppy modes' in our parameter space: linear combinations of parameters that may be adjusted in tandem without significantly altering our signal. Large eigenvalue-eigenvector pairs correspond to combinations of parameters that are well-learned. In Fig. 7 we study the typical eigenstructure of the Hessians of clusters of 5-8 spins. We see a clear transition that corresponds exactly to the ergodic to non-ergodic phase transition identified in Fig. 6. On the left-hand side, corresponding to the ergodic phase, the typical maximum eigenvalue (Fig. 7, top) of the ensemble shows a clear exponential decay with system size, implying that learning in a large system will be nigh-impossible. Moreover, the typical participation of these modes, exp(S₁[v]) (Eq. 55), grows quickly (Fig. 7, bottom), implying that these modes correspond to global data rather than specific couplings. By contrast, when the system is strongly localized, the largest eigenvalues are roughly constant in the system size, and correspond to linear combinations of only one or two parameters. As the system shifts between these two phases, we see a continuous improvement in learnability, where it appears we can learn some but not all of the system. This can be observed in the full eigenspectrum data (Fig. 7, top inset). We note that the largest eigenvalues also correspond to smaller typical participation, which suggests that when a system is on the ergodicity boundary we can learn some local couplings rather than just global information. This result demonstrates the importance of having access to the Hessian when solving the learning problem, as it tells us which of the converged parameters can be relied upon.
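The Hessian diagnostics used in Fig. 7 reduce to an eigendecomposition plus the eigenvector participation exp(S₁[v]). A small sketch with a helper of our own: participation near 1 flags a well-learned individual coupling, while participation near the number of parameters flags a global 'floppy' combination:

```python
import numpy as np

def hessian_mode_analysis(hessian):
    """Eigenvalues of a symmetric Hessian together with the participation
    exp(S1) of each eigenvector, where S1 = -sum_i v_i^2 ln v_i^2."""
    vals, vecs = np.linalg.eigh(hessian)
    p = vecs**2                               # columns are eigenvectors
    with np.errstate(divide="ignore", invalid="ignore"):
        logs = np.where(p > 0, np.log(p), 0.0)
    s1 = -np.sum(p * logs, axis=0)
    return vals, np.exp(s1)
```

A diagonal Hessian yields participation 1 for every mode (each eigenvalue constrains a single parameter), whereas a mode spread evenly over k parameters yields participation k.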

VI. CONCLUSION
In this work we introduced a new method for learning an unknown quantum Hamiltonian of a spin system from time-resolved measurements of the system. We constructed and costed circuits within NISQ and FT frameworks to estimate gradients of the cost function of this learning problem, finding clear asymptotic speedups when one is not constrained by poor coherence in NISQ devices. We outlined an application for these algorithms in classically intractable NMR experiments, and proposed a specific region in the NMR field (where dipolar couplings are strong and cannot be simply removed) as an area where beyond-classical computations may be very useful. Taking the ubiquitin protein as an example, we identified small clusters of spins in the larger 1H dipolar coupling matrix, demonstrated the convergence of our learning algorithm on a toy example, and investigated the cluster-environment coupling and the learnability of the system as the dipolar coupling is suppressed.

FIG. 7. Eigenvalue and eigenvector participation data for Hessians of the Hamiltonian learning problem for small spin clusters in ubiquitin. (Data is taken as the set of Zi(t)Zj(0) and Xi(t)Xj(0) correlators using a set of equally spaced times between t = 0 and t = 5 ms in the absence of sampling noise, implying that the units of the Hessian eigenvalues are arbitrary.) (Top) Typical (geometric mean) largest Hessian eigenvalue for different system sizes as the dipolar term is suppressed. Inset shows the typical Hessian spectrum for each suppression factor over clusters of 8 spins; the light green line in the main plot is taken from the indicated cut through the inset. (Bottom) Typical (geometric mean) eigenvector participation (Eq. 55) for the same dataset. Inset shows the participation across the entire Hessian spectrum for clusters of 8 spins; each datapoint in the light green line in the main plot corresponds to a geometric mean over a single line in the inset.
As part of this work, we identified a direct correspondence between the ergodic to non-ergodic phase transition and the learnability of a Hamiltonian from time-resolved experimental data, the latter being clearly observable in the structure of the Hessian of the learning problem. We believe this result to be more general than the NMR problems studied in this work. As far as we know, this is the first demonstrated link between the onset of wavefunction fractality and the onset of learnability of the generating Hamiltonian.
Our work opens a new field of quantum computing applications, leaving clear directions for future study. Our learnability data suggests that not all dipolar coupling parameters (which encode the 3D protein structure) may be learnable using the simple spin-spin correlators measured in this work. Developing future experiments to target the parameters relevant for structure calculations in NMR systems will therefore be highly relevant. A related open question concerns the conditions (if any) under which this problem is classically difficult. While we can rely on the fact that inference of NMR spectra from strongly-correlated problems appears difficult (and the forward problem of generating the spectra is at least DQC1-hard [41,56]), we do not know of a complexity-theory result explaining when the inverse problem is as difficult. Also, the protocol we propose in Sec. III A can, in principle, avoid trapping in local minima; however, this is not optimized or fully costed. We suggest that, following Ref. [41], it may be possible to define a protocol using our methods that learns at the Heisenberg limit (when learning is possible). It is also unclear precisely how well our protocol behaves in a system with a large number of unknown couplings, and whether we will have problems with vanishing gradients as our system size grows. (We note, however, that this is somewhat avoided in NMR problems, where we start with a reasonable guess of our Hamiltonian parameters.) Another clear direction for future work is to consider learning Hamiltonian parameters from the Fourier transform of the spectra S_x(t) instead of the time-resolved data, as spectral information is typically more robust to noise than amplitude data. It also remains to determine the cost of optimizing a real-world NMR experiment from a realistic initial guess. Finally, thanks to the generality of our Hamiltonian learning techniques, it should be possible to extend these methods
to the design of new NMR experiments, or to interpret data from other types of experimental procedures used to probe condensed matter, high-energy physics, chemistry, and materials science. We look forward to further explorations of what new possibilities for quantum experiments and data analysis these techniques can bring to the scientific community.

shots) to achieve some target error [17,112], rather than working with an oracular model. However, this optimization allows us to make a fair comparison between the results of Sec. IV A and Sec. IV B. As our estimate of Ĵ_n^x(t) is bounded, assuming ||O_x|| = 1, the bound in Eq. 22 yields an estimator for Ĵ_n^x(t) with variance ε² using M = t²ε⁻² repetitions of the circuit. We now assume that we may measure S_x(t) and Ĵ_n^x(t) for different x in parallel. This is realistic for our NMR application, and saves a factor of N_x in the asymptotic scaling. As the number of oracle calls per circuit scales as t, we can achieve a variance Var[Ĵ_n^x(t)] = a_{x,t,J} t³ C_{t,J}⁻¹ using C_{t,J} oracle calls (for some t- and x-independent constant a_{x,t,J}). Estimating Ŝ_x(t) to a variance ε² requires simply repeating the corresponding circuit (Fig. 2), where we have absorbed the constants of the Ŝ_x(t) and Ĵ_n^x(t) scaling into a_{x,t,S} and a_{x,t,J}. To optimize this, we adopt the same Lagrangian approach introduced for measurement optimization in Ref. [112]: we write a Lagrangian, differentiate with respect to our free parameters C_{x,t,J} and C_{x,t,S}, and set the result equal to zero.
Substituting into the expression for the variance Var[dC] and rearranging for λ obtains the optimal allocation. Finally, we can write down the total number of oracle calls; as a_{x,t,J} and a_{x,t,S} are constants, we have asymptotically that Σ_x a_{x,t,J}, Σ_x a_{x,t,S} ∼ O(N_x). Substituting this into Eq. A8 immediately yields Eq. 24.
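The Lagrangian shot-allocation step can be illustrated with a simplified cost model. Assuming (for illustration only) a generic variance a_i/C_i per observable and a fixed total budget — the paper's model carries extra powers of t, but the mechanics are identical — the stationarity condition gives C_i ∝ √a_i:

```python
import numpy as np

def optimal_allocation(a, C_total):
    """Lagrangian-optimal split of a measurement budget.
    Minimizing sum_i a_i / C_i subject to sum_i C_i = C_total:
    setting d/dC_i [sum a_i/C_i + lam*(sum C_i - C_total)] = 0
    gives C_i proportional to sqrt(a_i)."""
    w = np.sqrt(np.asarray(a, dtype=float))
    return C_total * w / w.sum()

a = np.array([1.0, 4.0, 9.0])      # illustrative variance constants
C = optimal_allocation(a, 60.0)
var_opt = np.sum(a / C)            # = (sum sqrt(a_i))^2 / C_total = 0.6

# Convexity check: any perturbed allocation with the same budget does no better.
rng = np.random.default_rng(1)
for _ in range(100):
    d = rng.normal(size=3); d -= d.mean()   # keep the budget fixed
    Cp = C + 0.5 * d
    if (Cp > 0).all():
        assert np.sum(a / Cp) >= var_opt - 1e-12
```

The optimum satisfies Var_opt = (Σ_i √a_i)² / C_total, which is the closed form that the asymptotic counting in Eq. A8 would then use.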
Standard quantum phase estimation outputs an estimate of the eigenphase with accuracy ε by making O(1/ε) queries to the reflections, succeeding with a constant probability greater than 1/2. The precision parameter ε directly translates to a maximal error of O(ε) in the estimated overlap. To succeed with higher probability, we can repeat quantum phase estimation and take the median of the outcomes; by Hoeffding's inequality, the success probability can be made arbitrarily close to one with only logarithmic overhead. We thus obtain:

Lemma 1 (Quantum overlap estimation). Given a quantum state |Ψ⟩ = PREP|0⟩, a unitary SEL, ε > 0, and 0 < δ < 1, there exists a quantum algorithm with output y that estimates ⟨Ψ| SEL |Ψ⟩ to accuracy ε with probability at least 1 − δ.

In the description of the above algorithm, we have ignored the normalization factor λ > 0 introduced by |Ψ⟩ = PREP|0⟩ and SEL. To get the target quantity, we need to multiply the outcome of quantum overlap estimation by λ. To ensure that the estimation succeeds with probability 1 − δ and accuracy ε, it then suffices to make O(log(1/δ)λ/ε) queries to PREP and SEL and use O(N · polylog(λ, 1/ε, 1/δ)) additional gates.
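The median-amplification step is purely classical and easy to simulate: if each phase-estimation run lands within accuracy ε with probability greater than 1/2, the median of m runs fails exponentially rarely in m. A toy simulation — the 0.75 success probability and the returned values are illustrative assumptions, not properties of any particular circuit:

```python
import numpy as np

def median_boost(single_run, m, rng):
    """Repeat a >1/2-reliable estimator m times and take the median.
    Hoeffding: failure probability decays exponentially in m."""
    return np.median([single_run(rng) for _ in range(m)])

# Toy model: each run returns the true phase 0.3 with probability 0.75,
# otherwise an arbitrary bad value (here 0.9).
def run(rng):
    return 0.3 if rng.random() < 0.75 else 0.9

rng = np.random.default_rng(2)
trials = 2000
fails_1 = sum(abs(run(rng) - 0.3) > 0.01 for _ in range(trials))
fails_15 = sum(abs(median_boost(run, 15, rng) - 0.3) > 0.01
               for _ in range(trials))
print(fails_1 / trials, fails_15 / trials)  # roughly 0.25 vs roughly 0.02
```

The median of 15 runs fails only when at least 8 individual runs fail, a binomial tail event that is already an order of magnitude rarer than a single-run failure here.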
It is instructive to compare the fault-tolerant approach with the sampling-based approach, which is more suitable for near-term quantum devices. That approach uses the generalized Hadamard test, which produces an unbiased estimate of the real and imaginary parts of ⟨Ψ| SEL |Ψ⟩ with constant variance. By Hoeffding's inequality, it suffices to take O(log(1/δ)λ²/ε²) samples to estimate with accuracy ε and probability 1 − δ. Each sample requires a constant number of queries to PREP and SEL. Therefore, we get a factor of Θ(λ/ε) savings by switching to the fault-tolerant quantum algorithm. See Sec. IV A and IV B for further discussion of these two approaches.

Truncation and discretization error
In this section, we analyze the error due to the truncation of real parameters and the discretization of the time integral. We will see in Sec. IV B that this only introduces a logarithmic overhead in the overall cost.
We first consider discretizing the integral. This discretization only achieves first-order accuracy, but one can improve it by switching to a higher-order scheme, which can reduce the cost of the fault-tolerant implementation; see [7, Appendix H] for details.
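The gain from a higher-order scheme can be seen numerically. The sketch below compares a first-order (left-endpoint) discretization against a midpoint rule on a generic smooth integrand (cos, an illustrative stand-in for the correlator integrand, not the paper's f):

```python
import numpy as np

def riemann(f, t, L):
    """First-order (left-endpoint) discretization of int_0^t f(s) ds."""
    s = np.linspace(0.0, t, L, endpoint=False)
    return np.sum(f(s)) * t / L

def midpoint(f, t, L):
    """A second-order scheme: same L evaluations, much smaller error."""
    s = (np.arange(L) + 0.5) * t / L
    return np.sum(f(s)) * t / L

exact = np.sin(1.0)   # int_0^1 cos(s) ds
for L in (16, 32, 64):
    print(L, abs(riemann(np.cos, 1.0, L) - exact),
             abs(midpoint(np.cos, 1.0, L) - exact))
# Doubling L halves the Riemann error (O(1/L)) but quarters
# the midpoint error (O(1/L^2)).
```

For a target accuracy ε this is the difference between L ∼ 1/ε and L ∼ 1/√ε time points, which is what makes higher-order schemes attractive in the fault-tolerant costing.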
In the following, we evaluate this bound for a model of clustered Hamiltonians acting on N sites. Here, each term in the Hamiltonian acts on at most two sites, and the sites are further grouped into clusters. We use calligraphic capital letters such as K and L to denote the clusters, and use k to denote an arbitrary single site within K. We assume that the Hamiltonian terms are normalized, ||H_{k,l}|| ≤ 1. Altogether, this bounds the error with which we have discretized (B1). We take T and L to be powers of two to simplify our circuit implementation.

We now consider the error due to the finite-digit truncation of the real parameters σ_{x,t_j}, t_j, a_{x,m}, and ω_{x,m}. In general, the error in σ_{x,t_j} can be bounded under certain continuity assumptions with respect to the argument t_j; here, we take σ_{x,t_j} ≡ σ to be constant to simplify the analysis. In our circuit implementation, the evolution time will be loaded onto a quantum register using the QROM approach of Ref. [15], where we have again taken K to be a power of two to simplify the implementation. Here, T is the maximum possible time, so log T bits suffice to represent the integer part of t. The length of the fractional part should be chosen large enough to represent the time sufficiently accurately; specifically, we require |t̃ − t| ≤ 1/K. Similarly, if t̃_j and ω̃_{x,m} satisfy |t̃_j − t_j| ≤ 1/K and |ω̃_{x,m} − ω_{x,m}| ≤ 1/K, then

|cos(t̃_j ω̃_{x,m} + φ_x) − cos(t_j ω_{x,m} + φ_x)| ≤ |t̃_j ω̃_{x,m} − t_j ω_{x,m}| = O((T + W)/K),

where T := max_j t_j and W := max_{x,m} |ω_{x,m}|. The coefficients in the Hamiltonian can be approximately prepared using the coherent alias sampling approach also described in Ref. [15], using log K qubits for the inequality test. To ensure that the total truncation error is at most ε, it suffices to choose

K = O(poly(λ, N, Λ_ind, T, W, 1/ε)). (B29)
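The fixed-point truncation errors above can be checked numerically. The sketch below rounds t and ω down to a 1/K grid (loosely mimicking the QROM loading) and verifies that the cosine error is bounded by |t̃ω̃ − tω|; the values of t, ω, φ, and K are arbitrary test choices:

```python
import numpy as np

def truncate(x, K):
    """Round x down to a grid of spacing 1/K (K a power of two),
    mimicking fixed-point loading of times/frequencies."""
    return np.floor(x * K) / K

rng = np.random.default_rng(3)
K = 2 ** 12
t, w, phi = rng.uniform(0, 5, 3)
err = abs(np.cos(truncate(t, K) * truncate(w, K) + phi)
          - np.cos(t * w + phi))
# |t~ w~ - t w| <= (|t| + |w|)/K + 1/K^2, and |cos a - cos b| <= |a - b|
bound = (abs(t) + abs(w) + 1 / K) / K
print(err, bound)
assert err <= bound
```

With K = 2^12 the bound is already below 10^-2 for these magnitudes, so K (and hence the register width log K) only needs to grow logarithmically with the target precision, consistent with the claimed logarithmic overhead.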
Appendix C: Alternative derivation of Hamiltonian derivatives via optimal control theory

To derive Eq. 5 through optimal control theory, we enforce the evolution of ρ_x(t) by H + H_x(t) variationally. We introduce an auxiliary field κ_x(t) as a Lagrange variable to enforce this condition, which transforms our cost function into the functional of Eq. C5. Substituting in Eq. C4 yields Eq. 5 as required. Following a similar procedure to take second-order derivatives of C with respect to ρ̃_x(t) and κ̃_x(t) yields Eq. 6 after some rearrangement.
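The variational derivation can be sanity-checked numerically on a small instance: the derivative of Trace(O_x ρ_x(t)) with respect to a Hamiltonian coefficient equals a time integral over commutators of the coupling term with the evolved state (the same commutator structure as Eqs. 5 and B11). The 2-spin Hamiltonian, observable, and initial state below are arbitrary toy choices, not the ubiquitin clusters:

```python
import numpy as np

# Toy model H(h) = H0 + h*V on two spins.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
H0 = np.kron(Z, I2) + np.kron(I2, Z) + 0.3 * np.kron(X, X)
V = np.kron(Z, Z)          # term whose coefficient h we differentiate
O = np.kron(X, I2)         # measured observable O_x
rho0 = np.zeros((4, 4), dtype=complex); rho0[0, 0] = 1.0
t, h = 1.0, 0.2

def evolve(H, tau):
    """e^{-i tau H} for Hermitian H via eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * tau * w)) @ v.conj().T

def signal(h):
    U = evolve(H0 + h * V, t)
    return np.trace(O @ U @ rho0 @ U.conj().T).real

# d/dh Tr[O rho(t)] = -i int_0^t ds Tr( O U(t-s) [V, rho(s)] U(t-s)^dag )
H = H0 + h * V
L = 400
grad = 0.0
for s in (np.arange(L) + 0.5) * t / L:   # midpoint rule
    Us = evolve(H, s)
    rho_s = Us @ rho0 @ Us.conj().T
    Uts = evolve(H, t - s)
    comm = V @ rho_s - rho_s @ V
    grad += np.trace(O @ Uts @ (-1j * comm) @ Uts.conj().T).real
grad *= t / L

fd = (signal(h + 1e-5) - signal(h - 1e-5)) / 2e-5   # central difference
print(grad, fd)
```

The integral and finite-difference values agree to discretization accuracy, confirming the commutator form of the gradient that the adjoint field κ_x(t) encodes.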
FIG. 1. Cartoon of the "phases of learnability" of a quantum Hamiltonian. In the red region, the set of experimental data is insufficient to distinguish candidate Hamiltonians. This becomes increasingly likely for systems beyond a localization transition (red dashed line), as after this point experiments such as local correlators tend to provide little information about the system structure. In the blue region, the experimental data is sufficient to learn the system's structure, but the data processing may be achieved classically, rendering a quantum computer unnecessary. This classical processing may be achieved either by the system being well-approximated by a classically-computable model, or by the experiment having sufficient control (grey dashed line) to isolate smaller subsystems. At the limit of good control, an experiment has the ability to dynamically decouple individual terms (blue dashed line), after which techniques such as those in Refs. [41, 43] can be used to estimate terms individually. A simpler limit is to have the ability to address individual spins (or small frequency regions of spins) in the system (green dashed line), which renders the system more learnable.
see Appendix B 2 for details. The controlled time-evolution part of the SELECT unitary then takes the form c−U(s, 0). Writing t = t_j, we can use QROM to map |j⟩|0⟩ → |j⟩|t_j⟩. (This requires the |j⟩ register to be of size n_d = log(N_d).) Coherent alias sampling allows us to prepare the state Σ_j |j⟩|t_j⟩|x⟩ (up to normalization), with a

FIG. 4. Finding spin clusters within the ubiquitin protein using data from Ref. [98]. (Left) A 466-spin cluster connected at a coupling of 10 kHz (the total protein has 692 H atoms). (Middle top) A 238-spin subset of this cluster, connected at a coupling of 12 kHz. (Middle bottom) A smaller 60-spin subset of this cluster, connected at a coupling of 14 kHz. Black lines and red dashed lines between spins indicate strong and medium-strength couplings within the cluster. The red dashed region in all three clusters is an 8-spin sub-cluster studied in this text. (Right top) Histogram of couplings within the 60-spin cluster and to the environment. (Right bottom) A zoom-in on the histogram tail to show the distribution of the dominant couplings.
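The cluster-finding step described in the caption — grouping spins connected by couplings above a threshold — amounts to taking connected components of a thresholded coupling graph. A minimal union-find sketch (toy coupling matrix, not the ubiquitin data of Ref. [98]):

```python
import numpy as np

def clusters_above_threshold(J, cutoff):
    """Connected components of the graph whose edges are spin pairs
    with |coupling| >= cutoff. J is a symmetric matrix (couplings in Hz).
    Uses a small union-find with path halving."""
    n = len(J)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if abs(J[i, j]) >= cutoff:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)

# Toy 5-spin coupling matrix: spins 0-1-2 strongly coupled, 3-4 separate.
J = np.zeros((5, 5))
J[0, 1] = J[1, 0] = 15e3
J[1, 2] = J[2, 1] = 12e3
J[3, 4] = J[4, 3] = 11e3
print(clusters_above_threshold(J, 10e3))   # [[0, 1, 2], [3, 4]]
```

Raising the cutoff fragments the graph into smaller clusters, exactly as in the 466 → 238 → 60 spin progression of the figure.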

FIG. 5. Learning small couplings of a 6-spin cluster within ubiquitin given a fixed backbone, from a noisy dataset with standard deviation 10^-3. The system is learned from a dataset of 21 points equally spaced between 0 and 2 ms (assuming a 23.5 T background magnetic field and a dipolar suppression of α = 10). The absolute error in individual parameters (faint blue lines) and the mean absolute error (black line) are plotted at each iteration of the CG algorithm.

This algorithm makes O(log(1/δ)/ε) queries to PREP and SEL (or their controlled versions c−(PREP) and c−(SEL)), and uses O(N · polylog(1/ε, 1/δ)) additional gates, where N is the number of qubits in the target system.
f(s) := Trace( O_x [ e^{−i(t_j−s)H} V_n e^{i(t_j−s)H}, e^{−i t_j H} |ψ_x⟩⟨ψ_x| e^{i t_j H} ] ). (B11)

This discretization error can be made arbitrarily small by choosing L sufficiently large. Here, we analyze how the error scales as a function of L, using an integral expansion in terms of the derivative of f within the time interval [0, t_j]. The derivative f'(s) takes the form

f'(s) := Trace( O_x [ e^{−i(t_j−s)H} [iH, V_n] e^{i(t_j−s)H}, e^{−i t_j H} |ψ_x⟩⟨ψ_x| e^{i t_j H} ] ). (B15)

To estimate Trace(O_x ρ_x(t_j)) − Σ_m a_{x,m} cos(t_j ω_{x,m} + φ_x) to an accuracy of ε, it suffices to choose L = O(λ Λ_ind T / ε). (B21)

||e^{−it̃H} − e^{−itH}|| ≤ |t̃ − t| ||H|| = O(N Λ_ind / K), |f(t̃, s) − f(t, s)| = O(N Λ_ind t / K), |g(t̃) − g(t)| ≤ |t̃ − t| max_τ |g'(τ)| = O(N Λ_ind / K), (B24)

for

f(t, s) := Trace( O_x [ e^{−i(t−s)H} V_n e^{i(t−s)H}, e^{−itH} |ψ_x⟩⟨ψ_x| e^{itH} ] ), g(t) := Trace( O_x e^{−itH} |ψ_x⟩⟨ψ_x| e^{itH} ). (B25)

of our parameters h_n lies within some δ: Σ_n |h_n − h̃_n| ≤ δ. Flipping this around, let us suppose we know our initial guess h̃_n.

Sec. IV A and Sec. IV B. Propagating variance through Eq. 5, and repeating the corresponding circuit (Fig. 2, top) ε^{−2} times, with C_{t,S} oracle calls we achieve a variance Var[Ŝ_x(t)] = a_{x,t,S} t C_{t,S}^{−1} (for some t- and x-independent constant a_{x,t,S}). As [Ŝ_x(t) − S_x(t)] is independent of t (being bounded by 2 when ||O_x|| = 1), and Ĵ_n^x(t) scales linearly in t (as discussed in the main text), assuming that σ_{x,t} = σ, we have Trace(ρ̃_x(t) O_x) − S_x(t).

This is now a functional; in addition to the finite real values h̃_n, it also takes as input any smooth matrix-valued functions ρ̃_x(t) and κ̃_x(t). This implies that all dependence of C on H is explicit; the dependence on ρ̃_x(t) (and κ̃_x(t)) will emerge from the principle of least action. The solution to our problem is again given by the minimum of the functional C. Taking the functional derivative δC/δκ̃_x(t) = 0 yields the Schrödinger equation in its standard form, which yields the solution in the main text: ρ̃_x(t) = Ū_x(t, 0) ρ_x Ū_x†(t, 0). Taking the functional derivative with respect to ρ̃_x(t) and setting it equal to 0 yields an update rule for κ̃_x(t), with boundary condition κ_x(+∞) = 0. (C3)

This takes the form of an external field κ̃_x(t) that propagates back in time and is perturbed in a non-unitary way by each measurement S_x(t′) that does not completely match the predicted S̃_x(t′). To recover the update rule, we then take the partial derivative of C with respect to the parameters h_n and set this to zero, obtaining terms of the form Trace(κ̃_x(t) [V_n, ρ̃_x(t)]). Substituting in the solution for ρ̃_x(t) yields a solution for κ̃_x(t):

κ_x(t) = i Ū_x†(t, t′) O_x Ū_x(t, t′) [Trace(ρ̃_x(t) O_x) − S_x(t)].