Learning quantum systems via out-of-time-order correlators

Learning the properties of dynamical quantum systems underlies applications ranging from nuclear magnetic resonance spectroscopy to quantum device characterization. A central challenge in this pursuit is the learning of strongly-interacting systems, where conventional observables decay quickly in time and space, limiting the information that can be learned from their measurement. In this work, we introduce a new class of observables into the context of quantum learning -- the out-of-time-order correlator -- which we show can substantially improve the learnability of strongly-interacting systems by virtue of displaying informative physics at large times and distances. We identify two general scenarios in which out-of-time-order correlators provide a significant advantage for learning tasks in locally-interacting systems: (i) when experimental access to the system is spatially-restricted, for example via a single "probe" degree of freedom, and (ii) when one desires to characterize weak interactions whose strength is much less than the typical interaction strength. We numerically characterize these advantages across a variety of learning problems, and find that they are robust to both read-out error and decoherence. Finally, we introduce a binary classification task that can be accomplished in constant time with out-of-time-order measurements. In a companion paper, we prove that this task is exponentially hard with any adaptive learning protocol that only involves time-ordered operations.


I. INTRODUCTION
Learning properties of quantum systems can pose challenges not present in their classical counterparts [2,3]. These differences often stem fundamentally from the existence of entanglement: measurements of a quantum system that is highly entangled with another system or the environment reveal little information from which to learn. In practical settings, these difficulties are most commonly encountered in strongly-interacting quantum systems. Strong interactions can introduce non-local entanglement throughout the system at short time scales, and are found to thereby inhibit the learning of system properties (e.g. the Hamiltonian) from physical observables [4-8].
The ubiquity of strong interactions in experimental applications of quantum learning has spurred a variety of solutions to this problem. For instance, in nuclear magnetic resonance (NMR) spectroscopy, a suite of technologies has been developed to controllably dampen undesired strong interactions between solid-state nuclear spins, which has enabled the identification of hitherto inaccessible molecular structures [9]. In a similar spirit, in quantum device characterization [10] and quantum sensing [7], dynamical decoupling control sequences [11] can effectively eliminate unwanted interactions and improve learning of the residual interactions. Other approaches include learning by transducing quantum data from the

FIG. 1. Schematic of time-ordered correlators (TOCs) and out-of-time-order correlators (OTOCs) in strongly-interacting systems. TOCs typically decay in O(1) times and distances (top, red), making it hard to learn features (yellow bond) that manifest only at late times. OTOCs utilize backwards time-evolution to "refocus" many-body correlations (bottom, blue), enabling learning of such features.
In this paper, we introduce a different paradigm for learning in strongly-interacting quantum systems: learning via out-of-time-order correlators (OTOCs).
In this work, we utilize the OTOC as a tool for learning properties of strongly-interacting quantum systems. Our application is motivated by a simple intuition: while time-ordered observables decay quickly as a system becomes entangled, out-of-time-order observables continue to fluctuate up to long times (Fig. 1). Guided by this intuition, we demonstrate the power of learning via OTOCs across a range of physical systems, supported by numerical studies, phenomenological estimates, and rigorous information-theoretic proofs. We begin in locally-interacting systems, where we identify two general scenarios in which OTOCs provide a strong learning advantage: (i) when experimental access to the system is spatially-restricted, for example via a single "probe" qubit [31-33], and (ii) for detecting weak interactions in an otherwise strongly-interacting system [7,10]. We characterize these advantages using both information-theoretic measures (the Fisher information) and performance metrics for concrete learning tasks. Moreover, we find that the advantages are robust to experimental read-out error and to time-reversal imperfections arising from strong coupling with an environment or decoherence. Finally, motivated by recent advances in provable learning advantages [5,6,34], we introduce a learning task that involves distinguishing two classes of unitary operations given oracle access. In a companion work [1], we prove that OTOCs provide an exponential advantage in performing this task over any time-ordered learning protocol.

II. BEHAVIOR OF TIME-ORDERED VS. OUT-OF-TIME-ORDER CORRELATORS
We begin by reviewing the phenomenology of time-ordered and out-of-time-order correlators in ergodic locally-interacting systems (Fig. 1). A time-ordered correlator (TOC) is defined as any correlation function that takes the following general form:

$$C_{\mathrm{TOC}} = \mathrm{tr}\big( A_k(t_k) \cdots A_1(t_1)\, \rho\, B_1(t'_1) \cdots B_\ell(t'_\ell) \big), \qquad (1)$$

where the times of the operators A, B increase away from the initial density matrix $\rho$ (i.e. $t_1 \leq \dots \leq t_k$ and $t'_1 \leq \dots \leq t'_\ell$). Such correlators are obtained by evolving the state $\rho$ forward in time (e.g. via Hamiltonian evolution $O(t) = e^{iHt} O e^{-iHt}$) while applying intermediary quantum operations at each time $t_i$, $t'_j$ [35]. Any correlation function that does not obey this form is called an out-of-time-order correlator.
A common example of a time-ordered correlator is the two-point function,

$$C_{\mathrm{TOC}} = \langle V_x(t)\, W_{x'}(0) \rangle, \qquad (2)$$

where $\langle \cdot \rangle \equiv \mathrm{tr}(\cdot)/2^L$ denotes the infinite temperature trace for L qubits, and $V_x$, $W_{x'}$ are local operators at sites $x$, $x'$. Such correlators measure the spread of local quantities in space and time; for instance, how much spin prepared at site $x'$ at time zero has transferred to site $x$ at time t. A wide range of literature on thermalization in strongly-interacting systems has found that local TOCs typically decay quickly, i.e. in O(1) times, to their thermal values [36]. This quick decay can inhibit learning tasks, since no additional information can be acquired from the TOC at times after the decay has occurred [8].
Meanwhile, the prototypical out-of-time-order correlator is the four-point function [26],

$$C_{\mathrm{OTOC}} = \langle V_x(t)\, W_{x'}(0)\, V_x(t)\, W_{x'}(0) \rangle, \qquad (3)$$

with local operators $V_x$, $W_{x'}$. Unlike time-ordered measurements, OTOCs typically require both forwards and backwards time-evolution to measure [1,26]. (Importantly for our application, nearly all experimental techniques for time-reversal rely only on the type of interaction being reversed and require no knowledge of the specific Hamiltonian, which one might wish to learn. For example, the same pulse sequence reverses an arbitrary dipole-dipole coupling Hamiltonian in an NMR experiment [29].) Physically, the OTOC probes whether information encoded at site $x$ at time zero is contained in correlations involving site $x'$ at time t. This is quantified by the squared commutator of a time-evolved operator at $x$ with a local operator at $x'$. In local strongly-interacting systems, operators are expected to spread ballistically according to the connectivity of the system [24,37,38]. Crucially, this spread continues for a duration proportional to the system's spatial extent, $\sim L$, by which time the information has been delocalized across the entire system.
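The link between the squared commutator and the four-point function can be made explicit with a short calculation. The sketch below assumes that $V$ and $W$ are Hermitian and unitary (e.g. Pauli operators) and that the four-point function is ordered as $\langle V_x(t) W_{x'} V_x(t) W_{x'} \rangle$; both are assumptions made here for illustration.

```latex
\begin{align}
\tfrac{1}{2}\,\big\langle [V_x(t), W_{x'}]^\dagger\, [V_x(t), W_{x'}] \big\rangle
&= \tfrac{1}{2}\,\big\langle 2\cdot\mathbb{1}
   - V_x(t) W_{x'} V_x(t) W_{x'}
   - W_{x'} V_x(t) W_{x'} V_x(t) \big\rangle \\
&= 1 - \langle V_x(t)\, W_{x'}\, V_x(t)\, W_{x'} \rangle ,
\end{align}
% using V^2 = W^2 = 1 and cyclicity of the trace, which makes the two
% four-point terms equal (and real).
```

Under these assumptions, a four-point function near one means the commutator is small (the operator has not yet reached $x'$), while a four-point function near zero means the operator has spread over $x'$.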
This phenomenology leads to two central intuitions for learning from OTOCs. First, the dynamics of the OTOC contain information primarily about the connectivity of the system under study. Second, the OTOC continues to reveal such information up to O(L) times, long after TOCs have decayed. Notice that this timescale increases as the system size increases. In what follows, we apply these intuitions to identify two broad regimes where access to OTOCs provides a significant learning advantage.

III. LEARNING WITH RESTRICTED ACCESS
The first regime we consider is learning in systems with restricted access. Specifically, motivated by recent advances in solid-state defects [32,33,39] and NMR [40-42], we focus on the scenario where an experimenter has state preparation and read-out capabilities over only a single "probe" qubit interacting with a larger system that one wishes to learn. We note that high-fidelity OTOC measurements have already been achieved in similar setups by using rapid global pulse sequences to reverse time-evolution [29,43,44]. Previous theoretical approaches to learning in this scenario have been limited to non-interacting dynamics [31,42,45-48]. Meanwhile, experiments have found that it is in general difficult to learn features of a system that are distant from the probe qubit [32,33]. In strongly-interacting systems, this difficulty can be understood from the quick decay of correlation functions in space and time. Here, we provide evidence via phenomenological estimates (Appendix B) and numerical simulations (Fig. 2) that access to OTOCs can exponentially improve the learnability of distant features.
To be concrete, we will assume for now that the experimenter has local unitary control over the qubits of the larger system [49]. We will also assume that the larger system begins in an infinite temperature (i.e. maximally mixed) state, which is the natural scenario in NMR and solid-state defect setups [39,40]. Within these assumptions, a simple class of measurement protocols proceeds as follows:
1. Prepare the probe qubit p in an eigenstate of an operator $V_p$, such that the density matrix of the entire system is $\rho \propto \mathbb{1} + V_p$.
2. Time-evolve by a time $\tau$.
3. Perturb the system by a unitary operation $W_x$ on a qubit x.
4. Time-evolve by a time $\tau'$.
5. Read out the expectation value of $V_p$ on the probe qubit.
Taking $\tau$, $\tau'$ to be positive (e.g. $\tau = \tau' = t/2$), this allows measurement of time-ordered correlation functions of the form $\langle V_p(t)\, W_x(t/2)\, V_p(0)\, W_x^\dagger(t/2) \rangle$. With access to reversible time-evolution (e.g. $\tau = -\tau' = t$), the above protocol also allows measurement of out-of-time-order correlation functions $\langle V_p(0)\, W_x(t)\, V_p(0)\, W_x^\dagger(t) \rangle$. In Appendices B and C, we discuss how learning is modified when W is instead a global spin rotation over the larger system.
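The two correlation functions above can be evaluated directly for a small chain by exact diagonalization. The following is a minimal sketch, not the simulation method used in this work: `toy_hamiltonian` is a hypothetical stand-in model (nearest-neighbor couplings plus random fields, time-independent rather than Floquet), and `site_op`, `heisenberg`, and `toc_and_otoc` are illustrative helper names. We take $V = W = \sigma_z$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULIS = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def site_op(op, site, n):
    """Embed a single-qubit operator at position `site` of an n-qubit chain."""
    out = np.eye(1, dtype=complex)
    for k in range(n):
        out = np.kron(out, op if k == site else I2)
    return out

def toy_hamiltonian(n, rng):
    """Hypothetical stand-in: nearest-neighbor couplings plus random on-site fields."""
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for P in PAULIS.values():
        for i in range(n - 1):
            H += site_op(P, i, n) @ site_op(P, i + 1, n)
        for i in range(n):
            H += rng.uniform(-1.0, 1.0) * site_op(P, i, n)
    return H

def heisenberg(op, evals, evecs, t):
    """O(t) = e^{iHt} O e^{-iHt}, from the eigendecomposition of H."""
    U = (evecs * np.exp(-1j * evals * t)) @ evecs.conj().T  # e^{-iHt}
    return U.conj().T @ op @ U

def toc_and_otoc(H, p, x, t):
    """Infinite-temperature TOC and OTOC for probe site p and perturbed site x."""
    n = H.shape[0].bit_length() - 1
    evals, evecs = np.linalg.eigh(H)
    Vp = site_op(PAULIS["z"], p, n)
    Wx = site_op(PAULIS["z"], x, n)
    avg = lambda A: np.trace(A).real / 2 ** n
    W_half = heisenberg(Wx, evals, evecs, t / 2)
    W_full = heisenberg(Wx, evals, evecs, t)
    # <V_p(t) W_x(t/2) V_p(0) W_x^dag(t/2)>
    toc = avg(heisenberg(Vp, evals, evecs, t) @ W_half @ Vp @ W_half.conj().T)
    # <V_p(0) W_x(t) V_p(0) W_x^dag(t)>
    otoc = avg(Vp @ W_full @ Vp @ W_full.conj().T)
    return toc, otoc
```

For example, `toc_and_otoc(toy_hamiltonian(4, np.random.default_rng(0)), p=0, x=3, t=2.0)` returns both correlators for a four-qubit chain. At $t = 0$ both equal one whenever $p \neq x$, since the two operators commute.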
We begin our exploration of learning via OTOCs by introducing a concrete learning task. We consider the following scenario: one is given access to a quantum system consisting of two spin chains intersecting at a distance d from a probe qubit [Fig. 2(a)]. The value of d as well as the specific Hamiltonian parameters of the system are unknown (see below for the specific distribution that the Hamiltonian is drawn from). The goal is to learn the value of d, i.e. the geometry of the system, from measurements of the system's correlation functions.
To solve this task, we assume that the experimenter is capable of simulating quantum dynamics on either a classical or quantum computer. Since the task involves high-dimensional input data (i.e. the correlators for every x, t), we will approach it using machine learning techniques. Specifically, we envision using the quantum simulator to compute the correlation functions of an ensemble of Hamiltonians for each value of d. These ensembles can then be used to train a classical learning model to predict an unknown Hamiltonian's value of d given its correlation functions.
Let us briefly summarize our numerical simulations (see Appendix A for a complete description). Throughout this work, we consider spin systems with disordered on-site fields, $H_f$, along all three axes $\alpha = x, y, z$, and dipolar interactions between neighboring spins, $H_c$. We specialize to Floquet dynamics consisting of alternating applications of $H_f$ and $H_c$ for time $T = \pi/2$, and simulate time-evolution via Krylov-subspace methods [50]. We expect that learning Floquet dynamics will be qualitatively similar to learning time-independent Hamiltonian dynamics at moderate times and distances, which we are restricted to in our numerics (see Appendix C for numerical support of this statement). At larger distances we expect Hamiltonian dynamics to be dominated by hydrodynamics of the conserved energy (Appendix B), and the two will differ.
Returning to the learning task at hand, we train a support vector machine (SVM) on 3000 randomly drawn Hamiltonians (300 for each value of d = 0, ..., 9), and test its performance on 2000 additional Hamiltonians. To ensure that learning is not sensitive to fine-tuned features of the correlation functions, we add a Gaussian distributed "read-out error" to all correlation functions, with mean zero and standard deviation $\delta = 3\%$. The model's predictions as a function of the actual value of d are displayed in Fig. 2(a), for learning either via TOCs (red) or both TOCs and OTOCs (blue). We find that learning via OTOCs allows accurate predictions of d within ±1 of its actual value for all distances probed (up to d = 9). In contrast, with access to only TOCs, the model performs significantly worse for all d and resorts to nearly random guessing for $d \gtrsim 3$.
To evaluate the learning advantage of OTOCs independent of a specific learning task, we turn to the Fisher information (FI). The FI quantifies the amount of information that a random variable (e.g. a correlation function C, measured within some read-out error $\delta$) carries about an unknown parameter (e.g. a coupling strength, J), and thereby bounds the ultimate learnability of the parameter [51]. If one assumes that read-out errors are normally distributed, the FI is simply a squared derivative, $\mathrm{FI}(J|C) \equiv \delta^2\, \mathrm{FI}(J|C;\delta) = |\partial C/\partial J|^2$, where we remove the $\delta$-dependence by introducing a factor $\delta^2$.
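For completeness, the squared-derivative form can be obtained in a few lines. The sketch below assumes a single measured value $c$ of the correlator with Gaussian read-out noise of standard deviation $\delta$ around the true value $C(J)$; the symbol $c$ is introduced here for illustration.

```latex
\begin{align}
p(c\,|\,J) &= \frac{1}{\sqrt{2\pi\delta^2}}
   \exp\!\left[-\frac{(c - C(J))^2}{2\delta^2}\right], \\
\frac{\partial \log p}{\partial J} &= \frac{c - C(J)}{\delta^2}\,
   \frac{\partial C}{\partial J}, \\
\mathrm{FI}(J\,|\,C;\delta) &= \mathbb{E}\!\left[\left(\frac{\partial \log p}{\partial J}\right)^{2}\right]
   = \frac{1}{\delta^2}\left|\frac{\partial C}{\partial J}\right|^2 ,
\end{align}
% since E[(c - C)^2] = \delta^2. Multiplying by \delta^2 gives |dC/dJ|^2.
```

By the Cramér-Rao bound, the variance of any unbiased estimate of $J$ from $M$ independent measurements is then at least $\delta^2 / (M\, |\partial C/\partial J|^2)$.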
We numerically compute the FI in ergodic 1D spin chains, where one seeks to learn a coupling $J_d$ lying a distance d away from a probe qubit [Fig. 2(b) inset] [31,42,45-48]. We consider the same set of correlation functions as specified for the learning task in Fig. 2(a). In Fig. 2(b), we plot the maximum Fisher information $\max_C \mathrm{FI}(J|C)$ over all correlation functions (i.e. over all x, t), averaged over 200 and 1000 disorder realizations for TOCs and OTOCs respectively. We find that the maximum FI of TOCs (red) decays exponentially in the distance d from the probe qubit. In contrast, the maximum FI of OTOCs (blue) follows a slow algebraic decay, $\sim 1/d$, thereby achieving a multiple-order-of-magnitude advantage over TOCs even at modest distances, $d \gtrsim 3$. This algebraic decay arises from the $\sim \sqrt{t}$ broadening of the OTOC wavefront in time [24]; see Appendix B for a full phenomenological derivation.
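As an illustration of how the quantity $\max_C |\partial C/\partial J|^2$ might be evaluated numerically, the sketch below uses a central finite difference. This is an assumption about one reasonable implementation, not a description of the method used here; `max_fisher_information` and the toy correlator `toy` are hypothetical names.

```python
import numpy as np

def max_fisher_information(correlators, J, dJ=1e-4):
    """Maximum over correlators of |dC/dJ|^2, via a central finite difference.

    `correlators(J)` must return an array holding every correlator value
    (e.g. over all sites x and times t) for coupling strength J.
    """
    c_plus = np.asarray(correlators(J + dJ), dtype=float)
    c_minus = np.asarray(correlators(J - dJ), dtype=float)
    dC = (c_plus - c_minus) / (2.0 * dJ)
    return float(np.max(np.abs(dC) ** 2))

# Toy stand-in for a family of correlators: C_t(J) = cos(J t) at a few times t.
times = np.linspace(0.0, 2.0, 5)
toy = lambda J: np.cos(J * times)
fi = max_fisher_information(toy, J=1.0)
```

For the toy family the derivative is $-t \sin(Jt)$, so at $J = 1$ the maximum is attained at $t = 2$ and equals $4 \sin^2(2)$.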

IV. LEARNING WEAK INTERACTIONS
We now turn to our second learning scenario: characterizing weak interactions in an otherwise strongly-interacting system. Such characterization is notoriously difficult because weak interactions take long times to manifest (of order the inverse interaction strength), at which point TOCs have decayed due to the strong interactions. Previous approaches require either dynamical decoupling of the strong interactions [7,10] or high-precision measurements at early times [13,14]. We will now show that access to OTOCs allows one to side-step these requirements when characterizing weak interactions that change the connectivity of a strongly-interacting system. Notably, in contrast to the previous learning scenario, this advantage holds when the experimenter is capable of measuring all local correlation functions of the system of interest.
For concreteness, we specialize to 1D spin chains with a single "weak link" interaction, of strength $J'$ much less than the typical interaction strength $J$ [see Fig. 3(b) inset]. We consider TOCs and OTOCs of the form Eq. (2) and Eq. (3), where $x$, $x'$ run over all qubits in the system. We anticipate that access to more general correlators within a given time-ordering, e.g. via shadow tomography or related techniques [52-54], will not qualitatively change the observed physics (see Appendix A).
We begin as before with a concrete learning task. Specifically, we suppose that one is given access to a spin chain with unknown Hamiltonian parameters and either no link interaction ($J' \to 0$) or a fixed non-zero weak link interaction strength $J'$. For each fixed value of $J'$, we train a binary SVM classifier on the correlation functions [Eqs. (2), (3)] of 300 disorder samples, again including a read-out error $\delta = 3\%$ in each correlator value. We test model performance on 200 additional samples; the resulting classification accuracies are shown in Fig. 3(a). We observe the following general trends: (i) the accuracy decreases as $J'$ decreases; (ii) learning via both OTOCs and TOCs (blue) allows detection of $\sim 10$ times smaller $J'$ than learning via only TOCs (red); and (iii) OTOCs allow detection of increasingly small $J'$ as the size L of the chain increases.
To understand this behavior analytically, we first note that the optimal correlation functions for detecting the link will typically involve operators lying immediately adjacent to that link, on both of its sides. These correlators measure either the transfer of spin polarization (for TOCs) or operator support (for OTOCs) across the link, and will be non-trivial only if the link interaction strength is nonzero. For TOCs, one expects spin polarization to cross the link incoherently, at a rate $\sim J'^2/J$, where $J$ is the typical strong interaction strength. Combined with an overall exponential decay of spin in time (if the system has no conserved quantities), we expect $C_{\mathrm{TOC}} \sim (J'^2/J)\, t\, e^{-Jt}$. For OTOCs, one expects an operator's support to cross the link at a similar rate, $1 - C_{\mathrm{OTOC}} \sim (J'^2/J)\, t$. Crucially, however, this growth persists until much later times, $t \sim L/J$, at which information traveling "around" the chain will abruptly cause the OTOC to decay to zero. The optimal time for detecting the link occurs when these correlators are maximized, since each is zero in the absence of the link. The TOC is maximized at an order one time $t \sim 1/J$, at which the correlator magnitude $C_{\mathrm{TOC}} \sim J'^2/J^2$ is suppressed by the square of the weak link interaction strength. In contrast, the OTOC is maximized at a much later time $t \sim L/J$, and thereby features a magnitude $1 - C_{\mathrm{OTOC}} \sim L\,(J'^2/J^2)$. In both cases we see that detection of the link becomes more difficult as the link strength decreases. Detection via the OTOC is enhanced by a factor of L, which captures the connectivity change associated with the link.
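The optimal times quoted above follow from a one-line maximization. A brief worked version, taking the phenomenological forms in the text at face value and dropping O(1) prefactors (the symbol $t^*$ is introduced here for the optimal TOC time):

```latex
\begin{align}
\frac{d}{dt}\!\left[\frac{J'^2}{J}\, t\, e^{-Jt}\right]
  = \frac{J'^2}{J}\,(1 - Jt)\, e^{-Jt} = 0
  \;&\Rightarrow\; t^* = \frac{1}{J}, \qquad
  C_{\mathrm{TOC}}(t^*) \sim \frac{1}{e}\,\frac{J'^2}{J^2}, \\
1 - C_{\mathrm{OTOC}}(t) \sim \frac{J'^2}{J}\, t
  \;\text{ for }\; t \lesssim \frac{L}{J}
  \;&\Rightarrow\; 1 - C_{\mathrm{OTOC}}\!\left(\tfrac{L}{J}\right)
  \sim L\,\frac{J'^2}{J^2}.
\end{align}
```

The ratio of the two peak signals is therefore of order $L$, up to prefactors that these estimates do not fix.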
FIG. 5. Solution to the disjoint unitary problem with out-of-time-order measurements. The state $|0\rangle^{\otimes n}$ is prepared and the unknown unitary (either $U$ or $U_1 \otimes U_2$) is applied. Next $\sigma_x$ is applied to the first qubit, followed by the inverse of the unknown unitary. Finally, it is checked if the second block of n/2 qubits ends up in the all zero state. If so, then the hidden unknown unitary is $U_1 \otimes U_2$ as per case (ii); if not, then the unknown unitary is $U$ as per case (i).

We confirm these estimates quantitatively by computing the Fisher information of the link interaction strength. In Fig. 3(b), we plot the maximum Fisher information $\max_C \mathrm{FI}(\log(J')|C)$ over all local correlation functions, averaged over 100 disorder realizations. Here, we consider the logarithm of the link interaction strength in order to appropriately compare the Fisher information over multiple orders of magnitude of the interaction. The Fisher information of $\log(J')$ bounds the learnability of the interaction strength as a percentage of its actual value. Applying our phenomenological estimates, we predict that $\mathrm{FI} \sim J'^4/J^4$ for TOCs, and $\mathrm{FI} \sim L^2 J'^4/J^4$ for OTOCs. Observing Fig. 3(b), we indeed find that the FI is suppressed by $\sim J'^4$ (dashed lines) for small $J'$, and displays a multiplicative advantage for OTOCs (blue) compared to TOCs (red), which grows as L increases.
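The predicted Fisher information scalings follow from the peak correlator magnitudes by the chain rule. A short worked version, assuming the peak signals scale as $J'^2/J^2$ (TOC) and $L\,J'^2/J^2$ (OTOC), with unspecified O(1) prefactors $a$, $b$ (hypothetical symbols introduced here):

```latex
\begin{align}
\mathrm{FI}(\log J' \,|\, C)
  &= \left|\frac{\partial C}{\partial \log J'}\right|^2
   = J'^2 \left|\frac{\partial C}{\partial J'}\right|^2, \\
C_{\mathrm{TOC}} \simeq a\,\frac{J'^2}{J^2}
  &\;\Rightarrow\; \mathrm{FI} \simeq 4a^2\,\frac{J'^4}{J^4}, \\
1 - C_{\mathrm{OTOC}} \simeq b\,L\,\frac{J'^2}{J^2}
  &\;\Rightarrow\; \mathrm{FI} \simeq 4b^2 L^2\,\frac{J'^4}{J^4}.
\end{align}
```

The OTOC advantage in the Fisher information is thus of order $L^2$, the square of the advantage in the raw signal.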

V. EFFECT OF EXPERIMENTAL ERRORS
Let us now address the impact of experimental errors on learning. We begin with errors that accumulate throughout time-evolution. These may occur from extrinsic decoherence or imperfect time-reversal dynamics, each of which disrupts the non-local correlations probed by the OTOC [28-30, 55, 56]. While this disruption can be mitigated via independent error estimates [28,55], for sufficiently large errors these estimates involve measuring quantities of small magnitude (comparable to the TOC), squandering the OTOC's learning advantage. In Appendix B we estimate that our previous results are modified in the presence of a small local error rate $\varepsilon \ll J$ as follows: in the first learning regime, the OTOC maintains its advantage up to distances $d \lesssim J/\varepsilon$; in the second regime, the L-fold advantage is replaced by a $(\min\{L, J/\varepsilon\})$-fold advantage.
In practice, we find that learning via OTOCs remains robust even to relatively large amounts of imperfect time-reversal [Fig. 3(d)]. We study this numerically in the "weak interaction" learning problem of Fig. 3(a). As a concrete instance of imperfect time-reversal, we take the spins to be coupled to an extrinsic cavity mode and assume that the spin dynamics are perfectly reversed but the cavity dynamics and spin-cavity coupling g are unreversed. We find that access to OTOCs substantially improves the classification accuracy even for quite large spin-cavity couplings $g \sim 0.5$, up to half the spin-spin interaction strength.
We can also examine the dependence of learning on read-out errors, namely where one measures a correlator C up to additive error. Indeed, we have already incorporated a realistic read-out error $\delta = 3\%$ in our previous numerical studies [Figs. 2(a), 3(a), 4(a)]. Intuitively, we expect larger read-out errors to make learning more difficult; however, we have little reason to expect read-out error to change the relative advantage of OTOCs compared to TOCs. We test this numerically by repeating the analysis of Fig. 3(a) for various read-out errors, $\delta$. For each $\delta$, we compute the minimum link strength $J'_*$ that can be learned with > 90% accuracy [Fig. 3(c)]. For errors $\delta \gtrsim 10^{-3}$, our results agree well with analytic estimates, which predict $(J'_*/J)^2 \sim \delta$ for TOCs and $(J'_*/J)^2 \sim \delta/L$ for OTOCs. Intriguingly, for sufficiently small errors $\delta \lesssim 10^{-3}$, the minimum link strength detectable with TOCs saturates to a finite value $J'_* \sim 0.2$. Below this value, sample-to-sample fluctuations of the TOC cause the learning task to be difficult regardless of the read-out error.
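The analytic estimates above amount to requiring that the peak signal exceed the read-out error. The sketch below turns them into numbers; it is a scaling estimate only, with O(1) prefactors dropped, and `min_detectable_link` is a hypothetical helper name.

```python
import math

def min_detectable_link(delta, J=1.0, L=None):
    """Scaling estimate of the smallest detectable link strength.

    TOC only (L=None):  (J*/J)^2 ~ delta
    With OTOCs (L given): (J*/J)^2 ~ delta / L
    O(1) prefactors are dropped, so only ratios and trends are meaningful.
    """
    enhancement = 1.0 if L is None else float(L)
    return J * math.sqrt(delta / enhancement)

toc_estimate = min_detectable_link(0.03)         # TOCs, delta = 3%
otoc_estimate = min_detectable_link(0.03, L=10)  # OTOCs, chain of 10 spins
```

Within this estimate the OTOC threshold is smaller by a factor of $\sqrt{L}$, independent of $\delta$, consistent with the expectation that read-out error does not change the relative advantage.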

VI. PROVABLE LEARNING ADVANTAGE
We have so far demonstrated the learning power of OTOCs using phenomenological arguments and numerical simulations, owing to the difficulty of obtaining analytic results for ergodic Hamiltonian systems. Complementary to these results, we now introduce a binary classification task in which the OTOC is provably efficient. The task is as follows:

Disjoint unitary problem: One is given oracle access to either: (i) a fixed, n-qubit Haar-random unitary $U$, or (ii) a tensor product of two fixed, n/2-qubit Haar-random unitaries, $U_1 \otimes U_2$. The task is to determine which of (i) or (ii) is realized.
Qualitatively, this problem resembles the Hamiltonian learning scenarios identified previously. First, the feature we seek to learn (the connectivity of the unitary) directly determines how information spreads through the system, as measured by the OTOC. Second, a Haar-random unitary is inherently "strongly-interacting", which causes time-ordered measurements to decay and thus provide little information.
In Fig. 5 we show that the disjoint unitary problem can be solved with a constant number (with respect to n) of queries to the oracle and its time-reverse $U^\dagger$, by measuring an out-of-time-order observable. Letting $V$ denote the unknown unitary (either $U$ or $U_1 \otimes U_2$), the OTOC is the probability that the second block of n/2 qubits is found in the all-zero state at the end of the protocol in Fig. 5,

$$\langle 0|^{\otimes n}\, V^\dagger \sigma_x V \left( \mathbb{1} \otimes |0\rangle\langle 0|^{\otimes n/2} \right) V^\dagger \sigma_x V\, |0\rangle^{\otimes n}.$$

In case (i), the OTOC is near zero with probability exponentially close to one [1]. In case (ii), the OTOC is one, since the two subsystems are not coupled by $U_1 \otimes U_2$. Thus, with probability exponentially close to one, the two cases may be distinguished with a single query to the unknown unitary and its time-reverse. In contrast, in a companion work [1], we prove that any time-ordered learning protocol requires an exponential number $\Omega(2^{n/4})$ of queries of the unknown unitary to solve the disjoint unitary problem. Our proof applies even to adaptive measurement strategies, and leverages recent techniques from quantum learning theory [5,6,34].
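The protocol of Fig. 5 is easy to simulate for small n. The sketch below is an illustration under stated assumptions rather than part of the proof: Haar-random unitaries are sampled with the standard QR construction (including the phase correction), the first qubit is taken to be the most significant bit of the basis index, and `haar_unitary` and `second_block_zero_probability` are hypothetical helper names.

```python
import numpy as np

def haar_unitary(dim, rng):
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))  # rescale column j by the phase of r[j, j]

def second_block_zero_probability(V, n):
    """Probability that the last n/2 qubits are all |0> after V^dag sigma_x V |0...0>."""
    dim, half = 2 ** n, 2 ** (n // 2)
    psi = np.zeros(dim, dtype=complex)
    psi[0] = 1.0                      # |0>^{(x) n}
    psi = V @ psi                     # apply the unknown unitary
    # sigma_x on the first (most significant) qubit swaps the two halves of the vector
    psi = np.concatenate([psi[dim // 2:], psi[:dim // 2]])
    psi = V.conj().T @ psi            # apply the inverse unitary
    # basis index = (first-block bits) * half + (second-block bits)
    amplitudes = psi.reshape(half, half)[:, 0]
    return float(np.sum(np.abs(amplitudes) ** 2))

rng = np.random.default_rng(1)
n = 8
p_case_i = second_block_zero_probability(haar_unitary(2 ** n, rng), n)
U1 = haar_unitary(2 ** (n // 2), rng)
U2 = haar_unitary(2 ** (n // 2), rng)
p_case_ii = second_block_zero_probability(np.kron(U1, U2), n)
```

In case (ii) the probability is one up to floating-point error, because $\sigma_x$ acts only on the first block and $U_2^\dagger U_2$ returns the second block to $|0\rangle^{\otimes n/2}$. In case (i) the final state is spread over the whole Hilbert space, so the probability is small and shrinks further as n grows.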

VII. DISCUSSION
In this work, we have shown that out-of-time-order measurements can provide powerful advantages for learning the dynamics of quantum systems. Our results thus highlight the potential gains that can be achieved by quantum experiments if they have sufficient control and coherence to apply time-reversed dynamics. Extraordinary experimental progress has led to an ever-increasing number of such platforms [27-30,57-59], and we envision that learning via OTOCs might find applications across these diverse physical contexts. Specific future directions include learning long-range cross-talk in quantum processors [60], and strongly-interacting problems in NMR [8].
On the theoretical front, our results follow in the footsteps of recent works in quantum learning theory [5,6,34,61,62] to provide new avenues for quantum advantage. Our applications pertain to genuine questions of experimental interest, providing a new bridge between the theoretical tools of quantum learning theory and problems of practical importance in experiments.

and H(t) is the time-dependent stroboscopic Floquet Hamiltonian specified in the main text (unless otherwise stated, in Fig. 7).
In the restricted access scenario considered in Figs. 2, 7, we use the following correlation functions:

$$C_{\mathrm{TOC}}(t) = \langle V_p(t)\, V_p(0) \rangle, \quad C_{\mathrm{TOC}}(x,t) = \langle V_p(t)\, W_x(t/2)\, V_p(0)\, W_x^\dagger(t/2) \rangle, \quad C_{\mathrm{OTOC}}(x,t) = \langle V_p(0)\, W_x(t)\, V_p(0)\, W_x^\dagger(t) \rangle, \qquad (A1)$$

where p denotes the probe qubit, and $\langle \cdot \rangle \equiv 2^{-L}\, \mathrm{tr}(\cdot)$ is an infinite temperature average. Each of these correlation functions can be measured using state preparation and read-out on the probe qubit, combined with time-evolution and a single local unitary operation on the larger system. (In the case of the auto-correlation function $C_{\mathrm{TOC}}(t)$ no local unitary operation is needed, $W_x = \mathbb{1}$.)

In principle, we envision allowing V, W to run over all local operators in the system. For instance, they could run over all $4^w \binom{N}{w}$ Pauli operators of weight $\leq w$, where $w \sim O(1)$. This is naturally achieved by randomized measurement strategies such as shadow tomography with local Clifford unitaries and $O(3^w)$ measurements [53,54]. In practice, we must restrict V, W to a few possible values in numerical simulations. Specifically, we take $V = W \in \{\sigma_x, \sigma_z\}$ for TOCs, and $V = W \in \{\sigma_z\}$ for OTOCs. The OTOC is observed to be relatively insensitive to the basis of V and W, hence our choice to restrict to a single operator, $\sigma_z$ (further, we note that adding $\sigma_x$ OTOCs could only improve the relative advantage of OTOCs compared to TOCs). More broadly, we do not expect that adding additional pairs of {V, W} will change the qualitative behavior of learning via TOCs and OTOCs. Specifically, we have seen that the learning advantage of OTOCs arises from their ability to detect highly non-local correlations in the system (i.e. large-weight components of the time-evolved operator $V_p(t)$; see Appendix B for more detailed phenomenological estimates). These correlations are not detectable by any time-ordered correlator involving only few-body operators; indeed, in ergodic systems we generically expect that they are not efficiently detectable by any time-ordered measurement.
For Figs. 3, 4 of the main text, we utilize the correlation functions between pairs of local operators given in Eqs. (2) and (3) of the main text [referred to below as Eq. (A2)]. We again take $V = W \in \{\sigma_x, \sigma_z\}$ for TOCs and $V = W \in \{\sigma_z\}$ for OTOCs. We allow $x$, $x'$ to span all qubits within a distance 2 of the link; this consists of 6 possible values for each of $x$, $x'$, corresponding to distances 0, 1, and 2 to both the left and right of the link. In principle, we would like $x$, $x'$ to run over the entire lattice; however, in practice we observe that correlation functions involving qubits distant from the link provide little information, and so can be safely neglected.
We now briefly comment on our numerical methods for computing the above correlation functions and the Fisher information [Figs. 2(b), 3(b)]. We compute the infinite temperature average in the correlation functions by sampling over Haar-random initial states $|\psi\rangle$. To motivate this, we can insert a resolution of the identity over an orthonormal basis, $\mathbb{1} = \sum_\psi |\psi\rangle\langle\psi|$, into the correlation functions Eq. (A2) to obtain

$$\langle O \rangle = \frac{1}{2^L} \sum_\psi \langle \psi | O | \psi \rangle$$

for the operator product $O$ appearing in the TOC, and similarly for the OTOC. In numerics, we approximate this sum by sampling a finite number $N_\psi$ of states $|\psi\rangle$ drawn from the Haar distribution; errors in this approximation will scale as $\sim 1/\sqrt{N_\psi 2^L}$.
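The Haar-state sampling described above can be sketched in a few lines. This is a minimal dense-matrix illustration, not the Krylov-based implementation used for the actual simulations; `haar_state` and `infinite_temperature_average` are hypothetical names, and the test operator is $\sigma_z$ on the last qubit of a 10-qubit register.

```python
import numpy as np

def haar_state(dim, rng):
    """Haar-random pure state: a normalized complex Gaussian vector."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def infinite_temperature_average(A, n_states, rng):
    """Estimate tr(A)/dim by averaging <psi|A|psi> over Haar-random states."""
    dim = A.shape[0]
    samples = []
    for _ in range(n_states):
        psi = haar_state(dim, rng)
        samples.append(np.vdot(psi, A @ psi).real)  # vdot conjugates its first argument
    return float(np.mean(samples))

rng = np.random.default_rng(0)
L = 10
sigma_z_last = np.diag(np.where(np.arange(2 ** L) % 2 == 0, 1.0, -1.0))
estimate = infinite_temperature_average(sigma_z_last, n_states=25, rng=rng)
```

The exact value is zero, and the estimate fluctuates around it at the level of $1/\sqrt{N_\psi 2^L} \approx 0.006$ for these parameters, which is why a single state already suffices for the largest system sizes.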
In the learning problems considered in the main text [Figs. 2(a), 3(a), 4], we take $N_\psi = 25, 25, 10, 1, 1$ for system sizes $L = 6, 8, 10, 12, 14$, respectively. In contrast, when estimating the Fisher information [Figs. 2(b), 3(b)], we perform a large-$N_\psi$ extrapolation to improve precision. This is required in order to establish the asymptotic scaling of the Fisher information at large d [Fig. 2(b)] and small $J'$ [Fig. 3(b)]. Specifically, we compute the estimated correlation function $C_{N_\psi}$ averaged over $N_\psi = 1, \ldots, 25$ Haar-random initial states, as well as the resultant Fisher information $\max \mathrm{FI}(N_\psi) \equiv |\partial C_{N_\psi}/\partial J|^2$, maximized over all relevant correlation functions. We then perform a linear fit in $1/N_\psi$, $\max \mathrm{FI}(N_\psi) = \max \mathrm{FI}(\infty) + A/N_\psi$, where $\max \mathrm{FI}(\infty)$ and $A$ are fitting parameters. Finally, the fitting parameter $\max \mathrm{FI}(\infty)$ represents our estimation of the Fisher information at $N_\psi \to \infty$, which we plot in Figs. 2(b), 3(b).

FIG. 6. (Left) For each value of $N_\psi$, we then compute the maximum Fisher information over all correlation functions, $\max \mathrm{FI}(N_\psi)$ (solid red lines, darker lines correspond to higher $N_\psi$). (Right) Our estimate of the maximum Fisher information at infinite temperature (dotted lines, both plots) is obtained by fitting $\max \mathrm{FI}(N_\psi) = \max \mathrm{FI}(\infty) + A/N_\psi$ and taking $N_\psi \to \infty$ (points denote data, solid lines denote $1/N_\psi$-fit).

We illustrate this procedure in Fig. 6, using the data for Fig. 2(b). On the left of Fig. 6, we plot the maximum Fisher information, $\max \mathrm{FI}(N_\psi)$, for each $N_\psi$, as a function of the distance d. We observe that in regions where C is relatively large (i.e. small d), the estimates are quite accurate even for $N_\psi = 1$, while in regions where C is small (i.e. large d) the Fisher information becomes successively smaller as the number of sampled states $N_\psi$ increases. On the right of Fig. 6, we re-plot the Fisher information for each d as a function of $N_\psi$. Solid lines represent the results of the linear fit, which we observe to fit the $N_\psi$-dependence of the data quite well. The extrapolated Fisher information [as displayed in Fig. 2(b)] is shown in Fig. 6 as a dashed line.
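The $1/N_\psi$ extrapolation described above is an ordinary least-squares line fit in the variable $1/N_\psi$. A minimal sketch using `numpy.polyfit` follows; the function name `extrapolate_fisher_information` and the synthetic data are illustrative assumptions, not values from this work.

```python
import numpy as np

def extrapolate_fisher_information(n_psi, max_fi):
    """Fit max_fi = fi_inf + A / n_psi and return (fi_inf, A).

    fi_inf is the intercept at 1/n_psi -> 0, i.e. the N_psi -> infinity estimate.
    """
    inv_n = 1.0 / np.asarray(n_psi, dtype=float)
    # polyfit returns coefficients from highest degree down: [slope, intercept]
    A, fi_inf = np.polyfit(inv_n, np.asarray(max_fi, dtype=float), deg=1)
    return float(fi_inf), float(A)

# Synthetic check: data generated exactly from the fitting form.
n_values = np.arange(1, 26)
synthetic = 0.02 + 0.5 / n_values
fi_inf, A = extrapolate_fisher_information(n_values, synthetic)
```

On noiseless synthetic data the fit recovers the generating parameters; on real data the quality of the fit is what justifies the $1/N_\psi$ form.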

Imperfect time-reversal via cavity mode
In Fig. 4(a), we benchmark the effects of decoherence on learning by coupling the spin system to a single cavity mode. Our motivation for studying this model is two-fold. First, in ergodic many-body systems the effect of local errors on OTOCs is expected to be independent of the precise microscopic form of the error [56]. We therefore expect the spin-cavity system to display similar OTOC physics to more generic local error models. Second, for L = 10 spins the spin-cavity system can be exactly simulated in a Hilbert space of size $2^L \times L$ (we assume the cavity initially has zero occupation number; since the sum of spin magnetization and the cavity occupation is conserved, the cavity occupation is upper bounded by L). This is substantially smaller than the requirements to exactly simulate a mixed state quantum system, $2^{2L}$.
More specifically, the spin-cavity Hamiltonian is as follows. We modify the Floquet time-evolution described in the main text to alternate between the following two Hamiltonians: where $H_f$, $H_c$ are the field and coupling Hamiltonians written in the main text, $a$, $a^\dagger$ are lowering/raising operators for a bosonic cavity mode, $g \in \{0.0, 0.25, 0.5\}$ is the spin-cavity interaction strength, and $\omega = 1.7$ is the cavity frequency.
Here, the ± denote values during forwards/backwards time-evolution; note that we do not reverse the spin-cavity interaction or the cavity frequency during backwards time-evolution.

Learning model
We now detail the machine learning techniques used in Figs. 2(a), 3(a), and 4 of the main text, and Fig. 8(b) of the Appendix. Throughout, read-out error is mimicked by adding a random Gaussian variable with mean zero and standard deviation $\delta$ to the exact correlation functions.
We begin with Fig. 2(a). Our goal is to predict the value of $d$ [which specifies the geometry of the spin system, see Fig. 2(a)] from the correlation functions of the system, Eq. (A1). To do so, we train a learning model on 3000 randomly drawn disorder realizations of the Hamiltonian, consisting of 300 realizations each for $d = 0, 1, \ldots, 9$. We test model performance on 2000 additional disorder realizations, again consisting of 200 realizations each for $d = 0, 1, \ldots, 9$. For each disorder realization, the input to our learning model consists of the correlation functions Eq. (A1), evaluated at $x = 2, \ldots, L$ and 30 evenly spaced times between 0 and 12. We apply Gaussian distributed read-out error $\delta = 3\%$ to each correlation function. We repeat this procedure, as well as the model training and evaluation that follows, first using only TOCs as input to the learning algorithm, and second using both TOCs and OTOCs.
We now turn to Figs. 3(a) and 4. Our goal is to perform binary classification using the correlation functions Eq. (A2) to distinguish whether the link interaction strength is zero or nonzero. To do so, we simulate the correlation functions of 300 randomly drawn disorder realizations of the Hamiltonian for each link strength, $J \in \{0, 0.01, 0.017, 0.03, 0.06, 0.1, 0.17, 0.3, 0.6, 1.0\}$. For each nonzero $J$, we train a learning model to perform binary classification between link strength 0 and $J$. We test model performance on 400 additional disorder realizations, again consisting of 200 realizations each for link strength 0 and $J$.
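The construction of a noisy input vector for the Fig. 2(a) task can be sketched as follows. This is a minimal illustration under stated assumptions: the system size `L = 10`, the inclusion of both endpoints in the time grid, and the placeholder zero-valued "exact" correlators are all hypothetical choices, and `add_readout_error` is a name introduced here.

```python
import random
import statistics


def add_readout_error(correlators, delta, rng):
    """Add independent zero-mean Gaussian noise of standard deviation delta."""
    return [c + rng.gauss(0.0, delta) for c in correlators]


L = 10                                      # hypothetical system size
positions = list(range(2, L + 1))           # x = 2, ..., L
times = [12.0 * k / 29 for k in range(30)]  # 30 evenly spaced times in [0, 12] (endpoints assumed)

# Placeholder for exactly simulated correlators, one per (x, t) pair.
exact = [0.0 for _ in positions for _ in times]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy = add_readout_error(exact, delta=0.03, rng=rng)

print(len(noisy))                # number of input features per disorder realization
print(statistics.pstdev(noisy))  # should be close to delta = 0.03
```

In an actual pipeline, `exact` would be replaced by the simulated TOCs (and, in the second run, OTOCs) of each disorder realization.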
The first step of our learning model is to prune the correlation functions used as input. We do so by estimating the mutual information between each individual correlation function and the link interaction strength, and selecting the $K$ correlation functions with the highest mutual information. Here $K$ is a hyperparameter that will ultimately be chosen via cross-validation. To estimate the mutual information, we fit the distribution of correlation function values over disorder realizations to a Gaussian for each link strength, and compute the Jensen-Shannon divergence between the two Gaussian distributions. The Jensen-Shannon divergence is equal to the desired mutual information [65].
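The pruning step can be sketched in a few lines. This is an illustrative sketch rather than the implementation used for the figures: the Jensen-Shannon divergence is evaluated by a simple Riemann sum on a fixed grid (an assumption of this example, adequate when the two Gaussians have comparable widths), and the function names and toy samples are hypothetical.

```python
import math
import statistics


def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def js_divergence_gaussians(mu0, s0, mu1, s1, n_grid=4001):
    """Jensen-Shannon divergence (in nats) between N(mu0, s0^2) and N(mu1, s1^2)."""
    lo = min(mu0 - 8.0 * s0, mu1 - 8.0 * s1)
    hi = max(mu0 + 8.0 * s0, mu1 + 8.0 * s1)
    dx = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        x = lo + i * dx
        p = gaussian_pdf(x, mu0, s0)
        q = gaussian_pdf(x, mu1, s1)
        m = 0.5 * (p + q)  # equal-weight mixture of the two classes
        if p > 0.0:
            total += 0.5 * p * math.log(p / m) * dx
        if q > 0.0:
            total += 0.5 * q * math.log(q / m) * dx
    return total


def select_top_k(samples_zero, samples_j, k):
    """Indices of the k features with the largest Gaussian-fit JS divergence.

    samples_zero[i] and samples_j[i] hold the values of correlation function i
    over disorder realizations at link strength 0 and J, respectively.
    Assumes every feature has nonzero sample standard deviation.
    """
    scores = []
    for idx, (a, b) in enumerate(zip(samples_zero, samples_j)):
        score = js_divergence_gaussians(
            statistics.mean(a), statistics.stdev(a),
            statistics.mean(b), statistics.stdev(b),
        )
        scores.append((score, idx))
    scores.sort(reverse=True)
    return [idx for _, idx in scores[:k]]


# Toy example: feature 0 is insensitive to the link, feature 1 is sensitive.
base = [0.0, 0.1, -0.1, 0.05, -0.05]
shifted = [1.0, 1.1, 0.9, 1.05, 0.95]
print(select_top_k([base, base], [base, shifted], k=1))
```

For two equally likely classes the divergence is bounded above by $\ln 2$, which is reached when the two distributions do not overlap.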
As before, we input the selected correlation functions into a support vector machine (SVM) with radial basis functions [64]. We now have three hyperparameters: the SVM hyperparameters $C$ and $\gamma$, and the number of selected correlation functions $K$. We choose $C$, $\gamma$, and $K$ by performing five-fold cross-validation over each value $C \in \{0.1, 1, 10, 100, 1000\}$, $\gamma \in \{0.1, 0.3, 1, 3, 10, 30\}$. We obtain Fig. 4(b) by repeating this procedure for various simulated read-out errors, $\delta \in \{0.0001, 0.0003, 0.001, 0.003, 0.01, 0.03, 0.1\}$. At each read-out error, we perform a linear interpolation of the classification accuracy as a function of $J$ [as shown in Fig. 3(a) for $\delta = 0.03$]. The minimum detectable link strength $J^*$ occurs at the intersection of this interpolation with a horizontal line (not depicted) corresponding to a classification accuracy of 90%.
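The extraction of $J^*$ from the accuracy curve can be sketched as below. This is a hedged illustration: the interpolation is taken to be linear in $J$ itself (rather than in $\log J$), which is an assumption, and the accuracy values are made up for the example.

```python
def min_detectable_strength(j_values, accuracies, threshold=0.9):
    """First J at which the piecewise-linear accuracy curve reaches the threshold.

    Returns None if the threshold is never reached.
    """
    points = sorted(zip(j_values, accuracies))
    if points[0][1] >= threshold:
        return points[0][0]
    for (j_lo, acc_lo), (j_hi, acc_hi) in zip(points, points[1:]):
        if acc_lo < threshold <= acc_hi:
            frac = (threshold - acc_lo) / (acc_hi - acc_lo)
            return j_lo + frac * (j_hi - j_lo)
    return None


# Nonzero link strengths from the text, with made-up accuracies for illustration.
j_values = [0.01, 0.017, 0.03, 0.06, 0.1, 0.17, 0.3, 0.6, 1.0]
accuracies = [0.50, 0.50, 0.55, 0.70, 0.80, 1.00, 1.00, 1.00, 1.00]
j_star = min_detectable_strength(j_values, accuracies)
print(j_star)  # crossing lies halfway between J = 0.1 and J = 0.17
```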
Finally, we turn to Fig. 8(b) in Appendix C. In the probe qubit scenario, training and testing are performed on 300 and 200 samples respectively for each geometry and each value of $d$. In the global state preparation and read-out scenario, we instead use 60 and 40 samples respectively for each geometry. Our learning model consists of a support vector machine with hyperparameters chosen via 4-fold cross-validation from the sets $C \in \{0.1, 1, 10\}$, $\gamma \in \{0.3, 3, 30\}$. As in our previous learning tasks, we apply a read-out error $\delta = 3\%$ to each correlation function before use in learning.
II. Phenomenological estimates of the scaling of the Fisher information in the restricted access scenario, for learning an interaction that lies a distance $d$ from the probe qubit.

Fisher information in restricted access scenario
At sufficiently large times and distances, we expect the profile of correlation functions in ergodic many-body systems to be described by only a few phenomenological parameters. For instance, in one-dimensional systems the out-of-time-order correlator is predicted to take the following functional form [38,66], where the phenomenological parameters $v_B$ and $A$ describe the butterfly velocity and the width of the OTOC wavefront, respectively. Here $f$ is a compactly supported bump function which interpolates between zero and one and then zero again within an $O(1)$-sized region about the origin. Meanwhile, in systems with a local conservation law, we expect time-ordered correlators to be dominated by diffusion of the conserved quantity. This leads to the following profile for the auto-correlation function, where $D$ is a diffusion constant. In the absence of conserved quantities, one expects time-ordered correlation functions to instead decay exponentially in time, parameterized by a decay rate $\gamma$.
To obtain the Fisher information, $\mathrm{FI}(J|C) = |\partial C/\partial J|^2$, we must compute the derivative of the correlation functions with respect to a local coupling strength, $J_y$. To do so while leveraging the above phenomenological predictions, we must first recognize that the phenomenological parameters are themselves dependent on the local coupling strengths of the system, e.g. $v_B \to v_B(\{J_y\})$. We expand on this in further detail for each case below. The resultant scaling of the Fisher information in various physical regimes is summarized in Table I.

a. Fisher information of OTOCs
We begin with the Fisher information of OTOCs.Our treatment is broken into two parts, corresponding to the scenarios where the experimenter has either local or global unitary control over the larger system.The former scenario is simulated numerically in Fig. 2 of the main text.
Local unitary control.- We consider local OTOCs [Eq. (A1)] and are interested in the dependence of the OTOC on the local coupling strengths, $\{J_y\}$. To approach this, we will assume that the OTOC takes the same functional form as in Eq. (B1), but now with a position-dependent butterfly velocity, $v_B(x)$. Specifically, we assume that the effective butterfly velocity at time $t$ receives contributions from all couplings that have been visited thus far, i.e. all $J_y$ with $y \lesssim x$. Since the time to traverse a single coupling is proportional to the inverse coupling strength $1/J_y$, we expect the time to traverse all couplings up to a distance $x$ to be proportional to the sum $\sum_{y=0}^{x} 1/J_y$. Equating this time to the distance divided by the effective butterfly velocity, $x/v_B(x)$, we have $x/v_B(x) \propto \sum_{y=0}^{x} 1/J_y$. If each coupling strength is drawn independently from some disorder realization, then at large times the butterfly velocity will be close to its typical value, $v_B = 1/J$. We can now compute derivatives of the correlation function with respect to a given coupling strength via the chain rule. The derivative of the butterfly velocity is which yields the following for the OTOC: There are two parameters of the local OTOC chosen by a potential experimentalist: the position $x$ of the local perturbation, and the evolution time $t$. We are interested in the maximum Fisher information given an optimal choice of $x$ and $t$. Observing Eq. (B7), we see that the derivative $f'$ is maximized by the choice $x = v_B t$, while the delta function then sets $v_B t = d$. Plugging these values in, we find the Fisher information $\max_C \mathrm{FI}(J_d | C_{\mathrm{OTOC}}) \sim 1/d$.
Global unitary control.- We now turn to an alternate experimental scenario, where one has only global unitary control over the larger system. In this scenario, the natural generalization of the correlation functions Eq. (A1) is the following: Here we replace the local unitary operations of Eq. (A1) with global spin rotations, $e^{i\phi \sum_x W_x}$, by an angle $\phi$ (here, $W_x$ is a local Hermitian operator on qubit $x$).
We expect the behavior of the OTOC under global control to be governed by the "size" of time-evolved operators [29,30,44]. The size corresponds to the average of local OTOCs over all qubits in the system [67]. In one-dimensional ergodic systems the size grows linearly, $\sim v_B t$, which yields the following phenomenological expectation for the global OTOC [30]: Here we have made the butterfly velocity time-dependent to capture its dependence on the local coupling strengths, where $v_B = 1/J$ is the typical butterfly velocity.
Taking the derivative of the OTOC via the chain rule, we have: We would like to maximize the Fisher information over the parameters $(\phi, t)$. This entails taking the time $t$ to be as early as allowed by the delta function, $t \approx d/v_B$, in order to minimize the exponential. The correlator is then maximized by choosing $\phi$ such that $\phi^2 v_B t \sim 1$. This gives a Fisher information $\max_C \mathrm{FI}(J_d | C_{\mathrm{OTOC}}) \sim 1/d^2$.
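The local-control estimate of the effective butterfly velocity can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the proportionality constant in the traversal-time argument is one, so that $v_B(x) = x / \sum_{y=0}^{x} 1/J_y$; the function names are hypothetical.

```python
def effective_butterfly_velocity(couplings, x):
    """v_B(x) = x / sum_{y=0}^{x} 1/J_y, with the proportionality constant set to 1."""
    traversal_time = sum(1.0 / couplings[y] for y in range(x + 1))
    return x / traversal_time


def velocity_derivative(couplings, x, d):
    """Analytic derivative of v_B(x) with respect to J_d.

    The derivative vanishes if coupling d has not yet been reached (d > x).
    """
    if d > x:
        return 0.0
    traversal_time = sum(1.0 / couplings[y] for y in range(x + 1))
    return x / (traversal_time**2 * couplings[d] ** 2)


# Uniform chain with J = 1: the derivative is x / (x + 1)^2, i.e. suppressed as ~1/x.
couplings = [1.0] * 20
print(effective_butterfly_velocity(couplings, 9))  # 0.9
print(velocity_derivative(couplings, 9, 5))        # 0.09
print(velocity_derivative(couplings, 4, 5))        # 0.0, coupling not yet visited
```

The $\sim 1/x$ suppression of the derivative reflects that a single coupling contributes only one of $\sim x$ terms to the total traversal time.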

b. Fisher information of TOCs in absence of conserved quantities
We now turn to a simpler case, the Fisher information of time-ordered correlators in the absence of conserved quantities [Fig. 2(b)]. Under ergodic dynamics, we expect such correlation functions to decay exponentially in time at sufficiently large times, see Eq. (B3). Now, consider the derivative of the correlation function with respect to a local coupling strength at a distance $d$ away from the probe qubit. By causality, this derivative can only be non-zero after a time $t \gtrsim d/v_B$. However, at such times the magnitude of the correlation function has already decayed by a factor of $e^{-\gamma t}$. This suggests that the Fisher information will decay exponentially in the distance $d$,
The scaling of the Fisher information for TOCs is modified in the presence of conserved quantities. In this case, one expects the TOC at sufficiently large times to be dominated by slow diffusive dynamics of the conserved quantity. This lies in contrast to the exponential decay expected in the absence of conserved quantities.
To study this, we begin with the auto-correlation function [first line of Eq. (A1)]. Recall that the auto-correlation function can be measured with access solely to the probe qubit, and is thus accessible in both the local and global control scenarios. Similar to the case of OTOCs, we will assume that the dependence of the correlation function on the local coupling strengths is captured by replacing the diffusion constant, $D$, with a time-dependent value, Following the logic of the previous section, we assume the effective diffusion constant takes the form [68], where $D = 1/J$ is the diffusion constant's typical value. Differentiating with respect to the local coupling strength gives Computing the derivative of the auto-correlation function, we have: The magnitude of the derivative is maximized by taking the minimum possible time, $t \approx d^2/D$, which yields a Fisher information $\max_C \mathrm{FI}(J_d | C_{\mathrm{TOC}}) \sim 1/d^4$. This can be understood intuitively as follows. For the auto-correlation function to be sensitive to the coupling strength
We now turn to the remaining time-ordered correlation functions in Eqs. (A1, B9), which require either local or global unitary control over the non-probe qubits. We will find that such correlators provide no scaling advantage beyond the auto-correlator.
We first consider the case of local unitary control [Eq. (A1)]. Physically, these correlation functions correspond to preparing an amount of the conserved quantity (e.g. a spin polarization) at the probe qubit, letting it diffuse for a time $t/2$, flipping the spin polarization at a qubit $x$, and measuring the polarization at the probe qubit after an additional time $t/2$. We thus expect the TOC to behave as follows, where $q(x, t) \approx (2\pi D(t) t)^{-1/2} \exp[-x^2/(2 D(t) t)]$ is the propagator of the conserved quantity from position 0 to position $x$ (or vice versa). The first term is equal to the auto-correlation function. The second term arises from the spin flip at position $x$ and time $t/2$. The spin flip effectively inserts a negative polarization $-2q(x, t/2)$ on the qubit $x$, which propagates back to the probe qubit with amplitude $q(x, t/2)$.
The derivative of the second term is as follows, The magnitude of the derivative is maximized at $Dt \sim x^2$, $x \sim d$, and is of order $O(1/d^3)$. This is subleading compared to the auto-correlation function, of order $O(1/d^2)$, and thus does not affect the asymptotic scaling of the Fisher information with $d$.
The case of global control [Eq. (B9)] is even simpler. A global spin rotation about the $x$-axis by an angle $\phi$ multiplies the conserved quantity at each site by a factor of $\cos(\phi)$. Here we assume that the $x$- and $y$-components of spin that are generated by the rotation quickly decay in time if they are not conserved by the ergodic dynamics. The resulting correlation function is then given by the auto-correlation function multiplied by $\cos(\phi)$. Again, this provides no scaling advantage in the Fisher information.
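The diffusive picture of the local-control TOC can be sketched numerically. The example below is an illustration under explicit assumptions: the diffusion constant is taken to be a fixed number `diffusion` (dropping the coupling dependence $D(t)$), the auto-correlation term is taken to be $q(0, t)$, and the spin-flip term is taken to be $-2q(x, t/2)^2$ as described in words above. The function names are hypothetical.

```python
import math


def propagator(x, t, diffusion):
    """Diffusive propagator q(x, t) = (2 pi D t)^(-1/2) exp(-x^2 / (2 D t))."""
    return math.exp(-x * x / (2.0 * diffusion * t)) / math.sqrt(2.0 * math.pi * diffusion * t)


def toc_with_spin_flip(x, t, diffusion):
    """Auto-correlation term plus the correction from a spin flip at site x and time t/2."""
    autocorrelation = propagator(0.0, t, diffusion)
    # The flip inserts polarization -2 q(x, t/2), which returns with amplitude q(x, t/2).
    spin_flip = -2.0 * propagator(x, t / 2.0, diffusion) ** 2
    return autocorrelation + spin_flip


# A flip far outside the diffusive cone (x^2 >> D t) leaves the auto-correlator unchanged.
print(toc_with_spin_flip(x=0.0, t=1.0, diffusion=1.0))
print(toc_with_spin_flip(x=50.0, t=1.0, diffusion=1.0), propagator(0.0, 1.0, 1.0))
```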

Effect of imperfect time-reversal and decoherence on Fisher information
We now incorporate imperfect time-reversal dynamics into our estimates of the Fisher information of OTOCs. Previous works have found that a wide range of experimental errors (e.g. extrinsic decoherence, coherent errors in time-reversal) have a similar effect on OTOC measurements, as long as the relevant errors are local and the dynamics are ergodic [28-30, 55, 56].
Specifically, in one-dimensional systems, one expects that the OTOC under open-system dynamics, $\tilde{C}_{\mathrm{OTOC}}$, is equal to the same OTOC under unitary dynamics, $C_{\mathrm{OTOC}}$, multiplied by an overall Gaussian decay in time [56]: Here $\varepsilon$ is an effective local error rate, $v_B$ is the butterfly velocity, and $a$ is an order one constant. The argument of the above exponential is proportional to the volume of the time-evolved operator's light cone. Intuitively, Eq. (B22) states that each error in the causal past of an operator contributes a roughly equal amount to the decay of the OTOC. We note that in finite-size systems we do not expect Eq. (B22) to hold precisely; however, corrections are expected to be suppressed by $\sim \varepsilon/J$, where $J$ is the local interaction strength [56], so we neglect them here. Substituting Eq. (B22) into our estimate for the Fisher information [Eq. (B13)] and setting $v_B t \approx d$, we find: Meanwhile, we assume that the Fisher information with respect to TOCs is comparatively unaffected by error, and thus once again follows a linear exponential decay in $d$: Setting the two exponentials to be equal, $\max_C \mathrm{FI}(J_d | \tilde{C}_{\mathrm{OTOC}}) \sim \max_C \mathrm{FI}(J_d | \tilde{C}_{\mathrm{TOC}})$, we find that the OTOC continues to provide an advantage over the TOC up to as quoted in the main text.
We now apply the same analysis to our second learning regime. Let us set $v_B \sim J$ for consistency with the main text. The Fisher information of an OTOC between operators on either side of the link with respect to the link interaction strength is now modified to max The maximum of the Fisher information as a function of time now occurs at which differs from the unitary OTOC for sufficiently high error rates. Again, we assume that the Fisher information of the TOC is not affected by error to leading order. Taking the square root of Eq. (B28), we thus find that the $L$-fold advantage of the OTOC is replaced by a $J/\varepsilon$-fold advantage at error rates $\varepsilon/J \gtrsim 1/L$, as quoted in the main text.
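The competition between the two decays in the restricted access scenario can be illustrated numerically. The functional forms below are assumptions made for this sketch and are not taken from the (unavailable) displayed equations: the OTOC Fisher information is modeled as $(1/d)\exp(-2 a \varepsilon d^2 / v_B)$, obtained by squaring a Gaussian factor $\exp(-a \varepsilon v_B t^2)$ at $v_B t = d$, and the TOC Fisher information as $\exp(-2\gamma d / v_B)$. Equating the two exponents under these assumptions gives a crossover distance $\gamma/(a\varepsilon)$. All function names and parameter values are hypothetical.

```python
import math


def otoc_fisher_estimate(d, eps, a=1.0, v_b=1.0):
    """Assumed form: algebraic 1/d decay times a Gaussian suppression from local errors."""
    return math.exp(-2.0 * a * eps * d * d / v_b) / d


def toc_fisher_estimate(d, gamma, v_b=1.0):
    """Assumed form: exponential decay in distance, unaffected by errors."""
    return math.exp(-2.0 * gamma * d / v_b)


def crossover_distance(eps, gamma, a=1.0):
    """Distance at which the two exponents are equal under the assumed forms."""
    return gamma / (a * eps)


eps, gamma = 0.01, 0.5  # made-up illustrative values
print(crossover_distance(eps, gamma))
print(otoc_fisher_estimate(10, eps) > toc_fisher_estimate(10, gamma))    # well below crossover
print(otoc_fisher_estimate(200, eps) > toc_fisher_estimate(200, gamma))  # well above crossover
```

Below the crossover the algebraic prefactor dominates and the OTOC estimate exceeds the TOC estimate; far above it the Gaussian suppression wins.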

Learning with time-independent Hamiltonian evolution
In the main text numerical simulations, we utilized Floquet time-evolution in which the spin interactions and local fields were applied in a stroboscopic fashion.Our motivations for using Floquet time-evolution instead of Hamiltonian time-evolution were three-fold: First, Floquet dynamics are prevalent in a variety of quantum systems that one might wish to learn, e.g. in digital quantum simulators, and NMR or solid-state defect setups with optical driving.Second, the Floquet dynamics considered are moderately faster to simulate via Krylov subspace methods than Hamiltonian dynamics, since the Hamiltonian of the former contains fewer terms at a given instant in time.Third, we do not expect the behavior of learning via TOCs or OTOCs under the two dynamics to qualitatively differ at moderate times and distances (although at large distances they may, see Appendix B).
Here, we check the latter assumption by repeating the numerical analysis of Fig. 2 using time-independent Hamiltonian dynamics. As shown in Fig. 7(a), we find that the results of the learning task of Fig. 2(a) behave quite similarly for Hamiltonian and Floquet dynamics. In particular, access to OTOCs continues to enable substantially more accurate predictions for the crossing distance $d$ for all $d \gtrsim 3$. In Fig. 7(b), we turn to the behavior of the Fisher information as a function of a coupling's distance from the probe qubit. Unfortunately, we are not able to discern the $\sim 1/d^4$ scaling predicted in Appendix B in our finite-size numerics. Instead, the Fisher information behaves qualitatively similarly to that of Floquet dynamics [Fig. 2(b)]. We anticipate that at sufficiently large distances the Fisher information of Hamiltonian dynamics will indeed asymptote to the expected power law decay. However, at such distances the Fisher information will likely already be too small to be useful for most practical purposes.

Learning under restricted access with global unitary control
We now turn to learning when one has only global unitary control over the system of interest.We consider a learning task where one wishes to classify the geometry of an unknown spin system, which we assume is drawn with equal probability from the three geometries shown in Fig. 8(a).We find that access to OTOCs provides a substantial advantage in this classification task.Notably, we find that OTOCs continue to improve learning even when one has only global state preparation, control, and read-out (i.e. even in the absence of a probe qubit).
The classification problem we consider is a close variant of those introduced in the main text.We suppose that one has access to the correlation functions of an unknown Hamiltonian whose connectivity corresponds to one of the three geometries shown in Fig. 8(a).The goal is to distinguish which geometry describes the Hamiltonian.We again approach this task by training and testing a support vector machine on samples of disorder realizations, see Section A 3 for details.
We consider learning in two different experimental access scenarios. First, we consider the scenario where one has state preparation and read-out from a single probe qubit, and global control over the remainder of the system. In this case, we take the probe qubit to be a distance $d$ away from any distinguishing features of the geometry (see Fig. 8), and study the learnability as a function of $d$. Note that we are restricted to relatively small distances, $d \leq 4$, owing to the particular form of the three geometries considered. We find that access to OTOCs increases the classification accuracy by between 10% and 35% for all values of $d$ [Fig. 8(b)]; with OTOCs the accuracy is $\sim 65\%$ at $d = 3$, at which learning via TOCs has nearly trivial accuracy.
Our second scenario is even more restrictive: we suppose that one has only global state preparation, control and read-out over the entire system.Despite being commonplace in experiments such as NMR spectroscopy [9], learning in this scenario remains quite difficult in strongly-interacting systems, due to the combination of time-ordered correlators decaying quickly and local information being averaged out by global control and measurement.Indeed, in our learning task, we find that learning via TOCs features a classification accuracy of only ∼ 55%.Intuitively, we expect access to global OTOCs to improve learning, as operator spreading at late times is dependent on global geometric features of the system.In keeping with this intuition, we find that learning via both TOCs and OTOCs improves the classification accuracy to ∼ 80%.
FIG. 1. Schematic of time-ordered correlators (TOCs) and out-of-time-order correlators (OTOCs) in strongly-interacting systems. TOCs typically decay in $O(1)$ times and distances (top, red), making it hard to learn features (yellow bond) that manifest only at late times. OTOCs utilize backwards time-evolution to "refocus" many-body correlations (bottom, blue), enabling learning of such features.

FIG. 2. Learning with state preparation and read-out restricted to a probe qubit, and local unitary control over the remaining system. (a) Results from SVM regression for learning the distance, $d$, in the spin geometry shown, with access to TOCs (red) or both TOCs and OTOCs (blue). Color bars (black ticks) denote 75% (100%) percentiles of predictions on 200 disorder realizations, and the grey step function represents the actual $d$. (b) Fisher information, $\mathrm{FI}(J_d | C)$, of an interaction, $J_d$ (top; red line), a distance $d$ away from the probe (top; purple circle), maximized over all correlators, $C$, in an $L$-qubit 1D chain. The FI decays exponentially in $d$ when $C$ is time-ordered (red), and algebraically, $\sim 1/d$, when $C$ is out-of-time-order (blue).

FIG. 4. Learning as a function of experimental error, in the "weak interaction" learning task of Fig. 3(a). (a) Accuracy of binary SVM classification as in Fig. 3(a), now with a coupling, $g$, to an extrinsic cavity mode that is not time-reversed (cavity frequency $\omega = 1.7$). Despite imperfect time-reversal, learning via OTOCs continues to provide an advantage up to large spin-cavity couplings $g \sim 0.5$. (b) The minimum link strength $J^*$ classifiable with > 90% accuracy as a function of read-out error $\delta$, obtained by repeating Fig. 3(a) for each $\delta$. The minimum link strength in general decreases with decreasing $\delta$; for learning via TOCs, this decrease plateaus for $\delta \lesssim 0.1\%$, indicating that learning below this value is not limited by read-out error.

FIG. 6. Depiction of the extrapolation method used to calculate the maximum Fisher information over time-ordered correlators [Fig. 2(b)]. Each correlation function is computed for 25 Haar-random values of the state $|\psi\rangle$ [Eq. (A3)]. For each value of $N_\psi$ between 1 and 25, we choose a random subset of $N_\psi$ values of $|\psi\rangle$ and compute the average correlation function over the subset. (Left) For each value of $N_\psi$, we then compute the maximum Fisher information over all correlation functions, $\max \mathrm{FI}(N_\psi)$ (solid red lines, darker lines correspond to higher $N_\psi$). (Right) Our estimate of the maximum Fisher information at infinite temperature (dotted lines, both plots) is obtained by fitting $\max \mathrm{FI}(N_\psi) = \max \mathrm{FI}(\infty) + A/N_\psi$ and taking $N_\psi \to \infty$ (points denote data, solid lines denote the $1/N_\psi$ fit).
With global unitary control, the Fisher information of OTOCs decays algebraically, with an additional factor of $d$ compared to the local unitary control scenario. Before moving on, we briefly summarize the intuition behind the two above estimates. In both cases, an $O(1)$ perturbation in a local coupling strength produces an $O(1)$ shift in the location of the OTOC wavefront. With local control, this shift produces an $O(1/\sqrt{d})$ change in the OTOC, since the OTOC wavefront is spread across a width $\sim \sqrt{d}$ by the time it reaches the coupling. With global control, this produces an $O(1/d)$ change in the OTOC, since the global OTOC depends on the average of $\sim d$ individual coupling strengths. Since the Fisher information involves the square of the OTOC derivative, these lead to an $O(1/d)$ and $O(1/d^2)$ Fisher information, respectively.
$\max_C \mathrm{FI}(J_d | C_{\mathrm{TOC}}) \sim \exp(-2\gamma x/v_B)$, (B14) as observed numerically in Fig. 2(b).
c. Fisher information of TOCs in presence of conserved quantities
For the auto-correlation function to be sensitive to the coupling strength $J_d$, the conserved quantity must have spread to at least distance $d$. At such a distance the magnitude of the auto-correlation function is $O(1/d)$, since the conserved quantity has spread over $\sim d$ sites. In addition, the derivative with respect to an individual coupling strength is suppressed by an additional factor $O(1/d)$, since the auto-correlator depends only on the average (inverse) coupling strength over $\sim d$ sites. Combining these two factors and squaring leads to an $O(1/d^4)$ Fisher information.
FIG. 7. Learning in the restricted access scenario under Hamiltonian evolution. Numerical simulations are performed identically to Fig. 2 but now with Hamiltonian evolution under $(H_c + H_f)/2$ instead of Floquet evolution. In both (a) the learning task and (b) the Fisher information, the results for learning Hamiltonian dynamics are qualitatively similar to the results for learning Floquet dynamics [Fig. 2]. In (b), the maximum Fisher information is averaged over disorder realizations for both TOCs and OTOCs. At large $d$, we expect the Fisher information for Hamiltonian evolution to approach a power law decay $\sim 1/d^4$ (see Appendix B), but this cannot be observed in our finite-size numerics.

FIG. 8. (a) The three spin geometries considered in the learning task defined in the text. Each geometry consists of $L = 14$ spins. The probe qubit (purple) is located along a subset of the system that is identical between the three geometries up to a distance $d$ away from the probe. (b) Accuracy of classification, using correlation functions that can be measured with (left) state preparation and read-out on the probe qubit and global unitary control over the remaining system, and (right) global state preparation, unitary control, and read-out. For the former, the accuracy is plotted as a function of the distance $d$ of the probe qubit from the geometric feature of interest. In both scenarios, access to OTOCs (blue) substantially improves the classification accuracy compared to solely accessing TOCs (red).

TABLE I. Maximum Fisher information in restricted access scenarios