Quantum scrambling with classical shadows

Quantum dynamics is of fundamental interest and has implications in quantum information processing. The four-point out-of-time-ordered correlator (OTOC) is traditionally used to quantify quantum information scrambling under many-body dynamics. Due to the OTOC's unusual time ordering, its measurement is challenging. We propose higher-point OTOCs to reveal early-time scrambling behavior, and present protocols to measure any higher-point OTOC using the shadow estimation method. The protocols circumvent the need for time-reversal evolution and ancillary control. They can be implemented in near-term quantum devices with single-qubit readout.


I. INTRODUCTION
Quantum scrambling describes the delocalization of quantum information in quantum chaotic systems [1,2]. Much of the interest in scrambling derives from the study of black holes-the fastest scramblers in nature [2][3][4][5]. The holographic duality permits the investigation of black hole scrambling via the probing of certain models in condensed matter physics, like the Sachdev-Ye-Kitaev model [6,7]. Scrambling can be studied by probing the four-point outof-time-ordered correlator (OTOC) [8][9][10][11]. These correlators can be used to quantify chaos in many-body systems ranging from a non-integrable Ising model [12] to the Dicke model [13][14][15]. For fast scrambling systems [6,7,16,17], this correlator decays exponentially within the scrambling time, according to a Lyapunov exponent. Measuring OTOCs in this regime can be used to investigate models with holographic duals. OTOCs can also be used to describe slow scrambling due to a small Lyapunov exponent, by probing many-body localized systems [18][19][20][21][22][23].
The time ordering of the OTOC makes its measurement tedious. Nevertheless, there are experimental protocols to measure the four-point OTOC. Protocols based on time reversal evolution have been demonstrated [24,25]. The OTOC has also been measured using a nuclear magnetic resonance quantum simulator [26]. Recent investigations of scrambling have searched for measurement protocols that circumvent the need for time reversal. To distinguish between scrambling and decoherence, a teleportation protocol that measures the OTOC in the large-time limit has been developed in Ref. [27] and demonstrated in Ref. [28]. A method based on statistical correlations computes the four-point OTOC in terms of experimentally-friendly correlators [29]; it has been demonstrated in Ref. [30]. Although quantum chaos is often studied through the four-point OTOC, it is suspected that scrambling is sensitive to higherpoint correlators [28,31,32]. Thus, one desires a protocol to measure higher-point OTOCs, especially one without time reversal or ancillary control operations.
We present protocols to measure any higher-point out-of-time-ordered correlator using classical shadows [33,34], which is an efficient scheme recently proposed to predict functions of quantum states. While nonlinear functions are often computed by preparing multiple copies of a state [35][36][37][38][39], classical shadows allow us to bypass this preparation by measuring a single copy of the state. Our protocols avoid time reversal and can probe OTOCs at any time. While ancillary qubits are integrated, we do not require they exert an interaction on the system [40]. We find that the eight-point OTOC reveals early-time information delocalization not present in the four-point OTOC, making it a promising candidate to probe scrambling dynamics.
We provide a statistical error analysis to show that our protocols are more efficient than brute force tomography. Since our protocols express OTOCs in terms of nonlinear functions of a state, we give a refined variance analysis on nonlinear functions which rely on prior knowledge of the target state, establishing a tighter bound than previous works [34,41]-a result which is of independent interest. We also numerically simulate our protocols in a non-integrable, mixed-field Ising model and show that they can be implemented by current experimental platforms containing a moderate number of qubits.

A. Higher-point correlators
Scrambling is often quantified through operator spreading, described as follows. Consider a quantum many-body system consisting of N qubits that evolves by a chaotic Hamiltonian, H. Let W and V be local, unitary operators on Hilbert space H d of dimension d = 2 N . 'Local' refers to unitaries which act on different qubits. Assume W and V commute at time t = 0. The Heisenberg operator evolves with U H (t) = e −iHt ( = 1). The quantum butterfly effect [5,42] states that [W (t), V ] grows as W (t) delocalizes, i.e. spreads throughout the system. Intuitively, W (t) eventually acts non-trivially at the location of V , at which point the commutator no longer vanishes. The size of the commutator can therefore be used to quantify the spreading of W (t) and hence measure scrambling.
To measure the growth of the commutator, it is common to take its norm [43]. The Hilbert-Schmidt norm is most often employed [9][10][11], To measure this quantity, we interpret the trace as (up to a normalization) the expectation value of |[W (t), V ]| 2 over the infinite temperature thermal state ρ ∞ = 1 d I N . By adopting the notation · ≡ Tr{ρ ∞ ·}, the expectation value is where the four-point out-of-time-ordered correlator is defined as Scrambling causes this correlator to decay to near zero. Although C 4 (t) has been analyzed extensively in the literature, it does not describe the complete evolution of the commutator. Higher-point correlators are necessary to reveal new, early-time scrambling.
To extract higher-point correlators, we measure the commutator growth using the Schatten 2n-norm for positive integer n [44], , V ]| 2n , the Schatten 2n-norm can be expressed in terms of a linear combination of higher-point correlators for some coefficients b k (n). The 4k-point OTOC, for a non-negative integer k, is defined as We propose a physical interpretation of these correlators in Appendix A.
The paper is organized as follows. In Sec. II, we adopt a global random ensemble to measure the eight-point correlator C 8 (t), which reveals scrambling earlier than C 4 (t). In Sec. III, we propose three complementary protocols based on shadow tomography, only requiring random Pauli measurements on individual qubits. We give analytical variance upper bounds for a sample protocol in Sec. IV and show the protocol is more efficient than direct tomography. In Sec. V we present numerical simulations for the predicted and estimated OTOC values. We give conclusions and provide an outlook in Sec. VI.

II. GLOBAL PROTOCOL FOR EIGHT-POINT OTOC
We motivate our search for a novel protocol to measure higher-point correlators by first extending the four-point OTOC protocol developed in [29] to evaluate C 8 (t). The protocol implements global random unitaries and demonstrates new scrambling dynamics in C 8 (t).
Assume W and V are unitary, traceless, and Hermitian operators on H d . Defining operators A 1 = W (t) and A 2 = V W (t)V , the four-point and eight-point OTOCs are A 1 and A 2 are also unitary, traceless, and Hermitian operators. Let U be a unitary on H d randomly sampled from the Haar measure on the unitary group. Define the notation for the integral over the Haar measure as (· · · ) = Haar dU (· · · ). Define the expectation value over the pure state ρ 0 ∈ H d as · ρ0 = Tr{ρ 0 ·}. One can prove that (see Appendix D) for any pure state ρ 0 , even, for example, |0 ⊗N . In terms of W (t) and V , the OTOCs are Eq. (12) reveals C 8 (t) contains dynamics not captured by C 4 (t). Its third term is constant and does not affect the dynamics. The second term depends only on C 4 (t). The 'hidden' dynamics arise due to the new term To evaluate the OTOCs, we measure U † W (t)U ρ0 and U † V † W (t)V U ρ0 with global random unitaries as follows.
1. Randomly sample a unitary U from the Haar measure on the unitary group.
2. Prepare pure state ρ 0 and evolve with U . Then evolve with U H (t). Measure W .
3. Repeat step 2 many times to compute the expectation value U † W (t)U ρ0 . 4. Prepare ρ 0 and evolve with U . Apply V , then evolve with U H (t). Measure W .
5. Repeat step 4 many times to compute the expectation value U † V † W (t)V U ρ0 .
Although a sub-ensemble can be substituted in for the Haar measure on the unitary group via unitary t-design [32,59], an ensemble forming a 4-design is needed to measure C 8 (t). That is, one must apply a more random (chaotic) unitary ensemble to access higher-point scrambling features. Note that the Clifford group forms a 3-design [60][61][62], but not a 4-design [61]. One can generate an approximate t-design through a random local circuit [63], in particular, by inserting few T gates into Clifford circuits [64]. Since generating global random unitaries is experimentally challenging, Ref. [29] adapts its global protocol to local unitaries, greatly simplifying the measurement procedure. However, higherpoint correlators are difficult to evaluate using this local protocol, due to the lack of degrees of freedom needed to construct larger permutation operators. This kind of no-go result is also observed in the measurement of the higher moments of the density matrix [51].
The global protocol serves as a proof of principle that higher-point OTOCs contain new scrambling dynamics not present in C 4 (t). This motivates the development of protocols based on classical shadows [34] in the following section, which only utilize single-qubit, random Clifford unitaries.

III. CLASSICAL SHADOW PROTOCOLS
We present three protocols to estimate C 4k (t) using classical shadows generated through quantum shadow tomography [33,34]. We summarize the essentials of shadow tomography, but refer to [34] for further details.

A. Shadow Tomography
The classical shadow protocols rely on the prediction of functions of an N -qubit state ρ. First, define a random unitary as a tensor product of local unitaries, Each single-qubit unitary u i is drawn randomly and independently from the Clifford group. Evolve ρ to U ρU † . Upon measurement in the computational basis, the state collapses to |b b | by Born's rule, where |b ∈ {0, 1} ⊗N is an N -bit random variable. That is, the state is randomly measured in the Pauli basis, similar to traditional tomography. In shadow tomography, however, one is interested in estimating the properties of the state, not in reconstructing it. After measurement, the outcome is stored and processed classically. Then, we classically compute U † |b b | U and apply the inverted channel − Tr(·)I and I is the identity on a single qubit. The result is a classical snapshot of ρ,ρ The classical snapshot satisfies E(ρ(U ;b)) = ρ, where E is the average of the outcomes and of the unitaries over the local Clifford group. More formally, As a result, an observable O can be estimated using the snapshot through Tr{Oρ} with E(Tr{Oρ}) = Tr{OE(ρ)} = Tr{Oρ}. To reduce the statistical error of the estimation, one can repeat this process to generate K independent classical snapshots. The classical shadow of ρ is defined as the set of these snapshots Each snapshot corresponds to a new measurement. K is referred to as the size of the shadow. The shadow can also be used to estimate nonlinear functions of ρ, which are encountered in our measurement protocols. For example, a second-order function f 2 can be written as f 2 (ρ) = Tr{O ρ ⊗2 } for some observable O on H ⊗2 d . This can be estimated using two distinct snapshots: Tr{O ρ ⊗ρ }.

B. Multi-Bell state protocol
We construct a protocol to measure C 4k (t) by preparing multiple Bell states. Since C 4k (t) is a function of the evolution unitary U H (t), we introduce Bell states to 'store' U H (t) [65,66]. Consider a 2N -qubit system. Let ρ Bell = |Φ Φ| be the maximally entangled state, where and |i ∈ H d . State |Φ consists of N Bell states. To simplify computations, we introduce a graphical calculus [67] and refer to Appendix B for its complete description. With some stylistic adaptation from Ref. [68], ρ Bell is expressed as Evolving ρ Bell with the unitary channel associated with U H ⊗ I N , where I N is the identity on N qubits, the resulting state and its diagram are, respectively, The time dependence of ρ H (t) is suppressed for conciseness. State ρ H is actually the channel-state duality for U H (t). Now write C 4k (t) in terms of ρ H . First, express the correlator diagrammatically, 'Slide' U † H leftwards and use the implied periodic boundary conditions. Introduce the following identity relating A 1 to its transpose, for A 1 = V, V † . The correlator is now drawn as The dashed rectangle in Eq. (24) is the cyclic permutation operator over even indexes, T (4k,..., 4,2) . The correlator is now where O 4k = T (4k,...,4,2) (W † ⊗ V * ⊗ W ⊗ V T ) ⊗k . O 4k has no time dependence; all time dependence is stored in ρ H . C 4k (t) can therefore be estimated by performing shadow tomography on ρ H . One can also define a general 4k-point correlator as an expectation value over an arbitrary state ρ, This can also be expressed in terms of ρ H , FIG. 1. Quantum circuit to generate a classical snapshot for the multi-Bell state protocol. N Bell states are prepared. The subsystem composed of one qubit from each Bell state is evolved by UH (t). Each time this circuit runs, ui is randomly sampled anew from the Clifford group. A classical shadow is generated by running this circuit K times.
The appearance of ρ T may seem alarming, since the transpose is not a physical operation. However, since only ρ H is measured, ρ T is treated as an ordinary operator. This correlator enables an analysis of scrambling beyond the high temperature thermal state. Although our protocol can measure this general correlator, we focus on the maximally mixed state. We use shadow tomography to construct an estimator for C 4k (t). First, construct a classical shadow of size K ≥ 2k for state ρ H , We suppress the arguments of each snapshot,ρ Using the classical shadow, construct an unbiased estimator for C 4k (t) through the U-statistic [41,69] The sum is carried out over all size 2k permutations of classical snapshots in S H . Each term in the sum is a function of 2k independent snapshots. Unitary U ij and outcomeb ij of snapshotρ are independent of the unitaries and outcomes of any other snapshotρ H . When averagingĈ 4k (t) over all Clifford unitaries and outcomes, eachρ Since all snapshots satisfy E ρ We summarize the multi-Bell state protocol (see Fig. 1) to estimate C 4k (t): 1. Prepare N Bell states.
2. Evolve the subsystem consisting of one qubit from each Bell state with U H (t).
3. Create a classical shadow of size of K ≥ 2k for this state.
The initial state can be readily prepared. For instance, experiments with Rydberg-atom qubits and ultra cold-atoms have demonstrated high-fidelity control of many pairs of Bell states [70,71]. The multi-Bell state protocol carries the advantage that no additional assumptions aside from unitarity and locality are made about W or V . Furthermore, single-qubit Clifford unitaries can readily be implemented in experiments. The protocol can also be extended to measure a general correlator for an arbitrary state. Although the OTOC corresponds to an expectation value over an N -qubit state, this protocol requires a measurement of 2N qubits. For systems limited in size, a protocol requiring a measurement of only N qubits without a preparation of Bell states is favorable. We develop such a protocol in the next section.

C. Mixed state protocol
We introduce a protocol to estimate C 4k (t) requiring a measurement on only N qubits, by evolving the operator V with some corresponding initial state. Using the cyclic property of the trace, the correlator can be written as Set W and V , for example, to Writing Z = 2 |0 0| − I and introducing the initial state such that V = dρ in − I ⊗N . Defining the time-evolved state and using the expression for V , the correlator becomes Shadow tomography can be used to construct an estimator for C 4k (t) for any k. For demonstration, we estimate the four-point and eight-point OTOCs. Evaluating Eq. (36) at k = 1, By creating a shadow of size K ≥ 2 for state ρ V , an unbiased estimator of C 4 (t) can be constructed through the U-statistiĉ The sum is taken over all size 2 permutations of snapshots in S V . Eq. (36) evaluated at k = 2 yields the eight-point OTOC, Noting ρ 2 V = 2 d ρ V , the correlator simplifies to with the leading-order term which determines the scrambling dynamics not captured by the four-point correlator. Using shadow S V of size K ≥ 4, the unbiased estimator for the leading-order term iŝ The sum is over all size 4 permutations of snapshots in S V . Thus, the unbiased estimator for the eight-point correlator isĈ whereĈ 4 (t) is given by Eq. (39). The mixed state protocol (see Fig. 2) to estimate C 4k (t) is as follows: where qubit N is in |0 0| and the remaining qubits are in the maximally mixed state.

Evolve with U H (t).
3. Create a shadow of size K ≥ 2k for the state.
The advantage of the mixed state protocol over the multi-Bell state protocol is a measurement of only N qubits is required and preparation of EPR pairs is avoided. Initial state ρ in can be prepared on a nuclear magnetic resonance quantum simulator [26].

D. Single Bell state protocol
As a hybrid of the two protocols developed previously, we construct a protocol which introduces one Bell state by using just one ancillary qubit. The advantage of this protocol is that the operator V is not predetermined by the initial input state as in Sec. III C, but is selected in the final classical post-processing stage.
For simplicity, we first compute the estimator for C 4 (t), then generalize the result to C 4k (t). For instance, by taking W = Z 1 and V = X N , which both act on H d , the four-point correlator in Eq. (32) is Introduce a diagram for the correlator where each tensor leg now represents an index for the single-qubit Hilbert space A slash mark on a leg denotes all remaining qubits. The diagram with periodic boundary conditions shows To simplify notation, define B ≡ Z 1 X T a1 with Pauli Z on system qubit 1 and X T on an ancillary qubit denoted by a 1 . The correlator is redrawn as To clarify, the slash marks in Eq. (47) and (48) represent N − 2 and N − 1 system qubits, respectively. Define ρ N,a1 as the initial state where N -th system qubit forms a Bell state with the ancillary qubit. The remaining N − 1 system qubits are in the maximally mixed state. The small dotted rectangle in Eq. (48) represents ρ N,a1 up to a normalization factor d. The large dotted rectangle represents the permutation operator N l=1 T (l,N +1+l) , which is a product of swap operators. Define the time-evolved state where I a1 is the identity on the ancillary qubit. The correlator is To construct an estimator for C 4 (t), create a shadow of size K ≥ 2 for state ρ H,N,a1 : FIG. 3. Quantum circuit which generates a classical snapshot for the single Bell state protocol. The initial state is ρN,a 1 , where system qubit N forms a Bell state with an ancillary qubit, while the remaining qubits are in the maximally mixed state. Unitary UH (t) evolves the N system qubits. Then local random Clifford unitaries ui act on each qubit, followed by a measurement.
Use this shadow to computê This procedure can be generalized to construct an estimator for C 4k (t) for k ≥ 2, The sum is over all size 2k permutations of snapshots in S SB . The single Bell state protocol (see Fig. 3) to estimate C 4k (t) is: 1. Prepare N system qubits and 1 ancillary qubit. Create a Bell state between system qubit N and the ancillary qubit. Prepare the remaining system qubits in the maximally mixed state.
2. Evolve the system qubits with U H (t).
3. Construct a shadow of size K ≥ 2k for the state.
The single Bell state protocol carries the advantage that the OTOCs are given by a single trace, rather than a sum of traces as in the mixed state protocol. This feature makes the single Bell state protocol more appropriate for the commutator type correlators discussed in Appendix E 3. This protocol is ideal for systems with a limited number of qubits, since it only introduces one ancillary qubit. Similar to the multi-Bell state protocol, a manipulation of diagrams can easily yield an estimator for a general 4k-point correlator over an arbitrary state. The same classical shadow can be used to compute OTOCs for different choices of W and V , allowing for the calculation of multiple OTOCs with the same batch of measurements.

IV. STATISTICAL ERROR ANALYSIS
We analyze the statistical error for estimators due to the finite shadow size K. We focus on the mixed state protocol from Sec. III C for its experimental practicality. We state the following results on the variance in shadow tomography: whereρ is a snapshot of ρ.
where W is any Pauli operator andρ,ρ are two distinct snapshots of ρ.
The proof of Lemma 1 is left in Appendix F 1. The lemma shows a significant enhancement when compared with using Fact 1 on the doubled Hilbert space, which yields d 2 Tr{O 2 2 } = d 2 Tr{T 2 (1,2) } = d 4 . Lemma 1 is also suitable for O 2 = T (1,2) , which can tighten the variance for the purity measurement performed in [34,41]. The improvement relies on the prior knowledge that the target state here is in the tensor product form ρ ⊗ ρ. We believe a modification of Lemma 1 with an appropriate O on a few-copy state can enhance shadow tomography in other scenarios.

A. Variance of four-point OTOC
We compute the variance ofĈ 4 (t) in Eq. (39). The summation index constraint i 1 = i 2 inĈ 4 (t) can be changed to i 1 < i 2 by symmetrizing the observable as followŝ . The third line is due to T (1,2) commuting with W ⊗2 . To simplify notation, we suppress all time dependence in this section. The variance ofĈ 4 satisfies with Var i1<i2D Define D 4 = E[D 4 (i 1 , i 2 )] for any i 1 , i 2 and V 4 (i 1 , i 2 , j 1 , j 2 ) = E D 4 (i 1 , i 2 )D 4 (j 1 , j 2 ) . V 4 depends on the coincidences between indices i 1 , i 2 and j 1 , j 2 , respectively. A coincidence indicates that the twoD 4 in the second line of Eq. (58) share the same snapshot, and thus are not independent. There are three possible coincidence cases of the indices discussed as follows: • No coincidence: V 4 = E D 4 (i 1 , i 2 ) E D 4 (j 1 , j 2 ) = D 2 4 since all snapshots are independent.
• One coincidence: take for example i 1 = j 1 and i 2 = j 2 . There are a total of 2 K 1 K−1 2 such terms. We simplify In the second line, we take the expectation ofρ using Fact 1.
• Two coincidences: i 1 = j 1 and i 2 = j 2 . There are K 2 such terms in total. We can write Inserting the above three cases into Eq. (58), In the third line, we apply Fact 1 and Lemma 1 to the variances respectively. The fourth line is due to Tr{O 2 1 } = Tr{ρ 2 V } = 2 d . By using Chebyshev's inequality, one arrives at − 1 under confidence level δ and error , the shadow size K satisfing Remark 1. One may apply full quantum state tomography [72,73] on ρ V to calculate C 4 = dTr{T (1,2) W ⊗2 ρ ⊗2 V } − 1 directly. To make the error of C 4 less than , the error on the state ρ V should be around = /d. As a result, the necessary number of measurements on ρ V with quantum tomography is K = Ω(d 2 / 2 ) = Ω(d 4 / 2 ) . Even with the optimistic estimation by taking the trace distance comparable to the infidelity, K = Ω(d 2 / ) = Ω(d 3 / ), which is still worse than Eq. (62). Furthermore, if one is restricted to independent measurements on a single copy of ρ V (like in the classical shadow protocols), the scaling worsens: K = Ω(d 3 / 2 ) = Ω(d 5 / 2 ).
Similar to the analysis ofĈ 4 in Sec. IV A, the variance Var( depends on the coincidences of the indices. For simplicity, we consider the leading contribution to Var(Ĉ 8 ), namely, the variance of the first termL 8 : The summation over i labels the summation over i 1 < i 2 < i 3 < i 4 , and similarly for j. Define where D 4k = Tr{(W ρ V ) 2k }. In the early-time limit, the bound becomes The variance behaves like O(d 14 /K 4 ) and still outperforms full quantum state tomography (see Appendix F 2). We apply Fact 1 in the proof of Proposition 2, but expect an improvement for large coincidence cases by using similar techniques as Lemma 1.

V. NUMERICAL SIMULATIONS
We present simulations of predicted and estimated OTOCs using an N -qubit mixed-field Ising spin chain with Hamiltonian The parameters are J = 1, h x = 1.05, h z = 0.5, and E 0 = 4J 2 + 2h 2 x + 2h 2 z . X i and Z i are Pauli operators on the i-th qubit. This non-integrable model has been shown to exhibit chaotic behavior [12].
We simulate the evolution of C 4k (t) = (W † (t)V † W (t)V ) k , where W = Z 1 and V = Z N , for the first three OTOCs (k = 1, 2, 3) and various N in Fig. 4. In the early-time limit, all OTOCs experience an initial steep decay. As the qubit number increases, the time taken for any OTOC to exhibit this decay increases. Since the separation between W and V is larger, W takes longer to spread to the support of V , delaying the growth of [W (t), V ]. Intuitively, since all OTOCs are related to a Schatten 2n-norm of this commutator, the decay of every OTOC is delayed. Referring to the N = 8 and N = 10 cases of Fig. 4, all OTOCs decay to the same value in the large-time limit, reflecting the system's degree of scrambling. This point is discussed further in Appendix E 2. These three characteristics: an initial steep decay, the delay in decay for larger qubit systems, and the large-time decay to a floor value (for sufficiently large systems) are well-known features of the famous four-point OTOC for chaotic systems. Higher-point OTOCs also display these characteristics, making them reliable stand-alone quantities to study scrambling.
Studying the early-time behavior of OTOCs sheds light on the initial rate of information delocalization [74][75][76]. In Fig. 5, we numerically simulate the eight-point correlator's leading-order term L 8 (t) from Eq. (42) for various Ising spin-chain lengths. The early-time dynamics of this term contribute to the faster initial decay of the eight-point correlator relative to the four-point OTOC in Fig. 4. This is consistent with the discussion in Appendix A in which the multiple perturbations found in higher-point OTOCs are expected to result in faster information delocalization. In Fig. 5, we also simulate the measurement ofL 8 (t) from Eq. (43) with the mixed state protocol. To reduce the post-processing computations, we run the protocol with a fixed shadow size K several times and average the results. In Fig. 6, the predicted four-point OTOC is plotted against its estimator from Eq. (39) using the mixed state protocol for various shadow sizes. As the shadow size increases, agreement between the two improves.

VI. CONCLUSION
We present a definition of higher-point out-of-time-ordered correlators to describe the dynamics of quantum scrambling in chaotic systems. In the early-time limit, higher-point correlators exhibit faster decay and reveal information delocalization earlier than the four-point OTOC. We present protocols using classical shadows to estimate these correlators and show they can outperform full quantum state tomography. For sufficiently many measurements, good agreement between the predicted and estimated values can be achieved. The protocols avoid time reversal and can probe the dynamics of OTOCs at any time. They can be implemented using single-qubit Pauli measurements, making them ideal for experiments with single-qubit control. The protocols here extract nonlinear functions by measuring only a single-copy of a target state, thus avoiding the preparation of multiple identical copies and the joint control and readout on them. In addition, the same classical shadow can be used to compute multiple correlators by classical post-processing. Furthermore, the protocols can be used to construct estimators for general correlators over an arbitrary initial state (W (t)V W (t)V ) k ρ , allowing for the study of OTOCs beyond the thermal state background. There are a few interesting points which merit further investigation. First, our protocols can be directly extended to the noisy evolution scenario, where the system dynamics are described by general quantum channels. Adopting noise mitigation methods [77,78] may be an intriguing approach to distinguish quantum scrambling from classical decoherence effects [79][80][81]. Second, the enhancement of the variance analysis based on prior knowledge of the state's tensor structure can be generalized to other scenarios, which can further improve the practicality of shadow tomography for nonlinear functions. Third, it is possible to extend the protocols to measure other quantities in quantum scrambling, such as the operator weight distribution [82,83]. Finally, it is important to explore the operational and physical implications of higher-point OTOCs, and demonstrate our protocols on near-term quantum platforms.

ACKNOWLEDGMENTS
We would like to thank Beni Yoshida for helpful discussions on non-commutator type correlators, Xun Gao for discussions on diagrammatic representations of quantum objects, and Pei Zeng for discussions on shadow tomography.  We propose a physical interpretation of higher-point OTOCs. For simplicity, we approximate the expectation value over the maximally mixed state as one over a Haar random state |ψ h [84,85]. The approximate four-point OTOC is In analogy to the Loschmidt echo [86,87], define the perturbed evolution unitary as U p (t) = U H (t)V . The state U p (t) |ψ h represents a small 'kick', V , perturbing |ψ h before evolution by U H (t). The approximate OTOC in terms of U p (t) isC The state |ψ 1 = U † H W † U H U † p W U p |ψ h is interpreted through following perturbation procedure (see Fig. 7). Evolve |ψ h by U p (t), then apply W . Evolve the state using U † p (t), which corresponds to backwards time evolution by U † H (t) = U H (−t) followed by V † . Evolve with U H (t), apply W † , and finally evolve backwards in time using U † H (t). The correlator is the overlap between the initial state |ψ h and the evolved state |ψ 1 . The role of W is to perturb the state and ensure U p (t) is not undone by U † p (t). V probes how chaotic U H (t) is. In the case where V is the identity, U p (t) = U H (t) and the evolution procedure evolves |ψ h away from and back to itself. No chaos is detected. For a nontrivial V , the perturbation procedure evolves the state away from |ψ h .C 4 (t) indicates the sensitivity of the evolution of U H (t) to perturbations.
Consider the approximate 4k-point OTOC, This correlator gives the overlap of |ψ h with the state resulting from k applications of the perturbation procedure to |ψ h . Each application evolves the state successively further from |ψ h . The higher-point correlators correspond to a larger number of perturbations to the initial state, which probe the chaos induced by U H (t) in finer detail. We return to the discussion of exact OTOCs. A higher-point OTOC C 4k (t) reveals the finer scrambling dynamics due to k perturbations by V . After each perturbation, evolution by U H (t) further delocalizes the quantum information of the initially local operator W . Higher-point correlators therefore scramble information more quickly, resulting in a faster OTOC decay. In a different context, an interpretation of higher-point correlators is given in terms of multiple shock waves [31].

Appendix B: Graphical calculus
We present a graphical calculus for matrices adapted from [67,68] and mention a related picture-language for quantum information [66,88]. An operator A 1 is drawn as a box with input and output legs The purpose of the coloring is solely to distinguish these diagrams from quantum circuits. By expanding A 1 in a basis, A 1 = i,j a 1 i,j |i j|, we let the left leg correspond to |i and the right leg correspond to j|. Unless otherwise stated, each leg index corresponds to a state in Hilbert space H d . The transpose of A 1 is represented by A product of two operators is drawn by connecting the output leg of one with the input leg of the other More formally, a connection between legs is an index contraction and hence represents matrix multiplication. The trace Tr{A 1 } = i a 1 i,i i| i is drawn by connecting the legs of A 1 A tensor product of operators is represented as a 'stacking' of boxes The trace of this tensor product is drawn by connecting the input and output legs at each level. For a permutation π, we may define the permutation operator T π through T π |a 1 ⊗ |a 2 ⊗ · · · ⊗ |a k = |a π(1) ⊗ |a π(2) ⊗ · · · ⊗ |a π(k) .
For concreteness, consider the permutation π = (1)(2, 3) and a tensor product of three states |a 1 ⊗ |a 2 ⊗ |a 3 . The corresponding permutation operator acts as follows Diagrammatically, this permutation operator is Diagrams for other permutation operators can be constructed in a similar manner. As an example, we compute the following trace The diagram for the Bell state |Φ = 1 |i ⊗ |i is defined as As an outer product, the state is The diagram for |Φ can be used to construct the useful identity This can be directly verified using the diagrammatic representation of A T 1 from Eq. (B2).

Appendix C: Background on random matrix theory
This section relates permutation operators to random matrices, which allow for the computation of OTOCs through random measurements. For a more complete discussion, see [32,89]. Let U be a unitary on Hilbert space H d . Define the operator where A i is an operator on H d . By the Schur-Weyl duality, if [A, U ⊗k ] = 0 for all unitaries U , then A can be written as a linear combination of permutations operators T π , The sum is carried out over S k , the set of all permutations of the set {1, 2, . . . , k}. Define the k-fold twirling channel of A as Φ (k) The integral is over the Haar measure dU on the unitary group. As the Haar measure is invariant under the action of the group, we write the twirling channel as a superposition of permutation operators Φ (k) where the coefficients C π,σ are known as the Weingarten matrix. They satisfy where f (π • σ) is the number of cycles of the composition of permutations π and σ.
The sum is carried out over D k , the set of derangements (permutations with no fixed points) in S k . The notation · ρ0 = Tr{ρ 0 ·} denotes the expectation value over a pure state ρ 0 ∈ H d . The notation (· · · ) denotes an integral over the Haar measure on the unitary group: (· · · ) = Haar (· · · )dU .
Proof. Consider the k-fold twirling channel on Introducing pure state ρ 0 , we compute the following trace so For any π, we can write where n 1 , ..., n k are some integers. Since ρ 0 is a pure state, Tr{ρ m 0 } = Tr{ρ 0 } = 1 for any positive integer m. This yields Tr{ρ ⊗k 0 T π } = 1. Our expression for the trace then becomes It can be shown that the Weingarten matrix satisfies The trace then becomes We now have two expressions for Tr{ρ ⊗k 0 Φ (k) Haar (A 1 ⊗ · · · ⊗ A k )}. Together, they yield We have not yet used the traceless property of A i , so the above equation is valid even for an A i with a non-vanishing trace. We can split the sum over S k into two sums D k is the set of derangements and F k is the set of permutations in S k containing at least one fixed point. For σ ∈ F k , the trace Tr{T σ A 1 ⊗ · · · ⊗ A k } introduces at least one trace of the form Tr{A i }, which vanishes since all A i are traceless. The sum over F k then vanishes and we may write This yields proving the fact.
The goal of this section is to write the correlators A 1 A 2 A 1 A 2 and A 1 A 2 (where all A i are traceless, unitary, and Hermitian) in terms of simpler correlators that can be measured experimentally. This is done by introducing random unitaries.
The correlator A 1 A 2 may be rewritten in terms of a permutation operator This equation can be proven diagrammatically: Using Fact 2 with k = 2 yields Note that permutation (1,2) is the only derangement in D 2 . The correlator then becomes The correlators on the right may be measured using the protocol discussed in Sec. II. We now express A 1 A 2 A 1 A 2 in terms of similar correlators. We begin by introducing a permutation operator The application of Fact 2 with k = 4 simplifies to The corresponding set of derangements is The sum over D 4 simplifies to Since each A i is Hermitian and unitary, A 2 i = I. This yields Tr{A 2 1 }Tr{A 2 2 } = d 2 and Tr{A 2 Equating this to Eq. (D6), Rewriting, The correlator then becomes All correlators on the right can be measured using the protocol in Sec. II.
where in the second line we write the equation over 2k copies of the original Hilbert space H d and move the integral inside the trace. In the final line we apply the Weingarten formula. We also use the permutations σ 0 = (1, 2, . . . , 2k) and σ = σ 0 • σ. Since any Pauli operator W satisfies W 2 = I ⊗N and Tr {W } = 0, the term Tr T π W ⊗2k simplifies to Tr T π W ⊗2k = 0 , if π has an odd cycle , if π has no odd cycles (E3) where f (π) is the number of cycles of the permutation π. For example, (12)(34) and (123)(4) both contain two cycles, but both cycles are odd in the latter case. This relation also holds for Tr V ⊗2k T σ . As a result, Eq. (E2) simplifies to C 4k = 1 d π,σ∈S 2k , π,σ =σ0•σ even cycles C π,σ d f (π)+f (σ ) . (E4) A more general formula is derived in Ref. [32].

Eight-point OTOC and (non)commutator types
Reference [32] has found that there are two general definitions of eight-point OTOCs that display distinct largetime behavior. For operators A, B, C, D, one can define the commutator type correlator as C ct = Ã BCDÃDCB and the non-commutator type correlator as C nc = Ã BCDÃBCD . DefineÃ = U † AU , where U is a Haar random unitary. The non-commutator type correlator attains a large-time value which scales as ∼ 1 d 2 . This occurs even if U is sampled from the Clifford ensemble. The four-point OTOC also scales as ∼ 1 d 2 , indicating that non-commutator type correlators saturate the same floor value as the four-point correlator and therefore cannot reveal any new scrambling information in the large-time limit. The correlator C 8 (t) = W (t)V W (t)V W (t)V W (t)V defined in our work is a non-commutator type correlator. Thus, it can only be used to study early-time scrambling dynamics.
The commutator type correlator scales as ∼ 1 d 4 at long times, allowing us to probe systems which display higherpoint scrambling in the large-time limit. By taking A = B and C = D, the commutator type correlator can be written as Prepare a shadow of size K ≥ 2 for ρ H,1,a1 and ρ H,N,a1 . These two shadows can be used to compute the estimator Tr (ρ The commutator protocol to estimate C ct (t) is: both have I. For example, for a 6-qubit system with P i = X 1 Y 2 X 3 I 4 I 5 I 6 and P j = X 1 Y 2 I 3 Z 4 I 5 I 6 , the functions are: a(i, j) = {1, 2}, b(i, j) = {3, 4} and c(i, j) = {5, 6}. We use |a(i, j)| to denote the number of elements in a(i, j) and omit the i, j indices when there is no ambiguity. P a i is the |a|-qubit operator restricted on subsystem a from P i . One needs not to consider the P i , P j pair with different Pauli operators acting on the same qubit, since they return f (i, j) = 0. As a result, the summation in Eq. (F2) can be transformed to the summation on all possible a, b, c with |a| + |b| + |c| = n. We use (i, j) (a, b, c) to denote that P i , P j satisfy the operator constraints on the given a, b, c subsystems. We write Here 3 2|a| accounts for the function f (i, j) 2 , and we use the fact that the qubit Pauli operators of P i , P j are the same in a and they both have I on c.
The 2 |b| in the first line is due to the two possibilities for P i or P j taking a Pauli or the identity on the qubit in b. The last inequality is due to the purity Tr{ρ 2 b } ≤ 1. Inserting Eq. (F4) into Eq. (F3), we get the upper bound for the variance where we use the fact Tr{ρP i W P j W } 2 = Tr{ρP i P j } 2 for any Pauli operator.

Proof of Proposition 2
Proof. We rewrite the variance ofL 8 from main text here.
where the summation over i labels the summation over i 1 < i 2 < i 3 < i 4 , and similarly for j. Similar to the analysis ofĈ 4 in Sec. IV A, the variance Var(L 8 ) depends on the coincidences of the indices. For simplicity of notation, we use T t to denote the shift operator T (1,2··· ,t) in the following discussion.
• No coincidence: V 8 = D 2 8 , since the snapshots are independent and one can evaluate the expectation value separately.
• One coincidence: there are a total of M 1 M −1 6 6 3 such terms. We have The second line is due to the six different permutation orders returning the same result. We can take 2 such terms. Due to the redundancy of the twirling channel Θ 4 , we only need to consider the average on the following three orders where where we use Tr{O 2 } = 2 d . • Four coincidences: this corresponds to i = j, with M 4 such terms where O 4 = Θ 4 (T 4 )W ⊗4 for the 4-copy state, and  It is clear that in the large d limit, the final term is dominating and the variance scales as O(d 14 /K 4 ). To suppress the error to , one needs K = O(d 3.5 / √ ). If one naively uses quantum state tomography to reconstruct ρ V and calculate L 8 = d 3 D 8 = d 3 Tr{(W ρ V ) 4 } within an error , one needs the precision of tomography to be about = /d 3 . Even with the optimistic estimation obtained by making the trace distance comparable to the infidelity, the necessary number of measurements in quantum tomography scales as K = Ω(d 2 / ) = Ω(d 5 / ). Furthermore, if one is restricted to independent measurements on a single-copy of ρ V (like in the classical shadow protocol here), the scaling gets worse.