Scrambling Dynamics and Out-of-Time Ordered Correlators in Quantum Many-Body Systems: a Tutorial

This tutorial article introduces the physics of quantum information scrambling in quantum many-body systems. The goals are to understand how to precisely quantify the spreading of quantum information and how causality emerges in complex quantum systems. We introduce a general framework to study the dynamics of quantum information, including detection and decoding. We show that the dynamics of quantum information is closely related to operator dynamics in the Heisenberg picture, and, under certain circumstances, can be precisely quantified by the so-called out-of-time ordered correlator~(OTOC). The general behavior of OTOC is discussed based on several toy models, including the Sachdev-Ye-Kitaev model, random circuit models, and Brownian models, in which OTOC is analytically tractable. We introduce numerical methods, including exact diagonalization and tensor network methods, to calculate OTOC for generic quantum many-body systems. We also survey current experimental schemes for measuring OTOC in various quantum simulators.

In recent years, there have been remarkable developments in laboratory platforms for studying quantum physics.These systems range from ultracold atoms, trapped ions, and superconducting qubits to universal quantum computers, providing exciting opportunities to study quantum many-body physics that was previously out of reach.One research frontier concerns the long-time coherent quantum many-body dynamics in closed systems, which has drawn extensive research interests from multiple communities, such as condensed matter physics, atomic, molecular, and optical physics, quantum information science, and high-energy physics.Synergistic experimental and theoretical research has revealed a series of discoveries in the arena of quantum dynamics.Moreover, these platforms expand the scope of traditional condensed matter physics and demand new tools and frameworks to study quantum many-body systems that are far from equilibrium (see [1] for a recent overview on quantum simulators and references therein).
In the simplest kind of quench experiment, one prepares an initial state, designs a Hamiltonian or unitary circuit to evolve the state, and measures the final state.While the freedom in the initial state and engineered dynamics largely depends on the specific experimental platform, this general class of experiments certainly raises questions regarding the general behavior of quantum many-body dynamics when the initial state is far from equilibrium.Consider a simple product initial state of qubits, with each qubit in |0 or |1 .Now, let the state evolve under a generic unitary operator.The general expectation is that the state will not remain a product state but will instead become a complicated superposition of product states.
One way to track the complexity is to monitor the buildup of entanglement in the state.Entanglement can be quantified using the tool of entanglement entropy.Given a subregion A of the system, one can obtain the density matrix by tracing out the complement Ā: The entanglement entropy of A is defined as: It is also straightforward to show that S(A) = S( Ā) when the total state is pure.In a quench experiment starting from a product state, S(A) will begin at zero and then grow over time.This growth indicates that the density matrix is becoming more mixed, which corresponds to an increasingly featureless state of subsystem A. Based on statistical considerations, we expect the density matrix to approach a maximally mixed state if the evolving unitary is generic.In other words, at late times, the subsystem thermalizes by entangling with its environment.If we consider a system that conserves energy instead of a generic evolution, the late-time density matrix is expected to approach a thermal state with a temperature determined by the initial state's average energy.
Once ρ A thermalizes, it only depends on macroscopic quantities such as energy or charge, while microscopic information about the initial state is apparently lost [2][3][4][5][6].Two orthogonal initial states with the same energy or charge density would thermalize to the same local density matrix, and so one would be unable to distinguish them locally.On the other hand, the two states remain orthogonal since the dynamics is unitary and are always distinguishable if global information about the two states is accessible.Thermalization and unitary dynamics suggest that the local features of the initial state become increasingly non-local under the dynamics: they flow into more and more non-local degrees of freedom and cannot be recovered by local probes.
To be more concrete, let us consider a system of interacting qubits.Alice owns a qubit and prepares a single-qubit state |a on her qubit to encode a secret message.Initially, Bob can recover Alice's state on one qubit through state tomography.However, as time passes, the qubits interact with each other, and Alice's qubit is no longer in the state |a .Instead, this state becomes shared among multiple qubits, making it more difficult for Bob to recover Alice's initial state |a , even in principle.This process of information flow from local to non-local degrees of freedom is known as quantum information scrambling.Initially studied in the context of black hole dynamics [7][8][9][10], quantum information scrambling has also been extended to general quantum many-body systems [11][12][13] and becomes a finer tool than thermalization to characterize non-equilibrium dynamics.
Quantum thermalization and scrambling are related but distinct concepts.Quantum thermalization describes how a local region of a quantum system loses its initial information under unitary dynamics, while quantum scrambling concerns how the "lost" information flows to non-local degrees of freedom after thermalization.These two processes are characterized by different time scales.The thermalization time of a local region is typically independent of the system size and is determined by the coupling energy scale.On the other hand, the scrambling time, which is roughly the time scale when the initial local information is fully shared among the system, typically depends on the system size and is thus longer than the thermalization time.
In this tutorial, we delve into the topic of scrambling dynamics in quantum many-body systems.We argue that, just as a piece of metal can be characterized by its transport properties associated with electrons, a generic quantum many-body system can be characterized by its transport properties related to quantum information.Our tutorial aims to provide a deeper discussion by building on prior perspectives from related articles [14,15] that offer basic intuition.
The discussion is centered around a quantum information perspective with the overall aim of clarifying in a concrete and widely accessible setting the precise way in which out-of-time-order correlators measure information dynamics.We focus mostly on qubit models, including random circuit models, which are widely studied in the quantum information and condensed matter communities, and, to a lesser extent, in the high energy physics and quantum gravity communities.The upshot of this approach is that we will be able to state very precisely the relationship between information dynamics and out-of-time-order correlators.The cost is that we must omit or be much more schematic about many topics, including semiclassical physics, field theory approaches, connections to black hole physics [9], and much else.One may reasonably conjecture that the basic connection between information dynamics and out-of-timeorder correlators extends to these settings, but a considerably greater background is required to define these models and properly formulate the notion of information dynamics in them (e.g., see [16,17] for a discussion of AdS/CFT models in the same spirit as this tutorial).
The rest of the tutorial is structured as follows: in Sec.II, we examine the fundamental setup of scrambling dynamics using an Alice-Bob communication protocol.In Sec.III, we explore how to quantify scrambling dynamics through entanglement entropy measures, offering basic insight through random unitary dynamics.In Sec.IV, we examine the Hayden-Preskill protocol, a specific instance of the general setup, and demonstrate that the scrambling dynamics can be quantified by the outof-time ordered correlator (OTOC).In Sec.V, we link scrambling and the OTOC to operator dynamics and provide an overview of OTOC in systems with few-body interactions.In Sec.V C, we delve into the behavior of the OTOC in systems with short-range interactions through several toy models.In Sec.VI, we survey numerical methods for calculating the OTOC in general systems.Finally, in Sec.VII, we examine various experimental approaches for measuring the OTOC.

II. BASIC SETUP OF SCRAMBLING DYNAMICS
We begin by specifying our prototype quantum manybody system.For the sake of concreteness, we will consider a system of N qubits.Each qubit has a basis that is spanned by |0 and |1 .On each two-level system, a complete basis of operators can be defined, which consists of the identity and the Pauli operators.These operators are represented by matrices as follows: (3) The total Hilbert space is the tensor product of local Hilbert spaces and has a dimension of 2 N .In some parts of this tutorial, we will generalize the situation to include qudits with a local Hilbert space dimension of q, or Majo-rana fermion systems.However, for now, let us continue to work with qubits.
The dynamics of a given initial state |ψ in the system is described by a unitary time evolution operator given by: where U (t) is currently an arbitrary family of unitary matrices that act on the total Hilbert space.However, we will assign more structure to U (t) later on.The simplest quench experiment involves selecting an initial state |ψ , choosing a dynamics U (t), and selecting a set of observables to measure in the final state U (t) |ψ .Throughout this tutorial, we often use tensor network diagrams to provide a visual representation of equations for clarity.We will now introduce the graphic notation below.In general, a node with legs represents a tensor which is a multidimensional array, and each leg represents an index of the tensor.For instance, with one leg represent a single-qubit state vector, with two legs represents a single-qubit operator or a matrix in general, and has four legs and thus represents a tensor such as a two-qubit operator.In particle, we use a line to represent the identity operator δ ab and a line with a dot for the normalized identity operator δ ab / √ d where d is the dimension of the index.It can also be interpreted as the EPR state, i.e., In a tensor network diagram, the nodes are connected together by joining their legs and summing over the shared indices.For example, (6) represents a matrix-vector multiplication or an operator acting on a state, and the result with one open leg is a vector.

A. Unitary dynamics as a classical communication protocol: the significance of commutators
We will first consider unitary dynamics as an intuitive classical communication protocol to illustrate some of the essential aspects of scrambling dynamics, and in the next section, we will reformulate it in full quantum terms.Consider the scenario illustrated in Fig. 1, where Alice owns one of the N qubits denoted by q A , which she has full control over.Bob owns a set of qubits denoted by B.
Starting with an initial state |ψ , Alice wishes to send a classical bit a ∈ 0, 1 to Bob.Depending on the value of a, Alice either flips her qubit by applying the σ x q A operator FIG. 1. Alice and Bob try to communicate through a strongly interacting system of N qubits.The time evolution of the system is described by a unitary operator U .Alice has full control of the first qubit qA and Bob has access to a set of qubits in the system, but not all of them.
to the state or does nothing.The system then evolves for a time t.Finally, Bob makes a measurement O B of his qubits to attempt to learn whether Alice flipped the spin or not.The expectation value of Bob's measurement given that Alice does not flip her qubit is Alice can also flip her qubit, and then the expectation value of Bob's measurement is where σ x (−t) = U σ x U † is a Heisenberg operator.The difference between the expectation values O B 0 and where we used the operator identity σ x (−t)σ x (−t) = I.
Let us pause to understand the physics.Whenever the difference is small, Alice and Bob need to run the experiment many times to see the difference and communicate as the measurement outcome of each run is probabilistic.In this case, very little information is transmitted from Alice to Bob per run of the experiment since what Bob measures is nearly independent of what Alice did.
Using the Cauchy-Schwarz inequality (we provide a list of useful inequalities in Appendix A 3), we can bound the difference by: where O ∞ is the operator norm defined as the square root of the largest eigenvalue of the positive Hermitian operator O † O.The first inequality is from Cauchy-Schwarz, and the second is from the operator norm's definition.(Definitions of various matrix norms of operators are given in Sec.A 1.) Therefore, the difference in Bob's measurement between Alice flipping her qubit or not is bounded by the operator norm of the commutator [σ x (−t), O B ].This statement is independent of the initial state |ψ that Alice and Bob choose as the medium to attempt to transmit the information.
The bound has a very intuitive picture.At t = 0, σ x only has support on Alice's qubit and does not overlap with Bob's qubits.Therefore, the commutator is zero initially, and no matter what Bob does to his qubits, he cannot tell whether Alice flips her qubit.He can do no better than random guessing when trying to determine Alice's bit.As t increases, σ x (−t) starts to grow as a Heisenberg operator, acting on more qubits.Whenever the supports of σ x (−t) and O B start to overlap, their commutator becomes nonzero, and Bob has a chance to tell whether Alice flipped her qubit.The operator norm of the commutator |[σ x q A (−t), O B ]| ∞ quantifies how quickly σq A x (−t) spreads in the system and starts to overlap with O B .If this operator norm is small, then the overlap is small, and Bob's measurement cannot distinguish between the two cases of Alice flipping her qubit or not.

B. Bound on commutators
A natural first question is whether fundamental bounds exist on the norm of the commutators given t and r, which is the separation between q A and B. There are many possible behaviors for the commutator of local operators in a quantum many-body system.For example, one might expect very different behavior between integrable, non-interacting and strongly interacting models, and between localized and delocalized models.
One is probably familiar with at least one such constraint, namely the limitation on communication imposed by the speed of light.In the modern language of quantum field theory, this is called microcausality.It states that given any two physical local operators W (x) and V (y) located at spacetime points x and y, their commutator must vanish if x and y are 'spacelike separated', In other words, if y is outside of the 'light cone' of spacetime point x, then the corresponding operators must exactly commute.Crucially, this is an operator statement and hence a state-independent bound on information propagation.It is a fundamental property of any unitary Lorentz invariant local quantum field theory.
There is somewhat analogous property for many lattice models which do not have relativistic causality built-in microscopically.As discussed earlier, the commutator is zero when O B is outside the support of the Heisenberg The tensor network representation of unitary circuit with spatial locality built in, the so-called brickwork circuit.(b) Each blue block with four legs (two in and two out) represents a two-qubit unitary gate U q 1 q 2 ,q 1 q 2 .(c) Tensor network diagrams for the identities u † u = I and uu † = I.
operator σ x (−t).Therefore the support σ x (−t) which grows as a function of time serves as an emergent "lightcone" for the lattice models, which describes how fast information can propagate in these systems.To provide an explicit example of the emergent lightcone, let us consider the unitary time evolution operator with a tensor network structure built from local two-qubit unitary gates, as sketched in Fig. 2.This unitary operator describes the time evolution of a spin chain with nearest neighboring interaction, without relativistic causality built in but only locality.Given the brickwork structure of U , the tensor network representation of a Heisenberg operator, say Each blue or orange block represents a local unitary matrix, and the green block represents the σ x at t = 0.The blue block and orange block are unitaries conjugated to each other.As a result, if a blue block and an orange block are directly connected, they are replaced by straight lines, representing identity operators.The ten-sor network after this transformation is The remaining blocks (non-shaded region) form a linear lightcone, visualizing the growth of the Heisenberg operator over time.The effective speed of light is a/∆t, where a is the lattice constant and ∆t is the time scale associated with one layer of the unitary circuit.This is a simple but remarkable result, showing that an effective linear lightcone can emerge from the locality in systems without microscopic relativistic causality.The commutator [σ x (−t), O B ] is strictly zero when O B is outside the emergent lightcone.
There is one caveat to this approach to obtain the speed limit for a static Hamiltonian.For instance, consider a Hamiltonian describing nearest neighboring interactions between qubits.
One can trotterize the time evolution operator exp(−iHt) to the local tensor structure shown in Fig. 2. Each local unitary block takes the form u = exp(−iH r,r+1 ∆t).The tensor structure in Fig. 2 approaches to exp(−iHt) in the limit ∆t → 0. Then the velocity a/∆t from the tensor structure goes to infinity, which is not a meaningful bound.Nevertheless, for discrete models with a local Hamiltonian and a finite local Hilbert space dimension, one can establish a much tighter linear lightcone with finite speed of the commutator using a more sophisticated approach originally due to Lieb and Robinson, which we discuss in appendix E. This bound, now usually referred to as the Lieb-Robinson bound [18], states that for two local operators W (r) and V (r ) at (spatial) position r and r respectively, we have the following bound on the operator norm of the commutator, where a, λ and v LR depend on the microscopic parameters of the Hamiltonian.This inequality, combined with Eq. (10) The Lieb-Robinson bound is independent of the state |ψ and is universally applicable.Its importance lies in proving that information cannot travel super ballistically in quantum many-body systems with short-ranged interaction.However, in many cases, the bound can be quite loose, just like the physical speed of light is a loose bound on how fast an object moves in our universe.Moreover, it does not provide a way to calculate v LR but only proves that it is finite.Therefore we need an operational approach to calculate how fast information spreads for a given system-the information lightcone (we will provide a precise definition of the information lightcone in the next section).For now, we can interpret the information lightcone as the minimal set of qubits, outside which Bob either cannot distinguish Alice's actions at t = 0 or needs exponentially many measurements to do so.This set only contains Alice's qubit q A at t = 0 but grows over time.One of the central goals in studies of scrambling dynamics is to calculate the linear size of this region as a function of t.The Lieb-Robinson bound provides an upper bound R(t) ≤ v LR t for local Hamiltonians.
A key second goal is to determine Bob's optimal measurement to capture the signal sent by Alice, or to find the optimal O B to maximize the difference between O B 0 and O B 1 .In non-interacting or weakly interacting systems, the excitation created by Alice can remain coherent for a long time and propagates with a group velocity determined by the underlying medium.Imagine a wave packet of an electron or a magnon moving through a metal or a magnet.In this case, Bob can easily tell whether Alice made the excitation by performing a local measurement, since the signal (the wave packet) remains local for a long time.On the other hand, in strongly interacting quantum systems, the physics of excitations is typically very different.In fact, such a system often cannot sustain any coherent excitation for very long, unless that excitation has a special reason for being protected, such as a sound mode or a Goldstone mode associated with a broken symmetry.Fig. 3 illustrates this difference between a non-interacting system and a strongly interacting system.We use the circuit in Fig. 3 to represent the non-interacting case, which is built from the two-qubit SWAP gate.Since the SWAP gate simply swaps the two qubits it acts on, Alice's state on the first qubit propagates coherently through the system and always remains on a single qubit which Bob can measure.On the other hand, once the local gate of the circuit is perturbed away from the SWAP gate, the signal generally decays as time increases.
The lack of coherent excitations is a consequence of quantum thermalization.For a strongly interacting system, a time-evolved state becomes as complicated as allowed consistent with macroscopic constraints, such as a fixed total energy.As a result, the state looks thermal locally when t > τ , where τ is a relaxation time.This is The circuit is constructed by replacing the generic two-qubit gate in Fig. 2 with a non-interacting swap gate .In this case, Alice's qubit state propagates through the system coherently and remains on a single qubit (the red line).The signal travels ballistically because of the lightcone structure of the circuit model shown in Eq. ( 13).(b) The maximal signal Bob can obtain from single-qubit measurements.The x axis labels the qubit.Each line represents the signal at a different time with an offset.In the case of circuits from SWAP gates, the signal (red curves) remains maximal and propagates through the system.On the other hand, when we perturb the SWAP gate by multiplying it by another unitary gate close to identity, the signal (blue curves) decays quickly.
to say, for any local operator O, we have where β only depends on the average energy ψ| H |ψ of the initial state |ψ .Technically, a finite system will even-tually revive to its initial state after a very long (doubleexponential) time scale [19,20], which we do not consider here.
Let us go back to the communication protocol between Alice and Bob.Since |ψ 0 and |ψ 1 = σ x q A |ψ 0 only differ by a single-qubit flip, they have the same energy density.Therefore, we expect the two states to thermalize and look the same locally after a time τ .Now imagine Bob is trying to tell what Alice did by performing local measurements in a region some distance away from Alice's qubit q A .At early time, the information lightcone has not arrived at Bob's region yet, and by definition, Bob can do nothing to tell the difference between the two states.Later, when the information lightcone reaches Bob's region, the time is well past the relaxation time τ , so Bob still cannot tell the difference between the two states since they now look the same locally.Therefore, unlike in the non-interacting case, Bob cannot tell what Alice did using only local measurements when the system thermalizes.
It is not plausible that information has simply stopped spreading in the strongly interacting system, but it becomes inaccessible to local measurements.Said differently, it may be very difficult to transmit information in the Alice-Bob communication protocol coherently.However, information is still spreading and Bob needs an approach to recover it.We know at least one approach; Bob can perform the measurement using the Heisenberg operator σ z 0 (−t), which is highly non-local.Then it is equivalent to measuring σ z at t = 0, and Bob can easily tell what Alice did.However, this approach might not be optimal since σ z 0 (−t) can be very complex and its support contains Alice's qubit as well.
The discussion so far was framed in terms of classical information -whether Bob can tell if Alice flips her qubit or not.A stronger version is about transmitting quantum data.One can ask the following question.Given that Alice initially prepares an arbitrary quantum state on her qubit q A , is it possible for Bob to recover that quantum state on one of his qubits, denoted q B , following some decoding procedure, after the system is evolved by some time t?
The takeaway of this section is that information spreading is intimately related to commutators of local operators and its speed is upper bounded by the possibly very loose Lieb-Robinson bound.This section also raises two central questions regarding quantum information dynamics in strongly interacting quantum many-body systems: • How to detect the information propagation?
• How to recover the information?

III. QUANTUM INFORMATION FORMULATION
In this section, we formulate the general problem of information in fully quantum terms.Recall that Alice initializes one qubit, q A , into an arbitrary quantum state.The medium that Alice and Bob share then evolves by the unitary U .Finally, Bob's goal is to apply some decoding operation to isolate the quantum information |a that Alice originally encoded in qubit q A , |a In general, Bob's decoding can be a very complex quantum operation.We will discuss examples of Bob's decoding in Section IV, but let us first understand how to determine which qubits Bob needs to control to decode Alice's information in principle.In other words, let's understand how to track where the quantum information is before the decoding.Remarkably, this task can be accomplished just by following a special kind of entanglement with an auxiliary reference system, as we next explain (see also Section IVB).

A. Entanglement spreading
Consider a system containing N qubits whose dynamics is described by a unitary matrix U (t).We pick two orthogonal initial states, |ψ 0 and |ψ 1 that differ by a spin flip at Alice's qubit, where |0 and |1 are two orthogonal states in some basis on Alice's qubit q A , and |ψ is the state of the remaining system.These two states correspond to the two possible initial states in the Alice-Bob communication protocol in Sec.II.Next, introduce a new auxiliary qubit called reference R. The unitary dynamics U does not act on the reference qubit, which one may think of as sitting in an isolated box.Before isolating the reference, however, the reference qubit is entangled with the system through Alice's qubit q A .The initial composite state of the system and the reference is The reference R and Alice's qubit q A form a Bell pair and are maximally entangled.The time evolution of the state is given by or, graphically, where the black dot represents the EPR state Given the initial state, we may probe the information dynamics by tracking the entanglement between the system and the reference as a function of time.Initially, the reference is only entangled with the first qubit q A .This entanglement can be diagnosed using mutual information between R and q A .First, define the Von Neumann entropy of a set of qubits A, where ρ A is the reduced density matrix of A. Then the mutual information of A with B is Using subadditivity and the triangle inequality (see appendix A 3), one can show that 0 ≤ I(A : B) ≤ 2 min(S(A), S(B)) For R and q A , it is straightforward to show that at t = 0, S(R) = S(q A ) = 1 and S(R ∪ q A ) = 0 from the fact that they form a Bell pair initially.Therefore initially, I(R : q A ) = 2, which is the largest possible value, indicating maximal entanglement between q A and R. Correspondingly, at t = 0, the mutual information between R and any other set of qubits is zero.
Starting from the initially localized entanglement, one should expect the entanglement with the reference R to expand out across the system in some fashion.One possibility is that the entanglement is carried in some coherent wavepacket throughout the system, remaining localized in space at any given time.This can occur under the right conditions in non-interacting systems, for instance, in the SWAP circuit shown in Fig. 3.However, with strong interactions, the entanglement seems likely to spread and become more complex [21].In other words, while at time zero the reference is entangled with one qubit, as time progresses, the reference will instead become entangled with a complex collection of many qubits.This process can be quantified by the mutual information between the reference qubit and certain regime B in the state.
For example, one can choose B to include the first l qubits q 1 • • • q l in the system.Then, the mutual information between R and B is (25) At t = 0, the reference is entangled with q A contained in the first l qubits.Therefore I(R : q 1 • • • q l ) = 2 for all l.As time increases, fixing l, I(R : q 1 • • • q l ) decreases once the entanglement leaks out the first l qubits.This behavior is illustrated in Fig. 4, where I(R : q 1 • • • q l ) for 1 ≤ l ≤ N are shown for the mixed-field Ising model containing 22 spins (See Eq. ( 111)).Pure initial states are used here.Notice the parallel curves for l < N/2.This is the first concrete example of the ballistic propagation of information in a strongly interacting system in this tutorial.Also, notice that the late-time value of the mutual information increases with l and approximately stays at the maximal value of 2 for l > N/2.This is a signature of a strongly interacting system, which we will see again soon.

B. Communicating quantum information
How is entanglement spreading related to the Alice-Bob communication protocol?The description above might have already hinted at some similarities.We will now make the connection more precise.We consider a region B of the system and denote its complement as B. The mutual information between the reference R and B is which is between 0 and 2. Since the state |Ψ consisted of R, B and B is pure, S(RB) = S( B). (See Sec.A 2. Therefore The mutual information can also be cast into a form of relative entropy as follows, ) Therefore I(R : B) is a measure of the difference between the density matrix ρ(R)⊗ρ(B) and ρ(RB).These density matrices depend on the two orthogonal states |ψ 0 (t) and |ψ 1 (t) in the Alice-Bob communication protocol, where FIG. 4. (a) Entanglement spreading characterized by the mutual information I(R : q1 • • • q l ) between the reference qubit and first l qubits in the system of N qubits.Each curve represents the mutual information for l ranging from 1 to N .Here N = 22.For l < N/2, each curve remains at the maximal value 2 until the information leaks out and then decays to zero.The parallel curves are a signature of ballistic propagating quantum information.As l exceeds half of the system, the mutual information does not decay, indicating that one can recover the initial entanglement between R and the first qubit if one has access to more than half of the system.(b) The late-time saturation value of the mutual information in (a).It undergoes a sharp transition from 0 to 2 as l passes N/2, indicating that one can recover the initial entanglement between R and the first qubit if one has access to more than half of the system.The late-time value also agrees with the random matrix calculation in Eq. ( 40) Now we discuss the implication of the minimum and maximum of I(R : B) on the Alice-Bob communication protocol.From the quantum relative entropy, I(R : B) is zero only when ρ RB = ρ R ⊗ ρ B , leading to the condition ρ 0 (B) = ρ 1 (B) and ρ 01 (B) = 0.As a result for arbitrary operator O B within region B, indicating no operators in B can distinguish the two state.To make the implication of these conditions on the state |ψ 0 and |ψ 1 manifest, we perform a singular value decomposition on both states between regime B and B, The conditions ρ 0 (B) = ρ 1 (B) and ρ 01 (B) = 0 are equivalent to An arbitrary superposition of |ψ 0 and |ψ 1 has the same density matrix in the region B and thus is the same to Bob.
In the opposite limit, I(R : B) takes the maximal value 2 if and only if I(R : B) = 0.This leads to the same condition in Eq. ( 32) with B replaced with B. As a result, ρ 0 | act on orthogonal states in the Hilbert space of B, and |ρ 1 (B) − ρ 0 (B)| = 2, indicating maximal difference between the two states in region B. In this case, in principle, the optimal operator O B that differentiates the two states can be constructed as What happens when I(R : B) is small but finite?By the quantum Pinsker's inequality, we have, This can be further applied to upper bound any connected correlation between R and A, Applying this inequality to all Pauli operators O R leads to 1 Therefore, the smallness of I(R : B) prevents Bob from distinguishing the two states.

C. Quantum mutual information from random unitary dynamics
As we have seen from comparing the non-interacting (Fig. 3) and strongly interacting (Fig. 4) cases, the quantum mutual information between the reference R and a certain region of the system B I(R : B) depends on the unitary operator U that governs the dynamics.To gain some intuition for I in a strongly interacting system, we next consider a simple toy model where U is taken to be a random unitary operator drawn from the Haar measure.
It is important to understand that a random unitary matrix is a generic matrix acting on the Hilbert space and does not obey a local structure sketched in Fig. 2. Nevertheless, random unitary dynamics is a good starting point to think about quantum information dynamics.It can be used to mimic the local unitary dynamics in the late time regime where the initial entanglement between the reference and the system is fully scrambled over all degrees of freedom.

Pure State
To proceed, we consider the Rényi version of the mutual information, where S (2) stands for the Rényi entropy (See definition in the Appendix Eq. (A7)).We start with the case without the memory, where the combined state of the reference R and the system is pure.Recall that the time evolved state is given in Eq. (20), Of course, the Rényi entropy of any particular U is in principle a complicated function of U , but what is analytically tractable is an average of the Rényi entropy over all U sampled from Haar measure.The averaged Rényi entropy captures well the Rényi entropy of a particular U for large systems because the system size strongly suppresses the variance due to quantum typicality [22].The Rényi entropy averaged over Haar random unitaries can be calculated using the identities in Eq. (A15).We have, where E indicates averaging over the Haar ensemble.Since there is no notion of locality associated with U Haar , the results only depend on the size l of the region B but not its location in the system.We have taken the limit that N → ∞ in the last line.The entropy S(RB) is the same as that of the complement of B S( B) and can be obtained by replacing l with N − l in S(B).
Putting these results together, we obtain the Rényi mutual information as where l is the number of qubits in the region B. The mutual information is 1 when l = N/2, i.e., the region B occupies half the system.It increases to its maximal value of 2 as l exceeds N/2 and decreases to its minimal value of 0 exponentially when l is less than N/2.This result indicates that when the initial information is fully scrambled, any portion of the system less than half does not contain the initial information.On the other hand, any portion larger than half is maximally entangled with the reference spin R and can be used to recover the initial information.This is in sharp contrast with the non-scrambling SWAP circuit shown in Fig. 3 where the information is located at a specific qubit at a given time.
The change of the mutual information at l = N/2 can be understood as follows.The system contains two disconnected regions with l < N/2, but the reference spin cannot be simultaneously maximally entangled with two non-overlapping regions, otherwise it would violate the monogamy of entanglement.As a result, the mutual information drastically reduces when l < N/2.
This result is also valid in the late-time regime of local unitary dynamics or even Hamiltonian dynamics.As shown in Fig. 4(b), for the mixed-field Ising model, the late time value of mutual information for different regions obeys the random unitary calculation, drastically increasing from 0 to 2 when the region considered exceeds half of the system.In the large N limit, I increases from 0.033 to 1.967 in a window of 6 qubits around N/2.

Mixed State
This setup can be generalized to the case where the two initial states of the system are mixed.Similar to the previous case, we pick the two initial states where ρ is the density matrix of the system, excluding the first qubit.Then we introduce the purification of the density matrix ρ by including an additional auxiliary system called memory.The purification of ρ, denoted | √ ρ , is a pure state living in the Hilbert space of the system (excluding the first qubit) and the memory.It has the property that tracing out the degrees of freedom in the memory gives back the density matrix ρ, FIG. 5.In a fully mixed initial state setup, after applying the Haar random unitary to the system, the mutual information between the reference and the system excluding a few qubits, I(R : SYS − E) (blue curve), decays exponentially with the number of qubits excluded.On the contrary, the mutual information between the reference and the memory plus a few qubits in the system, I(R : SYS + E) (red curve), saturates to the maximum.
Alice's qubit still forms a Bell pair with the reference.The entire state, including the reference, the system and the memory, is This state is very similar to Eq. ( 19).The difference is that | √ ρ contains degrees of freedom of the memory that the time evolution operator does not act on.The time evolution of the state is A similar calculation of the averaged Rényi entropy now yields Therefore the mutual information is (47) where s(ρ) is the Rényi entropy of ρ in the initial state.It reduces back to Eq. ( 40) when the initial state is pure, i.e. s = 0.When the initial state is mixed and has finite entropy, I(l) drastically increases from 0 to its maximal value 2 at l ∼ (N + s)/2.The behavior of I(l) for l near (N + s)/2 is independent of s in the large N limit and is already shown in Fig. 4(b).This result suggests that one now needs more than (N + s)/2 qubits to recover the initial information.

Maximally mixed state
In the special case when the initial state is maximally mixed, s(ρ) = N − 1 (the first spin is entangled with the reference spin).In this case, I = log 2 (1 + 3 × 2 2l−2N ) and equals 2 when l = N , i.e. the entire system.I (2) drastically decreases as the accessible region contains fewer qubits.These results also apply to the late-time regime of unitary dynamics.Denote the part of the system that Bob does not have access to as E, and the number of qubits in this region as |E|.We have This behavior is plotted in Fig. 5.In this special case, the Rényi mutual information can be used to bound the Von Neumann entropy because the Rényi entropy S (2) (SYS − E) and S (2) (R) are maximal and equal their Von Neumann counterpart.Since Combining with I(R : SYS) = 2, this indicates that recovering the initial information requires accessing the entire system when the initial state is maximally mixed, in contrast with half of the system for pure initial state.
Another closely related diagnosis of scrambling in this special setup is the tripartite mutual information [12,23] given by which quantifies how much information is not encoded in E and SYS − E but in their union.The last term is 2 for all times.For a scrambled system, the first two terms are zero, the tripartite mutual information takes the minimal value -2.

IV. HAYDEN-PRESKILL PROTOCOL: DETECTING AND RECOVERING THE INFORMATION
Having understood where the information is located at late time, let us now understand how to recover the information.We start with the fully mixed case as this was the setup first studied in the Hayden-Preskill protocol [7], which was originally proposed for black hole dynamics.In the black hole problem, the memory is supposed to describe previously emitted Hawking radiation which is entangled with the remaining black hole.The maximally mixed case then corresponds to the entropic midpoint of the black hole's evolution.The question posed in Ref. [7] was: if another qubit is thrown into the black hole, then how long does one have to wait until the information in that qubit is available again in the subsequent radiation.The result of the prior section is that when Bob has access to the memory (the early radiation), then the information in the fresh qubit quickly becomes available again.
As shown in Fig. 5, when the accessible region of Bob is SYS−E, the mutual information drastically reduces to zero as the number of qubits in E increases, indicating that Bob needs the entire system to recover the information.What happens when Bob also has access to the qubits in the memory?The memory is only coupled to the system by the initial entanglement.The unitary dynamics never mix the system and memory.As a result, one expects that the memory does not contain any information about the reference, indicated by zero mutual information.On the other hand, since the quantum state on the composite system, including the reference, system and memory, is pure, we have Thus according to Eq. ( 49), I(R : MEM + E) saturates to the maximal value 2 exponentially fast (Fig. 5, red curve).In other words, remarkably, Bob can recover the initial information provided that he has access to the full memory, which does not contain any information about the reference (I(R:MEM)=0 for all time), and a few qubits E in the system.

A. Detecting the information front
The Hayden-Preskill protocol also provides an operational way to measure information dynamics.Let us still consider the setup with fully mixed initial state.The unitary operator U governing the dynamics now may have more structure, such as locality, instead of Haar random.Under time evolution, the mutual information between the reference R and the system SYS is always maximal, and the mutual information between the reference and the MEM is always 0. Initially, R is maximally entangled with the first spin.As discussed previously, when the unitary operator is Haar random, the entanglement instantly spreads over the entire system.The mutual information between the reference to any subregion of the output, even excluding a few qubits, becomes almost 0. Now suppose the unitary operator has a local structure built in, so that the entanglement spreads in a timedependent manner, as shown in Fig. 4. In order to capture the profile of the entanglement spreading, a natural approach is to trace the mutual information between the reference and the first l qubits in the output I(R : q 1∼l ), which increases monotonically with l.The front of the spreading at a given time is determined by the largest l for which I(R : because it implies that the information has begun to leak out the first l qubits.This approach is a conceptually straightforward application of the definition but is challenging for experiments since I(R : q 1 • • • q n ) involves the entanglement entropy of a nonlocal region.Alternatively, inspired by Hayden-Preskill, one can measure the mutual information between the reference and the memory plus one qubit in the system I(R : MEM ∪ q n ).A finite value indicates that the initial information has reached the lth spin.Therefore the front of the entanglement spreading can be determined by the largest n that I(R : MEM ∪ q n ) > 0.
The mutual information used in the second approach can be estimated by local measurement, as we discuss below, and is much more accessible in experiments.Now let us look into the mutual information I(R : MEM ∪ q n ) more closely.From the definition of I, we have In the second equality, we used that R ∪ MEM ∪ q n and SYS − q n are complementary regions in a pure state.Because the fully mixed state is used in the setup, S(R) = 1 and S(SYS−q n ) = N −1.Therefore the mutual information only depends on the entanglement entropy of B ∪q n , Initially, S(MEM ∪ q n ) takes the minimal value for all qubits q n except the first one.As time increases, its deviation from the minimal value signals that the information has reached the nth spin.Since the Von Neumann entropy upper bounds the Rényi entropy, we have, As a result, S (2) can be used as an indicator for the front of quantum information propagation.This result is particularly nice, since S (2) can be related to correlation functions that are more accessible than the von Neumann mutual information, as we show below.Also see experimental schemes in Sec.VII.
Since in the case we consider here, the qubits q 2∼N are in a fully mixed state, we can choose the simplest purification where the memory contains N − 1 auxiliary qubits that form N − 1 EPR pairs with the N − 1 spins in the system.The time evolved state and the density matrix ρ(MEM ∪ q n ) is From the density matrix, we can obtain the purity tr(ρ The operators V and W are summed over local Pauli operators (including the identity) on q 1 and q n .The second equality uses the completeness relation of Pauli operators in Eq. (A14).When W 1 or V n equals the identity, the trace contributes 1 to the sum.Separating these terms from the others, we get, The purity becomes a sum of correlators between timeevolved local Pauli operators.The Rényi entropy is just − log 2 tr(ρ 2 ).From Eq. ( 58), we can bound the mutual information by the correlators, We emphasize that this inequality applies to any unitary U .Each term in the summation has a maximum value 1, in which case the right-hand side takes the minimal value 0. This happens when the Heisenberg operator W 1 (−t) commutes with the operator V n for all V and W .When W 1 (−t) and V n start to overlap, the correlator decreases from 1.As a result, I(R : MEM∪q n ) is nonzero, indicating that the information has reached q n .At late time, all the terms decay to 0 and the right hand side becomes 4−log 2 7, consistent with Eq. ( 52) when |E| = 1.
Because the kind of correlator appearing on the righthand side of Eq. ( 59) can be used to diagnose the information propagation, it deserves a name.In the literature, it is referred to as the out-of-time ordered correlator or OTOC, usually written as The time argument is changed from −t to t for simplicity.Larkin and Ovchinnikov first introduced OTOC in the context of superconductivity [24].It has gained extensive renewed interest recently due to the connection to scrambling dynamics discussed here as well as quantum many-body chaos in the semi-classical regime (see a short discussion in the epilogue in Sec.VIII.Several remarks are in order.First, we emphasize that the OTOC is only an indicator of information propagation and scrambling.When W 1 (t) and V n do not commute, it indicates that the information front has reached the qubit n but does not mean that one can recover Alice's initial action by measuring the nth qubit.In fact, we have seen from the random unitary calculation that it requires all the qubits in the system, or the entire memory plus a few qubits with nonzero commutators, to recover the information.Second, the squared commutator as an indicator of the information propagation is tied to the simple initial state where ρ(q 2 ∼ N ) is fully mixed, namely the Hayden-Preskill setup.When the initial state is not fully mixed, a precise relation between the mutual information, which is the defining characterization of information propagation, and simple correlation functions such as OTOC is not established.One can even show that the OTOC overestimates how fast information propagates for some other initial states.

B. Recovering the information: many-body teleportation
This section discusses the second question of quantum information dynamics on how to recover the initial information.In Sec.III B, we showed that maximal mutual information between the reference and Bob's qubits indicates that Bob can distinguish Alice's action on the initial state using the operator constructed in Eq. (34).However, in practice, the optimal operator O B is nonlocal and challenging to construct.It will be ideal if the initial state of Alice's qubit q A reappears on one of Bob's qubits after Bob follows a specific decoding protocol on his qubits.This is called many-body teleportation [7,[25][26][27].

Many-body teleportation works if
for any state |a to be teleported, where q A is Alice's qubit, q B is one of Bob's qubit, |ψ is the initial state on the qubits except q A and |φ is the final state on the qubits except q B .This means that Alice would be able to teleport a bit of quantum information to Bob through a strongly interacting medium described by U , which fully scrambles her information into the entire system.The key point here is that Bob only owns part of the qubits so he cannot trivially unscramble the information by applying U † .Graphically, the condition for successful teleportation is ) Denote the quantum state after the decoding |Ψ out , the fidelity of teleporting the state |a to the qubit q B is defined as When the fidelity averaged over Alice's state E(F (|a ) is 1, it indicates that the system is able to teleport any quantum state with perfect fidelity.The averaged fidelity can be obtained by sampling Alice's state from the action of a random unitary u a on a basis state |0 , where Recall the boxed setup with the reference R used in Eq. ( 20) and (44).F EPR is just the fidelity that the reference R, initially forming an EPR pair with Alice's qubit q 1 , forms an EPR pair with one of Bob's qubits after the unitary transformation and decoding.F EPR = 1 indicates perfect teleportation.In this case, the mutual information between the reference and Bob's qubit q B is 2. The necessary condition for the perfect teleportation is that the reference and all of Bob's qubits I(R : B) is 2, as discussed in Sec.III.The purpose of the decoder is to concentrate the entanglement with reference to a single qubit q B .Even given maximal mutual information I(R : B), it is still in general challenging to design the decoding protocol.

Conventional teleportation
Before discussing the decoder for the Hayden-Preskill protocol, it is useful to review conventional quantum teleportation.Alice has a qubit encoding the state to be teleported |a .Alice has an additional qubit that forms an EPR state with Bob's qubit.To teleport the state |a to Bob's qubit, Alice first measures her two qubits in the Bell basis.The measurement projects the two qubits into one of the four Bell states, Those states obey that where the coefficient 1/2 is the square root of the probability of this Bell measurement outcome.Changing |Z and σ z for the other three outcomes, we obtain similar final states, each outcome with probability 1/4.In all cases, the state |a appears on Bob's qubit with perfect fidelity.Taking all the four measurement outcomes into account, the final state is a mixed state 1 4 I Alice ⊗ |a a| Bob , in which Alice's two qubits are in the fully mixed state and Bob has Alice's original state.One can also show that if we introduce an addition reference qubit forming an EPR pair with Alice's first qubit and perform the same protocol, the reference will form an EPR pair with Bob's qubit in the final state with fidelity F EPR = 1.

Many-body teleportation
Now we are ready to discuss the decoding protocol for the Hayden-Preskill protocol [25].Recall the setup for the Hayden-Preskill protocol.The system contains N qubits.The first one forms an EPR pair with the reference R, the remaining N − 1 qubits form N − 1 EPR pairs with another N − 1 auxiliary qubits in the memory, which Bob owns.In addition, Bob also owns a set of qubits E in the system.In the Sec.IV, we showed that I(R : B) approaches 2 exponentially fast as |E| increases when the system is time evolved into the late-time regime and fully scrambled, therefore a decoding protocol is in principle possible.The question now is how to design a decoding protocol on Bob's protocol so that R forms an EPR pair with one of Bob's qubits.The quality of the decoding protocol can be characterized by F EPR .In the Hayden-Preskill protocol, the state before the decoding is Yoshida and Kitaev [25] found two decoding protocols for this state, one probabilistic and one deterministic.The probabilistic decoding protocol goes as follows.
1. Bob takes another two qubits, q 1 and R and prepares them in an EPR state.
2. Bob applies the unitary operator U * to MEM ∪ q 1 .
3. Bob performs Bell's measurement on each qubit in E and its partner in MEM, with which it forms an EPR initially.
4. The entire protocol is repeated including preparing the state in Eq. ( 68) until the outcome of all the Bell measurements is the EPR state |I .
After these steps, the reference R and R , one of Bob's new qubits, would have high fidelity to form an EPR pair.This implies that in the case without the reference, any state injected into q 1 initially has a high fidelity to reappear on R , the other Bob's new qubit.This protocol is probabilistic because in step 3 Bob needs to postselect the EPR pairs from the Bell measurements.To understand this decoder, let us calculate the probability of successful postselection and the fidelity F EPR (R, R ) given successful postselection.The probability of successful postselection is Notice that the diagram is the same as that in Eq. ( 57) for calculating ρ 2 (MEM ∪ E).Using S (2) (R) = 1 and The probability of the postselection is directly related to the Rényi mutual information between R and Bob's qubit before the decoding.Given the successful postselection, the fidelity that R and R' form an EPR is When the unitary U is fully scrambling, such as a Haar random unitary, the Rényi mutual information can be obtained from Eq. ( 48) as Substituting it in the equation for F EPR , we get F EPR approaches 1 exponentially fast as |E| increases, indicating perfect teleportation fidelity given successful postselection for fully scrambling unitary time evolution.
In general, since tr 2 ρ 2 (MEM ∪ E) can be written as the sum of OTOCs as shown in Eq. ( 57), the fidelity F EPR is also directly related to OTOCs as ) where V 1 and W E are summed over all local operators, including the identity, in the first qubit and E, respectively.In the fully scrambled regime, the OTOC is 1 if either V 1 or W E is the identity and 0 otherwise, and we get Eq.( 73) back.We see that OTOC not only provides a tool to detect information propagation but is also directly related to the fidelity of information recovery.
The above decoding protocol is probabilistic.Bob has to postselect the state on E ∪ E to be the EPR state after the Bell measurement.The probability ∆ is directly related to the Rényi mutual information between the reference and Bob, ∆ = 2 −I (2) (R:MEM∪E ).The optimal fidelity F EPR requires the minimal ∆, which is 1/4 for teleporting a single qubit and 4 −n for teleporting multiple qubits.To overcome the small successful postselection probability, Yoshida and Kitaev [25] also designed a deterministic decoder that does not require postselection but only unitary transformation on Bob's qubits.The general idea is to perform a Grover search on Bob's qubits.After a series of unitary transformation, the states in which E does not form EPR with E is canceled due to destructive interference.
The above discussion assumes that the reference, system, and memory form a closed system and the dynamics is unitary.It would also be interesting to study how dissipation and measurement affect the recovery fidelity [28,29].

V. MICROSCOPIC PHYSICS OF OTOCS AND OPERATOR GROWTH
A. Relating OTOC to operator dynamics In previous sections, we established that OTOCs can be used to track the front of the information dynamics, and they are directly related to the fidelity to recover the initial information.In this section, we discuss the microscopic physics of OTOCs regarding the growth of Heisenberg operators [30,31].
We consider a quantum many-body system whose dynamics are governed by a unitary operator U (t).We use W 0 (t) = U † (t)W 0 U to represent the time-evolved Heisenberg operator, which is located at the origin at t = 0, and use V r as a static local operator at site r.Then the OTOC between W (t) and V r can be written as The space and time dependence of F (r, t) has a very intuitive picture based on operator growth, sensitive to whether the support of W (t) overlaps with that of V r .The connection between OTOC and operator growth can be made more explicit by introducing the squared commutator (76) which is proportional to the Frobenius norm of the commutator [W 0 (t), V r ] and thus always positive.One can easily show that The last two terms are local observables that typically relax to a constant quickly due to thermalization.Therefore, C(r, t) and F (r, t) have the same behavior after a thermalization time.In particular, if both W and V are unitary, the sum of the last two terms is 2. Furthermore, when W and V are Hermitian, F (r, t) is real.Therefore, when W and V are unitary and Hermitian, e.g., Pauli operators, C(r, t) = 2 − 2F (r, t).
The squared commutator manifestly depends on the "size" of the Heisenberg operator W (t), the number of degrees of freedom that W (t) acts on.At t = 0, W (t) only acts on a simple site and commutes with any V r that is far away, so C(r, t) = 0. (In a fermionic system, if the operator W and V are both fermionic, one should consider a squared anti-commutator instead of a squared commutator.)As time increases, W (t) becomes more and more non-local and starts to overlap with V r , indicated by the increase of C(r, t).Varying V r for different r, C(r, t) remains small if V r is outside the support of W (t) and large otherwise.Therefore C(r, t) probes the size of the Heisenberg operator W (t) at a given time.This is consistent with the discussion in Sec.III that the OTOC tracks the lightcone of information dynamics for the infinite temperature state.
To obtain a more precise understanding, it is useful to think about the growth of the Heisenberg operator W (t) in a complete basis of operators {S}.The basis obeys the following normalization and completeness condition The conventional choice of the operator basis for qubit systems is the Pauli strings, which are products of Pauli operators or the identity operator on every site, where s = 0, 1, 2, 3 stands for I, σ x , σ y and σ z .Note that there are in total 4 N different Pauli strings for N qubits.
The Heisenberg operator W (t) can be expanded in the basis Without loss of generality, we fix the norm of W (t) to be Since the time evolution is unitary, the normalization stays the same and S |α(S, t)| 2 = 1 for all time.Hence, |α(S, t)| 2 can be interpreted as a probability distribution of the operator.We also define a complete local operator basis at site r, denoted as S r .In qubit systems, the local basis includes three Pauli operators and the identity operator.The generalization to qudit systems and fermion systems are discussed in the appendix Sec.B. In the general case, the local Hilbert space dimension is q, and there are q 2 local operators in the local operator basis.
Using the completeness relation, one can show that the average OTOC between W (t) and all S r which measures the probability that W (t) acts on site r as the identity operator, according to the probability distribution |α(S, t)| 2 .Similarly, the average squared commutator is 1 which is complementary to the average OTOC.Eq. ( 82) establishes a precise connection between OTOC and operator probability.In the summation in Eq. ( 82) and ( 83), the term with S r = I does not have dynamics and can be separated from the other terms.Therefore we define the following average OTOC and squared commutator Starting with a simple local operator, the time evolution of the operator probability distribution can be very different between generic interacting systems and noninteracting systems.For instance, in non-interacting fermionic systems, a single-particle operator always remains single-particle.On the contrary, in interacting systems, a local operator tends to become as complex as possible in the late time.Maximal complexity for an initial traceless operator implies uniform distribution over the operator basis S except for the identity.The identity operator is special since it is static under unitary time evolution.Therefore in systems with L qudits with local dimension q, we have Based on Eq. ( 84), the late-time operator probability distribution determines the late-time value of the average OTOC and One can obtain the same late-time values of OTOC between W (t) and a single local operator S r using Haar random unitary as the time evolution operator.These late-time values suggest that an operator becomes maximally complex and can be used to distinguish generic interacting many-body systems from non-interacting systems.We also note that the discussion above assumes no symmetry present.We will briefly discuss how symmetries impact scrambling dynamics in the Epilogue VIII.

B. Scrambling dynamics in geometrically local systems
In Haar random unitary dynamics, there is no notion of space and time since F(r, t) reaches its final value in a single step for every r.In contrast, a physical manybody system only contains few-body interactions, such as spin-spin interactions or interaction terms involving four fermionic operators.A generic physical Hamiltonian of N sites is where H b acts on a set of sites by b and J b is the coupling strength that generally can be time-dependent.For example, in spin chains with the nearest neighboring interaction, b labels each bond.In general, one can regard a quantum many-body system as a hypergraph, in which each site defines a vertex and each term in the Hamiltonian H b defines a hyperedge e b to be the set of vertices involved in H b .This hypergraph completely determines the connectivity of the system.One can also generalize this description to unitary circuits, which are a tensor product of few-body unitaries, From OTOC one can define a time scale t * called scrambling time, at which F(r, t) relaxes to the final value ∼ 0 for all sites r, given a Heisenberg operator W (t) that is initially localized at one site.A natural question in this general setup is how the scrambling time t * depends on the system size.The time scale is determined by the hypergraph's connectivity and the coupling strength J b .
While the complete answer to this question is not known, there exists extensive literature tackling specific regimes.We also note that although this general definition of a physical quantum many-body system seems obscure and unnecessarily complex from a conventional point of view, there currently exist experimental schemes allowing for tuning the connectivity between microscopic degrees of freedom [32], making quantum many-body systems with general graph structure a physically relevant topic.
To make our discussion tangible, let us restrict to cases where H b only acts on two sites, describing a spin-spin interaction, for instance.In this case, the hypergraph reduces to a graph.One can imagine arranging the N qubits in a chain.The Hamiltonian can be written as The strongest connectivity occurs when the graph is a complete graph where each qubit connects to every qubit with approximately the same interaction strength J (exactly the same coupling between all pairs makes the model integrable and not scrambling).In such all-to-all connected models, it is typically found that the scrambling time scales logarithmically with N , t * ∼ log N , provided the couplings J b are normalized to give an extensive-in-N energy.Furthermore, the system is approximately permutation invariant in this case, so all qubits are equivalent.Therefore, C(r, t) does not have spatial dependence.Many calculations [11,[33][34][35] have shown that in the large N limit, C(r, t) in the early time takes an exponential growth form where λ is called the Lyapunov exponent related to the coupling strength J.Here C(r, t) becomes O(1) when t ∼ log N , setting the scrambling time scale.It is suggested that this is the fastest scrambling time scale for physical systems with a proper normalization of J rr [8,36].
Significant research interest has been put into imposing bounds on the Lyapunov exponent to understand how fast quantum many-body systems can process information.The chaos bound [11] concerns a finite temperature version of the OTOC and shows that λ L < 2πT in a quite general setting, leading to extensive studies on understanding and refining the bound and on operator growth in general, especially at finite temperature, e.g.[30,31,[37][38][39][40].We will briefly comment the notion of OTOCs at finite temperature and their connection to many-body quantum chaos in the Epilogue VIII.
Moving away from all-to-all connected models, one should expect the scrambling time to increase as one reduces the connectivity since it will take longer for a local perturbation to spread over the system.One approach to reducing the connectivity is to require that J rr decreases as a function of the distance |r − r | (Also see discussion about scrambling on sparse graphs [41][42][43]).As a result, C(r, t) generally develops a space-time profile, from which one can define an information lightcone by considering a contour of C(r, t).The contour specifies a function r(t) that describes how fast information propagates.The information propagation largely depends on how the interaction decays in space.An interesting case arises when J rr decays algebraically as a function of |r − r |, J rr ∼ 1/|r − r | α .One can show that as the interaction becomes more and more short-ranged, C(r, t) deviates from the fast scrambling behavior [44].As α increases, the asymptotic form of r(t) undergoes a series of transitions from exponential r ∼ exp(t η ) (η > 0) to algebraic r ∼ t ξ (ξ > 1) and eventually when α > 1.5 becomes linear t ∼ r [45,46], indicating that information transport slows down from super-ballistic to ballistic.When α → ∞, the interaction becomes short-ranged and the usual Lieb-Robinson bound in Eq. ( 15) applies, restricting r(t) to be at most ballistic [47][48][49].
Another line of research investigates how the presence of quenched disorder and localization affects the information lightcone [50][51][52][53][54][55][56][57][58][59].Based on a phenomenological model called the weak link model [56], it was shown that starting from a model with short-ranged interaction, increasing the disorder strength impedes the information propagation and drives a series of transitions of the lightcone function from linear r ∼ t to algebraic r ∼ t ξ (ξ < 1) and eventually becomes logarithmic r ∼ log t when the system becomes many-body localized.Similar transitions are found in quasi-periodic systems as well [60,61].As this high level overview makes explicit, information scrambling in quantum many-body systems displays a rich set of behaviors dependent on both connectivity and disorder.In the next section, we will focus on the case with short-ranged interaction where the information lightcone is linear and discuss the behavior of OTOCs in more detail.Furthermore, we will show explicitly how OTOCs are calculated in a brickwork random quantum circuit.

C. Scrambling dynamics in systems with short-ranged interaction
For a generic system with short-ranged interaction and no disorder, local operators spread ballistically, which has been shown in numerous systems including field theories [62,63], free and integrable models [64,65], interact- ing spin chains [66,67] and circuit models [68][69][70][71].In this case, we have r(t) ∼ v B t, where v B is called the butterfly velocity.The typical behavior of a ballistically expanding C(r, t) is sketched in Fig. 7. Fixing r and varying t, C(r, t) grows from 0 to the saturation value, telling how the operator becomes complicated locally once the operator front reaches point r.On the other hand, fixing t and varying r, C(r, t) decays from the saturation value to 0, as r exits the lightcone.These are the plots often used in the literature to characterize the behavior of C(r, t).
In the local system, the Lieb-Robinson bound, already mentioned in Sec.II imposes substantial restrictions on the form of C(r, t).Because of Lieb-Robinson, From the Lieb-Robinson bound, it is natural to guess that when C is far from saturation.This is indeed the correct form of many models in the large N or semi-classical limit based on field theory calculations [62,72].However, in the random unitary circuit model [68,69], the tail of the OTOC has a different behavior, In contrast to large N and semi-classical calculation, this ballistically expanding wave has a diffusively broadened wavefront, meaning the scale over which C varies as a function of u = r − t/v B goes like √ D B t.Note that the random circuit model also has a version with a large number N dof on each site, but while v B and D B depend on this number, the holographic form is never obtained.Furthermore, the Lieb-Robinson bound also applies to the non-interacting system, where the squared commutator can be calculated exactly.In this case, we obtain another different behavior The typical behavior of the three classes of OTOC is illustrated in Fig. 8.One should not expect the noninteracting limit to be generic, but the spectrum of multiple different universality classes allowed by the Lieb-Robinson bound is certainly raised.There seem many possibilities.
To understand the generic behavior of OTOC, let us look at the constraint on the functional form of OTOC imposed by the Lieb-Robinson bound.The Lieb-Robinson bound implies that (i) C(r, t) decays exponentially or faster with r, fixing t.
(ii) C(r, t) grows exponentially or slower with t, fixing r.
(iii) C(vt, t) decays exponentially or faster with t for v > v LR .
These constraints are most restrictive outside the lightcone where C(r, t) is still small.A general form [67,73] that satisfies these constraints is This growth form unifies the three classes mentioned above into a single framework by including one additional parameter p, called the broadening exponent.The large N and semi-classical result fits the form with p = 0 (no broadening).The random circuit result fits the form with p = 1 in d = 1 (diffusive broadening).The noninteracting fermion result fits with p = 1/2 in d = 1.The physics of the general growth form and the broadening exponent p are as follows.Given the general shape in Eq. ( 96), the contours C(r, t) = θ obey Hence, no matter the value of θ, asymptotically one has However, at any finite t, the contour has an extra subballistic time dependence going like t p p+1 which is due to the wavefront broadening.As a result, the spatial distance between two contours at a given t is This is the key difference between the large N /semiclassical models and the other models such as noninteracting systems and random circuit models.In the large N /semi-classical models, δr does not grow with t as t → ∞.Although the non-interacting system is not scrambling and should not be expected to be generic, the difference between the large N /semi-classical models and random circuit models, both strongly interacting and scrambling, requires understanding.

D. Random circuit model
We will now provide a concrete calculation of the OTOC in a random quantum circuit, a prototypical many-body model for studying entanglement generation [74] and operator dynamics [68,69].Also see a recent review [75].The random circuit contains alternating even and odd layers of random two-qubit gates (100) where each block is independent Haar random unitary with dimension 4 × 4 for qubits or q 2 × q 2 in general for local dimension q.There are also generalizations of the random unitary circuit to respect charge conservation [76,77], dipolar conservation [78] or other kinetic constraints, which are important for studying the interplay between transport phenomena and scrambling.In these random circuit models, the random average of the local unitary operator usually maps the quantum manybody models to statistical models that are easier to handle while retaining the universal aspects of the quantum many-body dynamics.
Let us focus on the random circuit model without any structure except for the brickwork structure.The idea to calculate the OTOC is to track the Haar averaged time evolution of the operator probability distribution, where S is an operator string defined in appendix B 1 and t is discrete.
In the random unitary circuit, each Haar random unitary can be averaged out independently.Consider applying a single local unitary gate u to site r and r + 1 to the operator W .The updated operator probability is given by It is instructive to consider only two sites first.It is straightforward to show that the Haar average leads to Notice the remarkable feature that the updated operator probability only depends on the operator probability before the update but not the amplitude.This is a simplification due to the Haar random local dynamics, but it is not true in general many-body systems.In addition, the identity operator stays the identity operator as expected from the unitary time evolution.The non-identity operators become uniformly distributed non-identity operators because of the scrambling nature of the Haar random unitary.To proceed, each operator string is mapped to a binary string.On each site, the identity operator is mapped to 0 and the others are mapped to 1.The probability of the binary string P (S) is the sum of the probability of operators mapping to the same string.Then the transition rule is given by P (00, t + 1) = P (00, t) P (01, t + 1) = 1 q 2 + 1 (P (11, t) + P (10, t) + P (01, t)) P (10, t + 1) = 1 q 2 + 1 (P (11, t) + P (10, t) + P (01, t)) P (11, t + 1) = q 2 − 1 q 2 + 1 (P (11, t) + P (10, t) + P (01, t)) (104) Since in Haar random circuit, each unitary is independent, the transition rule above also holds locally when a local unitary u acts on site r and r + 1.In this case, the binary string probabilities only change locally on site r and r + 1 according to the transition rule given above.Thus the unitary dynamics after random average become stochastic dynamics described by a master equation.
To gain an analytical handle on the effective stochastic dynamics, it is useful to consider the probability that the last 1 in the binary string ends at r, P end (r).Based on Eq. ( 104), applying the local unitary u rr+1 updates P end as follows P end (r) = 1 q 2 + 1 (P end (r, t) + P end (r + 1)) P end (r + 1) = q 2 q 2 + 1 (P end (r, t) + P end (r + 1)) Notice that (P end (r, t) + P end (r + 1)) is conserved as expected.The unitary u r,r+1 re-distributes them so that the operator has a higher probability of ending at r + 1, leading to expansion.Recall that the unitary circuit acts on the even and odd bond alternatively.The layer of even (odd) t acts on even (odd) bond.To take into account the combined effect of even and odd layers of the unitary circuit, it is more convenient to track the sum of P end on even bonds, denoted P end (b), after each even layer.Because of Eq. ( 105), P end (b) for even b fully specifies P end (r) after the even unitary layer.One can show that P end obeys This equation describes a biased random walk.In the continuum limit, the equation becomes a biased diffusion equation where Note that in the circuit model, an upper bound of velocity is 1, which is set by the geometry of the circuit and the naive lightcone of the Heisenberg operator shown in Eq. ( 13).This upper bound can be regarded as the Lieb-Robinson velocity of the circuit model.Here the butterfly velocity obtained is smaller than the upper bound.For q = 2, v B = 3/5.This is because of the return probability in the biased diffusion equation.As q → ∞, the butterfly velocity approaches 1 and D decreases to 0. The solution of the biased diffusion equation with initial condition δ(r) is To obtain the squared commutator, it is reasonable to assume that the operator on every site behind the endpoint of the operator is in local equilibrium.As a result, from Eq. ( 84) (110) It exhibits ballistic expansion and diffusive broadening of the wavefront.When t > r/v B , it quickly saturates to the final 2, indicating scrambling.The tail behavior of C(r, t) can be obtained by expanding the error function in the limit that r − v B t √ Dt.This leads to the growth form given in Eq. ( 94), which is in contrast with the growth form obtained in the semi-classical/large N and ADS/CFT models.Physically, the Gaussian tail of the squared commutator obtained in the random circuit model relies on two factors.First, the endpoint of the operator undergoes a random walk biased towards expansion.Second, the operators behind the endpoint immediately reach the local equilibrium because of the Haar random unitary.

VI. NUMERICAL METHODS
In this section, we discuss some existing numerical methods to calculate the OTOC in many-body systems, including commenting on their applicability and limitations.The numerical methods can be roughly divided into two categories, exact diagonalization and tensor networks.There are many other wholly or partially numerical approaches to calculating OTOCs, such as the truncated Wigner approximation in the semi-classical limit [79,80], but these are among the most general purpose.To be concrete, we consider the following prototypical spin chain Hamiltonian called the mixed field Ising model with nearest neighboring Ising interaction, We suggest interested readers try the different methods introduced below on this Hamiltonian.

A. Exact diagonalization and Krylov space method
We first discuss the exact diagonalization based method to compute the OTOC in the system described by Hamiltonian H.In this case, it is more convenient to consider the OTOC in the form of F = tr(W (t)V W (t)V ), where W (t) is the time-evolved Heisenberg operator and V is a local probe operator at site r.The most straightforward method to compute OTOC is to perform a full exact diagonalization on H to obtain the eigenvector |n as well as the eigenvalues n .Then the OTOC can be evaluated as e i nm t W nm V ml e i pq t W pq V qn (112) where mn = m − n .The matrices W and V are in the eigenstate basis.They only need to be computed once to calculate OTOC at different times.Although this is the simplest and numerically exact method, it suffers severely from limited scalability.The bottleneck is the full diagonalization of H and storing the eigenstates, which can be done for up to 15 qubits -It takes about 40G memory to store the whole set of eigenstates.
One can avoid exact diagonalization by implementing the Heisenberg time evolution using Krylov method [81].The Krylov basis can be generated by applying Hamiltonian to the operator iteratively k times, where k + 1 sets the effective Hilbert space dimension.Specifically, we have, The Krylov method is also numerically exact.It does not require diagonalizing the Hamiltonian, but it requires regenerating the basis by applying the Hamiltonian to the operator multiple times at each step, which can be time consuming.This method is limited by storing the large matrix of W (t). Initially, W is a local operator in the form of a sparse matrix.The complexity of W (t) increases with time, and in the late time, W (t) becomes dense and requires a huge memory to store.As a result, this operator-Krylov method, similar to the naive exact diagonalization method, can also only work for up to 15 spins.The difficulty of storing the full Heisenberg operator can be circumvented by calculating the OTOC in the Schrodinger equation and using the typicality of random states.Using the property of Haar random state |ψ , we know that Therefore, we can replace the trace in the OTOC by sampling over Haar random states, Based on quantum typicality, the sample-to-sample fluctuation is suppressed by the large Hilbert space.As a result, one random state is sufficient to capture the OTOC.Also see [82].One can evolve V |ψ and |ψ back and forth in time efficiently using the Krylov method.We dub this method as "Krylov-State".Fig. 9(a) shows the OTOC from this method and the exact diagonalization based on Eq. ( 112), which agree well with each other.
The main bottleneck of this method is to store the entire quantum state, a dense vector in the Hilbert space instead of a dense matrix as that in the previous methods.As a result, the maximal applicable system size is doubled, i.e., ∼ 30 qubits [83,84].All three methods introduced in this section are numerically exact for calculating the OTOC.They apply to arbitrarily long time but are limited to quite a small system size.Among the three, the Krylov-State method applies to systems with up to 30 qubits and thus is significantly better than the Krylov-Operator method and the most straightforward ED method.

B. Tensor-network methods
In this section, we discuss another class of methods that have been used in the literature to calculate OTOC, which is based on time-evolving matrix product state (MPS) and matrix product operators (MPO) [66,67,[85][86][87] and Sec 8.4 in [88].Unlike the exact diagonalization and Krylov method, these methods can be used to study huge systems containing a few hundred spins; the numerical cost increases linearly with L. The main bottleneck here is the time scale.These methods always produce a result for any time, but the result becomes more and more inaccurate as the state and operator are evolved longer and longer.In the following, we summarize the key concept of these methods and discuss their application to calculate OTOC, assuming the readers have a basic familiarity of density matrix renormalization group (DMRG) and time-evolution block decimation (TEBD) [89].
The central object in these methods is the matrix product state.In MPS, the wavefunction amplitude is a product of a series of matrices defined for each spin, Fixing s to ↑ or ↓, A s i is a matrix associated with the ith spin, and s is the physical index.The dimension of the matrix χ is called the bond dimension.On the first site and the last site, A s is a vector so that the product results in a number, the wave function amplitude of the spin configuration {s}.The tensor network representation of an MPS is Analogously, one can also write down the matrix product form of an operator called MPO, in which case, the matrix A has two physical indices.The tensor network representation of an MPO is similar to MPS as shown below, The MPS requires storing 2L matrices, the 2 coming from the physical index ↑ and ↓, in total 2Lχ 2 complex numbers.Therefore MPS is a very compact approach to represent a wavefunction that contains 2 L numbers.As expected, an MPS of a given bond dimension χ can only represent a tiny portion of the Hilbert space, characterized by low entanglement entropy.The entanglement entropy of an MPS with bond dimension χ is maximal log χ, a constant, while the entanglement entropy of a random state in the Hilbert space scales linearly with L. Fortunately, the entanglement entropy of the ground state of one dimensional gapped systems exhibits area law, i.e., does not scale with system size, and thus can be captured accurately.
A large toolbox for manipulating MPS/MPO has been developed over the years [88,89].The key idea behind studying dynamics using MPS/MPO is to evolve the state and operator while preserving the matrix product form.However, unlike searching for the ground state, the complexity of the simple initial states and initial operators increases over time, and in late time, the states exhibit volume law entanglement entropy.As a result, at some point during the evolution, usually a short time scaling as one over the interaction strength, the state cannot be captured faithfully by an MPS with a reasonable bond dimension.In practice, the quantities of physics interests are local observable and correlation functions, instead of the entire quantum state.To obtain the physical observable O(t) at a given time, one can repeat the simulation with increasing bond dimension, and the result can be trusted if it converges with the bond dimension.
There are two major approaches to evolving MPS/MPO: time evolving block decimation (TEBD) and time dependent variational principle (TDVP).In TEBD, the local unitary gate is directly applied to the MPS and increases the bond dimension of MPS.Then the MPS is truncated so that the bond dimension stays the same, leading to the truncation error.Furthermore, the truncation error in TEBD does not necessarily respect conserved quantities, such as the energy and charge, which start deviating from their initial value after a short time scale.This issue can be fixed by using the TDVP method to evolve the state instead.In short, TDVP generates an effective Hamiltonian for the local tensor in MPS, which depends on other tensors.The effective Hamiltonian is used to unitary evolve the tensor.As a result, the total energy is conserved by construction during the time evolution, and TDVP can take into account other conserved quantities.Although conserved quantity does not necessarily ensure correct local observable, correlation function or transport behavior, this is the first step towards approaching the corrected non-equilibrium beyond a short time.This is particularly important in the late time regime where hydrodynamics emerges and conserved quantities play a crucial role.
MPS based method -Now, we discuss applying these methods to calculate the OTOC.Like methods using exact diagonalization, one can also calculate OTOC in either the Schrodinger or Heisenberg picture.In Schrodinger's picture, similar to the exact diagonalization discussed before, the trace in the OTOC F (t) is replaced by an ensemble of states.However, the ensemble of random states is not feasible in the tensor network methods because the random states have large entanglement that MPS cannot capture.Instead, in practice, an ensemble of random product states is used.Fix the initial state as ψ.Either TEBD or TDVP can be used to generate the following two states, Then the OTOC is just the overlap between the two states followed by averaging over the initial states.The two methods, dubbed as TEBD-MPS and TDVP-MPS respectively, which differ by the time evolution method, are compared in detail in [85] and benchmarked with numerically exact Krylov-State method.As expected, the TDVP-MPS method produces more accurate results as time passes a short time scale.Since the numerical cost of TDVP-MPS and TEBD-MPS is comparable, TDVP is the better approach to calculate OTOC in the Schrodinger picture.In the state-based method, the primary source of error is the incapability of representing the complicated state using an MPS with a finite bond dimension.
MPO based method -During the time evolution of a state, the entanglement of the states increases uniformly across the whole system.On the other hand, because of the emergent lightcone structure in the Heisenberg evolution of a local operator shown in Eq. ( 13), the Heisenberg operator W (t) is almost a product of identity operators I ⊗• • •⊗I outside the lightcone and becomes complicated inside the lightcone.For this reason, the entanglement entropy of an operator, which can be regarded as a state with doubled physics degree freedom, develops a lightcone profile as well, staying small outside the lightcone and becoming large within the lightcone regardless of the time scale.Therefore, the part of the Heisenberg operator outside the lightcone can be captured accurately using an MPO with a small bond dimension.This intuition leads to the method of calculating OTOC in the Heisenberg picture using MPO.In the MPO-based method, the main step is to evolve the Heisenberg operator, which is similar to evolving MPS by treating the MPO as an MPS with doubled physics degrees of freedom.The time evolution operator to evolve the operator state is This can also be viewed as evolving the operator state |W using the "super Hamiltonian" H ⊗ I − I ⊗ H * .The time evolution can be implemented using either TEBD or TDVP.OTOC can be calculated once MPO form of the Heisenberg operator W (t) is obtained.In this approach, it is more convenient to calculate the OTOC in the form of a squared commutator where V r , a local operator on site r is scanned over the system.Given the MPO of W (t), the MPO of the commutator [W (t), V r ] is obtained efficiently by modifying the local tensor at site r as follows The bond dimension remains the same because V r is a local operator.The squared commutator can be obtained by contracting the MPO of the commutator with its Hermitian conjugate copy.However, noticing that C(r, t) is just the square of the Frobenius norm of the commutator, this last step can be replaced by evaluating the norm of the MPO directly by treating it as an MPS.Evaluating the norm of an MPS is a standard procedure in the tensor network simulation.Calculating the norm then squaring it is much more accurate than calculating the overlap between the MPO and its Hermitian conjugate copy, especially when C(r, t) is small.Fig. 9(b) shows the squared commutators from the MPO method and the Krylov-State method for 20 qubits.The MPO results are accurate when the squared commutator is small but starts to deviate from the exact result from the Krylov-State method as the squared commutator reaches ∼ 1.
As shown in [67], because of the lightcone structure of the Heisenberg operator, the MPO method is very accurate in capturing the tail of OTOC even with a bond dimension as small as 4, and can be used to obtain the butterfly velocity.This method also clearly demonstrates a broadened wavefront, agreeing with the general growth form for p > 0, in contrast to the behavior in the SYK chain.Obtaining the asymptotic value of p is quite subtle and requires accessing a wider space-time region and larger bond dimension.The TEBD-MPO method was also compared with TDVP-MPS method in [85], showing both of them are accurate in capturing the tail of OTOC and tracking the lightcone.The TEBD-MPO has two advantages over the TDVP-MPS.First, it does not require sampling over the initial states and thus is free of statistical error.Second, at given time t, the commutator of W (t) and V r and therefore C(r) for all r can be obtained efficiently using Eq. ( 121).On the other hand, in TDVP-MPS method, in order to get C(r, t) for all site r, initial states V r |ψ with different r need to be evolved individually in addition to sample |ψ .
One can also evolve the Heisenberg operator using the TDVP method based on the super-Hamiltonian H ⊗ I − I ⊗ H * .The quantity made explicitly conserved in the TDVP-MPO method, instead of the energy, is the expectation value of the super Hamiltonian which is just zero for a Hermitian operator W (t). In practice, TDVP-MPO does not significantly increase the accuracy of TEBD-MPO.It is less accurate to capture the exponentially small value of OTOC because it does not directly take advantage of the lightcone structure.However, TDVP-MPO is still a very useful and convenient method to use when the system has long-range interaction [46], in which case the circuit from Trotterization does not admit the structure shown in Fig. 2. It is also possible to generalize the TDVP-MPO method to conserve energy or/and charge, which remains to be explored.
A few remarks are in order.First, the MPO-based method utilizes the lightcone profile of the entanglement structure of the Heisenberg operator.It becomes exact if the operator entanglement entropy is bounded, such as in the non-interacting system.Second, in a chaotic system, because of the rapid generation of operator entanglement entropy within the lightcone, the tensor network based on the method is not expected to capture the growth and saturation of C(r, t).However, the accuracy of the MPO-based method can be improved significantly with increasing computational cost.The idea is evolving both operators W and V , forward and backward respectively, and evaluate This expression is equivalent to Eq. ( 121), since [W (t/2), V r (−t/2)] = e −iHt/2 [W (t), V r ]e iHt/2 .This time-splitting method divides the entanglement growth within the lightcone into two operators and reduces the truncation error.The cost is that each V r needs to be evolved individually in order to obtain the full space-time profile of C(r, t).In addition, the fast construction of the commutator in Eq. ( 121) is no longer applicable, since now the commutator is between two time evolved Heisenberg operators, both taking MPO form with bond dimension χ.As a result, the MPO of W (t/2)V (−t/2) and W (t/2)V (−t/2) has bond dimension χ 2 , and the MPO of their difference, the commutator, has bond dimension 2χ 2 .In graphics, The minus sign in the local tensor of the last site implements the difference between the two terms in the commutator.Once the MPO of the commutator is constructed, its Frobenius norm and thus the OTOC C(r, t) can be calculated the same as before.As shown in Fig. 9(b), the time splitting approach significantly enhances the accuracy of MPO in the late-time regime.Finally, in all the tensor-network based methods applied to a system with ∼ 100 spin, because of the targeted intermediate time scaling ∼ 100/J, where J is the coupling constant, one should carefully check the convergence of the result with the Trotter time step.In general, the time step dt should be reduced to ∼ 0.005/J to avoid accumulating the Trotter error.

VII. EXPERIMENTAL SCHEMES
As we enter an era of quantum simulation, marked by the existence of multiple experimental platforms with unprecedented power to control and detect quantum manybody physics far from equilibrium, there is a surge of interest in measuring scrambling dynamics and OTOCs.The essential step required to measure an OTOC is to write it as an observable or a combination of observables, including specifying the initial states, the time evolution operators, and the measurements.In this section, we briefly discuss several such experimental schemes, including those already implemented in the laboratory.This discussion focuses on the essence of several OTOC measurement protocols, and it does not delve into the experimental details of various concrete experiments.These schemes fall into two main categories, one requiring rewinding time and the other one requiring measurements averaging over random states.

A. Rewinding time
The experimental schemes falling into this category require engineering both forward evolution operator U (t) and backward time evolution operator U (−t), and is closely related to the Loschmidt echo [90,91].The first scheme that we describe is based on an interferometric protocol [92].This scheme requires introducing a reference qubit that is initialized in the state The initial state of the system and the reference qubit is To measure OTOC W † (t)V x r W (t)V x r , one first applies the controlled V gate to the state (assuming V is unitary), acting on-site r and controlled by the reference qubit, then evolve the state by a butterfly unitary circuit U butterfly , and applies the CNOT gate.The butterfly unitary circuit U butterfly does not act on the ancillary qubit and is in the following form where U (t) is an arbitrary unitary circuit and W is a local unitary gate.The butterfly circuit is just another name for the Heisenberg operator of a local unitary operator.After these steps, the state is The last step is to measure σ x on the reference spin.The expectation value, which can be obtained by running the circuit and measuring repetitively, is in the form of an OTOC.The imaginary part of the OTOC can be obtained by measuring σ y instead of σ x .This protocol works for an arbitrary initial state of the system but requires introducing the reference spin and the CNOT gate and is used in [93] For certain initial states, introducing the reference qubit is not necessary.Consider an initial state that is an eigenstate of a local operator, for instance, σ x r .This means that the local qubit of the state is either |+ or |− .The idea is to measure whether the qubit at site r remains an eigenstate of σ x r after evolving by the butterfly circuit.This is quantified by its expectation value, given by If the support of W (t) does not overlap site r, then the local qubit remains an eigenstate, otherwise the expectation value decay.Formally, one can write the above equation in the form OTOC ψ| W † (t)σ x r W (t)σ x r |ψ , using the fact that the state |ψ is an eigenstate of σ x r .This scheme is carried out in [94][95][96] In the previous example, the OTOC is measured with respect to a pure state.To directly measure OTOC for the infinite temperature ensemble, the most straightforward setup is to prepare double copies of the system, such as a ladder.The initial state is a product of Bell states across each rung Then OTOC can be cast into an observable in this double copied system Replacing the initial state |Ψ with the thermal field double state | √ ρ generalizes this scheme to measure OTOC at finite temperature for a specific generalization [97].
The double copy setup is also essential to directly realize the many-body teleportation protocol discussed in Sec.IV, where one copy represents the system and the other copy represents the memory qubit owned by Bob.The many-body teleportation protocol has been implemented in ion trap [98] and superconducting qubits [99] experiments.
It is also possible to measure OTOC at infinite temperature without introducing the second copy in NMR experiments [100][101][102].In this setup, the initial state is a high temperature mixed state in the presence of the magnetic field along the z direction.The initial state is given by The density matrix is then evolved by unitary time evolution in the form of the butterfly circuit, which can be engineered by a sequence of pulses.In the evolved mixed state, the total magnetization is given by in the form of an OTOC.Note that the identity part of the initial mixed state does not contribute to the expectation value.In NMR experiment, a natural choice of W is exp(iθ r σ z r ) from a pulse, which is a non-local operator.Then result σ z is a periodic function in θ with period 2π, denoted as σ z θ .Using the Lehmann representation, one can show that (134) This squared commutator approximately counts the number of spins within the support of the local Heisen-berg operator σ z (t).In practice, one can perform a discrete Fourier transformation on a finite number of measurement results with a discrete value of θ.We note that it is also possible to measure the OTOC between two local operators using selective pulses for an ensemble of small molecules [100].

B. Randomized measurement
The key ingredient for above protocols is the butterfly circuit U butterfly , which requires implementing both U (t) and U † (t), namely effectively rewinding the time, and can be challenging for generic U .An experimental protocol that bypasses this requirement is to exploit statistical correlation from randomized measurement [103], which has been implemented in NMR [104] and ion traps [105].In its simplest form, this protocol considers the product of two expectation values When the initial state is averaged over Haar ensemble, using the Haar random average formula in Eq. (A15), the above quantity leads to Typically, the first term is O(1), while the second term, which is the important piece, is O(2 −N ).However, when W is a traceless operator, the first term vanishes and only the second term, the OTOC, survives.This scheme can also be extended to obtain the leading finite temperature correction of OTOC as well.In this protocol, the major challenge is to prepare the Haar random initial states.Alternatively, one can also estimate the OTOC by sampling over local unitary that acts on each qubit and averaging the initial states in the computational basis in a specific manner.In this approach, the quantity that is being considered is Compared with Eq. ( 135), the two expectation values are measured from two different states |ψ n and |ψ 0 .The state |ψ n are generated by acting local unitary to a state |n in the computation basis ) where u i is drawn from the Haar ensemble or any ensemble producing the same averaged result as the Haar ensemble involving four copies of the state (such ensemble is called 2-design) and n i ∈ {0, 1}.The state |ψ 0 is generated from all-zero state |0 using the same unitaries.The random average of the unitaries can be performed independently on each qubit using Eq.(A15).To gain insight into the result, we first average F n over u 1 .We define the following operator acting the first qubit Averaging over u 1 leads to (140) The trick here is to use a weighted sum over n 1 to cancel the first term.One can show that Consecutively averaging over other local unitaries u i and summing over n i in the similar fashion leads to the OTOC One can also extend this protocol to directly measure the operator probability distribution defined in Sec.V [106].

VIII. EPILOGUE
In this tutorial, we have covered the definition of quantum scrambling dynamics in generic quantum manybody systems, which is distinct from thermalization.While thermalization concerns the local density matrix, scrambling dynamics manifests in non-local degrees of freedom.We have shown that scrambling dynamics can be understood by treating unitary dynamics as a quantum communication protocol.Scrambling, by definition, is characterized by the time-dependent mutual information between a referee qubit, which is initially entangled with one of the qubits, and a partition of the system.It also affects other non-local dynamical properties of the system, such as the entanglement entropy and entanglement spectrum [107].Importantly, we have shown that one does not need full access to the system to recover an initial qubit state prepared in one of the qubits in the many-body system, even when the system is fully scrambled.In the Hayden-Preskill setup, which involves a maximally mixed initial state, the mutual information can be converted to the out-of-time ordered correlator (OTOC).We have demonstrated the important role of OTOCs in quantum information and dynamics and discussed their general behavior in local systems featuring ballistic information dynamics.We have presented several toy models where the OTOC can be calculated exactly, as well as numerical tools to compute OTOCs in generic systems.We have also surveyed recent exciting experimental progress in detecting information dynamics.However, it is not practical to cover all aspects of this large field in this tutorial article.There are several related interesting topics that we did not discuss, and we would like to briefly mention them here.
Finite temperature -First, this tutorial largely considered scrambling at infinite temperature.Physically, this amounts to saying that Alice and Bob were using a medium at high temperature compared to the scales in the intrinsic Hamiltonian.Mathematically, this means we considered OTOCs where the expectation was taken in the maximally mixed state.This regime is ideal for understanding the basics of information dynamics since we do not have to deal with any static correlations in the quantum state.However, it is interesting to understand information away from infinite temperature, especially in the context of black hole physics.
Given a quantum state ρ, one can define an OTOC in this state by F ρ = tr(ρW (t)V W (t)V ).When ρ is the maximally mixed state, this recovers the simple trace expression we considered for most of this tutorial.However, if Alice and Bob wish to use a medium at non-infinite temperature to convey quantum information, then one expects so-called thermal OTOCs to be relevant where ρ ∝ e −βH is a thermal equilibrium state.However, the physics of scrambling is more complicated in this case.This is because there are multiple versions of F ρ called "regulated" OTOCs, Fρ = tr(ρ q1 W (t)ρ q2 V ρ q3 W (t)ρ q4 V ), (143) where q i ∈ [0, 1] and i q i = 1.This class of objects corresponds to displacing the operators W (t) and V in imaginary time (when ρ ∝ e −βH is the thermal state).The case q 1 = 1, q i>1 = 0 is the usual OTOC in state ρ.The case q i = 1/4 is a particularly natural choice from a mathematical point of view; this is the form of OTOC to which the chaos bound [11] applies.In early examples, different regularizations gave equivalent characterizations of the scrambling dynamics.However, it was later discovered that the butterfly velocity can depend on the choice of regularization [108][109][110].This raises the question of which regularization is most relevant for information spreading.A finite temperature version of some of our information calculations was given in an appendix of [12].A notion of "perfect size winding" [111] has also been shown to be related to optimal many-body teleportation at finite temperature.Both of these cases are related to the (1/2, 0, 1/2, 0) and (1/4, 1/4, 1/4, 1/4) OTOCs, but there is still work to do to elucidate the general structure of information spreading at finite temperature.There are several proposals to measure OTOC at finite temperature for different regularization [97,103,112], and recently OTOC with regularization (1/2, 0, 0, 1/2) is measured experimentally in a small system [113].
Symmetries -Symmetries and conservation laws can strongly affect scrambling dynamics.The basic intuition is that an initial state cannot scramble as much in the presence of conserved quantities due to the restricted Hilbert space.For instance, the early growth rate of OTOC is suppressed by temperature [11] and/or chemical potentials [114,115].Moreover, in extended systems, conserved quantities usually lead to slow mode associated with their transport, which can significantly slow down the scrambling dynamics and cause a prolonged powerlaw tail of OTOC in the late time regime [52,[76][77][78]116].
An interesting question regarding symmetry is how it affects the information recovery fidelity in the Hayden-Preskill protocol discussed in Sec.IV.It boils down to studying the late-time value of OTOC appearing in Eq. ( 74), which we repeat here where W (−t) = U W U † .This equation holds for any unitary operator U acting on qubit systems.In the presence of conserved quantity Q, we have [U, Q] = 0. Due to the block diagonalized structure U , the late-time value of OTOCs cannot be estimated using Haar random unitary such as in Eq. (86).Instead of scaling with 1/2 N , one can show that in systems with conserved charge or energy, the late-time value of the OTOCs scale as O(1/poly(N )) [115,117].Larger symmetry groups can even lead to a finite late-time value of OTOCs and thus significantly suppress the recovery fidelity [118,119].
More on wavefront broadening -Another direction concerns conjectured universality in the OTOC structure in locally interacting systems.In Section V C we indicated that semi-classical/large N models and random circuit models gave two distinct classes of OTOC behavior, and we argued that the random circuit behavior was generic, i.e. that finite N corrections would qualitatively change the large N form of the OTOC.It is important to understand better the universality classes that can arise, especially in Hamiltonian systems at non-infinite temperature.In the literature, OTOCs are sometimes analyzed along a ray within the lightcone in terms of a velocitydependent Lyapunov exponent [73,120], where v = |x|/t.For instance, λ(v) = λ(1 − v/v B ) p+1 for the growth form in Eq. (96).The definition of the butterfly velocity is λ(v B ) = 0, and at large N one typically finds that ∂ v λ(v)| v B = 0.This corresponds to p = 0 in our discussion above.By contrast, the 1d random circuit form has ∂ v λ(v)| v B = 0, which corresponds to p = 1.As we said, the latter form is conjectured to be generic in 1d, but it is important to better understand this issue.Quantum chaos -Another motivation for studying OTOCs in the literature lies in their connection to quantum chaos.In the semi-classical regime, schematically, one can replace the commutator in the squared commu-tator with a Poisson bracket and obtain [24,121] which measures the sensitivity of the final position to the initial position, and thus classical chaos.Then it seems natural to promote the squared commutator to be a diagnosis of quantum chaos [11].However, it turns out to be quite subtle [122].For instance, although C is expected to exhibit early time Lyapunov growth in the semi-classical limit of many-body systems before saturation [121,123,124], not all quantum many-body systems have a semi-classical limit, some large N limit in which quantum fluctuation is suppressed.The random quantum circuit introduced in Sec.V D does not have the exponential growth behavior expected to be a diagnosis of quantum many-body chaos.Furthermore, some integrable systems exhibit early-time exponential growth due to unstable dynamics [125][126][127].Therefore while the latetime value of OTOC has a clear physics meaning as discussed in Sec.V, the connection of the early-time growth of the squared commutator to previously proposed measures of quantum many-body chaos (see [128,129] for discussions), such as the spectrum form factor from the random matrix behavior, needs to be further settled.A more general question is whether quantum scrambling and quantum many-body chaos measure the same property, and if not, when they differ from each other.This discussion also largely hinges on a precise definition of many-body chaos.
More on numerical methods -Ongoing experiments on scrambling and non-equilibrium quantum many-body dynamics in general are reaching system sizes beyond the capability of exact diagonalization, calling for new numerical tools.In Sec.VI, we discussed an MPO/MPSbased method to calculate the tail of OTOCs for large system sizes in 1D utilizing the lightcone structure of operator spreading.One direction is to extend such a method to higher dimensions using other ansatz for states or operators.For example, [130] calculates the OTOCs of the mixed field Ising model in 2D using the restricted Boltzmann machine.
Moreover, it would be ideal to develop general-purpose numerical tools to directly simulate many-body teleportation for arbitrary unitary circuits with local structure and initial states for large system sizes of ∼ 100 qubits.The conventional MPS/MPO based method does not work due to the rapid growth of the entanglement entropy that an MPS with a finite bond dimension cannot capture.It would be interesting to explore the effectiveness of other wavefunction ansatz, such as various neural network states or multi-scale entanglement renormalization ansatz (MERA), in this context.
Understanding scrambling dynamics, i.e., how information flows from local to non-local degrees of freedom, is also useful for developing numerical methods to simulate conventional thermalization dynamics of local observables and calculate transport coefficient, which is usually given time ordered two-point correlation function.Schematically, in a strongly interacting system, the equation of motion of single qubit observables depends on the equation of motion of two-qubit observable, which depends on three-qubit observables, leading to an infinite series of equations involving arbitrary order of correlations functions that becomes impractical to solve.Transport properties are captured by low-order correlations.It is tempting to assume the dynamics of sufficiently high-order correlation functions do not feed back to the dynamics of simple correlation functions and to truncate the infinite series of equations or simplify the higher-order equations by approximation.These ideas have led to multiple new numerical algorithms [131][132][133][134][135].There is still work to do to justify the assumptions and better understand the interplay between the scrambling dynamics and dynamics of local observables.
We believe that we are only just starting to explore this exciting field of quantum information scrambling.With the many connections discovered so far and the prospect of new large-scale experiments on the horizon, there are many exciting possibilities to explore, including the discovery of new maximally chaotic quantum systems, the laboratory simulation of holographic models of quantum gravity, a deeper understanding of quantum chaos, new insights into transport in strongly interacting systems, and much else.We therefore hope that the reader will consider getting into this field and bringing a new point of view.
Consider a non-interacting Majorana system described by the Hamiltonian where χ is the Majorana operator that obeys the commutation relation {χ r , χ r } = δ rr .The Hermicity of H requires that h rr = −h * rr and h rr = −h r r .Therefore h is a purely imaginary antisymmetric matrix.In addition, we assume that H is translational symmetric and thus can be diagonalized by Fourier transformation.Introduce Then the Hamiltonian can be written as The spectrum k is an odd function in k because matrix h is imaginary and antisymmetric.The goal here is to calculate the OTOC or equivalently the squared anticommutator defined in the appendix Using the Hamiltonian in the momentum space, one can show that the Heisenberg operator χ 0 (t) is where Because k = − −k , the coefficient g(r, t) is real as expected from expanding the Hermitian operator χ 0 (t).
The fact that only a single Majorana operator appears in the expansion is because of the non-interacting Hamiltonian.In general, Majorana strings of all lengths would appear.From g(r), one can obtain the OTOC as Therefore, the behavior of C(r, t) is controlled by g(r, t), which can be analyzed using saddle point approximation.We expand the function around k 0 as C8) where v(k 0 ) is the group velocity at k 0 .One can always find suitable k 0 making the first derivative term vanish for |x| < max(|v(k 0 )|)t, while it is not possible to do so if |x| > max(|v(k 0 )|)t.This leads to a change of behavior of g(x, t) at |x| = max(|v(k 0 )|)t, indicating that the butterfly velocity v B is the maximal velocity.In the region x > v B t, the first-order derivative term is always nonzero.We keep up to the third-order term and obtain that where Ai(Z) is the Airy function.Note that here (3) k0 is negative since the group velocity is maximal at k 0 .In the limit that r − v B t |∂ 3 k ε(k 0 )t/2| 1/3 , we can use the asymptotic form of the Airy function and obtain Eq. ( 95).In this case, the wavefront broadens subdiffusively with δr ∼ t 1/3 and the broadening exponent p = 1/2.On the other hand, in the long time limit t r/v B , one can find k 0 that makes the first derivative vanish and perform the Gaussian integral to get g(r, t) ∼ 1/t 1/2 .As a result, in the long time limit C(r, t) decays to 0 as 1/t in the non-interacting systems.This is in sharp contrast with a scrambling system where C(r, t) approaches 2 in the long time limit.These results from saddle point analysis can be explicitly checked by exact calculation using the nearest neighboring hopping model given by h rr = iδ r,r +1 − iδ r,r −1 .In this case, k = sin k and g(r, t) = J r (t), a Bessel function.

Brownian models
We have seen that the random circuit model and the SYK model give rise to distinct functional forms of the OTOC.Both cases feature a ballistic lightcone, but the wavefront broadens diffusively in the random circuit model while the wavefront is sharp in the SYK model.The question then arises which, if any, of these characteristic shapes describes the generic case with a finite local Hilbert space dimension.Unfortunately, this question cannot be reliably answered using small-sized numerical simulations.These exhibit ballistic expansion with some broadened wavefront, but it is unclear if the broadening will vanish in a large system, tend to the diffusive limit, or have some other characteristic form.Non-interacting particles exhibit a ballistic expansion of C with yet another characteristic broadening of the wavefront (C does not saturate at late time in this model, instead falling back to zero).One should not expect the non-interacting limit to be generic, but the spectrum of multiple different universality classes is certainly raised.
To answer this question, we now introduce another class of models known as the Brownian models [10,[34][35][36][138][139][140], in which the interaction between the underlying degrees of freedom are stochastic variables.Here we consider a specific version of the model with spinspin interactions, called the Brownian coupled cluster model.Like the random circuit model, it features a random time-dependent Hamiltonian, but unlike the random circuit model, it has a large N limit.Using it, we will develop a physical picture of why p = 1 is generic for one-dimensional scrambling systems.The model can be defined in any dimension, but here we continue to focus on d = 1.The degrees of freedom are arranged in clusters which are then connected into a one-dimensional array.Every cluster contains N spin-1/2 degrees of freedom, and there are L clusters.The Hamiltonian is timedependent and consists of two kinds of terms, withincluster interactions and between-cluster interactions.In order to avoid mathematical complexities associated with stochastic calculus, it is simplest to present the model in discrete time.
The time evolution operator is with m a discrete time index.The within-cluster terms and the between-cluster terms are where α, β ∈ {0, 1, 2, 3}, a, b = 1, • • • , N label spins within a cluster, r, r label clusters (sometimes called sites), and rr means nearest neighbors.At each time step, the models contains two sets of uncorrelated random variables J and J with mean zero and variance 1 8(N −1) dt and 1 16N dt, respectively.In the limit that dt → 0, one can formulate a stochastic differential equation for the time evolution operator.From it, one can derive a master equation for the operator probabilities |c(S)| 2 averaged over circuit realizations, i.e., over realizations of the couplings J and J.We will not get into the details of these equations here, but see Ref. [35] for complete details.The only important property we need is that |c(S)| 2 depends only on the total number of non-identity Pauli operators on each cluster.This is technically an approximation, but it holds after a short time even if the initial condition does not obey it because the circuit average erases any distinction between the different Pauli operators.The total number of non-identity Pauli operators in P S on cluster r is called the weight of the cluster and is denoted w r (S).
In this model, the operator averaged squared commutator C has a cluster index as well as the index of the qubit within a cluster,

(C13)
According to Eq. ( 84), C a (r, t) = which measures the averaged number of non-identity operator that appears over the N spins, or the averaged operator weight w r within the cluster r.At early time, φ(r, t) ≈ 0 while at late time it saturates to φ(r, t) = 2 as the usual case.
Like the Haar random circuit, using random averaging of coupling, one can derive a master equation governing the dynamics of the operator probability |α(S)| 2 .When N is small, it can be shown that C(r, t) obeys a driftdiffusion equation as in the random circuit model.This leads to a circuit averaged φ(r, t) obeying the universal form Eq. ( 96) with p = 1.Hence the Brownian coupled cluster model recovers the result of the random circuit model at small N .
At infinite N , something very different occurs.It can be shown that φ(r, t) obeys a so-called Fisher-Kolmogorov-Petrovksy-Piskunov (FKPP) type equation of the form Here g is the ratio of the strength of the between-cluster and within-cluster terms, and we have taken a continuum limit, which is qualitatively accurate.Although we will not explain its detailed derivation, one can see that this equation contains three essential pieces of physics: exponential growth in time, spreading in space, and saturation.The FKPP equation is very well known and describes a wide variety of physical processes including the propagation of combustion waves, the dynamics of invasive species, and the physics of certain quantum chromodynamics processes.
The key physical property of the FKPP equation is that, starting from a localized source, it supports traveling wave solutions with C(r, t) = f (r − v B t) where v B = 18g 2 (1 + g 2 ) is the butterfly velocity.Well ahead of the front at r = v B t, the waveform is φ(r, t) ∼ e λ L (t−r/v B ) , (C16) which is Eq. ( 96) with p = 0. Hence the Brownian coupled cluster model also recovers the physics of large N and semi-classical results.The exponent λ L = 6(1 + g 2 ) is an example of a quantum Lyapunov exponent.
Given the large and small N limits, the next question is how they are connected as N is varied.Physically, the infinite N limit functions to suppress quantum fluctuations, so that one may view the distribution |c(S)| 2 as being concentrated on a single weight configuration.At finite N , quantum fluctuations occur, meaning that the distribution |c(S)| 2 now assigns non-vanishing probability to different operators weight configurations.It is important to understand that these fluctuations are which is a convolution between a Fermi-Dirac function and a Gaussian distribution, describing a traveling wave moving at velocity v.In the limit that λ → ∞, the Fermi-Dirac distribution function becomes a step function, and Eq.(D1) recovers the behavior of the random circuit model.On the other hand, in the limit that D → 0, the Gaussian distribution becomes a delta function, and Eq.(D1) reduces to the exponential form for the large N models.Let us work with the dimensionless variables which recovers the large N form.Interestingly, both the Lyapunov exponent and the butterfly velocity increase by a factor (1 + ξ/2).We perform the integration numerically and plot log C as a function of r − t fixing t in Fig. 10(a) which clearly demonstrates that log C interpolates between the two limiting cases, the error function and the exponential function.
Based on the discussion above, one arrives at the following picture of scrambling dynamics in an extended quantum many-body system with short-ranged interaction and finite local Hilbert space dimension.The information propagates ballistically with a butterfly speed v B .Due to inevitable quantum fluctuation, in the spacetime region near r − v B t ∼ √ Dt, which we denote as diffusive region, C(r, t) is characterized by a diffusive broadened wavefront.In this region, there is no welldefined Lyapunov exponent that is independent of position and velocity.However, far ahead of the wavefront r − v B t > λD/v B t, C(r, t) grows exponentially with a well-defined Lyapunov exponent, and thus we denote this region as the exponential region.
Still, it would be nice to check this claim that the wavefront broads diffusely in a quantum spin chain with no randomness in space or time and generic interactions.More generally, the preceding discussion did not provide a method to calculate squared commutators for generic physical systems.One may wonder if there exist general methods or numerical algorithms to study OTOC for these systems.This will be the topic of the next section.Before going to the details of these methods, we would like to point out that numerical results strongly support diffusive broadening of the information wavefront in spin chains with large system sizes.Fig. 11 shows data obtained from the matrix product operator approach introduced in Sec.VI for the mixed field Ising model at infinite temperature.The simulation is performed for n = 201 spins out to quite a long time, several hundred 1/J.It has been checked that the results are converged in bond dimension with a bond dimension as low χ = 32.What is The inset shows a log-log plot of the same data.The asymptotic approach to a slope of 1/2, corresponding to p = 1, is clearly visible in the data.Taken from Fig. 4(b) of [35].
plotted are the contours of the constant squared commutator.The inset shows the difference between different contours as a function of time on a log-log plot.Eq. (97) predicts that the difference between contours should go like t p p+1 , so on a log-log plot the data should approach a straight line of slope p p+1 .This is precisely what occurs with an asymptotic slope of 1/2 corresponding to p = 1.Hence we verify that for a large, non-random interacting spin chain, the operator growth dynamics is ballistic with a diffusively broadened front exactly as predicted.
The chaotic regime ahead of the front should also be present but is difficult to observe.Based on the picture shown in Fig. 11(b), the size of the chaotic regime is determined by the ratio v 2 B /λD.Therefore, one may observe the exponential growth in systems with large v B but small λ, as shown in [149].

Appendix E: Lieb-Robinson bound
This appendix reviews an elementary proof of a Lieb-Robinson bound for a simple one-dimensional spin to give a sense of how it works.The analysis follows a discussion of Osborne.Let's assume the Hamiltonian can be written as a sum of terms h r that act on sites r and r + 1.This can always be done coarse-graining any finite range interaction.Let the operator norm of h r be J, which measures the local energy scale of the Hamiltonian.
Consider an operator W located at site r 0 .The goal of Lieb-Robinson is to upper bound how far from r 0 this operator can spread after time t.The rough idea is to consider a series of approximations to W (t) which involve truncating more and more distant terms in the Hamiltonian.These truncations then converge, roughly speaking, to W (t) while also giving bound on the spreading.(E8) Using AB ≤ A B and the triangle inequality, one has The factor of four is a crude upper bound that takes into account both h r0+ and h r0− which both appear twice due to the commutator.Now we solve the upper limit of this system of differential equations with the initial condition that α 0 = W and α >0 (t = 0) = 0.The result is This result is almost the end of the calculation.The remaining thing to do is to estimate the difference between W and the true W (t).This is W ∞ (4Jt) j j! .
(E11) There are various ways to treat this infinite sum.For 4Jt, the simplest estimate is to say that it cannot by much larger than its first term, which is quite small.More precisely, the ratio of term j to term j + 1 is 4Jt j+1 ≤ 4Jt +2 , so making even a crude approximation using a geometric series in 4Jt +2 converges to something order one times the first term.After using Stirling's approximate for large , the first term is This result corresponds to roughly the th order in perturbation theory when expanding W (t) in a Taylor series.Physically, it will describe the commutator dynamics for sufficiently small t and large .Neglecting the difference between and + 1, the first term is order one when = 4eJt.Setting 0 = 4eJt, the first term can be written as e log 0 . (E13) Using −1 ≥ −1 + log 0 ˜ (valid for ˜ ≥ 0 ) and integrating both sides from 0 to , it follows that The left hand side is the first order expansion of the right hand side in − 0 , so the inequality states that going beyond first order only decreases the value.Hence e log 0 ≤ e −( − 0) , (E15) or, using 0 = 4eJt, Here f (t) is a polynomial prefactor that does not affect the basic exponential scaling.Note that the bound is definitely not tight at very large , since 1/ !decreases faster than e − .The bound is also trivial once < 0 because the right hand side is growing exponentially while the left hand side is bounded by 2 W .
Having established that W is close to W (t) for Jt, remains to upper bound the commutator.The idea is straightforward: If an operator V is a distance r from W , then an upper bound on the commutator [W (t), V ] is obtained by approximating W with W =r−1 since W r−1 exactly commutes with V .First add and subtract W r−1 inside the norm to give and then use [W r−1 , V ] = 0 and the bound on W (t) − W r−1 to obtain Using the upper bound above, this is which is Eq. ( 15) in the main text.

FIG. 3 .
FIG. 3. (a) Quantum unitary circuit made from SWAP gate.The circuit is constructed by replacing the generic two-qubit

FIG. 6 .
FIG.6.Control information propagation by long-range coupling and disorder.(a) As the long-range connectivity of a system increases, the shape of the information lightcone changes from a ballistic lightcone to a algebraic lightcone, and finally to an exponential one.(b) Increasing the strength of disorder in a system causes a slowdown in information propagation.As the disorder strength increases, the shape of the lightcone changes from ballistic to algebraic and finally to logarithmic.

FIG. 7 .
FIG. 7. Schematic sketch of C(r, t) for local clean systems.In this case, the information lightcone is ballistic.The squared commutator C(r, t) increases with t and decreases with r.

FIG. 8 .
FIG. 8. Schematic illustration of the squared commutator C(r, t) with a ballistic propagating front in (a) large N /semiclassical systems, (b) non-interacting systems and (c) random unitary circuits.

FIG. 9 . 1 and σ z r 2
FIG. 9. Comparison of the squared commutator between σ z r 1 and σ z r 2 calculated using different methods.(a)Comparison between the Heisenberg time evolution based on exact diagonalization (ED-Operator) and the Schrodinger time evolution based on Krylov method and quantum typicality (Krylov-State).The squared commutators from the two methods agree well with each other.(b) Comparison between the Krylov-State method and MPO methods.The Krylov-State method is numerically exact.The result from the MPO method, which involves truncation errors, agrees perfectly with the exact result up to C zz ∼ 1.The accuracy of the MPO results can be enhanced by time splitting.The inset shows the same figure on a log-linear scale, demonstrating that the three methods agree when C is small.The maximal bond dimension of both MPO methods is set to 32.
controlled by a single parameter ξ.For simplicity, we have dropped the tilde in r and t.The integration does not have a closed form, but we understand its behavior in different space-time regimes.First, one can show that C(r, t) = 1 when r = t, setting the wavefront as expected.When (r − t)/ √ t ≤ const., the Fermi-Dirac function approaches a step function in the large t limit, andC(r, t) ≈ erfc r − t √ 2t .(D4)On the other hand, when r −t > ξt, the exponential term dominates the denominator and the integration leads to C(r, t) ≈ 2 exp(ξ ((1 + ξ/2)t − r)), (D5)

FIG. 10
FIG. 10.(a) The squared commuter crossover from the diffusive regime to the exponential regime as r − t increases, i.e., moving away from the lightcone.(b) A schematic illustration of the different space-time regions of the squared commutator.Near the lightcone, its behavior is dominated by the diffusive region, displaying diffusive broadening.

5 FIG. 11 .
FIG. 11.Separation between two different contours of constant C as a function of time for the mixed-field Ising model.The inset shows a log-log plot of the same data.The asymptotic approach to a slope of 1/2, corresponding to p = 1, is clearly visible in the data.Taken from Fig.4(b) of[35].
Denote the restriction of H to the interval [r − , r + ] by H , which means keeping only terms from H that are fully supported on the interval.The restricted Hamiltoon r 0 suppressed.Using the H , define the sequence of Heisenberg operators W via W = e iH t W e −iH t .(E2)Toquantitatively estimate how these terms differ from each other, define the norms α byα = W − W −1 ∞ (E3)with α 0 = W ∞ .In terms of these, it is possible to upper bound objects of the form W − W ∞ asW − W ∞ ≤ j= +1 α j (E4)by repeatedly adding and subtracting W s and using the triangle inequality.The α s are determined using a differential equation, of the norm under unitary transformations, the right hand side can be equivalently written asd dt (U +1 W U † +1 − W ) +1 , W ] + [iH , W ] ∞ = [H +1 − H , W ] ∞ .(E7)The next step identifies H +1 − H with h r0+ + h r0− , since these are the only new terms in H +1 fully supported on [r 0 − − 1, r 0 + + 1] but not fully on [r 0 − , r 0 + ].We also use the fact that W −1 has no non-trivial support on r 0 ± or r 0 ± ( + 1) and hence commutes with H +1 − H . Thus the right hand side of the α differential equation can be taken to be[H +1 − H , W − W −1 ] ≤ 4J W − W −1 = 4Jα −1 .
After the measurement, Alice tells Bob the Bell state |s she obtained.Based on Alice's message, Bob applies the corresponding Pauli operator to his qubit or does nothing if the Bell state is |I .Then the state |a appears on Bob's qubit.To see why it works, we can visualize the final state of Alice and Bob for a specific outcome of the Bell measurement,