From Dual Unitarity to Generic Quantum Operator Spreading

Dual-unitary circuits are paradigmatic examples of exactly solvable yet chaotic quantum many-body systems, but solvability naturally goes along with a degree of non-generic behaviour. By investigating the effect of weakly broken dual-unitarity on the spreading of local operators we study whether, and how, small deviations from dual-unitarity recover fully generic many-body dynamics. We present a discrete path-integral formula for the out-of-time-order correlator and use it to recover a butterfly velocity smaller than the light-cone velocity, $v_B<v_{LC}$ , and a diffusively broadening operator front, two generic features of ergodic quantum spin chains absent in dual-unitary circuit dynamics. We find that the butterfly velocity and diffusion constant are determined by a small set of microscopic quantities and that the operator entanglement of the gates plays a crucial role.

It is natural to ask whether and in what sense DUCs might serve as a starting point to understand the behavior of more general systems. This question is especially relevant given recent advances in noisy intermediate scale quantum devices. While much is now known about DUCs themselves, there are only few results about deviations from dual unitarity [31][32][33] and many questions remain open. In Ref. [31] the behavior of local correlation functions in circuits close to dual unitarity has been investigated. It was found that in most instances local correlators acquire a more generic spatio-temporal structure, not being exclusively supported on the light cone anymore.
In this paper we investigate operator dynamics by considering out-of-time-order correlators (OTOCs) in a broad class of chaotic quantum circuits in which dual unitarity is weakly broken. We show that the OTOC can be expressed as a sum over all possible paths resulting from scattering on individual dual-unitarity-breaking gates, acting as defects in an otherwise dual-unitary circuit. We find that after an initial period in which the dual-unitary form is approximately preserved, even a weakly broken duality leads the OTOC to recover a nonmaximal butterfly velocity and a diffusive broadening of the operator front, hallmarks of generic one-dimensional chaotic quantum many-body systems absent from pure DUCs. The operator front at late times takes a universal form and its parameters are microscopically determined by the entangling properties of the gate. Our manipulations are controlled in the limit where the dual-unitarity-breaking gates are dilute along the direction of the light cone, but we argue that they capture the relevant characteristics of the OTOC on intermediate to long timescales even in the case of a Floquet circuit with dense perturbations. The developed framework is expected to be applicable to different probes of operator spreading.
Dual-unitary circuits. We consider circuits composed of unitary gates U acting on two sites with local Hilbert space dimension q, with matrix elements U ab,cd graphically expressed as (1) In this notation each leg corresponds to an index in the local Hilbert space, and connecting two indices corresponds to a tensor contraction (see e.g. [34]). Unitary gates arranged in a brickwork geometry [ Fig. 1(a)] provide simple models for local, unitary quantum many-body dynamics on a one-dimensional lattice [3][4][5][6]35], with the number of discrete time steps t corresponding to the number of rows in the circuit. A gate U is called dual-unitary if the associated dual gateŨ defined byŨ ab,cd ≡ U db,ca is also unitary [10,13].
Let us review the computation of OTOCs in dualunitary circuits [17]. We consider a basis of local operators σ α normalized according to tr σ † α σ β = qδ αβ . Setting σ 0 = 1, the remaining operators are traceless, and we take σ α (x) to act as σ α on site x and as the identity everywhere else. Denoting the time evolution operator as U(t), we write σ α (x, t) = U(t) † σ α (x) U(t) and consider the OTOC  [36][37][38], and is experimentally accessible in (digital) quantum simulation platforms [30,[39][40][41][42][43]. The OTOC's propagation speed is called butterfly velocity v B and sets the maximal speed at which information can spread in the circuit [44,45]. The OTOC exhibits a strong parity dependence. Here, we focus on C + (x, t) for which (t − x) ∈ 2Z, but the derivation is analogous for C − (x, t), for which (t − x) ∈ 2Z + 1. In general circuits, Eq. (2) can be graphically represented as the contraction of a two-dimensional tensor network, the size of which is set by the light-cone coordinates n = (t − x + 2)/2, m = (t + x)/2, where we have introduced the 'folded' gate acting on four copies of the local Hilbert space, as well as the following vectors, The tensor network (3) can be understood as the contraction of powers of a transfer matrix This transfer matrix is a contracting map, i.e. T n v ≤ v . Hence, all its eigenvalues lie inside or on the boundary of the complex unit disk. In the limit m → ∞ the OTOC is completely determined by the eigenvectors of T n with leading eigenvalue (i.e., modulus 1). If the gate is dual-unitary there exist n + 1 leading eigenvectors that are independent of any further characteristics of the gate and can be constructed explicitly [17]. If these vectors exhaust the set of leading eigenvectors, the gate is called maximally chaotic [15].
While the existence of a local conserved quantity implies the existence of further leading eigenvectors, the set of maximally chaotic gates is dense in the set of dualunitary gates [15]. We call the subspace spanned by this generic set of vectors the maximally chaotic subspace (MCS). Its construction and properties are elaborated in the supplemental material [46].
For circuits composed of maximally chaotic dualunitary gates the following holds [17]: After an initial transient regime, the OTOC for (t − x) ∈ 2Z is only nonvanishing on the light cone edge, |x| = t, where it takes the universal value −1/(q 2 − 1), resulting in a maximal butterfly velocity v B = v LC = 1. For (t − x) ∈ 2Z + 1 the OTOC decays exponentially inside the light cone. This behavior is to be contrasted with the OTOC in generic unitary dynamics, where v B < 1 and the ballistic spreading is accompanied by a diffusively broadening front [5,6].
Breaking dual unitarity. For concreteness we consider gates of the form U = V e iεW where V is a maximally chaotic dual-unitary gate, W is Hermitian, and ε is taken to be small (ε 1). How does the breaking of dual unitarity affect the OTOC? In the absence of dual unitarity and without further constraints, the transfer matrix has only a single, 'trivial', leading eigenvector that leads to lim m→∞ C + αβ (n, m) = 1 for finite n [17]. Signaling that the butterfly velocity in such a circuit is nonmaximal, v B < 1, the butterfly velocity then needs to be determined from the subleading eigenvectors of the transfer matrix.
The contribution of a subleading eigenvector with eigenvalue |λ| < 1 decays on a timescale τ ∼ (− log|λ|) −1 . For weak perturbations from dual unitarity, the manifold of leading eigenvectors is split by an amount O(ε 2 ). If in the limit m → ∞ a finite spectral gap to the remaining spectrum exists, then for sufficiently small ε the largest subleading eigenvectors are predominantly composed out of vectors in the MCS, and the behavior of the OTOC on long times t ≥ 1/ε 2 is determined by those eigenvectors.
In the absence of symmetries that force a gap closing, we expect this to be the generic scenario.
To proceed, we project the transfer matrix to the MCS, similar in spirit to degenerate perturbation theory, and compute the OTOC with the projected transfer matrix. If the perturbed gates are dilute along the light-cone, this approximation is controlled [46]. However, for perturbations that are sufficiently small compared to the spectral gap we expect this description to remain valid in the dense limit.
Notably, the MCS only grows linearly with the size of the transfer matrix compared to the full operator space, which grows exponentially. Reducing the dynamics to the MCS presents a significant computational advantage. Moreover, the resulting transfer matrix can be efficiently truncated, allowing for analytic evaluation (see below).
The matrix elements of the transfer matrix in the MCS, ( |T n |k), can be readily calculated [46]. These depend on the properties of the gates through a set of quantities B k , k = 1, . . . , n. Graphically Defining z 1 := (B 1 − 1)/(q 2 − 1) and z k := (B k − B k−1 )/(q 2 − 1) for k > 1 we find that (0|T n |0) = 1 and otherwise. (8b) They have the following structure: (i) the matrix is upper triangular, as a direct consequence of unitarity [46]. (ii) Except in the first row, the k-th side diagonal has the same entry everywhere. This is only the case if all gates are identical and follows from translational invariance. Taken together, these imply that the eigenvalues are λ 0 = 1 with algebraic multiplicity 1, and λ 1 = 1 − z 1 < 1 with algebraic multiplicity n. All matrix elements can be given a quantuminformation theoretic interpretation.
Inserting the Schmidt decomposition of the gate, U = j √ σ j X j ⊗Y j , into z 1 reveals that this quantity is equivalent to the linear operator entanglement of the gate E(U ) [47], with z 1 = 1 − q 2 q 2 −1 E(U ). The properties of the operator entanglement imply 0 ≤ z 1 ≤ 1. As dual unitarity of U is equivalent to U having maximal operator entanglement [48], z 1 = 0 iff U is dual unitary, and z 1 hence quantifies proximity to dual unitarity.
The subleading eigenvalues follow from Eq. (8b) as 1 − z 1 , such that the timescale τ for deviations from dual unitarity to become apparent follows as All B k for k > 1 can analogously be given the interpretation as operator entanglements of k-fold diagonally composed gates on enlarged Hilbert spaces [46]. Because the diagonal composition preserves dual unitarity [24], z k = 0 for all k iff U is dual-unitary. The z k are bounded by 0 ≤ z k ≤ (1 − z 1 ) k−1 , implying that z k → 0 for k → ∞ [46]. In general, knowledge of all higher-order operator entanglements is necessary to compute the OTOC, however they are increasingly less important.
While it is not possible to calculate arbitrary powers of the transfer matrix exactly, the structure of Eq. (8) allows for a systematic expansion of the OTOC using a path-integral approach. We write, where the |u j ) are vectors containing the off-diagonal matrix elements, ( |u k ) = ( |T n |k) for k > and ( |u k ) = 0 otherwise. The off-diagonal terms are of order z k ∼ 2 , and we expand the OTOC in powers of (T n − D). Each such power can be expressed as The sum consists of products of off-diagonal matrix elements indexed by sets {j 1 , . . . , j ν }. These indices can be interpreted as nodes of a path and the off-diagonal matrix elements ( |u k ) act as propagators determining the amplitude of jumping k − steps inside the light cone. Crucially, these only depend on the difference k − and the amplitudes for negative step sizes vanish, making sure only causal paths contribute to the OTOC [ Fig. 1(b)]. From Eq. (8) it follows that jumps of size k are controlled by z k /q k , such that jumps with a large k are exponentially suppressed. We note that having maximal operator entanglement implies that all off-diagonal matrix elements vanish. Hence, in dual-unitary circuits the edges of an operator string can only move along the edges of the light cone and we recover the previous result.
Although the mathematical origin is different, the picture presented here is qualitatively similar to the one presented in Ref. [31] for the two-point functions. The skeleton diagrams appearing there resemble the scattering paths introduced above. The similarity of the two results might merely be a reflection of the underlying physics of dual-unitary circuits. In dual-unitary circuits all excitations move with maximal velocity. Breaking dual unitarity then allows processes that violate this rule, suggesting expansions in orders of such processes.  Truncation of the path integral. To make analytical progress, we restrict the sum over paths. In the largeq limit the path integral is dominated by those paths in which single steps are at most of size 1, since higher-order steps are suppressed by powers of q [see Eq. (8)].
We now approximate the OTOC for general q by only considering these paths, and call this the one-step approximation. On the level of scattering amplitudes this approximation corresponds to letting only z 1 be nonzero. However, even for qubits (q = 2) the one-step approximation already produces the correct functional form of the asymptotic profile close to the center of the front [ Fig. 2 Within this approximation the path integral can be evaluated exactly, leading to where B z1 (a, b) denotes the incomplete β-function. An asymptotic expansion yields a butterfly velocity v B,1 = (1 − z 1 )/(1 + z 1 ) and a front that takes the form This is the form expected from a diffusively broadening front with diffusion constant . Both the form of the front and the scaling of the diffusion constant for v B,1 → 1 agree with results from Haar random circuits [5,6].
To obtain a better quantitative agreement for q = 2, we consider the two-step approximation. An exact calculation yields [46]  with ξ = z 2 /(q 2 z 1 ). This result can be understood by noting that the path integral is asymptotically dominated by the typical path. If z 2 z 1 the typical fraction of steps of size two is approximately given by the ratio of scattering amplitudes ξ = z 2 /(q 2 z 1 ). The steps of size two therefore effectively shift the profile [Eq. (13b)] deeper into the light cone, n → n(1 − ξ).
Close to the shifted front the shape of the profile is preserved, with renormalized parameters where δ = (1 + v B,1 )ξ/2. By taking into account longer steps, operator strings can move into the light cone faster, diminishing the butterfly velocity and, since this increases the variance of the distribution of the endpoints of operators, enhancing diffusion. Numerical results. Restricting the dynamics to the MCS (of linear size in n) presents an enormous simplification compared to the full exponentially large space when probing the long time limit. This restriction allows the efficient numerical evaluation of the OTOC under the assumptions stated above.
First, we study the behavior of the OTOC on the geometric light cone for a Floquet circuit consisting of identical perturbed dual-unitary gates with q = 2. We find that the main features of dual unitarity persist up to the timescale (9) [Fig. 3]. At this timescale, the deviation of the OTOC on the geometric light cone C + (t, t) from the dual-unitary prediction, −1/(q 2 − 1) = −1/3, becomes of order 1. This indicates that for earlier times most of the operator strings still travel at maximal velocity.
In the late-time regime the operator front slows down, moving ballistically with v B < 1, and shows approximate diffusive broadening [see Fig. 2(a,b)]. At late times the shape of the operator front is well described by an error function of the form described in Eq. (14), indicating that higher k-steps only serve to further renormalize the arguments of the obtained profile.
We extract the butterfly velocity and diffusion constant and compare them to the analytic prediction obtained by truncating the path integral [ Fig. 4]. For a specific but randomly selected perturbation we find that the two-step approximation is in good agreement with the full path integral result for low to intermediate perturbation strengths. For the diffusion constant the discrepancy is larger, but the qualitative behavior is captured. We attribute this discrepancy to paths which contain large steps. For both quantities, the two-step approximation significantly improves the one-step approximation, and the accuracy of this approximation is expected to increase for larger q or by including higher steps.

Discussion.
The observation of diffusively broadening fronts in non-random systems far away from the dual-unitary limit [8,29] hints at the presence of a more general mechanism. Numerical results indicate that the degeneracy of the subleading eigenvalue can remain stable far from dual unitarity, suggesting that a similar path-integral description remains possible beyond the perturbative regime. Moreover, the non-Hermiticity of the transfer matrix might play a central role -Ref. [49] previously observed that non-Hermiticity can strongly influence the behavior of OTOCs.
Our work shows that dual-unitary circuits can serve as a starting point to investigate more generic settings. We hope that this work opens up further studies on perturbed dual-unitary cicuits, e.g., on entanglement dynamics, the relation to transport, or spectral properties.
The developed framework can be directly applied to more general probes of operator dynamics in perturbed dual-unitary dynamics involving multiple replicas of the circuit, e.g. Rényi (operator) entanglement [15,32], spectral form factors [16,50], or in studies of 'deep thermalization' [33,51,52]. In this supplemental material we detail the calculations mentioned in the main text. The first section reviews the construction and properties of the maximally chaotic subspace (MCS), as well as its application to the computation of out-of-time-ordered correlators (OTOCs) in dual-unitary circuits. Next, the matrix elements of general gates in the MCS are computed. In the second section the quantum-information-theoretic interpretation of the scattering amplitudes z k is elaborated upon and used to prove various bounds. The third section presents details of the derivation of the discrete path integral formula and the profile of the OTOC in the 1-and two-step approximation is derived and its asymptotic form derived.

S-I. TRANSFER MATRIX IN THE MAXIMALLY CHAOTIC SUBSPACE
In this section the construction of the MCS and the computation of OTOCs in dual-unitary circuits is reviewed (for a more detailed derivation see Ref. [17]), and the matrix elements of the transfer matrix for perturbed circuits in this space are computed. For an overview of unitary circuits, see Ref. [35]. The conventions used in the following are introduced in the main text.

A. Construction of the Maximally Chaotic Subspace
As introduced in the main text, the OTOC is defined as Following Ref. [17], after representing this equation graphically and eliminating all gates outside the causal light cones of σ α and σ β , the OTOC can be represented as the following tensor network (TN) contraction for t − x ∈ 2Z where the light-cone coordinates n = (t − x + 2)/2, m = (t + x)/2 set the size of the TN. For t − x ∈ 2Z + 1 we find and in this case n = (t − x + 1)/2, m = (t + x + 1)/2. We are interested in the dynamics at long times and relatively close to the geometric light cone t = x. Therefore, we consider the TN in the limit of large m as being generated by the column transfer matrix T n , defined as which we rotate by 90 • for convenience. We introduce the vectors corresponding to the left and right boundary conditions For odd (t − x) the right boundary condition reads such that we can write The transfer matrix is a contracting map, i.e. T n v ≤ v . Hence, all its eigenvalues lie inside or on the boundary of the complex unit disk. In the limit m → ∞ the bulk of the TN becomes a projector onto the leading (modulus 1) eigenspace of the transfer matrix. For circuits composed of generic unitary gates the leading eigenspace is nondegenerate. As T n is in general not symmetric the leading left-and right-eigenvectors are not related by transposition. In the folded representation unitarity is expressed as where the triangle denotes an arbitrary permutation of even and odd legs. These relations imply that the following two vectors are leading eigenvectors Dual unitarity, which is expressed in the folded language as, allows the construction of further leading eigenvectors. Using the above identities, it can be shown that vectors of the form are right eigenvectors with eigenvalue 1. Moreover, the associated transposed vectors (n, k| are left eigenvectors with eigenvalue 1. These vectors can be used to construct a set of orthonormal eigenvectors as |n, 0) := |n, 0), |n, k) := q|n, k) − |n, k − 1) We call the space spanned by these vectors the maximally chaotic subspace (MCS) [15]. In the main text we adopt the simplified notation |k) := |n, k) for convenience. The MCS is constructed without any reference to the properties of the particular gate, apart from dual unitarity. If the leading eigenspace of the column transfer matrix constructed from a gate equals the MCS, this gate is called maximally chaotic. Not every dual-unitary gate is maximally chaotic, in particular the existence of a local conserved quantity always leads to additional leading eigenvectors, but the set of maximally chaotic gates is dense in the set of dual-unitary gates.

B. OTOCs in Maximally Chaotic Dual-Unitary Circuits
For maximally chaotic gates the OTOC acquires a universal form in the late-time regime. As discussed above, for m → ∞ the bulk of the TN becomes a projector on the leading eigenspace, in this case the MCS. Thus, to compute the OTOC it only remains to contract the vectors in the MCS with the boundary conditions of the TN. The nonzero overlaps are given by L n |n, 0 = 1 q n/2 , L n |n, 1 = − 1 L n (σ α )|n, k n, k|R + n (σ β ) = − 1 q 2 − 1 δ n,1 , (S15a) (S15b)

C. Projected Transfer Matrix for General Circuits
We perform degenerate perturbation theory in the MCS. For this we need to compute the matrix elements of the column transfer matrix in the MCS. First of all we compute (n, |T n |n, k) for ≥ k and find the same result as for dual-unitary gates (n, |T n |n, k) = 1 q 2n+1 since the unitarity condition can be applied to every gate and we obtain a number that does not depend on any microscopic properties of the gates. Next we compute (n, 0|T n |n, k) = 1 q 2n+1 where we have used the unitarity of the gate and expressed the matrix element through B k as introduced in the main text. Finally, for 0 < < k we find (n, |T n |n, k) = 1 q 2n+1 Translating these results into the orthonormal basis {|n, k)} yields (n, 0|T n |n, 0) = 1 (S19a) otherwise. (S19c) The quantities z k = (B k − B k−1 )/(q 2 − 1) have been introduced in the main text. Notice that the transfer matrix in the MCS has a non-trivial Jordan structure. The algebraic multiplicity of λ 1 = 1 − z 1 is n, while its geometric multiplicity is 1. The Jordan structure is crucial in obtaining a diffusive OTOC profile, since a diagonalizable transfer matrix can only give rise to exponential decay (given the overlaps of MCS vectors with the boundary conditions).

D. Dilute Limit
In the following we argue that the projection of the transfer matrix to the MCS is controlled in a particular dilute limit. Note that after repeated application the transfer matrix of a maximally chaotic dual-unitary circuit approaches the projector on the MCS, lim m→∞ T m n,DU = Π MCS . Hence, a circuit composed of lines of defects orthogonal to the light cone with patches of maximally chaotic dual unitary gates inbetween can be expressed using such a projected transfer matrix, up to corrections exponentially small in the distance between the defects. This is depicted in Fig. S1 for finite sized patches.

S-II. SCATTERING AMPLITUDES AND OPERATOR ENTANGLEMENTS
In this section relevant properties of the matrix elements B k and the resulting scattering amplitudes z k are collected. It is shown that the B k can be expressed as operator entanglements on an enlarged Hilbert space. This construction allows to prove the bound 1 ≤ B k ≤ q 2 (implying z k ≤ 1). Furthermore, it is shown that for gates that are not dual-unitary the B k converge to q 2 for k → ∞ (and hence z k → 0 for k → ∞).
FIG. S1. Time-evolution operator for a circuit composed of lines of defects orthogonal to the light cone (marked in red) with patches of maximally chaotic dual unitary gates in between (marked in yellow). In the limit of large separation between the lines of defects the projection to the MCS is controlled.

A. Equivalence of B1 and Operator Entanglement
In the following it is shown that the quantity B 1 is equivalent to the operator entanglement of the gate. This equivalence implies the bound 1 ≤ B 1 ≤ q 2 and that B 1 = 1 if and only if the underlying gate is dual unitary. We begin by reviewing the operator-to-state mapping for two-site gates: an operator U acting on two sites can be mapped to a state |ψ U on a 4-site Hilbert space by here |φ = 1 √ q j |j |j denotes the Bell state on two sites. Operationally, two Bell pairs are created and subsequently entangled by the gate U . This is clearer in the graphical representation: It is useful to introduce the Schmidt decomposition of U , U = q 2 j=1 √ σ j X j ⊗ Y j , where σ j ≥ 0 are the Schmidt coefficients, and X j and Y j denote orthonormal operator bases of the local Hilbert space, tr[X † j X k ] = δ jk and tr[Y † j Y k ] = δ jk . This implies a decomposition of the state |ψ U as The entanglement measure that is related to B 1 is the purity of the reduced density matrix on the subset AC. It holds that The so-called linear operator entanglement is then defined as E(U ) = 1 − Tr[ρ 2 AC ] [47]. Its maximal value is E(U ) = 1 − 1 q 2 when all Schmidt values are equal. We call such operators maximally entangled. An operator being maximally entangled is equivalent to it satisfying unitarity in the spatial direction [48]. Hence, dual unitary gates are those gates which are unitary and maximally entangled. For separable operators of the form U = qX A ⊗ Y B the operator entanglement takes the minimal value E(U ) = 0.
On the other hand, inserting the Schmidt decomposition into the matrix element B 1 we find The equivalence can also be seen from inspection of the corresponding diagram The bound on E(U ) immediately implies 1 ≤ B 1 ≤ q 2 and that B 1 = 1 is attained iff the gate is dual unitary.

B. B k and Operator Entanglement of Diagonal Compositions
In the following we generalize the above arguments to B k with k > 1. We show that these matrix elements can also be expressed through the operator entanglement of a gate acting on a larger Hilbert space with respect to a particular bipartition. This immediately implies 1 ≤ B k ≤ q k+1 , a generalized version of the bound derived in the previous section. We then use that the enlarged gate is constructed via diagonal composition to prove the more stringent bound B k ≤ q 2 .
Consider the following Schmidt decomposition of a gate U ∈ U (q k+1 ) with X j : C q → C q k and Y j : C q k → C q forming complete orthonormal bases. Using the graphical calculus we can show that B k can be expressed through these Schmidt values as Notice that this result does not rely on any internal structure of U. We can define the operator entanglement with respect to the above partition as This quantity determines the higher-order values of B k as In the following we make use of the fact that the gate U determining B k is constructed out of the two-site gate U by diagonal composition (see Eq. (S32)) to derive the stronger bound B k ≤ q 2 . We use an argument based on monogamy of entanglement. On the level of operators it expresses that operators constructed by diagonal composition cannot be separable with respect to the partition introduced above.
Formally, we introduce 2k + 2 sites with a local q-dimensional Hilbert space and consider the partitions Then we define the state where |φ XY denotes the generalized Bell state on the appropriate subspaces. In words, we prepare the subsystems A C and B D in maximally entangled states respectively and then we apply the diagonally composed unitary transformation U to the subset A B . For illustration, in the case of k = 2 we have the state Eq. (S28) expresses that the operator entanglement deriving from the particular Schmidt decomposition introduced above is given by computing the purity E k (U) = 1 − Tr[ρ 2 AC ] with respect to the subset AC. Consider what happens to a separable gate in this setup. Take e.g., U to be the identity, for which direct computation returns B k = q 2 . The operator-to-state mapping can be represented as The first site of A and C are maximally entangled, while the remaining sites of A are independent of C. After tracing out B D it holds that This is exactly the product of a Bell state on sites A 1 and C with a maximally mixed state on the remaining sites. It follows directly that Tr ρ 2 AC = 1/q k−1 , which gives B k = q 2 and reproduces the earlier result. It is a consequence of the gate not passing any entanglement between A and B, thus A and C remain maximally entangled. In the general case, U passes entanglement between A and B and thus, by monogamy of entanglement, the entanglement of AC is diminished. Hence, q 2 indeed constitutes an uppper bound for B k .

C. Asymptotic Behavior for large k
In this section we show that for gates that are not dual unitary B k → q 2 for k → ∞. In the generic case, the convergence is determined by B 1 .
First, by unfolding the graphical representation of B k it can be expressed as the contraction of a transfer matrix T Acting from the right with T and interpreting the contraction on the left as a density operator, T is a completely positive trace preserving map. Graphically This map allows for a direct operator sum representation as in which k,k E † cc ,aa kk E aa ,bb kk = δ bc δ b c . The contracting property of T also immediately implies that B k ≥ B k−1 and hence z k ≥ 0.
Generically, this transfer matrix has one leading left-and right-eigenvector, equal to the transpose of the right and left boundary conditions respectively, and for k → ∞ we can again replace the repeated application of the transfer matrix with a projection operator, now returning However, for dual unitary gates the transfer matrix has two degenerate leading eigenvectors, both left and right, such that B k = 1 for all k since the boundary conditions are exact eigenstates, This result can also be understood by noting that diagonal composition of dual-unitary gates preserves dual-unitarity [24], such that we can immediately extend the argument that B 1 = 1 in the dual-unitary case to arbitrary values of k.
Close to dual unitarity we can use degenerate perturbation theory in this two-dimensional subspace to find This approximate value is in fact a lower bound on B k . In the representation of Eq. (S37) the contraction from the left is equal to T acting on the identity matrix. Since T is a quantum channel, T (1) is a positive operator which has a spectral decomposition T (1) = α p α |ψ α ψ α | with p α ≥ 0 and the eigenvalues are identical to the eigenvalues of T . Consequently, we can write B k ∼ φ|T (1)|φ = α p α | φ|ψ α | 2 , with |φ a Bell state. Thus, the contribution of every single eigenvector in the spectral decomposition of T to B k is nonnegative, and it follows that Eq. (S40) constitutes a lower bound. Combining this lower bound with the previously established upper bound in turn implies an upper bound for z k via

S-III. PATH INTEGRAL FORMULA FOR THE OTOC
In this section the diagram rules for the computation of the OTOC are derived. These rules are then used to obtain analytic expressions for the OTOC after identifying the dominating contributions to the path integral. We discuss the one-and two-step approximations that result from restricting to paths containing steps of at most length 1 (2) and analyze their asymptotic behavior. We demonstrate that the one-step paths dominate the calculation of the OTOC in the limit of a large local Hilbert space.

A. Derivation of the Diagram Rules
We write the transfer matrix in the orthogonal basis of the MCS introduced in the first section as For convenience we introduce M := T n − D = n j=1 |u j )(j|. To compute the OTOC we need powers of T n . As the matrix M is linear in the scattering amplitudes z k , we can expand T m n in M as (notice that we do not expand D in z 1 -we comment on this in the next section) The zeroth order term is given by For z 1 > 0 this expression differs from the dual-unitary result because the matrix D already takes into account certain deviations from dual unitarity due to the inclusion of the z 1 terms in D. These terms can be interpreted as scattering processes that leave some weight of the right edge of the operator string on the edge of the light cone. Physically, they lead to damping: the right edge of the operator string becomes trivial after a time ∼ (log(1 − z 1 )) −1 . We could have equally well absorbed these terms into the propagator by setting (j|u j ) = −z 1 (which explains the interpretation as scattering processes), at the expense of having paths with negative weights. The ν-th order contribution can be generally written as From the explicit form of D if follows that M D = (1 − z 1 )M , which allows to write From this expression we immediately see that the effect of light-cone-edge scattering can be absorbed entirely into a renormalization of the first vertex |u j1 ) that appears in each path, or alternatively speaking into the overlap with the left boundary (L n |u j1 ). Notice that this overlap now depends on the coordinates n, m and the order in perturbation theory ν. The combinatorics of the edge scattering is contained in the generating function The coefficients of the polynomial are given by the number of solutions to the equation k 1 + · · · + k ν = m − ν − k 0 where the k i are nonnegative integers and k 0 is held fixed. By "stars and bars" it follows that For ν ≥ 2 this polynomial can be brought into the form of a hypergeometric function by defining β k = m−k−1 and consequently The last piece required to compute the renormalized left overlaps is the action of D on the propagators, D|u j ) = (u j0 , (1 − z 1 )u j1 , . . . , (1 − z 1 )u jn ). Hence, The OTOC is given by computing the overlap By defining the renormalized left overlap (L n,m,ν | := (L n |P m,ν (1 − z 1 ) −1 D we can write this as a sum over weights of the form W ± n,m,ν (j 1 , . . . , j ν ) = (L T n,m,ν u j1 ) The indices {j 1 , . . . , j ν } can be interpreted as nodes of a path and the quantities ( |u k ) play the role of propagators determining the amplitude of jumping k − steps inside the light cone. The amplitudes for negative step sizes vanish making sure only causal paths contribute to the OTOC. This allows us to formulate the following diagram rules • For each 1 ≤ ν ≤ m consider all paths with ν nodes, {j 1 , . . . , j ν }, such that j i < j i+1 (causality) and j ν ≤ n.
• For a given path, associate each step i → i+1 with the propagator (j i |u ji+1 ), the starting point with (L n,m,ν |u j1 ) and the endpoint with (j ν |R ± n ). • Sum over all paths at a given order, and then over all orders, weighted by (1 − z 1 ) m−ν .

B. Alternative Diagram Rules
In the above approach, we have absorbed the amplitude for edge scattering processes into the diagonal matrix D. As a consequence, the effect of these processes is to renormalize the left boundary condition of the path integral. It is however also possible to keep these processes in the path integral, at the expense of negative weights and more involved combinatorics. In that case we have where (j|u j ) = −z 1 and (j|u k ) = (j|u k ) for j = k. The diagram rules now read • For each 1 ≤ ν ≤ m consider all paths with ν nodes, {j 1 , . . . , j ν }, such that j i ≤ j i+1 and j ν ≤ n.
• For a given path, associate each step i → i + 1 with the propagator (j i |u ji+1 ), the starting point with (L n |u j1 ) and the endpoint with (j ν |R ± n ).
• Sum over all paths at a given order, and then over all orders, weighted by m ν .

C. one-step Approximation
Using the diagram rules, the OTOC in the one-step approximation can be found. Only two distinct paths contribute. The first starts at j 1 = 1 followed by n − 1 steps upward (ν = n), and the second starts at j 1 = 2 followed by n − 2 steps upward (ν = n − 1). This yields We define the front profile in the large q limit F z1 (n, m) := z n 1 (1 − z 1 ) m−n P m,n (1 − z 1 ).
We find that the second term in Eq. (S55) vanishes asymptotically, while the first, given by F z1 , remains finite. The above considerations have to be modified exactly on the light cone, n = 1, where paths cannot start at 2 and the zeroth order term (L n |D m |R + n ) also contributes. Explicit calculation yields

Asymptotic Analysis of the Front Profile
While we have an exact expression for the OTOC in the MCS in the large q limit, the function is inefficient to compute numerically for large values of its argument. To better understand the behavior of the OTOC, we investigate the properties of F z1 (n, m) for large n, m.
First, we can rewrite F z1 in terms of the incomplete beta function B z1 (p, q) F z1 (n, m) = n m n B z1 (n, m − n + 1).
The following asymptotic analysis is facilitated by using the integral representation of the incomplete beta function, To determine the butterfly velocity we investigate the OTOC on rays of constant velocity x = vt as t → ∞. Recall that n, m are related to the spacetime coordinates as Hence, in the relevant limit, κ −1 := m/(n − 1) in which we have defined Eq. (S61) has the form of an Laplace integral, which can be asymptotically analyzed using a saddle-point approximation. We find that the saddle point is situated at s 0 = κ = 1−v 1+v . For fixed perturbation strength z 1 the saddle point is located either inside, outside, or on the border of the integration region depending on the value of the velocity. The asymptotic behavior on rays x = vt thus drastically changes its character as a critical velocity v B,1 = 1−z1 1+z1 is crossed. This velocity corresponds exactly to the butterfly velocity, as will be demonstrated below.
Consider first the case v < v B,1 in which the saddle point lies outside the integration domain. Physically we expect the OTOC to decay exponentially to zero as t → ∞, signifying scrambling. The large n approximation to the beta function reads We further use Stirling's approximation for the binomial where Putting these results together we find with γ(v, z 1 ) given by For v < (1 − z 1 )/(1 + z 1 ) we have that γ is smaller than 1, such that the OTOC decays exponentially inside the light cone, as expected. We define a scrambling time as t * := −(log γ(v, z 1 )) −1 . Moreover, we have γ v = 1−z1 1+z1 , z 1 = 1. Outside the light cone, for v > v B , the saddle point lies inside the integral and hence the large n analysis yields B z1 (n, m − n + 1) ≈ e mg(κ) 1 − κ 2π m|g (κ)| = πκ(1 − κ) 2m (1 − κ) m−n κ n−1 .
This indicates that operators outside the light cone commute. These two results together already show that v B,1 indeed equals the butterfly velocity. In the following the form of the OTOC close to the front, x − v B,1 t = O( √ t), is determined. We write x = v B,1 t + δ where we assume δ ∼ O( √ t). We begin with evaluating the incomplete β-function in this limit. The integral is no longer a Laplace integral, but it possesses two large parameters. Following Fulks' method from Ref. [53], we find that (recall that the saddle point of the original problem lies on the boundary of the interval) [54] B z1 ≈ A(t) where we have subsumed the known prefactors only depending on t in the function A(t), and have introduced the error function erf(x) = 2 √ π x 0 ds e −s 2 . Let us also analyze the remaining terms in F . The prefactor contains only terms that scale as δ/t ∼ 1/t 1/2 beyond leading order. The binomial calls for a more thorough analysis. Formally, Fulks' method has to also be applied, but in this case this approach can be avoided by carefully expanding Stirling's approximation in δ. Corrections to Stirling's approximation are not relevant, since they are suppressed by at least 1/t. Dropping terms of order 1/t, we write The different contributions to the exponentials can now be considered. At leading order we can pull out ζ t , with ζ defined in Eq. (S66). The remaining term reads These are now all terms that are of order one in the limit t → ∞, δ ∼ √ t. Hence we conclude This is indeed the shape of a diffusively broadening front, 1 2 (1 + erf( δ √ 2D1t )), with diffusion constant D. two-step Approximation Using the above rules, the OTOC in the two-step approximation can be constructed. There are three possible starting points. Starting at n = 1 we denote the number of two-steps by h 2 . There are n−1−h2 h2 choices to distribute these steps. A given path with h 2 two-steps has a total of ν = n − h 2 nodes. We find (1 − z 1 ) m−n+h2 P m,n−h2 (1 − z 1 ) .
Introducingz 1 = z 1 − z 2 /q 2 we can rewrite this in terms of the first-order front profile F z1