Fastest local entanglement scrambler, multistage thermalization, and a non-Hermitian phantom

We study random quantum circuits and their rate of producing bipartite entanglement, specifically with respect to the choice of 2-qubit gates and the order (protocol) in which these are applied. The problem is mapped to a Markovian process and proved that there are large spectral equivalence classes -- different configurations have the same spectrum. Optimal gates and the protocol that generate entanglement with the fastest theoretically possible rate are identified. Relaxation towards the asymptotic thermal entanglement proceeds via a series of phase transitions in the local relaxation rate, which is a consequence of non-Hermiticity. In particular, non-Hermiticity can cause the rate to be either faster, or, even more interestingly, slower than predicted by the matrix eigenvalue gap. This is caused by an exponential in system size explosion of expansion coefficient sizes resulting in a 'phantom' eigenvalue, and is due to non-orthogonality of non-Hermitian eigenvectors. We numerically demonstrate that the phenomenon occurs also in random circuits with non-optimal generic gates, random U(4) gates, and also without spatial or temporal randomness, suggesting that it could be of wide importance also in other non-Hermitian settings, including correlations.


I. INTRODUCTION
Entanglement is one of the key properties that can make quantum systems different than classical ones. This is reflected in quantum information -large entanglement is a necessary resource to gain advantage over classical computation, and many of the new phases discovered in recent decades can be distinguished by different patterns of entanglement [1]. Because entanglement and the related concept of quantum information plays such a fundamental role it is also instrumental in the quest to push the boundaries of the present day physics, for instance, trying to figure out essential rules that quantum gravity should obey.
An elementary question is how can one efficiently generate this resource? One procedure that we focus on are the so-called random quantum circuits [2], where quantum gates are chosen randomly from a certain set of gates. What set one takes might foremost depend on the available resources; while one wants to generate entanglement as quickly as possible, one must use the resources as efficiently as possible. What is meant by efficient will depend on the context, however, there are some common conditions. Richness of nature emerges due to two ingredients, innate properties of constituent objects (particles) and local interactions between them. Locality, being intimately related to causality, is rather important and is typically also the costly resource in quantum computation. Local transformations, i.e. 1-site unitary operations, are faster to perform and typically have higher fidelity, while interactions in the form of 2-site gates are expensive. We shall focus on random circuits in which 1-site resources are random (1-qubit unitary from the Haar measure) while the 2-site transformations are held fixed. Such a choice makes sense for two reasons: (i) it follows quantum information cost guidelines, and (ii) in some cases allows exact solvability. We will demon-strate that taking a fixed good entangling 2-qubit gate is actually better than randomly choosing the whole 2qubit transformation. Optimal random circuits that generate entanglement the fastest should be of interest also in the near-term applications of noisy quantum computers where they have been identified as prime candidates in the quest to demonstrate quantum supremacy [3]. Using the optimal circuit in such a quest is of high relevance as it directly affects the execution speed and therefore the attainable fidelity, which can in turn be crucial for an experiment to be on the right side of the supremacy frontier.
Random circuits have also another use -they are believed to correctly describe some of the properties of generic quantum systems, like e.g., dynamics of operators [4][5][6][7], and can therefore serve as models of chaotic many-body systems [8]. The main advantage over chaotic systems is that their randomness enables analytical simplifications, leading to exact results. Exact results for entanglement evolution have been obtained also for the CFT [9]. Recently the two pictures, that of random circuits and solvable quantum systems, emerged in the form of the so-called dual-unitary circuits, e.g. Ref. [10], which are solvable models having some elements of chaotic systems (as well as of integrable ones). As we shall see, extremal random circuits use the same building blocks as dual-unitary circuits. Interesting phenomena we discuss though are not limited neither to extremal nor to dual-unitary circuits.
Entanglement generation has been as well studied in the context of black hole physics [11][12][13]. A question of intense interest is in particular the maximal possible entanglement generation, with various bounds end explicit results [14][15][16]. The fact that random circuits 'scramble' information well can be used also for decoupling protocols [17] -a procedure in which any initial (local) correlations between an observer and a system are spread arXiv:2101.05579v2 [quant-ph] 21 Apr 2021 out globally, such that no local measurement on the system can reveal any correlation anymore -an observer becomes decoupled from a system (under local measurements). Such quantum information decoupling is in fact what is effectively going on during the thermalization process [18], and so random circuits can also be thought of as being models of ideal thermalization (i.e., lacking any Hamiltonian-specific features).
We obtain a complete class of random protocols that generate entanglement optimally, and more importantly, identify several new and surprising features that can emerge in non-Hermitian matrices describing many-body systems. Specifically, we (i) find random circuits that produce entanglement in the fastest possible way. Our fastest circuit is significantly faster than the best previous random circuits. (ii) We identify a number of phase transitions in time -at certain moments the convergence rate to the asymptotic "thermal" entanglement of random states suddenly changes. This shows that an ideal thermalization modeled by a random circuit is a two-stage process rather than relaxation with a constant rate. (iii) We find that the convergence rate may not be given by the transfer matrix gap but can instead be either larger or smaller. This does not occur just in some obscure unimportant cases but in almost every case we looked at, specifically in the fastest circuits as well as in generic ones, and also in circuits with random U(4) gates much studied in the past. This happens in spite of the spectrum itself being rather innocuous, e.g. the 2nd largest eigenvalue λ 2 is gapped away from both λ 1 = 1 and |λ 3 | < |λ 2 |. We also interestingly observe that in the thermodynamic limit the phenomenon is observed also in a single realization of a random circuits and that, in fact, randomness is not necessary neither in space, nor in time (one can use the same random 1-qubit transformation at all sites and at all times). The fact that a common 'folk theorem' that the decay is given by the gap does not hold might have important implications in many areas of physics where one deals with non-Hermitian matrices, e.g., dissipative systems, transfer matrices, etc.. Preliminary results show [19] the same phenomenon also in out-of-time-ordered correlations (OTOC).
One particularly intriguing case is where the decay is ∼ ( 1 2 ) t rather than ∼ ( 1 4 ) t suggested by |λ 2 | = 1 4 . We show that this occurs due to a 'phantom' eigenvalue 1 2an 'eigenvalue' that is not in the spectrum but is rather just mimicked by exponentially growing expansion coefficients in front of smaller (true) eigenvalues. We are not aware of any similar observations; the closest is perhaps a recent study [20] of vanishing gaps in Lindblad generators, finding that due to exponentially growing expansion coefficients the gap does not always give physically relevant relaxation timescale [21].
6 5 = 1.2 [24,25] XY,CNOT rand. 4 3 ≈ 1.33 [24] TABLE I. Existing exact results for the purity decay rate rE, defined as I(t) ∼ exp (−rEt), in qubit random circuits (cf. Fig. 1). Per unit of time ∼ n gates are applied. Deterministic nearest-neighbor (n.n.) protocol with a brick-wall (BW) pattern of gates is faster than randomly choosing a n.n. pair on which the gate acts (rand.). Allowing coupling between an arbitrary pair of gates (all-all) is even faster. Results are shown for open (OBC) and periodic boundary conditions (PBC). U(4) denotes a Haar random gate, while the XY gate is also known as the iSWAP or DCNOT gate.
vergence properties of random circuits are Refs. [2,22]. First exact results about the convergence rate towards random states were made possible by mapping [23] the average dynamics to a Markovian chain. Using the mapping, the question about the speed of generating entanglement boils down to the question about the gap of a certain transfer matrix. Such a mapping is rather fruitful, not just for the specific question of entanglement generation [24][25][26][27][28], but also for a nice systematic treatment of any expectation that involves t copies of the propagator U and t copies of U † -a so-called unitary t-design [29][30][31][32][33]. The simplest entanglement quantifier purity I = trρ 2 A therefore belongs under the realm of a 2-design. The Markovian mapping in particular allows to get exact results for the purity convergence rate (purity "entanglement speed") by recognizing that the resulting matrix is equivalent to an (integrable) spin chain [24]. Recently, Markovian description has been used to describe evolution of purity for all possible bipartitions at once [27]. Exact results are by now available for various protocols containing the 2-qubit random Haar U(4) gate (including quantities beyond purity), e.g., choosing a random nearest-neighbor pair [24,25], picking a random permutation [34], or having a brick-wall pattern [4,5,35], or for random single-site unitaries and a global phase [28]. If one allows the coupling between all pairs of qubits one can also get the exact entanglement speed for some nonrandom gates, in particular for the important XY gate (as well as for the CNOT) [24], see Table I. Exact results are also available in the limit of large local Hilbert space dimension q [4,8,25,[34][35][36] (we shall focus on qubits, q = 2). For numerical results see Refs. [26,33,[36][37][38]. Summarizing these results in Table I we see that using nearest-neighbor (n.n.) qubit gates, the fastest known protocol is the brick-wall (BW) configuration with a U(4) gate for which the entanglement rate defined via a longtime purity decay I(t) exp(−r E t) is r E ≈ 0.45 [39] per boundary link. We will find a protocol which is more than 3 times faster and has the maximal possible entanglement speed. The optimal 2-qubit gate will turn out to be the so-called XXZ gate (also known as the pSWAP), a special case of which is the XY gate (iSWAP). The same gate has appeared before in the context of random circuits: it has been observed numerically that the XY gate is the fastest of all gates for random n.n., or all-all protocol [38] (which though are much slower than our best BW protocol). The same type of gate has recently emerged in dualunitary circuits [10]. Due to their special properties such circuits (see also related concepts [41] of 2-unitaries [42] and perfect tensors [43]) allow exact results, for instance of Rényi entropies for a non-symmetric bipartition in a disordered kicked Ising model [10] (this system has maximal entanglement rate provided one counts time in an optimal way), see also Ref. [44]; for a symmetric halfinfinite bipartition see Ref. [45]. One can also calculate the tripartite information [46], or find circuits with maximal butterfly velocity [47] in the dual-unitary context.
We are going to characterize entanglement generation through the decay of average purity I(t) = trρ 2 A (t). Because we want to understand entanglement generation on a global scale we shall focus on a symmetric bipartition [48] of our system of n qubits into two equal subsystems A and B, each with n A = n B = n/2 qubits. That is, for a chain geometry of qubits we split the chain in two equal halfs of consecutive qubits. We shall calculate the average purity behavior by mapping its dynamics to a Markovian process [23,24,27], reducing the problem to that of properties of a particular transfer matrix M . At long times the purity will decay exponentially with a rate r E determining how fast entanglement is produced. Because the purity entanglement rate r E will be proportional to the area A of the boundary between A and B one often considers the entanglement speed v E (called also the purity speed, or tsunami velocity [15]) defined as For random circuits the asymptotic state is an infinite temperature state having maximal entropy density s ∞ = ln q, while A is simply the number of 2-qudit gates applied across the boundary between subsystems A and B per unit of time. If one uses natural units of time such that one applies one gate on each bond per unit of time then A is just the number of bonds cut by a bipartition, which for a symmetric bipartition we use is A = 1 for open boundary conditions (OBC) and A = 2 for periodic boundary conditions (PBC). Because one can generate 2 maximally entangled 2-qudit states per one application of a 2-site gate [51], the maximal v E one can achieve is In quantum circuits one therefore has the upper bound  Fig. 1). The fastest entanglement generation for local couplings is achieved for the XXZ gate in a BW configuration (bold). Such circuits are optimal and much faster than e.g. using random U(4) gates or all-all coupling (Table I), or using the CNOT gate. The staircases configuration S is on the other hand the slowest. The rate is in general not given by the 2nd largest eigenvalue |λ2| of the associated transfer matrix.
r E ≤ 4 ln q for PBC, and r E ≤ 2 ln q for OBC. One can also study other quantities measuring entanglement, like the von Neumann entanglement entropy S(t) = −tr(ρ A ln ρ A ), or the Rényi entropies S r (t) = ln ρ r A /(1 − r), and use them to define corresponding rates and speeds. In general all those entanglement speeds can be different [35], an example being a circuit with random U(4) gates. The reason for the difference is variation in entanglement properties of different circuit realizations, for instance, some rare realizations could use very inefficient entanglers from the U(4) set of gates. For the circuits we focus on we expect such effects to be much less pronounced. In particular, the fastest scramblers that we find have the maximal speed allowed by causality and so all Rényi entropies should have the same maximal speed (similarly as e.g. in Ref. [10]). Furthermore, even for non maximal random circuits with fixed 2-qubit gates the variation in entanglement between different realizations should be much smaller than e.g. for the random U(4) case because the entangling 2-qubit gates are the same at every step (even for random U(4) the difference between the speed from the average purity and the average 2nd Rényi entropy is likely very small [35]). The circuits we study will at not too short times produce typical states that have good self-averaging properties, and we expect von Neumann entropy and Rényi entropy to behave essentially the same as the logarithm of the average purity, S(t) ≈ S 2 (t) = − ln I(t). We leave possible differences for future work. In the remainder of the paper we will always discuss only purity rates and speeds.
Our results for r E are summarized in Table II and in Fig. 1. We can see that the rates are in general not equal to the ones suggested by the 2nd largest eigenvalue λ 2 of the transfer matrix M , and that in the two optimal cases, namely the BW configuration with PBC, or the OBC: OBC, the entanglement velocity is v E = 2 and therefore saturates the bound (2). The optimal gate is in both cases found to be the XXZ gate for any value of the anizotropy 0 ≤ a z < 1 in front of the σ z j σ z j+1 coupling (see Eq. 4 for its definition). In particular, using the XXZ gate is much faster than using random U(4) gates, as well as for instance the CNOT gate. We do not find any transition with a z in the relevant entanglement rate r E , though we do find it in the leading eigenvalue λ 2 of the transfer matrix (which in most cases nevertheless does determine the late-time entanglement rate after the phase transition at the thermalization time). For dual-unitary systems with transitions with a z in transfer matrices see Refs. [46,53].

A. Random quantum circuits
We study random quantum circuits in which 2-qubit unitary transformations are drawn from some set of unitary gates. One elementary gate -a 2-qubit transformation U i,j acting on qubits i and j -is a product of independent single-qubit random unitaries V i acting on i-th qubit and V j acting on j-th qubit, drawn according to the Haar measure on group U(2), and a 2-site unitary W i,j . The whole gate is therefore U i,j = W i,j V i V j . The 2-site W i,j will be the same for all steps, while the single-site V i are independent for each step and each qubit. The formalism that we will use can actually accommodate also the situation where W i,j would be random from U(4) (we are going to briefly comment on that case in Sec. IV C), however, such circuits produce entanglement slower (see Table I) than the ones we study.
A given random circuit is specified by a fixed 2-site W and a sequence of qubit pairs, called a protocol or configuration, on which U i,j are applied. For instance, one could choose W to be the CNOT gate and a sequence to be a brick-wall pattern (Fig. 2). We are going to find a gate W and a protocol that generate entanglement the fastest. In the main body of the paper we shall limit to protocols in one-dimension (1D), where the gates are allowed only between nearest-neighbor (n.n.) qubits with either open (OBC) or periodic boundary conditions (PBC), i.e., qubits are on a line or on a circle. We shall only briefly mention the 2D case of the Sycamore [3] quantum processor, and in Appendix G the all-to-all coupling (n.n. protocols are faster than those). Focusing on the 1D setting, the space of quantum circuits over which we need to optimize is still large (infinite), therefore simplification is necessary. First, we shall focus on periodic protocols in which the geometry of applied gates repeats after each period. One period will consist of exactly one gate applied on each allowed bond, that is, for OBC there are T = n − 1 gates in a period whereas there are T = n for PBC. Optimization therefore needs to be performed over all T ! permutations describing different orderings of gates within one period, and over all choices of the gate W . Our time t that we use throughout the paper measures the number of periods (see Fig. 2). In Sec. III we shall prove that one does not need to consider all ∼ n! permutations but only ∼ n of them.
One can also reduce the number of W that need to be checked. Because we have a single-site invariance due to random single-qubit gates it suffices to consider W in the canonical form parameterized by only 3 parameters (instead of 16 for general W ∈ U (4)). Namely, every 2qubit unitary can be written in the following canonical form [54,55], where V (α) k are one-site unitary operators, σ x,y,z are Pauli matrices and a = (a x , a y , a z ) has three real parameters, which can be constrained to 0 ≤ a z ≤ a y ≤ a x ≤ 1. A detailed explanation on how to calculate this decomposition from an arbitrary unitary two-site gate W can be found in Ref. [56]. Fixed single-site V (α) k for a specific gate W can be lumped together with random unitaries V k , so that we need to explore only canonical gates w(a). For instance, the CNOT gate has a = (1, 0, 0), the gate that will turn out to be the fastest is the XY gate with a = (1, 1, 0) (also called the iSWAP or DCNOT gate). More generally, the optimal set will consist of gates i.e. with parameters a = (1, 1, a z ) and a z < 1, that we call the XXZ gate (also known as the pSWAP -parametric SWAP gate). The SWAP gate reached at a = (1, 1, 1) generates no entanglement and therefore has completely different entanglement properties than the XXZ family of gates. For our best protocols one therefore has in the thermodynamic limit (TDL) a discontinuous transition from the maximal rate at a z < 1 (XXZ gate) to zero rate at a z = 1 (SWAP gate).
To quantify bipartite entanglement of pure states we shall use purity I(t), where ρ A (t) = tr B |ψ(t) ψ(t)|, and |ψ(t) = U (t)|ψ(0) , with U (t) = (i,j) U i,j a product of all 2-qubit gates upto time t (because we have T gates per unit of time the total number of 2-qubit gates in U (t) is therefore T · t). The whole system of n qubits is bipartitioned into a subsystem A with n A qubits and a complement B with n B qubits, n = n A + n B . In explicit numerical demonstrations we shall always use a symmetric half-half bipartition where A is composed of the first n/2 consecutive qubits while B is the rest. The results though do not depend on the specifics of the bipartition provided one has an extensive n A ∝ n connected subsystem A. The initial state |ψ(0) is always chosen to be fully separable with respect to any bipartition such that I(0) = 1. Purity will therefore decay with time from its initial 1 to the asymptotic value being equal to the purity of random states [57], For large t our random circuit namely uniformly samples U from the unique unitarily invariant Haar measure. Observe that this asymptotic purity is not exactly equal to the purity of the infinite-temperature state ρ A ∝ 1 which is I(β = 0) = 1 2 n/2 , whereas I ∞ = 2 2 n/2 1 1+2 −n for the symmetric bipartition. For easier understanding we shall often show which is a quantity that will grow from 0 at t = 0 to its maximal value S 2 (∞) ≈ n 2 − 1 (for a half-half bipartition) reached when the state |ψ(t) converges towards a random state. S 2 (t) will reach its saturation ∼ n/2 at time t ∞ given by For instance, for the BW PBC random circuit the scaled thermalization time will be t ∞ /n A = 1/4. To more clearly show the relaxation process towards I ∞ we shall also study Looking at ∆S 2 (t) instead of just S 2 (t) will allow to discuss relaxation (thermalization) on timescales beyond t ∞ , which will reveal interesting new phase transitions. Because the state reached (at non-small times) is akin to a random state one will have good self-averaging properties in the thermodynamic limit so that other measures of entanglement, like the von Neumann entropy of the Rényi entropies, will behave similarly as the above S 2 (t), see Appendix F 1 for an explicit numerical demonstration. Also, because the variance of I(t) between different circuit realizations is small in the TDL one can as well study the average I(t), i.e., averaging over different realizations of single-qubit unitaries V j . Such averaging is crucial to get an analytically tractable setting and we shall describe next the formalism used for that.

B. Markov chain description
As mentioned, one can describe [23] the average dynamics of ρ 2 A , i.e., of squares of the expansion coefficients of ρ A , by a Markovian matrix M i,j that describes the averaged action of one 2-qubit gate U i,j . Such description has been widely used, e.g. Refs. [24,25,30,31,33], for random U(4) gates or for Clifford gates (gates that map a product of Pauli matrices to a single product of Pauli matrices, examples being the CNOT or the XY gate). Recently, a different approach has been taken [27] in which one directly describes the evolution of purity for all possible bipartitions rather than of ρ 2 A . It allows describing purity evolution for a protocol with an arbitrary 2-qubit gate W i,j . As we shall see, despite being more general and formally different, it will, for the cases for which the old method works (Clifford gates), at the end result in the very same transfer matrix M i,j as obtained in Ref. [24].
One starts by formally defining a Hilbert space with basis |s , where s = (s 1 , s 2 , . . . , s n ) with s i ∈ {↑, ↓}. A given s encodes a bipartition one looks at, that is, if s i =↑ the i-th qubit is in the subsystem A, otherwise it is in B. A state vector whose evolution we want to describe is then defined as where I s (t) is the purity of a state |ψ(t) for a given bipartition labeled by s. The state Φ (t) therefore implicitly depends on the initial state -for the most interesting fully separable initial state |ψ(0) the initial Φ (0) ≡ Φ 0 is simply Φ 0 = (| ↑ +| ↓ ) ⊗n . Averaging over two one-qubit random matrices V i and V j it has been shown [27] that the vector Φ is mapped to M i,j Φ (see also Appendix A). Repeating this step for all T two-qubit unitaries (each involving independent 1-qubit random unitaries) that are applied per unit of time we get the transformation Dynamics of purity -the expansion coefficients of Φ (t)is therefore given by iterating a single-step transfer matrix M describing Markovian average purity evolution.
To get purity for the most interesting symmetric half-half bipartition one just has to project Φ (t) to the basis state | ↑ . . . ↑↓ . . . ↓ . Slightly abusing notation and defining the initial vector Φ 0 = (1, 1, . . . , 1) and the bipartition basis vector Φ half with components [Φ half ] k = δ k,2 n/2 , we have the average purity after t steps The 4 × 4 matrix M i,j can be calculated for an arbitrary W i,j in its canonical form w(a) (3). Following the procedure in Ref. [27] we obtain (Appendix A) where the basis is ordered as s i s j = {↑↑, ↑↓, ↓↑, ↓↓}, and h = 1 9 (3 − v), u = cos (πa x ) + cos (πa y ) + cos (πa z ) and v = cos (πa x ) cos (πa y ) + cos (πa x ) cos (πa z ) + cos (πa y ) cos (πa z ). For instance, for the XY gate we have Not surprisingly, due to local unitary averaging the parameters h, u, v can be related to known 2-qubit gate objects. Specifically, one has h = 2e(W ), where e(U ) is the entangling power [58] of unitary U , while v itself is equal to the local invariant G 2 [59,60]. Note though that knowing the 2-site properties of the gate W does in no way help us understand the entanglement generating properties of the whole circuit. As we will see, the latter is a full many-body property (the interesting effects that we discuss only arise in the TDL) and also crucially depends on the configuration, not just on the choice of a 2-qubit gate. For instance, the entangling power [61] of W is maximal for the XY as well as for the CNOT gate, and continuously decreases with a z for the XXZ gate. Our results on the other hand show that in the many-body setting of random circuits the CNOT gate is for instance more than 4 times slower than the XY gate (Table II), and that for the XXZ gate the rate does not depend on a z . We observe that the matrix M i,j is not symmetric. Often it is easier to work with symmetric matrices and it turns out that it is possible to transform M to a symmetric form. This will make symmetries more transparent and facilitate comparison with previous Markovian descriptions of random circuits. We can obtain the symmetric form with a similarity transformation arriving at where u ± = (3 ± u)/6. This elementary 2-site matrix can be equivalently written in terms of Pauli matrices, Observe that even if one starts with a noninteracting gate like the XY the resulting M is interacting (J z = 0). For Clifford gates, e.g. CNOT and XY, this is in fact the same transfer matrix as the one already derived in Ref. [24]. The new mapping [27] that we use therefore generalizes the Markovian mapping to non-Clifford gates. As a side note, the transfer matrix M i,j for a random U(4) gate [24] is obtained by formally using u = 0 and v = −3/5 in Eq. (17). The similarity transformation A preserves the spectrum -the spectrum of M is the same as of M , only the vectors have to be transformed. Similarly as in Eq. (11) we use M to denote a product of T 2-qubit M i,j appearing in one unit of time. We shall use a prime to denote matrices and vectors written in the (unrotated) basis |s , like M i,j or Φ (t), whereas unprimed objects, like M i,j or Φ(t) are in the basis transformed by A (where needed, we will denote the basis vectors of this space by e k ). For instance, the initial vector corresponding to a product initial state Φ 0 is transformed to Φ 0 = (A −1 ) ⊗n Φ 0 and has components Φ 0 = (1, 0, . . . , 0), while Φ half = (A T ) ⊗n Φ half . A boldface notation is used to stress that a vector (components) is written in a given basis, like Φ 0 , while Φ 0 is used for a basis-independent ket notation. The average purity after t periods of our random circuits in this new basis is I(t) = Φ half M t Φ 0 . For a symmetric bipartition and a product initial state, due to the structure of Φ 0 and Φ half , purity is determined by one column of M t , or equivalently by one row of (M ) t . Due to a block structure of M i,j (16) it is also clear that M conserves the parity Z := (−1) N ↓ = j σ z j of a vector |Φ(t) , i.e., the number of down spins N ↓ . Because the initial state in the primed basis |Φ 0 is invariant under the particle-hole transformation (i.e., spin-flip), X|Φ 0 = |Φ 0 , where X = j σ x j (this is because the purity is invariant under exchanging subsystems A↔B), the parity of the initial |Φ 0 is always even (even N ↓ ). Indeed, The relevant parity sector of M describing purity evolution of physical |Φ 0 is therefore the one with an even number of down spins. We note that in some cases other additional symmetries are present. For instance, for BW protocol with PBC and n divisible by 4 one also has a reflection symmetry about the site n/2 and the translation symmetry by 2 sites.
Another useful observation is that M i,j has eigenvectors that are independent of gate parameters u and v, namely One consequence of this is that M always has (at least) 2 eigenvalues equal to λ 1 = 1. The corresponding eigenvectors are steady states and are also independent of the gate W . This is consistent with the asymptotic convergence to purity of random states. The physically relevant steady-state vector Φ ∞ , i.e. the even parity eigenvector of M with λ 1 = 1, has components equal to I ∞ (6), One can indeed use the above 1-site eigenvectors v 1,2 to construct the steady state on n qubits and explicitly verify I ∞ (see Appendix B). The steady state (20) can also be written compactly as a matrix product state of rank 2, see Appendix C. Time evolution of the average purity is therefore Markovian (11), i.e., there is no memory. The transition matrix, either M or M (13,16), is however not stochastic (rows do not sum to 1) and therefore does not describe transition probabilities. The probabilistic interpretation is recovered if one works with squares of the expansion coefficients of the state ρ on the original full 16 dimensional 2-site Pauli basis [23], it is though lost if one uses reduction [24] down to 4 nontrivial basis states that results in M (18), or directly works with purity evolution [27].
Note that the transfer matrix for one step M is in general not symmetric, even though the individual gates M i,j are. The spectrum of M is therefore complex and contained within the unit circle. Provided the spectrum has a gap (which is always the case for nontrivial gates) the next largest eigenvalue |λ 2 | should determine the asymptotic decay of purity, and therefore the growth of entanglement (decay of I(t)). Asymptotically one should have where we denote by λ 2 the largest eigenvalue smaller than 1 (in absolute value). The corresponding entanglement rate (1) would then be r E = − ln |λ 2 |.
In the next Section we shall therefore study the gap of M for various gates W and configurations, and in particular prove that there are large classes of configurations with the same spectrum. This will enable us to then find circuits with the largest gap, i.e., the smallest |λ 2 |. However, it will turn out, surprisingly, that the entanglement rate is in fact not given by |λ 2 |, the reason being a non-symmetric nature of M . As we will show, one can have two configurations with exactly the same spectrum but with different entanglement rates (for example, taking XXZ gates and comparing S configuration with OBC, and BW with OBC, see Table II). Nevertheless, the spectral analysis will suggest good candidates for optimal circuits. Furthermore, the two theorems on spectral equivalence are completely general and hold for products of any matrices acting on nearest-neighbor sites (for PBC the matrices need to be symmetric) and could therefore be of use in other situations.
Combining spectral results on λ 2 from Sec. III and numerically calculated true entanglement rates in Sec. IV will result in the main general message of our work: in a many-body setting involving non-Hermitian transfer matrices spectral analysis can give wrong results.

III. SPECTRAL OPTIMIZATION
In this section we will determine the protocol and the gate with the smallest 2nd largest eigenvalue |λ 2 |, i.e., the largest gap of M , in a 1D geometry where only local gates between nearest-neighbor sites are allowed. Specifically, if we have T distinct n.n. gates, i.e., T = n − 1 for OBC and T = n for PBC, we find a sequence (configuration) of those T gates among all T ! different permutations as well as among all possible choices of a 2-qubit gate M i,i+1 (held fixed for all gates) that has the smallest |λ 2 |.
As mentioned, we shall treat chains with periodic boundary conditions where also the gate M n,1 is allowed, and chains with open boundary conditions. In both cases we shall first prove that among all factorially many T ! configurations there are only few with different gaps. The idea of proofs is to show that the spectrum of a product of gates M i,i+1 does not change under certain rearrangements (holding the 2-site gate, i.e., M i,j , fixed). This will then allow us to focus on equivalence classes defined as arrangement with the same spectrum. We shall use the equivalence notation A B if the matrices A and B have the same spectrum. It turns out that for OBC there is only one equivalence class, while for PBC there are n 2 ( n 2 is the largest integer smaller or equal to n 2 ). This greatly reduces complexity and allows to numerically find the optimal configuration and the gate.

A. OBC protocols
For OBC we have (n−1)! possible configurations, however, all are spectrally equivalent. Namely, one can prove the following theorem.
The superscript in S (k) j indicates the number of consecutive gates in such canonical order and the subscript j the first site on which they act. For the proof which uses the spectral equivalence under cyclic permutations, ABC CAB, see Appendix D 1.
As a consequence, one can transform any configuration with OBC to the canonical form S (see Fig. 3 for an illustration) without affecting the spectrum of M . For OBC there is therefore a single spectral equivalence class and we can focus only on one representative staircases (S) configuration, so we only need to find the optimal twoqubit gate M i,i+1 (16). Staircases configurations have been considered before in e.g. the context of operator spreading [5] or complexity [64].
We have calculated the 2nd largest eigenvalue |λ 2 | of M = S (n−1) 1 for all different 2-qubit gates in the canonical form. Results (see Fig. 20 in Appendix D 4) show that the smallest eigenvalues |λ 2 | come from the region around a x = a y = 1, a z = 0, which is the XY gate. Looking more precisely at the n dependence for the XY gate (Fig. 21) we find that in the TDL the S configuration with OBC has Exploring also the region around the XY gate, we find that the same largest eigenvalue (23) is achieved also for sufficiently small nonzero values of a z and a x = a y = 1.
In Fig. 4 we can see that at a z = a c ≈ 0.32 the transition happens in |λ 2 |, so that the optimal |λ 2 | = 1 4 is obtained for all XXZ gates with a z ≤ a c .

B. PBC protocols
For PBC things are a bit more complicated, resulting in n 2 spectral equivalence classes. Each class can be labeled by an integer p = 1, . . . , n 2 , being the maximal number of consecutive commuting gates in the canonical representative configuration where B is a brick-wall configuration made out of 2p − 1 gates (see Fig. 5 for an example) More precisely, the following theorem holds. Theorem 2. Any product of n symmetric 2-site matrices acting on n.n. sites on a circle (PBC configuration) is spectrally equivalent to one of n 2 canonical configurations of the form (Fig. 6) The proof that in addition to cyclic permutations uses also that the spectra of A and A T are the same can be found in Appendix D 2 (observe that our M i,j (16) are symmetric matrices).
For PBC protocols we therefore have to find the optimal configuration among the n 2 different ones as well as the optimal 2-qubit gate. We have numerically computed |λ 2 | for small sizes n, see Appendix D 4 and Fig. 22. One observes that fixing a 2-qubit gate (canonical parameters a x , a y , a z ) and n the gap always monotonically increases as we change the equivalence class from p = 1 to p = n 2 . Configurations in the class p = 1 have in the TDL the smallest gap, those in p = n 2 the largest. Focusing on the region around the XY gate we see that in the TDL and for p = 1 as well as for p = n 2 the fastest gate is obtained for a z = 0, a x = a y = 1 (Figs. 23 and 24 in Appendix D 4), which is the XY gate. Analyzing the dependence on n (Fig. 21 in Appendix D 4) we get that in  III. The origin of entanglement rate: "phantom" denotes a phantom eigenvalue resulting in a slower entanglement generation than given by |λ2|, "faster" denotes cases with faster generation than given by |λ2|, and λ2 the one case where the rate is actually determined by |λ2|.
the TDL the eigenvalue is for the fastest class p = n 2 , and for the slowest p = 1 that we simply call S PBC. Contrary to the OBC case the smallest eigenvalue |λ 2 | is achieved only at a single point a x = a y = 1, a z = 0. For the XXZ gate the gap smoothly decreases as one moves away from a z = 0, see Fig. 4. Among all protocols, OBC and PBC, the one with the largest gap, and therefore expected fastest entanglement generation, is the brick-wall (BW) with PBC and the XY gate for which |λ 2 | = 1/9. While we used numerics to obtain this value we can in fact rigorously show, see Appendix D 3, that λ = 1/9 is in the spectrum of M for p = n 2 , which implies an exact relation |λ 2 | ≥ 1/9. Proving analytically that |λ 2 | is 1/3, 1/4, 1/9 for the XY gate and PBC S, OBC, and PBC BW, respectively ( Fig. 4) is an interesting open problem.

IV. ENTANGLEMENT GENERATION SPEED
In the previous section we have identified gates and configurations that have the smallest |λ 2 | and are therefore expected to have the fastest asymptotic entanglement generation rate r E . Comparing the spectral gap for the S configuration between PBC and OBC (Table II) we see that the gap is larger for the OBC than for the PBC. If the gap would be the end of the story that would mean that the entanglement generation rate would be larger for the OBC than for the PBC which would be odd as one would expect exactly the opposite. For symmetric bipartition one has 2 cuts for the PBC and only 1 for the OBC and therefore the rate should be twice as large for PBC than for OBC. Faster rate for the OBC would mean that applying less gates across the cut would produce more entanglement. We will see in this section that numerically analyzing the true entanglement rate will rectify this odd spectral situation -for OBC the true rate will be slower than suggested by |λ 2 |, for PBC it will be faster than |λ 2 |, such that at the end the ratio of the two is recovered at the expected value 2.
For all extremal random circuits, either the fastest (BW), or the slowest in its class (the S configuration), the relevant entanglement rate r E on times smaller than the thermalization time t ∞ (8) is, surprisingly, not necessarily given by − ln |λ 2 | despite |λ 2 | being gapped away from other eigenvalues. In other words, the time t ∞ at which S 2 reaches its random-state saturation value ∼ n/2 is not equal to t ∞ = n/(−2 log 2 |λ 2 |). It can be either larger due to a phantom eigenvalue -a phenomenon where exponentially large expansion coefficients mimic a fake eigenvalue larger than |λ 2 |, or smaller due to again exponentially large coefficients in front of eigenvalues that are smaller than |λ 2 |. As we shall see, the culprit lies in the non-Hermiticity of the transfer matrix M . A quick overview of our results for extremal circuits that we focus on is in Table III.
Non-Hermiticity is important for the following reason. The spectral decomposition of a non-Hermitian operator M has the form (assuming diagonalizability), where λ j are in general complex eigenvalues, while |R j and |L j are the associated right and left eigenvectors. Importantly, left and right eigenvectors are mutually orthogonal, however, left and right eigenvectors are not orthogonal between themselves. Expanding the initial state |Φ 0 over a non-orthogonal basis |R j we have If M would be Hermitian (having orthogonal |R j ) the triangle inequality would bound |c j | 2 ≤ Φ 0 |Φ 0 . For non-Hermitian M no such bound exists and the expansion coefficient c j can be arbitrarily big (see e.g. Ref. [65] for a simple Ising chain in a tilted field with an imaginary component, where the eigenvectors effectively span a subspace of lower dimension and thus the expansion coefficients can get large; see also the Lindblad case in Ref. [20]). This is precisely what will happen for our random circuits. Exponentially large (in n) coefficients c j>2 will dominate over the term c 2 λ t 2 for times t < t ∞ , causing the entanglement rate r E to be larger than − log |λ 2 |. In one case things will be even stranger -the entanglement rate will be smaller than − log |λ 2 |, i.e., like there is a "phantom" eigenvalue larger than |λ 2 |. We shall also explain how that comes about.
To calculate the correct entanglement rate we are going to use numerical simulations to get I(t), or closely related ∆S 2 (t) (9). We will use two methods with the goal of simulating as large a system as possible. This will turn out to be important as the multistage thermalization reveals itself clearly only in relatively large systems -simulating systems with e.g. just n = 16 qubits one could have easily missed it. The first method works by directly iterating the full state |Φ(t) , obtaining where 20). We used such iteration to obtain the exact ∆S 2 (t) for n ≤ 34. For larger circuits memory requirements of the exact method become prohibitive and we used the matrix product state ansatz (MPS) for more memory-efficient representation of |Φ (t) . While the MPS method [68] allows to study large systems it has its limitation beyond times when S 2 becomes too large. Namely, when e.g. S 2 ∼ 50 the floating point double precision "noise" of size ∼ 10 −15 ≈ 2 −50 becomes comparable to I(t), making further simulation hard despite using large MPS bond dimensions upto 1500.
The main part of this section is devoted to the brickwall configuration in section IV A and to the staircases in section IV B. In section IV C we briefly demonstrate that the same phenomena are observed also for generic fixed gates as well as for the random U(4) gates. In section IV D we touch upon the experimentally relevant Sycamore processor that uses a 2D layout of qubits.

A. Fastest scrambler
Let us first have a look at the fastest random circuits, namely the BW configuration for which the spectral analysis would predict r E = ln 9 for the XY gate and PBC, and r E = ln 4 for the XY with OBC. The BW configuration with an appropriate gate type that we will identify will turn out to be the fastest possible entanglement scrambler saturating the theoretical upper bound on the entanglement speed in Eq. (2).
In the following we shall show a number of figures of the same type which will demonstrate the asymptotic entanglement rate and a multistage thermalization. Let us describe what those figures will be showing. The main quantity is the local entanglement rate r E , being equal to the local slope of ∆S 2 (t) = − ln |I(t) − I ∞ |. Specifically, from numerical data we shall plot Logarithms used will be base-2 so that for qubits the theoretical upper bound on the rate is an integer (2 or 4 depending on boundary conditions). We will in addition show plots of purity − log 2 I(t) as well as purity with subtracted asymptotic saturation value I ∞ , i.e., − log 2 |I(t) − I ∞ |. In Fig. 7 we show results of numerical simulations for the BW configuration with the XXZ gate and PBC. In frame (a) we show the XY gate (a z = 0) whereas in (b) and (c) a more general XXZ gate with a z = 0.5. The behavior is similar for any a z < 1 (a z = 1 corresponds to the SWAP gate which does not generate any entanglement). We can see from the purity plot in Fig. 7(c) that there is a kink -a change in the entanglement growth rate -at the point when I(t) gets close to I ∞ (dashed saturating curves). The change in r E is also clearly seen in Fig. 7(b). Regardless of a z the rate in the TDL increases towards 4 for times smaller than the saturation time t ∞ (note that we rescale the time axis by n A = n/2). The convergence with n is however rather slow. If one would look only at small n one could be mislead into thinking that the rate is given by r E = − ln |λ 2 | (n = 34 in Fig. 7(a)), however, at larger n the initial bump gets higher and moves to smaller times. The rate is therefore not determined by λ 2 and equal to − ln |λ 2 |, but is instead larger. For instance, for a z = 0 we have λ 2 = 1 9 , while the rate converges with n towards ln 16 ( Fig. 7(a)). This is perhaps even more clear for a z = 0.5 in Fig. 7(b). Based on the data we therefore conjecture the following scenario in the TDL, holding for any a z < 1: the rate is r E = ln 16 for t ≤ t ∞ = 1 4 n A , at which point there is a discontinuous transition in the rate to a smaller value determined by the 2nd largest eigenvalue |λ 2 |. For any zero or nonzero a z < 1 there is one discontinuous phase transition in the local rate at the critical time t c = 1 4 n A . This is reflected in a phase diagram shown in Fig. 9(a). The relaxation process has two phases: in the first faster phase, whose entanglement rate is in the TDL independent of a z (even if we are very close to a non-entangling SWAP gate), one approaches a random state, in the 2nd, that starts at t c = t ∞ = n A /(2A) when the state is already close to random, the relaxation is slower and determined by − ln |λ 2 |. The physically relevant entanglement generation rate that determines how long it takes to generate all ∼ n 2 bits of entanglement is therefore not determined by the matrix gap, rather, it is equal to ln 16. How can that be? By studying the spectrum of M (in the even sector) and the relevant overlaps with Φ 0 and Φ half (31) that appear in the spectral expansion we can identify the leading terms in the sum. Focusing on a z = 0, there is exactly one with |λ 2 | ≈ 1 9 and then there are ∼ (n − 4)/4 with |λ j | ≈ 1 16 . The weight d j of terms with λ j ∼ 1/16 grows with n, albeit first for small n rather slowly (see e.g. slow convergence with n in Fig. 7(a)). Nevertheless, for large n the terms d 3 ( 1 16 ) t overwhelm the "leading" d 2 ( 1 9 ) t ; this happens roughly when the ratio d 3 /d 2 ∼ (16/9) n/8 ≈ 1.07 n gets large.
Data for the BW with OBC is shown in Fig. 8. For a z = 0 the rate is in the TDL r E = 2 ln 2 for any t. This is in-line with the fact that in this case |λ 2 | = 1 4 and therefore one does have r E = − ln |λ 2 |. Out of the 4 cases studied, BW and S with OBC or PBC, this is the only one where the gap does determine the entanglement rate. Apparent singularities in the rate at integer t/n A visible in Fig. 8(a) are due to a sign change in the I(t) − I ∞ , which is also visible as sharp kinks in − log 2 |I(t) − I ∞ | plotted in frame (b). For nonzero a z things though change. As we have seen in Fig. 4, for a z < a c the gap for the OBC does not change with a z , while for a z > a c ≈ 0.32 it starts to decrease. This is in turn also reflected in the local entanglement rate r E . Upto the thermalization time t ∞ the rate stays at 2 ln 2, and is therefore not equal to − ln |λ 2 | for a z > a c , while at t c = t ∞ = n A /2 the rate jumps to − ln |λ 2 | (Fig. 8(c)). This change in the rate is continuous at a z = a c , but discontinuous for a z > a c . Correspondingly, ∆S 2 (t) (9) exhibits a kink at t c (Fig. 8(d)). Worth noting is that while in the scaled time t/n A the rate will in the TDL be ln 4 already at t/n A → 0 (Fig. 8(c)), in real time t there is a 'delay' of about ∆t ≈ 15 before the rate becomes ≈ ln 4 (slower initial growth in Fig. 8(d)). The phase diagrams showing dependence of r E on a z for BW with OBC is shown in Fig. 9(b).
BW configuration with the XXZ gate is the fastest entanglement scrambler. Considering that λ 2 does not give the correct rate one could wonder whether some other gate that did not look optimal according to λ 2 in Sec- tion III could be even better? The answer is no because the rates 4 and 2 for PBC and OBC, respectively, saturate the bound (2). There are no better scramblers. What could in principle happen, but we think is unlikely, is that for some other canonical gate with parameters a different from XXZ one would get the same maximal rate. For instance, for a = (0.9, 0.8, 0.5) with BW PBC one again has faster rate than given by |λ 2 |, though at ≈ 2.77 it is smaller than 4, see Section IV C for more details.

B. Slow scramblers
Here we discuss the staircases (S) configuration ( Fig. 2(b)) which is also an extremal configuration in the spectral sense. Namely, for PBC its |λ 2 | is the largest among all configurations with PBC (spectral equivalence class p = 1). Note that we do not discuss here the slowest scramblers in the absolute sense because we will still use the fastest gate within the class p = 1 (XXZ). For instance (Table II), XXZ with S and OBC is still faster than CNOT ("slow gate") with BW and PBC (fastest configuration). We expect the same behavior that we reveal here for p = 1 to hold in the TDL also for any finite p, i.e., for a spectral class that consists of a finite BW section and an extensive S part. Let us first have a look at the S configuration with PBC. Results are shown in Fig. 10. We can see that, similarly as for the BW with PBC, the entanglement rate r E is larger than − ln |λ 2 |. Specifically, for a z = 0 one has − ln |λ 2 | = ln 3 while the rate is r E = ln 4 ( Fig. 10(a,b)). While the numerical results are not super clear, perhaps for a z = 0 the rate seems to converge in the TDL to r E = ln 4 also for times t > t ∞ so that there is no phase transition. For nonzero a z one on the other hand has a phase transition in the rate from the initial r E = ln 4 to smaller rate given by the spectral gap, Fig. 10(c). For a z = 0.5 the phase transition seems to happen at a distinctively larger time t c than the thermalization time t ∞ (see Fig. 10(d)), while at larger a z , for instance a z = 0.7 (data not shown), it is very close to t ∞ = n A /2. As opposed to the BW configuration, here the convergence with n towards the TDL rate ln 4 is much faster so that there is almost no delay in purity starting to follow the asymptotics (Figs. 10(b,d)). A conjectured phase diagram is sketched in Fig. 11(a).
The explanation for such larger entanglement rate is similar as for the BW configuration with PBC, though with some different details. The even parity spectrum of M for n = 12 is shown in Fig. 12. Besides the steady state giving I ∞ (blue point) one has n − 1 nondegenerate eigenvalues (λ 2 ) distributed around the circle with radius 1 3 (in the TDL, Fig. 4), and (n − 1)(n − 4)/2 eigenvalues around the circle of smaller radius 1 4 (those are grouped into n − 1 in the TDL degenerate clusters each having (n − 4)/2 eigenvalues, e.g. 4 for the n = 12 example in Fig. 12). In the TDL though not all contribute in the purity expansion (33). It turns out that only those on the real axis contribute, that is one with λ 2 = 1 3 (orange point), and (n − 6)/2 with λ j = 1 4 (red point). The relative weight d j /d 2 of those with 1 4 grows with n and that is how the faster entanglement rate r E = ln 4 emerges in the TDL.
Configuration S with OBC is even more interesting. In Fig. 13 we show numerical results for r E (32). The rate is r E = ln 2 and is therefore smaller than ln 4 suggested by the gap. For a z = 0 such rate persists for any finite time t/n A , however, looking at I(t) − I ∞ (Fig. 13(b)) we note that there are discontinuous jumps at integer t/n A , the size of which increases with n. At such times purity I(t) closes in on its asymptotic value I ∞ by a finite amount (that increases with n) in a single time step. For nonzero a z = 0.5 the situation is similar, with the difference being that the rate at t > t c = n A jumps to smaller value than ln 2 (Fig. 13(c)). In this case though the rate after t c is not simply equal to − ln |λ 2 | as in other cases studied.
Explanation of how one can get slower rate r E = ln 2 despite all nontrivial eigenvalues of M being |λ 2 | ≤ 1 4 is very interesting and is illustrated in Fig. 14. In the frame (a) we can see that there are n − 1 eigenvalues with |λ j | = 1 4 , of which however only n/2 have nonzero overlap with the initial vector (red points in (b)). The size |c j | of those coefficients grows exponentially with n and they come in conjugate pairs. The resulting expression for purity is therefore of the form I(t) − I ∞ ≈ n/2+1 k=2 |d k | · |λ k | t 2 cos (ϕ k t + α k ), where ϕ k is the phase of λ k and α k the phase of d k . Coefficients d k , Eq. (33), are such that the corresponding sum including cosine terms mimics exponential growth ∼ 2 t upto time t = n/2, when it discontinuously jumps to 0 in a single unit of time (frame (c)). The end result is that their growth 2 t partially cancels ( 1 4 ) t from the actual eigenvalues, resulting in a decay ( 1 2 ) t as there would be a phantom eigenvalue of size 1 2 . Again, this is possible only due to non-Hermitian M whose eigenvectors are not orthogonal. The size of discontinuous jumps in (c) grows exponentially with n and is responsible for jumps we observed in Fig. 13(b).
In Fig. 14(d) we can transparently see how all this is reflected in I(t): discontinuous jumps are such that on times larger than t ∞ = n/2 one actually does get an overall ∆S 2 (t) t ln 4 as predicted from λ 2 , however, in the TDL the time t ∞ is pushed to infinity and the physically relevant entanglement rate towards the random-state entanglement is instead given by the phantom eigenvalue 1 2 . Note that even though the local entanglement rate is for non-integer t/n A always ln 2, due to discontinuities in I(t) at integer t/n A the overall growth of ∆S 2 (t) for a z = 0 is given by ln 4 and not the local rate (Fig. 14(d)). For a z > a c ≈ 0.32, when |λ 2 | moves, the local rate at t > t c = n A also decreases, see Fig. 13(c). It is though not equal to − ln |λ 2 | (the rate decreases from ln 2 whereas the eigenvalue decreases from ln 4). Note that r E = ln 2 for all a z < 1 regardless of |λ 2 | increasing with a z (Fig. 4). This is all reflected in the phase diagram in Fig. 11(b).
We remark that one could in principle get a slower effective decay than |λ 2 | t for special initial states that would have very small overlap with the corresponding eigenvector; in the asymmetric simple exclusion process that exhibits a cutoff [66] such are states with an extensive empty or occupied sections. This is however not what is happening in our case; we get such phantom eigenvalue decay for the relevant initial state, i.e., generic bipartition (provided n A is extensive). The origin of slower decay is in our case different. It is the left and right eigenvectors corresponding to |λ j | ≈ 1/4 (red points in Fig. 14(a)) that must be special as they mimic exponential function (Fig. 14(c)).
We have seen that in most circuits that we studied the relevant entanglement rate r E is not given by the largest nontrivial eigenvalue of M . The ratio of ln |λ 2 | between the PBC and the OBC (Table II) for a fixed W is not equal to 2 neither for the BW configuration, nor for the S, and in particular changes with the a z of the XXZ gate. However, the correct rates r E we identified in this section are always in the expected ratio of 2 : 1 for the two boundary conditions. Rates that are different than − ln |λ 2 | are made possible due to non-symmetric nature of M (despite individual gates M i,j being symmetric matrices) that results in expansion coefficients growing with n. Another observation is that the ratio of r E between the BW and S configurations for fixed W and boundary conditions is always 2 for the XXZ gates, which however is not the case for all 2-qubit gates [67].
It is worth noting that in general the exponential growth of coefficients in our case can not be traced to de-generacies. For instance, taking a simple non-symmetric 2 × 2 matrix the eigenvalues are λ 1 = 1 and λ 2 = 1 − , with the associated right eigenvectors x 1 = (1, 0) and x 2 = ( − 1, ).
In the limit → 0 the two eigenvalues become degenerate (an 'exceptional point'; for = 0 one has a non-diagonalizable Jordan canonical form) and the right eigenvectors are almost co-linear. Expanding y = (a, 1) over such a basis will result in large expansion coefficients of the order of the inverse eigenvalue separation that scales as 1/|λ 1 − λ 2 | ∼ 1 , y = (a − 1 + 1 )x 1 + 1 x 2 . While in some cases one indeed has ∼ n fold degeneracy (S with PBC, Fig. 12), in others one does not (S with OBC, Fig. 14(a)). So it does not seem that the phenomena we found can be simply ascribed to degeneracies.
The phantom eigenvalue phenomenon and more generally of multistage relaxation in which the relaxation rate exhibits a sudden transition has been identified in random circuits. Considering that random circuits are often used as models of chaotic systems one naturally asks if the randomness explicitly present in space and time due to random 1-qubit unitaries is actually necessary. A detailed study of this interesting question goes beyond the present work, however we have numerically checked (see Appendix F 2) that in the TDL one gets the same multistage thermalization even in a single circuit realization and without randomness in both space and time, i.e., one can use the same 1-qubit unitary on all qubits and at all time steps.

C. Generic gates
So-far we have focused on the fastest 2-qubit gates in its spectral equivalence class p = 1 and p = n/2, which turned out to be the XXZ-type gates (4). Random circuits with XXZ gates are dual-unitary and so one naturally asks if the observed multistage thermalization and phantom eigenvalues are perhaps limited to (some) dualunitary circuits? The answer is no. All the phenomena we discussed can occur in circuits that are not dualunitary and this is what we shall briefly demonstrate in this section. Full treatment goes beyond the present paper so we will only show two examples.
The first is a case of the BW configuration with PBC and a generic 2-qubit gate. We pick a completely anisotropic non dual-unitary gate with canonical parameters a = (0.9, 0.8, 0.5). In Fig. 15 we can see that one has exactly the same phenomenology as for the XXZ gate (compare with Fig. 7). Namely, the initial rate is in the thermodynamic limit larger than − ln |λ 2 | and therefore one has a multistage thermalization. This local rate though is smaller than for the XXZ gates, where it is r E / ln 2 = 4. This is the reason why we conjecture that One has a multistage thermalization with rate rE/ ln 2 ≈ 2.77 that is non-maximal (less than for XXZ gates) but still faster than the one calculated from |λ2| (dashed brown line at ≈ 1.34 in (a)). the XXZ gates are the only ones having the maximal entanglement production rate. The second non dual-unitary example that we show is a much studied case of completely random 2-qubit gates distributed according to the Haar measure on U(4). For previous results on the rates for different configurations see Table I. We shall focus on the S configuration with OBC. Their dynamics can again be described by a Markovian matrix [23] with an elementary 4 × 4 two-site matrix [24]. Due to spectral equivalence of all configurations with OBC we also know that λ 2 for the S configuration that we study is the same as for the BW configuration. In the TDL it has been calculated ( Table I) that for the BW with OBC one has |λ 2 | = (4/5) 2 . Therefore, if the rate would be determined by λ 2 one should have asymptotic decay I(t) (4/5) 2t , i.e. r E = 2 ln (5/4). Numerical simulations in Fig. 16 however show that this is not the case. Based on the data we conjecture that at any finite t the purity for a half-half bipartition is in the TDL in fact exactly equal to The decay is slower than one would predict from λ 2 and one therefore again deals with a phantom eigenvalue like in other staircases circuits that we studied. For the BW configuration U(4) random circuit with open boundary conditions see Ref. [69]. Because the Markovian matrix M essentially determines evolution of squares of expansion coefficients like those of the density operator, one can expect that the phenomena identified in purity will be present also in other quantities that are quadratic in time-evolved coefficients. One such object are the out-of-time-ordered correlators (OTOC), for which our preliminary results show [19] the same effects. One of the largest experimental realizations of quantum computation is a recent work by Google AI Quantum performing a random-circuit like computation on 53 qubits [3]. Here we analyze the entanglement generation speed of their 2D qubit geometry that is used in the Sycamore quantum processor.
In the Google experiment they used a fixed 2-qubit Sycamore gate with canonical parameters a x = 1, a y = 1, a z = 1/6, whereas for 1-qubit they randomly sampled from a set of square-roots of Pauli matrices [3]. For the sake of simplicity, and to stay within the framework used in our paper, we take the 1-qubit gates to be random and uniformly distributed on U(2). The geometry used in the Sycamore processor and that we analyze is schematically represented in Fig. 17. The protocols that we study are composed of commuting groups of 2-qubit gates denoted by 4 letters A,B,C and D, see Fig. 17. For instance, the protocol used in the Google's supremacy experiment is ABCDCDAB. Due to size limitation we of course can not simulate a circuit with n = 53 qubits, so we focus on a half-sized patch-circuit with n = 27 used by Google. We shall in addition show results for two other protocol steps (ABCD and ABCDCADB) that turn out to be optimal at smaller sizes.
In Table IV we compare all three mentioned protocols, as well as different choices of a 2-qubit gate. In order to facilitate the comparison of entanglement generation rates between protocols with different number of gates we shall compare the effective eigenvalue λ. The effective eigenvalue is equal to the true eigenvalue |λ 2 | renormalized to a step with n gates, that is, in a protocol step with T local gates M i,j it is λ = |λ 2 | n/T . For example, protocol steps with 8 letters have T = (10n − 30)/3, whereas the ABCD protocol has T = (5n − 15)/3. Similarly as in 1D protocols, fixing a configuration the fastest gate is the XY gate. Comparing different configurations (for small n we checked all possible permutations of 8letter protocols) we find that while at smaller n there are configurations that have smaller λ than the Google's ABCDCDAB, at n = 27 the Google's choice seems to be the optimal one. We can see that the optimal eigenvalue λ = 0.141 for the XY gate is by ≈ 15% larger than for the BW PBC |λ 2 | = 1/9 (27), however, applying a BW protocol on the 2D geometry in Fig. 17 would necessitate additional long-range 2-qubit gates. Similar gain (≈ 10%) would be obtained also for the Sycamore gate. For non-optimal 2-qubit gates, like the CNOT, the optimal protocol can be different than ABCDCDAD even at n = 27, see Table IV.
In 1D protocols studied in the rest of the paper we  TABLE IV. The effective eigenvalue λ for different configurations, gates (XY, Sycamore, CNOT, and random), and system sizes n in a 2D geometry (Fig. 17). With random a we denoted the gate with randomly generated canonical parameters ax = 0.8501, ay = 0.4628, az = 0.1204. The effective eigenvalue describes purity decay after the application of n 2-qubit gates.
have seen that the entanglement generation rate r E was usually not given by the spectral gap. In Fig. 18 we show numerically calculated purity for protocols discussed in this section. The subsystem A for our bipartition is for even m composed of m/2 bottom rows of qubits (see Fig. 17), while for odd m we take the bottom (m − 1)/2 rows plus an additional leftmost qubit in the next row.
We can see that for all cases the purity rate agrees with − log |λ 2 |. It is now evident that taking a CNOT gate instead of the XY or the Sycamore gate would results in a significantly slower entanglement generation. It is an important and nontrivial task to find the optimal configuration and choose the best gate for each topology at hand.

V. DISCUSSION
Studying purity entanglement generation speed in quantum circuits with random single-qubit unitaries and a given nearest-neighbor 2-qubit gate we have identified circuits with the fastest entanglement production. They are the fastest possible and by a significant factor faster than previously studied cases, for instance, circuits with CNOT or random U(4) gates. These fastest scramblers are circuits with a brick-wall pattern of applied gates, with either open or periodic boundary conditions, and the XXZ 2-qubit gate, a special case of which is the XY gate.
Such random circuits should be of interest in experiments where they can reduce the running time and therefore increase the fidelity. They are simple examples of "fastest" chaotic many-body systems, see also Ref. [10] for a Floquet system with such property. Interestingly, relaxation towards the asymptotic random-state entanglement in optimal protocols, as well as other, does not proceed with a single relaxation rate but rather exhibits a phase transition: relaxation (or, equivalently, the entanglement generation rate) is faster until the thermalization time (thermalization time is the time when the entanglement closes in on its volume-law saturation value ∼ n A ), but then discontinuously transitions into a smaller rate. This happens because the initial relaxation rate is not given by the transfer matrix gap, as one would naively expect. On a mathematical level it arises due to nonsymmetric Markovian transfer matrix even though the action of each individual gate is described by a symmetric matrix. This is so because the expansion coefficients in the spectral decomposition can get exponentially large in the size of a many-body system. As a consequence a subleading eigenvalue, even if it is gapped away from the leading one, can give the dominant relaxation rate. Even stranger is the observation that for the same non-Hermitian reasons it can happen that such exponentially large terms result in relaxation with a rate that is smaller that the one given by the leading eigenvalue. It is as if one would have an additional 'phantom' eigenvalue in the spectrum that is larger than any actual eigenvalue.
We did demonstrate that the same phenomenology occurs also for non-extremal fixed 2-qubit gates (non dualunitary), as well as for the case of Haar-random 2-qubit gates. For the latter several conjectured exact results about the largest eigenvalue and the purity decay are left as open problems. We also show that spatial as well as temporal randomness of 1-site unitaries is not necessary. This suggests that the same phenomenology of a multistage relaxation should arise also in many other situations, including non-random systems. While we discussed the average entanglement, for large system sizes explicit averaging over different circuit realizations is not necessary -a single circuit realization will already show the phenomenon.
The phenomena therefore seem to be rather general, a common point being non-Hermitian many-body matrices. One wonders in how many other situations with non-Hermitian operators such a physics can arise (e.g., dissipative systems, scattering problems, etc.). A promising approach would be to find a continuous-time system with the same phenomenology. Alternatively, starting with a Hamiltonian with noise (instead of with a Floquet system -a quantum circuit -with random unitaries) one can arrive at the Lindblad master equation after averaging over noise. In some cases or limits the resulting [70] non-Hermitian generator is equal to some well studied models, simplifying the insight. It has been shown [70] that some features in such a setting are the same as in random circuits.
We stress that one needs sufficiently large systems to observed the effects. We are therefore dealing with genuine many-body physics. Convergence with n is namely in some cases quite slow and one needs good numerical methods to access the required sizes. How do other quantities besides purity behave in such random circuits is also an interesting open problem. This includes other Rényi entropies as well as operators, correlation functions etc.. Von Neumann entropy for instance does show multistage thermalization. Preliminary results show [19] that OTOCs also do exhibit effects of a phantom eigenvalue. Because the extremal random circuits belong to a class of dual-unitary circuits non-Hermitian effects that we identify could be at play in such situations, and perhaps even more generally in some chaotic Floquet systems [10,47,71].
Do Markov chains describing random circuits exhibit a cutoff phenomenon known to happen in some Markovian chains [72,73], and which has been speculated [38] to actually occur in certain random circuits, and if yes, are phenomena we observe in any way related to it? In the cutoff phenomenon relaxation towards the invariant state as measured by the total variational distance is sudden -upto the cutoff time there can be 'memory' of the initial state and no relaxation, which is then suddenly 'forgotten' at the cutoff (mixing) time. Based on purity decay, which is always exponential but with a "wrong" rate that is only by a factor different than the inverse gap, it would be tempting to conclude that there is no cutoff. Namely, a necessary condition for a cutoff [74] is that the mixing time is parametrically larger than the inverse gap (their ratio should diverge in the TDL). However, to really probe the cutoff one should study the full measure relaxation as quantified by the total variational distance which is in particular concerned with a worst-case relaxation scenario. We were on the other hand studying initial states that correspond to a valid purity vector (purities for different bipartitions can not be arbitrary) and relevant bipartitions with extensive n A .
While we explained the behavior in terms of nonorthogonality and the way expansion coefficients behave, the underlying physics remains to be analytically understood. When and why do such phase transitions in the rate happen? Is it associated with some change in the entanglement properties of the underlying state, like e.g. their Schmidt spectrum being different than that of random states [50]; features like that have for instance been identified in either appropriate random ensembles [75] or under random evolutions [76,77]. It has been observed that around thermalization time fluctuations qualitatively change in random circuits [78].
We focused on qubits and the case of nearest-neighbor protocols, however other topologies are also of interest, including systems in higher dimensions as well as systems with larger local Hilbert space (qudits). For instance, for the all-to-all coupling we have calculated the exact asymptotic expression for the spectral gap for any gate (Appendix G), however we did not explore the entanglement generation.
We would like to thank A. Nahum, S. Gopalakrishnan, A. G. Green and A. Lamacraft for discussion. Support by Grants No. J1-1698 and No. P1-0402 from the Slovenian Research Agency is acknowledged.
The first term at the rightmost position of equation (B5) will be, up to the normalization factor ( 1 √ 3 ) n , equal to 2 nA , and the second term will be 2 nB . We normalize the coefficients of Φ ∞ by demanding I ∅ (∞) = 1, so which agrees with equation (6).   which is the canonical configuration with p = (i + 1)/2. This concludes the proof.

Proof of eigenvalue 1/9
We can also prove that λ = 1/9 is an eigenvalue of M for the brick-wall configuration with PBC and the XY gate.
Proof. Let us for simplicity focus on even n. Taking PBC BW M = B with the XY gate we can explicitly construct an eigenvector v of M with eigenvalue 1/9. The ansatz for the eigenvector is where v α = v 1 +αv 2 , where v 1,2,3 are the eigenvectors of a 2-qubit M i,i+1 (19) and T is the translation operator by one site on a circle, e.g. with even i). The ansatz vector v is for an arbitrary α already the eigenvector of M o with λ = −1/3. If we want it to be the eigenvector of the whole M , we must fix the constant α so that it is also the eigenvector of M e . Let us focus on a pair of qubits (i, i + 1) with even i. We can expand v as a linear combination of n4 n 2 −1 basis vectors e k (we choose the same basis as in equation (16)). Each basis vector can be further decomposed as e k = w k 1,i−1 ⊗ v k i,i+1 ⊗ w k i+2,n , where the subscripts denote the range of qubits which these decomposition vectors describe. For every v k i,i+1 we want ei- i,i+1 must be either (3, 0, 0, 1) or (0, ±1, 1, 0). The vectors v k i,i+1 depend on the neighboring pairs (i − 1, i) and (i+2), which can be v α ⊗v α , v α ⊗v 3 or v 3 ⊗v α (D2). We have 32 possible combinations on qubits (i, i + 1), but for now let us focus on 4 instances: (9, 0, 0, α 2 ), (α 2 , 0, 0, 1), (0, 3, α 2 , 0) and (0, α 2 , 3, 0). We obtain the desired vector if we fix α = √ 3. For all other 28 possibilities our demands are automatically fulfilled without the specification of α; one can check this by writing down all 32 possibilities that come from terms v α ⊗ v α , v α ⊗ v 3 and v 3 ⊗ v α . For small systems (n = 4, 6, 8) we numerically checked the correctness of our guess. The eigenvector v does not have a good parity and can be decomposed as v = v e + v o , into an even and odd parity parts, both of which are, due to parity conservation of M , still eigenvectors. This concludes the proof.

Numerical data for λ2
In this appendix we present numerical results used to determine the fastest asymptotic scrambler for PBC and OBC protocols. To calculate λ 2 we used either exact diagonalization or the power method as described in Appendix E. Fig. 20 shows color-coded plots of |λ 2 | for the S configuration with OBC for different values of all three canonical parameters a j . Looking for the smallest |λ 2 |, and how things change with increasing n, a promising candidate for the smallest |λ 2 | is a = (1, 1, 0), i.e., the XY gate. Studying more closely how |λ 2 | depends on n for this XY gate, Fig. 21, we can conjecture that in the TDL one has |λ 2 | = 1/4 for any protocol with OBC (remember that for OBC the spectrum does not depend on the configuration). Fig. 22 shows the values of |λ 2 | for different twoqubit gates (parameters a x , a y , a z ) with PBC and different configurations (parameter p) for fixed n = 10. From Fig. 22 we learn that |λ 2 | monotonically decreases as we increase p. Moreover the fastest entanglement generation comes from the region a z = 0. Figs. 23,24 show the dependence of |λ 2 | on a x , a y , a z = 0 and n for p = 1 (slowest scrambler) and p = n 2 (fastest scrambler) respectively. Contrary to OBC protocols, where the fastest asymptotic scrambler is obtained for all a x = a y = 1, a z ∈ [0, a c ≈ 0.32], the smallest |λ 2 | for PBC protocols comes from a single choice of two-qubit gates, namely a x = a y = 1, a z = 0 (XY gates). Appendix E: Numerical calculation of spectral gaps for large n Whenever exact numerical diagonalization was too demanding in terms of memory, we calculated the dominant eigenvalue |λ 2 | using the power method. Briefly, the power method consists of iterations of form where M is the matrix of which we want |λ 2 | and w k is the vector we iterate. For almost all initial vectors w 0 the power method converges to the vector with the greatest eigenvalue in absolute value. In our case, the matrix M has one even-parity eigenvector Φ ∞ with eigenvalue equal to 1 (for details see Appendix B). If we wish to get |λ 2 | from our power iteration, we must subtract the component corresponding to Φ ∞ from w i at every step i. We iterate where * , * is the standard inner product. If |λ 2 | is non-degenerate, the power iteration converges to the corresponding eigenvector. In general λ 2 ∈ C. Because the product M is real, then if λ 2 ∈ C there is also aλ 2 in the spectrum of M . Suppose we want to extract the value |λ 2 | from our power iteration. We begin with a vector w 0 and denote with w ∞ the eigenvector of M with eigenvalue λ 2 . After k 1 iterations we can write where c = w ∞ , w 0 . In equation (E5) we use the fact that at every step we subtract the component resulting from eigenvector with λ = 1; we can equivalently start with a vector orthogonal to Φ ∞ : w 0 , Φ ∞ = 0.
An arbitrary quotient of (k + 1)-th and k-th step can be written as where we used the notation v k = 1 + V cos(kφ). Now suppose φ = l m 2π for integers l, m, then v k+m = v k for every k. Taking the geometric mean of norms ||M k w 0 || of m subsequent iterations we get We found that for most cases φ ≈ l m 2π holds, hence a good estimation of |λ 2 | was possible. In Fig. 25 we compare the von Neumann entropy with the logarithm of the average purity that we studied in the rest of the paper. As we can see von Neumann entropy behaves rather similar as purity, or more precisely as − log I(t) . In particular it also exhibits a phase transition in the local rate at t ≈ t ∞ (t ∞ ≈ 8 in the figure). In the upper data the thermal value is subtracted (left label), while for the lower two saturating sets it is not (right label). All is for n = 30 and S configuration with PBC and the XXZ gate with az = 0.5 (the same parameters as in Fig. 10(c,d)) for which the rate is rE = 2 ln 2.

Fluctuations and randomness
For sufficiently large times the state reached under random-circuit evolution is close to a random state. Due to the measure concentration in a large Hilbert space one can get good self-averaging properties even for a single random circuit realization. In Fig. 26 we numerically check such self-averaging property for the S configuration with PBC and two different system sizes in order to get insight on how fluctuations behave with n. First thing we notice is that a single realization with random Haar i.i.d. 1-qubit unitaries in both space and time (red curves labeled by "diff.t,diff.x") is for large n almost on top of the average purity. Therefore, while an explicit averaging over single-site Haar random unitaries simplifies analytical treatment (results in a Markovian process) it is not necessary for the observed phenomena.
We also check how randomness in 1-site unitaries influences our results. To that end we compare our canonical case where 1-site rotations at each site and timestep are independent ("diff.t,diff.x") with a situation when unitaries are the same at every site ("hom.x") and/or the same at every timestep ("hom.t"). We interestingly see that if we use the same random 1-qubit unitary at every Single random circuit realization results for a single random product initial state (four colored curves with labels) and the average purity (smooth black full and dashed curves). All is for the S configuration with PBC and az = 0.5 (the same parameters as in Fig. 10(c,d)) and system size n = 22 (top), and n = 30 (bottom). Labels for colored curves denote whether 1-qubit Haar unitaries are different on each site ("diff.x") or the same ("hom.x"), as well as whether they are the same at every timestep ("hom.t") or different at every circuit layer ("diff.t").
site, as well as at every time ("hom.x,hom.t"), i.e., the whole single relization of a circuits uses only one Haar 1-qubit unitary, one will get the same behavior in the thermodynamic limit. Based on the data we can conjecture that in the TDL explicit randomness in not necessary neither in space, nor in time. Comparing fluctuations between the 4 cases shown, they are expectedly the smallest for 1-qubit unitaries that are independently random in space and time, followed by the case of random in space and the same in time (every circuit layer uses the same 1-qubit unitaries), then the case with no spatial randomness but with new unitaries at every time, and lastly the largest fluctuations are observed for the circuits that is homogeneous in space and time.