Measurement and entanglement phase transitions in all-to-all quantum circuits, on quantum trees, and in Landau-Ginsburg theory

Quantum many-body systems subjected to local measurements at a nonzero rate can be in distinct dynamical phases, with differing entanglement properties. We introduce theoretical approaches to measurement-induced phase transitions (MPT) and also to entanglement transitions in random tensor networks. Many of our results are for "all-to-all" quantum circuits with unitaries and measurements, in which any qubit can couple to any other, and related settings where some of the complications of low-dimensional models are reduced. We also propose field theory descriptions for spatially local systems of finite dimensionality. To build intuition, we first solve the simplest "minimal cut" toy model for entanglement dynamics in all-to-all circuits, finding scaling forms and exponents within this approximation. We then show that certain all-to-all measurement circuits allow exact results by exploiting the circuit's local tree-like structure. For this reason, we make a detour to give universal results for entanglement phase transitions in a class of random tree tensor networks, making a connection with the classical theory of directed polymers on a tree. We then compare these results with numerics in all-to-all circuits, both for the MPT and for the simpler "Forced Measurement Phase Transition" (FMPT). We characterize the two different phases in all-to-all circuits using observables that are sensitive to the amount of information propagated between the initial and final time. We demonstrate signatures of the two phases that can be understood from simple models. Finally, we propose Landau-Ginsburg-Wilson-like field theories for the MPT, the FMPT, and for entanglement transitions in tensor networks. This analysis shows a surprising difference between the MPT and the other cases. We discuss variants of the measurement problem with additional structure, and questions for the future.

A quantum system whose unitary dynamics is interspersed with repeated measurements follows a random trajectory through Hilbert space [1][2][3][4][5], determined both by the unitary part of the dynamics and by the sequence of measurement outcomes. In the many-body case this random dynamics admits a "measurement phase transition" (MPT) between two qualitatively different, stable dynamical phases, with distinct entanglement properties. For definiteness, consider a system of many spins in a pure state, evolving under a quantum circuit that includes both entangling two-spin unitary gates and measurements, which are made at random times at a finite rate per spin. Informally, sufficiently frequent measurements yield a "disentangling" phase: in this phase, the state at a given time is weakly entangled, and is fully specified by the outcomes of a relatively recent set of measurements. (The limiting case of this disentangling dynamics is where all the spins are measured simultaneously, leaving the system in a product state that can be read off from the measurement outcomes.) But when the frequency of measurements falls below a critical threshold, the dynamics enters an entangling phase. In this phase the dynamics produces states with extensive entanglement, which retain quantum information from much earlier times. If the initial state is mixed, rather than pure, then it will rapidly be purified [12] by the repeated measurements in the disentangling phase, but not in the entangling phase.
The simplest toy model for the MPT arises from thinking about the connectivity of the spacetime diagram of the quantum circuit, viewed as a tensor network [6]. In this representation a measurement event is a break in the worldline of a spin, across which quantum information cannot be transmitted. When measurements become sufficiently frequent, the circuit falls apart into disconnected pieces, implying that entanglement in the final state is short ranged and there is no transmission of quantum information, from the initial to the final state, over long timescales.
The existence of the MPT poses several types of questions. Viewing the circuit as a quantum information processor, the MPT is a transition in the properties of a randomly generated error-correcting code [11,24,28], the structure of which has been argued to be optimal in some senses [12]. Pragmatically, we may ask what the transition implies about the computational difficulty, for a classical computer, of simulating various types of open or monitored quantum systems [6,29-32]. For example, dynamics that is in the disentangling phase may allow an efficient matrix-product or tensor network representation of the evolving state. Philosophically, we may wonder what the existence of two regimes implies about how to distinguish dynamical processes that are intrinsically quantum from those that are effectively classical. For example, in both of the phases separated by the MPT, quantum correlations between local observables are "weak", but for different reasons. In the disentangled phase, a local operator is correlated only with a few others nearby. In the entangled phase, it may have nontrivial correlations, but these are detectable only by highly nonlocal, "scrambled" operators, and hidden from local ones. Only close to the transition point does the system escape both mechanisms, allowing nontrivial correlations for local operators [6,9,13-17,24]. Yet another key question is how to probe the MPT experimentally [13,17]. This question is nontrivial: for example, a naive approach leads to a severe sampling problem (due to the need to compare measurements in distinct experimental runs that have the same measurement outcomes).
Another way of looking at the MPT is as a problem in statistical mechanics and critical phenomena [6, 7, 9, 10, 12-16, 22, 24, 25, 27, 33]. Open questions abound, both about the nature of the phases and about the critical point separating them. Many variants of the measurement transition can be imagined; how do we sort them into universality classes? Are there simplifying limits where exact results are possible? Are there useful continuum field theories for the MPT and related problems, that allow us to apply the tools of the renormalization group?
This statistical mechanics problem is closely connected to an entanglement transition that takes place in random tensor networks [14,34,35] (we will explore the similarities and differences further here) and the same questions apply in that setting. These problems are challenging partly because of the need to average over randomness: either intrinsic randomness in the definition of the dynamics (for example if we consider dynamics using a random quantum circuit) or simply the inevitable quantum-mechanical randomness in measurement outcomes.
Our focus in this paper is on circuits built from generic unitary gates (for example Haar-random gates). An alternative profitable direction is to study circuits made from Clifford unitaries [7,12,15,17,22,36-40]. Clifford circuits are efficiently classically simulable, which has allowed direct tests of conformal invariance at the MPT in 1+1D [9,15] and simulations in 2+1D [22]. In general, the universality class of the MPT is expected to differ for Clifford versus generic unitaries (see e.g. Ref. [16]), though many features of the stable phases are similar.
This paper is a journey through several approaches to the MPT, and also to the closely related "forced" measurement phase transition (FMPT, defined below), and to the entanglement transition in various types of random tensor networks (RTN). Our aim is to find settings in which exact results can be obtained for the transition, as well as to clarify the properties of the two phases. We examine several different tools and settings, but the unifying feature is that we consider measurement and entanglement transitions in situations where the complications arising from low-dimensional spatial structure are reduced.
To gain intuition, we start by solving a classical toy model for the transition in the all-to-all circuit, which gives scaling forms for the "minimal cut" cost that determines the zeroth Rényi entropy (Sec. III). In Sec. IV we turn to the truly "quantum" problem, obtaining exact results for random tree tensor networks. These can be applied to the true quantum transition in all-to-all circuits with "forced" measurements. Sec. V simulates all-to-all measurement circuits, using operator entanglement of the time evolution operator, and the convergence with time of two initially orthogonal states, to diagnose the preservation of information over an exponentially long timescale that is a hallmark of the entangling phase. Sec. VI develops analytical approaches to the MPT and to entanglement transitions based on the replica trick, clarifying the properties of the two phases and suggesting candidate field theories for the critical points in various settings.
Much of this paper is concerned with circuits with all-to-all couplings between qubits, i.e. with no fixed spatial geometry, which we study using analytical arguments and numerical simulations. (Various types of all-to-all circuit have also been discussed recently in Refs. [12,26].) These circuits are in turn closely related to tree tensor networks, for which we give exact results, including the first exact identification of an entanglement transition in a generic system with finite bond dimension.
Turning to models in generic spatial dimensionality, we discuss and extend tools based on mappings to effective "lattice magnets" [13,14,33-35,41-44], involving a replica limit [13,14,33,35], which capture the properties of the two phases, and in principle the critical point. We suggest alternative ways of thinking about these effective models, making connections with ideas from disordered magnetism. A key outstanding question is the existence of effective field theories for the MPT. Here we propose, speculatively, two Landau-Ginsburg theories, one for the MPT and one for both the FMPT and the RTN.
The cartoon in Fig. 1 contrasts an all-to-all measurement circuit and a circuit with a fixed spatial geometry. In this figure, time runs vertically, and each worldline represents a spin/qubit. Unitary gates are applied between randomly chosen spins at random times, and projective measurements are applied to randomly chosen spins. All-to-all coupling is perhaps the simplest setting for the MPT. Since the distinction between area and volume law breaks down in the all-to-all case (as also in the limit of infinite dimensions), it is natural to focus instead on the transmission of information between initial and final times. Here we characterize this transmission via the operator entanglement [45][46][47][48][49][50] of the nonunitary time evolution operator, defined below. This quantity has a simple interpretation in terms of the surface tension of the "entanglement membrane" in the effective replica description, which we discuss. An even simpler heuristic picture for it comes from the classical toy model, in terms of the minimal cut that separates the top of the circuit from the bottom. We apply all the approaches mentioned above (tree approximations, simulations, replica field theories) in the setting of generic quantum circuits for spin-1/2, as well as related random tensor networks, giving results for scaling properties in the entangled phase and close to the critical point. We also study a solvable "classical" limit of the problem. Our main approaches are illustrated in Fig. 2, and the ensuing section, Sec. II, gives an overview of our results. In closing this Introduction, however, let us briefly clarify the logic of our four-pronged approach to understanding the MPT and its relatives.
Before tackling the "true" quantum circuit problem, we find it instructive to first solve the classical toy model mentioned above, in the particular setting of all-to-all circuits (Sec. III). In this model the entanglement is described in terms of a "minimal cut" through a circuit in which worldlines have been broken by measurement. The minimal cut becomes an exact description of the MPT in certain limits, but in general it does not capture either the location of the critical point or the true critical scaling of the quantum problem. Nonetheless, the minimal cut problem yields some useful lessons for the full quantum problem. Most prominently, it captures key qualitative features of the two phases, including the appearance of an exponentially long timescale for survival of quantum information within the entangled phase. Solving the minimal cut problem also makes clear certain crucial concepts for understanding the MPT in all-to-all circuits, including the local tree structure of the circuit and the relevance of crossover scaling phenomena.
The fact that all-to-all circuits have a local tree structure motivates us to study entanglement transitions in quantum trees (Sec. IV). In this setting we are able to obtain the exact location of the entanglement transition (and exact critical properties) for a dynamics involving generic random gates. This result may be useful for further investigations: studies of the MPT in systems with generic unitaries are often hampered by the restriction of numerics to small sizes, which makes it difficult to accurately pinpoint entanglement transitions. Moreover, we argue that the critical measurement rate that we identify in the quantum tree is also the exact result for the full all-to-all quantum circuit with forced (postselected) measurements.
Armed with the understanding gained from the minimal-cut and quantum tree problems, we turn our attention to direct numerical simulations of the quantum circuit (Sec. V). The results we obtain are consistent with the critical scaling forms suggested by the previous approaches, and highlight the emergence of an exponentially long timescale associated with information transmission through the circuit in the entangling phase.
Finally, in Sec. VI we discuss mappings of the MPT and of random tensor networks to effective lattice models for a "pairing field", and we discuss how to coarse-grain such models. We construct the simplest candidate Lagrangians that are consistent with the replica symmetry and describe some of their features. We also touch on free fermions subject to measurement [51], which do not show the same kind of transition between weakly and strongly entangled phases but do show transitions of a different type [36,52,53]. We contrast these systems with generic models, and we discuss some other variants of the MPT.

A. Models
Our starting point is a dynamical process in which a large number N of spin-1/2s undergoes unitary evolution punctuated by projective single-spin measurements: Fig. 1, Left. (Circuits with both unitaries and measurements have been referred to as "monitored" or "hybrid" quantum circuits.) The spins are "all-to-all" coupled, meaning that unitary gates may be applied between any two spins in the system. These gates are applied at a uniform rate between randomly chosen pairs of spins, and are themselves drawn independently from a random ensemble (e.g. the Haar ensemble). Measurements, which are made in the Z-basis, are also applied at a uniform rate to randomly chosen spins. The only parameter is r ∈ [0, 1], which determines the relative rate of measurements and unitaries: in a unit interval of time there are on average rN measurements and (1 − r)N unitary operations.
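As an illustration of the event statistics just described, the following is a minimal sketch of how one realization of the circuit's event sequence might be sampled. The function name `sample_circuit_events` and the choice to fix the total number of events at N t (rather than drawing it from a Poisson process) are assumptions made for this sketch, not part of the model definition above.

```python
import random


def sample_circuit_events(n_spins, r, total_time, seed=None):
    """Sample an ordered list of events for an all-to-all circuit.

    Each event is ("measure", i) or ("unitary", i, j) with i != j.
    Every event is a measurement with probability r and a two-spin
    unitary with probability 1 - r, so that a unit interval of time
    contains on average r*N measurements and (1 - r)*N unitaries.
    """
    rng = random.Random(seed)
    # Assumption: exactly N events per unit time (hypothetical simplification).
    n_events = round(n_spins * total_time)
    events = []
    for _ in range(n_events):
        if rng.random() < r:
            events.append(("measure", rng.randrange(n_spins)))
        else:
            i, j = rng.sample(range(n_spins), 2)  # two distinct spins
            events.append(("unitary", i, j))
    return events
```

The gates themselves (e.g. Haar-random 4×4 unitaries) and the measurement outcomes are not sampled here; only the locations and ordering of events are.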
We distinguish between two possibilities for the projective measurements, which we refer to as "measurements" and "forced measurements", respectively. (Correspondingly we refer to the "measurement phase transition", or MPT, and "forced measurement phase transition", or FMPT.) The outcomes of "measurements" are determined as usual by the Born rule, based on the state of the system at the time of measurement. By contrast the probability of a given outcome for a "forced measurement" is independent of the state. We will take it to be 1/2 for both of the two possible outcomes, ↑ and ↓, but in fact, for the ensembles of random unitaries we consider, it is completely equivalent to take all the measurement outcomes to be ↑. We can think of the FMPT as pertaining to a protocol in which we run (exponentially) many samples, discarding all those except those that yield the desired ("postselected") sequence of outcomes.
To formalize the distinction between MPT and FMPT, define $V_m$ to be the nonunitary time evolution operator represented by a given realization of the circuit. This operator is the product of unitaries and projection operators; we have labelled it by a given sequence $m$ of outcomes for the measurement events, for example $m = (\uparrow, \downarrow, \ldots, \uparrow)$. ($V_m$ also depends on the total time $t$, locations and times of the unitaries and measurements, and the specific random unitaries in the circuit realization, but we leave these dependencies implicit.) For the MPT, and for a given sequence of unitaries and measurement locations, the probability of a sequence of measurement outcomes $m$ is
$$P_{\rm MPT}(m) = \langle \psi(0) | V_m^\dagger V_m | \psi(0) \rangle, \qquad (1)$$
where $|\psi(0)\rangle$ is the initial state. For the FMPT it is
$$P_{\rm FMPT}(m) = 2^{-|m|}, \qquad (2)$$
where $|m|$ is the number of measurements in a given realization of the circuit. In both cases, the time evolution of a pure state is
$$|\psi(t)\rangle = \frac{V_m |\psi(0)\rangle}{\left\| V_m |\psi(0)\rangle \right\|}. \qquad (3)$$
It is occasionally useful to generalize the circuit to a variable number of spin states $q$ for each site. In particular, the limit of large $q$ is one way to motivate the classical problem we describe below. Having started with the models above, we will be led to consider some other related problems. These models will be introduced as we need them. Sec. IV considers a class of tree tensor networks, one example of which is closely related to the FMPT case above. Sec. VI addresses both circuits and tensor network models in a finite number of dimensions, in which we do have a sense of spatial locality.
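To make the difference between the two measurement rules concrete, here is a minimal sketch of a single Z-basis measurement acting on a state vector, with the outcome drawn either from the Born rule or with state-independent probability 1/2. The function name `measure_z`, the flat-array representation of the state, and the convention 0 = ↑, 1 = ↓ are assumptions of this sketch. The state is renormalized after projection, as in the nonlinear pure-state evolution described above.

```python
import numpy as np


def measure_z(psi, site, n_spins, rng, forced=False):
    """Project spin `site` of a normalized state vector onto a Z eigenstate.

    Returns (normalized post-measurement state, outcome), where outcome is
    0 for "up" and 1 for "down". With forced=False the outcome follows the
    Born rule; with forced=True each outcome has probability 1/2,
    independent of the state.
    """
    tensor = psi.reshape([2] * n_spins)
    # Born probability of finding the spin "up".
    p_up = np.sum(np.abs(np.take(tensor, 0, axis=site)) ** 2)
    if forced:
        outcome = int(rng.integers(2))
    else:
        outcome = 0 if rng.random() < p_up else 1
    # Keep only the amplitudes compatible with the chosen outcome.
    projected = np.zeros_like(tensor)
    idx = [slice(None)] * n_spins
    idx[site] = outcome
    projected[tuple(idx)] = tensor[tuple(idx)]
    norm = np.linalg.norm(projected)
    if norm == 0:
        # Can only happen for forced outcomes on non-generic states.
        raise ValueError("forced outcome has zero amplitude in this state")
    return (projected / norm).reshape(-1), outcome
```

A typical call would pass `rng = np.random.default_rng(seed)`. For states produced by generic (e.g. Haar-random) gates the zero-amplitude branch is not expected to be reached.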

B. Detecting the entangling phase
Before turning to the critical properties, we discuss the more basic issue of how to distinguish the two phases.
The entanglement transition can be identified with the vanishing of an effective surface tension for a membranelike object in spacetime, as we discuss below. In the classical toy model, this membrane is a minimal cut through the circuit [6]. In a more precise picture, it is a domain wall in an effective statistical mechanics problem (see following sections). The surface tension of this membrane/domain wall is positive in the entangling phase [6].
FIG. 3. The entanglement entropy of states and operators can be described in terms of the surface tension of an effective membrane/domain wall. Left: interpretation of entanglement entropy of a spatial subregion, in the entangling phase, as the free energy of an anchored membrane. Right: operator entanglement of the nonunitary circuit (in the entangling phase) in terms of a "horizontal" membrane (see also Ref. [13]).
In finite dimensions, the vanishing of this surface tension, which we denote $s_n(r)$,¹ implies a vanishing of the entanglement entropy density of the states produced by the dynamics at late time. This density is the coefficient of the volume law for the entanglement entropy of a spatial subregion, and is given by the surface tension $s_n(r)$. This is because the subregion's entropy maps to the free energy of a membrane that is anchored on the boundary of region A on the final time surface: see Fig. 3 for a schematic in 1+1D.
In the all-to-all circuit there is no distinction between areas and volumes (as in the limit of high dimensions), so the naive attempt to define an entropy density using the entropy of a spatial subregion is contaminated by trivial short range entanglement.² Instead it is simpler to consider the entanglement properties of the operator that implements the time evolution itself. The operator entanglement [45][46][47][48][49], defined below, is a measure of the amount of quantum information transmitted from the initial to the final time by the nonunitary evolution operator $V_m$. In the membrane picture, this operator entanglement is equal to the free energy of a "horizontal" membrane that completely traverses the system [13], as shown in Fig. 3. This observable also detects the vanishing surface tension $s_n(r)$ for the domain wall, as detailed below, but it does not require us to specify a spatial subregion.
¹ In general the membrane tension can depend on the Rényi index n [33]. It can also depend on the orientation of the membrane [50], but here we are interested only in membranes that are "horizontal" on large scales.
² I.e. entanglement that can be removed with a shallow-depth circuit.
Gullans and Huse proposed in Ref. [12] to think about dynamics with measurements in terms of the entropy of a state that starts out as maximally mixed, and is gradually purified by the dynamics. (The entangling phase is then a "mixed phase", where the state remains mixed for a long time, and the disentangling phase is a "pure phase", where the state is rapidly purified.) This mixed state entropy is in fact equal to the operator entanglement of the nonunitary evolution operator. Ref. [12] noted the exponentially long timescale for the survival of quantum information in the entangled phase (and plateaus in various observables), which will play an important role below. See also the recent Refs. [27,54].
Formally, the nth operator entanglement entropy of the circuit, denoted $S_n$ throughout this paper, may be defined via the singular value decomposition of the nonunitary time evolution operator $V_m$:
$$V_m = \sum_j \lambda_j \, |j_t\rangle \langle j_0|, \qquad (4)$$
where $\{|j_0\rangle\}$ and $\{|j_t\rangle\}$ are bases corresponding to the initial and final time. Normalizing the $\lambda_j$ so that $\sum_j \lambda_j^2 = 1$,
$$S_n = \frac{1}{1-n} \ln \sum_j \lambda_j^{2n}. \qquad (5)$$
For the unitary case (r = 0), $S_n = N \ln 2$ is maximal at all times. For positive r, and for asymptotically late times, a single term dominates Eq. 4, meaning essentially that all initial states are projected onto the same final state; i.e. the final state can be read off from measurement outcomes m (and the structure of the circuit) without knowledge of which initial state was fed in. We will also discuss another observable for quantifying the transmission of quantum information from initial to final times, which is more numerically tractable: this is the overlap between two initially orthogonal states, both subjected to the same $V_m$. (In the entangling phase, initially orthogonal states remain orthogonal for a long time.) We will characterize the operator entanglement in the classical toy model (Sec. III), in numerical simulations (Sec. V), using the replica trick (Sec. VI I), and with a crude toy model based on multiplying random matrices (Appendix E). The following basic points hold in all of these approaches.
First, in the entangling phase a nonzero density can be associated with the operator entanglement:
$$s_n(r) = \lim_{t\to\infty} \lim_{N\to\infty} \frac{1}{N} S_n(r, N, t). \qquad (6)$$
We think of this quantity as the information transmitted per spin, or in the membrane picture as the surface tension for a "horizontal" membrane. $s_n(r)$ is positive in the entangling phase, and vanishes continuously, for all $n \geq 1$, as the critical measurement rate $r_c$ is approached from below. As for almost any product of many random matrices, we expect that if $N$ and $r$ are fixed, then at sufficiently late times one of the singular values dominates the others and $S_n$ decays exponentially in time. But if $s_n(r)$ is positive, this exponential decay does not set in until a time $\tau(r, N)$ that is exponentially large in $N$. We may define
$$a(r) = \lim_{N\to\infty} \frac{\ln \tau(r, N)}{N}. \qquad (7)$$
Close to the transition, at $r \lesssim r_c$ where $s_n(r)$ is small, [Eq. 8] (up to an order 1 constant of proportionality). On times $t$ satisfying $\ln t \ll \ln \tau$, the entanglement deviates logarithmically from the "plateau" value dictated by Eq. 6. For example in one regime, [Eq. 9]. This formula has also been obtained in various limits in Refs. [12,27]: in particular Li and Fisher in Ref. [27] give a discussion very similar to that in Sec. VI I, in terms of domain walls in an effective quasi-1D model. In this interpretation $\ln t$ is the translational entropy of a domain wall. More generally, randomness and other effects can modify the nature of the subleading term above slightly, depending on the time regime.
In contrast to the above, the information transmitted per spin, $\lim_{N\to\infty} \frac{1}{N} S_n(r, N, t)$, decays exponentially with $t$ in the disentangled phase.
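The singular-value definition of the operator entanglement used in this subsection can be sketched numerically as follows. This is an illustrative helper rather than the procedure used for the simulations in Sec. V: the function name `operator_entanglement` and the numerical cutoff for discarding vanishing singular values are assumptions. It takes any matrix V (for instance a product of unitaries and projectors on N spins), normalizes the squared singular values to sum to one, and evaluates the Rényi entropy of index n (with the von Neumann formula for n = 1).

```python
import numpy as np


def operator_entanglement(V, n=2):
    """Renyi-n operator entanglement of a matrix V from its singular values."""
    lam = np.linalg.svd(V, compute_uv=False)
    p = lam**2 / np.sum(lam**2)  # normalize so that sum_j lambda_j^2 = 1
    p = p[p > 1e-15]             # assumed cutoff: drop numerically zero weights
    if n == 1:
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p**n)) / (1 - n))
```

Two limiting cases provide a quick check: for a unitary on N spins all singular values are equal, so the result is N ln 2, while a rank-one operator (all initial states mapped to the same final state) gives zero.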
We now give an overview of our approaches to critical properties of these circuits and related models, considering each approach in turn (summarized in Fig. 2). The reader may obtain the key points of each approach from the corresponding Overview section. We also highlight some points that are not yet resolved, and places where our arguments rely on conjectures that could be tested further.

C. Min-cut toy model
Before attempting an exact treatment of the true quantum transition in the spin-1/2 circuit, we consider a limit (we will sometimes refer to this as the "classical" limit) in which the entanglement transition becomes a simple geometric problem involving a random graph. This graph is defined such that its edges represent the time evolution of each spin, which can be severed by measurement, and its nodes represent interactions (applied gates) between spins. The analog of the operator entanglement entropy is the cost of a "minimal cut" that disconnects the initial-time and final-time nodes: see Sec. III for a detailed definition.
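The graph construction and the minimal cut can be sketched as follows. This is a simplified illustration under stated assumptions, and the detailed definition in Sec. III may differ: every unmeasured worldline segment is taken to cost one unit to cut, all initial-time endpoints are merged into a source "S" and all final-time endpoints into a sink "T", and the event format ("measure", i) / ("unitary", i, j) and the function name `min_cut_cost` are hypothetical. The min-cut cost is computed as a maximum flow using breadth-first augmenting paths.

```python
from collections import defaultdict, deque


def min_cut_cost(n_spins, events):
    """Min-cut cost between initial and final times for a list of events."""
    cap = defaultdict(lambda: defaultdict(int))

    def add_edge(u, v):
        # Undirected unit-capacity bond (parallel bonds accumulate).
        cap[u][v] += 1
        cap[v][u] += 1

    current = ["S"] * n_spins   # last node visited by each worldline
    broken = [False] * n_spins  # has the current segment been measured?
    for k, ev in enumerate(events):
        if ev[0] == "measure":
            broken[ev[1]] = True
        else:
            gate = ("gate", k)
            for spin in ev[1:]:
                if not broken[spin]:
                    add_edge(current[spin], gate)
                current[spin], broken[spin] = gate, False
    for spin in range(n_spins):
        if not broken[spin]:
            add_edge(current[spin], "T")

    flow = 0
    while True:
        # Breadth-first search for an augmenting path from S to T.
        parent = {"S": None}
        queue = deque(["S"])
        while queue and "T" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "T" not in parent:
            return flow  # max flow equals min cut
        v = "T"
        while parent[v] is not None:  # push one unit of flow along the path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1
```

With no events the cost is N (every worldline must be cut), and it drops to zero once measurements disconnect the initial from the final time.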
Determining the scaling of this min-cut cost is a toy problem that provides intuition for the generic "quantum" problem. The minimal cut becomes an exact description of the operator entanglement only in special limits, as described in Sec. III (specifically, for projective measurements in the case where the local Hilbert space dimension $q$ goes to infinity, and for a generic local Hilbert space dimension if we consider the somewhat unphysical zeroth Rényi entropy, $S_0$). The "classical" problem has its transition at a measurement rate $r_c^{\rm cl}$ that is, for spin-1/2, strictly larger than the critical measurement rate $r_c$ for the true quantum transition, as diagnosed for example by all the $S_n$ with $n \geq 1$.
We first identify the critical point $r_c^{\rm cl}$ associated with percolation on the graph, which illustrates the importance of local tree structure in all-to-all circuits. We then present an effective continuum field theory for percolation on this graph, which gives the relevant scaling forms near $r_c^{\rm cl}$. We demonstrate this critical scaling using extensive numerical simulations for percolation observables and correlation functions. This demonstration is possible despite significant finite-time corrections, which arise because the critical timescale scales as $N^{1/5}$ and is modest even for simulations with very large $N$.
We demonstrate the plateau in the cost of the minimal cut that was described above, $S_0 \sim s_0(r) N$ over a long timescale. Close to criticality at $r \lesssim r_c^{\rm cl}$ we find the entanglement density (min-cut tension) [Eq. 10], which is an appropriate limit of a general scaling form $S_0 = H(t/N^{1/5}, \delta r \, N^{2/5})$, and the corresponding long timescale [Eq. 11]. The scaling we identify applies not only for the all-to-all problem, but also for spatially local circuits with spatial dimension $d \geq 5$, as follows from a standard crossover scaling argument.

D. Tree tensor networks: exact results
When the system size N is large, the structure of the quantum circuit in the vicinity of a given unitary is treelike (the smallest loops involve a parametrically large number of unitaries). This means that it is trivial to locate the classical critical point mentioned above. But in some cases (forced measurements) it also allows exact results for the quantum problem. This motivates us to study entanglement transitions on "quantum trees", i.e. tree tensor networks, in Sec. IV.
While our approach could be generalized, we focus on trees with bond dimension 2, where each node is a random tensor whose probability distribution is invariant under U(2) rotations on its legs. This includes trees that appear spontaneously in the spin-1/2 FMPT circuit for unitaries drawn from the Haar measure, for example.
Formally we can think of an (upside-down) tree like that shown in Fig. 2 as a tensor network wavefunction for a single spin at the apex (root) and many spins (leaves) at the base. Our starting point is to characterize the entanglement between apex and base, which for a bond-dimension-2 tree is characterized by a single number, $Z$.
For an asymptotically large tree, $Z$ has a critical vanishing at a particular measurement rate $r_c$. (In more general trees, $r$ can be thought of as a parameter in the node tensors' distribution.) We write a random recursion relation for $Z$ as a function of the generation number $k$ of the tree. This recursion relation allows us to derive the location of the critical point $r_c$ analytically for the case with Haar-random unitaries (we also study a slightly broader class of distributions): [Eq. 12]. This critical point $r_c$ is detected by any Rényi entropy $S_n$ with $n > 0$; $S_0$ instead detects the classical transition, at the strictly larger value $r_c^{\rm cl}$, discussed in the previous section. This is the difference between the existence of a percolating path connecting the root of the tree to infinity (for $r < r_c^{\rm cl}$) and the ability of the tree to broadcast a nonzero amount of quantum information from the root of the tree to infinity, rather than an amount that decays exponentially with the distance from the root.
Assuming a plausible conjecture, Eq. 12 is also the value of $r_c$ for the FMPT in the all-to-all Haar circuit, and yields a bound on the critical scaling of the entanglement density $s_2(r)$ (Eq. 6). While the treatment of the tree may hold lessons for the MPT in addition to the FMPT, we do not discuss the MPT from this perspective: the measurement correlations encoded in Eq. 1 hamper our approach.
We also obtain the critical scaling of $Z$ for $r \lesssim r_c$. Since the full nonlinear recursion relation for $Z$ is complicated, this requires us to make a conjecture, which is that the universal features of the scaling are faithfully retained in a simplified nonlinear recursion relation. We can then write a continuum description that describes a Fisher-Kolmogorov-Petrovsky-Piskunov-like traveling wave [55]. This is a standard description for the partition function of a directed polymer on a tree [55], with the addition of a diffusion constant that varies with the (fictitious) spatial coordinate, reflecting the nonlinearity of the recursion. For the parameters of the trees we treat, there is a surprisingly rapid scaling close to the critical point: the entanglement between apex and base of an infinite tree scales as ($S_2^{\rm tree} \simeq 2Z$) [Eq. 13]. We also address the scaling of $S_n$ as a function of tree size exactly at $r_c$.
Using a nonrigorous argument, we extend these formulas to give the entanglement of a subset of the spins, in a tree tensor network wavefunction for a spin chain (whose spins are the leaves of the tree). These results are not relevant to the all-to-all circuit, but are interesting in the context of tree tensor network states, which are toy models for some features of scale invariance in 1+1D, and are also useful numerical tools [56][57][58][59][60][61][62][63][64][65]. We obtain a "modified minimal cut" formula for the tree, in which the cost of cutting a bond in the tree is, loosely speaking, weighted by appropriate factors of the quantity $Z$, which is parametrically small close to $r_c$. This gives a quantitative picture of how the entanglement of $\ell$ consecutive spins in a tree tensor network state goes from the well-known logarithmic scaling, $S \sim c(r) \ln \ell$, suggested by its hierarchical structure [64][65][66], to an area law state, $S = O(\ell^0)$. We find that $c(r)$ vanishes exponentially as $r \to r_c$ and that the state is area-law even at $r_c$.
Recently, Ref. [65] studied the entanglement transition in a quantum tree state, with bond dimension 3, using a different approach. The authors conjectured that the scaling was the same as in a statistical mechanics model that shares some of the features of a replica formulation derived from the tree (the exact replica formulation was not tractable). Surprisingly, the findings in Ref. [65] are quite different from those we obtain (assuming the conjecture mentioned in the previous paragraph) in the trees studied here. For example, Ref. [65] finds that the coefficient $c$ in $S \sim c \ln \ell$ is a power law in the control variable close to the transition, and that entanglement is super-area-law at $r_c$. The reason for the different results in these two models remains to be understood.
Our conjectured continuum description for scaling in the tree has a parameter ∆ that describes the degree of disorder in the tensor network, and which determines the critical exponents. For the trees we study, whose node tensors have a distribution with a U(2) invariance property on the legs, this parameter is fixed to ∆ = 1/4 at r c . This corresponds to a "strong disorder" regime [55]. We raise the question of whether general distributions of tensors allow us to explore the phase transition at other values of ∆. If so, it is possible to obtain a range of universality classes for the tree transition, analogous to a renormalization group fixed line. However, we have not determined whether this is possible.

E. Direct simulations of quantum circuits
We perform direct simulations of the all-to-all measurement circuit and forced measurement circuit, and interpret the results in the light of the tree calculation and the replica approach described below. These simulations are computationally demanding: we are limited to system sizes N ≤ 20 for quantities involving states and to smaller sizes for the operator entanglement. Determining r c accurately (the value of which is expected to differ for measurements and forced measurements) is not possible, but we are able to confirm many of the key features of the entangled phase in Sec. II B.
We give evidence for the plateau (6) in the operator entanglement, with a nonzero information transmission per spin s(r) that is asymptotically time independent, and for a positive exponential growth coefficient a(r) > 0 for the characteristic timescale within the entangled phase.
It is convenient to define this timescale $\tau$ via the late-time convergence of two distinct, initially orthogonal, states $|\psi_1(t)\rangle$ and $|\psi_2(t)\rangle$ that are postselected to undergo the same sequence of measurement outcomes, so that they are evolved with the same $V_m$. These states remain approximately orthogonal for a long time in the entangled phase: a kind of effective unitarity of the nonlinear, nonunitary time evolution (Eq. 3) for a given $m$. (This orthogonality is related to the error-correction property of the dynamics [11,12].) The two states collapse at late times. We show that $a(r)$ is positive at small $r$ and vanishes at large $r$.
For forced measurements our expectation is that r c is given by the result of the tree calculation. Numerically, it in fact becomes unmeasurably small at a significantly smaller value of r. Our interpretation of this is that, because of exponential scaling in Eq. 13, the quantities s(r) and a(r) vanish extremely fast as r → r c . A more stringent test of the identity of the two transition points would be valuable.
We have also examined the observables discussed here in 1+1D circuits, motivated by the fact that, since they do not require us to introduce a spatial bipartition of the system, they avoid introducing a lengthscale that is smaller than the system size. We will report on this elsewhere.

F. Replicas and field theories
A key question is whether useful continuum field theories can be written for the MPT and FMPT, and also for entanglement transitions in (reasonably generic 3 ) random tensor networks. This question has not been resolved, despite progress on mapping the quantum problems to effective "classical" lattice models [13, 14, 33-35, 41, 42]. A basic issue is the need to handle disorder. The most familiar approach to this is to use the replica trick [13,14,33,35]. (In this section we use N to denote the number of replicas: this should not be confused with the number of physical spins in the previous sections.) However the complicated N -dependence of the interactions makes it unclear a priori how to coarse-grain these effective lattice models.
In Sec. VI we start by reviewing the approach of mapping circuits and tensor networks to effective lattice models for permutations. We discuss coarse-graining of such models in a heuristic way. We then suggest an alternative way of thinking about effective statistical mechanics models for circuits (motivated by a physical picture for the emergence of permutations, in terms of phase cancellation in sums over Feynman histories [33,42]). This picture connects entanglement transitions to approaches familiar from disordered magnetism, the random field Ising model, spin glasses, etc. [67].
With this motivation, we construct the simplest Lagrangians that capture the global symmetry associated with the replica formulation [33,35,42], which we denote and which pass some basic consistency tests. The limiting number of replicas N is distinct for the case of (i) the MPT and (ii) both the FMPT and the RTN [13,14,35]. For the FMPT and RTN we need to take N → 0, as in standard quenched disorder problems. For the MPT, realizations are weighted by the additional Born rule factor, which increases the number of replicas: we need to take N → 1 [13,14]. Previously, properties in the vicinity of a fine-tuned point have been used to motivate the suggestion that all of these problems may have similar universal properties, despite the differing numbers of replicas [14]. However, we find that the simplest field theory candidates (which may of course be too simple) are strikingly different in the two different cases.
The Lagrangians we propose have the schematic forms

[Eqs. (14), (15): $\mathcal{L}_X$ and $\mathcal{L}_Y$]

where $F$ is the tensor $F_{ab,cd} = \delta_{bd} + \delta_{ac}$. We have suppressed all coupling constants except the crucial one that drives the transition, denoted $\mu$ or $\nu$. Both space and time derivatives are grouped together in the derivative term: in the case of the circuit there will in general be a nonuniversal speed $v$ appearing, so that the derivative terms have the form $(\partial_t X)^2 + v^2 (\nabla X)^2$. The plus sign means that there is an emergent Euclidean, rather than Lorentzian, spacetime symmetry [6,9,15].⁴

In the Lagrangian $\mathcal{L}_X$, the field $X_{ab}$ is a real $N \times N$ matrix satisfying $\sum_a X_{ab} = 0$ and $\sum_b X_{ab} = 0$. It may be thought of (modulo a constant shift) as a coarse-grained permutation matrix. This Lagrangian is appropriate for the replica limit $N \to 1$. It has upper critical dimension $D = 6$ (this is the spacetime dimension in the case of the circuit). This is a candidate Lagrangian for describing the MPT.

At first we might assume that the same Lagrangian $\mathcal{L}_X$ for the measurement transition could be continued to the distinct limit $N \to 0$ in order to describe the random tensor network and the forced measurement transition. We argue in Sec. VI that this is not the case. Instead, the simplest candidate for the FMPT and RTN is the Lagrangian $\mathcal{L}_Y$. Here, the field $Y$ is a real $N \times N$ matrix, with $N \to 0$, that does not satisfy any constraints on its row and column sums. The upper critical dimension for this theory is the unexpectedly large value $D = 10$. See Sec. VI for further discussion.

⁴ For generic versions of the MPT and FMPT, the emergent spacetime symmetry is of course partly a conjecture. It is perhaps made more plausible by the existence of such symmetries in some simpler limiting models. The minimal cut problem [6], and also some alternative $q \to \infty$ limits [13,14], map to percolation problems which have this symmetry. Some measurement-induced critical points with a free fermion structure also map to conformally invariant models [36-38, 40, 53]. Conformal invariance in 1+1D Clifford measurement circuits has been demonstrated numerically [9,15]. There is numerical evidence that the dynamical exponent is unity for the Haar-random MPT [6]. Finally, we may use dual-unitary circuits to set up measurement circuits that have 90° rotational invariance in spacetime even microscopically (we will discuss this elsewhere).
We caution that these theories are conjectures based on symmetry considerations and certain limited consistency checks. Further investigation is required to determine whether they are in fact sufficient to describe the problems of interest. It is possible that more elaborate continuum descriptions are required, either for a particular microscopic model or in general. Indeed, the trees described in Sec. II D, which have exponential order parameter scaling close to the critical point, appear to be one case that is not captured by L Y . (Contrary to the naive guess that the high-dimensional limit of the field theory and the tree would show similar "mean field" critical scaling.) We defer an examination of the reason for this to a future work.
In Sec. VI we also present some results that are independent of the speculative field theories above. In particular we use effective domain wall pictures to obtain the scaling within the phases (mentioned above in Sec. II B).
We also briefly discuss the use of Ising toy models for the properties of the second Rényi entropy in measurement dynamics, pointing out that the formalisms of [33] or [42] allow these to be justified in certain strongly entangled regimes, rather than being regarded simply as toy models as in previous work. However, quenched disorder must be taken into account in the resulting Ising model. Additionally, the Ising picture breaks down close to the critical point (or in the disentangling phase) and also at long times.
Finally we discuss variants of the MPT, FMPT and RTN phase transitions. We point out that quite different scaling obtains for models of free fermions subjected to measurements, as a result of continuous rather than discrete replica symmetry.

III. MINIMAL CUT PROBLEM
A natural starting point for understanding the MPT is to map the quantum circuit to a classical graph on which one can study a classical "minimal cut" optimization problem [6]. In this mapping there is a phase transition at the point where the graph percolates.
We think of this classical min-cut problem as a toy model for the generic quantum transition. In the circuits we study, the cost of the minimal cut gives the exact value of the (somewhat unphysical) zeroth Rényi entropy, 5 S 0 . It also gives exact results for the other Rényi entropies in the limit of large local Hilbert space dimension (e.g., a large value of each spin), with Haar random gates. But in general, the minimal cut is only an upper bound on the entanglement entropies S n with n ≥ 1. (There can be no quantum information propagated from the initial to the final time if the associated classical graph is disconnected; in this regime, the "cost" of the minimal cut vanishes.) The true "quantum" transition in general occurs at a smaller value of r than the classical transition discussed in this section (and in general has distinct universal properties). Despite this, the classical problem conveys some useful lessons.
Viewed as a graph, the circuit is a bond percolation configuration, as described below. The frequency of projective measurements determines the fraction of broken bonds in this percolation configuration. The minimal cut is the minimal number of additional bonds that must be severed in order that two parts of the boundary of the circuit, $A$ and $\bar{A}$, no longer have any percolating path between them. This minimal cut is a unifying heuristic [34,50,64,68-70] for the entanglement of various objects, depending on how we choose $A$ and $\bar{A}$. If these are taken to be two complementary subsets of the legs of the circuit at the final time, then the minimal cut gives the entanglement $S_0$ of a subset $A$ of the spins in the final quantum state, assuming the initial state was a product state. Here we are more interested in a minimal cut separating the top boundary of the circuit from the bottom. That is, $A$ contains all the circuit's "legs" at the final time, and $\bar{A}$ all those at the initial time. This "horizontal" minimal cut is a measure of information transmitted from the initial to the final time, equal to the operator entanglement $S_0$ for the nonunitary time evolution operator $V$ (Sec. II).
In the percolating regime, this horizontal minimal cut must sever a number of bonds that is extensive in the number of spins $N$, so that $S_0 \simeq sN$. The coefficient $s$ is a "surface tension" [50] for the minimal cut, which vanishes continuously at the percolation threshold. In 1+1D this transition is conformally invariant. Many of the critical exponents, such as the correlation length exponent $\nu$, are standard percolation exponents, while others are less familiar, since the minimal cut is an additional optimization problem built on top of the percolation configuration [6,71].
In the circuit without fixed spatial structure, where any qubit can couple randomly to any other, the location of the critical point, and the basic critical exponents, can be determined exactly, as we show in this section. These exponents also apply to the finite-dimensional minimal cut problem when the spatial dimensionality $d$ is greater than or equal to 5 (Sec. III E). Interestingly, there is also reason to speculate that the exponents apply for some versions of the quantum measurement transition in high dimensions, even without the minimal cut approximation (see Sec. VI, where we discuss Landau theory for the measurement transition and entanglement transitions).

FIG. 4. (a) An example circuit. Black worldlines represent the evolution of a particular spin, with time proceeding vertically. Colored blocks indicate two-spin unitaries, and broken lines (marked with red crosses) indicate single-spin measurements. (b) The equivalent graph, with nodes representing unitaries (a node of a particular color corresponds to the unitary of the same color in (a)) and edges representing unbroken segments of worldline. Small red/gray circles denote the initial/final time for a given spin. A possible minimal cut for this graph is shown by the dashed line: removing the two indicated edges disconnects the initial and final times. (c) The classical graph arranged as a tree, with the purple node used as a seed and generation number $k$ proceeding downward. (This illustrative circuit forms a tree; in general the structure of a large circuit is only locally treelike.)
For all-to-all circuits, the classical percolation problem is defined as follows. The circuit defines a random graph, in which the nodes (vertices) correspond to unitaries and the edges are the sections of spin worldline that are not broken by measurements. In other words, an edge connects two nodes whenever (i) the two nodes correspond to successive unitaries in the time evolution of a particular spin; and (ii) that spin is not measured during the time in between the two unitaries. Figure 4(a) shows an example circuit, and Fig. 4(b) shows the corresponding graph. Each node has at most four edges connected to it, corresponding to the four legs of each unitary in Fig. 4(a). The minimal cut in the figure indicates an operator entanglement S 0 = 2 for this small circuit.
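The construction of the classical graph from a circuit can be sketched in code. The snippet below is only an illustration under stated assumptions, not the procedure used for the numerics in this paper: the event-list format (`("U", i, j)` for a two-spin unitary, `("M", i)` for a single-spin measurement) and the function name `circuit_to_graph` are hypothetical choices made for this example. Boundary nodes for the initial and final time of each spin are added so that a cut between the two time boundaries can be defined.

```python
def circuit_to_graph(n_spins, events):
    """Map a circuit to its classical graph (illustrative sketch).

    events: time-ordered list of ("U", i, j) or ("M", i)  [hypothetical format].
    Returns (nodes, edges); each edge is an unbroken worldline segment.
    """
    nodes = [("init", i) for i in range(n_spins)]
    edges = []
    # last[i] is the node at the lower end of spin i's current worldline
    # segment, or None if that segment has been broken by a measurement.
    last = {i: ("init", i) for i in range(n_spins)}
    n_unitaries = 0
    for ev in events:
        if ev[0] == "M":
            last[ev[1]] = None  # measurement breaks the worldline
        elif ev[0] == "U":
            node = ("U", n_unitaries)
            n_unitaries += 1
            nodes.append(node)
            for spin in ev[1:]:
                if last[spin] is not None:
                    edges.append((last[spin], node))
                last[spin] = node  # a new segment starts at this unitary
        else:
            raise ValueError(f"unknown event {ev!r}")
    for i in range(n_spins):
        final = ("final", i)
        nodes.append(final)
        if last[i] is not None:
            edges.append((last[i], final))
    return nodes, edges
```

For example, two unitaries on the same pair of spins with one intervening measurement, `circuit_to_graph(2, [("U", 0, 1), ("M", 0), ("U", 0, 1)])`, yield a single edge between the two unitary nodes, since only one of the two connecting worldlines is unbroken.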
We take the number of spins to be very large, $N \gg 1$, while by definition the degree (connectivity) of each node is only of order 1. In this situation, standard considerations [72] imply that the local structure of the graph is treelike on both sides of the percolation transition. Above the percolation transition closed loops do exist, but their length is of order $\ln N$.

A. Local tree structure and percolation
To relate the classical graph to a tree, imagine starting at an arbitrarily chosen "seed" unitary in the bulk of the circuit (far from the initial and final time boundaries) and tracing out its cluster: finding the nodes connected to the seed by an edge, then those connected to the seed by a path of length 2 edges, etc. In this way the cluster containing the seed may be arranged in a tree, with the seed at the top and subsequent generations of connected nodes below: see Fig. 4(c). We denote the generation number by k, with k = 0 being the seed.
The probability $p$ that a given one of a unitary's four possible edges is absent is equal to the probability that (as we travel along that segment of worldline) the spin undergoes a measurement before it is involved in another unitary. This probability is given by

[Eq. (16)]

The small circuit shown in Fig. 4 contains no loops. In general the circuit can contain loops. However, a subcluster of any finite size is guaranteed to be free of loops in the limit $N \to \infty$ (since the probability that two unitaries in generation $k$ both connect to the same unitary in generation $k + 1$ is of order $1/N$).
To understand the location of the critical point, note that the average branching number of the tree (the average number of descendants of a given node with $k > 0$) is $3 \times (1 - p)$. The percolation transition in the graph occurs when the branching number is 1, i.e. at $p_c = 2/3$ (as also noted in Ref. [12]), or

[Eq. (17)]

(In this section only, $r_c$ denotes the classical transition point, $r_c = r_c^{\rm cl}$.) When $r$ is greater than $r_c$, all trees are finite even in the limit $N, t \to \infty$, where the graph itself is infinite: starting at a seed node, the tree inevitably dies out after a finite number of generations. Therefore at $r > r_c$ all unitaries are in finite clusters; this is the non-percolating phase. When $r < r_c$, however, there is a nonzero probability $f_\infty$ that a tree continues forever, or rather until it includes a number of nodes proportional to $N$. In the percolation problem, $f_\infty$ is the order parameter, i.e. the probability that the unitary lies in the infinite cluster. The critical exponent $\beta$ for this order parameter is 1, which is the mean field value for percolation. A simple recursive treatment (App. A 1) shows that, close to $r_c$,

[Eq. (18)]

Note that the window of $r$ in which we can hope to observe critical scaling, corresponding to $0 < f_\infty \ll 1$, is rather narrow as a result of the large (nonuniversal) prefactor in Eq. 18.
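The branching-process picture can be checked numerically. The following is a minimal sketch, assuming (as in the tree picture above) that each of a node's possible edges is absent independently with probability `p`, that the seed has four possible edges, and that every later node has three possible descendants. The function names are hypothetical, and this is not necessarily the recursion used in App. A 1. The extinction probability `q` of the subtree below a non-seed node is the smallest solution of $q = [p + (1-p)q]^3$, found here by fixed-point iteration from $q = 0$.

```python
def extinction_probability(p, tol=1e-14, max_iter=1_000_000):
    """Smallest fixed point of q = (p + (1 - p) * q)**3 in [0, 1].

    p is the probability that a given edge is absent (assumed independent
    for every edge).  Iterating from q = 0 converges monotonically.
    """
    q = 0.0
    for _ in range(max_iter):
        q_new = (p + (1.0 - p) * q) ** 3
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    return q  # slow convergence is expected very close to p_c = 2/3


def survival_probability(p):
    """Probability that the tree grown from a seed (four edges) is infinite."""
    q = extinction_probability(p)
    return 1.0 - (p + (1.0 - p) * q) ** 4
```

With these assumptions `survival_probability(p)` vanishes for $p > 2/3$ and is positive for $p < 2/3$, consistent with the threshold $p_c = 2/3$ quoted above. Expanding the fixed-point equation near $p_c$ gives $1 - q \approx 9\,(p_c - p)$, so the order parameter vanishes linearly, consistent with $\beta = 1$.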

B. Effective 1D continuum theory
We now show that near the critical point the basic scaling variables for the percolation and minimal cut problems are

$$\frac{t}{N^{1/5}}, \qquad \delta r \, N^{2/5}, \tag{19}$$

where $\delta r = r - r_c$ and $t$ is, say, the temporal duration of the evolution. For example, the characteristic timescale for a large system at its critical point scales as $N^{1/5}$. In Secs. III C, III D and Apps. A 2-A 4 we will show how these variables appear in scaling forms for the minimal cut and other observables. This problem is similar to one of crossover scaling, in which a system that is effectively very high dimensional on short timescales crosses over to one that is one-dimensional on long scales. This analogy can be used to obtain the above exponents, as we discuss in Sec. III E. This approach also sheds light on the quantum problem (Sec. VI). Here, however, we solve the classical problem directly.
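As a small illustration of how these scaling variables are used in a data collapse, the helper below maps raw parameters to the pair $(t/N^{1/5}, \delta r\, N^{2/5})$. These two combinations are taken from the statements in this section that the critical timescale grows as $N^{1/5}$ and that, with $N = L^5$, the second variable is $\delta r\, L^{1/\nu}$ with $\nu = 1/2$. The function name is a hypothetical choice for this sketch.

```python
def scaling_variables(t, n_spins, delta_r):
    """Return (t / N**(1/5), delta_r * N**(2/5)) for a system of N spins."""
    return t / n_spins ** 0.2, delta_r * n_spins ** 0.4


# Two different systems that sit at the same point of a scaling function:
x1, u1 = scaling_variables(t=2.0, n_spins=32, delta_r=0.25)
x2, u2 = scaling_variables(t=4.0, n_spins=1024, delta_r=0.0625)
```

Data sets with different $N$ should fall on a common curve when plotted against these rescaled variables, up to the subleading corrections discussed in Sec. III C.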
To simplify the discussion, let us consider a percolation problem with the same basic features as the circuit, but with a simpler connectivity rule inspired by the Erdős-Rényi random graph [72]. This simplification does not change the universality class, as we show numerically in App. A 3. The random graph we consider has a layered structure, with one layer for each timestep. This graph may be contrasted with one studied in Ref. [26], which maps a measurement transition in a class of "instantaneous quantum polynomial time" circuits to the percolation transition in an Erdős-Rényi graph without a time dimension.
We discretize the time $t$ in integer steps. At each $t$ we have $N$ nodes, labelled $(i, t)$ with $i = 1, \ldots, N$. We allow edges only between sites in adjacent $t$ layers, each edge being present with probability $b/2N$, independently of the others. This scaling with $N$ ensures that the average degree of a site, $b$, is $O(1)$, as in the circuit: each site has $2N$ potential neighbors in the two adjacent layers. It is easy to see by thinking about the local tree structure that the phase transition is at $b_c = 1$. As in the circuit, connectivity is local in time, but there is no notion of spatial structure within a layer at a fixed time.
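A direct Monte Carlo sketch of this layered random graph is given below. It is an illustration only, with hypothetical function names, and is not the code used for App. A 3. Each of the $N^2$ possible edges between adjacent layers is drawn independently with probability $b/2N$, and a breadth-first search over the full undirected graph (paths may wander back and forth between layers) tests whether the first and last layers are connected.

```python
import random
from collections import deque


def layered_er_percolates(n, n_layers, b, rng):
    """True if some node in layer 0 is connected to some node in the last layer."""
    p_edge = min(1.0, b / (2.0 * n))
    adj = [[] for _ in range(n * n_layers)]  # node (i, t) has index t * n + i
    for t in range(n_layers - 1):
        for i in range(n):
            for j in range(n):
                if rng.random() < p_edge:
                    u, v = t * n + i, (t + 1) * n + j
                    adj[u].append(v)
                    adj[v].append(u)
    # Breadth-first search started from every node of layer 0 at once.
    seen = [False] * (n * n_layers)
    queue = deque(range(n))
    for u in range(n):
        seen[u] = True
    first_of_last_layer = (n_layers - 1) * n
    while queue:
        u = queue.popleft()
        if u >= first_of_last_layer:
            return True
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                queue.append(v)
    return False


def percolation_probability(n, n_layers, b, n_samples, seed=0):
    """Monte Carlo estimate of the percolation probability between time boundaries."""
    rng = random.Random(seed)
    hits = sum(layered_er_percolates(n, n_layers, b, rng) for _ in range(n_samples))
    return hits / n_samples
```

The cost per sample is $O(N^2 t)$ because every potential edge is drawn explicitly. For large $N$ one would instead sample the number of edges per layer pair and place them at random, a refinement omitted here for clarity.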
Classical percolation can be mapped to the Q-state Potts model in the limit Q → 1 [73][74][75][76]. For our problem, the fact that each site couples to all the sites on the adjacent layers means that the Potts partition function simplifies after a Hubbard Stratonovich transformation with a field Φ(t) that depends only on time. This transformation is shown in detail in App. A 2. The field Φ may be taken to be a Q × Q traceless diagonal matrix, on which Potts symmetry acts by permuting the diagonal components.
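For orientation, the standard Fortuin-Kasteleyn expansion behind this mapping can be written as follows. The notation here ($K$ for the Potts coupling, $E$ for the edge set, $C(A)$ for the number of connected clusters formed by the edge subset $A$) is introduced only for this sketch and is not taken from App. A 2.

```latex
\begin{align}
Z_{\rm Potts}
  &= \sum_{\{\sigma\}} \prod_{\langle ij \rangle \in E}
     e^{K \delta_{\sigma_i \sigma_j}}
   = \sum_{\{\sigma\}} \prod_{\langle ij \rangle \in E}
     \left( 1 + v\, \delta_{\sigma_i \sigma_j} \right),
  \qquad v = e^{K} - 1, \\
  &= \sum_{A \subseteq E} v^{|A|}\, Q^{C(A)}
   = (1+v)^{|E|} \sum_{A \subseteq E}
     p^{|A|} (1-p)^{|E| - |A|}\, Q^{C(A)},
  \qquad p = \frac{v}{1+v}.
\end{align}
```

At $Q \to 1$ the weight of each edge configuration $A$ reduces to the bond percolation weight $p^{|A|}(1-p)^{|E|-|A|}$, and derivatives with respect to $Q$ at $Q = 1$ generate cluster statistics such as the mean number of clusters.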
It is possible to take the continuum limit in a controlled way, to give an effective one-dimensional field theory. Close to the critical point, such that $b - 1 = \delta b \ll 1$, the partition function for this field theory is

[Eq. (20)]

with

[Eq. (21)]

Modulo the values of the order 1 constants, we expect the same field theory to apply to the percolation model arising from the circuit. The factor of $1/\sqrt{N}$ in Eq. 21 allows a long timescale and nontrivial scaling forms to emerge at the critical point $\delta b = 0$, despite the fact that the effective field theory is one-dimensional. One-dimensionality implies that for any fixed $N$, correlations decay exponentially at sufficiently large $t$, but the timescale diverges with $N$.
The critical exponents for the minimal cut problem in the all-to-all circuit follow from the observation that the change of variables

[Eq. (22)]

eliminates $N$ from the action. Scaling forms for correlation functions follow from this fact together with the corresponding scalings for operators. We discuss some examples in the following subsection and in App. A 4. We may also obtain these exponents from a crossover scaling argument if we assume that they are the same as those in a system which does have spatial structure, but with a very high spatial dimensionality $d$. This crossover is described in Sec. III E.
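The power counting behind this statement can be illustrated with a schematic action. The form below is an assumption made for this sketch: it keeps only a kinetic term, a mass term proportional to $\delta b$, and a cubic term carrying the factor $1/\sqrt{N}$ mentioned above, with all order 1 constants set to one. It is not a transcription of Eq. 21, and the rescaled variables $x$, $u$, $\Psi$ are names introduced here.

```latex
\begin{align}
S &= \int dt \; {\rm Tr} \left[ (\partial_t \Phi)^2 + \delta b \, \Phi^2
     + \frac{1}{\sqrt{N}} \, \Phi^3 \right], \\
t &= N^{1/5} x, \qquad \delta b = N^{-2/5} u, \qquad \Phi = N^{1/10} \Psi, \\
dt \, (\partial_t \Phi)^2 &= N^{1/5 - 2/5 + 1/5} \, dx \, (\partial_x \Psi)^2
   = dx \, (\partial_x \Psi)^2, \\
dt \, \delta b \, \Phi^2 &= N^{1/5 - 2/5 + 1/5} \, dx \, u \, \Psi^2
   = dx \, u \, \Psi^2, \\
dt \, N^{-1/2} \Phi^3 &= N^{1/5 - 1/2 + 3/10} \, dx \, \Psi^3 = dx \, \Psi^3 .
\end{align}
```

Under this assumption every term is independent of $N$ after rescaling, so observables depend on $t$ and $\delta b$ only through $t/N^{1/5}$ and $\delta b\, N^{2/5}$, in agreement with the timescale $N^{1/5}$ and the exponent $\nu = 1/2$ discussed in this section.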
In fact, the exponents in Eq. 19 apply for any d > 5 (in an appropriate regime of timescales) with logarithmic corrections in d = 5. This is because d = 5 gives a total spacetime dimension of 6, which is the upper critical dimension for percolation. This fact allows an even simpler mnemonic for the above exponents. Suppose for a moment that we are considering a graph with a regular lattice in spatial dimension d = 5, with N = L d = L 5 , where L is the system size. d = 5 is the lowest dimension in which mean-field exponents apply (up to logarithms). In this picture, the first scaling variable above is simply t/L, corresponding to the dynamical exponent z = 1 in the 5-dimensional theory, and the second scaling variable is u = δrL 1/ν , with the mean field correlation length exponent ν = 1/2. In d > 5 we must also consider the dangerous irrelevance of the interaction term in the field theory [77] (which means that the relevant timescale is no longer t/L), but this term can be treated using a standard coarse-graining argument (Sec. III E).

C. The percolation probability
Before describing the minimal cut itself (Sec. III D), we first consider an observable that is simpler to study both analytically and numerically, namely the probability $P_{\rm perc}$ of percolation between initial and final times in the classical graph. The value of $1 - P_{\rm perc}$ is equivalent to the probability that the operator entanglement is exactly zero, since non-percolation of the classical graph implies that the initial and final times are causally disconnected. $P_{\rm perc}$ has scaling dimension zero, i.e. it has no power-law prefactor in $N$, so it is useful for numerical tests of the scaling defined by Eq. 19. In App. A 4 we present numerical results for two observables with nontrivial scaling dimension: namely, the probability of two nodes on either the same or opposite time boundaries being connected to the same cluster. We show that these observables are also described by the scaling variables in Eq. 19.

In the Potts language, $P_{\rm perc}$ is expressed in terms of the free energy cost of twisted boundary conditions [78]. (In the 1D field theory this free energy involves boundary magnetic fields that are parametrically large in $N$; this is discussed in App. A 2.) We obtain the scaling form

$$P_{\rm perc} = F\!\left( \frac{t}{N^{1/5}}, \; \delta r \, N^{2/5} \right). \tag{23}$$

(Here $t$ denotes the full temporal duration of the dynamics.) First consider the critical point $r = r_c$, for which

$$P_{\rm perc} = F\!\left( \frac{t}{N^{1/5}}, \; 0 \right). \tag{24}$$

In principle we should obtain a scaling collapse simply by plotting $P_{\rm perc}$ as a function of the scaling argument. Practically speaking, however, the characteristic timescale $N^{1/5}$ is modest for the values of $N$ we can access numerically, and it appears to be necessary to include a subleading correction. This correction is of a type that is generically present for non-periodic boundary conditions, and corresponds to replacing the scaling variable with $(t + c_0)/N^{1/5}$, for a nonuniversal $O(1)$ constant $c_0$. Figure 5 (inset) shows raw data for the percolation probability $P_{\rm perc}$ of the classical graph (for the all-to-all circuit) as a function of time and $N$. As can be seen in the main panel, this data collapses onto a single curve when $P_{\rm perc}$ is plotted against $(t + c_0)/N^{1/5}$, where $c_0 \approx 1.3$.
At any fixed values of $r$ and $N$, the probability $P_{\rm perc}$ decays exponentially with time $t$ at large enough values of $t$. One can extract the associated decay time $\tau(r, N)$, which according to Eq. 23 has the scaling form

$$\tau(r, N) = N^{1/5} \, G\!\left( \delta r \, N^{2/5} \right). \tag{25}$$

This scaling is confirmed in Fig. 6. Figure 6 constitutes a check of off-critical scaling close to $r_c$ as well as the scaling at $r_c$ that is shown in Fig. 5.
The decay time $\tau(r, N)$ of the percolation probability constitutes one way of defining a characteristic timescale over which information is able to propagate between the initial and final times. A key feature of the classical graph, which carries over to the quantum case, is that the timescale $\tau$ grows very rapidly with $N$ within the entangling phase. At any fixed $r < r_c$, we can argue that as $N \to \infty$ the timescale $\tau$ grows as

$$\tau \sim \exp\left[ a(r) \, N \right], \tag{26}$$

neglecting power-law prefactors. In the present classical problem, this exponential growth can be understood in terms of rare events that disconnect the cluster. Close to the transition we must have

$$a(r) \sim |\delta r|^{5/2} \tag{27}$$

in order to match the scaling form.⁶ We expect that $P_{\rm perc}$ is close to 1 for $t \ll \tau$. This exponentially long timescale can also be seen directly from the field theory in Eq. 21, in terms of "instantons" in the field theory (domain walls in time); see App. A 2.
As mentioned in the previous subsection, one can model the minimal cut problem using the simpler setup of a sequence of layered Erdős-Rényi random graphs. Within this model one can calculate the percolation probability P perc and characteristic decay time τ . In App. A 3 we show that this layered Erdős-Rényi model gives the same scaling behavior as in Figs. 5 and 6.
Percolation two-point functions give further information on the connectivity of the circuit. These are analyzed in App. A 4.

D. Scaling for the minimal cut
Because of the lack of spatial structure in the all-to-all model, it is natural to focus on the transmission of information between the initial and final times. One measure of this transmission is the operator entanglement of the linear, but nonunitary, operator V that defines the time evolution for a particular sequence of measurement outcomes (Sec. II).
In the minimal cut picture, the operator entanglement between initial and final times is the cost of the minimal cut through the circuit that separates the initial and final times [as illustrated in Fig. 4(b)]. We refer to this cost as S 0 (the Hartley entropy), although in some cases (including the limit of infinite local Hilbert space dimension, mentioned above) it is equal to the other Rényi entropies as well.
The behavior of $S_0$ is most interesting within the entangled phase, so let us consider some fixed $r < r_c$. As illustrated in the previous subsection, in this phase there is an exponentially large (in $N$) timescale over which the percolation probability is close to 1. Correspondingly, there is a parametrically large time range, corresponding to⁷ $1 \ll \ln t \ll N$, over which $S_0/N$ is approximately constant. The crudest picture for the subleading corrections gives⁸

$$S_0 \simeq sN - \mathrm{const.} \times \sqrt{N \ln t}. \tag{28}$$

We refer to the range of times where subleading terms are negligible as the "plateau" in the entanglement. Over this large time window, the horizontal minimal cut has a well-defined cost per spin, $s$. This cost per spin $s$ is the infinite-dimensional version of the line tension for the minimal cut in the 1+1D case or the surface tension in the 2+1D case [6]. These quantities all vanish at the critical point. In the plateau regime, the information per spin transmitted by the circuit is nonzero, up to an exponentially large time.
The inset of Fig. 7 shows this cost per spin in the entangled phase, as measured by a numerical simulation using the Ford-Fulkerson method [79]. The details of the extrapolation to large $N$ are described below and in App. A 5.
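For readers who wish to reproduce this kind of measurement on small graphs, the sketch below computes a minimal cut as a maximum flow using the Edmonds-Karp algorithm, i.e. the breadth-first-search variant of the Ford-Fulkerson method. It is a generic textbook implementation written for this illustration, not the code used for Fig. 7. The function name and the representation of the graph as a list of undirected unit-cost edges are assumptions, and the source and sink sets are assumed to be disjoint.

```python
from collections import defaultdict, deque


def min_cut(edges, sources, sinks):
    """Minimum number of edges whose removal disconnects sources from sinks.

    edges: iterable of undirected (u, v) pairs, each of unit cost; repeated
    pairs are treated as parallel edges.
    """
    inf = float("inf")
    cap = defaultdict(int)   # residual capacity of each directed arc
    adj = defaultdict(set)

    def add_arc(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)

    for u, v in edges:       # an undirected edge is a pair of opposite arcs
        add_arc(u, v, 1)
        add_arc(v, u, 1)
    super_source, super_sink = "_source", "_sink"
    for s in sources:
        add_arc(super_source, s, inf)
    for t in sinks:
        add_arc(t, super_sink, inf)

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {super_source: None}
        queue = deque([super_source])
        while queue and super_sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if super_sink not in parent:
            return flow      # max flow equals the min cut
        bottleneck, v = inf, super_sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[(parent[v], v)])
            v = parent[v]
        v = super_sink
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
            v = u
        flow += bottleneck
```

Applied to a graph with initial-time boundary nodes as `sources` and final-time boundary nodes as `sinks`, the returned value is the cost of the horizontal minimal cut, i.e. $S_0$ in the minimal cut picture.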
Let us relate this minimal cut to the scaling theory close to the critical point. We expect the scaling form

$$S_0 = H\!\left( \frac{t}{N^{1/5}}, \; \delta r \, N^{2/5} \right) \tag{29}$$

(see Sec. III B). If we assume that within the entangled phase there is a time regime during which $S_0$ is extensive in $N$ and time-independent (i.e. independent of the first scaling variable above), then we obtain in this regime

$$S_0 \simeq s(r) \, N, \tag{30}$$

with the entropy per spin $s(r)$ scaling as

$$s(r) \sim |\delta r|^{5/2}. \tag{31}$$

The main panel of Fig. 7 shows $s(r)$ close to the critical point on a double logarithmic scale. Though we cannot extract a clear power law from the data, it seems roughly consistent with the prediction (31).

⁸ For a naive picture of the scaling of $S_0$ with $N$, consider a minimal cut that has zero temporal width within the entangled phase. There are $O(t)$ choices for the time at which to place the cut. Each choice has a random cost, which for the present illustration we assume to be Gaussian with variance $N$, arising from a sum of $O(N)$ random contributions. Taking the minimum gives the formula above. The second term is subleading so long as $\ln t \ll N$.
In order to numerically obtain the value of s(r) for the plots above, we measure S 0 as a function of the time t and the system size N from simulations. For a fixed t, we find that S 0 (t, N )/N has a linear dependence on 1/ √ N at large N (in line with the simple picture in Eq. 28). This dependence allows us to estimate a value of S 0 (r)/N in the limit of N → ∞ by extrapolating the linear relationship to 1/ √ N = 0. Further details of this extrapolation procedure are presented in Appendix A 5.
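The origin of the $1/\sqrt{N}$ dependence in this simple picture can be made explicit with a standard extreme-value estimate. The sketch below assumes that the costs of the $n \sim t$ candidate cuts are independent Gaussian variables with mean $\bar{s} N$ and variance $\sigma_0^2 N$. The symbols $\bar{s}$ and $\sigma_0$ are introduced here for illustration, and independence is an idealization.

```latex
\begin{align}
\mathbb{E}\left[ \min_{1 \le i \le n} X_i \right]
  &\simeq \mu - \sigma \sqrt{2 \ln n}
  \qquad (n \gg 1), \qquad
  X_i \sim \mathcal{N}(\mu, \sigma^2) \ \text{i.i.d.}, \\
\mu = \bar{s} N, \quad \sigma^2 = \sigma_0^2 N, \quad n \sim t
  \quad &\Longrightarrow \quad
  \frac{S_0}{N} \simeq \bar{s}
  - \sigma_0 \sqrt{2 \ln t} \; \frac{1}{\sqrt{N}} .
\end{align}
```

At fixed $t$ the correction is therefore linear in $1/\sqrt{N}$, and the intercept at $1/\sqrt{N} = 0$ gives the cost per spin. The correction is small compared with the leading term when $\ln t \ll N$.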
The scaling exponents that we found in Secs. III B and III C also apply to the classical problem in a system with a regular spatial lattice (and unitaries applied only between nearest-neighbors) in a large enough number of spatial dimensions d, as we now discuss. The total spacetime dimension, d + 1, should be greater than 6, which is the upper critical dimension for percolation (in d = 5 we will have the same exponents with additional logarithms).
We start with the standard Potts representation of percolation [73-76] in $d + 1$ dimensions. Suppressing all $O(1)$ constants, as well as a nonuniversal velocity scale, a continuum action is

[Eq. (32)]

Here $\phi$ is a traceless diagonal $Q \times Q$ matrix, as in Sec. III B. Our system is of extent $L$ in each of the spatial dimensions, and of extent $t \gg L$ in the time direction. We take the UV cutoff ("lattice spacing") to be 1.
We coarse-grain the system by a factor of order $L$, so that the spatial system size becomes comparable with the UV cutoff, and we have an effective 1D theory as far as correlations on scales $\gg L$ are concerned. Since the cubic coupling is irrelevant, with RG eigenvalue $y_3 = -(d - 5)/2$, it decreases during the flow, leading to (again we suppress order 1 constants):

[Eq. (33)]

Here $s$, the coarse-grained time coordinate, is equal to $t/L$, and $\phi' \sim L^{(d-1)/2} \phi$ from the scaling dimension of the field in $d + 1$ dimensions. If we now write the action in terms of $t$ and $\Phi \equiv L^{1/2} \phi'$, we recover the form of the action in Eq. 21 with $N = L^d$.
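The last step can be checked by explicit power counting. The coarse-grained action written below is a schematic form assumed for this sketch: the mass term is rescaled with the mean-field eigenvalue $1/\nu = 2$, the cubic coupling with $y_3 = -(d-5)/2$, and all order 1 constants are set to one. It is not a transcription of the paper's equation.

```latex
\begin{align}
S' &= \int ds \; {\rm Tr} \left[ (\partial_s \phi')^2
      + L^{2} \, \delta r \, \phi'^2
      + L^{-(d-5)/2} \, \phi'^3 \right],
   \qquad s = \frac{t}{L}, \quad \phi' = L^{-1/2} \Phi, \\
ds \, (\partial_s \phi')^2 &= \frac{dt}{L} \, L^{2} L^{-1} (\partial_t \Phi)^2
   = dt \, (\partial_t \Phi)^2, \\
ds \, L^{2} \delta r \, \phi'^2 &= \frac{dt}{L} \, L^{2} L^{-1} \, \delta r \, \Phi^2
   = dt \, \delta r \, \Phi^2, \\
ds \, L^{-(d-5)/2} \phi'^3 &= \frac{dt}{L} \, L^{-(d-5)/2} L^{-3/2} \, \Phi^3
   = dt \, L^{-d/2} \, \Phi^3 = dt \, \frac{1}{\sqrt{N}} \, \Phi^3 .
\end{align}
```

Under these assumptions the only surviving dependence on system size is the factor $L^{-d/2} = N^{-1/2}$ multiplying the cubic term, which matches the factor of $1/\sqrt{N}$ that the text attributes to Eq. 21.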
Because of the dangerous irrelevance of $\phi^3$ [77], a finite-dimensional model with $d > 5$ has two distinct large timescales,

$$t_1 \sim L, \qquad t_2 \sim L^{d/5}. \tag{34}$$

The shorter timescale (which is compressed to order 1 in the all-to-all model) marks the crossover between $(d + 1)$-dimensional and 1-dimensional scaling for correlation functions. The longer timescale is the one of more interest to us, and indicates the time at which the percolation probability starts to vary away from unity. This longer time becomes the characteristic critical timescale in the all-to-all model. The scaling forms that we have already discussed carry over to the present case ($5 < d < \infty$) with $N \to L^d$.

F. Lessons for the full quantum problem
So far we have discussed the (classical) minimal cut problem in all-to-all and high-dimensional circuits. A priori, one can expect the universal properties of the generic measurement transition to be different from those for the minimal cut transition: the minimal cut is only an exact representation of the entanglement in certain special cases (as described at the beginning of Sec. III). Nevertheless, as in 1+1D, the solution of the minimal cut problem provides more general lessons.
First, there are qualitative features that carry over to the generic problem. The most basic feature is the existence of a transition between a phase in which the operator entanglement $S(t)$ (the information propagated from the initial to the final time) decays quickly with time, and a phase in which an extensive value of entanglement, $S(t) \sim sN$, persists over a time that grows exponentially with the number N of spins. In Secs. V and VII we demonstrate that these features carry over to the operator entanglement (as measured by the von Neumann or Rényi entropies $S_{n \geq 1}$), and related observables, in spin-1/2 circuits with measurements or forced measurements. Another generic feature is that close to the critical point, the scaling of the exponential timescale is tied to that of the plateau entanglement: $\ln \tau(r) \sim N s(r)$ (Sec. VII).
The minimal cut model also illustrates a possible relationship between the all-to-all case and the case of a high-dimensional regular lattice. In the classical problem, the exponents of the all-to-all model are those of finite but high dimensions, once we take account of the dangerous irrelevance of interactions in high dimensions, which leads to a critical timescale $L^{\mathrm{const.}}$ that is parametrically larger than the linear system size L (a timescale $\tau \sim L$ is what one would naively expect from z = 1 scaling). In Sec. VI we discuss similar crossovers in field theories for generic quantum models. However, we caution that our results in Sec. IV suggest more complex possibilities in the all-to-all systems.
Finally, we saw that in the classical problem, the percolation order parameter and the value of r c could be obtained exactly by studying a simpler problem on a tree. In the next section, we propose that exact results for the full quantum version of the FMPT can also be obtained by studying trees: not only their classical connectivity, as here, but their "quantum" connectivity as defined by entanglement measures for tree tensor networks.

A. Motivation for studying quantum trees
Locally the all-to-all circuit has the structure of a tree (Sec. III A). Viewing the circuit as a graph whose nodes are unitaries and whose edges are segments of spin worldline, the size of the smallest loops diverges when N → ∞. This is true for all values of the measurement or projection rate, including deep in the entangled phase. We propose that this allows some exact results for the phase transition in the circuit, in certain cases (the FMPT), by studying the entanglement transition in a tree tensor network. As a by-product, we give exact results for general tree tensor networks. Fig. 8 (Left) is a schematic of the first k = 3 generations of the tree that is connected to one end of a link somewhere in the bulk of the circuit. For later convenience we have used a slightly different definition of the tree to that in Sec. III A. Previously we "pruned off" all the branches below a projection operator, while in Fig. 8 (Left) we leave them in place, so that the number of descendants after k generations (the number of links at the base of the tree) is always $3^k$. Each four-coordinated node in this figure, such as the one denoted t, includes a unitary, together possibly with projectors on its legs; we describe this below.
This tree is a tensor network. It has one free tensor index at the top and $3^k$ free indices at the bottom, and tensors t in the interior (built from a unitary and projectors). A basic way to characterize such a tensor network is via the amount of quantum information shared between apex and base. We can quantify this by the entanglement entropy between apex and base (Sec. IV C below). This language suggests analogous, but distinct, criteria for the classical and quantum transitions. The classical percolation transition at $r_c^{\mathrm{cl}}$ (Sec. III above) has a simple interpretation in terms of the tree tensor network. For an asymptotically large tree, $r_c^{\mathrm{cl}}$ is the projection rate beyond which the apex and the base are guaranteed to be strictly disconnected by projectors. That is, once we go beyond the classical transition, the quantum information shared between apex and base vanishes for simple geometric reasons.$^{9}$ This suggests that we can also diagnose the quantum transition in the circuit, occurring at a value $r_c^{\mathrm{qm}}$ (we will see below that $r_c^{\mathrm{qm}} < r_c^{\mathrm{cl}}$), using the properties of the tree. We will show that the tree has a transition at a critical value $r_c$, which we conjecture is also the location of the critical point for the circuit ($r_c = r_c^{\mathrm{qm}}$). For $r > r_c$ the amount of information shared between apex and base decreases exponentially with the number of tree generations, even though the apex and the base may not be disconnected in the trivial geometrical sense. For $r < r_c$, the von Neumann entanglement entropy between apex and base instead remains positive: $\lim_{k\to\infty} S_1 > 0$.
Motivated by this connection between the circuit and trees, in this Section we derive some universal results for entanglement transitions of tree tensor networks. We will argue that the tree structure allows us to find the exact location of the critical point for the simplest version of the all-to-all circuit model. In the language of Sec. II, this is the FMPT rather than the MPT. We explain in Sec. IV B immediately below why it is necessary for us to restrict to the FMPT in this section.
Tree tensor networks are also interesting quite apart from the connection to the all-to-all circuit [56][57][58][59][60][61][62][63][64][65]. They are instructive toy models for 1D wavefunctions with a scale-invariant entanglement structure [63,64], and they also allow efficient numerical tensor contraction algorithms [56]. Many of the results of the following subsections apply to more general disordered tree tensor networks that are unrelated to the circuit (see the discussion in Sec. IV I).
We obtain specific universal results for a broad class of trees that includes those arising in the FMPT circuit.
These trees have bond dimension 2, and the probability distribution of the local tensors has a simple invariance property. We also discuss, speculatively, what happens for trees with more general disorder distributions. Our conjectured continuum theory allows, a priori, for the entanglement transition to be in distinct universality classes, a phenomenon analogous to a line of fixed points (there is an overview in Sec. IV D). Strikingly, for the class of trees that we study here, the transition is constrained to lie on a specific point on this line. It remains to be seen whether other points on the line can be obtained by varying the model.
Heuristically, these different possibilities for the tree transition can be related to different possibilities for the disentangled phase close to the transition. In the disentangled phase the entanglement between apex and base is exponentially small in k. But we can distinguish, in principle, between a "strong disorder" regime where this small amount of entanglement is (loosely speaking) dominated by a single path from apex to base, and a "weak disorder regime" where exponentially many paths through the tree contribute. For the tree tensor networks we study here, we show that the former (strong disorder) case applies. The possibility of these two regimes is due to the existence of a glass transition in the classical problem of a directed polymer on a tree [55]. We will rely heavily on the methods developed in Ref. [55] for the directed polymer problem, which relate a linear recursion relation for the polymer's partition function to a travelling wave equation.
B. Structure of tree tensor network

Generalities
The trees we consider have branching number three and bond dimension 2 for each bond (these are not essential restrictions). The four-index tensor $t^a_{bcd}$ at a given node has bond index a = 1, 2 for the upper bond and b, c, d for the lower bonds. Below we describe the structure of t for the circuits we consider. We note that they fall within a special class of tree tensor networks with a simplifying feature, for which we will be able to make strong statements.
Let us first consider trees such as that in Fig. 8 (Left) in general terms, without assuming that they arise from a circuit problem.
First, our analytical treatment will assume that the individual random tensors $t^a_{bcd}$ for the nodes are statistically uncorrelated. This is important as it allows a simple recursive equation for the entanglement between top and bottom. We will also take them to be identically distributed.
Second, for most of this section we will assume that the probability distribution of the local tensors $t^a_{bcd}$ has a simple invariance property. Namely, the distribution is invariant under multiplying an arbitrary U(2) matrix u on any index: for example under $$t^a_{bcd} \to \sum_{b'} u_{bb'}\, t^a_{b'cd}.$$ This feature simplifies the recursive equation: it means we can write a recursion for singular values alone, without having to keep track of singular vectors. In fact we only need a weaker condition: below the invariance property will hold for only the lower indices of $t^a_{bcd}$, which is sufficient.
The two assumptions above will be satisfied naturally for the circuit ensembles we consider, for example those built from Haar-random two-site gates. They turn out to lead to surprisingly strong constraints on the structure of the recursion.
Towards the end of our discussion of trees (Sec. IV I) we speculate about what happens when we relax the second condition.

Application to FMPT in circuit
In applying our results on trees to the transition in a circuit, the first assumption above (on the statistical independence of the node tensors) restricts us to the FMPT rather than the MPT.
Recall that for the MPT the measurement outcomes in the circuit are determined by Born's rule. That means they have nontrivial statistics that depend on the random unitaries, violating the first assumption above. But in the FMPT the local projection operators are fixed independently of the choice of unitaries, not with Born's rule. This means all the nodes of the tree are statistically independent, allowing a recursive statistical treatment. The nodes are described explicitly below. For the ensembles of two-site unitaries we study, it does not in fact matter how the directions of the local projections are fixed, so long as this is done independently of the realization of unitaries. For definiteness we take all the local projectors to be onto the spin-up state.
To complete the specification of the circuit model, we just need to fix the distribution from which each two-site unitary U is drawn. (As in Sec. III, the rate at which projection operators are applied is r.)

Choice of ensemble of unitaries
The simplest choice is to take each U independently Haar-random in U(4), i.e. drawn from the circular unitary ensemble.
For numerics we found it useful also to study a second ensemble of more weakly entangling gates: this increased the separation in r between the quantum transition of interest in this section and the classical transition (at r cl c = 0.8) discussed in Sec. III.
Unitaries in the second ensemble, referred to below as the "∆t ensemble", are of the form (38) where V 1 , V 2 , W 1 and W 2 are Haar-random one-site unitaries, and U fixed is a non-random, fixed unitary:

Node tensor in tree
Recall from Sec. III that we can "grow" the tree by starting at some seed location in the circuit and following links (segments of spin worldline) to form a cluster of unitaries at greater and greater distance from the seed. In fact, if we start on a link, we can think of it as a seed for two trees, one attached to each end of the link. It suffices to consider the properties of one of these trees separately. Truncating the tree at k generations gives a tensor network with a single bond at its apex and 3 k bonds at the base (we follow the convention in Sec. IV A where all branches are kept, even if they contain projections).
First consider a tree with no projections, where each node is a unitary. Now, when we include projections, each link of the tree has a probability p to contain a projection. 12 If a projection is present, we choose to incorporate it into the node below the link. The node tensor is therefore: or in components (we write the row index of U as a superscript, and the column index as a subscript; both are multi-indices, since the unitary acts on two spins): The matrix Q, shown as a circle in the picture, is either the identity, or the projector onto up, with probabilities 1 − p and p for each of the options. Recall that, in terms of the measurement rate (Eq. 16), In the case where the projector is present, we could simply prune off all the branches below it, but it is simpler to treat the geometry of the tree as fixed. Note that the distribution of Eq. 41 is invariant under multiplication of U(2) matrices on any of the lower indices, as required in Sec. IV B 1.
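As an illustration of the construction above, the following minimal sketch samples a node tensor for the Haar ensemble. It is not taken from the original text: the index convention $t^a_{bcd} = \sum_{a'} Q^{a}_{a'} U^{a'b}_{cd}$ (projector on the upper leg, index 0 meaning spin up) is an assumption about how the row and column multi-indices of U are assigned to the legs, and the names `haar_unitary` and `node_tensor` are hypothetical.

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-random n x n unitary from the QR decomposition of a complex Gaussian matrix."""
    g = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(g)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # fix the column phases so that the measure is Haar

def node_tensor(p, rng):
    """Sample t[a, b, c, d]: upper bond a, lower bonds b, c, d (assumed convention)."""
    u = haar_unitary(4, rng).reshape(2, 2, 2, 2)  # u[a, b, c, d] = U[(a b), (c d)]
    if rng.random() < p:
        q = np.array([[1.0, 0.0], [0.0, 0.0]])    # projector onto spin up, probability p
    else:
        q = np.eye(2)                             # no projection, probability 1 - p
    return np.einsum("ax,xbcd->abcd", q, u)

rng = np.random.default_rng(0)
t = node_tensor(0.5, rng)
```

Because Haar measure is invariant under multiplication by single-site unitaries, the distribution sampled here is unchanged by U(2) rotations on the lower indices b, c, d, which is the property used in the text.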

C. Entanglement between apex and base
We will characterize the phase that the tree is in by the amount of quantum information shared between its apex and its base. Depending on the phase, this can either be exponentially small in the number k of generations of the tree, or it can be order 1 even for asymptotically large trees.
We can always think of such a tree tensor network as a wavefunction for a single spin at the apex and multiple spins at the base. The information shared between apex and base is then quantified by the entanglement entropy between top and base, or more formally, by the singular value decomposition when we partition the tensor network between the top and the base: see Fig. 8 (Right). 13 Since the bond at the apex has a bond dimension of 2, there are only 2 singular values. After normalizing the tree, their squares sum to one, so we are in fact characterizing the tree by just a single number. We will take this to be the square of the smaller singular value, and will denote it by Z k for a tree with k generations of nodes: The entanglement mentioned above is quantified by the Rényi entropies, which at small Z k are approximately In the random tensor network Z k is of course random. Its distribution can be obtained recursively, using the fact that a larger tree can be built up by combining subtrees.
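The relation between $Z_k$ and the Rényi entropies can be checked numerically. The sketch below is an illustration rather than part of the original analysis: it assumes natural logarithms and uses only the definition $S_n = \frac{1}{1-n} \ln[Z^n + (1-Z)^n]$ for Schmidt weights $\{1-Z, Z\}$. The small-Z forms in `small_z_approx` follow from expanding that definition and are not copied from the source.

```python
import math

def renyi_entropy(z, n):
    """Renyi entropy S_n (natural log) for Schmidt weights {1 - z, z}, with 0 <= z <= 1/2."""
    if z == 0.0:
        return 0.0
    if n == 1:  # von Neumann limit
        return -z * math.log(z) - (1 - z) * math.log(1 - z)
    return math.log(z**n + (1 - z)**n) / (1 - n)

def small_z_approx(z, n):
    """Leading behaviour at small z: z (ln(1/z) + 1) for n = 1, and n z / (n - 1) for n > 1."""
    if n == 1:
        return z * (math.log(1 / z) + 1)
    return n * z / (n - 1)

for n in (1, 2, 3):
    print(n, renyi_entropy(1e-4, n), small_z_approx(1e-4, n))
```

For n > 1 the entropy is linear in Z at small Z, while the von Neumann entropy carries an extra logarithm.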

D. Overview: classes of quantum tree
Let us summarize our basic conclusions for $Z_k$ before getting into calculations. Depending on the location in the phase diagram, the random variable $Z_k$ may have a broad distribution, and it will be vital to define its typical value using the average of $\ln Z_k$: $$\ln Z_k^{\mathrm{typ}} \equiv \langle \ln Z_k \rangle.$$ We must condition on $Z_k$ not being strictly zero in order to define the typical value.$^{14}$ For the quantum circuit with two-site unitaries and projections, the tree undergoes an entanglement transition at a critical value $r_c$. The value of $r_c$ depends on the ensemble of unitaries, but a basic point is that it is strictly below the classical transition point for any ensemble satisfying our assumptions: $$r_c < r_c^{\mathrm{cl}}.$$ We compute the value of $r_c$ analytically for the Haar circuit: the critical point, at $r_c \simeq 0.749004$, lies not that far from the classical transition at $r_c^{\mathrm{cl}} = 0.8$. For the ∆t = 0.3 ensemble the spacing is increased. In the disentangling phase the information shared between apex and base tends to zero exponentially with the size of the tree: $$Z_k^{\mathrm{typ}} \sim e^{-|c_r| k},$$ with the "speed" $|c_r|$ vanishing linearly as $r \to r_c$. In the entangling phase $Z_k^{\mathrm{typ}}$ is instead nonzero as k → ∞, so that information is shared between apex and base even in the limit of an infinitely large tree. This information becomes small as we approach the transition from the entangled side. The scaling is very rapid: with C a nonuniversal constant. The distribution of Z is also very broad when $r_c - r$ is small. For example $\langle Z_\infty \rangle \sim \sqrt{Z_\infty^{\mathrm{typ}}}$, so that the mean is parametrically larger than the typical.

14 The trees we encounter in the circuit have a nonzero probability of terminating before they achieve k generations, in which case $Z_k$ is identically zero. So long as we are below the classical transition, an order 1 fraction of the probability distribution for $Z_k$ is supported on nonzero values even when k is finite but large. For other tree tensor networks $Z_k$ may be nonzero with probability 1 (for any finite k), in which case we simply define $\ln Z_k^{\mathrm{typ}} = \langle \ln Z_k \rangle$.
If we are right at the critical point, the value of Z decays more slowly with k than in the disentangled phase. A somewhat heuristic argument in Sec. IV H 4 suggests The results above rely on an exact treatment of the linearized form of the recursion relation for Z k , together with the conjecture that the effect of nonlinearity is captured by a simplified, analytically tractable model. Making this assumption (for which we provide numerical evidence), the results above hold for the trees derived from any forced measurement circuit ensemble with the structure described in Sec. IV B 3 (recall that we assumed invariance of the distribution of U under rotations on each leg.) In fact, they apply for the entanglement transition in any tree tensor network that obeys the two assumptions described in Sec. IV B 1, in particular the U(2) invariance property of the node tensor. In this context r is no longer interpreted as a measurement rate: instead it is any parameter characterizing t a bcd that can be used to drive the entanglement transition. However for the purposes of the discussion we will use the notation appropriate to the forced measurement circuit.
An analysis of random tree tensor networks outside the above class is a task for the future (see also Ref. [65]). However, our conjectured effective description suggests the interesting possibility that there may be multiple universality classes for the tree entanglement transition. The effective description includes a parameter ∆ > 0, which controls the scaling of Z typ ∞ near r = r c . At first glance ∆ is a nonuniversal parameter that will depend on the model. But, surprisingly, the U(2) invariance property fixes ∆ = 1/4 at the entanglement transition. Eq. 51 applies for ∆ < 1 in the effective description, but for ∆ > 1 there is instead power law scaling of the "order parameter" Z close to the transition. For completeness, we solve the effective model in this ∆ > 1 regime also. We find a regime with a variable exponent, and a regime where this exponent is pinned to 1: However, at present it is unclear whether these regimes of the effective model can be accessed by any tensor network, or whether they exist only in the effective model (Sec. IV I).

E. Recursion relation for singular values
Let us think of the tree as a quantum state for a spin at the top and $3^k$ spins at the base: this is just to fix notation for bras/kets. We may write its Schmidt (singular value) decomposition: $$|T_k\rangle = \sum_{i=1,2} \lambda_i\, |i\rangle_{\mathrm{top}}\, |i\rangle_{\mathrm{bottom}}. \qquad (55)$$ The states are Schmidt states in the appropriate Hilbert spaces (the second ket lives in the $2^{(3^k)}$-dimensional Hilbert space associated with the base). In the problem we are studying, the overall normalization of the tree is not important, so we will always take the Schmidt/singular values to be normalised: $\lambda_1^2 + \lambda_2^2 = 1$. Given three trees $T_k$, $T_k'$ and $T_k''$, each of k generations, we may form a tree $T_{k+1}$ of k+1 generations by attaching $T_k$, $T_k'$ and $T_k''$ to the base of the t node shown in Eq. 40. The statistical invariance of U under single-site rotations means that we are free to take the Schmidt states $|i\rangle_{\mathrm{top}}$ (Eq. 55) for $T_k$, $T_k'$ and $T_k''$ to be simply the two basis states (up and down spin states), which we denote $|1\rangle_{\mathrm{top}}$ and $|2\rangle_{\mathrm{top}}$. Then $$|T_{k+1}\rangle = \sum_{a,b,c,d} t^a_{bcd}\, \lambda_b \lambda'_c \lambda''_d\, |a\rangle_{\mathrm{top}}\, |bcd\rangle_{\mathrm{bottom}}.$$ Here $\lambda$, $\lambda'$ and $\lambda''$ are singular values for $T_k$, $T_k'$ and $T_k''$, $\{|a\rangle_{\mathrm{top}}\}$ are computational basis states, and $\{|bcd\rangle_{\mathrm{bottom}}\}$ is a set of 8 orthonormal states associated with the base of the full tree, formed from the Schmidt states of the three sub-trees. Equivalently, in this basis, $$(T_{k+1})^a_{bcd} = t^a_{bcd}\, \lambda_b \lambda'_c \lambda''_d.$$ It is straightforward to compute the normalised singular values of $T_{k+1}$.$^{15}$ Let us denote the smaller singular value squared of $T_{k+1}$ by $Z_{k+1}$. If t includes the projector, then trivially $$Z_{k+1} = 0. \qquad (58)$$

15 Considering $T \equiv T_{k+1}$ as a state, then $\lambda_1^2 + \lambda_2^2 = 1$ and $\lambda_1^4 + \lambda_2^4 = \mathrm{Tr}\, \rho^2 / (\mathrm{Tr}\, \rho)^2$, where $\rho$ is the un-normalized reduced density matrix for the spin at the apex: $\rho_{a,a'} = \sum_{bcd} T^a_{bcd}\, T^{*a'}_{bcd}$.
We write the other case explicitly for completeness, though we will only need a simple limit of it: We are interested in a transition between a phase where $Z_k$ vanishes as k → ∞, and a phase where the typical value of $Z_k$ remains positive in this limit. Even in this phase, if we are close to the phase transition, this typical value of $Z_k$ is small. Therefore to understand the critical properties we can study the recursion relation in the regime where the minimal singular values are close to 0 for all the trees. We order the singular values of any tree such that $\lambda_2 \leq \lambda_1$, and define $Z = \lambda_{\min}^2 = \lambda_2^2$ for each tree.
The first step is to examine the linearized recursion relation. Taking Eqs. 58, 59 to order Z, $$Z_{k+1} = \begin{cases} 0 & \text{with probability } p, \\ A_1 Z_k + A_2 Z'_k + A_3 Z''_k & \text{with probability } 1 - p. \end{cases} \qquad (60)$$ Here $A_i$ are three positive constants that depend on the random unitary: Analogous formulas hold for more general choices of the node tensor t, see Sec. IV I. Let us consider the meaning of this equation. $Z_k$, $Z'_k$ and $Z''_k$ refer to trees of the same size, so they are drawn from the same probability distribution. The recursion relation then defines the probability distribution for a variable $Z_{k+1}$ at the next level in the hierarchy. This defines a sequence of probability distributions $P_k(Z)$ for increasing k. The initial condition at the lowest level of the hierarchy is $Z_0 = 1/2$ (for a single bond, the two singular values are equal), i.e. $P_0(Z) = \delta(Z - 1/2)$. (This initial condition is far outside the linear regime, but close to the transition, $Z_k$ becomes small at large k. The specific choice of the initial condition in the linear tree is unimportant as long as it is non-zero and positive.) The linearized recursion relation (60) is crucial. It is sufficient to obtain the exact location of the entanglement transition, although we will need to add nonlinearity to understand what happens close to this transition in the entangled phase.
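The recursion step and its linearization can be made concrete with a short numerical sketch. This is an illustration under stated assumptions, not the authors' code: the node tensor is taken to be $t^a_{bcd} = U^{ab}_{cd}$ with index 0 labelling the dominant Schmidt state, `next_z` obtains $Z_{k+1}$ from the apex reduced density matrix, and `linear_coefficients` uses a first-order perturbative expression for the $A_i$ that we derived from this convention (it is not transcribed from the source's formula, and the labelling of the three $A_i$ may differ from the paper's).

```python
import numpy as np

def next_z(t, z1, z2, z3):
    """Smaller normalised Schmidt weight of the tree built from node t and three sub-trees."""
    lam = [np.sqrt(np.array([1 - z, z])) for z in (z1, z2, z3)]
    big = np.einsum("abcd,b,c,d->abcd", t, *lam)        # T^a_{bcd} = t^a_{bcd} lam_b lam'_c lam''_d
    rho = np.einsum("abcd,xbcd->ax", big, big.conj())   # un-normalised density matrix at the apex
    evals = np.linalg.eigvalsh(rho)                     # ascending order
    return max(evals[0], 0.0) / evals.sum()

def linear_coefficients(u):
    """First-order coefficients A_i for a node without a projector (assumed convention)."""
    t = u.reshape(2, 2, 2, 2)
    v = t[:, 0, 0, 0]                                   # zeroth-order apex vector
    flipped = (t[:, 1, 0, 0], t[:, 0, 1, 0], t[:, 0, 0, 1])
    norm4 = np.vdot(v, v).real ** 2
    return np.array([abs(v[0] * w[1] - v[1] * w[0]) ** 2 / norm4 for w in flipped])

# Example with a fixed entangling gate: CNOT applied after a Hadamard on the first spin.
h = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
cnot = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
u = cnot @ np.kron(h, np.eye(2))
z = np.array([1e-6, 2e-6, 3e-6])
print(next_z(u.reshape(2, 2, 2, 2), *z), linear_coefficients(u) @ z)
```

For small sub-tree values the two printed numbers agree up to corrections of order $Z^2$, which is the content of the linearization.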
Let us collect here some properties of the A i that will be useful below. It turns out that the statistical invariance of U under single-site rotations allows some exact statements, regardless of the precise choice of distribution for U . We demonstrate these in App. C 2. In particular we will need the following identities, which hold for all i = 1, 2, 3 (so long as U is nontrivially entangling with probability 1): The first of these is at first sight surprising, since if the unitary U is the identity (for example if ∆t → 0 for the distribution in Eqs. 38, 39) then A 2 and A 3 are exactly equal to zero. 16 However, this is a singular limit for A i , see below.
In the case where U is a Haar-random U(4) matrix, we can obtain more general analytic results (App. C 2): To determine the location of the phase transition we will need the special cases A 1/2 i . These are given in Table I (App. C 2) for both Haar and ∆t = 0.3 ensembles.
The asymptotics of the probability distributions of the $A_i$ are also obtained in App. C 2. These three variables are correlated, but here we discuss only the marginal distribution of a given one. Let us define $$V_i = \ln A_i. \qquad (65)$$ For any generic distribution of U, the tails of the $V_i$ distribution are exponential: If the distribution of unitaries is taken to be weakly entangling, for example if $\Delta t \ll 1$ in Eq. 39, then the right-hand tail of the distribution has an intermediate part, extending over the range $\ln \Delta t^2 \ll V_i \ll \ln \Delta t^{-2}$, that decays with the smaller exponent −1/2 (App. C 2). This slowly decaying tail, cut off at a parametrically large $V_i$, is responsible for the failure of the limit ∆t → 0 to commute with the average in Eq. 62 that was mentioned above.

F. Linearized recursion relation
The most basic question about the linearized recursion (60) is whether the typical value of Z is exponentially growing or exponentially shrinking at large k [55]. We may define an exponential growth speed $c_p$: $$Z_k^{\mathrm{typ}} \sim e^{c_p k} \quad (k \to \infty).$$
[FIG. 9 caption, partially recovered: The evolution is similar in the two cases for $p > p_c$, but for $p < p_c$ the nonlinearity causes $Z^{\mathrm{typ}}$ to saturate. The initial value $Z_0 = 10^{-6}$ has been used.]
In this section we will usually use p as the parameter, rather than the equivalent r, Eq. 42, since p has a more direct interpretation in terms of the tree.
Define the point p c to separate a regime of exponential growth, which we will see occurs for p < p c , from a regime of exponential decay at larger p: We will see that p c is precisely the location of the entanglement phase transition for the tree. When the linear recursion predicts that Z typ → 0 at large k, this remains true when higher powers of Z are included in the recursion. On the other hand, when the linear recursion predicts that Z typ → ∞ at large k, then the nonlinear terms in the recursion replace "∞" with a finite value, in a universal manner that we discuss in Sec. IV H. This is illustrated using simulations for the case of Haar-random unitaries in Fig. 9. This compares Z typ k for the linear and nonlinear recursion relations. In the linear case, evolution follows ln Z typ k ∝ k at large k, for all p. In the nonlinear case, this is only true for p > p c . Details of these simulations are described in App. C 1.
The speed $c_p$ can be extracted using the method of Ref. [55], which relates the linear recursion to a travelling wave problem. Define the generating function $$G_k(x) = \left\langle \exp\left(-e^{-x} Z_k\right) \right\rangle, \qquad (69)$$ where the average is over $Z_k$. The recursion relation (60) then becomes: $$G_{k+1}(x) = p + (1-p) \left\langle G_k(x - V_1)\, G_k(x - V_2)\, G_k(x - V_3) \right\rangle, \qquad (70)$$ where the remaining average is only over the $V_i = \ln A_i$ defined in Eq. 65. (The fact that the $V_i$ appear additively in the arguments of the generating functions here is the reason why the generating function in Eq. 69 is usually written with the double exponential.) It may be helpful to think of $G_k(x)$, defined in Eq. 69, as a smeared version of the cumulative probability distribution for $\ln Z$. This definition shows that for $x \gg \ln Z^{\mathrm{typ}}$, $G_k$ plateaus at the value 1, while for $x \ll \ln Z^{\mathrm{typ}}$, $G_k$ plateaus at the probability of $Z_k$ being exactly zero.$^{17}$ $G_k$ has a "front" at $x \simeq \ln Z_k^{\mathrm{typ}}$ that interpolates between these two plateaus. It is useful to think of x as a fictitious spatial coordinate, and of k as a fictitious time coordinate [55]. Then, at late time, this front propagates as a traveling wave with speed $c_p$, and obeys the traveling wave ansatz: $$G_k(x) = G^{(\lambda)}\left(x - v_p(\lambda)\, k\right).$$ The wave speed $v_p(\lambda)$ depends on a parameter $\lambda$ of the solution $G^{(\lambda)}$. This parameter, which must be determined, is the exponential decay constant of $G^{(\lambda)}$ at large argument [55]: $$1 - G^{(\lambda)}(x) \sim e^{-\lambda x} \quad (x \to \infty).$$ Substituting this form into (70) gives an explicit formula for the speed $v_p(\lambda)$ of the traveling wave solution with a given $\lambda$: $$v_p(\lambda) = \frac{1}{\lambda} \ln\left[ (1-p) \sum_{i=1}^{3} \left\langle A_i^{\lambda} \right\rangle \right]. \qquad (74)$$ We must then determine the correct value of $\lambda$, i.e. which traveling wave solution the initial condition converges to. This is done by standard considerations of velocity selection for travelling waves [55,80].
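In outline, there is a privileged minimal speed traveling wave defined by the parameter value $\lambda^*$ at which the speed is minimal: $$\partial_\lambda v_p(\lambda) \big|_{\lambda = \lambda^*} = 0.$$ $G_k$ will converge to this minimal speed solution if $\lambda^*$ is less than 1,$^{18}$ while it will converge to the solution with $\lambda = 1$ if $\lambda^* > 1$. In the latter case the speed is $v_p(1)$, which we refer to as the "annealed" value of the speed (for reasons described below). Therefore the desired exponential growth rate is given for any p by $$c_p = \begin{cases} v_p(\lambda^*) & \text{if } \lambda^* \leq 1, \\ v_p(1) & \text{if } \lambda^* > 1. \end{cases} \qquad (77)$$ Recall that $\lambda^*$ is determined using Eq. 74, via $v_p'(\lambda^*) = 0$, so it depends on p.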
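The velocity selection rule can be sketched numerically as follows. This is a hedged illustration: it assumes the standard form $v_p(\lambda) = \lambda^{-1} \ln[(1-p) \sum_i \langle A_i^\lambda \rangle]$ for the wave speed, takes the moment sum as a user-supplied function, and locates $\lambda^*$ by a simple grid search. The function name `growth_rate` and the log-normal toy distribution in the usage example are hypothetical and are not the distribution of $A_i$ in the circuit.

```python
import numpy as np

def growth_rate(moment_sum, p, lam_grid=None):
    """Return (c_p, lam_star), where moment_sum(lam) = sum_i <A_i^lam>."""
    if lam_grid is None:
        lam_grid = np.linspace(0.01, 3.0, 30000)
    v = np.array([np.log((1 - p) * moment_sum(lam)) / lam for lam in lam_grid])
    lam_star = lam_grid[np.argmin(v)]      # minimal-speed solution (or the grid edge)
    if lam_star < 1.0:
        return v.min(), lam_star           # minimal speed is selected
    return np.log((1 - p) * moment_sum(1.0)), lam_star  # annealed speed v_p(1)

# Toy example: three i.i.d. log-normal weights with ln A ~ N(mu, sigma^2) and <A> = 1.
mu, sigma = -2.0, 2.0
toy_moments = lambda lam: 3 * np.exp(mu * lam + 0.5 * sigma**2 * lam**2)
print(growth_rate(toy_moments, p=0.0))
```

In this toy case $v_p(\lambda) = \ln[3(1-p)]/\lambda + \mu + \sigma^2 \lambda / 2$, so the minimum can be found by hand and compared with the grid search; for weaker disorder (smaller $\sigma$) the minimum moves beyond $\lambda = 1$ and the annealed branch is returned instead.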
The above equation (77) can lead to a nonanalyticity in $c_p$ as p is varied. This has a meaning in terms of the statistical mechanics of the linearized recursion relation [55], which we review in Sec. IV G. For now we simply note that, for the present class of circuits,$^{19}$ the first line in Eq. 77 is always the one that applies for p close to $p_c$. This is shown in Sec. IV G. Given this, $p_c$ is determined by solving $$v_{p_c}(\lambda^*) = 0, \qquad \partial_\lambda v_{p_c}(\lambda) \big|_{\lambda = \lambda^*} = 0 \qquad (78)$$ for $\lambda^*$ and $p_c$. Fig. 10 shows $v_p(\lambda)$, defined in Eq. 74, for the Haar tree and the ∆t = 0.3 tree, in the vicinity of their respective $p_c$ values. (Numerically, these are obtained by simple averages using a single tensor. In the Haar case, Eqs. 63, 64 also give the exact form.) $c_p$ is given by the minimal value of the curve, $c_p = v_p(\lambda^*)$, which passes through zero at $p = p_c$.
This can be used to determine $p_c$ numerically, but in fact further analytical progress is possible. Using the definition of $v_p(\lambda)$ in Eq. 74, the equations (78) reduce to $$(1 - p_c) \sum_i \left\langle A_i^{\lambda^*} \right\rangle = 1, \qquad \sum_i \left\langle A_i^{\lambda^*} \ln A_i \right\rangle = 0.$$ Remarkably, the second identity in Eq. 62 shows that the solution is always at $\lambda^* = 1/2$, for any ensemble of unitaries satisfying our assumptions. This fact gives an explicit expression for $p_c$ as a simple average for the local node tensor, $$p_c = 1 - \frac{1}{\sum_i \langle A_i^{1/2} \rangle}.$$ This may be evaluated analytically for the Haar case (Eqs. 63, 64), giving $p_c = \frac{212 + 75\pi}{512 + 75\pi}$ (equivalent to the $r_c$ value quoted in Sec. IV D) and numerically for the ∆t ensemble. The location of the critical point in the ∆t ensemble is shown for various values of ∆t in Fig. 11.

19 Recall that we assumed various invariances of the distribution of unitaries to simplify the treatment (Sec. IV B).
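The numbers quoted for the Haar case can be cross-checked with a few lines of arithmetic. The conversion $r = 2p/(1+p)$ used below is an assumption on our part: it is the relation that maps the classical value $p = 2/3$ to $r = 0.8$ and reproduces the quoted $r_c \simeq 0.749004$, but the relation between r and p referred to as Eq. 42 is not reproduced in this section. The function names are hypothetical.

```python
import math

def p_critical_haar():
    """Closed-form critical projection probability for the Haar tree, as quoted in the text."""
    return (212 + 75 * math.pi) / (512 + 75 * math.pi)

def r_from_p(p):
    """Assumed conversion from link projection probability p to measurement rate r."""
    return 2 * p / (1 + p)

p_c = p_critical_haar()
print(p_c, r_from_p(p_c))      # quantum transition (Haar)
print(2 / 3, r_from_p(2 / 3))  # classical percolation transition on the tree
```

The check confirms that the quantum critical point lies below the classical one in both parametrizations.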

G. Aside: glass transition in linear recursion
The canonical example of linear recursion relations like Eq. 60 is the problem of the directed polymer on a tree [55]: see Fig. 12. In the disentangled phase, where the linear treatment is valid at large k, this gives another interpretation of the singular-value-squared Z k as a sum over paths through the tensor network. Here we briefly review this mapping and use it to clarify which of the regimes in Eq. 77 is relevant. This subsection is not essential to the subsequent development.
Within the linear approximation Eq. 60, $Z_k$ is exactly equal to the partition function of a polymer that lies along a path from the top to the bottom of a tree of depth k, as in Fig. 12. We view $-V_1$, $-V_2$, $-V_3$ in Eq. 65 as random potentials on the three bonds below a given node. The energy of the polymer is the sum of the potentials for the bonds it visits: This is easily seen to satisfy the recursive Eq. 60. There are minor differences from the standard polymer model. the subtrees below them can simply be removed). Second, the $V$s have a nontrivial distribution, with links that share the same parent node having correlated potentials. The polymer can be in either a glass phase or a paramagnetic phase [55]. These are distinct thermodynamic phases in the polymer problem, but to avoid confusion we will refer to them as "regimes", because they do not correspond to distinct phases of the entanglement problem. (The distinction between the glass and paramagnet is a feature of the linearized problem only, and is unrelated to the distinction between entangled and disentangled phases.) The glass obtains when the pinning effect of disorder on the polymer defeats the depinning effect of entropy. Usually the glass would be entered by decreasing the temperature (increasing the scale of $V$). Here we increase the strength of disorder by increasing p. In the paramagnetic regime the polymer has extensive entropy (proportional to k) while in the glass the entropy per unit length vanishes.
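The equivalence between a linear recursion and a sum over polymer paths can be verified directly on a small tree. The sketch below is a toy version under stated assumptions: the bond variables $V$ are independent Gaussians (unlike the correlated, projector-dependent distribution of the circuit), the weight of a bond is $e^{V}$ so that the potential is $-V$, and the leaves carry unit weight rather than $Z_0 = 1/2$, which only changes an overall prefactor. All function names are hypothetical.

```python
import itertools
import math
import random

def sample_bonds(depth, rng):
    """Assign a random V to every bond; a bond is labelled by the path leading to its lower end."""
    v = {}
    for level in range(1, depth + 1):
        for path in itertools.product(range(3), repeat=level):
            v[path] = rng.gauss(0.0, 1.0)
    return v

def z_recursive(v, depth, path=()):
    """Linear recursion: Z(node) = sum over the three children of exp(V_bond) * Z(child)."""
    if len(path) == depth:
        return 1.0
    return sum(math.exp(v[path + (i,)]) * z_recursive(v, depth, path + (i,)) for i in range(3))

def z_path_sum(v, depth):
    """Partition function: sum over all 3**depth root-to-leaf paths of exp(-E), E = -sum of V."""
    total = 0.0
    for leaf in itertools.product(range(3), repeat=depth):
        energy = -sum(v[leaf[:j]] for j in range(1, depth + 1))
        total += math.exp(-energy)
    return total

rng = random.Random(1)
bonds = sample_bonds(4, rng)
print(z_recursive(bonds, 4), z_path_sum(bonds, 4))
```

The two numbers coincide up to floating-point rounding, since expanding the recursion term by term generates exactly one product of bond weights per path.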
The glass and paramagnet regimes have a simple translation to the language of the traveling wave (Sec. IV F), which we only state [55]. The polymer is in the glass regime if λ * < 1, and in the paramagnetic regime if λ * > 1 [55]. These correspond to the two lines in Eq. 77 for the growth rate c p , which is simply (minus) the free energy per unit length of the polymer.
In our problem, the entanglement transition necessarily takes place in the glass regime of the linear recursion, essentially because of the fact that $\langle A_i \rangle = 1$. Let us give an intuitive picture.
To begin with, imagine that the polymer is in the paramagnetic regime. In this regime (but not in the glass$^{20}$) the "annealed" expression for the free energy/growth rate $c_p$ applies (the second line of Eq. 77). This expression is in fact just the annealed approximation to the free energy, in which we average the partition function of the polymer, $Z_k$, instead of averaging its logarithm. In the present linearized problem this gives:

$$v^{\rm ann}_p = \ln\left[3(1-p)\right]. \qquad (82)$$

Recall that the entanglement transition is at the value of $p$ where $c_p = 0$. We see from Eq. 82 that if the polymer were in the paramagnetic regime in the vicinity of $p_c$, then the entanglement transition would coincide with the classical percolation transition at $p_c^{\rm classical} = 2/3$! We can see that this is inconsistent as follows. Consider the structure of large trees when we approach the classical percolation transition at $p_c^{\rm classical} = 2/3$ from below. After deleting subtrees that terminate before reaching the base,$^{21}$ a large tree with $Z \neq 0$ is made up of one-dimensional chains connected by branching events. Close to the classical transition, the typical length of one of these 1D chains grows like $(2/3 - p)^{-1}$.$^{22}$ Treating them as renormalized bonds in the polymer problem, one may check that the effective disorder strength on these renormalized bonds grows without bound as they get longer. This increasing disorder strength implies that we must enter the glass regime before we get to the classical transition.

$^{20}$ Eq. 82 is the exact growth rate of $\langle Z_k \rangle$ for any $p$ in the linear problem, but it is only for $p < p_{\rm glass}$ that $Z_k^{\rm typ}$ has the same growth rate as $\langle Z_k \rangle$. It is $c_p$ as defined by $Z_k^{\rm typ}$ that will be relevant when we include nonlinearity. The failure of the annealed approximation when $\lambda_* < 1$ arises because in this regime the distribution of $Z$ becomes broad, in the sense that $\lim_{k\to\infty} \langle Z \rangle / Z^{\rm typ} = \infty$. (In this regime the tail in the probability distribution for $\ln(Z/Z^{\rm typ})$ decays as $e^{-\lambda_* \ln(Z/Z^{\rm typ})}$.)
That is, either the linear recursion relation is in the glass regime for all p, or it is in the glass regime for all p > p glass for some p glass < 2/3.
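The coincidence between the annealed threshold and classical percolation can be checked with a few lines of arithmetic. This is a minimal sketch under two assumptions inferred from the text rather than copied from it: each of the three bonds below a node survives independently with probability $1 - p$, and the bond weights satisfy $\langle A_i \rangle = 1$, so that $\langle Z_{k+1} \rangle = 3(1-p) \langle Z_k \rangle$. The function names are hypothetical.

```python
import math

def mean_offspring(p, n_bonds=3):
    """Mean number of surviving bonds below a node (assumed independent removal)."""
    return n_bonds * (1.0 - p)

def annealed_growth_rate(p, n_bonds=3):
    """Growth rate of <Z_k> per generation, assuming <A_i> = 1 on surviving bonds."""
    return math.log(mean_offspring(p, n_bonds))

for p in (0.5, 2.0 / 3.0, 0.8):
    print(f"p={p:.4f}: mean offspring={mean_offspring(p):.4f}, "
          f"annealed rate={annealed_growth_rate(p):+.4f}")
```

Under these assumptions the annealed rate changes sign exactly where the mean offspring number equals one, which is the classical percolation threshold of the tree.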
When the polymer is in the glass regime, $c_p$ is strictly smaller than the annealed approximation above (Eq. 77). Therefore $c_p$ in fact hits zero at a smaller value of $p$ than $v^{\rm ann}_p$ does. In other words, $p_c$ is strictly smaller than $p_c^{\rm classical}$.
The value of $p_{\rm glass}$ is determined by the equation $v'_p(1) = 0$, i.e. by the condition that the minimum of $v_p(\lambda)$ sits at $\lambda_* = 1$. For the $\Delta t = 0.3$ ensemble the value of $p_{\rm glass}$ is evaluated numerically and found to be negative, indicating that this ensemble is always in the glass regime. For the Haar ensemble, $p_{\rm glass} = (3 - e^{7/9})/3 \approx 0.274$ (from Eqs. 63, 64, 74). But since this value lies inside the entangled phase, where the linearized recursion is not valid, we do not expect that the glass transition is physically significant for the tensor network.
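As a quick arithmetic check of the Haar value quoted above, the following sketch simply evaluates the closed-form expression and compares it with the classical threshold $2/3$. Nothing beyond the formula in the text is assumed; the variable names are illustrative.

```python
import math

# Closed-form value quoted in the text for the Haar ensemble.
p_glass_haar = (3.0 - math.exp(7.0 / 9.0)) / 3.0
p_classical = 2.0 / 3.0

print(f"p_glass (Haar)   = {p_glass_haar:.4f}")
print(f"p_c^classical    = {p_classical:.4f}")
print("glass regime is entered before the classical threshold:", p_glass_haar < p_classical)
```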
The arguments here, showing that the entanglement transition must take place within the glass regime of the linear recursion, extend to the class of tree tensor networks described in Sec. IV B 1. The possibility of other universality classes of entanglement phase transition for other kinds of quantum trees is discussed in Sec. IV I.

H. Including the nonlinearity
Having understood the linear approximation to the recursion relation for the singular value squared, Eq. 60, we must now consider the effect of nonlinearity. The nonlinearity is necessary to make sense of the entangled phase, where Z k is of order 1, rather than being exponentially large in k as the linear equation would predict. Our aim in this section is to determine the scaling of Z close to the transition, on the entangled side. Our basic conclusions have already been summarized in Sec. IV D.

Numerical results
Let us show numerical results before turning to an analytical treatment.
First, Fig. 13 shows the probability distribution of $\ln Z_k$ for the Haar ensemble (Sec. IV B 3) in a tree of $k = 150$ generations, where we have removed instances where $Z$ is exactly zero.$^{23}$ Various values of $p$ less than or equal to $p_c$ are shown. The maximal possible value of $Z_k$ is 1/2: deep in the entangled phase the distribution is concentrated near this upper limit, but as we approach the critical point $\ln Z^{\rm typ}$ moves to the left. The shape of the distribution also stabilizes. (In fact it approaches the shape for the linear problem, except on the right where $Z$ is of order 1.)

Next, in Fig. 14 we show the scaling of $Z^{\rm typ}$ for both choices of the ensemble of unitaries (Sec. IV B 3), close to the critical point. The analytic treatment below gives $\ln Z^{\rm typ} \simeq -D/\sqrt{r_c - r}$, which corresponds to a straight line with slope $-1/2$ in the plot. This slope is indicated by the trend line. The data is consistent with this value of the exponent.

$^{23}$ We note that due to the forced measurements there is a finite probability for $Z$ to be exactly zero, i.e. the distribution function has a delta function with a finite weight. These are instances where the tree is classically disconnected. When we compute $Z^{\rm typ}$ and present distribution functions we do not include these trivial instances.
However, the value of the non-universal constant D that we extract from fitting this data is D = 2.01 for Haar and D = 3.24 for ∆t = 0.3, which is far from that predicted below, for both ensembles. Experimenting with simpler toy models suggests that this may just be because of finite r c − r effects, i.e. not being close enough to r c . The numerical method we use is afflicted by severe finite size effects (see Refs. [81][82][83] and Appendix C 1), associated with correctly sampling the right hand tail of the distribution in Fig. 9, which mean we cannot approach too close to the critical point. Details of the numerical method are in Appendix C 1.

Nonlinear toy model
The nonlinear recursion relation in Eqs. 58, 59 is not very approachable, even if expanded only to quadratic order. To make progress, we conjecture that the universal properties can be understood in a simpler model that retains a few basic features. We study a recursion relation satisfying two requirements. First, it contains both linear terms and nonlinear terms of order $Z^2$ which tend to suppress $Z$ (naively, the terms of higher order than $Z^2$ should be negligible when we are parametrically close to the transition and $Z \ll 1$). Second, its linearized form is in the glass regime, as for the circuit. (Though in fact we will study both this case and the paramagnetic case for completeness.)

We first write down a toy model for a tree with a discrete generation number $k$, but as in Ref. [55] it will be convenient to take a continuum limit in $k$. We assume that this continuum limit preserves the universal properties, as is the case for the linear problem. For the toy model, define the random variable $Z_{k+1}$ at level $k+1$ in terms of a sum of $\ell$ random variables $Z_k^{(i)}$ at level $k$. Here $\ell$ is the branching number of a node, and is taken to be random with a distribution $p_\ell$ for $\ell \geq 0$. The precise range allowed for $\ell$ is not important, so for simplicity we allow $\ell = 0, 1, 2$. We also include nonlinearity of strength $\gamma$, and a multiplicative random variable written as $e^{V}$, with Gaussian $V$; the resulting recursion is Eq. 83. This can be viewed as the composition of a linear transformation analogous to Eq. 60 (but slightly simpler because we avoid having correlated random variables) and a nonlinear one. The exponential form of the nonlinearity is arbitrary: for the continuum limit below it will anyway be sufficient to expand only to order $\gamma$, giving a quadratic recursion relation for $Z$. However the exponential form guarantees that $Z_{k+1}$ is positive for any input values, which was important for our numerical explorations. We conjecture that by solving this simple nonlinear system we also capture universal scaling for the problem of interest (Sec. IV E).
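A pool ("population dynamics") iteration of a toy recursion of this kind can be sketched as follows. The displayed form of Eq. 83 is not available in this text, so the sketch assumes the simplest map matching the description: a linear step $Z' = e^{V} \sum_{i=1}^{\ell} Z_k^{(i)}$ followed by the nonlinear step $Z_{k+1} = Z' e^{-\gamma Z'}$, which is positive for any input and quadratic at order $\gamma$. The function name `iterate_pool`, the pool size and all parameter values are arbitrary illustrations, not those used for the figures.

```python
import numpy as np

def iterate_pool(pool, probs, sigma_v, gamma, rng):
    """One generation of the assumed toy recursion, applied to a pool of Z values.

    probs   : probabilities (p_0, p_1, p_2) of branching number ell = 0, 1, 2
    sigma_v : standard deviation of the Gaussian variable V (mean taken to be zero here)
    gamma   : strength of the nonlinearity
    """
    n = pool.size
    ell = rng.choice(3, size=n, p=probs)               # branching number per new node
    children = pool[rng.integers(0, n, size=(n, 2))]   # two candidate children from the pool
    mask = np.arange(2)[None, :] < ell[:, None]        # keep only the first ell children
    s = (children * mask).sum(axis=1)
    z_lin = np.exp(rng.normal(0.0, sigma_v, size=n)) * s   # linear step
    return z_lin * np.exp(-gamma * z_lin)                   # assumed nonlinear step

rng = np.random.default_rng(0)
gamma = 0.5
pool = np.full(10_000, 0.5)
for _ in range(50):
    pool = iterate_pool(pool, probs=(0.05, 0.55, 0.40), sigma_v=1.2, gamma=gamma, rng=rng)

alive = pool[pool > 0]   # drop classically disconnected instances (Z exactly zero)
print("fraction with Z > 0:", alive.size / pool.size)
print("typical Z (exp of mean log):", np.exp(np.log(alive).mean()))
```

Resampling children from a finite pool introduces correlations between samples, so pool size is a source of systematic error near a critical point.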

Continuum traveling wave equation
The equation for the generating function (cf. Eq. 69) follows from expanding Eq. 83 to order $\gamma$; it involves $p_0$ and $p_2$, the probabilities of a termination and a branching, respectively. Now we take the continuum limit in the "time" $k$. When $\gamma = 0$, this gives the Fisher-KPP traveling wave equation [55]. We introduce a "time" step $\delta\tau$ which will be sent to zero and define $\tau = k\,\delta\tau$. The probabilities $p_0$ and $p_2$ are taken to be of order $\delta\tau$ (i.e. $p_1 = 1 - p_0 - p_2$ is close to 1) so that in the limit the tree becomes a continuous time branching process. The parameter $\gamma$ is taken of order $\delta\tau$ (note that the prefactor does not matter: it can be absorbed into the normalization of $Z$) and the strength of the random potential is also taken to vanish with $\delta\tau$. It is convenient to parameterize its mean and second moment as in Eq. 87. Finally we absorb some constants into the generating function, defining a rescaled function $H(x, \tau)$.$^{24}$ After absorbing a constant into the definition of $\tau$, and shifting $x$ by a constant, $H$ satisfies Eq. 90, where the growth rate is $\Delta = 2(p_2 - p_0)/\langle V^2 \rangle$ (which is finite in the $\delta\tau \to 0$ limit), the drift coefficient is $a$ in Eq. 87, and there is a spatially varying diffusion coefficient $D(x)$ (Eq. 91). The exponential term in Eq. 91 is the effect of the nonlinearity $\gamma$ in the tree problem. Note that nonlinearity in the tree is unrelated to nonlinearity in the Fisher-KPP field $H(x, \tau)$ (which instead reflects branching of the tree).
H forms a traveling wave, whose speed c sets the exponential growth rate of Z (cf. Eq. 67 and Eq. 71). On their own, the combination of ordinary diffusion and logistic growth (the ∆ term) in Eq. 90 would give a traveling wave propagating to the right (c > 0) which corresponds to exponential growth of Z. Here, in one phase, this wave instead propagates backwards (c < 0): this is possible because of the drift term in Eq. 90. In the other phase, the wave attempts to propagate to the right but is stopped by the exponential growth of the diffusion constant at positive x, which prevents the buildup of H at large x. This results in c = 0.
There is therefore a transition between a phase where Z is exponentially small at large generation number and a phase where Z remains order 1. This is the toy model's version of the entanglement transition.
In the absence of the $e^{x}$ term in $D(x)$, the velocity of a traveling wave with tail $H \sim e^{-\lambda x}$ is [80]

$$v(\lambda) = \lambda + a + \Delta \lambda^{-1} \qquad (92)$$

(as we see by keeping only the order $H$ terms in Eq. 90), with a minimum at $\lambda_* = \sqrt{\Delta}$. Therefore the linearized tree is in the glass regime [55] (Sec. IV G), where the traveling wave travels at speed $v_{\min} = v(\lambda_*)$, so long as $\Delta < 1$. This is the case we are interested in for the current circuit models, where $\lambda_* = 1/2$ at the entanglement transition.
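For reference, the location and value of the minimum follow from Eq. 92 in one step. This worked line uses only Eq. 92 and the glass criterion $\lambda_* < 1$ of Sec. IV G:

```latex
\begin{align}
\frac{dv}{d\lambda} &= 1 - \frac{\Delta}{\lambda^{2}} = 0
  \quad\Longrightarrow\quad \lambda_* = \sqrt{\Delta}, \\
v_{\min} &= v(\lambda_*) = \sqrt{\Delta} + a + \frac{\Delta}{\sqrt{\Delta}}
  = 2\sqrt{\Delta} + a .
\end{align}
```

The condition $\lambda_* < 1$ is therefore equivalent to $\Delta < 1$, and $v_{\min}$ changes sign at $a = -2\sqrt{\Delta}$.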
The wavespeed is then $c = 2\sqrt{\Delta} + a$, so in this toy model the analogue of the entanglement transition is at $a = -2\sqrt{\Delta}$, with $\sigma$ measuring the distance from this point. We are interested in small positive $\sigma$, just inside the entangled phase. In principle we would like to solve for the stationary solution at late times, which we expect to exist when $\sigma > 0$. In the absence of a full solution, we consider the equation piecewise [84]. Let the position of the front, whose scaling with $\sigma$ we wish to determine, be denoted $x_{\rm front}(\sigma)$. We assume (and confirm below) that $x_{\rm front}(\sigma)$ is large and negative at small $\sigma$.
First, at large positive $x$, the leading term in the equation is simply $\partial_x e^{x} \partial_x H = 0$, so the only decaying solutions are of the form $H \propto e^{-x}$.

Second, consider $-|x_{\rm front}(\sigma)| \ll x \ll 0$. In this regime we neglect both the variation of the diffusion coefficient and the $O(H^2)$ term. From Eq. 92, we can find a stationary solution for positive $\sigma$ only by making $\lambda$ complex [84]. Keeping only the leading $\sigma$ dependence gives the oscillating solution Eq. 95. We would like to use the as-yet-undetermined constants $x_0$ and $\phi$ in order to allow this solution to match onto the solutions at large positive and negative $x$. For generic $x$, the slope of this solution on a logarithmic plot is close to $\sqrt{\Delta}$, because of the small factor $\sqrt{\sigma}$ in the second term of the slope. However, close to the zeroes of the tangent this is not true. This allows us to match on the right hand side of the range,$^{25}$ where the slope is steeper, so long as we take $\phi = \pi$ to leading order in $\sigma$. Similar considerations on the left show the argument of the tangent must approach 0 as the vicinity of the front is approached. Therefore, to leading order in $\sigma$, the position of the front is given by Eq. 97. The constant $x_0$ in Eq. 95 then has the same leading term, to ensure that $H$ is of order 1 in the front region. Since $x_{\rm front}(\sigma)$ also sets the average value of $\ln Z$, the latter follows from Eq. 97. By considering the tail of the distribution, one also obtains the scaling of the mean in the regime we are discussing, where $\Delta < 1$. Notice that Eq. 98 depends only on the function $v(\lambda)$ for the linear problem! Indeed the strength of the nonlinearity $\gamma$ in Eq. 83 cannot appear, since it can be absorbed into a rescaling of $Z$ (which does not affect $\ln Z$ at leading order). This suggests that we can apply the result to the quantum tree of Sec. IV E, using Eq. 74 for $v$. This gives the scaling form for the quantum tree, with a constant $C$ that for the Haar-random case is given exactly by Eqs. 63, 64; the result can also be expressed in terms of $r$ (Eq. 42).
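The appearance of complex $\lambda$ can be seen directly by solving $v(\lambda) = 0$ with Eq. 92. The sketch below assumes the convention $a = -2\sqrt{\Delta} + \sigma$, i.e. that $\sigma$ is the speed of the linearized wave; the normalization of $\sigma$ in the source may differ, which would rescale the numbers but not the square-root dependence. The function name is hypothetical.

```python
import cmath
import math

def stationary_exponents(delta, sigma):
    """Roots of v(lam) = lam + a + delta/lam = 0, i.e. lam^2 + a*lam + delta = 0,
    with a = -2*sqrt(delta) + sigma (assumed convention for sigma)."""
    a = -2.0 * math.sqrt(delta) + sigma
    disc = cmath.sqrt(a * a - 4.0 * delta)   # purely imaginary for small sigma > 0
    return (-a + disc) / 2.0, (-a - disc) / 2.0

delta, sigma = 0.25, 1e-4
lam_plus, lam_minus = stationary_exponents(delta, sigma)
approx_imag = delta ** 0.25 * math.sqrt(sigma)   # leading-order imaginary part

print("lambda_+               =", lam_plus)
print("leading-order Im part  =", approx_imag)
print("half-period pi/Im(lam) =", math.pi / lam_plus.imag)
```

The real part stays close to $\sqrt{\Delta}$ while the imaginary part scales as $\sqrt{\sigma}$, so the oscillating factor has a half-period of order $1/\sqrt{\sigma}$, which is the length scale that sets the front position.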

Tree entanglement at critical point
So far we have discussed scaling in the two phases. Exactly at the transition, we might expect that $Z_k$ tends to zero with $k$, but more slowly than in the disentangled phase. Figure 16 shows data for this for the Haar ensemble. The data is compatible with, though it does not clearly establish, the scaling $|\ln Z_k^{\rm typ}| \sim k^{1/3}$ which is suggested by the following argument for the continuum model.
We expect that for the time-dependent equation $x_{\rm front}$ drifts sub-ballistically to the left. Let us conjecture that at a given time $t$, and in the range $x_{\rm front} \ll x \ll 0$, the instantaneous solution of the nonlinear equation approximates sufficiently closely the traveling wave solution $G^{(\lambda)}$ of the linear equation with the same instantaneous speed, $v = \dot{x}_{\rm front}$. At the critical point ($\sigma = 0$), $v(\lambda)$ has a double zero at $\lambda = \sqrt{\Delta}$, so solving $v(\lambda) = \dot{x}_{\rm front}$ again gives complex $\lambda$. This gives a solution like Eq. 95, but with $|\dot{x}_{\rm front}|$ in place of $\sqrt{\sigma}$. Eq. 97 then becomes a differential equation for $x_{\rm front}(t)$, which gives $|x_{\rm front}| \sim (3\pi^2 t/\sqrt{\Delta})^{1/3}$. These values for the exponent and the prefactor are in good agreement with a numerical solution of Eq. 90 at $a = -2\sqrt{\Delta}$ (we checked the case $\Delta = 1/4$).
If $\sigma$ is small but positive there must be a crossover at a large time $t_{\rm sat}$ from $x_{\rm front} \sim -t^{1/3}$ to $x_{\rm front} \sim -1/\sqrt{\sigma}$. This suggests $t_{\rm sat} \sim \sigma^{-3/2}$, which also agrees well with numerical solutions.
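The exponents $1/3$ and $3/2$ can be recovered in a few lines. The steps below assume that Eq. 97 has the form $|x_{\rm front}| \simeq \pi/(\Delta^{1/4}\sqrt{\sigma})$, with $\sigma$ the speed of the linearized wave. This form is an inference from the prefactor quoted in the text (it is the half-period of the oscillating solution), not a transcription of Eq. 97.

```latex
\begin{align}
|x_{\rm front}| &\simeq \frac{\pi}{\Delta^{1/4}\sqrt{|\dot{x}_{\rm front}|}}
  \quad\Longrightarrow\quad
  |x_{\rm front}|^{2}\,|\dot{x}_{\rm front}| \simeq \frac{\pi^{2}}{\sqrt{\Delta}}, \\
\frac{d}{dt}|x_{\rm front}|^{3} &\simeq \frac{3\pi^{2}}{\sqrt{\Delta}}
  \quad\Longrightarrow\quad
  |x_{\rm front}| \simeq \left(\frac{3\pi^{2} t}{\sqrt{\Delta}}\right)^{1/3}, \\
\left(\frac{3\pi^{2} t_{\rm sat}}{\sqrt{\Delta}}\right)^{1/3}
  &\simeq \frac{\pi}{\Delta^{1/4}\sqrt{\sigma}}
  \quad\Longrightarrow\quad
  t_{\rm sat} \simeq \frac{\pi}{3\,\Delta^{1/4}}\,\sigma^{-3/2}.
\end{align}
```

Only the powers of $t$ and $\sigma$ are expected to be robust; the numerical prefactor of $t_{\rm sat}$ depends on how the crossover is defined.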

I. Quantum trees: other universality classes?
Above we noted that a priori there were two possibilities according to whether the entanglement transition takes place within the glass or the paramagnetic regime of the linearized recursion relation: λ * < 1 and λ * > 1 respectively.
However, our approach required the statistical invariance of the node tensor $t^{a}_{b_1, b_2, \ldots, b_\ell}$ under U(2) rotations on a leg. (We are free to allow for an arbitrary branching number $\ell$.) This invariance was necessary so that we could write a recursion relation for singular values only: otherwise we need a combined recursion relation for singular values and singular vectors. For any such tree, the argument of Sec. IV F and App. C 2 shows that $\lambda_* = 1/2$ at the transition. That is, the coefficients $A_i$ of the recursion relation$^{26}$ satisfy $\langle A_i \rangle = 1$ and $\langle A_i^{1/2} \ln A_i \rangle = 0$, which is sufficient to ensure $\lambda_* = 1/2$ at the critical point (Sec. IV F).
In this class of trees the weak correlations between top and base in the disentangled phase are dominated by only a subgraph of the tensor network that contains a few paths from top to bottom. For this broad class of trees we expect the universal scaling described above. Therefore within the class of trees that our formalism applies to there is no freedom to vary λ * .
However it is interesting to ask what happens in trees where the unitary invariance property is broken. Breaking this invariance introduces correlations between singular values and singular vectors. A plausible guess, at least if these correlations are not too strong, is that in this setting the same toy model nevertheless captures the universal scaling. If this is the case (which we will not determine here) then the next question is whether in these more general models it is possible to vary the critical value of λ * away from 1/2.
With this somewhat speculative motivation (and for completeness), below we extend the analysis of critical scaling in the toy model to the regime λ * > 1.
Our analysis has also been restricted to trees with bond dimension two. A recursion relation (for the subleading singular values squared) may be formulated for trees with larger bond dimension, but has a more complicated structure, even at lowest order. It would be interesting to study this further.

Scaling of nonlinear recursion: ∆ > 1
We return to the toy model of Sec. IV H 2 in the continuum limit, now with $\Delta > 1$. The near-critical regime inside the entangled phase is again parameterized by $0 < \sigma \ll 1$, the control parameter that drives the entanglement transition.
The difference from the case studied above (cf. Eq. 95) is that the solutions λ of v σ (λ) = 0 are no longer complex: instead there is a real solution at λ = 1 + O(σ), and a larger real solution at λ + = ∆ + O(σ). That is, if we neglect both the nonlinearity in H and the x-dependence of the diffusion constant, the stationary solution is a sum of two exponentials, in contrast to Eq. 95.
In App. C 3 we study this regime via the equation for $R = \partial_x \ln H$, which interpolates between 0 for $x \ll x_{\rm front}$ and $-1$ for $x \gg 0$. We conclude that $Z^{\rm typ}$ scales as a power of $\sigma$ with an exponent $\kappa$ (Eq. 107), where the denominator appearing in $\kappa$ is the difference of the two solutions to $v(\lambda) = 0$ at the critical point $\sigma = 0$. In the present model, $\kappa = \max\{1/(\Delta - 1), 1\}$, but we conjecture that the form in Eq. 107, which requires only knowledge of the speed function $v(\lambda)$ of the linearized problem, applies to a wider set of models. Our argument in App. C 3 is not rigorous, so we have compared the formula $\kappa = \max\{1/(\Delta - 1), 1\}$ with a numerical solution of the continuum equation. Results are shown in Fig. 17 and are in fairly good agreement with the prediction.
Numerical solution of the continuum equation suggests that the above exponent $\kappa$ also determines the decay of $Z$ with $k$ right at $r_c$.

We have also studied the discrete toy tree model in Eq. 83 numerically in the regime with $\lambda_* > 1$. We find polynomial scaling of $Z^{\rm typ}$ near the critical point, as expected. The numerical estimates of the exponents differ somewhat from the predicted ones, which we attribute to finite size limitations. See Fig. 17 (inset) for examples. The parameters corresponding to PM-1 are $p_1 = 0.15$, $p_2 = 0.85$, = 0.95, and $\gamma = 0.5$, so that $\Delta = 3.97391$ and we expect $Z^{\rm typ}_{k\to\infty} \sim \sigma^{-1}$. Indeed the best fit exponent from our simulation is $\kappa = -0.96$. On the other hand, the parameters corresponding to PM-2 are $p_2 = 1$, = 1.5, and $\gamma = 0.5$, so that $\Delta = 1.5066$ and the expected exponent is $\kappa = -1.97$. However, we find a best fit exponent of $\kappa = -1.53$, and attribute the discrepancy to the finite size of the pool and the distance from the critical point.
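The predicted exponents for the two parameter sets can be reproduced directly from $\kappa = \max\{1/(\Delta - 1), 1\}$. The sketch below only evaluates this formula for the two quoted values of $\Delta$; it returns magnitudes and makes no statement about the sign convention used for the fitted exponents. The function name is illustrative.

```python
def kappa(delta):
    """Predicted exponent magnitude kappa = max{1/(Delta - 1), 1} for Delta > 1."""
    if delta <= 1.0:
        raise ValueError("formula applies to the regime Delta > 1 only")
    return max(1.0 / (delta - 1.0), 1.0)

for label, delta in (("PM-1", 3.97391), ("PM-2", 1.5066)):
    print(f"{label}: Delta = {delta}, kappa = {kappa(delta):.3f}")
```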

J. Trees, entanglement and min-cut
So far we have characterized the entanglement between the top of the tree and the base. We now apply this to more general entanglement quantities in the tree. Fig. 18 is a schematic of a wavefunction for a chain of spins that is given by a tree tensor network (note that there is no longer a free bond at the top). Here we will consider the entanglement $S(R)$ of a set $A$ of $R \gg 1$ contiguous spins in a much larger chain. This problem has also been tackled recently in Ref. [65] using a different method: see Sec. II D.
As is well known, in such a geometry the minimal cut cartoon suggests the scaling S ∼ ln R [64][65][66] , which is the number of bonds cut for "typical" choices of the placement of the region A (the tree strongly breaks translational invariance). Figure 18 shows an example of a minimal cut in a small tree. The logarithmic scaling is presumably correct in the entangled phase, but what happens close to the transition? For simplicity we consider the second Rényi entropy.
Note that the minimal cut in Fig. 18 lops off a disjoint set of smaller subtrees, marked in red/thick. We will assume that the region is placed so that this is the case. In this setting, a natural conjecture for the tree is that the universal scaling forms for the entanglement close to the transition and in the disentangled phase are given by a "modified minimal cut" formula: we first find the geometrical minimal cut, but then weight the contribution to the entanglement of a bond at height $k$ ($k$ generations above the base) by an amount that depends on $Z_k$. Since we are interested in the region close to the critical point and large $k$, we assume $Z_k \ll 1$.

What should this weight be? The simplest case is where the minimal cut only breaks one bond, i.e. where region $A$ corresponds to a single connected sub-tree. In such cases the minimal cut breaks the full tree into two subtrees, and each one is characterized by a $Z$ value. (In general one of them has an irregular structure, with different numbers of generations for different branches, but we can still use the recursion relation to compute its $Z$ value.) A $2 \times 2$ matrix calculation shows that the second Rényi entropy $S_2$ is proportional to the product of these $Z$ values, $S_2 \propto Z Z'$, to leading order.$^{27}$ Using the fact that $Z_k^{\rm typ}$ is asymptotically non-increasing in $k$, the $Z$ values of the subtrees are both of typical size $Z_k^{\rm typ}$ for $k$ equal to the height of the cut bond.
This suggests the conjecture

$$S_2(R) \sim \sum_{k=1}^{c \ln R} Z_k Z'_k . \qquad (109)$$

In this schematic formula, $c \ln R$ is the maximum height reached by the minimal cut, and $Z_k$, $Z'_k$ are random variables. All order one constants have been neglected, since we only aim to capture the asymptotic scaling with $R$ and with the distance from the critical point. We can confirm Eq. 109 explicitly in an artificial limit in which the scale of the $Z$s tends to zero, with $R$ arbitrary but fixed: this is described in App. C 4. However in the physical problem we wish to take $R$ to infinity, so this does not prove the conjecture.

$^{27}$ The prefactor depends on the singular vectors at the top of the sub-trees, but is of order one.
Consider the first class of trees (including those with the statistical invariance property of the node tensors, for example those appearing in the Haar circuit). Let r be an arbitrary parameter that drives the tree's entanglement transition. The results in the previous sections and Eq. 109 yield: Surprisingly, the entanglement is order 1 at large R at the critical point, because of the rapid decay of Z with k (Sec. IV H 4). This is also true in the disentangled phase.
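The contrast between logarithmic growth and saturation can be illustrated by evaluating the modified minimal cut sum with typical values. This is a schematic sketch, not the calculation behind Eq. 109: it replaces the random variables $Z_k$, $Z'_k$ by a common deterministic typical value, sets all order-one constants to one, and uses two assumed profiles, a constant $Z_k = Z_\infty$ for the entangled phase and $Z_k = e^{-k^{1/3}}$ at the critical point (Sec. IV H 4). The value `z_inf = 0.1` and the function names are arbitrary.

```python
import math

def s2_modified_min_cut(k_max, z_typ):
    """Sum_{k=1}^{k_max} z_typ(k)^2, with k_max playing the role of c * ln R."""
    return sum(z_typ(k) ** 2 for k in range(1, k_max + 1))

def entangled(k, z_inf=0.1):
    """Assumed profile in the entangled phase: Z_k saturates to a constant."""
    return z_inf

def critical(k):
    """Assumed profile at the critical point: ln Z_k ~ -k^(1/3), constants set to one."""
    return math.exp(-k ** (1.0 / 3.0))

for ln_R in (10, 100, 1000):
    print(f"ln R = {ln_R:5d}: entangled S2 ~ {s2_modified_min_cut(ln_R, entangled):.3f}, "
          f"critical S2 ~ {s2_modified_min_cut(ln_R, critical):.6f}")
```

With a constant profile the sum grows linearly in $\ln R$, while the stretched-exponential profile makes it converge, consistent with an order-one entanglement at the critical point.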
In the previous section we speculated about the existence of trees with an effective value of $\Delta > 1$. If such trees exist, then the same reasoning as above gives $S_2(R) \sim (r_c - r)^{2\kappa}$ for $r \lesssim r_c$, with a variable exponent $\kappa = \max\{[\Delta - 1]^{-1}, 1\}$. The entanglement right at $r_c$ is again $O(1)$.

K. Connecting back to the quantum circuit
Our original motivation for studying the tree was the conjecture that, for the forced measurement circuit models described in detail in Sec. IV B 3, the critical point r c of the appropriate tree ensemble was also the critical point for the circuit.
Here we give an argument which bounds the operator entanglement in the circuit in terms of the entanglement in the tree. This argument is very heuristic: a task for the future is to make the connection between the circuit and the tree more precise.
The basic idea is to imagine breaking a bond in the interior of the FMPT circuit, and to ask how much effect this can have on properties of the nonunitary time evolution operator V . Let b be a bond inside the circuit at time coordinate ∼ t/2. Then V b will be a modification of V in which bond b is broken.
Starting from $b$ we imagine marking the two trees $T$ and $T'$ attached to either end of it, using the convention in Sec. 4, where bonds with projectors on them are removed. We stop after $k$ generations, choosing the largest possible $k$ such that these are indeed two disjoint trees (no loops). Therefore $k$ should be of order $\ln N$. We assume that the number of spins $N$ is very large, so that the typical size of $Z$ and $Z'$ for these trees (the minimal singular value squared) is given by the asymptotic large $k$ result. Close to $r_c$, this typical value is small.
Together $T$ and $T'$, connected by $b$, form a tensor network $\mathcal{T}$. This can be seen as a state in a tensor product Hilbert space $\mathcal{H} \otimes \mathcal{H}'$ associated with the bonds on the boundary of $T$ and $T'$ respectively. We may form the corresponding singular value decomposition of $\mathcal{T}$: the smaller of its two singular values is of order $\sqrt{Z Z'}$, in terms of the $Z$ values of $T$ and $T'$ (assumed small, since we are in the critical regime).
Breaking the bond is defined to mean dropping this minimal singular value. Formally, this induces an error that is of order $\sqrt{Z Z'}$ in $Z$ and $Z'$. After averaging over the local unitaries, the error in any physical quantity is, again formally, of order $Z Z'$.$^{28}$ This suggests that the average change in $S_2$ when we break a single bond is at most of order $Z_\infty^2$. This is of course far from being a proof, because in principle the small term $Z Z'$ in the formal expansion could be systematically compensated by a large prefactor.
Assuming this bound, we may straightforwardly bound the plateau value $s(r) N$ of the operator entanglement close to $r_c$ (Sec. II, Sec. VI I). Since breaking all the $N$ bonds in a timeslice reduces the entanglement to zero, we must have ($c_1$ and $c_2$ are constants):

$$s(r) \leq c_1 \exp\left(-\frac{c_2}{\sqrt{r_c - r}}\right) \qquad (112)$$

for small $r_c - r$. Assuming also our conjecture that $r_c$ is the same for the tree and the circuit, this indicates that $s(r)$ vanishes extremely rapidly as the critical point is approached. In turn, $s(r)$ is related to the scaling of the exponential timescale in the entangled phase (Sec. VI I), just as it was in the classical problem (Sec. III). We note that the bound (112) on the scaling need not be tight. This can be understood by considering the analogous argument for the classical minimal cut problem.
Above, our bound used $Z_\infty$. The analogous quantity in the classical problem is the probability that a given tree is infinite. This is essentially the order parameter in the classical problem, scaling like $f_\infty \sim (r_c^{\rm cl} - r)$. We can bound the cost of the classical minimal cut $S_0$ as follows. Consider all the bonds of the percolation configuration that traverse some timeslice, say at time $t/2$. Each bond has a probability $\sim f_\infty^2$ that the two trees attached to either end of it are both infinite. These are the only bonds we need to cut (the others lie in disconnected clusters or dangling ends). This shows that $s_{\rm cl}(r)$ goes to zero at least as fast as $(r_c^{\rm cl} - r)^2$ close to the transition. This bound is consistent with, but weaker than, what we have argued is the true scaling in the classical problem, $s_{\rm cl}(r) \sim (r_c^{\rm cl} - r)^{5/2}$ (Sec. III D).
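The linear vanishing of the order parameter, and hence the quadratic bound, can be illustrated on a simple branching process. The sketch below uses a stand-in model rather than the circuit's percolation configuration: a ternary tree in which each bond survives independently with probability $1 - p$ (threshold $p = 2/3$), so the control parameter is $p$ instead of $r$. The survival probability $f$ solves $f = 1 - (1 - (1-p) f)^3$. The function name is hypothetical.

```python
def survival_probability(p, n_bonds=3, n_iter=20_000):
    """Probability that a tree with Binomial(n_bonds, 1 - p) offspring is infinite,
    obtained by iterating f -> 1 - (1 - (1 - p) * f)^n_bonds from f = 1."""
    q = 1.0 - p
    f = 1.0
    for _ in range(n_iter):
        f = 1.0 - (1.0 - q * f) ** n_bonds
    return f

p_c = 2.0 / 3.0
for eps in (0.04, 0.02, 0.01):
    f = survival_probability(p_c - eps)
    print(f"p_c - p = {eps:.2f}: f_inf = {f:.4f}, f_inf/eps = {f / eps:.2f}, "
          f"f_inf^2 = {f * f:.5f}")
```

In this stand-in model the ratio $f_\infty/(p_c - p)$ approaches a constant, so the probability that both trees attached to a bond are infinite vanishes quadratically.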

V. SIMULATIONS OF QUANTUM CIRCUITS
Having made a connection between trees and all-to-all circuits, we now turn to the numerical simulation of the latter. Because the Hilbert-space dimension grows exponentially with system size, we are restricted to systems with $N \leq 20$ spins-1/2. We simulate both measurement circuits and forced measurement circuits, keeping in mind that the results obtained from the tree apply only to the latter. Unless specified, the results shown here are for the Haar ensemble Eq. 37 (we also comment briefly on the $\Delta t = 0.3$ ensemble, Eq. 38).
All-to-all circuits have no spatial structure. Consequently, the entanglement transition does not entail a volume-to-area law transition in the entanglement associated with a spatial bipartition of a state. Instead, we consider two observables which quantify the amount of quantum information transmitted from the initial to the final time: (i) the time-evolution of the operator entanglement entropies (opEE) of the non-unitary evolution operator V , and (ii) the overlap of two initially orthogonal states that are both evolved using V .
We will show that the entanglement transition separates an entangled phase at r < r c , wherein an extensive amount of quantum information is retained for an exponentially long time, from a disentangled phase at r > r c wherein memory of the initial state is rapidly lost. This is in agreement with analytical results for the quantum problem in Sec. VI I, and is qualitatively similar to what we found in the classical toy model in Sec. III.
First, we will give evidence for a plateau in the operator entanglement for r below a critical value. We will then turn to observable (ii) above: since this does not require exact diagonalization of V , it allows larger N to be accessed. We use this observable to define a timescale τ (r, N ), and show that this timescale scales exponentially with N inside the entangled phase. Details of numerical calculations are relegated to App. D.

A. Operator entanglement
The amount of information carried from the bottom of the circuit to the top can be quantified via the opEE of $V$. In the case of measurements, where we must choose an initial state in order to define the Born rule probabilities, we take this state to be a product state (with spins aligned in the positive $x$ direction, $|{\rightarrow\rightarrow\cdots\rightarrow}\rangle$).
The opEE is obtained from the singular value decomposition

$$V = \sum_{j=1}^{D_H} \mu_j \, |j_t\rangle \langle j_0| ,$$

where $D_H = 2^N$ is the Hilbert-space dimension, and $\{|j_0\rangle\}$ and $\{|j_t\rangle\}$ are bases corresponding to the initial and final time. (We leave the $t$-dependence of $V$ implicit.) The opEE is:

$$S_n = \frac{1}{1-n} \ln \sum_j \lambda_j^{2n} , \qquad (114)$$

where $\lambda_j \equiv \mu_j / \sqrt{\sum_j \mu_j^2}$. For a unitary $V$, $S_n$ takes on its maximal value of $N \ln 2$. Any reduction compared to this value reflects loss of information between initial and final time due to worldlines of the spins broken by measurements. $S_n$ is bounded from above by the minimal cut separating the initial and final times. Close to the FMPT, we also have the conjectural bound Eq. 112 on the scaling form for $s_2(r) = S_2/N$ in the plateau region.

FIG. 19. The opEE, $S_1$, of the non-unitary time-evolution operator (see Eq. 114) for the Haar ensemble with forced measurements. The top panel is for $r = 0.1$, deep in the entangled phase where there is a plateau in $S_1/N$ in the $N \to \infty$ limit. The extrapolated $N \to \infty$ value is shown as the black dashed line; the extrapolation in $N$ is shown for a few exemplary time points in the inset. The bottom panels correspond to $r = 0.3$ and $r = 0.75$, the latter being the putative $r_c$ obtained from the Haar tree. Note that the value of $S_1/N$ is already quite small in the entangled phase at $r = 0.3$. All data is averaged over 5000 realisations.
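The operator entanglement can be computed from the singular values of $V$ in a few lines. This is a minimal sketch, not the simulation code behind Fig. 19. It assumes the Rényi form with squared singular values normalized to sum to one, which reproduces the stated maximum $N \ln 2$ for unitary $V$. The helper `haar_unitary` uses the standard QR construction, and the single-projector example operator is purely illustrative.

```python
import numpy as np

def operator_entanglement(V, n):
    """Renyi-n operator entanglement from the singular values mu_j of V,
    using weights w_j = mu_j^2 / sum_j mu_j^2 (assumed normalisation)."""
    mu = np.linalg.svd(V, compute_uv=False)
    w = mu**2 / np.sum(mu**2)
    w = w[w > 1e-15]                      # discard numerically vanishing weights
    if n == 1:
        return float(-np.sum(w * np.log(w)))
    return float(np.log(np.sum(w**n)) / (1.0 - n))

def haar_unitary(dim, rng):
    """Haar-random unitary via QR of a complex Gaussian matrix with phase fixing."""
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

rng = np.random.default_rng(1)
N = 4
U = haar_unitary(2**N, rng)
P = np.diag([1.0, 0.0])                       # projector onto sigma^z = +1 for one spin
V = np.kron(P, np.eye(2**(N - 1))) @ U        # one forced measurement after a unitary

print("unitary:       S_1 =", operator_entanglement(U, 1), " N ln 2 =", N * np.log(2))
print("one projector: S_1 =", operator_entanglement(V, 1), " (N-1) ln 2 =", (N - 1) * np.log(2))
```

A single projector halves the rank of the operator, so the opEE drops by exactly $\ln 2$, the cost of one broken worldline.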
Results for S 1 for the Haar ensemble with forced measurements are shown in Fig. 19. In the top panel, we plot the entanglement density S 1 /N vs. t for r = 0.1 and various systems sizes. After an initial linear decrease with t associated with the first measurements, there is a time regime where S 1 /N increases with N . This suggests the emergence of a plateau in S 1 /N at large N (Sec. II).
Recall that in the entangled phase we expect a nonzero value for the plateau density $s_1(r)$, and that when $N$ is finite but large, $S_1/N$ remains close to $s_1(r)$ over a range of times that grows exponentially with $N$. To give evidence for the nonzero value of $s_1(r)$ at $r = 0.1$, we extrapolate the data to $N = \infty$ for each value of the time.$^{29}$ This $N \to \infty$ extrapolation is shown as a dashed line in the figure. It is consistent with a plateau, extending to $t = \infty$, with $s_1(0.1) > 0$. (We defer an analysis of timescales to the following subsection.) A similar plateau was observed in Ref. [12] in Clifford circuits.

FIG. 20. As in Fig. 19 (top): the opEE density, $S_1/N$, of the non-unitary time-evolution operator for the Haar ensemble, but with measurements, for $r = 0.1$. The data is consistent with there being a plateau in $S_1/N$ in the $N \to \infty$ and $t \to \infty$ limit. All data is averaged over 5000 realisations.
It is clear that the plateau value $s(r)$ decreases very rapidly with increasing $r$. In the lower left panel we plot $S_1/N$ for the same set of $N$ and $r = 0.3$. This $r$ value is still far from the conjectured location of the critical point obtained from the tree ($r_c \simeq 0.749$). An increase of $S_1/N$ with $N$ is still observed, but it is clear that $s_1$ (assuming it is nonzero) is small. It is tempting to associate this with the exponential scaling in Eq. 112, which suggests that $s(r)$ goes to zero very fast as the critical point is approached, so that a plot of $s(r)$ against $r$ would be very flat for $r \lesssim r_c$.
On the other hand, at r = 0.75 (lower right panel), S 1 /N decays exponentially to zero, with a very weak N -dependence and no indication of saturation at large t. In fact, the trend with increasing N is in the opposite direction to the cases r = 0.1 or 0.3.
Thus, the opEE for these system sizes is consistent with an entanglement transition, occurring below the classical critical point, and with the expected plateau for S/N in the entangled phase. However, the rapid decay of the plateau value s(r), and the weak N -dependence, make it hard to pin down the position of the transition. We have checked (but do not show) that the data for S 2 is qualitatively similar to that for S 1 .
We find the same qualitative features for the Haar circuit with true measurements. Figure 20 shows the case r = 0.1. In fact at this relatively small value of r, the data for forced measurements and measurements is almost indistinguishable.

29 We use a naive linear extrapolation: the detailed functional form of the subleading corrections to S/N may depend on the precise regime of N and t (Sec. VI I).

B. State overlap and timescales
We now turn to the overlap O(t) of two initially orthogonal states, ψ (1) (t) and ψ (2) (t), undergoing time evolution with the same non-unitary operator V . We also define the "distance" D(t) as

$$D(t) = 1 - O(t).$$

The states are initialized as product states in the σ x -basis, |ψ (1) (0)⟩ = |→→ · · · →⟩ and |ψ (2) (0)⟩ = |←← · · · ←⟩, and are evolved using the non-unitary operator V . In the case of forced measurements the spins are always projected along the positive σ z -direction, and there is a symmetry between ψ (1) (t) and ψ (2) (t). In the case of measurements we use ψ (1) (t) to determine the Born rule probabilities, so this symmetry is absent.

O(t) is another way to quantify the amount of information retained from the initial state. In the limit r = 0, where V is unitary, the two states remain orthogonal for all time, O(t) = 0. In the opposite limit r = 1, where no unitaries are applied, O(t) will be exactly one as soon as all spins have been measured. For any fixed r > 0, and for a fixed value of N , the overlap will inevitably converge to 1 as t → ∞, because the two states are being subjected to the same projections. However we expect the timescale for this to grow exponentially with N in the entangled phase.

For a broad-brush view, we first show O(t) as a heatmap in the space of r and t for the Haar forced measurement circuit with N = 20 spins (Fig. 21). O(t) grows towards unity with an r-dependent timescale. One way to define a timescale is using the contour O(t) = 1/2. This is shown in the figure not only for N = 20, but also for smaller values of N . The conjectured r value of the phase transition is marked by the red dot. The timescale grows rapidly as r is decreased. It also shows a clear N -dependence in the entangling phase, which becomes much weaker on approaching the transition point.

Fig. 22 shows the time-dependence of the overlap in more detail for r = 0.5 and r = 0.75. 30 In both cases the time required to achieve a given value of O increases with N , but in the latter case this is mostly a shift of the curve, whereas in the former case there is a clear sign of an increasing time constant for the exponential approach of O to 1.

30 Note that the latter value of r is right at the putative critical point. This value was chosen to avoid being above, or too close to, the classical critical point r = 0.8 where the network is trivially disconnected.
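The two limits r = 0 and r = 1 discussed above can be checked in a small state-vector sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the update rule (at each step, with probability r one random spin is projected onto σ z = +1, otherwise a Haar-random two-qubit gate acts on a random pair), the normalisation O = |⟨ψ (1) |ψ (2) ⟩|² for separately normalised states, and all function names.

```python
import functools
import numpy as np

def haar_unitary(dim, rng):
    """Haar-random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diagonal(r) / np.abs(np.diagonal(r))
    return q * phases                      # rescale columns so the measure is Haar

def apply_two_qubit(psi, u, i, j, n):
    """Apply a 4x4 unitary u to qubits i != j of an n-qubit state vector."""
    t = np.moveaxis(psi.reshape([2] * n), [i, j], [0, 1]).reshape(4, -1)
    t = (u @ t).reshape([2] * n)
    return np.moveaxis(t, [0, 1], [i, j]).reshape(-1)

def force_up(psi, k, n):
    """Forced measurement: project qubit k onto sigma_z = +1 and renormalise."""
    t = psi.reshape([2] * n).copy()
    index = [slice(None)] * n
    index[k] = 1
    t[tuple(index)] = 0.0
    t = t.reshape(-1)
    return t / np.linalg.norm(t)

def overlap_history(n, r, steps, seed=0):
    """Overlap |<psi1|psi2>|^2 after each step of a hypothetical all-to-all circuit."""
    rng = np.random.default_rng(seed)
    right = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)
    left = np.array([1.0, -1.0], dtype=complex) / np.sqrt(2)
    psi1 = functools.reduce(np.kron, [right] * n)
    psi2 = functools.reduce(np.kron, [left] * n)
    history = []
    for _ in range(steps):
        if rng.random() < r:               # assumed rule: measure with probability r
            k = int(rng.integers(n))
            psi1, psi2 = force_up(psi1, k, n), force_up(psi2, k, n)
        else:                              # otherwise a random all-to-all gate
            i, j = rng.choice(n, size=2, replace=False)
            u = haar_unitary(4, rng)
            psi1 = apply_two_qubit(psi1, u, i, j, n)
            psi2 = apply_two_qubit(psi2, u, i, j, n)
        history.append(abs(np.vdot(psi1, psi2)) ** 2)
    return np.array(history)

unitary_limit = overlap_history(n=4, r=0.0, steps=20)
projective_limit = overlap_history(n=4, r=1.0, steps=200)
```

In this sketch the same gates and projections are applied to both states, which is the forced-measurement setting; the Born-rule case would instead sample outcomes from one of the two states.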
We use the exponential approach of the overlap to unity to define a timescale τ (r, N ). Since at late times D = 1 − O is exponentially small, and may have a broad distribution, we choose to look at its typical value. We define this by $\ln D_{\rm typ}(t) \equiv \overline{\ln D(t)}$, where the overline denotes the average over realisations and instances in which D(t) is exactly zero are excluded from the average (similar to the treatment of the singular value Z in the quantum tree, Sec. IV D). At late times this shows an exponential decay,

$$D_{\rm typ}(t) \sim e^{-t/\tau(r,N)}.$$

Data for ln D typ (t) vs. t are shown in Fig. 23, for the same values of N and r as in Fig. 22. We see clear exponential decay. In fact, Fig. 23 vividly shows the qualitative difference between the cases of r = 0.5 and 0.75: while τ grows with N for r = 0.5, it appears essentially N -independent for r = 0.75. Data for the circuit with measurements (not shown) is qualitatively similar.
We now analyze τ (r, N ) in the entangled phase. It is set by the asymptotic slope of plots like Fig. 23, which equals −1/τ . We extract it from a plot of τ eff (t) = − (d ln D typ /dt) −1 , the negative inverse of the time-dependent slope: at late times, τ eff (t) should stabilize at the value τ . Representative data for τ eff (t) and the plateaux therein are shown in App. D, see Figs. 39 and 40.
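The definitions of ln D typ and τ eff can be sketched numerically as below. This is an illustration under stated assumptions: the function names are hypothetical, the derivative is taken by finite differences with `numpy.gradient`, and the input is synthetic data with a known τ = 7 rather than circuit data.

```python
import numpy as np

def ln_d_typ(distances):
    """Average ln D over realisations at each time, excluding exact zeros."""
    d = np.asarray(distances, dtype=float)   # shape (realisations, times)
    positive = d > 0
    logs = np.full(d.shape, np.nan)
    logs[positive] = np.log(d[positive])
    return np.nanmean(logs, axis=0)          # NaN entries (D == 0) are skipped

def tau_eff(times, ln_d):
    """tau_eff(t) = -(d ln D_typ / dt)^(-1), via finite differences."""
    return -1.0 / np.gradient(ln_d, times)

# Synthetic stand-in: D_i(t) = A_i exp(-t / 7) with random prefactors A_i.
rng = np.random.default_rng(1)
times = np.arange(0.0, 50.0, 1.0)
prefactors = rng.uniform(0.1, 1.0, size=(200, 1))
synthetic_d = prefactors * np.exp(-times / 7.0)
estimate = tau_eff(times, ln_d_typ(synthetic_d))
```

For a pure exponential the estimate is flat in t; in real data one would look for the late-time plateau of `estimate`, which finite-time effects can obscure.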
It turns out that finite-time effects become significant at larger values of r, but for r not too large we are able to obtain an estimate of τ . The data (shown in App. D) is consistent at small r with exponential-in-N growth of the timescale: ln τ (r, N ) ∼ a(r)N. The coefficient a(r) is plotted against r in Fig. 24. This figure also shows data for the case of true measurements.
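The coefficient a(r) in ln τ ∼ a(r)N can be estimated at fixed r by a straight-line fit of ln τ against N. The sketch below is a minimal illustration with synthetic timescales; the function name `growth_coefficient` and the numbers are hypothetical and not the values behind Fig. 24.

```python
import numpy as np

def growth_coefficient(sizes, taus):
    """Fit ln(tau) = a * N + const and return (a, const)."""
    slope, intercept = np.polyfit(np.asarray(sizes, dtype=float), np.log(taus), 1)
    return slope, intercept

# Synthetic timescales tau = 2 exp(0.25 N), standing in for measured plateau values.
sizes = np.array([8, 10, 12, 14, 16])
synthetic_tau = 2.0 * np.exp(0.25 * sizes)
a_est, offset = growth_coefficient(sizes, synthetic_tau)
```

Repeating the fit for each r gives one point of a(r) per value of r; near the transition, where a(r) is small, a fit over a narrow range of N will be dominated by the subleading corrections.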
We expect a(r) to vanish at the critical point with a(r) ∼ s(r) (see Sec. VI I). Unfortunately, hamstrung by severe finite-size effects, we are not able to estimate the critical point accurately. 31 The data is certainly consistent with a critical point for the FMPT which is below the conjectured value 0.749. However, we speculate that this is instead a symptom of a(r) vanishing very rapidly as r c is approached, as is suggested by the essential singularity in Eq. 112. The dashed line in the figure shows this exponential form with c 1 = 39.4 and c 2 = 3.8. These values have no theoretical significance: this line is simply to indicate the possibility of a(r) remaining very small even for r considerably below r c .
The data for S 1 at small values of r is very close for measurements and forced measurements, as noted above. We do see differences between the two cases at intermediate r, with the forced measurement circuit having slightly larger entanglement at a given r. This is shown for S 1 /N in Fig. 25, Left. The comparison between the overlap data for the two cases at the same value of r (Fig. 25, Right) is also consistent with the above, with O(t) growing more slowly in the forced measurement case. This hints that r c for the measurement case may be lower than that for the forced measurement case, but our data does not allow us to determine this. While the data above was shown for the Haar circuit, we performed the same set of numerical calculations for the ∆t = 0.3 circuit as well. The results were qualitatively similar. Consistent with the results from the quantum tree, r c appeared to be smaller for the ∆t = 0.3 circuit than for the Haar circuit.

VI. FIELD THEORIES FOR MEASUREMENT AND ENTANGLEMENT TRANSITIONS
A key question about the measurement phase transition (MPT), not previously resolved, is whether there is a simple Landau-Ginsburg-Wilson-like field theory that captures its universal properties. This question is also unresolved for entanglement transitions in random tensor networks (RTNs), and for the closely related FMPT. In this section we propose candidates for these field theories. (In this section the spacetime dimensionality D = d + 1 is allowed to be arbitrary.) We obtain two Lagrangians, one for the MPT, and one for both FMPT and RTN. Surprisingly, these two Lagrangians are quite different in their structure, having for example different values for the upper critical dimension.
Microscopically, random circuits and random tensor networks can be mapped to lattice statistical mechanics models [13, 14, 33-35, 41, 43, 44]. These are effective spin models where the "spin" is a group element in the permutation group S N for N objects (we review this below; here N is a replica number and not the number of qubits as in previous sections). However, using these lattice models to guess appropriate continuum field theories is nontrivial for various reasons, one of them being a replica limit that is necessary to handle randomness. "Replica" lattice models were described for a random tensor network in Ref. [35], for Haar circuits in [33], and for circuits with measurement in Refs. [13,14].
Previous work pointed out that in certain limits (either by artificially deforming the weights in the effective spin model [35], or by taking a q → ∞ limit in the measurement problem [13,14]) one could access a fine-tuned point where the effective spin model had a simple continuum theory, namely that of percolation. While this was a useful step, this fine-tuned point has an infinite number of relevant perturbations [35] so unfortunately this does not provide a definite Lagrangian for the physical phase transitions of interest. Another approach has been to study Ising models that are obtained by simply omitting the replica limit, roughly in the spirit of an annealed average in conventional disordered systems [24,27,34]. These are useful toy models for various phenomena in the entangled phase [24,27] (we will give an explanation for why this is, building on [42]) but they cannot capture the correct critical properties. Therefore we attempt here to formulate explicit continuum replica field theories.
We emphasize that these theories are speculative conjectures, based on writing down the simplest Lagrangians compatible with the basic symmetries of the problem. It is certainly possible that in fact something more complicated happens in the continuum. Indeed, the exponential scaling we found in the tree seems to mean that it is not described by the high-dimensional limit of the field theory for RTNs proposed below (see Sec. VI H). How to resolve this tension is a question for the future.
We will first review the replica approach, the inevitable global symmetry of the field theories we are looking for, and the emergence of permutations in the simplest Haar-random models (Secs. VI A, VI B are largely review). We then discuss coarse-graining of these degrees of freedom (Sec. VI C). Next we note that these degrees of freedom have a more general meaning in terms of Feynman trajectories in the circuit [42,85]. This picture motivates an alternative derivation of a lattice field theory which in turn suggests a simpler continuum formulation (Sec. VI D). Our discussion also suggests an alternative way of thinking about the effective statistical mechanics of random tensor networks, in a way that is closer to traditional replica formulations of random magnets. Then we discuss the issue of "replica group theory" for the MPT on one hand, and the RTN and FMPT on the other: that is, constraints on the field theories associated with the replica symmetry [86]. We propose the simplest candidate Lagrangians in each case (Secs. VI E -VI G). We discuss some of the basic consequences of the simpler of these Lagrangians, that for the MPT (Sec. VI H). Our discussion of these field theories is relatively schematic: further details will be given in Ref. [87].
Sec. VI I, which is independent of the field theories proposed here, addresses scaling within the two phases, not necessarily near the critical point. Finally Sec. VI J describes variations of the measurement problem that are in distinct universality classes, for example models with free fermion structure or with additional symmetries.

A. Multi-layer circuits and replica symmetry
The crucial symmetries of the problem arise when dynamical quantities are written in terms of a multi-layer circuit, illustrated schematically in Fig. 26. (We will use the language of a circuit, with d spatial dimensions and one time dimension, but analogous considerations apply to a D = d + 1 dimensional RTN.) This multi-layer circuit is a discrete analogue of a path integral with multiple forward and backward paths, and it arises when we write powers of the reduced density matrix, say for the final state, in terms of the circuit. Let us briefly review this.
The layers are N identical copies of the original circuit V (t) and N copies of its complex conjugate V (t) * . We will call these "forward" and "backward" layers respectively. Formally, the multi-layer circuit with a given N may be written

$$V^{(N)} = \underbrace{V \otimes \cdots \otimes V}_{N} \otimes \underbrace{V^{*} \otimes \cdots \otimes V^{*}}_{N}.$$

The physical quantity of interest will dictate the boundary conditions at the top and bottom: for example contractions of indices between layers, or contraction of the bond indices at the bottom of a layer with an initial wavefunction. We review this in a simple setting in Sec. VI B.

The global symmetry of the effective models arises ultimately from a simple invariance of V (N ) under various operations. V (N ) is clearly invariant under (i) permutations of the forward layers among themselves; (ii) permutations of the backward layers among themselves; and (iii) complex conjugation accompanied by exchange of all the forward layers with all the backward layers [42]. Together these make up the symmetry group:

$$G_N = (S_N \times S_N) \rtimes \mathbb{Z}_2.$$

Here the Z 2 is generated by (iii) above. G N is a symmetry of the bulk structure of the tensor network; it will in general be broken by boundary conditions, e.g. by a choice of index contractions at the boundary of V (N ) .

A formal way to see the importance of this symmetry is via explicit mappings of random circuits or random tensor networks onto effective lattice spin models. We review this next. We will give an alternative picture below in Sec. VI D, by introducing an Edwards-Anderson-like field in a multilayer tensor network (this alternative picture may be more intuitive for those familiar with random magnets).
In simple models, averaging over the random tensors or unitaries leads to effective lattice magnets in which the "spins" σ (not to be confused with the physical spins that the circuit acts on) are valued in the permutation group [13, 14, 33-35, 41, 43, 44]:

$$\sigma \in S_N.$$

We refer to σ and its continuum versions as the "pairing field".
We will not need details of the lattice construction, but Fig. 27 shows an example for a 1+1D circuit geometry. For each unitary in the original circuit, we obtain a spin degree of freedom σ in the effective statistical mechanical model. We may write the partition function for these spins schematically as

$$Z = \sum_{\{\sigma\}} W(\{\sigma\}).$$

The boundary conditions on the σ depend on the observable (Sec. VI B). The Boltzmann weight W ({σ}) is a product of local weights on each of the shaded triangles in Fig. 27: the three spins σ a , σ b , σ c on a given triangle interact via a weight J(σ a , σ b , σ c ) whose form may be found in Refs. [13,14] for a circuit with measurements and in Refs. [33,41] for the purely unitary case. Constructions for the random tensor network with random Gaussian tensors were discussed earlier in Ref. [34] and extended to take into account the replica trick in Ref. [35]. In all these cases the interaction terms are, loosely speaking, ferromagnetic, in that the Boltzmann weight is maximized when the σ configuration is uniform.
Physically, the spin σ should be thought of as a way to label a choice of pairing of the forward layers with the backward layers. Let the permutation σ ∈ S N map a given element i ∈ {1, . . . , N } to σ(i). Then σ stands for the pairing in which forward layer i is paired with backward layer $\overline{\sigma(i)}$ (we denote backward layers with a bar). For example the identity permutation, σ = I, denotes the pairing of 1 with $\bar 1$, of 2 with $\bar 2$, and so on, i.e. the pattern $(1\bar 1)(2\bar 2)(3\bar 3)$. We have taken N = 3 for this example; with the layers reordered in comparison with Fig. 26 as $1, \bar 1, 2, \bar 2, 3, \bar 3$, the pairing can be drawn without crossings. For the transposition, σ = (12), layer 1 is paired with $\bar 2$ and layer 2 with $\bar 1$, giving the pattern $(1\bar 2)(2\bar 1)(3\bar 3)$.

Since σ specifies a pairing of layers, we will sometimes refer to it (and the continuum versions in the subsequent sections) as the "pairing field". The physical interpretation of these pairings of layers is discussed in Sec. VI D below. Heuristically, pairing Feynman histories in the discrete time evolution allows phase cancellation to be avoided [33,42], in the spirit of the diagonal approximation in periodic orbit theory [85].
That is, we may think of the multi-layer circuit as a discrete path integral for N forward and N backward copies of the system. A Feynman trajectory is specified by a sequence of spin states in each of the copies. In a given layer, the corresponding product of matrix elements of local gates is the discrete analogue of the exponentiated action for a continuum Feynman trajectory: e^{iS} or e^{−iS} depending on whether it is a forward or a backward layer. After averaging (or, in some cases, even without averaging [42]) this multi-layer path integral may be dominated by configurations in which forward and backward layers form "pairs" with similar spin configurations, contributing opposite phases to the total weight. Such a pairing allows the effect of phase cancellation to be reduced. (See also Sec. VI D.)

The pattern of pairing will in general differ at different locations in spacetime, corresponding to spacetime dependence of the pairing field σ. If the boundary conditions, say at the final time, involve pairwise index contractions of layers, as arise in the expressions for Rényi entropies (Sec. VI B), this will act as a boundary "magnetic field" which selects out a particular value for the pairing field σ at the boundary.

G N acts on σ via both left and right multiplications, for the two S N factors respectively, and via inversion for the Z 2 generator, so that we have the symmetry transformations:

$$\sigma \to g_L\, \sigma\, g_R^{-1}, \qquad \sigma \to \sigma^{-1},$$

for permutations g L and g R , together with combinations of the above.

The effective spin interactions in Eq. 124 are local for simple choices of the random tensors or gates, but in general depend nontrivially on N , and may even be negative. 32 However there is a relatively simple picture of the entangling phase as a phase where σ is ferromagnetically ordered, so that G N is spontaneously broken, and of the disentangling phase as a disordered phase. Entanglement entropies may be expressed as free energy costs for non-uniform boundary conditions [33-35] (see Sec. VI B). We will appeal only to these facts and the symmetry structure above.

32 Simplifications arise in the fully unitary case. Even there, for general N it is possible to have negative Boltzmann weights W σ . Simplifications also arise at large local Hilbert space dimension [13,14,33,35].

Note that the simplest nontrivial case is N = 2: then there are only two possible pairings, I and (12). Denoting these + and − leads to an effective Ising model [34,41]. In this case G N reduces to a simple Z 2 symmetry relating the two states.

Finally, we must specify the replica limits of interest.
Loosely speaking, the required value of N [14,35] can be seen by counting powers of V . It is

$$N \to 0 \ \ \text{(FMPT and RTN)}, \qquad N \to 1 \ \ \text{(MPT)}.$$

The difference arises from the Born-rule probability of the measurement outcomes, which must be included in every average for the MPT. We review this more carefully in Sec. VI B.
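The action of G N on the pairing field can be made concrete by brute force for small N. The sketch below assumes the conventions σ → g L σ g R ^{-1} and σ → σ^{-1} for the group action, and uses a hypothetical ferromagnetic two-spin weight q^{−(N − #cycles(σ a ^{-1} σ b ))}, chosen only because it is maximal for σ a = σ b ; it is not the weight J of Refs. [13,14,33]. All function names are illustrative.

```python
import itertools

def compose(p, q):
    """Product of permutations stored as tuples: (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def num_cycles(p):
    seen, count = set(), 0
    for start in range(len(p)):
        if start not in seen:
            count += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = p[j]
    return count

def bond_weight(sa, sb, q=2.0):
    """Hypothetical ferromagnetic weight, equal to 1 when sa == sb and smaller otherwise."""
    distance = len(sa) - num_cycles(compose(inverse(sa), sb))
    return q ** (-distance)

def weight_is_symmetric(n):
    """Check invariance under sigma -> gL sigma gR^-1 and sigma -> sigma^-1 in S_n."""
    perms = list(itertools.permutations(range(n)))
    for sa, sb in itertools.product(perms, repeat=2):
        w = bond_weight(sa, sb)
        if bond_weight(inverse(sa), inverse(sb)) != w:
            return False
        for gl, gr in itertools.product(perms, repeat=2):
            ta = compose(compose(gl, sa), inverse(gr))
            tb = compose(compose(gl, sb), inverse(gr))
            if bond_weight(ta, tb) != w:
                return False
    return True

symmetric_for_n3 = weight_is_symmetric(3)
pairings_for_n2 = list(itertools.permutations(range(2)))   # the two "Ising" states
```

The check works because the weight depends only on the cycle type of σ a ^{-1} σ b , which is unchanged by conjugation and by inversion. For N = 2 the list of pairings has just two elements, I and (12), which is the Ising case mentioned above.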

B. Boundary conditions in replica formalism
In order to review the replica formalism [13,14,35], let us express the operator entanglement S 2 of the nonunitary time evolution operator V in a measurement or forced measurement circuit. The latter case is precisely analogous to a random tensor network, except that for the case of time evolution there is a natural division of the external legs of the tensor network into those associated with the initial time and those associated with the final time. We focus in this subsection only on reviewing how the boundary conditions in the effective partition function arise formally (see Sec. VI D for more on how the "pairing field" arises in the bulk).

33 Recall that we label V = V m by the sequence m of measurement outcomes obtained in a given realization of the dynamics.

We defined the operator entanglement in Sec. II. Recall that, if we view V formally as a tensor network wavefunction for 2N spins, then ρ t is the unnormalized reduced density matrix associated with the final-time legs. Let us start with the case of the FMPT, where expectation values (denoted by E[. . .] or by an overline) are simple averages over the unitaries and projections in V . The expectation value of the second Rényi entropy is

$$\mathbb{E}[S_2] = -\,\mathbb{E}\left[\ln \frac{\operatorname{tr}\rho_t^2}{(\operatorname{tr}\rho_t)^2}\right]. \qquad (131)$$

The pattern of index contraction can be represented graphically. Vertical lines represent a stack of copies of V and V * , like that in Fig. 26, but viewed from the side. For convenience, the four layers in the stack are ordered as V * , V , V * , V (instead of grouping all of the V * s together as in Fig. 26). Arcs at the top and bottom indicate the pattern of index contractions between layers. Index contractions are done separately for each of the physical sites.

Next let us define "partition functions" that are averages of the multi-layer circuit with particular choices of boundary conditions. We use the notation Z N (σ|τ ) for the average of the circuit with N layers of V and N layers of V * , and with index contractions in the pairing pattern σ at the top and τ at the bottom. Z N (σ|τ ) maps to a partition function for the pairing field with an effective "magnetic field" favouring pairing state σ at the final time (top) and τ at the initial time (bottom).

Eq. 131 is not immediately written in terms of such partition functions, because of the logarithm and the fraction, but this can be dealt with using the replica trick [35]. Eq. 131 is trivially equivalent to

$$\mathbb{E}[S_2] = -\frac{1}{m}\,\mathbb{E}\left[\ln \frac{(\operatorname{tr}\rho_t^2)^m}{(\operatorname{tr}\rho_t)^{2m}}\right]$$

for any m > 0, since the factors of m cancel. But one may check (by expanding in m in the numerator and denominator below) that in the limit m → 0 the expectation value may be taken for the numerator and denominator separately:

$$\mathbb{E}[S_2] = -\lim_{m\to 0}\frac{1}{m}\ln \frac{\mathbb{E}\left[(\operatorname{tr}\rho_t^2)^m\right]}{\mathbb{E}\left[(\operatorname{tr}\rho_t)^{2m}\right]}.$$

As usual, we treat m as a positive integer at intermediate stages of the calculation. The above then becomes

$$\mathbb{E}[S_2] = -\lim_{m\to 0}\frac{1}{m}\ln \frac{Z_{2m}(\tau_{2,m}|I)}{Z_{2m}(I|I)}. \qquad (135)$$

Here τ 2,m denotes a permutation in S 2m that is a product of m commuting 2-cycles [33]

$$\tau_{2,m} = (12)(34)\cdots(2m-1,\,2m).$$
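The statement that the average may be taken separately in the numerator and denominator as m → 0 can be checked numerically on any ensemble of positive numbers. The sketch below is an illustration only: the samples standing in for tr ρ t ² and tr ρ t are synthetic random numbers, not circuit data, and the function names `replica_estimate` and `tau_2m` are hypothetical.

```python
import numpy as np

def tau_2m(m):
    """Permutation (12)(34)...(2m-1, 2m) as a 0-indexed tuple of images."""
    perm = list(range(2 * m))
    for k in range(m):
        perm[2 * k], perm[2 * k + 1] = perm[2 * k + 1], perm[2 * k]
    return tuple(perm)

def replica_estimate(purity_num, norm, m):
    """-(1/m) ln( E[A^m] / E[B^(2m)] ) with A = tr rho^2 and B = tr rho."""
    return -np.log(np.mean(purity_num ** m) / np.mean(norm ** (2 * m))) / m

# Synthetic ensemble: B = tr rho is a random normalisation, and A = B^2 exp(-S_2).
rng = np.random.default_rng(0)
s2_samples = rng.uniform(0.5, 3.0, size=2000)
norm = np.exp(rng.normal(size=2000))
purity_num = norm ** 2 * np.exp(-s2_samples)

direct = np.mean(s2_samples)                       # E[S_2] computed sample by sample
replica = replica_estimate(purity_num, norm, m=1e-5)
```

The two numbers agree up to a correction of order m, which comes from the second-order term in the expansion of E[A^m] and E[B^{2m}]. At finite integer m the replica expression is a different quantity, which is why the limit matters.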
Eq. 135 may now be interpreted as the free energy cost of imposing distinct boundary conditions for the pairing field $\sigma$ (represented by the continuum field $X$ in the sections below) at the initial and final times. If the free energy cost for given boundary conditions $\sigma$ and $\tau$ is defined as $F(\sigma,\tau) = -\ln\left[Z(\sigma|\tau)/Z(\mathbb{I}|\mathbb{I})\right]$, with $\mathbb{I}$ the identity pairing, then the averaged entropy is $E\,S_2 = \lim_{m\to 0} F(\tau_{2,m},\mathbb{I})/m$.$^{34}$ Note that the total number $N$ of replicas (denoted $2m$ above) tends to zero as stated above for the FMPT and the RTN. The simplest situation, discussed in the next subsection, is where the pairing field is well-ordered across the entire sample. Then the free energy cost $F$ is essentially the free energy cost of inserting a single domain wall in this order [33-35]. See for example Fig. 30 in Sec. VI I.

34 This generalizes directly to higher Rényi entropies. (The von Neumann entropy can either be obtained using an additional limit $n \to 1$, or by a slightly different construction with a single replica limit [35].)
In fact, in this situation (the strongly ordered regime), results from the unitary case suggest that the replica limit can be dispensed with: we can map the entanglement to the free energy cost of a single domain wall in an effective classical disordered system [33]. The most direct way to understand this is to avoid the replica trick entirely [42]. It is possible to make a formal mapping of the multilayer circuit in Eq. 131 (with N = 2) to an "Ising model" without any averaging. In general this model has complicated long-range interactions, so that it is not useful for discussing the critical point. But in the strongly ordered regime we expect (assuming the considerations for the unitary case in [42] carry over) that the interactions are effectively local after sufficient coarse-graining. $S_2$ in a given realization can then be understood as a domain wall cost in a disordered Ising model. This is a route for justifying the use of an Ising model to discuss, for example, subleading corrections to the volume law in the entangled phase [24,27]. When the critical point is approached we must, however, return to the replica description above.
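The application to the MPT is similar to the case of the FMPT. The Born probability $P(\mathbf{m}) = \langle\psi| V_{\mathbf{m}}^\dagger V_{\mathbf{m}} |\psi\rangle$ for a sequence of measurement outcomes $\mathbf{m}$ may be denoted graphically by a two-layer diagram in which dots represent contraction with $\psi$ or $\psi^*$ as appropriate. Let us absorb a trivial constant into "E" so that it denotes the average over the structure of the circuit together with the unweighted sum over $\mathbf{m}$:

$$E\, S_2 = -E\; P(\mathbf{m}) \ln \frac{\operatorname{Tr}\rho_t^2}{(\operatorname{Tr}\rho_t)^2}.$$

We may simplify the formulas slightly by averaging over the initial state, which yields

$$E\, S_2 = -E\; (\operatorname{Tr}\rho_t) \ln \frac{\operatorname{Tr}\rho_t^2}{(\operatorname{Tr}\rho_t)^2}.$$

The replica trick then allows us to write

$$E\, S_2 = -\lim_{m\to 0} \frac{1}{m} \ln \frac{Z_{2m+1}(\tau_{2,m}|\mathbb{I})}{Z_{2m+1}(\mathbb{I}|\mathbb{I})}.$$

Formally this is similar to (135), but the total number of replicas $N = 2m + 1$ is taken to 1 rather than 0 [13,14].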

C. Permutations and coarse-graining
We will now focus on the critical properties. Let us first make a brief detour to consider coarse-graining a lattice model of permutations, such as that shown schematically in Eq. 124, in an abstract sense, in order to understand one of the basic challenges. (This section is not an essential prerequisite for the following developments; the reader who wants to get to the concrete results may wish to skip it.) We work throughout with a system in some finite number of dimensions $D = d + 1$ (the spacetime dimension in the case of a circuit). Naively we might expect the limit of large $d$ to match the all-to-all circuit (as in Sec. III E) but this is unclear (Sec. VI H).
Let us first imagine attempting a block-spin RG procedure in a naive way, by simply "averaging" the spins $i$ within each $D$-dimensional local block:

$$\mu_{\rm block} = \frac{1}{|{\rm block}|} \sum_{i \in {\rm block}} \sigma_i. \qquad (142)$$

What does this expression mean? At this point, each $\sigma_i$ on the RHS is a formal group element in $S_N$. Their linear combination, $\mu_{\rm block}$, is no longer in $S_N$, since addition is not a group operation (only multiplication). Instead it is an element of the group algebra of $S_N$ [88]. A general element of the group algebra is a linear combination of the elements $g$ of the group with numerical coefficients $M_g$,

$$\mu = \sum_{g \in S_N} M_g\, g,$$

where in the present case $M_g \in \mathbb{R}$. In other words, we can think of the coefficients $M_g$ as forming a vector $M$ of length $N!$, which is the order of $S_N$. The coarse-grained spin above is equivalent to this vector.

However, $\mu_{\rm block}$, or equivalently the vector $M$, is not a natural coarse-grained field in general. The reason for this is that $M$ does not form a single representation of the global symmetry $G_N$. Instead, the $N!$-dimensional vector space splits into many distinct representations, in fact a number of representations that grows exponentially as $N$ grows. Standard results for the group algebra imply that the representations of $G_N$ that appear when we decompose $\mu_{\rm block}$ are in one-to-one correspondence with the irreducible representations of $S_N$ [88]. To extract a particular representation of $G_N$, we simply replace the formal group elements in Eq. 142 with their matrix representatives in the corresponding representation of $S_N$.
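To make the naive block average concrete, the following sketch computes the coefficient vector $M_g$ for a small block. It is a toy illustration only: the choice of $S_3$, the four-site block, and the function name `block_average` are our assumptions.

```python
from fractions import Fraction
from itertools import permutations

def block_average(spins, n):
    """Coefficients M_g of (1/|block|) * sum_i sigma_i in the group algebra of S_n.

    Permutations are stored as tuples of images of 0..n-1; the result is a
    dict with one exact rational coefficient per group element (n! entries).
    """
    coeffs = {g: Fraction(0) for g in permutations(range(n))}
    for sigma in spins:
        coeffs[tuple(sigma)] += Fraction(1, len(spins))
    return coeffs

# Hypothetical block of four sites, each carrying a spin in S_3.
block = [(0, 1, 2), (0, 1, 2), (1, 0, 2), (2, 0, 1)]
M = block_average(block, 3)
for g, coeff in sorted(M.items()):
    print(g, coeff)
```

The coefficients sum to one, but no single coefficient equals one, so the averaged object is a vector of length $N!$ rather than a group element.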
This means that our initial attempt to form a block spin has led us not to a single coarse-grained field, but to an indeterminate number (because N must be left free) of different coarse-grained fields, each in a different representation of the global symmetry group G N .
In principle, we could try to write down a Lagrangian including all of these fields. However, since the number of these fields, and therefore the number of couplings, depends on N , this does not seem promising. Instead, it is natural to hope that only one or a small number of the fields become massless at the critical point, and the other fields do not need to be included in a continuum Lagrangian. This is the assumption we will make, motivated by the more explicit picture in the following section.
This picture of splitting $\mu_{\rm block}$ into separate fields gives an alternative view on the discussion of the percolation fixed point in Ref. [35]. The authors imagined starting with a lattice model with a much enlarged symmetry, $S_{N!}$ (not $S_N$ or $S_N \times S_N$ or $G_N$). This much larger symmetry group is allowed to arbitrarily permute all the $N!$ values $\sigma \in S_N$ that the spin can take. Such a lattice model is simply a Potts model with $Q = N!$ states, for which the continuum theory is well known (becoming percolation when $Q \to 1$). The authors then considered deforming this model in the direction of the physical model of interest (cf. Eq. 124), which does not have $S_{N!}$ symmetry. They found that the lowest-order perturbation that could be added was quadratic in the Potts field, and so relevant. However there was considerable freedom in the index structure of this perturbation, which could be formed from any class function of $S_N$.
From the present point of view, this perturbation is a sum of mass terms, with one independent mass for each of the infinite number of fields that appear when we decompose µ block above into representations of S N for arbitrary N .

D. Motivating a simple Landau theory
A familiar way to represent a permutation in S N is as an N × N matrix X a,b of ones and zeros, with a single 1 in each row and in each column, Under the global symmetries in Eq. 127, this matrix transforms as: where L and R are permutation matrices representing g L and g R . We might hope that we can build a Landau theory from such a matrix. In terms of the discussion in the previous section, this will correspond to the simplest choice of representations of G N to include in the continuum theory (discussed below). In fact we can motivate such a Landau theory in a more direct way, without the need to go through the mappings discussed above involving permutations.
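As an illustration of the matrix representation and its symmetries, the sketch below builds a permutation matrix and applies independent row and column relabellings plus the exchange of rows and columns. The index convention `X[a][b] = 1 if b == sigma[a]` and the form `L X R^T` are assumptions made here for definiteness; the exact placement of transposes in Eq. 145 may differ.

```python
def perm_matrix(sigma):
    """N x N matrix with X[a][b] = 1 if b == sigma[a], else 0 (convention assumed here)."""
    n = len(sigma)
    return [[1 if sigma[a] == b else 0 for b in range(n)] for a in range(n)]

def matmul(A, B):
    """Plain matrix product of two square integer matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def is_permutation_matrix(A):
    """True if every row and every column contains a single 1 and zeros elsewhere."""
    target = [0] * (len(A) - 1) + [1]
    return all(sorted(row) == target for row in A) and all(sorted(col) == target for col in zip(*A))

X = perm_matrix((1, 2, 0))
L = perm_matrix((0, 2, 1))   # acts on the row index
R = perm_matrix((2, 1, 0))   # acts on the column index
X_lr = matmul(matmul(L, X), transpose(R))   # independent row and column relabelling
X_t = transpose(X)                          # the extra Z_2 exchanges rows and columns
print(X_lr, X_t)
```

Both operations map permutation matrices to permutation matrices, which is the sense in which the matrix carries a representation of the full symmetry group.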
For this we appeal to the basic physical picture for why the pairings of layers arise in the multi-layer circuit, which is to avoid phase cancellation. To make this explicit, let's consider a particularly simple example of a tensor network V (which we can interpret formally as a nonunitary time evolution for qubits) with the geometry in Fig. 29, Left. Label the bond index values by S = ±1 (these are the spins' σ z values if V is interpreted as a time evolution). Take the local gates w, with bond indices S 1 , S 2 , S 3 , S 4 , to have the simple form w S1,S2,S3,S4 = exp where each h is an independent, identically distributed complex Gaussian variable with mean zero, and equal variance ∆ 2 h /4 for its real and imaginary parts, and similarly for the Js, with variance ∆ 2 J /2. (These couplings are taken complex since tensors in a generic tensor network are complex.) The tensor contraction defining V involves a sum over all the indices carried by the internal bonds in Fig. 29, i.e. over all Feynman trajectories, if we think of the vertical direction as time. This tensor contraction is an Ising partition function for the indices S i on the bonds i. With the choices above, this Ising model lives on a rotated square lattice. We may write its partition function as where the exponentiated lattice "action" e iS is just a product of terms of the form (146), so that iS [{S}] is an Ising Hamiltonian with random complex magnetic fields and random complex nearest-neighbour couplings. This is schematic as we have left the boundary conditions unspecified. (Fixed boundary conditions on the spins give a matrix element of V , for example; in practise we are interested in taking several layers of Z which are coupled at their boundaries.) Quantities of interest involve the replicated partition function (cf. Fig. 24). 
Up to boundary conditions, this is given by averaging $Z^N \times (Z^*)^N$ over all of the random $h$ and $J$ parameters, as in the standard application of the replica trick to the Ising model with random bonds or random fields [76]. Introducing $N$ replicas of the Ising spin for the forward layers, denoted $S^a$ for $a = 1, \ldots, N$, and $N$ replicas for the backward layers, denoted $\bar S^b$, the replicated partition function has the form given in Eq. 149 (we do not include an $i$ in the definition of the action), written in terms of the "pairing field" $X^{ab} = S^a \bar S^b$ defined on each bond. This is similar to an Edwards-Anderson order parameter in an Ising spin glass. However the usual Edwards-Anderson order parameter would be of the form $S^a S^b$ (as there would be no distinction between forward and backward layers) and the replica permutation symmetry would act on both $a$ and $b$ together. In the present case we have separate permutation symmetries for the row and column indices of $X$.
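The origin of the pairing between forward and backward layers can be seen from a one-variable Gaussian average. The worked identity below is a sketch under stated assumptions: the field is taken to enter the weight of a forward layer as $e^{hS}$ and of a backward layer as $e^{h^*\bar S}$, with $h = x + iy$ and $x$, $y$ independent real Gaussians of variance $v$. The precise normalization in Eq. 146 may differ.

```latex
% Assumption: A = \sum_a S^a (forward replicas), B = \sum_b \bar S^b (backward replicas),
% h = x + i y with independent real Gaussians x, y of mean zero and variance v.
\begin{align}
  hA + h^{*}B &= x\,(A+B) + i\,y\,(A-B), \\
  \mathbb{E}\, e^{x(A+B)} &= e^{\,v(A+B)^2/2}, \qquad
  \mathbb{E}\, e^{\,i y (A-B)} = e^{-v(A-B)^2/2}, \\
  \mathbb{E}\, e^{\,hA + h^{*}B} &= e^{\,2vAB}
    = \exp\!\Big( 2v \sum_{a,b} S^a \bar S^b \Big).
\end{align}
```

The terms in $A^2$ and $B^2$ cancel, so only products of one forward and one backward spin survive. With $v = \Delta_h^2/4$ the exponent becomes $(\Delta_h^2/2)\sum_{ab} S^a \bar S^b$ under the assumed normalization.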
We defer an explicit discussion of coarse-graining for this and other microscopic models to a separate publication [87]. Here we note only that the form of Eq. 149, with ferromagnetic interactions between the pairing field X for different sites, motivates writing a continuum Lagrangian for an N × N matrix, as discussed above.
In the present microscopic formulation, X is not a permutation matrix, but the action of symmetry is the same (Eq. 145). This is what we will use, together with the assumption that the pattern of symmetry breaking in the entangled phase is the simplest one corresponding to pairing, i.e. to a choice of permutation.
Without loss of generality, let this permutation be the identity permutation (other cases are related by symmetry). Then the pattern of symmetry breaking is captured by an expectation value of the form $\langle X_{ab} \rangle = f\,\delta_{ab} + c$, where $f$ is the order parameter.
Here $c$ is a constant which is generically nonzero even in the disordered phase, since (unlike $f$) it does not break any symmetry. This order breaks $G_N$ down to $S_N \times Z_2$, where the remaining permutation group is the subgroup of diagonal $S_N \times S_N$ transformations with $g_L = g_R$. (We will briefly discuss more complex possibilities for symmetry breaking in Sec. VI J.) The physical interpretation of $X$ is simple: if in some region the spin configuration in the forward layer $a$ is close to that in backward layer $b$, then the coarse-grained $X_{ab}$ in this region will be large. Heuristically, we expect repulsive interactions between $X_{ab}$ and $X_{ac}$ for $b \neq c$: if the configuration in $a$ is close to that in $b$, the phases from the $a$ layer are already (partially) cancelled, so there is less gained by also pairing with $c$.
Let us briefly mention a caveat to the above discussion. A "random tensor network" is by definition a statistical mechanics problem with very little required structure. Similarly the complex Ising model discussed above (which is an example of a random tensor network) is close to being the most general Ising model that one could write down for this lattice geometry. 35 On the other hand, the true measurement dynamics in the MPT does have some structure (for example, structure associated with causality) which is not present in a generic tensor network. In writing down the field theory in the next section we are assuming that the only aspect of the structure of the MPT that is important for the critical theory is the shift in the number of replicas from N = 0 to N = 1 that is induced by the Born probability. This assumption should certainly be examined further.
We note that the unitary limit, r = 0, is a case where additional structure due to unitarity certainly is important. There the appropriate effective "spin model" has hard constraints on the allowed spin configurations, which for example enforce causality [33,41,43] (these are relaxed when projection operators are included [13,14]). As a result, the unitary models do not possess invariance under O(d+1) rotations in spacetime, even in the scaling limit, and are not described by the field theories below, which do possess this symmetry. However the unitary models do share some features with the ordered phases of these theories, such as a positive domain wall tension.

E. A field theory for the measurement transition
With this motivation, let us write the simplest Lagrangian for $X_{ab}$, which can represent a coarse-graining either of a permutation matrix or of the composite field above. We will see that this simplest Lagrangian passes a basic consistency check for the MPT. (In the next section we will see that we need to extend it for the RTN and the FMPT.) Let us make subtractions so that the row and column sums of the matrix give zero:

$$\sum_a \hat X_{ab} = 0, \qquad \sum_b \hat X_{ab} = 0. \qquad (152)$$

In the case where $X$ is microscopically a permutation, this simply requires us to subtract a constant:

$$\hat X_{ab} = X_{ab} - \frac{1}{N}. \qquad (153)$$

As a result of these linear constraints, which are preserved under coarse-graining, $\hat X$ has $(N-1)^2$ independent components, and forms an irreducible representation of $G_N$. Below we will omit the caret on $\hat X$. Including terms in the potential only up to cubic order in $X$, and imposing $G_N$ symmetry, gives a relatively simple Lagrangian. The theory we propose for the MPT is:

$$\mathcal{L} = \sum_{ab} \left[ \frac{1}{2} (\partial X_{ab})^2 + \frac{\mu^2}{2} X_{ab}^2 + g\, X_{ab}^3 \right]. \qquad (154)$$

We have included both time and space derivatives in the first term with the same coefficient, i.e. we have set a nonuniversal speed to 1. This field theory has emergent Euclidean rotational invariance (not Lorentz invariance) in spacetime if this is not broken by boundary conditions. The components of the matrix $X$ are not independent, because of the constraints in Eq. 152. Note that as a result, in contrast to the theory discussed in the next section, the only linear term $\sum_{ab} X_{ab}$ that would be allowed by symmetry is in fact zero. The replica limit $N \to 1$ is also implied. The renormalized squared mass vanishes at the critical point, $\mu^2 \propto (r - r_c)$. Alternately, we may write $X$ in terms of an unconstrained $(N-1) \times (N-1)$ matrix field $\phi_{\alpha\beta}$,$^{36}$ via $X_{ab} = \sum_{\alpha\beta} e^a_\alpha e^b_\beta\, \phi_{\alpha\beta}$, which gives

$$\mathcal{L} = \sum_{\alpha\beta} \left[ \frac{1}{2} (\partial \phi_{\alpha\beta})^2 + \frac{\mu^2}{2} \phi_{\alpha\beta}^2 \right] + g \sum D_{\alpha\beta,\alpha'\beta',\alpha''\beta''}\, \phi_{\alpha\beta}\, \phi_{\alpha'\beta'}\, \phi_{\alpha''\beta''}. \qquad (155)$$

The tensor $D$ is a tensor product of that appearing in the cubic term of the Potts model [74,75]:

$$D_{\alpha\beta,\alpha'\beta',\alpha''\beta''} = d_{\alpha\alpha'\alpha''}\, d_{\beta\beta'\beta''}, \qquad d_{\alpha\alpha'\alpha''} = \sum_a e^a_\alpha e^a_{\alpha'} e^a_{\alpha''}.$$

The theory with the cubic term can only make sense in the replica limit: for $N > 2$ we have an unstable potential and for $N = 2$ the cubic term vanishes. This is also the case for the Landau-Ginsburg-Wilson-like theory for percolation, which we have already discussed in Sec. III B. Like that theory, the upper critical spacetime dimension of (154) is $D = 6$.

A basic consistency check on our picture is that this theory indeed sustains a stable ordered phase, with the simple pattern of symmetry breaking described in previous sections, when $\mu^2 < 0$. That is, the masses of fluctuations about the ordered state should remain positive in the replica limit $N \to 1$: otherwise some more complex pattern of symmetry breaking might be required [89-91]. To check this we put $X_{ab} = f(\delta_{ab} - 1/N) + W_{ab}$, where $f$ is the magnitude of the order parameter, and $W$ represents fluctuations (with $\sum_a W_{ab} = 0$, etc.). The saddle-point equation requires

$$f = -\frac{\mu^2}{3g}\, \frac{N}{N-2}.$$

The mass terms in the Lagrangian for $W$ are then

$$\mathcal{L}_{\rm mass} = -\frac{\mu^2 N}{2(2-N)} \left[ \sum_{ab} W_{ab}^2 - 2 \sum_a W_{aa}^2 \right].$$

We may check that the eigenvalues of the mass matrix appearing here are indeed positive when $\mu^2 < 0$ and $N \to 1$ (App. F 1), so this consistency check is satisfied. Now we consider another important consistency check.
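The quoted saddle point can be verified with exact rational arithmetic for integer $N$. This is a sketch under an assumption: the potential is taken to be $\sum_{ab}[(\mu^2/2) X_{ab}^2 + g X_{ab}^3]$, the form consistent with the saddle-point value and mass terms quoted above. The function names and parameter values are ours.

```python
from fractions import Fraction

def dV_df(f, N, mu2, g):
    """d/df of sum_ab [ (mu2/2) X_ab^2 + g X_ab^3 ] on X_ab = f * (delta_ab - 1/N)."""
    total = Fraction(0)
    for a in range(N):
        for b in range(N):
            shape = (1 if a == b else 0) - Fraction(1, N)   # dX_ab/df
            x = f * shape
            total += (mu2 * x + 3 * g * x * x) * shape
    return total

def saddle_f(N, mu2, g):
    """Nonzero stationary point quoted in the text: f = -(mu2 / 3g) * N / (N - 2)."""
    return -mu2 / (3 * g) * Fraction(N, N - 2)

mu2, g = Fraction(-1), Fraction(1, 2)   # arbitrary illustrative values with mu2 < 0
for N in (3, 4, 5, 7):
    f = saddle_f(N, mu2, g)
    print(N, f, dV_df(f, N, mu2, g))    # the derivative vanishes identically
```

The check runs at integer $N > 2$ only; the statement about stability in the text concerns the continuation to $N \to 1$, which this script does not address.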

F. Counting fields
Above we started with an $N \times N$ matrix $X_{ab}$ transforming under $G_N$ symmetry. For integer $N > 1$ we may split a general such matrix into four distinct fields, transforming under distinct representations of $S_N \times S_N$:

$$X_{ab} = S + R_a + C_b + \hat X_{ab}, \qquad \sum_a R_a = \sum_b C_b = 0, \qquad \sum_a \hat X_{ab} = \sum_b \hat X_{ab} = 0. \qquad (157)$$

The last of these, $\hat X$, is in the fundamental (standard) representation for both $S_N$ factors. It lives in an irreducible representation of $G_N$ of dimension $(N-1)^2$. $R$ and $C$ each transform under only one of the $S_N$ factors. Since they are exchanged by the $Z_2$ generator, together they form a single representation of $G_N$ of dimension $2(N-1)$. $S$ is a singlet. For the MPT we constructed a Landau theory that contained only the field $\hat X$. This was the obvious thing to do for various reasons (for example, if we think of $X$ microscopically as a permutation matrix, then $R$, $C$ and $S$ are trivial constants). We conjectured that for the MPT $\hat X$ is the only field that becomes massless at the critical point.

36 We use the set of $N$ vectors $e^1, \ldots, e^N$, each of $N-1$ components, that are familiar in the context of the Potts field theory [74,75] (see App. A 2) and satisfy $\sum_a e^a_\alpha e^a_\beta = \delta_{\alpha\beta}$.
However the group theory at N → 0 [92] gives additional constraints which strongly suggest that all of the representations in Eq. 157 become simultaneously massless at the critical point, so that we cannot throw away the representations R, C and S. Therefore we have to work with a general matrix X in which the row and column sums are not fixed to zero. The first indication of this is that the subtractions in Eq. 153 and Eq. 157 diverge when N → 0.
As with many other replica field theories, the partition functions that we are interested in become trivial (exactly equal to 1) in the replica limit, for certain choices of boundary conditions. An unusual feature of the circuit models with measurements or forced measurements is that this occurs at two values of $N$. When $N \to 0$ (FMPT) it occurs for the usual reason, because the partition function is the average of something raised to the power zero. When $N \to 1$ (MPT) the partition function is the sum of the probabilities of all the measurement outcomes, again giving 1 but for a different reason.
The fact that the microscopic partition function is equal to 1 implies constraints on the spectrum of operators in the continuum theory [86,92-94]. Here a minimal heuristic point will be sufficient: there should not be any massless fields left when $N$ is set equal to $N^*$, the desired number of replicas, otherwise we will have a nontrivial free energy, contradicting $Z = 1$.
The Lagrangian (154) for the MPT satisfies this condition, since the field is in a representation of dimension $(N-1)^2$, which tends to zero when $N \to 1$. Therefore it passes this basic consistency check.
At first we might have assumed that the same field theory could also be continued to $N = 0$ in order to describe the RTN and FMPT. However this is not the case. Since $(N-1)^2$ is equal to one in this limit, rather than zero, this is not consistent. However, the total multiplicity of all the representations in Eq. 157 is just $N^2$ (the number of components of the matrix), which does tend to zero in the replica limit $N \to 0$. This suggests that we should write a Lagrangian for a matrix $X$ without imposing any condition on its row or column sums.$^{37}$ This is what we do next.
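The counting argument of this section amounts to evaluating a few polynomials in $N$ at the two replica limits. The short sketch below tabulates them; the dictionary keys simply mirror the field names $S$, $R$, $C$, $\hat X$ used in the text, and the function name is ours.

```python
def multiplicities(N):
    """Dimensions of the pieces of a general N x N matrix under S_N x S_N:
    singlet S, row part R, column part C, and the part X_hat with vanishing
    row and column sums."""
    return {"S": 1, "R": N - 1, "C": N - 1, "X_hat": (N - 1) ** 2}

for N, label in ((1, "MPT, N -> 1"), (0, "FMPT/RTN, N -> 0")):
    dims = multiplicities(N)
    print(label, "| X_hat alone:", dims["X_hat"], "| all fields:", sum(dims.values()))
```

Keeping only $\hat X$ leaves no fields at $N = 1$ but one field at $N = 0$, whereas keeping all four pieces gives $N^2$ components, which vanishes at $N = 0$.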
G. Field theory for random tensor network/FMPT

Let us denote the unconstrained real $N \times N$ matrix by $Y$, to distinguish it from the matrix $X$ above which obeyed linear constraints. Assuming only $G_N$ symmetry and no constraints on $Y$, we argue below that the most relevant terms as $N \to 0$ are contained in the Lagrangian of Eq. 158. The parameter that drives this theory off criticality is $r$, the coefficient of the linear term (not to be confused with the measurement rate in previous sections, also denoted $r$). Since no constraint is imposed on $Y$, this linear term does not vanish (contrast Sec. VI E). The term $\sum_{ab} Y_{ab}^2$ is absent because its coupling can be set to zero by a shift $Y_{ab} \to Y_{ab} + C$ with a constant $C$, i.e. it is redundant [95]. Surprisingly, we will find below that for this theory the upper critical dimensionality of spacetime is $D = 10$.
A peculiar feature of the $N \to 0$ limit of Eq. 158, which is shared with some other replica field theories such as the Landau-Ginsburg formulation of the random field Ising model [76], is the presence of a quadratic coupling which is not zero at the critical point and which cannot be removed. This is the term $m_F^2\, Y.F.Y$. If we instead study the above theory for a larger value of $N$, for example in the $N \to 1$ limit, then the effect of $m_F^2$ is simply to give a mass to certain representations in the decomposition of $Y$. The corresponding fields can therefore be eliminated at large scales/low momenta. Doing so returns us to the critical theory proposed in Sec. VI E for the MPT, with $\mu^2 \sim -r$. This is shown explicitly in App. F 2. However, writing the propagator explicitly shows that the limit $N \to 0$ that is of interest to us in this section does not commute with the limit of small momentum [76]. Therefore we have to retain the $Y.F.Y$ term explicitly. Note that this term, written out in components, includes contributions such as $Y_{12} Y_{13}$: this is consistent with the "repulsion" that was discussed heuristically towards the end of Sec. VI D, between pairing patterns involving a given layer. $G_N$ symmetry allows many other terms at order $Y^3$ but we argue that in the $N \to 0$ limit they contribute only less relevant couplings.

The dimensional analysis may be simplified using an approach [96-99] introduced by Cardy for the field theories of the random field Ising model [100,101] and the branched polymer [102-104]. Since decomposition into representations of $S_N$ fails in the $N \to 0$ limit, the next best thing is to exploit a decomposition into representations of an $S_{N-1}$ subgroup acting on indices $2, \ldots, N$. Here we must do this for both the row and column indices of $Y$.

37 At first sight the interpretation of these additional fields may seem obscure, given that for a permutation matrix they are trivial constants. This may be more transparent in the approach of Sec. VI D.
We make a linear transformation to rewrite the field $Y_{ab}$ as a field $y_{\alpha\beta}$ whose indices $\alpha$ and $\beta$ take values in the set $\{+, -, 2, \ldots, N\}$. In the resulting quadratic Lagrangian (Eq. 162), because of the linear constraints $\sum_j y_{j+} = 0$ etc., sectors (2) and (3) each contain $N - 2$ copies of the same theory, and sector (4) contains $(N-2)^2$ copies of the same theory.$^{38}$ In the above rewriting, terms with couplings that vanish as $N \to 0$ were dropped [96]. Before writing the interaction terms, we use the quadratic terms to assign engineering dimensions to the various fields (see App. F 3 for details). We assign dimensions $x_{\alpha\beta}$ such that all the quadratic terms in the Lagrangian Eq. 162 are marginal at the critical point $r = 0$; these dimensions depend on $D$ (recall that in the case of a circuit $D = d + 1$ is the spacetime dimension). The RG eigenvalue of a cubic interaction term $y_{\alpha\beta}\, y_{\alpha'\beta'}\, y_{\alpha''\beta''}$ is then determined by the difference in the number of $+$ indices it contains and the number of $-$ indices it contains among $\alpha, \ldots, \beta''$ (App. F 3). However, the terms that can appear are constrained by the $G_N$ symmetry (whose effects are less obvious in the new representation). We confirm in App. F 3 that the cubic term $g \sum_{ab} Y_{ab}^3$ shown in Eq. 158 is strictly more relevant than the other symmetry-allowed cubic terms (at least for large enough $D$) and is of the form

$$g \sum_{ab} Y_{ab}^3 = \frac{g}{2} \Big[ 6\, y_{++} \left( y_{++} y_{--} + 2\, y_{+-} y_{-+} \right) + 6\, y_{++} \left( y_{-k} y_{+k} + y_{j-} y_{j+} + y_{jk} y_{jk}/4 \right) + 3 \left( y_{+-}\, y_{j+} y_{j+} + y_{-+}\, y_{+k} y_{+k} \right) + 3\, y_{j+}\, y_{+k}\, y_{jk} \Big] + \text{less relevant terms}.$$
The RG eigenvalue of g is (10−D)/2, so the upper critical dimension for this theory is D = 10.
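The power counting can be checked mechanically. The sketch below assumes an additive assignment $x_{\alpha\beta} = x_\alpha + x_\beta$ with $x_+ = (D-6)/4$, $x_- = (D+2)/4$, $x_j = (D-2)/4$. This assignment is inferred here from the stated eigenvalue $(10-D)/2$ and from the dependence on the numbers of $+$ and $-$ indices; it is not quoted from App. F 3.

```python
from fractions import Fraction

def field_dimension(alpha, beta, D):
    """Assumed engineering dimension x_{alpha beta} = x_alpha + x_beta
    (an assignment inferred for this sketch, not quoted from the paper)."""
    x = {"+": Fraction(D - 6, 4), "-": Fraction(D + 2, 4), "j": Fraction(D - 2, 4)}
    return x[alpha] + x[beta]

def rg_eigenvalue(term, D):
    """RG eigenvalue D - (sum of field dimensions) of a cubic monomial."""
    return D - sum(field_dimension(a, b, D) for a, b in term)

# Index patterns of the cubic monomials listed in the text ("j" = any index in 2..N).
leading_terms = [
    [("+", "+"), ("+", "+"), ("-", "-")],
    [("+", "+"), ("+", "-"), ("-", "+")],
    [("+", "+"), ("-", "j"), ("+", "j")],
    [("+", "+"), ("j", "-"), ("j", "+")],
    [("+", "+"), ("j", "j"), ("j", "j")],
    [("+", "-"), ("j", "+"), ("j", "+")],
    [("-", "+"), ("+", "j"), ("+", "j")],
    [("j", "+"), ("+", "j"), ("j", "j")],
]

D = 8
print([rg_eigenvalue(term, D) for term in leading_terms])   # each equals (10 - D)/2
```

Under this assignment every monomial has two more $+$ than $-$ indices and the same eigenvalue, which vanishes at $D = 10$.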

H. Consequences of the MPT field theory
We discuss some simple consequences of the putative field theory for the MPT, deferring a detailed analysis, and a discussion of the more complicated theory in the previous section, to another time. However, first we note an important caveat to the discussion.
Our initial hope was that the large-$D$ limits of these field theories would give exact results both for the all-to-all circuits and for tree tensor networks. For example, this is what we found for the classical minimal cut toy model (because all-to-all percolation could be understood using the field theory for percolation in high dimensions, Sec. III E). But the class of tree tensor networks that we understand best, including those derived from the all-to-all FMPT circuit with Haar-random gates, seems not to be described by the field theory of Sec. VI G, simply because it is hard to imagine the exponential scaling of the order parameter in Eq. 51 being reproduced by a mean-field treatment of Eq. 158. Therefore it seems unlikely that the all-to-all circuits studied in this paper are described by the $d \to \infty$ limit of the above field theories. We do not yet understand the reason for this difference.
It is not ruled out that our Lagrangians overlook some crucial structure, and that as a result they do not capture any models of measurement circuits or random tensor networks, even in finite dimensions. For present purposes we will assume this pessimistic scenario does not hold, and that the two field theories in Secs. VI E and VI G do capture at least some class of models for the MPT and for the FMPT/RTN. We will explore these issues further elsewhere.
The simpler of the two field theories is that in Sec. VI E for the MPT, involving a field $X_{ab}$ with vanishing row and column sums. As a result of the cubic term, this theory has upper critical spacetime dimension $d + 1 = 6$. Interestingly, the logic of Sec. III E for the percolation problem above 5 spatial dimensions applies in this case too, since it relied only on the engineering dimensions of the fields. We can therefore carry over the exact exponent values so that (neglecting physics on timescales shorter than $L = N^{1/d}$, see Sec. III E) the natural scaling variables in high dimensions are again

$$t/N^{1/5}, \qquad N^{2/5}\, \delta r,$$

where $\delta r = r - r_c$ is the parameter driving the transition (and the number $N$ of spins should not be confused with the replica number in the preceding sections). Let us consider the operator entanglement in the ordered phase, still above the upper critical dimension. The plateau value of the operator entanglement, $S_2 \sim sN$, is proportional to the energy cost of a domain wall in $X$ that spans the system in the spatial directions, as discussed in the following section. In high dimensions the scaling of $s$ follows from dimensional analysis, giving $s \propto \mu^5 g^{-2}$ in the notation of Eq. 154, or in terms of the deviation $\delta r$ from criticality,

$$s \propto |\delta r|^{5/2},$$

which is the same exponent as for the classical problem in high dimensions. A similar scaling form will again apply, $S_2 = H[t/N^{1/5},\, N^{2/5}\delta r]$, but with a different scaling function $H$. The size of the order parameter $X$ itself, which may be measured using appropriate correlation functions, grows linearly with the distance from the critical point, $X \sim |\delta r|$. Again we have a characteristic timescale $\tau = N^{1/5}\, W(N^{2/5}\delta r)$, for an appropriate scaling function $W$ close to the transition.

Below 5+1 dimensions the scaling is different, because the cubic term is no longer dangerously irrelevant. The appropriate scaling variables are as usual

$$t/L, \qquad \delta r\, L^{1/\nu}, \qquad L = N^{1/d},$$

where $\nu$ is the correlation length exponent for the field theory (154). Exponents could be computed in a $6 - \epsilon$ expansion and will differ from percolation exponents (since the structure of the field theory is different, despite sharing the same upper critical dimension). In the ordered phase there is still an exponentially long timescale, with $\ln \tau \sim N\, |\delta r|^{d\nu}$ close to the transition (Sec. VI I).

I. Long timescale in the entangled phase
So far in this section we have focussed on the continuum description close to the transition. Here we discuss something simpler, namely the emergence of a timescale that (in the entangled phase) is exponentially large in the number of spins, and the contrasting short timescale in the disentangled phase. We may consider either a model in $d$ spatial dimensions with $N = L^d$ spins, or the all-to-all model. The results in this section are independent of the conjectural field theories above, as they rely only on more basic features of the effective spin model (pairing field) descriptions.
The appearance of a long timescale may be understood in analogy to standard 1D or quasi-1D classical models. Here the 1D coordinate is time: see Fig. 30.
In the ordered phase the pairing field (either $\sigma$ on the lattice or $X$ in the field theory) has long range order across a temporal slice and, after coarse-graining sufficiently, we may think of it as a function only of time. There is then a competition between the free energy cost of imposing a domain wall at a particular time, which scales as $sN$ with $s > 0$, and the entropy $\ln t$ associated with translating the domain wall in the time direction. At a timescale $\tau$ with

$$\ln \tau \propto s N \qquad (175)$$

the translational entropy wins, and domain walls proliferate. Long-range order then no longer extends from the initial to the final time. By the identification of the entanglement with a free energy, this also means that the entanglement begins to decay exponentially with time.
Recently the exponentially long timescale in the entangled phase has been discussed from several points of view. Refs. [12] and [54] consider a limit where the unitary evolution during a unit time can be treated as a $2^N \times 2^N$ Haar random unitary (see also App. E here for related considerations). Ref. [27] has also given an analysis in terms of Ising domain walls that is similar to our considerations below.
The proportionality in Eq. 175 allows for an order 1 constant: however we expect that $N^{-1} \ln \tau$ vanishes in the same manner as $s$ when the critical point is approached from the entangled side (for example with the same power of the tuning parameter when this dependence is a power law).
At times sufficiently shorter than $\tau$ the operator entanglement entropy has a plateau at an extensive value. The plateau value is corrected by a negative subleading term whose magnitude grows logarithmically with time. In terms of the pairing field, the plateau regime is that where the number of domain walls is the minimal number allowed by the boundary conditions. For $S_2$, in the plateau regime, it is in fact sufficient to think about an Ising domain wall in a system with (Ising symmetric) disorder, for the reason discussed towards the end of Sec. VI B. That is, we expect that the replica trick can be avoided in the strongly entangled regime. It is also possible to argue for the Ising picture using the replica treatment, by arguing that in this regime the replica theory is equivalent to the replica representation of a disordered Ising model [33].$^{39}$ If we neglect quenched disorder, then we obtain

$$S_2 \simeq s N - \ln t \qquad (176)$$

in the plateau region. The second term is the contribution from translational entropy, arising because the centre of mass temporal coordinate $t_{\rm dw}$ of the domain wall can be located anywhere in $(0, t)$. The form in Eq. 176 was obtained in Ref. [12] in a limit of very dilute measurements, where the system can be viewed as completely scrambled by a random unitary between each measurement. Ref. [27] gave a picture in terms of Ising domain walls equivalent to the one presented above. Here we have also suggested how the effective Ising model can be justified (in an appropriate regime and at the level of universal properties) rather than being only a heuristic model. Our consideration also implies that we should take into account quenched disorder, as discussed below.
(For another application of domain wall entropy in an effective 1D model to quantum chaos, see Refs. [85,105].) As a check on the replica picture, we have also considered a toy model for the entangled phase that involves multiplying large random matrices. A crude treatment in App. E (which neglects spatial structure, random fluctuations, and also the $n$-dependence of the Rényi entropies) reduces to computing the singular values of a sub-block of a large Haar-random unitary. This treatment also yields Eq. 176, and shows that the plateau value $sN$ determines the timescale for exponential decay of $S_n$ in the regime of much later times, as expected from the above. An analysis of related random matrix models has recently been presented in Ref. [54].
Eq. 176 is the simplest picture, neglecting quenched disorder.
In reality there will be more complex crossovers. For example, in the all-to-all model there may be a regime of timescales where the subleading correction is not $\ln t$ but instead proportional to $\sqrt{N \ln t}$, as in the classical minimal cut problem (Sec. III D). This is because the conditional free energy $F(t_{\rm dw})$, given by fixing $t_{\rm dw}$, will vary with $t_{\rm dw}$ due to randomness: $F(t_{\rm dw}) = sN + \eta(t_{\rm dw})$. In high enough dimensions, and therefore presumably also in the all-to-all model, the typical fluctuations $\eta(t_{\rm dw})$ will be Gaussian with a scale $\sqrt{N}$. Although these fluctuations are much smaller than $N$, they are in principle much larger than 1. Therefore at early enough times the free energy will be dominated by the optimal (most negative) value of $\eta(t_{\rm dw})$, rather than by translational entropy.$^{40}$ But at larger times, there may be a regime where the $\ln t$ entropy again dominates, giving the functional form in Eq. 176. At still larger times multiple domain walls will proliferate (and the full replica treatment is required) and eventually $S_2$ decays exponentially in time.

$^{40}$ This is similar to what happens for the classical minimal cut. If $\eta$ may be treated as Gaussian, the correction to the entanglement is of order $\sqrt{N \ln t}$ in this regime.

The fact that only a single domain wall plays a role in the plateau regime means that there is an approximate factorization property for $S_2$ in a given realization of the circuit. If we divide $V$ into two parts, $V^{(1)}$ corresponding to evolution from 0 to $t'$ and $V^{(2)}$ from $t'$ to $t$, then

$$e^{-S_2} \simeq e^{-S_2^{(1)}} + e^{-S_2^{(2)}}.$$

The first term includes configurations with $t_{\rm dw} \in (0, t')$ and the second those with $t_{\rm dw} \in (t', t)$. (This is approximate not only because it neglects configurations with multiple domain walls but also because it does not correctly treat domain walls with $t_{\rm dw}$ close to $t'$.)

We now contrast the properties of the disentangled phase with those of the entangled phase. Let us take the limit $N \to \infty$ first, so that as usual we can define the operator entanglement per spin at a given time:

$$s_2(r, t) = \lim_{N \to \infty} \frac{S_2}{N}$$

(we have written this equation for $S_2$, but the choice of Rényi index $n \geq 1$ should not be crucial). In contrast to the quasi-1D limit discussed above, this is the free energy cost, in an infinite slab of finite thickness, of imposing the domain wall boundary conditions described in Sec. VI B.
In the disentangled phase the free energy cost per unit transverse area decays exponentially with the thickness of the slab, so that $s_2(r, t)$ decays exponentially to zero with time.
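The single-domain-wall picture of the plateau regime discussed above can be illustrated with a small numerical sketch. It assumes a toy model in which $e^{-S_2} = \sum_{t_{\rm dw}} e^{-F(t_{\rm dw})}$ with $F(t_{\rm dw}) = sN + \eta(t_{\rm dw})$, and it further assumes that the $\eta$ are independent Gaussian variables of scale $\sqrt{N}$ at each time step (an assumption made here for illustration only). The function name and the parameter values are hypothetical.

```python
import numpy as np

def single_wall_entropy(free_energies):
    """Return S_2 = -ln sum_{t_dw} exp(-F(t_dw)), evaluated stably."""
    f = np.asarray(free_energies, dtype=float)
    f_min = f.min()
    return f_min - np.log(np.exp(-(f - f_min)).sum())

# Illustrative parameters (hypothetical, not taken from the paper).
N, s, t = 400, 0.3, 1000
rng = np.random.default_rng(0)

# Clean case: F = sN for every wall position, so S_2 = sN - ln t.
S_clean = single_wall_entropy(np.full(t, s * N))

# Disordered case: eta assumed i.i.d. Gaussian with scale sqrt(N).
eta = np.sqrt(N) * rng.standard_normal(t)
F = s * N + eta
S_disordered = single_wall_entropy(F)

# Factorization: split the allowed wall positions at t' and recombine.
t_prime = 300
S_first = single_wall_entropy(F[:t_prime])
S_second = single_wall_entropy(F[t_prime:])
S_recombined = -np.logaddexp(-S_first, -S_second)
```

In this toy model the factorization is exact by construction, whereas in the circuit it is only approximate. Comparing `S_disordered` with `S_clean` shows how the most negative value of $\eta$ can dominate over the translational entropy $\ln t$ when $\sqrt{N}$ is large.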

J. Variants and comments
In this subsection we discuss a few extensions of the field theory approach we have presented, as well as some open questions.
The measurement problems and random tensor networks that we have discussed so far have no internal global symmetries. One could also consider, say, measurement dynamics with an Ising symmetry [37,40]. The definition of the pairing field in Sec. VI D allows such symmetries to be incorporated, and suggests that in many cases they will change the universality class of the entanglement transition.
For example, if the tensor network in Sec. VI D has a $\mathbb{Z}_2$ Ising symmetry that changes the sign of $S_a$ (and if we assume that the field whose mass vanishes at the transition is still $X_{ab} \sim S_a S_b$) then odd powers of $X$ are forbidden by symmetry in the continuum Lagrangian, which completely changes its structure in the limits of both $N \to 0$ and $N \to 1$. This symmetry consideration highlights a feature of the discussion in Sec. VI D, which is that the definition of $X_{ab}$ involves choosing a local basis. In many cases this choice may not seem natural: for example, in many random models, the statistical invariance property emphasized in Sec. IV B 1 ensures that any choice of local basis is equivalent to any other. (The exact mappings to models of permutations avoid having to choose a basis, but on the other hand it is less obvious how to coarse-grain them.) An open question is whether this necessity of choosing a basis is just an aesthetic issue, or a fundamental one. Is it possible, for example, that the statistical invariance property imposes constraints on the continuum theory that we have neglected to take into account?
Other restrictions on the unitaries, not related to conventional symmetries, can also change the symmetries of the replica theory. For example, if all the unitaries are real-valued [106] then there is no distinction in the bulk between forward and backward layers. The symmetry group $G_N$ is then enlarged to $S_{2N}$. In this case we can introduce a pairing field in a similar manner to Sec. VI D, now with a replica symmetry action like that in standard disordered magnets and spin glasses. (The restriction to Clifford unitaries [7,12,15,17,22] is a more drastic change, which may require a different theoretical approach.)

The picture in Sec. VI D relates random tensor networks (for which the limit $N \to 0$ is the appropriate one) to the language typically used to discuss spin glasses. This relation raises the question of whether other types of replica symmetry breaking, or other types of glass transition [91], are relevant to natural choices of circuit or tensor network. For example, one could imagine a second transition taking place inside the entangled phase for some choices of tensor network. At the entanglement transition, the $2N$ layers form a collection of $N$ pairs, breaking $G_N$ symmetry down to $S_N \times \mathbb{Z}_2$. Can the residual $S_N$ symmetry be broken in a subsequent transition? What are the entanglement properties of the resulting (presumably glassy) phase?
A statistical mechanics problem that provides a possible analogy for some of these phenomena is the directed polymer with random complex (or random sign) weights [107-112]. The replica formulation of this problem involves $N$ copies of the polymer's partition function and $N$ copies of its complex conjugate. Averaging over random phases forces the copies to form pairs in order to avoid phase cancellation [109], in analogy to the pairing phenomenon in the circuits. Further, the paired object (a bound state of polymers from different copies) may itself undergo phase transitions due to disorder. Perhaps this simpler problem can provide lessons for the circuit.

K. Free fermion measurement dynamics
Models of free fermions subjected to stochastic dynamics [36,37,51-53,113-117] can also show a transition in $d > 1$ between two phases with differing amounts of entanglement [36]. However, instead of an area law and a volume law phase (for states in finite dimensions), we have an area law phase and a phase with a logarithmic violation of the area law [36,52,53].
We may also characterize the two phases by transmission of information between initial and final time, which gives a distinction that makes sense in any dimension or for the all-to-all setup. For concreteness we may consider the latter case. The model of Ref. [36], which used the language of Majorana fermions, has a simple field theory description that is related to a model of classical loops (random walks) representing Majorana worldlines. The quasi-one-dimensional regime which is relevant here has been studied in depth in Ref. [118], which also characterizes the statistical properties of random samples. Here we consider only some more basic average quantities.

FIG. 31. The field theory description of the Majorana measurement model of Ref. [36] has a continuous replica-like symmetry, allowing smooth domain walls that give a more rapid decay of $S_n$ than in the interacting case where replica symmetry is discrete (App. B2). This exhibits a more general feature of free fermion models.
The characteristic timescale for the operator entanglement to decay is of order $N$, where $N$ is the number of lattice sites, rather than being exponentially large in $N$ as we found in the interacting case. This is a generic feature of free fermion models, as discussed below. Within the "more entangled" of the two phases, the scaling of the operator entanglement is given by Eq. 178. Here $K$ (the sigma model stiffness) is an order-one constant deep in the phase, and vanishes as $K(r) \sim (\delta r)^2$ upon approaching the transition at $r = r_c$ to the disentangled phase ($c$ is a fixed order-1 constant). Note that the scaling in Eq. 178 is identical to the conductivity of a disordered $N$-channel wire, showing the crossover from Ohm's law to localization on a timescale of order $K(r)N$ [118,119].

The reason for the reduced timescale in the entangled phase (of order $N$ compared to the exponential timescale in interacting models) is that the appropriate replica field theory has continuous, rather than discrete, replica symmetry. In the ordered regime, a nonlinear sigma model description may be used. Domain walls are smooth objects whose free energy cost decreases with their thickness, which in the case of interest is the temporal duration $t$ of the evolution: see Fig. 31.
For this reason, we anticipate that the scaling in Eq. 178 applies to more general free fermion models with measurement. (The scaling of K(r) close to r c will depend on symmetries and dimensionality. The constant c may also depend on n in general.) General free fermion models can be formulated using the replica trick, in close analogy to replica sigma models for Anderson localization [120], leading to continuous replica symmetries. However in addition to the N → 0 limit familiar from localization, the N → 1 limit is now also of interest. We will discuss this elsewhere [87].
The timescale of order $N$ for free fermions agrees with the recent results of Ref. [54], which studied a model in which measurements of a single fermionic mode were alternated with Gaussian unitaries acting on the entire system. This model has even less locality structure than the all-to-all circuit. In this limit also, the authors found that $O(N^2)$ measurements were required to forget the initial state: this corresponds to $t = O(N)$ in our conventions.

VII. OUTLOOK
It remains an open question to what extent the properties of the MPT, in various settings, will turn out to be tractable (either analytically or numerically). In this paper, however, we have shown that exact results are possible in certain regimes. We close by summarizing the regimes we have studied, and some of the outstanding questions.
We began our analysis by considering the "classical limit" of the MPT in the all-to-all setting. We showed that a fairly complete picture is possible, including an analytical derivation of the critical point, critical exponents, and scaling forms for the entanglement.
Our results for quantum trees, including those obtained from an all-to-all circuit, show that exact results are also possible even far from this classical limit. In this setting it was possible to demonstrate that an entanglement transition occurs at a definite nonzero measurement rate that is distinct from the classical value. (It may even be possible to obtain rigorous results on the phase diagram using the recursion relation approach.) The critical scaling on the tree is qualitatively different from a simple percolation picture.
We argued that the critical point on the tree is the same as the critical point of the FMPT in the all-to-all circuit (which is locally treelike). Since the location of the critical point in the circuit is difficult to check numerically, this equivalence has not yet been demonstrated clearly by our numerics. In the future we would like to have a clearer demonstration (or disproof) of this relationship between the tree and the all-to-all circuit. Our results based on the tree were also restricted to the FMPT; it would be interesting to understand to what extent they are relevant to the MPT.
The scaling on the tree raised several questions that we hope to return to elsewhere. First, it will be worthwhile to examine the relationship between the random recursion relation studied here and approaches to tree tensor networks based on replicas [65]. Second, we raised the question of whether there are multiple universality classes on the tree. This question remains to be settled, and could perhaps be addressed by generalizing our approach to a broader class of trees (with more general distributions of tensors or with larger bond dimension). Finally, it remains to be understood how to reconcile the scaling that we found on the tree with field theory.
In our numerical study of the MPT we have proposed observables that have benefits over the state entanglement, in that they do not require one to specify a spatial subregion. (Constructing such observables is crucial in the all-to-all setting, for which there is no meaningful distinction between area law and volume law phases, but they are also useful in 1+1D, where significant finite size effects make it important to avoid introducing lengthscales that are smaller than the system size.) We demonstrated numerically that there is a long timescale in the entangling phase over which some aspects of unitarity are retained; for example, two initially orthogonal states remain approximately orthogonal.
The optimal numerical protocol for studying critical properties in the all-to-all circuit remains to be settled. One complication is the lack of a priori knowledge of how the characteristic timescale scales with $N$ when $r = r_c$. In the 1+1D problem, establishing that the dynamical exponent is equal to unity [6,15] allows one to reduce the number of independent variables in scaling collapses by fixing $t/L$ to a constant. Our candidate field theory for the MPT suggests that in high dimensions the appropriate scaling variable is $t/N^{1/5}$, but it is unclear whether this theory applies to the all-to-all circuit.
The proximity of the classical critical point ($r_c^{\rm cl} = 0.8$) to the quantum one (e.g. $r_c = 0.749$ for the FMPT with Haar-random gates) in the ensembles we studied may also complicate the numerical analysis. For this reason it might be useful to study an all-to-all model (for example, involving weak measurements) in which the classical transition is eliminated entirely. It will also be interesting to relax the unitary invariance property of the gate distribution: the strong constraints imposed by this invariance are a surprising feature of our analysis of the quantum tree.
Finally, we discussed the replica approach to the MPT and to random tensor networks, both in the two phases and near the critical point, and we have made concrete proposals for field theories for these problems. The domain of applicability of these theories remains to be determined.
Appendix A: More on classical problem

1. Density of infinite cluster
Here we briefly derive Eq. 18, which describes the probability $f_\infty$ that a given node in the interior of the classical graph is connected to an infinite number of other nodes in the limit of infinite $N$ and $T$. In other words, $f_\infty$ describes the density of the infinite cluster.
Consider the process of building a tree starting with an arbitrarily chosen node, as depicted in Fig. 4(c). The starting node has four possible edges, each of which may be severed by a measurement. If we denote by $e_\infty$ the probability that following a given edge will lead to a subtree with an infinite number of nodes, then

$$f_\infty = 1 - (1 - e_\infty)^4. \qquad (A1)$$

The quantity $(1 - e_\infty)^4$ denotes the probability that none of the four edges connected to the starting node leads to an infinite number of other nodes. Following a particular edge, one may next encounter either a measurement (with probability $p = r/(2 - r)$) or a node (with probability $1 - p$). The probability that this node is connected to an infinite number of other nodes at later generations is given by $1 - (1 - e_\infty)^3$. Thus we can write a self-consistency relation for $e_\infty$, given by

$$e_\infty = (1 - p)\left[1 - (1 - e_\infty)^3\right]. \qquad (A2)$$

Near the critical point, $p = 2/3 + \delta p$, where $\delta p \simeq (25/18)\,\delta r$ and $\delta r = r - r_c$ with $|\delta r| \ll 1$. On the disconnected side of the transition, $e_\infty = 0$, while just on the connected side (small negative $\delta r$) $0 < e_\infty \ll 1$. Expanding Eq. A2 for small $\delta r$ gives $e_\infty \simeq -(25/6)\,\delta r$. A similar expansion of Eq. A1 gives Eq. 18 of the main text.
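As a quick numerical check of the expansion above, the self-consistency relation can be solved in closed form. The sketch below assumes the relation $e_\infty = (1-p)[1 - (1 - e_\infty)^3]$ with $p = r/(2-r)$, whose nonzero root follows from the quadratic $e_\infty^2 - 3 e_\infty + 3 - 1/(1-p) = 0$. The function names are illustrative and not taken from the paper.

```python
import math

def edge_probability(r):
    """Probability e_inf that following an edge leads to an infinite subtree."""
    p = r / (2.0 - r)              # probability that the edge is cut by a measurement
    if p >= 2.0 / 3.0:             # disconnected side: only the trivial root exists
        return 0.0
    # Nonzero root of e = (1 - p) * (1 - (1 - e)**3), i.e. e^2 - 3e + 3 - 1/(1-p) = 0.
    root = 0.5 * (3.0 - math.sqrt(4.0 / (1.0 - p) - 3.0))
    return max(0.0, root)

def infinite_cluster_density(r):
    """Density f_inf = 1 - (1 - e_inf)^4 of the infinite cluster."""
    e = edge_probability(r)
    return 1.0 - (1.0 - e) ** 4

# Compare with the leading-order expansion e_inf ~ -(25/6) * (r - r_c), r_c = 0.8.
r = 0.799
exact = edge_probability(r)
leading_order = -(25.0 / 6.0) * (r - 0.8)
```

Close to $r_c = 0.8$ the two values agree to within a fraction of a percent, and the agreement degrades smoothly further from the transition as higher-order terms in $\delta r$ become important.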

2. Effective 1D field theory
We derive the mapping between the "layered Erdős-Rényi" percolation model and a one-dimensional field theory that was described in Sec. III B.
This is a bond percolation model with sites labelled $(i, t)$ with $i = 1, \ldots, N$ and $t = 1, \ldots, T$. Generalizing slightly from the case in the text, let a bond between sites $(i, t)$ and $(j, t)$ on the same time-slice be present with probability $b/N$, and a bond between sites $(i, t)$ and $(j, t+1)$ on the next slice be present with probability $b'/2N$. The average degree of a bulk node is $z = b + b'$, and from considerations like those in Sec. III A the critical case is $z = 1$.
Bond percolation can be simply mapped to the Potts model with $Q \to 1$ states (see Ref. [76] for a review). We introduce a Potts spin $\sigma(i, t) = 1, \ldots, Q$ on each site $(i, t)$, and couplings for pairs of sites that are allowed to be connected by a bond. For each pair of spins $\sigma$, $\sigma'$ that is allowed to be connected there is a term

$$p\,\delta_{\sigma,\sigma'} + (1 - p)$$

in the Boltzmann weight, where $p$ is the bond probability. The two terms correspond, in a diagrammatic expansion, to the presence and absence of the bond, respectively. Sites in the same percolation cluster have the same Potts spin state because of the Kronecker deltas on the bonds. Summing over spin states gives a factor of $Q^{\#{\rm clusters}}$ which becomes 1 in the replica limit. Spin correlation functions can be used to diagnose connectivity. The probability that two sites $(i, t)$ and $(j, t')$ are in the same cluster is [76]

$$p_{\rm conn}(i, t; j, t') = \lim_{Q \to 1} \frac{Q \langle \delta_{\sigma(i,t),\sigma(j,t')} \rangle - 1}{Q - 1}.$$

Below, the limit $Q \to 1$ will be left implicit.
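For reference, the standard Fortuin-Kasteleyn expansion behind this mapping can be written out explicitly. The sketch below assumes, for simplicity, a single bond probability $p$ for all allowed pairs (in the model above $p$ depends on the type of bond), and the symbols $B$, $G$, $|G|$ and $C(G)$ are notation introduced here rather than taken from the text.

```latex
% B = set of allowed bonds, G = subset of occupied bonds,
% |G| = number of occupied bonds, C(G) = number of connected clusters of G.
\begin{align}
Z &= \sum_{\{\sigma\}} \prod_{\langle x y \rangle \in B}
     \left[ (1-p) + p\,\delta_{\sigma_x,\sigma_y} \right]
   = \sum_{G \subseteq B} p^{|G|} (1-p)^{|B|-|G|}\, Q^{C(G)}, \\
\langle \delta_{\sigma_x,\sigma_y} \rangle
  &= p_{\rm conn}(x,y) + \frac{1}{Q}\left[ 1 - p_{\rm conn}(x,y) \right]
  \quad\Longrightarrow\quad
  p_{\rm conn}(x,y) = \frac{Q \langle \delta_{\sigma_x,\sigma_y} \rangle - 1}{Q-1}.
\end{align}
```

The second line uses the fact that two sites in the same cluster necessarily share a spin value, while sites in different clusters agree with probability $1/Q$. Here $p_{\rm conn}$ is defined with respect to the cluster weights in the first line, which reduce to those of independent bond percolation as $Q \to 1$.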
Using the fact that the bond probabilities are of order $1/N \ll 1$, the partition function may be written As is standard in the field theory formulation of the Potts model [74,75], it is convenient to use a set of $(Q-1)$-component vectors $e^\sigma$ as we see by considering $(\sum_\sigma e^\sigma_\mu e^\sigma_\nu)\, e^\tau_\nu$ and applying (A6). Writing $e(i, t) = e^{\sigma(i,t)}$, and denoting the sum of the spins in a layer by $E_t = \sum_i e(i, t)$, the partition function is (we drop an unimportant multiplicative constant) We can use two sets of Hubbard-Stratonovich fields, one set located at half-integer times, denoted $f_{t+1/2}$, to decouple the $b'$ term, and one set located at integer times, denoted $g_t$, to decouple the $b - b'$ term. Each has $Q - 1$ components. Once the $E$ appear linearly in the exponent we can sum over the spins in a given timeslice $t$ (the prime indicates that the sum is only over these spins) via which defines $V(y)$. Expanding in $y$ for small $y$ and using the identities mentioned above for the set of vectors $\{e^\sigma\}$, The tensor $d$ is [74,75] After integrating out the spins, where the final sum is over integer $t$. For $2 \leq t \leq T - 1$, At the boundaries we have e.g.
For the present we will neglect the boundary terms. The boundary condition on the field theory is important but we will fix it on physical grounds. The negative power of $N$ in $y$ will allow us to truncate the action at cubic order. Let us combine $f$ and $g$ into a field $h$ labelled by both integer and half-integer values, $h_t = g_t$, $h_{t+1/2} = f_{t+1/2}$. The lattice field theory is then (with $\tau, \tau' \in \mathbb{Z}/2$) where $A_3$ contains the cubic terms. To avoid clutter, let us immediately set $Q = 1$ in the dispersion relation. The matrix $T$ is then (the first row/column shown correspond to a half-odd-integer index value): We have defined This becomes imaginary when $b' > b$ (which includes the line $b = 0$ on which we do simulations of this model) but this does not present a problem in the formal derivation below.$^{41}$ Let us write where $z$ is the mean degree of a site, and the location of the critical point is $z = 1$ for any value of $\Delta$. The dispersion relation has one "massive" mode, and one mode that becomes massless at the critical point $z = 1$, at frequency $\omega = 0$, with the eigenvalue of $T$ being $(1 - z) + \frac{z - \Delta}{4}\omega^2 + O(\omega^4)$. At $z = 1$ and $\omega = 0$ the eigenvector of this mode is $(g, f) \propto (\sqrt{\Delta}, \sqrt{1 - \Delta})$. For the low-frequency theory we make the coefficient of this mode a slowly-varying field, $\phi(t)$. Let us write for the parameter that vanishes at the phase transition. Let us drop the small parameter $\delta z$ except in the mass term, where it is the leading factor (Eq. A21). Thanks to the small prefactor $1/\sqrt{N}$ of the cubic term, we may take the continuum limit in a controlled manner. The cubic term is negligible for frequencies $\omega$ of order 1 due to the small prefactor, but important at parametrically small frequencies (since it is RG relevant). Since only small frequencies are important we can simply insert the form of the low-lying mode at $k = 0$ into the cubic term without any need to explicitly integrate out high-frequency modes.
The final result is with the "Lagrangian" (again a factor of $Q$ has been set to 1 in the denominator of the final term) Above, all the $(Q-1)$ components of the field $\phi$ are independent. We can write a more explicit form at the cost of using $Q$ fields that obey a linear constraint (summing to zero). For notational convenience we write them as the components of a $Q \times Q$ diagonal matrix with components The constraint is tracelessness Eq. A23 becomes This is the result given in the text, in the special case $b = 0$ (i.e. $\Delta = -z \simeq -1$ close to the critical point).
Close to the critical point, the connectedness correlation function for sites at distinct times is where $\phi_1$ is an arbitrarily chosen component. We can also write this as In App. A 4 we present results for the connectedness correlation functions of boundary points. Since the boundary conditions on the Potts spins are free, this corresponds to the "ordinary" surface transition (discussed for percolation in Refs. [121-123]) where the boundary spin operator is $\propto \partial_t \Phi$ in the continuum theory rather than $\Phi$ as in the previous equation [76]. This gives the scaling forms in App. A 4.
Finally let us consider the percolation probability $P_{\rm perc}$. This can be used to define a characteristic timescale $t_*(r, N)$ for the classical problem, and it is much simpler to formulate in field theory than the minimal cut cost. As discussed in Sec. III E, all this carries over to finite spatial dimensions $d > 5$ by setting $N = L^d$. $P_{\rm perc}$ is equal to $1 - e^{-\Delta F}$, where $\Delta F$ is the free energy cost of imposing twisted boundary conditions$^{42}$ on the Potts spins [78].

$^{42}$ Translating these BCs into the continuum field theory Eq. A30 gives boundary magnetic field terms in Eq. A23 of the form $h\,\delta(t)\, e^\sigma \cdot \phi(t)$ (and similarly at the final time boundary) with $h \propto N^{1/2}$. At first sight one worries that the $N$-dependence of $h$ introduces another exponent that could appear in scaling forms. However we believe that the basic point is just that $h$ diverges with $N$, so that the asymptotic scaling forms are those of the $h \to \infty$ limit and the detailed $N$ dependence of $h$ determines only subleading corrections.

The scaling form is where $t$ is now the total time, and we have used the notation $r$ for the parameter driving the transition to match the circuit. Let us consider a few different regimes. By a rescaling of the field and the time coordinate we can choose to write the action in the form (suppressing order-one constants) This rewriting suggests that if we take the limit of large $N$ and $t$ (and small $\delta r$) in such a way that the scaling variable $\delta r\, t^2$ is fixed while $N/t^5$ becomes large, $\Delta F$ is given by a saddle point action (we will not try to make this precise in this replica theory), In particular, at the critical point Note that for a system with finite $d > 5$, unlike the case $d < 5$, $P_{\rm perc}$ is parametrically close to 1 at the critical point of a system with $t \sim L$, i.e. with $t \sim N^{1/d}$. This is because, for percolation above the upper critical dimension, there are many percolating clusters in a large hypercubic sample at $p_c$ [124-129].
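The rescaling described above can be made explicit in a schematic setting. The sketch below assumes a single-component caricature of the action with kinetic, mass and cubic terms, keeping only the $1/\sqrt{N}$ prefactor of the cubic term and suppressing order-one constants and the Potts index structure. The rescaled variables $u$ and $\tilde\phi$ are notation introduced here.

```latex
% Schematic action (assumption: single-component caricature, O(1) constants dropped):
%   S = \int_0^t dt' \left[ (\partial_{t'}\phi)^2 + \delta r\,\phi^2 + N^{-1/2}\phi^3 \right].
% Rescale t' = t\,u with u \in (0,1), and \phi = (N^{1/2}/t^2)\,\tilde\phi:
\begin{equation}
S = \frac{N}{t^5} \int_0^1 du
    \left[ (\partial_u \tilde\phi)^2 + (\delta r\, t^2)\,\tilde\phi^2 + \tilde\phi^3 \right].
\end{equation}
```

In this form $N/t^5$ plays the role of an inverse coupling, so a saddle-point treatment is natural when $N/t^5$ is large at fixed $\delta r\, t^2$. The two combinations are equivalent to the scaling variables $t/N^{1/5}$ and $\delta r\, N^{2/5}$.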
Next, let us take $N$ large with $\delta r < 0$ small but fixed, in order to examine the exponential growth of the timescale with $N$ inside the percolating phase. By an alternative rescaling of the field, Assuming again that we can make an analogy with saddle-point solutions in more conventional theories with discrete symmetry, we anticipate a localized domain wall or "instanton" solution interpolating between the two boundary condition values of the spin, with a classical action $N \times c(r)$. The scaling form will require $c(r) \sim |\delta r|^{5/2}$ close to the critical point. At sufficiently early times there is at most one such instanton, which can be placed at any time in between 0 and $t$: The $\delta r$ dependence of the prefactor has been fixed by requiring consistency with the scaling form.$^{43}$ Therefore the plateau at $P_{\rm perc} \simeq 1$ lasts for an exponentially long time The interpretation is just that the probability of having a disconnection event at a given time is $p_{\rm break} \sim |\delta r|^{1/2} e^{-{\rm const.} \times N |\delta r|^{5/2}}$. Since the probabilities of such events are independent, at long times we have exponential decay of $P_{\rm perc}$, with a timescale also given by Eq. A35.
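The exponent $5/2$ and the prefactor $|\delta r|^{1/2}$ can be checked with the same kind of schematic rescaling. As before, the sketch assumes a single-component caricature of the action with order-one constants suppressed, and the variables $u$ and $\tilde\phi$ are notation introduced here.

```latex
% Schematic action for \delta r < 0 (assumption: single-component caricature):
%   S = \int_0^t dt' \left[ (\partial_{t'}\phi)^2 - |\delta r|\,\phi^2 + N^{-1/2}\phi^3 \right].
% Rescale t' = |\delta r|^{-1/2} u and \phi = N^{1/2} |\delta r|\,\tilde\phi:
\begin{equation}
S = N |\delta r|^{5/2} \int_0^{t |\delta r|^{1/2}} du
    \left[ (\partial_u \tilde\phi)^2 - \tilde\phi^2 + \tilde\phi^3 \right].
\end{equation}
% A localized solution of the rescaled problem has an O(1) action and O(1) width in u, so
%   instanton action  \sim N |\delta r|^{5/2},
%   instanton width   \sim |\delta r|^{-1/2} in t',
%   number of distinct positions in (0,t)  \sim t\,|\delta r|^{1/2}.
```

The last line accounts for the factor $|\delta r|^{1/2}$ multiplying $t$ in the single-instanton contribution, and the combination $N |\delta r|^{5/2} = (|\delta r| N^{2/5})^{5/2}$ is a function of the off-critical scaling variable alone, as the scaling form requires.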

3. Criticality in layered Erdős-Rényi graphs
As introduced in Sec. III B, the layered Erdős-Rényi model is a simplification of the classical random graph depicted in Fig. 4(b), in which a large number $N$ of nodes are arranged in discrete layers with time index $t$. Each node may be connected only to nodes in adjacent time layers; there are no connections between nodes within the same time layer. Edges between time layers $t$ and $t+1$ are randomly chosen, such that a total number of edges $cN$ are created between adjacent layers. The connectivity $c$ is the major parameter of the model ($c$ is equal to $b'/2$, in the notation of Sec. III B), and plays a role similar to the complement of the measurement rate, $1 - r$. The critical value of $c$ is $c_{\rm crit} = 1/2$, since a given node at time $t$ has connections to both $t - 1$ and $t + 1$ and its expected number of total connections is $2c$.
Since the layered Erdős-Rényi description is the basis for the theoretical derivation of the scaling forms in Sec. III B, we numerically simulate the layered Erdős-Rényi graph to verify these scaling forms and ensure that they yield the same behavior as the data presented in Figs. 5 and 6 for the full classical graph. Figure 32 shows the percolation probability at the critical point, plotted as a function of time for different system sizes $N$. A good scaling is observed as a function of the variable $t/N^{1/5}$, as suggested by Eq. 19. At a fixed value of $c$ and $N$, the percolation probability $P_{\rm perc}$ is observed to decay exponentially with time. As shown in Fig. 33, near the critical point the scaled exponential decay time $\tau(c, N)/N^{1/5}$ is a function only of the variable $(c - c_{\rm crit})N^{2/5}$. This is consistent with the scaling forms in Eq. 19.
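A minimal sketch of such a simulation is given below. It is not the code used for the figures. It assumes that the $cN$ edges between adjacent layers are drawn independently and uniformly with replacement (so repeated edges are possible), and it defines percolation as the existence of a cluster containing nodes of both the first and the last layer. All function and parameter names are hypothetical.

```python
import numpy as np

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def percolates(n_nodes, n_layers, c, rng):
    """True if some cluster connects the first and the last time layer."""
    parent = list(range(n_nodes * n_layers))
    n_edges = int(round(c * n_nodes))
    for t in range(n_layers - 1):
        # Assumption: edges drawn uniformly at random, with replacement.
        lower = rng.integers(0, n_nodes, size=n_edges)
        upper = rng.integers(0, n_nodes, size=n_edges)
        for i, j in zip(lower, upper):
            a = find(parent, t * n_nodes + int(i))
            b = find(parent, (t + 1) * n_nodes + int(j))
            if a != b:
                parent[a] = b
    first_roots = {find(parent, i) for i in range(n_nodes)}
    offset = (n_layers - 1) * n_nodes
    return any(find(parent, offset + j) in first_roots for j in range(n_nodes))

def percolation_probability(n_nodes, n_layers, c, n_samples=200, seed=0):
    """Monte Carlo estimate of P_perc for given N, number of layers, and c."""
    rng = np.random.default_rng(seed)
    hits = sum(percolates(n_nodes, n_layers, c, rng) for _ in range(n_samples))
    return hits / n_samples
```

To probe the scaling discussed above, one could evaluate `percolation_probability` at `c = 0.5` for several values of `n_nodes` and plot the result against `n_layers / n_nodes**0.2`.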

4. Two-point correlation functions
In Sec. III C we showed that the probability of percolation in the classical graph can be described in terms of the scaling variables $t/N^{1/5}$ and $(r - r_c)N^{2/5}$ (Eq. 19). Various correlation functions can also be understood in terms of these same scaling variables. The simplest correlation function, which we denote by $C$, is the probability that two distinct nodes in the graph belong to the same cluster. The nodes may be on a temporal boundary (either the same boundary or different ones) or in the bulk of the graph.
In the Potts language, the correlator is the spin two-point function. The bulk and boundary operators have different scaling dimensions, with the former scaling like $N^{-2/5}$ (or equivalently like $t^{-2}$) and the latter like $N^{-3/5}$ (or $t^{-3}$); see Sec. III E and App. A 2.
As a result of this scaling, the probability $C_{\rm oppo}$ that two nodes on opposite temporal boundaries are connected by a cluster has the form

$$C_{\rm oppo}(T, N) = \frac{1}{N^{6/5}}\, F_{\rm oppo}\!\left( T/N^{1/5},\ \delta r\, N^{2/5} \right). \qquad (A36)$$

In Fig. 34 we test this scaling for the case $r = r_c$, in order to confirm the theoretical value for the operator's scaling dimension. This data is for the simplified multi-layer Erdős-Rényi model described in Sec. III B. The probability $C_{\rm same}$ for two sites on the same temporal boundary to be connected has, in addition to the scaling term, a non-critical contribution of order $1/N$ which is in fact dominant at $r_c$. This $1/N$ factor is on the order of the probability for the two sites to be connected by a "microscopic" path (for example by a single bond). For simplicity, consider the limit $T \to \infty$, when only one of the arguments of the scaling function remains: The noncritical term may be eliminated by a subtraction: $\widetilde{C}_{\rm same}(N) = C_{\rm same}(N) - 2^{-1} C_{\rm same}(N/2)$. Fig. 35 demonstrates a reasonable scaling collapse for this quantity. This plot constitutes a second check that $\delta r\, N^{2/5}$ is the appropriate off-critical scaling variable. If $r \lesssim r_c$ is fixed and $N \to \infty$, Eq. A37 shows that $C_{\rm same}(N)$ scales like $(r_c - r)^3$; this is the square of the surface order parameter (the probability that a boundary site lies in the infinite cluster), which is parametrically smaller than the bulk order parameter (which scales as in Eq. 18) when $r_c - r$ is small. In the language of surface critical phenomena, this is the "ordinary" transition [76,121-123].
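The effect of the subtraction can be verified in a few lines. The sketch below assumes that, in the limit $T \to \infty$, the same-boundary correlator is the sum of an analytic $1/N$ term and a scaling term with the boundary exponent $6/5$. The constant $a$, the amplitude $A$, the function name $F_{\rm same}$ and the shorthand $x$ are notation introduced here, and the precise form used in the paper may differ.

```latex
% Assumed form (a = order-one constant, x = \delta r\, N^{2/5}):
%   C_{\rm same}(N) = \frac{a}{N} + \frac{1}{N^{6/5}} F_{\rm same}(x).
\begin{align}
\widetilde{C}_{\rm same}(N)
  &= C_{\rm same}(N) - \tfrac{1}{2}\, C_{\rm same}(N/2) \nonumber \\
  &= \left[ \frac{a}{N} - \frac{1}{2}\,\frac{2a}{N} \right]
     + \frac{1}{N^{6/5}} \left[ F_{\rm same}(x) - 2^{1/5} F_{\rm same}\!\left(2^{-2/5} x\right) \right]
   = \frac{1}{N^{6/5}}\, \widetilde{F}_{\rm same}(x).
\end{align}
% Deep on the percolating side, F_{\rm same}(x) \simeq A |x|^3, so that
%   C_{\rm same} \to A\,(r_c - r)^3, \qquad \widetilde{C}_{\rm same} \to \tfrac{1}{2} A\,(r_c - r)^3.
```

The $1/N$ term cancels identically, while the remainder is still $N^{-6/5}$ times a function of $\delta r\, N^{2/5}$ alone, so the subtracted quantity can be used directly in a scaling collapse.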
As an aside, let us make a distinction between the correlation function $C_{\rm same}$ above and the mutual information between spins in the final state. $C_{\rm same}$ indicates whether in the final state two spins lie in the same connected tensor network. However, being in the same connected component does not imply that the spins' mutual information, which can be detected with appropriate physical two-point functions, is large [6]. The zeroth Rényi mutual information $I_0$ is given by a different classical correlation function to the one above, which in the finite-dimensional problem behaves as a power law at $r_c$ [6]. A related observable is the distribution of entanglement entropy $S_0$ for a single spin [17]. These observables again map to boundary correlation functions of the Potts spins,$^{44}$ but in a Potts system with a magnetized boundary condition, rather than with the free boundary conditions used above for $C_{\rm same}$. (In the classification of surface criticality this is the "extraordinary" transition [132,133].) The critical contribution to these quantities at $r_c$ is smaller than a trivial analytic contribution similar to that mentioned above, and we have not been able to see it numerically.

$^{44}$ $S_0$ for a single spin is either 0 or 1 bits, with the former holding if the spin is connected to no other spins on the final time boundary. This may be written as a one-point function of the spin operator in a system where other boundary spins are fixed. $I_0$ for two spins is either 0, 1 or 2 bits, and is given by a minimal cut formula. If we assume that the probabilities for both $I_0 = 1$ and $I_0 = 2$ have the same scaling form (this can be demonstrated in 1+1D [6]) then we can focus on the simpler case $I_0 = 2$, which occurs only if the two spins are connected to each other, but not to any other spins on the final time boundary. This is the two-point function of the same Potts operator.

5. Extrapolating min-cut tension to N → ∞
As mentioned in Sec. III D, the behavior of the classical minimum cut value $S_0(t, N)$ within the percolating phase, $r < r_c$, has the functional form at large $N$, and for all times $t$ that are larger than an initial transient but short enough to satisfy $\ln t \ll \ln \tau$. Here, $d(t, r)$ is a constant for fixed $t$ and $r$ and $\tau$ denotes the decay time of the percolation probability; $\tau$ grows exponentially with $N$ (see Fig. 6). An example of this scaling is shown in Fig. 36. One can estimate the value of the minimum cut per spin, $s(r)$, by extrapolating this relation to $1/\sqrt{N} = 0$. As illustrated in Fig. 36, this extrapolated value is relatively insensitive to the time $t$, so long as $t$ is larger than a short-time transient ($t \gtrsim 5$ is sufficient for the $r$-values plotted here and in Fig. 7) and shorter than $\tau$.
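The extrapolation step can be sketched as a linear fit in $1/\sqrt{N}$. The example below assumes the form $S_0/N \simeq s - d/\sqrt{N}$ at fixed $t$ and $r$, which is consistent with extrapolating to $1/\sqrt{N} = 0$ but is an assumption about the precise fitting function. The data are synthetic, generated from hypothetical values of $s$ and $d$, and are not results from the paper.

```python
import numpy as np

def extrapolate_min_cut_density(sizes, s0_values):
    """Fit S_0/N = s - d/sqrt(N) and return the estimates (s, d)."""
    sizes = np.asarray(sizes, dtype=float)
    x = 1.0 / np.sqrt(sizes)
    y = np.asarray(s0_values, dtype=float) / sizes
    slope, intercept = np.polyfit(x, y, 1)   # y = slope * x + intercept
    return intercept, -slope

# System sizes as quoted in the text; synthetic min-cut values (hypothetical s, d).
sizes = [100, 200, 400, 600, 800]
s_true, d_true = 0.25, 0.6
synthetic_s0 = [s_true * n - d_true * np.sqrt(n) for n in sizes]

s_est, d_est = extrapolate_min_cut_density(sizes, synthetic_s0)
```

With real data one would replace `synthetic_s0` by sample-averaged minimal cut values at a fixed time, and repeat the fit at a second time to check that the intercept is stable.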
The data in Fig. 7 correspond to $t = 10$, and come from an extrapolation using system sizes $N = 100, 200, 400, 600$, and $800$. As shown in Fig. 36, performing an extrapolation at $t = 20$ yields essentially identical results. The extrapolation procedure becomes numerically difficult at very small $r_c - r$, since the decay time $\tau$ becomes short for all but very large system sizes. The extrapolated data in Fig. 7 are therefore limited to $r_c - r \geq 0.02$.

phases by the operator entanglement between initial and final times. In the nontrivial phase the $RP^{n-1}$ field is "ordered" (on timescales much shorter than $\tau$ below) and so we use a nonlinear sigma model formulation for an $n \times n$ matrix $Q_{ab}$ that parameterizes $RP^{n-1}$. This has the form $S = K \int dt\, \mathrm{tr}(\partial_t Q)^2$, with a stiffness that scales as close to the transition on the nontrivial side. A rescaling of $t$ shows that this leads to a characteristic timescale of order For $t \gg \tau$, the simple one-dimensional field theory has exponentially decaying correlations, and the operator entanglement (computed from the cost of imposing twisted boundary conditions between the initial and final time [137]) decays exponentially. For $t \ll \tau$, the operator entanglement is extensive in $N$ and scales as $S \sim \delta r^2 N/t$.
Note the large difference between the timescale in Eq. B2, which is linear in $N$, and the timescale in the entangled phase (which is accessed in more generic dynamics), which is exponentially large in $N$. The model outlined in this Appendix has only pairwise Majorana correlations, and a very restricted entanglement structure. However, this particular feature of the present problem is likely to carry over to a larger set of models involving unitaries and measurements or projections for free fermions [52,117], since the key feature is having continuous, rather than discrete, replica symmetry [117,120].
Appendix C: Calculations for quantum tree

Numerical recursion for quantum tree
In this section we briefly describe the numerical procedure used to simulate the quantum tree. Due to the single-site Haar rotations, the Schmidt basis of the tree Eq. 55 becomes uniformly distributed, and we can thus characterize the wave function using only the two Schmidt values (that is, using a single real positive number between 0 and 1/2, which is the minimal Schmidt value squared, $Z = \lambda_{\min}^2$). The recursive procedure to generate the tree at generation $k + 1$ consists of using three singular values at generation $k$, $Z_k$, $Z'_k$ and $Z''_k$, and connecting them to a node using Eq. 56. Thus, the number of eigenvalues required to describe a certain instance of the tree exactly grows exponentially as $3^k$, which is clearly not simulable at large $k$. Here we take advantage of the fact that in the case of the FMPT the nodes of the tree are statistically independent. Thus, at a certain level $k$ we can generate a large constant pool of $N$ singular values, where $1 \ll N \ll 3^k$. Assuming the pool spans the distribution function of $Z_k$ faithfully, we can then randomly draw three singular values from this pool to generate a member of the pool of the next generation. This is known as the "pool method" [81][82][83].
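The pool update described above can be sketched as follows. This is only a schematic illustration: the node rule of Eq. 56 is not reproduced here, so `toy_combine` is a hypothetical stand-in that merely keeps values in the allowed range $[0, 1/2]$, and drawing with replacement is an assumption.

```python
import numpy as np

def evolve_pool(pool, combine, rng):
    """Advance the pool by one generation.

    Each member of the new pool is built from three members drawn at
    random (with replacement, an assumption) from the current pool.
    """
    n = pool.size
    idx = rng.integers(0, n, size=(n, 3))
    return np.array([combine(pool[i], pool[j], pool[k], rng) for i, j, k in idx])

def toy_combine(z1, z2, z3, rng):
    # Hypothetical placeholder for the node rule (NOT Eq. 56 of the paper):
    # a random multiplicative factor acting on the sum, capped at 1/2.
    return min(0.5, rng.random() * (z1 + z2 + z3))

rng = np.random.default_rng(0)
pool = np.full(1000, 0.1)          # generation-0 pool of N = 1000 values
for _ in range(5):                 # five generations of the recursion
    pool = evolve_pool(pool, toy_combine, rng)
```

The memory cost stays fixed at the pool size $N$ at every generation, in contrast to the $3^k$ values needed for an exact instance of the tree.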
To verify that the pool spans the distribution of $Z_k$ faithfully, we test the convergence of the evolution of $Z_k$ with the generation number $k$ as a function of the pool size $N$. It is known that convergence in $N$ can be very slow for the pool method [81,82]. For example, in Fig. 37 we present $Z^{\rm typ}_k$ as a function of $k$ for different values of $N$ at $\ln(r_c - r) = -5.3$, which is the point closest to the critical point in Fig. 14. The origin of the strong $N$-dependence lies in the exponent $\lambda$, which approaches 1/2 at the transition, causing the distribution of $Z$ to become broad. Upon tuning farther away from the critical point, the minimal $k$ and $N$ required for convergence are found to decrease rapidly (not shown in the figure).
Finally, we also note that, due to the forced measurements, the distribution function of the singular values has a delta function at $Z = 0$ with a known prefactor. Namely, the probability of a singular value at generation $k$ to be exactly zero is given by the recursive relation In principle we can keep these zeros in our pool. However, this is highly inefficient, especially when $p$ starts to get close to the classical transition, where $f_\infty(p) =$

FIG. 38. The distribution function of the parameters $V_j \equiv \ln A_j$ for a Haar-random two-site unitary (left) and the ensemble in Eqs. 39, 38 with $\Delta t = 0.3$. Horizontal axis: $\ln A_i$. The data here are collected from $N = 10^7$ random unitaries.

Averages of tree recursion constants
In Sec. IV E we described the recursion relation for the singular values of the tree. The linearised recursion relation involves the random multiplicative constants in Eq. IV E, which we repeat for convenience: In this Appendix we derive the facts about the distribution of these quantities that were given in Sec. IV E. Some of these facts also hold for general node tensors, not necessarily expressed in terms of unitaries. As in the text, we assume that the distribution of U is invariant under multiplication by single-site U(2) matrices on any of its four legs. Initially however we do not assume that it is invariant under 2-site unitaries, i.e. we do not assume that U is Haar-distributed in U(4).
First we show that, so long as the unitary U is nontrivially entangling (defined below) with probability 1, the average of any of the above quantities is exactly equal to one, and also The argument is the same for any of the three A i , so consider A 3 for definiteness. The argument only relies on the property of U(2) invariance on a leg mentioned in Sec. IV B (together with the assumption that certain singular values are not fine-tuned to zero), so they hold for more general choices of t satisfying this requirement. (We could also consider unitaries acting on more sites/trees with a larger branching number.) The expression for A 3 involves only the matrix elements U ad 11 for a, d = 1, 2. Regarding this as a 2 × 2 matrix with row index a and column index d, we make a singular value decomposition, with positive singular values η 1 and η 2 : Here, w and v are U(2) matrices. Now we note that, for any given U , the singular values of the tree we are considering are invariant under unitary basis transformations for the bond at the top of the tree. This implies that A 3 must be invariant if U ad 11 is multiplied by an arbitrary single-site unitary acting on the a index (see Eq. 40). We choose this unitary to be the inverse of w, so that U ad 11 is replaced in Eq. C2 by Together with | det v| 2 = 1, this gives the expression For some trivial, nonentangling two-site unitaries, such as the identity or swap, one of the singular values η is exactly zero, and A 3 vanishes. We assume that the distribution of U is such that, with probability 1, both singular values η are nonzero. This is our definition of "nontrivially entangling" above. The above expression involves a single column, v a1 , of the U(2) matrix v ad . Since we assumed that the distribution of U is invariant under single-site rotations, v ad is Haar-distributed, and v a1 is just a unit vector (with two complex or four real components) that must be averaged uniformly over the sphere S 3 .
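This can be done in a standard way by relating the average over the sphere to a Gaussian average. Let us write the four real components of the unit vector $(v_{11}, v_{21})$ as $V = (w, x, y, z)$. If $\langle \ldots \rangle_\mu$ is the Gaussian average with weight proportional to $e^{-\mu V^2}$, then
$$\langle F \rangle_\mu = \frac{\int_0^\infty dR\, R^3 e^{-\mu R^2} \langle F \rangle_{|V|=R}}{\int_0^\infty dR\, R^3 e^{-\mu R^2}}, \qquad \text{(C8)}$$
as we see by splitting the Gaussian integral on the LHS into radial and angular parts. The latter gives the average over a sphere of fixed radius, which appears as $\langle F \rangle_{|V|=R}$ on the RHS. We are interested in $\langle \ldots \rangle_{|V|=1}$, the average over the unit sphere. The Gaussian average of $A_3$, which is of order $|V|^{-4}$, diverges at small $R$, so we instead first consider
$$f(\mu) \equiv \left\langle e^{-\left(\eta_1^2 (w^2 + x^2) + \eta_2^2 (y^2 + z^2)\right)} \right\rangle_\mu = \frac{\mu^2}{(\mu + \eta_1^2)(\mu + \eta_2^2)}.$$
Using Eq. C8 we may alternatively write $f(\mu)$ as an average over the sphere. By scaling out a factor of $R$ from the components of $V$ we obtain an average over the unit sphere, and performing the $R$ integral gives
$$f(\mu) = \mu^2 \left\langle \frac{1}{\left(\mu + \eta_1^2 (w^2 + x^2) + \eta_2^2 (y^2 + z^2)\right)^2} \right\rangle_{|V|=1}.$$
Equating the expressions for $f(\mu)$, and taking the limit $\mu \to 0$, the U(2) Haar average is
$$\left\langle \frac{1}{\left(\eta_1^2 |v_{11}|^2 + \eta_2^2 |v_{21}|^2\right)^2} \right\rangle = \frac{1}{\eta_1^2 \eta_2^2},$$
where we have restored the previous notation for the vector. Plugging this into Eq. C7 gives $\langle A_3 \rangle = 1$, as stated above, regardless of the precise distribution of $\eta$.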
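The sphere average can be checked by direct sampling. The sketch below assumes, as the form of $f(\mu)$ suggests, that $A_3 = \eta_1^2 \eta_2^2 / (\eta_1^2 |v_{11}|^2 + \eta_2^2 |v_{21}|^2)^2$; this explicit expression and the function name are assumptions made for illustration rather than a quotation of Eq. C7.

```python
import numpy as np

def sphere_average_A3(eta1_sq, eta2_sq, n_samples=200_000, seed=0):
    """Monte Carlo average of the assumed form of A_3 over the unit sphere S^3."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(n_samples, 4))
    v /= np.linalg.norm(v, axis=1, keepdims=True)   # uniform points on S^3
    # q = eta1^2 |v_11|^2 + eta2^2 |v_21|^2 in terms of the real components
    q = eta1_sq * (v[:, 0] ** 2 + v[:, 1] ** 2) + eta2_sq * (v[:, 2] ** 2 + v[:, 3] ** 2)
    return np.mean(eta1_sq * eta2_sq / q ** 2)

mean_A3 = sphere_average_A3(1.0, 2.0)   # expected to be close to 1
```

For fixed, order-one singular values the summand is bounded, so the sample mean converges quickly; the broad tails discussed later arise only when one singular value becomes small.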
The same argument applies for A 1 and A 2 , using the appropriate singular value decomposition. Note that each of the A i involves only a subset of the components of U , so that it effectively reduces to a 2 × 2 matrix, as above for the matrix U ad 11 . That is, in each case two of the four legs of U are set to index value "1".
By multiplying $f(\mu)$ by $\mu^a$ and integrating over $\mu$, we find with $H = \eta_1^2/\eta_2^2$. The remaining average on the RHS is over these singular values, again for the appropriate singular value decomposition of $U$.
Differentiating with respect to λ at λ = 1/2 gives Eq. C4, irrespective of the distribution of H.
While Eqs. C3, C4 simplified, more general moments depend on the detailed distribution of $U$. For the 2-site Haar case we may write analytical formulas for $\langle A_i^\lambda \rangle$ (given in Eqs. 63, 64 of the main text).
First consider $A_3$, as above. This simplifies because, for a Haar-distributed unitary, $U^{ad}_{11}$ can be viewed as a normalized and uniformly random vector in a Hilbert space of dimension $2 \times 2$ (a Page-random state). We are interested in the singular values when this state is split into two equal subsystems. Writing $s_i = \eta_i^2$, this distribution is [138], for $0 < H < \infty$. Applying this to (C10) gives This is finite for $\lambda \in (-1, 2)$.

By the right-invariance of the Haar measure, the distribution of $A_1$ is the same as that of $A_2$. We must consider the singular values for a decomposition of the matrix $U^{a1}_{b1}$, which is the upper left $2 \times 2$ block of a $4 \times 4$ Haar matrix. The distribution of singular values for such a sub-block of a Haar unitary may be found in Ref. [139]. Writing again $s_i = \eta_i^2$,
$$P(s_1, s_2) = 6 (s_1 - s_2)^2 \qquad \text{(C12)}$$
with the constraint $0 < s_i < 1$ but, unlike in the previous case, no constraint on $s_1 + s_2$. Since there is a relabelling symmetry under $s_1 \leftrightarrow s_2$, or equivalently under $H \leftrightarrow 1/H$, we may insist that $0 < H < 1$. Then Applying this to (C10) gives as stated in the text. Again this is finite for $-1 < \lambda < 2$.
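As a sanity check on the sub-block distribution, one can sample Haar unitaries and compare a simple moment. The density $6(s_1 - s_2)^2$ on the unit square implies $\langle (s_1 - s_2)^2 \rangle = 2/5$. The sketch below uses the standard QR construction of Haar unitaries; the function names and sample size are illustrative choices.

```python
import numpy as np

def haar_unitary(dim, rng):
    """Sample a Haar-random unitary via QR of a complex Gaussian matrix."""
    z = (rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))   # fix the column phases left ambiguous by QR

def mean_gap_squared(n_samples=4000, seed=1):
    """Average of (s1 - s2)^2 for the upper-left 2x2 block of a 4x4 Haar unitary."""
    rng = np.random.default_rng(seed)
    out = np.empty(n_samples)
    for n in range(n_samples):
        block = haar_unitary(4, rng)[:2, :2]
        s = np.linalg.svd(block, compute_uv=False) ** 2   # squared singular values
        out[n] = (s[0] - s[1]) ** 2
    return out.mean()

gap_sq = mean_gap_squared()   # compare with 2/5 from P(s1, s2) = 6 (s1 - s2)^2
```

Agreement of this single moment is a consistency check only and does not by itself confirm the full distribution.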
Finally, let us discuss the asymptotics of the distributions of the $A_i$. This will clarify the following point. In the main text we described an ensemble of 2-site unitaries with an entangling strength parameterised by $\Delta t$. In the limit of small $\Delta t$, these unitaries become closer and closer to the identity. For the identity, $A_2 = A_3 = 0$ exactly. But we have shown above that, for any nonzero value of $\Delta t$, no matter how small, $\langle A_2 \rangle = \langle A_3 \rangle = 1$. Therefore the limit $\Delta t \to 0$ does not commute with the average over unitaries. This is because the distribution of $A_i$ develops a long tail when $\Delta t$ becomes small.
Define the 2-component vectors Then we may write Eqs. C2 as We see that A i can become arbitrarily large if |φ| becomes small, with A i scaling like |φ| −2 in this limit. Since φ has two complex (or four real) components, we expect that for a generic distribution of unitaries, the cumulative probability distribution of |φ| scales like |φ| 4 at small |φ|. This gives at large A i , as stated in the text. At small A, similar considerations for the numerators in Eq. C2 show that generically These power laws are consistent with numerics and also with the fact that the moments A λ i for the Haar case diverge at λ = 2 and at λ = −1. The A i are of course correlated, but we do not consider their joint distribution here. Now consider the case of a weakly entangling unitary with random single-site scramblers, with ∆t small but fixed. (H may either be fixed, as in an ensemble discussed in the main text, or random.) We focus on A 2 and A 3 , whose distributions become broad at small ∆t (that of A 1 does not). The two cases are similar, so consider A 3 , which is given by Eq. C7. We expect that for small ∆t we typically have η 2 1 ∼ ∆t 2 (here we keep only the scaling with ∆t). Therefore so long as |v 22 | 2 ∆t 2 , i.e. in the regime A 3 ∆t −2 , we have We will check below that for 1 < ∆ < 2, there is a range of x where this expansion is valid, i.e. where the final term in (C31) can indeed be neglected. According to this expansion, as we increase −x, the value of R begins to increase significantly from −1 once This suggests that the front is at x f ∼ − 1 ∆−1 ln 1 σ , and that here we can match onto the stationary solution of the travelling wave equation Eq. C25 with σ = 0 and with a spatially constant diffusion coefficient. (This matching makes sense since the forward part of this solution has R = −1 when σ = 0.) However, we must check the self-consistency of our neglect of the final term in Eq. C31. Assuming the above scaling for x f , we have for x x f . 
If 1 < ∆ < 2, this term is indeed much smaller than the RHS of Eq. C32 for x x f . Therefore for this range of ∆ the above analysis, giving is self-consistent, though not rigorous. On the other hand, for ∆ > 2, when the power of σ in Eq. C36 is negative, this term cannot be dropped from Eq. C31 for x x f . Numerically solving Eq. C31 suggests that instead the final term in Eq. C31 contributes at leading order if we fix x and take σ → 0. Indeed, the alternative would be to have some x c , with x f x c 0, such that the term is negligible for x x c but not for x f < x < x c , and this may be seen to be inconsistent by examining the ratio of this term to the right hand side of Eq. C32. Using this fact, that H/σ should be of order 1 when x ∼ 0, and assuming that H ∼ e −(x−x f ) for x x f , we find that Eqs. C37, C38 give the power laws Z typ ∼ σ 1/(∆−1) and Z typ ∼ σ stated in Sec. IV I. These power laws can be checked directly by making a numerical solution of Eq. C25 for H(x, τ ) at different values of ∆ and a. We use a numeric differential equation solver, solved over a wide domain of discrete values x ∈ (x L , x R ), with boundary conditions such that H(x L ) = 1 and H(x R ) = 0. These are exponentially close in x L , x R respectively to the true values for the solution on the infinite domain. An initial guess is used for H(x, 0) and then evolved until a very long time τ = τ f in order to arrive at the steady-state solution, from which we can read off the position of the front. We define this as the value of x = x front such that H(x front , τ f ) = 1/2. For the data presented in Fig. 17, x L = −70, x R = 40, τ f = 10 8 , and the domain of x is discretized into 8001 points.
Making a linear fit of x front against ln σ for a given value of ∆, and recalling x front ∼ ln Z typ , allows one to determine the value of the exponent γ, defined by Z typ ∝ σ γ . We make this linear fit over the range of σ such that −10 < ln σ < −6. Smaller σ requires very high numerical accuracy (a dense discretization of the domain of x), while at larger σ the critical behavior may not be apparent. The results are shown in the main text in Fig. 17. The error bars in this figure are defined by the difference in slope obtained from fits using only the left half of this range, −10 < ln σ < −8, as compared to only the right half of the range, −8 < ln σ < −6.
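The fitting procedure described above (a slope over the full window, with an error bar from the two half-windows) can be sketched as follows. The function name is hypothetical, and the input used in the example is synthetic and exactly linear; it is not output of the travelling-wave solver.

```python
import numpy as np

def fit_exponent(ln_sigma, x_front, lo=-10.0, mid=-8.0, hi=-6.0):
    """Return (gamma, err) from linear fits of x_front against ln(sigma).

    gamma is the slope over lo < ln(sigma) < hi; err is the difference
    between the slopes fitted on the left and right halves of that range.
    """
    ln_sigma = np.asarray(ln_sigma, dtype=float)
    x_front = np.asarray(x_front, dtype=float)

    def slope(a, b):
        mask = (ln_sigma > a) & (ln_sigma < b)
        return np.polyfit(ln_sigma[mask], x_front[mask], 1)[0]

    gamma = slope(lo, hi)
    err = abs(slope(lo, mid) - slope(mid, hi))
    return gamma, err

# Synthetic, exactly linear input with slope 1.5 (illustration only).
ln_sigma = np.linspace(-9.9, -6.1, 39)
x_front = 1.5 * ln_sigma + 0.3
gamma, err = fit_exponent(ln_sigma, x_front)
```

With real solver output, curvature of $x_{\rm front}$ against $\ln \sigma$ would show up as a nonzero `err`, which is the sense in which the half-range comparison serves as an error bar.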

Minimal cut formula on tree
Assume that the minimal cut chops out m subtrees from the full tree, with each subtree being cut only once, at its apex. The singular values of a given subtree a ∈ {1, . . . , m} are {λ (C42) Each term is associated with a bond lying on the minimal cut. Let the height of this bond above the base be k(a).
For a given term, the coefficient in brackets will vanish if the states $e_a$ and $e_0$ become parallel. Each of these is a state in the "rest" Hilbert space, given by the tensor network made up of the tensors on one side of the minimal cut. The bonds lying on the minimal cut have fixed states attached to them: one of these is changed in going from $e_0$ to $e_1$. Exploiting the fact that these truncated tensor networks are still trees, we can write $e_0$ and $e_1$ in terms of the singular value decompositions of two subtrees. We see that the term in brackets is of the same order as the singular value for a tree of depth $k(a)$. (This assumes that the singular values are not growing with $k$: this can occur if we fine-tune the boundary conditions, but is not relevant to the case we are discussing.) Therefore in the present limit of small $Z$ we confirm the minimal cut conjecture in the main text, according to which each bond $a$ on the minimal cut contributes an entanglement of order Z

Imagine dividing the circuit into blocks of temporal duration $\Delta t$, which corresponds to writing the time evolution operator $V(t)$ as a product of random matrices $W_i$, with each matrix of size $2^N \times 2^N$ and $i = 1, \ldots, t/\Delta t$. $\Delta t$ is chosen to be much larger than 1 but much smaller than the timescale $\tau$ that will emerge below. Each block has a singular value decomposition $W_i = U^{(1)}_i D\, U^{(2)\dagger}_i$. As a toy model, we will treat the unitaries $U^{(1)}$ and $U^{(2)}$ as Haar random (neglecting locality) and we will make the simplest choice of $D$ that yields a given value of $S_1 = sN$, which is a flat entanglement spectrum, with $D$ of the block form
$$\begin{pmatrix} I_{B \times B} & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = e^{sN}, \qquad 0 < s < \ln 2.$$
Note that the nonzero block is a small fraction of the size of the matrix for large $N$. Finally, we will make an uncontrolled simplification by also treating the entanglement spectrum of $V(t)$ as flat: Up to a normalization factor, $V$ is just a rectangular block, of size $B(t) \times B$, taken from a Haar unitary of exponentially larger size ($2^N \times 2^N$). Correlations between unitary matrix elements become weaker as the size of the matrix increases, so we expect that we can treat them as Gaussian, with $\mathbb{E}\, U_{ab} = 0$ and $\mathbb{E}\, U_{ab} (U_{a'b'})^* = 2^{-N} \delta_{aa'} \delta_{bb'}$. (Higher cumulants are suppressed by powers of $2^N$.) The singular-values-squared of $V$, denoted $v_i = \eta_i^2$, are eigenvalues of the $B(t) \times B(t)$ matrix When the matrix elements of $V$ are complex Gaussian random numbers, this is a Wishart random matrix (see e.g. Ref. [138] for an application in a related context). The distribution of its eigenvalues depends on $B$ as well as on $B(t)$. We assume that $1 \ll B(t) \ll B$. Normalizing the matrix so the $s_i$ sum to one, the eigenvalue density for $V V^\dagger$ is the Marchenko-Pastur distribution,

B B(t) .
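The Gaussian approximation for the rectangular block can be illustrated numerically. The sketch below assumes a normalization in which the mean eigenvalue is 1 (it differs from the convention above, where the eigenvalues sum to one, by a factor of $B(t)$) and compares the spectrum with the standard Marchenko-Pastur support $[(1 - \sqrt{c})^2, (1 + \sqrt{c})^2]$ for aspect ratio $c = B(t)/B$. The matrix sizes are arbitrary illustrative choices.

```python
import numpy as np

def block_spectrum(b_t, b, seed=0):
    """Eigenvalues of G G^dagger / b for a b_t x b complex Gaussian matrix G."""
    rng = np.random.default_rng(seed)
    g = (rng.normal(size=(b_t, b)) + 1j * rng.normal(size=(b_t, b))) / np.sqrt(2)
    w = g @ g.conj().T / b            # b_t x b_t Wishart matrix, mean eigenvalue 1
    return np.linalg.eigvalsh(w)

b_t, b = 50, 5000                      # stands in for 1 << B(t) << B
eigs = block_spectrum(b_t, b)
c = b_t / b
lower, upper = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2   # Marchenko-Pastur edges
```

For small $c$ the support is narrow, of relative width of order $\sqrt{B(t)/B}$, so the spectrum is close to flat but not exactly so.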
From this distribution the operator Rényi entropies can be calculated as As expected, the entanglement spectrum does not remain flat. In our crude approximation, however, we neglect this, and apply the above transformation iteratively, so that in the continuum limit for times ∆t: Here C is an order-1 constant. At times larger than ∆t, but short enough such that S(t) 1 (which we assumed above), this equation gives a solution: independently of the value of C. Note that we have neglected random fluctuations (see Sec. VI I). This analysis suggests a characteristic timescale τ with ln τ sN . We may verify this dependence directly in the opposite limit of asymptotically late times, where (as usual for a product of random matrices) there is a separation of scale between the largest singular value, the second largest, and so on. This separation allows us to consider only the two largest singular values. At a given time t, let them be normalized as Then in place of M in Eq. E3 we have a 2 × 2 matrix Each of the elements of M ik is a different sum of many random variables, so we assume that M can be approximated as Gaussian: After absorbing a normalization constant into M , where a, b, β have mean zero and a 2 = b 2 = |β| 2 = 1. The new (small) singular value squared is Let us study the typical value of this exponentially small quantity, defined by ( 2 new ) typ = exp ln 2 new . We have Here the average is taken over a, b, β. Note that the leading fluctuation term, of order 1/ √ B, averages to zero. The next term, however, gives a negative drift under the recursion. Recalling that B = e sN , and applying this map iteratively with each added block, Thus, the main conclusion is that in this toy model sN sets both the value of the early time plateau and also the timescale for the late-time exponential decay, in agreement with the picture from the replica treatment.
Appendix F: Field theory: further details 1. N → 1 ordered phase In Sec. VI H we stated that for µ 2 < 0 the field theory of Sec. VI E has an ordered phase with X ab = f (δ ab − 1/N ) + W ab , where f = −µ 2 3g N N −2 is the order parameter, and W ab represents fluctuations around the saddle-point value whose quadratic Lagrangian is (F1) We would like to check that if we compute the masses of the fluctuation modes using this expression, and then take N → 1, these masses remain positive. Viewing W ab as a vector, the term a W 2 aa is W.M.W where the matrix M ab,cd (with row index a, b and column index c, d) is 1 if a = b = c = d and zero otherwise. We want the eigenvalues of M when projected onto the subspace of W satisfying a W ab = 0 and b W ab = 0, i.e. the eigenvalues of M ab,cd = P aa P bb M a b ,c d P c c P d d where P is the projector P aa = δ aa − 1/N . Since M is nonzero only when its four indices are equal, drawing a diagram shows that tr M k = tr S k , where S a,b = (1 − 2/N )δ ab + 1/N 2 . This gives the nonzero eigenvalues of M as (N − 1)/N → 0 with multiplicity 1 and (N − 2)/N → −1 with multiplicity N − 1. Altogether, the eigenvalues of the matrix (I − 2 M ) which appears in (F1) are either 1 or 3 in the limit and are positive. Here we show that for N > 0 (for example in the replica limit N → 1, but not in the limit N → 0), and in the vicinity of the critical point r = 0, the theory for a matrix X with vanishing row and column sums, by discarding massive modes. First, shifting the field by Y ab → Y ab + c with c = −r/(2N m 2 F ) + O(r 2 ) removes the linear term and generates a mass term. The quadratic part of the Lagrangian is then L 2 = 1 2 ab,cd Y ab (k 2 + µ 2 )δ ac δ bd + m 2 F F ab,cd Y cd (F4) with µ 2 = −3gr/(N m 2 F ) + O(r 2 ). As a matrix, F = I ⊗ E + E ⊗ I, where E is the N × N matrix with unit elements: E ab = 1, so the matrix appearing in the brackets in Eq. 
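F4 is
$$(k^2 + \mu^2)\, I \otimes I + m_F^2 \left( I \otimes E + E \otimes I \right).$$
This can be decomposed in terms of the projection matrices $P_1 \equiv N^{-1} E$ and $P_{N-1} \equiv I - P_1$ as:
$$(k^2 + \mu^2) \left( P_{N-1} \otimes P_{N-1} \right) \qquad \text{(F6)}$$
$$+\, (k^2 + \mu^2 + m_F^2 N) \left( P_1 \otimes P_{N-1} + P_{N-1} \otimes P_1 \right) \qquad \text{(F7)}$$
$$+\, (k^2 + \mu^2 + 2 m_F^2 N) \left( P_1 \otimes P_1 \right).$$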
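The spectrum behind this decomposition is easy to confirm for a small integer $N$. The sketch below builds $F = I \otimes E + E \otimes I$ and counts its eigenvalues, which should be $0$, $N$ and $2N$ with multiplicities $(N-1)^2$, $2(N-1)$ and $1$; the function name is an illustrative choice.

```python
import numpy as np

def mass_matrix_spectrum(n):
    """Eigenvalues of F = I (x) E + E (x) I with multiplicities, as a dict."""
    eye, ones = np.eye(n), np.ones((n, n))
    f = np.kron(eye, ones) + np.kron(ones, eye)
    vals = np.round(np.linalg.eigvalsh(f)).astype(int)   # eigenvalues are integers
    return {int(v): int(np.sum(vals == v)) for v in np.unique(vals)}

spectrum = mass_matrix_spectrum(4)
# → {0: 9, 4: 6, 8: 1}
```

The zero eigenvalue corresponds to the sector $P_{N-1} \otimes P_{N-1}$, which is the only one whose mass is set by $\mu^2$ alone.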
This is a decomposition into three representations of G N of dimensions (N − 1) 2 , 2(N − 1), and 1. We see that, at the critical point (where r and therefore µ 2 vanish) the second and the third representations remain massive and only the first becomes massless. Retaining only this representation is equivalent to fixing the row and column sums of Y to zero. Doing so and renaming the resulting field X yields precisely Eq. F3. More precisely the two massive representations ought to be integrated out, renormalizing the values of the couplings in Eq. F3. These manipulations manifestly require N > 0: for example they are appropriate for the replica limit N → 1 relevant to the MPT. The replica limit N → 0 must be handled separately. where the extra "1" in the third line is in the ith place. There are N + 1 of these vectors but the v i are not linearly independent: i>1 v i = 0. We use these vectors to rewrite Y ab in terms of y αβ : In this appendix we use a, b, a , b , . . . to denote indices that run from 1 to N , and α, β, . . . to denote indices that take the N + 1 values {+, −, 2, . . . , N }. We will use i, j, k to denote indices that run only over 2 to N . Note that The final 1 in the second line is in (i + 1)st place, which in our labelling convention corresponds to the component x i β with β = i. We have α x a α v α a = δ a a , so that Explicitly (for j, k > 1): Y 11 = (y ++ + y +− + y −+ + y −− ) , (F17) Y 1k = (y ++ − y +− + y −+ − y −− ) + (y +k + y −k ), (F18) Y j1 = (y ++ + y +− − y −+ − y −− ) + (y j+ + y j− ), (F19) Y jk = (y ++ − y +− − y −+ + y −− ) (F20) + (y j+ − y j− ) + (y +k − y −k ) + y jk . (F21) Inserting these relations into the derivative term, where the final line contains terms whose coefficients contain an explicit factor of N , which we assume can be neglected in the limit N → 0. 
Next consider the F term in the Lagrangian, which has the form Since the coefficient of the linear term is zero at the critical point, we assign "engineering" dimensions to the fields such that the quadratic terms in (F22) and (F24) are marginal. Denoting the spacetime dimension by D, the inverse length dimensions of the fields are, in order of increasing scaling dimension, so that if the number of "+" indices for a component of y is n + , and the number of − indices is n − , the engineering dimension is x(n + , n − ) = (D − 2) − 2(n + − n − ) 2 . (F31) This formula implies that, at a given order in y, terms with the largest number of + indices and the smallest number of − indices are most relevant. Now consider the additional terms in the Lagrangian beyond those in Eqs. F22, F25, F24, order by order in Y .
$G_N$ symmetry does not allow any linear terms other than (F25). At quadratic order the remaining possibilities are $\sum_{ab} Y_{ab}^2$ and $\left( \sum_{ab} Y_{ab} \right)^2$. The former is redundant: it can be cancelled by a shift in $Y$ because of the presence of $\sum_{ab} Y_{ab}^3$. We will also check this below in the new parameterization. The latter, $\left( \sum_{ab} Y_{ab} \right)^2$, is proportional to $(y_{--})^2$ by Eq. F25, so is less relevant than the terms on the RHS of Eq. F24 and can be neglected.
Cubic terms are obtained by contracting indices in i.e. by setting some indices equal to others and summing. Left indices may only be set equal to other left indices, and similarly for right indices. This allows three different types of index contraction: an index (e.g. a) can be summed without being set equal to any other index; two indices can be set equal and then summed (e.g. a = a ); or three indices can be set equal and summed (a = a = a ). When we rewrite the contracted expression in terms of y αβ y α β y α β , the single, double, and triple index contractions lead to contractions with the tensors αα = a x a α x a α , d αα α = a x a α x a α x a α , respectively, for the indices α, α , α ∈ {+, −, 2, . . . , N }. For each of these, we may check the maximal value of the difference in the number of + and − indices, that may appear on the right hand side. (∆ should not be confused with a scaling dimension.) Since the difference in the total number of + and − indices is what determines the engineering dimension of a field or a product of fields (Eq. F31), identifying ∆ max for each type of index contraction allows us to say which types of contraction will give the most relevant cubic terms: they are those for which the sum of ∆ max , over all contractions, is largest. Explicitly, d (1) α = (N, 2 − N, 1, . . . , 1) α .
(F46) (Repeated $j$ or $k$ indices are summed from 2 to $N$.) All other contractions of $YYY$, such as aa bb Y ab Y ab Y a b , give terms that are strictly less relevant according to the engineering dimensions. Let us check that, having dropped less relevant terms from the Lagrangian in the process of rewriting it in terms of $y$, the coupling of the quadratic term
$$\frac{1}{4} \sum_{ab} Y_{ab}^2 = 2 \left( y_{+-} y_{-+} + y_{++} y_{--} \right) + y_{j+} y_{j-} + y_{+k} y_{-k} + \frac{y_{jk} y_{jk}}{4}, \qquad \text{(F47)}$$