Theory of the many-body localization transition in one dimensional systems

We formulate a theory of the many-body localization transition based on a novel real space renormalization group (RG) approach. The results of this theory are corroborated and intuitively explained with a phenomenological effective description of the critical point and of the"badly conducting"state found near the critical point on the delocalized side. The theory leads to the following sharp predictions: (i) The delocalized state established near the transition is a Griffiths phase, which exhibits sub-diffusive transport of conserved quantities and sub-ballistic spreading of entanglement. The anomalous diffusion exponent $\alpha<1/2$ vanishes continuously at the critical point. The system does thermalize in this Griffiths phase. (ii) The many-body localization transition is controlled by a new kind of infinite randomness RG fixed point, where the broadly distributed scaling variable is closely related to the eigenstate entanglement entropy. Dynamically, the entanglement grows as $\sim\log t$ at the critical point, as it also does in the localized phase. (iii) In the vicinity of the critical point the ratio of the entanglement entropy to the thermal entropy, and its variance (and in fact all moments) are scaling functions of $L/\xi$, where $L$ is the length of the system and $\xi$ is the correlation length, which has a power-law divergence at the critical point.


I. INTRODUCTION
Anderson had postulated, already in his original paper on localization, that closed many-body systems undergoing time evolution would not come to thermal equilibrium if subject to sufficiently strong randomness [1]. Significant theoretical effort has been devoted in the last few years to understand this phenomenon, the only known generic exception to thermalization (see e.g. [2,3] for recent reviews). The recent work led to classification of many-body localization (MBL) as a distinct dynamical phase of matter, characterized by a remarkable set of defining properties: (i) there are locally accessible observables that do not relax to their equilibrium values and hence can be related to a set of quasi-local integrals of motion [4][5][6][7][8]; (ii) even after arbitrarily long time evolution retrievable quantum information persists in the system and may be extracted from local degrees of freedom [9,10]; (iii) entanglement entropy grows with time evolution only as a logarithmic function of time [4,[11][12][13].
In spite of the progress in understanding the MBL phase, very little is known about the dynamical phase transition which separates it from the delocalized thermal phase. Part of the difficulty lies in the fundamental difference between the energy eigenstates found on either side of the transition. Eigenstates in the thermal phase are expected to obey the eigenstate thermalization hypothesis, which, in particular, implies extensive (i.e. volume law) entanglement entropy. The nonlocality of quantum mechanics is fully exploited in such states, where information resides in highly non local entities: the exponentially many expansion coefficients of the wave-function in terms of local basis states. On the other hand, in the many-body localized phase the eigenstates feature area-law entanglement entropy akin to quantum ground states. Hence a putative critical point separat-ing the two types of states would be unlike any other known phase transition. Ground state quantum critical points and dynamical critical points which occur inside the localized phase mark transitions between area-law states, whereas thermal critical points are transitions between distinct states with extensive (i.e. volume law) entropy. The need to describe this critical point, where the eigenstates change from area law to volume law entanglement, and hence the quantum information in some sense escapes from localized degrees of freedom to highly non-local ones, calls for a new theoretical approach.
In this paper we develop a strong disorder renormalization group framework which can address this manybody localization phase transition. We find a transition controlled by an infinite randomness RG fixed point, where the broad distributions are of a scaling variable directly related to the entanglement entropy of the system's eigenstates. Thus, using this RG scheme we obtain finite size scaling results for the probability distribution of the entanglement entropy near this phase transition. A corollary of the analysis is that the phases adjacent to this critical point are Griffiths phases, where some properties are dominated by rare regions. On the delocalized side of the transition, there is a thermal Griffiths phase showing anomalous (sub-diffusive) transport and sub-linear entanglement growth under time evolution, due to rare, locally insulating regions that impede the transport but do not prevent thermalization.
Before proceeding, we mention the relation to recent work on the MBL transition. Most of this work has relied on exact diagonalization of very small systems [14][15][16]. In particular, the numerical results of Ref. [15] suggested an infinite randomness critical point. More recently, Kjall et. al. [16] identified a peak in the variance of the eigenstate entanglement entropy as a sensitive variable for locating and characterizing the transition. This indeed turns out to be related to the main scaling variable in our theory. Finally, recent work identified and explored the sub-diffusive regime in the vicinity of the transition [17,18]. In this paper we present a comprehensive theory which naturally explains and unifies the different phenomena associated with dynamics and entanglement near the many-body localization phase transition.

II. RG SCHEME
A strong disorder renormalization group scheme has been developed recently to describe the dynamics within the many-body localized phase [4,19,20]. This approach, however, neglects resonances, i.e. non-local modes involving many of the microscopic degrees of freedom. Because these are the very processes that lead to delocalization, a new approach, which incorporates the physics of resonances, is needed in order to describe the many-body localization transition.
The microscopic systems we have in mind include disordered spin chains as well as interacting lattice particles hopping in a random potential in one dimension. But in order to capture the effect of resonances we will concede the fully microscopic starting point and instead work with an effective coarse grained model of the system, which we expect nonetheless provides a faithful description of the system near the critical point. We are able to consider within the same framework closed systems with energy conservation as well as periodically driven (Floquet) systems, which lack any conservation laws other than the unitarity and locality of their time evolution. We work at the energy density that corresponds to infinite temperature when the system thermalizes.
Regardless of microscopic details, we assume that sufficiently close to the critical point the system can be viewed as being composed of blocks i of varying lengths l i , which locally behave more like insulators or more like thermalizing systems. We define the length l of a block as the binary logarithm of the dimension N of its Hilbert space, so N = 2 l . Thus for a disordered spin-1/2 chain l is the number of spins in the block. When a block is consid-ered in isolation, if it is an insulating block the eigenstates of its Hamiltonian typically exhibit only short-range entanglement on length scales shorter than the length of the block. These insulating blocks, when isolated, contain conserved operators with localization length shorter than the block, and as a result the many-body spectra of such insulating blocks have nearly Poisson level statistics. On the other hand, in locally thermalizing blocks, long range resonances have proliferated enough that such blocks, even when isolated, do not contain conserved operators that are localized on scales shorter than the block length. The eigenstates of these thermalizing blocks thus exhibit entanglement that extends from one end of the block to the other, and as a result their spectra have nearly Wigner-Dyson level statistics. More generally, there is a dimensionless coupling parameter g i for each block, with g = 0 being the insulating limit, g = N being the fully conducting limit, and the crossover between the insulating and thermalizing regimes occuring near g = 1.
Our coarse grained model consists of a chain of coupled blocks as shown in Fig. 1(a), where each single block and each pair of adjacent blocks is characterized by a minimal set of parameters as described below. At the basis of the RG scheme lies the assumption that this is the minimal set of parameters required to capture the universal behavior at the critical point. Note that the Hilbert space dimension N of the coarse grained model of a chain of L microscopic spins-1/2 is still 2 L , exactly as the bare model. Thus we do not "integrate out" states. However, the retained information is reduced because we now keep only a few parameters for each block i of l i spins. In the course of renormalization, pairs of adjacent blocks are joined into longer blocks, so the total number of retained parameters is steadily reduced.
To identify the parameter g i for a given block i it is useful to consider the time and energy scales that characterize the block. Of course each block is characterized by a typical many-body level spacing ∆ i = W 2 −li √ l i , where W is a microscopic bare energy scale. In addition there is a parameter that we call the entanglement rate, and denote by Γ i , that is set by the time scale for quantum information and entanglement to spread from one end of the block to the other. Then can be viewed as an "entanglement Thouless time" for block i in the following sense: Put one end of block i in strong local contact with a much longer block j that is a good conductor g j 1. Initialize this two-block system in a pure product state with no entanglement between the two blocks. We choose typical random initial pure states of the two blocks, not eigenstates. Under the unitary time evolution of these two coupled (but otherwise isolated) blocks, the entanglement entropy will then grow and saturate on time scale τ i = 1/Γ i , with the final value close to the thermal equilibrium entropy of the smaller block i. Thus we can call τ i the entanglement time of block i; it is the time for entanglement to spread across the full length of the block. In principle, with knowledge of the microscopic couplings one could attempt to compute this time, but here the Γ i of the blocks are taken as inputs for the RG scheme.
It is noteworthy that τ is not the energy transport time. The latter, denoted by τ tr , is the time scale to relax an extensive energy imbalance across the block. On the end-to-end entanglement time-scale τ i the amount of energy transported across the block remains of order the microscopic energy scale, so is not extensive. To relax an extensive energy imbalance requires transporting an extensive (in l i ) amount of energy, so requires of order l i entanglement times. Hence τ tr ∼ l i τ i . Note that the entanglement time τ i is well defined even in a system subject to external periodically time dependent fields, such as a Floquet system, where total energy is not conserved and there is no extensive quantity that can be transported, so the transport time is meaningless.
The two-block parameters, Γ ij , ∆ ij and g ij = Γ ij /∆ ij , are defined as the block parameters that would ensue if the two adjacent blocks are treated as a single block. For instance ∆ ij = W 2 −(li+lj ) l i + l j ∼ = ∆ i ∆ j /W . We call the link between these adjacent blocks i and j "effective" if g ij 1 and "ineffective" if g ij 1. A general requirement to be met by the initial distributions and retained throughout the RG flow is that the smallest block rate min i Γ i is larger than the largest two-block rate Ω = max ij Γ ij . Ω, the largest two-block rate, serves as the running RG frequency cutoff scale. In this way all the fast rates (Γ > Ω) are intra-block, while the slow rates, below the cutoff scale, are inter-block.
We now frame the RG as a strong disorder scheme operating on the chain in real space. At each RG step the cutoff scale Ω is reduced by joining the two blocks with the largest inter-block rate Γ ij . Thus the old twoblock parameters become the new one-block parameters of this new larger block. The non trivial part of the renormalization is to determine the new two-block parameters Γ L and Γ R , which connect the new block to its left and right neighbors. To compute these rates we have to solve for the entanglement rate of three coupled blocks. This calculation cannot be done microscopically in the most general case, but the structure of the solution is rather constrained by the known behavior in limiting cases. These constraints allow us to formulate a closed and self consistent RG scheme. Modifying details of the RG scheme within the allowed constraints does not significantly change the outcome.
Suppose we are now joining blocks 1 and 2 with the fastest two-block rate Γ 12 and want to find the new rate Γ R of the three block system 1, 2, 3. There are two limits in which we can obtain simple reliable expressions for this rate. First, if both links are ineffective, g 12 1 and g 23 1, then we can compute Γ R by straight forward perturbation theory in the weak dimensionless couplings (see appendix A) to obtain This case describes the process of making a bigger insulator out of two insulating links. When applied repeatedly to a long insulating chain this rule indeed leads to the expected exponential increase of the entanglement time with the length of the insulator. Second, if both links lead to effective coupling, g 12 1 and g 23 1, then the entanglement spreads sequentially through the three block chain and we must add the entanglement times G −1 = Γ −1 : (2) In a system with energy conservation the above formula is simply Ohm's law for the thermal resistances. The two RG rules given above lead to the correct scaling of length and time in insulating regions (l ∼ log τ ) and fully conducting regions (l ∼ τ ∼ τ tr /l). To complete the RG scheme we have to determine the behavior of boundaries between insulating and conducting regions. There we expect to encounter three block systems with one effective link g 12 1 and one ineffective link g 23 1. We have to distinguish the case in which the effective link is a link between two metallic blocks from the case when it is a link between a metallic and an insulating block.
Joining an insulator and a conductor, even if the link ultimately turns out to be effective, leads to exponential suppression of the relaxation rate with the length of the insulator (see appendix A). Coupling this structure to yet another insulator (i.e. the ineffective link g 23 ) would lead to further exponential suppression and hence insulatinglike scaling of Γ R as prescribed in the RG rule (1). In appendix A we justify this formula using resummation of the perturbation theory. On the other hand, if the effective link g 12 is between two conductors, then the exponential suppression of the rate Γ R is only the result of the transport through the ineffective link g 23 which constitutes the bottleneck for entanglement spread. In this case we should use the second RG rule (2), which essentially adds the time of this bottleneck to the faster timescale Γ −1 12 . We did not give expressions for intermediate regimes where g ij ∼ 1. Our approach will rely on having such a wide distribution of g's at the interesting fixed point, that the probability of having g ∼ 1 on a link vanishes. In practice we thus treat any g > 1 as g 1, and any g < 1 as g 1.

III. FIXED POINTS AND FLOWS
Application of these RG rules to a chain with a random distribution of coupling constants leads to a flow of those distributions. Instead of solving the rather complicated integro-differential equations for the scale-dependent distributions we simply simulate the RG process on an ensemble of chains, each with up to 10 5 or more initial blocks. Each block in the initial state is taken to be a 100 × 100 matrix with uniform ∆ = W/100 and g = 1, so the initial block lengths are l 0 = log 2 (100). This immediately implies also a uniform ∆ ij . The randomness is introduced in the distribution of the inter block couplings g ij , which are generated in the following way. First a set g ij is drawn from a log normal distribution with mean log(g 0 ) and standard deviation σ g = 1. The problem with this initial set, however, is that the link entanglement timesτ ij obtained from these couplings do not necessarily satisfy the requirement that all link times must be longer than the individual block times τ i = τ 0 (taken to be constant initially). To guarantee this hierarchy we adjust the link times by adding to them the adjacent block times τ ij =τ ij + 2τ 0 . The new dimensionless link couplings g ij are now obtained from the adjusted link times g ij = 1/(τ ij ∆ ij ). We use the parameter log(g 0 ) as the tuning parameter for the many-body localization transition. Although we start with moderate randomness, and only on the links, near the critical point the RG flows rapidly to strong randomness in all parameters.
In the course of renormalization blocks are being joined together into larger ones, so that the typical block length l(Ω) grows as the cutoff Ω decreases. We study how the distributions of the block parameters are behaving as a function of the length scale l(Ω). Qualitatively, the system can flow to two simple fixed points characterized by the scaling of the average value of the dimensionless coupling g with l. If the system is in the many-body localized phase then g(l) vanishes exponentially with the length-scale l. If, on the other hand, the system is in the delocalized phase then g(l) increases exponentially. Fig. 2(a) shows how the flow is driven across the transition between the two phases by tuning the characteristic bare coupling log g 0 .
The location of the critical point is accurately determined by a careful finite size scaling analysis, using a measure of the entanglement entropy as a scaling variable, is described in section VI below. An alternative finite size analysis using the slopes d log(g) /dl is given in appendix B. Both approaches give the same outcome. Once the transition point has been located we can characterize the RG flow of the distribution of coupling constants near it.
A natural scaling variable to consider in our case, which reflects the link entanglement rates, is γ = ln(Ω/Γ link ). The flow of this variable is characteristic of an infinite randomness critical point [21]. Specifically, we find a linear growth of the standard deviation δγ with the RG flow parameter λ = log(Ω/Ω 0 ), where Ω 0 is the bare energy scale. This is the same as in the well known random singlet phase discussed by Fisher. However, as we will see below, the scaling between length and time scales at the critical point is different. Here we find log(t) ∼ l ψ with ψ = 1, compared to ψ = 1/2 at the random-singlet ground state infinite-randomness fixed point. Thus in this sense the flow to infinite randomness is stronger at our new fixed point than it is at the ground state infiniterandomness fixed points, and this distinction has important consequences.
The flow to infinite randomness is important also as a (a) The many-body localization transition tuned by the bare coupling g0 as seen in the RG flow of the dimensionless coupling g = Γ/∆. In the thermal phase g grows exponentially with block length l (linearly with the dimension of the block's Hilbert space), whereas in the localized phase g decreases exponentially. (b) The length-time scaling for thermal transport ltr ∼ t α extracted from the RG flow. The dynamical exponent α is plotted as a function of the tuning parameter. It reveals a continuous transition from a localized phase to a sub-diffusive but thermal Griffiths regime, i.e. with 0 < α < 1/2. The transport becomes diffusive, α = 1/2, deep in the delocalized phase.
posteriori justification of the RG scheme. In the formulation of the RG steps we have assumed that the links realize extreme situations with either g 1 or g 1. As shown explicitly in appendix B, the flow of the scaling variable γ to infinite randomness is accompanied by broad distribution of the dimensionless coupling constant g = Γ/∆, so that near the critical point almost all links indeed realize extreme values of g. In the following sections we analyze the universal dynamics and the scaling of physical quantities on either side of the transition, where they are influenced by the infinite randomness critical point.

IV. ENERGY TRANSPORT AND ENTANGLEMENT SPREAD
An obvious property to study in systems with energy conservation is the behavior of thermal transport near the many-body localization transition. Information on the thermal transport can be gained directly from the RG flow by inspecting how the typical thermalization time of a block τ tr = l/Γ scales with the block size l.
In the insulating phase we expect that τ tr (l) grows exponentially with l, or l ∼ log τ tr , while in a diffusive conductor l ∼ √ Dτ tr , where D is the diffusion constant. One might expect that D vanishes continuously as the transition is approached. However, this is not the result we find from the RG flow. Rather, we obtain a lengthtime scaling which follows a generalized power-law scaling l ∼ τ α tr . As seen in Fig. 2(b), far from the critical point we indeed have the diffusive α = 1/2, but closer to the transition the thermal transport is subdiffusive. The exponent α vanishes continuously at the critical point, where the length-time scaling becomes logarithmic as in the localized phase.
From the anomalous diffusion exponent α we can immediately infer the rate of entanglement entropy growth in a system undergoing time evolution from an initially nonentangled product pure state. The bipartite entanglement entropy across a link in our chain generated after time τ is proportional to the number of degrees of freedom that become entangled by that time, i.e. S ∼ l(τ ). Substituting τ tr = lτ into the relation l ∼ τ α tr we then find S ∼ τ α/(1−α) . In particular this scaling relation implies ballistic entanglement spreading (S ∝ t) in systems with diffusive energy transport, as already noted in Ref. [22]. On the other hand the two exponents α and α ent have the same asymptotic behavior at the critical point as α and α ent → 0.

V. EFFECTIVE GRIFFITHS PHASE MODEL
We argue that the existence of a sub diffusive phase is a natural precursor to the many-body localization transition in one dimension. If we assume that the many-body localization transition is continuous, then it must be accompanied by a diverging correlation length, ξ ∼ |g 0 − g 0c | −ν . As we show below, we do indeed find such a power law divergence of the correlation length. If we look at the system at this length scale, then it looks critical. Since this is a critical point governed by an infinite randomness fixed point with ψ = 1, regions of this critical system viewed at this length scale ξ show a wide range of local behavior, ranging from insulating to thermalizing, with blocks of length ξ being either critical or insulating, each with a probability of order one. For a system that is globally delocalized (g 0 > g 0c ), on longer scales l than ξ the system is typically locally thermalizing, but longer locally critical blocks of length l may exist with a probability that behaves as p(l) ∼ exp (−l/ξ).
While they are exponentially rare, such long critical or insulating regions lead to an exponentially long delay of the entanglement time τ (l) ∼ τ 0 exp(l/ξ 0 ), where ξ 0 and τ 0 are microscopic length and time scales respectively. Hence these rare critical regions have a significant effect on the average; this is a defining feature of a Griffiths regime [23].
In a long section of length L ξ, the typical length l m of the longest locally critical block is given by p(l m ) ∼ ξ/L, which gives l m = ξ log(L/ξ). Near enough to the critical point, these rare, long critical blocks are the dominant bottlenecks to entanglement spread and energy transport. Substituting l m in the exponential for the time scale we find τ ∼ L z and τ tr ∼ L z+1 , with continuously variable Griffiths dynamical exponent z = α −1 ent ≈ ξ/ξ 0 . Note that if we instead had ψ < 1, as at the ground state infinite randomness critical points, then τ (l) ∼ exp(l ψ ), which would be too weak to produce this subdiffusive behavior, so the dominant Griffiths domains are not critical regions, but insulating regions.
These Griffiths effects dominate the long time transport as long as z > 1, while the systems has "normal" transport farther from the transition, where z "sticks" to the value z = 1 that gives ballistic entanglement spread and diffusive energy transport. Note this also implies that the Griffiths exponent for entanglement spreading behaves as α ent ∼ (g 0 − g c ) ν as one approaches the transition. Fig. 7 of Appendix C shows that our results are consistent with this behavior. Near the transition, these Griffiths effects lead to a broad distribution of the dimensionless coupling g(L) at large L, due to the variation in the severity of the slowest bottleneck. This then matches on nicely to the broad distributions we find at the critical point.

VI. SCALING OF EIGENSTATE ENTANGLEMENT ENTROPY
So far we considered only the properties of a system undergoing time evolution. From this perspective the critical point is similar to the many-body localized phase in that both exhibit transport or entanglement times that grow exponentially with the length. However, more direct insight on the nature of the critical point can be gleaned from studying how generic energy eigenstates change across the critical point. As mentioned in the introduction, the many-body localization transition represents a novel type of critical point at which the eigenstate entanglement scaling changes from area-law to volume law [16,24,25]. The real space RG approach can lend information on how this change takes place.
First, we explain the relation between the dimensionless coupling g and the entanglement entropy in eigenstates. Suppose we renormalized the chain all the way down to the point where we have only two blocks remaining in the system. If these two blocks were decoupled then the exact eigenstates would be non-entangled product states of the two blocks. The rate Γ ij represents the lifetime of the product states due to weak coupling between the blocks (relative to intra-block coupling). The true eigenstates are then a superposition of the ∼ (g 12 + 1) = 1 + Γ 12 /∆ 12 product states nearest in energy (one is added to correctly match the decoupled limit g 12 = 0, where the superposition still contains one state, the original product state). Hence S 12 = log(1 + g 12 ) has the meaning of a "diagonal" entropy associated with a single energy eigenstate when the corresponding density matrix is expressed in the basis of product states. This entropy is related to entanglement entropy, but is defined without tracing out part of the system; it can be as large as the full thermal entropy of the two blocks.
The above definition might not reflect a bulk entropy in cases where the last decimated link is a very weak link which happens to be located far from the center and close to one of the ends of the chain. To avoid this issue we use a slightly modified definition of the entropy. We keep track of the coupling g associated with the block that spans the middle of the original chain at each stage of the RG and record its maximum over the entire flow. We denote the outcome as g max and define S = log(1+g max ).
The need for taking g max rather than the last surviving g is particularly important when there is a very weak link somewhere in the chain. As a toy example consider a chain of three blocks, where blocks 1 and 2 are coupled and together span the interface, whereas blocks 2 and 3 are completely disconnected (i.e Γ 23 = 0). In this case we will first join blocks 1 and 2 to get a new block with g 12 > 0, which spans the interface. Obviously there is entanglement across the interface, which S 12 = log(1 + g 12 ) represents. However if we now continue to renormalize we would obtain g = 0 for the last remaining block, which of course represents only the absence of entanglement across the disconnected link.
The RG scheme is repeated on a large number of disorder realizations allowing to obtain a full distribution of the the entanglement entropy. Examples of entropy distributions found in the different states, including the localized state, the critical point, the Griffiths phase and the diffusive regime are shown in appendix D. Here in Fig. 3(a) we present the average and standard deviation of the entropy as a function of the bare coupling log g 0 calculated for varying system sizes L (L in units of elementary blocks of l 0 spins). The entropy and its fluctuation are normalized by the extensive thermal entropy S T = L l 0 log 2. As expected, the variation of S/S T and δS/S T as a function of log g 0 sharpen with increasing size in a way which suggests the existence of a critical point in the limit L → ∞. In this case we anticipate that near the critical point the functions S(g 0 , L)/S T and δS(g 0 , L)/S T should all collapse on scaling functions of a single variable (g 0 − g 0c )L 1/ν , where the critical value g 0c and the universal critical exponent ν are fitting parameters (see Fig. 3(b)).
The correlation-length critical exponent extracted from the data collapse, ν ∼ = 2.8, satisfies the Harris inequality ν ≥ 2/d required for stability of the critical point [26,27]. It is interesting that a much smaller exponent, which violates this inequality, was found in recent finite size scaling analysis of exact diagonalization data [16,28]. This, as well as other differences from our scaling form may be due to the small system sizes studied in Refs. [16,28], L < 18, which are likely to be too small to approach the scaling limit. Indeed in our case, although we start from a coarse grained model, system sizes of 50 or more blocks are needed. We do make some assumptions in choosing some of the details of our RG rules. There thus remains the possibility that the precise value of this exponent ν is sensitive to such details, so this estimate of ν should be viewed as possibly approximate.
The entanglement scaling functions shown in Fig. 3(b) expose important properties of the subdiffusive Griffiths phase. A finite size system in the subdiffusive Griffiths phase corresponds to a point on the positive scaled x axis. With increasing system size, such a system flows on the scaling functions to the right hand side of the graph, where the entropy density approaches its thermalequilibrium value. At the same time the relative fluctuations of the entropy in the Griffiths phase vanish with system size. These results indicate that the entire Griffiths regime is fully thermal in the limit of large L.
More information on the critical point itself can be gleaned from inspecting the full distribution of the entropy. The result shown in appendix D Fig. 8(b) suggest that the entropy distribution approaches a powerlaw p(s) ∼ 1/s ζ with ζ ≈ 0.84. For ζ < 1 the average and standard deviation of the entropy density is expected to be a non vanishing constant. The fact that ζ is close to 1 may explain why these constants appear to be very small. We note that if ζ = 1 then the entropy density at the critical point would approach zero as 1/ log L and its fluctuations would vanish as 1/ √ log L.

VII. CONCLUSIONS
We presented a new renormalization group framework, which provides a description of the many-body localization transition in one-dimensional systems. The dynamical scaling between length and time l ∼ t α within the thermal phase extracted from the RG calculation shows that the transition occurs at a critical point where α vanishes continuously. Hence the delocalized phase near the critical point displays sub-diffusive transport and subballistic entanglement growth in time evolution. This behavior is understood in terms of a Griffiths phase dominated by rare critical inclusions in a conductor.
We pointed out a connection between dynamical properties and the entanglement entropy associated with individual eigenstates near the critical point. Using this relation we show how eigenstates with area law entanglement in the localized phase transition to ones with volume law entanglement entropy characteristic of thermal states. This occurs through an infinite randomness critical point at which the distribution of entanglement entropy becomes broad, spanning the entire range from S ∼ 0 to the full thermal value S ∼ L. The RG flow to infinite randomness is as strong as is possible, with exponent ψ = 1. In the delocalized Griffiths phase, on the other hand, the entanglement entropy density is peaked near the thermal value with fluctuations that vanish in the limit of large L, indicating that this delocalized Griffiths regime is fully thermal in spite of its anomalous transport properties. More generally, the variation of the entropy and its fluctuations across the transition is expressed in terms of finite-size scaling functions from which we extract an estimate of the critical exponent ν ∼ = 2.8 associated with the diverging correlation length.
It is interesting to understand our result in view of the constraints set on the many-body localization transitions by the strong sub-additivity property of the entanglement entropy [25]. The constraint relevant to the situation we describe is that the critical point marking a direct transition to a thermal delocalized state must itself obey the eigenstate thermalization hypothesis and show thermal behavior of the entanglement entropy. At first sight this may appear to contradict our finding of strongly fluctuating, non thermal entanglement entropy at the critical point. However we note that strong sub additivity requires only thermal behavior of the the entanglement entropy associated with a subsystem of size l much smaller than the full system size L. That is S(l, L) must behave as the thermal entropy with vanishing fluctuations in the appropriate thermodynamic limit L → ∞ while l/L → 0. Thus we conclude that the critical point represents a weaker class of thermal states than the delocalized Griffiths phase. In the latter the entanglement entropy density for half of the system is thermal with vanishing relative fluctuations in the L → ∞ limit, while for the former only the limit of small subsystems (l L) is fully thermal.

ACKNOWLEDGMENTS
Illuminating discussions with Anatoli Polkovnikov, Gil Refael, Dmitri Abanin, Rahul Nandkishore, Sarang Gopalakrishnan, and Joel Moore are gratefully acknowledged. EA thanks the Miller Institute at UC Berkeley, the Aspen Center for Physics under NSF Grant # 1066293 and the Perimeter Institute for hospitality. EA and RV were supported by ERC grant UQUAM, the ISF grant # 1594/11 (EA).

Appendix A: Derivation of the RG rules
In this section we derive the RG rules, which are used in the main text. At each step of the RG we join the pair of blocks connected by the fastest link rate Γ i,i+1 into a single block thereby making the link variables of this pair into the new block variables. The non trivial part of the transformation prescribes what are the new link rates Γ L and Γ R connecting the newly joined block to its left and right neighbors. Before explaining how to compute these rates we shall discuss the physical meaning of the input two-block rates.

Two block relaxation
The "bare" two-block relaxation rates Γ ij are given as input and not directly calculated. However we need to know how they depend on the microscopic coupling matrix elements between blocks in order to understand how these rates enter the calculated three-block rates. In general we want to consider a situation of two neighboring blocks 1 and 2 characterized by internal rates Γ 1 and Γ 2 and level spacings ∆ 1 and ∆ 2 . Recall that these blocks are really chains of microscopic constituents, e.g. spins. Therefore, for a Hamiltonian system with energy conservation the band-width that captures almost all of the many body spectrum grows with the block length as W √ l, where W is a microscopic energy scale and l is the block length. Correspondingly the typical manybody level spacing for the block is ∆ ∼ W √ l/2 l . In a Floquet system on the other hand the bandwidth remains constant W and therefore the mean level spacing ∆ ∼ W/2 l . In practice this difference is sub-leading to the exponential dependence and will make no difference for the critical point.
In absence of coupling between the blocks the eigenstates of the two block system are, of course, products of the single block eigenstates. We now introduce coupling between the two blocks through a local operator J 12 which changes microscopic degrees of freedom on the two neighboring edges of the blocks. It can be written aŝ where A 1 operates on the edge of block 1 and A 2 operates on the edge of block 2.
If the system is prepared in a product of the single block eigenstates, the coupling leads to decay of the state by inducing transitions to other product states. The Fermi golden rule expression for this decay rate is where | i 1 , i 2 is the initial state and the summation is over the possible final states | n 1 , n 2 . Here, ω 1 = E n1 − E i1 and ω 2 = E n2 − E i2 are the energy changes due to the transitions in block 1 and 2 respectively.
The nature of the transition matrix elements n b | A b | m b , relevant for relaxation from one side of the block to the other, depend on whether the block b under consideration is delocalized or localized. If it is a strongly localized block, we can write the block eigenstates as mutual eigenstates of quasi-local integrals of motion τ z i ("l-bits"), | n = | τ 1 , . . . τ l . The local operator A † b at the edge of the block can be written in terms of these l-bits as where the operatorsÔ nr are non local operators that flip multiple l-bits extending up to a distance r from the first site. a nr are random coefficients of order 1 and random sign and ξ b is a microscopic length scale (ξ b ≤ 1). Since we are interested in the end to end relaxation we only consider transitions, which change the state of the integrals of motion τ z i all the way to the other side of the block. For typical matrix elements of interest we have: where a nm are random numbers of order 1 drawn from a state independent distribution as long as the two states | m and | n differ by an energy of up to order W . a mn essentially vanish for larger transition energies (e.g. for transitions of the order of the bandwidth W √ l). On the other hand, when dealing with a delocalized block the single block integrals of motion are the projectors on single block eigenstates | n n | , which are highly non local operators. We take the transition matrix elements of the local operator A b between these states to be functions of the energy difference between them alone: These matrix elements are directly related to the temporal decay of the autocorrelation function where ρ b ∼ 1/∆ b is the density of states of block b. For example if the block is diffusive and the system is energy conserving then f b (t) ∼ τ 0 /t. Note that we can unify the notations for the different cases if in the insulating We are now ready to evaluate the relaxation rates by converting the sums in (A2) into integrals over the respective density of states and plugging in the appropriate matrix elements.
The upper cutoff is set by the minimum of the decay times of the two blocks, τ = min(Γ −1 1 , Γ −1 2 ). Let us pause to consider the relaxation rate in different cases, i.e. when we couple (i) two insulators, (ii) two conductors or (iii) an insulator and a conductor. In case (i) taking for simplicity ξ 1 = ξ 2 ≡ ξ 0 and J 12 ≈ W , the microscopic energy scale, we have: Now, using ∆ b = W √ l b /2 l b and ∆ 12 = W √ l 12 /2 (l12) we can express the dimensionless coupling as This must be smaller than 1 because a similar expression, with the same exponential factor, holds for g 1 and g 2 of the individual blocks, and for the latter to be much smaller than 1, as assumed, we must have ln 2 < ξ −1 0 . In case (ii), when both blocks are conducting, perturbation theory is not valid, but at least it can indicate that the coupling must be effective and the blocks thermalize.
In this case we expect the end-to-end entanglement time is simply the sum of the times to entangle across each block: Γ −1 12 = Γ −1 1 + Γ −1 2 . Finally in case (iii) of a conductor coupled to an insulator we find Γ 12 = Γ −1 1 + (W 2 /∆ 2 )e −2l2/ξ2 . In the case of a fast conductor (Γ 1 Γ 2 ) we have for the dimensionless coupling : If the length l 2 of the insulator is long enough, the incipient conductor of length l 1 is not able to thermalize it.

Two block entanglement rate
If the two block relaxation rate calculated above turns out to be smaller than the two block level spacing ∆ 12 then g 12 < 1, it is deemed ineffective and the two blocks do not exhibit end-to-end relaxation. However, there is still a physical rate, which describes the rate at which the degrees of freedom at the furthest ends of the blocks get entangled with each other. We will see that the ex-pression for the entanglement rate of two blocks that end up insulating turns out to be identical to the expression (A6) for the relaxation rate.
The fact that the coupling between the blocks is ineffective means that operators, which were integrals of motion of the individual blocks map continuously to integrals of motion of the two block system. In particular the projectors on single block eigenstates | n 1 n 1 | ⊗ 1 and 1 ⊗ | n 2 n 2 | are continuously connected to integrals of motion of the coupled two block system. The coupling between the two blocks generates a diagonal interaction between these conserved quantities, which can lead to generation of entanglement in the course of time evolution. We want to find the effective diagonal coupling generated between degrees of freedom that, if localized, are located at opposite ends of the two block system.
The diagonal interaction we are interested in is generated by second order of perturbation theory in the local (off diagonal) inter-block coupling: This is exactly the same expression we obtained for the relaxation rate. It is important to note here that matrix elements m b | A b | n b , which lead to generation of end to end interaction (and through it end to end entanglement) are only those which involve end-to-end couplings between degrees of freedom at the far ends of the two block system. Hence, as in the previous section, we are interested in the non-local tail of the operator A b , when written terms of the block integrals of motion (A3). For this reason the function F b (ω) in an insulating block involves a suppression factor of order e −2l b /ξ b

Perturbative three block relaxation
Suppose we are now joining blocks 1 and 2 with the fastest link rate Γ 12 . We must then find the new rate Γ R needed for thermalization (or end-to-end entanglement in the insulating case) through the three block system 1, 2, 3. We will see that this rate can be expressed in terms of the two-block and single block rates.
The simplest case to treat is when both of the links are ineffective, i.e. g 12 1 and g 23 1. In this case the decay rate from initial state | i to final state | f is obtained using the generalized Fermi-golden rule with the T -matrix given by where H 0 is the Hamiltonian of decoupled blocks (i.e. contains only the intra-block interactions) andĴ is the coupling between the blocks. In our case,Ĵ =Ĵ 12 +Ĵ 23 . Clearly, to lowest order inĴ we recover the usual fermi golden-rule.
A crucial point is that the relaxation process we calculate involves a decay from an initial state | i 1 , i 2 , i 3 to a final state | f 1 , f 2 , f 3 , which is different from the initial state in at least the first and the last labels (i.e i 1 = f 1 and i 3 = f 3 ). Otherwise it would not correspond to full end-to-end relaxation of the three block system.
The matrix elements of the T -matrix now take the explicit form The first order term dropped because | i and | f are not connected by a single application ofĴ 12 or J 23 . We also introduces a sum over complete interme For the summation over intermediate states of the second block we choose a basis of block eigenstates that is not the eigenstate basis. Rather, to take advantage of our knowledge of the intra-block rate Γ 2 we divide the middle block to two halves and take a basis of product states of the two halves. These states are broadened by Γ 2 (by the definition of this internal block rate), that is their energy can be effectively taken to be E m + iΓ 2 . In particular, this will give rise to an imaginary part of the energy denominator η = Γ 2 when evaluating the T -matrix element (A14).
We are now in position to compute the decay rate using (A12) and (A14) to obtain (A15) Plugging (A1), defining the energy shifts within the blocks , and converting the sum into an integral over the density of states we get In the second line we changed variables to ω = ω 1 + ω 2L , ω = ω 3 +ω 2R , L = (ω 1 −ω 2L )/2 and ω R = (ω 3 −ω 2R )/2, and integrated over ω . We now transform the integral over L and R to integrals over time where in the second step we integrated over ω using the Fourier transform of a Lorentzian. Now, because there is an independent cutoff Γ −1 2 on each of the integration times, set by the second exponential factor above, we can drop the cutoff on the time difference set by the first exponential. Hence up to multiplicative constants that would be irrelevant in the RG for strong disorder we have are usually characterized by (i) how the scaling variables (associated with coupling constants) and their standard deviations scale with the block length l, and (ii) how the block length scales with the logarithm of the energy cutoff λ = log(Ω 0 /Ω). Fig. 5 clearly shows a linear dependence of the scaling variable γ = log(Ω/Γ link ), associated with the link entanglement rates, and of its standard deviation δγ, with the block length l. The same linear scaling behavior is seen for δ log (g). The length time scaling at the critical point is shown in Fig. 6(b), showing a linear dependence of l on λ, that is, a logarithmic dependence on the time.
These results indicate a different class of infinite randomness critical point from the famous cases of the random singlet phase and the random transverse field Ising model considered by Fisher [21,29]. Specifically, at Fisher's fixed point the RG flow parameter λ scales with l as λ ∼ l ψ with ψ = 1/2, while in our case ψ = 1.

Appendix C: Length time scaling
In Fig. 6 we present the length vs. time scaling extracted from the RG calculation. The lines show the average block length l versus the time-scale given by the inverse of the cutoff frequency t = Ω −1 . As explained in the main text this scale corresponds to the time for entanglement spreading. Therefore the power-laws fitted in the delocalized phase give the dynamical exponent for entanglement growth α ent . To obtain the transport exponent α shown in Fig. 2(b) of the main text we use the scaling relation derived in the main text α = α ent /(1+α ent ). We note that at the critical point and the localized phase, i.e. for log(g 0 ) ≤ log(g 0c ) ≈ −1.8, the exponent vanishes and the dependence becomes instead logarithmic growth of entanglement with time.
We can ask how the dynamical exponent α or α ent vanishes as g 0 approaches g 0c . The plot of α ent versus (g 0 − g 0c ) on a log-log plot, computed using the RG on systems of varying sizes is shown in Fig. 7. These results seem consistent with the expected α ent ∼ (g 0 − g 0c ) ν . Note however that it is rather hard to obtain the asymptotic behavior of α near the critical point because of two requirements that have to be met. First, we must get sufficiently close to the critical point in order to be in the critical scaling regime. Second, the sub-diffusive transport is a property of the Griffiths phase, thus for given value of the tuning parameter g 0 − g 0c in the scaling regime we must obtain α from systems sizes that are much larger than the long correlation length ξ = c(g0 − g 0c ) −ν . The quality of the power-law fit suggests that the systems we are calculating may be just barely reaching the asymptotic scaling of the exponent α.

Appendix D: Entanglement entropy distributions
In this appendix we show examples of the entanglement entropy distributions computed using the RG flow applied to an ensemble of disorder realizations. Fig. 8 shows four distributions taken respectively from the localized phase, the critical point, the Griffiths phase and the diffusive regime for long chains with L/l 0 = 10000. In the localized phase the entanglement entropy follows an area law, therefore the distribution of the specific entropy s = S/S T is concentrated near zero, with the tail of the distribution consistent with a simple exponential. At the critical point the entanglement entropy shows a broad distribution that is consistent with a power law P c (s) ∼ 1/s ζ with ζ ∼ = 0.8. In the Griffiths phase the distribution has a relatively narrow peak near the ther-  mal value. Finally, in the diffusive phase the distribution becomes essentially a delta function at the thermal value minus a tiny finite-size correction.