Randomness-Assisted Exponential Hierarchies

Inspired by the localization phenomenon in condensed matter systems, we explore constructions in the theory space of multiple scalar fields, in which exponentially suppressed couplings could originate from random parameters. In particular, we find a new class of non-local theory space models, in which scalar fields at non-adjacent sites interact with each other but with strengths decaying exponentially with the site separation. Such a model could have very different localization properties, compared to the local theory space scenarios with only nearest-site interactions, based on the original Anderson localization model. More specifically, we find that a particular non-local interaction pattern leads to bi-localization of the two lightest eigenstates. Exponential localization (and thus exponentially suppressed couplings) then emerges only and immediately when randomness is introduced, no matter how tiny it is. We discuss variants of the model and possible UV completions as well.


Introduction
In his seminal paper in 1958 [1], Anderson showed that electron energy eigenstates localize in a singleparticle system with short range hopping between adjacent sites, even in the presence of disorder. Since then, localization and phase transitions between extended ergodic and localized phases have been hot topics in condensed matter physics. Over the last 60 years, there have been significant numerical and analytical efforts exploring Anderson localization and generalizations of Anderson's original model. For some recent reviews, see [2,3]. One interesting direction to extend the original Anderson model is to consider single-particle systems with long-range hopping, that is, hopping between non-adjacent sites on a lattice. Examples include consider a constant hopping between any arbitrary pair of sites [4][5][6][7] and long-range hopping that follows a power-law decay [8][9][10][11][12][13][14]. One of the most important findings is that the correlation in the long-range hopping tends to localize the system [14].
The localization phenomenon in condensed matter systems could have interesting analogues in particle physics. Recently, it is pointed out in ref. [15] that one might apply a new mechanism analogous to the original Anderson model in one dimension, in which mass eigenstates along a local theory space localize with random mass parameters, to generate exponential hierarchies in four-dimensional quantum field theories. 1 Other analogies have also been discussed in the context of inflation [19], particle production in the early Universe [20] and high-dimensional gravitational theory [21].
In this article, we first introduce and study a new class of random Hamiltonians with translationalinvariant long-range hopping that decays exponentially with separation between any two sites. We will demonstrate that unlike the models with power-law long-range hopping in the condensed matter literature, the energy eigenstates are exponentially localized in this case.
It may be difficult to realize this Hamiltonian in a condensed matter system, in which it is more natural to expect that the Coloumb or dipole interactions lead to power-law hopping. Yet it is plausible to realize such kind of exponentially-decaying long-range hopping using quantum field theory in a non-local theory space. A theory space [22] consists of a set of sites denoting global or gauge symmetry groups, connected by links -quantum fields that transform under adjacent groups. In this paper, we consider two toy models of N real scalars in the theory space with exponentially decaying non-local interactions (i.e., interactions between scalars at non-adjacent sites, not to be confused with interactions that traverse the real space), that could be generated at loop levels through tree-level interactions between adjacent scalars. They both result in similar scalar mass matrices and their mass eigenstates are exponentially localized. Yet the details of the localization, such as how localization emerges, localization length and whether they could be UV completed, could be quite different. In particular, one type of construction and its variants could strengthen localization of the lightest mass eigenstates by making them more peaked at one site in the theory space, in the limit of small random diagonal perturbations. This example addresses one of the major motivations of our study: to explore and identify novel classes of theory space models based on random matrices with different localization properties, especially those that could improve localization.
Exponential localization leads to exponentially suppressed couplings between different sites in the theory space and thus could be used to generate exponential hierarchies. As is known, there exist quite a few puzzling hierarchies in fundamental physics, including the minuscule cosmological constant, the hierarchy problem of a light elementary Higgs boson, and the standard model fermion mass hierarchy. Any new mechanism that could potentially explain a large hierarchy is worth exploring, even when its early versions may not fully solve the problems above. The main interesting feature of the localization-inspired field theory constructions is that the localization emerges only when randomness is introduced. This is in sharp contrast to most of the other approaches to hierarchies, which rely on either continuous or discrete symmetries. In the local theory space model [15], one still needs the random diagonal masses to be of order the nearest-site interactions for the exponential localization to be evident. In this article, in the non-local theory space model with the improved localization, localization at a single site originates from bi-localization due to the model structure without any randomness and then emerges immediately when one turns on even a tiny random perturbation.
The paper is organized as follows: in Sec. 2, we review the basics of localization phenomena in singleparticle systems and random matrix theory, the main tool to study the localization property. In Sec. 3, we present the new class of Hamiltonians with random on-site energies and exponentially decaying long-range hopping. We demonstrate the localization of eigenstates through both numerical and analytical studies. In Sec. 4, we review the local theory space construction, inspired by the original Anderson model. In Sec. 5, we introduce and study two non-local theory space toy models of N real scalars, analyze their exponential localization properties and discuss their feasibilities from the UV point of view as well as the localization property of a variant model. We conclude in Sec. 6.

Localization in Condensed Matter Systems
We begin this section by defining localization in random matrix theory and introducing the simplest random matrix model with exponential localization in the context of the famous one-dimensional Anderson model. We also discuss several important generalizations including long-range interactions. This section is mostly a review of basic concepts, tools, and models in the condensed matter literature, which are most relevant to our studies and can be safely skipped by readers who are familiar with the subject.

Definition of Localization
The mathematical tools used to study Anderson localization are random matrix theory, a subject analyzing the statistical properties of various ensembles of matrices with some or all of their elements pulled from prescribed probability distributions. The statistical properties that have been analyzed to great success include eigenvalue distributions, level spacing, repulsion of eigenvalues, norms of matrix powers, and typical eigenvector properties. For reviews, see [23][24][25][26].
The statistical property that we are most interested in is the existence of a "localized" phase -i.e. a region of parameter space where eigenvectors are typically localized [14]. 2 Consider an N × N random matrix model with N being a free parameter. We say that the eigenvectors of the model are localized if their support sets are independent of N. In contrast, fully ergodic states have support on essentially all components while weakly ergodic states have support for approximately cN components where 0 < c < 1. In between the localized and ergodic phases, there may exist a fractal phase characterized by eigenvectors with support sets scaling like N D with 0 < D < 1 [10,14].
One way to assess numerically whether an eigenvector v of an N × N random matrix is localized is to compute its q-th moment, defined by where v j gives the j th component of the vector [27,28]. It has been argued that where ξ is the size of the support set [10,27,29].
One could also determine the existence of different phases using the simple Mott and Levitov conditions [9,30]. These criteria can be applied to any matrix model M with entries taking the form M j,k = j δ j,k + m j,k , where j 's are random parameters with j = 0 and 2 j = σ 2 = 0. m could be deterministic or random. Mott's criterion holds for eigenstates in the bulk of the spectrum and is given by where ∆(m) is the entire set of eigenvalues of the matrix m, usually referred to as the spectrum of m. The difference between largest and smallest eigenvalues of m, |∆(m)| = max(∆(m)) − min(∆(m)), is the bandwidth of the spectrum [14,30]. On the other hand, Levitov's criterion holds for all eigenvectors and is given by [9,14]: Note that both criteria are sufficient but not necessary conditions. There are also different classes of localized states. An eigenstate of a matrix model is exponentially localized if it can be well-approximated by an exponentially decaying envelope, that is, if the components of the eigenvector take the form where j 0 labels the component of v with the largest absolute value and L is a constant that is referred to as the localization length of the eigenvector. L characterizes the spread of the eigenstate around j 0 . Similarly, we say that an eigenvector is algebraically localized if its components decay as an algebraic function. In the rest of the article, we will refer to localization as being synonymous with exponential localization unless stated otherwise. While one can use numerical simulations to show that components of an eigenvector are well-fit by some function for a finite N, it is often important to demonstrate that the fitting is a genuine feature of the model and not just some numerical artifact at small N's by showing that eigenvector components converge to the function arbitrarily far away from j 0 . To this end, one often uses a locator expansion method [1,26,31,32] (more discussions will be presented in section 3.2.2) or a transfer matrix approach [33,34].

Anderson Localization
The simplest random matrix model with exponentially localized eigenstates was studied by Anderson in the pioneering paper [1]. The model contains a Hamiltonian of a single particle moving on a lattice of N sites: where the subscripts j and k denote the sites, g is a coupling constant dictating the strength of the nearestneighbor hopping and j 's are random parameters pulled from a uniform distribution in the interval [0, W] which corresponds to local potential defects on the lattice. It is argued that there exists a critical W c ≥ 0 such that for all W > W c , eigenstates of the Hamiltonian are exponentially localized with probability one [1]. A crucial result is that in one dimension, W c = 0, which was proven rigorously using transfer matrices [33,34] and the locator expansion technique [1,26,31,32]. The Anderson model has two interesting limits: strong and weak localization regimes, where the average eigenvector localization lengths are short and long respectively. In the strong localization regime, W g. Localization is unsurprising since the Hamiltonian becomes nearly diagonal. In this regime, localization length is given by [3] In the weak localization regime, W g, the localization length becomes eigenvalue dependent (we denote the eigenvalue λ), and is given by [3]: While localization in this regime is, indeed, much weaker, eigenstates with eigenvalues near the edges of the spectrum ∆(H) ⊂ [−2g, 2g + W] exhibit stronger localization relative to bulk eigenstates. In addition, it is known that if one chooses to vary g between adjacent sites, the model still typically has exponentially localized eigenvectors provided that the pairs ( j , g j ) are chosen independently from a two-dimensional distribution [35].
Another important discovery is that the Anderson Hamiltonian can be generalized to a square lattice in two dimensions with local defects and nearest neighbor hopping. These models also feature exponential localization of eigenstates. But for d ≥ 3, there is only localization when W > W c with W c > 0. Furthermore, W c is a function of eigenvalues in this case. Again the eigenstates near the band edges are more localized with smaller values of W c [26].

Non-local Models
The Anderson model only includes nearest-neighbor hopping interactions. Since then, there has been a growing interest in studying variants of the original model adding non-local interactions.
One simple example with long-range hopping is the Yuzbashyan-Shastry Hamiltonian for a one-dimensional lattice [6] H In this case, the interactions are fully correlated since all the off-diagonal matrix elements are identical [14]. It has been shown that there are no truly extended states for all σ and γ's, which is quite surprising because the model has an approximate cyclic symmetry, which is broken only by the diagonal perturbations. Indeed, all Hamiltonians with cyclic symmetries are necessarily characterized by completely delocalized Bloch states [14]. The localization is interpreted as an effect of destructive interference of long-range hopping trajectories in the presence of a (full) correlation between the long-range interactions.
Instead of having a fully correlated model, one could also consider a partially correlated model with Hamiltonian in which the hopping terms still preserves the translational invariance. A totally uncorrelated model is given by the Rosenzweig-Porter Hamiltonian [36] where there is no symmetry left and t j,k 's only need to satisfy the Hermicity requirement t * j,k = t k,j . 3 These two ensembles still have a localized phase, but only for γ > 2 [14]. When γ < 2, the two models are in either the fractal or ergodic states. While it may not appear surprising that localization exists for matrices with γ > 2 at least in the large N limit with all the off-diagonal terms approaching zero, it is not obvious why the systems are delocalized when 0 < γ < 2. The three examples above demonstrate that longrange interactions do not necessarily delocalize the system. However, increasing correlations in the nonlocal hopping terms generally leads to improved localization, a phenomenon dubbed correlation induced localization in ref. [14].
In general, translationally invariant Hamiltonians are particularly interesting since they arise often in physical situations. It could be useful to apply the Fourier transformation and study the system in the momentum space. More specifically, a Hamiltonian of the form H j,k = j δ j,k + h j−k can be Fourier transformed to the dual oneH p,q =Ẽ p δ p,q +J p−q with assuming periodic boundary conditions [14]. Note that in the momentum space, the diagonal random entries and the off-diagonal interactions in the real space switch their roles. Thus it is easy to diagonalize the translation-invariant models in the momentum space when one sets aside the random diagonal perturbations. The Fourier transform is particularly useful for models with Hamiltonians parametrized by the family of power-law banded matrices: H j,k = j δ j,k + [1 + (|j − k|/b) a ] −1 , where a and b are positive adjustable free parameters. These models with algebraically decaying off-diagonal couplings have algebraically (rather than exponentially) localized eigenvectors, v j ∼ |j − j 0 | −a [10].

Hamiltonian with Long-range Exponentially Decaying Hopping
In this section, we introduce a new class of Hamiltonian with long-range interactions for a particle moving on a one-dimensional lattice with N sites: where both 's and g have the dimension of energy and b is dimensionless. We take b > 1. Analogous to the Anderson model, j 's are random parameters pulled from a uniform distribution in the interval [0, W]. In this model, the long-range hopping decays as an exponential function of the site separation. The hopping terms are translationally invariant, which means that the hopping between two different sites only depends on their separation but not on their specific positions. Here we only consider fully deterministic hopping, but one could generalize the discussions to random hopping with varying g.

Numerical Results
We first present the numerical properties, i.e., localization lengths, of the Hamiltonian ensembles defined in Eq. (13), taking as an ansatz that they are characterized by exponentially localized eigenvectors. Indeed we find that numerically, the eigenvectors are well-fit by exponential envelopes. In the next section, we will also provide analytical evidence for the exponential localization. We generate a sample of 300 random matrices for given N and W, take the lowest energy eigenvector for each matrix in the sample and compute its localization length by fitting it with an exponential envelope. We thus obtain a distribution for the localization length. The median localization lengths (more precisely, their logarithms) as functions of N and W, for both the model with exponentially-decaying long-range interaction and the Anderson model, are shown in the heat maps in Figure 1. 4 The darker the region is, the shorter the localization length is and the more localized the lightest eigenstate is. In the plot, we have also fixed g = 1/2 for the Anderson model and g = 1, b = 2 for the exponential long-range model so that their nearest neighbor hopping terms are the same. Note that g and 's are in the same unit of energy, which we don't specify and does not affect the discussions.
From Figure 1, we confirm numerically that exponentially-decaying, long-range interactions do not harm exponential localization in contrast to other long-range interactions. It is not entirely surprising since the long-range hopping decays exponentially and becomes negligible when the hopping distance is large. In addition, in the W g limit, the localization improves slightly in the long-range model in the sense that the localization length shrinks a bit. In the opposite limit with W g, however, the localization worsens in the long-range model.

Localization Criteria
We first apply the Mott's and Levitov's criteria reviewed in Sec. 2.1 to determine the localization phase of our model. To use the criteria, we need to evaluate the bandwidth of the spectrum. The hopping terms could be represented by a Toeplitz matrix, t, which is translationally invariant and satisfies t j,k ≡ t k−j . Spectral properties of Toeplitz matrices have been studied extensively in the math literature, and we review the relevant mathematical tools and apply them to our model in Appendix A. We only quote the final result below. As long as b > 1, we have the spectrum in the full range so, the bandwidth is |∆( The variance of the random terms in the model is 2 ≡ σ 2 = W 2 /3. Thus the Mott's criterion for ergodic states in Eq. (3) is never satisfied. Since Mott's criterion is a sufficient but necessary condition, its violation cannot guarantee localization. On the other hand, Levitov's criterion in Eq. (4) is always satisfied since where ∑ j =k 1 b |j−k| ∝ N. Since Levitov's criterion is a sufficient condition of localization, we prove that the eigenstates of our model are localized. Yet, it does not tell us the detailed properties of the localization, which we will study in the next section.

Analytical Approach in the Strong Localization Regime
Now we use the locator expansion method and forward scattering approximation to show that localization is exponential and to estimate the localization length in the strong localization regime. While the emergence of localization is trivial in the W = ∞ limit, it is still non-trivial to confirm that this localization phase is stable when including hopping with strength g/W 1. More importantly, we want to know whether the localization follows an exponential or power law and what the localization length is. We will first review the methodology in the context of the Anderson model and then modify and adjust the derivation to our model. For complete treatments see refs. [1,26,31,32], which offer more mathematical insights concerning the general theory.
Review of the Method. In the Anderson model, in the infinite diagonal disorder limit, we could write the Hamiltonian as where i, j label the sites on the lattice. As discussed before, 's are taken randomly from a uniform distribution in the range [0, W]. Consider the resolvent defined by where ∆(H) is the spectrum of H. We denote the set of eigenstates of H as {ψ α }. The matrix element of the resolvent in the site basis is given by In the limit of large diagonal randomness, the size of an eigenvector ψ α at site n with its main support located at site a (so that ψ * α (a) ≈ 1) is given by The eigenvector is exponentially localized if ψ a (n) decays exponentially in the large n limit. Given the general discussion above, it is clear that our goal is to calculate the resolvent, which we could carry it out in perturbation theory with the expansion parameter g/W. The unperturbed resolvent is We iterate this equation and find We could rewrite this series summation in a random-walk representation: where p : i → j is a path starting from the i-th site and ending at the j-th site with a total length |p|. p(s) labels the site at the s-th step in the path. This expression could be further refined as a sum of self-avoiding walks by factorizing all closed loops out of the summation. Combining Eq. (20) and (22), we have To simplify the computation, we proceed with the forward scattering approximation. In this approximation, we only consider the maximal term in the sum, which corresponds to the shortest path as all longer paths are higher order in g/W. For simplicity, we set α = a = 0 and a = 0. This gives where we use the central limit theorem and replace ∑ n s=1 ln( p(s) ) by n ln( p(s) ) = n W W 0 ln( )d = n (ln(W) − 1). This shows that in the strong localization limit, ψ falls off exponentially away from the dominant supporting site: |ψ 0 (n)| ∝ e −n/L with the localization length given by which is positive and finite in the large randomness limit, W g.
Application to our model. We are now ready to apply a similar strategy to study localization in the large randomness regime of our model. In our model, the perturbation Hamiltonian changes to The random walk representation of the resolvent is Comparing the equation above with the representation of Anderson model in Eq. (22), one could see that the main difference is that the coupling for each step of random walk could vary, depending on the step length. To use Eq. (20) to determine the shape of the eigenfunction ψ 0 from G(0, n, E), we consider two possible paths: a one-step path p 1 : 0 → n and a n-step path: p n : 0 → 1 → 2 · · · → n. Taking into account only one of the two paths at a time, we find that In the limit W/g 1, the wavefunction at site n receives its dominant contribution from p 1 . The other paths' contributions are also suppressed compared to that of p 1 . The estimate above is crude but suffices to demonstrate the existence of exponential localization in the large randomness limit. Taking into account p 1 only, we estimate the localization length to be which we confirm to be consistent with numerical results. Let's compare our model with the power-law long-range interaction model, in which the off-diagonal hopping scales as |i − j| −β . Considering a single hop from the 0-th to the n-th site (the p 1 path discussed above), we find that The wave function decays algebraically as n −β , consistent with results in the literature, e.g., ref. [14]. These techniques can also be used to estimate the localization length for the theory space models that we will discuss in the ensuing sections. It is not difficult to check that in the large randomness limit, the localization length in the theory space models are the same as the ones of the corresponding Hamiltonian models.

Localization in Theory Space I: Anderson-Inspired Models
Now we discuss how the localization ideas from condensed matter systems with Hamiltonians represented by random matrices can be adapted to construct quantum field theory (QFT) models. In this and next section, we consider toy QFT models consisting of a large number of real scalars π's. In other words, we consider the theory space of scalar fields with each scalar field corresponding to a site on the onedimensional lattice reviewed in section 2. Instead of studying the Hamiltonian, we study the mass matrix of the scalars, M π .
The Anderson localization inspired construction has already been considered in refs [15], which we will briefly review below. Consider a model of N real scalars π i with the following Lagrangian: where i are drawn randomly from a uniform distribution over [0, W]. This toy model could arise from a model of complex scalar fields, Φ i 's, 5 The complex scalar UV completion has a U(1) N symmetry, which we take to be spontaneously broken by V(Φ † Φ) and explicitly broken by the rest of the potential terms. The spontaneous breaking gives rise to a set of goldstone bosons π's, which are the phase modes of the Φ's. π's obtain a potential with the quadratic terms given in Eq. (31) from the explicit breaking terms. This model is equivalent to: which could be obtained from Eq. (31) by a field redefinition π even → −π even while keeping π odd unchanged. Thus their localization properties are identical. The Lagrangians in Eq. (31) and Eq. (33) clearly resemble the Anderson model. The real scalar π i corresponds to the i-th site on the 1d lattice. The diagonal mass term with coefficient i plays the role of the on-site energy while the mass mixing terms correspond to nearest neighbor hopping. In the strong localization limit, when W g, each mass eigenstate obviously has support dominantly from only one of the real scalars. In the more interesting weak localization limit, when W g and the diagonal mass terms are perturbations, the lightest eigenstate is more localized compared to the other eigenstates, analogous to the states at band edges in the Anderson model. Note that the nearest neighbor mixing terms only break down U(1) N to U(1) while terms break U(1) N entirely. Thus, in the weak localization limit, the lightest eigenstate remains as a massless Goldstone boson before adding the diagonal perturbations.
Numerically the localization length is quite large in the weak localization limit with large g/W, which is the more interesting limit compared to the strong localization limit from the model building point of view. For example, g/W ∼ 20, one needs to include at least N ∼ 30 scalar fields to achieve an exponential hierarchy e −N/L of order 0.2 (see panel (b) in Fig. 2). Relatively weak localization persists even when g/W is reduced to be of order 1. Thus it will be interesting to explore other theory space models based on random matrices, which could have different localization properties and potentially improve the localization quality with g/W 1.

Localization in the Theory Space II: Non-Local Theory Space Models
In this section, we study non-local theory space models, which are inspired by the Hamiltonian model introduced in Section 3. We first consider two toy models for N real scalars π's with Lagrangian L + and L − : where j is pulled from the uniform distribution [0, W]. The j 's and g have mass dimension two, and b > 1 is dimensionless. L + and L − only differ by relative signs in the second potential term. While these two models appear very similar, and, indeed, all of them have the lightest eigenstate exponentially localized at large but finite N, they differ significantly in both embedding into UV completions and localization properties such as localization lengths for a given set of parameters. On the contrary, in the local theory space model reviewed in the previous section, we could change (π j − π j+1 ) 2 to (π j + π j+1 ) 2 without changing any localization property and both models have very similar UV completions. We will start by examining the localization properties of these two relatively simple models in subsection 5.1 and then explain differences in the UV completion of these models in subsection 5.2. We choose to proceed in this order because subsection 5.2 introduces a third Lagrangian which is motivated solely from an interesting UV completion but is most easily studied and understood once we have already gained perspective on the toy models. The localization properties of the third Lagrangian are discussed in our final subsection, 5.3.

Localization of Toy Models in the Finite N Limit
In this section, we will present both numerical and analytical results of the localization properties of L ± in Eq. (34). As we shall discuss in the next section, L + does not have a simple perturbative UV comple-tion while L − does. Nevertheless, we will address both of them here as they have qualitatively different localization properties, which helps understand more complicated UV completions.

Numerical Results
.  We compute the localization lengths of the lightest mass eigenstates for three different types of random matrix ensembles based on L A and L ± at different N's and W's and the median of the results are presented in the first three panels of Figure 2. From now on, we fix g = 1/2 in L A and (g, b) = (1, 2) in L ± , where all the quantities are in the same (arbitrary) energy units. By this choice, the mass mixing between nearest neighbors takes the same value 1/2 in all three models. We see two interesting effects: • The inclusion of the non-local interactions in L − strictly worsens the localization of L A for all pairs of (N, W) studied.
• The inclusion of the non-local interactions in L + strictly enhances the localization of L A in the small disorder regime with g/W 1.
Indeed, as we will explain in the following section, for arbitrary small diagonal mass perturbations (W g), the lightest mass eigenstates in the L + model typically have a localization length L ∼ O(1). In addition, an O(1) localization length is present even in the intermediate region W ∼ g, indicating that strong localization of the lightest eigenstate is attained throughout the entire parameter space. Note that these properties of L + persist with the mean localization length as well as the median. As briefly commented in section 3.1, the reason we choose to report the median rather than the mean is due to the fact that the distribution of the localization length could have a small tail extending to very high values for some random We also want to comment that the nice localization properties of L + are robust even when we further allow random variations in the non-local mass mixings. Consider L G + defined as follows: where the superscript G stands for Gaussian and indicates that g j−i should be interpreted as parameters pulled independently from a normalized Gaussian distribution centering around g = 1 with a variance σ, f (x| 1, σ). In this case, the off-diagonal terms in the mass matrix are only partially correlated. The median localization length is presented in the last panel of Figure 2. As expected, adding randomness in the nonlocal mass mixing increases the localization length, e.g., by a factor of 3 when increasing σ = 0 to σ = 0.01, and thus weakens localization. Yet an O(1) localization length is still maintained in the L G + model even when σ is allowed to be as large as ∼ 100, demonstrating that the strong localization properties of L + are preserved under this generalization. There is a subtly: allowing g j−i to be pulled from a Gaussian distribution means that these parameters can now assume negative values. This implies that the massmatrix for L G + may not be always positive-definite and there could be tachyonic directions in the field space. Accordingly, it is necessary to introduce positive quartic terms of π's to stabilize their potential for this model.
Another way to study the exponential localization of L + is to use moments of the eigenvectors defined in Eq. (2). The results are demonstrated in Fig. 3. In panel (a), we show the components of lightest eigenstates for different choices of N. It is clear that the lightest eigenstates are exponentially localized. In panel (b), we compute and present I q 's of the lightest eigenstates from ensembles of mass matrices with N = 25, W = 0.05 based on three toy models. Since all three models are localized, I q (v) ∝ ξ −(q−1) according to Eq. (2), where ξ, the size of the support set, is related to the localization length and also measures how localized the state is: the smaller ξ is, the more localized the state is. From the panel, we could see that ξ(L + ) < ξ(L A ) < ξ(L − ), corroborating our findings using the median localization lengths. In panel (c), we show I q 's for different W's in the L + model. The larger W is, the more spread I q is for a given q, suggesting a weakened localization. Lastly, in panel (d), we present I 2 as a function of N, fixing W = 0.05. Indeed I q is independent of N, confirming again that the lightest eigenstate is localized. So far we focus on the lightest eigenstate and have not yet addressed the localization properties of the other light mass eigenstates. To this end, we present Figure 4, which plots the minimal end-site overlaps, min(|v 1 |, |v N |), as a function of mass for the various ensembles. For this plot, we have normalized the couplings g's in the various models to establish a similar bandwidth, while adjusting W to ensure that it is scaled proportionally to the nearest neighbor couplings (g and g/b in the L A and L ± models, respectively) in order to provide a fair comparison across the models. Having a small value of min(|v 1 |, |v N |) indicates that the state is well-localized. One could see that indeed for all the models, the lightest eigenstate is more localized than the other states.

Semi-analytical Understanding
While L + is just a toy model, it has very compelling numerical properties in the limit W g, which necessitate a more complete analytical treatment. This is important because as we will show in the next section, while L + itself doesn't have a UV completion, UV theories tend to interpolate between L + and L − . Understanding L + will be useful to understand the results of more complicated Lagrangians.
Background on Bisymmetric Matrices. Consider N real scalars π i with i = 1, · · · , N. In the regime, W g, we can write the mass matrix of the π fields with L + as: where P and C correspond to the random diagonal perturbations and contributions from (π i + π j ) 2 terms respectively: where the second term in C arises from summing over geometric series, The matrix C is almost a Toeplitz matrix with a translational invariance, except for the correlated but not necessarily equal diagonal entries, which break the invariance.
The most important feature of C is that it is bisymmetric. It is both symmetric and persymmetric, which means that C F = C where F denotes the flip transpose -a transpose about the diagonal extending from bottom left to top right of the matrix. An even-dimensional bisymmetric matrix could be decomposed into blocks as follows: where A T = A and B F = B. The concrete forms of A and B are given in Appendix B. There is a similar decomposition for odd-dimensional bisymmetric matrices. For simplicity, we will focus on the evendimensional case in the following discussions and similar arguments could be found for the odd-dimensional case. It is useful to introduce an 2N × 2N exchange matrix J, or equivalently, J i,j = δ i,N+1−j . The effect of multiplying J to be left of an arbitrary matrix, X, is to flip its rows from top to bottom while the effect of multiplying it from the right is to flip X's columns. It could also be easily verified that X F = (JX J) T = JX T J. It follows that all bisymmetric matrices commute with J. Thus C and J could be simultaneously diagonalized. We call an eigenvector x even if Jx = +x and odd if Jx = −x. In other words, even eigenvectors have x j = x N+1−j and are center-symmetric while odd eigenvectors are characterized by x j = −x N+1−j and are center-antisymmetric. It is also known that eigenvectors of the bisymmetric matrix C take the form where v is an eigenvector of A + BJ and v is an eigenvector of A − BJ. These forms are consistent with what we discuss above -the first type of eigenvector is even while the second type is odd. Clearly eigenvectors of C cannot be localized (yet they could be bi-localized as we will show). Given that they are either symmetric or anti-symmetric about the center, they cannot be fit by a single exponential envelope of any size. The eigenvalues of C are given by the union of the eigenvalues of A + BJ and A − BJ. Formal proofs of these statements are given in refs. [37,38], which also discuss similar cases of odd dimensional matrices.
Two Special Properties. The discussions above only rely on the bisymmetric nature of the C matrix. They hold for all the models such as the (π i ± π j ) 2 model. Yet the C matrix representing the (π i + π j ) 2 model also has some special features, which are absent in the other models. We first summarize these features, which are confirmed numerically: • The two lightest eigenvectors peak at both end sites.
• The two lowest eigenvalues of C matrix are almost degenerate, with the splitting decreasing exponentially with N: log(λ 2 − λ 1 ) ∼ −N.
Below we provide some semi-analytical understandings of the two features. These features arise partly due to the near commutativity of the submatrices A and BJ. More specifically, A and BJ commute in the infinite N limit. At a finite N, there exists a unitary matrix, U, that could almost diagonalize both A and BJ, with the average of the off-diagonal components of U † AU and U † (BJ)U scaling as a negative power of N. We provide the mathematical proof in Appendix B. Thus A and BJ share similar eigenvectors with small differences at large but finite N. Consequently, v and v , the lightest eigenvectors of A + BJ and A − BJ respectively are almost identical as well. Indeed numerically we confirm that at least for the two lightest eigenstates, v ≈ v . In addition, numerical studies show that v and v peak at the N-th site (or equivalently, they are dominantly supported by π N ) with an approximate exponential envelope. 6 Given Eq. (40), the two lightest eigenstates of the matrix C that encodes (π i + π j ) 2 are approximately v 0 = (Jv, v) T and v 1 ≈ (−Jv, v) T , both of which peak at both ends. This is shown in the top panels of Figure 6.
The corresponding two smallest eigenvalues, λ 1 , λ 2 , are approximately v T · (A ± (BJ)) · v (assuming v's are normalized) so that the difference between them is approximately 2v T · (BJ) · v. Indeed this estimate agrees well with the numerical results, as shown in Fig. 5. The mass splitting exponentially decreases with increasing N. Yet there is one subtlety in evaluating 2v T · (BJ) · v. One has to use the numerical values of v instead of using the approximate exponential envelop v ∼ e (i−N)/L . The reason is that since the mass splitting is tiny, any small deviations between the exact values of v and the exponential fits could modify its value.
Localization Mechanism. The special properties of C discussed above tell us how localization emerges in the (π i + π j ) 2 model in the small diagonal disorder limit. When the diagonal perturbations, i π 2 i are absent, the two lightest eigenstates, v 0 = (Jv, v) T and v 1 = (−Jv, v) T , form a two-dimensional nearly degenerate sub-space with wavefunctions peaking at both end sites. After adding the diagonal perturbations, the mass matrix of π's in the subspace becomes Diagonalizing the subspace, one finds that the two new lightest eigenstates are given by v 0 ± v 1 = (0, v) T and (Jv, 0) T to the leading order in the perturbation theory. Since v only peaks at one end site, the two lightest eigenstates are exponentially localized around one end site once the perturbation is turned on. This is shown in the bottom panels of Figure 6. Note that localization emerges in the models with L − or L A in a quite different way. For both models, there is no approximate two-fold degeneracy between the two lightest eigenstates. In fact, the lightest eigenstate is massless before adding the diagonal perturbations represented by the matrix P. It could be understood most easily from the point of view of UV completions. In both UV completions of the two models, there remains a U(1) symmetry which is only spontaneously broken, resulting in a massless goldstone mode with a flat wavefunction. Turning on a non-zero P leads to the localization of the lightest state. Yet the localization length shrinks slowly with increasing the magnitude of the perturbation W. This is in sharp contrast with the L + model. No matter how small the perturbation is, one gets exponential localization with a small localization length immediately after the perturbation is included because the lightest states are already bi-localized.

UV Completions
The real scalars π's could be identified as pseudo Nambu-Goldstone bosons, arising from symmetry breaking of a model with a set of complex scalars Φ's. The exponentially suppressed mass mixing between nonadjacent π's could be generated at loop levels from interactions between adjacent scalars. Let's illustrate the idea with the following Lagrangian of N complex scalars, by the symmetry preserving potential term V(Φ † j Φ j ) and each scalar obtains a vacuum expectation value (vev). For simplicity, we assume all the vevs are the same, Φ j = f . Using non-linear parametrization, we write Φ j = f + φ j e iπ j /( √ 2 f ) . Now we inspect the two explicit symmetry breaking terms and the potential of the pseudo Nambu-Goldstones they generate. We assume λ j 1 and j f 2 so that these two terms could be treated as perturbations. The j Φ 2 j terms break the U(1) N symmetry explicitly and entirely while the quartic interaction between each Φ and its adjacent scalar breaks U(1) N explicitly down to a single U(1). Assuming that each scalar carries charge +1 under the corresponding U(1), each quartic coupling λ j could be treated as a symmetry breaking spurion, carrying charge −2 under U(1) j and charge 2 under U(1) j+1 . At tree level, these two explicit breaking terms generate random diagonal masses, e.g., j π 2 j /2 and mass mixing between adjacent π's, e.g., λ j f 2 (π j − π j+1 ) 2 . At loop level, one could generate quartic interactions between non-adjacent complex scalars such as Φ 2 i Φ †2 j with |j − i| > 1. By spurion analysis, the coefficient of Φ 2 i Φ †2 j has to be proportional to λ i λ i+1 · · · λ j . One example Feynman diagram is shown in Fig. 7, which leads to an estimate of the coefficient to be up to a logarithmic factor. We assume that all λ's take the same value for simplicity, λ j ≡ λ. Integrating out the heavy radial modes (with masses of order ∼ f ) and mapping onto the low-energy EFT containing only π's (with masses of order max( √ λ f , √ ) ), we obtain L − in Eq. (34) and identify Note that the identifications above are approximate and could differ at the order one level, due to the logarithms from loops. Figure 7: An example of the diagrams that generate quartic interaction between two non-adjacent scalars at loop level, One may think to apply the same strategy as above to UV complete L + . However, using spurion analysis, one would find that it is not possible to construct a UV completion leading to L + . Instead, one always gets a mixture of (π i + π j ) 2 and (π i − π j ) 2 terms. To see this, consider a different tree-level Lagrangian: where terms other than the quartic interactions are identical to those in Eq. (42). In this case, the spurion y j carries charge −2 under U(1) j and −2 under U(1) j+1 . A series of quartic interactions between non-adjacent Φ's are generated at loop order, taking the form Using the non-linear parametrization of Φ and setting all y i 's to be real and equal to y to preserve CP, one could find this leads to a low-energy EFT with the Lagrangian which could be reduced to L − by a field redefinition π even → −π even . One could also consider a mixture of quartic interactions containing both Φ 2 j Φ 2 j+1 and Φ 2 j Φ †2 j+1 . This leads to a low-energy EFT which contains a linear combination of the potential terms in L − and L + . Notably, while each individual quartic interaction explicitly breaks the U(1) N symmetry down to U(1), considering both sets of interactions together breaks the U(1) N completely. To derive the form of this Lagrangian, we could again perform a spurion analysis. Take the mass-mixing between π i and π i+3 as an example. There are eight Feynman diagrams contributing, as depicted in Figure 8. They generate up to logarithms. Now, we take λ j ≡ λ and y j ≡ y with λ, y real to simplify the expression. The above equation then becomes Note that the prefactor of λy 2 (λ 2 y), which is three in this example, could be understood as a combinatoric ( 3 1 ): we are choosing exactly one of the three vertices to be a λ (y). In general, the coefficient of the mass mixing term takes the form λ a y b with a + b equal to the separation of the two scalar fields. The prefactors could be obtained using a combinatorial argument. First go back to the original model with the most general couplings. In the spurion analysis, y i y * j carries charge (-2, 2) under U(1) i × U(1) j+1 . Equivalently, a diagram going through y i and y * j vertices connect Φ 2 i and Φ †2 j+1 . In general, diagrams containing even pairs of y i and y * j vertices connect a scalar field and the complex conjugate of another scalar field. On the other hand, diagrams containing odd numbers of y i and y * j vertices connect a scalar field with another scalar field or the complex conjugate with another complex conjugate. When we set all y's (λ's) to be equal and real, this just reduces to the statement that if the coefficient contains even numbers of y's (b is even), they are associated with Φ 2 i Φ †2 j while they contain odd numbers of y's (b is odd), they are associated with Φ 2 i Φ 2 j . The prefactors could be obtained by counting the number of ways to select b vertices out of a total of a + b vertices to be y's: ( a+b b ). This analysis gives rise to the following quartic operators connecting Φ i and Φ j (or the conjugates): up to logarithmic factors we ignore. The reason we could ignore logarithmic factors is that varying the coefficients by order one amount does not change the localization property, as we will show in the following section. Φ 2 i Φ 2 j lead to (π i + π j ) 2 while Φ 2 i Φ †2 j lead to (π i − π j ) 2 in the low-energy EFT. Combining and summing over terms lead to the following Lagrangian of π's: where we could identify: It is possible to generalize the discussions above in two ways. The first is to consider the case where we allow the couplings y's (λ's) to take different values. Indeed, one could conceivably let i , y i , λ i all be random parameters pulled from various distributions. We shall treat this setting in the following section. Another way to generalize our discussion is to consider a UV theory will all possible relevant and marginal operators that couple adjacent scalars in the theory space. For example, one could consider a UV theory that also contains operators like Interestingly, allowing for the zoo of operators gives qualitatively similar results when we consider (non-local) mass-mixing terms to their lowest order in couplings because for any set of operators that allow us to traverse through the theory space via loops, the loop structures are similar and getting closed form answers is reduced to combinatorics. Because the models appear to be qualitatively similar, we forgo discussing this more complicated example.

Localization of L mixed
In this section, we will show that the L mixed model constructed at the end of last section allows us to achieve O(1) localization lengths even when randomness is arbitrarily small. The mechanism that guarantees localization in the small randomness region is identical to the mechanism for the L + model discussed in section 5.1.2.
We first note that UV Lagrangian described in the last section that leads to L mixed has four free parameters: W, f , λ, and y. L mixed could be parametrized by W, g, c, d, which are functions of the UV parameters, as demonstrated in Eq. (52). For the rest of the section, we will use the set of UV parameters. To have the explicit symmetry breaking terms controlled by W, λ and y as perturbations and all quartic couplings relevant for the discussions, we need to impose  Figure 9: Median localization length heat map (based on ensembles of 100 matrices for each set of λ and y) for L mixed with j ∼ 0.08 f 2 . Note that we have not taken the log 10 of the localization lengths in this figure as we have done with the other heat maps. Accordingly, one should be cautious to compare this heat map to the others presented earlier.
The small parameters are technically natural in the sense that in the limit they are zero, the Lagrangian preserves a U(1) N symmetry. For the rest of the section, we take W/g = 10 −3 , which corresponds to j / f 2 ≈ 0.08, and vary λ and y.
The heat map of median localization length for this model is presented in Figure 9. Note that while all other heat maps in this article present log 10 (L), we simply present the median localization lengths in Figure 9. Remarkably, all these median localization lengths are all O(1) when λ and y are of order of 10 −2 − 10 −1 . Note that the localization lengths should remain the same under the exchange (λ, y) → (y, λ). This is because this transformation changes L mixed via (c, d) → (c, −d), which is the same as L mixed under the field redefinition π even → −π even . The small differences in the localization length under the exchange, shown in Fig. 9, is due to statistical fluctuation of our ensemble with a limited number of matrices.
Why do these O(1) localization lengths appear even when W/g 1? It turns out that the localization mechanism for L mixed is identical to the localization mechanism in L + studied in section 5.1.2. Indeed, we see that the two lightest states of L mixed with W = 0 are nearly parity-conjugates of one another with maximal support on the first and the last components. This is depicted in Figure 10, which plots the two lightest eigenstates for a random selection of this class of models. Furthermore, the masses of these states are nearly identical with a tiny difference for many technically natural values of λ and y. It follows that the conditions described in section 5.1.2 are met and that for arbitrarily small values of W, the lightest eigenstate will be exponentially localized with support around one particular end-site.
What happens when we loosen our restrictions on λ and y by allowing them to assume random values? We first consider the case where λ and y are real parameters (to preserve CP), chosen independently from the uniform distribution U [.02, .1]. We give the probability density function (PDF) of the resulting localization lengths in the top panel of Figure 11. Interestingly, we find that the median localization length is less than 1 and that 99.9% of all computed localization lengths are less than 10. Next, recall that in the previous section we made the simplification λ j ≡ λ and y j ≡ y in order to simplify the formalism and to allow a low-energy EFT to be written in a closed form. In principal, however, we do not need to do this; indeed, while we cannot write the mass-matrix of the low-energy EFT in a closed form, we can always construct it and simulate its mass matrix numerically. To this end, we let λ j and y j be real parameters all pulled independently from U [.02, .1]. The relevant PDF is given in the bottom panel of Figure 11. One can check that the median localization length is even smaller in this generalization of the model and, again, 99.9% of all localization lengths are less than 10. Altogether, it is clear that the strong localization in the low-energy EFT is robust to random perturbations of the UV theory provided couplings in the UV theory remain small.

Conclusion and Outlook
In this article, we explore a new class of random matrix model with long-range hopping that decays exponentially with site separation. We first consider the associated Hamiltonian model of a single particle on a 1D lattice with N sites and prove the exponential localization of the energy eigenstates, both numerically and analytically. We then mainly focus on the inspired non-local theory space constructions containing a lattice of scalars, π's, mixing with each other in mass terms and with Lagrangians, L ± (Eq. (34)). In particular, we are interested in the small diagonal randomness limit, in which the diagonal random masses of the scalars are small and intuitively there is no guarantee of a localized eigenstate. While the various theory space models look similar and all exhibit exponential localization analogous to the Hamiltonian model, they have dramatically different properties. The main findings of our paper are, compared to the local theory space model in Ref. [15], in the small diagonal disorder limit, • The non-local mixing in the L − model weakens the localization of the lightest mass eigenstates.
• The non-local mixing in the L + model significantly strengthens the localization of the lightest mass eigenstates, with an O(1) localization length.
What happens in the L + model is that in the absence of random diagonal perturbations, the two lightest mass eigenstates are nearly degenerate and peak at both end sites. Once the random diagonal masses are turned on, no matter how small they are, both the lightest eigenstates are mixed up and the resulting new eigenstates peak only at one end site. Unfortunately while the L − model could be easily UV completed by identifying π's as pNGBs and generating non-local interactions at loop levels, the L + model does not have a simple perturbative UV completion. However, we could construct models, following a similar strategy to UV complete the L − model, which leads to a low energy EFT of π's, L mixed , that interpolates between L + and L − . We show that their localization properties are very similar to the L + model. There are several directions to expand the study of localization in the non-local theory space models. One important question, which we don't fully address, is the origin of the exponential bi-localization of the L + and L mixed models in the absence of random disorders. Ultimately, the goal of studying exponential localization in the theory space models is to develop new mechanisms to generate exponential hierarchies. For example, as discussed in Ref. [15], one could use the Anderson-inspired local theory space model to explain the flavor hierarchy of the standard model quarks with an exponential profile of a localized scalar. Similarly, we could use the non-local theory space model to explain the CKM matrix with fewer scalars involved. Other possible applications could be found by drawing analogies to those of generalized clockwork [39] and clockwork WIMP/neutrino models [40][41][42][43]. Since it is not our focus to develop any new applications in this article, we will save this as exercises for interested readers. It is also of great interest to explore whether there are other classes of random matrix models and associated theory space constructions that enjoy different localization properties, like the L + model and its variant we present.
A Spectrum of Hamiltonian in Eq. (13) In this appendix, we discuss the spectrum of Hamiltonian in Eq. (13). The Hamiltonian could be written as The hopping matrix T is a Toeplitz matrix, respecting the translational invariance T j,k = t j−k . It is useful to compute the Fourier transform of the Toeplitz matrix elements [44] where the function f is usually referred to as the symbol of the Toeplitz matrix. For the Toeplitz matrix defined in Eq. (54), we find that It is known that the spectrum of a Toeplitz matrix is bounded by the essential infimum and supremum of its symbol [45]. In our case, since f (Eq. (56)) is an even function that is 2π-periodic, we only need to consider f on [−π, 0]. In the limit N → ∞, the minimal and maximal eigenvalues converge to min f and max f respectively. In addition, for Riemann integrable f such as the one in Eq. (56), the set of eigenvalues {λ j } has the same distribution as the set Furthermore, it has been proved that the above two sets are almost equal and the difference between corresponding elements is small [46]. An example is presented in Fig. 12, where we illustrate that the eigenvalues of T are, indeed, almost identical to the analytical prediction obtained by considering T's symbol. f (−π + πj N +1 ) Numerical To compute the bounds on the spectra of the full matrix H, one could also use the following inequality: where λ j (A) denotes the j-th largest eigenvalue of matrix A [47]. Altogether, we find which gives Eq. (14). It is also worth noting that this technique is quite general and can be used to bound the spectra of the mass matrices considered in sections 4 and 5. These mass matrices are slightly more complicated, however, because we cannot simply write them as the sum of a Toeplitz matrix and a diagonal matrix with j 's as entries. Instead, it is useful to write the mass matrix as the sum of a Toeplitz part, T, a diagonal remainder part, D, and an -dependent random part, P. For instance, one would decompose the mass matrix of L + (Eq. 34) as follows L + ⊃ π T · M π · π M π = gT + gD + P with Then, one could find the spectrum of T by computing its symbol and incorporate the spectra of D and P (which are trivial to calculate because these matrices are diagonal) to discern the spectrum and bandwidth of M. This technique works for an arbitrary bisymmetric model too and is a powerful tool that can also be used to show that the smallest two eigenvalues of L + are suppressed by an order one amount when compared to the rest of the spectrum. Concretely, to calculate the spectrum of L + , we start by computing the symbol of T, which gives a good estimate of most of the eigenvalues, as shown in Figure 13. One can see that the two lightest eigenvalues are evidently smaller than the prediction of the symbol, however. To accurately approximate those two, we include the diagonal matrices P and D to first order in perturbation theory. This gives the following expected shifts to the lightest two eigenvalues where we have used the fact that minimal mass eigenstates are localized around one of their end-sites. We plot the two lightest eigenvalues taking into account of the first order corrections, denoted λ * min , in Figure 13 as well. It is clear that our analytical predictions give a good estimate of the spectrum, and this perturbative technique offers a compelling explanation regarding why the lightest eigenvalues of L + are relatively suppressed. Why aren't the other eigenvalues suppressed similarly? This is because the lightest eigenvectors have components that are localized around one of the end-site fields. Indeed, having larger components on end-sites means that v T · D · v is smaller because D too has larger components near the edges of the diagonal that taper off exponentially as one proceeds to more central components. Because other eigenvectors corresponding to heavier eigenstates have more centralized support (to ensure orthogonality), they are less affected by D, to the first order in perturbation theory.

B Proof of Near Commutativity of A and BJ
The goal of this appendix is to discuss the approximate commutativity of the matrices A and BJ defined in Eq. (38), which is used in Sec. 5.1.2 for a semi-analytical understanding of the results. For simplicity of notations, we define the coupling, g j ≡ g/b j .
We consider N real scalars with N being even. Similar arguments could be developed for odd N. Given Eq. (37) and (38), we have A and BJ as N/2 × N/2 matrices taking the forms: g 1 · · · g N/2−1 g 1 G N/2−1 · · · g N/2−2 . . . . . . . . . . . . g N/2−1 g N/2−2 · · · G 1       BJ =       g 1 g 2 · · · g N/2 g 2 g 3 · · · g N/2+1 . . . . . . . . . . . . g N/2 g N/2+1 · · · g N−1 where the diagonal entries G i are given by To demonstrate the nearly commutativity of A and BJ, it is useful to introduce several quantities. First, the Frobenius norm of a real matrix, X, is defined as: We also define the sum of the modulus of the off-diagonal components squared as: Lastly we define α(X, Y) as α(X, Y) = inf{Off(U † XU) + Off(U † YU) | U ∈ U(N)}, where inf indicates the greatest lower bound of the set. If α(X, Y) is small, there exists a unitary matrix that almost diagonalizes both matrices X and Y. It has been shown that there exists a real valued function such that lim x→0 (x) = 0 and for all selfadjoint N/2 × N/2 matrices X and Y with unit Frobenius norm, with (x) = f (x −1 )x 1/5 near the origin, where f is a function independent of N that grows slower than any power of x [48]. In other words, if two self-adjoint matrices almost commute so that the right hand side of equation above is close to zero, they could almost be diagonalized at the same time and thus share similar eigenvectors. Now, we present a proof that as long as ∑ i g i < ∞, then the matrices A and BJ commute in the N → ∞ limit. 7 We introduce the normalized matricesĀ and BJ as It is straightfoward to show that In the large N limit, it is clear that Combining the equations above, we find that κ α(A, BJ) N 3/2 < (|| Ā , BJ || F ), where κ is some real number independent of N. Next we evaluate the commutator on the right hand side in the equation above, Finally we want to prove that ||[A, BJ]|| F is bounded by some constant and doesn't grow with N.

||[A, BJ]||
To get the third line, we remove the middle term which is always negative in the models we consider as all elements of A and BJ are positive. It can be shown that this term is also bounded by a constant. We then 7 Interestingly, ∑ i g i < ∞ also leads to satisfying the Levitov criterion.
insert the expressions for the matrix elements (BJ) i,j = g i+j−1 , A i,j = g |i−j| for i = j and A i,i = G N/2+1−i ≤ g 0 , where g j < g 0 ≡ max{G j | j ∈ N} < ∞.
where in the last line, we use ∑ j g j and ∑ j (∑ k g k+j−1 ) 2 are finite.
In summary, we have demonstrated that where κ, η are real constants. Denoting the average value of the squares of the off-diagonal components of the matrix X as X i,j | off , we have α(A, BJ) = N(N − 1)( U † AU | off + U † (BJ)U | off ). Given the behavior of (x) near the origin in the N → ∞ limit, we have Therefore, in this limit, the matrices A and BJ can be simultaneously diagonalized, and, accordingly, components of their eigenvectors converge. Indeed, the fraction of components of U † AU (and U † (BJ)U) that are non-zero (given by N −3/5 ) tends to zero demonstrating that eigenvectors of A and BJ agree except perhaps at a finite fraction of sites.