Glassy word problems: ultraslow relaxation, Hilbert space jamming, and computational complexity

We introduce a family of local models of dynamics based on ``word problems'' from computer science and group theory, for which we can place rigorous lower bounds on relaxation timescales. These models can be regarded either as random circuit or local Hamiltonian dynamics, and include many familiar examples of constrained dynamics as special cases. The configuration space of these models splits into dynamically disconnected sectors, and for initial states to relax, they must ``work out'' the other states in the sector to which they belong. When this problem has a high time complexity, relaxation is slow. In some of the cases we study, this problem also has high space complexity. When the space complexity is larger than the system size, an unconventional type of jamming transition can occur, whereby a system of a fixed size is not ergodic, but can be made ergodic by appending a large reservoir of sites in a trivial product state. This manifests itself in a new type of Hilbert space fragmentation that we call fragile fragmentation. We present explicit examples where slow relaxation and jamming strongly modify the hydrodynamics of conserved densities. In one example, density modulations of wavevector $q$ exhibit almost no relaxation until times $O(\exp(1/q))$, at which point they abruptly collapse. We also comment on extensions of our results to higher dimensions.


I. INTRODUCTION
A common paradigm in quantum dynamics is that isolated quantum systems usually thermalize if one waits long enough [1][2][3][4][5]. Indeed, assuming that interactions are spatially local, quantum systems tend to approach a form of local equilibrium on a timescale that is independent of system size, with the late-time dynamics governed by the hydrodynamic relaxation of a small number of conserved densities. The main possible exception to this rule is the many-body localized phase in strongly disordered systems [6,7], or in systems which effectively self-generate strong disorder [8][9][10]. The structures that can be rigorously shown to arrest thermalization in translation-invariant quantum systems, such as integrability [11], quantum scars [12][13][14][15][16][17][18][19][20], and dynamical constraints [21][22][23][24][25][26][27][28][29][30], are usually fine-tuned in some sense. However, they are still of experimental relevance, since many realistic systems are near the fine-tuned points where these phenomena occur [31][32][33][34][35][36]. Systems near these points have long relaxation timescales and approximate conservation laws that are essentially exact on the timescale of realistic experiments on noisy quantum hardware and cold atom systems. Although the algebraic structure of integrable systems and systems with many-body scars has been well studied, a general understanding of the extent to which local Hilbert space constraints can arrest thermalization is still under development.
In this work we introduce an alternative viewpoint for understanding thermalization in a large class of onedimensional models with local Hilbert space constraints.
We begin by developing a general framework for characterizing models with constrained dynamics in terms of semigroups, algebraic structures that resemble groups but need not have inverses or an identity. This approach reproduces examples of constrained models known in the literature, but also provides us with new examples with unusual properties. In particular, it enables us to leverage ideas from the field of geometric group theory to construct (a) models with both exponentially long relaxation times and sharp thermalization transitions, and (b) models where thermalization never occurs due to an unusual type of ergodicity breaking we call "Hilbert-space jamming" or "fragile fragmentation." The relation between constrained one-dimensional spin chains and semigroups can be summarized as follows (see Sec. II for a more formal discussion). Most examples of constrained systems in the literature, and all of the examples considered in this work, have constraints which can be formulated in a local product state basis. In these systems, the constraints place certain rules on the processes that the dynamics is allowed to implement, and it is these rules which endow the dynamics with the structure of a semigroup. This identification works by viewing each basis state of the onsite Hilbert space, in the computational basis, as a generator of a semigroup. Since any element of the semigroup can be written as a product of generators, a many-body computational-basis product state is naturally associated with an element of the semigroup, obtained by taking the product of generators from left to right along the chain. We call each computational basis state a word, with each word being a presentation of a certain element in the semigroup. In this picture, the constraints are encoded by requiring that the dynamics preserve the semigroup element associated with each product state. In Sec. II, we show that all local dynamical constraints can be formulated in this way.
Of course, not all words represent distinct semigroup elements. For example, in the case where the semigroup is a group $G$ with identity element $e$ and elements $g_1, g_2$, three distinct words of length 4 representing the same group element are $eeee$, $g_1 g_1^{-1} g_2^{-1} g_2$, and $g_1 g_2 g_2^{-1} g_1^{-1}$. Each distinct semigroup element is thus an equivalence class of words under the application of equivalence relations like $g_i g_i^{-1} = ee$. The most general local dynamics that preserves semigroup elements is precisely one which locally implements these equivalence relations.
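To make this concrete, the following minimal sketch (illustrative token names, not from the paper) checks that the three length-4 words above all represent the identity of the free Abelian group on $g_1, g_2$, simply by tallying net exponents:

```python
def group_element(word):
    """Net exponents of g1 and g2 in a word over {e, g1, g1^-1, g2, g2^-1}.

    In a free Abelian group, two words represent the same element iff
    their net exponent vectors agree.  Tokens ending in '-' denote
    inverse generators; 'e' is the identity.
    """
    exponents = {"g1": 0, "g2": 0}
    for token in word:
        if token == "e":
            continue
        if token.endswith("-"):
            exponents[token[:-1]] -= 1
        else:
            exponents[token] += 1
    return (exponents["g1"], exponents["g2"])

# The three length-4 words from the text, all presenting the identity:
w1 = ("e", "e", "e", "e")
w2 = ("g1", "g1-", "g2-", "g2")
w3 = ("g1", "g2", "g2-", "g1-")
assert group_element(w1) == group_element(w2) == group_element(w3) == (0, 0)
```

For non-Abelian (semi)groups no such simple invariant exists, which is precisely what makes the word problem below interesting.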
The equivalence classes so defined produce multiple sectors of the Hilbert space ("Krylov sectors" or "fragments") that the dynamical rules are unable to connect, breaking ergodicity and leading to Hilbert space fragmentation [21][22][23]. Within a given fragment, thermalization of an initial basis state is a process by which the dynamics of the system "works out" which words represent the same semigroup element as the initial state. Crucially, when the problem of determining which words represent a given element is computationally hard, thermalization within each fragment is slow.
This general perspective is powerful because it maps the problem of thermalization in these models onto a well-known algorithmic problem, the word problem for semigroups (a perspective also adopted by Hastings and Freedman in Ref. [37] to provide examples of dynamics exhibiting 'topological obstructions' which yield a separation between the performance of QMC and quantum annealing). The word problem is the problem referenced above, namely that of deciding whether two words represent the same semigroup element. This identification allows us to lift examples of computationally hard word problems from the mathematical literature to construct models with anomalously slow dynamics. In these models, the dynamics connects the basis states within each fragment in a manner which is much sparser than in generic systems (see Fig. 1 for an illustration), and it is this phenomenon which leads to long thermalization times.
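The "blind rewriting" process implicit in this mapping can be sketched directly: a breadth-first search over local relation applications decides whether two words are equivalent, and its search depth is exactly the minimal number of rewrites. This sketch (hypothetical helper names; the state-budget cutoff is ours) is what the later Dehn-function bounds quantify:

```python
from collections import deque

def rewrite_distance(w, target, relations, max_states=100_000):
    """BFS over words reachable from w by local applications of the
    relations (pairs of equal-length strings, usable in both directions).
    Returns the minimal number of rewrites to reach target, or None if
    target is unreachable within the state budget."""
    rules = list(relations) + [(rhs, lhs) for lhs, rhs in relations]
    dist = {w: 0}
    frontier = deque([w])
    while frontier:
        cur = frontier.popleft()
        if cur == target:
            return dist[cur]
        for lhs, rhs in rules:
            start = 0
            while (i := cur.find(lhs, start)) != -1:
                nxt = cur[:i] + rhs + cur[i + len(lhs):]
                if nxt not in dist and len(dist) < max_states:
                    dist[nxt] = dist[cur] + 1
                    frontier.append(nxt)
                start = i + 1
    return None

# Free group on x, y (X, Y denote inverses), with relations padded by the
# trivial character e so that both sides have equal length:
R = [("xX", "ee"), ("Xx", "ee"), ("yY", "ee"), ("Yy", "ee")]
assert rewrite_distance("xXYy", "eeee", R) == 2   # two local cancellations
assert rewrite_distance("xy", "eeee", R) is None  # different sector
```

For the hard word problems discussed later, the number of states this search must visit, and hence the thermalization time of the corresponding dynamics, blows up rapidly with word length.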
The first part of this work focuses on models where the word problem takes an exponentially long time to solve. We place particular focus on the "Baumslag-Solitar model", a spin-2 model with three-site interactions for which the relaxation time of a large class of initial states under any type of local dynamics (Hamiltonian, random unitary, etc.) is provably exponentially long in the system size. This model has a conserved charge, and this exponentially long timescale shows up as an exponentially slow hydrodynamic relaxation of density gradients. Not only the relaxation timescale but also the functional form of relaxation is anomalous: a state with density gradients relaxes "gradually, then suddenly," with an initially prepared density wave experiencing almost no relaxation for exponentially long times, but then undergoing a sudden collapse at a sharply defined timescale. Despite this extremely slow hydrodynamics, the states involved are not dynamically frozen: each configuration is rapidly locally fluctuating, and generic local autocorrelation functions decay rapidly.
In the second part of this work, we turn to word problems that have high space complexity: i.e., solving them requires not just many steps, but also a large amount of additional space for scratchwork. Put differently, in these problems, deciding whether two words of length L are equivalent requires a derivation involving intermediate words much longer than L.
In the corresponding dynamical systems, one has (for any fixed L) pairs of basis states |a⟩, |b⟩ such that: (i) |a⟩ is not connected to |b⟩ by the dynamics of a chain of length L, but (ii) |a⟩ ⊗ |c⟩ is connected to |b⟩ ⊗ |c⟩ by the dynamics of a longer chain, with |c⟩ a fixed ancillary product state. This phenomenon can be viewed in two complementary ways: as a "fragile" form of Hilbert space fragmentation,¹ or as an unusual type of jamming which has no counterpart in known examples of jammed systems. Using constructions similar to the group model discussed above, we can construct examples where the amount of additional spatial resources grows extremely rapidly with L; not just exponentially but also as e^{e^L}, e^{e^{e^L}}, and so on.

This paper is organized as follows. In Sec. II we introduce word problems for semigroups and groups, and relate them to fragmentation. In Sec. III we use the complexity of the word problem to derive bounds on thermalization times, and in Sec. IV we explore an explicit example of a high-complexity group word problem which yields dynamics with exponentially slow relaxation. We present numerical evidence and analytical estimates for the anomalously slow hydrodynamics of this model. In Sec. V we introduce and analyze a family of group models with fragile fragmentation. Secs. VI and VII respectively present examples based on semigroups, and generalize our one-dimensional examples to two-dimensional loop models. Finally, we conclude with a discussion of future directions in Sec. VIII.

II. SEMIGROUP DYNAMICS AND CONSTRAINED 1D SYSTEMS
In this section we introduce the general framework used to construct the models described above. We will refer to this framework as semigroup dynamics, which encompasses a general class of constrained dynamical systems whose constraints can be derived from the presentation of an underlying group or semigroup (to be defined below). These types of constraints are particularly appealing from a theoretical point of view: it turns out that we can rigorously characterize many properties (thermalization times, fragmentation, and so on) using tools from the field of geometric group theory. Broadly speaking, geometric group theory is concerned with characterizing the complexity and geometry of discrete groups (see [41] for an accessible introduction), ideas which will be made precise in the following.

FIG. 1. A schematic illustration of the difference between the Hilbert space connectivity of generic dynamics and semigroup dynamics, with each yellow dot representing a computational basis product state in a single connected sector of the dynamics. In a 1D system of length L, typical dynamics (left) requires at most O(L) steps of the dynamics (applications of a Hamiltonian or layers of a unitary circuit) to move between any two basis states. In semigroup dynamics (right), the number of steps needed scales as the Dehn function Dehn(L) of the semigroup in question, which measures the word problem's temporal complexity. When Dehn(L) is large, the basis states in each sector are connected very sparsely, leading to long thermalization times.
As a starting point, we will describe the necessary mathematical background needed to motivate group dynamics. This discussion will center around the word problem, a century-old problem lying at the heart of results regarding the geometry and complexity of groups. We will then see how algorithms solving the word problem can naturally be encoded into the dynamics of 1D spin chains, whose thermalization dynamics is controlled by the word problem's complexity. Finally, we will see how the structure of the word problem leads to Hilbert space fragmentation, and discuss the properties of the group which control the severity of the fragmentation.
Throughout this paper, we will mostly be studying constrained dynamics on 1D spin chains whose onsite Hilbert space is finite dimensional.² We will only consider systems whose time evolution has constraints that can be specified in a local tensor product basis (referred to throughout as the computational basis), either directly or after the application of a finite-depth local unitary circuit. In the latter case we will assume the unitary transformation has been done, to avoid loss of generality.

² Generalizations to 2D are briefly discussed in Sec. VII.
We will let S denote the set of computational basis state labels for the onsite Hilbert space, with individual basis states being written as letters in typewriter font (|a⟩, |b⟩, etc.). Strings of letters will be used as shorthand for tensor products, so that e.g. |word⟩ = |w⟩ ⊗ |o⟩ ⊗ |r⟩ ⊗ |d⟩. A ket with a single roman letter will denote a product state of arbitrary length, e.g. |w⟩ = |word⟩.

A. Dynamical constraints and semigroups
We will write Dyn(t) to denote time evolution for time t under the dynamics in question, which may be performed using a set of unitary gates, a (possibly space- or time-dependent) Hamiltonian, or a bistochastic Markov chain. Having a dynamical constraint means that not all computational basis product states |w⟩, |w′⟩ can be connected under the dynamics. We will use φ(|w⟩) to denote the dynamical sector of the state |w⟩, defined as the set of all computational basis product states that |w⟩ can evolve to:

φ(|w⟩) ≡ { |w′⟩ : ⟨w′| Dyn(t) |w⟩ ≠ 0 for some t }.

The tensor product of computational basis states, that is, the stacking of one system onto the end of another, defines a binary operation on the dynamical sector labels, which we write as ⊙:

φ(|w⟩) ⊙ φ(|w′⟩) ≡ φ(|w⟩ ⊗ |w′⟩).

Since the tensor product is associative, so too is ⊙. This equips the set of dynamical sectors with an associative binary operation, thereby endowing it with the structure of a semigroup, a generalization of a group which needn't have inverses or an identity element.³ Since all |w⟩ are formed as tensor products of the single-site basis states |a⟩, the φ(|a⟩), which we will write simply as a to save space, constitute a generating set for this semigroup.
The dynamical sector of a state |w⟩ = |a_1⟩ ⊗ ··· ⊗ |a_L⟩ is then determined simply by multiplying the a_i along the length of the chain⁴:

φ(|w⟩) = a_1 ⊙ a_2 ⊙ ··· ⊙ a_L.

Following usual group theory notation, we will denote a semigroup with generating set S as

G = ⟨S | R⟩,

where R denotes a set of relations imposed on the product states that can be formed from elements of the on-site computational basis states S. When writing presentations of groups we will omit the inverse generators and the identity from S, and will likewise omit trivial relations like aa⁻¹ = e, ae = ea from R. For the remainder of the paper, presentations of semigroups which are not groups will always be denoted by semi⟨S | R⟩, while presentations of groups will be denoted simply by ⟨S | R⟩.
The relations in R are determined by φ, namely by which product states are related to one another under Dyn. Consider any two states |w⟩, |w′⟩ such that φ(|w⟩) = φ(|w′⟩), i.e. two states which define the same element of G. Since we are interested in dynamics which are geometrically local, it must be possible to relate |w⟩ to |w′⟩ using a series of local updates to |w⟩. This means that the set R must be expressible in terms of a set of equivalence relations which each involve only an O(1) number of the elements of S, implying in particular that |R| must be finite. A semigroup where both S and R are finite is said to be finitely presented, and all of the semigroups we consider will be of this type.
Given a semigroup G, we will use the notation Dyn_G to represent a general local dynamical process acting on H = H_loc^⊗L which satisfies the constraint above, and hence preserves the dynamical sectors of all computational basis states. The locality of the dynamics means that Dyn_G must be composed of elementary blocks (unitary gates or Hamiltonian terms) of constant length which, when acting on computational basis product states, implement the relations contained in R. Writing the relations in R as r_l = r_r with r_{l/r} = a_{l/r,1} ··· a_{l/r,n}, the locality of Dyn_G is determined by the maximal length of the r_{l/r}, which we denote as ℓ_R:

ℓ_R ≡ max_{r ∈ R} max(|r_l|, |r_r|).

For us ℓ_R will always be O(1) (and when G is a group, one can show that there always exists a finite presentation of G such that ℓ_R ≤ 3; see App. A for the proof).
To be more explicit, first consider the case where Dyn_G corresponds to time evolution under a (geometrically) local Hamiltonian H. The semigroup constraint and locality of H mean that ⟨w′|H|w⟩ can be nonzero only if the words w, w′ differ by the local application of a relation in R. H consequently assumes the general form

H = Σ_i Σ_{r ∈ R} ( λ_{i,r} |r_l⟩⟨r_r|_{i,...,i+n} + h.c. ),

where |r_{l/r}⟩_{i,...,i+n} = |a_{l/r,1}⟩_i ⊗ ··· ⊗ |a_{l/r,n}⟩_{i+n} and the λ_{i,r} are arbitrary complex numbers. Note that the above Hamiltonian is only well-defined if |r_l| = |r_r| for all relations r. In cases where this does not hold, we will rectify this problem by adding a trivial character e to the onsite Hilbert space, with e defined to commute with all of the other generators a, which allows us to then 'pad' the relations r in a way which ensures that |r_l| = |r_r|.
As a simple example, consider the group Z² = ⟨x, y | xy = yx⟩, which as we will see later has, in some sense, trivial dynamics. Since the full generating set S = {x, x⁻¹, y, y⁻¹, e} of this presentation has size |S| = 5, a Hamiltonian H_{Z²} with Z²-constrained dynamics thus acts most naturally on a spin-2 chain. The single nontrivial relation xy = yx has length ℓ_R = 2, and thus H_{Z²} can be taken to be 2-local, assuming a form like

H_{Z²} = Σ_i Σ_{a≠b ∈ S} ( λ_i^{ab} |ab⟩⟨ba|_{i,i+1} + h.c. ) + Σ_i Σ_{a ≠ e} ( γ_i^a |a a⁻¹⟩⟨ee|_{i,i+1} + h.c. ),

which describes two species of conserved particles, each of which defines a U(1) conserved charge n_a = Σ_i ( |a⟩⟨a|_i − |a⁻¹⟩⟨a⁻¹|_i ) where a = x, y. The explicit examples we consider in this work will not be more complicated than H_{Z²} in terms of their degree of locality or the dimension of H_loc, but their dynamical properties will be much richer.
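As a sanity check on this construction, the sketch below (illustrative, not from the paper; it assumes the padded relations ab = ba and a a⁻¹ = ee) builds the dense matrix of an H_{Z²}-type Hamiltonian on L = 3 sites and verifies that it only connects basis states in the same dynamical sector:

```python
import itertools
import numpy as np

alphabet = ["e", "x", "X", "y", "Y"]          # X, Y denote x^-1, y^-1
inv = {"x": "X", "X": "x", "y": "Y", "Y": "y"}

L = 3
basis = ["".join(p) for p in itertools.product(alphabet, repeat=L)]
index = {w: i for i, w in enumerate(basis)}

def sector(w):
    """Sector label of a word: net exponents of x and y (the Z^2 element)."""
    return (w.count("x") - w.count("X"), w.count("y") - w.count("Y"))

H = np.zeros((len(basis), len(basis)))
for w in basis:
    for i in range(L - 1):
        a, b = w[i], w[i + 1]
        if a != b:                       # commutation relation ab = ba
            H[index[w[:i] + b + a + w[i+2:]], index[w]] += 1.0
        if a != "e" and b == inv[a]:     # padded relation a a^-1 = ee
            H[index[w[:i] + "ee" + w[i+2:]], index[w]] += 1.0
H = H + H.T                              # Hermiticity (adds pair creation)

# H never connects different sectors:
rows, cols = np.nonzero(H)
assert all(sector(basis[r]) == sector(basis[c]) for r, c in zip(rows, cols))
```

Here all couplings are set to 1 for simplicity; generic λ's would scale each matrix element independently without affecting the sector structure.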
The construction of group-constrained random unitary dynamics is similar to the Hamiltonian case. For random unitary dynamics, Dyn_G is constructed using ℓ_R-site unitary gates U whose matrix elements ⟨w′|U|w⟩ are nonzero only if the length-ℓ_R words w, w′ satisfy φ(|w⟩) = φ(|w′⟩). Such unitaries admit the block-diagonal decomposition

U = ⊕_{g ∈ G_{ℓ_R}} U_g,

where G_{ℓ_R} denotes the set of elements of G expressible as products of precisely ℓ_R generators. A particularly natural realization of Dyn_G is when each U_g is drawn from an appropriate-dimensional Haar ensemble, although we shall not need to specialize to this case.
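A sketch of such a block-diagonal gate for the Z² example, with Haar blocks drawn via the standard QR trick (all names illustrative):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

def haar_unitary(n):
    """Haar-random n x n unitary: QR decomposition of a complex Gaussian
    matrix, with the standard phase fix on the diagonal of R."""
    z = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

# Two-site words over the Z^2 alphabet, grouped by the group element
# they represent (for this Abelian example, by net exponents):
alphabet = ["e", "x", "X", "y", "Y"]
def element(w):
    return (w.count("x") - w.count("X"), w.count("y") - w.count("Y"))

words = ["".join(p) for p in itertools.product(alphabet, repeat=2)]
blocks = {}
for i, w in enumerate(words):
    blocks.setdefault(element(w), []).append(i)

# U = direct sum of independent Haar unitaries U_g, one per element g:
U = np.zeros((len(words), len(words)), dtype=complex)
for idx in blocks.values():
    U[np.ix_(idx, idx)] = haar_unitary(len(idx))

assert np.allclose(U @ U.conj().T, np.eye(len(words)))   # unitary
# U only mixes words representing the same element:
rows, cols = np.nonzero(np.abs(U) > 1e-12)
assert all(element(words[r]) == element(words[c]) for r, c in zip(rows, cols))
```

A brickwork circuit of such gates then realizes a random-circuit version of Dyn_G.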
Existing examples of constrained 1D dynamics in the literature, from multipole conserving systems to models based on cellular automata and other constrained classical systems [21][22][23][24][25][26][27][28], are all described by Dyn_G for an appropriate semigroup G and an appropriate kind of dynamics (random unitary, Hamiltonian, etc.).⁵ One of the main messages of this paper is that in addition to providing an organizing framework for discussing 1D constrained dynamics, approaching things from this point of view enables a large arsenal of mathematical tools from the field of geometric group theory to be brought to bear, leading to general bounds on thermalization times, precise characterizations of ergodicity breaking, and explicit examples of models with extremely slow dynamics.

FIG. 2. Semigroup dynamics and the word problem. a) In the word problem, we are given a length-L word |w⟩ and a series of rewriting rules that let us make local updates to the characters of the word. The word problem for |w⟩ is the task of deciding whether or not a sequence of allowed updates can be found which transforms |w⟩ into a particular reference word, here chosen to be |e^L⟩. The Dehn function Dehn(w) measures the time complexity of the word problem, namely the minimal number of updates needed to connect |w⟩ to |e^L⟩. b) In some situations, |w⟩ can only be transformed into |e^L⟩ by increasing the amount of available space, done here by appending |ee···⟩ onto the original word. The minimal amount of extra space needed defines the expansion length EL(w), which captures the spatial complexity of the word problem. c) For a semigroup G = ⟨S|R⟩ defined by generators a ∈ S and relations between the generators r_i ∈ R, our construction defines a dynamics Dyn_G which acts on a 1D chain with an onsite Hilbert space H_loc = {|a⟩ : a ∈ S}. The dynamics implemented by Dyn_G is restricted to local updates which preserve the semigroup element obtained by multiplying the generators along the length of the chain; the allowed updates are consequently fixed by the relations in R (with the figure drawn using the relations r_1: ab = cd, r_2: ef = gh, etc.).

B. The word problem
We will now formulate the semigroup word problem, a concept key to determining the thermalization behavior of models with semigroup dynamics. We will refer to a computational basis product state, defined by a string of generators in S, e.g. w = a_1 ··· a_L, as a word. In what follows we will often slightly abuse notation by letting the symbol w stand for both an abstract string a_1 ··· a_L of generators in S, and the associated computational-basis product state |w⟩ = |a_1⟩ ⊗ ··· ⊗ |a_L⟩. Words are naturally grouped into equivalence classes labeled by elements of G. Letting W(S) denote the set of all words, we define these equivalence classes as

K_g ≡ { w ∈ W(S) : φ(|w⟩) = g }.

Any two words belonging to the same equivalence class can be deformed into one another by applying a sequence of relations in R. For any two w, w′ ∈ K_g, we define a derivation from w to w′, written D(w ⇝ w′), as the sequence of words appearing in this deformation:

D(w ⇝ w′) : w → w_1 → w_2 → ··· → w′,

where each arrow → indicates applying a single relation from R.
In Sec. III we will see that the way in which Dyn_G thermalizes is determined by the complexity of the word problem for G, a fundamental problem in the fields of abstract algebra and computability theory. The word problem is defined by the following question: Word problem: Given two words w, w′, does φ(|w⟩) = φ(|w′⟩)? That is, does there exist a derivation D(w ⇝ w′)?
A key result is that even in the case where both |S| and |R| are small, answering this question can be very difficult (even undecidably so; see App. C). For semigroups or groups which do not have an undecidable word problem, a key problem is to determine the time and space complexity of algorithms that solve it. In what follows, we will introduce two functions characterizing the word problem's complexity: the Dehn function, which governs its time complexity, and the expansion length, which governs its space complexity.
Our operational definition of the time complexity of the word problem is the minimum length of a derivation linking w to w′, as illustrated in Fig. 2(a).⁶ We denote this by Dehn(w, w′):

Dehn(w, w′) ≡ min_{D(w ⇝ w′)} |D(w ⇝ w′)|,

where |D| denotes the number of steps in the derivation D. Dehn(w, w′) measures the non-deterministic time complexity of the word problem, since it is the maximum runtime of an algorithm which maps w to w′ by blindly applying all possible relations in R to w in parallel and halts the first time w′ appears in the resulting superposition of words. Time evolution under Dyn_G can be naturally regarded as a way of simulating this process, a connection enabling the derivation of the bounds arrived at in Sec. III.
For any word w, we define the length |w| of w as the number of generators appearing in w. To denote the subset of length-L words in K_g, we write

K_{g,L} ≡ { w ∈ K_g : |w| = L }

(or equivalently, following our practice of letting w stand for both a word and a computational-basis product state, the collection of product states associated to such words). The (worst-case) time complexity across all words in K_{g,L} defines the function

Dehn_g(L) ≡ max_{w, w′ ∈ K_{g,L}} Dehn(w, w′).

We will be particularly interested in how Dehn_g(L) scales asymptotically with L. In App. B, we show that this scaling is the same for any two finite presentations of the same semigroup: thus we may meaningfully speak about the Dehn function of a semigroup, rather than of a particular presentation thereof. This presentation independence imparts some degree of robustness to the dynamical properties that we will derive.
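For small systems, Dehn_e(L) can be computed directly by breadth-first search on the rewrite graph. The sketch below (illustrative names; the rules are the padded Z² relations, a case where Dehn(L) ≲ L²) measures the maximal distance from |e^L⟩ within the identity sector:

```python
import itertools
from collections import deque

inv = {"x": "X", "X": "x", "y": "Y", "Y": "y"}   # X, Y denote x^-1, y^-1

def neighbors(w):
    """Words reachable from w by one local relation: swaps ab <-> ba
    and pair creation/annihilation a a^-1 <-> ee."""
    out = set()
    for i in range(len(w) - 1):
        a, b = w[i], w[i + 1]
        if a != b:
            out.add(w[:i] + b + a + w[i+2:])
        if a != "e" and inv[a] == b:
            out.add(w[:i] + "ee" + w[i+2:])
        if a == "e" and b == "e":
            for g in "xy":
                out.add(w[:i] + g + inv[g] + w[i+2:])
                out.add(w[:i] + inv[g] + g + w[i+2:])
    return out

def dehn_e(L):
    """max_w Dehn(w, e^L) over the identity sector K_{e,L}, by BFS
    from the reference word e^L (all relations here preserve length)."""
    dist = {"e" * L: 0}
    queue = deque(["e" * L])
    while queue:
        cur = queue.popleft()
        for nxt in neighbors(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return max(dist.values())

print([dehn_e(L) for L in (2, 3, 4)])
```

For the hard examples of Secs. IV-VI such brute-force searches become infeasible precisely because Dehn(L) (and the number of intermediate words) grows so quickly.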
For the case where G is a group (rather than just a semigroup), a fair amount more can be said. All groups have a distinguished identity element e, and in App. B we show that⁷

Dehn_e(L) ≳ Dehn_g(L)  ∀ g ∈ G.

We furthermore show that, as long as |K_{g,L}| scales exponentially in L, then Dehn_e(L) ∼ Dehn_g(L). For groups, we thus define

Dehn(L) ≡ Dehn_e(L)

as a simpler characterization of the word problem's time complexity. The calculation of Dehn_e(L) also simplifies further for groups, as we may fix |w′⟩ = |e^L⟩ in the definition of Dehn(w, w′) without changing the asymptotic scaling of Dehn(L). Thus for groups, we will mostly focus on computing

Dehn(L) = max_{w ∈ K_{e,L}} Dehn(w, e^L)  (G a group).

Even for groups where |S|, |R| are both O(1), Dehn(L) can scale in many different ways. To start, it is easy to verify that Dehn(L) ∼ L for all finite groups, and that Dehn(L) ≲ L² for all Abelian groups, with Dehn(L) ∼ L² only when G is infinite. Indeed, for all such groups, such as the G = Z² example above, Dehn(L) is bounded from above by the time it takes to transport the conserved charges n_a ≡ Σ_i ( |a⟩⟨a|_i − |a⁻¹⟩⟨a⁻¹|_i ) across the system, which is ∼ L². For our purposes, we will regard any G with Dehn(L) ≲ L² as uninteresting, as for these groups Dyn_G thermalizes on timescales generically no slower than for conventional systems with conserved U(1) charges.

⁶ We call this the non-deterministic time complexity, as for a particular G there may exist other (faster) algorithms for constructing a proof. We therefore must stress that when we speak of the time complexity of the word problem, we are referring to its time complexity within the so-called Dehn proof system, a logical system where the only manipulations one is allowed to perform are the implementation of local relations in R. If one wants an algorithm which works for all choices of G, operating within this system is generically the best one can do.

⁷ Throughout this work, ∼, ≳, ≲ will be used to denote (in)equality in the scaling sense. For example, when we write f(L) ∼ g(L), we mean that there exist constants c_1, c_2 > 0 such that c_1 g(L) ≤ f(L) ≤ c_2 g(L) at large L.
A simple example of a group with an "interesting" Dehn function is the discrete Heisenberg group H_3, which has the group presentation

H_3 = ⟨x, y, z | xy = yxz, xz = zx, yz = zy⟩,

and possesses a Dehn function scaling as Dehn(L) ∼ L³ [42]. In Sec. IV we will see an example of a simple group where Dehn(L) ∼ 2^L, and in Sec. V one with Dehn(L) ∼ 2^{2^L}; examples of semigroups with similarly slow dynamics are given in Sec. VI. Going beyond these examples, Refs. [43,44] remarkably show that for any constant α, almost any function with growth L² ≤ f(L) < L^α is the Dehn function of some finitely presented semigroup; this includes, for example, "unreasonable-looking" functions such as L^{7π} and L² log L.

In addition to the worst-case complexity of the word problem, we will also have occasion to consider its average-case complexity (both for groups and semigroups), a quantity that has received much less attention in the math literature (the only exception the authors are aware of is [45]). To this end, we define the typical Dehn function $\widetilde{\mathrm{Dehn}}_g(L)$ as the number of steps needed to map a certain constant fraction of words in K_{g,L} to one another:

$\widetilde{\mathrm{Dehn}}_g(L)$ ≡ min{ T : Dehn(w, w′) ≤ T for at least a fixed O(1) fraction of pairs w, w′ ∈ K_{g,L} }.

Establishing rigorous results about $\widetilde{\mathrm{Dehn}}$ is unfortunately much more difficult than for Dehn, and the landscape of typical Dehn functions is comparatively less well explored. For the examples we focus on in this paper, however, a combination of physical arguments and numerics will nevertheless suffice to understand the rough asymptotic scaling of $\widetilde{\mathrm{Dehn}}_g(L)$.
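The Heisenberg group also gives a concrete feel for how sector labels are computed: in its standard representation by 3×3 upper-triangular integer matrices, the sector of a word is just the ordered matrix product, and the central generator z records the non-commutativity of x and y. A minimal sketch (the matrix representation is standard; the variable names are ours):

```python
import numpy as np

# Standard integer-matrix representation of the discrete Heisenberg group:
# x shifts the (1,2) entry, y the (2,3) entry, z the central (1,3) entry.
x = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]])
y = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 1]])
z = np.array([[1, 0, 1], [0, 1, 0], [0, 0, 1]])
e = np.eye(3, dtype=int)
gens = {"x": x, "y": y, "z": z, "e": e,
        "X": np.array([[1, -1, 0], [0, 1, 0], [0, 0, 1]]),   # x^-1
        "Y": np.array([[1, 0, 0], [0, 1, -1], [0, 0, 1]]),   # y^-1
        "Z": np.array([[1, 0, -1], [0, 1, 0], [0, 0, 1]])}   # z^-1

def sector(word):
    """Group element of a word = ordered product of its generators."""
    m = np.eye(3, dtype=int)
    for g in word:
        m = m @ gens[g]
    return m

# The defining relation xy = yxz holds, so the commutator word xyXY lies
# in the sector of z, not of the identity:
assert np.array_equal(sector("xy"), sector("yxz"))
assert np.array_equal(sector("xyXY"), z)
assert not np.array_equal(sector("xyXY"), sector("eeee"))
```

The (1,3) matrix entry is exactly the nonlocal conserved quantity that cannot be written as a sum of local charges, a preview of the discussion of fragmentation below.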

Space complexity
Another complexity measure is the (non-deterministic) space complexity of the word problem.⁸ The space complexity is nontrivial in cases where, during the course of being deformed into w′, a word w must expand to a length L′ > L. More formally, we define the expansion length⁹ of a derivation D(w ⇝ w′) as the length of the longest word appearing in the derivation:

EL(D(w ⇝ w′)) ≡ max_{w_i ∈ D(w ⇝ w′)} |w_i|.

The relative expansion length EL(w, w′) between two words is then defined as the minimal expansion length of a derivation connecting them:

EL(w, w′) ≡ min_{D(w ⇝ w′)} EL(D(w ⇝ w′)),

as illustrated in Fig. 2(b). The expansion length of a K_{g,L} sector is likewise

EL_g(L) ≡ max_{w, w′ ∈ K_{g,L}} EL(w, w′).

As with the Dehn function, we show in App. B that when G is a group, EL_g(L) ≲ EL_e(L) for all g, so that for groups we may use

EL(L) ≡ EL_e(L)

as a simple metric of the space complexity.
In App. B we show that EL(L) ≲ L for all Abelian groups, and that all finite groups have EL(L) ≤ L + C for some constant C; such scalings are "uninteresting" from the perspective of space complexity. Just as with the Dehn function, though, there exist simple semigroups for which EL(L) grows extremely fast with L, the consequences of which will be explored in Sec. V.
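The definition of EL can be operationalized as "BFS with a workspace": retry the search at increasing padded lengths and report the smallest pad that works. A toy sketch (names ours; for the Abelian example used here no padding is ever needed, consistent with EL(L) ≲ L, whereas the models of Sec. V are precisely those where the minimal pad blows up):

```python
from collections import deque

def reachable(w, target, rules):
    """Fixed-length BFS: can w be rewritten into target by local,
    length-preserving rule applications (both directions allowed)?"""
    rules = list(rules) + [(b, a) for a, b in rules]
    seen, queue = {w}, deque([w])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return True
        for lhs, rhs in rules:
            for i in range(len(cur) - len(lhs) + 1):
                if cur[i:i + len(lhs)] == lhs:
                    nxt = cur[:i] + rhs + cur[i + len(lhs):]
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return False

def expansion_pad(w, target, rules, max_pad=3):
    """Smallest k such that w e^k rewrites into target e^k, i.e. the extra
    workspace needed; None if even max_pad does not suffice."""
    for k in range(max_pad + 1):
        if reachable(w + "e" * k, target + "e" * k, rules):
            return k
    return None

# Z-like toy: generator a, inverse A, with padded relations:
R = [("aA", "ee"), ("Aa", "ee"), ("ae", "ea"), ("Ae", "eA")]
assert expansion_pad("aA", "ee", R) == 0      # no extra space needed
assert expansion_pad("aa", "ee", R) is None   # different sector: no k helps
```

For the fragile-fragmentation examples of Sec. V the analogous minimal pad grows at least exponentially in L, which is what makes those sectors "jammed" at fixed system size.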

C. Semigroup dynamics and Hilbert space fragmentation
The existence of multiple K_{g,L} sectors means that Dyn_G is not ergodic as long as G is not a presentation of the trivial semigroup. The simplest case is when G is an Abelian group. In this case, the K_{g,L} can be associated with the symmetry sectors of a global symmetry. This is true simply because the sector that a given product state lives in can be determined by computing the expectation value of the operators

n_g = Σ_i ( |g⟩⟨g|_i − |g⁻¹⟩⟨g⁻¹|_i )

for each generator g. Thus Dyn_G for Abelian G is already very well understood, given the plethora of work on thermalization in the presence of global symmetries.

⁸ The same caveat we made for the time complexity also applies here: we are referring explicitly to the space complexity within the "Dehn proof system".

⁹ The expansion length function is called the filling length function in the math literature.
When G is not an Abelian group, global symmetries may still be present, but there inevitably exist other nonlocal conserved quantities which distinguish different dynamical sectors. Indeed, in App. D we prove that the dynamical sectors of such models can never be described by global symmetries alone. Since Dyn_G thus always has non-local conserved quantities if G is not an Abelian group, the lack of ergodicity due to these quantities leads to Hilbert space fragmentation (HSF) [19,21-23,46], a phenomenon whereby the dynamics of initial states becomes trapped in disconnected subspaces of H (in our notation, simply the K_{g,L}) whose existence cannot be attributed to the presence of global symmetries alone.
The original works on HSF [21,22] focused mainly on fragmentation in spin systems with conserved dipole moments.While these systems fall within our semigroup framework, 10 we will instead find it more instructive to review the pair-flip model introduced in [47].The pairflip model is described by a spin-s Hamiltonian of the following form: For generic choices of λ ab,i , the model is non-integrable, so any ergodicity breaking cannot be associated with integrability.Note that the product states |w⟩ where w = a 1 • • • a L are annihilated by H P F if a i ̸ = a i+1 for all i.These product states alone provide (2s + 1)(2s) L−1 dynamically-disconnected dimension-1 sectors not attributable to any local symmetry, meaning that H P F exhibits HSF.The dynamics generated by H P F is in fact a special case of our construction applied to the group where * denotes the free product.H P F can be obtained from our general construction by considering a modified onsite Hilbert space H ′ = span{|g⟩, |g −1 ⟩ : g ∈ S} which contains no e generator.The relation g 2 i = e can be rewritten without e as g 2 i = g 2 j for all i, j, and a general local Hamiltonian acting on (H ′ ) ⊗L which obeys the may accordingly be written as 10 Concretely, the strongly-fragmented models of [21,22] may be phrased as semigroup dynamics for the "dipole semigroup" Dip defined by the semigroup presentation where R = {020 = 101, 121 = 202, 120 = 201, 021 = 102}.
which matches the Hamiltonian in (26). Unfortunately, the Dehn function of Z_2^{*(2s+1)} is easily seen to scale as Dehn(L) ∼ L^2, and thus does not provide a complexity scaling unusual enough to warrant further study.
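As a quick sanity check on the counting above, one can enumerate the frozen product states of the pair-flip model directly: words over a (2s+1)-letter alphabet with no two equal neighbors. A minimal sketch (the function name and the spin-1 parameters are ours, chosen for illustration):

```python
from itertools import product

def frozen_states(L, s):
    """Length-L product states |a_1 ... a_L> annihilated by H_PF:
    words over a (2s+1)-letter alphabet with no equal adjacent letters."""
    letters = range(int(2 * s + 1))
    return [w for w in product(letters, repeat=L)
            if all(w[i] != w[i + 1] for i in range(L - 1))]

# Each such word is its own one-dimensional dynamical sector, and the
# count matches (2s+1)(2s)^(L-1), i.e. exponentially many fragments.
L, s = 6, 1  # spin-1: three onsite states
assert len(frozen_states(L, s)) == 3 * 2 ** (L - 1)  # = 96
```

The exponential count of one-dimensional sectors is what makes the fragmentation here manifestly stronger than anything attributable to a local symmetry.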
Having discussed the concept of HSF, let us further examine the structure and size of the disconnected sectors under the dynamics Dyn_G. The number of disconnected sectors is at least as large as the number of subspaces K_{g,L}, one for each g ∈ G expressible as a word of length ≤ L. Thus the number of such subspaces, which we denote N_K(L), depends on the number of semigroup elements expressible as words of length ≤ L. In particular, we define the geodesic length |g| of an element g ∈ G as the length of the shortest word representing g; a word w with φ(|w⟩) = g satisfying |w| = |g| is called a geodesic word. Then N_K(L) = |{g ∈ G : |g| ≤ L}|. Semigroups with more elements "close" to the identity thus have dynamics with a larger number of Hilbert space fragments. As an example, one may readily verify that in the pair-flip model, N_K(L) ∼ (2s)^L grows exponentially as long as s > 1/2. For many models with group constraints, including the pair-flip model, N_K(L) exactly determines the number of Hilbert space fragments. However, for some semigroups N_K(L) is not the full story, with each K_{g,L} further fragmenting into subspaces in a way controlled by the expansion length function defined in (23). Understanding this phenomenon is the subject of Sec. V.
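For any concretely presented group, N_K(L) can be computed by a breadth-first search over the Cayley graph, counting the elements reachable by words of length ≤ L. A minimal sketch for the finite group S_3, whose ball saturates at |S_3| = 6 (the helper names are ours):

```python
def compose(p, q):
    """Permutation composition: (p o q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def ball_sizes(generators, identity, max_len):
    """N_K(L) for L = 0..max_len: group elements reachable by words
    of length <= L, via BFS on the Cayley graph."""
    ball = {identity}
    frontier = {identity}
    sizes = [1]
    for _ in range(max_len):
        frontier = {compose(g, h) for g in frontier for h in generators} - ball
        ball |= frontier
        sizes.append(len(ball))
    return sizes

# S_3 generated by a transposition and a 3-cycle (and the cycle's inverse)
e = (0, 1, 2)
gens = [(1, 0, 2), (1, 2, 0), (2, 0, 1)]
print(ball_sizes(gens, e, 4))  # saturates at |S_3| = 6
```

For a finite group the count saturates, so N_K(L) = O(1); exponentially growing balls, as in the pair-flip model or the Baumslag-Solitar example below, are what produce exponentially many fragments.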
In the study of HSF, a distinction is often made between "strong" and "weak" HSF. This distinction was originally discussed in the context of Hamiltonian dynamics [21], where it was defined via violations of strong and weak ETH, respectively. Since we are working at a level of generality where Dyn_G may or may not involve eigenstates, we will instead adopt a slightly different definition in terms of the size of the largest K_{g,L} sector, which we denote K_{max,L} (in App. B we prove that for groups, the largest sector is in fact always the one associated with the identity, K_{max,L} = K_{e,L}). We will say that the dynamics is strongly fragmented if |K_{max,L}|/|H| → 0 as L → ∞, and weakly fragmented otherwise. Note that these definitions are made without reference to any global symmetry sectors. This means that if global symmetries happen to be present, the quantum numbers associated with them will constitute part of the elements g labeling the different dynamical sectors, and the above definition of strong/weak HSF will simply single out the largest sector, regardless of its symmetry quantum number(s). When symmetries are present, one could also quantify the degree of fragmentation by first fixing a quantum number, redefining K_{max,L} to be the largest sector carrying that quantum number, and replacing |H| by the dimension of the chosen symmetry sector. However, since generic Dyn_G dynamics need not have any global symmetries, and since the result of this procedure can depend sensitively on the chosen quantum number [48-50], we will focus only on the (simpler) definition above, which maximizes over all symmetry sectors.
Our models provide a way of addressing two questions raised in Ref. [21]. The first question was whether there exist 1D models with 0 < |K_{max,L}|/|H| < 1 in the thermodynamic limit (known examples of weak fragmentation all have |K_{max,L}|/|H| → 1 in the thermodynamic limit). Our construction answers this question in the affirmative, with examples provided by Dyn_G for any finite non-Abelian G (e.g. G = S_3).
The second question concerned the existence of models where |K_{max,L}|/|H| vanishes more slowly than exponentially as L → ∞ after restricting to a fixed symmetry sector (in fact an affirmative answer to this question was already provided by the spin-1 Motzkin chain introduced in [51], where |K_{max,L}|/|H| ∼ L^{-3/2}).^11 Our models provide (many) more examples of this phenomenon, as one need only let G be a group with L^2 ≲ Dehn(L) ≲ L^∞, a simple example being the discrete Heisenberg group H_3.^12 One may further ask whether there are strongly fragmented systems where |K_{max,L}|/|H| decays at a rate in between 1/poly(L) and exp(-L). The answer is again affirmative, with one such example being the focus of Sec. IV.

III. SLOW THERMALIZATION AND THE TIME COMPLEXITY OF THE WORD PROBLEM
From the discussion of the previous section, it is natural to expect that systems with Dyn_G dynamics will have thermalization times controlled by the time complexity of the word problem, as diagnosed by the Dehn functions Dehn_g(L) defined in (14). In this section we make this relationship precise by using the functions Dehn_g(L) to place lower bounds on various thermalization timescales. In the case of random unitary or classical stochastic dynamics, Dehn_g(L) will be used to bound relaxation and mixing times; for Hamiltonian dynamics it will appear in bounds on hitting times.
A common feature of these timescales is that they quantify when Dyn_G is able to spread initial product states across Hilbert space. This is generally not something that can be probed by looking at correlation functions of local operators, which would otherwise be the preferred way of diagnosing thermalization. While the bounds derived in this section do not mandate that the relaxation of local operators also be controlled by Dehn_g(L), in Sec. IV we will explore an explicit example in which it is. Until then, we will focus on the more "global" diagnostics of relaxation and hitting times.
We note in passing that it is impossible to use Dehn_g(L) (or any other quantity) to place upper bounds on thermalization timescales without making additional assumptions about the details of Dyn_G: without such assumptions, we could always choose Dyn_G to be evolution under a many-body localized Hamiltonian, for which the relevant timescales all diverge. Even in the case where Dyn_G is an appropriately constrained form of random unitary evolution, upper bounding the relaxation timescales requires techniques beyond those employed below, and constitutes an interesting direction for future work.

A. Circuit dynamics
We first discuss the case where Dyn_G is generated by a G-constrained random unitary circuit, which we will see can be mapped to the case where Dyn_G is a classical Markov process. We assume that Dyn_G is expressed as a brickwork circuit whose gates act on ℓ_R sites, with ℓ_R the maximum length of a relation in R. Dyn_G(t) will denote t ∈ N timesteps of this dynamics, with each timestep consisting of ℓ_R staggered layers of gates.
Let us first look at how operators evolve under Dyn_G(t). We use overbars to denote averages over circuit realizations, so that acting on an operator O with one layer of the brickwork (corresponding to a time of 1/ℓ_R) gives a circuit average in which each U_{j,g} acts on a length-ℓ_R block of sites, with G_{ℓ_R} as before denoting the set of group elements expressible as words of length ≤ ℓ_R. Assuming the U_{j,g} are drawn uniformly from the |K_{g,ℓ_R}|-dimensional Haar ensemble,^13 performing the average gives (32), where we have defined P_g as the projector onto the space of length-ℓ_R words w with φ(|w⟩) = g, and where we have (without loss of generality) taken O to factorize as O = ⊗_j O_j.
Thus after a single step of the dynamics, all operators completely dephase and become diagonal in the computational basis, while operators violating the dynamical constraint average to zero. We may therefore focus on diagonal operators without loss of generality, which we indicate using vector notation as |O(t)⟩. With this notation, (32) becomes a Markov evolution generated by a matrix whose blocks project onto the uniform superposition of states within each K_{g,ℓ_R}. Different layers of the brickwork likewise define matrices M_i with i = 1, ..., ℓ_R, where i denotes the staggering. Defining M = ∏_i M_i, diagonal operators evolve as |O(t)⟩ = M^t |O(0)⟩. M is a symmetric^14 doubly-stochastic matrix, with smallest eigenvalue 0 and largest eigenvalue 1.
^13 Note that unlike in Ref. [52], here we take the U_{j,g} to be nontrivial in every K_{g,ℓ_R} sector, even the one-dimensional ones (where U_{j,g} acts as multiplication by a random phase). This leads to a comparatively simpler expression for the Markov generator to appear below.
^14 Strictly speaking M = M^T only if we double each step of the evolution by including a reflection-related pattern of gates, namely ∏_{j=1}^{ℓ_R} M_{ℓ_R+1-j}; we will tacitly assume this has always been done.
Due to the group constraint, M does not have a unique steady state, owing to the fact that the Markovian dynamics is reducible: M decomposes as a direct sum of blocks M_g. Each M_g however defines an irreducible aperiodic Markov chain, whose unique steady state is the uniform distribution |π_g⟩ ∝ Σ_{w ∈ K_{g,L}} |w⟩. Infinite-temperature circuit-averaged correlation functions are determined by the mixing times and spectral gaps of the M_g, which we now relate to the Dehn function.
Consider first the mixing time t_mix(g) of M_g, defined in the standard way via the total variation distance to the steady state, which we may view as characterizing the thermalization timescale within K_{g,L}. A basic result [53] about t_mix(g) is that it is lower bounded by half the diameter of the state space on which M_g acts, simply because in order to mix, the system must at minimum be able to traverse most of the state space. However, this bound is in fact too loose. Intuitively, this is because saturating it would require Dyn_G(t) to immediately find the optimal path between any two nodes in configuration space. Since M generates a random walk on state space, the dynamics instead explores state space diffusively, in a less efficient manner. On general grounds one might therefore expect a bound on t_mix(g) which is the square of the diameter bound. This guess turns out to be essentially correct, following from Prop. 13.7 of [54] after using that the equilibrium distribution of M_g is always uniform on K_{g,L} by virtue of M_g being doubly stochastic. Since |K_{g,L}| is at most exponential in L, we thus have t_mix(g) ≥ C_g Dehn_g(L)^2/L for some g-dependent O(1) constant C_g. We generically expect the random walk generated by M to be the fastest-mixing local Markov process, and hence expect it to saturate this bound on t_mix(g), a prediction which we will confirm numerically for the example of Sec. IV. Correlation functions of operators evaluated in states in K_{g,L} are controlled by the relaxation time t_rel(g) of M_g, defined as the inverse spectral gap t_rel(g) = (1 - λ_2)^{-1}, where λ_2 is the second-largest eigenvalue of M_g. This quantity admits a bound similar to that on t_mix(g) [53], with C'_g another O(1) constant. Thus Dehn_g(L) places lower bounds on both mixing and the decay of correlation functions. Note that the operators whose correlators decay on the timescale t_rel(g) need not be local; the obvious ones to consider are projectors like |w⟩⟨w|. In Sec. IV we will however explore an example which possesses local operators that relax according to (46).
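The diffusive "square of the diameter" scaling invoked above is easy to see in a toy chain. A minimal sketch (not the constrained circuit itself, just a lazy random walk on an N-cycle, with illustrative sizes): after a time of order the diameter N the walk is still far from uniform, while after a time of order N^2 it has mixed.

```python
def tv_from_uniform(p):
    """Total variation distance between distribution p and uniform."""
    N = len(p)
    return 0.5 * sum(abs(x - 1.0 / N) for x in p)

def lazy_cycle_walk(N, t):
    """Distribution after t steps of a lazy random walk on an N-cycle,
    started from a delta function at site 0."""
    p = [0.0] * N
    p[0] = 1.0
    for _ in range(t):
        p = [0.5 * p[i] + 0.25 * p[(i - 1) % N] + 0.25 * p[(i + 1) % N]
             for i in range(N)]
    return p

N = 16
assert tv_from_uniform(lazy_cycle_walk(N, N)) > 0.2      # t ~ diameter: unmixed
assert tv_from_uniform(lazy_cycle_walk(N, N * N)) < 0.05  # t ~ diameter^2: mixed
```

The analogous statement for M_g replaces the cycle's diameter N by the diameter of K_{g,L} under local relation moves, which is where Dehn_g(L) enters.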
Note that the mixing and relaxation times are worst-case measures of the thermalization time. We can additionally define a "typical" mixing time t_mix(g) in which the maximum over initial words is replaced by an average, where Pr_{|w⟩} indicates the probability over words drawn uniformly from K_{g,L}. Following logic similar to the above, one can derive analogous bounds on this typical mixing time in terms of the typical Dehn function.

B. Hamiltonian dynamics
We have thus far discussed mixing times of the Markov chains which arise from random unitary dynamics. Purely Hamiltonian dynamics, however, does not mimic a Markov chain, and being reversible it possesses no direct analogues of mixing and relaxation times. Nevertheless, the Dehn function can still be used to place bounds on the timescales over which time evolution spreads wavefunctions across Hilbert space. To be quantitative, we will focus on the hitting time t_hit(g) of Dyn_G, which we define as the minimum time needed for product states in K_{g,L} to "reach" all other product states in K_{g,L}. Since ⟨w'|e^{-iHt}|w⟩ will generically be nonzero for all |w⟩, |w'⟩ ∈ K_{g,L} as soon as t > 0, we will need a slightly different definition of t_hit(g) than in the Markov chain case.
To define t_hit(g) more precisely, we first define the hitting amplitude between two computational-basis product states |w⟩, |w'⟩ ∈ K_{g,L} as h_{ww'}(t; g) = |⟨w'|e^{-iHt}|w⟩|^2. Note that h_{w,•}(t; g) is a probability distribution on K_{g,L}, with Σ_{w' ∈ K_{g,L}} h_{ww'}(t; g) = 1 for all |w⟩ ∈ K_{g,L}. We define t_hit(g, w, w') between two words w and w' as the minimum time for which h_{ww'}(t; g) reaches a fixed fraction of its infinite-time average h̄_{ww'}(g) = lim_{T→∞} (1/T) ∫_0^T dt h_{ww'}(t; g). The hitting time is the maximum of t_hit(g, w, w') over all pairs of words |w⟩, |w'⟩ ∈ K_{g,L}, as written in (50). To bound t_hit(g), we first evaluate the hitting amplitude by Taylor expanding e^{-iHt}. We know that the minimum k such that ⟨w'|H^k|w⟩ ≠ 0 is given by Dehn_g(w, w'), which for brevity we denote d_{ww'} in what follows. The first d_{ww'} terms in the Taylor series thus vanish, and so truncating the series at the leading nonzero term, we apply the remainder theorem to find (52). To diagnose the hitting time we need to compare h_{ww'}(t; g) with its long-time average h̄_{ww'}(g). We do this by relating the matrix element |⟨w'|H^{d_{ww'}}|w⟩|^2 to h̄_{ww'}(g) as follows. Writing {|E⟩} for H's eigenbasis, we obtain (53), where the second line follows from the Cauchy-Schwarz inequality. The first factor in parentheses is simply h̄_{ww'}(g). This can be readily verified by evaluating the time average, which, assuming that the spectrum of H|_{K_{g,L}} is nondegenerate (as we assume throughout this paper), gives h̄_{ww'}(g) = Σ_E |⟨w'|E⟩|^2 |⟨E|w⟩|^2. Inserting this in (53) yields (56). Since H is local, ||H||_∞ = λL for some O(1) constant λ.
We may thus write (52) as (57). As long as d_{ww'} = Ω(L), h_{ww'}(t; g) remains far below its equilibrium value until times of order d_{ww'}/L. Indeed, writing t = η d_{ww'}/L, Stirling's approximation gives a bound on h_{ww'}(t; g) which can be made exponentially small in L by choosing η appropriately whenever d_{ww'} = Ω(L). We conclude that t_hit(g) = Ω(Dehn_g(L)/L), which is essentially the same bound as our naive result (41) for t_rel(g) in the case of random circuit evolution.
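The key step above, that ⟨w'|H^k|w⟩ vanishes for all k below the number of local moves separating w and w', is just the statement that powers of a local Hamiltonian spread support one move at a time. A minimal sketch with H taken to be the adjacency matrix of a path graph, standing in for the configuration-space graph (sizes illustrative); here graph distance plays the role of Dehn_g(w, w'):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# H = adjacency matrix of a 5-node path: nodes are "words", edges are
# single applications of a relation.
n = 5
H = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

Hk = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # H^0 = identity
first_nonzero = {}
for k in range(1, n):
    Hk = matmul(Hk, H)
    for j in range(1, n):
        if Hk[0][j] != 0 and j not in first_nonzero:
            first_nonzero[j] = k

# <j|H^k|0> first becomes nonzero at k = dist(0, j) = j
assert all(first_nonzero[j] == j for j in range(1, n))
```

Truncating the Taylor series of e^{-iHt} at this first nonvanishing power is what produces the (||H|| t)^{d_{ww'}}/d_{ww'}! suppression used in the bound.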
It would be interesting to see whether this bound can be improved to the square of the RHS, as in (46). Finally, we note that just as with the mixing time, a typical hitting time may also be defined by averaging over initial words, as in (60). Arguments similar to the above then show that this typical hitting time admits an analogous bound in terms of the typical Dehn function.

IV. THE BAUMSLAG-SOLITAR GROUP: EXPONENTIALLY SLOW HYDRODYNAMICS
The previous two sections have focused on developing parts of the general theory of semigroup dynamics.In this section we take an in-depth look at a particular example which exhibits anomalously long relaxation times.
Our example will come from a family of groups whose Dehn functions scale exponentially with L, yielding word problems with large time complexity.^15 When Dehn(L) ∼ exp(L), the hitting and mixing times discussed in the previous section are exponentially long. It is perhaps already rather surprising that a translation-invariant local Markov process / unitary circuit can have mixing times this long, but our example is most interesting for another reason: it also possesses a global symmetry whose conserved charge takes a time ∼ Dehn(L) to relax. This allows the high time complexity of the word problem to manifest itself in the expectation values of simple local operators, rather than being hidden in hitting times between different product states.
Our example comes from a family of groups known in the math literature as the Baumslag-Solitar groups [55], which are parameterized by two integers n, m ∈ N. Each group BS(n, m) in this family is generated by two elements a and b obeying a single nontrivial relation, with the group presentation BS(n, m) = ⟨a, b | b^{-1} a^n b = a^m⟩. Models with Dyn_{BS(n,m)} dynamics are thus most naturally realized in spin-2 systems with max(n, m)-local dynamics and onsite Hilbert space spanned by {|e⟩, |a⟩, |a^{-1}⟩, |b⟩, |b^{-1}⟩}. The simplest Baumslag-Solitar group which is interesting for our purposes is BS(1, 2), and the rest of our discussion will focus on this case. For convenience, in what follows we will write BS(1, 2) simply as BS.
The nontrivial relation in BS reads b^{-1} a b = a^2, so that a generators duplicate when moved to the right past b generators: ab = ba^2. By taking inverses, the a generators are also seen to duplicate when moved to the left past b^{-1} generators: b^{-1} a = a^2 b^{-1}. This duplication property means that an O(n) number of b and b^{-1} generators can be used to grow a single a generator by an amount of order 2^n, as b^{-n} a b^n = a^{2^n}. We will see momentarily how this exponential growth is linked to the Dehn function of BS, which also scales exponentially.
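These identities can be checked mechanically by representing BS(1, 2) faithfully by affine maps of the line, a standard representation with a acting as x ↦ x + 1 and b as x ↦ x/2 (the encoding and helper names below are ours). The word problem then reduces to composing maps and comparing results:

```python
from fractions import Fraction

# Generators of BS(1,2) = <a, b | b^-1 a b = a^2> as affine maps
# x -> s*x + t, stored as the pair (s, t); capital letters are inverses.
GEN = {
    'e': (Fraction(1), Fraction(0)),
    'a': (Fraction(1), Fraction(1)),     # x -> x + 1
    'A': (Fraction(1), Fraction(-1)),    # a^-1
    'b': (Fraction(1, 2), Fraction(0)),  # x -> x/2
    'B': (Fraction(2), Fraction(0)),     # b^-1
}

def phi(word):
    """Group element of a word: the composed affine map, with the word
    read as a group product (rightmost letter applied to x first)."""
    s, t = Fraction(1), Fraction(0)
    for c in word:
        gs, gt = GEN[c]
        s, t = s * gs, s * gt + t
    return s, t

assert phi('Bab') == phi('aa')                     # b^-1 a b = a^2
n = 5
assert phi('B' * n + 'a' + 'b' * n) == phi('a' * 2 ** n)  # O(n) letters, 2^n translation
assert phi('babA')[0] == Fraction(1, 4)  # scale factor 2^{-n_b}: the U(1) charge
```

The scale factor s = 2^{-(#b - #b^{-1})} makes the conserved b-charge of the next paragraph manifest: two words can only be dynamically connected if their scale factors agree.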
Another key property of (63) is that it preserves the net number of b's. This means that models with Dyn_BS dynamics possess a U(1) symmetry generated by the charge n_b = Σ_i n_{b,i}, where n_{b,i} counts +1 for a b on site i and -1 for a b^{-1}. It is this conserved quantity which will display the exponentially slow hydrodynamics advertised above.

A. A geometric perspective on group complexity
To understand how dynamics in Dyn_BS works, we will find it helpful to introduce a geometric way of thinking about the group word problem.^16 Given a general discrete group G, we will let CG_G denote the Cayley graph of G. Recall that CG_G is a graph with vertices labeled by elements of G and edges labeled by generators of G and their inverses, with two vertices h, k connected by an edge whenever they differ by right multiplication with a generator. For example, the Cayley graph of Z_N is a length-N ring, that of Z^2 is the square lattice, and that of the free group Z * Z = ⟨x, y | ⟩ is a Bethe lattice with coordination number 4.^17 It is useful to realize that from any Cayley graph CG_G, we can always construct a related 2-complex by associating oriented faces (or 2-cells) with each of CG_G's elementary closed loops; the 2-cells have the property that the product of generators around their boundary is a relation in R. For the just-mentioned examples, Z_N would have one N-sided face, Z^2 would have a face for each plaquette of the square lattice (with the generators around the plaquette boundaries reading xyx^{-1}y^{-1}), and the free product Z * Z would have no faces at all (due to its lack of nontrivial relations). The 2-complex thus constructed is known as the Cayley 2-complex of G, and we will abuse notation by also referring to it as CG_G.
Any group word w ∈ W(G) naturally defines an oriented path in CG_G, obtained by starting at an (arbitrarily chosen) origin of CG_G and moving along edges according to the characters of w. The endpoint of this path on the Cayley graph is the group element φ(|w⟩). Additionally, applying local relations in R to w deforms this path while keeping its endpoints fixed. This gives a geometric interpretation of the subspaces K_{g,L}: they consist of all length-L paths from the origin to the vertex g. In particular, the (largest) subspace K_{e,L} is identified with the set of all length-L closed loops in CG_G. The number N_K(L) of Krylov sectors is the number of vertices of CG_G located a distance ≤ L from the origin.
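In the Abelian warm-up example Z^2, this path picture is easy to code: a word over {x, x^{-1}, y, y^{-1}} traces a lattice path, its endpoint is φ(|w⟩), and membership in K_e is just the path closing on itself. A minimal sketch (with our own single-letter encoding for the inverse generators):

```python
STEP = {'x': (1, 0), 'X': (-1, 0), 'y': (0, 1), 'Y': (0, -1)}

def endpoint(word):
    """Endpoint of the path traced in the Cayley graph of Z^2 = <x, y>."""
    px, py = 0, 0
    for c in word:
        dx, dy = STEP[c]
        px, py = px + dx, py + dy
    return px, py

# Two words lie in the same sector K_g iff their paths share an endpoint;
# K_e consists of the closed loops.
assert endpoint('xyXY') == (0, 0)          # a closed plaquette loop: in K_e
assert endpoint('xxy') == endpoint('xyx')  # same g: connected by relations
assert endpoint('xxy') != (0, 0)           # not in K_e
```

For non-Abelian G the endpoint is no longer a simple letter count, which is precisely why nontrivial conserved quantities beyond global symmetries appear.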
Having related the Krylov subspaces to the Cayley 2-complex, what is the geometric interpretation of the Dehn function? Based on the definition (17), we can restrict our attention to deformations D(w ⇝ e^L) for w ∈ K_{e,L}, which are simply homotopies that shrink the loop defined by w down to a point. The loop passes across one 2-cell at each step, so the number of steps in D(w ⇝ e^L) is the area enclosed by the surface swept out by the homotopy. In particular, the Dehn function of a word w is Area(w) = min_{S : ∂S = w} Area(S), where the minimum is over surfaces S in CG_G with boundary w; we will accordingly refer to this quantity as the area of w. The Dehn function of the group is then the largest area of a word in K_{e,L}: Dehn(L) = max_{w ∈ K_{e,L}} Area(w). This perspective is important because it gives the algorithmic definition of Dehn(L) a geometric meaning: the large-scale geometry of CG_G directly affects the complexity of the word problem, and consequently the thermalization dynamics of Dyn_G.

B. The geometry of BS and the fragmentation of Dyn_BS

The simple examples (finite cyclic, abelian, and free groups) discussed above are all geometrically uninteresting, but the BS group is a notable exception. CG_BS has the structure of an infinite branching tree of hyperbolically-tiled planes, and is illustrated in Fig. 3. To understand how this comes about, recall that b^{-n} a b^n = a^{2^n}. This means that for each n, b^{-n} a b^n a^{-2^n} forms a closed loop in CG_BS. These closed loops give rise to a tiling of the hyperbolic plane, as shown in Fig. 3(a). Letting multiplication by a correspond to motion along the x̂ direction and multiplication by b to motion along ŷ, the hyperbolic structure comes from the fact that moving by 2^n sites along the x̂ direction of the Cayley graph can be accomplished either by a direct path (multiplication by a^{2^n}) or by a path requiring exponentially fewer steps, which first traverses n steps along the ŷ direction before moving along x̂ (multiplication by b^{-n} a b^n).
The full geometry of CG_BS is more complicated than a single hyperbolic plane, however. As shown in Fig. 3(a), this can be seen from the fact that the word bab cannot be embedded into the hyperbolic plane. Instead, this word must "branch out" into a new sheet, which also forms a hyperbolic plane. Fig. 3(b) illustrates the consequences of this for the Cayley graph, whose full structure consists of an infinite number of hyperbolic planes glued together in the fashion of a binary tree (formally, the presentation complex of BS is homeomorphic to R × T_3, where T_3 is the 3-regular tree).
The locally hyperbolic structure of CG_BS means that the number of vertices within a distance L of the origin, and hence the number of Krylov sectors N_K(L), grows exponentially with L. In App. E we argue that the base of the exponential is very nearly √3, i.e. N_K(L) ∼ (√3)^L. Interestingly, despite there being exponentially many Hilbert space fragments, the largest sector K_{e,L} is known to occupy a fraction of the full Hilbert space H that decreases sub-exponentially with L, at a rate characterized by an exponent α; the numerical results in Fig. 4, obtained by sampling random words and postselecting on membership in K_{e,L}, indicate that α ≈ 1.84. Models with Dyn_BS dynamics thus provide an example (indeed the first that we are aware of) of a strongly fragmented model in which the largest sector occupies neither an exponentially nor a polynomially small fraction of Hilbert space.
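The exponential growth of N_K(L) can be checked directly by a breadth-first search over CG_BS, hashing group elements via the standard faithful affine representation a: x ↦ x + 1, b: x ↦ x/2 (a sketch with our own helper names; small L only, since the ball itself grows exponentially):

```python
from fractions import Fraction

# Generators of BS(1,2) as affine maps x -> s*x + t, stored as (s, t):
# a, a^-1, b, b^-1.
GENS = [
    (Fraction(1), Fraction(1)),
    (Fraction(1), Fraction(-1)),
    (Fraction(1, 2), Fraction(0)),
    (Fraction(2), Fraction(0)),
]

def ball_sizes(max_len):
    """N_K(L) for L = 0..max_len: elements of BS(1,2) of geodesic length
    <= L, found by BFS under right multiplication by the generators."""
    ball = {(Fraction(1), Fraction(0))}
    frontier = set(ball)
    sizes = [1]
    for _ in range(max_len):
        frontier = {(s * gs, s * gt + t)
                    for (s, t) in frontier for (gs, gt) in GENS} - ball
        ball |= frontier
        sizes.append(len(ball))
    return sizes

sizes = ball_sizes(8)
print(sizes)  # strictly increasing, roughly geometric growth
ratios = [sizes[i + 1] / sizes[i] for i in range(4, 8)]
assert all(r > 1.2 for r in ratios)  # consistent with an exponential base
```

The precise base √3 quoted above is the paper's App. E result; the sketch only exhibits the qualitative exponential growth at accessible sizes.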

C. The Dehn function
We finally have the necessary tools to explain why the Dehn function of BS grows exponentially with L. Consider the word w_large(n), of length |w_large(n)| = 4n + 4, which belongs to K_{e,L} and is shown for n = 3 in Fig. 3 as the thick dashed line in panel (b). As a path, this word can be broken into two legs. The first leg moves to the vertex labeled a^{2^n} by going "up" the hyperbolic plane of a given sheet, traversing one step along x̂, and then coming back "down". The second leg moves back to the origin, but does so by passing along a different sheet of the tree. From Fig. 3, it is clear that the area of the minimal surface bounding w_large(n) is exponentially large in n. This construction thus shows that Dehn(L) = 2^{Ω(L)} [56]. This bound is in fact tight (see App. E), and so Dehn(L) = 2^{Θ(L)}. From the results of Sec. III, this implies that Dyn_BS has exponentially long mixing, relaxation, and hitting times.
A more sophisticated treatment is needed to understand the scaling of the typical Dehn function. To our knowledge, this question has not been addressed in the math literature. While we will not provide a completely rigorous derivation of its scaling, a combination of physical arguments and numerics (relegated in their entirety to App. E) indicates that the typical Dehn function scales as 2^{√L}, up to O(1) constants in the exponent. The rough intuition behind this result is that a typical closed-loop path in K_{e,L} executes a random walk along the tree part of CG_BS, reaching a "depth" of order √L; it is then able to enclose an area exponential in this depth. Giving a rigorous proof of (76) could be an interesting direction for future research.

D. Exponentially slow hydrodynamics
A notable feature of the word w_large(n) is that it contains a density wave of b's, with the profile of n_{b,i} resembling two periods of a square wave as a function of i. Using Area(w_large(n)) ∼ 2^n, one sees that this density wave takes an exponentially long time to relax, with a thermalization time t_th(n) ∼ [Area(w_large(n))]^2 = Ω(2^{2n}) (where the square of the area comes from the square in (46)). Furthermore, almost any word with an n_b density wave admits a similar bound on t_th. Indeed, consider words w_large(n, L) obtained by extending w_large(n) to length L > 4n + 4 by inserting L - (4n + 4) random characters at random points of w_large(n) (we will assume for simplicity that w_large(n, L) ∈ K_{e,L}, but this assumption is not crucial). The only way for Area(w_large(n, L)) to be significantly smaller than Area(w_large(n)) is if the added characters cancel out almost the entirety of the b density wave, or cancel a large number of a's located near the peaks and troughs of the density wave. For a random choice of w_large(n, L) these situations are exponentially unlikely, and we expect Area(w_large(n, L)) = Ω(2^n) with probability 1 in the large-n limit. The long relaxation time of the density wave is attributable to the long mixing time of Dyn_BS, which in turn follows from the large Dehn function of BS (see Sec. III). This is remarkable because probing the large-scale complexity of BS now only requires studying the dynamics of local operators, namely those that overlap with n_b. We will henceforth denote the relaxation and mixing times collectively by t_th.
Of course, since 2^n is the smallest possible number of moves needed to map w_large(n) to e^{4n+4}, it gives only a lower bound on t_th. In the case where Dyn_BS(t) is generated by a classical Markov process or random unitary circuit, the dynamics effectively causes words to execute a random walk on the configuration space of loops. Due to the diffusive nature of this random walk, we expect the true thermalization time to scale as the square of its lower bound, namely t_th(n) ∼ 2^{2n}, an estimate that we will confirm shortly in Sec. IV E.
To examine the relaxation in more detail, consider a state |w_{A,q}⟩ which contains an n_b density wave of wavevector q and amplitude A, but which is otherwise random. By this we mean that the expectation value of n_{b,i} in |w_{A,q}⟩ is, switching to schematic continuum notation, ⟨n_b(x)⟩ = A sin(qx), and that |w_{A,q}⟩ is chosen randomly from the set of all states in K_{e,L} with this expectation value (the restriction to K_{e,L} is made purely for notational simplicity).
The "depth" n_A to which w_{A,q} proceeds into CG_BS is set by the contrast of the density wave, which we define as the integral of the b density over a quarter period of the density wave: n_A = ∫_0^{π/(2q)} dx ⟨n_b(x)⟩ = A/q. We thus expect the density wave defined by |w_{A,q}⟩ to have a thermalization timescale of order 2^{2n_A} = 2^{2A/q}. Note that this exponential timescale is not visible in the standard linear-response limit, in which one takes A → 0 before taking q → 0: the two limits lead to qualitatively different relaxation. In the standard linear-response limit, fluctuations at scale q decorrelate on the typical relaxation timescale ∼ 2^{√L}. Beyond t_th, we would also like to know how the amplitude A, or equivalently the contrast n_A, behaves as a function of time. To estimate this, consider first how w_large(n) relaxes. By the geometric considerations of Sec. IV B, the shortest homotopy reducing w_large(n) to the identity word must first map w_large(n) → w_large(n-1), then w_large(n-1) → w_large(n-2), and so on; see Fig. 9 for an illustration. The self-similarity of this process means that the time needed to map w_large(m) → w_large(m-1) is equal to the combined time needed to perform all of the maps w_large(k) → w_large(k-1) with k < m. Applying this observation to the density wave under consideration and using (78), we estimate n_b(q, t_th(1 - 2^{-k})) ≈ n_{b,0} - k for k ≤ n_A(0). Writing the k on the RHS in terms of the time t = t_th(1 - 2^{-k}), these arguments suggest that the density relaxes as n_b(q, t) ≈ n_{b,0} + log_2(1 - (1 - 2^{-n_{b,0}}) t/t_th), where t_th = C 2^{2A/q} with C an O(1) constant, n_{b,0} is shorthand for n_b(q, 0), and the 2^{-n_{b,0}} t/t_th inside the logarithm ensures that n_b(q, t_th) = 0.
An interesting consequence of (83) is that the relaxation of the density wave happens "all at once" in the large-contrast limit (e.g. small q at fixed A): the density profile remains almost completely unchanged until a time very close to t_th, at which point the density wave is abruptly destroyed. Indeed, define the collapse timescale t_ε as the time at which the collapse of the density wave becomes noticeable to within a precision controlled by a constant 0 < ε < 1, viz. n_b(q, t_ε) = (1 - ε) n_{b,0}. Then the form (83) implies that lim_{n_{b,0}→∞} t_ε/t_th = 1, meaning that the collapse is instantaneous in the large-contrast limit. These arguments show that density waves collapse abruptly, but do not tell us about the statistical distribution of collapse times obtained when considering an ensemble of realizations of the dynamics (specific disorder realizations of random unitary circuits, specific choices of Hamiltonian, etc.). In systems which relax suddenly, the width σ_{t_th} of this distribution can be either parametrically smaller (in system size) than the mean collapse time t_th, or can scale as Ω(t_th). The former case occurs in Markov chains displaying a "cutoff" (see e.g. [38, 39]), which can happen when the chain describes random motion on a highly-connected graph, or when it describes relaxation in a metastable system (where relaxation is caused by rare events, yielding the Poissonian scaling σ_{t_th} ∼ √t_th = o(t_th)). The latter case is typical of systems which relax diffusively, with an unbiased random walk on Z^d being the canonical example.
From the geometric considerations of this section, it is perhaps not clear a priori which scenario should apply. The numerics of the following section tentatively point to the latter, with the distribution of thermalization times roughly obeying σ_{t_th} ∼ t_th. A more precise characterization of this scaling would be interesting to explore in future work.

E. Numerics
We now present the results of numerical simulations that allow a more detailed look at the relaxation of n_b and confirm the predictions of the previous subsection.

FIG. 5. (caption, partially recovered) (b) Thermalization occurs at the time t_th at which the observable defined in (89) drops to zero, corresponding to the b-charge density wave collapsing to a flat profile. The run corresponding to panel (a) is shown in grey, while the blue curve is an average over several independent runs, with the red shading showing the standard deviation; time for each run has been rescaled by its thermalization time. The analytic formula of Eq. (83) is shown in dashed green (rescaled by its value at t = 0). (c) The thermalization time scales exponentially with system size L when the density of b's in the initial word is kept fixed, n = L/10.

Stochastic circuits
Our simulations all treat Dyn_BS as time evolution under BS-constrained random unitary circuits. Because off-diagonal operators are rendered diagonal after a single step of random unitary dynamics (as shown in (32)), we can without loss of generality focus on the evolution of diagonal operators. In Sec. III, we showed that the product states associated with diagonal operators evolve in time according to the stochastic matrix M derived in (35). Explicitly, since the maximal size of an elementary relation in BS is ℓ_R = 3 (e.g. abe = baa), M is most naturally constructed from 3 brickwork layers of 3-site gates, where, as in (35), the matrices M_i induce equal-weight transitions among all dynamically equivalent 3-letter words; the factor of 1/|K_{g_1 g_2 g_3, 3}| in each block ensures that M_i is stochastic, and in fact doubly so on account of M_i^T = M_i. The steady-state distribution |π_g⟩ of M within K_{g,L} is accordingly the uniform distribution on K_{g,L}. In practice we do not actually diagonalize M, but rather use its matrix elements to randomly sample updates applied to a computational basis state |w⟩, so that the system remains in a product state at all times. A single time step in our simulations corresponds to a single brickwork layer of M, i.e. to the application of a single relation at each 3-site block. Since ⌊L/3⌋ relations can be applied per time step, in these units it is Dehn(L)/L, rather than Dehn(L), which lower bounds the mixing and relaxation times of M.
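The blocks of a single layer are easy to construct explicitly: group all 5^3 three-letter words by their group element φ (computed here in the affine representation a: x ↦ x + 1, b: x ↦ x/2), and make transitions uniform within each class. A minimal sketch with our own helper names, checking that the single-gate matrix is doubly stochastic and that the b-charge is constant within each class:

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

GEN = {'e': (Fraction(1), Fraction(0)), 'a': (Fraction(1), Fraction(1)),
       'A': (Fraction(1), Fraction(-1)), 'b': (Fraction(1, 2), Fraction(0)),
       'B': (Fraction(2), Fraction(0))}

def phi(word):
    """Group element of a word, as the composed affine map x -> s*x + t."""
    s, t = Fraction(1), Fraction(0)
    for c in word:
        gs, gt = GEN[c]
        s, t = s * gs, s * gt + t
    return s, t

# classes K_{g,3}: dynamically equivalent 3-letter words
classes = defaultdict(list)
for w in product('eaAbB', repeat=3):
    classes[phi(w)].append(w)

# single-gate Markov matrix: M[w][w'] = 1/|K_g| within each class
M = {w: {w2: 1.0 / len(ws) for w2 in ws} for ws in classes.values() for w in ws}

nb = lambda w: w.count('b') - w.count('B')
for ws in classes.values():
    assert len({nb(w) for w in ws}) == 1          # b-charge constant per class
for row in M.values():
    assert abs(sum(row.values()) - 1.0) < 1e-12   # stochastic (and symmetric)
print(len(classes), "sectors among", 5 ** 3, "three-letter words")
```

Sampling a transition from M then amounts to picking a random member of the class of the current 3-site block, which is exactly the update used in the simulations below.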

Slow relaxation
We start by exploring how long it takes the state |w_large(n, L)⟩ to relax under M's dynamics, where as above w_large(n, L) is obtained from w_large(n) by padding it with L − (4n + 4) identity characters inserted at random positions. This is done by initializing the system in |w_large(n, L)⟩ and tracking the local density n_{b,i} over time. We focus in particular on the moving average of the n_b density and the fluctuations thereof; any nonzero value of ⟨n_{b,i}⟩(t) indicates a lack of equilibration (for g ≠ e the average ⟨π_g|n_{b,i}|π_g⟩ can be nonzero, and must first be computed in order to diagnose equilibration).
The evolution of ⟨n_{b,i}⟩(t) for a single realization of the dynamics initialized in |w_large(n, L)⟩ is shown in Fig. 5(a) for n = L/10 and L = 120. From this we can clearly see the sudden-collapse phenomenon predicted above in (85): the n_b density wave hardly decays at all until very close to t_th ∼ 5 × 10^6, at which point ⟨n_{b,i}⟩(t) rapidly becomes very close to zero. This is quantified in Fig. 5(b), which plots the spatially averaged n_b fluctuations ⟨n_b⟩^2 for the same realization. A relatively good fit (dashed line) is obtained using the square of the "sudden collapse" function defined in (83).
We now investigate how the thermalization time of |w_large(n, L)⟩ scales with n, continuing to fix n = L/10 so as to keep both n and L extensive. Operationally, we define the thermalization time as the first time at which the magnitude of fluctuations in n_b drops below a fixed fraction of its initial value. Fig. 5(c) shows the scaling of t_th with L, which is observed to admit an excellent fit to the predicted scaling ∼ 2^{2n}. As shown in Fig. 6, the distribution of thermalization times across different runs is additionally observed to be rather broad, with a standard deviation that scales approximately in the same way as t_th. While this confirms that the density wave present in |w_large(n, L)⟩ relaxes on a timescale t_th ∼ 2^{2n}, it does not show that all states with b density waves of amplitude A and wavevector q relax on times of order 2^{2A/q}. This in fact cannot be true, and fast-relaxing density waves always exist. Indeed, consider the word w_small(n, L) obtained from w_large(n, L) by replacing all a's and a^{-1}'s with e's. The Dehn time of this word is merely Dehn(w_small(n, L)) ∼ n^2, since there are no a's present to slow down the dynamics of the b's (any aa^{-1} pairs created between the segments of the density wave carry a net zero number of a's, and are thus ineffectual at providing a slowdown). This means that the relaxation time of an initial state |w⟩ carrying an n_b density wave cannot be predicted from knowledge of the conserved density alone; one must also have some knowledge of the distribution of a's in |w⟩.
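As a concrete sketch of this operational definition, the snippet below extracts t_th from a time series of the spatially averaged fluctuation signal. The threshold value and the synthetic plateau-then-collapse profile are illustrative assumptions, not the paper's data or fit:

```python
import numpy as np

def thermalization_time(signal, threshold=0.2):
    """First time step at which `signal` (e.g. <n_b>^2(t)) drops below
    `threshold` times its initial value; None if it never does."""
    below = np.flatnonzero(signal < threshold * signal[0])
    return int(below[0]) if below.size else None

# Synthetic "sudden collapse" profile: nearly flat until t_c, then a rapid drop.
t = np.arange(10_000)
t_c = 8_000
signal = 1.0 / (1.0 + np.exp((t - t_c) / 50.0))

t_th = thermalization_time(signal)
assert t_th is not None and abs(t_th - t_c) < 200
```

A flat signal never crosses the threshold and returns None, mirroring runs that fail to thermalize within the simulated window.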
We expect however that t_th ∼ 2^{2A/q} for generic states containing an amplitude-A, wavevector-q density wave. Indeed, as argued above near (77), this is simply because as long as the a-charge Σ_i (|a⟩⟨a|_i − |a^{-1}⟩⟨a^{-1}|_i) is nonzero in the regions between the segments of the density wave (which will almost always be true for a random density-wave state in the thermodynamic limit), the a's trapped "inside" the density wave will take an exponentially long time to escape. Note that this remark applies to generic density-wave states |w⟩, regardless of whether or not |w⟩ ∈ K_{e,L}.
In Fig. 7, we numerically investigate the relaxation of generic density waves by considering initial states which host density waves of momentum q = 4π/3L and amplitude A = nq (with n fixed at n = L/10), but which are otherwise random. Compared with |w_large(n, L)⟩, these random density waves (which generically lie in different K_{g,L} sectors) exhibit a much broader range of thermalization times; some thermalize very quickly, while others never thermalize over our chosen simulation time window (Fig. 7(a)). Nevertheless, we still observe the average thermalization timescale to scale exponentially with n (Fig. 7(b)). The large sample-to-sample fluctuations of t_th make it difficult to reliably extract the exact scaling behavior, but the above reasoning suggests that t_th continues to scale as 2^{2n} for typical initial states.

V. FRAGILE FRAGMENTATION AND THE SPACE COMPLEXITY OF THE WORD PROBLEM
Our discussion in the past few sections has focused on the way in which the time complexity of the word problem enters the thermalization times of Dyn_G. In this section, we turn to the space complexity of the word problem. As discussed in Sec. II B, the space complexity of the word problem is determined by the maximal amount of space required to map between two words w, w′ ∈ K_{g,L}, as diagnosed by the expansion length function EL_g(L) (23). When the expansion length is large, transitioning from w to w′ necessarily requires that w first grow to be much larger than its original size before shrinking down to w′. When EL_g(L) > L, the dynamics thus lacks the spatial resources needed to connect all states describing the same group element. In this situation, Dyn_G cannot act ergodically in K_{g,L}, and the K_{g,L} become further fragmented. Each fragment now contains the words that can be reached from some reference word w by derivations that do not involve intermediate words longer than L. We call this phenomenon fragile fragmentation, in analogy with the notion of fragile topology in band theory [40]: Dyn_G is said to exhibit fragile fragmentation if there are pairs of words w, w′ of length L such that Dyn_G on a system of length L does not connect |w⟩ and |w′⟩, but Dyn_G on a larger system of length L′ > L connects the "padded" words |w⟩ ⊗ |e^{L′−L}⟩ and |w′⟩ ⊗ |e^{L′−L}⟩.^18 A schematic illustration of this definition is given in Fig. 8.

FIG. 7. Relaxation of n_b under stochastic circuit dynamics for initial states with random b density waves. The initial states are words of the form w_1 b^n w_2 b^{-n} w_3 b^n w_4 b^{-n} w_5, where n = L/10 and the w_{1,...,5} are random words containing only the characters {a, a^{-1}, e}. (a) Time evolution of the observable ⟨n_b⟩^2 from Eq. (89) for several independent runs with random initial states as described above and time-averaging window T = 100. The time at which the density wave collapses and the final equilibrium value of ⟨n_b⟩^2 change significantly between initial states, with some runs thermalizing faster than T (low-lying blue lines) and some not thermalizing within the whole displayed time window of 5000T (high-lying blue lines). (b) Thermalization times t_th of random density waves, post-selected on initial states exhibiting a drop in ⟨n_b⟩^2 of at least 75% during the displayed time window. t_th is determined as the median across all runs (rather than the mean), due to a long tail in the distribution of t_th, whose statistics are shown for L = 80 in the inset. t_th determined in this way is seen to scale exponentially or faster with n.
A simple example of this phenomenon that exists in higher dimensions is the jamming transition. In jammed systems, an ensemble of particles with hard-core repulsive interactions can exhibit a phase transition from a low-density mobile phase to a high-density jammed phase. The analog of fragmentation is the limited configuration space that particles can explore in the jammed phase. When the jammed particles are given more space, their density decreases, and when it drops below a critical value the dynamics becomes ergodic. If the extra space is subsequently removed, ergodicity may again be broken, but the system may find itself in a previously inaccessible microstate. The models we discuss in this section are more drastic examples of this phenomenon: unlike models of hardcore particles, the models we study exhibit jamming even in one dimension, where the analog of the critical jamming density can be polynomially or even exponentially small in system size.
To understand when Dyn_G exhibits fragile fragmentation, we need to compute the expansion length EL(L).^19 Doing so for a general group can be rather difficult, as EL(L)'s definition involves a rather complicated minimization problem. If however we already know the time complexity of the word problem, i.e. the scaling of Dehn(L), it is possible to place a lower bound on EL(L) [57]. Indeed, suppose a word w ∈ K_{e,L} has expansion length EL(w), implying that |w| is at most EL(w) during any homotopy from w to e^L. Then the length of any minimal derivation D(w ⇝ e) cannot exceed the total number of length-EL(w) words; if it did, at least one state would appear multiple times in D(w ⇝ e), implying that the derivation is not of minimal length. Since the number of words of length EL(w) is (2|S|+1)^{EL(w)}, we thus have Dehn(w) ≤ (2|S| + 1)^{EL(w)}. By maximizing over all possible w ∈ K_{e,L} and taking a logarithm, we obtain the general bound EL(L) ≥ log_{2|S|+1} Dehn(L) (91). This bound is interesting in that it connects the spatial and temporal complexities of the word problem. It also has the consequence that to find examples with additional ergodicity breaking, we need only find a group with a super-exponential Dehn function. We will do this in Sec. V B, but before doing so we warm up by understanding fragile fragmentation and the spatial complexity of the word problem in the simpler case of Dyn_BS dynamics. A general discussion of fragile fragmentation and its repercussions for thermalization is given in Sec. V C.
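The counting bound can be evaluated numerically; the helper below (the function name is ours, for illustration) inverts Dehn(w) ≤ (2|S| + 1)^{EL(w)} to get the lower bound on EL:

```python
import math

def el_lower_bound(dehn_value, n_generators):
    """Counting bound EL >= log_{2|S|+1} Dehn: a minimal derivation never
    revisits a word, so Dehn(w) <= (2|S| + 1)**EL(w)."""
    return math.log(dehn_value) / math.log(2 * n_generators + 1)

# BS (|S| = 2, alphabet size 5): Dehn(L) ~ 2**L gives a lower bound on EL(L)
# linear in L, consistent with EL(L) proportional to L for BS.
# BS^(2) (|S| = 3, alphabet size 7): Dehn(L) ~ 2**(2**L) gives EL(L) >~ 2**L.
```

For instance, a doubly exponential Dehn value 2^{2^10} with |S| = 3 already forces EL greater than a few hundred, illustrating how super-exponential Dehn functions imply exponentially large expansion lengths.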

A. Fragile fragmentation in BS dynamics
We saw in Sec. IV that the word w_large(n) encloses an area of O(2^n) on the BS Cayley graph, leading to a word problem with exponentially large time complexity. We now address the space complexity of the word problem for the BS group. At first glance, it may seem that the spatial complexity is also exponentially large. Indeed, the naive homotopy mapping w_large(n) to the trivial word is to bring the two excursions w_large(n) makes along the b axis back "down" onto the a axis. Doing this would cause w_large(n) to grow to a length of ∼ 2^n over the course of the homotopy.
However, it turns out that w_large(n) can be deformed in a way that does not require its size to increase significantly, via the process illustrated in Fig. 9. As shown in the figure, instead of collapsing the excursions "down", we first make the loop "skinnier" by narrowing its extent along the a axis, bringing the excursions "down" only after they have become low-area enough.
It is then clear that the length of w_large(n) does not grow by too much during this homotopy (at least, not by more than a factor linear in L). In App. E we prove that at large L, the expansion length of w_large(n) exceeds its length by a factor 1 + α, where α > 0 is an O(1) constant. Importantly, the fact that α > 0 means that as L approaches 4(n + 1) from above, there will be an L* > 4(n + 1) at which |w_large(n)| < L* (so that w_large(n) can still fit inside the system) but EL(w_large(n)) > L*; in this regime, the density wave defined by w_large(n, L*) cannot relax even at infinite times. There is therefore a transition at a finite density of e's in the initial product state between a jammed regime (where homotopies are unable to contract) and an ergodic regime, leading to fragile fragmentation. Since the expansion length EL(L) ∝ L, the severity of this jamming is comparable to that of conventional jammed systems in higher dimensions.
We can identify L* numerically simply by decreasing L until |w_large(n, L)⟩ ceases to thermalize. The results are shown in Fig. 10. In this example, the length of the initial word (without identities) is |w_large| = 4(n + 1) = 44. We observe that the thermalization time diverges as L approaches L* ≲ 50 (for L = 50 we observed only a single instance of thermalization, at the very long time t_th ≈ 5 × 10^8), confirming a nontrivial expansion length.

B. Exponential spatial complexity: Iterated Baumslag-Solitar

We now present an example of group dynamics that exhibits fragile fragmentation with a zero-density jamming transition: the iterated Baumslag-Solitar group [41]. This group is (loosely speaking) constructed by embedding a BS group inside of itself; we refer to it as BS^(2), and define it via a presentation in which a new generator c satisfies a BS-type relation with b. Dyn_{BS^(2)} dynamics are thus most naturally realized in spin-3 chains with 3-local dynamics, whose local Hilbert space is obtained from that of the BS model by appending the two states {|c⟩, |c^{-1}⟩}. Note that, like BS, BS^(2) has a single conserved U(1) charge, given by the density of c generators. Several facts about BS^(2) (and related generalizations thereof) are proven in App. F. The most important result is that the Dehn function of BS^(2) is a super-exponential function of L. This can be intuited from the fact that the word v(n) is equivalent to a doubly-exponentially long string of a's, with the length of the RHS being doubly-exponentially larger than |v(n)| (and where "=" here denotes equality as elements of BS^(2)). By following the same strategy as in the construction of w_large(n), we can construct a word w_huge(n) ∈ K_{e,L} whose area grows doubly-exponentially with its length. As with BS, the slow dynamics of BS^(2) is manifest in the relaxation of the conserved charge n_c, which from the scaling of Dehn(L) we expect to relax with an effective momentum-dependent diffusion constant that is doubly-exponentially suppressed in q.
The general bound (91) implies that EL(L) ≳ 2^L. In App. F we prove that this bound is in fact tight. Thus, unlike for BS, there is no way to contract w_huge(n) to the identity without it taking up an exponentially larger amount of space.^20 This means that the n_c density-wave pattern present in the state |w_huge(n)⟩ ⊗ |e^m⟩ will remain present even at infinite times unless m ≳ 2^n, i.e. unless the n_c density wave is exponentially dilute.

FIG. 8. a) A schematic of a jammed system, illustrated by densely-packed repulsively-interacting particles. The interactions and high density prevent the system from rearranging itself within the space available to it. If however the system size is increased for an intermediate period of time, allowing the particles to intermittently occupy a larger region of space before returning to their original volume, all different particle configurations can be reached. The amount by which the system size must be increased for ergodicity to be restored defines the expansion length EL. b) Fragile fragmentation is the analogue of jamming in our dynamics. For a fixed system size, the dynamics is non-ergodic, but c) (some degree of) ergodicity is restored when a reservoir of trivial ancillae |0⟩_A is appended to the system. The analogue of returning to the original system size in the jamming example is played by projecting the ancillae onto their original state |0⟩_A at the end of the time evolution.

FIG. 9. How a word w_large enclosing an exponentially large area can be deformed to the identity word without incurring an exponential amount of expansion. The deformation proceeds by making the loop defined by w_large narrower along the a direction before shrinking the loop in the b direction.
We now demonstrate this phenomenon numerically, using an extension of the analysis presented for BS. A direct implementation of the bistochastic circuit (87) is numerically rather expensive for investigating how n_c relaxes, since simulations would need to be run for times doubly exponential in the characteristic scale of the n_c fluctuations under study. For this reason we instead consider an irreversible modification of (87): we modify the dynamics so that cc^{-1} and c^{-1}c pairs can be annihilated, but not created. This means that our elementary stochastic gates M_i contain terms like |e, e, e⟩⟨c, c^{-1}, e|_{i,i+1,i+2} but not the transpose thereof, with the total density of c and c^{-1} characters decreasing monotonically in time.
The merit of this approach is that the irreversible setting allows us to extract lower bounds on the relaxation time of n_c in the reversible setting. Our simulations are run by initializing the system in |w_huge(n, L)⟩, a version of w_huge(n) padded with L − |w_huge(n)| e characters at random locations (so that |w_huge(n, L)| = L), and then tracking the time evolution of n_{|c|}. The results are shown in Fig. 11 for different values of n, which are necessarily very small on account of the doubly-exponential growth of the Dehn function. In the main panel, we show the relaxation time of |w_huge(3, L)⟩ as a function of L, defined as the time at which no c, c^{-1} characters remain in the evolved word. The inset shows EL(n), defined as the minimal value of L for which relaxation (namely, reaching a state containing no c, c^{-1} characters) was observed to occur over 1000 runs of the dynamics. The extracted EL(n) roughly conforms to our expectation of EL(n) ∼ 2^{αn} for an O(1) constant α, although the long timescales required to observe thermalization mean that statistical errors are rather large, and with our current data we should not expect to obtain a perfectly exponential scaling.

FIG. 10. Thermalization time for stochastic BS dynamics as a function of L for a system initialized in a product state |w_large(n, L)⟩ with n = 10 (so that the initial states for different L differ only in the density of e's). The thermalization time diverges at a finite L, which defines the expansion length of the word w_large(n). Each thermalization time is obtained by averaging over several independent runs with the same parameters. For L = 50, we observed a single thermalizing run, with thermalization time t_th ≈ 5 × 10^8 (not shown in the plot). No thermalization was observed for L < 50. (Inset) Expansion length of w_large(n) for different n. For the BS group, EL is a linear function of n (and therefore of L).

C. Fragile fragmentation: generalities
The BS^(2) example has a conserved density, so its failure to thermalize manifests itself as the freezing of that density. In general, however, models exhibiting fragile fragmentation need not have conserved densities. Defining fragile fragmentation and finding reliable diagnostics for it in the general case are nontrivial tasks, which we address below. First, we provide a more precise definition of fragile fragmentation, distinguishing it from what we call intrinsic fragmentation. Second, we present a physical 'decoupling' algorithm for detecting whether a system exhibits fragile fragmentation, given access to a large enough reservoir of ancillas. Third, we comment on the ways in which fragile fragmentation manifests itself in the dynamics of a thermalizing system.

FIG. 11. Thermalization time as a function of L for irreversible BS^(2) dynamics, where cc^{-1}, c^{-1}c pairs can be annihilated but not created. The dynamics is initialized in the state |w_huge(n, L)⟩ with n = 3, and the system is considered to have thermalized when no c, c^{-1} characters remain. Inset: expansion length of w_huge(n) for different n, defined as the minimal system size for which thermalization was observed to occur over 1000 runs of the dynamics. An approximately exponential dependence on n is observed.

Defining fragile fragmentation
We begin by precisely defining the notions of intrinsic and fragile fragmentation in a general context (i.e., without reference to the word problem). For concreteness we specialize to quantum systems evolving under unitary dynamics, specified by a sequence of evolution operators U_i(A), i ∈ Z_+, acting on the system plus a collection of ancillas A, with each ancilla assumed for simplicity to have the same onsite Hilbert space as the system itself. We assume that the U_i form a uniform family of time evolution operators that can be defined for any number of ancillas. Given an initial state |ψ⟩ of the system and a fixed reference state |0⟩_A of the ancillas, we define an ensemble τ_A of states on the system: τ_A is the ensemble of pure states obtained by evolving the initial state |ψ⟩ ⊗ |0⟩_A for an arbitrary time and postselecting on the ancillas being in the final state |0⟩_A. Note that in principle we could let τ_A depend explicitly on the state of the ancillas. However, for group dynamics the most natural choice for this state is |0⟩_A = |e⟩_A, and for semigroups an analogous state can be defined by augmenting the local Hilbert space with a character |e⟩ which commutes with all characters. We therefore content ourselves with studying fragmentation for this particular choice of ancilla state. We define the Krylov sector of |ψ⟩ extended to A as the subspace K_ψ(A) ⊆ H_sys spanned by the states in τ_A, where H_sys is the Hilbert space of the system (without ancillas). We furthermore define the intrinsic Krylov sector K_{in,ψ} associated to |ψ⟩ as the limit of K_ψ(A) as |A| → ∞. Under generic thermalizing Hamiltonian or unitary dynamics, the intrinsic Krylov sector of any |ψ⟩ will be the entire Hilbert space, K_{in,ψ} = H_sys. Intrinsic fragmentation occurs whenever this is not true, i.e. when there exist distinct initial states that do not mix under the dynamical rules even when the system is attached to an infinitely large bath (undergoing the same dynamics as the system).
When |A| is finite, each intrinsic Krylov sector may further shatter into many subsectors. This phenomenon (for |A| larger than the system) is what we have referred to above as fragile fragmentation. The expansion length associated with the dynamics is the minimal size of A below which additional subsectors form.

Probing fragile fragmentation
Our definition of fragile fragmentation above makes reference to postselection on the final state of the ancillas being |0⟩_A. The probability of this postselection succeeding is clearly exponentially small in |A|. We now present a more efficient algorithm for (i) identifying whether a given system exhibits fragile fragmentation, and (ii) constructing the subspace K_ψ(A) associated with an initial state |ψ⟩, given a maximum expansion length L + |A|. This procedure is more efficient than naive postselection on the state of the ancillas in various regimes which we discuss below.
The general algorithm proceeds as follows. We start with the state |ψ⟩ ⊗ |0⟩_A, and evolve it under Dyn acting on the system plus ancillas for some time t_th. After time t_th, we repeat the following steps many times:

1. Measure the last site of the system plus ancillas in the computational basis.

2. If the outcome is e, decouple this site from the rest of the system. Otherwise, leave the site coupled.

3. Run the dynamics for a time t_retherm on the system plus remaining ancillas, and go to step 1.

The iteration stops when all ancilla sites have been decoupled; we know this is always possible since the initial state was originally decoupled from the ancillas. On physical grounds we expect the probability of obtaining outcome e in step 1 to be O(1) at all times, but for our purposes it suffices for it to scale as 1/poly(L + |A|).
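The measure-and-decouple loop above can be sketched classically. The toy below uses Motzkin moves ('()' ↔ '00', '(0' ↔ '0(', ')0' ↔ '0)') as a stand-in for Dyn, with '0' as the identity character; all function names, parameter values, and the choice of bulk dynamics are illustrative assumptions, not the paper's:

```python
import random

# Motzkin local moves used as stand-in bulk dynamics; '0' is the identity.
MOVES = [("()", "00"), ("(0", "0("), (")0", "0)")]
RULES = {}
for u, v in MOVES:
    RULES.setdefault(u, []).append(v)
    RULES.setdefault(v, []).append(u)

def sweep(word, rng, n_updates):
    """Apply `n_updates` randomly placed reversible 2-site moves."""
    w = list(word)
    for _ in range(n_updates):
        i = rng.randrange(len(w) - 1)
        opts = RULES.get("".join(w[i:i + 2]))
        if opts:
            w[i:i + 2] = rng.choice(opts)
    return "".join(w)

def reduce(word):
    """Invariant label of a sector: the word with 0s and matched '()'
    pairs removed, leaving a string of the form )^m (^n."""
    w = word.replace("0", "")
    prev = None
    while prev != w:
        prev, w = w, w.replace("()", "")
    return w

def decouple(word, n_ancilla, rng, t_th=2000, t_retherm=50, max_rounds=10_000):
    """Steps 1-3 of the algorithm: evolve system + ancillas, then repeatedly
    measure the last site and decouple it whenever it is the identity '0'."""
    w = sweep(word + "0" * n_ancilla, rng, t_th)
    for _ in range(max_rounds):
        if len(w) == len(word):
            break                      # all ancillas decoupled
        if w[-1] == "0":
            w = w[:-1]                 # step 2: decouple a trivial site
        w = sweep(w, rng, t_retherm)   # step 3: re-thermalize
    return w
```

Because every move preserves `reduce`, any word returned by `decouple` lies in the same intrinsic sector as the input; repeated runs then sample states in the sector extended by the ancillas.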
We first discuss how this algorithm can be used to construct the subspace K_ψ(A). To accomplish this, one sets t_th to the maximum possible thermalization time for the system plus ancillas, i.e. t_th ∼ exp(L + |A|). Any fragmentation that persists after t_th will persist to infinite time for the given spatial resources. One can take the re-thermalization time after a measurement, t_retherm, to be much shorter (i.e., a low-order polynomial in L + |A|), since the measurement is a single-site perturbation to the equilibrated state, and the measured site is not expected to take more than polynomial time to develop significant amplitude on |e⟩. When the procedure terminates it yields a state |ψ′⟩ that (by hypothesis) is in K_ψ(A). After many runs, the ensemble of generated states spans K_ψ(A).
The procedure described above avoids the exponential overhead of postselection, but still incurs the exponential overhead of mixing. If we want to reconstruct a state with overlap on all states in K_{in,ψ}, this overhead cannot be avoided. Suppose, however, that we are not interested in a full reconstruction of K_{in,ψ}, but only in the simpler task of showing that adding ancillas and then removing them (as above) partially lifts the fragmentation of the original system. More generally, suppose that we have constructed K_ψ(A_0) and want to know whether enlarging A_0 to A_1 enlarges the sector. We can run the initial equilibration step for a time much shorter than the full equilibration time, stop when we have compressed back down to A_0, and check whether the resulting state is in K_ψ(A_0).^21 Finding a single state that lies outside K_ψ(A_0) suffices to establish fragile fragmentation. Detecting fragile fragmentation can thus be accomplished without requiring the system to fully thermalize or requiring a large bath.
Can this procedure be used to study other properties of the fragmentation? One additional quantity that can be computed with this method is the geodesic length of a group element. Recall that the geodesic length |g| of an element g ∈ G is the length of the shortest word representing g, namely |g| = min{|w| : φ(|w⟩) = g}. To compute it, we repeat the sequential length-reducing procedure until the system freezes. Formally speaking, the system freezes when ⟨e|ρ_end|e⟩ = 0 for quantum dynamics (where ρ_end is the reduced density matrix of the spin at the end of the system), or p(e_end) = 0 for classical dynamics, where p(•) denotes the marginal distribution of the last site of the chain. When this occurs, the system size has reached the minimum length needed to support a word in the Krylov sector, thereby yielding the geodesic length of the word.

Fragile fragmentation and thermalization
We now discuss the unexpectedly subtle consequences of fragile fragmentation for the evolution of generic states under unitary dynamics, which need not be time-independent or have any local conserved densities. To keep the discussion concrete, we focus on Dyn_G dynamics for some group G, although the diagnostics we arrive at are much more general. As we will see, G being a group (rather than just a semigroup) in fact makes fragile fragmentation particularly hard to detect locally; when G is instead a semigroup, simpler diagnostics exist (see Sec. VI for an example).
In a system exhibiting fragile fragmentation, a random word w of length L will contain many substrings s that have expansion length greater than L; in particular, treating a particular substring s as subsystem A and B = A^c as a bath of e's, |s⟩_A ⊗ |e⟩_B exhibits fragile fragmentation so long as |B| ≪ EL(|s|). In reality the substring is nested in the system as |w⟩ = |w_L s w_R⟩, but we still claim that the action of Dyn_G can never map |w⟩ to |w′⟩ = |w_L s′ w_R⟩, where s′ is a word in a different fragment from s, with expansion length asymptotically greater than L, but satisfying φ(s) = φ(s′). In particular, one might worry that the presence of the environment words w_{L/R} can 'catalyze' transitions of s, thereby sending |w⟩ to |w′⟩ despite s and s′ living in different fragile sectors.
A simple argument shows that such catalysis cannot parametrically change the expansion length of a word. Suppose for contradiction that catalysis can occur. We can append w_L^{-1} to the left and w_R^{-1} to the right, increasing the length of the system by less than L; the resulting sequence of derivations, which carries s to s′ through the catalyzed transition, is allowed by Dyn_G. Therefore the space complexity of the transition s ↔ s′ is at most 2L. For groups with asymptotically superlinear expansion lengths this is a contradiction, and thus such catalysis cannot occur.
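A schematic of this contradiction argument, with ⇝ denoting derivations allowed by Dyn_G (notation ours, not the paper's):

```latex
% If w_L s w_R \rightsquigarrow w_L s' w_R is allowed, then so is
\begin{equation*}
  s \;\rightsquigarrow\; w_L^{-1}\,(w_L\, s\, w_R)\,w_R^{-1}
    \;\rightsquigarrow\; w_L^{-1}\,(w_L\, s'\, w_R)\,w_R^{-1}
    \;\rightsquigarrow\; s',
\end{equation*}
% a derivation from s to s' using at most 2L of space (the appended words
% add fewer than L letters), contradicting a superlinear EL(s, s').
```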
To summarize, a random word contains large substrings that are frozen in a certain sense: if the initial state can be written as |w_L s w_R⟩, time evolution under Dyn_G will never produce |w_L s′ w_R⟩. An immediate consequence is that the time-evolved reduced density matrix for a region A has ⟨s|ρ_A(t)|s′⟩ = 0 at all times if s, s′ are in distinct fragile fragments. Furthermore, since both s and s′ can be generated locally, through a transition requiring only 2|s| ≪ L of space, we generically expect both ⟨s|ρ_A|s⟩ and ⟨s′|ρ_A|s′⟩ to be nonzero.
In conventional systems exhibiting a jamming transition, one can easily diagnose the jammed phase by computing local autocorrelation functions. In contrast, fragile fragmentation is hard to detect in this way, because the frozen substrings can slide around in the system and locally change their configuration (while remaining in the same fragile sector). Thus, while one can write down a conserved quantity describing the frozen strings, such a quantity will generically be very nonlocal. Nevertheless, the observation that the dynamics does not connect pairs of states like sw_R and s′w_R, regardless of w_R, still allows one to construct a reasonable dynamical probe of fragile fragmentation. For any two words w, w′ in the same Krylov sector, consider a two-point correlation function C_{ww′}(t), which can be nonzero only if the dynamics is able to map |wβ⟩ to |w′β⟩ in time 2t. The quantity Dehn(wβ, w′β) is thus expected to control when this correlator becomes nonzero.^22 Furthermore, for systems where the spacetime bound is saturated (like iterated BS), or close to being saturated, this timescale is dictated by EL(w, w′) and can be as large as ∼ exp(EL(w, w′)).
Beyond this timescale, the system is able to undergo a large-scale rearrangement that connects w and w′ (i.e., w and w′ will no longer appear to live in disconnected fragile sectors). By measuring the onset timescale for C_{ww′}(t) to become nonzero as a function of |w|, |w′|, one can diagnose the existence of, and place bounds on, large expansion lengths. Alternatively, computing C_{ww′}(t) can be thought of as follows: prepare an initial state ρ_0 in which a subregion A is in the pure state |w⟩, time evolve this state to get ρ, and measure the expectation value of X_{ww′} in the reduced density matrix ρ_A.
The above prescription suggests a heuristic way of determining the expansion length (we leave various technical details of this proposal to future work). Define Φ_R to be a dephasing channel acting on a region R of the system (decomposing the system as ABR for convenience), Φ_R(ρ) = Σ_s (|s⟩⟨s|_R ⊗ 1_{AB}) ρ (|s⟩⟨s|_R ⊗ 1_{AB}), with the sum running over computational basis states of R (105). Starting in the state ρ_0 described above, one can alternate between time evolving under Dyn_G and applying Φ_R before measuring the expectation value of X_{ww′}. Let A be the subsystem in which the state |w⟩ is initially present. If dist(A, R) ≫ EL(w, w′), then repeatedly dephasing the system should not change the value of C_{ww′}(t) by much.
However, if the dephasing channel is applied within a distance EL(w, w′) of A, then we expect a further suppression of C_{ww′}(t), since the dephasing eliminates many trajectories mapping w to w′. Finding the location of R at which one crosses over between these two behaviors would thus provide an estimate of EL(w, w′).
The protocol discussed above is general but somewhat indirect. As we saw in Sec. V B, in specific examples fragile fragmentation can have more direct and dramatic manifestations. In the next section we show that when the group property is violated, fragile fragmentation generally has easier-to-detect physical consequences: in such cases there can be a transition where, at small subsystem sizes, reduced density matrices are generically full rank, while above a threshold size they have nontrivial kernels.

VI. SEMIGROUP EXAMPLES
In the explicit examples of Dyn_G dynamics studied above, G has been taken to have the structure of a group. The phenomena discussed so far are however not limited to models with a group structure; indeed, all of the general results obtained in Secs. II and III are valid for any finitely presented semigroup. For semigroups, however, the geometric perspective adopted above in the discussion of Dyn_BS is less useful.^23 In this section we introduce two new semigroup models which do not admit a group structure but whose word problems nevertheless exhibit large time and space complexity, which we establish using combinatorial rather than geometric arguments. The first example has large time and small space complexity, and shares similarities with the BS model. The second has both large time and large space complexity, is qualitatively unique to non-group-based dynamics, and leads to a more direct characterization of fragile fragmentation than that provided by the general criterion of Sec. V C 3.
These models are inspired by the Motzkin spin chain and, more broadly, Motzkin dynamics (see Refs. [51,58]). For the reader's convenience, we briefly summarize Motzkin dynamics. The local state space includes an identity character |0⟩ (which plays the role of |e⟩ in our group-based models) as well as left and right parentheses |(⟩, |)⟩. The dynamics is engineered so that the "nestedness" of the parentheses is preserved, where nestedness is defined by the number of left parentheses located to the left of matching right parentheses; for example, under the dynamics the word '()' may evolve to '()()', '(())' or '0', but not to ')(', '((', or '))'. These rules can be summarized formally by defining the Motzkin semigroup as Motz = semi⟨0, (, ) | (0 = 0(, )0 = 0), () = 00⟩. (106) Alternatively, we may define a height field h_i, which keeps track of the level of nestedness at site i, as the number of '(' characters minus the number of ')' characters among the first i sites. The dynamics then preserves both h_L (the net difference between the number of '(' and ')' parentheses) and min_i h_i (which measures the extent of the nestedness).
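The conservation of h_L and min_i h_i can be made concrete with a short sketch (our own illustration, not code from Refs. [51,58]; the function names are ours):

```python
def heights(word):
    """Height field h_i: number of '(' minus number of ')' among the first i sites."""
    h, hs = 0, []
    for c in word:
        h += {'(': 1, ')': -1}.get(c, 0)  # the identity character '0' leaves h unchanged
        hs.append(h)
    return hs

def motzkin_invariants(word):
    """The two quantities preserved by Motzkin dynamics: h_L and min_i h_i."""
    hs = heights(word)
    return hs[-1], min(hs)
```

For instance, the allowed move '()' → '00' preserves both invariants, while the forbidden move '()' → ')(' changes min_i h_i from 0 to −1.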

A. Star-Motzkin model: slow thermalization
Our first example has a local state space which we will label by {(, ), 0, *}. As usual, all of what follows can be applied to Hamiltonian, random unitary, or classical stochastic dynamics. The purpose of the extra character * is to slow down the dynamics of the parentheses; this is done by adding to the relations of Motz the relations in (108). Thus when a parenthesis moves past a * character (in a certain direction), the * character is duplicated. The combination of these rules results in the dynamics which we will refer to as Dyn_*M. One can readily see that Dyn_*M exhibits Hilbert space fragmentation. Indeed, if we ignore the added * character, the Hamiltonian describes Motzkin dynamics, and thus already possesses fragmentation which cannot be described solely by a conserved parenthesis density. The sectors of the latter dynamics are labelled by a sequence of closed parentheses followed by a sequence of open parentheses, taking the form )^m (^n: this captures the total parenthesis imbalance in the configuration. When the * character is added, the labels of the Krylov sectors acquire additional structure, and it is clear that there are exponentially more sectors; the structure of fragmentation is thus richer. At some level Dyn_*M resembles Dyn_BS, as the *'s duplicate every time they are moved past a parenthesis, in a way similar to the duplication of the a's that occurs as they move past b's in BS. However, there are two differences. The first is that * does not have a natural inverse. The second is that the underlying Motzkin dynamics does not satisfy the properties of a group: if the characters ( and ) are identified with a generator and its inverse, then we would necessarily also have to allow the rule () ↔ )(, which is absent in Dyn_*M. Nevertheless, in App. G we show that the word problems for these models exhibit the same scaling of spatial and temporal complexities as in Dyn_BS.
Within the sector K_{q,0,0}, the expansion length is linear, EL(w, w′) = O(L).
Since q can grow to be exponentially large in L, this last fact implies slow dynamics. In fact, one way to study this slow dynamics is to observe that atypical configurations of the U(1) charge corresponding to the parentheses take a very long time to thermalize. Regarding the second result, we also provide a crisper characterization of the circumstances under which it takes a long time to transition between two words (see App. G).
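To see how *-duplication produces BS-like slowdown, consider the following toy rewriting sketch. Since the relations (108) are not reproduced here, the rule below, in which a '(' moving rightward past a '*' leaves two stars behind, is an illustrative assumption of ours (on the chain, the duplicated * would consume an adjacent 0):

```python
def sweep_right(word):
    """Repeatedly apply the ASSUMED rule '(*' -> '**(' (not Eq. (108) verbatim)
    until the '(' has crossed every star to its right; return the final word."""
    while '(*' in word:
        i = word.index('(*')
        word = word[:i] + '**(' + word[i + 2:]
    return word
```

Each crossing turns one star into two, so dragging a single '(' across a block of m stars doubles the block; shuttling parentheses back and forth through star blocks therefore costs a number of moves growing exponentially in the number of sweeps, mirroring the a-duplication of the BS model.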

B. Chiral star-Motzkin model: fragile fragmentation
We now present an example where the expansion length is exponentially large, implying fragile fragmentation. In this example, we simply replace the * character with a 'chiral' version of the character, which we denote as ▷. The rules for this new character are similar to those of *, except that they are only activated when a ▷ character is adjacent to ). More specifically, we replace the rules in (108) with their chiral counterparts. Note that one could also add another chiral character ◁ which only interacts with ( and commutes with ▷, but the necessary physics is already illustrated by ▷.
We first discuss the intrinsic Krylov sectors of the dynamics. In particular, if we append to the system a large bath of 0's, then one can show that any configuration can be reduced to a canonical form. Note that we can have a large number of ▷ characters locked between adjacent '(' characters, because ▷ characters cannot tunnel past '(' characters. As a result, we will label the Krylov sectors by the four indices (⃗k, ℓ, m, n), where dim ⃗k = n. Chirality of the ▷ character plays a crucial role in the large space complexity. Define a nest as a collection of parentheses of the form (((• • • ))). ▷ characters embedded in a nest can exit the nest but cannot enter an adjacent one, due to the chirality constraint. As a result, transitioning between certain pairs of configurations can require exponentially large space if a nest (((• • • ))) contains a large number of ▷ characters. Thus, intuitively, an exponentially large bath is needed to unfreeze the system.
In App. G, we provide a more rigorous argument for when the expansion length connecting two words w and w′ can be exponentially large, and also discuss an interesting consequence of the fragile fragmentation in this model, which has no analog in group dynamics. In particular, we argue that under unitary time evolution ρ(t) = e^{−iHt} ρ(0) e^{iHt}, where ρ(0) is a product state, the subsystem density matrix ρ_A(t) = Tr_{A^c}(ρ(t)) exhibits a transition in its rank as |A| is increased: when |A| ≪ log L, ρ_A(t) is of full rank, while when |A| ≫ log L, ρ_A(t) is no longer of full rank. Probing this property in a physically reasonable way is further discussed in App. G.

VII. GENERALIZATIONS TO TWO DIMENSIONS: GROUP LOOP MODELS
The discussion thus far has been restricted to 1D models of group dynamics. It is natural to wonder whether or not higher-dimensional models with similar behavior can be constructed, especially since the aforementioned phenomena are more prevalent in higher dimensions.
In this section, we discuss one 2D generalization of our group-based models that is in some sense the most faithful way of embedding the 1D group constraint in a two-dimensional system, and which leads to a qualitatively new way of producing glassy dynamics and jamming in 2D. This proceeds by fixing a group G and considering loop models that possess one flavor of loop for each generator of G. Along any one-dimensional reference loop, one can associate a group element corresponding to the product of all of the generators associated with the loops that the reference loop intersects. The dynamics is engineered so that this group element remains invariant under the dynamics.
This class of models can be viewed as a broad generalization of the construction in Refs. [59,60] (which studied dynamics) and Ref. [61] (which studied ground state properties), and we expect similar robustness of the Hilbert space fragmentation in these models. For brevity's sake we will discuss this construction at a high level (largely using continuum language in order to avoid the notational burden incurred by an explicit lattice description), and will defer a more comprehensive analysis to future work.
Given a discrete group G and a presentation thereof, the degrees of freedom in our 2D model are associated with directed loops labeled by generators of G. In the following, we describe how to place constraints on the dynamics of these loops to produce phenomenology similar to that present in our 1D models.
We start by observing that each directed reference path γ through space (microscopically, along the lattice) can be associated with a group word w(γ). This word is determined by the ordered labels of the loops that γ intersects. Specifically, when proceeding along γ, each time γ intersects a loop labeled by the generator g_i, w(γ) is i) multiplied by g_i if the cross product ⃗r × ⃗ℓ of the local tangent vector ⃗r of γ and the local tangent vector ⃗ℓ of the loop is parallel to ẑ; and ii) multiplied by g_i^{−1} otherwise. This observation means that isolated closed loops can be regarded as implementing trivial relations in the words associated to the paths which pass through them, a fact we illustrate pictorially as (112). Therefore, we associate processes nucleating a loop with a free expansion (ee → gg^{−1}) and annihilating a loop with a free reduction (gg^{−1} → ee). With more loops, a more general situation might look like (113). We now discuss how to implement relations of the group in terms of the loops. Suppose the group presentation is G = ⟨S | r_1, . . . , r_k⟩, where each of the r_i are words to be identified with the identity in G, and |r_i| ≤ 3 for all i; this restriction on the length of the relations is justified by the fact that any group (but not any semigroup) admits a finite presentation satisfying |r_i| ≤ 3 (see App. A for the proof). Writing r_i = g_m g_n g_ℓ, this corresponds to an object which we refer to as a net. As another example, if r_i = g_m g_n^{−1} g_ℓ (i.e. one of the generators is replaced with its inverse), then the net is modified accordingly, and so on. Any loop configuration corresponding to one of these nets can be created or destroyed without changing φ(w(γ)) (the group element associated to w(γ)), since for any curve γ that cuts across the net, creating or destroying the net simply corresponds to applying the appropriate relation r_i at some point in the word w(γ).
The dynamics we consider consists of generic dynamical processes which preserve φ(w(γ)) for all closed curves γ, which may be viewed as an unusual type of gauge constraint. Thus the dynamics will include processes which nucleate and destroy nets associated to each r_i, as well as processes where lines are moved, stretched, and contracted, and where intersections of lines are moved. It will also include processes which attach a loop to an intersection point of other loops; an analogous deformation occurs for g_k replaced with g_k^{−1} (in which case the arrow is reversed). To avoid problems on the lattice where an unbounded number of joins can occur (requiring unphysically large degrees), we only allow a join if the degree of the intersection point is below a certain threshold, set by max_i |r_i|.
The dynamical processes described above are sufficient to simulate the full group dynamics. For example, suppose we want to determine whether the processes we wrote down suffice to simulate g_m g_n = g_ℓ^{−1} (assuming r_i = g_m g_n g_ℓ = e is a relation of the group). We can show that this is indeed the case by applying the following sequence of moves: first creating a net, then performing two joins, followed by two un-joins, and finally a free reduction. A similar derivation shows that cyclic conjugates of relations (such as g_ℓ r_i g_ℓ^{−1} = g_ℓ g_m g_n = e) can likewise be simulated by the rules we have already discussed.
To summarize, our 2D dynamics contains the following processes:
1. Processes which deform loops in ways which do not create or destroy loop crossings;
2. Processes which nucleate and annihilate closed loops (performing free reductions and expansions);
3. Processes which join a free loop with an intersection of loops;
4. For each relation r_i, a process which nucleates or annihilates an r_i-net.
The dynamical processes above were formulated for the case where G is a group, but it is also possible to generalize to the case where G is merely a semigroup, producing models reminiscent of the 2D generalizations of the Motzkin chain studied in Ref. [61]. Obtaining a more systematic understanding of the temporal and spatial complexity of the dynamics of these models would constitute an interesting avenue for future work.
One final observation is that there is a simple way to map group loop models to what we call tile models. Observe that a configuration of loops or nets splits the plane into a set of disjoint tiles (two adjacent tiles are separated by an edge which corresponds to a generator of the group). It then turns out to be possible to label the tiles with elements g ∈ G such that if two tiles have labels g and h and share an edge corresponding to generator k, then gk = h. For example, if G = Z, each tile will be labelled by an integer, and two adjacent tiles have labels which differ by 1 (the generator of Z). This corresponds to a mapping to a height model, of which the tile model is the generalization to arbitrary G. This mapping may play a role in understanding the nature of fragmentation in these models, which we leave to future work.
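For G = Z, the tile labeling can be sketched as follows (a hedged illustration with names of our choosing, not the paper's construction): walk along any path in the plane and record each loop crossing as +1 or −1 according to its orientation; the tile labels visited are then partial sums.

```python
def tile_labels(crossings, start=0):
    """Tile labels visited along a path, for G = Z: each crossing of the
    single generator's loop shifts the label by +-1 (its orientation)."""
    labels = [start]
    for sign in crossings:
        labels.append(labels[-1] + sign)
    return labels
```

Consistency of the labeling requires that the signed crossings around any closed path sum to zero, which holds because a closed path crosses each closed loop an equal number of times in each orientation.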

VIII. DISCUSSION
In this work, we have constructed a number of natural dynamical systems, with local few-body interactions, in which relaxation places anomalously expensive demands on a system's temporal and/or spatial resources. When the models have local conserved densities, the hydrodynamics of these densities is anomalous or frozen; even when conserved densities are absent, we have presented diagnostics for nonergodic behavior.
Our examples were all constructed in the context of models with intrinsic Hilbert space fragmentation. A natural question is whether the intrinsic fragmentation is essential to their physics. In our framework, dynamics without fragmentation is generated by finite presentations of the trivial group, which cannot have a superlinear Dehn function (see App. B). This of course does not mean that the dynamics of models without fragmentation cannot be slow, but it does mean that any mechanism for slow thermalization must originate from something other than the Dehn function.
Our results lend themselves to several natural extensions. Most naturally, the anomalous hydrodynamic relaxation we saw in the BS model can be extended to other groups with presentations that manifest a conservation law. Whether these groups give rise to new classes of hydrodynamic relaxation is an interesting question for future work (a family of such examples will be presented in [62]). Another interesting direction is to investigate the ground states of Hamiltonians that implement group dynamics. A natural class of frustration-free Hamiltonians can be read off from the transfer matrices of bistochastic Markov processes [63]; their ground states are equal-weight superpositions of all the configurations in a sector, and their spectral gaps can be bounded by the Markov-chain gap. The tools developed here may be useful for undertaking a more detailed study of the properties of these states.
The family of models we considered is restricted in the sense that the dynamical constraints can be expressed in the computational basis, so that every computational-basis product state is in a definite dynamical sector of Hilbert space. More generally, one can consider constraints associated with a commuting set of projectors with entangled eigenstates. In one-dimensional spin chains, such commuting projectors can be deformed into unentangled projectors by a short-depth unitary circuit. However, this process just corresponds to conjugating the Hamiltonians/unitaries we have explored with short-depth unitary circuits, and yields nothing qualitatively new. Getting something new in one dimension thus requires defining constraints using non-commuting projectors, as occurs in the (quantum-fragmented) Temperley-Lieb model [23,64,65], and hence pursuing this direction requires a systematic characterization of such constraints. In higher dimensions, however, there are sets of commuting projectors (like those of the toric code) whose simultaneous eigenstates are inherently long-range entangled. An interesting task for future work would be to extend our group-theoretic dynamical systems to these models, and explore the resulting entanglement dynamics.
Finally, it would be interesting to explore the stability of our results with respect to weak violations of the constraints. One way to do this is by breaking the constraints in an isolated region of space, or by bringing a thermal bath in contact with the system at its boundary. In the case of group dynamics, doing so destroys fragmentation, making the dynamics fully ergodic. It furthermore leads to all states being connected by at most O(L²) steps of the resulting dynamics, since the O(L) elements of a given product state's geodesic word need to be moved at most an O(L) distance to the constraint-free region, at which point they can be changed into the elements of any other geodesic word. It is however still possible for the dynamics in this case to have long (even exponentially long in L) thermalization times, due to bottlenecks in Fock space that arise from the finite spatial extent of the constraint-free region. A specific example of this is presented in [66].

Appendix A: Presentation independence

A given group admits many inequivalent presentations, including as a semigroup. As an example, consider two groups G_1 and G_2, where m, n are relatively prime. These two groups are isomorphic, G_1 ≅ G_2 ≅ Z, with the isomorphism associating an element x^a y^b with the integer am + bn; note that this is true despite the fact that G_1 and G_2 have a different number of generators and relations. A given semigroup in general admits an infinite number of different presentations, but below we will prove that the group-theoretic functions defined in the main text (the Dehn function, expansion length, and so on) have asymptotic scaling behaviors that are presentation independent.
We can exploit this fact to choose presentations satisfying some particular desired property. For example, we may want to choose a model of dynamics where the Hamiltonian or unitary gates are as local as possible. Since the locality of the dynamics is limited by the maximum size of the relations in R, we would thus like to minimize the size of the relations. To this end we have the following proposition:

Proposition 1. Every finitely generated group has a presentation ⟨S|R⟩ where all relations r_i ∈ R have length |r_i| ≤ 3.

Proof. Consider the Cayley 2-complex CG_G of a finitely presentable group G (see Sec. IV for a brief discussion of its definition), obtained from a finite presentation P. While the exact structure of CG_G depends on P, CG_G can always be subdivided to obtain a simplicial complex where each 2-cell has 3 edges. Since each 2-cell in the complex corresponds to a relation, |r_i| ≤ 3 for all r_i ∈ R in the subdivided complex, thereby defining a presentation P′ whose relations all have length ≤ 3. Since P is finite, this subdivision is completed after only a finite number of steps, and P′ is consequently also finite.
Note that while the Cayley 2-complex of a semigroup can also be subdivided, the lack of translation invariance in a semigroup's Cayley complex means that the resulting subdivision may yield a presentation with infinitely many generators (an illustrative example is to compare the semigroup N × N with the group Z × Z).
As mentioned above, the group-theoretic properties we are interested in from the point of view of group dynamics are largely insensitive to the choice of presentation. For example, recall the growth function N_K(L) ≜ |{g : |g| ≤ L}| defined in (30) of the main text, which measures the number of dynamical sectors as a function of system size. The scaling of N_K(L) with L is independent of the choice of presentation, allowing us to meaningfully talk about the growth function of a semigroup, rather than of a presentation:

Proposition 2. Let N_{K,P} denote the growth function for a particular presentation P = semi⟨S|R⟩ of G. Then N_{K,P} ∼ N_{K,P′} for all finite presentations P, P′ of G.
Proof. Let P = semi⟨a_1, . . . , a_{|S|} | R⟩ and P′ = semi⟨a′_1, . . . , a′_{|S′|} | R′⟩. Since both P and P′ present G, each a′_i can be expressed as a product of a finite number of the a_i. Let n_{PP′} denote the maximal number of generators of P that appear when writing the a′_i in terms of these generators, and let |g|_P denote the geodesic distance of g ∈ G with respect to the presentation P. Then |g|_{P′} ≤ n_{PP′} |g|_P, and hence N_{K,P}(L) ≤ N_{K,P′}(n_{PP′} L). We may also perform a similar rewriting of the generators of P in terms of those of P′. After running the same argument, we find that there exist O(1) constants n_{PP′}, n_{P′P} such that N_{K,P}(L) ≤ N_{K,P′}(n_{PP′} L) and N_{K,P′}(L) ≤ N_{K,P}(n_{P′P} L), and hence N_{K,P} ∼ N_{K,P′}.
Similar reasoning can be applied to show that the Dehn function and expansion length (see Sec. II or the following appendix for definitions) of a semigroup are presentation independent:

Proposition 3. Let Dehn_P(L), EL_P(L) be the Dehn function and expansion length of a semigroup with a finite presentation P = ⟨S|R⟩. Then if P, P′ are any two such finite presentations, Dehn_P(L) ∼ Dehn_{P′}(L) and EL_P(L) ∼ EL_{P′}(L).

Appendix B: A primer in geometric group theory

In this appendix we state and prove some useful facts about the geometry and complexity of finitely presentable discrete groups. Many of the statements derived below are well-known results in the math literature, and we have tried to provide citations to the original works when appropriate. A particularly accessible review of background material relevant to the discussion to follow can be found in Ref. [41]; a more advanced reference is Ref. [67]. As a small notational convenience, in the following the notation w ∼ g will be used as shorthand to denote that the word w evaluates to g; in the main text this was written as φ(|w⟩) = g. We will mostly be interested in infinite groups, since finite ones have trivial large-scale geometry (in a sense soon to be made precise). Finitely presentable infinite semigroups are of course very easily constructed; indeed, it is easily verified that any group presentation where the number of generators exceeds the number of nontrivial relations 26 will generate an infinite group. Free groups (on n > 1 generators) and Abelian groups in some sense define opposite extremes, since the free group has a Cayley graph that is embeddable in the hyperbolic space H^n, while Abelian groups have Cayley graphs that are embeddable in R^n. Most of the interesting cases for us correspond to when an intermediate amount of 'Abelian-ness' is introduced into the non-Abelian free group.

Time complexity: the Dehn function
The definition of the Dehn function (14) in the main text relates only to the (worst-case) complexity of deforming a given word w ∈ K_e into the identity word (recall that K_{g,L} is the set of length-L words that represent the element g, namely K_{g,L} = {w | |w| = L, φ(w) = g}). We claimed in the main text that focusing on the complexity of words in K_e (as opposed to K_g for g ≠ e) was done without loss of generality. We now prove that studying the complexity of the word problem in K_{g≠e} does indeed not produce anything which is not already captured by Dehn(L):

Proposition 4. For a given element g of geodesic distance |g| ≤ L, define the g-sector Dehn function as Dehn_g(L) ≜ max_{w,w′ ∈ K_{g,L}} Dehn(w, w′), where Dehn(w, w′) is the minimum number of applications of relations needed to transform w into w′. Then Dehn_g(L) ∼ Dehn(L) for all g.
Proof. For two words w_{1,2} in the same K_g sector, any deformation (i.e. based homotopy) of w_1 to w_2 gives a deformation between the length-2L word w_1 w_2^{−1} ∈ K_e(2L) and e. Thus the minimal number of steps needed to relate w_1 to w_2 cannot be asymptotically smaller than the minimal number of steps needed to deform w_1 w_2^{−1} to e. This implies Dehn(w_1 w_2^{−1}) ≲ Dehn(w_1, w_2), where ≲ denotes equivalence up to additional contributions linear in L. 27 This means that Dehn(L) ≲ Dehn_g(L).
26 By a "trivial relation" we mean a free reduction/expansion, namely a relation of the form aa^{−1} = e.
27 These additional contributions occur because we need O(L) steps just to get rid of trivial words like ww^{−1}. However, an extra L steps can be important when we do not allow words of length greater than L to appear at any stage of the deformation (which we may well want to impose on physical grounds).
Conversely, since we can deform e to w_1^{−1} w_2 ∼ e in time ∼ Dehn(w_1^{−1} w_2), w_1 can be deformed into w_1 (w_1^{−1} w_2) = w_2 in time ≲ Dehn(L). Thus we also have Dehn(w_1, w_2) ≲ Dehn(w_1^{−1} w_2), so up to factors of order L, we have Dehn_g(L) ∼ Dehn(L).

The above argument means that, if the amount of space available to us is not restricted, all sectors have asymptotically the same worst-case complexity. If we however require that all words involved have length ≤ L, Prop. 4 is modified to read Dehn_g(L) ≲ Dehn(L). (B6) In this case, all sectors K_{g,L} for which the geodesic length of g satisfies |g|/L < 1 in the L → ∞ limit will still have Dehn_g(L) ∼ Dehn(L). On the other hand, sectors where |g|/L → 1 as L → ∞ will not have enough "free space" for the above argument to work, and will thus have Dehn_g(L) < Dehn(L): each word in such a sector must have at least |g| locations allotted to representing g, giving it less room to form large loops in the Cayley graph with the remaining locations (in the case where fragile fragmentation occurs [see Sec. V], this remains true provided we define Dehn(w, w′) = 0 if w_1, w_2 are in different subsectors of K_{g,L}). Sectors where |g|/L → 1 as L → ∞ are however necessarily exponentially smaller in size than those with |g|/L < 1 (see Prop. 10), and hence a random state, which with high probability is associated to a geodesic word where L − |g| diverges as L → ∞, will be in a sector with Dehn_g(L) ∼ Dehn(L) with high probability.
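The quantity Dehn(w, w′) defined in Prop. 4 is simply a graph distance on fixed-length words, and for tiny examples it can be computed by breadth-first search. The sketch below is our own illustration, using a length-preserving presentation of Z with letters x, X = x^{−1}, and the identity character e:

```python
from collections import deque

# Length-preserving relations for Z = <x>: cancellation and shuffling moves.
RULES = [('xX', 'ee'), ('Xx', 'ee'), ('xe', 'ex'), ('Xe', 'eX')]

def neighbors(w):
    """All words reachable from w by one application of a relation (either direction)."""
    for a, b in RULES:
        for pat, rep in ((a, b), (b, a)):
            i = w.find(pat)
            while i >= 0:
                yield w[:i] + rep + w[i + len(pat):]
                i = w.find(pat, i + 1)

def dehn(w1, w2):
    """Minimal number of relation applications taking w1 to w2 (BFS distance)."""
    seen, queue = {w1}, deque([(w1, 0)])
    while queue:
        w, d = queue.popleft()
        if w == w2:
            return d
        for v in neighbors(w):
            if v not in seen:
                seen.add(v)
                queue.append((v, d + 1))
    return None  # w1 and w2 lie in different sectors
```

For example, dehn('xxXX', 'eeee') returns 4: one cancellation of the inner pair, two shuffles to bring the outer pair together, and one final cancellation.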
We now provide examples of simple groups and their Dehn functions, and discuss which sorts of groups we can expect to have large Dehn functions. A basic result is that only infinite non-Abelian groups can have interesting Dehn functions, due to the following easily verified statements:

Fact 1. All finite groups have Dehn(L) ≲ CL for some (presentation-dependent) constant C related to the diameter of the group's Cayley graph.

Fact 2. Dehn(L) ≲ L² for any Abelian group.

Groups which are too non-Abelian also have quadratic Dehn functions; in particular, free groups have Dehn(L) ≲ L².

Proof. This follows from the fact that the Cayley graphs of free groups are trees, and thus any word w ∈ K_e is a (potentially backtracking) path on the tree which contains no nontrivial loops. Such paths can be contracted to the trivial path by applying O(L) relations xx^{−1} = ee and O(L²) relations xe = ex. 31

To look for interesting Dehn functions we thus need infinite groups with a moderate amount of Abelian-ness, which (roughly speaking) possess nontrivial loops at all length scales. One can start by finding examples of groups where Dehn(L) scales as a higher-order polynomial; it was shown in Ref. [42] that these turn out to be virtually nilpotent groups. The simplest example is:

Example 1. The discrete Heisenberg group has Dehn(L) ∼ L³ [42].
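The quadratic bound Dehn(L) ≲ L² for Abelian groups can be illustrated concretely: in the free Abelian group on x, y, sorting a length-L word into the canonical form x^a y^b requires one application of the commutation relation per inversion, giving Θ(L²) moves in the worst case. A small counting sketch (ours, with illustrative names):

```python
def commutation_moves(word):
    """Count applications of the relation yx -> xy needed to sort a word
    in <x, y | xy = yx> into the canonical form x^a y^b (bubble sort)."""
    w, moves = list(word), 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(w) - 1):
            if w[i] == 'y' and w[i + 1] == 'x':
                w[i], w[i + 1] = 'x', 'y'
                moves += 1
                swapped = True
    return moves
```

Here commutation_moves('xy' * n) equals n(n−1)/2, quadratic in the word length 2n, consistent with Fact 2.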
The group BS(1, 2) [68], studied in detail in the main text, is the simplest group known to the authors with an exponential Dehn function; its properties are discussed in detail in App. E.

Space complexity: the expansion length
As discussed in the main text, the space complexity of the word problem is given by the maximal size of words that one must encounter when reducing a w ∈ K_e to the identity, as measured by the expansion length EL(w), a quantity originally introduced by Gromov in Ref. [69]. If EL(w) > |w|, then w must expand by a nontrivial amount while being reduced to the identity.
As we did with the area, we may use the expansion length to define a distance metric between any two words w_{1,2} that represent the same group element. Instead of using EL(w_1 w_2^{−1}) (which we avoid on account of the homotopy relating w_1 to w_2 needing to be properly based), we define the relative expansion length EL(w_1, w_2) between two words w_1 ∼ w_2 by replacing e by w_2 in the above definition, where the ∆_{w_1→w_2}(t) are all paths in the Cayley graph with endpoints fixed at e and g(w_1) = g(w_2). When the homotopy is trivial, i.e. when w_1 = w_2, we define the relative expansion length to vanish. The expansion length of a group can be defined as the worst-case spatial complexity of words in K_e, as was done in our definition of the Dehn function: EL(L) ≜ max_{w ∈ K_e(L)} EL(w). A comprehensive survey of geometric properties of EL can be found in Ref. [70].

31 In the mathematical literature, a convention is often used where Dehn(w) is the number of relations needed to reduce w to the identity e, allowing the length of the word to change in the process (so that e.g. ee may be replaced by e). For physical models of group dynamics, where the total length of the word is always fixed, Dehn(w) will generically be longer than the value obtained with this convention by a factor of L, corresponding to moves of the form ea = ae that need to be performed during the reduction of w to e^L.
Similarly to the Dehn function, the asymptotic scaling of EL(L) is independent of the choice of (finite) presentation. Also as with the Dehn function, considering expansion in other K_g sectors does not yield anything new. Just as with Proposition 4, one can show:

Proposition 6. For a given g of geodesic distance |g| ≤ L, define the g-sector expansion length EL_g(L) analogously to Dehn_g(L). Then EL_g(L) ∼ EL(L) for all g.
Most groups one is familiar with have EL(L) ≲ L. For example, it is easy to see that all Abelian groups have EL(L) ≲ CL for some constant C. One can also show:

Proposition 7. All finite groups have EL(L) < L + C for some constant C.
Proof. The proof is similar in spirit to that of Fact 1. Let w ∈ K_e be a length-L closed loop in the Cayley graph of a finite group, and let N be the number of vertices in the Cayley graph. Since N is finite, we may decompose w into closed subloops, w = w_1 w_2 · · · w_k, each of length at most N, since a path in the Cayley graph can only reach at most N different vertices before returning to its starting point. Let M = EL(N). If we append M identity characters to the end of w, we can use them to turn any one of the w_j into a string of e's without increasing the length of the appended word. Since we can do this for all of the w_j sequentially, we thus have EL(L) ≤ L + M as claimed.
Furthermore, we will see later that despite having exponential time complexity (Dehn(L) ∼ 2^L), BS(1, 2) has only linear spatial complexity (EL(L) ∼ L). This is part of a more general result that asynchronously combable groups (of which BS(1, 2) is an example) have EL(L) ≲ L [71].
It is however relatively straightforward to construct examples where EL(L) grows faster than linearly: one simply finds groups with Dehn functions that scale as Dehn(L) = ω(2^L). This is due to the "spacetime" bound mentioned in the main text:

Proposition 8 ([57]). For a finitely presentable group generated by n_g generators, Dehn(L) ≤ (2n_g + 1)^{EL(L)}. This can be rewritten as EL(L) ≳ log_{2n_g+1} Dehn(L), which grows superlinearly if Dehn(L) grows superexponentially.
Proof. For w ∈ K_e with expansion length EL(w), the number of words that w can possibly visit as it is reduced to the identity is |{a_i, a_i^{−1}, e}|^{EL(w)} = (2n_g + 1)^{EL(w)}. Since the shortest reduction of w to e cannot visit a given word w′ more than once, the number of steps in the reduction is (often very loosely) upper bounded by (2n_g + 1)^{EL(w)}.

Growth rates of groups
A simple measure of a group's geometry is how fast the group grows, namely how the number of elements within a distance L of the origin of the Cayley graph grows as L is increased. To this end, let B(L) = {g : |g| ≤ L} denote the radius-L ball centered at the origin of the Cayley graph, and define the growth function (or group volume) as N_K(L) ≜ |B(L)|. Physically, N_K(L) places a lower bound on the number of Krylov sectors that Dyn_G possesses (in the absence of fragile fragmentation, the number of Krylov sectors equals N_K(L)). A basic property of a group is the asymptotic scaling of N_K with L. It is straightforward to show that N_K ∼ O(1) for finite groups and N_K ∼ poly(L) for Abelian groups. Free groups provide the simplest examples where N_K ∼ exp(L), which is the maximum possible growth rate.

We now ask what implications group growth has for the time and space complexity measures introduced above. It turns out that exponential growth is needed in order to have Dehn(L) growing faster than poly(L):

Proposition 9. Any finitely generated group whose Dehn function grows faster than any polynomial must have superpolynomial growth.

Proof. The contrapositive of this proposition follows from Gromov's theorem that all finitely generated groups with polynomial growth are virtually nilpotent (namely, have a nilpotent subgroup of finite index) [73]. Since virtually nilpotent groups have the same Dehn functions as nilpotent ones, we may combine Gromov's theorem with the fact that all nilpotent groups have Dehn(L) ∼ L^d for some d [42] to arrive at the result.
Note that the converse to proposition 9 is obviously false, as free groups on more than one generator provide examples of groups with exponential growth but with polynomial (in fact Dehn(L) ∼ L) Dehn functions.
We would also like to know the sizes of the different sectors K_g. One result along these lines is that the sector sizes must get small as |g| gets large:

Proposition 10. Define D(L) ≜ (2n_g + 1)^L as the total number of words of length L. Then the size of K_{g,L} is upper bounded as
|K_{g,L}| ≤ C D(L) e^{-c|g|^2/L}
for some g- and L-independent constants C, c.
In the context of groups (like BS) with exponential growth, this tells us that almost all Krylov sectors are exponentially smaller than the largest ones.
Proof. The proof follows from connecting the counting of walks in K_{g,L} with the heat kernel on the Cayley graph. Consider the symmetric simple lazy walk on the Cayley graph, and let p_L(g, h) be the probability that a length-L walk starting at h ends at g. Then
p_{L+1}(g, h) = (1/2) p_L(g, h) + (1/(4n_g)) Σ_k p_L(k, h),
where the sum over k runs over the 2n_g neighbors of g in the Cayley graph. By translation invariance of the Cayley graph, we can fix h = e without loss of generality, and will simply write p_L(g) for p_L(g, e).
The previous equation can be written more succinctly as
δ_L p_L = -(1/2) ∆ p_L,
where ∆ is the normalized graph Laplacian and δ_L denotes the discrete derivative along the "time" direction determined by L. Thus the different sizes of the K_{g,L} are determined by using the heat equation to evolve a delta function concentrated on e for a total time of L. The claim we are trying to prove then follows from estimates of the discrete heat kernel Green's functions developed in the graph theory literature; see [74] for a review.
Various other facts follow from the observation that p_L(x) obeys the heat equation. For example, it implies that p_L(x) obeys strong maximum and minimum principles, which guarantee that all local maxima and minima of p_L(x) occur on the boundaries of its domain of definition. It also means that, since in groups of exponential growth almost all group elements in B(L) have geodesic distance close to L, almost all sectors K_{g,L} will contain a number of elements exponentially smaller than D(L).
In addition to the above asymptotic bound, we can also prove that K_e is always the largest sector:

Proposition 11. For all g and L,
|K_{e,L}| ≥ |K_{g,L}|. (B19)

Proof. We aim to show that p_L(e, e) ≥ p_L(e, g) for all g, L. For simplicity of notation, let L ∈ 2N. Then using p_L(g, h) = p_L(gk, hk) for all g, h, k on account of transitivity of the Cayley graph, we have
p_L(e, g) = Σ_h p_{L/2}(e, h) p_{L/2}(h, g)
≤ (Σ_h p_{L/2}(e, h)^2)^{1/2} (Σ_h p_{L/2}(h, g)^2)^{1/2}
= p_L(e, e)^{1/2} p_L(g, g)^{1/2} = p_L(e, e),
where in the second line we used Cauchy-Schwarz, and in the last line the symmetry of the walk together with transitivity of the Cayley graph.
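As a concrete sanity check, the lazy-walk recursion and the resulting maximum principle can be iterated exactly in the simplest Abelian case G = Z; the sketch below is our own toy example (not one of the groups studied in this work) and uses exact rational arithmetic so that the inequality p_L(e, e) ≥ p_L(e, g) is verified with no floating-point error.

```python
from fractions import Fraction

def lazy_walk_distribution(L):
    """Evolve the symmetric simple lazy walk on the Cayley graph of Z = <a>:
    stay put with probability 1/2, else hop to one of the 2 neighbors."""
    p = {0: Fraction(1)}  # delta function concentrated on the identity e = 0
    for _ in range(L):
        q = {}
        for x, w in p.items():
            q[x] = q.get(x, Fraction(0)) + w / 2
            q[x - 1] = q.get(x - 1, Fraction(0)) + w / 4
            q[x + 1] = q.get(x + 1, Fraction(0)) + w / 4
        p = q
    return p

p = lazy_walk_distribution(12)
assert sum(p.values()) == 1
# p_L(e, e) dominates p_L(e, g) for every g, as in proposition 11
assert all(p[0] >= v for v in p.values())
```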
It is also possible to make statements about the absolute size of K_e. In particular, |K_e| admits different bounds depending on the growth rate of the group:

Proposition 12 ([75]). For a group with growth function N_K(L) scaling as a polynomial of degree d in L,
|K_{e,L}| ≤ C D(L) L^{-d/2}. (B21)
For a group with exponential growth,
|K_{e,L}| ≤ C D(L) e^{-cL^{1/3}}. (B22)

In the main text, we saw numerically that this bound is saturated for BS (see Fig. 4). This means that while Dyn_BS has exponentially slow relaxation, it is as 'close' to being weakly fragmented as possible.
We can also connect the above bound with a previous result to derive:

Corollary 1. Groups with superpolynomial Dehn functions have exponentially many group sectors for words of a fixed length, and have an identity sector K_e that contains a fraction of all length-L words that scales at most as e^{-L^{1/3}}.

Proof. This follows by combining proposition 12 with proposition 9.

Appendix C: Semigroup dynamics and undecidability
While not relevant to the examples studied in the main text, it is amusing to note that the word problem for semigroups is not just computationally hard, but can even be undecidable: a foundational result in computability theory, first proven by Markov [76], is that the semigroup word problem is undecidable. This result implies the existence of finitely presented semigroups for which Dehn(w, w') grows faster than any recursive function. For such semigroups, one can never be sure whether or not two words |w⟩, |w'⟩ are in the same K_g, showing that establishing the number of dynamical sectors is impossible in general.
Remarkably, rather simple examples of semigroups with undecidable word problems are known, with the required |H_loc| being as low as five. An explicit example from Ref. [77] is a semigroup with an undecidable word problem on the five-element generating set S = {v, w, x, y, z}, presented with only seven relations. The word problem is also undecidable when one specializes further to the setting of a finitely presented group [78,79]. Going further still, the Adyan-Rabin theorem [80,81] states that nearly all "reasonable" properties of finitely presented groups (their Dehn times; whether or not they are finite or Abelian; even whether or not they are a presentation of the trivial group) are undecidable. As a corollary, the existence of Hilbert space fragmentation itself is therefore undecidable: even if ergodicity looks to be broken for all system sizes below L, for some group presentations one can in general never be sure that it will not be restored at system size L + 1. This result complements recent work on the undecidability of physical problems relating to local Hamiltonians, e.g. computing spectral gaps, ground state phase diagrams, and so on [82][83][84][85].

Appendix D: Krylov sectors cannot be labeled by global symmetries

In this appendix, we will prove that the Krylov sectors of Dyn_G cannot be associated with the quantum numbers of any global symmetry if G is non-Abelian (as is the case for all of the examples of interest). In fact, we will prove a more general result. To state it, we define a locality-preserving unitary as any unitary operator U such that for all local operators O, the conjugated operator U† O U is also local. The generators of any global symmetry are locality-preserving unitaries, but we will not need to assume that U commutes with Dyn_G. We will prove the following result:

Proposition 1. When G is non-Abelian, the Krylov sectors of Dyn_G cannot be fully distinguished by the eigenvalues of any set of locality-preserving unitaries.
Proof. We will argue by contradiction. Assume that there exists a set of unitaries U_a whose expectation values in a given computational basis product state |w⟩ allow one to determine the group element φ(|w⟩) associated with w (and therefore the K_{g,L} that |w⟩ belongs to). Consider states |w_gh⟩ and |w_hg⟩ built from blocks of g, h, and identity characters of lengths l, m, n, with l, m, n all proportional to the system size L. Taking g, h to be any two generators such that gh ≠ hg as group elements, the above then shows that states in different K_{g,L} cannot be distinguished by eigenvalues of the {U_a}, provided that i) the above assumption about translation invariance can be removed, and ii) |w_gh⟩ is not orthogonal to |w_gh;a⟩ for all choices of l, m, and all choices of g, h such that gh ≠ hg. Point i) is dealt with by noting that when the U_a are not translation invariant (as is the case for modulated symmetries), we may exploit the fact that the sectors to which |w_gh⟩ and |w_hg⟩ belong are independent of l, m, and hence l, m can be varied without changing the eigenvalues of the respective states under the {U_a}. For ii), we need only note that this overlap reduces to a matrix element of U_a† in the state |e⟩, which is nonzero since |e⟩ is an eigenstate of U_a† by assumption (on account of its definite symmetry charge), and since unitarity guarantees that U_a† has no zero eigenvalues.
Appendix E: The geometry and complexity of Baumslag-Solitar groups

In this section we state and prove some facts about the group geometry of the Baumslag-Solitar group BS(1, 2) and some of its simple generalizations. The original work in which these groups were defined is Ref. [68]. We will always work with the presentation
BS(1, 2) = ⟨a, b | ab = ba^2⟩. (E1)
Everything we say below can be readily modified to work for the generalized groups BS(1, q) = ⟨a, b | ab = ba^q⟩, but for concreteness we will specialize to q = 2 throughout, and as in the main text will write BS for BS(1, 2). A useful fact about our chosen presentation for BS is that it admits the matrix representation [41]
a ↦ (1 1; 0 1), b ↦ (1/2 0; 0 1), (E2)
which allows the word problem to be solved in linear time and is helpful in numerical studies of BS's group geometry. Our notation will carry over from the previous section on general aspects of group geometry. We will also introduce the notation n_b(w) to denote the net number of bs that appear in a word w, namely the number of b characters in w minus the number of b^{-1} characters.

FIG. 12. The number N_K(L) of Krylov sectors, equal to the number of distinct BS group elements whose geodesic length is less than or equal to L.
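To make the representation concrete, the sketch below implements it with exact rationals and uses it to decide the word problem in linear time. The affine convention (a acting as x ↦ x + 1 and b as x ↦ x/2, with inverses written as capital letters) is our assumption; some references use a transposed or rescaled convention.

```python
from fractions import Fraction

def mat_mul(M, N):
    """2x2 product of matrices stored as ((m00, m01), (m10, m11))."""
    return tuple(
        tuple(sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

I = ((Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)))
GENS = {
    "a": ((Fraction(1), Fraction(1)), (Fraction(0), Fraction(1))),    # x -> x + 1
    "A": ((Fraction(1), Fraction(-1)), (Fraction(0), Fraction(1))),   # a^{-1}
    "b": ((Fraction(1, 2), Fraction(0)), (Fraction(0), Fraction(1))), # x -> x/2
    "B": ((Fraction(2), Fraction(0)), (Fraction(0), Fraction(1))),    # b^{-1}
}

def word_to_matrix(word):
    """Linear-time word problem: w ~ e iff word_to_matrix(w) == I."""
    M = I
    for ch in word:
        M = mat_mul(M, GENS[ch])
    return M

# the defining relation ab = ba^2 holds in the representation
assert word_to_matrix("ab") == word_to_matrix("baa")
# b^{-2} a b^2 ~ a^4, a small instance of b^{-n} a b^n ~ a^{2^n}
assert word_to_matrix("BBabb") == word_to_matrix("aaaa")
```

Since the representation is faithful, a word represents the identity exactly when its matrix equals the identity matrix.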

Worst-case complexity
We will start by analyzing the worst-case time and space complexity of words in BS, which we do by computing the Dehn and expansion length functions.
To orient ourselves, it is helpful to recall that the Cayley graph of BS is homeomorphic to the product of the real line with a 3-regular tree (for BS(1, q) it is a (q+1)-regular tree), with the depth of a given word w along the tree being controlled by the relative number of bs and b^{-1}s that w contains (see Fig. 3 of the main text). We will refer to this tree as the b-tree, and will use terminology whereby multiplying by b (b^{-1}) moves one "up" ("down") on the sheet of the b-tree one is currently at, while multiplying by a (a^{-1}) moves one "right" ("left") within the sheet (along the a-axis). Because of the hierarchical nature of the graph, multiplying by b (namely, moving deeper into the b-tree) moves one to "larger scales", while multiplying by b^{-1} does the opposite.
Because of the tree structure, it is clear that the group volume grows exponentially, N_K(L) ∼ λ^L, for some constant λ > 1.
A naive guess for the value of λ is as follows. First, we note that the exponential growth comes from the tree structure, and that motion by one node on this tree is always possible through the use of at most two group generators (since to perform an arbitrary move on the tree one must multiply by either b, b^{-1}, or ab). Since the number of points at depths d ≤ L of a 3-regular tree goes like 3^L, we therefore estimate N_K(L) ∼ 3^{L/2}, giving λ ≈ √3. This estimate is actually extremely close to the numerically computed scaling, as we show in Fig. 12.
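This growth estimate is easy to probe numerically: breadth-first search over group elements, identifying elements via an affine representation x ↦ Ax + B (a: x ↦ x + 1, b: x ↦ x/2; this convention, and the use of this representation to distinguish elements, are our assumptions). A minimal sketch:

```python
from fractions import Fraction

def apply(g, gen):
    """Right-multiply the affine map x -> A x + B by a generator."""
    A, B = g
    if gen == "a":
        return (A, B + A)
    if gen == "A":
        return (A, B - A)
    if gen == "b":
        return (A / 2, B)
    return (A * 2, B)  # gen == "B", i.e. b^{-1}

def ball_sizes(Lmax):
    """|B(L)| for L = 0..Lmax via breadth-first search on the Cayley graph."""
    frontier = {(Fraction(1), Fraction(0))}  # the identity element e
    seen = set(frontier)
    sizes = [1]
    for _ in range(Lmax):
        nxt = set()
        for g in frontier:
            for gen in "aAbB":
                h = apply(g, gen)
                if h not in seen:
                    seen.add(h)
                    nxt.add(h)
        frontier = nxt
        sizes.append(len(seen))
    return sizes

sizes = ball_sizes(10)
# consecutive ratios |B(L+1)| / |B(L)| settle near the growth rate lambda
print([round(sizes[L + 1] / sizes[L], 2) for L in range(6, 10)])
```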

Dehn function
We now turn to computing the Dehn function and expansion length of BS. The Dehn function must be large, since it is easy to construct words whose area is exponentially large in |w| [56]:

Proposition 13. Define w_n ≜ b^{-n}ab^n, so that w_n ∼ a^{2^n}. Then the word
w_big ≜ w_n a w_n^{-1} a^{-1} (E5)
has area
Dehn(w_big) ∼ 2^n ∼ 2^{|w_big|/4}.

A visual illustration of this statement is given in Fig. 3. The proof, which is a condensed version of the original proof in Ref. [56], is as follows:

Proof. To determine Dehn(w_big), we need to find the minimal number of relations needed to turn w_big into the identity word. Geometrically, this corresponds to the number of 2-cells in the Cayley complex CG_BS that form a minimal spanning surface S_{w_big} with w_big as its boundary. Note that the loop defined by w_big is entirely contained within two sheets of CG_BS. It is furthermore clear that if we only consider bounding surfaces S_{w_big} contained within these two sheets, the minimal bounding surface has area
A(S_{w_big}) ∼ 2^n. (E7)
Therefore we need only show that this area cannot be reduced by considering surfaces that extend into other sheets. The reader can verify that this is true, since a 2-cell of S_{w_big} that lives on any other sheet will necessarily make an unwanted contribution to ∂S_{w_big}. More formally, we can recognize that, being homeomorphic to the product of R with a 3-regular tree, the Cayley 2-complex is contractible, and thus the S_{w_big} found above is the unique minimal bounding surface.
Note that by considering a homotopy which shrinks w_big down to the identity by first making the loop narrower along the a-axis before shrinking it along the b-axis, the length of the word does not parametrically increase. This is an intuitive explanation for the following fact:

Fact 3 ([71]). BS has linear expansion length, EL(L) ∼ L.
The previous proposition immediately implies the existence of exponentially many (in L) elements of K_{e,L} with exponentially large (also in L) area. The Dehn function thus grows at least as Dehn(L) ≳ 2^L. We now show that this bound is in fact tight. The existing proof of this fact in the mathematical literature appears to be to note that BS is an "asynchronously automatic" group [86], and that any such group has Dehn(L) ≲ 2^L [87]. In the following, we give a more elementary proof:

Proposition 14. BS has exponential Dehn function:
Dehn(L) ∼ 2^L. (E8)

Proof. The construction above tells us that Dehn(L) ≳ 2^L. A matching upper bound can be proven using an algorithm which converts an input word to a particular standard form. This is done by simply moving all occurrences of b in w to the left and all occurrences of b^{-1} to the right, duplicating as and a^{-1}s along the way as needed and (optionally, for our present purposes) eliminating bb^{-1} pairs as they are encountered.
It is clear that at most an exponential-in-L number of additional as are generated during this process of shuffling the bs and b^{-1}s around (as usual, this means exponential up to a polynomial factor, here linear in L), so that the resulting word is
w' = b^k w_a b^{-l}, (E9)
where w_a is a word containing only e, a, a^{-1} and with length |w_a| ≲ 2^L. Since w ∼ e, we know that k = l and that w_a ∼ e. Thus an additional ∼ 2^L relations suffice to reduce w_a to e and hence w' to e; thus, 2^L is also an upper bound on Dehn(L).
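A toy version of this shuffling algorithm makes the exponential blowup explicit. The rewriting rules below are read off from the relation ab = ba^2 (writing a^{-1}, b^{-1} as 'A', 'B'), and the number of rule applications serves as a proxy for the area; this is our own minimal sketch of the procedure, not the exact algorithm of the proof.

```python
def standardize(word):
    """Shuffle b's left and b^{-1}'s ('B') right, duplicating a's as needed:
    ab -> baa, Ab -> bAA, Ba -> aaB, BA -> AAB, plus free cancellations.
    Returns the reduced word and the number of rewrites used (a proxy for area)."""
    rules = [("ab", "baa"), ("Ab", "bAA"), ("Ba", "aaB"), ("BA", "AAB"),
             ("aA", ""), ("Aa", ""), ("bB", ""), ("Bb", "")]
    steps = 0
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in word:
                word = word.replace(lhs, rhs, 1)
                steps += 1
                changed = True
                break
    return word, steps

# b^{-n} a b^n reduces to a^{2^n}: the word expands exponentially,
# and the number of relations applied is likewise exponential in n
for n in (2, 4, 6):
    w, steps = standardize("B" * n + "a" + "b" * n)
    assert w == "a" * 2 ** n
    assert steps >= 2 ** n - 1
```

On the words b^{-n} a b^n, each duplication rule adds a single a, so 2^n - 1 duplications are needed, mirroring the exponential Dehn scaling.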
We saw in Fact 3 that BS has linear expansion length, meaning that there exist constants C, D such that EL(w) ≤ CL + D for all w ∈ K_{e,L}. The argument in Ref. [71] however does not tell us whether C = 1 or C > 1, which is needed for determining whether or not BS exhibits fragile fragmentation. The following proposition shows that indeed C > 1:

Proposition 15. There exist constants α, D with α > 0 such that
EL(L) ≥ (1 + α)L + D. (E10)

Proof. It is enough to show that EL(w) ≥ (1 + α)L + D for a particular length-L word w, which we choose to be w_big in (E5) with n such that L = 4(n + 1). By the contractibility of the Cayley 2-complex CG_BS, for all 0 ≤ m ≤ 2^n, any homotopy from w_big to the identity must pass through the vertex a^m of CG_BS. Therefore
EL(w_big) ≥ max_{0≤m≤2^n} 2|a^m|,
where |a^m| is the geodesic distance (not word length) of a^m.
We now argue that there exist an m with 2^{n-1} < m < 2^n, an α > 0, and a constant c_1 such that |a^m| > (2 + 2α)n + c_1. By the contractibility of CG_BS, any geodesic of a^m will be contained within a single sheet. Furthermore, since m is exponentially large in n in the worst case, the geodesic should reach a height of at least n - c_2 on the sheet, where c_2 is a sufficiently large O(1) constant. If the geodesic does not reach such a height, then it must make a much larger number of steps in the a direction, resulting in a larger perimeter. Letting n_{|b|} (n_{|a|}) be the number of b, b^{-1} (a, a^{-1}) characters that appear in the geodesic, this shows that n_{|b|} ≥ 2(n - c_2), and so α ≥ 0.
To compute the perimeter, we have to determine the number of steps the smallest-length word makes in the a direction in order to reach a^m. Since n_{|b|}/2 steps down along the b direction are needed after the highest point is reached, we can intersperse any of these n_{|b|}/2 ≥ n - c_2 steps with at least one step along the a direction. Note that if two consecutive steps along the a direction are taken, these can be pulled past the previous b step to form a single a step, thereby reducing the total length of the loop. Therefore, the minimal-length loop has b steps interspersed with at most a single a step. Since there are exponentially many (in n) choices of m, and since each a^m has a unique geodesic (i.e. minimal sequence of a and b steps), a simple application of the pigeonhole principle guarantees that there exists an m such that at least 0.98n steps are along a. This can be verified because the total number of placements of fewer than 0.98n steps along a in a sequence of n_{|b|} ≥ 2(n - c_2) steps along b is ≪ 2^n. Therefore, there is a c_4 such that max_{0≤m≤2^n} 2|a^m| ≥ 2(2 + 0.98)n + c_4 = 5.96n + c_4, meaning that (E10) is satisfied with α = 0.49.
While this demonstrates that words in K_{e,L} must expand by an amount proportional to L before being mapped to the identity word, this statement is only meaningful in the large-L limit, and in practice EL(L)/L can be very close to 1 for modest choices of L (as was seen in the numerics of Sec. IV E).

Average-case complexity
We now turn to examining the average-case complexity of words in BS (or more precisely, the complexity of typical words in BS). Unfortunately, only a few results on average-case Dehn functions are known (one example for nilpotent groups is Ref. [45]), and average-case complexity has not been studied for groups with superpolynomial Dehn functions. We emphasize that our analysis in this section is not rigorous, and relies on several reasonable claims backed up by numerical evidence. Letting Dehn(L) denote the Dehn function of a typical word in K_{e,L}, we claim:

Claim 1. With probability at least 1 - ϵ over the choice of a random word in K_{e,L},
Dehn(w) ≳ 2^{√L (1-f(ϵ))}, (E12)
for f(ϵ) → 0 as ϵ → 0.
Rigorously proving this claim-which we believe is within reach-constitutes an interesting direction for future research.
Distribution of geodesic lengths for random words

The typical Dehn function measures the typical area of words in K_{e,L}. The difficulty in computing it lies in the fact that it is hard to sample words from K_{e,L}, due to the constraint that such words form closed loops in the Cayley 2-complex CG_BS. A much easier task is to determine the geometry of typical length-L paths in CG_BS, regardless of whether these paths form closed loops. Understanding this easier problem will allow us to build intuition for computing the scaling of the typical Dehn function.
The precise question we address is this: given a random length-L word w, what is the geodesic distance |g(w)| of w? We can efficiently obtain an answer numerically by randomly sampling words w and approximately computing their geodesic distances. In order for this procedure to be efficient, it is helpful to realize that any word may always be brought into the following canonical form:

Proposition 16 ([88]). Any word in BS can always be mapped to a unique standard form
w_knl = b^k a^n b^{-l}, k, l ≥ 0, (E13)
where n can be even only if at least one of k, l is zero.

Proof. This follows from the arguments around (E9) or use of the matrix representation (E2).
Since k, n, l are unique, the w_knl serve as a set of canonical representatives for each g. To get the geodesic of an arbitrary word, we first reduce it to this form, and then make use of the fact that:

Proposition 17 ([88]). The geodesic distance of a word w is determined by the exponents k, n, l appearing in its canonical form w_knl.

All that remains is to find a way of determining k, n, l given an arbitrary word w. This is done using the matrix representation (E2), in which w_knl becomes
(2^{l-k}  n2^{-k}; 0  1).
Therefore, to find k, n, l for an input string w, we proceed as follows. We first find the matrix corresponding to w by explicit matrix multiplication, yielding a result of the form (A B; 0 1). We know right away that:
• If B ∉ Z, then k > 0, and k is determined by the number of significant figures after the decimal point when B is represented in binary, which then allows one to determine l and n.
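A sketch of this extraction with exact dyadic arithmetic (again assuming the affine convention a: x ↦ x + 1, b: x ↦ x/2, and restricting to inputs where the subtlety mentioned below does not arise):

```python
from fractions import Fraction

def affine(word):
    """Evaluate a word in the affine representation, returning (A, B) for
    the map x -> A x + B; inverses are written as capital letters."""
    A, B = Fraction(1), Fraction(0)
    for ch in word:
        if ch == "a":
            B += A
        elif ch == "A":
            B -= A
        elif ch == "b":
            A /= 2
        elif ch == "B":
            A *= 2
    return A, B

def canonical_exponents(word):
    """Recover (k, n, l) with w ~ b^k a^n b^{-l} from the matrix entries
    A = 2^{l-k}, B = n 2^{-k} (generic case: n odd whenever k > 0)."""
    Av, Bv = affine(word)
    k = Bv.denominator.bit_length() - 1  # binary digits of B after the point
    n = int(Bv * 2 ** k)
    # A is a power of two; read off its exponent to get l = k + log2(A)
    if Av.numerator == 1:
        log2A = -(Av.denominator.bit_length() - 1)
    else:
        log2A = Av.numerator.bit_length() - 1
    return k, n, k + log2A

assert canonical_exponents("bbaaaB") == (2, 3, 1)  # b^2 a^3 b^{-1}
assert canonical_exponents("BBabb") == (0, 4, 0)   # b^{-2} a b^2 ~ a^4
```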
Numerically implementing the above procedure for several values of L gives the histogram shown in Fig. 13. (There is a small subtlety here: this approach is incorrect if l = 0 and n = 2^m n_o with m > 0, n_o ∈ 2Z + 1. However, in this case it is easy to show that the naive values of (k, n, l) one obtains from this procedure are (k, n, l)_naive = (k - m, n_o, -m). Since l_naive < 0, this mistake is easy to identify and correct for.)

FIG. 16. The probability for two returning random walks w, w' on the 3-tree, with transition probabilities as in (E17), to have a given branch point Br(w, w'), confirming that Br(w, w') = O(1) with constant probability.
element are read right to left, so b k a n b −l ∈ S Dyck (w) only if l = k = 0).
Since w was chosen randomly from the set of words obeying the Dyck walk condition, the b-walk defined by w is (using simple concentration inequalities) likely to reach a height of O(√L), namely to have max_x n_b(x) ∼ √L. A random w fulfilling this condition is likely to reduce to a word a^{n_w} of length n_w ∼ 2^{√L}, as can be seen by following the procedure bringing w into the canonical form (E13).
Suppose now that n_w scales as 2^{√L}, and consider a random element w' ∈ S_Dyck(w). We claim that the distance between w and w' is likely to scale as d(w, w') ∼ 2^{√L}. This is because w, w' are likely to travel on different sheets of the Cayley graph for nearly all of the length of their walks. More precisely, consider the projection of w, w' onto the b-tree, and define the branch point Br(w, w') as the largest depth of a vertex in the tree shared by the paths undertaken by both w and w'. We claim that Br(w, w') is exponentially likely to be O(1), so that w, w' indeed travel on separate sheets for nearly the entirety of their trajectories.
To demonstrate this more carefully, consider how returning random walks on CG BS behave when projected onto the b-tree.When the walk on the tree moves to larger scales of CG BS , we will refer to it as moving "upwards", and when it moves to smaller scales we will say that it moves "downwards".At a given vertex in CG BS , moving upwards while staying on the same sheet can be done by moving directly upwards, or by moving to the left or right by an even number of steps, and then moving up.Moving upwards onto a different sheet on the other hand is done by moving left or right by an odd number of steps before moving up.Finally, moving downwards (on the same sheet) can be done by moving an arbitrary amount either left or right, and then moving downwards.
The above reasoning shows that the probability to move downwards is equal to the total probability of moving upwards, despite the fact that moving upwards can be done on either of two sheets. More precisely, let p↖, p↗ and p↓ be the probabilities of moving up on the same sheet, up on a different sheet, and down, respectively. Then
p↖ = p↗ = 1/4, p↓ = 1/2. (E17)
The fact that p↓ = 1/2 means that as far as motion in the tree is concerned, the walk will not move ballistically upwards or downwards, but will instead move diffusively.
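These transition probabilities can be plugged into a quick Monte Carlo sketch of the walk's depth on the b-tree (our own toy model, with the depth simply reflected at the root of the tree), confirming the diffusive √L scaling of the maximal depth:

```python
import random

def max_tree_depth(L, rng):
    """Maximal depth of the projected walk: up with total probability
    1/4 + 1/4 (either sheet), down with probability 1/2; reflected at 0."""
    depth = peak = 0
    for _ in range(L):
        if rng.random() < 0.5:  # p_up-same + p_up-other = 1/4 + 1/4
            depth += 1
        else:                   # p_down = 1/2
            depth = max(depth - 1, 0)
        peak = max(peak, depth)
    return peak

rng = random.Random(1)
ratios = []
for L in (100, 400, 1600):
    avg = sum(max_tree_depth(L, rng) for _ in range(500)) / 500
    ratios.append(avg / L ** 0.5)  # roughly L-independent: diffusive scaling
print([round(r, 2) for r in ratios])
```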
Using the above transition probabilities, the expected behavior of Br(w, w') can be calculated analytically using generating functions. The details are rather messy, however, and we content ourselves with a numerical demonstration that Br(w, w') = O(1) with constant probability; see Fig. 16.

Now we return to our discussion of the distance between w and w'. As just argued, Br(w, w') is likely to be O(1). The contractibility of CG_BS means that the minimal bounding surface linking w to w' must consist of all cells bounded by w and the a-axis that lie at a depth greater than Br(w, w'), together with the analogous set of cells for w' (a similar argument arose in the proof of proposition 13, where in that case Br(w, w') = 0). Since each of these contributions to the bounding surface consists of ∼ 2^{√L} cells, we thus have d(w, w') ∼ 2^{√L}.

We are now in a position to argue for the correctness of Claim 1. Dyck walks do not constitute a constant fraction of all returning walks: the number of length-L Dyck walks scales as ∼ L^{-3/2} 2^L, while the number of returning walks goes as ∼ L^{-1/2} 2^L. Thus only a fraction ∼ 1/L of returning walks are also Dyck walks. Nevertheless, this still gives us enough information to argue that the expected Dehn function is ≳ 2^{√L}. To show that the Dehn function is ∼ 2^{√L} with constant probability, the basic idea is to realize that a generic word in K_e is likely to contain at least two subwords of size ∼ √L obeying the Dyck property, allowing for application of the above argument.
First, when sampling a random word in K_e, we can first sample uniformly from all 5^{L/2} words w_L of length L/2, and then sample from all words w_R of length L/2 such that w_L w_R ∼ e. Since w_L is chosen randomly, we expect that with unit probability in the L → ∞ limit, the walk defined by w_L reaches a distance of order ∼ 2^{√L} along the a-axis of the Cayley graph (this was numerically demonstrated to be the case in Fig. 14). Since |w_L| ∼ L, reaching this distance is only possible if w_L contains an excursion along a particular sheet of the b-tree which creates exponential expansion of the b characters, which in turn can create doubly-exponential expansion of the a characters.

Appendix F: Groups with even larger complexity

In this appendix we study the group BS^(2), obtained from BS by adding a generator c that acts on b in the same way that b acts on a:
BS^(2) ≜ ⟨a, b, c | ab = ba^2, bc = cb^2⟩.

Proposition 18. BS^(2) has Dehn function
Dehn(L) ∼ 2^{2^L}.

This result appears to be well known and was stated in Ref. [41] without an explicit proof; we could not find a proof of this statement and thus provide one below for completeness. The key result we need is known as Britton's lemma [89], which is stated as follows:

Lemma 1. Let G be a group with presentation S. Further let G contain two isomorphic subgroups H, K ⊂ G, with ϕ : H → K the isomorphism between them. Define the group
G_ϕ ≜ ⟨S, t | t^{-1}ht = ϕ(h) ∀ h ∈ H⟩,
and without loss of generality express any word w on {S, t}* in the form
w = g_0 t^{ε_1} g_1 t^{ε_2} ⋯ t^{ε_n} g_n, ε_i = ±1.
Britton's lemma states that if w ∼ e represents the identity in G_ϕ, then there must be some i such that either
1. n = 0 and g_0 = e,
2. ε_i = -1, ε_{i+1} = +1, and g_i ∈ H, or
3. ε_i = +1, ε_{i+1} = -1, and g_i ∈ K.
This gives us a way of simplifying any word w ∼ e representing the identity in G_ϕ. Case 1 above is trivial. In case 2, we can replace the occurrence of t^{-1}ht with ϕ(h), while in case 3 we may replace tkt^{-1} with ϕ^{-1}(k). After doing this reduction, we will be left with a new word w' which still represents e, and we can apply the lemma again. Iterating, we are guaranteed to eliminate all ts from any w ∼ e in {S, t}*, obtaining a new word w_G ∈ {S}*, w_G ∼ e. For instance, the group BS(1, 2) is simply G_ϕ for the choices G = Z, H = G, and K = 2Z.
We now return to the proof of the Dehn function scaling:

Proof. We first construct a lower bound. Define the word w_n ≜ c^{-n}bc^n, so that w_n ∼ b^{2^n}. Then feed this word into the construction of the large-area word w_big for BS, by defining w'_n ≜ w_n a w_n^{-1}. (We thank Tim Riley for suggesting this proof strategy.) Then we claim the word
w_huge ≜ w'_n a w'^{-1}_n a^{-1}
has area
Dehn(w_huge) ∼ 2^{2^n},
which is doubly exponential in |w_huge|. This follows by an argument similar to the one we gave for the area of w_big in BS. The tree-like structure of the sheets of the BS Cayley graph gives tree-like structures both for words built from b, c and for those built from a, b. Letting w̃_huge ≜ b^{-2^n} a b^{2^n}, this means that
Dehn(w_huge) = Dehn(w̃_huge a w̃^{-1}_huge a^{-1}) + O(2^n). (F7)
But using our results from our study of BS, we know that Dehn(w̃_huge a w̃^{-1}_huge a^{-1}) ∼ 2^{2^n}. Thus Dehn(L) ≥ 2^{2^L} asymptotically.
We now need to provide a matching upper bound. We do this by combining Britton's lemma with our earlier result about BS. Note that BS^(2) can be obtained from BS using just the type of extension that appears in Britton's lemma, with G = BS, H = ⟨b⟩, K = ⟨b^2⟩. Then we know that if we are given w ∼ e with |w| = L in BS^(2), we can obtain a word w' ∼ e in BS after at most O(L) applications of c^{-1}bc = b^2. The maximum amount that |w| can grow under these substitutions is O(2^L). Thus an upper bound on Dehn(L) in BS^(2) can be obtained from an upper bound on Dehn(2^L) in BS. Using our previous result on the latter, we conclude
Dehn(L) ≲ 2^{2^L},
and thus, when combined with the lower bound, we also have that Dehn(L) ∼ 2^{2^L}.
BS^(2) also provides an example with a superlinear expansion length:

Corollary 2. BS^(2) has exponential expansion length,
EL(L) ∼ 2^L. (F9)

Proof. From the general bound (B13) and our above result on the Dehn function of BS^(2), we know that EL(L) ≳ 2^L. The upper bound follows from the above application of Britton's lemma and the fact that the expansion length of BS is only O(L).
It is easy to generalize the above example to construct groups with even faster-growing space and time complexity:

Corollary 3. Define the group BS^(l) through the presentation [41]
BS^(l) ≜ ⟨a_0, a_1, …, a_l | a_{i-1}a_i = a_i a_{i-1}^2, i = 1, …, l⟩.
Then, iterating the arguments above, BS^(l) has a Dehn function scaling as a tower of exponentials of height l, and an expansion length scaling as a tower of exponentials of height l - 1.

Appendix G: Complexity of the non-group models

In this appendix we provide some detailed analysis of both the time and space complexity of the non-group examples presented in the main text.

Star-Motzkin model
Recall the Star-Motzkin model from Sec. VI A. There are two sources of fragmentation; the first originates from the parentheses, and the second from the interaction of the parentheses with the * character.
In the following analysis, we will rely on the fact that we can write down a non-local quantity conserved under the dynamics that reflects the interaction between the Motzkin degrees of freedom and the * degrees of freedom. This non-local conserved quantity is
Q = Σ_i 2^{Σ_{j<i} (n_{(,j} - n_{),j})} n_{*,i}, (G1)
where n_{c,i} = |c⟩⟨c|_i. The interpretation of this operator can be understood pictorially. A configuration of parentheses (ignoring the * character) can be mapped onto a height configuration, with the height h_i written as h_i = Σ_{j<i} (n_{(,j} - n_{),j}). Bringing a * character at height h_i down to height h_j < h_i by a sequence of local updates creates 2^{h_i - h_j} such * characters at height h_j. The charge operator simply counts the total number of * characters if all * characters are brought down to zero height.
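The pictorial interpretation above can be checked in a few lines. The local update used in the assertions below, "*)" ↔ ")**" (a * dropping one height level doubles), is our guess at one representative allowed move; the point is only that Q as defined in (G1) is unchanged by it.

```python
def charge(config):
    """Q = sum_i 2^{h_i} n_{*,i}, with h_i the parenthesis height at site i."""
    h = Q = 0
    for ch in config:
        if ch == "(":
            h += 1
        elif ch == ")":
            h -= 1
        elif ch == "*":
            Q += 2 ** h
    return Q

# one * at height 2 carries the same charge as two *'s at height 1...
assert charge("((*))") == charge("(()**)") == 4
# ...or as four *'s at height 0, after all parentheses have closed
assert charge("(())****") == 4
```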
In the situation where the height becomes negative, we can rescale 2^{Σ_{j<i} (n_{(,j} - n_{),j})} to 2^{Σ_{j<i} (n_{(,j} - n_{),j}) - h}, where h is the value of the lowest height. The interpretation of the exponential charge is then equivalent to the previously introduced definition if h is redefined to be at zero height.
We now proceed to understanding how to label Krylov sectors of the dynamics. For simplicity, we start by analyzing the Krylov sector K_Q corresponding to the balanced sector for the parentheses, with total value Q of the non-local conserved quantity. We expect this model to exhibit fragile fragmentation with a linear expansion length (much like Dyn_BS), and as such we will discuss connectivity of configurations assuming they are appended to a reservoir of 0 characters of length αL.
We claim that the dynamics is ergodic within the Krylov sector K_Q so long as α = O(1) is sufficiently large. The proof follows from finding a path between any two states in K_Q, given a reservoir of size O(L). To do this, we construct a reference state R, which we show can be reached from all states given the provided space. Without any * characters (Q = 0), the dynamics is entirely ergodic within K_0. Therefore, the goal is to show that when Q ≠ 0, all of the *'s can be collected into a reference configuration R on one side of the system. To this end, given a configuration C, we first isolate O(log Q) 0 characters on one side of the system and use these to create a nest ((• • • () • • • )) of parentheses. Next, we construct an algorithm for localizing the entirety of the non-local conserved quantity into the nest, thereby reducing C to the reference configuration R. Consider the * character nearest to the nest; labelling the nest by 'q' (the current amount of the conserved quantity inside of it), this character sits behind some number p of open parentheses separating it from the nest. We then move the * character toward the nest by a sequence of local updates, absorb one unit of the conserved quantity into the nest, perform the collapse process, and iterate these two steps. By construction, this algorithm will eventually localize the conserved quantity in the nest, forming the reference configuration R. So long as the reservoir is large enough, it is possible to transition from any configuration C ∈ K_Q to R, therefore proving ergodicity.
This result indicates that the dynamics is ergodic within the $K$ sector (up to a mild form of fragile fragmentation that exists because $\mathrm{EL}(L) > L$). It also indicates that the dynamics may be slow, since transporting the non-local conserved quantity out of a region appears to take a very large number of steps. This suggests that the relaxation times of $\mathrm{Dyn}_{*M}$ mimic those of $\mathrm{Dyn}_{BS}$. To formalize this, we define the notion of an $h$-restriction of $C$. First, we perform a preprocessing step in which we eliminate as many $()$ pairs as possible in $C$; the resulting configuration $\tilde C$ is what we call a clean version of $C$. An $h$-restriction of $\tilde C$, which we denote by $S_h(\tilde C)$, is formed by drawing a reference line at height $h$ (note that the height profile of the parentheses is shifted so that its minimum is at 0): $S_h(\tilde C)$ denotes the set of contiguous configurations above height $h$. An example is shown in Figure 18. We also label the total non-local conserved quantity in a contiguous configuration $c$ by $q_c = \sum_{i \in c} n_{i,*} 2^{h_i}$. Though this quantity should not be associated with a 'charge' corresponding to a symmetry, we will henceforth, in an abuse of notation, call it a charge.

FIG. 18. An illustration of the process used to define $h$-restrictions. We first clean the configuration, as shown in the upper panel. Then we draw a line at height $h$ and consider all contiguous regions above this line, which are shaded in red in the bottom panel. We label the non-local charge in each of these regions by $q_i$.
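These definitions are easy to make concrete. The following sketch (our own illustration, not from the paper, using a toy string encoding with '(' and ')' for parentheses, '*' for star characters, and '0' for vacancies) computes the height profile, cleans a configuration by cancelling adjacent () pairs, and extracts the sorted charges $q_c$ of each contiguous region above height $h$:

```python
def clean(config):
    """Cancel adjacent '(' ')' pairs via a stack (a simplified stand-in
    for the preprocessing step; the model's moves can also cancel pairs
    separated by other characters)."""
    out = []
    for ch in config:
        if ch == ')' and out and out[-1] == '(':
            out.pop()
        else:
            out.append(ch)
    return ''.join(out)

def heights(config):
    """Height h_i seen by character i: '(' and its matching ')' sit at
    the same height; enclosed characters sit one level higher."""
    h, hs = 0, []
    for ch in config:
        if ch == '(':
            hs.append(h); h += 1
        elif ch == ')':
            h -= 1; hs.append(h)
        else:
            hs.append(h)
    return hs

def h_restriction_charges(config, h):
    """Charges q_c = sum over '*' in region of 2^{h_i}, one entry per
    contiguous region strictly above height h, sorted in decreasing order."""
    hs = heights(config)
    charges, cur, inside = [], 0, False
    for ch, hi in zip(config, hs):
        if hi > h:
            if not inside:
                inside, cur = True, 0
            if ch == '*':
                cur += 2 ** hi
        elif inside:
            charges.append(cur); inside = False
    if inside:
        charges.append(cur)
    return sorted(charges, reverse=True)
```

For example, `h_restriction_charges("((*)(*))", 1)` identifies two contiguous regions above height 1, each carrying charge $2^2 = 4$.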
Suppose two contiguous configurations $c$ and $c'$ have charges $q_c$ and $q_{c'}$. If we want to transport an amount of charge $\Delta q$ between these two configurations, how much time will this take? Note that because $c$ and $c'$ are supported above height $h$, charge must be pumped into or out of them in increments of $2^h$. Therefore, the time required is at least $O(\Delta q / 2^h)$. Since the charge supported in $c$ or $c'$ can be exponentially large, transporting some fraction of the charge in $c$ to $c'$ can take an exponentially long time, so long as $h$ is not too large.
With this in mind, we can discuss how long it takes to transition between two configurations. Denote the $h$-restriction of $\tilde C$ by $S_h(\tilde C)$ and that of $\tilde C'$ by $S_h(\tilde C')$. Label the charges of the $h$-restriction of $C$ (in decreasing order) by $q_1 \geq q_2 \geq \cdots \geq q_{n_C}$, and those of $C'$ by $q'_1 \geq q'_2 \geq \cdots \geq q'_{n_{C'}}$. Here, $n_C$ denotes the number of contiguous regions in the $h$-restriction of $C$. If $n_C \neq n_{C'}$, then some number of contiguous regions need to be created.[41] In this case, assuming without loss of generality that $n_C \geq n_{C'}$, we construct the two vectors
$\vec q = (q_1, \ldots, q_{n_C})$, $\quad \vec q\,' = (q'_1, \ldots, q'_{n_{C'}}, 0, \ldots, 0)$, (G8)
where the number of 0's in $\vec q\,'$ is $n_C - n_{C'}$, indicating the number of yet-to-be-created contiguous configurations. The amount of charge that needs to be transferred into and out of these contiguous configurations is at least $\Delta q = \|\vec q - \vec q\,'\|_1$. The time required to do this is therefore
$t_{\rm hit} \geq 2^{-h} \|\vec q - \vec q\,'\|_1$. (G9)
As a result of this bound, there is a simple method for checking whether the time to go between two configurations is very long. Given two configurations $C$ and $C'$, we first construct clean versions and successively raise the value of $h$ until their $h$-restricted charge vectors differ significantly in 1-norm. At this point, so long as $h$ is not too large, we know that it will take a long time to traverse between these configurations.

[41] These new regions can either be created from scratch or by borrowing part of an existing contiguous region. However, one can show that in both cases, the time required to pump charge $q$ into a new region is at least $q/2^h$. Thus, in what follows we can assume without loss of generality that new contiguous configurations are constructed from scratch.
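The bound (G9) is straightforward to evaluate numerically. A minimal sketch (our own; charge vectors are plain integer lists, with the shorter vector zero-padded as in (G8)):

```python
def hitting_time_bound(q, q_prime, h):
    """Lower bound t_hit >= 2^{-h} * ||q - q'||_1 from Eq. (G9).
    Charge vectors are sorted in decreasing order, and the shorter one
    is padded with zeros (yet-to-be-created contiguous regions, Eq. (G8))."""
    n = max(len(q), len(q_prime))
    qs = sorted(q, reverse=True) + [0] * (n - len(q))
    qps = sorted(q_prime, reverse=True) + [0] * (n - len(q_prime))
    delta_q = sum(abs(a - b) for a, b in zip(qs, qps))
    return delta_q / 2 ** h
```

For instance, `hitting_time_bound([8, 4], [4], 2)` gives $(|8-4| + |4-0|)/2^2 = 2$ circuit steps as a lower bound.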
Note that the hitting times strongly depend on the total value of $Q$. In particular, we have the obvious bound $Q \leq L \cdot 2^{\max_i h_i}$. For a randomly chosen height profile, $\max_i h_i = O(\sqrt{L})$, and thus we expect $Q \sim \exp\sqrt{L}$. As a result, we may expect that for generic sectors the hitting time scales like $\sim \exp\sqrt{L}$, the same scaling as that of the typical Dehn function of the Baumslag-Solitar group (see Eq. (76) and App. E).
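The $\max_i h_i = O(\sqrt{L})$ scaling for a random height profile can be checked with a quick Monte Carlo estimate (our own sketch, under the assumption that a generic profile is well modeled by a shuffled bridge of $L/2$ up-steps and $L/2$ down-steps, shifted so its minimum is 0):

```python
import random

def typical_max_height(L, trials=500, seed=0):
    """Average max_i h_i over random height profiles of length L,
    modeled as shuffled +/-1 bridges shifted so that min h_i = 0.
    The result should scale as O(sqrt(L))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        steps = [1] * (L // 2) + [-1] * (L // 2)
        rng.shuffle(steps)
        h, profile = 0, [0]
        for s in steps:
            h += s
            profile.append(h)
        # shifting so the minimum is 0 makes max_i h_i the profile's range
        total += max(profile) - min(profile)
    return total / trials
```

Quadrupling $L$ should roughly double the result, consistent with $Q \leq L \cdot 2^{\max_i h_i} \sim \exp\sqrt{L}$ for generic sectors.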

Chiral star-Motzkin
We now provide a more in-depth analysis of the chiral star-Motzkin model discussed in Sec. VI B. Note that, as in the star-Motzkin dynamics, the chiral star-Motzkin dynamics features a non-local conserved quantity. To understand why large spatial resources are needed, we first consider the following warm-up example. Suppose we have a configuration $C$ that we want to convert to a configuration $C'$; in essence, we want to move a single unit of charge from one of the nests to the other. We can move the $\triangleright$ out of the first nest, but unlike in the star-Motzkin model, we cannot move it into the other cluster. The only way to proceed is to collapse the entire cluster, producing $2^h$ $\triangleright$ characters, where $h$ is the height of the nest that was collapsed. Upon doing this, we may reinsert an empty nest of parentheses, which, when $2^h$ of the $\triangleright$s are used to populate the center of the right nest with a $\triangleright$, gives $C'$.

Let us understand more rigorously when a large amount of spatial resources is needed. We work in an intrinsic Krylov sector with fully matched parentheses ($m = n = 0$) and with a fixed value of the non-local conserved quantity $Q_R = Q$. Consider an $h$-restriction of configurations $C$ and $C'$. Label the charges of the contiguous regions (in decreasing order) for $C$ and $C'$ by $\vec q$ and $\vec q\,'$; this notation was introduced in the discussion of the star-Motzkin model. We need to transfer some amount of charge to convert between the charge configurations $\vec q$ and $\vec q\,'$; define $\Delta q_{\max} = \max_i |q_i - q'_i|$. This is (a lower bound on) the maximum amount of charge that has to be transferred out of a single contiguous region. Some of this charge can be deposited below height $h$. However, the maximum amount of charge below height $h$ is $L 2^h$. Therefore, if $\Delta q_{\max} - L 2^h$ is large, then the remaining charge must be transferred to different contiguous regions. As there are at most $L$ contiguous regions, at some point during the charge-transfer process an amount of charge of at least $\Delta q_{\max}/L - 2^h$ has to be inserted into some contiguous region. This requires an amount of space $O(\Delta q_{\max} 2^{-h}/L)$, since at minimum the entire region needs to be collapsed down to height $h$ before inserting the charge. Therefore, if $\Delta q_{\max} = \epsilon Q$, this space can be very large. This is in fact an extremely loose bound, but it suffices for our purpose of demonstrating large space complexity. Therefore, as in the quasi-fragmented example, given two configurations $C$ and $C'$, one first constructs clean versions of these configurations and then selects a height $h$ such that the $h$-restriction of $C$ and $C'$ has large $\Delta q_{\max}$. If this is the case, the space complexity scales linearly in $\Delta q_{\max} 2^{-h}$, up to polynomial factors in $L$.
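This loose bound also admits a simple numerical companion (our own sketch; charge vectors are plain integer lists, zero-padded to equal length, and $\Delta q_{\max}$ is taken as $\max_i |q_i - q'_i|$ as in the text):

```python
def space_bound_chiral(q, q_prime, h, L):
    """Loose space-complexity lower bound for the chiral star-Motzkin
    model: Delta q_max = max_i |q_i - q'_i| over matched (decreasing,
    zero-padded) charge vectors. When Delta q_max - L*2^h is large, the
    required space scales as ~ Delta q_max * 2^{-h} / L."""
    n = max(len(q), len(q_prime))
    qs = sorted(q, reverse=True) + [0] * (n - len(q))
    qps = sorted(q_prime, reverse=True) + [0] * (n - len(q_prime))
    dq_max = max(abs(a - b) for a, b in zip(qs, qps))
    return dq_max / (L * 2 ** h)
```

For $\Delta q_{\max} = \epsilon Q$ with $Q \sim \exp\sqrt{L}$, the returned bound is exponentially large in $\sqrt{L}$ for any $h$ that is not itself $O(\sqrt{L})$.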

Non-groups and thermalization
Since group-based examples with large time and space complexities can also be constructed, it is natural to ask whether non-group-based constraints provide qualitatively new features to the dynamics.
We answer this question by examining the structure of reduced density matrices of subsystems under the dynamics. Recall that under group dynamics $\mathrm{Dyn}_G$, reduced density matrices of subsystems, defined as $\rho_A = \mathrm{Tr}_{A^c}\,\mathrm{Dyn}_G^\dagger \rho_0 \mathrm{Dyn}_G$, have nonzero values along their diagonals. To explain why, consider a decomposition of the system $S = A A^c$. Start with an initial product state $|u\rangle_A \otimes |v\rangle_{A^c}$, and suppose $v$ has $m$ zeroes (and can be converted under the dynamics to some canonical form $0^m v'$). Under the dynamics, we can perform a sequence of transitions that converts $|u\rangle_A \otimes |0^m v'\rangle_{A^c}$ to $|0^{|A|}\rangle_A \otimes |0^{m-|A|} u v'\rangle_{A^c}$, and subsequently to a state with an arbitrary word in $A$. If the number of zeroes in the initial state is large enough, then any word $w$ can be produced in $A$, implying that $\rho_A$ has all diagonal elements nonzero. Note that this crucially relies on the existence of inverses, which is why it is special to a group structure.
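The role of inverses can be made concrete with a toy free-reduction sketch (our own illustration, with a hypothetical encoding: a word is a list of (letter, ±1) pairs). Materializing a word $w$ in $A$ at the cost of parking $w^{-1}$ in $A^c$ leaves the overall group element unchanged, since $w \cdot w^{-1}$ reduces to the identity:

```python
def inv(word):
    """Group inverse: reverse the word and flip each exponent."""
    return [(g, -s) for g, s in reversed(word)]

def reduce_free(word):
    """Freely reduce a word by cancelling adjacent g g^{-1} pairs
    (a stand-in for applying the local moves g g^{-1} <-> 0 0)."""
    out = []
    for g, s in word:
        if out and out[-1] == (g, -s):
            out.pop()
        else:
            out.append((g, s))
    return out

w = [("a", 1), ("b", -1), ("a", 1)]
assert reduce_free(w + inv(w)) == []  # w * w^{-1} freely reduces to e
```

In a non-group alphabet there is no `inv()`, so this trick of writing an arbitrary word in $A$ while exporting its inverse is unavailable.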
However, this property no longer holds for dynamics not based on groups. Instead, we argue that for certain non-group examples, it is not possible to attain all words $u$ in subsystem $A$. To see this, consider the chiral star-Motzkin model from the previous subsection. Take an initial word $u$ in the subsystem, with the entire initial state $|u\rangle \otimes |v\rangle$ lying in $K_Q$ for some large value of $Q$. Let us assume that $v$ is generic enough that a constant fraction of it is filled with '0' characters. Define $p(n,t) = \mathrm{Tr}[P_n \rho_A(t)]$, where $P_n$ projects onto configurations with $n_{*,A} = n$. Let us first suppose that $|A| \ll \log L$. Then we contend that $p(n,t) > 0$ for all $0 \leq n \leq |A|$. To show this is simple: in order to allow for $0 \leq n \leq |A|$, we need to be able to annihilate all of the parentheses in $A$. Since this can be done in roughly $2^{|A|/2}$ space and $|A| \ll \log L$, configurations with all possible densities $n_{*,A}$ can be reached. However, when $|A| \gg \log L$ it is no longer the case that all configurations with densities $0 \leq n_{*,A} \leq |A|$ are achievable. To see this, consider sectors with $Q \gg \exp(|A|)$. Note that in order to have configurations with $n_{*,A} = |A|/2 + q$, we need to annihilate at least $q$ pairs of parentheses in $A$. If we set the height of the $h$-restriction to be $h = |A|/2 - q$, then for $Q \gg \exp(|A|)$ there will be at least two contiguous regions. Transporting an amount of charge $2^q$ from the contiguous region in $A$ to a contiguous region outside $A$ requires at least $2^q/\mathrm{poly}(L)$ space. If $q \gg \log L$, this is not possible. Therefore, we find that for subsystems of size $\gg \log L$, no configurations with charge $n_{*,A} = |A|/2 + q$ are reachable when $q \gg \log L$. This implies the unusual property that the reduced density matrix $\rho_A$ does not have full rank unless the subsystem is smaller than $\log L$. In this example, the consequences are easily observable, since $n_{*,A}$ is a local operator.

FIG. 3. The group geometry of BS(1,2). The Cayley graph of BS(1,2) is constructed from an infinite number of two-dimensional sheets glued together in a tree-like fashion. a) A single sheet of the Cayley graph, which resembles a hyperbolic plane. Orange arrows denote multiplication by $a$ and purple arrows multiplication by $b$, with the boundary of each plaquette being the defining relation $aba^{-1}b^{-2} = e$. The diagonal purple lines denote edges which connect to different sheets; all diagonal edges connected to vertices at the same "height" connect to the same sheet. b) How different sheets are glued together to form the full Cayley graph. The shaded yellow region denotes how a section of a single sheet is embedded in the full Cayley graph. The bold dashed path denotes the path traced out by a "large" word $w_{\rm large}(n) = ab^{-n}a^{-1}b^na^{-1}b^{-n}ab^n$, which is homotopic to the identity ($\varphi(|w_{\rm large}\rangle) = e$) and possesses an area scaling exponentially with its length. In the figure $n = 3$, and the path on the Cayley graph is ordered by points 1 to 8, according to the 8 different components of $w_{\rm large}$ (read from right to left). Thus point 2 corresponds to the word $b^3$, point 5 corresponds to $a^{-1}b^{-3}ab^3$, and so on.

FIG. 4. The probability $p_e$ for a randomly chosen word to lie in the identity sector, determined according to the procedure described in App. E. The dashed line is a fit to $p_e \propto e^{-\alpha L^{1/3}}$ with $\alpha \approx 1.84$.

FIG. 5. Stochastic circuit time evolution of observables associated with the $b$-charge, for the system initialized in a state $|w_{\rm large}(n = L/10, L)\rangle$, which is a large-area word $w_{\rm large}(n = L/10)$ with identities inserted at random places. (a) Time evolution of the spatial profile of $b$-charge, $\langle n_{b,i} \rangle$, where averaging is performed over a time window of $T = 10^5$ brickwork layers. (b) Time evolution of the observable $\langle n_b \rangle_2$ from Eq. (89). Thermalization occurs at time $t_{\rm th}$, when this observable drops to zero, corresponding to the $b$-charge density wave collapsing to a flat profile. The run corresponding to panel (a) is shown in grey, while the blue curve corresponds to an average over several independent runs, with the red shade showing the standard deviation. Time for each run has been rescaled by the respective thermalization time. The analytic formula from Eq. (83) is shown in dashed green (rescaled by its value at $t = 0$). (c) The thermalization time scales exponentially with the system size $L$ if the density of $b$'s in the initial word is kept fixed, $n = L/10$.

Proposition 9. If a finitely presented group $G$ possesses a superpolynomial Dehn function, then $G$ has exponential growth.

Proposition 11. For all $L$ and all $g$ with $|g| \leq L$, we have $|K_{e,L}| \geq |K_{g,L}|$.
Appendix D: No complete symmetry labels for non-Abelian G

Claim 1. With probability $1 - \epsilon$, a randomly chosen word in $K_{e,L}$ for BS has a Dehn function $\mathrm{Dehn}(L) \gtrsim 2^{f(\epsilon)\sqrt{L}}$.

FIG. 13. Histogram of the geodesic length estimate $|g(w)| = k + l + \log_2(|n| + 1)$ for randomly chosen length-$L$ words in BS. The scaling collapse indicates that typical words have a geodesic length scaling as $\sqrt{L}$.
$n_b = \#_b(w) - \#_{b^{-1}}(w)$, (E16)
the number of $b$'s contained in $w$ minus the number of $b^{-1}$'s. Then:
• If $B \in \mathbb{Z}$, either $k = 0$ or $l = 0$. Which of these scenarios holds depends on $\mathrm{sgn}(n_b)$: if $n_b < 0$ then $k = 0$, $l = -n_b$, and $n = B$, while if $n_b > 0$ then $l = 0$, $k = n_b$, and $n = 2^{n_b} B$.
FIG. 16. The probability for two returning random walks $w, w'$ on the 3-tree (with transition probabilities as in (E17)) to have a given branch point $\mathrm{Br}(w, w')$, confirming that $\mathrm{Br}(w, w') = O(1)$ with constant probability.
is the $b$-charge on site $i$ at time step $t$ (with $g_i(t)$ the $i$th entry of the state at time $t$), and $T$ is a time window that is small relative to the thermalization timescales of interest, but large enough to suppress short-time statistical fluctuations of $n_{b,i}(t)$ (here $\langle \bullet \rangle$ denotes averaging over this time window, while $\overline{\bullet}$ denotes averaging over space). The equilibrium distribution $|\pi_g\rangle \propto \sum_{w \in K_{g,L}} |w\rangle$ for $g = e$ satisfies $\langle \pi_e | n_{b,i} | \pi_e \rangle \approx 0$ for all $i$ (see App. E), and so for initial states $|w\rangle \in K_{e,L}$,

$|w\beta\rangle\langle w'\beta|\,\mathrm{Dyn}_G(t)\,|w\alpha\rangle$, where $W = |w\rangle\langle w|$ and $X_{ww'} = |w\rangle\langle w'| + {\rm h.c.}$ are operators which can be fully supported in a subsystem of size $\max(|w|, |w'|)$. Suppose that $\mathrm{EL}(w, w')$ is large, so that all derivations between $w$ and $w'$ require large spatial resources. Then, if $\mathrm{Dyn}_G$ describes circuit dynamics, $C_{ww'}(t)$ is zero for small $t$, since if both $\langle w\alpha | \mathrm{Dyn}_G^\dagger(t) | w\beta \rangle$ and $\langle w'\beta | \mathrm{Dyn}_G(t) | w\alpha \rangle$ are nonzero, then one can transition from $|w\beta\rangle$ to $|w'\beta\rangle$

(D3), where $O_{g,i} = |g\rangle\langle e|_i + {\rm h.c.}$ Our goal will be to show that $|w_{gh}\rangle$ cannot be an eigenstate of any of the $\{U_a\}$ if $gh \neq hg$. Denoting $|e^{l+m+n+2}\rangle$ as $|e\rangle$ for simplicity, and defining $|w_{gh;a}\rangle = O_{g,l+1} O_{h,l+m+2} U_a |e^{l+m+n+2}\rangle$, we have $\langle w_{gh;a}|U_a|w_{gh}\rangle = \langle e| U_a^\dagger O_{g,l+1} O_{h,l+m+2} U_a O_{g,l+1} O_{h,l+m+2} |e\rangle = \langle e| O^{U_a}_{g,l+1} O^{U_a}_{h,l+m+2} O_{g,l+1} O_{h,l+m+2} |e\rangle$. In particular, in the translation-invariant case where $\langle e| O^{U_a}_{g,l+1} O_{g,l+1} |e\rangle$ is independent of $l$, we have $\langle w_{gh;a}|U_a|w_{gh}\rangle = \langle w_{hg;a}|U_a|w_{hg}\rangle$.