Directed Percolation Criticality in Eternal Inflation

False-vacuum eternal inflation can be described as a random walk on the network of vacua of the string landscape. In this paper we show that the problem maps naturally to a problem of directed percolation. The mapping relies on two general and well-justified approximations for transition rates: (1) the downward approximation, which neglects "upward" transitions, as these are generically exponentially suppressed; (2) the dominant decay channel approximation, which capitalizes on the fact that tunneling rates are exponentially staggered. Lacking detailed knowledge of the string landscape, we model the network of vacua as random graphs with arbitrary degree distribution, including Erdős-Rényi and scale-free graphs. As a complementary approach, we also model regions of the landscape as regular lattices, specifically Bethe lattices. We find that the uniform-in-time probabilities proposed in our previous work favor regions of the landscape poised at the directed percolation phase transition. This raises the tantalizing prospect of deriving universal statistical distributions for physical observables, characterized by critical exponents that are insensitive to the details of the underlying landscape. We illustrate this with the cosmological constant, and show that the resulting distribution peaks as a power law at small positive vacuum energy, with a critical exponent uniquely determined by the random-graph universality class.


Introduction
Our universe appears to be tantalizingly poised at criticality. Extrapolating the Higgs effective potential reveals that the electroweak vacuum lies within a tiny parameter region of metastability [1][2][3][4][5][6][7][8][9][10], a result that is exquisitely sensitive to the top quark and Higgs boson masses. This hinges on an enormous cancellation between the exponentially small decay rate and the exponentially large observable volume of the universe. It is tempting to speculate that there is an intricate relation between various measured quantities: the cosmological constant (CC), which sets the observable volume of the universe, and the Higgs and top quark masses, which set the Higgs effective potential when extrapolated to high enough energy.
Other fine-tuned features of our universe can also be interpreted as near-criticality. In light of the non-detection of supersymmetry at the electroweak scale, the nearly vanishing ratio of the Higgs mass to M_Pl is a result of exponential fine-tuning, which can be interpreted as the boundary between broken and unbroken electroweak symmetry [11]. In cosmology, the CC problem translates to our universe being nearly Minkowski space, which bifurcates into ever-expanding de Sitter (dS) and crunching Anti-de Sitter (AdS) space-times with distinct asymptotics and stability properties [12,13].
In the context of eternal inflation [14][15][16][17][18], universes in causally-disconnected Hubble patches possess different physical laws, and new universes are constantly being generated in all patches. At the same time, string theory predicts an exponentially large number of metastable vacua with enormously rich low-energy physics [19][20][21]. False-vacuum eternal inflation is essentially a random walk on the network of vacua of the string landscape. Given these facts, it is natural to ask whether the near-criticality of our universe can be approached from a statistical point of view.
In order to extract predictions in the multiverse, it is intuitive to study statistically the distribution of vacua and their associated physical properties. Although deriving such a statistical distribution may ultimately require a complete understanding of quantum gravity, it is still instructive to approximate it using a semi-classical prescription. Attempts to define semi-classical probabilities (or a measure) usually rely on limiting frequency distributions. This is perhaps natural, since the infinite ensemble necessary to define frequencies is actually realized in the multiverse. However, it is well known that defining such a measure is ambiguous, as it is assumption-dependent even within the same framework [22][23][24][25].
In a recent paper [26], we presented a general Bayesian framework for probabilistic reasoning in eternal inflation. Different assumptions about the measure problem amount to different choices of priors to define probabilities. We identified two prior distributions, both pertaining to initial conditions, that must be specified to obtain well-defined occupational probabilities for different vacua. Since eternal inflation is geodesically past-incomplete [27], we know that we exist a finite time t since the onset of eternal inflation. Our ignorance about the time of existence is captured by a prior density ρ(t). Relatedly, along our past world-line eternal inflation must have started within some particular "ancestral" dS vacuum, but we do not know which one. Our ignorance about the ancestral vacuum is parametrized by a probability distribution p_α over dS vacua. Different proposed solutions to the measure problem simply amount to different choices for these two priors.
In [26] we argued that there are two natural and well-justified choices for the time-of-existence prior ρ(t):

• Since the number of observers grows with volume, a natural choice is ρ(t) ∼ a^3. This is equivalent to weighing probabilities by physical volume. The resulting "late-time/volume-weighted" probabilities coincide with the measure of Garriga, Schwartz-Perlov, Vilenkin and Winitzki (GSVW) [28]. This choice of prior reflects the belief that we exist at asymptotically late times in the unfolding of the multiverse, much later than the exponentially-long relaxation time for the landscape, such that probabilities have settled to a quasi-stationary distribution. This assumption is adopted in nearly all existing approaches to the measure problem [22,23,[28][29][30][31][32].
• Alternatively, motivated by the time-translational invariance of the random walk on the landscape, a natural choice is the uniform prior: ρ(t) = const. (To be clear, this is uniform in either proper time or e-folding time.) The resulting "uniform-in-time" probabilities agree with the prior probabilities of [33]. They are closely related to the "comoving" probabilities proposed in [28,32], as well as probabilities derived recently using the local Wheeler-De Witt equation [34]. Importantly, the uniform-in-time probabilities favor vacua that are accessed early on in the evolution of the multiverse, during the approach to equilibrium. This is consistent with the early-time approach to eternal inflation developed recently [35][36][37][38][39].
The mapping to directed percolation relies on two general and well-justified assumptions about transition rates between vacua:

1. The first assumption is that transition rates between dS vacua satisfy a condition of detailed balance, such that "upward" jumps are exponentially suppressed by κ_up/κ_down ∼ e^{−∆S}, where S is the dS entropy. This is satisfied by most tunneling instantons, including Coleman-De Luccia (CDL) [40][41][42]. Thus we are justified in working in the downward approximation [43,44], wherein upward transitions are neglected to leading order. In this approximation, the network of vacua becomes a directed graph.
2. The second assumption rests on the fact that semi-classical tunneling rates depend exponentially on the Euclidean action of the instanton, κ ∼ e^{−S_E}. In turn, S_E depends sensitively on the height and width of the potential barrier. Because of this exponential sensitivity, branching ratios for dS vacua are typically overwhelmingly dominated by a single decay channel. This motivates the dominant decay channel approximation, in which exponentially-subdominant decay channels are neglected.
It should be stressed that these approximations are not strictly necessary to study percolation. They are made for convenience, to simplify the problem, and we will discuss how the analysis can be generalized by relaxing them. In any case, with these approximations, the uniform-in-time probabilities reduce to a simple and intuitive observable in directed graphs. Namely, the probability to occupy a given node I simplifies to

P(I) ≃ s_I / N_dS ,    (1)

where s_I is the number of ancestors of I, i.e., nodes that can reach I through a sequence of directed (downward) transitions, and N_dS is the number of dS vacua. See Fig. 4. Thus the measure favors vacua with a large basin of ancestors. In other words, regions of the landscape with large probability must have the topography of a deep valley, or funnel [26,36,37,39]. This is akin to the smooth folding funnels of protein conformation landscapes [45], and those of atomic clusters with Lennard-Jones interactions [46][47][48]. In the context of deep learning, it has been argued that deep neural networks that generalize well have a loss function characterized by a smooth funnel [49].
Another instance is the "big valley" hypothesis in combinatorial optimization (e.g., the search space of the traveling salesman problem), where it is conjectured that local optima are clustered around the central global optimum [50]. It is tempting to speculate that funnels are a generic solution to optimization problems on complex energy landscapes. Equation (1) gives an intuitive and well-justified notion of probability for different vacua. But what can we reasonably assume about the network of vacua, given our limited understanding of the string landscape? Lacking detailed knowledge of the underlying network of vacua, it seems sensible to model regions of the landscape as random graphs. Random graphs have a long and venerable history, going back to the seminal work of Erdős and Rényi [51]. In general, they can be defined by specifying the probability distribution that a given node has a certain degree (number of links), with Erdős-Rényi graphs corresponding to the special case of a Poisson degree distribution. In this work we follow [52] and consider arbitrary degree distributions, including scale-free random graphs.
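As a concrete illustration of the ancestor count s_I, the following minimal Python sketch computes s_I by reverse reachability on a hypothetical toy funnel (the graph is invented for illustration and is not taken from the landscape models studied here):

```python
from collections import defaultdict

def ancestor_counts(edges, nodes):
    """For each node I, count s_I: the nodes that can reach I via directed edges."""
    # Build reverse adjacency: rev[i] = list of direct parents of i.
    rev = defaultdict(list)
    for j, i in edges:            # edge (j, i) means "j decays to i"
        rev[i].append(j)
    counts = {}
    for target in nodes:
        seen, stack = set(), [target]
        while stack:
            node = stack.pop()
            for parent in rev[node]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        counts[target] = len(seen)  # the target itself is not counted
    return counts

# Toy funnel: vacua 1, 2, 3 decay to 4, which decays to 5 (the funnel bottom).
edges = [(1, 4), (2, 4), (3, 4), (4, 5)]
counts = ancestor_counts(edges, nodes=[1, 2, 3, 4, 5])
print(counts)   # node 4 has 3 ancestors, node 5 has 4
```

With the 0/1 branching ratios of the dominant decay channel approximation, P(I) is then simply proportional to `counts[I]`.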
As a complementary approach, we also model landscape regions as a regular lattice, specifically a Bethe lattice (or Cayley tree). This is suitable for local string landscapes in which vacua form a regular network, for instance the axion landscape [53][54][55]. The directed percolation transition can be studied analytically for both Bethe lattices and (Erdős-Rényi) random graphs [52]. Remarkably, despite being extreme opposites in terms of graph "regularity", Bethe lattices and Erdős-Rényi graphs belong to the same percolation universality class. Thus it is our hope that, despite being highly simplified and idealized, these two approaches offer important lessons about percolation phenomena on the landscape that are applicable to more realistic dynamics.
To see how this maps to directed percolation, consider the landscape networks shown in Fig. 1, comprised of dS transient vacua (blue nodes) and AdS/Minkowski terminal vacua (red nodes). 1 In the downward

Figure 1: A region of the landscape, comprised of transient dS vacua (blue nodes) and terminals (red nodes). In the dominant decay channel approximation, each dS vacuum has exactly one outgoing link. Left: if dS vacua decay primarily to terminals, the region breaks up into many small disconnected components and is therefore subcritical. Right: if dS vacua mainly decay to other dS vacua, then a giant connected component can emerge, and the region is near percolation criticality.
approximation, all transitions are one-way, and the graphs are directed. Each dS transient has exactly one directed edge emanating from it, corresponding to its dominant decay channel. Now, if dS vacua mainly decay to terminals (left panel), then the region breaks up into many small disconnected components, resulting in s_I ∼ O(1) and thus low probability. If, however, dS vacua mainly decay to other dS vacua (right panel), then the graph can include a very large connected component, with low-lying nodes having s_I ≫ 1. This corresponds to the emergence of a giant component at the percolation phase transition.
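The left/right contrast of Fig. 1 can be mimicked numerically. The sketch below uses a hypothetical energy-ordered toy landscape (not a model from this paper): each dS vacuum decays either to its own fresh terminal with probability p, or to a single random lower-energy dS vacuum, and we measure the largest "basin" of dS vacua funneling into one terminal:

```python
import random

def largest_basin(n_ds, p_terminal, rng):
    """Each dS node i decays exactly once (out-degree 1): to a fresh terminal
    with probability p_terminal, else to a random lower-energy dS node j > i.
    Returns the largest number of dS vacua feeding a single terminal."""
    target = [None] * n_ds
    n_terminals = 0
    for i in range(n_ds):
        if i == n_ds - 1 or rng.random() < p_terminal:
            target[i] = ('T', n_terminals)        # decay to a fresh terminal
            n_terminals += 1
        else:
            target[i] = ('D', rng.randrange(i + 1, n_ds))  # downward dS decay
    basin = [0] * n_terminals
    for i in range(n_ds):
        j = i
        while target[j][0] == 'D':                # follow the decay chain down
            j = target[j][1]
        basin[target[j][1]] += 1
    return max(basin)

rng = random.Random(0)
sub = largest_basin(5000, 0.5, rng)     # mostly direct decays to terminals
near = largest_basin(5000, 0.01, rng)   # decays funnel through long dS chains
print(sub, near)
```

For small p_terminal the decay chains merge into a few large funnels, so the largest basin is far bigger than in the terminal-dominated case, echoing the giant-component picture.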
It is clear from these simple considerations that vacua with the highest occupational probability reside in landscape regions poised at the directed percolation phase transition. This is a key result of our analysis. In hindsight, since the uniform-in-time measure is relevant for the approach to equilibrium in landscape dynamics, its relation to directed percolation is perhaps not surprising, as directed percolation is the paradigmatic non-equilibrium critical phenomenon [57].
As usual, the power of criticality lies in universality. Near the percolation phase transition, various quantities assume power-law (scale-invariant) probability distributions, characterized by universal critical exponents that are insensitive to the microscopic details of the system. This raises the tantalizing possibility of deriving universal probability distributions for physical observables without detailed knowledge of the underlying landscape. We illustrate this point concretely with the CC, and briefly mention other potential observables in the conclusions.

Universal probability distribution for the CC
As reviewed in Sec. 5, at criticality the probability P_s that a randomly-chosen vacuum has s ancestors displays a power-law tail, P_s ∼ s^{−3/2}. (The particular critical exponent of −3/2 holds for the Erdős-Rényi universality class; scale-free graphs have different critical exponents.) Assuming only that the underlying CC probability distribution on the landscape is smooth as v → 0^+, we show in Sec. 7 that this translates into a (non-anthropic) universal probability distribution for the CC, which takes into account eternal inflationary dynamics, that is also power-law near the origin:

P(v) ∼ v^{−1/2} for v → 0^+ .    (2)

Thus the uniform-in-time cosmological measure favors small, positive vacuum energy. Quantitatively, the 95% confidence interval for the CC is set by the size of the giant component, which is famously O(N_dS^{2/3}) for Erdős-Rényi graphs:

0 ≤ v ≲ N_dS^{−1/2} .

This can explain the observed CC, v_obs ∼ 10^{−120}, if our vacuum belongs to a funneled region of size N_dS ∼ 10^{240}. To be clear, here N_dS is the number of dS (transient) vacua in a funnel region near directed percolation criticality, not the total number of dS vacua across the entire landscape. Since the measure favors vacua with the largest number of ancestors, we are likely to inhabit the largest funnel region near percolation criticality, i.e., the near-critical region with largest N_dS.

Before closing, we should mention other occurrences of percolation criticality in the context of eternal inflation, and how they contrast with our framework. It is well known that the bubbles generated in false-vacuum inflation exhibit a percolation phase transition when the nucleation probability within a Hubble volume, κ ≡ Γ/H^4, reaches a critical value somewhere in the range 10^{−6} ≲ κ_c ≲ 0.24 [58]. This transition describes bubble percolation in space-time, as opposed to the percolation phase transition in the network of vacua discussed in this work. Relatedly, in the context of slow-roll
inflation, it was shown in [59] that the phase transition to eternal inflation can be described by a Galton-Watson branching process [60], whose critical behavior is equivalent to directed percolation; see, e.g., [61]. Lastly, we should mention the mechanism of 'self-organized localization' [62], whereby the near-criticality of our universe arises from quantum first-order phase transitions in stochastic inflation. In contrast, our approach pertains to classical, second-order non-equilibrium criticality.
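The s^{−3/2} tail characteristic of this universality class can be checked with a quick Monte Carlo of a critical Galton-Watson process. The sketch below uses a standard textbook offspring law (0 or 2 children with equal probability, mean 1), chosen for convenience rather than taken from [59]; for an s^{−3/2} distribution, the tail obeys P(S ≥ s) ∼ s^{−1/2}, so the ratio P(S ≥ 16)/P(S ≥ 64) should approach √(64/16) = 2:

```python
import random

def total_progeny(rng, cap=10_000):
    """Total size of a critical Galton-Watson tree (offspring 0 or 2, prob 1/2)."""
    size, frontier = 1, 1
    while frontier and size < cap:
        # Each individual in the current generation has 0 or 2 children.
        frontier = sum(2 * (rng.random() < 0.5) for _ in range(frontier))
        size += frontier
    return size

rng = random.Random(42)
sizes = [total_progeny(rng) for _ in range(20_000)]
p16 = sum(s >= 16 for s in sizes) / len(sizes)
p64 = sum(s >= 64 for s in sizes) / len(sizes)
print(p16 / p64)   # should be close to 2 for a s^{-3/2} size distribution
```

The `cap` merely truncates rare huge trees for speed; it does not affect the two tail probabilities compared here.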
The paper is organized as follows. In Sec. 2 we briefly review vacuum dynamics as an absorbing Markov process on the network of vacua. In Sec. 3 we describe the general Bayesian approach to the measure problem, and review the late-time/volume-weighted and uniform-in-time probabilities as two well-justified choices of priors. We also review the argument, originally given in [26], that posterior odds overwhelmingly favor the uniform-in-time hypothesis. In Sec. 4 we discuss the mapping of vacuum dynamics to a problem of directed percolation. Section 5 is a rather comprehensive review of the key notions of percolation on random graphs, both undirected and directed, with general degree distributions. In Sec. 6 we apply these notions to the case of interest, namely random networks with terminal (AdS) vacua, and argue that uniform-in-time probabilities favor regions of the landscape poised at directed percolation criticality. In Sec. 7 we show how the probability distributions of ancestors and descendants, which assume power-law tails at criticality, translate into universal distributions for the CC with certain critical exponents. We summarize our results and discuss a few avenues of future research in Sec. 8.

Brief Review of Vacuum Dynamics
Vacuum dynamics on the string landscape are described by a linear Markov process [28,31]. Technically, this is an absorbing Markov process, because AdS vacua act as terminals. As a result, detailed balance is explicitly violated, and the dynamics are out of equilibrium. The Markov process describes the probability f_I(t) along a given world-line to occupy vacuum I as a function of time. This probability satisfies the master equation

df_I/dt = Σ_J (κ_IJ f_J − κ_JI f_I) ,    (4)

where κ_IJ is the J → I transition rate. (Terminal vacua a by definition have no outgoing transitions, κ_Ia = 0.) While most of the results in this section hold for general tunneling rates, we have in mind transitions mediated by semi-classical instantons, such as Coleman-De Luccia (CDL) [40][41][42], Hawking-Moss [63] and Brown-Teitelboim [64]. The general time variable t is related to proper time τ_I in vacuum I via a lapse function:

dτ_I = N_I dt .    (5)

The master equation relies on coarse-graining over a time ∆τ_I, which must be longer than any transient evolution between epochs of vacuum energy domination.² Since AdS bubbles crunch in a Hubble time, coarse-graining spans their entire evolution.
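As a minimal numerical sketch of the master equation (with hypothetical rates invented for illustration: two dS vacua and one terminal), one can check that total probability is conserved while the terminal absorbs everything at late times:

```python
import numpy as np

# Toy landscape: dS vacua {0, 1} and one terminal {2}.
# kappa[I, J] = transition rate J -> I; the terminal has no outgoing rates.
kappa = np.array([[0.0, 0.3, 0.0],
                  [0.1, 0.0, 0.0],
                  [0.2, 0.4, 0.0]])     # hypothetical rates

# Master equation df_I/dt = sum_J (kappa_IJ f_J - kappa_JI f_I):
R = kappa - np.diag(kappa.sum(axis=0))  # columns of R sum to zero

f = np.array([1.0, 0.0, 0.0])           # start in ancestral vacuum 0
dt = 1e-3
for _ in range(50_000):                 # Euler integration to t = 50
    f = f + dt * R @ f

print(f.sum(), f[2])   # sum stays 1; nearly all probability is in the terminal
```

Because the columns of R sum to zero, the Euler update conserves Σ_I f_I exactly (up to floating-point roundoff), mirroring the normalization property discussed below.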
The probabilities f_I(t) have a dual interpretation. They can be interpreted "locally", as occupational probabilities along a world-line. Or they can be interpreted "globally", as the fraction of comoving volume that each vacuum occupies on a spatial hypersurface at time t. We mainly adopt the former interpretation. Equation (4) makes two properties of the f_I's clear: 1) the master equation (4) is manifestly invariant under redefinitions of t, hence the f_I's are time-reparametrization invariant; 2) because summing the right-hand side over I gives zero, the f_I's can be normalized:

Σ_I f_I(t) = 1 .    (6)

Thus the f_I(t)'s offer well-defined, time-reparametrization invariant probabilities to occupy different vacua at time t.
We will be primarily interested in the dS component of the master equation, given by

df_i/dt = Σ_j M_ij f_j ,  with  M_ij = κ_ij − δ_ij κ_i .    (7)

Here, M_ij is the dS → dS transition matrix, and κ_i ≡ Σ_J κ_Ji is the total decay rate of vacuum i. Our only assumption about M_ij is that it is irreducible, i.e., there exists a sequence of transitions connecting any pair of dS vacua, a property which has been argued to be valid for the string landscape [66]. Equation (7) can be solved in terms of a Green's function:

f_i(t) = Σ_α (e^{Mt})_{iα} p_α ,    (8)

where p_α ≡ f_α(0) is the initial probability distribution over ancestral vacua. Since eternal inflation by definition started in a dS vacuum, the initial probabilities satisfy

Σ_α p_α = 1 , with support only on dS vacua.    (9)

² At the same time, ∆τ_I should be shorter than the lifetime of most metastable dS vacua, for otherwise we would be "integrating out" the transitions we are interested in describing. In practice, the coarse-graining time interval for a given transition to I should satisfy ∆τ_I ≳ H_parent^{−1}, where H_parent is the Hubble rate of the parent dS vacuum (see, e.g., [65]).

Bayesian Probabilities

To define Bayesian probabilities, one must carefully distinguish the elements that are inherent to the eternal inflation hypothesis from those that require additional assumptions in the form of prior information. An
important fact is that eternal inflation, while eternal into the future, is not eternal into the past. That is, an eternally-inflating space-time is past geodesically incomplete [27]. This has two implications:

• We exist a finite time t after the onset of eternal inflation, but we do not know how long ago that was.
We must therefore parametrize our ignorance about the time of existence with a prior density ρ(t), normalized as ∫_0^∞ dt ρ(t) = 1.

• Along our past world-line, eternal inflation started in some ancestral dS vacuum α, but we do not know which one. Our ignorance about the ancestral vacuum is captured by the initial probability distribution p_α.
Lastly, it is customary to condition probabilities on one piece of observational data: namely, that we exist in our dS pocket universe during the transient period before vacuum domination. That is, we exist within a coarse-graining time ∆t after the nucleation of our bubble.
It is then straightforward to write down the joint probability distribution P(I, t, α) to inhabit a bubble of vacuum I, nucleated at time t, starting from an ancestral vacuum α:

P(I, t, α) = ρ(t) Σ_j κ_Ij ∆t (e^{Mt})_{jα} p_α .    (10)

This is easy to understand. The factor (e^{Mt})_{jα} p_α is the probability to evolve from ancestral vacuum α to parent dS vacuum j at time t, while the factor κ_Ij ∆t is the probability to transition from parent dS vacuum j to vacuum I in the next ∆t. Lastly, we weigh the time of nucleation with ρ(t), and sum over all dS parents j. To our mind, the above joint probabilities are the correct objective approach to inductive reasoning in the multiverse. They accurately encode our ignorance about when and where eternal inflation started in our past. Different approaches to the measure problem simply amount to different choices for the priors p_α and ρ(t).
Within this general framework, one can perform the three main operations of Bayesian inference:

1. By marginalizing over the model parameters t and α, and using (8), we obtain the prior predictive distribution:

P(I) = ∫_0^∞ dt ρ(t) Σ_{j,α} κ_Ij ∆t (e^{Mt})_{jα} p_α .    (11)

This distribution informs us on which vacua are statistically favored without taking any data (e.g., value of the CC, particle spectrum, etc.) into consideration, other than conditioning on our bubble being nucleated within the last ∆t.
2. Different hypotheses H_1 and H_2, corresponding to different choices of priors, can be compared by computing the posterior odds:

P(H_1|D)/P(H_2|D) = [P(H_1)/P(H_2)] × [P(D|H_1)/P(D|H_2)] ,    (12)

where P(H_1)/P(H_2) is the prior odds of the two hypotheses, and P(D|H_1)/P(D|H_2) is the Bayes factor. The data D refers to all the information available about our observable universe, in the form of measured values for various observables {O_i}. These include the particle content, masses and couplings of the Standard Model, as well as the parameters of the cosmological ΛCDM model.
3. Conditioning on our data D for a given choice of priors, we can perform parameter inference. For instance, P(t|D) gives the posterior distribution for the time of nucleation.
Each of these operations was studied in detail in [26]. In what follows we will be primarily interested in the prior predictive probabilities (11).

Uniform-in-time measure
As mentioned above, a choice of measure amounts to specifying a choice of priors p_α and ρ(t). Consistency requires that priors reflect all information at hand, while at the same time being minimally informative. Let us first discuss the time-of-nucleation prior ρ(t), as it is most important in determining P(I). In general, specifying a prior for a continuous variable is tricky, for the obvious reason that a uniform prior is not reparametrization invariant. Following Jaynes [67], a useful strategy in this case is to identify the symmetries of the problem and apply the notion of group invariance. Logical consistency requires that our prior be invariant under all symmetry transformations.
In the case at hand, a key property of the master equation (4) is that it is time-translation invariant. More precisely, it is invariant under translations in proper time, as well as any time variable t related to proper time via a lapse function N_I depending on H_I only (e.g., scale-factor/e-folding time). Without additional information, the most natural choice is the uniform prior:

ρ(t) = const.    (13)

To be clear, this prior is uniform in proper time and e-folding time. In terms of conformal time, however, it corresponds to the Jeffreys prior, ρ(η) ∼ η^{−1}, consistent with the dS dilation symmetry η → λη, x⃗ → λx⃗.
Substituting into (11), we can perform the time integral using the identity

∫_0^∞ dt Σ_j κ_Ij (e^{Mt})_{jα} = [T(1 − T)^{−1}]_{Iα} ,    (14)

where T_ij ≡ κ_ij/κ_j is the branching ratio. (More generally, the branching ratio matrix has components T_Ij = κ_Ij/κ_j, T_ab = δ_ab, and T_ja = 0, such that Σ_I T_IJ = 1 for all J.) Equation (11) then gives

P(I) ∝ Σ_α [T(1 − T)^{−1}]_{Iα} p_α .    (15)

The matrix (1 − T)^{−1} is known as the fundamental matrix of the absorbing Markov chain. Expanding it as a geometric series, (1 − T)^{−1} = 1 + T + T^2 + …, it is easily recognized as the total branching probability for all transition paths connecting j to i. Thus P(I) is naturally interpreted as the sum over all paths connecting ancestral vacua to vacuum I, weighted by the branching probability for each path and averaged over ancestral vacua.
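The fundamental-matrix structure is easy to verify numerically. The sketch below, with hypothetical branching ratios invented for illustration (columns sum to less than 1, the remainder leaking to terminals), checks the geometric-series expansion of (1 − T)^{−1} and evaluates the resulting uniform-in-time weights for an indifference prior:

```python
import numpy as np

# Hypothetical branching ratios among 3 dS vacua: T[i, j] is the probability
# that vacuum j decays to vacuum i; column deficits leak to terminal vacua.
T = np.array([[0.0, 0.6, 0.0],
              [0.3, 0.0, 0.5],
              [0.4, 0.2, 0.0]])

# Fundamental matrix of the absorbing chain: (1 - T)^{-1} = 1 + T + T^2 + ...
F = np.linalg.inv(np.eye(3) - T)
series = sum(np.linalg.matrix_power(T, n) for n in range(60))
print(np.max(np.abs(F - series)))    # geometric series converges to F

# Uniform-in-time weights (up to normalization) for p_alpha = 1/3:
p = np.full(3, 1.0 / 3.0)
P = T @ F @ p
print(P)
```

Each entry of `F` sums the branching probabilities of all paths between a pair of vacua, which is exactly the path-sum interpretation of (15).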
Next, consider the prior p_α over ancestral vacua. This was discussed in detail in [26], and we briefly mention the salient points here. The prior p_α pertains to the question of the initial state in quantum cosmology, which has been the subject of active debate for decades and remains an open problem. A well-motivated proposal for the quantum creation of a closed universe is the Hartle-Hawking (HH) state [68,69], which exponentially favors the lowest-energy (highest-entropy) dS vacuum. Another well-studied proposal is the tunneling wavefunction [70][71][72][73], which instead favors high-energy/low-entropy initial vacua. Thus the tunneling wavefunction favors (high-energy) inflation, whereas the HH state does not [74].
As motivated in [26], a reasonable attitude is to err on the side of maximal ignorance and apply the principle of indifference:

p_α = 1/N_dS for all dS vacua α ,    (16)

where N_dS is the number of dS vacua. (If the number of dS vacua in the landscape is infinite [75][76][77], then (16) would represent an improper prior, which is fine since the resulting probabilities would nevertheless be well-defined.) Because high-energy dS vacua are expected to vastly outnumber low-energy dS vacua in the landscape, a uniform prior is statistically equivalent to a prior favoring high-energy/low-entropy initial conditions, such as the tunneling wavefunction.
If the HH state turns out to describe the correct initial conditions for eternal inflation in our past, then this would have important implications for the uniform-in-time probabilities. Adopting (16), (15) reduces to

P(I) ∝ Σ_α [T(1 − T)^{−1}]_{Iα} .    (17)

This distribution agrees with the prior probabilities of [33], and is closely related to the "comoving" probabilities proposed in [28,32]. We will see that it admits a clear and intuitive interpretation under the simplifying assumptions discussed in Sec. 4.

Late-time/volume-weighted measure
Another reasonable choice for ρ(t) is motivated by the fact that the number of observers grows with volume. Hence, ρ(t) should grow accordingly:

ρ(t) ∼ a^3(t) .    (18)

As shown in [26], this is equivalent to weighing occupational probabilities by physical volume.
Because the prior is sharply peaked at late times, the occupational probabilities f_j(t) can be approximated by their asymptotic form

f_j(t) ≃ s_j e^{−qt} .    (19)

Here s_j is the so-called dominant eigenvector of M_ij, which by definition has the largest (least negative) eigenvalue −q [28]. Substituting into (11), we obtain in this case

P(I) ∝ Σ_j κ_Ij s_j .    (20)

This agrees with the GSVW measure [28] obtained by counting bubbles along a world-line.
The above distribution admits an intuitive explanation in downward perturbation theory, discussed in Sec. 4.1 below. In this approximation, the dominant eigenvector takes a simple form [44]:

s_j ≃ δ_{j⋆} ,    (21)

where ⋆ denotes the most stable (i.e., longest-lived) dS vacuum, also known as the dominant vacuum. Thus (20) becomes

P(I) ∝ [T(1 − T)^{−1}]_{I⋆} .    (22)

In other words, this is recognized as the total branching probability from ⋆ to I. The late-time/volume-weighted measure is independent of initial conditions (i.e., independent of p_α), reflecting the attractor nature of eternal inflation. However, somewhat paradoxically, (22) coincides with (15) for the special choice of initial conditions p_α = δ_{α⋆}.
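The role of the dominant eigenvector can be illustrated numerically. With hypothetical, exponentially staggered rates (invented for illustration), the least-negative eigenvalue of M is set by the longest-lived vacuum, and the corresponding eigenvector concentrates on it, as in (21):

```python
import numpy as np

# Hypothetical dS -> dS rates with exponentially staggered magnitudes;
# vacuum 2 is by far the most stable (the "dominant vacuum" *).
kappa = np.array([[0.0,  1e-2, 1e-8],
                  [1e-1, 0.0,  1e-9],
                  [1e-3, 1e-4, 0.0 ]])
total = kappa.sum(axis=0) + np.array([0.5, 0.3, 1e-7])  # + decays to terminals
M = kappa - np.diag(total)                               # M_ij = kappa_ij - delta_ij kappa_i

evals, evecs = np.linalg.eig(M)
k = np.argmax(evals.real)            # least-negative eigenvalue, -q
s = np.abs(evecs[:, k].real)
s /= s.sum()
print(evals[k].real, s)              # s is concentrated on vacuum 2
```

The separation of decay rates by many orders of magnitude is what makes the dominant eigenvector essentially a delta function on ⋆.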

Model comparison favors uniform-in-time probabilities
In [26] we compared the Bayesian evidence for the uniform-in-time and late-time measures by computing the Bayes factor P(D|H_late)/P(D|H_uni). We argued, under general and plausible assumptions, that it overwhelmingly favors the uniform-in-time hypothesis. The reason is easily understood intuitively. Since ⋆ is the most stable vacuum anywhere in the landscape, it is likely that it can only decay via an upward transition, and upward jumps are doubly-exponentially suppressed (as discussed in Sec. 4.1). Therefore, under the late-time hypothesis, the branching probability to vacua compatible with our data is also doubly-exponentially suppressed. In contrast, for uniform-in-time probabilities, if vacua compatible with our data can be reached from some ancestral vacua via a sequence of downward transitions, then the Bayesian evidence is likely exponentially small, but not doubly-exponentially suppressed.
Furthermore, conditioning on our data D, we performed in [26] parameter inference to determine the most likely time of nucleation. For the uniform-in-time hypothesis, we found that the average time for occupying vacua compatible with our data is much shorter than the mixing time for the landscape. This is fully consistent with the "early-time" approach to eternal inflation [36][37][38][39], which proposes that we live during the approach to equilibrium in the unfolding of the multiverse. See [35] for related ideas. This is in contrast with the late-time/volume-weighted distribution, which reflects the belief that the evolution of the multiverse has been going on for an exponentially long time, much longer than the mixing time of the landscape, such that the occupational probabilities have settled to a quasi-stationary distribution.
We henceforth focus on the uniform-in-time measure (17).

Mapping to a Directed Percolation Problem
The landscape can be modeled as a network (or graph). The nodes represent the different vacua, while the links define the network topology and represent all relevant transitions between vacua. There are two types of nodes: transients (dS) and terminals/absorbing (AdS and Minkowski). Because transition rates are different along each link, the graph is said to be weighted. The master equation (4) describes a random walk on this weighted network. The measures derived above are closely related to network centrality indices: the uniform-in-time measure (17) is analogous to Katz centrality [78]; the late-time (GSVW) measure (22) to eigenvector centrality.
In this Section we show how the problem can be mapped to a problem of directed percolation. Directed percolation is the paradigmatic critical phenomenon for non-equilibrium systems [57]. It is perhaps not surprising that the absorbing Markov process describing vacuum dynamics on the landscape, which is inherently non-equilibrium, belongs to the universality class of directed percolation. The mapping relies on two very general and reasonable assumptions about transition rates between vacua, discussed respectively in Secs. 4.1 and 4.2. We will see that, with these approximations, the uniform-in-time probabilities reduce to a simple and intuitive observable in directed graphs, namely the number of ancestors of a given node.

Downward approximation
The first assumption is that transitions between dS vacua satisfy a condition of detailed balance [79]:

κ_ij/κ_ji = e^{S_i − S_j} ,    (23)

where S_j = 8π^2 M_Pl^2/H_j^2 is the dS entropy. This condition is satisfied by CDL, Hawking-Moss and Brown-Teitelboim instantons. More generally, it is consistent with the interpretation of quantum dS space as a thermal state [80]. Notice that (23) depends only on the false and true vacuum potential energy: it is insensitive to the potential barrier and does not rely on the thin-wall approximation.

Figure 2: Illustration of the dominant decay channel approximation. Left: in the downward approximation, a given node has possibly many allowed decay channels (directed links), but with exponentially staggered branching ratios (different shades of gray). Right: the approximation amounts to keeping only the link with the largest branching ratio. The parent node therefore has out-degree 1.
Equation (23) implies that upward transitions, which increase the potential energy, are exponentially suppressed compared to downward tunneling. This allows one to define a "downward" approximation [43,44], in which upward transitions are neglected to zeroth order. (Upward transitions are treated perturbatively at higher order.) In this approximation, the network of vacua reduces to a directed, acyclic graph [88], i.e., without directed loops, whereby a link from $j$ to $i$ is only allowed if $V_j \geq V_i$. This may be a good place to point out that the validity of the master equation (4) has not been rigorously established for upward transitions. So it may be the case that the description of vacuum dynamics as a Markov process is only legitimate in the strict downward approximation.
In any case, dS vacua that can only decay via upward transitions become effectively terminal in this approximation. In other words, in the downward approximation terminals consist of both AdS/Minkowski vacua and dS vacua with upward-only decay channels. Transient nodes are dS vacua with at least one downward decay channel.

Dominant decay channel
The second assumption is motivated by a generic feature of transition rates in quantum field theory, namely that they are exponentially staggered. This is because tunneling rates depend exponentially on the instanton Euclidean action,
$$\Gamma \sim e^{-S_E}\,.$$
For CDL tunneling, for instance, $S_E$ depends sensitively on the shape of the potential, such as the height and width of the barrier. Because of this exponential sensitivity, branching ratios for dS vacua are typically overwhelmingly dominated by a single decay channel, with $T^{\rm dom}_{Ij} \simeq 1$, while other decay channels are comparatively exponentially suppressed, i.e., $T^{\rm other}_{Ij} \simeq 0$. Hence our second simplifying assumption is that we work in the approximation where $T_{Ij}$ is 0 or 1. In other words, either there is a link between two nodes ($T_{Ij} \simeq 1$) or not ($T_{Ij} \simeq 0$). Furthermore, if $T_{Ij} \simeq 1$, then this is the only link emanating from $j$, i.e., the out-degree of $j$ is 1. This is illustrated in Fig. 2. There are of course exceptions, for instance in regular lattices of flux vacua [19]. But we expect that single-channel dominance is justified for random landscapes, which will be our primary interest in Sec. 5.
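To make single-channel dominance concrete, here is a toy numerical sketch (our own illustration; the actions and their spacing are hypothetical, not taken from any specific landscape) of the branching ratios $T_{Ij} = \Gamma_{Ij}/\sum_K \Gamma_{Kj}$ for a single dS vacuum whose decay channels have exponentially staggered rates:

```python
import numpy as np

# Hypothetical Euclidean actions for the decay channels of a single dS vacuum.
# The values and their O(10) spacing are illustrative only.
S_E = np.array([120.0, 137.0, 154.0, 171.0, 188.0])

# Semi-classical rates Gamma ~ exp(-S_E); work with log-rates to avoid underflow.
log_rates = -S_E

# Branching ratios T_Ij = Gamma_Ij / sum_K Gamma_Kj, computed stably
# (a softmax of the log-rates).
T = np.exp(log_rates - log_rates.max())
T /= T.sum()

# The channel with the smallest action dominates overwhelmingly.
dominant = int(np.argmax(T))
```

With an action gap of $\Delta S_E = 17$ the subdominant branching ratio is already of order $e^{-17} \approx 4 \times 10^{-8}$, which is the sense in which $T_{Ij}$ can be approximated by 0 or 1.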

Implications for uniform-in-time probabilities
The two assumptions discussed above greatly simplify the uniform-in-time probabilities (17), and lead to an intuitive interpretation.
• Downward approximation: In this approximation the only contributing paths to a given vacuum $I$ are those composed of a sequence of downward transitions. It follows that the probabilities (17) favor vacua that can be accessed through downward transitions from a large basin of ancestors. In other words, regions of the landscape with large probability must therefore have the topography of a deep valley, or funnel [26,36,37,39]. This is akin to the smooth folding funnels of protein conformation landscapes [45], as sketched on the left panel in Fig. 3.
A similar narrative holds in deep learning. It has been argued that deep neural networks that generalize well have a loss function characterized by a smooth funnel [49]; see right panel in Fig. 3. Another instance is the "big valley" hypothesis in combinatorial optimization (e.g., the search space of the traveling salesman problem), where it is conjectured that local optima are clustered around the central global optimum [50].
• Dominant decay channel: In this approximation, where $T_{Ij}$ is 0 or 1, the measure (17) simply counts the number $s_I$ of ancestor vacua that can reach $I$:
$$P(I) \propto s_I\,. \qquad (25)$$
This is a key result of our analysis. It entails that the probability of occupying a vacuum is proportional to the number of other nodes that can access it through sequences of unsuppressed ($T_{Ij} \simeq 1$), downward transitions. This is illustrated in Fig. 4 for a trivial example.
Thus the problem of determining probabilities on vacua is reduced to a problem of directed percolation. To see this, consider a region of the landscape shown in Fig. 1, comprised of a number of transient dS vacua (blue nodes) and terminals (red nodes). (Recall that in the downward approximation terminals include AdS/Minkowski vacua, as well as dS vacua with only upward decay channels.) Each dS transient has exactly one directed edge emanating from it, corresponding to its dominant decay channel.
Suppose transients decay primarily to terminals, as sketched on the left panel in Fig. 1. In this case, nodes in the region will generically have $s_I \sim O(1)$, corresponding to relatively low probability. From a percolation perspective, the region breaks down into many small disconnected components, and is therefore subcritical. Suppose, on the other hand, that transients decay primarily to other transients, as shown on the right panel. This corresponds to the emergence of a giant directed component, wherein the bottom nodes have $s_I \gg 1$, and therefore high probability.
It is clear from these simple considerations that the uniform-in-time probabilities (17) favor regions of the landscape that are close to the directed percolation phase transition [57]. In what follows we will make this precise by studying directed percolation on random graphs and Bethe lattices.
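These qualitative statements are easy to check numerically. The following sketch (our own toy model; all parameters are hypothetical) builds a landscape region in the dominant decay channel approximation: every transient has out-degree 1, pointing to a random transient with probability q and to a random terminal otherwise, and $s_I$ is obtained by a reverse breadth-first search:

```python
import numpy as np
from collections import defaultdict, deque

def ancestor_counts(n_ds, n_ads, q, rng):
    """Toy landscape region: each of n_ds transients has exactly one outgoing
    (dominant-channel) link, to a random transient with probability q and to a
    random terminal otherwise. Returns {node: s_I}, the number of ancestors of
    every node that receives at least one link, via reverse BFS."""
    parents = defaultdict(list)  # target node -> list of source nodes
    for j in range(n_ds):
        if rng.random() < q:
            target = ("dS", int(rng.integers(n_ds)))
        else:
            target = ("AdS", int(rng.integers(n_ads)))
        parents[target].append(("dS", j))
    counts = {}
    for node in list(parents):
        seen, queue = set(), deque([node])
        while queue:
            for src in parents.get(queue.popleft(), []):
                if src not in seen:
                    seen.add(src)
                    queue.append(src)
        counts[node] = len(seen)
    return counts

rng = np.random.default_rng(3)
subcritical = ancestor_counts(2000, 2000, q=0.2, rng=rng)    # decays mostly to terminals
near_critical = ancestor_counts(2000, 2000, q=0.98, rng=rng)  # decays mostly to transients
```

For q = 0.2 the in-components stay O(1), while for q close to 1 the region develops nodes with $s_I \gg 1$, in line with the right panel of Fig. 1. (Rare directed loops mean a node can occasionally count itself among its ancestors; in the strict downward approximation the graph is acyclic and this does not arise.)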

Percolation on Directed Random Networks
To set up the problem, it is useful to review some essential notions of directed percolation [90,91]. Concretely, in this work we study two simplified approaches for directed percolation on the landscape. In the first framework, discussed in this section, we model a fiducial region of the landscape as a directed random graph with given degree probability distribution. As a special case, the Poissonian degree distribution corresponds to the celebrated Erdös-Rényi graph [51]. In the second approach, discussed in Appendix C, we model the region as a regular lattice, specifically a Bethe lattice.
The directed percolation transition can be studied analytically for both Bethe lattices and (Erdös-Rényi) random graphs [52]. (In fact, they belong to the same universality class, as we will see.) Our focus is on bond percolation, in which the percolation problem on either the Bethe lattice or the Erdös-Rényi random graph is defined by assigning a probability $p$ that a given edge of the graph is "open". While the frameworks considered are highly idealized, they allow us to draw important lessons about percolation phenomena on the landscape, which we believe apply more generally to realistic dynamics.
Although much of the analysis is already in the literature, we include it here for completeness. The reader mainly interested in the punchline can skip to (55), which is the main result for our purposes. Our exposition primarily follows [52], which considers random graphs with general degree distributions. For pedagogical purposes, we have also included in Appendix A a review of percolation on undirected random graphs. Many of the results for the undirected case can be easily generalized to directed random graphs.

Directed random graphs
A directed random graph is specified by a joint in-degree and out-degree probability distribution:
$$p_{jk} = \text{probability of a randomly chosen node having in-degree } j \text{ and out-degree } k\,.$$
It is useful to work in terms of its moment generating function,
$$G(x,y) = \sum_{j,k} p_{jk}\, x^j y^k\,. \qquad (27)$$
Since the distribution is normalized, we have $G(1,1) = \sum_{j,k} p_{jk} = 1$. Its partial derivatives give in- and out-degree moments of the distribution. For instance, the average in- and out-degrees are given by
$$z_{\rm in} = \partial_x G\big|_{x=y=1}\,, \qquad z_{\rm out} = \partial_y G\big|_{x=y=1}\,. \qquad (28)$$
Since every link leaving a node terminates at another node, the average in- and out-degrees must be equal:
$$z \equiv z_{\rm in} = z_{\rm out}\,. \qquad (29)$$
From $p_{jk}$, we can derive the (marginalized) in- and out-degree distributions of a randomly chosen vertex, with generating functions
$$F_0(x) = G(x,1)\,, \qquad G_0(y) = G(1,y)\,. \qquad (30)$$
In particular, we have $F_0'(1) = G_0'(1) = z$. Now, suppose we start from a randomly chosen vertex, and follow each of its outgoing links to reach its 1st-generation descendants (children), as shown on the left panel of Fig. 5. (In the directed case, ignoring loops, it is natural to distinguish the neighbors of a node as descendants and ancestors.^6) Let us denote by $q^{\rm out}_k$ the out-degree distribution of a 1st descendant. Since we are $j$ times more likely to arrive at a vertex with in-degree $j$ than a vertex of degree 1, we have
$$q^{\rm out}_k = \frac{\sum_j j\, p_{jk}}{\sum_{j,k} j\, p_{jk}} = \frac{1}{z}\sum_j j\, p_{jk}\,, \qquad (31)$$
which is correctly normalized. The corresponding generating function is given by
$$G_1(y) = \frac{1}{z}\,\partial_x G(x,y)\Big|_{x=1}\,. \qquad (32)$$
If the original vertex has out-degree $k$, then the number of 2nd descendants is generated by $G_1(y)^k$. (This ignores loops, since their density is $1/N$-suppressed close to the percolation threshold for large $N$, as argued in Appendix A.)
Therefore, the number of 2nd descendants is generated by
$$\sum_{j,k} p_{jk}\, \big(G_1(y)\big)^k = G_0\big(G_1(y)\big)\,.$$
For instance, using (28), (30) and (32), the average number of 2nd descendants is
$$z_2 = z\, G_1'(1) = \partial_x \partial_y G\big|_{x=y=1}\,. \qquad (33)$$
Similarly, suppose we once again start from a randomly chosen vertex, but now follow each of its incoming links in the opposite direction to reach its 1st ancestors (parents). See right panel of Fig. 5. By similar reasoning, the in-degree distribution for a 1st ancestor is generated by
$$F_1(x) = \frac{1}{z}\,\partial_y G(x,y)\Big|_{y=1}\,. \qquad (34)$$
The number of 2nd ancestors of the original vertex is generated by $\sum_{j,k} p_{jk}\, (F_1(x))^j = F_0\big(F_1(x)\big)$. For instance, the average number of 2nd ancestors is $z\, F_1'(1) = \partial_x \partial_y G\big|_{x=y=1}$. This is of course identical to (33), given that the mixed derivative is symmetric under $x \leftrightarrow y$.
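The identity (33) can be sanity-checked numerically. In this sketch (our own; the sparse graph model and the values of N and z are illustrative), a directed graph with Poisson out-degrees of mean z has $G_1 = G_0$, so the average number of distinct 2nd descendants should approach $z_2 = z\,G_1'(1) = z^2$:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy directed random graph: N nodes, each with Poisson(z) out-degree and
# uniformly random targets (a sparse stand-in for a directed Erdos-Renyi graph).
N, z = 20000, 0.8
out_neighbors = [rng.integers(N, size=k) for k in rng.poisson(z, size=N)]

# Average number of distinct 2nd descendants over all nodes.
second = 0
for nbrs in out_neighbors:
    grand = set()
    for v in nbrs:
        grand.update(out_neighbors[v].tolist())
    second += len(grand)
z2_measured = second / N
```

With z = 0.8 the prediction is $z_2 = z^2 = 0.64$; the measured value agrees to within sampling error. Collisions (two paths reaching the same grandchild) are $1/N$-suppressed, echoing the loop-counting caveat above.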

Factorized example
An important example is the case where the in- and out-degree distributions are independent,
$$p_{jk} = p^{\rm in}_j\, p^{\rm out}_k\,. \qquad (35)$$
This includes, as a particular case, Erdös-Rényi random graphs [51]. In a directed Erdös-Rényi graph with $N$ vertices, any two distinct vertices can be connected with a directed edge with probability $p$. Therefore both $p^{\rm in}_j$ and $p^{\rm out}_k$ are given by a binomial distribution:
$$p^{\rm out}_k = \binom{N-1}{k}\, p^k (1-p)^{N-1-k} \simeq \frac{z^k e^{-z}}{k!}\,, \qquad (36)$$
and similarly for $p^{\rm in}_j$. The last step follows from taking the limit $N \to \infty$, keeping the average degree $z = p(N-1)$ fixed, to obtain a Poisson distribution.
The generating function (27) also factorizes, with
$$G_0(y) = e^{z(y-1)}\,, \qquad F_0(x) = e^{z(x-1)}\,. \qquad (37)$$
An immediate consequence of (37) is that $G_1(y) = G_0(y)$, i.e., the out-degree distribution for a 1st descendant is the same as that of the original vertex. Similarly, $F_1(x) = F_0(x)$. It follows that the average number of 2nd descendants (or ancestors), given by (33), satisfies
$$z_2 = z^2\,. \qquad (38)$$
This holds for arbitrary factorized distributions (35), including the Erdös-Rényi case.

In- and out-component size distribution and percolation phase transition
Going back to the general case, the generating functions defined earlier allow us to study the size distribution of connected components. In the directed case, we must distinguish between the out-component, comprised of all descendants of a given vertex, and the in-component, comprised of all ancestors of a given vertex.
Let us focus for concreteness on the out-component, and define $H^{\rm out}_1(y)$ = generating function for the number of descendants reached by following a randomly chosen edge.
Ignoring loops, this generating function satisfies the tree-like consistency condition depicted in Fig. 6a).
From the definition (31) for $q^{\rm out}_k$, the tree-like structure implies the consistency condition
$$H^{\rm out}_1(y) = y \sum_k q^{\rm out}_k \big[H^{\rm out}_1(y)\big]^k = y\, G_1\big(H^{\rm out}_1(y)\big)\,, \qquad (40)$$
where the last step follows from $q^{\rm out}_k$ being generated by $G_1(y)$. Note that the factor of $y$ means that the chosen node is included in the counting.
Similarly, we define $H^{\rm out}_0(y)$ = generating function for the number of descendants of a randomly chosen vertex. (41)
This generating function satisfies the tree-like consistency condition depicted in Fig. 6b). Using the fact that $G_0(y)$ is the generating function for the out-degree distribution of a randomly chosen vertex, the consistency condition in this case sums up to
$$H^{\rm out}_0(y) = y\, G_0\big(H^{\rm out}_1(y)\big)\,. \qquad (42)$$
By definition, $H^{\rm out}_0$ describes finite components, i.e., it excludes the giant out-component. As long as we work below the percolation threshold, such that there is no infinite cluster, then $H^{\rm out}_0(1) = 1$. Above the percolation threshold, $H^{\rm out}_0(1)$ gives the fraction of the vertices that do not belong to the giant component. See Appendix A for more details in the undirected case.
Note that the conditional probability $P^{\rm in}(s|k)$ for an in-cluster having size $s$, given in-degree $k$ of a vertex, is generated by
$$\sum_s P^{\rm in}(s|k)\, x^s = \big[H^{\rm in}_1(x)\big]^k\,.$$
It should be related to the generating function for the number of ancestors, $x^{-1} H^{\rm in}_0(x) = F_0\big(H^{\rm in}_1(x)\big)$, via the definition
$$x^{-1} H^{\rm in}_0(x) = \sum_k p^{\rm in}_k \sum_s P^{\rm in}(s|k)\, x^s\,.$$
Comparison with $F_0\big(H^{\rm in}_1(x)\big) = \sum_k p^{\rm in}_k \big[H^{\rm in}_1(x)\big]^k$ confirms this term by term. A similar derivation applies to the out-cluster conditional probability.
The algorithm for determining $H^{\rm out}_0$ and $H^{\rm out}_1$ is then the following. Given a joint degree distribution $p_{jk}$, with generating function $G(x,y)$, we can determine $G_0(y)$ and $G_1(y)$ using (30) and (32), respectively. Then, the implicit equation (40) can be solved to obtain $H^{\rm out}_1(y)$, and the result is substituted into (42) to obtain $H^{\rm out}_0(y)$. For general random graphs, it is often difficult in practice to solve (40) analytically. It is, however, straightforward to calculate the moments of the size distribution, in particular the average size of (finite) connected components.

For instance, consider the average size $S_{\rm out}(z)$ of the out-component reached from a random vertex. For simplicity we work below the percolation threshold, such that there is no giant component, and $H^{\rm out}_0(1) = H^{\rm out}_1(1) = 1$. Using (42), we have
$$S_{\rm out} = {H^{\rm out}_0}'(1) = 1 + z\, {H^{\rm out}_1}'(1)\,.$$
On the other hand, from (40) we have
$${H^{\rm out}_1}'(1) = 1 + G_1'(1)\, {H^{\rm out}_1}'(1) \quad \Longrightarrow \quad {H^{\rm out}_1}'(1) = \frac{1}{1 - z_2/z}\,,$$
where we have used (33). Therefore
$$S_{\rm out} = 1 + \frac{z^2}{z - z_2}\,,$$
and a giant out-component emerges when
$$z_2 = z\,. \qquad (48)$$
In exactly the same fashion, we can define generating functions $H^{\rm in}_1(x)$ and $H^{\rm in}_0(x)$ for the in-component size, obtained respectively by following a randomly chosen edge and starting from a randomly chosen vertex. In doing so, we follow each incoming link in the opposite direction. These generating functions satisfy the implicit relations
$$H^{\rm in}_1(x) = x\, F_1\big(H^{\rm in}_1(x)\big)\,, \qquad H^{\rm in}_0(x) = x\, F_0\big(H^{\rm in}_1(x)\big)\,.$$
Following identical steps as before, it is easy to derive the average size of the in-component:
$$S_{\rm in} = 1 + \frac{z^2}{z - z_2}\,.$$
Therefore a giant in-component emerges when $z_2 = z$, which is the same as (48). In other words, the giant in- and out-components emerge simultaneously.^7

Scale-free graphs

Scale-free networks are characterized by a degree distribution with power-law tail,
$$p_k \sim k^{-\gamma}\,.$$
We require $\gamma > 2$ in order for the distribution to be normalizable and have finite mean. If $\gamma > 3$, such that the variance is also finite, then the percolation structure belongs to the Erdös-Rényi universality class. So the interesting regime is
$$2 < \gamma < 3\,. \qquad (58)$$
A concrete example is
$$p_k = \frac{z}{\zeta(\gamma-1)}\, k^{-\gamma} \quad (k \geq 1)\,, \qquad p_0 = 1 - \frac{z\, \zeta(\gamma)}{\zeta(\gamma-1)}\,,$$
where $\zeta$ is the Riemann zeta function. The fraction of nodes with $k = 0$ ensures that the distribution is normalized and has mean degree $z$. Like Erdös-Rényi graphs, percolation occurs when $z_c = 1$ [93,94]. At percolation criticality, the distribution of component sizes also exhibits a power-law tail, but with a different critical exponent:
$$P_s \sim s^{-\frac{\gamma}{\gamma-1}}\,. \qquad (60)$$
(Notice that the power matches (54) as $\gamma \to 3$.) Thus each value of $\gamma$ in the range (58) defines its own universality class, comprised of all degree distributions with a scale-free tail with this particular power. The typical size of the giant component is estimated as before by setting $P_s(s_\star) \sim 1/N$.
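The normalization of the scale-free example can be checked directly. In this sketch (our own; the truncation of the zeta sums and the values of gamma and z are illustrative choices), a distribution with $p_k \propto k^{-\gamma}$ for $k \ge 1$ and the stated fraction of $k = 0$ nodes indeed sums to 1 with mean degree z:

```python
import numpy as np

def zeta(s, K=200000):
    """Riemann zeta via direct summation plus an integral estimate of the tail."""
    k = np.arange(1, K + 1, dtype=float)
    return np.sum(k ** (-s)) + K ** (1 - s) / (s - 1)

gamma, z = 2.5, 1.0  # illustrative values in the interesting regime 2 < gamma < 3
kmax = 200000
k = np.arange(1, kmax + 1, dtype=float)

p = (z / zeta(gamma - 1)) * k ** (-gamma)        # p_k for k >= 1
p0 = 1.0 - z * zeta(gamma) / zeta(gamma - 1)     # fraction of k = 0 nodes

total = p0 + p.sum()        # should be ~1 (normalization)
mean_degree = np.sum(k * p)  # should be ~z
```

Note that $p_0 \geq 0$ bounds the admissible mean degree, $z \leq \zeta(\gamma-1)/\zeta(\gamma)$, which comfortably accommodates the critical value $z_c = 1$ for the parameters chosen here.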

Directed Percolation in Eternal Inflation
Lacking detailed knowledge of the underlying string landscape, it is reasonable to model a fiducial landscape region as a random network. With the quantitative results of the previous section at hand, let us briefly recap the approximations underlying the mapping to directed percolation.
1. The downward approximation, in which upward transitions are neglected to leading order. Strictly speaking, the downward approximation requires us to study directed acyclic graphs, i.e., without directed loops. However, as argued earlier, at low connectivity the density of cycles is suppressed by $1/N$, hence directed random graphs offer a reasonable approximation.
2. The dominant decay channel approximation, in which the branching ratio $T_{Ij}$ is either 0 or 1. This relies on semi-classical transition rates in field theory being exponentially staggered, and therefore generically dominated by a single decay channel.
Let us stress that these approximations are made for convenience, to simplify the problem. It is in principle straightforward to generalize our analysis by relaxing them. For instance, if a landscape region includes vacua that are nearly degenerate, such that the downward approximation is invalid, then the corresponding links would be bi-directed. The problem of directed percolation with a finite fraction of bi-directed edges was studied in [97], where it was shown that bi-directed edges act as a catalyst for directed percolation. Similarly, if subdominant transitions are not completely negligible, such that the dominant decay approximation is invalid, then the corresponding network would be a random weighted graph [98].

Generating functions
With these provisos in mind, consider a landscape region with $N_{\rm dS}$ transient nodes (dS vacua) and $N_{\rm AdS}$ terminal nodes. Although the latter also include dS vacua with only upward decay channels, as well as Minkowski vacua, we use the collective "AdS" subscript for simplicity. The moment generating function (27) can be written as
$$G(x,y) = \frac{N_{\rm dS}}{N}\, G_{\rm dS}(x,y) + \frac{N_{\rm AdS}}{N}\, G_{\rm AdS}(x,y)\,,$$
with $N = N_{\rm dS} + N_{\rm AdS}$. Since terminals by definition have vanishing out-degree, we have
$$G_{\rm AdS}(x,y) = G_{\rm AdS}(x,1) = F^{\rm AdS}_0(x)\,,$$
where we have used (30). Furthermore, since transients have out-degree 1 in the dominant decay channel approximation, as depicted in Fig. 2, we should set $G_{\rm dS}(x,y) = y \sum_j p^{\rm dS}_{j1}\, x^j = y\, F^{\rm dS}_0(x)$. However, we will proceed more generally for now, and specialize to $z^{\rm dS}_{\rm out} \simeq 1$ at the end of the calculation. The generating functions (30) for the marginalized in- and out-degree distributions of a randomly chosen node are given by
$$F_0(x) = \frac{N_{\rm dS}}{N}\, F^{\rm dS}_0(x) + \frac{N_{\rm AdS}}{N}\, F^{\rm AdS}_0(x)\,, \qquad G_0(y) = \frac{N_{\rm dS}}{N}\, G^{\rm dS}_0(y) + \frac{N_{\rm AdS}}{N}\,. \qquad (64)$$
The condition (29) for edge conservation gives
$$z = \frac{N_{\rm dS}\, z^{\rm dS}_{\rm in} + N_{\rm AdS}\, z^{\rm AdS}_{\rm in}}{N} = \frac{N_{\rm dS}\, z^{\rm dS}_{\rm out}}{N}\,. \qquad (65)$$
Next, the out-degree distribution for a 1st descendant, given by (32), amounts to weighing by the number of edges:
$$G_1(y) = \frac{1}{Nz}\left[N_{\rm dS}\, \partial_x G_{\rm dS}(x,y)\big|_{x=1} + N_{\rm AdS}\, z^{\rm AdS}_{\rm in}\right]\,. \qquad (66)$$
Similarly, the in-degree distribution for a 1st ancestor, given by (34), reduces to
$$F_1(x) = \frac{N_{\rm dS}}{Nz}\, \partial_y G_{\rm dS}(x,y)\big|_{y=1}\,, \qquad (67)$$
since terminals contribute no outgoing links. The number of 2nd ancestors of a given vertex is generated by $F_0\big(F_1(x)\big)$. In particular, the average number of 2nd ancestors, which equals the average number of 2nd descendants, is
$$z_2 = z\, F_1'(1) = \frac{N_{\rm dS}}{N}\, \partial_x \partial_y G_{\rm dS}(x,y)\big|_{x=y=1}\,. \qquad (68)$$

Percolation phase transition
The derivation of the directed percolation phase transition given in Sec. 5.3 follows identically in the case of interest. For instance, the generating functions $H^{\rm out}_1(y)$ and $H^{\rm out}_0(y)$ for the number of descendants satisfy the same implicit relations (40) and (42):
$$H^{\rm out}_1(y) = y\, G_1\big(H^{\rm out}_1(y)\big)\,, \qquad H^{\rm out}_0(y) = y\, G_0\big(H^{\rm out}_1(y)\big)\,,$$
with $G_0$ and $G_1$ respectively given by (64) and (66). From (48), the directed percolation phase transition occurs when $z_2 = z$. Using (65) and (68), this means
$$\partial_x \partial_y G_{\rm dS}(x,y)\big|_{x=y=1} = z^{\rm dS}_{\rm out}\,. \qquad (70)$$
As mentioned earlier, consistent with the dominant decay channel approximation we should set $G_{\rm dS}(x,y) = y\, F^{\rm dS}_0(x)$, such that the out-degree of transient vacua is precisely 1. To see how percolation works out, it is instructive to keep things slightly more general by assuming that transients have independent in- and out-degree distributions:
$$G_{\rm dS}(x,y) = F^{\rm dS}_0(x)\, G^{\rm dS}_0(y)\,. \qquad (71)$$
In this case the percolation condition (70) reduces to
$$z^{\rm dS}_{\rm in} = 1\,. \qquad (72)$$
Equivalently, from (65),
$$z^{\rm dS}_{\rm out} = 1 + \frac{N_{\rm AdS}}{N_{\rm dS}}\, z^{\rm AdS}_{\rm in}\,. \qquad (73)$$
Equation (73) is a key result of our analysis. From the point of view of dS vacua, the presence of terminals pushes the percolation threshold above unity, i.e., $z^{\rm dS}_{\rm out} > 1$. This makes sense intuitively, as absorbing nodes inhibit the emergence of a giant component. On the other hand, the dominant decay channel approximation tells us that $z^{\rm dS}_{\rm out} \simeq 1$. Therefore, in order for a landscape region to be near percolation criticality, it must satisfy
$$N_{\rm AdS}\, z^{\rm AdS}_{\rm in} \ll N_{\rm dS}\,. \qquad (74)$$
This is the situation shown in the right panel of Fig. 1, wherein dS vacua decay primarily to other transients, and the region includes a giant funnel of size
$$s_\star \sim N_{\rm dS}^{2/3}\,. \qquad (75)$$
In contrast, if a significant fraction of dS vacua decay into terminals, such that $N_{\rm AdS}\, z^{\rm AdS}_{\rm in}$ is comparable to $N_{\rm dS}$, then the landscape region will be subcritical, as shown in the left panel of Fig. 1.
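The jump between subcritical regions and regions containing a giant component can be seen in a minimal simulation (our own sketch, with illustrative parameters): in a toy directed graph with Poisson out-degrees, forward breadth-first search finds only small out-components below the threshold at effective mean degree 1, and an O(N) giant component above it:

```python
import numpy as np
from collections import deque

def largest_reachable(N, z, n_starts, rng):
    """Size of the largest out-component found by forward BFS from a sample of
    n_starts random nodes, in a toy directed graph with Poisson(z) out-degrees
    and uniformly random targets."""
    out = [rng.integers(N, size=k) for k in rng.poisson(z, size=N)]
    best = 0
    for start in rng.integers(N, size=n_starts):
        seen, queue = {int(start)}, deque([int(start)])
        while queue:
            for v in out[queue.popleft()]:
                if int(v) not in seen:
                    seen.add(int(v))
                    queue.append(int(v))
        best = max(best, len(seen))
    return best

rng = np.random.default_rng(11)
N = 3000
sub = largest_reachable(N, z=0.5, n_starts=50, rng=rng)  # below threshold
sup = largest_reachable(N, z=1.5, n_starts=50, rng=rng)  # above threshold
```

Below threshold all out-components stay small; above it, a finite fraction of starting nodes falls into a giant out-component. For z = 1.5 the expected giant out-component fraction solves $f = 1 - e^{-zf}$, giving $f \approx 0.58$.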
As argued in (25), in the downward and dominant decay channel approximations, the probability to occupy a node is proportional to the number of its ancestors: $P(I) \sim s_I$. In other words, vacua with high occupation probability have a large number of ancestors. The probability that a randomly chosen vacuum has $s$ ancestors is precisely given by $P_s$, defined in (52). For subcritical regions, the tail of the component-size distribution is exponentially cut off, as in (53), and therefore vacua in such regions typically have $s \sim O(1)$. For near-critical regions, however, $P_s$ displays a power-law tail, given by $P_s \sim s^{-3/2}$ for the Erdös-Rényi universality class, and $\sim s^{-\gamma/(\gamma-1)}$ with $2 < \gamma < 3$ for scale-free graphs (see (60)). Correspondingly, near-critical regions include nodes whose number of ancestors is of order the size of the giant component, i.e., $s \sim s_\star$.
Thus we arrive at an important realization. To the extent that landscape regions can be modeled as random networks, as we have done, we conclude that vacua with the highest occupational probability reside in landscape regions that are close to directed percolation criticality. As usual, near the percolation phase transition, various observables assume power-law (scale-invariant) probability distributions, characterized by universal critical exponents that are insensitive to the microscopic details of the system. As we are about to show, the critical exponent for $P_s$ translates to a critical exponent for the CC distribution.

Critical Exponent for the Cosmological Constant
Let $f_V(v)$ denote the underlying CC probability distribution function on the landscape, where $v = \Lambda/M^4_{\rm Pl}$ is the dimensionless CC. In what follows we will keep $f_V(v)$ completely general, except for one assumption made at the end, namely that the distribution is smooth as $v \to 0^+$, such that
$$F_V(v) - F_V(0) \simeq f_V(0)\, v\,, \qquad (76)$$
where $F_V(v)$ is the cumulative distribution function.^8

Our task is to derive a probability distribution $P(v)$ that takes into account the measure factor from cosmological dynamics. For this purpose, we focus on landscape regions close to directed percolation criticality. As argued above, such regions include vacua whose number of ancestors is of order the size of the giant component, i.e., $s \sim s_\star$, and which therefore have very high probability. Furthermore, since at most one vacuum in the giant component (in any connected component, for that matter) is a terminal, we are justified in focusing on $v > 0$ to deduce the CC distribution. (This is an obvious consequence of the downward and dominant decay channel approximations.) In other words, vacua in the giant component are overwhelmingly more likely to be dS vacua than AdS. For concreteness we focus on the Erdös-Rényi universality class, and briefly discuss the generalization to scale-free graphs at the end.
Consider a vertex in such a region, and suppose that this vertex has $s$ ancestors and $t$ descendants. If the vertex in question has vacuum energy $v$, then in the downward approximation its $s$ ancestors all have larger vacuum energy, while its $t$ descendants all have lower vacuum energy. In other words, the conditional CC probability distribution $P(v|s,t)$ follows order statistics:
$$P(v|s,t) = \frac{(s+t+1)!}{s!\, t!}\, f_V(v)\, F_V(v)^t \big[1 - F_V(v)\big]^s\,.$$
It is convenient to use the cumulative distribution itself as the random variable, $U \equiv F_V$, such that
$$f_U(u) = f_V(v)\, \frac{{\rm d}v}{{\rm d}u} = 1\,.$$
Clearly $f_U(u)$ is uniform over $u \in [0,1]$. Indeed, the order statistics of $U$ are simply those of the uniform distribution:
$$P(u|s,t) = \frac{(s+t+1)!}{s!\, t!}\, u^t (1-u)^s\,.$$
The desired CC probability distribution is obtained by marginalizing over $s$ and $t$,
$$P(u) = \sum_{s,t} f_S(s)\, f_T(t)\, P(u|s,t)\,, \qquad (80)$$
where we have used the fact that $s$ and $t$ are independent random variables, even for correlated degree distributions [97].
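The order-statistics claim is easy to illustrate numerically (our own sketch, with hypothetical values of s and t): conditioned on s ancestors and t descendants, $U = F_V(v)$ is distributed as the (t+1)-th smallest of $s+t+1$ independent uniform draws, i.e., Beta(t+1, s+1), whose mean $(t+1)/(s+t+2)$ is small when the funnel above the vacuum is large ($s \gg t$):

```python
import numpy as np

rng = np.random.default_rng(5)

s, t = 200, 2        # hypothetical ancestor/descendant counts, with s >> t
n_trials = 20000

# Direct construction: draw s + t + 1 uniforms and take the (t+1)-th smallest,
# i.e. the entry with exactly t values below it and s values above it.
draws = rng.random((n_trials, s + t + 1))
u = np.sort(draws, axis=1)[:, t]

# Beta(t+1, s+1) prediction for the mean of u.
predicted_mean = (t + 1) / (s + t + 2)
```

The empirical mean of u matches $(t+1)/(s+t+2) = 3/204 \approx 0.015$, illustrating how vacua at the bottom of large funnels are driven toward small positive CC.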
The probability distribution $f_S(s)$ is given by
$$f_S(s) = \frac{s\, P_s}{\sum_{s'} s'\, P_{s'}}\,,$$
where $P_s$ is the probability that a randomly chosen vertex has $s$ ancestors, defined in (52). The factor of $s$ is the cosmological measure factor (25), which encodes the fact that the probability to pick a node is proportional to its number of ancestors. Since $P_s \sim s^{-3/2}$ in the tail at criticality, $f_S(s) \sim s^{-1/2}$ is not normalizable for $N_{\rm dS} \to \infty$. For the realistic case of finite $N_{\rm dS}$, however, it is regularized by the giant component of size $s_\star \sim N_{\rm dS}^{2/3}$. It follows that $f_S(s)$ has the power-law behavior
$$f_S(s) \simeq \frac{1}{2\sqrt{s_\star}}\, s^{-1/2}\,, \qquad 1 \ll s \lesssim s_\star\,.$$
Meanwhile, $f_T(t)$ is just the probability distribution $P_t$ that a randomly chosen vertex has $t$ descendants. All we will need is its power-law tail behavior (55):
$$f_T(t) \sim t^{-3/2}\,.$$

Analytic approximation for P(u)

We proceed to evaluate (80) analytically. We assume throughout that $u \ll 1$, and this will be justified a posteriori since the resulting distribution will peak for $u \ll 1$. First, let us suppose that the integral is dominated by the tail region $s, t \gg 1$, such that $f_S(s)$ and $f_T(t)$ are given by the power-law forms above. In this regime, the Beta distribution is well-approximated by a Gaussian, and the sums can be approximated as integrals:
$$P(u) \simeq \int_1^{s_\star} {\rm d}s\, f_S(s) \int_1^{\infty} {\rm d}t\, f_T(t)\, \frac{(s+t+1)!}{s!\, t!}\, u^t (1-u)^s\,.$$
The $t$ integral can be evaluated using Laplace's method. The exponent is stationary for $t_0 = su$. Consistency of the tail approximation requires $t_0 \gg 1$, and in particular $s_\star u \gg 1$. Evaluating the $t$ integral, we obtain
$$P(u) \sim \frac{u^{-3/2}}{\sqrt{s_\star}}\,, \qquad \frac{1}{s_\star} \ll u \ll 1\,,$$
up to a slowly varying logarithmic factor. This peaks for small $u$, as anticipated.
To see that the distribution is well-behaved as $u \to 0$, consider the regime $s_\star u \ll 1$. In this case we can make the approximation $(1-u)^s \simeq 1$ for all $s$, such that (80) becomes
$$P(u) \simeq \sum_{s,t} f_S(s)\, f_T(t)\, \frac{(s+t+1)!}{s!\, t!}\, u^t\,.$$
To proceed, let us assume that $s \gg t$, which can be justified a posteriori, such that $\frac{(s+t+1)!}{s!} \simeq s^{t+1}$. Thus we obtain
$$P(u) \simeq \sum_s f_S(s)\, s \sum_t f_T(t)\, \frac{(su)^t}{t!}\,,$$
which remains finite as $u \to 0$. We will comment further on the empty universe problem and scale-free graphs in the Conclusions.

Conclusions
The near-criticality of our universe may be the strongest empirical hint that we are part of a multiverse. A natural arena to realize this ensemble is the vast energy landscape of string theory, together with the dynamics of eternal inflation to instantiate in space-time the different vacua of the landscape. Making robust statistical predictions for physical observables in our own universe is unquestionably a task of fundamental importance in theoretical physics. Yet, how can we ever hope to make progress towards this goal without a detailed understanding of the string landscape? Fortunately, despite all of its conceptual pitfalls, eternal inflation boils down to a random walk on the network of vacua. Technically this is an absorbing Markov process, because of terminal (AdS/Minkowski) vacua which act as sinks. Hence the dynamics are inherently non-equilibrium. The Markov process leads to a natural definition of probabilities as occupational probabilities for the random walk.
In this paper we showed how the Markov process governing vacuum dynamics can be mapped naturally to a problem of directed percolation on the network of vacua. The mapping relies on two very general and well-justified approximations for transition rates: 1. the downward approximation, which neglects "upward" transitions, as these are generally exponentially suppressed; 2. the dominant decay channel approximation, which capitalizes on the fact that tunneling rates are exponentially staggered.
With these simplifying assumptions, we argued that the uniform-in-time probabilities reduce to a simple and intuitive observable in directed graphs. Namely, the probability to occupy a particular node is proportional to the number of its ancestors, i.e., how many other nodes can reach it through a sequence of directed (downward) transitions. Thus the probabilities favor vacua with a large basin of ancestors, lying at the bottom of a deep funnel. Funneled landscape topography appears to be a common solution to optimization on complex energy landscapes, including protein folding [45], atomic clusters [46][47][48], deep learning [49], and combinatorial optimization [50].
Lacking detailed knowledge of the string landscape, we modeled the network of vacua as random graphs with arbitrary degree distributions, including Erdös-Rényi and scale-free graphs. As a complementary approach, we also modeled regions of the landscape as regular lattices, specifically Bethe lattices. Despite representing extreme opposites of graph regularity, Bethe lattices and Erdös-Rényi graphs belong to the same percolation universality class. Thus one may hope that the lessons drawn from studying percolation in these simplified, idealized setups carry over to more realistic landscapes.
The most important result of our analysis is that the uniform-in-time probabilities favor regions of the landscape poised at the directed percolation phase transition. In other words, our vacuum most likely resides within a network of vacua tuned at directed percolation criticality. As usual, the predictive power of criticality lies in universality. This raises the tantalizing prospect of deriving statistical predictions for physical observables that are insensitive to the details of the underlying landscape. More broadly, it suggests a deep and powerful relation between phase transitions in landscape dynamics and the inferred near-criticality of our universe.
To illustrate this point, we derived a probability distribution for the CC. At percolation criticality, the probability distributions for the number of ancestors and descendants of a given node both display power-law tails, with certain critical exponents. Assuming only that the underlying CC distribution function is smooth near the origin, we derived probability distributions for the CC that are also power-law, with critical exponents determined by the universality class (Erdös-Rényi or scale-free). These distributions favor a small, positive CC, and can account for the observed CC if our vacuum belongs to a large enough region. In fact, since the measure favors vacua with the largest number of ancestors, we are likely to inhabit the largest funnel region near percolation criticality.

Figure 7: Starting from a randomly-chosen ("original") vertex, we follow one of its links (dashed line) to a 1st neighbor. The probability that this 1st neighbor has k edges excluding the one we followed is q_k, with generating function G_1(x) given by (99).
There are many future directions of inquiry worth pursuing. Let us mention two concrete follow-ups:
• It is possible to derive probability distributions for other physical observables. To give one example, consider a node of in-degree k. The joint probability distribution for its k parents to have s_1, s_2, . . ., s_k ancestors, P(s_1, . . ., s_k | k), translates to a joint distribution for the potential energy of its parents, given by P(v_1, . . ., v_k | k). These statistics inform us about the energy scale of the last period of slow-roll inflation, which of course has immediate bearing on the observational prospects of detecting primordial gravitational waves. It also bears on the potential empty-universe problem discussed in Sec. 7. For instance, one could condition on sufficiently large k, i.e., a large number of parents.
• The connection with other complex energy landscapes deserves further exploration. A remarkable aspect of protein folding networks is that they are scale-free, with the native state acting as a hub with very large degree [105]. Protein folding networks also display the small-world property, and are hierarchical. Similar properties are found in atomic clusters with Lennard-Jones interactions [46][47][48]: their funnel topography is hierarchical (funnels nested within larger funnels), and the degree distribution is scale-free. It may be that such properties are generic to optimization on complex energy landscapes. It will be fascinating to explore their implications in the context of landscape dynamics.
with normalization condition G_0(1) = \sum_{k=0}^{\infty} p_k = 1. Its derivatives give as usual the moments of the distribution, such as the average degree: z = \langle k \rangle = \sum_k k p_k = G_0'(1). The next important quantity is the degree distribution of a vertex reached by following a randomly-chosen edge. This distribution is not simply p_k, since we are k times more likely to arrive at a vertex of degree k than at a vertex of degree 1. Therefore this distribution is proportional to k p_k. Now, suppose we start from a randomly-chosen vertex, and follow each of its links to reach the 1st neighbors. We are interested in the "excess degree" of the 1st neighbors, which excludes the edge we arrived along. Let q_k = probability that a 1st neighbor has excess degree k. This is shown in Fig. 7. This distribution is related to p_k by q_k = (k+1) p_{k+1} / \sum_j j p_j = (k+1) p_{k+1} / z, and its generating function is given by G_1(x) = \sum_k q_k x^k = G_0'(x)/z. Now, if the original vertex has degree k, then the number of its 2nd neighbors is generated by G_1(x)^k. This ignores loops, which are negligible in the large-N limit below the percolation threshold, as argued below. It follows that the number of 2nd neighbors is generated by G_0(G_1(x)). For instance, the average number of 2nd neighbors is z_2 = \frac{d}{dx} G_0(G_1(x)) \big|_{x=1} = G_0'(1) G_1'(1) = G_0''(1), where we have used (97) and G_1(1) = 1.
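As a numerical sanity check of these identities (an illustrative sketch, not part of the paper's analysis), one can truncate a Poisson degree distribution and verify z = G_0'(1) and z_2 = G_0''(1) = z^2 by finite differences:

```python
import math

z = 0.8   # assumed mean degree of a Poisson degree distribution
kmax = 60  # truncation; the Poisson tail is negligible far beyond the mean

# Poisson degree distribution p_k = e^{-z} z^k / k!
p = [math.exp(-z) * z**k / math.factorial(k) for k in range(kmax)]

def G0(x):
    """Generating function G0(x) = sum_k p_k x^k."""
    return sum(pk * x**k for k, pk in enumerate(p))

def G1(x):
    """Excess-degree generating function G1(x) = G0'(x)/z,
    built from q_k = (k+1) p_{k+1} / z."""
    return sum((k + 1) * p[k + 1] * x**k for k in range(kmax - 1)) / z

h = 1e-5
G0p = (G0(1 + h) - G0(1 - h)) / (2 * h)             # G0'(1) = z
G0pp = (G0(1 + h) - 2 * G0(1) + G0(1 - h)) / h**2   # G0''(1) = z^2 for Poisson
z2 = G0p * (G1(1 + h) - G1(1 - h)) / (2 * h)        # z2 = G0'(1) G1'(1)
```

For a Poisson distribution G_0 and G_1 coincide, so z_2 = z^2 = 0.64 here.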

A.1 Erdös-Rényi example
An important example is the Erdös-Rényi graph [51]. An undirected Erdös-Rényi graph is a random graph with N vertices, in which an edge between any two distinct vertices has a probability p of being included. Therefore the probability distribution p_k is binomial and tends to the Poisson distribution for large N, as given by (36). The generating functions (96) and (99) in this case are given by G_0(x) = G_1(x) = e^{z(x-1)}, with z = p(N-1). The average number of 2nd neighbors (100) is p^2 (N-1)(N-2) \simeq p^2 N^2, and therefore z_2 \simeq z^2. To see that the probability of having loops is suppressed by 1/N below or close to the percolation threshold, consider the probability of having two 1st neighbors connected to each other. For a random vertex with degree k there are k(k-1)/2 possible edges between pairs of its neighbors, each included with probability p, so the chance of having at least one such edge is at most p k(k-1)/2. In Erdös-Rényi graphs, p \simeq z/N, thus the chance of having triangular loops is \simeq z k(k-1)/(2N) \sim 1/N in the large-N limit.
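The relation z_2 \simeq z^2 is easy to check by direct sampling. A quick Monte Carlo sketch (illustrative only; the graph size and mean degree are arbitrary choices, not from the paper):

```python
import random
from collections import defaultdict

def erdos_renyi(n, p, rng):
    """Sample an undirected Erdös-Rényi graph G(n, p) as adjacency sets."""
    adj = defaultdict(set)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def mean_second_neighbors(adj, n):
    """Average number of distinct vertices at graph distance exactly 2."""
    total = 0
    for v in range(n):
        second = set()
        for u in adj[v]:
            second |= adj[u]
        second -= adj[v]
        second.discard(v)
        total += len(second)
    return total / n

rng = random.Random(0)
n, zbar = 600, 0.8          # zbar below the percolation threshold z_c = 1
p = zbar / (n - 1)
est = sum(mean_second_neighbors(erdos_renyi(n, p, rng), n)
          for _ in range(10)) / 10
# est should be close to zbar**2 = 0.64
```

Below threshold the graph is tree-like, so the sampled count of 2nd neighbors matches the generating-function prediction z^2.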
On the other hand, from (104) we have H_1'(1) = 1/(1 - G_1'(1)), so that S = H_0'(1) = 1 + G_0'(1) H_1'(1) = 1 + \frac{z^2}{z - z_2}, where in the last step we have used (97) and (100). It is clear that the percolation threshold where a giant component emerges is given by the condition z_2 = z, equivalently G_1'(1) = 1. (Notice that this is identical to (48), hence the percolation structure matches that of the directed case.) This result holds for any degree distribution p_k with finite mean and variance. More generally, above the phase transition S(z) gives the average size of finite clusters, i.e., excluding the giant component. Using (107), the generalization of (108) is S = 1 + \frac{z u^2}{[1 - P(z)][1 - G_1'(u)]}, where H_1(1) = u is the probability that there is no infinite cluster going down an edge. It is the smallest solution of u = G_1(u), with 0 \leq u \leq 1. As an explicit example, consider Erdös-Rényi graphs. Combining (102) and (110), we recover the classic result that percolation occurs at a critical mean degree z_c = 1, and corresponding critical probability p_c \simeq 1/N. Using (101), the implicit equations (104) and (106) are readily solved in terms of the Lambert W function: H_0(x) = H_1(x) = -\frac{1}{z} W(-z x e^{-z}). The fraction of vertices belonging to a giant cluster (107) is given by P(z) = 1 - H_0(1) = 1 + \frac{1}{z} W(-z e^{-z}). Equivalently, P(z) satisfies the well-known implicit relation P(z) = 1 - exp(-zP(z)). As shown in Fig.
9. Given a solution w⋆ to (120), we can solve (119) to find the corresponding x⋆: x⋆ = w⋆/G_1(w⋆), and thus s_max via (117). It should be stressed that (120) need not have a solution. When it does not, the asymptotic behavior of P_s is not of the form (116). We will see, however, that a solution to (120) exists close to the phase transition. Indeed, at the percolation threshold, defined by G_1'(1) = 1, (120) and (121) are solved by x⋆ = w⋆ = 1. Correspondingly, s_max → ∞. To determine the power-law for P_s at criticality, let us expand (119) around x⋆ = w⋆ = 1: 1 - x \simeq \frac{G_1''(1)}{2} (1 - w)^2, where we have used G_1(1) = G_1'(1) = 1. The derivation clearly relies on G_1''(1) being finite. From (99), this amounts to assuming that the first three moments of the degree distribution are finite. This is the case, in particular, for the Poissonian distribution, but not for scale-free graphs. It follows from (123) that 1 - w \simeq \sqrt{\frac{2}{G_1''(1)}} (1 - x)^{1/2}. Therefore, since w = H_1(x), we deduce that the singular behavior of H_1, and therefore that of H_0(x) as well, near x = 1 is given by (1 - x)^\beta, with critical exponent \beta = 1/2. The critical exponent \beta = 1/2 is related to the critical exponent \tau for P_s as follows. On the one hand, from (125) the singular part of H_0(x) scales as (1 - x)^{1/2}. On the other hand, the asymptotic form (116) implies that H_0(x) can be expressed as H_0(x) = \sum_{s < a} P_s x^s + C \sum_{s \geq a} s^{-\tau} x^s, where C is a constant, and a is sufficiently large. The singular behavior as x → 1 is encoded in the infinite sum. Focusing on this term we have C \sum_{s \geq a} s^{-\tau} e^{-s \ln(1/x)}. In the limit of large a, i.e., large s, the infinite sums can be approximated as integrals: \sum_{s \geq a} s^{-\tau} e^{-s \ln(1/x)} \simeq \int_a^\infty ds \, s^{-\tau} e^{-s \ln(1/x)} \propto \left(\ln(1/x)\right)^{\tau - 1} \simeq (1 - x)^{\tau - 1}. Substituting into (128) and taking the limit x → 1, we obtain that the singular part of H_0 scales as (1 - x)^{\tau - 1}. Since \beta = 1/2, it follows that \tau = 3/2, which proves (54).
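The exponent τ = 3/2 can also be checked numerically. For a Poisson (Erdös-Rényi) graph the cluster-size distribution is known in closed form as the Borel distribution, P_s = (zs)^{s-1} e^{-zs}/s!, whose tail at criticality (z = 1) scales as s^{-3/2} by Stirling's formula. A short sketch (illustrative, not code from the paper):

```python
import math

def log_borel(s, mu=1.0):
    """log of the Borel distribution P_s = (mu*s)^(s-1) e^(-mu*s) / s!,
    the cluster-size distribution of a Poisson graph; mu = 1 is critical."""
    return (s - 1) * math.log(mu * s) - mu * s - math.lgamma(s + 1)

# Effective power-law exponent of P_s between s = 10^2 and s = 10^4:
s1, s2 = 100, 10_000
slope = (log_borel(s2) - log_borel(s1)) / (math.log(s2) - math.log(s1))
# slope should approach -tau = -3/2 at criticality
```

Stirling's approximation gives log P_s ≈ -(3/2) log s + const at μ = 1, so the fitted slope is very close to -3/2.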

Appendix C Bethe Lattice and its Percolation Structure
As a second approach to percolation on the landscape, the underlying graph topology is modeled as a Bethe lattice (or Cayley tree).This approach is appropriate whenever transitions are strictly "local" in field space, i.e., they are non-negligible only between "nearest-neighbor" vacua, while transitions to more distant vacua can be safely ignored.

C.2 Directed Bethe lattice
In the situation of interest, the downward approximation implies that the graph is a directed Bethe lattice. This reflects the fact that low-lying (high-lying) vacua have mostly incoming (outgoing) edges. Choosing a fixed vertex as the center, we denote by r the probability that an edge is directed outward from this center, and by s = 1 - r the probability of the edge being directed inward. Thus the probability of including an outward edge is pr, and that of an inward edge is ps. For general r, vertices are no longer indistinguishable, since the chosen center vertex is singled out. (The exception is r = 1/2, where every vertex is still statistically the same.) Nevertheless, away from the center vertex, successive layers retain the same self-similar structure.
Certain properties are insensitive to r, such as the emergence of the giant component. One can ask about the probability for the existence of a directed path from the center vertex to infinity. The derivation of P(p) in the undirected case applies just as well to the directed case, with the replacement p → pr. Therefore the probability for the existence of a path to infinity and the probability for a path from infinity are respectively given by: a path to ∞ : P(pr) ; a path from ∞ : P(ps).
Similarly, the average sizes of the outward- and inward-pointing clusters are respectively given by S(pr) and S(ps). Obviously their sum must be strictly smaller than the average cluster size in the undirected case, S(pr) + S(ps) - 1 < S(p), below the percolation threshold, p ≤ p_c.
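For the undirected Bethe lattice with coordination number k, these quantities follow from a standard fixed-point equation: the probability u that an edge does not lead to infinity satisfies u = (1 - p + pu)^{k-1}, and P(p) = 1 - (1 - p + pu)^k, with threshold p_c = 1/(k-1). A minimal numerical sketch of this textbook result (not code from the paper); the directed case follows by replacing p with pr or ps:

```python
def bethe_P(p, k=4, iters=500):
    """Probability that a vertex of a Bethe lattice with coordination
    number k belongs to an infinite cluster at occupation probability p.
    Iterates u = (1 - p + p*u)^(k-1) from u = 0, which converges
    monotonically to the smallest root in [0, 1]."""
    u = 0.0
    for _ in range(iters):
        u = (1.0 - p + p * u) ** (k - 1)
    return 1.0 - (1.0 - p + p * u) ** k

# For k = 4 the threshold is p_c = 1/(k-1) = 1/3, as in Figure 11:
# bethe_P(0.3) is essentially zero, while bethe_P(0.5) is of order one.
```

Below p_c the only root is u = 1, so P vanishes; above p_c a smaller root appears and P turns on continuously.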

Figure 1 :
Figure 1: The loss surfaces of ResNet-56 with/without skip connections, reproduced from [49]. The proposed filter normalization scheme is used to enable comparisons of sharpness/flatness between the two figures.

Figure 3 :
Figure 3: The uniform-in-time probabilities favor regions of the landscape with the topography of a deep valley, or funnel. This is akin to the free energy landscapes of naturally-occurring proteins (Left, reproduced from [89]), which are characterized by a large funnel near the native state. Right: A similar narrative holds in deep learning. This shows the loss surfaces of ResNet-56 with/without skip connections, reproduced from [49]. Skip connections lead to better generalization, and correspond to a loss function characterized by a smooth funnel.

Figure 4 :
Figure 4: In the downward and dominant decay channel approximations, the probability for a given node is proportional to the number of its ancestors. In this simple example, the bottom node has s = 4 ancestors.

Figure 5 :
Figure 5: Left: 1st and 2nd-generation descendants of an original vertex. Right: 1st and 2nd-generation ancestors of that vertex.

Figure 6 :
Figure 6: a) Tree-like structure satisfied by the generating function H_1^out for the number of descendants following a randomly-chosen edge; b) Same structure, but for the generating function H_0^out starting from a randomly-chosen vertex.


Figure 9 :
Figure 9: Percolation criticality for Erdös-Rényi graphs. Blue curve: the fraction of vertices P(z) belonging to the giant component, given by (113). Purple curve: the average size of finite clusters, S(z), excluding the giant component, given by (114).
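The blue curve of Figure 9 is easy to reproduce numerically: P(z) solves the implicit relation P = 1 - exp(-zP), which converges under fixed-point iteration. A minimal sketch (not code from the paper):

```python
import math

def giant_fraction(z, iters=10_000):
    """Solve P = 1 - exp(-z P) by fixed-point iteration from P = 1.

    Starting from P = 1 converges to the physical (largest) root;
    for z <= 1 the only root in [0, 1] is P = 0."""
    P = 1.0
    for _ in range(iters):
        P = 1.0 - math.exp(-z * P)
    return P

# P vanishes below the threshold z_c = 1 and turns on continuously
# above it, e.g. giant_fraction(2.0) is roughly 0.797.
```

This reproduces the continuous onset of the giant component at z_c = 1 shown in the figure.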

Figure 11 :
Figure 11: The probability P(p) for a vertex to belong to an infinite cluster (blue) and the average cluster size S(p) (orange) as a function of p, for an undirected Bethe lattice with degree k = 4, corresponding to p_c = 1/3.