Rethinking Mean-Field Glassy Dynamics and Its Relation with the Energy Landscape: The Surprising Case of the Spherical Mixed p -Spin Model

The spherical p -spin model is a fundamental model in statistical mechanics of a disordered system with a random first-order transition. The dynamics of this model is interesting both for the physics of glasses and for its implications on hard optimization problems. Here, we revisit the out-of-equilibrium dynamics of the spherical mixed p -spin model, which differs from the pure p -spin model by the fact that the Hamiltonian is not a homogeneous function of its variables. We consider quenches (gradient descent dynamics) starting from initial conditions thermalized in the high-temperature ergodic phase. Unexpectedly, we find that, differently from the pure p -spin case, the asymptotic states of the dynamics keep memory of the initial condition. The final energy is a decreasing function of the initial temperature, and the system remains correlated with the initial state. This dependence disproves the idea of a unique “ threshold ” energy level attracting dynamics starting from high-temperature initial conditions. Thermalization, which could be achieved, e.g., by an algorithm like simulated annealing, provides an advantage in gradient descent dynamics and, last but not least, brings mean-field models closer to real glass phenomenology, where such a dependence is observed in numerical simulations. We investigate the nature of the asymptotic dynamics, finding an aging state that relaxes towards deep, marginally stable minima. However, careful analysis rules out simple generalizations of the aging solution of the pure model. We compute the constrained complexity with the aim of connecting the asymptotic solution to the energy landscape. in the integration step Δ t , and the resulting integration error in using Δ t ¼ 0 . 1 is very small compared to the precision needed in the asymptotic evaluation of the observables.


I. INTRODUCTION
Understanding the relation between the dynamical behavior and the underlying energy landscape is a fundamental question in the physics of glassy systems. The same question arises in a broader interdisciplinary perspective in optimization, information theory, and computer science. In these fields, the comprehension of the performances of classes of iterative algorithms represents both a practical and a fundamental theoretical problem. One notable example, among many, is provided by supervised learning, where one formalizes data learning as the problem of minimizing a loss function that depends parametrically on the data [1].
The development of the mean-field theory of spin glasses has provided unifying concepts that, starting from glassy materials, also allowed deep insight in optimization, inference, and machine learning [2].
Simple mean-field spin-glass models have been fundamental to understand the emergence of glassy dynamics within the so-called random first-order transition (RFOT) scenario [3] since they provide a concrete realization of the phenomena observed in more realistic glass formers. Many recent reviews exist [4][5][6] that discuss in detail the RFOT and the simple spin-glass models related to it. The same mean-field spin-glass theory has recently been used to address fundamental problems in the theory of computation [2] and to model the jamming state via high-dimensional interacting spheres [7], thus showing that key concepts of this theory have a broad range of applications.
Theoretical progress has been paralleled by the invention of powerful optimization algorithms-such as simulated annealing [8], parallel tempering [9], and quantum annealing [10]-that mimic physical dynamics, or message passing algorithms, such as survey propagation [11] and approximate message passing [12], which are based on the mean-field equations for spin glasses.
The spin-glass model that, more than any other, has shaped our theoretical ideas about the relation between the energy landscape and the relaxation dynamics is the spherical p-spin model [13]. This model is a system of spherical spins that interact through a disordered and long-range p-body Hamiltonian. Despite its simplicity and its abstract nature, the model is important. In a nutshell, its very explicit exact solution provides the complete description of glassy phenomena in the mean field, including the long-time dynamics [14,15], thermodynamics [13], and the structure of metastable states [16]. Arguably, this model makes the RFOT more explicit and makes it easier to produce theoretical predictions [17]. Thanks to these virtues, the same system has been proposed as a simplified model to rationalize the properties of the loss function and the dynamics of learning in deep networks [18] and in problems involving tensors such as tensor completion and tensor principal component analysis [19,20]. Last but not least, the model has been widely used in the mathematical physics community to understand spin-glass mean-field theory in rigorous terms [21].
The picture of the glassy dynamics that emerges from the p-spin model is very attractive. Below the temperature T MCT , identified as an ideal mode-coupling transition [22], the dynamics following a quench from a random initial condition falls in an aging state [15]. In this state, despite the lack of equilibration, any memory of the initial state is lost and the energy of the system tends to a temperaturedependent threshold value extensively higher than the equilibrium one. A crucial prediction, which has deep consequences both for glassy physics and for algorithms, is that the same asymptotic energy is reached if instead of starting from a completely random condition the system is quenched from an initial condition well thermalized at some temperature above T MCT (see Fig. 1, left diagram).
In the standard aging solution, the property that the asymptotic dynamics forgets the initial condition, as well as any configuration reached at finite times, is called weak ergodicity breaking [23]. It is essential to the solution of the long-time relaxation dynamics [15] because it allows us to decouple the dynamics at short and long times. It is believed to hold in any mean-field spin-glass model, although very recent results are raising some doubt [24].
In the context of glasses, the independence on the initial condition contrasts with simulations of realistic glassformers. In these systems, one can define a temperature value T onset well in the liquid region, where relaxation crosses over from exponential to stretched exponential. Around this temperature and below, the states reached after quenching, deemed inherent structures (IS), have energy that is sensibly dependent on the starting temperature [25,26].
This mismatch has often been considered as a failure of mean-field theory to describe an important aspect of finitedimensional glassy dynamics, the main argument being based on the assumption that the phenomenon is due to activated processes that are absent in mean-field models, where barriers are divergent with the system size. The net result is the lack of a theoretical comprehension of dynamical onset within RFOT.
For algorithms, the above picture rules out simulated annealing as a valid optimization protocol in systems with an ideal RFOT transition. Any time spent in the ergodic phase is irrelevant for optimization: In the long run, the system gets stuck at the universal threshold energy level. By extension, it has been hypothesized that algorithms running in a polynomial fail to optimize below threshold.
Unfortunately, despite its popularity, the p-spin model possesses a symmetry that makes it singular among meanfield spin glasses. Its Hamiltonian is a homogeneous function of its variables. While it is very well known that this homothetic symmetry puts in a one-to-one correspondence finite-temperature metastable state with energy minima, the independence of the low-temperature asymptotic energy on the initial condition is generally believed to be a generic feature of fully connected models.
In this paper, we present analytical evidence that this belief needs to be revised. As soon as we move to the mixed p-spin model [27,28] where the Hamiltonian is no longer homogeneous, the physical picture changes drastically. This model shares with the simpler pure p-spin model the property that one can write closed dynamical meanfield equations that can be integrated numerically with high precision at finite time and analyzed analytically in the asymptotic regimes.
Our main findings are schematically summarized in Fig. 1. In the mixed model, an onset temperature exists, such that below T onset the system never forgets the initial configuration, and it relaxes to marginal states whose energies depend on the initial temperature. If we define FIG. 1. Schematic representation of the different behavior of gradient descent dynamics in pure and mixed p-spin models. In the pure model (left diagram), all initial temperatures higher than T MCT lead to the same aging dynamics at the threshold energy, while starting below T MCT the dynamics is trapped within a state: Here, T MCT is the only relevant dynamical critical temperature. In the more generic mixed model (right diagram), two dynamical critical temperatures exist, T onset and T SF , such that, starting from a temperature in between, the dynamics tends to a marginal state depending on the initial configuration while aging and keeping memory of the short-time evolution. the threshold energy as the value reached after a quench from a completely random point (infinite temperature), we find that in this model, the relaxation dynamics can go below threshold while aging and keeping memory of the short-time evolution. This picture is richer than the one that is usually associated with a RFOT, thus providing support for an improved RFOT already at the mean-field level.
(i) For glasses, we have a simple mean-field model, more generic than the pure p-spin model, allowing us to naturally incorporate the onset of glassy dynamics in the unifying theoretical framework of RFOT. This central result shows that, differently from what is commonly believed, genuinely finitedimensional phenomena such as activation-but also interface dynamics or strongly spatially heterogeneous fluctuations-need not necessarily be invoked to explain the appearance of the dynamical onset. (ii) For algorithms, we have that even in cases where the asymptotic energy turns out to be larger than the ground-state energy, thermalization in the hightemperature region is helpful. Relaxation dynamics can go below the threshold energy even without jumping over barriers. Thus, the present model gives rigorous support to the empirical observations that annealing works and suggests explanations that do not require activated processes. Unfortunately, in the interesting regime where the energy goes below threshold, the asymptotic dynamics is not described by a simple generalization of the Cugliandolo-Kurchan aging solution [15]. The relation between the dynamics and landscape in mean-field models of glasses is more complex and also harder to solve than previously thought.
The paper is organized as follows. The next section briefly notes the salient features of mean-field glasses. In Sec. III, we define the model. In Sec. IV, we discuss the properties of the landscape and the number of stationary points of the energy surface of the model. Section V presents the equations describing the off-equilibrium dynamics and the effect of a nonrandom initial condition. Section VI constitutes the core of the paper, where we discuss the energy of inherent structures, the puzzling properties of the zero-temperature dynamics, and its relation to the landscape. The results are extended to relaxation in the presence of a small thermal noise in Sec. VII. We discuss an approximated dynamical solution and the emergence of an onset temperature in Sec. VIII. We comment on the results and their consequences in Sec. IX. The paper comprises a large number of Appendixes devoted to the discussion of more technical aspects of the work that support and complement the main text. In Appendix A, we present some formal solutions of the long-time asymptotic aging regime and its connections with the effective potential theory. An approximate solution to the asymptotic dynamics is analyzed in Appendix B. We present the computation of the constrained complexity of energy minima in Appendix C. In Appendix D, we discuss finite-temperature dynamics in equilibrium. Appendix E deals with the properties of the response function at short times in the aging regimes. Finally, in Appendix F, we describe the numerical solution of the dynamical equations.

II. SETTING THE STAGE
A widely used statistical characterization of the energy landscape sampled by the glassy dynamics is the numerical study of the inherent structures (ISs), defined as the local minima of the energy potential reached by a steepest descent procedure starting from well-thermalized initial conditions at temperature T [25,[29][30][31][32]. The thermalization temperature of the initial configuration is sometimes called the parent temperature, but here, when there is no ambiguity, we simply call it temperature. In simulations of glass-forming liquids, it is observed that starting from thermalized configurations at temperature T, the energy of the corresponding ISs concentrates around a value E IS ðTÞ that depends on T only below a characteristic onset temperature T onset . Remarkably, this temperature is higher than the estimated "mode-coupling temperature" T MCT , where mode-coupling theory would predict structural arrest. Similar effects are typical in finite-dimensional disordered systems and are also observed, e.g., in 3D Heisenberg spin glasses [33]. Here, simulated annealing is effective: In order to minimize the energy, equilibrating at lower and lower temperatures is a better strategy than a sudden quench from a high-energy state.
As stressed in the previous section, according to the common belief, the temperature dependence of the IS energy is a consequence of activation and should not appear in models with long-range interactions: Initial conditions thermalized above T MCT would all have the same value of the IS energy; any memory of the initial condition would be lost after a long enough time.
However, this belief is based on the solution of the spherical pure p-spin model [13], which, in a nutshell, gives the following dynamical picture: (1) From the point of view of equilibrium dynamics, the model provides an exact realization of the mode-coupling theory scenario. One finds dynamical freezing at a temperature T MCT , where the free energy does not present singularities [14].
(2) Randomly chosen initial conditions evolved according to the Langevin dynamics at temperature T f < T MCT fail to equilibrate. They fall in an aging state, which fails to attain a time-translation-invariant long-time regime and present a nontrivial response to perturbations [15]. (3) Any memory of the initial condition is lost in the aging state; the choice of an initial condition thermalized at temperatures T in > T MCT would not change the asymptotic of a dynamics in the presence of a bath at temperature T f < T MCT . There is a universal threshold energy E th that attracts the zerotemperature dynamics of any such initial conditions, i.e., E IS ðTÞ ¼ E th for all T > T MCT . (4) Initial conditions thermalized at T < T MCT live in metastable states that can be followed down in temperature [27,34,35]. In that case, one reaches an energy E 0 ðTÞ, which is monotonous in T and smaller than the threshold value E th . Unfortunately, such initial conditions cannot be generated in a finite time.
Such a detailed dynamical picture is deeply rooted in the structure and organization of the stationary points of the energy and the free-energy landscapes [36]. (A) For E < E th , almost all stationary points of the energy are isolated and stable minima. The stability gap of the minima (i.e., the minimum eigenvalue of the Hessian) is uniquely a function of the energy. Saddle points are exponentially suppressed and have only a finite number of unstable directions. (B) For E > E th , almost all stationary points are saddles. (C) Minima with E ¼ E th are marginal. Dynamics is attracted by these threshold states, no matter what the thermalization effort is at T > T MCT . It is clear that simulated annealing is of no use for such a system. (D) Below-threshold energy minima are separated by extensive barriers. There is a one-to-one correspondence between energy minima and finite-temperature metastable states (minima of the Thouless-Anderson-Palmer free-energy [16]). Metastable states can be followed up and down in temperature.
The general picture above, both for the dynamics and for the structure of the landscape, is not restricted to the spherical p-spin model; it is generic to mean-field models with a discontinuous glass transition (one-step replica symmetry breaking, or 1RSB in spin-glass jargon) and has been confirmed by the exact description of glasses of particles in the limit of infinite dimensions [37]. However, there are important aspects of the physics of the spherical p-spin that are deeply model-specific. Its Hamiltonian is homogeneous and does not allow for bifurcations or merging of metastable states while changing the temperature. The energy landscape is simpler than more generic mean-field models. It is well known that in other mean-field modes like the Ising p-spin model, the Potts glass or even spherical models with mixtures of multibody interactions [27], when metastable states are followed down in temperature, bifurcations are encountered. This result leads to the glass-to-glass Gardner phase transition, which has received much attention recently [38].
The study of metastable states below such a transition reveals puzzling aspects. The mean-field solutions describing the evolution of states in temperature may unexpectedly disappear [39] as the temperature is lowered. This anomalous behavior has not received a satisfactory explanation, and its implications are not very well studied. In this paper, we reexamine dynamics in models where these phenomena occur and challenge crucial aspects of the p-spin dynamical picture: the emergence of a universal dynamical threshold and the loss of memory of high-temperature initial conditions.

III. MODEL DEFINITION
We concentrate on the mixed p-spin spherical model. This model, while presenting a more generic phenomenology, keeps much of the analytical simplicity of the pure spherical p-spin. In the mixed model, states bifurcate, and the above-mentioned anomalous behavior occurs. The Hamiltonian of the mixed p-spin model is just a sum of p-spin interacting terms with different p values, Choosing the couplings J ðpÞ i 1 …i p as independent random Gaussian variables of zero mean and variance, E½ðJ ðpÞ i 1 …i p Þ 2 ¼ ðp!=2N p−1 Þ, and the coefficients a p ≥ 0 with P p a p < ∞, we find that H J is a random Gaussian function on the N-dimensional sphere ( We choose fðqÞ as a polynomial such that the model has a RFOT in thermodynamics, associated with an ideal mode-coupling transition in equilibrium dynamics. In addition, we exclude from our analysis models that can present a Gardner transition [38] to a continuous replica symmetry-breaking state. It is well known that a sufficient condition for this transition is that the function f is such that 1= ffiffiffiffiffiffiffiffiffiffiffi f 00 ðqÞ p is a convex function of q, a property that we assume throughout the paper. The pure p-spin model corresponds to the case of a single monomial fðqÞ ¼ 1 2 q p ; we study a more generic class of polynomials f where at least two coefficients are positive (mixed p-spin model). While we keep the function f generic in formulas, all of our numerical results are presented for fðqÞ ¼ 1 2 ðq 3 þ q 4 Þ and compared with the pure case with p ¼ 3.
In other examples, we checked that all of the results we get in this particular case are typical of general mixed models.
To investigate how much of the above picture based on the solution of the pure p-spin model is generic, we perform a high-precision integration of the dynamical equations describing the evolution of the system in the thermodynamic limit, starting from a thermalized initial condition at temperature T and subsequently undergoing a gradient descent dynamics. Our main finding is that the above picture is not generic, and inhomogeneous models show several unexpected features. The value of the asymptotic energy predicted under the assumption of aging with FOLENA, FRANZ, and RICCI-TERSENGHI PHYS. REV. X 10, 031045 (2020) 031045-4 loss of memory is only found starting from high enough temperatures. We find strong evidence for the existence of an onset temperature T onset > T MCT below which the energy of the IS depends on the initial temperature T and goes below the threshold value.

IV. ENERGY LANDSCAPE
In order to put our dynamical results in the right framework, we first discuss the structure and organization of the stationary point of the mixed Hamiltonian. The stationary points of the Hamiltonian H½σ on the sphere The parameter μ, which we call the radial reaction force, is a Lagrange multiplier that ensures the spherical constraint.
In any stationary point, it takes the value The radial reaction is directly related to the nature and the stability of the stationary points; in fact, the Hessian matrix (intended to be restricted to fluctuations on the sphere) reads It is well known [36] and rigorously proven [40] that H 00 ij is a random matrix that belongs to the Gaussian orthogonal ensemble (GOE) with variance Var½H 00 ij ¼ ð1=NÞf 00 ð1Þ. Thus, the matrix M has a semicircular spectral distribution ρðλÞ ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðλ−μÞ 2 −4f 00 ð1Þ p =(π ffiffiffiffiffiffiffiffiffiffiffi f 00 ð1Þ p ), which has a lower band edge at λ min ¼ μ − 2 ffiffiffiffiffiffiffiffiffiffiffi f 00 ð1Þ p . We notice that the value of the radial reaction determines the nature of the stationary points [41]. It is natural to define a marginal value of the radial reaction μ mg ¼ 2 ffiffiffiffiffiffiffiffiffiffiffi f 00 ð1Þ p that separates minima from saddles with an extensive number of negative directions. Stationary points of H are overwhelmingly minima if μ > μ mg , and they are saddles or maxima if μ < μ mg . Notice that all marginal minima with vanishing spectral gaps lie on the manifold specified by μðσÞ ¼ μ mg . In the pure model, at any stationary point, the radial reaction and the energy verify μ ¼ −pE. In the mixed model, there is no such one-to-one relation between μ and E in stationary points. This case can be seen by computing the number of stationary points in Eq. (2), for which both E and μ are fixed, N ðE; μÞ ¼ e NΣðE;μÞ .
The computation of the complexity ΣðE; μÞ was performed in Ref. [42], and we quote the result here [for completeness, a physicist's derivation and the explicit formula for DðμÞ is provided in Appendix C], ΣðE; μÞ ¼ DðμÞ − E 2 f 00 ð1Þ þ ðE þ μÞ 2 f 0 ð1Þ 2ffð1Þ½f 00 ð1Þ þ f 0 ð1Þ − f 0 ð1Þ 2 g : ð5Þ The above expression holds when it returns a positive value. Fixing μ, the complexity ΣðE; μÞ is a parabola in E with a curvature that is finite for mixed models [see Fig. 2, where we show the complexity for the (3 þ 4)-spin model] and that tends to infinity in the pure limit. For μ < μ max , with μ max implicitly defined by 0 ¼ ΣðE max ; μ max Þ ¼ ∂ E ΣðE max ; μ max Þ with E max the location of the parabola maximum, there is a finite interval of energies ½E − ðμÞ; E þ ðμÞ such that, within the interval, ΣðE; μÞ > 0. Most importantly, this is true for the threshold value of the radial reaction μ mg . Seen from the point of view of μ, the organization of the stationary points resembles the one of the pure model; however, from the point of view of the energy, there are deep differences. In the pure model, for a given level of energy, the index of any stationary point [43] differs, at most, by a finite number from the index of dominating ones. In the mixed models, for a given energy, stationary points coexist with macroscopically different indices: i.e., their spectra are shifted by a finite amount. In particular, as it is apparent in Fig. 2, there is a whole interval in energy where marginally stable states exist. A unique The solid line (E < E th ) represents the complexity of the dominant stable minima (μ > μ mg ¼ 6); its continuation above E th (shortdashed line, μ < μ mg ¼ 6) represents the complexity of the dominant saddles. The parabolic long-dashed lines below represent the complexity fixing the radial reaction: μ ¼ 6 corresponds to marginal minima (dashed blue line), μ > 6 to stable minima, and μ < 6 to saddles. For the same value of E, stationary points of different nature coexist, and more importantly, marginal states (μ ¼ 6) exist in a broad energy range E ∈ ½−1.716; −1.644.
relation between E and μ only holds for the exponentially dominating stationary points.

V. DYNAMICAL EQUATIONS IN THE OUT-OF-EQUILIBRIUM REGIME
Given the Hamiltonian in Eq. (1), we consider the following Langevin dynamics: where T f is the temperature of the thermal bath (later set to zero) and β ¼ 1=T is the inverse temperature at which the initial condition is equilibrated. The time-dependent radial reaction force μðtÞ is the Lagrange multiplier that constrains the dynamics on the sphere, and at time t, it takes the value Taking the thermodynamic limit, N → ∞, at finite times, Eq. (6) implies closed integrodifferential equations [14,44,45] for the correlation Cðt; t 0 Þ ≡ hσ i ðtÞσ i ðt 0 Þi and for the response Rðt; t 0 Þ ≡ ½∂hσ i ðtÞi=∂ξ i ðt 0 Þ that vanishes for t < t 0 and verifies Rðt þ ; tÞ ¼ 1. For an initial temperatures T greater than the Kauzmann temperature of the thermodynamic phase transition T K , the dynamical equations, valid for t > t 0 , read [35] ∂ t Cðt;t 0 Þ¼−μðtÞCðt;t 0 Þþ The initial condition enters the above equations in a rather simple way. For β ¼ β f , one gets equilibrium dynamics, whose MCT transition is illustrated in Appendix D. For β ¼ 0, the equations reduce to the usual form that is valid for an uncorrelated initial condition. In this case, for T f < T MCT , the equations admit a Cugliandolo-Kurchan weak ergodicity-breaking aging solution completely analogous to the one of the pure model. It is possible to show that in this solution, for T f → 0, the radial reaction μðtÞ tends to the marginal value μ mg for t → ∞, while the energy tends to the threshold value In correspondence with this value of energy, the complexity ΣðE th ; μÞ, as a function of μ, is maximum for μ ¼ μ mg . In other words, E th is not the energy of the most numerous marginal states but the energy where marginal states exponentially dominate the complexity.
It is possible to see that for T < ∞, the same aging solution solves the dynamical equations in the long-time limit, if one assumes that memory of the initial condition is lost and Cðt; 0Þ → 0 at a large time. If this is the correct description for all T > T MCT , then the asymptotic energy would be independent of T in this domain.
We study the dynamical equations in Eq. (8) by numerical integration. The simplest possibility, which we adopt here, is a fixed time step Δt, first-order Euler discretization algorithm that gives very reliable results at short times [46] and allows for reliable extrapolations in the Δt → 0 limit (see Appendix F). Other algorithms with variable time steps have been proposed and used in the literature [47,48]. Unfortunately, in the case of a mixed p-spin model with a correlated initial condition (β > 0), these algorithms appear to be unstable at short times and do not allow any improvement over the simplest one. With the Euler integration scheme and a maximum step of Δt ¼ 0.1, we reach times of order 10 3 . These timescales are often not large enough to allow for a simple naive extrapolation to the infinite-time limit. Nonetheless, under some conditions, they will allow us to make clear claims on the large-time dynamics. In this paper, we concentrate on the case of zerotemperature dynamics, where the energy monotonically decreases and, once below threshold, it cannot increase again. However, we show in Sec. VII that this T f ¼ 0 result extends to a range of positive temperatures, T f > 0.

VI. AFFINITIES AND DIVERGENCES BETWEEN MIXED AND PURE MODELS
Here, we discuss the core question of this paper and compare the energy reached by the T f ¼ 0 gradient descent FOLENA, FRANZ, and RICCI-TERSENGHI PHYS. REV. X 10, 031045 (2020) 031045-6 dynamics from different initial conditions to the threshold energy that can be computed from the asymptotic solution with β ¼ 0 [15] or by computation of the complexity of minima [16]. For the 3 þ 4 model, the value of the threshold energy is E th ¼ −71=42 ≃ −1.6905. In Fig. 3, we show the curves of the energy as a function of time for different T, ranging from T ¼ ∞ to T ¼ 0.796, the dynamical transition temperature of the model being T MCT ¼ 0.805166 (see Appendix D). Data are obtained from an integration with step Δt ¼ 0.1, reaching the maximum time of 2500. This step size gives a relative integration error smaller than 10 −5 (see detailed discussion in Appendix F).
On the timescale we can reach and for T ≳ 1, the energy seems to have reached an asymptotic behavior well fitted by a power law, EðtÞ ¼ E th þ a=t γ , with a slightly dependent on temperature and γ ¼ 0.66 AE 0.01, making us confident that E th is the asymptotic value of the energy in this range of temperatures. On the other hand, if T is small enough but still larger than T MCT , the curves of the energy go below E th . For comparison, in the upper inset we present the same curves for the pure model with p ¼ 3. In that case, the energy manifestly tends to the threshold value for all T ≥ T MCT . These results suggest a scenario with a new dynamical transition in the mixed model, with an onset temperature T onset separating a memoryless phase for T > T onset from a phase with memory at T < T onset . It is also interesting to look at T < T MCT . In our data, we do not observe anything special happening at T MCT ; it is only below a state-following temperature T SF ≈ 0.798 that we observe fast relaxation and the energy decaying exponentially to its asymptotic value. We can predict the asymptotic energy in this region through a quasistatic solution of the dynamical equations ("state following" with memory of the initial condition). In this region, the dynamics just consists in a simple relaxation inside a stable energy minimum, which conserves its identity as a metastable ergodic component when the temperature is changed. In principle, the state-following procedure could also be used above T SF , but we know, in that regime, it provides just an approximate solution that eventually disappears if the temperature is too high [39]. We discuss this solution in more detail in Appendixes A and B. In the next section, we show that a small temperature T f in the dynamical equations does not change this behavior.
In order to understand the dynamical mechanisms that allow us to beat the threshold, we look at the behavior of the response and correlations. The first quantity that we study is the correlation with the initial state, Cðt; 0Þ. According to the usual weak ergodicity-breaking scenario, this quantity should vanish at large t. Both in the mixed and in the pure model, we observe that the relaxation of Cðt; 0Þ is slower and slower as T decreases. As for the energy, we can identify three temperature regimes for T > T K : (I) a high-temperature standard aging regime T > T onset , where memory of the initial condition is lost and Cðt; 0Þ → 0; (II) an intermediate hic sunt leones regime T SF < T < T onset , where the relaxation is slow, any extrapolation is difficult, but evidence suggests aging with memory is taking place, i.e., q 12 ¼ lim t→∞ Cðt; 0Þ > 0; (III) a low-temperature state-following regime T < T SF , where Cðt; 0Þ relaxes exponentially fast to q 12 > 0.
While no aging is found in regime III, in regimes I and II, we find aging that is qualitatively similar in the two cases. In Fig. 4  Even though the energy goes below threshold, we find compelling evidence that the dynamics remains critical and approaches asymptotically, marginally, stable minima. We reach this conclusion by studying the asymptotic behavior of the radial reaction μðtÞ, which, for t → ∞, determines the spectral gap of the asymptotically visited states. In Fig. 5, we can clearly see that both in region I and II, the radial reaction tends to the marginal value μ mg ≡ 2 ffiffiffiffiffiffiffiffiffiffiffi f 00 ð1Þ p of marginal minima. In region III, by contrast, the system reaches stable minima and μð∞Þ > μ mg . Aging behavior is often qualified by studying the relation between the response and correlation [49]. Figure 6 shows the parametric plot of the integrated response χðt; sÞ ≡ R t s Rðt; uÞdu versus Cðt; sÞ as a function of s for various values of t. In zero-temperature dynamics, in the large-time limit, the integrated response has a jump in C ¼ 1 to the intrastate susceptibility χ EA . In a given minimum, and in marginal minima, The curves in Fig. 6 show the formation of a jump in C ¼ 1 that is very well compatible with this value.
The memoryless solution [15] predicts a linear behavior for C < 1 with the slope given by the "fluctuationdissipation ratio" y 0 ≡ ffiffiffiffiffiffiffiffiffiffiffi f 00 ð1Þ p =f 0 ð1Þ − χ mg . On the timescales we can access, both the pure and the mixed models show, for all the values of T > T SF , after the jump to χ ≃ χ mg , an approximately linear part. The linear behavior continues until a rather well-defined timedependent value of the correlation, q 0 ðtÞ > Cðt; 0Þ, where the response essentially stops increasing. The value q 0 ðtÞ identified in Fig. 4 marks the value that separates slow dynamics, where an effective temperature emerges, from fast partial decorrelation from the initial condition, during which the system does not respond (see Appendix E). In the memoryless phase, T > T onset , both q 0 ðtÞ and Cðt; 0Þ go to zero for t → ∞, while they tend to nonzero limits in the aging-with-memory phase.
Having established that the system relaxes towards marginal minima, the question is which marginal minima are selected. In an attempt to address this question, we study the constrained complexity [50] ΣðE; μ; q 12 ; TÞ of energy minima of fixed radial reaction μ, energy E, and correlation q 12 from a reference configuration thermalized at temperature T. The quantity for q 12 ¼ 0 reduces to the one in Eq. (5) studied in Sec. IV. The detailed computation of this constrained complexity is presented in Appendix C. The computation reveals that for q 12 > 0, we have qualitatively the same picture illustrated in Fig. 2, but with ΣðE; μ; q 12 ; TÞ < ΣðE; μÞ, with ΣðE; μ; q 12 ; TÞ monotonically decreasing in q 12 . One can define a generalized threshold energy that depends on T and q 12 , E th ðq 12 ; TÞ,  where marginal states dominate the constrained complexity. It turns out that E th ðq 12 ; TÞ is a decreasing function of q 12 and T. Unfortunately, as explained in detail in Appendix C, while the resulting E th ðq 12 ; TÞ has the right qualitative behavior, the corresponding minima have features that are incompatible with the actual attractors of the dynamics. The complete static characterization of the attractors of the dynamics remains an important open problem.

VII. EXTENDING RESULTS TO FINITE-TEMPERATURE RELAXATION DYNAMICS
As we said in Sec. V, for T f ¼ 0, the energy monotonically decreases, while if the bath temperature T f is positive, this is not necessarily the case. If we integrate the dynamical equations for a small temperature T f , we observe that the finite-time energy is continuous, At T f > 0, the Lyapunov function of the Langevin dynamics, which decreases monotonically in time, is the free-energy functional [51]: where P t ½σ is the probability distribution of configurations at time t induced by the initial condition and the dynamics. In principle, the energy function EðtÞ could be nonmonotonous in time for T f > 0 because an entropy increase ΔSðtÞ ¼ S ∞ − SðtÞ > 0 can induce an energy increase ΔE > 0. However, since the free energy is a decreasing function of time, we have ΔF ¼ ΔE − T f ΔS < 0, and the energy increase is upper-bounded by ΔE < T f ΔS. This quantity tends to zero for T f → 0. Consequently, if the temperature is small enough, the energy EðtÞ, which is below threshold by an Oð1Þ amount, cannot be pushed back to the threshold value.
In Fig. 7, we show the energy reached at several times t ∈ f125; 250; 500; 1000g during the integration of the dynamical equations as a function of the temperature T f . The topmost solid line is the analytic prediction for the threshold energy as a function of T f . We observe that the dynamical energy lies below the threshold energy and shows a continuous and nicely linear behavior in T f . The weak time dependence of the data shown in Fig. 7 suggests the energy is close to its asymptotic value. Even insisting that an eventual entropy increase may induce an energy increase at later times, this increase would tend to zero for small T f , and thus, by continuity, there would be a finite T f range where the energy would certainly remain below the threshold value even at an infinite time.

VIII. APPROXIMATE ASYMPTOTIC SOLUTION TO THE DYNAMICS
The dependence of the final energy on T in regime II witnesses a form of memory of the initial condition. In contrast, thermalized configurations lie in basins of attraction of different marginal states. The asymptotic ansatz of Cugliandolo and Kurchan has been generalized in Ref. [27] by supposing "aging within a metabasin" and a nonvanishing correlation with the initial state lim t→∞ Cðt; 0Þ ¼ q 12 > 0. Unfortunately, the resulting equations for χ, y, q 12 , q 0 only have an aging solution that is different from the "amnesic" one (q 12 Even when the solution exists, the values of the various parameters do not match the observations; in particular, one finds y ≈ 0.3, while in dynamics, y ≈ 0.52. This aging solution terminates at the temperature T SF , where q 0 tends to 1. Below that temperature, there is a stable state-following solution with no aging, q 12 > 0, μ > μ mg and χ < χ mg (see Appendix A for a detailed derivation). This solution correctly reproduces the asymptotic values of the energy, correlation, and radial reaction of the large-time dynamics for temperatures T ≲ T SF . Despite all the efforts, we have been unable to find a different asymptotic dynamical ansatz describing aging with memory in the hic sunt leones regime II and/or to find in the numerics an indication of where the memory of the initial condition could affect the asymptotic solution. The lack of simple aging solutions is certainly related, and equally paradoxical, to the phenomenon of a lost solution in the state-following procedure of Ref. [39].
Insisting on using a 1RSB dynamical ansatz with χ EA ¼ χ mg , μ ¼ μ mg , and y ≃ y 0 , which is supported by the numerical solution, one can easily derive the following relation: A second relation can be obtained from the observation that χ tot ðtÞ ≡ R t 0 dsRðt; sÞ ≈ χ mg þ y 0 (1 − q 0 ðtÞ) depends approximately linearly on Cðt; 0Þ even at finite times (see dotted lines in Fig. 6). Moreover, this relation does not present an appreciable temperature dependence in the range T SF ≤ T ≲ 1 (data shown in Appendix B) and implies a linear relation between q 12 and q 0 that can be estimated from two analytically known limits: For T large enough, q 12 ¼ q 0 ¼ 0, while at T ¼ T SF , we have q 0 ¼ 1 and q 12;SF ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 − f 0 ð1Þ=f 00 ð1Þ p . Assuming the relation q 12 ¼ q 12;SF q 0 , Eq. (10) admits a solution with q 0 > 0 if T < T onset ≡ q k 12;SF =y 0 , where k is defined by fðqÞ ∝ q k for q → 0 (in the 3 þ 4-model T onset ¼ 0.91). This approximated solution gives an asymptotic energy close to the one extrapolated from the numerical integration (see Fig. 8). Even though this is not an exact solution of the asymptotic equations, it gives us a strong indication that the passage from memoryless aging to aging with memory could be marked by a phase transition. In addition, it shows that the exact solution cannot be too far from a 1RSB weak ergodicity-breaking solution inside a basin.

IX. DISCUSSION AND PERSPECTIVES
Solving the out-of-equilibrium relaxation dynamics in the spherical mixed p-spin model, starting from a thermalized configuration at temperature T, we have uncovered several interesting and unexpected features. While the known solution of the relaxation dynamics in the pure p-spin model suggested the existence of a unique dynamical phase transition at T MCT , where ergodicity breaks down, and of a unique threshold energy where the dynamics relaxes below T MCT , our results about the mixed p-spin model reveal a much richer scenario.
Our main result is summarized in the schematic picture in Fig. 9: In the mixed p-spin model, there is an entire temperature range around T MCT (especially above it) where the final energy depends on T and memory of the initial condition is kept. A temperature T onset marks the onset of this effect. This behavior resembles realistic glass formers but was unexpected in mean-field models because such a phenomenon was ascribed to activated processes, which are absent in the out-of-equilibrium relaxation of mean-field models. Our results show, contrary to common wisdom, that the dependence on the initial configuration can arise even in long-range models, following a purely relaxation dynamics.
From the numerical solution of the dynamical mean-field equations, this dynamical onset of memory may be a crossover or a new genuine phase transition. We have found an approximate solution of the relaxation dynamics at large times that describes the onset as a phase transition and provides an analytic prediction for the onset temperature. Thus, we finally have a solvable (although approximate) model whose predictions can be compared with numerical experiments.
Another piece of common wisdom that the present work demystifies is the mantra "relaxation dynamics in meanfield models relaxes to the threshold energy where the most numerous metastable states are found." Calling threshold energy the energy where the relaxation dynamics converge if starting from a random configuration (T ¼ ∞), we have shown that (i) metastable states of an infinite lifetime (i.e., locally stable states) also exist above the threshold energy and (ii) the most numerous states are above the threshold energy. The threshold energy can thus be understood as the energy value where the measure is dominated by marginal states, which can be computed via the unconstrained complexity. Above the threshold energy, an exponential number of marginal states already exist, but the measure over stationary points is dominated by saddles. The same argument is trivialized in the pure p-spin model since, in the latter, marginal states exist only at the threshold energy; thus, assuming convergence to marginal states, we can uniquely determine the corresponding threshold energy.
We have shown evidence that the correct way to understand the large-time limit of the relaxation dynamics is to assume that such a dynamical out-of-equilibrium process  always tends to marginal states. For T < T onset , the persistent memory of the initial configuration suggests that we compute a constrained complexity of these marginal states that might be correlated with the initial condition. We have performed this task with different levels of accuracy and success.
For an initial temperature below T SF , which is strictly smaller than T MCT in the mixed p-spin model, the starting configuration lies in a well-defined metastable state that can be followed down to zero temperature. In this situation, the dynamics never enters an aging regime, and its asymptotic behavior can be computed from a standard state-following computation.
The most interesting regime is the one reached with an initial temperature T SF < T < T onset . The relaxation process falls asymptotically in an aging regime, but it remains correlated with the initial configuration. In this regime, the constrained complexity, while confirming the existence of marginal states below threshold and correlated with the initial condition, fails to identify the ones chosen by the dynamics.
We notice that the mode-coupling transition temperature lies in this regime, T SF < T MCT < T onset , but plays no role at all in the out-of-equilibrium relaxation, which is controlled by T onset and T SF . The central role T MCT that has played until now is due to the accidental equality T SF ¼ T MCT ¼ T onset holding in the pure p-spin model. All our results remain true for dynamics performed at a small temperature T f .
We believe that, contrary to the pure p-spin model, the mixed p-spin model we studied is generic enough that the properties we uncovered should typically hold in models with a mean-field RFOT, such as Potts glasses [52], Ising p-spins [53][54][55], and, importantly, glasses of spherical particles in the infinite-dimensional limit [37].
In conclusion, on the basis of the above results, we have to revisit our common beliefs about the relaxation dynamics in p-spin models and, more generically, in models presenting a mean-field RFOT. The physical picture that has been a central paradigm of our scientific community, where a simple connection between the relaxation dynamics and the complexity of metastable states exists, is unfortunately valid only in the case of the pure p-spin model and is false in more general models.
In pure p-spin models, strong symmetries lead to the equalities T SF ¼ T onset ¼ T MCT , and these make many of the interesting phenomena disappear that we described in this work and that have misled us for two decades with a too-simplistic connection between the dynamics and energy landscape. Mixed models appear to be well suited to describe the dynamical onset of glassy phenomenology; they have much more complex energy landscapes, which deserve a more thorough investigation in the future. The exact solution to the asymptotic dynamics remains unknown, and the connection between asymptotic dynamics and the energy landscape is still very open in spherical mixed p-spin models. Further research is needed in two directions: finding algorithms capable of reaching large times in solving the dynamical equations, and establishing new theoretical ideas on possible structures of the aging with memory to understand the asymptotic aging regime.
The richer scenario we have uncovered in this study is likely to also have an impact on the many applications in machine learning we discussed in the Introduction. Given the simplest connection between relaxation dynamics and the energy landscape, it is clear that more in-depth studies of the energy landscape will be needed in order to predict the performance of algorithms (a first example can be found in Ref. [56]), especially those that do not start from a random configuration but use some smart initialization to improve their performance [57].

APPENDIX A: ASYMPTOTIC SOLUTIONS
In Ref. [27], it was shown that the asymptotic solution to the dynamical equations, assuming a simple aging with memory within a state with a unique effective temperature, has parameters χ, q 0 , q 12 , and y that satisfy the same equations that can be obtained by extremizing the Franz-Parisi (FP) potential computed in the one-step replica symmetry-breaking (1RSB) scenario, provided that the parameter y is fixed by a marginality condition.
In our case, the 1RSB FP potential has to be computed at zero temperature (T f ¼ 0), while the reference configuration is in equilibrium at temperature T ¼ 1=β, − 2V 1RSB ðq 12 ; χ; q 0 ; yÞ The saddle-point equations and the marginality condition thus read RETHINKING MEAN-FIELD GLASSY DYNAMICS AND ITS … PHYS. REV. X 10, 031045 (2020) 031045-11 We notice en passant that Eqs. (A2) do not always select the minima of the FP potential because, in order to do that, one needs to set to zero the total derivative with respect to q 12 and not the partial derivative. The energy and the radial reaction are given, respectively, by Equations (A2) always admit the solution with q 12 ¼ q 0 ¼ 0, representing the memoryless or "amnesic" aging solution with parameters and, consequently, the energy and radial reaction are given by For the (3 þ 4)-spin model, the numerical values for the above parameters are χ mg ¼ 1=3, y 0 ¼ 11=21 ≃ 0.52381, E th ¼ −71=42 ≃ −1.69048, and μ mg ¼ 6.
A nontrivial aging-with-memory solution with strictly positive values for q 12 and q 0 , disconnected from the amnesic aging solution, exists only for temperatures below T 0 . Moreover, at an even lower temperature T SF , we have that q 0 → 1, and this solution becomes replica symmetric. In Fig. 10, we report the values of q 0 , q 12 , and y in the aging solution with memory. In the (3 þ 4)-spin model, the existence domain for this solution is upper bounded by T 0 ¼ 0.8031557, where the solution disappears by a square-root singularity, and lower bounded by T SF ¼ 0.7982754, where q 0 → 1 and the solution becomes replica symmetric (RS). At T ¼ T SF , the parameters extremizing the potential can be computed analytically. The value of q 12 is more easily obtained from the RS solution (see below), while the value of y can be obtained from the third-order expansion in ε ¼ 1 − q 0 of the difference between the two equations in (A2), thus obtaining The numerical integration of the off-equilibrium dynamics does not give any indication in favor of this solution; in particular, for the (3 þ 4) model, this solution has y ≈ 0.3 in the whole range of validity, which is incompatible with the value y ≈ 0.52 that we obtain from the numerical solution of the dynamics. Even if we believe that the dynamics would slowly cross over to this solution on timescales that we cannot observe numerically, this would not solve the puzzle of the behavior of the asymptotic energy going below threshold for T > T 0 . In this solution, the value of q 0 tends to 1 at temperature T SF , well above the temperature of the static (Kauzmann) phase transition of the model. The solution above is the only exact aging solution that we found; we searched for, without success, more complex solutions with more than one effective temperature. When q 0 → 1, the 1RSB potential reduces to the RS potential, FOLENA, FRANZ, and RICCI-TERSENGHI PHYS. REV. X 10, 031045 (2020) takes the marginal value χ mg at T SF and becomes smaller than this value at lower T, as shown in the upper panel of Fig. 11. Our estimate T SF ¼ 0.7982756 comes from exactly imposing χ ¼ χ mg in this RS solution. In the lower panel of Fig. 11, we plot the values of q 0 and q 12 obtained in the RS state-following solution on the left of the vertical axis, which is located exactly at T ¼ T SF . On the right of the vertical axis, we show the corresponding values computed in the aging RSB solution (the same as plotted in Fig. 10). We observe that q 12 is a smooth function at T SF . Its value at T SF can be computed analytically from the first equation in Eq. (A6) by imposing χ ¼ χ mg , leading to For T ≤ T SF , the FP potential has a secondary minimum described by a replica symmetric ansatz. Thus, we are effectively describing the quenching process from temperature T to zero temperature as a state-following process [39]: The observation that, both at the starting and ending temperatures, the state we are following is well described by a replica symmetric ansatz is evidence that static-dynamics equivalence holds in this case. This is indeed what we observe by comparing the asymptotic dynamics obtained by numerically integrating the dynamical equations to the values derived here from the thermodynamic FP potential. For T ≤ T SF , aging disappears, and one finds a simple agingless relaxation within a state described by the parameters computed in the secondary minimum of the FP potential.
Given the difficulty in finding a 1RSB solution by solving the equations for the asymptotic dynamics, one may wonder whether a solution with more steps of RSB would be helpful. The thermodynamics of the model, even under the constraint of a fixed overlap with the initial configuration, is exactly solved by the 1RSB ansatz. Thus, a higher-order RSB solution is unlikely to be needed, and, as shown in Appendix B, the actual asymptotic dynamics looks very similar to the one of a 1RSB model. Nonetheless, we cannot exclude such a possibility: Let us just stress that, if different RSB ansatz are required for the thermodynamic and the dynamic solutions, the staticsdynamics connection that has been used so much in the field of spin glasses [58][59][60] would be strongly weakened.

APPENDIX B: SEMIEMPIRICAL RELATIONS TO DESCRIBE LARGE-TIME DYNAMICS
We have seen that there is no simple aging solution to the asymptotic equations with q 12 > 0 that is consistent with the finite-time numerical integration of the dynamical equations. Nonetheless, the FD plot in the mixed (3 þ 4)-spin model does not differ in any appreciable way from the usual ones of the pure p-spin model with a single effective temperature and a value of y equal to the one that holds for the β ¼ 0 initial condition. Representing the data in the hic sunt leones region II, as much as possible, with a single-effective-temperature dynamical ansatz, we study what kind of relation can be derived between the overlaps q 0 and q 12 . We suppose that μ and χ take the marginal values μ mg ≡ 2 ffiffiffiffiffiffiffiffiffiffiffi f 00 ð1Þ p and χ mg ≡ 1= ffiffiffiffiffiffiffiffiffiffiffi f 00 ð1Þ p . Moreover, we assume that the FD slope in the range Cðt; t 0 Þ ∈ ½q 0 ðtÞ; 1 is independent of the initial temperature; thus, it is equal to the value obtained with β ¼ 0, i.e., y 0 ≡ ffiffiffiffiffiffiffiffiffiffiffi f 00 ð1Þ p =f 0 ð1Þ − 1= ffiffiffiffiffiffiffiffiffiffiffi f 00 ð1Þ p . Finally, we assume the response is null in the range Cðt; t 0 Þ ∈ ½Cðt; 0Þ; q 0 ðtÞ, where the correlation decays extremely fast.
Within this 1RSB ansatz, the asymptotic value for the radial reaction is given by which must hold in the asymptotic solution. We notice at this point that using Eq. (A3), the energy can be written as Only in the pure model does Eq. (B1) imply ΔE ¼ 0; in all the other cases, it gives a nonvanishing ΔE.
A second relation between q 0 and q 12 can be derived from the observation of the FD plots in Fig. 12, where each curve is a parametric plot of χðt; t 0 Þ versus Cðt; t 0 Þ at fixed t, varying t 0 . We notice that, below the onset temperature, the points (Cðt; 0Þ; χðt; 0Þ) closely follow the dash-dotted curve that we derive analytically. It is important to remark that the dash-dotted curve is the same in all the panels of Fig. 12, so it is a temperature-independent relation. In addition, the dashed line is temperature independent and corresponds to the FD relation in the aging solution with β ¼ 0, which is a line of slope −y 0 connecting the points ð0; χ mg þ y 0 Þ and ð1; χ mg Þ.
In order to obtain the dash-dotted line, we assume that in the aging regime [i.e., for χðt; 0Þ ≥ χ mg ], the relation between Cðt; 0Þ and χðt; 0Þ is linear, while in the regime χðt; 0Þ < χ mg , the dynamics asymptotically explores a state; thus, we assume the relation holds within the RS state-following solution. The latter can be easily derived from the first equation in Eq. (A6), which thus holds for q 12 ∈ ½q 12;SF ; 1, while the linear part has slope −y 0 =q 12;SF and passes through the points ð0; χ mg þ y 0 Þ and ðq 12;SF ; χ mg Þ. We notice that the dash-dotted curve has a continuous first derivative at the point ðq 12;SF ; χ mg Þ, as can be easily checked by taking derivatives. Our asymptotic ansatz thus implies a very simple relation between the overlaps describing the asymptotic aging regime, namely, and plugging this relation into Eq. (B1), it is easy to find that a solution with q 12 > 0 can exist only if where k is defined by fðqÞ ∝ q k for q → 0, i.e., the smallest power appearing in f (in the 3 þ 4 model, T onset ¼ 0.91). Even though this is not an exact solution of the asymptotic equations, it is a strong indication that there is a phase transition between a memoryless phase where dynamics decorrelates from the initial condition and falls over the "usual" threshold states with E ¼ E th and a phase where aging takes place in a confined space, with an asymptotic energy below threshold and depending on T.
In Fig. 13, we report even stronger evidence that this approximate solution provides a very good description of  There is no need to comment on which one is better in describing the asymptotic dynamics. Given that the 1RSB solution has three main defects: it is not available above T 0 , it predicts a slightly wrong q 12 and a completely wrong y.
Since the onset temperature in Eq. (B4) depends only on the smallest-degree monomial in fðqÞ, one may wonder what the different effects are of perturbing a pure model with a smaller-or larger-degree polynomial. In order to answer this interesting question, let us study a mixed p-spin model that interpolates between pure p-spin models by just changing a single parameter a, According to this definition, fðqÞ is at most a binomial; it recovers a pure p-spin model for a ¼ 0, 1, 2, while it reduces to our reference mixed model for a ¼ 1=2.
Given the formula in Eq. (B4) for the onset temperature, we expect a different behavior when a pure p-spin model, e.g., the four-spin model (a ¼ 1), is perturbed by a smallerdegree (a ¼ 1 − ε) or larger-degree (a ¼ 1 þ ε) term. This behavior is clearly visible in Fig. 14, where we draw the phase diagram in the ða; TÞ plane according to the approximation discussed in this Appendix.
The nonanalytic behavior of the onset temperature T onset around the pure models may appear strongly counterintuitive, as one would expect T onset to reduce to T MCT in the pure case. Such a limit is obtained only when the perturbing term is of higher order, e.g., in the limit a → 1 þ in the present case, T onset → T MCT as shown in Fig. 14. On the contrary, when the perturbing term is of lower order, e.g., for a → 1 − in the present case, we have a discontinuity in T onset (see Fig. 14). Such a discontinuity could be a genuine effect or an artefact of our approximation.
To prove that the singular behavior in T onset is not incompatible with the physics of the system, let us show that perturbing a pure p-spin model with a higher-or lowerorder term produces different effects on the relaxation dynamics. In Fig. 15 a ¼ 0.99, we observe an undershoot and a clear dependence of the asymptotic energy on the temperature. This dynamical effect, clearly visible even if the perturbing term is tiny, could be due to the existence of an onset temperature T onset larger than T MCT in the limit a → 1 − as suggested by our approximation. The simplest explanation for the undershoot in the energy relaxation observed for a ¼ 0.99 in Fig. 15 can be provided by assuming that lower-order interactions in the energy are satisfied (i.e., minimized) sooner during the relaxation, and once satisfied, they become a sort of pinning field for the relaxation of the remaining energy terms. For this reason, even a tiny fraction of three-spin terms would have a visible effect in the relaxation of a four-spin model, with the energy E 3 becoming too small too soon and eventually increasing again when the four-spin part is satisfied.
Nonetheless, we show in Fig. 16 that the nonanalytic behavior of T onset does not correspond to a discontinuity in the approximated value of ΔE, which tends to zero in the limit of a pure model from both sides (as long as T > T MCT ). In Fig. 17, we show, in particular, the behavior approaching a ¼ 1 from below. We notice that, while T onset − T SF remains finite (as shown in Fig. 14), the energy excess ΔE vanishes in the whole interval ½T onset ; T SF when a → 1 − . This result means that, in such a limit, T onset cannot be easily detected from the energy relaxation.

APPENDIX C: COUNTING THE MINIMA
In this Appendix, we try to answer the question of whether or not the attractors of the dynamics can be well described in terms of typical marginal saddles and minima that lie close to the initial configuration. Let us consider the stationary points of the Hamiltonian H½σ on the sphere P i σ 2 i ¼ N: As in dynamics, the radial reaction μ takes, in any stationary point, the value We classify the stationary points according to their energy E ¼ ð1=NÞH½σ and the value of the radial reaction μ. Differently from pure models where μ ¼ pE, the relation between E and μ here is not univocal, and stationary points are found in a whole region of the ðE; μÞ plane.
We have seen that for T > T SF , dynamics is attracted by some family of marginal minima. In order to characterize these minima, we count the number of stationary points of the Hamiltonian H½σ with fixed energy E and radial reaction μ that lie at a fixed overlap σ · σ 0 ¼ Nq 12 from a reference configuration σ 0 , weighted with a Gibbs measure e −βH½σ 0 . Since the complexity, i.e., the logarithm of their number, is self-averaging, we write ΣðE; μ; q 12 ; βÞ ¼ The computation of Σ is standard and can be performed in several steps. First of all, since the matrix H 00 is a GOE random matrix, the distribution of eigenvalues of μI þ H 00 is self-averaging and is a shifted semicircular ρðλÞ ¼ f2=½π4f 00 ð1Þg ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi 4f 00 ð1Þ − ðλ − μÞ 2 p . The modulus of the determinant j detðμI þ H 00 Þj is self-averaging, and its logarithm reads which only depends on μ. The imaginary part of this expression is the proportion of negative eigenvalues. To evaluate the remaining terms, we use replicas and write Σ ≡ logðN Þ ¼ lim n→0 ½ðN n − 1Þ=n. We concentrate on the case with temperatures greater than the static transition temperature (T > T K ) of the model, where the partition function appearing in the denominator of Eq. (C3) is self-averaging and takes its annealed value Z β ¼ e ðN=2Þβ 2 fð1Þ . One can then average over the disorder and the configuration σ 0 at the same time. Opening the delta function in the Fourier basis, ΣðE; μ; p; βÞ ¼ lim Z Ds e P n a Nðiβ a E−iσ a ·σ a μÞ δðNq 12 − σ a · σ 0 Þe −βH 0 e P n a ðiβ a þiσ a ·∇ÞH a þ NDðμÞ; ðC5Þ Since the disorder is Gaussian, Now, we define overlap variables NQ ab ¼ σ a · σ b , Nχ ab ¼ iσ a ·σ b , and NR ab ¼ −σ a ·σ b , and the overlaps with the reference configuration Nq 12 ¼ σ a · σ 0 , Nχ p ¼ iσ a · σ 0 . This change of variables defines a matrix where Q aa ¼ 1 due to the spherical constraint. In addition, from the equivalence between replicas, we fix iβ a ¼ y and χ aa ¼ χ ∀ a. With this change of variables, Eq. (C5) becomes where 1 2 logðdet QÞ is the volume factor that comes from the change of variables from spins to overlaps. To get the leading N contribution, we must extremize with respect to all the overlap parameters Q and y. We notice that a further simplification to the expression (C6) can be obtained by first extremizing with respect to R ab . Assuming a replica symmetric ansatz for the overlap matrices Q and χ, i.e., Q ¼ δ ab þ ð1 − δ ab Þq 0 and χ ab ¼ δ ab χ þ ð1 − δ ab Þχ 1 , we get, in the limit n → 0, ΣðE; μ; q 12 ; β; y; χ; χ 1 ; χ p ; q 0 Þ ¼ þyE − μχ þ β½yfðq 12 Þ þ χ p f 0 ðq 12 Þ þ 1 2 where This result can be extremized explicitly with respect to y, χ, χ 1 , χ p , while q 0 extremization has to be done numerically. For q 12 ¼ 0, the solution is q 0 ¼ 0, and we recover the expression found in Ref. [42], This unconstrained complexity is plotted in Fig. 2. For any value of μ, ΣðE; μÞ has a concave parabolic shape as a function of the energy, and many values of the energy are possible for the same μ value. In particular, this case is true for μ ¼ μ mg : There is not a unique threshold energy (at variance to pure p-spin models) but a whole interval for which ΣðE; μ mg Þ > 0. We define the dominating stationary points at a given value of the energy as the ones that maximize Σ as a function of μ. The threshold energy as defined by dynamics corresponds to the values of the energy that separates minima from saddles on the dominating line. Notice that this value does not correspond to the most numerous marginal minima, which occur for μ ¼ μ mg and E > E th , but to the point where the most numerous stationary saddles become minima. This observation sheds some light on the dynamics from a random initial condition; minima are not accessible at levels of the energy E > E th , where saddles dominate the landscape, even if many minima are present.
Extending the concept of threshold energy to the case q 12 > 0 is straightforward. As shown in Fig. 18, for any fixed value of q 12 , we can plot the curve corresponding to dominating minima. This curve ends at an energy value that we call E th ðq 12 Þ, which is the best candidate for the asymptotic energy in the relaxation dynamics. Indeed, for E > E th ðq 12 Þ, dominating stationary points are saddles, and with high probability, the dynamics will be able to relax further until the energy level E th ðq 12 Þ is reached. However, even assuming the dynamics converges to E th ðq 12 Þ, we are still left with an unknown: the value of q 12 , i.e., the asymptotic value for Cðt; 0Þ. This problem is the same one the authors of Ref. [50] faced after computing the constrained complexity for the free energy at a nonzero temperature.
In Fig. 19, we plot the quenched constrained complexity of dominant states in the (3 þ 4)-spin model, computed with T ¼ T MCT (left panel) and T ¼ 1.02T MCT (right panel). Please notice that the choice of the temperature is not crucial: As long as T ∈ ðT SF ; T onset Þ, the plot would be very similar. The almost-horizontal red curve in the plot represents the threshold energy E th ðq 12 Þ defined above. We notice that the range of energies with a positive complexity is very large compared to variations in E th . The green ellipsis represents our best estimate for the large-time limit of the actual dynamics solved numerically: While the energy can be very well estimated, the limit of Cðt; 0Þ is plagued by a large uncertainty due to its slow convergence (until now, we have not attempted such an extrapolation; however, here we want to be more speculative, and we take the risk). From the plot, one is tempted to conjecture that the dynamics always converges to marginal states with a threshold energy E th ðq 12 Þ. However, we have not found any principle to fix the value of q 12 solely from the complexity curve, and further studies are needed to better match the large-time limit of the dynamics to the energy landscape.
Although Fig. 19 may suggest a relation between dynamics and the generalized threshold energies, a more careful analysis reveals its limitations. Assuming that at large times the relaxation dynamics converges to the manifold of marginal states belonging to the curve E th ðq 12 Þ, one could estimate the point reached by the dynamics by extrapolating the asymptotic energy E ∞ ðTÞ and estimating q 12 from the equality E th ðq 12 Þ ¼ E ∞ . Thus, having fixed the values of E and q 12 , one can proceed by estimating the remaining parameters of the asymptotic aging dynamics, q 0 and y ¼ ∂ E Σ, from the saddle-point equations used to compute the quenched complexity. The result of this computation is shown in Fig. 20 with solid lines and compared to the (very uncertain) extrapolation of FIG. 18. Constrained complexity at an overlap q 12 from a reference configuration sampled at temperature T ¼ T MCT . Vertical lines mark energy values E ∞ (dotted line on the right) and E T MCT (dashed line on the left), corresponding to extrapolated asymptotic energies reached by the dynamics starting, respectively, from T ¼ ∞ and T ¼ T MCT . From a random configuration with T ¼ ∞, the dynamics goes to the energy level E th , where the dominant stationary points are marginal minima, while starting near T MCT , the dynamics goes below E th . If q 12 > 0, the energy where marginal minima dominate decreases, and we represent it with a bold dashed red curve.
Cð∞; 0Þ, shown by red points with errors. We clearly see that, while the estimate of q 12 is compatible with the actual dynamics, the other two parameters are far from the values measured in the numerical solution to the dynamics. Indeed, q 0 becomes smaller than q 12 ; however, in the actual dynamics, the inequality q 12 < q 0 is always satisfied, and y becomes much smaller than y 0 (marked by a dotted horizontal line in Fig. 20), which is a good descriptor of the actual aging in the whole temperature range studied. Thus, the present computation of the quenched complexity of marginal minima of the energy function in mixed p-spin models does not allow us to identify the attractors of the dynamics. More work will be required to connect the largetime aging dynamics to the properties of the energy landscape.

APPENDIX D: EQUILIBRIUM DYNAMICS
The onset temperature T onset marks the point where the landscape influences gradient descent dynamics. In realistic model glasses, it has been found that the same temperature marks the onset of nonexponential relaxation, thus making a direct relation between equilibrium dynamics and the energy landscape [25]. Therefore, it is natural to search for a signature of T onset in equilibrium dynamics.
The aim of this Appendix is to briefly review the equilibrium dynamics at finite temperature and show that, while the usual MCT-like transition takes place at T MCT , no anomalies around T onset are found. The equilibrium dynamical equation for the correlation function reads  Fig. 18. The choice of the temperature is not crucial as long as T ∈ ðT SF ; T onset Þ. The almost horizontal red curves mark the threshold energy E th ðq 12 Þ, and the green ellipsis is our best estimate for the large-time limit of the relaxation dynamics obtained from the numerical integration.
transition is signaled by the appearance of a nonzero limit q 1 ¼ lim t→∞ CðtÞ, with q 1 discontinuous at the transition. Evaluating Eq. (D1) at large times, one finds that q 1 is a solution of This equation coincides with the equation defining the nontrivial minimum of the equal-temperature effective potential discussed in detail in Appendix A. The transition temperature, the highest temperature for which Eq. (D2) has a nonzero solution, satisfies β 2 f 00 ðq 1 Þð1 − q 1 Þ 2 ¼ 1. Standard mode-coupling analysis of the dynamics [22] describes the approach to the transition in terms of the formation of a plateau in the correlation function as a function of log time. This analysis, in particular, shows that the approach to q 1 at T MCT follows a power law with an exponent a, which is determined by Γð1 − aÞ 2 = Γð1 − 2aÞ ¼ ðT=2Þf 000 ½q=f 00 ½q 3=2 . For the mixed model Þ ≈ 0.805166, corresponding to q 1 ¼ ð1 þ ffiffiffiffiffi 37 p Þ=12 ≈ 0.59023 and a ¼ −0.38504. In Fig. 21, we display data from the numerical integration of Eq. (D1) with a simple first-order Euler algorithm similar to the one we use in the off-equilibrium case. The curves show that theoretical expectations are met: A transition occurs at T MCT according to the expected pattern, and below T MCT , the system is in a nonergodic state.
We then investigate the temperature range around T onset to see if this temperature corresponds to a qualitative change in the equilibrium dynamics. Following Ref. [25], we fit the correlation function at intermediate times with a stretched exponential fðtÞ ¼ a expð−ðt=τÞˆβÞ and study the behavior of the stretching exponentβ as a function of the temperature [61]. In Ref. [25], for Lennard-Jones liquids,β was shown to become sensibly different from 1 around T onset . While the long-time behavior of the correlation function is a pure exponential, we can fit the decay at intermediate times. In Fig. 22, we see that the stretched exponential form provides a decent fit if we do not get too close to T MCT .
We plot the stretching exponentβ for the mixed model as a function of the temperature in Fig. 23, together with the same exponent in the pure model. In both cases, we observe that whileβ ≈ 1 at high enough temperatures, it starts to decrease from that value as temperature is lowered.  The vertical lines correspond to the transition temperatures T MCT of the two models. We see no qualitative differences between the two cases. Moreover, in the case of the mixed model, deviations from pure exponential behavior are seen much above the onset temperature T onset ≃ 0.91.
We could not find any qualitative difference here in the behavior of the pure and the mixed model, and we observe that in the mixed model, the stretching starts at T ≃ 1.2, well above the estimated value of T onset ≃ 0.91. In conclusion, we do not see any signature of T onset in the equilibrium dynamics of the mixed model.

APPENDIX E: RESPONSE AT SHORT TIMES
A natural hypothesis about the memory effect in region II is that it could be associated with an increased response Rðt; sÞ to perturbations applied at short times s. Here, we show that short-time dynamics provides hints against this hypothesis. Let us then compare the response Rðt; sÞ starting from infinite temperature and the one for an initial temperature in the region II. In Fig. 24, we plot the response as a function of the time difference t − s starting from infinite temperature and from a temperature T ¼ 1.01T MCT . We clearly see that starting from finite temperature, the response at short times s is much smaller than the one starting from infinite temperature. It is also interesting to study the behavior in time of Rðt; 0Þ, which we plot in Fig. 25. It is well known that starting from a uniformly distributed initial condition, one finds that the response to the initial condition coincides with the remanent magnetization, namely, Rðt; 0Þ ¼ Cðt; 0Þ [46]. For finite T, this relation does not hold; in fact, it is possible to show that Rðt; 0Þ ¼ Cðt; 0Þ − β R t 0 ds f 0 (Cð0; sÞ)Rðt; sÞ. We see that a finite limiting value for Cðt; 0Þ does not necessarily imply a finite response to the initial condition. Our data show that the two terms largely compensate each other; the larger β, the larger the compensation effect. Both for the mixed and the pure model, the lower the starting temperature, the lower Rðt; 0Þ. The decay of Rðt; 0Þ becomes exponential only for T ¼ T SF . In the pure model, this case coincides with T MCT , while in the mixed model, T SF < T MCT .

APPENDIX F: NUMERICAL EXTRAPOLATIONS
The dynamical equations for the correlation and the response functions, written in Eq. (4), were integrated via a simple Euler algorithm with a fixed integration step Δt. In this way, we get extremely precise results at finite time, but the times we can reach are limited. We tried more sophisticated integration schemes [47], where the integration steps are increased during the evolution; unfortunately, as soon as a mixture is used, we have met numerical instabilities at very short times. In practice, only the pure p-spin model seems to allow those integration schemes to work.
The results presented in the main text have been obtained by integrating with an optimal integration step Δt ¼ 0.1, which allows us to reach the largest times without facing any numerical instability (consider that at short times the differential equations we are solving have a natural timescale of order 1, so Δt cannot be much larger than the value we used).
The integration error in the Euler algorithm is linear in the integration step Δt, and we are interested in understanding how much physical quantities computed with Δt ¼ 0.1 differ from the corresponding Δt → 0 limit. To this purpose, we study the relative errors in one-time quantities defined as follows: By definition, this relative error is zero for Δt ¼ 0.1, it should be linear in Δt for small enough Δt, and, in the limit Δt → 0, it provides the relative error from using an FIG integration step Δt ¼ 0.1 instead of the exact integration (Δt → 0). In Fig. 26, we show the relative errors in some one-time quantities, Cðt; 0Þ, χðt; 0Þ, EðtÞ, and μðtÞ, obtained in the integration of the dynamical equations for the (3 þ 4) spin with T ¼ T MCT . We clearly see that, at any time t, the relative error scales roughly linearly with Δt. For any time t, the relative errors are extremely small: of order 10 −5 for correlation and response, and of order 10 −7 for the energy and radial reaction. Moreover, the coefficient of the linear relation does not seem to grow indefinitely with time, but it actually shows a maximum around t ¼ Oð100Þ.
In order to better appreciate the nonmonotonic behavior of the relative errors, we plot in Fig. 27 the integration error in the energy obtained starting from T ¼ T MCT as a function of time. The fact that relative errors decrease at very large times is probably related to another important observation: Trajectories integrated with different Δt may differ sensibly at short times, but at large times, they seem to converge to the same asymptotic solution and are thus very attractive and stable (this result may be the reason why integration errors at very large times are smaller than those at intermediate times). The results of the integration shown in the present work are very reliable and stable.
Despite the high precision at finite times, the extrapolation to large t of physical quantities, notably μðtÞ, EðtÞ, or Cðt; 0Þ, is delicate. While Cðt; 0Þ decays very slowly and we do not attempt any extrapolation in the t → ∞ limit, both μðtÞ and EðtÞ converge fast enough, and their largetime limits can be estimated.
We already noticed in Fig. 5 that for T ≥ T SF , the radial reaction μðtÞ is perfectly compatible with an extrapolation to the marginal value μ mg ¼ 6. At very high temperatures, the decay is a simple power law, while at lower temperatures, we observe preasymptotic effects and an eventual crossover to the asymptotic decay. The latter seems to be characterized by an exponent that is roughly temperature independent.
Given the critical character of the dynamics in the aging regimes I and II, we have estimated the asymptotic energy E ∞ ðTÞ via power-law fits to EðtÞ data, and we show the results in Fig. 28. While at high temperatures the data follow a nice power law in time for a couple of decades, at lower temperatures the presence of the preasymptotic regime and the crossover to the asymptotic one makes it very hard to assess the reliability of the value of E ∞ ðTÞ that we extrapolate in this way.
However, comparing the data shown in Fig. 5 for μðtÞ and in Fig. 28 for EðtÞ, we notice a very similar behavior (including the crossover), and thus it could be useful to parametrically plot EðtÞ versus μðtÞ, varying t, to see whether a better estimation of E ∞ ðTÞ could be obtained. This method is used in Fig. 29. The behavior of the energy as a function of the radial reaction is very smooth, practically linear at high temperatures and close to quadratic approaching T SF . We can then fit the relation EðμÞ and obtain a very good estimate of E ∞ ðTÞ by just assuming that μðtÞ converges to the marginal value μ mg in the large-time limit. We have interpolated the data via the function  4) spin, which show a maximum at a finite time. Here, we plot the relative error in EðtÞ during the integration with T ¼ T MCT . We notice that the maximum relative error is much smaller, by several orders of magnitude, than the physical effect we seek (energy going under the threshold). EðμÞ ¼ E ∞ þ Aðμ mg − μÞ α ; reporting the best fits using dotted curves in Fig. 29.
In Fig. 30, we show the two estimates of E ∞ ðTÞ obtained from the procedures just described: the power-law fit of energy versus time and the power-law fit of energy versus radial reaction μ mg − μðtÞ. We see that they are perfectly compatible.
Although we do not estimate the large-time limit of Cðt; 0Þ directly by extrapolating in time (in Appendix B, an estimate is provided based on a different approach), we believe it is useful to show the Cðt; 0Þ curves so that the readers can decide for themselves. In Fig. 31, the data for Cðt; 0Þ are shown, measured both in the pure three-spin model and in the mixed (3 þ 4)-spin model. We have decided to show these data as a function of the scaling variable t −1=3 that describes the decay quite well starting from a random configuration (T ¼ ∞). En passant, we notice that the exact value for this decay is not known. We see in Fig. 31 that, while data for the three-spin model can be easily extrapolated to zero in the large-time limit, the data for the (3 þ 4)-spin model seem much more compatible with a nonzero limit lim t→∞ Cðt; 0Þ when the temperature gets close to T MCT . An alternative representation of the same data is provided in Fig. 32. variable is to make the relaxation from T ¼ ∞ as linear as possible [notice, however, that the exact value for the decay exponent of Cðt; 0Þ is unknown even in this case]. While data for the three-spin model can be easily extrapolated to zero in the large-time limit, the data for the (3 þ 4)-spin model seem much more compatible with a nonzero limit q 12 ¼ lim t→∞ Cðt; 0Þ when the temperature gets close to T MCT . The reason why we have not been able to integrate the dynamical equations for a longer time resides in the difficulties in using integration algorithms with adaptive integration steps. These kinds of adaptive algorithms have been used in the past to integrate the dynamical equations in the case of the pure p-spin model starting from random initial conditions [47,48]. Unfortunately, we found that these adaptive algorithms are not robust to changes in the details of the dynamics, and their stability strictly depends on the regime under consideration. At each contraction of the grid, some information on the correlation and response functions computed in the previous steps is lost, which does not seem to be relevant if one starts from a random state and memory of the initial conditions is lost in the asymptotic state. However, in the mixed p-spin model, where memory of the initial configuration crucially affects the large-time behavior, the error induced by the time-step adaptation gives rise to strong numerical instabilities. Increasing the grid size pushes the instability to later times, but unfortunately, even the largest grid size we could use (8192 × 8192) did not allow substantial improvements with respect to the simple Euler algorithm. Same data for the decay of the correlation with the initial configuration Cðt; 0Þ as in Fig. 31, but here instead of the time, we use the Cðt; 0Þ measured starting from a random configuration (T ¼ ∞) on the x axis. As in Fig. 31, the data for the three-spin model extrapolate to zero, while the data for the (3 þ 4)-spin model seem much more compatible with a nonzero limit q 12 when the temperature gets close to T MCT .