Glassy nature of the hard phase in inference problems

An algorithmically hard phase was described in a range of inference problems: even if the signal can be reconstructed with a small error from an information theoretic point of view, known algorithms fail unless the noise-to-signal ratio is sufficiently small. This hard phase is typically understood as a metastable branch of the dynamical evolution of message passing algorithms. In this work we study the metastable branch for a prototypical inference problem, the low-rank matrix factorization, that presents a hard phase. We show that for noise-to-signal ratios that are below the information theoretic threshold, the posterior measure is composed of an exponential number of metastable glassy states and we compute their entropy, called the complexity. We show that this glassiness extends even slightly below the algorithmic threshold below which the well-known approximate message passing (AMP) algorithm is able to closely reconstruct the signal. Counter-intuitively, we find that the performance of the AMP algorithm is not improved by taking into account the glassy nature of the hard phase. This result provides further evidence that the hard phase in inference problems is algorithmically impenetrable for some deep computational reasons that remain to be uncovered.


INTRODUCTION
Inference problems are ubiquitous in many scientific areas involving data. They can be summarized as follows: a signal is measured or observed in some way and the inference task is to reconstruct the signal from the set of observations. Many practical applications involving data rely on our ability to solve inference problems fast and efficiently. While from the point of view of computational complexity theory many of the practically important inference problems are algorithmically hard in the worst case, practitioners are solving them every day in many cases of interest. It is hence an important research question to know which types of inference problems can be solved efficiently and which cannot. A formally satisfying answer to this question would lead to an entirely new theory of typical computational complexity, and would likely shed new light on the way we develop algorithms.
For a range of inference problems, Bayesian inference naturally leads to the statistical physics of systems with disorder, see e.g. [1]. This connection was explored in a range of recent works and brought a class of models for inference problems in which the Bayes-optimal inference can be analyzed and presents a first order phase transition. As is common in physics in high dimension, the first order phase transition is associated with the existence of a metastable region in which known efficient algorithms fail to reach the theoretically optimal performance. This metastable region was coined the hard phase, see e.g. [2]. It has been located in error correcting codes [3,4], compressed sensing [5], community detection [6], the hidden-dense submatrix problem [7,8], low-rank estimation problems including data clustering, sparse PCA or tensor factorization [9,10], and learning in neural networks [11]. The nature of the hard phase in all these problems is of the same origin, and therefore it is expected that an algorithmic improvement in any of them would lead to an improvement in all the others as well.
In the current state of the art (including the references above) the hard phase is located as a performance barrier of a class of message passing algorithms. Message passing algorithms can be seen as spin-offs of the cavity method of spin glasses [12]. In the context of inference on dense graphical models the algorithm is called approximate message passing (AMP), known from the context of compressed sensing [13]. In the limit of large system size, the dynamical evolution of AMP can be tracked by the so-called state evolution (SE) [13,14], whose fixed point equations coincide with the saddle point equations describing the thermodynamics of the system under the replica symmetric assumption. The analysis of SE and its comparison to the analysis of the Bayes-optimal performance reveals that there is an interval of noise-to-signal ratio where the signal could be reconstructed by sampling the posterior measure, while AMP is not able to converge to the optimal error. This interval marks the presence of the hard phase.
In this paper we want to attract further attention of the physics community towards the existence of this hard phase related to a 1st order phase transition in the optimal performance in inference problems. The following open questions might benefit from a physics-like approach and insights: Could there be a physics-inspired algorithm that is able to overcome the algorithmic barrier the AMP algorithm encounters? Note that in problems where the corresponding graphical model can be designed, such as compressed sensing or error correcting codes, such a strategy related to nucleation indeed exists [5,15]. But what about the more ubiquitous problems where the graphical model is fixed? Are there some physical principles or laws that can provide further evidence towards the impenetrability of the algorithmic barrier?
The motivation of the present work was to investigate the above questions. We analyze the following physics-motivated strategy: It is known that the metastable part of the posterior measure in the hard phase is glassy [16][17][18]. Yet, the AMP algorithm fails to describe this glassiness properly. In some other contexts where message passing algorithms are successfully used, a correct account of glassiness leads to algorithms that improve over simpler ones. Notably this is the case of random constraint satisfaction problems, where the influential work [19] has shown that survey propagation, which correctly takes glassiness into account, beats the performance of belief propagation.
We therefore ask whether, in inference tasks, the reconstruction of the signal becomes easier when one uses algorithms in which the glassiness is correctly taken into account. We investigate this strategy thoroughly in the present work. We confirm that the hard phase is glassy in the sense that it consists of an exponential number of local optima at higher free energy than the equilibrium one. However, when it comes to the reconstruction of the signal, our analysis leads us to the remarkable conclusion that, in contrast to constraint satisfaction and optimization problems, in inference problems taking into account the glassiness of the hard phase does not improve upon the performance of the simplest AMP algorithm. We thus provide additional evidence towards the bold conjecture that in the corresponding inference problems AMP is the best of the low-computational-complexity inference algorithms.
Note that such a negative result is very interesting from both the physics and the computer science point of view. In physics, a common intuitive narrative tells us that the properties of the energy landscape control the algorithmic difficulty of the problem. Yet a solid and physically intuitive explanation of why inference algorithms cannot penetrate the hard phase remains open. Our results invite researchers to make progress on this question, eventually leading to a precise understanding of the interplay between dynamics and landscape. In computer science, developments that go beyond the traditional worst-case computational complexity results are rare, and the hard phase provides a unique and sharply delimited case that might be computationally hard even for a typical instance. Building a theory that would explain the nature of the hard phase might be the next pillar of our understanding of computational complexity.
Our analysis of the glassiness of the hard phase provides new insights on the performance of Monte Carlo or Langevin dynamics. The presence of glassiness suggests that these sampling-based algorithms are slowed down and thus their commonly used versions may not be able to match the performance of AMP. While this aligns with some of the early literature [16], more recent literature [6] suggested, based on numerical evidence, that Monte Carlo sampling is as good as the message passing algorithm. Based on the conclusions of our work, this question of performance barriers of sampling-based algorithms should be re-opened and investigated more thoroughly. A good understanding of the performance of these algorithms is especially important in view of the fact that some of the best performing systems currently use stochastic gradient descent, which can be seen as a variant of the Langevin dynamics.
This paper is organized as follows. In the section "Model" we introduce the model on which we illustrate the main findings of this paper; we expect this picture to be generic and to apply to all the models where the hard phase related to a first order phase transition in the performance of the Bayesian inference was identified. In the section "Bayesian inference and approximate message passing" we recall the basic setting of Bayesian inference. In the section "Summary of main algorithmic result" we give a summary of the main algorithmic consequences of our work. In the section "The replica approach to the posterior measure" we then recall the replica approach to the study of the corresponding posterior measure. The subsection "Reminder of the replica symmetric solution" summarizes the known replica symmetric phase diagram and the resulting phase transitions. The subsection "Glassy phase and complexity" includes the main technical results of the paper, where we quantitatively analyze the glassiness of the hard phase, giving rise to our conclusions in the final section.

MODEL
In order to be concrete we concentrate on a prototypical example of an inference problem with a hard phase: the constrained rank-one matrix estimation. This problem is representative of the whole class of inference problems where the hard phase related to a 1st order phase transition was identified [7,20,21]. We choose this example because it is very close to the Sherrington-Kirkpatrick model, for which the study of glassy states is the most advanced [12]. Glassiness was also studied in detail in the spherical or Ising p-spin model, corresponding to spiked tensor estimation [9]. However, in that model the hard phase spans the full low-noise phase, and the transition towards the easy phase, on which we aim to focus here, happens for a noise-to-signal ratio too low to be straightforwardly investigated within the replica method.
In the rank-one matrix estimation problem the signal, denoted by $x^{(0)} \in \mathbb{R}^N$, is extracted from some separable prior probability distribution given by $P(x^{(0)}) = \prod_{i=1}^{N} P_X(x_i^{(0)})$. This signal is subjected to noisy measurements of the following form
$$Y_{ij} = \frac{1}{\sqrt{N}}\, x_i^{(0)} x_j^{(0)} + \xi_{ij}, \qquad (1)$$
where $\xi_{ij}$ are Gaussian random variables with zero mean and variance $\Delta$. Therefore one observes the signal through the matrix $Y$. The inference problem is to reconstruct the signal $x^{(0)}$ given the observation of the matrix $Y$. The information-theoretically optimal performance in this problem was analyzed in detail in [21] and this analysis was proven rigorously to be correct in [22][23][24][25].
Refs. [21,22,26] also analyzed the performance of the AMP algorithm. While the theoretical part of this paper is for a generic prior $P_X$, the results section focuses on the Rademacher-Bernoulli prior
$$P_X(x) = \frac{\rho}{2}\left[\delta(x-1) + \delta(x+1)\right] + (1-\rho)\,\delta(x), \qquad (2)$$
as this is a prototypical yet simple example in which the hard phase appears for sufficiently low $\rho$ [20,21]. Let us mention that the rank-one matrix estimation with the Rademacher-Bernoulli prior has a very natural interpretation in terms of a community detection problem. Keeping this interpretation in mind can help the reader to get intuition about the problem. Nodes are of three types: nodes with $x_i^{(0)} = 1$ belong to one community, nodes with $x_i^{(0)} = -1$ belong to a second community, and nodes with $x_i^{(0)} = 0$ do not belong to any community. The observations $Y_{ij}$ of Eq. (1) can be interpreted as weights on edges of a graph that are on average larger for nodes that are either both in community one or both in community two, on average smaller if one of the nodes is in community one and the other in community two, and independent and unbiased when one of the nodes does not belong to any community. Thanks to the output universality result of [23,27], the results presented in this paper also hold for a model where the observations $Y_{ij} \in \{0, 1\}$ correspond to the adjacency matrix of an unweighted graph with Fisher information corresponding to the inverse of the variance $\Delta$.
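As an illustration of the model, the following is a minimal sketch of how a single instance could be sampled. It assumes the $1/\sqrt{N}$ scaling of the rank-one term and a symmetric noise matrix with independent entries above the diagonal and no noise on the diagonal; the function name `sample_instance` and its arguments are hypothetical and chosen only for this example.

```python
import numpy as np

def sample_instance(n, rho, delta, seed=0):
    """Sample a Rademacher-Bernoulli signal x0 and a noisy rank-one observation Y."""
    rng = np.random.default_rng(seed)
    # x0_i = +1 or -1 with probability rho/2 each, and 0 with probability 1 - rho
    x0 = rng.choice([-1.0, 0.0, 1.0], size=n, p=[rho / 2, 1 - rho, rho / 2])
    # Symmetric Gaussian noise of variance delta
    # (assumption: i.i.d. entries for i < j, mirrored below the diagonal, zero diagonal)
    upper = np.triu(rng.normal(0.0, np.sqrt(delta), size=(n, n)), k=1)
    noise = upper + upper.T
    # Rank-one signal term scaled by 1/sqrt(n), plus noise
    y = np.outer(x0, x0) / np.sqrt(n) + noise
    return x0, y
```

In the community detection reading, `x0` holds the community labels and `y` the edge weights.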

BAYESIAN INFERENCE AND APPROXIMATE MESSAGE PASSING
We study the so-called Bayes-optimal setting, which means that we know both the prior $P_X(x)$ and the variance $\Delta$ of the noise. The probability distribution of $x$ given $Y$ is given by Bayes formula
$$P(x|Y) \propto \prod_{i} P_X(x_i) \prod_{i<j} \exp\left[ G\!\left(Y_{ij} \,\Big|\, \frac{x_i x_j}{\sqrt{N}}\right) \right]. \qquad (3)$$
Since the noise $\xi_{ij}$ is Gaussian we have
$$G(Y|w) = -\frac{(Y - w)^2}{2\Delta}. \qquad (4)$$
Both in Eq. (3) and (4) we have omitted the normalization constants. An estimate of the components of the signal that minimizes the mean-squared error with the ground truth signal $x^{(0)}$ is computed as
$$\hat{x}_i = \langle x_i \rangle,$$
where the brackets stand for the average over the posterior measure Eq. (3). Therefore in order to solve the inference problem we need to compute the local magnetizations $\{\hat{x}_i\}$. The AMP algorithm aims to do precisely that; its derivation can be found e.g. in [21]. AMP boils down to a set of recursion relations for the estimates $\hat{x}$, whose iterative fixed point is taken as an estimate of the signal. It is known that the fixed points of the state evolution of the AMP algorithm are, in the thermodynamic limit, described by the replica symmetric (RS) solution of the model [13,14]. AMP follows the RS solution irrespective of whether RS is the physically correct description of the posterior measure or not. As shown in [28], it is possible to derive a generalized AMP, which we call the Approximate Survey Propagation (ASP) algorithm, whose state evolution fixed points coincide with the replica equations in the one-step replica symmetry breaking (1RSB) ansatz. Just like AMP, the ASP algorithm can also be written as a set of recursion relations for $\hat{x}$ [28], depending on one additional free parameter $s$, corresponding to the Parisi parameter from the spin glass literature. The special case $s = 1$ reduces the ASP algorithm back to AMP. The 1RSB solution is known to provide a better description, in many cases exact, of glassy states. In the section on the replica approach we hence study the thermodynamics of the above model in the RS and 1RSB ansatz, focusing on its properties in the hard phase.
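The AMP recursion can be sketched as follows for the Rademacher-Bernoulli prior. This is a minimal sketch, assuming the standard form of AMP for symmetric rank-one estimation with Gaussian noise (a matrix-vector product with an Onsager correction, followed by a component-wise posterior-mean denoiser); it is not quoted from the source, and the names `denoiser` and `amp`, the weak random initialization and the iteration count are illustrative choices.

```python
import numpy as np

def denoiser(a, b, rho):
    """Posterior mean and variance of x under the weight P_X(x) exp(-a x^2/2 + b x),
    with P_X the Rademacher-Bernoulli prior of density rho."""
    # log cosh(b), written so that it does not overflow for large |b|
    log_cosh = np.abs(b) + np.log1p(np.exp(-2.0 * np.abs(b))) - np.log(2.0)
    t = np.log((1.0 - rho) / rho) + 0.5 * a - log_cosh
    p_nonzero = 1.0 / (1.0 + np.exp(np.clip(t, -700.0, 700.0)))  # P(x != 0)
    mean = p_nonzero * np.tanh(b)
    var = p_nonzero - mean ** 2  # E[x^2] - E[x]^2, since x^2 = 1 whenever x != 0
    return mean, var

def amp(y, delta, rho, n_iter=50, seed=0):
    """Sketch of AMP for Y = x0 x0^T / sqrt(N) + noise of variance delta."""
    n = y.shape[0]
    rng = np.random.default_rng(seed)
    x_hat = 1e-3 * rng.standard_normal(n)  # weak random (uninformed) initialization
    x_old = np.zeros(n)
    sigma = np.full(n, rho)
    for _ in range(n_iter):
        a = np.dot(x_hat, x_hat) / (delta * n)
        # Local field with the Onsager reaction term built from the previous estimate
        b = y @ x_hat / (delta * np.sqrt(n)) - sigma.mean() / delta * x_old
        x_old = x_hat
        x_hat, sigma = denoiser(a, b, rho)
    return x_hat
```

Because the prior is symmetric under $x \to -x$, the returned estimate can align with either $x^{(0)}$ or $-x^{(0)}$, so the overlap should be compared in absolute value.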

SUMMARY OF MAIN ALGORITHMIC RESULT
Before going to the technical part of the replica analysis in the following sections, we briefly summarize the corresponding main algorithmic result. In the subsection "Glassy phase and complexity" we then investigate in detail the 1RSB solution of the low-rank matrix estimation model (1), focusing on the glassy properties of the hard phase. Our main interest, however, is in the relation between the 1RSB solution and the associated algorithmic performance. The main question we ask is whether ASP can (for a suitable choice of the Parisi parameter $s$) improve on AMP. The experience with the survey propagation algorithm applied to constraint satisfaction problems [29] suggests that this should be possible.
In Fig. 1 we plot the magnetization achieved by the ASP algorithm as a function of the noise $\Delta$ for several values of the Parisi parameter $s$. We observe that as the noise $\Delta$ decreases the equilibrium value (yellow) is reached first by the $s = 1$ curve, corresponding to the performance of AMP. In Fig. 3 we then plot the mean-squared error as a function of the Parisi parameter $s$ for several values of the noise $\Delta$. Again we see that in all cases the best error is achieved with $s = 1$. Algorithmically this means that in the present setting, ASP never obtains better accuracy than the canonical AMP algorithm.
The fact that among all the values of $s$ the lowest MSE is reached by the $s = 1$ states for all $\Delta$ is unexpected from the physics point of view. It implies that AMP, which neglects glassiness and wrongly describes the hard region, works better as an inference algorithm than an algorithm that correctly describes the metastable states in this region. At the same time, the above result could be anticipated based on the mathematical theorem of [7], which implies that AMP is optimal among all local algorithms. This theorem applies as long as an iterative algorithm only uses information from nearest neighbours and (nearly) reaches a fixed point after O(1) iterations.

THE REPLICA APPROACH TO THE POSTERIOR MEASURE
In order to study the posterior measure, we define the corresponding free energy $f(Y, \Delta)$ as in Eq. (8). This is a random object since it depends on the matrix $Y$. Furthermore it depends on $\Delta$ through the function $G$. We want to study the typical behavior of this sample-dependent free energy. Therefore we define its average over the observations,
$$f(\Delta) = \int dY\, P(Y)\, f(Y, \Delta), \qquad (9)$$
where $Y$ is obtained as in Eq. (1), so that $P(Y)$ is given by
$$P(Y) = \int dx^{(0)} \prod_i P_X(x_i^{(0)})\, P(Y | x^{(0)}).$$
In order to perform the average defined in Eq. (9) we use the replica method [12], which obtains the average of the logarithm of the partition function $Z$ from the averaged moments $Z^n$ in the limit $n \to 0$. For integer $n$ we can represent $Z^n$ as an $n$-dimensional integral over $n$ replicas $x^{(a)}$ with $a = 1, \dots, n$. Stated in this way the problem is obviously symmetric under the exchange of the $n$ replicas among themselves. Moreover, since we need to integrate over the signal distribution $P(Y)$, we end up with a system of $n + 1$ replicas that, in the Bayes-optimal case, is symmetric under the permutation among all the $n + 1$ replicas. Performing standard manipulations, see e.g. [12], we arrive at a closed expression for $f(\Delta)$, Eq. (13), as an integral over $q$ and $\hat{q}$ involving a function $S(q, \hat{q})$ that can be computed explicitly; $q$ and $\hat{q}$ are $(n + 1) \times (n + 1)$ overlap matrices. In the large $N$ limit, the integral in Eq. (13) can be evaluated using the saddle point method. At the saddle point level the physical meaning of the overlap matrix $q$ is given in terms of the overlaps between replicas,
$$q_{ab} = \frac{1}{N} \sum_i x_i^{(a)} x_i^{(b)},$$
while the matrix $\hat{q}$ is just a Lagrange multiplier. We denote by $m$ the magnetization of the system, meaning the overlap of a replica with the signal,
$$m = \frac{1}{N} \sum_i x_i^{(0)} x_i^{(a)}.$$
The saddle point equations for $q$ and $\hat{q}$ can be written in complete generality for any $n$, but then one needs to take the analytic continuation down to $n \to 0$. One needs an appropriate scheme from which one can take the replica limit. Here we consider two schemes: the replica symmetric (RS) and the 1-step replica symmetry breaking (1RSB) one. We refer here to symmetry under permutations of the $n$ replicas with index $a = 1, \dots, n$.

Reminder of the replica symmetric solution
The RS scheme boils down to considering
$$q_{0a} = m, \qquad q_{ab} = q_0 \quad (a \neq b), \qquad (16)$$
with $a, b = 1, \dots, n$. From the point of view of the inference, the relevant quantity to look at is the mean-squared error (MSE)
$$\mathrm{MSE} = \rho - 2m + q_0, \qquad (17)$$
where $\rho \equiv \langle (x^{(0)})^2 \rangle$. Replica symmetry among all the $n + 1$ replicas is obtained for $m = q_0$. It is well known that, as a direct consequence of Bayes optimality (also called the Nishimori condition [2]), this fully replica symmetric solution is the one that describes the thermodynamically dominant states. The more general ansatz is, however, important as it allows one to describe metastable states where the Nishimori identities might not hold. Plugging this ansatz into the expression for $S$ and taking the saddle point equations w.r.t. all these parameters one gets the replica symmetric solution as reported in [21], and proven to give the equilibrium solution in [24,25]. The RS free energy $\phi_{\rm RS}(m)$ can be expressed as in Eq. (19), where $x^{(0)}$ and $W$ are random variables distributed according to $P_X(x^{(0)})$ and a standard normal distribution, respectively. The values of $m$ for which $\phi_{\rm RS}$ is stationary are the solutions of the corresponding saddle point equation. Equilibrium properties of the inference problem are given by the global minima of the free energy Eq. (19). Local minima of the free energy that do not correspond to the equilibrium solution are called metastable. For illustration, we consider the case of the Rademacher-Bernoulli prior (2) and we set $\rho = 0.08$ so that the inference problem has a hard phase [21]. The replica symmetric phase diagram is represented in Fig. 1 (yellow curve).
At high $\Delta$ the noise is so strong that the signal cannot be recovered and therefore $m = 0$. Upon decreasing $\Delta$ the signal becomes relatively stronger w.r.t. the noise, and at $\Delta = \Delta_{\rm dyn} \simeq 1.041\rho^2$ the system undergoes a dynamical transition. On the one hand one can see that the free energy (19) develops a local metastable minimum with $m > 0$. On the other hand, the $m = 0$ state undergoes a clustering transition according to the pattern familiar in the physics of spin glasses [30,31]. The corresponding RS free energy ceases to describe a paramagnetic state and it describes a non-ergodic phase with an exponential number $\exp(N\Sigma(\Delta))$ of metastable states (also known as clusters) with zero overlap among each other and identical energy and internal entropy. Both the zero-$m$ dominating branch and the metastable $m > 0$ branch have identical energy and internal entropy. Their free energy difference is the complexity, $f(m > 0) - f(m = 0) = \Sigma(\Delta)$. Moreover, as we will see in the next section, the typical overlap $q_1$ between configurations in these states coincides with the value of $m$ of the magnetized solution. For that reason the magnetized state corresponds just to one cluster among the exponential multiplicity dominating the thermodynamics. The complexity (i.e. the log of their number) of the thermodynamic states decreases as $\Delta$ is lowered, until it vanishes at the value $\Delta = \Delta_{\rm IT} \simeq 1.0295\rho^2$ of the information theoretic phase transition, where $\Sigma(\Delta_{\rm IT}) = 0$. The signal is here strong enough that a first order phase transition happens, where the minimum with positive magnetization becomes the global minimum of the free energy. Below $\Delta_{\rm IT}$ the complexity of the $m = 0$ solution becomes negative, the solution is non-physical and consequently RSB is necessary to describe the metastable branch. Despite this fact, this RS metastable branch cannot be just dismissed as unphysical: it continues to be relevant algorithmically as a dynamical attractor of the AMP algorithm.
Decreasing the intensity of the noise further, another phase transition happens in this RS branch. At $\Delta = \Delta_c = \rho^2$ the metastable minimum develops a small magnetization. Decreasing $\Delta$ even further, at $\Delta = \Delta_{\rm alg} \simeq 0.9805\rho^2$ this metastable minimum disappears with a spinodal transition. In the interval $[\Delta_{\rm alg}, \Delta_{\rm IT}]$ one finds the hard phase, defined by the property that the AMP algorithm is suboptimal (the shaded yellow region in Fig. 1): the global minimum of the free energy has a high $m$ (low MSE), but the small-$m$ non-physical local minimum continues to describe the attractor of AMP. The state evolution describing the AMP algorithm starting from random conditions converges to the local minimum of lowest magnetization.
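The fixed points discussed above can be explored numerically. Below is a minimal sketch, assuming the standard replica symmetric state evolution for this model on the Nishimori line ($q_0 = m$), $m^{t+1} = \mathbb{E}_{x^{(0)},W}\big[x^{(0)} f\big(m^t/\Delta,\; m^t x^{(0)}/\Delta + \sqrt{m^t/\Delta}\, W\big)\big]$, with $f$ the posterior mean under the Rademacher-Bernoulli prior. The function names, the Gauss-Hermite quadrature and the stopping rule are illustrative choices, not the authors' implementation.

```python
import numpy as np

def posterior_mean(a, b, rho):
    """Mean of x under the weight P_X(x) exp(-a x^2/2 + b x), Rademacher-Bernoulli P_X."""
    # log cosh(b), written so that it does not overflow for large |b|
    log_cosh = np.abs(b) + np.log1p(np.exp(-2.0 * np.abs(b))) - np.log(2.0)
    t = np.log((1.0 - rho) / rho) + 0.5 * a - log_cosh
    p_nonzero = 1.0 / (1.0 + np.exp(np.clip(t, -700.0, 700.0)))
    return p_nonzero * np.tanh(b)

def state_evolution(delta, rho, m_init, n_iter=5000, n_nodes=64, tol=1e-14):
    """Iterate the RS state evolution for the overlap m with the signal."""
    # Gauss-Hermite nodes and weights for the average over a standard normal W
    w, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)
    m = m_init
    for _ in range(n_iter):
        a = m / delta
        # x0 = +1 and x0 = -1 contribute equally by symmetry; x0 = 0 contributes nothing
        m_new = rho * np.sum(weights * posterior_mean(a, a + np.sqrt(a) * w, rho))
        m_new = max(float(m_new), 0.0)  # guard against round-off below zero
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m
```

Starting from a tiny `m_init` mimics an uninformed initialization of AMP, while `m_init = rho` mimics an initialization at the signal; comparing the two for a given `delta` is one way to probe for a hard phase. Linearizing the map around $m = 0$ gives $m^{t+1} \approx (\rho^2/\Delta)\, m^t$, consistent with the instability of the zero-magnetization fixed point at $\Delta_c = \rho^2$.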

Glassy phase and complexity
The low branch RS solution is non-physical below $\Delta_{\rm IT}$; its existence, however, suggests that metastable states exist that should be described with RSB. We therefore consider the 1RSB ansatz. We divide the $n$ replicas $a = 1, \dots, n$ into $n/s$ blocks, where $s$ is the so-called Parisi parameter [12]. The overlap matrix becomes
$$q_{ab} = \begin{cases} q_1 & a \neq b \text{ in the same block} \\ q_0 & a, b \text{ in different blocks} \end{cases} \qquad (22)$$
and analogously for $\hat{q}$. For $s$ strictly equal to one we get back the replica symmetric ansatz Eq. (16). Note that for $s \neq 1$, $m$ and $q_0$ are in general different in the solution: this is crucial when evaluating the MSE Eq. (17), as the minimum of the MSE does not correspond in general to the maximum of $m$.
The 1RSB free energy takes the form given in Eq. (24). Its stationary points are obtained as the fixed points of Eq. (26), where $A = q_1/\Delta$, $B = m x^{(0)}/\Delta + W\sqrt{q_0/\Delta}$ and $C = (q_1 - q_0)/\Delta$, and the extremum is a minimum in $m$ and a maximum in the other parameters.
We would like to reiterate here the observation that, in the same way that the stationary points of the RS free energy correspond to state evolution fixed points of the AMP algorithm, the stationary points of the 1RSB free energy correspond to the fixed points of the state evolution of an approximate survey propagation algorithm that depends on $s$ [28]. In particular, the expression (17) gives exactly the MSE of such an algorithm, with $m$ and $q_0$ being the solution of (26).
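For concreteness, the relation between the MSE and the order parameters can be spelled out as a short worked derivation. This is a sketch under the assumption that $\hat{x}_i$ denotes the estimator returned by the algorithm, that $m$ is its overlap with the signal and that $q_0$ is its self-overlap; these identifications are the standard ones but are stated here as assumptions rather than quoted from the source.

```latex
\begin{align}
\mathrm{MSE}
  &= \frac{1}{N}\sum_{i=1}^{N}\left(x_i^{(0)} - \hat{x}_i\right)^2 \\
  &= \frac{1}{N}\sum_{i}\left(x_i^{(0)}\right)^2
     - \frac{2}{N}\sum_{i} x_i^{(0)}\,\hat{x}_i
     + \frac{1}{N}\sum_{i} \hat{x}_i^{\,2} \\
  &= \rho - 2m + q_0 .
\end{align}
```

When $m = q_0$, as on the Nishimori line, the expression reduces to $\rho - m$, so that minimizing the MSE is the same as maximizing $m$; when $m \neq q_0$ the two criteria are no longer equivalent.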
For high enough $\Delta$ the 1RSB solution collapses to the RS one, meaning that $q_0 = q_1 = m = 0$. At $\Delta_{\rm dyn}$ the saddle point equations for $s = 1$ admit a solution with $m = q_0 = 0$, $q_1 > 0$. The value of $q_1$ in this solution coincides with the value of $m$ in the high magnetization RS branch discussed in the previous section. At $\Delta_{\rm IT}$ the metastable states undergo an entropy crisis transition. Although the thermodynamically dominant state becomes the state with high correlation with the ground truth signal, glassy states continue to exist. In fact, as far as these states are concerned (if we neglect the high magnetization state), the system undergoes there a Kauzmann transition, where the dominant glassy states have zero complexity and the value of the Parisi parameter $s$ is determined by the condition that the complexity $\Sigma(\Delta, s)$ (defined below) is equal to zero [32].
Let us now discuss the $s \neq 1$ solutions. It is well known that the Parisi parameter $s$ can be interpreted as an effective temperature that allows one to select families of metastable states of given (internal) free energy [33]. Their corresponding complexity $\Sigma$ (defined as the log of their number) is obtained by differentiating (24) w.r.t. $s$ [33] and multiplying the result by $s^2$, i.e. $\Sigma(\Delta, s) = s^2\, \partial \phi_{\rm 1RSB}(\Delta, s)/\partial s$, where $\phi_{\rm 1RSB}$ denotes the 1RSB free energy (24).
As expected, this complexity for $s = 1$ coincides with the free energy difference between the two RS branches discussed in the previous section. In Fig. 2 we plot the complexity as a function of both $s$ and the noise variance $\Delta$. For each value of $s$ we find two regions: a physical region where $\Sigma$ is positive, and a non-physical one where $\Sigma < 0$. Note that the physical region with positive complexity continues not only below $\Delta_{\rm IT}$, but even well below $\Delta_{\rm alg}$.
The 1RSB solution is not guaranteed to give the exact description of the glassy states. It is well known that replica solutions should be stable against (further) breaking of the replica symmetry. This requires that all the eigenvalues of the Hessian of the free energy be positive in the solution. The 1RSB solutions can lose stability in two possible ways, associated with negative values of two eigenvalues, $\lambda_{\rm I}$ and $\lambda_{\rm II}$ [34][35][36]. A negative $\lambda_{\rm I}$ (type I instability) signals the appearance of new scales of distance between states. A negative $\lambda_{\rm II}$, on the other hand, is met when the glassy states are unstable against a Gardner transition to further RSB [34,35]: each metastable state splits into a hierarchy of new states (type II instability) [36]. In Fig. 2 we mark the stable region with full lines and the unstable ones with dashed lines. Type I instability is found for large $s$ in the non-physical region of negative complexity. Type II instability is found in the physical region at small values of $s$, and it has also been found in spin glass models [36][37][38].
Let us now discuss in detail the glassy solutions that one finds for $\Delta < \Delta_{\rm IT}$, representing metastable states with higher free energy than the high-magnetization solution. These solutions have zero or low magnetization (overlap with the signal). As already remarked, for a given $\Delta$, among all the glassy states the ones with lowest total free energy turn out to be the ones with zero complexity $\Sigma$. For different fixed values of the parameter $s$, the complexity curves reach zero at different values of $\Delta$. Remarkably, as illustrated in Fig. 2, a stable (towards higher levels of RSB) zero-complexity solution is found down to a value of noise $\Delta_{\rm 1RSB,equil} < \Delta_{\rm alg}$. Stable solutions of positive complexity exist down to $\Delta_{\rm 1RSB,stable} < \Delta_{\rm 1RSB,equil}$, and solutions with positive complexity (irrespective of the stability) down to $\Delta_{\rm 1RSB,all} < \Delta_{\rm 1RSB,stable}$. An example for $\rho = 0.08$ is shown in Fig. 2. This notably means that for $\Delta < \Delta_{\rm alg}$, namely in the easy phase where AMP converges close to the signal, families of metastable states continue to exist, some of them being stable with extensive complexity.
One can discuss how these states influence Monte Carlo dynamics, which explores the space of configurations according to the principles of physical dynamics. On the one hand, one could conjecture that Monte Carlo dynamics gets trapped by glassy states even below $\Delta_{\rm alg}$. On the other hand, the dynamics is expected to fall out of equilibrium for all $\Delta < \Delta_{\rm dyn}$ and it is not a priori clear in which states it should get trapped. While AMP clearly works for $\Delta < \Delta_{\rm alg}$ and does not work for $\Delta > \Delta_{\rm alg}$, our analysis does not provide any reason why the threshold $\Delta_{\rm alg}$ should be relevant for Monte Carlo or other sampling-based algorithms. For such physical dynamics, numerical simulations and analytic studies in suitable models are necessary to clarify the question of what the corresponding algorithmic threshold is.
So far we focused on glassy states of positive complexity (i.e. existing with probability one for a typical instance). There are also solutions of the 1RSB equations having negative complexity. We will call the negative-complexity solutions the ghost-glassy states. From the physics point of view those solutions do not correspond to physical states for typical instances. Yet, from the algorithmic point of view they do correspond to the fixed points of the ASP algorithm [28] run for a given value of the Parisi parameter $s$; as such they can be reached algorithmically. At this point it becomes relevant to understand at which value $\Delta_{\rm alg}(s)$ the ghost-glassy states disappear, developing a spinodal instability towards the high-magnetization state. In particular we can ask the natural question whether, with a suitable choice of the Parisi parameter $s$, ASP improves over the algorithmic threshold $\Delta_{\rm alg} \equiv \Delta_{\rm alg}(s = 1)$ of the usual AMP ($s = 1$), i.e. whether there is an $s$ for which $\Delta_{\rm alg}(s) > \Delta_{\rm alg}(1)$. With this question in mind, in Fig. 3 we plot the mean-squared error (MSE) with the ground truth signal, given by Eq. (17), as a function of $s$ for various values of $\Delta$.
We initialize the 1RSB fixed point equations at infinitesimal magnetization and iterate them till a fixed point. We observe that for all values of ∆ the MSE is minimized for s = 1, i.e. by the canonical AMP algorithm.

CONCLUSION
In conclusion, we studied the glassy nature of the hard phase in inference problems. Our results imply that the corresponding metastable state is indeed glassy, i.e. composed of exponentially many states. We evaluate their number (complexity) as a function of their internal free energy to conclude that this glassiness extends to a range of the noise parameter $\Delta$ even larger than the extent of the hard phase. This finding re-opens the natural question of the performance limits of Monte Carlo based sampling. While some recent works [6] anticipated numerically that Monte Carlo and message passing will share the same algorithmic threshold, our results do not provide any evidence of this. Instead they suggest that, since glassiness is present also below the algorithmic threshold of AMP, the performance of sampling-based algorithms will be different in general. In order to validate this proposition one needs to study a different model than the present one. The present model is dense and thus not suitable for large scale simulations, and an analytically tractable description of sampling-based dynamics for the present model is a major open problem. One possibility is to perform a large-scale numerical study with Monte Carlo based dynamics in diluted models such as those studied in [39]. Another possibility is to aim at an analytical description of the Langevin dynamics, which is known in a tractable form so far only for mixtures of spherical p-spin models.
While we anticipate that the performance of the usual sampling-based algorithms will be hampered by the glassiness, it is an interesting open question to investigate whether other algorithms are able to match the performance of AMP. We have in mind for instance the algorithms based on the robust ensemble as introduced in [40].
Concerning the AMP algorithm, we conclude that, despite the fact that it assumes the hard phase not to be glassy, the improved description in terms of one-step replica symmetry breaking, which takes glassiness into account, does not provide an algorithmic improvement. This is at variance with the situation in random constraint satisfaction problems, where the knowledge of the organization of the space of solutions provided by 1RSB leads to algorithmic improvement [29]. We note that this observation is surprising, and we are missing a physically intuitive explanation for why taking glassiness into account improves performance in optimization problems but not in Bayes-optimal inference. We stress that our results provide strong evidence towards the conjecture that the hard phase is impenetrable for some computationally fundamental reasons. Further investigation of this is an exciting direction both for physics and theoretical computer science.
In this paper we use the example of low-rank matrix estimation with spins 0 and ±1 as a prototypical example in which the hard phase exists. We checked that the resulting picture applies in a range of parameters and also for some other models (such as the planted mixed p-spin model) where the hard phase was identified. We expect the picture presented here to be generic in all the problems where the hard phase related to a first order phase transition was identified.
We also note that our above conclusions apply to the case of Bayes-optimal inference where the generative model is matched to the inference model. In case the hyper-parameters are not known or mismatched the message passing algorithm that takes glassiness into account can provide better error and robustness, this is investigated in detail in [28].
Finally, we mention that the results shown here may be of interest also beyond inference problems. In particular, the instabilities of the RS solution at $\Delta_{\rm alg}$ and $\Delta_c$ can be related to a similar phenomenon occurring in the mean field theory of liquids and glasses [41,42]. A phase structure similar to the one presented in this paper is found in that case, if we identify $\Delta$ as the analogue of an (inverse) density parameter and the reconstruction phase as the crystal. Also in that case, the RS solution representing the liquid at low density describes a non-ergodic phase of extensive complexity at higher density. As is the case here, there is a density where the complexity vanishes, but the solution can be continued beyond this point. Finally, there is a maximum density where the solution undergoes an instability, called the Kirkwood instability, and ceases to exist [43,44]. Our analysis suggests that within inference models not only the non-physical negative-complexity RS solution could undergo this instability, but also the glassy ones. Whether this phenomenon could be relevant for other glassy systems is an intriguing question.