Superposition Enhanced Nested Sampling

The theoretical analysis of many problems in physics, astronomy and applied mathematics requires an efficient numerical exploration of multimodal parameter spaces that exhibit broken ergodicity. Monte Carlo methods are widely used to deal with these classes of problems, but such simulations suffer from a ubiquitous sampling problem: the probability of sampling a particular state is proportional to its entropic weight. Devising an algorithm capable of sampling efficiently the full phase space is a long-standing problem. Here we report a new hybrid method for the exploration of multimodal parameter spaces exhibiting broken ergodicity. Superposition enhanced nested sampling (SENS) combines the strengths of global optimization with the unbiased/athermal sampling of nested sampling, greatly enhancing its efficiency with no additional parameters. We report extensive tests of this new approach for atomic clusters that are known to have energy landscapes for which conventional sampling schemes suffer from broken ergodicity. We also introduce a novel parallelization algorithm for nested sampling.

Computer simulations play an important role in the study of phase transitions and critical phenomena. In particular, stochastic techniques such as Monte Carlo (MC) methods have proved to be powerful tools [1]. These methods rely on the ability of the Monte Carlo algorithm to sample the accessible volume in phase space. There are, however, many situations where standard Monte Carlo simulations suffer from a lack of ergodicity. In that case, more sophisticated algorithms are needed to explore the volume in phase space that is, in principle, accessible. Some such techniques are based on the efficient exploration of the underlying, multi-dimensional potential energy surface (PES) [2]. The PES, or energy landscape, can be viewed as a collection of basins separated by barriers, where each basin corresponds to a particular local minimum-energy configuration. The basin volumes define the entropic weight of the corresponding local minima. The transition rate from one basin to another depends on the barrier height as well as the relative entropic weights (configurational space volumes) [2]. Many PES of interest exhibit frustration in the form of low-lying minima with different morphologies separated by high barriers. These structures may act as kinetic traps, when fixed-temperature sampling methods such as molecular dynamics or Metropolis Monte Carlo sampling are used. There exist a wide range of extended or biased sampling techniques, both in Monte Carlo and in Molecular Dynamics, that make it possible to speed up the sampling of landscapes with kinetic traps. These techniques include Monte Carlo methods such as umbrella sampling [3,4], density of states based methods, such as the Wang-Landau method [5] and replica exchange methods [6,7], and their Molecular Dynamics counterparts. Examples are the replica-exchange MD method [8], and the Meta-Dynamics method [9]. In cases where a biased distribution is generated, the original distribution can be reconstructed using reweighting techniques [10,11] However, these approaches may perform poorly when deal-ing with high-dimensional spaces exhibiting broken ergodicity or, in other words, with highly multimodal (or multifunnel [2,[12][13][14]) parameter spaces [15][16][17][18].
In recent years a Bayesian method known as nested sampling [19] has emerged as a possible alternative to extended or biased sampling methods. The nestedsampling approach has found widespread application in astrophysics [20,21] and cosmology [22,23], and has drawn the attention of computational and statistical physicists [24][25][26][27][28][29]. Furthermore the method has been recently adopted for Bayesian model comparison in system biology [30][31][32]. Nested sampling explores phase space in an unbiased way, and allows one to determine statistically the density of states associated with shrinking fractions of phase space. This objective is achieved by placing a constraint on the potential energy (for instance), which decreases at each nested sampling iteration. Like Wang-Landau sampling, the method is athermal and produces the density of states and the partition function (Bayesian evidence) as its primary product. However, nested sampling does not require binning of the energy for systems with continuous potentials. The self-adapting steps in energy (but constant in log phase space volume) is attractive because the approach does not require prior knowledge of possible phase transitions. For example, the step size adjusts automatically as the phase space volume shrinks near a first order phase transition [19,25].
An important drawback of nested sampling is that when the decreasing energy constraint forbids a transition to an unexplored basin, that basin cannot be visited and ergodicity is broken. Hence, while nested sampling certainly is conceptually interesting, its performance is often no better than that of conventional extended sampling methods in dealing with systems exhibiting broken ergodicity [25]. In the present work we introduce a novel hybrid methodology for the exploration and thermodynamic analysis of such systems.
Superposition enhanced nested sampling (SENS) com-bines the strengths of unbiased global optimization techniques [2] with those of nested sampling. Global optimization techniques such as basin-hopping [33,34] are designed to find the lowest energy configuration of a PES. They are not constrained to sample according to any distribution, so they are free to use 'quick and dirty' techniques while searching for the global minimum. For example, they can take Monte Carlo steps that do not satisfy detailed balance, or make use of minimisation algorithms such as L-BFGS and conjugate gradient. Such operational freedom tends to make global optimization algorithms more efficient than generalised ensemble methods at locating the lowest energy minima [35][36][37][38][39]. Collections of the lowest energy minima configurations thus obtained can then be used in the context of the superposition approach (SA) [2,[38][39][40][41][42][43][44] to compute the thermodynamic properties of the system. However, doing so accurately at high temperatures using the SA alone often requires a prohibitively large number of minima.
In the present contribution we show how knowledge of the lowest energy minima and their statistical weights, calculated using the harmonic superposition approximation (HSA), can be exploited to enhance the problematic low energy behaviour of nested sampling, thus making it likely that none of the important minima and associated regions are missed. Although we discuss SENS in the context of energy landscapes, the method is completely general and can be applied to any multi-modal parameter space whose minima (maxima in likelihood) can be identified by global optimization algorithms.

NESTED SAMPLING
Nested sampling [19] provides an elegant solution to the problem of evaluating the density of states, and hence the partition function, for arbitrary systems. A likelihood value is assigned to each possible configuration. For our purposes the likelihood is the Boltzmann factor exp(−E/k B T ), but it could be some other measure. Typically, there are large numbers of configurations with a low likelihood. In addition, there may be a small number of configurations with high likelihood.
The aim of nested sampling is to sample configuration space uniformly, but with the energy constrained to lie below a maximum value, E max , that decreases iteratively throughout the calculation. The rate of decrease is maintained self-consistently, such that the phase space volume with energy less than E max decreases by a constant factor in each iteration.
The nested sampling algorithm starts by generating K configurations of the system completely at random, distributed uniformly, in configuration space. The energy, E R , of each of these configurations is computed and added to a sorted list. For each of these replicas we define the configurational phase space volume, Ω E≤ER , contain-ing all configurations with E ≤ E R . The key insight of nested sampling is that the volumes, Ω E≤ER , normalised by the total phase space volume, are distributed according to the Beta distribution, Beta(K − R + 1, R). This distribution has expectation value and variance The above formalism assumes that the total phase space volume Ω tot is finite, but this condition can generally be satisfied with negligible error, for example by placing the system in a large box. At the i-th nested sampling iteration the replica (out of K replicas) with highest energy E max i is discarded and replaced by a new configuration sampled uniformly under the constraint E ≤ E max i . The maximum energy E max i is stored for later analysis. Again, the volume of configuration space with energy less than the R-th largest energy, Ω E≤ER , this time normalized by Ω E≤E max i , is distributed according to the Beta distribution with mean and variance given by Eq. (1). During the nested sampling iteration the volume of phase space with energy below E max contracts, on average, by µ 1 = K/(K + 1). The algorithm produces a list of the The density of states, or the (normalized) volume of phase space with energy between E max i+1 and E max i is simply (2) Thermodynamic quantities of interest, such as the mean energy, entropy, free energy, and heat capacity, can easily be computed from the density of states at arbitrary temperature.
To generate configurations uniformly in space we use the strategy suggested by Skilling [19]: after removing the configuration with highest energy one of the remaining K − 1 replicas (chosen randomly) is duplicated. The new configuration is then evolved through a Markov chain Monte Carlo (MCMC) walk sufficiently long to decorrelate the system from its initial state. This Monte Carlo walk is equivalent to a normal Monte Carlo simulation at infinite temperature. The coordinates are randomly perturbed, and the new configuration is accepted subject only to the condition that the energy remains below E max . For most systems of interest the vast majority of the computational effort will be spent generating new configurations.

Parallelization
Nested sampling can be formulated to run in parallel on an arbitrary number of processors. We present a pseudocode description of our parallel implementation in Algorithm 2. Since this scheme also constitutes the basic framework for SENS we define the MCMC loop in the most general way at line 9 of Algorithm 2. For the purpose of discussing the algorithm in its simplest form here we will consider Algorithm 1.
for l = 0 to N -steps do generate trial configuration (e.g. by random perturbation); if E trial ≤ Emax: accept trial configuration; end for

ALGORITHM 1. Nested sampling MCLoop
At each nested sampling iteration, instead of removing only the replica with the highest energy, we remove the P replicas with highest energy, where P is the number of processors available. The rate of phase space contraction now is given by µ P , leading to much faster phase space contraction and shorter calculations in terms of wall-clock time. This parallelization procedure was first described in reference [27]. Our improvement is that we do not discard the P − 1 replicas with highest energy but we store them for later analysis. Phase space contraction between iterations is still constant, but now the post-analysis is slightly more complicated. The fraction of configuration space associated with the n-th recorded energy is where "%" is the mod operator. This method follows the same stepping routine as the existing parallelization algorithm. However it produces P times as many points, hence providing a more detailed picture of the potential energy surface and a much more fine-grained binning of the density of states.

SENS -THE CONCEPT
Global optimization is a common numerical problem and global optimization algorithms have been developed in many areas of science [2,45,46]. Knowledge of the local minima alone, however, is not sufficient to infer all the observables properties of interest from the energy landscape (or in general any parameter space). The harmonic superposition approximation (HSA) [47] (for more details, see e.g. [2]) allows one to compute density of states and partition function, solely based on the knowledge of the individual local minima and the local curvatures (normal MCLoop{{Rs}, Emax, minima.db } 10: end while ALGORITHM 2. Parallel nested sampling mode frequencies) of the potential energy landscape, via the Hessian matrix. In the HSA each local minimum corresponds to a harmonic basin and observable properties are expressed as a sum over individual contributions of the minima.
The HSA has been shown to be very effective for several systems [18,39,48] but the accuracy depends on how well the potential energy of the basins can be approximated as harmonic, and how many minima are thermodynamically important. The HSA becomes an increasingly good approximation at lower energies. The total number of minima increases exponentially with system size [47,49], but it is impossible to tell a-priori how many of those are important. For example, LJ 31 , a cluster of 31 isotropic particles interacting through a Lennard-Jones potential [50], has about 3 × 10 15 predicted distinct configurations [12], but only a few dozen are required to reproduce the low temperature thermodynamic behaviour.
The global resolution of nested sampling depends on the number of replicas, K, used in the simulation, which is generally limited by the available computation time (the larger K, the slower the descent in energy). A more serious problem for nested sampling is that if the barrier to enter an unexplored funnel or superbasin is higher than the energy constraint E max , that region of the PES will never be explored if it is not already populated. For example, in a crystallisation transition, at high energy the statistical weight of the liquid phase will be overwhelming and there will be no replicas in the region corresponding to the solid phase. However, as the energy constraint decreases (hence the temperature) the relative statistical weight associated with the solid phase increases. If we could sample phase space uniformly then at low energy we would observe a phase transition corresponding to crystallisation, but we must resort to a MCMC walk to explore phase space. If the entrance to the superbasin corresponding to the crystal has been locked out by E max a Markov chain will not be able to find it, thus missing the transition.
Here we propose a new method that combines complementary techniques: nested sampling can sample efficiently the high energy regions of phase space, while at low energy a database of minima obtained by global optimization is used to augment the survey. While nested sampling assigns the correct statistical weight to each basin, global optimization makes it likely that no important minima are missed. This philosophy is also used in other methods combining replica exchange Monte Carlo with global optimization algorithms to treat broken ergodicity [12,16].

SENS -THE ALGORITHM
Employing knowledge of low-lying minima fits naturally within the framework of nested sampling. We present here an exact and an approximate implementation of the SENS algorithm. Exact SENS is fully unbiased and requires no additional parameters than those needed in nested sampling. Approximate SENS, on the other hand, is formally biased and requires additional parameters. The reason for presenting both methods is that, in some cases, the latter can be considerably simpler to implement than the former while generally producing equally good, or better, results. SENS is based on the original nested sampling algorithm presented in Algorithm 2. The novelty of our method resides in the augmented sampling of the parameter space obtained by coupling the MCMC to the HSA. SENS can therefore be implemented by changing the function MCLoop({R s }, E max , minima.db) of Algorithm 2. A full outline of the SENS algorithm can be found in Algorithm 3 of the Supplementary Information. To run SENS, a database of the lowest energy minima must be precomputed. In this work we adopt basin-hopping [33,34] as the global optimization algorithm of choice.

Exact SENS
An unbiased version of SENS can be implemented by means of Hamiltonian replica exchange Monte Carlo moves [51]: in addition to normal MC steps, we introduce a new Monte Carlo step in which a minimum is sampled from the database according to its HSA configurational entropic weight: We define the configurational volume of basin b and the total configurational volume where V (b) is the potential energy of the minimum corresponding to basin b, ν α are the normal mode vibrational frequencies defined by the Hessian matrix, κ is the number of vibrational degrees of freedom, and n b is the degeneracy of the basin (for Lennard-Jones clusters this is the number of distinct non-superimposable permutationinversion isomers for minimum b) [2]. Here we have left out all the constant factors that cancel out and overall rotations. Once a minimum is selected, a configuration with E ≤ E max is then chosen uniformly from within its basin of attraction. This approach corresponds to selecting a point uniformly from a multidimensional harmonic well. Such a configuration can be generated analytically, see the Supplementary Information for details. Unlike Ref. [16], in our approach we sample from the uniform distribution of configurations below energy E max , rather than from the corresponding canonical distribution.
Thus, we obtain a configuration R sys sampled according to the true Hamiltonian, H sys , and a trial configuration, R har , sampled according to the HSA-Hamiltonian, H har . The energies of the two configurations are then computed with the other Hamiltonian. If then the true distribution and the HSA distributions overlap, the swap is accepted, and R sys becomes R har , otherwise it is rejected. This procedure guarantees that detailed balance is satisfied. In practice only the lowest energy minima will successfully swap, since the HSA can only be reasonably accurate around these basins. It is, however, at low energy that such swaps are needed the most due to the hard energy constraint used by nested sampling. Note that swaps are complemented by regular MCMC walks to allow for the exploration of the full configuration space. In SENS the replicas are allowed to "tunnel" between basins, thus improving the sampling. A more detailed description, along with a pseudo-code implementation of MCLoop specific to Lennard-Jones clusters is provided in the Supplementary Information.

Approximate SENS
The implementation of approximate SENS is somewhat simpler, but comes at the cost of at least one extra parameter. The basic idea of approximate SENS is that the sampling of configuration space can be augmented by starting a MCMC walk from a local minimum configuration, sampled from the database according to its entropic weight Eq. (4), with some user defined frequency. This frequency is intrinsically defined in exact SENS by the relative overlap of the HSA and the true density of states. To implement approximate SENS we only need a database of minima and their relative weights computed according to Eq. (4). Before each MCMC step a random number is drawn. If this number is less than some user defined probability, P DS , then a minimum is selected from the database according to the HSA weights and the MCMC walk starts from this minimum configuration. A pseudo-code implementation of MCLoop for approximate SENS is provided in the Supplementary Information.
There are two main sources of bias in the approximate SENS. The first one is due to the limited number of minima from which we sample, since we cannot include the large number of high energy minima. The second source of error is due to the poor quality of the HSA approximation far from the minimum, hence the entropic weights for the minima are not accurate at high energy. The most obvious way of reducing these biases is to use long MCMC walks. In fact, if we sample from the wrong basin a long MCMC walk will allow the system to escape and explore regions of phase space with greater entropic weight. However, very long MCMC walks are computationally expensive, and if short runs are required we need to sample from the database of minima carefully. If we start sampling from the database of minima at high energy we will possibly introduce a bias due to over-weighting of the low energy regions of configuration space. To avoid this problem we suppress sampling from the database until we are sure the HSA is likely to describe the potential energy landscape accurately. We use a simple function (of the Fermi type) that delays the onset of sampling from the database of minima and limits its maximum frequency where E on is some onset energy and E (R) min is the energy of the replica with lowest energy. E on could be chosen as E (minima.db) max , the energy of the highest known minimum (stored in the database), or as the largest energy at which the HSA describes the system accurately. f max and ∆E are user-defined parameters that determine the total probability of sampling after the onset and the width of the onset region, respectively. For small sampling probabilities, P DS ≪ 1, the optimal frequency of sampling from the database, should scale as 1/K; a theoretical justification is derived in the Supplementary Information. Hence, for P DS ≪ 1, we can make the probability of sampling from the database independent of the number of replicas, replacing f max with f max /K.
We identify two possible strategies for the application of approximate SENS. One is to start sampling from a large database early in the simulation when , with a small P DS , hence we choose f max ≪ 1. This procedure allows nested sampling to do most of the work, but ensures that no important basins will be missed. Alternatively, sampling from the database can be delayed until all the high temperature transitions have occurred, at which point we start sampling more extensively from the database, hence f max 1/2. Note that the database can be considerably smaller in this case. The first strategy is a slight enhancement to nested sampling, while the latter strategy interpolates between nested sampling and the HSA in a similar spirit to the basin sampling method [12]. Importantly, even if we sample from the database of minima, we use the MCMC walk to explore more the anharmonic regions of a basin, allowing us to go beyond the harmonic approximation.

RESULTS
We test SENS by calculating the thermodynamic properties of Lennard-Jones clusters exhibiting broken ergodicity. Lennard-Jones (LJ) clusters are systems of particles that interact via the Lennard-Jones potential [50] where ǫ is the pair well depth, σ is the separation at which E = 0, and 2 1/6 σ is the equilibrium pair separation. LJ clusters have served as benchmarks for many global optimization techniques and for the estimate of thermodynamic properties by computer simulation [2,12,15,18,25,34]. The majority of putative ground states for LJ clusters are based on icosahedral packings [13]. For some magic number LJ clusters complete Mackay icosahedra are possible, for examples N = 13, 55. Complete icosahedral structures are considerably more stable than neighbouring sizes and their landscape is funneled towards the global minimum [13]. There are, however, other sizes for which the global minimum is not icosahedral. Examples are LJ 38 , whose ground state is an fcc-truncated octahedron [13], and LJ 75 whose global minimum is a Marks decahedron [13]. Due to the overwhelming number (entropic weight) of structures based on incomplete icosahedra at high energy, the energy landscapes of LJ clusters with nonicosahedral global minima exhibit broken ergodicity. Calculating accurate thermodynamic properties for these systems has proved to be a real challenge for all conventional techniques [2,15,18,25] and hybrid or more complicated schemes [12,14,15,18] are necessary. LJ clusters with broken ergodicity therefore provide excellent benchmarks to test the performance of new sampling techniques.

LJ31
LJ 31 is the smallest Lennard-Jones cluster exhibiting broken ergodicity and a low temperature solid-solid phase-like transition from Mackay to anti-Mackay surface structures [13]. Convergence of the heat capacity curve for LJ 31 by parallel tempering (PT) with 24 geometrically distributed temperatures in the range 0.0125 to 0.6 required N total E = 1.9 × 10 11 energy evaluations to converge (curve shown in Fig. 1). Partay et al. [25] report that K = 288000 replicas and N total E = 3.4 × 10 12 energy evaluations were needed to converge the heat capacity curve of LJ 31 by nested sampling (NS) using a low particle density of 2.31 × 10 −3 σ −3 (100 fold less dense than our system). Fig. 1 compares the heat capacity curves obtained by PT, HSA (computed using 80000 minima), NS and SENS for LJ 31 . The SENS and NS results correspond to K = 20000 replicas, N = 10000 steps for each MCMC walk, and P = 16 cores. The database of minima used for SENS contained the lowest 183 minima although for SENS exact we observe that only seven minima contribute to the swaps; see Table IV of Supplementary Information for the swap statistics. From Fig. 1 we see that both exact SENS and approximate SENS are well converged and agree with the PT curve over the whole temperature range, and with the HSA at low temperature. We note that K = 20000 replicas are not nearly enough for NS to converge, and the low temperature peak is in fact completely absent. Using this number of replicas SENS requires half the total number of energy evaluations of PT and one order of magnitude less than NS, see Table I. The swap operations do not constitute a noticeable overhead and the reduction in the total num-  indicates the total number of energy evaluations (summed over all processors). PT was performed using 24 replicas spread geometrically through the temperature range 0.0125 to 0.6. Note that approximate SENS can perform as well as exact SENS when fewer replicas are used, in the interest of brevity we do not include these results as the LJ75 calculations illustrate clearly the capabilities of the method. ber of energy evaluations corresponds to an equivalent reduction in wall-clock time.
In Fig. 2 we show a comparison of PT, HSA and exact SENS for a range of replica numbers 2500 ≤ K ≤ 20000; see Table I for comparison. We observe that the high temperature peak practically converges for K = 10000 and it resembles the features of the converged curve quite well even for smaller numbers of replicas. The low temperature peak instead converges very quickly, for as few as K = 2500 replicas, representing an improvement in performance of 20 times over PT. We note that one of the great strengths of SENS is that even when a small number of replicas are used and run times are very short, although the curves may not be completely converged, the physical picture produced by the method is always correct because all the important basins are visited. On the other hand, rapid convergence of the heat capacity curves, requires the HSA to be a good representation for the system. LJ 38 is an example for which this condition does not hold, see the Supplementary Information for further details.

LJ75
LJ 75 is a particularly clear example of a doublefunneled energy landscape [13] with O(10 25 ) distinct local minima [12]. The decahedral global minimum is separated by a very large potential energy barrier from the lowest icosahedral minimum. Sharapov and Mandelshtam [16] showed that O(10 12 ) (total) energy evaluations of adaptive parallel tempering are not enough to converge the heat capacity peak corresponding to the solid-solid phase-like transition in LJ 75 [16]. Furthermore, the rate of convergence slows down dramatically (it practically stops) after O(10 11 ) (total) energy evaluations and coupling of PT to the HSA is necessary to obtain convergence of the low temperature peak [16]. Fig. 3 compares the heat capacity curves obtained by HSA (computed using 758 minima) and SENS for LJ 75 . SENS was carried out using K = 30000 or K = 60000 replicas and N = 10000 steps for each MCMC walk on P = 16 processors. The database of minima for SENS contained the lowest 758 minima. Approximate SENS started sampling from the database at E on = −369 ǫ, while for exact  indicates the total number of energy evaluations (summed over all processors). PT curves are not shown as the computational cost to converge its heat capacity by this method is computationally prohibitive as shown in Ref. [16]. SENS exact does not converge as quickly as approximate SENS due to the low accuracy of the HSA and hence the low swap acceptance.
SENS only 10 of the minima contributed to the swaps; see Table VI of the Supplementary Information for swap statistics. Unlike adaptive PT [16], approximate SENS converges in O(10 11 ) energy evaluations (Table II), but exact SENS fails to converge the low temperature peak for the same number of replicas. As for LJ 38 , exact SENS does not converge quickly due to the lower accuracy of the HSA, as inferred from the extremely low swap acceptance (Table VI of Supplementary Information). On the other hand approximate SENS performs considerably better than for LJ 38 because the melting transition is well separated from the solid-solid transition, thus allowing sampling from the database relatively early on in the simulation (right after melting) without affecting the melting transition.

Methods
We define a move in MCMC as the displacement of each individual particle along a random vector (n in total). After each MCMC walk we update the step size in order to keep the average acceptance ratio within range of some target value, which we have chosen as 0.5. The default parameter values for the onset function Eq. (8) were f max ≈ 2/3 and ∆E = 1. We used a spherical box of radius R = 2.5 σ for n = 31, R = 2.8 σ for n = 38 and R = 3.0 σ for n = 75, with no periodic boundary conditions and no cutoff radius. All calculations were carried out on a single workstation with P = 16 processors (eight-core dual Xeon E5-2670 2.6GHz, Westmere) using the improved parallelization scheme discussed in sec. Parallelization. The calculations were terminated when the energy difference between the replicas with highest and lowest energies was less than 10 −2 ǫ. Energies of the final "live" replicas were added to the output and the compression factor associated with the ℓ th "live" replica was computed as Error bars were obtained by the compression factor resampling scheme discussed in the Supplementary Information. By nested sampling or SENS iterations, N iter , we mean a whole nested sampling iteration on P processors, the total number of energy evaluations is N (tot) E = N ×P ×N iter , where N is the number of steps in a MCMC. The computational overhead associated with global optimization by Basin Hopping is insignificant as ≤ O(10 5 ) energy evaluations are necessary to find the global minima of the LJ clusters considered here [52]. Highly modular Python/C parallel implementations of nested sampling and SENS are publicly available [53,54].

Sampling configurations in a harmonic well
Given a harmonic potential the configurational density of states for a basin can be obtained by inverse Laplace transforming the configurational partition function. In particular, the scaling goes as: where all the terms that do not depend on energy have been left out. g c (E) is the configurational density of states, Z c the configurational partition function (evidence), β = 1/k B T is the inverse temperature, E c the configurational energy, V the potential energy of the corresponding minimum and κ is the number of degrees of freedom (for a n-atoms cluster κ = 3n− 6). We can write the energy of the simple harmonic oscillator as where r is the magnitude of the displacement vector and ξ is the stiffness of the harmonic spring. We want to determine the probability distribution of the configurational energy as a function of the displacement vector norm, ξ 1 2 r, to perform analytical uniform sampling in the harmonic well. The unnormalised probability of finding a configuration between E c and E c + dE c must be proportional to the configurational density of states, from Eq. (11): Denoting q = ξ 1 2 r, by a simple change of variables we can express the energy probability distribution in terms of q: where the Jacobian J = dE c /dq = q and the equality simplifies to the probability density function Hence q must be distributed according to the power law cumulative distribution function P (q) = q κ (denoted Pow(κ)) to obtain the correct distribution of energies. In order to sample uniformly below some energy constraint E max we first generate a random κ−dimensional vector v with norm v ∼ Pow(κ) ∈ (0, 1] in the unit hypersphere. Then starting from Eq. (12) we write where q usc is the uniformly sampled configuration vector with energy E c . It can easily be verified that q usc has the correct inner product: where again v ∼ Pow(κ) ∈ (0, 1].

Onset function scaling
How well would SENS perform in a multifunneled landscape? Let us assume that SENS is using a database that stores the lowest minima m a and m b of two different funnels, with associated energies V a and V b , respectively. Assuming that funnel F b has already been missed (F b is not populated and E max is lower than the lowest transition state that leads to F b ) and all replicas are already in funnel F a , the probability of successfully sampling in F b is then where n is the number of iterations before the calculation terminates, P is the number of processors, P DS is a user defined probability of sampling from the database, and p b (i) is the discrete probability density that the minimum sampled from the database is m b . Assuming that the funnels are harmonic we obtain Eq. (19) where ν α = ω α /(2π) is the vibrational frequency of mode α and for an object corresponding to a point group with o independent symmetry operations, there are o permutation-inversion operations associated with barrierless reorientations [2]. The ideal value for P DS must then satisfy an identity of the form where n(V a ) is the average number of steps necessary for descending from E g to V a and φ is a probability close to 1, say φ = 0.999. Taking the logarithm of both sides, Eq. (20) can be rewritten as For small P DS linearisation then leads to which provides an optimal value for P DS . To calculate the average number of steps necessary to descend a harmonic basin, first we calculate an expression for the set . (19) of energies that would be obtained by nested sampling if at each step the configurational space was compressed exactly by µ. We note that from which we find where µ = 1 − P/(K + 1). The number of steps n(V a ) necessary to descend from E g to E − V a = 10 −δ can be obtained by rearranging Eq. (24) to give Finally, substituting Eq. (24) for E i in Eq. (19) we can evaluate Eq. (20) numerically to obtain an optimal value for P DS (approximating n(V a ) to the nearest integer).
In the main text we introduce the onset function Eq. (8) and suggest that for small P DS an optimal way to make the probability of sampling from the database of minima independent of the number of replicas, is to use the prefactor 1/K, hence the maximum frequency to sample from the minima, should be f max /K, where f max is a user-defined parameter. Eq. (25) gives the average number of steps necessary to descend a particular harmonic basin as a function of the number of replicas and by substituting this result in Eq. (20) we can evaluate P DS , the optimal probability of sampling from the database of minima. In Fig. we plot f max = KP DS vs K. For large K the optimal value of sampling from the minima scales as 1/K, thus justifying the use of the prefactor in the onset function.  25) gives the average number of steps necessary to descend a particular harmonic basin as a function of the number of replicas K and by substituting it in Eq. (20) we can evaluate PDS, the optimal probability of sampling from the database of minima. As long as K is sufficiently large, fmax = KPDS will be approximately constant, hence the optimal PDS scales as 1/K. (a) As Va, the potential of ma, decreases, the volume of Fa increases, and hence the optimal fmax increases as well. (b) The optimal fmax should be independent of the number of processors used, hence for sufficiently large K all curves approach the same value. Unless specified assume P = 1, κ = 3, Eg = 500, Va = 100, The nested sampling algorithm produces as its primary product a list of parameters (in our case energies) with an associated fraction of configurational space where the t j are the compression factors sampled on a unit interval with probability distribution Beta(K − P + 1, P) and expectation value µ = 1 − P/(K + 1). The exploration of configuration space is the challenging and time consuming part of the algorithm, while the overhead due to the assignment of compression factors is almost irrelevant. In general we use the expectation value µ of these compression factors in order to find the bins of density of states, g, that we need to calculate thermodynamic properties. Given a set of energies obtained by nested sampling, the correct size of the bins for the density of states is one unique realisation of the compression factors t that we do not know a priori (it is for this reason that we use the expectation value µ). There is some statistical uncertainty associated with the distribution of the bin size, which is ultimately due to the distribution of compression factors t, which we know. Since we are interested in the distribution of some observable Q(E), say the heat capacity, we can use a representative set of parameters E = E 1 , E 2 , . . . , E n (energies) obtained by nested sampling (the time consuming part) and sample c sets of compression factors n to associate with this representative set of parameters. This procedure is justified by the fact that we are interested in the probability distribution of Q( E) given the joint probability distribution n ). The mean and variance of Q( E) are therefore A protocol for quantifying uncertainty of this form was already suggested by Skilling [19]. However, this method suffers from some flaws: it does not include an estimate of systematic uncertainties and does not include in any way an estimate of the uncertainty due to the incomplete sampling of configurational space. The latter effect is due to the fact that resampling is performed over one representative set of energies ( E) that could be missing a whole part of configurational space.

LJ38
LJ 38 has a double funnel energy landscape [42] and its heat capacity exhibits three peaks: the first high temperature peak corresponds to a vapour-liquid transition, the second peak corresponds to melting, and the low temperature peak corresponds to a solid-solid transition from the cuboctahedral global minimum to the lowest icosahedral minimum [14]. The PT heat capacity curve for LJ 38 generally overestimates the temperature at which the solid-solid phase-like transition occurs and, consequently, disappears under the melting peak, and should appear, instead, as a small shoulder slightly under the melting peak [15,17]. For this reason long calculations are necessary for the heat capacity to converge. Partay et al. [28] employed K = 244000 replicas and O(10 12 ) energy evaluations to resolve the heat capacity of LJ 38 by NS. Fig. 5 compares the heat capacity curves obtained by PT (using a box of radius R = 2.8 σ), HSA (computed using 89000 minima) and SENS for LJ 38 . SENS was carried out using K = 50000 replicas and N = 10000 steps for each MCMC walk. The database of minima used for SENS contained approximately 89000 minima. It appears that neither version of SENS can outperform NS or parallel tempering, which require the same number of energy evaluations to converge; see Table III for comparison. Exact SENS fails due to the inaccuracy of the HSA, and Table V shows the number of effective swaps from and to the basin. These numbers are considerably smaller than for LJ 31 (see Table IV), even if the total number of iterations for LJ 38 is much larger. Approximate SENS should generally work independently of the quality of the HSA. Here, however, the three transitions overlap significantly thus preventing sampling from the database early enough to get the right low temperature behaviour without affecting the high temperature behaviour. This is not the case for LJ 75 where the phase-like transitions are well separated in temperature and sampling can start early enough to get the correct low temperature behaviour without affecting the melting transition.

Swap statistics in exact SENS calculations
In this section we presents statistics for the exact SENS swaps in the longest runs of each LJ system presented in the paper. For each minimum we report the total number of swaps that led to this minimum, the number of effective swaps to this minimum (thus excluding the swaps within the same minimum) and the number of effective swaps that led from the minimum to another.