A Toolbox for Quantifying Memory in Dynamics Along Reaction Coordinates

Memory effects in time-series of experimental observables are ubiquitous, have important consequences for the interpretation of kinetic data, and may even affect the function of biomolecular nanomachines such as enzymes. Here we propose a set of complementary methods for quantifying conclusively the magnitude and duration of memory in a time series of a reaction coordinate. The toolbox is general, robust, easy to use, and does not rely on any underlying microscopic model. As a proof of concept we apply it to the analysis of memory in the dynamics of the end-to-end distance of the analytically solvable Rouse-polymer model, an experimental time-series of extensions of a single DNA hairpin measured by optical tweezers, and the fraction of native contacts in a small protein probed by atomistic Molecular Dynamics simulations.

Abstract. In this Supplementary Material (SM) we present details about the numerical integration of the anti-Itô Langevin equation, all exact results for the Rouse polymer, MD simulation details and the fraction of native contacts (the reaction coordinate for the protein dynamics considered in the Letter), as well as details about the estimation of the diffusion landscape $D(q)$. In addition, a description of the uncertainty quantification for the Kullback-Leibler divergence and supplementary figures are included showing the various Green's functions for the Rouse polymer and DNA hairpin.

Memory effects can have intriguing manifestations in the evolution of both ensemble- [17, 27-30] and time-averaged observables [17, 31], and are often particularly well-pronounced in observations that reflect, or couple to, intra-molecular distances in conformationally flexible biomolecules [18, 20, 22-24, 27, 32-40]. Moreover, if the dynamics is ergodic in the sense that the system relaxes to a unique equilibrium probability density function from any initial condition (i.e. the reaction coordinate has a unique free energy landscape) then the memory is necessarily transient [17]. Whether or not memory is in fact relevant depends on how its extent compares to the relaxation time and whether or not the latter is reached in an experiment. If the extent of memory is comparable to, or longer than, the time-scale on which biomolecules operate, such as enzymes catalyzing chemical reactions [41, 42], non-Markovian effects shape biological function.
It is therefore important to assess the presence and duration of memory effects in the dynamics along reaction coordinates. An elegant "test of Markovianity" of a reaction coordinate has recently been proposed by Berezhkovskii and Makarov, who considered the behavior of transition paths [43]. The authors provide a pair of inequalities whose violation conclusively reflects that the dynamics is non-Markovian. However, memory-effects are typically transient [17] although their extent may exceed the duration of experimental observations [35]. There is thus a need to determine not only the presence of memory in a time-series of a reaction coordinate but also its extent and attenuation on different time-scales.
Here, we fill this gap by providing a toolbox for quantifying the magnitude and duration of memory in a time series of a reaction coordinate. We propose a set of model-free complementary methods that are easy to use and suited to treat reaction coordinates with arbitrary dimensionality. As a proof of concept we apply these methods to the analysis of an experimental time-series of the extension of a DNA-hairpin measured by optical tweezers, the fraction of native contacts in a protein probed by atomistic Molecular Dynamics (MD) simulations, and the exactly-solvable Rouse model of a polymer chain.
Theory.-Our approach is twofold: (i) we quantify violations of the Chapman-Kolmogorov equation in a time series of the monitored true dynamics, and (ii) we compare the true dynamics to a constructed nominally memoryless diffusion in the free energy and diffusion landscape of the true dynamics. The latter construction assumes all hidden degrees of freedom to be at equilibrium constrained by the instantaneous value of the observable.
Let $q_t$ with $0 \le t \le T$ denote the monitored time series of the reaction coordinate and $q^{\rm M}_t$ the constructed Markovian series. Without any loss of generality we assume that the reaction coordinate is one-dimensional; the generalization to multiple dimensions is straightforward. We assume $q_t$ and $q^{\rm M}_t$ to be ergodic with an equilibrium probability density $p_{\rm eq}(q)$ that is by construction identical for both processes. Let $G(q,t|q_0) = \langle\delta(q_t - q)\rangle_{q_0}$ denote the probability density that the reaction coordinate evolving from $q_{t=0} = q_0$ is found at time $t$ to have a value in an infinitesimal neighborhood of $q$ and $G^{\rm M}(q,t|q_0) = \langle\delta(q^{\rm M}_t - q)\rangle_{q_0}$ the Markovian counterpart, where $\delta(x)$ denotes Dirac's delta function and the angular brackets $\langle\cdot\rangle_{q_0}$ the average over all realizations of $q_t$ evolving from $q_0$. We then have $\lim_{t\to\infty} G(q,t|q_0) = \lim_{t\to\infty} G^{\rm M}(q,t|q_0) = p_{\rm eq}(q)$ as a result of ergodicity. In practice the limits are achieved as soon as $t$ becomes sufficiently larger than the relaxation time $t_{\rm rel}$, i.e. $t \gg t_{\rm rel}$, which may or may not be reached in an experiment. Note that the relaxation times of the true and Markovian reference process are typically different [17, 30].
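We use two descriptors. The first is the Kullback-Leibler divergence between the transition probabilities of the true and a reference process defined as [44]

$$D^{a}_{q_0}(t) \equiv \int dq\, G(q,t|q_0) \ln\left[\frac{G(q,t|q_0)}{G^{a}(q,t|q_0)}\right], \qquad (1)$$

where $a = {\rm CK}, {\rm M}$ denotes the particular kind of reference process that we detail below. By construction $D^{a}_{q_0}(t) = 0$ if and only if $G(q,t|q_0) = G^{a}(q,t|q_0)$ and thus non-zero values of $D^{a}_{q_0}(t)$ reflect memory in the dynamics of $q_t$. When $q_t$ reaches equilibrium in the course of the experiment we also consider the normalized equilibrium autocorrelation function defined as

$$C(t) \equiv \frac{\langle q_t q_0\rangle - \langle q\rangle^2}{\langle q^2\rangle - \langle q\rangle^2}, \qquad (2)$$

where we have introduced

$$\begin{aligned}
\langle q_t q_0\rangle &\equiv \int dq \int dq_0\, q\, q_0\, G(q,t|q_0)\, p_{\rm eq}(q_0) \\
&= \frac{1}{T-t}\int_0^{T-t} q_{\tau+t}\, q_\tau\, d\tau, \\
\langle q^n\rangle &\equiv \int dq\, q^n\, p_{\rm eq}(q) \\
&= \frac{1}{T}\int_0^{T} q_\tau^n\, d\tau,
\end{aligned} \qquad (3)$$

where the definitions in terms of time-averages hold when trajectories are much longer than the relaxation time, i.e. $T \gg t_{\rm rel}$. The absence of an index refers to the true process and M to the constructed Markovian counterpart, i.e. $C^{\rm M}(t)$ follows from Eqs. (2) and (3) with $q_t$ and $G$ replaced by $q^{\rm M}_t$ and $G^{\rm M}$.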
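In practice the integral in Eq. (1) is evaluated on a histogram. The following is a minimal sketch of such a discretized Kullback-Leibler divergence, assuming both densities are tabulated on the same equally spaced bins; the function name `kl_divergence` and the argument `bin_width` are illustrative choices and not part of the original toolbox.

```python
import numpy as np

def kl_divergence(g_true, g_ref, bin_width):
    """Discretized Eq. (1): sum over bins of G * ln(G / G_ref) * bin_width.

    g_true, g_ref: probability densities on the same bins (same q0 and t).
    Bins where g_true vanishes contribute zero (0 * ln 0 = 0 by convention).
    """
    g_true = np.asarray(g_true, dtype=float)
    g_ref = np.asarray(g_ref, dtype=float)
    mask = g_true > 0
    if np.any(g_ref[mask] <= 0):
        # the reference assigns zero density where the true process does not
        return np.inf
    ratio = g_true[mask] / g_ref[mask]
    return float(np.sum(g_true[mask] * np.log(ratio)) * bin_width)

# Hypothetical densities on three bins of width 0.5 (each integrates to 1)
g = np.array([0.5, 1.0, 0.5])
g_ref = np.array([1.0, 0.5, 0.5])
d_same = kl_divergence(g, g, 0.5)
d_diff = kl_divergence(g, g_ref, 0.5)
```

Empty bins in the reference density make the estimate diverge, which is one reason why sufficient sampling matters for this descriptor.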
We consider two distinct reference processes. The first one is a mathematical construction based on the Chapman-Kolmogorov equation (i.e. $a = {\rm CK}$) that we may write as

$$G^{\rm CK}_{\tau}(q,t|q_0) \equiv \int dq'\, G(q,t-\tau|q')\, G(q',\tau|q_0), \qquad (4)$$

because the Green's function of a time-homogeneous Markov process is time-translation invariant, $G(q,t-\tau|q') = G(q,t|q',\tau)$, and $G^{\rm CK}_{\tau}(q,t|q_0) = G(q,t|q_0)$ independent of $\tau$ [45]. The physical interpretation of Eq. (4), which is exact for Markov processes, is that we observe the true dynamics $q_t$ until time $\tau$ and then instantaneously reset the memory (if any) to zero.
If $q_t$ is indeed memoryless we have $G^{\rm CK}_{\tau}(q,t|q_0) = G(q,t|q_0)$ for any $\tau$ and thus $D^{\rm CK}_{\tau,q_0}(t) = 0$ for any $t$ and $\tau$. If $G^{\rm CK}_{\tau}(q,t|q_0) \neq G(q,t|q_0)$ for some $t$ and $\tau$ then $q_t$ is conclusively non-Markovian and $D^{\rm CK}_{\tau,q_0}(t) > 0$, but the converse is not true. Namely, there exist non-Markovian processes that satisfy the Chapman-Kolmogorov equation [17, 46]. Note that this method does not require $q_t$ to reach equilibrium during an experiment and requires only $G(q,t|q_0)$, which is straightforward to determine from a time series $q_t$ given sufficient data. If equilibrium is reached, $D^{\rm CK}_{\tau,q_0}(t \gg t_{\rm rel}) \simeq 0$ for any $q_0$. By analyzing $D^{\rm CK}_{\tau,q_0}(t)$ we can quantify the degree and range of memory as a function of $\tau$ and $q_0$, which we demonstrate below.
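On a discretized coordinate the integral in the Chapman-Kolmogorov construct becomes a matrix product. The sketch below illustrates this on a hypothetical three-state Markov chain (not one of the systems studied here): the full chain satisfies the Chapman-Kolmogorov equation exactly, whereas lumping two of its states into one observed state produces a violation. The helper names `ck_construct` and `lumped_green`, and the convention that rows index the initial state, are assumptions made for this illustration.

```python
import numpy as np

def ck_construct(G_tau, G_rest, bin_width=1.0):
    """Discretized Chapman-Kolmogorov construct.

    G_tau[i, k]  ~ G(q_k, tau | q_i), G_rest[k, j] ~ G(q_j, t - tau | q_k);
    the integral over the intermediate point q' becomes a matrix product.
    """
    return (np.asarray(G_tau, float) @ np.asarray(G_rest, float)) * bin_width

def lumped_green(P, pi, groups, n):
    """n-step transition probabilities between lumped (observed) states,
    with hidden states inside a lump initially at conditional equilibrium."""
    Pn = np.linalg.matrix_power(P, n)
    G = np.zeros((len(groups), len(groups)))
    for a, ga in enumerate(groups):
        w = pi[ga] / pi[ga].sum()          # equilibrium weights inside lump a
        for b, gb in enumerate(groups):
            G[a, b] = w @ Pn[np.ix_(ga, gb)].sum(axis=1)
    return G

# Hypothetical chain 0 <-> 1 <-> 2 (symmetric, so equilibrium is uniform)
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
pi = np.full(3, 1.0 / 3.0)

# Full (Markovian) description: the construct reproduces the true propagator
markov_ok = np.allclose(ck_construct(np.linalg.matrix_power(P, 3),
                                     np.linalg.matrix_power(P, 7)),
                        np.linalg.matrix_power(P, 10))

# Observed coordinate: state A = {0}, state B = {1, 2} (state 2 is hidden)
groups = [[0], [1, 2]]
G1 = lumped_green(P, pi, groups, 1)
G2_true = lumped_green(P, pi, groups, 2)
G2_ck = ck_construct(G1, G1)               # tau = 1, t = 2
```

Here the true two-step return probability of state A is 0.82 while the construct gives 0.815; the mismatch arises only from the hidden state and is what the Kullback-Leibler divergence between the two propagators picks up.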
In the second method we construct from $q_t$ a Markovian time-series $q^{\rm M}_t$ (i.e. $a = {\rm M}$) evolving under the influence of the potential of mean force $w(q) \equiv -k_{\rm B}T \ln p_{\rm eq}(q)$ according to the thermodynamically consistent anti-Itô (i.e. post-point) [18] Langevin equation

$$\frac{d}{dt} q^{\rm M}_t = -\frac{D(q^{\rm M}_t)}{k_{\rm B}T} \left.\frac{dw(q)}{dq}\right|_{q=q^{\rm M}_t} + \sqrt{2D(q^{\rm M}_t)} \circledast \xi_t, \qquad (5)$$

where $\xi_t$ denotes zero mean Gaussian white noise with covariance $\langle\xi_t \xi_{t'}\rangle = \delta(t-t')$, $D(q_t)$ is the diffusion landscape and $\circledast$ is the anti-Itô or Klimontovich product [47] (see Supplementary Material (SM) [48] for the discretized version of Eq. (5)). This method assumes the ability to determine the equilibrium probability density $p_{\rm eq}(q)$ and thus requires $q_t$ to reach equilibrium. In the simplest model the diffusion coefficient does not depend on $q$ and we may interpret Eq. (5) according to Itô. However, this may not be the case (see below), and we note that the best possible Markovian approximation includes a positional dependence [49]. Efficient methods have been developed to infer $D(q)$ [9-11].
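On the level of the probability density function Eq. (5) corresponds to the Fokker-Planck equation

$$\partial_t G^{\rm M}(q,t|q_0) = \partial_q D(q)\left[\partial_q + \frac{1}{k_{\rm B}T}\frac{dw(q)}{dq}\right] G^{\rm M}(q,t|q_0) \qquad (6)$$

with initial condition $G^{\rm M}(q,0|q_0) = \delta(q - q_0)$ and natural boundary conditions imposed by the underlying physics.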
End-to-end distance of a Rouse polymer.-As a first example we consider a Rouse polymer chain with $N+1$ beads ($N$ bonds) in absence of hydrodynamic interactions [57, 58] and focus on the end-to-end distance as the reaction coordinate, i.e. $q_t \equiv |\mathbf{r}_1 - \mathbf{r}_{N+1}|$, which is known to be non-Markovian. The model is exactly solvable and the explicit results for $C(t)$, $C^{\rm M}(t)$, $G(q,t|q_0)$, $G^{\rm M}(q,t|q_0)$ and $G^{\rm CK}_{\tau}(q,t|q_0)$ are all given in [48]. We express time in units of $t_{\rm Kuhn}$, the characteristic diffusion time of a Kuhn-segment, i.e. $t_{\rm Kuhn} = b^2/D$, where $b$ is the Kuhn-length and $D$ the diffusion coefficient of a bead.
A comparison of the autocorrelation function of the true dynamics $C(t)$ and its Markovian approximation $C^{\rm M}(t)$ is shown in Fig. 1a, with the inset depicting the corresponding equilibrium probability density $p_{\rm eq}(q)$. Note that when the free energy landscape $w(q)$ overestimates the confining effect of hidden degrees of freedom on $q_t$ the Markovian approximation overestimates the relaxation rate (e.g. [17]; see also [48]). Namely, the Markovian approximation assumes the hidden degrees of freedom to remain at equilibrium at all times, whereas the actual instantaneous, fluctuating restoring force on $q_t$ is in this case smaller than the force arising from $w(q)$.
The Chapman-Kolmogorov construct for the Rouse polymer, $G^{\rm CK}_{\tau}(q,t|q_0)$ (given explicitly in [48]), differs from the true $G(q,t|q_0)$ for all except large values of $t-\tau$. A quantification of the discrepancy between the true and "Chapman-Kolmogorov" evolution of the end-to-end distance of the Rouse-polymer in terms of the Kullback-Leibler divergence (1) is shown in Fig. 2a. A typical time evolution of $D^{\rm CK}_{\tau,q_0}(t)$ gradually increases from zero, reaches a maximum and afterwards returns back to 0, which reflects the gradual build-up and attenuation of memory because $q_t$ "remembers" the initial condition of the hidden degrees of freedom [17]. As a result, the Chapman-Kolmogorov Green's function $G^{\rm CK}_{\tau}(q,t|q_0)$ fails to predict the true evolution of $q_t$, and $D^{\rm CK}_{\tau,q_0}(t)$ constructed this way depends on both $\tau$ and the initial condition $q_0$. For the Rouse-polymer with 1000 beads $D^{\rm CK}_{\tau,q_0}(t) \neq 0$ at least up to $t \sim 10^4 \times t_{\rm Kuhn}$.
Next we examine $D^{\rm M}_{q_0}(t)$, the Kullback-Leibler divergence (1) between the true Green's function $G(q,t|q_0)$ and the Markovian approximation corresponding to the white-noise Markovian diffusion in the exact free energy landscape (i.e. Eq. (5)). The results are shown in Fig. 2d. The qualitative features of the time-dependence of $D^{\rm M}_{q_0}(t)$ are similar to those observed in Fig. 2a: memory builds up in a finite interval and smoothly returns back to zero from the attained maximum. The intuition behind this result is that it takes a finite time to allow for distinct evolutions of hidden degrees of freedom that introduce memory in the dynamics of the reaction coordinate $q_t$. At long times memory is progressively lost as a result of the gradual relaxation of the hidden degrees of freedom to their respective equilibrium, which in turn renders the dynamics of the reaction coordinate effectively memory-less, and correspondingly $D^{\rm M}_{q_0}(t)$ vanishes.

Single-molecule experiments on a DNA hairpin.-As a second example we consider a time-series of the end-to-end distance of a single-strand DNA hairpin measured in an optical tweezers experiment performed by the Woodside group [59]. The data-set contains 11 million measurements of the extension of the DNA hairpin 30R50T4 held in a pair of optical traps with stiffness 0.63 pN/nm and 1.1 pN/nm, respectively, sampled with a 2.5 µs temporal resolution. It has been shown that this time-series is non-Markovian [40]. The length of the time-series is much larger than the relaxation time (see Fig. 1b) and therefore we slice it into several pieces that are statistically independent. More precisely, we use the time-scale $t_{\rm cut}$ where the autocorrelation function of the extension, $C(t)$, falls to 0.05. This ensures $t_{\rm cut} \gg t_{\rm rel}$ and yields an ensemble of 50 statistically independent trajectories.
We determine the equilibrium probability density $p_{\rm eq}(q)$ (see inset of Fig. 1b) and two-point joint probability density $p(q,t,q_0,0) = p(q,t_0+t,q_0,t_0)$ by performing a standard histogram analysis with a bin-size of $l_{\rm bin} = 0.35$ nm, such that $q$ refers to a bin of width $l_{\rm bin}$ centered at $q$. The Green's function is thereupon obtained by the law of conditional probability, $G(q,t|q_0) = p(q,t,q_0,0)/p_{\rm eq}(q_0)$, while $C(t)$ in Eq. (2) is determined directly from the respective second lines of Eq. (3).
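The histogram estimate of the Green's function described above can be sketched as follows. This is a minimal illustration under the assumption of a single stationary trajectory sampled at a fixed time step; the function name `estimate_green`, the lag measured in sampling steps, and the toy two-value series are hypothetical and not taken from the original analysis.

```python
import numpy as np

def estimate_green(q, lag, edges):
    """Histogram estimate of G(q, t | q0) for t = lag * (sampling step).

    Returns G[i, j]: density of finding the coordinate in bin j a time t
    after it was in bin i, i.e. joint counts divided by the counts of the
    initial bin (law of conditional probability) and by the bin width.
    """
    q = np.asarray(q, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n_bins = len(edges) - 1
    width = np.diff(edges)
    i0 = np.digitize(q[:-lag], edges) - 1      # bin of the initial point
    i1 = np.digitize(q[lag:], edges) - 1       # bin of the final point
    ok = (i0 >= 0) & (i0 < n_bins) & (i1 >= 0) & (i1 < n_bins)
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (i0[ok], i1[ok]), 1.0)   # two-point joint histogram
    row = counts.sum(axis=1, keepdims=True)    # occupation of initial bins
    with np.errstate(invalid="ignore", divide="ignore"):
        G = np.where(row > 0, counts / row, 0.0) / width[None, :]
    return G

# Toy series alternating between two bins of width 0.5
q_toy = np.tile([0.25, 0.75], 50)
edges_toy = np.array([0.0, 0.5, 1.0])
G_lag1 = estimate_green(q_toy, 1, edges_toy)
G_lag2 = estimate_green(q_toy, 2, edges_toy)
```

For an ensemble of independent trajectories the joint counts would simply be accumulated over all trajectories before normalizing.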
The Chapman-Kolmogorov construct is determined from $G(q,t|q_0)$ by direct integration of Eq. (4) and is used to determine $D^{\rm CK}_{\tau,q_0}(t)$, while the corresponding fictitious Markovian process evolves as Markovian diffusion in a free energy landscape $w(q)$ with a constant diffusion coefficient $D$ that we determine according to standard methods as detailed in [48]. According to the results, $D$ is to a good approximation independent of $q$. The analysis yields $D = 447 \pm 9$ nm$^2$/ms, which we use to generate the Markovian time-series $q^{\rm M}_t$ by integrating the Itô Langevin equation (5) using the Euler-Maruyama scheme (for details see [48]), and to determine $D^{\rm M}_{q_0}(t)$ in Eq. (1) and $C^{\rm M}(t)$ in Eq. (2), respectively.
In contrast to the Rouse-polymer the DNA hairpin exists in two characteristic conformational states: folded and unfolded. As a result, the equilibrium probability density function $p_{\rm eq}(q)$ is bimodal and the dynamics of $q_t$ displays signatures of metastability [59]. However, since the two peaks corresponding to the two sub-populations are not separated (see inset of Fig. 1b) the potential of mean force $w(q)$ is expected to underestimate the free energy barrier and therefore the Markovian evolution is likely to overestimate the relaxation rate. In complete agreement Fig. 1b displays an overestimation of the rate of decay of autocorrelations in the Markovian approximation by two orders of magnitude in time. Moreover, a long-lived plateau is observed in the true $C(t)$ spanning more than an order of magnitude in time.

[Fig. 2, caption fragment: ... (b) and (e) the extension of a DNA-hairpin evolving from several initial conditions within a bin of thickness 1 nm centered at $q_0$, and (c) and (f) the fraction of native contacts in the WW-domain of protein 2F21 for several $q_0$; the error bars depict the standard deviation obtained by systematically neglecting ∼20% (in case of the hairpin) and ∼40% (in case of the protein) of the data. Due to the particular construction of Eq. (4) times shorter than depicted are not accessible due to numerical instability or poor statistics.]
In order to assess whether the mismatch between true and Markovian time evolution is predominantly due to an underestimation of the free energy barrier between folded and unfolded states of the hairpin we inspect the Kullback-Leibler divergence (1) between the true and "Chapman-Kolmogorov evolution" shown in Fig. 2b. The result clearly shows pronounced signatures of memory extending over more than ∼10 ms. Note that the "Chapman-Kolmogorov evolution" is exact until time $t = \tau$ whereupon memory is reset to zero. Therefore a non-zero $D^{\rm CK}_{\tau,q_0}(t)$ is a clear signature of memory arising from the dynamical coupling of $q_t$ to hidden degrees of freedom. Similar to the Rouse-polymer, $D^{\rm CK}_{\tau,q_0}(t)$ depends on the initial condition $q_0$.
A build-up and decay of memory similar to the Rouse-polymer is also observed in the time evolution of $D^{\rm M}_{q_0}(t)$, the Kullback-Leibler divergence between the Green's function of the true evolution and the white-noise Markovian diffusion in the exact free energy landscape shown in Fig. 2e. Notably, Fig. 2b and Fig. 2e display essentially the same extent of memory (though the peak is attained sooner in the white-noise Markovian diffusion), demonstrating that metastability does not necessarily destroy nor dominate memory in the evolution of reaction coordinates. Note that the presence of memory in metastable systems is not unusual (see e.g. [22, 23] and [30]). In total, the analysis conclusively identifies extended memory in the dynamics of the extension of the hairpin. It is important to note that the extent of memory (of the order of ∼10 ms) is clearly shorter than the relaxation time $t_{\rm rel}$ (compare Figs. 1b and 2e), and therefore the decay of memory does not coincide with $t_{\rm rel}$ and the corresponding "forgetting" of initial conditions of the coordinate itself. Instead the memory reflects correlations between $q_t$ and the initial conditions of the hidden degrees of freedom [17]. The information encoded in $C(t)$ and $D^{\rm M,CK}(t)$ is therefore different: $D^{\rm M,CK}(t)$ is a genuine measure of the extent and duration of memory.
MD simulation of WW-domain of 2F21.-We analyzed 177 atomistic MD trajectories of the WW-domain of the human Pin1 Fip (2F21) mutant [11] provided by the Grubmüller group, each 1 µs long sampled every 10 ps. During this time the protein attains a pronounced local equilibrium in the folded state and does not unfold. The data set was produced in 15 days in "wall time".
We also analyzed two longer trajectories, 486 and 651 µs long sampled every 200 ps, from [22] where the protein reversibly (un)folds several times but sampling of the unfolded state is limited (see [48]). The fraction of native contacts [20] was chosen as the reaction coordinate (see [48] for details). It reflects the displacement of the protein's structure from the native conformation. In contrast to the previous examples it is not known whether this coordinate displays memory. Technical details incl. the simulation parameters, estimation of D(q) (with error analysis), and corresponding results for the longer trajectories are shown in [48].
The results are qualitatively similar to the hairpin with one notable exception: the diffusion coefficient may not be considered to be constant. The equilibrium density $p_{\rm eq}(q)$ and diffusion landscape $D(q)$ in the folded state are shown alongside $C(t)$ in Fig. 1c. As a first signature of memory the Markovian time-series constructed according to Eq. (5) overestimates the relaxation rate by almost two decades. The Kullback-Leibler divergence $D^{\rm CK}_{\tau,q_0}(t)$ in Fig. 2c shows pronounced memory up to ∼10 ns, extending up to ∼100 ns when considering the longer trajectories that also capture the protein's dynamics in the unfolded state (see [48]). Occurring on time-scales of ∼20 µs [22], the (un)folding dynamics is thus memory-less. This example highlights that our method does not distinguish between local and global equilibrium in case of a time-scale separation, such as the ns time-scale folded-state dynamics and ∼20 µs time-scale (un)folding dynamics.
The constructed Markovian time-series shows qualitatively similar signatures of memory as the hairpin (see Fig. 2f). The extent of memory displayed by $D^{\rm M}_{q_0}(t)$ matches that of $D^{\rm CK}_{\tau,q_0}(t)$ and, similar to the Rouse-polymer and hairpin, depends on the initial condition $q_0$. One may quite generally relate this dependence to the dynamics of hidden degrees of freedom with respect to how far $q_0$ is displaced from the free energy minimum. When $q_0$ is near the free energy minimum the dynamics of hidden degrees of freedom has a smaller effect.
Remarks on feasibility.-The toolbox requires an ensemble of statistically independent or ergodically long trajectories. Most demanding is the Chapman-Kolmogorov analysis that requires sufficient sampling of the support of the integral in Eq. (4) at different times $t$, $\tau$. Constructing the Markovian time-series requires accurate estimates of $p_{\rm eq}(q)$ and $D(q)$. The minimal data requirements depend on the system at hand, and may vary substantially. However, we propose a simple test of the reliability of the results: determining their uncertainty by a comparison with results obtained by omitting, say, ∼10%-20% of data as shown in Fig. 2e-f. For a reliable quantification of memory the statistical uncertainty should be substantially smaller than the value of the Kullback-Leibler divergence, as in the present case.
Conclusion.-We presented a set of complementary methods to quantify conclusively the degree and duration of memory in a time series of a reaction coordinate $q_t$. The proposed toolbox does not assume any particular physical model. Instead it exploits the Chapman-Kolmogorov equation and constructs a fictitious Markovian diffusion process in the free energy landscape of $q_t$, and compares the artificially constructed transition probability density with the observed probability density. The analysis not only determines whether the dynamics of $q_t$ has memory but also quantifies the magnitude and duration of memory and thus complements the recently proposed "test for Markovianity" based on transition paths [43]. Whereas in our examples we considered only one-dimensional coordinates, the toolbox generalizes straightforwardly to higher-dimensional reaction coordinates. The method is general, robust, and easy to use, and should be used before any attempt to describe a complex system with a low-dimensional Markovian reaction coordinate. We therefore hope that it will find numerous applications involving time-series derived from experiments and computer simulations.

Alessio Lapolla and Aljaž Godec
where we assumed the validity of the fluctuation-dissipation theorem, i.e. $\mu = D/k_{\rm B}T$ is the mobility, $f(q_t)$ is the force, and $\eta$ is a random number drawn from a Gaussian distribution with zero mean and unit variance. Note that only a single random number $\eta$ is required for each iteration. When the noise is additive (i.e. $D(q) \to D$ is a constant) the previous scheme simplifies to the classic Euler-Maruyama scheme

$$q_{t+\Delta t} = q_t + \mu f(q_t)\,\Delta t + \sqrt{2D\Delta t}\,\eta.$$

The above equations are used to integrate the Langevin equation (5) in the main text in the case of the DNA hairpin and protein.
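For constant $D$ the Euler-Maruyama update is a one-line recursion. The following minimal sketch integrates it for a hypothetical harmonic potential of mean force, $w(q) = kq^2/2$, chosen only because its equilibrium variance $k_{\rm B}T/k$ is known; the function name `euler_maruyama` and all parameter values are illustrative assumptions, not the settings used for the hairpin.

```python
import numpy as np

def euler_maruyama(q0, force, D, kBT, dt, n_steps, rng):
    """Integrate q -> q + mu*f(q)*dt + sqrt(2*D*dt)*eta for constant D."""
    mu = D / kBT                          # mobility (fluctuation-dissipation)
    amp = np.sqrt(2.0 * D * dt)           # noise amplitude per time step
    eta = rng.standard_normal(n_steps)    # one Gaussian number per iteration
    q = np.empty(n_steps + 1)
    q[0] = q0
    for n in range(n_steps):
        q[n + 1] = q[n] + mu * force(q[n]) * dt + amp * eta[n]
    return q

# Hypothetical test case: w(q) = k q^2 / 2, i.e. f(q) = -k q
k, D, kBT, dt = 1.0, 1.0, 1.0, 0.01
rng = np.random.default_rng(0)
traj = euler_maruyama(0.0, lambda q: -k * q, D, kBT, dt, 200_000, rng)

# After discarding an initial transient the sample variance should be
# close to the equilibrium value kBT / k
var_est = traj[20_000:].var()
```

For a tabulated $p_{\rm eq}(q)$ the force would instead be obtained by numerically differentiating $w(q) = -k_{\rm B}T \ln p_{\rm eq}(q)$. A $q$-dependent $D(q)$ requires the anti-Itô scheme rather than this simplified update.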

ANALYTICAL RESULTS FOR THE ROUSE POLYMER
The probability density function for the positions of all beads $\{\mathbf{r}_i\}$ is well-known [2, 3] and allows us to determine exactly the probability density of the end-to-end distance.
Introducing $\nu_k \equiv k\pi/[2(N+1)]$, $\alpha_k = 4\sin^2(\nu_k)$ as well as $Q_{ik} \equiv \sqrt{2/(N+1)}\cos(\nu_k[2i-1])$ and $\eta_t \equiv \sum_{k=1}^{N}(Q_{1k} - Q_{N+1\,k})^2 e^{-\alpha_k t}/2\alpha_k$ the equilibrium probability density of $q$ is given by $p_{\rm eq}(q) = q^2 e^{-q^2/4\eta_0}/(2\sqrt{\pi}\,\eta_0^{3/2})$ for $q \in [0,\infty)$ with the mean extension $\langle d\rangle = 4\sqrt{\eta_0/\pi}$ and mean square extension $\langle d^2\rangle = 6\eta_0$. The probability density function of $q$ reads exactly (for a derivation see Ref. [4]) [...] (S3).

The exact autocorrelation function is in turn obtained in the form [...]. The Fokker-Planck equation in the Markovian approximation to the evolution of $q$ for the Rouse polymer can be solved in the form of a spectral expansion [5] and reads $G^{\rm M}$ [...], where $\Gamma(x)$ denotes the Gamma-function and $L^{1/2}_k(x)$ the generalized Laguerre polynomial of degree $k$ with parameter 1/2 (see [6]) that we compute using the Arb-library [7], and $\psi^R_k(x) = p_{\rm eq}(x)\psi^L_k(x)$. From here it is straightforward to obtain the autocorrelation function in the Markovian approximation that reads [...].

The integral defined in Eq. (4) in the main text can be solved analytically via a straightforward but tedious calculation using Eq. (S3). The result of the integral reads exactly [...], having defined [...]. Notably, the structure of Eq. (S7) is identical to the structure of the plain Green's function (Eq. (S3)) but here the temporal dependence is obviously different. Note that when the observation time is much larger than the relaxation time of the observable $t_{\rm rel}$, we find for $t - \tau > t_{\rm rel}$ that $G^{\rm CK}_{\tau}(q,t|q_0) \simeq p_{\rm eq}(q)\int dq'\, G(q',\tau|q_0) = p_{\rm eq}(q)$. Therefore, since $\lim_{t\to\infty} G(q,t|q') = p_{\rm eq}(q)$, the definition of $G^{\rm CK}_{\tau}(q,t|q_0)$ (Eq. (4) in the main text) by construction ensures $\lim_{t\to\infty} D^{\rm CK}_{\tau,q_0}(t) = 0$.

GREEN'S FUNCTIONS
In Fig. S3 we explicitly show the Green's function that is required for the computation of the Kullback-Leibler divergence.

DETAILS OF THE PROJECTION AFFECT THE RELAXATION TIME AND EXTENT OF MEMORY
In the main text we consider a Rouse polymer chain composed of 1000 beads and we focus on the autocorrelation function of its end-to-end distance as the reaction coordinate $q_t$. We find that the fictitious Markovian reference process describing Brownian diffusion in the free energy landscape overestimates the relaxation rate; a similar observation is also made in the case of the experimental hairpin data. However, this difference in the rate of relaxation is non-unique and in fact depends on the observable, i.e. on details of the projection.
For example we demonstrate in Fig. S4 the opposite trend that arises when we observe the autocorrelation function of the distance between the first and the second bead of the same Rouse chain (see dashed lines).
In addition, it is worth noting that if the Green's function $G$ describing the full many-dimensional system is diagonalizable (as in the Rouse chain case [8] or any Markovian dynamics obeying detailed balance), it can be written as [...], where $\psi^R_k$ and $\psi^L_k$ are respectively the right and left eigenfunctions of the underlying Fokker-Planck-Smoluchowski operator, while $\lambda_k$ denotes the eigenvalues. Then the Green's function of the projected observable, the reaction coordinate $q = \Gamma(x)$, can be written in full generality [9] as [...], where the elements $V^R_k$ and $V^L_k$ depend both on $\psi^R_k$ and $\psi^L_k$, and on the projection $\Gamma(x)$. In turn the autocorrelation function can be easily computed as [...], and one can show that for systems obeying detailed balance $a^{R,\Gamma}_k b^{L,\Gamma}_k \ge 0$ [9]. The analysis shows that the projection only affects the weights whereas the exponentiated eigenvalues (and thus time-scales) are those of the full system's dynamics.
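The statement that the projection only affects the weights can be illustrated on a toy model. The sketch below uses a hypothetical symmetric three-state Markov chain (discrete time, uniform equilibrium, detailed balance) as a stand-in for the Fokker-Planck-Smoluchowski operator; the observables `end_like` and `mid_like`, and the function names, are illustrative assumptions and have no counterpart in the original calculation.

```python
import numpy as np

def spectral_weights(P, f):
    """Eigenvalues of a symmetric transition matrix P and the non-negative
    weights with which each eigenmode enters <f(n) f(0)> at equilibrium."""
    lam, V = np.linalg.eigh(P)             # P symmetric: orthonormal modes
    w = (V.T @ f) ** 2 / len(f)            # squared overlaps, uniform p_eq
    return lam, w

def autocorrelation(P, f, n):
    """Normalized equilibrium autocorrelation of observable f after n steps."""
    lam, w = spectral_weights(P, f)
    relax = np.arange(len(lam)) != np.argmax(lam)   # drop stationary mode
    return float(np.sum(w[relax] * lam[relax] ** n) / np.sum(w[relax]))

# Hypothetical chain with eigenvalues 1, 0.9 and 0.7
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])

end_like = np.array([0.0, 1.0, 2.0])   # couples only to the slow mode (0.9)
mid_like = np.array([0.0, 1.0, 0.0])   # couples only to the fast mode (0.7)

C_slow = autocorrelation(P, end_like, 5)
C_fast = autocorrelation(P, mid_like, 5)
```

Both observables are built from the same eigenvalues, yet one decays as $0.9^n$ and the other as $0.7^n$, because the projection selects which modes carry weight.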
Nevertheless, the autocorrelation function of different observables of the same system may decay on widely disparate time-scales; compare the dashed and continuous lines in Fig. S4, where the relaxation time of the end-to-end distance is $\sim 10^6$ while that of the first-to-second bead distance is $\sim 10^1$. This disparity is simply a result of the projection that determines the relative contribution of different eigenfunctions.

MD SIMULATION DETAILS
177 trajectories, each 1 µs long, of the WW-domain of the human Pin1 Fip (2F21) mutant were generated using the GROMACS 4.5 software package [12] with the Amber ff99SB-ILDN force field [13] and the TIP4P-Ew water model [14]. The starting structure was taken from the PDB entry 2F21 [11], of which only the WW-domain was considered. Energy minimization was performed using steepest descent for $5 \cdot 10^4$ steps. The hydrogen atoms were described by virtual sites. In each trajectory the protein was positioned within a triclinic water box using gmx-solvate, such that the smallest distance between protein surface and box boundary was larger than 1.5 nm. Sodium and chloride ions were added to neutralize the system, corresponding to a physiological concentration of 150 mmol/l. The system was first equilibrated for 0.5 ns in the NVT ensemble, and subsequently for 1.0 ns in the NPT ensemble at 1 atm pressure and temperature 300 K, both using an integration time step of 2 fs. The velocity rescaling thermostat [15] and Parrinello-Rahman pressure coupling [16] were used with coupling coefficients of $\tau = 0.1$ ps and $\tau = 1$ ps, respectively. All bond lengths of the solute were constrained using LINCS with an expansion order of 6, and water geometry was constrained using the SETTLE algorithm. Electrostatic interactions were calculated using PME [17], with a real space cutoff of 10 Å and a Fourier spacing of 1.2 Å. The integration time-step was 4 fs, and the coordinates of the alpha carbons were saved every 10 ps.

FRACTION OF NATIVE CONTACTS
The dynamics of the WW-domain of the human Pin1 Fip mutant was projected on the fraction of native contacts as the reaction coordinate, defined in [20] as

$$q_t \equiv \frac{1}{|S|}\sum_{(i,j)\in S}\frac{1}{1+\exp\left[\beta\left(r_{ij}(t) - \lambda r^0_{ij}\right)\right]}, \qquad (S12)$$

where $r_{ij}(t)$ is the distance between atoms $i$ and $j$ at time $t$, $r^0_{ij}$ is the same distance in the native state, and $S$ is the set of all pairs of heavy atoms $(i,j)$ belonging to residues $\theta_i$ and $\theta_j$ such that $|\theta_i - \theta_j| > 3$ and $r^0_{ij} < 4.5$ Å, with $|S|$ the number of such pairs. The parameter $\beta = 5$ Å$^{-1}$ is a smoothing parameter while $\lambda = 1.8$ takes into account the fluctuations of the system. This reaction coordinate was extracted from the files containing the Molecular Dynamics trajectories using the MDTraj library [19].
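Given the pair distances, the fraction of native contacts is an average of smooth switching functions. The following minimal sketch assumes that the distances $r_{ij}(t)$ for the native pairs have already been computed (for example with a trajectory library) and are given in Å; the function name `fraction_native_contacts` and the toy distances are hypothetical.

```python
import numpy as np

def fraction_native_contacts(r, r0, beta=5.0, lam=1.8):
    """Fraction of native contacts per frame.

    r  : array (n_frames, n_pairs), distances of the native pairs in Angstrom
    r0 : array (n_pairs,), the same distances in the native state in Angstrom
    beta (1/Angstrom) smooths the switching; lam allows for fluctuations.
    """
    r = np.atleast_2d(np.asarray(r, dtype=float))
    r0 = np.asarray(r0, dtype=float)
    switch = 1.0 / (1.0 + np.exp(beta * (r - lam * r0)))
    return switch.mean(axis=1)            # average over all native pairs

# Toy native distances for two contacts (hypothetical values)
r0_toy = np.array([4.0, 4.4])
q_native = fraction_native_contacts([r0_toy], r0_toy)          # close to 1
q_half = fraction_native_contacts([1.8 * r0_toy], r0_toy)      # exactly 1/2
q_broken = fraction_native_contacts([[20.0, 20.0]], r0_toy)    # close to 0
```

If distances are stored in nm, as is common in MD analysis libraries, $\beta$ must be converted accordingly (5 Å$^{-1}$ corresponds to 50 nm$^{-1}$).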

RESULTS FOR LONG MD SIMULATIONS
The equilibrium probability density $p_{\rm eq}(q)$ and autocorrelation function $C(t)$ of the fraction of native contacts determined from the two longer MD trajectories provided by the Shaw group are shown in Fig. S5. Clearly, $q_t$ does not relax during the simulation despite the impressive length of the trajectory. Moreover, because the major change in $q$ is due to the folding process the intermediate plateau corresponding to the local equilibrium in the folded state is not visible, as it contributes negligibly to the total relaxation process.
Despite limited statistics at long times we used the Chapman-Kolmogorov construction (since this method does not require that $q_t$ equilibrates) to assess the presence of memory in the reaction coordinate. The results are depicted in Fig. S6. Signatures of memory are present only on short time-scales < 100 ns, and are the strongest in the deep well corresponding to the folded state. We therefore confirm that the folding-unfolding transition that develops on time-scales larger than 1 µs is effectively memory-less [20] (note that the experimental unfolding time was estimated in [21], while the Molecular Dynamics simulations yield a value of 21 µs [22]). Conversely, both data-sets show a pronounced memory in the folded-state relaxation.

ESTIMATION OF THE DIFFUSION COEFFICIENT
We estimate the (in general $q$-dependent) diffusion coefficient $D(q)$ from a time-series using the first two moments of the local displacements according to the thermodynamically consistent anti-Itô convention. We first determine the first and second moment of the displacement in each bin-point $q_l$ after a single time-step $\Delta t$ (that is 2.5 µs for the hairpin and 10 ps in the case of the protein), i.e. $\langle\delta q^2_{\Delta t}(l)\rangle$ and $\langle\delta q_{\Delta t}(l)\rangle$ where $\delta q_{\Delta t}(l) = (q_{t+\Delta t} - q_t)|_{q_{t+\Delta t} = q_l}$. The diffusion coefficient in each bin then follows from these two moments [...] (S13),
where the brackets $\langle\cdot\rangle$ here denote the average over all displacements in the bin observed during the entire time-series. We consider two bin-sizes, $l_D = 0.01$ nm and $l_D = 0.001$ nm, and find the result to be essentially independent of the precise value of $l_D$ we choose.
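A moment-based estimator of this kind can be sketched as follows. The sketch assumes the common form in which the variance of the single-step displacement, conditioned on the bin of the end point, is divided by $2\Delta t$; this specific formula is an assumption made for illustration and should be checked against the original expression. The function name `estimate_diffusion`, the `min_count` threshold and the synthetic Ornstein-Uhlenbeck test series are likewise hypothetical.

```python
import numpy as np

def estimate_diffusion(q, dt, edges, min_count=100):
    """Bin-wise diffusion coefficient from single-step displacements.

    Displacements are assigned to the bin of their END point (post-point
    convention) and D_l = (<dq^2> - <dq>^2) / (2 dt) in every bin l.
    Bins with fewer than min_count displacements are returned as NaN.
    """
    q = np.asarray(q, dtype=float)
    dq = q[1:] - q[:-1]                        # displacement after one step
    idx = np.digitize(q[1:], edges) - 1        # bin index of the end point
    n_bins = len(edges) - 1
    D = np.full(n_bins, np.nan)
    for l in range(n_bins):
        d = dq[idx == l]
        if d.size >= min_count:
            D[l] = (np.mean(d ** 2) - np.mean(d) ** 2) / (2.0 * dt)
    return D

# Synthetic test series: Ornstein-Uhlenbeck process with constant D = 1
rng = np.random.default_rng(1)
D_true, dt, n = 1.0, 1e-3, 200_000
noise = np.sqrt(2.0 * D_true * dt) * rng.standard_normal(n)
q = np.empty(n + 1)
q[0] = 0.0
for i in range(n):
    q[i + 1] = q[i] - q[i] * dt + noise[i]

D_est = estimate_diffusion(q, dt, np.linspace(-1.0, 1.0, 5))
```

For this synthetic series the estimate should be flat across bins and close to `D_true`, mirroring the behaviour reported for the hairpin.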
In the case of the hairpin the results are rather independent of the location of the bin $q_l$ (see Fig. S7), implying that to a good approximation $D$ may indeed be taken as being constant, such that we instead take $D(q_l) \to D$.

In the case of the protein we determine $D(q)$ for both the shorter and the longer simulation. In both cases the diffusion coefficient is found to be weakly dependent on $q$, and is smaller in the folded state, in agreement with the results presented in [10]. In order to efficiently simulate the constructed Markovian process for the shorter simulation (which attains a local equilibrium), we fit the diffusion landscape to a cubic polynomial

$$D(q) = \left[-4.03867 + 13.66777\,q - 15.26772\,q^2 + 5.64218\,q^3\right]\ {\rm ns}^{-1}. \qquad (S14)$$

The results are shown in Fig. S8.

UNCERTAINTY ESTIMATION
We estimated the uncertainty in the computation of the Kullback-Leibler divergences by considering $M = 20$ randomly reduced data-sets, each containing 100 different trajectories (i.e. taking only ∼56% of the total number of trajectories) for the protein, and 40 different trajectories (i.e. taking only ∼80% of the total number of trajectories) for the DNA-hairpin. From these results we determined the standard deviation in $D(t)$ as

$$\sigma_D(t) = \sqrt{\left\langle\left(D(t) - \langle D(t)\rangle\right)^2\right\rangle}, \qquad (S15)$$

where the average $\langle\cdot\rangle$ is taken over the $M$ reduced data-sets.
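The subsampling procedure can be sketched generically. In the example below the estimator is a simple stand-in (the mean of the pooled data) rather than the Kullback-Leibler divergence, and the synthetic trajectories are hypothetical; in an actual analysis `estimator` would compute $D(t)$ from the selected trajectories.

```python
import numpy as np

def subset_uncertainty(trajectories, estimator, n_subsets, subset_size, rng):
    """Mean and standard deviation of an estimator over random subsets.

    Each subset contains subset_size distinct trajectories drawn without
    replacement; the spread over n_subsets such reduced data-sets is
    reported as the uncertainty.
    """
    values = []
    for _ in range(n_subsets):
        pick = rng.choice(len(trajectories), size=subset_size, replace=False)
        values.append(estimator([trajectories[i] for i in pick]))
    values = np.asarray(values, dtype=float)
    return values.mean(axis=0), values.std(axis=0)

# Hypothetical ensemble: 50 trajectories, 40 of them kept in each subset
rng = np.random.default_rng(2)
trajs = [rng.standard_normal(500) + 0.1 * i for i in range(50)]
pooled_mean = lambda subset: np.mean(np.concatenate(subset))
mean_val, sigma_val = subset_uncertainty(trajs, pooled_mean,
                                         n_subsets=20, subset_size=40, rng=rng)
```

Because the estimator may return an array (for instance $D(t)$ on a grid of times), the mean and standard deviation are taken along the subset axis only.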