Spectral rate theory for projected two-state kinetics

Classical rate theories often fail in cases where the observable(s) or order parameter(s) used are poor reaction coordinates or the observed signal is deteriorated by noise, such that no clear separation between reactants and products is possible. Here, we present a general spectral two-state rate theory for ergodic dynamical systems in thermal equilibrium that explicitly takes into account how the system is observed. The theory allows the systematic estimation errors made by standard rate theories to be understood and quantified. We also elucidate the connection of spectral rate theory with the popular Markov state modeling (MSM) approach for molecular simulation studies. An optimal rate estimator is formulated that gives robust and unbiased results even for poor reaction coordinates and can be applied to both computer simulations and single-molecule experiments. No definition of a dividing surface is required. Another result of the theory is a model-free definition of the reaction coordinate quality (RCQ). The RCQ can be bounded from below by the directly computable observation quality (OQ), thus providing a measure allowing the RCQ to be optimized by tuning the experimental setup. Additionally, the respective partial probability distributions can be obtained for the reactant and product states along the observed order parameter, even when these strongly overlap. The effects of both filtering (averaging) and uncorrelated noise are also examined. The approach is demonstrated on numerical examples and experimental single-molecule force probe data of the p5ab RNA hairpin and the apo-myoglobin protein at low pH, here focusing on the case of two-state kinetics.

The description of complex molecular motion through simple kinetic rate theories has been a central concern of statistical physics. A common approach, first-order rate theory, treats the relaxation kinetics among distinct regions of configuration space by single-exponential relaxation. There has been a recent interest in estimating such rates from fluctuation trajectories of single molecules, resulting from the recent maturation of measurement techniques that are able to collect extensive traces of single molecule extensions or fluorescence events [1,2]. When the available observable is a good reaction coordinate in that it allows the slowly-converting states to be clearly separated (see Fig. 2 top left), classical rate theories apply and the robust estimation of transition rates is straightforward using a variety of means [3]. However, in the case in which the slowly-converting states overlap in the observed signal (see Fig. 2 bottom left), either due to the fact that the molecular order parameter used is a poorly separating them, or due to large noise of the measurement (see discussion in [4,5]), a satisfactory theoretical description is missing and many estimators break down.
Most rate theories and estimators are based on dividing the observed coordinate into a reactant and a product substate and then in some way counting transition events that cross the dividing surface. Transition state theory measures the instantaneous flux across this sur-face, which is known to overestimate the rate due to the counting of unproductive recrossings over the dividing surface on short timescales [6]. Reactive flux theory [7] has proposed to cope with this by counting a transition event only if it has succeeded to stay on the product side after a sufficiently long lag time τ. Reactive flux theory involves derivatives of autocorrelation functions that are numerically unreliable to evaluate [8]. In practice, one therefore typically estimates the relaxation rate via integration or performing an exponential fit to the tail of a suitable correlation function, such as the number correlation function of reactants or the autocorrelation function of the experimentally measured signal [3,9,10]. In order to split this relaxation rate into a forward and backward rate constant, a clear definition of the reactant and product substates is needed, which is difficult to achieve when these substates overlap in the observed signal.
Markov models (MSMs) have recently become a popular approach to producing a simplified statistical model of complex molecular dynamics from molecular simulations and can be regarded as an attempt towards a multistate rate theory. MSMs use a transition matrix describing the probability a system initially found in a substate i is found in substate j a lag time τ later. When the state division allows the metastable states of the system to be distinguished [11][12][13][14], the transition matrix with a sufficiently large choice of τ can be used to derive a phenomenological transition rate matrix that accurately describes the interstate dynamics [15]. This is explicitly done for the two-state case in [8]. It has been shown in [14,16] that by increasing the number of substates used to partition state space, and hence using multiple dividing surfaces instead of a single one, these rate estimates become more precise. In the limit of infinitely many substates, the eigenfunctions of the dynamical propagator in full phase space are exactly recovered, and the rate estimates become exact even for τ → 0 + [17]. In practice, however, a finite choice of τ is necessary in order to have a small systematic estimation error, especially if "uninteresting" degrees of freedom such as momenta or solvent coordinates are discarded. An alternative way of estimating transition rates is by using a state definition that is incomplete and treats the transition region implicitly via committor functions that may better approximate the eigenfunctions of the dynamical propagator in this region [18][19][20].
The quality of the rate estimates in all of the above approaches relies on the ability to separate the slowlyconverting states in terms of some dividing surface or state definition. These approaches break down in the case where the available observables do not permit such a separation, i.e., when kinetically distinct states overlap in the histogram of the observed quantity. However, such a scenario may often arise in single-molecule experiments where the available order parameter depends on what is experimentally observable and may not necessarily be a good indicator of the slow kinetics. Moreover, consequences of the measurement process may increase the overlap between states, for example by bead diffusion in optical tweezer experiments or by shot noise in single-molecule fluorescence measurements. In favorable situations, the signal quality can be improved by binning the data to a coarser timescale, thus reducing the fluctuations from fast processes and shot noise. However, the usefulness of such filtering is limited because the time window used needs to be much shorter than the timescales of interest -otherwise the kinetics will be distorted. In general, one has to deal with a situation where overlap between the slowly-converting states is present, both theoretically and practically.
Hidden Markov Models (HMMs) [21][22][23] and related likelihood methods [24] are able to estimate transition rates even in such situations, and have been recently successful in distinguishing overlapping states in molecules with complex kinetics [25,26]. However, HMMs need a probability model of the measurement process to be defined, which can lead to biased estimates when this model is not adequate for the data analyzed. A recent approach, the signal pair-correlation analysis (PCA) [27] provides rate estimates without an explicit probability model, and instead requires to define indicator functions on which the measured signal can uniquely be assigned to one of the kinetically separated states. While this is often easier to achieve than finding an appropriate dividing surface, there is a tradeoff between only using only data that is clearly resolved to be in one state or the other (thus minimizing the estimation bias), while avoiding discarding too much data (thus minimizing the statistical error). Despite these slight limitations, both HMM's and PCA are practically very useful to identify and quantify hidden kinetics in the data. Yet, both are algorithmic approaches rather than a rate theory.
The recent success of single-molecule experiments and the various attempts to estimate accurate transition rates when the relevant states overlap in the observed signal emphasizes the need to have a general rate theory for observed dynamics. Here, we make an attempt towards such a general rate theory for stochastic dynamics that are observed on a possibly poor reaction coordinate -either because the probed molecular order parameter is a poor choice, or because the measurement device creates overlap by noise-broadening the signal. Only very basic properties are assumed to hold for the dynamics underlying the observed signal: It is assumed that the dynamical law governing evolution of the system is a time-stationary Markov process in full-dimensional phase space, obeys microscopic detailed balance, and supports a unique stationary distribution -criteria satisfied for many physical systems of interest. These dynamics are then projected onto some observable in which they are no longer Markovian, and optionally subjected to noise -such that the resulting signal may not easily be separable by a simple dividing surface. This framework allows us to (i) evaluate the quality of existing estimators and propose optimal estimators for the slowest relaxation rate, (ii) provide an absolute measure of reaction coordinate quality (RCQ) of the observed signal, and (iii) derive an optimal estimator for the transition rates between the slowly converting states, as well as their stationary probability densities, even if these strongly overlap in the observation.
The present rate theory is exclusively concerned with the systematic error in estimating rates and proposes "optimal" methods that minimize this systematic rate estimation error. Therefore all statements are strictly valid only in the data-rich regime. Explicit treatment of the statistical error in the data-poor regime is beyond the scope of the present work but discussed at the end of the paper.

Full-space dynamics
We consider a dynamical system that follows a stationary and time-continuous Markov process x t in a generally large and continuous state space Ω. x t is assumed to be ergodic with a unique stationary density µ(x). In order to be independent of specific dynamical models we use the general transition density p τ (x t , x t+τ ), i.e., the conditional probability density that given the system is at point x t ∈ Ω at time t, it will be found at point x t+τ ∈ Ω a lag time τ later. We will at this point also assume that the dynamics obey microscopic detailed balance, i.e., µ(x t ) p τ (x t , x t+τ ) = µ(x t+τ ) p τ (x t+τ , x t ), which is true for systems that are not driven by external forces. In this case, µ(x) is a Boltzmann distribution in terms of the system's Hamiltonian. The following rate theory can also be formulated for some nonreversible dynamics, but the notation becomes more involved and the quantities are more difficult to interpret; we therefore restrict our discussion to the reversible case here.
Since we are interested in the slow processes we rewrite the transition density as a sum of relaxation processes that are associated with different intrinsic rates by expanding it into terms of the eigenvalues and eigenfunctions ψ i of the corresponding transfer operator [14,16]: Here, are eigenvalues of the propagator that decay exponentially with lag time τ. We order relaxation rates according to κ 1 < κ 2 ≤ κ 3 ≤ ... and thus λ 1 (τ) > λ 2 (τ) ≥ λ 3 (τ) ≥ .... The first term is special in that it is the only stationary process: κ 1 = 0, λ 1 (τ) = 1, ψ 1 (x) = 1, thus the first term of the sum is identical to µ(x). All other terms can be assigned a finite relaxation rate κ i , or a corresponding relaxation timescale t i = κ −1 i , which are quantities of our interest. The eigenfunctions ψ i are independent of τ and determine the structure of the relaxation process occurring with rate κ i . The sign structure of ψ i (x) determines between which substates the corresponding relaxation process is switching and is thus useful for identifying metastable sets, i.e., sets of states that are long-living and interconvert only by rare events [14,28]. The eigenfunctions are chosen to obey the normalization conditions and integration always runs over the full space of the integrated variable if not indicated otherwise. At a given timescale τ of interest, fast processes with κ τ −1 (and correspondingly t i τ) will have effectively vanished and we are typically left with relatively few slowlyrelaxing processes.
such that the normalization condition of eigenfunctions can be conveniently written as Finally, the correlation density c τ (x t , x t+τ ), i.e., the joint probability density of finding the system at x t at time t and at x t+τ at time t + τ is related to the transition density p t by

Observed dynamics and rate theory
Let us consider the case that we are only interested in a single relaxation process -the slowest. Below, we sketch a rate theory for this case. Details of the derivation can be found in the supplementary information (SI). Based on the definitions above, the correlation density can then be written as: with subsuming the fast processes.
Exact rate: The exact rate of interest κ 2 can be recovered as follows: If we know the exact corresponding eigenfunction ψ 2 (x), it follows from Eq. (1) and (3) that its autocorrelation function evaluates to: where · t denotes the time average, that is here identical to the ensemble average due to ergodicity of the dynamics. ψ 2 (0)ψ 2 (τ) t yields the exact eigenvalue λ 2 (τ) and thus also an exact rate estimate ofκ 2 = −τ −1 ln λ 2 (τ) = κ 2 , independently of the choice of τ. Projected dynamics without measurement noise: Suppose we observe the dynamics of an order parameter y ∈ R that is a function of the configuration x. Examples are the distance between two groups of the molecule, or a more complex observable such as the Förster resonance transfer efficiency associated to a given configuration. See Fig. 1 for an illustration. We first assume that no additional measurement noise is present. The analysis of a molecular dynamics simulation where a given order parameter is monitored is one example of such a scenario. Here, it is not possible to compute the rate via Eq. (9) or some direct approximation of Eq. (9), since the full configuration space Ω in which the eigenfunction ψ 2 exists can no longer be recovered once the dynamics has been projected onto an order parameter. Instead, we are forced to work with functions of the observable y ∈ R.
We have two options for deriving the relevant rate equations for the present scenario: As a first option, we note that a projection that is free of noise can be regarded as a function y(x) : Ω → R. Thus, any functionψ 2 (y) of elements in observable space R that aims at approximating the dominant eigenfunction ψ 2 , can be also regarded as a function in full space Ω viaψ 2 (y) =ψ 2 (y(x)). When following this idea, one can use the variational principle of conformation dynamics [29] (see also the discretestate treatment in [20]), in order to derive the rate equations for the observed space dynamics. See SI for details.
However, since we aim at adding measurement noise in a second step, we derive a more general approach (see SI), which is summarized subsequently. Consider the function χ p (y | x) that denotes the output probability density with which each configuration of the full state space, x ∈ Ω, yields a measured value y ∈ R. In the case of simply projecting x-values without noise to specific yvalues, χ has the simple form: This allows the correlation density in observable space to be written as: where we have used superscript y to indicate the projection of a full configuration space function onto the order parameter: µ y (y) is the observed stationary density that can be estimated from a sufficiently long simulation by histogramming the values of y. Mathematically, it is given by: φ y i are the projected eigenfunctions: In order to arrive at an expression of the rate κ 2 , we propose a trial function in observation space,ψ 2 (y), which we require to be normalized by and evaluate its autocorrelation function as: where In contrast to Eq. (9), bothψ 2 and φ y i live on the observable space. In the special case that ψ 2 (x) is constant in all other variables than y(x), the projection is lossless (φ y 2 (y(x)) = φ 2 (x) and ψ y 2 (y(x)) = ψ 2 (x) for all x), and using the choiceψ 2 = ψ y 2 , we recover ψ y 2 , φ y 2 = ψ 2 , φ 2 = 1, and thus the exact rate estimate via Eq. (9). In general, however, the eigenfunction ψ 2 (x) does vary in other variables than y, and thereforẽ ψ 2 can at best approximate the full-space eigenfunction viaψ 2 (y(x)) ≈ ψ 2 (x).
Observed dynamics with measurement noise: Suppose that an experiment is conducted in which each actual order parameter value y(x) ∈ R is measured with additional noise, yielding the observed value o ∈ R. In time-binned single-molecule fluorescence experiments such noise may come from photon counting shot noise for a given binning size. In optical tweezer experiments such noise may come from bead diffusion and handle elasticity, assuming that bead and handle dynamics are faster than the kinetics of the molecule of interest. See Fig. 1 for an illustration. Note that we only treat the situation of uncorrelated noise. In situations where the experimental setup changes the kinetics, e.g., when the optical bead diffusion is slow, thus exhibiting different transition rates than the isolated molecule, our analysis always reports the rate of the experimental setup. The task of correcting the measured rates so as to estimate the rates of the pure molecule is beyond the scope of this work and can, for example, be attempted via dynamical deconvolution [30,31].
Like before, the probability of observing a measurement value o ∈ R given that the true configuration was x ∈ Ω can be given by an output probability: which concatenates the projection from x to the value of the order parameter, χ p (y | x), and the subsequent dispersion of the signal by noise, χ d (o | y). Despite the fact that dispersion operates by a different physical process than projection, the same analysis as above applies. We define the projected and dispersed stationary density and eigenfunctions: which are "smeared out" by noise compared to the purely projected density and eigenfunctions φ y i . As above, the autocorrelation function of a probe functioñ ψ 2 (o) is given by: The observation process including noise is a more general process than the observation process not including noise, therefore -unless the distinction is important -we will generally refer to the observation as o subsequently, whether noise is included in the observation or not. Filtered dynamics: The effect of measurement noise may be reduced by filtering the observed signal o(t) → o(t), for example by averaging the signal value over a time window of length W. Note that this operation will introduce memory of length W into the signal and will impair the estimation of all rates which are close to W −1 . Fig. 1 of the SI illustrates the effect of filtering on the estimation quality of rates in a simple example. To make sure that the filter used does not impair the rate estimates, we recommend that the filter length be at least a factor of 10 smaller than the timescales of interest, t 2 = κ −1 2 . The filtered signalō(t) can then be used as input to the various rate estimators discussed in this paper, but the theory of systematic errors given in the subsequent section may no longer apply because filtering destroys the Markovianity of the original dynamic process in the full state space.
Direct rate estimate: In all of the above cases, the autocorrelation function of the trial functionψ 2 does not yield the exact eigenvalue λ 2 (τ), but some approximationλ 2 (τ). For τ κ −1 3 , which can readily be achieved for clear two-state processes where a time scale separation exists (κ 2 κ 3 ), the terms involving the fast processes disappear:λ This suggests that the true rate κ 2 , as well as the prefactor α o that may serve as a basis to measure the reaction coordinate quality, could be recovered from large τ decay of an appropriately good trial function even from the observed signal. We elaborate this concept in subsequent sections. Note that in experiments the relaxation rates κ 2 , κ 3 , etc, are initially unknown and hence the validity of Eq. (22) can only be checked a posteriori, e.g., by the fact that estimates based upon Eq. (22) are independent of the lag time τ.

Existing rate estimators
Many commonly used rate estimators consist of two steps: (1) they (explicitly or implicitly) calculate an autocorrelation functionλ 2 (τ) of some functionψ 2 , and (2) transformλ 2 (τ) into a rate estimateκ 2 . In order to derive an optimal estimator, it is important to understand how the systematic error of the estimated rate depends on each of the two steps. The SI contains a detailed derivation of the subsequent results.  (23) that can also be interpreted as an autocorrelation func- Here, π A = h A µ is the stationary probability of state A and π B = 1 − π A the stationary probability of state B. Other rate estimates chooseψ 2 to be the signal o(t) itself or the committor function between two pre-defined subsets of the o coordinate [19]. We show that none of these choices is optimal, and the optimal choice ofψ 2 will be derived in the subsequent section.
Existing rate estimators largely differ by step (2), i.e., how they transformλ 2 (t) into a rate estimateκ 2 . This procedure then determines the functional form of the systematic estimation error. We subsequently list bounds for these errors (see SI for derivation). The prefactor α in the equations below either refers to α y (purely projected dynamics) or α o (dynamics with noise), whichever is appropriate.
Reactive flux rate.
Chandler, Montgomery and Berne [7,33] considered the reactive flux correlation function as a rate estimator: which becomes 0 for a perfect reaction coordinate (i.e., a perfect choice ofψ 2 = ψ 2 that leads to α = 1), but can be very large otherwise.
Transition state theory rate. The transition state theory rate, which measures the instantaneous flux across the dividing surface between A and B, can be computed from the short-time limit of the reactive flux [7], κ 2,tst = lim τ→0 +κ 2,rf (τ) such that the error in the rate is given bỹ which is always an overestimate of the true rate and of the reactive flux rate.
Integrating the correlation function. Another means of estimating the rate is via the integral of the correlation function,κ 2,int = − ´∞ 0 dτλ 2 (τ) −1 (see, e.g., Equation 3.6 of [7]), with the error: Figure 1: Illustration of the observed dynamics for which a rate theory is formulated here. Top row: the full-dimensional dynamics x(t) in state space Ω. These dynamics are assumed to be Markovian, ergodic, and reversible as often found for physical systems in thermal equilibrium. Furthermore, the theory here is formulated for two-state kinetics, i.e., the system has two metastable states exchanging at rates k AB and k BA , giving rise to a relaxation rate of κ 2 = k AB + k BA . Middle row: One order parameter y(x) of the system is observed, such as a distance between two groups of a molecule, or the FRET efficiency between two fluorescent groups. The projection of the full-space dynamics x(t) onto the order parameter y generates a time series y(t), that may, however not be directly observable. The projection also acts on functions of state space, such as the stationary distribution of configurations in full state space that is projected onto a density in the observable, µ(y). The reaction coordinate qualityα y measures how well the order parameter y resolves the slow transition. It is 1 when A and B are perfectly separated and 0 when they completely overlap. in the special case that κ 3 κ 2 (time scale separation), the error is approximately given by κ 2 (1 − α)/α. Thus, the error of this estimator becomes zero for the ideal case of an ideal reaction coordinate (α = 1), but may be very huge if the reaction coordinate is poor (α < 1).
Single-τ rate estimators: A simple rate estimator is to take the value of the autocorrelation function of some functionψ 2 at a single value of τ, and transform it into a rate estimate by virtue of Eq. (22). We call these estimators single-τ estimators. Ignoring statistical uncertainties, they yield a rate estimate of the form Quantitatively, the error can be bounded by the expression (see derivation in the SI): The error becomes identical to this bound for systems with a strong timescale separation, κ 3 κ 2 . Eq. (28) decays relatively slowly in time (with τ −1 . See Fig. 2 for a two-state example). It will be shown below that methods that estimate rates from counting the number of transitions across a dividing surface, such as Markov state models, are single-τ estimators and are thus subject to the error given by Eq. (28).
The systematic error of single-τ estimators results from the fact that Eq. (27) effectively attempts to fit the tail of a multi-exponential decayλ 2 (τ) by a singleexponential with the constraintλ 2 (0) = 1. Unfortunately, the ability to improve these estimators by simply increasing τ is limited because the statistical uncertainty of estimating Eq. 22 quickly grows in τ [34].
Multi-τ rate estimators: To avoid the error given by Eq. (28), it is advisable to estimate the rate by evaluating the autocorrelation functionλ 2 (τ) at multiple values of τ. This can be done e.g., by performing an exponential fit to the tail of theλ 2 (τ), thus avoiding the constraint λ 2 (0) = 1 [3,10]. The estimation errorκ 2 − κ 2 has the where τ 1 is the first lag time from the series (τ 1 , ..., τ m ) used for fitting, and the constant c also depends on the lag times and the fitting algorithm used. The SI shows that for several fitting algorithms, such as a least-squares procedure at the time points (τ, 2τ, ..., mτ), c is such thatκ 2,mult ≤κ 2,single .
Thus, the multi-τ estimator is always better than the single-τ estimator (see SI). The main advantage of multiτ estimators is that their convergence rate is exponential in τ when the time scale separation κ 3 − κ 2 is not vanishing (compare to Eq. 28). Thus, multi-τ estimators are better when the timescale separation between the slowest and the other relaxation rates in the system is larger. Note that the systematic error of a multi-τ estimate thus decays much faster in τ than its statistical error that decays by τ −1 [34].
In the absence of statistical error, all of the above rate estimation methods yield an overestimation of the rate, κ 2 ≥ κ 2 .

Optimal choice ofψ 2
It was shown above that multi-τ estimators are the best choice for converting an autocorrelation function into a rate estimate. However, what is the best possible choiceψ 2 =ψ 2,optimal given a specific observed time series o t ? In other words, which function should the observed dynamics be projected upon in order to obtain an optimal rate estimator? Following Eq. (28), the optimal choiceψ 2 is the one which maximizes the parameter α, as this will minimize the systematic error from a direct rate estimation by virtue of Eq. (28), and also minimize the systematic error involved in estimating κ 2 from an exponential fit to Eq. (22). We are thus seeking the solution of:ψ subject to the normalization Eq. (14). Here, arg maxψ 2 α denotes the function that maximizes α over the space of functionsψ 2 (o). If the system has two-state kinetics, i.e., only ψ 1 (x) = 1 and ψ 2 (x) are present as dominant eigenfunctions, the problem (31) is solved by the projected eigenfunction:ψ How can the best possibleψ 2 be determined from the observed time series? For a sufficiently large set of n basis functions, γ = {γ 1 (o), ..., γ n (o)}, the optimal eigenfunctionψ 2 is approximated by a linear combinationψ .., c n }. When γ i (o) is chosen to be an orthogonal basis set, thenψ 2 = arg maxψ 2 α can be approximated by the Ritz method [29,35]. An easy way to do this approximation in practice is to perform a fine discretization of the observable o by histogram windows. Using a binning with bin boundaries b 1 , ..., b n+1 , and the corresponding indicator functions then the above optimization problem is solved by estimating the transition probability matrix with elements and calculating c as the second eigenvector: where λ 2 < 1 is the second-largest eigenvalue of T. If the system has two-state kinetics, i.e., only ψ 1 (x) = 1 and ψ 2 (x) are present as dominant eigenfunctions, the estimateψ 2 is independent of the choice of τ in Eq. (34). Thus, in real systems, τ should be chosen to be at least a multiple of κ −1 3 (e.g., τ ≥ 3κ −1 3 , as indicated by a constant rate κ 2 estimate using a multi-τ estimator (Eq. (29)). Note that a given optimalψ 2 (o) can still be used with single-τ and multi-τ rate estimators that would produce different estimates for κ 2 .
Note thatψ 2 according to the procedure described here is only optimal for the case when the observed signal is obtained by projecting the high-dimensional data onto the observable, but is no longer optimal in the presence of noise, and especially large noise. In order to chooseψ 2 optimal when noise is present, a generalized hermitian eigenvalue problem must be solved instead of Eq. (35) which includes a mixing matrix whose elements quantify how much the observable bins are mixed due to measurement noise. Since this approach is not very straightforward and in most practical cases only leads to small improvements, we won't pursue it here. We rather note that it is often practical to reduce the noise level by carefully filtering the recorded data, provided that the filter length is much shorter than the timescales of interest.

Reaction coordinate quality (RCQ)
Evaluating how suitable a given observable in terms of capturing the slow kinetics is of great general interest. Although there is not a unique way of quantifying this suitability of the observable, the term reaction coordinate quality (RCQ) is often used. Previous studies have proposed ways to measure the RCQ that are based on comparing the observed dynamics to specific dynamical models or testing the ability of the observable to model the committor or splitting probability between two chosen end-states A and B [4,5,36]. These metrics are either only valid for specific models of dynamics or themselves require a sufficiently good separation of A and B by definition, restricting their applicability to observables with rather good RCQs.
The prefactor α o in Eq. (22) is a measure between 0 and 1, quantifying the relative amplitude of the slowest relaxation in the autocorrelation function that is used to estimate the rate from. By virtue of Eqs. (28) and (29), α o quantifies how large the error in our rate estimate can be. Therefore, we propose to that α o may serve as a basis to define the quality of the reaction coordinate. However, α o depends on the way the data is analyzed, namely the functional formψ 2 (o) used to compute the autocorrelation functionλ 2 (τ). In order to remove this ambiguity we note that for the optimal choiceψ 2 =ψ 2 (Eq. (31)), α o is maximized and in this case we denote . We proposeα o to serve as a measure for the quality of the reaction coordinate. This definition of RCQ is very general, as it makes no assumptions about the class of dynamics in the observed coordinate, and does not depend on any subjective choices such as the choice of two reaction end-states A and B in terms of the observable o. Through the derivation above it has also been shown thatα o measures the fraction of amplitude by which the slowest process is observable, which is exactly the property one would expect from a measure of the RCQ:α 0 is 1 for a perfect reaction coordinate and 0 if the slowest process is exactly orthogonal to the observable, or has been completely obfuscated by noise.
Using this definition, the reaction coordinate quality is not purely a property of the molecular order parameter used, but rather a property of the measurement signal (including the effects of experimental noise and filtering used to preprocess the data). Yet, an analyst is typically interested in the RCQα y that is due to the choice of the molecular order parameter. Unfortunately, when an experiment is conducted, only the effective or apparent RCQα o is available. Unless a quantitative model of the dispersion function χ d (o | y) is known, the RCQα y before adding noise cannot be recovered. (see also Fig. 1 for an illustration). However, we can still quantitatively relateα y andα o . For this, we derive a theory of observation quality. While the detailed derivation is found in the SI, we just summarize the most important results here: 1. When observing the order parameter y without noise and projecting the observation onto the optimal indicator functionψ 2 =ψ o 2 , the RCQ can be expressed as the weighted norm of the projected eigenfunction, expressed by the scalar product: 2. Unless the projection perfectly preserves the structure of the full-space eigenfunction ψ 2 , we haveα y < 1. Thus, almost every observable reduces the RCQ.
3. When additional noise is present, the RCQ can be expressed as the weighted norm of the projected and noise-distorted eigenfunction: 4. The RCQα y due to the projection onto the selected molecular order parameter alone, and the apparent RCQα o including measurement noise are related by:α i.e., adding noise always further reduces the apparent RCQ compared to a situation where the order parameter y can be directly measured.
The inequality (38) implies that we can use the observable RCQα o in order to both optimize the experimental setup and the order parameter used. For example in an optical tweezer measurement, we can change laser power and handle length so as to maximizeα o , thus makingα o andα y more similar and reducing the effect of noise on the measurement quality. On the other hand, sinceα o is a lower bound forα y , we can also use it ensure a minimal projection quality: When the measurement setup itself is kept constant we can compare the measurements of different constructs (e.g., different FRET labeling positions or different attachment sites in a tweezer experiment). The best valueα o corresponds to the provably best construct. Finally,α o can be determined by fitting the autocorrelation function ofψ 2 , as described in the spectral estimation procedure described below. Figs. 2-4 show estimates of the RCQ of different observed dynamics using different rate estimators.

Markov (state) models (MSMs)
MSMs have recently gained popularity in the modeling of stochastic dynamics from molecular simulations [12,14,15,37,38]. MSMs can be understood as a way of implicitly performing rate estimates via discretizing state space into small substates. Let us consider a MSM obtained by finely discretizing the observed space y into bins and estimating a transition matrix T(τ) amongst these bins. We have seen that this procedure approximately solves the optimization problem of Eq. (31) and the leading eigenvector of T(τ) approximates the projection of the true second eigenfunction,ψ o 2 (o), available for the given observable o. Ref [15] has suggested to use the implied timescalet 2 = −τ/ ln(λ 2 (τ)) as an estimate for the system's slowest relaxation timescale, and at the same time for a test which choice of τ leads to a MSM with a small approximation error. These implied timescales correspond to the inverse relaxation rates and therefore the MSM rate estimate is described by Eq. (27) with the choiceψ 2 =ψ 2 . A sufficiently fine MSM thus serves as an optimal single-τ rate estimator as it uses the maximum RCQα o for observed signal that is being discretized. However, when this signal has a poor RCQ α o since it is poorly separating the slowly-converting states, there is a substantial rate estimator error according to Eq. (28) that decays slowly with τ −1 . This explains the slow convergence of implied timescales shown in recent MSM studies [12][13][14][15]39].

Estimating state densities and microscopic transition rates
When the rate κ 2 is exactly known, the microscopic transition rates between the two interchanging states, k AB and k BA could be calculated from the equations: where π A and π B are the stationary probabilities of states A and B: with µ o A (o) and µ o B (o) being the partial densities of states A and B in the observable o, respectively.
Here we attempt to estimate both the partial densities, µ o A (o) and µ o B (o), and from these the microscopic transition rates via Eqs. (39) and (40). The difficulty is that µ o A (o) and µ o B (o) can significantly overlap in o, both due to the way the order parameter used projects the molecular configurations onto the observable, and due to noise broadening of the measurement device. This reveals a fundamental weakness of dividing-surface approaches. Although a dividing surface estimator can estimate the rate κ 2 for sufficiently large τ without bias via Eq. (29), it cannot distinguish between substates on one side of the barrier, and thus assumes the partial densities µ o A (o) and µ o B (o) to be given by cutting the full density µ o (o) at the dividing surface. When the true partial densities overlap, this estimate can be far off (compare the curves in Fig. 2, second column). Consequently, wrong estimates for the microscopic rates k AB and k BA are obtained when Eqs. (39) and (40) are used with π A and π B that are simply the total densities "left" and "right" of the dividing surface.
Hidden Markov models approach this problem by proposing a specific functional form of µ o A (o) and µ o B (o), for example a Gaussian distribution, and then estimating the parameters of this distribution with an optimization algorithm [21,23]. This is approach is very powerful when the true functional form of the partial densities is known, but will give wrong estimates when the wrong functional form is used.
Here, we propose a nonparametric solution that can estimate the form of the partial densities µ o A and µ o B , and the microscopic transition ratesk AB andk BA in most cases without bias. For this, we employ PCCA+ theory [17,40] which is based on PCCA theory [28,41] which allows for a way of splitting state space into substates and at the same time maintain optimal approximations to the exact eigenfunctions (here ψ 2 ): The state assignment must be fuzzy, i.e., instead of choosing a dividing surface that uniquely assigns points o to either A or B, we have fuzzy membership functions χ A (o) and χ B (o) with the property χ A (o) + χ B (o) = 1. These membership functions can be calculated after ψ 2 is known.
In order to compute the membership χ A and χ B , the memberships of two points of the observable o must be fixed. The simplest choice is to propose two points that are pure, i.e., that have a membership of 1 to A and B each. Such an approach is also proposed by the signalpair correlation analysis approach [27] where the pure points need to be defined by the user. However at this point of our analysis, an optimal choice can be made, because the eigenfunctionψ o 2 has been approximated. Thus we propose to follow the approach of [40] and choose the o-values whereψ o 2 achieves a minimum and a maximum, respectively, as purely belonging to A and B. Typically, these are the states the are on the left and the right boundary of the histogram in o. This approach will only start to give a biased estimate when the overlap of the A and B densities is so large that not even these extreme points are pure (see Fig. 2, last row for such an example). Letψ 2 be the second eigenvector of the Markov model T(τ) of the finely binned observable (Eq. (35)).ψ 2 is a discrete approximation to the projected eigenfunction ψ o 2 . Following the derivation given in the appendix, the fuzzy membership functions on the discretized observable space are given bŷ Since we are restricted to the projected eigenfunctionψ 2 , we can determine the optimal choiceχ A (o) andχ B (o) fromψ 2 (o). Together with the estimated stationary density µ(o) which can e.g., be obtained by computing a histogram from sufficiently long equilibrium trajectories, the probability density of being in A and B when projected onto the observable o can be calculated: The probability of being in A and B is thus given by: and these probabilities can be used to splitκ 2 into microscopic transition rates k AB and k BA : Note that the assignment of labels "A" and "B" to parts of state space is arbitrary. Eq. (49) is the transition rate from A to B as defined by Eqs. (43)- (44), and Eq. (50) is the corresponding transition rate from B to A.

Spectral estimation
The optimal estimator for κ 2 is thus one that fits the exponential decay ofλ 2 (τ) while minimizing the fitting error Eq. (29). As analyzed above, the systematic fitting error is minimized by any multi-τ estimator. In order to obtain a numerically robust fit, especially in the case when statistical noise is present, it is optimal to fit to an autocorrelation functionλ 2 (τ) where the relevant slowest decay has maximum amplitudeα 0 . This is approximately achieved by constructing a fine discretization MSM on the observed coordinate (see Section "Optimal choice ofψ 2 "). Thus, the optimal estimator of κ 2 proceeds as outlined in (1-4) below. The full spectral estimation algorithm (1-6) additionally provides estimates for the microscopic rates k AB , k BA , and for the partial densities µ A and µ B : 1. Obtain a fine discretization of the observed coordinate o into n bins, say [o i , o i+1 ] for i ∈ 1...n.

Construct a row-stochastic transition matrix T(τ)
for different values of τ. The estimation of transition matrices from data have been described in detail [14]. A simple way of estimating T(τ) is the following: (i) for all pairs i, j of bins, let c ij (τ) be the number of times the trajectory has been in bin i at time t and in bin j at time t + τ, summed over all time origins t; (ii) estimate the elements of T(τ) by T ij (τ) = c ij (τ)/ ∑ k c ik (τ). A numerically superior approach is to use a reversible transition matrix estimator [14].
5. Calculate the partial densities µ A and µ B from Eqs. (45) and (46) using transition matrix eigenvectors estimated at a lagtime τ min at which the rate estimateκ 2 is converged.
6. Calculate the microscopic transition rates k AB and k BA from Eqs. (49) and (50) Note that this estimator is optimal in terms of minimizing the systematic error. When dealing with real data, the amount of statistics may set restrictions of how fine a discretization is suitable and how large a lag time tau will yield reasonable signal to noise. For a discussion of this issue refer to e.g., [34].

Illustrative two-state example
To illustrate the theory and the concepts of this paper, we compare the behavior of different order parameters, measurement noise and different estimators in Fig. 2. The full-space model here is a two-dimensional model system using overdamped Langevin dynamics in a bistable potential. This choice was made because the exact properties of this system are known and the quality of different estimates can thus be assessed. The potential is chosen such that the eigenfunction associated with the slow process, ψ 2 (x) varies in x 1 and is constant in x 2 , such that the choice o = x 1 represents a perfect projection and the choice o = x 2 represents the worst situation in which the slow process is invisible. Fig. 2I, II and III show three scenarios using y = x 1 (I, perfect order parameter -projection angle 0°), y = 1 2 (x 1 + x 2 ) (II, intermediate-quality order parameterprojection angle 45°), and y = 1 4 (x 1 + 3x 2 ) (III, poor order parameter -projection angle 72°). Additionally, we compare the results when the order parameter y is traced without noise ("a" panels), and when measurement noise is added ("b" panels). Noise here consists of adding a uniformly distributed random number from the interval [-1,1] to the signal, such that the noise amplitude is roughly 25% of the signal amplitude. Fig. 2, panels 1a and 1b show the apparent stationary density in the observable y, µ y (y), or in the noisy observable o, µ o (o), as a black solid line. The partial densities of substates A (orange) and B (grey) which comprise the total stationary density are shown as well. The lower part of the figure shows the observed eigenfunction associated with the two-state transition process (ψ y 2 or ψ o 2 ) as black solid line with grey background. It is apparent that when the quality of the observation is reduced, either by choosing a poor order parameter, or by adding experimental noise, the overlap of the partial densities increases and the continuous projected eigenfunction ψ o 2 becomes smoother and thus increasingly deviates from the dividing surface model which is a step function switching at the dividing surface (dashed line).
Panels 2a and 2b show the reaction coordinate qualities (RCQ) in these different scenarios. The fact that the blue and red lines are approximately constant after τ = 5 (when the fast processes have relaxed) shows that the RCQ can be reliably estimated at these lag time ranges using either the dividing surface or the spectral estimation approach. The red line (spectral estimation) gives the optimal, i.e., maximal, RCQ, which varies between 1 (perfect order parameter I) and 0.15 (poor order parameter III with additional measurement noise). It is seen that the optimal RCQ given by the spectral estimator can be much larger than the suboptimal RCQ estimated by the dividing surface estimator that uses a fit to the number correlation function Eq. (23) (blue line). This is especially apparent in the case of an intermediatequality order parameter (Fig. 2II-2a and II-2b).
Panels 3a and 3b show the estimate of the relaxation rate κ 2 obtained for different scenarios. Each panel compares the Markov model (MSM) estimate (solid black line), the dividing surface estimate (solid blue line) and the spectral estimate (solid red line) to the exact value (dashed black line). For the multi-τ estimators (dividing surface and spectral estimation), the lag time τ specifies the start of a time range [τ, τ + 10] used for an exponential fit. In the case of a perfect order parameter (I), all estimators yield the correct rate at lagtimes τ > 5 time steps (where the fast processes with rates κ 3 or greater have disappeared). In this case, the noise does not deteriorate the estimate, because the partial densities of states A and B are still well separated. For intermediatequality and poor order parameters, the MSM estimate breaks down dramatically, providing a strongly overestimated rate for 0 < τ < 100 time steps. Panels II-3a,b, and III-3a,b show the typical behavior τ −1convergence of the MSM estimate predicted by the theory (Eq. (28)). Clearly, the MSM estimate will converge to the true value for very large values of τ, but especially for the situation of a poor order parameter, the minimal τ required to obtain a small estimation error if typically larger than the timescale κ −1 2 of the slowest process, thus rendering a reliable estimation impossible. It is seen that the magnitude of the error for a given value of τ increases when going from panel II-3a to II-3b, from II-3a to III-3a, and from III-3a to III-3b. This is because in this sequence the RCQ deteriorates, as predicted by the theory of reaction coordinate qualities (see above), and hence the prefactor of the MSM error increases (see Eq. (28)).
As predicted by Eq. (30), the multi-τ estimators (dividing surface and spectral estimate) are always better than the single-τ (MSM) estimate. As predicted by Eq. (29), both the dividing surface and spectral estimate ofκ 2 converge when the fast processes have died out (here approximately at τ > 5 time steps). As seen in panels II-3b, III-3a and III-3b, the spectral estimate is more stable than the dividing surface estimate, i.e., it exhibits less strong fluctuations around the true value κ 2 . This is because the spectral estimate uses a higher RCQα o , and thus the exponential tail of the autocorrelation function can be fitted using a larger amplitude of the process relaxation with rate κ 2 , achieving a better signal-to-noise ratio.
Panels 1a and 1b show the partial densities estimated with a dividing surface and PCCA+ (see above) and panels 4a and 4b show the microscopic transition rates estimated from Eqs. (49) and (50), either using the partial densities from the dividing surface, or PCCA+ (spectral estimate). As expected, the partial densities from the dividing surface estimate are wrong as soon as the states overlap in the observable, either due to choosing a poor order parameter, or to experimental noise. As a result, the dividing surface estimates for the microscopic rates k AB and k BA are biased for all these cases (panels II-4a,b, panels III-4a,b). The spectral estimate gives an unbiased estimate for intermediate overlap (panels II-4a,b). For strong overlap, even the spectral estimator starts to have a bias because no pair of observable states can be found that are uniquely assignable to states A and B. Nonetheless, the spectral estimator still yields a smaller bias than the dividing surface estimator. Like for the relaxation rate (κ 2 ) estimate, the spectral estimator exhibits less fluctuations due to a better signal-to-noise ratio.

Applications to optical tweezer data
In order to illustrate the performance of spectral estimation on real data, it is applied to optical tweezer measurements of the extension fluctuations of two biomolecules examined in a recent optical force spectroscopy study: the p5ab RNA hairpin [42] and the H36Q mutant of sperm whale apo-myoglobin at low pH [43]. The p5ab hairpin forms stem-loop structure with a bulge under native conditions (Fig. 3-5) and zips/unzips repeatedly under the conditions used to collect data (Fig. 3-6a), while apo-myoglobin (crystal structure shown in Fig. 4-5) hops between unfolded and molten globule states at the experimental pH of 5 ( Fig.  4-6a) [43]. Experimental force trajectory data were generously provided by the authors. Experimental details are given in Refs. [42,43], but we briefly summarize aspects of the apparatus and experimental data collection procedure relevant to our analysis.
The instrument used to collect both datasets was a dual-beam counter-propagating optical trap [44]. The molecule of interest was tethered to polystyrene beads by means of dsDNA handles, with one bead suctioned onto a pipette and the other held in the optical trap. A piezoactuator controlled the position of the trap and allowed position resolution to within 0.5 nm, with the instrument operated in passive (equilibrium) mode such that the trap was stationary relative to the pipette during data collection. The force on the bead held in the optical trap was recorded at 50 kHz, with each recorded force trajectory 60 s in duration. Fig. 3 compares the results of different rate estimators for optical tweezer measurement of the p5ab RNA hairpin extension fluctuations. Fig. 3-5 shows a sketch of the RNA molecule under analysis and Fig. 3-6 shows the trajectory that was analyzed. The trajectory exhibits a two-state like behavior with state lifetimes on the order of tens of milliseconds. Fig. 3-1a shows the stationary probability density of measured pulling forces, exhibiting two nearly separated peaks. Fig. 3-2a shows the estimated reaction coordinate quality (RCQ) α o , which is approximately constant at lag times τ > 5 ms, indicating a reliable estimate for this quantity at lag times greater than 5 ms. An optimum value ofα o ≈ 0.96 (spectral estimator) is found while the best possible dividing surface results in α o ≈ 0.94. These values indicate that the present reaction coordinate is well suited to separate the slowly interconverting states, and that different approaches, including a Markov model, a dividing surface estimate and a spectral estimate should yield good results. Fig. 3-3a compares the estimates of the relaxation rate κ 2 using the direct MSM estimate (black), a fit to the fluctuation autocorrelation function using a dividing surface at the histogram minimum o = 12.75 pN (blue), and spectral estimation (red). For the multi-τ estimators (dividing surface and spectral estimation), the lag time τ specifies the start of a time range [τ, τ+2.5 ms] that was used for an exponential fit. All estimators agree on a relaxation rate of aboutκ 2 ≈ 58 s −1 , corresponding to a timescale of about 17 ms. The MSM estimate is strongly biased for short lagtimes, exhibiting the slow τ −1 convergence predicted by the theory for single-τ estimators (Eq. (28)). It converges to an estimate within 10% of the value from multi-τ estimates after a lagtime of about 10 ms. The dividing surface and spectral estimators behave almost identical, and converge after about τ = 5 ms. According to the error theory of multi-τ estimators (Eq. (29)), this indicates that there are additional, faster kinetics in the data, the slowest of which have timescales of 2-3 ms. In agreement with the theory (Eq. (30)), the multi-τ estimators (dividing surface, spectral estimate) converge faster than the single-τ estimate (MSM).
As indicated in Fig. 3-1a, the substates estimated from PCCA+ are almost perfectly separated and can be well distinguished by a dividing surface at the histogram minimum o = 12.75 pN. Consequently, both the dividing surface estimate and the spectral estimate yield almost identical estimates of the microscopic transition rates -the folding rate being k AB ≈ 45 s −1 and the unfolding rate being k BA ≈ 15 s −1 (Fig. 3-4). In summary, the two-state kinetics of p5ab can be well estimated by various different rate estimators because the slowly-converting states are well separated in the experimental observable. Fig. 3 panels b show estimation results for data that has been filtered by averaging over 50 frames (1 ms). This averaging further reduces the already small overlap between substates A and B while the filter length is much below the timescale of A − B interconversion. Therefore, filtering has a positive result on the analysis: The effective RCQα o increases and is now approximately equal to 1 according to spectral estimation. The estimation results are largely identical to the case with noise. In Fig. 3-3b, the error made by the Markov model estimate has become smaller because the error prefactor reported in Eq. (28), lnα o , has become smaller. Note that in contrast to the unfiltered data analysis, some of the rate estimates (MSM and spectral estimate) underestimate the rate for small lag times τ. This is not in contradiction with our theory which predicts an overestimation of the rate for Markovian processes. By using the filter, one has effectively introduced memory into the signal, and the present theory will only apply at a lagtime τ that is a sufficiently large multiple of the filter length, such that the introduced memory effects have vanished. Fig. 4 shows estimation results for an optical tweezer experiment that probes the extension fluctuations of apo-myoglobin [45]. Fig. 4-5 shows a sketch of the experimental pulling coordinate (green arrows) depicted at the crystal structure of apo-myoglobin. Fig. 4-6 shows the trajectory that was analyzed. Out of the trajectories reported in [45] we have here chosen one where the two slowest-converting states have the largest overlap. While the trajectory indicates that there are at least two kinetically separated states, the stationary probability density of measured pulling forces ( Fig. 4-6) does not exhibit a clear separation between these states in the measured pulling force. Thus, the order parameter used here, in combination with signal broadening by the experiment, does not clearly separate the kinetically distinct states. This is also indicated by Fig. 4-2a, which shows that the optimal reaction coordinate quality (RCQ) has a value ofα 0 ≈ 0.5 (spectral estimator) at τ = 15 ms while the best possible dividing surface results in only α o ≈ 0.4 at τ = 15 ms. Thus, the quality of the apo-myoglobin data is similar to that of the twostate model with intermediate-quality order parameter and noise (Fig. 2II-b). This data thus represents a much harder test for rate estimators than the p5ab hairpin and should show clear differences between different rate estimators.  local histogram minimum of the binned data at o = 4.6 pN (blue), and spectral estimation (red). For the multiτ estimators (dividing surface and spectral estimation), the lag time τ specifies the start of a time range [τ, τ+2.5 ms] that was used for an exponential fit. Fig. 4-3a shows again, that the MSM estimate of κ 2 exhibits the slow τ −1 convergence predicted by the theory (Eq. (28)) and does not yield a converged estimate using lagtimes of up to 20 ms. Since the MSM estimate still significantly overestimates the rate at τ = 50 ms when the relaxation process itself has almost entirely decayed, this estimator is not useful to analyze the apomyoglobin data. In contrast, both, the dividing surface multi-τ approach and the spectral estimator do yield a converged estimate ofκ 2 ≈ 26 s −1 , corresponding to a timescale of about 38.5 ms (Fig. 4,3a). In Ref. [45] a Hidden Markov model with Gaussian output functions was used and the rate was estimated to beκ 2 ≈ 46 s −1 corresponding to a timescale of κ 2 ≈ 21 ms. These results are consistent with our theory which shows that rate estimation errors lead to a systematic overestimation of the rate (and underestimation of the timescale). Fig. 41a clearly shows that the partial probabilities are not Gaussians, and enforcing Gaussian output probabilities will lead to a systematic error in the rate estimate. The smallest rate estimates are therefore the best estimates, which are here provided by the multi-τ estimators using either dividing surface or spectral estimation approaches.
In agreement with the theory (Eq. (30)), the multi-τ estimators (dividing surface, spectral estimate) converge faster than the single-τ estimate (MSM). The multi-τ estimators can be considered converged after a lagtime of about τ = 15 ms. According to the Eq. (29), this indicates that there are additional, faster kinetics in the data, the slowest of which have timescales in the order of 5-10 ms. As the resulting timescale separation to the 25 ms process is rather small, it may be worthwhile to investigate further substates of the A and B states with multistate approaches such as Hidden Markov Models [23] or pair correlation analysis [27]. Such an analysis is beyond the scope of the present paper on two-state rate theory.
As indicated in Fig. 4-1a, the substates A and B estimated from PCCA+ do strongly overlap. Thus, even though the dividing surface estimator can recover the true relaxation rate κ 2 , the estimated microscopic rates k AB and k BA strongly depend on the choice of the position of the dividing surface. Fig. 4-4a shows the estimates the dividing surface multi-τ estimator, evaluating to k AB ≈ 25 s −1 and k BA ≈ 5 s −1 . In contrast, the spectral estimator yields estimates of k AB ≈ k BA ≈ 15 s −1 . Like for the two-state model results shown in Fig. 2, the spectral estimate is numerically more stable in τ compared to the dividing surface estimate as a result of achieving a better signal-to-noise ratio. Clearly, in the dividing surface approach it is possible to pick a dividing surface position which yields the same estimates for k AB and k BA like the spectral estimator. However, the dividing surface estimator itself does not provide any information which is the correct choice, and therefore this theoretical possibility is of no practical use. Supplementary Figure 2 compares the estimation results of κ 2 and k AB , k BA for different choices of the dividing surface. In contrast to the dividing surface approach, the spectral estimator only assumes that the extreme values of o are pure, which is a much weaker requirement than assuming that an appropriate dividing surface exists (see theory), and hence provides more reliable rate estimates.
The "b"-panels in Fig. 4 show the effect of filtering the data on the estimation results. Here, the data was averaged over a window length of 1 ms, corresponding to an averaging of 50 data points of the original 50 kHz data . Fig. 4-1b indicate that this filtering enhances the separation of states, and the apparent RCQ's increase to aboutα 0 ≈ 0.7 (spectral estimate) and α 0 ≈ 0.6 (dividing surface). The relaxation rate κ 2 is still estimated to haveκ 2 ≈ 40 s −1 and the estimate becomes more robust for both the dividing surface and spectral estimates ( Fig. 4-3b). The MSM estimate slightly improves but is still significantly too high. Fig. 4-4b shows that the dividing surface derived rate estimates k AB and k BA have slightly, while the spectral estimate remains at k AB ≈ k BA ≈ 15 s −1 independent of the filtering, which is in support of the reliability of the spectral estimate.

Conclusions
We have described a rate theory for observed twostate dynamical systems. The underlying system is assumed to be ergodic, reversible, and Markovian in full phase space, as fulfilled by most physical systems in thermal equilibrium. The observation process takes into account that the system is not fully observed, but rather by tracing one order parameter (the extension to multiple oder multidimensional order parameters is straightforward). During the observation process, the observed order parameter may be additionally distorted or dispersed, for example by experimental noise. Such observed dynamical systems occur frequently in the molecular sciences, and appear both in the analysis of molecular simulations as well as of single-molecule experiments.
The presented rate theory for observed two-state dynamics is a generalization to classical two-state rate theories in two ways: First, most available rate theories assume that the system of interest is either fully observable, or the relevant indicators of the slowest kinetic process can be observed without projection error or noise. Secondly, most classical rate theories are built on specific dynamical models such as Langevin or Smoluchowski dynamics. The present theory explicitly allows the two kinetic states to overlap in the observed signal (either due to using a poor order parameter or to noise broadening), and does not require a specific dynamical model, but rather works purely based on the spectral properties of a reversible ergodic Markov propagator -hence the name spectral rate theory.
Given the spectral rate theory, the systematic errors of available rate estimators can be quantified and compared. For example, the relatively large systematic estimation error in the implied timescales / implied rates of Markov state models is explained. Additionally, the theory provides a measure for the reaction coordinate quality (RCQ)α o of the observed signal, which is independent of any specific dynamical model and also does not need the definition of an "A" or "B" state and bounds the error in rates estimated from the observed signal.α o includes effects of the order parameter measured as well as the effect of the experimental construct on the signal quality, such as experimental noise. It is shown thatα o is a lower bound to the RCQ due to choosing the order parameter, and can thus be used as an indicator to both improve the quality of the experimental setup and the choice of the order parameter.
The theory suggests steps to be taken to construct an optimal rate estimator, that minimizes the systematic error in the estimation of rates from an observed dynamical system. We propose such an estimator and refer to it as spectral estimator. It provides rather direct and optimal estimates for the following three types of quantities: 1. The reaction coordinate quality (RCQ)α o of the observed signal.
2. The dominant relaxation rate κ 2 , as well as the microscopic transition rates k AB and k BA , even if A and B strongly overlap in the observable.
3. The partial probability densities, and hence projections of the states A and B in the observable, µ o A (o) and µ o B (o), as well as their total probabilities, π A and π B . This information is also obtained if A and B strongly overlap in the observable.
Other rate estimators that rely on fitting the exponential tail of a time-correlation function calculated from    Fig. 3C), similar to the other pulling axis and other native proteins.
The Molten Globule State of Apomyoglobin Unfolds Cooperatively Under Force with a Large Distance to the Transition State. After initially characterizing an individual molecule at pH 7.0, the buffer was changed to pH 5.0 with the sample still held in the optical trap in order to populate the molten globule state at equilibrium (Fig. S2) (16). Under these conditions, unfolding and refolding events (I-U) of the N/C variant are cooperative and reversible, with both force distributions centered on 4.5 pN (Fig. 2C). The 53/C variant also shows cooperative unfolding and refolding events, with both force distributions centered on 6.5 pN (Fig. 2D). The total extension changes for both variants are consistent with those measured at pH 7.0. The force-dependent unfolding and refolding rates of the pH 5 state were determined using constant trap position experiments. In these experiments, the trap is held at a fixed position and the force varies as the molecule spontaneously folds and unfolds, allowing for the state to be identified and the lifetimes of each state as a function of force to be measured (see Materials and Methods).
[It should be noted that these hopping experiments were inaccessible for the native state because of the long lifetime of the native state (i.e., high barrier to unfolding) at low force.] The distances to the transition state for both folding and unfolding of both variants were determined by using Bell's model (17) (see Fig. 4, Table 1, and Table S1). Other more detailed models that account for any potential movement of the transition state were not needed due to the limited force-range examined (18). The apparent deviations at the extreme forces are associated with more uncertainty due to the longer lifetimes and limited observations as well as a potential bias toward a smaller rate constant due to missed observations for the short-lived state. The distances from the molten globule state to the transition state (Δx ‡ Unf ) for both variants (6.1 AE 0.5 nm for the N/C variant; 3.4 AE 1.2 nm for the olomyoglobin (B). (A) The protein was tethered between two polystyrene beads through functionalized cysteine residue, thereby determining the axis along which the force was applied. One bead is held on a pagating optical trap manipulates the other. By monitoring the bead in the trap, the force on the tether and d. (B) The structure of holomyoglobin (Protein Data Bank IDcode 1BZ6). The regions that are thought to be ted in red. The arrows indicate the pulling axis for the N/C variant (in green) and the 53/C variant (in blue) g points as determined from the structure in parentheses. Apo-myoglobin H36Q Figure 4: Estimates for rates and the reaction coordinate quality (RCQ) from passive-mode single-molecule force probe experiments of apo-myoglobin. Panels (1-4) report the estimation results, showing the direct Markov model estimate (black), a fit to the fluctuation autocorrelation function using a dividing surface at the histogram minimum o = 4.6 pN (blue), and spectral estimation (red). Panel (5) shows the native protein structure, adapted from [45], and the pulling coordinate, Panel (6) shows the traces used for analysis. The first row (a) reports the results when directly analyzing the measured 50 kHz data, while row (b) reports the results when analyzing data that has been binned to 1 kHz to reduce the noise amplitude. the experimental recorded trajectories can also estimate κ 2 without systematic error. However, the spectral estimator is unique in also being able to estimate k AB , k BA , µ o A (o), µ o B (o) and the RCQ in the presence of states that overlap in the observable order parameter.
The present study has concentrated on systematic rate estimation errors that are expected in the data-rich regime. We expect that taking the statistical error into consideration will make the spectral estimator described here even more preferable over more direct approaches such as fitting the number autocorrelation function of a dividing surface. This intuition comes from the fact that the spectral estimator maximizes the amplitude α with which the slow relaxation of interest is involved in the autocorrelation function. In the presence of statistical uncertainty, this will effectively maximize the signal-tonoise ratio in the autocorrelation function and thus lead to an advantage over fitting autocorrelation functions that were obtained differently. However, a systematic treatment of both statistical and systematic error should be made but goes beyond the present study.
The presented idea of building an optimal estimator for a single relaxation rate upon the transition matrix estimate of the projected slowest eigenfunction,ψ 2 , is readily extensible to multiple relaxation rates. This lays the basis for a multi-state rate theory for lowdimensional observations of dynamical systems that will be pursued in future studies. Such a theory will help us to understand and enhance the behavior of existing data analysis methods such as Hidden Markov models when applied to observed physical systems.