Ritz method for transition paths and quasipotentials of rare diffusive events

The probability of trajectories of weakly diffusive processes to remain in the tubular neighbourhood of a smooth path is given by the Freidlin-Wentzell-Graham theory of large deviations. The most probable path between two states (the instanton) and the leading term in the logarithm of the process transition density (the quasipotential) are obtained from the minimum of the Freidlin-Wentzell action functional. Here we present a Ritz method that searches for the minimum in a space of paths constructed from a global basis of Chebyshev polynomials. The action is reduced, thereby, to a multivariate function of the basis coefficients, whose minimum can be found by nonlinear optimization. For minimisation regardless of path duration, this procedure is most effective when applied to a reparametrisation-invariant"on-shell"action, which is obtained by exploiting a Noether symmetry and is a generalisation of the scalar work [Olender and Elber, 1997] for gradient dynamics and the geometric action [Heyman and Vanden-Eijnden, 2008] for non-gradient dynamics. Our approach provides an alternative to chain-of-states methods for minimum energy paths and saddlepoints of complex energy landscapes and to Hamilton-Jacobi methods for the stationary quasipotential of circulatory fields. We demonstrate spectral convergence for three benchmark problems involving the Muller-Brown potential, the Maier-Stein force field and the Egger weather model.


I. INTRODUCTION
The theory of Freidlin and Wentzell [1] gives asymptotic probability estimates of rare events in dynamical systems perturbed by small noise [2][3][4][5].Specifically, Freidlin-Wentzell theory yields estimates of the stationary distributions and mean first-passage times.Both these quantities are determined, in turn, by the asymptotic estimate of the probability of a stochastic trajectory to not deviate from a smooth path by more than a given amount in a given interval of time.The key result of Freidlin and Wentzell is that the limiting form of this probability, for small noise and small deviations, is given by a non-negative functional of the smooth path.This functional is the Freidlin-Wentzell action and its minimum, for fixed initial and terminal states, determines both the stationary distributions and first-passage times.The smooth path minimizing the action is often called the Freidlin-Wentzell instanton.The theory is applicable to dynamical systems of both gradient and non-gradient character and can so be used to study a wide variety of equilibrium and non-equilibrium systems modelled by Itô diffusions [6][7][8][9][10][11][12][13][14][15].
Determining the minimum of the Freidlin-Wentzell action is a problem in the calculus of variations.The Euler-Lagrange equations provide the necessary conditions for extrema of variational problems and, unsurprisingly, have been the basis of * ltk26@cam.ac.uk the large literature devoted to the numerical computation of Freidlin-Wentzell instantons [6,[16][17][18].There exists, however, an alternative "direct" route for the solution of variational problems in which the functional is reduced, through finite-dimensional parametrizations of paths, to a multivariate function and then extremised by appropriate multivariate optimisation methods [19,20].To the best of our knowledge, the first use of the direct method for the Freidlin-Wentzell action, discretised by finitedifferences, appears in the work of Weinan, Ren and Vanden-Eijnden [21].
Here we combine the direct method with a Ritz discretisation [19,20,22] to minimize the Freidlin-Wentzell action.We analyse paths in a spectral basis of Chebyshev polynomials and use spectral quadrature to express the action as a multivariate function of the basis coefficients.Nonlinear optimisation is used to obtain coefficients that give the least action from which the instanton is synthesised in the spectral basis.For minimisation over paths regardless of their duration, this procedure is especially effective when applied to a reparametrisation-invariant onshell form of the action that follows from the timetranslational invariance of the Lagrangian.This generalises the scalar work functional of Olender and Elber (for gradient dynamics) and the geometric action of Heyman and Vanden-Eijnden [23] (for nongradient dynamics).Our method is efficient enough to robustly sample the logarithm of the asymptotic estimate of the stationary distribution, i.e. the quasipotential, avoiding the alternative, but numerically delicate, route of solving the Hamilton-Jacobi equation [24][25][26].Our method is simple to use, converges rapidly, and is applicable to both equilibrium and non-equilibrium problems.Its implementation is freely available on GitHub as the open-source Python library PyRitz.
The remainder of this paper is organized as follows.In the next section, we recall key results of Freidlin-Wentzell theory from dual perspectives of Itô stochastic differential equations and the corresponding Fokker-Planck equations.In Section (III) we present the derivation of the on-shell form of the Freidlin-Wentzell action in a manner reminiscent of the Routh reduction procedure in classical mechanics and explain its relation to the scalar work and the geometric action.In Section (IV) we describe the direct method for the minimisation of functionals, the Chebyshev spectral basis in which we construct smooth paths, the spectral quadrature rule we use to evaluate the action, and the multivariate non-linear optimisation methods we employ to find the minimum.In Section (V) we apply the direct method to three well-known diffusion processes and demonstrate convergence in each case.
A particular achievement of our approach is its relatively facile ability to calculate quasipotentials.This can be done with a sufficiently high density of sample points to construct effectively continuous maps of the quasipotential, which we do here for the same set of benchmark problems.We conclude with a discussion on extending the method to degenerate diffusion processes, systems with inertia and to the stochastic dynamics of fields.

II. LARGE DEVIATION THEORY
We consider the autonomous dynamics of a ddimensional coordinate X = (X 1 , . . ., X d ) in R d perturbed by configuration-dependent noise of intensity √ ε described by the Itô diffusion equation governing the stochastic trajectory X(t), where a µ (X) is the drift vector, √ εσ µ ν (X) is the volatility, W ν (t) is a d-dimensional Wiener process and repeated indices are summed over.The transition probability density of the process, P 1|1 (x, t|x 0 ) = P (X(t) = x|X(0) = x 0 ), obeys the Fokker-Planck equation ∂ t P (x|x 0 ) = LP (x|x 0 ) where the Fokker-Planck operator is and b µν (x) = σ µ λ (x)σ ν σ (x)δ λσ is the diffusion tensor.We assume it to be non-degenerate, positivedefinite and invertible.The inverse, b µν (x), induces a Riemannian structure in R d with a norm |x| b = b µν x µ x ν that is distinct from the Euclidean norm |x| = (x 1 ) 2 + . . .(x d ) 2 .We use the subscript b to indicate this second "diffusion" norm.The stationary density satisfies the time-independent Fokker-Planck equation LP 1 (x) = 0 and, when it exists, is reached asymptotically in time for arbitrary initial distributions, lim t→∞ P 1|1 (x, t|x 0 ) = P 1 (x).
Associated with the Itô process is the Freidlin-Wentzell "action" functional [1,27,28] which gives an asymptotic estimate for the logarithm of the probability of trajectories X(t) to remain in the tubular neighbourhood of a smooth path x(t) over the duration 0 ≤ t ≤ T .We write this as which, in terms of limits, means The limits must be taken in the order above as they do not commute.Eq. ( 4) is a large deviation principle for trajectories of Itô processes, due to Wentzell and Freidlin and Graham [29].
For reasons described below, it is of interest to obtain the mode of the tube probability over the set of continuous paths which have fixed termini x 1 and x 2 and are of duration T. This is equivalent to the variational problem of minimising the Freidlin-Wentzell action.The minimum value of the action, is called the quasipotential.The path attaining the minimum, is called the instanton.We emphasise that this path describes the smooth centerline of the tube of maximum probability and not a non-differentiable trajectory of the diffusion process.It is the most probable dynamical path connecting two points in configuration space.The instanton and the quasipotential are central objects in Freidlin-Wentzell-Graham theory and relate to the eikonal approximation of the Fokker-Planck equation [30].Assuming the JWKB form of the transition density, with prefactors suppressed, substituting in the Fokker-Planck equation and matching terms gives a Hamilton-Jacobi equation for the lowest order contribution, This corresponds to the Hamiltonian system whose solutions define an equivalent variational problem of extremising an action with the Lagrangian Thus, the rays of the Hamilton-Jacobi equation that determine the lowest order contribution to the eikonal are local maxima of the tube probability, or in other words, φ 0 (x, x 0 ; T ) = V T (x|x 0 ).The largedeviation principle of Freidlin and Wentzell and the theory of the non-equilibrium potential of Graham [27,28] thus appear as elegant reformulations of the JWKB approximation [30].
The correspondence with the JWKB approximation yields the asymptotic form of the transition density, and, in the T → ∞ limit of the above, the asymptotic form of the stationary distribution, If x and x 0 belong to the same basin of attraction of an attractor A, then it can be shown that this limit is independent of the initial coordinate, lim where V A ∞ is equal, to within a constant, to the stationary quasipotential V ∞ (x) in the basin of attraction of A. For a system with multiple attractors A i , the global quasipotential is where C Ai is an additive constant.The constants are fixed by requiring for attractors A i and A j with adjacent basins of attraction, where x (i,j) s is the saddle with the lowest value on the separatrix between the basins [28].The stationary quasipotential determines the mean persistence time of a trajectory in a basin of attraction which generalises the Arrhenius law to systems out of equilibrium.
The T → ∞ limit involved in the definition of the stationary quasipotential presents considerable numerical difficulties in the minimisation of the Freidlin-Wentzell action.A more numerically amenable route to determining the stationary quasipotential is by the minimisation of the action over paths that start at an attractor and end at a point in its basin, regardless of the duration.We show in the next section that the solution of this second variational problem does, indeed, yield the stationary quasipotential and derive an alternative form of the Freidlin-Wentzell action that is adapted to computing instantons regardless of their duration.

III. ON-SHELL ACTION
We consider the variational problem of minimising the Freidlin-Wentzell action over paths with fixed termini but of arbitrary duration, where both the initial and final points are in the basin of the attraction A and the Freidlin-Wentzell Lagrangian following from Eq. ( 9) is This variational problem can be solved by introducing a parametrisation u for both the coordinate and time, x = x(u), x = dx/du; t = t(u), t = dt/du, that allows the shape of the path, to be varied independently of its duration, Coordinates x and time t are dependent variables in the reparametrised action, in which the time-dependence appears only through the derivative t .Therefore, t is a cyclic (or ignorable) coordinate and Noether's theorem implies that the corresponding conjugate momentum is conserved [31]: This defines submanifolds of the dynamics labelled by the "energy" E which we shall call shells.The bound 2E + |a| 2 b ≥ 0 for the energy follows immediately from the positive-definiteness of the diffusion tensor.
Solving the first integral for t gives from which the duration of the path is obtained to be This shows that paths γ T E (of duration T E ) are equivalent to shapes σ E (of energy E), where the latter is the restriction of shapes in σ to the shell of constant energy.Then, minimisation over paths γ T regardless of their duration is equivalent to minimisation over shapes σ E regardless of their energy, or The action for shapes restricted to σ E is obtained by eliminating t between Eq.( 17) and Eq.( 19).This gives the "on-shell" form of the Freidlin-Wentzell action, which is a functional of the shape x(u), a function of the energy E, and allows for independent variations of both.It is straightforward to see that the integrand and, therefore, the action is minimised when E = 0. Therefore, most probable paths, regardless of their duration, are obtained by minimising over shapes restricted to the zero-energy shell.The duration on the zero-energy shell, shows that paths that leave, cross, or terminate at points of vanishing drift, a µ (x) = 0, are necessarily of infinite duration.The corresponding shapes σ A 0 can then be taken to start at a fixed point and end at another point x in the basin of attraction.The quasipotential is determined by a minimisation over such shapes σ A 0 , and the shape attaining the minimum, is the stationary instanton.The time on the instanton path can be obtained by integrating t = |x | b /|a| b .The utility of the on-shell form of the action is that it provides the shape of the path independently of its duration.The latitude of obtaining the shape from a parametrisation over a finite interval, even for paths of infinite duration, is extremely useful in numerical work.The on-shell action is related to, but distinct from, the Jacobi action in mechanics [32,33], which, following a Routh reduction [31], would in this case be Though both the on-shell and Jacobi action agree on the zero-energy shell, only the former supports the interpretation as least action for non-zero energies.Furthermore, variations of the on-shell action have to respect the on-shell condition Eq. 19 (in other words, the solutions of its Euler-Lagrange equations does not coincide with its extrema).On the other hand, the Jacobi action can be varied using the standard Euler-Lagrange approach.
For gradient dynamics, that is a µ = b µν ∂U/∂x ν , the on-shell action generalises the "scalar work" functional of Olender and Elber [34] to non-zero energies and configuration-dependent diffusion tensors.For non-gradient dynamics, where the drift cannot be so expressed, the on-shell action generalises the geometric action of Heyman and Vanden-Eijnden [17] to non-zero energies.The non-zero energy shell | ẋ| 2 b = |a| 2 b + E admits the most general path consistent with time-translation invariance, in contrast to the zero-energy shell where the magnitude of the velocity is always equal to that of the drift, | ẋ| 2 b = |a| 2 b .Such general paths determine the quasipotential and the asymptotic form of the transition density for finite times and will be examined in detail in future work.Accordingly, we set E = 0 below.The derivation of the on-shell action only requires timetranslation invariance of the Lagrangian and not, as in [17,34], their positive-definiteness.Thus, it can be applied to stochastic actions whose Lagrangians are not necessarily positive-definite, as for example the Onsager-Machlup action [35,36].We now describe the Ritz method by which we minimise actions.

IV. RITZ METHOD
The direct method in the calculus of variations consists of constructing a sequence of extremisation problems for a function of a finite number of variables that, in the passage to the limit of an infinite number of variables, yields the solution to the variational problem.The two main families of direct methods are finite differences and Ritz methods [19,20,22,37].In the latter, the solution of the variational problem is sought in a sequence of functions ϕ 1 (t), ϕ 2 (t) , . . .ϕ n (t), . . .each of which satisfies end point conditions.The path is expressed as a linear combination of these functions which transforms the action from a functional of the path into a function of the expansion coefficients, Action minimisation now becomes a search for a set of coefficients, α * i such that S(α * 1 , . . ., α * n ) < S(α 1 , . . ., α n ).The necessary condition for this is the vanishing of the gradient, which is the Ritz system of non-linear equations.Coefficients satisfying these conditions can be obtained by non-linear optimisation.The n-th approximation to the minimum action path, x * T (t), and the minimum value of the action, S[x * T (t)], are obtained from these values of the coefficients.It is generally the case that this sequence of approximations converges to the minimum of the variational problem as n → ∞ [19,20].
The method, then, has three parts: first, the choice of basis functions ϕ i (t); second, the quadrature rule that integrates the Lagrangian to obtain the action as a function of the expansion coefficients; and third, the optimisation that yields the coefficients at the minimum.Since each part is only loosely dependent on the others, Ritz methods come in many varieties [38].Our choices are centered around Chebyshev polynomials as described below.Approximation by Chebyshev polynomials and their optimality for the purpose are described in [39][40][41].
Basis functions: We consider a path x(u) that is a Lipschitz continuous function of the parameter u in the interval [−1, 1].Then, it is has an absolutely and uniformly convergent Chebyshev expansion, where T k (u) are Chebyshev polynomials of the first kind and the integral must be halved for k = 0.A suitable sequence of paths can be constructed from the first n terms of this infinite series.However, it is computationally more convenient, for reasons that will be clear below, to construct the sequence from n-th degree polynomials that interpolate the path at the n + 1 Chebyshev points The n-th degree interpolant can be expressed in standard form as a sum of Lagrange cardinal polynomials j (u) or as a linear combination of Chebyshev polynomials, The coefficients c k are aliased versions of the coefficients a k .Since the cardinal polynomials have the property x n (u k ) = α k , that is, the expansion coefficients α k are path coordinates at the Chebyshev points.Expressing the entire path in terms its discrete coordinates has the advantage that end point conditions can be imposed by setting Admissible paths of degree n are, then, parametrised by the n − 1 independent coefficients α 1 , . . ., α n−1 .
In contrast, imposing end point conditions in series form leads to a numerically inconvenient linear dependence between the coefficients c k .The derivative of the path is a polynomial of degree n − 1 that can be expressed in terms of the interpolant as with the two sets of expansion coefficients related by the Chebyshev spectral differentiation matrix We use the barycentric form of the Lagrange polynomials [42] j (u) = with weights [43] This form is both numerically stable and, costing no more than O(n) operations, efficient to evaluate.[44].Chebyshev interpolants converge exponentially for analytic functions and algebraically for functions with a finite number of derivatives.More precisely, for an analytic path, ||x − x n || = O(ρ −n ) for some ρ > 1 as n → ∞.For a path with ν derivatives and ν-th derivative of bounded variation K, ||x − x n || = O(Kn −ν ) as n → ∞.These estimates are in the supremum norm ||a||, that is, the maximum of the absolute value of a in the interval [−1, 1].In contrast, finite-difference methods can only achieve polynomial, but never exponential, rates of convergence, even for analytic paths [39,40].
Quadrature: To reduce the action to a multivariate function of the coefficients it is necessary to evaluate the integral using a quadrature rule.For instance, quadrature at the Chebyshev points u j gives where ω j are the quadrature weights.However, standard quadrature rules at this set of n Chebyshev points, which integrate a polynomial of degree less than or equal to n exactly, will generally be inaccurate.The reason is that the Lagrangian has polynomial degree different from, and usually greater than, the polynomial degree of the path.For instance, when b ij is a constant, the term quadratic in the velocities has twice the polynomial degree of the path.Therefore, if the Lagrangian is to be integrated accurately, the order of the quadrature must be different from, and in general greater than, the polynomial degree of the path.Therefore, we define a second set of n q > n Chebyshev points v j = − cos(jπ/n q ), (j = 0, 1, . . .n q ) (35) and interpolate the path at these points.This is done efficiently by matrix multiplication with a (n q + 1) × (n + 1) matrix x n (v j ) = nq k=0 whose elements are derived from the barycentric interpolant The Lagrangian is evaluated at these second set of points after which Clenshaw-Curtis quadrature [39,40] is used to evalute the action, As with interpolation, Clenshaw-Curtis quadrature converges exponentially for Lagrangians that are analytic in u and algebraically for Lagrangians with a finite number uderivatives.Precise estimates are given in [41].For fixed values of n and n q , the matrices B ij and C ij in the above expression are constant and can be precomputed and stored.The multiplications require O(nn q ) operations, and so there is a linear cost, for fixed n, to increase the order of the quadrature.For Lagrangians of polynomial order n L , the number of quadrature points must be n q > (n + 1)n L .For nonpolynomial Lagrangians, n q has to be chosen to ensure that the n q -th Chebyshev coefficient is suitably small.Well-defined procedures exist for the adaptive truncation of Chebyshev series [45] but here we use a simple rule of thumb and set n q = 10n leaving the implementation of more efficient truncations to future work.We note that in the direct finite-difference method, introduced in [21], the path is interpolated at uniformly spaced points by a quadratic polynomial and the Lagrangian is integrated using the trapezoidal rule.This combination can exactly evaluate the action for Lagrangians that are at most quadratic polynomials.Optimisation: To minimise the action over the expansion coefficients α 1 , . . ., α n−1 we use both gradient-free and gradient-based algorithms.For gradient-free algorithms we provide Eq.( 39) directly.For algorithms that require the gradient, the chain rule gives These matrices, too, can be precomputed and stored and only the partial derivatives of the Lagrangian need to be computed for given values of the coefficients.For the examples presented below, we use NEWUOA [46] for gradient-free optimisation and SLSQP algorithm [47] for gradient-based optimisation, both of which are implemented in the NLOPT numerical optimisation package [48].For non-equilibrium systems, instantons lose smoothness when passing through fixed points.For such paths, convergence is still achieved but at less than spectral rates.Spectral convergence can be recovered if paths are evaluated piecewise, taking care to isolate the points of derivative discontinuities.This is feasible because fixed points are the only locations where Freidlin-Wentzell instantons can lose smoothness [28].

V. NUMERICAL RESULTS
In this section, we apply the Ritz method to three diffusion processes that are widely used to benchmark rare event algorithms.The first is overdamped Brownian motion in a complex energy landscape, the second is overdamped Brownian motion under the influence of a circulatory force, and the third is a model of the weather.All three models have configuration-independent diffusion tensors for which it is not necessary to distinguish between covariant and contravariant indices.Python codes for each of these examples are freely available on GitHub.

A. Brownian dynamics in a complex potential
Our first example considers the overdamped Brownian motion in a two-dimensional potential with a constant friction.The usual equations of Brownian dynamics can be recast into Itô form, where µ is the mobility and ε = k B T is the temperature.The Freidlin-Wentzell action for a smooth path with two-dimensional coordinate x = (x 1 , x 2 ) is where ∇U = (∂ 1 U, ∂ 2 U ).The minimum of the zeroenergy action, provides the most probable shape and the stationary quasipotential.The second term does not affect the minimisation and can be discarded.The resulting reduced action is of the same form as Fermat's principle for optical rays, where |∇U (x)| plays the role of the refractive index and |x |du = ds is the arc-length of the ray.In geometric optics, Fermat's principle is equivalent to Huygen's principle and its "wavelet equation" This can be easily verified by differentiatiating it with respect to arc-length, to obtain the eikonal equation which is identical to the Euler-Lagrange equation of the zero-energy action.The wavelet equation implies that the tangent t = dx/ds to the path is parallel to the gradient of the potential, or equivalently, that rays are normal to contours of the potential.This is the well-known condition for a minimum energy path and was first derived variationally from the scalar work functional by Olender and Elber [34].It provides a stringent test of the fidelity of the paths obtained by minimisation.Following [34], we choose the Müller-Brown potential of [49] as an example of a complex energy landscape.The potential and its stationary points are shown in Fig. 1.The three minima are marked by crosses and two saddle points by dots.The instanton is computed by requiring the path to start at the minimum on the top left and terminate at the minimum on the bottom right.The initial straight line shape, an intermediate shape and the converged instanton are shown in panel (a).The minimisation automatically locates the two saddle points and makes the the instanton pass through them.The action cost along the path is shown in panel (b), where the vanishing of the action on segments of the path along the force is clearly seen.The cosine of the angle between the tangent and force is shown in panel (c) and the condition for a minimum energy path is clearly fulfilled.We emphasise that the condition is not imposed separately but is satisfied automatically at the minimum.The Ritz method provides an alternative to chain-of-states methods for finding minimum energy paths.It does not need the Hessian of the potential, which makes it suitable for problems where such evaluations are expensive.Unlike [17], our parametrisation has no unit-speed constraint and the minimisation, accordingly, is unconstrained.The method applies without change to dynamics with configuration-dependent friction.

B. Brownian dynamics in a circulatory field
Our second example consider, in contrast to the first, Brownian motion in a force field that cannot be derived from a potential and, as such, necessarily has a non-vanishing curl.Choosing the force field of Maier and Stein [9] gives for the overdamped motion of the two-dimensional coordinate X = (X 1 , X 2 ), where β is a parameter.The force field f (x 1 , x 2 ) = ( x 2 ) is smooth, and f 1 is odd in x 1 and even in x 2 , while for f 2 the converse holds.There are two stable fixed points at x a = (−1, 0) and x b = (1, 0), and a saddle point at x s = (0, 0).The force field admits a potential only for β = 1, when it can be written as The force field is shown in the first panel of Fig. 2 for β = 10 together with the instanton moving from x a to x b .As before, solid (dashed) segments represent motion against (along) the vector field.The instanton moving from x b to x a is obtained by reflection about the x 1 -axis showing that that fluctuational and relaxational paths are not identical in a non-gradient field.
The middle panels shows the stationary quasipotential V Ai ∞ (x) with respect to the attractors at (−1, 0) and (1, 0) respectively.The quasipotential is sampled on a 128 × 128 grid by computing instantons between a point on the grid and the relevant attractor.The contours of the quasipotential and its heatmap are obtained from these discrete samples.To the best of our knowledge, all prior estimations of the quasipotential for this problem (and more generally, for circulatory forces) have required numerical solutions of the Hamilton-Jacobi equation.Our method of direct sampling provides an alternative to this route of computing the quasipotential.The right panel shows the Lagrangian as a function of arc-length along the instanton.As in the previous example, the Lagrangian vanishes along segments of the path where motion is along the force.For motion against the force, the tangent to the path is no longer parallel to the force, as shown by the variation of the cosine of the angle θ between the tangent and the force.We note that our method is agnostic to the existence, or not, of a potential for the drift and treats both these cases on equal footing.

C. Egger model of weather
Our final example is a reduced model of the weather for a a three-dimensional coordinate X = (X 1 , X 2 , X 3 ) that has a circulatory drift, where k, β, γ, U 0 and H are constants.This model is due to Egger [50].It is not particularly illuminating to visualise the three-dimensional vector field describing this dynamics but we note that it has two stable fixed points, marked by crosses in Fig. 3, and a saddle fixed point marked by a dot.The instanton moving between these points is shown as before in the left and right panels of the figure.Also shown are isosurfaces of the quasipotential with respect to the stable fixed points, with isovalues increasing from red to blue.To the best of our knowledge, this is the first computation of the quasipotential for this model.We provide this example primarily to demonstrate the feasibility of sampling quasipotentials in dimensions greater than two with our method.

VI. NUMERICAL CONVERGENCE
We briefly recall the convergence properties of the Ritz method, comprising that of the basis functions, the quadrature, and the optimisation.The Chebyshev interpolant is guaranteed to converge to the most probable path, assuming that it is Lipschitz continuous, at a rate that increases with the number of derivatives the path admits and is exponential for a smooth path.Likewise, the Clenshaw-Curtis quadrature is guaranteed to converge to the minimum of the action, assuming that the Lagrangian is Lipschitz continuous.The optimal number of quadrature points for accuracy to machine precision can be obtained by following the decay of the Chebyshev coefficients of the Lagrangian and trun- cating at that value beyond which the coefficients vanish to machine precision.The optimisation has lesser theoretical guarantees than the interpolation and quadrature, as is generally the case with search in high-dimensional spaces.However, the residual of the Ritz system provides an empirical measure for how closely the minimum has been located.In all three examples (and in others not presented here) we have found both gradient-free and gradient-based optimization to robustly locate the minima, and gradient-based methods to yield faster convergence.We note that for equilibrium problems, the gradientfree method does not require the Hessian of the energy function, which can be of significant computational advantage.In Table I we show the spectral convergence of the action with increasing polynomial order of the path for each of our examples.

VII. DISCUSSION
We have presented an efficient and accurate numerical method for computing most probable transitions paths and quasipotentials of rare diffusive events.The method directly minimises the Freidlin-Wentzell action and thus provides an unified approach for transition paths in both equilibrium and non-equilibrium systems.Our reparametrisationinvariant form of the action, derived using a Noether symmetry, is well-suited for numerical work and is a generalisation of the geometric action.This frees us from the constraints of the commonly used arclength path parametrisation and offers the maximum flexibility in choosing the space of polynomials in which action is minimised.Thus our method is not limited to the Chebyshev polynomials in [−1, 1] used here but easily admits trigonometric polynomials and, more generally, any global basis.Numerical quadrature reduces the action to a multivariate function of coefficients of the path polynomial whose minimum is obtained by both gradientfree and gradient-based optimisation.This gives, simultaneously, both the minimum value of the action and the most probable path.This efficiency of the method allows us to repeatedly compute mini- Here, δ → functional variation, P → finite-dimensional projection, and ∇ → function minimisation.On the left branch, the action is first varied to obtain the Euler-Lagrange equation and then projected onto a finite-dimensional basis for numerical solution.On the right branch, the action is first projected onto a finite-dimensional basis and then minimised to obtain the Ritz system.The finitedimensional projection of the Euler-Lagrange equations is, in general, not identical to the Ritz system.
mum action paths between an attractor and a point in its basin of attraction and, thereby, map out the quasipotential.The quasipotential in a nonequilibrium steady state has the same significance as the Gibbs distribution in equilibrium and our method provides a robust way of obtaining it without the need to numerically solve the Hamilton-Jacobi partial differential equation.The direct method used here consists of a discretisation of the action followed by a search for the minimum in the resulting finite-dimensional space, expressed schematically in Fig. (4).In contrast, the majority of methods impose the vanishing variation of the action and then search for the solution of the Euler-Lagrange equation in a finitedimensional space.The resulting discretised Euler-Lagrange equations is, in general, not identical to the Ritz system; in other words, these two methods of reducing an infinite-dimensional problem to a finitedimensional one are not equivalent.In contrast to mechanics, where Newton's equations of motion are considered primary and the action derived, here it is the tube probability and hence the Freidlin-Wentzell action that is primary and the Euler-Lagrange equation for the most probable path that is derived.It appears more natural to us to discretise the primary, rather than the derived, object directly.Our approach is algorithmically simple and the only adjustable parameters are the polynomial order n and the quadrature order n q .This simplicity does not compromise accuracy or efficiency, as confirmed by our examples.
The rapid convergence of the method holds promise for its application to problems involving the stochastic dynamics of fields, with both scalar and Lie group-valued order parameters.We also expect the method to apply to stochastic dynamics with degenerate diffusion tensors and to stochastic systems with inertia.These will be addressed in forthcoming work.

Figure 1 .
Figure 1.Ritz method for overdamped motion in the Muller-Brown potential, which has three minima (crosses) and two saddle points (dots).The initial path is the straight line connecting two minima and the instanton is the solid line, with broken segments showing motion along the force.The instanton automatically locates and passes through both saddles.A typical path before convergence to the minimum is shown as a dotted line.(b) The value of the Lagrangian as a function of Euclidean arc-length of the instanton.The action vanishes to machine precision on segments of the path where motion is along the force.(c) The cosine of the angle θ between the tangent and the force is always ±1, i.e. the instanton is a minimum energy path.The instanton is represented by a polynomial of degree n = 10.

Figure 2 .
Figure 2. Ritz method for overdamped motion in a circulatory (i.e.non-gradient) force field.The instanton is in red with solid (dashed) segments showing motion against (along) the force field.The instanton is reflected about the horizontal axis for motion starting on the right, showing the inequivalence of fluctuational and relaxational paths for non-gradient dynamics.(b) The quasipotential, computed using Eq. 13, with a caustic at the unstable fixed point.(c) The value of the Lagrangian as a function of the Euclidean arc-length of the instanton.As in the potential case, the action vanishes to machine precision on segments where motion is along the force.(d) The cosine of the angle θ between the tangent and the force is, unlike in the potential case, not always ±1.The instanton is represented by a polynomial of degree n = 8.

Figure 3 .
Figure 3. Instantons and quasi-potentials of the Egger model.The instantons are shown in red with solid (dashed) lines representing motion against (along) the vector field.The left and right panels are forward and reverse instantons.Isosurfaces of the quasipotential with respect to each attractor is shown in the respective panels.Isovalues increase from light red to blue in the range {1, 7, 11, 16, 21, 26, 31, 36}.Parameter values are k = 2, β = 1.25, γ = 2, U0 = 10.5 and H = 12.The instanton is represented by a polynomial of degree n = 10.

Figure 4 .
Figure 4. Inequivalence of the direct and Euler-Lagrange routes to numerical action minimization.Here, δ → functional variation, P → finite-dimensional projection, and ∇ → function minimisation.On the left branch, the action is first varied to obtain the Euler-Lagrange equation and then projected onto a finite-dimensional basis for numerical solution.On the right branch, the action is first projected onto a finite-dimensional basis and then minimised to obtain the Ritz system.The finitedimensional projection of the Euler-Lagrange equations is, in general, not identical to the Ritz system.