Minimum Action Method for Nonequilibrium Phase Transitions

First-order nonequilibrium phase transitions observed in active matter, fluid dynamics, biology, climate science, and other systems with irreversible dynamics are challenging to analyze because they cannot be inferred from a simple free energy minimization principle. Rather, the mechanism of these transitions depends crucially on the system's dynamics, which requires us to analyze them in trajectory space rather than in phase space. Here we consider situations where the path of these transitions can be characterized as the minimizer of an action, whose minimum value can be used in a nonequilibrium generalization of the Arrhenius law to calculate the system's phase diagram. We also develop efficient numerical tools for the minimization of this action. These tools are general enough to be transportable to many situations of interest, in particular when the fluctuations present in the microscopic system are non-Gaussian and its dynamics is not governed by the standard Langevin equation. As an illustration, first-order phase transitions in two spatially-extended nonequilibrium systems are analyzed: a modified Ginzburg-Landau equation with a chemical potential which is non-gradient, and a reaction-diffusion network based on the Schl\"ogl model. The phase diagrams of both systems are calculated as a function of their control parameters, and the paths of the transitions, including their critical nuclei, are identified. These results clearly demonstrate the nonequilibrium nature of the transitions, with differing forward and backward paths.


I. INTRODUCTION
Materials with identical microscopic constitutions can be found in very different macroscopic states when external conditions, such as temperature or pressure, vary. A major achievement of equilibrium statistical mechanics is to give a first-principle explanation of these phase transitions. The theory posits the existence of a distribution, for example of Boltzmann-Gibbs type, that gives the probability of finding the microscopic system in any of its possible configurations. Macroscopic properties like the system's density, magnetization [1], population number in the ground state [2], etc., can then be deduced by enumerating all the microscopic configurations consistent with a given value of the chosen macroscopic observable, and identifying which of these values is most likely. More formally, if we denote by φ the variable used to characterize the macroscopic state of the system, its statistical weight is obtained by summing the system's probability distribution over all microscopic states consistent with a given realization of φ. Performing this calculation typically requires sophisticated tools such as renormalization group theory [3], the replica method or the cavity method [4,5], along with tools from large deviation theory [6]. It generically shows that the statistical weight of φ is asymptotically given by exp(−V(φ)/ε), where V(φ) is some free energy to be calculated, and ε is a small parameter that tends to zero in the thermodynamic limit, when the number of microscopic constituents tends to infinity: as a result, the theory predicts that the system will be found in the macroscopic state φ of minimum free energy with probability one in this limit. This also explains phase transitions: they take place when the topology of the free energy V(φ) changes as a control parameter, like the temperature or some applied external field, is varied.
For example, if V (φ) has two wells whose relative depths change with the control parameter, a first-order phase transition occurs when the deepest well becomes more shallow than the other well, and as a result, the macroscopic state of the system changes from the first to the second.
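This switching mechanism can be illustrated numerically with a toy free energy; the quartic form below is our own illustrative choice, with the applied field h playing the role of the control parameter:

```python
import numpy as np

def free_energy(phi, h):
    """Toy double-well free energy; the field h tilts the relative depth of the wells."""
    return phi**4 / 4 - phi**2 / 2 - h * phi

# Scan the control parameter h and locate the global minimum of V.
phis = np.linspace(-2.0, 2.0, 4001)
for h in (-0.1, 0.0, 0.1):
    phi_star = phis[np.argmin(free_energy(phis, h))]
    print(f"h = {h:+.1f}: global minimum near phi = {phi_star:+.2f}")
```

As h crosses zero the global minimum jumps from one well to the other, which is the first-order transition described above.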
The statistical mechanics approach to phase transitions rests on the assumption that the probability distribution of the microscopic system is known. This information is available for equilibrium systems, whose microscopic dynamics is time-reversible. In this work, however, we are primarily interested in nonequilibrium systems, whose dynamics is irreversible. Except for some special situations where it can be computed exactly [7,8] or asymptotically e.g. via some thermodynamic mapping [9][10][11][12], the invariant distribution of these systems is not known in general. Yet, these systems too can undergo phase transitions. Examples include driven systems arising from active matter [13][14][15][16][17][18], fluid dynamics [19,20], biology [21], neuroscience [22][23][24], climate science [25,26], etc. The description of such nonequilibrium phase transitions requires a generalization of the equilibrium statistical mechanics approach, in which we must consider the probability of trajectories rather than configurations.

A. Minimum Action Principle
Even if the stationary distribution of nonequilibrium systems is not known in general, we can often write down the probability distribution of their trajectories, using e.g. path integral approaches such as the Martin-Siggia-Rose-Janssen-De Dominicis [27] or the Doi-Peliti formalism [28], or Girsanov theorem. This offers the possibility to generalize the micro-to-macro mapping to trajectory space rather than phase space: that is, enumerate all the microscopic trajectories leading to the same evolution of a macroscopic variable, and thereby deduce the probability weight of these macroscopic trajectories. While these calculations are again to be performed on a case-by-case basis, by analogy with the equilibrium setup we can deduce some of the generic features of the result. Let us discuss those next.
Assuming again that the macroscopic state of the system can be described by a variable or field φ in some differentiable manifold M (for example R^d, S^d, or L^2(R^d)), working in trajectory space leads to a statistical weight for the macroscopic trajectories {φ(t)}_{t∈[0,T]} that is asymptotically of the form exp(−S_T[φ]/ε), with an action of the generic form

S_T[φ] = ∫_0^T L(φ(t), φ̇(t)) dt,   (1)

where φ̇ = dφ/dt ∈ T_φM and L is a problem-dependent Lagrangian. The parallel between the two setups is summarized below:

[Table: parallel between the equilibrium (state space) and nonequilibrium (trajectory space) coarse-graining frameworks. Microscopic weights: probability distribution (e.g. Boltzmann-Gibbs distribution) of the microscopic variables in state space / probability distribution (e.g. path integral) of the microscopic trajectories. Macroscopic variable: map from microscopic state space to coarse-grained macroscopic space (e.g. spin values to magnetization) / map from microscopic trajectories in phase space to macroscopic trajectories. Macroscopic weights: marginal distribution of the macroscopic variables, and associated free energy / marginal distribution of the macroscopic trajectories, and associated action.]

The action S_T[φ] is the nonequilibrium generalization of the free energy V(φ), and minimization of S_T[φ] allows us to quantify the probability and mechanism of various macroscopic events in the limit as ε → 0. In particular:

1. The probability that the system, started in state φ_a at time t = 0, ends up in state φ_b at time T is obtained by summing exp(−S_T[φ]/ε) over all paths with these end points. When ε ≪ 1, the path with minimum action dominates this sum, which means that the aforementioned probability is asymptotically given by

P^ε(φ(T) = φ_b | φ(0) = φ_a) ≍ exp(−(1/ε) min S_T[φ]),   (2)

where the minimization is taken over all paths {φ(t)}_{t∈[0,T]} such that φ(0) = φ_a and φ(T) = φ_b, and ≍ means exponential asymptotics, i.e. the ratio of the logarithms of both sides in (2) tends to 1 as ε → 0. The minimizer of the action also gives the pathway by which the macroscopic transition event occurs with probability one in this limit.
2. The non-equilibrium invariant distribution of the system can be characterized similarly via the quasipotential, defined as

V_{φ_a}(φ_b) = inf_{T>0} min S_T[φ],   (3)

where the inner minimization is again taken over all paths {φ(t)}_{t∈[0,T]} such that φ(0) = φ_a and φ(T) = φ_b. The quasipotential V_{φ_a}(φ_b) plays a role analogous to the free energy barrier from state φ_a to φ_b, and it can be used to identify the possible phases and formulate an equivalent of the Arrhenius law. More precisely: φ_a is a metastable phase if V_{φ_a}(φ) ≥ V_φ(φ_a) for all φ in a vicinity of φ_a (i.e. φ_a is the non-equilibrium equivalent of a local minimum of the free energy); and if φ_a and φ_b are the only two metastable phases in the system, the asymptotic rates of transition from φ_a to φ_b and from φ_b to φ_a are respectively given by

k^ε_{a→b} ≍ exp(−V_{φ_a}(φ_b)/ε),   k^ε_{b→a} ≍ exp(−V_{φ_b}(φ_a)/ε).   (4)

3. Eq. (4) is a non-equilibrium generalization of the Arrhenius law. It implies that the relative probability to find the system in states φ_a or φ_b under its non-equilibrium invariant distribution is asymptotically given by

P^ε(φ_b)/P^ε(φ_a) ≍ exp(−[V_{φ_a}(φ_b) − V_{φ_b}(φ_a)]/ε).   (5)

As a result, with probability 1 as ε → 0, the system is in state φ_a if V_{φ_a}(φ_b) > V_{φ_b}(φ_a), and in state φ_b otherwise. By analyzing how the quasipotential varies with the system's control parameters, we can thereby identify non-equilibrium phase transitions, which arise when V_{φ_b}(φ_a) = V_{φ_a}(φ_b), and characterize their mechanism; the details of these calculations will be given below. We also refer the reader to Fig. 1 for a graphical illustration, in a toy system, of a non-equilibrium phase transition whose detection requires the formalism above.
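As a numerical illustration of this phase-selection rule, the sketch below (with hypothetical barrier values V_ab and V_ba of our own choosing) shows how the relative occupation of two metastable phases concentrates on the phase with the larger escape barrier as ε → 0:

```python
import numpy as np

# Hypothetical quasipotential barriers: phi_a is harder to escape, so it is the stable phase.
V_ab = 1.0   # barrier to go from phi_a to phi_b
V_ba = 0.8   # barrier to go from phi_b to phi_a

for eps in (0.5, 0.1, 0.02):
    ratio = np.exp(-(V_ab - V_ba) / eps)   # relative occupation P(phi_b)/P(phi_a)
    p_a = 1.0 / (1.0 + ratio)
    print(f"eps = {eps:4.2f}: P(phi_a) = {p_a:.5f}")
```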

B. Hamiltonian formalism
The minimum action principle described in the last section offers a way to study transition events and phase transitions in non-equilibrium systems. Concrete predictions, however, rest on our ability to: (i) derive the Lagrangian used in the action (1), and (ii) minimize this action as needed in (2) and (3).
Like in the equilibrium case, resolving the first issue is again complicated in general and must be handled on a case-by-case basis. When these calculations can be done (see Sec. II for a list of examples), one often deduces that L(φ, φ̇) is given as the Legendre-Fenchel transform of a Hamiltonian H(φ, θ):

L(φ, φ̇) = sup_θ { ⟨θ, φ̇⟩ − H(φ, θ) },   (6)

where ⟨·, ·⟩ denotes the scalar product in T_φM and θ is a field conjugate to φ whose physical meaning will be explained below. The form of the Hamiltonian H(φ, θ) is also problem-dependent, but it is known in some instances, see Sec. II A.

[Fig. 1 caption: The parameters D and h are fixed to 0.5 and −0.1, respectively, while ν is used as a control parameter. The flow of the vector field f for ν = 0.5 is shown in the top panel and for ν = 1.5 in the bottom panel. This flow is non-gradient (i.e. the deterministic dynamics is not a steepest descent over an energy) and has two stable fixed points, φ_a (bottom left) and φ_b (top right), which solve f(x, y) = f(y, x) = 0 and are the possible phases in this toy example. The blue line represents the most probable transition path (i.e. the minimizer of the action) from φ_a to φ_b, and the orange line is the most probable path from φ_b to φ_a. The thickest line indicates the path with the larger rate: φ_a is the stable phase when ν = 0.5 (top panel), while φ_b is the stable phase when ν = 1.5 (bottom panel). The transition paths were calculated with the method developed in this paper and cross-checked using GMAM [29].]
Using (6), the minimization of the action can then be formulated as a min-max problem:

inf_φ sup_θ ∫_0^T [ ⟨θ, φ̇⟩ − H(φ, θ) ] dt,   (7)

where the supremum is taken over all {θ(t)}_{t∈[0,T]} and the infimum over all {φ(t)}_{t∈[0,T]} such that φ(0) = φ_a and φ(T) = φ_b. To get the quasipotential, we must also add an extra minimization over all T > 0 to (7), while other applications may require adding terms to (7) or modifying the boundary conditions of this min-max problem. In most cases, these calculations must be performed numerically. One of the main goals of this paper is to develop robust numerical methods for these computations. These methods aim to be general enough to be applicable to a wide variety of systems that fit the framework above; here we will also use them to solve some non-trivial examples involving spatially-extended systems undergoing non-equilibrium phase transitions.
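In the simplest case of a one-dimensional SDE with unit noise tensor, whose quadratic Hamiltonian is H(x, θ) = b(x)θ + θ² (see Sec. II A), the inner supremum in the min-max problem can be checked explicitly: it is attained at θ = (φ̇ − b)/2 and returns the Freidlin-Wentzell Lagrangian (φ̇ − b)²/4. A quick sanity check, with a bistable drift of our own choosing:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Quadratic Hamiltonian of a 1D SDE with unit noise tensor (sigma = 1):
b = lambda x: x - x**3                 # illustrative bistable drift
H = lambda x, th: b(x) * th + th**2

x, phidot = 0.3, 1.2
res = minimize_scalar(lambda th: -(th * phidot - H(x, th)))

theta_star = (phidot - b(x)) / 2       # analytical maximizer
L_exact = (phidot - b(x))**2 / 4       # Freidlin-Wentzell Lagrangian
print(res.x, theta_star)               # numerical vs analytical maximizer
print(-res.fun, L_exact)               # numerical vs analytical Lagrangian value
```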

C. Related Works
The Euler-Lagrange equations associated with the min-max problem (7) are Hamilton's classical equations:

φ̇ = ∂_θH(φ, θ),   θ̇ = −∂_φH(φ, θ).   (8)

What makes the problem non-standard, however, are the boundary conditions imposed on φ(t) at t = 0 and t = T. The nature of these boundary conditions suggests the use of shooting methods [30], as was proposed e.g. in [31], but such methods scale badly with dimension, or can even be ill-posed for the problems we are interested in, for which the equation for θ in (8) cannot be integrated forward in time. Shooting methods are also hard to use when T = ∞, which typically arises in the calculation of the quasipotential (3). To get around this difficulty, the minimum action method (MAM) proposed in [32] evolves the whole trajectory {φ(t)}_{t∈[0,T]} while keeping φ(0) = φ_a and φ(T) = φ_b fixed. This amounts to performing gradient descent (GD) on the action in the landscape of all authorized paths satisfying these boundary conditions. Introducing the artificial optimization time τ, GD results in the following evolution equation for the path:

∂φ/∂τ = −δS_T[φ]/δφ,   (9)

or, using the Lagrangian formulation of the action,

∂φ/∂τ = (d/dt) ∂L/∂φ̇ − ∂L/∂φ,   (10)

with the same boundary conditions at t = 0, T. The main drawback of MAM is that it involves the Lagrangian L(φ, φ̇) rather than the Hamiltonian H(φ, θ), and there are many problems of interest where the latter is explicitly available but the former is not. Solving (10) then requires one to perform the maximization in the Legendre-Fenchel transform (6) for all t ∈ [0, T] at each iteration step in τ, to numerically get an estimate of the function ϑ(φ, φ̇) such that

L(φ, φ̇) = ⟨ϑ(φ, φ̇), φ̇⟩ − H(φ, ϑ(φ, φ̇)).   (11)

Proceeding similarly, we can also obtain numerical estimates for the derivatives ∂L/∂φ and (d/dt)∂L/∂φ̇ appearing on the right-hand side of (10). This approach was used in [33].
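To make concrete what this repeated Legendre transform involves, here is a sketch for a scalar birth-death Hamiltonian H(θ) = w₊(e^θ − 1) + w₋(e^{−θ} − 1) with hypothetical rates, where the maximizer ϑ happens to be known in closed form and can be compared against a numerical optimization:

```python
import numpy as np
from scipy.optimize import minimize_scalar

w_plus, w_minus, phidot = 2.0, 3.0, 1.5     # hypothetical rates and path velocity

H = lambda th: w_plus * (np.exp(th) - 1.0) + w_minus * (np.exp(-th) - 1.0)

# Numerical estimate of vartheta(phi, phidot) = argmax_theta [theta*phidot - H(theta)]
res = minimize_scalar(lambda th: -(th * phidot - H(th)),
                      bounds=(-10.0, 10.0), method="bounded")

# Closed form: dH/dtheta = phidot gives e^theta = (phidot + sqrt(phidot^2 + 4 w+ w-)) / (2 w+)
vartheta = np.log((phidot + np.sqrt(phidot**2 + 4.0 * w_plus * w_minus)) / (2.0 * w_plus))
print(res.x, vartheta)
```

In a Lagrangian-based MAM this scalar optimization would have to be repeated at every time slice of the path and at every iteration in τ, which is precisely the cost the Hamiltonian formulation avoids.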
The downside is that (10) is a partial differential equation in physical time t, optimization time τ, and possibly space as well when φ and θ are fields; writing efficient numerical solvers for such equations typically requires one to use implicit schemes for numerical stability and/or efficiency, and such schemes are hard to design without explicit knowledge of ϑ(φ, φ̇). This is why here we want to bypass the computation of ϑ(φ, φ̇) and solve the min-max problem in (7) concurrently, by treating the minimization over φ and the maximization over θ on an equal footing.
If we now turn our attention to the quasipotential (3), an extra minimization over all T > 0 must be added to the min-max problem (7). The MAM can be generalized to handle this problem by using a reparametrization of the path {φ(t)}_{t∈[0,T]} by arclength rather than physical time. This formulation leads to the geometric minimum action method (GMAM), in which the minimization over T is performed explicitly beforehand [29]. GMAM was further developed and used in [33,34], and a variant of it was also recently proposed in [35] to compute the quasipotential by a spectral decomposition of paths and an optimization of basis coefficients. However, these methods are Lagrangian based, with the issues discussed before. We will show below how to use the ideas behind GMAM in a Hamiltonian approach and thereby get an efficient method to solve inf_{T>0} inf_φ S_T[φ].
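The key ingredient of the geometric reformulation is the redistribution of the nodes of a discretized path so that they are uniform in normalized arclength, which in practice can be done by interpolation; a minimal sketch:

```python
import numpy as np

def reparametrize(path):
    """Redistribute the nodes of a discretized path uniformly in normalized arclength."""
    seg = np.linalg.norm(np.diff(path, axis=0), axis=1)
    s = np.concatenate(([0.0], np.cumsum(seg)))
    s /= s[-1]                                  # normalized arclength of each node
    s_new = np.linspace(0.0, 1.0, len(path))
    return np.column_stack([np.interp(s_new, s, path[:, k])
                            for k in range(path.shape[1])])

# Example: nodes initially bunched near one end of a quarter circle.
t = np.linspace(0.0, 1.0, 51)**3 * (np.pi / 2)
path = reparametrize(np.column_stack([np.cos(t), np.sin(t)]))
seg = np.linalg.norm(np.diff(path, axis=0), axis=1)
print(seg.max() / seg.min())   # close to 1: nodes are now evenly spaced
```

In GMAM-type schemes this redistribution step alternates with the optimization updates, so that the algorithm works with geometric paths and the physical duration T never appears explicitly.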
The quasipotential is also known to satisfy a stationary Hamilton-Jacobi equation,

H(φ, ∂_φV_{φ_a}(φ)) = 0,   (12)

in the viscosity sense. This equation can be used to deduce some structural properties of the quasipotential. It is a central object in problems that can be tackled through macroscopic fluctuation theory (MFT), and the review [38] provides numerous examples and useful insights. Ref. [39] also discusses how to perturbatively solve this equation in systems that are close to equilibrium. In general, however, (12) needs to be solved numerically, which is nontrivial since it is a complicated partial differential equation (or even a functional equation when φ is a field). In dimension 2 or 3 this can be done globally using fast marching methods like the one discussed in [40]. In dimension higher than 3, these methods become inapplicable, and (12) must be solved locally by the method of characteristics using the variational formulation of this equation: this brings us back to solving the min-max problem (7).

D. Organization and Main Contributions
The remainder of this paper is organized as follows: In Sec. II, we give more details about the minimum action principle that is at the core of our approach. To this end, in Sec. II A we first present typical classes of stochastic dynamical systems that display metastability and non-equilibrium phase transitions, and are amenable to analysis via action minimization. For illustration, we also use our method to compute the transition paths between metastable states in two low-dimensional benchmark models, namely the Maier-Stein model [31] and the Schlögl model [41]. In Sec. II B we discuss the features of the minimum action framework in a general setup, and summarize the main outputs of the approach.
In Sec. III we present our method for solving the min-max problem (7). The scheme is based on performing gradient descent-ascent (GDA) on the objective, which has the advantage that it can be formulated directly in the Hamiltonian setup. The main issue we need to address is that, in our context, the GDA equations are partial differential equations of hyperbolic type in some optimization time τ and physical time t, to be solved with boundary conditions at t = 0 and t = T. In Sec. III A we show that a simple linear change of variables in these equations allows us to reformulate them in a way that is convenient for numerical solution via Strang splitting. In Sec. III B we then generalize this scheme to compute the quasipotential when an extra minimization over T is added to (7). This is done by using a geometric formulation in which the paths {φ, θ}_{t∈[0,T]} are reparametrized using normalized arclength. This allows us to handle the minimization over T efficiently and calculate paths whose duration in physical time is infinite but whose length remains finite.
In Secs. IV and V we then use the methods we propose to analyze two spatially extended nonequilibrium systems that display first-order phase transitions. More specifically, in Sec. IV we study a modified Ginzburg-Landau (GL) dynamics subject to an additive Gaussian white noise. The noiseless evolution of the field is non-gradient with two stable fixed points; the noise makes these points metastable and we must resort to minimum action algorithms to compute the nonequilibrium transition pathways between them. The action along these paths allows us to estimate the relative probability of the metastable states. We use this procedure to compute the phase diagram of the system as a function of two control parameters. In Sec. V, we study a spatially extended version of the Schlögl model, in which the fluctuations are driven both by diffusion of the microscopic molecules and reactions between them. This system displays a first-order non-equilibrium phase transition in terms of the diffusivity of the molecules, which we characterize. We also show that the predictions of the minimum action approach explain the transition events observed in the microscopic system.
Concluding remarks are given in Sec. VI and some technical developments are deferred to several Appendices.

II. PROBLEM SETUP AND INTERPRETATION
The aim of this section is to provide better motivation for the minimum action principle introduced in the introduction and to pinpoint some of its key predictive features. For the reader's convenience we begin by listing a collection of motivating problems where the formalism applies. In Sec. II B we will then put the approach in a broader context and explain how to use it to analyze metastability.

A. A collection of motivating problems
The first two examples a and b involve no coarse-graining from micro-to-macro and are included because they are simple and transparent; the last three examples c, d, and e require one to define proper macroscopic variables to derive the minimum action principle and its Hamiltonian.
a. Diffusion in detailed balance: Consider the motion of a particle x(t) ∈ R^d whose evolution is governed by the overdamped Langevin equation

ẋ = −∇U(x) + √(2kT) η(t),   (13)

where U(x) is some potential, kT is the product of the Boltzmann constant k and the temperature T, and η(t) is a white noise. The dynamics (13) is in detailed balance with respect to the Boltzmann-Gibbs probability density function

ρ(x) = Z⁻¹ e^{−U(x)/kT},   Z = ∫_{R^d} e^{−U(x)/kT} dx.   (14)

If the potential U(x) has multiple local minima, and the temperature kT is much smaller than the barriers between them, (13) displays metastability: the system stays confined for a long time in the well around a minimum of U(x) before finally hopping to another well, where the process repeats. In this example, we can use a WKB expansion to analyze the Fokker-Planck equation associated with (13). The eikonal equation obtained at leading order in kT is a Hamilton-Jacobi equation whose Hamiltonian is given by

H(x, θ) = −⟨∇U(x), θ⟩ + |θ|².   (15)

This Hamiltonian is the one to be used in the min-max problem (7), as can also be proven rigorously using Freidlin-Wentzell LDT [36]. In this example, the quasipotential V_{x_a}(x_b) can be calculated explicitly. If x_a and x_b are the locations of two local minima of U(x) with adjacent wells, the path minimizing the action (3) is the minimum energy path between these two points, and V_{x_a}(x_b) is given by

V_{x_a}(x_b) = U(x_s) − U(x_a),   (16)

where U(x_s) is the energy of the saddle point of minimum height (aka mountain pass) between x_a and x_b. Thus we recover the Arrhenius law for the rate of transition from x_a to x_b.

b. Diffusion out-of-equilibrium: The picture above can be generalized to systems whose evolution is described by the stochastic differential equation

ẋ = b(x) + √(2ε) σ(x) η(t),   (17)

with a drift b(x) that need not be gradient, a noise tensor σ(x), and a small parameter ε > 0.

[Fig. 2 caption: Transition paths in the Maier-Stein model. When ε is small but finite, the two stable fixed points x_a and x_b become metastable, and the noise induces transitions between them: the most likely path from x_a to x_b is the minimum action path, shown as the blue line. Also shown as a red dashed line is the heteroclinic orbit between x_a and x_b: this orbit is different from the minimum action path, indicative of a nonequilibrium transition. Panel b) shows the increment of the action along the minimum action path and the heteroclinic orbit, confirming that the former is more likely.]
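The mountain-pass formula for the detailed-balance case can be checked numerically: in one dimension, the nontrivial zero of the Hamiltonian H(x, θ) = −U′(x)θ + θ² is θ*(x) = U′(x), and integrating it from the minimum reproduces the energy barrier. A sketch for a standard double well (our own illustrative choice of U):

```python
import numpy as np

U  = lambda x: 0.25 * (x**2 - 1.0)**2       # double well: minima at x = ±1, saddle at x = 0
dU = lambda x: x**3 - x

# Nontrivial zero of H(x, theta) = -U'(x) theta + theta^2 is theta*(x) = U'(x);
# the quasipotential is the line integral of theta* from the minimum x_a = -1.
x = np.linspace(-1.0, 0.0, 20001)
theta_star = dU(x)
V = np.sum((theta_star[1:] + theta_star[:-1]) * np.diff(x)) / 2.0   # trapezoidal rule

print(V, U(0.0) - U(-1.0))   # both equal the barrier height 0.25
```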
Here D(x) = (σσᵀ)(x) denotes the diffusion tensor. Metastability is observed with (17) in situations where the noiseless deterministic system ẋ = b(x) has multiple stable fixed points and the noise amplitude is small but finite: the system hovers for long times in the basin around one of these fixed points, but a noise-driven transition to another basin eventually occurs. These transitions can be described by the minimum action principle, using a Hamiltonian which can again be obtained via WKB analysis of the Fokker-Planck equation associated with (17). It is given by

H(x, θ) = ⟨b(x), θ⟩ + ⟨θ, D(x)θ⟩.   (18)

Given two stable fixed points x_a and x_b of ẋ = b(x) with adjacent basins of attraction, it is no longer possible in general to solve (3) analytically and calculate V_{x_a}(x_b); to do so requires numerical tools of the type developed below. Still, we know that the rates of transition from x_a to x_b and from x_b to x_a satisfy respectively

k^ε_{a→b} ≍ exp(−V_{x_a}(x_b)/ε),   k^ε_{b→a} ≍ exp(−V_{x_b}(x_a)/ε),   (19)

and, with probability 1 as ε → 0, the system performs the transition by following the optimal path minimizing (3); for an illustration in the context of the Maier-Stein model [31], see Fig. 2. These results for SDEs with small noise can be made rigorous using the Freidlin-Wentzell theory of large deviations (LDT) [36].

[Fig. 3 caption: The flow lines of the mass-action law are shown in panel a), along with its stable fixed points (black dots) and unstable fixed points (black crosses). When the number of agents is large but finite, these stable fixed points become metastable states, and the most likely paths between them are shown as full lines in blue and orange. Also shown as a dashed red line is the heteroclinic orbit. All three paths differ, indicative of a nonequilibrium transition. Panel b) shows the increment of the action along the two minimum action paths, indicating that the bottom left state is most stable under random fluctuations.]

c. Reaction networks: Consider a well-stirred chemical network between M chemical species, where the quantity of species i is denoted X_i.
Define the population vector X = (X_1, ..., X_M)ᵀ, and assume that there are R reaction channels, in which reaction j changes the state as

X → X + ν_j with rate Ω w_j(X/Ω), j = 1, ..., R,   (20)

where the ν_j are the change (stoichiometric) vectors and the w_j(x) the reaction rates. When the typical number of agents, Ω, tends to infinity, the dynamics of x = X/Ω is captured by the mass-action law

ẋ = Σ_{j=1}^R ν_j w_j(x),   (21)

and the minimum action principle is useful to quantify the effects of fluctuations when the number of agents is large but finite. In particular, metastability arises if (21) has multiple stable fixed points (see Refs. [42,43]), and it can be analyzed using the Hamiltonian

H(x, θ) = Σ_{j=1}^R w_j(x) ( e^{⟨ν_j, θ⟩} − 1 ).   (22)

Here too this Hamiltonian can be obtained rigorously via Freidlin-Wentzell LDT, or formally via WKB analysis of the system's master equation [42] (see also the appendix of [16] for a pedagogical derivation), or via a Doi-Peliti field theory computation [28,44-46]. If x_a and x_b denote two stable fixed points of (21) with adjacent basins of attraction, the rates of transition from x_a to x_b and from x_b to x_a are respectively given by

k^Ω_{a→b} ≍ exp(−Ω V_{x_a}(x_b)),   k^Ω_{b→a} ≍ exp(−Ω V_{x_b}(x_a)),   (23)

and, with probability 1 as Ω → ∞, when the network performs the transition, X/Ω follows the optimal path minimizing (3). This minimization again needs to be performed numerically in general. As an illustrating example, Fig. 3 displays reaction paths in the bistable Schlögl model with two reactive compartments: this model belongs to the class of reaction-diffusion networks that will be properly introduced in Sec. V A. Note that if we reduce the number of compartments to one, the quasipotential of this model can be explicitly obtained from the Hamiltonian, see Fig. 4.
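The one-compartment statement can be made concrete: for a single species, the Hamiltonian reduces to H(x, θ) = w₊(x)(e^θ − 1) + w₋(x)(e^{−θ} − 1), whose nontrivial zero is θ*(x) = ln(w₋(x)/w₊(x)), and the quasipotential follows by quadrature. The rates below are a hypothetical bistable choice (not the Schlögl rates of Sec. V), engineered so that the mass-action law has stable fixed points at x = 0.5 and x = 2 and an unstable one at x = 1:

```python
import numpy as np

# Hypothetical birth-death rates with w_plus - w_minus = -(x - 0.5)(x - 1)(x - 2):
w_plus  = lambda x: 3.5 * x**2 + 1.0
w_minus = lambda x: x**3 + 3.5 * x

def quasipotential(xa, xb, n=100001):
    """V_{xa}(xb) as the integral of theta*(x) = ln(w_minus/w_plus) from xa to xb."""
    x = np.linspace(xa, xb, n)
    th = np.log(w_minus(x) / w_plus(x))
    return np.sum((th[1:] + th[:-1]) * np.diff(x)) / 2.0

barrier_left  = quasipotential(0.5, 1.0)   # cost to escape the left well
barrier_right = quasipotential(2.0, 1.0)   # cost to escape the right well
print(barrier_left, barrier_right)          # both positive escape barriers
```

Note that θ* vanishes exactly at the fixed points, where w₊ = w₋, so the cost of fluctuations is zero precisely where the deterministic dynamics is stationary.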
d. Interacting particle systems: Consider N particles x_i ∈ Λ ⊂ R^d, i = 1, ..., N, that evolve according to

ẋ_i = b(x_i) + (1/N) Σ_{j=1}^N k(x_i, x_j) + σ(x_i) η_i(t),   (24)

where b(x) is a drift as in (17), k(x, y) is some interaction kernel, and the η_i(t) are independent white noises. To analyze such interacting particle systems, it is convenient to introduce the empirical density of the particles, ρ_N(t, x) = N⁻¹ Σ_{i=1}^N δ(x − x_i(t)). As N → ∞, this empirical density converges towards the density ρ(t, x) that satisfies the McKean-Vlasov equation

∂_tρ = −∇·(B[ρ]ρ) + ½ ∇∇ : (Dρ),   (25)

where B[ρ](x) = b(x) + ∫_Λ k(x, y)ρ(t, y) dy and D(x) = (σσᵀ)(x). Large fluctuations away from the mean-field dynamics (25) can be captured by the minimum action principle, by using the Hamiltonian

H(ρ, θ) = ∫_Λ ρ(x) [ ⟨B[ρ](x), ∇θ(x)⟩ + ½ D(x) : ∇∇θ(x) + ½ |σᵀ(x)∇θ(x)|² ] dx.   (26)

This Hamiltonian can again be derived rigorously using LDT [47], and it formally follows from a WKB analysis, now performed on the functional master equation for the empirical particle density.

e. Fast-slow systems: Consider a system made of a pair of variables (x, y) ∈ R^d × R^D whose evolution is governed by

ẋ = f(x, y),   ẏ = α⁻¹ g(x, y) + α^{−1/2} η(t),   (27)

where α > 0 measures the separation of time scales between x and y. For small α, this separation is large and y evolves much faster than x. In particular, when α → 0, the dynamics of x is effectively captured by the deterministic limiting equation

ẋ = F(x),   (28)

where F(x) is obtained by averaging f(x, y) over the stationary distribution of the equation for y at x fixed, assuming that it exists. This equation defines the so-called virtual fast process, which on the fast time scale τ = t/α reads

dy_x/dτ = g(x, y_x) + η(τ).   (29)

Denoting by E_x the expectation over the stationary distribution of y_x(τ), the function F entering (28) is given by

F(x) = E_x f(x, y_x).   (30)

If we want to analyze the effect of fluctuations on the dynamics of x when α is small but finite, we can use the minimum action principle with a Hamiltonian that can be derived from LDT [48-51]:

H(x, θ) = lim_{T→∞} (1/T) log E_x exp( ⟨θ, ∫_0^T f(x, y_x(τ)) dτ⟩ ).   (31)

In general, this Hamiltonian will need to be calculated numerically, which makes slow-fast systems of the type above more difficult to treat than the models previously discussed in this section. Still, provided that we can design some numerical routine to estimate H as well as its derivatives ∂_xH and ∂_θH, the numerical methods presented below are applicable in the context of slow-fast systems too. In models of this type, ε = α.

[Fig. 4 caption: Schlögl model (71) with a single reactive compartment containing a large number of particles Ω. The two black disks show the stable fixed points φ_a (left disk) and φ_b (right disk) of the law of mass action valid when Ω → ∞, and the orange square shows its unstable fixed point at φ_s = 1. The solid and dashed white lines show the level set H(φ, θ) = 0 on which the solutions to Hamilton's equations (8) evolve in the limit when T → ∞. The horizontal dashed line at θ = 0 corresponds to the noiseless dynamics of the mass-action law, by which the system relaxes to one of the two stable fixed points φ_a or φ_b by moving in the direction of the white arrows. The solid white line, to be followed in the direction of the black arrows, shows the solutions of Hamilton's equations on H(φ, θ) = 0 with θ ≠ 0: these trajectories indicate the most probable paths by which the fluctuations can drive the system away from φ_a and φ_b, and induce transitions when reaching φ_s. In this example, the action required to drive the system away from φ_a to some φ ≤ φ_s in its basin of attraction is the area between the solid white line and the dashed line θ = 0, going from φ_a to φ: this area is also the quasipotential V_{φ_a}(φ), which is the viscosity solution of the Hamilton-Jacobi equation (12).]
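Such a numerical routine can be as simple as a Monte Carlo estimate of the long-time limit defining the slow-fast Hamiltonian. The sketch below uses a toy choice of our own, f(x, y) = y with an Ornstein-Uhlenbeck virtual fast process dy = −y dτ + √2 dW, for which the exact answer is H(x, θ) = θ² in the T → ∞ limit:

```python
import numpy as np

rng = np.random.default_rng(0)

# H(x, theta) = lim_{T->inf} (1/T) log E exp(theta * int_0^T y(tau) dtau)
# for the OU fast process dy = -y dtau + sqrt(2) dW; exactly H = theta^2 as T -> inf.
theta, T, dt, n_samples = 0.1, 20.0, 0.01, 20000
n_steps = int(T / dt)

y = rng.standard_normal(n_samples)          # start from the stationary distribution
integral = np.zeros(n_samples)
for _ in range(n_steps):                    # Euler-Maruyama, vectorized over samples
    integral += y * dt
    y += -y * dt + np.sqrt(2.0 * dt) * rng.standard_normal(n_samples)

H_est = np.log(np.mean(np.exp(theta * integral))) / T
print(H_est, theta**2)                      # Monte Carlo estimate vs exact value
```

Derivatives ∂_θH and ∂_xH can be estimated along the same lines, e.g. by finite differences in θ and x, at the cost of extra sampling noise.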

B. General setup
The problems listed in Sec. II A all share a Hamiltonian with the following features:

A1. H(φ, 0) = 0 for all φ;

A2. H(φ, θ) is strictly convex in θ;

A3. H(φ, θ) is twice differentiable in both its arguments.
Assumptions A1 and A2 follow from the fact that, generically, the Hamiltonian can be expressed as a cumulant generating function, i.e. in the form of an expectation generalizing (31):

H(φ, θ) = lim_{T→∞} (1/T) log E exp( ∫_0^T F(φ, θ, y_φ(τ)) dτ ),   (32)

where F is problem-dependent and the expectation is taken over the statistics of some underlying process y_φ conditional on φ being fixed. Assumption A3 is added for simplicity, as it guarantees that Hamilton's equations (8) are well posed. The aim of this section is to discuss at a generic level the meaning we can give to the min-max problem (7), assuming that the Hamiltonian satisfies these assumptions. In particular, we will show that the conjugate field θ appearing in the min-max problem (7) can be generically interpreted as measuring the mean effects of the fluctuations in the system dynamics needed to achieve some rare event, and the minimum of the action as the total cost of these fluctuations, from which the probability of the event can be estimated as well as its mechanism and rate (if we add the minimization over T > 0).

Mean behavior
To begin, notice that, for the interpretation that θ measures the effects of the fluctuations to be consistent, the stochastic system under consideration should, in some appropriate limit in which the fluctuations disappear, satisfy the deterministic evolution equation obtained by setting θ = 0 in Hamilton's equations (8):

φ̇ = ∂_θH(φ, 0).   (33)

For example, returning to the problems mentioned in Sec. II A, (33) reduces to the ODE ẋ = b(x) as ε → 0 in the SDE (17) with Hamiltonian (18); to the law of mass action (21) as Ω → ∞ for the reaction network (20) with Hamiltonian (22); to the McKean-Vlasov equation (25) as N → ∞ for the interacting particle system (24) with Hamiltonian (26); and to (28) as α → 0 in the slow-fast system (27) with Hamiltonian (31). More generally, under our assumptions, the solution of (33) is indeed a special solution with θ(t) = 0 of Hamilton's equations (8), since Assumptions A1 and A3 imply that ∂_φH(φ, 0) = 0 for all φ.

Impact of the fluctuations
At the same time, the solution to the deterministic evolution (33) with φ(0) = φ_a does not satisfy φ(T) = φ_b in general. Therefore, for general boundary conditions φ(0) = φ_a and φ(T) = φ_b, the solution of (8) must have θ(t) ≠ 0; in the minimum action framework this reflects the fact that, in the original system, fluctuations are needed to drive the system's trajectory away from the solution of (33), and the value of θ(t) ≠ 0 allows us to quantify the cost/probability of observing the event φ(T) = φ_b given that φ(0) = φ_a. Specifically, under the strict convexity Assumption A2, we have

⟨θ, ∂_θH(φ, θ)⟩ − H(φ, θ) ≥ 0,   (34)

with equality if and only if θ = 0. Since φ̇ = ∂_θH(φ, θ) along the solution to Hamilton's equations (8), we deduce that along this solution

S_T[φ] = ∫_0^T [ ⟨θ, ∂_θH(φ, θ)⟩ − H(φ, θ) ] dt   (35)

can indeed be interpreted as a cost, which is zero only if θ(t) = 0 (i.e. when the event can occur without fluctuations), and is strictly positive otherwise (i.e. when the event requires fluctuations). For the problems listed in Sec. II A, the form of the integrand ⟨θ, ∂_θH⟩ − H is: |σᵀ(x)θ|² ≥ 0 for the SDE (17); Σ_{j=1}^R w_j(x) [ ⟨ν_j, θ⟩ e^{⟨ν_j,θ⟩} − e^{⟨ν_j,θ⟩} + 1 ] ≥ 0 for the reaction network (20); ½ ∫_{R^d} |σᵀ(x)∇θ(x)|² ρ(x) dx ≥ 0 for the interacting particle system (24); for the slow-fast system (27), the integrand has no explicit expression in general.
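For the reaction-network case, the positivity of each term in the integrand can be verified directly, since it only involves the scalar function g(u) = u e^u − e^u + 1 with u = ⟨ν_j, θ⟩:

```python
import numpy as np

# g(u) = u e^u - e^u + 1 is nonnegative, vanishing only at u = 0 (g(u) ~ u^2/2 there).
u = np.linspace(-5.0, 5.0, 1001)
g = u * np.exp(u) - np.exp(u) + 1.0
print(g.min())   # >= 0 up to floating-point roundoff
```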
In this interpretation, minimizing S T [φ] amounts to minimizing the cost of the fluctuations or, equivalently, finding the most likely fluctuation that drives the event φ(T ) = φ b given that φ(0) = φ a , leading to the asymptotic estimate (2) for the probability of the event.

Long-time limit.
Turning now our attention to the problem on infinite time horizons, its interpretation is easiest if we assume that the noiseless equation (33) has N stable fixed points φ_1, φ_2, ..., φ_N, with basins of attraction under (33) denoted respectively as B_1, ..., B_N. In this case, we can calculate the quasipotentials V_{φ_i}(φ_j) for every φ_i and φ_j with i ≠ j such that these points have adjacent basins: these are defined as

V_{φ_i}(φ_j) = inf_{T>0} inf { S_T[φ] : φ(0) = φ_i, φ(T) = φ_j }.

Consistently, we also set V_{φ_i}(φ_j) = +∞ if B_i and B_j have no common boundary. These quasipotentials quantify the cost of the fluctuations needed to escape φ_i conditional on entering φ_j next, and these costs can be used to deduce the asymptotic rate of these transition events. More precisely, as the parameter ε measuring the amplitude of the macroscopic fluctuations tends to zero, the system dynamics can be approximated by a Markov jump process (MJP) between the metastable states φ_1, ..., φ_N, with rates given asymptotically by k_{i→j} ≍ exp(−V_{φ_i}(φ_j)/ε). Questions about the asymptotic behavior of the system's dynamics can be answered by analyzing this MJP. Since the rates all vanish exponentially at different speeds as ε → 0, this process is quite singular and can be analyzed by the method of decomposition into cycles developed by Freidlin and Wentzell [36] (see also [37]). This method is however rather intricate, and in practice it is often simpler to answer specific questions in the MJP directly (e.g. what is its invariant distribution, or what is the mean first passage time from state i to state j), and take the limit as ε → 0 afterward. Note also that the quasipotentials V_{φ_i}(φ) for i = 1, . . . , N can be used to construct a global nonequilibrium potential V(φ), solution of the Hamilton-Jacobi equation (12) [36,37]; since this construction is not doable in practice for the high-dimensional examples we are interested in, we will not dwell upon it here.
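To illustrate how these exponentially small rates determine the phase diagram, consider the simplest case N = 2, where the stationary distribution of the MJP is available in closed form; the quasipotential values below are arbitrary illustrative numbers:

```python
import numpy as np

# Two-state Markov jump process with Arrhenius-type rates
# k12 ~ exp(-V12/eps) (escape from state 1) and k21 ~ exp(-V21/eps).
# Detailed balance of the chain gives pi = (k21, k12)/(k12 + k21), so
# pi1/pi2 = exp((V12 - V21)/eps): as eps -> 0 all the mass concentrates
# on the state with the LARGER escape quasipotential.
V12, V21 = 1.0, 0.7   # illustrative values, V12 > V21

def stationary(eps):
    k12, k21 = np.exp(-V12 / eps), np.exp(-V21 / eps)
    return np.array([k21, k12]) / (k12 + k21)

for eps in [1.0, 0.3, 0.1, 0.03]:
    print(eps, stationary(eps))

# mass concentrates on state 1 because escaping it costs more (V12 > V21)
assert stationary(0.03)[0] > 0.9999
```

This is precisely the mechanism exploited below: the sign of the difference of the two quasipotentials decides which phase is stable in the small-noise limit.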
We will however look at the minimizers of inf_{T≥0} inf_φ S_T[φ], which exist if the path {φ(t)}_{t∈[0,T]} is reparametrized using arclength rather than physical time, since T → ∞ along any minimizing sequence: these geometric paths give the mechanism of the transition between φ_i and φ_j.
The statements made in this section summarize what can be deduced from the minimum action principle when it applies. As stated earlier, these results can be proven rigorously in specific cases using e.g. tools from large deviation theory (LDT), in the small noise [36] and weak interaction limits [47], or in more complicated setups involving hydrodynamic limits [52,53]. We can however envision situations beyond the realm of LDT where min-max problems like (7), with a Hamiltonian H satisfying Assumptions A1, A2, and A3, arise (e.g. from the Martin-Siggia-Rose-Janssen-De Dominicis [27] or the Doi-Peliti formalism [28], or from macroscopic fluctuation theory [38,54]), and can be interpreted as above. The methods that we propose below are general-purpose and were designed with such situations in mind.

III. COMPUTATIONAL ASPECTS
A. Min-max on the action as a wave propagation problem

In this section, we discuss how to solve the min-max problem stated in (7), where the optimization is to be performed over trajectories {φ(t), θ(t)}_{t∈[0,T]} of the functional I_T(φ, θ) defined in (38). If the functional I_T is convex with respect to φ and concave with respect to θ, then it is well-known [55,56] that this min-max problem can be solved by amortizing the minimization and maximization over small alternating steps of steepest descent in φ and steepest ascent in θ. If these steps are infinitesimal in some artificial optimization time τ, this gradient descent ascent (GDA) method leads to the evolution equation (39), where for convenience we have introduced a parameter α > 0 that sets the relative time scales over which φ and θ evolve; in the jargon of GDA this is referred to as two-timescale GDA [57].
Calculating the functional derivatives, the system (39) is explicitly given by (40). These equations for {φ(τ, t), θ(τ, t)} are to be solved with the boundary conditions (in physical time t) φ(τ, 0) = φ_a and φ(τ, T) = φ_b, for some initial conditions (in optimization time τ). It is easy to see that the fixed points (in τ) of (40) are solutions to Hamilton's equations (8) that satisfy φ(0) = φ_a and φ(T) = φ_b. In Appendix A we show that: (i) there is a one-to-one correspondence between the fixed points of (40) and the critical points of the action S_T[φ], and (ii) if α is small enough these fixed points are stable if and only if they are local minimizers of the action. Thus solving (40) is indeed a way to perform inf_φ S_T[φ]. For illustrative purposes, we derive in Appendix B how the GDA converges to the instanton for an Ornstein-Uhlenbeck process.
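For the Ornstein-Uhlenbeck case the instanton can be written in closed form, which provides a useful benchmark for any GDA implementation. A minimal sketch, under our own normalization dx = −x dt + √(2ε) dW so that H(x, θ) = −xθ + θ²:

```python
import numpy as np

# Instanton of the OU process between x(0) = x_a and x(T) = x_b.
# With H(x, th) = -x*th + th**2, Hamilton's equations
#   x' = dH/dth = -x + 2*th,   th' = -dH/dx = th
# are solved by th(t) = th0*e^t and x(t) = A*e^{-t} + th0*e^t.
T, x_a, x_b = 5.0, 0.1, 1.0
th0 = (x_b - x_a * np.exp(-T)) / (np.exp(T) - np.exp(-T))
A = x_a - th0

t = np.linspace(0.0, T, 20001)
theta = th0 * np.exp(t)
x = A * np.exp(-t) + theta

# check Hamilton's equations by finite differences
xdot = np.gradient(x, t)
assert np.max(np.abs(xdot - (-x + 2.0 * theta))) < 1e-2

# along the instanton the Lagrangian th*xdot - H reduces to th^2
f = theta**2
S = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))   # trapezoidal quadrature
S_exact = 0.5 * th0**2 * (np.exp(2.0 * T) - 1.0)
assert abs(S - S_exact) < 1e-4

# as T -> infinity, S tends to (x_b^2 - x_a^2)/2, the quasipotential difference
print(S, 0.5 * (x_b**2 - x_a**2))
```

The last line illustrates the long-time limit discussed in Sec. II: the finite-T action is already close to the quasipotential difference for moderate T.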
Let us now show how to put (40) in a form that is convenient for numerical integration. Since (40) is a hyperbolic system of partial differential equations (PDEs), it is useful to diagonalize the problem and introduce the fields u = φ + αθ and v = φ − αθ, which propagate along characteristics and verify (43) and (44), where we have defined the functions f(u, v) and g(u, v). The boundary conditions (47) and (48) now only involve the propagating fields. This formulation shows that the system made of (43) and (44) is well-posed under these boundary conditions (see [58]), since the fields v and u propagate respectively forward and backward in physical time t as the optimization time τ increases. It also immediately suggests an algorithm to solve Eqs. (43) and (44) based on Strang splitting [59]: to update the fields at every iteration step in τ, first update u at fixed v by propagating the final condition at t = T for u in (48) towards t = 0 using (43) with forward differentiation in t, then update v at fixed u by propagating the initial condition at t = 0 for v in (47) towards t = T using (44) with backward differentiation in t.
Of course, the convergence does not change if the algorithm starts by updating v before updating u.
In practice, the continuous paths φ(τ, t) and θ(τ, t) are discretized in physical time on M + 1 points with index i ∈ I = {0, · · · , M} such that T = M∆t, and we use the index n ∈ ℕ₀ to encode the evolution of the paths in optimization time τ using steps of size ∆τ, so that for any field ψ(τ, t), ψᵢⁿ ≡ ψ(n∆τ, i∆t). The details are given in Algorithm 1. The stability of the code relies on two important features. First, advection of the fields is treated with an implicit upwind scheme for both v and u. Second, the reaction terms g and f are also evaluated on an upwind grid point with respect to the direction of advection. The reaction terms could also be evaluated on site i, but we empirically found that the upwind implementation strongly stabilizes the code when dealing with spatially extended diffusive fields. The stability analysis of the numerical scheme is detailed in Appendix C.
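To make the structure of Algorithm 1 concrete, here is a minimal self-contained sketch for the Ornstein-Uhlenbeck Hamiltonian H(x, θ) = −xθ + θ², for which the exact instanton is known. The prefactors of the GDA flow in the u-v variables and all parameter values are our own reconstruction, not the authors' settings:

```python
import numpy as np

# GDA for the action of dx = -x dt + sqrt(2*eps) dW, written in the
# characteristic fields u = x + alpha*th (propagates backward in t) and
# v = x - alpha*th (propagates forward), with an implicit upwind advection
# step and upwind-evaluated reaction terms, as in Algorithm 1.
T, M = 5.0, 200
dt = T / M
alpha, dtau, n_iter = 0.5, 0.02, 6000
x_a, x_b = 0.1, 1.0                  # boundary conditions x(0), x(T)
c = alpha * dtau / dt                # Courant number of the advection step

def reaction(u, v):
    """Reaction terms alpha^2*H_x -/+ alpha*H_th for H = -x*th + th^2."""
    x, th = 0.5 * (u + v), 0.5 * (u - v) / alpha
    Hx, Hth = -th, -x + 2.0 * th
    return alpha**2 * Hx - alpha * Hth, alpha**2 * Hx + alpha * Hth

u = np.linspace(x_a, x_b, M + 1)     # initialize with th = 0, x linear in t
v = u.copy()
for _ in range(n_iter):
    f_u, f_v = reaction(u, v)
    u[M] = 2.0 * x_b - v[M]          # final condition only involves u
    for i in range(M - 1, -1, -1):   # implicit upwind sweep, i = M-1 .. 0
        u[i] = (u[i] + c * u[i + 1] + dtau * f_u[i + 1]) / (1.0 + c)
    v[0] = 2.0 * x_a - u[0]          # initial condition only involves v
    for i in range(1, M + 1):        # implicit upwind sweep, i = 1 .. M
        v[i] = (v[i] + c * v[i - 1] + dtau * f_v[i - 1]) / (1.0 + c)

x, th = 0.5 * (u + v), 0.5 * (u - v) / alpha
S = np.sum(th[:-1] * np.diff(x) - (-x[:-1] * th[:-1] + th[:-1]**2) * dt)

# exact instanton: th(t) = th0*e^t, S_T = th0^2 * (e^{2T} - 1)/2
th0 = (x_b - x_a * np.exp(-T)) / (np.exp(T) - np.exp(-T))
S_exact = 0.5 * th0**2 * (np.exp(2.0 * T) - 1.0)
print(S, S_exact)                    # agree up to O(dt) discretization error

assert abs(x[0] - x_a) < 1e-6 and abs(x[-1] - x_b) < 1e-3
assert abs(S - S_exact) < 0.1
```

At a fixed point of the sweeps, the discrete relations are first-order finite-difference versions of Hamilton's equations with the prescribed boundary conditions, which is why the converged action approaches the exact value as ∆t → 0.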
We should also emphasize that our interest resides in the fixed point of the dynamics, which solves Hamilton's equations in physical time t, rather than in the details of the dynamics in algorithmic time τ, which has no physical relevance. This consideration enjoins us to look for the largest time-step ∆τ that still provides a converging algorithm. The time-step ∆t, however, crucially needs to remain smaller than some characteristic time t_c needed to correctly resolve the dynamics of the instanton. This issue is discussed in more detail in Appendix C.
Finally, it is important to mention that a higher-order finite difference stencil for the advection of the fields can be implemented while keeping the same algorithmic complexity. Such a scheme can significantly improve computation time, since we need a smaller number of grid points to get the same accuracy as the first-order scheme, which was introduced in the text for simplicity. The second-order scheme is presented in Appendix D.

B. Geometric formulation on unbounded time intervals
Let us now turn to the min-max problem (49), where I_T(φ, θ) is the functional defined in (38). In general this problem has no optimizer: minimizing sequences can lower the value of the inner min-max by increasing T. This lack of optimizer complicates the solution of (49). To proceed, it is useful to follow the strategy of the geometric minimum action method (GMAM) in [29], and parametrize the physical time as t(s) for s ∈ [0, 1]. Writing dt/ds = λ⁻¹(s), and introducing (φ̃(s), θ̃(s)) = (φ(t(s)), θ(t(s))), the minimization over T in (49) can be turned into a minimization over λ, see (50), where φ̃′ = dφ̃/ds and the min-max is to be performed over the triple {φ̃(s), θ̃(s), λ(s)}_{s∈[0,1]} subject to the boundary conditions φ̃(0) = φ_a, φ̃(1) = φ_b. Since we have added degrees of freedom by representing T by the function λ(s), we can add a constraint on the parametrization of φ̃(s), e.g. by imposing that |φ̃′| = cst (in s), in which case s is normalized arclength along the path φ̃. This choice is convenient numerically as it guarantees that discretization points in s will be uniformly distributed along the path.
The min-max problem (50) is now in a form that can be solved using GDA, similar to what we did for (37). As shown in Appendix E, it is however convenient to treat λ separately, and this leads to the GDA equations (51) (compare (40)), with λ given by (52). Eqs. (51) are to be solved under the constraint that |∂_s φ̃| = cst (in s, not τ), with the boundary conditions (in s) φ̃(τ, 0) = φ_a and φ̃(τ, 1) = φ_b, for some initial conditions (in optimization time τ) with φ̃⁰(s) such that φ̃⁰(0) = φ_a and φ̃⁰(1) = φ_b. Similar to what we did with (40), the system of equations (51), as well as the expression (52) for λ, can be put in a form suitable for numerical solution by rewriting them in terms of the fields u = φ̃ + αθ̃ and v = φ̃ − αθ̃. For brevity we will not write these equations explicitly and refer the reader to Algorithm 2 for their discretized version. At convergence, φ̃ and θ̃ satisfy (53). If we insert the first of these equations in (52), we can reorganize this equation into (54). This equation shows that at convergence we must also have H(φ̃(s), θ̃(s)) = 0 for all s ∈ [0, 1].
This is consistent with the fact that the term involving λ⁻¹H in (50) can also be interpreted as a Lagrange multiplier term added to the objective function to enforce the constraint that H = 0; see Ref. [29] for details. In practice the continuous paths φ̃(τ, s) and θ̃(τ, s) are discretized in s on a grid of M + 1 points with index i ∈ I = {0, · · · , M} such that 1 = M∆s, and we use the index n ∈ ℕ₀ to encode their evolution with step size ∆τ in artificial time τ, so that for any field ψ(τ, s), ψᵢⁿ ≡ ψ(n∆τ, i∆s). The details are given in Algorithm 2, which is a modified version of Algorithm 1 that includes the step of reparametrization of the path. Note that we have kept the implicit upwind discretization for the advection term. Also, the reaction terms f and g and the coefficient λ should be evaluated at the same grid point, since they originate from the same term λ⁻¹H(φ̃, θ̃) in (50). We found that evaluating these terms at the target grid point i is better for stability in general.
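The reparametrization step that keeps |∂_s φ̃| constant can be implemented by measuring the cumulative arclength of the current discrete path and re-interpolating it on a uniform grid. A minimal sketch for a path in ℝ^d (the linear interpolation scheme is our choice; Algorithm 2 may differ in details):

```python
import numpy as np

def reparametrize(phi):
    """Redistribute the M+1 points of a discrete path phi (shape (M+1, d))
    so that they are uniformly spaced in arclength (linear interpolation)."""
    seg = np.linalg.norm(np.diff(phi, axis=0), axis=1)   # segment lengths
    s = np.concatenate(([0.0], np.cumsum(seg)))          # cumulative arclength
    s /= s[-1]                                           # normalize to [0, 1]
    s_uniform = np.linspace(0.0, 1.0, phi.shape[0])
    # interpolate each coordinate onto the uniform arclength grid
    return np.column_stack([np.interp(s_uniform, s, phi[:, k])
                            for k in range(phi.shape[1])])

# example: points bunched near one end of a quarter circle
t = np.linspace(0.0, 1.0, 41) ** 3                       # nonuniform parameter
path = np.column_stack([np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)])
new_path = reparametrize(path)

seg = np.linalg.norm(np.diff(new_path, axis=0), axis=1)
assert new_path.shape == path.shape
assert np.allclose(new_path[0], path[0]) and np.allclose(new_path[-1], path[-1])
assert seg.max() / seg.min() < 1.02                      # |dphi/ds| nearly constant
```

Interleaving such a step with the GDA sweeps keeps the discretization points uniformly distributed along the path, which is the property exploited in Algorithm 2.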
Algorithms 1 and 2 are simple to implement and only require evaluating the first derivatives of the Hamiltonian H(φ, θ) with respect to its arguments, just as we would have to do to solve Hamilton's equations (8). They were used to calculate the paths shown in Figs. 1, 2, and 3. In the more complicated examples we treat below, because the state variable is a field that depends on space as well as time, and as a result the PDEs (43) and (44) involve spatial derivatives too, some additional consideration must be given to the way we discretize space and evaluate these derivatives to ensure that the resulting scheme is numerically stable. As usual with PDEs, this issue needs to be addressed on a case-by-case basis, depending on the nature of the PDE.

IV. FIRST-ORDER PHASE TRANSITIONS IN A NONEQUILIBRIUM GINZBURG-LANDAU (GL) SYSTEM
A possible starting point of the method borrows from Landau's theory of phase transitions where, relying on the symmetries of the system, we postulate the free energy of the macroscopic variable of interest φ rather than deriving it from a microscopic distribution [60][61][62][63]. When the system is in equilibrium, this procedure is well understood [64,65]. For nonequilibrium systems, macroscopic fluctuation theory (MFT) [38,54,66] offers a generalization of Landau's approach where one starts directly from the action. In what follows, we apply a similar approach to study phase transitions in a modified Ginzburg-Landau system.

A. Nonequilibrium GL dynamics
Following the general framework introduced in [67,68], the stochastic evolution of a non-conservative field ρ(t, x) can formally be described by the Langevin equation (58), where the drift µ([ρ], x) can be interpreted as a chemical potential, η(t, x) is a standard Gaussian white noise in space and time whose amplitude is measured by ε > 0, and where for simplicity we have set the mobility to 1. For a field ρ defined on a one-dimensional domain, say [0, 1], the action corresponding to this equation is (59), and it will be the subject of our investigations. The dynamics (58) is in detailed balance when µ = µ_E, with µ_E such that it can be written as the functional derivative of a free energy functional F[ρ], see (60). In this case, at equilibrium, the field configurations are distributed according to the Gibbs measure associated with the free energy F, at temperature ε. Here we will be mostly interested in the active version of dynamics (58), when µ = µ_E + µ_A with µ_A that cannot be cast into the form (60) or, equivalently, does not satisfy the functional Schwarz relation (61) [12,16]. When µ_A ≠ 0, the stationary distribution of the field configurations, if it exists, is a nonequilibrium distribution which is not available in closed form. For concreteness, we will focus on the following example that displays a nonequilibrium first-order phase transition: we assume that the field ρ(t, x) is one-dimensional, with x ∈ [0, 1] and periodic boundary conditions, and we use the chemical potentials given by (62) and (63), such that (58) becomes

∂_t ρ = D ∂²_x ρ + ρ − ρ³ + h + κ ∫₀¹ ρ² dx + √(2ε) η.

Here D > 0 is a diffusion constant which effectively depends on the system size in the dimensionless variables we use (reducing D is equivalent to enlarging the domain size), h is the strength of an externally applied field, and κ is the strength of the nonequilibrium coupling. The chemical potential µ_E is borrowed from the Ginzburg-Landau φ⁴ theory, often referred to as Model A [64].
A direct functional integration shows that µ_E = δF/δρ with

F[ρ] = ∫₀¹ ( (D/2)|∂_x ρ|² − ρ²/2 + ρ⁴/4 − hρ ) dx.

The chemical potential µ_A was chosen because the phenomenology that it brings remains straightforward without being trivial. Specifically, µ_A acts as an additional uniform applied field that is active, i.e. it depends nonlinearly on the value of ρ rather than being externally imposed. While µ_E drives the system towards the minimizers of the energy (65), which are homogeneous states solving ρ − ρ³ + h = 0, µ_A homogeneously pushes the field upward when κ > 0 and downward when κ < 0. Therefore, the applied field h and µ_A have competing effects when h and κ have opposite signs. Since there is a region in the (κ, h) plane where the noiseless dynamics has two stable fixed points (see Sec. IV B), the system can undergo a nonequilibrium first-order phase transition at critical values of h and κ, which we will determine in Sec. IV C.

B. Phase boundaries for coexisting homogeneous fixed points
Numerical evidence indicates that the stable fixed points of the noiseless dynamics (i.e. Eq. (64) with ε = 0) are homogeneous states. As a result, they are solutions of ρ − ρ³ + h + κρ² = 0. In the domain where this equation has three real roots, ρ−, ρ_c and ρ+ with ρ− < ρ_c < ρ+, the states ρ− and ρ+ are stable fixed points of the noiseless dynamics whereas ρ_c is an unstable one. The coexistence region in the parameter space (κ, h) where both ρ− and ρ+ are present is marked as region (ii) in Fig. 5; its two boundaries are shown as a blue line and a yellow line in Fig. 5. Exactly on these boundaries, only two real roots coexist, and one state is thus marginally stable. In regions (i) and (iii) only one stable state exists. Our next goal will be to analyze the relative stability of ρ− and ρ+ under the effect of the noise, i.e. to derive the phase diagram of the system. Even though we lack a free energy that yields the stationary measure, we expect ρ+ to be the stable phase for h, κ > 0 and ρ− for h, κ < 0. However, when h and κ have opposite signs (and thus opposing effects on ρ), determining the most likely phase becomes nontrivial.
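The coexistence region can be traced numerically from the roots of the cubic ρ − ρ³ + h + κρ². A minimal sketch (grid resolution and test values are our own choices):

```python
import numpy as np

def homogeneous_fixed_points(kappa, h):
    """Real roots of rho - rho^3 + h + kappa*rho^2 = 0, i.e. the homogeneous
    fixed points of the noiseless dynamics, sorted in increasing order."""
    roots = np.roots([-1.0, kappa, 1.0, h])   # -rho^3 + kappa*rho^2 + rho + h
    real = roots[np.abs(roots.imag) < 1e-9].real
    return np.sort(real)

def is_bistable(kappa, h):
    """True inside region (ii): three real roots, outer ones linearly stable."""
    r = homogeneous_fixed_points(kappa, h)
    if len(r) < 3:
        return False
    drift_slope = lambda rho: 1.0 - 3.0 * rho**2 + 2.0 * kappa * rho
    return drift_slope(r[0]) < 0 < drift_slope(r[1]) and drift_slope(r[2]) < 0

assert is_bistable(0.0, 0.0)       # roots -1, 0, 1 at kappa = h = 0
assert not is_bistable(0.0, 1.0)   # large |h| destroys coexistence
```

Scanning `is_bistable` over a (κ, h) grid reproduces the boundaries of region (ii); the relative stability of ρ− and ρ+ inside that region, however, requires the action minimization described next.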

C. Extracting the minimum action paths
To assess whether ρ+ or ρ− is the stable phase in the coexistence region, we will compute the difference of the minimal actions ∆S ≡ V_{ρ−}(ρ+) − V_{ρ+}(ρ−), obtained from the minimum action paths from ρ− to ρ+ and vice-versa. The Hamiltonian entering the action is the one associated with Eq. (58), see (66), where the scalar product of two functions f and g is given by ⟨f, g⟩ = ∫₀¹ f(x) g(x) dx. To obtain the phase diagram, these calculations must be repeated for a set of values (κ, h) in the coexistence region to compute the minimal action difference as a function of these parameters, ∆S(κ, h). From Eq. (5), the line of phase transition is then the curve where ∆S(κ, h) = 0.
In practice we use Algorithm 2 with M = 400 copies along the path, N_x = 64 points of space discretization, ∆τ = 10⁻³, and α = 0.33, and we monitor convergence by looking at the decay of the action. The result of these computations is shown in Fig. 5, where the purple dashed line is the frontier in the phase diagram corresponding to ∆S(κ, h) = 0. The stable phase is ρ− below the line (region (ii)a), and ρ+ above it (region (ii)b).
For comparison, we also compute the phase diagram under the (wrong) assumption that the escape paths are given by the heteroclinic orbits followed in a time-reversed way. This would have to be the case in equilibrium by time-reversal symmetry. While these escape paths are incorrect in general in nonequilibrium systems, their respective costs in the action give upper bounds on the actual minima V_{ρ−}(ρ+) and V_{ρ+}(ρ−). Denoting by S^het_{ρ−}(ρ+) and S^het_{ρ+}(ρ−) the actions along the heteroclinic orbit, we compute for instance S^het_{ρ−}(ρ+) by evaluating the action along this orbit, where the path ρ̃(s) and the parametrization λ(s) (s ∈ [0, 1]) have been obtained by the string method [69,70], which identifies the heteroclinic orbit between ρ̃(0) = ρ− and ρ̃(1) = ρ+, and θ̃ solves λ∂_s ρ̃ = ∂_θ H(ρ̃, θ̃).
As a sanity check, we verified that one always has V_{ρ−}(ρ+) ≤ S^het_{ρ−}(ρ+) and V_{ρ+}(ρ−) ≤ S^het_{ρ+}(ρ−), namely, that the minimum of the action is always smaller than the action along the heteroclinic orbit. It is also worth noticing that these bounds offer no information about the location of the phase transition line: the line where ∆S^het(κ, h) = 0 is plotted as the grey dashed line in Fig. 5. The minimum action paths also give physical insight into the mechanism of the transition: the contour plots in (s, x) space of these paths are shown in Fig. 6 for a specific value of (κ, h). As can be seen in Fig. 6, the forward and the backward minimum action paths are different, and they cross the separatrix (marked as a dashed black line in the figure) at different places: that is, the critical nuclei for the forward and backward transitions are different (for the backward path this 'critical nucleus' is actually flat). This is a signature of time-symmetry breaking that can be intuitively explained as follows. When κ > 0, as in Fig. 6, the nonequilibrium term κ ∫ρ² dx favors the movement from ρ− to ρ+ but opposes the one from ρ+ to ρ−. In the forward path from ρ− to ρ+, it is better to have ∫ρ² dx large, which favors nucleation; conversely, in the backward path from ρ+ to ρ−, it is better to have ∫ρ² dx small, and the way to minimize this quantity given the value of its changing mean is to have ρ spatially uniform. Notice however that this is a finite-size effect: if D were decreased to even smaller values, we would observe nucleation events in both directions (albeit different ones in each).
Note also that the interesting part of these minimum action paths is their escape half, where the noise is needed and the action increment is therefore positive: it is the first half for the forward path in panel (a) and the second half for the backward path in panel (b); for the heteroclinic orbit shown in panel (c), we display both halves at once since the forward and reversed paths are symmetric. In all situations, past the critical nucleus, the paths simply follow the noiseless dynamics, and this half of the path may not be unique if the critical nucleus has more than one unstable direction. This non-uniqueness has no impact on the action however, since the Lagrangian is zero along the solution of the noiseless dynamics.
The effective nonequilibrium Ginzburg-Landau-like dynamics we have considered in this section can be modified to include different nonequilibrium terms, like for example µ A = κ|∂ x ρ| 2 which naturally appears in the coarse-grained field description of interface growth phenomena [71] and active matter systems [15,67,72,73]. We could also use our approach to compute the phase diagram of modified Cahn-Hilliard systems, which naturally emerge in active matter field theories [11,33,67,68,73,74].

V. PHASE TRANSITIONS IN A BISTABLE REACTION-DIFFUSION SYSTEM
A. The Schlögl model

In 1972, Schlögl introduced the following chemical reaction network [41],

A + 2X ⇌ 3X (rates k_2, k_3),    B ⇌ X (rates k_0, k_1),

with microscopic rates k_i > 0, and where the concentrations of A and B are held constant. In a certain regime of the reaction rates, this system displays metastability between a low-density and a high-density phase. Here we will consider a spatially extended variant of this model introduced by Tănase-Nicola and Lubensky [43], in which a one-dimensional domain is split into L ∈ ℕ well-stirred compartments: the molecules only react within their compartment, and randomly jump to neighboring ones with rate γ > 0. We also impose periodic boundary conditions. When the number n_i of molecules in compartment i ∈ {1, . . . , L} is large, it is convenient to introduce the rescaled density ρ_i = n_i/Ω, where Ω ≫ 1 is the typical number of molecules per compartment. In the limit as Ω → ∞, the law of mass action for ρ_i is a (discrete) reaction-diffusion equation,

ρ̇_i = w_+(ρ_i) − w_−(ρ_i) + γ(ρ_{i+1} − 2ρ_i + ρ_{i−1}),

where w_+(ρ_i) = λ_0 + λ_2 ρ_i² and w_−(ρ_i) = λ_1 ρ_i + λ_3 ρ_i³ are the rescaled reaction rates with λ_i = k_i Ω^{i−1} (see App. G). Equation (72) can be written as a gradient flow ρ̇ = −∇E(ρ), where ρ = (ρ_1, . . . , ρ_L) and we introduced E(ρ) = Σ_i U(ρ_i) + (γ/2) Σ_i (ρ_{i+1} − ρ_i)², with U′(ρ) = w_−(ρ) − w_+(ρ). We recognize a (discrete) Ginzburg-Landau free energy, from which we conclude that the stable fixed points of (72) are homogeneous states as long as γ is not too small. For the values of (λ_0, λ_1, λ_2, λ_3) that we will consider here, there are two such fixed points, ρ_i = ρ_± for all i ∈ {1, . . . , L}, where ρ− < ρ+ are the smallest and largest roots of w_+(z) = w_−(z). The gradient structure of (73) may suggest that the stable phase is simply the one with the lower value of E. This conclusion is however incorrect, as already observed by Tănase-Nicola and Lubensky [43], who showed that the system undergoes a nonequilibrium first-order phase transition when γ changes.
Since E(ρ_±) = L U(ρ_±) and is therefore independent of γ, the phase transition cannot be predicted by analyzing E(ρ) only: the system is not in detailed balance with respect to the Gibbs measure associated to E(ρ). What was not determined in [43] is the critical value γ_c at which the phase transition occurs. This is the question we address next.

[Fig. 7 caption: The parameters λ_i are fixed and we vary the jump rate γ. For γ < γ_c = 5 ± 0.1, we have V_{ρ+}(ρ−) > V_{ρ−}(ρ+), which indicates that ρ+ is the stable phase, whereas ρ− is stable for γ > γ_c. We use the parameters from [33]: λ_0 = 0.8, λ_1 = 2.9, λ_2 = 3.1, λ_3 = 1, and took L = 40.]
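For the rates used below, the fixed points of the well-mixed dynamics, i.e. the roots of w_+(ρ) = w_−(ρ), can be checked directly (a sketch using the parameter values quoted from [33]):

```python
import numpy as np

# Rescaled Schlogl rates: w+(rho) = l0 + l2*rho^2, w-(rho) = l1*rho + l3*rho^3
l0, l1, l2, l3 = 0.8, 2.9, 3.1, 1.0

# fixed points solve l0 + l2*rho^2 - l1*rho - l3*rho^3 = 0
roots = np.sort(np.roots([-l3, l2, -l1, l0]).real)
print(roots)  # approximately [0.5, 1.0, 1.6]

assert np.allclose(roots, [0.5, 1.0, 1.6])
# rho- = 0.5 and rho+ = 1.6 are stable: d(w+ - w-)/drho < 0 there
slope = lambda r: 2.0 * l2 * r - l1 - 3.0 * l3 * r**2
assert slope(0.5) < 0 and slope(1.6) < 0 and slope(1.0) > 0
```

This confirms the values ρ− = 0.5, ρ_s = 1 and ρ+ = 1.6 used in the figures, with the middle root ρ_s playing the role of the saddle between the two phases.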

B. Change of relative stability with increasing jumping rate
The spatially extended Schlögl model is a reaction network of the type considered in Sec. II A, and its phase diagram can be analyzed by minimizing the action associated with a Hamiltonian similar to (22). Using the structure of the model, it is natural to decompose H = H_R + H_D, with H_R accounting for the reactions and H_D for the jumps, see (76) and (77). We use this Hamiltonian in the action that we minimize using Algorithm 2 to calculate the quasipotentials V_{ρ−}(ρ+) and V_{ρ+}(ρ−). We repeat these calculations for different values of the jump rate γ while keeping the rates fixed at (λ_0, λ_1, λ_2, λ_3) = (0.8, 2.9, 3.1, 1), for which ρ− = 0.5 and ρ+ = 1.6. We use L = 40 compartments, and in Algorithm 2 we set M = 400 (∆s = 2.5 × 10⁻³), α = 1, and ∆τ = 0.01. The graphs of V_{ρ−}(ρ+) and V_{ρ+}(ρ−) versus γ are shown in Fig. 7. These results indicate that the nonequilibrium first-order phase transition occurs at γ_c ≈ 5: ρ+ is the stable phase for γ < γ_c, while ρ− is the stable one for γ > γ_c. We stress again that this result cannot be deduced by looking at E(ρ), even though the law of mass action can be written as the gradient flow (73). This is of course not a contradiction: the Schlögl model is not in detailed balance and lacks time-reversal symmetry, because this property also depends on the nature of the noise. The nonequilibrium nature of the phase transition can be confirmed by looking at the transition paths from ρ− to ρ+ and vice-versa. They are shown in Fig. 8, in which we use the same plotting conventions as in Fig. 6. These results are for γ = 12, when ρ− is the stable phase. We can see that the forward (panel a) and the backward (panel b) paths are different and go through different critical nuclei (marked as a dashed vertical black line on the graphs). These paths are also different from the heteroclinic orbit (panel c). If we increase the value of γ, the forward path eventually becomes homogeneous too (results not shown): this is consistent with the fact that at high γ the system behaves essentially as one single well-stirred compartment. In that limit we can calculate the nonequilibrium steady distribution of the system and use it to calculate V_{ρ−}(ρ+) and V_{ρ+}(ρ−): this is done in App. G and gives the same values as the ones obtained by Algorithm 2 when γ is large and the transition paths are both homogeneous. Conversely, if we decrease the value of γ, the backward path becomes inhomogeneous too (results not shown). This transition from homogeneous to inhomogeneous backward path occurs around γ = 7. Gaining intuition about these changes of behavior is harder in this model than in the GL model of Sec. IV because the noise is non-Gaussian, and ultimately the shape of the minimum action paths depends on a complex interplay between many effects in the dynamics.

[Fig. 8 caption: For these parameter values, ρ_s = 1 and ρ+ = 1.6. The figure is organized as Fig. 6. As spatial coordinate, we use x = i/L and plot the transition paths and associated momenta as if they were continuous in space.]

C. Comparison with microscopic simulations

For the parameter values used in Sec. V B, the phase diagram shown in Fig. 7 cannot be obtained by direct Monte Carlo simulation of the microscopic system: Ω needs to be large in order for the phases ρ− and ρ+ to be (meta)stable under the noise, and the timescales of the forward and backward transitions between the phases ρ− and ρ+ are of the order of the nucleation times e^{ΩV_{ρ−}(ρ+)} and e^{ΩV_{ρ+}(ρ−)}. These timescales are too long to be accessible with MCMC.
We can however perform two types of experiments to test the results of the minimum action principle. First, we can check that MCMC simulations of the microscopic system initiated with some nonuniform profile in the compartments behave as predicted by Eq. (72). The results of these simulations are shown in Fig. 9, where we compare the evolution of a step profile: the microscopic system follows the deterministic dynamics (72), as expected. To do these calculations, we fix Ω = 1.6 × 10⁵, set the number of molecules in each compartment i ∈ {1, . . . , L} to be n_i = ρ_i Ω for all i, and simulate the microscopic dynamics exactly with the Gillespie algorithm [75,76] using the microscopic rates k_i = λ_i Ω^{1−i}.
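The exact microscopic simulation can be sketched for a single well-stirred compartment as follows (our own minimal implementation of the standard Gillespie algorithm; the propensities follow the rescaled rates, and Ω is kept small so the run is fast, which also makes the phases only weakly metastable):

```python
import numpy as np

# Gillespie simulation of the well-mixed Schlogl model: X -> X+1 with
# propensity Omega*w+(n/Omega) and X -> X-1 with propensity Omega*w-(n/Omega).
rng = np.random.default_rng(0)
l0, l1, l2, l3 = 0.8, 2.9, 3.1, 1.0
Omega = 200
n, t, t_end = int(0.5 * Omega), 0.0, 50.0   # start at the low-density phase

traj_t, traj_n = [t], [n]
while t < t_end:
    rho = n / Omega
    a_plus = Omega * (l0 + l2 * rho**2)          # birth propensity
    a_minus = Omega * (l1 * rho + l3 * rho**3)   # death propensity
    a_tot = a_plus + a_minus
    t += rng.exponential(1.0 / a_tot)            # exponential waiting time
    n += 1 if rng.random() < a_plus / a_tot else -1
    traj_t.append(t)
    traj_n.append(n)

traj_n = np.array(traj_n)
assert traj_n.min() >= 0 and traj_t[-1] >= t_end
print(traj_n[-1] / Omega)   # density fluctuates around the fixed points
```

For the spatially extended model, jump reactions n_i → n_i ∓ 1, n_{i±1} → n_{i±1} + 1 with propensity γ n_i would be added to the reaction channels.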
Second, to check the prediction of the minimum action principle in terms of nucleation times, we can change the rates λ_i to make one of the two phases (say ρ+) only very weakly metastable, i.e. such that V_{ρ+}(ρ−) ≪ 1. We can then calculate the mean escape time τ_{+,−} from this state towards ρ−, repeat this calculation for different values of Ω, and check that τ_{+,−} ∼ e^{ΩV_{ρ+}(ρ−)}. The result is shown in Fig. 10, which confirms that the prediction from the minimum action framework explains the microscopic simulations.
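For a single well-stirred compartment, the quasipotential entering these nucleation-time estimates can be obtained in closed form from the birth-death structure of the process; this is the standard WKB result of the kind used in App. G, evaluated here by quadrature:

```python
import numpy as np

# WKB quasipotential of the well-mixed Schlogl model: the stationary
# distribution is P(rho) ~ exp(-Omega * Phi(rho)) with
# Phi(rho) = int ln(w-(z)/w+(z)) dz, so the escape barrier from a stable
# state over the saddle rho_s is Phi(rho_s) - Phi(stable state).
l0, l1, l2, l3 = 0.8, 2.9, 3.1, 1.0
wp = lambda z: l0 + l2 * z**2
wm = lambda z: l1 * z + l3 * z**3

def barrier(a, b, n=100001):
    """Phi(b) - Phi(a) by trapezoidal quadrature."""
    z = np.linspace(a, b, n)
    f = np.log(wm(z) / wp(z))
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(z))

rho_minus, rho_s, rho_plus = 0.5, 1.0, 1.6
V_plus = barrier(rho_plus, rho_s)     # escape cost from rho+
V_minus = barrier(rho_minus, rho_s)   # escape cost from rho-
print(V_plus, V_minus)

# both barriers are positive, so both phases are metastable; the mean escape
# time then scales like exp(Omega * V), the Arrhenius-type estimate tested above
assert V_plus > 0 and V_minus > 0
```

This closed-form benchmark is what the paths computed by Algorithm 2 should reproduce in the large-γ (well-mixed) regime.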

D. Continuous limit
Finally, let us consider the continuous space limit of the model by sending the number of compartments L → ∞. To this end, let us set ρ_i = ρ̃(x_i)/L and θ_i = θ̃(x_i), with x_i = i/L, where ρ̃(x) and θ̃(x) are fields on x ∈ [0, 1]. Let us also set γ = DL² for some diffusion coefficient D > 0, and λ_i = λ̃_i L^{i−1} for some rescaled rates λ̃_i. Assuming that D and λ̃_i are O(1) in L, it is easy to verify that in the limit as L → ∞ the Hamiltonians H_R and H_D in (76) and (77) become H̃_R and H̃_D, where we defined w̃_+(ρ̃) = λ̃_0 + λ̃_2 ρ̃² and w̃_−(ρ̃) = λ̃_1 ρ̃ + λ̃_3 ρ̃³. Note that in this limit the discrete Poisson jumps of the molecules are approximated as a Gaussian noise on the density: we have recovered the multiplicative Gaussian noise that appears in the Dean-Kawasaki equation [46,77]. The structure of the Poisson noise of the reactions is however left unaffected by the limit.
We used Algorithm 2 to calculate transition pathways at the continuous level by minimizing the action associated with H = H̃_R + H̃_D. The pathways (not shown) are not significantly different from those shown in Fig. 8, which were obtained with L = 40 (i.e. ∆x = 1/40): this indicates that the system was already close to its continuous limit at that value of L.

VI. CONCLUSION
In summary, the analysis of first-order phase transitions and other activated processes in nonequilibrium systems can be reduced to the minimization of an action in situations where these processes are rare and occur via reproducible pathways. The approach can be justified rigorously within the framework of large deviation theory (LDT), and minimum action principles can also be derived formally in other instances using e.g. the Martin-Siggia-Rose-Janssen-De Dominicis [27] or the Doi-Peliti [28] formalisms. The minimum of the action can be used to generalize the Arrhenius law, and its minimizer to explain the mechanism of the transitions, including the shape of the critical nucleus that serves as transition state. Concrete predictions however rest on our ability to solve this minimization problem, which often needs to be done numerically.
Here we developed algorithms to perform these calculations, both on finite and infinite time horizons. These algorithms are designed to be used directly within the Hamiltonian formulation of the action, which leads to a min-max problem, and do not require the user to calculate the Lagrangian beforehand. In particular, they can be used for systems where the fluctuations are non-Gaussian and the Lagrangian is typically unavailable in closed form. The applicability of our method was tested on two nontrivial examples involving spatially extended systems undergoing nonequilibrium phase transitions: a modified Ginzburg-Landau equation perturbed by noise and a reaction-diffusion system based on the Schlögl model. In both cases the method allowed us to calculate the phase diagram of the systems, compute the paths of the transitions, and identify the critical nuclei.
We hope that our work will pave the way for a systematic approach to study activated processes in other interesting examples, like active matter systems undergoing a motility-induced phase separation or a flocking transition, or other systems which display phase transitions but are not in detailed balance. These applications will depend on the possibility to derive an appropriate minimum action principle, potentially through coarse-graining, which is a non-trivial question on its own.

Appendix A: Stability of the fixed points of the GDA algorithm

We would like to show that if a path is stable with respect to the Lagrangian minimization, then the path is also stable with respect to the Hamiltonian min-max algorithm, for α small enough. We start from (39), but in this section we conveniently rescale the artificial time τ = τ̃/α, set α̃ = α², and drop the tildes, so that the evolution equations read

ACKNOWLEDGMENTS
We focus on the dynamical system subjected to an additive Gaussian white noise, as presented in Eqs. (17) and (18), for which the Hamiltonian is quadratic in the conjugate variable θ, with ∂²H/∂θ∂θ = 1. We assume that a minimum action path {(x*, θ*)}_{t∈[0,T]} has been obtained, and we consider a perturbed path (x* + X, θ* + Θ), with X = (x_1, ..., x_P)^T and Θ = (θ_1, ..., θ_P)^T the perturbations, in order to assess under which conditions the path relaxes back to the minimum action path. Linearizing the evolution equations around (x*, θ*), with all derivatives of H evaluated along the minimum action path (shorthand ·|_*), the evolution of the fields can be cast into the following form:

∂_τ X = L_1 X + L_2 Θ,   α ∂_τ Θ = L_3 X − Θ,   (A4)

with the operators L_1 = ∂²H/∂x∂x|_*, L_2 = ∂_t + ∂²H/∂x∂θ|_* and L_3 = ∂_t − ∂²H/∂θ∂x|_*. In the Lagrangian algorithm, the relation Θ = L_3 X is always satisfied, so the evolution of the perturbation X is simply given by ∂_τ X = (L_1 + L_2 L_3) X. Any perturbation X vanishes if the eigenvalues of the operator L = L_1 + L_2 L_3 all have negative real part. The operator L is self-adjoint (since L is the Hessian of the action S), and one can thus extract a basis of normalized orthogonal eigenvectors. Let us denote by μ_n the n-th eigenvalue and by X^L_n the corresponding eigenvector; we have L X^L_n = μ_n X^L_n, with μ_n < 0. Now, in the Hamiltonian algorithm, we would like to find the conditions under which any perturbation (X, Θ) close to a path of minimum action vanishes when evolving in artificial time τ. The perturbation vanishes if and only if the eigenvalues λ_n of the linear operator given in Eq. (A4) have negative real part. The eigenvalue λ_n associated with the eigenvector (X_n, Θ_n)^T must satisfy

λ_n X_n = L_1 X_n + L_2 Θ_n,   λ_n Θ_n = α^{−1} (L_3 X_n − Θ_n).   (A9)

The second equation in (A9) yields Θ_n = (1 + λ_n α)^{−1} L_3 X_n, assuming that there exists α > 0 such that 1 + λ_n α ≠ 0 for every n. Injecting this result into the first equation of (A9) yields a closed equation for X_n and λ_n:

(1 + λ_n α)(L_1 X_n − λ_n X_n) + L_2 L_3 X_n = 0.   (A10)
For α ≪ 1, the system (A4) can be seen as a perturbation of the Lagrangian problem, in which an additional degree of freedom Θ relaxes to L_3 X on a fast time scale 1/α. This suggests looking for eigenvalues of a specific form: (i) a first set of eigenvalues λ^(1)_n should be perturbations of the μ_n, with perturbed X^L_n as corresponding eigenvectors; (ii) a second set of eigenvalues λ^(2)_n is expected to scale as O(α^{−1}) and encodes the fast relaxation of the variable Θ to L_3 X.
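This spectral splitting is easy to check numerically on a small example. The sketch below is our own illustration, not part of the paper's algorithms: it builds random matrices with L_2 = −L_3^T (the relation L_2* = −L_3 used below) and L_1 symmetric negative definite, so that L = L_1 + L_2 L_3 is symmetric negative definite, writes the linearized system (A4) as the block operator M = [[L_1, L_2], [L_3/α, −1/α]], and verifies that for small α the spectrum of M splits into a slow family close to the eigenvalues μ_n of L and a fast, strongly stable family of order −1/α.

```python
import numpy as np

rng = np.random.default_rng(0)
P, alpha = 4, 1e-4

# L1 symmetric negative definite, L2 = -L3^T, so that
# L = L1 + L2 @ L3 = L1 - L3.T @ L3 is symmetric negative definite.
A = rng.standard_normal((P, P))
L1 = -(A @ A.T + np.eye(P))
L3 = rng.standard_normal((P, P))
L2 = -L3.T
L = L1 + L2 @ L3

# Linearized Hamiltonian GDA dynamics (A4):
#   dX/dtau = L1 X + L2 Theta,   alpha dTheta/dtau = L3 X - Theta
M = np.block([[L1, L2], [L3 / alpha, -np.eye(P) / alpha]])

mu = np.sort(np.linalg.eigvalsh(L))        # Lagrangian spectrum mu_n
lam = np.linalg.eigvals(M)
slow = np.sort(lam[np.abs(lam) < 1 / (2 * alpha)].real)

print(np.all(lam.real < 0))                # True: the min-max dynamics is stable
print(np.allclose(slow, mu, atol=1e-1))    # True: slow family tracks mu_n
```

The fast family sits near −1/α, as expected from the relaxation of Θ to L_3 X.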
Therefore we look for eigenvalues of the general form

λ_n = η_n α^{−1} + ν_n,   (A11)

with η_n and ν_n of O(1). Expanding (A10) at leading order O(α^{−1}) yields η_n(1 + η_n) X_n = 0, implying that η_n = 0 or η_n = −1. The case η_n = 0 corresponds to the first family (i) mentioned above, in which we look for eigenvalues λ^(1)_n = μ_n + α ξ_n, with ξ_n = O(1), and expand (A10) in powers of α. At order 0 in α, we find that X_n must solve

(L_1 + L_2 L_3) X_n = μ_n X_n,

where we recognize the operator L = L_1 + L_2 L_3, which confirms that we expand around the eigenvectors X^L_n of the Lagrangian system. We write X_n = X^L_n + α Y_n with Y_n = O(1). Now expanding (A10) at order 1 in α, and using the relation L X^L_n = μ_n X^L_n, we get

(L − μ_n) Y_n = ξ_n X^L_n + μ_n L_2 L_3 X^L_n.   (A14)

Taking the scalar product with the eigenvector X^L_n on both sides of (A14), and using the self-adjointness of L, we obtain ξ_n = −μ_n ⟨X^L_n, L_2 L_3 X^L_n⟩. The brackets ⟨·,·⟩ stand for the scalar product in L²([0,T]; R^P), ⟨f, g⟩ = ∫_0^T f(t)·g(t) dt, and ‖·‖ is the associated norm. Using the fact that L_2* = −L_3, we have ⟨X^L_n, L_2 L_3 X^L_n⟩ = −‖L_3 X^L_n‖², so that ξ_n = μ_n ‖L_3 X^L_n‖². Inserting this result together with η_n = 0 in (A11), we deduce that

λ^(1)_n = μ_n (1 + α ‖L_3 X^L_n‖²) + O(α²).

This shows that for α small enough, the real parts of λ^(1)_n and μ_n have the same sign. Now, in case (ii), where η_n = −1, (A10) becomes at leading order in α

−ν_n X_n = L_2 L_3 X_n.   (A19)

Hence, using again the fact that L_2* = −L_3, ν_n satisfies ν_n = ‖L_3 X_n‖²/‖X_n‖², where X_n is an eigenvector of L_2 L_3. Therefore we have another set of eigenvalues, given by

λ^(2)_n = −α^{−1} + ‖L_3 X^{L₂L₃}_n‖² / ‖X^{L₂L₃}_n‖²,

with X^{L₂L₃}_n an eigenvector of L_2 L_3. Since λ^(2)_n < 0 if α is small enough, the associated eigenvectors are always stable. The stability of the fixed points of the Hamiltonian system is therefore determined by the sign of the λ^(1)_n, which for α small enough coincides with the sign of the eigenvalues μ_n of the Lagrangian system.

Appendix B: Convergence of the GDA algorithm for an Ornstein-Uhlenbeck process

To gain insight about the convergence of the GDA algorithm, we consider an Ornstein-Uhlenbeck (OU) process in one dimension, for which the evolution equations of the GDA are amenable to analytic solution.
The example is also relevant since it is the dynamics satisfied by each Fourier mode of a freely diffusing field in a one-dimensional periodic box. The stability of the numerical scheme associated with this example will be analyzed in Appendix C.
For an OU process, the Hamiltonian is given by

H(x, θ) = −ζ x θ + θ²/2,   (B1)

where ζ > 0 is the stiffness of the confining harmonic potential. We look for the instanton joining x_a to x_b ≠ x_a, with x_a possibly nonzero (i.e. not the stable point of the deterministic dynamics). The GDA equations read

∂_τ x = α (∂_t θ + ∂_x H) = α (∂_t θ − ζ θ),   ∂_τ θ = α^{−1} (∂_t x − ∂_θ H) = α^{−1} (∂_t x + ζ x − θ),   (B2)

subject to the boundary conditions

x(τ, t = 0) = x_a,   x(τ, t = T) = x_b,   (B3)

and to the initial conditions x(τ = 0, t) = x_0(t) and θ(τ = 0, t) = θ_0(t). Again, as in the body of the text, we have introduced a scale α > 0 that controls the relative evolution of x and θ in time τ. By taking the derivative with respect to τ of the first equation in (B2) and applying the operator ∂_t − ζ to the second equation, a closed PDE for the function x(τ, t) can be obtained:

∂_τ² x + α^{−1} ∂_τ x = ∂_t² x − ζ² x.   (B5)

Denoting by ϕ(τ, t) the deviation of x from a reference profile matching the boundary conditions, ϕ obeys (B5) up to a τ-independent source term, such that ϕ(τ, 0) = ϕ(τ, T) = 0. To proceed, let us use the Fourier decomposition of ϕ,

ϕ(τ, t) = Σ_{q≥1} ϕ_q(τ) sin(πqt/T),   (B6)

where we have taken the convention

ϕ_q(τ) = (2/T) ∫_0^T ϕ(τ, t) sin(πqt/T) dt.   (B7)

Inserting (B6) and (B7) into (B5) and projecting, we arrive at the equation satisfied by the modes ϕ_q (q ≥ 1):

∂_τ² ϕ_q + α^{−1} ∂_τ ϕ_q + ω_q² (ϕ_q − φ̄_q) = 0,   (B8)

with ω_q² = ζ² + (πq/T)² and φ̄_q the stationary value of the mode. We recognize the equation of a damped harmonic oscillator, implying that the modes eventually decay exponentially in τ to their stationary value. Setting Δ_q = 1 − 4α²ω_q², we have: for Δ_q > 0,

ϕ_q(τ) = φ̄_q + e^{−τ/(2α)} (C_{1q} e^{√Δ_q τ/(2α)} + C_{2q} e^{−√Δ_q τ/(2α)});

for Δ_q < 0,

ϕ_q(τ) = φ̄_q + e^{−τ/(2α)} (C_{1q} cos(√|Δ_q| τ/(2α)) + C_{2q} sin(√|Δ_q| τ/(2α)));

and for Δ_q = 0,

ϕ_q(τ) = φ̄_q + e^{−τ/(2α)} (C_{1q} + C_{2q} τ).

In each case the constants C_{1q} and C_{2q} are determined by the initial conditions x_0(t) and θ_0(t).
This computation addresses the question of the convergence rate, which is different for each mode q. For a given q ≥ 1, the best decay rate λ_q is obtained by choosing α = 1/(2ω_q), which corresponds to the critically damped regime. In practice, we must use the same α for all modes q. The convergence to the final path is then controlled by the smallest rate, which should be made as large as possible. Since the decay rates λ_q are ordered according to λ_1 ≤ λ_2 ≤ ..., this prescribes the choice α = 1/(2ω_1), which maximizes the convergence rate of the slowest mode q = 1, for which we then have λ_1 = ω_1. This choice in fact yields the identical rate λ_q = ω_1 for all modes q, and the modes q > 1 display damped oscillations.
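This rate selection is easy to tabulate. The short script below is our own check (not from the paper): using the mode frequencies ω_q² = ζ² + (πq/T)², it computes the slowest decay rate of each mode from the characteristic roots s = (−1 ± √(1 − 4α²ω_q²))/(2α) implied by the solutions above, and verifies that the choice α = 1/(2ω_1) equalizes all the rates at λ_q = ω_1.

```python
import cmath
import math

def decay_rate(alpha, omega):
    """Slowest decay rate of a mode: roots s = (-1 +/- sqrt(1 - 4 a^2 w^2)) / (2a)."""
    d = cmath.sqrt(1 - 4 * alpha**2 * omega**2)
    return -max(((-1 + s) / (2 * alpha)).real for s in (d, -d))

zeta, T = 1.0, 5.0
omega = [math.sqrt(zeta**2 + (math.pi * q / T) ** 2) for q in range(1, 11)]

alpha = 1 / (2 * omega[0])          # critical damping of the slowest mode q = 1
rates = [decay_rate(alpha, w) for w in omega]
print(max(abs(r - omega[0]) for r in rates))  # ~0: every mode decays at rate omega_1
```

Modes q > 1 are then underdamped (Δ_q < 0), so they oscillate while decaying at the common rate 1/(2α) = ω_1, exactly as stated above.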

Appendix C: Stability of the numerical scheme
In this section, we analyze the stability of Algorithm 1 for the Ornstein-Uhlenbeck system introduced in Appendix B. Starting from (B2), where x is subject to the boundary conditions (B3), we set u = x + αθ and v = x − αθ. The GDA equations in these variables become a pair of linear advection-relaxation equations (C1) for u and v, with the boundary conditions v(τ, t = 0) = −u(τ, t = 0) + 2x_a and v(τ, t = T) = −u(τ, t = T) + 2x_b inherited from (B3). Since (C1) are linear equations, Algorithm 1 can be expressed in terms of matrix multiplications. To this end, let us define r = ∆τ/∆t, the ratio between the algorithm evolution time step and the physical time step, and set U^n = (u^n_0, ..., u^n_M)^T and V^n = (v^n_0, ..., v^n_M)^T, concatenated into (U^n, V^n)^T = (u^n_0, ..., u^n_M, v^n_0, ..., v^n_M)^T. The scheme prescribed in Algorithm 1 then takes a block-matrix form, where each block K, L, {G_i}_{1≤i≤4} is an (M+1)×(M+1) matrix and 1 is the identity matrix: setting z ≡ ∆τ(ζ + 1/α) and z̃ ≡ ∆τ(−ζ + 1/α), we solve at each step for (U^{n+1}, V^{n+1})^T as a linear map Q applied to (U^n, V^n)^T. The scheme is stable if the eigenvalues of the matrix Q are all of modulus ≤ 1. To check the stability for different values of ∆τ and ∆t, we prescribe a value of ζ, which defines a typical relaxation time scale t_ζ = 1/ζ of the noiseless dynamics. We then either prescribe the final time T and vary the number of points M along the path, or fix the number of points and vary ∆t, which changes the final time T. In particular, it is important to check the stability for a final time T ≫ t_ζ, such that a transition from x_a or x_b to the critical point x = 0 can follow the deterministic flow on most of the path (such transitions only need noise close to the critical point, which must be reached exactly at time t = T). To resolve the instanton correctly, we should then typically take ∆t < t_ζ.
In Fig. 11 we display the stability region of the scheme in the (∆t, ∆τ) plane for several values of the parameters. The result shows that the stability region depends on ζ and α. Since the fixed point (the instanton) is independent of the dynamical parameters ∆τ and α, we are free to choose the values that give the best stability and convergence. Interestingly, sending α → 0 is not the best choice for stability, even if it looks appealing at first sight since it corresponds to solving the Legendre-Fenchel transform argmax_θ(⟨ẋ, θ⟩ − H(x, θ)) for a given x. It is instead preferable to take α = O(1), so as to update x and θ on similar time scales. It is also worth noticing that the use of an implicit scheme for the advection relaxes the Courant-Friedrichs-Lewy (CFL) stability condition: one can take ∆τ > ∆t and still have a stable scheme. However, it is clear from the graphs that the limit ∆τ → ∞ is not stable. This limit is nevertheless interesting, since it corresponds to an infinitely fast update of u^{n+1} at fixed v^n, i.e. u^{n+1} solves its stationarity condition at fixed v^n, and v^{n+1} then solves its stationarity condition at fixed u^{n+1}. This greedy procedure would reach the fixed point in a few steps, when it converges at all.
Here instead, ∆τ must remain finite to guarantee convergence.
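The mechanism behind the relaxed CFL condition can be seen on a stripped-down example. The toy scheme below is our own illustration, not the full Algorithm 1 with its blocks K, L, {G_i}: it discretizes a single advection equation ∂_τ u + ∂_t u = 0 with an implicit upwind step, (1 + r) u_j^{n+1} − r u_{j−1}^{n+1} = u_j^n with r = ∆τ/∆t, builds the corresponding iteration matrix Q, and checks that its spectral radius stays below 1 even for r ≫ 1.

```python
import numpy as np

def iteration_matrix(M, r):
    """Q with U^{n+1} = Q U^n for the implicit upwind step
    (1 + r) u_j^{n+1} - r u_{j-1}^{n+1} = u_j^n (inflow value u_0 held fixed)."""
    A = (1 + r) * np.eye(M) - r * np.eye(M, k=-1)   # lower bidiagonal system
    return np.linalg.inv(A)

for r in (0.5, 1.0, 10.0, 100.0):   # r = dtau/dt, far beyond the explicit CFL limit
    Q = iteration_matrix(50, r)
    # Q is lower triangular, so its spectral radius can be read off the
    # diagonal: all eigenvalues equal 1/(1+r) < 1, stable for every r.
    rho = max(abs(np.diag(Q)))
    print(r, rho)
```

An explicit upwind update, by contrast, has all its iteration eigenvalues equal to 1 − r and becomes unstable once r is too large; that is the restriction the implicit treatment removes.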
Finally, let us discuss the issue of preconditioning the scheme for large values of ζ, which is relevant e.g. when ζ stands for the relaxation rates of the different Fourier modes of a diffusive field (typically ζ(k) = Dk², with D the diffusion coefficient of the field). Specifically, we would like to choose a time step ∆τ for which the numerical scheme remains stable when each Fourier component is evolved separately in a semi-spectral method for solving the PDEs. Preconditioning the evolution (B2) aims at keeping the same ∆τ for each mode, while allowing the higher modes (large ζ(k)) to relax more slowly than the lower modes (small ζ(k)). This procedure does not change the final path, which still solves Hamilton's equations. The shape of the matrices K, L and {G_i}_{1≤i≤4} is unchanged, but ∆τ is modified by a mode-dependent substitution rule, which also modifies r, z, and z̃. We find that this procedure ensures the stability of the scheme up to ∆τ = 1, for all ζ, provided ∆t is small enough. The semi-analytical proof presented here in the Ornstein-Uhlenbeck setup provides insight into the behavior of the system, but the problems we are usually interested in display strong nonlinearities, and their stability cannot be analyzed through the spectrum of a linear operator. We keep in mind, however, that a few ingredients stabilize the code in general: the advection should be treated with an implicit upwind scheme, the reaction terms can be evaluated upwind, and preconditioning the dynamics allows us to take the same time step ∆τ for each Fourier mode in a semi-spectral scheme.
Eq. (52) follows from this equation if we modify a few terms to guarantee that λ ≥ 0 (which may not always be satisfied, since (E2) only holds at convergence and not during the optimization). Note that equations (E2) and (E4) impose H = 0 along the trajectory, for any positive definite C. This enjoins us to consider replacing C by the identity in our numerical algorithms, in order to avoid computing (∂²_θ H) and its inverse. Note also that the value H = 0 is the only possible one, since H(0) = H(1) = 0 (the endpoints are critical points) and H(s) = H(0) for every s ∈ [0, 1] (the system is Hamiltonian). In this sense, the coefficient λ^{−1} can also be seen as a Lagrange multiplier enforcing H = 0.

In the well-stirred case of a single compartment, the dynamics keeps the same fixed points ρ_−, ρ_s, ρ_+ when Ω → ∞, with ρ_− and ρ_+ stable, while ρ_s is unstable. Following [42,43,78,79], using the WKB (or eikonal) approximation and the continuum limit, the probability becomes a probability density that can be cast into the form P_eq(ρ) = K(ρ, Ω) e^{−Ω V_eq(ρ)}, where the equilibrium potential is given by V_eq(ρ) = −∫^ρ dy ln[w_+(y)/w_−(y)], and where K(ρ, Ω) is a function with the property that for any ρ_1, ρ_2, the ratio K(ρ_1, Ω)/K(ρ_2, Ω) is bounded. As such, the ratio of the probabilities P(ρ_1)/P(ρ_2) is completely determined, in the large-Ω limit, by the difference ∆V ≡ V_eq(ρ_1) − V_eq(ρ_2). We show in Fig. 14 that the potential V_eq and the naive Ginzburg-Landau potential U(ρ) = −λ_0 ρ + λ_1 ρ²/2 − λ_2 ρ³/3 + λ_3 ρ⁴/4 give different predictions for the relative stability of the states. In the large-Ω limit, we check that the probability converges, as expected, to the density prescribed by V_eq(ρ). We also notice that many particles per box may be needed to observe the convergence to the large deviation function.
To summarize, for a well-stirred, single compartment, the analytical expression of the large deviation function can be obtained explicitly. This is no longer the case when spatial diffusion is taken into account, which is why one has to resort to other techniques to access the quasipotential.

FIG. 14 (caption). Convergence of the probability density as the number of particles per box Ω increases. The probability is explicitly known for this equilibrium case, and we show the convergence to the function V_eq(ρ) as the number of particles in the box increases. For the parameters chosen here, ρ_− is the stable phase, while looking at the naive Ginzburg-Landau potential U(ρ) = −λ_0 ρ + λ_1 ρ²/2 − λ_2 ρ³/3 + λ_3 ρ⁴/4 would wrongly predict that ρ_+ is the stable phase. Parameters: λ_0 = 0.8, λ_1 = 2.9, λ_2 = 3.1, λ_3 = 1.
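The single-compartment comparison of Fig. 14 can be reproduced in a few lines. The sketch below is our own check; it assumes the standard Schlögl birth and death rates w_+(ρ) = λ_0 + λ_2 ρ² and w_−(ρ) = λ_1 ρ + λ_3 ρ³ (consistent with the potentials quoted above, since U′ = −(w_+ − w_−)), evaluates U and V_eq at the two stable fixed points for the parameters of the caption, and recovers the opposite stability predictions.

```python
import math

lam0, lam1, lam2, lam3 = 0.8, 2.9, 3.1, 1.0   # parameters of Fig. 14

def wp(r):   # birth rate (assumed Schlogl form)
    return lam0 + lam2 * r**2

def wm(r):   # death rate (assumed Schlogl form)
    return lam1 * r + lam3 * r**3

def U(r):
    """Naive Ginzburg-Landau potential, U'(rho) = -(wp - wm)."""
    return -lam0 * r + lam1 * r**2 / 2 - lam2 * r**3 / 3 + lam3 * r**4 / 4

def Veq(r, n=200000):
    """V_eq(rho) = -int_0^rho ln(wp/wm) dy by the trapezoidal rule.
    The integrand has an integrable log divergence at 0, so start just above it."""
    a, h = 1e-9, (r - 1e-9) / n
    f = lambda y: -math.log(wp(y) / wm(y))
    return h * (0.5 * (f(a) + f(r)) + sum(f(a + i * h) for i in range(1, n)))

rm, rs, rp = 0.5, 1.0, 1.6        # fixed points of wp - wm for these parameters

print(U(rp) - U(rm))              # < 0: naive potential favors rho_+
print(Veq(rp) - Veq(rm))          # > 0: true potential favors rho_-
```

The two potential differences have opposite signs: the naive Ginzburg-Landau potential would select ρ_+ as the stable phase, while the WKB potential V_eq correctly selects ρ_−, as stated in the caption.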