Capacities and efficient computation of first-passage probabilities

A reversible diffusion process is initialized at position x 0 and run until it first hits any of several targets. What is the probability that it terminates at a particular target? We propose a computationally efficient approach for estimating this probability, focused on those situations in which it takes a long time to hit any target. In these cases, direct simulation of the hitting probabilities becomes prohibitively expensive. On the other hand, if the timescales are sufficiently long, then the system will essentially "forget" its initial condition before it encounters a target. In these cases the hitting probabilities can be accurately approximated using only local simulations around each target, obviating the need for direct simulations. In empirical tests, we find that these local estimates can be computed in the same time it would take to run a single direct simulation, but that they achieve an accuracy that would require thousands of direct simulation runs.


I. INTRODUCTION
Reversible diffusions play a key role in a wide variety of physical systems. For example, the folding of macromolecules into their native configurations is often posed as a diffusion (either directly through simulations of atomic dynamics or indirectly through mesoscopic models [1][2][3]), the fluctuation of chemical species in solution can be modeled as diffusions ("chemical Langevin equations" are one classic example of this approach [4,5]), the motion of particles through membranes can be posed as a diffusion [6], and so on. In all cases, the physical state of the system at time t is represented through a variable X t, confined to a finite region and evolving according to a stochastic differential equation.
In this paper we seek to estimate first-passage probabilities (sometimes also called "splitting probabilities" [7] or simply "hitting probabilities") of such diffusions: given an initial condition x 0 and a collection of targets, what is the probability that the process first hits any particular target before hitting any of the other targets? Targets might represent, for example, multiple exit locations from a region of the state space in a statistical mechanics problem or the establishment of new intramolecular bonds in a folding problem. Where will the system first exit or what is the next step in folding? This paper develops an alternative algorithm for the approximation of first-passage probabilities.
To keep things simple, most of the discussion will be in the context of just two targets, A and B, A, B ⊂ Ω, although all of the results apply, with trivial notational changes, to any finite set of targets. In this context, "hitting probability" (also known as "first-passage probability") refers to the specific probability that X t hits target A before target B. The algorithm is designed for cases where the diffusion "forgets" its initial condition before encountering a target. Precise definitions are in Sec. II, but to get the main idea consider a region M in Ω that lies outside of the targets A and B, M ⊂ Ω\(A ∪ B), and assume that the diffusion starts in M (x 0 ∈ M). Recall that reversible diffusions are ergodic: as time goes on, it gets harder and harder to guess the initial condition from the current configuration. As this happens, the process is said to become "mixed," and the speed with which the initial condition is forgotten is called the "mixing rate." If the first-passage time out of M is long relative to this mixing rate, then the hitting probability will be nearly independent of x 0. This is what we mean by "forgets" its initial condition. Our main result (Theorem 1) asserts that if the hitting probability varies by at most ε among all initial conditions in M, then the proposed algorithm's approximation error is less than ε + √(ε/2). Figure 1 gives a rough sketch of this idea. When would these conditions hold? We will have more to say about this in Sec. III, but for now we note that the narrow-escape literature gives at least one class of examples.
In these examples X t is a Brownian motion trapped inside a set by reflecting boundaries, and the targets are very small windows in the boundary. These models have been successfully applied to a variety of mesoscopic biosystems (cf. Ref. [8] and the references therein). These kinds of diffusions may also be relevant for modeling the folding of large molecules. According to McLeish, "folding rates are controlled by the rate of diffusion of pieces of the open coil in their search for favorable contacts" [9]. There is also some experimental evidence that the exploration of large regions is the rate-limiting step for a variety of processes [10][11][12][13]. If these results bear out, then to the extent that exit times are long it will be possible to obtain good approximations for hitting probabilities using our approach.

FIG. 1. First-passage capacities allow efficient computation of hitting probabilities. We consider a reversible stochastic differential equation in an n-dimensional state space, initialized at some configuration x 0 inside a set M. What is the probability that the diffusion will reach region A before B? We assume that the hitting probability is nearly the same for every initial condition x 0 ∈ M, differing by at most ε across all initial conditions. (See Theorem 2 for some sufficient conditions, involving small targets and/or high dimensions.) Theorem 1 establishes an approximation to the hitting probability that is accurate to within ε + √(ε/2) and uses only "first-passage capacities." These capacities can be computed from simulations that take place locally, meaning in the neighborhoods of the targets and entirely outside of M. Here we depict several examples. In each case, the first-passage capacities can be computed efficiently using simulations confined only to the areas shaded in gray.

Many standard methods for calculating hitting probabilities fail for diffusions of this kind. When the escape time from M is large, the associated partial differential equations often include large Lipschitz constants, and direct Monte Carlo simulation requires a prohibitive number of time steps (cf. Refs. [14][15][16]). Some authors take a pessimistic view of this subject: "If these processes are intrinsically slow, i.e., require an extensive sampling of state space, not much can be done to speed up their simulation without destroying the dynamics of the system" [17]. But in fact diffusions of this kind enjoy certain properties which can actually make analyses easier. The literature includes a variety of tools which use such properties to advantage. Here we review a few of these tools.

Analytic analyses. When targets are very small, the first-passage times (as opposed to probabilities) are often well approximated by analytic formulas. The narrow-escape literature has developed many techniques in this direction. Much of this literature focuses on making these formulas as accurate as possible for specific geometric configurations, such as the case of N circular targets (cf. Ref. [18]), the case that one target lies at the end of a long tube (cf. Ref. [19]), or the case that the motion is trapped inside a symmetric domain (cf. Refs. [20][21][22][23]). There is also some work on efficient numerical methods for low-dimensional problems using insights from these analytic results (cf. Ref. [24]). Note that this literature mostly focuses on hitting times, whereas in this paper we focus on hitting probabilities.
Nevertheless, the first-passage probabilities are nearly inversely proportional to the mean first-passage times for many cases, so these techniques can often be used to estimate first-passage probabilities. A helpful overview of many of the main ideas in this literature can be found in Ref. [25].
Markov state models. Markov state models (MSM) begin by partitioning the state space of the diffusion into n sets ("states"). The diffusion can be understood at a coarse level by looking at which state the process is in at any given time (cf. Refs. [26][27][28]). In some cases the time it takes to move between states is long relative to the rate of mixing inside the states. This separation of timescales ensures that the coarse process is approximately Markovian. We can therefore approximately simulate the discrete process as long as we know the distribution on the exit times and the probability of transitioning to each possible state. The amount of computational time required to simulate from this approximate process does not depend upon how long the process actually spends in each state.
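The coarse-grained simulation described above can be sketched in a few lines. In this sketch the three states, mean exit times, and transition probabilities are hypothetical stand-ins, and the sojourn times are drawn as exponentials for simplicity; the point is that the cost of a step is independent of how long the diffusion would actually dwell in a state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state coarse model: mean exit time of each state
# (arbitrary units) and the transition probabilities on exit.
mean_exit_time = np.array([50.0, 200.0, 80.0])
P = np.array([[0.0, 0.7, 0.3],
              [0.4, 0.0, 0.6],
              [0.5, 0.5, 0.0]])  # rows sum to 1

def simulate_coarse(state, t_max):
    """Simulate the approximate jump process until time t_max."""
    t, path = 0.0, [state]
    while t < t_max:
        t += rng.exponential(mean_exit_time[state])  # sojourn in state
        state = rng.choice(3, p=P[state])            # jump on exit
        path.append(state)
    return path

path = simulate_coarse(0, t_max=2000.0)
print(len(path), path[:5])
```

Each iteration advances the clock by an entire sojourn, which is what makes the coarse simulation cheap even when the underlying diffusion is slow.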
Site-localizing functions. Site-localizing functions define a kind of continuous version of the MSMs: instead of creating a hard partitioning of the space, one constructs basis functions g 1 (x), . . . , g n (x) such that Σ i g i (x) = 1. In many cases the action of the relevant diffusion operators can be well approximated by their action on the n-dimensional space spanned by these basis functions. For example, under a separation-of-timescales assumption, Morro [29] shows a way to design basis functions which faithfully represent the diffusion's behavior on the slow timescale.
Milestoning. In some cases it is possible to construct a low-dimensional reaction coordinate which measures the distance from the targets in some suitable metric. When the movement along this reaction coordinate is slow relative to the mixing rate along all other directions, this separation of timescales ensures that the dynamics of the diffusion along this coordinate are approximately Markovian. If the reaction coordinate is carefully chosen, many properties of the original diffusion are maintained in this low-dimensional projection [7]. Simulating the low-dimensional approximate diffusion can yield useful insights into the behavior of the original system [30].
Here we will investigate a different way to make use of separated timescales. Let h A,B (x) denote the probability of hitting A before B given that the diffusion is initialized at x ∈ M. It is well known that h A,B (x) can be represented as the solution of a variational problem. (A detailed account of this and related facts is included in Appendix A.) Under the special condition that h A,B (x) is a constant, say h A,B (x) = h̄ for all x ∈ M, the solution of the variational problem, and hence h̄, can be written in terms of certain integrals over local regions surrounding the targets (indicated by the gray-shaded areas in the examples depicted in Fig. 1). We will refer to these integrals as "first-passage capacities"; see Sec. II for the formal definition. If, on the other hand, h A,B (x) is not exactly constant, then h̄ becomes an approximation rather than an exact formula: for all x ∈ M, |h̄ − h A,B (x)| ≤ ε + √(ε/2), where ε is the difference between the maximum and minimum hitting probabilities over M. This is the main theoretical result, Theorem 1, which reduces the first-passage problem to one of evaluating local integrals. For the purpose of evaluating these integrals, we introduce a Monte Carlo approach that we call the shell method and demonstrate its accuracy and computational efficiency.
The key advantage of our approach is that it allows us to get good estimates while completely ignoring all dynamics inside M. This is particularly valuable if M is a large, high-dimensional set. In this case it is unclear how one would partition M or formulate a meaningful reaction coordinate, calling into question the utility of Markov state models and milestoning under these conditions. Furthermore, integrating over all of M is computationally burdensome, making site-localizing functions also difficult to implement. Asymptotic techniques from the narrow-escape and reaction-controlled-diffusion literature may still be applied, but only if the problem falls into one of a collection of very specific cases, most of which lie in two or three dimensions, feature regular targets, and assume that the diffusion has no energetic potential. By contrast, first-passage capacities can be calculated efficiently for a diffusion with an arbitrary energetic potential, target shape, and ambient dimension. Moreover, first-passage capacities are fundamentally nonasymptotic quantities; for any problem the approximation error is bounded explicitly in terms of the extent to which the hitting probability varies across M. On the other hand, the strengths of the capacity-based approach are accompanied by a significant limitation: the approach gives no information about first-passage times. Indeed, the temporal process is explicitly eliminated. In general, there is an overall proportionality constant which connects first-passage probabilities to first-passage times, but this relationship cannot be recovered from the first-passage capacities. We discuss this limitation further in Sec. VI.
In the following section (Sec. II "Preliminaries") we introduce notation, define the diffusion and first-passage probabilities, enumerate the main assumptions, and finally define the first-passage capacities that are at the heart of this approach. In Sec. III we show that these capacities can be used to accurately estimate first-passage probabilities, and by way of examples, we give some sufficient conditions that guarantee that the main assumptions are satisfied. In Sec. IV, we develop the "shell method," a numerical approach to computing first-passage capacities. And in Sec. V we investigate the speed and accuracy of the capacity-based approach in a variety of computational experiments.

II. PRELIMINARIES
The results in this paper are about diffusion processes X confined to an open bounded set Ω ⊂ R n with reflecting smooth boundary ∂Ω, for n ≥ 3. We assume X is driven by an n-dimensional standard Brownian motion W through the SDE (1), where b: Ω → R n and σ: Ω → R n×n are continuously differentiable vector-valued and matrix-valued functions. We further assume that a(x) = σ(x)σ(x) T is uniformly elliptic on Ω, i.e., the smallest eigenvalue of a(x) is bounded away from zero, uniformly in x. Let Ω̄ denote the closure of Ω (and in general let S̄ denote the closure of any set S ⊂ Ω̄). For the precise definition of the reflected process, we adopt the framework developed by Lions and Sznitman [31]: Let n = n(x) denote the outward normal of ∂Ω and ν: ∂Ω → R n a smooth vector field satisfying n T ν ≥ c > 0, and assume that x 0 ∈ Ω. Then there is a unique pathwise continuous and W-adapted strong Markov process X t ∈ Ω̄, and a (random) measure L, such that X satisfies the reflected SDE (1) and L({t: X t ∉ ∂Ω}) = 0. For convenience, we will refer to X by simply saying "the reflected diffusion process (1)." We further assume that X is a reversible process, with equilibrium distribution (2) proportional to e −U(x), where U: Ω̄ → R is continuously differentiable. As shown by Chen [32], the conditions in (3) are sufficient to ensure that X is reversible. When the conditions in (3) are in force we will say that X satisfies the reversibility conditions relative to U. We assume these conditions throughout the paper.
The main goal of this paper is to estimate first-passage probabilities for X. We here give a formal definition of these hitting probabilities for the two-target case, although all of our results extend, straightforwardly, to finite collections of targets.

Definition 1. (First-passage probabilities) Fix two disjoint sets A, B ⊂ Ω. The first-passage probability function h A,B (x) is the probability that the process X visits A before B if it is initialized at X 0 = x. Formally, h A,B (x) = P(X τ A∪B ∈ A | X 0 = x), where τ A∪B indicates the first-passage time to A ∪ B, i.e., τ A∪B = inf{t ≥ 0: X t ∈ A ∪ B}. Throughout this paper we will use τ S to denote the first-passage time to a set S and h S,S ′ to denote the hitting probability function for targets S, S ′.
As discussed in the introduction, these hitting probabilities are much easier to estimate if the process forgets its initial condition to such an extent that the hitting probabilities are nearly constant for any initial condition inside a set M. Here we give a formal definition of this property.

Definition 2. (The ε-flatness condition) A hitting probability function h A,B (x) is said to be "ε-flat relative to M" whenever sup x,y∈M |h A,B (x) − h A,B (y)| ≤ ε.

As we shall see in Theorem 1, this condition is exactly what we need to show that the hitting probabilities within M can be well approximated using "first-passage capacities." These capacities are the last piece we must define.

Definition 3. (Capacity) Let S ⊂ S̃ ⊂ Ω be open sets and let X be a diffusion governed by the SDE in Eq. (1) and satisfying the reversibility conditions in Eq. (3). The first-passage capacity cap(S, S̃) for X is defined as the minimum of the Dirichlet form, cap(S, S̃) = inf f (1/2) ∫ Ω ∇f(x) · a(x)∇f(x) e −U(x) dx, where the infimum is over smooth functions f with f = 1 on S and f = 0 outside S̃. We refer the reader to Appendix A for more details on capacity and the related concept of Dirichlet form. Note that there are several related definitions of "capacity" in the probabilistic potential theory literature, all slightly different. For example, the harmonic capacity arises by taking the above definition in the special case when the diffusion is a simple Brownian motion. The definition used here most closely follows the work of Bovier [33,34]. Throughout this work, the term is used only in the sense of the above definition.

III. MAIN THEORETICAL RESULTS
Our main theorem shows that the first-passage capacities give accurate approximations of the hitting probabilities when the hitting probabilities are themselves ε-flat in a region outside a neighborhood of the targets.

Theorem 1. Let A ⊂ Ã, B ⊂ B̃ be open sets and assume that h A,B (x) is ε-flat relative to M = Ω\(Ã ∪ B̃). Then the first-passage probabilities are well approximated by the first-passage capacities: for all x ∈ M, |h A,B (x) − cap(A, Ã)/[cap(A, Ã) + cap(B, B̃)]| ≤ ε + √(ε/2).

We defer the proof to Appendix C. Note that the generalization to multiple targets is straightforward due to the additive property of capacities (Proposition 2 in Appendix A). In general, the hitting probability is approximately proportional to the corresponding capacity.
When might the ε-flatness condition hold? In the introduction, we gave an intuitive explanation of when this might be expected to happen, namely, whenever the diffusion forgets its initial condition before it hits the targets. To prove the ε-flatness condition for a particular problem, one must make this idea rigorous. Here are two examples of how this might be done; together they constitute Theorem 2.

(1) (Small targets) Fix r, ε > 0. We can then find r ′ > 0 with the following property. For any bounded convex set Ω with diameter less than or equal to 1, any x A, x B ∈ Ω, and any reversible stationary diffusion trapped inside Ω and behaving like Brownian motion for all x ∈ B(x A, r) ∪ B(x B, r), the hitting probability function h B(x A ,r ′ ),B(x B ,r ′ ) is ε-flat relative to Ω\[B(x A, r) ∪ B(x B, r)].

(2) (High dimensions) Fix r, ε > 0 and r ′ < r. We can then find n with the following property. For any convex set Ω ⊂ R n with diameter less than or equal to 1, any x A, x B such that B(x A, r), B(x B, r) ⊂ Ω, and any reversible stationary diffusion trapped inside Ω and behaving like Brownian motion for all x ∈ B(x A, r) ∪ B(x B, r), the hitting probability function h B(x A ,r ′ ),B(x B ,r ′ ) is ε-flat relative to Ω\[B(x A, r) ∪ B(x B, r)].

A proof can be found in Appendix B. The first example shows how hitting probabilities become flat as the targets become small. The second example notes that even if we fix the radius of the targets, the hitting probabilities become flat if the ambient dimension n is high. In both cases, the results are proved by showing that the rate of mixing inside M is fast relative to the time it takes to exit M.

IV. CAPACITY ESTIMATION VIA THE SHELL METHOD
To make use of Theorem 1 in practice, we must be able to compute capacities. The calculation is local, in that cap(A, Ã) is an integral on Ã\A. We will propose here a Monte Carlo approach to evaluating the integral, using a combination of analytic reductions and local sampling.
We begin by using an alternative formulation of the capacity.

Proposition 1. For any regions G and G̃ having smooth boundaries and such that A ⊂ G ⊂ G̃ ⊂ Ã, cap(A, Ã) can be expressed as a flux leaving G̃\G, as in Eq. (4), where H is the (n − 1)-dimensional Hausdorff measure and n represents the outward-facing (relative to the set G̃\G) normal vector on ∂(G̃\G).
(Results like this are well known, though perhaps not in exactly this form. In any case, we have included a formal proof, which can be found in Appendix D.) There is a great deal of freedom in choosing G and G̃; the idea is to choose them so as to make the surface integrals as simple as possible. Each of these surface integrals can be viewed as an expectation. Let H(dx) denote the (n − 1)-dimensional Hausdorff measure. We can define a probability measure on ∂G by P(dx) = (1/Z) e −U(x) H(dx), where Z is the normalizing constant. We can define P̃(dx) and Z̃ analogously, on ∂G̃ rather than ∂G. In these integrals, the normal n points outward from both G and G̃. Let y 1, y 2, . . . , y m ∼ iid P(dx), so that the surface integral over ∂G can be approximated by the empirical average of the integrand at the samples, scaled by |∂G|, the surface area of ∂G. Putting these together, we get the large-m approximation to the ∂G integral.
If we now extend all of this to ∂G̃, with ỹ 1, ỹ 2, . . . , ỹ m̃ ∼ iid P̃(dx), and put the approximations into (5), then for large m and m̃ we obtain a Monte Carlo estimate of the capacity. In summary: we can use a Monte Carlo technique to compute the capacity as long as we can (1) sample from P(dx) and P̃(dx), (2) compute the surface areas |∂G| and |∂G̃|, (3) compute the first-passage probability h A,Ã c , and (4) compute the gradient ∇h G,G̃ c . We can often make tasks (1) and (2) straightforward by a judicious choice of G and G̃. However, tasks (3) and (4) are more difficult.
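The sampling step — replacing a surface integral by a sample average — can be checked on a case where the answer is known. The sketch below estimates an integral over a sphere in R 5 from uniform surface samples (uniform sampling corresponds to the Brownian case U ≡ 0); the integrand x 1 2 is a stand-in for the actual capacity integrand, which also involves h and the normal derivative.

```python
from math import gamma, pi

import numpy as np

rng = np.random.default_rng(1)
n, r, m = 5, 1.0, 200_000

# Uniform samples on the sphere of radius r: normalized Gaussians.
y = rng.standard_normal((m, n))
y *= r / np.linalg.norm(y, axis=1, keepdims=True)

# Surface area of the (n-1)-sphere of radius r in R^n.
area = 2 * pi ** (n / 2) / gamma(n / 2) * r ** (n - 1)

# Stand-in integrand f(x) = x_1^2 (the capacity integrand would
# involve e^{-U}, the hitting probability, and its normal derivative).
f = y[:, 0] ** 2

estimate = area * f.mean()   # (|∂G|/m) Σ_i f(y_i)
exact = area * r ** 2 / n    # ∫ x_1^2 dH over the sphere, in closed form
print(estimate, exact)
```

With 2 × 10^5 samples the relative error of the sample-average estimate is a fraction of a percent.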
Thus, to compute the capacity using this approach, the most difficult challenge is to compute the first-passage probabilities along ∂G and ∂G̃. Broadly speaking, there are two approaches to this kind of problem. First-passage probabilities satisfy an elliptic PDE related to the infinitesimal generator [see Appendix A, Eq. (A1)], and we could therefore choose from a selection of numerical solvers. Here, in a different direction, we exploit the connection between first-passage probabilities and the underlying random walk in order to develop Monte Carlo tools suitable for estimating both h A,Ã c and ∇h G,G̃ c on the surfaces ∂G and ∂G̃. These tools are based on what we will call the "shell method," which we describe briefly in the following paragraphs and in full detail in Appendix F.
Generically, given two simply connected regions R and R̃, with R ⊂ R̃, and a set S such that R ⊂ S ⊂ R̃, we seek an approximation to the function h R,R̃ c on the surface ∂S. In principle, we could begin with a fine-grained partitioning of ∂S into simply connected "cells," and for each cell simulate the diffusion many times, recording whether or not the path first exits R̃\R at ∂R. The fraction of paths that first exit at ∂R constitutes an estimate of h R,R̃ c (x) for any x in the current cell. But this is wasteful and likely infeasible in all but the simplest of settings. Much of the waste stems from the fact that the ensemble of all paths generated from all cells will likely include many near collisions of paths scattered throughout R̃\R. An alternative, divide-and-conquer approach is to introduce multiple nested sets S 0 ⊂ S 1 ⊂ · · · ⊂ S n between R and R̃, and use sample paths from X, locally, to estimate the transition probability matrices from each cell within each "shell" ∂S k to each cell of its neighboring shells, ∂S k−1 and ∂S k+1 . Equipped with these transition matrices, the first-passage probability for a given x ∈ ∂S is computed algebraically, without further approximation.
S must be chosen not only to satisfy R ⊂ S ⊂ R̃ but also in such a way as to make it feasible to sample from ∂S under the probability measure (1/Z) e −U H(dx). After that, the S k , k = 1, . . . , n − 1, are chosen so that the shells nest and are in close proximity; the hitting times starting from a sample in ∂S k and ending at ∂S k−1 ∪ ∂S k+1 must be short enough to encourage many repeated runs. The output is a set of samples, on all of the shells ∂S k , along with an estimate of h R,R̃ c at every sample. With the choice of A for R and Ã for R̃, the algorithm becomes directly applicable to the estimation of h A,Ã c on ∂G and ∂G̃, taking S = G in the former case and S = G̃ in the latter.
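The final, algebraic step — turning the estimated shell-to-shell transition matrices into first-passage probabilities — is a standard absorbing-Markov-chain computation. As a minimal illustration, with a one-cell-per-shell toy chain (a symmetric random walk) standing in for the estimated matrices, the probabilities solve a single linear system:

```python
import numpy as np

# Toy "shells": interior states 1..N-1 of a symmetric random walk,
# absorbed at 0 (the inner target) or at N (the outer boundary).
N = 10
Q = np.zeros((N - 1, N - 1))   # interior-to-interior transition matrix
for i in range(N - 1):
    if i > 0:
        Q[i, i - 1] = 0.5
    if i < N - 2:
        Q[i, i + 1] = 0.5
r = np.zeros(N - 1)
r[0] = 0.5                      # state 1 steps directly into the target

# First-passage probabilities solve (I - Q) h = r; once the transition
# matrices are in hand, no further simulation is needed.
h = np.linalg.solve(np.eye(N - 1) - Q, r)
print(h)  # h[k-1] = (N - k) / N for the symmetric walk
```

For the symmetric walk the solution matches the classical gambler's-ruin formula, which makes the toy case easy to verify; in the shell method the same solve is applied to the empirically estimated cell-to-cell matrices.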
The shell method is closely related to milestoning [30,35,36] and Markov state models [26][27][28], though more tailored to the problem at hand. In particular, our interest here is in computing the first-passage probabilities rather than in approximating the underlying process. Also, the discretizations of the shells are adaptive, in that they are based on clusters that are derived from an ensemble of samples, as opposed to being crafted for a particular landscape; see Appendix F.
As for the required gradients, these are generally harder to estimate. Nevertheless, for the particular gradient ∇h G,G̃ c , the problem is substantially mitigated by noting that we are interested only in its evaluation on ∂G and ∂G̃, each of which is a level set of h G,G̃ c (h G,G̃ c = 1 on ∂G and 0 on ∂G̃). Consequently, on each surface the gradient is in the normal direction, and we need only estimate its magnitude. For this purpose it is enough to know the values of h G,G̃ c on a surface close to ∂G and interior to G̃\G (for estimating ∇h G,G̃ c on ∂G) and on another surface close to ∂G̃ and also interior to G̃\G (for estimating ∇h G,G̃ c on ∂G̃). Two such surfaces would be ∂S 1 and ∂S n−1 , were we to apply the shell method with R = G and R̃ = G̃, since, as already noted, a by-product of the method is an estimate of h R,R̃ c on all of the shells. Alternatively, in the interest of better accuracy, the method could be run twice, once with S = S 1 , a well-chosen outer approximation of G, and then again with S = S n−1 , a well-chosen inner approximation of G̃.

V. NUMERICAL EXPERIMENTS
We use numerical experiments to test the practical relevance of our theoretical results and of the capacity-estimation algorithm [37]. We used a series of experiments to investigate the accuracy and computational efficiency of the proposed approach to estimating first-passage probabilities. Three questions were addressed empirically, in each of several settings.

Is the first-passage probability function ε-flat? The capacity approach to first-passage probabilities (Theorem 1) is predicated on the assumption that h A,B (x) is ε-flat over a region that is sufficiently removed from the immediate neighborhoods of the targets. Theorem 2 gives asymptotic (small-target or high-dimension) sufficient conditions. How quickly do these asymptotic limits come into play? We estimated, numerically, the extent of ε-flatness in each setting.
How tight are the bounds in Theorem 1? The theorem guarantees that the first-passage probability function is within ε + √(ε/2) of the probability determined by the target-capacity ratios. We estimated the tightness of the bound in each setting.
Is the shell method accurate and computationally efficient? Target capacities can rarely be computed analytically, and hence the applicability of the capacity-based approach to first-passage probabilities depends upon the accuracy and computational efficiency of the shell method (Sec. IV). We examined both in each of many settings.
We looked at problems of the following form. In every case the configuration space Ω was the unit ball in R 5 : Ω = {x ∈ R 5 : |x| ≤ 1}. The diffusion was confined to Ω by a reflecting boundary at ∂Ω, and within Ω the dynamics were assumed to obey the first-order (high-viscosity) Langevin equation (7), where W t is an n-dimensional Brownian motion and U is a potential energy. Although the method applies to multiple targets (as noted earlier), the experiments involved only two, A and B. Given an initial condition X 0 ∈ Ω\(A ∪ B), we are interested in knowing whether the dynamics carry the system into A or B first. We further assumed that U is constant beyond the immediate neighborhoods Ã and B̃ of the targets. In order to establish good estimates for ground truth, we ran a total of 400 000 simulations of (7) until the first passage to A or B, in each of six experiments. This was greatly facilitated by the assumption that ∇U(x) = 0 outside of Ã ∪ B̃, since we could use the "walk-on-spheres" method (cf. Ref. [38]) for the corresponding portions of the trajectory.
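For orientation, a naive Euler-Maruyama baseline for this kind of simulation (U ≡ 0 everywhere, radial reflection at ∂Ω, two illustrative spherical targets; the step size, step budget, and number of runs are illustrative choices, and the paper's ground-truth runs instead use the much faster walk-on-spheres method) can be sketched as follows:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative two-target setup in the unit ball of R^5, with U = 0.
x_A, r_A = np.array([0.5, 0.6, 0.0, 0.0, 0.0]), 0.15
x_B, r_B = np.array([-0.7, 0.0, 0.0, 0.0, 0.0]), 0.15
dt, max_steps = 1e-3, 2_000_000

def hits_A_first(x0):
    """Run one Euler-Maruyama trajectory until it enters A or B."""
    x = x0.copy()
    for _ in range(max_steps):
        x += np.sqrt(dt) * rng.standard_normal(5)   # dX_t = dW_t
        nrm = np.linalg.norm(x)
        if nrm > 1.0:                               # radial reflection at the boundary
            x *= (2.0 - nrm) / nrm
        if np.linalg.norm(x - x_A) < r_A:
            return True
        if np.linalg.norm(x - x_B) < r_B:
            return False
    raise RuntimeError("no first passage within the step budget")

runs = 5
hits = sum(hits_A_first(np.zeros(5)) for _ in range(runs))
print(hits, "of", runs, "runs hit A first")
```

Even in this easy case each trajectory requires tens of thousands of time steps before it finds a target, which is precisely the cost the capacity-based approach avoids.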
We considered two broadly different variations of the setup: one (Brownian diffusion) in which the exact target capacities could be computed, thereby allowing for a direct assessment of the accuracy of the shell method, and the other involving complex energy landscapes in the neighborhoods of the targets. In the latter case, the accuracy of the capacity estimates could only be inferred from the accuracy of the resulting estimates of first-passage probabilities.

A. Brownian diffusion
The first set of experiments tested the capacity-based approach in the simplest possible case: U(x) = 0 for all x outside of the targets. The target centers were fixed at x A = (0.5, 0.6, 0, 0, 0) and x B = (−0.7, 0, 0, 0, 0), the radii r Ã and r B̃ were fixed at 0.2, and the radii r A and r B were varied across three conditions. These were defined as "small targets" (r A = 0.02, r B = 0.04), "medium targets" (r A = 0.05, r B = 0.075), and "large targets" (r A = r B = 0.15). The three cases are illustrated, along with the corresponding results, in Fig. 2. Each of the three questions raised earlier (Is the first-passage probability function ε-flat? How tight are the bounds in Theorem 1? Is the shell method accurate and computationally efficient?) was explored for each of the three target sizes.

Is the first-passage probability function ε-flat?
Theorem 2, although an asymptotic result, suggests that the variability of the first-passage probabilities will decrease with target size. To investigate this relationship we conducted 2000 diffusion simulations at each of 100 randomly selected initial conditions X 0 ∈ Ω\(Ã ∪ B̃). These simulations yielded 100 well-estimated hitting probabilities. The histograms of these probabilities are displayed in Fig. 2. As the targets become smaller, the histograms become more peaked. A crude estimate of the flatness of h A,B (x) is the spread of the 100 estimated probabilities: (1) ε ≈ 0.0295 for the small targets, (2) ε ≈ 0.0420 for the medium targets, and (3) ε ≈ 0.1060 for the large targets.
In other words, the dependence of the hitting probability on the initial condition decreases as the targets shrink, as anticipated.

How tight are the bounds in Theorem 1?
If we were to take these estimates of ε-flatness and apply Theorem 1, as though they were precise measurements, then we would conclude that the approximation error is at most ε + √(ε/2) = 0.0295 + √(0.0295/2) ≈ 0.15 for the small targets, and 0.19 and 0.34 for the medium and large targets, respectively.
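The arithmetic behind these numbers is just the Theorem 1 bound ε + √(ε/2) evaluated at the three estimated flatness values:

```python
from math import sqrt

# Theorem 1 bound evaluated at the three estimated flatness values.
for eps, quoted in [(0.0295, 0.15), (0.0420, 0.19), (0.1060, 0.34)]:
    bound = eps + sqrt(eps / 2)
    print(f"eps = {eps:.4f}: eps + sqrt(eps/2) = {bound:.2f}")
    assert round(bound, 2) == quoted  # matches the quoted bounds
```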
In this special case, in which the diffusion is just a Brownian motion outside of the targets, the capacities can be computed exactly via the formulas in Eq. (10), which are easily derived using the representation in Eq. (4), established in Proposition 1. Together with (9), we then get the corresponding probabilities, which evaluate to 0.1104, 0.2219, and 0.5000 for the small, medium, and large targets, respectively.

FIG. 2. Capacities and first-passage probabilities: Brownian motion. We consider three different cases (small, medium, and large), as illustrated on the left side. In each case we have two targets and are interested in the probability that a Brownian motion will first encounter target A before target B. We consider 100 random initial locations, confined to be outside of the small black circles. For each location, we use 2000 runs of the diffusion to estimate the probability of hitting target A before B. We plot these 100 estimates as a histogram. The histograms reveal that the hitting probabilities fall within a narrow range (0.0295, 0.0420, and 0.1060 for small, medium, and large targets, respectively), especially when the targets are small. Moreover, we see that the location of this narrow range can be accurately predicted by the capacity-based approximation in Theorem 1 [p A from Eq. (9), shown as a dotted red line, 0.1104, 0.2219, and 0.5000 for small, medium, and large targets, respectively]; the capacity-based approximation, derived from purely local computations within the neighborhoods of the targets, always falls inside the span of the histogram. We can further compare the capacity-based value to an estimate of the mean hitting probability for a process initialized at a random location, which is just the relative frequency of hitting A before B in 200 000 additional simulations from random initial conditions (shown as a solid red line, 0.0975, 0.2223, and 0.4903 for small, medium, and large targets, respectively).
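For spherical targets and Brownian motion, the capacity of a pair of concentric spheres has a classical closed form. Assuming the standard annulus expression cap(B r , B R ) ∝ 1/(r 2−n − R 2−n ) in n = 5 dimensions (the proportionality constant cancels in the probability ratio), the quoted values are reproduced exactly:

```python
# Annulus capacity for Brownian motion in n dimensions (assumed
# closed form; the overall constant cancels in the ratio p_A).
n, R = 5, 0.2

def cap(r):
    return 1.0 / (r ** (2 - n) - R ** (2 - n))

for r_A, r_B in [(0.02, 0.04), (0.05, 0.075), (0.15, 0.15)]:
    p_A = cap(r_A) / (cap(r_A) + cap(r_B))
    print(f"r_A={r_A}, r_B={r_B}: p_A = {p_A:.4f}")
# -> p_A = 0.1104, 0.2219, 0.5000 for the three cases
```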
These capacity-based estimates are indicated in Fig. 2 by the dotted red lines, which are superimposed on the corresponding histograms of the 100 empirically derived probabilities.
To the extent that the range of the 100 well-estimated hitting probabilities (2000 samples of the diffusion starting at each of the 100 randomly selected starting locations) is a reasonable estimate of the ε-flatness for a given pair of targets, we are now in a position to evaluate the accuracy of the bound guaranteed by Theorem 1, i.e., the bound in Eq. (8). In particular, in each of the three examples depicted in Fig. 2, the value of p_A (dotted red line) lies within the range of the corresponding histogram. Therefore, in each of the three examples the observed error is no larger than ε itself, which is considerably smaller than the ε + √(ε/2) guaranteed by the theorem.

Is the shell method accurate and computationally efficient?
As noted in Sec. V A 2, the special assumptions in these examples allow us to compute the exact capacities, according to Eq. (10). Hence we can directly measure the accuracy of the approximations computed with the shell method.¹ Table I summarizes the results for each of the three target sizes. By Theorem 1, the capacities enter into an approximation of first-passage probabilities through p_A, as defined in Eq. (9). The third and fourth rows of the table give the exact values of p_A and the corresponding values derived from the shell-estimated capacities, respectively. The differences are small relative to the spread of first-passage probabilities and, in particular, each of the estimated values lies comfortably within the range of estimated probabilities (cf. Fig. 2). The shell method does not introduce any significant additional error for these problems.
Turning now to the question of computational efficiency: how fast is a capacity-estimation approach when compared against a simulation-based approach? There are of course many variables that might affect the comparison, including the dimension and the details of the energies, not to mention the implementation. In our experiments with direct simulations, we used the "walk-on-spheres" method to simulate trajectories in the flat region [38], JIT compilation to remove loop overhead, multi-CPU parallelization, and the coarsest time step that yielded accurate results. (Because the walk-on-spheres method requires a flat energy landscape and Theorem 1 does not, our inferences about computational efficiency are, in this regard, somewhat tilted in favor of the simulation approach.) As for capacity estimation, we made no effort to tune the number of samples or the discretization parameters.
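For reference, the core of a walk-on-spheres simulation is quite compact. The sketch below is our own minimal illustration for plain Brownian motion in free space with two absorbing balls; it omits the reflecting outer boundary used in our experiments and all of the performance machinery (JIT, parallelism) mentioned above, and the cutoff radius is an arbitrary choice of ours:

```python
import numpy as np

def walk_on_spheres(x0, centers, radii, tol=1e-4, max_steps=10_000, rng=None):
    """Index of the absorbing ball a Brownian motion started at x0 hits first,
    estimated by walk-on-spheres: repeatedly jump to a uniform point on the
    largest target-free sphere around the current position.  Free space only;
    a reflecting outer boundary would require extra handling."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(max_steps):
        # Distance from x to the surface of each target ball.
        d = np.array([np.linalg.norm(x - c) - r for c, r in zip(centers, radii)])
        k = int(np.argmin(d))
        if d[k] < tol:          # absorbed at target k
            return k
        if d[k] > 100.0:        # wandered far away; treat the walk as lost
            return -1
        step = rng.normal(size=x.shape)       # uniform direction on the sphere
        x = x + d[k] * step / np.linalg.norm(step)
    return -1

def hit_probability(x0, centers, radii, n_runs=200, rng=None):
    """Fraction of absorbed walks that hit target 0 (i.e., A) first."""
    rng = np.random.default_rng(0) if rng is None else rng
    hits = [walk_on_spheres(x0, centers, radii, rng=rng) for _ in range(n_runs)]
    hits = [h for h in hits if h >= 0]
    return sum(h == 0 for h in hits) / max(len(hits), 1)
```

For a starting point very close to target A the estimate is close to one, as expected, since each jump is exact for Brownian motion and only the absorption tolerance introduces bias.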
In both the Brownian-motion experiments discussed in this section and the experiments with nontrivial landscapes discussed in the next section, we observed that a single run of direct simulation took about as long as estimating two capacities using the shell method, i.e., as long as it takes to compute the capacity-based estimate of the first-passage probability. With this benchmark in hand, the comparison of the two approaches comes down to the question of accuracy: how many direct simulations would be needed to compute the first-passage probability to within an accuracy comparable to what was obtained using capacities?

¹ With reference to Appendix F, the following parameters were used to implement the shell method: m = 2, n = 4, N_p = 100, N_b = 3, N_s = 1000, and a time step of 10⁻⁷.
A single simulation run produces a single Bernoulli variable: one if target A is encountered before target B, and zero otherwise. The total number of ones in a sequence of independent simulations then has a binomial distribution. Hence, given any probability p of the Bernoulli event "one" and any desired accuracy δ, the number of independent simulations needed to achieve a confidence interval of size δ, with 95% confidence, is about n(δ) = 4p(1 − p)/δ². Since a single simulation run takes about as long as our capacity-based estimation of p_A, we can assert that in this sense the capacity-based estimation is approximately n = n(δ) times more efficient.
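The sample-size formula can be evaluated directly; the snippet below simply reproduces the back-of-the-envelope counts quoted in the next paragraphs, with the (p, δ) pairs taken from the measurements reported there:

```python
def runs_needed(p, delta):
    """n(delta) = 4 p (1 - p) / delta^2: simulations needed for a 95%
    confidence interval of size delta around a Bernoulli probability p."""
    return 4.0 * p * (1.0 - p) / delta ** 2

# (p, delta) pairs for the small, medium, and large targets.
counts = [runs_needed(p, d) for p, d in
          [(0.0975, 0.0127), (0.2223, 0.0080), (0.4903, 0.0167)]]
# counts ≈ [2182, 10805, 3584], i.e., roughly 2000, 10 000, and 3500 runs.
```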
What is a reasonable value of δ? Here are two points of view, leading to similar estimates of n: (1) Let p̂_A be the capacity-based estimate of the probability of first encountering A after starting at a given location x ∈ Ω\(Ã ∪ B̃) (indicated, for each of the three experiments, by the dotted red line in Fig. 2), and recall that h_A,B(x) represents the actual probability, which in general depends on the starting location x. Keep in mind that the capacity-based estimator is independent of x. If the starting location were chosen randomly, then an unbiased estimator of the mean absolute error, E[|h_A,B(X) − p̂_A|], for a given value of p̂_A, is obtained by taking the expectation with respect to the empirical probability distribution, which is displayed in the figure by a histogram for each target size. The resulting estimates of the mean absolute errors for the small, medium, and large targets are 0.0127, 0.0080, and 0.0167, respectively. From this point of view (i.e., using these three values for δ), the number of simulation runs from a given location x needed to achieve an error approximately equal to the mean absolute error of the capacity method, and hence the computational advantage of the capacity method, is about 2000 for the small targets, 10 000 for the medium targets, and 3500 for the large targets.
(2) Suppose, instead, that the goal is to estimate the mean first-passage probability, i.e., the average over all starting locations x ∈ Ω\(Ã ∪ B̃). For each of the three target sizes, we used 200 000 simulation runs, each initiated at an independent and randomly chosen starting location, and recorded the relative frequency with which target A was reached before target B. The resulting estimates of the mean first-passage probabilities are reported in the figure and indicated by the solid red lines in the histograms. The difference between these means and the capacity-based probabilities (the distance between the solid and dotted red lines) approximates the accuracy of the capacity-based approach in estimating the mean first-passage probabilities. We used these differences as δ, but slightly inflated to account for the binomial approximation error associated with a sample size of 200 000 (i.e., corrected for the estimation of the mean in the first place). In this way we estimate the computational advantage of the capacity-based method to be about 2000-fold for the small targets, 400 000-fold for the medium targets, and 8000-fold for the large targets.

B. Nontrivial landscape
In the second set of experiments the targets A and B were only partially contained in Ω, and the energy landscape U was nontrivial. We fixed the target centers as illustrated in Fig. 3. We will now revisit the three issues discussed in the previous examples, but in the context of the more complex landscapes surrounding the targets.

Is the first-passage probability function ε-flat?
Following the same procedures used in Sec. V A 1, we estimated ε-flatness from the spreads of the probability histograms:
(1) ε ≈ 0.0530 when the targets are small;
(2) ε ≈ 0.0495 when the targets are medium;
(3) ε > 0.75 when the targets are large.
In the third example, the first-passage probability function h A,B (x) can hardly be called flat. Evidently, the large targets are not subject to the asymptotic guarantees offered in Theorem 2. In this circumstance, Theorem 1 is irrelevant, and therefore we will discuss the remaining questions only in the context of the small and medium targets.

How tight are the bounds in Theorem 1?
In other words, how large is sup_x |h_A,B(x) − p_A|? In the Brownian-motion examples, the capacities were derivable analytically [Eq. (10)], which led to a closed-form expression for p_A. Here, we followed the same reasoning as in Sec. V A 2 to get an approximation of the tightness of the theorem's bound, but replaced p_A by p̂_A, the estimate computed from the shell method. The resulting value of p̂_A, for each target size, is displayed in Fig. 3 as a dotted red line superimposed on the target-probability histogram. For both the small and medium targets, the estimated hitting probability lies within the range of the 100 sampled probabilities, as it did in the case of Brownian motion. Reasoning as we did there, these observations suggest that the true error may be better approximated by ε than by the ε + √(ε/2) guarantee given in the theorem. Specifically, the magnitudes of the ranges, 0.0530 and 0.0495, respectively, serve as ballpark upper bounds on the estimation errors for the first-passage probabilities for the small and medium targets.

Is the shell method accurate and computationally efficient?
How accurate is the shell method in these cases? In the absence of a closed-form expression for the capacities, we are forced to rely on an indirect measure. In Sec. V B 2 we used the shell method to estimate capacities and used those capacities to get hitting-probability estimates. These were in good agreement with the results obtained from 200 000 runs of simulated diffusions (2000 from each of 100 random starting points). This, then, constitutes indirect evidence for the accuracy of the shell method, but with the important proviso that the estimated hitting probabilities depend only on the ratio of the (estimated) capacities, and not on the capacities themselves.
To assess computational efficiency, we repeated the approach used for Brownian motion, step by step as laid out in Sec. V A 3. As noted there, one simulation run costs nearly the same, computationally, as using the shell method to estimate p_A. Therefore, efficiency comparisons between the shell-method (capacity-based) approach and direct simulation come down to computing the number of simulation runs required to achieve comparable accuracy in estimating first-passage probabilities. In Sec. V A 3 we explored two interpretations of accuracy, which we repeat here, using instead the results reported in Fig. 3: (1) For each target size (small and medium) and corresponding value of p̂_A, we used the probability histogram to estimate E[|h_A,B(X) − p̂_A|], the mean absolute error of the capacity-based method. The resulting accuracy measures came to 0.0078 for the small targets and 0.0095 for the medium targets. Then, using the respective probabilities of 0.8217 and 0.3893 (solid red lines), we conclude that in each case about 10 000 simulation runs would be needed to achieve an accuracy comparable to the capacity-based approach. In this sense, the capacity approach, based on shell-method estimates, is about 10 000-fold more efficient than direct simulation.
(2) In the second interpretation, we consider the problem of estimating the mean first-passage probability for a randomly chosen starting position (approximately 0.8154 for the small targets and 0.3893 for the medium targets, based on 200 000 independent random samples). Errors in the capacity-based approach were about 0.0072 and 0.0089, which come from the differences in the positions of the solid red and dotted red lines, but inflated to account for the expected sampling error when using a sample of size n = 200 000.

FIG. 3. The setup here is the same as in Fig. 2 except that the targets A and B are only partially contained in Ω, and the energy landscapes become complicated near the targets. The histograms show that the first-passage probabilities are largely independent of starting location for the small and medium targets, as they were in the previous set of experiments. In contrast, the first-passage probabilities for the large targets depend strongly on starting location. This is expected, since these targets are substantially larger than the corresponding targets from the previous experiments. (The size difference is even bigger than it might appear, given that these figures represent two-dimensional slices of a five-dimensional region.) In contrast to the Brownian-motion examples, the complex landscape in these examples precludes an exact evaluation of p_A [Eq. (9)]. Hence the dotted red line represents the estimated value, p̂_A, computed using the shell method (cf. Fig. 2). Nonetheless, these estimates fall within the span of the respective histograms in all three examples. For small, medium, and large targets, the ranges of the 100 hitting probabilities are 0.0495, 0.0530, and 0.6430, respectively; the capacity-based hitting probabilities (dotted lines) are 0.8217, 0.3815, and 0.5219, respectively; the mean hitting probabilities (solid lines), estimated using 200 000 additional simulations from random initial conditions, are 0.8154, 0.3893, and 0.5069, respectively.
Comparable accuracies would require about 12 000 simulation runs, for each of the small and medium targets. Hence, from this point of view, we would estimate that the capacity-based approach is about 12 000-fold more efficient than direct simulation.

VI. SUMMARY AND DISCUSSION
If a first-passage probability is nearly the same for every initial condition in a region M, then it can be well approximated in terms of certain integrals. We dub these integrals the "first-passage capacities." These capacities can be approximately computed using a milestoning-like technique, as described in Sec. IV. In Theorem 1 we show that the error introduced by this approximation is small whenever hitting probabilities are sufficiently similar across different initial conditions inside a large region, M. When this condition holds, we say that the hitting probabilities are "ε-flat," where the coefficient ε bounds how much the hitting probabilities vary over the initial conditions. Under what circumstances are hitting probabilities similar for all initial conditions in a large region, M? Theorem 2 provides some sufficient conditions, though the concept is more general, and we would expect some form of ε-flatness to hold anytime the mixing rate of the diffusion is fast relative to the typical escape time from M.
The capacity of a given target is local, in the sense that the capacity integral is over a region in the neighborhood of the target. We have devised a Monte Carlo approach to estimating capacities that we call the shell method (Sec. IV). In Sec. V we provide evidence for the accuracy and computational efficiency of using estimated capacities and Theorem 1 to compute first-passage probabilities. In a series of computational experiments we compared our capacity-based approach to the direct estimation of hitting probabilities using repeated runs of the diffusion. In each of these experiments, repeated simulation required at least three orders of magnitude more computation to achieve comparable accuracy.
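Given shell-method estimates of the two capacities, the final probability estimate is a one-liner. The ratio form below is the one suggested by the proof of Theorem 1 in Appendix C; the function name and signature are our own:

```python
def first_passage_estimate(kappa_A, kappa_B):
    """Capacity-based estimate of the probability of hitting target A
    before target B: p_A = kappa_A / (kappa_A + kappa_B)."""
    return kappa_A / (kappa_A + kappa_B)
```

For example, if target A's estimated capacity is one-third of target B's, the estimated probability of hitting A first is 0.25, regardless of the starting location within the ε-flat region.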
We were motivated by the narrow-escape problem, and in particular the potential connection to modeling the folding of large molecules. To the extent that folding can be thought of as a series of discrete steps, each requiring a rate-limiting search for new connections in a large conformational space [9][10][11][12][13], it might be possible to sidestep the exploratory phases by computing first-passage probabilities-one for each potential connection. An example might be the sequence of stem formations in the folding of structural RNA molecules. The idea would be to model the creation of each new stem as the outcome of a first-passage problem, defined by the already established stems. Not only does the search, as opposed to the stem creation, have to be the rate-limiting step, but it must be slow enough to allow for the hitting probabilities to be nearly independent of the starting configuration. But the larger problem would be the possibility of the dissolution of an existing stem. RNA folding is not necessarily monotonic; stems come and go. Which will come first, the undoing of an existing stem or the creation of a new one? Unless the undoing of a stem could be formulated as itself a target or some kind of narrow escape, we would be forced to confront the problem of estimating first-passage times in addition to probabilities.
In general, it may turn out that estimating first-passage times is profoundly more difficult than estimating first-passage probabilities. In this work we showed that the first-passage probabilities are sometimes invariant to the diffusion dynamics on the interior of a large set M, but the first-passage times admit no such invariance. If the dynamics inside of M are genuinely intricate, it appears that estimating hitting times will always require a correspondingly intricate analysis.

ACKNOWLEDGMENT

S.G. was partially supported by the Office of Naval Research under Contracts No. ONR N000141613168 and No. ONR N000141512267.

APPENDIX A: THREE PERSPECTIVES ON HITTING PROBABILITIES
Many of our results are based on the fact that hitting probabilities can be seen from three distinct perspectives. To prepare for these results, we take a moment to review the three perspectives here. Let A, B ⊂ Ω be disjoint, open sets with smooth boundary.
(1) Hitting probabilities: Let τ_S := inf{t : X_t ∈ S} for any set S, and let h_A,B(x) := P(X_τ_A∪B ∈ ∂A | X_0 = x).

(2) Elliptic equation: Let h_dir_A,B(x) denote the solution to the boundary-value problem Lh = 0 on Ω\(A ∪ B), with h = 1 on ∂A, h = 0 on ∂B, and a reflecting (Neumann) condition on ∂Ω, where L is the generator of the diffusion. This solution is unique and smooth [39]. What is more, it is equal to the hitting probability function: h_dir_A,B(x) = h_A,B(x) (cf. Sec. 6.7 of Chen [40]).

(3) Variational form: For any open set S ⊂ Ω, let E_S(f, g) := ∫_S ∇f(x) · ∇g(x) ρ(x) dx denote the "Dirichlet form" of f, g on the domain S. Let L²(S, ρ) denote the Hilbert space of functions on S which are square-integrable with respect to ρ. Let H¹(S, ρ) = W^{1,2}(S, ρ) ⊂ L²(S, ρ) denote the corresponding once weakly differentiable Hilbert-Sobolev space. We define h_var_A,B(x) as the solution to the minimization of E_S(f, f) over f ∈ H¹(S, ρ) subject to f = 1 on ∂A and f = 0 on ∂B, where S = Ω\(A ∪ B). This solution is unique and equal to h_dir_A,B on S (cf. Sec. 4 of Dret [41]). This variational perspective leads us to the notion of the "condenser capacity" associated with h_A,B. It is defined as cap(A, Ω\B) := E_S(h_A,B, h_A,B), where again S = Ω\(A ∪ B) = (Ω\B)\A.
We will use all three of these perspectives to show our results. For example, consider how the hitting-probability perspective helps us show a result about capacities.

Proof. This result is certainly known, but we include a proof here because we were unable to find one in the literature. Since Ã and B̃ are disjoint and X is continuous, the process cannot cross from one to the other without hitting the boundary. Thus we have τ_∂Ã∪∂B̃∪∂A∪∂B = τ_∂Ã∪∂A as long as X_0 ∈ Ã; we get a symmetric result if X_0 ∈ B̃. We can now use this probabilistic perspective to help us understand the capacity by articulating it as the Dirichlet form applied to the relevant hitting-probability functions.

APPENDIX B: PROOF OF THEOREM 2

(1) (Small targets) Fix r, ε > 0. We can then find r′ > 0 with the following property. For any bounded convex set Ω with diameter less than or equal to 1, any x_A, x_B ∈ Ω, any reversible stationary diffusion trapped inside Ω and behaving like Brownian motion for all x ∉ B(x_A, r′) ∪ B(x_B, r′), and any A ⊂ B(x_A, r′), B ⊂ B(x_B, r′), we have that h_A,B is ε-flat on Ω\(B(x_A, r) ∪ B(x_B, r)).
(2) (High dimensions) Fix r, ε > 0 and r′ < r. We can then find n with the following property. For any convex set Ω ⊂ R^n with diameter less than or equal to 1, the analogous ε-flatness holds. To prove these statements, let Z be a Brownian motion trapped inside Ω and coupled to X such that X_t = Z_t for all t ≤ τ := inf{t : X_t ∈ Ȧ ∪ Ḃ} (we can do this because we assumed that X behaves like Brownian motion outside of Ȧ, Ḃ). Let Z_∞ denote the stationary distribution of Z, i.e., the uniform distribution on Ω. Combining Lemma 1 with Dynkin's formula shows that, to make the relevant difference small, it suffices to take t large while keeping P(τ ≤ t) small. In short, it suffices to show that τ is usually big. We prove this differently in the two cases. (1) The first case follows because the targets are vanishing, so it is no surprise that the time to hit them grows without bound. Note, however, that this intuition is valid only because the ambient dimension n is at least three (which we have assumed throughout). A formal proof of this argument can be found in Lemma 2.
(2) The second case is similar: even though the targets maintain the same diameter, their relative volume vanishes because the ambient dimension n is increasing. Thus the targets are effectively becoming smaller, so the hitting time increases. This case is a bit trickier, which is why we imposed the additional requirements above, in which Z is uniformly distributed on Ω.
Proof. Per Loper [42], the time in question may be compared with τ, the first exit time of a one-dimensional Brownian motion from the interval [−ξ, ξ] when initialized at the origin. Dynkin's formula gives that E[τ] = ξ². Hence, by the Markov inequality, P(τ > 4t) ≤ ξ²/4t.

Lemma 2. Let Z denote a Brownian motion trapped inside a convex bounded open set Ω ⊂ R^n with smooth boundary. We further assume that n ≥ 3. Fix x ∈ R^n and r > 0. Let τ_A denote inf{t : Z_t ∈ A} for any set A. We can always find r_1 with the required property.

Proof. First consider the case that x ∈ Ω. To measure the time it takes to hit B(x, r_1), we will make use of two additional concentric spheres around x. First, find r_3 such that B(x, r_3) ⊂ Ω. Second, let r_2 = √(r_1 r_3), an intermediate radius. We will use these spheres to break the trajectory of Z into smaller, manageable pieces. If |Z_0 − x| > r_3, the first part of the trajectory must carry it to a radius of r_2 before it reaches r_1. The next part of the trajectory will then do one of two things: either it will carry on to r_1, or it will first exit the ball of radius r_3. Assuming it exits the ball of radius r_3, the strong Markov property can be used to show that we are essentially in the same place we started. Under this latter condition, the total time will be the time spent so far plus a new variable which has the same properties as the original time. We can use this recursive relation to put a lower bound on the overall time.
Let us make this rigorous. To get a lower bound on the moments of τ, we need an upper bound on the quantity that measures its tails; to derive it, we use two stopping times of interest. With these in hand, the first terms are negligible. However, the other two terms of the denominator are in fact exploding: one to positive infinity and one to negative infinity. To understand this delicate balance, we turn to Lemma 4. Applying this lemma and the asymptotics of the DLMF, we obtain that the denominator is indeed exploding to positive infinity as h = (n − 2)/2 → ∞. Thus, since the numerator vanishes and the denominator explodes to positive infinity, we have that L vanishes as n → ∞.
Lemma 4. Let I_h denote the modified Bessel function of the first kind of order h, and let a > b > 0.
Proof. Recall that I_h may be defined by the series I_h(z) = Σ_{m=0}^∞ (z/2)^{2m+h} / (m! Γ(m + h + 1)).
Since a > b, we have that a^{2m} − b^{2m} is always positive. Thus we can get a lower bound by simply keeping one of the terms. Choosing m = 1, we get our final result.
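The two elementary facts used in Lemma 1 above, E[τ] = ξ² and the Markov bound P(τ > 4t) ≤ ξ²/4t, are easy to check numerically. The sketch below is our own, using a simple Euler discretization of the one-dimensional Brownian motion; the step size and sample count are arbitrary choices:

```python
import numpy as np

def exit_time(xi, dt=2e-3, rng=None):
    """First exit time of a 1D Brownian motion, started at the origin,
    from the interval [-xi, xi], approximated with an Euler scheme."""
    rng = np.random.default_rng() if rng is None else rng
    w, t = 0.0, 0.0
    while abs(w) < xi:
        w += np.sqrt(dt) * rng.normal()   # Brownian increment over dt
        t += dt
    return t

rng = np.random.default_rng(0)
taus = np.array([exit_time(1.0, rng=rng) for _ in range(500)])
mean_tau = taus.mean()          # Dynkin's formula predicts E[tau] = xi^2 = 1
tail = (taus > 4.0).mean()      # Markov: P(tau > 4t) <= xi^2 / (4t) = 0.25 at t = 1
```

The empirical mean lands near 1 (up to discretization bias of order √dt), and the empirical tail is far below the crude Markov bound, as expected.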

APPENDIX C: PROOF OF THEOREM 1
Let A ⊂ Ã ⊂ Ω and B ⊂ B̃ ⊂ Ω. Let Ã, B̃ be disjoint and h_A,B(x) be ε-flat with respect to Ω\(Ã ∪ B̃). We assume the set boundaries are all smooth.
Under these conditions, we will show that we can use local capacities to get good approximations for h_A,B(x) when x ∉ Ã ∪ B̃. Our key idea is to derive upper and lower bounds on the value of the Dirichlet form applied to this function, E(h_A,B, h_A,B). We will see that these bounds can be understood in terms of local capacities, and the resulting inequalities will then yield our main result in the form of Theorem 1.

Proof. We recall from Appendix A that cap(A, Ω\B) ≤ E_S(u, u) for any u with u(∂A) = 1, u(∂B) = 0. Thus, to prove an upper bound it suffices to find any such function for which we can calculate E_S(u, u). To this end, consider the family of functions u_c built piecewise from local hitting probabilities. These functions are well suited to giving us upper bounds on E_S(h_A,B, h_A,B). Indeed: (1) u_c(∂A) = 1, u_c(∂B) = 0. In fact, u_c takes a constant value c outside of Ã, B̃, drops smoothly in B̃ to achieve 0 on ∂B, and rises smoothly in Ã to achieve 1 on ∂A.
(2) Noting that u_c is written as a piecewise combination of hitting-probability functions, we see that its Dirichlet form can be calculated in terms of capacities on local regions. Thus the u_c functions give us a practical way to calculate upper bounds, and the resulting inequality holds for any value of c. To get the best bound, we can take derivatives to minimize the right-hand side with respect to c. The result is c* = cap(A, Ã)/(cap(A, Ã) + cap(B, B̃)).
Plugging this into the previous equation, we obtain our final result.
Our assumption that m ≤ ε/2 implies that (1 − m − ε/2)² ≥ (1 − ε)², and thus in fact we have κ_A/(κ_A + κ_B) ≤ 2ε − ε² ≤ 2ε. Thus we assumed m ∈ [0, ε/2] and obtained a bound on κ_A/(κ_A + κ_B), and hence a bound valid for any x ∉ Ã ∪ B̃. If m > 1 − ε/2, the same bound can be achieved by arguments symmetric to those employed for m ≤ ε/2. Our final result follows by noting that all of these bounds are upper bounded by ε + √(ε/2).
Together with Lemma 7, this yields the claimed identity, where in the last step we used h_A,Ã^c(x) = 1 for x ∈ ∂A and h_A,Ã^c(x) = 0 for x ∈ ∂Ã. Note that the normal vector on the right-hand side points out of the set A, as is our convention; hence the negative sign. Next we apply Lemma 7 again and combine the result with Eq. (D1). Using the facts that h_G,G^c(x) = 1 for x ∈ ∂G, h_G,G^c(x) = 0 for x ∈ ∂G̃, and Lh_G,G^c = 0, we apply Lemma 7 two more times to obtain the result.

Corollary. For any region S having a smooth boundary ∂S and such that A ⊂ S ⊂ Ã, cap(A, Ã) can be expressed as a flux leaving S, where a and H(dx) are as defined in the proposition and n is the outward-facing normal on ∂S.

APPENDIX E: INEQUALITY BOUNDARY CONDITIONS FOR THE VARIATIONAL FORM
Recall that h_A,B(x) may be defined variationally, as the minimizer of the Dirichlet form subject to equality boundary conditions. It is natural to consider an apparently different problem, in which the equality boundary conditions are replaced with inequalities. Here we show that such a relaxation cannot achieve a value lower than cap(A, Ω\B). Proof. This result is certainly known, but we include a proof here because we were unable to find one in the literature. Let k = clamp(h, 0, 1), i.e., the function h truncated to lie in [0, 1].

APPENDIX F: SHELL METHOD
The algorithm for estimating local hitting probabilities is outlined as follows:

Algorithm 1. Estimating h_R,R̃^c(x) for many values of x on a shell ∂S.

Input: R ⊂ S ⊂ R̃ ⊂ Ω and a stationary reversible diffusion process {X_t}_{t≥0} in Ω with invariant measure μ = e^{−U(x)} dx. We also require a series of subsets which indicate a kind of reaction coordinate.
Output: A collection of points z_1, ..., z_{N_p} on ∂S, sampled from the invariant measure μ = e^{−U(x)} dx restricted to ∂S, along with an estimate of h_R,R̃^c(z_i) for each point.
(1) Discretize the space. (d) For each of the shells ∂S_1, ..., ∂S_{n−1}, cluster the N_p samples on that shell into N_b states. In our implementation we use k-means and represent the N_b states by the N_b centroids returned by the algorithm. The result of this step is a partitioning of each shell ∂S_i into N_b regions, representing an adaptive discretization of the shells. For a point on a shell ∂S_i, we identify its corresponding discrete state by finding the nearest centroid.
(2) Estimate the transition probabilities between these discrete states by running an additional N_s local simulations for each of the N_b states on each shell. The result of this step is an estimate of the probability of transitioning from state k on ∂S_i to state l on ∂S_j, which we denote by P^{(i,j)}_{k,l}, where k, l ∈ {1, ..., N_b} and i, j ∈ {1, ..., n − 1} with |i − j| = 1.

(3) Use the transition probabilities to get an estimate of the hitting probabilities for the N_b states on ∂S. In line with related work on Markov state models [26][27][28], we approximate the continuous dynamics using closed-form calculations on the discrete Markov chain developed in the previous two steps. In particular, we estimate overall hitting probabilities using the standard "one-step analysis." For any k ∈ {1, ..., N_b} and i ∈ {1, ..., n − 1}, let u^{(i)}_k denote the probability of hitting ∂R = ∂S_0 before hitting ∂R̃ = ∂S_n if we start the discretized process at state k on ∂S_i. We can calculate our object of interest by solving the matrix difference equation u^{(i)} = P^{(i,i+1)} u^{(i+1)} + P^{(i,i−1)} u^{(i−1)}, i = 1, ..., n − 1, with boundary conditions u^{(0)} = 1 and u^{(n)} = 0, where 0 and 1 are vectors of all 0's and 1's. This gives the estimated hitting probability for each discrete state. We then estimate the hitting probability of each point z_i as the value u^{(j)}_k associated with its nearest centroid.
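The one-step analysis in step (3) reduces to a block-tridiagonal linear system, which can be assembled and solved directly. Below is our own minimal sketch; `P`, a dictionary mapping shell-index pairs to transition matrices, is a hypothetical input format:

```python
import numpy as np

def solve_hitting_probabilities(P, n, n_b):
    """Solve u^(i) = P[(i,i+1)] u^(i+1) + P[(i,i-1)] u^(i-1), i = 1..n-1,
    with boundary conditions u^(0) = 1 and u^(n) = 0 (one-step analysis).
    P maps pairs (i, j) with |i - j| = 1 to (n_b x n_b) transition matrices."""
    m = (n - 1) * n_b                       # unknowns: states on shells 1..n-1
    A = np.eye(m)
    b = np.zeros(m)
    for i in range(1, n):
        r = slice((i - 1) * n_b, i * n_b)
        if i - 1 >= 1:                      # coupling to the inner shell
            A[r, (i - 2) * n_b:(i - 1) * n_b] -= P[(i, i - 1)]
        else:                               # inner boundary: u^(0) = 1
            b[r] += P[(i, i - 1)] @ np.ones(n_b)
        if i + 1 <= n - 1:                  # coupling to the outer shell
            A[r, i * n_b:(i + 1) * n_b] -= P[(i, i + 1)]
        # outer boundary: u^(n) = 0 contributes nothing to b
    return np.linalg.solve(A, b).reshape(n - 1, n_b)   # row i-1 holds u^(i)
```

As a sanity check, with N_b = 1 and symmetric jump probabilities the chain is a simple random walk on shells 0, ..., n, and the solution reduces to the classical gambler's-ruin probabilities u^(i) = 1 − i/n.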

APPENDIX G: DETAILS ON ENERGY FUNCTION
For the energy function, we hand designed two different kinds of landscape: a random well energy, which we use for the region around target A, and a random crater energy, which we use for the region around target B. The basic components of these energy functions are the well component, in which d_w gives the depth of the well; the crater component, in which d_c and h give the depth and the height of the crater, respectively; and finally a random component, parametrized by μ = (μ_ij)_{m×d} and σ = (σ_ij)_{m×d}, with μ_i = (μ_i1, ..., μ_id), i = 1, ..., m, being the locations of m Gaussian random bumps in the region around the targets, and σ_ij, i = 1, ..., m, j = 1, ..., d, giving the corresponding standard deviations.
To make sure the energy function is continuous, and that its different components are balanced, we introduce a mollifier. Intuitively, for the well component we use a fourth-order polynomial to get a well-like energy landscape around the target that is continuous and differentiable at the boundary. Similarly, for the crater component we use a sixth-order polynomial to get a crater-like energy landscape around the target that is also continuous and differentiable at the boundary. For the random component, we are essentially placing a number of Gaussian bumps around the target. And the mollifier is designed to be almost exactly 1 around the target until it comes to the outer boundary, where it transitions smoothly and swiftly to 0. To summarize, given parameters