Learning force fields from stochastic trajectories

When monitoring the dynamics of stochastic systems, such as interacting particles agitated by thermal noise, disentangling deterministic forces from Brownian motion is challenging. Indeed, we show that there is an information-theoretic bound, the capacity of the system when viewed as a communication channel, that limits the rate at which information about the force field can be extracted from a Brownian trajectory. This capacity provides an upper bound to the system's entropy production rate, and quantifies the rate at which the trajectory becomes distinguishable from pure Brownian motion. We propose a practical and principled method, Stochastic Force Inference, that uses this information to approximate force fields and spatially variable diffusion coefficients. It is data efficient, including in high dimensions, robust to experimental noise, and provides a self-consistent estimate of the inference error. In addition to forces, this technique readily permits the evaluation of out-of-equilibrium currents and the corresponding entropy production with a limited amount of data.

From nanometer-scale proteins to micron-scale colloids, particles in biological and soft matter systems undergo Brownian dynamics [1,2]: their deterministic motion due to the forces competes with the random diffusion due to thermal noise from the solvent. At a larger scale, the overdamped Langevin equation describing Brownian dynamics is commonly used as an effective model for the stochastic evolution of complex systems such as motile cells [3], financial markets [4] or climate dynamics [5], where the noise corresponds to the random influence of fast, unresolved degrees of freedom, while force fields model persistent, deterministic trends. In the absence of forces, all trajectories would thus look alike (Fig. 1A): the force field simultaneously shapes a system's trajectory (Fig. 1B-C) and encompasses most physical information about the system. The inference of such force fields from experimental data is therefore crucial to problems as varied as understanding the dynamics of single molecules in complex cellular environments [6,7], quantifying the interactions between self-propelled colloidal particles [8], calibrating devices to optically trap particles [9], or identifying the laws governing the motion of cells [10]. This problem is particularly relevant in the context of living or driven out-of-equilibrium systems, where active forces induce dissipative currents at the mesoscale [11]. Knowledge of the force field in such cases would make it possible to measure the mean entropy production rate and thus quantify the irreversibility of the dynamics, a question which has gained attention recently [12][13][14][15][16][17]. Moreover, it would also enable one to measure the fluctuations of heat, work and entropy production, the subject of stochastic thermodynamics [18], which is so far only possible in highly controlled systems [19].
Numerous previous studies have proposed methods to reconstruct force fields, motivated by applications in soft matter [20], cell biology [21-23], climate dynamics [24,25], finance [26][27][28][29][30] and other complex systems [31]. However, force inference in Brownian systems remains a hard problem, and a general method is still missing, in particular one addressing the many challenges associated with experimental data in soft matter and biological systems. First, there needs to be enough information about the force available in the trajectory: short trajectories are dominated by noise (Fig. 1A), and only after a long enough observation time does the effect of the force field become apparent (Fig. 1B). Second, one needs a practical method to extract that information and reconstruct the force field, which is challenging for out-of-equilibrium systems with a complex spatial structure (Fig. 1C), in particular for high-dimensional processes (Fig. 1D-F) and in the presence of measurement error (Fig. 1E) and multiplicative noise (Fig. 1G).

Figure 1 (panels C-G). C. The stochastic Lorenz process (see Fig. 5). D. Time series of a 6D out-of-equilibrium Ornstein-Uhlenbeck process (see Fig. 4). E. The same trajectories as in D, with additional time-uncorrelated measurement noise. F. Self-propelled active Brownian particles with soft repulsion and harmonic confinement (see Fig. 7). G. Simulated single-molecule trajectories in a complex environment with space-dependent diffusion (see Fig. 9).
Here we address these challenges for steady-state Brownian trajectories. We first use communication-theory tools to quantify the maximal rate at which information about a force field can be inferred from a trajectory (Sec. I). We relate this rate, which we term the channel capacity of the system, to the entropy production rate, thus providing a novel link between stochastic thermodynamics and information theory. We then propose a practical procedure, Stochastic Force Inference (SFI), to use the information in a trajectory to reconstruct the force field by projecting it onto a finite-dimensional functional space (Sec. II). By inferring the information contained in a trajectory, we propose a practical criterion to control overfitting, an aspect generally overlooked by previous approaches. We ensure that this method is robust to the presence of experimental noise. Finally, the diffusion coefficient can depend on the state of the system, which significantly complicates force inference: in such cases, we adapt our method to infer the space-dependent diffusion and force field (Sec. III). Using simple model stochastic processes, we demonstrate that our method permits a quantitative evaluation of phase space forces, currents and diffusion coefficients, and estimates the entropy production with a minimal amount of data.
We focus in this article on stochastic systems governed by the overdamped Langevin equation, where friction dominates over inertia, as is typically the case in subcellular biological systems for instance. We thus consider a system where the phase space coordinates $x_\mu$ obey Brownian dynamics,

$\dot x_\mu = F_\mu(x) + \sqrt{2D}_{\mu\nu}\,\xi_\nu$,    (1)

where $F_\mu(x)$ is the force field (we absorb the mobility matrix in its definition), $D_{\mu\nu}$ is the diffusion tensor, and $\xi_\mu$ is a Gaussian white noise with $\langle \xi_\mu(t)\xi_\nu(t')\rangle = \delta_{\mu\nu}\delta(t - t')$.
In the first two sections of this article, we assume that D µν is space-independent and known [32,33]; in the third section we address the case of inhomogeneous diffusion, which modifies Eq. 1.
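As a concrete starting point, the dynamics of Eq. 1 with constant isotropic diffusion can be integrated with a simple Euler-Maruyama scheme. The sketch below (function names and all parameter values are illustrative choices, not taken from the article) simulates a 2D harmonic trap, i.e. an Ornstein-Uhlenbeck process, whose steady state has variance D per component.

```python
import numpy as np

# A minimal Euler-Maruyama integrator for the overdamped Langevin dynamics of
# Eq. 1, x_dot = F(x) + sqrt(2D) xi, with constant isotropic diffusion. The
# force field and all parameter values below are illustrative.
def simulate(force, x0, D=1.0, dt=1e-3, n_steps=200_000, seed=0):
    """Integrate dx = F(x) dt + sqrt(2 D dt) N(0,1) and return the trajectory."""
    rng = np.random.default_rng(seed)
    x = np.empty((n_steps + 1, len(x0)))
    x[0] = x0
    noise = np.sqrt(2 * D * dt) * rng.standard_normal((n_steps, len(x0)))
    for i in range(n_steps):
        x[i + 1] = x[i] + force(x[i]) * dt + noise[i]
    return x

# Example: a 2D harmonic trap F(x) = -x (an Ornstein-Uhlenbeck process);
# its steady state satisfies <x^2> = D per component.
traj = simulate(lambda x: -x, x0=np.array([1.0, 0.0]))
print(traj[100_000:].var())
```

Discarding the first half of the run removes the transient, so the empirical variance should be close to D = 1.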

I. THE INFORMATION CONTENT OF BROWNIAN TRAJECTORIES
We propose to interpret Brownian dynamics (Eq. 1) as a noisy transmission channel, where the force is the encoded signal and $\sqrt{2D}\xi$ is the noise (Fig. 2). Information can be read out from such a channel at a maximal rate C, called the channel capacity, which relates to the signal-to-noise ratio of the input [34]. This fundamentally limits the ability to infer forces by monitoring the dynamics. To build up intuition, consider the simplest case of a spatially constant force with isotropic diffusion, corresponding to drifted Brownian motion (Fig. 1B). The capacity is then given by $C = F^2/4D$ (expressed in natural information units, or nats, per time unit; 1 nat = $1/\log 2$ bits). The force to infer is here equal to the persistent velocity, which can be estimated as $\hat F_\mu = \Delta x_\mu/\tau$, where $\Delta x$ is the end-to-end vector along the trajectory of duration $\tau$. The relative error on this estimator due to random diffusion is $\langle\|\hat F - F\|^2\rangle/F^2 = 2dD/\tau F^2 = d/2I$, where d is the space dimension. We have identified here $I = C\tau$, defining it as the information in the trajectory. Persistent motion thus starts to emerge from the noise if the trajectory duration $\tau$ is longer than $d/C$, corresponding to the diffusive-to-persistent transition for the mean-squared displacement. Equivalently, the force starts to be resolved if $I > d$, i.e. if more than one nat of information is available for each degree of freedom $\hat F_\mu$ to infer.
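The error formula for the drifted-Brownian-motion example can be checked numerically. In the sketch below (parameter values are illustrative), the end-to-end displacement of the Brownian part is sampled exactly as a Gaussian of variance $2D\tau$ per component, and the mean squared relative error of the estimator is compared to $d/(2I)$.

```python
import numpy as np

# Numerical check of the drifted-Brownian-motion example: the constant force
# is estimated as F_hat = Delta_x / tau, and the mean squared relative error
# should equal 2 d D / (tau F^2) = d / (2 I), with I = C tau, C = F^2 / (4 D).
# All parameter values are illustrative.
rng = np.random.default_rng(1)
d, D, tau = 2, 1.0, 40.0
force = np.array([0.5, 0.0])
F2 = np.sum(force**2)

# end-to-end Brownian displacement is exactly Gaussian with variance 2 D tau,
# so F_hat = force + Gaussian noise of variance 2 D / tau per component
n_traj = 4000
F_hat = force + np.sqrt(2 * D / tau) * rng.standard_normal((n_traj, d))

rel_err = np.mean(np.sum((F_hat - force) ** 2, axis=1)) / F2
I = (F2 / (4 * D)) * tau                 # information I = C tau, in nats
print(rel_err, d / (2 * I))              # both ~ 0.4 for these parameters
```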
We now give a precise meaning to the notion of capacity for general Brownian systems, where inter-particle interactions and external fields lead to a force that depends on the position x in phase space. We recognize that within communication theory, the dynamics of a Brownian system (Eq. 1) corresponds to an infinite-bandwidth Gaussian channel [34]. The signal transmitted is the force, with signal power equal to its time-averaged square. The corresponding channel's capacity, which we refer to as the system's capacity, is thus (see Appendix A)

$C = \frac{1}{4}\int F_\mu(x)\, D^{-1}_{\mu\nu}\, F_\nu(x)\, P(x)\,\mathrm{d}x$,    (2)

where P(x) is the steady-state probability distribution function of the process, and we use the Einstein convention of summation over repeated indices throughout. This quantity was previously considered as a penalty term to regularize force inference [35]. The steady-state Fokker-Planck equation allows us to decompose the force into a sum of two terms,

$F_\mu(x) = v_\mu(x) + D_{\mu\nu}\,\partial_\nu \log P(x)$,    (3)

where $v_\mu$ is the average phase space velocity, quantifying the presence of irreversible currents, and $D_{\mu\nu}\partial_\nu \log P$ quantifies reversible, diffusive currents. Interestingly, this implies that the capacity defined in Eq. 2 decomposes into two non-negative parts, one related to dissipation and the other to spatial structure, as

$4C = \dot S + G$.    (4)

Here $\dot S$ is the steady-state entropy production of the process [18], $\dot S = \int v_\mu D^{-1}_{\mu\nu} v_\nu P(x)\,\mathrm{d}x$ (we set the Boltzmann constant $k_B = 1$ throughout). In the case of thermal systems satisfying the Einstein relation, $\dot S$ corresponds to the rate at which the system dissipates heat into the bath, divided by the temperature; in other cases, $\dot S$ quantifies the irreversibility of the dynamics.

Figure 2. The dynamics of an overdamped system can be seen as a noisy data transmission channel, encoding information about the force field, with a rate bounded by the channel capacity C as defined in Eq. 2. Note that this definition does not include the information loss stemming from the measurement device. This analogy is further discussed in Appendix A.
The second term, named inflow rate, $G = \int g_\mu D_{\mu\nu} g_\nu P(x)\,\mathrm{d}x$ with $g_\mu = \partial_\mu \log P$, was previously introduced and studied in Ref. [36]. It reflects the amount of information that the force field injects into the system in order to maintain probability gradients against diffusion, and is positive even at equilibrium. Indeed, in a thought experiment where the force field would be suddenly switched off, G would correspond to the instantaneous entropy production rate due to the relaxation of probability gradients (see Appendix B 2). The inflow rate quantifies the fact that in steady state, the system dwells in convergent regions of the force field: an equivalent expression for it is indeed [36] $G = -\int \partial_\mu F_\mu(x) P(x)\,\mathrm{d}x$. In a deterministic system, it would thus correspond to the average phase space contraction rate. The connection between the inflow rate and the previously introduced notions of traffic and frenesy [37,38] is explored in Appendix B 3. As $G \geq 0$, Eq. 4 provides a generic upper bound to the entropy production in Brownian systems, $\dot S \leq 4C$.
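The decomposition $4C = \dot S + G$ can be verified on the simplest example. The sketch below (illustrative parameters) considers a 1D equilibrium process $F(x) = -kx$ with constant D: the current vanishes, so $\dot S = 0$ and the entire capacity is inflow, $4C = G = k$. Averages over the Gaussian steady state are evaluated by Gauss-Hermite quadrature.

```python
import numpy as np

# Check of the decomposition 4C = S_dot + G (Eq. 4) on a 1D equilibrium
# process F(x) = -k x with constant D: v = 0, so S_dot = 0 and the whole
# capacity is inflow, 4C = G = k. Parameters are illustrative.
k, D = 2.0, 0.5
var = D / k                                # steady-state variance of P

# Gauss-Hermite quadrature (probabilists' convention) for Gaussian averages
nodes, weights = np.polynomial.hermite_e.hermegauss(40)
x = nodes * np.sqrt(var)
w = weights / weights.sum()                # normalized quadrature weights

F = -k * x
g = -x / var                               # g = d(log P)/dx for a Gaussian
C = np.sum(w * F * F / D) / 4              # capacity, Eq. 2
G = np.sum(w * g * D * g)                  # inflow rate
print(4 * C, G)                            # both equal k = 2
```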
The decomposition of the information into dissipative and structural contributions introduced in Eq. 4 can be expressed at the level of individual trajectories in phase space. Indeed, the entropy production rate corresponds to the rate at which trajectories, $\mathcal C = \{x(t)\}_{t=0..\tau}$, become distinguishable from their time-reversed version, $-\mathcal C = \{x(\tau - t)\}_{t=0..\tau}$, as quantified by the Kullback-Leibler divergence rate [18]: $\dot S = \lim_{\tau\to\infty}\frac{1}{\tau}\langle\log[\mathcal P(\mathcal C|F)/\mathcal P(-\mathcal C|F)]\rangle_F$. Here $\mathcal P(\mathcal C|F)$ is the probability that the system follows a trajectory $\mathcal C$ under Brownian dynamics (Eq. 1) in the force field F, and $\langle\cdot\rangle_F$ corresponds to averaging over all possible trajectories $\mathcal C$ with weight $\mathcal P(\mathcal C|F)$. Time reversal $(\mathcal C, F) \to (-\mathcal C, F)$ changes the sign of the heat produced along the trajectory, and thus connects dissipation and irreversibility of the dynamics. Interestingly, a similar expression can be derived for the inflow rate [36]: $G = \lim_{\tau\to\infty}\frac{1}{\tau}\langle\log[\mathcal P(\mathcal C|F)/\mathcal P(-\mathcal C|-F)]\rangle_F$, where $-F$ corresponds to the reversed force field. Indeed, the operation $(\mathcal C, F) \to (-\mathcal C, -F)$ now leaves the heat unchanged, but reverses the sign of the divergence of the force. At equilibrium, this corresponds to inverting the energy landscape: for a typical trajectory that dwells in potential wells, the reverse trajectory is atypical in the force field $-F$, as it spends time around unstable maxima of energy. Finally, the capacity can be expressed as $4C = \lim_{\tau\to\infty}\frac{1}{\tau}\langle\log[\mathcal P(\mathcal C|F)/\mathcal P(\mathcal C|-F)]\rangle_F$: this operation reverses both heat and force divergence. Intuitively, there is information about the force in a trajectory if it allows us to distinguish the force field from its reverse. More naturally, the capacity quantifies the rate at which a trajectory becomes distinguishable from force-free Brownian motion: indeed, it can be written as

$C = \lim_{\tau\to\infty}\frac{1}{\tau}\langle\log[\mathcal P(\mathcal C|F)/\mathcal P(\mathcal C|0)]\rangle_F$,    (5)

i.e. as the trajectory-wise information gain about the force field.
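The trajectory-level expression for $\dot S$ can be tested directly. For the discretized dynamics with a linear force, the log-ratio $\log[\mathcal P(\mathcal C|F)/\mathcal P(-\mathcal C|F)]$ reduces to the Stratonovich heat integral $\frac{1}{\tau}\sum_i F(x_{\mathrm{mid}})\cdot D^{-1}\Delta x$. The sketch below (illustrative parameters) uses a 2D linear force with circulation, $F(x) = -\Omega x$ with $\Omega = k\,\mathrm{Id} + \omega\,\epsilon$ and $D = \mathrm{Id}$, for which $\dot S = 2\omega^2/k$ exactly.

```python
import numpy as np

# Entropy production as a path-probability log-ratio: for the discretized
# dynamics with a linear force, (1/tau) log[P(C|F)/P(-C|F)] is exactly the
# Stratonovich heat integral (1/tau) sum_i F(x_mid) . D^{-1} dx.
# Illustrative 2D process with circulation; exact S_dot = 2 omega^2 / k = 8.
rng = np.random.default_rng(2)
k, omega, D, dt, n = 1.0, 2.0, 1.0, 0.005, 200_000
Omega = np.array([[k, -omega], [omega, k]])

x = np.zeros(2)
S = 0.0
for _ in range(n):
    dx = -(Omega @ x) * dt + np.sqrt(2 * D * dt) * rng.standard_normal(2)
    S += -(Omega @ (x + dx / 2)) @ dx / D      # F(x_mid) . D^{-1} dx
    x += dx
S_dot_hat = S / (n * dt)
print(S_dot_hat)        # approaches 2 omega^2 / k = 8
```

For a linear force, evaluating F at the midpoint is identical to averaging the endpoint forces, which is what the ratio of Gaussian transition densities produces.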

II. STOCHASTIC FORCE INFERENCE
A trajectory of finite duration contains finite information, quantified by Eq. 5. We now show how to use this information in practice and reconstruct the force field through Stochastic Force Inference (SFI). In contrast with drifted Brownian motion, a spatially variable force field is in principle characterized by an infinite number of degrees of freedom: the force value at each point in space. With a finite trajectory, only a finite number of combinations of these degrees of freedom can be estimated. It is therefore natural to approximate the force field as a linear combination of a finite basis of $n_b$ known functions $b = \{b_\alpha(x)\}_{\alpha=1..n_b}$. The force can, in principle, be approximated arbitrarily well by using a large enough set of functions from a complete basis, such as polynomials or Fourier modes. Alternatively, a limited number of functions might suffice if an educated guess for the functional form of the force field can be made. We propose to perform this approximation by projecting the force field onto the space spanned by the $b_\alpha(x)$ using the steady-state probability distribution function P as a measure. This corresponds to a least-squares fit of the force field by linear combinations of the $b_\alpha$'s. To this aim, we define the projectors $c_\alpha(x) = B^{-1/2}_{\alpha\beta} b_\beta(x)$, where $B_{\alpha\beta} = \int b_\alpha b_\beta P(x)\,\mathrm{d}x$ is an orthonormalization matrix such that $\int c_\alpha c_\beta P(x)\,\mathrm{d}x = \delta_{\alpha\beta}$. Our approximation of the force field is then $F_\mu(x) \approx F_{\mu\alpha} c_\alpha(x)$, with the projection coefficient

$F_{\mu\alpha} = \int F_\mu(x)\, c_\alpha(x)\, P(x)\,\mathrm{d}x$.    (6)

This is akin to projecting the dynamics onto a finite-dimensional sub-channel of capacity $C_b = \frac{1}{4} D^{-1}_{\mu\nu} F_{\mu\alpha} F_{\nu\alpha}$. Similarly, we can define the projection $v_{\mu\alpha}$ of the phase space velocity. The corresponding entropy production $\dot S_b = D^{-1}_{\mu\nu} v_{\mu\alpha} v_{\nu\alpha}$ is then a lower bound to the total entropy production. Interestingly, for a system obeying Brownian dynamics (Eq. 1) but where only a subset of degrees of freedom can be observed, our framework gives the force averaged over hidden variables, and provides a lower bound on the entropy production limited to the observable currents (see Appendix E).
The projected force field has a finite number of degrees of freedom $N_b = d n_b$, one per element of the $d \times n_b$ tensor $F_{\mu\alpha}$, and corresponds to a finite capacity $C_b$. Inferring the approximate force with a finite trajectory is thus in principle possible if the information $I_b = C_b\tau$ exceeds the number of degrees of freedom $N_b$. However, the force coefficients introduced in Eq. 6 are not directly accessible from experimental data. Indeed, neither the force nor the probability distribution function P are known, the latter being also required in the definition of the orthonormal projectors $c_\alpha$. Instead, the available data is typically a discrete time series $x(t_i)$ of phase space positions, at sampling times $t_i = i\Delta t$. We thus propose to estimate phase space averages by discrete time integrals along the trajectory. The empirical projectors are defined as $\hat c_\alpha = \hat B^{-1/2}_{\alpha\beta} b_\beta$, with $\hat B_{\alpha\beta} = \frac{1}{\tau}\int_0^\tau b_\alpha(x(t))\, b_\beta(x(t))\,\mathrm{d}t$. Furthermore, the force can be expressed in terms of a local Itô average of $\dot x$ [39]: a local estimator for the force at $x(t_i)$ is thus $\Delta x(t_i)/\Delta t$, with $\Delta x(t_i) = x(t_{i+1}) - x(t_i)$. Combining these two insights yields an operational definition for the estimator of Eq. 6 in terms of a discrete Itô integral (see Appendix C),

$\hat F_{\mu\alpha} = \frac{1}{\tau}\sum_i \hat c_\alpha(x(t_i))\, \Delta x_\mu(t_i)$,    (7)

which is the discretized version of the Itô integral $\frac{1}{\tau}\int_0^\tau \hat c_\alpha(x(t))\,\mathrm{d}x_\mu(t)$. Substituting Eq. 1 into Eq. 7 splits the estimator into a force term and a noise term involving increments $\Delta\xi_i$, where $\Delta\xi_i$ is independent of $x(t_i)$: in the long trajectory limit, the main contribution comes from the force, while the noise averages to zero. Equation 7 corresponds to a linear regression of the local force estimator, previously suggested for one-dimensional systems [27], and coincides with the maximum-likelihood estimator of the force projection coefficients. The typical squared relative error on the inferred coefficients due to the diffusive noise can be estimated in practice as $N_b/(2\hat I_b)$, where $\hat I_b = \frac{\tau}{4} D^{-1}_{\mu\nu}\hat F_{\mu\alpha}\hat F_{\nu\alpha}$ is the empirical estimate of the information contained in the trajectory. This formula indicates that again, in order to resolve the force coefficients, the information in the trajectory should exceed the number of inferred parameters.
Another source of error stems from the fact that the force varies over a finite time step ∆t; we provide an estimator for the magnitude of the resulting bias in Appendix F.
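The core of the scheme, orthonormalizing a basis against the empirical measure and projecting the increments through a discrete Itô integral (Eq. 7), can be sketched in a few lines. The example below (illustrative parameters) uses a 1D Ornstein-Uhlenbeck process $F(x) = -kx$ with the basis $b = \{1, x\}$, which contains the exact force field.

```python
import numpy as np

# Sketch of the SFI estimator of Eq. 7: orthonormalize a function basis
# against the empirical measure, then project the local force estimate dx/dt
# through a discrete Ito integral. 1D OU process F(x) = -k x with basis
# b = {1, x}; all parameters are illustrative.
rng = np.random.default_rng(3)
k, D, dt, n = 1.0, 1.0, 0.01, 200_000

x = np.empty(n + 1)
x[0] = 0.0
for i in range(n):
    x[i + 1] = x[i] * (1 - k * dt) + np.sqrt(2 * D * dt) * rng.standard_normal()
xs, dxs, tau = x[:-1], np.diff(x), n * dt

b = np.stack([np.ones(n), xs])                   # basis values, shape (n_b, n)
B = b @ b.T / n                                  # empirical Gram matrix
c = np.linalg.inv(np.linalg.cholesky(B)) @ b     # one valid choice of B^{-1/2} b
F_coef = c @ dxs / tau                           # Eq. 7: (1/tau) sum c_a dx

F_hat = F_coef @ c                               # inferred force along the path
rel_mse = np.mean((F_hat + k * xs) ** 2) / np.mean((k * xs) ** 2)
print(F_coef, rel_mse)
```

Note that the projected force field is invariant under the choice of matrix square root, so a Cholesky factor is as good as a symmetric one.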
We now demonstrate the utility of our method using simulated data of simple models. The simplest spatially varying force field is a harmonic trap, i.e. an Ornstein-Uhlenbeck process (Fig. 3). We benchmark our method by using a first-order polynomial basis, $b = \{1, x_\mu\}$, which can capture the exact force field. The 2D trajectory displayed in Fig. 3A has an information content of I = 27.6 bits, while this linear channel has $N_b = 6$ degrees of freedom, allowing precise inference of the projected force field (Fig. 3A). Indeed, the squared relative error on the force coefficients is 0.15; this is consistent with the operational estimate of this error, 0.16. The force along the trajectory is thus inferred to a good approximation (Fig. 3A, inset). Furthermore, the projected force field $\hat F_{\mu\alpha}\hat c_\alpha(x)$ provides an ansatz that can be extrapolated beyond the trajectory (Fig. 3A), which works equally well here as the functional form of the force field is fully captured by our choice of basis. More quantitatively, we confirm the predicted behavior for the squared relative error by studying an ensemble of trajectories (Fig. 3B).
In the case of out-of-equilibrium Brownian systems, our method also permits the approximation of phase space currents and entropy production. Indeed, the phase space velocity v can be expressed in terms of a local Stratonovich average of $\dot x$, reflecting the fact that it is odd under time reversal [40]. Our estimator for the projection coefficients of the phase space velocity is thus (see Appendix C)

$\hat v_{\mu\alpha} = \frac{1}{\tau}\sum_i \hat c_\alpha\!\left(\frac{x(t_i)+x(t_{i+1})}{2}\right)\Delta x_\mu(t_i)$,    (8)

which is the discretized version of the Stratonovich integral $\frac{1}{\tau}\int_0^\tau \hat c_\alpha(x(t)) \circ \mathrm{d}x_\mu(t)$. This allows the inference of the entropy production rate $\hat S_b = D^{-1}_{\mu\nu}\hat v_{\mu\alpha}\hat v_{\nu\alpha}$ associated to the observed currents. This is a biased estimator of the entropy production, with an error that can be self-consistently estimated: the entropy production rate in the channel can thus be inferred using a single trajectory provided that several $k_B$'s per degree of freedom have been dissipated.

Figure 3. Stochastic force inference for a 2D Ornstein-Uhlenbeck process, with force field $F_\mu(x) = -\Omega_{\mu\nu} x_\nu$ and isotropic diffusion. A. An example trajectory. The inferred force field for this trajectory, using SFI with functions $b = \{1, x_\mu\}$ (blue arrows), is compared to the exact force field (black arrows). Inset: the inferred force components along the trajectory versus the exact force components, with normalized mean-squared error (MSE). B. The average of the relative error on the inferred projection coefficients $\hat F_{\mu\alpha}$ and its self-consistent estimate $N_b/(2\hat I_b)$ both converge to $N_b/(2I_b)$, as expected from theory (see Appendix C). Here $F^\tau_{\mu\alpha} = \frac{1}{\tau}\int_0^\tau F_\mu(x(t))\,\hat c_\alpha(x(t))\,\mathrm{d}t$ is the projection of the exact force on the empirical projectors.
The simplest structure for phase space currents corresponds to cyclic circulation around a point. The detection of such features in active biological systems has been the focus of a number of recent studies, which employ phase space coarse-graining [11,13,16]. This method is however limited to low-dimensional systems, and even then requires large amounts of data: indeed, the capacity per degree of freedom is low, as each grid cell is visited infrequently. In contrast, our method provides a way to detect circulation in any dimension with minimal data. Using the centered linear basis $b_\alpha(x) = \tilde x_\alpha = x_\alpha - \frac{1}{\tau}\int_0^\tau x_\alpha\,\mathrm{d}t$, we can infer the velocity coefficients $\hat v_{\mu\alpha}$, which have a matrix structure: $\hat v_{\mu\alpha} = A_{\mu\nu} C^{-1/2}_{\nu\alpha}$, where $C_{\mu\nu}$ is the covariance matrix of the data, and the antisymmetric part of $A_{\mu\nu}$ is $A_{\{\mu\nu\}} = \frac{1}{2\tau}\int (\tilde x_\mu\,\mathrm{d}x_\nu - \tilde x_\nu\,\mathrm{d}x_\mu)$, which is the rate at which the process encircles area in the $(\mu,\nu)$ plane [17,41]. This rate, sometimes called probability angular momentum [42,43], intuitively quantifies circulation and closely connects to cycling frequencies [14,44]. Indeed, the eigenvectors of $A_{\{\mu\nu\}}$ can be used to define cycling planes (see Appendix H). The entropy production rate due to cycling reads $\hat S_b = D^{-1}_{\mu\nu} A_{\nu\rho} C^{-1}_{\rho\sigma} A_{\sigma\mu}$. We demonstrate the potency of our cycle-detection method on a challenging dataset: a short trajectory of an out-of-equilibrium Ornstein-Uhlenbeck process in dimension d = 6 (Fig. 4A), which is equivalent to popularly used bead-spring models [13,15,44]. Our method identifies the principal circulation plane accurately, together with the force field (Fig. 4C). Quantitatively, we demonstrate that the angular error in the identification of this plane vanishes with increasing trajectory length (Fig. 4E), concomitant with the convergence of $\hat S_b$ to the exact value (Fig. 4F).
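The covariance matrix, the antisymmetric area rate, and the resulting cycling entropy production can all be computed directly from data. The sketch below (illustrative parameters) uses a 2D linear process with circulation, for which the exact entropy production rate is $2\omega^2/k$; the contraction is written as $\mathrm{tr}(D^{-1} A C^{-1} A^{\mathrm T})$, a positive form consistent with the $\hat S_b$ expression in the text.

```python
import numpy as np

# Circulation detection with the centered linear basis: estimate the
# covariance C and the antisymmetric area rate A ("probability angular
# momentum"), then the cycling entropy production as tr(D^{-1} A C^{-1} A^T).
# Illustrative 2D linear process F(x) = -Omega x; exact S_dot = 2 omega^2 / k.
rng = np.random.default_rng(4)
k, omega, D, dt, n = 1.0, 2.0, 1.0, 0.005, 200_000
Omega = np.array([[k, -omega], [omega, k]])

x = np.zeros((n + 1, 2))
for i in range(n):
    x[i + 1] = x[i] - (Omega @ x[i]) * dt \
        + np.sqrt(2 * D * dt) * rng.standard_normal(2)

tau = n * dt
xt = x[:-1] - x[:-1].mean(axis=0)              # centered coordinates
dx = np.diff(x, axis=0)

C = xt.T @ xt / n                              # covariance matrix
A = (xt.T @ dx - dx.T @ xt) / (2 * tau)        # antisymmetric area rate
S_hat = np.trace(A @ np.linalg.inv(C) @ A.T) / D
print(A[0, 1], S_hat)                          # S_hat ~ 2 omega^2 / k = 8
```

Because the antisymmetric combination is unchanged between start-point and midpoint evaluation, this estimator is insensitive to the Itô-versus-Stratonovich choice.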
The entropy production inferred is associated to the observable currents: if only a fraction of the degrees of freedom can be observed, $\hat S_b$ is a lower bound to the total entropy production of the system (Fig. 4G), as some currents are not observable. In particular, if only one degree of freedom can be measured, this technique will yield $\hat S_b = 0$; alternative techniques based on the non-Markovianity of the dynamics are better suited to inferring entropy production in this case [45].

Figure 4. A. Time series of a 6D out-of-equilibrium Ornstein-Uhlenbeck process, with anisotropic harmonic confinement and diffusion tensor, and circulation. The force field is $F_\mu(x) = -\Omega_{\mu\nu} x_\nu$. The matrix Ω and the diffusion matrix are chosen from a random ensemble. The antisymmetric part of $D^{-1}\Omega$ has rank 2, thus inducing circulation in a randomly chosen plane. B. The same trajectories as in A, with additional time-uncorrelated measurement noise. C. SFI for the trajectory in A allows precise identification of the plane of circulation and reconstruction of the force along the trajectory. D. SFI applied to the trajectory in B, with measurement noise. It can still detect forces accurately. E. Convergence of the angular error for cycle detection with increasing trajectory length, for this process. F. Inferred entropy production rate for this process, with and without measurement noise (we subtracted here the systematic bias $2N_b/\tau$). The shadowed area indicates the self-consistent confidence interval for the inferred entropy production. The dotted line shows the exact value of the entropy produced; for the noisy process SFI underestimates this value due to blurring of the currents. G. Entropy production captured when observing a d-dimensional projection of the trajectory, averaged over direction of observation, for long trajectories. In plots E, F, G, error bars indicate standard deviation over an ensemble of 32 trajectories. Parameters of the simulations are presented in Appendix H.
A major challenge in the inference of dynamical properties of stochastic systems from real data is time-uncorrelated measurement noise, which dominates time derivatives of the signal. Indeed, in our inference scheme, Eq. 7 is highly sensitive to such noise. In contrast, the time-reversal antisymmetry of the velocity coefficients $\hat v_{\mu\alpha}$ makes them robust against measurement noise (see Appendix F). Exploiting this symmetry, we obtain an unbiased estimator for the force by using the relation between Itô and Stratonovich integration,

$\hat F_{\mu\alpha} = \hat v_{\mu\alpha} + D_{\mu\nu}\,\hat g_{\nu\alpha}$,    (9)

where $\hat g_{\mu\alpha} = -\frac{1}{\tau}\sum_i \Delta t\, \partial_\mu \hat c_\alpha(x(t_i))$ is an estimator for the projection of $g_\mu = \partial_\mu \log P$ onto the basis (note that while $\hat g_\mu(x) \equiv \hat g_{\mu\alpha}\hat c_\alpha(x)$ is an estimate of $\partial_\mu \log P(x)$, it is not a gradient, and thus cannot be integrated to estimate P(x)). The modified estimator proposed in Eq. 9 can only be computed if the projection basis is smooth, and would not apply to grid coarse-graining, for instance. It requires knowledge of the diffusion tensor $D_{\mu\nu}$, as discussed in Sec. III. Using this modified force estimator allows precise reconstruction of the force field, circulation and entropy production even in the presence of large measurement noise (Fig. 4B,D-G). The limiting factor on force inference due to measurement noise then becomes the blurring of the spatial structure of the process. For observations with a finite time step $\Delta t$, the currents are also blurred by time discretization, introducing an additional bias in the force estimator (see Appendix F), and resulting in an underestimate of the entropy production. Note however that this finite $\Delta t$ effect only induces a bias on $\hat v_{\mu\alpha}$: for an equilibrium, time-reversible process, $\hat v_{\mu\alpha} \to 0$ and the force estimator reduces to $D_{\mu\nu}\hat g_{\nu\alpha}$, which is independent of the time-ordering of the data.
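The effect of measurement noise on the two estimators can be seen on a 1D equilibrium example. In the sketch below (illustrative parameters and noise level), the naive Itô estimator picks up a large spurious contribution from the $\sigma^2/\Delta t$ noise term, while the combination $\hat v_{\mu\alpha} + D\hat g_{\nu\alpha}$ in the spirit of Eq. 9 recovers the true stiffness.

```python
import numpy as np

# Noise-robust force inference in the spirit of Eq. 9: F_hat = v_hat + D g_hat,
# where v_hat is a Stratonovich (midpoint) integral and g_hat comes from the
# basis derivatives. Compared to the naive Ito estimator of Eq. 7, which is
# biased by measurement noise. 1D equilibrium OU, F(x) = -k x; illustrative.
rng = np.random.default_rng(5)
k, D, dt, n, sigma = 1.0, 1.0, 0.01, 200_000, 0.1

x = np.empty(n + 1)
x[0] = 0.0
for i in range(n):
    x[i + 1] = x[i] * (1 - k * dt) + np.sqrt(2 * D * dt) * rng.standard_normal()
y = x + sigma * rng.standard_normal(n + 1)       # noisy measured positions

tau = n * dt
ys, dys = y[:-1], np.diff(y)
mid = (y[:-1] + y[1:]) / 2
m, s = ys.mean(), ys.std()
basis = lambda p: np.stack([np.ones(len(p)), (p - m) / s])  # orthonormalized

F_ito = (basis(ys) @ dys / tau) @ basis(ys)      # naive Ito estimator (Eq. 7)
v_coef = basis(mid) @ dys / tau                  # Stratonovich velocity coeffs
g_coef = np.array([0.0, -1.0 / s])               # -(1/tau) sum dt dc/dx
F_rob = (v_coef + D * g_coef) @ basis(ys)        # robust estimator (Eq. 9)

slope_ito = np.polyfit(ys, F_ito, 1)[0]
slope_rob = np.polyfit(ys, F_rob, 1)[0]
print(slope_ito, slope_rob)    # naive is strongly biased; robust ~ -k = -1
```

With these values the spurious term $\sigma^2/\Delta t$ is comparable to the signal, roughly doubling the naive slope, while the midpoint noise contributions cancel by antisymmetry in the robust estimator.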
We have so far considered only the case of linear systems projected onto linear functions. In general, force fields are nonlinear, which can result in a complex spatial structure. We illustrate this in Figs. 5A-B for processes with, respectively, non-polynomial forces and a complex attractor [46]. For such processes, SFI with a linear basis captures the covariance of the data and the circulation of its current. However, it fails to reproduce finer features, as is evident when inspecting bootstrapped trajectories generated using the inferred force field (Fig. 5C-D). A better approximation of the force can be obtained by expanding the projection basis, for instance by including higher-order polynomials $\{x_\mu x_\nu\}, \{x_\mu x_\nu x_\rho\}, \ldots$ (Fig. 5E-H) or Fourier modes. The captured fraction of the capacity and entropy production increases monotonically when expanding the basis (Fig. 5I-J), corresponding to finer geometrical details: the force field is well resolved if the measured capacity does not increase upon further expansion of the basis. However, expanding the basis also results in an increase in the number of parameters to infer, which eventually leads to overfitting.
For a finite trajectory, there is therefore a trade-off between the precision of the inferred force and the completeness of the force field representation. This is demonstrated in Fig. 6A-B by plotting the force inference error along the trajectory as a function of the number $N_b$ of degrees of freedom in the basis. At small $N_b$, this error decreases, as it mostly originates from underfitting. At large $N_b$, the error increases, as all statistically significant information is already captured and adding new functions primarily fits the noise. This is reflected in the inferred information $\hat I_b$, which steadily increases with the number of fitting parameters $N_b$: the increase is initially mainly due to the increase in the captured information $I_b$, but as $N_b$ grows, so does the typical error (see Appendix C 4), and this error eventually overwhelms the gain in $I_b$. As a practical criterion to optimize between under- and overfitting and best estimate the force along the trajectory, we thus propose to use the basis b which maximizes the information $I_b$ that can be statistically resolved. In practice, we find that choosing the basis size that maximizes $\hat I_b - \delta\hat I_b$ (i.e. the inferred information minus one standard deviation) robustly selects the optimal basis size for a given trajectory (star symbols on Fig. 6A-B). An alternative optimization procedure, based on a similar balance, was suggested in [27] for one-dimensional processes. We empirically observe that when using this criterion to adapt the basis to the trajectory, the typical squared error on force inference scales as $\tau^{-1/2}$ with the trajectory duration $\tau$ (Fig. 6C-D).

Figure 5. A. Trajectory of an out-of-equilibrium process with harmonic trapping and circulation, and a Gaussian repulsive obstacle in the center. The force field combines a linear term $-\Omega_{\mu\nu} x_\nu$, where Ω has both a symmetric and an antisymmetric part, with the repulsion from the Gaussian obstacle. B. Trajectory of the stochastic Lorenz process, a 3D process with a chaotic attractor. The force field is $F_x = s(y - x)$, $F_y = rx - y - zx$, $F_z = xy - bz$, where we choose r = 10, s = 3, and b = 1. C-H. SFI for these two trajectories, respectively with polynomials of order n = 1, 3, 5 and n = 1, 2, 3: inferred force versus exact force (left) and bootstrapped trajectory using the inferred force field (right). I-J. Capacity (top) and entropy production (bottom) of each process projected on different bases for an asymptotically long trajectory, as a function of the number of degrees of freedom $N_b$ in the basis. These bases are polynomial and Fourier functions with order n = 0...7, and a coarse-grained approximation with a variable number of grid cells n = 2...7 in each dimension. Parameters and details of the simulations are presented in Appendix H.

Figure 6. Influence of the size of the basis on the precision of SFI. A-B. SFI error as a function of the number of fit parameters, respectively for the models presented in Fig. 5A-B, with a Fourier basis, and for different numbers of time steps in the trajectory. Specifically, the y-axis is the normalized mean squared error on the inferred force along the trajectory. The crossover from under- to overfitting is apparent, and takes place at larger $N_b$ and lower error with longer trajectories. The star symbols indicate the optimal basis size predicted by our self-consistent criterion of maximizing $\hat I_b - \delta\hat I_b$. C-D. The squared error as a function of the amount of information $C\tau$ in a trajectory of duration $\tau$, for the optimal basis, averaged over n = 3 trajectories. For the Lorenz process with a polynomial basis (D, orange squares), the convergence is fast as the basis is adapted to the exact force field, and the saturation of the error to a lower plateau is due to the finite time step (see Appendix F).
There is an exception to this scaling: when the force field can be exactly represented by a finite number of functions of the basis, such as the Lorenz process with order-2 polynomials, this same criterion selects the smallest adapted basis: adding further functions does not resolve more information. This results in a faster convergence of the force field, as $\tau^{-1}$ (Fig. 6D), which is the rate of convergence of the force projections for a given basis size.
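The selection principle can be sketched with a simplified penalty. Below, for each polynomial order we compute $\hat I_b$ from the Itô estimator and keep the order maximizing $\hat I_b - N_b/2$; this fixed penalty of half a nat per degree of freedom is a crude stand-in for the paper's one-standard-deviation criterion $\hat I_b - \delta\hat I_b$, and all parameters are illustrative. The test process is a 1D bistable force $F(x) = x - x^3$, so at least a cubic basis is needed.

```python
import numpy as np

# Sketch of basis-size selection by penalized inferred information: for each
# polynomial order, I_hat_b = (tau / 4D) sum_a F_hat_a^2, penalized by N_b/2
# (a simplified stand-in for the I_hat_b - dI_hat_b criterion of the text).
# 1D bistable process F(x) = x - x^3; all parameters are illustrative.
rng = np.random.default_rng(6)
D, dt, n = 0.5, 0.005, 200_000

x = np.empty(n + 1)
x[0] = 1.0
for i in range(n):
    x[i + 1] = x[i] + (x[i] - x[i] ** 3) * dt \
        + np.sqrt(2 * D * dt) * rng.standard_normal()
xs, dxs, tau = x[:-1], np.diff(x), n * dt

def inferred_info(order):
    """I_hat_b for the polynomial basis {1, x, ..., x^order} (N_b = order+1)."""
    b = np.stack([xs ** p for p in range(order + 1)])
    B = b @ b.T / n
    c = np.linalg.inv(np.linalg.cholesky(B)) @ b     # orthonormalized basis
    F_coef = c @ dxs / tau                           # Ito estimator, Eq. 7
    return tau * np.sum(F_coef**2) / (4 * D)

score = {m: inferred_info(m) - (m + 1) / 2 for m in range(1, 7)}
best = max(score, key=score.get)
print(best, score)     # the cubic term carries real information, so best >= 3
```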
Systems with many degrees of freedom, such as active interacting particles (Fig. 7A), are challenging to treat. Indeed, with limited data, the criterion $\hat I_b \gtrsim N_b$ precludes even the inference of gross features of the force field. In such cases however, the use of symmetries can make the problem tractable. For instance, treating particles as identical implies that forces are invariant under particle exchange, which greatly reduces the number of parameters to infer. Forces can then be expanded as one-particle terms, pair interactions, and higher orders, by choosing an appropriate basis (see Appendix H 6). With this scheme, a large number of particles actually results in enhanced statistics, allowing accurate inference of the force components (Fig. 7A-B) and reconstruction of the pair interactions (Fig. 7C) with a limited amount of data. This method could be straightforwardly extended to include, e.g., alignment interactions between particles. In contrast to standard methods to infer pair interaction potentials, we do not rely here on an equilibrium assumption.

Figure 7. Stochastic force inference for harmonically trapped active Brownian particles with soft repulsive interactions $F(r) = 1/(1 + r^2)$ between particles at distance r. A. Snapshot of a configuration for 25 active particles. The black dots indicate the direction of self-propulsion. We perform SFI on a trajectory of only 25 frames, blurred to mimic measurement noise. Background shows the trajectory of one particle, and the force on each particle, inferred (blue arrows) and exact (black arrows). The fitting basis for SFI consists of a combination of harmonic trapping, constant-velocity self-propulsion, and radial interactions between particles of the form $r^k e^{-r/r_0}$ with k = 0...5 and $r_0$ a typical nearest-neighbour distance between particles. B. Inferred versus exact components of the force on all particles along the trajectory. C. Inferred radial force between interacting particles, compared to the exact force.
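The exchange-symmetric expansion can be sketched as a linear regression with coefficients shared across particles. In the simplified example below (passive rather than active particles, a small system, and a basis that for convenience includes the true pair kernel $1/(1+r^2)$; all of these are illustrative choices, not the paper's setup), the force on each particle is fitted as a one-body trap term plus a shared radial pair interaction.

```python
import numpy as np

# Sketch of SFI for identical interacting particles: the force on particle i
# is expanded as a shared trap term plus a radial pair interaction, fitted by
# linear regression of dx/dt (the spirit of Eq. 7). Basis, kernels and
# parameters are illustrative; the true kernel 1/(1+r^2) is in the basis.
rng = np.random.default_rng(7)
N, D, dt, steps = 5, 0.1, 0.01, 30_000
kernels = [lambda r: 1 / (1 + r**2),
           lambda r: np.exp(-r),
           lambda r: r * np.exp(-r)]

def pair_force(pos, kern):
    """Sum over j != i of kern(r_ij) * (unit vector from j to i)."""
    diff = pos[:, None, :] - pos[None, :, :]         # (N, N, 2)
    r = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(r, 1e9)                         # kill self-interaction
    return ((kern(r) / r)[:, :, None] * diff).sum(axis=1)

def true_force(pos):
    return -pos + pair_force(pos, kernels[0])        # trap + soft repulsion

pos = rng.standard_normal((N, 2))
X, y, F_true = [], [], []
for _ in range(steps):
    feats = [-pos] + [pair_force(pos, g) for g in kernels]
    X.append(np.stack([f.ravel() for f in feats], axis=1))   # (2N, 4)
    F = true_force(pos)
    F_true.append(F.ravel())
    step = F * dt + np.sqrt(2 * D * dt) * rng.standard_normal((N, 2))
    y.append(step.ravel() / dt)
    pos = pos + step

X, y, F_true = np.concatenate(X), np.concatenate(y), np.concatenate(F_true)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

rel_mse = np.mean((X @ theta - F_true) ** 2) / np.mean(F_true**2)
f_hat = lambda r: sum(t * g(r) for t, g in zip(theta[1:], kernels))
print(theta[0], f_hat(1.0), rel_mse)   # trap ~ 1, pair force at r=1 ~ 0.5
```

Because every particle and every time step contributes a sample to the same few shared coefficients, more particles mean better statistics rather than more parameters.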

III. INHOMOGENEOUS DIFFUSION
We have so far assumed that the diffusion tensor does not depend on the state of the system. While this is a natural first approximation, it is rarely strictly the case: for instance, the mobility of colloids depends on their distance to walls and other colloids due to hydrodynamic interactions [47].

Figure 8. A. [...] (2πx), with periodic boundary conditions. B-C. For the trajectory presented in A, inferred and exact diffusion coefficient (using Eq. 12) and force field (using Eq. 15) as a function of position. We use a first-order Fourier basis to infer both force and diffusion. D. Analysis of the convergence of the diffusion (blue) and force (orange) estimators, as a function of trajectory duration, for the process presented in A. The dotted and dashed black lines are the self-consistent estimates of the squared error, for the diffusion and the force respectively. The plateau for the diffusion inference is due to the finite time step. E. A trajectory of a minimal 2D model, an isotropic harmonic trap at equilibrium, F_µ(x) = −D_µν(x)x_ν, in a constant gradient of isotropic diffusion, D_µν(x) = (1 + a_ρx_ρ)δ_µν. F-G. Inferred versus exact diffusion coefficient (using Eq. 12) and force components (using Eq. 15) along trajectory E. A linear polynomial basis was used to fit the diffusion coefficient, and a quadratic basis to fit F_µ. H. Convergence of the diffusion projection estimator (normalized by the average diffusion tensor) to its exact value for the process shown in E. Circles: using Eq. 12; diamonds: using Eq. 12 in the presence of time-uncorrelated measurement noise; triangles: using the bias-corrected local estimator. Error bars represent the standard deviation over 64 samples. Details and parameters in Appendix H.

In order to mathematically describe Brownian dynamics in the presence of an inhomogeneous diffusion tensor D_µν(x), Eq. 1 should be modified into

ẋ_µ(t) = Φ_µ(x(t)) + ξ_µ(t),   with   ⟨ξ_µ(t)ξ_ν(t′)⟩ = 2D_µν(x(t))δ(t − t′),    (10)

written in the Itô convention, i.e. evaluating D(x) at the start of the step.
Here Φ_µ is the drift, which relates to the physical force through

F_µ(x) = Φ_µ(x) − ∂_ν D_µν(x).    (11)

The additional term ∂_ν D_µν, sometimes called the "spurious force", combines with the noise term to ensure that the dynamics does not induce currents and probability gradients in the absence of forces [47]. To our knowledge, the only way to infer the physical force is to infer both terms in Eq. 11 independently, which involves taking gradients of the inferred diffusion. Here we show how to infer both the diffusion field and the drift field, following the same idea as in Sec. II. We propose to approximate D_µν(x) by its projection as a linear combination of known functions, D_µν(x) ≈ D_µνα ĉ_α(x). As before, we can estimate the projection coefficients using trajectory averages,

D̂_µνα = (∆t/τ) Σ_i d̂_µν(t_i) ĉ_α(x(t_i));    (12)

the only missing ingredient is a local estimate d̂_µν(t_i) of the diffusion tensor D_µν(x(t_i)). Such an estimator can be constructed as d̂_µν(t_i) = ∆x_µ(t_i)∆x_ν(t_i)/(2∆t). The relative error on these projection coefficients is of order √(N_b∆t/τ) (see Appendix G). Similarly to Eq. 7 for the force field, Eq. 12 corresponds to a linear regression of d̂_µν(t_i), and was previously suggested for one-dimensional systems in [27]. We test this estimator on two minimal models: a one-dimensional ratchet process with sinusoidal force and diffusion coefficient, inspired by the Büttiker-Landauer model [48,49] (Fig. 8A-D); and a two-dimensional process in a harmonic trap with a constant diffusion gradient (Fig. 8E-H). We quantitatively recover the diffusion coefficient as a function of position (Fig. 8B,F) and confirm that the error vanishes in the limit of long trajectories (Fig. 8D,H). Importantly, the estimator introduced in Eq. 12 is biased in the presence of noise on the measured x, and becomes effectively useless if this noise is larger than the typical ∆x. Inspired by the estimator proposed by Vestergaard et al.
[33] for homogeneous, isotropic diffusion, we define a bias-corrected local estimator (Eq. 13), where tensor products are implied. Modifying Eq. 12 accordingly corrects the measurement-noise bias (Fig. 8H), at the price of an increased relative error for short trajectories (see Appendix G).
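A minimal numerical sketch of both estimators (our own function names, illustrating the spirit of Eqs. 12-13 rather than reproducing them exactly): local squared increments are regressed onto basis functions, and a symmetrized cross-term of successive increments, in the spirit of Vestergaard et al. [33], cancels the bias from time-uncorrelated measurement noise.

```python
import numpy as np

def local_diffusion(x, dt):
    # Naive local estimate d_hat(t_i) = dx dx^T / (2 dt); biased by
    # +sigma^2/dt per axis under uncorrelated localization noise.
    dx = np.diff(x, axis=0)
    return np.einsum('im,in->imn', dx, dx) / (2 * dt)

def local_diffusion_noise_corrected(x, dt):
    # Symmetrized cross-terms of successive increments cancel the noise
    # bias: noise enters dx_i dx_i with weight +2 sigma^2 and the cross
    # term dx_{i-1} dx_i with weight -sigma^2 (a sketch in the spirit of
    # Vestergaard et al., not necessarily the paper's exact Eq. 13).
    dx = np.diff(x, axis=0)
    d0, d1 = dx[1:], dx[:-1]
    sym = (np.einsum('im,in->imn', d0, d0)
           + np.einsum('im,in->imn', d1, d0)
           + np.einsum('im,in->imn', d0, d1))
    return sym / (2 * dt)

def project_diffusion(d_local, points, basis):
    # Linear regression of the local estimates onto basis functions
    # b_a(x), as in Eq. 12: D(x) ~ sum_a coeffs[a] * b_a(x).
    b = np.stack([f(points) for f in basis], axis=1)
    gram = b.T @ b / len(b)
    rhs = np.einsum('ia,imn->amn', b, d_local) / len(b)
    return np.tensordot(np.linalg.inv(gram), rhs, axes=1)
```

On a trajectory with constant diffusion D and added localization noise of amplitude σ, the naive estimator overestimates D by σ²/∆t while the corrected one does not.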
We also approximate the drift as a linear combination of functions, Φ_µ(x) = Φ_µα c_α(x). Equation 7 provides an estimator for the projection coefficients Φ_µα in terms of an Itô integral. This estimator is however impractical for experimental data, as even moderate measurement noise induces large errors in these coefficients. As in Eq. 9, we exploit the Itô-to-Stratonovich conversion to obtain an estimator that is not biased by measurement noise:

Φ̂_µα = v̂_µα − (∆t/τ) Σ_i d̂_µν(t_i) ∂_ν ĉ_α(x(t_i)),    (14)

where v̂_µα is the velocity projection coefficient (Eq. 8), and d̂_µν(t_i) can either be the local bias-corrected estimator (Eq. 13) or another estimator of D_µν(x_i). The convergence properties of Φ̂_µα towards its asymptotic value are similar to those of Eq. 7.
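The idea can be sketched in one dimension with a known constant diffusion coefficient, in which case the drift Φ coincides with the force F (function names are our own): the basis is evaluated at the midpoint of each increment (Stratonovich), which estimates the velocity projection ⟨v b_α⟩, and the term −D⟨∂b_α⟩, obtained by integration by parts at steady state, converts it back to the drift projection.

```python
import numpy as np

def drift_coefficients_stratonovich(x, dt, D, basis, basis_grad):
    """Fit F(x) ~ sum_a f[a] * b_a(x) in 1D with constant, known D.

    Midpoint (Stratonovich) increments estimate <v b_a>; the conversion
    <F b_a> = <v b_a> - D <db_a/dx> (integration by parts, steady state)
    then yields drift coefficients insensitive to measurement noise bias.
    """
    dx = np.diff(x)
    xm = 0.5 * (x[:-1] + x[1:])                       # midpoint evaluation
    b = np.stack([f(x[:-1]) for f in basis], axis=1)  # Gram matrix points
    bm = np.stack([f(xm) for f in basis], axis=1)
    tau = dt * len(dx)
    v_proj = dx @ bm / tau                            # estimate of <v b_a>
    corr = np.array([np.mean(g(x[:-1])) for g in basis_grad])
    gram = b.T @ b / len(b)
    return np.linalg.solve(gram, v_proj - D * corr)
```

For an equilibrium Ornstein-Uhlenbeck process with F(x) = −x and D = 1, the velocity projection itself vanishes and the force is recovered entirely from the conversion term.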
We can now combine our diffusion (Eq. 12) and drift (Eq. 14) projection estimators to reconstruct the force field through Eq. 11,

F̂_µ(x) = Φ̂_µα ĉ_α(x) − ∂_ν [D̂_µνα ĉ_α(x)].    (15)

This estimator allows for quantitative inference of the force provided that the divergence of the diffusion coefficient is well approximated. We demonstrate this (Fig. 8C,D,G) for the simple processes presented in Fig. 8A,E, using an adapted basis to fit the diffusion coefficient.
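Concretely, once both fields are fitted on a differentiable basis, the divergence term can be evaluated analytically from the basis gradients. A 1D toy sketch (hypothetical helper, not a reference implementation) illustrates the subtraction in Eq. 11:

```python
import numpy as np

def force_from_drift_1d(phi_coeffs, D_coeffs, basis, basis_grad, x):
    """Eq. 11 in 1D: F(x) = Phi(x) - dD/dx, with both fields fitted
    as linear combinations of the same differentiable basis."""
    phi = sum(c * f(x) for c, f in zip(phi_coeffs, basis))
    dD = sum(c * g(x) for c, g in zip(D_coeffs, basis_grad))
    return phi - dD

# Toy check: D(x) = 1 + 0.5 x and Phi(x) = 0.5 - x imply F(x) = -x,
# since the spurious drift dD/dx = 0.5 exactly offsets the constant term.
basis = [lambda x: np.ones_like(x), lambda x: x]
grads = [lambda x: np.zeros_like(x), lambda x: np.ones_like(x)]
F = force_from_drift_1d([0.5, -1.0], [1.0, 0.5], basis, grads, np.array([2.0]))
```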

IV. DISCUSSION
In this article, we have introduced Stochastic Force Inference, a method to reconstruct force and diffusion fields and measure entropy production from Brownian trajectories. Based on the communication theory notion of capacity, we have shown that such trajectories contain a limited amount of information. With finite data, force inference is thus limited by the information available per degree of freedom to infer. SFI uses this information to fit the force field with a linear combination of known functions. We have demonstrated its utility on a variety of model systems and benchmarked its accuracy using data comparable to current experiments.
We now briefly compare SFI to existing methods for inferring forces from Brownian trajectories. SFI combines the ability to infer arbitrary force fields, for nonequilibrium processes, in high dimensions and in the presence of measurement noise. In contrast, many previous methods essentially rely on specific linear approximations. These techniques become inefficient as the system's dimensionality increases, and none offers a generic unbiased estimator in the presence of measurement noise. Few of these general methods are used on experimental data in soft matter and biological systems. We quantitatively compare SFI to two of the most popular such methods [21,23,31], which rely on spatial binning (Fig. 9). Our method significantly outperforms them for a two-dimensional process simulating single-molecule dynamics in a complex cellular environment, in particular in the presence of realistic measurement noise.
An important by-product of SFI is the ability to quantify the irreversibility of a system by measuring the entropy production associated with its currents. Alternative methods to estimate entropy production also exist, either by coarse-graining trajectories to estimate currents [13,16], by measuring cycling frequencies [14,44], by using non-Markovian signatures of irreversibility in hidden variables [45], or by using thermodynamic bounds on the fluctuations of dissipative currents [15]. These methods are however inherently limited to relatively low-dimensional systems with homogeneous diffusion, and even then require large amounts of well-resolved data; SFI, in contrast, performs well in high dimensions, even with trajectories too short to resolve the steady-state density, and in the presence of measurement noise and inhomogeneous diffusion.

Figure 9. Quantitative comparison of SFI with other methods, on a simulated system mimicking 2D single-molecule trajectories in a complex cellular environment with multiple potential wells, out-of-equilibrium circulation, and space-dependent isotropic diffusion. A. The diffusion field (blue gradient) and drift field (white arrows, scaled as |Φ|^(1/2) for better legibility). B. The steady-state probability distribution function of the process. The blue traces show two representative trajectories with n = 100 time steps. The red traces show trajectories blurred by moderate Gaussian measurement error (with amplitude shown as a red kernel). C-F. Comparison of the performance of SFI with an adaptive Fourier basis (green circles) and two widely used inference methods: InferenceMAP [23], a Bayesian method for single-molecule inference (blue triangles), and grid-based binning with maximum-likelihood estimation [21,31] (Eq. 7) and an adaptive mesh size (orange squares). We evaluate the performance of these methods on the approximation of the drift field (C,E) and diffusion field (D,F), as a function of the number N of single-molecule trajectories (similar to those in B) used, with ideal data (C,D) and in the presence of measurement noise (E,F). The performance is evaluated as the average mean-squared error on the reconstructed field along trajectories. SFI outperforms both other methods in all cases; for noisy data, SFI is the only one that provides an unbiased estimation of the drift. Details and parameters in Appendix H 9.
We have limited our scope here to systems whose dynamics is described by Eq. 1 or 10, with a time-independent force field and white-in-time noise. When the force field varies in time, for instance due to the dynamics of unobserved variables, SFI captures the average projection of the force onto the observed variables (see Appendix E). Furthermore, SFI could be extended to capture an explicit time dependence of the force by using a time-dependent basis. Finally, force inference is notably complicated by non-Markovian terms in the dynamics [55], such as colored noise; in such cases, however, our projection approach to estimating phase-space velocities (Eq. 8) remains useful and valid.
Our approach, all in all, offers a solution to the inverse problem of Brownian dynamics: inferring the force and diffusion fields from trajectories. The method consists of a few intelligible equations, and provides a powerful data-analysis framework that could be used on a broad class of stochastic systems where inferring effective forces and currents from limited noisy data is of interest. Our work thus applies to microscopic systems where thermal noise is relevant, such as single molecules [21], active colloids [8,56] and cytoskeletal filaments [14,16]. Beyond thermal systems, for stochastic dynamical systems that can be effectively modeled by Brownian dynamics, applications of our framework range from the behavior of cells [3,10,57] and animals [58] to the modeling of climate dynamics [5,50,59] and trend finding in financial data [4]. Our method could be combined with sparsity-promoting techniques, as used to infer dynamical equations in deterministic systems [60], to go from force fitting to identifying the simple rules governing the dynamics.

Material and methods
All formulas presented in this article are derived in the Appendix, together with the details of each simulated system. Code

Appendix A: Gaussian channel interpretation of Brownian dynamics
In this Section, we address the question of quantifying the rate at which information about the force field is encoded in, or can be read out from, a trajectory. We assume that the system follows the overdamped Langevin equation,

ẋ_µ(t) = F_µ(x(t)) + ξ_µ(t),   with   ⟨ξ_µ(t)ξ_ν(t′)⟩ = 2D_µν δ(t − t′).    (A1)

Here and in the main text, what we refer to as a "force" is in fact the physical force multiplied by the mobility matrix M (which has the dimensions of a mobility). So, in terms of our F, the system is out of equilibrium if D⁻¹F does not derive from a potential. Indeed, a system in equilibrium has a physical force that derives from a potential, and a mobility matrix proportional to the diffusion coefficient: D = MT, where T is the temperature. Our approach thus does not distinguish systems that are out of equilibrium due to a temperature difference between components, such as popular bead-spring models, from systems driven by non-reciprocal force fields. We assume through most of this article that the diffusion matrix is known and space-independent (although it can be anisotropic); the case of a spatially variable diffusion matrix, and how to infer it from data, is treated in Appendix G. We also assume that a steady state exists and that the system is ergodic, i.e. that time averages converge to phase-space averages. Note however that the discussion below can be readily extended to averages over an ensemble of trajectories instead of time averages over a single long trajectory. The complete force field is characterized by an infinite number of degrees of freedom, and thus in principle contains an infinite amount of information (the value of the force components at each location in phase space). It is therefore pertinent to ask whether there is a bound on the rate at which this information can be read off from the trajectory. We consider an infinite-length trajectory, from which, in principle, all information about the force field can be recovered.
We argue that there is indeed such a maximal rate, given by the capacity (in natural information units, or nats)

C = (1/4) ∫ F_µ(x) (D⁻¹)_µν F_ν(x) P(x) dx.    (A2)

To explain this formula, let us first focus on a one-dimensional system. A trajectory which satisfies the dynamics given by Eq. A1 encodes the information about the force field in the form of a continuous-time signal F(x(t)), corresponding to the values of the force field at the points x(t) that the trajectory visits. However, what can actually be read out from the trajectory is ẋ, i.e. the signal F(x(t)) with the noise ξ added to it (Fig. 2). Thus, we can think of the dynamics of Eq. A1 as a noisy communication channel with Gaussian white noise, where the information about the force is transmitted in the form of a codeword F(x(t)) which satisfies lim_{τ→∞} (1/τ) ∫₀^τ F² dt = ∫ F²(x)P(x) dx. In communication theory, such a channel is called an infinite-bandwidth Gaussian channel [34]. It has a well-defined capacity, i.e. a maximal rate of information transmission: for codewords of duration τ that satisfy the so-called "power constraint" (1/τ) ∫₀^τ F²(t) dt ≤ P, and white noise with amplitude 2D, the capacity is P/(4D) nats per second. Information cannot be transmitted through the channel at a faster rate. Stated differently, the capacity quantifies the (exponential) rate at which the maximal number of distinguishable signals grows with the amount of time the channel is used for, in particular as τ → ∞. In our case, the capacity is related to the distinguishability of different force fields under the same power constraint. The maximal rate is obtained for a signal which saturates the power constraint, so that the relevant constraint to consider is P = lim_{τ→∞} (1/τ) ∫₀^τ F² dt. Thus, our trajectory, which has lim_{τ→∞} (1/τ) ∫₀^τ F² dt = ∫ F² P(x) dx, cannot produce information about the force field at a rate faster than the capacity defined in Eq. A2.
Note that, in contrast to the usual communication-theory setting, we do not control the codeword through which the force field is encoded, only the decoding scheme: the codeword is determined by the dynamics, the force field being sampled according to the probability density function (pdf) P(x). To go from the capacity of a one-dimensional process to that of a d-dimensional process, Eq. A2, we have decomposed the channel into d parallel channels and added up their capacities. Indeed, let us first go into the basis where the noise is diagonal and normalize its amplitude to two, such that all components of the new force D^(−1/2)_µν F_ν have the same units (t^(−1/2)). The components of the noise become independent, and the d components in that basis become parallel channels, with signals measured in the same units, whose capacities sum up to Eq. A2.
The Shannon-Hartley formula and infinite-bandwidth channels. The infinite-bandwidth capacity of Brownian dynamics, as presented in Eq. A2, corresponds to that of the continuous dynamics. It can also be seen as the ∆t → 0 limit of a discrete signal (i.e. a finite-bandwidth signal) such as can be acquired in practice. The capacity of such a discrete Gaussian channel is given by the Shannon-Hartley formula [34],

C_∆t = (1/2∆t) ln(1 + P∆t/N),

where we consider as before power-limited signals: P∆t/N is the signal-to-noise ratio, P is the signal power (note that it is not the power of the system in the energetic sense, only in the signal-theory sense), and N/∆t is the noise power. When the bandwidth is taken to infinity, i.e. ∆t → 0, we get

C_0 = P/(2N),

which corresponds to Eq. A2 with N = 2D. For a finite but small ∆t, the expression for the capacity becomes

C_∆t = C_0 [1 − C_0∆t + O((C_0∆t)²)].

The first correction to the continuous-time capacity due to a finite sampling rate is thus of relative order C_0∆t, i.e. the information per sample: the loss of information when monitoring Brownian dynamics at a finite rate is negligible provided that the information per sample remains small. This has an important practical consequence for experimental applications, where there is often a trade-off between acquisition rate and duration of the experiment (for instance due to photobleaching of fluorescent proteins): when the information per sample becomes small, very little can be learned about the force field by increasing the acquisition frequency.

Appendix B: Information at the trajectory level
In this Appendix, we relate the notion of capacity to trajectory-level quantities, and connect it to other stochastic-thermodynamic quantities: the entropy production and the inflow rate. While Appendix A was restricted to the case of constant-diffusion Brownian dynamics, here we consider the general case with not only a state-dependent force, but also a state-dependent diffusion tensor. In that case, the noise is no longer additive: it has a multiplicative component, and care must be taken to specify the convention within which the Langevin equation is written. We use the Itô convention here, writing

ẋ_µ(t) = Φ_µ(x(t)) + ξ_µ(t),   with   ⟨ξ_µ(t)ξ_ν(t′)⟩ = 2D_µν(x(t))δ(t − t′),    (B1)

where Φ_µ = F_µ + ∂_ν D_µν is the drift term [47], and F_µ equals the mobility matrix times the physical force. To relate the capacity to path-dependent quantities, we consider a trajectory C_N = (x(0), x(∆t), ..., x(N∆t)), with t_i = i∆t, and where we have defined the discrete difference ∆x_µ(t_i) = x_µ(t_i + ∆t) − x_µ(t_i) and τ = N∆t. The path-integral formula for the probability density P(C_N|F) of a trajectory C_N in the force field F, written in the Itô convention, reads [39]

P(C_N|F) ∝ P(x(0)) Π_i [det D(x(t_i))]^(−1/2) exp[ − Σ_i (∆x_µ(t_i) − Φ_µ∆t)(D⁻¹)_µν(x(t_i))(∆x_ν(t_i) − Φ_ν∆t)/(4∆t) ].    (B3)

Note that in the limit of long trajectories, the initial-point probability becomes unimportant. We show here that the capacity of the system relates to the Kullback-Leibler divergence rate between P(C_N|F) and the probability density at zero force (but with the same diffusion field), P(C_N|0) ≡ P(C_N|F = 0):

C = lim_{τ→∞} (1/τ) ⟨ln [P(C_N|F)/P(C_N|0)]⟩.    (B4)

Indeed, for a constant diffusion coefficient the right-hand side of the above equation reduces to the capacity discussed in Appendix A, Eq. A2. Note that for systems with multiplicative noise, a formula for the channel capacity, as defined in transmission theory, has to the best of our knowledge yet to be derived. Moreover, the interpretation from the standpoint of transmission theory is further complicated as, from physical considerations, we wish to infer F_µ rather than Φ_µ. However, one may use the trajectory-based formula in Eq. B4 as a general definition of the capacity for Brownian dynamics. Then, the generalization of Eq. A2 to systems with inhomogeneous diffusion is seen to be

C = (1/4) ∫ Φ_µ(x) (D⁻¹)_µν(x) Φ_ν(x) P(x) dx.    (B5)

The proof of Eq. B4 follows from inserting Eq. B3 into the Kullback-Leibler divergence; the computation is equivalent to deriving the Girsanov formula for diffusions.

The inflow rate
In the main text, we connect the capacity to the inflow rate G = ∫ dx P(x) g_µ D_µν g_ν, with g_µ = ∂_µ log P. This quantity was originally introduced and studied by Baiesi and Falasco [36] in the case of Brownian dynamics with homogeneous diffusion (and for discrete Markov processes, not discussed here). We generalize it here to systems with inhomogeneous diffusion and discuss its properties.
Relation between the inflow rate and an instantaneous entropy production rate. Let us show that G corresponds to the instantaneous entropy production rate that would be observed if the force were suddenly set to zero. Consider the entropy S(t) = −∫ dx P(x,t) log P(x,t) after the force is set to zero, F_µ = 0, denoting that instant by t = 0. At that instant one has ∂_t P = ∂_µ[D_µν ∂_ν P]. Then

dS/dt |_{t=0} = ∫ dx [∂_µP] D_µν [∂_νP] / P = G,

where we have used integration by parts, assuming boundary terms vanish. We can define v^Fick_µ = −D_µν g_ν, a Fick velocity related to the current j^Fick_µ = −D_µν ∂_ν P that would result from the diffusion of particles with an initial density profile P(x) in the absence of forces. Indeed, in these notations G takes a form similar to the entropy production rate:

G = ∫ dx P(x) v^Fick_µ (D⁻¹)_µν v^Fick_ν.

However, the inflow rate is nonzero even at equilibrium. It measures the heterogeneity of the steady-state probability distribution. Indeed, for an equilibrium process F_µ = D_µν g_ν (and G = 4C follows trivially). In a sense, G is the amount of information that the force field needs to continuously inject into the system in order to maintain its spatial structure, while the entropy production can be seen as the amount of information the force field injects into the system to maintain its currents.
The inflow rate as a phase space contraction rate.
The relation D_µν g_ν = F_µ − v_µ (which holds for a space-dependent diffusion tensor) can be used to rewrite the inflow rate as

G = ∫ dx P(x) g_µ (F_µ − v_µ) = −∫ dx P(x) ∂_µF_µ = −⟨∂_µF_µ⟩,

where in the second equality the steady-state relation ∂_µ(v_µ P(x)) = ∂_µ j_µ = 0 was employed, together with integration by parts. We have thus obtained an expression for the inflow rate as (minus) the average divergence of the force. In a deterministic dynamical system this quantity equals the average sum of the Lyapunov exponents and is called the average phase-space contraction rate; it then corresponds to the mean rate of entropy production in the environment [38]. For non-deterministic systems it was mentioned in [38] as a "natural entropy production". It is worth stressing the difference between the deterministic case and overdamped Brownian dynamics in this context. While for a deterministic system at equilibrium, i.e. a Hamiltonian system, the divergence of the force is identically zero due to the symplectic structure (there is no entropy production), for an equilibrium overdamped system that divergence is nonzero. Indeed the inflow rate (which does not correspond to an actual entropy production in this case) is positive, as discussed above.
Trajectory based interpretation of the inflow rate.
Here we prove that an equivalent, trajectory-based expression for the inflow rate is

G = lim_{τ→∞} (1/τ) ⟨ln [P(C_N|F)/P(−C_N|−F)]⟩.    (B12)

The simplest way to do so is to express the probability density of a trajectory (Eq. B3) in an alternative form, as we now show. We begin with the expression for the probability of a transition from the point x to the point x′ in an infinitesimal time ∆t [39]. Note that here the diffusion coefficient and Φ_µ are both evaluated at the point x′ to which the system transitions. The probability of a trajectory is then simply given by the product of such transition probabilities, together with the distribution of the initial point. Using that Φ_µ = F_µ + ∂_ν D_µν, it follows that the probability of the time-reversed trajectory −C_N = (x(t_N), x(t_{N−1}), ..., x(t_0)) can be written in a similar form. It then becomes straightforward to evaluate Eq. B12, dividing term by term the product in Eq. B3 by the corresponding product for P(−C_N|−F), with the reversed sign for the force. Indeed, all terms cancel out except for the divergence of F_µ, which yields Eq. B12 (we ignore the terms related to the initial and final distributions, whose contribution vanishes in the limit τ → ∞).

Different decompositions of the capacity and the relation to traffic
The trajectory-based expression for the capacity, Eq. B4, is related to the "dynamical entropy" introduced in [37]: it is equal to the dynamical entropy per unit time in the limit τ → ∞, i.e. to a rate of dynamical entropy. In [37] the dynamical entropy was split into two contributions: a time-antisymmetric contribution, equal to Ṡ/2, and a time-symmetric contribution −T, where T is called the traffic (and is related to the so-called frenesy in Markov jump processes). These relations connect the capacity, the inflow rate we have defined, the entropy production and the steady-state traffic T. The decomposition of the capacity that we have presented in the main text can also be written as the sum of time-symmetric and time-antisymmetric parts, corresponding to a different trajectory-based expression for the capacity: the first term is time-antisymmetric and equal to the entropy production rate, while the second term is time-symmetric and equal to the inflow rate. One can think of the decomposition of the capacity into Ṡ and G as decomposing the influence of the force field into two types of "orders": "go there!", corresponding to a dissipative, irreversible motion quantified by Ṡ, and "stay there!", corresponding to a non-dissipative, reversible motion fighting thermal diffusion, quantified by G.

Appendix C: Stochastic Force Inference: estimating Fµα and its error
In this Section, we derive the core results of our article: how to perform SFI in practice, and self-consistently estimate the error in the inference.

The force as a trajectory average
To deduce the force from the trajectory, one first needs an expression for the force in terms of measurable quantities along the trajectory. We have

F_µ(x) = ⟨ẋ⁺_µ(t)⟩|_{x(t)=x},

where ⟨·⟩|_{x(t)=x} means averaging over realizations of the noise, conditioned on being at position x at time t. We have defined here ẋ⁺ as the right-hand derivative, corresponding to Itô calculus (see Appendix A of [40]). The coefficients of the force field in its decomposition with respect to the phase-space projectors c_α(x) are

F_µα = ∫ dx P(x) F_µ(x) c_α(x) = lim_{τ→∞} (1/τ) ∫₀^τ ẋ⁺_µ(t) c_α(x(t)) dt.

Because of this last expression, the force projection coefficient F_µα can be expressed as an average along an infinitely long trajectory, and can thus be estimated by computing it on a finite trajectory. Note that, similarly to the force, the phase-space velocity can also be defined through an average of ẋ, where the time derivative is taken in the Stratonovich sense. The phase-space velocity in its decomposition with respect to the phase-space basis c_α(x) is, analogously to the force, v_µα = ∫ dx P(x) v_µ(x) c_α(x).

Projection on the empirical basis
The second difficulty in evaluating Eq. 2 of the main text is that the phase-space measure P(x) is unknown in practice. As a consequence, the phase-space basis c_α(x) is not known either, as it is the orthonormalized basis derived from b using P as the measure. Our approach consists in approximating P(x) by the empirical measure

P̂(x) = (1/τ) ∫₀^τ δ(x − x(t)) dt,

corresponding to a time average along the trajectory. We then define the empirical projectors ĉ_α with respect to this measure, as in the main text. In the long-trajectory limit, these empirical projectors ĉ_α(x) converge to the phase-space projectors c_α(x); more precisely, we expect that for typical trajectories ĉ_α(x) = c_α(x) + O(√(τ_0/τ)), where τ is the duration of the trajectory and τ_0 a relaxation time of the system. In the case of the polynomial basis, for instance, the convergence of the basis at order n is related to the convergence of the n-th cumulant of the probability distribution function. We do not seek to make this statement mathematically precise here.
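In practice, the empirical projectors can be built by orthonormalizing the basis against the empirical Gram matrix. A minimal sketch (our own function names; the inverse square root is taken by eigendecomposition of the symmetric, positive-definite Gram matrix):

```python
import numpy as np

def empirical_projectors(samples, basis):
    """Orthonormalize basis functions w.r.t. the empirical measure.

    samples: points x(t_i) along the trajectory, shape (N,) or (N, d).
    Returns c[i, alpha] = c_alpha(x_i) with mean_i(c_a c_b) = delta_ab,
    together with the transformation matrix B^{-1/2}.
    """
    b = np.stack([f(samples) for f in basis], axis=1)
    B = b.T @ b / len(b)                  # empirical Gram matrix B_ab
    w, V = np.linalg.eigh(B)              # B symmetric positive definite
    B_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return b @ B_inv_sqrt, B_inv_sqrt
```

By construction, the returned functions are exactly orthonormal with respect to the empirical measure of the very samples used, up to floating-point error.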
As an intermediate variable for this calculation, we define the projection coefficients F^τ_µα of the (exact) force onto these empirical projectors. These coefficients are trajectory-dependent; however, the ĉ_α are directly accessible from the trajectory, as is the empirical measure with respect to which they are projectors, so that obtaining the coefficients F^τ_µα precisely would result in an accurate approximation of the force field, F_µ ≈ F^τ_µα ĉ_α, along the trajectory. For this reason, we focus here on how the estimator F̂_µα defined in Eq. 6 of the main text converges to F^τ_µα. The relative errors presented in the main text also refer to this convergence (rather than to the convergence to the phase-space projection F_µα). Recall that our estimator is given by

F̂_µα = (1/τ) ∫₀^τ dx_µ(t) ĉ_α(x(t)),    (C8)

where the integral is an Itô integral, using the Langevin equation (A1). Since F^τ_µα is what we wish to infer, we now study the statistics of Z_µα = F̂_µα − F^τ_µα, i.e. its mean and variance.

Statistics of the error in the inference of the projection coefficients
We thus study the first and second moments of the random tensor Z_µα, i.e. respectively the systematic bias and the typical error of F̂_µα as an estimator of F^τ_µα. To make the norm of these moments meaningful, it is necessary to go to dimensionless coordinates: indeed, different phase-space coordinates can have different dimensions (for instance, a phase space comprising both distances and angles, as in Fig. 7 of the main text), and thus different coordinates of Z_µα cannot be compared or summed. To this end, we define W_µα = D^(−1/2)_µν Z_να, all coordinates of which have the dimension of t^(−1/2).
First recall that we defined both the phase-space and empirical projectors as linear combinations of the basis functions b, and denote by ∆_αβ the dimensionless error on the orthonormalization matrix (indeed, the basis functions b_α can in principle have a dimension). We have lim_{τ→∞} ∆_αβ = 0; more precisely, we typically have ∆_αβ = O(1/√τ), corresponding to the convergence of trajectory integrals to phase-space integrals. For the remainder of this Section we denote the Itô integral by a regular integration sign. We now put an upper bound on the first moment of Z_µα, i.e. on the systematic bias. Note that the first term in Eq. C10 has zero average, as it is linear in the noise. In contrast, due to possible correlations between the noise and the random variable ∆_αβ, the second term may not average to zero. Going to dimensionless coordinates, we use the Cauchy-Schwarz inequality to bound the norm of this bias, and the Itô isometry relation [61] to show that it converges rapidly to zero: the bias is negligible compared to the fluctuating part of the inference error, which goes as O(τ^(−1/2)). Let us now compute the second moment of W_µα. Since it depends on all values of t, it is not adapted to the Wiener process dξ_µ(t), and thus we cannot apply the Itô isometry directly. However, writing B̂^(−1/2)_αγ = B^(−1/2)_αβ(δ_βγ + ∆_βγ) and applying the Itô isometry (Eq. C12) yields the leading term

⟨W_µα W_µα⟩ = 2N_b/τ + R_µαµα,

where the remainder R_µαµα is subleading. Indeed, since for typical trajectories ∆_αβ = O(τ^(−1/2)), every element of the remainder can be bounded, using the Cauchy-Schwarz-based identity M_αβ N_βα ≤ M_αα N_ββ for two semi-definite matrices M_αβ and N_αβ, together with the Itô isometry (Eq. C12). This subleading term originates from the convergence of the empirical projected basis to its long-trajectory limit.

Self-consistent estimate of the error on the projected force
The previous error estimates are rigorous, but require knowledge of the exact force field to assess their amplitude. The goal of this section is to provide approximate estimates of the typical error that can be obtained using only the inferred force field, and are thus useful in practical situations. Now that we know the statistical properties of the dimensionless error term W_µα, we can write the covariance of the inferred force projection coefficients explicitly (Eq. C19). Let us define the information along the trajectory by

I^τ_b = (τ/4) F^τ_µα (D⁻¹)_µν F^τ_να.

In the long-time limit, the rate of information I^τ_b/τ converges to the capacity discussed previously. Similarly, we define the empirical estimate of the information along the trajectory,

Î_b = (τ/4) F̂_µα (D⁻¹)_µν F̂_να.

We can relate the average of the empirical information to the trajectory information: ⟨Î_b⟩ = I^τ_b + N_b/2 at leading order. The estimator Î_b is thus biased, with bias N_b/2. In practice, the "true" force field is not known; inferring it is the goal here. It is therefore important to provide an estimate of the inference error using only inferred quantities. Eq. C19 allows us to propose such a self-consistent estimate: interpreted as the (squared) typical error on the force projection coefficients, its right-hand side can be estimated using only trajectory-dependent quantities (again, we assume that the diffusion matrix is known). We can combine these quantities into a single number quantifying the relative inference error: N_b/(2Î_b) provides a self-consistent estimate of the squared relative error. Note that in the absence of forces, ⟨Î_b⟩ = N_b/2, corresponding to an inferred relative error of 1, which is consistent. Similarly, based on our estimate of the variance of Î_b, we define a self-consistent confidence interval around this inferred information.
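The bookkeeping can be sketched in a few lines (our own helper, assuming an orthonormal empirical basis and a known constant diffusion matrix): the inferred information and the self-consistent squared relative error follow directly from the fitted coefficients.

```python
import numpy as np

def self_consistent_error(F_coeffs, D, tau):
    """Information estimate I_hat = (tau/4) F_ma (D^-1)_mn F_na, and the
    self-consistent squared relative inference error N_b / (2 I_hat).

    F_coeffs: inferred force projection coefficients, shape (d, N_b);
    D: diffusion matrix, shape (d, d); tau: trajectory duration.
    """
    n_b = F_coeffs.shape[1]
    I_hat = tau / 4.0 * np.einsum('ma,mn,na->', F_coeffs,
                                  np.linalg.inv(D), F_coeffs)
    return I_hat, n_b / (2.0 * I_hat)

# Example: a single 1D coefficient F = 1, D = 1, trajectory duration 400
# yields I_hat = 100 nats and a squared relative error of 0.005.
I_hat, err2 = self_consistent_error(np.array([[1.0]]), np.eye(1), 400.0)
```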

The force estimator and maximum likelihood
Here we show that the estimator we propose in Eq. C8 is also the maximum-likelihood estimator for F_µα. Indeed, given a measured trajectory, we use the expression for the probability of a trajectory, Eq. B3, to calculate the log-likelihood. The empirical projectors ĉ_α corresponding to the trajectory then give the decomposition of the force, and the resulting maximization condition is solved by our estimator in Eq. C8. This estimator indeed maximizes the log-likelihood, since ĉ_α(x) is independent of F^τ_µα, so that the Hessian of the log-likelihood is a negative-definite matrix.

Appendix D: Inference of velocities and entropy production
In this section, we show how our approach allows the inference of entropy production and phase space currents (or, more specifically, phase space velocities). We start with some reminders about entropy production in phase space, then discuss how to infer the entropy produced along a given trajectory.
Phase space entropy production. The steady-state entropy production rate is defined via [18] the phase space velocity v_ν(x) = j_ν(x)/P(x), where j_ν is the phase space current. The equality between the two expressions for the entropy production arises from the steady-state condition ∂_µ j_µ = 0, implying that g_µ = ∂_µ log P(x) is orthogonal to v_µ with respect to the phase space measure.
The remaining term is the entropy production related to the heat produced in the bath. Indeed, if the Einstein relation between the mobility and diffusion matrices holds, this term corresponds to the average work performed by the force divided by the temperature; as the system is overdamped, any work performed is dissipated into heat. Note, however, that even if we do not assume that the Einstein relation holds (i.e., that the origin of the white noise is a heat bath), this quantity is still related to time irreversibility.
Entropy production along a trajectory. One can define the entropy production along a trajectory, or equivalently the dissipated heat divided by the temperature corresponding to the work performed by the force, as in [18], where the integral is to be understood in the Stratonovich sense (which, following the usual notation, we denote •dx^t_µ). This entropy production is often referred to as the entropy produced in the medium; one can also define the total entropy production along the trajectory (medium + system) [18]. Assuming the initial point is drawn from the steady-state pdf, the two definitions, divided by τ, converge to the same limit as τ → ∞: the entropy production rate of the system.

Velocity and entropy production inference. The probability density P(x) is generally not accessible, so the phase space velocity cannot be computed directly. However, we have already discussed the empirical density P̂(x), and we can also define the empirical current (see for example [62]), using the fact that x(t) satisfies the Langevin equation and the relation between Itô and Stratonovich integrals. This motivates a definition of the empirical phase space velocity v̂, and allows us to express the trajectory-wise entropy production related to the heat, as in Eq. D3. Note that in this expression the force is the exact force, while the velocity (and probability measure) is the empirical one, defined in Eq. D6. If we now insert into this relation the projections of the force and phase space velocity onto the empirical basis, we get, using integration by parts, the entropy production corresponding to that basis, along with its estimator.

It is important to note that the projected entropy production corresponding to the heat is not necessarily positive, unless we are able to resolve the entire force. Therefore, it does not give a bound on the entropy produced. Furthermore, recall that, as was the case for the projection onto the phase space basis, the projected total entropy production and that related to the heat generically differ.
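As a minimal numerical illustration of the Stratonovich estimate of the medium entropy production (a sketch under simplifying assumptions: one dimension, a constant force F0 with the mobility absorbed into it, known constant D, and k_B = 1), the integral reduces to a sum over increments:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1D particle driven by a constant force F0 (mobility absorbed into F0), diffusion D.
# With the Einstein relation, the medium entropy production rate is F0**2 / D.
F0, D, dt, N = 1.0, 1.0, 0.01, 10000
dx = F0 * dt + np.sqrt(2 * D * dt) * rng.standard_normal(N)
tau = N * dt

# Stratonovich integral (1/D) * int F o dx; for a constant force the midpoint
# rule reduces to F0 times the total displacement.
dS = F0 * dx.sum() / D
S_rate = dS / tau                  # converges to F0**2 / D as tau grows
```

The fluctuations of `S_rate` around F0^2/D decay as 1/sqrt(τ), consistent with the trajectory-wise character of this quantity.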
On the other hand, the projection of the total entropy production is positive, and therefore does give a lower bound on the entropy production. The expression v̂_µα D⁻¹_µν v̂_να may be viewed as an estimator of the projection of the total entropy production ∆S^τ_b/τ, or of the total entropy production rate in the steady state, Ṡ_b = v_µα D⁻¹_µν v_να; however, some caution is required. Indeed, consider the cross term in which one velocity is empirical and the other is exact: this leads us to define the estimator Ŝ. This estimator is less controlled than the estimator we have for Π̂^τ_b. Indeed, v̂ has two sources of error as an estimator of v. Defining ṽ with the empirical pdf rather than the actual one, v̂ differs from ṽ by a zero-mean noise term. In particular, for the projection onto the empirical basis, ṽ^τ_µα − v^τ_µα = δv^τ_µα ≠ 0, where v^τ_µα is the projection of the actual phase space velocity onto the empirical basis. This is in contrast to the force, where our estimator includes the projection of the actual force.
We write Ŝ accordingly. This is a biased estimator. We do not have a formal estimate for the last term, δv^τ_µα D⁻¹_µν v̂_να, but we expect δv^τ_µα ∼ O(1/√τ), so that a reasonable estimate can be proposed. Here, for the fluctuating part (the error term), the contribution in the square root is the dominant term when v^τ_µα is nonzero, i.e., when there is a signal. We focus on the long-time limit τ → ∞, where Ṡ^τ → Ṡ, so that an estimator of ∆S^τ_b/τ naturally becomes also an estimator of Ṡ_b, with deviations which are again of order O(1/√τ). Note that this is an order-of-magnitude error estimate, not a fully rigorous one.

Appendix E: Hidden degrees of freedom

In this article, we make strong assumptions on the dynamics of the system we observe: that it obeys a Langevin dynamics for the observed degrees of freedom x, and that the force field in phase space is time-independent. These two assumptions are linked. Indeed, consider the very relevant case of systems which obey a Langevin dynamics, but for which not all degrees of freedom are observable. In that case, the force on the observed degrees of freedom depends on the state of the hidden variables, thereby apparently violating the assumptions of our formalism. It is interesting to note, however, that this violation is only superficial. Indeed, "hiding" some degrees of freedom of the system is completely equivalent to using a projection basis where these degrees of freedom do not appear explicitly (i.e., functions that are constant with respect to these degrees of freedom). Therefore, provided that the system as a whole obeys a constant-force Langevin equation, SFI will capture the projection of the dynamics onto the observed degrees of freedom, effectively averaging over the hidden ones. Indeed, assume that the force takes the form F(x, y), where only x can be measured. We thus project the force field onto a set of functions b_α(x) that depend only on x: the inferred field then approximates the force at x averaged over y.
A similar formula applies to the phase space velocity, as well as when replacing the phase space integral by a time integral, in which case one replaces the phase space measure with the empirical measure. As a consequence, our formulas for the projected entropy production and capacity remain valid, and provide lower bounds to the total entropy production and capacity of the system; this follows by applying Jensen's inequality twice.
Appendix F: Inference with imperfect data: measurement noise and time discretization

Our inference method relies heavily on computing ẋ, i.e., the first time derivative of the signal, and on resolving the difference between Itô and Stratonovich time derivatives for (the white-noise part of) the signal. One might expect measurement noise to swamp the signal and make this distinction, and thus our inference method, impractical. It turns out, however, that even in the presence of measurement noise we can propose estimators v̂_µα and F̂_µα which are unbiased by the measurement noise and accurately capture the currents and forces, respectively.
Indeed, let us consider a noisy measurement y of the system's state x at discrete times t_i = i∆t, where x obeys the dynamics (Eq. A1) and η is the measurement noise (which we assume to have zero average, without loss of generality). We assume this noise to be uncorrelated between different (discrete) time points.

Consider first the estimator F̂^(noisy)_µα for the force projection coefficients in the presence of noise, defined as before. There are two parts to the error due to measurement noise: one stemming from the noise in the position and the other from the noise in the velocity. We assume here that the former is relatively small. The average (over measurement noise) of the estimator for the force projection then acquires a "dangerous" bias, which diverges as ∆t → 0 and is symptomatic of the influence of measurement noise on force inference; Eq. F2 is thus impractical in this case. In contrast, it is interesting to notice that when doing the same expansion for the velocity projection coefficients v̂_µα, all the dangerous terms in 1/∆t have zero average when averaging over the measurement noise. The reason for these useful cancellations is that, by construction, the velocity projection coefficient is odd under time reversal of the trajectory, whereas all moments of the measurement noise are even under time reversal, as it is assumed to be time-uncorrelated. Note that there remains a fluctuating term of order O(√(Λ/τ∆t)), where Λ is the magnitude of the measurement noise variance. Up to this zero-mean error term, our estimator for the velocity projection coefficients is thus unaffected by measurement noise on time derivatives.

To obtain an unbiased estimator for the force, we may use the relation between Itô and Stratonovich integration for a variable x satisfying the stochastic differential equation (Eq. A1). We can therefore use a force estimator combining v̂_µα, which is unbiased by the noise, with a correction term that does not involve a time derivative of the measurement and is therefore also under control (Eq. F11). Note that both the empirical information Î_b and the estimated entropy production Ŝ_b are now biased by the measurement noise, the bias being of order O(1/(τ∆t)). Our treatment of the measurement noise thus remains incomplete: if no other method is used to correct for it, it requires a sufficiently large τ as well as not-too-small time steps ∆t. In addition, if the amplitude of the noise is not small compared to the typical spatial variation of the trajectory, there are additional biases coming from evaluating the projectors at the wrong points.
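The cancellation can be checked numerically. The sketch below (a hypothetical 1D setup with a single unnormalized basis function b(x) = x; it illustrates the mechanism behind Eq. F11 rather than the paper's full normalized estimator) shows that the Itô moment acquires a diverging bias −Λ/∆t, while the Stratonovich moment stays unbiased and, combined with the Itô-Stratonovich correction, recovers the force moment:

```python
import numpy as np

rng = np.random.default_rng(2)

# 1D equilibrium OU process observed with white measurement noise of variance Lam
D, dt, N, Lam = 1.0, 0.01, 200000, 0.01
x = np.empty(N)
x[0] = 0.0
for i in range(N - 1):
    x[i + 1] = x[i] - x[i] * dt + np.sqrt(2 * D * dt) * rng.standard_normal()
y = x + np.sqrt(Lam) * rng.standard_normal(N)

dy, tau = np.diff(y), (N - 1) * dt

# Ito moment (1/tau) sum dy * b(y_i) with b(y) = y: noise bias -Lam/dt on top of
# the true force moment <F x> = -<x^2> = -1
ito = (dy * y[:-1]).sum() / tau

# Stratonovich (midpoint) moment: the measurement-noise bias cancels on average;
# it estimates the velocity moment <v x>, which vanishes at equilibrium
strat = (dy * 0.5 * (y[:-1] + y[1:])).sum() / tau

# Ito-Stratonovich correction recovers the force moment: Strat - D * <db/dy>
force_moment = strat - D * 1.0
```

With Λ/∆t = 1 here, the naive Itô moment is roughly −2 instead of −1, while the corrected estimate is close to the true value.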
Finally, in order to resolve the force correctly, the time step ∆t must not be too large: force variations during the time step result in a blurring of the inferred force field. Specifically, the force variation over a time step is, on average, ∆F_µ ∼ ∆t F_ν ∂_ν F_µ. This results in a discretization bias δF_µα in the force estimator (Eq. F2), the magnitude of which can be self-consistently estimated in terms of the inferred force field F̂_µ(x) = F̂_µα ĉ_α(x), the inferred capacity Ĉ = F̂_µα D⁻¹_µν F̂_να/4, and averages ⟨·⟩ over the trajectory. Note, however, that when using Eq. F11 as an estimator for the force projections, as we suggest for "real" data, the discretization error affects only the dissipative part of the force field, i.e., only v̂. Indeed, the second term in Eq. F11 does not involve the time ordering of the data, and is therefore independent of ∆t. Furthermore, the use of a Stratonovich average for the estimate of v̂_µα reduces the squared error in Eq. F12 by a factor 4.
Comparing the discretization error estimate (Eq. F12) with the error stemming from the limited amount of information, Eq. C24, allows one to determine self-consistently whether the limiting factor to force inference is the total trajectory length or the frame rate. This is particularly important for optimizing the acquisition protocol in applications such as the tracking of fluorescently labeled biological objects, where photobleaching limits the total number of frames that can be captured.
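This blurring is easy to reproduce numerically. In the sketch below (a hypothetical 1D Ornstein-Uhlenbeck example, where the large-∆t bias is known exactly: the inferred stiffness approaches (1 − e^(−k∆t))/∆t), subsampling the same trajectory degrades the inferred force:

```python
import numpy as np

rng = np.random.default_rng(8)

# Fine-grained OU trajectory (stiffness k), then subsampled to emulate slow frame rates
k, D, dt, N = 1.0, 1.0, 0.001, 400000
x = np.empty(N)
x[0] = 0.0
for i in range(N - 1):
    x[i + 1] = x[i] - k * x[i] * dt + np.sqrt(2 * D * dt) * rng.standard_normal()

def stiffness(x, step, dt):
    """Infer k from increments at sampling interval step*dt (basis b = {x})."""
    xs = x[::step]
    Dt = step * dt
    return -np.sum(np.diff(xs) * xs[:-1]) / (Dt * np.sum(xs[:-1] ** 2))

k_fast = stiffness(x, 1, dt)       # close to k
k_slow = stiffness(x, 1000, dt)    # blurred: (1 - exp(-k*Dt))/Dt ~ 0.63 for Dt = 1
```

Comparing `k_slow` to the analytical blurred value shows the bias is systematic, not statistical: no amount of extra data at the slow frame rate removes it.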

Appendix G: Inference in the presence of an inhomogeneous diffusion coefficient
We now provide proofs of the results presented in Sec. III of the main text, regarding the inference of drift and diffusion in the presence of a state-dependent diffusion tensor. Our inference method for the diffusion coefficient follows a logic similar to that of the force inference: we start from the local expression for the diffusion tensor, define its projections onto the basis, and obtain our estimator D̂_µνα in terms of the local diffusion estimator (Eq. G4).
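A minimal numerical sketch of this procedure (assuming a hypothetical 1D process with D(x) = 1 + 0.5 sin x, a confining force, and least squares standing in for the orthonormalized projection):

```python
import numpy as np

rng = np.random.default_rng(3)

# Ito simulation with state-dependent diffusion D(x) = 1 + 0.5 sin(x), force F = -x
dt, N = 0.005, 100000
x = np.empty(N)
x[0] = 0.0
for i in range(N - 1):
    Dx = 1.0 + 0.5 * np.sin(x[i])
    x[i + 1] = x[i] - x[i] * dt + np.sqrt(2 * Dx * dt) * rng.standard_normal()

xs, dx = x[:-1], np.diff(x)

# Local diffusion estimator D_hat(x(t_i)) = dx_i^2 / (2 dt), then projection
# onto the basis {1, sin(x)} by least squares
D_local = dx**2 / (2 * dt)
B = np.stack([np.ones_like(xs), np.sin(xs)], axis=1)
coef, *_ = np.linalg.lstsq(B, D_local, rcond=None)   # approaches [1.0, 0.5]
```

The local estimator is very noisy pointwise (its relative fluctuations are of order one), but the projection averages this noise over the trajectory, which is why the fitted coefficients converge.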

Estimate of the error on the projected diffusion coefficient
We now compute the typical error between the estimator D̂_µνα and the exact projection coefficient D_µνα. We work with the discrete version of the overdamped Langevin equation (Eq. 10), written using the Itô convention, where ∆ξ^{t_i}_ν is a centered Gaussian variable with variance ⟨∆ξ^{t_i}_ν ∆ξ^{t_j}_µ⟩ = ∆t δ_µν δ_ij. For error calculations we only consider the leading-order terms in ∆t, so that we can replace ∆x_µ(t_i)∆x_ν(t_i) by 2 D^{1/2}_µρ D^{1/2}_νσ ∆ξ^{t_i}_ρ ∆ξ^{t_i}_σ. We then define the normalized (dimensionless) error, where D̄_µν is a reference constant diffusion matrix used for the normalization, which could be taken as the average diffusion tensor D̄_µν = ∫ D_µν(x) P(x) dx. We have also denoted D̃_ρσ = D^{1/2}_ρµ D̄⁻¹_µν D^{1/2}_νσ and ζ^{t_i}_ρσ = ∆ξ^{t_i}_ρ ∆ξ^{t_i}_σ/∆t − δ_ρσ. Note that ⟨ζ^{t_i}_ρσ⟩ = 0 and ⟨ζ^{t_i}_ρσ ζ^{t_j}_µν⟩ = δ_ij(δ_ρµ δ_σν + δ_ρν δ_σµ), using Wick's theorem in the last equality.

The normalized squared error can then be computed at leading order, replacing ĉ_α(x(t_j)) by c_α(x(t_j)). In the second line of that computation we have used that ζ^{t_i}_ρσ is white in time and centered, i.e., that it is uncorrelated with x(t_j) for j ≤ i and that ⟨ζ^{t_i}_ρσ⟩ = 0, which gives an Itô-isometry type of result for the double sum. In the line before last we have passed to the continuous limit of the sum, using τ = N∆t. In the last line we have assumed that D̃_νµ(x) is bounded from above in the domain: denoting by D̃_max the maximum eigenvalue of D̃_νµ(x) in the domain, we bound D̃_νµ(x(t_i)) D̃_µν(x(t_i)) ≤ d (D̃_max)².
Let us comment that the correction to the above result, due to the difference between ĉ_α(x(t_j)) and c_α(x(t_j)), can be bounded in a similar fashion to what was done in Sec. C3, if one again uses the assumption that D̃_νµ(x) is bounded in the domain. This correction should result in a term of order O(τ^{−3/2}), which is subleading.
To summarize, we obtain an error estimate controlled by D̃_max, the maximum eigenvalue of D̃_ρσ in the domain. Here the choice of the normalization D̄ is arbitrary; it may be chosen as a diagonal matrix with the maximal diffusion coefficients in the domain on the diagonal, in a dimensionally consistent way (i.e., if there are directions in phase space with different units, each has its own maximal diffusion coefficient). In that case D̃_max becomes of order unity.

Inference of the diffusion coefficient with measurement noise
As in Sec. F, we now consider the case where the exact trajectory is not known, but only a noisy approximation of it, due to imperfections of the measurement device. To correct for such measurement noise, we suggest using the modified estimator D̂ defined as in Eq. 13 of the main text. We thus want to estimate the relative magnitude of the error term Z_µα in Eq. G15. The statistics of Z_µα can be derived following the derivation in Sec. C3, except that the diffusion coefficient now depends on x. The normalized error W_µα is thus defined using the average diffusion coefficient D̄_µν, and the calculations go through, resulting in the same asymptotic behavior. However, the variance of the error now involves the space dependence of D_µν, which prevents us from using the orthonormality of c_α. Normalizing by the average diffusion tensor D̄_µν, we obtain an estimate controlled by D_max, defined here as the maximal eigenvalue of the matrix D̄⁻¹_µρ D_ρν(x) in the domain.

Finally, we note that in our method, the inferred physical force F̂_µ(x) is obtained in Eq. 15 by combining the drift with the divergence of the inferred diffusion tensor. As there is no control of the error on this latter term (the error on the gradient is a priori independent of the error on the function estimate, in the absence of regularity assumptions), we cannot provide an error estimate for the inferred physical force.
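The mechanism behind such a correction can be sketched as follows (a simplified 1D illustration, not the exact estimator of Eq. 13: it uses the standard anticorrelation of successive noisy increments, ⟨∆y_i ∆y_{i+1}⟩ = −Λ, to cancel the Λ/∆t bias):

```python
import numpy as np

rng = np.random.default_rng(4)

# Pure Brownian motion with D = 1, observed with measurement noise of variance Lam
D, dt, N, Lam = 1.0, 0.01, 100000, 0.02
x = np.cumsum(np.sqrt(2 * D * dt) * rng.standard_normal(N))
y = x + np.sqrt(Lam) * rng.standard_normal(N)
dy = np.diff(y)

# Naive estimator: biased by +Lam/dt (here Lam/dt = 2, so it reads ~3)
D_naive = np.mean(dy**2) / (2 * dt)

# Corrected estimator: successive increments of the noise are anticorrelated,
# <dy_i dy_{i+1}> = -Lam, which cancels the bias on average
D_corr = np.mean(dy**2) / (2 * dt) + np.mean(dy[1:] * dy[:-1]) / dt
```

The corrected estimator trades a diverging bias for a moderate increase in variance, the same trade-off discussed above for the force and velocity estimators.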
Appendix H: Model details and simulation parameters for numerical results

Overdamped Langevin simulations
To benchmark our Stochastic Force Inference method, we test it on several simple models of Brownian dynamics. We discretize the overdamped Langevin equation, ẋ_µ = F_µ + ξ_µ, into its Euler-Maruyama form, or, in the case of a state-dependent diffusion tensor inducing multiplicative noise, its Itô counterpart. Here ζ is a vector of independent normal random variables with zero mean and unit variance. Again, the force here includes the mobility matrix: the system is out of equilibrium if D⁻¹F(x) does not derive from a potential, regardless of whether this comes from violations of fluctuation-dissipation relations (such as interacting components at different temperatures), non-reciprocal interactions, or the presence of curl in the external force fields. Note that to ensure numerical stability, the integration interval dt must be sufficiently small, while SFI can accommodate a moderately large sampling interval ∆t (see Sec. A). We therefore run the simulations at a higher rate than the input for SFI: the value of ∆t indicated in the parameters is that of the SFI input, while the elementary time step used to generate the trajectories is denoted dt. All simulations presented here start from a pre-equilibrated initial state.
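A minimal sketch of this simulation protocol (hypothetical parameters; Euler-Maruyama integration at a fine step dt with sampling every ∆t = oversample × dt):

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate(force, D, dt, n_steps, x0, oversample=10):
    """Euler-Maruyama integration at the fine step dt/oversample,
    returning samples spaced by the coarser SFI input step dt."""
    h = dt / oversample
    x = np.array(x0, dtype=float)
    out = [x.copy()]
    for _ in range(n_steps):
        for _ in range(oversample):
            x = x + force(x) * h + np.sqrt(2 * D * h) * rng.standard_normal(x.shape)
        out.append(x.copy())
    return np.array(out)

# 2D Ornstein-Uhlenbeck test case: F(x) = -x, isotropic D = 1
traj = simulate(lambda x: -x, 1.0, 0.01, 5000, [1.0, 0.0])
```

Oversampling keeps the integration error small while producing data at the coarser rate actually fed to the inference, mirroring the dt versus ∆t distinction above.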
In the simulations presented in this article, the diffusion matrix is assumed to be known, except in Fig. 8, where inferring it is part of the object of the simulations. In all other figures, it could nevertheless have been inferred with our method (fitting it with a constant). In general, in the strong-noise cases considered in this article, inferring the diffusion coefficient is significantly less demanding than force inference, and results in very little additional error.

2D Ornstein-Uhlenbeck processes (Figure 3)
The first model on which we benchmark our method is a 2D process in a linear trap, also known as an Ornstein-Uhlenbeck process. We consider an anisotropic equilibrium process with isotropic diffusion, which we set to unity: D_µν = δ_µν. The force field is F_µ = −Ω_µν(x_ν − x⁰_ν) (black arrows in Fig. 1F), with a fixed anisotropic matrix Ω. We use a simulation time step dt = 0.005 and ∆t = 0.01. The trajectory presented in Fig. 1C and analyzed in Fig. 1G of the main text has a length N_samples = 4000. It is analyzed by SFI with basis b = {1, x_1, x_2}. The inferred projected force field on this basis (blue arrows in Fig. 1F) has the form F̂_µ(x) = −Ω̂_µν(x_ν − x̂⁰_ν), where Ω̂ and x̂⁰ comprise the N_b = 6 inferred parameters. Quantitatively, as mentioned in the main text, this results in a small squared relative error on the inferred projection coefficients. The inferred information along this trajectory is Î_b = (τ/4) F̂_µα D⁻¹_µν F̂_να = 19.1 nats (i.e., 27.6 bits, with the 1/log 2 nat-to-bit conversion factor). The self-consistent estimate for this error is N_b/2Î_b = 0.16: the actual error is thus within the confidence interval.
It is interesting to note that the inferred matrix Ω̂ (Eq. H4) is not symmetric, meaning that the inferred model is out of equilibrium (it exhibits phase space cycling). This does not, however, result in significant entropy production: the inferred entropy produced is ∆Ŝ = 0.5 k_B.
In Fig. 1G of the main text, we study the statistics of the relative error, obtained over 64 realizations of trajectories of the same model, with varying length N_samples = 2⁴, 2⁵, ..., 2¹⁷, 2¹⁸. We present the average (and standard deviation; blue symbols and error bars) of the squared relative error; the average self-consistent estimate of this error, N_b/2Î_b (orange solid curve); and the asymptotic expression N_b/2τC_b, based on the actual information per degree of freedom (black dashed line). These quantities match quantitatively in the long-trajectory limit, as predicted by our analytical reasoning (Sec. C). Interestingly, in the regime where there is little information available in the trajectory, our self-consistent formula reliably predicts a relative error of order 1, consistent with the fact that there is no signal.
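The following sketch reproduces this analysis pipeline on a hypothetical 2D example (the matrix Ω below is illustrative, not the one used in the paper): it recovers Ω̂ from a trajectory with a linear basis and reads off the circulation from its antisymmetric part:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical anisotropic trap with torque (illustrative parameters)
Om = np.array([[1.0, 0.5],
               [-0.5, 2.0]])
D, dt, N = 1.0, 0.01, 100000
x = np.zeros((N, 2))
for i in range(N - 1):
    x[i + 1] = x[i] - (Om @ x[i]) * dt + np.sqrt(2 * D * dt) * rng.standard_normal(2)

# SFI-style fit with basis b = {1, x1, x2}: least squares on the Ito drift
xs, dxdt = x[:-1], np.diff(x, axis=0) / dt
B = np.column_stack([np.ones(N - 1), xs])
coef, *_ = np.linalg.lstsq(B, dxdt, rcond=None)   # shape (3, 2)
Om_hat = -coef[1:].T                              # inferred force matrix

# Antisymmetric part signals out-of-equilibrium circulation
antisym = 0.5 * (Om_hat - Om_hat.T)
```

A nonzero antisymmetric part of Ω̂ is the linear-basis signature of phase space cycling discussed above.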
3. 6D circulating Ornstein-Uhlenbeck processes (Figure 4)

The next example on which we test SFI is another Ornstein-Uhlenbeck process with force F_µ = −Ω_µν(x_ν − x⁰_µ), but this time with several complications: it is high-dimensional (d = 6), with anisotropic diffusion and trapping, and with a torque exerted in a given plane. We challenge our method by applying it to the short trajectories displayed in Fig. 1D in the main text, and even further in Fig. 1E in the presence of strong measurement noise.
The diffusion and harmonic trapping matrices are obtained as random matrices constructed to have a moderate degree of anisotropy. The diffusion matrix is symmetric, while the confinement matrix is not, and it induces circulation. The trajectory in Fig. 1E has N_samples = 400 points, and the three plots correspond to three projections of the same trajectory, respectively (from left to right) along directions (x_1, x_2), (x_3, x_4) and (x_5, x_6).
In Fig. 1H, we present the results of SFI at linear order (b = {1, x_µ}) for the specific trajectory displayed in Fig. 1E. The inferred parameters have a squared relative error of 0.24, consistent with the self-consistent estimate N_b/2Î_b = 0.22. We show in Fig. 2H in the main text a 2D slice of the inferred force field (blue) and the exact force field (black). This slice is chosen as the plane of maximal inferred circulation. To determine this plane, we consider the nondimensionalized velocity projection coefficients R_αβ = C^{−1/2}_αµ v̂_µβ, with C the covariance matrix of the data. With this choice of normalization, the rows and columns of R are normalized in the same way, and it thus makes sense to consider its antisymmetric part to quantify circulation. The eigenvalues of ½(R_αβ − R_βα) are imaginary and come in conjugate pairs. We define the inferred principal circulation plane as the real-space plane (u, v) spanned by (C^{1/2}_µα r¹_α, C^{1/2}_µα r²_α), where (r¹_α, r²_α) is the pair of eigenvectors of R associated with the eigenvalue of largest norm. We compare this inferred plane to the exact plane of maximal circulation (u₀, v₀), obtained through the same procedure but with an asymptotically long trajectory (N_steps = 2·10⁶).

In Fig. 1J, we present the statistics of the angular error in this cycle detection, defined in terms of the overlap between (u, v) and (u₀, v₀), the pairs of orthogonal unit vectors spanning the inferred and exact maximal circulation planes, respectively. This error is equal to 0.12 for the trajectory presented in Fig. 1D, and decays to zero as δ ∼ τ⁻¹ with increasing trajectory length, as the inferred matrix Ω̂ converges to Ω. Fig. 1K shows the statistics of the de-biased entropy production, Ŝ − 2N_b/τ.

Measurement noise. In Fig. 1E, we present the same trajectories as in Fig. 1D, with an added challenge to force detection: a strong "measurement noise", i.e., a time-uncorrelated error on the input data x_µ.
We model such noise by adding Gaussian white noise to each coordinate of x_µ, with standard deviation equal to 0.5 (half the standard deviation of the data). In the presence of such time-uncorrelated noise, the estimate of ẋ becomes strongly noisy, and we have to use the modified estimator for F̂_µα, Eq. F11, with which we again infer the force parameters.

4. Nonlinear 2D process (Figure 5)

In Figure 2A, we study the case of a 2D stochastic process with circulation in a nonlinear force field, using Stochastic Force Inference with a polynomial basis at different orders. The force field we use is non-polynomial, i.e., it cannot be captured exactly in our choice of basis. We use isotropic diffusion with D = 1. We simulate this process with ∆t = 0.01 and dt = 0.001; the trajectory in Fig. 2A has N_samples = 4096. We perform SFI on the trajectory with a polynomial basis at orders n = 1, 3, 5 in Figs. 2C, E, G; note that as the force field is odd under the reversal x → −x, the even orders in the polynomial expansion do not contribute to it (as apparent in the n-dependence of the capacity in Fig. 2I). The bootstrapped trajectories presented in the right column of Figs. 2C, E, G are obtained by using the inferred projected force field, F̂_µα ĉ_α(x), to simulate new trajectories with the same starting point, τ, dt and ∆t as the original trajectory.
In Fig. 2I, we present the capacity C_b and entropy production Ṡ_b captured by the projection of a long trajectory with N_samples = 2¹⁸ onto three different bases:
• Polynomials of order n = 0...7.
• Fourier modes of order n = 0...7; specifically, we use all functions of the form cos(2π Σ_µ k_µ(x_µ − x̄_µ)/R_µ) and sin(2π Σ_µ k_µ(x_µ − x̄_µ)/R_µ) with non-negative integers k_µ such that Σ_µ k_µ ≤ n. Here we choose R_µ to be 1.05 times the diameter of the trajectory in direction µ.
• A constant-by-part grid coarse-graining with n = 2...7 grid cells in each direction, centered on x̄_µ and with width R_µ.

5. Stochastic Lorenz process (Figure 6)

Our second nonlinear process is a stochastic variant of a popular model for dynamical systems, the Lorenz system [46]. Its 3D Brownian dynamics is described by the force field F(x, y, z) = (s(y − x), x(r − z) − y, xy − bz). In our simulations, we employ the parameters r = 10, s = 3 and b = 1. Diffusion is isotropic with D = 1. We use ∆t = dt = 0.02, and the trajectory in Fig. 2B has N_samples = 2¹². All images of trajectories are in the (x, z) plane.
It should be noted that this force field is a polynomial of order 2, implying that it can be fully captured at order n = 2 of our polynomial expansion. Indeed, with polynomial SFI at orders 2 and 3 (Fig. 2F, H) we capture the force field precisely, and bootstrapped trajectories are very similar to the original data. As apparent in Fig. 2J, the order n = 1 polynomial approximation captures only a fraction of the capacity and entropy production. Interestingly, the order n = 2 polynomial approximation captures the whole capacity, but not the full entropy production, as there are nonzero exchange terms with higher-order moments (corresponding to the fact that the logarithm of the pdf is not itself a polynomial).
6. Active Brownian particle simulations (Figure 7)

The next system studied in this article corresponds to a model of self-propelled Brownian particles, mimicking in a somewhat realistic manner experimental systems such as those studied in Ref. [8]. Specifically, we simulate N_particles = 25 self-propelled 2D particles, each characterized by its coordinates x and orientation θ. These particles interact through soft repulsive pair interactions f(r) between particles at distance r, are self-propelled along the direction θ at velocity v, and are harmonically confined with strength ω; the force exerted on particle i is the sum of these three contributions (Eq. H10). The angle θ diffuses freely (note that we could include alignment interactions in this model).
In our simulations we use f(r) = 1/(r² + 1), ω = 0.2, v = 1, isotropic diffusion with D = 1 in the spatial coordinates, and angular diffusion with D_θ = 0.1. We use a large sampling time step ∆t = 1, while the simulation step is dt = 0.01. The number of frames for our study is very limited, N_frames = 25, with significant positional and angular measurement noise (on x, y and θ, with standard deviation 0.4). These limitations are chosen to mimic those of experimental data. Note that we assume that the identity of the particles can be tracked along the trajectory.

Symmetrization of the forces. Each of the 25 particles being characterized by three degrees of freedom, the phase space of this system is 75-dimensional, making any "brute-force" approximation of the force field in phase space hopeless: even a simple form such as a linear polynomial (which would be a terrible approximation of Eq. H10) would have 5700 variables. Here we propose to use a more subtle projection basis, making use of the invariance of the force field under the exchange of two particles. More precisely, instead of using a projection basis b_α({x_i}_{i=1..N_particles}) that depends on each phase space coordinate in an explicit way, we project on symmetrized functions b_α(x_i, {x_j}_{j≠i}) that consider the interaction between one particle i and all others, regardless of the identity of i. The projected force field thus consists of an approximation of the force on any particle i in which, crucially, the projection coefficients F_µα and the projectors c_α are independent of the identity of i. This drastically reduces the number of degrees of freedom of our approximation: the data on each particle now contribute to the inference of the same coefficients F_µα, and a large number of particles thus actually facilitates force inference. These additional symmetry constraints on the projection do not fit, strictly speaking, into the framework developed in the rest of this article.
Specifically, the orthonormalization of the projectors is now performed with an additional average over all particles, and all integrals are adapted accordingly; for instance, the Itô integral for the force projection now includes a sum over particles, with ∆t the time step.
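The pooling of all particles into shared coefficients can be sketched as follows (a simplified, hypothetical setup: fewer particles, no self-propulsion or angular variable, and a two-function basis that happens to contain the exact interaction shape, so that a pooled least squares stands in for the orthonormalized projection):

```python
import numpy as np

rng = np.random.default_rng(7)

# n_p soft particles in 2D: harmonic confinement -w*x plus pair repulsion of
# magnitude f(r) = 1/(r^2 + 1) along the center-to-center direction
n_p, w, D, dt, N = 10, 0.2, 0.1, 0.01, 8000

def forces(x):
    d = x[:, None, :] - x[None, :, :]          # d[i, j] = x_i - x_j
    r2 = (d * d).sum(-1)
    np.fill_diagonal(r2, np.inf)               # no self-interaction
    r = np.sqrt(r2)
    pair = (d / (r * (r2 + 1))[:, :, None]).sum(axis=1)
    return -w * x + pair

x = rng.standard_normal((n_p, 2))
traj = np.empty((N, n_p, 2))
for t in range(N):
    traj[t] = x
    x = x + forces(x) * dt + np.sqrt(2 * D * dt) * rng.standard_normal((n_p, 2))

# Symmetrized basis: one confinement feature and one pair feature per particle,
# with the same two coefficients shared by all particles at all times
X, V = traj[:-1], np.diff(traj, axis=0) / dt
d = X[:, :, None, :] - X[:, None, :, :]
r2 = (d * d).sum(-1)
r2[:, np.arange(n_p), np.arange(n_p)] = np.inf
r = np.sqrt(r2)
pair_feat = (d / (r * (r2 + 1))[..., None]).sum(axis=2)

A = np.stack([X.ravel(), pair_feat.ravel()], axis=1)
coef, *_ = np.linalg.lstsq(A, V.ravel(), rcond=None)   # approaches [-w, 1.0]
```

Because every particle contributes equations for the same two coefficients, the effective amount of data scales with the number of particles, which is the point of the symmetrization.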
Choice of the basis. So far, we have only used the indistinguishability of the particles, without any assumption on the nature of their interactions: Eq. H11 is completely generic, and could in principle approximate any type of interaction, provided that the choice of projection basis is adapted. For instance, a natural choice would be to expand the interaction into single-particle terms (i.e., external fields), pair interactions, and possibly higher orders, where c⁽¹⁾, c⁽²⁾, c⁽³⁾, ... are the respective projectors onto the spaces spanned by the 1-, 2- and 3-body interaction terms in the basis. It is important to note that these projectors should be orthonormalized as a whole, either hierarchically (through the Gram-Schmidt process, for instance by orthonormalizing the 1-body term, then the 2-body term with respect to itself and the 1-body term, etc.) or in a single step as in Eq. H12, but with the index α now understood as comprising all terms in the expansion.

Let us also note that while polynomials constitute a natural "default" basis for generic processes in an unstructured phase space, no such natural choice exists for the interaction terms. Symmetries can serve as a guide: for instance, for radially/spherically symmetric particles, the magnitude of the pair interaction should depend only on the distance r_ij between particles. The use of such symmetries warrants some caution: the choice of projection basis must be compatible with them. For instance, for radial symmetry, the basis b = {(x_i, x_j) → r^n_ij}_{n=0,1,2...}, i.e., polynomials in the distance between particles, is not adapted: a force written as a linear combination of these functions would transform as a scalar under rotations, not as a vector. Instead, b = {(x_i, x_j) → r_{ij,µ} r^{n−1}_ij}_{µ=1..d, n=0,1,2...} is adapted. This does not constrain the force to be invariant under rotation, but allows it.
Finally, let us note that while this choice is acceptable, it is not ideal: polynomials in r put most of their weight in the far field, i.e., on interactions between far-away particles, so SFI would put most weight on capturing the tail of the interaction. In most cases, interactions decay with distance, and it is more interesting to capture the details of the interaction forces between nearby particles. For this reason, decaying functions of r, such as inverse power laws or exponentials, are better adapted. We finally note that non-power-law functions typically have a characteristic scale, or shape parameters. These parameters are not optimized over by SFI, which only fits the signal as a linear combination of the basis functions: the outcome will thus depend on the choice of parameters. While such shape parameters could in principle be optimized (for instance, to maximize the inferred information captured by SFI), we find that in practice it is simpler, both computationally and analytically, to improve the precision of SFI by expanding the basis than by performing such shape-parameter optimization. We leave this possibility open for future work.
Motivated by these considerations, our choice of basis for the corresponding Figure of the main text consists of decaying functions of the interparticle distance, where we choose the characteristic scale r_0 = 2, corresponding to half the first peak in the radial distribution function. The outcome of SFI is not significantly affected by small changes in the number of functions or their shape.

7. One-dimensional ratchet process (Figure 8A-D)

Figure 8 of the main text deals with the case of Brownian dynamics with multiplicative noise, i.e. with a space-dependent diffusion tensor. Panels A-D treat a minimal example: a 1D ratchet process, where an out-of-equilibrium current is driven by the combination of a periodic space-dependent diffusion coefficient and a periodic force, such that the fluctuation-dissipation relation is not satisfied for a unique temperature. This model falls within the class described by Büttiker [48] and Landauer [49]. Specifically, we consider a process on the segment [0, 1], with periodic boundary conditions. The dynamics is described by Eq. H2, with F(x) = F_0 cos(2πx) and D(x) = D_0 + a cos(2πx), where we choose F_0 = -2, D_0 = 1, a = 0.5, and the discretization step is Δt = 0.005. The trajectory presented in Fig. 8A has 10,000 steps. In Fig. 8B-C we perform SFI on the trajectory in A, using an adapted basis b = {1, cos(2πx), sin(2πx)} for both the diffusion and the force. In Fig. 8D we present the convergence of the inferred fields as a function of the trajectory duration, for n = 32 repeats.
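The ratchet process above can be reproduced with a simple Euler-Maruyama scheme. A minimal sketch (we assume the Itô convention for the multiplicative noise term; the function name is ours):

```python
import numpy as np

def simulate_ratchet(n_steps=10_000, dt=0.005, F0=-2.0, D0=1.0, a=0.5, seed=0):
    """Euler-Maruyama simulation (Ito convention assumed) of the 1D ratchet:
    dx = F(x) dt + sqrt(2 D(x)) dW, with F(x) = F0 cos(2 pi x) and
    D(x) = D0 + a cos(2 pi x), on [0, 1) with periodic boundaries."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = rng.random()
    for t in range(n_steps):
        F = F0 * np.cos(2 * np.pi * x[t])
        D = D0 + a * np.cos(2 * np.pi * x[t])
        x[t + 1] = (x[t] + F * dt
                    + np.sqrt(2 * D * dt) * rng.standard_normal()) % 1.0
    return x

traj = simulate_ratchet()
assert traj.shape == (10_001,) and np.all((traj >= 0) & (traj < 1))
```

With a = 0.5 < D_0 = 1 the diffusion coefficient stays positive everywhere, so the square root in the noise term is always well defined.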
8. Minimal 2D model with diffusion gradient (Figure 8E-H)

We next consider a minimal 2D equilibrium model with inhomogeneous diffusion: an Ornstein-Uhlenbeck process with a constant gradient of the isotropic diffusion coefficient. Specifically, we choose the following form for the space-dependent diffusion tensor:

D_µν(x) = (1 + a_ρ x_ρ) δ_µν with a = (0.25, 0), (H17)

and a force field corresponding to a potential well with energy E(x) = x²/2 and a space-dependent mobility matrix equal to the diffusion tensor D_µν(x) (i.e. the system obeys the Einstein relation with k_B T = 1). This choice ensures that the probability distribution function of the process is unaffected by the inhomogeneity of D(x). We simulate this process using the discretized version of Eq. 8 of the main text, with Δt = dt = 0.02. The trajectory shown in panel 8E and analyzed in panels 8F-G has length n_steps = 4096. The blue symbols in panel 8H show the convergence of the diffusion estimator with increasing trajectory length N_steps = 2^4 ... 2^15. The green and orange symbols correspond to the same data, with added measurement noise of amplitude 0.075.

9. Reconstruction of the drift and diffusion field for a complex 2D process (Figure 9)

In our last Figure, we present a comparison of SFI with two pre-existing methods, grid binning and InferenceMAP. To this end, we simulate a model designed to mimic the diffusion of single molecules in a complex cellular environment. To allow for a quantitative comparison with the other methods, we consider here the inference of the drift field, rather than the physical force, and of an isotropic space-dependent diffusion tensor. The diffusion coefficient is constructed as the ratio of two second-order polynomials in the coordinates, with randomly generated coefficients. The drift field is chosen as the sum of an overall harmonic trap with constant torque, three attractive Gaussian traps arranged in a triangle, and a repulsive one at the center.
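A drift field of the kind just described can be sketched as follows. All parameter values, trap positions, and the function name are illustrative choices of ours, not the randomly generated coefficients used for the Figure:

```python
import numpy as np

def drift_field(x, k=1.0, torque=1.0, amp=1.0, width=1.0):
    """Illustrative 2D drift field: a harmonic trap with constant torque,
    three attractive Gaussian traps at the vertices of a triangle, and a
    repulsive Gaussian trap at the center."""
    rot = np.array([[0.0, -1.0], [1.0, 0.0]])   # constant-torque rotation
    f = -k * x + torque * (rot @ x)             # harmonic trap + torque
    angles = (np.pi / 2, np.pi / 2 + 2 * np.pi / 3, np.pi / 2 + 4 * np.pi / 3)
    for a in angles:                            # attractive Gaussian traps
        c = np.array([np.cos(a), np.sin(a)])
        f -= amp * (x - c) * np.exp(-np.sum((x - c) ** 2) / (2 * width ** 2))
    # repulsive Gaussian trap at the origin
    f += amp * x * np.exp(-np.sum(x ** 2) / (2 * width ** 2))
    return f

assert drift_field(np.zeros(2)).shape == (2,)
```

By symmetry of the three trap positions, the drift vanishes at the origin in this sketch; the paper's randomly generated field has no such symmetry.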
Typical scales are Φ ∼ 1, D ∼ 1, the spatial extent of the process is ∼ 4, and we choose a time step ∆t = 0.01. We consider two types of input signal: exact data, and noisy data where each coordinate is blurred by a Gaussian white noise of amplitude 0.1 (represented as a red dot in Fig. 9B).
In single-molecule contexts, the total duration of a trajectory is typically limited by photobleaching: the exploration of a cellular environment is only possible by accumulating many such tracks. To reproduce this fact, we use N = 4 ... 10^4 independently generated finite-duration trajectories of 100 time steps each (four of which are depicted in Fig. 9B), each starting in the steady state. Each individual track contains, on average, an information gain of I = 2.8 bits about the drift field. We study the convergence of each method to the true drift and diffusion fields as N → ∞. The performance of drift and diffusion inference is assessed as the mean squared error between exact and inferred fields along the trajectory, normalized by the mean squared inferred value. We now detail the parameters employed for each of the three methods.
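The performance metric just described can be sketched as follows (a minimal implementation under our reading of the normalization; the function name is ours):

```python
import numpy as np

def normalized_mse(inferred, exact):
    """Normalized mean-squared error between two vector fields sampled
    along a trajectory (arrays of shape (n_points, d)): the mean squared
    difference, normalized by the mean squared inferred value."""
    num = np.mean(np.sum((inferred - exact) ** 2, axis=-1))
    den = np.mean(np.sum(inferred ** 2, axis=-1))
    return num / den

exact = np.array([[1.0, 0.0], [0.0, 2.0]])
assert normalized_mse(exact, exact) == 0.0
```

A value of 0 indicates perfect reconstruction, while values of order 1 indicate that the error is comparable to the signal itself.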

Stochastic Force Inference.
We employ a Fourier basis over a window spanning 1.1× the total process extent, for both diffusion and drift inference. The order n = 1 ... 9 of the basis is adapted to the number N of trajectories, as n = ⌊ln(N)⌋. We employ noise-free estimators for the exact signal, and noise-corrected estimators (Eq. 13 and Eq. 14) for noisy data.
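A sketch of such a Fourier basis is given below. The precise mode set (tensor products of 1, cos and sin up to the given order on each axis) is our assumption, not necessarily the exact construction used in the paper's implementation:

```python
import numpy as np

def fourier_basis(points, order, pad=1.1):
    """Evaluate a 2D Fourier basis over a window spanning pad times the
    data extent.  Returns an array of shape (n_points, (2*order + 1)**2),
    one column per tensor-product mode."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    center, half = (lo + hi) / 2, pad * (hi - lo) / 2
    u = (points - center) / (2 * half) + 0.5   # rescale window to [0, 1]
    per_axis = []
    for axis in range(points.shape[1]):
        m = [np.ones(len(points))]
        for k in range(1, order + 1):
            m.append(np.cos(2 * np.pi * k * u[:, axis]))
            m.append(np.sin(2 * np.pi * k * u[:, axis]))
        per_axis.append(m)
    basis = [fx * fy for fx in per_axis[0] for fy in per_axis[1]]
    return np.array(basis).T

# Basis order adapted to the number N of trajectories, n = floor(ln N):
N = 1000
order = min(9, max(1, int(np.log(N))))
pts = np.random.default_rng(0).normal(size=(50, 2))
B = fourier_basis(pts, order)
assert B.shape == (50, (2 * order + 1) ** 2)
```

The padded window ensures that the basis functions are not forced to be periodic exactly at the edge of the explored region.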

Maximum-likelihood grid binning.
The principle of this method is simple: decompose phase space into a regular grid, and infer a constant drift vector and diffusion coefficient in each bin using maximum-likelihood estimators, where the sums defining these estimators run over all N(x) data points inside the bin x. We use an adaptive grid with n = √N_steps bins (a grid of width and height √n), where N_steps is the total number of time points in all trajectories in the data. This ensures that both the spatial resolution and the accuracy of inference in each bin increase with the amount of data.
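A minimal one-dimensional sketch of such estimators (the standard maximum-likelihood forms for drift and diffusion in a bin; this is our illustration, not the implementation used for the Figure):

```python
import numpy as np

def grid_binning_estimators(x, dt, n_side):
    """Maximum-likelihood grid binning on a 1D trajectory x (the 2D case
    is analogous).  In each bin, the drift is estimated as the mean
    displacement per unit time, and the diffusion coefficient as the mean
    squared displacement over 2*dt, using the N(x) points in the bin."""
    dx = np.diff(x)
    edges = np.linspace(x.min(), x.max() + 1e-9, n_side + 1)
    idx = np.digitize(x[:-1], edges) - 1
    drift = np.full(n_side, np.nan)
    diffusion = np.full(n_side, np.nan)
    for b in range(n_side):
        sel = idx == b
        if sel.any():
            drift[b] = dx[sel].mean() / dt
            diffusion[b] = (dx[sel] ** 2).mean() / (2 * dt)
    return drift, diffusion

# Sanity check on pure Brownian motion with D = 1 (single bin):
rng = np.random.default_rng(1)
dt = 0.01
x = np.cumsum(np.sqrt(2 * dt) * rng.standard_normal(100_000))
drift, D = grid_binning_estimators(x, dt, n_side=1)
assert abs(D[0] - 1.0) < 0.05 and abs(drift[0]) < 0.2
```

Bins that contain no data points are left as NaN, reflecting the fact that this method provides no estimate in unexplored regions of phase space.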
This method, or slight variants of it, is used in a large number of contexts [21,31], and is also often adapted to infer phase-space velocities [11,13,16]. With ideal data, we find that it performs reasonably well and converges to the exact values, although not as fast as SFI. With noisy data, it becomes biased and does not converge.

InferenceMAP.
The last method we compare to is InferenceMAP, a Bayesian method relying on space discretization, introduced by El Beheiry and Masson [23]. This method is commonly used for the analysis of trajectories of single molecules inside cells [6,7,22]. We use the public implementation of this software. After trying many different parameters, we find that the best results are obtained with a square mesh, with maximal mesh size (the software adapts it to the amount of data), and the (D, drift) inference option. We manually provide the amplitude of the measurement noise (0 or 0.1). A typical outcome of the method is presented in Fig. 10. The performance on the inference of D is slightly below that of SFI; InferenceMAP significantly outperforms grid binning in the presence of measurement noise. However, its performance on drift inference does not exceed that of grid binning, and our method significantly outperforms it. This is demonstrated quantitatively in Fig. 9, and on an example data set in Fig. 10.

Figure 10. Comparison of the performance of the three methods on a set of N = 128 noisy trajectories. A. The trajectories in the raw data set exploited by each method. B-D. The inferred drift field (thick light blue arrows) and the exact one used to generate the data (thin red arrows). The lower left insets show scatter plots of the inferred versus exact drift components, with an indication of the normalized mean squared error (MSE). Top right inset of B: screen capture of the InferenceMAP software used for this drift field. While the methods in panels B and C capture the qualitative shape of the drift field, they appear biased and consistently overestimate the drift for noisy data. In contrast, our method shows quantitative agreement with the exact drift field.