Optimal power extraction from active particles with hidden states

We identify generic protocols achieving optimal power extraction from a single active particle subject to continuous feedback control under the assumption that the instantaneous net velocity, but not the fluctuating contribution originating from the self-propulsion, is accessible to direct observation. Our Bayesian approach draws on the Onsager-Machlup path integral formalism and is exemplified in the cases of free run-and-tumble and active Ornstein-Uhlenbeck dynamics in one dimension. Such optimal protocols extract positive work even in models characterised by time-symmetric positional trajectories and thus vanishing informational entropy production rates. We argue that the theoretical bounds derived in this work are those against which the performance of realistic active matter engines should be compared.

Macroscopic living creatures such as horses and oxen have been used by humans for millennia to do useful work.A modern question of theoretical and practical interest is the extent to which energy can be efficiently harvested from microscopic active systems [1][2][3][4][5][6], whose motion is subject to non-negligible noise.The efficiency of existing many-particle microscopic active matter engines, such as turbines driven by the persistent motion of E. coli bacteria in solution [7][8][9], is heavily limited by the difficulty of rectifying the incoherent motion of collections of individual swimmers with weak alignment interactions in the bulk.However, even under idealised conditions where individual active particles can be manipulated independently, strict upper bounds on extractable power are not well understood, particularly when only a subset of the observables characterising active motion are accessible to direct observation at operational resolution [10][11][12].In the following, we present a generic framework for the identification of protocols achieving optimal power extraction from a single active particle under continuous feedback control with the assumption that the instantaneous net velocity, ẋ(t), but not the fluctuating contribution originating from the self-propulsion, w(t), is observable.This is typically the case for realistic active matter engines [1,7].Our Bayesian approach draws on the Onsager-Machlup path integral formalism [13], and is illustrated in the cases of free runand-tumble (RnT) [14] and active Ornstein-Uhlenbeck (AOU) [15] dynamics in one dimension.Both models are characterised by time-symmetric positional trajectories (SM Sec.SI) and thus vanishing informational entropy production rates (iEPR) [16,17], defined as the Kullback-Leibler divergence [18] per unit time of the ensemble of forward paths and their time-reversed counterparts [19,20].In the Markovian case, where all degrees of freedom are observable, the iEPR is proportional to the thermodynamic dissipation and thus provides a (loose) upper * luca.cocconi@crick.ac.uk bound to the extractable power.This relation fails to apply in the presence of hidden states [12,21,22].Indeed, we show here that positive average power extraction remains possible even for vanishing iEPR upon Bayesian inference of the hidden state (cf.[23], where it is argued that vanishing local iEPR implies zero extractable work).Measurement-driven protocols of the type we discuss in the following incur a thermodynamic maintenance cost [24,25].However, since they operate on active particles, they are not constrained by Landauer's principle in the same way as equilibrium information engines [6,26].
Definition of the optimal protocol -Consider the overdamped Langevin equation for a generic active particle ẋ(t) = w(t) + γ −1 F ext (t) + √ 2D x ξ(t), where ξ(t) is a white noise of unit covariance with associated diffusivity D x and γ denotes the viscosity.We henceforth work in units whereby γ = 1.Here, w(t) is a stochastic selfpropulsion velocity, which for the time being we take to be known to an external observer tasked with controlling the applied force F ext (t).We will subsequently refer to F ext (t) as "the protocol".The noise-averaged total work extracted from the active particle over a time window of duration T is given by the integral Above and henceforth, E φ [•] is used to denote an average with respect to the steady-state distribution of the random variable φ.The integrand, which corresponds to the instantaneous power output, can be maximised at each time t by applying the protocol F * ext (t) = −w(t)/2.The corresponding steady-state average power output is lim where w ≡ E w [w(t)] and we have invoked ergodicity to convert time averages to ensemble averages.The aver-age power is smaller than the thermodynamic dissipation at F ext = 0, given by D x Ṡi = E w [w 2 (t)] [27,28], demonstrating that the entropy production rate Ṡi provides only a loose upper bound to the extractable power at low Reynolds number, due to the unavoidability of viscous effects when ẋ(t) = 0. We will henceforth refer to protocols F * ext (t) achieving the maximum average power output allowed under a particular set of constraints as optimal.
Consider now the case where the underlying dynamics of the active particle, in the form of the governing Langevin equation, are known but the instantaneous selfpropulsion velocity w(t) is not accessible to direct observation, i.e. it is a hidden variable.Let P(w(T ) = v|{ ẋ} T 0 ) denote the posterior probability density that the instantaneous self-propulsion velocity of the active particle at current time T equals v given that a particular spatial trajectory { ẋ} T 0 has been observed.The expected total work extracted during a time window of duration T can be expressed as the following functional of the generic protocol F ext (t), The optimal protocol whence where E w [w(T )|{ ẋ} T 0 ] denotes the posterior expectation of the self-propulsion velocity with respect to P(v|{ ẋ} T 0 ).This is not to be confused with the expectation of w(T ) taken with respect to the corresponding prior probability P(v) = Dx P(v|{ ẋ} T 0 )P({ ẋ} T 0 ), which we denoted w and assume to be independent of T .Substituting the optimal force into the expression for the instantaneous power output, the integrand in Eq. ( 3) gives cf. Eq. (2).We expect the average power output to approach w2 /4 from above as D x → ∞ since, in this limit, signatures of the self-propulsion fluctuations around the mean are "washed out" by the additive noise.In the following, we take w = 0 to focus on the non-trivial term appearing on the right-hand side of Eq. ( 6).Fig. 1 summarises the feedback control described above.
FIG. 1. Optimal power extraction from an active particle (here visualised as a bacterium) with hidden self-propulsion velocity is achieved by subjecting the latter to continuous feedback control, whereby the magnitude and direction of the protocol Fext(t) are modulated to match the inferred selfpropulsion velocity.
Warm-up: The run-and-tumble particle -We have reduced the problem of identifying the optimal protocol to the evaluation of the posterior expectation E w [w(T )|{ ẋ} T 0 ], Eq. ( 5).Now we proceed to show how this can be done for the case of RnT motion in one dimension, ẋ(t) = νw(t) + F ext (t) + √ 2D x ξ(t), whose binary internal self-propulsion mode w(t) constitutes the simplest example of a state-space amenable to non-trivial coarse graining.In particular, let w(t) ∈ {−1, 1} be a dimensionless dichotomous noise with symmetric transition rate α.We seek the posterior probability that the particle is a right self-propeller, w(T ) = +1, given its positional trajectory up to the current time T , which we denote P + (T ) = P[w(T ) = +1|{ ẋ} T 0 ] for compactness.The complementary probability is denoted P − (T ) = P[w(T ) = −1|{ ẋ} T 0 ].Defining the confidence parameter Q[{ ẋ} T 0 ] = log(P + (T )/P − (T )) and using P + (T ) + P − (T ) = 1, we can write The second term on the right-hand side is an even function of Q and Eq. ( 7) reduces to the prior probability P(w = ±1) = 1/2 when Q = 0. To calculate P + (T ) via Q we thus need to find an expression for the ratio of the conditional path probabilities.To do so, we first invoke Bayes' theorem, where we have used P[w(T ) = ±1] = 1/2.We can equivalently write We now introduce the notation for the average with respect to the distribution of w(t) path probabilities conditioned on w(T ) = ±1, which allows us to express the path probabilities in Eq. ( 9) as Finally, we invoke the Onsager-Machlup path integral form [13] of the conditional path probability in the Stratonovich discretisation where ẋc = ẋ − F ext denotes the velocity in the reference frame where the externally imposed drift is subtracted away.Substituting Eq. ( 12) into Eq.( 11), combining the resulting expressions with Eq. ( 9), and cancelling common w-independent factors appearing in the numerator and denominator, we eventually arrive at where we have also used w 2 (t) = 1 for all t ∈ [0, T ].To make further progress we exploit the identity between the logarithm of a moment-generating function and its cumulant-generating function [29,30], as well as the parity of the cumulants (see SM Sec.SII).This leads to with Péclet number Pe = ν 2 /(D x α) and where the superscript c in expectations, e.g.− → • c , denotes the corresponding cumulant (connected moment).Substituting Eq. ( 14) into Eq.( 7), combined with Eq. ( 5), returns the optimal protocol.
Computing the right-hand side of Eq. ( 14) is unfeasible in general.However, Q[{ ẋ} T 0 ] can be computed analytically in the low-Pe asymptotic regime.To leading order in Pe 1, only the first cumulant (16) In order to conveniently apply the optimal protocol under continuous feedback control, we can differentiate Eq. ( 16) with respect to T and use the Leibniz integration rule (assuming ẋc (t) = 0 for t < 0) to obtain a differential equation for the time evolution of Q, i.e.Q(T ) = ν ẋc (T )/D x − 2αQ(T ).Remarkably, upon substituting for ẋc and rescaling time by the switching rate, T = αT , the Langevin equation for Q(T ) reads like that of a RnT particle in a harmonic potential with selfpropulsion speed and diffusivity both equal to the Péclet number, i.e.
We now proceed to make the connection with the rate of work extraction.First of all, we have by combining Eq. ( 5) and Eq. ( 7) that the optimal protocol is given to leading order in Q ∼ Pe by When the optimal protocol is applied at all times, the resulting noise-averaged power output, Eq. ( 6), is given by Taking a further expectation with respect to the dichotomous noise w(t) and exploiting the mapping of the Q-dynamics onto those of a RnT particle in a harmonic potential, Eq. (17), whence E ξ,w [Q 2 ] = (1 + Pe/4)Pe/2 [14], we eventually arrive at which constitutes a tight upper bound to the average extractable power from a RnT particle with hidden selfpropulsion velocity in the low-Pe regime.Higher moments of the fluctuating power output under F * ext can be computed similarly, see SM Sec.SIII.
A boundary-update protocol -We further introduce an independent approach to computing the posterior probability P + (T ) in real time.This novel "boundaryupdate" protocol, described in full detail in SM Sec.SV, both saturates the bound (18) and is conjectured to achieve optimality for all Pe.It draws on the conditional splitting probabilities of the RnT process, which, to the best of our knowledge, we compute here for the first time.These are the probabilities that a particle initialised at x 0 ∈ [−L/2, L/2] in a given statistical superposition of internal states exits said interval through either the left or right boundary in either a left or right self-propulsion state.Knowledge of the splitting statistics is used in FIG. 2. Average power extracted from a RnT particle with hidden self-propulsion velocity upon application of the boundary-update protocol, the numerical implementation of which is discussed in detail in SM Sec.SV.The extractable power, which is positive for all Pe, asymptotically approaches that of a situation where the internal state is known, Eq. ( 2), as Pe → ∞ and is in excellent agreement with the theoretical bound in the low-Pe limit, Eq. ( 18).
combination with Bayes' theorem to update the posterior distribution of the internal state w(t) each time the particle is observed to undergo a net displacement larger than L/2 in the reference frame where the deterministic drift is subtracted away, ẋc = ẋ − F ext .In the limit L → 0, the posterior updating frequency diverges and we conjecture that optimal inference is achieved.Fig. 2 shows that application of the boundary-update approach indeed produces an average power output matching the bounds Eqs. ( 18) and ( 2) in the low-and high-Pe limits respectively.
A generic active particle -Having explored the particular case of RnT motion in some detail, we now expand our scope to a one-dimensional active particle with selfpropulsion velocity w(t) evolving according to a generic (discrete-or continuous-state) stochastic process [31].Following Eq. ( 5), the identification of the optimal protocol requires us to compute the posterior expectation of the self-propulsion velocity, which can be conveniently expressed as with Pe = σ 2 w /(µD x ), σ 2 w = E w [w 2 ], and µ a characteristic inverse timescale associated with the self-propulsion dynamics.We have also invoked Bayes' theorem to write where, similarly to Eq. ( 10), and have used the normalisation condition 1 = v P[w(T ) = v|{ ẋ} T 0 ] to divide by a factor of unity throughout, producing the same type of cancellations of v-independent terms that were observed in the RnT case.We can rewrite Eq. ( 19) in a compact form as by introducing the cumulant-generating function If no further assumptions can be made regarding the process w(t), one can now truncate the sum and substitute the resulting expression into Eq.( 22) to obtain, by invoking Eq. ( 5) and recalling E w [w] = 0, the optimal protocol in the asymptotic case Pe 1, The form of Eq. ( 24) matches the RnT result, Eq. ( 16), except for the appearance of a term depending on the second-order cumulant w 2 (t) , which was absent in the RnT case due to the norm of the self-propulsion velocity being constant.This approach generalises straightforwardly to higher dimensions.
In SM Sec.SIV, we apply the general result obtained above to the specific case of a one-dimensional AOU process, the simplest canonical active particle model with a continuous self-propulsion state [15].There, the dynamics of w(t) are captured by the equilibrium linear Langevin equation ẇ(t) = −µw(t) + √ 2D w η(t), with η(t) a zero-mean, unit variance white noise with diffusivity D w , whence σ 2 w = D w /µ.We find that the average extractable power from an AOU particle with hidden self-propulsion velocity in the low-Pe asymptote is bound above by and further compute the second moment of the power output distribution (SM Sec.SIII).
Langevin dynamics: high-Pe asymptotics -When the dynamics of w(t) are described by a Langevin process, Eq. ( 19) also allows us to explore the high-Pe asymptote through a saddle-point expansion.For the particular case of the AOU process, we can write, using the Onsager-Machlup form of P with the action-like functional which combines a "potential" term (prefactor Pe), penalising departures from w(t) = ẋc , and a "kinetic" term (unit prefactor) penalising changes in w(t) that are exceedingly fast/slow compared to the characteristic inverse timescale µ of the self-propulsion dynamics.Even at high Pe, the second term cannot be ignored since the boundary condition w(T ) = v in general prevents w(t) = ẋc (t) from being an accessible trajectory for the functional integral.We define w * (t; v) as the path that minimises Eq. ( 27), δN [w]/δw| w * = 0, whence with m = 1/Pe and boundary condition w * (T ) = v.
Eq. ( 28) is purposefully arranged to resemble the Newtonian dynamics of a particle of mass m in an unstable, time-dependent harmonic potential Remarkably, the high-Pe limit corresponds to the overdamped limit of Eq. ( 28), whereby m → 0 and the potential term dominates.For m 1, Eq. ( 28) is solved by combining an exponential ansatz with the particular solution w Noting that the second functional derivative of N is independent of w, we perform a change of variables w(t) → δw(t) + w * (t; v) in the functional integral, Eq. ( 26), to rewrite Eq. ( 19) exactly as Substituting Eq. ( 29) into Eq.( 27) we thus have, to leading order in large Pe, which draws only on the potential term.Further substituting Eq. ( 31) into Eq.( 30) and performing all the resulting Gaussian integrals in closed form, we arrive at the following expression for the posterior expectation of the self-propulsion velocity at high Pe, In other words, the prior distribution P[w] weakly biases our posterior estimation E w [w(T )|{ ẋ} T 0 ] away from ẋc (T ) and towards the prior expectation E w [w(T )] = 0. Using Eq. ( 6), the high-Pe asymptotic average power output, having applied the optimal protocol, is thus given by Conclusion -We have identified generic continuous feedback protocols achieving maximum average power extraction from active particles with a (zero-mean) hidden self-propulsion state.These optimal protocols can be written in closed form in the asymptotes Pe 1 and Pe 1, and provide upper bounds to the average extractable work by any such protocol (cf.[6]), e.g.Eqs. ( 18), ( 25) and (33).These bounds are those against which the performance of autonomous active matter engines, which typically do not have access to the selfpropulsion states of the individual constituent particles [1,2], should be compared.Furthermore, our "boundaryupdate" approach enables work extraction when realtime particle tracking is unfeasible, since only the detection of first-passage events is required for its implementation.Extending this approach to continuous-state self-propulsion dynamics, e.g. the AOU process, remains an open challenge.
The optimal protocol is generally non-Markovian.However, this difficulty can be circumvented at Pe 1 by embedding the dynamics in a higher dimensional phase space [32], e.g.via the auxiliary dynamics in Eq. ( 17).Analogously to equilibrium information engines [6,26], the thermodynamic cost of operating the feedback control can be identified with the increase in the total entropy production rate upon expanding the phase space to include such auxiliary variables.In an idealised situation where the operating temperature of the measurement device is arbitrary, and can thus be chosen to be arbitrarily small, the associated dissipation is negligible [26].The unique utility of information engines operating on active particles arises from their non-vanishing efficiency even when the measurement device and the particle are coupled to the same heat bath [6].Future work will characterise the efficiency of the optimal protocols in this case.
Yuning Chen for contributing to the schematic in Fig. 1.L.C. acknowledges support from the Francis Crick Institute, which receives its core funding from Cancer Research UK, the UK Medical Research Council, and the Wellcome Trust (FC001317).J.K. and C.R. acknowledge support from the Engineering and Physical Sciences Research Council (grant numbers 2620369 and 2478322 respectively).Consider a motile active particle with dynamics governed by the Langevin equation ẋ(t) = w(t) + √ 2D x η(t), where η is a unit covariance white noise, D x the spatial diffusivity and w the fluctuating self-propulsion velocity which we take to be inaccessible to direct observation, i.e. to constitute a hidden state.In this supplementary section, we show that, when w(t) is a parity-time (PT) symmetric stochastic process, the ensuing x-dynamics are time-reversal symmetric.This result applies to the particular cases of zero-mean-drift RnT and AOU dynamics discussed in the main text, where w(t) is given by a telegraph process and an Ornstein-Uhlenbeck process respectively, among others [31].Both of these are separately parity and time (and thus PT) symmetric.As a measure of time-reversal symmetry, we take the Kullback-Leibler divergence [18] of the ensemble of forward paths and their time-reversed counterparts, which also defines the average total informational entropy produced along a fluctuating trajectory [16,17], Here, P[{ ẋ} T 0 ] denotes the probability density of a particular positional path and ẋR (t) = − ẋ(T − t) denotes the time-reversed velocity, with the negative sign originating from odd parity of velocities under this transformation [27].Using the law of total probability combined with the Onsager-Machlup [13] form of the positional path probability in the Stratonovich discretisation, we find and similarly Assuming PT symmetry of the w-statistics is equivalent to requiring that P[{w} where we have used Combining Eqs.(S2) and (S4) with Eq. (S1), we find ∆S[{ ẋ} T 0 ] = 0, confirming the positional dynamics of the active particle are indeed time-reversal symmetric.It is worth restating that PT symmetry is a weaker constraint on the w-statistics compared to requiring that symmetry under parity reversal (P[{w} T 0 ] = P[{−w} T 0 ]) and time reversal (P[{w} T 0 ] = P[{w R } T 0 ]) are separately satisfied.The simplest example of a stochastic process that is PT symmetric, but not P or T symmetric, is the zero-mean three-state Markov process taking values w ∈ {−1, 0, 1} with transition rates k 0 where k i,j denotes the probability per unit time to transition from state j into state i and α = β are positive real numbers.

SII. CONDITIONAL CORRELATIONS OF THE SYMMETRIC TELEGRAPH PROCESS
In this supplementary section, we derive the n-time correlation functions for a normalised telegraph process with symmetric transition rate α between states w ∈ {−1, +1}, conditioned on the process achieving a given state at some final time T (cf. the unconditional correlations derived in Appendix A of Ref. [33]).In particular, given an ordered set of measurement times 0 < t 1 < t 2 < ... < t n ≤ T , we denote where, in the second equality of both Eqs.(S6a) and (S6b), we have used the invariance of the symmetric telegraph process statistics under time reversal to express the expectations in terms of conditioning on some initial state w(0) of a trajectory of duration T .Let us now introduce the transition probabilities where P i,j (t) denotes the probability of a telegraph process initialised in state j to be found in state i after time t.

SIII. MOMENTS OF POSITION FOR ACTIVE PARTICLES IN HARMONIC POTENTIALS
In this supplementary section we derive simple recurrence formulas for the moments of the position of RnT and AOU particles in harmonic potentials.The relevance of these results for the present work originates from the mapping we identified in the main text, via Eq.( 16), between the dynamics of the fluctuating extracted power from a RnT particle upon application of the optimal protocol F * ext (t) = −E w [w(T )|{ ẋ} T 0 ]/2 and those of the position of such a harmonically-bound active particle.We will see that a similar mapping applies to the AOU particle, see Eq. (S35) below.While the exact form of the full probability densities are typically quite cumbersome (see, for example, Ref. [14]), these recurrence formulas allow us to easily extract key characteristics such as the (positive) mean power output and the associated coefficient of variation, defined as the standard deviation normalised to the mean.

A. RnT particle
We now derive the recurrence formulas for the steady-state positional moments of an RnT particle, which draw on the steady-state positional moments conditioned on the particle being in either of its accessible self-propelling modes.We arbitrarily choose to calculate the positional moments of the right-moving state (w(t) = +ν), since those of the left-moving state (w(t) = −ν) can be obtained by symmetry, assuming as we do here that the switching rates are symmetric.We start from the coupled Fokker-Planck equations for the joint probability density P +/− (x, t) to observe a particle in the right/left-moving state at position x at time t, where κ denotes the stiffness of the harmonic potential and we retain all other definitions of symbols from the main text.Since, at steady-state, P + (x, t) = P − (−x, t) by symmetry, the final term in Eq. ( S17a) is odd.It can be written equivalently as α (P − − P + ) = α (P − 2P + ) where the total density P = P + + P − is an even function of x.At steady-state ∂ t P + = ∂ t P − = 0, and after operating on Eq. (S17a) with dx x n , performing all integrations by parts, and exploiting the even/odd symmetry of the final term, we eventually arrive at where we have defined the steady-state conditional moments E + x [x n ] ≡ lim t→∞ dx x n P + (x, t) and used that all boundary terms vanish due to lim x→±∞ P + (x, t) = 0. Eq. (S18) can be rearranged into the form of a recursive relation for the n th conditional moment, Using Eq. (S19), the normalisation condition E + x [x 0 ] = dx P + (x) = 1/2 readily gives rise to the n = 1 moment, E + x [x] = ν/(2(κ + 2α)) which, in turn, can be used to obtain the n = 2 moment, E + x [x 2 ] = D x /2κ + ν 2 /(2κ(κ + 2α)), in agreement with Ref. [14], with care being taken to adjust for the different definition of the 'tumble' rate α used in that work.In fact, closed-form expressions for all higher-order conditional moments can be obtained in a similar manner, since any given conditional moment depends only on the two lower-order moments immediately preceding it.
The steady-state conditional moments for the left-moving state, denoted E − x [x n ] ≡ lim t→∞ dx x n P − (x, t), and those of the total density P are readily obtained from the symmetry relation E respectively.It is thus straightforward to obtain the expression for the variance of the particle position, As was shown in the main text, in the asymptotic regime Pe 1, the dynamics of the fluctuating power extracted from an RnT process with hidden self-propulsion state w subject to the optimal protocol F * ext are identical to those of the square of the position of an RnT particle in a harmonic potential.There, we then used our result for the second moment of the latter, Eq. (S20), to compute the average power output, which defines the upper bound to the average extractable work in this regime.Remarkably, the same correspondence can be exploited to relate the fourth moment of the particle position, to the second moment of the power output distribution upon application of the optimal protocol F * ext , which is found to be to leading order in small Pe.Correspondingly, the coefficient of variation in this regime is indicating that, even under optimal control, fluctuations often lead to transients of negative power extaction.Eq. (S21) correctly reduces to the Gaussian kurtosis E x [x 4 ]/(E x [x 2 ]) 2 = 3 in the limit α → ∞, where the self-propulsion becomes dynamically (although not thermodynamically [28,34]) irrelevant.

B. AOU particle
We now discuss the derivation of the moments of the positional probability density for an active Ornstein-Uhlenbeck process [15] in a harmonic potential.First, we calculate the second moment of the particle position x(t), which is used in Section SIV below to derive an upper bound to the average extractable work in the asymptotic regime Pe 1, Eq. (S37).We start from the Fokker-Planck equation for the joint probability density P(x, w, t) that the particle is at position x exhibiting an instantaneous self-propulsion velocity w at time t, where κ denotes the stiffness of the harmonic potential acting on the position x and µ that of the harmonic potential acting on the self-propulsion velocity w.At steady state, ∂ t P(x, w, t) = 0, we multiply the right-hand side of Eq. (S24) by x 2 and xw and integrate with respect to both x and w to obtain, respectively, Combining the equations above with the known second moment of the standard Ornstein-Uhlenbeck process [35], E x,w [w 2 ] = D w /µ, we eventually arrive at the desired result Eq. ( S26) is consistent with the variance of x reducing to that of a standard OU process in the equilibrium limit D w = 0, where E x,w [x 2 ] = D x /κ.A similar procedure can be followed to calculate higher-order moments of x given that the moments E x,w [w k ] are known from the literature on the OU process [35], In particular, we start from the Fokker-Planck equation (S24) at steady state and multiply by x n w m (n, m ∈ N) before integrating with respect to both variables to obtain As shown in SM Sec.SIV below, in the asymptotic regime Pe 1, the dynamics of the fluctuating power extracted from an AOU process with hidden self-propulsion state w subject to the optimal protocol F * ext are identical to those of the square of the position of an AOU particle in a harmonic potential.There, we use our result for the second moment of the latter, Eq. (S26), to compute the average power output, which defines the upper bound to the average extractable work in this regime, Eq. (S37).The same correspondence allows us to compute the second moment of the power output through Eq. (S29) to leading order in small Pe, The coefficient of variation in this regime is thus ) i.e. the same as for the RnT case in the previous section, Eq. (S23).

SIV. OPTIMAL PROTOCOL FOR AN AOU PARTICLE AT Pe 1
In this supplementary section, we illustrate the general result for the optimal protocol at low Pe obtained in the main text, Eq. ( 24), by applying it to the specific case of a one-dimensional AOU process, the simplest canonical active particle model with a continuous self-propulsion state [15].The dynamics of the self-propulsion velocity w(t) are captured by the equilibrium linear Langevin equation ẇ(t) = −µw(t) + √ 2D w η(t), with η(t) a zero-mean, unit variance white noise with diffusivity D w .The prior probability density is thus Boltzmann, P(w) ∝ exp(−µw 2 /(2D w )), whence σ 2 w = D w /µ.Due to w being a Gaussian process we have that 1) [{ ẋ} T 0 ], i.e. conditional cumulants are, at most, linear in v [35].Eq. ( 22) thus reduces to where H AOU (z) = log E w [e wz ] = σ 2 w z 2 /2 is the known cumulant-generating function of the fluctuating self-propulsion velocity prior probability density P(w).Combining Eqs. ( 5) and (S32), we find the optimal protocol and thus At this point, the only outstanding challenge is to compute the dimensionless functional L (1) [{ ẋ} T 0 ].To leading order in Pe 1, where we have used w(t) = v exp(−2µ(T − t)).The same result can be obtained starting from Eq. ( 24), noticing the second conditional cumulant of the OU process, w 2 (t) , is independent of v.By rescaling T = µT as in the RnT case, Eq. ( 17), and taking ẋc (t) = 0 for t < 0, we can recast Eq. (S35) into a differential equation for L (1) (T ) mirroring that of an AOU particle in a harmonic potential, ) This mapping allows us to extract a tight upper bound to the average extractable power from an AOU particle with hidden self-propulsion velocity in the low-Pe asymptote, SV. BOUNDARY-UPDATE PROTOCOL A. Derivation of conditional splitting probabilities for a run-and-tumble particle In this supplementary section, we derive the conditional splitting probabilities that are the foundation of the numerical "boundary-update" protocol conjectured to achieve optimal power extraction from a RnT particle for all Pe.This generalises the known results for the (unconditional) splitting probabilities for a run-and-tumble particle [36].
Consider a run-and-tumble process governed by the Langevin equation ẋ(t) = νw(t) + √ 2D x ξ(t), where the normalised self-propulsion state w ∈ {−1, 1} follows a symmetric telegraph process with switching rate α.We say that the particle is in a right-moving (left-moving) state at time t if w(t) = +1 (w(t) = −1).Let Π s2 s1L/2 (x 0 , s 3 ) with s 1,2,3 ∈ {−1, 1} denote the probability for a particle to exit through the boundary at x = s 1 L/2 in the w = s 2 state given it was initialised at position x 0 ∈ [−L/2, L/2] in the w = s 3 state.For example, Π − −L/2 (x 0 , +) denotes the probability for a particle to exit through the left-hand boundary in the left-moving state given it was initialised at position x 0 ∈ [−L/2, L/2] in the right-moving state.Four of the eight combinations of s 1,2,3 can be readily obtained from the remaining ones by exploiting the symmetries Π ± L/2 (x 0 , +) = Π ∓ −L/2 (−x 0 , −) and Π ± L/2 (x 0 , −) = Π ∓ −L/2 (−x 0 , +).As such, we will calculate only the splitting probabilities to exit at the left-hand boundary, x = −L/2.To ease notation, we define π ± (x 0 , ±) ≡ Π ± −L/2 (x 0 , ±).We first derive the governing ODEs for the splitting probabilities by starting from the microscopic description on a lattice and taking the continuum limit [37].On a lattice of spacing δ, a particle undergoes one of three processes at each time step, 1. hopping to the right-adjacent site, x + δ, with rate h r , 2. hopping to the left-adjacent site, x − δ, with rate h , 3. switching internal state with rate α, while remaining at the current site, x.
Whence, for a particle leaving through x = −L/2 in the w = −1 state, we have the identities where, for instance, h + is the rate for a right-moving particle to hop to the left such that h + /(h + r + h + + α) is the transition probability for a right-moving particle to hop to the left.Expanding each term on the right-hand side of In this supplementary section we describe the numerical implementation of the boundary-update protocol for the RnT dynamics discussed in the main text.The boundary-update protocol draws on Eq. ( 5) in the main text, which states that the optimal protocol F * ext (t) is proportional to the instantaneous posterior expectation of the hidden selfpropulsion velocity at time t given the entire observed trajectory of the velocity, ẋ(t ) for t ∈ [0, t].In this scheme, the posterior expectation is updated on the fly upon the RnT particle exiting an interval of interest using the conditional splitting probabilities in Eq. (S51).
In the following, p prior (t) and p post (t) denote the Bayesian prior and posterior probability respectively (with respect to the observation that the particle has exited the interval) that the particle is currently a right-mover.Whilst the particle is in the bulk of the interval, no additional information is collected to improve the Bayesian inference of the S28)for n, m ≥ 0, defining the falling factorial (k) 2 = k(k − 1) for compactness.Together with Eq. (S25), the system of equations (S28) can be solved to compute any desired moment E x,w [x n ].Here, only even moments are non-trivial since E x,w [x 2k+1 ] = 0 for all k ∈ N by symmetry.Similarly, we expectE x,w [x 2k+1 w 2 ] = E x,w [x 2k w 2 +1 ] = 0 with ∈ N.Combining the expressions of the type (S28) obtained from all choices of n and m such that n + m = 4 with Eqs.(S25)-(S27), we eventually find the Gaussian kurtosis

− 2 FIG
FIG. S1.Splitting probabilities, as a function of initialisation position x0, for a RnT particle with ν = 1, Dx = 0.1 and α = 2, to cross the left-hand boundary at x = −L/2, before crossing the right-hand boundary at x = L/2, if it exits as (a) a left mover, (b) a right mover, or (c) either a left mover or a right mover.In each subfigure, red (respectively, blue/black) lines indicate the initialisation of a right mover (respectively, left mover/equal superposition of a left mover and a right mover).Subfigure (c) corresponds to the (unconditional) splitting probabilities derived in Ref.[36].Monte-Carlo simulations (markers) were performed by numerically integrating the Langevin equation, ẋ = νw(t) + √ 2Dxξ(t), in timesteps of ∆t = 10 −5 to determine the proportion of times the particle exits through the left-hand boundary for 10 5 realisations at each x0 = 0, 0.01, . . ., L = 1.The theoretical results are in good agreement with simulations.