Quantum Fokker-Planck Master Equation for Continuous Feedback Control

Measurement and feedback control are essential features of quantum science, with applications ranging from quantum technology protocols to information-to-work conversion in quantum thermodynamics. Theoretical descriptions of feedback control are typically given in terms of stochastic equations requiring numerical solutions, or are limited to linear feedback protocols. Here we present a formalism for continuous quantum measurement and feedback, both linear and nonlinear. Our main result is a quantum Fokker-Planck master equation describing the joint dynamics of a quantum system and a detector with finite bandwidth. For fast measurements, we derive a Markovian master equation for the system alone, amenable to analytical treatment. We illustrate our formalism by investigating two basic information engines, one quantum and one classical.

The theoretical description of feedback control in quantum systems is typically based on stochastic differential equations [59][60][61][62][63][64][65][66][67][68][69][70] -powerful tools that can describe discrete as well as continuous feedback protocols. In general, these equations must be solved numerically, providing limited qualitative insight. An important exception, amenable to analytical treatment, is the Wiseman-Milburn equation [63], a Markovian master equation for * bjorn.annby-andersson@teorfys.lu.se continuous feedback protocols that depend linearly on the measured signal. However, optimal control often requires nonlinear protocols, for instance bang-bang control [71,72] which has promising thermodynamic applications in solid state architectures [27,[73][74][75]. For such continuous, nonlinear feedback protocols, no master equation description exists, emphasizing a need for further analytical tools. We stress that the word "nonlinear" here refers to the protocol's dependence on the measured signal, not to the system's dynamics.
In this letter, we satisfy this need by developing a general framework for continuous measurement and feedback control in quantum systems, able to provide analytical insight into nonlinear feedback protocols. Our main result, Eq. (1) below, is a quantum Fokker-Planck master equation describing the joint dynamics of a quantum system and a detector with finite bandwidth (see Fig. 1). This equation is applicable to any quantum or classical system undergoing continuous feedback control. For fast measurements, Eq. (1) reduces to a Markovian master equation for the system alone, generalizing the Wiseman-Milburn equation to nonlinear feedback protocols. The broad scope of Eq. (1) suggests that our results will impact a wide variety of topics where nonlinear, continuous feedback control can be applied, such as quantum error correction [1], entanglement generation [2], quantum state stabilization [6], Maxwell's demon [74,75] and machine learning [76].
To illustrate our formalism, we investigate two toy models, a classical and a quantum two-level system, operated via nonlinear feedback protocols. For the classical model, we also derive a fluctuation theorem, highlighting the role of continuous measurement and feedback in information thermodynamics.
Fokker-Planck master equation. A general setup for continuous measurement and feedback is depicted in Fig. 1. We consider an open quantum system whose dynamics, in the absence of measurement and feedback, are described by a Liouville superoperator L. A detector continuously measures a system observableÂ. The measurement strength λ determines the magnitude of the measurement backaction, the limit λ → 0 (λ → ∞) corresponds to a weak, non-intrusive (strong, projective) measurement preserving (destroying) the quantum coherence of the system. Weak measurements thus reduce backaction, but increase measurement uncertainty. To provide a realistic detector description, we consider a finite bandwidth γ, acting as a low-pass frequency filter, eliminating high frequency measurement noise at the cost of introducing a time delay scaling as 1/γ. Feedback control is incorporated by continuously feeding back the measurement outcome D into the system, controlling the system Liouville superoperator via L(D).
Our main result is the following deterministic Fokker-Planck master equation (derivation outlined below), describing the joint system-detector dynamics under continuous measurement and feedback control. The density operatorρ t (D) represents the joint state of system and detector, whereρ t ≡ dDρ t (D) is the system state for an unknown measurement outcome D, and P t (D) ≡ tr{ρ t (D)} defines the probability distribution of the measurement outcome D. Note that dDP t (D) = 1 and tr{ρ t } = 1, see Supplemental Material (SM) below. The first term on the RHS of Eq. (1) describes the feedback-controlled evolution of the system. This term allows for feedback protocols that are nonlinear in D. The second term, where D[Â]ρ =ÂρÂ − 1 2 {Â 2 ,ρ} (noteÂ † =Â) describes how the system is dephased in the eigenbasis ofÂ at a rate proportional to λ due to measurement backaction. The last two terms constitute a Fokker-Planck equation describing the detector time evolution. These terms define an Ornstein-Uhlenbeck process [77] with a system dependent superoperator drift coefficient A(D)ρ ≡ 1 2 {Â − D,ρ} and diffusion constant γ/8λ. This describes a noisy relaxation of the measurement outcome towards a value determined by the system state. The derivation of Eq. (1) is rather involved, see details in SM. The main text instead aims to highlight its implications and applications. However, we sketch the derivation at the end of the letter.
Equation (1) is, like most formalisms for continuous measurement and feedback, typically restricted to numerical solutions. However, when there exists a wide separation between the system and detector timescales, Eq. (1) simplifies to a Markovian master equation for the system stateρ t , allowing for analytical treatment. The detector timescale 1/γ appears in the last two terms in Eq. (1), and the system timescale 1/Γ is determined by L(D) + λD [Â]. The role of λ, the measurement strength, is subtle, see below. When γ Γ,ρ t evolves, to first order in 1/γ, according to with zeroth order Liouville superoperator L 0 and first order correction L corr . L 0 is obtained by approximating the system-detector density operator asρ and superoperators V aa ρ ≡ a|ρ|a |a a |, where we used the eigenvalues and eigenvectors of the measured operatorÂ = a ξ a |a a|. In this approximation, the detector is always in a system dependent stationary distribution π aa (D). This is justified for γ Γ, where changes of the system occur with a rate much smaller than the inverse detector relaxation time. Inserting this approximation in Eq. (1) results in L 0 = dDL(D) [ aa π aa (D)V aa ], describing the system dynamics for a detector with zero delay time. The first order correction γ −1 L corr accounts for the lag of the detector due to its finite response time γ −1 . As usual in linear response theory, this correction can be written in terms of time-integrated correlation functions -see SM. Note that λ plays a special role in the separation of timescales since it appears both in the first and second line of Eq. (1). In general, Eq. (2) is thus only justified for λ γ. Here we keep λ/γ arbitrary as there are scenarios where Eq. (2) also holds for strong measurements, see below.
We emphasize that Eq. (2) describes arbitrary feedback protocols, both linear and nonlinear in D. As a consistency check, we recover the Wiseman-Milburn equation [63] from Eq. (1) by employing the separation of timescales approximation to first order in 1/γ, using a linear feedback Liouville superoperator L(D)ρ = Lρ − iD[F ,ρ], with feedback HamiltonianF , and taking the infinite bandwidth limit (see SM). Our formalism thus generalizes the important earlier work of Ref. [63] to nonlinear feedback protocols.
In the following, we highlight the usefulness of Eq. (1) by studying protocols for power production in two toy models.
Classical toy model. By classical system, we refer to a situation with discrete energy levels, but where the density matrix remains diagonal in the energy basis at all times. This can be achieved either by suppressing quantum coherence by environmental noise, or by decoupling the diagonal and off-diagonal elements ofρ t (see SM for details). Under these conditions, [ρ t (D),Â] = 0 and the backaction term in Eq. (1) has no influence on the dynamics. To facilitate a comparison between the classical and quantum models, we use the same notation. We consider a classical two-level system, with states |0 and |1 , coupled to a thermal reservoir at temperature T , see inset of Fig. 2(a). The system and reservoir exchange energy quanta with energy ∆ at rate Γ. The state of the system is continuously monitored by measuring the observablê A =σ z , with Pauli-Z operatorσ z = |1 1| − |0 0|, such that whenever the measurement outcome D < 0 (D ≥ 0) for an ideal detector (low noise and delay), the system resides in |0 (|1 ). Feedback is incorporated by flipping the levels according to the solid arrows in Fig. 2(a) when an excitation is detected, i.e., when D changes sign, thereby extracting energy from the reservoir. The Hamiltonian is given byĤ where we introduced the feedback error probability η = [1 − erf(2 λ/γ)]/2 for a single feedback event, where erf(·) is the error function and 0 ≤ η ≤ 1/2. Feedback is applied incorrectly when the measurement outcome does not reflect the true system state. Note that, weak (strong) measurements yield high (low) detector noise and increase (decrease) the error probability.
To zeroth order in 1/γ, the average power production reads where P > 0 corresponds to extracting energy from the bath. For strong measurements (η → 0), feedback is consistently applied correctly and energy is only extracted from the reservoir. The maximum extraction rate P = Γ∆n B (∆) is limited by the coupling Γ and the average occupation n B (∆) of the bath. For weak measurements, feedback errors together with the asymmetry between excitation and de-excitation rates lead to a net dissipation of energy. Interestingly, the maximum dissipation rate P = −Γ∆/2 is independent of n B (∆). Equation (6) is plotted with a black, dashed line in Fig. 2, illustrating the behavior for weak and strong measurements. Additionally, we computed the power by (i) numerically solving Eq. (1) (solid colored lines), and (ii) using the separation of timescales technique to first order in 1/γ (dashed colored lines) (see SM for details). As γ decreases, the extracted power decreases because the detector can no longer resolve fast changes in the system, missing opportunities to extract energy. The separation of timescales approximation gradually breaks down as γ and Γ become comparable. Following Ref. [78], in the long-time limit, Eq. (5) implies the detailed fluctuation theorem for the number of extracted energy quanta m, where m > 0 (m < 0) corresponds to extracting (dissipating) energy from the bath. The term ∆/T is the entropy change in the bath related to the exchange of a single quantum. The information term ln 1−η η is given by the log-odds of not making an error and can be interpreted as the difference in information content between correctly and incorrectly applying feedback. Note that most information from the continuous measurement is discardedit is only the information during a change in the system state that matters. In the error free limit, η → 0, the information term diverges, illustrating absolute irreversibility, i.e., all excitations are extracted. See SM for a derivation of Eq. (7).
Quantum toy model. We consider a qubit coherently driven by an external driving field, see inset of Fig. 2(b). Measurement and feedback are identical to the classical toy model, now extracting energy from the driving field. The feedback protocol is described by where ∆ is the qubit level spacing, g the strength of the qubit-driving field coupling, andσ x the Pauli-X operator.
Separating system and detector timescales to first order in 1/γ results in system Liouville superoperator (details in SM) with effective dephasing rateλ = λ + ∆ 2 ln(2)/2γ, and The first term on the RHS of Eq. (9) represents the coherent drive, while the second term describes dephasing due to measurement and feedback. The third term is a source for quantum coherence, stabilizing the coherence in the long-time limit. We emphasize that the first order correction is essential to compute the power as the steady state coherence vanishes to leading order, and hence, no power can be extracted. Note that the third term, which goes beyond leading order, can lead to negativities inρ t , which is of no concern in the separation of timescales regime where the term is small. We stress that this term is trace preserving asσ x is traceless.
The average power of the system is given by P (t) = tr{[∂ tĤ (D)]ρ t }, where power is extracted (dissipated) when P (t) > 0 (P (t) < 0). Over one driving period τ = 2π/∆, the time averaged power reads For strong measurements λ γ, the power vanishes because of the quantum Zeno effect. For weak measurements λ γ, large detector noise leads to completely random feedback, and the power goes to zero because of the symmetric driving. This is highlighted in Fig. 2 Outline derivation main result. To outline the main steps in the derivation of Eq. (1), we start by describing the continuous measurement. For a single instantaneous measurement, the system stateρ t transforms aŝ whereK(z) is the measurement operator for obtaining outcome z, obeying the completeness relation dzK † (z)K(z) = 1, tr{ρ t (z)} is the probability of obtaining z, and dzρ t (z) is the system state for an unkown measurement outcome. Stressing that temporal coarse graining results in Gaussian noise for any measurement operator [79], we consider Gaussian measurement operators [79,80] where δt is the time between measurements. A weak continuous measurement is obtained by repeatedly measuring the system, taking the limit λδt → 0 for a fixed measurement strength λ. In this limit, the sequence of outcomes becomes a continuous signal z(t).
The detector bandwidth γ is introduced through a lowpass frequency filter [1,12,[81][82][83][84][85] such that the measurement outcome D(t) is a smoothened version of the signal z(t). The filter reduces the high frequency measurement noise and introduces a detector delay. This provides a realistic detector model, but the filter is also necessary for nonlinear feedback protocols because higher orders of z(t) are ill-defined due to its white noise spectrum which includes diverging frequencies [1,12,83]. Feedback is incorporated by controlling the system time evolution in between measurements, i.e., making the Liouville superoperator L(D) dependent on the frequency filtered measurement outcome D. Combining time evolution due to measurements and due to the Liouvillian, we find Eq. (1) in the continuous limit δt → 0. The derivation can be carried out either in the framework of stochastic calculus following the methods outlined in Refs. [79] and [68], or under the rules of conventional calculus. See details in SM.
Conclusions. We have derived a Fokker-Planck master equation for continuous feedback control, describing the joint system-detector dynamics for detectors with finite bandwidth. By separating system and detector timescales, we obtain a Markovian master equation for the system alone, opening a new avenue for analytical modeling of nonlinear feedback protocols. The Marko-vian description further implies fluctuation theorems, providing insight into the connection between thermodynamics and information theory. With two simple toy models, we highlighted the usefulness of our formalism, showing that it can be applied to a large variety of systems in both the classical and quantum regimes. Future endeavors include extensions of the formalism to include non-Markovian effects and state-estimation feedback [61,86].

Supplemental Material: Quantum Fokker-Planck Master Equation for Continuous Feedback Control
In this supplement, we provide detailed technical derivations for the results presented in the main text. Our main result, Eq. (1) in the main text, is derived by two different methods in Sec. I. Section II provides details on the separation of time-scales, which results in Eq. (2) in the main text. Two different approaches are provided. Additional details on the numerical calculations are provided in Sec. III and detailed calculations for the classical and quantum toy models are given in Secs. IV and V, respectively. Equation and Figure numbers not  In this section, we derive the quantum Fokker-Planck master equation (QFPME) in Eq. (1) in the main text by the means of conventional calculus. For compact notation, we introduce the measurement superoperator [79,80] where z is the outcome, λ measurement strength, andÂ the measured observable. To describe a continuous measurement, time is discretized into n intervals δt = (t − t 0 )/n, where t 0 and t are the initial and final times, respectively. By successively applying time evolutions and measurements on the initial stateρ t0 , we get representing the joint state of the system and the sequence of outcomes z j obtained at times t j = t 0 + jδt. Feedback is incorporated by the measurement dependent time evolutions in between measurements, where D j is the measurement outcome observed on the detector (which includes a low-pass filter with bandwidth γ) at time t j . The relation between the filtered outcome D j and the unfiltered outcome z j is defined by and is a discretized version of Eq. (13). For a fixed measurement strength λ, we obtain a weak continuous measurement in the limit λδt → 0.
We may now write the joint state of the system and the measurement record {D j } n j=1 aŝ with z 0 specifying the initial value of z(t). Equation (S4) results in where we substituted D n−1 → D and D n → D. Finally, to first order in δt, the measurement operator M (D|D ) reads where δ (D − D ) and δ (D − D ) denote the first and second derivative with respect to D on the Dirac delta function. Equation (S7) was found by usingÂ |a = ξ a |a and the inverse Fourier transform 2λ where we used e −γδt ≈ 1 − γδt. The first order expansion of the LHS is found by expanding the second exponential under the integral, and then computing the integral. Inserting Eq. (S7) in Eq. (S6), letting δt → dt results in Eq. (1). Finally, we emphasize that Eq. (1) in the main text preserves the trace ofρ t = dDρ t (D). To describe a normalized probability distribution over D, bothρ t (D) and ∂ Dρt (D) must vanish as |D|→ ∞. Thus, by integrating Eq. (1) over D, the last two terms vanish. The remaining two terms, i.e., L(D) and D[Â], are trace preserving, implying that Eq. (1) is trace preserving as well. It follows that tr{ρ t } = 1, and that dDP t (D) = 1, where P t (D) = tr{ρ t (D)}.

B. Stochastic calculus
Equation (1) in the main text may also be derived using the tools of Itô stochastic calculus. To this end, we consider the conditional density matrix which changes over time as [87] where z denotes the (unfiltered) measurement outcome at time t and M(z) is given in Eq. (S1). Note that due to the denominator on the right-hand side, this equation is nonlinear. We now introduce the Wiener increment which has zero mean and obeys dW 2 = dt [79]. With Eq. (S10), we may eliminate z from Eq. (S9). Expanding Eq. (S9) to first order in dt (second order in dW ) then results in the Belavkin equation [88] dρ c ≡ρ , we find the stochastic differential equation Finally, we note that the quantity of interest can be written aŝ and Inserting Eq. (S14) into Eq. (S13), we recover Eq. (1) in the main text with the help of Eqs. (S15,S11) and by employing

II. SEPARATION OF TIMESCALES
In this section, we provide details on the treatment of the regime where the detector and the system are governed by different time-scales. This results in Eq. (2) in the main text. We provide two different approaches, one based on Nakajima-Zwanzig projection operators, Sec. II A, and one based on multiple time scale perturbation theory, Sec. II B.

A. Nakajima-Zwanzig approach
To employ Nakajima-Zwanzig projection operators [89,90], we first re-write Eq. (1) in the main text as where the individual terms are defined in the main text. We now introduce the projection superoperator where π aa (D) is given in Eq. (3) and V aa ρ = |a a|ρ|a a |. We note that we have PF(D) = F(D)P = 0. Using these superoperators, we can show that where we assumed that the initial state fulfills Qρ 0 (D) = 0.
We now approximate the last equation assuming a separation of time-scales. For bookkeeping purposes, we assume L λ (D) ∝ Γ and F(D) ∝ γ, with γ Γ. To lowest order in Γ, we may then drop L λ (D) in the exponential in Eq. (S19). Furthermore, we make a Markov approximation, replacing the time argument of the density matrix under the integralρ s (D) →ρ t (D) and we extend the integral to minus infinity. This is justified when the integrand vanishes much faster than the time-scale over whichρ t changes. This results in the differential equation where we introduce the Drazin inverse [91,92] Integrating Eq. (S20) over D reproduces Eq. (2) in the main text with We now introduce the generalized Hermite polynomials of variance σ = γ/(8λ) where H n (x) = (−1) n e x 2 ∂ n x e −x 2 are the standard physicist's Hermite polynomials. The generalized Hermite polynomials fulfill the orthogonality condition With these definitions, it can be shown that (for n > 0) (S26) We note that for n = 0, the last expression reduces to the matrix elements of L 0 , cf. Eq. (S22). With the help of Eqs. (S25,S26), we can write the matrix elements of L corr in Eq. (S22) as

Linear feedback
Here we consider a feedback Liouvillian of the form In the inifinite bandwidth limit, this scenario is described by the Markovian master equation derived by Wiseman and Milburn [63]. For the zeroth order, we find The first order reduces to In the limit γ → ∞, we recover the master equation by Wiseman and Milburn [63]. The last two terms constitute finite bandwidth correction terms to this well-known equation. We note that the first term in L corr is linear in γ and thus also contributes to the infinite bandwidth limit. The reason for this is that in this limit, D itself, and thus the eigenvalues of L λ (D), may become very large. In particular, the standard deviation of D scales as √ γ resulting in an extra factor of γ on the right-hand side of Eq. (S22).

Threshold feedback
In the main text, we focus on feedback Liouvillians of the form where θ(x) = 1 for x ≥ 0 denotes the Heaviside theta function. For the zeroth order, we find where we introduced the superoperators For the n = 0 coefficients, we find

B. Multiple time scale perturbation approach in Fock-Liouville space
Here we present an alternative approach, using multiple time scale perturbation theory [93], for deriving the reduced master equation (2) of the main text. We analyze the problem in Fock-Liouville space, where density matrices are converted to column vectors and superoperators become matrices [94,95]. When we apply this formalism to a classical model in Sec. IV A, the subspaces corresponding to the classical populations and quantum coherences decouple, leading to classical rate equations.

Extension to Fock-Liouville space
Equation (1) of the main text can be rewritten by introducing a vectorized form of the density matrix |ρ , containing the entries of the original matrix stacked in a single vector with N 2 elements, where N is the dimension of the Hilbert space [94,95]. The master equation (1) then becomes Here L D and D A are N 2 × N 2 matrices with entries whereÂ|a = ξ a |a as in the main text. The elements of L D are functions of D, while the elements of the diagonal matrix D A are numbers. The operator F is a diagonal matrix of operators, whose elements F aa ,aa ≡ F aa are Ornstein-Uhlenbeck (OU) operators in D space with Gaussian stationary distributions centered at (ξ a + ξ a )/2 with variance γ/8λ, i.e., F aa π aa (D) = 0 with π aa given in Eq. (3). Note that in contrast to the Sec. II A, here we redefined OU operators in a dimensionless form (we factor out γ from Eq. (S17) to get Eq. (S38)) for convenience of applying the perturbation scheme as shown in the next subsection. However, to avoid clutter we are not introducing any new notation for the OU operators in this section and using F to refer to its dimensionless form as shown in Eq. (S38). The matrices L D and D A are the Fock-Liouville representations of the superoperators L(D) and D[Â] appearing in the first line of Eq. (1), and F is equivalent to the superoperator appearing on the second line of that equation.
We also define the N 2 × N 2 diagonal matrix such the aa diagonal element of G(D) is the stationary distribution of F aa . The matrix G(D) is the Fock-Liouville counterpart of the superoperator aa π aa (D)V aa appearing after Eq. (2) of the main text.

Perturbation scheme
To derive Eq. (2) of the main text from Eq. (1) using the multiple time scale method, we start with Eq. (S35) and define two time scales of the system: a slow time scale τ 1 = t and a fast time scale τ 2 = γt/Γ, where Γ is defined in the main text. The smallness parameter for the method is = Γ/γ. We now extend |ρ t (D) to its two-timed [93] analogue |ρ(D, τ 1 , τ 2 ) and rewrite Eq. (S35) as Next, we expand our two-timed state vector |ρ(D, τ 1 , τ 2 ) in a series, Substituting Eq. (S41) into Eq. (S40) and collecting terms by orders of we obtain the set of equations: The 0th order Eq. (S42) implies that |ρ The general solution to the kth order Eq. (S43) is The second term on the right side of Eq. (S45) may lead to secular terms, that is terms that grow linearly with τ 2 . For consistency with the perturbation scheme, such terms must be removed [93]. Secular terms arise in Eq. (S45) if the source term on the right side of Eq. (S43) contains a component inside the nullspace of the operator F. We therefore remove secular terms by imposing the condition that this source term contains no component in the nullspace of F. This implies Notice that, in removing the secular terms at kth order, Eq. (S46) imposes a condition on the (k − 1)th order solution. Thus, to completely specify a solution to any order of the perturbation scheme we need to impose the condition that the source term in the next order of the perturbation equation exists outside the nullspace of the operator F. Once this perturbative solution for the two-timed state vector |ρ [k] (D, τ 1 , τ 2 ) is obtained, reverting back to the original state vector |ρ [k] Solutions obtained from the multiple time-scale approach capture the dynamics both in the fast and slow time-scales but often we are more interested in the slow time-scale dynamics where we neglect the effect of the fast or transient dynamics. To obtain the slow time scale dynamics after the transient time, we set τ 2 → ∞ in Eq. (S44) and Eq. (S45) and then substitute back τ 1 = t. Under this separation of time scale assumption the 0th order system-detector distribution is given as where |ρ [0] t is yet to be determined. Similarly for k ≥ 1 we have with |ρ [k] t yet to be determined. F + is a diagonal matrix containing pseudo-inverses of the OU operators in the corresponding diagonal elements of F matrix where I is the identity matrix in the Fock-Liouville space and P 0 is the null-space projection operator defined as This operator F + is equivalent to the Drazin inverse defined in Eq. (S21) but now it is considered in the Fock-Liouville space. These coefficient vectors |ρ with j ≥ 0 can be understood as the marginalized density matrix elements written as vectors of the system when the detector variable D has been integrated out To determine these vectors, we use the secularity removal condition, Eq. (S46), written in the original variable t ∂ t |ρ These conditions lead to a set of master equations describing the dynamics of the system after the detector variable D is integrated out. In the next section we explicitly obtain these master equations in the 0th and 1st order of the perturbation scheme.
Although we have assumed that L D is time-independent in arriving at these results, our analysis remains valid if L D depends on the slow time variable τ 1 = t.

Master equations
Setting j = 0 in Eq. (S52) and using the 0th order solution Eq. (S47), we obtain In Eq. (S53) we have used the fact that D A is a diagonal matrix and is independent of the detector variable D. The matrixL 0 is the Fock-Liouville representation of the superoperator L 0 of the main text. Next, setting k = 1 in Eq. (S48) and using Eqs. (S47), (S53) and (S54), we get |ρ [1] t (D) = G(D)|ρ Now substituting this expression into Eq. (S52) with j = 1, we arrive at We can take a step further and rewrite this first order correction in terms of time integrals of correlation functions Here · φ denotes an ensemble average over the probability distribution φ(D), and ψ(t)ξ(0) cc φ denotes the ensemble average of the function ψ(D t )ξ(D 0 ), with D 0 initially sampled from φ(D 0 ), and the superscript cc indicates that D t is obtained by evolving D 0 for a time t under the dynamics generated by F cc . The integral in Eq. (S58) converges because the integrand decays exponentially fast to zero, by properties of OU dynamics.
Combining Eqs. (S53) and Eq. (S56) we get a master equation for the system dynamics alone, to O( 1 ) where we have used |ρ t = |ρ where I is the identity matrix and |ρ t evolves under Eq. (S59).

III. GENERAL NUMERICAL METHOD FOR SOLVING THE QFPME
Before providing detailed analytical calculations for the toy models based on the separation of time-scale approaches discussed above, we briefly outline how Eq. (1) was solved numerically for these models. While we considered two-level systems, generalizing the method to higher dimensions is straightforward.
To numerically find the steady state of Eq. (1) for the toy models, we expand the density operator in terms of the generalized Hermite polynomials [defined in Eq. (S23)] aŝ where the sum is truncated to N terms, the matrix M n is written in the {|0 , |1 } basis, and a n , b n , and c n are expansion coefficients for the respective elements of the density matrix. For the classical toy model, c n can be put to zero. Inserting this expansion in Eq. (1), multiplying with He To explicitly find the function f m for threshold feedback, the following identity was used n + m even, C nm , n + m odd, where C nm is given for even n and odd m, and we note that (−1)! ! = 1.

IV. CLASSICAL TOY MODEL
A. Recovering a classical model Starting from Eq. (1) of the main text, we can recover an effective classical model by considering a Liouville superoperator L(D) that does not connect diagonal elements (populations) with off-diagonal elements (coherences), in the basis of the measured operatorÂ. In this case, the coherences decouple from the populations and decay exponentially with time, provided the spectrum ofÂ is non-degenerate. We will illustrate this approach below. An alternative approach (which we do not pursue here) is to consider a model where the coherences are destroyed by a large measurement backaction or an additional decoherence term.
To obtain a Liouville superoperator with the above-mentioned property, we assume that the system Hamiltonian H commutes with the measured operatorÂ, allowing us to writeĤ|a = ω a |a andÂ|a = ξ a |a , and we consider a superoperator L(D) of the form where off-diagonal terms of the rate matrix are W aa = M aa , diagonal terms are W aa = − a =a M a a and F aa is defined by Eq. (S38). The coherences obey (a = a ) Note that the decay rate is proportional to the flux of probability out of state a and a and a contribution due to the measurement. In the long time limit the coherences decay to zero. We see from Eq. (S65) that the backaction term effectively drops out of the equations of motion for the populations. As a result, to use separation of time scales, we no longer need to assume that γ λ (as required in the general quantum setting, see main text) thus we can use arbitrary values of λ/γ. In a classical setting, measurements do not inherently disturb a system, hence λ only determines the information per unit time that is gathered about the physical system.
In the remainder of this subsection we focus on the evolution of the the detector distribution ρ aa (D) that corresponds to the populations, and we ignore the distribution ρ aa (D) corresponding to coherences. This effectively means that we will work in an N -dimensional space rather than an N 2 -dimensional space. Equivalently, we will now work with classical probability vectors rather than quantum density matrices. Following the notational convention of the main text, see comments after Eq. (1), we will use ρ(D, t) to denote the joint classical probability distribution of the system and detector, and ρ(t) to denote the probability distribution of the system alone. Similar comments apply to the steady state distributions ρ ss (D) and ρ ss .
At leading order of approximation, the equations of motion for the populations ρ aa follow from Eqs. (S53) and (S54), resulting in the classical rate equation where is a vector that contains all the populations, andW 0 is an N × N matrix with elements (W 0 ) aa = dD W aa (D)π a a (D).
By Eq. (S37), the elements of D A that operate on populations ρ aa vanish, hence the second term λD A on the right side of Eq. (S53) does not contribute to Eq. (S67).
At the next order of approximation, Eq. (S56) gives us where ρ [1] is defined similarly to ρ [0] , and the N × N matrixW 1 is given bỹ whose elements are obtained by keeping only those elements of G, see Eq. (S39), that correspond to populations and not to coherences. Similarly, F + c is the N × N diagonal matrix obtained by keeping only those elements of F + that correspond to populations and not to coherences. The classical analog of Eq. (S58) can be written as: Combining results and setting ρ = ρ [0] + ρ [1] , we obtain a master equation valid to first order in : where we have used Γ −1 = γ −1 . This result corresponds to Eq. (S59), applied to our classical model. The joint system-detector distribution for the classical case, to O( 1 ), follows from Eqs. (S47) and (S55): where I c is the N × N identity operator and ρ(t) obeys Eq. (S74). This is the classical analogue of Eq. (S60).

Perturbative solution to QFPME for classical toy model
As discussed in Sec. IV A, for the classical toy model we can write down rate equations in the classical subspace, that is the subspace of populations only, without coherences. In what follows, we first use the results of Sec. IV A to solve for ρ ss , and from that result we determine ρ ss (D).
For the particular model of threshold feedback we have where Here, Eq. (S76) is equivalent to the Eq. (4) of the main text and W + and W − are rate matrices that correspond to the classical subspace of the superoperators L + and L − respectively. Matrices G c (D) and F + c are given by (S78) Where π 00 and π 11 are Gaussian distributions centered at −1 and 1, respectively. They follow from Eq. (3) with the particular choice for the measurement operatorÂ =σ z with ξ 0 = −1 and ξ 1 = 1 as introduced in the main text. From these expressions we can evaluate the matricesW 0 andW 1 defined in Sec. IV A. Eq. (S69) gives where is the error probability. Equation (S79) leads to the following normalized steady state solution of Eq. (S67) which together with Eq. (S81) implies We now use these results to construct the joint steady state probability distribution ρ ss (D). Combining Eq. (S81) and the classical analogue of Eq. (S47) give the 0th order joint steady state distribution Similarly, Eqs. (S55,S81,S87) give If we now define for convenience a function then the joint system detector steady state distribution can be written, to first order, as ρ ss (D) = 1 2 π 00 (D) π 11 (D) Upon integrating this expression over the variable D, we recover Eq. (S86). Now we move to the calculation of the power extracted or dissipated by the feedback. There are two main sources of dissipation in the system: weak measurements leading to limited information and incorrect application of feedback, and slow detector dynamics that lag with respect to the system dynamics. The first order correction for the power using separation of timescales in γ captures the latter source of dissipation.

Power calculation
To where ρ ss (D) is given by Eq. (S91). The desired mode of operation corresponds to counterclockwise transitions around the network shown in Fig. S1, and in every cycle 2∆ work is extracted, where ∆ is the energy separation of the states |0 and |1 . In the steady state, the probability currents are equal across all four edges and can be calculated from any of the edges. Here we define power as P = 2∆J ss = 2∆J (1,+)→(0,+) = 2∆ (W + ) 01 P (1,+) − (W + ) 10 P (0,+) . We then obtain which can be written as with m, z and k defined by Eqs. (S83,S84,S85) [for an alternative expression, see Eq. (S111)]. We see that even in the 0th order in the detector dynamics, the extracted power decreases with η, reflecting the negative effects of weak measurements that gather insufficient information from the system.

Strong measurement approximation
When the detector is accurate (λ γ) the distributions π 00 and π 11 become very localized. In that situation η ≈ 0, and the function h(D) can be approximated as h(D) ≈ Γ(n B + 1) [π 00 (D) − π 11 (D)]. We also have where π(D; y) denotes the stationary distribution of an OU process centered at y. Eq. (S85) can now be approximated as and similarly in this approximation m ≈ 0 and z ≈ 0. Using these we can write down the expression for power at very strong measurement limit as where the negative contribution in the first order reflects the fact that we get less extracted power when the detector lag cannot be neglected.

Fluctuation theorem
Here we derive Eq. (7) in the main text. We may write Eq. (S79) as whereW (ν) aa is the transition rate from state a to a with level configuration ν = ±, for which − corresponds to the configuration with |0 as ground state and + to the one with |1 as ground state. The rates are given bỹ where k B is the Bolzmann constant. This is in line with the postulated relation for local detailed balance for Maxwell demon feedback given in Ref. [78], and we now follow this reference to derive the fluctuation theorem.
We begin by defining a forward trajectory X = {(t j , ν j , S j−1 → S j )} n j=1 , specifying that at time t j , with state configuration ν j , the system transit from state S j−1 to S j . A time reversed trajectory X tr can be defined analogously. In steady state, we find the detailed fluctuation theorem where P (X) denotes the probability for observing trajectory X and M (X) the number of extracted energy quanta for trajectory X. Defining the probability of extracting m energy quanta as P (m) = k:M (X k )=m P (X k ), summing over all trajectories with m extractions, we find Eq. (7) in the main text.

C. Alternative power calculation
In this section, we provide an alternative calculation for the power in the classical toy model that can be used to access also the power fluctuations. To this end, we employ the method of full counting statistics, following Ref. [96].
We start with the density matrix formalism to illustrate the general parts of this derivation. The specific calculations are then carried out in the vector notation used above. The probability of having n transferred energy quanta from reservoir to two-level system after time t is given by P t (n) = ∞ −∞ dD tr{ρ t (D, n)}, whereρ t (D, n) is the numberresolved system-detector density operator. The joint state of system and detectorρ t (D) = nρ t (D, n). The discrete Fourier transform of the number-resolved density operator readsρ t (D, χ) = nρ t (D, n)e inχ , where we introduced a counting field χ. We can now write the counting field-resolved version of Eq. (1) as with Liouvillian L(D, , with counting field dependent dissipators D χ ± [Ô]ρ = e ±iχÔρÔ † − 1 2 {Ô †Ô ,ρ}, andσ = |0 1|. We further define the moment generating function such that the average number of transferred quanta reads n = −i∂ χ ψ t (χ)| χ=0 . As t → ∞, the steady state current of quanta is given by whereρ ss (D) is the steady state solution to Eq. (1). The steady state power now becomes P = ∆ n /t. Equation (S104) may be computed numerically, using the method outlined in Sec. III, or, assuming a separation of time-scales, analytically as outlined below. Following the method outlined in Sec. II A 2, we find (using the vector notation introduced in Sec. IV A) where σ x denotes the Pauli x-matrix, I the identity in two dimensions, and we abbreviated n B ≡ n B (∆). For the first order correction, we obtaiñ where the sum goes over all entries in the vector and we introduced C 3 = C 1 + C 2 − ηC 0 . After a long but straightforward calculation, one can verify that Eq. (S94) matches Eq. (S110) with the following mapping whereσ = |0 1|. The power operator in this system is given bŷ P = ∂ tĤ (t) = −g∆ sin(∆t)(σ † +σ), (S113) and the average power is determined by P = Tr{Pρ}. Note that in general both the power operator as well as the density matrix are time-dependent objects.

A. Analytical calculations
To leading order in the separation of time-scales, the Liouvillian reduces to L 0ρ = −ig cos(∆t)[σ † +σ,ρ]. (S114) The steady state solution to this Liouvillian is simply the identity matrix which implies that P = 0 to leading order in the separation of time-scales. This is related to the subtle interplay between γ and λ in the quantum regime. In the large bandwidth limit, the measurement strength λ needs to go to zero in order for the separation of time-scales to be valid. Note that this trade-off is absent in the classical model as the time-scale of the system evolution is not determined by λ.

B. Numerical calculations
This section details the results of the numerical calculations for the quantum toy model. We begin by noting that the feedback Liouvillian in Eq. (S112) is periodic in time. Assuming that the system reaches a periodic steady state, we expand the system-detector density operator as a Fourier series, with time independent coefficientsρ q (D) = τ −1 τ 0 dtρ t (D)e −iq∆t , where τ = 2π/∆ is the driving period of the driving field. Equation (1) This implies that only q = ±1 provides non-zero solutions for Eq. (S121b). These were found numerically using the method outlined in Sec. III. The total density matrixρ t (D) =ρ (1) q=1 (D)e i∆t ] was used to numerically calculate the power in Fig. 2(b), and the density matrix elements are plotted for t = τ in Fig. 2(c).