Informational steady-states and conditional entropy production in continuously monitored systems

We put forth a unifying formalism for the description of the thermodynamics of continuously monitored systems, where measurements are only performed on the environment connected to a system. We show, in particular, that the conditional and unconditional entropy production, which quantify the degree of irreversibility of the open system's dynamics, are related to each other by the Holevo quantity. This, in turn, can be further split into an information gain rate and loss rate, which provide conditions for the existence of informational steady-states (ISSs), i.e. stationary states of a conditional dynamics that are maintained owing to the unbroken acquisition of information. We illustrate the applicability of our framework through several examples.


I. INTRODUCTION
The dynamics of a quantum system depends not only on itself, but also on how it is probed, showcasing the remarkable extrinsic character of quantum mechanics. This unavoidable backaction due to measurements can be directly probed in the laboratory [1][2][3][4], and is arguably one of the most intriguing aspects of quantum theory. It also has a clear thermodynamic flavor [5], since backaction is an intrinsically irreversible process. A comprehensive theory describing the thermodynamics of monitored systems would therefore greatly benefit our understanding of the interplay between information and dissipation. Constructing such a theory, however, is not trivial, since it requires reformulating the 2nd law to take into account the information learned from the measurements. We call this a conditional 2nd law. It quantifies which processes are allowed, given a certain set of measurement outcomes. Interestingly, due to measurement backaction, the noise introduced by the measurement can actually make the conditional process more irreversible, as recently demonstrated in a superconducting qubit experiment [6].
When a system is coupled to two baths at different temperatures, it usually tends to a non-equilibrium steady-state (NESS), where the competition between the two baths keeps the system away from equilibrium. Continuous measurements can lead to a similar effect. In this case, noise is constantly being introduced by the environment or the measurement backaction. But information is also constantly being acquired. These two effects compete, leading the system toward an informational steady-state (ISS). Crucially, the ISS relies on the experimenter's knowledge of the measurement records. A beautiful experimental illustration of this effect was recently given in [7], where the authors studied an optomechanical membrane monitored by an optical field. By measuring the field, one could monitor the position of the mechanical membrane and thus infer a steady-state which was close to the ground state. Conversely, if the measurements are not read, the membrane is perceived to be in a thermal state with a higher temperature. The ISS is therefore colder, due to the information acquired from the continuous measurement.
* gtlandi@if.usp.br
ISSs are just one example of the many interesting phenomena that emerge when quantum measurements are introduced in a thermodynamic picture. The deep connections between the two concepts, together with recent experimental advances in controlled quantum platforms, have led to a surge of interest in formulating conditional laws of thermodynamics [8][9][10][11][12][13][14][15][16][17][18][19][20]. This also motivated ground-breaking experiments applying these ideas to Maxwell demon engines and feedback control [6][21][22][23][24]. In all these frameworks, however, the measurements are assumed to act directly on the system, making them explicitly invasive.
Conversely, our interest in this paper will be on formulating the laws of thermodynamics when the measurements are done only on the environment and only after it interacted with the system. The scenario is therefore non-invasive by construction, so that any information acquired can only make the process more reversible, even if the measurement is very poor (as is often the case when dealing with large environments). This represents a change in philosophy compared to, e.g., Ref. [12], where the measurement was introduced by coupling the system to a memory and then measuring the memory. In that case one constructs the conditional 2nd law by comparing the situation where the system is fully isolated, with that in which it is open due to the interaction with the memory. In our case, we assume instead that the interaction between system and bath is inevitable and will happen whether or not we measure it. We then ask how measuring the bath affects the degree of irreversibility of the process.
Crucially, the framework we develop will focus on continuously monitored systems, in contrast to, e.g., Ref. [12]. It is therefore particularly suited for describing ISSs. Our endeavor began in Ref. [20], where we put forth a semiclassical theory valid for Gaussian processes. We were interested in quantum optical experiments, which have already been using some of these ideas for many decades, in the framework of continuously monitored systems [25,26]. In fact, our theory was recently employed in [27] to experimentally assess the conditional 2nd law in an optomechanical system. However, in addition to being semiclassical, the framework of Ref. [20] also has another serious limitation: it is formulated solely in terms of the stochastic master equation obeyed by the system; that is, it does not require an explicit model of the environment, but only which type of open dynamics it produces.
arXiv:2103.06247v2 [quant-ph] 12 Sep 2022
There has been increasing evidence that a proper formulation of thermodynamics in the quantum regime is only possible if information on the environment and the system-environment interactions is provided [28]. Reduced descriptions, based only on master equations, can show apparent violations of the 2nd law [29], something which can only be resolved by introducing a specific model of the environment [30].
In this paper we put forth a very general framework for describing the thermodynamics of continuously monitored systems, where measurements are only done indirectly in the bath. The formalism applies to a broad variety of systems and processes, and is particularly suited for describing ISSs. The building block we use is to replace the continuous dynamics by a stroboscopic evolution in small time-steps, described in terms of a collisional model (CM) [31][32][33][34][35][36][37][38][39][40]. This has two main advantages. First, the thermodynamics of CMs is by now very well understood [30][40][41][42][43] (see also [28] for a recent review). And second, CMs naturally emerge in quantum optics, from a discretization of the field operator into discrete time bins [44,45]. The typical scenario is a system interacting with an optical cavity, where a constant flow of photons is injected by an external pump [cf. Fig. 1(a)]. At each time step, the system will only interact with a certain time-window of the input/output field, thus transforming the dynamics into that of a series of sequential collisions between the system and a set of ancillae. Due to this connection, collisional models serve as a convenient tool for constructing the framework of continuous measurements in experimentally relevant systems. We refer to these as Continuously Monitored Collisional Models (CM$^2$).
Our paper is organized as follows. Sec. II establishes the basic framework, including the collisional setup. The corresponding information flows and thermodynamic features are characterized in Sec. III, which also contains the main contribution of this work: namely the construction of a conditional 2nd law, which is capable of capturing the interplay between thermodynamics and information. In Sec. IV, we apply the CM$^2$ framework to models involving qubits, providing some illustrative applications. Accompanying this manuscript, we also make publicly available a self-contained numerical library in Mathematica, for carrying out stochastic simulations of CM$^2$s [46]. Finally, in Sec. V we draw our conclusions and highlight the perspectives opened by our approach.
Here we develop the basic framework of CM$^2$. We consider a system $X$, with initial density matrix $\rho_{X_0}$, which is put to interact sequentially with a series of independent and identically prepared (iid) ancillae, labelled $Y_1$, $Y_2$, etc., and prepared always in the same state $\rho_{Y_t} = \rho_Y$. Time is labeled in discrete units $t = 0, 1, 2, 3, \ldots$. The collision taking the system from $t-1$ to $t$ is described by a unitary $U_t$ acting only between the system $X$ and ancilla $Y_t$ as [Fig. 1(b)]
\[ \rho_{X_t Y_t'} = U_t \big( \rho_{X_{t-1}} \otimes \rho_{Y_t} \big) U_t^\dagger, \tag{1} \]
where $Y_t'$ refers to the state of ancilla $Y_t$ after the collision. Taking the partial trace over the ancilla leads to the stroboscopic (Markovian) map
\[ \rho_{X_t} = \mathrm{tr}_{Y_t} \big\{ U_t \big( \rho_{X_{t-1}} \otimes \rho_{Y_t} \big) U_t^\dagger \big\} := \mathcal{E}(\rho_{X_{t-1}}). \tag{2} \]
Notice that $\mathcal{E}$ does not need to carry an index $t$, since it is the same for all collisions. After such a map, the ancilla $Y_t$ never participates again in the dynamics and, for the next step, a fresh ancilla $Y_{t+1}$ is introduced and the map in Eq. (2) is repeated.
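The stroboscopic map in Eq. (2) can be illustrated with a minimal numerical sketch. It assumes a qubit system and a qubit ancilla, and a partial SWAP parametrized as $U = \cos(g)\,\mathbb{1} - i \sin(g)\,\mathrm{SWAP}$; this parametrization, the helper names (`partial_swap`, `ptrace_Y`, `collision_map`), the values of `f` and `g`, and the initial state are illustrative assumptions, not taken from the numerical library of Ref. [46].

```python
import numpy as np

# Assumed parametrization of a partial SWAP: U = exp(-i g SWAP) = cos(g) 1 - i sin(g) SWAP.
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

def partial_swap(g):
    return np.cos(g) * np.eye(4) - 1j * np.sin(g) * SWAP

def ptrace_Y(rho_XY):
    """Trace out the ancilla (second qubit) of a two-qubit density matrix."""
    return np.einsum('ijkj->ik', rho_XY.reshape(2, 2, 2, 2))

def collision_map(rho_X, rho_Y, U):
    """One application of Eq. (2): tr_Y[ U (rho_X (x) rho_Y) U^dagger ]."""
    return ptrace_Y(U @ np.kron(rho_X, rho_Y) @ U.conj().T)

f, g = 0.3, 0.3                                # illustrative values
rho_Y = np.diag([f, 1 - f]).astype(complex)    # same ancilla state at every collision
rho_X = np.diag([1.0, 0.0]).astype(complex)    # hypothetical initial state |0><0|
U = partial_swap(g)

for t in range(200):                           # a fresh ancilla enters at every step
    rho_X = collision_map(rho_X, rho_Y, U)

print(np.round(rho_X.real, 6))
```

Under these assumptions the system state relaxes toward `rho_Y`, since a partial SWAP leaves the product state $\rho_Y \otimes \rho_Y$ invariant.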
Information on the state of the system is acquired indirectly by measuring the states $\rho_{Y_t'}$ of each ancilla after they collided with $X$. The measurement is described by a set of generalized measurement operators $\{M_z\}$, satisfying $\sum_z M_z^\dagger M_z = \mathbb{1}$, so that outcome $z_t$ occurs with probability
\[ P(z_t) = \mathrm{tr}\big\{ M_{z_t} \rho_{Y_t'} M_{z_t}^\dagger \big\}. \tag{3} \]
By using generalized measurements, we encompass both projective and weak measurements in the bath. A diagrammatic depiction of the dynamics is shown in Fig. 1(c). A CM$^2$ is completely described by specifying $\{\rho_Y, U, M_z\}$.
The distribution in Eq. (3) concerns only the marginal statistics of a single outcome. Our interest will be instead on the joint statistics of the set of measurement records
\[ \zeta_t = (z_1, \ldots, z_t). \tag{4} \]
The indices are chosen so that $\zeta_t$ contains all information about the system available up to time $t$. As $\zeta_t$ encompasses the entire measurement record, it is associated with the "integrated" information on $X$. Conversely, $z_t$ represents a differential information gain associated only with the step $X_{t-1} \to X_t$ [Fig. 1(d)]. The joint distribution $P(\zeta_t)$ is given by
\[ P(\zeta_t) = \mathrm{tr}\big\{ M_{z_t} \cdots M_{z_1}\, \rho_{X_t Y_1' \ldots Y_t'}\, M_{z_1}^\dagger \cdots M_{z_t}^\dagger \big\}, \tag{5} \]
where each $M_{z_j}$ acts on ancilla $Y_j$ and
\[ \rho_{X_t Y_1' \ldots Y_t'} = U_t \cdots U_1 \big( \rho_{X_0} \otimes \rho_{Y_1} \otimes \cdots \otimes \rho_{Y_t} \big) U_1^\dagger \cdots U_t^\dagger. \]
Note that since the measurements act only on those ancillae that no longer participate in the dynamics, it is irrelevant whether the measurement $M_{z_t}$ occurs before the next evolution with $Y_{t+1}$ or not. Finally, we also require the conditional state of the system $\rho_{X_t|\zeta_t}$, which quantifies the knowledge the experimenter has about the system, given that the measurement record $\zeta_t$ was observed. Such a state is given by
\[ \rho_{X_t|\zeta_t} = \frac{1}{P(\zeta_t)}\, \mathrm{tr}_{Y_1 \ldots Y_t}\big\{ M_{z_t} \cdots M_{z_1}\, \rho_{X_t Y_1' \ldots Y_t'}\, M_{z_1}^\dagger \cdots M_{z_t}^\dagger \big\}. \tag{6} \]

FIG. 1. (a) A typical method for continuously monitoring a system is to couple it to an optical cavity and measure the photons leaking out. (b) In a collisional model picture, the monitoring is introduced instead through a series of sequential collisions between the system $X$ and independent ancillae $Y_t$, which are subjected to measurement after each collision. (c) Diagrammatic representation of the model. The system is described stroboscopically (discrete time) by a state $\rho_{X_t}$. At each instant of time, it interacts with an independent ancilla, prepared in state $\rho_Y$, according to the map in Eq. (1). Afterwards, the ancillae are measured, as described by generalized measurement operators $\{M_z\}$, which produce a classical (and random) outcome $z_t$. (d) As time progresses, one builds up a measurement record $\zeta_t = (z_1, \ldots, z_t)$, which contains all the information acquired about $X$ up to time $t$.
As the measurements are performed only on the ancillae, there is never a direct backaction on the system, which is expressed mathematically by
\[ \sum_{\zeta_t} P(\zeta_t)\, \rho_{X_t|\zeta_t} = \rho_{X_t}, \tag{7} \]
for any choice of generalized measurements $\{M_z\}$. That is, the average of $\rho_{X_t|\zeta_t}$ over all outcomes $\zeta_t$ yields back the unconditional state $\rho_{X_t}$. Thus, while there may be a conditional backaction, unconditionally the measurement is non-invasive. The normalization factor $P(\zeta_t)$ in Eq. (6) introduces an unwanted complication, as it prevents us from writing $\rho_{X_t|\zeta_t}$ as a map acting on $\rho_{X_{t-1}|\zeta_{t-1}}$. This can be resolved, however, if we work with unnormalized states. We define the completely positive, trace non-preserving map
\[ \mathcal{E}_z(\rho_X) = \mathrm{tr}_Y\big\{ M_z U \big( \rho_X \otimes \rho_Y \big) U^\dagger M_z^\dagger \big\}, \tag{8} \]
which is indexed by the possible outcomes $z$ of the measurements. Instead of working with $\rho_{X_t|\zeta_t}$ in Eq. (6), we consider the unnormalized states $\varrho_{X_t|\zeta_t}$, defined as the sequence generated by the map
\[ \varrho_{X_t|\zeta_t} = \mathcal{E}_{z_t}\big( \varrho_{X_{t-1}|\zeta_{t-1}} \big), \tag{9} \]
with initial condition $\varrho_{X_0|\zeta_0} = \rho_{X_0}$. One may readily verify that
\[ P(\zeta_t) = \mathrm{tr}\, \varrho_{X_t|\zeta_t}. \tag{10} \]
The states $\varrho_{X_t|\zeta_t}$ therefore contain the outcome distribution $P(\zeta_t)$ at any given time. The normalized state in Eq. (6) is recovered as $\rho_{X_t|\zeta_t} = \varrho_{X_t|\zeta_t}/P(\zeta_t)$.
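The bookkeeping with unnormalized states can be checked numerically for short records. The sketch below is an illustration only: it assumes qubits, a partial SWAP of the form $\cos(g)\,\mathbb{1} - i\sin(g)\,\mathrm{SWAP}$, projective measurements of the ancilla in the computational basis, and a hypothetical initial state $|+\rangle\langle +|$. It enumerates every record $\zeta_t$ of a given length, and confirms that the traces of the unnormalized states add up to one and that their sum reproduces the unconditional state.

```python
import itertools
import numpy as np

SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=complex)
f, g = 0.3, 0.3                                         # illustrative values
U = np.cos(g) * np.eye(4) - 1j * np.sin(g) * SWAP       # assumed partial SWAP
rho_Y = np.diag([f, 1 - f]).astype(complex)
M = [np.diag([1.0, 0.0]).astype(complex),               # M_0 = |0><0|
     np.diag([0.0, 1.0]).astype(complex)]               # M_1 = |1><1|

def ptrace_Y(rho_XY):
    """Trace out the ancilla (second qubit)."""
    return np.einsum('ijkj->ik', rho_XY.reshape(2, 2, 2, 2))

def E_z(rho_X, z):
    """Unnormalized conditional update: tr_Y[M_z U (rho (x) rho_Y) U^dagger M_z^dagger]."""
    K = np.kron(np.eye(2), M[z]) @ U
    return ptrace_Y(K @ np.kron(rho_X, rho_Y) @ K.conj().T)

def E(rho_X):
    """Unconditional map: tr_Y[U (rho (x) rho_Y) U^dagger]."""
    return ptrace_Y(U @ np.kron(rho_X, rho_Y) @ U.conj().T)

rho_0 = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)  # hypothetical initial state |+><+|
T = 4

unnormalized = {}
for zeta in itertools.product([0, 1], repeat=T):         # all 2^T measurement records
    rho = rho_0
    for z in zeta:
        rho = E_z(rho, z)
    unnormalized[zeta] = rho

total_prob = sum(np.trace(r).real for r in unnormalized.values())
averaged = sum(unnormalized.values())                    # sum_zeta P(zeta) rho_{X|zeta}

rho_uncond = rho_0
for _ in range(T):
    rho_uncond = E(rho_uncond)

print(total_prob, np.allclose(averaged, rho_uncond))
```

Exhaustive enumeration scales as $2^t$ and is only meant as a consistency check; for long records one samples trajectories instead.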
It is useful to keep in mind the interpretation of a CM$^2$ as a Hidden Markov model [9,47,48]. The system evolution is Markovian, but this is hidden from the observer, who is partially ignorant about its dynamics: access to $X$ is only possible through the classical outcomes $\zeta_t$. In the language of Bayesian networks, the key issue entailed by our framework is thus about the predictions that can be made on the state of the hidden layer $X$, given the information available through the visible layer of the outcomes $\zeta_t$ only. This highlights the interplay between quantum and classical features present in these models: the evolution of the system is quantum, but information is only accessed through classical data. We have also found it illuminating to understand what would be the classical version of a CM$^2$, as this allows us to relate our framework directly with the classical formalism of Ito, Sagawa and Ueda [9,10]. This is addressed in Appendix A, where we also discuss the conditions for a CM$^2$ to be incoherent.

A. Quantum-classical information
The information content in the unconditional state $\rho_{X_t}$ can be quantified by the von Neumann entropy $S(X_t) \equiv S(\rho_{X_t}) = -\mathrm{tr}\, \rho_{X_t} \ln \rho_{X_t}$. Similarly, the information in the conditional state $\rho_{X_t|\zeta_t}$ (properly normalized) is quantified by the quantum-classical conditional entropy
\[ S(X_t|\zeta_t) = \sum_{\zeta_t} P(\zeta_t)\, S(\rho_{X_t|\zeta_t}). \tag{11} \]
Each term $S(\rho_{X_t|\zeta_t})$ quantifies the information for one specific realization $\zeta_t$, and $S(X_t|\zeta_t)$ is then an average over all trajectories. Note also that this is not the quantum conditional entropy, a quantity which can be negative. Here, since we are conditioning on classical outcomes, $S(X_t|\zeta_t)$ is always non-negative. In this paper all conditional entropies will be of this form. The mismatch between $S(X_t)$ and $S(X_t|\zeta_t)$ is given by the Holevo information (or Holevo quantity) [49]
\[ I(X_t : \zeta_t) = S(X_t) - S(X_t|\zeta_t). \tag{12} \]
It quantifies the information about $X$ contained in the classical outcomes $\zeta_t$. Its interpretation becomes clearer by casting it as
\[ I(X_t : \zeta_t) = \sum_{\zeta_t} P(\zeta_t)\, D\big( \rho_{X_t|\zeta_t} \,\|\, \rho_{X_t} \big), \tag{13} \]
where $D(\rho\|\sigma) = \mathrm{tr}(\rho \ln \rho - \rho \ln \sigma)$ is the quantum relative entropy. Therefore, $I(X_t : \zeta_t)$ is the weighted average of the "distance" between $\rho_{X_t|\zeta_t}$ and $\rho_{X_t}$. The Holevo information reflects the integrated information acquired about the system up to time $t$. This is different from the small increment that is obtained from a single outcome $z$ at each step. In order to quantify such differential information gain, the natural quantity is the conditional Holevo information
\[ \begin{aligned} G_t := I(X_t : z_t | \zeta_{t-1}) &= I(X_t : \zeta_t) - I(X_t : \zeta_{t-1}) \\ &= S(X_t|\zeta_{t-1}) - S(X_t|\zeta_t). \end{aligned} \tag{14} \]
It describes the correlations between $X_t$ and the latest available outcome $z_t$, given the past outcomes $\zeta_{t-1} = (z_1, \ldots, z_{t-1})$. The term $S(X_t|\zeta_{t-1})$ involves the state $\rho_{X_t|\zeta_{t-1}}$, which stands for the state of the system at time $t$, conditioned on all measurement records except the last one. In symbols, it can thus be written as
\[ \rho_{X_t|\zeta_{t-1}} = \mathcal{E}\big( \rho_{X_{t-1}|\zeta_{t-1}} \big), \tag{15} \]
where $\mathcal{E}$ is the unconditional map in Eq. (2). This affords a clear interpretation of Eq. (14). Starting at $\rho_{X_{t-1}|\zeta_{t-1}}$, one compares two paths: a conditional evolution taking $\rho_{X_{t-1}|\zeta_{t-1}} \to \rho_{X_t|\zeta_t}$ and an unconditional evolution taking $\rho_{X_{t-1}|\zeta_{t-1}} \to \rho_{X_t|\zeta_{t-1}}$. Eq. (14) measures the gain in information of the former, compared to the latter.
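The equivalence between the entropy-difference form of the Holevo information, Eq. (12), and its relative-entropy form, Eq. (13), can be verified on any ensemble. The following sketch uses a small two-state qubit ensemble chosen purely for illustration (the probabilities and states are hypothetical, and full-rank states are used so that the matrix logarithms are well defined).

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy -tr(rho ln rho), ignoring numerically zero eigenvalues."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def log_herm(rho):
    """Matrix logarithm of a full-rank Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(rho)
    return (v * np.log(w)) @ v.conj().T

def rel_entropy(rho, sigma):
    """Quantum relative entropy D(rho||sigma) = tr(rho ln rho - rho ln sigma)."""
    return float(np.trace(rho @ (log_herm(rho) - log_herm(sigma))).real)

# Hypothetical ensemble {P(zeta), rho_{X|zeta}} with two outcomes.
probs = [0.4, 0.6]
states = [np.diag([0.9, 0.1]).astype(complex),
          np.array([[0.5, 0.3], [0.3, 0.5]], dtype=complex)]

rho_avg = sum(p * r for p, r in zip(probs, states))          # unconditional state

holevo_entropy = entropy(rho_avg) - sum(p * entropy(r) for p, r in zip(probs, states))
holevo_relent = sum(p * rel_entropy(r, rho_avg) for p, r in zip(probs, states))

print(holevo_entropy, holevo_relent)
```

Both expressions agree up to numerical precision, and the result is non-negative, as expected from the non-negativity of the relative entropy.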

B. Information rates and informational steady-states
Eq. (12) is always non-negative. However, this does not imply that it will necessarily increase with time. In fact, the information rate
\[ \Delta I_t := I(X_t : \zeta_t) - I(X_{t-1} : \zeta_{t-1}) \tag{16} \]
can take any sign. This reflects the trade-off between the gain in information and the measurement backaction. A natural question is then whether it is possible to split $\Delta I_t$ as the difference between two non-negative terms, the first naturally identified with the differential gain of information (14), and the second with the differential information loss. That is, whether a splitting of the form
\[ \Delta I_t = G_t - L_t \tag{17} \]
would lead to the identification of a loss term $L_t$ which is non-negative. As we will see in what follows, the answer to this question is in the positive.
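To find a formula for $L_t$ we simply insert the first line of (14) into Eq. (16) to find
\[ L_t = I(X_{t-1} : \zeta_{t-1}) - I(X_t : \zeta_{t-1}). \tag{18} \]
This is already clearly interpretable as a loss term, as it measures how information is degraded by the map in Eq. (15). Indeed, we can show that it is non-negative. To do that, we use Eq. (13) to write $L_t$ as
\[ L_t = \sum_{\zeta_{t-1}} P(\zeta_{t-1}) \Big[ D\big( \rho_{X_{t-1}|\zeta_{t-1}} \,\|\, \rho_{X_{t-1}} \big) - D\big( \mathcal{E}(\rho_{X_{t-1}|\zeta_{t-1}}) \,\|\, \mathcal{E}(\rho_{X_{t-1}}) \big) \Big], \tag{19} \]
where we used $\rho_{X_t|\zeta_{t-1}} = \mathcal{E}(\rho_{X_{t-1}|\zeta_{t-1}})$ [cf. Eq. (15)] and $\rho_{X_t} = \mathcal{E}(\rho_{X_{t-1}})$. Together with the data processing inequality [50], this is enough to ascertain the non-negativity of $L_t$ for any quantum channel $\mathcal{E}$.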
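For reference, the algebra behind the loss term can be spelled out in one line each, using only the definition of the information rate, $\Delta I_t = I(X_t:\zeta_t) - I(X_{t-1}:\zeta_{t-1})$, the splitting $\Delta I_t = G_t - L_t$, and the first line of Eq. (14):

\[
\begin{aligned}
L_t &= G_t - \Delta I_t \\
    &= \big[ I(X_t:\zeta_t) - I(X_t:\zeta_{t-1}) \big] - \big[ I(X_t:\zeta_t) - I(X_{t-1}:\zeta_{t-1}) \big] \\
    &= I(X_{t-1}:\zeta_{t-1}) - I(X_t:\zeta_{t-1}).
\end{aligned}
\]

Non-negativity then follows term by term from the data processing inequality, $D\big(\mathcal{E}(\rho)\,\|\,\mathcal{E}(\sigma)\big) \leq D(\rho\,\|\,\sigma)$, applied with $\rho = \rho_{X_{t-1}|\zeta_{t-1}}$ and $\sigma = \rho_{X_{t-1}}$ for each record $\zeta_{t-1}$.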
In the long time limit the system may reach a steady-state where $I_\infty$ no longer changes, so $\Delta I_\infty = 0$. This does not necessarily mean $G_\infty = L_\infty = 0$, however. It might simply stem from a mutual balancing of gains and losses. That is, $G_\infty = L_\infty \neq 0$. We define an informational steady-state (ISS) as the asymptotic state for which
\[ \Delta I_{\rm ISS} = 0 \quad \text{but} \quad G_{\rm ISS} = L_{\rm ISS} \neq 0. \tag{20} \]
In an ISS, information is continuously acquired, but this is balanced by the noise that is introduced by the measurement. Crucially, the ISS does not mean that $\rho_{X_t|\zeta_t}$ is no longer changing. This state is stochastic and thus continues to evolve indefinitely. Instead, what becomes stationary is the stochastic distribution of states in state-space [51].

C. Unconditional 2nd law
Next we turn to the thermodynamics. The 2nd law of thermodynamics characterizes the degree of irreversibility of a certain process and can be formulated in purely information-theoretic terms. This allows it to be extended beyond standard thermal environments, and also to avoid difficulties associated with the definition of heat and work, which can be quite problematic in the quantum regime [28].
At each collision, the entropy of the system will change from $S(X_{t-1})$ to $S(X_t)$. This change, however, may be either positive or negative. The goal of the 2nd law is to identify a contribution to this change associated with the flow of entropy between system and ancilla, and another representing the entropy that was irreversibly produced in the process. The separation thus takes the form
\[ S(X_t) - S(X_{t-1}) = \Delta\Sigma^u_t - \Delta\Phi^u_t, \tag{21} \]
where $\Delta\Phi^u_t$ is the unconditional flow rate of entropy from the system to the ancilla in each collision, and $\Delta\Sigma^u_t$ is the unconditional rate of entropy produced in the process. The 2nd law is summarized by the statement that we should have $\Delta\Sigma^u_t \geq 0$. Eq. (21) is merely a definition, however. The goal is precisely to determine the actual forms of $\Delta\Phi^u_t$ and $\Delta\Sigma^u_t$. In standard thermal processes, this is usually accomplished by postulating that the entropy flow $\Delta\Phi^u_t$ should be linked with the heat flow $Q_t$ entering the ancillae through Clausius' expression [52] $\Delta\Phi^u_t = \beta Q_t$, where $\beta$ is the inverse temperature of the thermal state the ancillae are in. By fixing $\Delta\Phi^u_t$ we then also fix $\Delta\Sigma^u_t$. This, however, only holds for thermal ancillae, thus restricting the range of applicability of the formalism.
Instead, we approach the problem using the framework developed in Ref. [53] (see also [40,54]), which formulates the entropy production rate in information-theoretic terms, as
\[ \Delta\Sigma^u_t = I(X_t : Y_t') + D(Y_t' \| Y_t), \tag{22} \]
where
\[ I(X_t : Y_t') = S(X_t) + S(Y_t') - S(X_t Y_t') \]
is the quantum mutual information between system and ancilla after Eq. (1) and
\[ D(Y_t' \| Y_t) = D\big( \rho_{Y_t'} \,\|\, \rho_{Y_t} \big) \]
is the relative entropy between the state of the ancilla after and before the collision. The first term thus accounts for the correlations that built up between system and ancilla, while the second measures the amount by which the ancillae were pushed away from their initial states. Thus, from the perspective of the system, irreversibility stems from tracing over the ancillae after the interaction in such a way that all quantities related either to the local state of the ancilla, or to their global correlations, are irretrievable [54].
As the global map in Eq. (1) is unitary, and the system and ancillae are always uncorrelated before a collision, it follows that
\[ S(X_t Y_t') = S(X_{t-1}) + S(Y_t). \]
Hence, the mutual information may also be written as
\[ I(X_t : Y_t') = S(X_t) - S(X_{t-1}) + S(Y_t') - S(Y_t). \]
Plugging this in Eq. (22) and comparing with Eq. (21) then allows us to identify the entropy flux as
\[ \Delta\Phi^u_t = S(Y_t') - S(Y_t) + D(Y_t' \| Y_t) = \mathrm{tr}\big\{ \big( \rho_{Y_t} - \rho_{Y_t'} \big) \ln \rho_{Y_t} \big\}. \tag{25} \]
The entropy flux is seen to depend solely on the degrees of freedom of the ancilla. Although Eq. (25) is general and holds for arbitrary states of the ancillae, it reduces to $\beta Q_t$, as in the Clausius expression, if $\rho_Y$ is thermal. Another very important property of the entropy flux is additivity. What we call an "ancilla" may itself be a composite system consisting of multiple elementary units. In fact, as we will illustrate in Sec. IV, this can give rise to interesting situations. Suppose that $Y_t = (Y_{t1}, Y_{t2}, \ldots, Y_{tN})$ and that the units are prepared in a globally product state $\rho_{Y_t} = \bigotimes_{j=1}^{N} \rho_{Y_{tj}}$. After colliding with the system, the state $\rho_{Y_t'}$ might no longer be a product state, in general. Despite this, owing to the structure of Eq. (25), we would have
\[ \Delta\Phi^u_t = \sum_{j=1}^{N} \mathrm{tr}\big\{ \big( \rho_{Y_{tj}} - \rho_{Y_{tj}'} \big) \ln \rho_{Y_{tj}} \big\}, \tag{26} \]
where $\rho_{Y_{tj}'}$ is the post-collision reduced state of the $j$-th unit of the ancilla. This property is quite important, as it allows one to compute the flux associated with each dissipation channel acting on the system.
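The reduction to the Clausius form can be made explicit in a few lines. As an illustration, assume the ancilla is prepared in a Gibbs state $\rho_{Y_t} = e^{-\beta H_Y}/Z_Y$, where the ancilla Hamiltonian $H_Y$ and partition function $Z_Y = \mathrm{tr}\, e^{-\beta H_Y}$ are notation introduced here for the sketch. Writing the flux as $\Delta\Phi^u_t = S(Y_t') - S(Y_t) + D(Y_t'\|Y_t)$, which follows from Eqs. (21) and (22), and using $\ln \rho_{Y_t} = -\beta H_Y - \ln Z_Y$, one finds

\[
\begin{aligned}
\Delta\Phi^u_t &= -\mathrm{tr}\big\{ \rho_{Y_t'} \ln \rho_{Y_t} \big\} + \mathrm{tr}\big\{ \rho_{Y_t} \ln \rho_{Y_t} \big\} \\
&= \beta\, \mathrm{tr}\big\{ H_Y \big( \rho_{Y_t'} - \rho_{Y_t} \big) \big\} + \ln Z_Y \big( \mathrm{tr}\,\rho_{Y_t'} - \mathrm{tr}\,\rho_{Y_t} \big) \\
&= \beta\, Q_t ,
\end{aligned}
\]

with $Q_t := \mathrm{tr}\{ H_Y ( \rho_{Y_t'} - \rho_{Y_t} ) \}$ the energy gained by the ancilla in the collision. The $\ln Z_Y$ term drops out because both states are normalized.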

D. Conditional 2nd law
Eqs. (21), (22) and (25) specify the thermodynamics of the unconditional trajectories $\rho_{X_t}$, when no information about the ancillae is recorded. We now ask the same question for the conditional trajectories $\rho_{X_t|\zeta_t}$. In this case, the relevant entropy is the quantum-classical conditional entropy $S(X_t|\zeta_t)$ in Eq. (11). Thus, we search for a splitting analogous to Eq. (21), but of the form
\[ S(X_t|\zeta_t) - S(X_{t-1}|\zeta_{t-1}) = \Delta\Sigma^c_t - \Delta\Phi^c_t, \tag{27} \]
where $\Delta\Sigma^c_t$ and $\Delta\Phi^c_t$ are the conditional counterparts of the unconditional quantities used in Sec. III C. The identification of suitable forms for such quantities is the scope of this Section.
We adopt an approach similar to that used in Refs. [20,55], which consists in defining the conditional flux rate as the natural extension of Eq. (25) to the conditional case. That is, as $\Delta\Phi^c_t$ refers to a specific collision, it should depend only on quantities pertaining to the specific ancilla $Y_t$, thus being of the form
\[ \Delta\Phi^c_t = S(Y_t'|z_t) - S(Y_t) + \sum_{z_t} P(z_t)\, D\big( \rho_{Y_t'|z_t} \,\|\, \rho_{Y_t} \big), \tag{28} \]
where $\rho_{Y_t'|z_t} = (M_{z_t} \rho_{Y_t'} M_{z_t}^\dagger)/P(z_t)$ is the final state of the ancilla given outcome $z_t$ and $P(z_t) = \mathrm{tr}(M_{z_t} \rho_{Y_t'} M_{z_t}^\dagger)$ [cf. Eq. (3)]. Moreover, $S(Y_t'|z_t)$ is defined similarly to Eq. (11). Note how the causal structure of the model implies that the flux should be conditioned only on outcome $z_t$, instead of the entire measurement record $\zeta_t$.
By defining the reconstructed state of the ancilla after the measurement, $\tilde{\rho}_{Y_t'} = \sum_{z_t} M_{z_t} \rho_{Y_t'} M_{z_t}^\dagger$, Eq. (28) can be recast into the form
\[ \Delta\Phi^c_t = \mathrm{tr}\big\{ \big( \rho_{Y_t} - \tilde{\rho}_{Y_t'} \big) \ln \rho_{Y_t} \big\}, \]
which showcases the potential difference between conditional and unconditional fluxes. Depending on the measurement strategy $\{M_z\}$ being adopted, it is reasonable to expect that $\tilde{\rho}_{Y_t'} \neq \rho_{Y_t'}$, thus resulting in $\Delta\Phi^u_t \neq \Delta\Phi^c_t$. This reflects the potentially invasive nature of the measurements on the ancilla. However, it should be noted that this is an extrinsic effect, related to the specific choice of measurement by the observer, and fully unrelated to the thermodynamics of the system-ancilla interactions.
We will henceforth assume that the measurement strategy is such that
\[ \mathrm{tr}\big\{ \tilde{\rho}_{Y_t'} \ln \rho_{Y_t} \big\} = \mathrm{tr}\big\{ \rho_{Y_t'} \ln \rho_{Y_t} \big\}. \tag{30} \]
That is, it does not change the populations of $Y_t'$ in the eigenbasis of the original state $\rho_{Y_t}$. This can be accomplished, for instance, by measuring in the same basis in which the state of the ancillae is prepared. We can then reach the important conclusion that
\[ \Delta\Phi^c_t = \Delta\Phi^u_t. \tag{31} \]
This result is intuitive: conditioning on the outcome is a subjective matter, related to whether or not we read out the outcomes of the experiment. It should therefore have no effect on how much entropy flows to the ancillae. Similar ideas were also used in many contexts [10,12,18,55]. However, these studies were concerned with the heat flux, which coincides with the entropy flux for thermal baths. Here we show that this is a general property, valid for any bath, provided we restrict to the special class of measurements characterized by Eq. (30). Under these conditions, comparing Eqs. (27) and (21), and recalling the information rate in Eq. (16), we find
\[ \Delta\Sigma^c_t = \Delta\Sigma^u_t - \Delta I_t. \tag{32} \]
This is a key result of our framework: it shows how the act of conditioning the dynamics on the measurement outcome changes the entropy production by a quantity associated with the change in the Holevo information. Hence, it serves as a bridge between the information rates and thermodynamics.
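The comparison between the two splittings can be written out explicitly. Solving Eq. (21) and Eq. (27) for the respective entropy production rates and subtracting, with the conditional and unconditional fluxes taken to be equal, gives

\[
\begin{aligned}
\Delta\Sigma^u_t - \Delta\Sigma^c_t
&= \big[ S(X_t) - S(X_{t-1}) + \Delta\Phi^u_t \big] - \big[ S(X_t|\zeta_t) - S(X_{t-1}|\zeta_{t-1}) + \Delta\Phi^c_t \big] \\
&= \big[ S(X_t) - S(X_t|\zeta_t) \big] - \big[ S(X_{t-1}) - S(X_{t-1}|\zeta_{t-1}) \big] \\
&= I(X_t:\zeta_t) - I(X_{t-1}:\zeta_{t-1}) = \Delta I_t ,
\end{aligned}
\]

where the last line uses the definition of the Holevo information in Eq. (12).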
In particular, in an ISS, $\Delta I_{\rm ISS} = 0$ and so $\Delta\Sigma^c_{\rm ISS} = \Delta\Sigma^u_{\rm ISS}$, although $\rho_{X_t}$ and $\rho_{X_t|\zeta_t}$ are in general different.

E. Properties of the conditional entropy production
We now move on to discuss the main properties of the conditional entropy production. The quantities $\Delta\Sigma^u_t$ and $\Delta\Sigma^c_t$ refer to the incremental entropy production in a single collision. Conversely, it is also of interest to analyze the integrated entropy production
\[ \Sigma^u_t = \sum_{\tau=1}^{t} \Delta\Sigma^u_\tau, \qquad \Sigma^c_t = \sum_{\tau=1}^{t} \Delta\Sigma^c_\tau. \]
Since $\Delta I_t$ in Eq. (16) is an exact differential, when we sum Eq. (32) up to time $t$, the terms in $\Delta I_\tau$ successively cancel, leaving only
\[ \Sigma^c_t = \Sigma^u_t - I(X_t : \zeta_t). \]
The integrated entropy production up to time $t$ therefore depends only on the net information $I(X_t : \zeta_t)$. Since $I(X_t : \zeta_t) \geq 0$, it then follows that
\[ \Sigma^c_t \leq \Sigma^u_t. \]
Therefore, conditioning makes the process more reversible. This happens because we only carry out measurements in the environment, so that there is never a direct backaction on the system. A complementary bound can also be obtained by using the fact that $L_t \geq 0$, which then leads to
\[ \Sigma^c_t \geq \Sigma^u_t - \sum_{\tau=1}^{t} G_\tau. \]
The reduction in entropy production is thus at most the total information gain. Returning now to the entropy production rate in each collision, in Appendix B we provide a proof of the following relation
\[ \Delta\Sigma^c_t \geq D(Y_t' \| Y_t) + I(Y_t' : \zeta_{t-1}) \geq 0, \tag{37} \]
where $D(Y_t' \| Y_t)$ is the backaction caused in the ancillary state due to its collision with the system, while $I(Y_t' : \zeta_{t-1})$ quantifies the amount of information gained about the ancilla through the measurement strategy. This is one of the overarching conclusions of our work, bearing remarkable consequences. On the one hand, it proves that the 2nd law continues to be satisfied in the conditional case. On the other hand, it provides a nontrivial lower bound to the conditional entropy production rate in terms of the changes that take place in the ancillae only. It should also be noted that the first inequality in Eq. (37) is saturated by processes where the measurement extracts all the information available.
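The cancellation invoked above is a telescoping sum. Assuming, as is natural, that no outcomes are available at $t = 0$, so that $I(X_0:\zeta_0) = 0$, one has

\[
\sum_{\tau=1}^{t} \Delta I_\tau = \sum_{\tau=1}^{t} \big[ I(X_\tau:\zeta_\tau) - I(X_{\tau-1}:\zeta_{\tau-1}) \big] = I(X_t:\zeta_t) - I(X_0:\zeta_0) = I(X_t:\zeta_t).
\]

Using the splitting $\Delta I_\tau = G_\tau - L_\tau$ with $L_\tau \geq 0$, the difference between integrated entropy productions can equivalently be written as

\[
\Sigma^u_t - \Sigma^c_t = \sum_{\tau=1}^{t} G_\tau - \sum_{\tau=1}^{t} L_\tau \;\leq\; \sum_{\tau=1}^{t} G_\tau ,
\]

so the accumulated gain bounds the reduction in entropy production from above, with equality only if no information is lost along the way.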

IV. SIMPLE QUBIT MODELS
We now apply the ideas of the previous sections to simple models of CM$^2$s, aimed at illustrating their main features while keeping the level of technical detail to a minimum, so as to emphasize the physical implications of the framework developed so far.
We will focus on the case in which both the system and the elementary units of the ancilla are qubits. Despite their simplicity, such situations have far-reaching applications. For instance, in Ref. [45] it was shown how quantum optical stochastic master equations naturally emerge from modeling optical baths in terms of effective qubits in a collisional model. Moreover, suitably chosen measurement strategies $\{M_z\}$ implemented on qubits also allow one to simulate widely used measurement schemes, such as photo-detection, homodyne and heterodyne measurements. Finally, by tuning the initial state of the qubits, one can also simulate out-of-equilibrium environments, such as squeezed baths. In Ref. [56], we complement the study reported here by addressing explicitly the case of continuous-variable systems.
Recall that a CM$^2$ is completely specified by setting $\{\rho_Y, U, M_z\}$. The unconditional dynamics is governed by the map $\mathcal{E}$ defined in Eq. (2), which can be simulated directly with very low computational cost. The conditional dynamics, on the other hand, is governed by the map $\mathcal{E}_z$ in Eqs. (8) and (9), which we simulate using stochastic trajectories.

A. Single-qubit ancilla
We begin by studying the case where the system interacts with single-qubit ancillae prepared in the thermal state $\rho_Y = f|0\rangle\langle 0| + (1-f)|1\rangle\langle 1|$, through a partial SWAP of strength $g$. Finally, we assume that the ancillae are measured in the computational basis, so that $M_0 = |0\rangle\langle 0|_Y$ and $M_1 = |1\rangle\langle 1|_Y$. For concreteness, we take the initial state of the system to be [...]. The evolution of the relevant information and thermodynamic quantities of the problem, for a specific choice of $f$ and $g$, is presented in Fig. 2. Panel (b) shows how conditioning always reduces our ignorance about the system, by demonstrating that $S(X_t|\zeta_t) \leq S(X_t)$ at all times. As the model being considered implements a homogenization process [32,33], the steady state $\rho_{X_\infty}$ coincides with the initial state of the ancilla, $\rho_{X_\infty} = \rho_Y$. This causes $U(\rho_{X_\infty} \otimes \rho_Y)U^\dagger = \rho_{X_\infty} \otimes \rho_Y$, so no information can be acquired anymore. The final state is thus an equilibrium state, not an ISS. The information rate, gain and loss are shown in Fig. 2(c). Initially the gain is very large, as the state of the system is significantly different from the thermal steady state and each measurement results in a significant acquisition of information. In turn, this results in $\Delta I_t > 0$. As the system evolves towards $\rho_{X_\infty}$, the detrimental effect of homogenization starts prevailing over the information gain, causing an inversion in the sign of $\Delta I_t$. The long-time limit is associated with $(\Delta I_\infty, G_\infty, L_\infty) \to 0$ and no ISS emerges.
A comparison between the conditional and unconditional entropy production is shown in Fig. 2(d), which also reports on the entropy flux. The rates $\Delta\Sigma^c_t$ and $\Delta\Sigma^u_t$ are both non-negative, but are not necessarily ordered. This happens because, in individual collisions, conditioning may not make the process more reversible. An ordering is instead enforced when looking at integrated quantities: conditioning always reduces the entropy production [cf. Eq. (34)], as shown in Fig. 2(e).
For completeness, we also show in Figs. 2(f), (g), (h) the behavior of $\Delta I_t$, $G_t$ and $L_t$ along six randomly sampled trajectories $\zeta_t$. Typical stochastic fluctuations are observed, showing that in a single stochastic run, the net gain and loss can differ substantially [the curves in Fig. 2(b)-(e) were produced by averaging over 2000 such trajectories].
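For short times, the averaged information rate, gain and loss of the single-qubit model can also be obtained without sampling, by enumerating all measurement records. The sketch below is a hedged illustration rather than a reproduction of Fig. 2: the partial SWAP is assumed to be $\cos(g)\,\mathbb{1} - i\sin(g)\,\mathrm{SWAP}$, the values of `f` and `g` and the initial state $|+\rangle\langle +|$ are hypothetical choices, and the function names are introduced here for illustration.

```python
import itertools
import numpy as np

SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=complex)
f, g = 0.3, 0.3                                         # illustrative values
U = np.cos(g) * np.eye(4) - 1j * np.sin(g) * SWAP       # assumed partial SWAP
rho_Y = np.diag([f, 1 - f]).astype(complex)             # thermal ancilla
M = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]

def entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def ptrace_Y(rho_XY):
    return np.einsum('ijkj->ik', rho_XY.reshape(2, 2, 2, 2))

def E_z(rho_X, z):
    """Unnormalized conditional map for outcome z."""
    K = np.kron(np.eye(2), M[z]) @ U
    return ptrace_Y(K @ np.kron(rho_X, rho_Y) @ K.conj().T)

def E(rho_X):
    """Unconditional map; equals the sum over outcomes since sum_z M_z^dagger M_z = 1."""
    return E_z(rho_X, 0) + E_z(rho_X, 1)

def averaged_entropies(rho_0, t):
    """Return S(X_t), S(X_t|zeta_t) and S(X_t|zeta_{t-1}) by exact enumeration."""
    S_cond, S_pred = 0.0, 0.0
    rho_t = np.zeros((2, 2), dtype=complex)
    for zeta in itertools.product([0, 1], repeat=t - 1):
        rho = rho_0
        for z in zeta:
            rho = E_z(rho, z)
        p_prev = np.trace(rho).real                     # P(zeta_{t-1})
        if p_prev < 1e-14:
            continue
        pred = E(rho)                                   # unnormalized rho_{X_t|zeta_{t-1}}
        S_pred += p_prev * entropy(pred / p_prev)
        rho_t += pred
        for z in (0, 1):
            cond = E_z(rho, z)                          # unnormalized rho_{X_t|zeta_t}
            p = np.trace(cond).real
            if p > 1e-14:
                S_cond += p * entropy(cond / p)
    return entropy(rho_t), S_cond, S_pred

rho_0 = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)  # hypothetical initial state
I_prev, rows = 0.0, []
for t in range(1, 9):
    S_t, S_cond, S_pred = averaged_entropies(rho_0, t)
    I_t = S_t - S_cond                                   # Holevo information
    G_t = S_pred - S_cond                                # gain
    L_t = I_prev - (S_t - S_pred)                        # loss
    rows.append((t, I_t - I_prev, G_t, L_t))
    I_prev = I_t

for t, dI, G, L in rows:
    print(f"t={t}  dI={dI:+.5f}  G={G:.5f}  L={L:.5f}")
```

The cost grows as $2^t$, so this is only practical for a handful of collisions; it is meant as a cross-check for trajectory averages, not as a replacement for them.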

B. Two-qubit ancilla
We now move on to consider a case allowing the emergence of ISSs, opening up many interesting possibilities. The ancillae do not have to be just a single qubit, but can have arbitrary internal structure. Moreover, within a single collision, the system does not have to interact with all elementary units simultaneously, but may do so sequentially. We illustrate this by considering the case where each ancilla is actually 2 qubits, $Y_t = (Y_{t1}, Y_{t2})$, which interact sequentially with the system (cf. Fig. 3). The unitary $U_t$ between $X$ and $Y_t$ will then have the form
\[ U_t = U_{XY_{t2}}\, U_{XY_{t1}}, \tag{38} \]
where $U_{XY_{tj}}$ has support only over the Hilbert space of $X$ and the unit $Y_{tj}$. As discussed in Refs. [28,41], if the ancillae are prepared in different states, the system will not be able to equilibrate with either, but will instead keep on bouncing back and forth indefinitely. Hence, it will reach a NESS. Moreover, if at least one of the ancillae is measured, the conditional state may embody an ISS.
To illustrate this, we assume the first unit to be prepared in a thermal state such as the one considered in Sec. IV A, while the second unit is in $|x_+\rangle$. The unitaries in Eq. (38) are chosen, as before, to be partial SWAPs with strengths $g_1$ and $g_2$. Finally, we choose to measure only the first unit which, by being prepared in a thermal state, acts as a classical probe. On the other hand, by being endowed with quantum coherence, the second unit represents a "resourceful state." In Fig. 3 we report the results of an analysis similar to the one that we have performed for the previous example, for direct comparison. The results are strikingly different as, in particular, the system now allows for an ISS. This is visible in Fig. 3(c) from the fact that $G = L \neq 0$ when $t \to \infty$, with the thermodynamic quantities in Fig. 3(d) also converging to non-zero long-time values. A marked difference with the case of no ISS is also seen in the behavior of the integrated entropy production in Fig. 3(e): as the rates now remain non-zero, the integrated quantities diverge in the long-time limit.

FIG. 3. Same as Fig. 2, but for two-qubit ancillae, prepared in $\rho_{Y_1} = f|0\rangle\langle 0| + (1-f)|1\rangle\langle 1|$ and $\rho_{Y_2} = |x_+\rangle\langle x_+|$. The qubits interact sequentially with the system via partial SWAPs and only ancilla $Y_1$ is measured. In contrast to Fig. 2, this model has a non-trivial ISS ($G = L \neq 0$). We have taken, for concreteness, $f = g_1 = 0.3$ and $g_2 = 0.1$.
We can also perform another experiment that beautifully illustrates the essence of an ISS. While the initial state used in Fig. 3 was arbitrarily chosen, we could take it to be the steady-state of the unconditional dynamics. The idea is that we first allow the system to relax unconditionally by letting it undergo a large number of collisions, and only then start measuring. Due to the effect of the measurements, the conditional state $\rho_{X_t|\zeta_t}$ will start to differ from the unconditional steady-state (while the unconditional dynamics remains fixed).
The results are shown in Fig. 4. Panel (a), in particular, neatly illustrates how the unconditional entropy does not change in time, while the measurements performed in the conditional strategy reduce the entropy of the state of the system, which is effectively driven to a state with a larger purity. This is the essence of an ISS.
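The experiment just described can be mimicked numerically. The following is a minimal sketch, not the authors' code: it assumes that the partial SWAP is parametrized as $U = \cos(g) - i\sin(g)\,\mathrm{SWAP}$, that $|x_+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, and that the measurement on $Y_1$ is projective in the computational basis; all function names are illustrative. It first relaxes the system to the unconditional steady-state, then enumerates every outcome record of length `T` exactly and compares the average conditional entropy with the unconditional one.

```python
import numpy as np

def partial_swap(g):
    # Assumed parametrization: U = cos(g) 1 - i sin(g) SWAP.
    swap = np.eye(4)[[0, 2, 1, 3]]
    return np.cos(g) * np.eye(4) - 1j * np.sin(g) * swap

def entropy(rho):
    # von Neumann entropy (in nats) of a normalized density matrix.
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def collide(rho_x, rho_y, U, projectors=None):
    # One collision. Returns a list of (unnormalized) system states:
    # a single unconditional state if no projectors are given,
    # otherwise one branch per measurement outcome on the ancilla.
    joint = (U @ np.kron(rho_x, rho_y) @ U.conj().T).reshape(2, 2, 2, 2)
    if projectors is None:
        return [np.einsum('xyzy->xz', joint)]
    return [np.einsum('xyzw,wy->xz', joint, P) for P in projectors]

# Parameters quoted in the caption of Fig. 3.
f, g1, g2 = 0.3, 0.3, 0.1
rho_y1 = np.diag([f, 1 - f]).astype(complex)          # thermal unit
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho_y2 = np.outer(plus, plus).astype(complex)          # coherent unit
U1, U2 = partial_swap(g1), partial_swap(g2)
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]         # assumed measurement on Y1

def step(rho_x, measure):
    # Sequential interaction with Y1 (optionally measured) and then Y2.
    branches = collide(rho_x, rho_y1, U1, P if measure else None)
    return [collide(b, rho_y2, U2)[0] for b in branches]

# Relax to the unconditional steady-state.
rho_u = np.eye(2, dtype=complex) / 2
for _ in range(2000):
    rho_u = step(rho_u, measure=False)[0]

# Start measuring: enumerate all 2**T outcome records exactly.
T = 8
branches = [rho_u]
for _ in range(T):
    branches = [b for r in branches for b in step(r, measure=True)]

probs = [np.trace(b).real for b in branches]
S_cond = sum(p * entropy(b / p) for p, b in zip(probs, branches) if p > 1e-14)
S_unc = entropy(rho_u)
print(f"unconditional entropy: {S_unc:.4f}, average conditional entropy: {S_cond:.4f}")
```

Averaging the branches over all records returns the unconditional steady-state, so the unconditional entropy stays fixed, while the average conditional entropy can only be smaller or equal by concavity of the von Neumann entropy.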

C. Time series in the single-shot scenario
The quantities in Figs. 2-4 were obtained by repeating the experiment multiple times, always starting from the same state and evolving in the exact same way. We now contrast this with the single-shot scenario, in which we have access only to a single stochastic realization of the experiment. We focus on the two-qubit model where the system starts in the steady-state of the unconditional dynamics, as in Fig. 4. The dynamics of $S(X_t|\zeta_t)$, $G_t$ and $\Delta\Sigma^c_t$ along a single trajectory is shown in Fig. 5. As one might expect, these quantities fluctuate significantly. Fig. 5 also shows the behavior of accumulated averages, up to a certain time, showing that both the entropy and gain rate tend to converge precisely to the ISS value in Fig. 4. In a classical context, processes satisfying this property are called stationary ergodic [57]. In Fig. 5(d) we plot the integrated average of the actual outcomes, $Z_t = \frac{1}{t}\sum_{j=1}^{t} z_j$, the outcomes $z_j$ being binary. This integrated average shows that in the ISS 70% of the clicks are associated with $M_1$ and the remaining 30% with $M_0$.
Finally, the single-shot data in Fig. 5(a)-(d) can also be used to construct a histogram of the most relevant quantities, as illustrated in panels (e)-(h). These histograms shed light on the magnitude of the fluctuations of the relevant quantities. For instance, $\Delta\Sigma^c_t$ fluctuates very little, while the information gain $G_t$ fluctuates dramatically.
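A single stochastic realization of this kind can be sampled with a few lines of code. The sketch below is only illustrative and rests on the same assumptions as any simple implementation of the model would need: partial SWAP $U = \cos(g) - i\sin(g)\,\mathrm{SWAP}$, $|x_+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, a projective measurement of $Y_1$ in the computational basis, and an arbitrary random seed. It compares the accumulated average $Z_t$ of the sampled outcomes with the exact probability of outcome 1 in the unconditional steady-state.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed, for reproducibility only

def partial_swap(g):
    # Assumed parametrization: U = cos(g) 1 - i sin(g) SWAP.
    swap = np.eye(4)[[0, 2, 1, 3]]
    return np.cos(g) * np.eye(4) - 1j * np.sin(g) * swap

def collide(rho_x, rho_y, U, P=None):
    # System state after one collision; if a projector P on the ancilla is
    # given, the (unnormalized) state conditioned on that outcome is returned.
    joint = (U @ np.kron(rho_x, rho_y) @ U.conj().T).reshape(2, 2, 2, 2)
    if P is None:
        return np.einsum('xyzy->xz', joint)
    return np.einsum('xyzw,wy->xz', joint, P)

f, g1, g2 = 0.3, 0.3, 0.1
rho_y1 = np.diag([f, 1 - f]).astype(complex)
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho_y2 = np.outer(plus, plus).astype(complex)
U1, U2 = partial_swap(g1), partial_swap(g2)
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

# Unconditional steady-state, used as the initial state of the trajectory.
rho_ss = np.eye(2, dtype=complex) / 2
for _ in range(2000):
    rho_ss = collide(collide(rho_ss, rho_y1, U1), rho_y2, U2)
p1_exact = np.trace(collide(rho_ss, rho_y1, U1, P[1])).real

# One stochastic trajectory of T collisions.
T = 5000
rho = rho_ss.copy()
outcomes = np.empty(T, dtype=int)
for t in range(T):
    branch = [collide(rho, rho_y1, U1, Pz) for Pz in P]
    p = np.array([np.trace(b).real for b in branch])
    z = rng.choice(2, p=p / p.sum())                   # sample the click
    rho = collide(branch[z] / p[z], rho_y2, U2)        # update, then collide with Y2
    outcomes[t] = z

Z = np.cumsum(outcomes) / np.arange(1, T + 1)          # accumulated average Z_t
print(f"Z_T = {Z[-1]:.3f}, steady-state P(z=1) = {p1_exact:.3f}")
```

Since the unconditional state is stationary, every $z_t$ has the same mean, so the accumulated average is expected to settle around the steady-state click probability as $t$ grows.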

FIG. 5. Thermodynamics and information in the single-shot scenario. The configuration is the same as Fig. 4, but everything now refers to a single stochastic realization of the experiment. The red curves depict (a) $S(X_t|\zeta_t)$, (b) $G_t$ and (c) $\Delta\Sigma^c_t$ for that single realization. The blue curves, on the other hand, represent the accumulated average; that is, the average of the given quantity up to that time. Image (d), in particular, shows the accumulated average for the outcomes, $Z_t = \frac{1}{t}\sum_{j=1}^{t} z_j$, where the outcomes $z_j$ are either 0 or 1 (not shown for visibility). The black line in image (c) is the unconditional entropy production rate $\Delta\Sigma^u_t$, which serves as a baseline for $\Delta\Sigma^c_t$. Images (e)-(g) are the histograms obtained from the data in (a)-(d), discarding the first 20 points (to eliminate transients). (h) Stochastic trajectory in Bloch's sphere.

V. CONCLUSIONS
We have investigated the interplay between information and thermodynamics in continuously measured systems by way of a collisional model construct. In particular, we were able to formulate the entropy production and flux rate, two pivotal quantities in (quantum) thermodynamics, from a purely informational point of view and accounting for repeated indirect measurements of the system of interest. These results offer a clear way to point out and characterise the effect of quantum measurements on the thermodynamics of open quantum systems.
We model the indirect measurement of the system via a collisional model where (a part of) the environment with which the system interacts is monitored. This allows us to compare the entropy production with the case in which the environment is not measured and the evolution of the system is thus unconditioned. In turn, this comparison leads directly to a tightened second law for monitored systems, with a very clear separation between entropic contributions coming from the dissipative interaction with the environment and the ones coming from the information gained during the monitoring. This allows us to introduce the concepts of information gain and loss rates, and of informational steady-states. The latter are particularly interesting since they represent cases where a delicate balance is established between the information that gets lost into the environment and the one that is extracted by measuring.
The interplay between information and the 2nd law has been the subject of several works over the last decade. Stroboscopic dynamics, such as the one considered in Sec. II, have been studied in the classical context of hidden Markov models [9,47,48]. A classical framework, where quantum measurements are mimicked by generic interventions, was put forth in [18], and resembles the classical version of our CM$^2$s, developed in Appendix A. In the quantum context, the conditional dynamics analyzed here are a particular case of process tensors [58][59][60], whose thermodynamics has been recently considered in [19,61]. Unlike our framework, however, these studies assume the system is always connected to a standard thermal bath, while the ancillae play only the role of memory agents. For this reason, their definition of entropy production is based on a Clausius-like inequality and is therefore different from ours. Furthermore, we have opted to focus on informational aspects of thermodynamics, neglecting entirely the energetics of the problem. Detailed accounts of the latter can be found in Refs. [19,61,62].
Ref. [12] put forth a framework (recently assessed experimentally in Ref. [63]) where the ancillae play the role of active memories. This means their effect is always deleterious to the system. As a consequence, instead of using the Holevo quantity (12) to quantify information, they use the Groenewold-Ozawa quantum-classical information [64,65] $I_{GO} = S(X) - S(X'|z)$. Since the Holevo quantity reads $I(X':z) = S(X') - S(X'|z)$, the two quantities are related by $I(X':z) = I_{GO} + \Delta S_X$, where $\Delta S_X = S(X') - S(X)$. Depending on the type of collision, $\Delta S_X$ may have any sign, so $I_{GO}$ is not necessarily non-negative.
The formalism developed in this work is widely applicable, as exemplified by the case studies we have considered (see also Ref. [56]). This makes it a valuable tool in the thermodynamic assessment of a broad variety of quantum-coherent experiments. The scenario we considered also fits perfectly with the characterization of emergent quantum applications, such as quantum computing devices [66][67][68]. Being able to characterize irreversibility in these devices should thus offer a significant advantage in the design and engineering of future devices.

Let us focus on a single collision event. We assume that, at a certain instant of time, the system is at $\rho_X = \sum_x p(x)|x\rangle\langle x|$ for some basis $|x\rangle$, while the ancilla is prepared in $\rho_Y = \sum_y p(y)|y\rangle\langle y|$, for some basis $|y\rangle$. The unconditional state of the system after one collision will then be
$$\rho_{X'} = \mathrm{tr}_Y\big\{U(\rho_X \otimes \rho_Y)U^\dagger\big\} = \sum_{x,y,y'} p(x)p(y)\, \langle y'|U|xy\rangle\langle xy|U^\dagger|y'\rangle,$$
where $\langle y'|U|xy\rangle$ is still a ket in the Hilbert space of the system. This ket is not normalized, however, so we define
$$|\Psi_{xyy'}\rangle := \frac{\langle y'|U|xy\rangle}{\sqrt{P(y'|xy)}}, \qquad P(y'|xy) = \big\|\langle y'|U|xy\rangle\big\|^2. \qquad (A1)$$
The state of the system may then be written as
$$\rho_{X'} = \sum_{x,y,y'} p(x)p(y)P(y'|xy)\, |\Psi_{xyy'}\rangle\langle\Psi_{xyy'}|.$$
When written in this way, it gives the impression that $\rho_{X'}$ is already in diagonal form. But this is not the case, since in general the states $|\Psi_{xyy'}\rangle$ are not orthogonal and do not form a basis. Moreover, there are usually many more states than required to span the Hilbert space of $X$ (there can be up to $d_X d_Y^2$ of them, where $d_X$, $d_Y$ are the dimensions of system and ancilla). As a matter of fact, in general the eigenvectors of $\rho_{X'}$ will have no simple relation with the states $|\Psi_{xyy'}\rangle$.
Conversely, we say a model is unconditionally incoherent if, for any $x, y, y'$, the states $|\Psi_{xyy'}\rangle$ are always elements of the basis $|x\rangle$. In this case $\rho_{X'}$ will be automatically diagonal, $\rho_{X'} = \sum_{x'} p(x')|x'\rangle\langle x'|$, where the populations $p(x')$ can be found from
$$p(x') = \sum_{x,y,y'} p(x)p(y)P(y'|xy)\, \langle x'|\Psi_{xyy'}\rangle\langle\Psi_{xyy'}|x'\rangle.$$
Using (A1), we can also write this as
$$p(x') = \sum_{x,y,y'} Q(x'y'|xy)\, p(x)p(y),$$
where
$$Q(x'y'|xy) = |\langle x'y'|U|xy\rangle|^2 \qquad (A4)$$
is the transition probability of observing a transition $(x, y) \to (x', y')$. A matrix of this form is said to be unistochastic, which is a particular case of doubly stochastic matrices. An example of an unconditionally incoherent model is when both system and ancillae are qubits, interacting via the partial SWAP, in which case the transition matrix $Q$ depends on a single parameter $\lambda \in [0, 1]$.
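The unistochastic structure of $Q$ can be checked numerically for the qubit example. The sketch below assumes the parametrization $U = \cos(g) - i\sin(g)\,\mathrm{SWAP}$ for the partial SWAP, for which the swap probability is $\lambda = \sin^2 g$; this choice and the numerical values of the populations are assumptions made for illustration.

```python
import numpy as np

def partial_swap(g):
    # Assumed parametrization: U = cos(g) 1 - i sin(g) SWAP.
    swap = np.eye(4)[[0, 2, 1, 3]]
    return np.cos(g) * np.eye(4) - 1j * np.sin(g) * swap

g = 0.3
U = partial_swap(g)
lam = np.sin(g) ** 2

# Q[(x'y'), (xy)] = |<x'y'|U|xy>|^2, with the composite index 2*x + y.
Q = np.abs(U) ** 2

# Incoherent inputs (illustrative populations).
p_x = np.array([0.8, 0.2])
p_y = np.array([0.3, 0.7])

# Quantum evolution followed by a partial trace over the ancilla.
joint = (U @ np.kron(np.diag(p_x), np.diag(p_y)) @ U.conj().T).reshape(2, 2, 2, 2)
rho_x_new = np.einsum('xyzy->xz', joint)

# Classical evolution of the populations through Q, marginalized over y'.
p_x_new = (Q @ np.kron(p_x, p_y)).reshape(2, 2).sum(axis=1)

print(Q)
print(p_x_new)
```

Rows and columns of `Q` both sum to one, and the system state produced by the unitary stays diagonal, with populations that match the classical update through `Q`.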
In unconditionally incoherent models, if the system is originally diagonal in the basis $|x\rangle$, it will remain so throughout the evolution, with the populations evolving according to the classical Markov chain
$$Q(x'|x) = \sum_{y,y'} Q(x'y'|xy)\, p(y). \qquad (A7)$$
Next we can do the same for the conditional map $\mathcal{E}_z$ in Eq. (8). As we will see, however, unconditional incoherence does not imply conditional incoherence. Following the same steps as before, we can write
$$\mathcal{E}_z(\rho_X) = \sum_{x,y,y'} p(x)p(y)\, \langle y'|M_z U|xy\rangle\langle xy|U^\dagger M_z^\dagger|y'\rangle.$$
We now introduce two completeness relations in the $y$ basis:
$$\mathcal{E}_z(\rho_X) = \sum_{x,y,y',y'',y'''} p(x)p(y)\, \langle y'|M_z|y''\rangle \langle y''|U|xy\rangle\langle xy|U^\dagger|y'''\rangle \langle y'''|M_z^\dagger|y'\rangle.$$
If the model is unconditionally incoherent, the states $\langle y''|U|xy\rangle$ will be elements of the basis $|x\rangle$. But the resulting state will in general not be diagonal due to the terms $\langle y'|M_z|y''\rangle$ and $\langle y'''|M_z^\dagger|y'\rangle$. In other words, coherence may very well be produced by the measurement itself. And while this cannot affect the unconditional dynamics of the system (due to no-signaling), it may very well affect the conditional one.
We therefore define a model to be conditionally incoherent if it is unconditionally incoherent and if $\langle y'|M_z|y\rangle \propto \delta_{y',y}$.
The simplest possibility would, of course, be to take $M_z$ as projective measurements in the basis $|y\rangle$. But there may also be other interesting possibilities. For instance, we can take $M_z$ to be an imprecise projective measurement, which only runs over certain elements of the basis $|y\rangle$. Or we could make $M_z$ be a noisy measurement, which blurs the outcomes of each $|y\rangle$. It is worth noting, in passing, that conditional incoherence also immediately implies the validity of Eq. (31) on the entropy fluxes.
In any case, when the model is conditionally incoherent the map (8) can be written as
$$\mathcal{E}_z(\rho_X) = \sum_{x,y,y'} M(z|y')\, p(x)p(y)\, \langle y'|U|xy\rangle\langle xy|U^\dagger|y'\rangle, \qquad M(z|y) = |\langle y|M_z|y\rangle|^2,$$
where $M(z|y)$ is the conditional probability of observing outcome $z$, given that the ancilla is in $|y\rangle$. This therefore represents the "postprocessing" of the ancillary state. The state (A8) can also be written as
$$\mathcal{E}_z(\rho_X) = \sum_{x'} p(x', z)\, |x'\rangle\langle x'|,$$
where $p(x', z) = \sum_x W(x'z|x)\, p(x)$ and
$$W(x'z|x) = \sum_{y,y'} M(z|y')\, Q(x'y'|xy)\, p(y). \qquad (A11)$$
In a classical context, this is the most important object defining a CM$^2$. It describes the (Markovian) transition probability of observing the system in $x'$, as well as the outcome $z$, given that initially the system was in $x$. With this definition, it follows that
$$\sum_z W(x'z|x) = Q(x'|x),$$
which, classically, is precisely what one would expect from the law of total probability.

Finally, we adapt these ideas to multiple collisions. The initial state of the system is $\rho_{X_0} = \sum_{x_0} p(x_0)|x_0\rangle\langle x_0|$. The conditional (unnormalized) state after the first collision is obtained by applying (A10):
$$\varrho_{X_1|\zeta_1} = \sum_{x_1} p(x_1, \zeta_1)\, |x_1\rangle\langle x_1|, \qquad p(x_1, \zeta_1) = \sum_{x_0} W(x_1 z_1|x_0)\, p(x_0),$$
where, recall, $\zeta_1 = z_1$. Similarly, after the second collision, the conditional state will be $\varrho_{X_2|\zeta_2} = \sum_{x_2} p(x_2, \zeta_2)|x_2\rangle\langle x_2|$, where
$$p(x_2, \zeta_2) = \sum_{x_0,x_1} W(x_2 z_2|x_1)\, W(x_1 z_1|x_0)\, p(x_0).$$
Proceeding in this way, we see that after the $t$-th collision, the state of the conditional system will be $\varrho_{X_t|\zeta_t} = \sum_{x_t} p(x_t, \zeta_t)|x_t\rangle\langle x_t|$, where
$$p(x_t, \zeta_t) = \sum_{x_0,\ldots,x_{t-1}} W(x_t z_t|x_{t-1}) \cdots W(x_1 z_1|x_0)\, p(x_0). \qquad (A13)$$
Tracing over this state and recalling Eq. (10), we finally obtain the distribution of outcomes
$$P(\zeta_t) = \sum_{x_0,\ldots,x_t} W(x_t z_t|x_{t-1}) \cdots W(x_1 z_1|x_0)\, p(x_0). \qquad (A14)$$
This result is quite important, as it clearly highlights the hidden Markov structure of the present model, discussed in Sec. II. Summarizing, the incoherent version of a CM$^2$ is completely defined by the transition matrix $W(x'z|x)$ in Eq. (A11). This, in turn, depends on the transition matrix $Q(x'y'|xy)$ in Eq. (A4), which must be unistochastic, and the noise matrix $M(z|y')$, which can be any conditional probability.
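The hidden Markov structure summarized above translates directly into a forward recursion. The following sketch builds $W(x'z|x)$ from a unistochastic $Q$ and a noise matrix $M$, and evaluates $P(\zeta_t)$ as in Eq. (A14). The specific choices are assumptions made for illustration: a qubit partial-SWAP form for $Q$ with swap probability `lam`, a symmetric bit-flip noise `eps` for $M$, and the ancilla populations `p_y`.

```python
import numpy as np
from itertools import product

lam, eps = 0.2, 0.1                       # illustrative parameters
p_y = np.array([0.3, 0.7])                # ancilla populations p(y)

# Q[x', y', x, y]: with probability lam the two classical bits are swapped.
Q = np.zeros((2, 2, 2, 2))
for x, y in product(range(2), repeat=2):
    Q[x, y, x, y] += 1 - lam
    Q[y, x, x, y] += lam

# Noise matrix M[z, y'] = M(z|y'): outcome z reports y' with error eps.
M = np.array([[1 - eps, eps],
              [eps, 1 - eps]])

# W[x', z, x] = sum_{y, y'} M(z|y') Q(x'y'|xy) p(y), as in Eq. (A11).
W = np.einsum('zb,abxy,y->azx', M, Q, p_y)

# Reduced Markov chain Q(x'|x) = sum_{y, y'} Q(x'y'|xy) p(y).
Q_marginal = np.einsum('abxy,y->ax', Q, p_y)

def outcome_probability(record, p0):
    # Forward recursion for P(zeta_t): propagate the unnormalized
    # populations p(x_t, zeta_t) one collision at a time.
    alpha = np.asarray(p0, dtype=float)
    for z in record:
        alpha = W[:, z, :] @ alpha
    return alpha.sum()

p0 = np.array([0.5, 0.5])
total = sum(outcome_probability(r, p0) for r in product(range(2), repeat=4))
print(outcome_probability((1, 1, 0, 1), p0), total)
```

The intermediate vector `alpha` is the unnormalized conditional population $p(x_t, \zeta_t)$ of Eq. (A13); dividing it by its sum gives the normalized conditional state after each collision.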
The inequality (B1) compares the Holevo information for a single collision outcome $z$ with the full quantum mutual information between system and ancilla after the collision. This means that, no matter what measurement strategy $\{M_z\}$ one utilizes, the information about the system that can be extracted from the ancilla is at most equal to the full information encoded in the global quantum state $\rho_{X'Y'}$. This inequality also holds for states conditioned on past outcomes. That is,
$$G_t = I_c(X_t : z_t|\zeta_{t-1}) \leq I(X_t : Y_t'|\zeta_{t-1}), \qquad (B2)$$
where the conditioning is over previous records $\zeta_{t-1} = (z_1, \ldots, z_{t-1})$ (i.e., those that happened before the present collision) and $G_t$ is defined in Eq. (14). This is true since conditional states are still quantum states (provided they are properly normalized), so that Eq. (B1) must still hold.
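The bound can be probed numerically on random states. The sketch below is only a sanity check under stated assumptions: system and ancilla are qubits, the joint post-collision state is replaced by a random two-qubit density matrix, and the measurement on the ancilla is projective in a random orthonormal basis. It computes the Holevo quantity of the resulting ensemble and the quantum mutual information of the joint state.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

def entropy(rho):
    # von Neumann entropy (in nats) of a normalized density matrix.
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def random_state(d):
    # Random full-rank density matrix of dimension d.
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

def random_projectors():
    # Rank-1 projectors onto a random orthonormal basis of the ancilla qubit.
    A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    V, _ = np.linalg.qr(A)
    return [np.outer(V[:, k], V[:, k].conj()) for k in range(2)]

def holevo_and_mutual_info(rho_xy, kraus):
    j = rho_xy.reshape(2, 2, 2, 2)                     # indices x, y, x', y'
    rho_x = np.einsum('xyzy->xz', j)
    rho_y = np.einsum('xyxw->yw', j)
    mutual = entropy(rho_x) + entropy(rho_y) - entropy(rho_xy)
    chi = entropy(rho_x)
    for M in kraus:
        E = M.conj().T @ M                             # POVM element for this outcome
        branch = np.einsum('xyzw,wy->xz', j, E)        # unnormalized conditional state
        p = np.trace(branch).real
        if p > 1e-14:
            chi -= p * entropy(branch / p)
    return chi, mutual

results = [holevo_and_mutual_info(random_state(4), random_projectors())
           for _ in range(50)]
print(max(chi - mutual for chi, mutual in results))
```

In every sample the Holevo quantity should not exceed the mutual information, which is the content of the data-processing argument behind the bound.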
We now start with Eq. (32) and introduce the splitting (17) to write $\Delta\Sigma^c_t = \Delta\Sigma^u_t - G_t + L_t$. Next we use Eq. (22) for $\Delta\Sigma^u_t$ and Eq. (14) for $G_t$. We then get
$$\Delta\Sigma^c_t = I(X_t : Y_t') + D(Y_t'\|Y_t) - I_c(X_t : z_t|\zeta_{t-1}) + L_t.$$
Using the inequality (B2) then shows that
$$\Delta\Sigma^c_t \geq I(X_t : Y_t') - I(X_t : Y_t'|\zeta_{t-1}) + D(Y_t'\|Y_t) + L_t. \qquad (B3)$$
Finally, we use Eq. (24) for $I(X_t : Y_t')$. The other mutual information, $I(X_t : Y_t'|\zeta_{t-1})$, satisfies a similar formula. The difference between the two mutual informations can thus be written in a form where we recognize, in the first two square brackets, the information loss term $L_t$ defined in Eq. (18).
Plugging this back in Eq. (B3) we then finally find Eq. (37). Being a consequence of (B2), we can also conclude that the first bound in (37) is saturated by processes where the measurement extracts all the information available. Even in such a limiting case, we still get a non-zero $\Delta\Sigma^c_t$, so the process is still irreversible.