What is the probability of a thermodynamical transition?

If the second law of thermodynamics forbids a transition from one state to another, then it is still possible to make the transition happen by using a sufficient amount of work. But if we do not have access to this amount of work, can the transition happen probabilistically? In the thermodynamic limit, this probability tends to zero, but here we find that for finite-sized systems, it can be finite. We compute the maximum probability of a transition or a thermodynamical fluctuation from any initial state to any final state, and show that this maximum can be achieved for any final state which is block-diagonal in the energy eigenbasis. We also find upper and lower bounds on this transition probability, in terms of the work of transition. As a by-product, we introduce a finite set of thermodynamical monotones related to the thermo-majorization criteria which govern state transitions, and compute the work of transition in terms of them. The trade-off between the probability of a transition and any partial work added to aid in that transition is also considered. Our results have applications in entanglement theory, and we find the amount of entanglement required (or gained) when transforming one pure entangled state into any other.


I. INTRODUCTION
Given a quantum system in a state ρ with some Hamiltonian, H_1, when can it be deterministically transformed into another state σ associated with a potentially different Hamiltonian, H_2? If we can put the system into contact with a heat bath at temperature T, then in the thermodynamical limit, and if interactions are short-ranged or screened, we can make a transition provided the free energy of the initial configuration is larger than the free energy of the final configuration. The free energy of the state ρ is defined as:
$$F(\rho) = \operatorname{tr}(H\rho) - T\,S(\rho), \tag{1}$$
where S(ρ) is the entropy, S(ρ) = −tr ρ log ρ. This is a formulation of the second law of thermodynamics, if we factor in energy conservation (the first law). However, what if we are interested in small, finite-sized systems? Or in systems with long-range interactions? The thermodynamics of systems in the micro-regime, where we do not take the thermodynamical limit, has gained increased importance as we cool and manipulate smaller and smaller systems at the nano scale and beyond [1][2][3][4][5]. Theoretical work has continued apace, with increased interest in the field in recent years. If we do not take the thermodynamical limit, then provided σ is block-diagonal in the energy eigenbasis, a necessary and sufficient condition for a transition between two states, called thermo-majorization, was proven in [20]. Thermo-majorization is a set of conditions that are more stringent than the ordinary second law and had been conjectured to provide a limitation since 1975 [7].
Once again though, if σ is not thermo-majorized by ρ, then a transition is still possible, provided sufficient work is used. One can compute the work required (or gained) from this transition using thermo-majorization diagrams [20] or via the relative-mixedness [24]. Suppose however, we want to make a transition from ρ to σ, and it requires work which we cannot, or do not wish to, expend. Can we nonetheless make the transition with some probability p rather than with certainty? And if so, what is the highest probability, p*, that can be achieved? In particular, given ρ and σ, we are interested in maximizing p in the following process:
$$\rho \longrightarrow \rho' = p\,\sigma + (1-p)\,X, \tag{2}$$
with X being some arbitrary state.
We will upper bound this maximum probability for any given ρ and σ. When σ is block-diagonal in the energy eigenbasis, we will show that this bound can be achieved and furthermore, that there exists a two outcome measurement that can be performed on ρ such that we obtain σ with the maximum probability p*. Of course, measurements do not come for free in thermodynamics: it costs work to erase the record of the measurement outcome [33]. That this measurement can be performed is noted for completeness; however, we do not consider measurements to be allowed thermodynamical operations, and take Eq. (2) as our goal, defining what we mean by a thermodynamical transition.
Our main result will be Theorem 5, which upper bounds the probability p* in terms of a minimization over a finite set of ratios between thermodynamical monotones. When the final state is block-diagonal, this bound is tight. These monotones, which we will show are given by Eq. (39), can be thought of as analogous to free energies, since they can only go down. This is proven in Theorem 4 and is equivalent to the thermo-majorization criteria of [20]. The set of ratios that we use to bound p* thus gives an alternative way of verifying if the thermo-majorization criteria are satisfied. Rather than considering the thermo-majorization curves [20] or considering a continuous set of monotones [24,25] we provide a finite set.
arXiv:1504.00020v2 [quant-ph] 7 Apr 2015
Before proving Theorem 5, we will consider in Section II the simpler case where the Hamiltonian of the system is trivial, i.e. H ∝ I. Solving the problem in this regime, referred to as Noisy Operations [15], will provide us with insight into the solution for non-trivial Hamiltonians. In this simplified situation, p* is given by Theorem 1. The result is similar in form to [34] which considers the analogous problem of probabilistic pure state entanglement manipulation using Local Operations and Classical Communication (LOCC). However, care must be taken: the class of operations allowed under LOCC is very different from what is allowed in thermodynamics. For example, under LOCC one can bring in pure states for free (which can be a source of work in thermodynamics) and one is allowed to make measurements for free (which costs work). Perhaps more importantly, many of the LOCC monotones are concave, which is not the case here, thus we will require some different techniques. It should also be noted that in entanglement manipulation, the maximum probability achievable will be zero if the target state has a larger Schmidt rank than the starting state. Under Noisy Operations, we will see that p* is always non-zero (though it can be arbitrarily small).
In Section III we consider the general case of arbitrary initial and final Hamiltonians and states. We will prove our results using the paradigm of Thermal Operations (TO) [20,35]. There are a number of different paradigms one can use to study thermodynamics (e.g. allowing interaction Hamiltonians or changing energy levels), however, these other paradigms are equivalent to Thermal Operations [20,36]. We introduce Thermal Operations at the beginning of Section III. In the case of a trivial Hamiltonian, Thermal Operations reduce to Noisy Operations, the regime considered in Section II.
Our expression for the work of transition between any two states using only a finite number of monotones is given in Lemma 2 for Noisy Operations and Lemma 6 for Thermal Operations. The Noisy Operations result can be adapted to give an expression for the amount of entanglement required (or gained) when transforming any pure bipartite state into another under LOCC. This is given in Appendix A and generalizes existing expressions for the distillable entanglement [37,38] and cost of entanglement formation [39]. We also show how p * can be upper and lower bounded using the work of transitions from ρ to σ and σ to ρ. This is done in Lemma 3 for the case of a trivial Hamiltonian, and in Lemma 7 for the general case.
Finally, we conclude in Section IV with a discussion on other goals, related to Eq. (2), which one could attempt when making a probabilistic transition. We also pose some open questions. One of these regards how p * varies if we supply additional work to drive the transition from ρ to σ or demand that additional work be extracted. The solution for qubit systems with trivial Hamiltonian is given in Appendix B.

II. PROBABILITY OF TRANSITION UNDER NOISY OPERATIONS
Before investigating Eq. (2) in the context of Thermal Operations, we will first consider a simpler, special case: Noisy Operations. In this particular instance of thermodynamics, the Hamiltonian of the system under consideration is trivial. Noisy Operations were first defined in [15], where the problem of whether a transition between two given states is possible under a particular set of operations was considered. Within Noisy Operations, the following actions are allowed: i) a system of any dimension in the maximally mixed state can be added, ii) any subsystem can be discarded through tracing out and iii) any unitary can be applied to the global system. For a comprehensive review of Noisy Operations, see [40].
Given two states, ρ and σ, it was shown in [15] that a transition from ρ to σ is possible under Noisy Operations if and only if ρ majorizes σ (written ρ ≻ σ). That is, if we list the eigenvalues of ρ and σ [41] in decreasing order and denote these ordered lists by η = {η_1, . . . , η_n} and ζ = {ζ_1, . . . , ζ_n} respectively, the transition is possible if and only if:
$$V_l(\rho) \geq V_l(\sigma), \quad \forall\, l \in \{1, \dots, n\}, \tag{3}$$
where:
$$V_l(\rho) = \sum_{i=1}^{l} \eta_i. \tag{4}$$
Lorenz curves are a useful tool for visualizing these criteria (Figure 1). For a given state ρ, its Lorenz curve is formed by plotting the points:
$$\left\{ \left( \frac{k}{n},\ \sum_{i=1}^{k} \eta_i \right) \right\}_{k=1}^{n}, \tag{5}$$
and connecting them piecewise linearly (together with the point (0, 0)) to form a concave curve. If ρ majorizes σ, the Lorenz curve for ρ is never below that of σ. The functions defined in Eq. (4), and their analogue in Thermal Operations, will be crucial for the rest of the paper. They are monotones of the theory: they can only decrease under Noisy Operations.
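The majorization test can be illustrated numerically. The following is a minimal sketch, assuming the spectra of ρ and σ are supplied as plain lists of equal length; the function names `lorenz_heights` and `majorizes` and the example spectra are illustrative choices, not part of the original text.

```python
def lorenz_heights(eigenvalues):
    """Return [V_1, ..., V_n]: partial sums of the eigenvalues in decreasing order."""
    heights, total = [], 0.0
    for value in sorted(eigenvalues, reverse=True):
        total += value
        heights.append(total)
    return heights


def majorizes(rho_eigs, sigma_eigs, tol=1e-12):
    """True if V_l(rho) >= V_l(sigma) for every l, i.e. rho majorizes sigma."""
    if len(rho_eigs) != len(sigma_eigs):
        raise ValueError("both spectra must have the same dimension n")
    pairs = zip(lorenz_heights(rho_eigs), lorenz_heights(sigma_eigs))
    # A small tolerance absorbs floating-point rounding in the partial sums.
    return all(v_rho >= v_sigma - tol for v_rho, v_sigma in pairs)


# Illustrative spectra (assumed for this example only).
rho_example = [0.7, 0.2, 0.1]
sigma_example = [0.5, 0.3, 0.2]
print(majorizes(rho_example, sigma_example))
print(majorizes(sigma_example, rho_example))
```

Here the first spectrum majorizes the second but not the other way round, so only the transition from the first to the second would be deterministic under Noisy Operations.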

A. Non-deterministic transitions
FIG. 1 (caption, partially recovered). ... is an example of a sharp state. e) s_{I∞(σ)} is the least sharp state that majorizes σ.
We will now consider transitions when the conditions given in Eq. (3) are not necessarily fulfilled. Here, rather than transforming ρ to σ with certainty, we shall do so with some probability as formulated in Eq. (2). In particular, we are interested in the maximum probability, p*, that can be achieved. A similar problem is considered in [34] for entanglement manipulation, and adapting its techniques can be used to show:
Theorem 1. Suppose we wish to transform the state ρ to the state σ under Noisy Operations. The maximum value of p that can be achieved in the transition:
$$\rho \xrightarrow{\mathrm{NO}} \rho' = p\,\sigma + (1-p)\,X, \tag{6}$$
is given by:
$$p^* = \min_{l \in \{1, \dots, n\}} \frac{V_l(\rho)}{V_l(\sigma)}. \tag{7}$$
Proof. The proof is split into two parts: first we show that it is impossible to achieve a value of p greater than that in Eq. (7) and then we give a protocol obtaining p = p*. To achieve our first goal we begin by showing that given Eq. (6):
$$V_l(\rho) \geq p\,V_l(\sigma), \quad \forall\, l. \tag{8}$$
To see this note that from Weyl's inequality [42,43], we have:
$$\eta'_i \geq p\,\zeta_i + (1-p)\,x_n, \quad 1 \leq i \leq n,$$
where η′_i is the i-th largest eigenvalue of ρ′ and x_n is the smallest eigenvalue of X. As X is a positive semidefinite matrix, x_n ≥ 0 and:
$$V_l(\rho') \geq p\,V_l(\sigma), \quad \forall\, l. \tag{10}$$
Hence:
$$V_l(\rho) \geq V_l(\rho') \geq p\,V_l(\sigma), \quad \forall\, l,$$
where the first inequality uses Eq. (3) and the second follows from Eq. (10). Now suppose it was possible to achieve a value of p greater than p* in Eq. (6). Then there would exist an l such that V_l(ρ) < pV_l(σ), contradicting Eq. (8).
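To show that p* is obtainable, we define the following quantities. First, define l_1 by:
$$l_1 = \max\left\{ l : \frac{V_l(\rho)}{V_l(\sigma)} = p^* \equiv r^{(1)} \right\}.$$
Then we proceed iteratively and, provided l_{i−1} < n, define:
$$r^{(i)} = \min_{l > l_{i-1}} \frac{V_l(\rho) - V_{l_{i-1}}(\rho)}{V_l(\sigma) - V_{l_{i-1}}(\sigma)},$$
so we have:
$$V_l(\rho) - V_{l_{i-1}}(\rho) \geq r^{(i)} \left[ V_l(\sigma) - V_{l_{i-1}}(\sigma) \right], \quad \forall\, l > l_{i-1}. \tag{14}$$
Define l_i by:
$$l_i = \max\left\{ l > l_{i-1} : \frac{V_l(\rho) - V_{l_{i-1}}(\rho)}{V_l(\sigma) - V_{l_{i-1}}(\sigma)} = r^{(i)} \right\}.$$
Note that we have r^{(i)} > r^{(i−1)} as for a, b, c, d > 0:
$$\frac{a}{b} < \frac{a+c}{b+d} \iff \frac{a}{b} < \frac{c}{d}.$$
This generates a set of l_i such that 0 = l_0 < l_1 < . . . < l_k = n and a set of r^{(i)} such that p* = r^{(1)} < . . . < r^{(k)}. Now we split ρ and σ into blocks and define:
$$\rho_i = \operatorname{diag}\left(\eta_{l_{i-1}+1}, \dots, \eta_{l_i}\right), \qquad \sigma_i = \operatorname{diag}\left(\zeta_{l_{i-1}+1}, \dots, \zeta_{l_i}\right).$$
Then from Eq. (14) (and the fact that equality occurs when l = l_i), ρ_i majorizes r^{(i)}σ_i and we can perform:
$$\rho = \bigoplus_{i=1}^{k} \rho_i \ \xrightarrow{\mathrm{NO}}\ \rho' = \bigoplus_{i=1}^{k} r^{(i)} \sigma_i.$$
With a bit of massaging and recombining the blocks, this is the same form as Eq. (6) with p = p* and the blocks of X being defined by:
$$X_i = \frac{r^{(i)} - p^*}{1 - p^*}\,\sigma_i.$$
Note that as the endpoints of the Lorenz curves coincide at (1, 1) and η_1 > 0, we are guaranteed that 0 < p* ≤ 1.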
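The iterative block construction used in this proof can be traced with exact rational arithmetic. The sketch below is an illustration under stated assumptions: spectra are given as lists of `Fraction`, σ is taken to be full rank so that no denominator vanishes, ties are resolved by taking the largest minimizing index, and the helper names (`partial_sums`, `optimal_decomposition`) and example spectra are hypothetical.

```python
from fractions import Fraction as F


def partial_sums(values):
    """Return [0, v1, v1+v2, ...] so that sums[l] equals V_l."""
    sums = [F(0)]
    for v in values:
        sums.append(sums[-1] + v)
    return sums


def optimal_decomposition(rho_eigs, sigma_eigs):
    """Return (p_star, rho_prime, x) with rho_prime = p_star*sigma + (1-p_star)*x.

    All spectra are expressed in the decreasing order of sigma's eigenvalues.
    Assumption of this sketch: sigma is full rank.
    """
    eta = sorted(rho_eigs, reverse=True)
    zeta = sorted(sigma_eigs, reverse=True)
    if any(z == 0 for z in zeta):
        raise ValueError("this sketch assumes a full-rank sigma")
    n = len(eta)
    v_rho, v_sigma = partial_sums(eta), partial_sums(zeta)
    rho_prime, ratios, start = [], [], 0
    while start < n:
        # r^(i): smallest ratio of Lorenz-curve increments measured from l_{i-1}.
        candidates = [
            ((v_rho[l] - v_rho[start]) / (v_sigma[l] - v_sigma[start]), l)
            for l in range(start + 1, n + 1)
        ]
        r = min(ratio for ratio, _ in candidates)
        end = max(l for ratio, l in candidates if ratio == r)  # l_i
        rho_prime += [r * z for z in zeta[start:end]]          # block r^(i) * sigma_i
        ratios.append(r)
        start = end
    p_star = ratios[0]
    if p_star == 1:
        x = list(zeta)  # deterministic case: X is arbitrary, sigma is one choice
    else:
        x = [(rp - p_star * z) / (1 - p_star) for rp, z in zip(rho_prime, zeta)]
    return p_star, rho_prime, x


# Illustrative spectra (assumed for this example only).
rho = [F(2, 5), F(3, 10), F(1, 5), F(1, 10)]
sigma = [F(3, 5), F(1, 5), F(1, 10), F(1, 10)]
p_star, rho_prime, x = optimal_decomposition(rho, sigma)
print(p_star, rho_prime, x)
```

For these spectra the construction produces two blocks with ratios 2/3 and 3/2, and the resulting X has no weight on the first eigenvector of σ.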
If we want to obtain σ from ρ with probability p* rather than have it as part of a probabilistic mixture as per Eq. (6), we can do so by performing a two outcome measurement, with measurement operators {√M, √(I − M)}, where the blocks of M are given by:
$$M_i = \frac{p^*}{r^{(i)}}\,\mathbb{I}_i. \tag{21}$$
After applying this measurement to ρ′ and reading the result, we will have either:
$$\frac{\sqrt{M}\,\rho'\,\sqrt{M}}{\operatorname{tr}(M\rho')} = \sigma, \quad \text{with probability } p^*,$$
or
$$\frac{\sqrt{\mathbb{I}-M}\,\rho'\,\sqrt{\mathbb{I}-M}}{\operatorname{tr}\left((\mathbb{I}-M)\rho'\right)} = X, \quad \text{with probability } 1-p^*.$$
However, performing this measurement is outside of the class of Noisy Operations and hence costs work. As such, if a general two outcome measurement is allowed, it can be possible to transform ρ into σ with probability greater than p*. For example, if ρ and σ are qubits, we can convert ρ into σ with certainty using this extra resource. Firstly we add an additional qubit in the maximally mixed state and measure it in the computational basis. This results in a pure state, either |0⟩ or |1⟩. As these majorize all other qubit states we can use it to obtain any σ with certainty.
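As a quick consistency check of the two outcome measurement, one can work with diagonal entries only. This is a sketch under assumptions: the spectrum of σ and the block ratios r^{(i)} below are hand-picked illustrative values (two blocks, ratios 2/3 and 3/2), and all operators are taken to be diagonal in the same basis so that the measurement acts entrywise.

```python
from fractions import Fraction as F

# Illustrative data (assumed): spectrum of sigma and, for each eigenvalue,
# the ratio r^(i) of the block it belongs to.
sigma = [F(3, 5), F(1, 5), F(1, 10), F(1, 10)]
ratios = [F(2, 3), F(3, 2), F(3, 2), F(3, 2)]

p_star = ratios[0]                                   # p* = r^(1)
rho_prime = [r * z for r, z in zip(ratios, sigma)]   # blocks r^(i) * sigma_i
m = [p_star / r for r in ratios]                     # diagonal of M, cf. Eq. (21)

# Outcome 1: sqrt(M) rho' sqrt(M), normalised by its trace.
prob_success = sum(mi * rp for mi, rp in zip(m, rho_prime))
post_success = [mi * rp / prob_success for mi, rp in zip(m, rho_prime)]

# Outcome 2: sqrt(I - M) rho' sqrt(I - M), normalised by its trace.
prob_failure = 1 - prob_success
post_failure = [(1 - mi) * rp / prob_failure for mi, rp in zip(m, rho_prime)]

print(prob_success, post_success, post_failure)
```

With these numbers the first outcome occurs with probability p* and leaves σ, while the second leaves the state X of the mixture.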

B. Work of transition under Noisy Operations
If it is not possible to deterministically convert ρ into σ using Noisy Operations, performing the transformation with certainty will cost work. Similarly, if ρ can be converted into σ using Noisy Operations, it may be possible to extract work. Here, we compute the work of transition in terms of a finite set of ratios of monotones. This is done in a similar manner to [40], although we show that the minimization can be done over fewer points.

Sharp States
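Quantifying the optimal amount of work in both cases for the more general Thermal Operations was considered in [20,24]. We shall denote this quantity by W_{ρ→σ}. If work must be added, the quantity is negative, while if we can extract work, it will be positive. For |W_{ρ→σ}| = log(d/j), we define an associated sharp state [40] by:
$$s_{|W_{\rho\to\sigma}|} = \operatorname{diag}\Big( \underbrace{\tfrac{1}{j}, \dots, \tfrac{1}{j}}_{j},\ \underbrace{0, \dots, 0}_{d-j} \Big).$$
See Figure 1 for an example of a sharp state's Lorenz curve. The state s_{|W_{ρ→σ}|} is such that:
$$\rho \otimes s_{|W_{\rho\to\sigma}|} \xrightarrow{\mathrm{NO}} \sigma \ \ \text{if } W_{\rho\to\sigma} \leq 0, \qquad \rho \xrightarrow{\mathrm{NO}} \sigma \otimes s_{|W_{\rho\to\sigma}|} \ \ \text{if } W_{\rho\to\sigma} > 0.$$
In terms of Lorenz curves, tensoring a state ρ with a sharp state s_W has the effect of compressing the Lorenz curve of ρ by a factor of 2^{−W} with respect to the x-axis [20].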

Monotones for Noisy Operations, and the work of transition
The function V_l(ρ) is equal to the height of the Lorenz curve of ρ at x = l/n. An alternative set of monotones, L_y(ρ) where 0 ≤ y ≤ 1, can be defined as the shortest horizontal distance between the Lorenz curve of ρ and the y-axis at y. Note that these functions never decrease under Noisy Operations. In particular:
$$L_{y_k}(\rho) = \frac{k}{n}, \quad \text{for } y_k = \sum_{i=1}^{k} \eta_i, \ 1 \leq k \leq \operatorname{rank}(\rho).$$
If we define the set D(σ) by:
$$D(\sigma) = \left\{ \sum_{i=1}^{k} \zeta_i \right\}_{k=1}^{\operatorname{rank}(\sigma)},$$
then a transition from ρ to σ is achievable with certainty under Noisy Operations if and only if:
$$L_y(\rho) \leq L_y(\sigma), \quad \forall\, y \in D(\sigma). \tag{28}$$
That it is sufficient to consider only y ∈ D(σ) will be justified below. The horizontal monotones, L_y, also allow us to quantify the optimal work of transition that is required or extracted in going from ρ to σ:
Lemma 2. Given two states ρ and σ, under Noisy Operations:
$$2^{-W_{\rho\to\sigma}} = \max_{y \in D(\sigma)} \frac{L_y(\rho)}{L_y(\sigma)}.$$
Proof. Note that we have:
$$2^{-W_{\rho\to\sigma}} = \max_{y \in (0,1]} \frac{L_y(\rho)}{L_y(\sigma)}, \tag{30}$$
as this follows from the fact that to obtain the optimal value of W_{ρ→σ}, we wish to rescale the Lorenz curve of ρ with respect to the x-axis in such a way that it just majorizes that of σ: the curves should touch but not cross. The amount that we need to rescale by is given by Eq. (30). We now show that it is sufficient to maximize over y ∈ D(σ). Let s_0 = 0 and s_k = Σ_{i=1}^{k} ζ_i for 1 ≤ k ≤ rank(σ). Then, for 1 ≤ j ≤ rank(σ) and 0 ≤ r ≤ 1, as the Lorenz curve of σ is a straight line on the interval [s_{j−1}, s_j] and the Lorenz curve of ρ is concave:
$$\frac{L_{(1-r)s_{j-1} + r s_j}(\rho)}{L_{(1-r)s_{j-1} + r s_j}(\sigma)} \leq \frac{(1-r)\,L_{s_{j-1}}(\rho) + r\,L_{s_j}(\rho)}{(1-r)\,L_{s_{j-1}}(\sigma) + r\,L_{s_j}(\sigma)}. \tag{31}$$
It is straightforward to check that the maximum value of the right hand side occurs at either r = 0 or r = 1. At these endpoints the inequality in Eq. (31) becomes an equality, and it follows that it suffices to maximize over y ∈ D(σ).
As the transition from ρ to σ is possible under Noisy Operations if and only if W_{ρ→σ} ≥ 0, the finite set in Eq. (28) is justified.
Note that in [40] it was shown that it is possible to calculate W ρ→σ by performing an optimization over the ratios calculated at the 'elbows' (see Figure 1 for a definition) of both ρ and σ. In Lemma 2 we have shown that it suffices to consider just the 'elbows' of σ.
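The statement of Lemma 2 lends itself to a short numerical illustration. The sketch below assumes that spectra are given as lists of `Fraction`, that logarithms are taken to base 2, and that L_y is obtained by inverting the piecewise linear Lorenz curve; the function names and the example pair (a rank-two sharp state and a pure state in dimension four) are illustrative choices.

```python
from fractions import Fraction as F
from math import log2


def horizontal_monotone(eigs, y):
    """L_y: smallest x at which the Lorenz curve of `eigs` reaches height y, 0 < y <= 1."""
    ordered = sorted(eigs, reverse=True)
    n, height = len(ordered), F(0)
    for k, value in enumerate(ordered):
        if value > 0 and height + value >= y:
            # Linear interpolation inside the segment that crosses height y.
            return (k + (y - height) / value) / n
        height += value
    raise ValueError("y must lie in (0, 1] and the spectrum must sum to 1")


def work_of_transition(rho_eigs, sigma_eigs):
    """W_{rho->sigma} in bits, maximising only over the elbows D(sigma)."""
    elbows, height = [], F(0)
    for value in sorted(sigma_eigs, reverse=True):
        if value > 0:
            height += value
            elbows.append(height)
    ratio = max(
        horizontal_monotone(rho_eigs, y) / horizontal_monotone(sigma_eigs, y)
        for y in elbows
    )
    return -log2(float(ratio))


# Illustrative example (assumed): a rank-2 sharp state and a pure state, d = 4.
mixed = [F(1, 2), F(1, 2), F(0), F(0)]
pure = [F(1), F(0), F(0), F(0)]
print(work_of_transition(mixed, pure))
print(work_of_transition(pure, mixed))
```

In this example one bit of work must be supplied to purify the sharp state, and one bit can be extracted in the reverse direction, as expected for a reversible pair.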

Bounds on the transition probability
The quantities W_{ρ→σ} and W_{σ→ρ} can be used to bound p* as follows:
Lemma 3. Given two states ρ and σ, under Noisy Operations:
$$2^{W_{\rho\to\sigma}} \leq p^* \leq 2^{-W_{\sigma\to\rho}}, \tag{32}$$
where as p* ≤ 1, we assume W_{ρ→σ} ≤ 0. If W_{ρ→σ} ≥ 0, p* = 1 and the transformation from ρ to σ can be done deterministically, potentially extracting a finite amount of work.
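Proof. We start with the lower bound, giving a protocol which achieves p = 2^{W_{ρ→σ}}. Assuming |W_{ρ→σ}| = log(d/j) for simplicity, we add a maximally mixed ancilla B of dimension d and decompose it into the sharp state s_{|W_{ρ→σ}|} and the maximally mixed state s̄ on the complementary (d − j)-dimensional subspace:
$$\rho \xrightarrow{\mathrm{NO}} \rho \otimes \frac{\mathbb{I}_B}{d} = 2^{W_{\rho\to\sigma}} \left( \rho \otimes s_{|W_{\rho\to\sigma}|} \right) + \left( 1 - 2^{W_{\rho\to\sigma}} \right) \left( \rho \otimes \bar{s} \right) \xrightarrow{\mathrm{NO}} 2^{W_{\rho\to\sigma}}\,\sigma + \left( 1 - 2^{W_{\rho\to\sigma}} \right) \operatorname{Tr}_B Y,$$
where Y denotes the state into which ρ ⊗ s̄ is taken before B is discarded, i.e. we obtain something of the form Eq. (6) with p = 2^{W_{ρ→σ}} and X = Tr_B Y. As p* is the maximum value of p obtainable in Eq. (6), we derive the lower bound. We now consider the upper bound and to obtain a useful bound, assume W_{σ→ρ} > 0. We define I_∞(ρ) as the work of formation of ρ under NO [18], and hence let s_{I∞(ρ)} be the least sharp state that majorizes ρ (see Figure 1). Note that I_∞ decreases under Noisy Operations and is additive across tensor products [40]. In terms of the eigenvalues of ρ and σ:
$$I_\infty(\rho) = \log(n\,\eta_1), \qquad I_\infty(\sigma) = \log(n\,\zeta_1).$$
By definition, as W_{σ→ρ} > 0:
$$\sigma \xrightarrow{\mathrm{NO}} \rho \otimes s_{W_{\sigma\to\rho}}.$$
Now, using first the monotonicity of I_∞ and then the additivity:
$$I_\infty(\sigma) \geq I_\infty\left( \rho \otimes s_{W_{\sigma\to\rho}} \right) = I_\infty(\rho) + W_{\sigma\to\rho} \quad \Longrightarrow \quad 2^{-W_{\sigma\to\rho}} \geq \frac{\eta_1}{\zeta_1} = \frac{V_1(\rho)}{V_1(\sigma)} \geq p^*,$$
as required.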
From Eq. (32) we can see that when W_{ρ→σ} = −W_{σ→ρ} ≡ W (that is, in a reversible transition) the two bounds coincide and p* = 2^{W} for W ≤ 0 (when W ≥ 0 the transition is deterministic and p* = 1). This occurs when either σ and ρ ⊗ s_{|W|}, or ρ and σ ⊗ s_{|W|}, are interconvertible under Noisy Operations, depending on whether W is negative or positive. In terms of Lorenz curves, this means that the curves of ρ and σ have the same shape up to re-scaling by a factor 2^{−W}. In particular, this is the case when both ρ and σ are sharp states, where both Lorenz curves are straight lines.
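The sandwich of Lemma 3 can be checked on a small example with exact arithmetic. This is a sketch under assumptions: the vertical ratio is evaluated as in Theorem 1, the horizontal ratios as in Lemma 2 (so that the maximum horizontal ratio from a to b plays the role of 2^{−W_{a→b}}), and the spectra and function names are illustrative.

```python
from fractions import Fraction as F


def lorenz_inverse(eigs, y):
    """L_y: smallest x at which the Lorenz curve of `eigs` reaches height y."""
    ordered, height = sorted(eigs, reverse=True), F(0)
    for k, value in enumerate(ordered):
        if value > 0 and height + value >= y:
            return (k + (y - height) / value) / len(ordered)
        height += value
    raise ValueError("y must lie in (0, 1]")


def max_horizontal_ratio(a, b):
    """Max over the elbows of b of L_y(a) / L_y(b), i.e. 2^(-W_{a->b}) by Lemma 2."""
    elbows, height = [], F(0)
    for value in sorted(b, reverse=True):
        if value > 0:
            height += value
            elbows.append(height)
    return max(lorenz_inverse(a, y) / lorenz_inverse(b, y) for y in elbows)


def min_vertical_ratio(a, b):
    """Min over l of V_l(a) / V_l(b), i.e. p* for a -> b by Theorem 1."""
    ratios, va, vb = [], F(0), F(0)
    for ea, eb in zip(sorted(a, reverse=True), sorted(b, reverse=True)):
        va, vb = va + ea, vb + eb
        ratios.append(va / vb)
    return min(ratios)


# Illustrative spectra (assumed for this example only).
rho = [F(1, 2), F(1, 4), F(1, 4), F(0)]
sigma = [F(3, 4), F(1, 4), F(0), F(0)]

lower = 1 / max_horizontal_ratio(rho, sigma)   # 2^{W_{rho->sigma}}
p_star = min_vertical_ratio(rho, sigma)
upper = max_horizontal_ratio(sigma, rho)       # 2^{-W_{sigma->rho}}
print(lower, p_star, upper)
```

For this pair the lower bound is strict while the upper bound happens to be saturated.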
This result can be applied in the thermodynamic regime of many independent copies. If we want to perform a transition such as:
$$\rho^{\otimes N} \longrightarrow \sigma^{\otimes N},$$
we need an amount of work given by −N W_{ρ→σ}. Hence, the probability of success in such a case is bounded by:
$$2^{N W_{\rho\to\sigma}} \leq p^* \leq 2^{-N W_{\sigma\to\rho}},$$
so that, whenever W_{σ→ρ} > 0, it tends to 0 for large N. This can be seen as a way in which in the thermodynamic limit statistical fluctuations are suppressed.
FIG. 2. We plot the curves of ρ, σ and ρ compressed by p* (with respect to the x-axis). The points A and B at which the vertical ratio between the curves of ρ and σ is maximum (which sets l_1 and p*), and the sharp states that pass through those points are also shown. After compressing the Lorenz curve of ρ by a ratio of p*, the point B will be taken to C, which will always either be below the curve of σ or just touching it. This proves the lower bound in Eq. (32).

Lorenz curve interpretation
In terms of Lorenz curves, adding |W_{ρ→σ}| work to ρ to make the transition possible is equivalent to compressing the Lorenz curve with respect to the x-axis by a ratio 2^{W_{ρ→σ}}, such that the curve of ρ lies just above that of σ. Hence, a compression by p* ≥ 2^{W_{ρ→σ}} must mean that there is at least a point of the compressed curve just below or touching σ. A proof of this is given in Figure 2.
Extracting W_{σ→ρ} work from σ before performing NO into ρ is equivalent to compressing the curve of ρ by a ratio of 2^{−W_{σ→ρ}} such that the curve of σ lies just above that of ρ. Hence, to prove the upper bound in Eq. (32), it suffices to show that in compressing the curve of ρ by p* at least one point of the new curve must lie above or touch that of σ. In Figure 3 we show a diagrammatic version of the proof given in Section II B.
It should be noted that with Lemma 3 we are proving a general statement about concave Lorenz curves. Namely, the minimum vertical ratio of two given curves (p*) is lower and upper bounded respectively by the minimum and the maximum horizontal ratio of the two.

III. PROBABILITY OF TRANSITION UNDER THERMAL OPERATIONS
Noisy Operations can be generalized to include systems with arbitrary, finite Hamiltonians. This is the resource theory of Thermal Operations [20,35]. Within this scheme, the allowed operations are: i) a system with any Hamiltonian in the Gibbs state of that Hamiltonian can be added, ii) any subsystem can be discarded through tracing out and iii) any energy-conserving unitary, i.e. those unitaries that commute with the total Hamiltonian, can be applied to the global system.
FIG. 3. We plot the curves of ρ, σ and ρ compressed by p* (with respect to the x-axis). The points A and B at which the vertical ratio between the curves of ρ and σ is maximum (which sets l_1 and p*) and the sharp states I_∞(ρ) and I_∞(σ) are also shown. Given that for sharp states all bounds are saturated, the appropriate maximum vertical and horizontal ratios coincide, and are η_1/ζ_1, the ratio of the heights of B′ and A′. But this ratio is, by definition, bigger than or equal to p*, the ratio between A and B. This means that if the curve of ρ is compressed by p*, the point B′ is mapped to C just above or touching the curve of σ, proving the upper bound of Eq. (32).
In general, the initial and final systems may have different Hamiltonians but, by making use of the 'switching qubit' construction in [20], we can w.l.o.g. assume that the initial and final Hamiltonians are the same. As such, the results in this section will assume this but in Section III C we will discuss how a changing Hamiltonian affects them.
In the absence of catalysts, and provided the final state is block-diagonal in the energy eigenbasis, it was established in [20] that a transition from ρ to σ is possible under Thermal Operations if and only if ρ thermo-majorizes σ. This is similar in form to the majorization criteria of Noisy Operations and can be visualized in terms of thermo-majorization diagrams which are similar to Lorenz curves but with two crucial differences.
Suppose ρ is also block-diagonal in the energy eigenbasis with eigenvalue η_i associated with energy level E_i, for 1 ≤ i ≤ n. Firstly, rather than ordering according to the magnitude of η_i, we instead β-order them, listing η_i e^{βE_i} in descending order.
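The second difference is that we no longer plot the β-ordered η_i at evenly spaced intervals. Instead we plot the points:
$$\left\{ \left( \sum_{i=1}^{k} e^{-\beta E_i^{(\rho)}},\ \sum_{i=1}^{k} \eta_i^{(\rho)} \right) \right\}_{k=1}^{n}, \tag{37}$$
where the superscript (ρ) on E_i and η_i indicates that they have been β-ordered and this ordering depends on ρ.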
Thermo-majorization states that ρ can be deterministically converted into a block-diagonal σ if and only if its thermo-majorization curve never lies below that of σ, as is shown in Figure 4. This is analogous to the case of Noisy Operations. In what follows, we assume that the η_i have been β-ordered unless otherwise stated. If ρ is not block-diagonal in the energy eigenbasis, to determine if a transition is possible we consider the thermo-majorization curve associated with the state formed by decohering ρ in the energy eigenbasis. This state, ρ_D, is given by:
$$\rho_D = \sum_i |E_i\rangle\langle E_i|\,\rho\,|E_i\rangle\langle E_i|, \tag{38}$$
where |E_i⟩ is the eigenvector of the system's Hamiltonian associated with energy level E_i. The operation of decohering ρ to give ρ_D is a thermal operation and commutes with all other thermal operations [36]. A transition from ρ to σ, where σ is block-diagonal in the energy eigenbasis, can be made deterministically if and only if the thermo-majorization curve of ρ_D is never below that of σ.
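The β-ordering and the resulting curve can be sketched for block-diagonal states as follows. The sketch makes several assumptions: states are represented by their populations over non-degenerate energy levels, curves are compared at the elbows of both states (which is enough for piecewise linear curves), and the function names, the qubit energies and the choice β = ln 2 are illustrative.

```python
from math import exp, log


def thermo_curve(populations, energies, beta):
    """Elbows (including the origin) of the thermo-majorization curve."""
    # beta-ordering: sort levels by eta_i * exp(beta * E_i), largest first.
    order = sorted(
        range(len(populations)),
        key=lambda i: populations[i] * exp(beta * energies[i]),
        reverse=True,
    )
    points, x, y = [(0.0, 0.0)], 0.0, 0.0
    for i in order:
        x += exp(-beta * energies[i])
        y += populations[i]
        points.append((x, y))
    return points


def curve_height(points, x):
    """Height of the piecewise linear curve at x (flat beyond the last elbow)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]


def thermo_majorizes(rho, sigma, energies, beta, tol=1e-12):
    """True if the curve of rho is never below the curve of sigma."""
    curve_rho = thermo_curve(rho, energies, beta)
    curve_sigma = thermo_curve(sigma, energies, beta)
    xs = [p[0] for p in curve_rho[1:]] + [p[0] for p in curve_sigma[1:]]
    return all(
        curve_height(curve_rho, x) >= curve_height(curve_sigma, x) - tol for x in xs
    )


# Illustrative qubit (assumed): energies 0 and 1, beta = ln 2, so Z = 3/2.
energies, beta = [0.0, 1.0], log(2)
ground, excited, gibbs = [1.0, 0.0], [0.0, 1.0], [2 / 3, 1 / 3]
print(thermo_majorizes(excited, ground, energies, beta))
print(thermo_majorizes(ground, excited, energies, beta))
print(thermo_majorizes(ground, gibbs, energies, beta))
```

In this example the excited state thermo-majorizes the ground state but not conversely, and every state thermo-majorizes the Gibbs state.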
Finally, if σ is not block-diagonal, a transition from ρ to σ is possible only if ρ D thermo-majorizes σ D and finding a set of sufficient conditions is an open question.
In what follows, the thermo-majorization curve of a state with coherences is defined to be the thermo-majorization curve of that state decohered in the energy eigenbasis as per Eq. (38).
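Similarly to how Eq. (4) defines monotones for the Noisy Operations resource theory, the height of the β-ordered thermo-majorization curves provides monotones for Thermal Operations. If we denote the height of the thermo-majorization curve of ρ at x by Ṽ_x(ρ), for 0 ≤ x ≤ Z (where Z is the partition function), then by the thermo-majorization criteria, this function is non-increasing under Thermal Operations. In particular, for block-diagonal ρ, we have:
$$\tilde{V}_{x_k}(\rho) = \sum_{i=1}^{k} \eta_i^{(\rho)}, \quad \text{for } x_k = \sum_{i=1}^{k} e^{-\beta E_i^{(\rho)}}, \ 1 \leq k \leq n. \tag{39}$$
These monotones also give us an alternative way of stating the thermo-majorization criteria:
Theorem 4. Suppose σ is block-diagonal in the energy eigenbasis and define the set L(σ) by:
$$L(\sigma) = \left\{ \sum_{i=1}^{k} e^{-\beta E_i^{(\sigma)}} \right\}_{k=1}^{n}.$$
Then ρ can be deterministically converted into σ under Thermal Operations if and only if:
$$\tilde{V}_x(\rho) \geq \tilde{V}_x(\sigma), \quad \forall\, x \in L(\sigma). \tag{40}$$
Proof. Suppose the transition from ρ to σ is possible under Thermal Operations. Then by thermo-majorization, Ṽ_x(ρ) ≥ Ṽ_x(σ), for 0 ≤ x ≤ Z and in particular Eq. (40) holds.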
Conversely, suppose Eq. (40) holds and, setting t_0 = 0, label the elements of L(σ) arranged in increasing order by t_i for i = 1 to n. Then on the interval [t_{i−1}, t_i], for 1 ≤ i ≤ n, the thermo-majorization curve of σ is given by a straight line. From ρ, define the block-diagonal state ρ_σ by the thermo-majorization curve:
$$\left\{ \left( t_i,\ \tilde{V}_{t_i}(\rho) \right) \right\}_{i=1}^{n}, \tag{41}$$
and note that due to the concavity of thermo-majorization curves, ρ thermo-majorizes ρ_σ. On the interval [t_{i−1}, t_i], 1 ≤ i ≤ n, the thermo-majorization curve of ρ_σ is also given by a straight line. The construction of this state ρ_σ is shown in Figure 5. As Ṽ_{t_i}(ρ_σ) = Ṽ_{t_i}(ρ), ∀i by construction, Eq. (40) implies that Ṽ_{t_i}(ρ_σ) ≥ Ṽ_{t_i}(σ), ∀i. Hence on the interval [t_{i−1}, t_i], 1 ≤ i ≤ n, the thermo-majorization curves for ρ_σ and σ, and therefore ρ and σ, do not cross. As this holds for all i and the intervals cover [0, Z], the thermo-majorization curve of ρ is never below that of σ and we can perform the transition from ρ to σ deterministically under Thermal Operations.
If we define the number of 'elbows' in the thermo-majorization curve of σ to be j, this reduces thermo-majorization to checking j criteria and generalizes Lemma 17 of [40] to Thermal Operations. Note also that if σ is not block-diagonal in the energy eigenbasis, Eq. (40) gives a necessary but not sufficient condition for the transition from ρ to σ to be possible.
FIG. 5. Here we illustrate the construction of the state ρ_σ used in the proof of Theorem 4. The points of the curve ρ that are at the same horizontal position as the elbows of σ are joined, and by concavity the resultant curve is always below ρ.

A. Non-deterministic transformations
Having defined the appropriate monotones for Thermal Operations, we are now in a position to investigate non-deterministic transformations and prove a theorem analogous to Theorem 1.
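Theorem 5. Suppose we wish to transform the state ρ to the state σ under Thermal Operations. The maximum value of p, p*, that can be achieved in the transition:
$$\rho \xrightarrow{\mathrm{TO}} \rho' = p\,\sigma + (1-p)\,X, \tag{42}$$
is such that:
$$p^* \leq \min_{x \in L(\sigma)} \frac{\tilde{V}_x(\rho)}{\tilde{V}_x(\sigma)}. \tag{43}$$
Furthermore, if σ is block-diagonal in the energy eigenbasis, there exists a protocol that achieves the bound.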
Proof. Proving this result is more complicated than proving Theorem 1 due to the fact that ρ and σ may have different β-orderings. We proceed as before, first showing the bound in Eq. (43) and then giving a protocol that achieves the bound when σ is block-diagonal. We begin by showing that given Eq. (42):
$$\tilde{V}_x(\rho) \geq p\,\tilde{V}_x(\sigma), \quad \forall\, x \in L(\sigma).$$
First consider (for general σ) the maximum value of p that can be achieved in attempting to convert ρ into σ.
As decohering is a thermal operation, this value of p can also be achieved when attempting to convert ρ into σ_D:
$$\rho \xrightarrow{\mathrm{TO}} \rho'_D = p\,\sigma_D + (1-p)\,X_D.$$
Thus, to upper bound p*, it suffices to show that Eq. (43) holds for block-diagonal σ. Furthermore, w.l.o.g. we can assume that ρ and X are also block-diagonal. Using Weyl's inequality as per Theorem 1 to deal with degenerate energy levels, for block-diagonal ρ′, σ and X, we have:
$$\eta'^{(\sigma)}_i \geq p\,\zeta^{(\sigma)}_i, \quad \forall\, i, \tag{45}$$
where η′_i and ζ_i denote the eigenvalues of ρ′ and σ, and the superscript (σ) indicates that both are listed according to the β-ordering of σ. Now consider the sub-normalized thermo-majorization curve of pσ given by the points:
$$\left\{ \left( \sum_{i=1}^{k} e^{-\beta E_i^{(\sigma)}},\ p \sum_{i=1}^{k} \zeta_i^{(\sigma)} \right) \right\}_{k=1}^{n}, \tag{46}$$
and the (possibly non-concave) curve formed by plotting the eigenvalues of ρ′ according to the β-ordering of σ. This is given by the points:
$$\left\{ \left( \sum_{i=1}^{k} e^{-\beta E_i^{(\sigma)}},\ \sum_{i=1}^{k} \eta'^{(\sigma)}_i \right) \right\}_{k=1}^{n}. \tag{47}$$
By Eq. (45), the curve defined in Eq. (47) is never below that defined in Eq. (46). Finally, the thermo-majorization curve of ρ′ is given by:
$$\left\{ \left( \sum_{i=1}^{k} e^{-\beta E_i^{(\rho')}},\ \sum_{i=1}^{k} \eta'^{(\rho')}_i \right) \right\}_{k=1}^{n}. \tag{48}$$
Note that attempting to construct a thermo-majorization curve for ρ′ with respect to the β-ordering of another state, as we do in Eq. (47), has the effect of rearranging the piecewise linear segments of the true thermo-majorization curve. This means that they may no longer be joined from left to right in order of decreasing gradient. Such a curve will always be below the true thermo-majorization curve. To see this, imagine constructing a curve from the piecewise linear elements and in particular, trying to construct a curve that would lie above all other possible constructions. Starting at the origin, we are forced to choose the element with the steepest gradient: all other choices would lie below this by virtue of having a shallower gradient. We then proceed iteratively, starting from the endpoint of the previous section added and choosing the element with the largest gradient from the remaining linear segments. The construction that we obtain is the true thermo-majorization curve. A graphical description of this proof is shown in Figure 6. As such, the curve in Eq. (48) is never below that in Eq. (47). This gives us:
$$\tilde{V}_x(\rho) \geq \tilde{V}_x(\rho') \geq p\,\tilde{V}_x(\sigma), \quad 0 \leq x \leq Z,$$
where the first inequality holds as, by definition, ρ thermo-majorizes ρ′. In particular we have:
$$\tilde{V}_x(\rho) \geq p\,\tilde{V}_x(\sigma), \quad \forall\, x \in L(\sigma),$$
which implies the bound in Eq. (43).
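The finite set of ratios appearing in Eq. (43) is straightforward to evaluate for block-diagonal states. The following sketch assumes non-degenerate energy levels, represents states by their populations, and evaluates the curve of ρ only at the elbows of σ; the function names and the qubit example (Gibbs state to excited state at β = ln 2) are illustrative. A returned value of 1 corresponds to the criteria of Theorem 4 being satisfied.

```python
from math import exp, log


def thermo_curve(populations, energies, beta):
    """Elbows (including the origin) of the thermo-majorization curve."""
    order = sorted(
        range(len(populations)),
        key=lambda i: populations[i] * exp(beta * energies[i]),
        reverse=True,
    )
    points, x, y = [(0.0, 0.0)], 0.0, 0.0
    for i in order:
        x += exp(-beta * energies[i])
        y += populations[i]
        points.append((x, y))
    return points


def curve_height(points, x):
    """Height of the piecewise linear curve at x (flat beyond the last elbow)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]


def probability_bound(rho, sigma, energies, beta):
    """min over the elbows of sigma of V_x(rho) / V_x(sigma), capped at 1."""
    curve_rho = thermo_curve(rho, energies, beta)
    curve_sigma = thermo_curve(sigma, energies, beta)
    ratios = [curve_height(curve_rho, x) / y for x, y in curve_sigma[1:] if y > 0]
    return min(1.0, min(ratios))


# Illustrative qubit (assumed): energies 0 and 1, beta = ln 2, so Z = 3/2.
energies, beta = [0.0, 1.0], log(2)
gibbs, excited = [2 / 3, 1 / 3], [0.0, 1.0]
print(probability_bound(gibbs, excited, energies, beta))
print(probability_bound(excited, gibbs, energies, beta))
```

In this example the bound for reaching the excited state from the Gibbs state coincides with the Gibbs population of the excited level, while the reverse transition is deterministic.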
FIG. 6. Here we show graphically the steps of the proof of the first part of Theorem 5. In the decomposition of Eq. (42) the curve pσ must always be below that of ρ′ and hence also ρ. This sets the maximum probability p* as defined in Eq. (43). Both pσ and the disordered ρ′ have the same β-ordering.
When σ is block-diagonal in the energy eigenbasis, a protocol that saturates the bound is:
$$\rho \xrightarrow{\mathrm{TO}} \rho_\sigma \xrightarrow{\mathrm{TO}} \rho' = p^*\sigma + (1-p^*)\,X,$$
where ρ_σ was defined in Eq. (41) and is thermo-majorized by ρ. As ρ_σ and σ have the same β-ordering and:
$$\tilde{V}_x(\rho_\sigma) = \tilde{V}_x(\rho), \quad \forall\, x \in L(\sigma),$$
applying the same construction used in Theorem 1 gives a strategy to produce ρ′ from ρ_σ that achieves:
$$p^* = \min_{x \in L(\sigma)} \frac{\tilde{V}_x(\rho)}{\tilde{V}_x(\sigma)}.$$
For block-diagonal σ, after obtaining ρ′ through Thermal Operations we may apply the measurement defined by Eq. (21) to extract our target state with probability p*. This can be done through a process that uses an ancilla qubit system, Q, that starts and ends in the state |0⟩ and has associated Hamiltonian H_Q = 𝕀_2, a unitary that correlates the system with the ancilla, and a projective measurement on the ancilla qubit. As the measurement operators are diagonal in the energy eigenbasis, we will find that the unitary is energy conserving and within the set of Thermal Operations. Hence the only cost we have to pay is to erase the record of the measurement outcome.
The unitary that we shall use is given by:
FIG. 7. We show the thermo-majorization curves of a state to which a work qubit in one of two pure states has been tensored. Adding this work system takes Z → Z(1 + e^{−βW}), extending the x-axis. When we tensor with the ground state to form ρ ⊗ |0⟩⟨0|, the curve is the same as for ρ alone, but when the excited state is tensored, there is a change in the energy levels of the β-ordering, and as a result the curve of ρ is compressed by a ratio of e^{−βW}.

B. Work of transition under Thermal Operations
As with Noisy Operations, if ρ cannot be deterministically converted into σ using Thermal Operations, performing the transformation with certainty costs work, while if it can, it may be possible to extract work. In the thermal operation scheme, the optimal amount of work that must be added or gained can be quantified using the energy gap, W, of a 2-level system with ground state |0⟩ and excited state |W⟩ with energy W. The associated Hamiltonian is:
$$H_W = W\,|W\rangle\langle W|.$$
The work of transition, W_{ρ→σ}, is such that, with W = |W_{ρ→σ}|:
$$\rho \otimes |W\rangle\langle W| \xrightarrow{\mathrm{TO}} \sigma \otimes |0\rangle\langle 0| \ \ \text{if } W_{\rho\to\sigma} \leq 0, \qquad \rho \otimes |0\rangle\langle 0| \xrightarrow{\mathrm{TO}} \sigma \otimes |W\rangle\langle W| \ \ \text{if } W_{\rho\to\sigma} > 0.$$
As we illustrate in Figure 7, the effect of tensoring a pure state of work to ρ is equivalent to stretching the thermo-majorization curve by a factor of e^{−βW}, and tensoring by the corresponding ground state to σ does not change the curve [20]. In both cases the β-order is preserved, and the new curves will have a lengthened x-axis [0, Z(1 + e^{−βW})]. These different stretchings can serve to place the curve of ρ just above that of σ, in which case W will be the work of transition, in a similar way to the case of Noisy Operations.

Monotones under Thermal Operations, and the work of transition
In Thermal Operations, the horizontal distance between a state's thermo-majorization curve and the y-axis is again a monotone for each value of y ∈ [0, 1]. We denote these by L̃_y and, as before, they never decrease under Thermal Operations. In particular, for block-diagonal ρ, we have:
$$\tilde{L}_{y_k}(\rho) = \sum_{i=1}^{k} e^{-\beta E_i^{(\rho)}}, \quad \text{for } y_k = \sum_{i=1}^{k} \eta_i^{(\rho)}, \ 1 \leq k \leq n,$$
where all sums have been properly β-ordered.
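Similarly to Lemma 2 we have:
Lemma 6. Given two states ρ and σ, where σ is block-diagonal in the energy eigenbasis, under Thermal Operations:
$$e^{-\beta W_{\rho\to\sigma}} = \max_{y \in D(\sigma)} \frac{\tilde{L}_y(\rho)}{\tilde{L}_y(\sigma)}, \tag{58}$$
where D(σ) is now formed from the partial sums of the β-ordered eigenvalues of σ.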
The proof is near identical to that given for Noisy Operations and so we omit it here.
If σ is not block-diagonal, the right hand side of Eq. (58) lower bounds e^{−βW_{ρ→σ}}. (To see this, recall that decohering is a thermal operation and hence W_{ρ→σ} ≤ W_{ρ→σ_D}.)

Bounds on the transition probability
We can prove a result analogous to Eq. (32) for the thermal case:
Lemma 7. Given two states ρ and σ, where σ is block-diagonal in the energy eigenbasis, under Thermal Operations:
$$e^{\beta W_{\rho\to\sigma}} \leq p^* \leq e^{-\beta W_{\sigma\to\rho}},$$
where as p* ≤ 1, we assume W_{ρ→σ} ≤ 0. If W_{ρ→σ} ≥ 0, p* = 1 and the transformation from ρ to σ can be done deterministically, potentially extracting a finite amount of work.
Proof. The previous Lemma 3 can be seen as a general statement about pairs of concave Lorenz-like curves: the minimum vertical ratio is lower and upper bounded by the minimum and maximum horizontal ratios of the two. Given our previous definitions of the work of transition, and the fact that p * is the minimum vertical ratio of the two Lorenz curves (as shown in Theorem 5), the result follows.
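The proof relies on $p^*$ being the minimum vertical ratio of the two curves. For states that are diagonal in the energy eigenbasis this ratio is easy to evaluate numerically. The sketch below takes that characterization at face value. It further assumes that only the elbows of σ's curve need to be checked: σ's curve is piecewise linear and ρ's curve is concave, so the ratio on each linear segment is smallest at an endpoint. The function names `thermo_curve` and `min_vertical_ratio` are hypothetical.

```python
import numpy as np

def thermo_curve(pops, energies, beta):
    """Elbows (x, y) of the thermo-majorization curve of a diagonal state."""
    pops = np.asarray(pops, dtype=float)
    gibbs = np.exp(-beta * np.asarray(energies, dtype=float))
    # beta-ordering: sort by p_i * exp(beta * E_i), largest first
    order = np.argsort(-pops / gibbs, kind="stable")
    x = np.concatenate(([0.0], np.cumsum(gibbs[order])))
    y = np.concatenate(([0.0], np.cumsum(pops[order])))
    return x, y

def min_vertical_ratio(rho_pops, sigma_pops, energies, beta):
    """Minimum over the elbows of sigma's curve of height(rho) / height(sigma)."""
    rx, ry = thermo_curve(rho_pops, energies, beta)
    sx, sy = thermo_curve(sigma_pops, energies, beta)
    elbows = sx[1:]                          # skip the origin
    rho_heights = np.interp(elbows, rx, ry)  # rho's curve is piecewise linear
    ratios = rho_heights / sy[1:]
    return float(min(1.0, ratios.min()))

# Qubit example with a trivial Hamiltonian (the Noisy Operations limit),
# using the spectra quoted for Figure 8.
eta, zeta = [0.6, 0.4], [0.85, 0.15]
p_forward = min_vertical_ratio(eta, zeta, [0.0, 0.0], beta=1.0)
p_reverse = min_vertical_ratio(zeta, eta, [0.0, 0.0], beta=1.0)
print(p_forward, p_reverse)
```

For these spectra the forward ratio is 0.6/0.85, while in the reverse direction the initial curve lies above the final one everywhere and the ratio saturates at 1.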

C. Changing Hamiltonian
Our results so far have assumed that ρ and σ are associated with the same Hamiltonian. Suppose the initial system has Hamiltonian $H_1$ and the final system Hamiltonian $H_2$. Following [20], this scenario can be mapped to one with identical initial and final Hamiltonian, $H$, if we instead consider the transition between $\rho \otimes |0\rangle\langle 0|$ and $\sigma \otimes |1\rangle\langle 1|$ where: $H = H_1 \otimes |0\rangle\langle 0| + H_2 \otimes |1\rangle\langle 1|$. Note that the partition function associated with $H$ is $Z = Z_1 + Z_2$. The height of the thermo-majorization curve of $\rho \otimes |0\rangle\langle 0|$ with respect to $H$ is identical to that of ρ with respect to $H_1$ on $[0, Z_1]$ and equal to 1 on $[Z_1, Z]$. Similarly, the height of the thermo-majorization curve of $\sigma \otimes |1\rangle\langle 1|$ is identical to that of σ on $[0, Z_2]$ and equal to 1 on $[Z_2, Z]$. Hence, by extending the definition of $\tilde{V}_x(\rho)$ so that $\tilde{V}_x(\rho) = 1$ for $x \ge Z_1$, we can readily apply Theorems 4 and 5 to the case of changing Hamiltonians.
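The extension of the curves to the common interval $[0, Z]$ can be illustrated numerically. The sketch below assumes that the joint Hamiltonian acts as $H_1$ when the switch qubit is in $|0\rangle$ and as $H_2$ when it is in $|1\rangle$, so that its spectrum is the union of the two spectra and $Z = Z_1 + Z_2$. The spectra `E1` and `E2`, the state `rho` and the helper `thermo_curve` are hypothetical.

```python
import numpy as np

def thermo_curve(pops, energies, beta):
    """Elbows (x, y) of the thermo-majorization curve of a diagonal state."""
    pops = np.asarray(pops, dtype=float)
    gibbs = np.exp(-beta * np.asarray(energies, dtype=float))
    # beta-ordering: sort by p_i * exp(beta * E_i), largest first
    order = np.argsort(-pops / gibbs, kind="stable")
    x = np.concatenate(([0.0], np.cumsum(gibbs[order])))
    y = np.concatenate(([0.0], np.cumsum(pops[order])))
    return x, y

beta = 1.0
E1 = np.array([0.0, 1.0])        # hypothetical spectrum of H_1
E2 = np.array([0.0, 0.5, 2.0])   # hypothetical spectrum of H_2
rho = np.array([0.7, 0.3])       # hypothetical diagonal state for H_1
Z1 = np.exp(-beta * E1).sum()
Z2 = np.exp(-beta * E2).sum()

# rho tensor |0><0|: populations on the H_1 sector, none on the H_2 sector.
joint_pops = np.concatenate((rho, np.zeros(len(E2))))
joint_energies = np.concatenate((E1, E2))

x, y = thermo_curve(joint_pops, joint_energies, beta)
x1, y1 = thermo_curve(rho, E1, beta)

print("curve on [0, Z1]:", list(zip(x[:3], y[:3])))
print("flat part:", list(zip(x[2:], y[2:])))
print("end of x-axis:", x[-1], "vs Z1 + Z2:", Z1 + Z2)
```

The unpopulated levels of the $H_2$ sector come last in the β-ordering, which is why the curve stays at height 1 from $Z_1$ up to $Z$.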

IV. CONCLUSION
Here, we have introduced a finite set of functions which, like the free energy, can only go down in the resource theory of Thermal Operations. We used these to compute the work of transition, and the maximum probability of making a transition between two states. The work of transition between the initial and final states, and vice versa, can be used to bound this probability.
At the moment, little is known about the case when the final state is not block-diagonal in the energy eigenbasis. In such a situation, our results provide necessary but not sufficient conditions. Finding sufficient conditions is expected to be difficult, as we do not know such conditions even for non-probabilistic transformations. For recent results on the role of coherences in quantum thermodynamics, see for example [26][27][28][32].
Our analysis has focused on Noisy and Thermal Operations in the absence of a catalyst, i.e. an ancilla which is used to aid in a transition but returned in the same state. In Catalytic Thermal Operations (CTO), given ρ and σ, we are interested in whether there exists a state ω such that: $\rho \otimes \omega \xrightarrow{\mathrm{TO}} \sigma \otimes \omega$. If such an ω exists, we say $\rho \xrightarrow{\mathrm{CTO}} \sigma$. There exist instances where $\rho \not\xrightarrow{\mathrm{TO}} \sigma$ and yet $\rho \xrightarrow{\mathrm{CTO}} \sigma$. Investigating when such catalytic transitions exist has led to a family of second laws of thermodynamics that apply in the single-shot regime [25]. Having access to catalysts has the potential to achieve higher values of $p$ than $p^*$, and it would be interesting to find an expression or bound for the maximum value of $p$ in the process: $\rho \xrightarrow{\mathrm{CTO}} \rho' = p\sigma + (1-p)X$. Note that a bound can be obtained from any nonincreasing monotone of CTO, $M$ say, that satisfies $M(p\sigma + (1-p)X) \ge pM(\sigma)$, since then $M(\rho) \ge M(\rho') \ge pM(\sigma)$. Bounding the maximum transition probability under Catalytic Thermal Operations is made more difficult by the fact that the generalized free energies found in [25] are not concave.
In maximizing the value of $p$ in Eq. (2) to obtain $p^*$, we have attempted to maximize the fraction of σ present in a state obtainable from ρ. With access to a single two-outcome measurement, σ can also be obtained from ρ with probability at least $p^*$. There are other measures that one could consider in attempting to obtain a state that behaves like σ. For example, one could consider the fidelity between σ and a state reachable from ρ: where $F(\rho, \sigma) = \mathrm{tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}$ is the fidelity between the two states. Investigating this problem is an open question, but note that for diagonal σ we have $F_{\mathrm{TO}}(\rho, \sigma) \ge F(\rho', \sigma) \ge \sqrt{p^*}$.
Lemma 7 limits the maximum probability of reaching a mixed state σ from which work $W_{\sigma\to\rho}$ may be extracted in the reverse transition. It would be of interest to find how this relates to the well-known fluctuation theorems [46,47] and their quantum versions [48] (see [49] for a review). These place limits on the probability of extracting work, $p[W, \lambda]$, in a given non-equilibrium process, λ, as compared to that in the time-reversed process, $\tilde{\lambda}$, by: Some potential links between these relations and single-shot quantities have been discussed in [50].
Another avenue of research is to generalize our result to the case where one is interested not only in maximizing the probability of obtaining a single state, but in finding the probability simplex of going to an ensemble of many states. Again, the fact that the monotones used in thermodynamics are not in general concave means that the techniques used in entanglement theory [51] cannot be applied directly.
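Returning to the fidelity bound $F(\rho', \sigma) \ge \sqrt{p^*}$ noted above, it can be seen from operator monotonicity of the square root. The following is a short sketch, assuming that the state reached from ρ has the form $\rho' = p^*\sigma + (1 - p^*)X$ with $X$ a valid state (our reading of Eq. (2)), and using the symmetry of the fidelity in its two arguments:

```latex
\begin{align}
\rho' = p^*\sigma + (1-p^*)X,\quad X \ge 0
  &\;\Longrightarrow\; \rho' \ge p^*\sigma \\
  &\;\Longrightarrow\; \sqrt{\sigma}\,\rho'\sqrt{\sigma} \ge p^*\sigma^{2} \\
  &\;\Longrightarrow\; \sqrt{\sqrt{\sigma}\,\rho'\sqrt{\sigma}} \ge \sqrt{p^*}\,\sigma \\
  &\;\Longrightarrow\; F(\rho',\sigma)
     = \mathrm{tr}\sqrt{\sqrt{\sigma}\,\rho'\sqrt{\sigma}}
     \ge \sqrt{p^*}\,\mathrm{tr}\,\sigma = \sqrt{p^*}.
\end{align}
```

In this argument the restriction to diagonal σ is needed only to guarantee that a state of this form, with $p = p^*$, is actually reachable from ρ.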
Finally, by supplying more work or demanding that extra work is extracted, the value of $p^*$ achieved can be raised or lowered. For $W \le 0$, one could calculate $p^*$ (as a function of $W$) for the states $\rho \otimes |W\rangle\langle W|$ and $\sigma \otimes |0\rangle\langle 0|$. For $W > 0$ the states to consider would be $\rho \otimes |0\rangle\langle 0|$ and $\sigma \otimes |W\rangle\langle W|$. What is the trade-off between $p^*$ and $W$? As an example, the solution for qubit systems in the Noisy Operations framework is given in Appendix B.
The monotones that we have used for studying Noisy Operations have been, or can be, defined solely in terms of Lorenz curves. They are also monotones in the resource theory of bipartite pure state entanglement manipulation under Local Operations and Classical Communication (LOCC) [52,53], where such curves can also be constructed. Using our monotones, and the behavior of Lorenz curves under tensor product with certain states, we give an expression for the single-shot entanglement of transition. This is the amount of entanglement that must be added (or can be extracted) in transforming $|\Psi\rangle_{AB}$ into $|\Phi\rangle_{AB}$ under LOCC.
Previous work has considered the distillable entanglement and entanglement cost, i.e. the entanglement of transition when one of $|\Phi\rangle_{AB}$ or $|\Psi\rangle_{AB}$, respectively, is taken to be a separable state. In [37], the amount of entanglement that can be distilled from a single copy of a bipartite mixed state, $\sigma_{AB}$, was bounded in terms of the coherent information. For a bipartite pure state, $|\Psi\rangle_{AB}$, it is given precisely by the min-entropy of the reduced state $\mathrm{tr}_B\,|\Psi\rangle\langle\Psi|_{AB}$ [38]. The amount of entanglement required to create a single copy of $\sigma_{AB}$ was calculated in [39] in terms of the conditional zero-Rényi entropy. In each paper, the analysis extends to accomplishing the task up to fixed error, ε. Here we go beyond the distillation and cost, showing that the more general entanglement of transition between two arbitrary pure bipartite states can be quantified in terms of the monotones $L_y$.
For a bipartite pure state, $|\Psi\rangle$, on a system AB, let: $\rho_{|\Psi\rangle} = \mathrm{tr}_B\,|\Psi\rangle\langle\Psi|_{AB}$. Without access to any additional resources, it is possible for two separated parties to transform $|\Psi\rangle$ into another bipartite state, $|\Phi\rangle$, under LOCC if and only if $\rho_{|\Phi\rangle}$ majorizes $\rho_{|\Psi\rangle}$ [52]. Hence if $|\Psi\rangle$ can be transformed into $|\Phi\rangle$: and: $L_y(\rho_{|\Phi\rangle}) \le L_y(\rho_{|\Psi\rangle})$, $\forall y \in D(\rho_{|\Psi\rangle})$, where the functions $V_l$, $L_y$ and the set $D$ are defined as per Section II. Note that for LOCC we consider the 'elbows' of the Lorenz curve associated with the initial state, whilst for NO we consider the 'elbows' of the final state's curve when determining if a transition is possible. This change occurs because, for a transition to take place in pure state entanglement theory, we require that the final state majorizes the initial state, whilst in the theory of NO we require that the initial state majorizes the final.
The unit for quantifying entanglement costs is the ebit, the maximally entangled state with local dimension 2. The maximally entangled state with local dimension $d$: requires the two parties to share $\log d$ ebits to prepare it and they can extract $\log d$ shared ebits if they share one. Separable states are free within this resource theory so if we define: as a separable pure state with local dimension $d$, $|\mathrm{sep}_d\rangle$ costs 0 ebits to prepare and no shared entanglement can be extracted from it. Note that:
The entanglement of transition, $E_{|\Psi\rangle\to|\Phi\rangle}$, is the optimal amount of shared, bipartite entanglement that the parties need to add, or can gain, to transform a copy of $|\Psi\rangle$ into $|\Phi\rangle$ under LOCC. If the quantity is negative, entanglement must be used up to make the transition possible, while if it is positive, entanglement can be extracted. $E_{|\Psi\rangle\to|\Phi\rangle}$ is the maximum value of $v \log d_2 - u \log d_1$ that can be achieved where $u, v, d_1, d_2 \in \mathbb{Z}$ are such that: In terms of Lorenz curves, the addition of entangled and separable states serves to rescale (with respect to the x-axis) the curves associated with $|\Psi\rangle$ and $|\Phi\rangle$ by $d_2^{-v}$ and $d_1^{-u}$ respectively.
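The LOCC convertibility criterion of [52] quoted above, that the reduced state of the target must majorize the reduced state of the initial state, is simple to test numerically once the spectra of the reduced states (the squared Schmidt coefficients) are known. The following is a minimal sketch; the function names `majorizes` and `locc_convertible` are hypothetical, and the inputs are assumed to be normalized probability vectors.

```python
import numpy as np

def majorizes(a, b, tol=1e-12):
    """Return True if the probability vector a majorizes b."""
    n = max(len(a), len(b))
    # Pad with zeros to a common length and sort in decreasing order.
    a = np.sort(np.pad(np.asarray(a, dtype=float), (0, n - len(a))))[::-1]
    b = np.sort(np.pad(np.asarray(b, dtype=float), (0, n - len(b))))[::-1]
    # Every partial sum of a must be at least the matching partial sum of b.
    return bool(np.all(np.cumsum(a) >= np.cumsum(b) - tol))

def locc_convertible(spec_psi, spec_phi):
    """|Psi> -> |Phi> is possible under LOCC iff rho_Phi majorizes rho_Psi."""
    return majorizes(spec_phi, spec_psi)

ebit = [0.5, 0.5]        # spectrum of the reduced state of a maximally entangled qubit pair
partial = [0.8, 0.2]     # a less entangled two-qubit state (illustrative)
separable = [1.0]        # product state

print(locc_convertible(ebit, partial))       # ebit -> less entangled state
print(locc_convertible(partial, ebit))       # reverse direction
print(locc_convertible(partial, separable))  # any state -> product state
```

The partial sums compared here are the heights of the two Lorenz curves at the points $l/d$, so the check is equivalent to asking whether the target's curve lies nowhere below the initial state's curve.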
To maximize $E_{|\Psi\rangle\to|\Phi\rangle}$, the Lorenz curve of the rescaled $|\Psi\rangle$ needs to lie just to the right of the Lorenz curve of the rescaled $|\Phi\rangle$. Hence: with equality for some $y$. This gives: in analogy with Lemma 2 for the work of transition in Noisy Operations. This can be generalized to consider situations where we require only that the final state is ε-close to the target state $|\Phi\rangle$ with respect to a measure such as the squared fidelity, $F^2(|\Phi'\rangle, |\Phi\rangle) = |\langle\Phi'|\Phi\rangle|^2$. Let: Then, defining $E^{\epsilon}_{|\Psi\rangle\to|\Phi\rangle}$ by: we can write:
As $W \ge 0$, the minimum ratio occurs at $l = j$. Hence: Combining these results, we have that for $\eta_1 < \zeta_1$: As an example, in Figure 8, we plot $p^*(W)$ against $W$ for η = {0.6, 0.4} and ζ = {0.85, 0.15}. For completeness, for $\eta_1 \ge \zeta_1$: