Relative Entropy and Catalytic Relative Majorization

Given two pairs of quantum states, a fundamental question in the resource theory of asymmetric distinguishability is to determine whether there exists a quantum channel converting one pair to the other. In this work, we reframe this question in such a way that a catalyst can be used to help perform the transformation, with the only constraint on the catalyst being that its reduced state is returned unchanged, so that it can be used again to assist a future transformation. What we find here, for the special case in which the states in a given pair are commuting, and thus quasi-classical, is that this catalytic transformation can be performed if and only if the relative entropy of one pair of states is larger than that of the other pair. This result endows the relative entropy with a fundamental operational meaning that goes beyond its traditional interpretation in the setting of independent and identical resources. Our finding thus has an immediate application and interpretation in the resource theory of asymmetric distinguishability, and we expect it to find application in other domains.


I. INTRODUCTION
The majorization partial ordering has been studied in the light of the following question [HLP52]: What does it mean for one probability distribution to be more disordered than another? While there exist several equivalent characterizations of majorization, the most fundamental is arguably the notion that a distribution p majorizes another distribution p′ when p′ can be obtained from p by the action of a doubly stochastic map. This captures the intuition that p′ is more disordered than p (see Subsection A 4 for technical details).
The theory of majorization has several widespread applications, some of which are in quantum resource theories [CG19]. For example, transformations of pure bipartite states by means of local operations and classical communication can be analyzed in terms of majorization of the Schmidt coefficients of the states [Nie99]. Majorization has been shown to determine whether state transformations are possible not just in the resource theory of entanglement, but also in those of coherence [ZMC+17, WY16] and purity [HHO03].
As it turns out, majorization is a special case of a more general concept introduced in [Vei71] and now called relative majorization [Ren16, BG17]. This more general notion also has a rich history in the context of statistics, where it goes by the name of "statistical comparison" or "comparison of experiments" [Bla53]. To recall the notion, a pair (p, q) of probability distributions p and q relatively majorizes another pair (p′, q′) of probability distributions p′ and q′ if there exists a classical channel (conditional probability distribution) that converts p to p′ and q to q′. That is, (p, q) relatively majorizes (p′, q′) if there exists a stochastic matrix N such that p′ = Np and q′ = Nq. If the second distribution in each pair is the uniform distribution, relative majorization collapses to majorization, demonstrating that relative majorization is indeed a generalization of majorization. Another generalization of majorization comes in the form of catalysis [JP99]. A catalyst is an ancillary distribution whose presence enables certain transformations that would otherwise be impossible, with the only constraint being that its reduced state is returned unchanged. Catalysis has found application in various resource theories, including entanglement [JP99, DK01], thermodynamics [BHN+15], and purity [BEG+19].
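As a minimal sketch of the definition above, a classical channel can be represented as a column-stochastic matrix applied to both members of a pair. The matrix N and the distributions below are hypothetical and chosen only for illustration; exact fractions avoid rounding issues.

```python
from fractions import Fraction as F

def apply_channel(N, p):
    """Apply a column-stochastic matrix N (given as a list of rows) to p."""
    return [sum(row[j] * p[j] for j in range(len(p))) for row in N]

def is_stochastic(N):
    """Check that every column of N is a probability vector."""
    cols = list(zip(*N))
    return all(x >= 0 for c in cols for x in c) and all(sum(c) == 1 for c in cols)

# Hypothetical channel: merge the last two outcomes of a three-letter alphabet.
N = [[F(1), F(0), F(0)],
     [F(0), F(1), F(1)]]
p = [F(1, 2), F(1, 4), F(1, 4)]
q = [F(1, 3), F(1, 3), F(1, 3)]

p_out = apply_channel(N, p)  # plays the role of p' = N p
q_out = apply_channel(N, q)  # plays the role of q' = N q
```

Because the same matrix N maps p to p_out and q to q_out, the pair (p, q) relatively majorizes (p_out, q_out) in the sense recalled above.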
In this paper, we reformulate the transformation task in relative majorization by allowing for a catalyst that can aid the transformation. In this scenario, a catalyst consists of an arbitrary pair (r, s) of probability distributions r and s that can be used in conjunction with the original input pair (p, q) to generate the output pair (p′, q′) by means of a classical channel Λ acting on p ⊗ r and q ⊗ s. To make the catalytic task non-trivial, we demand that the first and second marginals of Λ(p ⊗ r) are p′ and r, respectively. We also demand that Λ(q ⊗ s) be exactly equal to q′ ⊗ s. As a consequence of this demand, the catalyst is returned unchanged and can be used in future catalytic tasks for distributions that are independent of p and p′. We call this task relative majorization assisted by a catalyst with correlations, and the task is succinctly summarized in Figure 1.
Let q and q′ be distributions with rational entries and full support, and suppose that p ≠ p′. One main result of our paper is that it is possible to transform the pair (p, q) to the pair (p′, q′) in the above sense if and only if
  D(p‖q) ≥ D(p′‖q′) and rank(p) ≤ rank(p′), (1)
where the relative entropy D(p‖q) is defined as
  D(p‖q) := Σ_i p_i log(p_i / q_i).
Another contribution of our paper is that it is possible to perform an inexact, yet arbitrarily accurate transformation if the same conditions hold but q and q′ are not required to have rational entries. We establish these results by building on the prior work of [BHN+15, Mül18]. As done in the prior works, we employ an embedding map as a technical tool [BHN+15], and we also allow for the catalyst to become correlated with the target distribution [Mül18].
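The two quantities entering condition (1) are easy to evaluate numerically. The following is a sketch only: the helper names and the example distributions are hypothetical, and natural logarithms are assumed.

```python
import math

def relative_entropy(p, q):
    """D(p||q) = sum_i p_i log(p_i / q_i), in nats; infinite if supp(p) is not inside supp(q)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def rank(p):
    """Number of non-zero entries of a distribution."""
    return sum(1 for x in p if x > 0)

def condition_1(p, q, p_prime, q_prime):
    """Evaluate the two inequalities of Eq. (1)."""
    return (relative_entropy(p, q) >= relative_entropy(p_prime, q_prime)
            and rank(p) <= rank(p_prime))

# Hypothetical example pairs.
p, q = [0.9, 0.1, 0.0], [1 / 3, 1 / 3, 1 / 3]
p_prime, q_prime = [0.5, 0.3, 0.2], [0.5, 0.25, 0.25]
feasible = condition_1(p, q, p_prime, q_prime)
```

For these numbers the first pair is far more distinguishable than the second and rank(p) = 2 ≤ 3, so the check succeeds.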
We note here that our results apply to the more general "quasi-classical" case in which the first pair consists of commuting quantum states ρ and σ and the second pair consists of commuting quantum states ρ′ and σ′. This follows as a direct consequence of the fact that commuting quantum states are diagonal in the same basis and thus effectively classical, along with the fact that there is a unitary channel that takes the common basis of the first pair to the common basis of the second pair.

II. MAIN RESULT
We begin by stating the main result of our paper and the associated corollary. We present all proofs in Appendices C, D and E.
Theorem 1 (One Shot Characterization of Exact Pair Transformations). Given probability distributions p, q, p′, q′ on the same alphabet, where q and q′ have full support and only rational entries, and p ≠ p′, the following conditions are equivalent:
1. D(p‖q) > D(p′‖q′) and rank(p) ≤ rank(p′).
2. For any γ > 0, there exist probability distributions r and s, a joint distribution t′, and a classical channel Λ such that
(a) Λ(p ⊗ r) = t′ with marginals p′ and r,
(b) Λ(q ⊗ s) = q′ ⊗ s,
(c) D(t′‖p′ ⊗ r) ≤ γ.
Furthermore, we can take s = η, where η is the uniform distribution on the support of r.
The theorem states that the relative entropy is the only relevant information quantity that characterizes pair transformations when catalysts are available. Furthermore, the relative entropy D(t′‖p′ ⊗ r) can be made as small as desired, and thus the output of the classical channel Λ can be made arbitrarily close to the product of p′ and r. The condition on q and q′ being rational can be relaxed by allowing for errors in the formation of the target state, as follows:
Theorem 2 (One Shot Characterization of Approximate Pair Transformations). Given probability distributions p, q, p′, q′ on the same alphabet, where p ≠ p′, and q and q′ are full rank, the following conditions are equivalent:
1. D(p‖q) ≥ D(p′‖q′).
2. For any ε > 0 and γ > 0, there exist probability distributions r, s and p′_ε, a joint distribution t′_ε, and a classical channel Λ such that
(a) Λ(p ⊗ r) = t′_ε with marginals p′_ε and r,
(b) Λ(q ⊗ s) = q′ ⊗ s,
(c) ‖p′_ε − p′‖_1 ≤ ε,
(d) D(t′_ε‖p′_ε ⊗ r) ≤ γ.
Furthermore, we can take s = η, where η is the uniform distribution on the support of r.
The relative entropy D(t′_ε‖p′_ε ⊗ r) can be made as small as desired, and thus the output of the classical channel Λ can be made arbitrarily close to the product of p′_ε and r. Furthermore, the distribution p′_ε can be arbitrarily close to the target distribution p′ in trace distance. Note that allowing for a slight error in the transformation of p while demanding no error in the transformation of q is consistent with the frameworks of [Mat10, WW19, BST19], with the framework of [WW19] being known as the "resource theory of asymmetric distinguishability" due to this asymmetry in the transformation.
Corollary 1. Given probability distributions p and p′ on the same alphabet, where p ≠ p′, the following conditions are equivalent:
1. H(p) ≤ H(p′).
2. For any ε > 0 and γ > 0, there exist probability distributions r and p′_ε, a joint distribution t′_ε, and a unital classical channel Λ such that
(a) Λ(p ⊗ r) = t′_ε with marginals p′_ε and r,
(b) ‖p′_ε − p′‖_1 ≤ ε,
(c) D(t′_ε‖p′_ε ⊗ r) ≤ γ.
A unital classical channel is one that preserves the uniform distribution. The corollary states that, given a catalyst, the condition on the transformation is based on the Shannon entropy only. Furthermore, the relative entropy D(t′_ε‖p′_ε ⊗ r) can be made as small as desired, and thus the output of the classical channel Λ can be made arbitrarily close to the product of p′_ε and r. Lastly, the distribution p′_ε can be arbitrarily close to the target distribution p′ in trace distance.

III. DISCUSSIONS AND IMPLICATIONS
In this section we discuss the various implications of our results. A catalyst is a special ancillary probability distribution whose presence enables certain transformations that would otherwise be impossible. In other words, a catalyst weakens the condition required for a transformation [JP99]. Additionally, if the catalyst is allowed to become correlated with the target distribution, subject to the condition that its reduced state is returned unchanged, the required condition is weakened further and previously impossible transformations become possible. Since the catalyst is returned unchanged, the paradigm of catalytic majorization with correlations is physically relevant and justified.
In the resource theories of pure-state entanglement, purity, and pure-state coherence, state transformations in the absence of a catalyst are governed by the majorization partial order, whereas in the resource theory of quasi-classical thermodynamics, transformations are governed by thermomajorization [GJB+18]. The catalytic majorization condition is equivalent to an infinite set of conditions on the Rényi entropy [Kli07, Tur07]. The catalytic majorization condition is strictly weaker than the majorization condition: there exist distributions p and q such that p ⊀ q but p ≺_T q [JP99], which proves that catalysis provides an advantage.
Catalytic majorization can be further relaxed. In catalytic majorization, the target state, after transformation, is in a product state with the catalyst; i.e., the catalyst and the target state are independent after the transformation. Relaxing this additional assumption and allowing the final state to be a non-product state, with the condition that the local states remain unchanged, further weakens the required condition. This is a generalization of catalytic majorization, called catalytic majorization with correlations, as the product state is a special case of a general joint distribution.
Our result is a necessary and sufficient condition for the last case, where we allow catalysis and correlations. The result is a condition on relative entropy and neatly ties all the resource theories mentioned into a single paradigm. This also enhances the operational meaning of the relative entropy by establishing its relevance in the single-shot regime as well.

IV. RELATED WORK
Several generalizations and operational regimes are related to the ones considered in this work. This section provides an overview of the landscape.
Majorization and relative majorization are most prominent in the single-shot regime.This regime has to do with tasks involving a single system.
There exists another regime that deals with situations in which many independent and identically distributed (i.i.d.) systems are available, often referred to as the asymptotic regime. In the asymptotic regime, perhaps the more pertinent question is that of the rate of conversion from initial to target distributions and not necessarily the feasibility of the transformation. For the resource theories of pure-state entanglement, pure-state coherence, and quasi-classical thermodynamics, the rate of conversion is given by ratios of the entropy (pure-state entanglement and pure-state coherence) and of the relative entropy to the Gibbs distribution (quasi-classical thermodynamics). The asymptotic version of Blackwell's "comparison of experiments" paradigm was studied in [Mat10, WW19, BST19]. In [WW19], the problem was approached by introducing the resource theory of asymmetric distinguishability, which allowed for analyzing this problem in a resource-theoretic way. The result of [WW19, BST19] is that, in the asymptotic regime, the rate of conversion between two pairs of probability distributions (p, q) and (p′, q′) is given by the ratio of the relative entropies of the pairs, thus enhancing the fundamental operational meaning of the relative entropy. The results in the various regimes and resource theories are presented in Table I.
Another generalization of majorization, called quantum majorization, was proposed to generalize classical stochasticity to the quantum regime [GJB+18]. A necessary and sufficient condition for the transformation of a pair of qubits to another has been given in [AU80]. However, it was shown that the conditions are necessary but not sufficient for quantum systems with dimensions greater than two [Mat14], and thus a more general result is lacking.

V. CONCLUSION
The majorization partial order was developed to capture the notion of disorder in probability distributions and governs transformations in various resource theories. The addition of a catalyst weakens the required condition by allowing for transformations that are not possible in the framework of majorization without catalysis. Furthermore, allowing correlations between the target and the catalyst further weakens the required condition to one based on relative entropies. The main result of our paper is that a pair (p, q) of probability distributions can be converted to another pair (p′, q′) of probability distributions assisted by a catalyst with correlations if and only if D(p‖q) ≥ D(p′‖q′). This enhances the operational meaning of the relative entropy beyond the traditional independent and identical regime into the single-shot regime. This finding thus complements and enhances the previous finding from [BEG+19] in the context of entropy and the resource theory of purity.
D_∞(p‖q) can also be expressed as
  D_∞(p‖q) = log max_i (p_i / q_i).
An important property of the Rényi divergence is that it obeys the data processing property, i.e., for all stochastic maps Λ and ∀α ∈ [0, ∞],
  D_α(Λ(p)‖Λ(q)) ≤ D_α(p‖q). (A5)
Another important property of the Rényi divergence is that for α ∈ [0, ∞],
  D_α(p‖q) ≥ 0, with strict inequality whenever α > 0,
for all probability distributions p ≠ q. Additionally, the Rényi divergence is additive under products, i.e., for probability distributions p_1, p_2, q_1 and q_2 [vEH14],
  D_α(p_1 ⊗ p_2‖q_1 ⊗ q_2) = D_α(p_1‖q_1) + D_α(p_2‖q_2). (A8)
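The additivity and data-processing properties can be spot-checked numerically. The sketch below assumes the standard form D_α(p‖q) = (α − 1)^(−1) log Σ_i p_i^α q_i^(1−α) for α > 0, α ≠ 1, with q of full support; the distributions and the merging channel are hypothetical examples.

```python
import math

def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence in nats, for alpha > 0, alpha != 1, q full support."""
    s = sum((pi ** alpha) * (qi ** (1.0 - alpha)) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1.0)

def tensor(a, b):
    """Product distribution a ⊗ b as a flat list."""
    return [x * y for x in a for y in b]

def merge_last_two(p):
    """A simple stochastic map: lump the last two outcomes together."""
    return p[:-2] + [p[-2] + p[-1]]

p1, q1 = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
p2, q2 = [0.9, 0.1], [0.5, 0.5]

checks = []
for alpha in (0.5, 2.0):
    joint = renyi_divergence(tensor(p1, p2), tensor(q1, q2), alpha)
    parts = renyi_divergence(p1, q1, alpha) + renyi_divergence(p2, q2, alpha)
    before = renyi_divergence(p1, q1, alpha)
    after = renyi_divergence(merge_last_two(p1), merge_last_two(q1), alpha)
    # (additivity holds, data processing holds)
    checks.append((math.isclose(joint, parts, rel_tol=1e-9), after <= before))
```

A passing check on one example is of course not a proof; it only illustrates what the two properties assert.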

Quantum Rényi Divergence
The Rényi divergence from the classical regime can be extended to the quantum regime, but the extension is not unique. Classical probability distributions can be modeled as quantum density operators that are diagonal in an orthonormal basis. Thus, modelling two probability distributions is equivalent to working with two density operators that commute with each other, i.e., that are diagonalizable in the same basis. However, in general, this may not be possible, which leads to the non-uniqueness of the quantum extension. Note that any quantum extension should collapse to the classical case if the states commute.
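Here, we use the quantum generalization from [Pet86]:
  D_α(ρ‖σ) := (1/(α − 1)) log Tr[ρ^α σ^(1−α)].
The limits of α ∈ {0, 1} are
  D_0(ρ‖σ) = −log Tr[Π_ρ σ],
  D_1(ρ‖σ) = Tr[ρ (log ρ − log σ)],
where Π_ρ is the projector onto the support of ρ. The quantum Rényi divergence obeys the data processing inequality for all CPTP maps Λ and 0 ≤ α ≤ 2 [Pet86]:
  D_α(Λ(ρ)‖Λ(σ)) ≤ D_α(ρ‖σ).
Another property of the quantum Rényi divergence is that for ρ ≠ σ, [Pet86]
  D_α(ρ‖σ) ≥ 0, with strict inequality whenever α > 0.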

Majorization and its Extensions
The theory of majorization was developed to capture the notion of disorder. The crux of the theory is to answer the question: When is one probability distribution more disordered than another? The results can be succinctly put forth as follows [HLP52].
For two probability distributions p, q ∈ R^k, we say that "p majorizes q", or p ≻ q, if
  Σ_{i=1}^n p↓_i ≥ Σ_{i=1}^n q↓_i
for all n = 1, . . ., k − 1, where p↓ is the reordering of p in descending order. If p ≻ q, we say that q is more disordered than p.
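The partial-sum definition translates directly into code. This is a sketch with a hypothetical helper name; it assumes both inputs are normalized distributions of the same length and uses exact fractions.

```python
from fractions import Fraction as F
from itertools import accumulate

def majorizes(p, q):
    """Return True if p majorizes q, i.e. every partial sum of the
    descending reordering of p dominates the corresponding one of q."""
    ps = list(accumulate(sorted(p, reverse=True)))
    qs = list(accumulate(sorted(q, reverse=True)))
    # The last partial sums are both 1, so only n = 1, ..., k - 1 matter.
    return all(a >= b for a, b in zip(ps[:-1], qs[:-1]))

p = [F(1, 2), F(1, 4), F(1, 4)]
uniform = [F(1, 3)] * 3
point = [F(1), F(0), F(0)]
```

Here every distribution majorizes the uniform one and is majorized by a point mass, which matches the reading of the ordering as a measure of disorder.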
There are several formulations for majorization that can be proven to be equivalent (see Lemma 1). Majorization has several applications, especially in entanglement transformations [Nie99]. It has been shown that majorization defines a partial order for states that can be transformed into another.
An extension of the theory of majorization comes in the form of catalysis. Given two distributions p, q such that p ⊀ q, does there exist a distribution r such that p ⊗ r ≺ q ⊗ r? If such a distribution exists, it can be used to transform q into p and be left unchanged. Since r is unaffected by the process, we call it the catalyst of the transformation. We say that p is catalytically majorized by q, written p ≺_T q, if there exists a catalyst r such that p ⊗ r ≺ q ⊗ r. The theory of catalytic majorization is not as well understood as majorization. However, several important results have been shown [Kli07, Tur07].
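A small worked instance shows catalysis at work. The numbers below are the example commonly attributed to [JP99]; the helper names are hypothetical, and exact fractions are used so that ties between partial sums are decided correctly.

```python
from fractions import Fraction as F
from itertools import accumulate

def majorizes(a, b):
    """True if a majorizes b (same length, both normalized)."""
    sa = list(accumulate(sorted(a, reverse=True)))
    sb = list(accumulate(sorted(b, reverse=True)))
    return all(x >= y for x, y in zip(sa[:-1], sb[:-1]))

def tensor(a, b):
    """Product distribution a ⊗ b as a flat list."""
    return [x * y for x in a for y in b]

p = [F(4, 10), F(4, 10), F(1, 10), F(1, 10)]
q = [F(1, 2), F(1, 4), F(1, 4), F(0)]
r = [F(6, 10), F(4, 10)]  # candidate catalyst

direct = majorizes(q, p)                            # is p ≺ q ?
catalytic = majorizes(tensor(q, r), tensor(p, r))   # is p ⊗ r ≺ q ⊗ r ?
```

Without the catalyst the second partial sums fail (3/4 < 4/5), so p ⊀ q; with r attached, all partial sums of q ⊗ r dominate those of p ⊗ r, so p ≺_T q.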

Embedding Map
A mathematical tool frequently used in the resource theory of thermodynamics is an embedding map, which is a classical channel that maps a thermal distribution to a uniform distribution [BHN+15]. We use this tool in a more general sense needed in our context here, in order to map a probability distribution described by a set of rational numbers to a uniform distribution.
Consider the simplex of probability distributions {p_i}_{i=1}^k and a set of natural numbers {d_i}_{i=1}^k with N := Σ_{i=1}^k d_i. The embedding map Γ_d : R^k → R^N is defined as
  Γ_d(p) := ⊕_{i=1}^k p_i η_{d_i},
where η_{d_i} is the uniform distribution of size d_i. Since Γ_d is an injective map, there exists a left-inverse for Γ_d, denoted by Γ*_d : R^N → R^k and defined as follows:
  [Γ*_d(x)]_i := Σ_{j=N(i−1)+1}^{N(i)} x_j, where N(l) := Σ_{i=1}^l d_i and N(0) := 0.
Note that Γ*_d is a classical channel, and furthermore, Γ*_d ∘ Γ_d = I, where I is the identity channel.
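The embedding map and its left-inverse can be sketched in a few lines. The sketch assumes the block form used in [BHN+15], in which Γ_d spreads each p_i uniformly over d_i consecutive entries and Γ*_d sums each block; the function names and the example values are hypothetical.

```python
from fractions import Fraction as F

def embed(p, d):
    """Gamma_d: spread each p_i uniformly over a block of d_i entries."""
    return [pi / di for pi, di in zip(p, d) for _ in range(di)]

def unembed(x, d):
    """Gamma_d^*: sum each block of d_i consecutive entries."""
    out, start = [], 0
    for di in d:
        out.append(sum(x[start:start + di]))
        start += di
    return out

d = [1, 2, 3]                      # block sizes, N = 6
N = sum(d)
p = [F(1, 2), F(1, 3), F(1, 6)]
gamma_d = [F(di, N) for di in d]   # distribution with rational entries d_i / N

embedded = embed(p, d)             # a distribution on N = 6 letters
recovered = unembed(embedded, d)   # Gamma_d^* ∘ Gamma_d acts as the identity
flat = embed(gamma_d, d)           # every entry equals 1/N
```

The last line illustrates why the map is useful here: a distribution with rational entries d_i / N is sent to the uniform distribution of size N.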

Appendix B: Technical Lemmas
The technical lemmas presented in this appendix have been established in prior work. Here we list them for convenience and completeness.

For
3. x = Dy for some d × d doubly stochastic matrix D.
Lemma 2 (Embedding of distribution with rational entries [BHN+15]). For a distribution γ_d defined using a set of natural numbers {d_i}_{i=1}^k as
  γ_d := (d_1/N, . . ., d_k/N), where N := Σ_{i=1}^k d_i,
we have
  Γ_d(γ_d) = η_N,
where η_N is the uniform distribution of size N.
Proof. From the definition of an embedding map Γ_d (see Subsection A 5), we see that
  Γ_d(γ_d) = ⊕_{i=1}^k (d_i/N) η_{d_i} = ⊕_{i=1}^k (1/N, . . ., 1/N) = η_N,
where the i-th block has d_i entries. This concludes the proof.
Lemma 3 (Preservation of Rényi's Divergence [BHN+15]). Let p = {p_i}_{i=1}^k be a probability distribution and let {d_i}_{i=1}^k be natural numbers with N := Σ_{i=1}^k d_i. Then
  D_α(Γ_d(p)‖η_N) = D_α(p‖γ_d),
where γ_d is as defined in Lemma 2 and Γ_d is as defined in Subsection A 5.
Proof.We split the proof into separate parts for different α values (see Subsection A 1).
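The identity of Lemma 3 can be checked numerically for a few values of α. This sketch assumes the block form of the embedding map and the standard Rényi divergence for α > 0 (with α = 1 treated as the relative entropy); all names and numbers are hypothetical.

```python
import math

def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence in nats for alpha > 0; q must have full support."""
    if alpha == 1:
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    s = sum((pi ** alpha) * (qi ** (1.0 - alpha)) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1.0)

def embed(p, d):
    """Gamma_d: spread each p_i uniformly over a block of d_i entries."""
    return [pi / di for pi, di in zip(p, d) for _ in range(di)]

d = [1, 2, 3]
N = sum(d)
gamma_d = [di / N for di in d]
eta_N = [1.0 / N] * N
p = [0.5, 0.3, 0.2]

preserved = [
    math.isclose(renyi_divergence(embed(p, d), eta_N, a),
                 renyi_divergence(p, gamma_d, a), rel_tol=1e-9)
    for a in (0.5, 1, 2)
]
```

Each block of d_i equal entries contributes d_i (p_i/d_i)^α N^(α−1) = p_i^α (d_i/N)^(1−α), which is why the two sides agree.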
For α > 1, joint convexity does not hold.Denote r = (1 − δ)p + δq.Then, the statement of the lemma is equivalent to x α is a convex function over all x > 0. Since q 1−α i > 0, the linear combination is convex with respect to p.For p = q, F (p) > 1 which comes from positivity of Rényi's divergence D α (p q).
Note that F (q) = k i=1 q i = 1.Thus, where the first inequality is by convexity and the second inequality is because F (p) > F (q). Thus, we show the validity of (B13).
For α < 0, if p is not full rank, D_α(p‖q) = ∞. But q is full rank, implying that D_α((1 − δ)p + δq‖q) is finite, and hence the inequality holds. If p is full rank, we use a similar approach as in the case α > 1 by defining α′ = 1 − α, so that α′ > 1, and then apply the previous case. This concludes the proof.

There exists a set of natural numbers {d
3. There exists a classical channel E such that E(q) = q and for any other distribution p, Proof.We construct a q that satisfies statements 1 and 2.Then, we create channel E to satisfy statement 3. Since q is decreasing, min i q i = q k .Firstly, we choose an N such that, We now define q.For i ∈ {1, . . ., k − 1}, qi = Note that for i ∈ {1, . . ., k − 1}, qi ≥ q i and qk ≤ q k since both are normalized to 1. Also note that, where the first equality follows from (B18) and the second equality follows from (B17).Thus, which follows from (B16) by the choice of N. Thus q is a valid probability distribution.Furthermore, we see that Using this result, we show that where the third equality follows from (B22) and the last inequality is due to (B20).Thus, if we pick ε = k N , we can satisfy statements 1 and 2.
Lastly, we need to construct a channel that takes q to q.Such a channel must slightly increase the probabilities q i , ∀i ∈ {1, . . ., k − 1} and reduce q k .Channels are characterised by conditional probabilities P (j|i) such that which is just the normalization preservation.
Consider a channel E defined by We now prove that the normalization condition is met and E(q) = q.We begin with the normalization condition.
Next, we show that E(q) = q.Let q = E(q).We need to show that q = q .
• For j ∈ I,
  q′_j = Σ_{i∈I, i≠j} P(j|i) q_i + P(j|j) q_j + P(j|k) q_k
       = 0 + q_j + (∆_j / q_k) q_k
       = q_j + ∆_j
       = q̃_j. (B29)
• For j = k,
  q′_k = P(k|k) q_k = (1 − ∆/q_k) q_k = q_k − ∆ = q̃_k.
Thus, we have proven that q̃ = q′.
Finally, we need to show that, for any other distribu- where the first equality is from the last equality of (B23), the first inequality is because . Furthermore, we can show that E maps the basis vectors to probability distributions with non-zero entries, proving that is indeed stochastic.This completes the proof.
Remark: For a full rank distribution p, the distribution E(p) (as defined above) is full rank. The reasoning is as follows:
• The channel does not decrease any entry except the last one. Thus, all entries except the last entry in E(p) are strictly greater than 0 since p is full rank.
• The last entry in E(p) is (1 − ∆/q_k) p_k. Since ∆ = q_k − q̃_k, by the choice of N, we see that this entry is strictly greater than 0 as well (see (B20)).
Thus, the distribution E(p) is full rank for a full rank p.
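The construction of the channel E can be sketched as follows. The transition probabilities are read off from the verification steps above: P(j|j) = 1 and P(j|k) = ∆_j / q_k for j < k, and P(k|k) = 1 − ∆/q_k. The rounding rule used to produce q̃ (rounding the first k − 1 entries up to multiples of 1/N) is an assumption made here for illustration, as are the function names and numbers.

```python
import math
from fractions import Fraction as F

def round_up(q, N):
    """Assumed rounding: first k-1 entries rounded up to multiples of 1/N,
    last entry fixed by normalization. q is taken in decreasing order."""
    head = [F(math.ceil(N * qi), N) for qi in q[:-1]]
    return head + [1 - sum(head)]

def build_channel(q, q_tilde):
    """Matrix P with P[j][i] = P(j|i) that moves mass from the last letter to the others."""
    k = len(q)
    deltas = [q_tilde[j] - q[j] for j in range(k - 1)]
    P = [[F(0)] * k for _ in range(k)]
    for j in range(k - 1):
        P[j][j] = F(1)
        P[j][k - 1] = deltas[j] / q[k - 1]
    P[k - 1][k - 1] = 1 - sum(deltas) / q[k - 1]
    return P

def apply_channel(P, p):
    return [sum(row[i] * p[i] for i in range(len(p))) for row in P]

q = [F(1, 2), F(3, 10), F(1, 5)]
q_tilde = round_up(q, 16)
E = build_channel(q, q_tilde)
image = apply_channel(E, q)
```

For this example q̃ = (1/2, 5/16, 3/16), and the image of q under E equals q̃ exactly; N must be large enough that the last entry of q̃ stays positive.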
Remark: A classical channel E is defined using a conditional probability distribution.In other words, given an input distribution p(x) and a channel E(y|x), the output distribution p (y) is given by A reversal channel of a classical channel can be defined as follow.From Bayes' Theorem, we know that Summing both sides of (B33) over x and noticing that x E (x|y) = 1 is the normalization condition, we recover (B32).
Similarly, summing both sides of the Equation (B33) over y, we see that and thus, a reversal channel E can be defined.
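The Bayes construction of a reversal channel can be illustrated on a small example. The sketch assumes the usual Bayes rule, in which the reversed conditional probability of x given y is E(y|x) p(x) / p′(y); the function names, the channel, and the input distribution are hypothetical, and every output probability p′(y) is assumed to be non-zero.

```python
from fractions import Fraction as F

def apply_channel(M, v):
    """Apply a column-stochastic matrix M (list of rows) to a distribution v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def reversal_channel(E, p):
    """Bayes reversal of E with respect to the input p.
    E[y][x] = E(y|x); the result R satisfies R[x][y] = E(y|x) p(x) / p'(y)."""
    p_out = apply_channel(E, p)
    return [[E[y][x] * p[x] / p_out[y] for y in range(len(p_out))]
            for x in range(len(p))]

E = [[F(1), F(1, 2), F(0)],
     [F(0), F(1, 2), F(1)]]
p = [F(1, 2), F(1, 4), F(1, 4)]

p_out = apply_channel(E, p)      # output distribution p'
R = reversal_channel(E, p)
p_back = apply_channel(R, p_out) # the reversal maps p' back to p
```

The reversal depends on the reference input p: it undoes E on that particular input but not, in general, on other inputs.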
Proof.Consider the joint probability of two random variables X and Y given by the distribution w and the channel.X = 0 denotes inputs are from the first group (items from 1 to ), and X = 1 means inputs are from the second group (items from + 1 to n).Similarly, Y denotes similar events for the outputs.
Lemma 7 (Continuity [vEH14]). The relative entropy D(p‖q) is continuous in both arguments p and q, when q has full rank.
Lemma 8 (Inclusion of support [Ren05]).Let ρ AB be a density operator in This concludes the proof.
Lemma 9 (Superadditivity of D 1 [CLPG18]).Let H AB = H A ⊗ H B be a bipartite Hilbert space, and ρ AB , σ AB be two density operators.If σ AB = σ A ⊗ σ B , then Thus, Proof.Using the definition of Quantum Rényi's Divergence (from Subsection A 3), we see that , thus concluding the proof.
Lemma 10 (Superadditivity of D 0 ).Let H AB = H A ⊗ H B be a bipartite Hilbert space and ρ AB , σ AB be two density operators.If σ AB = σ A ⊗ σ B , then Proof.From Lemma 8, we see that supp (ρ AB ) ⊆ supp (ρ A ) ⊗ supp (ρ B ) and thus, Then, This concludes the proof.
Lemmas 9 and 10 are results based on the quantum Rényi divergence. However, any result that holds for the quantum case must also hold for the classical case. This is because commuting quantum states can be diagonalized in the same basis and are thus effectively quasi-classical.
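In the classical case, the superadditivity of the relative entropy for a product second argument can be checked directly, and the gap is exactly the mutual information of the joint distribution. The sketch below uses hypothetical names and numbers and natural logarithms.

```python
import math

def relative_entropy(p, q):
    """D(p||q) in nats; q is assumed to have full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def marginals(t):
    """Row and column marginals of a joint distribution given as a matrix."""
    return [sum(row) for row in t], [sum(col) for col in zip(*t)]

def flatten(t):
    return [x for row in t for x in row]

def tensor(a, b):
    return [x * y for x in a for y in b]

t = [[0.4, 0.1],
     [0.1, 0.4]]                 # correlated joint distribution
s_A, s_B = [0.5, 0.5], [0.3, 0.7]
t_A, t_B = marginals(t)

lhs = relative_entropy(flatten(t), tensor(s_A, s_B))
rhs = relative_entropy(t_A, s_A) + relative_entropy(t_B, s_B)
mutual_information = relative_entropy(flatten(t), tensor(t_A, t_B))
```

Since the mutual information is non-negative, lhs ≥ rhs always holds, with equality exactly when the joint distribution is a product.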
Lemma 11 (Müller [M 18]).Let p A , p A ∈ R m be probability distributions with p ↓ A = p ↓ A .Then, there exists a probability distribution q B and an extension p AB with marginals p A and q B such that Proof.We begin with statement 1 implies statement 2; i.e., we suppose D(p q) > D(p q ) and rank (p) ≤ rank (p ), and prove the existence of a classical channel Λ, probability distributions r and s, and a joint distribution t that satisfy the conditions of statement 2. Since q and q have rational entries, without loss of generality, we pick q = d1 N , . . ., From the definition of Rényi entropy H 0 (see Subsection A 2) and (C5), we see that Using (C4) and (C6), we see that, from Lemma 11, there exists a probability distribution r and an extension v with marginals p and r such that for any choice of γ > 0. In other words, there exists a bistochastic map Φ (see Lemma 1) such that Using Lemma 6, we see that r can be considered, without loss of generality, to be of full rank.Concretely, if r is of rank d (less than full rank), then define u = p ⊗ r, u = v and w = η N ⊗ η.Using Lemma 6, Φ can be split into Φ Since this is a composition of stochastic maps, the overall map is stochastic.
Proving 2a) We now define t = [(Γ * d ⊗ I)]v .We notice that the marginals of t are Γ * d (p ) = p and r.

Proving 2b)
We need to show that Λ(q ⊗ s) = q ⊗ s for some s.Pick s = η.Then, where the second equality is from Lemma 2. Thus, Λ(q ⊗ s) = q ⊗ s.
where the first inequality is from (C7) and the second inequality is from the data-processing inequality (A5) using channel [(Γ * d ⊗ I)].Thus, D(t p ⊗ r) ≤ γ.This completes the proof that statement 1 implies statement 2. We now look at the reverse direction; i.e., we assume a classical channel Λ that satisfies the conditions of statement 2. Let α ∈ {0, 1}.Then, where the first equality follows from additivity of Rényi divergence (A8), the first inequality follows from the data-processing property (A5) and the last inequality follows from Lemmas 9 and 10 for α = 1 and α = 0 respectively.Thus, since we showed that r, s can be taken to be full rank, D α (r s) is finite and can be subtracted, and thus we can conclude that D α (p q) ≥ D α (p q ).(C14) We now split into two cases.α = 0. Thus, D 0 (p q) ≥ D 0 (p q ).Since this is true for all q, q , it must be true for q = q = η.From the for any choice of γ > 0. In other words, there exists a bistochastic map Φ (see Lemma 1) such that Φ( Ẽ(p) ⊗ r) = v (D15) Using Lemma 6, we see that r can be considered, without loss of generality, to be of full rank.Concretely, if r is of rank d (less than full rank), then define u = Ẽ(p) ⊗ r, u = v and w = η N ⊗ η.Using Lemma 6, Φ can be split into Φ 1 ⊕ Φ 2 .If r is of full rank, Φ = Φ 1 .where the first inequality is because of the triangle inequality of the trace distance, the second inequality is from (D2), the second equality is from (A17), the third inequality is because of the triangle inequality of the trace distance and the last inequality is because of (D2) and (D4).Since 1 , 2 and δ are arbitrarily chosen we can set them to .Thus Λ(p ⊗ r) = t ε with marginals p ε and r.Furthermore, p ε − p 1 ≤ ε.

Proving 2b)
We need to show that Λ(q ⊗ s) = q ⊗ s for some s.where the third equality follows because Φ 1 is bistochastic.Thus Λ(q ⊗ s) = q ⊗ s.Proving 2d) where the first inequality is from (D14) and the second inequality is from the data-processing inequality (A5) using channel [(E * ⊗ I) • (Γ * d ⊗ I)].Thus, D(t ε p ε ⊗ r) ≤ γ.This completes the proof that statement 1 implies statement 2. We now look at the reverse direction; i.e., we assume a stochastic map Λ. D(p q) + D(r s) = D(p ⊗ r q ⊗ s), ≥ D(Λ(p ⊗ r) Λ(q ⊗ s)), = D(t ε q ⊗ s), ≥ D(p ε q ) + D(r s), where the first equality follows from additivity of relative entropy (A8), the first inequality follows from the dataprocessing property (A5) and the last inequality follows from Lemma 9. Thus, since we showed that r, s can be taken to be full rank, D(r s) is finite and can be subtracted.
Appendix E: Proof of Corollary 1
Proof. Simply pick q = q′ = η and apply Theorem 2. Since Λ preserves the uniform distribution, it is a unital classical channel.
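The reduction from relative entropy to Shannon entropy rests on the identity D(p‖η) = log d − H(p) for the uniform distribution η on d letters, so the ordering of relative entropies to η is the reverse of the ordering of entropies. A quick numerical sketch, with hypothetical names and example distributions:

```python
import math

def shannon_entropy(p):
    """H(p) in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def relative_entropy(p, q):
    """D(p||q) in nats; q is assumed to have full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

d = 3
eta = [1.0 / d] * d
p = [0.7, 0.2, 0.1]
p_prime = [0.4, 0.3, 0.3]

identity_holds = math.isclose(relative_entropy(p, eta),
                              math.log(d) - shannon_entropy(p), rel_tol=1e-9)
same_ordering = ((relative_entropy(p, eta) >= relative_entropy(p_prime, eta))
                 == (shannon_entropy(p) <= shannon_entropy(p_prime)))
```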

FIG. 1. (p, q) is a pair of distributions to be converted to another pair (p′, q′) by means of a classical channel and with the use of the catalyst pair (r, s). The state t′ after transformation has first marginal p′ and second marginal r. Dotted arrows indicate marginal distributions.

TABLE I. Results in various regimes and resource theories. (The cell in bold indicates our result.) The last row is a general case of the rows above. The one-shot regime with catalysis depends on H_α and D_α, ∀α ∈ [−∞, ∞] (see Subsections A 1, A 2). Furthermore, in the one-shot cases, moving further right weakens the required condition, by allowing for catalysis.