Uniqueness and Optimality of Dynamical Extensions of Divergences

We introduce an axiomatic approach for channel divergences and channel relative entropies that is based on three information-theoretic axioms of monotonicity under superchannels (i.e. generalized data processing inequality), additivity under tensor products, and normalization, similar to the approach given recently for the state domain. We show that these axioms are sufficient to give enough structure also in the channel domain, leading to numerous properties that are applicable to all channel divergences. These include faithfulness, continuity, a type of triangle inequality, and boundedness between the min and max channel relative entropies. In addition, we prove a uniqueness theorem showing that the Kullback-Leibler divergence has only one extension to classical channels. For quantum channels, with the exception of the max relative entropy, this uniqueness does not hold. Instead we prove the optimality of the amortized channel extension of the Umegaki relative entropy, by showing that it provides a lower bound on all channel relative entropies that reduce to the Kullback-Leibler divergence on classical states. We also introduce the maximal channel extension of a given classical state divergence and study its properties.

A summary of all optimal extensions of the Shannon entropy. Red arrows represent maximal extensions and blue arrows represent minimal extensions. The Kullback-Leibler divergence is the only classical relative entropy that reduces to the log of the dimension minus the Shannon entropy when the second argument is a uniform distribution (in [1] a one-to-one correspondence between classical entropies and classical relative entropies was proven). Regularization is assumed in all extensions. The channel relative entropy $D^{\rm reg}(\mathcal{N}\|\mathcal{M})$ equals both the amortized divergence and the regularized minimal channel extension $\underline{D}^{\rm reg}(\mathcal{N}\|\mathcal{M})$. It is the smallest of all channel relative entropies that reduce to the Kullback-Leibler divergence on classical states, while the maximal divergence $\overline{D}^{\rm reg}(\mathcal{N}\|\mathcal{M})$ is the largest one. The blue and red dashed arrows indicate that the minimal and maximal channel-extensions can be obtained directly from the Kullback-Leibler divergence using the extension techniques introduced in this paper.

A. Notations
We denote both physical systems and their corresponding Hilbert spaces by A, B, C, etc. The letters X, Y, Z are reserved for classical systems. We will only consider finite dimensional systems and denote their dimensions by |A|, |B|, etc. The algebra of all |A| × |A| complex matrices is denoted by L(A). Similarly, L(X) denotes the algebra of all |X| × |X| diagonal matrices. The set of density matrices (i.e. quantum states) in L(A) is denoted by D(A). The elements of D(A) will be denoted by lower case Greek letters such as ρ, σ, ω, etc., whereas for a classical system X, the elements of D(X) are denoted by p, q, r, etc., where depending on the context, p, q, r will be viewed either as diagonal density matrices in a fixed (classical) basis or as probability vectors. Moreover, for classical systems we will sometimes use the notation D(n) ≡ D(X) if n = |X|. The support of a density matrix ρ will be denoted by supp(ρ).
The set of all linear transformations from L(A) to L(B) is denoted by L(A → B). The elements of L(A → B) will be denoted by calligraphic letters such as N, M, E, F, etc. The set of completely positive maps in L(A → B) is denoted by CP(A → B), and the set of all completely positive trace-preserving maps (i.e. quantum channels) by CPTP(A → B). We denote by 1 the trivial physical system and identify D(A) = CPTP(1 → A). In particular, the set containing only the number one is identified with {1} = D(1) = CPTP(1 → 1). Note also that with these identifications, the trace is the only element of CPTP(A → 1) = {Tr_A}. The Choi matrix of a linear map N ∈ L(A → B) will be denoted by
$J^{AB}_{\mathcal{N}} := \sum_{j,k} |j\rangle\langle k| \otimes \mathcal{N}(|j\rangle\langle k|)$.
For two linear maps N, M ∈ L(A → B) we write N ≥ M if N − M ∈ CP(A → B). We also consider linear maps from L(A → B) to L(A′ → B′). We call such linear maps supermaps, and denote by L(AB → A′B′) the set of all supermaps from L(A → B) to L(A′ → B′). We will also make the identification L(1A → 1B) = L(A → B), where again, 1 denotes the trivial system. The elements of L(AB → A′B′) will be denoted by capital Greek letters such as Θ and Υ. A supermap Θ ∈ L(AB → A′B′) is called a superchannel if there exist a reference system R, a pre-processing map E ∈ CPTP(A′ → AR), and a post-processing map F ∈ CPTP(BR → B′) such that for any N ∈ L(A → B)
$\Theta[\mathcal{N}^{A\to B}] = \mathcal{F}^{BR\to B'} \circ \mathcal{N}^{A\to B} \circ \mathcal{E}^{A'\to AR}$.
In [29] it has been shown that a supermap is a superchannel if and only if it maps quantum channels to quantum channels in a complete sense (i.e. even when tensored with the identity supermap; see [16,26,30] for more details).
The set of all superchannels in L(AB → A′B′) will be denoted by SC(AB → A′B′). We will also use the notation SC(A → A′B′) ≡ SC(1A → A′B′) to denote superchannels that map quantum states in D(A) = CPTP(1 → A) to quantum channels in CPTP(A′ → B′). Similarly, SC(AB → A′) denotes the set of all superchannels that map channels in CPTP(A → B) to states in D(A′).
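As a numerical aside, the Choi matrix and the conditions defining CP(A → B) and CPTP(A → B) can be checked directly for small systems. The following is a minimal sketch, assuming NumPy and a linear map supplied as a Python function on matrices; the helper names (`choi_matrix`, `is_completely_positive`, `is_trace_preserving`) and the tolerance are illustrative choices, not notation from the paper.

```python
import numpy as np

def choi_matrix(channel, dim_in):
    """Return J = sum_{j,k} |j><k| (x) channel(|j><k|) as a dense array.

    `channel` is any function mapping a dim_in x dim_in matrix to a matrix.
    """
    blocks = []
    for j in range(dim_in):
        row = []
        for k in range(dim_in):
            unit = np.zeros((dim_in, dim_in), dtype=complex)
            unit[j, k] = 1.0          # matrix unit |j><k|
            row.append(channel(unit)) # block (j, k) of the Choi matrix
        blocks.append(row)
    return np.block(blocks)

def is_completely_positive(choi, tol=1e-10):
    """A map is CP iff its Choi matrix is positive semidefinite."""
    herm = (choi + choi.conj().T) / 2
    return bool(np.linalg.eigvalsh(herm).min() >= -tol)

def is_trace_preserving(choi, dim_in, dim_out, tol=1e-10):
    """A map is TP iff tracing out the output system of J gives the identity."""
    j4 = choi.reshape(dim_in, dim_out, dim_in, dim_out)
    marginal = np.trace(j4, axis1=1, axis2=3)
    return bool(np.allclose(marginal, np.eye(dim_in), atol=tol))

# Example: the replacement channel rho -> Tr[rho] * sigma has Choi matrix I (x) sigma.
sigma = np.diag([0.25, 0.75])
J_replace = choi_matrix(lambda rho: np.trace(rho) * sigma, 2)
```

With these helpers, the order relation N ≥ M used above amounts to `is_completely_positive(J_N - J_M)`.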

B. Divergences
We follow the definition given in [1,2] of classical and quantum divergences, and relative entropies.
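Definition. [1,2] Let $\mathbf{D} : \bigcup_A \{D(A) \times D(A)\} \to \mathbb{R} \cup \{\infty\}$ be a function acting on pairs of quantum states in all finite dimensions.
1. The function $\mathbf{D}$ is called a divergence if it satisfies the data processing inequality (DPI): for all ρ, σ ∈ D(A) and all E ∈ CPTP(A → B),
$\mathbf{D}(\mathcal{E}(\rho)\|\mathcal{E}(\sigma)) \le \mathbf{D}(\rho\|\sigma)$.
Furthermore, we say that $\mathbf{D}$ is normalized if $\mathbf{D}(1\|1) = 0$.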
2. A divergence D is called a relative entropy if in addition it satisfies: (a) Normalization.
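(b) Additivity. For any ρ_1, ρ_2 ∈ D(A) and any σ_1, σ_2 ∈ D(B)
$\mathbf{D}(\rho_1 \otimes \sigma_1 \| \rho_2 \otimes \sigma_2) = \mathbf{D}(\rho_1\|\rho_2) + \mathbf{D}(\sigma_1\|\sigma_2)$.
We use the bold notation $\mathbf{D}$ to denote a general quantum divergence, and the notation $D$ to denote a general classical divergence. For specific divergences, we use the standard notations. For example, the classical Rényi divergence of order α ∈ [0, ∞] is denoted by
$D_\alpha(p\|q) := \frac{1}{\alpha-1} \log \sum_x p_x^{\alpha} q_x^{1-\alpha}$,
with the cases α ∈ {0, 1, ∞} understood as limits. The min and max quantum relative entropies [31] are denoted for any ρ, σ ∈ D(A) by
$D_{\min}(\rho\|\sigma) := -\log {\rm Tr}[\sigma \Pi_\rho]$ and $D_{\max}(\rho\|\sigma) := \log \min\{t \in \mathbb{R} : t\sigma \ge \rho\}$,
where Π_ρ denotes the projection to the support of ρ. In [2] it has been shown that any quantum relative entropy $\mathbf{D}$ satisfies
$D_{\min}(\rho\|\sigma) \le \mathbf{D}(\rho\|\sigma) \le D_{\max}(\rho\|\sigma)$.
Another important example of a quantum relative entropy with several operational interpretations is the Umegaki relative entropy, given by
$D(\rho\|\sigma) := {\rm Tr}[\rho \log \rho] - {\rm Tr}[\rho \log \sigma]$.
This divergence plays a key role in quantum statistics [32], quantum Shannon theory [4], and quantum resource theories [33]. In [34] (see also [2,35]) it has been shown to be the only relative entropy that is asymptotically continuous. The Umegaki relative entropy is also known to be the regularized minimal quantum extension of the classical KL divergence; that is, for all ρ, σ ∈ D(A)
$D(\rho\|\sigma) = \lim_{n\to\infty} \frac{1}{n} \underline{D}(\rho^{\otimes n}\|\sigma^{\otimes n})$ (11)
with
$\underline{D}(\rho\|\sigma) := \sup_{\mathcal{E}} D(\mathcal{E}(\rho)\|\mathcal{E}(\sigma))$,
where the supremum is over all classical systems X and all POVMs E ∈ CPTP(A → X) [9]. The equality in (11) means that any quantum divergence that reduces to the classical KL relative entropy must be no smaller than the Umegaki relative entropy (see [2] for more details on optimal extensions). In [9] it was shown that (11) also holds if D is replaced with the sandwiched or minimal quantum Rényi divergence of order α.
Recently, the extensions of quantum divergences to channel divergences have been studied intensively [13-18]. Examples include the channel extension of the Umegaki relative entropy given by
$D(\mathcal{N}\|\mathcal{M}) := \max_{\psi} D(\mathcal{N}^{A\to B}(\psi^{RA}) \| \mathcal{M}^{A\to B}(\psi^{RA}))$,
where the maximum is over all pure states in D(RA) with |R| = |A|. The min and max relative entropies have also been extended to quantum channels. They are given for all N, M ∈ CPTP(A → B) by
$D_{\max}(\mathcal{N}\|\mathcal{M}) := \max_{\psi} D_{\max}(\mathcal{N}(\psi^{RA}) \| \mathcal{M}(\psi^{RA}))$ (14)
$D_{\min}(\mathcal{N}\|\mathcal{M}) := \max_{\psi} D_{\min}(\mathcal{N}(\psi^{RA}) \| \mathcal{M}(\psi^{RA}))$ (15)
where |R| = |A|. The divergence $D_{\min}(\mathcal{M}\|\mathcal{N})$ can be viewed as the ε = 0 case of the hypothesis testing channel divergence given by
$D^{\epsilon}_{H}(\mathcal{M}\|\mathcal{N}) := \max_{\psi} D^{\epsilon}_{H}(\mathcal{M}(\psi^{RA}) \| \mathcal{N}(\psi^{RA}))$,
where |R| = |A|, the maximum is over all pure states in D(RA), and for any ρ, σ ∈ D(A)
$D^{\epsilon}_{H}(\rho\|\sigma) := -\log \min\{ {\rm Tr}[\sigma E] : 0 \le E \le I,\ {\rm Tr}[\rho E] \ge 1-\epsilon \}$.
We will discuss more examples of channel divergences later on.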
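For commuting (classical) states the quantities above reduce to simple expressions on probability vectors, which makes the ordering between D_min, the Rényi divergences, the KL divergence, and D_max easy to inspect numerically. The sketch below assumes NumPy, logarithms in base 2, and a second argument q with strictly positive entries; the function names are illustrative and not taken from the paper.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence D_alpha(p||q) in bits, for alpha in (0,1) or (1,inf).

    Assumes q has strictly positive entries (an assumption of this sketch).
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.log2(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p||q) in bits (the alpha -> 1 limit)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def d_max(p, q):
    """Max relative entropy: log of the smallest t with t*q >= p entrywise."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.log2(np.max(p[mask] / q[mask])))

def d_min(p, q):
    """Min relative entropy: -log of the weight q assigns to the support of p."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(-np.log2(np.sum(q[p > 0])))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.2, 0.3, 0.5])
# Ordered list: D_min <= D_{1/2} <= D_KL <= D_2 <= D_max for this pair.
chain = [d_min(p, q), renyi_divergence(p, q, 0.5), kl_divergence(p, q),
         renyi_divergence(p, q, 2.0), d_max(p, q)]
```

The list `chain` is non-decreasing for this example, in line with the bound $D_{\min} \le \mathbf{D} \le D_{\max}$ quoted above from [2].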

C. Relative Majorization
We say that a pair of vectors p, q ∈ D(X) relatively majorizes another pair of vectors p′, q′ ∈ D(Y), and write
$(p, q) \succ (p', q')$,
if there exists a stochastic evolution matrix E ∈ CPTP(X → Y) such that (p′, q′) = (E(p), E(q)). Therefore, the relative Rényi entropies behave monotonically under relative majorization; i.e. if $(p, q) \succ (p', q')$ then $D_\alpha(p\|q) \ge D_\alpha(p'\|q')$ for all α ∈ [0, ∞]. If we have $(p, q) \succ (p', q') \succ (p, q)$ then we will write $(p, q) \sim (p', q')$. Relative majorization is a partial order that can be characterized with testing regions. The testing region of a pair of probability vectors p, q ∈ D(n) is a region in $\mathbb{R}^2$ defined as
$T(p, q) := \{ (p \cdot t,\ q \cdot t) : t \in \mathbb{R}^n,\ 0 \le t \le 1 \}$,
where the inequalities are entry-wise. This region is bounded by two curves known as the lower and upper Lorenz curves.
The upper Lorenz curve can be obtained from the lower Lorenz curve by a rotation by 180 degrees. Therefore, the lower (or upper) Lorenz curve uniquely determines the testing region. The lower Lorenz curve of a pair of probability vectors p, q ∈ $\mathbb{R}^n$, denoted here by L(p, q), has n + 1 vertices that can be computed as follows. First, observe that the testing region is invariant under the transformation (p, q) → (Πp, Πq), where Π is any permutation matrix. Therefore, w.l.o.g. we can assume that the components of p and q are arranged such that
$\frac{p_1}{q_1} \ge \frac{p_2}{q_2} \ge \cdots \ge \frac{p_n}{q_n}$.
The n + 1 vertices of L(p, q) are the (0, 0) vertex and the n vertices $\{(a_\ell, b_\ell)\}_{\ell=1}^{n} \subset L(p, q)$, where
$a_\ell := \sum_{x=1}^{\ell} p_x$ and $b_\ell := \sum_{x=1}^{\ell} q_x$.
The relevance of testing regions to our study here is the following theorem that goes back to Blackwell [36], and that has since been rediscovered under different names, including d-majorization [37], matrix majorization [38], and thermo-majorization [39] (see also the book on majorization by Marshall and Olkin [40]).
Theorem. [36] Let p, q ∈ P(n) and p′, q′ ∈ P(m) be two pairs of probability vectors in dimensions n and m, respectively. Then,
$(p, q) \succ (p', q') \iff T(p', q') \subseteq T(p, q)$.
The theorem above provides a geometric characterization of relative majorization; that is, $(p, q) \succ (p', q')$ if and only if the lower Lorenz curve of (p′, q′) is nowhere below the lower Lorenz curve of (p, q). Relative majorization has another remarkable property that was proven in [1].
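Theorem ([1]). Let p, q ∈ D(X) and set a := |X|. Suppose that the components of q are positive and rational. That is, there exist n_1, ..., n_a ∈ N such that
$q = \left( \frac{n_1}{n}, \ldots, \frac{n_a}{n} \right)^T$, with $n := \sum_{x=1}^{a} n_x$.
Let
$r := \bigoplus_{x=1}^{a} p_x u^{(n_x)}$,
where $u^{(k)}$ denotes the k-dimensional uniform distribution. Then,
$(p, q) \sim (r, u)$,
where u ∈ D(n) is the n-dimensional uniform distribution.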
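The Lorenz-curve criterion and the embedding into a uniform second argument can both be tried out numerically. The following is a minimal sketch, assuming NumPy; it builds the lower Lorenz curve from the sorted cumulative sums, tests relative majorization by comparing two curves on the union of their breakpoints (sufficient because both curves are piecewise linear), and constructs the vector r for a rational q given by integer counts n_x. All function names and the tolerance are hypothetical choices for illustration.

```python
import numpy as np

def lower_lorenz_vertices(p, q):
    """Vertices (a_l, b_l) of the lower Lorenz curve, including (0, 0)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(q > 0, p / q, np.inf)   # entries with q_x = 0 come first
    order = np.argsort(-ratio, kind="stable")     # sort by p_x / q_x, decreasing
    a = np.concatenate(([0.0], np.cumsum(p[order])))
    b = np.concatenate(([0.0], np.cumsum(q[order])))
    return a, b

def lower_lorenz_curve(p, q, x):
    """Evaluate the lower Lorenz curve at abscissas x by linear interpolation."""
    a, b = lower_lorenz_vertices(p, q)
    a_unique, first = np.unique(a, return_index=True)  # lowest vertex per abscissa
    return np.interp(x, a_unique, b[first])

def relatively_majorizes(p, q, p2, q2, tol=1e-9):
    """True if (p, q) relatively majorizes (p2, q2): curve of (p2, q2) is nowhere below."""
    grid = np.union1d(lower_lorenz_vertices(p, q)[0],
                      lower_lorenz_vertices(p2, q2)[0])
    return bool(np.all(lower_lorenz_curve(p2, q2, grid)
                       >= lower_lorenz_curve(p, q, grid) - tol))

def embed_in_uniform(p, counts):
    """Return (r, u) with r = direct sum of p_x * uniform(n_x) and u uniform on n = sum(counts)."""
    n = int(sum(counts))
    r = np.concatenate([np.full(n_x, p_x / n_x) for p_x, n_x in zip(p, counts)])
    return r, np.full(n, 1.0 / n)

p = np.array([0.5, 0.3, 0.2])
counts = [1, 2, 3]                      # q = (1/6, 2/6, 3/6)
q = np.array(counts) / sum(counts)
r, u = embed_in_uniform(p, counts)
equivalent = relatively_majorizes(p, q, r, u) and relatively_majorizes(r, u, p, q)
```

For this example the two Lorenz curves coincide, so `equivalent` evaluates to `True`, as the theorem from [1] predicts; merging the first two outcomes of (p, q) gives a pair that is majorized by (p, q) but not vice versa.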

III. CHANNEL DIVERGENCES
We start with the formal definition of a channel divergence and a channel relative entropy. We will use the notation $\mathbb{D}$ for a general channel divergence, to distinguish it from a general quantum divergence $\mathbf{D}$, or a general classical divergence $D$.
Definition 1. Let $\mathbb{D} : \bigcup_{A,B} \{ {\rm CPTP}(A \to B) \times {\rm CPTP}(A \to B) \} \to \mathbb{R} \cup \{\infty\}$ be a function acting on pairs of quantum channels in finite dimensions.
1. The function $\mathbb{D}$ is called a channel divergence if it satisfies the generalized Data Processing Inequality (DPI). That is, for any M, N ∈ CPTP(A → B) and a superchannel Θ ∈ SC(AB → A′B′)
$\mathbb{D}(\Theta[\mathcal{N}] \| \Theta[\mathcal{M}]) \le \mathbb{D}(\mathcal{N}\|\mathcal{M})$.
The divergence $\mathbb{D}$ is said to be normalized if in addition $\mathbb{D}(1\|1) = 0$ (here the number 1 is the only element of CPTP(1 → 1)).
2. A channel divergence $\mathbb{D}$ is called a channel relative entropy if it satisfies the following additional properties:
(a) Additivity. For any M_1, M_2 ∈ CPTP(A → B) and any N_1, N_2 ∈ CPTP(A′ → B′)
$\mathbb{D}(\mathcal{M}_1 \otimes \mathcal{N}_1 \| \mathcal{M}_2 \otimes \mathcal{N}_2) = \mathbb{D}(\mathcal{M}_1\|\mathcal{M}_2) + \mathbb{D}(\mathcal{N}_1\|\mathcal{N}_2)$.
(b) Normalization.
Channel divergences can be viewed as a generalization of quantum divergences since the set CPTP(A → B) with |A| = 1 can be viewed as the set of quantum states D(B). Hence, we view here the quantum states in D(B) as a special type of quantum channels, i.e. those in CPTP(1 → B), where 1 represents the one dimensional trivial system. However, there is another type of quantum channel that can be identified with quantum states. These are the replacement channels. Let σ ∈ D(B) and define the channel
$\mathcal{E}_\sigma(\rho) := {\rm Tr}[\rho]\, \sigma^B$ for all ρ ∈ L(A). (30)
This channel is uniquely determined by the dimension of system A and the state σ^B. It is therefore natural to ask if channel divergences between replacement channels reduce to quantum divergences between the states that define the replacement channels. Not too surprisingly, we will see below that the answer to this question is in the affirmative.

A. Basic Properties
Channel divergences and relative entropies borrow some of their properties from quantum divergences. In this section we discuss a few of these basic properties. We will say that a channel divergence $\mathbb{D}$ is faithful if $\mathbb{D}(\mathcal{N}\|\mathcal{M}) = 0$ implies that M = N.
Theorem 1 (Properties of Channel Divergences). Let $\mathbb{D}$ be a channel divergence. Then,
1. If $\mathbb{D}$ is a normalized divergence then for any two channels M, N ∈ CPTP(A → B), $\mathbb{D}(\mathcal{N}\|\mathcal{M}) \ge 0$, with equality if M = N.
2. If $\mathbb{D}$ is a channel relative entropy then it is a normalized divergence.
3. $\mathbb{D}$ is faithful if and only if its reduction to classical states (i.e. probability vectors) is faithful.
4. For any two replacement channels E_{σ_1}, E_{σ_2} ∈ CPTP(A → B), as defined in (30) with σ_1, σ_2 ∈ D(B) and |A| > 1, $\mathbb{D}(\mathcal{E}_{\sigma_1}\|\mathcal{E}_{\sigma_2}) = \mathbb{D}(\sigma_1\|\sigma_2)$.
5. Suppose $\mathbb{D}$ is a channel relative entropy, and let E_1, ..., E_n ∈ CPTP(A → B) be a set of n orthogonal quantum channels; i.e. their Choi matrices satisfy Then, for any probability vector p = {p_x}_{x=1}^{n}, and
6. If $\mathbb{D}$ is a channel relative entropy then for all M, N ∈ CPTP(A → B), $D_{\min}(\mathcal{N}\|\mathcal{M}) \le \mathbb{D}(\mathcal{N}\|\mathcal{M}) \le D_{\max}(\mathcal{N}\|\mathcal{M})$, where D_min and D_max are the channel min and max relative entropies as defined in (15) and (14), respectively.
7. Let $\mathbb{D}$ be a channel relative entropy, R ∈ CPTP(A → B) be the completely randomizing channel, and V ∈ CPTP(A → B) be an isometry channel (we assume |A| ≤ |B|). Then,
8. If $\mathbb{D}$ is a channel relative entropy then for any N, M, E ∈ CPTP(A → B)
9. If $\mathbb{D}$ is a channel relative entropy then for any N, E, M ∈ CPTP(A → B) Moreover, if J_N, J_E, and J_M have full support then
Remark. Property 8 implies that for any N, E, F ∈ CPTP(A → B) where is a metric (also a divergence) on CPTP(A → B). Hence, in particular, any channel relative entropy $\mathbb{D}$ is continuous in its second argument on the subset of CPTP(A → B) consisting of channels that have strictly positive Choi matrices. Similarly, Property 9 implies continuity in the first argument of $\mathbb{D}$. That is, if E is very close to N then s can be taken to be very close to one as long as supp(J_E) ⊆ supp(J_N). Note that in this case, if we also have supp(J_N) ⊆ supp(J_M) then the continuity of D_max implies that $D_{\max}(\mathcal{N} + s(\mathcal{M} - \mathcal{E}) \| \mathcal{M})$ goes to zero as s goes to one (recall that if s = 1 then E = N).

B. Divergences of Classical Channels
In this subsection we consider classical dynamical divergences; i.e. divergences of classical channels. We start with the following theorem.
Theorem 2. Let $\mathbb{D}$ be a classical channel divergence that reduces to the classical (state) divergence D on classical states in D(X) × D(X). Suppose further that D is quasi-convex. Then, for all classical channels M, N ∈ CPTP(X → Y)
$\underline{D}(\mathcal{M}\|\mathcal{N}) \le \mathbb{D}(\mathcal{M}\|\mathcal{N}) \le \overline{D}(\mathcal{M}\|\mathcal{N})$
where Moreover, $\underline{D}$ is a classical channel relative entropy, and $\overline{D}$ is a normalized classical channel divergence.
Remark. In the next section, we will develop a general framework to extend channel divergences from one domain to a larger one, and the optimality of the divergences $\underline{D}$ and $\overline{D}$ will follow trivially from that general formalism. Hence, the theorem above can be viewed as a corollary of the third property in Theorem 5 of the next section.

Uniqueness of the Channel KL-Relative Entropy
The uniqueness theorem above holds only for the KL divergence, and it is not clear to the author whether this uniqueness still holds when the KL divergence is replaced with the Rényi divergences. In [2] (cf. [34,35]) it was shown that the Umegaki relative entropy (and in particular the KL divergence) is the only asymptotically continuous divergence. Therefore, the uniqueness theorem above implies that there is only one classical channel divergence that on classical states is asymptotically continuous.

IV. OPTIMAL EXTENSIONS
In this section we apply the extension techniques developed in [2] for general resource theories to study the optimal extensions of a classical divergence to a channel divergence. One can also consider extensions of quantum divergences to channel divergences. As we will see, such extensions give rise to additional types of channel divergences. We start with the general framework for channel extensions of divergences.

A. General Framework for Extensions
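In the following theorem we apply the results that were given in [2] for a general resource theory to channel-extensions of divergences. We start with the definition of an R-divergence.
Definition 2. Let R be a function that maps any pair of quantum systems A and B to a subset R(A → B) ⊂ CPTP(A → B) of quantum channels. A function
$C : \bigcup_{A,B} \{ R(A \to B) \times R(A \to B) \} \to \mathbb{R} \cup \{\infty\}$
is called an R-divergence if $C(\Theta[\mathcal{M}] \| \Theta[\mathcal{N}]) \le C(\mathcal{M}\|\mathcal{N})$ for all M, N ∈ R(A → B) and all Θ ∈ SC(AB → A′B′) such that Θ[M], Θ[N] ∈ R(A′ → B′). Any R-divergence has two optimal extensions to a quantum channel divergence:
1. The minimal channel-extension, for any M, N ∈ CPTP(A → B),
$\underline{C}(\mathcal{M}\|\mathcal{N}) := \sup \{ C(\Theta[\mathcal{M}] \| \Theta[\mathcal{N}]) : \Theta \in {\rm SC}(AB \to A'B'),\ \Theta[\mathcal{M}], \Theta[\mathcal{N}] \in R(A' \to B') \}$ (51)
where the supremum is also over all systems A′, B′.
2. The maximal channel-extension, for any M, N ∈ CPTP(A → B),
$\overline{C}(\mathcal{M}\|\mathcal{N}) := \inf \{ C(\mathcal{E}\|\mathcal{F}) : \mathcal{E}, \mathcal{F} \in R(A' \to B'),\ \Theta \in {\rm SC}(A'B' \to AB),\ \mathcal{M} = \Theta[\mathcal{E}],\ \mathcal{N} = \Theta[\mathcal{F}] \}$ (52)
where the infimum is also over all systems A′, B′.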
Remark.In the definition above we assumed that R(A → B) is a subset of quantum channels.By taking R(A → B) to be a subset of replacement channels as in (30), the extensions above can also be applied to state divergences.
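Theorem 5. Let R be a function that maps any pair of quantum systems A and B to a subset R(A → B) ⊂ CPTP(A → B), and let C be an R-divergence. Then, its maximal and minimal channel-extensions $\overline{C}$ and $\underline{C}$ have the following properties:
1. Reduction. For any M, N ∈ R(A → B), $\underline{C}(\mathcal{M}\|\mathcal{N}) = \overline{C}(\mathcal{M}\|\mathcal{N}) = C(\mathcal{M}\|\mathcal{N})$.
2. Data Processing Inequality. For any M, N ∈ CPTP(A → B) and any Θ ∈ SC(AB → A′B′), $\underline{C}(\Theta[\mathcal{M}]\|\Theta[\mathcal{N}]) \le \underline{C}(\mathcal{M}\|\mathcal{N})$ and $\overline{C}(\Theta[\mathcal{M}]\|\Theta[\mathcal{N}]) \le \overline{C}(\mathcal{M}\|\mathcal{N})$.
3. Optimality. Any quantum channel divergence $\mathbb{D}$ that reduces to C on pairs of channels in R(A → B) must satisfy, for all M, N ∈ CPTP(A → B),
$\underline{C}(\mathcal{M}\|\mathcal{N}) \le \mathbb{D}(\mathcal{M}\|\mathcal{N}) \le \overline{C}(\mathcal{M}\|\mathcal{N})$. (55)
4. Sub/Super Additivity. If C is weakly additive under tensor products then $\underline{C}$ is super-additive and $\overline{C}$ is sub-additive. Explicitly, for any M_1, M_2 ∈ CPTP(A → B) and any N_1, N_2 ∈ CPTP(A′ → B′),
$\underline{C}(\mathcal{M}_1 \otimes \mathcal{N}_1 \| \mathcal{M}_2 \otimes \mathcal{N}_2) \ge \underline{C}(\mathcal{M}_1\|\mathcal{M}_2) + \underline{C}(\mathcal{N}_1\|\mathcal{N}_2)$,
$\overline{C}(\mathcal{M}_1 \otimes \mathcal{N}_1 \| \mathcal{M}_2 \otimes \mathcal{N}_2) \le \overline{C}(\mathcal{M}_1\|\mathcal{M}_2) + \overline{C}(\mathcal{N}_1\|\mathcal{N}_2)$.
5. Regularization. If C is weakly additive under tensor products then any weakly additive quantum channel divergence $\mathbb{D}$ that reduces to C on pairs of channels in R(A → B) must satisfy
$\underline{C}^{\rm reg}(\mathcal{M}\|\mathcal{N}) \le \mathbb{D}(\mathcal{M}\|\mathcal{N}) \le \overline{C}^{\rm reg}(\mathcal{M}\|\mathcal{N})$, (57)
where
$\underline{C}^{\rm reg}(\mathcal{M}\|\mathcal{N}) := \lim_{n\to\infty} \frac{1}{n} \underline{C}(\mathcal{M}^{\otimes n}\|\mathcal{N}^{\otimes n})$ and $\overline{C}^{\rm reg}(\mathcal{M}\|\mathcal{N}) := \lim_{n\to\infty} \frac{1}{n} \overline{C}(\mathcal{M}^{\otimes n}\|\mathcal{N}^{\otimes n})$, (58)
and $\underline{C}^{\rm reg}$ and $\overline{C}^{\rm reg}$ are themselves weakly additive normalized channel divergences.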
Remark. In the case that C is additive (even weakly additive) the minimal and maximal extensions are super-additive and sub-additive, respectively. This, in turn, implies that the limits in (58) exist, so that $\underline{C}^{\rm reg}$ and $\overline{C}^{\rm reg}$ are well defined. Moreover, in general, the bounds on $\mathbb{D}$ in (57) are tighter than the bounds in (55). This assertion follows from the fact that $\underline{C}$ is super-additive and in particular satisfies $\underline{C}^{\rm reg}(\mathcal{N}\|\mathcal{M}) \ge \underline{C}(\mathcal{N}\|\mathcal{M})$. Similarly, the sub-additivity of $\overline{C}$ implies that $\overline{C}^{\rm reg}(\mathcal{N}\|\mathcal{M}) \le \overline{C}(\mathcal{N}\|\mathcal{M})$.
In the following subsections we apply Theorem 5 to the cases that R is the subset of all quantum states (i.e. replacement channels) and the subset of all classical states. We will see that this gives rise to several optimal channel-extensions of state divergences. We start, however, by using the theorem above to prove the uniqueness of the max channel relative entropy.

B. Uniqueness of the max relative entropy
The max relative entropy is defined for any ρ, σ ∈ D(A) as
$D_{\max}(\rho\|\sigma) := \log \min\{ t \in \mathbb{R} : t\sigma \ge \rho \}$.
The max relative entropy is unique with respect to its monotonicity property. Unlike the relative entropy and all the other Rényi entropies, it behaves monotonically under any CP map (not necessarily trace preserving or trace non-increasing). More precisely, let ρ, σ ∈ D(A), and let E ∈ CP(A → B) be such that E(ρ) and E(σ) are normalized quantum states in D(B). Since we do not assume here that E is trace non-increasing, we cannot conclude that there exists a CPTP map that achieves the same task; i.e. taking the pair (ρ, σ) to the pair (E(ρ), E(σ)). Yet, the max divergence behaves monotonically under such maps; explicitly, for given ρ, σ ∈ D(A),
$D_{\max}(\mathcal{E}(\rho)\|\mathcal{E}(\sigma)) \le D_{\max}(\rho\|\sigma)$
for any E ∈ CP(A → B) for which E(ρ), E(σ) ∈ D(B) for the given ρ and σ. Note that we assumed here that D_max is defined only on pairs of normalized states. Extensions to subnormalized states can be made using the techniques studied in [2].
For quantum channels, D_max has been defined analogously to the states case as [21]
$D_{\max}(\mathcal{N}\|\mathcal{M}) := \log \min\{ t \in \mathbb{R} : t\mathcal{M} \ge \mathcal{N} \}$.
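Since tM ≥ N means that tM − N is completely positive, the minimization above can be carried out on Choi matrices: it asks for the smallest t with t J_M − J_N positive semidefinite. The sketch below assumes NumPy, logarithms in base 2, and a strictly positive Choi matrix J_M, in which case the optimal t is the largest eigenvalue of $J_\mathcal{M}^{-1/2} J_\mathcal{N} J_\mathcal{M}^{-1/2}$. The function name and the rank check are illustrative assumptions, and the rank-deficient case is deliberately left out.

```python
import numpy as np

def channel_d_max(choi_n, choi_m, tol=1e-12):
    """log2 of the smallest t with t*J_M - J_N >= 0, for full-rank J_M.

    Inputs are the Choi matrices of the two channels (Hermitian arrays).
    """
    evals, evecs = np.linalg.eigh(choi_m)
    if evals.min() <= tol:
        raise ValueError("this sketch assumes a strictly positive Choi matrix J_M")
    inv_sqrt = (evecs / np.sqrt(evals)) @ evecs.conj().T   # J_M^{-1/2}
    ratio = inv_sqrt @ choi_n @ inv_sqrt
    ratio = (ratio + ratio.conj().T) / 2                    # symmetrize against round-off
    return float(np.log2(np.linalg.eigvalsh(ratio).max()))

# Replacement channels on a qubit input: Choi matrices are I (x) sigma.
sigma_1 = np.diag([0.7, 0.3])
sigma_2 = np.diag([0.2, 0.8])
J_1 = np.kron(np.eye(2), sigma_1)
J_2 = np.kron(np.eye(2), sigma_2)
value_replacement = channel_d_max(J_1, J_2)   # agrees with D_max(sigma_1 || sigma_2)

# Identity channel versus the completely randomizing channel on a qubit.
phi = np.eye(2).reshape(4)                    # unnormalized sum_j |jj>
J_identity = np.outer(phi, phi)
J_randomizing = np.eye(4) / 2
value_identity = channel_d_max(J_identity, J_randomizing)
```

For the replacement channels the value coincides with the state quantity D_max(σ_1‖σ_2), consistent with Property 4 of Theorem 1; for the identity channel against the completely randomizing channel on a qubit the sketch gives 2 bits, i.e. log(|A||B|).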
One can easily see that, similar to the states case, $D_{\max}(\mathcal{N}\|\mathcal{M})$ behaves monotonically under any CP preserving (CPP) supermap that takes the pair of channels (M, N) to any other pair of channels (M′, N′). Specifically, let Θ ∈ CPP(AB → A′B′) be a CPP supermap that is not necessarily a superchannel, and suppose that M′ := Θ[M] and N′ := Θ[N] are quantum channels in CPTP(A′ → B′). Then,
$D_{\max}(\mathcal{N}'\|\mathcal{M}') \le D_{\max}(\mathcal{N}\|\mathcal{M})$.
We show here that the extension of D_max from classical states to quantum channels is unique. We already know from Property 6 of Theorem 1 that no channel relative entropy can exceed D_max. In fact, in the proof of Property 6 of Theorem 1 we only use the normalization property of relative entropy. Therefore, one can conclude something slightly stronger: it is not even possible to extend D_max to a non-additive channel divergence.
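From the optimality property of Theorem 5 it follows that in order to prove uniqueness, it is sufficient to show that the maximal and minimal extensions of D_max are equal to each other. Applying the general framework for extensions developed in the previous subsection, the maximal and minimal extensions of D_max to quantum channels, denoted by $\overline{D}_{\max}$ and $\underline{D}_{\max}$, respectively, are defined for any N, M ∈ CPTP(A → B) as
$\overline{D}_{\max}(\mathcal{N}\|\mathcal{M}) := \inf \{ D_{\max}(p\|q) : \mathcal{N} = \Theta[p],\ \mathcal{M} = \Theta[q],\ \Theta \in {\rm SC}(X \to AB) \}$,
$\underline{D}_{\max}(\mathcal{N}\|\mathcal{M}) := \sup \{ D_{\max}(\Theta[\mathcal{N}] \| \Theta[\mathcal{M}]) : \Theta \in {\rm SC}(AB \to X) \}$, (63)
where the optimizations are also over all classical systems X.
Theorem 6 (Uniqueness of the dynamical max relative entropy). Let $\mathbb{D}$ be a channel divergence that reduces to D_max on classical probability distributions; i.e. for any classical system X and p, q ∈ D(X), $\mathbb{D}(p\|q) = D_{\max}(p\|q)$. Then, for all N, M ∈ CPTP(A → B)
$\mathbb{D}(\mathcal{N}\|\mathcal{M}) = D_{\max}(\mathcal{N}\|\mathcal{M})$.
Remark. Note that we do not assume that $\mathbb{D}$ is a channel relative entropy (i.e. additive), only a divergence that reduces to the max relative entropy on classical states. The main idea of the proof is to show that the two expressions in (63) are both equal to D_max.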
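C. Extension from a quantum (state) divergence to a channel divergence
In this section we study the optimal extensions of quantum state divergences to quantum channel divergences. For a given quantum state divergence $\mathbf{D}$ we denote by $\underline{\mathbf{D}}$ its minimal channel-extension. According to (51) the minimal channel-extension is given by
$\underline{\mathbf{D}}(\mathcal{N}\|\mathcal{M}) = \sup_{\rho \in D(RA)} \mathbf{D}(\mathcal{N}^{A\to B}(\rho^{RA}) \| \mathcal{M}^{A\to B}(\rho^{RA})) = \max_{\psi \in D(RA)} \mathbf{D}(\mathcal{N}^{A\to B}(\psi^{RA}) \| \mathcal{M}^{A\to B}(\psi^{RA}))$, (65)
where in the last equality the supremum has been replaced with a maximum since w.l.o.g. we can assume that |R| = |A| and that ψ^{RA} is a pure state [13,14]. Similarly, by the definition in (52), the maximal channel-extension is given by
$\overline{\mathbf{D}}(\mathcal{N}\|\mathcal{M}) = \inf \{ \mathbf{D}(\rho\|\sigma) : \rho, \sigma \in D(R),\ \mathcal{E} \in {\rm CPTP}(RA \to B),\ \mathcal{N} = \mathcal{E}_\rho,\ \mathcal{M} = \mathcal{E}_\sigma \}$, (66)
where for any density matrices ρ, σ ∈ D(R) and channel E ∈ CPTP(RA → B) we denote
$\mathcal{E}_\rho(\omega^A) := \mathcal{E}^{RA\to B}(\rho^R \otimes \omega^A)$.
The channels above have been studied under the name environment-parametrized channels [41-43], and the expression in (65) has been used in the literature for the cases that $\mathbf{D}$ is the trace norm (in which case $\underline{\mathbf{D}}$ becomes the diamond norm [44]), the Umegaki relative entropy, and the quantum Rényi divergences [13,14]. The following corollary is the restatement of Theorem 5 for the optimal channel-extensions of quantum state divergences.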
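5. If $\mathbf{D}$ is a weakly additive quantum state divergence then any weakly additive quantum channel divergence $\mathbb{D}$ that reduces to $\mathbf{D}$ on quantum states must satisfy
$\underline{\mathbf{D}}^{\rm reg}(\mathcal{N}\|\mathcal{M}) \le \mathbb{D}(\mathcal{N}\|\mathcal{M}) \le \overline{\mathbf{D}}^{\rm reg}(\mathcal{N}\|\mathcal{M})$,
where
$\underline{\mathbf{D}}^{\rm reg}(\mathcal{N}\|\mathcal{M}) := \lim_{n\to\infty} \frac{1}{n} \underline{\mathbf{D}}(\mathcal{N}^{\otimes n}\|\mathcal{M}^{\otimes n})$ and $\overline{\mathbf{D}}^{\rm reg}(\mathcal{N}\|\mathcal{M}) := \lim_{n\to\infty} \frac{1}{n} \overline{\mathbf{D}}(\mathcal{N}^{\otimes n}\|\mathcal{M}^{\otimes n})$,
and $\underline{\mathbf{D}}^{\rm reg}$ and $\overline{\mathbf{D}}^{\rm reg}$ are themselves weakly additive normalized channel divergences.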
In addition to the corollary above, we have the following property for the maximal extension.
Theorem 7. Let D be a jointly convex quantum divergence.Then, its maximal channel-extension D is also jointly convex.
The channel divergence $\underline{\mathbf{D}}$ has been shown in [20] to satisfy the generalized DPI. For the case that $\mathbf{D} = D$ is the Umegaki relative entropy, it was shown in [28] that it satisfies a chain rule. The latter property in particular implies that its regularization can be expressed as [28]
$D^{\rm reg}(\mathcal{N}\|\mathcal{M}) = \sup_{\rho, \sigma \in D(RA)} \left[ D(\mathcal{N}(\rho)\|\mathcal{M}(\sigma)) - D(\rho\|\sigma) \right]$,
where the expression on the RHS is known as the amortized divergence [15]. We will see below that any channel relative entropy that reduces to the Kullback-Leibler divergence on classical states must be no smaller than the above expression.

D. Extensions from classical state divergences to channel divergences
In this subsection we study optimal channel-extensions of a classical state divergence D. We define the following four optimal extensions of D to quantum channel divergences.where D is the maximal quantum state-extension of D.
4. The min-max extension of D, where D is the minimal quantum state-extension of D.
From Theorem 5 it follows that all the four functions above satisfy the generalized data processing inequality (and therefore they are indeed divergences), and they all reduce to the classical divergence D on classical states.Moreover, for a pair of quantum states ρ, σ ∈ D(A) we have D(ρ σ) = D(ρ σ) and D ↑ (ρ σ) = D(ρ σ).In addition, Theorem 5 implies that any channel divergence D, that reduces on classical states to a classical divergence D, must satisfy This in particular applies to the cases D = D and D = D ↑ , and also note that in general, the optimal channelextensions in (65) and (66) of a quantum state divergence D, that reduces to a classical state divergence D on classical states, satisfy 1.The minimal channel extension and its regularization Note that the minimal extension D can be expressed as where the supremum is over all systems X, R, over all ψ ∈ D(AR), and over all E ∈ CPTP(BR → X).Note that w.l.o.g.we can assume that ψ RA is a pure state.There is at least one divergence for which the expression above coincide with the optimization given in (65).
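Remark. The Lemma above implies that any channel divergence $\mathbb{D}$ that reduces to D_min on classical states satisfies
$\mathbb{D}(\mathcal{N}\|\mathcal{M}) \ge D_{\min}(\mathcal{N}\|\mathcal{M})$.
The minimal channel extension is typically not additive even if the classical divergence D is additive (i.e. D is a relative entropy). However, from Theorem 5 we know that if D is a classical relative entropy then its channel extension $\underline{D}$ is super-additive. This means that the limit in its regularization exists:
$\underline{D}^{\rm reg}(\mathcal{M}\|\mathcal{N}) := \lim_{n\to\infty} \frac{1}{n} \underline{D}(\mathcal{M}^{\otimes n}\|\mathcal{N}^{\otimes n})$.
From Theorem 5 it follows that $\underline{D}^{\rm reg}$ is a weakly additive divergence, and all channel relative entropies that reduce to D on classical states must be no smaller than it. Suppose now that D is the KL divergence. In this case, we denote by $\underline{D}^{\rm reg}$ its minimal regularized channel extension. Another closely related quantity that plays an important role in applications is the regularized version of the minimal channel extension of the (quantum) Umegaki relative entropy, denoted by $D^{\rm reg}$. That is,
$D^{\rm reg}(\mathcal{M}\|\mathcal{N}) := \lim_{n\to\infty} \frac{1}{n} D(\mathcal{M}^{\otimes n}\|\mathcal{N}^{\otimes n})$,
where
$D(\mathcal{M}\|\mathcal{N}) := \max_{\psi \in D(RA)} D(\mathcal{M}(\psi^{RA}) \| \mathcal{N}(\psi^{RA}))$,
with D on the right-hand side being the Umegaki relative entropy. Since $\underline{D}^{\rm reg}$ is the minimal channel extension we must have
$\underline{D}^{\rm reg}(\mathcal{M}\|\mathcal{N}) \le D^{\rm reg}(\mathcal{M}\|\mathcal{N})$.
Note that if M and N in the equation above are quantum states (i.e. the input dimension |A| = 1) then the equality holds. This is due to the fact that on quantum states, the Umegaki relative entropy $D^{\rm reg}(\rho\|\sigma) = D(\rho\|\sigma)$ equals the minimal additive state extension of the KL relative entropy (see (11)). We now show that the equality also holds for any two channels.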
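Theorem 9. Let M, N ∈ CPTP(A → B) be two quantum channels. Then,
$\underline{D}^{\rm reg}(\mathcal{M}\|\mathcal{N}) = D^{\rm reg}(\mathcal{M}\|\mathcal{N})$.
That is, $D^{\rm reg}(\mathcal{M}\|\mathcal{N})$ is the smallest channel relative entropy that reduces to the KL relative entropy on classical states.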
Starting with a classical relative entropy D, the process by which we arrived at the weakly additive channel divergence $\underline{D}^{\rm reg}$ had two steps: (1) extend D to the minimal channel divergence $\underline{D}$, and (2) regularize $\underline{D}$ to obtain a weakly additive divergence. One can also introduce regularization at the state level and only then apply the channel extension. Specifically, starting with a classical relative entropy D, we apply the following four steps:

The maximal channel extension
To the author's knowledge, the maximal channel extension in (74) is new, and was not studied before.Note that the infimum in (74) can be expressed as For the case that M is an isometry we get the following result.
Theorem 10.Let V ∈ CPTP(A → B) be an isometry channel defined via V(ρ) = V ρV * , for all ρ ∈ D(A), and with isometry matrix V (i.e.V * V = I A ).Then, for any N ∈ CPTP(A → B) In section III B we provided a closed formula of this divergence for the classical case.We saw that the formula reveals that this divergence is not additive (even for classical channels), and therefore is not a relative entropy.Recall, however, that from Theorem 5 we know that if D is a classical relative entropy then its channel extension D is sub-additive.This means that the limit in its regularization exists, and D reg D. The divergence D reg is weakly additive, and it remains open to determine if it is a relative entropy (i.e.fully additive).

The geometric channel relative entropy
Given a classical divergence D, its maximal extension to quantum states is given for all ρ, σ ∈ D(A) by The geometric divergence is defined as the minimal channel-extension of this maximal state-extension of D. For the case that D = D α is the classical Rényi entropy with α ∈ (0, 2] and α = 1, it was proved in [17,18] that where For all α ∈ (0, 2], the formula above gives for an isometry V and a channel N Hence, due to Theorem 10, D α (V N ) coincides with the maximal quantum-channel extension D α (V N ).However, recall that in general we have D α (M N ) D α (M N ), and we saw in Sec.III B that the inequality can be strict even on classical channels.In fact, since D α is additive, we must have D α (M N ) D reg α (M N ), and it is left open if this inequality can be strict for some choices of M and N .
The formula (91) reveals that D_α is additive (at least for α ∈ (0, 2]). In fact, to the author's knowledge, with the exception of D_max, this function is the only known channel relative entropy, since all other channel divergences discussed in this paper are at most known to be weakly additive, and the question of whether they are fully additive is open. In [17] D_α was used to derive upper bounds on certain QIP tasks, and in [18] some operational interpretations were given for it in the context of channel discrimination.

V. CONCLUSIONS
In this paper we introduced an axiomatic approach to dynamical divergences. This approach is minimalistic in the sense that we only require channel divergences to satisfy the generalized DPI under superchannels, and channel relative entropies to be in addition additive and normalized. Remarkably, we showed that these axioms are sufficient to induce enough structure, leading to numerous properties satisfied by all channel relative entropies. One of our main results is a uniqueness theorem, Theorem 4, in which we show that in the classical domain there exists only one channel relative entropy that reduces to the Kullback-Leibler divergence on classical states (i.e. probability vectors). In the quantum case, it is known that this uniqueness does not hold even for quantum states, but we were able to show that the amortized relative entropy as defined in (72) is the smallest channel relative entropy that reduces to the Kullback-Leibler divergence on classical states. Due to the one-to-one correspondence between classical entropies and classical relative entropies [1], this means that the amortized relative entropy is in fact the smallest one that reduces on a pair of classical states (p, u) (here u is the uniform distribution) to the log of the dimension minus the Shannon entropy.
There are many open problems for future investigations. For example, in the classical domain, for α ≠ 1, is the classical-channel extension of the α-Rényi relative entropy unique? Another interesting problem is whether the regularization of the maximal channel extension of relative entropies coincides with the geometric channel relative entropies. Finally, another interesting question is whether the maximal channel extensions of relative entropies satisfy a 'chain rule' similar to the one satisfied by the minimal channel extension (65) [28].
where the last equality follows from the fact that (sM − N )(φ RA + ) has one eigenvalue that is zero (since s is the smallest number satisfying sM N ).Hence, if t < s then (t − s)∆ • M(φ RA + ), which commutes with (sN − M)(φ RA + ) will make the zero eigenvalue strictly negative.Hence, D max (N M) log s = D max (N M) . (A106) This completes the proof.
7. Proof of Theorem 7
Theorem. Let $\mathbf{D}$ be a jointly convex quantum divergence. Then, its maximal channel-extension $\overline{\mathbf{D}}$ is also jointly convex.
Proof. Let $\mathcal{N} = \sum_x p_x \mathcal{N}_x$ and $\mathcal{M} = \sum_x p_x \mathcal{M}_x$. Let R = R X and observe that This completes the proof.

4 .
FIG.1.A summary of all optimal extensions of the Shannon entropy.Red arrows represent maximal extensions and blue arrows represent minimal extensions.The Kullback-Leibler divergence is the only classical relative entropy that reduces to the log of the dimension minus the Shannon entropy when the second argument is a uniform distribution (in[1] a one-to-one correspondence between classical entropies and classical relative entropies was proven).Regularization is assumed in all extensions.The channel relative entropy D reg (N M) equals both the amortized divergence and the regularized minimal channel extension D reg (N M).It is the smallest of all channel relative entropies that reduces to the Kullback-Leibler divergence on classical states, while the maximal divergence D reg (N M) is the largest one.The blue and red dashed arrows indicate that the minimal and maximal d g 4 4 7 O m T k p s 1 a e r S n 0 7 s W H e j 0 x K J D n d 6 3 6 E K n n 9 n W u c 5 P b b 7 U e b t b u E 6 f W r S v 0 2 M r M l 7 T O I + w f g t r c 4 p z o c a j O I A 9 2 p y m 2 w L 5 9 u P R l 8 + + v J 4 e / j 9 b h e O 9 w Y f D z 4 d f D Y Y D b 4 a f D 8 4 G E w H Z w P / o / 8 8 / O T h 5 s O / r w / W / 7 r + j / X P W 9 W 3 3 + p s / j L o P e t f / x e X V v i 2 < / l a t e x i t >


FIG. 2. Three lower Lorenz curves. The channels $\mathcal{M}, \mathcal{N} \in \mathrm{CPTP}(X \to Y)$ with $|X| = 2$ and $|Y| = 4$.

For the general case, we provide in the theorem below a closed formula for $D(\mathcal{M}\|\mathcal{N})$ for any two classical channels $\mathcal{M}, \mathcal{N} \in \mathrm{CPTP}(X \to Y)$ in finite dimensions $|X|, |Y| < \infty$. For any $x = 1, \ldots, |X|$, we will denote $(m_x, n_x) := \big(\mathcal{M}(|x\rangle\langle x|), \mathcal{N}(|x\rangle\langle x|)\big)$, and by $M = (M_{y|x})$ and $N = (N_{y|x})$ the $|Y| \times |X|$ matrices whose components are $M_{y|x} = \langle y|m_x|y\rangle$ and $N_{y|x} = \langle y|n_x|y\rangle$, respectively. We also rearrange the components of the columns of $M$ and $N$ such that for each $x = 1, \ldots, |X|$

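Theorem 4. Let $D$ be a classical channel divergence that reduces to the Kullback-Leibler divergence on classical states. If $D$ is continuous in its second argument then for all $\mathcal{N}, \mathcal{M} \in \mathrm{CPTP}(X \to Y)$
$$D(\mathcal{M}\|\mathcal{N}) = \max_{x \in \{1, \ldots, |X|\}} D\big(\mathcal{M}(|x\rangle\langle x|) \,\big\|\, \mathcal{N}(|x\rangle\langle x|)\big).$$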
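The closed formula of the theorem is straightforward to evaluate. The sketch below represents a classical channel as a list whose entry `x` is the output distribution for input `x` (a column of the transition matrix), and returns the maximum over inputs of the Kullback-Leibler divergence between the corresponding outputs. The function names and the example channels are hypothetical, and base-2 logarithms are an assumption.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p||q) in bits; inf if supp(p) is not in supp(q)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total

def channel_kl(M, N):
    """max over inputs x of D(m_x || n_x), where M[x] and N[x] are the
    output distributions of the two channels on input x."""
    if len(M) != len(N):
        raise ValueError("channels must have the same input alphabet")
    return max(kl_divergence(m, n) for m, n in zip(M, N))

# Input 0 is mapped deterministically by M and uniformly by N;
# on input 1 the two channels agree.
M = [[1.0, 0.0], [0.5, 0.5]]
N = [[0.5, 0.5], [0.5, 0.5]]
print(channel_kl(M, N))  # 1.0, attained at input x = 0
```

The maximization runs only over the $|X|$ deterministic inputs, so no optimization over input distributions or reference systems is needed in this classical case.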

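Corollary 1. Let $D$ be a quantum (state) divergence, and let $\underline{D}$ and $\overline{D}$ be its minimal and maximal extensions to quantum channels. Then:

1. Both $\underline{D}$ and $\overline{D}$ are quantum-channel divergences.
2. Both $\underline{D}$ and $\overline{D}$ reduce to $D$ on quantum states.
3. Any other channel divergence $D'$ that reduces to $D$ on quantum states must satisfy
$$\underline{D}(\mathcal{N}\|\mathcal{M}) \leq D'(\mathcal{N}\|\mathcal{M}) \leq \overline{D}(\mathcal{N}\|\mathcal{M}) \quad \forall\, \mathcal{N}, \mathcal{M} \in \mathrm{CPTP}(A \to B). \tag{68}$$
4. If $D$ is a weakly additive quantum state divergence then $\underline{D}$ is super-additive and $\overline{D}$ is sub-additive with respect to tensor products. Explicitly, for any $\mathcal{M}_1, \mathcal{M}_2 \in \mathrm{CPTP}(A \to B)$ and any $\mathcal{N}_1, \mathcal{N}_2 \in \mathrm{CPTP}(A' \to B')$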

1. Extend $D$ to the minimal quantum state divergence $\underline{D}$.
2. Regularize $\underline{D}$ to get a weakly additive quantum state divergence $\underline{D}^{\rm reg}$.
3. Using the minimal extension, extend $\underline{D}^{\rm reg}$ to a channel divergence $D_{\rm ch}$.
4. Regularize $D_{\rm ch}$ to get a weakly additive channel divergence $D^{\rm reg}_{\rm ch}$.

Fig. 3 illustrates these four steps.

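In [9] it was shown that if $D$ is the classical Rényi relative entropy with $\alpha \in [1/2, \infty]$ then $\underline{D}^{\rm reg}$ is the sandwiched quantum relative entropy of order $\alpha$ [7, 8]. Therefore, in this case, $D_{\rm ch}(\mathcal{M}\|\mathcal{N}) = D_\alpha(\mathcal{M}\|\mathcal{N})$ is simply the channel extension of the sandwiched relative entropy, so that $D^{\rm reg}_{\rm ch}(\mathcal{M}\|\mathcal{N}) = D^{\rm reg}_\alpha(\mathcal{M}\|\mathcal{N})$ is just the regularization of $D_\alpha(\mathcal{M}\|\mathcal{N})$. In Theorem 9 above we showed that for $\alpha = 1$, $D^{\rm reg}_\alpha(\mathcal{M}\|\mathcal{N}) = \underline{D}^{\rm reg}_\alpha(\mathcal{M}\|\mathcal{N})$, which means that for $\alpha = 1$, $D^{\rm reg}_\alpha$ is the smallest weakly additive divergence that reduces to the KL relative entropy on classical states. It remains an open question whether this equality holds for all $\alpha \in [1/2, \infty]$.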
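For reference, the classical starting point of this construction can be computed directly. The sketch below evaluates the classical Rényi relative entropy $D_\alpha(p\|q) = \frac{1}{\alpha - 1}\log \sum_i p_i^\alpha q_i^{1-\alpha}$ on the range $\alpha \in [1/2, \infty]$, treating $\alpha = 1$ as the Kullback-Leibler divergence and $\alpha = \infty$ as the max relative entropy. It covers only the classical state level, not the quantum or channel extensions discussed above. The function name is hypothetical and base-2 logarithms are an assumption.

```python
import math

def renyi_divergence(p, q, alpha):
    """Classical Renyi relative entropy D_alpha(p||q) in bits, alpha in [1/2, inf]."""
    if alpha < 0.5:
        raise ValueError("this sketch only covers alpha >= 1/2")
    support = [(pi, qi) for pi, qi in zip(p, q) if pi > 0.0]
    # For alpha >= 1 the divergence is infinite unless supp(p) is inside supp(q).
    if alpha >= 1.0 and any(qi == 0.0 for _, qi in support):
        return math.inf
    if alpha == 1.0:
        # Limit alpha -> 1: the Kullback-Leibler divergence.
        return sum(pi * math.log2(pi / qi) for pi, qi in support)
    if alpha == math.inf:
        # Limit alpha -> inf: the max relative entropy.
        return math.log2(max(pi / qi for pi, qi in support))
    total = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in support)
    if total == 0.0:
        return math.inf  # disjoint supports with alpha < 1
    return math.log2(total) / (alpha - 1.0)

p = [0.75, 0.25]
q = [0.5, 0.5]
values = [renyi_divergence(p, q, a) for a in (0.5, 1.0, 2.0, math.inf)]
print(values == sorted(values))  # True: D_alpha is non-decreasing in alpha
```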