Entropy of a quantum channel

The von Neumann entropy of a quantum state is a central concept in physics and information theory, having a number of compelling physical interpretations. There is a certain perspective that the most fundamental notion in quantum mechanics is that of a quantum channel, as quantum states, unitary evolutions, measurements, and discarding of quantum systems can each be regarded as certain kinds of quantum channels. Thus, an important goal is to define a consistent and meaningful notion of the entropy of a quantum channel. Motivated by the fact that the entropy of a state $\rho$ can be formulated as the difference of the number of physical qubits and the "relative entropy distance" between $\rho$ and the maximally mixed state, here we define the entropy of a channel $\mathcal{N}$ as the difference of the number of physical qubits of the channel output with the "relative entropy distance" between $\mathcal{N}$ and the completely depolarizing channel. We prove that this definition satisfies all of the axioms, recently put forward in [Gour, IEEE Trans. Inf. Theory 65, 5880 (2019)], required for a channel entropy function. The task of quantum channel merging, in which the goal is for the receiver to merge his share of the channel with the environment's share, gives a compelling operational interpretation of the entropy of a channel. The entropy of a channel can be negative for certain channels, but this negativity has an operational interpretation in terms of the channel merging protocol. We define Rényi and min-entropies of a channel and prove that they satisfy the axioms required for a channel entropy function. Among other results, we also prove that a smoothed version of the min-entropy of a channel satisfies the asymptotic equipartition property.

In his foundational work on quantum statistical mechanics, von Neumann extended the classical Gibbs entropy concept to the quantum realm [1]. This extension, known as the von Neumann or quantum entropy, plays a key role in physics and information theory. It is defined by the following formula [1]:

$H(A)_\rho \equiv -\operatorname{Tr}[\rho_A \log_2 \rho_A]$, (1)

where $\rho_A$ is the state of a system $A$. The entropy has operational interpretations in terms of quantum data compression [2] and optimal entanglement manipulation rates of pure bipartite quantum states [3], where the choice of base two for the logarithm becomes clear. In recent developments of quantum thermodynamics, it was shown that the free energy, namely, the difference of the energy and the product of the temperature and the von Neumann entropy, can be interpreted as the rate at which work can be extracted from a large number of copies of a quantum system in a thermal bath at fixed temperature, by using only thermal operations [4]. By defining the quantum relative entropy of a state $\rho_A$ and a positive semi-definite operator $\sigma_A$ as [5]

$D(\rho_A \| \sigma_A) \equiv \operatorname{Tr}[\rho_A (\log_2 \rho_A - \log_2 \sigma_A)]$ (2)

if $\operatorname{supp}(\rho_A) \subseteq \operatorname{supp}(\sigma_A)$ and $D(\rho_A \| \sigma_A) = +\infty$ otherwise, we can rewrite the formula for quantum entropy as follows:

$H(A)_\rho = \log_2 |A| - D(\rho_A \| \pi_A)$, (3)

where $|A|$ denotes the dimension of the system $A$ and $\pi_A \equiv I_A / |A|$ denotes the maximally mixed state. In this way, we can think of entropy as quantifying the difference of the number of physical qubits contained in the system $A$ and the "relative entropy distance" of the state $\rho_A$ to the maximally mixed state $\pi_A$. This way of thinking about quantum entropy is relevant in the resource theory of purity [6-11], in which the goal is to distill local pure states from a given state (or vice versa) by allowing local unitary operations for free. Furthermore, the quantum relative entropy $D(\rho_A \| \pi_A)$ has an operational meaning as the optimal rate at which the state $\rho_A$ can be distinguished from the maximally mixed state $\pi_A$ in the Stein setting of quantum hypothesis testing [12,13].
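As a quick numerical sanity check of the identity in (3), the following sketch (using numpy; the helper names are ours, not from the paper) verifies $H(\rho) = \log_2|A| - D(\rho\|\pi)$ for a randomly generated qutrit state:

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def relative_entropy(rho, sigma):
    """D(rho||sigma) in bits, assuming supp(rho) is contained in supp(sigma)."""
    er, Ur = np.linalg.eigh(rho)
    es, Us = np.linalg.eigh(sigma)
    ov = np.abs(Ur.conj().T @ Us) ** 2        # ov[i, j] = |<r_i|s_j>|^2
    D = 0.0
    for i, p in enumerate(er):
        if p > 1e-12:
            D += p * np.log2(p)
            D -= p * np.sum(ov[i, es > 1e-12] * np.log2(es[es > 1e-12]))
    return D

# random full-rank qutrit state
d = 3
rng = np.random.default_rng(1)
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = M @ M.conj().T
rho /= np.trace(rho).real
pi = np.eye(d) / d                             # maximally mixed state

lhs = entropy(rho)
rhs = np.log2(d) - relative_entropy(rho, pi)   # right-hand side of (3)
print(abs(lhs - rhs) < 1e-9)  # True
```

The check works for any state, since $D(\rho\|\pi) = \log_2 d - H(\rho)$ exactly.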
In what follows, we use the formula in (3) as the basis for defining the entropy of a quantum channel. For some time now, there has been a growing realization that the fundamental constituents of quantum mechanics are quantum channels. Recall that a quantum channel N A→B is a completely positive, trace preserving map that takes a quantum state for system A to one for system B [14]. Indeed, all the relevant components of the theory, including quantum states, measurements, unitary evolutions, etc., can be written as quantum channels. A quantum state can be understood as a preparation channel, sending a trivial quantum system to a non-trivial one prepared in a given state. A quantum measurement can be understood as a quantum channel that sends a quantum system to a classical one; and of course a unitary evolution is a kind of quantum channel, as well as the discarding of a quantum system. One might even boldly go as far as to say that there is really only a single postulate of quantum mechanics, and it is that "everything is a quantum channel." With this perspective, one could start from this unified postulate and then understand from there particular kinds of channels, i.e., states, measurements, and unitary evolutions.
Due to the fundamental roles of quantum channels and the entropy of a quantum state, as highlighted above, it is thus natural to ask whether there is a meaningful notion of the entropy of a quantum channel, i.e., a quantifier of the uncertainty of a quantum channel. As far as we are aware, this question has not been fully addressed in prior literature (see Remark 2 for further discussion), and it is the aim of the present paper to provide a convincing notion of a quantum channel's entropy. To define such a notion, we look to (3) for inspiration. As such, we need generalizations of the quantum relative entropy and the maximally mixed state to the setting of quantum channels:

1. The quantum relative entropy of channels $\mathcal{N}_{A\to B}$ and $\mathcal{M}_{A\to B}$ is defined as [15,16]

$D(\mathcal{N} \| \mathcal{M}) \equiv \sup_{\rho_{RA}} D(\mathcal{N}_{A\to B}(\rho_{RA}) \| \mathcal{M}_{A\to B}(\rho_{RA}))$, (4)

where the optimization is with respect to bipartite states $\rho_{RA}$ of a reference system $R$ of arbitrary size and the channel input system $A$. Due to state purification, the data-processing inequality [17], and the Schmidt decomposition theorem, it suffices to optimize over states $\rho_{RA}$ that are pure and such that system $R$ is isomorphic to system $A$. This observation significantly reduces the complexity of computing the channel relative entropy.
2. The channel that serves as a generalization of the maximally mixed state is the channel $\mathcal{R}_{A\to B}$ that completely randomizes or depolarizes the input state as follows:

$\mathcal{R}_{A\to B}(X_A) \equiv \operatorname{Tr}[X_A]\, \pi_B$, (5)

where $X_A$ is an arbitrary operator for system $A$.
That is, its action is to discard the input and replace it with the maximally mixed state $\pi_B$.
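Any single input state gives a lower bound on the supremum in (4). As an illustration (not an example singled out by the paper), the following numpy sketch evaluates this lower bound at a maximally entangled input for two qubit dephasing channels; all helper names are ours:

```python
import numpy as np

def relative_entropy(rho, sigma):
    """D(rho||sigma) in bits, assuming supp(rho) is contained in supp(sigma)."""
    er, Ur = np.linalg.eigh(rho)
    es, Us = np.linalg.eigh(sigma)
    ov = np.abs(Ur.conj().T @ Us) ** 2        # |<r_i|s_j>|^2
    D = 0.0
    for i, p in enumerate(er):
        if p > 1e-12:
            D += p * np.log2(p)
            D -= p * np.sum(ov[i, es > 1e-12] * np.log2(es[es > 1e-12]))
    return D

def apply_to_A(kraus, rho_RA, dR):
    """Apply a channel (given by Kraus operators) to the A share of rho_RA."""
    out = np.zeros_like(rho_RA, dtype=complex)
    for K in kraus:
        KA = np.kron(np.eye(dR), K)
        out += KA @ rho_RA @ KA.conj().T
    return out

def dephasing_kraus(p):
    Z = np.diag([1.0, -1.0])
    return [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * Z]

# maximally entangled input state of reference R and channel input A
v = np.array([1, 0, 0, 1]) / np.sqrt(2)
phi = np.outer(v, v)

out_N = apply_to_A(dephasing_kraus(0.1), phi, dR=2)
out_M = apply_to_A(dephasing_kraus(0.3), phi, dR=2)

# a single input state yields a lower bound on the supremum in (4)
lower_bound = relative_entropy(out_N, out_M)
print(round(lower_bound, 4))
```

For dephasing channels the two outputs are simultaneously diagonalizable, so this reduces to a classical relative entropy between the distributions $(0.9, 0.1)$ and $(0.7, 0.3)$.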
With these notions in place, we can now define the entropy of a quantum channel:

Definition 1 (Entropy of a quantum channel) Let $\mathcal{N}_{A\to B}$ be a quantum channel. Its entropy is defined as

$H(\mathcal{N}) \equiv \log_2 |B| - D(\mathcal{N} \| \mathcal{R})$, (6)

where $D(\mathcal{N} \| \mathcal{R})$ is the channel relative entropy in (4) and $\mathcal{R}_{A\to B}$ is the completely randomizing channel in (5).
We remark here that, in analogy to the operational interpretation for $D(\rho_A \| \pi_A)$ mentioned above, it is known that $D(\mathcal{N} \| \mathcal{R})$ is equal to the optimal rate at which the channel $\mathcal{N}_{A\to B}$ can be distinguished from the completely randomizing channel $\mathcal{R}_{A\to B}$, by allowing for any possible quantum strategy to distinguish the channels [15]. Again, this statement holds in the Stein setting of quantum hypothesis testing (see [15] for details). We also emphasize here that the entropy of a channel can be negative for some channels, but this negativity has an operational interpretation in terms of the channel merging protocol (see Remark 7 in this context).
The remainder of our paper contains arguments advocating for this definition of a channel's entropy. In the next section, we show that it satisfies the three basic axioms, put forward in [18], for any function to be called an entropy function for a quantum channel, including non-decrease under the action of a random unitary superchannel, additivity, and normalization. After that, we provide several alternate representations for the entropy of a channel, the most significant of which is the completely bounded entropy of [19]. Section III delivers an operational interpretation of a channel's entropy in terms of an information-theoretic task that we call quantum channel merging, which is a dynamical counterpart of the well known task of quantum state merging [20,21]. We calculate channel entropies for several example channels in Section IV, which include erasure, dephasing, depolarizing, and Werner-Holevo channels. In the same section, we introduce the energy-constrained and unconstrained entropies of a quantum channel and calculate them for thermal, amplifier, and additive-noise bosonic Gaussian channels. In Section V, we define the α-Rényi entropy of a channel, prove that it satisfies the basic axioms for certain values of the Rényi parameter α, and provide alternate representations for it. In Section VI, we define the min-entropy of a channel, establish that it satisfies the basic axioms, and provide alternate representations for it. In Section VII, we define the smoothed min-entropy of a channel, and then we prove an asymptotic equipartition property, which relates the smoothed min-entropy of a channel to its entropy. In Section VIII, we discuss other entropies of a channel, noting that several of them collapse to the (von Neumann) entropy of a channel. We finally conclude in Section IX with a summary and some open questions.
Note on related work -After completing the results in our related preprint [22], we noticed [23,Eq. (6)], in which Yuan proposed to define the entropy of a quantum channel in the same way as we have proposed in Definition 1. Yuan's work is now published as [24]. Remark 2 We note here that "the entropy of a channel" was also defined in [25,26], but the definition given there does not satisfy "reduction to states" or the basic axiom of normalization. For this reason, it cannot be considered an entropy function according to the approach of [18].

II. ENTROPY OF A QUANTUM CHANNEL
Proceeding with Definition 1 for the entropy of a quantum channel, we now establish several of its properties, and then we provide alternate representations for it.
A. Properties of the entropy of a quantum channel

In [18], it was advocated that a function of a quantum channel is an entropy function if it satisfies non-decrease under random unitary superchannels, additivity, and normalization. As shown in the next three subsections, the entropy of a channel, as given in Definition 1, satisfies all three axioms, and in fact, it satisfies stronger properties that imply these.
Before addressing the first axiom, let us first briefly review the notion of superchannels [27], which are linear maps that take as input a quantum channel and output a quantum channel. To define them, let $L(A \to B)$ denote the set of all linear maps from $L(A)$ to $L(B)$. Similarly, let $L(C \to D)$ denote the set of all linear maps from $L(C)$ to $L(D)$. Let $\Theta: L(A \to B) \to L(C \to D)$ denote a linear supermap, taking $L(A \to B)$ to $L(C \to D)$. A quantum channel is a particular kind of linear map, and any linear supermap $\Theta$ that takes as input an arbitrary quantum channel $\Psi_{A\to B} \in L(A \to B)$ and is required to output a quantum channel $\Phi_{C\to D} \in L(C \to D)$ should preserve the properties of complete positivity (CP) and trace preservation (TP). That is, the supermap should be CPTP preserving. Furthermore, for the supermap to be physical, the same should be true when it acts on subsystems of bipartite quantum channels, so that the supermap $\operatorname{id} \otimes \Theta$ should be CPTP preserving, where $\operatorname{id}$ represents an arbitrary identity supermap. A supermap satisfying this property is said to be completely CPTP preserving and is then called a superchannel. It was proven in [27] that any superchannel $\Theta: L(A \to B) \to L(C \to D)$ can be physically realized as follows. If

$\Theta[\Psi_{A\to B}] = \Phi_{C\to D}$ (7)

for an arbitrary input channel $\Psi_{A\to B} \in L(A \to B)$ and some output channel $\Phi_{C\to D} \in L(C \to D)$, then the physical realization of the superchannel $\Theta$ is as follows:

$\Phi_{C\to D} = \Omega_{BE\to D} \circ (\Psi_{A\to B} \otimes \operatorname{id}_E) \circ \Lambda_{C\to AE}$, (8)

where $\Lambda_{C\to AE}: L(C) \to L(AE)$ is a pre-processing channel, system $E$ corresponds to some memory or environment system, and $\Omega_{BE\to D}: L(BE) \to L(D)$ is a post-processing channel. A uniformity preserving superchannel $\Theta$ is a superchannel that takes the completely randomizing channel $\mathcal{R}_{A\to B}$ in (5) to another completely randomizing channel $\mathcal{R}_{C\to D}$, such that $|A| = |C|$ and $|B| = |D|$, i.e.,

$\Theta[\mathcal{R}_{A\to B}] = \mathcal{R}_{C\to D}$. (9)

For such superchannels, we have the following:

Proposition 3 Let $\mathcal{N}_{A\to B}$ be a quantum channel, and let $\Theta$ be a uniformity preserving superchannel as defined above.
Then the entropy of a channel does not decrease under the action of such a superchannel:

$H(\Theta[\mathcal{N}]) \geq H(\mathcal{N})$.

Proof. This follows from the fact that the channel relative entropy is non-increasing under the action of an arbitrary superchannel [18,23]. That is, for two channels $\mathcal{N}_{A\to B}$ and $\mathcal{M}_{A\to B}$, and a superchannel $\Xi$, the following inequality holds:

$D(\Xi[\mathcal{N}] \| \Xi[\mathcal{M}]) \leq D(\mathcal{N} \| \mathcal{M})$.

Applying this, we find that

$H(\Theta[\mathcal{N}]) = \log_2 |D| - D(\Theta[\mathcal{N}] \| \mathcal{R}_{C\to D}) = \log_2 |B| - D(\Theta[\mathcal{N}] \| \Theta[\mathcal{R}_{A\to B}]) \geq \log_2 |B| - D(\mathcal{N} \| \mathcal{R}_{A\to B}) = H(\mathcal{N})$.

The second equality follows by definition from (9).
In [18], a superchannel $\Upsilon$ was called a random unitary superchannel if its action on a channel $\mathcal{N}_{A\to B}$ can be written as

$\Upsilon[\mathcal{N}_{A\to B}] = \sum_x p_X(x)\, \mathcal{V}^x_{B\to D} \circ \mathcal{N}_{A\to B} \circ \mathcal{U}^x_{C\to A}$,

where $\mathcal{U}^x_{C\to A}$ and $\mathcal{V}^x_{B\to D}$ are unitary channels and $p_X(x)$ is a probability distribution. In [18], it was proved that a random unitary superchannel is a special kind of uniformity preserving superchannel. Thus, due to Proposition 3, it follows that the entropy of a channel, as given in Definition 1, satisfies the first axiom from [18] required for an entropy function.

Additivity
In this subsection, we prove that the entropy of a channel is additive, which is the second axiom proposed in [18] for a channel entropy function. The proof is related to many prior additivity results from [15,19,28-30].

Proposition 4 (Additivity) Let $\mathcal{N}$ and $\mathcal{M}$ be quantum channels. Then the channel entropy is additive in the following sense:

$H(\mathcal{N} \otimes \mathcal{M}) = H(\mathcal{N}) + H(\mathcal{M})$.

Proof. This can be understood as a consequence of the additivity results from [15,30], which in turn are related to the earlier additivity results from [19,28,29]. For channels $\mathcal{N}_{A_1\to B_1}$ and $\mathcal{M}_{A_2\to B_2}$, and corresponding randomizing channels $\mathcal{R}^{(1)}_{A_1\to B_1}$ and $\mathcal{R}^{(2)}_{A_2\to B_2}$, we have by definition that

$H(\mathcal{N} \otimes \mathcal{M}) = \log_2 |B_1 B_2| - D(\mathcal{N} \otimes \mathcal{M} \| \mathcal{R}^{(1)} \otimes \mathcal{R}^{(2)})$,

and so the result follows if

$D(\mathcal{N} \otimes \mathcal{M} \| \mathcal{R}^{(1)} \otimes \mathcal{R}^{(2)}) = D(\mathcal{N} \| \mathcal{R}^{(1)}) + D(\mathcal{M} \| \mathcal{R}^{(2)})$. (21)

Note that the inequality "≥" for (21) trivially follows, and so it remains to prove the inequality "≤" for (21). To this end, let $\psi_{RA_1A_2}$ be an arbitrary pure state, and define

$\theta_{R'A_1} \equiv \mathcal{M}_{A_2\to B_2}(\psi_{RA_1A_2})$,

where system $R' \equiv RB_2$. Then we find that

$D((\mathcal{N} \otimes \mathcal{M})(\psi_{RA_1A_2}) \| (\mathcal{R}^{(1)} \otimes \mathcal{R}^{(2)})(\psi_{RA_1A_2})) \leq D(\mathcal{N}_{A_1\to B_1}(\theta_{R'A_1}) \| \mathcal{R}^{(1)}_{A_1\to B_1}(\theta_{R'A_1})) + D(\mathcal{M} \| \mathcal{R}^{(2)}) \leq D(\mathcal{N} \| \mathcal{R}^{(1)}) + D(\mathcal{M} \| \mathcal{R}^{(2)})$.

The first inequality follows from the same steps given in the proof of [30, Lemma 38]. This concludes the proof.
Another approach to establishing additivity is to employ the first identity of Proposition 6 (in Section II B) and [28,Eq. (3.28)], the latter of which was independently formulated in [19,Section 2.3].

Reduction to states and normalization
We now prove that the entropy of a channel reduces to the entropy of a state if the channel is one that replaces the input with a given state.
Proposition 5 (Reduction to states) Let the channel $\mathcal{N}_{A\to B}$ be a replacer channel, defined such that $\mathcal{N}_{A\to B}(\rho_A) = \sigma_B$ for all states $\rho_A$ and some state $\sigma_B$. Then the following equality holds:

$H(\mathcal{N}) = H(\sigma_B)$.

Proof. For any input $\psi_{RA}$, the output is $\mathcal{N}_{A\to B}(\psi_{RA}) = \psi_R \otimes \sigma_B$, and we find that

$D(\mathcal{N}_{A\to B}(\psi_{RA}) \| \mathcal{R}_{A\to B}(\psi_{RA})) = D(\psi_R \otimes \sigma_B \| \psi_R \otimes \pi_B) = D(\sigma_B \| \pi_B) = \log_2 |B| - H(\sigma_B)$.

This implies that

$H(\mathcal{N}) = \log_2 |B| - D(\sigma_B \| \pi_B) = H(\sigma_B)$,

concluding the proof.
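The input-independence at the heart of this proof can be checked directly: for a replacer channel, the output conditional entropy $H(B|R)$ always equals $H(\sigma)$. A small numpy sketch (helper names ours):

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

rng = np.random.default_rng(0)

# a fixed (randomly chosen) output state sigma_B of the replacer channel
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
sigma = M @ M.conj().T
sigma /= np.trace(sigma).real

for _ in range(3):
    # random pure input psi_RA; the replacer outputs psi_R (x) sigma_B
    v = rng.normal(size=4) + 1j * rng.normal(size=4)
    v /= np.linalg.norm(v)
    t = v.reshape(2, 2)                  # index order (R, A)
    psi_R = t @ t.conj().T               # reduced state on R
    omega_RB = np.kron(psi_R, sigma)     # channel output psi_R (x) sigma_B
    HBR = entropy(omega_RB) - entropy(psi_R)
    print(abs(HBR - entropy(sigma)) < 1e-9)  # True, independent of the input
```

Since entropy is additive on product states, $H(RB) - H(R) = H(\sigma)$ for every input, so the infimum defining the channel entropy is trivial here.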
A final axiom (normalization) for a channel entropy function [18] is that it should be equal to zero for any channel that replaces the input with a pure state and it should be equal to the logarithm of the output dimension for any channel that replaces the input with the maximally mixed state. Clearly, Proposition 5 implies the normalization property if the replaced state is maximally mixed or pure.

B. Alternate representations for the entropy of a channel
The entropy of a quantum channel has at least three alternate representations, in terms of the completely bounded entropy of [19], the entropy gain of its complementary channel [29], and the maximum output entropy of the channel conditioned on its environment. We recall these various channel functions now.
Recall that the completely bounded entropy of a quantum channel $\mathcal{N}_{A\to B}$ is defined as [19]

$H_{\mathrm{CB,min}}(\mathcal{N}) \equiv \inf_{\rho_{RA}} H(B|R)_\omega$, (34)

where $H(B|R)_\omega \equiv H(BR)_\omega - H(R)_\omega$ is the conditional entropy of the state $\omega_{RB} = \mathcal{N}_{A\to B}(\rho_{RA})$ and the system $R$ is unbounded. However, due to data processing, purification, and the Schmidt decomposition theorem, it follows that

$H_{\mathrm{CB,min}}(\mathcal{N}) = \inf_{\psi_{RA}} H(B|R)_{\mathcal{N}(\psi)}$, (35)

where $\psi_{RA}$ is a pure bipartite state with system $R$ isomorphic to the channel input system $A$.
Due to the Stinespring representation theorem [31], every channel $\mathcal{N}_{A\to B}$ can be realized by the action of an isometric channel $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ and a partial trace as follows:

$\mathcal{N}_{A\to B}(\rho_A) = \operatorname{Tr}_E[\mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho_A)]$. (36)

If we instead trace over the channel output $B$, this realizes a complementary channel of $\mathcal{N}_{A\to B}$:

$\mathcal{N}^c_{A\to E}(\rho_A) \equiv \operatorname{Tr}_B[\mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho_A)]$. (37)

Using these notions, we can define the entropy gain of a complementary channel of $\mathcal{N}_{A\to B}$ as follows [29]:

$\inf_{\rho_A} [H(E)_\tau - H(A)_\rho]$, (38)

where $\tau_{BE} \equiv \mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho_A)$. The entropy gain has been investigated for infinite-dimensional quantum systems in [32-34]. We can also define the maximum output entropy of the channel conditioned on its environment as

$\sup_{\rho_A} H(B|E)_\tau$, (39)

where again $\tau_{BE} \equiv \mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho_A)$. We now prove that the entropy of a channel, as given in Definition 1, is equal to the completely bounded entropy, the entropy gain of a complementary channel, and the negation of the maximum output entropy of the channel conditioned on its environment.

Proposition 6
Let $\mathcal{N}_{A\to B}$ be a quantum channel, and let $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ be an isometric channel extending it, as in (36). Then

$H(\mathcal{N}) = H_{\mathrm{CB,min}}(\mathcal{N}) = \inf_{\rho_A} [H(E)_\tau - H(A)_\rho] = -\sup_{\rho_A} H(B|E)_\tau$, (40)

where $\tau_{BE} \equiv \mathcal{U}^{\mathcal{N}}_{A\to BE}(\rho_A)$, and

$|H(\mathcal{N})| \leq \log_2 |B|$. (41)

Proof. Using the identity $D(\rho \| c\sigma) = D(\rho \| \sigma) - \log_2 c$, for a constant $c > 0$, and the fact that the conditional entropy $H(B|R)_{\mathcal{N}(\psi)} = -D(\mathcal{N}_{A\to B}(\psi_{RA}) \| \psi_R \otimes I_B)$, we find that

$H(\mathcal{N}) = \log_2 |B| - \sup_{\psi_{RA}} D(\mathcal{N}_{A\to B}(\psi_{RA}) \| \psi_R \otimes \pi_B) = -\sup_{\psi_{RA}} D(\mathcal{N}_{A\to B}(\psi_{RA}) \| \psi_R \otimes I_B) = \inf_{\psi_{RA}} H(B|R)_{\mathcal{N}(\psi)} = H_{\mathrm{CB,min}}(\mathcal{N})$.
We can then conclude the dimension bound in (41) from the fact that the following bound holds uniformly for the conditional entropy: $|H(B|R)| \leq \log_2 |B|$. Defining $\tau_{RBE} = \mathcal{U}^{\mathcal{N}}_{A\to BE}(\psi_{RA})$, from the identity

$H(B|R)_\tau = H(E)_\tau - H(A)_\rho$,

which holds for $\rho_A = \operatorname{Tr}_R\{\psi_{RA}\}$ because the state $\tau_{RBE}$ is pure and $\tau_R = \psi_R$, we have that

$H(\mathcal{N}) = \inf_{\rho_A} [H(E)_\tau - H(A)_\rho]$.

We finally conclude that

$H(\mathcal{N}) = -\sup_{\rho_A} H(B|E)_\tau$,

which follows from the identity (duality of conditional entropy)

$H(B|R)_\tau = -H(B|E)_\tau$,

holding for the pure state $\tau_{RBE}$. This concludes the proof.
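The duality of conditional entropy used at the end of the proof can be checked numerically on a random pure tripartite state (a numpy sketch with our own helper names):

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

rng = np.random.default_rng(7)
dR, dB, dE = 2, 3, 4

# random pure state on R (x) B (x) E
psi = rng.normal(size=dR * dB * dE) + 1j * rng.normal(size=dR * dB * dE)
psi /= np.linalg.norm(psi)
t = psi.reshape(dR, dB, dE)

# marginals obtained by tracing out the remaining systems
rho_RB = np.einsum('rbe,sce->rbsc', t, t.conj()).reshape(dR * dB, dR * dB)
rho_BE = np.einsum('rbe,rcf->becf', t, t.conj()).reshape(dB * dE, dB * dE)
rho_R = np.einsum('rbe,sbe->rs', t, t.conj())
rho_E = np.einsum('rbe,rbf->ef', t, t.conj())

HBR = entropy(rho_RB) - entropy(rho_R)   # H(B|R)
HBE = entropy(rho_BE) - entropy(rho_E)   # H(B|E)
print(abs(HBR + HBE) < 1e-9)  # True: H(B|R) = -H(B|E) for a pure state
```

The identity holds because, for a pure tripartite state, $H(RB) = H(E)$ and $H(R) = H(BE)$.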

Remark 7
We note here, as observed in [19], that the dimension lower bound $H(\mathcal{N}) \geq -\log_2 |B|$ is saturated by the identity channel, while the dimension upper bound $H(\mathcal{N}) \leq \log_2 |B|$ is saturated for the completely randomizing (depolarizing) channel, which sends every state to the maximally mixed state. Also, the entropy $H(\mathcal{N})$ is equal to zero for a replacer channel that replaces the input with a pure quantum state. It is also known that the entropy of a channel is non-negative for all entanglement-breaking channels, as shown in [19]. This includes all classical channels. Thus, unlike the entropy of a quantum state, the entropy of a quantum channel can be negative. This negativity captures the ability of the channel to distill quantum entanglement, in a sense made precise by the quantum channel merging theorem stated as Theorem 10 in Section III. In the previous subsection, we saw that for a replacer channel with a pure output state, the entropy of a channel is zero. This replacer channel is also entanglement breaking. On the other hand, the identity channel is the least noisy channel, and therefore should have the least entropy possible. Indeed, as stated above, for the identity channel, our entropy function equals the negative of the logarithm of the dimension (which is the smallest possible value).

Corollary 8 For any quantum channel $\mathcal{N}_{A\to B}$,

$H(\mathcal{N}) \geq -\log_2 |A|$,

with equality if and only if $\mathcal{N}_{A\to B}$ is an isometry.

Proof. The proof that $-\log_2 |A|$ is the smallest possible value follows trivially from the well-known bound $D(\mathcal{N}_{A\to B}(\psi_{RA}) \| \mathcal{R}_{A\to B}(\psi_{RA})) \leq \log_2 |AB|$. Now, from the proposition above,

$H(\mathcal{N}) = \inf_{\psi_{RA}} H(B|R)_{\mathcal{N}(\psi)}$.

Therefore, the smallest possible value $-\log_2 |A|$ is achieved if and only if $\mathcal{N}_{A\to B}(\psi_{RA})$ is the maximally entangled state (recall $|R| = |A|$). This is only possible if $|B| \geq |A|$ and $\mathcal{N}$ is an isometry.
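The saturation at the lower end can be seen concretely for the identity qubit channel, whose output on a maximally entangled input is the input itself (a tiny numpy sketch; helper names ours):

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

# identity channel: output on a maximally entangled input is the input itself
v = np.array([1, 0, 0, 1]) / np.sqrt(2)
omega_RB = np.outer(v, v)          # pure, maximally entangled output state
omega_R = np.eye(2) / 2            # reduced state on R is maximally mixed
HBR = entropy(omega_RB) - entropy(omega_R)
print(round(HBR, 6))  # -1.0 = -log2|A|, the minimal value, achieved by an isometry
```

The conditional entropy reaches $-\log_2|A|$ exactly when the output is maximally entangled, in line with the equality condition of the corollary.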

III. QUANTUM CHANNEL MERGING
Given a bipartite state ρ BE , the goal of quantum state merging is for Bob to use forward classical communication to Eve, as well as entanglement, to merge his share of the state with Eve's share [20,21]. The optimal rate of entanglement consumed is equal to the conditional entropy H(B|E) ρ . Alternatively, the optimal rate of entanglement gained is equal to the conditional entropy H(B|R) ψ , where ψ RBE is a purification of ρ BE .
In this section, we define a task, called quantum channel merging, that can be considered a dynamical counterpart of state merging. Given a quantum channel $\mathcal{N}_{A\to B}$ with isometric extension $\mathcal{U}^{\mathcal{N}}_{A\to BE}$, the goal is for Bob to merge his share of the channel with Eve's share. We find here that the entanglement cost of the protocol is equal to $\sup_{\rho_A} H(B|E)_\tau$, the maximum output entropy of the channel conditioned on its environment. Equivalently, by employing (50), the entanglement gain of the protocol is equal to $H(\mathcal{N})$, the entropy of the channel $\mathcal{N}_{A\to B}$. Thus, the main result of this section is a direct operational interpretation of the entropy of a channel as the entanglement gain in quantum channel merging. We note here that the completely bounded entropy of [19] (i.e., entropy of a channel) was recently interpreted in terms of a cryptographic task in [35].
We now specify the quantum channel merging information-processing task in detail. Let N A→B be a quantum channel, and suppose that U N A→BE is an isometric channel extending it. Here, we think of the isometric channel U N A→BE as a broadcast channel (three-terminal device), which connects a source to the receivers Bob and Eve. Suppose that a source generates an arbitrary state ψ RA n and then sends the A systems through the isometric channel (U N A→BE ) ⊗n , which transmits the B systems to Bob and the E systems to Eve. The goal is for Bob to use free one-way local operations and classical communication (one-way LOCC) in order to generate ebits at the maximum rate possible, while also merging his systems with Eve's.
Let $n \in \mathbb{N}$, $M \in \mathbb{Q}$, and $\varepsilon \in [0,1]$. An $(n, M, \varepsilon)$ protocol for this task consists of a one-way LOCC channel, where $\Phi^K_{B_0E_0}$ and $\Phi^L_{B_1E_1}$ are maximally entangled states of Schmidt rank $K$ and $L$, respectively, and $M = L/K$, so that the number of ebits gained in the protocol is equal to $\log_2 M = \log_2 L - \log_2 K$. Figure 1 depicts the task of quantum channel merging.
Definition 9 (Q. channel merging capacity) A rate $R$ is achievable for quantum channel merging if for all $\varepsilon \in (0, 1]$, $\delta > 0$, and sufficiently large $n$, there exists an $(n, 2^{n[R-\delta]}, \varepsilon)$ protocol of the above form. The quantum channel merging capacity $C_M(\mathcal{N})$ is defined to be the supremum of all achievable rates:

$C_M(\mathcal{N}) \equiv \sup\{R : R \text{ is achievable}\}$.

Theorem 10 The quantum channel merging capacity of a channel $\mathcal{N}$ is equal to its entropy:

$C_M(\mathcal{N}) = H(\mathcal{N})$.

We provide a detailed proof of Theorem 10 in Appendix A.

IV. EXAMPLES
In this section, we provide formulas for the entropy of several fundamental channel models, including erasure channels, dephasing channels, depolarizing channels, and Werner-Holevo channels. We also define the energyconstrained and unconstrained entropies of a channel and determine formulas for them for common bosonic channel models, including thermal, amplifier, and additive-noise channels.

A. Finite-dimensional channels
A first observation to make is that, for any finite-dimensional channel, it is an "easy" optimization task to calculate its entropy. This is a consequence of the identity $H(\mathcal{N}) = -\sup_{\rho_A} H(B|E)_{\mathcal{U}(\rho)}$ from Proposition 6 and the concavity of conditional entropy [36,37] (in this context, see also [28, Eq. (3.19)]). Thus, one can exploit numerical optimizations to calculate it [38,39].
For channels with symmetry, it can be much easier to evaluate a channel's entropy, following from some observations from, e.g., [40, Section 6]. Let us begin by recalling the notion of a covariant channel $\mathcal{N}_{A\to B}$ [41]. For a group $G$ with unitary channel representations $\{\mathcal{U}^g_A\}_g$ and $\{\mathcal{V}^g_B\}_g$ acting on the input system $A$ and output system $B$ of the channel $\mathcal{N}_{A\to B}$, the channel $\mathcal{N}_{A\to B}$ is covariant with respect to the group $G$ if the following equality holds for all $g \in G$:

$\mathcal{N}_{A\to B} \circ \mathcal{U}^g_A = \mathcal{V}^g_B \circ \mathcal{N}_{A\to B}$. (57)

If the averaging channel is such that $\frac{1}{|G|} \sum_g \mathcal{U}^g_A(X) = \operatorname{Tr}[X]\, I_A / |A|$ (implementing a unitary one-design), then we simply say that the channel $\mathcal{N}_{A\to B}$ is covariant. It turns out that the entropy of a channel is simple to calculate for covariant channels, with the optimal $\psi_{RA}$ in (35) being the maximally entangled state, or equivalently, the optimal $\rho_A$ in $-\sup_{\rho_A} H(B|E)_{\mathcal{U}(\rho)}$ being the maximally mixed state.

Proposition 11
Let $\mathcal{N}_{A\to B}$ be a quantum channel that is covariant with respect to a group $G$, in the sense of (57), and let $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ be an isometric channel extending it. Then it suffices to perform the optimization for the entropy of a channel over states that respect the symmetry of the channel:

$H(\mathcal{N}) = -\sup_{\rho_A} H(B|E)_{\mathcal{U}(\mathcal{S}(\rho))}$, (58)

where the symmetrizing channel $\mathcal{S}_A \equiv \frac{1}{|G|} \sum_{g \in G} \mathcal{U}^g_A$. Thus, if a channel is covariant, then $H(\mathcal{N}) = -H(B|E)_{\mathcal{U}(\pi)}$; i.e., the optimal state $\rho_A$ is the maximally mixed state $\pi_A$.
Proof. First recall from Proposition 6 that $H(\mathcal{N}) = -\sup_{\rho_A} H(B|E)_{\mathcal{U}(\rho)}$.
FIG. 1. The goal of quantum channel merging is for Bob to merge his share of the channel with Eve's. Given a channel $\mathcal{N}_{A\to B}$, $\mathcal{U}^{\mathcal{N}}_{A\to BE}$ is an isometric channel extending it. By consuming a maximally entangled state $\Phi^K$ of Schmidt rank $K$ and applying a one-way LOCC protocol $P$, Bob and Eve can distill a maximally entangled state $\Phi^L$ of Schmidt rank $L$ and transfer Bob's systems $B^n$ to Eve, in such a way that any third party having access to the inputs $A^n$ and the outputs $B^n$ and $E^n$ would not be able to distinguish the difference between the ideal situation on the left and the simulation on the right. Theorem 10 states that the optimal asymptotic rate of entanglement gain is equal to the entropy of the channel $\mathcal{N}$.
If the channel $\mathcal{N}_{A\to B}$ is covariant as in (57), then it is known that there exists a unitary channel $\mathcal{W}^g_E$ such that [14,42]

$\mathcal{U}^{\mathcal{N}}_{A\to BE} \circ \mathcal{U}^g_A = (\mathcal{V}^g_B \otimes \mathcal{W}^g_E) \circ \mathcal{U}^{\mathcal{N}}_{A\to BE}$.

See also [43, Appendix A] for a simple proof. Then we find that

$H(B|E)_{\mathcal{U}(\rho)} = H(B|E)_{(\mathcal{V}^g \otimes \mathcal{W}^g)(\mathcal{U}(\rho))} = \frac{1}{|G|} \sum_g H(B|E)_{(\mathcal{V}^g \otimes \mathcal{W}^g)(\mathcal{U}(\rho))} = \frac{1}{|G|} \sum_g H(B|E)_{\mathcal{U}(\mathcal{U}^g(\rho))} \leq H(B|E)_{\mathcal{U}(\mathcal{S}(\rho))}$.

The first equality follows from invariance of conditional entropy under the action of a local unitary (the equality holds for all $g \in G$). The third equality follows from channel covariance. The inequality follows from concavity of conditional entropy [36,37].
A simple example of a channel that is covariant is the quantum erasure channel, defined as [44]

$\mathcal{E}_p(\rho) \equiv (1-p)\rho + p\,|e\rangle\langle e|$,

where $\rho$ is a $d$-dimensional input state, $p \in [0,1]$ is the erasure probability, and $|e\rangle\langle e|$ is a pure erasure state orthogonal to any input state, so that the output state has $d+1$ dimensions. A $d$-dimensional dephasing channel has the following action:

$\mathcal{D}_{\mathbf{p}}(\rho) = \sum_{x=0}^{d-1} p_x\, Z^x \rho (Z^x)^\dagger$,

where $\mathbf{p}$ is a vector containing the probabilities $p_x$ and $Z$ has the following action on the computational basis: $Z|x\rangle = e^{2\pi i x/d}|x\rangle$. This channel is covariant with respect to the Heisenberg-Weyl group of unitaries, which is well known to form a unitary one-design. A particular kind of Werner-Holevo channel performs the following transformation on a $d$-dimensional input state $\rho$ [45]:

$\mathcal{W}^{(d)}(\rho) \equiv \frac{1}{d-1}\left(\operatorname{Tr}[\rho]\, I - T(\rho)\right)$,

where $d \geq 2$ and $T$ denotes the transpose map $T(\cdot) = \sum_{i,j} |i\rangle\langle j| (\cdot) |i\rangle\langle j|$. As observed in [45, Section II], this channel is covariant. The $d$-dimensional depolarizing channel is a common model of noise in quantum information, transmitting the input state with probability $1-p \in [0,1]$ and replacing it with the maximally mixed state $\pi \equiv I/d$ with probability $p$:

$\Delta_p(\rho) \equiv (1-p)\rho + p\pi$.

By applying Proposition 11 and evaluating the resulting entropy $-H(B|E)$ for each of the above channels when the maximally mixed state $\pi$ is input, we arrive at the following formulas:

$H(\mathcal{E}_p) = H((p, 1-p)) - (1-p)\log_2 d$,
$H(\mathcal{D}_{\mathbf{p}}) = H(\mathbf{p}) - \log_2 d$,
$H(\mathcal{W}^{(d)}) = \log_2\!\left(\frac{d-1}{2}\right)$,
$H(\Delta_p) = -\left(1-p+\frac{p}{d^2}\right)\log_2\!\left(1-p+\frac{p}{d^2}\right) - (d^2-1)\,\frac{p}{d^2}\log_2\frac{p}{d^2} - \log_2 d$,

where $H(\mathbf{p})$ is the Shannon entropy of the probability vector $\mathbf{p}$. These formulas are plotted and interpreted in Figures 2-4.
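As a numerical cross-check of the covariance argument, the following numpy sketch (helper names ours) computes the entropy of the qubit depolarizing channel as the conditional entropy $H(B|R)$ of its Choi state, the maximally entangled input being optimal by Proposition 11, and recovers the extreme values $\mp\log_2 2 = \mp 1$ at $p = 0$ and $p = 1$:

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def depolarizing_entropy(p, d=2):
    """H(N) for the d-dimensional depolarizing channel, computed as the
    conditional entropy H(B|R) of its Choi state (the maximally entangled
    input is optimal by covariance)."""
    v = np.zeros(d * d)
    for i in range(d):
        v[i * d + i] = 1.0
    v /= np.sqrt(d)
    phi = np.outer(v, v)                          # maximally entangled state
    choi = (1 - p) * phi + p * np.eye(d * d) / d**2
    return entropy(choi) - np.log2(d)             # H(RB) - H(R), R maximally mixed

print(round(depolarizing_entropy(0.0), 6))  # -1.0: identity channel, -log2|A|
print(round(depolarizing_entropy(1.0), 6))  #  1.0: completely depolarizing, log2|B|
```

Intermediate values of $p$ interpolate between these extremes, matching the behavior described for the depolarizing channel.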

B. Energy-constrained entropy of a channel
We can define the energy-constrained entropy of a channel for infinite-dimensional systems, by employing the identity in Proposition 6 and the definition of conditional entropy from [46].
(Figure caption: The optimal input state is the maximally entangled state, so that the channel entropy is evaluated on the Choi state of the channel. When p = 0, the dephasing channel is the identity qubit channel and thus takes on its smallest value. When p = 1/2, the dephasing channel is a classical channel, so that its Choi state is maximally classically correlated. For such a state, $D(\mathcal{N}\|\mathcal{R}) = 1$, so that the channel entropy is equal to zero.)

To review the definition from [46], recall that the quantum entropy of a state ρ acting on a separable Hilbert
space is defined as

$H(\rho) \equiv \operatorname{Tr}[\eta(\rho)]$,

where $\eta(x) = -x \log_2 x$ if $x > 0$ and $\eta(0) = 0$. The trace in the above equation can be taken with respect to any countable orthonormal basis of $\mathcal{H}$ [47, Definition 2]. The quantum entropy is a non-negative, concave, lower semicontinuous function on $\mathcal{D}(\mathcal{H})$ [48]. It is also not necessarily finite (see, e.g., [49]). When $\rho_A$ is assigned to a system $A$, we write $H(A)_\rho \equiv H(\rho_A)$.

(Figure caption: When p = 0, the depolarizing channel is the identity qubit channel and thus takes on its smallest value. When p = 1, the depolarizing channel replaces the channel input with the maximally mixed state and thus takes on its maximal value.)

Recall that the relative entropy of two states $\rho$ and $\sigma$ acting on a separable Hilbert space is given by [50,51]

$D(\rho \| \sigma) \equiv \frac{1}{\ln 2} \sum_{i,j} |\langle \phi_i | \psi_j \rangle|^2\, p(i) \ln\!\left(\frac{p(i)}{q(j)}\right)$,

where $\rho = \sum_i p(i) |\phi_i\rangle\langle\phi_i|$ and $\sigma = \sum_j q(j) |\psi_j\rangle\langle\psi_j|$ are spectral decompositions of $\rho$ and $\sigma$ with $\{|\phi_i\rangle\}_i$ and $\{|\psi_j\rangle\}_j$ orthonormal bases. The prefactor $[\ln 2]^{-1}$ is there to ensure that the units of the quantum relative entropy are bits. For a bipartite state $\rho_{AB}$, the mutual information is defined as

$I(A;B)_\rho \equiv D(\rho_{AB} \| \rho_A \otimes \rho_B)$.

Finally, for a bipartite state $\rho_{AB}$ such that $H(A)_\rho < \infty$, the conditional entropy is defined as [46]

$H(A|B)_\rho \equiv H(A)_\rho - I(A;B)_\rho$,

and it is known that $|H(A|B)_\rho| \leq H(A)_\rho$ [46]. A Gibbs observable is a positive semi-definite operator $G$ acting on a separable Hilbert space such that $\operatorname{Tr}\{e^{-\beta G}\} < \infty$ for all $\beta > 0$ [14,52,53]. This condition for a Gibbs observable means that there is always a well defined thermal state.
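In finite dimensions, the conditional entropy defined through the mutual information coincides with the familiar formula $H(A|B) = H(AB) - H(B)$; a quick numerical check (numpy; helper names ours):

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

rng = np.random.default_rng(3)
dA, dB = 2, 3

# random full-rank bipartite state on A (x) B
M = rng.normal(size=(dA * dB, dA * dB)) + 1j * rng.normal(size=(dA * dB, dA * dB))
rho_AB = M @ M.conj().T
rho_AB /= np.trace(rho_AB).real
t = rho_AB.reshape(dA, dB, dA, dB)
rho_A = np.einsum('ibjb->ij', t)          # partial trace over B
rho_B = np.einsum('iaib->ab', t)          # partial trace over A

# I(A;B) = H(A) + H(B) - H(AB) is the finite-dimensional evaluation
# of D(rho_AB || rho_A (x) rho_B)
I_AB = entropy(rho_A) + entropy(rho_B) - entropy(rho_AB)
H_cond = entropy(rho_A) - I_AB            # definition H(A|B) = H(A) - I(A;B)
print(abs(H_cond - (entropy(rho_AB) - entropy(rho_B))) < 1e-9)  # True
```

The mutual-information form is the one that extends to infinite dimensions, since it stays well defined whenever $H(A)_\rho < \infty$, even when $H(AB)$ and $H(B)$ are individually infinite.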
Finally, we say that a quantum channel $\mathcal{N}_{A\to B}$ obeys the finite-output entropy condition [14,52,53] with respect to a Gibbs observable $G$ if for all $P \geq 0$, the following inequality holds:

$\sup_{\rho_A : \operatorname{Tr}\{G\rho_A\} \leq P} H(\mathcal{N}_{A\to B}(\rho_A)) < \infty$.

We now define the energy-constrained and unconstrained channel entropy as follows:

Definition 12 Let $\mathcal{N}_{A\to B}$ be a quantum channel that satisfies the finite-output entropy condition with respect to a Gibbs observable $G$. For $P \geq 0$, the energy-constrained entropy of $\mathcal{N}_{A\to B}$ is defined as

$H(\mathcal{N}, G, P) \equiv \inf_{\psi_{RA} : \operatorname{Tr}\{G\psi_A\} \leq P} H(B|R)_\omega$,

where $\omega_{RB} \equiv \mathcal{N}_{A\to B}(\psi_{RA})$ and the optimization is with respect to all pure bipartite states with system $R$ isomorphic to system $A$. The unconstrained entropy of $\mathcal{N}_{A\to B}$ with respect to $G$ is then defined as

$H(\mathcal{N}) \equiv \inf_{P \geq 0} H(\mathcal{N}, G, P)$.

C. Bosonic Gaussian channels

In this section, we evaluate the energy-constrained and unconstrained entropy of several important bosonic Gaussian channels [14,54], including the thermal, amplifier, and additive-noise channels. Here we take the Gibbs observable to be the photon number operator $\hat{n}$ [14,54], and we note that each of these channels satisfies the finite-output entropy condition mentioned above. From a practical perspective, we should be most interested in these particular single-mode bosonic Gaussian channels, as these are of the greatest interest in applications, as stressed in [14, Section 12.6.3] and [55, Section 3.5]. Each of these is defined respectively by the following Heisenberg input-output relations:

$\hat{b} = \sqrt{\eta}\,\hat{a} + \sqrt{1-\eta}\,\hat{e}$, (79)
$\hat{b} = \sqrt{G}\,\hat{a} + \sqrt{G-1}\,\hat{e}^\dagger$, (80)
$\hat{b} = \hat{a} + x + ip$, (81)

where $\hat{a}$, $\hat{b}$, and $\hat{e}$ are the field-mode annihilation operators for the sender's input, the receiver's output, and the environment's input of these channels, respectively. The channel in (79) is a thermalizing channel, in which the environmental mode is prepared in a thermal state $\theta(N_B)$ of mean photon number $N_B \geq 0$, defined as

$\theta(N_B) \equiv \frac{1}{N_B + 1} \sum_{n=0}^{\infty} \left(\frac{N_B}{N_B + 1}\right)^n |n\rangle\langle n|$,

where $\{|n\rangle\}_{n=0}^{\infty}$ is the orthonormal, photonic number-state basis. When $N_B = 0$, $\theta(N_B)$ reduces to the vacuum state, in which case the resulting channel in (79) is called the pure-loss channel.
The parameter $\eta \in (0,1)$ is the transmissivity of the channel, representing the average fraction of photons making it from the input to the output of the channel. Let $\mathcal{L}_{\eta,N_B}$ denote this channel.
The channel in (80) is an amplifier channel, and the parameter $G > 1$ is its gain. For this channel, the environment is prepared in the thermal state $\theta(N_B)$. If $N_B = 0$, the amplifier channel is called the pure-amplifier channel. Let $\mathcal{A}_{G,N_B}$ denote this channel.
Finally, the channel in (81) is an additive-noise channel, representing a quantum generalization of the classical additive white Gaussian noise channel. In (81), $x$ and $p$ are zero-mean, independent Gaussian random variables each having variance $\xi \geq 0$. Let $\mathcal{T}_\xi$ denote this channel. Note that the additive-noise channel arises from the thermal channel in the limit $\eta \to 1$ and $N_B \to \infty$, with $(1-\eta)N_B \to \xi$. Kraus representations for the channels in (79)-(81) are available in [57], which can be helpful for further understanding their action on input quantum states.
All of the above channels are phase-insensitive, or phase-covariant, Gaussian channels [14,54]. Let N_S ≥ 0. Since the function ρ ↦ H(B|E)_{U(ρ)} that we are evaluating in sup_{ρ : Tr{n̂ρ} ≤ N_S} H(B|E)_{U(ρ)} is concave in the input and invariant under local unitaries, [58, Remark 22] applies, implying that the optimal input state for the entropies of these channels is the bosonic thermal state θ(N_S). We then find, by employing well-known entropy formulas from [59,60] (see also [61] in this context), expressions for H(L_{η,N_B}, n̂, N_S), H(A_{G,N_B}, n̂, N_S), and H(T_ξ, n̂, N_S) in terms of the bosonic entropy function g₂ defined in (210). Note that we arrived at the formula for H(T_ξ, n̂, N_S) by considering the limit discussed above. Furthermore, by the same reasoning as given in [58, Section 6], these functions are decreasing with increasing N_S, and so the unconstrained entropies are given by the limits H(L_{η,N_B}) = lim_{N_S→∞} H(L_{η,N_B}, n̂, N_S), H(A_{G,N_B}) = lim_{N_S→∞} H(A_{G,N_B}, n̂, N_S), and H(T_ξ) = lim_{N_S→∞} H(T_ξ, n̂, N_S), which lead to the formulas for the unconstrained entropies of the channels given in (95)-(97). These formulas are plotted and interpreted in Figures 5-7. A Mathematica file is available with the arXiv posting of this paper to automate these calculations, but we note here that the expansion g₂(x) = log₂(x) + 1/ln 2 + O(1/x) is helpful for this purpose. We also note that the formulas in (95)-(96) were presented in [62, Eq. (2)] and the formula in (97) was presented in [59, Section V].
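To make the large-x expansion above concrete, here is a short sketch; we assume that g₂ is the usual bosonic entropy function g₂(x) = (x+1)log₂(x+1) − x log₂ x, i.e., the entropy of a thermal state with mean photon number x (this identification of the definition in (210) is our assumption):

```python
import numpy as np

def g2(x):
    """Bosonic entropy function: von Neumann entropy (in bits) of a
    thermal state with mean photon number x."""
    if x == 0:
        return 0.0
    return (x + 1) * np.log2(x + 1) - x * np.log2(x)

# Leading terms of the expansion g2(x) = log2(x) + 1/ln(2) + O(1/x).
approx = lambda x: np.log2(x) + 1.0 / np.log(2.0)
```

Already at x = 1000 the two expressions agree to roughly three decimal places, which is why the expansion is convenient for evaluating the unconstrained-entropy limits.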

V. RÉNYI ENTROPY OF A QUANTUM CHANNEL
Generalizing the von Neumann entropy of a quantum state, the Rényi entropy finds extensive application in physics and information theory. Given a pure bipartite state, the Rényi entropy of the reduced state is an entanglement measure, which finds application in conformal field theory [63], holography [64], and black holes [65]. The full family of Rényi entropies determines the entanglement spectrum. The Rényi entropy also has an information-theoretic meaning in the expression for the error exponent of entanglement concentration [66] and quantum data compression [67], indicating the exponential rate at which errors in these settings decay to zero. As such, it is worthwhile to understand the Rényi entropy of a channel as a generalization of the Rényi entropy of a state.
In this section, we define the Rényi entropy of a channel, following the same approach discussed in the introduction. That is, we first write the Rényi entropy of a state as the difference of the number of physical qubits and the Rényi relative entropy of the state to the maximally mixed state. Then we define the Rényi entropy of a channel in the same way as in Definition 1, but replacing the channel relative entropy with the sandwiched Rényi channel relative entropy from [15].
The Rényi entropy of a quantum state ρ_A of system A is defined for α ∈ (0, 1) ∪ (1, ∞) as

H_α(A)_ρ ≡ (α/(1−α)) log₂ ‖ρ_A‖_α,

where ‖X‖_α ≡ [Tr{|X|^α}]^{1/α} and |X| ≡ √(X†X) for an operator X. The Rényi relative entropy of quantum states can be defined in two different ways, known as the Petz-Rényi relative entropy [68,69] and the sandwiched Rényi relative entropy [70,71]. The sandwiched Rényi relative entropy is defined for α ∈ (0, 1) ∪ (1, ∞), a state ρ, and a positive semi-definite operator σ as

D_α(ρ‖σ) ≡ (1/(α−1)) log₂ Tr{(σ^{(1−α)/2α} ρ σ^{(1−α)/2α})^α},   (100)

whenever either α ∈ (0, 1), or supp(ρ) ⊆ supp(σ) and α > 1. Otherwise, it is set to +∞. The sandwiched Rényi relative entropy obeys the data-processing inequality for ρ and σ as above, a quantum channel N, and α ∈ [1/2, 1) ∪ (1, ∞) [72] (see also [70,71,73-75]):

D_α(ρ‖σ) ≥ D_α(N(ρ)‖N(σ)).   (101)

It converges to the quantum relative entropy in the limit α → 1 [70,71]: lim_{α→1} D_α(ρ‖σ) = D(ρ‖σ). By inspection, the Rényi entropy of a state can be written as

H_α(A)_ρ = log₂|A| − D_α(ρ_A‖π_A).

The sandwiched Rényi channel divergence of channels N_{A→B} and M_{A→B} is defined for α ∈ [1/2, 1) ∪ (1, ∞) as [15]

D_α(N‖M) ≡ sup_{ρ_RA} D_α(N_{A→B}(ρ_RA)‖M_{A→B}(ρ_RA)),

where the optimization is with respect to bipartite states ρ_RA of a reference system R of arbitrary size and the channel input system A. Due to state purification, the data-processing inequality in (101), and the Schmidt decomposition theorem, it suffices to optimize over states ρ_RA that are pure and such that system R is isomorphic to system A.
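The definitions above can be checked numerically; the following sketch (our own, with assumed helper names) computes the sandwiched Rényi relative entropy, verifies the data-processing inequality in (101) for a partial trace, and checks the α → 1 limit:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow, logm

def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Renyi relative entropy D_alpha(rho || sigma), base 2."""
    s = mpow(sigma, (1.0 - alpha) / (2.0 * alpha))
    return np.log2(np.trace(mpow(s @ rho @ s, alpha)).real) / (alpha - 1.0)

def rel_entropy(rho, sigma):
    """Quantum relative entropy D(rho || sigma), base 2 (full-rank inputs)."""
    return np.trace(rho @ (logm(rho) - logm(sigma))).real / np.log(2.0)

def trace_out_second(rho, d1, d2):
    """Partial trace over the second factor of a d1*d2-dimensional system."""
    return np.einsum('ijkj->ik', rho.reshape(d1, d2, d1, d2))

rng = np.random.default_rng(1)

def random_state(d):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    r = g @ g.conj().T
    return r / np.trace(r).real

rho, sigma = random_state(4), random_state(4)
```

Partial trace is a particular channel, so data processing applies to it; the α → 1 check is only approximate at finite α, as expected.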
We now define the Rényi entropy of a quantum channel as follows:

Definition 13 (Rényi entropy of a quantum channel) Let N_{A→B} be a quantum channel. For α ∈ [1/2, 1) ∪ (1, ∞), the Rényi entropy of the channel N is defined as

H_α(N) ≡ log₂|B| − D_α(N‖R),

where R_{A→B} is the completely randomizing channel from (5).
We remark here that D_α(N‖R), for α > 1, has an operational interpretation as the strong converse exponent for discrimination of the channel N_{A→B} from the completely randomizing channel R_{A→B}, when considering any possible channel discrimination strategy [15].
One could alternatively define a different Rényi entropy of a channel according to the above recipe, but in terms of the Petz-Rényi relative entropy. However, it is unclear whether the additivity property is generally satisfied for the resulting Rényi entropy of a channel, and so we do not consider it further here, instead leaving this question open.

A. Properties of the Rényi entropy of a quantum channel
The Rényi entropy of a channel obeys the three desired axioms from [18]; in fact, the proofs are essentially the same as the previous ones, now using properties of the sandwiched Rényi relative entropy.

Proposition 14
Let N_{A→B} be a quantum channel, and let Θ be a uniformity preserving superchannel as defined above. Then for all α ∈ [1/2, 1) ∪ (1, ∞):

H_α(Θ(N)) ≥ H_α(N).

Proof. We follow the same steps as in (12)-(16), but making the substitutions H → H_α and D → D_α. Also, we use the fact that, for α ∈ [1/2, 1) ∪ (1, ∞), the sandwiched Rényi channel divergence does not increase under the action of a superchannel, as shown in [18].
Proposition 15 (Additivity) Let N and M be quantum channels. Then the channel Rényi entropy is additive in the following sense for α ∈ (1, ∞):

H_α(N ⊗ M) = H_α(N) + H_α(M).

Proof. The proof here follows the same approach given in the proof of Proposition 4, making the substitutions H → H_α and D → D_α. The steps in (24)-(28) follow from the same steps given in the proof of Proposition 41 of [30], which in turn rely upon the additivity result from [19]. See also [15] in this context.

Proposition 16 (Reduction to states)
Let the channel N_{A→B} be a replacer channel, defined such that N_{A→B}(ρ_A) = σ_B for all states ρ_A and some state σ_B. Then the following equality holds for all α ∈ (0, 1) ∪ (1, ∞):

H_α(N) = H_α(B)_σ.

Proof. The proof is essentially the same as the proof of Proposition 5, making the substitutions H → H_α and D → D_α.
We can then conclude that the Rényi entropy of a channel satisfies the normalization axiom from the fact that H_α(B)_σ = log₂|B| if σ_B is maximally mixed and H_α(B)_σ = 0 if σ_B is pure.

Just as we showed in Section II B that there are alternate representations for the entropy of a quantum channel, here we do the same for the Rényi entropy of a channel. We define the conditional Rényi entropy of a bipartite state ρ_AB as

H_α(A|B)_ρ ≡ −inf_{σ_B} D_α(ρ_AB‖I_A ⊗ σ_B),

where D_α(ρ‖σ) is the sandwiched Rényi relative entropy from (100) and the optimization is with respect to states σ_B. The conditional Petz-Rényi entropy of a bipartite state ρ_AB is defined as

H̄_α(A|B)_ρ ≡ −inf_{σ_B} D̄_α(ρ_AB‖I_A ⊗ σ_B),

where the Petz-Rényi relative entropy D̄_α(ρ‖σ) is defined for α ∈ (0, 1) ∪ (1, ∞) as [68,69]

D̄_α(ρ‖σ) ≡ (1/(α−1)) log₂ Tr{ρ^α σ^{1−α}},

whenever either α ∈ (0, 1), or supp(ρ) ⊆ supp(σ) and α > 1. Otherwise, it is set to +∞. The Petz-Rényi relative entropy obeys the data-processing inequality for ρ and σ as above, a quantum channel N, and α ∈ (0, 1) ∪ (1, 2] [68,69]:

D̄_α(ρ‖σ) ≥ D̄_α(N(ρ)‖N(σ)).

The completely bounded 1 → α norm ‖N_{A→B}‖_{CB,1→α} of a quantum channel is defined for α ≥ 1 as in [19], via an optimization with respect to a density operator ρ_R, where Γ_RA ≡ |Γ⟩⟨Γ|_RA denotes the projection onto the following maximally entangled vector:

|Γ⟩_RA ≡ Σ_i |i⟩_R |i⟩_A,

where {|i⟩_R}_i and {|i⟩_A}_i are orthonormal bases and system R is isomorphic to the channel input system A.
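For comparison with the sandwiched quantity, the Petz-Rényi relative entropy defined above can also be evaluated directly; a small sketch of our own (helper names illustrative) verifies the α → 1 limit and data processing at α = 1.5 ∈ (1, 2] under a depolarizing channel:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow, logm

def petz_renyi(rho, sigma, alpha):
    """Petz-Renyi relative entropy: log2(Tr[rho^a sigma^(1-a)]) / (a - 1)."""
    val = np.trace(mpow(rho, alpha) @ mpow(sigma, 1.0 - alpha)).real
    return np.log2(val) / (alpha - 1.0)

def rel_entropy(rho, sigma):
    """Quantum relative entropy, base 2 (full-rank inputs)."""
    return np.trace(rho @ (logm(rho) - logm(sigma))).real / np.log(2.0)

def depolarize(rho, p=0.5):
    """Depolarizing channel: (1 - p) rho + p Tr[rho] I/d."""
    d = rho.shape[0]
    return (1.0 - p) * rho + p * np.trace(rho).real * np.eye(d) / d

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.4, -0.1], [-0.1, 0.6]])
```

The depolarizing map is a channel, so the monotonicity check below is an instance of the data-processing inequality in the stated range α ∈ (0, 1) ∪ (1, 2].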
We can now state the alternate representations for the Rényi entropy of a channel:

Proposition 17 Let N_{A→B} be a quantum channel, and let U^N_{A→BE} be an isometric channel extending it, as in (36). Then for α ∈ (0, 1) ∪ (1, ∞),

H_α(N) = inf_{ψ_RA} H_α(B|R)_ω = −sup_{ρ_A} H̄_β(B|E)_τ,

along with the dimension bounds −log₂|A| ≤ H_α(N) ≤ log₂|B|, where the first optimization is with respect to bipartite pure states with system R isomorphic to system A, ω_RB ≡ N_{A→B}(ψ_RA), τ_BE ≡ U^N_{A→BE}(ρ_A), and β = 1/α.
For α ∈ (1, ∞), we have that

H_α(N) = (α/(1−α)) log₂ ‖N_{A→B}‖_{CB,1→α}.

Proof. To establish the first equality, we follow the same reasoning as in (42), and we employ the identity from [76, Theorem 2]. To establish the dimension bounds, consider from data processing that the conditional entropies appearing above are bounded from below and from above, where the second inequality follows from a dimension bound for the conditional Petz-Rényi entropy. The first inequality is stated in [76, Corollary 4], and the second follows from data processing of the Petz-Rényi relative entropy under measurements, which holds for β ∈ (0, 1) ∪ (1, ∞), as shown in [77, Section 2.2] (note that a measurement in the eigenbasis of τ_B combined with the partial trace over system E is a particular kind of measurement).
To establish the connection to the completely bounded norm for α > 1, we invoke [15, Lemma 8] to find that the channel divergence D_α(N‖R) can be written in terms of ‖N_{A→B}‖_{CB,1→α}, concluding the proof.
Again, the dimension lower bound is saturated for the identity channel, while the dimension upper bound is saturated for the completely depolarizing channel.

VI. MIN-ENTROPY OF A QUANTUM CHANNEL
The min-entropy of a quantum state ρ_A of a system A is defined as [78]

H_min(A)_ρ ≡ −log₂ ‖ρ_A‖_∞,

where ‖ρ_A‖_∞ is the largest eigenvalue of ρ_A. It has found extensive application in the context of quantum cryptography [78]. The max-relative entropy of a state ρ with a positive semi-definite operator σ is defined as [79]

D_max(ρ‖σ) ≡ log₂ inf{λ ≥ 0 : ρ ≤ λσ},

whenever supp(ρ) ⊆ supp(σ), and otherwise, it is set to +∞. The max-relative entropy was recently given an information-theoretic meaning as the distinguishability cost of two quantum states [80]. It is known that [70]

D_max(ρ‖σ) = lim_{α→∞} D_α(ρ‖σ).   (135)
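For states, both D_max and the limit in (135) are easy to check numerically; a brief sketch of our own (commuting matrices here for simplicity, though the formulas are general):

```python
import numpy as np
from scipy.linalg import fractional_matrix_power as mpow

def d_max(rho, sigma):
    """Max-relative entropy: log2 of the largest eigenvalue of
    sigma^(-1/2) rho sigma^(-1/2), for sigma of full support."""
    s = mpow(sigma, -0.5)
    return float(np.log2(np.linalg.eigvalsh(s @ rho @ s).max()))

def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Renyi relative entropy, base 2."""
    s = mpow(sigma, (1.0 - alpha) / (2.0 * alpha))
    return np.log2(np.trace(mpow(s @ rho @ s, alpha)).real) / (alpha - 1.0)

rho = np.diag([0.7, 0.3])
pi = np.eye(2) / 2.0  # maximally mixed state
```

For this example D_max(ρ‖π) = log₂(1.4), and the sandwiched quantity at large α approaches it from below, as the monotone limit in (135) dictates.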
Observe that the min-entropy of a quantum state ρ_A can be written as the difference of the number of physical qubits of the system A and the max-relative entropy of ρ_A to the maximally mixed state π_A:

H_min(A)_ρ = log₂|A| − D_max(ρ_A‖π_A).

Thus, following the spirit of previous developments, we define the min-entropy of a channel as follows:

Definition 18 (Min-entropy of a quantum channel)
We define the min-entropy of a quantum channel N_{A→B} according to the recipe given in the introduction of our paper:

H_min(N) ≡ log₂|B| − D_max(N‖R),

where D_max(N‖R) is the max-channel divergence [15,16] and R_{A→B} is the completely randomizing channel from (5).
The max-channel divergence is defined for two arbitrary channels N_{A→B} and M_{A→B} as [15,16]

D_max(N‖M) ≡ sup_{ρ_RA} D_max(N_{A→B}(ρ_RA)‖M_{A→B}(ρ_RA)) = D_max(N_{A→B}(Φ_RA)‖M_{A→B}(Φ_RA)).   (139)

The latter equality, stating that an optimal state is the maximally entangled state Φ_RA, was proved in [30, Lemma 12] (see also [81, Eq. (45)] and [30, Remark 13] in this context). In fact, an optimal state is any pure bipartite state with full Schmidt rank (i.e., whose reduced state has full support).
Due to the limit in (135) and the equality in (139), it follows that

D_max(N‖R) = lim_{α→∞} D_α(N‖R).

As such, we can immediately conclude that the min-entropy of a channel H_min(N) is equal to the following limit:

H_min(N) = lim_{α→∞} H_α(N),   (141)

and that it satisfies non-decrease under a uniformity preserving superchannel, additivity, and reduction to states (i.e., for a replacer channel, it reduces to the min-entropy of the replacing state), which, as stated previously, imply the three axioms from [18].

A. Alternate representation for the min-entropy of a channel in terms of conditional min-entropies
The conditional min-entropy of a bipartite quantum state ρ_AB is defined as [78]

H_min(A|B)_ρ ≡ −inf_{σ_B} D_max(ρ_AB‖I_A ⊗ σ_B),

where the optimization is with respect to states σ_B. We can also define the following related quantity:

H_min(A|B)_{ρ|σ} ≡ −D_max(ρ_AB‖I_A ⊗ σ_B),

and clearly we have that H_min(A|B)_ρ = sup_{σ_B} H_min(A|B)_{ρ|σ}. The identities in (35) and (40), as well as the definition of conditional min-entropy, inspire the following quantity:

H↑_min(N) ≡ inf_{ψ_RA} H_min(B|R)_ω.

In the above, ω_RB ≡ N_{A→B}(ψ_RA) and ψ_RA is a pure state with system R isomorphic to the channel input system A. This quantity might seem different from the min-entropy of a channel, but the following proposition states that H↑_min(N) is actually equal to the min-entropy of the channel H_min(N), thus simplifying the notion of min-entropy of a quantum channel:

Proposition 19 Let N_{A→B} be a quantum channel. Then

H_min(N) = inf_{ψ_RA} H_min(B|R)_{ω|ψ} = H_min(B|R)_{Φ^N|π} = H↑_min(N),

where ω_RB ≡ N_{A→B}(ψ_RA) and ψ_RA is a pure state with system R isomorphic to the channel input system A. Also, the state Φ^N_RB = N_{A→B}(Φ_RA) is the Choi state of the channel.
Proof. The first equality follows from the same steps as in the proof of Proposition 17 (see the reasoning around (118)). The second equality follows from the observation in (139). The proof of the equality H_min(N) = H↑_min(N) follows from semi-definite programming duality, similar to what was done previously for conditional min-entropy in [82]. Considering the innermost part of the optimization in the last line as a semi-definite program, its dual is given by (155). Now let us write the pure state ψ_RA in terms of its Schmidt decomposition. Due to the fact that the set of pure states ψ_RA with full-rank reduced state ψ_R is dense in the set of all pure states, it suffices to optimize over these. This means that we can rewrite the last line of the first block (without the negative logarithm) accordingly, and we then find that the resulting expression simplifies. The first equality in that simplification follows because we are maximizing over both ρ_R and X_RB, and the objective function only increases by taking X_R = ρ_R, with a maximal value of one for the trace. So we conclude that H_min(N) = H↑_min(N), where Φ^N = N_{A→B}(Φ_RA) and the last equality follows from (149).
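For intuition about the optimization in the proof above, consider the classical special case, where the conditional min-entropy has the well-known closed form H_min(A|B) = −log₂ Σ_b max_a p(a, b) (the negative logarithm of the optimal guessing probability). A small sketch of our own recovers this by brute force over the conditioning distribution σ_B:

```python
import numpy as np

# Joint distribution p(a, b), playing the role of a classical-classical state.
p = np.array([[0.4, 0.1],
              [0.2, 0.3]])  # rows indexed by a, columns by b

# Closed form: guessing probability sum_b max_a p(a, b).
h_min_closed = -np.log2(p.max(axis=0).sum())

# Brute force over sigma_B = (q, 1 - q):
# H_min(A|B) = sup_sigma [-log2 max_{a,b} p(a, b) / sigma(b)].
qs = np.linspace(1e-6, 1.0 - 1e-6, 20001)
h_min_brute = max(
    -np.log2(max(p[:, 0].max() / q, p[:, 1].max() / (1.0 - q))) for q in qs
)
```

The optimal σ_B weights each b in proportion to max_a p(a, b), mirroring the role of the dual optimizer in the semi-definite program.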

B. Relation of min-entropy of a channel to its extended min-entropy
The extended min-entropy of a channel is defined as in [18], in terms of the conditional min-entropy of the Choi state Φ^N_RB of the channel.

VII. ASYMPTOTIC EQUIPARTITION PROPERTY
The smoothed conditional min-entropy of a bipartite state ρ_AB is defined for ε ∈ (0, 1) as (see, e.g., [83])

H^ε_min(A|B)_ρ ≡ sup_{ρ̃_AB : P(ρ_AB, ρ̃_AB) ≤ ε} H_min(A|B)_ρ̃,

where the optimization is with respect to all subnormalized states ρ̃_AB (satisfying ρ̃_AB ≥ 0, Tr{ρ̃_AB} ≤ 1, and ρ̃_AB ≠ 0), and the sine distance (also called purified distance) of quantum states ρ and σ [84-87] is defined in terms of the fidelity [88] as

P(ρ, σ) ≡ √(1 − F(ρ, σ)),   F(ρ, σ) ≡ ‖√ρ√σ‖₁².

The definition of fidelity is generalized to subnormalized states ω and τ as follows [89]:

F(ω, τ) ≡ F(ω ⊕ (1 − Tr{ω}), τ ⊕ (1 − Tr{τ})),

where the right-hand side is the usual fidelity of states (that is, we just add an extra dimension to ω and τ and complete them to states). The smoothed conditional min-entropy satisfies the following asymptotic equipartition property [90] (see also [83]), which is one way that it connects with the conditional entropy of ρ_AB:

lim_{n→∞} (1/n) H^ε_min(A^n|B^n)_{ρ^⊗n} = H(A|B)_ρ for all ε ∈ (0, 1).   (173)

The purified channel divergence of two channels N_{A→B} and M_{A→B} is defined as [16]

P(N, M) ≡ sup_{ρ_RA} P(N_{A→B}(ρ_RA), M_{A→B}(ρ_RA)).

Again, due to state purification, the data-processing inequality for P(ρ, σ), and the Schmidt decomposition theorem, it suffices to optimize over states ρ_RA that are pure and such that system R is isomorphic to system A. We then use this notion for smoothing the min-entropy of a channel:
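The fidelity and sine distance just defined are straightforward to compute; a short sketch of our own (function names illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

def fidelity(rho, sigma):
    """F(rho, sigma) = || sqrt(rho) sqrt(sigma) ||_1^2, trace norm via singular values."""
    m = sqrtm(rho) @ sqrtm(sigma)
    return float(np.linalg.svd(m, compute_uv=False).sum() ** 2)

def sine_distance(rho, sigma):
    """Sine (purified) distance P(rho, sigma) = sqrt(1 - F(rho, sigma))."""
    return float(np.sqrt(max(0.0, 1.0 - fidelity(rho, sigma))))

rho = np.diag([0.99, 0.01])
sigma = np.diag([0.01, 0.99])
```

Identical states have distance zero, while nearly orthogonal states have distance close to one, consistent with P being a metric taking values in [0, 1].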

Definition 20 (Smoothed min-entropy of a channel)
The smoothed min-entropy of a channel is defined for ε ∈ (0, 1) as

H^ε_min(N) ≡ sup_{Ñ : P(N, Ñ) ≤ ε} H_min(Ñ),   (174)

where P(N, Ñ) is the purified channel divergence [16].
In the following theorem, we prove that the smoothed min-entropy of a channel satisfies an asymptotic equipartition theorem that generalizes (173).

Theorem 21 (Asymptotic equipartition property)
For all ε ∈ (0, 1), the following inequality holds:

liminf_{n→∞} (1/n) H^ε_min(N^⊗n) ≥ H(N).   (176)

We also have that

lim_{ε→0} limsup_{n→∞} (1/n) H^ε_min(N^⊗n) ≤ H(N).   (177)

Proof. We first prove the inequality in (176). Let ω_{R^nA^n} denote the de Finetti state [91], defined as

ω_{R^nA^n} ≡ ∫ (σ_RA)^⊗n d(σ_RA),

where σ_RA is a pure state with system R isomorphic to the channel input system A, and d(σ_RA) denotes the Haar measure on pure states. This state is the maximally mixed state of the symmetric subspace of the systems (RA)^n, and it is permutation invariant [92]. That is, for a unitary channel W^π_{R^n} ⊗ W^π_{A^n} corresponding to a permutation π, we have that ω_{R^nA^n} = (W^π_{R^n} ⊗ W^π_{A^n})(ω_{R^nA^n}) for all π ∈ S_n, with S_n denoting the symmetric group.
Let ω_{R′R^nA^n} denote a purification of the de Finetti state, with the purifying system R′ satisfying the inequality |R′| ≤ (n+1)^{|A|²−1} [91]. The reduced state ω_{A^n} is permutation invariant and has full rank. The latter follows because the set of pure states ψ_RA with a full-rank reduced density operator ψ_R is dense in the set of all pure states, and tensor products of full-rank states are full rank. Let ω^{Ñ^n}_{R′R^nB^n} denote the state resulting from the action of the quantum channel Ñ^n_{A^n→B^n} on the input state ω_{R′R^nA^n}, and let ω^{N^⊗n}_{R′R^nB^n} denote the state resulting from the action of the quantum channel N^⊗n_{A→B} on the input state ω_{R′R^nA^n}. Let CPTP(A^n → B^n) denote the set of all quantum channels from input system A^n to output system B^n. Let Perm(A^n → B^n) denote the set of all permutation-covariant quantum channels from input system A^n to output system B^n. Define ψ^{Ñ^n}_{RB^n} to be the state resulting from the action of the channel Ñ^n_{A^n→B^n} on the input state ψ_{RA^n}. Then consider the following chain of equalities and inequalities, in which the maximization is restricted at one step to channels Ñ^n ∈ Perm(A^n → B^n). The first equality follows from Definition 20. The first inequality follows by restricting the maximization to permutation-covariant channels. The second equality follows because the reduced state ω_{A^n} has full rank and by applying the remark after (139), to conclude that the relevant max-channel divergence is achieved on the de Finetti input state. The second inequality follows by applying the post-selection technique [91, Theorem 1]. (See also Proposition D.5 of [93].) Note that the factor of two in the exponent of (188) is necessary because we are employing the sine distance as the channel distance measure. To be clear, the statement we are invoking is that if two permutation-covariant channels are close on the de Finetti input state, then they are close on all inputs, up to a polynomial factor. Continuing, we have that
Eq. (182) can be rewritten as an optimization over all channels Ñ^n ∈ CPTP(A^n → B^n) and then bounded from below by an optimization over states σ_{R′R^nB^n} satisfying P(N^⊗n(ω_{R′R^nA^n}), σ_{R′R^nB^n}) ≤ 2ε/3. The first equality follows from reasoning similar to that given for Lemma 11 in Appendix B of [94], i.e., that a permutation-covariant channel is optimal among all channels, due to the fact that the original channel N^⊗n is permutation covariant. In our case, it follows by employing the fact that the channel min-entropy does not decrease under the action of a uniformity preserving superchannel (see the discussion after (141)), and the superchannel that randomly performs a permutation at the channel input and the inverse permutation at the channel output is one such superchannel. The second equality is a consequence of the fact that the set of states Ñ^n_{A^n→B^n}(φ_{RA^n}) arising from channels close to N^⊗n coincides with the set of states close to N^⊗n(φ_{RA^n}), which follows from applying Lemma 10 in Appendix B of [94]. The inequality follows from Theorem 3 of [95] (while noting that the state ρ̂_AB defined therein satisfies ρ̂_B = ρ_B, so that the proof of Theorem 3 of [95] applies to our situation). Continuing, by applying [80, Eq. (L10)] and definitions, we bound the above from below in terms of the Rényi entropy of the channel. The second inequality follows from the definition of the Rényi entropy of a channel (Definition 13), and the equality follows from the additivity of the Rényi entropy of a channel (Proposition 15). Putting everything above together, we conclude a bound on (1/n) H^ε_min(N^⊗n) in terms of H_α(N), up to correction terms that vanish as n → ∞. Taking the limit as n → ∞, we conclude that the following inequality holds for all α > 1:

liminf_{n→∞} (1/n) H^ε_min(N^⊗n) ≥ H_α(N).

Since this inequality holds for all α > 1, we can take the limit as α → 1 to conclude that

liminf_{n→∞} (1/n) H^ε_min(N^⊗n) ≥ H(N).

This concludes the proof of the inequality in (176).
To arrive at the second inequality in (177), let Ñ^n be a channel such that P(N^⊗n, Ñ^n) ≤ ε. Now let φ_{RA^n} be an arbitrary state. We then have from the definition in (174) a chain of inequalities for the min-entropy evaluated on the corresponding output states. The second inequality in this chain follows from monotonicity of the conditional Rényi entropy with respect to α, and the last from the uniform continuity bound in [96, Lemma 2]. The above bound holds for any choice of φ_{RA^n}, and so we conclude a bound involving H(N^⊗n) = nH(N), where the equality follows from the additivity of the entropy of a channel (Proposition 4). Now, the inequality has been shown for all Ñ^n satisfying P(N^⊗n, Ñ^n) ≤ ε, and so we conclude, after dividing by n, a corresponding bound on (1/n) H^ε_min(N^⊗n). Taking the limit as n → ∞, and then taking the limit as ε → 0, we arrive at the second inequality in (177).
In Appendix B, we point out how an approach similar to that in the above proof leads to an alternate proof of the upper bound in [94, Theorem 8], regarding an asymptotic equipartition property for the smoothed max-mutual information of a quantum channel.

VIII. GENERALIZED CHANNEL ENTROPIES FROM GENERALIZED DIVERGENCES
In this section, we discuss other possibilities for defining generalized entropies of a quantum channel. One main concern might be how unique or distinguished our notion of entropy of a channel from Definition 1 is, being based on the channel relative entropy of the channel of interest and the completely randomizing channel. As a consequence of the fact that there are alternate ways of defining channel relative entropies, there could be alternate notions of channel entropies. However, we should recall that one of the main reasons we have chosen the definition in Definition 1 is that the channel relative entropy appearing there has a particularly appealing operational interpretation in the context of channel discrimination [15]. That is, for what one might consider the most natural and general setting of quantum channel discrimination, the optimal rate for distinguishing a channel from the completely randomizing channel is given by the channel relative entropy in (4) [15]. As we show in what follows, there are further reasons to focus on our definition of the entropy of a channel from Definition 1, as well as our definition of the min-entropy of a channel from Definition 18.
To begin the discussion, let S(C) denote the set of quantum states for an arbitrary quantum system C. Let us recall that a function D : S(C) × S(C) → R ∪ {+∞} is a generalized divergence [97,98] if, for arbitrary Hilbert spaces H_A and H_B, arbitrary states ρ_A, σ_A ∈ S(A), and an arbitrary channel N_{A→B}, the following data-processing inequality holds:

D(ρ_A‖σ_A) ≥ D(N_{A→B}(ρ_A)‖N_{A→B}(σ_A)).

Examples of interest include the quantum relative entropy, the Petz-Rényi relative entropies, and the sandwiched Rényi relative entropies, as considered in this paper. Based on generalized divergences, one can define at least two different channel divergences as a measure for the distinguishability of two quantum channels N_{A→B} and M_{A→B}. Here we consider a function of two quantum channels to be a channel divergence if it is monotone under the action of a superchannel.
1. Generalized channel divergence [16]:

D(N‖M) ≡ sup_{ρ_RA} D(N_{A→B}(ρ_RA)‖M_{A→B}(ρ_RA)).

In the above, the optimization can be restricted to pure states of systems R and A with R isomorphic to system A. The monotonicity of the generalized channel divergence under the action of a superchannel was proven in [18].
2. Amortized channel divergence [30]:

D^A(N‖M) ≡ sup_{ρ_RA, σ_RA} [D(N_{A→B}(ρ_RA)‖M_{A→B}(σ_RA)) − D(ρ_RA‖σ_RA)].

The monotonicity of the amortized channel divergence under the action of a superchannel was proven in [30].
We can consider other divergences as follows, but they are not known to be monotone under the action of a general superchannel, and so we do not label them as channel divergences:

1. Choi divergence:

D^Φ(N‖M) ≡ D(N_{A→B}(Φ_RA)‖M_{A→B}(Φ_RA)).

As we show in Appendix C, the Choi divergence is monotone under the action of a superchannel consisting of mixtures of a unital pre-processing channel and an arbitrary post-processing channel.

2. Adversarial divergence:

D^adv(N‖M) ≡ sup_{ρ_RA} inf_{σ_RA} D(N_{A→B}(ρ_RA)‖M_{A→B}(σ_RA)).   (219)

In the above, due to state purification, data processing, and the Schmidt decomposition, the maximization can be restricted to pure states ρ_RA of systems R and A with R isomorphic to system A. The minimization should be taken over mixed states σ_RA. For a proof of this fact, see Appendix D.
3. Adversarial Choi divergence:

D^{adv,Φ}(N‖M) ≡ inf_{σ_RA} D(N_{A→B}(Φ_RA)‖M_{A→B}(σ_RA)).   (220)

4. "No quantum memory" divergence:

sup_{ρ_A} D(N_{A→B}(ρ_A)‖M_{A→B}(ρ_A)).

There could certainly be other divergences to consider. In our context, two effective ways of singling out particular divergences as primary and others as secondary are 1) whether the channel divergence has a compelling operational interpretation for a channel discrimination task and 2) whether the channel divergence leads to an entropy function that satisfies the axioms from [18].
Based on the recipe given in the introduction, from a given divergence D(N‖M) (any of the choices above), one could then define a generalized entropy function of a channel N_{A→B} as

H(N) ≡ log₂|B| − D(N‖R),

where R_{A→B} is the completely randomizing channel from (5).
Taking the above approach to pruning entropy functions, we can already rule out the last one ("no quantum memory"), as done in [18]: after taking D to be the most prominent case of quantum relative entropy, the resulting entropy function is the minimum output entropy of a channel, which is known to be non-additive [99]. While the Choi divergence leads to an entropy function satisfying the desired axioms, the Choi divergence itself does not appear to have a compelling operational interpretation as a "channel measure," because it simply reduces a channel discrimination problem to a state discrimination problem (i.e., it does not make use of the most general approach one could take for discriminating arbitrary channels). This point could be debated, and we do return to entropy functions derived from Choi and adversarial Choi divergences in Section VIII D below.
A. Collapse of entropy functions derived from quantum relative entropy

From the list above, by focusing on the operational and axiomatic criteria just mentioned, we are left with the generalized channel divergence and the amortized channel divergence. Here we also consider the adversarial divergence. Interestingly, after taking D to be the prominent case of quantum relative entropy and the channel M to be the completely randomizing channel, we find the following collapse of the divergences:

D(N‖R) = D^A(N‖R) = D^adv(N‖R).   (223)

The first equality was shown in [15,30], and we show the second one now. From the definitions, we have that

D(N_{A→B}(ρ_RA)‖R_{A→B}(σ_RA)) = D(ω_RB‖σ_R ⊗ π_B) = D(ω_RB‖ρ_R ⊗ π_B) + D(ρ_R‖σ_R),

where ω_RB ≡ N_{A→B}(ρ_RA), and we have used the fact that ω_R = ρ_R. Now taking an infimum over all σ_RA and invoking the non-negativity of quantum relative entropy, we conclude that

inf_{σ_RA} D(N_{A→B}(ρ_RA)‖R_{A→B}(σ_RA)) = D(ω_RB‖ρ_R ⊗ π_B).

By taking a supremum over ρ_RA, we then conclude that D^adv(N‖R) = D(N‖R). Thus, the collapse in (223), as well as the operational interpretation of D(N‖R) from [15] and the fact that the resulting entropy function satisfies the axioms from [18], indicate that our choice of the entropy of a quantum channel in Definition 1 is cogent.
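The collapse above underlies the definition H(N) = log₂|B| − D(N‖R) = inf_ψ H(B|R)_ω. As a sanity check of the reduction-to-states property for this entropy function, here is a small numerical sketch of our own for a replacer channel N(ρ) = σ, for which every output has the form ω_RB = ψ_R ⊗ σ, so that inf_ψ H(B|R)_ω = H(σ):

```python
import numpy as np

def vn_entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log2(ev)).sum())

rng = np.random.default_rng(7)
sigma = np.diag([0.6, 0.4])  # fixed output of the replacer channel

values = []
for _ in range(50):
    v = rng.normal(size=4) + 1j * rng.normal(size=4)
    v /= np.linalg.norm(v)
    m = v.reshape(2, 2)            # pure input psi_RA as a coefficient matrix
    psi_r = m @ m.conj().T         # reduced state on the reference system R
    omega = np.kron(psi_r, sigma)  # replacer output: psi_R tensor sigma
    values.append(vn_entropy(omega) - vn_entropy(psi_r))  # H(B|R) = H(RB) - H(R)

h_channel = min(values)
```

Every sampled input gives exactly H(σ), so the infimum does too, in line with the reduction-to-states property of Proposition 5.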
B. Collapse of entropy functions derived from max-relative entropy

Interestingly, a similar and further collapse occurs when taking D to be the max-relative entropy: the generalized, amortized, Choi, and adversarial channel divergences all coincide in this case when M = R. The first two equalities were shown in [30, Proposition 10] for arbitrary channels N and M. By employing a semi-definite programming approach as in the proof of Proposition 19, we can conclude the last equality. Thus, this collapse, as well as the facts that the max-relative entropy D_max(N‖M) is an upper bound on the rate at which any two channels can be distinguished in an arbitrary context [30, Corollary 18] and the resulting entropy function H_min(N) satisfies the axioms from [18], indicate that our choice of the min-entropy of a quantum channel in Definition 18 is also cogent.

C. Entropy functions derived from Rényi relative entropies
In Section V, we defined the Rényi entropy of a channel as in Definition 13, in terms of the sandwiched Rényi relative entropy. The following collapse is known for the sandwiched Rényi relative entropy for α ∈ (1, ∞) [15,30]:

D_α(N‖R) = D^A_α(N‖R).

However, it is not known whether these quantities are equal for α ∈ (0, 1), or whether they are equal to the adversarial divergence D^adv_α(N‖R) for any α ∈ (0, 1) ∪ (1, ∞). At the same time, one of the most compelling reasons to fix the definition of channel Rényi entropy as we have done is that the channel divergence D_α(N‖R) has a convincing operational interpretation in channel discrimination as the optimal strong converse exponent, and the resulting entropy function satisfies all of the desired axioms for an entropy function. Furthermore, the entropy function H_α(N) represents a useful bridge between the entropy and min-entropy of a quantum channel, due to the facts that lim_{α→1} H_α(N) = H(N), lim_{α→∞} H_α(N) = H_min(N), and H_α(N) ≤ H_β(N) for α ≥ β ≥ 1.
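The bridging facts just listed can be illustrated on the spectrum of a state (a small sketch of our own; the sample spectrum is arbitrary): H_α is non-increasing in α and tends to the min-entropy −log₂ λ_max as α → ∞.

```python
import numpy as np

def renyi_entropy(spec, alpha):
    """Renyi entropy (base 2) of a probability spectrum, for alpha != 1."""
    spec = np.asarray(spec, dtype=float)
    return float(np.log2((spec ** alpha).sum()) / (1.0 - alpha))

spec = np.array([0.5, 0.25, 0.15, 0.1])
alphas = [0.5, 0.9, 1.1, 2.0, 10.0, 100.0]
values = [renyi_entropy(spec, a) for a in alphas]
h_min = -np.log2(spec.max())
```

Here the monotone decrease in α is strict because the spectrum is not flat, and the α = 100 value is already within about 0.01 bits of H_min.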
One should notice that we did not define the Rényi entropy of a channel in terms of the Petz-Rényi relative entropy and the resulting channel divergence, amortized channel divergence, or adversarial divergence. One of the main reasons for this is that it is not known whether the resulting entropy functions are additive. Furthermore, operational interpretations for these divergences have not been established, having been open since the paper [15] appeared. As such, it very well could be the case that one could derive cogent notions of channel entropy from the Petz-Rényi relative entropy, but this remains a topic for future work.

D. Entropy functions derived from Choi and adversarial Choi divergences
In this subsection, we discuss various entropy functions derived from Choi and adversarial Choi divergences. As emphasized previously, we note again here that the operational interpretations for these divergences are really about state discrimination tasks rather than channel discrimination tasks. Nevertheless, the resulting entropy functions satisfy the axioms put forward in [18].
By picking the divergence D to be the quantum relative entropy D, we find that the Choi and adversarial Choi divergences are equal when discriminating an arbitrary channel N_{A→B} from the completely randomizing channel R_{A→B}: D^Φ(N‖R) = D^{adv,Φ}(N‖R). The proof of this statement follows along the lines of (224)-(229). There is a simple operational interpretation for D^Φ(N‖R) in terms of state discrimination [12,13], while an operational interpretation for D^{adv,Φ}(N‖R) in terms of state discrimination was given recently in [100]. We could also pick the divergence D to be the Petz-Rényi relative entropy D̄_α or the sandwiched Rényi relative entropy D_α. The resulting Choi and adversarial Choi divergences are then generally not equal when discriminating an arbitrary channel N_{A→B} from the completely randomizing channel R_{A→B}. There is an operational interpretation for the Petz version D̄^Φ_α(N‖R) for α ∈ (0, 1) in terms of state discrimination [101,102] (error exponent problem), and there is an operational interpretation for the sandwiched version D^Φ_α(N‖R) for α ∈ (1, ∞) in terms of state discrimination [74] (strong converse exponent problem). Interestingly, [100] has given a meaningful operational interpretation for the adversarial Choi divergences D̄^{adv,Φ}_α(N‖R) for α ∈ (0, 1) and D^{adv,Φ}_α(N‖R) for α ∈ (1, ∞) in terms of error exponent and strong converse exponent state discrimination problems, respectively.
For N_{A→B} a quantum channel and Φ^N_RB ≡ N_{A→B}(Φ_RA) the Choi state, the resulting channel entropy functions, given in (236)-(240), are conditional (Rényi) entropies of the Choi state Φ^N_RB, with the conditioning state either fixed or optimized (adversarial). It then follows that all of the above entropy functions are additive for α ∈ (0, 1) ∪ (1, ∞) (with the exception that additivity of the sandwiched adversarial Choi Rényi entropy H^{adv,Φ}_α(N) holds for α ∈ [1/2, 1) ∪ (1, ∞)), due to the facts that the Choi state of a tensor-product channel is equal to the tensor product of the Choi states of the individual channels, as well as the additivity of the underlying conditional entropies, shown for the sandwiched version in [83] and following for the Petz version from the quantum Sibson identity [98, Lemma 7] (see also [76, Lemma 1]). Normalization and reduction to states (as in Proposition 5) follow for all of the above quantities. What remains is monotonicity under random unitary superchannels, and what we can show is something stronger: monotonicity under doubly stochastic superchannels, the latter defined in [18] as superchannels Θ such that their adjoint Θ† is also a superchannel, where the adjoint is defined with respect to the inner product for supermaps considered in [18].
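The additivity ingredient mentioned above, additivity of conditional entropies over tensor products, is easy to verify numerically; a sketch of our own for the von Neumann case (the α → 1 point of the Rényi families):

```python
import numpy as np

def vn_entropy(rho):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-(ev * np.log2(ev)).sum())

def cond_entropy(rho_ab):
    """H(A|B) = H(AB) - H(B) for a two-qubit state ordered as (A, B)."""
    rho_b = np.einsum('ajak->jk', rho_ab.reshape(2, 2, 2, 2))
    return vn_entropy(rho_ab) - vn_entropy(rho_b)

rho = np.diag([0.5, 0.2, 0.2, 0.1])  # state of A1 B1
tau = np.diag([0.7, 0.1, 0.1, 0.1])  # state of A2 B2

joint = np.kron(rho, tau)  # ordered as A1 B1 A2 B2
t = joint.reshape(2, 2, 2, 2, 2, 2, 2, 2)  # indices a1 b1 a2 b2, a1' b1' a2' b2'
rho_b1b2 = np.einsum('abcdafch->bdfh', t).reshape(4, 4)  # trace over A1, A2
lhs = vn_entropy(joint) - vn_entropy(rho_b1b2)           # H(A1A2|B1B2)
rhs = cond_entropy(rho) + cond_entropy(tau)
```

Combined with the tensor-product factorization of Choi states, this is exactly the mechanism by which the Choi-based channel entropies inherit additivity.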
Proof. Recall from [18] that, since Θ is doubly stochastic, the condition in (243) holds. Let Θ be as above, and let us begin by considering the adversarial quantities for the ranges of α for which data processing holds. Let ω_R be an arbitrary state. Let ξ_AER ≡ Λ_{C→AE}(Φ_CR), and note that the marginal ξ_A is the maximally mixed state due to (243) and the dimension constraint |A| = |C|. Therefore, there exists a quantum channel E_{R→ER} such that ξ_AER can be generated from Φ_AR by acting on system R alone. Let σ_ER ≡ E_{R→ER}(ω_R). With these notations set, and working with the specific entropy function in (239), we find that the desired monotonicity holds. Since the inequality holds for an arbitrary state ω_R, we conclude the inequality in (242) for the adversarial Choi Rényi entropy H^{adv,Φ}_α(N). The proof for the entropy functions in (236) and (240) goes the same way, since the above proof only relied upon the data-processing inequality.
To arrive at the inequality in (242) for the entropy functions in (237)-(238), we exploit the same proof, but we choose ω_R to be the maximally mixed state. By tracing over systems AE in (245), we conclude that the reduced channel Tr_E ∘ E_{R→ER} is unital. This means that, by choosing σ_ER ≡ E_{R→ER}(ω_R) again, we can conclude that σ_R = π_R. By applying the same steps as above, we then find the inequality in (242) for the entropy function in (237). The proof for the entropy function in (238) then goes the same way.
As a final remark to conclude this section, we note that the following limit holds as a consequence of (135), and so the proof given above represents a different way, from that given in [18], of arriving at the conclusion that the extended min-entropy of a channel is non-decreasing under the action of a doubly stochastic superchannel.

IX. CONCLUSION AND OUTLOOK
In this paper, we have introduced a definition for the entropy of a quantum channel, based on the channel relative entropy between the channel of interest and the completely randomizing channel. Building on this approach, we defined the Rényi entropies and min-entropy of a channel. We proved that these channel entropies satisfy the axioms for entropy functions recently put forward in [18]. We also proved that the entropy of a channel is equal to the completely bounded entropy of [19] and that the Rényi entropy of a channel is related to the completely bounded $1 \to p$ norm considered in [19]. The smoothed min-entropy of a channel satisfies an asymptotic equipartition property that generalizes the same property for the smoothed min-entropy of quantum states [90]. We showed that the entropy of a channel has an operational interpretation in terms of a task called quantum channel merging, in which the goal is for the receiver to merge his share of the channel with the environment's share; this task is a dynamical counterpart of the known task of quantum state merging [20, 21]. We evaluated the entropy of a channel for several common channel models. Finally, we considered other generalized entropies of a quantum channel and gave further evidence that Definition 1 is a cogent approach to defining the entropy of a quantum channel.
Going forward, one of the most interesting open questions is to determine whether there is a set of axioms that uniquely identifies the entropy of a quantum channel, similar to the set of axioms that uniquely characterizes the Shannon entropy [103]. We wonder the same for the Rényi entropy of a channel, given that the Rényi entropies were originally identified [104] by removing one of the axioms that uniquely characterize the Shannon entropy. On a different front, one could alternatively define the entropy of $n$ uses of a quantum channel in terms of an optimization over quantum co-strategies [105, 106] or quantum combs [107]; for analyzing the asymptotic equipartition property in this scenario, one could smooth with respect to the strategy norm of [107, 108]. The results of [15] suggest that the asymptotic equipartition property might still hold in this more complex scenario, but further analysis is certainly required. Note that a related scenario has been considered recently in [109]. Finally, if the Petz-Rényi channel divergence between an arbitrary channel and the completely depolarizing channel were shown to be additive, then a Rényi channel entropy defined from it would be a compelling alternative. This question about the Petz-Rényi channel divergence has been open since [15].
Proof. For the achievability part, we employ ideas used in the theory of quantum channel simulation [93, 111-113]. In particular, the main challenge of quantum channel merging, beyond that of quantum state merging, is that the protocol must work for every possible input state $\psi_{RA^n}$, not merely for a fixed one. In prior work on quantum channel simulation [93, 111-113], this challenge has been met by appealing to the post-selection technique [91, Theorem 1], and we use the same approach here. In the context of the post-selection technique, it is helpful to consult the unpublished note [114] for further details.
Let $\zeta_{A^n \hat{A}^n}$ denote the maximally mixed state of the symmetric subspace of the $A^n \hat{A}^n$ systems [92], where $\hat{A}$ is isomorphic to the channel input system $A$. Note that this state can be written as [92, Proposition 6] $\zeta_{A^n \hat{A}^n} = \int (\psi_{A\hat{A}})^{\otimes n} \, d\psi_{A\hat{A}}$, where $\psi_{A\hat{A}}$ denotes a pure state and $d\psi_{A\hat{A}}$ is the Haar measure over the pure states. This state is permutation invariant; i.e., for a unitary channel $\mathcal{W}^{\pi}_{A^n} \otimes \mathcal{W}^{\pi}_{\hat{A}^n}$ corresponding to a permutation $\pi$, we have that $\zeta_{A^n \hat{A}^n} = (\mathcal{W}^{\pi}_{A^n} \otimes \mathcal{W}^{\pi}_{\hat{A}^n})(\zeta_{A^n \hat{A}^n})$ for all $\pi \in S_n$, with $S_n$ denoting the symmetric group. Let $\zeta_{R \hat{A}^n A^n}$ be a purification of $\zeta_{A^n \hat{A}^n}$, and note that it can be chosen such that [114] $\zeta_{R \hat{A}^n A^n} = (\mathcal{W}^{\pi}_{R} \otimes \mathcal{W}^{\pi}_{\hat{A}^n} \otimes \mathcal{W}^{\pi}_{A^n})(\zeta_{R \hat{A}^n A^n})$, where $\mathcal{W}^{\pi}_{R}$ is some unitary channel, which implies that the corresponding permutation symmetry is inherited by the channel outputs below. The first goal is to show the existence of a state merging protocol for the state $(\mathcal{U}^{\mathcal{N}}_{A\to BE})^{\otimes n}(\zeta_{R \hat{A}^n A^n})$. As shown in [115, Theorem 5.2] (see also the earlier [116, Proposition 4.7] in this context), there exists a state merging protocol with error $\sqrt{13\varepsilon}$, with the entanglement gain satisfying (A22). (To arrive at the inequality in (A22), one needs to use the fact that $P(\rho, \sigma) \geq \frac{1}{2}\|\rho - \sigma\|_{1}$ for any two states $\rho$ and $\sigma$.) That is, there exists a one-way LOCC channel $\mathcal{P}_{B^n E^n B_0 E_0 \to B^n_E E^n B_1 E_1}$ such that the desired inequality holds. This step critically relies upon the additivity $H_{\alpha}(\mathcal{N}^{\otimes n}) = n H_{\alpha}(\mathcal{N})$ from Proposition 15, which in turn follows directly from the main result of [19]. Putting everything together, we conclude that, for $\varepsilon \in (0, 1/13)$, there exists an $(n, L/K, \sqrt{13\varepsilon})$ channel merging protocol for $\mathcal{N}_{A\to B}$ whose entanglement gain satisfies the following inequality for all $\alpha > 1$: $2 \log_2\!\left(\tfrac{1}{\varepsilon}\right) + 4\left(|A|^2 - 1\right) \log_2(n+1) + \tfrac{1}{\alpha}$. (A34) We arrive at the statement of the proposition by a final substitution $\varepsilon' = \sqrt{13\varepsilon} \in (0, 1)$, which implies that $\varepsilon = (\varepsilon')^2/13$ and $2\log_2(1/\varepsilon) = 4\log_2(1/\varepsilon') + 2\log_2 13$.
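The symmetric-subspace state $\zeta$ and its permutation invariance can be checked numerically for small $n$. The following sketch (illustrative names; a small local dimension $d$ stands in for $|A\hat{A}|$) builds the projector onto the symmetric subspace as the average of permutation unitaries and verifies that its rank equals $\binom{n+d-1}{n}$:

```python
import numpy as np
from itertools import permutations
from math import comb

def permutation_unitary(perm, d):
    """Unitary on (C^d)^{⊗n} permuting the tensor factors according to perm."""
    n = len(perm)
    dim = d ** n
    W = np.zeros((dim, dim))
    for idx in range(dim):
        # decode idx into base-d digits (one digit per tensor factor)
        digits = [(idx // d ** (n - 1 - k)) % d for k in range(n)]
        permuted = [digits[perm[k]] for k in range(n)]
        jdx = sum(permuted[k] * d ** (n - 1 - k) for k in range(n))
        W[jdx, idx] = 1.0
    return W

def symmetric_projector(n, d):
    """Projector onto Sym^n(C^d): the group average of all permutation unitaries."""
    perms = list(permutations(range(n)))
    return sum(permutation_unitary(p, d) for p in perms) / len(perms)

n, d = 3, 2
P = symmetric_projector(n, d)
rank = int(round(np.trace(P)))
print("dim Sym^n(C^d) =", rank, "== C(n+d-1, n) =", comb(n + d - 1, n))

# zeta: maximally mixed state on the symmetric subspace; check permutation invariance
zeta = P / np.trace(P)
for p in permutations(range(n)):
    W = permutation_unitary(p, d)
    assert np.allclose(W @ zeta @ W.T, zeta)
```

The group average of the permutation unitaries is exactly the projector onto the symmetric subspace, so $\zeta$ commutes with every $W_\pi$, which is the invariance property used in the proof.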

Quantum channel merging capacity is equal to the entropy of a channel
We can now put together the previous two propositions to conclude the following theorem. Proof of Theorem 10. By applying the limits $n \to \infty$ and $\varepsilon \to 0$, the following bound is a consequence of Proposition 23: For an arbitrary $\alpha > 1$, $\varepsilon \in (0, 1)$, and $\delta > 0$, we can conclude from Proposition 24 that there exists an $(n, 2^{n[H_{\alpha}(\mathcal{N}) - \delta]}, \varepsilon)$ channel merging protocol by taking $n$ sufficiently large. This implies that $H_{\alpha}(\mathcal{N})$ is an achievable rate for all $\alpha > 1$. Since this statement holds for all $\alpha > 1$, the rate $\sup_{\alpha > 1} H_{\alpha}(\mathcal{N}) = H(\mathcal{N})$ is achievable as well. This establishes that $C_M(\mathcal{N}) \geq H(\mathcal{N})$.
Appendix B: Max-mutual information of a channel and the asymptotic equipartition property

In this appendix, we point out how the max-mutual information of a quantum channel is a limit of the sandwiched Rényi mutual information of a channel, the latter having been defined in [117]. We then show how to arrive at an alternate proof of the asymptotic equipartition property in [94, Theorem 8] by making use of this connection.
First recall that the sandwiched Rényi mutual information of a channel is defined for $\alpha \in (0, 1) \cup (1, \infty)$ as [117, Eq. (3.5)] where $\omega_{RB} \equiv \mathcal{N}_{A\to B}(\psi_{RA})$ and $\widetilde{D}_{\alpha}$ is the sandwiched Rényi relative entropy from (100). It was subsequently used in [15]. The max-mutual information of a channel is equal to [94, Definition 4] $I_{\max}(\mathcal{N}) \equiv \max_{\psi_{RA}} I_{\max}(R;B)_{\omega}$. (B4) Proposition 25 For a quantum channel $\mathcal{N}_{A\to B}$, the following limit holds: Proof. To see this, consider the two chains of steps, in which, for the first inequality, we used the monotonicity of the divergence under data processing and, for the second inequality, the monotonicity under maps of the form in (C3). The smoothed max-mutual information of a quantum channel $\mathcal{N}_{A\to B}$ is then defined for $\varepsilon \in (0, 1)$ as [94, Definition 5] (D1), where $\psi_{RA}$ is pure with system $R$ isomorphic to system $A$.
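The limit in Proposition 25 rests on the fact that the sandwiched Rényi relative entropy converges to the max-relative entropy as $\alpha \to \infty$. This can be checked numerically on states; the following sketch (illustrative function names, eigenvalue-based evaluation to avoid overflow at large $\alpha$) is not from the paper:

```python
import numpy as np

def mpow(A, p):
    """A**p for a positive definite Hermitian matrix, via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w ** p) @ V.conj().T

def sandwiched_renyi(rho, sigma, alpha):
    """D_alpha(rho||sigma) = (1/(alpha-1)) log2 Tr[(sigma^{(1-a)/2a} rho sigma^{(1-a)/2a})^a],
    evaluated from eigenvalues with a log-sum-exp trick for numerical stability."""
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    lam = np.linalg.eigvalsh(s @ rho @ s)
    lam = lam[lam > 1e-15]
    lmax = lam.max()
    log_trace = alpha * np.log2(lmax) + np.log2(np.sum((lam / lmax) ** alpha))
    return log_trace / (alpha - 1)

def d_max(rho, sigma):
    """Max-relative entropy D_max(rho||sigma) = log2 lambda_max(sigma^{-1/2} rho sigma^{-1/2})."""
    s = mpow(sigma, -0.5)
    return np.log2(np.linalg.eigvalsh(s @ rho @ s).max())

def random_state(rng, d):
    """A random full-rank density matrix."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(0)
rho, sigma = random_state(rng, 2), random_state(rng, 2)
for alpha in [2, 10, 100, 1000]:
    print(alpha, sandwiched_renyi(rho, sigma, alpha))
print("D_max:", d_max(rho, sigma))
```

The printed values increase monotonically in $\alpha$ and approach $D_{\max}$, mirroring the channel-level limit stated in Proposition 25.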
To see the claim after (219), let $\rho_{RA}$ be an arbitrary state with purification $\phi_{R'RA}$. It thus holds that $\phi_{R'RA}$ is a purification of $\rho_A$, with $R'R$ acting as the purifying systems. By taking a "canonical" purification of $\rho_A$ that is in direct correspondence with its eigendecomposition, there exists a purification $\varphi_{SA}$ of $\rho_A$ with system $S$ isomorphic to system $A$. Since the purification $\phi_{R'RA}$ is related to the purification $\varphi_{SA}$ by an isometric channel $\mathcal{U}_{S\to R'R}$, as $\phi_{R'RA} = \mathcal{U}_{S\to R'R}(\varphi_{SA})$, by applying the isometric invariance of generalized divergences [119], we conclude for an arbitrary state $\omega_{SA}$ that the desired chain of inequalities holds. The first inequality is from data processing under the partial trace over $R'$. Since the inequality holds for arbitrary $\omega_{SA}$, we conclude (D8). This concludes the proof.
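The "canonical" purification used in this argument can be written explicitly as $|\varphi\rangle_{SA} = (\mathbb{1}_S \otimes \sqrt{\rho_A})\sum_i |i\rangle_S |i\rangle_A$. A small NumPy sketch (illustrative names, not the paper's notation) constructs it and verifies the purification property:

```python
import numpy as np

def canonical_purification(rho):
    """Return |phi>_{SA} = (I_S ⊗ sqrt(rho)) sum_i |i>_S |i>_A, with |S| = |A|."""
    d = rho.shape[0]
    w, V = np.linalg.eigh(rho)
    sqrt_rho = (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T
    omega = np.eye(d).reshape(d * d)          # unnormalized sum_i |i>|i>
    return np.kron(np.eye(d), sqrt_rho) @ omega

def reduced_state_on_A(vec, d):
    """Reduced state on the second (A) factor of a pure state vector on S ⊗ A."""
    M = vec.reshape(d, d)                     # M[s, a] amplitudes
    return M.T @ M.conj()                     # (Tr_S |phi><phi|)_{a,a'} = sum_s M[s,a] M*[s,a']

# Example: a random qubit density matrix
rng = np.random.default_rng(1)
G = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = G @ G.conj().T
rho /= np.trace(rho).real

phi = canonical_purification(rho)
print("norm:", np.vdot(phi, phi).real)                      # 1 for a valid pure state
print("Tr_S recovers rho:", np.allclose(reduced_state_on_A(phi, 2), rho))
```

Because the purifying system $S$ here has the same dimension as $A$, this is exactly the minimal-dimension purification invoked in the proof; any other purification is related to it by an isometry on the purifying systems.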