Resource theory of asymmetric distinguishability for quantum channels



I. INTRODUCTION
In many scientific fields of interest, distinguishability is an important concept. More generally, it can be considered as a resource in that it allows for making decisions, and furthermore, the more distinguishable that two possibilities are, the easier and faster it is to make a decision.
In a recent paper, we formalized the notion of distinguishability as a resource by developing the resource theory of asymmetric distinguishability in detail [WW19], following the original proposal from [Mat10, Mat11]. This resource theory demonstrates that distinguishability is truly a fundamental resource that can be manipulated and interconverted into different forms. The benefit of developing this resource theory is that, not only can fundamental tasks such as quantum hypothesis testing [HP91, ON00, Hay03, Hay04, WR12, Hay17] be recast into an intuitive approach based on resource-theoretic thinking, but also new information processing tasks emerge, such as distinguishability dilution, which is related to concepts such as simulation and synthesis of quantum states. The present paper illustrates further benefits of the resource-theoretic approach by using it to solve some outstanding questions in the theory of quantum channel discrimination.
In the resource theory of asymmetric distinguishability for states [WW19], the basic object to be manipulated is a quantum "box" (ρ, σ) consisting of two quantum states ρ and σ. The descriptor "asymmetric" applies to this resource theory because it allows for a slight error in the transformation of the first state of the box, while not allowing for any error in the transformation of the second state of the box. One basic task is to distill as many bits of asymmetric distinguishability as possible from this box by processing it with an arbitrary quantum channel [WW19]. Another basic task is to dilute bits of asymmetric distinguishability to prepare the box (ρ, σ), with the goal being to use as few bits of asymmetric distinguishability as possible in order to do so [WW19]. These tasks give operational meaning to fundamental entropic measures such as the min-relative entropy [Dat09], the smooth min-relative entropy [BD10, BD11, WR12], the max-relative entropy [Dat09], and the smooth max-relative entropy [Dat09]. One of the core results for this resource theory is that it is reversible, and the fundamental rate of interconversion is characterized by the quantum relative entropy [WW19].
The main goal of the present paper is to generalize these concepts from quantum states to quantum channels, given the prominent role of the latter in quantum information and beyond. We note here that recently there has been much effort more generally in extending concepts from resource theories of quantum states to resource theories for quantum channels (see, e.g., [BHLS03, BGMW17, DBW17, GFW+18, LBL18, BDW18, TR19, TEZP19, WW18, SC19, WWS19, LY19, LW19, YLZ+19]). In the resource theory of asymmetric distinguishability for quantum channels presented here, the basic object to be manipulated is a quantum channel box (N, M), which consists of two quantum channels N and M. The idea is that the input and output ports of the channel box are accessible to an agent in the resource theory, while the particular choice of the channel is unknown to the agent. A key difference between this resource theory and the former one for quantum states is that a quantum channel can be probed by means of both an input port and an output port, which implies that channel boxes are manipulated by means of a quantum superchannel [CDP08b]. As a simple example of a superchannel, consider that the encoding and decoding, i.e., pre- and post-processing, of a channel commonly employed in quantum Shannon theory [Wil17] realize a physical transformation of a channel. The incorporation of superchannels into the resource theory implies that the channel resource theory is more involved than it is for boxes consisting only of states.
More generally, we allow for quantum strategy boxes [GW07,CDP08a,CDP09,Gut09,Gut12,GRS18] and manipulate them by means of general physical transformations [CDP09] that take quantum strategies to other quantum strategies (note that quantum strategies are in one-to-one correspondence with quantum combs [CDP09]). By the results of [CDP09], such physical transformations are in fact quantum strategies themselves, so that our generalization of the resource theory to quantum strategies is a significant generalization.
We consider several fundamental tasks in this resource theory, which can be understood as extensions of the tasks considered in [WW19]. The first basic one is distinguishability distillation, in which the goal is to distill as many bits of asymmetric distinguishability as possible from a single channel box in the one-shot setting, or multiple channel boxes in the n-shot setting. This task is intimately related to asymmetric hypothesis testing for quantum channels [CMW16] (see [Hay09] for the classical case), which is a particular kind of quantum channel discrimination. For this task, there are a variety of possibilities to consider, including the one-shot case and the n-shot case, in the latter using either a parallel or sequential strategy [GW07, CDP08a, Gut09, DFY09, HHLW10, Gut12, CMW16]. We also consider this task for quantum strategy boxes. Another basic task of interest is distinguishability dilution, in which the goal is to dilute bits of asymmetric distinguishability to a single or multiple channel boxes, using as few bits of asymmetric distinguishability as possible. This task also has a variety of possibilities, including one- and n-shot, the latter having parallel and sequential variants as well. We likewise consider this task for quantum strategy boxes. This task is also intimately related to quantum channel simulation [BSST02, BDH+14, BCR11, BBCW13, BRW14, Ber13, BGMW17, GFW+18, FBB19, FWTB19, Wil18], but here takes on a specific form due to the structure of the resource theory of asymmetric distinguishability.
One of the major tasks in this resource theory is to convert one channel box to another, doing so either exactly or approximately. As a variant of this problem, another task is to determine the rate at which it is possible to convert n channel boxes, with each box consisting of the same pair of channels, to m boxes consisting of another pair of channels, when n is allowed to be arbitrarily large. More generally, we consider the conversion of an n-round quantum strategy box to an m-round strategy box. The simpler transformation problem for state boxes was solved in [WW19] and is relevant for addressing the channel box transformation problem for particular channel boxes that are environment-seizable, as defined in [BHKW18].

II. SUMMARY OF RESULTS
We now summarize the main contributions and results of our paper:
1. We establish the resource theory of asymmetric distinguishability for quantum channels, with the basic objects being quantum channel boxes, the free operations to manipulate them being quantum superchannels [CDP08b], and the basic units of currency being bits of asymmetric distinguishability (see Section III). Later we accomplish the same for quantum strategy boxes, with the free operations to manipulate them being quantum strategies (see Section VIII).
2. We prove that the approximate channel box transformation problem is characterized by a semidefinite program and thus can be calculated efficiently with respect to the input and output dimensions of the channels (see Section IV).
3. The exact one-shot distillable distinguishability of a quantum channel box is equal to the channel min-relative entropy, which is a particular case of the generalized channel divergence of [CMW16, LKDW18]. The exact one-shot distinguishability cost of a quantum channel box is equal to the channel max-relative entropy, which is a particular case of the generalized channel divergence of [CMW16, LKDW18] and explored in more detail in [GFW+18, BHKW18]. See Section V A for both of these results.
4. The approximate one-shot distillable distinguishability of a quantum channel box is equal to the smooth channel min-relative entropy of [CMW16], the latter also known as channel hypothesis testing relative entropy [CMW16]. The approximate one-shot distinguishability cost of a quantum channel box is equal to the smooth channel max-relative entropy, again a particular case of the generalized channel divergence of [LKDW18] and explored in more detail in [GFW+18]. See Section V B for both of these results.
5. We consider asymptotic parallel versions of the above tasks in Section VI. We find that the exact distillable distinguishability is given by the regularized channel min-relative entropy (see Section VI A). By means of an example from [Aci01], we conclude that the regularization seems to be necessary because the channel min-relative entropy is highly non-additive. We then prove that the exact distinguishability cost is equal to the channel max-relative entropy (see Section VI B). The distillable distinguishability is equal to the regularized channel relative entropy (see Section VI C), and the same quantity is a lower bound on the distinguishability cost (see Section VI D). These latter operational tasks simplify for both environment-seizable and classical-quantum channel boxes.
6. Section VII considers the asymptotic parallel version of the general channel box transformation problem, giving basic definitions and some bounds that apply to this case. Again, the results simplify for the case of environment-seizable and classical-quantum channel boxes.
7. Section VIII considers the quantum strategy box transformation problem. To begin with, this section introduces the generalized quantum strategy divergence as a generalization of the strategy distance of [CDP08a, CDP09, Gut12] and establishes a data processing inequality for this distinguishability measure. The section then establishes several bounds on how well one can perform a physical transformation from one strategy box to another strategy box. All of the results apply to sequential channel boxes because these are special cases of strategy boxes. Furthermore, we consider an asymptotic version of the box transformation problem for sequential channel boxes and prove concrete results for environment-seizable and classical-quantum channel boxes.
8. We then consider distillation and dilution of strategy boxes in Section IX. Our key results here, specialized to sequential channel boxes, include single-letter formulas for the asymptotic exact sequential distinguishability cost and the asymptotic sequential distillable distinguishability, expressed respectively as the channel max-relative entropy and the amortized channel relative entropy of [BHKW18], giving these quantities fundamental operational interpretations in the resource theory of asymmetric distinguishability. The latter result can be alternatively understood as a solution to Stein's lemma for quantum channels in the sequential setting.
In the rest of the paper, we discuss details of the resource theory of asymmetric distinguishability for quantum channels, as well as the contributions listed above.

III. RESOURCE THEORY OF ASYMMETRIC DISTINGUISHABILITY FOR QUANTUM CHANNELS
We begin by generalizing the resource theory of asymmetric distinguishability from [WW19] to the setting of quantum channels, by considering a channel box of the following form:
$$(\mathcal{N}, \mathcal{M}), \tag{1}$$
where N and M are quantum channels, each acting on an input system A and outputting a system B. Recall that a quantum channel is a completely positive, trace-preserving (CPTP) map. We also write these as N_{A→B} and M_{A→B} in what follows in order to indicate the input and output systems explicitly. The channel box generalizes the state box (ρ, σ) from [WW19], which consists of a pair of quantum states ρ and σ. In fact, a state box is a special case of a channel box in which the input systems are trivial. One interpretation of the channel box in (1) is that a distinguisher is allowed to prepare any state ρ_{RA} of a reference system R and the channel input A, either the channel N_{A→B} or M_{A→B} is applied, and then the distinguisher is allowed to perform any post-processing on the reference R and the channel output B in order to decide which channel was applied. That is, by inputting an arbitrary state ρ_{RA} to the channel box (pre-processing) and then applying the channel P_{RB→S} (post-processing), one can transform it to the following state box:
$$\big(\mathcal{P}_{RB\to S}(\mathcal{N}_{A\to B}(\rho_{RA})),\ \mathcal{P}_{RB\to S}(\mathcal{M}_{A\to B}(\rho_{RA}))\big). \tag{2}$$
More generally, the agent who has access to the channel box in (1) can perform a quantum superchannel [CDP08b] on it in order to transform it to another channel box, as discussed in Section III B below.
As stated earlier, the channel box in (1) indeed generalizes the state box (ρ, σ) considered previously in [WW19]. Another way of seeing this is to take the channels N and M in (1) to be replacer channels with the following action:
$$\mathcal{N}_{A\to B}(\omega_A) = \mathrm{Tr}[\omega_A]\,\rho_B, \tag{3}$$
$$\mathcal{M}_{A\to B}(\omega_A) = \mathrm{Tr}[\omega_A]\,\sigma_B. \tag{4}$$
Then no matter what state τ_{RA} is input to the channel box (N, M), it reduces to the state box (τ_R ⊗ ρ_B, τ_R ⊗ σ_B), which, by the discussion in [WW19, Section III], is equivalent by a free operation to the state box (ρ_B, σ_B).
A. Environment-parametrized and -seizable channels
Other simple classes of channel boxes that are strongly related to state boxes, generalizing the above example of a replacer channel box in (3)-(4), include those that are environment parametrized [TW16] and the subclass of environment-seizable channel boxes [BHKW18]. Note that environment-parametrized channel boxes are related to programmable channels [NC97, DP05].
A channel box (N_{A→B}, M_{A→B}) is environment parametrized with associated environment states ρ_E and σ_E if there exists a common interaction channel P_{AE→B} such that
$$\mathcal{N}_{A\to B}(\omega_A) = \mathcal{P}_{AE\to B}(\omega_A \otimes \rho_E), \tag{5}$$
$$\mathcal{M}_{A\to B}(\omega_A) = \mathcal{P}_{AE\to B}(\omega_A \otimes \sigma_E), \tag{6}$$
for all inputs ω_A [TW16]. In this way, any pre-processing of an environment-parametrized channel box as
$$\big(\mathcal{N}_{A\to B}(\rho_{RA}),\ \mathcal{M}_{A\to B}(\rho_{RA})\big) \tag{7}$$
can be viewed as a post-processing of the state box
$$\big(\rho_{RA} \otimes \rho_E,\ \rho_{RA} \otimes \sigma_E\big), \tag{8}$$
so that the distinguishability of the channel box (N, M) is always limited by that of the state box (ρ_E, σ_E), as observed in [TW16] (see [JWD+08, DDM14] for related observations in quantum estimation theory). We should emphasize that an arbitrary channel box (N, M) is environment-parametrized with associated environment states that are orthogonal [DW19]. That is, we can set ρ_E = |0⟩⟨0|_E and σ_E = |1⟩⟨1|_E and the common interaction channel P_{AE→B} as
$$\mathcal{P}_{AE\to B}(\omega_{AE}) := \mathcal{N}_{A\to B}\big(\langle 0|_E\, \omega_{AE}\, |0\rangle_E\big) + \mathcal{M}_{A\to B}\big(\langle 1|_E\, \omega_{AE}\, |1\rangle_E\big). \tag{9}$$
In this way, the channels N_{A→B} and M_{A→B} are realized as in (5)-(6), by starting from the state box (|0⟩⟨0|_E, |1⟩⟨1|_E) and applying the common interaction channel P_{AE→B} in (9). However, this realization of the channels is the least efficient from the perspective of the resource theory of asymmetric distinguishability, because a state box consisting of a pair of orthogonal states is equivalent to an infinite number of bits of asymmetric distinguishability [WW19]. (See [WW19] for the notion of bits of asymmetric distinguishability, and Section V for this notion in the channel resource theory.) In this sense, the realization of an arbitrary channel box in the above way is trivial because it requires an infinite number of bits of asymmetric distinguishability. The concept of environment-parametrized channel boxes becomes non-trivial when the background environment states have finite distinguishability, when measured according to some divergence, so that the channel box can be realized starting from a finite number of bits of asymmetric distinguishability.
Environment-seizable channel boxes are defined to be environment-parametrized with associated environment states ρ_E and σ_E and additionally have the property that it is possible to find a common pre- and post-processing of the channel box (N_{A→B}, M_{A→B}) to retrieve the state box (ρ_E, σ_E) from it [BHKW18]. That is, for environment-seizable channels, there exists a common input state τ_{RA} and a common post-processing channel D_{RB→E} such that
$$\mathcal{D}_{RB\to E}(\mathcal{N}_{A\to B}(\tau_{RA})) = \rho_E, \tag{10}$$
$$\mathcal{D}_{RB\to E}(\mathcal{M}_{A\to B}(\tau_{RA})) = \sigma_E. \tag{11}$$
In this way, we have the following equivalence for environment-seizable channels:
$$(\mathcal{N}_{A\to B}, \mathcal{M}_{A\to B}) \leftrightarrow (\rho_E, \sigma_E), \tag{12}$$
with the direction ← of the equivalence following from (8) and the other direction → following from the seizable property. Thus, environment-seizable channel boxes represent a broader generalization of state boxes than do channel boxes consisting of replacer channels. Furthermore, environment-seizable channel boxes are fully identified with the background environment states ρ_E and σ_E in the above sense. As we show later, and as observed in earlier work [TW16, BHKW18], the equivalence in (12) simplifies the resource theory of asymmetric distinguishability significantly for environment-seizable channel boxes. Finally, several examples of environment-seizable channel boxes were presented in [BHKW18], and the notion of environment-seizable channel boxes is related to the notion of resource-seizable channels from [Wil18].

B. Superchannels as transformations of channel boxes
The most general physical transformation allowed on a channel box is a superchannel Θ, which is a quantum physical transformation of channels [CDP08b]. That is, a superchannel is a linear map that preserves the set of quantum channels, even when the quantum channel is an arbitrary bipartite channel with external input and output systems that are arbitrarily large. In this sense, superchannels are completely CPTP preserving. Note that the terminology "superchannel" was introduced in [Gou19].
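In more detail, a superchannel Θ_{(A→B)→(C→D)} takes as input a quantum channel N_{A→B} and outputs a quantum channel K_{C→D}, which we denote by
$$\mathcal{K}_{C\to D} = \Theta_{(A\to B)\to(C\to D)}(\mathcal{N}_{A\to B}). \tag{13}$$
The superchannel Θ_{(A→B)→(C→D)} is completely CPTP preserving in the sense that the following output channel is a CPTP map for all input quantum channels M_{RA→RB}:
$$\big(\mathrm{id}_{(R)\to(R)} \otimes \Theta_{(A\to B)\to(C\to D)}\big)(\mathcal{M}_{RA\to RB}), \tag{14}$$
where id_{(R)→(R)} denotes the identity superchannel [CDP08b]. One of the fundamental theorems of superchannels is that each superchannel Θ_{(A→B)→(C→D)} has a physical realization as a pre- and post-processing of the channel N_{A→B} along with a quantum memory system:
$$\Theta_{(A\to B)\to(C\to D)}(\mathcal{N}_{A\to B}) = \mathcal{D}_{BM\to D} \circ \mathcal{N}_{A\to B} \circ \mathcal{E}_{C\to AM}, \tag{15}$$
where E_{C→AM} and D_{BM→D} are pre- and post-processing quantum channels, respectively [CDP08b]. This transformation is depicted in Figure 1.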
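The pre- and post-processing form of a superchannel can be illustrated with a small Python sketch. It is only a classical analog, which is an assumption made here for simplicity and not the setting of the paper: channels are conditional probability tables, the memory is a classical register, and the function name `apply_superchannel` and the example tables are hypothetical.

```python
def apply_superchannel(E, N, D):
    """Classical analog of D_{BM->D} o (N_{A->B} x id_M) o E_{C->AM}.

    E[c][a][m]: probability that pre-processing maps input c to (a, m).
    N[a][b]:    probability that the channel maps a to b.
    D[b][m][d]: probability that post-processing maps (b, m) to d.
    Returns K with K[c][d] = probability of output d given input c.
    """
    n_c, n_a, n_m = len(E), len(E[0]), len(E[0][0])
    n_b, n_d = len(N[0]), len(D[0][0])
    K = [[0.0] * n_d for _ in range(n_c)]
    for c in range(n_c):
        for a in range(n_a):
            for m in range(n_m):
                for b in range(n_b):
                    for d in range(n_d):
                        K[c][d] += E[c][a][m] * N[a][b] * D[b][m][d]
    return K

# Hypothetical pre-processing: copy the input bit c into both a and the memory m.
E = [[[1.0 if (a == c and m == c) else 0.0 for m in range(2)]
      for a in range(2)] for c in range(2)]
# Hypothetical post-processing: output d = b XOR m.
D = [[[1.0 if d == (b ^ m) else 0.0 for d in range(2)]
      for m in range(2)] for b in range(2)]

N = [[0.9, 0.1], [0.1, 0.9]]  # first channel of the box: bit flip with probability 0.1
M = [[0.5, 0.5], [0.5, 0.5]]  # second channel of the box: completely randomizing

# The same superchannel acts on both channels of the box (N, M).
K = apply_superchannel(E, N, D)
L = apply_superchannel(E, M, D)
print(K, L)
```

In this toy case the memory lets the post-processing undo the dependence on the input, so the first output channel reports only whether a flip occurred, while the second remains completely randomizing.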

IV. GENERAL CHANNEL BOX TRANSFORMATION PROBLEM
We can now state one main problem for the resource theory of asymmetric distinguishability for quantum channels, which we call the channel box transformation problem. The goal of this problem is to determine, for an input channel box (N_{A→B}, M_{A→B}) and an output channel box (K_{C→D}, L_{C→D}), whether there exists a superchannel Θ_{(A→B)→(C→D)} such that the following transformation is possible:
$$(\mathcal{N}_{A\to B}, \mathcal{M}_{A\to B}) \xrightarrow{\ \Theta\ } (\mathcal{K}_{C\to D}, \mathcal{L}_{C\to D}), \tag{16}$$
where the notation means that the following equations should be satisfied:
$$\mathcal{K}_{C\to D} = \Theta_{(A\to B)\to(C\to D)}(\mathcal{N}_{A\to B}), \tag{17}$$
$$\mathcal{L}_{C\to D} = \Theta_{(A\to B)\to(C\to D)}(\mathcal{M}_{A\to B}). \tag{18}$$
This problem was introduced and solved in [Gou19], in the sense that the answer to this question can be determined by means of a semi-definite program or by employing the extended conditional min-entropy and a quantum dynamic generalization of majorization. The problem there was called "comparison of quantum channels." Note that the simpler problem regarding transformation of state boxes via a common quantum channel has a long history, having been considered extensively both in classical and quantum information theory [Bla53, AU80, CJW04, MOA11, Bus12, HJRW12, BDS14, BaHN+15, Ren16, BD16, Bus16, GJB+18, Bus17, BG17]. In many cases of interest, the transformation in (16) is simply not possible. Thus, it is sensible to modify the problem to allow for approximation, and the way that we do so is consistent with how we did so for the related problem in the resource theory of asymmetric distinguishability for states [WW19]. Namely, we allow for an approximation error in the transformation of the first channel in the box, but we demand that the second channel be simulated exactly (hence the descriptor "asymmetric" in "resource theory of asymmetric distinguishability").
Mathematically, this corresponds to the following optimization problem:
$$\varepsilon\big((\mathcal{N}, \mathcal{M}) \to (\mathcal{K}, \mathcal{L})\big) := \inf_{\Theta \in \mathrm{SC}} \big\{\varepsilon \in [0,1] : \Theta(\mathcal{N}) \approx_{\varepsilon} \mathcal{K},\ \Theta(\mathcal{M}) = \mathcal{L}\big\}, \tag{19}$$
where SC denotes the set of superchannels and the shorthand N_1 ≈_ε N_2 for channels N_1 and N_2 is defined as follows:
$$\mathcal{N}_1 \approx_{\varepsilon} \mathcal{N}_2 \quad \Longleftrightarrow \quad \tfrac{1}{2}\left\|\mathcal{N}_1 - \mathcal{N}_2\right\|_{\diamond} \le \varepsilon. \tag{20}$$
In the above, ‖P_{A→B}‖_⋄ denotes the diamond norm [Kit97] of a Hermiticity-preserving map P_{A→B}, defined as
$$\left\|\mathcal{P}_{A\to B}\right\|_{\diamond} := \sup_{\rho_{RA}} \left\|\mathcal{P}_{A\to B}(\rho_{RA})\right\|_1, \tag{21}$$
where the optimization is with respect to quantum states ρ_{RA} and the reference system R can be arbitrarily large. However, note that the following significant simplification holds:
$$\left\|\mathcal{P}_{A\to B}\right\|_{\diamond} = \sup_{\psi_{RA}} \left\|\mathcal{P}_{A\to B}(\psi_{RA})\right\|_1, \tag{22}$$
where the optimization is with respect to pure-state inputs ψ_{RA} with the reference system R isomorphic to the input system A. Why do we adopt the diamond norm to measure the distance between two quantum channels N_{A→B} and M_{A→B}? Relatedly, how should we assess the performance of a quantum information processing protocol in which the ideal channel to be simulated is N_{A→B} but the channel realized in practice is M_{A→B}? Suppose that a third party is trying to assess how distinguishable the actual channel M_{A→B} is from the ideal channel N_{A→B}. Such an individual has access to both the input and output ports of the channel, and so the most general strategy for the distinguisher to employ is to prepare a state ρ_{RA} of a reference system R and the channel input system A. The distinguisher transmits the A system of ρ_{RA} into the unknown channel. After that, the distinguisher receives the channel output system B and then performs a measurement described by the POVM {Λ^x_{RB}}_x on the reference system R and the channel output system B. The probability of obtaining a particular outcome x is given by the Born rule. In the case that the unknown channel is N_{A→B}, this probability is Tr[Λ^x_{RB} N_{A→B}(ρ_{RA})], and in the case that the unknown channel is M_{A→B}, this probability is Tr[Λ^x_{RB} M_{A→B}(ρ_{RA})]. What we demand is that the deviation between the two probabilities Tr[Λ^x_{RB} N_{A→B}(ρ_{RA})] and Tr[Λ^x_{RB} M_{A→B}(ρ_{RA})] is no larger than some tolerance ε.
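Since this should be the case for all possible input states and measurement outcomes, what we demand mathematically is that
$$\sup_{\rho_{RA},\, \Lambda_{RB}} \left|\mathrm{Tr}[\Lambda_{RB}\, \mathcal{N}_{A\to B}(\rho_{RA})] - \mathrm{Tr}[\Lambda_{RB}\, \mathcal{M}_{A\to B}(\rho_{RA})]\right| \le \varepsilon, \tag{23}$$
where ρ_{RA} ≥ 0, Tr[ρ_{RA}] = 1, and 0 ≤ Λ_{RB} ≤ I_{RB}. As a consequence of a well known characterization of trace distance from [Hel69, Hel76], we have that
$$\sup_{\rho_{RA},\, \Lambda_{RB}} \left|\mathrm{Tr}[\Lambda_{RB}\, (\mathcal{N}_{A\to B} - \mathcal{M}_{A\to B})(\rho_{RA})]\right| = \sup_{\rho_{RA}} \tfrac{1}{2}\left\|(\mathcal{N}_{A\to B} - \mathcal{M}_{A\to B})(\rho_{RA})\right\|_1 \tag{24}$$
$$= \tfrac{1}{2}\left\|\mathcal{N} - \mathcal{M}\right\|_{\diamond}, \tag{25}$$
where ½‖N − M‖_⋄ is the normalized diamond distance between N and M. This indicates that if ½‖N − M‖_⋄ ≤ ε, then the deviation between probabilities for any possible input state and measurement operator never exceeds ε, so that the approximation between quantum channels N_{A→B} and M_{A→B} is naturally quantified by the normalized diamond distance ½‖N − M‖_⋄. We note that related interpretations of the diamond distance of channels have been given in [KW04, RW05, GLN05].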
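The operational reading of the normalized diamond distance can be checked numerically in a simple special case. The Python sketch below assumes classical channels given as conditional probability tables (a restriction introduced here, not made in the paper), for which the normalized diamond distance reduces to the worst-case total variation distance over input symbols. It compares that value with a brute-force maximum of the deviation in outcome probabilities over deterministic inputs and events. The function names and the two example channels are hypothetical.

```python
from itertools import product

def normalized_diamond_distance_classical(N, M):
    """Worst-case total variation distance between output distributions.

    N[x][y] and M[x][y] are probabilities of output y given input x.
    Valid as the normalized diamond distance only for classical channels.
    """
    return max(
        0.5 * sum(abs(n - m) for n, m in zip(N[x], M[x]))
        for x in range(len(N))
    )

def max_probability_deviation(N, M):
    """Brute force over deterministic inputs x and events (subsets of outputs)."""
    best = 0.0
    n_out = len(N[0])
    for x in range(len(N)):
        for event in product([0, 1], repeat=n_out):
            p = sum(e * v for e, v in zip(event, N[x]))  # probability of event under N
            q = sum(e * v for e, v in zip(event, M[x]))  # probability of event under M
            best = max(best, abs(p - q))
    return best

# Hypothetical channels with two inputs and three outputs.
N = [[0.9, 0.1, 0.0], [0.2, 0.5, 0.3]]
M = [[0.6, 0.3, 0.1], [0.2, 0.4, 0.4]]

dist = normalized_diamond_distance_classical(N, M)
dev = max_probability_deviation(N, M)
print(dist, dev)
```

For these tables both quantities agree up to floating-point error, which is the classical shadow of the equality between the largest probability deviation and the normalized diamond distance.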
As we indicated above, the approximate channel box transformation problem is fundamental to the resource theory of asymmetric distinguishability, indicating exactly how well one can convert channel boxes. It captures distinguishability in a fundamental way: as pointed out in [Gou19], a necessary condition for a transformation to be possible exactly is that the two channels in the input channel box are at least as distinguishable as the two channels in the target channel box, as quantified by a channel divergence [LKDW18]. Thinking along the lines of [BaHN+15], these kinds of limitations from channel divergences can be interpreted as "second laws" for distinguishability that draw the line between the possible and impossible. As these lines might be too sharp for practical purposes (i.e., if the transformation were to be possible with small error), it is sensible to consider the relaxation presented in (19). Furthermore, generalizations of the approximate box transformation problem will have applications in other resource theories of channels, such as entanglement, thermodynamics, purity, magic, etc., and therein can also be interpreted as second laws or approximate second laws.
In Appendix B, we show that the optimization in (19) for the approximate channel box transformation problem can be calculated by a semi-definite program, and thus can be efficiently solved, where the complexity of the problem is polynomial in the dimension of the inputs and outputs of the channels (N_{A→B}, M_{A→B}) and (K_{C→D}, L_{C→D}). This result generalizes the recent finding in [Gou19] mentioned after (16) above.

V. ONE-SHOT DISTILLATION AND DILUTION OF QUANTUM CHANNEL BOXES
Another way of addressing the general approximate channel box transformation problem, which is helpful for considering asymptotic versions of the problem, is to break it into two steps, as was done in [WW19] for the case of states. Namely, one can first distill a standard channel box from the original one, and then dilute this standard channel box to the final target one. In this work, we take the standard channel box to be the following one:
$$\big(\mathcal{R}^{|0\rangle\langle 0|},\ \mathcal{R}^{\pi_M}\big), \tag{26}$$
where R^σ denotes a replacer channel, which has the following action on an arbitrary input ρ:
$$\mathcal{R}^{\sigma}(\rho) = \mathrm{Tr}[\rho]\,\sigma, \tag{27}$$
which is simply to discard the input ρ and replace it with a state σ. Also, the state π_M is defined as
$$\pi_M := \frac{1}{M}|0\rangle\langle 0| + \left(1 - \frac{1}{M}\right)|1\rangle\langle 1|, \tag{28}$$
for M ≥ 1. Our interpretation of the channel box in (26) is that it contains log_2 M bits of asymmetric distinguishability. Since the replacer channel box in (26) is equivalent to the state box (|0⟩⟨0|, π_M), this interpretation is consistent with the interpretation given in [WW19].
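A. Exact one-shot distillation and dilution of quantum channel boxes
A primary goal in this setting is the task of exact distillation of as many bits of asymmetric distinguishability as possible, which is similar to the task for states considered in [WW19], but instead we allow for the most general processing of the channel box according to a superchannel. Mathematically, we can phrase this problem as the following optimization:
$$D_d^0(\mathcal{N}, \mathcal{M}) := \log_2 \sup_{\Theta \in \mathrm{SC}} \big\{M : \Theta(\mathcal{N}) = \mathcal{R}^{|0\rangle\langle 0|},\ \Theta(\mathcal{M}) = \mathcal{R}^{\pi_M}\big\}. \tag{29}$$
We also consider exact dilution of the channel box, starting from as few bits of asymmetric distinguishability as possible. The requirement here is to convert bits of asymmetric distinguishability by the action of a common superchannel to the channel box (N, M) exactly, in such a way that the number of bits log_2 M of asymmetric distinguishability is as small as possible. Mathematically, this corresponds to the following optimization problem:
$$D_c^0(\mathcal{N}, \mathcal{M}) := \log_2 \inf_{\Theta \in \mathrm{SC}} \big\{M : \Theta(\mathcal{R}^{|0\rangle\langle 0|}) = \mathcal{N},\ \Theta(\mathcal{R}^{\pi_M}) = \mathcal{M}\big\}. \tag{30}$$
Let D_min(N‖M) denote the channel min-relative entropy, defined as
$$D_{\min}(\mathcal{N}\|\mathcal{M}) := \sup_{\psi_{RA}} D_{\min}\big(\mathcal{N}_{A\to B}(\psi_{RA}) \,\big\|\, \mathcal{M}_{A\to B}(\psi_{RA})\big), \tag{31}$$
where the min-relative entropy of states ρ and σ is defined as [Dat09]
$$D_{\min}(\rho\|\sigma) := -\log_2 \mathrm{Tr}[\Pi_{\rho}\, \sigma], \tag{32}$$
and Π_ρ is the projection onto the support of ρ. Note that the min-relative entropy of states is also equal to the Petz-Rényi relative entropy of order zero [Pet85, Pet86], as observed in [Dat09]. Let D_max(N‖M) denote the channel max-relative entropy [CMW16, LKDW18, GFW+18], defined as
$$D_{\max}(\mathcal{N}\|\mathcal{M}) := \sup_{\psi_{RA}} D_{\max}\big(\mathcal{N}_{A\to B}(\psi_{RA}) \,\big\|\, \mathcal{M}_{A\to B}(\psi_{RA})\big) \tag{33}$$
$$= D_{\max}\big(\mathcal{N}_{A\to B}(\Phi_{RA}) \,\big\|\, \mathcal{M}_{A\to B}(\Phi_{RA})\big), \tag{34}$$
with the maximally entangled state Φ_{RA} of Schmidt rank d defined as
$$\Phi_{RA} := \frac{1}{d}\sum_{i,j=1}^{d} |i\rangle\langle j|_R \otimes |i\rangle\langle j|_A, \tag{35}$$
and the max-relative entropy of states ρ and σ defined as [Dat09]
$$D_{\max}(\rho\|\sigma) := \inf\big\{\lambda : \rho \le 2^{\lambda}\sigma\big\}. \tag{36}$$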
The equality in (34) was proved in [GFW+18, BHKW18]. We then have the following fundamental result for exact distillation and dilution:
Theorem 1 The exact one-shot distillable distinguishability of the channel box (N, M) is equal to the channel min-relative entropy:
$$D_d^0(\mathcal{N}, \mathcal{M}) = D_{\min}(\mathcal{N}\|\mathcal{M}), \tag{37}$$
and the exact one-shot distinguishability cost is equal to the channel max-relative entropy:
$$D_c^0(\mathcal{N}, \mathcal{M}) = D_{\max}(\mathcal{N}\|\mathcal{M}). \tag{38}$$
The equality in (37) is proved in Appendix C 1, and the equality in (38) is proved in Appendix C 2.
We remark that it is appealing that the exact one-shot distinguishability cost of a channel box has a simple characterization in terms of the Choi states of the channels N and M, as indicated by the equality in (34).
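The interpretation of the standard box as containing log_2 M bits of asymmetric distinguishability can be checked directly for the state box (|0⟩⟨0|, π_M). The Python sketch below assumes commuting (diagonal) states, represented as probability vectors, so that the min- and max-relative entropies reduce to elementary formulas. The helper names `d_min`, `d_max`, and `pi_state` are hypothetical.

```python
import math

def d_min(p, q):
    """Min-relative entropy -log2 Tr[Pi_p q] for commuting states (probability vectors)."""
    overlap = sum(qi for pi, qi in zip(p, q) if pi > 0)  # weight of q on the support of p
    return math.inf if overlap == 0 else -math.log2(overlap)

def d_max(p, q):
    """Max-relative entropy: smallest lambda such that p <= 2**lambda * q entrywise."""
    if any(pi > 0 and qi == 0 for pi, qi in zip(p, q)):
        return math.inf  # the support of p is not contained in the support of q
    return math.log2(max(pi / qi for pi, qi in zip(p, q) if pi > 0))

def pi_state(M):
    """Diagonal of pi_M = (1/M)|0><0| + (1 - 1/M)|1><1|."""
    return [1.0 / M, 1.0 - 1.0 / M]

zero = [1.0, 0.0]                     # diagonal of |0><0|
bits_min = d_min(zero, pi_state(8))   # distillable bits of the standard box with M = 8
bits_max = d_max(zero, pi_state(8))   # cost in bits of the standard box with M = 8
print(bits_min, bits_max)
```

Both quantities evaluate to log_2 8 = 3 for this box, consistent with the standard box being a unit of currency for which distillation and dilution coincide.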

B. Approximate one-shot distillation and dilution of quantum channel boxes
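We also consider approximate versions of these tasks. The goal of approximate distillation is to transform the channel box (N, M) into as many ε-approximate bits of asymmetric distinguishability as possible. Mathematically, this corresponds to the following optimization:
$$D_d^{\varepsilon}(\mathcal{N}, \mathcal{M}) := \log_2 \sup_{\Theta \in \mathrm{SC}} \big\{M : \Theta(\mathcal{N}) \approx_{\varepsilon} \mathcal{R}^{|0\rangle\langle 0|},\ \Theta(\mathcal{M}) = \mathcal{R}^{\pi_M}\big\}. \tag{39}$$
The goal of approximate dilution is to transform as few bits of asymmetric distinguishability as possible into a channel box (Ñ, M), such that Ñ ≈_ε N. Mathematically, this corresponds to the following optimization:
$$D_c^{\varepsilon}(\mathcal{N}, \mathcal{M}) := \log_2 \inf_{\Theta \in \mathrm{SC}} \big\{M : \Theta(\mathcal{R}^{|0\rangle\langle 0|}) \approx_{\varepsilon} \mathcal{N},\ \Theta(\mathcal{R}^{\pi_M}) = \mathcal{M}\big\}. \tag{40}$$
Let D^ε_min(N‖M) denote the smooth channel min-relative entropy from [CMW16], defined as
$$D_{\min}^{\varepsilon}(\mathcal{N}\|\mathcal{M}) := \sup_{\psi_{RA}} D_{\min}^{\varepsilon}\big(\mathcal{N}_{A\to B}(\psi_{RA}) \,\big\|\, \mathcal{M}_{A\to B}(\psi_{RA})\big), \tag{41}$$
with the optimization being with respect to all pure states ψ_{RA} with system R isomorphic to the channel input system A. The smooth min-relative entropy of states ρ and σ is defined as [BD10, BD11, WR12]
$$D_{\min}^{\varepsilon}(\rho\|\sigma) := -\log_2 \inf_{\Lambda} \big\{\mathrm{Tr}[\Lambda\sigma] : 0 \le \Lambda \le I,\ \mathrm{Tr}[\Lambda\rho] \ge 1 - \varepsilon\big\}. \tag{42}$$
The quantity D^ε_min(ρ‖σ) is also known as the hypothesis testing relative entropy [WR12], and D^ε_min(N‖M) is also known as the channel hypothesis testing relative entropy [CMW16].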
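For intuition about the smooth min-relative entropy of states, recall that it is the negative logarithm of the smallest value of Tr[Λσ] over tests 0 ≤ Λ ≤ I with Tr[Λρ] ≥ 1 − ε. The Python sketch below assumes commuting states (probability vectors), in which case this optimization is a small linear program that a Neyman-Pearson style greedy rule solves exactly. The function name `d_eps_min` is hypothetical, and the standard box with M = 8 is used only as an illustration.

```python
import math

def d_eps_min(p, q, eps):
    """Hypothesis testing relative entropy for probability vectors p and q.

    Computes -log2 min{ sum_i L_i q_i : 0 <= L_i <= 1, sum_i L_i p_i >= 1 - eps }
    by filling the test greedily on outcomes with the smallest ratio q_i / p_i.
    """
    items = sorted((qi / pi, pi, qi) for pi, qi in zip(p, q) if pi > 0)
    need = 1.0 - eps          # acceptance probability under p still to be collected
    cost = 0.0                # accumulated acceptance probability under q
    for _, pi, qi in items:
        if need <= 1e-15:
            break
        frac = min(1.0, need / pi)   # fraction of this outcome placed in the test
        cost += frac * qi
        need -= frac * pi
    return math.inf if cost == 0 else -math.log2(cost)

zero = [1.0, 0.0]            # diagonal of |0><0|
pi_8 = [1.0 / 8, 7.0 / 8]    # diagonal of pi_M with M = 8

exact_bits = d_eps_min(zero, pi_8, 0.0)    # no error allowed
smooth_bits = d_eps_min(zero, pi_8, 0.5)   # error tolerance 1/2
print(exact_bits, smooth_bits)
```

For the standard box the greedy test is (1 − ε)|0⟩⟨0|, so the value is log_2 M + log_2(1/(1 − ε)): allowing an error of 1/2 adds exactly one bit in this example.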
Let D^ε_max(N‖M) denote the smooth channel max-relative entropy [GFW+18, Definition 19], defined as
$$D_{\max}^{\varepsilon}(\mathcal{N}\|\mathcal{M}) := \inf_{\widetilde{\mathcal{N}} \approx_{\varepsilon} \mathcal{N}} D_{\max}(\widetilde{\mathcal{N}}\|\mathcal{M}), \tag{43}$$
with the optimization being with respect to all quantum channels Ñ satisfying Ñ ≈_ε N, in the sense of (20). We note here that the smooth channel max-relative entropy has been studied extensively in [LW19], in the context of resource erasure. In Appendix C 3, we prove that the smooth channel min- and max-relative entropies can be calculated by semi-definite programs. It follows from these characterizations that the non-smooth quantities can be as well.
We then have the following result, endowing both the smooth channel min- and max-relative entropies with fundamental operational meanings in the context of the resource theory of asymmetric distinguishability:
Theorem 2 The approximate one-shot distillable distinguishability of the channel box (N, M) is equal to the smooth channel min-relative entropy:
$$D_d^{\varepsilon}(\mathcal{N}, \mathcal{M}) = D_{\min}^{\varepsilon}(\mathcal{N}\|\mathcal{M}), \tag{44}$$
and the approximate one-shot distinguishability cost is equal to the smooth channel max-relative entropy:
$$D_c^{\varepsilon}(\mathcal{N}, \mathcal{M}) = D_{\max}^{\varepsilon}(\mathcal{N}\|\mathcal{M}). \tag{45}$$
The equality in (44) is proved in Appendix C 4, and the equality in (45) is proved in Appendix C 5.
As a consequence of Theorems 1 and 2, and the facts in (46)-(47), we conclude the limits in (48)-(49). We give alternative proofs of these limits in Appendix C 6. As an application of the operational approach taken here, we arrive at the following bound relating D^{ε1}_min and D^{ε2}_max:
$$D_{\min}^{\varepsilon_1}(\mathcal{N}\|\mathcal{M}) \le D_{\max}^{\varepsilon_2}(\mathcal{N}\|\mathcal{M}) + \log_2\!\left(\frac{1}{1 - \varepsilon_1 - \varepsilon_2}\right), \tag{50}$$
where ε_1, ε_2 ≥ 0 and ε_1 + ε_2 < 1. This bound represents a generalization of a related bound for quantum states in [WW19], and it in fact reduces to it when the channel box (N, M) is environment seizable.
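The main idea for arriving at the bound in (50) can be understood as a channel generalization of the operational argument from [WW19]. As shown in [WW19], any approximate distillation protocol performed on the state box (|0⟩⟨0|, π_M) that leads to the state box ($\widetilde{0}_{\varepsilon}$, π_K), for ε ∈ [0, 1) and $\widetilde{0}_{\varepsilon}$ a state such that $\widetilde{0}_{\varepsilon}$ ≈_ε |0⟩⟨0|, is required to obey the bound
$$\log_2 K \le \log_2 M + \log_2\!\left(\frac{1}{1 - \varepsilon}\right). \tag{51}$$
One way to realize the full transformation is to proceed in two steps: use the equivalence of the state box (|0⟩⟨0|, π_M) with the replacer channel box in (26) to perform an optimal dilution protocol that prepares a channel box (Ñ, M) with Ñ ≈_{ε2} N and log_2 M = D^{ε2}_max(N‖M), and then perform an optimal distillation protocol on the channel box, which outputs an approximation of the replacer channel box (R^{|0⟩⟨0|}_{C→D}, R^{π_K}_{C→D}) with log_2 K = D^{ε1}_min(N‖M). Finally, we obtain a state box of the form ($\widetilde{0}$, π_K) by inputting any state to the final channel box. By employing the triangle inequality for the diamond distance, the error of the overall transformation is no larger than ε_1 + ε_2. Since the fundamental limitation in (51) applies to any protocol, the bound in (50) follows.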

VI. PARALLEL n-SHOT DISTILLATION AND DILUTION OF QUANTUM CHANNEL BOXES
An important case to consider in the resource theory of asymmetric distinguishability for channels is the case of parallel tasks. In particular, we are interested in n-shot parallel distillation and dilution of channel boxes, which essentially amounts to the replacement (N, M) → (N^{⊗n}, M^{⊗n}) in our previous one-shot results from Section V. However, here we are interested in optimal rates at which one can distill or dilute bits of asymmetric distinguishability from or to a channel box, respectively, both in the exact and approximate cases.

A. Exact case: distillable distinguishability
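We define the n-shot, parallel, exact distillable distinguishability of a channel box (N, M) as follows:
$$\frac{1}{n} D_d^0(\mathcal{N}^{\otimes n}, \mathcal{M}^{\otimes n}) = \frac{1}{n} D_{\min}(\mathcal{N}^{\otimes n}\|\mathcal{M}^{\otimes n}),$$
noting that it is equal to the optimal rate at which one can distill exact bits of asymmetric distinguishability for fixed n ≥ 1. The equality above is a direct consequence of (37). The asymptotic parallel exact distillable distinguishability is then defined as
$$\lim_{n\to\infty} \frac{1}{n} D_d^0(\mathcal{N}^{\otimes n}, \mathcal{M}^{\otimes n}) = \lim_{n\to\infty} \frac{1}{n} D_{\min}(\mathcal{N}^{\otimes n}\|\mathcal{M}^{\otimes n}), \tag{55}$$
where the equality is again a direct consequence of (37). We note that the regularization in (55) seems to be necessary in general, due to the fact that D_min for channels can be non-additive. As an example, suppose that N is the identity channel and M is a unitary channel characterized by a unitary operator U. Then it follows that
$$D_{\min}(\mathcal{N}\|\mathcal{M}) = -\log_2 F(I, U), \qquad F(I, U) := \min_{|\psi\rangle_{RA}} \left|\langle\psi|_{RA}\, (I_R \otimes U_A)\, |\psi\rangle_{RA}\right|^2.$$
It is known from [Aci01] that there are unitaries for which F(I, U) ∈ (0, 1) but F(I^{⊗n}, U^{⊗n}) = 0 for some finite n.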
Turning this around, we conclude that there are channels for which $D_{\min}(\mathcal{N} \| \mathcal{M}) < \infty$ but $D_{\min}(\mathcal{N}^{\otimes n} \| \mathcal{M}^{\otimes n}) = \infty$ for some finite $n$, indicating that the channel min-relative entropy exhibits an extreme form of non-additivity. A special case for which the exact distillable distinguishability simplifies is for environment-seizable channels. As a consequence of the observation in (12), for any channel box $(\mathcal{N}, \mathcal{M})$ that is environment seizable in the sense of (12), the n-shot exact distillable distinguishability reduces to that of the underlying environment states, and it does not depend on $n$. The first step follows from (12), and the second follows from the additivity of the min-relative entropy for states. We thus conclude that the asymptotic exact parallel distillable distinguishability has a single-letter formula for the case of environment-seizable channel boxes, given by the min-relative entropy of the environment states.
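The extreme non-additivity described above can be checked numerically on a small example. The following is a minimal sketch, not the construction of [Aci01]: it assumes that the channel fidelity is $F(I, U) = \min_\rho |\mathrm{Tr}[\rho U]|^2$, which equals the squared distance from the origin to the convex hull of the eigenvalues of $U$, and it uses a hypothetical qubit unitary with eigenphases $0$ and $2\pi/3$. The function names are illustrative only.

```python
import itertools
import math


def min_fidelity_from_phases(phases):
    """Return min over density operators rho of |Tr[rho U]|^2.

    Assumption: U is a unitary with the given eigenphases. Tr[rho U] ranges
    over the convex hull of the eigenvalues of U, so the minimum is the
    squared distance from the origin to that hull.
    """
    angles = sorted(p % (2 * math.pi) for p in phases)
    # Gaps between neighboring eigenphases on the circle, including wrap-around.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - angles[-1] + angles[0])
    largest_gap = max(gaps)
    if largest_gap <= math.pi:
        # The eigenvalues do not fit in an open half-circle: hull contains 0.
        return 0.0
    arc = 2 * math.pi - largest_gap  # smallest arc containing all eigenvalues
    return math.cos(arc / 2) ** 2


def tensor_power_phases(phases, n):
    """Eigenphases of the n-fold tensor power: all sums of n eigenphases."""
    return [sum(combo) for combo in itertools.product(phases, repeat=n)]


def d_min_identity_vs_unitary(phases, n=1):
    """-log2 of the minimum fidelity for n parallel uses (inf if it vanishes)."""
    f = min_fidelity_from_phases(tensor_power_phases(phases, n))
    return math.inf if f == 0.0 else -math.log2(f)


THETA = 2 * math.pi / 3          # hypothetical example unitary diag(1, e^{i THETA})
d_min_one = d_min_identity_vs_unitary([0.0, THETA], n=1)
d_min_two = d_min_identity_vs_unitary([0.0, THETA], n=2)
```

Under these assumptions a single use gives a finite value, while two parallel uses already give an infinite value, because the eigenvalues of $U \otimes U$ include all three cube roots of unity and their convex hull contains the origin.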

B. Exact case: distinguishability cost
We define the n-shot, parallel, exact distinguishability cost of a channel box $(\mathcal{N}, \mathcal{M})$ as the quantity
$$\frac{1}{n} D_c^0(\mathcal{N}^{\otimes n}, \mathcal{M}^{\otimes n}) = \frac{1}{n} D_{\max}(\mathcal{N}^{\otimes n} \| \mathcal{M}^{\otimes n}) = D_{\max}(\mathcal{N} \| \mathcal{M}),$$
noting that it is equal to the optimal rate at which one can dilute exact bits of asymmetric distinguishability to the channel box $(\mathcal{N}^{\otimes n}, \mathcal{M}^{\otimes n})$ for fixed $n \geq 1$. The equalities above are a direct consequence of (38) and the additivity of the max-relative entropy of channels, due to the fact that (34) holds.
The asymptotic exact distinguishability cost is then defined as the limit $n \to \infty$ of the above quantity, and it is equal to $D_{\max}(\mathcal{N} \| \mathcal{M})$, where the equality is again a direct consequence of (38). Thus, exact distinguishability dilution in the parallel case is rather different from exact distinguishability distillation, given that we have a simple single-letter formula characterizing all channel boxes for the former case but not for the latter.
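The single-letter behavior of the exact cost can be illustrated numerically. The sketch below assumes the standard identification of the channel max-relative entropy with the max-relative entropy of the Choi operators, and it uses two qubit depolarizing channels as a hypothetical example (neither the channels nor the function names are taken from the text). It checks that two parallel uses cost exactly twice as much as one use.

```python
import numpy as np


def d_max(rho, sigma):
    """Max-relative entropy log2 lambda_max(sigma^{-1/2} rho sigma^{-1/2}).

    Assumption: sigma is full rank, so the inverse square root exists.
    """
    vals, vecs = np.linalg.eigh(sigma)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.conj().T
    top = np.linalg.eigvalsh(inv_sqrt @ rho @ inv_sqrt)[-1]  # largest eigenvalue
    return float(np.log2(top))


def depolarizing_choi(p):
    """Choi operator of rho -> (1 - p) rho + p I/2 on a qubit.

    Uses the unnormalized vector |Gamma> = |00> + |11>.
    """
    gamma = np.array([1.0, 0.0, 0.0, 1.0])
    return (1 - p) * np.outer(gamma, gamma) + p * np.eye(4) / 2


choi_n = depolarizing_choi(0.1)   # hypothetical channel N
choi_m = depolarizing_choi(0.5)   # hypothetical channel M

one_use = d_max(choi_n, choi_m)
# Choi operator of a tensor-power channel, up to a reordering of systems that
# is applied to both operators and hence does not change the value.
two_uses = d_max(np.kron(choi_n, choi_n), np.kron(choi_m, choi_m))
```

For this pair of commuting Choi operators the one-use value can be computed by hand as $\log_2(1.85/1.25) = \log_2 1.48$, and the two-use value is twice that.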

C. Approximate case: distillable distinguishability
We define the n-shot, parallel, ε-approximate distillable distinguishability as the quantity
$$\frac{1}{n} D_d^{\varepsilon}(\mathcal{N}^{\otimes n}, \mathcal{M}^{\otimes n}),$$
noting that it is equal to the optimal rate at which one can distill approximate bits of asymmetric distinguishability for fixed $n \geq 1$ and $\varepsilon \in (0, 1)$. The asymptotic parallel distillable distinguishability of the channel box $(\mathcal{N}, \mathcal{M})$ is then defined as the limit of the above formula as $n \to \infty$ followed by $\varepsilon \to 0$, and it can equivalently be written in terms of the smooth channel min-relative entropy $D^{\varepsilon}_{\min}(\mathcal{N}^{\otimes n} \| \mathcal{M}^{\otimes n})$, where the latter equality follows from (44). Note that the quantity in (66) is equal to the optimal exponent in Stein's lemma for the case of parallel quantum channel discrimination [CMW16]. The following theorem gives a formal expression for this quantity in terms of the regularized channel relative entropy.
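Theorem 3 The parallel distillable distinguishability of the channel box $(\mathcal{N}, \mathcal{M})$ is equal to the regularized channel relative entropy $\lim_{n \to \infty} \frac{1}{n} D(\mathcal{N}^{\otimes n} \| \mathcal{M}^{\otimes n})$, and it is finite if and only if $D_{\max}(\mathcal{N} \| \mathcal{M}) < \infty$.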
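Proof. By exploiting the following bound for states $\rho$ and $\sigma$ [WR12, MW14, KW17],
$$D^{\varepsilon}_{\min}(\rho \| \sigma) \leq \frac{1}{1-\varepsilon}\left[D(\rho \| \sigma) + h_2(\varepsilon)\right],$$
where $h_2(\varepsilon) := -\varepsilon \log_2 \varepsilon - (1-\varepsilon) \log_2(1-\varepsilon)$ is the binary entropy, we conclude the following bound for channels after an optimization:
$$D^{\varepsilon}_{\min}(\mathcal{N} \| \mathcal{M}) \leq \frac{1}{1-\varepsilon}\left[D(\mathcal{N} \| \mathcal{M}) + h_2(\varepsilon)\right].$$
By making the substitution $(\mathcal{N}, \mathcal{M}) \to (\mathcal{N}^{\otimes m}, \mathcal{M}^{\otimes m})$, dividing by $m$, and taking the limit $m \to \infty$ followed by $\varepsilon \to 0$, we conclude that the parallel distillable distinguishability is bounded from above by the regularized channel relative entropy $\lim_{m \to \infty} \frac{1}{m} D(\mathcal{N}^{\otimes m} \| \mathcal{M}^{\otimes m})$; this is the bound (71).

Also, note that the following lower bound holds as a consequence of the lower bound from [Li14, TH13]:
$$\frac{1}{n} D^{\varepsilon}_{\min}(\mathcal{N}^{\otimes n} \| \mathcal{M}^{\otimes n}) \geq D(\mathcal{N} \| \mathcal{M}) + \sqrt{\frac{V_{\varepsilon}(\mathcal{N} \| \mathcal{M})}{n}}\, \Phi^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right), \tag{72}$$
where $\Phi^{-1}$ is the inverse of the cumulative standard normal distribution function and $V_{\varepsilon}(\mathcal{N} \| \mathcal{M})$ is the channel relative entropy variance, defined in terms of the variances $V(\mathcal{N}(\psi) \| \mathcal{M}(\psi))$ for $\psi \in \Pi$, with $\Pi$ the set of all bipartite pure states achieving the optimal value of $D(\mathcal{N} \| \mathcal{M})$ and the relative entropy variance $V(\rho \| \sigma)$ of states $\rho$ and $\sigma$ defined as
$$V(\rho \| \sigma) := \mathrm{Tr}\!\left[\rho \left(\log_2 \rho - \log_2 \sigma - D(\rho \| \sigma)\right)^2\right].$$
Taking the limit as $n \to \infty$ and $\varepsilon \to 0$, we find that the parallel distillable distinguishability is bounded from below by $D(\mathcal{N} \| \mathcal{M})$. However, we can also conclude a bound of the same kind by making the substitution $(\mathcal{N}, \mathcal{M}) \to (\mathcal{N}^{\otimes m}, \mathcal{M}^{\otimes m})$ in (72), from which we conclude that the parallel distillable distinguishability is no smaller than $\frac{1}{m} D(\mathcal{N}^{\otimes m} \| \mathcal{M}^{\otimes m})$ for all $m \geq 1$. Since this bound holds for all $m$, we can take the limit $m \to \infty$, and when combining with (71), we conclude that the parallel distillable distinguishability is equal to the regularized channel relative entropy. As observed in [BHKW18, Remark 19], the regularized channel relative entropy is finite if and only if $D_{\max}(\mathcal{N} \| \mathcal{M}) < \infty$.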
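The upper bound for states that enters the proof can be sanity-checked in the classical (commuting) case. The following sketch assumes the bound has the form $D^{\varepsilon}_{\min}(\rho\|\sigma) \leq [D(\rho\|\sigma) + h_2(\varepsilon)]/(1-\varepsilon)$, which is our reading of the bound cited from [WR12, MW14, KW17], and that $D^{\varepsilon}_{\min}$ is the hypothesis testing relative entropy. For probability distributions the optimal test is found greedily by likelihood ratio. The distributions and function names are hypothetical.

```python
import math


def relative_entropy(p, q):
    """Classical relative entropy D(p||q) in bits (assumes q > 0 where p > 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def binary_entropy(eps):
    """Binary entropy h_2(eps) in bits."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)


def smooth_d_min(p, q, eps):
    """Classical hypothesis testing relative entropy.

    Computes -log2 of min sum_i t_i q_i subject to sum_i t_i p_i >= 1 - eps
    and 0 <= t_i <= 1. This is a fractional knapsack problem, solved greedily
    by increasing ratio q_i / p_i (assumes all p_i > 0).
    """
    order = sorted(range(len(p)), key=lambda i: q[i] / p[i])
    need = 1 - eps
    beta = 0.0
    for i in order:
        if need <= 0:
            break
        frac = min(1.0, need / p[i])  # fraction of outcome i accepted by the test
        beta += frac * q[i]
        need -= frac * p[i]
    return -math.log2(beta)


p = [0.7, 0.2, 0.1]   # hypothetical distribution playing the role of rho
q = [0.2, 0.3, 0.5]   # hypothetical distribution playing the role of sigma
eps = 0.1

lhs = smooth_d_min(p, q, eps)
rhs = (relative_entropy(p, q) + binary_entropy(eps)) / (1 - eps)
```

For these numbers the optimal test accepts the first two outcomes, so the left-hand side is one bit, and the right-hand side evaluates to roughly 1.54 bits, consistent with the bound.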
An important case in which the situation simplifies considerably is for environment-seizable channel boxes, as identified in [BHKW18]. As a consequence of the observation in (12), for any channel box $(\mathcal{N}, \mathcal{M})$ that is environment seizable in the sense of (12), the n-shot approximate distillable distinguishability coincides with that of the underlying environment states. For such channels, we can even conclude the expansion known for states, now applied to the environment states.

Another important case for which we have a handle on the distillable distinguishability is classical-quantum channel boxes, defined as
$$\mathcal{N}_{X \to B}(\omega_X) := \sum_x \langle x|_X \omega_X |x\rangle_X \, \rho^x_B, \tag{82}$$
$$\mathcal{M}_{X \to B}(\omega_X) := \sum_x \langle x|_X \omega_X |x\rangle_X \, \sigma^x_B, \tag{83}$$
where $\omega_X$ is an arbitrary input state, $\{|x\rangle_X\}_x$ is an orthonormal basis, and $\{\rho^x_B\}_x$ and $\{\sigma^x_B\}_x$ are sets of states. An immediate consequence of [BHKW18, Corollary 28] is that the asymptotic parallel distillable distinguishability of a classical-quantum channel box is equal to $\max_x D(\rho^x_B \| \sigma^x_B)$. That is, Eq. (84) indicates that the asymptotic parallel distillable distinguishability of a classical-quantum channel box depends only on the maximum quantum relative entropy that can be realized by the input of a single classical state to the channels.
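To make the statement for classical-quantum channel boxes concrete, the following sketch evaluates the maximum of the quantum relative entropy over classical inputs for a small hypothetical box with two input letters and qubit outputs. The states and function names are assumptions made for illustration, and the second state of each pair is taken to be full rank so that every value is finite.

```python
import numpy as np


def quantum_relative_entropy(rho, sigma):
    """D(rho||sigma) = Tr[rho (log2 rho - log2 sigma)] in bits.

    Assumption: sigma is full rank. Zero eigenvalues of rho are skipped,
    following the convention 0 log 0 = 0.
    """
    r_vals = np.linalg.eigvalsh(rho)
    pos = r_vals > 1e-12
    term_rho = float(np.sum(r_vals[pos] * np.log2(r_vals[pos])))
    s_vals, s_vecs = np.linalg.eigh(sigma)
    log_sigma = s_vecs @ np.diag(np.log2(s_vals)) @ s_vecs.conj().T
    term_sigma = float(np.real(np.trace(rho @ log_sigma)))
    return term_rho - term_sigma


def cq_box_relative_entropy(rhos, sigmas):
    """max over input letters x of D(rho^x || sigma^x)."""
    return max(quantum_relative_entropy(r, s) for r, s in zip(rhos, sigmas))


# Hypothetical classical-quantum channel box with two input letters.
rhos = [np.diag([0.5, 0.5]), np.diag([1.0, 0.0])]
sigmas = [np.diag([0.25, 0.75]), np.diag([0.5, 0.5])]

per_letter = [quantum_relative_entropy(r, s) for r, s in zip(rhos, sigmas)]
box_value = cq_box_relative_entropy(rhos, sigmas)
```

Here the first letter contributes about 0.21 bits and the second contributes exactly one bit (a pure state against the maximally mixed state), so the second letter alone determines the value for the box.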

D. Approximate case: distinguishability cost
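We define the n-shot, parallel, ε-approximate distinguishability cost as the quantity
$$\frac{1}{n} D_c^{\varepsilon}(\mathcal{N}^{\otimes n}, \mathcal{M}^{\otimes n}),$$
noting that it is equal to the optimal rate at which one can dilute the channel box $(\mathcal{N}^{\otimes n}, \mathcal{M}^{\otimes n})$ approximately from bits of asymmetric distinguishability for fixed $n \geq 1$ and $\varepsilon \in (0, 1)$. The asymptotic parallel distinguishability cost of the channel box $(\mathcal{N}, \mathcal{M})$ is then defined as the limit of the above formula as $n \to \infty$ followed by $\varepsilon \to 0$, and it can equivalently be written in terms of the smooth channel max-relative entropy $D^{\varepsilon}_{\max}(\mathcal{N}^{\otimes n} \| \mathcal{M}^{\otimes n})$, where the latter equality follows from (45).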
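As a direct consequence of the inequality in (50) and Theorem 3, we find that the asymptotic parallel distinguishability cost of $(\mathcal{N}, \mathcal{M})$ is bounded from below by the regularized channel relative entropy $\lim_{n \to \infty} \frac{1}{n} D(\mathcal{N}^{\otimes n} \| \mathcal{M}^{\otimes n})$; this is the inequality (88). We note that an inequality similar to the above one, which does not include regularization, has been reported as [LW19, Theorem 11]. Whether the lower bound in (88) is also an upper bound remains an open question. However, the asymptotic parallel distinguishability cost is bounded from above by $D_{\max}(\mathcal{N} \| \mathcal{M})$, as a consequence of definitions and the fact that the channel max-relative entropy is single-letter. Furthermore, from this upper bound and [BHKW18, Remark 19], we conclude that the asymptotic parallel distinguishability cost is finite if and only if $D_{\max}(\mathcal{N} \| \mathcal{M})$ is.

Although we have not been able to solve the asymptotic parallel distinguishability cost in general, we can do so for some interesting special cases. First, for any channel box $(\mathcal{N}, \mathcal{M})$ that is environment seizable, in the sense of (12), the n-shot approximate distinguishability cost coincides with that of the underlying environment states; this is the equality (90). Then as a consequence of the asymptotic equipartition property for states [TCR09], by taking the limit $n \to \infty$ of (90), it follows that the asymptotic parallel distinguishability cost is equal to the quantum relative entropy of the environment states, thus demonstrating a complete understanding of the asymptotic cost for these channel boxes. As in [WW19], one can make refined statements (for second-order expansions) of $\frac{1}{n} D_c^{\varepsilon}(\mathcal{N}^{\otimes n}, \mathcal{M}^{\otimes n})$ for such channels.

Another important case for which we have a handle on the distinguishability cost is that of classical-quantum channel boxes $(\mathcal{N}_{X \to B}, \mathcal{M}_{X \to B})$, with a common classical input alphabet and output Hilbert space, defined as in (82)-(83).

Proposition 1 For a classical-quantum channel box $(\mathcal{N}_{X \to B}, \mathcal{M}_{X \to B})$, the asymptotic parallel distinguishability cost is equal to the channel relative entropy $D(\mathcal{N} \| \mathcal{M})$.

Proof. It is known from [BHKW18] that, for classical-quantum channel boxes, the regularized channel relative entropy coincides with the channel relative entropy $D(\mathcal{N} \| \mathcal{M})$; this is the identity (93). Thus, the lower bound is a direct consequence of (88) and (93).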
To establish the upper bound, we make use of Proposition 4 from Appendix D, which states that the following inequality holds for all α > 1 and ε ∈ (0, 1): As such, we apply this inequality to the channel box (N ⊗n X→B , M ⊗n X→B ), as well as [BHKW18,Lemma 25], to find that the following inequality holds for all α > 1 and ε ∈ (0, 1): Taking the limit as n → ∞, we find that the following inequality holds for all α > 1: Now taking the limit as α → 1, we conclude that lim sup This concludes the proof.
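The final step of the proof, taking $\alpha \to 1$ in a bound that holds for all $\alpha > 1$, relies on the Rényi divergence decreasing to the relative entropy in that limit. The sketch below illustrates only this limiting behavior in the classical (commuting) case. It is an assumption that this reduction is representative of the Rényi divergence appearing in Proposition 4, and the distributions and function names are hypothetical.

```python
import math


def relative_entropy(p, q):
    """Classical relative entropy D(p||q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence of order alpha != 1, in bits."""
    total = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log2(total) / (alpha - 1)


p = [0.7, 0.2, 0.1]   # hypothetical distributions
q = [0.2, 0.3, 0.5]

alphas = [2.0, 1.5, 1.1, 1.01, 1.001]
values = [renyi_divergence(p, q, a) for a in alphas]
limit_value = relative_entropy(p, q)
```

The values in the list decrease as $\alpha$ approaches one from above and stay above the relative entropy, which is why an upper bound valid for every $\alpha > 1$ yields the relative entropy in the limit.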
Proposition 1 indicates that the asymptotic parallel distinguishability cost of a classical-quantum channel box depends only on the maximum quantum relative entropy that can be realized by the input of a single classical state to the channels. As such, when combined with the result from (84), we conclude that the resource theory of asymmetric distinguishability is reversible in the asymptotic setting of parallel channel box transformations when restricted to classical-quantum channel boxes, meaning that one can convert between such channel boxes without any loss. We provide further related remarks about this observation in the next section.

VII. GENERAL CHANNEL BOX TRANSFORMATION: PARALLEL CASE
We can now address the general channel box transformation problem for the parallel case. Before doing so, let us formalize the problem. Let n, m ∈ Z + and ε ∈ [0, 1].
An (n, m, ε) parallel channel box transformation protocol for the channel boxes (N , M) and (K, L) consists of a superchannel Θ (n) such that A rate R is achievable if for all ε ∈ (0, 1], δ > 0, and sufficiently large n, there exists an (n, n [R − δ] , ε) parallel channel box transformation protocol. The optimal parallel channel box transformation rate R p ((N , M) → (K, L)) is equal to the supremum of all achievable rates.
On the other hand, a rate R is a strong converse rate if for all ε ∈ [0, 1), δ > 0, and sufficiently large n, there does not exist an (n, n [R + δ] , ε) parallel channel box transformation protocol. The strong converse parallel channel box transformation rate R p ((N , M) → (K, L)) is equal to the infimum of all strong converse rates.
Note that the following inequality is a consequence of the definitions: An important result is that if the channel boxes (N , M) and (K, L) are either classical-quantum or environment-seizable, then the following equality holds indicating that the channel relative entropy plays a central role as the optimal conversion rate between these kinds of channel boxes. Appendix E provides detailed proofs of converse bounds that justify the claim in (104), by starting with converse bounds for generic one-shot channel box transformation protocols and then applying them to the parallel case of interest (see also Appendix F for how to translate some of these bounds to lower bounds on the smooth channel max-relative entropy). The achievability part follows from combining a distillation protocol with a dilution protocol (as was done for states in [WW19]) and the fact that these tasks have simple characterizations for these channel boxes.

VIII. GENERAL BOX TRANSFORMATION: SEQUENTIAL CHANNELS AND QUANTUM STRATEGIES
We now move on to consider another variant of the general channel box transformation problem corresponding to the sequential case. This case is more involved than the parallel case considered above because it cannot be reduced to the one-shot case. That is, it is fundamentally a multi-shot problem, and the theory relies upon key developments from [CDP09]. As such, we develop the theory more generally for quantum strategies [GW07] or quantum combs [CDP09] and then apply it to sequential channel boxes, which are a special case of quantum strategies. A quantum strategy consists of a sequence of quantum channels, each of which has an accessible input and output, while passing along an internal memory system that can vary in size [GW07]. We remark here that there are various terms to refer to this same physical object, including quantum memory channels [KW05,CGLM14], quantum strategies [GW07,Gut09,Gut12,GRS18], and quantum combs [CDP09], and there are even earlier works where similar notions appear [BGNP01,ESW02]. Here we adopt the terminology "quantum strategy" to refer to such an object.
The main reason for considering the more complicated quantum strategies is that doing so leads to a better understanding and simplification of the analysis of sequential channel boxes, while at the same time providing a significant generalization of the theory. Indeed, regarding this latter point, one might think of generalizing the theory even further by considering physical transformations of quantum strategies and even an infinite hierarchy of this sort, just as we generalized the resource theory of states to channels by considering physical transformations of channels in the form of superchannels. However, a key insight of [CDP09] is that quantum strategies are the end of the line: physical transformations of quantum strategies are simply quantum strategies, so that the hierarchy ends with quantum strategies. Thus, the theory developed here in this sense is a rather general resource theory of asymmetric distinguishability.

A. Quantum strategies and sequential channel boxes
The basic object to manipulate in this setting is a quantum strategy box or a sequential channel box (N (n) , M (n) ). A sequential channel box is a special case of a quantum strategy box, and since it is simpler, we discuss it briefly first. For a sequential channel box, the notation N (n) indicates n sequential uses of the channel N A→B and M (n) indicates n sequential uses of the channel M A→B . Sequential channel boxes have been considered implicitly in previous work on sequential quantum channel discrimination [CDP08a, CDP09, DFY09, HHLW10, CMW16, BHKW18].
notation, we sometimes write where M 0 and M n are trivial registers.
It is straightforward to see that a quantum strategy box generalizes a sequential channel box discussed above, with each element of a sequential channel box being a sequence of the same channel without any memory. That is, the sequential channel box is a special case of (105) and (106) A quantum co-strategy [GW07] (or tester [CDP08a,CDP09]) for distinguishing two quantum strategies consists of an input state ρ R1A1 and a set of testing channels {A i RiBi→Ri+1Ai+1 } n−1 i=1 , such that the final state when processing the first quantum strategy N (n) is given by (107) and the final state when processing the second quantum strategy M (n) is given by Figure 2 depicts the state ρ RnBn in (107) when n = 3. For our developments in this and the next section, it is helpful to define a generalized quantum strategy divergence as an abstract measure of how distinguishable two quantum strategies are.
Definition 1 (Generalized q. strategy divergence) The generalized quantum strategy divergence of a quantum strategy box (N (n) , M (n) ) is defined as where the generalized divergence D for states is defined by (A1), the states ρ RnBn and σ RnBn are defined in (107) and (108), respectively, and the optimization is with respect to all quantum co-strategies or testers that could be used to distinguish the quantum strategies N (n) and M (n) .
Note that this quantity generalizes the quantum strategy distance and quantum strategy fidelity of [CDP08a,CDP09,Gut12,GRS18], as well as the strategy max-relative entropy of [CE16], to arbitrary divergences. Those quantities employ trace distance, fidelity, and max-relative entropy as the underlying divergences, respectively, but in what follows, we make extensive use of the generality afforded by Definition 1.

B. Physical transformations of quantum strategy boxes and data processing
Just as quantum channels model physical transformations of quantum states and superchannels model physical transformations of quantum channels, we can also consider physical transformations of quantum strategies. Given a quantum strategy N (n) , we consider a general linear and completely positive transformation Θ (n→m) of it, which takes as input an n-round quantum strategy and outputs an m-round quantum strategy. A fundamental result of [CDP09] is that such a physical transformation Θ (n→m) of a quantum strategy N (n) is in turn described by an (n + m)-round quantum strategy that interconnects with N (n) to generate an output, m-round quantum strategy.
Due to various choices of time ordering involved, there is not a unique way to describe this physical transformation [CDP09], but here we adopt the choice that the physical transformation Θ (n→m) first processes all channels involved in the quantum strategy N (n) , and then it generates the output m-round strategy Θ (n→m) (N (n) ). As such, the physical transformation Θ (n→m) consists of n + m channels F i for i ∈ {1, . . . , n + m}, and the output quantum strategy K (m) = Θ (n→m) (N (n) ) then consists of the following m channels: for j ∈ {2, . . . , m − 1}, where we identify the memory systems for the output strategy K (m) as M k ≡ R n+k for k ∈ {1, . . . , m}. Figure 3 depicts the transformation of a three-round quantum strategy N (3) to a three-round quantum strategy K (3) by a physical transformation Θ (3→3) consisting of the channels F 1 , . . . , F 6 , along with the pairing of the transformed strategy with a quantum co-strategy.
The following data processing inequality for the generalized strategy divergence is a direct consequence of the definition and the fact that the underlying generalized divergence D obeys data processing. This key property allows for establishing bounds on the general strategy box transformation problem. Also, it generalizes the data processing inequality for strategy distance and strategy fidelity from [GRS18], but we require physical transformations in order to establish it.
Theorem 4 Let N (n) and M (n) be n-round quantum strategies, and let Θ (n→m) be a physical transformation of them, of the form discussed above, that leads to m-round quantum strategies Θ (n→m) (N (n) ) and Θ (n→m) (M (n) ). Then the following data processing inequality holds for the generalized quantum strategy divergence:
$$\mathbf{D}(\mathcal{N}^{(n)} \| \mathcal{M}^{(n)}) \geq \mathbf{D}(\Theta^{(n \to m)}(\mathcal{N}^{(n)}) \| \Theta^{(n \to m)}(\mathcal{M}^{(n)})). \tag{113}$$

Proof. Let K (m) := Θ (n→m) (N (n) ) and L (m) := Θ (n→m) (M (n) ). Also, let us consider a quantum co-strategy for K (m) and L (m) , which consists of a state ρ R 1 C1 and a set of channels: Suppose first that the physical transformation Θ (n→m) acts on the quantum strategy N (n) . In this case, the first channel F 1 C1→R1A1 acts on the state ρ R 1 C1 and outputs systems A 1 and R 1 . Then the channel N 1 A1→M1B1 is applied, and the second channel F 2 R1B1→R2A2 is applied. This repeats n − 1 more times, and the resulting state is as follows: At this point, the other elements of the co-strategy and the remainder of the transformation Θ (n→m) are applied, which consists of the co-strategy channels interleaved by the transformation channels F n+2 , . . . , F m . The resulting state is then where We also define the following states for the quantum strategy M (n) : Then consider that The first inequality follows because the state ρ R 1 C1 and the channels F i for i ∈ {1, . . . , n} constitute a particular co-strategy for discriminating N (n) from M (n) . The next inequality is a consequence of quantum data processing for the underlying generalized divergence, given that P R 1 L1D1→R m Dm is a quantum channel. Since the inequality holds for all possible co-strategies T that could be used to distinguish K (m) from L (m) , we conclude (113).
Remark 1 We note that the data processing inequality in (113) holds more generally for physical transformations of quantum strategy boxes that do not necessarily proceed in the order that we have fixed (i.e., it holds for other time orderings of physical transformations of strategy boxes). The main idea for establishing it is to use the data processing inequality for the underlying generalized divergence and that a co-strategy for a physically transformed strategy is a special kind of co-strategy for the original strategy.

C. Quantum strategy box transformation problem
The goal of this setting is to convert the quantum strategy box (N (n) , M (n) ) to the strategy box (K (m) , L (m) ) by means of a common physical transformation Θ (n→m) , subject to the constraint that K (m) is realized approximately from N (n) , i.e.,
$$\Theta^{(n \to m)}(\mathcal{N}^{(n)}) \approx_{\varepsilon} \mathcal{K}^{(m)}, \tag{124}$$
while L (m) is realized perfectly from M (n) by the protocol Θ (n→m) , i.e.,
$$\Theta^{(n \to m)}(\mathcal{M}^{(n)}) = \mathcal{L}^{(m)}, \tag{125}$$
just as is the case with all of the other transformations that we have considered in the resource theory of asymmetric distinguishability. The common physical transformation Θ (n→m) that we consider is as we discussed in Section VIII B and is depicted in Figures 4 and 5. It consists of a general physical processing of the strategy box (N (n) , M (n) ) to convert it approximately to the strategy box (K (m) , L (m) ), in the sense given in (124)-(125). The notion of approximation that we employ in (124) is the normalized strategy distance of [CDP08a,CDP09,Gut12], which generalizes the normalized diamond distance to the setting of interest here. This quantity is a special case of the generalized strategy divergence from Definition 1, with the underlying divergence set to be the normalized trace distance $\frac{1}{2}\|\cdot\|_1$. The motivation for employing the normalized strategy distance is the same as that which we gave for normalized diamond distance: it quantifies the worst-case statistical error (absolute deviation) that one could make when trying to distinguish the simulation Θ (n→m) (N (n) ) from the ideal output strategy K (m) by any quantum-physical experiment.
We now describe the above in more detail. The general physical transformation Θ (n→m) of the first strategy box (N (n) , M (n) ) consists of n + m channels, denoted by Suppose first that the transformation Θ (n→m) acts on the quantum strategy N (n) . In this case, the first channel F 1 C1→R1A1 acts on the state ρ R 1 C1 and outputs systems A 1 and R 1 . Then the channel N A1→B1 is applied, and the second channel F 2 R1B1→R2A2 is applied. This repeats n − 1 more times, and the resulting state is as follows: At this point, the other elements of the co-strategy and the remainder of the simulation are applied, which consists of the testing channels {A i R i Di→R i+1 Ci+1 } m−1 i=1 interleaved by the simulation channels F n+2 , . . . , F m . The resulting state is then where The final state above is then compared with the following state, which results from the application of the quantum co-strategy T to the ideal strategy K (m) : See Figure 4 for a depiction of these two scenarios.

FIG. 4. Depiction of the physical transformation Θ (n→m) that converts the three-round strategy N (3) to the three-round strategy K (3) . The physical transformation Θ (n→m) consists of the channels F 1 , . . . , F 6 . A discriminator could in principle then perform a co-strategy to distinguish the simulation in the top part of the figure from the ideal implementation of the strategy K (3) in the bottom part of the figure. We demand that the absolute deviation in probability between any measurement outcome in the top part be no larger than ε when compared to the same from the bottom part, i.e., that the normalized strategy distance be no larger than ε.
The simulation has ε error if the following inequality holds where the optimization is with respect to all quantum co-strategies T as defined in (127). The expression on the left-hand side above is in fact equal to the m-round normalized quantum strategy distance considered in [CDP08a,CDP09,Gut12,GRS18], so that we can write (132) equivalently as As a shorthand for the inequality in (133), we employ the notation It is also demanded that the transformation Θ (n→m) be such that Θ (n→m) (M (n) ) = L (m) , which is the same [Gut12] as demanding that This is consistent with our prior error criteria in the simpler scenarios for the resource theory of asymmetric distinguishability. Thus, the general strategy box transformation problem can be phrased as the following optimization problem, which is a function of n, m ∈ Z + and channels N , M, K, and L:

FIG. 5. Depiction of the physical transformation Θ (n→m) that converts the three-round strategy M (3) to the three-round strategy L (3) . The physical transformation Θ (n→m) is the same as that given in Figure 4 and consists of the channels F 1 , . . . , F 6 . A discriminator could in principle then perform a strategy to distinguish the simulation in the top part of the figure from the ideal implementation of the three-round strategy L (3) in the bottom part of the figure. We demand that the absolute deviation in probability between any measurement outcome in the top part be exactly equal to zero when compared to the same from the bottom part, i.e., that the normalized strategy distance be equal to zero, so that the simulation is perfect in this case.
where the infimum is with respect to physical transformations Θ (n→m) . We assert here that the optimization problem in (136) can be cast as a semi-definite program, by employing the facts that the quantum strategy distance can be calculated by a semi-definite program and one can write down Choi operators for Θ (n→m) , N (n) , M (n) , K (m) , and L (m) [CDP08a, CDP09, Gut12] along with various non-signaling constraints to denote the time-orderings involved. However, we do not elaborate on the details here.
In Appendix G, Proposition 14 states converse bounds that apply to arbitrary protocols that transform the n-round strategy box (N (n) , M (n) ) to the m-round strategy box (K (m) , L (m) ) while satisfying Θ (n→m) (N (n) ) ≈ ε K (m) and Θ (n→m) (M (n) ) = L (m) . The bounds are expressed in terms of strategy Rényi divergences, which are defined as special cases of Definition 1 with the underlying divergence fixed to be the Rényi divergences.

D. Asymptotic setting for sequential channel box transformations
It does not seem sensible to consider an asymptotic version of the general strategy box transformation problem, as in general there is no regular structure associated with arbitrary strategy boxes. However, if we impose some structure, then it is sensible to do so.
The simplest structure that we can impose is that each strategy box is actually a sequential channel box, involving sequential uses of the same quantum channels. Then we can phrase the sequential channel box transformation problem in an asymptotic, Shannon-theoretic way, similar to how we did for the parallel channel box transformation problem in Section VII.
Let n, m ∈ Z + and ε ∈ [0, 1]. An (n, m, ε) sequential channel box transformation protocol for the channel boxes (N , M) and (K, L) consists of a physical transformation Θ (n→m) , as described in Section VIII B, such that where N (n) , M (n) , K (m) , and L (m) are the sequential channels corresponding to the channels N , M, K, and L, respectively. For clarity, Figure 6 depicts an example of a sequential channel box transformation protocol.
A rate R is achievable if for all ε ∈ (0, 1], δ > 0, and sufficiently large n, there exists an (n, n [R − δ] , ε) sequential channel box transformation protocol. The optimal sequential channel box transformation rate R((N , M) → (K, L)) is equal to the supremum of all achievable rates. On the other hand, a rate R is a strong converse rate if for all ε ∈ [0, 1), δ > 0, and sufficiently large n, there does not exist an (n, n [R + δ] , ε) sequential channel box transformation protocol. The strong converse sequential channel box transformation rate R((N , M) → (K, L)) is equal to the infimum of all strong converse rates.

FIG. 6. Depiction of a sequential channel box transformation protocol. Three sequential uses of the channel N are converted approximately to three sequential uses of the channel K, while three sequential uses of the channel M are converted exactly to three sequential uses of the channel L. This is a special case of a strategy box transformation protocol, as depicted in Figures 4 and 5.
The following inequality is a direct consequence of definitions: Although it is a challenging question in general to determine the optimal rates in (139) for arbitrary channel boxes, there are some special cases for which it is possible to determine them.
The main reason that this simplification occurs is that the channels involved for environment-seizable pairs are equivalent to states, so that the prior achievability results for states [WW19] apply. Also, the converse bounds from Appendix G 1 simplify for the same reason.
2. If the channel boxes (N , M) and (K, L) are classical-quantum, then the following strong converse bound holds as a consequence of [BHKW18, Lemma 26] and the discussion in Appendix G 2. It is reasonable to conjecture that this bound is saturated; what remains is to show that D(N M) is the optimal rate of distinguishability dilution for classical-quantum channels. follows because one can first distill bits of asymmetric distinguishability from (N , M) at the rate D(N M) and then dilute them to (K, L), in a sequential simulation, with the latter simulation being possible easily by preparing the environment states for (K, L) and then acting with the relevant common channels on demand when needed.

IX. DISTILLATION AND DILUTION OF QUANTUM STRATEGY AND SEQUENTIAL CHANNEL BOXES
In this section, we present distillation and dilution of quantum strategy boxes. A special case of this theory involves distillation and dilution of sequential channel boxes. Here we are interested not only in the optimal number but also in the rates at which one can distill or dilute bits of asymmetric distinguishability from or to a strategy or sequential channel box, respectively, both in the exact and approximate cases.
All of the basic definitions in this case represent generalizations of what we have presented previously for one-shot tasks regarding quantum channels. As such, we do not delve into as many details as we did before but mainly state the results and provide brief justifications.

A. Exact case: distillable distinguishability
Given a strategy box (N (n) , M (n) ), the exact distillable distinguishability is equal to the largest M such that we can transform (N (n) , M (n) ) to the channel box (R |0 0| C→D , R π M C→D ) exactly by means of a physical transformation Θ (n→1) . Note that the physical transformation Θ (n→1) is a special case of those that we discussed previously in Section VIII B, taking an n-round quantum strategy box to a channel box. Mathematically, the exact distillable distinguishability is defined as the following optimization problem: Note that this problem is essentially equivalent to D 0 d (N (n) , M (n) ) , which is the largest m for which a physical transformation Θ (n→m) exists such that where the superscript (m) indicates m sequential channel uses. This is because the channel box (R |0 0| C→D , R π 2 m C→D ) and the sequential channel box ) are equivalent to each other by means of common quantum strategies, due to the fact that the underlying channel pairs are environment seizable and thus equivalent to state boxes.
By employing reasoning similar to that which we employed previously to justify (37), we conclude that where D min (N (n) M (n) ) is the quantum strategy divergence from Definition 1, with D therein set to D min . The main reasons that this equality holds are that 1) the optimal co-strategy for D min (N (n) M (n) ) leads to a protocol for distilling bits of asymmetric distinguishability and 2) its optimality follows from the data processing inequality (Theorem 4) for D min (N (n) M (n) ) with respect to an arbitrary physical transformation Θ (n→1) that produces the channel box (R |0⟩⟨0| C→D , R π M C→D ) exactly. If the strategy box (N (n) , M (n) ) is in fact a sequential channel box for all n, with corresponding channels N and M, then we define the exact sequential distillable distinguishability as Just as with the parallel case discussed in Section VI A, the underlying quantity D 0 d (N (n) , M (n) ) can jump from zero to ∞ as n increases. In fact, this jump can occur in the simplest case when n goes from one to two [HHLW10]. By the general bound from [BHKW18], we have that

B. Exact case: distinguishability cost
Given a strategy box (N (n) , M (n) ), the exact distinguishability cost is equal to the smallest M such that we can transform the channel box (R |0⟩⟨0| C→D , R π M C→D ) to (N (n) , M (n) ) exactly by means of a physical transformation Θ (1→n) . Note that the physical transformation Θ (1→n) is a special case of those that we discussed previously in Section VIII B, taking a channel box to an n-round quantum strategy box. Mathematically, the exact distinguishability cost is defined as the following optimization problem: For reasons similar to those stated in the previous section, this problem is essentially equivalent to D 0 c (N (n) , M (n) ) , which is the smallest m for which a physical transformation Θ (m→n) exists such that where the superscript (m) again indicates m sequential channel uses.
By employing reasoning similar to that which we used previously to justify (38), we conclude that where D max (N (n) M (n) ) is the quantum strategy divergence from Definition 1, with D therein set to D max . The quantity D max (N (n) M (n) ) has already been defined and studied in [CE16], wherein it was shown that it is equal to the max-relative entropy of the Choi operators of the strategies. Eq. (152) gives D max (N (n) M (n) ) its fundamental operational meaning in terms of the exact distinguishability cost of the strategy box (N (n) , M (n) ). The main reasons that the equality in (152) holds are that 1) an optimal dilution protocol, generalizing that from Appendix C 2, results from a strategy that outputs strategy N (n) if |0⟩⟨0| is input and outputs the strategy if |1⟩⟨1| is input and 2) its optimality follows from the data processing inequality (Theorem 4) for D max (N (n) M (n) ) with respect to an arbitrary physical transformation Θ (1→n) that produces the strategy box (N (n) , M (n) ) exactly from the channel box (R |0⟩⟨0| C→D , R π M C→D ). If the strategy box (N (n) , M (n) ) is in fact a sequential channel box for all n, with corresponding channels N and M, then we define the exact sequential distinguishability cost as Note that the following inequality holds because a sequential simulation is more stringent than a parallel simulation. That is, any sequential simulation works as a parallel simulation.
A key result that we have for this problem, strengthening our earlier finding from (64), is expressed by the following theorem.
Theorem 5 For channels $\mathcal{N}$ and $\mathcal{M}$, the exact sequential distinguishability cost is equal to the channel max-relative entropy:

$$D^{0}_{c}(\mathcal{N}, \mathcal{M}) = D_{\max}(\mathcal{N} \| \mathcal{M}). \tag{156}$$

Proof. The inequality $D^{0}_{c}(\mathcal{N}, \mathcal{M}) \geq D_{\max}(\mathcal{N} \| \mathcal{M})$ is a consequence of (155) and (64). The other inequality is a consequence of the following scheme for simulating the sequential channel box $(\mathcal{N}^{(n)}, \mathcal{M}^{(n)})$, similar to that employed in [GFW+18, WW18]. In the first round of the sequential simulation, one starts from the channel box (R

In the next round, one uses the leftover channel box $(\mathcal{R}^{|0\rangle\langle 0|}_{C\to D} \otimes \mathcal{M})$. Again employing (38) and an analysis similar to the above, the cost for doing so is

This continues until the last round, and adding everything up, the total cost for the simulation of the sequential channel box $(\mathcal{N}^{(n)}, \mathcal{M}^{(n)})$ is $n D_{\max}(\mathcal{N} \| \mathcal{M})$. Since this holds for every $n$, we conclude that $D^{0}_{c}(\mathcal{N}, \mathcal{M}) \leq D_{\max}(\mathcal{N} \| \mathcal{M})$, and in turn, we conclude (156).

C. Approximate case: distillable distinguishability
Given a strategy box (N (n) , M (n) ), the approximate distillable distinguishability is equal to the largest M such that we can transform the strategy box (N (n) , M (n) ) to the channel box (R |0 0| C→D , R π M C→D ) approximately by means of a physical transformation Θ (n→1) . Mathematically, it is defined as the following optimization problem: where the shorthand ≈ ε is defined in (133)-(134) in terms of the normalized strategy distance.
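For similar reasons stated in the previous section, this problem is essentially equivalent to $D^{\varepsilon}_{d}(\mathcal{N}^{(n)}, \mathcal{M}^{(n)})$, which is the largest $m$ for which a physical transformation $\Theta^{(n\to m)}$ exists such that

By employing reasoning similar to that which we used previously to justify (44), we conclude that

$$D^{\varepsilon}_{d}(\mathcal{N}^{(n)}, \mathcal{M}^{(n)}) = D^{\varepsilon}_{\min}(\mathcal{N}^{(n)} \| \mathcal{M}^{(n)}),$$

where $D^{\varepsilon}_{\min}(\mathcal{N}^{(n)} \| \mathcal{M}^{(n)})$ is the quantum strategy divergence from Definition 1, with $D$ therein set to $D^{\varepsilon}_{\min}$. The main reasons that this equality holds are that 1) the optimal co-strategy for $D^{\varepsilon}_{\min}(\mathcal{N}^{(n)} \| \mathcal{M}^{(n)})$ leads to a protocol for distilling bits of asymmetric distinguishability approximately and 2) its optimality follows from the data processing inequality for $D^{\varepsilon}_{\min}(\mathcal{N}^{(n)} \| \mathcal{M}^{(n)})$ with respect to an arbitrary physical transformation $\Theta^{(n\to 1)}$ that produces the channel box $(\mathcal{R}^{|0\rangle\langle 0|}_{C\to D}, \mathcal{R}^{\pi_M}_{C\to D})$ approximately.

A key result of our paper is the following formal expression for $D_{d}(\mathcal{N}, \mathcal{M})$ in terms of the amortized channel relative entropy from [BHKW18]:

Theorem 6 For channels $\mathcal{N}$ and $\mathcal{M}$, the sequential distillable distinguishability is equal to the amortized channel relative entropy of [BHKW18]:

$$D_{d}(\mathcal{N}, \mathcal{M}) = D^{\mathcal{A}}(\mathcal{N} \| \mathcal{M}),$$

where

$$D^{\mathcal{A}}(\mathcal{N} \| \mathcal{M}) := \sup_{\rho_{RA}, \sigma_{RA}} \left[ D(\mathcal{N}_{A\to B}(\rho_{RA}) \| \mathcal{M}_{A\to B}(\sigma_{RA})) - D(\rho_{RA} \| \sigma_{RA}) \right].$$

Proof. The bound $D_{d}(\mathcal{N}, \mathcal{M}) \leq D^{\mathcal{A}}(\mathcal{N} \| \mathcal{M})$ follows from [BHKW18, Proposition 16], due to the equivalence between sequential distillable distinguishability and the optimal rate of the quantum hypothesis testing problem considered in [BHKW18]. So it remains to establish the opposite inequality.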
To do so, here we employ a technique used in the resource theory of coherence [GFW + 18, Theorem 17], which was used therein to show that the amortized relative entropy of coherence is equal to the distillable coherence of a quantum channel. A similar technique was also discussed previously in [BHLS03, Section 2.4].
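Let $\rho_{RA}$ and $\sigma_{RA}$ be arbitrary quantum states. Let $\psi_{RA}$ be a state such that

$$D(\mathcal{N}_{A\to B}(\psi_{RA}) \| \mathcal{M}_{A\to B}(\psi_{RA})) > 0.$$

(If such a state does not exist, then $D_{d}(\mathcal{N}, \mathcal{M})$ is trivially equal to zero.) The first step is to send in the tensor-power state $\psi^{\otimes m}_{RA}$ to $m$ parallel calls of the unknown channel, where

$$m := \left\lceil \frac{n \left[ D(\rho_{RA} \| \sigma_{RA}) + \delta \right]}{D(\mathcal{N}_{A\to B}(\psi_{RA}) \| \mathcal{M}_{A\to B}(\psi_{RA}))} \right\rceil$$

for $\delta > 0$, and distill bits of asymmetric distinguishability at the rate $D(\mathcal{N}_{A\to B}(\psi_{RA}) \| \mathcal{M}_{A\to B}(\psi_{RA}))$. Second, we dilute these bits of asymmetric distinguishability to the state box $(\rho^{\otimes n}_{RA}, \sigma^{\otimes n}_{RA})$. Third, we then send this state box into $n$ uses of the unknown channel, producing the state box $([\mathcal{N}_{A\to B}(\rho_{RA})]^{\otimes n}, [\mathcal{M}_{A\to B}(\sigma_{RA})]^{\otimes n})$. Fourth, from this state box, we distill bits of asymmetric distinguishability at the rate $D(\mathcal{N}_{A\to B}(\rho_{RA}) \| \mathcal{M}_{A\to B}(\sigma_{RA})) - \delta$. We output a fraction $R - 2\delta$ of these bits, where

$$R := D(\mathcal{N}_{A\to B}(\rho_{RA}) \| \mathcal{M}_{A\to B}(\sigma_{RA})) - D(\rho_{RA} \| \sigma_{RA}),$$

and then reinvest a fraction $D(\rho_{RA} \| \sigma_{RA}) + \delta$ for the next round. We then repeat steps 2) through 4) $k$ times. In the last round, a fraction $R_f - \delta$ bits of asymmetric distinguishability are output, where

$$R_f := D(\mathcal{N}_{A\to B}(\rho_{RA}) \| \mathcal{M}_{A\to B}(\sigma_{RA})),$$

and no reinvestment is made (because it is the last round). Counting up everything, this protocol calls the unknown channel $kn + m$ times, while outputting

$$n \left[ (k-1)(R - 2\delta) + R_f - \delta \right]$$

bits of asymmetric distinguishability. Thus, the rate of the protocol is given by

$$\frac{n \left[ (k-1)(R - 2\delta) + R_f - \delta \right]}{kn + m}.$$

In the limit as $k \to \infty$, this rate converges to $R - 2\delta$. Since $\delta > 0$ is arbitrary, the rate $R$ is achievable. Note that all of the conversions stated above are approximate, but for large enough $n$ and by employing the triangle inequality, the error vanishes. Finally, since the states $\rho_{RA}$ and $\sigma_{RA}$ are arbitrary, we can take a supremum over all of them and conclude the inequality

$$D_{d}(\mathcal{N}, \mathcal{M}) \geq D^{\mathcal{A}}(\mathcal{N} \| \mathcal{M}),$$

thus completing the proof.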
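The bookkeeping of this interleaved protocol can be checked numerically. The following is a minimal sketch, not part of the proof: the function name `protocol_rate` and all numerical values are illustrative assumptions, and the number $m$ of bootstrap channel calls is taken to be the smallest integer that pays for the first dilution, which is our reading of the first step. It only tallies output bits against channel uses and shows the rate approaching $R - 2\delta$ as the number $k$ of rounds grows.

```python
import math

def protocol_rate(k, n, d_out, d_in, d_psi, delta):
    """Bits of asymmetric distinguishability output per channel use.

    d_out : D(N(rho) || M(sigma)), distillation rate in rounds 1..k
    d_in  : D(rho || sigma), dilution cost per copy of the input box
    d_psi : D(N(psi) || M(psi)) > 0, rate of the bootstrap step
    All values passed in are hypothetical, chosen for illustration.
    """
    if d_psi <= 0 or k < 1:
        raise ValueError("need d_psi > 0 and at least one round")
    # Bootstrap: enough parallel calls to pay for the first dilution
    # (assumed form of m).
    m = math.ceil(n * (d_in + delta) / d_psi)
    R = d_out - d_in      # net gain per channel use in a reinvesting round
    R_f = d_out           # last round: nothing is reinvested
    bits_out = n * ((k - 1) * (R - 2 * delta) + (R_f - delta))
    return bits_out / (k * n + m)

if __name__ == "__main__":
    params = dict(n=100, d_out=1.5, d_in=0.5, d_psi=0.8, delta=0.01)
    for k in (1, 10, 1000, 10**6):
        print(k, protocol_rate(k, **params))
```

With these illustrative numbers, $R - 2\delta = 0.98$, and the printed rates climb toward that value as $k$ increases; the one-time bootstrap cost $m$ is amortized away.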
Theorem 6 establishes an operational meaning for the amortized channel relative entropy of [BHKW18], thus giving it some distinction in the resource theory of asymmetric distinguishability for quantum channels. Theorem 6 can alternatively be understood as a formal solution to Stein's lemma for quantum channels in the sequential setting, thus completing the line of reasoning put forward in [BHKW18].
More generally, this result can be used to determine whether a sequential protocol is truly necessary to attain the optimal distillable distinguishability. If an amortization collapse occurs for a pair of channels, so that $D^{\mathcal{A}}(\mathcal{N} \| \mathcal{M}) = D(\mathcal{N} \| \mathcal{M})$, then one can conclude that a sequential protocol is not necessary and one can simply input a tensor-power state $\psi^{\otimes n}_{RA}$ to distinguish the channels optimally in the asymptotic regime [BHKW18]. This collapse occurs for both environment-seizable and classical-quantum channel boxes. It also occurs for channel boxes in which the first channel is arbitrary and the second is a replacer channel [CMW16,BHKW18]. What Theorems 3 and 6 add to this story is that the condition in (176) is necessary and sufficient for an adaptive strategy to have an advantage over a parallel strategy in the setting of asymmetric channel discrimination, or equivalently, when distilling bits of asymmetric distinguishability. Determining whether (176) holds for a pair of quantum channels is an interesting and challenging open problem. It seems that the main idea of [GFW+18, Theorem 17] (also [BHLS03, Section 2.4]), as used in the proof of Theorem 6, can be employed for a sequential distillation task in any quantum resource theory for which the static version of the theory (for quantum states) is asymptotically reversible. This is because the interleaving of distillation and dilution plays an essential role in the given protocol, and for an asymptotically reversible resource theory, there is no loss when going back and forth like this.

D. Approximate case: distinguishability cost
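Given a strategy box $(\mathcal{N}^{(n)}, \mathcal{M}^{(n)})$, the approximate distinguishability cost is equal to the smallest $M$ such that we can transform the channel box $(\mathcal{R}^{|0\rangle\langle 0|}_{C\to D}, \mathcal{R}^{\pi_M}_{C\to D})$ to $(\mathcal{N}^{(n)}, \mathcal{M}^{(n)})$ approximately by means of a physical transformation $\Theta^{(1\to n)}$. Mathematically, it is defined as the optimization problem in (177). For similar reasons stated previously, this problem is essentially equivalent to $D^{\varepsilon}_{c}(\mathcal{N}^{(n)}, \mathcal{M}^{(n)})$, which is the smallest $m$ for which a physical transformation $\Theta^{(m\to n)}$ exists such that

where the superscript $(m)$ again indicates $m$ sequential channel uses. By employing reasoning similar to that which we used previously to justify (45), we conclude that

$$D^{\varepsilon}_{c}(\mathcal{N}^{(n)}, \mathcal{M}^{(n)}) = D^{\varepsilon}_{\max}(\mathcal{N}^{(n)} \| \mathcal{M}^{(n)}),$$

where the smooth strategy max-relative entropy is defined as

$$D^{\varepsilon}_{\max}(\mathcal{N}^{(n)} \| \mathcal{M}^{(n)}) := \inf_{\widetilde{\mathcal{N}}^{(n)} \approx_{\varepsilon} \mathcal{N}^{(n)}} D_{\max}(\widetilde{\mathcal{N}}^{(n)} \| \mathcal{M}^{(n)}), \tag{181}$$

and $D_{\max}(\widetilde{\mathcal{N}}^{(n)} \| \mathcal{M}^{(n)})$ is defined in (152). The infimum in (181) is with respect to $n$-round strategies $\widetilde{\mathcal{N}}^{(n)}$ that are $\varepsilon$-close in normalized strategy distance to the strategy $\mathcal{N}^{(n)}$. The main reasons that this equality holds are that 1) an optimal approximate dilution protocol results from applying an optimal exact dilution protocol to $\widetilde{\mathcal{N}}^{(n)}$ and $\mathcal{M}^{(n)}$, where $\widetilde{\mathcal{N}}^{(n)}$ is $\varepsilon$-close to $\mathcal{N}^{(n)}$ with respect to the normalized strategy distance and 2) its optimality follows from the data processing inequality for $D^{\varepsilon}_{\max}(\mathcal{N}^{(n)} \| \mathcal{M}^{(n)})$ with respect to an arbitrary physical transformation $\Theta^{(1\to n)}$ that produces the strategy box $(\mathcal{N}^{(n)}, \mathcal{M}^{(n)})$ approximately from the channel box $(\mathcal{R}^{|0\rangle\langle 0|}_{C\to D}, \mathcal{R}^{\pi_M}_{C\to D})$.

If the strategy box $(\mathcal{N}^{(n)}, \mathcal{M}^{(n)})$ is in fact a sequential channel box for all $n$, with corresponding channels $\mathcal{N}$ and $\mathcal{M}$, then we define the sequential distinguishability cost as

Note that the sequential distinguishability cost is never smaller than the parallel distinguishability cost, because a sequential simulation is more stringent than a parallel simulation. That is, any sequential simulation works as a parallel simulation.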
As occurred for all other tasks in this paper, the sequential distinguishability cost simplifies for environment-seizable channel boxes. It remains an interesting open question to understand the sequential distinguishability cost of quantum channel boxes other than environment-seizable ones.

X. CONCLUSION
In this paper, we generalized the resource theory of asymmetric distinguishability from states [Mat10, Mat11, WW19] to channels. In this resource theory, the main constituents are quantum channel boxes that can be manipulated by means of a quantum superchannel, the most general physical transformation that sends quantum channels to quantum channels. Furthermore, the basic units of currency are bits of asymmetric distinguishability [WW19].
In the one-shot scenario, we considered the approximate channel box transformation problem and proved that it is characterized by a semi-definite program. As special cases of this, we considered exact and approximate one-shot distillation and dilution of channel boxes, arriving at the following conclusions:
1. The exact one-shot distillable distinguishability of a channel box is equal to the channel min-relative entropy.
2. The exact one-shot distinguishability cost of a channel box is equal to the channel max-relative entropy.
3. The approximate one-shot distillable distinguishability of a channel box is equal to the smooth channel min-relative entropy.
4. The approximate one-shot distinguishability cost of a channel box is equal to the smooth channel max-relative entropy.
These results endow these fundamental channel measures of distinguishability with operational interpretations. We then moved on to consider asymptotic parallel versions of the above tasks, with our key findings here being that the parallel distillable distinguishability is equal to the regularized channel relative entropy and the parallel exact distinguishability cost is equal to the channel max-relative entropy. We solved the asymptotic version of the parallel channel box transformation problem for environment-seizable and classical-quantum channel boxes.
We finally considered the approximate strategy box transformation problem and asserted that it is characterized by a semi-definite program. We introduced the generalized strategy divergence as a way of quantifying distinguishability of quantum strategies and used instantiations of this concept to provide bounds on how well one can convert one strategy box to another. In particular, transformations of sequential channel boxes are a special case of transformations of strategy boxes, so that many of the results for strategy boxes apply directly, and all of the results simplify for environment-seizable or classical-quantum sequential channel boxes.
By focusing on distillation and dilution tasks, we proved that the asymptotic sequential distillable distinguishability of a sequential channel box is equal to the amortized channel relative entropy of [BHKW18], thus endowing this quantity with a fundamental operational meaning. We also proved that the exact sequential distinguishability cost is equal to the channel max-relative entropy.
Going forward from here, there are many open questions for future work. Are there other channel boxes, besides environment-seizable and classical-quantum ones, for which the theory simplifies significantly? Based on the distillation results of [CMW16], and other findings of [FWTB19], it seems plausible that the channel relative entropy should be the optimal rate for dilution protocols of channel boxes in which the first channel is arbitrary and the second is a replacer channel. Are there examples of channel boxes for which the regularization in the regularized channel relative entropy is necessary? Are there examples of channel boxes for which the amortized channel relative entropy does not collapse to the ordinary channel relative entropy? Answers to these questions would provide insights as to whether general parallel or sequential strategies are helpful in distinguishability distillation. Can we characterize the asymptotic parallel or sequential distinguishability cost, in the case in which the simulation need not be exact but with vanishing error in the asymptotic limit? Is it possible to give a more general theory beyond independent and identically distributed channels, i.e., for memory channels with some structure? These and other questions remain the subject of future investigations. Note: After our paper appeared online, the preprint [FFRS19] was posted, which has addressed some of the open questions from our paper.

ACKNOWLEDGMENTS
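We are grateful to Vishal Katariya for pointing out a problem with our previous formulation of the semidefinite program for the smooth channel max-relative entropy. We are grateful to Andreas Winter for pointing out [BHLS03, Section 2.4] in the context of Theorem 6. XW acknowledges support from the Department of Defense, and MMW acknowledges support from the National Science Foundation under Grant No. 1907615.

A generalized divergence is a function $\mathbf{D}(\rho \| \sigma)$ taking arbitrary quantum states $\rho$ and $\sigma$ to the non-negative reals and such that the data processing inequality holds for an arbitrary quantum channel $\mathcal{N}$ [SW12]:

$$\mathbf{D}(\rho \| \sigma) \geq \mathbf{D}(\mathcal{N}(\rho) \| \mathcal{N}(\sigma)). \tag{A1}$$

Generalized divergences of interest include the trace distance, the negative logarithm of the fidelity [Uhl76], the quantum relative entropy [Ume62], the Petz-Rényi relative entropy [Pet85,Pet86], and the sandwiched Rényi relative entropy [MLDS+13, WWY14]. For completeness, we define the last three quantities now and refer to our companion paper [WW19] for further details of their properties.

The quantum relative entropy $D(\rho \| \sigma)$ is defined for states $\rho$ and $\sigma$ as

$$D(\rho \| \sigma) := \operatorname{Tr}[\rho (\log_2 \rho - \log_2 \sigma)]$$

if $\operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma)$ and it is set to $\infty$ otherwise. The Petz-Rényi relative entropy is defined for states $\rho$ and $\sigma$ as [Pet86]

$$D_{\alpha}(\rho \| \sigma) := \frac{1}{\alpha - 1} \log_2 \operatorname{Tr}[\rho^{\alpha} \sigma^{1-\alpha}]$$

if $\alpha \in (0, 1)$ or $\alpha \in (1, \infty)$ and $\operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma)$. If $\alpha \in (1, \infty)$ and $\operatorname{supp}(\rho) \not\subseteq \operatorname{supp}(\sigma)$, then $D_{\alpha}(\rho \| \sigma) := \infty$. The sandwiched Rényi relative entropy is defined for states $\rho$ and $\sigma$ as [MLDS+13, WWY14]

$$\widetilde{D}_{\alpha}(\rho \| \sigma) := \frac{1}{\alpha - 1} \log_2 \operatorname{Tr}\left[ \left( \sigma^{(1-\alpha)/2\alpha} \rho\, \sigma^{(1-\alpha)/2\alpha} \right)^{\alpha} \right]$$

if $\alpha \in (0, 1)$ or $\alpha \in (1, \infty)$ and $\operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma)$. If $\alpha \in (1, \infty)$ and $\operatorname{supp}(\rho) \not\subseteq \operatorname{supp}(\sigma)$, then $\widetilde{D}_{\alpha}(\rho \| \sigma) := \infty$.

A generalized channel divergence is defined from that for states, as presented above, given by the following function of quantum channels $\mathcal{N}_{A\to B}$ and $\mathcal{M}_{A\to B}$ [LKDW18]:

$$\mathbf{D}(\mathcal{N} \| \mathcal{M}) := \sup_{\rho_{RA}} \mathbf{D}(\mathcal{N}_{A\to B}(\rho_{RA}) \| \mathcal{M}_{A\to B}(\rho_{RA})), \tag{A5}$$

where the optimization is with respect to a quantum state $\rho_{RA}$ such that the reference system is arbitrary. As observed in [LKDW18], the following simplification holds

$$\mathbf{D}(\mathcal{N} \| \mathcal{M}) = \sup_{\psi_{RA}} \mathbf{D}(\mathcal{N}_{A\to B}(\psi_{RA}) \| \mathcal{M}_{A\to B}(\psi_{RA})),$$

where the optimization is with respect to pure states $\psi_{RA}$ such that the reference system $R$ is isomorphic to the channel input system $A$.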
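For commuting states, the divergences defined above reduce to classical quantities on probability vectors, and in that case the Petz-Rényi and sandwiched Rényi relative entropies coincide. The following minimal sketch, with illustrative function names and hypothetical example distributions, evaluates the classical relative entropy and Rényi divergence and checks the data processing inequality under a stochastic map, which plays the role of the channel in this restricted setting.

```python
import math

def relative_entropy(p, q):
    """D(p||q) in bits for probability vectors (commuting states)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf  # support of p is not contained in that of q
        total += pi * math.log2(pi / qi)
    return total

def renyi_divergence(p, q, alpha):
    """Classical D_alpha(p||q) in bits for alpha in (0,1) or (1,inf)."""
    if alpha > 1 and any(pi > 0 and qi == 0 for pi, qi in zip(p, q)):
        return math.inf
    s = sum(pi**alpha * qi**(1 - alpha)
            for pi, qi in zip(p, q) if pi > 0 and qi > 0)
    if s == 0.0:
        return math.inf  # orthogonal supports
    return math.log2(s) / (alpha - 1)

def apply_stochastic_map(T, p):
    """T[y][x] is the probability of output y given input x."""
    return [sum(T[y][x] * p[x] for x in range(len(p))) for y in range(len(T))]

if __name__ == "__main__":
    p, q = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
    T = [[0.9, 0.5, 0.1], [0.1, 0.5, 0.9]]  # columns sum to one
    Tp, Tq = apply_stochastic_map(T, p), apply_stochastic_map(T, q)
    print(relative_entropy(p, q), relative_entropy(Tp, Tq))
    print(renyi_divergence(p, q, 2.0), renyi_divergence(Tp, Tq, 2.0))
```

In each printed pair, the second value does not exceed the first, as the data processing inequality requires.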
The data processing inequality holds for the generalized channel divergence, with respect to a superchannel $\Theta$:

$$\mathbf{D}(\mathcal{N} \| \mathcal{M}) \geq \mathbf{D}(\Theta(\mathcal{N}) \| \Theta(\mathcal{M})), \tag{A7}$$

as proved in [Gou19]. The inequality in (A7) follows from the definition in (A5) and the fact that the underlying generalized divergence $\mathbf{D}$ obeys the data processing inequality in (A1). Other applications and interpretations of channel divergences were considered in [Yua19].
For an environment-parametrized channel box $(\mathcal{N}, \mathcal{M})$ with environment states $\rho_E$ and $\sigma_E$, the following inequality holds [TW16]:

$$\mathbf{D}(\mathcal{N} \| \mathcal{M}) \leq \mathbf{D}(\rho_E \| \sigma_E).$$

If the channel box is also environment seizable (see Section III A), then the opposite inequality $\mathbf{D}(\mathcal{N} \| \mathcal{M}) \geq \mathbf{D}(\rho_E \| \sigma_E)$ holds as well (as a consequence of (A7)), from which we conclude the following equality in this case:

$$\mathbf{D}(\mathcal{N} \| \mathcal{M}) = \mathbf{D}(\rho_E \| \sigma_E). \tag{A9}$$

Particular examples of generalized channel divergences are the channel min-relative entropy in (31), the smooth channel min-relative entropy in (41), and the channel max-relative entropy in (33). Other examples include those built from the relative entropy, the Petz-Rényi relative entropy, and the sandwiched Rényi relative entropy, as defined in [CMW16]. As such, the inequality in (A7) holds for all of these channel divergences, a property that we make extensive use of in what follows.
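It is not clear how to write the smooth channel max-relative entropy in (43) as a generalized channel divergence. However, it does obey the data processing inequality in (A7), as the following simple argument demonstrates. Let $\mathcal{N}$ and $\mathcal{M}$ be arbitrary channels, and let $\Theta$ be a superchannel. Let $\widetilde{\mathcal{N}}$ be a channel satisfying $\widetilde{\mathcal{N}} \approx_{\varepsilon} \mathcal{N}$. Then, from the data processing inequality for the diamond distance with respect to superchannels, it follows that $\Theta(\widetilde{\mathcal{N}}) \approx_{\varepsilon} \Theta(\mathcal{N})$. We then have that

$$D_{\max}(\widetilde{\mathcal{N}} \| \mathcal{M}) \geq D_{\max}(\Theta(\widetilde{\mathcal{N}}) \| \Theta(\mathcal{M})) \geq D^{\varepsilon}_{\max}(\Theta(\mathcal{N}) \| \Theta(\mathcal{M})).$$

The first inequality follows from the data processing inequality for $D_{\max}$ of channels, and the second follows from the definition of the smooth channel max-relative entropy and the fact that $\Theta(\widetilde{\mathcal{N}}) \approx_{\varepsilon} \Theta(\mathcal{N})$. Since the inequality holds for all $\widetilde{\mathcal{N}}$ satisfying $\widetilde{\mathcal{N}} \approx_{\varepsilon} \mathcal{N}$, we conclude the desired data processing inequality:

$$D^{\varepsilon}_{\max}(\mathcal{N} \| \mathcal{M}) \geq D^{\varepsilon}_{\max}(\Theta(\mathcal{N}) \| \Theta(\mathcal{M})).$$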

Choi isomorphism for quantum channels
The Choi isomorphism is a way of characterizing quantum channels that is suitable for optimizing over them in semi-definite programs. For a quantum channel N A→B , its Choi operator is given by where Γ RA = |Γ Γ| RA and where systems S, R, and A are isomorphic and the last line employs the facts that (M S ⊗ I R ) |Γ SR = (I S ⊗ T R (M R )) |Γ SR for T R the transpose map, defined as and Γ| SR (I S ⊗ X RB ) |Γ SR = Tr R [X RB ]. We often abbreviate the transpose map simply as Since the constraints Γ M RB ≥ 0 and Tr B [Γ M RB ] = I R are semi-definite, this is a useful way of incorporating optimizations over quantum channels into semi-definite programs.
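As a concrete illustration of the Choi operator and of the linear constraint $\operatorname{Tr}_B[\Gamma_{RB}] = I_R$ mentioned above, the following minimal sketch builds $\Gamma^{\mathcal{N}}_{RB} = \sum_{i,j} |i\rangle\langle j|_R \otimes \mathcal{N}(|i\rangle\langle j|_A)$ from a Kraus representation using plain nested lists. The helper names and the choice of an amplitude damping channel with damping parameter 0.36 are assumptions made for illustration only; checking positivity of the Choi operator would require an eigenvalue routine and is omitted here.

```python
import math

def matmul(a, b):
    """Matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def apply_channel(kraus, x):
    """N(X) = sum_k K X K^dagger for a list of Kraus operators."""
    d_out = len(kraus[0])
    out = [[0j] * d_out for _ in range(d_out)]
    for K in kraus:
        term = matmul(matmul(K, x), dagger(K))
        for r in range(d_out):
            for c in range(d_out):
                out[r][c] += term[r][c]
    return out

def choi_operator(kraus, d_in):
    """Gamma_RB = sum_{i,j} |i><j|_R (x) N(|i><j|_A), ordered as (R, B)."""
    d_out = len(kraus[0])
    dim = d_in * d_out
    gamma = [[0j] * dim for _ in range(dim)]
    for i in range(d_in):
        for j in range(d_in):
            unit = [[1.0 if (r == i and c == j) else 0.0
                     for c in range(d_in)] for r in range(d_in)]
            block = apply_channel(kraus, unit)
            for r in range(d_out):
                for c in range(d_out):
                    gamma[i * d_out + r][j * d_out + c] = block[r][c]
    return gamma

def partial_trace_B(gamma, d_in, d_out):
    """Tr_B of an operator on R (x) B, returning an operator on R."""
    return [[sum(gamma[i * d_out + b][j * d_out + b] for b in range(d_out))
             for j in range(d_in)] for i in range(d_in)]

# Hypothetical example: amplitude damping channel with damping 0.36.
g = 0.36
AD_KRAUS = [[[1.0, 0.0], [0.0, math.sqrt(1 - g)]],
            [[0.0, math.sqrt(g)], [0.0, 0.0]]]
```

Calling `partial_trace_B(choi_operator(AD_KRAUS, 2), 2, 2)` returns the identity on the reference system up to rounding, which is the trace-preservation constraint in Choi form.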

Semi-definite programs for diamond distance
The normalized diamond distance between quantum channels N A→B and M A→B is given by the following primal and dual semi-definite programs [Wat09]: The latter expression is equal to

Choi isomorphism for quantum superchannels
Just as there is a Choi isomorphism for quantum channels, as reviewed in Appendix A 2, there is a Choi isomorphism for quantum superchannels [CDP08b,Gou19]. To define it, we can exploit the known result that a quantum superchannel Θ (A→B)→(C→D) is in one-to-one correspondence with a bipartite channel L CB→AD that has no-signaling constraints [CDP08b,Gou19]. That is, as stated in (15), every superchannel Θ (A→B)→(C→D) can be physically realized by means of pre-and postprocessing channels E C→AM and D BM →D , respectively, such that (15) holds. The bipartite channel corresponding to E C→AM and D BM →D is then given by i.e., where we do not "plug in" the channel N A→B to the ports A and B, and instead system B is available as input and system A is available as output. On the other hand, suppose that L CB→AD is a bipartite channel with the constraint that it is no-signaling from input system B to output system A. Then there exist channels E C→AM and D BM →D such that L CB→AD can be realized as in (A22), as proved in [ESW02], placing superchannels in one-to-one correspondence with bipartite channels that have a no-signaling constraint.
Using this correspondence, we define the Choi operator of a superchannel Θ (A→B)→(C→D) with corresponding B → A no-signaling bipartite channel L CB→AD as The fact that Θ (A→B)→(C→D) preserves completely positivity corresponds to the condition Γ Θ R C R B AD ≥ 0, and the fact that Θ (A→B)→(C→D) preserves trace preservation corresponds to the condition Γ Θ R C R B = I R C R B . The no-signaling condition corresponds to Γ Θ where π R B is the maximally mixed state. Furthermore, as an extension of (A16), the Choi operator Γ K R C D for the output channel K C→D of the superchannel Θ (A→B)→(C→D) , when the input is a channel N A→B with Choi operator Γ N R A B , is as follows: (A24) This kind of formulation of a superchannel allows for incorporating optimizations over superchannels into semidefinite programs, as we do in Appendix B.
Appendix B: General channel box transformation problem as a semi-definite program Here we prove the statement claimed at the end of Section IV, that the general channel box transformation problem stated in (19) can be solved by means of a semi-definite program. By employing the Choi representation of superchannels from Appendix A 4, as well as the semi-definite program for the diamond distance in Appendix A 3, we find that (19), as a function of channels N A→B , M A→B , K C→D , and L C→D , can be written as the following semi-definite program: subject to where we employ the shorthand with system C isomorphic to system C and system A isomorphic to system A.
The dual of the semi-definite program in (B1)-(B2) is given by subject to By employing strong duality, it follows that the optimal value of (B1)-(B2) is equal to the optimal value of (B6).
Appendix C: One-shot distillation and dilution of channel boxes 1. Channel min-relative entropy as exact one-shot distillable distinguishability To establish (37), we first prove the inequality Let Θ (A→B)→(C→D) be the superchannel that traces out the input C, prepares the pure state ψ RA , transmits A through the unknown channel N or M, and then applies the following channel to systems R and B, where B is the output of the unknown channel: is the projection onto the support of the state N A→B (ψ RA ). By construction, if the unknown channel is N A→B , then the channel realized by the superchannel delineated above is the replacer channel R So the output in this latter case is the replacer channel R π M C→D . Taking a supremum over all input states ψ RA then establishes the inequality in (C1).
The opposite inequality follows from the data processing inequality for D min (N M) under the action of a superchannel. Let Θ be an arbitrary superchannel satisfying Then it follows from (A7) that where the second-to-last equality follows from (A9), given that pairs of replacer channels are environment seizable, and the last equality follows by direct evaluation. Since the exact distillable distinguishability involves an optimization over all superchannels, the inequality in (C4) follows, and combined with (C1), we conclude (37).
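2. Channel max-relative entropy as exact one-shot distinguishability cost

To establish (38), we first prove the inequality

$$D^{0}_{c}(\mathcal{N}, \mathcal{M}) \leq D_{\max}(\mathcal{N} \| \mathcal{M}). \tag{C11}$$

Recall the characterization of $D_{\max}(\mathcal{N} \| \mathcal{M})$ from (34). Let $\lambda$ be such that

$$\mathcal{N}_{A\to B}(\Phi_{RA}) \leq 2^{\lambda} \mathcal{M}_{A\to B}(\Phi_{RA}). \tag{C12}$$

Then this means that $2^{\lambda} \mathcal{M}_{A\to B}(\Phi_{RA}) - \mathcal{N}_{A\to B}(\Phi_{RA}) \geq 0$, so that

$$\omega_{RB} := \frac{2^{\lambda} \mathcal{M}_{A\to B}(\Phi_{RA}) - \mathcal{N}_{A\to B}(\Phi_{RA})}{2^{\lambda} - 1}$$

is a quantum state. Furthermore, since

$$\operatorname{Tr}_B[\omega_{RB}] = \pi_R,$$

where $\pi_R$ is the maximally mixed state on system $R$, it follows that $\omega_{RB}$ is the Choi state of a quantum channel $\mathcal{N}'_{A\to B}$, so that

$$\mathcal{N}'_{A\to B}(\Phi_{RA}) = \omega_{RB}.$$

Furthermore, by linearity, we have that

$$\mathcal{M}_{A\to B} = 2^{-\lambda} \mathcal{N}_{A\to B} + (1 - 2^{-\lambda}) \mathcal{N}'_{A\to B}.$$

Then we construct the superchannel $\Theta_{(C\to D)\to(A\to B)}$ as follows. Let $\tau_C$ be a fixed state that is input to the unknown replacer channel $\mathcal{R}^{|0\rangle\langle 0|}_{C\to D}$ or $\mathcal{R}^{\pi_M}_{C\to D}$, where $M = 2^{\lambda}$. Then we perform the following channel on the output system $D$ and the input system $A$:

In the case that the unknown channel is $\mathcal{R}^{|0\rangle\langle 0|}_{C\to D}$, the channel realized by this process is $\mathcal{N}_{A\to B}$. In the case that the unknown channel is $\mathcal{R}^{\pi_M}_{C\to D}$, the channel realized by this process is

$$2^{-\lambda} \mathcal{N}_{A\to B} + (1 - 2^{-\lambda}) \mathcal{N}'_{A\to B} = \mathcal{M}_{A\to B},$$

demonstrating that

$$D^{0}_{c}(\mathcal{N}, \mathcal{M}) \leq \lambda.$$

Now taking an infimum over all $\lambda$ such that (C12) holds, we conclude the inequality in (C11).

The opposite inequality follows from the data processing inequality for $D_{\max}(\mathcal{N} \| \mathcal{M})$ under the action of a superchannel. Let $\Theta$ be an arbitrary superchannel satisfying

$$\Theta(\mathcal{R}^{|0\rangle\langle 0|}_{C\to D}) = \mathcal{N}_{A\to B}, \qquad \Theta(\mathcal{R}^{\pi_M}_{C\to D}) = \mathcal{M}_{A\to B}.$$

Then it follows from (A7) that

$$\log_2 M = D_{\max}(|0\rangle\langle 0| \,\|\, \pi_M) = D_{\max}(\mathcal{R}^{|0\rangle\langle 0|}_{C\to D} \| \mathcal{R}^{\pi_M}_{C\to D}) \geq D_{\max}(\mathcal{N} \| \mathcal{M}).$$

The first equality follows by direct evaluation, and the second follows from (A9), given that pairs of replacer channels are environment seizable. Since the exact distinguishability cost involves an optimization over all superchannels, the inequality

$$D^{0}_{c}(\mathcal{N}, \mathcal{M}) \geq D_{\max}(\mathcal{N} \| \mathcal{M}) \tag{C20}$$

follows, and combined with (C11), we conclude (38).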
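The convex decomposition at the heart of this dilution argument can be checked by hand in the classical special case, where channels are stochastic matrices and, by Lemma 1 of Appendix D, the channel max-relative entropy is attained on a single input symbol. The following minimal sketch uses illustrative function names and hypothetical example channels (rows index inputs, columns index outputs). It computes $2^{\lambda}$ as the largest entrywise ratio, builds the complementary channel $(2^{\lambda}\mathcal{M} - \mathcal{N})/(2^{\lambda} - 1)$, and confirms that mixing it with $\mathcal{N}$ with weights $1 - 2^{-\lambda}$ and $2^{-\lambda}$ reproduces $\mathcal{M}$.

```python
import math

def max_ratio(N, M):
    """Smallest t with N[x][y] <= t * M[x][y] for all inputs x, outputs y."""
    t = 0.0
    for row_n, row_m in zip(N, M):
        for a, b in zip(row_n, row_m):
            if a > 0 and b == 0:
                return math.inf  # max-relative entropy is infinite
            if a > 0:
                t = max(t, a / b)
    return t

def complement_channel(N, M, t):
    """N' = (t*M - N) / (t - 1); assumes t > 1 and N <= t*M entrywise."""
    return [[(t * b - a) / (t - 1) for a, b in zip(rn, rm)]
            for rn, rm in zip(N, M)]

def mixture(N, N_prime, t):
    """(1/t) * N + (1 - 1/t) * N', the channel realized from the second box."""
    w = 1.0 / t
    return [[w * a + (1 - w) * c for a, c in zip(rn, rc)]
            for rn, rc in zip(N, N_prime)]

# Hypothetical classical channels.
N_EX = [[0.9, 0.1], [0.2, 0.8]]
M_EX = [[0.6, 0.4], [0.5, 0.5]]

if __name__ == "__main__":
    t = max_ratio(N_EX, M_EX)
    print("D_max in bits:", math.log2(t))
    N_prime = complement_channel(N_EX, M_EX, t)
    print("complementary channel:", N_prime)
    print("mixture:", mixture(N_EX, N_prime, t))
```

Here the largest ratio is 1.6, so the cost is $\log_2 1.6$ bits, and each row of the complementary channel is a valid probability distribution.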
In this appendix, we prove that the smooth channel min- and max-relative entropies are characterized by semi-definite programs, starting with the former. We note that Proposition 2 below was also found in [Fai19].
Proposition 2 Let N A→B and M A→B be quantum channels and ε ∈ [0, 1]. The smooth channel min-relative entropy is given by the following primal semi-definite program: The dual semi-definite program is given by Proof. By definition, we have that This then means that Consider that we can restrict the infimum above to being over all pure states ψ RA such that the reduced state ψ R is positive definite, i.e., ψ R > 0, due to the fact that the set of all such states is dense in the set of all pure bipartite states. Note that we can write all such states as ψ RA = X R Γ RA X † R for some operator X R such that |X R | > 0 and Tr[X † R X R ] = 1. Then it follows that (C34) where we have defined Ω RB := X † R Λ RB X R and ρ R := X † R X R . Then we can rewrite as Again using the fact that the set of positive-definite density operators is dense in the set of all density operators, we conclude (C27).
The dual SDP is given by (C28), and its optimal value is equal to the optimal value of the primal SDP in (C27) by strong duality.
Semi-definite programs for the channel min-relative entropy D min (N M) are recovered by setting ε = 0 in (C27) and (C28).
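To make the role of $\varepsilon$ concrete, consider the state-level quantity for commuting states, where the measurement operator becomes a vector of acceptance probabilities and the semi-definite program becomes a linear program: minimize $\sum_i q_i \lambda_i$ subject to $\sum_i p_i \lambda_i \geq 1 - \varepsilon$ and $0 \leq \lambda_i \leq 1$. The following is a minimal sketch under that simplifying assumption (the channel quantity additionally optimizes over input states, which is not attempted here); the function name and example distributions are illustrative. The linear program is solved greedily, accepting outcomes in order of increasing ratio $q_i / p_i$.

```python
import math

def smooth_min_relative_entropy(p, q, eps):
    """Classical smooth min-relative entropy of p relative to q, in bits.

    Returns -log2 of the smallest sum_i q[i]*lam[i] over tests lam with
    0 <= lam[i] <= 1 and sum_i p[i]*lam[i] >= 1 - eps.
    """
    need = 1.0 - eps  # probability mass of p the test must still accept
    items = sorted((qi / pi, pi, qi) for pi, qi in zip(p, q) if pi > 0)
    cost = 0.0
    for _ratio, pi, qi in items:
        if need <= 0:
            break
        frac = min(1.0, need / pi)  # accept this outcome fully or partially
        cost += frac * qi
        need -= frac * pi
    return math.inf if cost == 0.0 else -math.log2(cost)

if __name__ == "__main__":
    p = [0.5, 0.5, 0.0]
    q = [0.25, 0.25, 0.5]
    for eps in (0.0, 0.25, 0.5):
        print(eps, smooth_min_relative_entropy(p, q, eps))
```

Setting `eps = 0` accepts the whole support of `p` and returns the min-relative entropy, mirroring the statement above that the exact quantity is recovered at $\varepsilon = 0$.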
Proposition 3 Let N A→B and M A→B be quantum channels and ε ∈ [0, 1]. The smooth channel max-relative entropy is given by the following primal semi-definite program: The dual semi-definite program is given by Proof. The primal form in (C36) follows from the SDP formulation of the max-relative entropy and the SDP formulation of the diamond distance of two channels in (A20). By definition, we have that Considering that the optimization in (C41) below follows by combining these, with Y RB understood as the Choi operator for the channel N being optimized: The dual program is given by (C37), and its optimal value is equal to the optimal value of (C36) by strong duality.
Semi-definite programs for the channel max-relative entropy D max (N M) are recovered by setting ε = 0 in (C36) and (C37).

Smooth channel min-relative entropy as approximate one-shot distillable distinguishability
In order to establish the equality in (44), we first prove the following inequality: Let ψ RA be an arbitrary pure state and Λ RB a corresponding measurement operator satisfying 0 ≤ Λ RB ≤ I RB and Let Θ (A→B)→(C→D) be the superchannel that traces out the input C, prepares the pure state ψ RA , transmits system A through the unknown channel N or M, and then applies the following channel P RB→X to systems R and B, where B is the output of the unknown channel: With this construction, it follows that both Θ (A→B)→(C→D) (N A→B ) and Θ (A→B)→(C→D) (M A→B ) are replacer channels, and we find that Furthermore, the following equality holds for . (C47) The equality in (C45) follows from the reasoning in [WW19, Appendix F-1]. It then follows that Optimizing over all such ψ RA and Λ RB satisfying the constraints above, we conclude that We now prove the opposite inequality: Now let Θ (A→B)→(C→D) be an arbitrary superchannel satisfying Then we find that The first inequality follows from (A7) and the second equality from (C52). The last inequality follows from reasoning similar to that in [WW19, Appendix F-1]. Let ∆(·) = |0 0|(·)|0 0| + |1 1|(·)|1 1| denote the completely dephasing channel. Since Θ(N ) ≈ ε R |0 0| C→D , we find from the data processing inequality for normalized trace distance and an arbitrary input state ψ RC that which implies that 0|Θ(N )(ψ C )|0 ≥ 1 − ε for all input states ψ RC . Thus, we can take Λ RD = I R ⊗ |0 0| D in the definition of D ε min (Θ(N ) R π M C→D ), and we have that involves an optimization over all measurement operators Λ RD and states ψ RC satisfying Tr[Λ RD Θ(N )(ψ RC )] ≥ 1 − ε, we conclude the inequality in (C55). Since the inequality holds for an arbitrary superchannel Θ (A→B)→(C→D) satisfying (C51)-(C52), we conclude (C50).

Smooth channel max-relative entropy as approximate one-shot distinguishability cost
In order to establish the equality in (45), we first prove the following inequality: Let N A→B be a quantum channel satisfying N A→B ≈ ε N A→B . Then by constructing a superchannel as we did in Appendix C 2, but for N A→B instead of N A→B , we conclude the following inequality: Then taking the infimum over all such channels N A→B satisfying N A→B ≈ ε N A→B , we conclude the inequality in (C61). For the opposite inequality let Θ be an arbitrary superchannel satisfying Then consider that The second equality follows from (A9), given that pairs of replacer channels are environment seizable. The first inequality follows from (A7). The last inequality follows from the definition in (43). Since the chain of inequalities holds for all superchannels Θ satisfying (C64)-(C65), we conclude (C63). Putting together (C61) and (C63), we conclude the equality in (45), i.e., D ε c (N , M) = D ε max (N M).
6. Limits of smooth channel min-and max-relative entropy Here we provide an alternate proof of the limits stated in (48)-(49), starting with (48). These proofs use some of the results from [WW19, Appendix A-3] as a starting point. Let ψ RA be an arbitrary bipartite state. By the inequality D ε min (ρ σ) ≥ D min (ρ σ), which holds for all states ρ and σ and ε ∈ (0, 1), we conclude that for all ε ∈ (0, 1). Now taking a supremum over all ψ RA , we find that for all ε ∈ (0, 1). Taking the limit, we conclude that For the other limit, recall the following inequality from [WW19, Appendix A-3], holding for all states ρ and σ, for ε ∈ (0, 1), and α ∈ (0, 1): Taking an optimization over all input states ψ RA to the channels N A→B and M A→B , we conclude that Taking the limit as ε → 0, we conclude that for all α ∈ (0, 1). Now taking the limit of the left-hand side as α → 0, and applying arguments similar to those needed for [CMW16, Lemma 10], we conclude that Combining (C73) and (C77), we conclude the limit stated in (48).
Another proof for the inequality in (49) goes as follows. By taking N = N , we conclude that N ≈ ε N , so that applying definitions gives for all ε ∈ (0, 1). Then applying a limit gives Now suppose that N is a channel satisfying N ≈ ε N for ε ∈ (0, 1). Then this implies that and applying an inequality from [WW19, Appendix A-3], we find that Since this bound holds uniformly for all channels N satisfying N ≈ ε N , we conclude that Now taking the limit ε → 0, we find that Combining (C79) and (C83), we conclude the inequality in (49).
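Appendix D: Upper bound on smooth max-relative entropy of classical-quantum channels

The main purpose of this appendix is to prove Proposition 4, which establishes an upper bound on the smooth max-relative entropy of classical-quantum channels. We begin by noting a simple lemma:

Lemma 1 Let $\{\rho^{x}_{B}\}_{x\in\mathcal{X}}$ and $\{\sigma^{x}_{B}\}_{x\in\mathcal{X}}$ be the output states of classical-quantum channels $\mathcal{N}_{X\to B}$ and $\mathcal{M}_{X\to B}$, respectively, as defined in (82)-(83). Then we have that

$$D_{\max}(\mathcal{N} \| \mathcal{M}) = \max_{x\in\mathcal{X}} D_{\max}(\rho^{x}_{B} \| \sigma^{x}_{B}),$$

$$\widetilde{D}_{\alpha}(\mathcal{N} \| \mathcal{M}) = \max_{x\in\mathcal{X}} \widetilde{D}_{\alpha}(\rho^{x}_{B} \| \sigma^{x}_{B}),$$

where the latter equality holds for all $\alpha \in [1/2, 1) \cup (1, \infty)$.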
Proof. The second equality follows from [BHKW18,Lemma 26]. The proof of the first equality is similar to the proof of the second one. For completeness, we provide a proof. Let ψ RX be an arbitrary pure bipartite quantum state (X is a quantum system here). Then the states resulting from the action of the classical-quantum channels on this state are as follows: where Then it follows that So we have established a uniform upper bound for any possible bipartite input state. The upper bound is achieved by calculating the value of x that achieves the optimum and inputting |x x| X to the channel box.
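The content of the first equality in Lemma 1 can be probed numerically in the fully classical case. The following minimal sketch makes several simplifying assumptions: the channels are hypothetical stochastic matrices (rows index inputs), the inputs are probability distributions with no reference system, and the function names are illustrative. It samples random input distributions and confirms that the max-relative entropy of the outputs never exceeds the maximum over single input symbols, and that a point-mass input attains it.

```python
import math
import random

def dmax(p, q):
    """Classical max-relative entropy log2 max_i p[i]/q[i], in bits."""
    ratios = []
    for pi, qi in zip(p, q):
        if pi > 0 and qi == 0:
            return math.inf
        if pi > 0:
            ratios.append(pi / qi)
    return math.log2(max(ratios))

def output_distribution(channel, p_in):
    """Output distribution sum_x p_in[x] * channel[x][y]."""
    n_out = len(channel[0])
    return [sum(p_in[x] * channel[x][y] for x in range(len(p_in)))
            for y in range(n_out)]

def single_symbol_bound(N, M):
    """max_x dmax(N[x], M[x]), the right-hand side of the lemma."""
    return max(dmax(N[x], M[x]) for x in range(len(N)))

def check_bound(N, M, trials=1000, seed=0):
    """True if no sampled input distribution beats the single-symbol bound."""
    rng = random.Random(seed)
    best = single_symbol_bound(N, M)
    for _ in range(trials):
        w = [rng.random() for _ in N]
        total = sum(w)
        p_in = [wi / total for wi in w]
        value = dmax(output_distribution(N, p_in), output_distribution(M, p_in))
        if value > best + 1e-12:
            return False
    return True

# Hypothetical classical channels.
N_CL = [[0.9, 0.1], [0.2, 0.8]]
M_CL = [[0.6, 0.4], [0.5, 0.5]]
```

This is only a sanity check on sampled inputs, not a substitute for the proof, which also covers inputs entangled with a reference system.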
Proposition 4 Let {ρ x B } x∈X and {σ x B } x∈X be the output states of classical-quantum channels N X→B and M X→B , respectively. Then the following bound holds for all α > 1 and ε ∈ (0, 1): The first inequality follows by restricting the optimization to be over classical-quantum channels. The last equality follows because the objective function is jointly concave with respect to {p(x)} x∈X and {Λ x B } x∈X , and it is convex with respect to the states { ρ x B } x∈X . Also, the sets over which we are optimizing are convex and compact. Thus, the Sion minimax theorem applies [Sio58]. For each operator Λ x B , let its spectral decomposition be given as Then define the set S x and the projection Π x B as The above implies for all x ∈ X that Then from the data processing inequality of the sandwiched Rényi relative entropy for α > 1 [FL13,Bei13] and by dropping terms, we find that which in turn implies that for all x ∈ X . Then we find that for all x ∈ X , whereΠ We define the states and we note that for all x ∈ X . Then we find that So this means that we have the following bound holding for all x ∈ X : From this, we conclude (D11).
Appendix E: Bounds for general one-shot or n-shot parallel channel box transformations
In this appendix, we establish some bounds for general channel box transformations, by generalizing the results of [WW19] from states to channels. We begin with the following proposition:
Proposition 5 Let $\mathcal{N}^0_{A\to B}$, $\mathcal{N}^1_{A\to B}$, and $\mathcal{M}_{A\to B}$ be channels such that $D_{\max}(\mathcal{N}^0\|\mathcal{M}) < \infty$. Then for $\alpha \in (1/2, 1)$ and $\beta := \beta(\alpha) = \alpha/(2\alpha - 1) > 1$, the following inequality holds (E1) where $\widetilde{D}_\beta(\mathcal{N}^0\|\mathcal{M})$ and $\widetilde{D}_\alpha(\mathcal{N}^1\|\mathcal{M})$ are channel sandwiched Rényi relative entropies and $F(\mathcal{N}^0, \mathcal{N}^1)$ is the channel fidelity, each of which is defined from (A5) and the underlying functions of states.
Proof. Recall the following inequality from [WW19, Lemma 1] for states $\rho_0$, $\rho_1$, and $\sigma$: Let $\psi_{RA}$ be a pure bipartite state. Applying the above inequality for states, we find that Taking a supremum over all input states $\psi_{RA}$ on the left-hand side, and an infimum on the right-hand side, we find that Since the above inequality holds for all input states $\psi_{RA}$, we finally take another supremum to conclude (E1).
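Proposition 6 Let $\mathcal{N}^0_{A\to B}$, $\mathcal{N}^1_{A\to B}$, and $\mathcal{M}_{A\to B}$ be channels such that $D_{\max}(\mathcal{N}^0\|\mathcal{M}) < \infty$. Then for $\alpha \in (0, 1)$ and $\beta := \beta(\alpha) = 2 - \alpha > 1$, the following inequality holds where $D_\beta(\mathcal{N}^0\|\mathcal{M})$ and $D_\alpha(\mathcal{N}^1\|\mathcal{M})$ are channel Petz-Rényi relative entropies, each of which is defined from (A5) and the underlying functions of states.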
Proof. This is a consequence of the following inequality from [WW19, Lemma 4], for states $\rho_0$, $\rho_1$, and $\sigma$, and the same reasoning as in the proof of Proposition 5: concluding the proof.
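The two parameter pairings used in Propositions 5 and 6 can be rewritten in a symmetric form, which may make it easier to see why $\beta > 1$ in each case. The following is only an algebraic rearrangement of the definitions of $\beta(\alpha)$ stated in those propositions, not an additional claim taken from the source:

```latex
\begin{align}
\beta = \frac{\alpha}{2\alpha - 1}
  \quad &\Longleftrightarrow \quad \frac{1}{\alpha} + \frac{1}{\beta} = 2,
  & \alpha \in (1/2, 1) &\implies \beta \in (1, \infty), \\
\beta = 2 - \alpha
  \quad &\Longleftrightarrow \quad \alpha + \beta = 2,
  & \alpha \in (0, 1) &\implies \beta \in (1, 2).
\end{align}
```

In the sandwiched case, $\beta \to \infty$ as $\alpha \to 1/2$, whereas in the Petz case $\beta$ never exceeds 2.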
We can then use the above bounds for channels to establish converse bounds for general channel box transformation protocols.
We can now use these one-shot bounds to establish converse bounds on the rate at which it is possible to convert the n-fold channel box $(\mathcal{N}^{\otimes n}, \mathcal{M}^{\otimes n})$ to the m-fold channel box $(\mathcal{K}^{\otimes m}, \mathcal{L}^{\otimes m})$.
Proof. Applying Proposition 7, we conclude that The second inequality follows from the fact that (E17) The other inequality follows from similar reasoning but instead using data processing and (E8).

Remark 2
The desired additivity relations in (E18)-(E19) and (E21)-(E22) hold for channel boxes that are classical-quantum or environment seizable [BHKW18]. Thus, by applying reasoning similar to that given in [WW19, Appendix J], we conclude the following strong converse bound for these channel boxes: follows from combining a distillation protocol with a dilution protocol for these channel boxes, as well as reasoning similar to that given in [WW19, Appendix J], and along with the fact that the rates $D(\mathcal{N}\|\mathcal{M})$ and $D(\mathcal{K}\|\mathcal{L})$ are achievable for these tasks and these channel boxes. Thus, the asymptotic parallel box transformation problem has a simple solution for these channel boxes.
Appendix F: Bounding the smooth channel max-relative entropy in terms of channel relative entropies
In this appendix, we provide lower bounds for the smooth channel max-relative entropy in terms of the channel sandwiched and Petz-Rényi relative entropies.
Proposition 9 Let $\mathcal{N}_{A\to B}$ and $\mathcal{M}_{A\to B}$ be quantum channels. Then the following bound holds for all $\alpha \in [1/2, 1)$ and $\varepsilon \in [0, 1)$:
Proof. First fix $\alpha \in (1/2, 1)$. Then pick $\widetilde{\mathcal{N}}$ to be a channel such that $\widetilde{\mathcal{N}} \approx_\varepsilon \mathcal{N}$. We find for $\beta := \alpha/(2\alpha - 1)$ that The first inequality follows from the fact that the sandwiched Rényi relative entropies are monotone non-decreasing in the Rényi parameter [MLDS+13] and $\lim_{\alpha\to\infty} \widetilde{D}_\alpha = D_{\max}$ [MLDS+13]. The second inequality follows from Proposition 5. The final inequality follows because Since the bound holds for an arbitrary channel $\widetilde{\mathcal{N}}$ satisfying $\widetilde{\mathcal{N}} \approx_\varepsilon \mathcal{N}$, we conclude (F1). The inequality for $\alpha = 1/2$ follows by taking a limit.
Another lower bound on the smooth channel max-relative entropy is as follows:
Proposition 10 Let $\mathcal{N}_{A\to B}$ and $\mathcal{M}_{A\to B}$ be quantum channels. Then the following bound holds for all $\alpha \in [0, 1)$ and $\varepsilon \in [0, 1)$:
Proof. First fix $\alpha \in (0, 1)$. Then pick $\widetilde{\mathcal{N}}$ to be a channel such that $\widetilde{\mathcal{N}} \approx_\varepsilon \mathcal{N}$. We find for $\beta := 2 - \alpha$ that The first inequality follows from the fact that $D_{\max} \geq D_2$ [JRS+16, WW19] and the Petz-Rényi relative entropies are monotone with respect to $\beta$ [TCR09]. The second inequality follows from Proposition 6. Since the bound holds for an arbitrary channel $\widetilde{\mathcal{N}}$ satisfying $\widetilde{\mathcal{N}} \approx_\varepsilon \mathcal{N}$, we conclude (F7). The inequality for $\alpha = 0$ follows by taking a limit.
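The first step in the proof of Proposition 9 relies on two facts about the sandwiched Rényi relative entropy: it is non-decreasing in its parameter, and it is bounded above by the max-relative entropy, which it approaches as the parameter grows. The sketch below checks both facts numerically at the level of states for one pair of qubit states. It assumes the standard formula $\widetilde{D}_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1}\log_2 \mathrm{Tr}[(\sigma^{(1-\alpha)/2\alpha}\rho\,\sigma^{(1-\alpha)/2\alpha})^\alpha]$, base-2 logarithms, and a full-rank $\sigma$. The helper names and the example states are hypothetical, and the sketch does not perform the optimization over channel inputs.

```python
import numpy as np

def mpow(a, t):
    """Real power a^t of a positive definite Hermitian matrix a."""
    w, v = np.linalg.eigh(a)
    return (v * w**t) @ v.conj().T

def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Renyi relative entropy in bits (alpha != 1,
    sigma assumed positive definite)."""
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    w = np.clip(np.linalg.eigvalsh(s @ rho @ s), 0, None)
    return float(np.log2(np.sum(w**alpha)) / (alpha - 1))

def d_max(rho, sigma):
    """Max-relative entropy in bits (sigma assumed positive definite)."""
    s = mpow(sigma, -0.5)
    return float(np.log2(np.linalg.eigvalsh(s @ rho @ s).max()))

# Hypothetical example states (rho and sigma do not commute).
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.diag([0.6, 0.4])

alpha = 0.6
beta = alpha / (2 * alpha - 1)  # pairing used in Proposition 5
alphas = [alpha, beta, 10.0, 100.0]
values = [sandwiched_renyi(rho, sigma, a) for a in alphas]
dmax = d_max(rho, sigma)
print(list(zip(alphas, values)), dmax)
```

The printed values increase along the list of parameters and stay below the max-relative entropy, which is the ordering used in the first inequality of the proof.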
We can then use the above bounds for quantum strategies to establish converse bounds for general strategy box transformation protocols.
We can now use these bounds to establish converse bounds on the rate at which it is possible to convert the quantum strategy box $(\mathcal{N}^{(n)}, \mathcal{M}^{(n)})$ to the strategy box $(\mathcal{K}^{(m)}, \mathcal{L}^{(m)})$.

Sequential channel box transformations and amortized channel divergence
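In [BHKW18], the notion of amortized channel divergence of a channel box $(\mathcal{N}, \mathcal{M})$ was introduced as
$$\mathbf{D}^{\mathcal{A}}(\mathcal{N}\|\mathcal{M}) := \sup_{\rho_{RA}, \sigma_{RA}} \left[ \mathbf{D}(\mathcal{N}_{A\to B}(\rho_{RA})\|\mathcal{M}_{A\to B}(\sigma_{RA})) - \mathbf{D}(\rho_{RA}\|\sigma_{RA}) \right],$$
where the optimization is with respect to input states $\rho_{RA}$ and $\sigma_{RA}$, and the system R has unbounded dimension. The intuition behind this quantity is that it represents the largest net distinguishability that can be generated by the channels $\mathcal{N}$ and $\mathcal{M}$ if we are allowed to start with some distinguishability to begin with, in the form of the state box $(\rho_{RA}, \sigma_{RA})$.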
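A short worked step may help to connect the amortized quantity to the usual channel divergence. The sketch below assumes that the amortized channel divergence is the supremum over state pairs of the output divergence minus the input divergence, that the channel divergence is the supremum of the output divergence over a common input state, and that the underlying divergence vanishes on identical states (true for the quantum relative entropy and the Rényi relative entropies, but an assumption for a general divergence). Restricting the supremum to pairs with equal states then gives a lower bound:

```latex
\begin{align}
\mathbf{D}^{\mathcal{A}}(\mathcal{N}\|\mathcal{M})
  &= \sup_{\rho_{RA},\sigma_{RA}}
     \left[ \mathbf{D}(\mathcal{N}_{A\to B}(\rho_{RA})\|\mathcal{M}_{A\to B}(\sigma_{RA}))
            - \mathbf{D}(\rho_{RA}\|\sigma_{RA}) \right] \\
  &\geq \sup_{\rho_{RA}}
     \left[ \mathbf{D}(\mathcal{N}_{A\to B}(\rho_{RA})\|\mathcal{M}_{A\to B}(\rho_{RA}))
            - \mathbf{D}(\rho_{RA}\|\rho_{RA}) \right] \\
  &= \sup_{\rho_{RA}}
     \mathbf{D}(\mathcal{N}_{A\to B}(\rho_{RA})\|\mathcal{M}_{A\to B}(\rho_{RA}))
   = \mathbf{D}(\mathcal{N}\|\mathcal{M}).
\end{align}
```

Under these assumptions, starting with some initial distinguishability can only help, and the channel boxes for which this inequality is an equality are the ones for which amortization gives no advantage.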
Suppose now that we have a sequential channel box $(\mathcal{N}^{(n)}, \mathcal{M}^{(n)})$, where $\mathcal{N}^{(n)}$ consists of a sequence of n uses of $\mathcal{N}$ and $\mathcal{M}^{(n)}$ consists of a sequence of n uses of $\mathcal{M}$. As stated earlier, this sequential channel box is a special kind of strategy box. Then by employing the same reasoning as in the proof of [BHKW18, Lemma 14], we conclude that the amortized channel divergence is an upper bound on the normalized strategy divergence of $(\mathcal{N}^{(n)}, \mathcal{M}^{(n)})$:
$$\frac{1}{n}\mathbf{D}(\mathcal{N}^{(n)}\|\mathcal{M}^{(n)}) \leq \mathbf{D}^{\mathcal{A}}(\mathcal{N}\|\mathcal{M}). \qquad \text{(G25)}$$
For some channel boxes and choices of divergences, the inequality in (G25) is saturated as a consequence of the amortized channel divergence collapsing to the usual channel divergence [BHKW18]. This occurs for all classical-quantum or environment-seizable channel boxes paired up with the Petz-Rényi relative entropy, the sandwiched Rényi relative entropy, or the quantum relative entropy [BHKW18]. Thus, for these channels, the desired relations in (G18)-(G19) and (G21)-(G22) hold, so that the bounds in (G20) and (G23) hold for these channels.