Resource theory of asymmetric distinguishability
[quant-ph] 28 May 2019

This paper systematically develops the resource theory of asymmetric distinguishability, as initiated roughly a decade ago [K. Matsumoto, arXiv:1006.0302 (2010)]. The key constituents of this resource theory are quantum boxes, consisting of a pair of quantum states, which can be manipulated for free by means of an arbitrary quantum channel. We introduce bits of asymmetric distinguishability as the basic currency in this resource theory, and we prove that it is a reversible resource theory in the asymptotic limit, with the quantum relative entropy being the fundamental rate of resource interconversion. The distillable distinguishability is the optimal rate at which a quantum box consisting of independent and identically distributed (i.i.d.) states can be converted to bits of asymmetric distinguishability, and the distinguishability cost is the optimal rate for the reverse transformation. Both of these quantities are equal to the quantum relative entropy. The exact one-shot distillable distinguishability is equal to the Petz–Rényi relative entropy of order zero, and the exact one-shot distinguishability cost is equal to the max-relative entropy. Generalizing these results, the approximate one-shot distillable distinguishability is equal to the hypothesis testing relative entropy, and the approximate one-shot distinguishability cost is equal to the smooth max-relative entropy. As a notable application of the former results, we prove that the optimal rate of asymptotic conversion from a pair of i.i.d. quantum states to another pair of i.i.d. quantum states is fully characterized by the ratio of their quantum relative entropies.


I. INTRODUCTION
Distinguishability plays a central role in all sciences. That is, the ability to distinguish one possibility from another is what allows us to discover new scientific laws and make predictions of future possibilities. In the process of scientific discovery, we form a hypothesis based on conjecture, which is to be tested against a conventional or null hypothesis by repeated trials or experiments. With sufficient statistical evidence, one can determine which hypothesis should be rejected in favor of the other. If the null hypothesis is accepted, one can form alternative hypotheses to test against the null hypothesis in future experiments.
What is essential in this approach is the ability to perform repeated trials. Repetition allows for increasing the distinguishability between the two hypotheses. A natural question in this context is to determine how many trials are required to reach a given conclusion. If the two different hypotheses are relatively distinguishable, then fewer trials are required to decide between the possibilities. In this sense, distinguishability can be understood as a resource, because it limits the amount of effort that we need to invest in order to make decisions.
One of the fundamental settings in which distinguishability can be studied in a mathematically rigorous manner is statistical hypothesis testing. The basic setup is that one draws a sample $x$ from one of two probability distributions $p \equiv \{p(x)\}_{x\in\mathcal{X}}$ or $q \equiv \{q(x)\}_{x\in\mathcal{X}}$, with common alphabet $\mathcal{X}$, with the goal being to decide from which distribution the sample $x$ has been drawn. Let $p$ be the null hypothesis and $q$ the alternative. A Type I error occurs if one decides $q$ when the distribution being sampled from is in fact $p$, and a Type II error occurs if one decides $p$ when the distribution being sampled from is in fact $q$. The goal of asymmetric hypothesis testing is to minimize the probability of a Type II error, subject to an upper bound constraint on the probability of committing a Type I error.
In the scientific spirit of repeated experiments, we can modify the above scenario to allow for independent and identically distributed (i.i.d.) samples from either the distribution $p$ or $q$. One of the fundamental results of asymptotic hypothesis testing is that, with a sufficiently large number of samples, it becomes possible to meet any upper bound constraint on the Type I error probability while having the Type II error probability decay exponentially fast with the number of samples, with the optimal error exponent being given by the relative entropy [Ste,Che56]:
$$D(p\|q) := \sum_{x\in\mathcal{X}} p(x)\log_2\!\left(\frac{p(x)}{q(x)}\right). \tag{1}$$
That is, there exists a sequence of schemes that can achieve this error exponent for the Type II error probability while making the Type I error probability arbitrarily small in the limit of a large number of samples. At the same time, the strong converse property holds: any sequence of schemes that has a fixed constraint on the Type I error probability is such that its Type II error probability cannot decay any faster than the exponent $D(p\|q)$. This gives a fundamental operational meaning to the relative entropy and represents one core link between hypothesis testing and information theory [Bla74], the latter being the fundamental mathematical theory of communication [Sha48].

Another perspective on the above process of decision making in hypothesis testing, namely the resource-theoretic perspective [Mat10,Mat11], which is not commonly adopted in the literature on the topic, is that it is a process by which we distill distinguishability from the original distributions into a more standard form. That is, we can think of the distributions $p$ and $q$ as being presented as a black box or ordered pair $(p, q)$. Given a sample $x \in \mathcal{X}$, we can perform a common transformation $T : \mathcal{X} \to \{0, 1\}$ that outputs a single bit, "0" to decide $p$ and "1" to decide $q$. The common transformation $T$ can even be stochastic.
In this way, one transforms the initial box to a final box as
$$(p, q) \xrightarrow{\ T\ } (p_f, q_f),$$
where $p_f \equiv \{p_f(y)\}_{y\in\{0,1\}}$ and $q_f \equiv \{q_f(y)\}_{y\in\{0,1\}}$ are binary distributions. Then the probability of a Type I error is $p_f(1)$, and the probability of a Type II error is $q_f(0)$. Since the goal is to extract or distill as much distinguishability as possible, we would like for $q_f(0)$ to be as small as possible given a constraint $\varepsilon \in [0, 1]$ on $p_f(1)$ (i.e., $p_f(1) \leq \varepsilon$).
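The distillation step just described can be sketched numerically for finite alphabets. The following is a minimal illustration, not taken from the paper: the function names `relative_entropy` and `min_type2_error` are hypothetical, and the optimal stochastic test is found with the standard likelihood-ratio (Neyman-Pearson style) greedy rule, which is assumed here rather than derived.

```python
import math

def relative_entropy(p, q):
    """Relative entropy D(p||q) in bits for distributions given as lists."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue              # terms with p(x) = 0 contribute nothing
        if qx == 0.0:
            return math.inf       # support of p not contained in support of q
        total += px * math.log2(px / qx)
    return total

def min_type2_error(p, q, eps):
    """Smallest q_f(0) over stochastic tests T with p_f(1) <= eps.

    The test decides "0" (i.e., p) on symbol x with probability t(x).
    Symbols are accepted in order of increasing q(x)/p(x) until the
    accepted p-mass reaches 1 - eps (the last symbol may be fractional).
    """
    order = sorted((i for i in range(len(p)) if p[i] > 0.0),
                   key=lambda i: q[i] / p[i])
    need = 1.0 - eps              # p-mass that must still be accepted
    beta = 0.0                    # accumulated Type II error probability
    for i in order:
        if need <= 0.0:
            break
        frac = min(1.0, need / p[i])
        beta += frac * q[i]
        need -= frac * p[i]
    return beta

p = [0.5, 0.5]
q = [0.9, 0.1]
beta = min_type2_error(p, q, eps=0.5)
print("Type II error:", beta, "distilled bits:", -math.log2(beta))
print("relative entropy:", relative_entropy(p, q))
```

The quantity `-log2(beta)` plays the role of the number of bits of distinguishability extracted by the test; the greedy rule is only valid in this classical setting.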
Once we have adopted this resource-theoretic approach to distinguishability, it is natural to consider two other questions, the first of which is the question of the reverse process [Mat10,Mat11]. That is, we would like to start from initial binary distributions $p_i \equiv \{p_i(y)\}_{y\in\{0,1\}}$ and $q_i \equiv \{q_i(y)\}_{y\in\{0,1\}}$ having as little distinguishability as possible, and act on their samples with a common transformation $R : \{0, 1\} \to \mathcal{X}$ in order to produce the distributions $p \equiv \{p(x)\}_{x\in\mathcal{X}}$ and $q \equiv \{q(x)\}_{x\in\mathcal{X}}$, while allowing for a slight error when reproducing $p$. That is, we would like to perform the dilution transformation
$$(p_i, q_i) \xrightarrow{\ R\ } (\tilde{p}, q), \tag{3}$$
where $\tilde{p} \equiv \{\tilde{p}(x)\}_{x\in\mathcal{X}}$ is a distribution satisfying $d(p, \tilde{p}) \leq \varepsilon$, for some suitable metric $d$ of statistical distinguishability. In this way, we characterize the distinguishability of $p$ and $q$ in terms of the least distinguishable distributions $p_i$ and $q_i$ that can be diluted to prepare or simulate $p$ and $q$, respectively. This dilution question is motivated by related questions in the theory of quantum entanglement [BDSW96].

The second, more general question is regarding the existence of a common transformation $T : \mathcal{X} \to \mathcal{Z}$ that converts initial distributions $p$ and $q$ into final distributions $r \equiv \{r(z)\}_{z\in\mathcal{Z}}$ and $t \equiv \{t(z)\}_{z\in\mathcal{Z}}$:
$$(p, q) \xrightarrow{\ T\ } (\tilde{r}, t), \tag{4}$$
where $\tilde{r} \equiv \{\tilde{r}(z)\}_{z\in\mathcal{Z}}$ is a distribution satisfying $d(r, \tilde{r}) \leq \varepsilon$. One can then ask about the rate or efficiency at which it is possible to convert a pair of i.i.d. distributions to another pair of i.i.d. distributions.

This resource-theoretic approach to distinguishability offers a unique and powerful perspective on statistical hypothesis testing and distinguishability, similar to the perspective brought about by the seminal work on the resource theory of quantum entanglement [BDSW96], which has in turn inspired a flurry of activity on resource theories in quantum information and beyond [CG19]. Although the reverse process in (3) may seem nonsensical at first glance (why would one want to dilute fresh water to salt water? [BSST02]), it plays a fundamental role in characterizing distinguishability as a resource, as well as for addressing the general question posed in (4). It is also natural from a thermodynamic or physical perspective to consider reversibility and cyclicity of processes. Another application for the reverse process is in understanding the minimal resources required for simulation in various quantum resource theories [CG19].

II. MAIN RESULTS
The main goal of this paper is to develop systematically the resource-theoretic perspective on distinguishability, which was initiated in [Mat10,Mat11]. More precisely, the theory developed here is a resource theory of asymmetric distinguishability, given that approximation is allowed for the first distribution in all of the distillation, dilution, and general transformation tasks mentioned above. The theory that we develop applies in the more general setting of quantum distinguishability, as it did in [Mat10,Mat11], in particular when the distributions p and q are replaced by quantum states ρ and σ, respectively, and the common transformations allowed on a quantum box (ρ, σ) are quantum channels.
Some key findings of our work are as follows:
1. We introduce the fundamental unit or currency of this resource theory, dubbed "bits of asymmetric distinguishability." Then the distinguishability distillation and dilution tasks amount to distilling bits of asymmetric distinguishability from a box (ρ, σ) and diluting bits of asymmetric distinguishability to a box (ρ, σ), respectively.
2. We formally define the exact one-shot distinguishability distillation and dilution tasks, and we prove that the optimal number of bits of asymmetric distinguishability that can be distilled from a box (ρ, σ) is equal to the min-relative entropy [Dat09] (see (31)), while the optimal number of bits of asymmetric distinguishability that can be diluted to a box (ρ, σ) is equal to the max-relative entropy [Dat09] (see (35)), giving both of these quantities fundamental operational interpretations in the resource theory of asymmetric distinguishability.
3. We define the approximate one-shot distinguishability distillation and dilution tasks, and we prove that the optimal number of bits of asymmetric distinguishability that can be distilled from a box (ρ, σ) is equal to the smooth min-relative entropy [BD10, BD11, WR12] (see (44)), while the optimal number of bits of asymmetric distinguishability that can be diluted to a box (ρ, σ) is equal to the smooth max-relative entropy [Dat09] (see (48)), giving both of these quantities fundamental operational interpretations in the resource theory of asymmetric distinguishability.
4. We prove that the optimization problems corresponding to one-shot distinguishability distillation and dilution, as well as the optimization corresponding to the quantum generalization of the transformation problem considered in (4), are characterized by semi-definite programs (see Appendices B and C). Thus, all of these quantities can be computed efficiently.
5. We finally consider the asymptotic version of the resource theory and prove that it is reversible in this setting, with the optimal rate of distillation or dilution equal to the quantum relative entropy. The implication of this result is that the rate or efficiency at which a pair of i.i.d. quantum states can be converted to another pair of i.i.d. quantum states is fully characterized by the ratio of their quantum relative entropies (see (62)).
In what follows, we provide more details of the resource theory of asymmetric distinguishability and a full exposition of the main results stated above. We relegate details of mathematical proofs to several appendices, and we note here that some of the technical lemmas in the appendices may be of independent interest.
As far as we are aware, the first proposal for a resource theory of distinguishability was given in [Mat10, Mat11], which we have highlighted above. It appears that this aspect of the work [Mat10, Mat11] has gone largely unnoticed since its posting to the arXiv, given that there have been several subsequent proposals or calls to formalize a resource theory of distinguishability [Mor09, BK15b, Blu17] that apparently were not aware of [Mat10, Mat11].

III. RESOURCE THEORY OF ASYMMETRIC DISTINGUISHABILITY
We begin by establishing the basics of the resource theory of asymmetric distinguishability. The basics include the objects being manipulated, called "boxes," the fundamental units of resource, "bits of asymmetric distinguishability," and the free operations allowed, which are simply arbitrary quantum physical operations.
The basic object to manipulate in the resource theory of asymmetric distinguishability is the following "box" or ordered pair:
$$(\rho, \sigma), \tag{5}$$
where $\rho$ and $\sigma$ are quantum states acting on the same Hilbert space. The interpretation of the box $(\rho, \sigma)$ is that it corresponds to two different experiments or scenarios. In the first, the state $\rho$ is prepared, and in the second, the state $\sigma$ is prepared. The box is handed to another party, who is not aware of which experiment is being conducted (i.e., which state has been prepared).

One basic manipulation in this resource theory is to transform this box into another box by means of any quantum physical operation $\mathcal{N}$, as allowed by quantum mechanics. Such physical operations are mathematically described by completely positive, trace-preserving (CPTP) maps and are known as quantum channels. By acting on the box $(\rho, \sigma)$ with the common quantum channel $\mathcal{N}$, one obtains the transformed box $(\mathcal{N}(\rho), \mathcal{N}(\sigma))$. Observe that it is not necessary to know which experiment is being conducted in order to perform this transformation; one can perform it regardless of whether $\rho$ or $\sigma$ was prepared. For this reason, all quantum channels are allowed for free in this resource theory, so that the transformation
$$(\rho, \sigma) \xrightarrow{\ \mathcal{N}\ } (\mathcal{N}(\rho), \mathcal{N}(\sigma))$$
is allowed for free.

If the channel being performed to transform the box in (5) is an isometric channel $\mathcal{U}(\omega) = U \omega U^\dagger$ (where $U$ is an isometry satisfying $U^\dagger U = I$ and $\omega$ is an arbitrary state), resulting in the box $(\mathcal{U}(\rho), \mathcal{U}(\sigma))$, then it is possible to invert this transformation and return to the original box in (5). A quantum channel that inverts the action of $\mathcal{U}$ is given by
$$\theta \mapsto U^\dagger \theta U + \operatorname{Tr}[(I - UU^\dagger)\theta]\,\tau,$$
where $\theta$ is an arbitrary state and $\tau$ is some state. Another kind of invertible transformation is the appending channel $\mathcal{A}_\tau(\omega) = \omega \otimes \tau$, which appends the state $\tau$ and has the following effect on the box:
$$(\rho, \sigma) \to (\rho \otimes \tau, \sigma \otimes \tau). \tag{9}$$
One can recover the original box $(\rho, \sigma)$ from (9) by discarding the second system (described mathematically by partial trace). Thus, isometric channels and appending channels are perfectly reversible operations in this resource theory.

The fundamental goal of this resource theory is to determine how and whether it is possible to transform an initial box $(\rho, \sigma)$ to another box $(\tau, \omega)$ for states $\tau$ and $\omega$, by means of a common quantum channel $\mathcal{N}$. Mathematically, the question is to determine, for fixed states $\rho$, $\sigma$, $\tau$, and $\omega$, whether there exists a completely positive and trace-preserving map $\mathcal{N}$ such that $\mathcal{N}(\rho) = \tau$ and $\mathcal{N}(\sigma) = \omega$. As it turns out, various instantiations of this question have been studied considerably in prior work [Bla53, AU80, CJW04, MOA11, Bus12, HJRW12, BDS14, BaHN+15, Ren16, BD16, Bus16, GJB+18, Bus17, BG17], and a variety of results are known regarding it. In this paper, we offer a fresh resource-theoretic perspective on this matter.
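Motivated by practical concerns, one important variation of the aforementioned box transformation problem is to determine whether it is possible to accomplish the transformation approximately as
$$(\rho, \sigma) \xrightarrow{\ \mathcal{N}\ } (\tau_\varepsilon, \omega),$$
with some tolerance $\varepsilon \in [0, 1]$ allowed, such that the state $\tau_\varepsilon$ is $\varepsilon$-close to the desired $\tau$. The precise way in which we allow some tolerance is motivated exclusively by operational concerns. In a single run of the first experiment in which $\rho$ is prepared, the transformation $\mathcal{N}(\rho) = \tau_\varepsilon$ occurs. Then a third party would like to assess how accurate the conversion is. Such an individual can do so by performing a quantum measurement $\{\Lambda_x\}_x$ with outcomes $x$ (satisfying $\Lambda_x \geq 0$ for all $x$ and $\sum_x \Lambda_x = I$). The probability of obtaining a particular outcome $x$ is given by the Born rule $\operatorname{Tr}[\Lambda_x \tau_\varepsilon]$. What we demand is that the deviation between the actual probability $\operatorname{Tr}[\Lambda_x \tau_\varepsilon]$ and the ideal probability $\operatorname{Tr}[\Lambda_x \tau]$ be no larger than the tolerance $\varepsilon$. Since this should be the case for any possible measurement outcome, what we demand mathematically is that
$$\sup_{0 \leq \Lambda \leq I} \left| \operatorname{Tr}[\Lambda \tau_\varepsilon] - \operatorname{Tr}[\Lambda \tau] \right| \leq \varepsilon.$$
It is well known that
$$\sup_{0 \leq \Lambda \leq I} \left| \operatorname{Tr}[\Lambda (\tau_\varepsilon - \tau)] \right| = \tfrac{1}{2} \left\| \tau_\varepsilon - \tau \right\|_1,$$
indicating that our notion of approximation is most naturally quantified by the normalized trace distance $\tfrac{1}{2} \| \tau_\varepsilon - \tau \|_1$. Thus, the mathematical formulation of the approximate box transformation problem is as follows:
$$\varepsilon((\rho, \sigma) \to (\tau, \omega)) := \inf_{\mathcal{N} \in \mathrm{CPTP}} \left\{ \varepsilon \in [0, 1] : \mathcal{N}(\rho) \approx_\varepsilon \tau,\ \mathcal{N}(\sigma) = \omega \right\}, \tag{13}$$
where the notation $\zeta \approx_\varepsilon \xi$ for states $\zeta$ and $\xi$ is a shorthand for $\tfrac{1}{2} \| \zeta - \xi \|_1 \leq \varepsilon$; i.e.,
$$\varepsilon((\rho, \sigma) \to (\tau, \omega)) = \inf_{\mathcal{N} \in \mathrm{CPTP}\,:\,\mathcal{N}(\sigma) = \omega} \tfrac{1}{2} \left\| \mathcal{N}(\rho) - \tau \right\|_1.$$
The fact that we allow for approximate conversion for the first state but not the second is related to the fact that the resource theory presented here is a resource theory of asymmetric distinguishability. In Appendix C, we show that (13) is equivalent to a semi-definite program (SDP), implying that it is efficiently computable with respect to the dimensions of the states involved. In the case that $\varepsilon((\rho, \sigma) \to (\tau, \omega)) = 0$, it is possible to perform the desired transformation $(\rho, \sigma) \to (\tau, \omega)$ exactly, reproducing the previous result from [GJB+18].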
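As a small worked instance of the operational criterion above, consider a qubit target state and a slightly depolarized output. The specific pair of states is chosen here purely for illustration: take $\tau = |0\rangle\langle 0|$ and $\tau_\varepsilon = (1-\varepsilon)|0\rangle\langle 0| + \varepsilon |1\rangle\langle 1|$.

```latex
\begin{align}
\tau_\varepsilon - \tau
  &= -\varepsilon\,|0\rangle\langle 0| + \varepsilon\,|1\rangle\langle 1|, \\
\tfrac{1}{2}\left\|\tau_\varepsilon - \tau\right\|_1
  &= \tfrac{1}{2}\left(\left|-\varepsilon\right| + \left|\varepsilon\right|\right)
   = \varepsilon, \\
\operatorname{Tr}\!\left[\Lambda(\tau_\varepsilon - \tau)\right]\Big|_{\Lambda = |1\rangle\langle 1|}
  &= \varepsilon .
\end{align}
```

The measurement operator $\Lambda = |1\rangle\langle 1|$ therefore attains the largest possible deviation in outcome probability, and the two states are exactly $\varepsilon$-close in normalized trace distance.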
We can also consider the asymptotic version of the box transformation problem, in which the box consists not just of a single copy of the states $\rho$ and $\sigma$ but many copies of them (i.e., the box $(\rho^{\otimes n}, \sigma^{\otimes n})$ instead of the original $(\rho, \sigma)$). By considering the asymptotic setting with approximation error, we can modify the original box transformation question as follows: what is the optimal rate $R$ at which the transformation
$$(\rho^{\otimes n}, \sigma^{\otimes n}) \to (\tau^{\otimes nR}, \omega^{\otimes nR})$$
is possible, for large $n$ and arbitrarily small approximation error? In this setting, the SDP characterization of $\varepsilon((\rho^{\otimes n}, \sigma^{\otimes n}) \to (\tau^{\otimes nR}, \omega^{\otimes nR}))$ is not particularly useful, because the size of the optimization problem grows exponentially with increasing $n$, and so we resort to other, information-theoretic methods to address it.

A. Bits of asymmetric distinguishability
One way of addressing the various formulations of the box transformation problem is to break the transformation down into two steps, in which we first distill a standard box and then dilute this standard box to the desired one. It turns out that the most natural way to do so is to consider the following basic unit of currency or fiducial box:
$$(|0\rangle\langle 0|, \pi), \tag{16}$$
where
$$\pi := \tfrac{1}{2}\left(|0\rangle\langle 0| + |1\rangle\langle 1|\right)$$
is the maximally mixed qubit state. We also refer to the object in (16) as "one bit of asymmetric distinguishability." As before, we should think of the box in (16) as being in correspondence with two different experiments. In the first experiment, the first state $\rho = |0\rangle\langle 0|$ ("null hypothesis") is prepared, and in the second experiment, the second state $\sigma = \pi$ ("alternative hypothesis") is prepared. A distinguisher presented with this box, and unaware of which experiment is being conducted, can try to determine which state $\rho$ or $\sigma$ has been prepared. Suppose that the distinguisher performs a measurement of the observable $\sigma_Z := |0\rangle\langle 0| - |1\rangle\langle 1|$ and assigns the outcome $+1$ to the decision "$\rho$ was prepared" and $-1$ to the decision "$\sigma$ was prepared." Then in the case that the state $\rho$ was prepared, he can determine this with zero chance of error; on the other hand, if the state $\sigma$ was prepared, then he can determine this with probability equal to 1/2. In other terms, with this strategy, he has zero chance of making a Type I error (misidentifying $\rho$) and he has a 50% chance of making a Type II error (misidentifying $\sigma$).
The above strategy of basing the decision rule on the outcome of a $\sigma_Z$ measurement is not the only strategy that the distinguisher can perform. By performing a quantum channel $\mathcal{N}$ that accepts a qubit as input and outputs another quantum system, the distinguisher can convert the box in (16) to the following box:
$$(\mathcal{N}(|0\rangle\langle 0|), \mathcal{N}(\pi)).$$
After doing so, the distinguisher can base his decision rule on the outcome of a general quantum measurement. However, if the goal is to have zero chance of making a Type I error, then it is intuitive and can be proven that no strategy can perform better than the $\sigma_Z$ measurement strategy given in the previous paragraph. Thus, arbitrary channels acting on the box in (16) do not increase distinguishability.
One bit of asymmetric distinguishability is not a particularly strong resource. Indeed, with only one bit of asymmetric distinguishability, there is still a large chance of making a Type II error. However, the following box, consisting of $m$ bits of asymmetric distinguishability, improves the situation:
$$(|0\rangle\langle 0|^{\otimes m}, \pi^{\otimes m}). \tag{19}$$
For such a box, there is a much smaller chance of making a Type II error. Indeed, by performing $m$ independent measurements of the observable $\sigma_Z$ on each qubit and assigning the outcome "$(+1, \ldots, +1)$" to the decision "$|0\rangle\langle 0|^{\otimes m}$ was prepared" and the outcome "not $(+1, \ldots, +1)$" to the decision "$\pi^{\otimes m}$ was prepared," the distinguisher still has zero chance of making a Type I error, but now has a one out of $2^m$ chance of making a Type II error. So with each extra bit of asymmetric distinguishability, the chance of making a Type II error decreases by a factor of two. This is the value of having more bits of asymmetric distinguishability.

Note that the following transformation is forbidden when $n > m$:
$$(|0\rangle\langle 0|^{\otimes m}, \pi^{\otimes m}) \to (|0\rangle\langle 0|^{\otimes n}, \pi^{\otimes n}).$$
That is, one cannot increase bits of distinguishability by the action of a quantum channel; i.e., there is no quantum channel $\mathcal{N}$ that performs the map $\mathcal{N}(|0\rangle\langle 0|^{\otimes m}) = |0\rangle\langle 0|^{\otimes n}$ and $\mathcal{N}(\pi^{\otimes m}) = \pi^{\otimes n}$ for $n > m$. Quantum channels have a linear action on their inputs, and this linearity forbids such transformations, as shown in Appendix D.
A major goal of any resource theory is to quantify the amount of resource. For the simple boxes presented above, any Rényi relative entropy suffices as a good quantifier of the number of bits of asymmetric distinguishability contained in them. Two prominent examples of measures were put forward roughly a decade ago as measures of distinguishability and studied therein as quantum information-theoretic quantities of interest [Dat09].
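They are known as the min- and max-relative entropies, defined respectively as follows for states $\rho$ and $\sigma$:
$$D_{\min}(\rho\|\sigma) := -\log_2 \operatorname{Tr}[\Pi_\rho \sigma], \tag{21}$$
$$D_{\max}(\rho\|\sigma) := \inf\left\{\lambda \geq 0 : \rho \leq 2^\lambda \sigma\right\}, \tag{22}$$
where $\Pi_\rho$ denotes the projection onto the support of $\rho$.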
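If $\rho$ is orthogonal to $\sigma$, then $D_{\min}(\rho\|\sigma) = \infty$, and if $\operatorname{supp}(\rho) \not\subseteq \operatorname{supp}(\sigma)$, then there is no finite $\lambda \geq 0$ such that $\rho \leq 2^\lambda \sigma$, implying that $D_{\max}(\rho\|\sigma) = \infty$. Evaluating these measures for the box given in (19), one finds that
$$D_{\min}(|0\rangle\langle 0|^{\otimes m} \| \pi^{\otimes m}) = m D_{\min}(|0\rangle\langle 0| \| \pi) = m, \tag{23}$$
$$D_{\max}(|0\rangle\langle 0|^{\otimes m} \| \pi^{\otimes m}) = m D_{\max}(|0\rangle\langle 0| \| \pi) = m,$$
consistent with the notion that the box in (19) contains $m$ bits of asymmetric distinguishability.

By performing the following quantum channel:
$$\omega \mapsto \operatorname{Tr}\!\left[|0\rangle\langle 0|^{\otimes m} \omega\right] |0\rangle\langle 0| + \operatorname{Tr}\!\left[\left(I^{\otimes m} - |0\rangle\langle 0|^{\otimes m}\right) \omega\right] |1\rangle\langle 1|,$$
one can convert the box in (19) to the following box:
$$(|0\rangle\langle 0|, \pi_{2^m}), \qquad \pi_{2^m} := 2^{-m} |0\rangle\langle 0| + (1 - 2^{-m}) |1\rangle\langle 1|. \tag{26}$$
Furthermore, by performing the quantum channel
$$\omega \mapsto \langle 0|\omega|0\rangle\, |0\rangle\langle 0|^{\otimes m} + \langle 1|\omega|1\rangle\, \frac{I^{\otimes m} - |0\rangle\langle 0|^{\otimes m}}{2^m - 1}, \tag{27}$$
one can convert the box in (26) back to the box in (19). For this reason, these boxes have an equivalent number of bits of asymmetric distinguishability, being equivalent by free operations. It also means that we can take the box in (26) to be the basic form of $m$ bits of asymmetric distinguishability. Once we have done that, it is then sensible to allow $m$ in (26) to be any non-negative real number. For this case, we still find that
$$D_{\min}(|0\rangle\langle 0| \| \pi_{2^m}) = D_{\max}(|0\rangle\langle 0| \| \pi_{2^m}) = m.$$
Going forward from here, we take the box in (26) to be the basic form of $m$ bits of asymmetric distinguishability, for $m$ any non-negative real number.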
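The evaluations above involve only diagonal states, so they can be checked with ordinary probability vectors. The sketch below is an illustration under that assumption: it handles commuting (diagonal) states only, and the helper names `d_min`, `d_max`, `fiducial_box`, and `tensor_power_box` are hypothetical, not notation from the paper.

```python
import math

def d_min(p, q):
    """Min-relative entropy -log2 Tr[Pi_p q] for diagonal states p, q."""
    overlap = sum(qx for px, qx in zip(p, q) if px > 0.0)
    return math.inf if overlap == 0.0 else -math.log2(overlap)

def d_max(p, q):
    """Max-relative entropy: least lambda with p <= 2**lambda * q entrywise."""
    if any(px > 0.0 and qx == 0.0 for px, qx in zip(p, q)):
        return math.inf           # support of p not contained in support of q
    return math.log2(max(px / qx for px, qx in zip(p, q) if px > 0.0))

def fiducial_box(m):
    """Diagonals of the box (|0><0|, pi_{2^m}) for any real m >= 0."""
    w = 2.0 ** (-m)
    return [1.0, 0.0], [w, 1.0 - w]

def tensor_power_box(m):
    """Diagonals of the box (|0><0|^{tensor m}, pi^{tensor m}), integer m >= 0."""
    dim = 2 ** m
    return [1.0] + [0.0] * (dim - 1), [1.0 / dim] * dim

for box in (tensor_power_box(3), fiducial_box(3), fiducial_box(2.5)):
    p, q = box
    print(d_min(p, q), d_max(p, q))
```

Both forms of the box report the same number of bits under either measure, and the two-outcome form extends without change to non-integer `m`.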

B. Exact distillation and dilution tasks
In any resource theory, the basic questions concern distillation and dilution tasks, and whether and in what senses the resource theory might be reversible [BDSW96, CG19]. In a distillation task, the goal is to process a general resource with free operations in order to distill as much of the basic resource as possible, while in the dilution task, the goal is to perform the opposite: process as little of the basic resource as possible, using free operations, in order to generate or dilute from it a more general resource. A prominent goal is to determine the ultimate rates at which these resource interconversions are possible and from there one can determine whether the resource theory is reversible.
In the resource theory of asymmetric distinguishability, the goal of exact distinguishability distillation is to process a general box $(\rho, \sigma)$ with an arbitrary quantum channel in order to distill as many bits of asymmetric distinguishability as possible. Mathematically, we can phrase this task as the following optimization problem:
$$D_d^0(\rho, \sigma) := \log_2 \sup_{\mathcal{P} \in \mathrm{CPTP}} \left\{ M : \mathcal{P}(\rho) = |0\rangle\langle 0|,\ \mathcal{P}(\sigma) = \pi_M \right\},$$
where the subscript $d$ in $D_d^0(\rho, \sigma)$ stands for distillable distinguishability, the superscript "0" indicates that we do not allow any error, CPTP denotes the set of CPTP maps (quantum channels), and
$$\pi_M := \tfrac{1}{M} |0\rangle\langle 0| + \left(1 - \tfrac{1}{M}\right) |1\rangle\langle 1|.$$
As we show in Appendix E 1, the following equality holds
$$D_d^0(\rho, \sigma) = D_{\min}(\rho\|\sigma), \tag{31}$$
where $D_{\min}(\rho\|\sigma)$ is the min-relative entropy [Dat09], as defined in (21). The equality in (31) thus assigns to $D_{\min}(\rho\|\sigma)$ a fundamental operational meaning as the exact distillable distinguishability in the resource theory of asymmetric distinguishability. A strongly related operational meaning for $D_{\min}(\rho\|\sigma)$ in quantum hypothesis testing was already given in [Dat09].
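If $\rho$ is orthogonal to $\sigma$, then the box $(\rho, \sigma)$ can be converted to the box $(|0\rangle\langle 0|, |1\rangle\langle 1|)$ by means of the quantum channel
$$\omega \mapsto \operatorname{Tr}[\Pi_\rho \omega]\, |0\rangle\langle 0| + \operatorname{Tr}[(I - \Pi_\rho) \omega]\, |1\rangle\langle 1|.$$
From the latter box, one can obtain as many bits of asymmetric distinguishability as desired. Indeed, by performing the channel
$$\omega \mapsto \langle 0|\omega|0\rangle\, |0\rangle\langle 0| + \langle 1|\omega|1\rangle\, \pi_{2^m}, \tag{33}$$
where $\pi_{2^m} := 2^{-m} |0\rangle\langle 0| + (1 - 2^{-m}) |1\rangle\langle 1|$, one can obtain $m$ bits of asymmetric distinguishability from the box $(|0\rangle\langle 0|, |1\rangle\langle 1|)$. Since this is possible for any $m \geq 0$, it follows that the box $(|0\rangle\langle 0|, |1\rangle\langle 1|)$ has an infinite number of bits of asymmetric distinguishability, consistent with the fact that $D_{\min}(\rho\|\sigma) = \infty$ when $\rho$ is orthogonal to $\sigma$.

The goal of exact distinguishability dilution is the opposite: process as few bits of asymmetric distinguishability as possible, using free operations, in order to generate the box $(\rho, \sigma)$. Mathematically, we can phrase this task as the following optimization problem:
$$D_c^0(\rho, \sigma) := \log_2 \inf_{\mathcal{P} \in \mathrm{CPTP}} \left\{ M : \mathcal{P}(|0\rangle\langle 0|) = \rho,\ \mathcal{P}(\pi_M) = \sigma \right\},$$
where the subscript $c$ in $D_c^0(\rho, \sigma)$ stands for distinguishability cost and the superscript "0" again indicates that we do not allow any error. As we show in Appendix E 2, the following equality holds
$$D_c^0(\rho, \sigma) = D_{\max}(\rho\|\sigma), \tag{35}$$
where $D_{\max}(\rho\|\sigma)$ is the max-relative entropy [Dat09], as defined in (22). The equality in (35) thus assigns to the max-relative entropy $D_{\max}(\rho\|\sigma)$ a fundamental operational meaning as the exact distinguishability cost of the box $(\rho, \sigma)$.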
If the support of $\rho$ is not contained in the support of $\sigma$, then there is no finite value of $M$ nor any quantum channel $\mathcal{P}$ that performs the transformations $\mathcal{P}(|0\rangle\langle 0|) = \rho$ and $\mathcal{P}(\pi_M) = \sigma$. However, in the limit $M \to \infty$, the box $(|0\rangle\langle 0|, \pi_M)$ becomes the box $(|0\rangle\langle 0|, |1\rangle\langle 1|)$, which is interpreted as containing an infinite number of bits of asymmetric distinguishability, as discussed after (33). In this case, we can pick the channel $\mathcal{P}$ as $\mathcal{P}(\omega) = \langle 0|\omega|0\rangle \rho + \langle 1|\omega|1\rangle \sigma$, and then the transformation $\mathcal{P}(|0\rangle\langle 0|) = \rho$ and $\mathcal{P}(|1\rangle\langle 1|) = \sigma$ is easily achieved. Thus, in this sense, if the support of $\rho$ is not contained in the support of $\sigma$, then the distinguishability cost $D_c^0(\rho, \sigma) = \infty$, consistent with the fact that $D_{\max}(\rho\|\sigma) = \infty$ in this case.
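For classical distributions, an exact dilution map with the smallest feasible $M$ can be written down explicitly. The sketch below is one standard measure-and-prepare construction, assumed here for illustration and restricted to probability vectors; the names `exact_dilution_map` and `apply_map` are hypothetical. Input 0 is sent to $p$, and input 1 is sent to $(Mq - p)/(M-1)$, which is a valid distribution exactly when $p \leq Mq$ entrywise.

```python
import math

def exact_dilution_map(p, q):
    """Return (M, row0, row1) describing a stochastic map {0,1} -> X.

    row0 is the output distribution on input 0 and row1 on input 1, chosen
    so that input 0 yields p and the input (1/M, 1 - 1/M) yields q, with
    M = max_x p(x)/q(x) = 2**D_max(p||q).
    """
    if any(px > 0.0 and qx == 0.0 for px, qx in zip(p, q)):
        raise ValueError("support of p not contained in support of q: infinite cost")
    M = max(px / qx for px, qx in zip(p, q) if px > 0.0)
    if M - 1.0 < 1e-12:
        # p equals q: zero bits are needed, and both inputs are sent to p
        return 1.0, list(p), list(p)
    # clamp tiny negative values caused by floating-point rounding
    row1 = [max(0.0, (M * qx - px) / (M - 1.0)) for px, qx in zip(p, q)]
    return M, list(p), row1

def apply_map(row0, row1, weight0):
    """Output distribution when the input bit is 0 with probability weight0."""
    return [weight0 * a + (1.0 - weight0) * b for a, b in zip(row0, row1)]

p = [0.5, 0.5]
q = [0.25, 0.75]
M, row0, row1 = exact_dilution_map(p, q)
print("bits consumed:", math.log2(M))
print("image of |0><0|:", apply_map(row0, row1, 1.0))
print("image of pi_M:  ", apply_map(row0, row1, 1.0 / M))
```

The number of bits consumed, `log2(M)`, coincides with the max-relative entropy of the pair, in line with (35).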
An important case to consider in any resource theory is the case of independent and identically distributed (i.i.d.) resources. For our case, this means that we should analyze the box $(\rho^{\otimes n}, \sigma^{\otimes n})$ for arbitrary $n \geq 1$. Due to the additivity of $D_{\min}(\rho\|\sigma)$ and $D_{\max}(\rho\|\sigma)$, it follows that
$$D_d^0(\rho^{\otimes n}, \sigma^{\otimes n}) = n D_{\min}(\rho\|\sigma), \qquad D_c^0(\rho^{\otimes n}, \sigma^{\otimes n}) = n D_{\max}(\rho\|\sigma),$$
so that the number of bits of asymmetric distinguishability distilled and required in each respective task scales precisely linearly with $n$.
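Due to the fact that we generally have $D_{\min}(\rho\|\sigma) \neq D_{\max}(\rho\|\sigma)$ for states $\rho$ and $\sigma$, it follows that the resource theory of asymmetric distinguishability is not reversible if we demand exact conversions from one box to another. In fact, the irreversibility in the exact case can be as extreme as desired. By picking $\rho = |0\rangle\langle 0|$ and $\sigma = |\psi\rangle\langle\psi|$, with $|\psi\rangle$ a unit vector that is close to but not equal to $|0\rangle$ (up to a global phase), we find that
$$D_{\min}(\rho\|\sigma) = -\log_2 |\langle 0|\psi\rangle|^2, \qquad D_{\max}(\rho\|\sigma) = \infty,$$
so that the exact distillable distinguishability can be arbitrarily close to zero while the exact distinguishability cost is always infinite in this case.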

C. Approximate distillation and dilution tasks
In realistic experimental scenarios, it is typically not possible to perform transformations exactly, thus motivating the need to consider approximate transformations and approximations of the ideal resources. For the resource theory of asymmetric distinguishability, we define an $\varepsilon$-approximate bit of asymmetric distinguishability as
$$(\tilde{0}_\varepsilon, \pi), \tag{38}$$
where $\varepsilon \in [0, 1]$ and
$$\tilde{0}_\varepsilon := (1 - \varepsilon) |0\rangle\langle 0| + \varepsilon |1\rangle\langle 1|,$$
so that $\tilde{0}_\varepsilon \approx_\varepsilon |0\rangle\langle 0|$. The motivation for this choice is operational as before (see the discussion before (13)). Also, since the maximally mixed state $\pi$ is diagonal in any basis, it suffices to consider (38) as the basic definition of an $\varepsilon$-approximate bit of asymmetric distinguishability, because one could simply perform the diagonalizing unitary for a general qubit state $\tau$ to bring a general box $(\tau, \pi)$ into the form of (38).
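Generalizing (26) and (38), the following box represents $m$ approximate bits of asymmetric distinguishability:
$$(\tilde{0}_\varepsilon, \pi_{2^m}).$$
If $m$ is an integer, then this box is equivalent by the transformation in (27) to the following one:
$$(\tilde{0}^m_\varepsilon, \pi^{\otimes m}),$$
where
$$\tilde{0}^m_\varepsilon := (1 - \varepsilon) |0\rangle\langle 0|^{\otimes m} + \varepsilon\, \frac{I^{\otimes m} - |0\rangle\langle 0|^{\otimes m}}{2^m - 1},$$
so that $\tilde{0}^m_\varepsilon \approx_\varepsilon |0\rangle\langle 0|^{\otimes m}$.

With such a notion in place, we can now generalize exact distillation of asymmetric distinguishability to its approximate version. The goal of $\varepsilon$-approximate distinguishability distillation is to distill as many $\varepsilon$-approximate bits of asymmetric distinguishability as possible from a given box $(\rho, \sigma)$. Mathematically, it corresponds to the following optimization for $\varepsilon \in [0, 1]$:
$$D_d^\varepsilon(\rho, \sigma) := \log_2 \sup_{\mathcal{P} \in \mathrm{CPTP}} \left\{ M : \mathcal{P}(\rho) \approx_\varepsilon |0\rangle\langle 0|,\ \mathcal{P}(\sigma) = \pi_M \right\}.$$
As we show in Appendix F 1, the following equality holds
$$D_d^\varepsilon(\rho, \sigma) = D^\varepsilon_{\min}(\rho\|\sigma), \tag{44}$$
where $D^\varepsilon_{\min}(\rho\|\sigma)$ is the smooth min-relative entropy [BD10, BD11, WR12], defined as
$$D^\varepsilon_{\min}(\rho\|\sigma) := -\log_2 \inf_{0 \leq \Lambda \leq I} \left\{ \operatorname{Tr}[\Lambda \sigma] : \operatorname{Tr}[\Lambda \rho] \geq 1 - \varepsilon \right\}. \tag{45}$$
Thus, the equality in (44) assigns to the smooth min-relative entropy an operational meaning as the $\varepsilon$-approximate distillable distinguishability of the box $(\rho, \sigma)$. This operational interpretation is directly linked to the role of $D^\varepsilon_{\min}(\rho\|\sigma)$ in quantum hypothesis testing [HP91, ON00, Hay03, Hay04, WR12, Hay17]. Note that $D^\varepsilon_{\min}(\rho\|\sigma)$ is also known as "hypothesis testing relative entropy" in the literature, which is terminology introduced in [WR12]. This quantity can be computed efficiently by means of a semi-definite program [DKF+12], the proof of which we recall in Appendix B.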
Note that by combining (31), (44), and the fact that lim ε→0 D ε d (ρ, σ) = D 0 d (ρ, σ), we conclude the following limit: We provide an alternative proof in Appendix A 3. We can also generalize the distinguishability dilution task to the approximate case. In this case, we define the ε-approximate distinguishability cost of the box (ρ, σ) to be the least number of ideal bits of asymmetric distinguishability that are needed to generate the box (ρ ε , σ), where ρ ε ≈ ε ρ. This notion of approximate distinguishability cost is fully operational and consistent with the more general problem in (13). The precise definition of the ε-approximate distinguishability cost of the box (ρ, σ) is as follows: As we show in Appendix F 2, the following equality holds where D ε max (ρ σ) is the smooth max-relative entropy [Dat09], defined as Thus, the equality in (48) assigns to the smooth max-relative entropy a fundamental operational meaning as the ε-approximate distinguishability cost of the box (ρ, σ). The smooth max-relative entropy can also be efficiently calculated by means of a semi-definite program, the proof of which we give in Appendix B. Note that by combining (35), (48), and the fact that lim ε→0 D ε c (ρ, σ) = D 0 c (ρ, σ), we conclude the following limit: We provide an alternative proof in Appendix A 3. An application of the operational approach to distinguishability taken here is the following bound relating D ε min and D ε max : where ε 1 , ε 2 ≥ 0, and ε 1 + ε 2 < 1. The bound in (51) is most closely related to the upper bound in [DMHB13, Theorem 11], but we employ a different notion of smoothing for the smooth max-relative entropy. It also generalizes the bound from [DKF + 12, Eq. (47)] (by appropriately working through the different conventions here and in [DKF + 12]) and is in the same spirit as [Tom12, Proposition 5.5] and [TH13, Eq. (22)]. The main idea for arriving at the bound in (51) follows from resource-theoretic reasoning. 
Any approximate distillation protocol performed on the box $(|0\rangle\langle 0|, \pi_M)$ that leads to the box $(\widetilde{0}_{\varepsilon}, \pi_K)$, for $\varepsilon \in [0, 1)$, is required to obey the bound
$$\log_2 K \leq D^{\varepsilon}_{\min}(|0\rangle\langle 0| \,\|\, \pi_M) = \log_2 M + \log_2\!\left(\frac{1}{1-\varepsilon}\right), \tag{52}$$
which follows as a consequence of the fundamental limitation in (44). One way to realize the transformation $(|0\rangle\langle 0|, \pi_M) \to (\widetilde{0}_{\varepsilon}, \pi_K)$ is to proceed in two steps: first perform an optimal dilution protocol $(|0\rangle\langle 0|, \pi_M) \to (\rho_{\varepsilon_2}, \sigma)$ such that $\log_2 M = D^{\varepsilon_2}_{\max}(\rho\|\sigma)$, and then perform an optimal distillation protocol $(\rho, \sigma) \to (\widetilde{0}_{\varepsilon_1}, \pi_K)$ such that $\log_2 K = D^{\varepsilon_1}_{\min}(\rho\|\sigma)$. By the triangle inequality, the error of the overall transformation is no larger than $\varepsilon_1 + \varepsilon_2$. Since the fundamental limitation in (52) applies to any protocol, the bound in (51) follows. We give a detailed proof in Appendix G.

D. Asymptotic distillable distinguishability and distinguishability cost
We can now reconsider the i.i.d. case of a box $(\rho^{\otimes n}, \sigma^{\otimes n})$ in the context of approximate distillation and dilution. Recall that the quantum relative entropy is defined as
$$D(\rho\|\sigma) := \mathrm{Tr}[\rho(\log_2 \rho - \log_2 \sigma)]$$
if $\mathrm{supp}(\rho) \subseteq \mathrm{supp}(\sigma)$ and $D(\rho\|\sigma) = \infty$ otherwise. By defining the asymptotic distillable distinguishability and asymptotic distinguishability cost of the box $(\rho, \sigma)$ as
$$D_d(\rho,\sigma) := \lim_{\varepsilon\to 0}\lim_{n\to\infty}\frac{1}{n} D^{\varepsilon}_d(\rho^{\otimes n},\sigma^{\otimes n}), \qquad D_c(\rho,\sigma) := \lim_{\varepsilon\to 0}\lim_{n\to\infty}\frac{1}{n} D^{\varepsilon}_c(\rho^{\otimes n},\sigma^{\otimes n}),$$
respectively, we conclude from the quantum Stein's lemma [HP91, ON00] and the asymptotic equipartition property for the smooth max-relative entropy [TCR09] that
$$D_d(\rho,\sigma) = D_c(\rho,\sigma) = D(\rho\|\sigma), \tag{56}$$
thus demonstrating the fundamental operational interpretation of the quantum relative entropy in the resource theory of asymmetric distinguishability. It is worthwhile to note that we can conclude the stronger statement
$$D^{\varepsilon}_d(\rho^{\otimes n},\sigma^{\otimes n}) = nD(\rho\|\sigma) + O(\sqrt{n}), \tag{57}$$
$$D^{\varepsilon}_c(\rho^{\otimes n},\sigma^{\otimes n}) = nD(\rho\|\sigma) + O(\sqrt{n}), \tag{58}$$
from [Tom12, TH13, Li14] (see Appendix H). Thus, the approximate distillable distinguishability and the approximate distinguishability cost agree in the leading-order term in the i.i.d. case and differ only in terms that are sublinear in $n$. As discussed in Appendix L, the second-order term in (57) can be identified exactly by appealing to [Li14, TH13]. The second-order term in (58) can also be identified by appealing to [TH13], but in this case the quantification of error in the resource theory of asymmetric distinguishability must be changed from normalized trace distance to infidelity.

As a consequence of the fundamental equality in (56), we conclude that the resource theory of asymmetric distinguishability is reversible in the asymptotic setting. That is, for large $n$, by starting with the box $(\rho^{\otimes n}, \sigma^{\otimes n})$ one can distill it approximately to $nD(\rho\|\sigma)$ bits of asymmetric distinguishability, and then one can dilute these $nD(\rho\|\sigma)$ bits of asymmetric distinguishability back to the box $(\rho^{\otimes n}, \sigma^{\otimes n})$ approximately.
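The first- and second-order behaviour of the distillable distinguishability can be illustrated numerically. The following is a minimal sketch, assuming commuting states (so that ρ and σ are represented by probability vectors) and assuming the second-order expansion of Appendix H with the O(log n) correction dropped. The function names are hypothetical and chosen only for this illustration.

```python
import math
from statistics import NormalDist

def relative_entropy(p, q):
    """D(p||q) in bits for probability vectors (commuting states)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf  # supp(p) is not contained in supp(q)
        total += pi * math.log2(pi / qi)
    return total

def relative_entropy_variance(p, q):
    """V(p||q): variance of the log-likelihood ratio under p (base 2)."""
    d = relative_entropy(p, q)
    return sum(pi * (math.log2(pi / qi) - d) ** 2
               for pi, qi in zip(p, q) if pi > 0.0)

def second_order_estimate(p, q, n, eps):
    """n*D + sqrt(n*V)*Phi^{-1}(eps); the O(log n) term is ignored (assumption)."""
    d = relative_entropy(p, q)
    v = relative_entropy_variance(p, q)
    return n * d + math.sqrt(n * v) * NormalDist().inv_cdf(eps)

p = [0.9, 0.1]   # plays the role of rho
q = [0.5, 0.5]   # plays the role of sigma
n = 1000
first_order = n * relative_entropy(p, q)
distill_estimate = second_order_estimate(p, q, n, eps=0.05)
print(first_order, distill_estimate)
```

For ε < 1/2 the correction term is negative, so the estimate lies below nD(ρ‖σ) by an amount of order √n, in line with the sublinear gap described above.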

E. Asymptotic box transformations
We can also solve the asymptotic box transformation problem stated around (15). Before doing so, let us formalize the problem. Let $n, m \in \mathbb{Z}^{+}$ and $\varepsilon \in [0, 1]$. An $(n, m, \varepsilon)$ box transformation protocol for the boxes $(\rho, \sigma)$ and $(\tau, \omega)$ consists of a channel $\mathcal{N}^{(n)}$ such that
$$\mathcal{N}^{(n)}(\rho^{\otimes n}) \approx_{\varepsilon} \tau^{\otimes m}, \qquad \mathcal{N}^{(n)}(\sigma^{\otimes n}) = \omega^{\otimes m}.$$
A rate $R$ is achievable if for all $\varepsilon \in (0, 1]$, $\delta > 0$, and sufficiently large $n$, there exists an $(n, n[R - \delta], \varepsilon)$ box transformation protocol. The optimal box transformation rate $R((\rho, \sigma) \to (\tau, \omega))$ is then equal to the supremum of all achievable rates. On the other hand, a rate $R$ is a strong converse rate if for all $\varepsilon \in [0, 1)$, $\delta > 0$, and sufficiently large $n$, there does not exist an $(n, n[R + \delta], \varepsilon)$ box transformation protocol. The strong converse box transformation rate $\widetilde{R}((\rho, \sigma) \to (\tau, \omega))$ is then equal to the infimum of all strong converse rates.
Note that the following inequality is a consequence of the definitions:
$$R((\rho, \sigma) \to (\tau, \omega)) \leq \widetilde{R}((\rho, \sigma) \to (\tau, \omega)).$$
The final result of our paper is the following fundamental equality for the resource theory of asymmetric distinguishability:
$$R((\rho, \sigma) \to (\tau, \omega)) = \widetilde{R}((\rho, \sigma) \to (\tau, \omega)) = \frac{D(\rho\|\sigma)}{D(\tau\|\omega)},$$
indicating that the quantum relative entropy plays a central role as the optimal conversion rate between boxes.
The proof of this result consists of two parts: achievability and optimality. For the achievability part, i.e., the bound
$$R((\rho, \sigma) \to (\tau, \omega)) \geq \frac{D(\rho\|\sigma)}{D(\tau\|\omega)},$$
we first distill bits of asymmetric distinguishability from $(\rho^{\otimes n}, \sigma^{\otimes n})$ at the rate $\approx D(\rho\|\sigma)$. After doing so, we then dilute these $\approx nD(\rho\|\sigma)$ bits of asymmetric distinguishability to the box $(\tau^{\otimes m}, \omega^{\otimes m})$, such that $m/n \approx D(\rho\|\sigma)/D(\tau\|\omega)$. For the optimality part, i.e., the strong converse bound
$$\widetilde{R}((\rho, \sigma) \to (\tau, \omega)) \leq \frac{D(\rho\|\sigma)}{D(\tau\|\omega)},$$
we suppose that there exists a sequence of $(n, m, \varepsilon)$ box transformation protocols and then employ a pseudo-continuity inequality for the sandwiched Rényi relative entropy (Lemma 1) and its data processing inequality. Alternatively, we can employ a pseudo-continuity inequality for the Petz-Rényi relative entropy (Lemma 3) and its data processing inequality. See Appendix J for details. We note here that the bounds in Propositions 1 and 2 are exponential strong converse bounds, demonstrating that the error in the transformation converges to one exponentially fast if the rate of conversion is strictly larger than $D(\rho\|\sigma)/D(\tau\|\omega)$.
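The conversion rate stated above is simple to evaluate once the two relative entropies are known. The sketch below is only an illustration under stated assumptions: the boxes are taken to be classical (pairs of probability vectors, i.e. commuting states), and the helper names `kl_bits` and `box_rate` are hypothetical.

```python
import math

def kl_bits(p, q):
    """Relative entropy D(p||q) in bits for probability vectors."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf  # support condition violated
            total += pi * math.log2(pi / qi)
    return total

def box_rate(source, target):
    """Asymptotic rate D(source)/D(target) for boxes given as (p, q) pairs."""
    d_source = kl_bits(*source)
    d_target = kl_bits(*target)
    if d_target == 0.0:
        return math.inf  # target box carries no distinguishability
    return d_source / d_target

source = ([0.9, 0.1], [0.5, 0.5])
target = ([1.0, 0.0], [0.5, 0.5])  # one exact bit of asymmetric distinguishability

forward = box_rate(source, target)
backward = box_rate(target, source)
n = 1000
m = math.floor(n * forward)  # approximate number of target copies from n source copies
print(forward, backward, m)
```

The product of the forward and backward rates equals one, which is the numerical face of asymptotic reversibility.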

IV. CONCLUSION
In this paper, we have developed the resource theory of asymmetric distinguishability. The main constituents consist of boxes as the objects of manipulation, all quantum channels as the free operations, and bits of asymmetric distinguishability as the fundamental currency of interconversion. The resource theory is reversible in the asymptotic case, and the quantum relative entropy emerges as the fundamental rate at which boxes can be converted. Our one-shot results can be compactly summarized as follows:
1. The min-relative entropy is equal to the exact one-shot distillable distinguishability.
2. The max-relative entropy is equal to the exact one-shot distinguishability cost.
3. The smooth min-relative entropy is equal to the approximate one-shot distillable distinguishability.
4. The smooth max-relative entropy is equal to the approximate one-shot distinguishability cost.
Thus, each of these one-shot entropies is a fundamentally operational quantity. Finally, the ratio of quantum relative entropies of two pairs of quantum states is equal to the optimal rate of asymptotic box transformations between them.
Going forward from here, there are many interesting directions to pursue. The resource theory of asymmetric distinguishability for quantum channels has recently been developed in [WW19]. The main constituents consist of a channel box (N , M), for quantum channels N and M, as the basic objects of manipulation, superchannels [CDP08] as the free operations, and bits of asymmetric distinguishability as the fundamental currency. Some basic results are that the one-shot distillable distinguishability of a channel box is equal to the smooth channel min-relative entropy [CMW16], and the one-shot distinguishability cost is equal to the smooth channel max-relative entropy [GFW+18, LW19]. The theory reduces to the theory for quantum states in the case that the channels are environment-seizable, as defined in [BHKW18].
It remains open to determine optimal error exponents and strong converse exponents for the distinguishability dilution task, as well as for the more general box transformation problem. These quantities have been established for distinguishability distillation (i.e., hypothesis testing) [Nag06, Hay07, ANSV08, HMO08, MO15], and so there is a strong possibility that these operational quantities could be determined for the dual task. Some of the bounds in Appendix K could be useful for this purpose. The same questions remain open for second-order asymptotics.
In Appendix L, we explore a variation of the resource theory of asymmetric distinguishability in which the infidelity is employed as a measure of approximation, rather than the normalized trace distance. There are similar interesting questions regarding this variation, in particular, whether error exponents and strong converse exponents for distinguishability dilution could be proven to be optimal.
One could also consider the case in which the boxes consist of not just two states but multiple states, connecting with the theory of quantum state discrimination [BC09, BK15a]. The boxes could even consist of a continuum of states or channels, connecting with quantum estimation theory [Hel69, Hel76] and the resource-theoretic approach put forward in [Mat05]. The boxes could also consist of a state and a set of states, with the set of free operations restricted, which allows for connecting with general resource theories [CG19, LBT19]. Extending this, the boxes could consist of a channel and a set of channels, with restricted free operations, allowing for a connection with general resource theories of quantum channels.

A particularly interesting direction would be to consider reversibility of the resource theory of asymmetric distinguishability beyond the first order and investigate resource resonance effects. For this direction, the recent results of [KH17, CTK18, KCT19, CTK19] are quite relevant. Related to this, one could investigate more fine-grained questions related to asymptotic reversibility along the lines of [KH13], where we expect similar findings to hold in the resource theory of asymmetric distinguishability.
Note: After completing our paper, we learned about the independent and related work of In the following appendices, we provide detailed proofs of all claims in the main text. As a resource, we have included derivations of some of the dual semi-definite programs listed below as an ancillary file available for download with the arXiv posting of this paper. We begin by providing some background facts in Appendix A, some of which can be found in [Wil17].
Appendix A: Background

Normalized trace distance
A quantum state is described mathematically by a positive semi-definite operator with trace equal to one. The normalized trace distance between two quantum states ρ and σ is given by 1 2 ρ − σ 1 , where the trace norm of an operator A is defined as The following variational characterization of the normalized trace distance is well known [Hel69]: endowing the normalized trace distance with its operational meaning as the largest probability difference that a single POVM element can assign to two quantum states. The right-hand side of (A1) is a semi-definite program as written, with the following dual: where the equality holds from strong duality.
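The variational characterization above can be checked numerically on small examples. The following is a minimal sketch using NumPy: it computes the normalized trace distance from the eigenvalues of ρ − σ and builds the projector onto the positive eigenspace of ρ − σ, which attains the maximum in the variational formula. The function names are hypothetical.

```python
import numpy as np

def normalized_trace_distance(rho, sigma):
    """(1/2)||rho - sigma||_1 via the eigenvalues of the Hermitian difference."""
    eigs = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * np.sum(np.abs(eigs))

def optimal_povm_element(rho, sigma):
    """Projector onto the positive eigenspace of rho - sigma."""
    vals, vecs = np.linalg.eigh(rho - sigma)
    pos = vecs[:, vals > 0]
    return pos @ pos.conj().T

rho = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
sigma = 0.5 * np.eye(2, dtype=complex)            # maximally mixed state

td = normalized_trace_distance(rho, sigma)
Lam = optimal_povm_element(rho, sigma)
gap = np.trace(Lam @ (rho - sigma)).real          # Tr[Lam (rho - sigma)]
print(td, gap)
```

Both printed numbers agree, illustrating that a single POVM element achieves the largest probability difference.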

Choi isomorphism
The Choi isomorphism is a standard way of characterizing quantum channels that is suitable for optimizing over them in semi-definite programs. For a quantum channel N A→B , its Choi operator is given by where Γ RA = |Γ Γ| RA and with {|i R } i and {|i A } i orthonormal bases. The Choi operator is positive semi-definite J N RB ≥ 0, corresponding to N A→B being completely positive, and satisfies Tr B [J N RB ] = I R , the latter corresponding to N A→B being trace preserving.
On the other hand, given an operator J M RB satisfying J M RB ≥ 0 and Tr B [J M RB ] = I R , one realizes via postselected teleportation [Ben05] the following quantum channel: where systems S, R, and A are isomorphic and the last line employs the facts that (M S ⊗ I R ) |Γ SR = (I S ⊗ T R (M R )) |Γ SR for T R the transpose map, defined as and Γ| SR (I S ⊗ X RB ) |Γ SR = Tr R [X RB ]. We often abbreviate the transpose map simply as Since the constraints J M RB ≥ 0 and Tr B [J M RB ] = I R are semi-definite, this is a useful way of incorporating optimizations over quantum channels into semi-definite programs.
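The two properties of the Choi operator, and the recovery of the channel from it via the transpose, can be verified numerically. The sketch below is an illustration only: it assumes the channel is supplied as a list of Kraus operators, uses the amplitude damping channel as an arbitrary example, and orders the tensor factors as reference system R first, output system B second. All function names are hypothetical.

```python
import numpy as np

def choi_operator(kraus_ops, d_in):
    """J = sum_{i,j} |i><j|_R (x) N(|i><j|), with R as the first tensor factor."""
    d_out = kraus_ops[0].shape[0]
    J = np.zeros((d_in * d_out, d_in * d_out), dtype=complex)
    for i in range(d_in):
        for j in range(d_in):
            E = np.zeros((d_in, d_in), dtype=complex)
            E[i, j] = 1.0
            out = sum(K @ E @ K.conj().T for K in kraus_ops)
            J += np.kron(E, out)
    return J

def partial_trace_B(J, d_in, d_out):
    """Trace out the output system B, leaving an operator on R."""
    return np.trace(J.reshape(d_in, d_out, d_in, d_out), axis1=1, axis2=3)

def apply_via_choi(J, X, d_in, d_out):
    """N(X) = Tr_R[(T(X) (x) I_B) J], with T the transpose map."""
    M = np.kron(X.T, np.eye(d_out)) @ J
    return np.trace(M.reshape(d_in, d_out, d_in, d_out), axis1=0, axis2=2)

g = 0.3  # damping parameter, chosen arbitrarily for the example
K0 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
K1 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)
kraus = [K0, K1]

J = choi_operator(kraus, 2)
rho = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)
direct = sum(K @ rho @ K.conj().T for K in kraus)
via_choi = apply_via_choi(J, rho, 2, 2)
print(np.linalg.eigvalsh(J).min(), partial_trace_B(J, 2, 2))
```

Positivity of J and the condition Tr_B[J] = I_R are exactly the two semi-definite constraints that enter optimizations over channels.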
The min-relative entropy obeys the data processing inequality for states ρ and σ and a quantum channel N : This inequality was proved in [Dat09] by utilizing its relation to the Petz-Rényi relative entropies. For an alternative proof, first note that the inequality in (A24) is equivalent to To see the latter, let U be an isometric extension of the channel N , so that Then we find that The first equality follows because U Π ρ U † = Π UρU † . The inequality follows because the support of U ρU † is contained in the support of The smooth min-relative entropy obeys the data processing inequality as well, in fact for any trace nonincreasing positive map N and for all ε ∈ (0, 1): The max-relative entropy also obeys the data processing inequality for an arbitrary positive map N : To see this, let λ be such that ρ ≤ 2 λ σ. Then from the fact that N is positive, it follows that N (ρ) ≤ 2 λ N (σ). It then follows that Since this is true for arbitrary λ satisfying ρ ≤ 2 λ σ, we conclude (A32). The smooth max-relative entropy obeys the data processing inequality for a positive, trace-preserving map N and for all ε ∈ (0, 1): To see this, let ρ be an arbitrary state such that Then from the data processing inequality for normalized trace distance under positive trace-preserving maps, it follows that So it follows that Since the inequality holds for an arbitrary state ρ satisfying (A36), we conclude (A35). Since all of the above quantities obey the data processing inequality for quantum channels, we conclude that they are invariant under the action of an isometric channel U(·) = U (·)U † : which follows because U is a channel and the channel in (8) perfectly reverses the action of U.
In the main text, we provided an operational proof of this limit. An alternative proof goes as follows. Consider that the following inequality holds for all ε ∈ (0, 1): because the measurement operator Π ρ (projection onto support of ρ) satisfies Tr[Π ρ ρ] ≥ 1 − ε for all ε ∈ (0, 1). So we conclude that Alternatively, suppose that Λ is a measurement operator satisfying Tr[Λρ] = 1 − ε (note that when optimizing D ε min , it suffices to optimize over measurement operators satisfying the constraint Tr[Λρ] ≥ 1 − ε with equality [KW17]). Then applying the data processing inequality for D α (ρ σ) under the measurement {Λ, I − Λ}, which holds for α ∈ (0, 1), we find that Since this bound holds for all measurement operators Λ satisfying Tr[Λρ] = 1−ε, we conclude the following bound for all α ∈ (0, 1): Now taking the limit of the right-hand side as ε → 0, we find that the following bound holds for all α ∈ (0, 1): Since the bound holds for all α ∈ (0, 1), we can take the limit on the left-hand side to arrive at (A50) Now putting together (A46) and (A50), we conclude (A44).
As stated in (50), the following limit holds In the main text, we provided an operational proof of this limit. An alternative proof goes as follows. Consider that the following bound holds for all ε ∈ (0, 1): which follows as a simple consequence of the fact that we can always set ρ = ρ. Then the following limit holds To see the other inequality, let ρ be a state satisfying 1 2 ρ − ρ 1 ≤ ε. Then this means that ρ − ρ ∞ ≤ 2ε. Consider that Since this bound holds for all ρ satisfying 1 2 ρ − ρ 1 ≤ ε, we conclude that (A55) Then taking the limit ε → 0, we find that Putting together (A53) and (A56), we conclude (A51).

Appendix B: SDPs for smooth min-and max-relative entropies
Here we show that the smooth min-and max-relative entropies are characterized by semi-definite programs. We also give the dual programs for convenience.
Consider that which is an SDP as written. The dual SDP is given by and is equal to D ε min (ρ σ) by strong duality. See [DKF + 12] in this context.
By employing the definition of the smooth max-relative entropy in (49) and the dual characterization of the normalized trace distance in (A2), we find that The dual SDP is given by and is equal to D ε max (ρ σ) by strong duality.

Appendix C: Approximate box transformation is an SDP
We prove that the approximate box transformation problem can be computed by a semi-definite program. First, recall that the problem is characterized by for states ρ, σ, τ , and ω. By employing the dual form of the trace distance from (A2), we find that The dual program is given by with the equality holding from strong duality.

Appendix D: Impossibility of distinguishability increasing transformations
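It is impossible for a quantum channel $\mathcal{N}$ to increase the distinguishability of a box $(\rho, \sigma)$. That is, it is impossible for the transformation $(\rho, \sigma) \xrightarrow{\mathcal{N}} (\mathcal{N}(\rho), \mathcal{N}(\sigma))$ to be such that the distinguishability of $(\mathcal{N}(\rho), \mathcal{N}(\sigma))$ is strictly larger than the distinguishability of $(\rho, \sigma)$. This follows as a direct consequence of the data processing inequality for quantum relative entropy [Lin75]:
$$D(\rho\|\sigma) \geq D(\mathcal{N}(\rho)\|\mathcal{N}(\sigma)), \tag{D1}$$
when using quantum relative entropy as a quantifier of distinguishability.
For the specific transformation in (20), we find that
$$D(|0\rangle\langle 0|^{\otimes m} \,\|\, \pi^{\otimes m}) = m, \qquad D(|0\rangle\langle 0|^{\otimes n} \,\|\, \pi^{\otimes n}) = n,$$
so that if the transformation in (20) existed, it would violate (D1), due to the assumption $n > m$.
The fact that the transformation in (20) does not exist can also be seen as a consequence of the linearity of quantum channels. Let us first suppose that the boxes $(|0\rangle\langle 0|^{\otimes m}, \pi^{\otimes m})$ and $(|0\rangle\langle 0|^{\otimes n}, \pi^{\otimes n})$ have been reversibly transformed to their standard form as
$$(|0\rangle\langle 0|, \pi_{2^m}), \qquad (|0\rangle\langle 0|, \pi_{2^n}),$$
respectively, where we recall that $\pi_{2^m} = 2^{-m} |0\rangle\langle 0| + (1 - 2^{-m}) |1\rangle\langle 1|$. Then the original question is equivalent to the question of whether there exists a channel $\mathcal{N}$ that takes the first box to the second for $n > m$. Such a channel would then perform the transformations
$$\mathcal{N}(|0\rangle\langle 0|) = |0\rangle\langle 0|, \qquad \mathcal{N}(\pi_{2^m}) = \pi_{2^n}.$$
By linearity of the channel, $\mathcal{N}(\pi_{2^m}) = 2^{-m}\mathcal{N}(|0\rangle\langle 0|) + (1 - 2^{-m})\mathcal{N}(|1\rangle\langle 1|)$, so we can conclude the action of the channel on the orthogonal state $|1\rangle\langle 1|$:
$$\mathcal{N}(|1\rangle\langle 1|) = \frac{2^{-n} - 2^{-m}}{1 - 2^{-m}}\, |0\rangle\langle 0| + \frac{1 - 2^{-n}}{1 - 2^{-m}}\, |1\rangle\langle 1|.$$
If $n > m$, then we have that $\frac{2^{-n} - 2^{-m}}{1 - 2^{-m}} < 0$, so that $\mathcal{N}(|1\rangle\langle 1|)$ is not a quantum state. Thus, there cannot exist a quantum channel performing the transformation in (20) whenever $n > m$.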
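The linearity argument can be made concrete with exact rational arithmetic. The sketch below, with a hypothetical helper name, solves for the two diagonal coefficients that linearity forces on the image of |1⟩⟨1| and shows that the first one is negative exactly when n > m. It assumes m ≥ 1 so that the denominator is nonzero.

```python
from fractions import Fraction

def image_of_one(m, n):
    """Coefficients (a, b) with N(|1><1|) = a|0><0| + b|1><1| forced by linearity,
    given N(|0><0|) = |0><0| and N(pi_{2^m}) = pi_{2^n}. Requires m >= 1."""
    pm = Fraction(1, 2 ** m)
    pn = Fraction(1, 2 ** n)
    a = (pn - pm) / (1 - pm)
    b = (1 - pn) / (1 - pm)
    return a, b

a_up, b_up = image_of_one(1, 2)      # attempt to increase distinguishability (n > m)
a_down, b_down = image_of_one(2, 1)  # decreasing distinguishability (n < m)
print(a_up, b_up)      # a negative weight appears, so the output is not a state
print(a_down, b_down)  # both weights are valid probabilities
```

In both cases the coefficients sum to one, so trace preservation is never the obstruction; positivity is.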
Appendix E: Entropic characterizations of exact distinguishability distillation and dilution

Exact distillable distinguishability
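We prove the equality in (31):
$$D^{0}_{d}(\rho, \sigma) = D_{\min}(\rho\|\sigma). \tag{E1}$$
Recall that
$$D^{0}_{d}(\rho, \sigma) := \log_2 \sup_{\mathcal{P} \in \mathrm{CPTP}} \{ M : \mathcal{P}(\rho) = |0\rangle\langle 0|,\ \mathcal{P}(\sigma) = \pi_M \}, \tag{E2}$$
and that $D_{\min}(\rho\|\sigma) = -\log_2 \mathrm{Tr}[\Pi_\rho \sigma]$, where $\Pi_\rho$ is the projection onto the support of $\rho$. First suppose that $\mathrm{Tr}[\Pi_\rho \sigma] \neq 0$. Consider that the measurement channel
$$\mathcal{M}(\omega) := \mathrm{Tr}[\Pi_\rho \omega]\, |0\rangle\langle 0| + \mathrm{Tr}[(I - \Pi_\rho) \omega]\, |1\rangle\langle 1|$$
satisfies $\mathcal{M}(\rho) = |0\rangle\langle 0|$ and $\mathcal{M}(\sigma) = \pi_M$ with $M = 1/\mathrm{Tr}[\Pi_\rho \sigma]$, so that
$$D^{0}_{d}(\rho, \sigma) \geq \log_2 M \tag{E7}$$
$$= -\log_2 \mathrm{Tr}[\Pi_\rho \sigma] \tag{E8}$$
$$= D_{\min}(\rho\|\sigma). \tag{E9}$$
Now let $\mathcal{P}$ be a particular quantum channel such that $\mathcal{P}(\rho) = |0\rangle\langle 0|$ and $\mathcal{P}(\sigma) = \pi_M$. Then by the data-processing inequality for $D_{\min}$ as recalled in (A24), we find that
$$D_{\min}(\rho\|\sigma) \geq D_{\min}(\mathcal{P}(\rho)\|\mathcal{P}(\sigma)) = D_{\min}(|0\rangle\langle 0| \,\|\, \pi_M) = \log_2 M.$$
Since the inequality $D_{\min}(\rho\|\sigma) \geq \log_2 M$ holds for all channels $\mathcal{P}$ satisfying the constraints in (E2), we conclude that
$$D_{\min}(\rho\|\sigma) \geq D^{0}_{d}(\rho, \sigma). \tag{E13}$$
Combining (E7)-(E9) and (E13), we conclude the equality in (31), i.e., $D_{\min}(\rho\|\sigma) = D^{0}_{d}(\rho, \sigma)$.
In the case that $\mathrm{Tr}[\Pi_\rho \sigma] = 0$, the measurement channel above is such that $\mathcal{M}(\rho) = |0\rangle\langle 0|$ and $\mathcal{M}(\sigma) = |1\rangle\langle 1|$. In this case, as stated in the main text, the interpretation is that the box $(\rho, \sigma)$ contains an infinite number of bits of asymmetric distinguishability, so that $D^{0}_{d}(\rho, \sigma) = \infty$. This is consistent with $D_{\min}(\rho\|\sigma) = \infty$ in this case.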

Exact distinguishability cost
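We now prove the equality in (35):
$$D^{0}_{c}(\rho, \sigma) = D_{\max}(\rho\|\sigma).$$
First recall that
$$D^{0}_{c}(\rho, \sigma) := \log_2 \inf_{\mathcal{P} \in \mathrm{CPTP}} \{ M : \mathcal{P}(|0\rangle\langle 0|) = \rho,\ \mathcal{P}(\pi_M) = \sigma \}. \tag{E15}$$
Let us first suppose that $\mathrm{supp}(\rho) \subseteq \mathrm{supp}(\sigma)$ and $D_{\max}(\rho\|\sigma) = 0$. By definition, this means that the condition $\rho \leq \sigma$ holds, which in turn implies that $\sigma - \rho \geq 0$. Given the characterization of the normalized trace distance in (A2), this means that we can set $Y = \sigma - \rho$. Since $\mathrm{Tr}[Y] = 0$, we conclude that $\frac{1}{2}\|\sigma - \rho\|_1 = 0$. Since $\|\cdot\|_1$ is a norm, this means that $\rho = \sigma$. So in this trivial case, we can take $\mathcal{P}$ in (E15) to be the replacer channel $\mathrm{Tr}[\cdot]\rho$, and we can achieve the dilution task with zero bits of asymmetric distinguishability. So then $D^{0}_{c}(\rho, \sigma) = 0$ if $\mathrm{supp}(\rho) \subseteq \mathrm{supp}(\sigma)$ and $D_{\max}(\rho\|\sigma) = 0$.
Now suppose that $\mathrm{supp}(\rho) \subseteq \mathrm{supp}(\sigma)$ and $D_{\max}(\rho\|\sigma) > 0$. Let $\lambda > 0$ be such that $2^{\lambda} \sigma \geq \rho$. This then means that $2^{\lambda} \sigma - \rho \geq 0$, so that $\omega := \frac{2^{\lambda} \sigma - \rho}{2^{\lambda} - 1}$ is a quantum state. Furthermore, we have that
$$\sigma = 2^{-\lambda} \rho + (1 - 2^{-\lambda})\, \omega.$$
Then by means of the following channel
$$\mathcal{P}(X) := \langle 0|X|0\rangle\, \rho + \langle 1|X|1\rangle\, \omega,$$
we have that
$$\mathcal{P}(|0\rangle\langle 0|) = \rho, \qquad \mathcal{P}(\pi_{2^{\lambda}}) = \sigma,$$
so that this protocol accomplishes the distinguishability dilution task. This means that $D^{0}_{c}(\rho, \sigma) \leq \lambda$. Now taking the infimum over all $\lambda$ satisfying $2^{\lambda} \sigma \geq \rho$, we conclude that
$$D^{0}_{c}(\rho, \sigma) \leq D_{\max}(\rho\|\sigma). \tag{E21}$$
Now consider an arbitrary channel $\mathcal{P}$ that accomplishes the transformation $(|0\rangle\langle 0|, \pi_M) \to (\rho, \sigma)$. By the data processing inequality for the max-relative entropy as recalled in (A32), we have that
$$\log_2 M = D_{\max}(|0\rangle\langle 0| \,\|\, \pi_M) \geq D_{\max}(\mathcal{P}(|0\rangle\langle 0|) \,\|\, \mathcal{P}(\pi_M)) = D_{\max}(\rho\|\sigma).$$
Taking an infimum over all such protocols, we conclude that
$$D^{0}_{c}(\rho, \sigma) \geq D_{\max}(\rho\|\sigma). \tag{E25}$$
Putting together (E21) and (E25), we conclude the equality in (35), i.e., $D^{0}_{c}(\rho, \sigma) = D_{\max}(\rho\|\sigma)$.
In the case that $\mathrm{supp}(\rho) \not\subseteq \mathrm{supp}(\sigma)$, we have that $\mathrm{Tr}[\Pi_\sigma \rho] < 1$ and by definition $D_{\max}(\rho\|\sigma) = \infty$. This is consistent with the fact that, in such a case, there is no finite $\lambda \geq 0$ such that $2^{\lambda} \sigma - \rho \geq 0$. For if there were, then we would have that
$$2^{\lambda} - 1 = \mathrm{Tr}[\{2^{\lambda} \sigma - \rho \geq 0\} (2^{\lambda} \sigma - \rho)] \geq \mathrm{Tr}[\Pi_\sigma (2^{\lambda} \sigma - \rho)] = 2^{\lambda} - \mathrm{Tr}[\Pi_\sigma \rho],$$
where the inequality follows from $\mathrm{Tr}[\{A \geq 0\} A] \geq \mathrm{Tr}[\Pi A]$ for any Hermitian operator $A$, projector $\Pi$, and $\{A \geq 0\}$ denoting the projection onto the positive eigenspace of $A$. The above implies that $\mathrm{Tr}[\Pi_\sigma \rho] \geq 1$, contradicting the fact that $\mathrm{Tr}[\Pi_\sigma \rho] < 1$ when $\mathrm{supp}(\rho) \not\subseteq \mathrm{supp}(\sigma)$.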
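As explained in the main text, when $\mathrm{supp}(\rho) \not\subseteq \mathrm{supp}(\sigma)$, there is no finite value of $M$ nor any quantum channel $\mathcal{P}$ such that $\mathcal{P}(|0\rangle\langle 0|) = \rho$ and $\mathcal{P}(\pi_M) = \sigma$. If there were, then by the general fact that, for a quantum channel $\mathcal{N}$ and states $\tau$ and $\omega$, $\mathrm{supp}(\mathcal{N}(\tau)) \subseteq \mathrm{supp}(\mathcal{N}(\omega))$ if $\mathrm{supp}(\tau) \subseteq \mathrm{supp}(\omega)$ [Ren05, Appendix B] and the fact that $\mathrm{supp}(|0\rangle\langle 0|) \subseteq \mathrm{supp}(\pi_M)$ for all $M < \infty$, the existence of such a channel $\mathcal{P}$ would contradict the assumption that $\mathrm{supp}(\rho) \not\subseteq \mathrm{supp}(\sigma)$. The interpretation then is as stated in the main text: that $D^{0}_{c}(\rho, \sigma) = \infty$ when $\mathrm{supp}(\rho) \not\subseteq \mathrm{supp}(\sigma)$, which is consistent with the fact that $D_{\max}(\rho\|\sigma) = \infty$ in such a case.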
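The two exact one-shot quantities of this appendix, together with the dilution construction, can be checked numerically. The following is a minimal sketch with NumPy under two assumptions: σ is full rank (so the support condition holds trivially), and the min-relative entropy is taken as −log₂ Tr[Π_ρ σ]. The function names are hypothetical, and the example states are arbitrary.

```python
import numpy as np

def support_projector(rho, tol=1e-12):
    """Projector onto the support of a positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(rho)
    v = vecs[:, vals > tol]
    return v @ v.conj().T

def d_min(rho, sigma):
    """Min-relative entropy -log2 Tr[Pi_rho sigma], in bits."""
    overlap = np.trace(support_projector(rho) @ sigma).real
    return np.inf if overlap <= 0 else -np.log2(overlap)

def d_max(rho, sigma):
    """Max-relative entropy: log2 of the smallest t with rho <= t * sigma.
    Assumes sigma is full rank."""
    vals, vecs = np.linalg.eigh(sigma)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.conj().T
    return np.log2(np.linalg.eigvalsh(inv_sqrt @ rho @ inv_sqrt).max())

plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(plus, plus)        # pure state |+><+|
sigma = np.diag([0.25, 0.75])

lam = d_max(rho, sigma)
t = 2.0 ** lam
omega = (t * sigma - rho) / (t - 1.0)               # state used by the dilution channel
rebuilt_sigma = rho / t + (1.0 - 1.0 / t) * omega   # image of pi_{2^lam} under the channel
print(d_min(rho, sigma), lam)
```

Here the distillable distinguishability (1 bit) is strictly smaller than the cost (log₂(8/3) bits), which shows that exact one-shot interconversion is not reversible in general.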
Appendix F: Entropic characterizations of approximate distinguishability distillation and dilution
Appendix H: Asymptotic distillable distinguishability and distinguishability cost
As a direct consequence of (44) and results from [TH13, Li14], the following expansion holds for sufficiently large $n$:
$$D^{\varepsilon}_{d}(\rho^{\otimes n}, \sigma^{\otimes n}) = nD(\rho\|\sigma) + \sqrt{nV(\rho\|\sigma)}\, \Phi^{-1}(\varepsilon) + O(\log n), \tag{H1}$$
where $D(\rho\|\sigma)$ is the quantum relative entropy. The relative entropy variance $V(\rho\|\sigma)$ [TH13, Li14] is defined as
$$V(\rho\|\sigma) := \mathrm{Tr}[\rho (\log_2 \rho - \log_2 \sigma - D(\rho\|\sigma))^2]$$
if $\mathrm{supp}(\rho) \subseteq \mathrm{supp}(\sigma)$ and is otherwise undefined. Furthermore, $\Phi^{-1}(\varepsilon)$ is the inverse of the cumulative normal distribution function, defined as
$$\Phi^{-1}(\varepsilon) := \sup \{ a \in \mathbb{R} \mid \Phi(a) \leq \varepsilon \},$$
where
$$\Phi(a) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} dx\, \exp(-x^2/2).$$
Based on the inequality in (51), we have that Then by picking $\delta = 1/\sqrt{n}$, and applying (48), (44), (H1), and the fact that $\Phi^{-1}(1 - \varepsilon) = -\Phi^{-1}(\varepsilon)$, we find that By following the proof of [TH13, Eq. (21)], but instead using the normalized trace distance as the metric for smooth max-relative entropy, we find that where $\varepsilon \in (0, 1)$ and $|\mathrm{spec}(\sigma)|$ is equal to the number of distinct eigenvalues of $\sigma$. We give a detailed proof of (H6) in Appendix I. By the operational interpretations of $D^{\varepsilon}_{\max}$ and $D^{1-\varepsilon^2}_{\min}$, the inequality in (H6) can equivalently be written as Now accounting for the fact that $\log_2 |\mathrm{spec}(\sigma^{\otimes n})| = O(\log n)$ and applying (H1), we conclude that Thus, we have that

Appendix I: Bound relating smooth max-and min-relative entropies
Here we prove the following bound: where $|\mathrm{spec}(\sigma)|$ is equal to the number of distinct eigenvalues of $\sigma$.
The proof follows the proof of [TH13, Eq. (21)] closely, but instead using the normalized trace distance as the metric for smooth max-relative entropy and accounting for a minor typo present in the proof of [TH13,Eq. (21)].
Let the eigendecomposition of σ be σ = x λ σ x Π σ x , where Π σ x is the projection onto the eigenspace of σ with eigenvalue λ σ x . Let E σ (·) = x Π σ x (·)Π σ x denote the pinching quantum channel. In what follows, we make use of the pinching inequality [Hay02]: ρ ≤ |spec(σ)| E σ (ρ). (I2) Let µ be the largest value such that Tr[QE σ (ρ)] = 1 − ε 2 , where Q = {E σ (ρ) ≤ 2 µ σ}. Due to the fact that Q commutes with σ, we have that E σ (Q) = Q, which implies that Then we set for which we have that by applying [Wil17, Lemma 9.4.1]. This in turn implies that via the inequality 1 , so that ρ is a candidate for the optimization involved in D ε max (ρ σ). Now consider that So it follows that Now consider that Tr[(I − Q) ρ] = ε 2 and I − Q = {E σ (ρ) > 2 µ σ}, for which we have that implying that Taking a negative logarithm, this gives Since Tr[(I − Q) ρ] = ε 2 , this means that I − Q is a candidate for Λ in the definition of smooth min-relative entropy, from which we conclude that where the latter inequality follows from the data processing inequality in (A31). Putting together (I14) and (I21), we arrive at (I1).
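The pinching channel and the pinching inequality (I2) used in this proof can be illustrated numerically. The sketch below is an assumption-laden illustration: eigenvalues of σ are grouped after rounding to a fixed number of decimals (a numerical stand-in for "distinct eigenvalues"), and the function name `pinch` is hypothetical.

```python
import numpy as np

def pinch(rho, sigma, decimals=10):
    """Pinching of rho with respect to the spectral projections of sigma.
    Returns the pinched operator and the number of distinct eigenvalues of sigma."""
    vals, vecs = np.linalg.eigh(sigma)
    rounded = np.round(vals, decimals)
    distinct = np.unique(rounded)
    out = np.zeros_like(rho, dtype=complex)
    for lam in distinct:
        v = vecs[:, rounded == lam]      # orthonormal basis of one eigenspace
        P = v @ v.conj().T               # spectral projection of sigma
        out += P @ rho @ P
    return out, len(distinct)

sigma = np.diag([0.5, 0.25, 0.25]).astype(complex)   # two distinct eigenvalues
psi = np.ones(3) / np.sqrt(3)
rho = np.outer(psi, psi).astype(complex)             # pure state, does not commute with sigma

pinched, k = pinch(rho, sigma)
slack = np.linalg.eigvalsh(k * pinched - rho).min()  # >= 0 iff rho <= |spec(sigma)| E(rho)
commutator = pinched @ sigma - sigma @ pinched
print(k, slack)
```

In this example the smallest eigenvalue of |spec(σ)| E_σ(ρ) − ρ is zero up to rounding, so the pinching inequality is tight here.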

Appendix J: Asymptotic box transformations
We now provide a proof of Eq. (62), i.e., (J1) so that the quantum relative entropy gives the optimal conversion rate for boxes. We prove this result in two steps, called the direct part and strong converse part.

Achievability: Direct part
We begin with the direct part. The goal is to show that for all ε ∈ (0, 1], δ > 0, and sufficiently large n, there exists an (n, n[R − δ], ε) box transformation protocol with R = D(ρ σ) D(τ ω) . The approach we take here is related to an approach from [KW19].
Consider that we can perform the transformation (ρ ⊗n , σ ⊗n ) → ( 0 ε1 , π M ) such that Then applying the following inequality from [AMV12, Proposition 3.2] (see also [QWW18, Proposition 3]) we find that Set α ∈ (0, 1) such that which is possible due to (A12) and (A14), and for this choice of α, take n large enough so that Then we have that Also, consider that we can perform the transformation (|0 0|, π) → ( τ ⊗m , ω ⊗m ) (with error ε 2 ), for fixed M , by taking m as large as possible so that the following inequality still holds If it is not possible to find an m to saturate the inequality, then one can find states τ ′ and ω ′ with just enough distinguishability such that while having a negligible impact on the final parameters of the protocol. The resulting protocol then produces the states ≈ ε τ ⊗m ⊗ τ ′ and ω ⊗m ⊗ ω ′ , and the final step is to perform a partial trace over the extra ancilla system. By applying the following inequality from Proposition 6 D ε2 max (ρ σ) ≤ D β (ρ σ) + log 2 (1/ 1 − ε 2 2 ) + 1 β − 1 log 2 (1/ε 2 2 ), (J11) proved in Appendix K, we find that Now set β > 1 such that which is possible due to (A17) and (A21), and for this choice of β, take n sufficiently large so that (Note that we require n large enough so that both (J7) and (J14) hold.) Then we have that Putting together (J8) and (J15), we find that Now dividing both sides by nD(τ ω), we find that The rate of this scheme is equal to m/n. The error of the protocol is no larger then ε 1 + ε 2 = ε, following from an application of the triangle inequality. Thus, we have shown that for all ε ∈ (0, 1), δ > 0, there exists an (n, n [R − δ] , ε) box transformation protocol with R = D(ρ σ) D(τ ω) , concluding the proof of the achievability part.

Strong converse via sandwiched Rényi relative entropy
Before proving the strong converse, we establish the following lemma as a generalization of [LWD16, Proposition 2.8]. In fact, the proof of the following lemma is contained in the proof of [LWD16, Proposition 2.8]. The following lemma serves as a pseudo-continuity inequality for the sandwiched Rényi relative entropies.
We now give a proof for the strong converse statement in (62). Our proof is related to the approach from [KW19]. Fix ε ∈ [0, 1) and δ > 0. We need to show that there is an n large enough such that there does not exist an (n, n[R + δ], ε) box transformation protocol, with R set as follows: From Proposition 1, the following bound holds for an arbitrary (n, m, ε) protocol, α ∈ (1/2, 1), and β ≡ β(α) := α/ (2α − 1): Set δ 2 such that 0 < δ 2 < δD(τ ω). Then set δ 1 > 0 such that the following equation is satisfied i.e., Set α ∈ (1/2, 1) such that which is possible due to (A12), (A14), (A17), (A21), and the fact that β = α/ (2α − 1), and for this choice of α, pick n large enough so that For these choices, we then have that and we also have that Putting these inequalities together, we find that Thus, the rate of the protocol m n is strictly less than D(ρ σ) D(τ ω) + δ, so that an (n, n[R + δ], ε) box transformation protocol cannot exist for the choice of n taken nor any n larger than that (for the latter statement, note that (J45) still holds for larger n).

Strong converse via Petz-Rényi relative entropy
We now discuss an alternative proof of the strong converse by going through the Petz-Rényi relative entropy. We begin with a pseudo-continuity inequality for the Petz-Rényi relative entropy. The proof of Lemma 3 below follows the spirit of the proof of [LWD16, Proposition 2.8], but this time some steps are different.
We note here that one could arrive at the strong converse statement by going through steps similar to those in (J41)-(J48), but using Proposition 2 instead.
We now give some upper bounds on the smooth maxrelative entropy in terms of the quantum relative entropy and the sandwiched Rényi relative entropy. The method for doing so follows the proof approach of [JN12, Theorem 1] very closely. The upper bound in Proposition 5 is very similar to [JN12, Theorem 1], but it is expressed in terms of quantum relative entropy rather than observational divergence.
Pick λ = 1 ε 2 D(ρ σ) + 1 2 ln 2 ρ − σ 1 , and we conclude from the above that and in turn that [FvdG98] We also have that Now let Λ be an arbitrary operator satisfying Λ ≥ 0 and Tr[Λσ] ≤ 1, and let Π be the projection defined in (K23) for this choice of Λ. Then we find that This concludes the proof.
The proof of the following proposition follows the same proof approach of [JN12, Theorem 1] (as recalled above), but instead employs the sandwiched Rényi relative entropy and its data processing inequality. The following proposition was also reported recently in [ABJT19]: Proposition 6 Given states ρ and σ, the following bound holds for all α > 1 and ε ∈ (0, 1): Proof. The first steps are exactly the same as (K17)-(K25). Now consider from the data processing inequality under the channel Now picking we conclude that The rest of the proof then proceeds as in (K34)-(K48), and we find that D ε max (ρ σ) ≤ λ + log 2 1/ 1 − ε 2 .