Efficient Randomness Certification by Quantum Probability Estimation.

For practical applications of quantum randomness generation, it is important to certify and further produce a fixed block of fresh random bits with as few trials as possible. Consequently, protocols with high finite-data efficiency are preferred. To yield such protocols with respect to quantum side information, we develop quantum probability estimation. Our approach is applicable to device-independent as well as device-dependent scenarios, and it generalizes techniques from previous works [Miller and Shi, SIAM Journal on Computing 46, 1304 (2017); Arnon-Friedman et al., Nature Communications 9, 459 (2018)]. Quantum probability estimation can adapt to changing experimental conditions, allows stopping the experiment as soon as the prespecified randomness goal is achieved, and can tolerate imperfect knowledge of the input distribution. Moreover, the randomness rate achieved at constant error is asymptotically optimal. For the device-independent scenario, our approach certifies the amount of randomness available in experimental results without first searching for relations between randomness and violations of fixed Bell inequalities. We implement quantum probability estimation for device-independent randomness generation in the CHSH Bell-test configuration, and we show significant improvements in finite-data efficiency, particularly at small Bell violations which are typical in current photonic loophole-free Bell tests.


I. INTRODUCTION
Randomness is important for many applications including Monte-Carlo simulations, statistical sampling, randomized algorithms, and cryptography [1]. A fundamental feature of randomness is unpredictability, which is also exhibited by quantum measurement outcomes.
Quantum mechanics thus provides natural strategies for generating randomness. For example, a uniformly random bit can be generated by measuring a two-level quantum system in an equal superposition of its two levels. In this scheme, however, to guarantee the performance one needs to trust the inner working of quantum devices. It is desirable if the generated randomness can be certified solely by statistical tests of the inputs and outputs of quantum devices. A loophole-free Bell test provides such a strategy, as first proposed in 2006 by Colbeck in his PhD thesis [2]. This strategy for certified randomness generation without trust in quantum devices is known as device-independent randomness generation (DIRG).
Many DIRG protocols [3][4][5][6][7][8][9][10][11][12][13][14][15] have been developed in the past nine years. They are different in the following aspects: the specific requirements on quantum devices, the Bell-test configuration applied, the security level achieved, and the asymptotic randomness rate and finite-data efficiency exhibited. Also, in the past four years loophole-free Bell tests have been realized [16][17][18][19][20], enabling experimental demonstrations of DIRG [21][22][23]. However, due to the lack of finite-data efficiency, even the most advanced DIRG protocol with respect to quantum side information [13] requires a very large number of trials with current loophole-free Bell tests. With a state-of-the-art photonic loophole-free Bell test [23] the DIRG protocol in Ref. [13] requires at least 9.4 × 10 9 trials, corresponding to 13 hours of experiment time (see Fig. 3 of Ref. [23]), before certifying any randomness with error bounded by 10 −5 , where, informally, the error can be thought of as the probability that the protocol output does not satisfy the certified claim. For practical applications of randomness such as randomness beacons that provide trusted public randomness [24], it is important to improve finite-data efficiency, as these applications often require short blocks of fresh random bits with minimum delay or latency.
Excellent finite-data efficiency for DIRG with respect to classical side information has been recently achieved [21,22,25,26]. Particularly, the method developed by us in Refs. [21,22] reduces the number of trials required for generating 1024 device-independent random bits with error 10 −12 [22] by one order of magnitude as compared with the previously most advanced method (for addressing classical side information) in Refs. [3,6]. To improve the finite-data efficiency further, we developed probability estimation [25,26] for certifying randomness against classical side information. Different from previous works [3,6,21,22], probability estimation provides an estimator, constructed as a function of experimental results, to directly lower-bound the amount of certifiable device-independent randomness without relying on a hypothesis test of local realism. The natural question then is whether we can upgrade probability estimation for certifying randomness against quantum side information.
In this work we develop quantum probability estimation, which enables a full security analysis of DIRG with respect to quantum side information, and most importantly, yields protocols with unsurpassed finite-data efficiency. As the quantum smooth conditional minentropy quantifies the number of near-uniform random bits certifiable in the presence of quantum side information [27], the goal of quantum probability estimation is to obtain a lower bound on the quantum smooth conditional min-entropy. For this, we first construct an estimator to directly lower-bound the sandwiched Rényi entropy (see Thm. 2). Such an estimator is the main reason for achieving unsurpassed finite-data efficiency. Then we lowerbound the quantum smooth conditional min-entropy by the sandwiched Rényi entropy via Prop. 6.5, Pg. 99 of Ref. [28] (see Thm. 3). We also give a sound protocol to realize end-toend randomness generation.
Besides unsurpassed finite-data efficiency, quantum probability estimation has many other advantages over previous works [5, 8-10, 13-15, 29], which include adaptability to changing experimental conditions, flexibility of stopping the experiment early as soon as the prespecified randomness goal is achieved, and tolerance of imperfections in the input distribution. Like entropy accumulation developed in Refs. [13][14][15]29], quantum probability estimation can achieve asymptotically optimal randomness rates and is broadly applicable to both device-independent and device-dependent randomness generation.
The conceptual difference between our work and previous works [5, 8-10, 13-15, 29] for addressing quantum side information in the device-independent scenario lies in that previous works require quantifying randomness as a function of violations of fixed Bell inequalities before performing security analysis with finite data. Although Bell violations and deviceindependent randomness are related, they are inequivalent quantities: a stronger violation of a fixed Bell inequality does not necessarily certify a larger amount of device-independent randomness [30]. Therefore, previous works usually cannot yield protocols with optimal randomness rates or finite-data efficiency. With respect to proof techniques, the main difference between our work and previous works [8,9,[13][14][15]29], which also benefit from the recent studies of sandwiched Rényi entropies, is that quantum probability estimation provides a tighter lower bound on the sandwiched Rényi entropy (see Thm. 2), whereas previous works establish lower bounds on the sandwiched Rényi entropy via uncertainty principles for quantum measurements (as in Refs. [8,9]) or via the conditional von Neumann entropy (as in Refs. [13][14][15]29]). These differences provide an informal explanation of the improvements achieved by quantum probability estimation as compared with previous works, see Fig. 2 for a comparison.
The paper is structured as follows. In Sect. II, we introduce the notation used in this work. In Sect. III we motivate and introduce quantum probability estimation. To implement our method, we need to construct quantum estimation factors (QEFs). In Sect. IV, we show that QEFs can certify quantum smooth conditional min-entropies. Then we present an end-to-end protocol for randomness generation and prove its soundness in Sect. V. Details on the implementation of our method are provided in Sects. VI and VII. Specifically, in Sect. VI we explain the construction of the model that describes all possible states shared between the quantum side information and the classical results after the experiment. In Sect. VII, we discuss several properties of QEFs and provide details on the construction of QEFs. In Sect. VIII we show how the general method of quantum probability estimation is instantiated in the experimentally relevant CHSH Bell-test configuration [31] for DIRG, and then we demonstrate significant improvements on finite-data efficiency, corresponding to significant reductions in latency compared with previous works. Finally we conclude the paper in Sect. IX.

II. NOTATION
We denote classical (random) variables by upper-case letters in regular math font (such as U, V, W), and denote finite sequences of classical variables by upper-case letters in upright bold font (such as C, Z). As is conventional, the values of classical variables are denoted by the corresponding lower-case letters. We use juxtaposition to denote concatenation of variables or their values. For example, we write the concatenation of U and V as UV. The value space of a classical variable such as U is denoted by Rng(U). The cardinality of the value space of U is |Rng(U)|. All classical variables considered in this work are assumed to have finite value spaces.
We identify classical systems with classical variables. We denote and label quantum systems with upper-case letters in sans serif font (such as D, E). Throughout this work, E plays a distinguished role as the system carrying the quantum side information. We denote the identity operator on a classical or quantum system by 1. For a quantum system E, we denote its Hilbert space as ℋ(E). Quantum states, which are positive semidefinite (Hermitian) operators, are denoted by lower-case Greek letters (such as ρ, σ, τ). Both normalized and un-normalized quantum states are considered in this work. Let S(E) be the set of un-normalized states, S 1 (E) = ρ E ∈ S(E): tr(ρ E ) = 1 be the set of normalized states, and S ≤ 1 (E) = ρ E ∈ S(E): tr(ρ E ) ≤ 1 be the set of sub-normalized states of E, where "tr" denotes the trace.
In this work, we study the joint states of a classical variable U and a quantum system E. Such states are called classical-quantum states and have the following form where ρ E (u) ∈ S(E) is the state of E given U = u. We denote the set of classical-quantum states of the above form by S(UE). The set of normalized states and the set of subnormalized states of UE are denoted by S 1 (UE) and S ≤ 1 (UE), respectively. If there are multiple classical variables U, V, W, … involved in an experiment, we denote the classicalquantum states and the corresponding sets of states in similar ways. For example, ρ UVE is the joint state of the classical variables U, V and the quantum system E, and S(UV E) is the corresponding set of classical-quantum states.
For a classical-quantum state such as ρ UE , when the quantum system E is one-dimensional the state ρ UE (up to a normalization constant) specifies a classical probability distribution of the variable U. We also use lower-case (but different) Greek letters (such as μ, ν) to denote classical probability distributions. The probability of an event Φ according to a probability distribution μ is denoted by ℙ μ (Φ). The expectation of a classical variable U according to μ is denoted by E μ (U).
For convenience, our notional conventions, as well as important parameters and variables used in this work, are summarized in Appendix A.

III. QUANTUM ESTIMATION FACTORS
Consider an experiment with inputs Z and outputs C. Both the inputs and outputs are classical variables. The inputs normally consist of the random choices made for measurement settings. The outputs consist of the corresponding measurement outcomes. The inputs and outputs are determined in a sequence of n time-ordered trials, where the i'th trial has input Z i and output C i , so Z = (Z i ) i = 1 n and C = (C i ) i = 1 n , see Fig. 1. We refer to the trial inputs and outputs collectively as the trial results. In addition, we refer to the results from the trials preceding the upcoming one as the past. The past can also include initial conditions and any additional information that may have been obtained, which are usually implicit when referring to or conditioning on the past. The external quantum system carrying the quantum side information is E, whose initial state before the experiment may be correlated with the quantum system D of the devices used. After the experiment, the quantum system D is traced out, and only the classical inputs Z and outputs C of the devices are considered. The joint state of the classical systems C, Z, and the quantum system E is a classical-quantum state where ρ E (cz) is the sub-normalized state of E given results cz. The trace tr (ρ E (cz)) is the probability of observing the results cz after the experiment. In general, we consider the set of all possible classical-quantum states that can occur after the experiment. We refer to this set as the model C for the experiment (see Sect. VI for details). Our goal is to estimate (or strictly speaking, bound) the quantum smooth conditional min-entropy of C given Z and the side information in E, without knowing which particular normalized state ρ CZE in the model describes the experiment.
In previous works [25,26], we considered the case that the quantum system E is onedimensional. In this case, ρ E (cz) specifies the probability of observing cz given the side information, which we write as μ E (cz), and so the side information is classical. The model C becomes the set of probability distributions μ E of CZ that capture verified, physical Equivalently, for each μ E ∈ C, the probability that C and Z take values c and z for which (ϵF(C = c, Z = z)) −1/β ≤ μ E (C = c|Z = z) is at most ϵ. This defines (ϵF(CZ)) −1/β as a level-ϵ probability estimator. Probability estimates provide lower bounds on the smooth minentropy of C conditional on ZE, which quantifies the amount of randomness certifiable in the presence of classical side information, as established in Refs. [25,26]. Throughout this work, we refer to a bound on a state-dependent quantity obtained from a statistic, either a lower or an upper bound depending on the context, as an estimate of the quantity interested.
In this work, we study the general case where the dimension of the quantum system E is finite but can be arbitrarily large. In the presence of quantum side information, instead of estimating conditional probabilities, we estimate sandwiched Rényi powers defined as follows. Fix α > 1 and β = α − 1. Let ℋ be a finite-dimensional Hilbert space, and 0 ≤ ρ ≪ σ ∈ ℋ, where we write ρ ≪ σ if the support 1 of ρ is a subspace of the support of σ.
The sandwiched Rényi power of order α of ρ conditional on σ is defined as and the normalized sandwiched Rényi power of order α is defined as We remark that the Rényi power ℛ α (ρ|σ) is defined to be 0 if ρ = 0 and σ = 0, and that the normalized Rényi power ℛ α (ρ|σ) is defined to be 1 if ρ = 0. The definitions ensure that ℛ α (ρ|σ) and ℛ α (ρ|σ) are always non-negative. The quantity −log 2 (ℛ α (ρ|σ))/β as defined in Refs. [34,35] is called the sandwiched Rényi entropy of order α of ρ conditional on σ. One motivation for estimating sandwiched Rényi powers is that the amount of randomness certifiable in the presence of quantum side information as quantified by the quantum smooth conditional min-entropy is bounded from below by a sandwiched Rényi entropy, see Prop. 6.5, Pg. 99 of Ref. [28]. Throughout this work, we assume that α > 1 and so β = α − 1 as introduced above is positive.
Given a state ρ CZE ∈ S 1 (CZE), define ρ E (z) = c ρ E (cz) and Note that ρ ZE ∈ S 1 (ZE) and ρ ZE = tr C (ρ CZE ), where tr C is the partial trace over system C. As discussed above, when the quantum system E is one-dimensional, ρ E (cz) = μ E (cz) and so powers provide lower bounds on the amount of certifiable randomness. These motivate us to define quantum probability estimation, with ℛ α (ρ E (cz)|ρ E (z)) taking the role that μ E (c|z) β takes in classical probability estimation. With this in mind, we introduce quantum estimation factors (QEFs) as the quantum generalization of PEFs. A QEF with power β > 0 is a function F : Rng(CZ) ℝ ≥ 0 such that for all normalized states ρ CZE in the model C, F(CZ) satisfies the QEF inequality cz tr(ρ E (cz))F (cz)ℛ α (ρ E (cz)|ρ E (z)) = cz F (cz)ℛ α (ρ E (cz)|ρ E (z)) ≤ 1 .
The concept of a QEF generalizes techniques for certifying randomness against quantum side information used in previous works [9,13]. In particular, the role of QEFs is similar to the role of the weighting terms in the weighted (1 + ϵ)-randomness function of Eq. (6.4) in Ref. [9]. QEFs also play a role similar to that of the quantum systems D i D i in Eq. (16) of Ref. [13]. The existence of QEFs is suggested by the existence and construction of PEFs shown in our previous works [25,26]. Methods for constructing non-trivial and useful QEFs will be provided in Sect. VII. In this and the next two sections we will present results for randomness certification using QEFs.
A QEF with power β can be interpreted as an estimator of a normalized sandwiched Rényi power of order α = 1 + β. We formalize this interpretation as follows: Theorem 1. Let F(CZ) be a QEF with power β for the model C. For an arbitrary normalized state ρ CZE in C, where μ(CZ) = tr E (ρ CZE ).
According to the theorem, for each normalized state p CZE in the model C, the probability that C and Z take values c and z for which 1/(ϵF (cz)) ≤ ℛ α (ρ E (cz)|ρ E (z)) is at most ϵ. This defines 1/(ϵF(CZ)) as a level-ϵ estimator of the normalized sandwiched Rényi power ℛ α (ρ E (CZ)|ρ E (Z)), which is the QEF analogue of the statement below Eq. (3) for PEFs.
Proof. According to the QEF inequality at the normalized state ρ CZE and in view of the fact that μ(cz) = tr(ρ E (cz)), Since F (CZ)ℛ α (ρ E (CZ)|ρ E (Z)) ≥ 0 and by Markov's inequality, The theorem follows by rearranging the inequality defining the event in the probability on the left-hand side. ◻ In view of our previous result [25,26] that level-ϵ estimators of the conditional probability μ E (C|Z) provide lower bounds on the smooth conditional min-entropy in the presence of classical side information and the observation that the normalized sandwiched Rényi power ℛ α (ρ E (CZ)|ρ E (Z)) reduces to μ E (C|Z) β when the quantum system E is one-dimensional, Thm.
1 suggests that lower bounds on the smooth conditional min-entropy in the presence of quantum side information can be obtained with QEFs. To obtain such lower bounds, we take advantage of the relation between sandwiched Rényi powers and quantum smooth conditional min-entropies established by Prop. 6.5, Pg. 99 of Ref. [28].
Proof. The normalized classical-quantum state conditional on Φ′ can be explicitly written as Direct calculation establishes the following equality Then, the bound in the theorem statement follows immediately from the QEF inequality (7) and the non-negativity of both sandwiched Rényi powers and QEFs. Specifically, it suffices to rewrite the QEF inequality and drop irrelevant terms: Using Eq. (10), the claimed inequality is obtained by multiplying both sides of the above inequality by q β /κ α . ◻ We remark that for experimentally relevant models a QEF with power β is a PEF with the same power, as the model for an experiment in the presence of quantum side information is a superset of the model for the experiment in the presence of classical side information. For Bell-test configurations, considering that a PEF is a test factor for the hypothesis test of local realism (see the last paragraph of the main text in Ref. [36]), so is a QEF. Therefore, if a finite sequence of trial results CZ is explainable by local realism and F(CZ) is a QEF with power β for the experiment, according to Ref. [37] the event Φ in the statement of Thm. 2 would happen with probability at most q β (which is parallel to the statement for a PEF in the last paragraph of the main text in Ref. [36]).

IV. QUANTUM SMOOTH CONDITIONAL MIN-ENTROPY
The amount of randomness that is available in the presence of quantum side information is characterized by the quantum conditional min-entropy [27]. Below we first specialize the definitions of relevant quantities in Refs. [27,38] to the family of classical-quantum states treated in this work. Then we show that QEFs can certify the presence of randomness in C conditional on ZE.
We consider the classical-quantum state ρ CZE which may be sub-normalized. The state ρ CZE has maximum probability (abbreviated as max-prob below) p of C given ZE if there exists a normalized state σ ZE ∈ S 1 (ZE) such that ρ E (cz) ≤ pσ E (z) for all cz. The exact max-prob of C given ZE at ρ CZE is The quantity H ∞ (C|ZE) ρ = −log 2 (P max (C|ZE) ρ ) is called the quantum conditional minentropy of C given ZE at ρ CZE . When writing a state such as ρ CZE in a subscript we omit the underlying systems of the state if there is no ambiguity. It is conventional to focus on additive entropy quantities. However, since PEF-and QEF-based estimates are naturally related to probabilities, we find it convenient to focus on multiplicative, probability-related quantities instead.
The quantum conditional min-entropy provides a lower bound on the number of nearuniform random bits that can be extracted from C conditional on ZE but this bound is unnecessarily conservative [27]. A better bound may be provided by the quantum smooth conditional min-entropy. Assume that ρ CZE is a normalized classical-quantum state. Then ρ CZE has ϵ-smooth max-prob p of C given ZE if there exists a sub-normalized state ρ CZE ′ which has exact max-prob P max (C|ZE) ρ′ ≤ p and is within purified distance ϵ from ρ CZE .
Here, the purified distance [38] between the normalized state ρ CZE and the sub-normalized state ρ CZE ′ is defined as where for a matrix M, its modulus |M| is given by |M| = M † M. The exact ϵ-smooth maxprob of C given ZE at ρ CZE is P max ϵ (C|ZE) ρ = inf ρ CZE ′ P max (C|ZE) ρ′ : ρ CZE ′ ∈ S ≤ 1 (CZE), PD(ρ CZE , ρ CZE ′ ) ≤ ϵ . (13) The quantity H ∞ ϵ (C|ZE) ρ = − log 2 (P max ϵ (C|ZE) ρ ) is called the quantum ϵ-smooth conditional min-entropy of C given ZE at ρ CZE . The above definitions are monotonic in the smoothness parameter ϵ. For example, if P max ϵ (C|ZE) ρ ≤ p and ϵ′ > ϵ, then P max The second main result of this work is that QEFs yield lower bounds on quantum smooth conditional min-entropies, which is formalized as follows: The event Φ′ can be interpreted as the event that the experiment succeeds, and κ is the probability of success according to the classical probability distribution of relevant events induced by the state ρ CZE .

V. QUANTUM RANDOMNESS GENERATION
The last theorem indicates that QEFs can certify the presence of randomness with respect to quantum side information. With a quantum-proof extractor, we can design an end-to-end randomness-generation protocol to extract near-uniform random bits. Our goal is to make this protocol sound, meaning that the protocol has guaranteed performance no matter how low the success probability is. In this section, we first discuss the extractor used, as it determines the choices of various parameters in the protocol. Then we formalize the definition of soundness. Finally, we present our randomness-generation protocol and prove its soundness.

A. Quantum-proof strong extractors
The input, output and seed to an extractor are denoted by C, R and S. Define n i = log 2 (| Rng(C)|) , k o = log 2 (|Rng(R)|) and k s = log 2 (|Rng(S)|). When C, R and S are bit strings, n i , k o and k s are their respective lengths. The seed S has a uniform probability distribution and is independent of all other classical variables and quantum systems.
Consider a function Ext : Rng(C) × Rng(S) → Rng(R). The function Ext is called a quantum-proof strong extractor with parameters (n i , k s , k o , k i , ϵ x ) if for every normalized classical-quantum state ρ CE = c |c〉〈c| ⊗ ρ E (c) with H ∞ (C|E) ρ ≥ k i , the joint state ρ RSE of the extractor output R = Ext(C, S), the seed S and the quantum system E satisfies where τ RS is a fully mixed and normalized state of dimension 2 k s +k o and ρ E is the marginal state of E according to ρ CE . Inequality (17) asserts that the joint state of R and S is nearly uniform (a property of strong extractors) and almost independent of the quantum side information in E (a property of quantum-proof extractors). A specific strong extractor is Trevisan's extractor [39], which is proven to be quantum-proof in Ref. [40].
To ensure the inequality in Eq. (17) hold, the parameters (n i , k s , k o , k i , ϵ x ) need to satisfy a set of constraints, called extractor constraints. The extractor constraints depend on the specific quantum-proof strong extractor to be used, but these constraints always include that 1 ≤ k i ≤ n i , k s ≥ 0, k o ≤ k i , and 0 < ϵ x ≤ 1.

B. Protocol soundness
A generic randomness-generation protocol G produces three outputs: R G -a bit string of length k o consisting of fresh random bits generated by the protocol, S G -a bit string of length k s consisting of the random seed S required for running the protocol, and P G -a flag whose value 0 or 1 indicates failure or success, respectively. The outputs R G , S G and P G are determined not only by the inputs Z and outputs C of the devices, but also by the specific quantum-proof strong extractor Ext used and its seed S.
Given a normalized state ρ CZE in the model C, we denote the normalized classical-quantum state of the classical variables R G , S G , P G , Z and the quantum system E after running the protocol with the extractor Ext on ρ CZE by ρ R G S G P G ZE . The normalized state conditional on success P G = 1 is denoted by ρ R G S G ZE | (P G = 1) . A randomness-generation protocol G is ϵsound at the normalized state ρ CZE if there exists a normalized state σ ZE ∈ S 1 (ZE) such that where σ R G S G is a fully mixed and normalized state of dimension 2 k o +k s and ℙ ρ (P G = 1) is the probability of success according to the classical probability distribution of relevant events induced by the state ρ R G S G P G ZE .
The protocol G is ϵ-sound for a model C if it is ϵ-sound at all normalized states in the model. Our goal is to obtain an ϵ-sound protocol for the model of the randomnessgeneration experiment. We emphasize that the soundness error e absorbs the contribution of the success probability and so unlike the certification of quantum smooth conditional minentropy as done in Thm. 3, a soundness statement does not require a presumed bound κ′ on the success probability.
Our definitions of quantum-proof strong extractors and protocol soundness differ from others such as those in Refs. [41,42] by requiring small purified distance instead of small trace distance, where the trace distance between two normalized states ρ and σ is defined as where |ρ − σ| is the modulus of the matrix (ρ − σ). With this change we can take advantage of the extendibility of the purified distance to previously traced-out quantum systems in order to analyze the composability of protocols involving the same devices, see Appendix B for a detailed discussion. We also note that as the purified distance is an upper bound of the trace distance (see Prop. 3.3, Pg. 50 of Ref. [38]), our definitions imply the definitions in Refs. [41,42].
We remark that a protocol G is called κ-complete for a model C if there exist a normalized state ρ CZE ′ in the model and the corresponding state ρ R G S G P G ZE ′ after running the protocol according to which the success probability satisfies ℙ ρ′ (P G = 1) ≥ κ. Completeness is an important factor to consider when designing an experiment, while soundness guarantees the performance of the protocol regardless of the actual implementation of the experiment.

C. QEF-based randomness-generation protocol
The end-to-end randomness-generation protocol is displayed in Protocol 1, where the notation 0 ⌢k denotes a string of k consecutive zeros. We emphasize that as specified in Protocol 1, the parameters k o , k s , k i , ϵ, ϵ x , n and β are determined before running the experiment. In this work, the QEF F(CZ) with power β for a sequence of trials is constructed by multiplying the QEFs F i (C i Z i ) with the same power β for each individual trial i, i = 1, 2, …, n (see Sect. VII B). In this case, the trial-wise QEF F i (C i Z i ) needs only to be fixed before the i'th trial rather than before the start of the experiment. We assume that the set X defined in Protocol 1 is non-empty. This assumption needs to be checked before invoking the protocol, and the input parameters can be adjusted to ensure that the assumption holds.
The soundness of Protocol 1 can be proven by composing Thm. 3 with the quantum-proof strong extractor used.
Theorem 4. Protocol 1 is an ϵ-sound randomness-generation protocol for the model C.
Proof. Let ρ CZE be an arbitrary normalized classical-quantum state in the model C from which CZ is instantiated to cz when running the protocol. Write the event Φ = cz: F (cz) ≥ f min = 1/(p β (ϵ ℎ 2 /2)) such that when cz ∈ Φ, the protocol succeeds, that is, P G = 1. Let κ = cz ∈ Φ Tr(ρ E (cz)) = ℙ ρ (P G = 1) be the probability of success according to the classical probability distribution of relevant events induced by the state ρ CZE .

Input :
• k o -the number of fresh random bits to be generated.

Given :
• An experiment with inputs Z and outputs C of length n, as specified in the first paragraph of Sect. III.
• A QEF F(CZ) with power β for the model C of the experiment (see the paragraph including Eq. (7)).
• A quantum-proof strong extractor Ext (see Sect. V A).

Promise:
• The set X defined by X = {(k s , k i , ϵ x ) : (n i , k s , k o , k i , ϵ x < ϵ) satisfies the extractor constraints for Ext (see Sect. V A), where n i = log 2 (|Rng(C)|)} is non-empty.
Output : R G , S G , P G as specified in the first paragraph of Sect. V B.
Choose (k s , k i , ϵ x ) ∈ X; Get an instance s of the uniformly random seed S of length k s ; If β > 1, then set p = 2 −k i ϵ (β−1)/β , otherwise set p = 2 −k i ; Run the experiment and get an instance cz of CZ;

end
We first consider the case κ ∈ [ϵ, 1]. We apply Thm. 3 with the following substitutions: With these substitutions, the event Φ′ in the statement of Thm. 3 becomes the same as the event Φ defined above, and so the parameter κ in the statement of Thm. 3 becomes the same as the parameter κ introduced above. According to Eq. (16), we have The protocol's choice of p depends on β. Specifically, if β ≤ 1, then pκ (1−β)/β ≤ p = 2 −k i , and if β > 1, considering that κ ∈ [ϵ, 1] we have pκ (1−β)/β ≤ pϵ (1−β)/β = 2 −k i . Hence, we always have pκ (1−β)/β ≤ 2 −k i and so from Eq. (20) we get The extractor constraints ensure that k i ≤ n i = log 2 (|Rng(C)|), so 2 −k i |Rng(C)| ≥ 1. According to Lem. 7 in Appendix C, there exists a normalized state ρ CZE ′ such that and that is, H ∞ (C|ZE) ρ′ ≥ k i . Because the parameters (n i , k s , k o , k i , ϵ x ) satisfy the extractor constraints, we can apply the quantum-proof strong extractor Ext with the state ρ CZE ′ and get where τ RGSG is a fully mixed and normalized state of dimension 2 k o +k s .
Since the purified distance satisfies the data-processing inequality (Thm. 3.4, Pg. 51 of Ref. [38]), from Eq. (22) we get The triangle inequality for the purified distance (Prop. 3.2, Pg. 50 of Ref. [38]) together with Eqs. (24) and (25) yield We multiply both sides by κ for For the case κ < ϵ, since the purified distance cannot be larger than one, Therefore, the condition for ϵ-soundness is satisfied for the full range of values of κ at the state ρ CZE . Because ρ CZE is an arbitrary normalized state in the model C, Protocol 1 is ϵsound for the model C. ◻

VI. CONSTRUCTION OF MODELS
In order to perform quantum probability estimation, we first need to specify the model for the experiment. The model is the set of all possible classical-quantum states that can occur at the end of the experiment.
The model for the whole experiment is normally constructed by combining models for the individual trials, where the model for a trial specifies the set of all possible states describing the joint state of the classical results at the trial and the quantum system E. The model for a trial can depend on the past and is usually specified by known constraints on the trial. In the case of Bell tests, these constraints include the familiar non-signaling conditions [32,33] and the requirement that the trial results can be achieved with measurements on separate quantum systems according to the configuration.
We refer to the operation of combining trial models as model chaining. When chaining trial models, a Markov-chain condition on the inputs Z and outputs C is required in order to certify randomness in C conditional on both Z and E (see Sect. VI B for details). This is similar to the Markov-chain condition required by entropy accumulation [13,29]. Since quantum probability estimation can be applied to trials satisfying the Markov-chain condition, it does not require the trials to be independent and identically distributed (i.i.d.).
In this section, we give a formal definition of a model and explain the details behind the model construction. For this, we consider an experiment performed with quantum devices for either device-dependent or device-independent randomness generation, as described in Fig. 1. Before the experiment, the initial state of the quantum devices may be correlated with an external quantum system E. After the experiment, we obtain the inputs Z and outputs C.
At each trial of the experiment, we allow arbitrary one-way communication from the system E to the devices. For example, E can initialize the state of the quantum devices via a oneway communication channel. We also allow the possibility that the device initialization at a trial by E depends on the past inputs preceding the trial. This implies that the random inputs Z can come from public-randomness sources, as first pointed out in Ref. [6]. However, at any stage of the experiment the information of the outputs C cannot be leaked to E.

A. General models
A model is generally denoted by C. To specify the classical variable or variables that a model depends on, we use C(U), C(UV ), etc. The default quantum system underlying a model is E. A model C(U) for UE is defined as a subset of classical-quantum states ρ UE in S(UE) closed under multiplication by non-negative real numbers. The set of normalized classical-quantum states in C(U) is denoted by N(C(U)) = ρ UE : ρ UE ∈ C(U) and ρ UE ∈ S 1 (UE) . In a similar way, we can define a model that depends on several classical variables, such as C(UV ) for UVE.
A model becomes classical when the quantum system E is one-dimensional or traced out. In this case, the model specifies the set of un-normalized probability distributions of the underlying classical variable or variables. A classical model for U is denoted by C cl (U).
Model chaining is formally specified as follows. Suppose that C(U) is a model for UE and for each u, C u (V ) is a u-dependent model for VE. We write C U (V ) for the family of udependent models consisting of all C u (V ). The result of chaining C(U) and C U (V ) is the and for all u, ρ uV E ∈ C u (V ) , (26) where ρ uVE is the u-dependent classical-quantum state for VE determined from The chaining operation can be applied inductively to construct a model for the whole experiment from models for the individual trials. Let us consider an experiment with a finite sequence of classical variables U = (U i ) i = 1 n , where U i is the classical variable associated with the i'th trial of the experiment. (In a randomness-generation experiment, U i = C i Z i .) Let i − 1 and u < i = (u j ) j = 1 i − 1 be the sequences of classical variables and their values before the i'th trial. The sequences U ≤i and u ≤i are defined similarly. By convention, U ≤0 and u ≤0 are empty sequences. We construct the model C(U) by chaining past-conditional models C U < i (U i ). The trial models C U < i (U i ) can be constructed by considering all allowed measurements for the device configuration in an experiment, as described in the next paragraph. We call such trial models induced models.
Consider a generic trial and the associated classical variable U, where for generic trials we omit the trial index and make the dependence on the past results implicitly by dropping the subscript. Let D be the quantum system of the devices and ρ DE ∈ S(DE) be the joint state of the quantum systems D and E before the trial. The state ρ DE can be arbitrary as the system E is inaccessible from the experiment and also has the freedom to prepare the state ρ DE . Moreover, the state ρ DE can be un-normalized, as it is prepared probabilistically conditional The specific family P D (U) of POVMs may depend on the past results; however, each POVM P D (U) in P D (U) should be consistent with the behavior of the quantum devices at the trial.
For example, in Bell-test configurations the system D can be decomposed into several subsystems associated with each local party. Therefore, the POVM P D (U) should have a tensor-product structure over these subsystems. When U contains inputs with known probability distributions, the POVM is additionally constrained, see Eqs. (34), (35) for an example. In partially device-dependent applications, one may also trust the form of the specific measurements or the dimensions of the subsystems.
Finally we consider a special kind of map of quantum states and the corresponding closure property of models. We call a map ℰ on ℋ(E) a pure completely positive map (pCP map) if it is of the form ℰ(ρ) = MρM † for some linear operator M on ℋ(E). A pCP map ℰ transforms a state ρ UE ∈ S(UE) according to The model C(U) is pCP-closed if for each pCP map ℰ and each state ρ UE in C(U) the resulting state ℰ(ρ UE ) is still in C(U). The pCP-closure property is satisfied by the induced model ℳ(P D (U); E). To prove this pCP-closure property, it suffices to observe that by definitions, pCP maps on ℋ(E) preserve S(E), and POVMs of D with outcome U commute with pCP maps on ℋ(E). The pCP-closure property is useful for constructing QEFs, see Sect. VII B for details.

B. Models for input-conditional randomness generation
For randomness generation, the sequence of classical variables U in an experiment usually consists of both the inputs Z and outputs C. In order to certify randomness in the outputs C conditional on the inputs Z as well as the quantum side information in E, we need to restrict the chained models such that at each trial i, information about the past outputs C <i cannot be leaked through the input Z i . For this we require that the input Z i is independent of the past outputs C <i given E and the past inputs Z <i . Because E is quantum, this is formulated by means of a short quantum Markov chain [43].
Specifically, models for input-conditional randomness generation can be constructed by inductively applying the chaining operation defined as follows. Let C(C < i Z < i ) be a model is a short quantum Markov chain over Z <i E (see Appendix D for the definition of short quantum Markov chains).
In practice, the input Z i at each trial i is treated as a free choice in the sense that Z i is independent of other classical variables, the quantum devices used and the quantum system E. Given this independence, the model C(CZ) can be constructed by chaining the trial models C C < 1 Z < 1 (C 1 Z 1 ), C C < 2 Z < 2 (C 2 Z 2 ), …, C C < n Z < n (C n Z n ) with the standard chaining operation of Eq. (26). Independence ensures that each state ρ CZE in the chained model C(CZ) satisfies the short quantum Markov-chain condition over Z <i E for each i. Indeed, the short quantum Markov-chain condition is satisfied if and only if at each trial i, the input Z i is independent of the past outputs C <i given E and the past inputs Z <i [43].
Models constructed by chaining with conditionally independent inputs capture the standard experimental configurations for randomness generation with free setting choices, including both device-dependent and device-independent scenarios. We remark that there is no restriction on the dynamics of the devices between trials, nor is there any reason to explicitly represent this dynamics. The model keeps track only of the joint state of CZE, and with the formulation of induced trial models, any quantum systems or quantum operations that the devices use over the course of the experiment are subsumed by the trial models and the chaining constructions.

VII. CONSTRUCTION OF QEFS
In this section, we first give an expression for the QEF inequality imposed by an arbitrary state (not necessarily normalized) in the model C(CZ). Then we discuss several properties of QEFs. Next we show that QEFs for models obtained by chaining with conditionally independent inputs can be constructed by multiplying QEFs for the individual trial models in the chain. QEFs for later trial models may depend on the results of earlier trials, so we refer to the construction of QEFs above as QEF chaining. Finally, we formulate the construction of trial-wise QEFs as an optimization problem.
Let ρ CZE be an arbitrary state in C(CZ). The QEF inequality with power β at ρ CZE is given where ρ E (cz) and ρ E (z) are the un-normalized marginal states of E given the results cz and z according to ρ CZE . Both sides of the QEF inequality in Eq. (27) are positively homogeneous of degree 1 in ρ CZE . It follows that to show a function F : Rng(CZ) ℝ ≥ 0 is a QEF for C(CZ), it is necessary and sufficient that the QEF inequality holds for normalized states in N(C(CZ)). For normalized states, the QEF inequality becomes the inequality in Eq. (7).
Similarly, we can define QEFs for each individual trial model. We remark that the QEF inequality in Eq. (27) is helpful for proving properties of QEFs as well as QEF chaining, while the QEF inequality in Eq. (7) is used for numerical constructions of QEFs.

A. Properties of QEFs
Here are a few useful properties of QEFs; their proofs are given in Appendix F.

Property 1.
For all models C(CZ), the constant function F(cz) = 1 for all cz ∈ Rng(CZ) is a QEF with power β for each β > 0.
Let Cone(C(CZ)) be the convex cone generated by C(CZ). Then we have According to Property 2, if Cone(C′(CZ)) ⊇ C(CZ), then every QEF for C′(CZ) is a QEF for C(CZ). Thus a strategy for constructing QEFs is to find an easily characterized model C′(CZ) whose convex closure contains the model C(CZ) of interest.
Let ℰ Z be an arbitrary family of z-dependent completely positive and trace preserving (CPTP) maps ℰ z on ℋ(E). As an operation, ℰ Z transforms a state ρ CZE according to Define the model ℰ Z (C(CZ)) as Then, another property is that Property 3. A function F : Rng(CZ) ℝ ≥ 0 is a QEF with power β for ℰ Z (C(CZ)) if the function is a QEF with power β for C(CZ).
Property 3 is useful for certifying randomness in the situation where the system E learns the input Z i to the devices after each trial i. For this situation, we start with the model C(CZ) for CZE and construct the QEFs F(CZ) for C(CZ) (see the next subsection for the construction details). After the experiment, as E learns the inputs Z, the change of E can be modelled by a family ℰ Z of z-dependent CPTP maps ℰ z on ℋ(E). Therefore, the model becomes ℰ Z (C(CZ)). According to Property 3, the constructed QEFs are still valid for ℰ Z (C(CZ)).
For the next property, we define a function K: Rng(CZ) ℝ to be an entropy estimator for the model C(CZ) if for all states ρ CZE in the model, where H 1 (C|ZE) ρ is the conditional von Neumann entropy (in binary logarithm) of C given ZE at ρ CZE . Every QEF yields an entropy estimator.

Property 4.
Let F(CZ) be a QEF with power β for C(CZ). Then log 2 (F(CZ))/β is an entropy estimator for C(CZ).
The affine min-tradeoff functions required for entropy accumulation [13,29] are closely related with the entropy estimators defined above. With our notation, an affine min-tradeoff function f for the model C(CZ) is a linear and real function of the probability distribution μ(CZ) such that f(μ(CZ)) ≤ H 1 (C|ZE) ρ for all normalized states ρ CZE in C(CZ) which have the marginal tr E (ρ CZE ) = μ(CZ). Since f is linear and real, we can write f(μ(CZ)) = cz a cz μ(cz) + a 0 = cz (a cz + a 0 )μ(cz) with real numbers a cz and a 0 , so f(μ(CZ)) = E μ (K(CZ)) where K(CZ) : cz ↦ a cz + a 0 is an entropy estimator. According to Property 4, affine min-tradeoff functions can be derived from QEFs. It is an interesting problem to see whether the affine min-tradeoff functions derived from QEFs can improve on the ones previously obtained [13] for entropy accumulation.
Besides the above four properties, there are two additional properties satisfied by QEFs: Property 5. Let F(CZ) be a QEF with power β for C(CZ). Then for all β′ ≥ β, F(CZ) is a QEF with power β′ for C(CZ).
Property 5 is useful for studying the finite-data performance of QEFs, while Property 6 helps to study the asymptotic behavior of randomness generation according to QEFs, see Sect. VII C for more discussions.

B. QEF chaining
Consider an arbitrary trial indexed by i. For past results c <i z <i , let C c < i z < i (C i Z i ) be a trial model for C i Z i E which can depend on the past. Suppose that the trial model C c < i z < i (C i Z i ) is pCP-closed. Let C C < i Z < i (C i Z i ) be a family of such trial models consisting of all C c < i z < i (C i Z i ). Suppose that for each trial model C c < i z < i (C i Z i ), we are able to construct the QEF Fc <i z <i (C i Z i ), where the subscript of the QEF indicates that its construction can depend on the past results c <i z <i . Let F C <i Z <i (C i Z i ) be a family of trial-wise QEFs consisting of all F c <i z <i (C i Z i ). If the model C(CZ) is obtained by chaining the trial models C C < 1 Z < 1 (C 1 Z 1 ), C C < 2 Z < 2 (C 2 Z 2 ), …, C C < n Z < n (C n Z n ) with conditionally independent inputs, then the QEFs F(CZ) for C(CZ) can be constructed by multiplying or chaining the trial-wise QEFs F C <i Z <i (C i Z i ), i = 1, 2, …, n. This construction follows from the next theorem by induction. To simplify the notation in the next theorem, we consider a generic trial with input Z and output C, and we denote the past inputs and outputs by Z and C. The proof of Thm. 5 can be found in Appendix E.
In practice, each C cz (CZ) is a trial model induced by a family of POVMs. Thus the pCPclosure property required in Thm. 5 is satisfied. Moreover, in standard situations for randomness generation, the input Z at a trial is a free choice. So, the model chaining with conditionally independent inputs required in Thm. 5 is also satisfied. Accordingly, we can construct QEFs for a sequence of trials by chaining trial-wise QEFs.
An advantage of QEF chaining is that trial-wise QEFs can be adapted while the trials are acquired. Specifically, let k be the number of trials performed (or analyzed) so far. According to QEF chaining, the next trial's QEF F k+1 (C k+1 Z k+1 ) ≡ F C ≤k Z≤k (C k+1 Z k+1 ) can depend arbitrarily on the past results C ≤k Z <k . In particular, one can check the statistics of recent trials to infer whether the probability distribution of trial results has changed and if so, adapt the construction of the next trial's QEF accordingly.
A consequence of the above adaptive construction of trial-wise QEFs is that one can stop acquiring trials as soon as the chained QEF takes a value larger than or equal to the threshold f min in Protocol 1. Specifically, if the value of the chained QEF so far, Π i = 1 k F i (c i z i ) with k < n, already exceeds the threshold f min , then one can set all future QEFs F i (C i Z i ), where i = k + 1, …, n, to be the constant 1, which is a valid QEF according to Property 1. Since this eliminates any contribution from future trials to the final chained QEF value, it is not necessary to perform the future trials at this point. Instead, we pad the trial outputs c ≤k observed so far with zeros in order to run randomness extraction according to Protocol 1, as the protocol requires a fixed number n of trials determined before the experiment. This ability of early stopping is exploited in our companion experimental work [44].

C. QEF optimization
Let F i (C i Z i ) be a QEF with power β for the i'th trial in an experiment. According to QEF chaining, the product Π i = 1 n F i (c i z i ) is a QEF with power β for the whole experiment consisting of n trials. A construction of good trial-wise QEFs can be derived as follows. The experiment successfully implements Protocol 1 if the chained QEF satisfies the condition Π i 1 n F i (C i Z i ) ≥ f min = 1/(p β (ϵ ℎ 2 /2)), or equivalently, i 1 n log 2 (F i (C i Z i )) β + log 2 (ϵ ℎ 2 /2)/β ≥ − log 2 (p) .
Hence, we aim to obtain a large expected value of the left-hand side of the above equation with as few trials as possible. For this purpose, before the experiment we can choose values for p and ϵ h (see Protocol 1) and optimize over the trial-wise QEFs and the power β such that the number of trials required for success, n exp , is minimized. Then we fix the number of trials n in the experiment to be a number larger than the minimum number of trials, so that the actual experiment succeeds with high probability if the quantum devices used are honest. Moreover, during the experiment we have the freedom to adapt the QEF F i (C i Z i ) before the i'th trial where p, ϵ h , β and n are already fixed. All these optimizations are based on the construction of trial-wise QEFs given fixed β and other parameters. This construction is detailed in the next paragraph.
Consider a generic next trial with results CZ and model C(CZ). Based on prior calibrations or the frequencies of observed results in past trials, we can determine a distribution ν(CZ) ∈ N(C cl (CZ)) ≐ tr E (N(C(CZ))) that is (hopefully) a good approximation to the distribution of the next trial's results. If each trial has the same distribution ν(CZ) and each trial model is the identical C(CZ), then after n trials the left-hand side of Eq. (29) is expected to be E ν (nlog 2 (F (CZ))/β + log 2 (ϵ ℎ 2 /2)/β). Here, F(CZ) is a QEF with power β for C(CZ) and we use the same trial-wise QEF for each trial. Thus, one way to optimize QEFs before the next trial is as follows: Max: E ν (nlog 2 (F (CZ))/β + log 2 (ϵ ℎ 2 /2)/β) Subject to: 1)F (cz) ≥ 0, for all cz, 2) cz F (cz)ℛ α (ρ E (cz)|ρ E (z)) ≤ 1, for all ρ CZE ∈ N(C(CZ)) .
The objective function is strictly concave and the constraints are linear, so there is a unique maximum. More details on QEF optimization in the CHSH Bell-test configuration are available in the next section. We emphasize that the trial-wise QEF returned by the above optimization problem is optimal only when the trial results are i.i.d. with distribution ν(CZ) and with the identical trial model C(CZ), but it is always valid by definition regardless of the actual distribution of the next trial's results as long as the trial model is C(CZ).
Before the experiment, we also would like to minimize the number of trials n exp required for the experiment to succeed. For this, we consider an equivalent task for randomness beacons -certifying a fixed number, b, of bits of quantum ϵ h -smooth conditional min-entropy with as few trials as possible, where the distribution of each trial's results is the same ν(CZ) ∈ N(C cl (CZ)) with the identical trial model C(CZ). For this task, we assume that the actual probability of success is larger than or equal to a positive κ′ fixed beforehand. We informally justify this assumption below. Good reference values for randomness beacons are b = 512 and ϵ h = κ′ = 2 −64 . A tight lower bound on the number of trials required for satisfying the above randomness-beacon task is denoted as n QEF,b , which depends on ϵ h and κ′ implicitly. An expression for n QEF,b is derived in the next paragraph.
In view of Thm. 3 and Eq. (29), if the actual probability of success is larger than or equal to κ′, then conditional on success the amount of quantum ϵ h -smooth conditional min-entropy certified after n trials is log 2 (f min )/β + log 2 (ϵ ℎ 2 /2)/β + αlog 2 (κ′)/β . (31) Success requires that i 1 n log 2 (F i (C i Z i )) ≥ log 2 (f min ). Define the quantity where the supremum is over trial-wise QEFs F(CZ) with power β for C(CZ). Since each trial has the same distribution ν(CZ) and each trial model is the identical C(CZ), we can set all trial-wise QEFs F i , i = 1, 2, …, n, with power β to be the same QEF F that witnesses the value g(β) defined in Eq. (32). So, the expectation of i 1 n log 2 (F i (C i Z i ))/β is ng(β). For adequate probability of success we therefore need log 2 (f min )/β smaller than ng(β) by an amount that is asymptotically small compared to ng(β). For the present analysis, we simply set log 2 (f min ) = ng(β) to determine a number of trials, n b , required for certifying b bits of quantum ϵ h -smooth conditional min-entropy according to n b g(β) + log 2 (ϵ ℎ 2 /2)/β + αlog 2 (κ′)/β ≥ b, or equivalently, n b ≥ bβ − log 2 (ϵ ℎ 2 /2) − αlog 2 (κ′) β g (β) .
The lower bound n QEF,b may be considered tight to lowest order in the sense that one needs only to increase the number of trials used in practice by an amount that is asymptotically small compared to n QEF,b , in order to guarantee sufficient probability of success. Here, the probability of success can be estimated according to the distribution of a sum of i.i.d. random variables associated with the logarithm of the trial-wise QEF used.
Since b ≥ 0 and κ′ ≤ 1, Eq. (33) shows that the number of trials must exceed the minimum of −log 2 (ϵ ℎ 2 /2)(βg(β)) before randomness can be certified, which suggests that the maximum of βg(β) is a good indicator of finite-data performance. From Property 5 one can see that the maximum of βg(β) is achieved in the limit where β goes to infinity. The finite-data performance of QEFs is illustrated in Sect. VIII B.
We remark that for each fixed β, the quantity g(β) defined in Eq. (32) can be identified as the maximum asymptotic entropy rate at constant ϵ h and κ′ witnessed by trial-wise QEFs with power β when each trial has the same distribution ν(CZ) and each trial model is the identical C(CZ). We skip the justification and refer to Sect. 5.4 of Ref. [45] for details. The maximum asymptotic entropy rate witnessed by all possible trial-wise QEFs for C(CZ) is g 0 = sup β>0 g(β). From Property 6 one can see that the rate g(β) is non-increasing in β. Thus g 0 is determined by the limit as β goes to zero. In fact, g 0 can be proven to be equal to the worstcase conditional von Neumann entropy H 1 (C|ZE) over all states ρ CZE allowed by the trial model C(CZ) such that tr E (ρ CZE ) = ν(CZ), see Sect. 6.5 of Ref. [45]. Since this worst-case conditional von Neumann entropy is a tight upper bound on the asymptotic randomness rate [46], quantum probability estimation is asymptotically optimal and we identify g 0 as the asymptotic randomness rate achieved by quantum probability estimation at constant ϵ h and κ ′.

VIII. QEFS FOR THE CHSH BELL-TEST CONFIGURATION
We consider DIRG with the experimentally relevant two-party, two-setting, two-outcome Bell-test configuration, referred to as the CHSH configuration [31]. The parties are labeled A and B. The quantum system D of the devices can be decomposed into two subsystems D A and D B held by A and B respectively. In each trial, a source (which could be under control of E) prepares a state ρ DE shared between D and E, and the party A (B) randomly chooses a setting X (Y) and obtains a measurement outcome A (B). We write Z = XY for the inputs of the trial, and C = AB for the outputs of the trial. For the CHSH configuration, A, B, X, Y ∈ {0, 1}.
The trial model for ABXYE is induced by a family of input-dependent POVMs of D with free inputs XY. In an experiment the input distribution μ(XY) is usually not exactly known, but it is reasonable to assume that the inputs XY are free choices, independent of other classical variables and the quantum systems D, E. Denote the classical model for the inputs XY by C cl (XY ). In most cases the normalized classical model N(C cl (XY )) is a convex polytope. We assume that the input distribution μ(XY ) ∈ N(C cl (XY )). Denote the family of input-dependent POVMs of D by P D, XY (AB). Each POVM in P D, XY (AB) has a tensorproduct structure over the two subsystems D A and D B . Furthermore, according to the non-signaling conditions [32,33] the output of A (or B) is independent of the input of B (or A). Therefore, for arbitrary inputs xy and outputs ab the POVM element P D,xy (ab) is of the form P If the inputs XY are free with distribution μ(XY ) ∈ N(C cl (XY )) and for each xy the induced model for ABE is ℳ(P D, xy (AB); E), then the trial model for ABXYE is given by C 222 (ABXY ) = xy μ(xy)|xy〉〈xy| ⊗ ρ ABE|xy : μ(XY ) ∈ N(C cl (XY )), and for each xy, ρ ABE|xy ∈ ℳ(P D, xy (AB); E) . (35) We emphasize that in the CHSH Bell-test configuration each trial model is the identical C 222 (ABXY ), even if the results obtained from a sequence of time-ordered trials are not i.i.d.
and even if the distribution of free inputs μ(XY) is not exactly known as long as μ(XY ) ∈ N(C cl (XY )).
In Sect. VIII A we simplify both the characterization of the model C 222 (ABXY ) and the corresponding construction of QEFs for certifying randomness against quantum side information. Then we demonstrate the finite-data performance of QEFs in Sect. VIII B.

A. Simple characterization of C 222 (ABXY )
In the device-independent scenario, the dimension of each quantum system D or E can take an arbitrarily large but finite value. By the usual dilation argument, we can consider only projective measurements P D A ,x (A) and P D B ,y (B) in Eq. (34). For the CHSH Bell-test configuration, we can further simplify the characterization of the models ℳ(P D, xy (AB); E) for all xy according to the next theorem. For this, we need the rank-1 projectors on a qubit defined by where s ∈ {0, 1}, θ ∈ (−π, π], and σ z and σ x are two of the Pauli matrices, corresponding to the observables along the z-axis and x-axis of the Bloch sphere. Note that each of the sets consists of positive combinations of states ρ ABE|xy expressible in the form ρ ABE|xy = ab |ab〉〈ab| ⊗ (Uτ 1/2 Π ab|xy (θ A , θ B )τ 1/2 U † ) (37) where τ ∈ ℂ 2 ⊗ ℂ 2 is a positive semidefinite operator, Π ab|xy (θ A , θ B ) = Q x; θ A (a) ⊗ Q y; θ B (b) ∈ ℂ 2 ⊗ ℂ 2 is a rank-1 projector with θ A , θ B ∈ (−π, π], and U is an isometry from ℂ 2 ⊗ ℂ 2 to ℋ(E).
We remark that both τ and U are independent of the inputs xy and outputs ab. Unless it is necessary to emphasize the projector Π ab|xy as a function of θ A and θ B , below we abbreviate it as Π ab|xy . The theorem follows from a well-known analysis going back to Ref. [47] and even earlier Ref. [48]; a nice version of this analysis is in Sect. 2.4.1 of Ref. [49]. A detailed proof is given in Appendix G.
Given an arbitrary non-negative function F ′: Rng(ABXY ) ℝ ≥ 0 and a power β > 0, define W F ′, α, k (θ A , θ B , τ) = abxy F ′(abxy)μ k (xy) tr(τ 1/α Π ab|xy ) α (41) for each extremal distribution μ k . Then for each k we can formulate the following optimization problem f k ≐ Max: W F ′, α, k (θ A , θ B , τ) Subject to: τ ≥ 0 and tr(τ) = 1, −π < θ A , θ B ≤ π . Given a method to determine f k , any generic local search method can be used to find the best QEF for solving the convex-optimization problem in Eq. (30). So, in this work we focus on methods for determining f k . We provide a numerical method for determining both a lower and an upper bound on f k as detailed in Appendix H. Below we refer to the optimization problem in Eq. (42) as QEF verification, since from the solution f k a valid QEF can be constructed. We emphasize that the method presented in Appendix H works with arbitrary non-negative functions F ′: Rng(ABXY ) ℝ ≥ 0 . In the following subsection we derive QEFs by this method from PEFs, because not only are effective methods for constructing PEFs available but also PEFs exhibit unsurpassed finite-data efficiency [25,26]. Recall that QEFs and PEFs address quantum and classical side information, respectively. Hence, by QEF verification we can upgrade the security analysis with respect to classical side information to that with respect to quantum side information.

B. Finite-data performance of QEFs
In this subsection, we compare quantum probability estimation with entropy accumulation [13,29] on their finite-data performances. We also re-analyze the data from the first experimental demonstration of certified randomness for DIRG [3] to enhance its security against quantum side information.
It is best to construct optimal QEFs by solving the optimization problem in Eq. (30) directly. However, an effective algorithm for finding optimal QEFs has not yet been well developed. Instead, here we construct valid QEFs. Such a QEF can be obtained with the following steps. We first construct an optimal PEF F′(ABXY) with power β as in our previous works [25,26]. Then, given F′(ABXY) and α = 1 + β, we determine both an upper and a lower bounds on f max (as introduced below Eq. (42)) with the method in Appendix H. We denote the lower and upper bounds obtained by f max,lb and f max,ub , respectively. Finally, we obtain a valid QEF F(ABXY) with power β by dividing the PEF by the upper bound f max,ub , that is, F(abxy) = F′(abxy)/f max,ub for all abxy. We emphasize that the QEFs derived from PEFs perform well as demonstrated below, however, they are not optimal in terms of the asymptotic randomness rate or finite-data efficiency exhibited. If numerically effective methods for solving the QEF optimization problem in Eq. (30) are available, we can improve the asymptotic randomness rate and finite-data efficiency in the presence of quantum side information.
In our study, we assume that the inputs XY at each trial are free with the uniform distribution. So the trial model of interest is C 222 (ABXY ) as specified in Eq. (35) but under the restriction that the input distribution is uniform. To construct PEFs, we consider the classical trial model T(ABXY ) of distributions of ABXY with uniformly random inputs, satisfying both non-signaling conditions [32] and Tsirelson's bounds [50]. The optimization over PEFs with a fixed power β for T(ABXY ) is a convex-optimization problem, see Sect.
VIII of Ref. [25] for details. For each optimal PEF used, we found that both f max,lb and f max,ub are indistinguishable from 1 at high precision. Therefore, the constructed QEF for the trial model C 222 (ABXY ) with uniformly random inputs is well-performing in the sense that it performs as well as the optimal PEF used for the classical trial model T(ABXY ). We emphasize that when the input distribution is close to uniform, the QEF derived from a PEF performs as well as the original PEF as demonstrated below and in our companion work [44]. However, when the input distribution is far away from uniform (for example, when the total-variation distance between the input and uniform distributions is larger than 0.7) and when the power β is small enough (for example, when β is smaller than 10 −7 ), we observed that the certified upper bound f max,ub could be larger than 1 by a non-negligible amount such that the QEF derived from a PEF does not perform as well as the original PEF.
We first compare the minimum numbers of trials required by quantum probability estimation and by entropy accumulation [13,29] to ensure that the quantum ϵ h -smooth conditional min-entropy estimate is positive, under the assumption that the trial results are i.i.d. with distribution ν. We note that like our method, entropy accumulation works without the i.i.d.
assumption, but the finite-data performance of each method depends on the actual distribution of each trial's results. With quantum probability estimation, the minimum number of trials required is denoted by n QEF,b=0 , where the expression for n QEF,b is given in Eq. (33). With entropy accumulation (EAT) as implemented in Ref. [13], the minimum number of trials required is denoted by n EAT,b=0 . An explicit expression for n EAT,b is given in Eq. (S34) 2 of our previous work [26]. Like n QEF,b , the expression for n EAT,b is associated with moderate completeness depending on the distribution of a sum of i.i.d. random variables as discussed in Sect. VII C.
To determine the number n QEF,b=0 , we need to solve a minimization over all QEFs F(ABXY) with power β > 0, given the distribution ν of trial results, the smoothness error ϵ h , and a presumed lower bound κ′ on the success probability that we need to protect against (see Eq. (33)). Denote the objective function to be minimized by m QEF,b=0 (F, β; ν, ϵ h , κ′). Then, n QEF,b=0 = min F,β m QEF,b=0 (F, β; ν, ϵ h , κ′). Since numerical methods for QEF verification but not for QEF optimization are available, we determine an upper bound on n QEF,b=0 instead. For this, we first minimize the expression for m QEF,b=0 (F′, β; ν, ϵ h , κ′) over PEFs F′(ABXY) with β > 0 by the numerical method developed in Ref. [25]. Suppose that the minimum is witnessed by the PEF For specific comparisons, we need to fix the distribution ν and the smoothness error ϵ h , as well as the presumed lower bound κ′ on the success probability. We choose ϵ h = κ′ = 10 −6 .
The quantum smooth conditional min-entropy estimate by either quantum probability estimation or entropy accumulation depends on ϵ h and κ′ in similar ways, so we expect our choice of ϵ h and κ′ to be representative. We emphasize that the amount of quantum ϵ hsmooth conditional min-entropy certified by either method (before randomness extraction) depends on κ′; however, in an end-to-end randomness-generation protocol it is not necessary to specify a value of κ′ for a soundness statement (see our Protocol 1 and its soundness proof of Thm. 4, for example). For the distribution ν of trial results ABXY, we consider the following three families as in our previous work [26] for certifying randomness against classical side information:
experiments. The values for the parameter θ, p or η of each family are chosen such that Î is above the classical upper bound 2 [31] (and of course not larger than the quantum maximum 2 2 [50]). We note that Î increases monotonically with each parameter θ, p or η.
For each family of distributions above, we determine the QEF advantage factor given by f ν = n EAT, b = 0 /n QEF, b = 0 ′ . For the distributions ν W,p , the advantage factor depends weakly on Î: f ν W,p increases from 45.6 at Î = 2.008 to 47.1 at Î = 2 2. For the other two families of distributions, the advantage factor can be much larger, particularly at Î near 2, as shown in Fig. 2. We also verified that entropy accumulation with improved second-order in Ref. [14] can reduce n EAT,b=0 only by a factor of no more than 2.02. Moreover, we verified that the above comparison results are almost unchanged when the identical value for ϵ h and κ′ varies from 10 −2 to 10 −20 .
Next we re-analyze the work of Pironio et al. in 2010 [3], which reported the first experimental demonstration of certified randomness for DIRG with a Bell test free of the detection loophole. From the results of the experiment, the presence of 42 bits of smooth conditional min-entropy with the error 10 −2 was certified in Ref. [3], where this error is related to a smoothness error but does not reflect currently accepted soundness definitions. The value for the error and that the certification did not take into account the success probability or quantum side information were clarified in subsequent papers [6,7]. A question is whether the experiment could have certified positive smooth conditional minentropy with respect to quantum side information, which is answered as follows.
The experiment of Ref. [3] consisted of 3016 trials, of which we use the first 1000 for calibration. From the calibration data, we determine a distribution ν of trial results in the classical trial model T(ABXY ) by maximum likelihood assuming i.i.d. trials (see Sect. VIII B of Ref. [25] for more details of this step). To find a PEF and its power β, we maximize the objective function in Eq. (30) with n = 2016, ϵ h = 10 −2 , the distribution v determined from the calibration data, and the replacement of a QEF by a PEF with the same power β.
After calibration, we determine that f max for the PEF found satisfies f max ∈ [1, 1 + 9.56 × 10 −6 ]. The bounds were computed at the numerical precision of 2 −52 ≈ 2.22 × 10 −16 with Matlab, then verified with Mathematica at the precision of 10 −32 . From the PEF found and the upper bound on f max , we can obtain a valid QEF and apply this QEF to the remaining 2016 trials. The obtained QEF witnesses 127.86 bits of quantum smooth conditional minentropy with ϵ h , = 10 −2 in the experiment reported in Ref. [3], if the presumed lower bound κ′ on the success probability in Thm. 3 is formally set to be 1. We note that for the observed frequencies of trial results in this experiment, entropy accumulation implemented in Ref. [13] requires 54688 trials; while entropy accumulation with improved second-order in Ref. [14] requires 27620 trials, much more than the 3016 trials available in Ref. [3], in order to certify any random bits at the same ϵ h , and κ′ as above. Here, the assignment of κ′ = 1 is purely formal for comparison with respect to the soundness criteria implicit in Ref. [3]. These soundness criteria are now considered inadequate. With modern soundness criteria and at ϵ h , = κ′ = 3× 10 −2 , the number of bits witnessed by the QEF is 72.70. This number is derived from the experimental QEF value. In a protocol, the number of bits to be produced needs to be determined before the experiment and would have to be set to a smaller number to ensure sufficiently high probability of success.

IX. CONCLUSION
The finite-data efficiency is an important factor for practical applications of deviceindependent randomness generation (DIRG). Previously available DIRG protocols with respect to quantum side information do not exhibit sufficiently high finite-data efficiency, so they require too many experimental trials even with the state-of-the-art photonic loopholefree Bell tests. In this work, we develop quantum probability estimation to yield DIRG protocols with unsurpassed finite-data efficiency and with respect to quantum side information. This enables a practical device-independent randomness beacon where a block of 512 device-independent random bits is generated with an average experiment time of less than 5 min and with certified error bounded by 2 −64 , see our companion experimental work [44]. Our work also enables the realization of device-independent randomness expansion in the near future. Moreover, quantum probability estimation can be applied to the devicedependent scenario, which can result in more efficient randomness generation.
In contrast with previous works for addressing quantum side information, quantum probability estimation does not rely on fixed Bell inequalities for certifying deviceindependent randomness. The main result of quantum probability estimation is that the product of trial-wise quantum estimation factors (QEFs) yields an estimator of the sandwiched Rényi entropy. Quantum probability estimation can further lower-bound the quantum smooth conditional min-entropy after considering the relation between sandwiched Rényi entropies and quantum smooth conditional min-entropies established in the literature. The implementation of quantum probability estimation requires well-performing trial-wise QEFs. For DIRG with the CHSH Bell-test configuration, we provide a numerical approach to effectively construct such QEFs. It is straightforward to extend this numerical approach to Bell-test configurations with multiple parties as long as each party can randomly perform one of two binary-outcome measurements, see the last section of Ref. [45] for details.
Certifying quantum smooth conditional min-entropies is also the central task for quantum key distribution (QKD) [54]. In principle, quantum probability estimation can be extended to improve the finite-data efficiency of QKD, particularly device-independent QKD. For this, we need to certify the quantum smooth conditional min-entropy evaluated at a classicalquantum state after the error-correction step in QKD as done in Ref. [13]. We also need to develop alternative numerical approaches for constructing trial-wise QEFs. We will address the details required for this extension in the future work.

Appendix B:: Protocol composability
Other definitions of soundness for randomness generation use the trace distance instead of the purified distance as used in this work. The purified distance allows for extension to previously traced-out quantum systems such as that of the devices used in the protocol. This enables analysis of protocol composition involving the same devices which may have memory (see the next two paragraphs). This kind of composition could introduce the possibility of memory attacks, whereby the devices leak information about the results of past protocols through leakage channels enabled by later protocols [55]. For randomnessgeneration protocols, such a leakage channel is introduced by the success flag P G : The devices can modify their future behavior so that the flags P G of later protocols depend on the results of past protocols. A detailed discussion of memory attacks for randomness generation is in the supplemental material of Ref. [55]. We note that our protocol presented in Sect. V C of the main text has fixed-length outputs, which avoids leakage channels based on the length of the output but does not eliminate implementation-dependent leakage channels such as variations in timing or side-effects of using randomness.
We do not formally analyze composition of randomness-generation protocols with the same devices, and unrestricted composability is not assured. But to support such composition, we require that the devices are permanently isolated from E and that they never gain knowledge of the seeds used for randomness extraction. The latter supports the following strategy to mitigate P G − based leakage channels: Anticipate the number of future instances of the protocol and reduce the number of bits extracted from the current protocol accordingly. The requirements may be difficult to guarantee in practice but can be weakened once the randomness generated in past protocols is used, see the discussion in Ref. [55].
Let D be the quantum system of the devices, and let ρ R G S G P G ZDE be the actual, normalized Therefore, the soundness in terms of small purified distance implies the soundness in terms of small trace distance even if the previously traced-out quantum system of the devices is included. Here the soundness in terms of small trace distance is defined as existence of a normalized state σ R G S G ZDE satisfying Eqs. (B2) and (B3). Our soundness definition thus enables composability analysis of protocols involving the same devices, where the composability is evaluated in terms of small trace distance. The condition p|Rng(C)| ≥ 1 is satisfied in Protocol 1 of the main text for randomness generation.

Appendix D:: Quantum Markov chains
For randomness certification with explicit conditioning on inputs, we make use of the concept of short quantum Markov chains [43]. Below we specialize the definition of short quantum Markov chains in Ref. [43] to the family of classical-quantum states considered in this work.
Consider a classical-quantum state ρ UVWE of classical systems U, V, W and quantum system E. The state ρ UVWE is said to be a short quantum Markov chain over WE, written as ρ UVWE ∈ U ↔ WE ↔ V, if for all w, there exists a factorization ℋ(E) = ⊕ k ℋ(E 1, k (wU)) ⊗ ℋ(E 2, k (wV )) such that ρ E(uvw) = ⊕ k σ E 1,k (wu) ⊗ τ E 2,k (wv) for all uv ∈ Rng(UV). The definition is symmetric in U and V. That is, ) .
We first observe that where the last equality follows from the definition of χ E in Eq. (E1). In view of the above equation and the definition of sandwiched Rényi powers in Eq. (4) of the main text, we have ℛ α (σ E | ζ E ) = tr(ρ E ). Considering that tr(ρ CZE ) = tr(ρ E ), we thus obtain the first desired equality.
We next prove the second desired equality. For this it suffices to prove the spectral equivalence for each cz. The lemma assumes that for all z, σ E (z) = ∑ c σ E (cz) ≪ ζ E (z), so the support of σ E (cz) is contained in that of ζ E (z) for the right-hand side of Eq. (E6). Starting from the lefthand side and substituting the expression of ρ E (cz), we get where for the above spectral equivalence we used the fact that M † M ~U MM † with To simplify the spectral equivalence in Eq. (E7) further, we use the definition of short quantum Markov chains. Since ξ UZE ∈ U ↔ E ↔ Z, there exists a factorization ℋ(E) = ⊕ k ℋ(E 1, k (U)) ⊗ ℋ(E 2, k (Z)) such that σ E (z) = k σ E 1, k ⊗ ξ E 2, k (z), ζ E (z) = k ζ E 1, k ⊗ ξ E 2, k (z) .

(E8)
Let ξ E 2, k = z ξ E 2, k (z). Then Eq. (E8) implies that From the above equation and in view of the definition of χ E in Eq. (E1), we have The operator χ E 1,k is well defined since the assumption that σ E (z) ≪ ζ E (z) in the statement of the lemma ensures that σ E 1,k ≪ ζ E 1,k for each k. With the direct-sum expressions for ζ E and χ E , we get where 1 E 2, k is the projector onto the support of ξ E 2,k in ℋ(E 2, k ). From Eqs. (E8), (E10) and in view of the expressions of ρ E (z) in Eq. (E5) and χ E 1,k in Eq. (E9), we obtain Let Π E 1,k be the projector onto the support of χ E 1,k and Π E = ⊕ k Π E 1, k ⊗ 1 E 2, k be the projector onto the support of χ E . Substituting the identities obtained in Eqs. (E10), (E11) and continuing from the last line of Eq. (E7) we get where to obtain the second last line we used Eq. (E8), and to obtain the spectral equivalence in the last line we used the fact that which is contained in the support of χ E . Therefore, the projector Π E onto the support of χ E can be eliminated from the final expression in Eq. (E12) to finish the proof of Eq. (E6). ◻ Now we can prove Thm. 5 of the main text as follows.
Suppose that the inequality in Eq. (E13) is proven. Then we have where the inequality in the third line follows from Eq. (E13), and the inequality in the fourth line follows from the facts that G(CZ) is a QEF with power β for C(CZ) and that the state ρ CZE ≐ tr CZ (ρ CZCZE ) is in the model C(CZ) according to the model chaining defined in Eq.
(26) of the main text. Because ρ CZCZE is an arbitrary state in the chained model C(CZ)ο Z|Z C CZ (CZ), the above inequality shows that the function G ⋄ F defined in the statement of Thm. 5 of the main text is a QEF with power β as claimed.
The inequality in the third line follows from the data-processing inequality for Rényi powers, and the equality in the sixth line follows from the fact that each ℰ z is tracepreserving.
The property follows from the fact that ρ CZE is an arbitrary normalized state in the model C(CZ) and the definition of entropy estimators in Eq. (28).
Proof of Property 5: Consider an arbitrary normalized state ρ CZE ∈ C(CZ). In view of the QEF inequality in Eq. (7) of the main text, it suffices to show that g cz (α) ≐ ℛ α (ρ E (cz) | ρ E (z)) is a non-increasing function of α for all cz. For this, we consider the following two cases: 1) For the cz with ρ E (cz) = 0, we have g cz (α) = 0 for all α, and so g cz (α) is non-increasing.
2) For the cz with ρ E (cz) > 0, since ∑ c′z′ g c′z′ (α) ≤ 1 (by Property 1) and since the summands are non-negative, we have g cz (α) ≤ 1. Therefore, the value of log (g cz (α)) is nonpositive. Log-convexity of sandwiched Rényi powers as functions of α (see the first half of Cor. 4.2, Pg. 56 of Ref. [28]) implies that the slope of log (g cz (α)) as a function of α is nondecreasing. In view of −∞ < log (g cz (α)) ≤ 0, the slope of the function log (g cz (α)) at any α cannot become positive, otherwise when α ↗ ∞ the value of log (g cz (α)) would become positive. Thus log (g cz (α)) is a non-increasing function of α. Moreover, since the function log(x) is order-preserving, g cz (α) is also a non-increasing function of α.
Proof of Property 6: Consider a normalized state ρ CZE ∈ C(CZ). Define the probability distribution μ(CZ) by μ(cz) = tr(ρ E (cz)). We check the QEF inequality with power γβ at ρ CZE as follows: since F(CZ) is assumed to be a QEF with power β.
The property follows from the fact that ρ CZE is an arbitrary normalized state in the model these exist orthonormal bases in ℋ(D A ) and ℋ(D B ) such that P D A (i) , x (a) and P D B (j) , y (b) can be written as Q x;θ A,i (a) and Q y;θ B,j (b) for some θ A,i , θ B,j ∈ (−π, π].
The direct-sum structure of POVMs implies that for all inputs xy and outputs ab where the state ρ D A Since each Π ab|xy is a rank-1 projector, direct calculation shows that tr D (|Φ〉 DE 〈Φ|(Π ab|xy ⊗ 1 E )) = (Π ab|xy ) T , where for a matrix M, M T denotes its transpose. Further, considering that each Π ab|xy is real and symmetric, the state in Eq. (G6) is simplified to the desired form in the statement of Thm. 6 of the main text. ◻ Given a real normalized state τ 0 ∈ S 1 , let ∇W| τ 0 be a gradient matrix of the function W at τ 0 satisfying that W(τ 0 + ϵΔ) = W(τ 0 ) + ϵ tr (∇W| τ 0 Δ) + o(ϵ) for every Hermitian matrix Δ and an arbitrarily small parameter ϵ > 0, where o() is the little-o notation. We observe that ∇W| τ 0 can be chosen to be a Hermitian matrix. To see this, suppose that M ≐ ∇W| τ 0 is a complex but not Hermitian matrix. Then, as the function W(τ 0 + ϵΔ) is real, so is tr(MΔ) for every Hermitian matrix Δ. Moreover, tr(M † Δ) is real, as it is equal to tr(MΔ). Now we can set the gradient ∇W| τ 0 to be (M + M † )/2, which is a Hermitian matrix. In a similar way, we can further choose ∇W| τ 0 to be a real symmetric matrix.
It is desirable to make the upper and lower bounds as tight as possible. For this we use an iterative method. Given a feasible solution τ 0 , we can apply line search to find another feasible solution τ 1 such that W(τ 1 ) ≥ W(τ 0 ) as follows. Let Π 0 be the projector onto the span of the eigenvectors of the gradient matrix ∇W| τ 0 with eigenvalues larger than or equal to W(τ 0 ). Then, the line segment L = {(1 − ϵ)τ 0 + ϵΠ 0 /tr(Π 0 ) : 0 ≤ ϵ < 1} constitutes a subset of feasible solutions. By line search, we can find τ 1 = argmax τ∈L W(τ). We thus have W(τ 1 ) ≥ W(τ 0 ). By iteration, we can obtain a sequence of feasible solutions {τ i : i = 0, 1, 2, …} such that W(τ i ) ≤ W(τ i+1 ) for all i. At each feasible solution, we have W (τ i ) ≤ max τ ∈ S 1 W (τ) ≤ λ max (∇W | τ i ) following the argument of the above paragraph. We can stop the iteration as long as the gap between the least upper bound and the greatest lower bound obtained so far is smaller than a prespecified precision. We remark that the convergence of the obtained upper bounds {λ max (∇W| τ i ), i = 0,1, 2, …} is not promised. However, we emphasize that a certified but not necessarily tight upper bound is sufficient for QEF verification.
In order to implement the above iterative method, we need to compute the gradient matrix ∇W. Considering the explicit expression of W as in Eq. (41) of the main text, it suffices to compute the gradients of functions in the form W Π (τ) = (tr(τ 1/α Π)) α , where Π is a rank-1 projector. We can write the gradient in the form ∇W Π (τ) = α(tr(τ 1/α Π)) β X, where X ≐ ∇tr(τ 1/α Π). Note that if the initial solution τ 0 is positive definite (which is true as usual), so is each τ i obtained by the iterative method. Therefore, for practical implementation we only need to know the gradient ∇W Π at arbitrary positive τ in S 1 . For this case, the matrix X is derived in Appendix I.
We remark that m is a free parameter, characterizing the resolution in the division of the feasible region of (θ A , θ B ). With the increase of m, we expect that the upper and lower bounds on f k obtained converge. In our implementation of the above method, we start dividing the feasible region at a low resolution, and refine the subregions I AB,ij if the gap between the upper and lower bounds on f k is too large. However, not all subregions I AB,ij need to be refined. We refine the subregions I AB,ij in the order of priority. The priority is determined by the obtained upper bounds u k,ij on f k (I AB,ij ). The subregion I AB,ij with the highest upper bound u k,ij will be refined with the highest priority. A possible refinement strategy is to divide the region I AB,ij into four subregions. We determine the upper and lower bounds on each refined subregion. We then update both the upper and the lower bounds on f k obtained so far. We continue the refinement until the gap between these two bounds is smaller than a prespecified precision.
Appendix I:: Derivation of the matrix X in Eq. H2 To compute X at τ > 0 we use perturbation techniques. Let τ′ = τ + ϵΔ, where ϵ > 0 is sufficiently small and Δ is a real symmetric matrix. The matrix τ can be decomposed in terms of its eigenspace projectors as τ = ∑ i λ i Λ i , where λ i > 0 and Λ i Λ j = Λ i δ ij . Since where Δ ij = Λ i ΔΛ j for all ij. By introducing S = ∑ i≠j S ij with S ij = Δ ij /(λ j − λ i ) and noting that τΛ i = Λ i τ = λ i Λ i , we can write Δ as One can see that the support of Δ ij is in Λ i and S is skew-symmetric (that is, S T = −S) with Λ i SΛ i = 0 for each i.
Let U = e ϵS . Since S is skew-symmetric, we have U T U = e ϵS T e ϵS = 1. That is, U = e ϵS is orthogonal. Therefore, for any γ > 0 and any ρ ≥ 0 we have (UρU T ) γ = Uρ γ U T .
where γ > 0 and O() is the big-O notation.
To prove the equality in Eq. (I3), we first set Y = ∑ i Δ ii , and note that Y and ρ commute with each other. So, Y can be written as Y = ∑ i y i Λ i , and where for the equalities in the third and last lines we used the orthonormality conditions Λ i Λ j = Λ i ε ij , and for the equality in the forth line we used the fact that all λ i > 0 and the Taylor-series approximation of (λ i + ϵy i ) γ . Second, we combine Eq. (I4) with the Taylorseries approximation U = e ϵS = 1 + ϵS + O(ϵ 2 ). Then considering that S T = −S, direct calculation establishes the desired equality in Eq. (I3).  Here, for the equality in the second line we used Eq. (I1), for the equality in the third line we used Eq. (I3) with γ = 1, for the equality in the fifth line we used Eq. (I2) with γ = 1/α, and for the equality in the last line we used Eq. (I3) with γ = 1/α.

Now we obtain
With the decomposition τ = ∑ i λ i Λ i and the orthogonality relations Λ i Λ j = Λ i δ ij and Λ i Δ jj = Δ jj δ ij , we have Further, considering that S = ∑ i≠j Δ ij /(λ j − λ i ), Δ ij Λ k = Δ ij δ kj and Λ k Δ ij = Δ ij δ ki we get Schematic of an experiment. The experiment can be either for device-independent or devicedependent randomness generation.