Security proof of practical quantum key distribution with detection-efficiency mismatch

Quantum key distribution (QKD) protocols with threshold detectors are driving high-performance QKD demonstrations. The corresponding security proofs usually assume that all physical detectors have the same detection efficiency. However, the efficiencies of the detectors used in practice might show a mismatch depending on the manufacturing and setup of these detectors. A mismatch can also be induced as the different spatial-temporal modes of an incoming signal might couple differently to a detector. Here we develop a method that allows us to provide security proofs without the usual assumption. Our method can take the detection-efficiency mismatch into account without having to restrict the attack strategy of the adversary. In particular, we do not rely on any photon-number cut-off of incoming signals, so our security proof is complete. Though we consider polarization encoding in the demonstration of our method, the method applies to a variety of coding mechanisms, including time-bin encoding, and also allows for general manipulations of the spatial-temporal modes by the adversary. We can thus settle the long-standing question of how to provide a valid, complete security proof of a QKD setup with characterized efficiency mismatch. Our method also shows that in the absence of efficiency mismatch, the key rate increases if the loss due to detection inefficiency is assumed to be outside of the adversary's control, as compared to the view where for a security proof this loss is attributed to the action of the adversary.


I. INTRODUCTION
For practical quantum key distribution (QKD) [1] using photon-counting techniques (discrete variable QKD), information is usually encoded in optical signals that contain multiple photons. To decode the information, one usually measures the optical signals with threshold detectors, which cannot resolve the number of incoming photons. Security proofs of practical QKD protocols usually assume that all threshold detectors used have the same efficiency. Under this assumption, one can push the detection efficiency into the transmission channel, which is under Eve's control. Thus the transmission loss and the inefficiencies of the detectors can be lumped together, and one can apply a security proof that applies to the new increased effective transmission loss followed by ideal threshold detectors with perfect efficiency [2].
In practice, however, it is not an easy job to build two detectors that have exactly the same efficiency. For example, the two detectors may be fabricated by different processes and so a mismatch between their efficiencies is induced. In the presence of efficiency mismatch, the different values for detection inefficiency cannot be lumped together and further treated as a single value for the loss over the transmission channel. Therefore, existing security-proof techniques cannot be applied.
Even with a single detector, an efficiency mismatch can be induced by an adversary. Suppose that the response of this detector to a photon depends on its degrees of freedom such as spatial mode, frequency, or arrival time. These degrees of freedom are not necessarily being used to encode information. If an adversary can manipulate these degrees of freedom, then an effective efficiency mismatch is induced. When the induced mismatch is large enough, powerful attacks on QKD systems exist, as demonstrated in Refs. [3][4][5][6]. In typical experiments the efficiency mismatch may not seem significant, but it still means that the security cannot be formally proven by existing techniques.
In this paper we develop analytic tools that subsequently allow us to prove, with numerical methods, the security in the presence of detection-efficiency mismatch. The method works as long as the efficiency mismatch is characterized, even if the mismatch depends additionally on degrees of freedom of an optical signal that are not exploited to encode information. To demonstrate our approach, we apply it to a Prepare&Measure BB84-QKD protocol [7]. Without loss of generality, we consider polarization encoding. Here we study the general case where the optical signals received by Bob may contain an unbounded number of photons such that their states live in an infinite-dimensional space. We can lower-bound the secret-key rate as a function of detection-efficiency mismatch and observed statistics. With our method, we can also study the individual effects of transmission loss and detection inefficiency on the secret-key rate. Our method is transferable to other QKD protocols. We note that Refs. [8][9][10] studied the security proof of the BB84-QKD protocol in the presence of efficiency mismatch but under the assumption that Bob receives no more than one photon. However, this assumption cannot be justified in practical implementations of QKD where threshold detectors are being used.
The rest of the paper is structured as follows: In Sect. II we describe the basic setup for an optical BB84-QKD implementation with a special emphasis on the description of the spatial-temporal modes coupled to the detectors. Then we explain our method in Sect. III, where we also apply it to the described setup. In order to show the implication of our proof methods, we require a toy-model that describes what observations we would expect in real experiments, which we do in Sect. IV. There we also show the secret-key rates that we obtain for setups that exhibit detection-efficiency mismatch. We summarize our findings in Sect. V. We note that all detectors considered in the rest of the paper are threshold detectors by default.

II. EXPERIMENTAL CONFIGURATION
The method that we develop in this article is about the treatment and analysis of the detector. Therefore, to lay out and illustrate the method we develop, it is sufficient to use the simple BB84 protocol [7], which we consider with an ideal single-photon source, but with threshold detectors monitoring full optical modes. Without loss of generality, we use the polarization-encoding language.
For our theoretical analysis, we use the entanglement-based formulation of Bennett, Brassard and Mermin [11]. This approach was later generalized to general QKD protocols as the source-replacement scheme [12]. In a thought-setup, the source-replacement scheme realizes the source by preparing a bipartite entangled state inside the source. Measurements on one system effectively prepare the remaining system in the desired signal states with the prescribed probabilities. In the case of the BB84 protocol with an ideal single-photon source, the internal entangled state in the thought-setup is the maximally entangled state
$$|\Phi\rangle_{AA'} = \frac{1}{\sqrt{2}}\big(|H\rangle_A |H\rangle_{A'} + |V\rangle_A |V\rangle_{A'}\big), \tag{1}$$
where $|H\rangle$ and $|V\rangle$ are horizontally and vertically polarized single-photon states, respectively. System $A'$ is prepared in the signal states of the BB84 protocol as Alice uniformly randomly selects to measure system $A$ in the horizontal/vertical ($H/V$) basis or the diagonal/antidiagonal ($D/A$) basis. System $A'$ enters the channel controlled by Eve and will emerge as system $B$ at Bob's site. At that stage, the signal is not necessarily a single-photon signal, but can (due to Eve's action) be in any state of the optical modes supported by the detectors. For example, Eve might amplify the signal using an optical amplifier or replace the signals with multi-photon states at her discretion. Bob thus has to perform a measurement on the full optical modes, not on the single-photon signals. In our setup, he randomly selects to measure the signal in either the $H/V$ basis or the $D/A$ basis of the optical modes supported by his device. We call the above procedure of preparing, distributing and measuring signal states a round.
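The source-replacement idea described above can be checked numerically. The following is a minimal sketch, assuming the maximally entangled state has the standard form $(|HH\rangle + |VV\rangle)/\sqrt{2}$ and that Alice's measurements are ideal projections applied with basis probability 1/2; the helper name `prepared_state` is introduced here for illustration only.

```python
import numpy as np

# Single-photon polarization states written as qubit vectors in the H/V basis.
H = np.array([1.0, 0.0])
V = np.array([0.0, 1.0])
D = (H + V) / np.sqrt(2)
A = (H - V) / np.sqrt(2)

# ASSUMPTION: the maximally entangled state is (|HH> + |VV>)/sqrt(2).
phi = (np.kron(H, H) + np.kron(V, V)) / np.sqrt(2)
rho = np.outer(phi, phi)

def prepared_state(outcome):
    """Project Alice's system onto `outcome` and return
    (probability of this outcome, normalized state of the other system)."""
    proj = np.kron(np.outer(outcome, outcome), np.eye(2))
    unnorm = proj @ rho @ proj
    # Partial trace over Alice's (first) qubit.
    reduced = unnorm.reshape(2, 2, 2, 2).trace(axis1=0, axis2=2)
    prob = 0.5 * np.trace(reduced)  # factor 1/2: uniform basis choice
    return prob, reduced / np.trace(reduced)

signals = {"H": H, "V": V, "D": D, "A": A}
results = {name: prepared_state(vec) for name, vec in signals.items()}
```

Each of the four BB84 signal states is obtained with probability 1/4, which is the behaviour expected of the prepare-and-measure source.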
After a large number of rounds, with the data recordings that detail Alice's effective signal choices and Bob's measurement outcomes, Alice and Bob continue the QKD protocol using the usual steps of testing, sifting, key map, error correction, and privacy amplification to obtain secret keys. Our method can be easily generalized for other protocols that use, for example, weak coherent pulses as signal states, but the single-photon source example studied in this work is sufficient to demonstrate our method, which is about the detection side.
So let us turn our attention to Bob's detection: either the active- or the passive-detection scheme, as depicted in Fig. 1, can be exploited. As the detectors used in each scheme are threshold detectors, each detector can respond to an incoming optical signal only in two different ways, click or no click. The detectors might respond to different modes (frequency, timing, etc.).
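To make the click/no-click response concrete, the sketch below uses the common textbook model of a threshold detector: each of the $n$ incoming photons is detected independently with probability $\eta$, and dark counts are neglected. Both simplifications are assumptions of this illustration, not statements taken from the analysis in this paper.

```python
def click_probability(n_photons: int, efficiency: float) -> float:
    """Probability that a threshold detector clicks when n photons arrive.

    ASSUMPTIONS: photons are detected independently with the same
    efficiency, and there are no dark counts.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    if n_photons < 0:
        raise ValueError("photon number must be non-negative")
    # The detector stays silent only if every photon is missed.
    return 1.0 - (1.0 - efficiency) ** n_photons

# Click probabilities for 0, 1, 2, 3 photons at 50% efficiency.
probs = [click_probability(n, 0.5) for n in range(4)]
```

The outcome carries no information about $n$ beyond the click itself, which is why multi-photon signals have to be handled explicitly in the security proof.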
As stated in the introduction, there are two scenarios where a detection-efficiency mismatch may exist. Let us start with the first one. Due to fabrication or setup in practice, the two detectors shown in Fig. 1(a) for the active-detection scheme may have different efficiencies $\eta_{H/D}$ and $\eta_{V/A}$. Similarly, the four detectors in Fig. 1(b) for the passive-detection scheme may have efficiencies $\eta_H$, $\eta_V$, $\eta_D$ and $\eta_A$, respectively. Here, the subscripts indicate the detectors used in a scheme. We call this kind of mismatch the spatial-temporal-mode-independent mismatch, in contrast to the following mismatch, which depends additionally on the spatial-temporal modes chosen by an adversary.
The second scenario is that of an active adversary. By manipulating the spatial-temporal mode of an optical signal, the adversary can change the coupling of the signal with a detector, resulting in a change in the effective detection efficiency of the detector. Especially in free-space QKD, it is possible for Eve to change the angle of an incoming signal [5,6] to influence the coupling of the signal with the active detection area of a detector, while for fiber-based signals simple time delays can be introduced [3] to exploit unevenly aligned detection time windows. Therefore, in a setup with several detectors, the efficiencies of these detectors can not only differ from each other but also depend on the spatial-temporal modes coupled to the detectors, giving rise to the so-called spatial-temporal-mode-dependent mismatch. In this work we analyze the security in both of the above scenarios.
TABLE I. Spatial-temporal-mode-dependent mismatch model in the active-detection scheme, where $0 \le \eta_2 \le \eta_1 \le 1$. The efficiencies of the two detectors labelled in Fig. 1(a) are listed in a column, where each column corresponds to a spatial-temporal mode.
Bob's detectors may respond to a large number of spatial-temporal modes. If the detection efficiencies related to these modes differ strongly from each other, it might become possible for Eve to control Bob's detection events thoroughly by sending the signals to the modes that couple particularly well only to the specific detector of Bob in which Eve desires to cause a detection event. For this attack to be possible in its extreme form, the number of modes must be equal to, or larger than, the number of detectors in the setup. For this reason, we choose the number of controllable modes to be equal to the number of detectors. In order to obtain visually simple illustrations of the secret-key rates, we choose mismatch models parametrized by two values for the efficiencies: a high value $\eta_1$ for one detector, and a lower value $\eta_2$ for the other detectors, as shown in Tables I and II. We emphasize that these mismatch models are considered just for ease of visual presentation, as the approach developed here can be exploited with an arbitrary mismatch model. To analyze the security of QKD systems, for example in a certification process, the choice of the mismatch model and its parameters will need to be justified in practice.
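A two-parameter mismatch model of this kind can be stored as a small efficiency matrix. The sketch below is hypothetical: it assumes that spatial-temporal mode $j$ couples with the high efficiency $\eta_1$ to detector $j$ and with the low efficiency $\eta_2$ to every other detector, which is one natural reading of "a high value for one detector, and a lower value for the other detectors". The actual entries of Tables I and II should be used where available.

```python
import numpy as np

def mismatch_matrix(num_detectors: int, eta1: float, eta2: float) -> np.ndarray:
    """Efficiency matrix with rows = detectors and columns = modes.

    ASSUMPTION (illustrative only): mode j favours detector j with
    efficiency eta1, and all other detectors see efficiency eta2.
    """
    if not 0.0 <= eta2 <= eta1 <= 1.0:
        raise ValueError("need 0 <= eta2 <= eta1 <= 1")
    eta = np.full((num_detectors, num_detectors), eta2)
    np.fill_diagonal(eta, eta1)
    return eta

active = mismatch_matrix(2, eta1=1.0, eta2=0.6)   # two detectors, two modes
passive = mismatch_matrix(4, eta1=1.0, eta2=0.6)  # four detectors, four modes
```

Setting `eta2 = eta1` recovers the mismatch-free case, while `eta2 = 0` corresponds to the extreme situation in which Eve fully selects the detector that can fire.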

III. KEY-RATE CALCULATION METHOD
A. Formulation of key-rate calculation as a convex-optimization problem
The asymptotic key rate certifiable against all collective attacks [13] is given by the difference between two terms, which are associated with privacy amplification (PA) and error correction (EC) respectively. The EC term depends only on the measurement statistics and can be calculated without any further information on the implementation of the QKD protocol. The main difficulty of the security proof lies in obtaining a lower bound on the PA term. As shown in Refs. [14,15], a reliable numerical lower bound on the PA term can be provided by solving a convex-optimization problem. In the following, we will give a brief review of the theory behind that reformulation.
TABLE II. Spatial-temporal-mode-dependent mismatch model in the passive-detection scheme, where $0 \le \eta_2 \le \eta_1 \le 1$. The efficiencies of the four detectors labelled in Fig. 1(b) are listed in a column, where each column corresponds to a spatial-temporal mode. Columns: Mode 1, Mode 2, Mode 3, Mode 4. Rows: one per detector, beginning with Detector 'H'.
In a generic QKD protocol, the measurement statistics in an experiment are summarized as a probability distribution $p_{AB}(x, y)$, where $x$ and $y$ are random variables corresponding to the events detected by Alice and Bob respectively. The corresponding measurement operators are $M^A_x$ and $M^B_y$. In addition, for the techniques shown in this paper, we will be able to provide from experimental observations lower bounds on the probability of at most $k$ photons arriving at Bob. These bounds will be brought in as additional explicit constraints in the convex-optimization problem. To formulate the corresponding constraints, we introduce the projectors $\Pi_k$ onto the photon-number subspace of Bob containing at most $k$ photons, and denote the corresponding lower bound on its expectation value by $b_k$. Then, the calculation of the PA term, denoted by $\alpha$, can be written as the convex-optimization problem [14,15]
$$\alpha := \min_{\rho_{AB}} D\big(\mathcal{G}(\rho_{AB})\,\|\,\mathcal{Z}(\mathcal{G}(\rho_{AB}))\big) \quad \text{subject to} \quad \mathrm{Tr}\big[\rho_{AB}\,(M^A_x \otimes M^B_y)\big] = p_{AB}(x,y)\ \ \forall x,y, \quad \mathrm{Tr}\big[\rho_{AB}\,(\mathbb{1}_A \otimes \Pi_k)\big] \ge b_k, \quad \rho_{AB} \ge 0. \tag{2}$$
Here, $D(\sigma\|\tau) := \mathrm{Tr}(\sigma \log_2 \sigma) - \mathrm{Tr}(\sigma \log_2 \tau)$ is the relative entropy, $\mathcal{G}$ is the post-selection map, and $\mathcal{Z}$ is the quantum channel describing the key map of the QKD protocol (see below for the details). In our applications we will later choose the photon-number cut-off $k \in \{1, 2\}$, or even use the constraints for both values of $k$. We remark that both the objective function and the constraints are convex in the optimization variable $\rho_{AB}$.
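The objective $D(\mathcal{G}(\rho_{AB})\|\mathcal{Z}(\mathcal{G}(\rho_{AB})))$ can be evaluated directly for a fixed state. The sketch below is a toy illustration for two qubits, not the numerical method of Refs. [14,15]: it assumes a trivial post-selection map (identity), models the key map as a pinching of Alice's qubit in the $H/V$ basis, and uses a depolarized maximally entangled state as a stand-in for $\rho_{AB}$. All function names are introduced here for illustration.

```python
import numpy as np

def relative_entropy(sigma, tau, tol=1e-12):
    """D(sigma||tau) = Tr(sigma log2 sigma) - Tr(sigma log2 tau).

    ASSUMPTION: the support of sigma lies inside the support of tau,
    which holds when tau is a pinched version of sigma.
    """
    s = np.linalg.eigvalsh(sigma)
    term1 = sum(x * np.log2(x) for x in s if x > tol)
    t, u = np.linalg.eigh(tau)
    log_t = np.array([np.log2(x) if x > tol else 0.0 for x in t])
    log_tau = (u * log_t) @ u.conj().T
    return float(term1 - np.trace(sigma @ log_tau).real)

def pinch_alice(sigma):
    """Toy key map: dephase Alice's qubit in the H/V basis."""
    out = np.zeros_like(sigma)
    for j in range(2):
        p = np.kron(np.diag(np.eye(2)[j]), np.eye(2))
        out = out + p @ sigma @ p
    return out

def toy_state(omega):
    """Depolarized maximally entangled two-qubit state (stand-in for rho_AB)."""
    phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
    return (1 - omega) * np.outer(phi, phi) + omega * np.eye(4) / 4

def objective(omega):
    rho = toy_state(omega)  # post-selection map taken as the identity here
    return relative_entropy(rho, pinch_alice(rho))
```

For the noiseless state the objective equals one bit, and it drops to zero for the fully depolarized state, as expected of a quantity that measures Eve's ignorance about the key.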
Once we obtain a reliable lower bound $\beta$ on the PA term $\alpha$ of Eq. (2), i.e., $\beta \le \alpha$, according to the numerical method developed in Ref. [15], the asymptotic key rate $K$ per round is bounded by
$$K \ge \beta - \mathrm{leak}^{\mathrm{EC}}_{\mathrm{obs}}, \tag{3}$$
where $\mathrm{leak}^{\mathrm{EC}}_{\mathrm{obs}}$ denotes the amount of information leaked to the adversary per round of the protocol during error correction. This automatically takes into account any post-selection mechanism of the protocol, as any jointly discarded signals do not cause an error-correction cost. Likewise, the PA cost $\beta$ automatically takes care of the same post-selection process, so that the total key rate $K$ is counted per round of the protocol. As we are discussing key rates in the asymptotic limit of a large number of exchanged signals, the reduction by any fraction of signals that is utilized to estimate the observed probability distribution $p_{AB}(x, y)$ of measurement results, as well as other finite-size effects, are negligible. Furthermore, the security proofs under collective and coherent attacks are equivalent in this limit [16], and hence our approach holds for coherent attacks.
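The bookkeeping of the key-rate bound, PA term minus error-correction leakage, is easy to sketch. In the snippet below the leakage model `leak_ec_model` (sifted fraction times binary entropy of the error rate, scaled by an efficiency factor `f_ec`) is a commonly used assumption introduced here for illustration; the text above does not specify how the leakage is computed.

```python
import math

def binary_entropy(q: float) -> float:
    """Binary entropy h2(q) in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def leak_ec_model(p_pass: float, qber: float, f_ec: float = 1.0) -> float:
    """ASSUMED leakage model: f_ec * p_pass * h2(qber) bits per round,
    where p_pass is the probability that a round survives sifting."""
    return f_ec * p_pass * binary_entropy(qber)

def key_rate_lower_bound(beta: float, leak_ec: float) -> float:
    """Per-round key-rate bound beta - leak_ec, floored at zero."""
    return max(0.0, beta - leak_ec)

# Hypothetical numbers, chosen only to exercise the functions.
rate = key_rate_lower_bound(beta=0.20, leak_ec=leak_ec_model(0.25, 0.02, 1.16))
```

Because both terms are counted per round, no separate sifting factor is applied to the difference.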
The map $\mathcal{G}$ in the objective function of Eq. (2) describes the post-selection after Alice's and Bob's public announcements for sifting. For simplicity, we concentrate here on the case where, to distill secret keys, Alice and Bob keep only those signals where both measured in the $H/V$ basis. Note that for optical implementations, the announcements usually used for sifting are slightly more involved than the simple basis-dependent sifting of the BB84 protocol. The reason is that the potential presence of multiple photons in the incoming signals can cause several detectors to show detection events simultaneously. If Bob uses the active-detection scheme, the sifting announcement by Bob consists of the declaration whether he used the $H/V$ basis measurement, and whether at least one detector fired. However, if Bob uses the passive-detection scheme, we have to decide what to do with events where we have multiple detections across the groups associated with different polarization bases (cross clicks), for example when both the $H$ and the $D$ detectors fire. Here we make the choice to keep only those events where either the $H$, the $V$, or both the $H$ and $V$ detectors fire (the latter denoted as an $HV$ event), while all other events (no clicks, clicks only in any of the $D$ and the $A$ detectors, or cross clicks) are discarded. In order to achieve this goal, Alice publicly announces the basis choice, where one of two bases is chosen uniformly randomly at each round, and Bob announces whether the desired events are observed. This corresponds to applying the post-selection map
$$\mathcal{G}(\rho_{AB}) = K \rho_{AB} K^\dagger,$$
where
$$K = \frac{1}{\sqrt{2}}\, \mathbb{1}_A \otimes \sqrt{M^B_H + M^B_V + M^B_{HV}}$$
is a Kraus operator. Here, the factor $1/\sqrt{2}$ accounts for Alice's probability 1/2 of choosing the $H/V$ basis, $\mathbb{1}_A$ is the identity operator in the state space of Alice, and the positive-operator valued measure (POVM) elements $M^B_H$, $M^B_V$, and $M^B_{HV}$ for Bob have been derived in Appendix A and B of Ref. [17], with the remark that for the active-detection scheme we need to put the coefficient 1/2 before each POVM element shown in Ref. [17] to account for Bob's probability of selecting each measurement basis.
After the public announcements and the corresponding post-selection step, Alice chooses a key map, which is represented by a quantum channel $\mathcal{Z}$. The key map is a function whose input is Alice's measurement outcome in the key-generation basis and whose output is a key value, 0 or 1. Suppose that we make a particular choice of key map here, namely that Alice's outcomes $H$ and $V$ are mapped to key values 0 and 1, respectively, and that the corresponding POVM elements $M^A_H$ and $M^A_V$ are projective (see Appendix A). The application of the key map corresponds to the application of the quantum channel
$$\mathcal{Z}(\sigma) = \sum_{j \in \{H, V\}} \big(M^A_j \otimes \mathbb{1}_B\big)\, \sigma\, \big(M^A_j \otimes \mathbb{1}_B\big).$$
Given the measurement statistics $p_{AB}(x, y)$, the lower bounds $b_k$ on the photon-number distribution, the post-selection map $\mathcal{G}$, and the key-map-realizing quantum channel $\mathcal{Z}$, in principle we can run numerical optimization to obtain a reliable lower bound of the minimization problem in Eq. (2). However, for the situation studied, the number of photons arriving at Bob is unbounded and so the dimension of the quantum state $\rho_{AB}$ is infinite. For this reason we need to develop techniques that allow us to simplify the optimization problem such that a reliable key-rate lower bound can be numerically obtained using finite-dimensional quantum states. These techniques are described in the next two subsections.
Since Bob's measurement POVMs are block-diagonal with respect to the subspaces associated with total photon numbers across all modes [17], we can assume without loss of generality that Eve performs a quantum nondemolition (QND) measurement of the total photon number after her interaction with the signals, and before their arrival at Bob's side.
As a consequence, the state $\rho_{AB}$ can be assumed, without loss of generality, to be block-diagonal in the same subspace structure, meaning that the state takes the form
$$\rho_{AB} = \bigoplus_{n=0}^{\infty} p_n\, \rho^{(n)}_{AB}.$$
The weight of each subspace carrying a total number of $n$ photons is given by the corresponding probability $p_n$, and the corresponding normalized conditional state is denoted by $\rho^{(n)}_{AB}$. Considering the block-diagonal structure of the state and Bob's measurement POVMs, we can write
$$p_{AB}(x,y) = p_{n\le k}\, \mathrm{Tr}\big[\rho^{(n\le k)}_{AB}\,(M^A_x \otimes M^B_{y,n\le k})\big] + (1 - p_{n\le k})\, \mathrm{Tr}\big[\rho^{(n>k)}_{AB}\,(M^A_x \otimes M^B_{y,n>k})\big],$$
where $k$ is a free parameter chosen in the security proof, $p_{n\le k}$ is the probability that no more than $k$ photons arrive at Bob, $\rho^{(n\le k)}_{AB}$ and $\rho^{(n>k)}_{AB}$ are the normalized conditional states on the two subspaces, and $M^B_{y,n\le k}$ and $M^B_{y,n>k}$ are the corresponding blocks of $M^B_y$. The $(n \le k)$-photon subspace is of finite dimension, which is compatible with the numerical key-rate optimization framework. On the other hand, the $(n > k)$-photon subspace is infinite dimensional, which is not directly suitable to be handled by our numerical methods. To resolve this problem, we introduce the flag-state squasher. The general framework of squashing models that map large-dimensional measurement descriptions without loss of generality to lower-dimensional systems has been described in Refs. [18][19][20].
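To see why a photon-number cut-off makes the problem numerically tractable, one can count the dimension of Bob's $(n \le k)$-photon subspace. The sketch below assumes that Bob's modes are ordinary bosonic modes, so that $n$ photons in $m$ modes span a space of dimension $\binom{n+m-1}{n}$, and that each spatial-temporal mode carries two polarization modes. The function names are introduced here for illustration.

```python
from math import comb

def dim_n_photons(n: int, modes: int) -> int:
    """Dimension of the n-photon subspace of `modes` bosonic modes."""
    return comb(n + modes - 1, n)

def dim_at_most_k(k: int, modes: int) -> int:
    """Dimension of the subspace with at most k photons."""
    return sum(dim_n_photons(n, modes) for n in range(k + 1))

# Two polarization modes per spatial-temporal mode (assumption of this sketch).
for st_modes in (1, 2, 4):
    modes = 2 * st_modes
    print(st_modes, [dim_at_most_k(k, modes) for k in (1, 2)])
```

The dimension grows polynomially in the cut-off $k$ for a fixed number of modes, whereas without a cut-off the space is infinite dimensional.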
where the states $|y\rangle$ form an orthonormal basis of $\mathcal{H}_J$.
Proof. We need to show that the CPTP map $\Lambda$ exists with the desired properties. This can be done by explicit construction as indicated in Fig. 2. For this purpose, we consider a general input state given in block form
$$\rho_{\rm in} = \begin{pmatrix} \rho_{bb} & \rho_{bB} \\ \rho_{Bb} & \rho_{BB} \end{pmatrix},$$
where index 'b' refers to the subspace $n \le k$ and index 'B' to the subspace $n > k$. We can then describe the squashing map $\Lambda$ by its action on an arbitrary input state of the above form as
$$\Lambda(\rho_{\rm in}) = \rho_{bb} \oplus \sum_{y=1}^{J} \mathrm{Tr}\big(\rho_{BB}\, M^B_{y,n>k}\big)\, |y\rangle\langle y|.$$
The second subspace, which in the case of our measurement will be infinite dimensional, is simply reduced to a smaller subspace by performing the measurement on that subspace and flagging the result of that partial measurement into an orthogonal register, which replaces the original second subspace.
This approach of creating squashing models to smaller total Hilbert spaces relies only on the block-diagonal structure of the original POVM elements. As soon as that assumption is met, a flag-state squasher can be constructed. As in any case where a squashing map exists mapping the original measurement to an alternative measurement in a smaller dimension, we can assume that the squashing map is part of Eve's action. As a result, we overestimate Eve's power (see below for a detailed explanation), but as a trade-off we can now assume without further loss of generality that Bob receives signals in the reduced Hilbert space. So the key-rate optimization problem in Eq. (2) formulated with the squashed states of the form in Eq. (9) and POVM elements of the form in Eq. (8), which is a finite-dimensional convex-optimization problem, will provide a lower bound on the secret-key rate in the actual implementation.
Note, however, that the virtual POVM element components $\tilde{M}^B_{y,n>k} = |y\rangle\langle y|$ are projective and orthogonal. Therefore, Eve could perform a strong attack by measuring the incoming signals from Alice with Bob's actual measurement $\{M^B_y : y = 1, \ldots, J\}$ and then preparing and sending to Bob the flag state $|y\rangle$ conditional on her measurement result $y$. This attack would deterministically trigger the same result $y$ when Bob performs the virtual measurement $\{\tilde{M}^B_y : y = 1, \ldots, J\}$ according to the squashing map. Hence, by attributing the squashing map to Eve's action, Eve could completely learn every result of Bob, and so we overestimate the power of Eve as compared with the actual implementation. For this reason, the flag-state squasher must be accompanied by a constraint that limits the resulting state mostly to the $(n \le k)$-photon subspace, which is given by the bound $b_k$ in our optimization problem of Eq. (2).
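The construction in the proof can be illustrated with small matrices. In the sketch below the infinite-dimensional $(n>k)$ block is replaced by a 3-dimensional stand-in and the POVM blocks are random; both are assumptions made purely so that the example runs. The map keeps the small block, measures the large block, and writes the outcome into orthogonal flag states.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_povm(dim, outcomes):
    """Random real POVM {M_y} on `dim` dimensions with sum_y M_y = identity."""
    raw = []
    for _ in range(outcomes):
        a = rng.normal(size=(dim, dim))
        raw.append(a @ a.T)
    w, v = np.linalg.eigh(sum(raw))
    inv_sqrt = (v / np.sqrt(w)) @ v.T
    return [inv_sqrt @ m @ inv_sqrt for m in raw]

def block_diag(a, b):
    n = a.shape[0]
    out = np.zeros((n + b.shape[0],) * 2)
    out[:n, :n] = a
    out[n:, n:] = b
    return out

def flag_state_squash(rho, dim_small, povm_large):
    """Keep the small block; replace the large block by flagged outcome weights."""
    rho_bb = rho[:dim_small, :dim_small]
    rho_BB = rho[dim_small:, dim_small:]
    flags = np.diag([np.trace(rho_BB @ m) for m in povm_large])
    return block_diag(rho_bb, flags)

d_small, d_large, J = 2, 3, 3  # toy sizes; d_large stands in for the n > k block
povm_small = random_povm(d_small, J)
povm_large = random_povm(d_large, J)
povm_full = [block_diag(s, l) for s, l in zip(povm_small, povm_large)]
povm_squashed = [block_diag(s, np.diag(np.eye(J)[y])) for y, s in enumerate(povm_small)]

a = rng.normal(size=(d_small + d_large,) * 2)
rho = a @ a.T
rho /= np.trace(rho)

squashed = flag_state_squash(rho, d_small, povm_large)
p_actual = [np.trace(rho @ m) for m in povm_full]
p_squashed = [np.trace(squashed @ m) for m in povm_squashed]
```

The two probability lists agree, which is the defining property of a squashing map: the virtual measurement on the squashed state reproduces the statistics of the actual measurement on the original state.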
Finally, we remark that without loss of generality the states $\rho_{AB}$ and $\rho^{(n)}_{AB}$ can be assumed to be real-valued. This is because all measurement POVM elements $M^A_x$ and $M^B_y$ of Alice and Bob can be represented by real-valued matrices and because the objective function to be minimized for bounding the key rate in Eq. (2) is a convex function of the state $\rho_{AB}$. For detailed proofs see for example Sec. V C in Ref. [21]. We also emphasize that the block-diagonal structure and the real-matrix representation of the state $\rho_{AB}$ apply to both the active- and passive-detection schemes. By using a real-matrix representation of $\rho_{AB}$, the number of free parameters in the key-rate optimization problem of Eq. (2) is reduced.
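One ingredient of the real-valuedness argument is easy to check numerically: if a measurement operator is a real symmetric matrix, then a state $\rho$ and its real part $(\rho + \rho^*)/2$ give the same expectation value, and the real part is again a valid state. The sketch below uses a random complex state and a random real operator as stand-ins; it illustrates only this step, not the full proof cited above.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

# Random complex density matrix.
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = a @ a.conj().T
rho /= np.trace(rho).real

# Its real part is the equal mixture of rho and its complex conjugate.
rho_real = ((rho + rho.conj()) / 2).real

# Random real symmetric positive operator standing in for a POVM element.
b = rng.normal(size=(d, d))
M = b @ b.T

p_complex = np.trace(rho @ M).real
p_real = np.trace(rho_real @ M)
```

Since the imaginary part of $\rho$ is antisymmetric, it drops out of every trace against a real symmetric operator, so all constraints of the optimization are unaffected by the replacement.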

C. Constraints on photon-number distribution
To solve the convex-optimization problem in Eq. (2), we need to make use of a flag-state squasher as introduced in Theorem 1, where the small-dimensional subspace will be chosen to be the incoming subspace containing at most $n = 1$ photon, or at most $n = 2$ photons. In order to obtain positive key rates, it will be necessary to show that the overlap of the incoming states with this subspace can be lower-bounded by some number $b_k$, $k = 1$ or 2. Following the numerical method developed in our previous study of entanglement verification with efficiency mismatch [17], we obtain such bounds directly from the experimentally observed measurement statistics $p_{AB}(x, y)$. The intuition behind this approach is that higher photon numbers will necessarily lead to double clicks, cross clicks, and/or errors.
This way of using experimental observations to bound the photon-number distribution was first established in Ref. [2] and further refined in Ref. [22], and then extended to the case of inefficient detectors in Ref. [17]. Note that the theoretical approach is independent of the number of spatial-temporal modes that we use (in addition to the polarization degree of freedom). We demonstrate the results of our method here for the two-mode case (with the active-detection scheme) and for the four-mode case (with the passive-detection scheme).
Before explaining the method, we would like to point out that the two properties of the state ρ AB discussed in the above subsection, i.e., the block-diagonal structure with respect to various photon-number subspaces and the real-number representation of the density matrix, will be used also in the optimization problems formulated in this subsection. The second property helps to reduce the number of free parameters in the optimization.

Active-detection case
As stated before, the intuition is that as an increasing number of photons are received by Bob, the probability of double clicks (clicks at both detectors) will increase and finally surpass the double-click probability observed in an experiment. Similar arguments hold for an effective error, which we define below. Thus we will show that the experimental observations allow us to put an upper bound on the probability that the signals received by Bob contain more than any given number of photons.
In order to make this intuition precise, we start by defining the double-click operator $F_{DC}$ [Eq. (10)] and the effective-error operator $F_{EE}$ [Eq. (11)], where the pre-factor 1/2 at each term describes the probability to choose the corresponding measurement basis. The form of the effective-error operator is chosen according to the squashing model [18,19] for the active-detection scheme: there, double-click events are mapped uniformly randomly in a post-processing step to either of the two single-click events associated with the chosen basis. In Eqs. (10) and (11), Alice's measurement operators are ideal measurement operators in the one-photon space (see Appendix A), while Bob's measurement operators are described in Appendix A and B of Ref. [17].
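We formalize the above intuition by studying the following optimization problems
$$d_{n,\min} := \min_{\rho^{(n)}_{AB}} \mathrm{Tr}\big(\rho^{(n)}_{AB}\, F^{(n)}_{DC}\big) \tag{12}$$
and
$$e_{n,\min} := \min_{\rho^{(n)}_{AB}} \mathrm{Tr}\big(\rho^{(n)}_{AB}\, F^{(n)}_{EE}\big), \tag{13}$$
where $F^{(n)}_{DC}$ and $F^{(n)}_{EE}$ are projections of the operators $F_{DC}$ and $F_{EE}$ onto the $n$-photon subspace of Bob. We remark that the above optimizations are over all possible $n$-photon states $\rho^{(n)}_{AB}$, while the optimization problems formulated in our previous study of entanglement verification with efficiency mismatch [17] run over only the states $\rho^{(n)}_{AB}$ satisfying the positive-partial-transpose criterion [23,24].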
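The optimization problems described by Eqs. (12) and (13) have the form of semi-definite programs (SDPs). In order to solve them, we utilize the YALMIP [25] toolbox in MATLAB. From our calculations we make the observation that the minimum double-click probability $d_{n,\min}$ monotonically increases as the photon number $n$ goes up, and that $e_{3,\min}$ is a lower bound on the minimum effective-error probability $e_{n,\min}$ for $n \ge 3$. We therefore obtain the inequalities
$$d_{n,\min} \ge d_{3,\min}, \quad \forall n \ge 3, \tag{14}$$
$$e_{n,\min} \ge e_{3,\min}, \quad \forall n \ge 3. \tag{15}$$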
We would like to point out that we did not go through the effort to prove the above inequalities with analytical methods, though the numerical evidence strongly supports that these inequalities hold for an arbitrary active-detection efficiency mismatch. In Figs. 3 and 4, we report our numerical evidence for the specific mismatch model of Table I. In particular, one can see from these figures that the curves become monotonic as the efficiency mismatch increases.
FIG. 3. Minimum double-click probability $d_{n,\min}$ for the mismatch model of Table I with $\eta_1 = 1$ and $\eta_2 = \eta$. Note the monotonicity of each curve as a function of $n$ and that $d_{2,\min}$, as well as $d_{1,\min}$, is always equal to zero.
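If the minimization of $\mathrm{Tr}(\rho F)$ runs over all density matrices with no constraint beyond positivity and unit trace, the SDP reduces to finding the smallest eigenvalue of $F$, attained by the corresponding eigenvector. The sketch below checks this with a random positive matrix as a stand-in for $F^{(n)}_{DC}$ or $F^{(n)}_{EE}$; the actual operators from Ref. [17] are not reproduced here, and any additional constraints used in practice would require a full SDP solver such as the YALMIP setup mentioned above.

```python
import numpy as np

def min_expectation(F):
    """Minimum of Tr(rho F) over all density matrices rho.

    ASSUMPTION: the only constraints are rho >= 0 and Tr(rho) = 1,
    in which case the optimum is the smallest eigenvalue of F.
    """
    w, v = np.linalg.eigh(F)
    vec = v[:, 0]
    return w[0], np.outer(vec, vec.conj())

rng = np.random.default_rng(2)
a = rng.normal(size=(5, 5))
F = a @ a.T / 5  # random positive stand-in for an n-photon operator

f_min, rho_opt = min_expectation(F)

# Sanity check against randomly sampled states.
samples = []
for _ in range(200):
    b = rng.normal(size=(5, 5))
    rho = b @ b.T
    rho /= np.trace(rho)
    samples.append(np.trace(rho @ F))
```

No sampled state falls below the eigenvalue bound, and the eigenvector state attains it, which gives a quick cross-check for the output of a numerical SDP solver.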
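In view of Eqs. (14) and (15), we find that the double-click probability $d_{\rm obs}$ and the effective-error probability $e_{\rm obs}$ observed in practice satisfy
$$d_{\rm obs} \ge \sum_{n \ge 3} p_n\, d_{n,\min} \ge (1 - p_0 - p_1 - p_2)\, d_{3,\min} \tag{16}$$
and
$$e_{\rm obs} \ge \sum_{n \ge 3} p_n\, e_{n,\min} \ge (1 - p_0 - p_1 - p_2)\, e_{3,\min} \tag{17}$$
by using that $\sum_{n=0}^{\infty} p_n = 1$. Hence, we can set the bound $b_2 \le p_0 + p_1 + p_2$ as
$$b_2 = 1 - \min\left\{ \frac{d_{\rm obs}}{d_{3,\min}},\ \frac{e_{\rm obs}}{e_{3,\min}} \right\}. \tag{18}$$
Note that for the observations simulated in Sect. IV, we found that $d_{\rm obs}/d_{3,\min} < e_{\rm obs}/e_{3,\min}$ and therefore the bound $b_2 = 1 - d_{\rm obs}/d_{3,\min}$. Applying the same argument to the effective-error probability for all photon numbers $n \ge 2$ gives
$$e_{\rm obs} \ge (1 - p_0 - p_1)\, \min_{n \ge 2} e_{n,\min}. \tag{20}$$
Thus we can obtain a bound $b_1 \le p_0 + p_1$ as
$$b_1 = 1 - \frac{e_{\rm obs}}{\min_{n \ge 2} e_{n,\min}}.$$
In this case, the double-click estimations do not lead to a non-trivial bound on $b_1$, as there exist two-photon states that do not lead to double clicks ($d_{2,\min} = 0$), see Fig. 3.
FIG. 4. Minimum effective-error probability $e_{n,\min}$ for the mismatch model of Table I with $\eta_1 = 1$ and $\eta_2 = \eta$. Note that $e_{3,\min}$ is a lower bound for $e_{n,\min}$ when $n \ge 3$ and that $e_{1,\min}$ is always equal to zero.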
The above bounds $b_1$ and $b_2$, together with the flag-state squasher approach for the corresponding subspaces, can be used in the key-rate optimization problem of Eq. (2) when the active-detection scheme is used.
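The logic behind these bounds is that an observed probability $q_{\rm obs}$ obeys $q_{\rm obs} \ge (1 - p_{n \le k})\, q_{\min}$ whenever every state with more than $k$ photons produces the event with probability at least $q_{\min}$, so that $p_{n \le k} \ge 1 - q_{\rm obs}/q_{\min}$. The helper below is a minimal sketch of this step; the function name and the numbers in the example are hypothetical and are not taken from the simulations in this paper.

```python
def photon_number_bound(observed, minima):
    """Lower bound on the probability of at most k photons.

    `observed` and `minima` are matching lists of observed event
    probabilities and the minimum probabilities of the same events for
    states with more than k photons. Events with a zero minimum give no
    information and are skipped.
    """
    ratios = [o / m for o, m in zip(observed, minima) if m > 0]
    if not ratios:
        return 0.0  # only the trivial bound is available
    return max(0.0, 1.0 - min(ratios))

# Hypothetical example: double clicks (0.002 vs. 0.1), errors (0.01 vs. 0.05).
b2 = photon_number_bound([0.002, 0.01], [0.1, 0.05])

# A vanishing minimum, as for double clicks with two photons, is uninformative.
b1_from_double_clicks = photon_number_bound([0.002], [0.0])
```

Taking the smallest ratio selects the tightest of the available bounds, and the clipping at zero reflects that a negative lower bound on a probability carries no information.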

Passive-detection case
The passive-detection scheme utilizes a 50/50 beam splitter to passively select a measurement basis, as shown in Fig. 1(b). For $n \ge 1$ incoming photons, the probability that each output arm of the beam splitter contains at least one photon is given by $1 - 2^{-(n-1)}$. We therefore have the following expectations: 1) The probability of simultaneous photon detections at both output arms (referred to as cross clicks) would increase with the photon number $n$; 2) In the limit of large $n$, the cross-click events would happen with near certainty. These motivate us to consider the associated cross-click operator
$$F_{CC} = \mathbb{1}_A \otimes M^B_{CC}, \tag{22}$$
with $M^B_{CC}$ being Bob's cross-click POVM element (see Appendix A and B of Ref. [17] for the derivation), and the optimization problem
$$c_{n,\min} := \min_{\rho^{(n)}_{AB}} \mathrm{Tr}\big(\rho^{(n)}_{AB}\, F^{(n)}_{CC}\big), \tag{23}$$
where $F^{(n)}_{CC}$ is the projection of $F_{CC}$ onto the $n$-photon subspace of Bob. Again, we solve this optimization problem using the YALMIP toolbox [25] in MATLAB. The numerical solutions of the optimization problem in Eq. (23) provide strong evidence that the cross-click probability $c_{n,\min}$ increases monotonically with $n$ and converges to 1 for an arbitrary passive-detection efficiency mismatch. We would like to point out that any evaluation of secret-key rates using our approach requires solving an SDP problem, such as those in Eqs. (12), (13) and (23), thus allowing the validation of the working assumption for a chosen mismatch model and parameters. In particular, the numerical evidence for our mismatch model and parameters is shown in Fig. 5, which suggests the following two inequalities
$$c_{n,\min} \ge c_{3,\min}, \quad \forall n \ge 3, \tag{24}$$
and
$$c_{n,\min} \ge c_{2,\min}, \quad \forall n \ge 2. \tag{25}$$
FIG. 5. Minimum cross-click probability $c_{n,\min}$ for the mismatch model of Table II with $\eta_1 = 1$ and $\eta_2 = \eta$. Note the monotonicity of each curve as a function of $n$, supporting the inequalities in Eqs. (24) and (25).
The inequality in Eq. (24) tells us that the cross-click probability $c_{\rm obs}$ observed in practice satisfies
$$c_{\rm obs} \ge \sum_{n \ge 3} p_n\, c_{n,\min} \ge (1 - p_0 - p_1 - p_2)\, c_{3,\min}. \tag{26}$$
Here we used the fact that $\sum_{n=0}^{\infty} p_n = 1$. Thus we obtain a bound $b_2 \le p_0 + p_1 + p_2$ as
$$b_2 = 1 - \frac{c_{\rm obs}}{c_{3,\min}}. \tag{27}$$
Similarly, from Eq. (25) we can obtain a bound $b_1 \le p_0 + p_1$ as
$$b_1 = 1 - \frac{c_{\rm obs}}{c_{2,\min}}. \tag{28}$$
The above bounds $b_1$ and $b_2$, together with the flag-state squasher approach for the corresponding subspaces, can be used in the key-rate optimization problem of Eq. (2) when the passive-detection scheme is used.
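The beam-splitter probability $1 - 2^{-(n-1)}$ quoted at the start of this subsection can be verified by brute-force enumeration. The sketch below assumes that each of the $n$ photons independently leaves through either output arm with probability 1/2, and counts the arm assignments in which neither arm is empty.

```python
from itertools import product

def both_arms_occupied(n: int) -> float:
    """Probability that both output arms of a 50/50 beam splitter contain
    at least one of n photons, assuming independent, unbiased routing."""
    if n < 1:
        return 0.0
    # Each tuple assigns every photon to arm 0 or arm 1; all are equally likely.
    hits = sum(1 for arms in product((0, 1), repeat=n) if 0 < sum(arms) < n)
    return hits / 2 ** n

table = {n: both_arms_occupied(n) for n in range(1, 8)}
```

A single photon can never populate both arms, two photons do so half of the time, and the probability approaches one quickly as $n$ grows, which is the behaviour that the cross-click argument relies on.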

IV. SECRET-KEY RATES WITH SIMULATED OBSERVATIONS
As pointed out before, the method developed in Sect. III allows a security analysis of a QKD setup with an arbitrary detection-efficiency mismatch. Any such security analysis requires the determination of constraints on the probability of the state in a subspace containing at most a given number of photons, and then a key-rate lower bound can be obtained using those constraints together with a flag-state squasher. We now illustrate our approach for the specific mismatch models of Tables I and II. As the security analysis usually requires as input some data observed in experiments, we replace here the experiments by simulations according to a simple quantum-optical model. We specify this toy model below, but it is important to point out that this toy model is not part of the security analysis, or in any way an assumption on which our security proof itself is based.

A. Data simulation
We study a BB84 protocol with an ideal single-photon source using polarization encoding. As described in Sect. II, at each round of the protocol Alice prepares one of four possible single-photon polarization states selected uniformly randomly. Bob can use either the active- or passive-detection scheme. In the active-detection scheme, we assume that at each round Bob randomly selects the key-generation basis with probability $p = 1/2$. The single photon prepared by Alice is transmitted through the adversary's domain to Bob. We model the corresponding quantum channel as a depolarizing channel $\Lambda(\rho) = (1 - \omega)\rho + \omega\, \mathbb{1}/2$ with depolarizing probability $\omega$; additionally, the single-photon transmission efficiency over the channel is $t$. In order to introduce multiple detector clicks, Eve intercepts the single photon in our channel model with probability $r$ and resends multiple photons to Bob. Specifically, Eve resends $m$ randomly polarized photons in the state
$$\rho_m = \frac{1}{2\pi} \int_0^{2\pi} d\theta\, \frac{1}{m!}\, (\hat{a}^\dagger_\theta)^m\, |0\rangle\langle 0|\, (\hat{a}_\theta)^m.$$
Here, the photon-creation operator $\hat{a}^\dagger_\theta$ is given in terms of the operators $\hat{a}^\dagger_H$ and $\hat{a}^\dagger_V$ of the respective linear polarizations as $\hat{a}^\dagger_\theta = \cos(\theta)\, \hat{a}^\dagger_H + \sin(\theta)\, \hat{a}^\dagger_V$. In our simulations, we will choose the photon number $m = 2$.
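The depolarizing part of this toy channel is simple to reproduce for a single-photon polarization qubit. The sketch below applies $\Lambda(\rho) = (1-\omega)\rho + \omega\,\mathbb{1}/2$ to the state $|H\rangle\langle H|$ and reads off the probability of finding the photon in the orthogonal polarization; transmission loss and Eve's multi-photon resending are deliberately left out of this illustration.

```python
import numpy as np

def depolarize(rho: np.ndarray, omega: float) -> np.ndarray:
    """Qubit depolarizing channel: keep rho with weight 1 - omega and
    replace it by the maximally mixed state with weight omega."""
    return (1 - omega) * rho + omega * np.trace(rho) * np.eye(2) / 2

H = np.array([1.0, 0.0])
V = np.array([0.0, 1.0])

omega = 0.1
rho_out = depolarize(np.outer(H, H), omega)

# Probability of the wrong polarization in the H/V basis.
error_prob = V @ rho_out @ V
```

For this channel the error probability in either basis equals $\omega/2$, so a depolarizing probability of 0.1 corresponds to a 5% error rate on single-photon events.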
When applying the flag-state squasher approach, we choose to separate either the (n ≤ 1)-photon or the (n ≤ 2)-photon subspace from their respective complements. In our efficiency-mismatch models we consider several spatio-temporal modes, in addition to the polarization mode. In our toy quantum channel, we additionally assume that the optical signals are uniformly randomly distributed over all considered spatio-temporal modes.
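As a small illustration of the depolarizing part of the toy channel, the following sketch applies $\Lambda$ to each of the four BB84 polarization states and evaluates the probability of the orthogonal (erroneous) outcome in the preparation basis. It is only a sketch under simplifying assumptions that are not part of the simulation described above: a single photon, ideal detectors, no loss, and no intercept-resend by Eve. The function names are hypothetical.

```python
import math

# Polarization angles of the four BB84 states: H, V, D, A.
BB84_ANGLES = [0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4]

def projector(theta):
    """2x2 projector onto linear polarization at angle theta (H/V basis)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, c * s], [c * s, s * s]]

def depolarize(rho, omega):
    """Apply Lambda(rho) = (1 - omega) * rho + omega * identity / 2."""
    half_identity = [[0.5, 0.0], [0.0, 0.5]]
    return [[(1 - omega) * rho[i][j] + omega * half_identity[i][j]
             for j in range(2)] for i in range(2)]

def trace_product(a, b):
    """Tr(a b) for 2x2 real matrices."""
    return sum(a[i][j] * b[j][i] for i in range(2) for j in range(2))

def error_probability(theta, omega):
    """Probability of the orthogonal outcome when the state at angle theta
    passes through the depolarizing channel (assumes ideal, lossless detection)."""
    rho_out = depolarize(projector(theta), omega)
    return trace_product(rho_out, projector(theta + math.pi / 2))

if __name__ == "__main__":
    for angle in BB84_ANGLES:
        print(angle, error_probability(angle, omega=0.05))
```

Under these assumptions every state yields the same error probability, ω/2, because only the maximally mixed component of the output contributes to the wrong outcome.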

B. Key rates in the absence of mismatch: Trade-offs between transmission efficiency and detection efficiency
As mentioned in the introduction, when there is no efficiency mismatch between the detectors used in the measurement device, one can pull the detection inefficiency out of the detectors and into the channel action, creating an effective transmission loss. Consequently, the measurement device is then described by an ideal-detector setup for which a squashing model [18][19][20] exists, and so one can execute a full security proof. However, the resulting key rate might be overly conservative: the existing security proof assumes that both the photon loss during the actual transmission and the loss due to detection inefficiency can be manipulated by Eve, whereas under the original description of Bob's measurement device the photon loss inside the device cannot be accessed by Eve. This fact has been explicitly pointed out in the literature, for example in Ref. [26]. So while it is known that this is an overly pessimistic assumption, proof techniques were missing to treat the security under the assumption that the detection efficiency is not accessible to Eve. We can tackle this question now with the techniques developed in this work.
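The lumping step rests on the fact that two successive loss stages, each removing photons independently, act on the photon-number statistics like a single stage with the product efficiency. The sketch below checks this for a two-photon input. It assumes a simple binomial loss model that ignores polarization and the internal structure of the detection setup; the function name `apply_loss` and the values t = 0.5 and η = 0.2 are illustrative choices.

```python
from math import comb

def apply_loss(distribution, efficiency):
    """Photon-number distribution after a loss stage.

    distribution[n] is the probability of n photons entering the stage;
    each photon survives independently with probability `efficiency`.
    """
    out = [0.0] * len(distribution)
    for n, p_n in enumerate(distribution):
        for k in range(n + 1):
            out[k] += p_n * comb(n, k) * efficiency**k * (1 - efficiency)**(n - k)
    return out

t, eta = 0.5, 0.2                  # illustrative transmission and detection efficiencies
two_photons = [0.0, 0.0, 1.0]      # exactly two photons enter the channel

separate = apply_loss(apply_loss(two_photons, t), eta)  # channel loss, then detector loss
lumped = apply_loss(two_photons, t * eta)               # one effective loss stage

if __name__ == "__main__":
    print(separate)
    print(lumped)
```

In this model the two distributions agree, so the choice between the two descriptions does not change the simulated statistics; it only changes which part of the loss the security proof attributes to Eve.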
With our numerical method, we can prove the security of a QKD protocol with arbitrary measurement operators as long as they are well characterized. In particular, we can characterize the detection efficiency of each detector in a measurement device, and so we can determine the corresponding measurement operators (see Appendices A and B of Ref. [17]). In this way, we can study the individual effects of transmission efficiency and detection efficiency on the secret-key rate. To demonstrate these effects, for this particular result we assume for simplicity that each optical signal arriving at Bob contains no more than two photons, rather than using our flag-state squasher approach.
The results are shown in Fig. 6. From this figure, one can see that, for a fixed total photon loss over both transmission and detection, Alice and Bob can distill more secret key if they treat detection inefficiency and transmission loss separately rather than lumping these two kinds of loss together in the security proof. In particular, when the product tη is fixed, the higher the value of t, the higher the secret-key rate. On the other hand, when t and η are lumped together as an effective transmission efficiency tη, our numerical method provides the same key-rate lower bound $\frac{1}{4}\, p_{\mathrm{det}}\, (1 - 2h(e))$ (see the results plotted in Fig. 6 for η = 1) as the standard security proofs that use the squashing model [18][19][20] to treat multiple-detection events. Here $p_{\mathrm{det}}$ is the detection probability in the key-generation basis, e is the qubit error rate, and h is the binary entropy function.
[Caption of Fig. 6] ... Fig. 1. We consider both the active- and passive-detection schemes. For data simulation, we fix the depolarizing probability ω = 0.05, the multi-photon probability r = 0.05, and the product of transmission efficiency t and detection efficiency η to be tη = 0.1. We choose these values just for ease of graphical illustration. We remark that under each detection scheme, the probability distribution observed by Alice and Bob does not change with η as long as the simulation parameters ω, r and tη are fixed.
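The lumped-loss bound $\frac{1}{4}\, p_{\mathrm{det}}\, (1 - 2h(e))$ quoted above is simple to evaluate directly. The following sketch does so; the function names are hypothetical, the inputs in the usage lines are illustrative rather than values taken from Fig. 6, and clipping negative values to zero is a convention added here (no key is distilled when the expression is negative).

```python
import math

def binary_entropy(e):
    """Binary entropy h(e) in bits, with h(0) = h(1) = 0."""
    if e <= 0.0 or e >= 1.0:
        return 0.0
    return -e * math.log2(e) - (1 - e) * math.log2(1 - e)

def lumped_key_rate(p_det, e):
    """Evaluate (1/4) * p_det * (1 - 2 h(e)), clipped at zero.

    p_det: detection probability in the key-generation basis.
    e:     qubit error rate.
    """
    return max(0.0, 0.25 * p_det * (1 - 2 * binary_entropy(e)))

if __name__ == "__main__":
    # Illustrative inputs only.
    for error_rate in (0.0, 0.025, 0.05, 0.11, 0.12):
        print(error_rate, lumped_key_rate(p_det=0.1, e=error_rate))
```

The expression reaches zero where h(e) = 1/2, that is, at an error rate of roughly 11%.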
We also performed numerical calculations, not presented here, which show that the higher the multi-photon probability r, the more significant the improvement in the secret-key rate achieved by separating t and η in the security proof. In particular, we observed that when the optical signal has no multi-photon component (i.e., r = 0), the secret-key rate is independent of η as long as ω and tη are fixed. In practice, however, multiple-detection events occur due to sources that emit multi-photon states, crosstalk in fibers, or dark counts in detectors.

C. Key rates with active-detection efficiency mismatch
Let us study the dependence of the secret-key rate on the detection-efficiency mismatch in the active-detection scheme. We consider two scenarios: in the one-mode scenario, all photons received by Bob are in the same spatial-temporal mode, and the two detectors labelled 'H/D' and 'V/A' in Fig. 1(a) have efficiencies $\eta_1$ and $\eta_2$, respectively; in the two-mode scenario, the photons received by Bob can be in one of two possible spatial-temporal modes. The efficiency mismatch for the combinations of spatial-temporal modes and polarization detectors is shown in Table I. For security proofs, we use and compare two different assumptions/techniques to deal with potential multi-photon signals arriving at Bob's detectors: we either assume that each signal received by Bob contains no more than two photons, or we prove security without such an assumption. In the latter case we apply a flag-state squasher using the (≤ 2)-photon subspace and its complementary subspace, and in the key-rate optimization problem of Eq. (2) we incorporate the lower bounds $b_1$ and $b_2$ on the photon-number probabilities $(p_0 + p_1)$ and $(p_0 + p_1 + p_2)$. These bounds are based on observations and are discussed in Sect. III C (see Eqs. (19) and (21)).
[Caption of Fig. 7] ... Fig. 1(a). For data simulation, we fix the detection efficiency of the detector labelled 'H/D' (for the signals in the first spatial-temporal mode) to $\eta_1 = 0.2$. We also fix the depolarizing probability ω = 0.05, the multi-photon probability r = 0.05, and the transmission efficiency t = 0.5 (corresponding to 3 dB loss). We remark that for the active-detection scheme the key rate scales linearly with the probability p for Bob to select the key-generation basis when the other simulation parameters are fixed.
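To make the one-mode mismatch scenario concrete, the following sketch computes the click probabilities for a single photon measured in the H/V basis by two detectors with efficiencies $\eta_1$ and $\eta_2$. It is a simplified illustration and not the measurement model used in the security proof: it assumes exactly one incoming photon, no dark counts, and a single spatial-temporal mode. The function names and the value $\eta_2 = 0.1$ in the usage lines are hypothetical.

```python
import math

def single_photon_click_probs(theta, eta_1, eta_2):
    """Outcome probabilities for one photon linearly polarized at angle theta,
    measured in the H/V basis.

    eta_1: efficiency of the detector for H; eta_2: efficiency of the detector for V.
    Returns (P[H click], P[V click], P[no click]).
    """
    p_h = math.cos(theta) ** 2 * eta_1
    p_v = math.sin(theta) ** 2 * eta_2
    return p_h, p_v, 1.0 - p_h - p_v

def h_fraction_for_diagonal_input(eta_1, eta_2):
    """Fraction of detected events that click 'H' for a diagonally polarized photon."""
    p_h, p_v, _ = single_photon_click_probs(math.pi / 4, eta_1, eta_2)
    return p_h / (p_h + p_v)

if __name__ == "__main__":
    print(h_fraction_for_diagonal_input(0.2, 0.2))  # no mismatch
    print(h_fraction_for_diagonal_input(0.2, 0.1))  # mismatched detectors
```

Without mismatch the detected outcomes for a diagonal input are unbiased; with $\eta_1 \neq \eta_2$ they are skewed towards the more efficient detector, which is the kind of asymmetry an adversary can try to exploit.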
The typical results are shown in Fig. 7. We can directly make several observations from Fig. 7:
1. The larger the efficiency mismatch, the lower the secret-key rate. There exists a threshold for the efficiency mismatch beyond which it is not possible for Alice and Bob to distill secret keys.
2. Making assumptions on Eve's attack strategy, such as assuming that no more than two photons are resent from Eve to Bob, can overestimate the true secret-key rate computed according to the analysis without that assumption.
3. The spatial-temporal-mode-dependent mismatch helps Eve to attack the QKD system. Our results show that Eve's corresponding freedom to manipulate the detection efficiencies decreases the secret-key rate.
4. If there is no efficiency mismatch, then the secret-key rate is the same whether we consider one or two spatial-temporal modes. Note that in this case the lower bounds $b_1$ and $b_2$ in Eqs. (21) and (19) are independent of the number of spatial-temporal modes, and so is the key-rate optimization problem in Eq. (2).

D. Key rates with passive-detection efficiency mismatch
As in the active-detection scheme, we consider two scenarios: in the single-mode scenario, all photons received by Bob are in the same spatial-temporal mode, and the four detectors labelled 'H', 'V', 'D' and 'A' in Fig. 1(b) have efficiencies $\eta_1$, $\eta_2$, $\eta_2$, $\eta_2$, respectively; in the four-mode scenario, the photons received by Bob can be in one of four possible spatial-temporal modes. The efficiency mismatch in the four spatial-temporal modes is shown in Table II. In the security proofs, we again compare the flag-state squasher approach with the photon-number cut-off assumption. Note that for the case with one spatial-temporal mode, we apply a flag-state squasher that uses the photon-number subspace containing at most two photons and its complementary subspace, and at the same time we incorporate the lower bounds $b_1$ and $b_2$ in Eqs. (28) and (27). For the case with four spatial-temporal modes, we instead apply a flag-state squasher using the subspace of at most one photon, and exploit the corresponding bound $b_1$ on the probability in that subspace. We do not use the tighter approach based on the subspace containing at most two photons, due to the complexity of the corresponding key-rate optimization problem in the presence of four spatial-temporal modes.
The typical results are shown in Fig. 8. As in the active-detection case, the results suggest that the larger the efficiency mismatch, the lower the secret-key rate. When the efficiency mismatch is large enough, it is not possible for Alice and Bob to distill secret keys. The results also suggest that spatial-temporal-mode-dependent mismatch helps Eve to attack the QKD system.
[Caption of Fig. 8] ... Fig. 1(b). For data simulation, we fix the detection efficiency of the detector labelled 'H' (for the signals in the first spatial-temporal mode) to $\eta_1 = 0.2$. We also fix the depolarizing probability ω = 0.05, the multi-photon probability r = 0.05, and the transmission efficiency t = 0.5 (corresponding to 3 dB loss).
We remark that one cannot straightforwardly use Figs. 7 and 8 to compare the robustness of the active- and passive-detection schemes against efficiency mismatch. The reasons are as follows. First, there is no one-to-one correspondence between the two mismatch models given in Tables I and II for the active- and passive-detection schemes, respectively. Second, for spatial-temporal-mode-dependent mismatch, in the active-detection scheme we considered two spatial-temporal modes and used both of the lower bounds on the photon-number probabilities $(p_0 + p_1)$ and $(p_0 + p_1 + p_2)$, whereas in the passive-detection scheme we considered four spatial-temporal modes and used only the lower bound on the photon-number probability $(p_0 + p_1)$. The more constraints on the photon-number distribution, the higher the secret-key rate certified by our method. We emphasize that here we have developed a general method for proving security of practical QKD protocols with efficiency mismatch. How to optimize our method and improve the certified secret-key rates will require future study.

V. CONCLUSION
The security proof of QKD usually assumes that the threshold detectors used have the same detection efficiency. However, in practice, their detection efficiencies can show a mismatch, either due to manufacturing and setup or due to the influence of an adversary (for example, through control of the spatio-temporal-mode-dependent coupling of an optical signal to a detector). In this work we present an approach that allows one to lower-bound the secret-key rate of a QKD setup with an arbitrary but specified detection-efficiency mismatch. We formulate the key-rate calculation as a convex-optimization problem. In order to prove security without relying on a cutoff of photon numbers in the optical signal, we exploit the bounds on the photon-number distribution obtained from semi-definite programs (SDPs), and simplify the key-rate optimization problem by introducing a flag-state squashing map. The SDP optimization problems are formulated with the projections of measurement operators onto various photon-number subspaces. These projections, and hence the obtained key-rate lower bounds, depend on the characterized efficiency mismatch.
We illustrate the power of our method with numerical simulations, demonstrating that it remains numerically tractable even in the presence of spatial-temporal-mode-dependent mismatch. Our method is especially applicable to free-space QKD, where spatial-temporal-mode-dependent mismatch can be easily induced by an adversary, as demonstrated in Refs. [5,6].
Moreover, with our method, one can clearly see the individual effects of transmission loss and detection inefficiency on the secret-key rate (see Fig. 6). In the particular case of no mismatch, the simulation results show that our method provides a tighter lower bound on the secret-key rate than the squashing model [18][19][20]