Entanglement-Assisted Absorption Spectroscopy

Spectroscopy is an important tool for probing the properties of materials, chemicals and biological samples. We design a practical transmitter-receiver system that exploits entanglement to achieve a provable quantum advantage over all spectroscopic schemes based on classical sources. To probe the absorption spectra, modelled as a pattern of transmissivities among different frequency modes, we employ broad-band signal-idler pairs in two-mode squeezed vacuum states. At the receiver side, we apply photodetection after optical parametric amplification. Finally, we perform a maximum-likelihood decision test on the measurement results, achieving orders-of-magnitude-lower error probability than the optimum classical systems in various examples, including 'wine-tasting' and 'drug-testing' where real molecules are considered. In detecting the presence of an absorption line, our quantum scheme achieves the optimum performance allowed by quantum mechanics. The quantum advantage in our system is robust against noise and loss, which makes near-term experimental demonstration possible.

In this work, we investigate entanglement-assisted absorption spectroscopy (EAAS) as an effective means to achieve a provable quantum advantage over all schemes using classical sources. As depicted in Fig. 1, EAAS uses a source of multichromatic entangled signal-idler mode pairs from a nonlinear medium, each being in a two-mode squeezed vacuum (TMSV) state and anti-correlated in the frequency domain. The signals with different frequencies interact with the sample and experience absorption differently, while the idlers are stored locally. Then an optical parametric amplifier (OPA) is applied on the return signal-idler pairs, followed by photodetection to classify samples among multiple possibilities. EAAS achieves a strict quantum advantage in the discrimination of arbitrary absorption patterns. Before addressing the general case, we begin with two basic models: absorption detection, the binary testing of a single absorption line at a specific frequency, and peak positioning, which pinpoints a given number of absorption lines within a frequency spectrum. Then we consider the classification of several large organic molecules, and use real spectrum data [40] to simulate the performance against the optimum classical performance in 'wine-tasting' and 'drug-testing'. Let us remark that all components in our EAAS are off-the-shelf, and the quantum advantage is robust against excess noise and idler storage loss, making experimental implementations possible in the near term.
* zhuangquntao@email.arizona.edu
Pattern recognition on absorption spectra.-In absorption spectroscopy [26], each specific composition is associated with a unique absorption spectrum, determined by measuring the transmissivities across the spectrum of the input light. Therefore, the overall problem of composition identification can be formulated as hypothesis testing between several known patterns of frequency-dependent transmissivities, as formulated below.
The multichromatic input light is decomposed into $m$ discrete frequency modes, denoted by the annihilation operators $\{a_\ell\}_{\ell=1}^{m}$. The input-output relation for each mode $a_\ell$ is modelled as a thermal loss channel $\mathcal{L}_{\kappa_\ell,N_B}$ [1] described by the Bogoliubov transformation
$a_\ell \to \sqrt{\kappa_\ell}\,a_\ell + \sqrt{1-\kappa_\ell}\,e_\ell$, (1)
where $\kappa_\ell$ is the transmissivity and $e_\ell$ is a thermal mode with mean photon number $N_B/(1-\kappa_\ell)$ that models the environmental thermal noise, which is negligible ($N_B \sim 0$) at optical wavelengths. However, to demonstrate the robustness of the quantum advantage, $N_B > 0$ is considered for generality.
Figure 1. Diagram of the entanglement-assisted absorption spectroscopy. The source generates multichromatic entangled signal-idler pairs via a nonlinear process. The signals interact with the sample and then go through another nonlinear process jointly with the idlers at the quantum receiver. Photodetection extracts the absorption coefficients at different frequencies.
The resulting pattern of transmissivities $\{\kappa_\ell\}_{\ell=1}^{m}$ reveals the sample's absorption spectrum. We usually have prior information about the possible patterns; the task is therefore to discriminate between $H$ patterns, each described by a vector $\boldsymbol{\kappa}^{(h)} = \{\kappa_\ell^{(h)}\}_{\ell=1}^{m}$ of transmissivities, where $1 \le h \le H$ is the index of the hypotheses and $\ell$ is the frequency-mode index (see Ref. [42] for a channel formulation). In general, we allow $M$ repetitions of the probing attempt to make a decision.
Before addressing the general pattern-recognition problem described above (see Fig. 1), we consider two simplified problems: absorption detection and peak positioning.
In absorption detection, the goal is to determine whether absorption occurs at a single frequency mode ($m = 1$); there are therefore $H = 2$ hypotheses, with transmissivities $\kappa_B$ and $\kappa_T$ corresponding to the absence and the occurrence of absorption. In peak positioning, one aims to pinpoint a single absorption peak (target) within $m$ frequencies, so we have $H = m$ possible patterns. Each pattern $h$ has a single absorption peak with transmissivity $\kappa_T$ for frequency mode $a_h$, while all other frequency modes see a background transmissivity $\kappa_B$, i.e., $\kappa_\ell^{(h)} = \kappa_T$ if $\ell = h$, and $\kappa_B$ otherwise.
The problem of absorption detection can be generalized to finding the positions of $k$ absorption peaks in a spectrum of $m$ frequencies, which we call '$k$-peak positioning'. In this more general problem, $k$ targets with transmissivity $\kappa_T$ are hidden among $m - k$ backgrounds of transmissivity $\kappa_B$, so that we have a total of $H = C_m^k$ hypotheses, where $C_m^k$ is the binomial coefficient $m$-choose-$k$. Note that, while we consider these simple examples to introduce our results, our methodology applies to the recognition of arbitrary patterns, such as the complex molecules considered at the end of this paper.
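As a small illustration of the hypothesis set in $k$-peak positioning, the following sketch enumerates all patterns with $k$ targets among $m$ modes and checks that their number equals the binomial coefficient. The function name `k_peak_patterns` and the numerical values are illustrative choices, not part of the original work.

```python
from itertools import combinations
from math import comb


def k_peak_patterns(m, k, kappa_T, kappa_B):
    """Enumerate every transmissivity pattern with k targets among m modes.

    Each pattern is a tuple of length m holding kappa_T at the peak
    positions and kappa_B everywhere else.
    """
    patterns = []
    for peaks in combinations(range(m), k):
        peak_set = set(peaks)
        patterns.append(
            tuple(kappa_T if mode in peak_set else kappa_B for mode in range(m))
        )
    return patterns


# Illustrative values: the transmissivities used later in the text.
patterns = k_peak_patterns(m=4, k=2, kappa_T=0.75, kappa_B=0.95)
print(len(patterns), comb(4, 2))
```

Setting `k=1` recovers single-peak positioning with $H = m$ patterns.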
Classical lower bounds.-In a classical spectroscopy scheme, one sends an arbitrary mixture of coherent states as the input state. Given a mean total of $mMN_S$ photons at the input, where $N_S$ is the mean photon number per frequency mode, the minimum error probability affecting the discrimination between the ensemble of patterns is lower bounded as in Eq. (2), where $\nu_B = 1/(1 + 2N_B)$ and the minimization is under the energy constraint $\sum_{\ell=1}^{m} X_\ell \le mMN_S$ (see [42] for a proof). Applying Eq. (2) to the absorption detection case, we obtain the lower bound $P_{C,1,LB} = e^{-\nu_B M N_S(\sqrt{\kappa_B} - \sqrt{\kappa_T})^2}/4$. In this case, a slightly improved bound, Eq. (3), can be obtained [4]. Specifying Eq. (2) to the problem of $k$-peak positioning, one obtains Eq. (4) [42], where $w_{m,k} = k\,C_{m-1}^{k}/(C_m^k - 1)$. The latter term is equal to 1 for a single peak, and $w_{m,k} \simeq k(1 - k/m)$ for $k$ peaks. When $N_B = 0$, the lower bound is tight in the error exponent for absorption detection and 1-peak positioning.
Figure 2. Schematic of the receiver. Signal beams are irradiated over the sample, modeled by frequency-dependent transmissivities. The modulated beams go through a single optical parametric amplifier (OPA). Finally, spectrally resolving photodetection (PD) yields a $2mM$-dimensional count, based on which the maximum-likelihood decision $\hat{h}$ is made.
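The two closed-form quantities quoted in this paragraph can be evaluated directly. The sketch below computes the absorption-detection bound $e^{-\nu_B M N_S(\sqrt{\kappa_B}-\sqrt{\kappa_T})^2}/4$ and the weight $w_{m,k}$, and compares the latter with its approximation $k(1-k/m)$. The function names and sample parameters are assumptions made for illustration; Eqs. (2)-(4) themselves are not reproduced.

```python
import math
from math import comb


def classical_detection_bound(M, N_S, kappa_B, kappa_T, N_B=0.0):
    """Lower bound exp(-nu_B * M * N_S * (sqrt(kB) - sqrt(kT))^2) / 4."""
    nu_B = 1.0 / (1.0 + 2.0 * N_B)
    gap = math.sqrt(kappa_B) - math.sqrt(kappa_T)
    return math.exp(-nu_B * M * N_S * gap ** 2) / 4.0


def peak_weight(m, k):
    """w_{m,k} = k * C(m-1, k) / (C(m, k) - 1); requires C(m, k) > 1."""
    return k * comb(m - 1, k) / (comb(m, k) - 1)


print(classical_detection_bound(M=100, N_S=1.0, kappa_B=0.95, kappa_T=0.75))
print(peak_weight(20, 3), 3 * (1 - 3 / 20))  # exact weight vs. approximation
```

Thermal noise ($N_B > 0$) lowers $\nu_B$ and therefore slows the decay of the bound with $M N_S$.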
Entanglement-assisted strategy.-To achieve a quantum advantage, we exploit entanglement at the input, as given by $M$ copies of a TMSV state $\phi_{ME}$ for each signal-idler pair [42]. Each idler mode is stored locally, with imperfections modelled as a pure-loss channel $\mathcal{L}_{\kappa_I,0}$ of transmissivity $\kappa_I$ (with the mode transformation $a_I \to \sqrt{\kappa_I}\,a_I + \sqrt{1-\kappa_I}\,v$, $v$ being a vacuum mode), while the signal modes are sent to probe the patterns. For the special binary case of absorption detection, the error probability is bounded by the asymptotically tight quantum Chernoff bound (QCB), which can be efficiently calculated [2,3] from the return Gaussian states $\Xi^{(T/B)}$, composed of $M$ identical copies of $(\mathcal{L}_{\kappa_{T/B},N_B} \otimes \mathcal{L}_{\kappa_I,0})(\phi_{ME})$. For the general pattern case, a simple tool like the QCB is missing and, for this reason, we need to design an explicit receiver that is able to show a quantum advantage.
Entanglement-assisted receiver design.-We begin our description of the receiver design with a simple case, so as to convey its basic modus operandi. Consider the ideal case of $\kappa_B = \kappa_I = 1$ and $N_B = 0$. Then the returned state $\Xi^{(B)} = \phi_{ME}^{\otimes M}$ consists of $M$ copies of the ideal TMSV state (while $\Xi^{(T)}$ is mixed because $\kappa_T < 1$). Suppose that we perform a two-mode squeezing (TMS) operation $S$ (via an OPA) that precisely anti-squeezes each TMSV state $\phi_{ME}$. Then we can 'null' $S(\Xi^{(B)})$ to a tensor product of vacua, while $S(\Xi^{(T)})$ is non-vacuum. Therefore, simple photon counting on all the signals and idlers after the TMS operation identifies the input state as $\Xi^{(T)}$ whenever any photon is counted. Errors only occur if we obtain a zero count on $S(\Xi^{(T)})$: when this happens, we can only guess randomly, with an error probability $R_m$. Note that this nulling strategy has been used in classical schemes [42], whose performance is bounded by Eqs. (3) and (4). An OPA has also been utilized in quantum illumination [52], however without exploiting correlations in the patterns.
Let us use a compact notation, where $m = 1$ corresponds to absorption detection, for which $R_1 = 1/2$, and $m \ge 2$ corresponds to single-peak positioning, with one copy of $S(\Xi^{(T)})$ among $m - 1$ copies of $S(\Xi^{(B)})$, so that $R_m = (m-1)/m$. Accounting for the zero counts, the error probability for absorption detection ($m = 1$) and single-peak positioning ($m \ge 2$) is given by Eq. (5) [42]. When $N_S \ll 1$ and $M \gg 1$, we have $P_{E,m} \simeq R_m \exp[-2MN_S(1 - \sqrt{\kappa_T})]$, which can be compared with the classical lower bounds in Eqs. (3) and (4).
Figure 4 (caption, partial). $M$ is chosen to fix the classical lower bounds to 0.01 [42]. $N_S = 1$ is assumed. The red-crossed diagonal region in (a)(b) represents the degenerate case $\kappa_B = \kappa_T$.
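To make the comparison of error exponents concrete, the sketch below evaluates the asymptotic entanglement-assisted expression $R_m \exp[-2MN_S(1-\sqrt{\kappa_T})]$ next to the classical absorption-detection bound $e^{-MN_S(1-\sqrt{\kappa_T})^2}/4$, both in the ideal setting $\kappa_B = 1$, $N_B = 0$. The function names and the sample parameters are assumptions for illustration, and the asymptotic form is only meant for the small-$N_S$, large-$M$ regime.

```python
import math


def ea_asymptotic_error(m, M, N_S, kappa_T):
    """R_m * exp(-2 M N_S (1 - sqrt(kappa_T))), ideal case kappa_B = kappa_I = 1."""
    R_m = 0.5 if m == 1 else (m - 1) / m
    return R_m * math.exp(-2.0 * M * N_S * (1.0 - math.sqrt(kappa_T)))


def classical_bound_ideal(M, N_S, kappa_T):
    """exp(-M N_S (1 - sqrt(kappa_T))^2) / 4, i.e. kappa_B = 1 and N_B = 0."""
    return math.exp(-M * N_S * (1.0 - math.sqrt(kappa_T)) ** 2) / 4.0


def exponent_ratio(kappa_T):
    """Ratio of the entanglement-assisted to the classical error exponent."""
    return 2.0 / (1.0 - math.sqrt(kappa_T))


M, N_S, kappa_T = 10_000, 0.01, 0.75
print(ea_asymptotic_error(1, M, N_S, kappa_T))
print(classical_bound_ideal(M, N_S, kappa_T))
print(exponent_ratio(kappa_T))
```

The ratio $2/(1-\sqrt{\kappa_T})$ grows as $\kappa_T \to 1$, so under these assumptions the gap in exponents is largest for weak absorption lines.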
In fact, Eq. (5) achieves the QCB [42] and is therefore optimal for absorption detection. The above receiver design, and the resulting entanglement advantage, can be generalized to cope with more complex spectrum patterns and with the presence of noise and idler loss ($N_B > 0$, $\kappa_I < 1$), as described by the following strategy (see Fig. 2): (i) Apply a TMS operation with gain $G_\ell$ to each of the return signal-idler pairs $\{a_{S_\ell}, a_{I_\ell}\}$ to obtain new modes. (ii) Perform a photon counting measurement on all of the resulting signal and idler modes to obtain the results as two vectors $\mathbf{n}_S$ and $\mathbf{n}_I$. (iii) Finally, apply the maximum-likelihood (ML) decision rule, i.e., make the decision $\hat{h}$ through
$\hat{h} = \arg\max_h P_m(\mathbf{n}_S, \mathbf{n}_I | h)$, (6)
where $P_m(\mathbf{n}_S, \mathbf{n}_I | h)$ is the conditional probability of obtaining the outcomes $\mathbf{n}_S, \mathbf{n}_I$ if the true hypothesis is $h$.
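The maximum-likelihood step (iii) can be sketched generically as an arg-max over summed log-likelihoods. The example below is a simplified stand-in: it uses a single count per mode and a hypothetical Poisson model `poisson_log_pmf` whose mean depends on the transmissivity, not the joint signal-idler statistics of the actual receiver. All names and numbers are illustrative assumptions.

```python
import math


def poisson_log_pmf(n, kappa, brightness=4.0, dark=0.05):
    """Hypothetical toy model: Poisson counts whose mean grows with absorption."""
    mean = brightness * (1.0 - kappa) + dark
    return n * math.log(mean) - mean - math.lgamma(n + 1)


def ml_decision(counts, patterns, log_pmf=poisson_log_pmf):
    """Return the index h that maximizes the total log-likelihood.

    counts[l] holds the M photon counts recorded for frequency mode l;
    patterns[h][l] is the transmissivity of mode l under hypothesis h.
    Equal priors are assumed, so the prior does not enter the arg-max.
    """
    def log_likelihood(pattern):
        return sum(
            log_pmf(n, kappa)
            for kappa, mode_counts in zip(pattern, counts)
            for n in mode_counts
        )

    return max(range(len(patterns)), key=lambda h: log_likelihood(patterns[h]))


kappa_T, kappa_B = 0.75, 0.95
patterns = [tuple(kappa_T if l == h else kappa_B for l in range(3)) for h in range(3)]
counts = [[0, 0, 0], [3, 2, 4], [0, 1, 0]]  # M = 3 repetitions per mode
decision = ml_decision(counts, patterns)
print(decision)
```

Working with log-likelihoods avoids numerical underflow when the product over $mM$ terms becomes very small; unequal priors could be handled by adding a log-prior term to each score.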
To complete the description of our receiver, we need to determine the gains $G_\ell$ and specify the conditional probabilities. Let us begin with the cases of absorption detection and peak positioning, where we adopt a uniform gain $G_\ell = G$. The ideal situation is to obtain a quantum state close to vacuum; however, if $\kappa_B < 1$, it is only possible to reduce the signal part of $\Xi^{(B)}$ to vacuum, by choosing $G = 1 + N_S\kappa_B/(1 + N_S(1 - \kappa_B))$. In the presence of noise $N_B > 0$ and idler loss $1 - \kappa_I > 0$, 'nulling' to vacuum is not possible, but the same choice of gain still provides an appreciable advantage over classical schemes. For general patterns, due to the absence of symmetry, we consider optimization over the gains $G_\ell$ at different frequency modes. Moreover, as some frequency windows may contain more essential information about the hypotheses, we also allow optimization over the energy distribution $\{N_{S_\ell}\}$ of the TMSV in different frequency modes. In these cases, although the 'nulling' decision rule does not apply, the ML decision rule in Eq. (6) still leads to an advantage [42].
Now let us compute the conditional probabilities. With $M$ identical repetitions, the probability of obtaining the $mM$-dimensional measurement results $\mathbf{n}_S = \{n_{S_{L,k}}\}_{L=1,k=1}^{M,m}$ and $\mathbf{n}_I = \{n_{I_{L,k}}\}_{L=1,k=1}^{M,m}$, conditioned on pattern $h$, is
$P_m(\mathbf{n}_S, \mathbf{n}_I | h) = \prod_{L=1}^{M}\prod_{\ell=1}^{m} P(n_{S_{L,\ell}}, n_{I_{L,\ell}} | \kappa_\ell^{(h)}, G_\ell, N_{S_\ell})$, (7)
where each term is a function of the subsystem transmissivity $\kappa_\ell^{(h)}$, the TMSV source energy $N_{S_\ell}$ and the gain choice $G_\ell$ [42].
With all these theoretical elements in hand, we can numerically evaluate the error probability $P_{E,m}$ for the problems of absorption detection, peak positioning and general spectrum recognition via Monte Carlo simulations [42]. Although we consider equal priors for simplicity, our ML decision can generally be applied to arbitrary prior probabilities for the patterns.
Detecting and positioning absorption peaks.-In order to investigate the problems of absorption detection and peak positioning, we assume a background transmissivity $\kappa_B = 0.95$ and a target transmissivity $\kappa_T = 0.75$. In particular, we study their error probabilities in terms of the number of modes $M$. For absorption detection, Fig. 3(a) shows that our EA nulling receiver asymptotically achieves the QCB [4], outperforming both the best known receiver, the EA homodyne receiver [3,42], and the classical lower bound of Eq. (3). In fact, we can verify that our receiver asymptotically saturates the QCB for absorption detection with general choices of $\kappa_B$ and $\kappa_T$ [42]. For peak positioning, as shown in Fig. 3(b), our EA receiver is able to outperform the classical lower bound of Eq. (4) by orders of magnitude.
In a practical scenario, we are interested in how much EAAS can enhance the performance when classical schemes fail to perform well. To showcase the advantage, in Fig. 4 we fix the classical lower bounds in Eqs. (3) and (4) to be 0.01 and plot the error probability $P_{E,m}$ of EAAS. We start by tuning the transmissivities $\kappa_B$ and $\kappa_T$ in Fig. 4(a)(b). Then we fix $\kappa_T = 0.75$ and $\kappa_B = 0.95$ and study how the quantum advantage varies with idler loss $1 - \kappa_I$ and noise $N_B$ in Fig. 4(c)(d). The white dashed lines divide the parameter space into regions with and without quantum advantage. We can see that the advantage is remarkable and survives for a large range of parameters, especially when $\kappa_B \approx 1$ as in practice. The robustness of the advantage to imperfections reveals a clear possibility for a near-term experimental demonstration. See [42] for more parameter settings.
General spectrum recognition.-EAAS can also identify actual molecules, each of which is associated with a unique absorption spectrum. As a first example, we begin with 'wine-tasting', where one discriminates three common alcohol-like liquids. Methanol could be lethal if mistaken for ethanol (alcohol). Meanwhile, the alcohol, as time goes by, is dehydrogenated to ethanal, whose concentration indicates the age of a vintage [53]. To consider larger molecules, the second example, 'drug-testing', involves three drugs: phenyl salicylate, methyl salicylate, and benzoic acid. In both examples, a nondestructive testing method is preferred, which we achieve with the extremely weak quantum light source. The transmissivities are taken from real Fourier-transform infrared (FTIR) spectra [40]. These spectra are sampled by averaging them within each of $m = 4$ frequency slots [42].
As the classical benchmark, we calculate the ultimate lower bound using Eq. (2) and the performance of a homodyne receiver on coherent-state input with the same energy distribution (distribution of mean photon number over frequency modes) optimized in Eq. (2). Fig. 5 shows that EAAS with uniform energy distribution and $G_\ell = 1$ (orange) outperforms the homodyne receiver (black solid) in both cases. In drug-testing, EAAS beats the classical lower bound by orders of magnitude, while in wine-tasting this advantage is less pronounced. This is mainly due to the classical lower bound not being tight and the uniform energy distribution being sub-optimal, as we see that EAAS with energy optimization (purple) enables a much larger advantage. Gain optimization only leads to a slight advantage over the energy-optimized EAAS, as is evident in the inset plots of Fig. 5(c)(f); in the noisy case, however, it enables a much better enhancement [42].
Now we address phase noise, which is common in experiments. Phase tracking can typically eliminate the time-invariant phase noise, so the above results directly hold; when phase tracking is not possible, we can model the phase noise by adding $a_\ell \to e^{i\theta_\ell}a_\ell$ in Eq. (1). The random phase $\theta_\ell$ clearly complicates the problem. However, if we choose uniform $G_\ell = 1$ (i.e., not applying the OPA before photodetection), the same results as the orange curves in Fig. 5(c)(f) hold, and the classical performance can only be worse than the current benchmarks (black). Thus, the quantum advantage persists.
Conclusion.-We have devised a near-term feasible EAAS scheme that outperforms any classical strategy in determining the presence and position of spectral absorption peaks. The EAAS scheme saturates the QCB in binary detection of a single absorption line and offers orders-of-magnitude advantage in error probability in the discrimination of sampled spectra of molecules even in the presence of experimental nonidealities.

I. CHANNEL SET-UP
In the main paper, we mainly use input-output relations to describe the quantum channels; here we make the quantum channel notation explicit.
The $H$ patterns being discriminated can each be modelled as a quantum channel from the set $\{\mathcal{J}(\boldsymbol{\kappa}^{(h)})\}_{h=1}^{H}$, acting on $m$ subsystems. Each global channel is a tensor product over the subsystems, in which each factor is associated with $M$ probings of a subsystem $S_\ell$, and $\mathcal{L}_{\kappa,N}$ is a thermal loss channel with transmissivity $\kappa$ and noise $N$.
In the absorption detection case, $H = 2$, $m = 1$, and we denote the two channels as $\Phi^{(B)}$ and $\Phi^{(T)}$. The transmissivities $\kappa_B$ and $\kappa_T$ correspond to the absence (background channel $\Phi^{(B)}$) and the occurrence (target channel $\Phi^{(T)}$) of absorption.
In the peak positioning case, we have $H = m$ possible global channels, where $\mathcal{J}(\boldsymbol{\kappa}^{(h)})$ has a target channel $\Phi^{(T)}$ at subsystem $S_h$ while the rest are transparent backgrounds $\Phi^{(B)}$. The corresponding pattern is therefore described by $\kappa_\ell^{(h)} = \kappa_T$ if $\ell = h$, and $\kappa_B$ otherwise. In the general pattern case, we consider discrimination between an ensemble of channels in which $\mathcal{D}_{\kappa_I} = \mathcal{L}_{\kappa_I,0}^{\otimes M}$ and $\mathcal{L}_{\kappa_I,0}$ is a noiseless lossy channel modelling the imperfections on each idler system $I_\ell$. The return state for each sub-channel is denoted by $\Xi$; for the case of target/background channels, the return states are $\Xi^{(T)}$ and $\Xi^{(B)}$.

A. Covariance matrix derivation
First, we briefly introduce the notion of Gaussian states [S1], whose Wigner functions have a Gaussian shape. An $n$-mode Gaussian state $\rho$ comprising modes $a_k$, $1 \le k \le n$, is fully characterized by the mean and the covariances of the real quadrature field operators $q_k = a_k + a_k^\dagger$ and $p_k = i(a_k^\dagger - a_k)$. Formally, we can define a real $2n$-dimensional vector of operators $x = (q_1, p_1, \cdots, q_n, p_n)$; then the mean is $\bar{x} = \langle x \rangle_\rho$ and the elements of the $2n$-by-$2n$ covariance matrix are given by $\Lambda_{ij} = \frac{1}{2}\langle\{x_i - \bar{x}_i, x_j - \bar{x}_j\}\rangle_\rho$, where $\{,\}$ is the anticommutator and $\langle A \rangle_\rho = \mathrm{Tr}(A\rho)$.

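An important example of a Gaussian state is the TMSV, given by the wave-function
$|\phi\rangle_{SI} = \sum_{n=0}^{\infty}\sqrt{N_S^n/(N_S+1)^{n+1}}\,|n\rangle_S|n\rangle_I$,
where $|n\rangle$ is the number state. From the above wave-function, we can obtain the covariance matrix of a TMSV as
$\Lambda_{\rm TMSV} = \begin{pmatrix} (2N_S+1)I & 2C_0 Z \\ 2C_0 Z & (2N_S+1)I \end{pmatrix}$,
where $I$ and $Z$ are the two-by-two identity and Pauli-$Z$ matrices, and $C_0 = \sqrt{N_S(N_S+1)}$ is the amplitude of the phase-sensitive cross correlation.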
We use multiple copies of the signal-idler pair $\{a_S, a_I\}$ in a TMSV state to probe the sample. To begin with, we consider the case with no phase noise, where each signal goes through the channel $\mathcal{L}_{\kappa_S,N_B}$, giving the output $a_S \to \sqrt{\kappa_S}\,a_S + \sqrt{1-\kappa_S}\,e$, where the mode $e$ is in a thermal state with mean photon number $N_B/(1 - \kappa_S)$. Meanwhile, the idler mode $a_I$ goes through a pure-loss channel $\mathcal{L}_{\kappa_I,0}$, $a_I \to \sqrt{\kappa_I}\,a_I + \sqrt{1-\kappa_I}\,v$, where the environment mode $v$ is in the vacuum state. The state $\rho_{\Lambda_0(\kappa_S)}$ of each signal-idler pair $\{a_S, a_I\}$ has the covariance matrix of Eq. (S14), with the signal-idler cross correlation $C_p = \sqrt{\kappa_S\kappa_I N_S(1+N_S)}$. In the case of absorption detection, we have $\kappa_S = \kappa_B$ or $\kappa_S = \kappa_T$. With each Gaussian mode-pair of the return state specified by Eq. (S14), we can obtain the QCB on the error probability of the binary hypothesis testing through the methods in Ref. [S2]. The QCB is asymptotically tight as the mode number $M \to \infty$ and can be efficiently calculated in our case [S2, S3].
On the receiver side, we apply a two-mode squeezing process, parametrized by the gain $G \ge 1$, to the returned signal-idler mode pairs $\{a_S, a_I\}$. The resulting modes are in a Gaussian state with the covariance matrix of Eq. (S18). Denoting $G = 1 + N_S'$ with the effective photon number $N_S'$, we obtain the variances of the signal and idler and the signal-idler correlation. With the gain chosen as in the main paper, the signal mode becomes vacuum. We will choose this gain in the absorption detection and peak positioning cases, with the exception of Sec. VII B, in which we optimize the gain. For the general pattern recognition of molecules, we will consider both the uniform gain $G = 1$ and optimization over the gain.
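The nulling property can be checked numerically at the level of covariance matrices. The sketch below assumes the vacuum-noise-equals-one quadrature convention, the ordering $(q_S, p_S, q_I, p_I)$, and a two-mode squeezer acting as $a_S \to \sqrt{G}\,a_S - \sqrt{G-1}\,a_I^\dagger$ (the sign is chosen here so that it anti-squeezes the source). These conventions, the helper names, and the sample parameters are assumptions for illustration; only the gain formula $G = 1 + N_S\kappa_B/(1+N_S(1-\kappa_B))$ is taken from the text.

```python
import math


def two_mode_matrix(a, e, c):
    """4x4 matrix [[a*I, c*Z], [c*Z, e*I]] in the (q_S, p_S, q_I, p_I) ordering."""
    return [[a, 0.0, c, 0.0],
            [0.0, a, 0.0, -c],
            [c, 0.0, e, 0.0],
            [0.0, -c, 0.0, e]]


def matmul(A, B):
    """Plain 4x4 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def return_covariance(N_S, kappa_S, kappa_I=1.0, N_B=0.0):
    """Covariance of a TMSV after thermal loss on the signal and pure loss on the idler."""
    signal = 2.0 * (kappa_S * N_S + N_B) + 1.0
    idler = 2.0 * kappa_I * N_S + 1.0
    cross = 2.0 * math.sqrt(kappa_S * kappa_I * N_S * (N_S + 1.0))
    return two_mode_matrix(signal, idler, cross)


def nulling_gain(N_S, kappa_B):
    """Gain quoted in the text for nulling the signal part of the background state."""
    return 1.0 + N_S * kappa_B / (1.0 + N_S * (1.0 - kappa_B))


def apply_tms(cov, G):
    """Apply the (assumed) anti-squeezing symplectic map S: cov -> S cov S^T."""
    S = two_mode_matrix(math.sqrt(G), math.sqrt(G), -math.sqrt(G - 1.0))
    S_T = [list(col) for col in zip(*S)]
    return matmul(matmul(S, cov), S_T)


N_S, kappa_B = 0.3, 0.95
out = apply_tms(return_covariance(N_S, kappa_B), nulling_gain(N_S, kappa_B))
print(out[0][0], out[1][1], out[0][2])  # signal variances and signal-idler correlation
```

With $\kappa_B = 1$ the same call returns the identity matrix, i.e. both modes are nulled to vacuum; for $\kappa_B < 1$ only the signal block reduces to the vacuum value, in line with the statement above.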

B. Photon number statistics
We consider photon counting on each signal-idler pair $\{a_S, a_I\}$ in a Gaussian state with a covariance matrix of the form of Eq. (S18), which has zero phase-insensitive cross-correlation $\langle a_S a_I^\dagger\rangle = 0$ and non-zero phase-sensitive cross-correlation $\langle a_S a_I\rangle \neq 0$. The corresponding joint probability of obtaining results $n_S, n_I$ is given by Eq. (S24), where $F_R$ is the regularized hypergeometric function. Moreover, for states with covariance matrix equal to Eq. (S18) up to a phase rotation on any mode, the photon number statistics is also given by Eq. (S24). In our hypothesis testing protocols, $P(n_S, n_I)$ is fully determined by $N_S$, $N_S'$, $\kappa_S$, $1 - \kappa_I$ and $N_B$, where the last two, $1 - \kappa_I$ and $N_B$, are the fixed environment parameters, idler loss and thermal noise. Given the signal mean photon number $N_S$, our receiver, parametrized by the gain $G = 1 + N_S'$, is to jointly discriminate the photon count distributions per slot $\{P(\cdot,\cdot|\kappa_S, G, N_S)\}$ associated with different signal channel transmissivity patterns, conditioned on the same parameter setting $N_S, \kappa_I, N_B$. Here we make the dependence of Eq. (S24) on $\kappa_S, G, N_S$ explicit.

C. Conditional probabilities
In the main paper, Eq. (7) gives the conditional probability of obtaining the $mM$-dimensional measurement results $\mathbf{n}_S = \{n_{S_{L,k}}\}_{L=1,k=1}^{M,m}$ and $\mathbf{n}_I = \{n_{I_{L,k}}\}_{L=1,k=1}^{M,m}$ conditioned on pattern $h$. Here we reproduce it for reference as Eq. (S27), in which the gain distribution $G_\ell = 1 + N_{S_\ell}'$ and the energy distribution $N_{S_\ell}$ are also optimized probabilistically. For scenarios with strong symmetry, e.g. the peak positioning case, the optimal gain and energy distributions may be trivially uniform due to the symmetry.
For the absorption detection case, $m = 1$ and $H = 2$, the two hypotheses are $\kappa_S = \kappa_B$ and $\kappa_S = \kappa_T$. Furthermore, with the total energy being limited, the degree of freedom of $N_S$ is frozen, and Eq. (S27) reduces to a product over the $M$ copies. Similarly, for the single-peak positioning case, $H = m > 1$, each hypothesis $h \in [1, m]$ corresponds to the position of the target channel. Noting the symmetry in this scenario, we choose uniform energy and gain distributions. Again, the degree of freedom of $N_S$ is frozen and that of the gain is limited to one, which reduces Eq. (S27) for $P_m(\mathbf{n}_S, \mathbf{n}_I | h)$ accordingly. In general, one can tune the energy and gain distributions among the different slots to approach the optimal performance.

D. Limiting cases of the photon statistics
To enable numerical simulation, we also need to deal with term-wise divergence in Eq. (S24), e.g. the hypergeometric function F R could diverge. Since the probability is normalized, all divergences are in fact cancelled out by pairing infinite terms with infinitesimal terms. This must be done in an analytical way before numerical calculations.
The divergence comes from special values of $z = 4C^2/XY$, as summarized in the following two cases. In the first case, the resulting expression holds for $n_I \ge n_S$ and is 0 otherwise. The two-mode squeezed vacuum falls in this case: taking $E = S = 1 + 2N_S$ and $C = 2\sqrt{N_S(N_S+1)}$, we find $X = Y = 0$, which makes the above expression non-zero only when $n_I = n_S$, and indeed we have $P(n_S, n_I) = N_S^{n_S}/(1+N_S)^{n_S+1}\,\delta_{n_S - n_I}$. A special scenario of case 1 happens when $z = 0/0$, i.e., when $X$ or $Y$ is zero together with $C = 0$. To avoid the singularity $C^{-2(1+n_I)}$ in numerical calculation, we take the $C \to 0$ limit of Eq. (S31), which yields one expression for $X = 0$ and another, Eq. (S33), for $Y = 0$. Note that the first of these corresponds to signal-mode thermal statistics with mean photon number $N_S = (-1 + E)/2$; the second corresponds to idler-mode thermal statistics with mean photon number $N_I = (-1 + S)/2$.
In the second case, $F_R$ is infinite. However, it always comes with the factor $-1 + C^2 + E + S - ES \to 0$, which cancels the singularity, leading to Eq. (S36).

III. ENTANGLEMENT-ASSISTED HOMODYNE RECEIVER
We give a brief summary of the best known receiver design for quantum reading, a Bell-measurement receiver proposed in Ref. [S4], which we use as a benchmark in Fig. 3(a) of the main paper. The returned signal-idler pair travels through a beamsplitter, which yields output modes $a_+ = (a_S + a_I)/\sqrt{2}$, $a_- = (a_S - a_I)/\sqrt{2}$. Then homodyne measurements are performed on the two output modes. With a large number $M$ of identical modes, the receiver constructs the $\chi^2$ test variable $\theta = \sum_{L=1}^{M}(p_{+,L}^2 + q_{-,L}^2)$, whose probability density function is expressed in terms of the Gamma function $\Gamma(\cdot)$. Under the maximum-likelihood decision, the error rate is expressed in terms of the incomplete Gamma function $\Gamma(\cdot,\cdot)$.
Generally, the Bell receiver, which extracts the quantum correlations by interfering the signal and the idler on a beamsplitter, yields measurement results with a higher signal-to-noise ratio, and thereby has a significant advantage over a direct homodyne measurement in the classical scenario. Similarly, the nulling receiver discussed in the main text exploits the quantum correlation with the assistance of an OPA, which yields a significant advantage with a photon-number-resolving detector. In the following, we calculate the performance of a homodyne detector assisted by an OPA, where we find that the advantage remains but is reduced.
Now we consider a homodyne receiver, measuring the quadratures of the signal-idler pair nulled by an OPA with gain similarly defined by Eq. (S22). After the OPA, each two-mode signal-idler pair is in a Gaussian state with the covariance matrix given by Eq. (S18). For homodyne measurements, one can choose to measure the position $q_S$ or momentum $p_S$ quadrature of the signal mode and $q_I$ or $p_I$ for the idler mode. The two outcomes, denoted $x, y$, are Gaussian random variables characterized by the marginal covariance matrix.
Observing the symmetry, we see that the measurement on $\{q_S, p_I\}$ has the same statistics as that on $\{p_S, q_I\}$, and likewise for $\{q_S, q_I\}$ and $\{p_S, p_I\}$. Upon obtaining all measurement results across all mode-pairs, we use maximum likelihood estimation (MLE) to make the decision. In general, the error rate of this OPA-assisted homodyne receiver has no closed form.
To provide an outline of the practical performance, we give the numerical result for the ideal case $\kappa_B = 1$. In this case, the covariance matrix for the background channel is $\Lambda^{(B)} = I$. We find it possible to eliminate the correlation between the two measurement outcomes $\{x, y\}$ per copy by a linear combination that diagonalizes the covariance matrix, mapping $\{x, y\}$ to $\{x', y'\}$. The decision statistic $s$ can then be written as a weighted sum of two chi-square distributed random variables $X_1, X_2 \sim \chi^2(M)$. The corresponding distribution of $s$ under hypothesis $h$ is a generalized chi-square distribution, which can be numerically calculated. Then, the error probability of MLE is obtained by numerical integration. Fig. S6 compares the Bell receiver with the OPA-assisted homodyne receivers and the classical limit, which illustrates the superiority of the Bell measurement. Note that the homodyne receiver measuring $\{q_S, p_I\}$ fails to achieve an advantage over the classical limit here. This is due to the absence of quantum correlation between $q_S$ and $p_I$. In view of its superiority among the homodyne receivers, we compare our design only with the Bell receiver in the main text.

IV. ERROR PROBABILITY LOWER BOUND OF CLASSICAL PATTERN RECOGNITION
A. Single-mode phase-insensitive Gaussian channels
As explained in Ref. [S5], the action of a single-mode (covariant) phase-insensitive Gaussian channel on the input quadratures $\hat{x} = (\hat{q}, \hat{p})^T$ can be represented by the transformation $\hat{x} \to \sqrt{\mu}\,\hat{x} + \sqrt{|1-\mu|}\,\hat{x}_E + \xi$, where $\mu$ is a transmissivity ($0 \le \mu \le 1$) or a gain ($\mu \ge 1$), $\hat{x}_E$ are the quadratures of an environmental mode in a thermal state with noise variance $\omega = 2N + 1$, with $N$ being the mean number of photons, and $\xi$ is additive classical noise, i.e., a random two-dimensional Gaussian-distributed vector with covariance matrix $w_{\rm add} I$. Here we assume vacuum shot noise equal to 1.
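Since the environment and the additive noise are independent of the input, the quadrature transformation above implies that the mean is rescaled by $\sqrt{\mu}$ and the covariance matrix $V$ maps to $\mu V + |1-\mu|\,\omega I + w_{\rm add} I$. The sketch below applies this map to a single mode; the function name and test values are illustrative assumptions.

```python
import math


def apply_phase_insensitive_channel(mean, cov, mu, N=0.0, w_add=0.0):
    """Propagate a single-mode mean vector and 2x2 covariance through the channel.

    mu is the transmissivity (mu <= 1) or gain (mu >= 1), N the mean photon
    number of the thermal environment, and w_add the additive-noise variance.
    Vacuum shot noise equals 1 in this convention.
    """
    omega = 2.0 * N + 1.0
    added = abs(1.0 - mu) * omega + w_add
    new_mean = [math.sqrt(mu) * x for x in mean]
    new_cov = [[mu * cov[i][j] + (added if i == j else 0.0) for j in range(2)]
               for i in range(2)]
    return new_mean, new_cov


# Thermal loss on a mode with N_S = 0.3 mean photons, using an environment
# of N_B / (1 - kappa) photons as in the thermal loss channel of the main text.
kappa, N_S, N_B = 0.75, 0.3, 0.1
thermal_cov = [[2.0 * N_S + 1.0, 0.0], [0.0, 2.0 * N_S + 1.0]]
mean_out, cov_out = apply_phase_insensitive_channel(
    [2.0, 0.0], thermal_cov, mu=kappa, N=N_B / (1.0 - kappa)
)
print(mean_out, cov_out)
```

With this environment the output variance is $2(\kappa N_S + N_B) + 1$, so the added noise is $N_B$ photons regardless of $\kappa$.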

B. Ultimate lower bounds for general patterns
Consider the general case of hypothesis testing between $H$ different patterns of composite channels, each acting on $m$ subsystems $\{S_\ell\}_{\ell=1}^{m}$. Here each subsystem consists of $M$ modes and each mode goes through a single-mode phase-insensitive bosonic Gaussian channel $\Phi_{\mu_\ell^{(n)},E_\ell^{(n)}}$. Therefore, a global channel $\mathcal{E}_n$ is specified by the parameters $\{\mu_\ell^{(n)}, E_\ell^{(n)}\}_{\ell=1}^{m}$ [Eq. (S38)].
Lemma 1 Consider classical states (with positive P-representation) as the input. Assuming a global energetic constraint of $mMN_S$ mean photons, with $M$ modes irradiated over each of the $m$ subsystems $S_k$, the error probability of equal-prior hypothesis testing between the $H$ channels $\mathcal{E}_n$ in Eq. (S38) is lower bounded by Ineq. (S39), with constants defined in terms of the channel parameters.
Corollary 2 (classical channel-position finding, originally derived in [S5]) When $H = m$, the pattern recognition problem corresponds to channel-position finding with a target channel $\Phi_{\mu_T,E_T}^{\otimes M}$ among $(m - 1)$ background channels $\Phi_{\mu_B,E_B}^{\otimes M}$, and the bound in Ineq. (S39) reduces to a simpler form. In particular, when there is no passive signature ($E_T = E_B \equiv E$), the bound simplifies further.
Remark 3 The corollary is easy to obtain from Lemma 1: the relevant term is zero unless $\ell = n$ or $\ell = n'$, which easily leads to Eq. (S49).
Corollary 4 (classical channel-position finding with multiple target channels) When there are $k$ target channels $\Phi_{\mu_T,E_T}^{\otimes M}$ among $(m - k)$ background channels $\Phi_{\mu_B,E_B}^{\otimes M}$, we have $H = C_m^k$ patterns. For this case, the lower bound in Eq. (S39) reduces to a simpler form, which simplifies further in the case of no passive signature ($E_T = E_B \equiv E$).
Remark 5 Similarly, the corollary is easy to obtain from Lemma 1. First, one realizes that, due to the symmetry, the maximization in $B$ is achieved by an arbitrary $\ell$, e.g. we let $\ell = 1$. Then the non-zero contribution to Eq. (S41) only comes from patterns with different channels on subsystem $\ell = 1$. A slight simplification of Eq. (S54) leads to Eq. (4) of the main paper.
Proof. For convenience of analysis, we parameterize a coherent state $|\alpha\rangle$ by its phase and squared amplitude, i.e., $|x, \theta\rangle \equiv |\sqrt{x}e^{i\theta}\rangle$, where $x \ge 0$ and $0 \le \theta \le 2\pi$. In this notation, a multi-mode coherent state over the entire system takes the form of a tensor product over the subsystems, in which each factor is again a tensor product of multiple modes with generally different amplitudes. Here the $x_\ell$ are non-negative real vectors $x_\ell = (x_\ell^{(1)}, \cdots, x_\ell^{(M)}) \equiv \{x_\ell^{(j)}\}_{j=1}^{M}$ and $x$ is a simple concatenation of them, i.e., $x = (x_1, \cdots, x_m)$.
In this notation, the general classical input state can be written as a Lebesgue integral in which the probability measure $P$ over $x, \theta$ can be arbitrary. Let us define $\|x\|_1$, the standard one-norm, which equals the total mean photon number of the state $|x, \theta\rangle$. Then, the total energy constraint leads to Ineq. (S57), where the integral has been simplified to a marginal probability measure $P$ restricted to the non-negative variables $x$. The total conditional state at the output of the channel $\mathcal{E}_n$ is also a mixture of conditional states. The state $(\rho^C_{\mu_\ell^{(n)},E_\ell^{(n)}})_{S_\ell}$ is a product of $M$ displaced thermal states, each with amplitude $\sqrt{\mu_\ell^{(n)}x_\ell^{(j)}}\,e^{i\theta_\ell^{(j)}}$ and covariance matrix $(2E_\ell^{(n)} + 1)I$. We use the fidelity-based lower bound on the Helstrom limit [S6], Eq. (S60), from which we can write a lower bound on the mean error probability, Eq. (S61). We consider the equal-prior case for simplicity.
In this bound, we use the joint concavity of the fidelity and Jensen's inequality for the square function, with $K = (H - 1)H/2$. Let us now address each fidelity term. Using the Gaussian fidelity formula [S1], we can compute it in terms of constants $B_\ell^{(n,n')}$ and $c_\ell^{(n,n')}$ [Eq. (S67)]. Note that $B_\ell^{(n,n')} > 0$ and $c_\ell^{(n,n')} \le 1$. From the one-norm in this expression, it becomes clear that the performance is exactly the same regardless of how the energy is distributed among the $M$ modes impinging on a subsystem, as long as the mean total energy irradiated over the subsystem is fixed. We therefore obtain a simplified expression in which we define $C_{n,n'} = \prod_{\ell=1}^{m}(c_\ell^{(n,n')})^{M/2}$. By replacing the $F^C_{n,n'}$ in Eq. (S61), and noticing that $F^C_{n,n'}$ does not depend on $\theta$, we find a lower bound expressed through a function of $x$. We can use the convexity of $e^{-cx}$ (with $c > 0$) and Jensen's inequality to move the expectation value to the exponent, where $X_\ell = \int dP\,\|x_\ell\|_1$. Note that equality is only achieved when $P$ corresponds to a delta function. Overall, we want to solve the minimization of $f$ under the constraint of Ineq. (S57). The resulting minimum gives the lower bound in Eq. (S42).
Below we obtain a further lower bound.
Here we have introduced $\bar{c} = \prod_{n>n'} C_{n,n'}^{1/K}$. It is easy to check that the lower bound can be reached only if the corresponding term is independent of $n, n'$. This is not always possible, and therefore the lower bound is achievable only in certain symmetric cases. In symmetric cases, when the quantities $\frac{1}{K}\sum_{n>n'} B^{(n,n')}_\ell$ entering the maximization are equal for all $\ell$, one can distribute the energy evenly to achieve the lower bound.

V. CLASSICAL STRATEGY: NULLING RECEIVER FOR COHERENT STATE INPUT
Refs. [S7, S8] have conducted a comprehensive analysis of the classical nulling receiver in the noiseless scenario $N_B = 0$. We briefly summarize the pertinent conclusions here and compare the performance with our lower bound. The channel setup in the classical scenario is almost identical to that in the quantum scenario, the only difference being that the source is classical, e.g., coherent states generated by a laser.

A. Absorption Detection
In this scenario, the task is to distinguish the target channel with transmissivity $\kappa_T$ from the background channel with transmissivity $\kappa_B$. Given a source of $M$ copies of the coherent state $|\sqrt{N_S}\rangle$ with total mean photon number $M N_S$, the optimal error rate is tightly bounded by the Helstrom limit [S4]. Dolinar has proposed an adaptive receiver [S7] that reaches the Helstrom bound. In this case, the improved classical ultimate lower bound, Eq. (3) in the main text, coincides with this bound, because encoding with two pure states instead of an ensemble, i.e., concentrating the source energy on a single amplitude level, already achieves the optimum.
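For orientation, the Helstrom limit for two pure states can be evaluated directly. The sketch below assumes the textbook expression $\frac{1}{2}[1 - \sqrt{1 - 4 p_0 p_1 |\langle\psi_0|\psi_1\rangle|^2}]$ and the standard coherent-state overlap. The function names and parameter values are hypothetical, and the formula is not copied from this supplement, whose own equation may be written differently.

```python
import math

def coherent_overlap_sq(kappa_t, kappa_b, n_s, m_copies):
    """Squared overlap of M copies of |sqrt(kappa_T N_S)> and |sqrt(kappa_B N_S)>.

    Uses |<alpha|beta>|^2 = exp(-|alpha - beta|^2) per copy (noiseless case).
    """
    d = math.sqrt(kappa_t) - math.sqrt(kappa_b)
    return math.exp(-m_copies * n_s * d * d)

def helstrom_two_pure_states(overlap_sq, prior=0.5):
    """Minimum error probability for discriminating two pure states."""
    return 0.5 * (1.0 - math.sqrt(1.0 - 4.0 * prior * (1.0 - prior) * overlap_sq))

# Example values (assumed for illustration): kappa_T = 0.75, kappa_B = 0.95, N_S = 1.
for m_copies in (10, 100, 1000):
    err = helstrom_two_pure_states(coherent_overlap_sq(0.75, 0.95, 1.0, m_copies))
    print(m_copies, err)
```

The error decays exponentially in $M N_S (\sqrt{\kappa_T} - \sqrt{\kappa_B})^2$, which is the same combination that sets the false-negative rate of the nulling receivers.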

B. Peak Positioning
Consider positioning a single peak among $m$ slots. We categorize nulling receivers into the unconditional nuller and the conditional nuller. The unconditional nuller, which motivated our entanglement-assisted design in the main text, applies an identical displacement $D(\sqrt{\kappa_B N_S})$ to all $m$ modes. In the noiseless case $N_B = 0$, the return is in coherent states. The uniform displacement nulls the copies returned through the background channels to vacuum, and those returned through the target channel to a coherent state with displacement $\sqrt{\kappa_T N_S} - \sqrt{\kappa_B N_S}$. Then we apply photon counting on every mode, which immediately identifies the target channel if any click is detected. An error only occurs when the coherent state associated with the target channel yields zero photon counts, with the false-negative error rate $p = \exp[-(\sqrt{\kappa_T} - \sqrt{\kappa_B})^2 M N_S]$ for source mean photon energy $M N_S$. The error rate of the unconditional nuller then follows from $p$ [Eq. (S83)].
The conditional nuller, however, applies a mode-by-mode sequence of displacements that depend on the prior measurement results. Specifically, we null the first mode and measure its photon count. If no photon is detected at the first mode, our hypothesis that the target channel is at the first mode is partially confirmed, and we forgo the nulling on the remaining modes unless a photon is detected in the subsequent measurements. Note that in this noiseless case the false-positive error rate is zero: any nonzero photon count is conclusive evidence for rejecting the current hypothesis for the currently measured mode, which immediately gives the conclusion if the rejected hypothesis is 'background'. On the other hand, if any photon is detected at the first mode, the target hypothesis is conclusively rejected, and we move on to the next hypothesis, namely that the target channel is at the second mode. In sum, an error only occurs if the measurements on both the target channel and the background channel mistaken for the target yield false-negative errors.
By iterating the relation $P_{C,m} = [(1-p)P_{C,m-1} + p^2] \times (m-1)/m$, one obtains the error rate of the conditional nuller. When $M \gg 1$, the conditional nuller loses to the classical lower bound Eq. (S54) (here $E = 0$) merely by a constant factor $m$, thus achieving the bound in the exponent. Fig. S8 compares the two nulling receivers with the classical lower bound, Eq. (4) in the main text. It confirms that the unconditional nuller is outperformed by the conditional nuller. Furthermore, the latter is close to the classical lower bound in its decay rate, although, as expected, it never achieves the bound. Note that in (c) the QCB and the classical bound coincidentally get close when $\kappa_I = 1$; this is due to the finite $M$, and they will deviate as $M$ increases further.
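The two nulling receivers can be compared numerically with a short sketch. Two ingredients below are assumptions rather than statements from the text: the base case $P_{C,1} = 0$ of the recursion (a single slot leaves nothing to decide), and the unconditional error $p\,(m-1)/m$, which presumes a uniformly random guess when no slot clicks. Function names are hypothetical.

```python
import math

def false_negative(kappa_t, kappa_b, n_s, m_copies):
    """Vacuum probability of the nulled target return (noiseless case)."""
    d = math.sqrt(kappa_t) - math.sqrt(kappa_b)
    return math.exp(-m_copies * n_s * d * d)

def unconditional_nuller_error(p, m):
    """Assumption: on zero clicks the receiver guesses uniformly among m slots."""
    return p * (m - 1) / m

def conditional_nuller_error(p, m):
    """Iterate P_{C,k} = [(1 - p) P_{C,k-1} + p^2] (k - 1)/k up to k = m."""
    err = 0.0  # assumed base case P_{C,1} = 0
    for k in range(2, m + 1):
        err = ((1.0 - p) * err + p * p) * (k - 1) / k
    return err

# Example values (assumed for illustration).
p = false_negative(kappa_t=0.75, kappa_b=0.95, n_s=1.0, m_copies=200)
for m in (2, 10, 100):
    print(m, unconditional_nuller_error(p, m), conditional_nuller_error(p, m))
```

For small $p$ the recursion behaves as $(m-1)p^2/2$, so the conditional nuller decays with twice the exponent of the unconditional one.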

VI. ANALYTICAL SOLUTIONS OF THE ERROR PROBABILITY
Here we present analytical solutions for the error probability in absorption detection and single-peak positioning when $\kappa_B = \kappa_I = 1$, $N_B = 0$.

A. Absorption detection
In this case, Eq. (S22) yields the gain $G = 1 + N_S$. The squeezing $S$ is then exactly the inverse of the two-mode squeezing operation that creates the TMSV state from vacuum. From the covariance matrix $\Lambda(\kappa_B)$ in Eq. (S23), the background signal $\Xi^{(B)}$ is nulled to the vacuum state, a pure state that can be discriminated with zero error rate. Applied to the photon statistics $P_m(n_S, n_I|h)$, the maximum-likelihood decision rule accepts the hypothesis $\hat{h} = T$ for all photon-count results except zero counts, which lead to the hypothesis $\hat{h} = B$. An error therefore only happens when the nulled target-present state $S(\Xi^{(T)})$ yields no photon counts. Hence the error rate is $P_{E,1} = P_m(0, 0|T)/2$. Given the transmissivities $\kappa_B, \kappa_T$ of the background and target-present channels and the mean photon number constraint $N_S$ on the source, the covariances $\Lambda(\kappa_T)$ of each pair in $S(\Xi^{(T)})$ are given by Eq. (S18) with parameters $E$, $S$, $C$. The probability $P_m(0, 0|T)$ then follows from $P(n_S, n_I)$, which is given by Eq. (S24), and the final result is Eq. (S91).

B. Peak positioning
The simplest case is when $\kappa_B = 1$, which is analytically solvable. First, the receiver nulls all the returned slots by uniformly applying the squeezing process $S$ with the same $N_S$. Then the photon number per slot is measured. Any slot in which a click is detected is the target-present slot. Errors only occur when no photon is detected in any slot, in which case we make a random guess. Hence the error rate is expressed with the same $E$, $S$, $C$ as in the single-slot case, which leads to Eq. (S93).

C. Numerical verification of the formula
The solutions in Eqs. (S91) and (S93) can be summarized into a unified formula, as done in Eq. (5) of the main paper.
The solvable case requires the ideal channel with $\kappa_I = 1$, $N_B = 0$. As shown in Fig. S7(a), the analytical formula in Eq. (S91) fits the numerical results well in the absorption detection case and achieves the QCB. As shown in Fig. S7(b), the analytical formula in Eq. (S93) fits the numerical results well for the peak positioning case.

A. Monte Carlo simulation
In the main paper, we evaluate the performance of EAAS through Monte Carlo simulations. First we randomly choose the true hypothesis $h_0$ and simulate the random outcomes $\{n\}$ measured from the quantum state corresponding to $h_0$. Based on the measurement results, the conditional probabilities $p(n|h)$ are computed and we make the maximum-likelihood decision $\hat{h} = \mathrm{argmax}_h\, p(n|h)$. Note that when several hypotheses attain the maximum with equal probability, we make a random guess among them. Finally, we take the frequency of $\hat{h} \neq h_0$ as an estimate of the error probability $P_{E,m}$ of EAAS.
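The procedure above can be sketched on a toy model. The example below is not the EAAS photon statistics of Eq. (S27): as a stand-in, it assumes a noiseless nulling scenario in which the target slot produces Poisson counts with mean `lam` and every background slot is dark. All function names and parameter values are hypothetical.

```python
import numpy as np
from math import lgamma

def log_likelihoods(counts, lam):
    """log p(n|h) for each hypothesis h = 'target in slot h' (toy model)."""
    m = len(counts)
    total = counts.sum()
    ll = np.full(m, -np.inf)
    for h in range(m):
        if total - counts[h] == 0:  # all other slots must be dark under h
            n = counts[h]
            ll[h] = n * np.log(lam) - lam - lgamma(n + 1)
    return ll

def ml_decide(ll, rng):
    """Maximum-likelihood decision with a random guess among exact ties."""
    best = np.flatnonzero(ll == ll.max())
    return rng.choice(best)

def estimate_error(m, lam, n_trials, seed=0):
    """Monte Carlo estimate of the error probability and its standard error."""
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(n_trials):
        h0 = rng.integers(m)                 # true hypothesis, equal priors
        counts = np.zeros(m, dtype=int)
        counts[h0] = rng.poisson(lam)        # simulated measurement outcome
        h_hat = ml_decide(log_likelihoods(counts, lam), rng)
        errors += int(h_hat != h0)
    p_hat = errors / n_trials
    sigma = np.sqrt(p_hat * (1.0 - p_hat) / n_trials)  # binomial standard error
    return p_hat, sigma

p_hat, sigma = estimate_error(m=4, lam=1.0, n_trials=20000, seed=1)
print(p_hat, "+/-", sigma, "toy-model exact value:", np.exp(-1.0) * 3 / 4)
```

In this toy model the only ambiguous outcome is the all-dark one, so the exact error is $e^{-\lambda}(m-1)/m$, which gives a direct check on the estimator. The standard error shrinks as the inverse square root of the number of trials.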
The sample size, i.e., the total number of simulations, determines the precision of the estimation. According to the central limit theorem, the estimation variance $\sigma^2_{MC}$ is inversely proportional to the sample size. However, we have to balance the computational cost against the targeted precision. As a result, a variety of sample sizes are chosen according to the specific scenarios in the main paper: Fig. 4

B. Optimal gain
In all ideal cases with neither noise nor idler loss, the gain $G_0 = 1 + N_{S0}$ that nulls the signal to vacuum is given by Eq. (S22). However, the optimal gain that minimizes the error rate, denoted $G^{(opt)}$, is not necessarily the same. Numerical results show that, for most values of $\kappa_B, \kappa_T$ under the ideal parameters $N_B = 0$, $\kappa_I = 1$, the gain $G_0$ saturates the QCB when $M N_S$ is relatively large, i.e., the optimal gain is $G^{(opt)} = G_0$. When $M N_S$ is small, $G^{(opt)}$ may deviate from $G_0$ and fluctuate in its neighbourhood.
When the idler storage efficiency $\kappa_I < 1$, the signal mode can no longer be nulled to vacuum, and intuitively we expect to minimize its mean photon number instead. This requires another setting, $G_{\min} = \mathrm{argmin}_G \langle a_S^\dagger(G) a_S(G) \rangle$, different from $G_0$ in Eq. (S22). We numerically compare the performance over a wide range of gains, including $G_0$ and $G_{\min}$, setting $M$ such that the classical optimum is $P_{C,1,LB} = 0.1$. As shown in Fig. S9, the decay rate of the quantum advantage depends on the gain $G$. Contrary to the intuition, the optimum stays around $G_0$ in most cases, as the performance saturates beyond it (in the presented cases $G_{\min} < G_0$). In the $N_B = 0$ case [Fig. S9(e)(f)], when $\kappa_I < 1$, the optimum can lie between $G_{\min}$ and $G_0$; however, the optimum error probability is only slightly smaller than that of $G_0$.

C. Saturating the QCB for absorption detection case
We further explore the saturation of the QCB in the absorption detection case. Since the QCB is only asymptotically tight, we compare the error exponent $R_E$ of the EA receiver with that of the QCB, $R_{QCB}$. We also compare with the classical error exponent $R_C$. In Fig. S10(a)(b), we plot the error exponent ratio $R_E/R_{QCB}$. To guarantee that we are considering the asymptotic region, we choose the mode number $M$ such that the QCB is around $10^{-5}$ at all data points. We see that the error exponent ratio $r = R_E/R_{QCB} \sim 1$ over the whole parameter region of $\kappa_B, \kappa_T$, verifying the optimality of the EA receiver in the absence of noise ($N_B = 0$, $\kappa_I = 1$). Moreover, in Fig. S10(c)(d) we plot the error exponent ratio $R_E/R_C$ in the same parameter region, where we see an advantage in almost the whole parameter region of $\kappa_T, \kappa_B$. Furthermore, the entangled error exponent can be larger than the classical error exponent by a factor of $\sim 8$, showing a great advantage. Note that the expected monotonicity with respect to the transmissivities breaks at a few points, e.g., $(\kappa_T, \kappa_B) = (0.6, 0)$. This is due to the discreteness of $M$ when $M$ is small [even $\sim 1$ at $(0.6, 0)$], which results in a relatively sharp change. Indeed, the classical lower bound $P_{C,1,LB}$, expected to be fixed by the proper choice of $M$, fluctuates around 0.1 in these nonmonotonic areas.

VIII. CHARACTERIZING THE QUANTUM ADVANTAGE IN ABSORPTION DETECTION AND PEAK POSITIONING
Here we present more results related to Fig. 4 of the main paper.
In a practical scenario, we are interested in how much EAAS can enhance the performance, when classical schemes fail to perform well. To showcase the advantage, we fix the classical lower bound to be 0.1 or 0.01 and calculate the error probability achievable with EAAS.
In Fig. S11, we plot the error probability $P_{E,m}$ for absorption detection as a function of the transmissivities $\kappa_B$ and $\kappa_T$, comparing it with the classical lower bound in Eq. (3) of the main paper. The white dashed lines divide the parameter space into regions with and without quantum advantage. From the figure, we can see that the advantage is remarkable (several orders of magnitude for $\kappa_B, \kappa_T \simeq 1$) and also survives over a large range of parameters. In practice, when $\kappa_B \simeq 1$, we find a quantum advantage for all values of $\kappa_T$ for the problem of absorption detection.
In Fig. S12, we plot the error probability $P_{E,m}$ for 100-peak positioning as a function of the transmissivities $\kappa_B$ and $\kappa_T$, comparing it with the classical lower bound in Eq. (4) of the main paper. From the figures, we can see that the advantage is remarkable (several orders of magnitude for $\kappa_B, \kappa_T \simeq 1$) and also survives over a relatively large range of parameters. In practice, when $\kappa_B \simeq 1$, we find a quantum advantage for $\kappa_T \gtrsim 0.2$ for 100-peak positioning. The region permitting quantum advantage is slightly smaller than in the absorption detection case, which we attribute to the classical lower bound in Eq. (4) of the main paper being looser than that in Eq. (3) of the main paper.
In the analysis above, we have assumed a noiseless channel ($N_B = 0$) and lossless idler storage ($\kappa_I = 1$). In experimental practice, the presence of such noise and loss is inevitable. Therefore we also study how the quantum advantage varies with the idler loss $\sqrt{1 - \kappa_I}$ and the noise $N_B$ in Figs. S13 and S14, for $N_S = 0.1$ and $N_S = 1$. Again we fix the classical lower bound and plot the error probability of EAAS for $\kappa_T = 0.75$ and $\kappa_B = 0.95$. Regions on the lower-left side of the dashed lines show quantum advantage. We see that the advantage is robust against idler loss and channel noise.

IX. DETAILS ON GENERAL SPECTRUM RECOGNITION
Here we give the transmissivities data that are used in our calculations of wine-tasting and drug-testing.
For the wine tasting, there are $H = 3$ molecules: methanol ($h = 1$), ethanol ($h = 2$) and ethanal ($h = 3$). To discretize the spectra so that we can perform numerical simulations, samples are tested at four frequency slots, 500, 1050, 1400, 1800 cm$^{-1}$ (wavelengths 20, 9.5, 7.1, 5.6 µm), with the transmissivities averaged over a span of ±100 cm$^{-1}$. The discrete spectrum is described by the transmissivity matrix $\kappa$ in Eq. (S97). Each column gives the transmissivities $\kappa^{(h)}$ for a fixed hypothesis $h = 1, 2, 3$. The four rows in each column correspond to the $m = 4$ spectrum slots.
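The discretization step can be sketched as follows. This is an illustration under stated assumptions: the linear toy spectrum replaces the real molecular data of Ref. [40], and the function names are hypothetical. Only the slot centres and the ±100 cm$^{-1}$ span are taken from the text.

```python
import numpy as np

def wavenumber_to_wavelength_um(wavenumber_cm):
    """Convert wavenumber in cm^-1 to wavelength in micrometres (1 cm = 1e4 um)."""
    return 1.0e4 / np.asarray(wavenumber_cm, dtype=float)

def slot_average(wavenumbers, transmissivity, centers, half_width=100.0):
    """Average the transmissivity within +/- half_width of each slot centre."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    transmissivity = np.asarray(transmissivity, dtype=float)
    averaged = []
    for center in centers:
        mask = np.abs(wavenumbers - center) <= half_width
        averaged.append(transmissivity[mask].mean())
    return np.array(averaged)

centers = [500.0, 1050.0, 1400.0, 1800.0]        # slot centres from the text
print(np.round(wavenumber_to_wavelength_um(centers), 1))

# Toy spectrum (assumption, not real data): transmissivity rising linearly.
grid = np.arange(400.0, 1901.0, 1.0)
toy_spectrum = grid / 2000.0
kappa_column = slot_average(grid, toy_spectrum, centers)
print(kappa_column)
```

Applying `slot_average` to each molecule's spectrum would give one column of the discrete transmissivity matrix, i.e., one hypothesis $h$.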
For the drug-testing case, the molecules are phenyl salicylate ($h = 1$), methyl salicylate ($h = 2$), and benzoic
Figure S15. Logarithmic-scale learning curve of the error rate optimization over the source energy distribution in the noiseless wine-tasting scenario. Average mean photon number per mode $\sum_{\ell=1}^{4} N_{S\ell}/4 = 1$, noise $N_B = 0$. We find that the energy concentrates on the first frequency slot when $M$ is small, and on the fourth frequency slot when $M$ becomes sufficiently large. This verifies the intuition that the energy should be concentrated on the slot with the largest error exponent, which dominates the decay of the error rate.
We evaluate the performance of our receiver through numerical simulations, with optimization over energy and gain. We generate the measurement results through Eq. (S27) and perform the maximum-likelihood decision. The error rates are then obtained as the frequencies of erroneous decisions.
The general patterns in Eqs. (S97) and (S98) break the symmetry between different (frequency) slots, so symmetric strategies with uniform parameters over all slots are unlikely to be optimal. We therefore expect that optimizing the parameters, including the source mean photon numbers $N_S = [N_{S1}, N_{S2}, N_{S3}, N_{S4}]^T$ and the receiver gains $G = [G_1, G_2, G_3, G_4]^T$ over the four frequency slots, will achieve further quantum advantages. Considering the large parameter dimension in future applications with more frequency slots, we apply the constrained simultaneous perturbation stochastic approximation (SPSA) algorithm [S10, S11] to accelerate the optimization. As a variant of stochastic gradient descent methods, SPSA is not guaranteed to find the global optimum. First we optimize over the energy distribution, and then we further optimize over the gain. We find that, in general, energy optimization leads to improved quantum advantages, while gain optimization offers a slight improvement in the noiseless case and a much larger improvement in the noisy case.
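A minimal sketch of a constrained SPSA iteration is given below. It is not the implementation used for the figures: the quadratic `toy_objective` stands in for the simulated error rate, the gain sequences use commonly quoted SPSA exponents (0.602 and 0.101), and the constraint is handled by a Euclidean projection onto a fixed total energy with non-negative entries. All names and parameter values are assumptions.

```python
import numpy as np

def project_to_budget(x, total):
    """Euclidean projection onto {x >= 0, sum(x) = total}."""
    u = np.sort(x)[::-1]
    css = np.cumsum(u) - total
    k = np.arange(1, len(x) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(x - tau, 0.0)

def spsa_minimize(f, x0, total, n_iter=2000, a=0.1, c=0.1, big_a=100,
                  alpha=0.602, gamma=0.101, seed=0):
    """Projected SPSA: two evaluations of f per step, whatever the dimension."""
    rng = np.random.default_rng(seed)
    x = project_to_budget(np.asarray(x0, dtype=float), total)
    for k in range(n_iter):
        a_k = a / (k + 1 + big_a) ** alpha       # step size
        c_k = c / (k + 1) ** gamma               # perturbation size
        delta = rng.choice([-1.0, 1.0], size=x.shape)  # simultaneous +/-1 perturbation
        # The perturbed points may leave the feasible set; acceptable for this toy f.
        diff = f(x + c_k * delta) - f(x - c_k * delta)
        grad_est = diff / (2.0 * c_k) * delta    # 1/delta_i equals delta_i for +/-1
        x = project_to_budget(x - a_k * grad_est, total)
    return x

# Toy objective (assumption): distance to a known non-uniform energy allocation.
target = np.array([0.4, 0.8, 1.2, 1.6])

def toy_objective(n_s):
    return float(np.sum((n_s - target) ** 2))

x_opt = spsa_minimize(toy_objective, np.ones(4), total=4.0)
print(x_opt, x_opt.sum())
```

The appeal of SPSA here is that each iteration needs only two noisy evaluations of the objective regardless of the number of frequency slots, whereas a finite-difference gradient would need two per slot.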

A. Energy distribution of the source
In the noiseless scenario, Figs. S15 and S16 show the learning curves of the optimization over the source energy distribution, with the average mean photon number per mode fixed to $\sum_{\ell=1}^{4} N_{S\ell}/4 = 1$. In both figures, we see that the optimal energy distributions are far from uniform, as expected. We achieve an appreciable one-order-of-magnitude improvement when $M$ is large, which guarantees the advantage over the classical lower bound in Fig. 5 of the main text.
As to the energy distribution, the classical lower bound in Eq. (2) of the main paper gives a fairly good intuition for the optimization, although the lower bound itself is likely to be loose. Specifically, the classical bound consists of a few exponential terms in the difference between the energies transmitted through two channels $i, j$, $(\sqrt{\kappa_i} - \sqrt{\kappa_j})^2 M N_S$, and the term with the largest error exponent dominates the performance asymptotically when $M$ is sufficiently large. Hence a fair guess of the optimal strategy is to allocate most of the energy to the slot associated with the dominant term. However, in Fig. S15(b), we see that the energy tends to concentrate on the first slot when $M$ increases, which differs from the classical optimum (the second slot). Note that the transmissivities of the first slot are relatively higher and closer to unity. This divergence from the classical prediction is likely due to the gap between the quantum scenario here and the conventional classical region, viz. the entanglement-assisted advantage over the classical optimum surges as the transmissivity approaches unity. This can also be seen in Fig. S16(b), where the energy is spread over the four modes without any dominant slot, instead of being concentrated on the second slot as predicted by the classical optimum. These differences from the intuition based on classical schemes again demonstrate the novelty of the entanglement-assisted scenario.

B. Gain distribution of the receiver
With the same noiseless environment setting, we optimize the gain distribution on top of the optimal energy distribution achieved above. Figs. S17 and S18 give the learning curves of the optimization over the receiver gain distribution. The gains are restricted per slot by $G - 1 \le 4N_S = 4$, as the gain of the OPA is limited in practice by the nonlinearity of the crystal. Compared with the optimization over the energy distribution, we see a slight improvement from further optimizing the gains in this scenario.
The optimization of the gain distribution is more challenging than that of the energy distribution. As shown in Fig. S9(e-h) for the binary hypothesis-testing case, the gradient of the error rate with respect to the gain is close to zero almost everywhere. A large fluctuation is present in the convergence process due to the small gradient-to-noise ratio.
By contrast, gain optimization is likely to achieve a significant improvement in the noisy scenario. To demonstrate the influence of noise, we include a uniform thermal background with mean photon number $N_B = 0.1$ in the channel. Following the same procedure, we first optimize the energy distribution for the noisy case. Here the presence of noise leads to a sharp increase in the computational complexity of the photon statistics. Consequently, we reduce the average mean photon number per mode to $\sum_{\ell=1}^{4} N_{S\ell}/4 = 0.1$. Meanwhile, we correspondingly scale the range of copy numbers $M$ of interest by a factor of 10. Fig. S19 illustrates the gap between the different gain settings. In both the wine-tasting and drug-testing cases, we see a substantial advantage of the gain optimization over the OPA-absent case, as expected. Interestingly, the simple choice $G = G_0 = 1 + N_{S0}$ according to Eq. (S22) is sufficient to yield nearly the same improvement. This provides a neat rule of thumb for estimating the optimum gain in practical implementations.