Molecular platform for frequency upconversion at the single-photon level

Direct detection of single photons at wavelengths beyond 2 microns under ambient conditions remains an outstanding technological challenge. One promising solution is frequency upconversion into the visible (VIS) or near-infrared (NIR) domain, where single photon detectors are readily available. Here, we propose a nanoscale solution based on a molecular optomechanical platform to up-convert THz and mid-infrared (MIR) photons into the VIS-NIR domain, and perform a detailed analysis of its added noise and conversion efficiency with a full quantum model. Our platform consists of doubly resonant nano-antennas focusing both the incoming long-wavelength radiation and the short-wavelength pump laser field into the same active region. There, infrared active vibrational modes are resonantly excited and couple through their Raman polarizability to the pump field. This optomechanical interaction is enhanced by the antenna and leads to the coherent transfer of the incoming low-frequency signal onto the anti-Stokes sideband of the pump laser. Our calculations demonstrate that our scheme is realizable with current technology and that optimized platforms can reach single photon sensitivity, at wavelengths reaching into the THz domain, with an added noise only limited by the thermal occupancy of the molecular vibration.


I. INTRODUCTION
Many applications in security, material science and healthcare would benefit from the development of new technologies for THz detection and thermal imaging [1]. Driven by applications in astronomy, novel cryogenic detectors in the THz range appeared in the last few years [2,3]. However, the ability to efficiently manipulate such electromagnetic signals at room temperature is still lacking [4]. In particular, single photon detection, which is now routine in the VIS-NIR region (wavelength in vacuum from 400 nm to 2000 nm), remains impossible or impractical at longer wavelengths. Development of new detection devices operating without complex cryogenic apparatus, and featuring improved sensitivity, lower noise and reduced footprint, would significantly impact sensing, imaging, spectroscopy and communication technologies.
In this work we propose a new route to achieve low-noise coherent detection of THz and MIR radiation by leveraging optomechanical transduction with molecules [5], whose natural oscillation frequencies are resonant with the incoming field. Our strategy consists in converting the incoming low-frequency signal onto the anti-Stokes sideband of a pump laser in the VIS-NIR domain, where detectors with single photon sensitivity are readily available [6,7]. This approach is inspired by the recent realization of coherent frequency conversion using different types of optomechanical cavities [8][9][10][11][12][13] and is conceptually distinct from a recently demonstrated detection scheme assisted by a microfabricated resonator [14]. As an outlook, we propose to leverage constructive interference between signals coming from an array of coherently pumped up-converters in order to further increase the strength of the converted signal over the incoherent thermal noise. While coherent conversion from the MIR to the VIS-NIR domain has so far been achieved by sum-frequency generation in bulk non-linear crystals [15][16][17][18], these schemes operated under several Watts of pump power and required phase-matching between the different fields propagating in the crystal. Our scheme, on the contrary, relies solely on the spatial overlap of the two incoming fields. Indeed, we use a nanometer-scale dual antenna that confines both electromagnetic fields into similar subwavelength mode volumes. The optomechanical interaction with the vibrational system takes place in the near field, without the need to fulfill a phase-matching condition. Moreover, thanks to the giant field enhancement provided by plasmonic nano-gaps, the pump power required to achieve efficient conversion is dramatically reduced.
* Corresponding authors: tobias.kippenberg@epfl.ch; † chris.galland@epfl.ch
The protocol that we introduce leverages the intrinsic ability of specific molecular vibrations to interact both resonantly with MIR-THz fields and parametrically with VIS-NIR fields, as routinely observed in infrared absorption and Raman spectroscopy, respectively. The wealth of accessible vibrational modes and frequencies [19,20] offers a convenient toolbox to realize efficient frequency up-conversion for several technological regions of interest (atmospheric window, thermal imaging and THz gap).
We first introduce the framework describing the interaction between a molecular vibration and two electromagnetic fields, one that is resonant with the vibrational frequency, the other one that is parametrically coupled to it through the molecular polarization. We compute the conversion efficiency and the noise figures-of-merit of our novel device as a function of the optical pump detuning and power. We find that internal conversion efficiencies on the order of a few percent and noise equivalent power below $10^{-12}$ W/$\sqrt{\mathrm{Hz}}$ are achievable with our proposed design, highlighting the potential of this approach for reaching single-photon THz detection.
arXiv:1910.11395v1 [physics.optics] 24 Oct 2019

II. OPTICAL CONVERSION SCHEME
We start with the description of the two types of interactions leveraged in the conversion process and describe the relevant parameters. For simplicity, we now use the abbreviation IR to denote MIR or THz fields, depending on the vibrational frequency considered. First we model the resonant absorption process. We assume that the vibrational system is weakly driven, meaning that the average number of excited collective vibrational quanta is much smaller than the total number of molecular oscillators coupled to the incoming field. At the single-molecule level this easily satisfied condition allows us to consider only the transition from the ground to the first excited vibrational state. The collective excitation of an ensemble of vibrational modes can thus be treated as an ensemble of two-level systems [21].
The interaction part of the Hamiltonian is correspondingly approximated by eq. (1), with $\hat a^\dagger_{ir}$, $\hat a_{ir}$ the bosonic ladder operators of the IR field and $\hat\sigma^+_\nu$, $\hat\sigma^-_\nu$ the raising and lowering operators of the collective two-level system with transition frequency $\omega_\nu$. The incoming IR field at frequency $\omega_{ir}$ is enhanced by a frequency-matched antenna and performs work on the collective transition dipole $\vec d_\nu$ of the molecular vibration [22]. On resonance ($\omega_{ir} = \omega_\nu$) the average number of created phonons $\bar n^{ir}_b$ is given by eq. (2) (see Appendix B for a detailed derivation), with $g^{(N)}_{ir} = g^{(N)}_{ir,0}\sqrt{\bar n_{ir}}$, $\bar n_{ir}$ the mean occupation of the IR antenna mode and $|\langle\hat a^{in}_{ir}\rangle|^2$ the incoming IR photon flux. In this expression $\kappa_{ir} = \kappa^{ir}_{ex} + \kappa^{ir}_0$ is the loss rate of the antenna at the incoming frequency, which is the sum of the external decay rate $\kappa^{ir}_{ex}$ (by coupling to the incoming far-field modes) and the internal decay rate $\kappa^{ir}_0$ (by metallic absorption and scattering to other modes), $\eta_{ir} = \kappa^{ir}_{ex}/\kappa_{ir}$ is the coupling ratio of the antenna, and $\Gamma_{tot}$ is the total vibrational decay rate, where the intrinsic vibrational linewidth $\Gamma_\nu$ is modified by its coupling to the IR antenna [23].
As pictured in Fig. 1, we employ a second antenna resonant at $\omega_c$ (a frequency in the VIS-NIR domain, which we call "optical" domain from here on for brevity), whose decay rates $\kappa^{opt}_{ex}$, $\kappa^{opt}_0$ are defined in the same way as the IR antenna parameters. The optical antenna enhances the parametric optomechanical interaction of the molecular vibration with a pump laser in the optical domain, as described in [5]. Concisely, the interaction between an optical field and $N_{opt}$ molecular oscillators leads to a dispersive interaction described by the Hamiltonian $\hat H_{int} = -\hbar g^{(N)}_{opt,0}\,\hat a^\dagger_{opt}\hat a_{opt}(\hat b^\dagger_\nu + \hat b_\nu)$, with $g^{(N)}_{opt,0} = \sqrt{N_{opt}}\,g_{opt,0}$ the collective optomechanical vacuum coupling rate and $\hat a^\dagger_{opt}$, $\hat a_{opt}$ ($\hat b^\dagger_\nu$, $\hat b_\nu$) the bosonic ladder operators of the optical pump field (the phononic operators of the vibrational mode $\nu$).
[Figure: Frequency picture of the optomechanical conversion mechanism involving both IR absorption and Raman scattering by specific vibrational modes. Here the pump tone ($\omega_p$) is red-detuned from the optical resonance ($\omega_c \approx \omega_p + \omega_{ir}$) while the incoming IR signal is resonant with a specific vibrational mode ($\omega_{ir} = \omega_\nu$).]
The optical antenna field can be split into an average coherent amplitude $\alpha$ and a fluctuating term, so that $\hat a_{opt} = \alpha + \delta\hat a_{opt}$. Expanding the optomechanical interaction to first order in $\alpha$, we obtain the linearized interaction of eq. (3), with $g^{(N)}_{opt} = g^{(N)}_{opt,0}\sqrt{\bar n_{opt}}$ and $\bar n_{opt} = |\alpha|^2$ the mean occupation of the optical antenna mode (see Appendix A).
The outgoing noise spectrum on the optical port, in [photons/(Hz·s)], can be evaluated through the calculation of the two-time correlations of the optical output field operators [24,25], as given by eq. (4), where the prefactor in brackets takes into account the frequency dependence of the radiative coupling rate to the far-field of a dipolar emitter.
Following previous works in optomechanics [26] and their extension to molecular optomechanics [27,28], we can write an analytical expression of the emitted noise spectrum at the anti-Stokes sideband, $S^{out}(\omega_{aS}) \propto A^-\,\bar n_f/(\Gamma^*_\nu + \Gamma_{opt})$ (at the Stokes sideband, $S^{out}(\omega_S) \propto A^+(\bar n_f + 1)/(\Gamma^*_\nu + \Gamma_{opt})$), with $\bar n_f$ the mean final phonon number of the vibrational mode. Due to the IR and optomechanical interactions the intrinsic vibrational damping rate is modified to $\Gamma_{tot} = \Gamma^*_\nu + \Gamma_{opt}$, with $\Gamma^*_\nu$ the IR antenna-assisted damping rate and $\Gamma_{opt} = A^- - A^+$ the additional damping rate of electromagnetic origin, characterized by the imbalance between the optical antenna-assisted transition rates to the ground ($A^-$) and excited ($A^+$) vibrational states (see Appendix A). We note that the optical interaction modifies the vibrational lifetime and thus the number of IR-excited phonons in the steady state (eq. 2).
The final phonon number in the vibrational mode, $\bar n_f$, is given by eq. (5) [24], where $\bar n_b = \bar n^{ir}_b + \bar n_{th}$ is the total phonon number in the absence of optical drive. It is the incoherent sum of the IR-induced vibrational excitation (eq. 2) and the thermal noise, $\bar n_{th} = 1/(\exp[\hbar\omega_\nu/k_B T_{bath}] - 1)$ for a bath temperature $T_{bath}$. We assume here that the pump laser does not lead to significant Ohmic heating of the system. It is however straightforward to model laser-induced heating by introducing a pump-power-dependent bath temperature $T_{bath}$.
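To give a feeling for the size of the thermal term, the short sketch below evaluates the Bose-Einstein occupancy quoted above for the 1002 cm⁻¹ thiophenol mode used later in the paper. The bath temperature of 300 K and the function name `thermal_occupancy` are illustrative assumptions, not values or names taken from the manuscript.

```python
import math

# CODATA constants in SI units
H_PLANCK = 6.62607015e-34    # J s
C_LIGHT = 2.99792458e8       # m / s
K_BOLTZMANN = 1.380649e-23   # J / K


def thermal_occupancy(wavenumber_cm, temperature_k):
    """Bose-Einstein occupancy n_th = 1 / (exp(hbar*omega / (k_B*T)) - 1)."""
    # hbar * omega = h * c * wavenumber, with the wavenumber converted to 1/m
    energy_j = H_PLANCK * C_LIGHT * wavenumber_cm * 100.0
    return 1.0 / math.expm1(energy_j / (K_BOLTZMANN * temperature_k))


# Assumed example: 1002 cm^-1 mode at an assumed bath temperature of 300 K
n_th = thermal_occupancy(1002.0, 300.0)
print(f"n_th = {n_th:.4f}")
```

At room temperature the occupancy of such a mid-IR mode is of order 10⁻², whereas lower-frequency (THz) modes have occupancies of order unity, which is why the thermal contribution matters more toward the THz range.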
The resulting noise spectrum, $S^{out}_{opt}$, in the absence of incoming IR radiation ($\bar n^{ir}_b = 0$) should be integrated over the device's operational bandwidth ($\mathrm{BW} \equiv \Gamma_{tot}$) to obtain its dark noise level $\tilde S^{out}_{opt} = \int_{\mathrm{BW}} S^{out}_{opt}\,d\omega$. The noise arising from the thermal contribution to the first term in eq. 5 can be reduced by cooling the bath, whereas the second term describes a minimal noise level resulting from phonon creation by spontaneous Stokes scattering of the pump laser, a process equivalent to quantum backaction in cavity optomechanics. Therefore an optimal power that maximizes the signal-to-noise ratio (SNR) exists, akin to the standard quantum limit (SQL) in position measurements.
From these expressions we are also able to describe the conversion efficiency from an incoming rate of IR photons to an outgoing rate of optical photons: $\tilde S^{out}_{ir\to opt} = \eta_{ext}\,|\langle\hat a^{in}_{ir}\rangle|^2$, where $\eta_{ext} = \eta_{rad}\cdot\eta_{int}\cdot\eta_{ir}$ is the external conversion efficiency. The radiative efficiency of the optical antenna mode into the far field is decomposed as $\eta_{rad} = (\omega/\omega_c)^3\,\eta_{opt}$, with $\eta_{opt} = \kappa^{opt}_{ex}/\kappa_{opt}$ defined in the same way as $\eta_{ir}$ introduced previously. These factors account for the coupling of free-space radiation in and out of the nanostructure.
The internal conversion efficiency $\eta_{int}$ can in turn be divided into a power-dependent part $\eta_{OM}(\bar n_{opt})$ and a part describing the spatial overlap between the IR near field, the optical near field and the molecular ensemble, which we write $\eta_{overlap}$. To approximate this last term we factorize it into two contributions: the spatial overlap between the IR and optical near fields ($\eta_{mode}$) and the vectorial overlap between the near-field polarization (typically normal to the antenna surface) and the molecular orientation, which we name $\eta_{pol}$, so that we can write $\eta_{int} = \eta_{OM}\cdot\eta_{mode}\cdot\eta_{pol}$. The power and detuning dependences of the optomechanical efficiency term $\eta_{OM}$ are depicted in Fig. 3 (b) and its exact calculation is given in Appendix A.
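The chain of efficiency factors described above can be sketched as a simple product. In the example below the overlap values (44 % and 33 %) are the ones quoted later in the paper for the dual antenna, while the values chosen for the optomechanical efficiency and the two antenna coupling ratios are purely hypothetical placeholders, as is the function name.

```python
def external_efficiency(eta_om, eta_mode, eta_pol, eta_opt, eta_ir, omega_ratio=1.0):
    """Return eta_ext = eta_rad * eta_int * eta_ir.

    eta_rad = (omega / omega_c)**3 * eta_opt   (radiative efficiency)
    eta_int = eta_om * eta_mode * eta_pol      (internal efficiency)
    omega_ratio is omega / omega_c; 1.0 means emission at the antenna resonance.
    """
    eta_rad = omega_ratio ** 3 * eta_opt
    eta_int = eta_om * eta_mode * eta_pol
    return eta_rad * eta_int * eta_ir


# eta_mode and eta_pol as quoted in the paper; the other three are hypothetical
eta_ext = external_efficiency(eta_om=0.5, eta_mode=0.44, eta_pol=0.33,
                              eta_opt=0.5, eta_ir=0.5)
print(f"eta_ext = {eta_ext:.5f}")
```

Because the factors multiply, the geometric overlap alone (0.44 × 0.33 ≈ 0.15) caps the internal efficiency regardless of pump power.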

III. MOLECULAR TRANSDUCER
The electric moment $\vec\mu_\nu$ and polarizability $\alpha_\nu$ of a mode can be directly extracted from experimental data of resonant light absorption and inelastic light scattering, respectively [19,20]. In specific cases the symmetries of the vibrational mode lead to selection rules in its interaction with light [29]. For vibrational modes lacking centro-symmetry the derivatives of both quantities can be non-vanishing [29]. We show such a situation in Fig. 2, where we plot the projections of the derivatives of the electric moment and of the polarizability with respect to the molecular coordinate $Q_\nu$ of the 1002 cm$^{-1}$ mode of thiophenol, which we choose as an example in our calculations. We note that the projection of the tensor $\partial\alpha_\nu/\partial Q_\nu$ onto an axis perpendicular to the principal axis of the electric moment derivative $\partial\vec\mu_\nu/\partial Q_\nu$ can be non-vanishing. Several polarizations for in- and out-coupling of resonant and up-converted fields are thus conceivable.
The calculation leading to the parametric optomechanical vacuum coupling rate $g_{opt,0}$ between the antenna field and the vibrational mode has been previously described [5], and its value is given by $g_{opt,0} = \omega_c\left(\vec e_{opt}\cdot\frac{\partial\alpha_\nu}{\partial Q_\nu}\cdot\vec e_{opt}\right)\frac{1}{V_{opt}\varepsilon_0}\sqrt{\frac{\hbar}{2\omega_\nu}}$, with $\alpha_\nu$ the polarizability tensor, $Q_\nu$ the reduced displacement coordinate of the vibrational mode labeled by $\nu$, $V_{opt}$ the optical mode volume and $\vec e_{opt}$ the unit polarization vector of the optical antenna mode.
In analogy, the coupling rate $g_{ir,0}$ associated with a vibrational mode $\nu$ is linked to an effective transition dipole vector $\vec d_\nu$ that can be computed through numerical calculations (cf. Appendix C). Its value is given by $g_{ir,0} = \frac{1}{\hbar}\,\vec d_\nu\cdot\vec E_0$, where the electric field per photon is given by $\vec E_0 = \sqrt{\frac{\hbar\omega_\nu}{2\varepsilon_0 V_{ir}}}\,\vec e_{ir}$, with $V_{ir}$ the mode volume and $\vec e_{ir}$ the unit polarization vector of the IR mode [25,30].
Since $g^{(N)}_{ir}$ scales with $\sqrt{\bar n_{ir} N_{ir}/V_{ir}}$, it can be independent of the mode volume as long as this volume is filled with molecules. In contrast, the interaction of the vibration with the VIS-NIR optical field scales as $\sqrt{\bar n_{opt} N_{opt}}/V_{opt}$, which calls for a device that strongly confines this field and thereby reduces the optical power required to reach an efficient conversion process.
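The scaling argument above rests on two ingredients: the number of molecules that fit in a filled mode volume, and the square-root enhancement of the collective coupling. The sketch below illustrates both, using the thiophenol density and molar mass quoted in the appendix of the paper; the cubic volume of (10 nm)³ and the single-molecule coupling of 1 (arbitrary units) are hypothetical inputs chosen only for illustration.

```python
import math

AVOGADRO = 6.02214076e23  # 1 / mol


def molecules_in_volume(volume_m3, density_kg_m3=1077.0, molar_mass_kg_mol=0.1102):
    """Number of molecules filling a volume (defaults: thiophenol values from the paper)."""
    return density_kg_m3 * volume_m3 * AVOGADRO / molar_mass_kg_mol


def collective_coupling(g0, n_molecules):
    """Collective vacuum coupling g^(N) = sqrt(N) * g0 for N identical oscillators."""
    return math.sqrt(n_molecules) * g0


# Hypothetical mode volume of (10 nm)^3, fully filled with molecules
n_mol = molecules_in_volume(1e-24)
print(f"N = {n_mol:.0f} molecules")
print(f"g_N / g_0 = {collective_coupling(1.0, n_mol):.1f}")
```

For a filled volume N grows linearly with V, so a single-molecule rate scaling as V^(-1/2) (IR case) gives a volume-independent collective rate, while one scaling as 1/V (optical case) still favors tight confinement.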

IV. DUAL PLASMONIC ANTENNA
Nanoantennas have proven to be instrumental in enhancing the interaction of molecules with off-resonant VIS-NIR optical fields (e.g. for surface-enhanced Raman scattering, SERS) [31,32] and resonant IR fields (e.g. for surface-enhanced infrared absorption, SEIRA) [33,34]. We now present the design of a new dual-resonant antenna (see Figs. 1 and 2) and compute the interaction of both antenna local fields with one vibrational mode of molecules covering the nanostructure. We consider that molecules are attached with their main axis perpendicular to the metallic surfaces, and extract from our DFT calculations the relevant components of the derivatives of the electric moment and polarizability. We note that calculations for specific self-assembled monolayer orientations [35,36] or randomly oriented molecules could also be carried out from the full knowledge of $\partial\alpha_\nu/\partial Q_\nu$ and $\partial\vec\mu_\nu/\partial Q_\nu$. In our design, the incoming field to be up-converted and the pump laser field are each resonant with a different component of the antenna, arranged in a crossed configuration. At their intersection, the near-field polarizations of the two fields are collinear ($\vec e_{ir} \parallel \vec e_{opt}$), and we obtain $\eta_{pol} = 33\,\%$ for the specific vibrational mode illustrated in Fig. 2. Electromagnetic simulations demonstrate that the two fields, despite being more than one order of magnitude apart in frequency in that particular example, are confined within a very similar volume inside the nano-gaps separating the two structures. This results in a spatial overlap of the two main electromagnetic field components within the dual antenna of $\eta_{mode} = 44\,\%$ (cf. Appendix D for additional information on the design and parameter values).
From our numerical calculations we find that the antenna-assisted IR coupling rate reaches $g^{(N)}_{ir,0}/(2\pi) \sim 186$ GHz as $V_{ir}$ is decreased by several orders of magnitude below its diffraction limit (the calculations of $V_{ir}$ and $\vec d_\nu$ are detailed in the Appendix). As the cavity damping rate remains strong in comparison to the vacuum IR coupling rate ($2g_{ir,0} < \kappa_{ir}/2$, the Purcell regime), the antenna-enhanced damping rate for this vibrational mode can be approximated by the Purcell-regime expression of Ref. [30]. We notice that the enhancement of the vibrational radiative damping rate ($\Gamma^*_\nu \approx 1.7\,\Gamma_\nu$) achieved with our design may enable efficient generation of IR radiation under optical parametric amplification of the vibrational mode.

V. OPTICAL NOISE CONTRIBUTIONS
The conversion efficiency does not give a complete characterization of the device; we also need to compute the noise added in the up-conversion process on the anti-Stokes signal at $\omega_{aS} = \omega_p + \omega_{ir}$. This noise power (in photons per unit time) is given by [37] $n_{added} = |\langle\hat a^{in}_{ir}\rangle|^2/\mathrm{SNR}(\omega_{aS})$, with the optical signal-to-noise ratio defined as $\mathrm{SNR}(\omega) = \tilde S^{out}_{ir\to opt}(\omega)/\tilde S^{out}_{opt}(\omega)$. The results of this calculation for the anti-Stokes sideband are shown in Fig. 3.
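As a minimal numerical illustration of the two relations above, the sketch below computes where the anti-Stokes sideband lands for a given pump and evaluates the added noise from a signal and a noise rate. The 785 nm pump wavelength and the photon rates are hypothetical inputs chosen for the example; only the 1002 cm⁻¹ vibrational wavenumber comes from the paper.

```python
def anti_stokes_wavelength_nm(pump_wavelength_nm, vib_wavenumber_cm):
    """Wavelength of the sideband at omega_aS = omega_p + omega_ir."""
    pump_wavenumber_cm = 1.0e7 / pump_wavelength_nm  # nm -> cm^-1
    return 1.0e7 / (pump_wavenumber_cm + vib_wavenumber_cm)


def added_noise(photon_flux_in, signal_out, noise_out):
    """n_added = |<a_in>|^2 / SNR, with SNR = signal_out / noise_out (all in photons/s)."""
    snr = signal_out / noise_out
    return photon_flux_in / snr


# Hypothetical 785 nm pump, 1002 cm^-1 thiophenol mode
print(f"anti-Stokes at {anti_stokes_wavelength_nm(785.0, 1002.0):.2f} nm")
# Hypothetical rates: 1e6 IR photons/s in, 1e3 converted photons/s, 10 noise photons/s
print(f"n_added = {added_noise(1e6, 1e3, 10.0):.0f} photons/s")
```

Expressed this way, the added noise is simply the incoming IR photon flux that would produce a converted signal equal to the dark noise.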
When operating with a pump laser red-detuned from the optical resonance ($\Delta = \omega_p - \omega_c = -\omega_\nu$) in the resolved-sideband regime $\kappa_{opt}/2 < \omega_\nu$, we can simplify the interaction of eq. 3 by keeping only its resonant terms. In this regime the coherent up-conversion mechanism is clearly evidenced and reaches maximal efficiency while keeping a low added noise level. We note that for low-frequency vibrational modes the condition $\kappa_{opt}/2 < \omega_\nu$ could be achieved with the help of hybrid cavities that feature narrower linewidths [38,39].
Table I. Noise equivalent power of commercially available uncooled devices [4] and comparison with the device presented in this manuscript (Molecular device).
Alternatively, we can describe our up-converter as a detector that reaches a unity signal-to-noise ratio for an incoming power $P^{min,in}_{ir}$. The performance of our device can then be evaluated through its noise equivalent power (NEP), $\mathrm{NEP} = P^{min,in}_{ir}/\sqrt{\mathrm{BW}}$ [W·Hz$^{-1/2}$], a commonly used figure of merit to compare detection devices independently of their respective operational bandwidth. The NEP achievable with our detection scheme depends on the optical pump power as shown in Fig. 3, and reaches values that compare favorably with other uncooled technologies for IR detection (Table I).
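The NEP definition above is a one-line computation, sketched here together with the energy of a single IR photon for scale. The minimum detectable power and the bandwidth in the example are hypothetical numbers, and treating the bandwidth directly in Hz is an assumption of this sketch.

```python
import math

H_PLANCK = 6.62607015e-34  # J s
C_LIGHT = 2.99792458e8     # m / s


def nep_from_min_power(p_min_w, bandwidth_hz):
    """NEP = P_min / sqrt(BW), in W / sqrt(Hz)."""
    return p_min_w / math.sqrt(bandwidth_hz)


def photon_energy_j(wavenumber_cm):
    """Energy h*c*wavenumber of one photon, with the wavenumber given in cm^-1."""
    return H_PLANCK * C_LIGHT * wavenumber_cm * 100.0


# Hypothetical example: 1 microwatt minimum power over an assumed 1 THz bandwidth
nep = nep_from_min_power(1e-6, 1e12)
print(f"NEP = {nep:.2e} W/sqrt(Hz)")
print(f"photon energy at 1002 cm^-1 = {photon_energy_j(1002.0):.3e} J")
```

Dividing by the square root of the bandwidth is what makes devices with very different response times comparable on a single scale.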

VI. BUILDING BLOCK OF AN IR DETECTOR ARRAY
The different contributions to the optical noise (cf. eq. (5)) arise from different physical mechanisms. In the weak optical drive limit the noise originates almost entirely from the residual thermal population of the vibrational mode. If we consider an array of molecular converters that are uncoupled in the near-field, their respective outgoing anti-Stokes fields of thermal origin will not have any phase coherence with each other, even if pumped by the same laser field. On the contrary, if all devices are illuminated by the same incoming IR field and pumped by the same optical field, all the sum-frequency anti-Stokes signals will be phase coherent and will interfere constructively in specific directions, in analogy with a phased emitter array [40,41]. With a sufficiently large array and a distance between single elements smaller than the involved wavelengths, only a single direction in the far field exhibits constructive interference. The major advantage of this approach is to increase the SNR linearly with the number of devices, as demonstrated in Appendix E for a linear array.
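The linear SNR scaling can be illustrated with a toy phasor sum: identical, in-phase emitters add in amplitude (intensity N² on axis), whereas mutually incoherent thermal emitters add in power (N). The sketch below is a simplified scalar model with unit-amplitude isotropic emitters, not the full treatment of Appendix E; the function names and the half-wavelength spacing are assumptions.

```python
import cmath
import math


def array_intensity(n_elements, spacing_over_wavelength, theta_rad):
    """|sum_n exp(i*2*pi*(d/lambda)*n*sin(theta))|^2 for identical in-phase emitters.

    theta is measured from the normal to the array axis (broadside direction).
    """
    phase_step = 2.0 * math.pi * spacing_over_wavelength * math.sin(theta_rad)
    field = sum(cmath.exp(1j * phase_step * n) for n in range(n_elements))
    return abs(field) ** 2


def snr_gain(n_elements, spacing_over_wavelength=0.5):
    """SNR of the array relative to one converter, evaluated at broadside."""
    coherent_signal = array_intensity(n_elements, spacing_over_wavelength, 0.0)
    incoherent_noise = float(n_elements)  # N independent emitters add in power
    return coherent_signal / incoherent_noise


for n in (1, 4, 16):
    print(n, snr_gain(n))
```

With sub-wavelength spacing the phasor sum has a single main lobe at broadside, consistent with the single direction of constructive interference mentioned above.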
A configuration with multiple converters within the IR spot could alternatively lead to on-chip IR multiplexing [42][43][44][45], with converters responding to multiple IR signals and bypassing the limited detection bandwidth of a single converter. This sub-wavelength platform offers novel opportunities benefiting from the coherent nature of the conversion process and opens the way for multi-spectral IR detection, IR imaging and recognition technologies.

VII. CONCLUSION
In summary, we presented a new concept for frequency up-conversion from the mid-infrared to the visible domain based on the interaction of both fields with molecular vibrations coupled to a dual-resonant nano-antenna. We considered an incoming long-wavelength infrared radiation that resonantly excites a vibrational mode, which is simultaneously coupled through its Raman polarizability to a coherent pump field at shorter wavelength (visible or near-infrared), resulting in up-conversion of the IR signal onto the anti-Stokes sideband of the pump. Thanks to the recently developed framework of molecular cavity optomechanics, we were able to treat the problem in a full quantum model, and thereby predict the internal quantum efficiency of our device, as well as its added noise. We showed that the latter has two main origins: a thermal noise and a backaction noise (including quantum and dynamical backaction) that increases with pump power and eventually becomes dominant. We analysed the dependence of the noise-equivalent power (NEP) on the intra-cavity pump photon number and pump-cavity detuning, and predicted that under the optimal condition of red-sideband excitation, the NEP can be as low as a few pW·Hz$^{-1/2}$, improving on the state of the art for devices operating at ambient conditions.
We would like to stress that our numerical estimates are based on a realistic nano-antenna design and a standard simple molecule (thiophenol). Although the intracavity photon numbers required to reach optimal performance appear to be large, they can be achieved under pulsed excitation [46]. Moreover, requirements on the intra-cavity power would be lowered by further reducing the gap size (down to 1-2 nm) and by chemical engineering of the molecular converter toward higher Raman activity. Our study also shows that by moderately increasing the resonant coupling rate between molecular vibration and IR antenna, the system would enter the IR strong coupling regime, with the formation of vibrational polaritons [21]. We leave the study of the conversion process in this new regime for future investigation.

ACKNOWLEDGEMENT

Appendix A
A detailed description of the optomechanical framework can be found elsewhere [5,26,27,28]. Here we just remind the reader of the few definitions and relationships used in the paper.
The average intracavity occupation $\bar n_{opt/ir}$ (both for the IR and optical modes) can be related to the incoming photon flux $|\langle\hat a^{in}_{opt/ir}\rangle|^2$ (or incoming power) and to the external coupling rate $\kappa^{opt/ir}_{ex}$. When considering the molecular vibrational levels and their parametric coupling to the optical field, the antenna-assisted transition rate to a lower excited level is denoted $A^-$ and the antenna-assisted transition rate to a higher excited vibrational level is denoted $A^+$. The interested reader can find the complete derivation of the outgoing power spectral density in Ref. [24]. In this manuscript we are interested in the signal arising on the anti-Stokes sideband. Starting from eq. (4) in the main text and following the same calculation steps, we arrive at the final expression for the anti-Stokes spectrum. For convenience we label the different components of the outgoing noise spectrum according to the origin of the vibrational population from which they result (cf. eq. (5) in the main text): a thermal component $S^{out}_{th}$ and a back-action component $S^{out}_{ba}$. With this notation the total noise added in the conversion is $S^{out}_{opt} = S^{out}_{th} + S^{out}_{ba}$.
We can then derive the expression for the conversion efficiency, defined through $\tilde S^{out}_{ir\to opt} = \eta_{ext}\,|\langle\hat a^{in}_{ir}\rangle|^2$, and obtain eq. (A6). This expression highlights the different factors constituting eq. (6) of the main text. For a pump field red-detuned from the optical antenna resonance ($\Delta = \omega_p - \omega_c = -\omega_\nu$) this expression can be further developed to show the dependence of the internal conversion efficiency on both collective vacuum coupling rates [eq. (A7)].
Using this conversion efficiency, we can also calculate the NEP directly from the dark noise and efficiency of the device as $\mathrm{NEP} = \frac{\hbar\omega_\nu}{\eta_{ext}}\sqrt{\tilde S^{out}_{opt}}$ [7]. In Fig. 4 we show both conversion efficiency and NEP for the case $\Delta = -\omega_\nu$. The NEP calculated from the conversion efficiency gives results identical to the NEP values as defined and calculated in the main text. In this red-detuned case our model assumptions remain valid over a large range of optical intracavity photon numbers. At high optical power we observe that both the efficiency and the NEP reach an extremal value when the back-action contribution to the outgoing noise becomes predominant, as depicted in Fig. 4.
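The dark-noise route to the NEP can be sketched numerically, reading the relation above as NEP = ħω_ν·sqrt(S)/η_ext with S the integrated dark noise in photons per second. This reading, the function names, and the example values (1 % external efficiency, 10⁶ dark photons per second) are assumptions made for illustration; only the 1002 cm⁻¹ mode frequency is taken from the paper.

```python
import math

HBAR = 1.054571817e-34   # J s
C_LIGHT = 2.99792458e8   # m / s


def angular_frequency(wavenumber_cm):
    """omega = 2*pi*c*wavenumber, with the wavenumber given in cm^-1."""
    return 2.0 * math.pi * C_LIGHT * wavenumber_cm * 100.0


def nep_from_dark_noise(omega_rad_s, eta_ext, dark_rate_photons_s):
    """NEP = hbar*omega / eta_ext * sqrt(dark rate), in W / sqrt(Hz)."""
    return HBAR * omega_rad_s / eta_ext * math.sqrt(dark_rate_photons_s)


# Hypothetical device: 1 % external efficiency, 1e6 dark photons per second
omega_nu = angular_frequency(1002.0)
print(f"NEP = {nep_from_dark_noise(omega_nu, 0.01, 1e6):.2e} W/sqrt(Hz)")
```

The formula makes the two levers explicit: the NEP improves linearly with the external efficiency but only as the square root of any reduction in dark noise.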

Appendix B: Absorption of incoming IR radiation by a vibrational mode
We describe in the following the coupling between a resonant field and a single vibrational mode inside a cavity (i.e. antenna). We also derive the expression given in the main text for the number of phonons created. Our treatment is inspired by the treatment adopted in Ref. [22].
The interaction between an external monochromatic field of frequency $\omega_{ir}$ and the molecular vibration inside a cavity is given, in the dipole approximation, by the standard dipole coupling, with $\phi$ the phase offset between the field and the dipole, $\phi_0$ an adjustable phase parameter of the driving field, $V_{ir}$ the mode volume and $\vec e_k$ the unit polarization vector of the IR mode.
This interaction can be written in terms of the bosonic ladder operators describing the IR mode inside the cavity, $\hat a^\dagger_{ir}$, $\hat a_{ir}$, and the vibrational phononic operators $\hat b^\dagger_\nu$, $\hat b_\nu$ of mode $\nu$. For a weak IR drive the vibrational Hilbert space can be reduced to the ground and first excited states $\{|0\rangle, |1\rangle\}$ and described like a two-level system (TLS) with creation and annihilation operators $\hat\sigma^+_\nu$, $\hat\sigma^-_\nu$ [25]. We note that the validity of the TLS description for a collective vibrational mode of $N$ oscillators would only break down for a number of excitations of order $N$ [30].
The dipolar transition is purely nondiagonal in this basis and described as $\hat{\vec d} = d_\nu(\hat\sigma^-_\nu\vec e_\nu + \hat\sigma^+_\nu\vec e^{\,*}_\nu)$. The field inside the cavity is in turn described by $\hat{\vec E}_{ir} = -i\sqrt{\bar n_{ir}}\,E_0\left(e^{-i\phi_0}\hat a_{ir}\vec e_k - e^{i\phi_0}\hat a^\dagger_{ir}\vec e^{\,*}_k\right)$ [30]. In a frame rotating at the frequency of the IR driving field we only have to take into account the resonant processes ($\hat a^\dagger_{ir}\hat\sigma^-_\nu$, $\hat a_{ir}\hat\sigma^+_\nu$), and we obtain an interaction Hamiltonian with coupling $g_{ir} = d_\nu E_0\sqrt{\bar n_{ir}}\,(\vec e^{\,*}_\nu\cdot\vec e_k)\,e^{i\phi_0} = d_\nu E_0\sqrt{\bar n_{ir}}$. Here we choose the additional phase term of the driving field such that the coupling is real and positive, without loss of generality.
We follow the dynamics of the TLS in this rotating frame. Introducing the rate $\Gamma_{tot}$, which describes the total damping of the vibrational mode as described in the main text, we obtain the equations of motion of the TLS, with $\delta = \omega_{ir} - \omega_\nu$ the detuning between the IR drive and the vibrational resonance.
These equations are often described with the help of the Bloch vector components $(u, v, w)$. The components $u$, $v$ of the Bloch vector are related to the average dipole value [22]: $\langle d\rangle = 2d_\nu(u\cos\omega_{ir}t - v\sin\omega_{ir}t)$. We derive the master equations as a function of these components and solve them in the steady state. The average number of photons absorbed per unit time by the vibrational dipole follows from these steady-state solutions and simplifies when the detuning and coupling are much smaller than the vibrational damping rate ($\delta, g_{ir} < \Gamma_{tot}$). In the steady state the rate of photons absorbed by the vibrational mode equals the phonon damping rate, which yields the average number of excited phonons given in eq. (2) of the main text. We note that the average number of excited phonons $\bar n^{ir}_b$ can also be simply derived from the steady-state population of the upper TLS state, $\bar n^{ir}_b = \bar w + \frac{1}{2}$.
Appendix C: Simulation of molecular parameters
DFT calculations give the infrared absorption intensity of fundamental vibrational transitions $I^{ir}_\nu$ [29,47] via the calculation of derivatives of the electric moment components $\mu^i_\nu$ with respect to the normal coordinates representing the vibrational mode of interest. These intensities are usually expressed in [km·mol$^{-1}$]. For a non-degenerate and harmonic vibrational mode the absorption intensity averaged over all orientations is proportional to $|\partial\vec\mu_\nu/\partial Q_\nu|^2$, with a prefactor that contains the Avogadro number $N_A$.

Gaussian calculations
The procedure is well described in the context of Raman calculations in the book of Le Ru & Etchegoin [48]. The Gaussian software gives access to the derivatives of the electric dipole with respect to the $i$-th component of the displacement in Cartesian coordinates of the $n$-th atom. These derivatives can then be converted to derivatives with respect to the normal coordinates of a vibrational mode $\nu$ and are given in atomic units, i.e. the electric moment is given in Bohr-electron (2.54 Debye / $8.48\cdot10^{-30}$ C·m) and the displacement in Bohr (0.529 Å). The quantities $\partial\vec\mu_\nu/\partial Q_\nu$ can be converted to other systems of units and finally linked to the absorption intensity $I^{ir}_\nu$ of an incoming field of polarization $\vec e_i$ [eq. (C3)].
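The atomic-unit conversion factors quoted above can be checked with a few lines. The sketch below only converts a dipole moment and a length from atomic units; the conversion of the derivative with respect to a mass-weighted normal coordinate is not covered, and the function names are illustrative.

```python
E_CHARGE = 1.602176634e-19        # C, elementary charge
BOHR_RADIUS = 5.29177210903e-11   # m
DEBYE = 3.33564e-30               # C m


def dipole_au_to_si(value_au):
    """Convert a dipole moment from atomic units (e * a0) to C m."""
    return value_au * E_CHARGE * BOHR_RADIUS


def dipole_au_to_debye(value_au):
    """Convert a dipole moment from atomic units (e * a0) to Debye."""
    return dipole_au_to_si(value_au) / DEBYE


def length_au_to_angstrom(value_au):
    """Convert a length from Bohr to Angstrom."""
    return value_au * BOHR_RADIUS * 1.0e10


print(f"1 e*a0 = {dipole_au_to_si(1.0):.3e} C m = {dipole_au_to_debye(1.0):.3f} D")
print(f"1 Bohr = {length_au_to_angstrom(1.0):.4f} Angstrom")
```

These reproduce the rounded values given in the text (2.54 Debye, 8.48·10⁻³⁰ C·m and 0.529 Å).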

Effective dipole moment
Accordingly we can also describe an effective dipole moment d ν to characterize the vibrational transition and link it to the value of the absorption cross-section:

Raman activity of an ensemble of molecules
We refer the interested reader to Refs. [5,48] for detailed descriptions of the Raman activity, its connection with the optomechanical coupling rate and its calculation through DFT. For completeness we recall here how the tensorial quantity $\partial\alpha_\nu/\partial Q_\nu$ is averaged over randomly oriented molecules. To simplify the notation we introduce in the following the Raman tensor $R_\nu = \partial\alpha_\nu/\partial Q_\nu$ and we refer to the scalar $(\vec e_i\cdot R_\nu\cdot\vec e_j)$ as $R^{ij}_\nu$. The total Raman scattered power arises from the contributions of the two possible outgoing polarizations. Averaging over random orientations of the molecules, one obtains orientation-averaged values of the co-polarized ($R^{ii}_\nu$) and cross-polarized ($R^{ji}_\nu$) components. These quantities do not depend on the two orthogonal orientations of the field chosen as the polarization basis but only on the intrinsic properties of the molecule. In that situation Raman scattering can be described by a scalar, named the magnitude of the Raman tensor, which can be derived directly from DFT calculations.
We also introduce the depolarization ratio $\rho = \langle (R^{ji}_\nu)^2\rangle/\langle (R^{ii}_\nu)^2\rangle$, which evaluates the importance of the cross-polarized component of the Raman-scattered field (with respect to the incoming field) and which is bounded by $0 \le \rho \le 3/4$. In the SERS scenario the outgoing field is solely polarized along the direction of the local cavity field $\vec e_L$. For randomly oriented molecules the magnitude of the Raman tensor is thus rescaled by a factor depending on the depolarization ratio.

Local overlap η_pol
The factor $\eta_{pol}$ describes the local vectorial overlap between the two fields involved in our conversion scheme, on the one hand, and the IR dipole and Raman tensor of the molecular vibration, on the other hand. In its definition the label $L$ designates the direction of the near field at the location of the molecule. To compute the averaged $\langle\eta_{pol}\rangle$ (see Table II) we numerically average $\eta_{pol}$ over all possible orientations of the molecule, while keeping the IR and optical local fields collinear. This is justified by considering that the near field at a metallic surface is orthogonal to it.

Orientation and number of molecules contributing to the IR/optical process
From our DFT calculations we compute the molecular parameters for several cases of interest and report their values in Table II. Two orientations (main axis of the molecule parallel to both local fields, and fully random) were considered. Two options were also considered for the coverage: one monolayer covering the planar parts of the metallic nanostructure, or a superposition of layers filling the entire volume where the fields are localized. We use the IR/optical mode volumes $V_{ir/opt}$ (given below), the molar mass ($M = 0.1102$ kg/mol), volume density ($\rho = 1077$ kg/m$^3$) or surface density ($\rho_S = 6.8\cdot10^{18}$ m$^{-2}$) of thiophenol to estimate the number $N_{ir}$ ($N_{opt}$) of molecules participating in the IR (optical) process.

Appendix D
Our dual antenna consists of two gold bowtie structures. We set the gap between the tips of both antennas ($S = 25$ nm) so that the design could be fabricated using current nanofabrication techniques such as focused ion beam milling or advanced e-beam lithography. We select the other structural parameters (Fig. 5) in order to obtain appropriate resonances both in the mid-IR (length $L$ and width $W$) and in the optical domain (short length $l$).
The gold substrate at a distance D below the dual antenna reflects the incoming field and creates an interference pattern that improves the absorption of the IR incoming field as shown in a previous study [33].
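The molecule-number estimate described above for thiophenol can be sketched in a few lines. This is a minimal illustration, assuming that the filled-volume case is simply the bulk number density times the mode volume and that the monolayer case is the surface density times a covered area. The function names, the example mode volume and the example area are hypothetical placeholders, not the values of Table III.

```python
# Sketch of the molecule-count estimate; only M, rho and rho_S come from the text.
AVOGADRO = 6.02214076e23   # 1/mol (exact SI value)
MOLAR_MASS = 0.1102        # kg/mol, thiophenol (from the text)
VOLUME_DENSITY = 1077.0    # kg/m^3, thiophenol (from the text)
SURFACE_DENSITY = 6.8e18   # molecules per m^2 for one monolayer (from the text)


def number_density():
    """Molecules per cubic metre, assuming bulk thiophenol fills the volume."""
    return VOLUME_DENSITY / MOLAR_MASS * AVOGADRO


def molecules_in_volume(mode_volume_m3):
    """Filled-volume case: number density times the mode volume."""
    return number_density() * mode_volume_m3


def molecules_in_monolayer(area_m2):
    """Monolayer case: surface density times the covered area."""
    return SURFACE_DENSITY * area_m2


if __name__ == "__main__":
    example_volume = 1e-24            # m^3, hypothetical (a 10 nm cube)
    example_area = (25e-9) ** 2       # m^2, hypothetical (25 nm x 25 nm patch)
    print(f"number density: {number_density():.3e} m^-3")
    print(f"filled volume:  {molecules_in_volume(example_volume):.0f} molecules")
    print(f"monolayer:      {molecules_in_monolayer(example_area):.0f} molecules")
```

Replacing the placeholder volume and area with the IR and optical mode volumes of the actual antenna would give the corresponding $N_{\mathrm{ir}}$ and $N_{\mathrm{opt}}$ under the same assumptions.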

Numerical calculations
We use a 3D FEM software (COMSOL Multiphysics) to evaluate our dual antenna design. A Drude-Lorentz model fitted to experimental data [50] describes the electromagnetic response of gold. For the calculation in the optical range, a dielectric layer (ITO) with refractive index n = 1.94 and thickness 52 nm was added below the antenna [51].
A dipolar emitter is placed at the center of our structure to evaluate the local density of electromagnetic states inside the antenna. Figure 6 shows the modification of the radiated power as a function of the oscillation frequency of the dipole. Based on these plots, we modeled the response of the structure to an incoming optical field and to an incoming mid-IR field (at around 32 THz) as being dominated by the contribution from a single resonance with Lorentzian profile. We thus used a multi-Lorentzian fit to extract the relevant linewidths and total decay rates. Through the Purcell formula [28], we could estimate the corresponding effective mode volumes. Additional integrals were computed to determine the losses originating from absorption in the metal and the ratio between intrinsic and radiative losses at the resonance frequencies. All parameters are shown in Table III.

We discuss the different contributions to the optical noise starting from the expression for $\bar{n}_f$, eq. (5) in the main text. When $\Gamma_{\mathrm{opt}} \ll \Gamma_{\nu}^{*}$ the equation for the vibrational population splits into three different contributions identified as thermal ($\bar{n}_{\mathrm{th}}$), dynamical back-action ($\bar{n}_{\mathrm{dba}}$) and quantum back-action ($\bar{n}_{\mathrm{qba}}$) noise, respectively:

For sensing applications it is enlightening to study how the contributions from the different noise terms are affected when considering an array of converters coherently illuminated by the IR field and the pump laser. We describe a linear array of $N$ optomechanical converters. For simplicity we consider identical converters separated uniformly with a spacing $d < \lambda_{\mathrm{opt}} < \lambda_{\mathrm{ir}}$ in order to avoid multiple maxima in the radiation pattern of the array. If all converters are excited in phase, the configuration is known as a broadside configuration and the maximum radiation is directed normal to the array axis.
We assume that the optical pump power is split among the antennas, so that the pump power per antenna is diluted according to $|\alpha^{(i)}|^2 = \frac{1}{N} |\alpha^{(0)}|^2$ and the different cavity-assisted molecular rates scale as $1/N$, i.e. $A_{+/-}^{(i)} = \frac{1}{N} A_{+/-}^{(0)}$. If the back-action effects are weak at the single-converter level, $\Gamma$ […]. The expression for the final population $\bar{n}_f$ (eq. E1) shows that in this case thermal noise is the main contribution to the total noise.
In the far-field, constructive interference among the fields emitted from individual antennas sharpens the pattern of coherent radiation [40], so that the total IR converted signal in this direction scales as [52] $S^{\mathrm{out},(N)}_{\mathrm{ir}} = (\text{array factor})^2 \cdot S^{\mathrm{out},(i)}_{\mathrm{ir}}$, which results in $S^{\mathrm{out},(N)}_{\mathrm{ir}} \approx N^2 S^{\mathrm{out},(i)}_{\mathrm{ir}}$ along the direction of maximum radiation for a broadside array. In contrast, if the converters are sufficiently spaced to avoid any near-field coupling, the thermal emission remains incoherent and quasi-isotropic.
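The $N^2$ scaling along the broadside direction can be checked numerically with the textbook array factor of a uniform linear array of isotropic emitters. The sketch below is only an illustration under that assumption (identical, in-phase, point-like emitters, no mutual coupling); the function names and the example values $N = 8$ and $d/\lambda = 0.5$ are hypothetical and do not come from the paper.

```python
import cmath
import math


def array_factor(n_elements, spacing_over_wavelength, theta, phase_step=0.0):
    """Magnitude of the array factor of a uniform linear array.

    theta is measured from the array axis, so theta = pi/2 is broadside.
    phase_step is the excitation phase difference between neighbours
    (0 for in-phase, i.e. broadside, operation).
    """
    psi = 2.0 * math.pi * spacing_over_wavelength * math.cos(theta) + phase_step
    return abs(sum(cmath.exp(1j * n * psi) for n in range(n_elements)))


def broadside_intensity_gain(n_elements, spacing_over_wavelength):
    """Coherent intensity along the array normal, relative to one emitter."""
    return array_factor(n_elements, spacing_over_wavelength, math.pi / 2) ** 2


if __name__ == "__main__":
    n, d_over_lambda = 8, 0.5   # hypothetical example values, d < lambda
    print(f"broadside gain for N={n}: {broadside_intensity_gain(n, d_over_lambda):.1f}")
    # Scan the angle: with d < lambda the only global maximum is at broadside.
    angles = [math.pi * k / 180 for k in range(181)]
    peak = max(angles, key=lambda t: array_factor(n, d_over_lambda, t))
    print(f"peak direction: {math.degrees(peak):.0f} degrees from the array axis")
```

At broadside all $N$ phasors add in phase, so the array factor equals $N$ and the coherent intensity equals $N^2$ times that of a single emitter, whereas an incoherent sum of $N$ emitters would only give a factor $N$.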
We combine the factors related to the power dilution and to the directivity of the linear array to describe the SNR of the array in the regime dominated by thermal noise:

Our argument highlights the potential of this nanoscale converter for building advanced architectures targeting specific applications.