Cavity quantum electro-optics: Microwave-telecom conversion in the quantum ground state

Fiber optic communication is the backbone of our modern information society, offering high bandwidth, low loss, weight, size and cost, as well as an immunity to electromagnetic interference. Microwave photonics lends these advantages to electronic sensing and communication systems, but - unlike the field of nonlinear optics - electro-optic devices so far require classical modulation fields whose variance is dominated by electronic or thermal noise rather than quantum fluctuations. Here we present a cavity electro-optic transceiver operating in a millikelvin environment with a mode occupancy as low as 0.025 $\pm$ 0.005 noise photons. Our system is based on a lithium niobate whispering gallery mode resonator, resonantly coupled to a superconducting microwave cavity via the Pockels effect. For the highest continuous wave pump power of 1.48 mW we demonstrate bidirectional single-sideband conversion of X band microwave to C band telecom light with a total (internal) efficiency of 0.03 % (0.7 %) and an added output conversion noise of 5.5 photons. The high bandwidth of 10.7 MHz combined with the observed very slow heating rate of 1.1 noise photons s$^{-1}$ puts quantum limited pulsed microwave-optics conversion within reach. The presented device is versatile and compatible with superconducting qubits, which might open the way for fast and deterministic entanglement distribution between microwave and optical fields, for optically mediated remote entanglement of superconducting qubits, and for new multiplexed cryogenic circuit control and readout strategies.

The last three decades have witnessed the emergence of a great diversity of controllable quantum systems, and superconducting Josephson circuits are one of the most promising candidates for the realization of scalable quantum processors [19]. However, quantum states encoded in microwave frequency excitations are very sensitive to thermal noise and electromagnetic interference. Short distance quantum networks could be realized with cryo-cooled transmission lines but longer distances and high density networks require coherent upconversion to shorter wavelength information carriers, ideally compatible with existing near infrared (1550 nm) fiber optic technology. So far no solution exists to deterministically interconnect remote quantum microwave systems, such as superconducting qubits [19] and quantum dots or spins in solids [20,21] via a room temperature link with sufficient fidelity to build large-scale quantum networks [22]. Solving this challenge might not only facilitate a new generation of more power efficient classical communication systems [2], but eventually also enable quantum secure communication, modular quantum computing clusters [23] and powerful quantum sensing networks.
An ideal quantum signal converter [24] needs to achieve a total bidirectional conversion efficiency close to unity, $\eta \sim 1$, for quantum level signals with a minimum amount of added noise, $N \ll 1$, over a large instantaneous bandwidth that allows for fast transduction compared to typical qubit coherence times. Many different platforms are already being explored for microwave to optical photon conversion [25,26]. Electro-optomechanical systems have shown very encouraging efficiencies [27,28], but typically suffer from a limited bandwidth in the kHz range. Electro-optic [6,11,12] or piezo-optomechanical [29][30][31] conversion can be faster but the conversion noise properties have not been quantified. Facilitated by efficient photon counting and low duty cycle operation, unidirectional transduction of quantum level signals has also recently been shown [32,33], but ground state operation has not been demonstrated in a bidirectional interface so far. In this work we present such a device operating continuously with a microwave mode occupancy $N_e \le 1$ for a pump laser power of up to $P_p = 23.5$ µW, resulting in a total bidirectional conversion efficiency of $\eta_{tot} = 9.1 \times 10^{-6}$. The maximum achieved total efficiency of $\eta_{tot} = 0.03\,\%$ is limited by the highest pump laser power of $P_p = 1.48$ mW for the available setup at millikelvin temperatures.
The electro-optic interaction between the three participating modes is described by the Hamiltonian
$$\hat{H}_{int} = \hbar g \left(\hat{a}_e \hat{a}_p \hat{a}_o^\dagger + \hat{a}_e^\dagger \hat{a}_p^\dagger \hat{a}_o\right), \tag{1}$$
where $\hat{a}_e$, $\hat{a}_p$, and $\hat{a}_o$ stand for the annihilation operators for the microwave, optical pump, and optical signal mode, respectively. This Hamiltonian describes two reciprocal three-wave mixing processes that involve creation and annihilation of photons while respecting energy conservation.
The nonlinear vacuum coupling rate $g$ for this interaction depends on the material's effective electro-optic coefficient $r$ and the spatial overlap of the three modes [6] (Eq. (2)), with the mode frequency $\omega_k$, the relative permittivity $\varepsilon_k$ and permeability $\mu_k = 1$, the effective mode volume $V_k$, and the normalized spatial field distribution $\psi_k$ defined such that the single-photon electric field for mode $k \in \{e, p, o\}$ can be written as $E_k(\vec{r}) = \sqrt{\hbar\omega_k/(2\varepsilon_0\varepsilon_k V_k)}\,\psi_k(\vec{r})$. All three modes are whispering gallery modes (WGM) [34] whose spatial field distribution can be separated into a cross-sectional and an azimuthal part, $\psi_k(r, z, \phi) = \Psi_k(r, z)e^{-i m_k \phi}$. The integral in Eq. (2) is non-zero only if the azimuthal numbers of the participating modes fulfill $m_o = m_p + m_e$, which is also known as phase matching or angular momentum conservation.
In our conversion scheme we drive the mode $\hat{a}_p$ with a bright coherent tone, $\hat{a}_p \rightarrow \alpha_p$ (taken to be real), which simplifies Eq. (1) to
$$\hat{H}_{int} = \hbar \alpha_p g \left(\hat{a}_e \hat{a}_o^\dagger + \hat{a}_e^\dagger \hat{a}_o\right). \tag{3}$$
This is known as the beam splitter Hamiltonian and it corresponds to a linear coupling between the optical mode $\hat{a}_o$ and the microwave mode $\hat{a}_e$. From the enhanced coupling rate $\alpha_p g$, we define the multi-photon cooperativity as $C = 4|\alpha_p|^2 g^2/(\kappa_o \kappa_e)$, where $\kappa_o$ and $\kappa_e$ are the total loss rates of the optical and microwave mode, respectively. The multi-photon cooperativity is the figure of merit in most resonant electro-optic devices, both for frequency conversion and entanglement generation [5,6,11,14].

Device
The electro-optic transducer consists of a z-cut LiNbO$_3$ WGM resonator, with a major radius R = 2.5 mm, sidewall surface radius ρ ≈ 0.7 mm and a thickness d = 0.15 mm, coupled to a superconducting aluminum cavity as shown in Fig. 1a. The top and bottom rings of the cavity are designed to confine the microwave mode at the rim of the WGM resonator and maximize the spatial mode overlap with the two optical modes. Here we use type-0 frequency conversion, where all the participating waves are polarized parallel to the material's optic and symmetry axis, addressing the highest electro-optic tensor component of LiNbO$_3$. We work with two optical modes of the WGM resonator that are spectrally separated by the resonator's optical free spectral range (FSR) as shown in Fig. 1b. The $m_p - 1$ mode's participation in the resonant interaction is suppressed due to its avoided crossing with another mode family [6], leaving only 2 optical modes in the process. We use an antireflection coated diamond prism to feed the optical pump into the optical resonator via evanescent coupling. The prism is attached to a linear piezo positioning stage that allows us to accurately tune the extrinsic optical coupling rate $\kappa_{ex,o}$ in situ. The continuous wave optical pump is a ∼ 10 kHz linewidth coherent laser tone that is locked to the resonance of the optical pump WGM at $\omega_p/(2\pi)$ ≈ 193.5 THz for conversion measurements. The cryostat optical input line consists of a single mode fiber with a GRIN-lens at its end to focus the optical beam at the prism-WGM resonator coupling point. The reflected optical pump is collected with a second GRIN-lens and coupled to the output line fiber for further measurements at room temperature.

FIG. 1.
Resonant electro-optic device. a, Exploded-view rendering of the electro-optic converter. The WGM resonator (light blue disc) is clamped between two aluminum rings (blue shaded areas) of the top and bottom part of the aluminum microwave cavity. Two GRIN-lenses are used to focus the optical input and output beams (red) on a diamond prism surface in close proximity of the optical resonator. The microwave tone is coupled in and out of the cavity using a coaxial pin coupler at the top of the cavity (gold). The positions of the prism, both lenses and the microwave tuning cylinder (gold shaded inside the lower ring) can be adjusted with 8 linear piezo positioners. b, Optical reflection spectrum of the WGM resonator at base temperature (∼ 7 mK). The optical pump mode at $\omega_p/(2\pi)$ ≈ 193.5 THz (green) and signal mode (blue) are critically coupled and separated by one free spectral range (FSR, dashed lines). On resonance 38% of the optical power is reflected without entering the WGM resonator due to imperfect optical mode overlap Λ (horizontal dotted line). The lower sideband mode (red) was chosen to couple to a mode family of different polarization, which splits it and facilitates the single-sideband selectivity of the converter. c, Reflection spectrum of the microwave cavity at base temperature (∼ 7 mK) for the tuning cylinder in its up position (blue line) and in its down position (red line). With a tuning range of ∼ 0.5 GHz we can readily match the cavity frequency with that of the optical free spectral range FSR/(2π) = 8.818 GHz (dashed line).
At base temperature (∼ 7 mK) we measure an optical mode separation FSR/(2π) = 8.818 GHz and an intrinsic optical loss rate κ in,o = 9.46 MHz, which corresponds to a quality factor Q in,o = 2.0 × 10 7 -a reduction by a factor 10 (5) from the measured room temperature value outside (inside) the microwave cavity. The chosen optical pump and signal modes have a contrast of 62% at critical coupling (Q ex,o = Q in,o ) as shown in Fig. 1b, due to an imperfect spatial field mode overlap Λ 2 = 0.38 between the optical WGM and the optical input beam (see Supplementary Material). In this work we keep the optical system critically coupled to maximize the optical photon number for a given input power. The optical signal at ω o = ω p + FSR for optical to microwave conversion is created using a suppressed-carrier single-sideband modulator and sent through the same optical path as the pump tone, see Supplementary Information.
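The quoted loss rate and quality factor can be cross-checked with a few lines of arithmetic. The sketch below assumes that the stated intrinsic loss rate of 9.46 MHz is $\kappa_{in,o}/(2\pi)$, so that $Q_{in,o} = \omega_p/\kappa_{in,o}$ reduces to a ratio of ordinary frequencies; the variable names are illustrative only.

```python
# Consistency check of the optical quality factor (illustrative variable names).
f_pump_hz = 193.5e12        # optical pump frequency omega_p / (2 pi)
kappa_in_o_hz = 9.46e6      # intrinsic optical loss rate, ASSUMED to be kappa_in,o / (2 pi)

# Q = omega / kappa; the factors of 2 pi cancel when both are given as frequencies.
q_in_o = f_pump_hz / kappa_in_o_hz

# At critical coupling the extrinsic rate equals the intrinsic one,
# so the total linewidth doubles and the loaded Q halves.
kappa_o_hz = 2 * kappa_in_o_hz
q_loaded = f_pump_hz / kappa_o_hz

print(f"Q_in,o        = {q_in_o:.2e}")
print(f"kappa_o/(2pi) = {kappa_o_hz / 1e6:.2f} MHz")
print(f"Q_loaded      = {q_loaded:.2e}")
```

Under this assumption the intrinsic quality factor comes out close to the quoted $2.0 \times 10^7$, and the critically coupled linewidth agrees with the total optical linewidth of 18.92 MHz used for the conversion spectrum.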
The chosen microwave cavity mode undergoes one oscillation around a full azimuthal roundtrip m e = 1, and its frequency is matched to the optical FSR in order to fulfill the conditions of phase matching and energy conservation. We use an aluminum cylinder centered below the WGM resonator and attached to a vertical piezo positioner that shifts the microwave resonance frequency ω e /(2π) from 8.70 to 9.19 GHz at base temperature as shown in Fig. 1c. Microwave tones are sent to the device through a heavily attenuated transmission line and subsequently coupled to the cavity via a coaxial pin coupler mounted in the top part of the cavity as shown in Fig. 1a. The reflected microwave tone and the down-converted optical signal pass two circulators before amplification and measurement with a vector network analyzer (VNA) or an electronic spectrum analyzer (ESA), see Supplementary Information. From the VNA reflection measurements, we extract the resonance frequency ω e , the intrinsic loss rate κ in,e = 6.7 MHz and the extrinsic coupling rate κ ex,e = 3.7 MHz of the microwave resonance mode.

Bidirectional Conversion
In our system the microwave-to-optics and optics-to-microwave photon conversion efficiencies are equal [5]. The total input-output electro-optic photon conversion efficiency, defined on resonance, is given as
$$\eta_{oe} = \eta_{eo} = \eta_e\,\eta_o\,\Lambda^2\,\frac{4C}{(1+C)^2}, \tag{4}$$
with coupling efficiencies $\eta_k = \kappa_{ex,k}/\kappa_k$ and $\kappa_k = \kappa_{in,k} + \kappa_{ex,k}$. We determine the bidirectional conversion efficiency of the device $\eta_{tot} = \sqrt{\eta_{eo}\eta_{oe}}$, independent of the specifics of the measurement setup [35], such as the optical and microwave input attenuations $\beta_1$, $\beta_3$ and output amplifications $\beta_2$, $\beta_4$ (see Supplementary Material). Performing 4 independent measurements of the coherent scattering parameters $|S_{ij}|^2 \propto |\hat{a}_{out,i}/\hat{a}_{in,j}|^2$ with $i, j \in \{e, o\}$ for every optical pump power setting, we obtain the in-situ calibrated device efficiency from the optical fiber to the microwave coaxial line,
$$\eta_{tot} = \sqrt{\frac{|S_{eo}|^2\,|S_{oe}|^2}{|S_{ee}|^2\,|S_{oo}|^2}}. \tag{5}$$
Here the optics-to-microwave $|S_{eo}|^2$ and microwave-to-optics $|S_{oe}|^2$ power ratios are measured on resonance $\omega_0 = \omega_e, \omega_o$ and the reflected optical $|S_{oo}|^2$ and microwave $|S_{ee}|^2$ tones are measured at a detuning such that $|\omega_\Delta - \omega_0| \gg \kappa_e, \kappa_o$, respectively. For higher accuracy we take into account frequency dependent baseline variations using the full measured reflection scattering parameters.
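The way the four scattering measurements remove the unknown line gains can be illustrated numerically. The following is a minimal sketch, assuming the gain-cancelling ratio used in this calibration approach [35] and a simple model in which each measured $|S_{ij}|^2$ is the device response multiplied by the corresponding input attenuation and output gain. The function name and all $\beta$ values are hypothetical and not taken from the experiment.

```python
import math

def bidirectional_efficiency(s_eo, s_oe, s_ee, s_oo):
    """Estimate eta_tot from four measured power ratios |S_ij|^2.

    s_eo, s_oe: on-resonance conversion power ratios.
    s_ee, s_oo: far-detuned reflection power ratios.
    Assumed relation: eta_tot = sqrt(s_eo * s_oe / (s_ee * s_oo)).
    """
    return math.sqrt((s_eo * s_oe) / (s_ee * s_oo))

# Hypothetical line gains in linear power units (NOT experimental values).
beta1, beta2 = 0.3, 0.5      # optical input attenuation, optical output gain
beta3, beta4 = 1e-6, 5e6     # microwave input attenuation, microwave output gain
eta_true = 3.16e-4           # device efficiency we try to recover

# Assumed measurement model: (input loss) x (device response) x (output gain).
s_eo = beta1 * eta_true * beta4   # optical input, microwave output
s_oe = beta3 * eta_true * beta2   # microwave input, optical output
s_oo = beta1 * beta2              # optical reflection, device response ~ 1
s_ee = beta3 * beta4              # microwave reflection, device response ~ 1

eta_est = bidirectional_efficiency(s_eo, s_oe, s_ee, s_oo)
print(f"recovered eta_tot = {eta_est:.3e}")
```

In this toy model every $\beta$ appears once in the numerator and once in the denominator, which is why the estimate does not depend on the individual attenuations and gains.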
In Fig. 2a we show the measured values of the total $\eta_{tot}$ (light blue) and the calculated internal conversion efficiency $\eta_{int} = \eta_{tot}/(\eta_e \eta_o \Lambda^2)$ (dark blue) together with Eq. (4) taking into account measured cavity linewidth changes (red lines) as a function of the incident optical pump power $P_p$. As the pump power increases, the conversion efficiency departs only slightly from the expected linear behavior for the low cooperativity limit ($C \ll 1$) (dashed lines). For $P_p$ ≈ 700 µW (arrow), $\eta_{tot}$ drops because the aluminum cavity undergoes a phase transition from the superconducting to the normal conducting state, which is accompanied by a sudden increase of $\kappa_{in,e}$, see Supplementary Information. The highest conversion efficiency $\eta_{tot} = 3.16 \times 10^{-4}$ is reached for the maximum available pump power $P_p = 1.48$ mW, where the refrigerator base plate reaches a steady state temperature of $T_f = 320$ mK with a theoretical microwave mode occupation of $N_f = 0.36$.
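The theoretical occupation quoted for the base plate temperature follows from Bose-Einstein statistics. The short sketch below evaluates it for a mode at the FSR-matched microwave frequency; the function name is illustrative, and the assumption is that the microwave mode sits at 8.818 GHz and is in equilibrium with the plate.

```python
import math

H = 6.62607015e-34      # Planck constant (J s)
K_B = 1.380649e-23      # Boltzmann constant (J / K)

def thermal_occupancy(freq_hz, temp_k):
    """Bose-Einstein mean photon number of a mode at freq_hz in equilibrium at temp_k."""
    return 1.0 / math.expm1(H * freq_hz / (K_B * temp_k))

f_e = 8.818e9                          # ASSUMED microwave frequency (matched to the optical FSR)
n_f = thermal_occupancy(f_e, 0.320)    # base plate at 320 mK
print(f"N_f = {n_f:.3f}")
```

The same function can be used to convert any of the bath temperatures discussed in the text into an equivalent photon number.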
From the measured values of the bidirectional conversion efficiency $\eta_{tot}$ and coupling rates $\kappa_{in,e}$ at each optical pump power $P_p$, which is related to the drive strength and pump photon number $n_p = |\alpha_p|^2 = 4P_p\Lambda^2\kappa_{ex,o}/(\hbar\omega_p\kappa_o^2)$, we extract the values of the multi-photon cooperativity in the system, ranging from $C = 1.23 \times 10^{-7}$ for the lowest, to $C = 1.67 \times 10^{-3}$ for the highest $P_p$. From this we deduce the maximum internal photon conversion efficiency $\eta_{int} = 4C/(1 + C)^2$ of 0.67%. We find very good agreement between the measured conversion efficiency and Eq. (4) (solid red lines) for $g/(2\pi) = 40$ Hz, close to the directly measured (simulated) value of 36.1 Hz (36.2 Hz) at room temperature.
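The chain from pump power to cooperativity to internal efficiency can be traced with the numbers given in the text. This is a rough sketch, not the fitting procedure used for the data. It assumes critical optical coupling, that all quoted rates are $\kappa/(2\pi)$, and that the microwave linewidth at the highest power is the sum of the extrinsic rate (3.7 MHz) and the post-transition intrinsic rate (11.2 MHz); the latter is an assumption made here for illustration.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant (J s)
TWO_PI = 2 * math.pi

def pump_photon_number(p_pump, overlap_sq, kappa_ex_o, kappa_o, omega_p):
    """n_p = 4 P_p Lambda^2 kappa_ex,o / (hbar omega_p kappa_o^2), rates in rad/s."""
    return 4 * p_pump * overlap_sq * kappa_ex_o / (HBAR * omega_p * kappa_o**2)

def cooperativity(n_p, g, kappa_o, kappa_e):
    """Multi-photon cooperativity C = 4 n_p g^2 / (kappa_o kappa_e)."""
    return 4 * n_p * g**2 / (kappa_o * kappa_e)

def internal_efficiency(c):
    """Internal conversion efficiency eta_int = 4 C / (1 + C)^2."""
    return 4 * c / (1 + c) ** 2

omega_p = TWO_PI * 193.5e12          # optical pump frequency
kappa_o = TWO_PI * 18.92e6           # total optical linewidth
kappa_ex_o = kappa_o / 2             # critical coupling
g = TWO_PI * 40.0                    # fitted vacuum coupling rate
kappa_e = TWO_PI * (11.2e6 + 3.7e6)  # ASSUMED microwave linewidth at the highest power

n_p = pump_photon_number(1.48e-3, 0.38, kappa_ex_o, kappa_o, omega_p)
c_max = cooperativity(n_p, g, kappa_o, kappa_e)
eta_int = internal_efficiency(c_max)

print(f"n_p     = {n_p:.2e}")
print(f"C       = {c_max:.2e}")
print(f"eta_int = {100 * eta_int:.2f} %")
```

With these inputs the sketch lands close to the quoted maximum cooperativity and internal efficiency, which suggests the assumed linewidth is in the right range.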
The normalized optics-to-microwave conversion as a function of the optical signal frequency is shown in the inset of Fig. 2a. The solid red line corresponds to the theoretical expectation for the conversion spectrum [5], Eq. (6).

FIG. 2.
Bidirectional microwave-optics conversion. a, Measured photon conversion efficiency ηtot (light blue points) and inferred internal device efficiency ηint = ηtot/(ηeηoΛ 2 ) (dark blue points) together with theory (red lines), i.e. Eq. (4) taking into account measured cavity linewidth changes. The dashed lines are linear fits for the 10 lowest power data points respectively. The arrow marks the input power where the aluminum cavity goes from super to normal conducting. The inset shows the measured and normalized coherent optics-to-microwave conversion power ratio for Pp = 18.7 µW and Po = 267 nW, as a function of the detuning between the optical signal frequency and ωo (blue points) together with theory Eq. (6) (red line), indicating the conversion bandwidth B/(2π) = 9.0 MHz. b, Measured optical power spectrum for microwave-to-optical conversion at Pp = 1.48 mW. The weak coherent microwave tone Pe = 1.0 nW generates two optical sidebands (blue and red) with a suppression ratio of SR = 10.7 dB. The center and sideband peaks are proportional to the intra-cavity pump np and converted optical signal no photon numbers respectively. The noise floor is set by the resolution bandwidth. c, Measured power spectrum for optical-to-microwave conversion at Pp = 2.35 µW. The weak optical input signal Po = 161 nW generates a single coherent microwave tone at ωo − ωp. In this particular example ne = 1.2 intra-cavity microwave photons are generated with an incoherent noise floor PN,out corresponding to an added output conversion noise of Nout = 0.4 photons s −1 Hz −1 in the center of the microwave cavity bandwidth.
In Eq. (6), $\kappa_e$ and $\kappa_o/(2\pi) = 18.92$ MHz were independently extracted from direct reflection measurements. Selective up-conversion is an important feature of electro-optic transducers, because of the intrinsic noiseless nature of the up-conversion process. Figure 2b displays the measured microwave-to-optics conversion power spectrum corresponding to the highest pump power. Single sideband conversion with a suppression ratio of 10.7 dB in favor of up-conversion can be observed. This is expected from the asymmetric FSR in our WGM resonator due to the splitting of the lower sideband mode as shown in Fig. 1b. The generated microwave output power spectrum from the optics-to-microwave conversion is shown in Fig. 2c, where the peak at the center represents the coherently converted signal power at the microwave cavity output and the broadband incoherent baseline is due to the thermal noise added to the microwave output as a result of optical absorption.

Added noise
The optical pump causes dielectric heating due to absorption in the lithium niobate. In addition, stray light and evanescent fields can lead to direct breaking of Cooper pairs in the superconducting cavity. Both effects cause an increased surface resistance and in turn a larger microwave cavity linewidth $\kappa_{in,e}$ (see Supplementary Information). The optical heating causes an increase of the microwave cavity bath $N_b$ and the microwave waveguide bath $N_{wg}$, both of which are related to the incoherent microwave mode occupancy [36]
$$N_e = \frac{\kappa_{in,e} N_b + \kappa_{ex,e} N_{wg}}{\kappa_e}. \tag{7}$$
The two bath populations are directly accessible via the output noise spectrum given by
$$N_{out}(\omega) = N_{wg} + \frac{4\kappa_{in,e}\kappa_{ex,e}}{\kappa_e^2 + 4\omega^2}\left(N_b - N_{wg}\right) \tag{8}$$
in the low cooperativity limit, with $\omega$ the detuning from the microwave resonance. The conversion noise at the output port of the device $N_{out} = N_{det} - N_{sys}$ in units of photons s$^{-1}$ Hz$^{-1}$ is related to the measured power spectrum $P_{ESA}$ via $N_{det}(\omega) = P_{ESA}(\omega)/(\hbar\omega_e\beta_4)$. Here $N_{sys} = 12.74 \pm 0.36$ and $\beta_4 = (67.05 \pm 0.16)$ dB are the calibrated noise photon number and gain of the measurement setup as referenced to the converter output port. In Fig. 3a-c we show the measured noise spectrum obtained by normalizing with a no-pump baseline reference measurement when the sample is cold, $N_{det} = N_{sys}\,P_{ESA}/P_{ESA}(P_p = 0)$, for three different pump powers with the same y-axis scale and no signal tone applied. For the lowest pump power $P_p = 0.23$ µW only the $N_{sys}$ offset is discernible (dashed black lines in panels a-c). For the largest applied power $P_p = 1.48$ mW we observe a maximum of $N_{out} = 5.51 \pm 0.20$ and $N_{wg} = 1.64 \pm 0.08$, significantly hotter than the dilution refrigerator base plate at $N_f = 0.36$. This is expected for a steady-state localized noise source, such as the optically pumped dielectric resonator, which has a finite temperature dependent thermalization rate to equilibrate with the environment.
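How the two bath occupancies map onto the mode occupancy and the output noise can be sketched as follows. The code assumes the standard input-output relations for a single-port cavity with an internal bath and a waveguide bath in the low cooperativity limit: $N_e$ is the loss-rate-weighted mean of the two baths, and the output noise is the waveguide noise plus a Lorentzian share of the bath difference. The function names are illustrative, and the intrinsic linewidth used in the example (11.2 MHz) is an assumption for the highest pump power.

```python
def mode_occupancy(n_b, n_wg, kappa_in, kappa_ex):
    """Loss-rate-weighted average of the internal and waveguide bath occupancies."""
    return (kappa_in * n_b + kappa_ex * n_wg) / (kappa_in + kappa_ex)

def output_noise(n_b, n_wg, kappa_in, kappa_ex, detuning=0.0):
    """Noise at the cavity output port versus detuning (same units as the rates)."""
    kappa = kappa_in + kappa_ex
    lorentz = 4 * kappa_in * kappa_ex / (kappa**2 + 4 * detuning**2)
    return n_wg + lorentz * (n_b - n_wg)

def bath_from_output(n_out_peak, n_wg, kappa_in, kappa_ex):
    """Invert the on-resonance output noise for the internal bath occupancy N_b."""
    kappa = kappa_in + kappa_ex
    lorentz = 4 * kappa_in * kappa_ex / kappa**2
    return n_wg + (n_out_peak - n_wg) / lorentz

# Rates in MHz; only their ratios matter. kappa_in is an ASSUMED value.
kappa_in, kappa_ex = 11.2, 3.7
n_out_peak, n_wg = 5.51, 1.64      # values quoted for the highest pump power

n_b = bath_from_output(n_out_peak, n_wg, kappa_in, kappa_ex)
n_e = mode_occupancy(n_b, n_wg, kappa_in, kappa_ex)
print(f"N_b = {n_b:.2f}, N_e = {n_e:.2f}")
```

Far from resonance the Lorentzian factor vanishes and the output noise reduces to the broadband waveguide value, which is how the two contributions can be separated in a measured spectrum.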
The added conversion noise referenced to the device output $N_{out}$ (blue), the broad band waveguide noise $N_{wg}$ (red), the microwave bath $N_b$ (yellow) and mode occupancy $N_e$ (green) for different optical pump powers $P_p$ are shown in Fig. 3d. Sub-photon microwave output noise as low as $N_{out} = 0.03^{+0.04}_{-0.03}$ and microwave mode occupancies as low as $N_e = 0.025 \pm 0.005$ are achieved for a continuous wave pump power of $P_p = 0.26$ µW where the total conversion efficiency is $\eta_{tot} = 8.8 \times 10^{-8}$.
As the pump power is increased, we observe a smooth growth of the waveguide noise starting from an equivalent temperature of $T_{wg} = 78^{+50}_{-17}$ mK and roughly proportional to $P_p$ over 4 orders of magnitude. This is expected if the effective thermal conductivity to the cold refrigerator bath of approximately constant temperature is increasing linearly such that the heat flow $q$ matches the dissipated part of the pump power, $q \propto P_p \propto T \cdot \Delta T$, as predicted [37] for normal conducting metals such as the copper coaxial port attached to the superconducting cavity.
In contrast, for the microwave bath we observe 3 distinct regions of heating. Up to about 2 µW the scaling is approximately linear, which is expected for local heating with a fixed thermalization to the cold bath. The thermal conductivity of superconducting aluminum far below the critical temperature is exponentially suppressed [37] so this thermalization could be due to radiation or direct excitation of quasiparticles. In this important range of noise photon numbers, a high conductivity copper cavity might therefore show a significantly slower trend. Above 2 µW the scaling is approximately $\propto \sqrt{P_p}$, which indicates that part of the cavity, such as the small rings holding the disk, is normal conducting. This is confirmed by an increase in the internal losses (see Supplementary Information). At $P_p$ ≈ 0.7 mW we see a sharp drop in the output noise due to a sudden increase of $\kappa_{in,e}$ from 8.6 to 11.2 MHz. The temporarily slower increase of $N_b$ suggests that this is also accompanied by a higher thermalization rate to the cold refrigerator bath, indicating that the entire aluminum cavity undergoes a phase transition at this input power. This interpretation of the data is backed up by stable cavity properties beyond this power (see Supplementary Material). The lowest measured bath occupancies are consistent with qubit measurements for a similar amount of shielding without optics [38] and could be further improved with sensitive radiometry measurements [39,40].
In Fig. 3e we show the time dependence of the measured output noise when the system is excited with a resonant optical square pulse. The measured rise time to the maximum power of $P_p = 1.48$ mW is 1 ms. Facilitated by the macroscopic device design with a large heat capacity and contact surface area to the cold refrigerator bath, we observe that the fastest heating rate at the onset of the square pulse is as low as 1.1 photons s$^{-1}$. This is roughly $10^7$ times slower compared to state of the art microscopic microwave devices pulsed with ∼ $10^3$ times lower power [33]. Assuming, as a worst case scenario, a linear increase of the heating rate with the applied power, we can project $N_{out} < 10^{-4}$ for a single 100 ns long pulse of power 1 W. At this power $C > 1$, which would bring the internal conversion efficiency close to unity and could give access to interesting new physics.
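The pulsed-operation projection is a simple linear extrapolation, sketched below. It assumes, as stated, that the heating rate scales linearly with pump power, and additionally that the cooperativity scales linearly with pump power at fixed linewidths (which follows from $C \propto n_p \propto P_p$ but ignores any power-dependent change of $\kappa_e$). The function names are illustrative.

```python
P_REF = 1.48e-3          # reference pump power (W)
HEATING_RATE_REF = 1.1   # measured initial heating rate at P_REF (photons / s)
C_REF = 1.67e-3          # cooperativity at P_REF

def projected_noise(p_pulse, t_pulse):
    """Noise photons accumulated during one pulse, assuming heating rate proportional to power."""
    return HEATING_RATE_REF * (p_pulse / P_REF) * t_pulse

def projected_cooperativity(p_pulse):
    """Cooperativity at p_pulse, ASSUMING linear scaling with power and fixed linewidths."""
    return C_REF * p_pulse / P_REF

n_pulse = projected_noise(p_pulse=1.0, t_pulse=100e-9)   # 1 W, 100 ns
c_pulse = projected_cooperativity(1.0)

print(f"projected N_out per pulse = {n_pulse:.1e}")
print(f"projected C at 1 W        = {c_pulse:.2f}")
```

Both projections are consistent with the statements above: the accumulated noise stays below $10^{-4}$ photons and the cooperativity exceeds one, under the stated linear-scaling assumptions.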

Conclusion
The presented bidirectional microwave-optical interface operates in the quantum ground state $N_e \ll 1$, as verified by measuring the minimal noise $N_{out} \ll 1$ added to a converted microwave output signal. Compared to recent probabilistic unidirectional transduction of quantum level signals we showed somewhat lower [33] and orders of magnitude higher [32] efficiency. The very high instantaneous bandwidth of 10.7 MHz compared to typical 100 Hz repetition rates in previous experiments provides a very promising outlook to be able to also verify the quantum statistics using sensitive heterodyne [14] or photon detection measurements [15] in the near future. Furthermore, bandwidth-matched high power pulsed operation schemes should also enable deterministic protocols due to the observed slow heating timescales, i.e. the conversion of quantum level signals with an equivalent input noise $N_{in} = N_{out}/\eta_{tot} \ll 1$. Such a fast and high fidelity quantum microwave photonic interface together with the non-Gaussian resources of superconducting qubits [16] might then provide the practical foundation to extend the range of current fiber optic quantum networks [41] in analogy to optical-electrical-optical repeaters in the early days of classical fiber optic communication [1].
Acknowledgements
The authors acknowledge the support from T. Menner, A. Arslani, and T. Asenov from the Miba machine shop for machining the microwave cavity, and S. Barzanjeh, F. Sedlmeir and C. Marquardt for fruitful discussions.

Competing interests
The authors declare no competing interests.

Data availability
The data and code used to produce the results of this manuscript will be made available via an online repository before publication.

The whispering gallery mode (WGM) resonator was manufactured from a z-cut congruent undoped lithium niobate wafer. The resonator initial dimensions were a major radius of R = 2.5 mm, a curvature radius of ρ ≈ 0.7 mm and an initial thickness d = 0.5 mm. The lateral surface was polished with diamond slurry from 9 µm (rms particle diameter) down to 1 µm. Subsequently, the resonator was thinned down to a 0.15 mm thickness with 5 µm diamond slurry in a lapping machine. Top and bottom surfaces were then finished by chemical-mechanical polishing.

Optical prism coupling
We couple the optical pump into the resonator via frustrated total internal reflection between the prism and the resonator surface. The optical beam coming from the cryostat input optical fiber is focused to the coupling window with an angle $\Phi_c \approx 50^\circ$ using a gradient index (GRIN) lens (see Fig. 1a). The reflected pump and the converted optical signal are caught by a second GRIN lens and directed to the cryostat output optical fiber. The diamond prism is an isosceles triangle with base angle $53^\circ$ and height 0.8 mm. The prism's input and output sides are antireflection coated, and it is fixed from the backside to a copper wire as shown in Fig. 1a. The copper wire goes through a small canal outside the microwave cavity and is attached to a linear piezo-positioning stage (PS). This way the distance d between the WGM resonator and the prism coupling surface can be controlled with nanometer scale precision.
In order to reduce GRIN lens misalignments during cool down, we machine a single piece, oxygen-free copper holder which has the prism-WGM resonator coupling point at its center. Furthermore, we set up a low temperature realignment system which consists of two ANPx101-LT and one ANPz101-LT PSs from attocube for each GRIN lens, allowing us to align them in the x-y-z direction. A feedback algorithm tracks the overall optical transmission as well as the optical mode contrast during the dilution refrigerator cooldown to 3 K where the final alignments are performed before condensation and further cooldown to the cryogenic base temperature of about 7 mK.

FIG. S1.
A tunable laser is equally split (50/50) into two paths at the optical coupler OC1. The upper path is used as the optical pump and it goes through a variable optical attenuator VOA1 that allows Pp to be varied. The optical pump can then be either sent directly to the cryostat fiber, or it can go first through an electro-optic modulator (EOM) in order to create sidebands for spectroscopy calibration. The second path (horizontal) is used to generate the optical signal. It also goes through a variable optical attenuator and is then frequency up-shifted by ωe (∼FSR) using a single sideband EO-modulator with suppressed carrier (SSB-SC) driven by a microwave source with local oscillator frequency ωe (S1). A small fraction (1%) of this signal is picked up and sent directly to an optical spectrum analyzer (OSA) for sideband and carrier suppression ratio monitoring. The rest (99%) is recombined with the pump at OC2 and sent to the fridge input fiber, and the total power is monitored with a power meter (PM). The optical tones are focused on the prism with a GRIN-lens which then feeds the WGM resonator via evanescent coupling. Polarization controllers PC2 and PC3 are set to achieve maximum coupling to a TE polarized cavity mode. The reflected (or created) optical sideband signal and the reflected pump are collected with the second GRIN-lens and coupled to the cryostat output fiber. The optical signal is then split: 90% of the power goes to the OSA and 10% is sent to a photodiode (PD), which is used for mode spectroscopy and to lock on the optical mode resonance during the conversion measurement. The 90% arm is either sent directly to the OSA, or goes through an EDFA for amplification, depending on the microwave to optics converted signal power. On the microwave side, the signal is sent from the microwave source S2 (or from the VNA for microwave mode spectroscopy) to the fridge input line via the microwave combiner (MC1). The input line is attenuated with attenuators distributed between 3 K and 10 mK with a total of 60 dB in order to suppress room temperature microwave noise. Circulator C1 redirects the reflected tone from the cavity to the amplified output line, while C2 redirects noise coming in from the output line to a matched 50 Ω termination. The output line is amplified at 3 K by a HEMT-amplifier and then at room temperature again with a low noise amplifier (LNA). The output line is connected to switch MS1, to select between an ESA or a VNA measurement. Lastly, microwave switch MS2 allows the device under test (DUT) to be swapped for a load with controllable temperature T50Ω, which serves as a broad band noise source in order to calibrate the output line's total gain and added noise (see D 2).

Optical characterization
The optical resonator is characterized by analyzing its reflection spectrum. We sweep the frequency ω/(2π) of an optical tone over several GHz around 1550 nm and measure the intensity of the reflected signal on a photodiode (PD in Fig. S1). In Fig. S2a we show the pump mode spectrum for a TE polarized tone, whose polarization is parallel to the WGM resonator's symmetry and optical axis. The optical free spectral range (FSR) for this mode was measured by superimposing it with EOM-generated sidebands from modes one FSR away [6] (see Fig. S1 for the EOM). The measured optical FSR changed from 8.79 GHz at room temperature to 8.82 GHz at base temperature.
To characterize the coupling to the optical system, we measure the optical mode spectrum for different prism positions and fit it with Eq. (B1), where the factor Λ describes the electric field overlap between the evanescent tail of the beam reflecting on the prism and the resonator mode and ∆ω κ o . The external coupling rate $\kappa_{ex,o}$ depends strongly on the distance d between the prism and the resonator, with $\kappa^{max}_{ex,o} = \kappa_{ex,o}(d = 0)$ and $k_0 = \omega_o\sqrt{n_{LN}^2 - 1}/c$ [42], where $n_{LN}$ is the refractive index of LN and c the speed of light in vacuum. We control the distance by applying a DC-voltage to the piezo stage, $d \propto -V_{dc}$, and measure the transmission spectrum. The fitted total optical linewidth $\kappa_o = \kappa_{ex,o} + \kappa_{in,o}$ as a function of $V_{dc}$ is shown in Fig. S2a. From an exponential fit of the measured $\kappa_o$ (red line), we extract the offset corresponding to $\kappa_{in,o}/(2\pi) = 9.46$ MHz. Furthermore, at critical coupling ($\kappa_{ex,o} = \kappa_{in,o}$), we extract $\Lambda^2 = 0.38$ from a fit to Eq. (B1) as shown in Fig. S2a. The intra-cavity photon number for the optical pump is given by $n_p = 4P_p\Lambda^2\kappa_{ex,o}/(\hbar\omega_p\kappa_o^2)$, where $P_p$ stands for the optical pump power sent to the resonator-prism interface. The WGM resonator's FSR and linewidth do not change over the full optical pump power range.
Appendix C: Microwave cavity

Design
The conversion efficiency between the optical and microwave modes depends strongly on the microwave electric field confinement at the rim of the WGM resonator. Our hybrid system, based on a 3D-microwave cavity and a WGM resonator, offers a high degree of freedom to control the microwave spatial distribution ψ e ( r), microwave resonance frequency ω e and external coupling rate κ ex,e . We used finite element method (FEM) simulations in order to find suitable design parameters for the microwave cavity.
A schematic drawing of the microwave cavity with its important dimensions is shown in Fig. S3a. The LiNbO$_3$ WGM resonator is clamped between two aluminum rings (highlighted in blue). In this way we maximize the microwave electric field overlap with the optical mode, the latter being confined close to the rim of the WGM resonator (see Fig. S3b). The microwave spatial electric field distribution shows one full oscillation along the circumference of the WGM resonator (see Fig. S3c and d) to fulfill the phase matching condition. The aluminum rings have a cut in the middle in order to maximize the field participation factor and minimize potential magnetic losses in the dielectric. The cavity's cylindrical inner volume can be tailored to achieve the desired microwave resonance frequency, which can then be tuned by ∼ 500 MHz in situ, by moving an aluminum cylinder placed inside the lower ring. This allows us to compensate for the frequency shift induced by thermal contraction during cooldown of the device. The top right part of the shown top half of the cavity is cut out in order to facilitate the assembly of the device.

FEM simulation of electro-optic coupling
From FEM simulations we obtain the single photon spatial electric field distribution given as $E_{e,z}(r, z, \phi) = E^{\mathrm{max}}_{e,z} \Psi_e(r, z)(1 + f(\phi)) \cos(\phi)$, where $\Psi_e(r, z)$ is normalized to 1, $r = \sqrt{x^2 + y^2}$ and $\phi = \arctan(y/x)$, see also Fig. S3d. For this simulation we used the reported [43] dielectric permittivity of lithium niobate at 9 GHz, i.e. $\varepsilon_e = (42.5, 42.5, 26)$. The function $f(\phi)$ is symmetric and describes the deviation of the azimuthal field distribution from a pure sinusoidal shape as shown in Fig. S3c. The optical mode is distributed along the ring $\{r = r_o, z = 0, \phi \in [0, 2\pi]\}$ and the maximum value of the microwave electric field on this ring is $E_{eo} = E_{e,z}(r_o, 0, \phi_{\mathrm{max}}) = E^{\mathrm{max}}_{e,z} \Psi_e(r_o, 0) = 11.1$ mV/m. The optical mode being a clockwise (C) traveling wave, we must decompose the stationary microwave field into a clockwise and a counterclockwise (CC) traveling wave in order to calculate the coupling. By introducing $E_{e,z}(r_o, 0, \phi, t)$ into Eq. (2), we get ($E^{+}_{\mathrm{CC}}$ and $E^{-}_{\mathrm{CC}}$ do not participate in the interaction) where $n_p$ and $n_o$ are the refractive indices of the pump $\omega_p$ and the sideband $\omega_o$. The effective mode volumes $V_k$ are given by the integral $\int dV\, \psi_k \psi^{*}_k$ over the respective optical field spatial distributions $\psi_p = \Psi_p(r, \theta)e^{-im\phi}$ and $\psi_o = \Psi_o(r, \theta)e^{-i(m+1)\phi}$. The second term in the integral in Eq. (C2) is zero due to the symmetry of $f(\phi)$, reducing Eq. (C2) to where $n_o \approx n_p = 2.13$ ($\varepsilon_o \approx \varepsilon_p = 4.54$) is the extra-ordinary refractive index (dielectric permittivity) of LN at $\omega_o \approx \omega_p = (2\pi) \times 193.5$ THz and $r_{33} = 31$ pm/V is the electro-optic coefficient. For these values we estimate $g_{\mathrm{sim}}/(2\pi) = 38$ Hz at room temperature.

Room temperature measurement of g
The system was assembled at room temperature and a microwave tone was fed into the cavity with a coaxial probe coupler of length 1.2 mm. By displacing the tuning cylinder the cavity frequency $\omega_e/(2\pi)$ could be shifted from 8.40 to 9.22 GHz, slightly shifted up compared to numerical simulations. We attribute this to small air gaps between the WGM resonator and the aluminum disk, which decrease the effective dielectric constant between the electrodes. To match the measured frequency range exactly, we introduce an air gap of only $\sim 1$ µm in the simulations, bringing down the estimated coupling to $g_{\mathrm{sim}}/(2\pi) = 36.2$ Hz. At $\omega_e = \mathrm{FSR}$, the microwave mode has the parameters $\kappa_{\mathrm{ex},e}/(2\pi) = 2.48$ MHz and $\kappa_{\mathrm{in},e}/(2\pi) = 29$ MHz. We infer the nonlinear coupling constant of the system by applying a strong microwave drive tone to the cavity and measuring the resulting optical mode splitting $S \approx 4\sqrt{n_e}\, g_{\mathrm{rt}}$ as described in Ref. [17]. In Fig. S4a we show a measured splitting of $S/(2\pi) = 220$ MHz for a 9.3 dBm microwave pump power applied on resonance. This corresponds to $g_{\mathrm{rt}}/(2\pi) = 36.1$ Hz, a fivefold improvement compared to earlier results [6], and in excellent agreement with the simulations.
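The step from the measured splitting to the coupling constant can be retraced with a short sketch. It inverts $S \approx 4\sqrt{n_e}\, g_{\mathrm{rt}}$ using the quoted splitting, pump power and loss rates. Two ingredients are assumptions and not taken from the text: the intra-cavity photon number is estimated with the standard on-resonance expression for a one-port cavity, $n_e = 4\kappa_{\mathrm{ex},e} P/(\hbar\omega_e\kappa_e^2)$, neglecting any loss between source and cavity, and the microwave frequency is set to a hypothetical 8.8 GHz inside the measured tuning range.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant (J s)


def dbm_to_watt(p_dbm):
    """Convert a power in dBm to watts."""
    return 1e-3 * 10 ** (p_dbm / 10)


def intracavity_photons(p_watt, freq_hz, kappa_ex_hz, kappa_in_hz):
    """Assumed one-port model: n = 4 kappa_ex P / (hbar omega kappa^2).

    Rates are given as kappa/(2 pi) in Hz and converted to angular units.
    """
    kappa_ex = 2 * math.pi * kappa_ex_hz
    kappa = 2 * math.pi * (kappa_ex_hz + kappa_in_hz)
    omega = 2 * math.pi * freq_hz
    return 4 * kappa_ex * p_watt / (HBAR * omega * kappa**2)


def coupling_from_splitting(splitting_hz, n_e):
    """Invert S ~ 4 sqrt(n_e) g; returns g/(2 pi) in Hz for S/(2 pi) in Hz."""
    return splitting_hz / (4 * math.sqrt(n_e))


F_E_ASSUMED_HZ = 8.8e9  # hypothetical microwave frequency, not stated in the text

n_e = intracavity_photons(dbm_to_watt(9.3), F_E_ASSUMED_HZ, 2.48e6, 29e6)
g_rt_hz = coupling_from_splitting(220e6, n_e)

print(f"n_e = {n_e:.3e} photons, g_rt/(2 pi) = {g_rt_hz:.1f} Hz")
```

Under these assumptions the estimate lands close to the quoted $g_{\mathrm{rt}}/(2\pi)$; the result scales as $\sqrt{\omega_e}$, so the exact value depends on the frequency that is actually used.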

Microwave cavity fabrication
The microwave cavity is milled out of a block of pure aluminum (5N). It is divided into a lower and an upper part, which are closed with brass screws after placing the WGM resonator and the prism. The internal geometry can be seen in Fig. S3a. When closing the cavity the pure aluminum rings come into contact with the optical resonator. The rings deform slightly, which minimizes the formation of air gaps that would otherwise reduce g.

Microwave characterization
The microwave resonance tuning range was measured with a VNA connected to the cryostat transmission line as shown in Fig. S1. The resonance frequency can be tuned from 8.70 GHz to 9.19 GHz as shown in the main text. This range lies at slightly higher frequency than the room temperature one, which we attribute to thermal contraction that leads to small air gaps. The decay rates of the microwave mode at the cryogenic base temperature and the lowest optical input power are $\kappa_{\mathrm{ex},e}/(2\pi) = 3.7$ MHz and $\kappa_{\mathrm{in},e}/(2\pi) = 6.7$ MHz. Unlike the optical system, the microwave cavity's parameters change as a function of the optical pump power $P_p$ applied to the WGM resonator. In Fig. S5a we show the normalized spectra of the microwave resonance at the smallest (blue) and largest (red) optical pump power together with a Lorentzian fit. From these measurements we extract the microwave resonance frequency $\omega_e$ (shown in panel b) and the internal and external loss rates $\kappa_{\mathrm{in},e}$ and $\kappa_{\mathrm{ex},e}$ (shown in panel c) as a function of $P_p$. $\kappa_{\mathrm{ex},e}$ depends only on the fixed geometry and is approximately constant. In contrast, $\kappa_{\mathrm{in},e}$ increases and $\omega_e$ decreases with increasing $P_p$ until the microwave cavity undergoes the superconducting phase transition. Once the normal conducting state is reached, a further increase of $P_p$ does not lead to any perceptible change, as can be seen in Fig. S5 panels b and c. The microwave resonance red shift (see Fig. S5c) and the $\kappa_{\mathrm{in},e}$ increase are expected due to optically induced creation of quasiparticles in the aluminum cavity as discussed for example in Ref. [12]. While the local heating is significant, the temperature of the mixing chamber plate of the dilution refrigerator follows a slow (P 0.
We define the noise conversion matrix $\sigma_{ij}$ as the ratio between the output noises and the microwave input noise to the system in the absence of any coherent signal as [5,42] where $N_{\mathrm{wg}}(\omega) = (\exp(\hbar\omega/k_B T_{\mathrm{wg}}) - 1)^{-1}$ and $N_b(\omega) = (\exp(\hbar\omega/k_B T_b) - 1)^{-1}$ are wide band distributions compared to $\kappa_{\mathrm{ex},e}$, such that they can be approximated as constant.
The full input-output model including vacuum noise is given as with $n_{\mathrm{vac}} = (0.5, 0.5)^{T}$.
The device is fixed to the mixing chamber of a dilution refrigerator with a base temperature of $\sim 7$ mK, preventing direct access to the device's input and output ports, see Fig. S1. In Fig. S6b we present a simplified schematic of the measurement setup, with attenuation and gain $\beta_1$, $\beta_2$ for the optical path, and $\beta_3$, $\beta_4$ for the microwave path. We define the measured scattering matrix including the transmission lines on resonance as where $\eta_i = \kappa_{\mathrm{ex},i}/\kappa_i$ and $C = 4 n_p g^2/(\kappa_o \kappa_e)$ stands for the multi-photon electro-optic cooperativity. For large signal detuning $\omega_\Delta = \omega - \omega_0 \gg \kappa_e, \kappa_o$ with $\omega_0 = \omega_e, \omega_o$ the scattering matrix simplifies to We infer the bidirectional conversion efficiency at each optical pump power by measuring the microwave-to-optics and optics-to-microwave transmissions on resonance and the microwave-to-microwave and optics-to-optics reflections off resonance. The total device efficiency can then be defined as In the limit $C \ll 1$ this can be approximated as This equation was used to calculate the nonlinear coupling constant in the main text. It can also be shown that in the limit of $C \ll 1$, $\Lambda < 1$ and $\eta_e = 0.5$, the bidirectional efficiency can be estimated using only resonant measurements where $\Lambda$ and $\eta_i$ are measured accurately from microwave and optical spectroscopy.
The system noise originates from the microwave resonator and waveguide baths $N_b$ and $N_{\mathrm{wg}}$ respectively. By applying the matrix to the noise vector in Eq. (D3) we can solve for the output noise $N_{\mathrm{out}}$, which simplifies in the low cooperativity limit ($G^2 \ll \kappa_o \kappa_e$) to
$$N_{\mathrm{out},e}(\omega) = \frac{4\kappa_{\mathrm{in},e}\kappa_{\mathrm{ex},e}}{\kappa_e^2 + 4\omega^2}(N_b - N_{\mathrm{wg}}) + N_{\mathrm{wg}} + 0.5. \quad \mathrm{(D10)}$$
In our system the resonator bath $N_b$ is always hotter than the waveguide bath $N_{\mathrm{wg}}$, because the dominant part of the dielectric absorption takes place right inside the resonator. Therefore, the output noise spectrum $N_{\mathrm{out}}(\omega)$ always consists of a Lorentzian function with amplitude $N_b - N_{\mathrm{wg}}$ on top of the broad band noise level $N_{\mathrm{wg}}$ as shown in Fig. 3. Finally, following the same formalism the integrated (dimensionless) internal microwave mode occupancy is given as
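The shape of the output noise spectrum in Eq. (D10) can be made concrete with a minimal sketch. It evaluates the Lorentzian term plus the flat waveguide and vacuum contributions, using the loss rates quoted in Appendix C for the lowest optical pump power. The bath occupancies passed in below are hypothetical illustration values, and $\omega$ is interpreted here as the detuning from the microwave resonance, which is an assumption about the frame used in Eq. (D10).

```python
def output_noise(detuning_hz, kappa_in_hz, kappa_ex_hz, n_b, n_wg):
    """Low-cooperativity output noise in photons, following Eq. (D10).

    All rates and the detuning are given divided by 2 pi (in Hz); the
    Lorentzian prefactor is a ratio, so the factors of 2 pi cancel.
    """
    kappa_e = kappa_in_hz + kappa_ex_hz
    lorentzian = 4 * kappa_in_hz * kappa_ex_hz / (kappa_e**2 + 4 * detuning_hz**2)
    return lorentzian * (n_b - n_wg) + n_wg + 0.5


KAPPA_IN_HZ = 6.7e6  # kappa_in,e/(2 pi) quoted at the lowest optical power
KAPPA_EX_HZ = 3.7e6  # kappa_ex,e/(2 pi) quoted at the lowest optical power

# Hypothetical bath occupancies, chosen only to illustrate the line shape.
for detuning in (0.0, 5e6, 50e6):
    n_out = output_noise(detuning, KAPPA_IN_HZ, KAPPA_EX_HZ, n_b=0.2, n_wg=0.02)
    print(f"detuning = {detuning / 1e6:5.1f} MHz -> N_out = {n_out:.3f}")
```

On resonance the prefactor $4\kappa_{\mathrm{in},e}\kappa_{\mathrm{ex},e}/\kappa_e^2$ stays below one unless the cavity is critically coupled, so the peak height above the background is somewhat smaller than $N_b - N_{\mathrm{wg}}$; far off resonance only $N_{\mathrm{wg}} + 0.5$ remains.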

Microwave calibration
The microwave transmission line is characterized by the input attenuation $\beta_3$, the output gain $\beta_4$ and the total added noise of the output line $N_{\mathrm{sys}}$. The output line is first calibrated by using a 50 Ω load, a resistive heater, and a thermometer that are thermally connected. Weak thermal contact to the mixing chamber of the dilution refrigerator allows us to change the temperature $T_{50\Omega}$ of the 50 Ω load without heating up the mixing chamber. We vary $T_{50\Omega}$ from 21.5 mK to 1.8 K and measure the amplified thermal noise on a spectrum analyzer. The measured power spectral density $P_{\mathrm{ESA}}(\omega)$ is approximately constant around the microwave resonance frequency $\omega_e$ and its temperature dependence follows
$$P_{\mathrm{ESA}} = \hbar\omega_e \beta_4 \mathrm{BW} \left[ \frac{1}{2} \coth\left( \frac{\hbar\omega_e}{2 k_B T_{50\Omega}} \right) + N_{\mathrm{add}} \right], \quad \mathrm{(D12)}$$
where BW stands for the chosen resolution bandwidth, $k_B$ is Boltzmann's constant and $N_{\mathrm{add}}$ is the effective noise added to the signal at the output port of the device due to amplifiers and losses. At $T_{50\Omega} = 0$ K this reduces to $P_{\mathrm{ESA}} = \hbar\omega_e \beta_4 \mathrm{BW}\, N_{\mathrm{sys}}$ with $N_{\mathrm{sys}} = N_{\mathrm{add}} + 0.5$. Figure S7 shows the detected noise $N_{\mathrm{det}} - N_{\mathrm{add}} = P_{\mathrm{ESA}}/(\hbar\omega_e \beta_4 \mathrm{BW}) - N_{\mathrm{add}}$ as a function of the load temperature $T_{50\Omega}$. The values for gain and added noise obtained from a fit to Eq. (D12) are 67.65 ± 0.05 dB and 10.66 ± 0.15 as shown in Fig. S7. The emitted black body radiation undergoes the same losses and gains, as shown in Fig. S1, up to an independently calibrated cable length difference right at the sample output resulting in an additional loss of 0.6 ± 0.09 dB. Taking into account this additional loss we arrive at the corrected gain and system noise $\beta_4 = 67.05 \pm 0.16$ dB and $N_{\mathrm{sys}} = 12.74 \pm 0.36$. For the stated error bars we take into account the 95% confidence interval of the fit, an estimated temperature sensor accuracy of ±2.5% over the relevant range, as well as the estimated inaccuracy in the cable attenuation difference.
The input attenuation is then easily deduced from a VNA reflection measurement $|S_{ee}|^2$ that yields $\beta_3 = -74.92 \pm 0.16$ dB.
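The calibration model and the correction for the extra cable loss can be retraced numerically. The sketch below evaluates the thermal term $\frac{1}{2}\coth(\hbar\omega_e/2k_B T_{50\Omega})$ at the two ends of the temperature sweep and then applies the 0.6 dB loss to the fitted values. Two points are assumptions: the frequency of 8.8 GHz is a hypothetical value within the tuning range, and the rule used for the corrected system noise (scaling $N_{\mathrm{add}}$ by the loss and then adding the vacuum 0.5) is simply the combination that reproduces the quoted numbers, not a procedure stated in the text.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant (J s)
K_B = 1.380649e-23      # Boltzmann constant (J/K)


def thermal_noise_photons(freq_hz, temp_k):
    """Return 0.5 * coth(hbar omega / (2 k_B T)) for T > 0."""
    x = HBAR * 2 * math.pi * freq_hz / (2 * K_B * temp_k)
    return 0.5 / math.tanh(x)


def detected_power(freq_hz, temp_k, gain_db, bw_hz, n_add):
    """Expected analyzer power in watts from the calibration model."""
    gain = 10 ** (gain_db / 10)
    photons = thermal_noise_photons(freq_hz, temp_k) + n_add
    return HBAR * 2 * math.pi * freq_hz * gain * bw_hz * photons


F_E_ASSUMED_HZ = 8.8e9  # hypothetical frequency, not stated in this section

n_cold = thermal_noise_photons(F_E_ASSUMED_HZ, 21.5e-3)
n_hot = thermal_noise_photons(F_E_ASSUMED_HZ, 1.8)

# Correct the fitted values for the additional 0.6 dB cable loss.
EXTRA_LOSS_DB = 0.6
gain_corrected_db = 67.65 - EXTRA_LOSS_DB
n_sys_corrected = 10.66 * 10 ** (EXTRA_LOSS_DB / 10) + 0.5  # assumed rule

print(f"thermal term: {n_cold:.3f} (21.5 mK), {n_hot:.3f} (1.8 K)")
print(f"corrected gain = {gain_corrected_db:.2f} dB, N_sys = {n_sys_corrected:.2f}")
```

At the cold end of the sweep the load contributes essentially only vacuum noise, while at 1.8 K it emits several photons per unit bandwidth, which is what gives the fit its leverage on gain and added noise separately.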

Optical calibration
The optical transmission lines consist mainly of two optical single mode fibers. The input optical line starts from the OC2 (see Fig. S1) and terminates at the WGM resonator-prism interface. The output optical line is defined from the WGM resonator-prism interface to the OSA (see Fig. S1). From the measured external conversion efficiencies $\eta_{\mathrm{tot}}$ and the microwave line calibration, we can determine the losses of the input and output transmission lines using
$$\frac{P_{\mathrm{out},o}}{\hbar\omega_o} = \beta_2 \eta_{\mathrm{tot}} \beta_3 \frac{P_{\mathrm{in},e}}{\hbar\omega_e}, \qquad \frac{P_{\mathrm{out},e}}{\hbar\omega_e} = \beta_4 \eta_{\mathrm{tot}} \beta_1 \frac{P_{\mathrm{in},o}}{\hbar\omega_o}, \quad \mathrm{(D13)}$$
where $P_{\mathrm{in},i}$ are the input powers of our transmission lines coming out from OC2 and S2 and $P_{\mathrm{out},i}$ are the measured powers at the end of the transmission lines measured with the OSA and ESA. The procedure yields the input attenuation $\beta_1 = -4.81$ dB and the output gain (via EDFA) $\beta_2 = +30.8$ dB. For measurements above $P_p = 0.1$ mW, we bypass the EDFA by switching OS2, resulting in an output attenuation of $\beta_2 = -5.5$ dB. Figure S8 shows the measured total conversion efficiency as a function of signal frequency (dots) using Eq. (D13) together with theory (lines) using Eq. (6) in both conversion directions for two different pump powers. Because the optical calibration Eq. (D13) assumes symmetric bidirectionality, we also find that the measurement results are perfectly symmetric. Nevertheless, direct measurements of $\beta_1$ taken at room temperature of −2.6 dB are in good agreement with the optical calibration. We attribute the additional loss of up to 2.2 dB to changes in the optical alignment during the cooldown, e.g. in the cold APC connector, as well as reflection loss at the first prism surface that is not included in the room temperature calibration.
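How the two relations of Eq. (D13) determine the optical line factors can be sketched as follows. The code first generates synthetic output powers from the quoted $\beta$ values and then inverts the two photon-flux balances to recover $\beta_1$ and $\beta_2$. The input powers, the conversion efficiency and the microwave frequency used here are hypothetical illustration values, and all function names are illustrative only.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant (J s)


def db_to_lin(value_db):
    return 10 ** (value_db / 10)


def lin_to_db(value):
    return 10 * math.log10(value)


def photon_flux(power_w, freq_hz):
    """Photon flux P / (hbar omega) in photons per second."""
    return power_w / (HBAR * 2 * math.pi * freq_hz)


def optical_line_factors(p_out_o, p_in_e, p_out_e, p_in_o,
                         f_o, f_e, eta_tot, beta3_db, beta4_db):
    """Solve the two relations of Eq. (D13) for beta_1 and beta_2 (in dB)."""
    beta2 = photon_flux(p_out_o, f_o) / (
        eta_tot * db_to_lin(beta3_db) * photon_flux(p_in_e, f_e))
    beta1 = photon_flux(p_out_e, f_e) / (
        eta_tot * db_to_lin(beta4_db) * photon_flux(p_in_o, f_o))
    return lin_to_db(beta1), lin_to_db(beta2)


# Hypothetical inputs for a round-trip check (not measured values).
F_O, F_E = 193.5e12, 8.8e9
ETA_TOT = 1e-5
P_IN_E, P_IN_O = 1e-3, 1e-4  # watts
B1, B2, B3, B4 = -4.81, 30.8, -74.92, 67.05  # dB values quoted in the text

# Forward model: synthetic output powers from Eq. (D13).
p_out_o = (db_to_lin(B2) * ETA_TOT * db_to_lin(B3)
           * photon_flux(P_IN_E, F_E) * HBAR * 2 * math.pi * F_O)
p_out_e = (db_to_lin(B4) * ETA_TOT * db_to_lin(B1)
           * photon_flux(P_IN_O, F_O) * HBAR * 2 * math.pi * F_E)

beta1_db, beta2_db = optical_line_factors(
    p_out_o, P_IN_E, p_out_e, P_IN_O, F_O, F_E, ETA_TOT, B3, B4)
print(f"beta_1 = {beta1_db:.2f} dB, beta_2 = {beta2_db:.2f} dB")
```

Because the same $\eta_{\mathrm{tot}}$ enters both directions, the inversion can only distribute the measured asymmetry between $\beta_1$ and $\beta_2$, which is why the calibrated efficiencies come out symmetric by construction.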