Resonant Excitation and Purcell Enhancement of Coherent Nitrogen-Vacancy Centers Coupled to a Fabry-Pérot Micro-Cavity

The nitrogen-vacancy (NV) center in diamond has been established as a prime building block for quantum networks. However, scaling beyond a few network nodes is currently limited by low spin-photon entanglement rates, resulting from the NV center's low probability of coherent photon emission and collection. Integration into a cavity can boost both values via the Purcell effect, but poor optical coherence of near-surface NV centers has so far prevented their resonant optical control, as would be required for entanglement generation. Here, we overcome this challenge, and demonstrate resonant addressing of individual, fiber-cavity-coupled NV centers, and collection of their Purcell-enhanced coherent photon emission. Utilizing off-resonant and resonant addressing protocols, we extract Purcell factors of up to 4, consistent with a detailed theoretical model. This model predicts that the probability of coherent photon detection per optical excitation can be increased to 10% for realistic parameters, an improvement over state-of-the-art solid immersion lens collection systems by two orders of magnitude. The resonant operation of an improved optical interface for single coherent quantum emitters in a closed-cycle cryogenic system at T ∼ 4 K is an important result towards extensive quantum networks with long coherence.

The nitrogen-vacancy (NV) center in diamond combines optical transitions suitable for remote entanglement generation under moderate cryogenic conditions with outstanding electron spin coherence ($T_2 > 1$ s) and extensive control capabilities over local $^{13}$C memory atoms [21,22]; these features have enabled pioneering quantum network experiments [23,24] and fundamental tests of physics [25]. However, entanglement generation rates are limited by the relatively low photon emission into the zero-phonon line (ZPL), as well as low collection efficiency from diamond, hindering scaling beyond a few nodes. Both values can be significantly increased by embedding the NV center inside an optical cavity, making use of the Purcell effect. Cavity-coupling of NV centers at low temperature has been demonstrated for different cavity implementations, including photonic crystals [26][27][28][29][30][31], microdisk resonators [32], and open microcavities [33,34].
Entanglement generation between nodes can be achieved through resonant excitation and photon detection from individual, coherent emitters [21]. However, poor optical NV center coherence (∼ GHz linewidths), resulting from surface noise effects and/or implantation-induced damage, has prevented the required resonant optical addressing of Purcell-enhanced NV centers in optical cavities to date [26,29,31,33,35,36].
In this work, we capitalize on recent breakthroughs in diamond membrane fabrication [37] and diamond-based open micro-cavities [33,38-43] and demonstrate Purcell enhancement of coherent, resonantly excited NV centers coupled to a fiber-based cavity. We outline the experimental system in Sec. II, and begin experiments by verifying Purcell enhancement under off-resonant excitation in Sec. III. Next, we develop a resonant excitation protocol for measuring optical coherence (Sec. IV) and Purcell enhancement (Sec. V). Finally, in Sec. VI, we detect coherent, Purcell-enhanced photon emission from a single quantum emitter, and model current photon loss sources, paving the way for future experimental designs.

II. EXPERIMENTAL SETUP
An overview of the experimental setup used in this work can be seen in Fig. 1(a). At the heart of the experiment is an open, fiber-based Fabry-Pérot micro-cavity (design finesse 6200), formed from a flat, super-polished mirror, and a laser ablated fiber mirror [44]; this finesse value is chosen as it maximises the outcoupled fraction of photons emitted into the zero-phonon line (ZPL) for the vibrations present in the cavity [45] (please see the supplementary material [46]). The fiber sits on top of a piezo positioning stage, which enables in situ tuning of the cavity position and length under operation in a closed-cycle cryostat (T ∼ 4 K for all measurements in this paper). Excitation light is delivered to the cavity via the fiber mirror, while all detection in this work takes place in free space through the flat mirror of the cavity. Unbalanced mirror coatings set the design finesse almost entirely by transmission through this flat cavity mirror; for a full overview of the experimental setup, please see [46]. An electron irradiated and annealed diamond membrane is bonded to the flat cavity mirror [40], and etched down to a final thickness of ∼ 5.8 µm in the cavity region, following the process flow developed in Ref. [37]. Importantly, this recipe has been shown to preserve the optical coherence of NV centers needed for entanglement generation, even for few-µm thin diamond samples.
FIG. 1 (caption, partial). (c) From bottom to top, measurements of intrinsic cavity linewidth, inferred vibrations-broadened cavity linewidth, spectrometer peaks (same data as right panel of (b)), and stabilization curve, using the continuous off-resonant excitation and ZPL detection measurement sequence displayed in (d). (d) General measurement sequence used throughout this paper. A ∼ 637 nm laser is frequency stabilized to a wavemeter (1), and serves as a reference for the cavity length (2). Measurement blocks that are repeated multiple times are interleaved between stabilization rounds (3).
We start characterizing the coupled system of the membrane and fiber-cavity by recording cavity spectra under illumination with a broadband light source for different cavity lengths. From fits to a transfer matrix model [38], we infer a membrane thickness of ∼ 5.8 µm, and an air gap of ∼ 7 µm (see App. A). This air gap could not be reduced further, likely due to a piece of dust on, or an angled mounting of, the fiber. This limits the cavity finesse to ∼ 2000 due to operation in the clipping loss regime of the fiber mirror [44,45]. The cavity parameters (NV center excited state decay rate, NV-cavity coupling rate, and cavity decay rate {γ, g, κ} ∼ 2π × {13 MHz, 180 MHz, 3.5 GHz}) place the system in the weak coupling regime of cavity quantum electrodynamics, in which collection of photons from an NV center is maximized (App. A).
We find NV centers by scanning a fundamental cavity mode over the ZPL transition frequencies (∼ 470.4 THz), while constantly illuminating with off-resonant (∼ 515 nm) laser light. At this laser wavelength, the optical excitation keeps the NV center predominantly in the negative charge and $m_s = 0$ spin state. Fig. 1(b) shows the fluorescence counts in a window of ± 2 nm around the expected ZPL frequency on a spectrometer, for different piezo voltages (and thus cavity lengths). Background light in the cavity serves as an internal light source [47], revealing the expected decrease in cavity frequency with increasing voltage. As the cavity mode is tuned through the NV center ZPL transition frequencies, additional NV center fluorescence is collected. Summing the spectrometer fluorescence counts per frequency over all cavity lengths reveals two peaks (right panel of Fig. 1(b)), each significantly wider than the intrinsic cavity linewidth.
To investigate the origin of the observed width of the NV center emission peaks, we compare the intrinsic cavity linewidth (blue trace in Fig. 1(c)), measured on a timescale much faster than the vibrations in the cavity [46], to the "vibration linewidth", calculated by convolving the measured vibration value with the intrinsic cavity linewidth (orange trace in Fig. 1(c)). The resulting curve, which is an estimate for the linewidth averaged over the full spectrometer run, explains most of the width of the peaks in the summed spectrometer data (green trace in Fig. 1(c)); the origin of the additional widening of the peaks is investigated below.
For more refined cavity control, we design a measurement protocol which we use throughout the remainder of this paper (see Fig. 1(d)). The sequence starts by stabilizing a ∼ 637 nm laser to a given setpoint using a wavemeter. We then stabilize the cavity to this laser frequency, after which we start a measurement sequence. By repeating this procedure, we can take out slow drifts between measurements. We test this measurement protocol by stabilizing the cavity to different laser setpoints and taking fluorescence data in the ZPL under off-resonant excitation (red trace in Fig. 1(c)). The resulting data is consistent with the lineshape found via the spectrometer measurement.
FIG. 2 (caption, partial). (c) Detuning sweep of the cavity with respect to the NV centers with measured ZPL fluorescence counts (blue) and lifetime (red). The data is taken on different days (circles and diamonds). We perform a joint fit of our model to both curves (solid lines) [46]. (b) and (c) were measured at different locations on the sample, which are also different from Fig. 1.

III. OFF-RESONANT EXCITATION
Entanglement generation rates in quantum networks scale with the collection of coherent photon emission. A cavity acts as a spectral filter, only allowing resonant emission to exit, and opens up an additional decay channel for the excited state of any coupled NV centers, which in turn decreases their lifetime. Thus, when the cavity is resonant with an NV center optical transition, more ZPL light should be emitted with a reduced lifetime, as has been observed in different cryogenic systems [26][27][28][29][30][31][32][33]. Importantly, this additional decay is also funneled into the cavity mode which can be readily collected, as opposed to free space systems, in which the ZPL light is emitted in all directions. From the reduced lifetime, one can (following the terminology of Ref. [33]) extract the Purcell enhancement induced by the cavity, $F_P^{\mathrm{ZPL}}$, as
$$F_P^{\mathrm{ZPL}} = 1 + \frac{1}{\beta_0}\left(\frac{\tau_0}{\tau} - 1\right), \qquad (1)$$
where $\beta_0$ is the Debye-Waller factor, recently estimated to be 2.55% [33], $\tau_0$ is the lifetime without the influence of the cavity, and $\tau$ is the reduced NV center lifetime (see App. B for a derivation). In this paper, we choose the definition of the Purcell factor as the increased emission into the zero-phonon line (ZPL) only (rather than the increase in total emission) because it better reflects the coherent ZPL light which can be used for entanglement. A subset of the ZPL light, $(F_P^{\mathrm{ZPL}} - 1)\beta_0$, is emitted into the cavity mode.
To measure lifetime and fluorescence from NV centers in the cavity under off-resonant excitation, we replace the measurement block in Fig. 1(d) with pulsed green excitation, pictured in Fig. 2(a). The cavity is again stabilized with a red laser and the excitation is provided by a ∼ 532 nm picosecond pulsed laser input through the fiber. We collect the resulting fluorescence in the ZPL path on an avalanche photodiode (APD), and extract counts per pulse and lifetime as we sweep the detuning of the cavity. For further experimental details please see the supplemental material [46]. Fig. 2(c) shows one such detuning sweep, in which the fluorescence counts and lifetime vary with cavity detuning. The highest fluorescence counts and lowest NV center lifetime coincide, demonstrating Purcell-enhanced NV center emission induced by coupling to the cavity. The widths of the fluorescence peak and of the lifetime reduction curve are several times broader than the cavity linewidth. To understand the quantitative behavior of the detuning sweep, we introduce a model which includes vibrations of the cavity and a spectral distribution of NV center transition frequencies (see App. B). By fitting this model to the data, we can extract the emission into the ZPL in the cavity, $(F_P^{\mathrm{ZPL}} - 1)\beta_0 = (7.9 \pm 2.2)\%$, and the off-resonant lifetime, $\tau_0 = (11.8 \pm 0.2)$ ns, which is consistent with the lifetime of NV centers reported for bulk diamond [48,49]. This data was taken on two different days, leading to two different curves in Fig. 2(c) (circles and diamonds) that can be explained by drifts, which we account for in our modeling [46].
We investigate whether the emission is produced by a single emitter using an autocorrelation ($g^{(2)}$) measurement. At most locations, there is little or no drop in coincidence counts at zero pulse separation. Therefore, we conclude that we are addressing several emitters within the cavity mode volume, likely because the high density of NV centers in our sample lowers the chance of single center addressing with off-resonant excitation. Fig. 2(b) displays the most significant drop in coincidences observed, which falls to (0.58 ± 0.05) at zero time delay ([0.54 ± 0.05] with background correction). We observe significant bunching behavior for small pulse separations, which we attribute to probabilistic state initialization into a bright state [46]. The Purcell enhancement of the ZPL by a factor of 3.9 ± 0.9 (assuming the same $\beta_0$ as in Ref. [33]) is a lower bound for the largest enhancement of single centers at this spot, because there are multiple NV centers within the cavity mode volume.
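As a numerical illustration of how the quantities in this section relate, the following minimal sketch converts between the cavity-mode ZPL fraction, the Purcell factor, and the lifetimes. It assumes the relation $(F_P^{\mathrm{ZPL}} - 1)\beta_0 = \tau_0/\tau - 1$ with $\beta_0 = 2.55\%$; the function names are hypothetical, and the sketch ignores the vibration and spectral averaging that the full model of App. B includes.

```python
BETA_0 = 0.0255  # Debye-Waller factor quoted in the text (Ref. [33])

def purcell_from_lifetimes(tau0_ns, tau_ns, beta0=BETA_0):
    """ZPL Purcell factor, assuming (F - 1) * beta0 = tau0 / tau - 1."""
    return 1.0 + (tau0_ns / tau_ns - 1.0) / beta0

def purcell_from_cavity_fraction(fraction, beta0=BETA_0):
    """Invert (F - 1) * beta0 = fraction for F."""
    return 1.0 + fraction / beta0

def on_resonance_lifetime(tau0_ns, fraction):
    """Lifetime implied by a given (F - 1) * beta0 under the same assumption."""
    return tau0_ns / (1.0 + fraction)

fraction = 0.079  # fitted (F - 1) * beta0 from the detuning sweep
tau0 = 11.8       # fitted off-resonant lifetime in ns
f_zpl = purcell_from_cavity_fraction(fraction)
tau_on = on_resonance_lifetime(tau0, fraction)
print(f"F_ZPL ~ {f_zpl:.1f}, implied on-resonance lifetime ~ {tau_on:.1f} ns")
```

With the fitted central values, this gives a Purcell factor of about 4.1 and an idealized on-resonance lifetime of about 10.9 ns; the uncertainties quoted in the text are not propagated here.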

IV. CONTINUOUS RESONANT EXCITATION
Entanglement generation in quantum networking protocols requires coherent addressing of individual zero-phonon line (ZPL) transitions with linewidths close to their lifetime-limited value [21]. To determine the suitability of our device for such tasks, we now move on to photoluminescence excitation (PLE) scans. The measurement protocol for these scans is displayed in Fig. 3(a, inset): short green pulses used for initialization of the NV center in the negative charge and $m_s = 0$ spin state are interleaved with red measurement pulses. Fluorescence counts collected in the phonon-sideband (PSB) during the red pulses are then correlated with simultaneously recorded wavemeter readings. This scheme thus allows predominant detection of the two $m_s = 0$ spin-conserving optical transitions per NV center; these transitions connect the ground state to the two optically excited states typically labeled as $E_x$ and $E_y$. For further details about this measurement sequence, please see the supplemental material [46].
A resulting PLE scan can be seen in Fig. 3(a). We observe a multitude of narrow lines per cavity spot, confirming our interpretation that there are multiple NV centers present per cavity mode volume. Fig. 3(b) shows a zoom-in into the red dashed region of Fig. 3(a); note that each binned point comprises many underlying individual datapoints. Importantly, the individual transition peaks can be spectrally distinguished, which is a prerequisite for single resonant NV center addressing, as probed below.
To test the spectral stability of the transition peaks, we repeatedly scan the excitation laser over an NV center ZPL transition. Fig. 3(c) shows a series of 17 consecutive scans over the frequency region dashed in purple in Fig. 3(b). To correct for slow drifts in this measurement (likely due to temperature fluctuations in the cryostat), we fit a Gaussian lineshape to each individual PLE trace, and shift the lines to a common center frequency, displayed in Fig. 3(d). Fig. 3(e) shows the averaged data of the original and centered scans in blue and red, respectively. From Gaussian fits to this data (solid lines), we extract full width at half maximum linewidths of (224 ± 10) MHz and (190 ± 9) MHz for the original and centered case, respectively. We probed the linewidths of 14 NV center transitions during the course of this study, for a total of 4 different cavity positions. The supplemental material [46] shows the individual linewidths of each NV center transition, which average to (204 ± 59) MHz and (168 ± 49) MHz, for the un-centered and centered case, respectively. Thus, the linewidths in our sample are comparable to the ones found in bulk diamond samples for green repumping [50]. Importantly, NV centers with similar optical coherence have enabled previous entanglement generation experiments [21].
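The effect of re-centering on the averaged linewidth can be illustrated with a minimal sketch on synthetic, noise-free scans. Several assumptions differ from the analysis in the text: the drift values are hypothetical, each scan is centered on its intensity-weighted centroid rather than on a fitted Gaussian center, and the width of the averaged data is taken from its second moment (converted to a Gaussian-equivalent FWHM) rather than from a fit. On noisy data a fit would be the more robust choice.

```python
import math

FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))

def gaussian_scan(freqs, center, fwhm):
    """Noise-free Gaussian PLE line sampled on the given frequency axis."""
    sigma = fwhm / FWHM_PER_SIGMA
    return [math.exp(-0.5 * ((f - center) / sigma) ** 2) for f in freqs]

def centroid(freqs, counts):
    """Intensity-weighted mean frequency."""
    return sum(f * c for f, c in zip(freqs, counts)) / sum(counts)

def averaged_fwhm(freqs, scans, recenter):
    """Gaussian-equivalent FWHM of the pooled scans, from their second moment."""
    shifted, weights = [], []
    for counts in scans:
        shift = centroid(freqs, counts) if recenter else 0.0
        shifted.extend(f - shift for f in freqs)
        weights.extend(counts)
    mean = centroid(shifted, weights)
    var = sum(w * (f - mean) ** 2 for f, w in zip(shifted, weights)) / sum(weights)
    return FWHM_PER_SIGMA * math.sqrt(var)

freqs = [5.0 * i for i in range(-120, 121)]  # MHz, relative frequency axis
drifts = [-80.0, -40.0, 0.0, 40.0, 80.0]     # hypothetical scan-to-scan drifts (MHz)
scans = [gaussian_scan(freqs, d, 190.0) for d in drifts]

raw = averaged_fwhm(freqs, scans, recenter=False)
centered = averaged_fwhm(freqs, scans, recenter=True)
print(f"un-centered: {raw:.0f} MHz, centered: {centered:.0f} MHz")
```

In this toy example the drift adds in quadrature to the single-scan width, so the un-centered average is broader than the centered one, qualitatively mirroring the trend reported above.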
We next investigate the polarization behaviour of NV center transitions in the cavity by scanning the polarization angle of the excitation laser with a half wave plate (HWP), and observe a dependence of the background corrected fluorescence counts on this HWP angle, see Fig. 3(f); polarization can be used to suppress resonant excitation pulses in a cross-polarization detection scheme [21], as demonstrated below. Interestingly, the NV centers in the two studied frequency clusters around 470.407 THz and 470.452 THz show a different polarization behaviour. A possible origin of this effect is a strain-induced splitting in transition frequencies for $E_x$ and $E_y$ polarized lines, as observed in Ref. [37]; future investigation is required to conclusively determine the origin [46].

V. PULSED RESONANT EXCITATION
Now that we can resolve individual NV centers spectrally, we characterize their Purcell enhancement with a detuning sweep similar to that of Sec. III. The measurement block replaces off-resonant pulses with single frequency resonant pulses (Fig. 4(a)). The sequence consists of a green repump pulse to initialize the NV center predominantly in the negative charge and $m_s = 0$ spin state, followed by a series of short (∼ 2 ns) red pulses at the frequency of the NV center transition. We record the fluorescence counts in the phonon-sideband (PSB) path after each pulse, and extract the NV center lifetime. For further details about the measurement sequence, please see the supplementary material [46].
First, we perform a pulsed autocorrelation measurement with two detectors in the PSB path to confirm that the individual peaks measured during photoluminescence excitation scans are indeed from single NV centers. Fig. 4(b) displays the normalized $g^{(2)}$ value of (0.19 ± 0.09) ([0.16 ± 0.07] with background correction) for zero pulse separation. Unlike the off-resonant case, the value at zero pulse difference clearly falls below 0.5 in all three NV centers we tested, indicating that we are observing single quantum emitters.
FIG. 4 (caption, partial). The fit accounts for bunching effects due to probabilistic state initialization into a bright state and the finite pulse train we apply (solid line) [46]. (c) Detuning sweep of the cavity with respect to the NV center with measured PSB fluorescence (blue) and lifetime (red). We perform a joint fit of both curves to a model (solid lines) with four free parameters [46]. (b) and (c) were measured on the same NV center as Fig. 3(c-e), which is a different emitter than the ones studied in Fig. 2(a-b).
For the same NV center measured in Fig. 3(c-e), we sweep the detuning of the cavity from the NV center transition while keeping the excitation pulses resonant with the latter to measure the Purcell enhancement. The collection efficiency of PSB emission is independent of the cavity length. However, the probability of exciting the NV center depends on the overlap between the excitation laser frequency and the cavity resonance, so the PSB intensity should vary with detuning for fixed excitation power (see App. B). In the measurement (Fig. 4(c)), the fluorescence counts increase and the lifetime decreases when the cavity is on resonance with the NV center, demonstrating that we observe Purcell-enhanced NV center emission. We fit our model to both curves and extract the fraction of ZPL emission into the collectable cavity mode, $(F_P^{\mathrm{ZPL}} - 1)\beta_0 = (7.0 \pm 3.4)\%$ (see definition in Sec. III), the off-resonant lifetime, $\tau_0 = (10.9 \pm 0.2)$ ns, and the root mean squared cavity vibrations, $\sigma_{\mathrm{vib}} = (0.18 \pm 0.02)$ nm [46]. The Purcell enhancement of this NV center is consistent with the enhancement we found for NV centers under off-resonant excitation.

VI. ZPL COLLECTION AND FUTURE IMPROVEMENTS
So far, we have only studied light emitted into the phonon-sideband (PSB) after excitation with resonant light pulses. For quantum information applications, however, it is important to extract the emitted zero-phonon line (ZPL) photons with high efficiency. In the current configuration, the excitation light is directly transmitted to the detector. Therefore, we separate out the ZPL photons from the bright excitation pulse with cross-polarization and time-bin filtering [21,25], and shorten the excitation pulse further by introducing an additional electro-optic modulator (see supplementary material [46] for details).
To be able to detect ZPL photons after a resonant excitation pulse, cross-polarization detection is especially important, because state-of-the-art photodetectors have a dead time longer than the lifetime of the NV center; if a photon from the excitation pulse hits the detector, the dead time prevents detection of a ZPL photon, effectively reducing detection efficiency. Unfortunately, cross-polarization only reduces the pulse power by a factor of 4 in our setup, which is likely due to vibrations of the freely hanging single mode fiber in the cryostat. We additionally insert an 8.6 dB attenuation neutral-density (ND) filter into the ZPL path so that the efficiency of the detector remains high. Fig. 5(a) displays the fluorescence counts after a resonant excitation pulse, recorded simultaneously in the PSB and ZPL collection paths. The fluorescence in the ZPL path decays with a lifetime that agrees to within error with the lifetime of the NV center in the PSB path [46]. Thus, our technique enables us to see ZPL light from an NV center in a cavity under resonant excitation.
Summing the fluorescence counts from the NV center in the ZPL path gives us a benchmark for the performance of our system as a collection enhancement tool. The excitation pulse obscures the initial counts, so we extrapolate a fit to the lifetime to extract the total NV center counts in the ZPL and correct for the ND filter. We find $(9.3 \pm 0.2) \times 10^{-5}$ photons per pulse in the ZPL path and $(4.6 \pm 0.1) \times 10^{-4}$ in the PSB path. Due to low available laser power, we operate with both low initialization probability into the $m_s = 0$ state and low excitation probability to the excited state [46].
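The two corrections described above (extrapolating the decay into the region obscured by the excitation pulse, and undoing the ND filter) can be sketched as follows. This is an illustration only: it assumes a single-exponential decay starting at t = 0, the window boundaries and lifetime in the example are hypothetical placeholders rather than the values used in the analysis, and the function names are not from the source.

```python
import math

def nd_filter_factor(attenuation_db):
    """Power attenuation factor of a filter specified in dB."""
    return 10.0 ** (attenuation_db / 10.0)

def window_fraction(t_start_ns, t_stop_ns, tau_ns):
    """Fraction of a single-exponential decay that falls inside [t_start, t_stop]."""
    return math.exp(-t_start_ns / tau_ns) - math.exp(-t_stop_ns / tau_ns)

def corrected_counts(window_counts, t_start_ns, t_stop_ns, tau_ns, attenuation_db):
    """Extrapolate windowed counts to the full decay and undo the ND filter."""
    full_decay = window_counts / window_fraction(t_start_ns, t_stop_ns, tau_ns)
    return full_decay * nd_filter_factor(attenuation_db)

nd = nd_filter_factor(8.6)                # 8.6 dB filter from the text
frac = window_fraction(3.0, 40.0, 10.0)   # hypothetical window (ns) and lifetime (ns)
print(f"ND correction factor ~ {nd:.2f}, window fraction ~ {frac:.3f}")
```

An 8.6 dB filter corresponds to a correction factor of roughly 7.2; the window fraction depends entirely on the assumed summation window and lifetime.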
To better compare the cavity with other collection enhancement systems, and to understand current limitations and possible improvements in a future system, we break the loss of coherent photons down into its main contributions, see Fig. 5(b). [...] PSB, rather than the ZPL. Importantly, correcting for the excitation probability alone already raises the ZPL detection probability to $\sim 2.0 \times 10^{-3}$, which is comparable to the collection of coherent ZPL photons achieved for an NV center in a solid immersion lens ($\sim 5 \times 10^{-4}$) [24,25]. Thus, despite the relatively low collection efficiency from our current cavity, limited in large part by operation in the clipping regime of the fiber mirror, our system already produces ZPL photons at a level comparable to state-of-the-art non-cavity systems (see App. D).
FIG. 5 (caption, partial). The parallel decay of an exponential fit to both curves (solid lines) indicates emission from the same NV center. Shaded regions indicate areas for which fluorescence counts are summed to extract counts per excitation pulse (for the ZPL we include the extrapolated region shaded in light red) [46]. (b) Schematic of the sources of loss of coherent ZPL photons for the current cavity, the proposed improved cavity and a state-of-the-art solid immersion lens (SIL) collection system. We break down the loss contributions into ZPL fraction, collection efficiency and excitation probability and separate the losses into vibration induced losses (stripes) and internal cavity losses (dotted) (see App. D).
There are a number of changes which could improve ZPL collection under resonant excitation; we focus on three main developments which have already been achieved in other systems. First, introducing microwaves and a spin pumping laser is a standard technique for initializing and controlling the NV$^-$ charge and $m_s = 0$ spin state with high fidelity in bulk samples. The narrow spectral diffusion linewidths demonstrated in this work should make this possible in our system as well. Together with upgraded laser excitation pulse power, this should allow for near unity excitation per pulse, as is common for bulk diamond samples [51]. By increasing the polarization extinction by a factor of at least 100 by either fixing the fiber in the cryostat or switching to polarization maintaining fiber, the excitation pulse can be suppressed sufficiently even for these higher excitation powers. Second, reducing the vibrations by a factor of 20 (from ∼ 0.2 nm to ∼ 0.01 nm), as demonstrated in Refs. [14,52-55], would increase the ZPL detection fraction by a factor of ∼ 16. Our current data already shows evidence for this potential improvement: by correlating the lifetime and the PSB fluorescence counts per resonant excitation pulse with the vibration level in the cryostat, we observe a reduction in lifetime (increase in fluorescence counts) from (10.02 ± 0.12) ns ($[2.5 \pm 0.1] \times 10^{-4}$) to (9.77 ± 0.08) ns ($[5.8 \pm 0.2] \times 10^{-4}$) for data collected during high and low vibration time periods of the cryostat, respectively. We explore this correlation in detail in App. C. Finally, by working with a different fiber that is not clipping-loss limited, we expect to improve collection by a factor of 3. Together, these three improvements would raise the joint probability of producing and detecting a ZPL photon after short pulsed excitation to ∼ 10%.
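The improvement budget quoted in this section can be checked with simple arithmetic. The sketch below assumes that the individual gain factors multiply independently, which is a simplification of the detailed loss model in App. D; the implied excitation probability is an inference from the two quoted detection probabilities, not a number stated in the text.

```python
measured_zpl_per_pulse = 9.3e-5  # detected ZPL photons per pulse (this work)
excitation_corrected = 2.0e-3    # after correcting for excitation probability
vibration_gain = 16.0            # vibrations reduced from ~0.2 nm to ~0.01 nm
clipping_gain = 3.0              # fiber no longer clipping-loss limited
sil_reference = 5.0e-4           # solid immersion lens value quoted in the text

# Combined initialization and excitation probability implied by the two numbers above
implied_excitation = measured_zpl_per_pulse / excitation_corrected

# Projected ZPL detection probability if all three improvements are combined
projected = excitation_corrected * vibration_gain * clipping_gain

# Improvement relative to the solid immersion lens reference
gain_over_sil = projected / sil_reference

print(f"implied excitation probability ~ {implied_excitation:.3f}")
print(f"projected ZPL detection probability ~ {projected:.3f}")
print(f"gain over SIL ~ {gain_over_sil:.0f}x")
```

Under these assumptions the projected probability is 9.6%, in line with the ∼ 10% stated above, and roughly two orders of magnitude above the solid immersion lens reference.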

VII. CONCLUSION
We have demonstrated the resonant operation of Purcell-enhanced, coherent single photon emitters coupled to a fiber-based cavity system in a closed-cycle cryostat operated at a temperature of 4 K. We are able to address single Purcell-enhanced NV centers via frequency-selective resonant excitation, and we have developed a sequence that allows us to collect up to $(9.3 \pm 0.2) \times 10^{-5}$ photons per excitation pulse in the zero-phonon line (ZPL). We have developed a theoretical model that describes our results, and used it to identify low excitation probability per laser pulse, length fluctuations between the cavity mirrors, and losses related to clipping on the fiber mirror as the main limitations in our current system. Using mutually non-exclusive numbers that have already each been achieved in several systems, we predict that we can increase the collected ZPL photons per excitation pulse to ∼ 10% in a future NV-cavity system operated in a closed-cycle cryostat. Building on the previous success of NV centers in entanglement generation and other network protocols [21,23,24], a cavity-enhanced NV-photon interface could dramatically improve entanglement rates and fidelity; single click and double click protocols would speed up by a factor of 100 and 10000, respectively. We expect that the realization of fully coherent quantum emitters embedded in optical fiber-based cavities will enable more extensive quantum networks with long coherence times, a crucial step towards a quantum internet.

VIII. ACKNOWLEDGEMENTS
We thank Wouter Westerveld, Martin Eschen, Guus Evers, and Santi Sager La Ganga for experimental assistance, Thomas Fink for fabrication of the fiber mirror, Lennart van den Hengel for electron irradiation of the diamond and Simon Baier and Conor Bradley for reviewing the manuscript. We acknowledge financial support from the EU Flagship on Quantum Technologies through the project Quantum Internet Alliance, from the Netherlands Organisation for Scientific Research (NWO) through a VICI grant, and the European Research Council (ERC) through an ERC Consolidator Grant.

A. CAVITY CHARACTERIZATION MEASUREMENTS
To characterize the cavity, we first input white light through the fiber, and measure the cavity transmission on a spectrometer for different cavity lengths. Fig. 6(a) displays the resulting dispersion diagram with avoided crossings between air and diamond modes [38,39]. From a fit to this data, we infer that we operate in an air-like mode with a diamond thickness of 5.8 µm and a typical air gap between 5 µm and 7.5 µm (7.3 µm in the measurement shown) [38]. This gap cannot be reduced further, likely due to an angled mounting of, or dirt on, the fiber. To measure the cavity linewidth, we then scan the cavity length through a resonance and measure the corresponding transmission peak after the cavity with a photodiode (sub-ms timescale to minimize vibration contributions to the linewidth). We apply sidebands to the laser to calibrate the frequency of this scan. In the measurement displayed in Fig. 6(b), the linewidth of the cavity is κ/2π = (3.5 ± 0.2) GHz, and the finesse is (2200 ± 100). Over the course of this work, we operated with a finesse between 1000 and 2500, dependent on the cavity position. Based on the measured parameters of the cavity and a complete transfer matrix method described in Ref. [45], we can estimate the maximum possible Purcell enhancement in our cavity ($F_P^{\mathrm{ZPL}} \approx 7$), which corresponds to a coupling between NV centers and the cavity of g/2π ≈ 300 MHz. This estimate holds for an NV center at the optimal depth location, which is a cavity standing wave antinode, and the optimal xy position for maximum overlap with the Gaussian mode of the cavity. We describe the mismatch with the ideal position with a parameter, ξ, which we measure via the Purcell enhancement in the main text. The coupling is also reduced by vibrations as discussed in App. B. The lifetime limited transition linewidth of the NV center is γ/2π ≈ 13 MHz [48]. In Sec. III and Sec. V, we measured the Purcell enhancement under off-resonant and resonant excitation and determined $F_P^{\mathrm{ZPL}} \approx 4$ and thus g/2π ≈ 180 MHz (see eq. B3). The relative values of g, κ and γ put us firmly in the weak coupling Purcell regime ($\gamma \ll g \ll \kappa$), which is the ideal parameter range for collecting photons from an emitter [56].
FIG. 6. (a) Cavity dispersion for the coupled membrane-air fiber-cavity system. The length of the cavity is swept and transmission is recorded on a spectrometer. From fits of the fundamental modes to a transfer matrix model, we can determine the diamond and air thicknesses to be 5.8 µm and 7.3 µm, respectively. (b) Finesse measurement. We scan the cavity quickly over a transmission peak to extract the cavity linewidth (κ/2π = [3.5 ± 0.2] GHz) from fits to the data; the laser exciting the cavity is modulated with sidebands of 9 GHz with an EOM, serving as frequency calibration reference.

Purcell Enhancement
NV centers exhibit limited emission into the zero-phonon line (ZPL), $\beta_0$, given by the Debye-Waller factor. Some states also couple to a non-radiative transition through an intersystem crossing [57]. Without interaction with a cavity, we can model the decay from the NV center excited state, $\gamma_0$, with a radiative decay rate, $\gamma_{\mathrm{rad}}$, and non-radiative decay rate, $\gamma_{\mathrm{dark}}$, as:
$$\gamma_0 = \gamma_{\mathrm{rad}} + \gamma_{\mathrm{dark}}.$$
Coupling to the mode of the cavity opens up another decay channel for the NV center, which we characterize with a Purcell factor $F_P^{\mathrm{ZPL}}$. The decay rate from the NV excited state, modified by the cavity, $\gamma'$, is then given as
$$\gamma' = \gamma_{\mathrm{rad}}\left[1 + \beta_0\left(F_P^{\mathrm{ZPL}} - 1\right)\right] + \gamma_{\mathrm{dark}},$$
with $F_P^{\mathrm{ZPL}}$ set by $g$, $\kappa$ and $\gamma$, where g/2π is the NV-cavity coupling, κ/2π is the cavity linewidth, and γ/2π is the lifetime limited optical transition linewidth of the NV center [33]. We choose the definition of the Purcell factor as the enhancement of the ZPL (rather than the enhancement of the total emission from the NV center) because it better reflects the increase in coherent light.
In this work, we primarily investigate transitions with linear polarization, and we preferentially initialize into the $m_s = 0$ ground state with green excitation. The transitions in the $m_s = 0$ manifold are $E_x$ and $E_y$, for which $\gamma_{\mathrm{dark}}$ is much smaller than $\gamma_{\mathrm{rad}}$ in bulk diamond [49,58]. In Sec. V, we measure a reduced lifetime when the cavity is off resonance of (10.9 ± 0.2) ns, compared to the ∼12 ns typically found in bulk diamond [49,58]; from this, we estimate that $\gamma_{\mathrm{dark}}/\gamma_{\mathrm{rad}} \approx 0.1$, so we assume $\gamma_{\mathrm{rad}} \gg \gamma_{\mathrm{dark}}$, and simplify eq. B1 to
$$\frac{\tau_0}{\tau} = 1 + \beta_0\left(F_P^{\mathrm{ZPL}} - 1\right),$$
where $\tau_0$ is the NV center lifetime without the cavity and $\tau$ is the modified lifetime. The fraction emitted into the ZPL in the cavity mode is $(F_P^{\mathrm{ZPL}} - 1)\beta_0$. If the assumption that $\gamma_{\mathrm{dark}}$ is negligible is incorrect, then $(F_P^{\mathrm{ZPL}} - 1)\beta_0$ is increased by a factor of $(\gamma_{\mathrm{rad}} + \gamma_{\mathrm{dark}})/\gamma_{\mathrm{rad}}$. A simple rearrangement then gives us eq. 1.
The rates of detected photons in the ZPL path, $C_{\mathrm{ZPL}}$, and the phonon-sideband (PSB) path, $C_{\mathrm{PSB}}$, are

$$C_{\mathrm{ZPL}} = p_{\mathrm{ex}}\,\eta_{\mathrm{zpl}}\,\frac{(F_P^{\mathrm{ZPL}} - 1)\beta_0}{F_P^{\mathrm{ZPL}} \beta_0 + 1 - \beta_0}, \qquad C_{\mathrm{PSB}} = p_{\mathrm{ex}}\,\eta_{\mathrm{psb}}\,\frac{1 - \beta_0}{F_P^{\mathrm{ZPL}} \beta_0 + 1 - \beta_0},$$

where $p_{\mathrm{ex}}$ is the probability that a pulse excites the NV center to its excited state, and $\eta_{\mathrm{zpl}}$ ($\eta_{\mathrm{psb}}$) is the detection efficiency for ZPL (PSB) photons from the cavity. A number of these parameters depend on the cavity transmission, $T$, which itself depends on the detuning between the cavity and the transition, $\Delta$. Through $T(\Delta)$, both $F_P^{\mathrm{ZPL}}$ and $\eta_{\mathrm{zpl}}$ become functions of the detuning. The PSB collection is not resonant with the cavity, and its spectral distribution lies mostly outside of the stopband of the mirror, so there should be minimal changes from small variations in cavity detuning. For the same reason, green excitation light is also independent of the cavity detuning. However, resonant red excitation depends strongly on cavity detuning, and the corresponding power in the cavity. The Rabi frequency is proportional to the square root of the intracavity power, which scales with $T(\Delta)$. Therefore, in the resonant excitation case, $p_{\mathrm{ex}}$ depends on $\Delta$ through the pulse rotation angle $\phi_p(0)\sqrt{T(\Delta)}$. Here $\phi_p(0)$ is the Rabi rotation angle induced by the pulse on resonance (e.g. $\pi$ for a complete population of the excited state). In the weak excitation limit explored in this work, this can be approximated as $\phi_p(0)\sqrt{T(\Delta)}/2$.
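The detuning dependence described above can be sketched numerically. The block below is a minimal illustration under stated assumptions, not the model used in this work: it assumes a Lorentzian $T(\Delta)$ normalized to 1 on resonance with $\kappa$ as the full width at half maximum, and it assumes that the ZPL emission fraction and the ZPL collection each scale linearly with $T$ in the weak-enhancement limit. Together with the $\sqrt{T}$ scaling of the rotation angle, this gives the overall $T^{5/2}$ dependence referred to later in the text. All function names are hypothetical.

```python
import math


def transmission(detuning, kappa):
    """Assumed Lorentzian cavity transmission, equal to 1 on resonance."""
    return 1.0 / (1.0 + (2.0 * detuning / kappa) ** 2)


def rotation_angle(phi_0, detuning, kappa):
    """Pulse rotation angle: Rabi frequency ~ sqrt(intracavity power) ~ sqrt(T)."""
    return phi_0 * math.sqrt(transmission(detuning, kappa))


def relative_zpl_counts(detuning, kappa):
    """Relative resonant-excitation ZPL counts, normalized to 1 on resonance.

    Assumption: excitation ~ T^(1/2), emission fraction ~ T, collection ~ T.
    """
    return transmission(detuning, kappa) ** 2.5


# Example: detuning of half a linewidth (units cancel, so kappa = 1 is used).
t_half = transmission(0.5, 1.0)
counts_half = relative_zpl_counts(0.5, 1.0)
print(f"T = {t_half:.2f}, relative ZPL counts = {counts_half:.3f}")
```

The steep exponent means that a detuning which only halves the transmission already reduces the ZPL counts by more than a factor of five in this simplified picture.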

Vibration Model
The above simple model does not take into account the vibrations in the cavity length. We extend the model of Ref. [45] to build a complete numerical transfer matrix model for the cavity. We assume a Gaussian distribution of cavity lengths, $f_{\mathrm{vib}}$, given as

$$f_{\mathrm{vib}}(dL) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{dL^2}{2\sigma^2}\right),$$

where $\sigma$ is the width of the length distribution, and $dL$ the length fluctuations around a certain cavity length, as induced by vibrations. We then find the Purcell-enhanced fraction of emission into the ZPL ($F_P(dL)\,\beta_0$) for each $dL$.

a. Off-resonant Excitation
Under off-resonant excitation, the probability of exciting the NV center does not depend on the length of the cavity. For each frequency detuning, $\Delta$, we find the equivalent cavity detuning length, $L_{\mathrm{det}}$, using our complete model of the cavity dispersion relation. We integrate eq. B5 over the length distribution in eq. B11 to determine the emitted counts in the ZPL as a function of time, $C(t)$. We then numerically integrate and fit the function $C(t)$ to an exponential decay to determine the measured lifetime $\tau'(\Delta)$. We also sum over $t$ to find the total counts $C_{\mathrm{tot}}(\Delta)$. For the case of multiple NV centers, we also introduce an extra Gaussian broadening term, $g_{\mathrm{nv}}$, inside the integral. The exact form of $g_{\mathrm{nv}}$ depends on the (unknown) distribution of NV centers, but we find that the Purcell factor we extract is relatively insensitive to the shape we pick.
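The averaging step can be illustrated with a short numerical sketch. It is not the transfer matrix model of this work: as a simplifying assumption, the cavity response is taken to be a Lorentzian in the length offset $dL$ with a hypothetical half width `hwhm`, and the Gaussian weights are normalized on a discrete grid. The function names and the numbers are placeholders.

```python
import math


def vibration_average(signal, sigma, n_points=2001, span=6.0):
    """Average signal(dL) over a zero-mean Gaussian length distribution.

    sigma is the rms length fluctuation; the grid covers +/- span * sigma.
    """
    step = 2.0 * span * sigma / (n_points - 1)
    total = 0.0
    norm = 0.0
    for i in range(n_points):
        dl = -span * sigma + i * step
        weight = math.exp(-0.5 * (dl / sigma) ** 2)
        total += weight * signal(dl)
        norm += weight
    return total / norm  # discrete normalization of the Gaussian weights


def lorentzian(dl, hwhm):
    """Assumed cavity response versus length offset (half width hwhm)."""
    return 1.0 / (1.0 + (dl / hwhm) ** 2)


# Hypothetical case: rms vibration equal to the cavity half width in length.
HWHM = 1.0
avg_t = vibration_average(lambda dl: lorentzian(dl, HWHM), sigma=1.0)
avg_t52 = vibration_average(lambda dl: lorentzian(dl, HWHM) ** 2.5, sigma=1.0)
print(f"<T> = {avg_t:.3f}, <T^(5/2)> = {avg_t52:.3f}")
```

Quantities that depend on a higher power of the transmission are suppressed more strongly by the same vibration level, which is why the resonantly excited ZPL signal is the most sensitive to cavity length noise.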

b. Resonant Excitation
Under resonant excitation, the probability of exciting the NV center depends strongly on the excitation laser detuning from the cavity. This modifies the function for counts from the cavity slightly. We integrate the counts curves in the same way as in the case of off-resonant excitation to determine the total counts. The resulting expressions for collection from the ZPL and the PSB, respectively, additionally contain $p_{\mathrm{in}}$, the probability of initializing in the $m_s = 0$ state. We do not include $g_{\mathrm{nv}}$ because we only excite single NV centers in the case of resonant excitation. Counts and lifetime are then calculated in the same manner as for the off-resonant case.

C. VIBRATIONS AND THEIR INFLUENCE ON COUNTS AND LIFETIMES
All measurements in this paper were taken in a closed-cycle cryostat operated at T ∼ 4 K. Such a cryogenic system has intrinsic vibrations due to moving parts. Nevertheless, we chose a closed-cycle system, as opposed to an essentially vibration-free liquid helium bath cryostat, because of its ease of operation and the possibility of uninterrupted measurement cycles on a timescale of several months without any human intervention; this is an important operational consideration for a future quantum network, with nodes distributed over distant locations, as it removes the cost and labour associated with a helium infrastructure.
The average root mean square (rms) fluctuations in cavity length during one cryostat period, inferred from a cavity transmission measurement (see supplemental materials [46] for measurement details and additional data), are displayed in Fig. 7(a). We observe a 5- to 6-fold change in vibration level over the course of a coldhead cycle, with two spikes in the data, related to the coldhead starting to move up and down, respectively. The vibrations in the system are therefore directly linked to the coldhead movement.
As discussed in App. B, cavity length fluctuations influence the collected counts and detected lifetime from an NV center. In the case of resonant excitation and PSB detection, for example, both the excitation probability and the Purcell enhancement depend on the cavity detuning at a given time. To probe this dependence, we utilize a synchronization signal from the cryostat coldhead that marks the beginning of a new period, and use it to record the time within the cryostat period at which an NV center photon is detected. We assign NV center photon arrival timestamps to periods in the cryostat cycle with low (orange shaded area in Fig. 7(a)) and high vibration values (blue shaded area in Fig. 7(a)). We can then extract the vibration influence on counts and corresponding excited state lifetime, see Fig. 7(b). During the cryostat times with low vibrations, we extract an NV center lifetime (counts per excitation pulse) of (9.77 ± 0.08) ns ($[5.8 \pm 0.2] \times 10^{-4}$), compared to (10.02 ± 0.12) ns ($[2.5 \pm 0.1] \times 10^{-4}$) during high vibration times. In comparison, the lifetime (counts) over all cryostat times is (9.87 ± 0.04) ns ($[4.7 \pm 0.2] \times 10^{-4}$). This serves as further proof that it is indeed Purcell enhancement through coupling to a cavity that leads to a lifetime reduction and increase in counts of the NV center when the cavity frequency is tuned to the NV center transition frequency. It also shows that reducing the vibrations in a future system increases Purcell enhancement, as discussed in App. D below. Note that we average over all cryostat period times in the measurements in this study, unless stated otherwise.
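The sorting of photon timestamps by position in the cryostat cycle can be sketched as follows. This is only an illustration of the idea: the function names, the 1 s cycle, and the window boundaries are hypothetical and do not correspond to the actual acquisition software or cryostat timing.

```python
from bisect import bisect_right


def phase_in_cycle(photon_time, sync_times):
    """Time elapsed since the most recent sync pulse (None before the first)."""
    index = bisect_right(sync_times, photon_time) - 1
    if index < 0:
        return None
    return photon_time - sync_times[index]


def split_by_window(photon_times, sync_times, low_window, high_window):
    """Sort photon timestamps into low- and high-vibration parts of the cycle.

    Each window is a (start, stop) tuple measured from the sync pulse.
    Photons outside both windows are dropped.
    """
    low, high = [], []
    for t in photon_times:
        phase = phase_in_cycle(t, sync_times)
        if phase is None:
            continue
        if low_window[0] <= phase < low_window[1]:
            low.append(t)
        elif high_window[0] <= phase < high_window[1]:
            high.append(t)
    return low, high


# Hypothetical data: 1 s cycle, quiet window 0.0-0.3 s, noisy window 0.5-0.8 s.
syncs = [0.0, 1.0, 2.0]
photons = [0.1, 0.6, 0.9, 1.2, 1.7, 2.55]
low, high = split_by_window(photons, syncs, (0.0, 0.3), (0.5, 0.8))
print(len(low), "low-vibration photons;", len(high), "high-vibration photons")
```

Lifetimes and counts per excitation pulse would then be evaluated separately on the two subsets, with the counts normalized to the number of excitation pulses that fall within each window.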

D. CURRENT CHALLENGES AND FUTURE IMPROVEMENTS
In Sec. VI, we measured the absolute counts from an NV center in the cavity in both the PSB and the ZPL under resonant excitation. It is helpful to understand the loss contributions which prevent perfect PSB and ZPL detection. In this way, we can project what improvements may be possible with future upgrades to the setup. The investigations in this section underlie the loss plot in Fig. 5(b).

Vibrations
First, we estimate the effect of vibrations on cavity coupling and enhancement. In App. C and the supplementary materials [46], we characterize the vibrations in the system. In our model in App. B, the vibrations reduce three parameters: the excitation probability of the NV center, the ZPL fraction emitted into the cavity, and the collection efficiency of the cavity. These effects are correlated, because the vibrations are slow (∼ ms timescale) compared to the excitation and emission timescales of the NV center (∼ ns timescale). Only the excitation probability of the NV center has a notable contribution to the PSB counts, as the fraction into the PSB is only slightly decreased for our cavity, and the PSB is not collected resonantly with the cavity.
From the data in Fig. 4(b) and Fig. 5(a) of the main text, we determine that (7.0 ± 3.4)% of the emission goes into the ZPL path, and we excite and detect ZPL photons with a probability of $(9.3 \pm 0.2) \times 10^{-5}$. If we set the vibration level to zero in our model, we find that 17% is emitted into the ZPL, and $1.7 \times 10^{-3}$ ZPL photons are collected. Thus, the total reduction of ZPL collection and excitation due to vibrations is 13 dB. Because these reductions are correlated, we distribute the vibration contributions to the losses in Fig. 5(b) according to the relative contribution to the $T^{5/2}$ term in eq. B13.
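The quoted 13 dB can be checked directly from the two ZPL detection probabilities. The sketch below also splits the total according to the exponents of the $T^{5/2}$ term; the assignment of 1/2 to excitation and 1 each to emission and collection is an assumption made here for illustration, and the variable names are hypothetical.

```python
import math


def ratio_to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)


zpl_measured = 9.3e-5      # detected ZPL photons per excitation (from the text)
zpl_no_vibration = 1.7e-3  # model value with the vibration level set to zero

total_db = ratio_to_db(zpl_no_vibration / zpl_measured)

# Assumed split of the T^(5/2) exponent between the three correlated effects.
exponents = {"excitation": 0.5, "emission": 1.0, "collection": 1.0}
exponent_sum = sum(exponents.values())
share_db = {name: total_db * e / exponent_sum for name, e in exponents.items()}

print(f"total vibration loss: {total_db:.1f} dB")
for name, value in share_db.items():
    print(f"  {name}: {value:.1f} dB")
```

Under this assumed split, the collection share of about 5 dB corresponds to a factor of roughly 0.3, which is in line with the factor of ∼1/3 from vibrations quoted for the internal collection efficiency.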

Collection Efficiency
We divide the ZPL collection efficiency into two parts: internal collection efficiency and external collection efficiency. The external collection efficiency is the coupling between the mode exiting the cavity and the detector, and is determined by the classical optics in between. Internal collection efficiency is the probability that a photon escapes the cavity and couples into the free space mode.
The internal efficiency, $\eta_{\mathrm{int}}$, can be estimated using the measured transmission rate of the free space mirror, $\kappa_{\mathrm{FS}}$, as $\eta_{\mathrm{int}} = \kappa_{\mathrm{FS}}/\kappa$. We design $\kappa_{\mathrm{FS}}$ to be significantly larger than the fiber transmission and scattering loss rates. However, the angle of the fiber forced us to operate in a regime where diffraction losses contributed significantly to the cavity finesse (which can be readily overcome with a new fiber). Therefore, in this work we operated with $\eta_{\mathrm{int}}$ in the range of 0.05 to 0.17, which also includes a factor of ∼1/3 from vibrations.
FIG. 8. We fix the design parameters of the mirrors to match our current system (Finesse 6200). Fraction emitted into the ZPL (a) and outcoupled fraction of photons in the ZPL (b) for the current (blue) and improved (green) vibration levels, as a function of achieved cavity finesse. These simulations are based on the model from Ref. [45].

TABLE I. Summary of suggested improvements to current diamond fiber-cavity. The first three upgrades represent improvements that have already been achieved in other systems, as indicated in the second column. Finesse would be limited by the mirror if achieved losses limit the finesse to significantly greater than 6000. The last two items have been achieved but not simultaneously with the other requirements. We give the resulting enhancement of the ZPL collected and the absolute fraction collected in the ZPL.

To measure the external collection efficiency, we send classical light through the fiber, and estimate the losses at each section between the fiber input and the detector. The loss to and from the cavity, probed in the fiber in reflection, is approximately -12.4 dB, including the fiber connector and fiber splice twice, and the reflection off the cavity once. Based on the depth of the reflection dip from the cavity, we estimate that we have a 2.8% incoupling efficiency. Based on the design values of our mirrors, we expect 2.5% incoupling from the fiber side, which is in reasonable agreement, and suggests that there is not a large mode mismatch between the cavity and the fiber. For a measurement of the transmitted light, we filter the input laser beam with a 20 dB filter and measure counts on an APD in the ZPL path. Subtracting out the light lost in the excitation path, we estimate that the total collection efficiency is 4%. This is likely divided into ∼11% internal collection efficiency and ∼32% external collection efficiency, but the precise contributions are not known. The external efficiency is determined by classical optics and the detector efficiency, so we expect that the external efficiency can also be improved in future experiments with better coatings and superconducting nanowire detectors.

We can also estimate the collection efficiency of the PSB path to make full use of the direct comparison in counts between PSB and ZPL. This efficiency is relatively low in our setup, because we have a long working distance objective with a numerical aperture of 0.55 (the supplementary material [46] contains a full description of the optics path). Depending on the angle of the NV center dipole emission axis in our <100> sample, the collection is between 0.9% and 2.5%. There is likely a bias towards NV centers with a better coupled dipole, because they produce more collectable PSB counts, and thus are easier to spot when searching for NV centers. The top collecting mirror has a narrow stop band, which by design allows 83% of the PSB light through. We estimate 83% transmission through the path and 70% APD detection efficiency, but the exact values are unknown. Altogether we expect between 0.4% and 1.2% collection efficiency in the PSB.
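The collection efficiencies quoted above follow from multiplying the individual stage efficiencies. The sketch below reproduces that arithmetic with the values from the text; the helper name `chain_efficiency` is hypothetical, and treating the stages as independent multiplicative factors is an assumption.

```python
import math


def chain_efficiency(*stages):
    """Overall efficiency of independent stages, each given as a fraction."""
    return math.prod(stages)


# ZPL path: internal (~11%) times external (~32%) collection efficiency.
zpl_total = chain_efficiency(0.11, 0.32)

# PSB path: objective collection (0.9% to 2.5%, depending on dipole angle),
# mirror transmission (83%), optical path (83%), APD efficiency (70%).
psb_low = chain_efficiency(0.009, 0.83, 0.83, 0.70)
psb_high = chain_efficiency(0.025, 0.83, 0.83, 0.70)

print(f"ZPL total: {100 * zpl_total:.1f}%")
print(f"PSB total: {100 * psb_low:.2f}% to {100 * psb_high:.2f}%")
```

The products come out close to the rounded 4% (ZPL) and 0.4% to 1.2% (PSB) given in the text.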

Benchmarking with Excitation Probability Correction
We can benchmark our system against other diamond collection optics such as solid immersion lenses. However, the state of the art in these systems is near-perfect initialization and excitation of the NV center. Higher initialization can be achieved with an additional spin pumping laser and microwaves [21]. In our system we initialize with a green laser, which only has a limited probability of initializing into the $m_s = 0$ spin state. Furthermore, we do not saturate the initialization with green or the excitation with red, due to limited laser power, so it is hard to estimate the probability of excitation. Therefore, the counts we observe are not directly comparable to the state of the art.
We can instead use our calculations of the collection efficiencies to estimate the excitation probability based on the total counts we observe from the NV center in the PSB and the ZPL. First we use the ZPL counts; we expect that with perfect initialization the counts in the ZPL should be $(F_P^{\mathrm{ZPL}} - 1)\beta_0/(F_P^{\mathrm{ZPL}} \beta_0 + 1 - \beta_0) \times \eta_{\mathrm{int}} \times \eta_{\mathrm{ext}}$, or approximately $2.6 \times 10^{-3}$ based on the results of Sec. V and Sec. VI. This corresponds to an excitation probability of 3%. For comparison, the PSB counts should be $(1 - \beta_0)/(F_P^{\mathrm{ZPL}} \beta_0 + 1 - \beta_0) \times \eta_{\mathrm{psb}}$. The estimated collection efficiency and measured counts give us an excitation probability between 14% and 4% (with a bias towards the latter). This corresponds to between $0.6 \times 10^{-3}$ and $2 \times 10^{-3}$ counts in the ZPL with perfect initialization. The estimates show reasonable agreement, and exact determination of the different losses is left for future work. Even with the limited collection efficiency in this work, the ZPL counts we collect are comparable to the state of the art when correcting for initialization and excitation probability.
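The PSB-based cross-check can be reproduced with a one-line rescaling. The sketch below assumes that the measured ZPL detection probability of $9.3 \times 10^{-5}$ per excitation pulse, quoted in the Vibrations subsection, is the value being corrected, and that the correction is a simple division by the excitation probability; the function name is hypothetical.

```python
def zpl_counts_at_unit_excitation(measured_zpl_counts, excitation_probability):
    """Scale measured ZPL counts per pulse to perfect initialization and excitation."""
    return measured_zpl_counts / excitation_probability


MEASURED_ZPL = 9.3e-5  # detected ZPL photons per excitation pulse (from the text)

# Excitation probabilities inferred from the PSB counts (upper and lower bound).
corrected = {
    p_ex: zpl_counts_at_unit_excitation(MEASURED_ZPL, p_ex) for p_ex in (0.14, 0.04)
}

for p_ex, counts in corrected.items():
    print(f"p_ex = {p_ex:.2f}: {counts:.2e} ZPL counts per perfect excitation")
```

The two bounds land near $0.7 \times 10^{-3}$ and $2.3 \times 10^{-3}$, consistent with the rounded range given above.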

Improvements
Based on the current challenges and our modelling, we suggest three improvements which should be possible in the short term, as summarized in Tab. I. We constrain ourselves to performances that have already been achieved in other systems. The first improvement is to implement microwave control of the ground state NV center spin, which will allow spin pumping and also yellow resonant repumping for higher initialization and even lower spectral diffusion. Using these techniques, it should be possible to achieve near unity excitation as has been shown in bulk diamond [21,48].
The second improvement is to reduce vibrations; a number of research groups have achieved considerably lower vibrations of comparable fiber-based micro-cavities [14,52-54]. In Fig. 8, we simulate the dependence of ZPL emission fraction and collection fraction on the finesse for different vibration levels. We project that reducing the vibration level by a factor of 20 gains a factor of 16 in total ZPL collection.
The third improvement is increasing the finesse; our current finesse is limited by diffraction losses, but with a better fiber it should be limited by the mirror coatings, gaining another factor of 3. Although it has not yet been achieved simultaneously, a diamond fiber-cavity with a finesse of 11000 and diamond-like modes would further increase ZPL collection by approximately a factor of 2. Altogether we expect that about 10% ZPL collection should be possible with already achieved parameters, and 20% by combining the best results from different setups.
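As a rough consistency check on these projections, the quoted gains can be multiplied onto the estimated ZPL collection at perfect initialization and excitation. This is only a sketch: it assumes that the gains combine as independent multiplicative factors, which the full model need not satisfy, and it uses the baseline range of about $2 \times 10^{-3}$ to $2.6 \times 10^{-3}$ from the benchmarking estimates above. Variable names are hypothetical.

```python
# Baseline ZPL collection per perfect excitation (range taken from the text).
baseline_range = (2.0e-3, 2.6e-3)

GAIN_VIBRATIONS = 16.0    # factor of 20 lower vibration level
GAIN_FINESSE = 3.0        # finesse limited by mirror coatings
GAIN_BEST_COMBINED = 2.0  # finesse 11000 with diamond-like modes

# Assumption: the individual gains simply multiply.
achieved = [b * GAIN_VIBRATIONS * GAIN_FINESSE for b in baseline_range]
combined = [a * GAIN_BEST_COMBINED for a in achieved]

print(f"already achieved parameters: {100 * achieved[0]:.0f}% to {100 * achieved[1]:.0f}%")
print(f"best results combined:       {100 * combined[0]:.0f}% to {100 * combined[1]:.0f}%")
```

The resulting ranges bracket the roughly 10% and 20% ZPL collection stated in the text.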