Telecom-band quantum interference of frequency-converted photons from remote detuned NV centers

Entanglement distribution over quantum networks has the promise of realizing fundamentally new technologies. Entanglement between separated quantum processing nodes has been achieved on several experimental platforms in the past decade. To move towards metropolitan-scale quantum network test beds, the creation and transmission of indistinguishable single photons over existing telecom infrastructure is key. Here we report the interference of photons emitted by remote, spectrally detuned NV center-based network nodes, using quantum frequency conversion to the telecom L-band. We find a visibility of 0.79$\pm$0.03 and an indistinguishability between converted NV photons around 0.9 over the full range of the emission duration, confirming the removal of the spectral information present. Our approach implements fully separated and independent control over the nodes, time-multiplexing of control and quantum signals, and active feedback to stabilize the output frequency. Our results demonstrate a working principle that can be readily employed on other platforms and shows a clear path towards generating metropolitan scale, solid-state entanglement over deployed telecom fibers.

Central to commonly used remote entanglement generation protocols [6,[8][9][10][11][12][13] is the propagation and interference of single photons that are entangled with stationary qubits in the nodes. Scaling these schemes to many nodes and to long distances poses two main challenges. First, any source of distinguishability between the emitted photons needs to be removed to generate high-fidelity entangled states. Especially for solid-state emitters, this requirement is difficult to meet for a large number of nodes due to variations in the local environment of the emitters. Second, for long-distance connections photon loss in fibers is a dominant factor determining the rate at which the entanglement generation succeeds. Leading platforms for realizing processor nodes [9,[20][21][22][23][24] in a fu-ture quantum network have natural emission frequencies in the visible spectrum; fiber losses at these frequencies hinder scaling beyond a few kilometers.
In this work, we show that both challenges can be addressed simultaneously by converting the coherent singlephoton emission from NV centers (637 nm) to a single target wavelength in the Telecom L-band (1565-1625 nm) (Fig. 1a). Using the pump lasers to compensate for local detuning and using active stabilization of the frequency of the converted field, we are able to decouple the natural emission wavelength of the emitters from the wavelength used for propagation and interference and build fully independent, modular quantum nodes. We demonstrate that this method enables the removal of spectral offsets over a broad frequency range (>3 GHz). Moreover, the chosen interference wavelength has low propagation losses over commercially available optical fibers, making it suitable for long-range single-photon transmission.
We validate our approach by measuring quantum interference [25] between telecom photons that are frequency converted from the emission of two remote NV-centers that are detuned by more than 100 linewidths. By comparing the data to a detailed model we extract both the major noise sources and the underlying indistinguishability of the converted NV photons.
FIG. 1: Lay-out of the two independent network nodes and the midpoint. a) Schematic of the main components of the set-up. Pulses excite the NV center, which emits single photons through spontaneous decay. The photons are converted to telecom wavelength and guided towards a midpoint placed in the neighboring lab, where they interfere on a beamsplitter S-I. b) NV center level structure showing the spin levels in the ground state and the relevant optical transitions. The optical transitions of the two nodes have different energies due to local variations in strain (see text). c) Detailed schematic showing the optics used for full operation of the system. The nodes are identical, and no hardware is shared between the systems. Each node has a set of charge reset, spin reset and excitation laser, which are modulated, combined and focused via a high NA objective onto the NV center. The Phonon side-band (PSB) (Zero-phonon line (ZPL)) emission from the NV center is filtered using frequency (polarization) filtering and coupled into a multi (single-) mode fiber. The PSB emission is measured locally on an APD, while the ZPL emission is sent to the QFC module. Stabilization light is split off from the excitation path, and brought into the single-photon path via a polarizing beamsplitter. The QFC module contains remotely controllable optics to align the input, pump and converted fields to the waveguide on the ppLN crystal that converts both the single-photons and stabilization light. Polarization maintaining (PM) fibers transport the photons to a midpoint where the single photons are filtered and separated from the stabilization light using a combination of in-fibre filters. The single photons interfere on a beamsplitter and are measured by SNSPDs, while the stabilization light interferes with the reference laser and is measured on a photodiode.

I. INDEPENDENT QUANTUM NETWORK NODES
We employ two independently operated quantum network nodes separated by a few meters on different optical tables. The nodes are connected to a midpoint located in two separate 19-inch racks. The relevant elements are depicted in Fig. 1c. Each node operates a single diamond NV center as stationary qubit, hosted in a closed-cycle cryostat at T ≈ 4K. The relevant energy levels and optical transitions of the NV qubit are depicted in Fig. 1b.
The spin reset transition is used for spin initialization into m s = 0 (fidelity > 0.99). We use the transition to the E x/y excited state to generate single photons: a coherent optical π-pulse (≈2 ns) brings the NV center to the excited state, followed by spontaneous emission (lifetime ≈12 ns [26,27]). Both setups employ their own lasers and optical components that deliver and collect light to the NV center.
The nodes are equipped with a (nominally identical) Quantum Frequency Conversion (QFC) module. Here the light at 637 nm is converted to 1588 nm via a singlestep Difference Frequency Generation (DFG) process. This process has previously been shown to preserve entanglement between the photon and an NV center qubit [28]. The QFC modules are based on waveguides in a periodically poled Lithium Niobate crystal (ppLN), where the large non-linear coefficient facilitates conversion of single photons using a strong 1064 nm pump field. Various remotely controllable components allow for the remote and automated optimization of the QFC modules. While this single-step conversion has the upside of using a prevalent commercially available pump laser, it has the challenge of introducing spontaneous parametric downconversion noise at the target wavelength by the pump laser due to imperfections in the crystal domains [29].
After free-space filtering to remove the bright pump light, the frequency-converted light is guided to a central midpoint. In the midpoint, a series of filters separate photons at the target wavelength from stabilization light and filter out noise photons. To achieve a high suppression of any broad-band background light, we use a two-step filtering process. First, we use a reflection off a narrow Band-Stop filter of 0.35 nm, followed by transmission through an ultra-narrow (FWHM of ∼50 MHz) Fiber Bragg Grating (abbreviated with UNF). The filters are connected via circulators, and an in-fiber polarizer is used to ensure optimal performance of the UNF (Fig. 1B). After filtering, we interfere the two paths using a 50 : 50 beam splitter, and detect the photons using single-photon superconducting nanowire detectors (SNSPDs).

II. REALIZING SPECTRAL STABILITY VIA FEEDBACK
For solid-state emitters, the local environment can influence the emission properties directly via electric, magnetic or strain fields. For instance, for NV centers in nominally low-strain samples the local strain environment can shift the emission frequency by more than 1000 times the linewidth [30]. While direct tuning of emission wavelength is in principle possible on several platforms, e.g. by strain [31] or via static electric fields [10,13], the range of tunability is in general limited. Also, direct tuning brings additional complexities in the device fabrication and can add significant experimental overhead.
We show that we can use the QFC process to remove the spectral offset between the NV emission without the need for direct tuning of the optical transition. An analogous scheme has recently been used on spectrally distinct quantum dots [32,33]; a key difference is that the NV center host diamond also contains a long-lived matter qubit and can function as a processing node in a quantum network [23]. Using resonant excitation spectroscopy we find the optical transition frequency used for single-photon generation at each node (Fig. 2a). We observe that these transition frequencies are separated by approximately 3 GHz (about 100 natural linewidths), as shown in Fig. 2B.
To bring the NV photons to the same target frequency, we realize a scheme that locks the pump laser at each node to the frequency difference between the excitation laser at that node (and hence the NV emission frequency) and a joint Telecom reference laser at the midpoint. To achieve this lock, we use a split-off of the excitation laser, offset in frequency by a fixed 400 MHz, as stabilization light (see Fig. 2D). We propagate this stabilization light through the same frequency-conversion path as the NV photons. Due to the frequency offset, the stabilization light is reflected at the UNF, travelling backwards towards the first circulator where it exits (see Fig. 1c). Light from the joint reference laser is inserted at the second circulator, from where it propagates in opposite direction through the transmission flank of the UNF, also exiting on the first circulator. Here, the interference with the stabilization light is measured on a photo diode yielding the error signal for the lock (see Supplementary Info S5 [34]). We close the loop by applying feedback to the pump laser, imprinting the same frequency shift on both the single photons and stabilization light. By transmitting the reference laser 25 MHz detuned from the transmission peak of the second ultra narrow filter, the DC amplitude on the same photo diode serves as an error signal for active temperature stabilization of the UNF. Typical transmission profiles of the temperaturestabilized UNFs and the respective light-field frequencies are shown in Fig. 2d. The small deviation from ideal peak transmission at the target frequency is due to unaccounted background voltage of the photo diode and the slight difference in FWHM of the two UNFs. The remaining thermal drifts of about 1 MHz have only a minor (∼1%) effect on the transmission (see Supplementary Information ?? [34]). Note that the part of the reference light that is reflected off the ultra-narrow filter exits the circulator towards the SNSPDs and thus needs to be taken into account when designing the experimental sequence (see next section).
We verify our frequency locking by sweeping the excitation laser and monitoring the resulting pump laser frequency in Fig. 2b. We observe the expected linear relationship: a change in excitation laser frequency is precisely compensated by the pump laser frequency to always yield the same target frequency across the full tuning range. A linear fit yields the target frequency of (1587.5298 ± 0.0001) nm. This data demonstrates the ability of our frequency-locked down-conversion system to robustly compensate for a wide range of detunings.

III. PHOTON GENERATION AT THE TARGET TELECOM WAVELENGTH
We now turn to the generation of single telecom photons by the nodes as used for the measurement of twophoton quantum interference at the midpoint. The measurement sequence involves four stages (see Fig. 3), which are synchronized across the nodes by a fixed electronic  Reflections from the excitation light off the sample surface, which follow the same path as the resonant NV emission, are converted to the target telecom frequency and measured on the SNSPDs. The excitation frequency is swept by modulating the AOM frequency (see Fig. 1c). Transmission data is corrected for the frequency dependence of the losses in the AOMs. d) Layout of relevant laser frequencies and transmission data of the ultra-narrow filters (UNF). The UNF transmissions (blue and orange dots) are actively stabilized via their temperature to half transmission of the reference laser (grey vertical line). The stabilization light (red vertical line) is detuned from the target (green vertical line) and therefore reflected off the UNF (see main text).
"heartbeat" every 200 µs. This heartbeat is derived from a GPS-disciplined atomic clock positioned in the midpoint, which is distributed over telecom fibers via the White Rabbit Precision Time Protocol [35]. The first 2.5 µs following each heartbeat are used for the error signal generation for the frequency lock. This scheme allows the frequency lock to operate without knowledge of the state of the nodes, which reduces the complexity and rounds of communication needed. Moreover, it enables the autonomous operation of each of the nodes, using their own independent hardware to control the NV center and generate single photons.
In the first stage of the measurement sequence, a Charge-Resonance (CR) check is performed at each node to ensure that the NV centers are in the correct charge state and their transitions are on resonance with their respective spin reset and excitation lasers [10,36]. In case a CR check fails, a charge reset laser pulse is applied and a new CR check is started; this protocol is repeated until success. Importantly, the CR check can be run in parallel with the frequency locking as the stabilization light for the lock does not reach the NV center nor the local PSB detectors; hence the CR checks can run independently of the heartbeat. The second phase starts once the CR check is passed on a node, where a digital trigger from the micro-controller signals the readiness to the other node. After the readiness of both nodes has been communicated, the heartbeat at which they move to the third stage is agreed upon (Fig. 3a and Supplementary Info S1 [34] for timings and more information).
The third stage of the measurement sequence is used for the time-multiplexed frequency stabilization, as described above. In the fourth stage, we repeatedly apply a block consisting of a spin reset pulse (1.5 µs) followed by 10 optical π-pulses, ideally generating a train of 10 NV photons. A time-tagged digital signal marks the times at which the photon generation takes place. This block is repeated 39 times per heartbeat period. After two heartbeat periods, the system returns to the first stage (CR checks). Note that during the third and fourth stage, time-multiplexing the operations on the NV center and the error signal generation for the frequency lock is criti- Histogram of SNSPD counts in a single excitation round, averaged over ≈ 2.8 × 10 11 repetitions and analyzed in 80 ps bins. The measured count rate (green) is a combination of background (grey) and NV fluorescence (red), which is calculated by subtracting the background from the total counts. The dotted line depicts exponential decay with 12.5 ns lifetime and serves as a guide to the eye. The NV signal before time 0, the peak of the excitation pulse, is due to the non-perfect extinction ratio of the devices generating the optical excitation pulse. The background data was taken continuously over 24 hours to include any drifts that occur over the same timescale as the signal data. (c) Signal-to-background ratio for both detectors as calculated from data in (b) (solid lines). The solid vertical grey line shows the start of our chosen detection window, whereas the dashed (dash-dotted) line shows the end for the data shown in Fig. 4a (Fig. 4b). The difference between the two curves is due to non-equal detector performances.
cal as the stabilization light and the reference light both leak into the single-photon detection path. We analyze the resulting telecom photon detection rate in Fig. 3b (green line). We show the events observed in a single detector, aggregated over all single excitation rounds and both nodes. We denote t = 0 as the relative time of excitation. A sharp increase in count rate is observed when the ≈2 ns-wide optical π-pulse starts, followed by a slower decay dominated by spontaneous emission of the NV centers.
A data set displaying only the noise counts and the counts due to leakage of the excitation π-pulse (grey line) is independently generated by detuning the excitation laser by 1 GHz. The observed uniform background consists of intrinsic detector darkcounts (5 Hz per detector), counts induced by detector blinding from leaked reference and stabilization light (35 Hz per detector), and SPDC photons from the QFCs (≈150 Hz per detector). The leakage of the excitation π-pulse reflected off the sample is clearly visible. By subtracting this background from the data, we isolate the frequency-converted NV signal (red) displaying the characteristic exponential decay.
In remote entanglement experiments, the effect of noise counts can be mitigated by defining a heralding detection time window: only photon counts in this window are taken as valid entanglement heralding events [10]. In general, setting the heralding window involves a trade-off between high signal-to-background and thus high fidelity (favouring shorter windows) and success rate (favouring longer windows). In Fig. 3c we plot the signal-tobackground ratio (SBR) for the two detectors as a function of photon detection time. The SBR is bounded on one side by the leaked excitation pulse and on the other side by the NV signal approaching the uniform background. For the analysis of the two-photon interference visibility ( Fig. 4a), we apply a detection window in which the average SBR exceeds 10 ( Fig. 3c, up to dashed line). For a more detailed comparison of our model to the data ( Fig. 4b) we use an extended window (up to dash-dot line). In order to maintain the same SBR throughout the experiment, we employ a system of automatic optimization based on the live monitoring and processing of the single photon detection events (see Supplementary Info S-V, S6 and S7 [34]).

IV. TWO-PHOTON QUANTUM INTERFERENCE
Next we investigate the distinguishability of the photons emitted by the two nodes by analyzing their quantum interference. For two fully indistinguishable pho- tons impinging on the input ports of a balanced beam splitter, quantum interference leads to vanishing probability to detect one photon in each output port [25], while for fully distinguishable photons this probability is 0.5 [37,38]. From the (properly normalized) coincidence counts in the two detectors we can thus extract the distinguishability of the photons. Figure 4a) shows the measured coincident detections between the two output arms without any background subtraction. Each excitation round is treated as a "detection bin" in which a photon can arrive. We analyze the coincidences per block of 10 excitation pulses, defined as a click in both detectors in the same or two different detection bins. This leads to a maximum detection absolute bin difference of 9 and a coincidence probability increasing linearly towards 0 bin difference. We overlay the data with a model based on independently determined parameters, treating the photons as completely distinguishable (see Supplementary Information [34]). For the non-zero bin differences, in which the NV photons are fully distinguishable by their arrival time difference of at least 10x the lifetime, the model shows excellent agreement with the measured coincidences. In stark contrast, we observe a strong reduction in measured coincidences compared to the model for the zero bin difference. This drop in coincidences when the photons arrive in the same bin is the hallmark of two-photon quantum interference and forms the main result of this work.
The observed visibility is defined as V = 1− C M C Dist , with C M the measured number of coincidences, and C Dist the coincidences we would have measured at zero bin difference in case the photons were completely distinguishable. In the inset of Fig. 4a we show the method of extracting the visibility. First we use a linear fit to the total distinguishable coincidences per detection bin difference to get C E , the extrapolated coincidences for 0 bin difference. From this value we extract C Dist , by correcting for the imbalanced emission rates (see Supplementary Inf S-VI [34]). The resulting visibility is V = 0.79±0.03, which is well above the classical bound of 0.5 [37,38], proving the successful demonstration of quantum interference of single photons in the telecom L-band.
A more detailed picture of the temporal shape of the coincidences allows us to test our model with more precision (Fig. 4b). The accumulated coincidences for nonzero bin difference (top panel) show a characteristic shape dominated by the exponential decay of the NV emission. The data is well described by our model that takes into account the temporal shape of the NV-NV, background-NV and background-background coincidence contributions (derivation in Supplementary Info S-VIII [34]). The temporal histogram of coincidences within the same bin (bottom panel) shows a good match with the temporal shape predicted by our model. In particular, we observe no reduction of coincidences at 0 time delay, consistent with the visibility being limited by background counts rather than frequency differences between emitted photons [39,40].
With our knowledge of the background and signal rates we can extract the degree of indistinguishability of the emission coming from our NV centers. We perform a Monte-Carlo simulation of our dataset using the independently determined parameters and apply Bayesian inference to find the most likely value of the indistinguishability, given our measured result (see Supplementary Info S-IX and S3 [34]).
In Fig. 4c we plot the visibility and the extracted photon indistinguishability for increasing detection time window lengths. While the visibility drops for longer windows consistent with the decreasing signal-to-background ratio, the indistinguishability of the NV photons remains high around 0.9. We note that this latter value is similar to values found for NV-NV two-photon quantum interference without frequency conversion [11], confirming that our conversion scheme including the frequency stabilization to a single target wavelength preserves the original photon indistinguishability, and enables solid-state entanglement generation via entanglement swapping.

V. CONCLUSION AND OUTLOOK
We have shown quantum interference of single photons emitted by spectrally distinct NV centers, by converting them to the same telecom wavelength. We have demonstrated an actively stabilized Quantum Frequency Conversion scheme using Difference Frequency Generation on fully independent nodes. The design and implementation allow for the scheme to be used at large distances. Furthermore, the techniques can be readily transferred to other quantum emitters in the visible regime with minimal adaptations to the conversion optics and control schemes used.
Future improvements to our system can increase the performance in multiple ways. First, adapting our optical design to prevent detector blinding by the stabilization light can lower the detector contribution to the background counts to the design level of 5 Hz. Second, a different approach to the QFC technique based on a bulk crystal may remove the (currently dominating) SPDC background noise due to poling irregularities. Third, the signal level of collected coherent photons from the NV centers could be improved significantly by use of an open microcavity [41,42]. In particular, achieving a fraction of coherent emission of 46% as reported in Ref. [41] would raise the signal-to-background ratio above 200. Finally, by extending the hardware we can stabilize the optical phase of the single photons emitted by the NV centers, enabling entanglement generation upon heralding of a single photon [23].
By combining the protocols demonstrated here with established spin-photon entangling operations and photon heralding at the midpoint [10,11], remote NV centers can be projected in an entangled state via telecom photons. Owing to the low propagation loss of these photons and extendable control scheme, our results pave the way for entanglement between solid-state qubits over deployed fiber at metropolitan scale.

ACKNOWLEDGMENTS
We would like to acknowledge Matthew Weaver and Matteo Pompili for fruitful discussion during the model development and data analysis, Ludo Visser and Pieter Botma for software support during the measurements. We thank Martin Eschen and Boudewijn van den Bosch for their contributions to building and maintaining parts of the setup, and Wouter Koek for simulations aiding the design. We thank Klaas Jan de Kraker, Sander Kossen, and Emanuele Uccelli for managing many tasks in the project/sample production. We thank Ruud Schmits and Jacob Dalle for their contributions to the sample production process development.
We acknowledge funding from the Dutch Research Council (NWO) through the project "QuTech Part II Applied-oriented research" (project number 601.QT.001), VICI grant (project no. 680-47-624) and the Zwaartekracht program Quantum Software Consortium (project no. 024.003.037/3368). We further acknowledge funds from the Dutch Ministry of Economic Affairs and Climate Policy (EZK), as part of the Quantum Delta NL programme, and Holland High Tech through the TKI HTSM (20.0052 PPS) funds. Author

S-I. OPTICAL SET-UP
Our experiments are performed using two nominally identical quantum network nodes. Each node houses a Nitrogen-Vacancy (NV) center in a high-purity type-IIa chemical-vapor-deposition diamond cut along the 111 crystal orientation (Element Six). Both samples have a natural abundance of carbon isotopes. Fabrication of solid immersion lenses and an anti-reflection coating on the diamond samples enhances the photon-collection efficiencies from the NV centers. The ground-state spin levels are split using a small permanent magnetic field aligned with the NV axis of ≈ 30 G.
Experimental equipment used for each node is summarized in Tab. S1. Node 1 and 2 are in the same laboratory, around 7 m apart. The optical fibers that connect Node 1 and 2 with the first midpoint 19" rack containing the filters are PM fibers of 3 m and 10 m long, respectively. After the filters, both are connected to the beamsplitter and subsequently the SNSPDs in a separate 19" rack in the room next door with 10 m long PM fibers.
Details of the optical lay-out of the free-space and in-fibre optics used are shown in figure 1B. For initialization in the correct charge and spin state, we use a combination of off-resonant (515 nm) and resonant (637 nm) excitation respectively. Both lasers are combined using in-fibre optics, and coupled into the free-space part using a beam sampler optimized for transmission. For single-photon generation, we use a second resonant laser tuned to a spin-preserving transition. Both resonant lasers are stabilized using a wave meter. The main part of the excitation light passes electroand acoustic-optical modulation (EOM and AOM) for fast switching, and is coupled into the free-space path using a central polarizing beam splitter (Thorlabs). Combined, this light is guided to the high NA microscope objective (Olympus 100x 0.9NA) using a dichroic mirror (DM) (Semrock), after which the polarization is set at an optimum for cross-polarization. The second part of the excitation laser is sent to a separate set of AOMs that provide an offset in frequency to form the stabilization light and coupled in on the opposite side of the central PBS. The main purpose of this light is to provide a fixed frequency reference of the single photons generated by the excitation light, and its purpose can be extended to stabilize the phase of the relevant phases for entanglement generation [23,43].
Emission from the NV-centre propagates backwards through the set-up, where the phonon side-band (PSB) is separated from the main path using the DM, filtered by additional long-pass filters (Semrock) and coupled into a multi-node fibre to be detected locally by an APD (LaserComponents). During the measurement, detection in this path is used for live monitoring and diagnostic purposes. The resonantly emitted photons in the zero-phonon line (ZPL) are guided to the central PBS, separated from the excitation light based on polarization (Thorlabs), and filtered using a band pass filter (Semrock). We use a deformable mirror and a set of motorized mirrors to couple in both the single photons and frequency control light simultaneously into the same PM SMF. Using these components, we can periodically optimize the coupling remotely, greatly improving the long-term performance and remote operation capabilities. By coupling into a fibre, we can decouple the alignment of the free-space optics from the QFC optics, simplifying the set-up and allowing for easy exchange of different QFC modules (full details of equipment list in Tab. S1 and S2.

S-II. OVERVIEW OF TIMING ELECTRONICS
Interfering photons emitted by two NV centres with high fidelity places strict requirements on the arrival time and thus the generation time of the single photons. Future long distances between our nodes prevent us from using a single waveform generator to meet these requirements, and a more scalable approach is needed, which we have employed for the current experiment. The lay-out is shown in figure S10 (full equipment in S3), showing how a single GPS disciplined clock is shared across the nodes via White-Rabbit enabled switches. The resulting synchronization pulses are then coherently split and distributed to the devices requiring timing synchronization. The microcontrollers EVENT cycles are externally triggered by a 1 MHz pulse. The AWG sequencers are referenced externally by the 10 MHz. Furthermore, the waveform generation inside the AWG is triggered using the 5 kHz heartbeat, after which the AWG sequencer can realize a sub nanosecond delay. Together, this allows for the long range synchronization of the experimental sequence with a jitter of 100 ps.
For future experiments, the heartbeat can be adjusted by changing the settings of the heartbeat generator. This would allow to run the frequency stabilization at a higher rate needed for the higher feedback bandwidth for phaselocking of two remote excitation lasers.

S-III. OVERVIEW EXPERIMENTAL SEQUENCE.
The experimental sequence is given in more detail in Fig. S1. The CR-check procedure contains a bright off-resonant charge reset pulse to probabilistically re-ionize the NV centre into the negative charge state. We then verify that the lasers are resonant with the right transitions by monitoring the fluorescence during excitation with both the spin reset and excitation laser. The counts during this interval are compared to a threshold determined before the experiment. If past this threshold, we assume to be on resonance and in the right charge state. This procedure is repeated until success, and has a typical passing rate of 5−10%, taking on average 1.5 ms [36]. The amount of experimental sequences per CR-checking round is a balance between ionization to the N V 0 state during the sequence and data rate. At the start of each CR-check we find the NV centres to be ionized in about 10% of the cases per node.
The communication between the nodes signalling the ready state is done by exchanging a digital trigger between the micro-controllers in charge of the CR-check process. By using the predetermined travel time of the communication, each micro-controller can calculate the next available starting time upon receiving the digital trigger of the opposite node. Using this information it triggers the AWG in advance of that heartbeat, reliably triggering the AWGs that switch to playing the waveforms that generate the optical pulses needed for photon generation. During the live analysis of the data, the experiment markers are used to signal the generation of single photons. These markers are in sync with the heartbeat present on the midpoint, and allow for the faithful recovery of the single photon events. Future improvements can be made by actively heralding the arrival of photons in the midpoint, conditioned on receiving the experiment signal.

S-IV. DATA COLLECTION AND PROCESSING
The data taken shown in Fig. 3 and Fig. 4 was taken over the course of 17 days. To allow for long-duration remote experiments, we operate the set-ups with minimal manual in-situ intervention, and employ a multitude of automatic calibrations. We determined the amount of data to collect by using previous measurements of coincidence rates and targeting an approximate statistical errorbar of 3% on the measured visibility.
The data was collected in batches of ≈ 24 hours, during which the set-up was operating fully autonomously. Remote monitoring was made possible by live viewing in a Grafana dashboard showing performance and environmental data pushed to a database. Live warnings of parameters that went outside pre-defined ranges provided 24/7 warnings, and the possibility to manually intervene remotely. After each block of 24 hours, manual routine inspections were done to check the sample and set-up status. As an example of the benefits of this system, the interference data was collected over a period of 17 days, of which 78% of the wallclock time under autonomous operation of the complete system, during which no human operator actively controlled the experiment.

A. Data processing
We employ various data processing steps during the experiment. First, all the raw data generated by the Adwin counting modules describing the result of the CR-checking process on the nodes is written to disk every ≈ 2 of minutes, and a next block of measurements is started. Simultaneously the stream of time tags of all synchronizations (heartbeat and experiment marker), and all SNSPD events outside of the blinding window, are tagged using 80 ps bins, and saved to disk on dedicated node PCs every couple of seconds, totalling to ≈ 1.2Tb of uncompressed data.
Because performing the analysis on large amounts of raw data is challenging, we employ live processing that generates significantly smaller files. This processing stores only the photon events that might be of interest for further analysis, by looking at the time bins where converted single photons from the nodes are expected to arrive in the midpoint (see Fig. S1). The data stored when an event occurs is given in Tab. S4, which can be extracted from the combined stream of timetagged events from all 3 timetaggers. One experiment marker can have more than one event (e.g. coincidences). The size of this dataset is less than 0.5% of the raw data generated, making it much more manageable.
While analyzing the coincidence data we noticed that some blocks contain many events that happen at time intervals much shorter than the dead time of our detectors (20 ns). We suspect that these events are due to (yet unexplained) resonances in the biasing electronics of the SNSPDs. To filter out these events we ignore blocks of measurements in which more than 2 detection events have happened in a single 160 ns window for both the SNSPDs. We can give an upper bound on how many events we would expect of this kind based on our signal rate by taking the maximum singles detection rate in Fig. 3b (3000 Hz) for the full duration of the 160 ns. Then, the probability of having more than 2 counts in both nanowires simultaneously is given by P (C > 2) 2 with P (C > 2) = 1 − P (C = 0) − P (C = 1) − P (C = 2) (S1) the chance of having more than 2 counts in one nanowire. For the amount of repetitions of our experiment (282226000000 ≈ 2.8 * 10 11 ), we thus conservatively expect 0.015 of these occurrences. When applying this filter we remove ≈ 36 blocks which equates to 0.05% of the total data.

S-V. AUTOMATED CALIBRATIONS
Keeping multiple nodes at their pre-calibrated performance level in parallel requires an automated calibration framework, in which several setup-specific calibrations can be performed both in parallel and successively, where the order of calibrations is conditional on setup specific inter-dependencies. After a constant amount of optical excitations, each node's performance is evaluated against several parameters that are indicative for the NV photon emission, conversion and collection rate per node, listed below in Sec. S-V A. Every parameter has a specific threshold chosen to reflect the manually calibrated performance level at the time of initialization of each dataset taken per day. A visual representation of the automated calibration framework is shown in Fig. S6. Violation of the pre-calibrated parameter thresholds triggers adding specific calibrations to the 'calibration queue', where the calibrations are ordered by setup specific inter-dependencies and subsequently executed, parallelizing calibrations on nodes as much as possible.
Every individual calibration that will block the optical path output of the QFC temporarily freezes the frequency lock's feedback loop for the duration of the calibration, schematically shown in Fig. S7. We require this when calibrating the QFC efficiency with path-blocking flip-in power meters, as well as using a diagnostic flip-in mirror to measure NV excitation resonant laser light and photons coupling in to the QFC. The lock feedback loop is resumed after every such a calibration to ensure the locking feedback scheme can keep up with longer term drifts of the locking beat frequency.

A. Evaluated performance parameters
• Count-per-shot of PSB photons (CPS PSB ): The amount of photon detections in the PSB per excitation attempt contains information on how well we address the relevant transition with the optical excitation pulse. A reduction in CPS PSB can thus trigger all available node-related calibrations, including re-calibration of the power of the excitation pulse, but only of the specific node on which this is detected.
• Average CR-check passing rate: Indicative of the stability of optical addressability of the NV center's transitions.
• Fixed amount of optical excitations: Recalibration of each node is performed regardless of performance thresholds after a fixed amount of attempts, both for logging purposes of small drifts as well as certainty about bringing the setup to a well calibrated known state, mitigating any drifts we could not observe while the experiment is live.
• Count-per-shot of telecom photons from NV excitation (CPS tel ): The amount of single telecom photons incident on the SNSPD detectors per excitation attempt show a performance of the entire node: the entire optical path the single photons traverse until detection at the SNSPD detectors. This also includes the intrinsic performance of the QFC, which is calibrated separately.
These measurements also enable us to collect the data crucial for the further analysis of the two-photon quantum interference: the probability of detecting a photon from Node 1 or 2 in detector A and B and the rate of background counts detected in detector A and B. An overview of the average value and standard deviation per measurement dataset is shown in Fig. S2. How these values are used for further calculations is explained in the next section.

S-VI. MODEL OF COINCIDENCE PROBABILITIES
In this section we will give details on the model used to calculate the expected coincidence probabilities assuming both completely distinguishable photons and (partially) indistinguishable photons with an indistinguishability η. We collect coincidences in 19 distinct difference bins (-9 ... 9) which are generated as follows: Once both nodes have indicated their ready-state we perform 10 pulsed excitations. This results in 10 consecutive time windows in which we can detect single photons emitted from the NV centers. For each detected photon we are looking for photon coincidences (i.e. events for which also the other detector clicked). We consider coincidence events where both photons were detected in the same time window (these coincidences are shown in the zero-difference bin) as well as photon detections in different time windows. Depending on the difference in time window number we assign these coincidences to one of the 19 possible bin number differences, numbered from −9 to 9, as shown in Fig. 4a of the main text. Due to the finite number of time windows, the probability of detecting a coincidence decreases for higher bin number differences. The probability of detecting a coincidence for any given bin number difference is thus given by P det = sP coinc with the scaling factor s = 10 − |bin| where bin is the bin number difference. The probability of detecting a coincidence is: where P N V N V is the probability of a coincidence between two photons emitted by an NV center, P N V DC is the probability of a coincidence between an NV emitted photon and a dark count (or other noise or background count) and P DCDC is the probability of a coincidence between two dark counts. For the non-zero bin number differences these probabilities are given by where p 1(2)A(B) is the probability of a photon emitted by node 1 (2) to be detected in detector A (B) and p DC A (B) is the probability of a dark count in detector A (B). All these parameters are measured during the experiment by having excitation rounds from only a single node interleaved with the normal excitation rounds. As the single emitter nature of our NV center does not permit two photons from the same NV within one time window (up to a small probability of double excitation which we neglect in this model), in the zero bin number difference P N V N V reduces to the following expression: while P N V DC and P DCDC remain the same. Here η is the indistinguishability, which represents the reduction of the probability of getting an coincidence from NV contributions. This single parameter in our model takes into account the possible sources of distinguishibility such as spectral/temporal offset and polarization differences.

S-VII. CALCULATING THE VISIBILITY WITH UNBALANCED EMITTERS
We define Visibility using the ratio between coincidence counts in the case that photons are fully distinguishable C dist and coincidence counts we measured, C M .
We can directly extract C M from our experiment, as it corresponds to the measured coincidences in the zero-difference bin. In our model, it is given by filling in Eq. S6 and S3 into Eq. S2: However, we do not have such direct access to C dist . To extract C dist from measured parameters we fit the number of coincidences in the non-zero bins using a linear fit (as shown in Figure 4 of the main text), and use this to extrapolate the value in the zero-bin, which we will call C E . This extrapolation will overestimate the coincidences we would get in the zero bin number difference for perfectly distinguishable photons as the non zero bin number differences also include coincidences of photons emitted from the same NV, a case that does not occur in the zero bin number difference. To determine the necessary correction we compare the probability of a coincidence in the zero bin number difference for η = 0 with P E , the probability for an extrapolated zero-bin using the fit of the non-zero bins Resulting in the following correction for the C E , where we use C E = P E N attempt Here we have used the number of experimental attempts N attempt (i.e. the number of excitation pulses sent to the NV center over the course of the entire experiment). We note that in case the used beamsplitter is strongly imbalanced this would also affect the number of coincidences in the zero bin number difference. In particular, our determined indistinguishability would be scaled by an additional correction factor to η(1 − (R − T ) 2 (R 2 + T 2 ) ) with the beam splitter reflectivity R and transmitivity T . Using the specified values of our beam splitter R = 0.496 and T = 0.504, we find that this effect is on the order of η(1 − 10 −4 ) and we therefore neglect it in our model.

S-VIII. MODEL OF TEMPORAL SHAPE OF DISTINGUISHABLE COINCIDENCES
The analysis of the measured coincidences with respect to their arrival time difference holds a vast amount of information of the (relative) emitter properties. In our case, it is difficult to extract detailed information about the NV centres from the measured coincidences in the zero-difference bin as the vast majority of these coincidences involves background counts. We can, however, model the different temporal shape of the three contributions as mentioned in S2 in the non-zero difference bins. Analogous to Kambs [40] we start by writing down the photon wave-packets of the individual events. The single photon wave function an NV-centre from spontaneous emission (ignoring any spectral and phase information) is given by where τ =12.5 ns the lifetime of the excited state. We can define W (t) ≡ H(t − T start )(1 − H(t − T end ) to be the selected window with start(end) time T start (T end ) both larger than 0. H(t) is the Heaviside step function. Here we have chosen t = 0 to be the moment the infinitely short optical excitation would have arrived in the detector as the start of the exponentially decaying wave-packet. We now assume that both NV centres have the same excited state lifetime, and the beamsplitter ratio to be perfect. The joint detection probability is given by: A background photon at the beamsplitter is modeled as a process that has a uniform probability density in time: Here we assume that the contribution of background noise is constant in time, which is a good approximation for the darkcounts of the detector and SPDC noise coming from the QFC process. For the excitation pulse leakage this approximation does not hold, but we assume this error to be small if we limit the amount of pulse light in our analysis window. For completely distinguishable events the interference term in the joint detection probability drops out, and it simplifies to: P dist joint (t, ∆t) = 1 4 (P i (t + ∆t)P j (t) + P j (t + ∆t)P i (t)) (S15) with P i (t) = |φ i (t)| 2 . To obtain the second order cross-correlation function we integrate over the detection times of the single photons. The NV-NV contribution is therefore given by: We can solve this integral by breaking it up for positive and negative ∆t, and absorbing the Heavisides accordingly in the integration limits. For ∆t > 0 we get: For ∆t < 0 we get: which, after combining both, arrive at the expression for ∆t ∈ [−(T end − T start ), T end − T start ]: As a sanity check, we can integrate G N V N V (∆t) over the interval [−(T end − T start ), T end − T start ], to get the overall probability to get a coincidence in our window: which, by separately integrating the for negative and positive ∆t results in This is what we would expect for two completely distinguishable photons. Following the same approach, we can calculate the integral for the NV-DC as For the DC-DC contribution we get that both integrate to 1 2 over the coincidence window. Figure S9 shows the shapes of these cross-correlation functions for the different contributions of coincidences in the experiment. The difference between NV-DC and DC-DC contributions is minimal, but the NV-NV shape is clearly distinguishable from the noise sources. The plot in Fig. 4b in the main text is a weighted sum of the three shapes. The weights of for the different contributions are calculated using Eq. S3-S5 and the data shown in Fig. S2. For the numerical values of input parameters used see Tab. S6 For the indistinguishable case, we use the same temporal shape as the distinguishable, and scale the amplitude with 1 − η. This is a valid approximation where polarization difference are the dominant contribution with respect to any temporal or spectral effects.

S-IX. MONTE-CARLO BASED DERIVATION OF INDISTINGUISHABILITY USING BAYESIAN INFERENCE
As a next step we use our model to derive an expression for the indistinguishablity η. Due to the stochastic nature of the coincidences observed in our measurement, the calculation of the indistinguishability using standard error propagation like the variance formula can lead to non-physical results for the indistinguishability being in its confidence interval. We prevent this by using Bayesian Inference to find a more accurate confidence interval. In words, we want to know the most likely indistinguishability η opt , given the set of measured coincidences C M and C E . Equations S8 and S10 and the number of attempts N attempts give us a direct way of calculating C M and C E , based on p 1(2)A(B) , p DC A(B) (denotedθ in short) and η as described in S-VI. The strategy is then to simulate many realizations of our experiment with η i the only free parameter, via a Monte-Carlo simulation. We then use the outcomes of these simulations to calculate the likelihood of a certain η, given our measured C M and C E via Bayes rule: where P (a|b) is the probability of a given b,M is the observed measurement outcomes C M and C E , andθ the vector containing the parameters that fully describe our experiment. P (η,θ) is the probability of the set of parameters before our measurement, the so called prior. The only unknown parameter is η, for which we take a uniform distribution on the interval [0, 1], to reflect the assumption of no prior knowledge. All the other parameters θ j inθ are assumed to be normally distributed, with the mean and variance determined by the independent samples during the measurement, see previous section S-IV and S2. Draw N times input parameters θn from N (µj, σ 2 j )

4:
Calculate N values for P n E and P n M according to eq. S8 and S10 Calculate fraction , which is equal to P (M |η,θ)

12:
Calculate likelyhood of ηi using eq. S23 and store the value. 13: end for The algorithm to calculate the posterior distribution of η for a given set ofθ andM is given in algorithm 1. It captures both the uncertainty in the input parametersθ, as well as the statistical fluctuations introduced by the stochastic nature of measuring a rate of coincidences. In our case, the uniform chosen prior means that the normalized likelyhood function produced by the algorithm is directly the probability density function (pdf) for the left hand side of Eq. S26. The pdf's for the calculated indistinguishability of the data shown in Fig. 4a and b are shown in Fig. S3. Here we can clearly see the asymmetry of the calculated distributions, and their cut-off at the maximum of 1. To report this likelyhood as a single value with 'errorbars', we chose the maximum of the likelyhood to be our datapoint, and calculate a symmetric 68% confidence interval around the most likely η. For the datapoints where the most likely indistinguishibility is closest to 1, a symmetric confidence interval, with 34% of the probability on either side can not be taken, and asymmetric intervals are used.

S-X. SNSPD BLINDING INDUCED BACKGROUND
The current lay-out of the in-fibre optics guides the reflected part of the reference light through the same port of the circulator as the single-photons. Therefore the power is aimed directly at the nanowires of the SNSPDs, blinding the dectectors. Additionally, because of the beamsplitter in front of the detectors, the reference light reflected from both UNFs interferes, making the output power oscillate in time. To prevent latching, the manufacturer placed anti-latching shunt resistors in the circuits to prevent the loss of photon counting over timescales of >milliseconds.
However, during the early investigation of the time multiplexing of frequency stabilization with single photon generation, we noticed a blinding-power dependent elevated darkcount rate, persisting long (>10 µs after the end of the blinding pulse. By scanning the incident power on one of the nanowires in a controlled manner (Fig. S4, we can see a clear power dependence of this effect. We reduce the additional background counts to a constant manageable level of 30 Hz by moving our single photon generation by ≈45 µs, and optimizing the reference light power in-situ. Further improvement to the in-fibre optics that reduce the backpropagated power through the FBG and active attenuation shielding the detectors can be employed in the future to remove this background contribution.

S-XI. UNF STABILITY
The UNFs are enclosed in an well-isolated box, with a heating pad and thermometer, temperature controlled by a Team Wavelength TC5. This controller provides ∼0.1 mK control of the temperature setpoint. Due to non-homogeneous distribution of temperature inside the box and fluctuating temperature gradients coming from outside the box, with a fixed temperature setpoint we observe significant drift of the UNF frequency if not actively stabilized to a target frequency.
The UNFs are frequency stabilized by measuring the power incident on the detector (Thorlabs PDB482C-AC) that measures the frequency lock beat, extracted as a voltage from a monitoring port of the detector. We stabilize the temperature controller setpoint of the UNF to the relative half-of-maximum transmission point. We calibrate the maximally measured power through each individual UNF by temperature-sweeping them, effectively changing the position of the filters in frequency space. In this calibration, there is no correction for the noise incident on or coming from the detector, . After calibration, we feedback the measured transmission power back to the temperature setpoint of the temperature controller of the UNF with a software implemented PI-control loop.

A. Transmission stability
Due to inherent inaccuracies in the active stabilization described above, a spread of transmission powers of the UNFs with respect to their setpoint at the relative half-of-maximum transmission point is observed. This occurs due to the limitation in accuracy of the actual temperature control at the filter and other drifts of fixed in-fiber components in the path that passes through the UNF. Such a drift in frequency of the UNF maximum transmission point results in a reduction of NV photon transmission probability, however the NV conversion target frequency is unaffected. To determine the exact UNF frequency shift with respect to the setpoint, we start with the Cauchy-distribution fit in Fig. 2d of the transmission power T of the reference laser through the filter with respect to frequency f : where f 0 is the location of the filter, 2γ the filter FWHM and I and B a relative scaling factor and offset to background, respectively. We can relate the offset from the half-of-maximum transmission point to a frequency shift on this characteristic shape of the filter by rewriting the above function to be a function of transmission obtaining f by using the filter-specific fit parameters obtained from Eq. S27. The root-minus solution is chosen according to the boundary condition that appropriately reflects the direction of change in frequency for a change in transmission power. With Eq. S28 we can convert a representative stabilized 70 h set of second-interval transmission power data to a relative shift in frequency of the filter with respect to the the half-of-maximum transmission point, in the assumption that all transmission power drift is due to the shift in UNF frequency. The transmission power is not corrected for any other drifts of in-fiber components between the power-stabilized reference laser emission and detection at the beat detector. The residual frequency instability for both filters is shown in Fig. S8.

B. NV photon transmission
From this frequency instability we can calculate the change in average transmission probability of converted photons emitted from the NV center through the UNFs. The emitted NV photons with natural linewidth of 2γ NV ≈12.7 MHz [26] have a frequency distribution as the Cauchy distribution where f N V is the frequency of our target wavelength (see Fig. 2). We can treat the UNFs as frequency dependent transmission devices, i.e. the Cauchy fit parameters used for Fig. 2D can be used to construct the probability of transmission T p (f ) of an incident photon by setting I = 1, B = 0 in Eq. S27, resulting in T (f = f 0 ) = 1. The probability of the NV photon transmitting through the UNF filter is then Using the above equation we can calculate the relative change in transmission probability from each UNF's fitted parameters and all frequencies f 0 ±1 MHz, the interval where almost all of the occurrences of the measured filter shift are located. Using numerical methods to approximate the above integral we find that the total transmission of the NV photon is reduced to ≈ 75%(UNF1) and ≈ 77% (UNF2) with the filters at the half-of-maximum transmission setpoint, excluding any losses incurred in the physical implementation. For the UNF stability data shown in Fig. S8 we obtain a transmission change of at most ≈ FIG. S1: . Schematic showing pulses played during the experiment. The timescale is indicated above the pulses and sequence stages. The communication step (bottom left) is symmetric whether node1 or node2 succeeds first in passing the CR check. The heartbeat where the next stage is started is shown. Using the Marker signals, the corresponding coincidence windows in the SNSPDs in the midpoint can be calculated. Individual data sets are about 24 hours of measurement. The error-bar on the values is the standard error of the mean. For each datapoint,the parameter is measured during at approximately 2 minute intervals. The signal rate is calculated in the fluorescence window after the excitation as shown in figure 3. The average background rate is calculated using data in between the pulses, far away from the optical excitation and fluorescence. These parameters can be calculated for the different windowlengths using the same dataset. The rise of the background counts during the experiment is currently not understood.  FIG. S4: Rate of measured events after a bright blinding pulse in the superconducting nanowire single photon detectors. After the bright pulse we measure an elevated background level as compared to no blinding pulse applied. This effect persists for tens of microseconds, forcing us to delay the single photon generation by more than 45 µs with respect to the frequency stabilization measurement. The optimal powers used during the experiment is not shown, and was optimized in-situ. To ensure proper handoff of setup-equipment control, the stabilization light is always turned off before any calibration is started whilst freezing the feedback, if this calibration will block the optical path going through the QFC. Resuming of the lock is an exact reversal of this process, where the stabilization light has to be turned on before the lock's feedback loop is resumed, otherwise no locking beat will be generated.  S8: Histogram of the calculated filter shift of both UNFs with respect to their half-of-maximum transmission setpoint over a representative frequency-stabilized 70 h set of second-interval transmission power measurements, in the assumption that all transmission power drift is due to the shift in UNF frequency. From the NV photon and filter overlap we calculate to have at most a change of ≈ 1.5% in transmission power for a change in filter frequency of ±1 MHz, covering approximately 92% and 97% of the spread in filter shift, for UNF1 and UNF2, respectively.   FIG. S10: Schematic overview of hardware to synchronize events across the experiment. All connections that distribute the critical timing signals are propagated over telecom fibres on a dedicated fibre. The data that is generated by the timetaggers is sent over the network to a single PC that processes the data. Both the raw and processed data are stored to disk.