Statistically Modeling Optical Linewidths of Nitrogen Vacancy Centers in Post-Implanted Nanostructures

We investigate the effects of a novel approach to diamond nanofabrication and nitrogen vacancy (NV) center formation on the optical linewidth of the NV zero-phonon line (ZPL). In this post-implantation method, nitrogen is implanted after all fabrication processes have been completed. We examine three post-implanted samples, one implanted with $^{14}$N and two with $^{15}$N isotopes. We perform photoluminescence excitation (PLE) spectroscopy to assess optical linewidths and optically detected magnetic resonance (ODMR) measurements to isotopically classify the NV centers. From this, we find that NV centers formed from nitrogen naturally occuring in the diamond lattice are characterized by a linewidth distribution peaked at an optical linewidth nearly two orders of magnitude smaller than the distribution characterizing most of the NV centers formed from implanted nitrogen. Surprisingly, we also observe a number of $^{15}$NV centers with narrow ($<500\,\mathrm{MHz}$) linewidths, implying that implanted nitrogen can yield NV centers with narrow optical linewidths. We further use a Bayesian approach to statistically model the linewidth distributions, to accurately quantify the uncertainty of fit parameters in our model, and to predict future linewidths within a particular sample. Our model is designed to aid comparisons between samples and research groups, in order to determine the best methods of achieving narrow NV linewidths in structured samples.


I. INTRODUCTION
Excellent spectral properties and low spectral noise are a necessity for most quantum communications and entanglement protocols. Whether the goal is to entangle atoms in different cities [1], to relay quantum information across vast distances through a communications channel [2], to couple a qubit to a photonic cavity [3,4], or to study the interference between two qubits [5], some of the biggest successes of quantum technology rely on quantum sources that are spectrally stable [6]. The nitrogen vacancy (NV) center in diamond has been particularly successful in a variety of quantum information experiments, as the NV spin can be coupled to its optical degree of freedom [3,[5][6][7][8]. Yet a key challenge remains in creating NV centers with good spectral properties in nano-structured samples, even though many groups have studied diverse methods of creating NVs. These methods include implantation and annealing [9][10][11], laser writing of NV centers [12,13], and high-energy electron irradiation [14], with many studies focusing explicitly on the linewidth properties of the NV centers [15][16][17][18][19]. The transform limited optical linewidth of the NV center is ≈ 13 MHz, which sets the ultimate limit to how narrow the lines can be [11]. Depending on the application, broader optical linewidths can be tolerated: a 100 MHz linewidth is acceptable for a decent microcavity [4], and two-photon interference has been shown using an NV center with an inhomogeneous linewidth as broad as 480 MHz [5].
Here we study the distribution of optical linewidths of NV centers formed with implanted and native nitrogen in diamond nanostructures. We implant one of our samples with 15 N, which has a natural abundance of only 0.37% * patrick.maletinsky@unibas.ch [20], so that we can distinguish between implanted and native nitrogen by measuring the nitrogen isotope of the NV center. In line with the results of S. B. van Dam et al., we find that implanted nitrogen yields NV centers with generally broader linewidths than native nitrogen does [21]. We also find evidence that implanted nitrogen can yield NV centers with narrow linewidths. Additionally, we demonstrate the novel approach of post-implantation, in which all nano-structuring and fabrication procedures are completed before implanting the sample with nitrogen. We do this to reduce the effects of fabrication on the NV center properties, as it is unclear to what degree the fabrication procedures themselves influence the optical linewidth [4,14]. In studying post-implanted samples, we find a significant proportion of narrow linewidth NV centers, even in structured areas as thin as 1.57 µm. Finally, as determining what influences the NV center coherence properties remains an open question that is actively being explored, we develop a rigourous statistical model to help unify approaches within the community and to more easily compare results across research groups. We discuss our model in depth and show how we can use it to compare different data sets. To aid other researchers in implementing our model, we include a demo Matlab script, available as Supplementary Online Material.

A. Fabrication Processes
In the experimental part of this work, we first study two samples in detail (the third sample is discussed in Sec. V D). Both are made from electronic grade (N < 5 ppb, B < 1 ppb) diamond acquired from Element Six. Our fabrication procedure is summarized by P. Appel et arXiv:2005.03666v1 [cond-mat.mes-hall] 7 May 2020 al. [22]. In both samples, we fabricated a membrane of a nonuniform thickness spanning 2.5-5 µm, as well as cantilevers with variable dimension: lengths from 35-70 µm, widths of approximately 4.5 µm, and thickness of roughly 2.5-4 µm. An optical microscope image of Sample B is shown in Fig. 1(a), showing the cantilevers, membrane, and bulk parts of the sample.

B. Implantation Parameters
After all fabrication of the membrane and cantilevers was finished, Sample A was sent to the Helmholtz-Zentrum Dresden-Rossendorf to be implanted with nitrogen, and Sample B was sent to CuttingEdge Ions. Both samples were implanted with 12 keV nitrogen ions at an angle of 7 • relative to the sample mount and at a fluence of 10 11 ions/cm 2 . Whereas Sample A was implanted with 14 N, Sample B was implanted with 15 N, so that the NVs could be isotopically classified. After the samples were implanted, they were annealed with a procedure outlined in P. Appel et al. [22], consisting of 4 hours at 400 C • , 10 hours at 800 C • , and 2 hours at 1200 C • . Finally, the samples were cleaned with a tri-acid clean [22,23].

III. MEASUREMENT METHODOLOGY
We begin by taking a confocal fluorescence map in the target area of the sample (see Fig. 1(b)). We then characterize each potential NV center by taking a photoluminescence spectrum under green (532 nm) laser excitation. Once the zero-phonon line (ZPL) has been identified on the spectrometer, we perform a photoluminescence excitation (PLE) measurement on the NV center by sweeping the wavelength of a red (637 nm) laser across the transition while recording the fluorescence counts on an avalanche photodiode (APD), yielding a measurement of the excited state transition linewidth. We did not use the same intensity of red laser power for each NV center, as the broader linewidths were often too weak to measure at low laser power. Red laser powers for NV centers with narrow linewidths ranged from 10−200 nW, whereas broad linewidths were typically measured with 2 µW of excitation power. Optical linewidths are extracted from the FWHM of a Gaussian fit to the PLE data. Because we include a repump pulse in every iteration of the pulse sequence, our linewidths are broadened by spectral diffusion, making a Gaussian fit suitable [10,21]. The pulse sequence we use is shown above Fig. 1(c), and representative measurements for 14 NV and 15 NV centers are shown in Figs. 1(c)&(e). In the case in which both E x and E y lines were visible, we used only the narrower linewidth in the dataset, as the goal of this analysis is to analyze the narrowest linewidth measurable on each individual NV center.
After recording the optical linewidth, we use pulsed optical detection of magnetic resonance (ODMR) to measure the hyperfine structure of the NV center ground   state, thereby identifying the isotope of the NV nitrogen. The pulse sequence is shown above Fig. 1(d), and typical hyperfine-resolved ODMR measurements are shown in Figs. 1(d)&(f) for 14 N and 15 N, respectively. The locations and widths of the ODMR dips are extracted from Gaussian fits to the data. We attempted to measure the ZPL wavelengths, optical linewidths, and hyperfineresolved ODMR of a total of 159 NV centers in Sample A and 104 NV centers in Sample B. We note, however, that some NV centers did not exhibit any PLE, and others failed to show hyperfine-resolved ODMR. In total, we successfully measured PLE on 78 NV centers in Sample A and 61 NV centers in Sample B. Similarly, we were able to isotopically classify 47 NV centers on Sample B (iso- topic classifcation was not performed on Sample A, as it was implanted with 14 N).

A. Influence of NV Center Location on Linewidth
We summarize the data for Sample A in Fig. 2(a), which plots the measured optical linewidths against the measured ZPL wavelength. The data points are colorcoded to indicate which part of the sample they were taken on. The bulk part of the sample is approximately 50 µm thick, whereas the membrane and cantilever dimensions are discussed earlier. According to Wilcoxon ranked sum tests [24], the linewidths of NV centers found in the membrane likely follow the same statistics as those in the cantilevers. We therefore combine the cantilever and membrane measurements into a single category: structured. The ZPL wavelengths for Sample A are tightly clustered (spanning a spectral range of only 0.2 nm), and the sample exhibits no clear relationship between ZPL wavelength and optical linewidth. Binning the linewidths and color-coding them according to the sample location (see Fig. 2(b)) reveals that there are two distinct populations of NV centers: those with narrow linewidths, and those with broad linewidths. Both types of linewidths can be found anywhere on the sample. Fig. 2(c) shows the empirical cumulative distribution functions (ECDFs) for the structured, bulk, and total datasets for Sample A, showing that there is no apparent difference between the three distributions. A Wilcoxon ranked sum test (p-value 0.551) reveals that there is no statistically significant evidence that the structured and bulk linewidth distributions are different. Fig. 2(c) also shows that the median measured linewidth was approximately 200 MHz.
In Figs. 2(d)-(f), we show similar plots for Sample B. In Fig. 2(d) we see that although the ZPL wavelengths are far more scattered in Sample B (spanning a range of 2 nm) than in Sample A, there is still no clear relationship between ZPL wavelength and optical linewidth, indicating that local strain does not play a strong role in determining the linewidth. Plotting the data in a histogram labelled by location of the sample in Fig. 2(e) shows a similar situation as in Fig. 2(b): there are two distinct populations of NV centers, independent of the location on the sample. Likewise, Fig. 2(f) leads to similar conclusions as Fig. 2(c). Again, a Wilcoxon rank sum test (p-value 0.334) indicates that there is no clear evidence for a difference between the linewidth distributions in the bulk and structured areas in Sample B. As we see similar results in two different samples, and across different regions on those samples, we turn to isotopic classification to better understand these two populations.

B. Influence of NV Center Isotope on Linewidth
In Fig. 3(a) we bin the data and color-code the bins by isotope classification. We find that although many NV centers could not be clearly classified, a clear pattern emerges: NV centers formed with native 14 N exhibit narrow (< 1 GHz) linewidths, whereas most of the 15 NV centers showed broad (> 1 GHz) linewidths, in agreement with the results of S. B. van Dam [21]. Indeed, the median 14 NV linewidth in Sample B was roughly 100 MHz. We fit log-normal sampling distributions to the 14 NV and 15 NV data (dashed lines). In Fig. 3(b) we plot the ECDFs for the isotopically classified datasets, as well as the cumulative distribution functions (CDFs) for the log-normal fits. The CDFs show exceptional agreement with the ECDFs (see Sec. V C for model diagnostics). The fit curve for the unclassified data set comprises a weighted sum of two log-normal distributions in which the weights are also fit parameters. This fit suggests that all the unclassified data can be attributed to one of the two distributions.
Finally, we note that six of the 15 NV center linewidths were well separated from those of the other 15 NV centers. Based on an analysis of Q-Q plots (see the discussion of Q-Q plots in Sec. V C), we exclude these NV centers from the fits in Fig. 3(a), as they clearly do not belong to the same population; in Fig. 3(b) we include the ECDF of these data points but do not fit them. Due to the low natural abundance of 15 N, it is highly unlikely that the narrow 15 NV centers are due to naturally occurring 15 N. To wit: in a sample size of 61 PLE lines, there is a mere 1.2 × 10 −5 % chance of observing 6 or more naturally occurring 15 NV centers, i.e. P (m ≥ 6|n = 61, p = 0.0037) ≈ 1.2 × 10 −7 , calculated from the CDF of the binomial distribution with 61 trials and a success rate of 0.37%. Previous studies have reported that implanted nitrogen can lead to crystal damage that degrades the optical properties of NV centers, and that this damage can be at least partially repaired through annealing [21], but it is unclear whether the annealing is the reason we were able to observe narrow linewidths from NV centers formed by implanted nitrogen.

A. Building the Model
We now develop a model to describe the two distinct populations we see, as it could be useful to determine how different the populations are. A model could help to decide how we should classify future or unclassified data points, and to predict how narrow future linewidths in the same sample will be. Additionally, having a model will allow us to more quantitatively determine which fabrication procedures yield NV centers with better optical linewidths and quantify how certain we are a new procedure is better. Using a Bayesian approach, we model the likelihood of a particular linewidth x i with a log-normal likelihood: which is parameterized by a median µ and a standard deviation σ. This is an appropriate distribution for any purely positive quantity that has contributions from multiple independent noise sources [25] (here, e.g., electric field noise, temperature, and strain fluctuations can all influence the optical linewidth [15,16]). Using uninformative priors for the parameters, (uniform distribution for µ and the Jeffreys prior for σ [25]), we find their posterior distributions [24,25]. See Appendix A for details. Broadly speaking, the posterior distributions describe our best guess for the parameters, as well as our confidence in those guesses, given the data we have and the model we use. For ease of notation, we define two constants that depend on the data: (ln (x i )) , where x i is the ith linewidth in the dataset (or data subset, if focusing on a particular isotope, for example) and N is the total number of linewidths in the dataset (or subset). For µ, we find that the posterior distribution P (µ|{x i }) (where {x i } is the dataset of linewidths being analyzed) is given by a location-scale t-distribution: where µ µ = X, ν µ = N −1, and σ µ = For the variance σ 2 , the posterior distribution P (σ 2 |{x i }) is an inverse gamma distribution: where α σ = N −1 2 and β σ = N 2 X 2 − X 2 .
We next consider what distribution of future linewidthsx we expect to measure, given the data we have observed so far. Working in terms of the natural logarithm of the linewidthX ≡ ln(x), we also calculate the posterior predictive distribution P (X|{x i }), which describes how likely the next linewidth is to be narrow. We find that P (X|{x i }) is a location-scale t-distribution: Note that this is a location-scale t-distribution for the natural logarithm of the linewidthX, not for the linewidthx itself. The posterior predictive distribution for the linewidthx is given by which is not quite a t-distribution. For a more detailed discussion of the derivations of these distributions and their interrelations, see Appendix A.

B. Inferences from the Model
Because these distributions are of a common form, it is straightforward to find their most likely values and their credible intervals. For example, the maximum a posterior (MAP) estimate (i.e. the most likely value) for the t-distribution P (µ|{x i }) is given by µ MAP = µ µ = X , and the 95% credible interval is given by where t (f,ν) is the t-statistic at the f th percentile and with ν degrees of freedom [24,25]. Note, however, that the parameter µ in the log-normal distribution has units respectively. The MAP estimate of σ 2 is given by βσ /(ασ + 1) [24]. Unfortunately, there is no closed-form solution for the 95% CI of the inverse gamma distribution, but it can be easily estimated through simulated draws, which we describe below [24]. We graphically represent our results in Fig. 4. The dashed lines in Fig. 4(a) are the log-normal fits from Fig. 3(a). As in Fig. 3, the color of the line indicates the isotope. The solid lines are the posterior predictive distributions P (x|{x i }), and the dotted lines are the posterior distributions P (µ|{x i }). The posterior predictive distributions P (x|{x i }) resemble the sampling distributions P ({x i }|µ, σ) but are slightly broader, as they account for the uncertainty in our estimates of µ and σ. The posterior for µ N14 given by P (µ N14 |{x i } N 14 ) is fairly narrow, indicating that only a narrow range of values of µ N14 is consistent with the 14 NV data. Similar conclusions hold for the 15 NV data.
Finally, we simulate draws from the distributions, which allows us to compare the 14 NV and 15 NV results and give approximate answers to questions such as what is the probability that the next 14 NV linewidth is narrower than the next 15 NV linewidth P (x N14 <x N15 |{x i }) or how likely is the next 14 NV linewidth to be below 100 MHz P (x N14 < 100 MHz|{x i }). For example, using our data and 10 8 simulated draws from each of the posterior distributions, we find that P (µ N14 < µ N15 |{x i }) ≈ 1. Similarly, we estimate that we have a roughly 40% chance of finding sub-100 MHz 14 NV centers: P (x N14 < 100 MHz|{x i }) ≈ 0.398. For details of the simulated draws, see Appendix B.

C. Model Checking
To check how appropriate our model is for our data, we look at the quantile-quantile (or Q-Q) plots for the different data subsets. By comparing the data quantiles to the expected quantiles from the model, Q-Q plots show whether the spread in the data can be explained by the model and are therefore a useful diagnostic for determining whether a model is appropriate for the data. They can also be useful for identifying outliers in the dataset. The quantile for the ith optical linewidth in the dataset is calculated according to the formula and therefore summarizes how many standard deviations the data point is from the mean of the dataset [26]. Using a log-normal model to calculate the theoretical quantiles, we plot the Q-Q plots for the 14 NV data, the broad 15 NV data, the narrow 15 NV data, and the unclassified data in Fig. 5. Both the 14 NV data in Fig. 5(a) and the broad 15 NV data in Fig. 5(b) closely follow the diagonal dashed line, indicating the quantiles of the measured data match the quantiles we would expect from a log-normal distribution in both cases. Due to the dearth of data points, it is difficult to say how appropriate a log-normal model is for the narrow 15 NV data in Fig. 5(c), but our data do show that a log-normal model is promising. From  Fig. 5(d), it is clear that a single log-normal model is inappropriate for the unclassified data, as expected.

D. Example with Sample C
As an application of our statistical model, we now examine a third structured sample, Sample C, which was post-implanted by InnovIon with 52 keV 15 N ions at an angle of 7 • and a fluence of 5 × 10 9 ions/cm 2 . In Sample C, we compare two structured parts of the sample: one area that is 1.57 µm thick, and one that is 0.87 µm thick. First, we note that we were able to observe two narrow (< 250 MHz) linewidths in the 1.57 µm-thick area (see Fig. 6(a)). To our knowledge, these are the narrowest NV ZPL lines reported in such thin structures obtained by standard etching techniques. A recent report, however, suggests that ultra-slow etching can significantly improve surface quality and lead to a further reduction of charge noise, which is at the origin of the inhomogenous broadening [27]. We note that the distributions of the data from the two sample areas strongly overlap (see Fig. 6(a)). In Fig. 6(b), we show log-normal fits to the data and the posterior distributions for µ. We find that the two data subsets have similar MAP estimates for the medians: µ MAP 0.87 µm ≈ ln(2.27 GHz) and µ MAP 0.87 µm ≈ ln(1.47 GHz). Although the data and fits overlap and the estimates for µ are similar for the two data subsets, the posterior distributions for the medians µ barely overlap. Using the data from Sample C and simulated draws from the posterior distributions for µ, we find that P (µ 1.57 µm < µ 0.87 µm ) ≈ 0.996, strongly suggesting that the two areas have different median linewidths. For the purposes of estimating the two medians, we exclude the two narrowest linewidths in the 1.57 µm area of the sample and the broadest linewidth in the 0.87 µm area of the sample, as Q-Q plots (not shown) reveal these data points to be outliers. Although our data suggest that the thinner part of the sample has a larger median linewidth, it is unclear whether this change is due the thickness itself or due to confounding variables. We hope our statistical model will aid in determining which variables influence the spectral properties of NV centers in other nano-structured samples.

VI. CONCLUSION
We have shown that NV centers in post-implanted samples exhibit narrow linewidths, even in structured samples as thin as 1.57 µm, and that the narrow lines are primarily due to NV centers being formed from nitrogen native to the diamond. Even so, we observe a few narrow linewidths that can be attributed to implanted nitrogen. Furthermore, we develop a statistical model to aid in summarizing our results and to enable easy comparison of results between research groups. Indeed, we employ our model to show that in one of our samples, the sample thickness is linked to changes in the linewidth distribution. Our results suggest that postimplantation is a useful method for reducing or avoiding fabrication-induced damage. To further investigate the benefits of post-implantation, we propose testing different fabrication steps on a post-implanted structured sample, to study if and how various common fabrication techniques degrade the NV properties. If post-implantation does consistently improve NV coherence properties, it is worthwhile to study the effects of carbon implantation, as implanted nitrogen rarely leads to narrow linewidths. Finally, the model itself can be developed further, by implementing a hierarchical model (to allow, e.g., isotope abundance to vary across the sample, or to allow the parameters µ and σ to vary with sample location) and by including a model for data sampling and missing data. We find the joint posterior distribution for the parameters µ and σ by using Bayes law: where P (µ, σ|{x i }) is the joint posterior for µ and σ, P ({x i }|µ, σ) is the sampling distribution or likelihood, P (µ, σ) is the joint prior distribution for µ and σ, and P ({x i }) acts as a normalizing constant. The most important component is the sampling distribution P ({x i }|µ, σ), as this acts as our model for the data. As mentioned in the main text, we use a log-normal model for the dataset, such that For the prior distribution P (µ, σ) we choose uninformative priors. Typical uninformative priors for µ and σ are the uniform and Jeffreys priors, respectively [25], but as they are improper distributions (i.e. not normalizable), it is common to start with proper (i.e. normalizable) distributions and take a limit at the end of the calculation to turn the proper priors into the desired uninformative priors [24]. As we can rewrite P (µ, σ) = P (µ) P (σ) (assuming our prior states of knowledge for µ and σ are independent), we have to choose two priors. For P (µ), we choose a uniform prior: where I is the indicator function and causes P (µ) to be non-zero only within the bounds set by µ U B . In the limit µ U B → ∞, this goes to a uniform distribution that allows all real values of µ. We take this limit after finding P (µ, σ|{x i }). For P (σ) we choose the Jeffreys prior, which is a uniform distribution on a logarithmic scale and is commonly used for scale parameters such as the standard deviation: where σ LB and σ U B are the lower and upper bounds on σ. We next calculate P (µ, σ|{x i }) by combining our expressions for P ({x i }|µσ), P (µ), and P (σ), and we also use the fact that P ( −µ U B P ({x i }|µ, σ) P (µ) P (σ) dµ dσ. After taking the limits µ U B → ∞, σ LB → 0, and σ U B → ∞, we find e − N X 2 − 2Xµ + µ 2 /2σ 2 I −∞≤µ≤∞ I 0≤σ≤∞ . (A1) From here on, we leave out the indicator functions I ∞≤µ≤∞ and I 0≤σ≤∞ for ease of notation, but they are always implicitly there. Note that this derivation relies on the assumption that N ≥ 2, i.e. the dataset or data subset has at least two data points in it. Now we can calculate the marginal posteriors for µ and σ, which summarize how much our data determine those parameters. The marginal posterior for µ is defined as follows: Using our expression for P (µ, σ|{x i }) in Eq. A1, we find where α σ and β σ are defined in the main text. Note that this is a distribution for the standard deviation σ, not the variance σ 2 . To find the distribution for σ 2 , we perform a change of variables and find which is the Inverse Gamma distribution of Eq. 3 in the main text. Finally, we derive the posterior predictive distribution, which summarizes what the next data point could be, based on the data taken so far. The posterior predictive distribution is defined as follows: which leads to Eq. 5 of the main text: whereν,μ, andσ are defined above. Making a change of variables usingX = ln (x) leads to Eq. 4 of the main text: which is a location-scale t-distribution.

Appendix B: Simulated Draws
Simulating draws is a common technique in Bayesian statistics to estimate credible intervals and answer probabilistic questions [24]. To simulate draws from the posterior and posterior predictive distributions, we use Matlab's makedist function in the Statistics and Machine Learning Toolbox to define location-scale t-distributions and inverse gamma distributions with parameters determined by the data, as described in Section V.1. Matlab's random function then allows us to sample from the distributions we defined based on our data. Once samples have been drawn, it is simple to estimate the probabilities we describe above. For example, if we label our samples from P (µ N14 |{x i }) and P (µ N15 |{x i }) as {μ N14 } and {μ N15 }, respectively, then we can estimate P (µ N14 < µ N15 |{x i }): where mean (x) is the Matlab command for taking the mean of a vector [24].