Origins of Diamond Surface Noise Probed by Correlating Single-Spin Measurements with Surface Spectroscopy

The nitrogen-vacancy (NV) center in diamond exhibits spin-dependent fluorescence and long spin coherence times under ambient conditions, enabling applications in quantum information processing and sensing. NV centers near the surface can have strong interactions with external materials and spins, enabling new forms of nanoscale spectroscopy. However, NV spin coherence degrades within 100 nm of the surface, suggesting that diamond surfaces are plagued with ubiquitous defects. Prior work on characterizing near-surface noise has primarily relied on using NV centers themselves as probes; while this has the advantage of exquisite sensitivity, it provides only indirect information about the origin of the noise. Here we demonstrate that surface spectroscopy methods and single-spin measurements can be used as complementary diagnostics to understand sources of noise. We find that surface morphology is crucial for realizing reproducible chemical termination, and use this insight to achieve a highly ordered, oxygen-terminated surface with suppressed noise. We observe NV centers within 10 nm of the surface with coherence times extended by an order of magnitude.


I. INTRODUCTION
Nitrogen-vacancy (NV) centers in diamond are a promising platform for quantum information processing and sensing [1,2], and shallow NV centers near the diamond surface are actively explored as highly sensitive sensors with subnanometer resolution [3-6]. Although it is easy to place NV centers near the surface by low-energy ion implantation [7,8] or delta doping [7,9], the surface itself can host defects that lead to noise that obscures the sensing target [Fig. 1(a)]. We observe that coherence time degrades with proximity to the surface in numerous samples with different surface conditions [Fig. 1(b)], consistent with prior studies [9-11], pointing to the need for new techniques to understand and control diamond surfaces. Gaining precise control over diamond surface chemistry is challenging because diamond is a chemically inert material, and also because it is hard to prepare uniform, flat diamond surfaces. Surface morphology is difficult to control because diamond's hardness makes etching and polishing nontrivial. State-of-the-art diamond polishing can achieve surface roughness below 1 nm, but the resulting surface is highly strained. Plasma etching can remove this strained layer [12,13], but this process is highly anisotropic, and therefore small differences in initial conditions can lead to dramatic differences in final morphology and termination [14,15] (see Appendix E). Therefore, direct characterization of the surface is crucial for establishing that particular protocols reproducibly lead to specific, desired surface terminations.
In this work, we characterize the diamond surface by correlating photoelectron spectroscopy, x-ray absorption, atomic force microscopy (AFM), and electron diffraction with measurements of NV spin decoherence and relaxation to identify and eliminate sources of noise at the surface. We find that surface roughness leads to poor NV coherence, and we observe that surface morphology changes the density of electronic defects observed with photoelectron spectroscopy, even for the same nominal chemical termination, implying that it is critical to maintain precise control over surface purity and morphology at every processing step.

II. PREPARATION OF A SMOOTH OXYGEN-TERMINATED DIAMOND SURFACE
In our procedure, we remove surface and subsurface damage resulting from polishing and reactive ion etching (RIE) before ion implantation, perform high-temperature annealing to remove implantation damage, and use oxygen annealing followed by wet oxidation to terminate the surface [Fig. 1(c); see also Appendix A 2]. In order to ensure high purity throughout processing, samples are cleaned in a refluxing mixture of concentrated perchloric, nitric, and sulfuric acids (triacid clean) before RIE and all annealing steps. Starting with scaife-polished substrates with rms roughness of less than 1 nm, we can typically achieve final oxygen-terminated surfaces with rms roughness ∼100 pm, as measured by AFM [Fig. 1(d)]. We show detailed examples in Appendix E of contamination and irreversible surface roughening when this procedure is not followed. Using this surface processing, we extend the coherence times of NV centers within nanometers of the surface by around one order of magnitude [Fig. 1(b)].
To study the effects of different oxygen terminations on spin coherence, we prepare samples containing shallow NV centers using low-energy ion implantation followed by high-temperature annealing at 800°C and triacid cleaning, and we focus on detailed comparison before and after oxygen annealing. This procedure, excluding the final oxygen annealing step, is widely used for preparing shallow NV centers [3,4,9,16-18]. We isolate the impact of the oxygen annealing step by studying the same NV centers near the surface through multiple processing cycles with and without this step.

III. NV CENTER SPECTROSCOPY
We performed single-spin measurements on shallow NV centers that are implanted into an electronic grade diamond sample with natural abundance $^{13}$C, before and after oxygen annealing. Six NV centers were randomly selected from a confocal scan. Their depths were measured using a proton NMR signal from the microscope immersion oil [18] (see Appendix A 7), and the direct comparison of their properties under the two surfaces is shown in Fig. 2. We measured the coherence time $T_{2,\mathrm{echo}}$ using a Hahn echo sequence for each NV center. We then studied the spectrum of the local noise environment using XY4 and XY8 dynamical decoupling sequences [8,19]. Additionally, we studied the high-frequency spectral properties of the noise using single-quantum (SQ) and double-quantum (DQ) relaxation measurements for both surfaces [20]. Figure 2(a) shows a comparison of a Hahn echo decay of the same NV center before and after oxygen annealing. The measurements were performed at $B_z = 1900$ G in order to average out the free precession of the 1.1% natural abundance of $^{13}$C [...] indicating that noise at the surface is suppressed upon oxygen annealing. However, $T_{2,\mathrm{echo}}$ still decreases as the NV centers approach the surface, indicating that surface noise remains the dominant source of decoherence. Figure 2(b) shows an example of the measured coherence time $T_2$ as a function of the number of π pulses, $N$. We observe a clear improvement in $T_2$ under the oxygen-annealed surface by up to a factor of 4 compared to the triacid-cleaned surface for all $N$ (see Appendix G). For a slowly fluctuating bath, such as $^{13}$C nuclear spins or dilute P1 center electron spins, $T_2$ is expected to scale as $N^{2/3}$ [Fig. 2(b), dashed lines] [19]. We fit the data to the scaling $N^s$ and obtain $s = 0.2$-$0.7$ across all NV centers, and in some cases, we observe that $T_2$ saturates for $N < 40$ [Fig. 2(b) and Appendix G]. This range of values that deviate from the expected $N^{2/3}$ scaling indicates that the noise at the surface is broadband, and spectral decomposition reveals a noise spectrum spanning 10 kHz to 1 MHz (see Appendix H).

FIG. 1. [...] The surface can host defects that produce electric and magnetic field noise. (b) Hahn echo coherence time $T_{2,\mathrm{echo}}$ as a function of NV depth, measured across six samples with different surface conditions. Although the triacid-cleaned surfaces have different origins and processing histories (see Appendix A 1), the relationship between $T_{2,\mathrm{echo}}$ and depth is similar across all samples, and the high-temperature and oxygen-annealed sample exhibits significantly improved coherence times at the same depths. (c) Surface processing steps before implantation to remove surface polish damage and subsurface RIE damage, and after implantation to form NV centers and create a well-ordered oxygen surface termination. The high-temperature annealing step can be performed at 800 or 1200°C. (d) AFM images of initial scaife-polished diamond (top) and the final surface after oxygen annealing (middle); scale bar is 100 nm. Confocal image (bottom) showing photoluminescence (PL) from individually resolvable NV centers; scale bar is 1 μm.
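The scaling exponent $s$ discussed above can be extracted with a straight-line fit in log-log space. The following is a minimal sketch, assuming a pure power law $T_2 = T_{2,1} N^s$ with no saturation; the function name `fit_power_law` and the synthetic data are illustrative and are not taken from the measurements.

```python
import numpy as np

def fit_power_law(n_pulses, t2):
    """Least-squares fit of T2 = T2_1 * N**s in log-log space.

    Returns (T2_1, s), where T2_1 is the extrapolated single-pulse
    coherence time and s is the scaling exponent.
    """
    log_n = np.log(np.asarray(n_pulses, dtype=float))
    log_t2 = np.log(np.asarray(t2, dtype=float))
    s, log_t2_1 = np.polyfit(log_n, log_t2, 1)  # slope, intercept
    return float(np.exp(log_t2_1)), float(s)

# Synthetic, noise-free illustration (hypothetical values, not measured data):
# a slowly fluctuating bath is expected to give s = 2/3.
n = np.array([1, 4, 8, 16, 32, 64])
t2_slow_bath = 10.0 * n ** (2.0 / 3.0)
t2_1, s = fit_power_law(n, t2_slow_bath)
```

A fitted exponent well below 2/3, or a fit that fails at large $N$ because $T_2$ saturates, is the signature of broadband noise described in the text.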
Because the surface morphology remains smooth through oxygen annealing [Fig. 1(d)], the termination is reversible. We demonstrate this reversibility by performing a "surface reset" via 800°C vacuum annealing and triacid cleaning [Fig. 2(c), inset]. For direct comparison between different surfaces, we measure the coherence time with Hahn echo, $T_{2,\mathrm{echo}}$, and XY8, $T_{2,\mathrm{XY8}}$, sequences from the same NV centers across different surface terminations. After the surface reset, coherence times are reduced to values comparable to those prior to oxygen annealing. Finally, these coherence times can be restored by repeating the oxygen annealing, showing that we have reproducible control over surface termination.
We note that we achieve the longest coherence times by annealing at 1200°C after annealing at 800°C [Fig. 1(b)]. The higher temperature removes divacancies and multivacancy centers that form after ion implantation, which can contribute magnetic noise [10,21-23]. However, we also observe that the NV center charge state is not stable after 1200°C annealing without subsequent oxygen annealing. Therefore, to isolate the role of the oxygen termination, we have performed these experiments with only an 800°C postimplantation anneal.
We probe the separate contributions of electric and magnetic noise by measuring relaxation rates between different levels in the NV ground state [Figs. 2(d) and 2(e)]. The SQ transition can be driven by magnetic noise, while the DQ transition is magnetically forbidden and can thus be used to probe electric field noise. Comparison of the measured SQ and DQ spin relaxation times allows for the extraction of SQ and DQ transition rates, which are a reflection of the relative contributions of electric and magnetic noise [20]. Figures 2(d) and 2(e) show the measured SQ and DQ spin relaxation times, $T_{1,\mathrm{SQ}}$ and $T_{1,\mathrm{DQ}}$, measured at $B_z = 40$ G for the two different surface terminations. We observe an improvement in $T_{1,\mathrm{SQ}}$ of 1-2 orders of magnitude after oxygen annealing [Fig. 2(d)], indicating that the high-frequency magnetic field noise is strongly suppressed, consistent with $T_2$ measurements that probe the magnetic field noise at lower frequencies. In comparison, $T_{1,\mathrm{DQ}}$ exhibits a small improvement (less than a factor of 3) after oxygen annealing [Fig. 2(e)]. Dynamical decoupling, SQ, and DQ relaxation measurements are sensitive to different frequency regimes, but the DQ transition rate is expected to scale inversely with frequency as $1/f^n$, where $n \le 2$, allowing for extrapolation to other frequencies [20]. Spectral comparison of the dynamical decoupling data and DQ relaxation data indicates that the electric field noise is not the dominant source that limits the coherence of NV centers at high frequencies under either surface termination prepared with our methods (see Appendix H).

FIG. 2. [...] $T_{1,\mathrm{SQ}}$ shows a more pronounced improvement with oxygen annealing compared to $T_{1,\mathrm{DQ}}$, indicating that magnetic noise is more strongly suppressed than electric field noise.
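To illustrate how SQ and DQ transition rates relate to the measured relaxation times, the sketch below uses a simple three-level rate model for the NV ground state. It assumes a single SQ rate `omega` coupling |0> to each of |-1> and |+1>, and a single DQ rate `gamma` coupling |-1> and |+1> directly; these names and the mapping $1/T_{1,\mathrm{SQ}} = 3\Omega$, $1/T_{1,\mathrm{DQ}} = \Omega + 2\gamma$ are assumptions of this model rather than definitions given in the text, and the analysis in Ref. [20] may use different conventions.

```python
import numpy as np

def relaxation_rates(omega, gamma):
    """Non-zero decay rates of the three-level rate model.

    State order is |0>, |-1>, |+1>. The rate matrix is symmetric,
    so its eigenvalues are real; one of them is zero (steady state).
    """
    m = np.array([
        [-2.0 * omega, omega, omega],
        [omega, -(omega + gamma), gamma],
        [omega, gamma, -(omega + gamma)],
    ])
    decay = np.sort(-np.linalg.eigvalsh(m))
    return decay[1:]  # drop the zero mode

def rates_from_t1(t1_sq, t1_dq):
    """Invert the model: 1/T1_SQ = 3*omega and 1/T1_DQ = omega + 2*gamma."""
    omega = 1.0 / (3.0 * t1_sq)
    gamma = 0.5 * (1.0 / t1_dq - omega)
    return omega, gamma

# Hypothetical rates in arbitrary units, for illustration only.
rates = relaxation_rates(1.0, 5.0)            # eigen-rates 3*omega and omega + 2*gamma
omega, gamma = rates_from_t1(1.0 / 3.0, 1.0 / 11.0)
```

In this picture, a large change in $T_{1,\mathrm{SQ}}$ with only a modest change in $T_{1,\mathrm{DQ}}$ corresponds to a strong reduction in `omega` while `gamma` stays nearly fixed.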

IV. SURFACE SPECTROSCOPY
NV-based measurements indirectly suggest that the triacid-cleaned and oxygen-annealed surfaces host different electric and magnetic defects, and therefore have different electronic structure. To directly characterize the structure and chemical composition of the two different oxygen-terminated surfaces, we employ a variety of surface-sensitive spectroscopy techniques [Fig. 3(a)]. Near-edge x-ray absorption fine structure (NEXAFS) spectroscopy probes the density of unoccupied states near the surface, ultraviolet photoelectron spectroscopy (UPS) gives information about the Fermi energy and electron affinity, and x-ray photoelectron spectroscopy (XPS) yields information about the chemical state of the surface termination.
The NEXAFS spectra at the oxygen K edge for the two surfaces are qualitatively similar [Figs. 3(b) and 3(c)]. Both exhibit a sharp $\pi^*$ peak at 532.5 eV and a broad $\sigma^*$ shoulder at around 540 eV, indicating similar chemical states. Varying the angle of incidence of the linearly polarized x rays changes the relative polarization with respect to the surface normal. As this angle is varied, the signal changes dramatically for the oxygen-annealed surface [Fig. 3(c)], while the triacid-cleaned surface shows no variation [Fig. 3(b)]. Strong polarization dependence arises from distinct and well-resolved bond orientations [24], indicating that the oxygen-annealed surface is highly ordered at the atomic scale, while the oxygen groups in the triacid-cleaned surface are disordered [Fig. 4(c)].
The NEXAFS spectra at the carbon K edge [Fig. 3(d)] show a characteristic exciton peak at 289.2 eV and a second absolute band gap at 302.2 eV [25]. At energies below the exciton peak at the conduction band edge, both surfaces exhibit two peaks, one at 285 eV that is assigned to sp$^2$ carbon, and one at 286.5 eV associated with oxygen termination [25]. However, the triacid-cleaned surface has an average of 2.4 times higher density of unoccupied states below the conduction band edge, indicated by the area under the preedge region. These energetically deep unoccupied states at the surface can potentially act as electronic traps that host unpaired electrons, which can contribute both magnetic and electric field noise [26]. Furthermore, a morphologically rough surface after the same surface preparation and oxygen annealing exhibits a much larger density of unoccupied states than the smooth surface (see Appendix E). Using UPS, we observe that the oxygen-annealed surface exhibits a positive electron affinity of +2.14 eV, compared to +0.92 eV for the triacid-cleaned surface [Fig. 3(e); see also Appendix I], indicating that the two surfaces possess drastically different electronic structure. To the best of our knowledge, this electron affinity is the largest reported for oxygen-terminated diamond [11].

FIG. 3. [...] (b),(c) Polarization dependence of NEXAFS spectra at the oxygen K edge. The polarization angle is defined relative to the surface normal. The y axis for all NEXAFS spectra is the partial electron yield (PEY), normalized and baseline subtracted (see Appendix A 5). The oxygen-annealed surface spectrum exhibits strong polarization dependence, indicating that oxygen groups at the surface are well ordered. (d) NEXAFS spectra at the carbon K edge from three triacid-cleaned samples (blue curves) and two oxygen-annealed samples (red curves). Triacid-cleaned surfaces show a higher density of unoccupied states (shaded regions). The full energy range is shown in the inset. (e) UPS spectra with excitation energy $h\nu = 21.2$ eV of the triacid-cleaned (blue) and oxygen-annealed (red) surfaces. The spectral width ω is indicated by the shaded regions. The oxygen-annealed surface exhibits a higher positive electron affinity $E_A$ (+2.14 eV) than the triacid-cleaned surface (+0.92 eV).

FIG. 4. [...] (c) Ball-and-stick diagrams illustrating the surface termination before and after oxygen annealing. The disordered, acid-cleaned surface hosts a mixture of groups, which transform into a highly ordered, predominantly ether-terminated surface upon oxygen annealing.
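For orientation, the electron affinity of a semiconductor is commonly obtained from the UPS spectral width through the relation $E_A = h\nu - E_g - W$, where $W$ is the width from the secondary-electron cutoff to the valence band maximum. The sketch below assumes this standard relation and a diamond band gap of 5.47 eV; neither is stated in the text above, and the authors' own procedure is the one described in their Appendix I. The spectral widths computed here are implied values under that assumption, not reported measurements.

```python
DIAMOND_BAND_GAP_EV = 5.47  # assumed indirect band gap of diamond (not from the text)
HE_I_PHOTON_EV = 21.2       # He I excitation energy quoted in the text

def electron_affinity(spectral_width_ev, photon_energy_ev=HE_I_PHOTON_EV,
                      band_gap_ev=DIAMOND_BAND_GAP_EV):
    """E_A = h*nu - E_g - W (assumed standard UPS relation)."""
    return photon_energy_ev - band_gap_ev - spectral_width_ev

def implied_width(affinity_ev, photon_energy_ev=HE_I_PHOTON_EV,
                  band_gap_ev=DIAMOND_BAND_GAP_EV):
    """Spectral width W that the assumed relation implies for a given E_A."""
    return photon_energy_ev - band_gap_ev - affinity_ev

# Widths implied by the two electron affinities quoted in the text.
width_oxygen_annealed = implied_width(2.14)
width_triacid_cleaned = implied_width(0.92)
```

Under this relation a larger positive electron affinity corresponds to a narrower photoemission spectrum, so the oxygen-annealed surface would show the smaller width.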
Combining the data from surface spectroscopy and NV measurements, we conclude that disorder at the surface can lead to unoccupied defect states near the conduction band edge of diamond, which in turn lead to rapid decoherence of NV centers near the surface. These defect states give rise to broadband magnetic noise that cannot be circumvented by simple dynamical decoupling. It is therefore important for future applications in nanoscale sensing to devise methods to eliminate disorder and defect states at the diamond surface.
We now turn our attention to the chemical identification of the well-ordered, oxygen-annealed surface. XPS [Fig. 4(a)] reveals that the only detectable atoms are carbon and oxygen. The oxygen peak comprises 6%-7% of the signal, corresponding to approximately monolayer surface coverage (see Appendix K). High-resolution XPS was used to probe the structure of the carbon and oxygen 1s peaks in detail [Fig. 4(a), inset]. The carbon 1s spectrum shows a dominant peak at 285 eV, which we assign to diamond sp$^3$ carbon. Two satellite peaks at higher binding energies of +1.2 and +2.4 eV correspond to carbon singly and doubly bonded to oxygen, respectively [27]. The peak at lower binding energy of −0.8 eV is assigned to sp$^2$ carbon. The oxygen 1s spectrum shows a major peak at 532.3 eV and two satellite peaks at lower binding energies of −1.0 and −2.8 eV with a relative ratio of 11:2:1. These peaks have been previously assigned to ether, alcohol, and ketone, respectively [28]. Low-energy electron diffraction (LEED) indicates that the surface is $1 \times 1$ reconstructed [Fig. 4(b)]. Combining the XPS and LEED data, we assign the surface as predominantly ether terminated (∼80%), with a minority mixture of alcohol and ketone groups [Fig. 4(c)]. Additionally, our measured electron affinity is consistent with density functional theory calculations of an ether-terminated surface [29]. We note that all of the surface spectroscopy techniques are unable to directly detect hydrogen, and we thus cannot exclude that the mixed surface includes residual hydrogen, although the large positive electron affinity rules out significant hydrogen incorporation [11].
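The ∼80% ether fraction follows directly from the 11:2:1 ratio of the oxygen 1s components. A minimal sketch of that arithmetic is shown here, assuming the peak areas map one-to-one onto surface group populations; the helper name `component_fractions` is illustrative.

```python
def component_fractions(weights):
    """Convert relative peak areas into fractions that sum to 1."""
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}

# Relative O 1s peak areas quoted in the text (ether : alcohol : ketone).
o1s = component_fractions({"ether": 11, "alcohol": 2, "ketone": 1})
ether_percent = 100.0 * o1s["ether"]  # 11/14 of the oxygen signal
```

The ether component is 11/14 of the total, which rounds to the quoted value of roughly 80%.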

V. CONCLUSION AND OUTLOOK
While the coherence of shallow NV centers is significantly improved by the techniques presented here, these coherence times remain far from typical bulk values [30]. The present work suggests a number of promising avenues for future study. The sp$^2$ peak observed in NEXAFS and XPS is a deep electronic trap, and is a natural target for improvement [26]. It was also recently demonstrated that NV center coherence can be improved by implanting nitrogen through a boron-doped layer [10]. Combining such strategies with our surface preparation could yield even longer spin coherence. Finally, it is unknown what contribution adventitious carbon contamination makes to magnetic and electric field noise. Our ongoing work includes preparing and interrogating surfaces in ultrahigh vacuum conditions to disentangle the contributions of chemical surface termination and exogenous contamination.
Our approach of combining surface spectroscopy with single-spin measurements can be applied to the future development of novel surface terminations. We have shown that surface morphology and electronic structure measurements can help to evaluate which surfaces are likely to lead to further improvements in NV coherence, which can provide useful benchmarking for rapid exploration of new surface chemistry. More broadly, the strategy of correlating surface spectroscopy with qubit measurements can be applied to a variety of quantum platforms that also exhibit deleterious effects from surfaces and interfaces, such as superconducting qubits [31], trapped ions [32], and shallow donors [33].

The authors declare no competing financial interests.

APPENDIX A: METHODS

Samples
In this paper, we present NV coherence data from several samples in Fig. 1(b). Here, we describe the different samples.
(i) Sample A. The sample used for data in Fig. 2. Commercially available electronic grade diamond (Element Six) with <5 ppb nitrogen and <1 ppb boron.
(ii) Sample B. The sample that was subjected to 1200°C annealing followed by oxygen annealing, presented in Fig. 1(b). This sample originated from the same crystal as sample A. The crystal was sliced prior to any processing described in Fig. 1(c).
(iii) Sample C. Another sample that originated from the same crystal as sample A. The crystal was sliced prior to any processing described in Fig. 1(c).
(iv) Sample D. An electronic grade sample with $^{12}$C-enriched layer that was processed according to Fig. 1(c).
(v) Sample E. An electronic grade sample with $^{12}$C-enriched layer with rough, as-grown surface that was subsequently processed with Ar/Cl$_2$ and O$_2$ RIE.
(vi) Sample F. An electronic grade sample with $^{12}$C-enriched layer. The surface is left as grown.
In addition to these NV samples, surface spectroscopy data is presented from several electronic grade samples. Three triacid-cleaned samples and three oxygen-annealed samples, one of which is morphologically rough, were used for NEXAFS and high-resolution XPS [Figs. 3(b)-3(d) and Fig. 4(a), inset]. A boron-doped sample (0.1 ppm boron and <5 ppb nitrogen) was used for UPS spectroscopy [Fig. 3(e)] to prevent charging. Finally, a lower-purity sample (<1 ppm nitrogen and <0.5 ppm boron, Element Six "standard grade") was used for oxygen annealing calibration.

Sample preparation
Our method for preparing a high-quality diamond surface prior to ion implantation relies on a multistep process to remove surface and subsurface damage. Unless indicated otherwise, all samples described above are laser cut and scaife polished to a rms roughness of less than 1 nm with a (100) major face and $\langle 110 \rangle$ edges, specified to within 3°. In order to prepare substrates for implantation, reactive ion etching was performed using an inductively coupled plasma (ICP) with the following parameters: 400 W ICP power, 250 W substrate bias rf power, 25 sccm Ar, 40 sccm Cl$_2$, 8 mTorr (1 Pa) for 30 min followed by 700 W ICP, 100 W substrate bias, 30 sccm O$_2$, 10 mTorr (1.3 Pa) for 25 min (Plasma-Therm Versaline ICP RIE). These two RIE steps etch approximately 2 and 4 μm of the subsurface polish damage layer, respectively [13].
(4) Ramp to 1200°C over 6-12 h. Hold for 2 h.
(5) Let cool to room temperature.
This annealing results in a <3 nm layer of amorphous carbon at the surface, which is subsequently removed by cleaning the sample in a refluxing 1:1:1 mixture of concentrated sulfuric, nitric, and perchloric acids (triacid clean) for at least 1 h. The conversion of material to amorphous carbon and subsequent removal is critical for removing subsurface damage resulting from RIE processing. Annealing at lower pressures [below $1 \times 10^{-7}$ Torr ($1 \times 10^{-5}$ Pa)] does not result in a thick layer of amorphous carbon, and thus does not remove this damage layer (Fig. 6).
Sample A was then sent for $^{15}$N ion implantation (Innovion) with the following recipe: dose $= 1 \times 10^{9}$ cm$^{-2}$, energy 3 keV, and 0° tilt. Other samples were implanted with the same parameters, except with doses of $3 \times 10^{9}$ cm$^{-2}$ for samples B and F and $5 \times 10^{8}$ cm$^{-2}$ for samples C, D, and E. Following implantation, all samples are triacid cleaned and 800°C annealed in vacuum using the same recipe as above, with or without the 1200°C step. Another triacid clean following this vacuum anneal results in the condition referred to as the "triacid-cleaned" surface throughout the text.
To create the oxygen-terminated surface, the sample is then annealed at 445°C-450°C in a tube furnace (Lindberg Blue Mini-Mite with high-purity quartz process tube) under continuous flow of O$_2$ at atmospheric pressure for 4 h. The oxygen flow is regulated with a mass flow controller, and the outlet of the process tube is connected to a bubbler to prevent backflow of gases. The input gases, oxygen (for annealing) and nitrogen (for venting the furnace), are filtered via SAES Sentrol point-of-use purifiers, MC1-203F and MC1-902F, respectively. Following the oxygen anneal, the sample is cleaned in a 1:2 mixture of hydrogen peroxide in concentrated sulfuric acid (piranha). The resulting sample condition is referred to as the "oxygen-annealed" surface throughout the text. Finally, sample B was annealed at 1200°C and subsequently oxygen annealed to achieve the best spin coherence times, shown in Fig. 1(b).
XPS is used between each step to verify that the surface is contamination free at the 0.1% level, which is the sensitivity limit of the instrument. If any heteroatoms other than C and O are found (e.g., Na, Cl, Si), the sample is repeatedly cleaned with either triacid or piranha until the contamination is eliminated. We show examples of XPS spectra from a contaminated sample before and after acid cleaning in Fig. 5, as well as micromasking and surface roughening that can result from contamination in Fig. 8.

Process calibration for oxygen annealing
Since diamond etches when heated in an oxygen atmosphere [34], changing the surface termination while avoiding etching requires careful temperature calibration. Our process proceeds as follows.
(1) Clean sample in 1:2 hydrogen peroxide in sulfuric acid (piranha). Verify that the sample is contaminant-free in XPS.
[...] temperature. Previous studies showed that diamond starts to etch in oxygen around 500°C [34]. Therefore, we begin with 450°C for our calibration and choose the final temperature to be the highest temperature that does not produce pitting on the sample. Examples of AFM images taken after annealing at different temperatures are shown in Fig. 7.

[...] Fig. 4(a) and AFM images in Fig. 1(d) were performed at the Imaging and Analysis Center (IAC) at Princeton University. XPS was performed with a Thermo Fisher K-Alpha spectrometer, collecting photoelectrons normal to the surface. AFM was performed interchangeably with either a Bruker Nanoman or a Bruker ICON3 AFM operating in ac tapping mode (AFM tip Asylum Research AC160TS-R3, resonance frequency 300 kHz). Each diamond was thoroughly cleaned with either triacid or piranha before AFM scans were performed. Large-scale (5 × 5 μm$^2$) and small-scale (1 × 1 μm$^2$ or 0.5 × 0.5 μm$^2$) scans were performed in several distinct areas of the diamonds, away from the edge. In general, for the same sample, no clear variation of the rms roughness was observed across the interrogated areas.

NEXAFS and high-resolution XPS
In NEXAFS spectroscopy, monochromatic x rays excite core electrons, and secondary electron yield is measured as a function of the incident x-ray energy, giving a signal that is proportional to the density of unoccupied states near the surface. In XPS, incident x rays ionize core electrons, and the measured binding energy is sensitive to the chemical environment of the ionized atom.
Unless indicated otherwise, all NEXAFS data and high-resolution XPS spectra (Fig. 4, insets) were acquired at the Australian Synchrotron soft x-ray spectroscopy beam line, using light from an APPLE II undulator generating linearly polarized photons and passed through a plane-grating monochromator. Prior to scanning, the samples were annealed in situ at 430°C to remove adventitious carbon [35].
Carbon K-edge and oxygen K-edge NEXAFS were collected in partial electron yield mode with grid biases of 220 and 440 V, respectively. The spectra are processed and calibrated by first dividing by the total incident power measured using photoelectrons from clean gold foil in the chamber, subtracting the average preedge background (270-275 eV for carbon, 520-525 eV for oxygen), and normalizing to the postedge electron yield (315-320 eV for carbon, 558-560 eV for oxygen). The energy is calibrated by setting the sharp $\sigma^*$ exciton peak of the gold foil to 291.65 eV [36].
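The three processing steps described above (division by incident power, preedge subtraction, postedge normalization) can be sketched as follows. This is an illustrative outline only: the function name `normalize_nexafs`, the array-based interface, and the synthetic step-edge spectrum are assumptions, and the energy calibration step is not included.

```python
import numpy as np

def normalize_nexafs(energy, pey, i0, pre_edge, post_edge):
    """Normalize a partial-electron-yield spectrum.

    energy, pey, i0: arrays of photon energy (eV), raw yield, and
    incident-power reference on the same energy grid.
    pre_edge, post_edge: (low, high) energy windows in eV.
    """
    energy = np.asarray(energy, dtype=float)
    y = np.asarray(pey, dtype=float) / np.asarray(i0, dtype=float)
    pre = (energy >= pre_edge[0]) & (energy <= pre_edge[1])
    y = y - y[pre].mean()          # subtract average preedge background
    post = (energy >= post_edge[0]) & (energy <= post_edge[1])
    return y / y[post].mean()      # scale postedge yield to 1

# Synthetic step-edge spectrum (hypothetical, not measured data), processed
# with the carbon K-edge windows quoted in the text.
energy = np.linspace(270.0, 320.0, 501)
raw = 2.0 * (1.0 + 3.0 * (energy > 289.0))
i0 = np.full_like(energy, 2.0)
norm = normalize_nexafs(energy, raw, i0, (270.0, 275.0), (315.0, 320.0))
```

After this normalization the preedge region averages to zero and the postedge region to one, so preedge areas from different samples can be compared on a common scale.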
High-resolution XPS spectra were analyzed using a SPECS Phoibos 150 hemispherical analyzer with the pass energy set to 5 eV, resulting in a linewidth of better than 0.1 eV. An excitation photon energy of 600 eV was used. XPS spectra were fitted using CasaXPS. A linear fit to the preedge was first subtracted from the data to account for the rising secondary electron tail apparent in spectra acquired with a photon energy close to the core level energy. Subsequently, a universal Tougaard background was subtracted and Voigt functions were used to fit the resulting spectra. Each component function was constrained to have the same FWHM as all others within the same spectrum. We find that the carbon signal fit residual is minimized by fitting two side peaks on the high binding energy side rather than one, and that it does not improve by fitting three side peaks. We identify these two peaks as carbon singly and doubly bonded to oxygen.

Additional XPS, UPS, and LEED measurements
Additional XPS in Fig. 14, UPS, and LEED measurements were carried out in a custom UHV spectrometer at Aberystwyth University. X-ray excitation was provided by a VG twin-anode (Mg and Al) source and He I UV radiation was provided by a SPECS UVS 300 source. Photoelectrons were collected at normal emission by a SPECS Phoibos 100 analyzer using a 2D CCD electron detector.
In UPS, ultraviolet photons (21.2 eV) ionize valence electrons, and their binding energy can then be used to determine the Fermi energy and electron affinity. For XPS, the sample was kept at Earth potential while for UPS, a bias of −2 V was applied to the sample to enable collection of low-energy electrons over a range of sample work functions. The electron analyzer was operated in wide-angle mode to sample band edge states averaged in momentum space. Since the apparent binding energy of electron states measured by photoelectron spectroscopy is affected by surface charging and photovoltage generation [37], we calibrate the valence band edge against the Fermi edge of a tantalum standard. Rear-view VG LEED optics were used to record surface electron diffraction patterns. The beam energy was set to 86 eV for the diffraction pattern shown in Fig. 4(b).

ORIGINS OF DIAMOND SURFACE NOISE PROBED … PHYS. REV. X 9, 031052 (2019)

NV measurement setup
NV measurements were performed on a home-built confocal microscope. NV centers are excited by a 532-nm optically pumped solid-state laser (Coherent Sapphire LP 532-300), which is modulated with an acousto-optic modulator (Isomet 1205C-1). The beam is scanned using galvo mirrors (Thorlabs GVS012) and projected into an oil immersion objective (Nikon, Plan Fluor 100×, NA = 1.30) with a telescope in a 4f configuration. Laser power at the back of the objective was kept between 60 and 100 μW, approximately 25% of the saturation power of a single NV center, in order to avoid irreversible photobleaching. A dichroic beam splitter (Thorlabs DMLP567) separates the excitation and collection pathways, and fluorescence is measured using a fiber-coupled avalanche photodiode (Excelitas SPCM-AQRH-44-FC). A neodymium magnet is used to introduce a dc magnetic field for Zeeman splitting, and the orientation of the magnetic field was aligned to within 1° of the NV axis using a combination of a rotation stage and a goniometer.
Spin manipulation on the NV center was accomplished using microwaves generated by a dual-channel signal generator (R&S SMATE200A). The two channels are independently gated with fast SPDT switches (Mini-Circuits ZASWA-2-50DR+) and combined with a resistive combiner (Mini-Circuits ZFRSC-42-S+) for double-quantum and double electron-electron resonance (DEER) measurement capabilities. The combined signal is then amplified with a high-power amplifier (Mini-Circuits ZHL-16W-43+ and Ophir 5022A) and delivered to the sample via a coplanar strip line. The strip line is fabricated by depositing 10 nm Ti, 1000 nm Cu, and 200 nm Au on a microscope coverslip. Following metallization, the strip line is photolithographically defined and etched with gold etchant and hydrofluoric acid. Finally, a 100-nm layer of Al$_2$O$_3$ is deposited on top of the fabricated strip line via atomic layer deposition (ALD) to protect the metal layer. This Al$_2$O$_3$ layer is crucial for separating the diamond surface from the metal layer of the strip line, which can contaminate the diamond surface [Fig. 5(c)]. Pulse timing is controlled with a Spincore PulseBlaster ESR-PRO500 with 2-ns timing resolution, and phase control of the NV microwave pulses is achieved with an arbitrary waveform generator (Agilent 33622A).
To avoid effects of pulse errors during dynamical decoupling, we alternate the phase of each π pulse using the XY4 protocol for the four-pulse sequence, the XY8 protocol for the eight-pulse sequence, and repeated XY8 protocol for higher-order sequences [38].
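As an illustration of the phase alternation described above, the sketch below builds the rotation axes for XY4, XY8, and repeated-XY8 sequences. The X-Y-X-Y and X-Y-X-Y-Y-X-Y-X patterns are the standard definitions of these protocols; the function names and the evenly spaced timing convention, with π pulses centered at $(k + 1/2)\tau$ for total evolution time $N\tau$, are assumptions and do not describe the authors' pulse-programming code.

```python
XY4 = ["X", "Y", "X", "Y"]
XY8 = ["X", "Y", "X", "Y", "Y", "X", "Y", "X"]

def pi_pulse_axes(n_pulses):
    """Rotation axis of each pi pulse for XY4 (N = 4) or repeated XY8 (N = 8k)."""
    if n_pulses == 4:
        return list(XY4)
    if n_pulses <= 0 or n_pulses % 8 != 0:
        raise ValueError("n_pulses must be 4 or a positive multiple of 8")
    return XY8 * (n_pulses // 8)

def pi_pulse_times(n_pulses, total_time):
    """Centers of evenly spaced pi pulses (assumed CPMG-style timing)."""
    tau = total_time / n_pulses
    return [(k + 0.5) * tau for k in range(n_pulses)]

axes_16 = pi_pulse_axes(16)          # two repetitions of the XY8 block
times_4 = pi_pulse_times(4, 8.0)     # arbitrary time units
```

Alternating the rotation axis in this way makes the sequence less sensitive to systematic pulse errors than applying every π pulse about the same axis.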
The errors in depth and coherence presented in the main text and Appendixes are calculated for each NV center as follows. Each sequence consists of measuring the NV coherence as a function of interpulse duration. We fit the data according to Ref. [18] and obtain the depth and uncertainty from the fit. We then average the depths from the different pulse sequences and report the uncertainty to be either the standard deviation or the largest uncertainty from the fit, whichever is greater. For error bars in coherence presented in the graphs, the reported values and error bars are obtained from the fit to a coherence or relaxation decay curve. Each data point results from measuring NV population as a function of evolution time, such as the data in Fig. 2(a).
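The depth-averaging rule described above can be written as a short sketch. It assumes the sample standard deviation is meant (the text does not specify sample versus population), and the function name and numerical values are hypothetical, not data from this work.

```python
from statistics import mean, stdev


def combine_depths(depths_nm, fit_errors_nm):
    """Average the depths obtained from several pulse sequences for one NV.

    The reported uncertainty is the larger of (i) the standard deviation of
    the depths and (ii) the largest single-fit uncertainty.
    """
    if len(depths_nm) != len(fit_errors_nm) or not depths_nm:
        raise ValueError("need one fit uncertainty per depth value")
    # stdev() is the sample standard deviation (an assumption, see lead-in).
    spread = stdev(depths_nm) if len(depths_nm) > 1 else 0.0
    return mean(depths_nm), max(spread, max(fit_errors_nm))


# Made-up example values in nm, for illustration only.
depth, err = combine_depths([7.8, 8.4, 8.1], [0.3, 0.5, 0.4])
print(f"depth = {depth:.2f} +/- {err:.2f} nm")
```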

APPENDIX B: EFFECTS OF SAMPLE CONTAMINATION
In this work, we emphasize that prior to any surface processing step, it is critical to start with a morphologically smooth and contamination-free surface, since impurities can cause irreversible surface damage through processing. It is also important to monitor the surface roughness and contamination between each step. To ensure purity between each processing step, we clean the diamonds in triacid or piranha solution and check for contaminants in XPS before proceeding. Figure 5(a) shows examples of XPS spectra from the same sample before and after triacid cleaning. While both surfaces show identical survey scans, fine scans can reveal small Si and Na contamination peaks (cyan curves) that are removed after triacid cleaning. Typical sources of Si, Cl, Na, and other contamination include improper drying and handling, used solvent bottles, and device packaging. We have also performed similar contamination checks to develop processes such that annealing and reactive ion etching steps do not introduce surface contamination.
If the surface is contaminated before etching or annealing, micromasking and formation of surface carbides can lead to irreversible surface roughening. For example, silicon-containing polymers used in packaging, gloves, and other containers can leach onto the diamond surface, as verified by XPS. If the diamond is annealed above around 900°C, this silicon-containing contamination layer forms a carbide at the diamond surface, which cannot be removed with triacid cleaning, piranha solution, or any other acid or base that we have explored. This carbide layer then results in surface roughening through subsequent processing, such as oxygen annealing. Similarly, Na and Cl contamination are correlated with surface roughening during reactive ion etching, which we attribute to micromasking. We show in Fig. 8 examples of surface roughness that can result from surface contamination and damage, and we discuss the consequences for electronic structure and NV coherence in Appendixes E and F.
Another source of surface contamination is adventitious carbon that is ubiquitous on surfaces exposed to atmosphere [35]. For NEXAFS and high-resolution XPS, we performed in situ annealing in order to probe the intrinsic electronic structure of the surface, rather than the adventitious carbon. Figure 5(b) shows a representative NEXAFS spectrum at the carbon K edge before and after annealing at 530°C. After annealing, the density of unoccupied states near the conduction band edge is suppressed, indicating either the removal or rearrangement of carbon-containing groups at the diamond surface. The sample was then removed from vacuum and exposed to atmosphere for a few minutes, and reinserted into the chamber without further annealing. Upon reexposure to atmosphere, much of the preedge density returned, indicating that this preedge feature arises from carbon-containing contamination of the surface.
This particular NEXAFS dataset was acquired at the NIST U7a beam line of the National Synchrotron Light Source (NSLS) at Brookhaven National Laboratory. For data taken at NSLS, the carbon K-edge and oxygen K-edge NEXAFS were collected in a partial electron yield (PEY) mode with an entrance grid bias of 220 V for carbon and 400 V for oxygen. The incident light from a bending magnet is passed through a toroidal spherical grating monochromator, focused through a monochromator slit, and enters the chamber polarized perpendicular to the plane of incidence. The polarization with respect to the sample can then be controlled by changing the angle of the sample. Spectra are processed and calibrated by first dividing by the total incident intensity, which is measured using photocurrent from a clean gold grid in the chamber, subtracting the average preedge background (270-275 eV for carbon, 520-525 eV for oxygen), and normalizing to the postedge electron yield (340-342 eV for carbon, 568-570 eV for oxygen). The energy calibration is reported relative to amorphous carbon and oxygen standards. A tantalum heater was used for in situ annealing in the loadlock at pressures ranging from $3 \times 10^{-7}$ Torr ($4 \times 10^{-5}$ Pa) to $8 \times 10^{-7}$ Torr ($1 \times 10^{-4}$ Pa).
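The three normalization steps described above (division by incident intensity, preedge subtraction, postedge scaling) can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the default windows are the carbon K-edge values quoted in the text, and the energy calibration against standards is not covered.

```python
def window_mean(energies, values, lo, hi):
    """Mean of the values whose energy lies in [lo, hi] (eV)."""
    picked = [v for e, v in zip(energies, values) if lo <= e <= hi]
    if not picked:
        raise ValueError("no data points inside the requested window")
    return sum(picked) / len(picked)


def normalize_nexafs(energies, pey, i0, pre=(270.0, 275.0), post=(340.0, 342.0)):
    """Normalize a partial-electron-yield spectrum.

    1. Divide by the incident intensity i0 (gold-grid photocurrent).
    2. Subtract the mean preedge background.
    3. Scale so that the mean postedge yield equals 1.
    """
    ratio = [y / g for y, g in zip(pey, i0)]
    background = window_mean(energies, ratio, *pre)
    shifted = [r - background for r in ratio]
    scale = window_mean(energies, shifted, *post)
    return [s / scale for s in shifted]


# Synthetic step-like spectrum on a 1-eV grid, for illustration only.
energies = [270.0 + i for i in range(73)]        # 270 ... 342 eV
i0 = [2.0 + 0.01 * i for i in range(73)]         # slowly varying beam intensity
raw = [(0.5 + (3.0 if e >= 290 else 0.0)) * g for e, g in zip(energies, i0)]
spectrum = normalize_nexafs(energies, raw, i0)
```

For oxygen K-edge data the same function would be called with `pre=(520.0, 525.0)` and `post=(568.0, 570.0)`.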
Finally, surface contamination after processing can impact the coherence and spin relaxation of shallow NV centers. In particular, contact with metal particles from the coplanar microwave strip line can result in decreased spin relaxation time $T_1$ and coherence time $T_2$. Without the protection of the Al$_2$O$_3$ ALD layer, shallow NV centers exhibit shorter spin relaxation and coherence times when placed in microscope immersion oil on the metal strip line, as illustrated in Fig. 5(c).

APPENDIX C: $sp^2$ CARBON FORMATION DURING HIGH-TEMPERATURE ANNEALING TO REMOVE ETCH DAMAGE LAYER
XPS of the carbon 1s peak following the 1200°C anneal [Fig. 6(a)] shows a significant $sp^2$ carbon peak at 284.2 eV, in addition to the diamond $sp^3$ carbon peak at 285 eV. The double peak is consistent with an $sp^2$ carbon layer that is <3 nm thick, and the relative magnitude of this peak can be increased by shortening the ramp time and thus increasing the peak pressure during the vacuum anneal. Typical peak pressures are $2 \times 10^{-6}$ Torr ($3 \times 10^{-4}$ Pa) to $5 \times 10^{-6}$ Torr ($7 \times 10^{-4}$ Pa) for the 12- and 6-h ramps, respectively. Triacid cleaning removes the $sp^2$ carbon layer, as shown by the blue curve in Fig. 6(a). Raman spectroscopy (Horiba LabRam Evolution, 532-nm excitation) of the surface with the $sp^2$ carbon layer shows no evidence of graphitic carbon, but instead reveals a side peak associated with glassy or amorphous carbon around 1350 cm$^{-1}$.

At low temperatures, oxygen annealing is not effective at changing the diamond surface chemistry, and at temperatures between 450°C and 500°C, oxygen annealing can lead to irreversible surface roughening. In order to precisely calibrate the oxygen annealing temperature, we perform detailed AFM characterization to detect the onset of surface roughening. Figures 7(a)-7(c) show AFM images from the oxygen annealing calibration after annealing at 450°C-470°C in 10°C steps. The top (bottom) row shows 5 × 5 μm$^2$ (500 × 500 nm$^2$) scans.
The sample annealed at 470°C shows micropits that are more visible in the 500 × 500 nm$^2$ scan. Therefore, we anneal the NV samples at 445°C-450°C to avoid pitting.
Most failures of the oxygen anneal result in high surface roughness and prevalence of micropits, as shown in Figs. 9(a) and 9(b). Anecdotally, we also found that after several annealing cycles with the same oxygen tank, the process starts to result in micropits, despite the oxygen tank being more than half full. Switching the oxygen tank more frequently results in a smooth surface with no significant change in the optimal temperature after calibration.
To monitor the etching of the diamond surface due to oxygen annealing, we measure the depths of the NV centers at each of the successive annealing steps. The results are shown in Fig. 7(d). While we observe a small decrease in depths (1-2 nm) on NV2, NV3, and NV5, the depths of NV4 and NV6 remain unchanged across the four processing steps. The depth decrease from the subset of NV centers could be attributed to micropits that occur after successive oxygen annealing [Fig. 7(c)]. For the depths presented in the main text, we use the average and standard deviation of all the depth measurements for a given NV center through different processing steps.

APPENDIX E: SURFACE DAMAGE FROM OXYGEN PROCESSING AND REACTIVE ION ETCHING
We show in Figs. 8(b)-8(i) AFM and SEM images of various samples with rough surface morphology that result from surface contamination and subsequent processing. This surface roughness leads to drastic differences in the electronic structure at the surface, which is evident in photoelectron spectroscopy. For example, the sample in Fig. 8(c) was RIE etched after improper polishing. Figures 8(j) and 8(k) show the NEXAFS and XPS spectra from this same sample after subsequent oxygen annealing, overlaid with spectra from a smooth diamond for comparison. Figure 8(j) shows a larger preedge $sp^2$ carbon peak, a much larger density of unoccupied states below the conduction band edge, and a lower contrast second band gap. The carbon 1s XPS spectrum for this rough surface [Fig. 8(k)] also shows a significantly larger 284-eV "nondiamond" carbon peak. These data show that despite nominally identical processing, the rough surface exhibits very different electronic characteristics compared to a smooth surface, and thus it is important to start with and maintain the smooth surface morphology throughout various processing steps.

FIG. 6. (a) Normalized XPS spectra of the same sample after a 1200°C anneal with different ramp rates. Material at the surface is converted to $sp^2$ carbon during annealing (red and green) and manifests as a peak at lower binding energy compared to the spectrum after the subsequent triacid clean (blue), which consists primarily of an $sp^3$ carbon peak. (b) The Raman spectrum of the 1200°C annealed surface shows a glassy carbon peak, but no clear evidence of a graphite peak at 1580 cm$^{-1}$ [39].

APPENDIX F: SHALLOW NV SPIN COHERENCE IN ROUGH SAMPLES
In addition to the data in the main text, we also performed oxygen annealing on three additional samples, samples C, D, and E (Fig. 9). While we observe improvements in coherence times for sample C and sample D after oxygen annealing, the effects are not as pronounced as those for sample A. Moreover, sample E shows essentially no improvement.
By examining the surface properties of these samples, we found that the surfaces are morphologically different from sample A. These samples had all undergone extensive prior processing. Notably, all three samples have higher average roughness and show micropits that we do not observe in sample A. These results suggest that subtle differences in surface morphology have a significant effect on NV coherence times, even after oxygen annealing.

APPENDIX G: DYNAMICAL DECOUPLING DATA
We show in Fig. 2(b) that shallow NV centers under both triacid-cleaned and oxygen-annealed surfaces show a scaling of coherence time with number of pulses that differs from that of a slowly fluctuating spin bath, $T_2 \propto N^{2/3}$. Following Ref. [8], we fit the data to a saturation curve or, when there is no observable saturation of $T_2$, to a power law. The fitted curves are shown in Fig. 10 and the fitting parameters for the two surface terminations are shown in Table I.
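As an illustration of the power-law case, the sketch below estimates the scaling exponent from $T_2$ versus $N$ by linear regression in log-log space. It assumes the simple form $T_2(N) = A N^s$ (the symbols $A$ and $s$ and the function name are labels chosen here for illustration) and uses synthetic data rather than the measured values in Table I.

```python
import math


def fit_power_law(n_pulses, t2_values):
    """Least-squares fit of log(T2) = log(A) + s * log(N); returns (A, s)."""
    xs = [math.log(n) for n in n_pulses]
    ys = [math.log(t) for t in t2_values]
    count = len(xs)
    x_bar, y_bar = sum(xs) / count, sum(ys) / count
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s = sxy / sxx
    return math.exp(y_bar - s * x_bar), s


# Synthetic, noise-free data generated with exponent 0.5 (illustrative only).
pulses = [1, 4, 8, 16, 32, 64]
t2_us = [10.0 * n ** 0.5 for n in pulses]
amplitude, exponent = fit_power_law(pulses, t2_us)
print(f"fitted s = {exponent:.3f}; slowly fluctuating bath would give {2 / 3:.3f}")
```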
APPENDIX H: NOISE SPECTRAL DENSITY EXTRACTED FROM NV CENTER MEASUREMENTS

Dynamical decoupling spectral decomposition
We probe the spectral density of the noise bath from the oxygen-terminated surface using dynamical decoupling. From the coherence decay $C(T)$, where $T = N\tau$ is the total free precession time, the noise spectrum $S(\omega)$ can be obtained using spectral decomposition, Eqs. (H1) and (H2) [8,20], where $F_N(\omega T)$ is the $N$-pulse filter function, Eq. (H3), with a peak maximum at $\omega = \pi N/T$. Spectral decomposition reveals a broadband noise spectrum across the frequency range of 0.01-1 MHz [Fig. 11(f)].
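The location of the filter-function peak can be checked numerically. The sketch below assumes the commonly used form for an even number $N$ of equally spaced π pulses, $F_N(\omega T) = 8 \sin^4(\omega T/4N)\,\sin^2(\omega T/2)/\cos^2(\omega T/2N)$; this expression is an assumption taken from the general dynamical-decoupling literature rather than from the equations of this appendix, and the function names are hypothetical.

```python
import math


def filter_function(omega, total_time, n_pulses):
    """Assumed N-pulse filter function F_N(omega * T) for even N."""
    x = omega * total_time
    envelope = 8.0 * math.sin(x / (4 * n_pulses)) ** 4
    cos_term = math.cos(x / (2 * n_pulses))
    if abs(cos_term) < 1e-9:
        # At the 0/0 points, sin^2(x/2) / cos^2(x/2N) tends to N^2 for even N.
        return envelope * n_pulses ** 2
    return envelope * math.sin(x / 2) ** 2 / cos_term ** 2


def peak_frequency(total_time, n_pulses, points=20001):
    """Grid search for the maximum of F_N between 0.5x and 1.5x of pi*N/T."""
    centre = math.pi * n_pulses / total_time
    grid = [centre * (0.5 + i / (points - 1)) for i in range(points)]
    return max(grid, key=lambda w: filter_function(w, total_time, n_pulses))


# Example with N = 8 pulses and T = 1 (arbitrary time units).
ratio = peak_frequency(1.0, 8) * 1.0 / (math.pi * 8)
print(f"peak position relative to pi*N/T: {ratio:.3f}")
```

Under this assumed form the numerical maximum sits within a few percent of $\pi N/T$ for $N = 8$, and the offset shrinks as $N$ grows, which is why each $(N, T)$ pair effectively samples the noise near that frequency.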

Double-quantum relaxometry
To probe the electric field noise, we follow the procedure and notation from Myers et al. [20]. We briefly outline their methods here, starting from the NV Hamiltonian, Eq. (H4), where $D = 2.87$ GHz is the zero-field splitting, $\gamma_e = 2.8$ MHz/G is the electron gyromagnetic ratio, and $d_\parallel = 0.35$ Hz cm/V and $d_\perp = 17$ Hz cm/V are the electric dipole moments. From the Hamiltonian, we can see that while the magnetic field $\vec{B}$ can only drive the single-quantum (SQ) transition ($\Delta m_s = \pm 1$), the electric field component $\Pi_\perp$ can result in a double-quantum (DQ) transition ($\Delta m_s = \pm 2$) from the last term in Eq. (H4). Therefore, a full three-level model is required to account for the full spin relaxation resulting from magnetic and electric field noise. The three-level energy diagram is depicted in Fig. 11(a), with the SQ and DQ transition rates denoted by $\Omega$ and $\gamma$, respectively. The pulse sequences for measuring the relaxation times $T_{1,\mathrm{SQ}}$ and $T_{1,\mathrm{DQ}}$ are shown in Figs. 11(b) and 11(c). From SQ and DQ $T_1$ measurements, we can extract the SQ and DQ transition rates, $\Omega$ and $\gamma$, from Eq. (H5) [20]. The electric field noise spectrum, expressed in terms of coupling strength to the NV center, in units of Hz$^2$/Hz, can be obtained directly from the DQ transition rate, $S_{\mathrm{DQ}}(\omega_{\mathrm{DQ}}) = \gamma(\omega_{\mathrm{DQ}})$, where $\omega_{\mathrm{DQ}}$ is the DQ transition frequency [20]. Figures 11(d) and 11(e) show the noise spectra sampled from NV4 and NV5, consistent with $1/f$ noise. Similarly, the magnetic field noise spectrum, expressed in terms of coupling strength to the NV center, can be obtained from the SQ transition rate. Unlike $S_{\mathrm{DQ}}$, $S_{\mathrm{SQ}}$ is roughly constant across this magnetic field range.
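The three-level picture above can be made concrete with a small rate-equation sketch. It assumes classical rate equations in which $\Omega$ couples $|0\rangle$ to each of $|\pm 1\rangle$ and $\gamma$ couples $|+1\rangle$ to $|-1\rangle$. Under that assumption, the deviation of the $|0\rangle$ population from 1/3 decays at rate $3\Omega$, and the $|+1\rangle$ minus $|-1\rangle$ population difference decays at rate $\Omega + 2\gamma$. These rates follow from the assumed model rather than being copied from Eq. (H5), and all function names and numbers are hypothetical.

```python
import math


def relax(p0, p_plus, p_minus, omega, gamma, t_end, steps=20000):
    """Integrate the three-level rate equations with a simple Euler scheme."""
    dt = t_end / steps
    for _ in range(steps):
        d0 = omega * (p_plus + p_minus) - 2 * omega * p0
        dp = omega * p0 + gamma * p_minus - (omega + gamma) * p_plus
        dm = omega * p0 + gamma * p_plus - (omega + gamma) * p_minus
        p0, p_plus, p_minus = p0 + d0 * dt, p_plus + dp * dt, p_minus + dm * dt
    return p0, p_plus, p_minus


def rates_from_t1(t1_sq, t1_dq):
    """Invert 1/T1_SQ = 3*Omega and 1/T1_DQ = Omega + 2*gamma (assumed model)."""
    omega = 1.0 / (3.0 * t1_sq)
    gamma = 0.5 * (1.0 / t1_dq - omega)
    return omega, gamma


# Illustrative rates in arbitrary inverse-time units.
omega, gamma = 0.2, 0.7
p0, _, _ = relax(1.0, 0.0, 0.0, omega, gamma, t_end=1.0)      # SQ-type decay
_, pp, pm = relax(0.0, 1.0, 0.0, omega, gamma, t_end=1.0)     # DQ-type decay
print(p0 - 1 / 3, (2 / 3) * math.exp(-3 * omega))
print(pp - pm, math.exp(-(omega + 2 * gamma)))
```

The two printed pairs compare the simulated population differences with the single-exponential decays expected from the assumed model.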

Comparing magnetic and electric noise spectra
Several reports have observed that electric field noise can significantly contribute to the decoherence of shallow NV centers [40,41]. Here we combine decoherence and relaxation measurements to compare the noise spectral density arising from magnetic field and electric field noise. Even though the magnetic field noise spectrum obtained from SQ measurements (dynamical decoupling and SQ relaxation) and the electric field noise spectrum obtained from DQ measurements are of distinct origins, we can place them on the same scale for comparison by considering the effective field $E_\perp$ [20], where $S^{\mathrm{SQ}}_{E_\parallel}$ is obtained by considering an effective electric field that would cause first-order dephasing and the relationship $\langle E_\perp \rangle^2 = 2 \langle E_\parallel \rangle^2$ is obtained by considering the geometry of the NV axis on a (100)-oriented diamond. The combined spectrum is shown in Fig. 11(f).
Because we probe a small frequency range over which our scaling of $1/f$ (dashed lines) differs from what is reported by Myers et al. [20], we also plot the extrapolated noise spectral density with a scaling of $1/f^2$ (dotted lines), which would be expected from a sum of many Lorentzian noise sources with different cutoff frequencies, as a worst-case estimate. Based on this extrapolation, and comparing to the amplitude of the magnetic field noise obtained from dynamical decoupling and SQ relaxation, we find that the magnetic field noise is the dominant source of noise for the oxygen-annealed surface for $\omega > 2\pi \times 200$ kHz.

Coupling to dark surface spins
Several experiments have reported that the coherence times of the shallow NV centers are limited by magnetic noise arising from "dark" surface spins that are not optically active [7,9,10,17]. To examine this hypothesis, we probe the coupling of our NV centers to dark spins via double electron-electron resonance spectroscopy, where a second microwave tone is applied to flip the dark spins during the NV π pulse in the Hahn echo sequence. To extract the DEER coupling, we model the decoherence of the NV center with two separable sources of decoherence, one arising from the dark spins and another decoherence rate attributable to everything else. The coherence $C$ as a function of free precession time $\tau$ can be written in terms of these two contributions, where $N_{\mathrm{NV}}$ and $N_{\mathrm{DEER}}$ are the corresponding stretching factors. To directly measure the DEER coupling, we can normalize the DEER decay curve using the Hahn echo decay curve. The resulting decay curve can be used to extract the DEER coupling $g_{\mathrm{DEER}} = 1/T_{2,\mathrm{DEER}}$.

Experimentally, we interleave the DEER experiment with the Hahn echo experiment in order to mitigate any long-term drift. The pulse sequence and a sample set of data are shown in Figs. 12(a)-12(c). We performed DEER measurements on the same NV centers across several surface processing steps. We observe that for sample A, whose coherence data are presented in the main text, the oxygen annealing also results in a lower DEER coupling across all the NV centers, and this lower DEER coupling is reversible and reproducible. In Appendix F, we show that across several samples with different surface morphology, there are varying degrees of improvement to the coherence times of the NV centers with oxygen annealing, but the oxygen-annealed surface shows consistently better coherence. However, the change in DEER coupling is inconsistent among these samples, suggesting that rough surface morphology can lead to a population of persistent dark spins that are not eliminated by oxygen annealing.
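The normalization step can be illustrated with a short sketch. It assumes that the two separable contributions take the form of stretched exponentials, so that the Hahn echo decays as $\exp[-(\tau/T_{2,\mathrm{NV}})^{N_{\mathrm{NV}}}]$ and the DEER signal carries an extra factor $\exp[-(\tau/T_{2,\mathrm{DEER}})^{N_{\mathrm{DEER}}}]$. That functional form, the function names, and the synthetic numbers are assumptions for illustration, not the fitting code or data used in this work.

```python
import math


def fit_stretched_exponential(taus, signal):
    """Fit signal = exp(-(tau/T)^n) via linear regression on ln(-ln(signal)).

    Requires 0 < signal < 1 at every point. Returns (T, n).
    """
    xs = [math.log(t) for t in taus]
    ys = [math.log(-math.log(s)) for s in signal]
    count = len(xs)
    x_bar, y_bar = sum(xs) / count, sum(ys) / count
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    exponent = sxy / sxx
    return math.exp(x_bar - y_bar / exponent), exponent


def deer_coupling(taus, hahn, deer):
    """Normalize DEER by Hahn echo, then return (g_DEER, N_DEER)."""
    ratio = [d / h for d, h in zip(deer, hahn)]
    t2_deer, n_deer = fit_stretched_exponential(taus, ratio)
    return 1.0 / t2_deer, n_deer


# Synthetic decay curves (times in microseconds), for illustration only.
taus_us = [2.0 * k for k in range(1, 11)]
hahn = [math.exp(-((t / 30.0) ** 1.5)) for t in taus_us]
deer = [h * math.exp(-((t / 12.0) ** 1.2)) for t, h in zip(taus_us, hahn)]
g_deer, n_deer = deer_coupling(taus_us, hahn, deer)
print(f"g_DEER = {g_deer:.4f} per microsecond, N_DEER = {n_deer:.2f}")
```

Dividing the two curves point by point removes the common Hahn echo envelope, which is also why interleaving the two measurements helps: slow drifts affect both curves similarly and largely cancel in the ratio.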
Figures 12(d) and 12(e) show a comparison of changes in $T_2$ and DEER coupling between sample A and sample C. While the oxygen annealing improves the coherence times in both samples and reduces the DEER coupling in sample A, the DEER coupling stays the same or increases in sample C. We calculate the Pearson's product-moment correlation coefficient between the coherence time improvement and DEER coupling reduction for both samples and we find no significant correlation (correlation coefficients −0.44 and +0.25, respectively).
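For reference, the Pearson product-moment correlation coefficient can be computed as in the sketch below. The function name and the input numbers are made up for illustration and do not reproduce the per-NV values behind the coefficients quoted above.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-NV changes after oxygen annealing (arbitrary units).
t2_improvement = [3.1, 5.4, 2.2, 4.8, 6.0]
deer_reduction = [0.9, 0.2, 0.7, 0.5, 0.4]
print(f"r = {pearson_r(t2_improvement, deer_reduction):.2f}")
```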
FIG. 12. (b) Hahn echo and DEER decay curves taken at $B_z \approx 330$ G. Collapse and revival arising from hyperfine coupling with a nearby $^{13}$C can be observed. (c) Normalized signal, where the $^{13}$C modulation is no longer pronounced. (d),(e) Spin echo coherence time and DEER coupling for several NV centers in sample A and sample C before and after oxygen annealing. Green (red) numbers indicate the average improvement (worsening) in the decoherence rate $1/T_2$, and the average reduction (increase) in the DEER coupling after oxygen annealing.

Fig. 13(a)]. The binding energy scale was calibrated to the core levels and Fermi edge of a tantalum calibration sample [Fig. 13(a), inset]. The width of the electron distribution curve ω can be measured as the energy difference between the valence band maximum (VBM) and the secondary electron cutoff at high binding energy, as shown in Figs. 13(b) and 13(c). The ionization energy $E_i$ can then be used to relate the spectral width ω to the electron affinity $E_A$, where $h\nu = 21.2$ eV is the excitation energy from the He I UV source and $E_g = 5.5$ eV is the band gap of the diamond.
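The relation between spectral width and electron affinity can be sketched numerically. This example assumes the standard photoemission relations $E_i = h\nu - \omega$ and $E_A = E_i - E_g$, which are consistent with the quantities defined above but are stated here as an assumption; the function names and the example widths are hypothetical.

```python
HE_I_PHOTON_ENERGY_EV = 21.2   # He I excitation energy quoted in the text
DIAMOND_BAND_GAP_EV = 5.5      # diamond band gap quoted in the text


def ionization_energy(width_ev, photon_energy_ev=HE_I_PHOTON_ENERGY_EV):
    """Assumed relation: E_i = h*nu - (spectral width)."""
    return photon_energy_ev - width_ev


def electron_affinity(width_ev, photon_energy_ev=HE_I_PHOTON_ENERGY_EV,
                      band_gap_ev=DIAMOND_BAND_GAP_EV):
    """Assumed relation: E_A = E_i - E_g."""
    return ionization_energy(width_ev, photon_energy_ev) - band_gap_ev


# Hypothetical spectral widths in eV, for illustration only.
for width in (14.0, 16.5):
    print(f"width = {width:.1f} eV -> E_A = {electron_affinity(width):+.2f} eV")
```

With these relations, a spectral width larger than $h\nu - E_g = 15.7$ eV corresponds to a negative electron affinity.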

APPENDIX J: LEED PATTERN OF OXYGEN-ANNEALED SURFACE
The LEED pattern for the oxygen-annealed surface [Fig. 3(c), duplicated in Fig. 14(a)] shows that the surface is 1 × 1 reconstructed, and there is no evidence of 2 × 1 reconstruction. In order to verify that the absence of 2 × 1 diffraction peaks does not arise from disorder obscuring the peaks, the sample was annealed at 1000°C to remove the oxygen termination, as verified by XPS [Fig. 14(c)]. A clear sign of 2 × 1 reconstruction emerges in the diffraction spots [Fig. 14(b)], which are absent before annealing [Fig. 14(a)].

APPENDIX K: COMPARISON OF XPS SPECTRA FOR TRIACID-CLEANED AND OXYGEN-ANNEALED SURFACES
High-resolution XPS spectra were measured at the Australian Synchrotron. The carbon 1s spectrum [Fig. 15(a)] for the triacid-cleaned sample shows a larger peak width compared to the oxygen-annealed surface, individual satellite peaks cannot be resolved, and the $sp^2$ carbon peak at lower binding energy is more pronounced. There is also some weight to the spectrum at higher binding energies, above 288 eV, possibly indicating the presence of some carboxylic acid groups at the surface [42]. The oxygen 1s spectrum [Fig. 15(b)] for the triacid-cleaned surface also shows a single dominant peak with some species at lower binding energies, but the peak width is 2.1 eV, compared to 1.3 eV for the oxygen-annealed surface. The broader peak widths are consistent with a highly disordered, heterogeneous surface termination.
XPS survey scans of many samples across many processing steps were taken at the Princeton IAC. From the survey scans, the atomic composition of the surface can also be estimated by comparing the magnitude of the XPS peaks. Here, we focus on the oxygen 1s spectra from triacid-cleaned surfaces and oxygen-annealed surfaces. From the inelastic mean free path of photoelectrons at these energies (2.2 nm for 1487 eV Al Kα) [43], we estimate the contribution of the signal from a monolayer of atoms on the diamond surface to be 7.6%. Figure 15(c) shows a histogram of the measured oxygen 1s atomic percentage from multiple XPS spectra, where the oxygen-annealed surfaces show markedly higher oxygen 1s percentage compared to the triacid-cleaned surfaces. The values obtained are consistent with an oxygen monolayer on the surface after oxygen annealing, while the triacid-cleaned surfaces have lower oxygen coverage, indicating incomplete oxygen termination.