Real-time estimation of the optically detected magnetic resonance shift in diamond quantum thermometry

We investigate the real-time estimation protocols for the frequency shift of optically detected magnetic resonance (ODMR) of nitrogen-vacancy (NV) centers in nanodiamonds (NDs). Efficiently integrating multipoint ODMR measurements and ND particle tracking into fluorescence microscopy has recently demonstrated stable monitoring of the temperature inside living animals. We analyze the multipoint ODMR measurement techniques (3-, 4-, and 6-point methods) in detail and quantify the amount of measurement artifact owing to several systematic errors derived from instrumental errors of experimental hardware and ODMR spectral shape. We propose a practical approach to minimize the effect of these factors, which allows for measuring accurate temperatures of single NDs during dynamic thermal events. We also discuss integration of noise filters, data estimation protocols, and possible artifacts for further developments in real-time temperature estimation. The present study provides technical details of quantum diamond thermometry and discusses factors that may affect the temperature estimation in biological applications.

For biological applications, the use for thermometry is of particular significance because temperature is a fundamental parameter of biological activity, such as circadian rhythms [17], energy metabolism [18], and developmental processes [19,20]. The biological application of NV thermometry was first demonstrated in cultured cells [21][22][23][24], and recently in in-vivo model animals, such as nematode worms [25,26]. A technological key to the recent in-vivo demonstrations is the method to efficiently determine the temperature of NDs under their dynamic motion inside biological structures; one needs to complete the temperature measurement for a few seconds because the incorporated NDs and whole systems are moving, which significantly challenges the current quantum-sensing protocols. Of various ODMR measurement protocols related to thermometry [21,[27][28][29][30][31][32][33][34], multipoint methods have * masazumi@osaka-cu.ac.jp † yutaka.shikano@keio.jp been proposed [11,21,29,32] to accelerate the ODMR measurement process, wherein fluorescence intensities at three or four frequency points are acquired to estimate the ODMR shift. These methods significantly reduce the measurement time and thus enable more signal sampling, which improves the precision, on comparing with the frequency-shift determination using the wholespectral-shape measurements. These methods, however, are inevitably susceptible to experimental errors, such as change of ODMR spectral shape, temperature-dependent NV fluorescence intensity, and hardware's instrumental errors, because they estimate the frequency shift based on the limited available information at the chosen frequency points. These two types of multipoint ODMR methods, i.e., 3-and 4-point methods, have demonstrated their effectiveness [21,29]. Intuitively, the number of frequency points is likely to affect the results; the estimation based on a higher number of frequency points provides more information and shows a robust behavior under the complicated response of ODMR spectral shape that cannot be simplified using the simple Lorentzian function. However, increasing the number of frequency points reduces the number of measurements, which may lead to deterioration in precision. Understanding error and noise sources associated with these multipoint ODMR measurements is crucial to use these techniques further.
In this report, we study the multipoint ODMR measurements in the context of real-time thermometry for biological applications. Focusing on the 4-point method, we first show that there is a marginal difference in the photo-responsivity between the photon counts of the selected four frequencies, which causes significant artifacts during temperature estimation. This difference in photo-responsivity is also observed for the 3-point method, which can be attributed to the variations of ODMR spectral shape and temperature-dependent NV fluorescence intensities. In addition, the hardware's instrumental errors can affect the photo-responsivity, which becomes significant when using the 6-point method (an extended version of the 4-point method). Second, we propose a practical way to cancel this effect, and compare the performances of each method in terms of precision and associated noises. Third, we demonstrate the monitoring of temperature dynamics of single NDs while having dynamic variation of fluorescence intensity by the thermal drift of the setup and temperature dependence of NV fluorescence. Finally, we summarize the above-detailed investigation and discuss possible artifacts that may be produced while using multipoint ODMR measurement techniques on biological samples.

II. EXPERIMENTS
A. CW and multipoint ODMR measurements Figure 1 shows a schematic diagram of the experimental setup and control sequences for the CW and multipoint ODMR measurements. A CW 532-nm laser was used to excite the ND-NV centers, with an intensity of ∼ 2 kW·cm −2 . An oil-immersion objective with a numerical aperture of 1.4 was used. The NV fluorescence was filtered using a dichroic beam splitter (Semrock, FF560-FDi01) and long-pass filters (Semrock, BLP01-635R- 25) to remove the residual scattered signal of the green laser. The fluorescence emission was coupled to an optical fiber that acted as a pinhole (Thorlabs, 1550HP). The fibercoupled fluorescence was detected by an avalanche photodiode (Excelitas, SPCM AQRH- 14). The output of an APD was fed to two data-acquisition (DAQ) boards equipped with four pulse counters (DAQ-1: National Instruments, USB-6343 BNC) and two counters (DAQ-2: National Instruments, USB-6229 BNC). We used NDs containing ∼ 500 NV per particle (Adámas Nanotechnologies, NDNV100nmHi10ml), which were then spin coated on coverslips. The sample coverslips were set in an incubation chamber that was mounted on a piezo stage. Raster scanning and particle tracking of NDs were performed by controlling the voltage applied to a piezo stage (Piezosystemjena, TRITOR-100SG). Note that an external magnetic field was not used in this study.
To implement both CW and multipoint ODMR measurements, a stand-alone microwave source (Rohde & Schwarz, SMB100A) and five USB-powered microwave sources (Texio, USG-LF44) were connected to an SP6T switch with a switching time of 250 ns (General Microwave, F9160). The output microwave signal was amplified (Mini-circuit, ZHL-16W-43+) and fed to a microwave linear antenna placed on a coverslip (25-µm-thin copper wire) that was sealed with a home-built plastic box with a hole in the center. The typical microwave excitation power was estimated to be 10-50 mW (10-17 dBm) at the linear antenna by considering the source output, amplifier gain, and the experimentally determined S 21 of the antenna system, which provides microwave magnetic field of more than 2-5 gauss in 20 µm from the antenna. To acquire CW-ODMR measurements, the APD detection was gated for microwave irradiation ON and OFF using the SP6T switch and a bit pattern generator (Spincore, PBESR-PRO-300). Specifically, the bitpattern generator fed TTL pulses to the SP6T switch for gating the output from MW1 (200 µs for ON and OFF), which is followed by 100-µs TTL pulse to the acoustooptic modulator for switching off the laser, as described in Fig. 1(b). We prepared two counters in DAQ-1 (Sig and Ref in Fig. 1(b)) and fed the APD output to one of its digital input ports to be assigned to these two counters. These two counters were operated in gated edge counting mode. This resulted in I ON PL and I OFF PL with a repetition rate of 2 kHz. In the multipoint-ODMR measurements, the APD detection was gated for the respective microwave frequencies (4-points: ω 1 to ω 4 . 3-points: ω − , ω + , and off-resonant ω 0 . 6-points: ω 1 to ω 4 , ω − , ω + ). The gate width was t M = 100 µs, common to all the four, three, or six gates, each followed by an interval of t int = 5 µs. For the 4-or 3-point method, we used four or three counters in DAQ-1, each synchronized to the respective microwave sources (4-point: MW1 to MW4. 3-point: MW1 to MW3). For the 6-point method, the four counters of DAQ-1 and two counters of DAQ-2 were each synchronized to the respective microwave sources (MW1 to MW6). The total photon count I tot was obtained using the following equation: where 1.05 was used as the correction factor to account for the time interval of 5 µs, during which no photons were counted. During the temperature measurements, a confocal microscope system was used to track the target NDs. The piezo stage was scanned in the xyz directions while measuring the ND fluorescence intensity. The obtained crosssections of the point spread function along the xyz axes were fitted using Gaussian functions to determine the xyz positions for re-positioning. The piezo stage was moved smoothly to the re-positioning point in five steps of ∼ 20 nm every 2 ms. The re-positioning took 3.8 s and was performed with a tracking period of t track = 4 s. It should be noted that data acquisition for the temperature estimation was performed for 1.0 s (occasionally, for 0.5 s, depending on the application) for every four seconds.
It should be noted that there may be other combinations of experimental devices for using the multipoint methods, which may affect the interpretation of a part of the following results, particularly regarding the artifact caused due to experimental hardware. For example, one could use arbitrary wave-form generator to make multipoint sequences instead of using both microwave switches and many microwave frequency sources. Devising the gate-feeding to the DAQ board may re-duce the number of counters used in the DAQ boards by sharing the same counters for the multiple measurement windows. Field-programmable gate array (FPGA) (a) ODMR spectrum fitted using the sum of two Lorentzian functions centered at the zero-field splitting frequency D and (b) that with two linear slopes fitted to the spectrum. The selected frequencies ω1 to ω4 and ω−, ω+ are represented as orange circles from left to right, and are separated on each of the slopes by δω, as shown in the inset. ω0 is used for the 3-point method and set to 2.65 GHz sufficiently far from D. may provide us with wider choices for the implementation of multipoint ODMR. Using such an implementation could eliminate some of the hardware-derived measurement artifacts. However, the following section reports a measurement artifact that is most likely intrinsic to NV ODMR, that is, it cannot be removed by using the above-mentioned hardware implementations.
B. ODMR spectral analysis to define the frequency points for the multipoint ODMR measurements To determine the microwave frequencies for the multipoint ODMR measurements, the CW-ODMR spectra were first recorded for target NDs. Based on the obtained CW-ODMR spectra, the intensities I 1 to I 4 and I − , I + were selected in the following manner: (1) The obtained CW-ODMR spectra were fitted to the sum of two Lorentzian functions to indicate the ODMR dip, and the zero-field splitting D, as shown in Fig. 2. (2) The two linear slopes of the ODMR dip were determined via linear fits and six frequency points that included three on each slope, which were uniformly distributed, i.e., equidistant with δω over the extent of the slopes. In this case, ω − and ω + were centered on the zero-field splitting (D) such that I(ω − ) = I(ω + ) (see Appendix for more details).
For the 3-point method [29], we take ω − and ω + for the two frequencies on the Lorentzian dip and set ω 0 = 2.65 GHz for the off-resonance frequency to take the baseline fluorescence intensity. The temperature estimate can then be written as follows [29]: where Γ is the full width at half maximum (FWHM) of single Lorentzian and ρ = |ω + − ω − |/Γ. Note that the reason of choosing 2.65 GHz as an off-resonant microwave frequency is because it is sufficiently far from the resonant 2.87 GHz but is not so far as to change the high-frequency noise to the piezo stage or other electronic devices. For the present 6-point method, we simply take the mean of the three types of estimates of the 4-point method as follows: Then, we take their mean as From these equations, the measurement noise can be calculated.
It should be noted that a comparison of the precision of these multipoint ODMR methods needs experimental validation, which is difficult theoretically. Single sequences of the 4-, 3-, and 6-point measurements take t 4pnt = 420 µs, t 3pnt = 315 µs, and t 6pnt = 630 µs, respectively. Supposing the photon count rate to be R, photon shot noise of the single measurement of i-point method (σ) can be written as R × t ipnt . The measurement noise with 1-s sampling time is determined by the number of measurements per second (t −1 ipnt ), which provides the measurement noise Therefore, the 3-point method can provide the smallest precision in a simplistic picture because it can integrate as many measurements (estimation) as possible for a certain time. However, there are other factors affecting the precision of the thermometry, such as spectral shape of the ODMR. The assumption of a single Lorentzian shape is not straightforward because ODMR dip exhibits splitting due to the mixed contribution from the interference between the 3 A spin states and from the inhomogeneous decoherence sources [22,40,41]. Such spectral shape dependency will be discussed in Sec. III A in detail. The 4-and 6-point methods do not explicitly use such an assumption of the spectral shape; however, the selection of frequency points affects the measurement. For example, frequency points on one side (ω 1 and ω 2 or ω 3 and ω 4 ) should be apart from each other to obtain smaller noise (∼ σ 4pnt /δω), but they need to be sufficiently far from the curved region of the ODMR spectrum, where the linear fit is no longer valid.

C. Experimental determination of the precision and accuracy
The precision (σ exp ) of the thermometry was experimentally determined by taking the standard errors of 20 sampling points of ∆T NV that were recorded over 38 s, including tracking time. The sensitivity (η T ) could be calculated as η T = σ exp × 2δt intgr because this duration consists of 19.4-s integration time (δt intgr ) and 18.6s re-positioning time. The accuracy was determined by adding an offset (T 0 ) to ∆T NV to match T S and taking the root-mean-square (RMS) of T S − (T 0 + ∆T NV ), where T S is the sample temperature calibrated separately (see below Sec. II D). The upper bound of the RMS in the steady state was considered to be the accuracy. It should be noted that the fluctuation of the environmental temperature T air (namely, the air-conditioned laboratory room temperature) caused a deviation of T 0 + ∆T NV from T S , which overestimated the accuracy value.

D. External temperature change
The sample temperature (T S ) was varied via direct heat conduction from the oil-immersion microscope objective, whose temperature (T obj ) was controlled by using a PID-feedback controller of the foil heater which wrapped the objective (Thorlabs, HT10K & TC200, temperature precision: ± 0.1 K). The immersion oil was Olympus Type-F. T S was calibrated in the following manner: (1) a tiny flat Pt100 resistance temperature probe (Netsushin, NFR-CF2-0505-30-100S-1-2000PFA-A-4, 5 × 5 × 0.2 mm 3 ) was tightly attached to the sam-ple coverslip by a thin layer of silicone vacuum grease between the probe and the coverslip. (2) The probe was completely covered with aluminum tape whose edges were glued to the base coverslip. (3) In this thermal configuration, T obj was varied while monitoring T S . We obtained the following relation: T S = 1.847 + 0.923T obj • C according to Fig. S2. The temperature probe was read using a high-precision handheld thermometer (WIKA, CTH7000, temperature precision: ± 0.02 K). During the calibration measurement, T air was monitored using a data logger (T&D, TR-72wb, temperature precision: ± 0.5 K), and we confirmed that T air fluctuates within only ± 0.5 K over 12 h. Note that T obj was monitored directly on top of the foil heater. The temperature stability in the incubator is ±0.2 • C over 12 h when measured by the above temperature probe (see the 0.2-• C periodic oscillations in Fig. S2 (a)).

A. Optical power dependent instrumental errors of the quantum thermometry
Theoretically, the counter values on each side of the CW-ODMR spectrum should show the same dependency on the fluorescence counts; however, they show a slightly different dependency. Figure 3(a) shows the dependence of the counter values as a function of the fluorescence intensity for the 4-point measurement, where the fluorescence intensity was varied by controlling the laser excitation power using a variable ND filter (specifically VND-1 in Fig. 1(a)). First, the counter values of I 1 and I 4 are always larger than those of I 2 and I 3 because they exhibit the difference of the ODMR contrast. Second, these two sets of the two counters (namely, I 1 , I 4 , and I 2 , I 3 ) exhibit a similar linear increase; however, there are small differences in the values in the order of 10 −3 . Figures 3(b) and (c), respectively, show the differences of the two counter values (I 1 − I 4 , I 2 − I 3 ) and their ratio to the absolute counts that are defined as where i and j denote the counter identification numbers. The absolute differences are well fitted with the secondorder polynomials, and V ij exhibit a linear variation. The observed small variations of V ij of approximately 0.5 % is significant in the temperature estimation process, which corresponds to a temperature variation of about a few K. For the 3-point measurement, the photo-responsivity difference of I − − I + is similar to that of the 4-point measurement, while the dynamic range of I + is relatively large (Fig. 3(e)), thus deteriorating the curve-fitting precision. V −+ shows the same dependency as V 14 and V 23 ( Fig. 3(f)). For the 6-point measurement, the variations of V ij increase upto ∼ 2.0 %, close to a four-fold increase from those of the 4-and 3-point measurements, as shown We, therefore, tested the behavior of the counter values for the constant TTL pulse trains generated by a frequency generator (Stanford Research Systems, DS345; frequency accuracy, ±5 ppm). The output from the frequency generator was connected to the two DAQ-boards instead of the APD, as shown in Fig. 1(a). Figure 4 shows the pulse-responsivity of the counter values, their differences, and V ij for the 4-point and 6-point measurements. Note that the photo-responsivity of the 4-point  This difference is, however, not critical in the multipoint ODMR measurements because the uniform effect of all counters is finally canceled as in Eqs. 2-5. Instead, the difference between the counter values in Figs. 4(b) and (h) are one or two orders of magnitude smaller than that of the real photon counting shown in Figs. 3(b) and (e) (0-300 ppm of V ij compared to 0-1.8 % of the photon counting case). Furthermore, V ij is flat over the entire range with an offset and distribution of <300 ppm and ± 50 ppm, respectively, and does not show a significant dependence on the counter values, which can be basically explained by the timing accuracy of the DAQ boards of 50 ppm based on the manufacturer's specification sheet.
After quantifying the hardware instrumental error related to the photo-responsivity difference, we next check the effect of microwave frequency. Figures 5(a), (b) show eight patterns of choosing frequencies in the 4-point method. In Fig. 5(a), the upper two frequency points were shifted up, away from the bottom two fixed points, thus increasing the difference of their ODMR depth in the order of 1 → 5. On the contrary, in Fig. 5(b), the bottom two frequency points were shifted up toward the top two fixed points (5 → 8). With these eight patterns of frequency selections in the 4-point method, we analyze the behavior of difference ratio of V ij . Figure 5(c) shows V 14 and V 23 for the respective eight patterns. The V ij curves are basically not affected by the frequency selection within the error of V ij . In contrast, Fig. 5(d) shows the comparison of V ij between these eight patterns and the off-resonant pattern in which the 4-point measurement was performed by simply shifting the pattern 3 by 200 MHz to the lower frequency side, namely, around 2.65 GHz. The V ij values now turn small within the range of ± 0.1 %. While this value is still one order of magnitude larger than the case of Fig. 4, the V ij shows the same flat photo-responsivity as that of the counter responsivity ( Fig. 4(c)). As shown in Fig. 5(e), this on/off-resonance behavior of the 4-point measurement is also observed in the 3-point measurement. In the 6-point measurement, shown in Fig. 5(f), V 23 exhibits the same on/off-resonance behavior as V 23 of the 4-point measurement, whereas the difference between V 14 and V −+ vanish, exactly matching each other. This fact indicates that the use of different DAQ boards created the same dependency for V 14 and V −+ , but the difference between V 14 and V −+ that were observed in the ODMR measurement in Fig. 3(i) should be related to the NV spin resonance itself. Note that the we have confirmed that the photo-responsivity difference is also observed in the permuted 4-point method that has been recently reported [25].
We further confirmed that the irradiation of laser and microwave does not cause this photo-responsivity difference by measuring its dependence on the duration of the photon-counting (t M ) and interval times (t int ). Figure 6 (a) shows the dependence of the difference ratio acquired with the 4-point method on the measurement time, t M . The dependence of V 14 and V 23 does not change significantly when the measurement time is increased from 50 to 500 µs. Similarly, Fig. 6 (b) shows the dependence on the interval time (t int ) when t int is increased from 0.1 to 100 µs. As the interval time increases, V 14 is shifted to the left (particularly 50 and 100 µs) because of the reduction of the detected photon counts per second. However, the overall change of difference ratio between the minimum and maximum counter values does not change because the optical power itself is kept at the same level. Furthermore, the laser irradiation during the interval time does not affect the photo-responsivity difference as shown in Fig. 6 (c).
Summarizing these experimental results, the random nature of photon-counting events just increases the measurement accuracy of DAQ boards, and it does not cause photo-responsivity difference such as linear dependency of V ij . Although using different DAQ counting boards may cause the photo-responsivity difference, it can be identified by checking the photo-responsivity difference between the on/off-resonance behaviors. The final remaining photo-responsivity difference should be related to the intrinsic nature of the ODMR signal. While the mechanism of this ODMR behavior cannot be explained well presently, it may be related to some optical-powerdependent spectral distortion of the ODMR dip. For example, it is well known that the ODMR depth and linewidth depend on the laser excitation intensity [42]. Such optical power dependency and some other associated minor phenomena may cause the present photoresponsivity difference. It should be noted that our CW-ODMR measurements were not able to address this small asymmetry because of the limitations of acquiring sufficiently high signal-to-noise ratio data. In our standard configuration, the whole spectral measurement accumulates 20 Mcts of photons at each point of frequency, which gives 4.5 kcts as a photon shot noise. The ODMR contrast is approximately 10%, i.e., the ODMR depth is 2 Mcts of contrast. The observed spectral asymmetry is therefore to be 2 kcts (0.1 %), which needs high degree of data accumulation. Furthermore, the long integra-tion time of the whole spectral measurement (2-20 min depending on the photon count) inevitably incorporates low-frequency noise, which affects the spectral shape. It is also necessary to work on NV centers in bulk diamonds to account for the underlying material physics of NV centers. We also note that the impedance mismatch between the APD output (50 Ω) and the DAQ inputs (1 kΩ) is not likely to affect the current observation. Impedance mismatch causes reflections in the coaxial electrical line and may affect the detected pulse numbers. However, the pulse counter experiments using the frequency generator (Fig. 4) also have an output impedance of 50 Ω, the same as that of APD. If the impedance mismatch causes the observed photo-responsivity difference, it should have been observed in Fig. 4.

B. Practical approach for canceling the instrumental errors
Whereas the exact origin of the variations of the counter responsivity is not completely interpreted, these effects can be practically eliminated by subtracting pre- known systematic errors from the original counter values. Such a error-correction filter is necessary particularly for the measurements operated in the low-photon regime (I tot < 1 Mcps) because it can create significant artifacts in the frequency-shift estimate (i.e. about 300 kHz corresponding to several degrees). While conducting experiments, the counter responsivity is measured each time before performing the multipoint ODMR measure-ments. After acquiring the counter responsivity data as presented in Fig. 3, the data were fitted with secondorder polynomials. With the fitting parameters of the second-order polynomials, the original counter values can be corrected as follows: where I C i , I NC i , a k , b k , and c k denote the error-corrected photon counts, original photon counts (no error correction), and coefficients of the second-order polynomials for I 3 , I 4 , I + , respectively. Table I summarizes these fitting parameters used in Fig. 3. In this way, we can eliminate the systematic errors of the measurement systems included in the original counter values. Note that this error correction does not explicitly depend on I 1 , I 2 , or I − ; the data points located on the left side of the ODMR spectrum are used concomitantly for the calculation in Eqs. 2-5. This independence of the other side of the ODMR spectrum is important to isolate the error correction from the temperature derived ODMR shift. Figure 7(a) shows I tot , ∆Ω 1 4pnt without and with the error correction for the 4-point measurement, respectively, when the laser intensity is intentionally varied. The error correction suppresses the signal drift. While the temperature estimate shows an offset drift when the laser intensity is changed without the error correction, it no longer shows the drift when the error is corrected. It also works well when the fluorescence intensity is varied for a constant laser intensity, as can be seen in Fig. 8(b); specifically a variable ND filter (VND-2 in Fig. 1(a)) was varied to directly control the fluorescence intensity detected by the APD . Figures 7(c)-(f) show the corresponding data for the 3-and 6-point measurements when either of the laser intensity or fluorescence intensity is changed. The error correction is still effective for the 3-and 6-point measurements. In all the cases, the noise is generally reduced as a result of the error correction in addition to the drift cancellation. Note that the error correction does not work very well in the low-photon regime I tot < 0.5 Mcps owing to the inaccuracy of the fitting parameters; It is, therefore, necessary to always perform measurements at I tot > 0.5 Mcps to ensure that drift is negligible. Note also that the fitting errors are propagated to the absolute accuracy of the temperature measurement (see Eqs. D1-D3 for the error-propagation equation). Figure S3 shows the dependence of the propagated errors on the fluorescence intensity for the present three types of multipoint ODMR methods. In the 4-and 3-point methods, the errors of the absolute measurement accuracy is 2-3 K for the most of the photon count range and can be increased up to 6 K when the photon counts are decreased. In the 6-point method, the accuracy is increased to 6-12 K because of the large instrumental errors of DAQ devices. Note that this inaccuracy means there is a constant uncertain offset in the measured ∆T NV because we do not change the error-correction parameters during the temperature measurement.
C. Measuring stepwise temperature change using the multi-point methods Using these error corrections, we can now measure the temperature change of NDs while the photon counts are dynamically changed, similar to the heating/cooling of the oil-immersion objective, which has been technically challenging because the change of objective temperature strongly shifts the focal position and this positional shift causes photon count noise. Figures 8(a)-(c) show the time profiles of the temperature estimates for the three types of the multipoint ODMR measurements by working on the same single ND with the constant optical and microwave power in the 4-, 3-, and 6-point methods, respectively. Each measurement takes approximately 40 min and was performed sequentially, thereby taking ∼ 2 h throughout these three measurements. As the temperature of the sample (T S ) is decreased stepwise by ∼ 2 K, I tot increases due to the improvement of the fluorescence quantum efficiency, as previously reported [43] (top panel). As shown in the second top panels, all multipoint ODMR methods successfully follow the stepwise temperature change. Interestingly, the scaling factor between T S and ∆T ipnt NV is different among the three methods, giving different temperature dependency of zero-field splitting α (−54. . This deviation indicates that the near-dip region shifted more than the base region above the FWHM, thereby revealing a slight distortion of spectral shape under the temperature change. The relatively large α of the 3-point method is related to the assumption of single Lorentzian shape for the ODMR spectrum. In most cases, the ODMR spectrum cannot be simplified as a single Lorentzian and there must be a deviation from the Lorentzian-based estimation in the real measurements. As described in Fig. S5, Eq. 3 holds only when the ODMR spectrum exactly matches single Lorentzian. It otherwise provides a significant deviation of the estimation from the real temperature change by a factor of upto 2 depending on the real profile. Note that, in our experiment, the observed values of α (particularly of 4-and 6-point methods) are generally smaller than previous reports of -74 kHz · K −1 probably because of the heating method in which NDs are heated from the bottom and cooled by the surrounding air. There is also a material inhomogeneity of α both for bulk diamonds and NDs [26,35,44]. However, the present comparison between the multi-point ODMR methods is not affected by the difference of the absolute values.
The precision and accuracy of the NV thermometry during the measurement are also shown in the two bottom panels in Fig. 8. The 4-and 6-point methods give almost the same precision (0.14 and 0.15 K, respectively), but the 3-point method gives a slightly large error (0.23 K). The sensitivity is then calculated as 0.9, 1.4, and 0.9 K/ √ Hz for the 4-, 3-, and 6-point methods, respectively, for this particular ND. The accuracy is not affected by the measurement methods and is < 0.5 K, which is common to all the methods. It should be noted that the present accuracy is overestimated because ∆T NV in the stable states is significantly influenced by the environmental temperature fluctuation (T air ). The potential accuracy of the NV thermometry should be better than the present value.
The presently used moving average of ∆T 4pnt NV over 20 data points can track the dynamic change of the temperature. The selection of the averaging range needs careful consideration on the sensor noise (as described in the following section of Allan variance analysis) and required time resolution. While the 20-point moving average is best for the present data, one could use other filtering techniques for some applications to achieve temporal resolution and temperature precision simultaneously, such as Kalman filter (see Fig. S11), because these filtering techniques can work more efficiently in some situations such as transient dynamics with noisy measurements [45].

D. Noise analysis of the ND quantum thermometry
We next analyze the noise profiles of the thermometry by recording the temperature time profiles for a long term in the respective multi-point ODMR methods.  Fig. S2). Region 4 sometimes includes instability of NV spin properties; for example, Fig. 9(a) shows a bumpy profile while there were no noticeable changes in the room temperature. Such long-term fluctuations are sometimes observed in our experiments of the ND quantum thermometry, which may be related to the NV stability under the optical excitation. At present,  the noise source of Region 2 has not been clarified, but it may be related to some mechanical instability of the confocal microscope. For these reasons, the time window of the moving average filter is optimal at around 40 s,  Fig. S9(c). Note also that the first plateau of noise profile until ∼ 4 s is due to the uneven time spacing of the data in the Allan variance calculation or interpolation effect as described in Appendix H.

E. Artifact analysis for the temperature estimation
We have so far analyzed the instrumental errors of our thermometry system when estimating the temperature based on the limited available information of the ODMR spectrum. Our study highlights the importance of analyzing all possible factors that may affect the frequency shift estimation. Signals from the NV quantum thermometers may be more complicated in biochemical environments including nanofiber structures [46], lipids [47,48], cells [21,22,49], and living animals [25,26,[50][51][52], which inevitably increases the possibility of sensory artifacts of NV centers. Table II summarizes such factors that may cause frequency shift, linewidth change, and fluorescence intensity, which ultimately can appear as artifacts in the multipoint ODMR measurements.
The most direct factors are the magnetic field, stress, temperature, and electric field, which have been major sensing targets of diamond NV sensors themselves [2]. The magnetic field is the most prominent factor that causes a frequency shift via the Zeeman effect. Given that the effect of the static magnetic field is canceled in the present method (Eqs. 2 and 4), slowly-varying magnetic field, including the geomagnetic field, does not seem to induce artifacts as long as the magnetic field is static for more than 1.0 s. In this context, the application of quantum thermometry to brain activity requires great care during data analysis because neural activity generates strong and complicated magnetic fields for the short time-scale via the internal electric current, which is used for magnetoencephalography (MEG) [53,54]. Conversely, the magnetometry applications of NV centers for MEG need to take into account the temperature variation in brain tissue structures. Stress is not likely to affect the ODMR measurements in most biological experiments because biological pressures are considerably small and do not affect the crystal field of NV centers in diamonds (15 kHz/MPa [55,56]). An electric field can also cause a frequency shift by inducing stark shift [57,58]. While there is a proposal to measure local electric fields in a biological context such as a trans-membrane electric field generated by the cell-membrane potential [2], a more critical situation would be when bio-molecules possessing large electron affinity are adsorbed on the ND surface, which is known to be adsorbed by various biomolecules in cells or living organisms [59][60][61]. Moreover, this mechanism has been used to manipulate the ND spin properties by surface functionalization [62,63]. The surface functionalization is thus important to control both the sensitivity and robustness of NV spins systems.
Other factors that are particular to biological environments include pH, ionic strength, and water adsorption to the ND surface. Previous experiments based on the CW-ODMR spectral measurements have confirmed that these factors do not change both the ODMR frequency and the spectral shape in a wide range of pH and various ionic solutions [24,64]. Because the precision of the CW-ODMR-based methods used in the literature was only in the range 1-2 • C, small effect within this precision range may be detected in the multipoint ODMR methods. It is also important to consider the effect of pH on the emission stability of negatively charged NV centers [65], as the emission instability causes variations in the fluorescence intensity. In addition, NDs move randomly in cells and worms by Brownian motion, intracellular transportation, and body motions [25,26,51]. These particle motions cause photon count fluctuations that may affect the multipoint ODMR measurements if the instrumental errors are not fully removed. Microwave irradiation may cause a change in the temperature during the measurement process due to microwave water heating that causes ODMR shift [66]. Because microwaves are always irradiated during the measurements and would not affect the relative temperature measurements, local heating due to some conformational changes of biological samples may change the local dielectric permeability, thereby affecting the local heat-generation rate. Such artifacts also need to be seriously considered if the observed temperature signal from the NV centers is not very straightforward.
One possible approach to validate the measured temperature data is comparing/combining the ODMR thermometry with all-optical NV thermometry [39,43,67,68] and other optical nanothermometry techniques [69][70][71][72][73][74]. Such dual or multi-modal temperature measurements is meaningful particularly for biosensing applications because of the complex factors that might cause artifacts of the ODMR shift.

IV. CONCLUSION AND FUTURE PERSPECTIVES
In this study, we reported the experimental details and protocols of real-time multipoint ODMR measurements, i.e., 4-point, 3-point, and 6-point methods. The multipoint ODMR measurement is a process used to estimate the frequency shift of ODMR based on limited fluorescence intensity data at several frequency points. Therefore, careful analysis is required with respect to the estimation of the frequency shift because unexpected factors might generate artifacts. The difference in the photoresponsivity between the photon counters is such an example. We have shown that this photo-responsivity difference originates from the intrinsic nature of NV spins. In case of using multiple DAQ boards in the 6-point method, an instrumental error of experimental hardware is further added. Both these error sources are very small and were not noticed in the previous NV quantum experiments. A careful analysis of the different types of multipoint ODMR measurements has suggested a complicated spectral distortion when the thermal shift of ODMR dip occurs. We have proposed a practical method to cancel these artifacts, whereby the artifact values are subtracted from the obtained counter values, based on the pre-characterized photo-responsivity curves. Using developed real-time thermometry, we succeeded in measuring the temperature of single NDs during stepwise temperature change. We also quantitatively compared the precision and noise sources of the 4-, 3-, and 6-point measurements. We also discussed possible noise sources and artifacts in the quantum thermometry that should be considered in biosensing experiments.
An important implication of the present study is that quantum sensing requires both, a high sensitivity and the effective implementation of sensors for realistic measurements. The present study identifies a variety of artifact sources for real-time operation. The combination of multipoint ODMR measurements and particle tracking also requires great care to avoid these artifacts. Particle tracking is a feedback process used to maximize the fluorescence counts of NDs that move away from the focus, and can be coupled with variations of fluorescence intensity derived from temperature changes in NV centers. The present particle tracking is not significantly coupled with the temperature estimation because of the constant re-positioning time. It may be coupled with thermometry when a fast particle-tracking algorithm is employed. Comparing or combining the ODMR thermometry with all-optical NV thermometry and other optical nanothermometry techniques will be important to confirm the results of the measured temperature data particularly for biosensing applications because of the complex fac-tors that might cause artifacts of the ODMR shift. Such studies on the implementation of quantum sensors into realistic measurement systems are crucial to the future development of quantum sensing. To determine the six frequency points at which the ODMR spectra are probed, the entire CW-ODMR spectral shape is considered for analysis. The analysis is divided into the following chronological steps: 1. The ODMR spectra are fitted using a sum of two Lorentzian functions of the form of to account for the increased dip splitting. Here, Y 0 , A i , Γ i ,D i denote the global offset, spectral area, HWHM, and dip frequency of the i-th Lorentzian, respectively. We however simplified this double Lorentzian using a single Lorentzian as follows: The zero-field splitting (D) is then approximately given byD This approximation is valid as the difference between D andD is one or two orders of magnitude smaller than the linewidth.
2. Each domain is then checked for a local intensity minimum, and data points to the right of the local minimum (ω > ω min1 ) in the first domain and data points to the left of the local minimum (ω > ω min1 ) in the second domain are extracted, as indicated in Fig. S1. This is an important step in case of dip splitting as it sets the ground for a solid first guess for the linear fit for the two slopes.
4. The two linear slopes of the ODMR spectrum are recognized by linear fits. In the first step of each iteration, the routine generates two datasets, of which the first misses the first element and the second misses the last element of the initial dataset. Both datasets are then fitted and the resulting residues are compared. The dataset with the lower residue is passed on. It is in this way, the extent of the linear slope is narrowed down. The routine is executed until a preset maximumallowed residue is surpassed or the amount of elements in the dataset reaches the minimum-allowed amount.
5. Two functions are formulated based on the fitting results of the two linear slopes. One additionally has the opportunity to exclude dip noise on each slope, i.e., parts of the linear function that are in close approximation to the dip and hence, likely to become noisy during experiments with high temperature changes. This is done similarly as step 3 by excluding those parts for which 6. To match the requirement for Eq. 2, I(ω − ) = I(ω + ) pairs are allocated to one another and the two frequencies ω − and ω + of the pair, that exhibit the highest collective δω within the extent of the slopes, are chosen as ω − and ω + .
7. The four frequency points, two on each slope, are uniformly distributed, i.e., equally distanced with δω from ω − and ω + , as depicted in Fig. S1. Appendix C: Calibration of TS as a function of T obj Figure S2 shows the calibration data of T S relative to T obj . The errors of the fitting parameters summarized in Table I propagate to the temperature estimation using Eqs. 2-5, which can be explicitly written as Eqs. D1-D3. Figure S3 shows the propagated errors associated with the time trace in Fig. 7(a). Appendix E: Three kinds of stepwise temperature profiles in the 6-point measurement Figure S4 shows the time profiles of each 4-point measurement conducted in the 6-point measurements. ∆ 6pnt−3 NV that probes the bottom region overestimates the temperature change by a factor of 1.7 compared to the other estimates, which indicates that the bottom region is more shifted than the base region of ODMR dip. Appendix F: Effect of the spectral shape to the temperature estimation in the 3-point method In the 3-point method, the real spectral shape of the ODMR does not exactly match the single Lorentzian profile, which causes deviations in the estimation from the real temperature change. Figure S5 shows how the spectral shape affects the temperature estimates in the 3-point method for the ND used in Fig. 8. We first formulate spectral functions by fitting the experimentally measured ODMR spectrum (Fig. S5 (a)) with single Lorentzian, Gaussian, and pseudo-Voigt function. In addition, we formulate an interpolated function of the experimental spectrum. Second, we shift these four spectral functions by assuming the temperature change and numerically estimate ∆T by the 3-point method (Eq. 3). When the spectral shape is a perfect single Lorentzian, the 3-point method correctly estimates the temperature change as shown in Fig. S5 (b). However, the 3point method exhibits a significant deviation from the Lorentzian-based estimation when the spectral shape is Gaussian or pseudo-Voigt (Figs. S5 (c) and (d)). The deviation increases as the frequencies of I − and I + increase. In case of the real ODMR spectrum (interpolated function), the deviation is significant.
The off-resonant photon count (I 0 ) also affects the temperature estimation. Figures S6 (a) and (b) show the time profiles of the stepwise temperature change measured by the 3-point method with microwave irradiation at 2.65 GHz is ON or OFF during the measurement of I 0 , respectively. The irradiation of far off-resonant 2.65 GHz should provide the same photon count as when no microwave is irradiated; however, I 0 is slightly increased (∼ 1%) when the 2.65-GHz microwave is irradiated in the present experiment, which results in the increase of the temperature dependency of the zero-field splitting (α) from -95 to -105 kHz/K. Such a change of I 0 by the microwave irradiation is most probably caused by the high-frequency noise exerting to the piezo stage or other electronic devices because we have detected a very small level of noise effect to electronics of piezo stage (for example, the microwave irradiation seems causing the positional shift of laser focal point on the order of 10 nm). While it could be possible to completely remove such high-frequency noise, the observed sensitivity of the 3-point method to the variation of I 0 may affect the measurement when working on biological samples because water has absorption still at 2.65 GHz. The observed sensitivity of the 3-point method fundamentally comes from the fact that I 0 term is not completely cancelled in Eq. 3 in contrast to the formulation of Eq. 2. In addition to the sensitivity to I 0 , the selection of the two frequency points on the slope of the ODMR dip affects the estimation as shown in Figs. S6 (c) and (d) where the two frequencies were set to the FWHM and FWQM points. In particular, the measurement at FWQM points is very noisy because of the small contrast of ODMR between the off-resonance and measurement points.
For the four NDs in total, we plot the ratio of α between the 4-and 3-point methods against the theoretical deviation that can be expected from the spectral shape by temperature shifting of the interpolated function as shown in Fig. S7. The experimental deviation is usually larger than the theoretical deviation. This trend suggests that there should be other causes to the present excessive estimation of α of the 3-point method. A more thorough analysis will be necessary in the future development of the 3-point method.  Figure S8 shows the slopes of the linear fits to each stage of Allan variance data for the 4-point measurement of Fig. 9(b). For a more detailed understanding of the 6-point analysis, we plot time profiles of ∆T 6pnt−1 NV , ∆T 6pnt−2 NV and ∆T 6pnt−3 NV for the data shown in Fig. S9(c). The noise level varies for the three types of ∆T 6pnt NV . ∆T 6pnt−3 NV yields the smallest noise and the noise increased in the order of ∆T 6pnt−1 NV and ∆T 6pnt−2 NV . The Allan variances of these three types are shown in Fig. S9 (f). Although the magnitude of σ(τ ) is different for these three types, their variance profiles are the same, which indicates that they share the common noise sources. It should be noted that the present 6-point method provides the same precision compared to the 4-point method because of the estimation function for the frequency shift of ODMR line. Rather, the 6-point method is used to understand the experimental hardware necessary for the multipoint measurements. In principle, the 6-point method provides more information than the 4-point analysis and may be useful to estimate the change of whole spectral shape of ODMR if one uses some designated equations.
Appendix H: Effect of the tracking period In the main text, the positional tracking is performed every t track = 4 s. This tracking time can be varied if the positional drift is sufficiently small during the tracking period. Figures S10 (a) and (b) show Allan variance profiles of the thermometry stability data for t track = 4, 8, 16 s where the interval time is included by interpolation and ignored to regard it in 1-s constant sampling time, respectively. Interestingly, all of them exhibit the flat noise profile until 4 s regardless the tracking period as seen in Fig. S10(a) and, in case of ignoring the tracking time, they show almost the same profile until 80 s as in Fig. S10 (b). The noise profiles between 4 and 80 s differ when including the tracking period via interpolation as in Fig. S10 (a). These results indicate that the first plateau of noise profile until 4 s is due to the uneven time spacing of the data in the Allan variance calculation or interpolation effect. k, prior estimate of p k , and the noise covariance of the prediction and measurement, respectively. The measurement and prediction errors are considered to be Gaussian. Figure S11 compares the time profiles of ∆T NV with the 20-point moving average and the Kalman filter with σ 2 p = 0.01 and σ 2 m = 1. The two types of filters effectively extract the dynamics of temperature in the ND quantum thermometry and match each other. While the 20-point moving average filter is sufficient to extract the temperature dynamics as long as working on this kind of step temperature change, the successful implementation of the Kalman filter should be useful for more realistic transient temperature dynamics that we cannot predict easily. Furthermore, a recent demonstration of active feedback quantum thermometry by coupling with a magnetic field may harness the rapid estimation protocol of the Kalman filter.