Magnetic-field-learning using a single electronic spin in diamond with one-photon-readout at room temperature

Nitrogen-vacancy (NV) centres in diamond are appealing nano-scale quantum sensors for temperature, strain, electric fields and, most notably, for magnetic fields. However, the cryogenic temperatures required for low-noise single-shot readout that have enabled the most sensitive NV-magnetometry reported to date, are impractical for key applications, e.g. biological sensing. Overcoming the noisy readout at room-temperature has until now demanded repeated collection of fluorescent photons, which increases the time-cost of the procedure thus reducing its sensitivity. Here we show how machine learning can process the noisy readout of a single NV centre at room-temperature, requiring on average only one photon per algorithm step, to sense magnetic field strength with a precision comparable to those reported for cryogenic experiments. Analysing large data sets from NV centres in bulk diamond, we report absolute sensitivities of $60$ nT s$^{1/2}$ including initialisation, readout, and computational overheads. We show that dephasing times can be simultaneously estimated, and that time-dependent fields can be dynamically tracked at room temperature. Our results dramatically increase the practicality of early-term single spin sensors.

Quantum sensors are likely to be among the first quantum technologies to be translated from laboratory setups to commercial products [1]. The single electronic spin of a nitrogen-vacancy (NV) centre in diamond operates with nano-scale spatial resolution as a sensor for electric and magnetic fields [2-6]. However, achieving high sensitivities for NV-magnetometers has required a low noise mode of operation available only at cryogenic temperatures, which constitutes a major obstacle to realworld applications [7,8]. Machine learning has played an enabling role for new generations of applications in conventional information processing technologies, including pattern and speech recognition, diagnostics, and robot control [9, 10]. Here we show how machine learning algorithms [11][12][13][14] can be applied to single-spin magnetometers at room temperature to give a sensitivity that scales with the Heisenberg limit, and reduces overheads by requiring only one-photon-readout. We go on to show that these methods allow multiparameter estimation to simultanesouly learn the decoherence time, and implement a routine for the dynamical tracking of time-dependent fields.
Magnetic field sensing with an NV centre uses Ramsey interferometry [1,15,16]. With a microwave π/2pulse the spin vector is rotated into an equal superposition of its σ z spin eigenstates, such that its magnetic moment is perpendicular to the magnetic field (B) to be sensed [17,18]. For some Larmor precession time, τ , and frequency, f B = γB/2π, the relative phase between the eigenstates becomes φ = 2πf B τ , where γ is the elec-tron gyromagnetic ratio of magnetic moment to angular momentum. After a further π/2-pulse to complete the Ramsey sequence, a measurement of the spin in its σ z basis provides an estimate of φ, the precision of which is usually improved by repeating the procedure. Collecting statistics for a series of different τ , produces a fringe of phase varying with time, from which B can be inferred.
Increasing the sensitivity of a magnetometer translates to increasing its rate of sensing precision with sensing time. The intrinsic resource cost in estimating B is the total phase accumulation time [19][20][21], which is the sum of every τ performed during an experiment. A fundamental limitation on the sensitivity of an estimate of B is quantum projection noise -from the uncertain outcome of a σ z -basis measurement -the effect of which is conventionally reduced through repeated measurements, at the cost of increasing the total sensing time. A further typical limitation on sensing precision is the timescale, T * 2 , on which spin states decohere due to inhomogeneous broadening (though spin-echo methods could extend this [22]). In an idealised setting, with an optimal sensing protocol, the Heisenberg limit (HL) [23] in sensitivity can be achieved, to arrive at a precision limited by T * 2 in the shortest time allowed by quantum mechanics. In practice, overheads such as the time required for initialisation, computation, and readout must also be accounted for, while repeated measurements due to experimental inefficiencies and low-fidelity readout increase the time to reach the precision limited by T * 2 . The increase in total sensing time due to overheads and repeated mea-surements thus decreases the sensitivity.
A particularly relevant overhead is the time taken to readout the state of the spin, which depends on the experimental conditions. At cryogenic temperatures, spin-selective optical transitions can be accessed such that, during optical pumping, fluorescence is almost completely absent for one of the spin states. This single-shot method allows the spin state to be efficiently determined with a high confidence for any given Ramsey sequence (up to collection and detection efficiencies), resulting in a relatively low readout overhead. At room temperature, in contrast, where spin-selective optical transitions are not resolved in a single shot, readout is typically performed by simultaneously exciting a spin-triplet that includes both basis states, and observing fluorescence from subsequent decay, the probabilities for which differ by only ≈ 35%. Overcoming this classical uncertainty (in addition to quantum projection noise) to allow a precise estimate of the relative spin state probabilities after a given precession time τ , required repeated Ramsey sequences to produce a large ensemble of fluorescent photons. Such a large readout overhead significantly reduces the sensitivity of NV-magnetometry, and so far, the high sensitivities reported at cryogenic temperatures have been out of reach for room temperature operation by several orders of magnitude. Yet NV-sensing at cryogenic temperatures is impractical for biological applications such as in-vivo measurements [2] and monitoring of metabolic processes [24].
A large body of work [8,18,20,23,[25][26][27][28][29] has developed and improved quantum sensing algorithms to surpass the classical standard measurement sensitivity (SMS). While the SMS bounds the sensitivity that can be achieved for NV-magnetometry with constant phase accumulation time, phase estimation algorithms using a set of different precession times τ i , allow the SMS to be overcome [18,20]. Further improvements in sensitivity are possible by adapting measurement bases, to require fewer Ramsey sequences [8]. However, sensing algorithms that use a standard Bayesian approach typically involve probability distributions that are computationally intensive to update, or which contain outlying regions that significantly affect an estimate. An appealing alternative [12,13,30] uses techniques from machine learning to approximate a probability distribution with a relatively small collection of points, known as particles. These methods have been applied to the problem of learning a Hamiltonian [13,31], and to implement noise-tolerant quantum phase estimation [32].
Here, we experimentally demonstrate a magnetic field learning (MFL) algorithm that operates with on average only one photon readout from a single NV centre at room temperature, and achieves a level of sensitivity so far only reported for cryogenic operation [8]. MFL adapts efficient Bayesian phase estimation and Hamiltonian Learning techniques for magnetometry to achieve a fast convergence to the correct value of the magnetic field, and requires no adaptation of measurement bases. The parameters of our MFL algorithm, including the number of particles, can be optimised prior to operation without adding to the sensing time overhead. Each precession time τ i is chosen [33] as the inverse of the uncertainty σ i−1 in the algorithm's previous estimate of B, allowing τ to grow exponentially to achieve HL scaling in sensitivity. We tested MFL on a large data set from 60, 000 Ramsey interferometry experiments on a bulk diamond NV centre. We benchmark the performance of MFL against standard FFT methods, as well as previous experimental results from other phase estimation algorithms. Simultaneous to the learning of B, MFL produces an estimate of T * 2 , which, in contrast to other phase estimation algorithms, allows MFL to lower bound its sensitivity to the SMS, however long its implementation runtime. Remarkably, we show that MFL enables the dynamical tracking of time-varying magnetic fields at room temperature.
In general, Hamiltonian learning algorithms estimate the parameters x of a model HamiltonianĤ( x), through iterations of informative measurements [13]. At each step, a prior probability distribution P ( x) stores estimates of every parameter and its uncertainty [12]. Similarly, the four principal recursive steps of MFL, called an epoch and depicted in Fig. 1(a-d), are: (a) Choose τ i for the next Ramsey sequence from the heuristic τ i 1/σ i−1 , where σ 2 i−1 is the uncertainty embedded in the prior P ( x i−1 ). (b) Allow the system to evolve underĤ for a time τ i , using the Ramsey sequence shown in Fig. 1(e-h). (c) Measure the outcome E, extracted from the photoluminescence count, e.g. Fig. 1(i). (d) Update the prior using Bayes' rule, P ( x|E) ∝ L(E| x; τ )P ( x), where L is the likelihood function [12]. The use of sequential Monte Carlo algorithms [12,13,30] where particles are reallocated when required, makes the inference process practical and computationally efficient. Here, the Hamiltonian for the two relevant NV states is modelled aŝ so that ω is the only parameter to be estimated to learn the value of B. Experiments were performed using a confocal set-up, at room temperature, with an external magnetic field of ≈450 G, parallel to the NV centre axis, giving a Zeeman shift of ω = γB [20], where γ ≈ 2π × 28 MHz/mT [35]. For each Ramsey sequence, the electronic spin is initialised and readout with 532 nm laser pulses, by detecting the photoluminescence (PL) signal with an avalanche photodiode (APD) for 350 ns. The PL signal is then normalised to extract an experimental estimate for L. For every sequence, the experimental overhead is the sum of the times for the laser pulses length (3 µs), an idle time for relaxation (1 µs), a short TTL pulse for synchronization (20 ns) and the duration of the two MW-pulses (together ≈ 50 ns).
Data for several hundred Ramsey fringes were generated from experiments on three NV centres, labelled α, β and (see Supplementary Table S1 the dataset 1 comprises Ramsey sequences for precession times increasing from τ 1 to τ 500 in steps of 20 ns. For each τ i , 20275 sequences were performed, and data were stored such that the results from each individual sequence could be retrieved. Therefore, 20,275 M 500 subsets of data from 1 could be selected and combined to construct fringes comprised of M sequences. Running MFL on a sample of these subsets allowed its performance to be compared over fringes with different (but fixed within a fringe) numbers of sequences including down to M = 8, where (due to low collection efficiencies) the average PL count (n phot ) is approximately one photon. Additional experiments on the three NVs generated further data sets for several hundred fringes that each comprised tens of thousands of averaged sequences. All implementations of MFL are reported as representative behaviour averaged over R =1000 independent protocol runs (unless otherwise stated) each using a single fringe from these data sets.
We begin by analysing how the estimate of uncertainty in the magnetic field, σ(B est ), given by the variance of P ( x), scales with the number of MFL epochs. For this FIG. 2. Experimental results for scaling of precision. Lines represent median values, and performance within the 68.27% percentile range is shown as shaded areas. (a) Estimated uncertainty σ(Best) is plotted as a function of the epoch number; data from one sample run is shown as blue circles. In the inset, a plot of the final σ(ωest) in the Ramsey frequency for a typical protocol run, from FFT (Lorentzian fit) and MFL (Gaussian fit). (b) The scaling of precision with total phase accumulation time T , excluding all overheads, is shown as density plots with a linear least-squares fit (blue dashed line). The FFT approach is plotted as a grey dashed curve. Scaling for phase estimation algorithms in Refs. [18,20] (respectively green and violet lines) are also reproduced. The inset shows data from a Ramsey fringe in normalised PL, with a 20 ns sampling rate, up to τmax ∼ 0.14 T * 2 . A least-squares-fit with a decaying sinusoid is shown as a blue dashed line.
purpose, we use the dataset α 1 , with 120 fringes all obtained with M = 18500 sequences. At every MFL epoch, given the adaptively chosen phase accumulation time τ i 1/σ i−1 , the experimental datum with τ minimising (|τ − τ i |) is provided to the MFL updater. Figure 2(a) shows an exponential decrease in the scaling of σ(B est ), until ∼ 50 epochs are reached. After this point, the precession times τ selected by MFL saturate at τ max = 10 µs, and σ(B est ) is reduced only polynomially fast, by accumulating statistics for τ already retrieved. This slowdown is analogous to that occurring when the heuristic requires τ exceeding the system dephasing time [12] (see Supplementary Information for details). A comparison with FFT methods, inset in Fig. 2(a), finds that σ(B est ) is ∼ 40 times smaller for MFL.
Neglecting overheads, the sensitivity η of a magnetometer, is calculated from where T := N i τ i from N epochs. Figure 2(b) plots η 2 against T , for each epoch, and compares MFL with the standard FFT method, using the same α 1 set. The precision of MFL scales as T −0.99±0.02 , which overlaps with HL scaling (∝ T −1 ). The FFT method rapidly approaches the SMS (∝ T 0 ), whereas (neglecting overheads) the scaling reported for quantum phase estimation methods are qualitatively comparable to MFL, at the expense of more intensive post-processing [29].
For a true measure of absolute sensitivity, experimental and computational overheads must be accounted for. Including initialisation, read-out and computation time, into the total running timeT , we redefine Eq. 2 for absolute scaling ofη (see Methods for details). The average number of luminescent photons, n phot , used for readout during each epoch, scales linearly with the number of sequences M (n phot ∝ M ); on average, one photon every M 8 sequences is detected. As shown in Fig. 3, we use MFL to measure the scaling ofη withT (up to 250 epochs) for decreasing numbers n phot within each epoch. The plots have a shape characterised by an initial slow decrease, followed by a fast increase in precision. The relatively slow learning rate for the short phase accumulation times in the early stages of the algorithm leads to a slow increase in phase accumulation time, since (τ i ∝ 1/σ i−1 ). The algorithm is slowly learning but the total measurement time is increasing faster than the decrease in uncertainty. However, when the particles start converging to a valid estimate of B, the uncertainty decreases exponentially, overcoming the corresponding increase in sensing  times. Our analysis compares well with previous results performed under cryogenic conditions, and scaling parameters for linear least squares fitting obtain a consistent overlap with HL scaling for protocol update rates up to 13 Hz.
Decreasing the number of sequences (thus n phot ) per epoch increases the statistical noise, which extends the slow learning period. However, the total timeT decreases with n phot to produce an increased sensitivity in a shorter time. For n phot < 4, readout infidelities and losses become the dominant noise mechanisms. In the case for n phot = 1 therefore, these additional sources of noise were included in the model. For n phot = 1 we obtain a sensitivity of 60 nT s 1/2 in ∼ 10 ms (see also the Supplementary Information).
When an NV-sensing algorithm begins to request precession times τ i beyond T * 2 , where no information can be retrieved, the effectively wasted sensing time reduces the sensitivity. Knowledge of T * 2 can ensure that all τ i are less than T * 2 , to prevent this reduction in sensitivity and instead guaranteeing it to scale at the SMS for long sensing times. Learning T * 2 simultaneously with B, as part of a multi-parameter estimation strategy [36,37], can be more efficient than independently estimating T * 2 ahead of each sensing experiment. MFL naturally operates as a multi-parameter estimation protocol when the prior probability distribution P ( x) is multivariate [12], with the uncertainty in its joint probability distribution captured by a normalised covariance matrix Σ.
Each precession time τ i is chosen proportionality to the inverse of the (Frobenius) norm of the covariance matrix (see Methods). This can incur an initial slow learning period due to shorter τ i being initially most useful in estimating B while longer precession times are better for an estimation of T * 2 . We therefore begin MFL in the single parameter estimation mode for B, and introduce the simultaneous learning of T * 2 at epoch N = 100 (chosen empirically). Figure 4 shows results from running the MFL algorithm on the β 1 data set, where τ max > T * 2 . As is the case for single parameter estimation results, we find an exponential scaling of the generalised uncertainty with the number of epochs, though the learning rate for B is faster than that for T * 2 . There is a discrepancy between the estimate of T * 2 from MFL shown in 4(a) and the fit (non-weighted least-square) to the decaying sinusoid shown in 4(d). The discrepancy between these two estimates results from the PGH preferentially requesting τ i < T * 2 , such that an estimate of T * 2 is more informed by data at these relatively shorter time scales (see Methods).
The strength of B may not be fixed in time for typical sensing experiments [34]. The Bayesian inference process is conceived to learn on-line when experimentally retrieved likelihoods P (E| x) conflict with its prior information. Thus, the ability to track time-varying magnetic fields follows naturally from the MFL's processing speed and adaptivity. With minor controls in the Bayesian inference procedure, MFL can account for such fluctuations and high-amplitude changes in the sensed B (See Methods for details). Here, we test an algorithm that tracks a B set -field using the 3 dataset, where B set was experimentally modulated by changing the position of the permanent magnet (see Fig 1b). Data recording was paused during magnet adjustments, leading to stepwise transitions in this data set, where the magnetic field instantly jumps to a new strength then remains stable for a period of between hundreds and thousands of milliseconds. 0.2 ms, with the remaining time costs coming from experimental routines. We note the computational efficiency of MFL allows a computational overhead (τ comp =0.21 ms) that is smaller than the average phase accumulation time (τ =0.41 ms) and two orders of magnitude smaller than the experimental overheads (τ exp =16.28 ms).
Figure 5(c) shows numerical results demonstrating the resilience of MFL against a dynamic component of increasing frequency, when tracking an A.C. oscillating field B(τ ) = ω(τ )/γ, where we choose ω(τ ) = ω 0 + w cos(ντ ), with ν a constant and w ω 0 . The effectiveness of the tracking for each run is captured by , averaged for all N epochs performed, capturing the efficiency of the tracking as B is not constant along epochs. Typical estimation errors in B are lower than 3% for dynamic components up to 18 µT/ms.
The performance of magnetic field learning found for our room temperature set-up is comparable to other protocols in cryogenic environments [8]. These methods could be applied to other sensing platforms where noise has been a limiting factor. Alternatively, in pursuit of the fundamental limits in absolute sensing precision they could be used together with single-shot readout [7], adaptive measurement bases [8], faster communication, and dynamical decoupling techniques [38,39]. Our methods would be particularly effective in applications where single-spin sensing is desired for nanoscale resolution, but where cryogenic conditions are prohibitive, such as biological sensing and in new nano-MRI applications [6, 40].

MFL execution
The data processing was performed by adapting the open Python package QInfer [41] to the case of experimental metrology.
In order to describe experimental data from Ramsey fringes collected from an NV centre with dephasing time T * 2 , immersed in a magnetic field of intensity B, we adopt the likelihood function as in [12]: where T * 2 is a known parameter, or approximated by T * 2 = ∞ in all cases when T * 2 τmax. In cases when M > 1, the datum adopted was obtained from M combined sequences as stated in the main text. Results in Fig. 2a&b and Fig. 5 were all obtained adopting a majority voting scheme to pre-process data from combined sequences [32]. Majority voting decides each single-shot datum according to the most frequent outcome. This is done by previously determining, during the characterisation of the experimental set-up, the average photoluminescence counts (n) detected throughout the execution of a Ramsey sequence. The datum of a single outcome is determined by comparing the number of photons detected during the measurement (extracted from M sweeps), n, andn. If n >n then we set the value of the outcome to |1 , otherwise to |0 . Without this scheme in place, the outcome of a measurement is assigned sampling from the set {|0 , |1 }, with probabilities P ∝ {1 − n/nmax, n/nmax}, respectively, with nmax the maximum photoluminescence counts estimated during the characterisation.
Other than the study ofη in Fig. 3, further examples of the performance of MFL with no majority voting scheme in place are reported in the Supplementary Information.
Errors in the precision scaling are estimated from a bootstrapping procedure, involving a sampling with replacement from the available runs (R). The cardinality of each resample matches R. The resampling is repeated 0.1 R times. Median precision scalings from each resample are estimated, and the standard deviation from this approximate population of scaling performances is provided as the precision scaling error.
Absolute scaling In Fig. 3 we reported the absolute scaling of η 2 = σ 2 (Best)T , which requires to take into account the main experimental and computational overheads contributing to the total running timeT of a phase estimation (PE) protocol (communication time τcomm is not considered here). In particular, these can be listed as: the time required by the PE algorithm to compute the next experiment τ comp (here ∼ 0.4 µs per step, per particle on a single-core machine), the duration of the laser pulse τ las for initialisation and readout (3 µs in total), the waiting time τ wait for relaxation (1 µs), a short TTL pulse τ T T L for the photodetector (20 ns) and the duration τ M W of MW-pulses (approximately 50 ns in total). Including variable and constant overheads, we obtain: after N epochs of a PE algorithm.
Multi-parameter Learning For the multi-parameter case, we use again Eq. 3, but now considering the unknown parameters x = {B, T * 2 }. Each precession time τ i is chosen proportionally to the inverse of the Frobenius norm of the covariance matrix, Σ F = cov(B/b, T * 2 /t 2 ) F . The parameters b = max B:P (B) =0 B and t 2 = min T * 2 :P (T * 2 ) =0 T * 2 are introduced to render Σ F dimensionless, with P the prior at epoch N = 100, when both parameters start to be learnt simultaneously. In this analysis this corresponded to b = 11 µT and t 2 = 20.2 µs, however we stress how different choices would be possible, with equivalent results for Σ F , up to a normalisation factor. We observed that MFL estimates of the dephasing time may differ consistently from a non-weighted leastsquare fit. In the presence of dephasing, the heuristic of MFL will preferentially adopt experiments with τ < T * 2 . This relation is similar to a weighing mechanism of the data (see also Supplementary Information ), preferring more consistent observations. On the other hand, a least-square fit will attempt to equally mediate over data-points where the contrast in the fringes is almost completely lost, underestimating T * 2 . MFL tracking We mentioned that Bayesian inference processes are ideally suited for tracking purposes. However, we observe that in cases where the magnitude of the changes in the parameter x completely invalidates the a-posteriori credibility region, the recovery time of a standard Hamiltonian learning protocol might be unsuitable for practical applications. To tackle also this situation, we modified here the standard update procedure to reset its prior when the effective sample size of the particles' ensemble is not restored above the resampling threshold by a sequence of resampling events. Details and a pseudocode are provided in Supplementary Information.
FFT execution For most analyses, FFT estimates were run against the whole datasets available. For example in the case of Fig. 2, the final estimate provided by a single run of FFT was performed using once all of the 500 phase accumulation times, recorded with 20 ns steps, for a representative subset among those available in α 1 (Supplementary Table 1). We emphasise how this amounts to twice as many τ 's as those actually used by the MFL algorithm (being the single-run estimate reported as converged after 250 epochs).
The only exception is the tracking in Fig. 5, where the datapoints were cumulatively added to the dataset. In such tracking applications, as long as B is kept constant, the estimate from FFT compares to MFL in a way similar to Fig. 2. However, FFT keeps estimating B from the prominent peak in the spectrum, corresponding to the ω that was maintained for the longest time, not the most recent. Thus, it fails to track changes as they occur.
Experimental details In Ramsey interferometry, as performed here, we measure the magnetic field component parallel to the NV centres' symmetry axes. However, the MFL protocol can be expanded to differently orientated NV centres, to detect arbitrary orientated magnetic fields.
The experiments are performed here using two different 12 C isotopically purified diamond samples. For the Ramsey interferometry we use the ms = 0 and ms = −1 electronic sublevels.
Photon number estimation After exciting a single NV centre by a 532nm laser pulse, the red-shifted, individual photons were detected by an avalanche photodiode. A time-tagged single photon counting card with nanosecond resolution was used for recoding. A TTL-connection between the time-tagger and the MW pulse generator synchronises the photon arrival time with respect to the pulse sequence and allows to record the number of detected photons for every single laser pulse. Thereby, the photon detection efficiency is mainly limited by the collection volume, the total reflection within diamond (due to the high refractive index) and further losses due to the optics. This results in a photon detected about every eighth laser pulse. Thus, to readout the NV state with high-fidelity (and about 30% contrast) multiple measurements are usually required for meaningful statistics. Acknowledgements

CLASSICAL AND QUANTUM LIKELIHOOD ESTIMATION
In the main text we have introduced the Bayesian inference process underlying the MFL protocol (known as CLE, Classical Likelihood Estimation[12]) as composed by four main steps. Here we expand the discussion to provide additional details and comments about the adoption of CLE.
1. At each epoch the prior distribution P ( x) is used to choose what experimental setting to use for the next iteration.
In MFL, the only experimental setting is the phase accumulation time τ , that can be updated effectively using the so called particle guess heuristic (see the section below).
2. The quantum system undergoes an appropriate evolution, according to the HamiltonianĤ.
(a) The system is prepared in an appropriate initial state |ψ , chosen such to have informative evolution under H. E.g. a state orthogonal to the Hilbert subspace spanned by the Hamiltonian eigenstates. We remark how |ψ is not adaptive in CLE. In this work, the NV centre is always prepared in |ψ = |+ . (b) It is thus possible to apply Bayes' rule: where P ( x) can be immediately inferred from the prior at the corresponding epoch, while P (E) is a normalization factor.
Steps 1 -4 are repeated until the variance of the probability distribution σ( x est ) converges, or falls below a predefinite target threshold. In cases with limited readout fidelity like for the NV centre set-up presented here, in step 3 a meaningful statistics might be cumulated for E repeating the same measurement a number of times M , as suggested in the main text. Evidences from the text suggest that in most cases, this is a suboptimal choice for the absolute scaling performance of the MFL protocol. Note how only steps 2 & 3 involve the quantum sensor. All other steps require instead a simulator. In particular, step 4(a) can be performed on a classical or quantum simulator, the choice of the second being justified whenever the size of the sensor, and the eventual lack of an analytical model to simplify the evolution, make the system simulation classically expensive. In this case, the inference process is known as Quantum Likelihood Estimation (QLE, [30]).  τmax, reported in the plots in darker and brighter colours, respectively. The first dataset is collected with B 58 µT , whereas the second has B 6 µT a, Renormalized photon counts along two different Ramsey experiments with the same NV centre (scatter plots). Superimposed a least-square fit (dashed lines), adopting the oscillatory function with depolarizing noise as in Eq. 3 of Methods. b, Estimated uncertainty σ(Best) and ratio between PGH-generated time and τmax as available from the first dataset, plotted against each epoch of the MFL algorithm. A majority voting method is adopted, under the hypothesis that T * 2 τmax. Solid lines are median values calculated over 1000 independent runs, whereas shaded areas are 68.27% percentile ranges centred around the median. Superimposed as a scatter plot, a sample of times generated by the PGH during an representative run. c, Same as in b, for the case where T * 2 τmax. No majority voting is in place, and data from the experiment are extracted probabilistically from the experimentally estimated likelihoods.

SEQUENTIAL MONTE CARLO APPROXIMATION AND PARTICLE GUESS HEURISTIC
The protocol performances in terms of computational overhead are made possible by adopting advanced approximate Bayesian inference methods. In particular, MFL inherits from CLE the Sequential Monte Carlo (SMC) approximation [12][13][14]. Within this approximation only a finite number of values x i (called particles) are sampled from the prior distribution, and thus used to approximate the prior in each update step. This approximation makes the Bayesian update as in Eq. S1 numerically feasible.
If the particle positions x i were held constant throughout the inference process, and starting the protocol from a uniform prior, the cardinality of their set should scale approximately as n part := |{ x i }| ∝ ∆B/σ(B est ), where ∆B is the expected magnetic field range to be sensed, andσ(B est ) is the targeted uncertainty upon convergence. This is inefficient, as with the inference progressing through epochs, many particles will provide very limited support to the updated prior approximation. Indeed, for a successful learning process lim N →∞ P ( x i ) → 0 for the weights w i ∝ P ( x i ) of most particles, as they have been effectively ruled out by the observations. This inefficiency can be addressed with resampling methods, that allow the particles to be sampled again from the updated posterior, whenever their weights signify that the effective size of the sampled particles i w 2 i has fallen below a user-defined threshold. Following [41], here we adopt a Liu-West resampler (LWR) with optimised resampling threshold t resample = 0.5 and smoothing parameter a= 0.9. These parameters allow to tune when and to what extent the positions of the particles can be altered by the LWR [12]. Hence, it was possible to accurately represent P ( x) throughout the whole protocol execution, whilst employing not more than 1500 particles for the discretisation in most cases. The only exceptions were limited fidelity cases, as for the absolute scaling we chose the number of particles with M the number of averaged single sequences selected. This increase in the number of particles can be justified by a corresponding reduction in the risk of "aggressive" resampling leading to inference failures, heralded especially by multi-modality in the parameter distribution [14]. The particle guess heuristic (PGH) plays a fundamental role in the effectiveness of the MFL protocol. PGH was introduced in [13] to provide optimal choice of the experimental τ (here the phase accumulation time) in analytically tractable cases of Hamiltonian Learning protocols. Such cases happen to include the sensing Hamiltonian presented in the main paper as Eq. 1. The PGH samples two particles { x 0 , x 1 } from the particle distribution P ( x i ), and then chooses: In the single parameter where only B is sensed, τ 1/σ(B est ), where σ(B est ) represents the standard deviation of the Gaussian-approximated posterior distribution P (B). Intuitively, this corresponds to selecting longer, more informative accumulation times, as the estimated uncertainty about the parameter to be learned shrinks.

THE ROLE OF T * 2 IN TIME ADAPTIVITY
In the main text, we observed the emergence of a slowdown in the learning rate, when MFL chooses accumulation times τ i ≥ τ max . This slowdown appears when plotting either the scaling in σ(B est ) as well as η (respectively Fig. 2a&b, referring for example to the dataset α 1 ). This dataset represents a situation, where τ max T * 2 is chosen as a maximum time budget per-epoch. In this case, once the PGH encounters the τ max limit, learning by statistical accumulation of data-points with τ τ max will occur, and MFL precision scaling will tend to η 2 ∝ 1/ √ τ max T . We highlight this behaviour, using averaged sequences from the whole set 1 , in Fig. S1b, plotting the scaling in σ(B est ) alongside with the ratio τ /τ max . When the plateau in σ(B est ) occurs, we correspondingly observe that a typical run of MFL starts suggesting τ ≥ τ max , before saturating as the uncertainty converges.
Note, this artefact deriving from the artificial choice of a maximum time budget τ max is equivalent to the phenomenon exhibited in correspondence of dephasing noise, reducing the contrast from experimental Ramsey fringes, like in the data reported in Fig. 4. To prove it, we show in Fig. S1c the same performance for the dataset 2 , where τ max = 100 µs 1.6 T * 2 (estimated from a least-squares fit), and B =6 µT to have approximately the same number of periods in the corresponding Ramsey fringe, as in dataset α 1 (refer to Fig. 4). For MFL to deal properly with decaying data, in this analysis we remove any majority voting scheme from the data processing, and at each epoch the corresponding datum E is probabilistically extracted from the experimentally estimated likelihood (see Methods). This justifies the slowdown in the scaling of σ(B est ), as each data-point is now affected by the same amount of binomial noise that would occur in a set-up with the same readout fidelity, but single-shot measurements. In other words, the additional information acquired about the likelihood L(B; τ ) by combining M 1 sequences for each measurement is partially removed from the inference process by the bipartite E sampling. For this case, we observe the plateau in σ(B est ) occurs when the adaptive choice of the phase accumulation time saturates in average to τ T * 2 (though a single run will oscillate around this value, as emphasised by the behaviour for a single run also reported in Fig. S1c). Similarly, also the scaling in precision plateaus when τ T * 2 (not reported for brevity), slowing towards η 2 ∝ 1/T * 2 . A formal discussion of this saturation is performed in the following section.

PRECISION BOUNDS AND SENSITIVITY
In assessing the performance of MFL, it is helpful to compare the uncertainties achieved with those achieved by FFT, using the same datasets. We begin by considering the Cramér-Rao bound [42]. Suppose that our procedure implements a functionB(data) of the entire data record (data) that estimates the true magnetic field B. After, we want to minimise the squared error L = (B(data) − B) 2 as much as possible. IfB as the average ofB(data) − B over all possible data records is zero, then we state that our procedure is unbiased. In this case, the Cramér-Rao bound provides, that on average over all data records one finds [43], where τ i is the evolution time used at the ith step of the MFL procedure. We stress that this inequality holds only on average; after all, we might be "lucky" with the estimate that we assign to any particular data record. The right-hand side of this inequality is derived using the Fisher information I for a single measurement, (S6) The Fisher information for an experiment consisting of multiple independent measurements is given by the sum of the Fisher informations for each measurement, giving the Cramér-Rao bound (Eq. S4). Let T be the total phase accumulation time used for a single "run" of a magnetometry procedure; in our case, T = i τ i . By the above argument, L can then scale no better than T −2 , corresponding to consolidating our phase accumulation into a single measurement. This observation is sometimes referred to as the Heisenberg limit for magnetometry.
At the other extreme, suppose that we have a total time budget of T , that we are able to spend on a magnetometry experiment, such that we can consider repeating a given procedure N = T /τ times. The factor of N then factors out of the Cramér-Rao bound, giving The observation that L ∝ 1/T is sometimes referred to as the standard quantum limit,in the case that we repeat a magnetometry procedure for N independent iterations. Indeed, we can use this observation to motivate a general figure of merit for the time budget of the a given magnetometry procedure. Assume that the Fisher information for a given procedure is I = N I 0 , where I 0 (τ ) is the Fisher information for a single repetition using phase accumulation time τ . Then it follows I = T (I 0 /τ ). Next, we define η(T ) := I 0 (T )/τ as the sensitivity of the proposed magnetometry procedure.
Using this definition, we can then restate the standard quantum limit as the statement that η 2 (T ) is constant in T . That is, a magnetometry procedure bound by the standard quantum limit gains no advantage from phase accumulation time beyond that conferred by repeating the entire procedure for R independent runs. By contrast, a Heisenberg limited magnetometry procedure has a sensitivity which scales as η 2 (T ) ∝ 1/T , indicating that an additional advantage can possibly be gained by using longer phase accumulation times.
So far we have considered the case in which T * 2 T , such that we can approximate the dynamics of our magnetometry experiment as dephasing-free. The dichotomy between the Heisenberg and standard quantum limit scalings, however, is changed by dephasing such that we have to consider the definition of the sensitivity in the dephasing-limited case. In particular, Ref. [43] derived that the Fisher information for T * 2 -limited magnetometry is given by where Γ := 1/T * 2 is used to represent dephasing in frequency units, in analogy with γB. We note, that unlike the Fisher information describing the noiseless case, the bound Equation S8 for the dephasing-limited case is not independent of the true value of B. Thus, to determine the achievable sensitivity in the case of dephasing-limited magnetometry, we must either assume a particular value of the field B being estimated, or must generalise beyond the Cramér-Rao bound. We choose the latter case in this work, which provides further insight into the trade-off between phase accumulation time and experimental repetitions for Γ > 0.
Specifically, we consider the van Trees inequality (also known as the Bayesian Cramér-Rao bound) [44], where J π describes the error that can be achieved using prior information, and E B an expectation value over a distribution of different hypotheses about the field B. We intentionally do not further define J π , as this term depends on the context in which a magnetometry procedure is used, rather than on the magnetometry procedure itself. Moreover, the effect of J π is minimal in the limit of large experimental data sets, such that ineffectively consists of a correction to the Cramér-Rao bound in the case of finite data records [45].
In analogy to the Fisher information derivation above, the field-averaged Fisher information in the dephasing-limited case gives E B [I(B)] ≤ (τ e −τ Γ ) 2 for a single phase accumulation τ . Hence, the analogous bound to Equation S4 is given by (S10) To derive the sensitivity in the van Trees case, let J 0 (T ) = E B [I 0 (T, B)] be the average Fisher information for a dephasing-limited procedure. We can then define the average sensitivityη(T ) = J 0 (T )/T for a total phase accumulation time T to reformulate the van Trees inequality in a more practical form for our purposes, thus . (S11) Following Equation S10,η 2 (T ) is constant if a fixed phase accumulation time is used, whileη 2 (T ) ∝ 1/T if T Γ 1. The average Fisher information saturates at T = T * 2 , however, such that the Heisenberg and standard quantum limits coincide as T approaches T * 2 . Therefore, the performance observed in Fig. 1a is limited by saturation near T = T * 2 0.1s.

ABSOLUTE PRECISION SCALING
As discussed in the main text and Methods, using a room temperature set-up can be challenging for the effect of quantum projection noise and readout infidelities. These need to be properly addressed when reduced sequence repetitions lead to a low number of PL photons to be detected when recording a fringe. The results in terms of absolute scalingη have already been discussed (see e.g. Fig. 3 in the main paper). Here, we complete those analyses with additional studies. In Fig. S2a&b, we report respectively the scaling and ultimate uncertainty achievable by MFL after 150 epochs for a subset of cases with n phot = 1, ... 2475. Fig. S2b suggests an approximate ∝ 1/n phot gain in the uncertainty achievable halting the protocol after a fixed number of steps. From Fig. S2c we observe that the choice for N in this case is motivated by running MFL for enough steps, to observe for all cases the convergence of the median quadratic losses -i.e. the square error in the parameters' estimate, here Q.L. := (B − B est ) 2 . To estimate the true B, we run MFL once over the whole dataset (M = M tot = 20275), checking that the result is consistent with FFT. We remark how the advantages in increasing the n phot used (along with the higher precision scaling that increases fromT −0.73 for n phot = 4 to Heisenberg limited for n phot ≥ 20) come at the expense of worse final absolute sensitivities achievable by the protocol. This is due to the linear increment of experimental overheads with n phot . The robustness against sources of noise present in the room temperature set-up is emphasized by Fig. S2d, where we plot the estimates obtained by MFL for a similar subset of n phot 's analysed. We observe that for n phot ≥ 4, MFL estimates are all substantially consistent with the result obtained for M = M tot , within the estimated uncertainty σ(B est ) and taking into account minor fluctuations in B that might have occurred during the collection of the whole dataset. By contrast, we observe how FFT estimates are completely unreliable at the noise level corresponding to O(n phot ) = 10.

2534
< l a t e x i t s h a 1 _ b a s e 6 4 = " 4 X c 3 8 t 3 i 3 l x J c G 4 M q a R 4 e n t 8 j C 4 = " > A A A B 6 3 i c b V B N S w M x E J 2 t X 7 V + V T 1 6 C V b B U 9 l t K 3 o s e P F Y w X 5 A u 5 R s m m 1 D k + y S Z I W y 9 C 9 4 8 a C I V / + Q N / + N 2 X Y P 2 v p g 4 P H e D D P z g p g z b V z 3 2 y l s b G 5 t 7 x R 3 S 3 v 7 B 4 d H 5 e O T j o 4 S R W i b R D x S v Q B r y p m k b c M M p 7 1 Y U S w C T r v B 9 C 7 z u 0 9 U a R b J R z O L q S / w W L K Q E W w y q X Z d b w z L F b f q L o D W i Z e T C u R o D c t f g 1 F E E k G l I R x r 3 f f c 2 P g p V o Y R T u e l Q a J p j M k U j 2 n f U o k F 1 X 6 6 u H W O L q 0 y Q m G k b E m D F u r v i R Q L r W c i s J 0 C m 4 l e 9 T L x P 6 + f m P D W T 5 m M E 0 M l W S 4 K E 4 5 M h L L H 0 Y g p S g y f W Y K J Y v Z W R C Z Y Y W J s P C U b g r f 6 8 j r p 1 K q e W / U e G p X m R R 5 H E c 7 g H K 7 A g x t o w j 2 0 o A 0 E J v A M r / D m C O f F e X c + l q 0 F J 5 8 5 h T 9 w P n 8 A z 3 + N U g = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 4 X c 3 8 t 3 i 3 l x J c G 4 M q a R 4 e n t 8 j C 4 = " > A A A B 6 3 i c b V B N S w M x E J 2 t X 7 V + V T 1 6 C V b B U 9 l t K 3 o s e P F Y w X 5 A u 5 R s m m 1 D k + y S Z I W y 9 C 9 4 8 a C I V / + Q N / + N 2 X Y P 2 v p g 4 P H e D D P z g p g z b V z 3 2 y l s b G 5 t 7 x R 3 S 3 v 7 B 4 d H 5 e O T j o 4 S R W i b R D x S v Q B r y p m k b c M M p 7 1 Y U S w C T r v B 9 C 7 z u 0 9 U a R b J R z O L q S / w W L K Q E W w y q X Z d b w z L F b f q L o D W i Z e T C u R o D c t f g 1 F E E k G l I R x r 3 f f c 2 P g p V o Y R T u e l Q a J p j M k U j 2 n f U o k F 1 X 6 6 u H W O L q 0 y Q m G k b E m D F u r v i R Q L r W c i s J 0 C m 4 l e 9 T L x P 6 + f m P D W T 5 m M E 0 M l W S 4 K E 4 5 M h L L H 0 Y g p S g y f W Y K J Y v Z W R C Z Y Y W J s P C U b g r f 6 8 j r p 1 K q e W / U e G p X m R R 5 H E c 7 g H K 7 A g x t o w j 2 0 o A 0 E J v A M r / D m C O f F e X c + l q 0 F J 5 8 5 h T 9 w P n 8 A z 3 + N U g = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 4 X c 3 8 t 3 i 3 l x J c G 4 M q a R 4 e n t 8 j C 4 = " > A A A B 6 3 i c b V B N S w M x E J 2 t X 7 V + V T 1 6 C V b B U 9 l t K 3 o s e P F Y w X 5 A u 5 R s m m 1 D k + y S Z I W y 9 C 9 4 8 a C I V / + Q N / + N 2 X Y P 2 v p g 4 P H e D D P z g p g z b V z 3 2 y l s b G 5 t 7 x R 3 S 3 v 7 B 4 d H 5 e O T j o 4 S R W i b R D x S v Q B r y p m k b c M M p 7 1 Y U S w C T r v B 9 C 7 z u 0 9 U a R b J R z O L q S / w W L K Q E W w y q X Z d b w z L F b f q L o D W i Z e T C u R o D c t f g 1 F E E k G l I R x r 3 f f c 2 P g p V o Y R T u e l Q a J p j M k U j 2 n f U o k F 1 X 6 6 u H W O L q 0 y Q m G k b E m D F u r v i R Q L r W c i s J 0 C m 4 l e 9 T L x P 6 + f m P D W T 5 m M E 0 M l W S 4 K E 4 5 M h L L H 0 Y g p S g y f W Y K J Y v Z W R C Z Y Y W J s P C U b g r f 6 8 j r p 1 K q e W / U e G p X m R R 5 H E c 7 g H K 7 A g x t o w j 2 0 o A 0 E J v A M r / D m C O f F e X c + l q 0 F J 5 8 5 h T 9 w P n 8 A z 3 + N U g = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " 4 X c 3 8 t 3 i 3 l x J c G 4 M q a R 4 e n t 8 j C 4 = " > A A A B 6 3 i c b V B N S w M x E J 2 t X 7 V + V T 1 6 C V b B U 9 l t K 3 o s e P F Y w X 5 A u 5 R s m m 1 D k + y S Z I W y 9 C 9 4 8 a C I V / + Q N / + N 2 X Y P 2 v p g 4 P H e D D P z g p g z b V z 3 2 y l s b G 5 t 7 x R 3 S 3 v 7 B 4 d H 5 e O T j o 4 S R W i b R D x S v Q B r y p m k b c M M p 7 1 Y U S w C T r v B 9 C 7 z u 0 9 U a R b J R z O L q S / w W L K Q E W w y q X Z d b w z L F b f q L o D W i Z e T C u R o D c t f g 1 F E E k G l I R x r 3 f f c 2 P g p V o Y R T u e l Q a J p j M k U j 2 n f U o k F 1 X 6 6 u H W O L q 0 y Q m G k b E m D F u r v i R Q L r W c i s J 0 C m 4 l e 9 T L x P 6 + f m P D W T 5 m M E 0 M l W S 4 K E 4 5 M h L L H 0 Y g p S g y f W Y K J Y v Z W R C Z Y Y W J s P C U b g r f 6 8 j r p 1 K q e W / U e G p X m R R 5 H E c 7 g H K 7 A g x t o w j 2 0 o A 0 E J v A M r / D m C O f F e X c + l q 0 F J 5 8 5 h T 9 w P n 8 A z 3 + N U g = = < / l a t e x i t > FIG. S3. Ramsey sets mediated through different numbers of single sequences, corresponding to the various n phot reported on the axis below. Experimental data (as blue dots) are reported together with a sinusoidal fit obtained from the case n phot = 2534 (as red lines). The unbalance towards the 0 measurement outcome is evident in the cases n phot = 1, 4. Data-points whose normalisation is higher than 0.5 correspond to Poissonian distributed multi-photon events still present in this case.

THE ROLE OF NOISE FOR LOW PL PHOTON COUNTS
Finally, we observe how for n phot < 4, the Bayesian process fails due to increased experimental noise and reduced statistics, underestimating both the real ω and the uncertainty associated with it. For example, this is evident from Fig. S2c, as the Q.L. does not improve with the number of epochs. In particular, losses in the system cause an asymmetry between ξ 0 and ξ 1 , respectively the overall readout fidelities for the states |0 and |1 (i.e. taken all sources of noise and loss into account). From experimental raw data for n phot = 1 (see Fig. S3), we observed that if we assume ξ 0 ∼ 1, then ξ 1 0.54. This translates in unbalanced output probabilities, that conflict with the underlying assumption made so far of a binomial model for the outcomes E ∈ {0, 1}, with probabilities given by the likelihood in Eq. 3. This level of "poisoning" in the assumed model is evidently beyond the CLE noise robustness [41].
In order to prove there is no fundamental limit preventing MFL to provide correct estimates, within uncertainty, given a correct model, we thus modified the likelihood such that: where ξ ∈ [0, 1], L (0| x, τ ) = 1 − L (1| x, τ ) and for ξ = 1 we recover the usual L (0| x, τ ) = L(0| x; τ ) of Eq. 3. In order to estimate ξ, we use it as the free parameter in a preliminary CLE run against the same dataset, but assuming ω known from the inference process with M = 20275 (i.e. having x = {ξ}). We thus obtain ξ = 0.72, and use this as a known parameter when running MFL with L (B). In principle, ξ could also be estimated from a multi-parameter inference model. The result is reported in Fig. S4. Intuitively, measurement outcomes are interpreted as less informative by the inference process, as E = 0 might be due to additional losses. This effectively slows down the learning rate per-epoch, but at the same time restores a correct behaviour of MFL .
FIG. S4. Noise-compensation in the inference process. a, Estimate of the precession frequency ω from the set 1, using M = 8 ⇒ n phot = 1 average photons collected per step, in peak configuration. In violet the result adopting the usual likelihood in Eq. 3, capable of handling only Poissonian noise in the the data. In blue, results from the modified likelihood Eq. S12, allowing for an extended number of epochs. Shaded areas here represent the median ∼ 68.27 % credible interval provided intrinsically by MFL at each epoch, averaged over 1000 runs. b, Estimates for ω, and uncertainties as a Gaussian fit over the learnt posterior, for the two cases in a, along with some other representative runs from Fig. S2, after 150 epochs. The inference process with no model for infidelities in place, and n phot = 1, falls outside of the plotted interval.

WIDE RANGE OPERABILITY OF MFL METHODS
It is known how in Ramsey experiments, adaptive choices of time can lead not only to scalings beyond the standard quantum limit, but also to improved dynamic ranges for the sensed magnetic field B, up to ω max (B)/σ(ω) ≤ πT /τ min . Given that MFL is adaptive in the choices of the phase accumulation time τ , and we have shown that its precision scaling is Heisenberg limited, it follows naturally that also MFL benefits from the high-dynamic range already reported by previous experiments.
In the main paper, we already showed applicability of MFL for cases in the dataset 3 , with B ∈ [8.3, 550] µT (see Fig. 2 in the main text). Here we complement this study with an additional case (α 2 ) exhibiting B h =713 µT. In the case of this dataset, equivalently to α 1 , M ∼ 20000 single sequences were collected and averaged from the experimental set-up, so it was reasonable to adopt a majority voting scheme to use the additional information in the data. We stress that such high intensities of B tend to make least-squares fit procedures with no initial guess of the parameters fail.
The results in terms of σ(B est ) and precision η 2 scalings are reported in Fig. S5. We observe how after 250 epochs, the difference in the final uncertainties provided by MFL is |σ(B α1 ) − σ(B α2 )| 10 −3 µT. It can thus be considered approximately independent of the strength of the magnetic field. Also the precision scaling is the same, within error, of the one observed for the lower field in α 2 .

TRACKING
In the main text, we tested against experimental data the tracking capabilities of the MFL protocol. In Fig. 4 we reported the results in the case where the magnetic field intensity B is synthetically altered stepwise, at random times, in a fashion completely equivalent to a stochastic time-dependent Poisson process P(t). Such random, abrupt variations in the magnetic field might for example reproduce applicative scenarios such as the raster scanning of a surface embedding magnetic nanoparticles. A sketch of a possible experimental set-up is provided in Fig. S6a. The modifications to the standard CLE inference process required by this particularly demanding tracking scenario are summarised as pseudocode in Algorithm 1. The modifications to the standard inference process adopted amount to detect changes in the sensed parameter, that completely invalidate the current posterior, and thus suggest a reset of the prior as the most effective update step. Without triggering such reset events, huge stepwise changes would otherwise require a long time for MFL to react, because of the little support provided by the prior to the new value.
In Fig. S6b, we show a simulated performance of MFL in a representative run with time-varying B. The figure exemplifies the decrease in the rate of failure events, as the frequency of the oscillating signal is decreased with time. We loosely define failure events, all those τ i at which the quadratic loss of a single run Q.L.(τ i ) R r [Q.L.(τ i , r)]/R, the mean performance achievable by the protocol, estimated across R 1 independent runs. We modify synthetically the magnetic field in the simulations as B(τ ) = B 0 + b cos(ντ ) with b B 0 , equivalently to Fig. 4c of the main text, but in this case we chirp the oscillating frequency for each run, and thus ν(τ ) = ν 0 − kτ , with ν 0 and k constants. We notice how points where the second derivative of the oscillating magnetic signal is highest are those where failure events tend to occur.
Finally, in Fig. S6c, we display the performance expected for MFL when tracking a brownian-like varying signal. Here the 'true' signal is numerically simulated according to an Ornstein-Uhlenbeck process, similarly to the theoretical analysis in [34]. where an NV centre positioned at the end of a scanning microscope is used to scan the magnetic field B in the proximity of a molecule absorbed on a substrate. b, Simulation of MFL capabilities to track a chirped sinusoidal signal, with no experimental overhead and only Poissonian noise present in the data (i.e. high-fidelity readout). The frequency ω is linearly increased after each update step. c, Average performance, mediated over 1000 independent runs, of CLE tracking a magnetic field undergoing an Ornstein-Uhlenbeck stochastic process. In b&c, shaded areas corresponds to the usual ∼ 68 % percentile credible range adopted in this paper.