The search for low-mass axion dark matter with ABRACADABRA-10cm

Two of the most pressing questions in physics are the microscopic nature of the dark matter that comprises 84% of the mass in the universe and the absence of a neutron electric dipole moment. These questions would be resolved by the existence of a hypothetical particle known as the quantum chromodynamics (QCD) axion. In this work, we probe the hypothesis that axions constitute dark matter, using the ABRACADABRA-10cm experiment in a broadband configuration, with world-leading sensitivity. We find no significant evidence for axions, and we present 95% upper limits on the axion-photon coupling down to the world-leading level $g_{a\gamma\gamma}<3.2 \times10^{-11}$ GeV$^{-1}$, representing one of the most sensitive searches for axions in the 0.41 - 8.27 neV mass range. Our work paves a direct path for future experiments capable of confirming or excluding the hypothesis that dark matter is a QCD axion in the mass range motivated by String Theory and Grand Unified Theories.

Two of the most pressing questions in physics are the microscopic nature of the dark matter that comprises 84% of the mass in the universe and the absence of a neutron electric dipole moment. These questions would be resolved by the existence of a hypothetical particle known as the quantum chromodynamics (QCD) axion. In this work, we probe the hypothesis that axions constitute dark matter, using the ABRACADABRA-10 cm experiment in a broadband configuration, with worldleading sensitivity. We find no significant evidence for axions, and we present 95% upper limits on the axion-photon coupling down to the world-leading level gaγγ < 3.2 × 10 −11 GeV −1 , representing one of the most sensitive searches for axions in the 0.41 − 8.27 neV mass range. Our work paves a direct path for future experiments capable of confirming or excluding the hypothesis that dark matter is a QCD axion in the mass range motivated by String Theory and Grand Unified Theories.
The axion is a well-motivated candidate to explain the particle nature of dark matter (DM) [1][2][3]. This pseudoscalar particle is naturally realized as a pseudo-Goldstone boson of the Peccei-Quinn (PQ) symmetry, which is broken at a high scale f a ; the axion would be exactly massless but for its low-energy interactions with quantum chromodynamics (QCD) [4][5][6][7]. The axion mass is tied to the scale f a by m a ≈ 5.7 (10 15 GeV/f a ) neV [8].
The range of scales f a ≈ 10 15 − 10 16 GeV is particularly compelling because of connections to String Theory [9] and Grand Unification [10,11], and in the corresponding mass range of m a ∼ 1 − 10 neV the axion may naturally explain the observed DM abundance [11,12]. In this Article we provide the most sensitive probe of axion dark datter (ADM) in this mass range to date.
ADM that couples to photons modifies Ampère's law such that in current-free regions with E and B the electric and magnetic fields, respectively, a(x, t) the ADM field, and g aγγ the axionelectromagnetic coupling constant. In the presence of a static external magnetic field ADM behaves like an effective current density J eff = g aγγ (∂ t a)B. If the axion * These authors contributed equally; salemi@mit.edu † These authors contributed equally; fosterjw@umich.edu ‡ These authors contributed equally; ouelletj@mit.edu § lwinslow@mit.edu makes up all of the observed DM then, to leading order in the DM velocity, ∂ t a = √ 2ρ DM cos(m a t), with ρ DM ≈ 0.4 GeV/cm 3 the local DM density [13]. It was pointed out in [14,15] that the effective current induces an oscillating secondary magnetic field which may be detectable in the laboratory without the aid of a resonant cavity for sufficiently small m a . The oscillation frequency is given by f = m a /(2π), with bandwidth δf /f ≈ 10 −6 arising from the finite axion velocity dispersion [16]. In this work we leverage this theoretical principle to search for axions in the laboratory.
The most common detection strategy for ADM is through the electromagnetic coupling g aγγ , which for the QCD axion is directly proportional to the mass m a . Until recently, experiments have focused on searching for axions in the mass range 1 m a 40 µeV, which is well-suited to microwave cavity searches [17][18][19][20][21]. In the low-mass regime targeted in this work, the Compton wavelength of the axion λ C ∼ km is much larger than the experimental apparatus, and so the sensitivity of the experiment improves with volume as V 5/6 , roughly independent of m a until the size of the experiment approaches λ C [15]. This scaling is important because the expected coupling g aγγ is smaller at lower masses, requiring ever-more-sensitive experiments to achieve a detection. ABRACADABRA is an experimental program designed to detect axions at the Grand Unification scale using a strong toroidal magnetic field [15]. ABRACADABRA is part of a suite of ADM experiments which together aim to probe the full QCD axion parameter space [19][20][21][22][23][24][25][26][27][28]. The experiment we report on here, ABRACADABRA-10 cm, is a prototype for a larger ADM detector that would be arXiv:2102.06722v1 [hep-ex] 12 Feb 2021 sensitive to the QCD axion. This Article presents data collected in 2020 that is up to an order of magnitude more sensitive than our previous results [29] and places strong limits on ADM in the 0.41 − 8.27 neV range of axion masses.

ABRACADABRA-10 CM DETECTOR
The ABRACADABRA-10 cm detector is built around a 12 cm diameter, 12 cm tall, 1 T toroidal magnet fabricated by Superconducting Systems Inc [30]. The axion interactions with the toroidal magnetic field B 0 drive the effective current, J eff , which oscillates parallel to B 0 and sources a real oscillating magnetic field through the toroid's center. The oscillating magnetic flux is read out with a two-stage DC-SQUID via a superconducting pickup in the central bore. Unlike other axion detector designs, this novel geometry situates the readout pickup in a nominally field-free region unless axions are present [15]. The detector can be calibrated by injecting fake axion signals (i.e., AC currents) through a wire calibration loop that runs through the body of the magnet. The detector, illustrated schematically in Fig. 1, is located on MIT's campus in Cambridge, MA.
In 2019, we performed several detector upgrades from the Run 1 configuration in order to improve our sensitivity [29,31]. In this Article we report the results of the subsequent data campaign (Run 3), collected after the detector upgrade. Run 3 data consists of ∼430 hours of data collected from June 5 to June 29, 2020.
Before the upgrades were complete, we took additional, uncalibrated data (Run 2), which is not presented here. A subset of that data was instead used to develop our data analysis procedure in order to run a blind analysis on the Run 3 data, as described in detail below.
The total expected axion power, A, coupled into our readout pickup is related to the axion-induced flux Φ a as where G is a geometric coupling, V is the magnetic field volume, B max is the maximum value of |B 0 |, and the angle brackets denote the time average [15,32]. Run 1 utilized a 4.02 cm diameter pickup loop made from a 1 mm diameter wire, giving G ≈ 0.027. In 2019, we replaced this readout with a 10 cm tall, 5.1 cm diameter superconducting cylinder pickup centered in the toroid bore. This consisted of a 150 µm-thick Nb sheet wrapped around a polytetrafluoroethylene (PTFE) cylinder. This design yields a stronger geometric coupling to J eff of G ≈ 0.031 and decreases the inductance of the pickup [15]. We compute G using electromagnetic simulations in the COM-SOL Multiphysics package [31,33].
To amplify our signal, Φ a is coupled into the readout SQUID through the pickup circuit (see Fig. 1) yielding a transformer gain M in /L T , where M in is the input coupling to the SQUID, and L T ≡ L p + L in + L wires is the total inductance of the pickup circuit, with L p the  Figure 1. Top: Schematic of ABRACADABRA-10 cm showing the effective axion-induced current (blue), sourced by the toroidal magnetic field, generating a magnetic flux (magenta) through the pickup cylinder (green) in the toroid bore. Bottom: Simplified schematic of the ABRACADABRA-10 cm readout (full circuit diagram in Supp. Fig. S1). The pickup cylinder Lp is inductively coupled to the axion effective current J eff . The power spectrum of the induced current is read out through a DC SQUID inductively coupled to the circuit through Lin. An axion signal would appear as excess power above the noise floor at a frequency corresponding to the axion mass.
pickup cylinder inductance, L in the input inductance of the SQUID package, and L wires the parasitic inductance, dominated by the twisted pair wiring. The SQUID, manufactured by Magnicon [34], is read out using Magnicon's XXF-1 SQUID electronics operating in closed feedback loop mode. The Run 1 sensitivity was limited by parasitic inductance in the NbTi wiring of this circuit that placed a lower limit on L T 1.6 µH. During the upgrade, we replaced this wiring, moving the SQUIDs closer to the detector to reduce the wire length. Based on calibration data, we found that the total impedance in the circuit is ∼ 800 nH. Finally, the SQUID was operated at a higher flux-to-voltage gain setting of 4.3 V/Φ 0 in Run 3, compared to the previous Run 1 which we ran at 1.29 V/Φ 0 due to higher levels of environmental noise. This change does not directly improve the signal gain, but does reduce system noise. We also improved our noise floor by reducing the operating temperature of the SQUID package from ∼870 mK to ∼450 mK. All together, the upgrade campaign increased the expected power coupled into our readout and reduced the total system noise.
The improved sensitivity of the upgraded readout circuit also amplified the low-frequency vibrational backgrounds seen in Run 1, which caused the SQUID amplifier to rail when the magnet was on. In order to correct this, we implemented an active feedback stabilization (AFS) system to reduce the low-frequency noise, which is discussed further in the Supplementary Information (SI).
As in Run 1, the magnet and pickup were placed inside a superconducting tin-copper shield and hung from a passive vibration isolation system, consisting of a string pendulum and spring, within an Oxford Instruments Triton 400 dilution refrigerator [31]. The magnet and pickup were operated at 1 K and the SQUIDs were at ∼400 mK, which kept the readout circuit superconducting over the course of the run and kept thermal noise subdominant to SQUID flux noise. Following the procedure of Run 1, the output of the SQUID was run into an 8-bit AlazarTech AT9870 digitizer via a 70 kHz-5 MHz bandpass filter. The digitizer was locked to a Stanford Research Systems FS725 Rubidium frequency standard in order to maintain clock accuracy over the coherence time of the axion signal, ∼ 1 s for signals at 1 MHz, throughout the data and calibration runs. The gain shown here is defined as the change in amplifier output voltage over a corresponding change in input flux amplitude on the pickup cylinder (∂Vout/∂Φa). Both transfer functions roll off at high frequencies due to the amplifier bandwidth, which we estimate to have a cutoff frequency of approximately 1 MHz. We believe the difference in calculated and measured gain is due to inconsistency in the total inductance of the pickup circuit.
We performed in situ magnet-on and magnet-off calibrations in the data-taking configuration by attaching a harmonic signal generator to the calibration circuit and scanning across frequencies and amplitudes. The calibration signal was attenuated and fed into the calibration loop, mimicking the axion effective current signal J eff up to geometric factors. The geometry is modeled in COMSOL Multiphysics [33], from which we extract the coupling between both the calibration loop and axion effective current signal to the pickup cylinder. By combining the results of the calibration scans and geometric modeling, we can determine the effective gain, ∂V out /∂Φ a , of the SQUID amplifier output voltage as a function of flux on the pickup cylinder (see Fig. 2). This procedure is analogous to that used in Run 1 [31].
The gain measured by the calibrations for Run 3 differs from the calculated gain by a factor of ∼1.8. By individually calibrating various components of the endto-end circuit, we determined that this is likely due to a misestimation of the calculated total inductance of the pickup circuit. The calibrated SQUID noise floors, which set the lower limits of our sensitivity, are shown in Supp. Fig. S2.

DATA COLLECTION
The axion search data was collected using an identical procedure as in Run 1 [31]. The SQUID amplifier output voltage was sampled at a frequency of 10 MS/s, with a ±40 mV voltage window. The data were stored as a series of power spectral densities (PSDs), which were computed on-the-fly:F 10M with a Nyquist frequency of 5 MHz and frequency resolution of ∆f = 100 mHz,F 1M with a Nyquist frequency of 500 kHz and frequency resolution of ∆f = 10 mHz, and a continuous data stream sampled at 100 kHz that can be analyzed offline.F 10M (F 1M ) is averaged over 800 s (1600 s) before being written to disk. In this work, we used theF 10M to search the frequency range from 500 kHz − 2 MHz, and theF 1M spectra to search from 50 − 500 kHz.

DATA ANALYSIS AND RESULTS
An axion signal is expected to manifest as a narrow peak in the PSD data, as illustrated in Fig. 1. The width and overall shape of the signal are set by the local DM velocity distribution, which we take to be the Standard Halo Model with a velocity dispersion of v 0 = 220 km/s and a boost from the halo to the solar rest frame of v = 232 km/s [35]. With the speed distribution and local DM density fixed, the two free signal parameters are the axion mass, m a , which determines the minimum frequency of the signal, and the coupling g aγγ , which determines its amplitude through Eq. (2). Our analysis procedure closely follows the approach used in the Run 1 search [29,31] based on [32], which constrains the allowable values of g aγγ at each possible value of m a .
The search is performed with a frequentist loglikelihood ratio test statistic (TS); exact expressions are provided in the SI (see also [31]). Our broadband search procedure probes ∼ 11.1 million mass points between 0.41 − 8.27 neV (100 kHz − 2 MHz) in Run 3. As we expect only one axion signal in our search (or at most a small number), the majority of the TS values are probing the distribution of the null hypothesis. Assuming only Gaussian noise, we expect this null distribution to be a one-sided χ 2 -distribution [32], which was indeed the case in Run 1 [29,31]. However, the increased sensitivity from the detector upgrades introduced non-Gaussian noise sources that required us to modify our Run 1 analysis procedure. We developed and validated our new procedure on a randomly-selected sample of 10% of Run 2's ∼ 13.7 million mass points, after which we unblinded the Run 3 data with the procedure fixed.
In Run 1, we searched for an axion signal as a feature appearing above a flat white noise background. For each m a , the search was performed in a narrow window around that mass with the background level allowed to vary independently in each window. For the Run 2 and Run 3 analyses we allow the mean background level of the noise to vary linearly with frequency uniquely in each sliding window. We use sliding windows of relative width δf /f ≈ 5.5 × 10 −6 , starting at f = (1 − 10 −6 ) × m a /(2π).
As in Run 1, we use the magnet-off data to veto frequency ranges that also display statistically significant TS values when |B 0 | = 0 and thus the axion power should vanish. However, we observed narrow single-bin 'spikes' that appear to drift in frequency on the timescale of our data collection (see Supp. Fig. S6 for an example). If interpreted in isolation, these spikes sometimes correspond to statistically-significant excesses. Nevertheless, they are inconsistent with axion signals and are most likely due to unknown environmental noise sources near the detector, persisting throughout Runs 2 and 3; indeed, many of the peaks are distributed at multiples of 50 Hz. To remove these artifacts, we leverage the fact that the PSDs are saved periodically to disk yielding a time evolution of the environmental backgrounds; we veto single-bin spikes that move in frequency. We place a 1.0 Hz veto window around these single-bin spikes. These cuts remove 3.8% of the axion mass points from our search in the Run 3 data. The magnet-off vetoing procedure removes an additional 0.07% of mass points.
After implementing the vetoes, we found the distribution of TS values in the 10% Run 2 validation sample deviated from the expected χ 2 distribution; for example, there were 27 mass points with TS > 25 whereas from the χ 2 distribution we would have expected less than one. To account for the deviation in the TS distribution from the χ 2 distribution in a data-driven fashion, we follow the prescription developed and implemented in [36][37][38] for searches for DM-induced lines in astrophysical gammaray data sets. At each mass point, we introduce and profile over a systematic nuisance parameter that would be degenerate with the signal parameter but for a prior that is determined by forcing the TS distribution to follow the χ 2 distribution. Specifically, we force the TS distribution to match the null hypothesis distribution at 4σ local significance. This is described further in the SI.
After the nuisance parameter and vetoing procedures, we construct a likelihood as a function of g aγγ at each mass point. The final distribution of TS values computed from the likelihoods is shown in Fig. 3; no TS values were found in excess of the 5σ look-elsewhere effect-corrected discovery threshold. In the calibration of our analysis procedure, we found one signal candidate in the Run 2 data at over 5σ global statistical significance (see Supp.  The y-axis indicates the fraction of mass points tested with a discovery TS at or above the value on the x-axis. Under the null hypothesis, the distribution should follow the survival function of the one-sided χ 2 distribution with one degree of freedom ("Expected," dotted gray). This is indeed the case after data cleaning for e.g. single-channel excesses in time slices, magnet-off vetoes, and the inclusion of a systematic nuisance parameter, which is tuned in a sliding window at 4σ local significance to give the correct number of excesses at or above that significance, masking the signal of interest. No excesses are found beyond our indicated 5σ LEE-corrected discovery threshold. Fig. S6, where a corresponding feature can be seen in the magnet-off data), but that mass point is not significant in the Run 3 analysis.
In the absence of an excess consistent with an ADM origin, we can determine 95% one-sided upper limits on g aγγ as a function of the mass, m a . The average 95% upper limits from the Run 3 analysis along with their 1σ and 2σ expectations under the null hypothesis are indicated in Fig. 4. In that figure we compare our upper limits to those found from the ADM experiment SHAFT [27] along with results from the solar axion experiment CAST [39] and astrophysical X-ray searches (SSC) [40], both of which do not require the axion to comprise the DM. The fraction of vetoed mass points is illustrated in a sliding window in Supp. Fig. S4, which also shows the distribution of data fractions included in the analyses. In Supp. Fig. S5 we illustrate the magnitude of the systematic nuisance parameter g nuis.
aγγ , while in Supp. Fig. S7 we show what the limit would be without the nuisance parameter tuning. Supp. Fig. S8 shows that the 95% upper limit and discovery TS behave as expected when synthetic axion signals are injected into the real data.

DISCUSSION
In this work we present the results from ABRACADABRA-10 cm's second physics campaign, searching for ADM in the mass range 0.41-8.27 neV. We find no evidence for ADM and constrain the axion-photon coupling down to the world-leading level g aγγ 3.2 × 10 −11 GeV −1 at 95% confidence. Our work motivates key elements of the design of future larger-scale experiments. These include the mitigation of stray fields from the magnet and vibrations induced by a modern pulse-tube-based cryogenic system, which limits our current low-frequency reach. The ABRACADABRA-10 cm results presented in this Article demonstrate the power of mature simulations for optimizing the design of the detector and for modeling the calibration response. An advanced and novel analysis framework was used to identify noise sources and account for systematic uncertainties in a data-driven fashion.
Our work identifies three areas that can be addressed in the next physics campaign: (i) moderate improvements (up to a factor ∼0.4 in g aγγ ) could be achieved by further reducing the wire and SQUID inductances, (ii) better shielding from environmental noise could increase the sensitivity to g aγγ by an order of magnitude at low frequencies, so long as (iii) the fringe fields are reduced or better vibrationally isolated (see Supp. Fig. S2). To significantly increase the sensitivity of the experiment, larger magnets with higher fields are needed since the sensitivity to g aγγ scales with the detector volume V and field B 0 as g −1 aγγ ∼ B 0 V 5/6 [15]. The addition of a resonant readout circuit could enhance the reach in g aγγ by an additional ∼2 orders of magnitude depending on the scanning strategy, with a high frequency readout permitting sensitivity to masses up to 800 neV [15,41]. ABRACADABRA is merging with the DMRadio program to realize a series of experiments that chart a path toward discovering the QCD axion in the parameter space corresponding to new physics at the Grand Unification scale [42][43][44][45][46].

METHODS
The ABRACADABRA-10 cm detector is mounted in an Oxford Instruments Triton 400 dilution refrigerator (DR). The detector is suspended from a vibration isolation system consisting of a ∼2 m long Kevlar thread suspended from a spring. The spring provides isolation in the vertical direction with a rolloff frequency of f ≈ 1.4 Hz, while the long thread behaves like a pendulum with rolloff frequency of 0.4 Hz. The detector is thermalized to the coldest stages of the DR through thin copper strips that cool the detector while minimizing vibration transmission. The fridge was operated without the Turbo Molecular pump in the condensing circuit; all thermometry was disabled and unplugged during data taking to reduce RF noise.
The data are collected as a series of time averaged PSDs. The output voltage is digitized at a sampling frequency of 10 MS/s. It is then Fourier transformed on the fly in 10 s windows and accumulated into a running average PSD calledF 10M . The data stream is simultaneously down-sampled to a sampling frequency of 1 MS/s, Fourier transformed in 100 s windows and accumulated into a running PSD calledF 1M .F 10M (F 1M ) is written to disk every 800 s (1600 s) and then reset, yielding a detailed time evolution of each spectrum. Two collections, which we refer to as Run 2 and Run 3, were performed. Run 2 contained 1,168,000 s (324 h) of magnet on data in 1,460 F 10M spectra and 700F 1M spectra and 320,000 s (89 h) of magnet off data in 388F 10M spectra and 196F 1M spectra. Run 3 contained 1,091,200 s (303 h) of magnet on data in 1,364F 1M spectra and 682F 1M spectra and 448,000 s (124 h) of magnet off data in 560F 10M spectra and 280F 1M spectra. Due to differences in the readout configuration, Run 2 was not as sensitive as Run 3 and the results are not presented here, though the Run 2 data are used for tuning the analysis procedure.
The data were analyzed in search of the expected axion line-shape using a profiled Gaussian likelihood with a linear background model, estimating the data variance as a floating nuisance parameter at 11.1 million axion mass points between 100 kHz and 2 MHz. Data from Run 3 was analyzed independently based on an analysis validated on 10% of Run 2 data. The data were cleaned by removing spectral excesses confined to a single frequency bin. The analysis was initially applied to independently divided time-continuous subintervals of the data (each of size approximately 5% of the full data volume), allowing for a filtering of transient excesses applied independently at each mass. Data which passed the filtering were then stacked and analyzed for each run, then joined to produce a 95 th percentile upper limit and detection significance. The detection significance was then corrected by a Gaussian penalty term with a hyperparameter to accommodate systematic effects that may produce spurious detections.

DATA AVAILABILITY
Source data for this paper are made publicly available; 1 all other data may be made available upon reasonable request.

CODE AVAILABILITY
The code that supports the results presented in this paper may be made available by the corresponding authors upon reasonable request.

ACKNOWLEDGMENTS
We would like to thank Kent Irwin and our DMRadio colleagues for useful discussions and look forward to the next-generation experiment. We would like to thank those that took part in Run 1 of ABRACADABRA-10cm including Zachary Bogorad, Janet Conrad, Joseph Formaggio, Joe Minervini, Alexey Radovinsky, Jesse Thaler, and Daniel Winklehner. We thank Christopher Dessert for useful analysis discussion.  The sensitivity of ABRACADABRA-10 cm to ADM is set by the coupling strength between the axion induced current J eff and the readout SQUIDs. This coupling can be conceptually split into two parts: the coupling between J eff and the pickup, and the coupling between the pickup and the SQUIDs. Before Run 2, the ABRACADABRA-10 cm detector was upgraded in two ways to increase each of these two coupling strengths.
The first step of the upgrade was the installation of the superconducting pickup cylinder. The pickup cylinder geometry more effectively cancels the flux induced by J eff and thus couples more strongly to it. The cylinder was constructed out of a 150 µm thick sheet of Nb wrapped around a PTFE tube, secured with Kapton tape. The resulting cylindrical pickup was 10 cm tall with a 5.1 cm diameter and centered vertically in the magnet bore. This is close to the maximum diameter that could practically fit. A 1 mm gap was left in the wrapping of the Nb sheet to prevent electrical contact and the formation of a complete loop. The PTFE tube was glued and clamped onto the magnet support structure inside the superconducting shielding can. From experience in Run 1, a strong mounting was critical to reducing relative motion between the pickup and magnet.
The second step of the upgrade was a replacement of the wiring between the pickup cylinder and the SQUID readouts. The new wiring -along with the new pickup cylinder -reduced the total inductance of the readout circuit, resulting in more current in the SQUIDs. The new wiring consists of 75 µm superconducting solid NbTi twisted-pair wires that are spot welded to two corners of the Nb sheet. Spot welding ensures a superconducting connection between the Nb sheet and the NbTi wires. We used a series of four spot welds on each corner for redundancy, in case of breakage during handling or due to differential thermal contraction. These wires were then taped to the PTFE cylinder with Kapton in order to minimize stress on the connections. The wires run ∼57.5 cm to the SQUID input. The wires are shielded in a superconducting capillary which extends from about 1 cm from the Nb sheet up to the point where the wires enter the SQUID shielding can. In addition to providing electric shielding, this capillary also reduced the inductance per unit length. The inductance of the new cylinder and the readout wires are calculated using simulations in COM-SOL Multiphysics to be 20 nH and 288 nH, respectively. This decrease in the readout inductance was confirmed with calibrations up to a factor ∼1.8.

II. ACTIVE FEEDBACK SYSTEM
A major challenge encountered in Run 1 was low frequency vibrations converting stray fields from the magnet into low frequency noise. Since this noise is generally below the frequency ranges of interest, we were able to simply filter it and ignore it. In Runs 2 and 3, the higher gain of the upgraded detector amplified this vibrational noise enough to rail the SQUID amplifier. Because most of this noise was relatively slow -below ∼ 1 kHz -we installed an active feedback system to cancel it. We fed part of the output signal into the input of a Stanford Research Systems SIM960 analog PID controller. The output was filtered through a 1 kHz low-pass filter (LPF) and fed into the calibration loop via a 10 dB warm attenuator followed by 40 dB of cold attenuation, as sketched in Fig. S1. The LPF guaranteed that the feedback system could not interfere with signals in our ROI, while the warm 10 dB attenuator reduced the power dissipated on the cold stages of the fridge.
Before Run 3, we added power combiners and power splitters to the active feedback circuit in order to better impedance match and isolate the various parts of the circuit. This improved our in situ calibration, as described below.

III. DETECTOR CALIBRATION
At a basic level, the ABRACADABRA-10 cm readout converts the flux through the pickup cylinder Φ p to an output voltage from the SQUID amplifier, V SQUID . The detector calibration provides an end-to-end measurement of the detector response ∂V SQUID /∂Φ p to an axion-like signal across the full range of frequencies being searched. A schematic of the detector configuration for the Run 3 calibration can be seen in Fig S1. We generate a fixed frequency signal of known amplitude using an Stanford Research Systems SG380 signal generator. This signal is attenuated by 93 dB before passing into the calibration loop in the detector. The current in the calibration loop generates a flux through the pickup cylinder, inducing a current and response in the readout circuit in the same way that an axion signal would, up to geometric factors.
The response of the system to a calibration signal can be written as where V ADC is the RMS voltage measured by the digitizer, I C is the RMS current entering the calibration loop,  Figure S1. ABRACADABRA-10 cm Run 3 calibration circuit diagram. A fake axion signal generated in the signal generator is attenuated by 93 dB (including 3dB loss in the combiner) before being coupled into the pickup cylinder analogously to an axion signal. The resulting power excess is readout on the SQUID and measured in the ADC digitizer. In Run 3, calibration is performed with the magnet turned on and the active feedback circuit running. During data taking, the signal generator is replaced with a 50 Ω terminator. The flux-lock feedback loop (FLL) feedback resistor and inductor are omitted for clarity. and V Sig is the peak-to-peak voltage output by the signal generator. The first and last terms in this conversion are determined by the warm electronics and cold attenuators, and can be measured directly, while the third term is the mutual inductance between the calibration loop and pickup cylinder, which is modeled in COMSOL. By dividing the measured end-to-end calibration by these three terms, we are left with the resulting flux to voltage conversion of the ABRACADABRA-10 cm readout circuit ∂V SQUID /∂Φ p . During Run 3, the calibration was performed in an identical configuration to data taking, namely with the magnet on and the AFS active. The resulting calibration can be seen in Fig. 2, and agreed very well with our calculated signal gain. The rolloff above ∼1 MHz corresponds to the finite bandwidth of the SQUID electronics. The Run 3 calibration circuit diagram is provided in Fig. S1. During Run 2, we were not yet able to accurately calibrate the detector with the AFS system in place, which led to the decision to not present this data in our limits. Instead, the uncalibrated Run 2 data was used to tune our analysis procedure.
The flux noise determines the lower limit of our sensitivity. In particular, the 95% upper limit on g aγγ under the null hypothesis scales like the square root of the flux noise, which is shown in Fig. S2. In that figure we illustrate three different noise levels through the SQUID: (i) the measured magnet on flux, which is the relevant flux for the axion signal analysis; (ii) the magnet off flux; (iii) the flux measured in a similar SQUID that is not connected to the pickup loop circuit (labeled "open input"). The increased noise level in the magnet off data relative to the open input SQUID is likely the result of imperfect shielding, with environmental noise magnified  Figure S2. The SQUID flux for Run 3 over the 70 kHz to 2 MHz frequency range at which we collect data. The magnet on noise level (magenta) is elevated compared to data taken with the magnet off (gold) primarily due to vibrating fringe magnetic fields. For comparison, the noise level from a similar SQUID without anything plugged into its input is plotted in teal.
by the pickup loop. On the other hand, when the magnet is on increased noise is apparent at low frequencies, which is the result of the magnetic fringe fields giving frequency-dependent flux noise because of vibrations. Increasing the quality of the shielding and decreasing either the magnitude of the fringe fields or their vibrational coupling to the pickup loop would improve the sensitivity.

IV. LIKELIHOOD ANALYSIS
In this section, we describe the implementation of the analysis framework used to produce upper limits on g aγγ and determine detection significances for potential excesses. We first define the profiled Gaussian likelihood used herein, followed by our procedure for cleaning the data to enable the removal of spurious excesses and confounding backgrounds. We then detail our treatment of a nuisance hyperparameter used to address potential systematics in the data, describe our results in terms of survival functions and upper limits on g aγγ , and demonstrate the efficacy of our analysis pipeline with injected signal tests. Though uncalibrated and not presented here, the Run 2 data was used while constructing the likelihood analysis framework. As such, we include it in the discussion below.

A. Likelihood for Axion Signal Detection
The likelihood analysis utilized in this work is performed using the signal modeling formalism developed in [32], which was also used in studying the Run 1 results [29,31]. Our starting point is a series of N samples of the flux in the pickup loop {Φ n }, made over a collection time T and at a sampling frequency f = 1/∆t (such that N ∆t = T ). In the presence of an axion, this flux will receive a contribution from both the DM signal and any background. The mean expectation for the PSD at a frequency f k = k/T is with µ k is the mean expected background at this frequency. The signal strength parameter A is given in (2) and is controlled by the unknown g aγγ , while s k is the signal template for a specific axion mass: Here, v ω = 4πf k /m a − 2 and f (v) is the local axion speed distribution, which we take to be the Standard Halo Model with boost velocity v = 232 km/s and velocity dispersion v 0 = 220 km/s. To the extent the background is Gaussian in the time domain, the PSD formed from this data will be exponentially distributed, and the sum of multiple PSDs formed during data stacking will be Erlang-distributed. Nevertheless, in the limit of a large number of stackings, the Erlang-distribution becomes normally-distributed. For this reason we are justified in analyzing the data using a Gaussian likelihood.
In detail, the likelihood used is given by where d k is the average stacked data, A and s k determine the axion signal as described above, µ k is the background model (specified by parameters a), and σ is the standard deviation which we will treat as a nuisance parameter, and therefore estimate directly from the data. For a given axion mass m a , the signal only has support over a narrow frequency range, and therefore we truncate the likelihood to k values between m a (1 − (v + v 0 ) 2 /2)/2π and m a (1 + 2(v + v 0 ) 2 )/2π. Over this narrow range, we find the background is adequately described by a first order polynomial, defined by the two-component vector a (c.f. Run 1 where the background in each signal window was described by a flat white-noise spectrum). In summary, our likelihood is a function of five parameters: m a and A, which define the location and normalization of the signal, and nuisance parameters a and σ, which describe the mean size, slope, and fluctuations of the background. Our goal is to use the likelihood in (S4) to search for deviations from the background only distribution indicative of the presence of an axion. To do so we define the following test statistic (TS), which is a log-likelihood ratio of the signal and null models, t(m a , A) = 2 ln L(d|m a , A;â,σ) L(d|m a , A = 0;â,σ) .
Hatted background quantities are fixed to the value at which the likelihood attains its maximum value, given the specified signal values (i.e. for A = 0,â andσ will in general take different values in the numerator and denominator). In other words, in defining this TS, we profile over the background nuisance parameters. The above test statistic is defined for any m a and A. For a given m a , we then define the discovery TS as The maximization of A is initially performed over a range including positive and negative values, which is critical for the valid interpretation of TS as a χ 2 -distributed quantity under Wilks' theorem; intuitively, background fluctuations below the mean are just as likely as those above. However, as the presence of an actual axion signal will only result in positive spectral excesses, we take TS(m a ) = 0 when the test statistic is maximized with A < 0. Accordingly, the discovery test statistic is expected to have the following asymptotic distribution which is expressed in terms of χ 2 k=1 , the probability density function for the χ 2 -distribution with one degree of freedom, and a Dirac δ function.
Using the test statistic in (S5), we search for evidence of ADM with masses m a such that the signal would appear within the frequency range f min = 100 kHz to f max = 2 MHz. The local significance of any excess can be quantified by inverting the distribution in (S7). In order to cover our entire frequency between f min and f max , this search is performed in 11.1 million signal windows. As such, the local significance can be misleading and we should instead interpret results after accounting for the look-elsewhere effect (LEE). In doing so, we also need to account for the fact that due to the finite extent of the axion signal template s k , nearby windows are correlated. We account for this self-consistently using the formalism developed in [32], from which we compute the N σ detection threshold, accounting for the LEE by expressed in terms of Φ, the cumulative density function of the zero mean and unit standard deviation normal distribution, with Φ −1 its inverse. Using this formalism, we find that the 5σ detection threshold accounting for the LEE is TS thresh ≈ 55.
A direct application of the formalism outlined thus far to the Run 2 and 3 data sets would result in a number of excesses at moderate or even high significance. Rather than interpreting this result as the discovery as a large number of ADM signatures, we interpret these as false positives sourced by coherent backgrounds that are not adequately captured by our null model. We employ two strategies for improving the background model in light of this. Firstly, we apply a data cleaning procedure in order to remove excesses inconsistent with that expected for ADM, for example transient spectral features or features that appear also when the magnet is off. Secondly, after applying our data cleaning pipeline, we modify our likelihood with a nuisance parameter tuned against the ensemble of observed significance values in the clean dataset; a data driven method for ensuring the quoted significance is consistent with the distributions observed directly in data. After applying both corrections factors, we find no significant excesses remain in our combined dataset. We now describe each of these strategies for improving our background treatment in more detail. We emphasize that all data cleaning and analysis procedures were developed and tested on the 10% of the Run 2 data which was unblinded and was then applied identically to the Run 3 data. The 10% was obtained by subdividing the full Run 2 dataset into 1,000 frequency subsets of equal size, and then taking the first 10% of each subset in frequency.

B. Data Cleaning Procedure
Many of the excesses present in the uncleaned data are characterized by a narrow spectral feature, often present in a single frequency bin. The features often drift throughout our collection time, and appear at regular frequency intervals. Although such features are inconsistent with the axion signal expectation, which should be distributed over several frequency bins, such narrow features are far more consistent with our signal model than our linear background model, and therefore result in high-significance TS values.
A notable example is the background resulting from AM radio broadcasts: these manifest as large excesses at uniform 10 kHz intervals from 560 kHz to 1.60 MHz. We identify these AM radio signals in our data and remove them with a mask of width 15 Hz centered on the radio signal peak, beyond which the radio signal falls below our noise floor. In other cases the origin of the features is unclear, although the fact that many appear at 50 Hz intervals suggests a universal environmental origin. Regardless, we remain agnostic to their origin and instead remove them using a data-driven procedure we now outline.
For each frequency bin, we identify a single-bin excess as follows. We determine the mean and standard deviation of the data on either side of the bin of interest; in particular, we use the 10 bins on both sides, ignoring the immediately adjacent frequencies. We then use these results to calculate the significance of the data observed in the bin of interest. We repeat this procedure for each frequency bin in each independently collected dataset, i.e. the data before it is stacked. If the bin of interest attains a significance δχ 2 > 100 in any one dataset, then it is flagged for masking. If the bin is not flagged by this procedure, we then stack the dataset and repeat this procedure once more. If after stacking, the bin now has δχ 2 > 35, then it is again flagged. For all flagged bins, we mask the 21 frequencies centered on the bin of interest. The motivation for considering the individual and stacked datasets is to identify both excesses that drift with time and also those that are only significant in the stacked data where we perform our fiducial analysis.
After the single-bin spikes have been identified and removed, we perform an initial analysis of the data. We analyze theF 10M (F 1M ) data providing a frequency resolution of 0.1 (0.01) Hz for axions which would produce a signal in the 500 kHz -2 MHz (50 -500 kHz) frequency range. We stack the Run 2F 10M (F 1M ) data, which consists of 1460 (700) spectra, into 20 subintervals, each of which are initially analyzed independently. For each axion mass, each subinterval is analyzed independently, with the 50% of subintervals which realize the smallest values of the TS for discovery and any additional subintervals which have TS < 9 accepted. The accepted subintervals are then stacked into a single dataset and analyzed. An analogous procedure is applied to the Run 3 data, where theF 10M (F 1M ) data, consisting of 1364 (682) spectra, are divided into 22 subintervals. This TS filtering procedure was implemented in order to mitigate the impact of transient excesses that imitate an axion signal in some of the subintervals and might produce a spurious excess if included in the stacked data. With this exclusion criteria, under the null, each spectrum is expected to be excluded with probability 0.1%, making this a relatively conservative exclusion criterion, although it does have the effect of somewhat weakening the detection sensitivity of the analysis. The binned data acceptances for the complete Run 3 analysis are shown in the left panel of Fig. S4, which show that all data is accepted into the stacked analysis data for the overwhelming majority of mass points.
Finally, we perform a series of vetoes aimed at removing any remaining excess which have their origins in unmodeled backgrounds or instrumental effects. We directly analyze stacked, unfilteredF 10M andF 1M data which are collected with the magnet off. Since no axion detection can be made with the magnet off, any mass points which are excesses at TS > 16 in both the magneton and magnet-off data are vetoed.

C. Nuisance Parameter Correction
After applying both our individual bin flagging and vetoing procedures, the remaining dataset is designated clean. Nevertheless, the distribution of TS values remains inconsistent with that expected for the asymptotic one-sided χ 2 distribution given in (S7), indicative of further background mismodeling. To resolve this, we implement an additional nuisance parameter correction to our likelihood.
In detail, we modify the likelihood and TS with addi-  Figure S3. As in Fig. 3, but evaluated on the 10% of unblinded Run 2 data against which we calibrated our analysis procedure. tional nuisance parameters A m and σ Am as follows, The index m indicates that the nuisance parameters depend on the signal window under consideration. By construction, the additional nuisance parameter is -up to a penalty factor -fully degenerate with the signal. This allows the background model the flexibility to fit signal-like excesses, but at the cost of a Gaussian penalty factor given by N (A m |0, σ Am ), which is a zero mean normal distribution of width σ Am evaluated at A m . The magnitude of this penalty is controlled by the hyperparameter σ Am , which can be chosen to ensure the above TS has the expected asymptotic distribution. To be specific, we determine σ Am for each mass (indexed by k) by tuning the observed distribution TS(m a |σ Am ) against the expected distribution in the vicinity of the mass point of interest. We consider the ensemble of the discovery test statistics belong to the nearest 94,723 mass points, not including: the mass point of interest; the five nearest mass points above and below the mass point of interest; or any mass points that are vetoed by comparison with the magnet off data. We then tune the value of σ Am to its minimum value such that there are only three discovery test statistics in excess of 16 within the ensemble, which would be expected if the discovery test statistics were half-chi-square distributed. The nuisance hyperparameter σ Am translated into an effective nuisance parameter g nuis aγγ is presented in Fig. S5, and can be understood as an effective floor for our limit-setting power that competes with the statistical noise floor set by the background strength.
We note that many of the procedures required to fix the hyperparameter can be performed analytically. As the log-likelihood is approximately quadratic around its maximum,Â, near the maximum we have where at this stage we do not yet zero out test statistics associated with negative best-fit signal amplitudes. When including the correcting nuisance parameter, the distribution becomes with the final term arising from the Gaussian penalty. We can now define a new test statistic for discovery including the background signal nuisance parameter as TS(m a |σ Am ) =max A,Am t(m a , A, A n |σ Am ) −max Am t(m a , A = 0, A n |σ Am ). (S13) Using this, for a given σ Am , the new test statistic for discovery with the nuisance background signal can be directly constructed from the test statistic without the nuisance background signal. In particular, since (S13) involves only maximizations of a quadratic function, the result is given by which has the effect of decreasing the computed TS. As before, we then zero out TS(m a |σ An ) when the best fit signal strength parameterÂ is negative.

D. Survival Functions, Unvetoed Excesses, and Limits
The analysis procedure was tuned on 10% of the Run 2 data. Once validated, to the full Run 3 dataset, which had remained blinded. The survival function evaluated at various stages of our analysis procedure, realized on the 10% of Run 2 data used for tuning, is shown in Fig. S3. Approximately 10% of masses in Run 2 and 5% of masses in Run 3 are removed by the peak exclusion and vetoing procedure, with the fraction of masses removed as a function of frequency shown in Fig. S4.
Even after the nuisance parameter tuning, there remain some discrepancies between the observed and expected survival functions at moderate values (TS > 16) of the test statistic. In particular, there are a small number of mass points which have TS in excess of our 5σ LEE threshold in Run 2 data. All of these excesses occur in nearby frequencies, associated with a transient, and relatively broad, spectral feature which is shown in Fig. S6. Further, the mass points which are high significance excesses in the Run 2 data are not significant in the Run 3 data. Accordingly, we do not consider these excesses to represent credible detections.
The independent Run 3 limits are shown with and without the tuned nuisance parameter in Fig. S7.

E. Injected Signal Tests
To further validate the robustness of our analysis framework, we can inject a synthetic signal into the data and confirm that: (1) we are able to recover the signal strength, when expected; and (2) our limits will not exclude an injected signal. To perform this test, we select five representative mass points and inspect the real data in the vicinity of the expected location of an injected signal. We generate independently drawn axion signals at a range of axion couplings strengths which we add on top of each of the spectra collected in Run 3. We then apply our analysis framework, adopting the tuned value of the nuisance parameter that was previously determined from the real data in the vicinity of the injected signal location, and evaluate the best-fit axion coupling, the 95 th percentile upper limit on that coupling, and the detection significance as a function of the true axion coupling of the injected signal. As a further test of the performance of our analysis framework, for each of the five mass points, we fit the Run 3 data under the null model. We then generate Monte Carlo data under null model fits and repeat our procedure of injecting and analyzing, allowing us to compare the analysis of signal injections on real data with expected performance of the analysis framework under the null model. With the exception of the tuned nuisance parameter, which we continue to keep fixed at its value determined from the real data, this represents an entirely self-contained test of the analysis procedure.
The results of these tests are shown in Fig. S8. Critically, our analysis procedure is able to place a robust 95 th percentile upper limit which does not exclude the true coupling strength at which the signal is injected more often than would be expected and accurately recovers  Figure S5. The hyperparameter, σA, converted to the units of gaγγ, for the systematic nuisance parameter, g nuis aγγ , as a function of axion mass (labeled Systematic Nuisance Param.). We compare the systematic nuisance hyperparameter to the statistical uncertainties (labeled Hessian Statistical Error), which are computed from the Hessian for the log-likelihood without systematic uncertainties about the best-fit axion coupling,ĝaγγ. the correct axion parameters at a detection significance within the simulated expectations. We also briefly comment on the somewhat jagged nature of the detection significance in the real data as a function of the injected signal strength. These features are a product of the filtering included in our analysis procedure which removes at most 50% of the spectra in ∼5% subintervals if those subintervals have a detection significance in excess of 3σ. This has the effect of somewhat weakening the detection significance in discretized steps and also slightly biases the 95 th percentile limit to slightly lower values. This bias is removed using a TS-dependent correction of at most 8% that is incorporated in our limits. Magnet Off Figure S6. The time evolution of the broad excess that is associated with the putative signal candidate in the Run 2 data that survived all analysis cuts. The excess persists after the magnet is turned off and evolves in frequency, indicative of a background source. The magnet off veto did not anticipate this level of time evolution and so did not remove these excesses. Since this feature was found after unblinding, we report it here but do not consider it to be a credible axion detection.  Figure S7. A comparison of our fiducial limits that include a nuisance hyperparameter correction (black) and those without any correction (blue). Limits set with the nuisance hyperparameter are slightly weaker, but the features and limit-setting power are broadly similar. The figure is smoothed for clarity.