Disentangling Losses in Tantalum Superconducting Circuits

Superconducting qubits are a leading system for realizing large-scale quantum processors, but overall gate fidelities suffer from coherence times limited by microwave dielectric loss. Recently discovered tantalum-based qubits exhibit record lifetimes exceeding 0.3 ms. Here we perform systematic, detailed measurements of superconducting tantalum resonators in order to disentangle sources of loss that limit state-of-the-art tantalum devices. By studying the dependence of loss on temperature, microwave photon number, and device geometry, we quantify materials-related losses and observe that the losses are dominated by several types of saturable two-level systems (TLSs), with evidence that both surface and bulk related TLSs contribute to loss. Moreover, we show that surface TLSs can be altered with chemical processing. With four different surface conditions, we quantitatively extract the linear absorption associated with different surface TLS sources. Finally, we quantify the impact of the chemical processing at single photon powers, the relevant conditions for qubit device performance. In this regime we measure resonators with internal quality factors ranging from 5 × 10^6 to 15 × 10^6, comparable to the best qubits reported. In these devices the surface and bulk TLS contributions to loss are comparable, showing that systematic improvements in materials on both fronts will be necessary to improve qubit coherence further.


I. INTRODUCTION
Superconducting qubits have been deployed in some of the most sophisticated quantum processors, enabling demonstrations of quantum error correction [1][2][3][4][5], quantum many body physics and entanglement dynamics [6][7][8][9], and quantum simulation [10]. Improvements in superconducting qubit coherence would help to enable large-scale quantum processors, potentially capable of executing useful tasks. Current superconducting qubits are limited by dielectric loss that is orders of magnitude higher than expected from bulk properties of the constituent materials [11][12][13][14]. This high dielectric loss indicates that qubit relaxation likely originates from uncontrolled surfaces, interfaces, and contaminants. Tantalum qubits have recently been demonstrated to exhibit record lifetimes and coherence times exceeding 0.3 ms [15], which has been reproduced with different fabrication methods [16] and substrates [17], indicating that major advances can be enabled by materials discovery. Tantalum qubits have also recently been deployed to achieve break-even quantum error correction [3], and further improvements in coherence could allow current processors and architectures to push beyond the threshold for fault tolerance [5,18]. The advantage of tantalum likely arises from its stoichiometric, kinetically-limited oxide and its chemical robustness, allowing for extensive device cleaning [15,19].
* npdeleon@princeton.edu
† These authors contributed equally.
However, little is known about the remaining sources of loss that limit state-of-the-art tantalum devices.
Prior work in other material systems has focused on the role of parasitic two-level systems (TLSs) in decoherence and dissipation [20]. TLSs were originally explored in the context of thermal transport in glasses [21,22], and are ubiquitous sources of loss and decoherence in myriad systems, including superconducting devices [23][24][25][26][27][28][29][30], microwave kinetic inductance detectors [20,31,32], optomechanical cavities [33], and acoustic resonators [34,35]. However, the magnitude of the TLS contribution to device loss is difficult to quantitatively disentangle from other sources of loss, such as radiative losses [36], packaging [37], nonequilibrium quasiparticles [38], and nonsaturable absorption [39,40]. This identification is complicated by the likelihood that there are multiple TLS sources in a given device, whose relative contribution may depend on device geometry, fabrication and cleaning procedures, and subtle material choices.
Here we quantitatively separate different contributions to microwave loss arising from TLSs, quasiparticles, and other channels by varying temperature and microwave photon number. We observe internal quality factor (Q_int) up to 2 × 10^8 at high power, giving a large dynamic range that allows us to measure subtle sources of loss. We observe a non-monotonic temperature dependence in Q_int, with the low temperature behavior well-described by TLS loss. Furthermore, we can quantify surface and bulk TLS contributions by varying device geometry. We find that smaller devices are dominated by surface TLSs, while contributions from TLSs residing in the bulk become evident in larger devices. By treating the devices with a post-fabrication buffered oxide etch (BOE), we can decrease the surface TLS bath, and by comparing different surface treatments, we can quantitatively estimate the contribution of different material interfaces. Finally, we characterize the different components of TLS loss at single microwave photon powers as a proxy for qubit performance.

II. RESONATOR FABRICATION AND MEASUREMENTS
We deposit 200 nm of Ta epitaxially on 300 or 500 µm thick sapphire substrates using DC magnetron sputtering at elevated temperatures to stabilize the BCC α-phase [41,42]. In contrast to our prior work [15], all films are 111 oriented, single crystal, with some films having a minority component of the 110 orientation. We pattern resonators using photolithography followed by metal etching, either using a selective wet chemical etch or dry etching in an inductively-coupled plasma reactive ion etching system. We then strip the photoresist and clean the devices using a piranha solution composed of 1:2 hydrogen peroxide in sulfuric acid ("native" surface). Finally, in order to chemically alter the tantalum surface, some devices are treated with either a 10:1 buffered oxide etch for 20 minutes ("BOE" surface), a 10:1 buffered oxide etch for 120 minutes ("long BOE" surface), or a refluxing mixture of 1:1:1 concentrated sulfuric, nitric, and perchloric acids ("triacid" surface).
The fabricated devices consist of either coplanar waveguide (CPW) quarter-wave resonators or lumped element (LE) resonators. We vary their sizes to achieve different surface participation ratios (SPR) (Figs. 1a,b), the fraction of the electric field energy residing in surface layers of the device [28,43]. The CPW resonators are shorted transmission lines with characteristic impedances of 50 Ω, and the LE resonators are LC oscillators with characteristic impedance of 300-400 Ω. Multiple resonators are coupled to a single feedline and are designed to have different resonant frequencies between 4 and 8 GHz to allow for spectrally selective interrogation.
We characterize the losses in each resonator by measuring transmission through the feedline in a dilution refrigerator with a base temperature around 17 mK and scanning the frequency of the probe tone around the resonant frequency [42]. At the resonant frequency, the lineshape of the transmission dip reflects both the internal losses and the coupling to the feedline. We fit the lineshape to extract the internal quality factor, Q_int (Fig. 1c) [42,44].
A common observation in superconducting circuits is that losses decrease with increasing microwave power, indicating that the losses are from saturable TLSs [12]. We observe similar power dependent loss in our devices (Fig. 1d), with low power Q_int ranging from 1 × 10^5 to 1 × 10^7 and high power Q_int ranging from 1 × 10^7 to 2 × 10^8 across different devices.
Different sources of loss can be distinguished by their power and temperature dependence. In order to further disentangle different physical mechanisms for loss, we characterize resonator losses over a wide range of temperatures and microwave powers (Fig. 2a). The full power and temperature dependence is well-described by a model that incorporates three sources of loss: TLSs (Q_TLS), equilibrium quasiparticles (Q_QP), and a separate power- and temperature-independent loss channel that limits Q_int at the highest microwave powers (Q_other). We fit the full dataset using the following model:

$$\frac{1}{Q_{\mathrm{int}}(n,T)} = \frac{1}{Q_{\mathrm{TLS}}(n,T)} + \frac{1}{Q_{\mathrm{QP}}(T)} + \frac{1}{Q_{\mathrm{other}}} \tag{1}$$

The TLS and quasiparticle losses are parametrized by [20]:

$$Q_{\mathrm{TLS}}(n,T) = Q_{\mathrm{TLS},0}\,\frac{\sqrt{1 + \dfrac{n^{\beta_2}}{D\,T^{\beta_1}}\tanh\left(\dfrac{\hbar\omega}{2k_BT}\right)}}{\tanh\left(\dfrac{\hbar\omega}{2k_BT}\right)} \tag{2}$$

and

$$Q_{\mathrm{QP}}(T) = Q_{\mathrm{QP},0}\,\frac{e^{\Delta_0/k_BT}}{\sinh\left(\dfrac{\hbar\omega}{2k_BT}\right)K_0\left(\dfrac{\hbar\omega}{2k_BT}\right)} \tag{3}$$

where ω is the center angular frequency of the resonator; T is the temperature; n is the intracavity photon number; Q_TLS,0 is the inverse linear absorption from TLSs; D, β_1 [45], and β_2 [46] are parameters characterizing TLS saturation; Q_QP,0 is the inverse linear absorption from quasiparticles; Δ_0 is the superconducting gap (Δ_0 = 1.764 k_B T_c); T_c is the superconducting critical temperature of the film; K_0 is the zeroth order modified Bessel function of the second kind; k_B is the Boltzmann constant; and ħ is the reduced Planck constant. There are seven free fit parameters: Q_TLS,0, D, β_1, β_2, Q_QP,0, T_c, and Q_other.

This model gives rise to three separate regimes in the data. At high temperatures above 500 mK, Q_int exhibits a weak power dependence and decreases exponentially with temperature. This behavior is consistent with equilibrium quasiparticle loss as the temperature becomes an appreciable fraction of the superconducting critical temperature for α-Ta [20]. At intermediate temperatures, 100-500 mK, Q_int increases with temperature by around a factor of two for the lowest microwave powers, with little variation with temperature at the highest microwave powers, consistent with thermal saturation of TLSs, where the characteristic temperature is given by the resonator frequency [20]. At the lowest temperatures, there is an apparent 1/T dependence of Q_int. This behavior is consistent with a decreasing TLS coherence time with increasing temperature and subsequent increase of TLS saturation power [20,47]. In our devices, the saturation power at base temperature for the TLS bath can be as low as 0.01 photons.
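As a rough illustration of how the TLS term produces the power and temperature dependence described above, the following Python sketch evaluates a TLS-limited quality factor of the form Q_TLS = Q_TLS,0 sqrt(1 + n^β_2 tanh(x) / (D T^β_1)) / tanh(x), with x = ħω/(2 k_B T), and combines it with the other channels by adding inverse quality factors. The function names and all numerical parameter values are hypothetical and chosen only for illustration; they are not fitted values from this work.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K


def q_tls(n, temp_k, freq_hz, q_tls0, d, beta1, beta2):
    """TLS-limited Q at intracavity photon number n and temperature temp_k."""
    x = HBAR * 2 * math.pi * freq_hz / (2 * K_B * temp_k)
    thermal = math.tanh(x)  # thermal polarization of the TLS bath
    saturation = math.sqrt(1 + n**beta2 / (d * temp_k**beta1) * thermal)
    return q_tls0 * saturation / thermal


def q_int(q_tls_value, q_qp_value, q_other):
    """Combine independent loss channels by summing inverse quality factors."""
    return 1.0 / (1.0 / q_tls_value + 1.0 / q_qp_value + 1.0 / q_other)


if __name__ == "__main__":
    # Hypothetical parameters: 5 GHz resonator, Q_TLS,0 = 1e6, and D chosen so
    # that saturation sets in near 0.01 photons at 17 mK (beta1 = beta2 = 1).
    params = dict(freq_hz=5e9, q_tls0=1e6, d=0.6, beta1=1.0, beta2=1.0)
    for n in (1e-3, 1.0, 1e3, 1e6):
        # Quasiparticle loss is negligible at base temperature: Q_QP -> infinity.
        q = q_int(q_tls(n, 0.017, **params), math.inf, 1e8)
        print(f"n = {n:g}: Q_int = {q:.3g}")
```

With these illustrative numbers, Q_int rises with photon number as the TLS bath saturates and then levels off at the power-independent limit set by Q_other.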
At base temperature and at the powers relevant for transmon operation where the average intracavity photon number is n = 1, the dominant source of loss is 1/Q_TLS. We focus on Q_TLS,0 as a parameter that captures linear absorption due to TLSs and therefore reveals differences in the materials under different fabrication conditions. We can check that Q_TLS,0 is a robust parameter by independently measuring the temperature-dependent shift in the frequency of the resonator (Fig. 2b). The resonance frequency shifts because of the change in the real part of the dielectric constant arising from losses associated with the entire spectral distribution of the TLS bath [48], as well as losses induced by quasiparticles. The frequency shift is given by [20]:

$$\frac{\delta f}{f_0} = \left(\frac{\delta f}{f_0}\right)_{\mathrm{TLS}} + \left(\frac{\delta f}{f_0}\right)_{\mathrm{QP}} \tag{4}$$

where f_0 is the center frequency of the resonator at zero temperature and δf is the difference in the center frequency of the resonator at nonzero temperature.
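The TLS and quasiparticle contributions to the frequency shift are given by [20]:

$$\left(\frac{\delta f}{f_0}\right)_{\mathrm{TLS}} = \frac{1}{\pi Q_{\mathrm{TLS},0}}\,\mathrm{Re}\left[\Psi\left(\frac{1}{2} + \frac{i\hbar\omega}{2\pi k_BT}\right) - \ln\left(\frac{\hbar\omega}{2\pi k_BT}\right)\right] \tag{5}$$

$$\left(\frac{\delta f}{f_0}\right)_{\mathrm{QP}} = -\frac{\alpha}{2}\left(1 - \sin\big(\phi(T,\omega)\big)\sqrt{\frac{\sigma_1(T,\omega)^2 + \sigma_2(T,\omega)^2}{\sigma_1(0,\omega)^2 + \sigma_2(0,\omega)^2}}\right) \tag{6}$$

where Ψ is the complex digamma function; σ_1 and σ_2 are the real and imaginary parts of the complex conductivity; φ is the phase between the real and imaginary parts of the complex conductivity; and α is the kinetic inductance fraction [42]. The three free fit parameters are Q_TLS,0, T_c, and α.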
We compare the extracted Q_TLS,0 from the two measurements (Fig. 2c) and find that they agree on average to within 2.6σ. We note that our measurements were optimized for measuring Q_int rather than δf/f_0. As δf/f_0 is not sensitive to the applied microwave power, we have a factor of 5 to 10 fewer data points with which to fit δf/f_0 than Q_int. We conclude that while the difference between the two measurements is statistically significant, Q_TLS,0 is a robust parameter that forms a quantitative basis of comparison across devices when fitted from Q_int data.

III. PARAMETRIZING SOURCES OF LOSS
TLSs that cause loss can occur in many different materials in the same device: surface oxides, surface contamination, the exposed sapphire surface, the tantalum-sapphire interface, the bulk of the sapphire, and other elements related to packaging. In order to identify the location and origin of TLS loss, we fabricated 24 chips containing a total of 105 devices with varying geometry and surface conditions, and performed temperature and power dependent loss measurements to extract Q_TLS,0. Varying the device geometry changes the SPR (Fig. 3a). By modeling interfaces as dielectrics with an assumed standard thickness (3 nm) and permittivity (ε = 10), we can compute the fraction of electric field energy that overlaps with the interfaces of a device for a given electromagnetic mode, the SPR [42,43].
For the CPW resonators, we tune the SPR by tuning the pitch of the shorted CPW transmission line. Fixing the impedance to be 50 Ω constrains the ratio between the centerpin width and the gap width [49], so that the centerpin also increases in width as the SPR is reduced. For the LE resonators, we tune SPR by changing the spacing and size of the capacitor pads, where larger spacings and pads correspond to lower SPR. Across both types of devices, we vary the SPR by a factor of 30 [42].
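To illustrate why fixing the impedance constrains the ratio of centerpin width to gap width, the sketch below evaluates the standard quasi-static conformal-mapping expression for a coplanar waveguide on a thick substrate with zero-thickness metal, Z_0 = (30π/√ε_eff) K(k')/K(k), where k = w/(w + 2g), k' = √(1 - k²), ε_eff = (ε_r + 1)/2, and K is the complete elliptic integral of the first kind. This is an idealized textbook formula, not necessarily the design method used for the devices here, and the isotropic permittivity ε_r = 10.4 is a hypothetical stand-in for anisotropic sapphire. The function names are likewise hypothetical.

```python
import math


def agm(a, b, tol=1e-14):
    """Arithmetic-geometric mean of a and b."""
    while abs(a - b) > tol * a:
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a


def cpw_impedance(center_width, gap_width, eps_r):
    """Quasi-static CPW impedance (ohms) for a thick substrate, thin metal."""
    k = center_width / (center_width + 2 * gap_width)
    k_prime = math.sqrt(1 - k * k)
    # K(k) = pi / (2 * agm(1, k')), so K(k') / K(k) = agm(1, k') / agm(1, k).
    elliptic_ratio = agm(1.0, k_prime) / agm(1.0, k)
    eps_eff = (eps_r + 1) / 2  # half of the field in vacuum, half in substrate
    return 30 * math.pi / math.sqrt(eps_eff) * elliptic_ratio


if __name__ == "__main__":
    EPS_R = 10.4  # assumed isotropic value, for illustration only
    for w, g in [(10.0, 5.0), (100.0, 50.0), (10.0, 2.0)]:
        z0 = cpw_impedance(w, g, EPS_R)
        print(f"w = {w} um, g = {g} um: Z0 = {z0:.1f} ohm")
```

Because k depends only on the ratio w/g, scaling both dimensions together changes the SPR but leaves Z_0 unchanged in this approximation.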
The extracted Q_TLS,0 increases with decreasing SPR (Fig. 3b). The trend is approximately linear and then plateaus for low SPR, below 3 × 10^-4. We model this SPR dependence as arising from two different components, a surface-related TLS loss that scales with SPR, and a bulk TLS loss that is SPR-independent. The losses can be parametrized as a loss tangent, which is the ratio of the imaginary and real components of the dielectric constant, tan δ = Im(ε)/Re(ε). The apparent surface loss tangent varies across the four surface treatments. Fitting these two components to the full dataset across all 105 devices yields surface loss tangents of tan δ_surface,BOE = (7.2 ± 0.6) × 10^-4, tan δ_surface,long BOE = (7 ± 1) × 10^-4, tan δ_surface,native = (13.6 ± 0.6) × 10^-4, and tan δ_surface,triacid = (14 ± 3) × 10^-4 for the BOE, long BOE, native, and triacid surface treatments, respectively. The fitted bulk loss tangent is tan δ_bulk = (1.5 ± 0.2) × 10^-7, which is an order of magnitude higher than recent bulk measurements on the same substrates [11]. This indicates that the bulk loss we observe is dominated by a surface damage layer within 50 µm of the surface, rather than a uniform loss tangent throughout the bulk [42].
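The two-component picture can be made concrete with a short sketch. It assumes the simplest form consistent with the description above, 1/Q_TLS,0 = SPR × tan δ_surface + tan δ_bulk, with the bulk participation taken to be of order unity and absorbed into tan δ_bulk; the function names are hypothetical. Using the central values of the fitted loss tangents quoted above, it estimates the SPR at which surface and bulk contributions are equal.

```python
# Fitted surface loss tangents quoted in the text (central values only).
TAN_SURFACE = {
    "native": 13.6e-4,
    "BOE": 7.2e-4,
    "long BOE": 7e-4,
    "triacid": 14e-4,
}
TAN_BULK = 1.5e-7  # fitted bulk loss tangent quoted in the text


def q_tls0(spr, tan_surface, tan_bulk=TAN_BULK):
    """Assumed model: surface loss scales with SPR, bulk loss does not."""
    return 1.0 / (spr * tan_surface + tan_bulk)


def crossover_spr(tan_surface, tan_bulk=TAN_BULK):
    """SPR at which surface and bulk TLS losses contribute equally."""
    return tan_bulk / tan_surface


if __name__ == "__main__":
    print(f"bulk-limited Q_TLS,0 = {1.0 / TAN_BULK:.2e}")
    for name, tan_s in TAN_SURFACE.items():
        print(f"{name}: crossover SPR = {crossover_spr(tan_s):.1e}")
```

Under this assumption the crossover falls at an SPR of roughly 1 × 10^-4 to 2 × 10^-4, the same order as the plateau observed below 3 × 10^-4.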
We study the correlation between loss tangent and tantalum oxide thickness after the four different surface treatments to localize the source of surface-related TLS loss. The native oxide is an approximately 3 nm thick, kinetically-limited, stoichiometric oxide that is remarkably robust to chemical processing, as measured using X-ray photoelectron spectroscopy and transmission electron microscopy [15,19,42]. In [19], we observe a reduction in oxide thickness after BOE treatment to approximately 2.4 nm, while after triacid treatment, the oxide grows to nearly 6 nm. We correlate results from [19] with lab-based XPS results to estimate that the total oxide thickness after the long BOE treatment is 1.5 nm ± 0.3 nm [42]. The long BOE and BOE treated devices exhibit 1.96 and 1.89 times higher Q_TLS,0 than the devices with a native surface (Figs. 3b,c). Since the tantalum oxide layer is amorphous, and it is thinner after either BOE treatment, a likely hypothesis for the origin of TLS loss is the oxide layer. However, we observe that the triacid-treated samples have similar values of Q_TLS,0 to those of the native samples, despite their thicker oxide. Therefore, the TLS loss is not proportional to the volume of the oxide, possibly because another bath of TLSs decreases to compensate the additional oxide-related loss in the triacid treated samples.
One possible difference among the samples is that the triacid and BOE treatments are highly effective at removing residual hydrocarbon contamination from fabrication, resulting in a reduction in the parasitic hydrocarbon TLS loss commensurate with the increased oxide loss from triacid treatment. Treatment in BOE removes some surface oxide, therefore any contamination on that surface should be removed concurrently; similarly, the triacid treatment has previously been shown to be strongly oxidizing and effective at removing hydrocarbons [50]. We therefore model the observed surface loss tangents for the four surface conditions as arising from three components: a hydrocarbon component at the metal-air interface, an oxide-related loss tangent that is proportional to the oxide thickness, and a component related to the metal-substrate and substrate-air interfaces [42]:

$$\tan\delta_{\mathrm{surface},i} = \frac{p_{\mathrm{MA}}}{p_{\mathrm{MS}}}\left(\frac{t_{\mathrm{TaOx},i}}{t_0}\tan\delta_{\mathrm{TaOx}} + \gamma_i\tan\delta_{\mathrm{HC}}\right) + \tan\delta_{\mathrm{MS}} + \frac{p_{\mathrm{SA}}}{p_{\mathrm{MS}}}\tan\delta_{\mathrm{SA}} \tag{7}$$

where tan δ_surface,i is the loss tangent for condition i, t_0 is the assumed thickness of the substrate-air and metal-substrate interfaces used to simulate participation ratios (3 nm), t_TaOx,i is the oxide thickness of treatment i, γ_i ∈ {0, 1} is a factor determining if hydrocarbon loss is considered for surface condition i, and the subscripts MA, SA, MS, and HC refer to metal-air, substrate-air, metal-substrate, and hydrocarbons respectively.

FIG. 3. (a) As the distance between capacitor pads (gray) increases, the fraction of the electric field (black arrows) energy overlapping with a thin layer at the three interfaces, metal-air (yellow), metal-substrate (purple), and substrate-air (green), decreases. The fraction of the electric field energy in the sapphire substrate (blue) does not strongly depend on the distance between capacitor pads. (b) Dependence of the extracted Q_TLS,0 from Q_int measurements on SPR. For the highest SPR devices Q_TLS,0 exhibits linear scaling with SPR, but for lower SPR the Q_TLS,0 saturates, indicating that there are both surface and bulk TLS baths. Comparing the data for four different surface conditions, native (orange), BOE (dark blue), long BOE (light blue), and triacid (green) allows for the estimate of surface loss tangents for each surface: tan δ_surface,native (dashed orange line), tan δ_surface,BOE (dashed dark blue line), tan δ_surface,long BOE (dashed light blue line), and tan δ_surface,triacid (dashed green line), as well as the bulk substrate loss tangent tan δ_bulk (dashed grey line). (c) Histogram of surface loss tangents for native (orange), BOE (dark blue) and long BOE (light blue) CPW devices, showing that BOE and long BOE treatments result in surface loss tangents that are a factor of 1.89 times and 1.96 times lower than the native surface on average.
To estimate the different components of the loss, we assume that the hydrocarbon loss is completely eliminated after any of the triacid or BOE treatments (γ_i = 0) and is present only for the native condition (γ_native = 1). By quantitatively comparing the extracted loss tangents for our four conditions with Equation 7 [42], we calculate a putative rescaled hydrocarbon-related loss tangent for an assumed standard 3 nm thick interface, (p_MA/p_MS) tan δ_HC = (4.9 ± 0.5) × 10^-4, and an intrinsic loss tangent for the tantalum oxide, tan δ_TaOx = (5 ± 1) × 10^-3. When rescaled to account for the difference in participation ratios of different surfaces, (p_MA/p_MS) tan δ_TaOx = (5 ± 1) × 10^-4. Assuming that the triacid and two BOE treatments do not affect the metal-substrate and substrate-air interfaces, our model also provides an estimate for the loss contributions of those two interfaces, and finds that they give a combined value of tan δ_MS + (p_SA/p_MS) tan δ_SA = (4 ± 1) × 10^-4 [42]. The rescaled loss tangents are all comparable, indicating that all surfaces play a critical role in determining overall loss. Other possible models for the effects of surface treatment that we have ruled out based on the data are detailed in the Supplemental Material [42].
Our data rules out a model for the TLS loss that is purely extensive in the oxide thickness. Here we have hypothesized that a second bath residing in fabrication-related contaminant hydrocarbons can account for the difference. However, there are other possible microscopic models, such as the possibility that a single chemical species or suboxide is responsible for all the TLS loss in the oxide, and the native and triacid samples have equal amounts of that species despite the large difference in total oxide thickness. Testing such hypotheses would require the measurement of many more surface conditions that independently vary each candidate TLS component.
While Q_TLS,0 parametrizes the linear absorption in the device, for transmon operation the steady state photon occupation will be around n = 1, and we observe that the saturation power for the TLS bath can be a small fraction of a single photon. For the largest devices at base temperature, Q_TLS(n = 1) ranges from 5 × 10^6 to 15 × 10^6 (Fig. 4), in line with state-of-the-art qubits [15,16].
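The decomposition can be checked with simple arithmetic. The sketch below assumes the surface loss tangent for condition i is the sum of an oxide term scaled by t_TaOx,i/t_0, a hydrocarbon term switched on by γ_i, and a fixed metal-substrate plus substrate-air term, and it uses only the central values quoted in the text. The oxide thicknesses are the approximate values given above (triacid taken as 6 nm), and the variable and function names are hypothetical.

```python
T0_NM = 3.0  # assumed standard interface thickness, nm

# Approximate oxide thickness (nm) and hydrocarbon switch gamma_i per condition.
OXIDE_NM = {"native": 3.0, "BOE": 2.4, "long BOE": 1.5, "triacid": 6.0}
GAMMA = {"native": 1, "BOE": 0, "long BOE": 0, "triacid": 0}

# Rescaled component loss tangents quoted in the text (central values).
TAN_OXIDE = 5e-4      # (p_MA/p_MS) tan(delta_TaOx) for a 3 nm oxide
TAN_HC = 4.9e-4       # (p_MA/p_MS) tan(delta_HC)
TAN_SUBSTRATE = 4e-4  # tan(delta_MS) + (p_SA/p_MS) tan(delta_SA)

# Surface loss tangents fitted directly from the SPR dependence.
FITTED = {"native": 13.6e-4, "BOE": 7.2e-4, "long BOE": 7e-4, "triacid": 14e-4}


def surface_loss(condition):
    """Sum the oxide, hydrocarbon, and substrate-interface contributions."""
    oxide = OXIDE_NM[condition] / T0_NM * TAN_OXIDE
    hydrocarbon = GAMMA[condition] * TAN_HC
    return oxide + hydrocarbon + TAN_SUBSTRATE


if __name__ == "__main__":
    for name in FITTED:
        model, fit = surface_loss(name), FITTED[name]
        print(f"{name}: model {model:.2e}, fitted {fit:.2e}")
```

With these central values the three-component sum lands within roughly 10% of each fitted surface loss tangent, which is within or close to the quoted uncertainties.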

IV. CONCLUSION
We observe that state-of-the-art tantalum devices are limited by TLS loss. Using systematic measurements and parametrization of losses in superconducting circuits, we have shown that there are multiple sources of TLSs: a surface-related TLS bath associated with the tantalum oxide that can be reduced by around a factor of two with BOE treatment, and a substrate-related TLS bath. Furthermore, the surface-related TLS loss is not extensive in the oxide volume, indicating that there may be at least one additional TLS bath, such as fabrication-related hydrocarbon contamination. Critically, each of these components is of similar magnitude for state-of-the-art devices, and future improvements in superconducting qubits will require material improvements that address all of these sources of loss. Two natural avenues to pursue based on our findings would be to passivate the Ta surface to avoid oxide formation entirely, and to study subsurface damage from polishing and surface processing in sapphire substrates.

FIG. 4. Although Q_TLS is a nonlinear function of n, we nevertheless observe a roughly linear dependence between Q_TLS(n = 1) and SPR. We fit an apparent loss tangent to each surface condition: tan δ_surface,native(n = 1) (dashed orange line), tan δ_surface,BOE(n = 1) (dashed dark blue line), tan δ_surface,long BOE(n = 1) (dashed light blue line), and tan δ_surface,triacid(n = 1) (dashed green line), as well as the bulk substrate loss tangent tan δ_bulk(n = 1) (dashed grey line). Data are calculated from Equation 7 with errors propagated from errors in fit parameters. The error bars are truncated at the lower end by Q_TLS,0.

The observed temperature dependence also points to two paths for improving the performance of superconducting qubits: reducing the density of TLSs, and improving the coherence time of the TLS bath. Our ongoing work includes studying the dynamics of the TLS bath using pump-probe spectroscopy [36] and other time-domain methods [51].
Correlations between the measurements presented here and direct materials spectroscopy may identify atomistic origins of TLS loss. For example, the losses in the tantalum oxide could arise from particular suboxides or interface states, and detailed chemical profiling using X-ray photoelectron spectroscopy could elucidate the particular chemical species responsible for TLS loss [19,52]. More broadly, the parameterization presented here isolates and identifies the material-related loss, thereby enabling quantitative comparisons among different material systems, such as new superconducting metals [53] and metal heterostructures, alternative substrates such as high purity silicon [54], and different fabrication and post-processing techniques.
3" diameter sapphire substrates are cleaned in a 2:1 H2SO4:H2O2 piranha solution for 20 mins, then rinsed in 3 cups of de-ionized water followed by 1 cup of 2-propanol, and then blown dry in N2. Then the sapphire substrate is loaded into a DC magnetron sputtering system (AJA Orion 8). The substrate is heated in-situ at 850 °C before tantalum sputtering. The film deposition parameters were as follows: RF power of 250 W, Ar flow rate of 30 sccm, ambient pressure, temperature ramp rate 1 °C/minute, and steady state temperature of 750 °C, which results in a film growth rate of approximately 8 nm/minute. Post deposition, the tantalum films are confirmed to be predominantly 111 orientation in the α-phase using a Bruker D8 Advance X-ray Diffractometer.
The deposited tantalum film is dehydration baked at 110 °C and then cooled for about a minute on a metal plate. Following this, AZ1518 is spun on at 4000 rpm for 45 secs with a ramp rate of 1000 rpm/sec for an approximate resist thickness of 3 µm and soft baked at 95 °C for 1 minute. The photoresist is patterned using a Heidelberg DL66+ laser writer with a 1.8 µm spot size with a 50% attenuator, intensity setting of 30% and focus offset setting of 10%. The photoresist is developed in AZ300MIF solution for 90 s and rinsed in de-ionized water for 30 s. After development, the mask is hard baked at 110 °C for 2 minutes and then cooled on a metal plate for 1 minute.
Using the patterned photoresist as a mask, we etched each device with one of three different etch types. One type is a wet chemical etch, 1:1:1 ratio of HF:HNO3:H2O (Transene Tantalum Etchant 111), in which a sample is swirled for 21 seconds before being rinsed in 3 cups of de-ionized water and 1 cup of 2-propanol, then blown dry in N2. The second etch type is a chlorine-based dry chemical etch in an inductively-coupled plasma reactive ion etcher (PlasmaTherm Takachi). The etching parameters for the chlorine dry etch are as follows: ambient pressure of 5.4 mTorr, chlorine flow rate of 5 sccm, argon flow rate of 5 sccm, RF power of 500 W, and bias power of 50 W, which results in an etch rate of approximately 100 nm/min. The third etch type is a fluorine-based dry etch, using the same reactive ion etcher as the chlorine etch, with parameters: ambient pressure of 50 mTorr, CHF3 flow rate of 40 sccm, SF6 flow rate of 15 sccm, Ar flow rate of 10 sccm, RF power of 100 W, and bias power of 100 W.
After etching, the photoresist mask is stripped in a Remover PG bath at 80 °C for 1 hour followed by rinsing in 2-propanol. The patterned Ta film is coated with hard-baked AZ1518 using the same parameters mentioned above to act as a protective layer for dicing. The wafer is diced (Advanced Dicing Technologies proVectus 7100 dicing saw) into 10 mm or 7 mm pieces, depending on the packaging used in the dilution refrigerator. Following dicing, the photoresist is stripped in a Remover PG bath at 80 °C for 1 hour, followed by 2 minutes each of sonication in toluene, acetone, and 2-propanol. Some chips were sonicated in methanol for 2 minutes between the acetone and 2-propanol sonication to remove zinc contamination. The chips are blown dry in N2, and then cleaned in a 2:1 H2SO4:H2O2 piranha solution for 20 mins followed by rinsing in 3 cups of de-ionized water and 1 cup of 2-propanol and then blown dry in N2.
After fabrication, the samples are treated in BOE or triacid as detailed in Section IB. Then the chips are bonded to a PCB using an automatic wire bonder (Questar Q7800). We used two types of packages for our resonator chips. One comprises a Cu-plated PCB and a Cu puck and penny coated with 1 µm aluminum. The second comprises a

B. Surface processing
10:1 buffered oxide etch (BOE, Transene) is a mixture of 10 parts 40% NH4F solution to 1 part 49% HF solution by volume. BOE treated samples were placed in buffered oxide etch at room temperature and were not agitated. After 20 minutes ("BOE" treatment) or 120 minutes ("long BOE" treatment), the samples were removed and triple rinsed in de-ionized water and 2-propanol before being blown dry in N2.

C. Measurement apparatus
All devices were measured in a BlueFors XLD dilution refrigerator with a base mixing chamber temperature of approximately 17 mK. There are four independent input lines and four corresponding output lines. A fridge diagram showing the layout for all four input and output lines is given in Figure S1. Each input line has between 60 dB and 85 dB of attenuation from discrete cryogenic XMA attenuators (above mixing chamber, PN: 2082-604X-dB-CRYO) and cryogenic attenuators from Quantum Microwave (at mixing chamber, PNs: QMC-CRYOATTF-06 and QMC-CRYOATTF-03), as well as attenuation from stainless steel coaxial transmission line cables, SMA connections, and insertion losses from filters. The total input line attenuation varies across the lines from 86.7 dB to 108.7 dB at resonator frequencies. Two types of low pass filter are used at the mixing chamber, a commercial filter from K&L Microwave (PN: 6L250-00089) outside of the magnetic shield and an eccosorb filter placed inside of the magnetic shield. Two types of magnetic shield were used across our experiments. One type is a custom-fabricated can made of mu-metal, with which we used a custom-made eccosorb filter with upper cutoff frequency of approximately 8 GHz, and the other is a prototype product (QCan) from QDevil with which we used an eccosorb filter supplied by QDevil with a similar pass band.
Each output line contains filters, isolators, and amplifiers. At the mixing chamber, we used an eccosorb filter, with part number matching that on the input line, and the same K&L filter as the input line. Two isolators were placed in series, both from QuinStar Technology (QCI-075900XM00). At the 4 K stage, a high electron mobility transistor (HEMT) amplifier was used (Low Noise Factory LNF-LNC4_8F). Superconducting NbTi wire was used between the isolators and the HEMT to reduce signal attenuation. Additional filters were sometimes placed in the output line, with pass bands which contained all resonators that were being measured.
Several devices used a traveling wave parametric amplifier (TWPA) sourced from the MIT Lincoln Laboratory. The TWPA was placed in a separate magnetic shield at the mixing chamber and placed in the signal path immediately after the second isolator. The TWPA was pumped using a separate input line.
All measurements were conducted with a vector network analyzer from Keysight (PNA-X Network Analyzer N5241A). For most experiments, the measurement parameters were: span of 5 times the high power resonator linewidth, 201 points across the frequency axis, IF bandwidth of 30 Hz, and an integration time per resonator varying from 1 minute (high power) to three hours (low power). Integration times were adjusted for each resonator chip, and measurement parameters differed slightly for early experiments.

D. Resonator spectroscopy
Resonators are easily located in frequency space due to their high quality factor relative to all other features. Figure S2 shows a wide frequency sweep of a chip with four resonators coupled to a feedline. The wide frequency ripples may be caused by standing waves or reflections from connections on our measurement setup; however, given that the widths of these ripples are on the order of 10 MHz and the widths of the resonators are on the order of 1 kHz, we ignore these ripples and assume that a flat background exists when measuring each resonator.

E. Measuring Q_int
We used the following model to fit each resonator trace, such as the one shown in Figure 1c [1]: where |S 21 | is the magnitude of the transmission through the feedline, Q c is the coupling quality factor, Q tot is the total quality factor (Q ), α is the asymmetry of the resonator, ω 0 is the center angular frequency of the resonator, ω probe is the angular frequency of the probe tone, and |S 21,baseline | is the transmission through the feedline when no resonator is present.We have assumed that |S 21,baseline | is a constant, which is approximately correct for resonators with a small linewidth.The derivation of this model is given in the appendix of [1] with a minimal assumption set.
The coupling quality factor, Q c , parameterizes the loss from the resonator to the feedline. In order to characterize our material losses, we must be able to separately determine Q c and Q int across an entire temperature and power sweep. The value of Q c is determined by the capacitive or inductive coupling between each resonator and the feedline, and therefore we expect Q c to be independent of power and temperature.
In our analysis, each resonator |S 21 | trace is fit independently. To show that we can separately extract Q c and Q int from the same |S 21 | trace, we examine the fitted values of Q c for each temperature sweep. We find that our fitted values of Q c are constant across power and temperature, and so we conclude that we have extracted an accurate value of Q c , and therefore of Q int . An example plot of Q c versus power and temperature, corresponding to the same sweep shown in Figure 2a, is shown below in Figure S3.
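As a minimal sketch of how Q int follows from the two fitted quantities, the snippet below assumes the standard decomposition 1/Q tot = 1/Q int + 1/Q c. The function name `q_internal` and the numerical values are hypothetical and chosen only for illustration; they are not measured data.

```python
def q_internal(q_tot: float, q_c: float) -> float:
    """Return Q_int assuming 1/Q_tot = 1/Q_int + 1/Q_c."""
    if q_tot <= 0 or q_c <= 0:
        raise ValueError("quality factors must be positive")
    if q_c <= q_tot:
        # Q_tot can never exceed Q_c if the decomposition holds.
        raise ValueError("Q_c must exceed Q_tot for a positive Q_int")
    return 1.0 / (1.0 / q_tot - 1.0 / q_c)


# Hypothetical fitted values (illustration only).
example_q_int = q_internal(q_tot=1.0e6, q_c=2.0e6)
print(f"Q_int = {example_q_int:.3g}")
```

Because Q c is expected to be independent of power and temperature, any drift in the fitted Q c across a sweep would propagate directly into Q int through this relation, which is why the constancy check described above matters.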

F. Nonlinear behavior at high microwave power
When measuring at the highest powers, we were occasionally unable to fit a resonator trace (Figure S4), which we attribute to nonlinear behavior of the resonator. Potential sources of this nonlinearity are the saturation of an amplifier, or an effect of the superconducting state such as the nonlinear kinetic inductance of Cooper pairs [2,3].
As we are most concerned with the behavior of our devices at low power, we excluded traces showing the nonlinear behavior from our analysis.

G. Model for Q TLS
It can be shown that the loss induced by an ensemble of TLSs coupled to an electromagnetic mode takes the form [4]: where n is the average photon number in the mode, T is the temperature of the mode-ensemble system, ω is the frequency of the mode, n c is the critical photon number of the ensemble, and T 1 and T 2 are the average relaxation and decoherence times of the ensemble. In order to obtain the model we use to fit the TLS component of our Q int data, we make a few substitutions. First, the average T 1 of the ensemble can be shown to follow a thermal distribution [4]: Second, TLS-TLS interactions can be modeled as state changes in one TLS causing dephasing in neighboring TLSs.
As the temperature is reduced, thermal fluctuations in the states of the TLSs in the ensemble will reduce as more and more members of the ensemble occupy the ground state. We therefore expect an inverse relationship between the TLS coherence time T 2 and temperature, which we model as [5]: where β 1 is an empirical parameter. Finally, different mode shapes will overlap with and saturate the ensemble differently as they are populated with increasing numbers of photons, and we account for this by introducing another empirical fit parameter, β 2 [6]: Putting all of these substitutions together gives our TLS loss model: The model we use to fit quasiparticle losses has been discussed in other works [7].
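The substitutions described above can be sketched numerically. The snippet below is an illustration under stated assumptions, not the authors' fitting code: it assumes the commonly used saturable-TLS form in which the loss scales as tanh(ħω/2k B T) divided by the square root of a saturation term, with T 1 contributing a tanh factor, T 2 contributing 1/T^β1, and the photon number entering as n^β2. The function name `inv_q_tls` and the lumped constant `d` are hypothetical names for this sketch.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K


def inv_q_tls(n, temp_k, freq_hz, q_tls0, d, beta1, beta2):
    """Assumed TLS loss 1/Q_TLS at photon number n and temperature temp_k.

    Assumed form (a sketch, not taken verbatim from the source):
        1/Q = tanh(x) / (Q_TLS0 * sqrt(1 + n**beta2 * tanh(x) / (d * T**beta1)))
    with x = hbar * omega / (2 k_B T).
    """
    x = HBAR * 2.0 * math.pi * freq_hz / (2.0 * K_B * temp_k)
    thermal = math.tanh(x)
    saturation = math.sqrt(1.0 + (n ** beta2) * thermal / (d * temp_k ** beta1))
    return thermal / (q_tls0 * saturation)


# Hypothetical parameters for illustration only.
for photons in (0.0, 1.0, 1e3, 1e6):
    loss = inv_q_tls(photons, 0.01, 6e9, 1e6, 1.0, 1.0, 0.5)
    print(f"n = {photons:g}: Q_TLS = {1.0 / loss:.3g}")
```

In this form, Q TLS,0 sets the linear (unsaturated) absorption at low power and low temperature, while d, β 1 and β 2 only control how quickly the ensemble saturates.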

H. Model for frequency shift
The model we use to fit the frequency shift is: where: is the TLS contribution to the frequency shift and: is the quasiparticle contribution to the frequency shift. In Equations 8 and 9, Ψ is the complex digamma function; σ 1 and σ 2 are the real and imaginary parts of the complex conductivity; φ is the phase between the real and imaginary parts of the complex conductivity; and α is the kinetic inductance fraction. The derivation of the TLS contribution can be found in, for example, [4]. The expression for the quasiparticle contribution is based on [4] but is not explicitly stated there, so we derive it in detail below. The frequency shift from quasiparticles is defined as: where X S is the imaginary part of the surface impedance of the superconductor, otherwise known as the reactance.
In general the surface impedance has a cumbersome form, but in three superconducting material limits it takes the simpler form: where A is a constant prefactor, σ(T ) is the superconducting complex conductivity, and γ is a parameter that takes a different value depending on which of the three limits the superconductor is in. As the quasiparticles of interest in our system have a thermal distribution, the complex conductivity takes the form: where: and: In the above equations, ħ is the reduced Planck constant, k B is the Boltzmann constant, ω is the resonator center angular frequency, T is temperature, ∆ 0 = 1.764 k B T c is the superconducting gap, T c is the superconducting critical temperature, I 0 is the zeroth order modified Bessel function of the first kind, K 0 is the zeroth order modified Bessel function of the second kind, and σ n is the normal-state conductivity of the superconductor just above T c . The surface impedance can be rewritten in a more convenient form using these quantities: where: and: The reactance is then: The frequency shift can then be written in terms of the complex conductivities: There are three possible values of γ depending on the electron mean free path (ℓ), coherence length (ξ 0 ), film thickness (d), and London penetration depth (λ LO ) of the superconductor [4]: However, we did not directly measure the parameters needed to determine which of these three regimes applies, so instead we fit our data to all three to see if the assumed regime made a difference to the outcome of the fit. The results of this analysis are shown below, where the consistency between the superconducting critical temperature T c estimated by our Q int fits and our frequency shift fits is plotted for all three values of γ. As can be seen, the assumed regime makes no difference to the fit outcome, so we choose to work in the thin film local limit (γ = −1) for our quasiparticle frequency shift fits:

II. RESONATOR DESIGN

A. CPW resonators
Our CPW resonators are quarter-wave resonators constructed by shorting one end of a transmission line. Our design sets the characteristic impedance (Z 0 ) of the resonators to 50 Ω, which dictates a relationship between the centerpin width and the gap width [8]. This means that if the distance between the center pin and ground plane (pitch) of the resonator is specified, the centerpin width is fully constrained. The resonators are designed to have resonance frequencies between 6 and 8 GHz, where the resonance frequency is dictated by: where ε eff is the effective dielectric constant defined in [8], l is the length of the resonator, and v is the speed of electric field propagation down the transmission line. We generally assume v = c, where c is the speed of light in vacuum.
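For orientation, the length scale implied by this design target can be estimated with the textbook quarter-wave relation f 0 = v / (4 l √ε eff ), which we assume matches the expression referred to above. The helper names and the value ε eff = 6 are hypothetical, used only to illustrate the calculation.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def quarter_wave_f0(length_m, eps_eff, v=C_LIGHT):
    """Fundamental frequency of a quarter-wave resonator (assumed textbook form)."""
    return v / (4.0 * length_m * math.sqrt(eps_eff))


def length_for_f0(f0_hz, eps_eff, v=C_LIGHT):
    """Resonator length needed to reach a target fundamental frequency."""
    return v / (4.0 * f0_hz * math.sqrt(eps_eff))


EPS_EFF = 6.0  # hypothetical effective dielectric constant, illustration only
for target_ghz in (6.0, 8.0):
    length = length_for_f0(target_ghz * 1e9, EPS_EFF)
    print(f"{target_ghz} GHz -> l = {length * 1e3:.2f} mm")
```

Under these assumptions the 6 to 8 GHz design window corresponds to resonator lengths of a few millimeters.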
When designing both the LE and CPW resonators, we aim to have the coupling rate of the resonator to the feedline (1/Q c , sometimes written as 1/Q ext ) be equal to the expected internal loss rate of the resonator (1/Q int ). If Q c is too small, the measurement is not sensitive to changes in Q int ; and if Q c is too large, the photon lifetime in the resonator is short and signal-to-noise (SNR) decreases. Our CPW resonators are capacitively coupled to the feedline, and we compute this coupling using an equation from [9]: where C c is the capacitance between the centerpin and the feedline. We compute this capacitance using finite element analysis (ANSYS Maxwell 3D), and generally find good agreement between predicted and measured resonance frequencies and external loss rates.

B. LE resonators
The LE resonators consist of a meander inductor in series with a dipole capacitor, with a resonance frequency given by: and an impedance given by: where L and C are the inductance of the inductor and the capacitance of the capacitor. We account for stray capacitance across the inductor by modeling a stray capacitor in parallel with the lumped inductor. Therefore the total capacitance in the resonator is the sum of the lumped capacitance C L and the stray capacitance C S . The resonance frequency and impedance are then: and: For a given resonator design, we compute these three unknowns using three separate simulations. The first is a capacitance simulation in Ansys Maxwell 3D of only the dipole capacitor pads. We take the modeled capacitance to be equal to the lumped capacitance C L . The second and third simulations are HFSS eigenmode simulations of the meander and the full resonator. The resonance frequency of the meander can be written as: and the resonance frequency of the full resonator is given by Equation 26. With C L , f 0,meander and f 0,resonator calculated from the three simulations, the remaining unknowns (C S and L) and the fundamental resonator parameters (f 0 and Z 0 ) can be computed. The external coupling rate of the LE resonators was determined empirically by cooling down an initial design, and then adjusting the distance from the feedline to better match the external coupling rate to the internal loss rate. The distance from the feedline was adjusted by assuming the coupling would fall off proportional to 1/r 3 , as the coupling is inductive. After this initial cooldown, finer adjustments were made for subsequent designs, but in general the external coupling rate of the LE resonators matched the internal loss rate, which is the aforementioned condition for optimizing both SNR and sensitivity to changes in Q int .
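The three-simulation procedure above amounts to solving two resonance conditions for two unknowns. The sketch below assumes the usual lumped-element relations: the bare meander resonates with its stray capacitance, f 0,meander = 1/(2π√(L C S )), and the full resonator resonates with the total capacitance, f 0,resonator = 1/(2π√(L (C L + C S ))), with Z 0 = √(L/(C L + C S )). The function name `solve_le_resonator` and the example values are hypothetical.

```python
import math


def solve_le_resonator(c_l, f_meander, f_resonator):
    """Recover C_S, L, f0 and Z0 from the three simulated quantities.

    Assumes f_meander = 1/(2*pi*sqrt(L*C_S)) and
            f_resonator = 1/(2*pi*sqrt(L*(C_L + C_S))).
    """
    if f_meander <= f_resonator:
        raise ValueError("adding C_L must lower the resonance frequency")
    ratio = (f_meander / f_resonator) ** 2   # equals (C_L + C_S) / C_S
    c_s = c_l / (ratio - 1.0)
    inductance = 1.0 / ((2.0 * math.pi * f_meander) ** 2 * c_s)
    c_total = c_l + c_s
    f0 = 1.0 / (2.0 * math.pi * math.sqrt(inductance * c_total))
    z0 = math.sqrt(inductance / c_total)
    return c_s, inductance, f0, z0


# Hypothetical simulation outputs (illustration only).
c_s, ind, f0, z0 = solve_le_resonator(c_l=50e-15, f_meander=20e9, f_resonator=6e9)
print(f"C_S = {c_s * 1e15:.2f} fF, L = {ind * 1e9:.2f} nH, "
      f"f0 = {f0 / 1e9:.2f} GHz, Z0 = {z0:.0f} ohm")
```

By construction the returned f0 reproduces the simulated full-resonator frequency, so the useful outputs are C S , L and Z 0 .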

III. SPR CALCULATIONS
For both the CPW and LE resonators, the SPRs reported in the main text are computed by simulating the electric field energy stored in 3 nm thick dielectric interface layers with dielectric constants of ε = 10. For the CPW resonators, the simulation is done using a single cross section of the centerpin and ground plane, and for the LE resonators the simulation is done using a single cross section of the dipole capacitor pads. We use DC finite element simulations (Ansys Maxwell) for both kinds of single cross section simulations.
The single cross section approximation is appropriate for the CPW resonators because their geometry is a single cross section extruded along a path. However, the degree to which the single cross section simulation is a good approximation for the LE resonators is not as obvious, as the LE resonators have nontrivial structure in the direction normal to the cross section plane. To check that the single cross section simulations accurately estimate the SPRs of the LE resonators, we separately compute the SPRs for a handful of LE resonators using the method outlined in [10], which involves an eigenmode simulation supplemented with a DC cross section simulation of the metal edges. We find that the single cross section and 2D sheets methods agree to within 15%, indicating that the single dipole capacitor cross section simulation is a suitable approximation for the full LE resonator SPR. This also implies that the meander inductor does not contribute significantly to the total SPR of the LE resonators.

IV. BULK LOSSES
In the main text we describe how we extract loss independent of SPR by fitting Q TLS,0 versus SPR, and we extract a low power bulk loss tangent an order of magnitude larger than that measured in [11].
One hypothesis for this difference is that the "bulk" loss to which our measurements are sensitive is not the same as the volumetric average bulk loss measured by [11]. In our experiments, the device with the lowest SPR is an LE device with 65 µm spacing between capacitor pads. Our experiment therefore cannot distinguish between "bulk" and depths below the surface comparable to this spacing. In [11], by contrast, the experiment probes the loss tangent averaged over the bulk of a 440 µm thick HEMEX sample. We hypothesize that a near surface layer hosts a higher concentration of defects that give rise to TLS behavior. Since our measured bulk loss tangent is an order of magnitude higher than measured in [11], in order to reconcile the two measurements this highly damaged layer would need to be around ten times thinner than the bulk substrate measured in [11], around 50 µm.
These hypothesized extended defects could be caused by polishing, damage from etching, or other fabrication induced damage. Direct materials characterization of the polished sapphire could elucidate potential microscopic sources of TLS associated with this damage.

V. SURFACE CHARACTERIZATION AFTER CHEMICAL PROCESSING
We used X-ray photoelectron spectroscopy (XPS) to characterize the surface of our tantalum films before and after surface processing. We started with samples that had hard baked photoresist applied and stripped off in solvent following the procedures outlined in Section I A. We scanned a sample before any further chemical processing, after a piranha treatment ("native" surface), and after both a piranha and a 20 minute BOE treatment ("BOE" surface), as well as a separate sample after triacid treatment ("triacid" surface). To reduce the amount of adventitious carbon accumulated on the samples after chemical cleaning, we attempted to keep the length of time between chemical treatments and XPS measurements low. We took measurements within thirty minutes of the piranha treatment and BOE treatment, and took measurements four hours after triacid treatment. All XPS measurements were taken on a ThermoFisher K-Alpha XPS Spectrometer with an aluminum Kα X-ray source.
We took a broad survey scan on each sample and observed Ta, O, and C peaks on all samples, and a Na1s peak on the untreated sample only. We took fine scans of the Ta4f, O1s, and C1s peaks for all samples with a binding energy step size of 0.1 eV and a dwell time of 50 ms. We subtracted a Shirley background from the Ta4f and O1s peaks [12], and a linear background from the C1s peaks. To account for potentially different X-ray flux between different measurements, we normalized all intensity data to the total intensity of the Ta4f spectrum for each sample. In addition, we calibrated the binding energy scale by setting the lowest binding energy Ta4f peak to 21.2 eV.
In the Ta4f spectrum, we can resolve two pairs of two peaks. We attribute the symmetric pair of peaks between 26 eV and 30 eV to the dominant Ta 5+ oxidation state and the asymmetric pair of peaks between 21 eV and 24 eV to the tantalum metal [13]. Each state generates two peaks due to the strong spin-orbit coupling in tantalum [14]. The relative intensity of the Ta 5+ peaks is smallest for the untreated sample, increases slightly after a piranha treatment, decreases slightly after a BOE treatment, and is largest after a triacid treatment (Figure S6(a), qualitatively matching what is described in [15]).
For the C1s peak, the intensity is maximized for the untreated sample, and is significantly reduced by each acid treatment. Performing a BOE treatment after piranha treatment reduces the C1s intensity below that of piranha alone. The measurement on our triacid treated sample shows the strongest C1s signal of the three acid treated measurements (Figure S6(c)).
The relative intensity of the Ta 5+ doublet and the intensity of the O1s peak both indicate that the oxide thickness grows slightly after piranha treatment, is etched slightly after the BOE treatment, and is grown significantly after triacid treatment. In [15], we measure that the BOE treatment reduces the oxide thickness by 20% and the triacid treatment grows the oxide thickness by over a factor of two.
The sources of carbon in our system are adventitious carbon and photoresist residue. Therefore, the intensity change of the C1s peak is related to removal of fabrication residue, but can be complicated by the duration of air exposure, which leads to adventitious carbon accumulation. Our data show that piranha treatment is effective at removing carbon from the surface, but performing BOE in addition to piranha can remove more carbon than piranha alone. We attribute this further reduction to carbon being removed from the surface of the tantalum oxide as it is etched away. We expect the triacid treatment to be extremely effective at cleaning the surface [16]; however, the measurement of the triacid treated sample does not show as much reduction in the C1s signal. We attribute the larger triacid signal to the increased length of time between cleaning and measurement, which would allow more adventitious carbon to deposit on the surface.

The "No treatment" data were taken after photoresist was stripped from a sample. "Piranha", "Piranha + BOE", and "Triacid" correspond to the "Native", "BOE", and "Triacid" surface conditions, respectively. Ta4f and O1s data have a Shirley background subtracted [12], and all C1s data have a linear background subtracted. All data are normalized to the total Ta4f intensity measured on the sample.

VI. MODEL FOR SURFACE LOSSES
In the limit that surface losses dominate, dielectric loss in superconducting resonators can be expressed as: where tan δ i is the loss tangent of interface i; and MS, MA, and SA are the metal-substrate, metal-air, and substrate-air interfaces, respectively. The above expression can be rearranged as follows: where β i = p i /p MS , and tan δ is the parameter we fit for. We can recast the above in terms of p MA : where We now consider a model in which the BOE, long BOE, and triacid samples have a source of loss on the MA interface that scales linearly with the oxide thickness, and the native samples suffer from both this oxide-thickness dependent loss and an additional source of loss on the MA interface which we hypothesize is due to fabrication related hydrocarbons (Figure S7). We can recast losses in terms of the true oxide thickness and the hydrocarbon related loss by: where t 0 = 3 nm is the standard assumed oxide thickness, t i is the measured oxide thickness for the i th surface processing technique, p MA, i is the true MA surface participation of the i th surface processing technique (up to a factor of the assumed oxide dielectric constant, ε = 10), and γ i ∈ {0, 1} determines if hydrocarbon loss is considered for the i th surface processing technique (γ native = 1 and 0 otherwise). Equating the third and fifth lines of the above gives: By considering the native surface and any two of the BOE, long BOE, and triacid surfaces, we can solve the above equations for tan δ MA,0 , α SA tan δ SA + α MS tan δ MS , and tan δ HC .
Consider a set of surface treatments {Native, a, b}, where a and b are any pair of BOE, long BOE, and triacid. The system of equations described by Equation 33 for this set of treatments is solved for tan δ MA,0 and α SA tan δ SA + α MS tan δ MS by: and

FIG. S7 (panels: Native, BOE, Triacid). A model for hydrocarbon losses in which BOE and triacid treatments remove residual hydrocarbons left over from photoresist. The native samples suffer from losses from both the native oxide and the hydrocarbons on the MA interface. In the above cartoon, the pink layer is hydrocarbons, and the orange, green, and purple layers are the MA, SA and MS interfaces, respectively. The "BOE" diagram corresponds both to the BOE and long BOE treatments, with the difference being the thickness of the oxide layer. This model assumes that piranha cleaning is effective at removing hydrocarbons on the sapphire, but not on the oxide surface.

FIG. S8 (panels: Native, BOE, Triacid). An example of a model excluded by our data. Since BOE does not etch sapphire and does not etch hydrocarbons, one possibility is that hydrocarbons reside on the MA and SA interfaces of the native samples and the SA interface of the BOE and long BOE samples. This model could be achieved if piranha cleaning was ineffective at removing hydrocarbons from the sapphire surface, while the triacid treatment is highly effective. We continue to assume that hydrocarbons are lifted off from tantalum with BOE etching of the oxide. The "BOE" diagram corresponds both to the BOE and long BOE treatments, with the only difference being the thickness of the oxide layer. The parameter values we extract from this model given our data are unphysical.
Note that these solutions only involve the surface treatments a and b. Solving for tan δ HC : In order to have a better basis of comparison to our extracted quantities, we rescale the quantities computed above to p MS , which is the conventional metric by which surface-dependent losses are compared. For the hydrocarbon loss, we have: where Q HC is the inverse loss associated with the hydrocarbons and β MA = p MA /p MS .
For the SA and MS loss: where Q SA and Q MS are the inverse loss associated with the SA and MS interfaces.
Oxide thicknesses for the native, BOE, and triacid treatments are determined in [15], and we estimate the oxide thickness for the long BOE in Section VII. In all cases we consider the total oxide thickness to be a sum of the Ta 5+ , Ta 3+ , and Ta 1+ species. We compare the solutions for tan δ MA,0 , α SA tan δ SA + α MS tan δ MS , and tan δ HC for different choices of three surface treatments in Figure S9, and find that the solutions agree to within uncertainties. For each parameter, we fit the best single value to the three values reported by the three possible sets of surface treatments, and report these fitted values in the main text.
Other assumptions about the configuration of hydrocarbons after the three surface treatments can be made, but we find that our data exclude certain configurations of hydrocarbons. For example, if we consider the distribution of hydrocarbons depicted in Figure S8, we recover unphysical (negative) values for certain loss tangents, which implies that the model is incorrect. This suggests that piranha cleaning is effective at removing fabrication related hydrocarbons from the sapphire surface.

VII. OXIDE THICKNESS AFTER LONG BOE
In [15], we measure the oxide thickness of tantalum films under three surface conditions: native, BOE treated for 20 minutes, and BOE treated for 40 minutes. The technique used, variable energy XPS (VEXPS), requires a synchrotron light source, and so could not be replicated in our lab to measure the thickness of the 120 minute BOE treated surface. Instead, to estimate the total oxide thickness, we correlate XPS measurements done on our laboratory system to oxide thickness measurements from [15].
We measured the Ta4f spectrum for four samples with four surface treatments: native; and treated in BOE for 20, 40, and 120 minutes (Figure S10(a)). We subtracted a Shirley background from all spectra [12] and normalized all data so that the metallic Ta 7/2 peak height is unity. Similar to our observations in Section V, we see a decrease in the photoelectron fraction from the Ta 5+ species with BOE treatment. We fit all Ta4f spectra with doublets associated with the Ta 0 , Ta 0 int , Ta 1+ , Ta 3+ , and Ta 5+ states [13,15]. The Ta 0 and Ta 0 int peaks are all fit with asymmetric Voigt peaks, while the Ta 1+ , Ta 3+ , and Ta 5+ peaks are all fit with symmetric Gaussians.
We consider the total oxide thickness to be the sum of the Ta 1+ , Ta 3+ , and Ta 5+ species. We expect the fraction of the photoelectron intensity corresponding to these peaks to be proportional to their thickness with some unknown rate parameter. To find this rate parameter, we take the photoelectron intensity fraction of all oxide species for each sample and compare them to the oxide thicknesses measured in VEXPS (Figure S10(c)). We find an approximately linear relationship between the photoelectron intensity fraction of the oxide and the measured oxide thickness in VEXPS. We extrapolate this line to the photoelectron intensity fraction of the 120 minute BOE treated device and find an estimated oxide thickness of 1.5 ± 3 nm. The error in this estimate is dominated by the linear fit shown in Figure S10(c).

[12] and normalized so the peak height of the Ta 7/2 metallic peak is unity. (b) Fit to the XPS peaks for the 20 minute BOE treated sample. The peaks used to fit the spectrum are doublets of Ta 0 (dark blue), Ta 0 int (cyan), Ta 1+ (green), Ta 3+ (yellow), and Ta 5+ (pink). Ta 0 and Ta 0 int peaks are fit with asymmetric Voigt profiles, others are fit with symmetric Gaussians. The lower binding energy peak in each doublet corresponds to the Ta 7/2 spin state and the higher to the Ta 5/2 spin state [14]. (c) Correlation of oxide photoelectron intensity fit to our lab-based XPS data to the oxide thickness measured in VEXPS [15]. Data (blue) are available from VEXPS for native, 20 minute BOE treated, and 40 minute BOE treated samples. Green is the best fit line to these data points, extrapolated to the oxide photoelectron intensity fraction of the 120 minute BOE treated sample (grey dashed line) to give an estimate of the oxide thickness after a 120 minute BOE treatment (orange). Photoelectron intensity fractions are normalized so that the native intensity fraction is unity.
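The extrapolation described in this section is an ordinary least-squares line through the three VEXPS calibration points, evaluated at the intensity fraction of the 120 minute BOE sample. The sketch below shows that calculation in plain Python. The function names are hypothetical, and the numbers in the example are placeholders chosen for illustration only; they are not the measured intensity fractions or thicknesses.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate_thickness(fractions, thicknesses_nm, new_fraction):
    """Evaluate the fitted line at a new oxide intensity fraction."""
    slope, intercept = fit_line(fractions, thicknesses_nm)
    return slope * new_fraction + intercept


# Placeholder calibration points (NOT measured data): normalized oxide
# intensity fraction vs. oxide thickness in nm.
placeholder_fractions = [1.0, 0.9, 0.8]
placeholder_thickness_nm = [3.0, 2.7, 2.4]
estimate = extrapolate_thickness(placeholder_fractions, placeholder_thickness_nm, 0.6)
print(f"extrapolated thickness: {estimate:.2f} nm")
```

With only three calibration points, the uncertainty of the fitted slope dominates the uncertainty of the extrapolated value, consistent with the statement above that the error is set by the linear fit.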

VIII. CORRELATIONS AMONG OTHER PARAMETERS
The model described in Equations 1-3 in the main text contains seven free fit parameters. To check that we can independently extract all seven parameters from our dataset, we plot each pair of fitted parameters in Figure S11. We see no correlations among parameters except between T c and Q QP,0 and between D and β 2 .
The correlation between T c and Q QP,0 is likely due to the limited amount of high temperature data that we recorded (Section X). We conclude that, with our current dataset, we cannot quantitatively separate T c and Q QP,0 .
The correlation between D and β 2 is an artifact of the parameterization of Equation 2. The correlation is a straight line on a log-linear plot, and therefore if we considered D 1/β2 as our fit parameter instead of D, we would see no correlation.
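The reparameterization argument can be made explicit with a short worked equation. This is a sketch that assumes D enters the saturation term of the TLS model as n^{β2}/(D T^{β1}); the symbol \tilde{D} is introduced here only for illustration.

```latex
\frac{n^{\beta_2}}{D\,T^{\beta_1}}
  = \frac{1}{T^{\beta_1}} \left( \frac{n}{\tilde{D}} \right)^{\beta_2},
\qquad
\tilde{D} \equiv D^{1/\beta_2}
\quad\Longrightarrow\quad
\log D = \beta_2 \log \tilde{D} .
```

Under this assumption, devices sharing a similar photon-number scale \tilde{D} fall on a straight line in a plot of log D against β 2 , which is the log-linear correlation described above.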
We note that no correlation is apparent between Q TLS,0 and any of the other six parameters. Therefore we are able to meaningfully differentiate the linear absorption of our TLS bath from its saturation behavior, and from the effects of thermal quasiparticles and the constant loss parameterized by Q other . In particular, we note that the fitted value of T c can range from 0.14 K to above 4 K, which can affect the available temperature range of the data (Section X), but the fitted Q TLS,0 value does not correlate with T c .

In addition to the surface treatments we discuss in the main text, we also vary the method of etching the tantalum, as described in Section I A. The plot of Q TLS,0 vs p MS in Figure 3 of the main text is reproduced here, but with the data points further stratified by etch type (Figure S12). It is possible that dry etching with either the Cl or F based recipe leads to additional surface damage and TLS loss, or that different etch types create different edge qualities; however, our data do not have enough statistical power to conclusively identify another source of loss arising from etch type.

B. Annealing sapphire
One possible source of TLS loss is the disordered sapphire surface. We explored sapphire annealing to interrogate the contribution of this surface. Prior to tantalum deposition, we processed some of our sapphire wafers to achieve a near atomically-flat surface with observable step edges to probe the impact on the metal-substrate and substrate-air losses. Atomic terraces have been previously observed on sapphire after high temperature annealing [17].
After cleaning the sapphire wafers in a 2:1 piranha bath and rinsing dry, we then treated the sapphire in 146 °C sulphuric acid (SigmaAldrich catalogue number: 258105) for 20 minutes, then triply rinsed the wafer in de-ionized water, rinsed once in 2-propanol, and then blew it dry in N 2 . Finally, we annealed the sapphire in an air furnace with a temperature ramp of 4.1 °C/minute to 1100 °C, and then held at 1100 °C for one hour.
We confirmed that we were able to achieve a flat surface by performing atomic force microscopy (AFM) on a sapphire sample prior to deposition (Figure S13(b)). The AFM used was a Bruker ICON3 with a 7 nm AFM tip. We observe discrete steps in height, each approximately 250 pm. This step size is on the order of a single lattice constant, and so we conclude we are observing atomic terraces.
We measured several resonators fabricated from films deposited on this annealed sapphire. All of these resonators were treated in BOE for 20 minutes. The dependence of Q TLS,0 on SPR for these resonators is not distinguishable from the dependence seen for other BOE treated resonators (Figure S13(a)). As there is no measurable effect, we conclude that the sapphire anneal does not affect the TLSs that are limiting our devices. We note that all resonators that were measured on annealed sapphire had high SPRs, in the linear region in Figure S13(a), and thus are not sensitive to changes in bulk loss. An interesting avenue for future exploration would be to see if high temperature annealing can change the bulk loss in sapphire.

C. Packaging
We used three types of package in our experiments. The first was a copper "puck and penny" assembly, the second was the commercially available QCage.24 from QDevil, and the third was a modified version of the QCage.24 with an aluminum flashed coating on the surfaces of the package which face the device.
We compare the measured values of Q other achieved for devices in each of the three packages in Figure S14. We find that higher values of Q other can be achieved for the QCage.24 package, with the highest values achieved with the aluminum flashing. Nine total devices packaged in the QCage.24 and aluminum flashed QCage.24 were excluded from Figure S14, as Q other was too large relative to the measured values of Q int to be fit confidently.
We conclude that, in some cases, Q other is limited by packaging losses that are present in the puck and penny assembly, but not present in the QCage.24. Based on the fact that the highest values of Q other are found with aluminum flashing on the inside of the QCage.24, we further conclude that in the QCage.24, the electric field of the modes of our devices has a non-negligible overlap with the packaging material. The improvement in Q other is achieved by having the nearest surface of the package be a superconducting metal.

D. Surface morphology
Our tantalum films were deposited both by our group and by Star Cryoelectronics. Both sources showed a body-centered cubic α-Ta phase with majority 111 orientation when measured with an X-ray diffractometer; however, the surface morphology as measured with atomic force microscopy (AFM) is qualitatively different. Figure S15(d We see no qualitative difference in the temperature sweep data between the two types of films (Figure S15(a-b)). We compare the fitted values of the surface loss tangent from devices with the same surface treatment fabricated on films from the two sources, and see no significant difference (Figure S15). We conclude that any losses associated with this difference in observed surface morphology do not limit device performance.

E. Rapid thermal annealing
With XPS, we can observe a shoulder peak at approximately 0.4 eV higher binding energy than the metallic tantalum peaks. Peaks in this location have been observed in [13], in which they are attributed to the closest layer of tantalum metal atoms to the oxide, which have a different coordination number from those in the bulk. A plausible hypothesis for a location of TLSs is in this interfacial tantalum layer.
Rapid thermal annealing (RTA) is used in semiconductor processing to increase the ordering of interfacial layers in thin films [18], and has been shown to have an effect on tantalum oxide thin films [19]. We used RTA to change the metal-oxide interface. Our process consisted of a ramp to 800 °C in 30 seconds and holding at 800 °C for a further 30 seconds. The process was completed in a Thermo Scientific Lindberg/Blue M furnace (PN: STF55433C-1).
We performed XPS on native samples with and without the RTA process, and observed a decrease of approximately 20% in the fitted intensity of the interfacial tantalum shoulder peak (Figure S16(b-c)). We performed this RTA process on a resonator chip and measured a temperature sweep (Figure S16(a)). We observe no qualitative difference between the temperature sweep data on this device and data from temperature sweeps on devices without the RTA process. For the device fabricated on the film with the RTA process, the fitted Q TLS,0 is (6.97 ± 0.36)×10 5 with an SPR of 2.6×10 −3 .
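To put the RTA result in context, the fitted value can be compared with the control mean of (6.0 ± 0.4)×10 5 reported in this section for BOE treated devices at the same SPR. The sketch below expresses the difference in units of the control uncertainty alone and in units of the two uncertainties combined in quadrature. Treating the two quoted uncertainties as independent one-sigma errors is an assumption of this sketch, and `n_sigma` is a hypothetical helper name.

```python
import math


def n_sigma(value, value_err, ref, ref_err):
    """Return the deviation of value from ref in two conventions.

    First element: difference divided by the reference uncertainty only.
    Second element: difference divided by the quadrature sum of both
    uncertainties (assumes independent one-sigma errors).
    """
    diff = value - ref
    return diff / ref_err, diff / math.hypot(value_err, ref_err)


# Values quoted in this section, in units of 1e5.
vs_control_only, vs_combined = n_sigma(6.97, 0.36, 6.0, 0.4)
print(f"relative to control sigma: {vs_control_only:.2f}")
print(f"relative to combined sigma: {vs_combined:.2f}")
```

The two conventions fall on either side of the 2σ mark, which illustrates why a single RTA device is not enough to establish or exclude an effect.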
We measured three other devices with an SPR of 2.6×10 −3 that were BOE treated, and we extracted a mean Q TLS,0 of (6.0 ± 0.4)×10 5 . The device which underwent RTA has a Q TLS,0 over 2σ higher than the mean Q TLS,0 for the control devices, so we conclude that we may have seen a significant performance difference due to RTA. However, given that we only measured one RTA device, we cannot rule out the possibility that there would be no change in the extracted loss tangent of a family of RTA processed devices.

All data are Shirley background corrected [12] and normalized so the total intensity for the spectrum is unity. The peaks used to fit the spectrum are doublets of Ta 0 (dark blue), Ta 0 int (cyan), Ta 1+ (green), Ta 3+ (yellow), and Ta 5+ (pink). Ta 0 and Ta 0 int peaks are fit with asymmetric Voigt profiles, others are fit with symmetric Gaussians. The lower binding energy peak in each doublet corresponds to the Ta 7/2 spin state and the higher to the Ta 5/2 spin state [14]. There is an approximately 20% decrease in the fitted intensity of the Ta 0 int peaks. Data taken at the Spectroscopy Soft and Tender 2 (SST-2) endstation at the National Synchrotron Light Source II at Brookhaven National Lab with X-ray energy 2000 eV. Data were collected with the same methodology described in [15].

X. EXAMPLES OF TEMPERATURE FITS

FIG. 2. Parametrizing losses. (a) Internal quality factor Qint as a function of applied microwave power and temperature for a characteristic resonator. Color shade indicates the applied power measured at the output of the network analyzer, spaced 10 dB apart from the highest power of -10 dBm (darkest blue) to the lowest power of -60 dBm (lightest blue). The traces are well separated at low temperatures, and then collapse together and fall exponentially at high temperatures. The characteristic shape of the curves is fit to a model incorporating TLS loss and equilibrium quasiparticles. Solid lines show the best fit to the dataset. (b) Shift in the resonant frequency with temperature relative to the base temperature center frequency for a representative device. The solid line represents a fit to the data. (c) Comparison between estimates of QTLS,0 extracted from two independent measurements: the power and temperature dependence of Qint, and the temperature dependence of the frequency. The dashed line is a guide to the eye showing the case where QTLS,0 is equal for both measurements. Only 26 devices are shown in this plot because most measurements were optimized for measuring Qint, and did not have enough data to extract QTLS,0 from the fractional frequency shift with high confidence.
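The qualitative shape described in panel (a), with power-separated traces at low temperature that collapse and fall off at high temperature, can be reproduced with a deliberately simplified loss model. The sketch below is not the fit function used in this work: it assumes a textbook saturable-TLS term, tanh(hf/2kT)/sqrt(1 + n/n_c), and approximates the equilibrium quasiparticle loss by a bare Boltzmann factor with a BCS gap. All parameter values (freq, q_tls0, n_c, t_c, a_qp, q_other) are hypothetical and chosen only for illustration.

```python
import math

H = 6.62607015e-34   # Planck constant (J s)
KB = 1.380649e-23    # Boltzmann constant (J/K)


def inv_q_tls(n, temp, freq, q_tls0, n_c):
    """Simplified TLS loss: reduced by thermal population and by photon number n."""
    thermal = math.tanh(H * freq / (2 * KB * temp))
    return thermal / (q_tls0 * math.sqrt(1 + n / n_c))


def inv_q_qp(temp, t_c, a_qp):
    """Quasiparticle loss approximated as a Boltzmann factor (assumption)."""
    gap_over_kb = 1.764 * t_c   # BCS estimate of Delta / k_B, in kelvin
    return a_qp * math.exp(-gap_over_kb / temp)


def q_int(n, temp, freq=5e9, q_tls0=1e6, n_c=10.0,
          t_c=4.3, a_qp=1e-2, q_other=1e8):
    """Internal quality factor from the sum of the three loss channels."""
    total_loss = (inv_q_tls(n, temp, freq, q_tls0, n_c)
                  + inv_q_qp(temp, t_c, a_qp)
                  + 1.0 / q_other)
    return 1.0 / total_loss


photon_numbers = (1, 1e2, 1e4, 1e6)
for temp in (0.02, 0.1, 0.5, 1.0):
    row = [q_int(n, temp) for n in photon_numbers]
    print(f"T = {temp:4.2f} K: " + "  ".join(f"{q:.2e}" for q in row))
```

In this toy model the spread between the lowest and highest photon number is large at 20 mK, where the TLS term dominates, and shrinks to a few percent at 1 K, where the power-independent quasiparticle term dominates.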

FIG. 3. Dependence of loss on SPR. (a) Cartoon cross-section illustrating the dependence of SPR on device geometry. As the distance between capacitor pads (gray) increases, the fraction of the electric field (black arrows) energy overlapping with a thin layer at the three interfaces, metal-air (yellow), metal-substrate (purple), and substrate-air (green), decreases. The fraction of the electric field energy in the sapphire substrate (blue) does not strongly depend on the distance between capacitor pads. (b) Dependence of the extracted QTLS,0 from Qint measurements on SPR. For the highest SPR devices QTLS,0 exhibits linear scaling with SPR, but for lower SPR the QTLS,0 saturates, indicating that there are both surface and bulk TLS baths. Comparing the data for four different surface conditions, native (orange), BOE (dark blue), long BOE (light blue), and triacid (green), allows the surface loss tangent to be estimated for each surface: tan δ surface,native (dashed orange line), tan δ surface,BOE (dashed dark blue line), tan δ surface,long BOE (dashed light blue line), and tan δ surface,triacid (dashed green line), as well as the bulk substrate loss tangent tan δ bulk (dashed grey line). (c) Histogram of surface loss tangents for native (orange), BOE (dark blue), and long BOE (light blue) CPW devices, showing that the BOE and long BOE treatments result in surface loss tangents that are lower than that of the native surface by factors of 1.89 and 1.96 on average, respectively.
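The crossover between surface-limited and bulk-limited behavior in panel (b) follows from adding a surface loss that scales with SPR to a bulk loss that is roughly geometry independent. The sketch below assumes this two-term participation model. The values TAN_SURFACE and BULK_LOSS are hypothetical placeholders, not the fitted values of this work; only the factor of 1.89 between the native and BOE surfaces is taken from panel (c).

```python
def q_tls0(spr, tan_surface, bulk_loss):
    """Two-bath model: 1/Q = SPR * tan(delta_surface) + p_bulk * tan(delta_bulk)."""
    return 1.0 / (spr * tan_surface + bulk_loss)


TAN_SURFACE = 1e-3   # hypothetical native surface loss tangent
BULK_LOSS = 1e-7     # hypothetical p_bulk * tan(delta_bulk), taken as constant

# SPR at which the surface and bulk contributions are equal.
crossover_spr = BULK_LOSS / TAN_SURFACE

for spr in (1e-2, 1e-3, 1e-4, 1e-5, 1e-6):
    print(f"SPR = {spr:.0e}: Q_TLS,0 = {q_tls0(spr, TAN_SURFACE, BULK_LOSS):.2e}")

# Effect of lowering the surface loss tangent by the factor quoted for BOE.
q_native = q_tls0(1e-3, TAN_SURFACE, BULK_LOSS)
q_boe = q_tls0(1e-3, TAN_SURFACE / 1.89, BULK_LOSS)
print(f"Q ratio BOE/native at SPR = 1e-3: {q_boe / q_native:.2f}")
```

Well above the crossover SPR the quality factor changes by nearly a factor of ten per decade of SPR, and well below it the quality factor flattens at 1/BULK_LOSS. Because of the bulk floor, a 1.89-fold reduction in surface loss tangent yields a smaller improvement in the total quality factor.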

FIG. 4. Dependence of QTLS(n = 1) on SPR. QTLS(n = 1) is calculated at base temperature. Data from four different surface conditions, native (orange), BOE (dark blue), long BOE (light blue), and triacid (green), are shown. While QTLS is a nonlinear function of n, we nevertheless observe a roughly linear dependence between QTLS(n = 1) and SPR. We fit an apparent loss tangent to each surface condition: tan δ surface,native (n = 1) (dashed orange line), tan δ surface,BOE (n = 1) (dashed dark blue line), tan δ surface,long BOE (n = 1) (dashed light blue line), and tan δ surface,triacid (n = 1) (dashed green line), as well as the bulk substrate loss tangent tan δ bulk (n = 1) (dashed grey line). Data are calculated from Equation 7 with errors propagated from errors in fit parameters. The error bars are truncated at the lower end by QTLS,0.
arXiv:2301.07848v1 [quant-ph] 19 Jan 2023
commercial microwave package (QDevil QCage.24) with an associated Au-plated PCB. The mounting of the package to the dilution refrigerator is described in Section I C.
FIG. S1. Wiring diagram for each of our measurement lines. Ranges of attenuation are given where the attenuation varied from line to line. The magnetic shields in our experiments varied between a QCan supplied by QDevil and a custom made mu-metal can. A traveling wave parametric amplifier (TWPA) was sometimes used on the output line in our experiments, and was placed in a separate magnetic shield and wired after the second isolator.
FIG. S2. Wide frequency transmission sweep of a chip with four resonators coupled to a single feedline. The four sharp dips correspond to the location of the four resonators.
FIG. S3. Fitted Qc parameters for a power and temperature sweep. This dataset corresponds to the same power and temperature sweep shown in Figure 2a of the main text. Colors correspond to power applied at the input, with the highest power being the darkest shade and the lowest power being the lightest shade. All powers are spaced 10 dB apart.
FIG. S4. (a) An example of a high-power resonator transmission trace with observable non-linearity. The best fit line to the data using Equation 1 is shown in orange. This trace was excluded from our analysis. (b) Power sweep showing Qint for the non-linear trace in (a) at the maximum n. The orange line is a fit to the data with the non-linear datapoint excluded.
FIG. S5. Plots of Tc estimates provided by the frequency shift and Qint fitting methods for the three different quasiparticle frequency shift regimes.
FIG. S6. XPS data of the surface of tantalum films after various surface treatments. Scans are of the Ta4f (a), O1s (b), and C1s (c) spectra. The "No treatment" data were taken after photoresist was stripped from a sample. "Piranha", "Piranha + BOE", and "Triacid" correspond to the "Native", "BOE", and "Triacid" surface conditions, respectively. Ta4f and O1s data have a Shirley background subtracted [12], and all C1s data have a linear background subtracted. All data are normalized to the total Ta4f intensity measured on the sample.
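For readers unfamiliar with the Shirley background referred to above, the sketch below shows one common iterative formulation applied to a synthetic spectrum. It is not the processing code used for these data. It assumes the spectrum is ordered by increasing binding energy, that the first and last points lie on the background, and that the background at each energy is proportional to the peak area on the low binding energy side. The synthetic peak and step parameters are hypothetical.

```python
import math


def shirley_background(x, y, tol=1e-6, max_iter=50):
    """Iterative Shirley background for a spectrum sorted by increasing x."""
    y_lo, y_hi = y[0], y[-1]
    bg = [y_lo] * len(y)   # initial guess: flat background
    for _ in range(max_iter):
        # Cumulative trapezoidal integral of the background-subtracted signal.
        cum = [0.0]
        for i in range(1, len(y)):
            mean_signal = 0.5 * ((y[i] - bg[i]) + (y[i - 1] - bg[i - 1]))
            cum.append(cum[-1] + mean_signal * (x[i] - x[i - 1]))
        total = cum[-1]
        if total == 0:
            break   # flat spectrum: nothing to subtract
        new_bg = [y_lo + (y_hi - y_lo) * c / total for c in cum]
        change = max(abs(a - b) for a, b in zip(new_bg, bg))
        bg = new_bg
        if change < tol:
            break
    return bg


def norm_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


# Synthetic test spectrum: one Gaussian peak on a step-like background.
xs = [0.1 * i for i in range(201)]
true_bg = [1.0 + 0.4 * norm_cdf(x - 10.0) for x in xs]
ys = [b + 5.0 * math.exp(-0.5 * (x - 10.0) ** 2) for x, b in zip(xs, true_bg)]

bg = shirley_background(xs, ys)
signal = [y - b for y, b in zip(ys, bg)]
print(f"max background error: {max(abs(b - t) for b, t in zip(bg, true_bg)):.1e}")
```

After subtraction, the remaining signal can be divided by a reference intensity, such as the total Ta4f intensity used for normalization in the caption above.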
FIG. S9. Model estimates provided by the three possible combinations of hydrocarbon-free species. Blue is BOE and triacid, orange is long BOE and triacid, and green is BOE and long BOE. The best fit value is shown in red, with a shaded box to distinguish it from the different extracted estimates. (a) Estimates from QTLS,0 data. (b) Estimates from QTLS(n = 1) data.
FIG. S11. Correlations between all seven fitted parameters used in Equation 1, Equation 2, and Equation 3 in the main text. Uncertainty bounds in Qother are truncated to QTLS,0.
FIG. S12. Dependence of QTLS,0 on SPR separated by etch type.
FIG. S13. (a) Dependence of extracted QTLS,0 on SPR, separated into devices fabricated on annealed substrates and those fabricated on unannealed substrates. All devices were treated in BOE for 20 minutes. No significant difference in performance is seen. (b) AFM image of the annealed sapphire surface. Scanned in 512 lines with a 1 Hz scan rate and a 7 nm tip.
FIG. S14. Dependence of extracted Qother on SPR, separated into devices packaged into the puck and penny assembly, the QCage.24 with bare copper surfaces, and the QCage.24 with aluminum flashed surfaces. Lower error bars are truncated to the value of QTLS,0.
) and Figure S15(e) show AFM images (Bruker ICON3) taken on the tantalum surface of films deposited by our group and Star Cryoelectronics, respectively.
FIG. S15. Effect of surface morphology on device performance. Tantalum films used in our experiment were deposited by our group or by Star Cryoelectronics, and films from the two different sources show qualitatively different surface morphologies. (a-b) Example temperature sweeps from devices fabricated on tantalum deposited by our group (a) or Star Cryoelectronics (b). Both devices were BOE treated and have surface participation ratios of approximately 10^-3. The color represents input power, with the darkest shade being the highest power. The spacing between powers is 10 dB. (c) Histogram of surface loss tangents from devices fabricated on films deposited by our group (blue) and Star Cryoelectronics (orange). Only devices with a BOE treatment are included. (d-e) Example atomic force microscopy images showing surface morphology on a film deposited by our group (d) and by Star Cryoelectronics (e). The color scale represents depth.