Self-consistent extraction of spectroscopic bounds on light new physics

Fundamental physical constants are determined from a collection of precision measurements of elementary particles, atoms and molecules. This is usually done under the assumption of the Standard Model~(SM) of particle physics. Allowing for light new physics~(NP) beyond the SM modifies the extraction of fundamental physical constants. Consequently, setting NP bounds using these data, and at the same time assuming the CODATA recommended values for the fundamental physical constants, is not reliable. As we show in this Letter, both SM and NP parameters can be simultaneously determined in a consistent way from a global fit. For light vectors with QED-like couplings, such as the dark photon, we provide a prescription that recovers the degeneracy with the photon in the massless limit, and requires calculations only at leading order in the small new physics couplings. At present, the data show tensions partially related to the proton charge radius determination. We show that these can be alleviated by including contributions from a light scalar with flavor non-universal couplings.


I. INTRODUCTION
Precision measurements of atomic and molecular properties play a dual role in fundamental physics.On the one hand, assuming the Standard Model (SM) of particle physics, these are used to determine two of the SM parameters, the fine-structure constant, α, and the electron mass, m e (through the Rydberg constant R ∞ ≡ α 2 m e c/(2h)), along with a number of other observables such as the charge radii and relative atomic masses of the proton and deuteron.An example is the determination of fundamental physical constants by the Committee on Data of the International Science Council (CODATA) [1].
On the other hand, precision measurements can be used to search for new physics (NP) beyond the SM.Such searches have been conducted using measurements of single particle observables [2][3][4], atomic systems [5][6][7][8][9][10], and molecular systems [11][12][13][14], see [15] for a review.The presence of NP would manifest itself as a discrepancy between measurements and theoretical SM predictions.The difficulty here is that in many cases the SM predictions depend on the fundamental physics parameters, which in turn were extracted from data by CODATA under the assumption that the SM is correct, and no NP exists.In general, the presence of NP would affect the extraction of fundamental constants, possibly reducing the claimed sensitivity of NP searches.This subtlety is more often than not ignored in the literature.
In this Letter we propose and carry out a selfconsistent determination of constraints on light NP models by performing a global fit, simultaneously extracting the SM and NP parameters.We go well beyond the previous studies [5,6,16], which were performed only on subsets of data.We pay special attention to the potentially problematic limit of massless NP.The challenge is that the SM predictions are calculated to a higher perturbative order than the leading order (LO) NP contributions, which can then lead to incorrect limiting behaviour for very light NP.Below, we provide a prescription, valid to LO in NP parameters, that corrects for such mismatches in the theoretical predictions, and leads to the proper massless NP limit.
The global fit shows several 3 σ (3 standard deviations) discrepancies between observables and predictions, as-suming the SM.These anomalies are well known: they correspond to the measurements constituting the proton charge radius puzzle [17][18][19], with the addition of new measurements of hydrogen transitions [20,21].Reference [20] showed the tension of their 2S 1/2 − 8D 5/2 measurement with other hydrogen data is relaxed in the presence of an additional Yukawa-like interaction.Our global analysis, which determines simultaneously both the SM and NP parameters, shows for the first time that all these deviations can be largely accounted for in a single NP model -a light scalar that couples to gluons, electrons and muons.

II. NEW PHYSICS BENCHMARK MODELS
We focus on minimal extensions of the SM, where either a light scalar boson, φ, or a light vector boson, φ µ , is added to the spectrum of SM particles.The new light particle is assumed to have parity conserving interactions with the SM electrons and muons, as well as with light quarks, resulting in couplings to neutrons and protons 1 .The interaction Lagrangian is therefore given by where Γ • φ ≡ φ, γ µ φ µ for spin s = 0, 1 bosons, respectively.Taking the nonrelativistic limit for ψ i , and working at LO in g i , the tree level exchange of φ or φ µ induces a Yukawa-like nonrelativistic potential, V ij NP (r) = (−1) s+1 α φ q i q j e −m φ r r , between particles ψ i and ψ j , separated by a distance r.The NP coupling constant, α φ ≡ |g e g p |/(4π) > 0, gives the strength of the NP induced potential between electrons and protons.The strength of NP interactions between fermions ψ i and ψ j , relative to the electronproton one, is given by the product of effective NP couplings, q i q j , where q i ≡ g i / |g e g p |.In particular, for the electron-proton system the product of effective NP couplings can take the values, q e q p = ±1.For q i q j > 0 the potential (2) is attractive (repulsive) for spin 0 (1) mediator φ, and vice versa for q i q j < 0.
In the numerical analysis, we consider the following benchmark NP models: a. Dark photon.The light NP mediator is a vector boson with couplings to the SM fermions proportional to their electric charges.A UV complete realization is an additional abelian gauge boson with field strength F µν , that couples to the SM through the renormalizable kinetic mixing interaction, −( /2)F µν F µν [22], where F µν is the electromagnetic field strength.To LO in this yields α φ = α 2 and q e = q µ = −q p = −1, q n = 0. b.B − L gauge boson.The difference of baryon (B) and lepton (L) numbers is non-anomalous, and can be gauged without introducing new fermions [23,24].Light B − L gauge boson with gauge coupling g B−L gives rise to the NP potential in (2) with α φ = g 2 B−L /(4π).The charges q e = q µ = −q p = −q n = −1 coincide with the dark photon ones, except for neutron.Comparison of B −L and dark photon bounds illustrates the importance of performing spectroscopy of different isotopes of the same species, such as hydrogen and deuterium.c.Scalar Higgs portal.A light scalar mixing with the Higgs boson [25,26] inherits the SM Yukawa structure, giving α φ = sin 2 θ m e κ p m p /(4πv 2 ) 1.8 × 10 −10 where v 246 GeV is the SM Higgs vacuum expectation value, and θ the scalar mixing angle.The effective leptonic ( = e, µ) charges are q = m / √ m e κ p m p , while the effective nucleon charges (N = p, n) are given by q N = κ N m N / √ m e κ p m p with κ p 0.306( 14) and κ n 0.308( 14) [27][28][29][30][31][32] (see also Sec.S4).Since couplings to muons and nucleons are enhanced by q µ /q e = m µ /m e 200 and g N /q e = m N /m e 2 × 10 3 , respectively, this NP benchmark highlights the relevance of muonic atom and molecular spectroscopy.d.Hadrophilic scalar.A scalar with q = 0 and √ m e κ p m p , i.e., with vanishing couplings to leptons, highlights the importance of molecular hydrogen ion spectroscopy as a probe of internuclear interactions [11,14].For expedience we take g N to be the same as for the Higgs portal, but this could be relaxed in general.
e. Up-lepto-darko-philic (ULD) scalar.In order to evade strong bounds from K + → π + + X inv searches, where X inv are invisible particles that escape the detector, see Sec.V, we adopt a particular version of a light scalar benchmark.The ULD scalar has enhanced couplings to leptons, q = m / m e κ p m p , and reduced couplings to nucleons (due to couplings to only the up quark), q N = κ N m N / m e κ p m p , with κ p 0.018(5) and κ n 0.016 (5), and α φ = k 2 m e κ p m p /(4πv 2 ), with k a dimensionless parameter controlling the overall strength of interactions, which is varied in the fit.The φ is assumed to predominantly decay to invisible states, possibly related to the dark matter, which evades constraints from beam dump experiments.See Sec.S4 in the supplemental material for further details, including results for an additional NP benchmark model-the scalar photon.

III. DATASETS
The adjustment of parameters, i.e., the fitting procedure, presented in this work has been carried out using two different datasets, CODATA18 and DATA22.The CODATA18 dataset consists of data that was used in the latest CODATA adjustment in Ref. [1], but restricted only to the subset most relevant for constraining NP.This subset contains observables related to the determination of the Rydberg constant R ∞ , the proton and deuteron radii, r p and r d respectively, the fine-structure constant α, and the relative atomic masses of the electron, proton, and deuteron: A r (e), A r (p) and A r (d), respectively.The inputs are listed in Tables S1, S3, and include theory uncertainties in Table S2.The other observables and parameters included in the CODATA 2018 adjustment are very weakly correlated with the selected data, and can be neglected for our purpose.
The DATA22 dataset combines the updated CO-DATA18 inputs with the additional data that improve the overall sensitivity to NP (see Table S6 and S9).In particular, we include the measurements of transition frequencies in simple molecular or molecule-like systems, the hydrogen deuteride molecular ion (HD + ) [33][34][35], and the antiprotonic helium atom (p 3 He and p4 He) [36,37].These have an enhanced sensitivity to the NP models with mediators that have large couplings to quarks (and thus nuclei).The three benchmark models of this type are the Higgs portal, hadrophilic and ULD scalars, cf.Sec II.
The CODATA18 dataset is used as a reference point to verify the implementation of the inputs and the adjustment procedure, while DATA22 is used to obtain our nominal results.The full list of data in the two datasets, as well as further discussion of the importance of including certain observables when constraining NP, is given in Supplemental Material, Sections.S2 and S3.

IV. LEAST-SQUARES ADJUSTMENT WITH NEW PHYSICS
The experimental data are compared to the theoretical predictions with NP following the linearized least-squares procedure [38].The theoretical prediction for an observable O takes the form, where O SM is the state of the art SM prediction, and depends on the SM parameters g SM = {R ∞ , r p , r d , α, A r (e), A r (p), A r (d)}, while the NP contribution O NP depends in addition on α φ and m φ .The theoretical uncertainties are included as in Ref. [1], by adding a normally distributed variable δO th with zero mean and standard deviation equal to the estimated uncertainty of the theoretical expression.The δO th 's are treated as yet another set of input data and varied in the fit, along with g SM , α φ , and m φ , in order to minimize the χ 2 function constructed from the input data and theory predictions (see also Sec.S1).
The SM theoretical predictions for atomic transition frequencies, the electron anomalous magnetic moment, and bound-electron g-factors are from Ref. [1] (see references therein).The predictions for the HD + and pHe transition frequencies are from Ref. [39,40] and [41][42][43], respectively, and are updated with the latest CO-DATA recommended values, see Sec.S3 for details.
The NP contributions to atomic and molecular ion transition frequencies are obtained using (timeindependent) first-order perturbation theory [44,45].We use exact nonrelativistic wavefunctions for hydrogen-like atoms and very precise nonrelativistic numerical ones from a variational method of Ref. [46] for HD + and pHe.Expectation values of the Yukawa potentials in Eq. ( 2) are calculated for a grid of m φ values, taking advantage of the fact that their matrix elements in the chosen basis can be obtained in an analytical form.The precision is limited to O(α 2 ) because of the neglected relativistic corrections to the wavefunction.The NP contribution to the free-electron (g − 2) e arises at one-loop [47,48], while for bound electrons we include an additional tree-level contribution from electron-nucleus interaction [49].Finally, we assume NP to have negligible effects in atom recoil measurements as well as relative atomic mass measurements from cyclotron frequency measurements in Penning traps.
We pay particular attention to the possible degeneracy between the determination of SM and NP parameters.In the m φ → 0 limit, the dark photon is completely degenerate with the QED photon, since couplings of the two are aligned, q i = Q i , and thus only the combination α + α φ can be determined from data.This degeneracy should be retained in the theoretical predictions (3), which in principle requires calculating NP effects to the same very high order as the SM.We propose an alternative procedure, which uses the state-of-the-art SM calculations but requires NP contribution only at LO in α φ , and reproduces the correct q i → Q i , m φ a 0 1 limit, where a 0 ≡ α/(4πR ∞ ) = (αm e ) −1 is the Bohr radius.
For light vectors we rewrite the NP potential in Eq. ( 2) as the sum of the Coulomb-like potential with QED coupling Q i plus the remainder, where V ij NP (r) ≡ α φ (q i q j e −m φ r − Q i Q j ) /r.The theory predictions are evaluated at LO in V NP (r), while the NP Coulomb term and the related relativistic corrections are evaluated to the same order as the SM, which amounts to replacing α → α + α φ in the SM predictions.For any observable O the theoretical prediction is then where O SM is the SM contribution now expressed as a function of α + α φ and O NP is the NP contribution from V NP .In the m φ → 0, q i → Q i limit the potential V NP vanishes, and all theory predictions are the SM ones, but shifted by α → α + α φ .For massive dark photon with m φ a 0 1 the leading effect of V NP is parametrically O NP ∝ m 2 φ .Note that for massless B − L the potential ṼNP vanishes in hydrogen but not in deuterium where O NP ∝ q n , thus breaking the degeneracy between the SM and NP contributions when m φ → 0.
For light scalars there is no degeneracy with QED in the massless mediator limit; it is lifted by relativistic corrections.We can use directly the state-of-the-art SM predictions, and simply add to them the NP contribution due to the potential (2) at LO, without any special treatments, see also Sec.S7.

V. RESULTS
First, we perform the control fit, i.e., the least-squares adjustment assuming SM, based on the CODATA18 dataset with inflated experimental uncertainties when there are tensions in the data [1], see also Sec.S2.Resulting χ 2 per degree-of-freedom (dof) is χ 2 SM /ν dof 0.95 (ν dof = 78 − 44 = 34), indicating an overall good description by the SM, and the use of correct expansion factors.The output g SM values and relative uncertainties (see Table S5) are in excellent agreement ( 0.2 σ) with the latest CODATA recommended values [1], validating our procedure.
Next, we perform adjustments based on the DATA22 dataset, assuming either the SM or one of the NP benchmark models in Sec.II.We do not inflate experimental errors, since mild tensions in the data could be a hint of NP.The SM-only hypothesis still describes the data relatively well, with χ 2 SM /ν dof 1.4 (ν dof = 102 − 62 = 40), FIG. 2. The constraints on ULD scalar in the α φ , m φ plane, with purple-shaded 1, 2, 3, 4 σ CL regions favored by the DATA22 dataset (black dot is the best-fit point).Exclusions are by SN1987a [50,51] (below the pink line, absent if φ invisible decay dominates), NA62 K + → π + Xinv search [52] (green, the dashed line is a naive NNLO estimate), stellar cooling [53] (gray), and E137 [54,55] (between yellow dashed lines, absent if φ invisible decay dominates).
Figure 1 shows the 95 % confidence level (CL) upper bounds on α φ as function of m φ for the NP benchmark models, Sec.II.The strongest exclusion is always reached around m φ ∼ a −1 0 ∼ 4 keV, and stays roughly constant for lighter m φ (except for dark photon due to degeneracy with QED in the m φ → 0 limit, see Sec.IV).Deuterium observables translate to a ∼ 2× stronger bound on B − L at m φ ∼ a −1 0 , compared to dark photon.The significantly stronger bounds on the Higgs portal and hadrophilic scalar for m φ 10 keV are due to the ∼ κ p m p /m e 500 enhancement in internucleon interactions (compared to electron-nucleon potential), affecting the HD + observables.For heavier NP, m φ a 0 1 (m µ /m e ) in hydrogen (muonic hydrogen), the interaction is point-like, with suppressed electron (muon) wave function overlap, and the bounds decouple as ∝ 1/m 2 φ (and more quickly for hadrophilic scalar).The bounds are stronger for Higgs portal and ULD scalar due to ∼ m µ /m e 200 enhanced effects in muonic hydrogen.
For m φ a 0 m µ /m e the Higgs portal and ULD scalar are statistically preferred over the SM at the ∼ 4 σ and ∼ 5 σ level, respectively.Figure 2 shows the preferred region for the ULD scalar, around the best-fit point, m φ = 300 keV and α φ = 6.7 × 10 −11 .This NP hint is supported mostly from the recent measurements of the hydrogen 2S 1/2 − 8D 5/2 and 1S 1/2 − 3S 1/2 transitions [20,21], as well as muonic deuterium, cf.Sec.S5.While these tensions between data and the SM predic- tion are not new, our analysis shows that all tensions can be significantly ameliorated when including NP interactions due to a single light scalar.The favored NP mass is close to the (inverse) Bohr radius of muonic atoms, a −1 0 × m µ /m e ∼ MeV, due to the large muonelectron coupling ratio in these models, contrasting with scalars having weaker or vanishing coupling to muons (see Sec. S4 and [20]).However, other constraints require the scalar to have rather a nontrivial pattern of couplings, see Sec S4 A. For ULD scalar the E137 [54,55] bounds are evaded since φ decays predominantly to an invisible dark sector.Since φ couples to up quarks and not directly to heavy quarks and gluons, the bound from NA62 search for K + → π + φ [52] is weakened [51,56].Finally, the minimal ULD model induces a too large contribution to (g − 2) µ , however this can be suppressed in less minimal versions with a custodial symmetry [57].
The presence of NP also impacts the determination of the fundamental constants in the SM. Figure 3 shows the 68 % CL determination of r p and R ∞ , subtracting the CODATA 2018 recommended values and normalizing to respective errors.The SM-parameter uncertainties increase in the presence of NP and the central values shift outside the nominal SM ellipse, shown explicitly in Fig. 3 for the Higgs portal and ULD scalar model.Because of the degeneracy with the photon the uncertainty on α in the dark photon model increases as 1/m 2 φ for masses below 10 eV (see Fig. S7) and eventually becomes comparable to α itself for m φ ∼ 0.1 meV, while α+α φ remains well constrained.

VI. CONCLUSIONS
Extracting bounds on light NP from a global fit to spectroscopic and other precision data requires both SM and NP parameters to be determined simultaneously.The possibility of NP contributions changes the extracted allowed ranges of SM parameters, a change that can be quite substantial, see Fig. 3. Furthermore, we provided a prescription to consistently include NP corrections from light vectors.It requires calculations of NP contribution only at leading order, and recovers the expected degeneracy between dark photon and QED in the massless mediator limit.
At present, spectroscopic data show tensions that could either be due to unknown or under-appreciated systematics, or to light NP.We showed that the ∼ 4σ anomaly in data can be explained by a flavor nonuniversal light scalar model.

Self-consistent extraction of spectroscopic bounds on light new physics Supplemental Material
Cédric Delaunay, Jean-Philippe Karr, Teppei Kitahara, Jeroen C. J. Koelemeij, Yotam Soreq, and Jure Zupan In the supplemental material we give further details on the linearized least squares method (Sec.S1), the CO-DATA18 dataset (Sec.S2), the DATA22 dataset (Sec.S3), the new physics benchmarks models (Sec.S4), new physics improvements relative to the SM (Sec.S5), the extraction of fundamental constants in the presence of NP (Sec.S6), and the m φ → 0 limit (Sec.S7).

S1. THE LINEARIZED LEAST SQUARES METHOD
We use the method of linearized least squares employed by the CODATA to determine both the fundamental and NP constants.For completeness we summarize below briefly the fitting procedure, while further details can be found in Appendix E of Ref. [38].
The N input data w i are either measured quantities or estimated errors on theoretical calculations (for these the central values are taken to be zero), and are listed in Tables S1-S4 and S6-S7, S9, see also Sections S2 and S3 for a detailed discussion of the two datasets we use.The predictions for w i are functions of M fitted parameters (adjusted constants) z j , The dotted equality sign means that the left and right hand sides are not equal in general, since the set of equations is usually overdetermined (N ≥ M ), but should agree within experimental errors, if the assumed physics model and measurements are correct.Most of the f i are nonlinear functions, which we linearize, i.e., Taylor expand around some initial values s j , chosen close enough to the expected values of z j such that the higher-order terms can be neglected, Linearization allows for more efficient determination of adjusted constants.Observational equations (S1) are now a set of overdetermined linear equations.In matrix notation these are where Y is an N -dimensional vector of y i ≡ w i − f i (s), A is an N × M matrix with elements a ij ≡ ∂f i (s)/∂z j , while X is an M -dimensional vector of x j ≡ z j − s j .The best fit value of X is obtained by minimizing the χ 2 function, where V is the N × N covariance matrix constructed from variances u ii = u 2 i and covariances u ij ≡ u(w i , w j ) for inputs w i,j (here u i ≡ u(w i ) is the standard uncertainty).Updated correlation coefficients for the DATA22 dataset are listed in Tables S8, S10.The result of the fit, vector X with components xj that minimizes χ 2 , is given by where G ≡ (A T V −1 A) −1 is the covariance matrix of X, i.e., an M × M matrix that captures the uncertainties and correlations of the adjusted constants z j .The solution X is only an approximation of the exact solution to the nonlinear problem.More precise values of the adjusted fundamental constants are obtained by iterating the above procedure using s j + xj as starting values, until the new xj 's are negligibly small compared to their estimated uncertainties.Similarly to the CODATA, we use the convergence condition In practice, this condition is satisfied after a few iterations.Finally, we define for each input datum a normalized residual where ẑ denotes the final value of the adjusted constants.Ignoring correlations, the contribution of each input datum i to the minimal χ 2 is R 2 i .

S2. THE CODATA18 DATASET
The CODATA 2018 dataset contains all the inputs from Ref. [1] related to the determination of the Rydberg constant R ∞ , the proton and deuteron radii, r p and r d , respectively, the fine-structure constant α, and the relative atomic masses of the electron, proton, and deuteron: A r (e), A r (p) and A r (d), respectively.The other observables and parameters included in the CODATA 2018 adjustment are very weakly correlated with the selected data, and can be neglected for our purposes.
The selected data include atomic transitions and non-spectroscopic observables, which depend on the main fundamental constants as follows.The inputs primarily used to determine R ∞ , r p , and r d are: A) measurements of transition frequencies in electronic hydrogen and deuterium, labeled as Ai, i = 1, . . ., 29, and listed in Table S1,    S3.Input data of the muonic hydrogen and deuterium Lamb shifts (LS), as well as the proton and deuteron charge radii from the electron-proton and electron-deuteron scatterings for the CODATA18 dataset, taken from Table XVIII of Ref. [1].
For the additive corrections C7 and C8, the relative uncertainty is with respect to the value of the corresponding theoretical quantity.
B) the additive theory uncertainties on theory predictions for the relevant energy levels, labeled by Bi, i = 1, . . ., 25, listed in Table S2, and C) the inputs for muonic hydrogen and deuterium and electron scattering, labeled by Ci, with i = 1, 2, 7, 8 and i = 9, 10, respectively (see Table S3).
The dependence of an electronic transition frequency ν i SM on physical constants is described, within the SM, by the   simplified expression where a i ≡ 1/n 2 i − 1/n 2 i , with n i , n i the principal quantum numbers of the initial and the final state, respectively, while m N is the mass of the nucleus, and i runs over all the transitions in Table S1.The coefficient b i (α) denotes higher-order relativistic and QED corrections, which take the form of a power series in α and ln(α), with the leading term starting at order O(α 2 ).Finally, c i (r N ), where r N is the nuclear charge radius, denotes the finite-nuclear-size and nuclear-polarizability corrections, where the leading term is of order (r N /a 0 ) 2 .The Bohr radius a 0 is given by Since the hydrogen transition frequencies depend only weakly on m e /m N and α, these parameters have to be determined by other means, using inputs listed in Table S4, and labeled as Di, i = 1, . . ., 23.The fine-structure constant α is extracted from two different methods: (i) a comparison of theoretical and experimental results for the anomalous magnetic moment of the electron [2] and (ii) atom-recoil experiments [58,59] that measure the mass m A of atom A in units of the Planck constant h, combined with values of the Rydberg constant and masses of the electron and atom A in atomic mass units.Following Ref. [1], the CODATA 2018 dataset therefore also includes the values of relative atomic masses for the relevant atoms.These are taken from the 2016 Atomic Mass Evaluation (AME) [60,61] (see Table S4).
The electron relative atomic mass is obtained from the measurements of spin-flip and cyclotron frequencies in a hydrogenic ion A using a theoretical calculation of the bound-electron g-factor g e (A).The atomic masses of the ions are deduced from the AME 2016 values of their neutral counterparts by subtracting the mass of the missing electrons and theoretically correcting for their binding energies [62].
The relative atomic mass of proton is extracted from the AME 2016 value of the hydrogen atom (using the theoretical binding energy) along with the more recent measurement [63]; for the relative atomic mass of deuteron, only the most recent measurement [64] is taken into account.The correlation coefficients for the A, B and D datasets are listed in Table IX and Table XXII of Ref. [1], respectively.
Finally, for the ease of comparison we follow Ref. [1], and add expansion factors for the CODATA18 analysis, in the case where no new physics is considered.The expansion factors reduce the tension in the data: we multiply by a factor of 1.6 the quoted errors for the proton radius data, i.e., to all the A, B, and C inputs, and a factor of 1.7 for the proton mass data, i.e., the D15 and D19 items.In the DATA22 analysis, on the other hand, we do not use the expansion factors.
Extracted values of g SM using CODATA18 data and assuming no new physics contributions are listed in Table S5.

S3. THE DATA22 DATASET
The DATA22 dataset contains the CODATA18 inputs, but with updated values for both the experimental and theoretical inputs, Table S7, as well as the additional input data, Table S6.Together these then provide an improved sensitivity to NP effects.
Among the data related to α determination, the recoil measurement of Ref. [69] was replaced by the improved measurement from the same group [58], and the measurement of the electron magnetic moment of Ref. [2] by the recently improved measurement [70].Concerning the electron mass, the only change is the recently discovered longdistance contribution of order α 2 (Zα) 5 ln(Zα) [71] that is included in the theoretical expression for the bound-electron g-factor.
The input data for all of the utilized relative atomic masses were updated to their AME2020 values [72,73].In particular, these values take into account the recent high-precision measurements of the proton, deuteron and HD + masses [74,75] and the deuteron-proton mass ratio [76].Note that the latest measurement of m p /m d [77] is not included in the AME2020.We chose not to add it separately because of the possible (unknown to us) correlations with the AME2020 values of the hydrogen and deuterium atomic masses.
energies.The charge radii are determined from the muonic helium spectroscopy data [78,79], which we therefore added to the DATA22 dataset.
The new inputs for the DATA22 dataset are listed in Table S6.The updates of theoretical uncertainties for hydrogen and deuterium levels, based on Refs.[65][66][67][68], and the correlation matrix (calculated following the method of Ref. [1]) are collected in Tables S7 and S8, respectively.Moreover, an expansion factor of 2.4 is applied for the fine-structure constant data, i.e., datapoints D3 and D4, because in this case the discrepancies cannot be due to NP, and are most    S9.Input data for the additive energy corrections accounting for missing contributions to the theoretical description of energy levels of three-body systems, HD + (F) and pHe (H), and the Lamb shift of muonic helium ions (I).
likely due to overlooked or underestimated systematics.Note that in the DATA22 inputs, we do not include newer input values for r p from e-p scattering.These include a new result of the PRad collaboration, r p = 0.831( 14) fm [80], and a reanalysis of modern data, giving r p = 0.847(8) fm [81].In both cases, the determination of the proton charge radius from the electron-proton scattering data is not precise enough to make an appreciable difference in the fit.
Hamiltonian, was calculated in [84] for HD + rovibrational states and in [83,85] for pHe.The E (3) term gives the leading radiative correction.It involves a numerically challenging quantity, the Bethe logarithm, which was obtained with high precision in [86] for HD + and in [83] for pHe resonant states.Other relevant numerical data for this correction can be found in [83,84].The R ∞ α 4 -order correction, E (4) , is made up of several contributions.One-and two-loop radiative corrections are given in [87,88].The relativistic correction was calculated in [89], and complete numerical data for HD + can be found in the Supplemental material of [40].The relativistic-recoil contribution was estimated from hydrogenlike atom theory [90], and the radiative-recoil term was taken from [91,92].
It is worth noting that the relativistic correction, as well as several higher order corrections (the R ∞ α 5 and R ∞ α 6 terms), were calculated in the adiabatic approximation, where the wave function is written as a product of electronic and nuclear (rovibrational) wavefunctions.Electronic wavefunctions were obtained by solving the Schrödinger equation in the field of two clamped nuclei using a variational method [93].In the first step, the R ∞ α 4 relativistic corrections for the bound electron were calculated for a range of internuclear distances [89].Then, the electronic curves were averaged over the nuclear wavefunction.The effect of corrections to the vibrational wavefunction induced by electronic corrections also needs to be taken into account [39,40].
The radiative correction E (5) comprises the one-loop self-energy [94,95] and the vacuum polarization (the Uehling potential) [96].Complete numerical data for these two contributions are available in the Supplemental Material of Ref. [40] for HD + .The Wichmann-Kroll vacuum polarization term, as well as two-loop and three-loop corrections are also included [94].It should be noted that in the case of antiprotonic helium, the Uehling correction has only been estimated from the hydrogen atom theory [95].
One-and two-loop radiative corrections of order R ∞ α 6 have been partially calculated in HD + in [39,40] (complete numerical data are available in the Supplemental Material of [40]).In antiprotonic helium, they have only been estimated using the hydrogen atom theory [95].In the nuclear finite-size and structure correction E nuc , we include the leading-order finite-size correction [87,88].In HD + , we take into account the higher-order nuclear corrections for the deuteron as described in [40].In pHe , we also include these corrections for the helium-3 and helium-4 nuclei using the theoretical expressions presented in [65].Similar corrections for the proton or antiproton are negligibly small at the present level of experimental and theoretical accuracy.The last term of Eq. (S10), denoted by E other , corresponds to the muonic and hadronic vacuum polarization contributions, with the explicit expressions given in [40].
Theoretical uncertainties, Table S9, and their correlations, Table S10, were estimated following an approach similar to [1].In HD + , by far the largest sources of uncertainty are the yet uncalculated nonlogarithmic contributions of order R ∞ α 6 in the one-loop self-energy and in the two-loop radiative correction [39,40].Smaller uncertainties come from the use of the adiabatic approximation in the calculation of some of the high-order correction terms (R ∞ α 4 to R ∞ α 6 ), from the relativistic-recoil correction of order R ∞ α 4 m e /m p , and from the deuteron finite-size correction of order R ∞ α 4 .All these uncertainties are assumed to be fully correlated ("type u 0 " uncertainties in the terminology of Ref. [1]), leading to correlation coefficients very close to 1.In pHe, additional sources of uncertainty come into play.Firstly, there is a larger uncertainty associated with one-and two-loop R ∞ α 6 -order corrections, since these terms have only been estimated from hydrogen atom results.Secondly, the one-loop vacuum polarization term of order R ∞ α 5 has a significant uncertainty, for the same reason.Finally, some of the operator expectation values involved in the calculation of E (2) and E (3) have non-negligible numerical uncertainties.These numerical uncertainties are assumed to be uncorrelated ("type u n " [1]), leading to smaller correlation coefficients with respect to HD + .
The sensitivity of theoretical energy levels to the SM parameters (R ∞ , α, and nuclear charge radii) is easily obtained by differentiating Eq. (S10), with the exception of particle masses.The dependence on particle masses mainly comes from the nonrelativistic energy (first term of Eq. (S10)), and is obtained numerically.The corresponding sensitivity coefficients are thus calculated numerically using the approach outlined in [82].
The values of g SM determined from the fit to DATA22 data under the hypothesis of just SM, i.e., no new physics favored region of the ULD parameter space is for |k| 2.4, see Fig. 2.This requires the y couplings to be larger than the corresponding SM Yukawa couplings by a factor of A 100 × (Λ/10 TeV).The largest is the coupling to muons, y µ 0.04 × (Λ/10 TeV), which remains well perturbative even for a relatively high value of the cutoff scale Λ.The higher dimensional operators could arise from extra vector-like fermions at the scale Λ, where Ψ ,u are vector-like SU(2) L singlets of mass M ,u .Integrating out the Ψ's yields the dimension-five operators in Eq. (S15) with y /Λ = x λ /M and y u /Λ = x u λ u /M u .
In the second example of a UV model we supplement the SM field content by an extra Higgs doublet H and a light singlet S that mixes with H , giving a light mass eigenstate φ S − sin θ H , where θ 1 is the mixing angle.The mixings of the two scalars with the SM Higgs is assumed to be small.The Yukawa couplings of H to the SM fermions are assumed to be diagonal in the same basis as for the SM Higgs, with the only nonzero values the Yukawa couplings to the electron, the muon and the up quark.The SM Higgs H remains the dominant source of the electron, muon and up-quark masses and gives the entirety of the mass to the remaining SM fermions, while a subdominant parts of the electron, muon and up quark masses are due to the H vacuum expectation value, H = (0, v )/ √ 2 (for more general models of this type see, e.g., [105][106][107]).That is, for electron, muon and up quark we have and m i = m H i for i = τ, d, s, c, b, t.As a simple ansatz we take m H ,u = δ × m H ,u , with δ 1 a universal factor.The φ couplings to electrons, muons and up quarks are then the same as for the SM Higgs, just rescaled by k = δ sin θ v/v .The smallness of δ guarantees that the SM Higgs boson decays to muons, h → µ + µ − , remain close to the SM predictions, in agreement with measurements, which require δ 0.35 at 90% CL [108].Note that for |k| ∼ O(1) the H VEV must be smaller than the weak scale, v /v δ sin θ 1.

B. The scalar photon
Finally, we introduce an additional NP benchmark model, the "scalar photon", which will prove useful in the discussion of the m φ → 0 limits in Section S7 below.The model consists of SM and a new light scalar that has the same pattern of couplings to the SM fermions as the photon, i.e., q e = q µ = −q p = −1 and q n = 0.That is, the only difference between the dark photon and the scalar photon model is the spin of the mediator.The comparison of bounds for the two models is therefore very informative, and highlights the importance of different subsets of data as well as the importance of implementing the m φ → 0 behavior correctly.For instance, the right panel of Fig. S1 shows that the bounds on α φ are rather different in the two models, despite the identical pattern of couplings.For m φ m e , the bound is mostly set by the one-loop NP contribution to the electron g − 2, which is larger for scalar photon.For m φ a 0 ∼ O(1), the sensitivity of the fit to NP is dominated by hydrogen spectroscopy, which currently favors an addition of a repulsive NP interaction, thus yielding a stronger bound for dark photon.For very light masses m φ 1/a 0 , the bounds on α φ weaken both for the scalar photon as well as for dark photon, due to the degeneracy with the QED photon.However, while for dark photon the degeneracy with the QED photon is exact in the m φ → 0 limit, for massless scalar photon the degeneracy is lifted by O(α 2 ) relativistic corrections.The bound on the scalar photon fine-structure constant therefore flattens below small enough m φ , for which the NP contributions to the hydrogen energy levels become comparable to the relativistic corrections, i.e., for a 0 m φ ∼ O(α).
The least-squares adjustment also reveals that the DATA22 dataset favors the scalar photon model over the SMonly hypothesis at the 4.8σ level, with the best-fit point at m φ = 0.2 keV and α φ = 2.7 × 10 −13 .The 1σ, 2σ, 3σ, and 4σ CL contours are shown as orange shaded regions in Fig. S2 (right panel), along with the exclusions from stellar cooling (grey), SN1987a cooling (red), NA62 search for K + → π + +inv (green) and E137 (dark green).The overlap of all the bounds excludes the region favored by the adjustment.Note that even if the SN cooling does not apply [109] the favored region is still excluded by the combination of other constraints.
Figure S1 (left panel) shows the allowed range for α φ at 99% CL assuming a given scalar photon mass (orange shaded region), with the solid line showing the best fit α φ for each value of m φ .Note that for very light φ, with mass below around 20 eV the value α φ = 0 is allowed at 99% CL, i.e., the very light scalar photon is not preferred over the SM.Similarly, the heavy scalar photon, with mass above about 100 keV, is not preferred over the SM, in agreement with Fig. S2 (right panel).[50,51,110] (pink), the K + → π + X search at NA62 [52] (green, the dashed line is an NNLO estimate), stellar cooling [53] (gray) and the E137 beam dump experiment [54,55] (yellow).

S5. NEW PHYSICS IMPROVEMENTS RELATIVE TO THE SM
The fit to DATA22 dataset assuming only the SM, i.e., no NP, gives the minimum χ 2 per degree of freedom χ 2 SM /ν dof 1.4 (ν dof = 102 − 62 = 40).Ignoring correlations, the largest contributions to the χ 2 are from the hydrogen observables A12-A15, A23 and A31 in Tables S1 and S6, see also Fig. S3.
The adjustments exhibit no significant (> 2σ) preference over the SM for the gauged B − L, dark photon and hadrophilic scalar models, see Fig. S4.The ULD scalar and the Higgs portal, on the other hand, show a preference over the SM for m φ a 0 m µ /m e .This NP evidence can already be anticipated from the bounds on α φ shown in Fig. 1.For masses heavier than m φ a 0 m µ /m e the main constraint is from muonic hydrogen, so that one would expect, due to enhanced couplings of φ to muons, the bounds on ULD scalar and Higgs portal to be stronger by a factor O(m µ /m e ) relative to all the other models we consider, in which q e = q µ .The bounds in Fig. 1, however, are found   and the NP coupling constant α φ for a 300 keV ULD scalar, using the DATA22 dataset.The x-axis is shifted by x SM i , the extracted value of xi for the SM-only hypothesis, and normalized by its uncertainty.
for the ULD and Higgs portal models to be weaker by a factor of O(20) than the naive expectations, for heavy φ.This is an indication that the data favors NP models with large µ-to-e coupling ratio over the SM in this mass range.
The significance of the deviation is at the ∼ 4σ level for the Higgs portal and the ∼ 5 σ level for the ULD scalar.Figure 2 in the main text shows the preferred region in the ULD model parameters, where the best-fit point is m φ = 300 keV and α φ = 6.7 × 10 −11 .Similarly, the left panel in Fig. S2 shows the preferred region for the Higgs portal parameters, with the best-fit point given by m φ = 400 keV and α φ = 2.5 × 10 −11 .In both models the NP evidence receives support mostly from the recent measurements of the hydrogen 2S 1/2 − 8D 5/2 and 1S 1/2 − 3S 1/2 transitions [20,21], as well as muonic deuterium, see Fig. S5.These tensions between data and the SM prediction are not new.The authors of Ref. [20] already pointed out the inconsistency of their 2S 1/2 − 8D 5/2 measurement with hydrogen theory and discussed NP interpretations in the form of Yukawa potential as well as its impact on the determination of R ∞ .The 1S 1/2 − 3S 1/2 hydrogen and muonic deuterium Lamb shift measurements are known pieces of the so-called proton-radius puzzle [19,111].Our analysis shows that both tensions can be significantly ameliorated by postulating the existence of a single light scalar mediator, with the pattern of couplings to the SM fermions such as in the Higgs portal or in the ULD model, with only the latter also avoiding other, non-spectroscopic constraints.

S6. EXTRACTION OF FUNDAMENTAL CONSTANTS IN THE PRESENCE OF NP
In this section we discuss the effect of NP on the uncertainty of extracted fundamental constants.Figure 3 in the main text shows the 68% CL region for simultaneous determinations of the proton charge radius r p and the Rydberg constant R ∞ , assuming the SM-only hypothesis, using either the CODATA18 or DATA22 datasets, as well as for the ULD and the Higgs portal models (DATA22 only).As expected, the extracted values of r p and R ∞ are highly correlated regardless of the existence of NP.NP induces significant shifts in the extracted values of R ∞ and r p , notably when the data shows evidence for nonzero α φ .Furthermore, the larger the shift in the extracted value of the fundamental constant, the more important is the correlation with α φ , see Fig. S6.
Figure S7 illustrates the impact of allowing in the fit the possibility of NP.The relative uncertainties on the extracted values of fundamental constants, g SM , are plotted as a function of the NP mass, for all six benchmark models.In many cases the uncertainties on the extracted fundamental constants change significantly, by factors of O(1), even if the NP parameters are strongly disfavored by data.For dark photon, in particular, the uncertainty on α increases as ∼ m −2 φ for φ masses below ∼ 10 −2 keV.This is a result of a degeneracy between α and α φ in the m φ → 0 limit, see the inset in Fig. S7.The combination α + α φ , however, is well determined.The extraction of bounds on NP parameters in the m φ → 0 limit requires special care, especially for new vector bosons that couple to the SM fermions in a similar way than the QED photon.Massless dark photon, in particular, is completely degenerate with the QED photon.In the massless limit all the effects due to the exchanges of a dark photon are therefore absorbed in the SM predictions by performing the shift α → α + α φ , i.e., the massless dark photon is unobservable.The reason is that for a massless dark photon one can always choose a linear combination of photon and dark photon fields that does not couple to the SM currents, and redefine the orthogonal linear combination as the SM photon.This behavior should be reflected in theoretical predictions.That is, in the m φ → 0 limit, O SM only depends on α + α φ , so that this sum can be interpreted as the SM fine-structure constant.Note that the shift α → α + α φ also implies a redefinition of the Rydberg constant extracted from hydrogen, R ∞ → (α + α φ ) 2 m e /(4π), and of the Bohr radius, a −1 0 → 4πR ∞ /(α + α φ ).The treatment of NP corrections in Eqs. ( 4), ( 5) is designed such that a) it is always correct to leading order in α φ , and b) it reproduces the correct result for the massless dark photon, i.e., for m φ → 0 and q i = Q i .The degeneracy of dark photon with the QED photon is broken either by having m φ = 0 (while still q i = Q i for all particles) or by having a coupling for at least one particle differ from the QED one, q i = Q i .For infinitesimal deformations from the massless dark photon limit the leading observable NP effect is captured by the potential V NP .For m φ a 0 1, keeping the NP couplings still aligned with QED, q i = Q i , the leading non-trivial term in ṼNP scales as O(m 2 φ ), indicating a parametric loss of sensitivity to NP for small m φ , O NP ∝ (m φ a 0 ) 2 .(The term linear in m φ is independent of r and thus unobservable.)For massless vectors with couplings that deviate inifinitesimally from the QED couplings, i.e., q i = Q i + δq i with δq i Q i , the leading effect scales as Ṽ ij NP ∝ δq i + δq j .For massive vectors, m φ a 0 ∼ O(1), with charges that differ significantly from the QED ones, q i = Q i , the prescription in Eqs.(4), ( 5) is equivalent to working to leading order in V NP and not shifting the SM predictions at all.That is, for m NP nonzero and q i = Q i such as massive B − L gauge boson, both prescriptions are strictly speaking we did for the other three scalar benchmarks in the main text.

FIG. 1 .
FIG.1.The 95% CL bounds on the NP coupling constant α φ as a function of the new boson's mass m φ for the benchmark NP models of Sec.II, as indicated.Other model-dependent constraints may apply (see text).

FIG. 3 .
FIG.3.The 68% CL regions for simultaneous determinations of the Rydberg constant R∞ and the proton radius rp assuming either the SM-only hypothesis (gray) or including putative NP contributions from a 400 keV Higgs portal scalar (blue) or 300 keV ULD scalar (purple).The solid lines use DATA22 dataset, the dashed (dotted) lines the CODATA18 dataset with (without) errors inflated by expansion factors.Both R∞ and rp are shown in terms of normalized deviations from the central values of the CODATA 2018 analysis, Ref.[1].
ν HD + ((v, L) − (v , L )) corresponds to the spin-averaged frequency of the transition between rovibrational states (v, L) and (v , L ) of HD + , where v (v ) denote the initial (final) vibrational state, and L (L ) the initial (final) orbital angular momentum quantum number.Similarly, νpHe((n, l) − (n , l )) is the frequency of the transition between states (n, l) and (n , l ) of pHe , where n (n ) and l (l ) respectively denote the principal and orbital quantum numbers of the antiproton in the initial (final) state.The "− − −" symbol indicates that the input datum has been removed from the CODATA18 dataset.Label Input datum Value (kHz) Rel.uncert.
FIG. S2.NP parameters favored by atomic spectroscopy in the Higgs portal (left) and the scalar photon (right) benchmark models, profiling over the SM parameters.The black dot indicates the best-fit point and the blue-shaded (left) and orangeshaded (right) areas represent the 1, 2, 3, 4σ confidence regions around it.The other shaded areas denote excluded region by SN1987a[50,51,110] (pink), the K + → π + X search at NA62[52] (green, the dashed line is an NNLO estimate), stellar cooling[53] (gray) and the E137 beam dump experiment[54,55] (yellow).

m
FIG. S3.Normalized residuals R SM i , see Eq. (S7), of individual input data within the DATA22 dataset.Input data satisfying |R SM i | ≥ 2 are indicated in red.

-
S7. FURTHER DETAILS ON THE m φ → 0 LIMIT

TABLE S1
[1]nput data of the electronic hydrogen and deuterium measurements for the CODATA18 dataset, taken from Table X of Ref.[1].

TABLE S2 .
[1]ut data for the additive energy corrections accounting for missing contributions to the theoretical description of H and D energy levels for the CODATA18 dataset, taken from Table VIII of Ref.[1].Relative uncertainties are with respect to the binding energy.

TABLE S4 .
[1]ut data relevant for fundamental constants other than the Rydberg constant and nuclear charge radii for the CODATA18 dataset, taken from Table XXI of Ref.[1].The D1-D13 inputs are relevant for the fine-structure constant and the electron mass, while the D14-D23 ones are relevant for the proton and deuteron masses.For the additive corrections D2, D9 and D13 of the ae and bound g-factors in carbon (C) and silicium (Si), respectively, the relative uncertainty is relative to the value of the corresponding theoretical quantity.

TABLE S5 .
The values and relative standard uncertainties of the main fundamental constants resulting from the least-squares adjustment based on the CODATA18 and DATA22 datasets without new physics.The symbol u denotes the unified atomic mass unit.The last column indicates the shift of the DATA22 value from the CODATA18 one in units of the CODATA18 uncertainty.

TABLE S8 .
Updated correlation coefficients r(Bi, Bj) ≥ 0.0001 between input data for the hydrogen and deuterium energy corrections.

TABLE
NP i ) 2 is the contribution of each input datum i to the χ 2 difference between SM and NP ignoring correlations with other input data.R NP i are the normalized residuals of the scalar photon (orange) and the ULD scalar (purple) models, evaluated at their respective best-fit points.Darker colors indicate input data with |δR 2 i | ≥ 2. Input data satisfying |R SM i | ≥ 2 are indicated in red.