Search for rare decays of $D^0$ mesons into two muons

A search for the very rare $D^0 \to \mu^+\mu^-$ decay is performed using data collected by the LHCb experiment in proton-proton collisions at $\sqrt{s} = 7$, 8 and $13~\rm{TeV}$, corresponding to an integrated luminosity of $9~\rm{fb^{-1}}$. The search is optimised for $D^0$ mesons from $D^{\ast+}\to D^0\pi^+$ decays but is also sensitive to $D^0$ mesons from other sources. No evidence for an excess of events over the expected background is observed. An upper limit on the branching fraction of this decay is set at $\mathcal{B}(D^0 \to \mu^+\mu^-)<3.1 \times 10^{-9}$ at a $90\%$ CL. This represents the world's most stringent limit, constraining models of physics beyond the Standard Model.

Processes with a change in quark flavour without a change in electric charge are forbidden at the lowest order in the Standard Model (SM) of particle physics.Flavour Changing Neutral Currents (FCNC) are additionally suppressed by the Glashow-Iliopoulos-Maiani (GIM) mechanism [1].FCNC have been extensively studied in strange-and beauty-quark hadrons.In the charm sector the GIM suppression is stronger because the mass differences between down-type quarks are smaller than the ones between up-type quarks.These processes can be enhanced by several orders of magnitude in new physics (NP) scenarios when compared to the SM.
This Letter presents a search for the D 0 → µ + µ − decay based on data collected by the LHCb experiment in pp collisions corresponding to 9 fb −1 of integrated luminosity.The 1 Charge conjugate processes are implied throughout.
data have been collected in 2011, 2012 (Run 1) and 2015-2018 (Run 2) at √ s = 7, 8 and 13 TeV, respectively.Compared to the previous publication [51], the present work benefits from various improvements in the analysis, such as refined multivariate algorithms against combinatorial and misidentified background as well as an improved trigger [?], described throughout the paper.These allow to mitigate the impact of the harsher experimental conditions in Run 2. Both the higher energy and instantaneous luminosity produce a higher track multiplicity, which increases the combinatorial background and worsens the particle identification performance.The D 0 → µ + µ − decay is searched for using D * + → D 0 π + decays, as this improves the background rejection and allows the yield of the decay to be obtained from a two-dimensional fit to the dimuon invariant mass, m(µ + µ − ), and the difference between the D * + and D 0 candidate masses, ∆m.The yield is converted to the decay branching fraction by normalising to two hadronic decays, D 0 → K − π + and D 0 → π + π − , selected concurrently to the signal (collectively referred to as The LHCb detector is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, described in detail in Refs.[52,53].The simulated events used in this analysis are produced with the software described in Refs.[54][55][56][57][58][59][60]. Events are selected online by a trigger that consists of a hardware stage, which is based on information from the calorimeter and muon systems, followed by two software stages.At the hardware trigger stage, events are required to have a muon candidate with high transverse momentum, p T , or a hadron, photon or electron candidate with high transverse energy in the calorimeters.A first stage of the software trigger selects events with a muon candidate, or a high p T charged particle, or a combination of two tracks, each of these displaced from the primary pp collision vertex (PV).In the second stage of the software trigger, dedicated algorithms select candidate D 0 → µ + µ − , D 0 → K − π + and D 0 → π + π − decays, combining two oppositely charged tracks with loose particle identification (PID) requirements that form a secondary vertex separated from any PV.The invariant mass of the D 0 candidate must lie in an interval of ±300 (±70) MeV/c 2 centred on the known D 0 mass for signal (normalisation) channel candidates.To keep the same trigger selection for the signal and normalisation channels, aside from PID, a scale factor of order 0.2-3.0%depending on the data taking period is applied to the normalisation channels trigger selections to limit their rate, keeping randomly a fraction of the events.In the offline selection, only candidates associated with a muon hardware trigger, or those where the rest of the event contained a high transverse energy hadron or electron are kept for the signal channel.For the normalisation channels, only candidates associated with a high transverse energy hadron are kept.
In the offline analysis, D 0 candidates satisfying the trigger requirements are formed with similar but more stringent criteria than the second software-trigger stage.These D 0 candidates are then combined with a charged particle originating from the same PV and having p T > 110 MeV/c to form D * + → D 0 π + candidates.To improve the mass resolution, the D * + meson decay vertex is constrained to coincide with the PV [61].The candidate ∆m is required to be in the range 139.6-151.6MeV/c 2 .A multivariate selection based on a boosted decision tree (BDT) algorithm [62][63][64] is used to suppress background from random combinations of charged particles, using as input: the p T of the pion from the D * + decay, the smallest p T and impact parameter significance with respect to the PV of the D 0 decay products, the angle between the D 0 momentum and the vector connecting the primary and secondary vertices, and the quality of the D 0 vertex.The BDT is trained separately for each Run of data taking, using simulated decays as signal and data candidates from the dimuon sample with m(µ + µ − ) > 1894 MeV/c 2 as background.The k-folding technique, with k = 9, is applied [65].The BDT output ranges from 0 to 1, from background-like candidates to more signal-like candidates.The BDT output is used to define three search regions: BDT ∈ [0.15, 0.33], [0.33, 0.66], [0.66, 1.].The output of the same BDT algorithm is computed also for the D 0 → π + π − candidates for calibration purposes, but is not required in the selection.
A second source of background is due to two-and three-body D 0 decays, with one or two hadrons misidentified as muons (e.g.D 0 → h + h − or semileptonic decays).The misidentification occurs mainly for hadrons that decay into a muon before the muon sub-detector.Although this process is relatively rare, the large branching fractions of these modes produce a background peaking in the signal region of the m(µ + µ − ) distribution that is partially suppressed by a multivariate muon identification discriminant combining information from the Cherenkov detectors, the calorimeters, and the muon sub-detector [66].In addition, the muon candidates are required to have associated muon chamber hits that are not shared with any other track in the event.Signal and background sources can originate both from D * + → D 0 π + decays or from other sources (the PV or B decays) combined with an unrelated pion (untagged); both are taken into account in the yield estimation.A requirement on the output of the multivariate muon identification discriminant is simultaneously chosen for the three BDT regions, by optimising the sensitivity to the minimum visible cross-section, as defined by an extension of the figure of merit defined in Ref. [67].Roughly one percent of events contain more than one signal candidate after all selection requirements, all of which are retained.
The signal yield is converted to the decay branching fraction by normalising to the hadronic decays D 0 → π + π − and D 0 → K − π + , with branching fractions of (1.490 ± 0.027) × 10 −3 and (3.999 ± 0.045) × 10 −2 , respectively [68], as where ε is the efficiency and N is the yield of the given channel, s is the scale factor of the normalisation channel and α is defined as the single event sensitivity.The efficiencies in Eq. ( 1) are factorised into different steps for ease of estimation and evaluated with respect to the previous steps: detector acceptance, reconstruction and selection, PID, and trigger.
The reconstruction and selection efficiencies are obtained from simulated samples.The simulated candidates are assigned weights with an iterative procedure that improves the agreement with data using the following variables: pseudorapidity of the D 0 meson, transverse momentum of the D 0 meson and number of tracks in the event.It is verified that after weighting, all variables used in the selection agree well between data and simulation.The weights obtained from the D 0 → π + π − candidates are used to correct also the signal simulation.
Possible residual differences between data and simulation in the tracking efficiencies are determined using control channels in data [69].The PID efficiencies are determined from data using samples of kinematically identified charged particles from B + → J/ψK + and D * + → D 0 (→ K − π + )π + decays [66], weighted to match the kinematic properties of the signal and the normalisation channels, respectively.The efficiencies are determined in bins of the p and p T of the tracks.A total systematic uncertainty of 1-3% is associated to the binning scheme and background determination in the calibration samples.
The efficiency of the second level of the software trigger is unity with respect to the offline-selected candidates by construction, as the selection is tighter in every requirement.The hardware and first level software trigger efficiencies are evaluated with the Tistos method [70] in data.For the signal channel, the B + → J/ψ(→ µ + µ − )K + decay is used as the calibration channel, selected with the same requirements as those used for the analysis of B decays into two muons [71,72].The calibration is performed in intervals of the J/ψ p T and pseudorapidity.For each interval, a scaling factor between data and simulation is obtained and applied to the D 0 → µ + µ − simulation.Compatible results are obtained repeating the calibration in intervals of p T and the maximum IPχ 2 of the muons, where IPχ 2 is defined as the difference in the vertex-fit χ 2 of a given PV reconstructed with and without the track under consideration.The typical scaling between data and simulation deviates from unity by 2-6%.The normalisation channels, given their high yields, are self-calibrated.The Tistos method is applied to the D 0 → K − π + and D 0 → π + π − channels and trigger efficiencies are obtained.To minimise cross-correlation biases, only candidates in events that satisfy a muon trigger independently of the candidate are used as calibration sample.The calibration of the hadronic hardware trigger is also validated with independent estimates based on control samples in data, obtained with similar methods as in Ref. [73], from which a 15% relative systematic uncertainty is assigned to the hadronic trigger efficiency calibration.
The efficiency of the BDT requirement, and the signal fraction in the BDT intervals, are calibrated in data by applying the same estimator to the D 0 → π + π − decay, which is topologically very similar to the signal.The distribution of the BDT output is obtained in background subtracted D 0 → π + π − decays in data and simulation, and found to be compatible, as shown in the Supplemental Material [74].A small correction is determined and applied to the signal; its uncertainty is assigned as systematic uncertainty to the signal efficiency.
The yields of the normalisation channels are obtained through a fit to the ∆m distribution (Fig. 1), requiring the reconstructed D 0 mass to be within ±10 MeV/c 2 of the known D 0 mass.The signal probability distribution function is composed of a sum of a Gaussian and a Crystal-Ball function [75] with power-law tails on both sides.The background is described with a threshold function, as defined in Ref. [51].The parameters of the Crystal-Ball function are estimated with simulation: the power of the tail is fixed, while the position where the power tails start may vary freely in the fit.In addition, the signal width and all background parameters are left free in the fit.
Using Eq. ( 1), values of α for both normalisation channels are obtained, and found to be in good agreement with each other for each data taking run and for the full sample.As an additional cross-check, the ratio of the efficiency corrected yields of the two normalisation channels is obtained and compared to the ratio of their branching fractions.The value is stable across the data taking years and compatible with the world average [68].The average single event sensitivity is found to be α = (2.15 ± 0.34) × 10 −11 , corresponding to at most one expected signal D 0 → µ + µ − decay under the SM hypothesis.
The signal yield is obtained through an unbinned maximum-likelihood fit to the two-dimensional distribution of m(µ + µ − ) and ∆m, performed simultaneously in the three BDT intervals and in the two data taking Runs.The distributions projected onto the two variables are shown in Fig. 2. Each of the two projections is selected using only candidates in the signal region of the other variable, where the signal regions are defined as m(D 0 ) ∈ [1840, 1885] MeV/c 2 and ∆m ∈ [144.9, 146.1]MeV/c 2 , respectively.The full distributions can be seen in the Supplemental Material [74].The correlation between the two variables is found to be negligible for all contributing decay modes, thus they are treated as uncorrelated.After the full selection, only combinatorial and misidentified hadronic D 0 decays are found to contribute to the background.Background from semileptonic decays is found to be negligible, and any remaining background from other sources is well modelled as part of the combinatorial background component.
The shape of the signal and misidentified background (D 0 → π + π − , D 0 → K − π + ) distributions in the two fit variables is obtained from simulation, reconstructed as D 0 → µ + µ − decays.The model parameters are determined separately for Run 1 and Run 2; the resulting PDF describes the distribution in each data taking year and BDT interval well.For signal, the m(µ + µ − ) and ∆m distributions are both parametrised by Crystal-Ball [75] functions with power tails on both sides.For D 0 → π + π − decays, a single Crystal-ball function in m(µ + µ − ) and the sum of a Johnson [76] and Gaussian function in ∆m are employed.For D 0 → K − π + decays, a Johnson function is used for the m(µ + µ − ) distribution, while the ∆m distribution is described by three Gaussian functions.The combinatorial background is described by an exponential function in m(µ + µ − ) and a threshold function in ∆m [51].Untagged signal and D 0 → π + π − components are included and parametrised as their respective tagged component in m(µ + µ − ) and with the same threshold function of the combinatorial background in ∆m.The fraction of this component is fixed to the value determined in each BDT interval from a fit to D 0 → π + π − data.The shape parameters obtained from the simulated samples are fixed in the data fit, while the slope of the exponential of the combinatorial background is left free to vary in each BDT interval.
A constraint on the expected number of misidentified D 0 → π + π − decays is determined from a dedicated, high-statistics, simulation sample with the trigger and offline selection applied.The most critical part of the simulated sample is the PID efficiency due to the presence of a large fraction of π → µ decays that mimic the signal and are not considered in the standard calibration tools.The PID efficiency is obtained from simulation but it is cross-checked using D + → π + π − π + and D + s → π + π − π + control samples in data where same-sign pions are weighted to match the kinematics of D 0 → π + π − decays.The agreement between the PID efficiency determined with both methods is satisfactory over the full range of the muon identification discriminant variable [74].Therefore, no systematic uncertainty is assigned on this estimate.The uncertainty on the expected D 0 → π + π − yield is propagated through a Gaussian constraint on the relevant parameter in the final fit.
The yield of the misidentified D 0 → K − π + decays is constrained from an auxiliary fit to the m(µ + µ − ) sideband data, recomputed with the correct mass hypothesis.The fit is performed using the ∆m distribution within a ±10 MeV/c 2 region around the D 0 mass in the K − π + mass hypothesis.A correction is applied to take into account this mass requirement.The correlation between this estimate and the yield in the final fit is found not to influence the estimate of the signal branching fraction.
The systematic uncertainties related to both the normalisation, through α, and the background shapes and yields, are included in the fit as Gaussian constraints on the relevant parameters.The dominant systematic uncertainty comes from the calibration of the hadronic trigger efficiency, which is shared through auxiliary parameters among the normalisation channels, and also with the misidentified D 0 → π + π − yields that depend on the same estimate.The fit procedure is tested with pseudoexperiments.The values of the floating shape parameters are obtained from the data fit.Unbiased estimates of the branching fraction with correct coverage are obtained.
The m(µ + µ − ) and ∆m distributions in data are shown for the most sensitive BDT interval in Fig. 2 and for all intervals in Ref.
[74], overlaid with the result of the fit.The data are consistent with the expected background.The value obtained for the D 0 → µ + µ − branching fraction is B(D 0 → µ + µ − ) = (1.7 ± 1.0) × 10 −9 , corresponding to 79 ± 45 signal decays.The significance of this signal is estimated comparing the test statistics in data with the distribution of the test statistics in background-only pseudoexperiments, and is found to have a p-value of 0.068, corresponding to a significance of 1.5σ (see also Ref. [74]).An upper limit on the branching fraction is derived using the frequentist CL s method [77] as implemented in the GammaCombo framework [78,79].This yields The observed limit is larger than the one expected from background-only pseudoexperiments, B(D 0 → µ + µ − ) < 1.9 (2.3) × 10 −9 at a 90 (95)% CL, coherently with the central value for the signal branching fraction.
The fit is repeated with different configurations: allowing the resolution of the misidentified D 0 → π + π − background to vary, using a double exponential function in place of a single one for the combinatorial background, and reducing the range in the ∆m variable.No significant change was found in the signal branching fraction with any configuration.In summary, a search for the D 0 → µ + µ − decay in data corresponding to 9 fb −1 of pp collision data collected by the LHCb experiment is performed.No excess with respect to the background expectation has been found and an upper limit of B(D 0 → µ + µ − ) < 3.1 × 10 −9 at 90% CL has been set.This result represents an improvement of more than a factor two with respect to the previous LHCb result.This measurement constitutes the most stringent limit on the relevant FCNC couplings in the charm sector, allowing to set additional constraints on physics models beyond the SM which predict the branching fractions of D 0 → µ + µ − and describe results from B physics measurements.

Supplemental Material
This section contains the additional figures mentioned in the main text.Figures 3 and 4 present the same data shown in Figure 1 in the main body, but for all the BDT intervals.Figures 5 and 6 represent the same distributions and subdivisions of the previous plots, but without restricting each to the signal region of the other variable, i.e. they contain the full data used for the signal search.Figure 7 shows the data in the two-dimensional plane of the m(µ + µ − ) and ∆m variables, as well as the signal regions in each variable.Figure 8 displays the result of the calibration of the BDT output described in the text.Figure 9 shows the test on the particle identification variable mentioned in the main body.Finally, Figure 10 shows the value of the CL s estimator used to compute the upper limit on the D 0 → µ + µ − branching fraction, as a function of the branching fraction itself.

Figure 2 :
Figure 2: Distribution of (left) m(µ + µ − ) and (right) ∆m for the D 0 → µ + µ − candidates in data from (top) Run 1 and (bottom) Run 2, for the most sensitive BDT interval.The distribution is superimposed with the fit to data.Each of the two distributions is in the signal region of the other variable, see text for details.Untagged and tagged decays are included in a single component for signal and D 0 → π + π − background.

Figure 3 :
Figure 3: Distributions of (left) m(µ + µ − ) and (right) ∆m for the D 0 → µ + µ − candidates for Run 1 data in, top to bottom, the three BDT intervals.The distributions are superimposed with the fit to data.Each of the two distributions is in the signal region of the other variable, see text for details.Untagged and tagged decays are included in a single component for signal and D 0 → π + π − background.

Figure 4 :Figure 5 :Figure 6 :
Figure 4: Distributions of (left) m(µ + µ − ) and (right) ∆m for the D 0 → µ + µ − candidates for Run 2 data in, top to bottom, the three BDT bins.The distributions are superimposed with the fit to data.Each of the two distributions is in the signal region of the other variable, see text for details.Untagged and tagged decays are included in a single component for signal and D 0 → π + π − background.

Figure 7 :Figure 8 :
Figure 7: Two-dimensional distribution of m(D 0 ) versus ∆m for the most sensitive BDT interval in the (left) Run 1 and (right) Run 2 data.Red lines represent the signal region of each variable as defined in the text.

Figure 9 :
Figure9: Efficiency for two pions to pass a requirement on the P robN N mu variable (after additional requirements on its identification as muon) in Run 2, as measured from D + → π + π − π + and D + s → π + π − π + decays in data and D 0 → π + π − decays in simulation.Same sign pions are used in data to avoid contamination for hadronic resonances decaying to two real muons.

Figure 10 :
Figure 10: Value of CL s as a function of the D 0 → µ + µ − branching fraction.