Search for Higgs boson decays into a pair of pseudoscalar particles in the $bb\mu\mu$ final state with the ATLAS detector in $pp$ collisions at $\sqrt{s}=13$ TeV

This paper presents a search for decays of the Higgs boson with a mass of 125 GeV into a pair of new pseudoscalar particles, $H\rightarrow aa$, where one $a$-boson decays into a $b$-quark pair and the other into a muon pair. The search uses 139 fb$^{-1}$ of proton-proton collision data at a center-of-mass energy of $\sqrt{s}=13$ TeV recorded between 2015 and 2018 by the ATLAS experiment at the LHC. A narrow dimuon resonance is searched for in the invariant mass spectrum between 16 GeV and 62 GeV. The largest excess of events above the Standard Model backgrounds is observed at a dimuon invariant mass of 52 GeV and corresponds to a local (global) significance of $3.3 \sigma$ ($1.7 \sigma$). Upper limits at 95% confidence level are placed on the branching ratio of the Higgs boson to the $bb\mu\mu$ final state, $\mathcal{B}(H\rightarrow aa\rightarrow bb\mu\mu)$, and are in the range $\text{(0.2-4.0)} \times 10^{-4}$, depending on the signal mass hypothesis.


Introduction
Light (pseudo) scalars that couple to the 125 GeV Higgs boson [1,2] appear in many well-motivated extensions of the Standard Model (SM) [3][4][5][6][7][8]. These include models addressing the baryon asymmetry of the universe [9,10], offering a solution to the naturalness problem [11,12], or providing insights into the nature of dark matter [13][14][15][16][17][18][19]. Light bosons produced in Higgs boson decays could also be mediators to dark sectors that do not couple to the SM otherwise [20][21][22][23][24]. Furthermore, pseudoscalar mediators appear in models, such as those described in Ref. [25], that were proposed to explain the anomalous muon magnetic moment [26]. A combination of ATLAS measurements of the Higgs boson production cross-sections and branching ratios constrains the branching ratios into invisible and undetected states to be B ( → inv) < 30% and B ( → undetected) < 21%, respectively, whereas the overall branching fraction of the Higgs boson into beyond-the-SM (BSM) states is determined to be less than 47% at 95% confidence level (CL) [27]. Combined measurements of Higgs boson couplings performed by the CMS Collaboration set upper limits of B ( → inv) < 22% and B ( → undetected) < 38% at 95% CL [28]. This motivates searches for light states in the Higgs boson decays that probe this potentially large B ( → BSM). This paper presents a search for decays of the 125 GeV Higgs boson into two pseudoscalars, denoted by , in proton-proton ( ) collisions at the LHC [29]. The search is performed in events where one -boson decays into two -quarks and the other into two muons, → →¯+ − . 1 The -bosons are assumed to have a decay width that is narrow compared to the detector resolution. As pseudoscalar couplings are generally proportional to mass, which is for example the case in two-Higgs-doublet models [20,30], the final state provides a good balance between a high branching ratio from the → decay and a clean, high mass-resolution, dimuon resonance signature that is easy to trigger on from the → decay. In scenarios with enhanced lepton couplings, the → branching ratio can also be relatively large, resulting in B ( → → )/B ( → ) of up to 0.16% [31]. and 2 + 2-jets [43]. A search for a dimuon resonance produced in association with b-jets has been performed by CMS [44] and a light resonance decaying to two muons has been searched for by LHCb [45]. CMS has performed a search for → → in 35.9 fb −1 of collision data at a center-of-mass energy of √ = 13 TeV that sets upper limits on B ( → → ) of (1-7) × 10 −4 for -boson masses GeV. Additionally, boosted decision tree (BDT) techniques are used to improve the separation of the signal from the SM backgrounds, increasing the analysis sensitivity, especially for higher . 1 Denoted by → → from now on for the rest of the paper.

ATLAS detector
The ATLAS experiment [48, 49] is a multipurpose particle detector with a forward-backward symmetric cylindrical geometry and a nearly 4 coverage in solid angle. 2 It consists of an inner detector (ID) surrounded by a thin superconducting solenoid providing a 2 T axial magnetic field, electromagnetic (EM) and hadron calorimeters, and a muon spectrometer (MS). The inner tracking detector covers the pseudorapidity range | | < 2.5. It consists of silicon pixel, silicon microstrip, and transition radiation tracking detectors.
Lead/liquid-argon (LAr) sampling calorimeters provide electromagnetic energy measurements with high granularity. A steel/scintillator-tile hadron calorimeter covers the central pseudorapidity range | | < 1.7. The endcap and forward regions are instrumented with LAr calorimeters for both the EM and hadronic energy measurements up to | | = 4.9. The MS surrounds the calorimeters and is based on three large superconducting air-core toroidal magnets with eight coils each. The field integral of the toroids ranges between 2.0 Tm and 6.0 Tm across most of the detector. The MS includes a system of precision tracking chambers and fast detectors for triggering. A two-level trigger system is used to select events. The first-level trigger is implemented in hardware and uses a subset of the detector information to accept events at a rate below 100 kHz [50]. This is followed by a software-based trigger that reduces the accepted event rate to 1 kHz on average. An extensive software suite [51] is used in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

Dataset and simulated events
The data used in this analysis were collected in Run 2 of the LHC during the 2015-2018 data-taking period with collisions at a center-of-mass energy of √ = 13 TeV. The dataset corresponds to an integrated luminosity of 139 fb −1 . The lowest-threshold unprescaled single-muon and dimuon triggers are used to select the events [52]. Single-muon triggers require the transverse momentum ( T ) of the muon to be above 20 GeV or 26 GeV, depending on the data-taking period, while the dimuon trigger requires both muons to have a T above 14 GeV.
Simulated events are used in the estimation of the SM backgrounds. S 2.2.1 [53,54] was used as the baseline generator for the Drell-Yan (DY) + jets, (→ ℓ )+jets, diboson and triboson backgrounds. It is a multiparton matrix element and parton shower (PS) generator including hadronization [55][56][57][58][59], with the NNPDF3.0 parton distribution function (PDF) set at next-to-next-to-leading-order (NNLO) accuracy [60]. The DY+jets and multiboson samples were generated with a minimum dilepton mass of 10 GeV and 4 GeV, respectively. The¯and single-top-quark samples were generated with P -B 2 [61][62][63][64][65] using the NNPDF3.0 PDF in matrix element interfaced to P 8.230 [66] for the PS. For the underlying-event description a set of tuned parameters called the A14 tune [67] was used, along with the NNPDF2.3 PDF [68]. The¯+ vector-boson processes (¯+ ) were generated with M G 5_aMC@NLO 2.3.3 [69] interfaced to P 8.210 for the PS. The underlying-event tune was the same as for the¯sample. E G [70] was used for the properties of the bottom and charm hadron decays in all simulated samples, except those simulated with S . 2 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the center of the detector and the -axis along the beam pipe. The -axis points from the IP to the center of the LHC ring, and the -axis points upwards. Cylindrical coordinates ( , ) are used in the transverse plane, being the azimuthal angle around the -axis. The pseudorapidity is defined in terms of the polar angle as = − ln tan( /2). Angular distance is measured in units of Higgs boson production through gluon-gluon fusion (ggF) was generated using the NNLOPS program [71,72] with P -B 2 [61,63,73,74]. The vector-boson fusion (VBF) processes were generated with P -B 2 at NLO accuracy [75]. The Higgs boson mass was set to 125 GeV. For both the ggF and VBF production processes, P -B was interfaced with P 8.212 using the AZNLO tune [76] for the simulation of the → → decays, where the -boson is a pseudoscalar, as well as for parton showering, hadronization and the underlying event. The ggF Higgs boson production rate is normalized to the total cross-section predicted at next-to-next-to-next-to-leading-order accuracy in QCD with NLO electroweak corrections applied [77][78][79][80][81] and amounts to 48.58 pb. The VBF production rate is normalized to an approximate NNLO cross-section with the NLO electroweak corrections applied [82][83][84][85], which amounts to 3.8 pb. The contribution from the associated production of a Higgs boson and a vector boson (VH) is calculated to be 3.5% of the total ggF+VBF cross-section and is accounted for by scaling the simulated ggF and VBF samples. The contribution from Higgs boson production in association with a pair of top quarks is found to be negligible (below the percent level) and is neglected in the analysis. Thirteen mass points were simulated for the ggF and VBF production modes, with the -boson mass in the range = 16-62 GeV. 3 Below = 16 GeV the -quarks coming from the decays of the -boson tend to be so collimated due to its boost that they cannot be reconstructed as two separate -jets (with a radius parameter of = 0.4). Another effect is that in the highly asymmetric decays of low-mass -bosons, the subleading -jet falls below the jet reconstruction threshold of 20 GeV [86]. As a result, the signal acceptance falls below 0.2% and the analysis loses sensitivity.
The effects of additional interactions in the same and neighboring beam-bunch crossings (pileup) were modeled for all simulated events by overlaying additional collisions generated with P 8.186 using the NNPDF2.3 PDF set and the A3 tune [87]. Simulated event samples are weighted to reproduce the distribution of the number of pileup interactions observed in the data. All the generated background and signal samples are processed through the ATLAS detector simulation [88] based on G 4 [89] and reconstructed using the same software as for the data.

Event reconstruction and selection
Muons are reconstructed by combining track information from the MS with tracks found in the ID [90]. They also have to satisfy T > 5 GeV and | | < 2.7 (for | | > 2.5, only tracking information from the MS is used), and pass the working point identification requirement defined in Ref. [90]. Muon tracks must have a longitudinal impact parameter 0 satisfying | 0 sin | < 0.5 mm and a transverse impact parameter significance | 0 |/ 0 < 3 relative to the primary interaction vertex, chosen as the reconstructed vertex with the highest sum of the 2 T of its associated tracks. Furthermore, muons are required to be isolated from the surrounding detector activity by requiring that the scalar sum of the T of additional inner detector tracks and the sum of the transverse momentum T of calorimeter energy deposits within a cone of size Δ = 0.2 around a muon be less than 15% and 30% of the muon T , respectively.
Jets are reconstructed using the anti-algorithm [91] implemented in the F J package [92] with a radius parameter of = 0.4. The inputs to the jet clustering are built by combining the information from both the calorimeters and the ID using a particle-flow algorithm [86,93]. Jets with T < 60 GeV originating from pileup are suppressed with the jet-vertex-tagger (JVT) [94], a multivariate algorithm combining track-based variables. Selected jets are required to have T > 20 GeV and | | < 2.5. An algorithm (MV2c10) relying on multivariate techniques, taking as input the properties of displaced tracks and vertices reconstructed within a jet, is employed to identify (tag) jets containing -hadrons [95]. The MV2c10 tagger is used at 77% -jet identification efficiency, with an approximate misidentification probability of 25% for jets arising from charm quarks, 6.3% for hadronically decaying -leptons, and 0.8% for light-flavor jets as measured in simulated¯events.
The missing transverse momentum ( miss T ) is calculated as the magnitude of the negative vector sum of the transverse momenta of all the reconstructed and calibrated objects in the event, including a soft term that accounts for charged particles that are associated with the primary vertex, but not with any reconstructed object [96,97].
The events selected for the analysis are required to have two muons of opposite charge, either with the leading and subleading muons satisfying leading T > 27 GeV and subleading T > 5 GeV, and the event being triggered by a single-muon trigger, or with both muons having T > 15 GeV, and the event being triggered by a dimuon trigger. The dimuon invariant mass, , is required to be between 15 GeV and 65 GeV. Furthermore, the events must contain exactly two -tagged jets with T above 20 GeV.
A kinematic likelihood (KL) [98] fit exploiting the equal invariant masses of the and systems in → decays is performed to improve the four-body invariant mass ( ) resolution and reduce the SM backgrounds. The same fit approach as considered in the previous ATLAS publication [47] is used. The dimuon invariant mass, , is used to constrain the di--jet mass, as the former has a resolution approximately ten times better than the latter. The resolution ranges between 0.4 GeV at = 16 GeV and 1.3 GeV at = 62 GeV. The fit maximizes the likelihood by shifting the -jet energies within the resolution in order to satisfy the constraint . The output of the fit is the logarithm of the maximum likelihood value, ln( max ), which quantifies how well the event matches the = hypothesis, characteristic of signal events. The four-body invariant mass, recomputed after the KL fit, is denoted by KL and is used for further event categorization.
Signal-like events are chosen by requiring that 110 < KL < 140 GeV, and that ln( max ) > −8, which ensures that is compatible with . Finally, miss T is required to be less than 60 GeV to reduce the background from¯events, which is one of the two major backgrounds and can contain large miss T from neutrinos in top-quark decays. This selection defines the "inclusive" signal region (SRincl) and is summarized in Table 1, along with the selection requirements for other analysis regions described later in the text.
A boosted decision tree (BDT) classifier implemented using the TMVA framework [99] is employed to further reduce the SM backgrounds. Its training is done in partially overlapping 8-GeV-wide windows centered at the values of each of the 12 generated signals, 4 in order to fully exploit their kinematic differences. The background sample consists of¯and DY+jets events, the two dominant backgrounds, combined in the proportions extracted from the background validation fit described in Section 7. The signal samples used for the training include ggF and VBF Higgs boson production samples combined according to their cross-sections. The seven kinematic variables included in the training are: • , • ln( max ), • Δ 1 2 (the angular distance between the two -jets), • diffΔ = Δ 1 2 − Δ 1 2 (the difference between the angular separations between the two -jets and the two muons), • Δ (the angular distance between the and systems), • Δ , = Δ 1 1 + Δ 1 2 + Δ 2 1 + Δ 2 2 /4 (the average angular distance of all four combinations of a -jet and a muon), • = 1 1 + 1 2 + 2 1 + 2 2 /4 (the average mass of all four combinations of a -jet and a muon).
The distributions of these variables for the background and three representative signal masses are shown in Figure 1.
The variable helps separate the low-mass signal from the backgrounds, as peaks around 60 GeV for the¯and DY processes. The ln( max ) peaks at higher values as the signal mass becomes smaller.
Due to a higher boost of a lighter -boson, its decay products are collimated, resulting in Δ 1 2 and Δ 1 2 being much smaller than for a signal from a heavier -boson or for background processes. As a consequence, diffΔ shows a narrow distribution centered around zero, while the background and a higher-mass signal exhibit a much broader diffΔ distribution.
The Δ variable helps enhance the sensitivity to higher signal masses. Heavier -bosons are produced approximately at rest, resulting in the Δ distribution being relatively flat with a small peak at low values. As the signal mass decreases, the Δ distribution transitions into a "back-to-back" topology, characteristic of both a low-mass signal and the background events.
Finally, the Δ , and , variables provide another measure of how close the two -bosons are in Δ . In the "back-to-back" topology for lower signal masses, the muons are, on average, further away from the -jets, while for heavier -bosons produced approximately at rest, the average distance between the muons and the -jets is smaller. Consequently, Δ , and , peak at high (low) values for low (high) signal masses, while the backgrounds peak somewhere between the two extreme signal topologies.
The output score of the BDT trained for a signal with mass is denoted by BDT . The BDT distributions for = 20 GeV, 40 GeV and 60 GeV are shown in Figure 2.
The final signal region (SR) bin for each signal mass is defined by imposing two requirements in addition to the SRincl selection: − < < + and BDT > 0.2, where = 1 GeV ( = 1.5 GeV) for ≤ 45 GeV ( > 45 GeV). The widths of the SR bins and the BDT cut value are optimized to maximize the significance of signal over background events. For masses at which no signal sample was generated, and, consequently, no BDT was trained, the BDT trained for the closest to the one being tested is used. For example, when testing the = 32 GeV hypothesis, the requirement BDT30 > 0.2 is applied to select the events for the SR bin. Signal yields for mass points where no signal sample was generated ( = 32 GeV in this example) are obtained by selecting events with BDT scores above 0.2 for the same BDT (BDT30 in this case) in all simulated mass points and interpolating using third-order splines. To assess the uncertainty, the yields of the neighboring simulated mass points ( = 30 GeV and = 35 GeV in this case) are interpolated using a linear function. The difference between the yields obtained using the splines and a linear function for the interpolation is assigned as a systematic uncertainty on the interpolated signal yield. Using a BDT at a mass for which the training was not performed results in a negligible loss of significance relative to a BDT that was optimized for that mass point.
The signal acceptance × efficiency varies between 0.3% and 2.5% for ggF Higgs boson production and between 0.2% and 3.0% for VBF production, where the lowest acceptance × efficiency is obtained for the lowest , and grows as increases. The largest loss of acceptance occurs when requiring that there are two -jets in the event, as one of the signal jets tends to fall below the reconstruction threshold of 20 GeV. The fraction of signal events passing the two--jet requirement is less than 20% for all mass points.

Background estimation
The dominant backgrounds in the analysis arise from the DY dimuon process in association with -quarks and pair production of top quarks (¯) where each boson decays into a muon and a neutrino. These two backgrounds account for more than 96% of background events in all analysis regions.
Two control regions are designed to constrain the¯and DY backgrounds. They are chosen so that they have negligible signal contamination, are kinematically as close as possible to SRincl, and maximize the contribution of one of the respective background processes. A top-quark control region (TCR) is defined by inverting the miss T selection criterion in SRincl to miss T > 60 GeV. This results in an event sample approximately 93% pure in¯events. The DY control region (DYCR) is defined in the 30 GeV-wide KL sidebands of SRincl, i.e. by requiring 80 < KL < 110 GeV or 140 < KL < 170 GeV. Approximately 50% of the events in DYCR originate from the DY process, whereas the rest mostly come from¯production. Two validation regions (VR1 and VR2) are used to validate the normalizations of the backgrounds. VR1 is defined in the 170 < KL < 300 GeV range, while VR2 is obtained by inverting the ln( max ) selection criterion of SRincl to −11 < ln( max ) < −8. All the analysis regions are summarized in Table 1 and illustrated in Figure 3.
The shapes of the¯kinematic variable distributions are obtained from simulation, while the overall normalization is extracted from the fits described in Section 7. The distributions for the DY background are taken from data templates because the limited sizes of the simulated event samples do not allow a reliable estimate. The template regions are defined in the same way as the analysis regions in Table 1, except that the two--tag requirement is replaced by a zero--tag requirement. The template regions are >95% pure in    , ln( max ), The variables are plotted in SRincl. All the distributions are normalized to unit area. The background histogram is the sum of the¯and DY event templates, combined in the proportions extracted from the background validation fit described in Section 7.   DY events. Contributions from other processes, namely¯, +jets, diboson and single-top, are subtracted using simulation. Following the subtraction, the DY templates are corrected to account for kinematic differences between event samples dominated by jets originating from light quarks or gluons (template regions) and event samples dominated by -jets (analysis regions). The correction is applied as a per-event weight, where the reweighting is derived from a comparison between two--tag and zero--tag kinematic distributions in simulated DY events. Two sets of event weights are derived and applied sequentially. First, the jet multiplicity of the zero--tag MC sample is reweighted to the one in the two--tag sample. It is the distribution with the largest difference between the zero-and two--tag samples and was hence corrected first. Secondly, a BDT-based reweighting is employed to further correct the zero--tag template kinematics. A BDT is trained on the zero--tag versus the two--tag simulated DY samples. The BDT input consists of kinematic properties and angular distributions of the -jets, muons and the two corresponding -boson candidates, as well as miss T and KL . The ratio of the BDT score distributions obtained for the two-b-tag and zero-b-tag simulated events is then applied as a weight to every event from the zero-b-tag DY template, as a function of its BDT score. Following the BDT-based reweighting, the KL and miss T distributions are corrected by up to 20%. The DY templates are normalized to data in the fits described in Section 7.
Minor backgrounds include diboson and single-top-quark production, production of a¯pair in association with a vector boson, and boson production in association with -jets. The estimation of these minor backgrounds relies purely on simulation normalized to the best available theoretical prediction. The events where a jet is misidentified as a muon are taken into account as follows: non-prompt/misidentified muons in +jets and¯events are included in the analysis on the basis of simulation, any contribution of  are determined using the event yields measured in the TCR, DYCR, and a given signal window. Tw validation regions are defined to compare the number of observed events with the number of SM even predicted by the fit. One validation region (VR1) is defined in the high tail of the bbµµ invariant mas distribution, 170 < m KL bbµµ < 300 GeV, while for the second validation region (VR2) only the requiremen on the ln(L max ) is changed relative to the SR, 11 < ln(L) < 8. All the analysis regions are illustrate in Figure 2. The DY templates for each of the kinematic variables considered in a particular region of the analysis (SR CR or VR) are taken from the data in a corresponding template region (DYTR). For each analysis regio the associated DYTR is defined by changing the two-b-tag requirement (present in every SR, CR an VR) to a zero-b-tag requirement, while keeping all other selection requirements the same. All the DYT are > 90% pure in DY events. The small contribution from non-DY backgrounds, namely tt, diboson W+jets, single-top and ttV, is subtracted from the data in a DYTR using the simulated samples, and th remaining data events are assigned to the DY template. To construct b-jet-based variables, such as m b and m bbµµ , in a DYTR the two leading non-tagged jets are taken and used in the computation instead o the b-jets.
It is verified in both the simulation and the data that the shapes of all the muon-based variables (mo importantly m µµ ) are consistent between the sample with no b-tagged jets and the sample with two b tagged jets. To account for di erences in jet kinematics between the DYTR dominated by light-flavou jets and the corresponding analysis region dominated by heavy-flavour jets, an event-reweighting base on the leading jet p T is applied to the events in the DYTR. The event weights are derived in the data afte the preselection as the ratio of the leading b-tagged jet p T in the two-b-tag sample to the leading jet p T i the sample with zero b-tags. An improvement in the modelling of jet-based kinematic variables after th reweighting is verified both in simulation and in data in the DYCR, while the shape of the m µµ distributio remains unchanged.
Minor backgrounds include diboson production, W boson production in association with b-jets (wit one non-prompt muon satisfying the isolation criteria) and production of a single top quark or tt pair i association with a vector boson. The contribution of the minor backgrounds in the signal region is at th percent level. They are estimated using simulation normalised to the best available theory prediction. 7 are determined using the event yields measured in the TCR, DYCR, and a given signal window. Two validation regions are defined to compare the number of observed events with the number of SM events predicted by the fit. One validation region (VR1) is defined in the high tail of the bbµµ invariant mass distribution, 170 < m KL bbµµ < 300 GeV, while for the second validation region (VR2) only the requirement on the ln(L max ) is changed relative to the SR, 11 < ln(L) < 8. All the analysis regions are illustrated in Figure 2. The DY templates for each of the kinematic variables considered in a particular region of the analysis (SR, CR or VR) are taken from the data in a corresponding template region (DYTR). For each analysis region the associated DYTR is defined by changing the two-b-tag requirement (present in every SR, CR and VR) to a zero-b-tag requirement, while keeping all other selection requirements the same. All the DYTR are > 90% pure in DY events. The small contribution from non-DY backgrounds, namely tt, dibosons, W+jets, single-top and ttV, is subtracted from the data in a DYTR using the simulated samples, and the remaining data events are assigned to the DY template. To construct b-jet-based variables, such as m bb and m bbµµ , in a DYTR the two leading non-tagged jets are taken and used in the computation instead of the b-jets.
It is verified in both the simulation and the data that the shapes of all the muon-based variables (most importantly m µµ ) are consistent between the sample with no b-tagged jets and the sample with two btagged jets. To account for di erences in jet kinematics between the DYTR dominated by light-flavour jets and the corresponding analysis region dominated by heavy-flavour jets, an event-reweighting based on the leading jet p T is applied to the events in the DYTR. The event weights are derived in the data after the preselection as the ratio of the leading b-tagged jet p T in the two-b-tag sample to the leading jet p T in the sample with zero b-tags. An improvement in the modelling of jet-based kinematic variables after the reweighting is verified both in simulation and in data in the DYCR, while the shape of the m µµ distribution remains unchanged.
Minor backgrounds include diboson production, W boson production in association with b-jets (with one non-prompt muon satisfying the isolation criteria) and production of a single top quark or tt pair in association with a vector boson. The contribution of the minor backgrounds in the signal region is at the percent level. They are estimated using simulation normalised to the best available theory prediction. non-prompt/misidentified muons in the DY+jets component is accounted for by the data template, and the potential contribution from multĳet events is found to be negligible.

Systematic uncertainties
Systematic uncertainties in the analysis are divided into three categories: experimental uncertainties affecting the simulated background and signal processes, uncertainties in the modeling of the DY template, and theoretical uncertainties of the simulated background and signal samples. Table 2 shows a summary of the dominant systematic uncertainties in the total background and signal yields in the signal region bins, as resulting from the fits described in Section 7 and hereafter denoted by "post-fit".
Among the experimental uncertainties, the leading effects come from those associated with the calibration and resolution of jet energies [100], and with the measurement of the -tagging efficiency [95]. The impact of these uncertainties on the total background (signal) yields in the SR bins is as large as 3% (10%). The uncertainty in the combined 2015-2018 integrated luminosity is 1.7% [101], obtained using the LUCID-2 detector [102] for the primary luminosity measurement. Other uncertainties, such as those arising from the muon identification efficiency, momentum scale and resolution [90,103], and pileup are found to have a negligible impact on the final yields.
The uncertainty arising from limited MC sample sizes ranges from 8% to as large as 40% in the low mass bins due to there being few¯events in this region.
Five sources of uncertainty in the data-driven DY template are considered. The uncertainty in subtracting non-DY events from the non-reweighted template in the analysis regions is assessed by comparing the nominal template, for which the simulated non-DY backgrounds had been subtracted before reweighting, with an alternative template for which no subtraction had been performed. The uncertainties in the template kinematics modeling are derived by comparing the DY template with simulation in two key variables: miss T and KL . The ratios of the template to the simulated DY events are fit with linear functions and used in assigning uncertainties to the shapes of the miss T and KL distributions. Similarly, the uncertainty in the template shape is assessed by comparing the template with the smoothed simulated sample and applying the observed difference as a systematic uncertainty. The uncertainty in the normalization of the DY template is obtained from the fits to data. Finally, the uncertainty in the efficiency of the BDT selection criteria is evaluated by taking the difference in the BDT cut efficiency, BDTm a >0.2 DY events SR / no BDTm a cut DY events SR , between the template and the simulation. All one-sided DY template uncertainties are symmetrized around the nominal value.
To assess the uncertainties in the generation of the hard-scatter¯process, the P sample is compared with a sample generated using M G 5_aMC@NLO 2.3.3. The hadronization and fragmentation uncertainties in the PS are evaluated by comparing the nominal sample showered by P 8.230 with an alternative sample generated by P using the same PDF in matrix element as for the nominal sample, but showered with H 7.0.4 [104,105]. The initial-and final-state radiation (ISR and FSR) uncertainties of the¯sample are assessed by varying the internal P 8.230 showering parameters. Finally, the uncertainties due to the PDF choice are evaluated using the internal variations of the nominal PDF4LHC15_ _30 set [106].
Uncertainties in the calculation of the ggF and VBF Higgs boson production cross-sections are assessed by following the recommendations of the LHC Higgs Working Group given in Refs. [77,82]. As no VH signal sample was generated, a conservative 100% uncertainty is assigned to the estimated VH yield. To evaluate the uncertainties due to the PDF choice, the yields obtained with the baseline NNPDF30_ _ _0118 set are compared with the yields obtained using the internal variations of NNPDF30_ _ _0118 and with the yields obtained with the nominal MMHT2014 68 118 [107] and CT14 [108] sets. The largest difference is taken as the overall PDF uncertainty for all signal mass points. Furthermore, the effects of uncertainties in the ISR, FSR, multiparton interactions (MPI) in P , parton showering, and renormalization and factorization scales are also assessed. Uncertainties from these sources have an impact of 1-6% on the signal yields, with the largest contributions arising from the ggF production cross-section and FSR uncertainties.

Analysis and results
The final background and signal estimates are obtained in a set of binned likelihood fits [109] using the HistFitter [110] package. The likelihood is a product of Poisson probability functions, describing the observed and predicted numbers of events in each region, and Gaussian distributions that constrain the nuisance parameters associated with the systematic uncertainties. In the background validation fit, the data in TCR and DYCR are used to extract the normalization of the¯and DY backgrounds, respectively. As the¯sample in TCR is modeled very well, it is implemented as only one bin in the fit, whereas DYCR is divided into five equal-width bins in to provide greater sensitivity to the DY template shape. The purpose of this fit is to validate the modeling of the background in the control and validation regions and in SRincl. The fitted¯normalization factor is¯= 1.07 +0.06 −0.07 , while the value of DY has no physical meaning because it is scaled from a template region and is thus not quoted. Figures 4 and 5 show post-fit distributions of KL , miss T , ln( max ), and spanning various analysis regions, while Figure 6 shows BDT20 and BDT50 in SRincl. Good agreement between the estimated backgrounds and the data is observed in the kinematic distributions. In SRincl, 1185 events are observed, which is compatible with the total estimated background of 1155.3 ± 13.6. The yields in several representative SR bins, i.e. windows after applying the BDT selection, as obtained from the background validation fit above, are shown in Table 3. When comparing the systematic uncertainty with the statistical uncertainty, it can be seen that the analysis is clearly statistically limited. Figure 7 shows the data and the estimated backgounds in all final SR bins. Due to the limited statistics of the background samples, the estimates are not perfectly smooth; however, the bin-to-bin fluctuations are much smaller than the statistical uncertainty of the data. Larger jumps, which occur at = 23, 28, 33, 38 GeV etc., appear when the BDT discriminant used for the selection changes from the one trained in the lower mass range to the one trained in the higher mass range.
To test for the presence of new phenomena, fits are performed for each of the 47 hypothesized signal masses in the range 16 ≤ ≤ 62 GeV in 1 GeV steps. It was verified that the analysis is also sufficiently sensitive to a signal with centered in between these 1 GeV steps. TCR, DYCR, and the respective SR bin are included in each fit in order to constrain the backgrounds and the signal to the data.
A model-independent fit, i.e. not including any signal sample, is performed to test whether the data are compatible with the background-only hypothesis. The result is a scan of 0 -values as shown in Figure 8. The largest discrepancy is found at = 52 GeV, corresponding to a local (global) 0 -value of 0.00054 (0.048) and a local (global) significance of 3. 3 (1.7 ). The global significance was calculated from the asymptotic formulae in Refs. [109,111].    . Neighboring bins partially overlap, hence they are not statistically independent. The bottom panel shows the pull in each bin, defined as ( obs − pred )/ tot , where obs is the number of events in the data, pred is the number of fitted background events and tot is the total (systematic and statistical, added in quadrature) uncertainty in the fitted background yield. Discontinuities in the background predictions appear when the BDT discriminant used for the selection changes from the one trained in the lower mass range to the one trained in the higher mass range. The histogram labeled as "Other" in the legend includes the contributions from the diboson, single-top-quark,¯+ and +jets backgrounds. Upper limits, derived using the CL s technique [112,113], are set on B ( → → ) in a series of conditional fits, this time also including the signal samples. The limits as a function of are shown in Figure 9. Uniform sensitivity is achieved for all masses above 18 GeV, while for lower signal masses, ≤ 18 GeV, the sensitivity of the analysis decreases due to -jets falling below the reconstruction threshold or merging into one reconstructed jet. Figure 10 shows and BDT distributions after the signal+background fit for two SR bins, A factor of ∼2 improvement in sensitivity comes from the larger dataset, and a further factor of ∼2 is achieved thanks to the use of multivariate techniques to discriminate between the signal and the SM backgrounds. Due to small number of background events at lower signal masses , the BDT training is less efficient in this region, and the gain from applying the BDT selection criteria is higher at higher . Taking as an example the favorable scenario with B ( → → )/B ( → ) = 0.16%, the analysis probes the Higgs boson branching fraction into pseudoscalars down to B ( → ) = 1.3%, much lower than the limits derived from combinations of the Higgs boson measurements.
So as not to restrict the analysis sensitivity solely to models where the -particle is a pseudoscalar, upper limits obtained without employing the BDT discriminants are also derived as shown in Figure 11. In addition to being less sensitive to the particle's CP properties, the limits in SRincl without the BDT selection also facilitate reinterpretations of the analysis. These limits are derived in the same way as described above, i.e. by scanning the windows of SRincl, but omitting the final selection on the BDT discriminants. The expected limits obtained when employing the baseline analysis strategy are also shown in Figure 11 for comparison, illustrating the significant improvement in sensitivity to pseudoscalars when using the BDTs. The excess observed at = 52 GeV in the BDT analysis is not supported by the limits derived without the BDTs. Figure 12 shows the data and the estimated backgrounds in all final SR bins, without applying the BDT selection.    for ≤ 45 GeV ( > 45 GeV). Neighboring bins partially overlap, hence they are not statistically independent. The bottom panel shows the pull in each bin, defined as ( obs − pred )/ tot , where obs is the number of events in the data, pred is the number of fitted background events and tot is the total (systematic and statistical, added in quadrature) uncertainty in the fitted background yield. The discontinuity at = 45 GeV appears where the window size is changed. The histogram labeled as "Other" in the legend includes the contributions from the diboson, single-top-quark,¯+ and +jets backgrounds.

Conclusion
A search for light pseudoscalar particles (denoted by ) in the decays of the 125 GeV Higgs boson in the final state with two muons and two -tagged jets, → → , is presented. The analysis is performed using 139 fb −1 of √ = 13 TeV collision data recorded by the ATLAS detector at the LHC between 2015 and 2018. A narrow resonance is searched for in the dimuon invariant mass spectrum in the range 16 ≤ ≤ 62 GeV. BDT classifiers are trained to distinguish the → signal, where is a pseudoscalar, from the SM backgrounds. Additionally, the result without selection on the BDT discriminants is also provided to ensure sensitivity to models where the -particle is not necessarily a pseudoscalar, as well as to facilitate reinterpretations of the analysis. No significant excess of the data above the SM backgrounds is observed. In the BDT analysis, the lowest local 0 -value of 0.00054 is observed at = 52 GeV and corresponds to a local significance of 3.3 . The global significance of that excess is determined to be 1. (Taiwan), RAL (UK) and BNL (USA), the Tier-2 facilities worldwide and large non-WLCG resource providers. Major contributors of computing resources are listed in Ref. [114].
[52] ATLAS Collaboration, Performance of the ATLAS muon triggers in Run    The ATLAS Collaboration