Search for Higgs-like particles produced in association with bottom quarks in proton-antiproton collisions

We report on a search for a spin-zero non-standard-model particle in proton-antiproton collision data collected by the Collider Detector at Fermilab at a center-of-mass energy of 1.96 TeV. This particle, the $\phi$ boson, is expected to decay into a bottom-antibottom quark pair and to be produced in association with at least one bottom quark. The data sample consists of events with three jets identified as initiated by bottom quarks and corresponds to $5.4~\text{fb}^{-1}$ of integrated luminosity. In each event, the invariant mass of the two most energetic jets is studied by looking for deviations from the multijet background, which is modeled using data. No evidence is found for such a particle. Exclusion upper limits ranging from 20 to 2 pb are set on the product of the production cross section and branching fraction for a hypothetical $\phi$ boson with mass between 100 and 300 GeV/$c^2$. These are the most stringent constraints to date.

I. INTRODUCTION
The discovery of a Higgs boson [1,2] completes the standard model (SM), but does not exclude the existence of yet-unknown particles that could provide a direct indication of non-SM physics. Among the extensions of the SM that predict such particles is the minimal supersymmetric standard model (MSSM) [4], in which two scalar Higgs doublets exist, leading to five physical Higgs bosons, three of which are electrically neutral and collectively denoted as φ. The φ bosons would be produced predominantly in association with b quarks, and their decay into $b\bar{b}$ pairs is expected to have a branching fraction of about 90% in this model [5]. While the production cross section for SM Higgs bosons through vector-boson fusion in proton-antiproton ($p\bar{p}$) collisions at 1.96 TeV is 0.07 ± 0.01 pb [6], the cross section for the φb process is calculated to be O(1) pb [3]. In addition, neutral scalar particles with large couplings to b quarks are predicted as mediators in dark-matter models [7,8]. Even for resonances without enhanced couplings to b quarks, searches with b quarks in the final state are competitive, because distinctive final-state features allow the background to be reduced. The analysis described in this paper searches for massive particles decaying into $b\bar{b}$ pairs and produced in association with one or more b quarks. The signal is searched for in final states with at least three b quarks, where the requirement of a third b quark further suppresses the multijet background and thus increases the signal sensitivity. A fourth b quark is not required, because it often falls outside the available acceptance, so requiring it would lower the signal efficiency.
Searches for such a process have been performed by the CDF [9] and D0 [10] experiments at the Tevatron $p\bar{p}$ collider, as well as by the CMS experiment in pp collisions at the Large Hadron Collider (LHC) [11]. The combined CDF and D0 result showed an excess of events of more than two standard deviations (σ) over the SM background prediction, compatible with the signal of a φ boson with a mass of 100–150 GeV/$c^2$ [12]. The CMS collaboration has set exclusion limits on such particles as functions of the MSSM parameters. However, the higher LHC collision energy leads to a larger multijet production rate, so searches for particles with masses below 200 GeV/$c^2$ at the LHC are limited by the difficulty of selecting low-energy jets online. This analysis investigates the reported 2σ deviation using completely independent data with the same $p\bar{p}$ initial state in the low-mass range of 100 to 300 GeV/$c^2$.
The analysis presented in this paper is based on data from $p\bar{p}$ collisions at 1.96 TeV center-of-mass energy collected by the CDF II detector and corresponding to 5.4 fb$^{-1}$ of integrated luminosity. The sample consists of the data collected after spring 2008, when a dedicated online selection was implemented that requires at least one jet identified as initiated by a b quark (b-jet) through a secondary-vertex algorithm [13]. The offline analysis requires at least three b-jets. The relatively long lifetime of b hadrons provides a distinctive signature against backgrounds, strongly enhancing the sensitivity of the search.
The paper is organized as follows. In Sec. II, the CDF II detector and the online data selection system are briefly described, while the data selection and the signal simulation are outlined in Sec. III. Sec. IV presents the data-driven background model. In Sec. V, the fits to the data under the background-only hypothesis are described. The search for a massive particle is presented in Sec. VI, and systematic uncertainties are summarized in Sec. VII. The limits on the production cross section are discussed in Sec. VIII. Finally, the main conclusions are summarized in Sec. IX.

II. THE CDF II DETECTOR
The CDF II detector was an azimuthally and forward-backward symmetric apparatus located around one of the $p\bar{p}$ collision points at the Fermilab Tevatron collider. A detailed description of its design and performance is given in Refs. [14,15]. Cylindrical coordinates are used to describe the event kinematics, in which $\phi$ is the azimuthal angle, $\theta$ is the polar angle with respect to the proton beam, $r$ is the distance from the nominal beam line, and positive $z$ corresponds to the proton-beam direction, with the origin at the center of the detector. Pseudorapidity is defined as $\eta = -\ln[\tan(\theta/2)]$. The transverse momentum of a particle is defined as $p_T = p\sin\theta$ and the transverse energy as $E_T = E\sin\theta$.
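The coordinate definitions above can be checked numerically. The following minimal Python sketch evaluates η and the transverse projections from the polar angle; the function names are illustrative and are not part of any CDF software.

```python
import math

def pseudorapidity(theta: float) -> float:
    """eta = -ln(tan(theta / 2)) for a polar angle theta in (0, pi)."""
    return -math.log(math.tan(theta / 2.0))

def theta_from_eta(eta: float) -> float:
    """Inverse relation: theta = 2 arctan(exp(-eta))."""
    return 2.0 * math.atan(math.exp(-eta))

def transverse(value: float, theta: float) -> float:
    """Transverse projection: p_T = p sin(theta) or E_T = E sin(theta)."""
    return value * math.sin(theta)

# A jet at the edge of the |eta| < 1 offline acceptance
theta_edge = theta_from_eta(1.0)
et_edge = transverse(50.0, theta_edge)  # E_T of a 50 GeV jet at eta = 1
```

Since sin θ = 1/cosh η, a 50 GeV jet at |η| = 1 carries only about 32 GeV of transverse energy, which is why the $E_T$ thresholds are applied after the η requirement.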
A superconducting solenoidal magnet provided a magnetic field of 1.4 T oriented along the beam direction. Tracking devices placed inside the magnet measured charged-particle trajectories (tracks). In particular, precise track measurements near the interaction point were provided by silicon-strip tracking detectors [16] in the polar range |η| < 1.1. A 3.1 m long cylindrical drift chamber [17] provided full coverage over the range |η| < 1.
Particle energies were measured by calorimeters surrounding the solenoid and covering the region |η| < 3.6: segmented lead-scintillator electromagnetic [18] and iron-scintillator hadronic [19] modules. An online selection system (trigger) [20,21] reduced the rate of events to be permanently recorded from 1.7 MHz to 150 Hz. The trigger system had a three-level architecture. The first level (L1) was based on custom-designed hardware that used low-resolution muon, track, and calorimeter information to make a decision. Events accepted by L1 were analyzed by the level-2 (L2) system, a combination of hardware and commercial processors that performed a partial event reconstruction. The level-3 (L3) system consisted of a large array of processors in which the data were read out and accepted events were sent to mass storage.

III. DATA SELECTION AND SIGNAL DESCRIPTION
The data sample used in this measurement was collected with a dedicated trigger optimized for the selection of events with b-jets. The trigger achieved high signal purity by performing online b-jet tagging: the secondary vertex (SV), corresponding to the position where the b hadron decays, is inferred from clusters of tracks displaced from the primary $p\bar{p}$ interaction vertex.
At L1, at least two central (|η| < 1.5) calorimetric energy depositions (towers) with $E_T \geq 5$ GeV and two tracks with $p_T > 2$ GeV/$c$ were required. At L2, jets with $E_T > 15$ GeV and |η| < 1.0 were reconstructed using a fixed-cone algorithm with a radius parameter, R, of 0.7 [22]. At least two tracks with signed impact parameter $d_0 > 90$ µm matched to one of the jets had to be identified. The signed impact parameter is defined in terms of $R_b$ and $\phi_b$, the b-hadron decay length and azimuthal angle, respectively. At this stage, the b-hadron decay length in the transverse plane was required to be greater than 0.1 cm. At L3, the L2 requirements were reapplied using offline-quality variables. A more detailed description of the online selection algorithm is given in Ref. [13]. This trigger replaced the lower-purity trigger used in the previous CDF φb search [9] and was selective enough to remain active even at instantaneous luminosities of up to 3.0 × 10$^{32}$ cm$^{-2}$ s$^{-1}$.
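The defining equation for the signed impact parameter is not reproduced in this text. Purely as an illustration, the sketch below uses the straight-line approximation (track curvature in the magnetic field is ignored), in which a track starting at a decay vertex with transverse decay length $R_b$ and azimuth $\phi_b$, and moving in azimuthal direction $\phi$, passes the beam line at a distance $|R_b \sin(\phi_b - \phi)|$. The sign convention and the helper names are assumptions, not the CDF definition, and the matching of tracks to an L2 jet is omitted. The selection function applies the L2-style thresholds quoted above: a transverse decay length above 0.1 cm and at least two tracks with $d_0 > 90$ µm.

```python
import math
from typing import Sequence

D0_CUT_CM = 90e-4    # 90 micrometres expressed in cm
LXY_CUT_CM = 0.1     # minimum transverse b-hadron decay length in cm

def signed_impact_parameter(r_b: float, phi_b: float, phi_track: float) -> float:
    """Closest approach of a straight track to the beam line, in the units of r_b.

    Assumes the track starts at the decay vertex (r_b, phi_b) and ignores
    curvature in the magnetic field; the sign convention is hypothetical.
    """
    return r_b * math.sin(phi_b - phi_track)

def passes_l2_like_selection(r_b: float, phi_b: float,
                             track_phis: Sequence[float]) -> bool:
    """Hypothetical L2-style test: L_xy > 0.1 cm and >= 2 tracks with d0 > 90 um."""
    if r_b <= LXY_CUT_CM:
        return False
    n_displaced = sum(
        1 for phi in track_phis
        if signed_impact_parameter(r_b, phi_b, phi) > D0_CUT_CM
    )
    return n_displaced >= 2
```

For example, a b hadron that decays 3 mm from the beam line gives tracks with impact parameters of order 100 µm when they are emitted a few tens of milliradians away from the flight direction, which is the regime the 90 µm threshold targets.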
The offline selection requires at least three jets with $E_T > 22$ GeV and |η| < 1, with energies corrected for detector and physics effects, such as the presence of inactive material in the calorimeters and multiple $p\bar{p}$ interactions per beam crossing, according to standard CDF procedures [23]. Each of the three jets is required to be associated with a secondary vertex identified by the SECVTX b-tagging algorithm [15], which assigns to each such jet a positive or negative tag. If the secondary vertex is reconstructed inside the jet cone, the jet has a positive tag. If the secondary vertex is found on the opposite side of the primary vertex with respect to the jet direction, the jet has a negative tag. While most jets initiated by b quarks are positively tagged, negatively tagged jets are predominantly initiated by light-flavor quarks, in which a false secondary vertex is reconstructed from tracks in the tails of the resolution function.
The sample with three positively tagged jets constitutes the signal sample and is referred to as the triple-tagged sample. The sample in which two jets have a positive tag and the third jet has a negative tag is referred to as the control sample. A sample with at least three jets with $E_T > 22$ GeV and |η| < 1, but with only two positively tagged jets required, is used to model the backgrounds and is referred to as the double-tagged sample.
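The three sample definitions can be summarized as a small classification function. This is a sketch under stated assumptions: each jet is a hypothetical record holding its corrected $E_T$, η, and tag state (`'+'`, `'-'`, or `None`); only the three leading jets passing the kinematic requirements are inspected; and tag combinations not described in the text are left unclassified.

```python
from typing import NamedTuple, Optional, Sequence

class Jet(NamedTuple):
    et: float            # corrected transverse energy in GeV
    eta: float           # pseudorapidity
    tag: Optional[str]   # '+' positive SECVTX tag, '-' negative tag, None untagged

def classify_event(jets: Sequence[Jet]) -> Optional[str]:
    """Assign an event to 'triple', 'control', or 'double' (hypothetical helper)."""
    good = sorted((j for j in jets if j.et > 22.0 and abs(j.eta) < 1.0),
                  key=lambda j: j.et, reverse=True)
    if len(good) < 3:
        return None
    tags = [j.tag for j in good[:3]]   # assumption: the three leading jets
    n_pos, n_neg = tags.count('+'), tags.count('-')
    if n_pos == 3:
        return 'triple'    # signal sample
    if n_pos == 2 and n_neg == 1:
        return 'control'   # two positive tags and one negative tag
    if n_pos == 2 and n_neg == 0:
        return 'double'    # background-modeling sample
    return None
```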
The $p\bar{p} \to \varphi b + X$ signal is simulated using the Pythia 6.216 [24] Monte Carlo generator with the CTEQ5L [25] set of parton distribution functions (PDF), and passed through a detector and trigger simulation based on a GEANT3 [26] description. At tree level, the signal cross section is dominated by the $gg \to b\bar{b}H$ process, which is therefore used to simulate the signal final state. The SM Higgs boson, with modified mass and forced to decay into a $b\bar{b}$ pair, is used to model the narrow φ state. Samples are generated for a range of φ masses with a lower threshold of 15 GeV/$c$ on the b-quark $p_T$. These simulated signals are used to evaluate the acceptance and efficiency for reconstructing a φb signal as functions of the φ mass. The combined efficiency and acceptance of the event selection increases from 0.37% to 0.87% for φ masses from 100 to 250 GeV/$c^2$, respectively, and then decreases to 0.80% at 300 GeV/$c^2$. At high masses, the efficiency decreases because the b quarks produced in association are more likely to fall outside the acceptance.

IV. BACKGROUND DESCRIPTION
The dominant background is the multijet production of heavy-flavor quarks, which is conventionally categorized into the following processes: flavor creation, flavor excitation, and gluon splitting [27]. Events where two gluon-splitting processes occur, or a flavor excitation process is followed by a gluon-splitting process, can lead to final states with three or more heavy quarks.
The low-energy quantum chromodynamics (QCD) calculations needed for reliable rate predictions of these events are intractable, so direct theoretical predictions cannot be relied upon. Furthermore, the invariant mass of the two leading-$E_T$ jets, $m_{12}$, is affected by biases introduced by the trigger and the displaced-vertex tagging requirements, which would also need to be modeled. Therefore, a data-driven approach is chosen to model the various background components. Small (< 1%) contributions from Z bosons produced in association with b-jets followed by $Z \to b\bar{b}$ decay, and from $t\bar{t}$ pair production, are neglected.
The previous CDF measurement [9] showed that the triple-tagged sample contains predominantly two jets initiated by real b quarks. Furthermore, the contamination from light-quark-initiated jets in the double-tagged sample is negligible, as shown in Ref. [28], where the same online selection is used. Hence, the double-tagged sample is used to determine the normalized multijet-background distributions (templates) needed for the analysis of the triple-tagged sample. The events in the double-tagged sample, which contain an additional untagged jet, are separated into two categories, bbY and Ybb, where Y can be "B" for a bottom quark, "C" for a charm quark, or "Q" for a light quark or gluon. The upper-case letter Y represents the untagged jet, and its position in the label depends on the $E_T$ rank of that jet; no distinction is made between the two leading jets. Events in which the third leading jet and either one of the two leading jets are tagged are labeled Ybb, while bbY denotes events with an untagged third jet.
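The labeling rule can be written compactly. In this sketch, the function name and its argument conventions are hypothetical; the $E_T$ rank of the untagged jet decides whether its flavor letter is placed after (bbY) or before (Ybb) the two tagged jets, with no distinction between ranks 1 and 2.

```python
def background_label(untagged_rank: int, flavor: str) -> str:
    """Return the template label for a double-tagged event.

    untagged_rank: E_T rank (1, 2, or 3) of the untagged jet among the leading three.
    flavor: 'B', 'C', or 'Q' (b quark, c quark, light quark or gluon).
    """
    if flavor not in ('B', 'C', 'Q'):
        raise ValueError(f"unknown flavor {flavor!r}")
    if untagged_rank == 3:
        return 'bb' + flavor        # untagged third jet -> bbY
    if untagged_rank in (1, 2):
        return flavor + 'bb'        # one leading jet untagged -> Ybb
    raise ValueError("untagged_rank must be 1, 2, or 3")
```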
Six background templates, bbB, Bbb, Cbb, bbC, Qbb, and bbQ, are constructed by weighting the events by the probability that the untagged jet of a given $E_T$ would be identified as a b-jet by the SECVTX algorithm, assuming it was initiated by a b quark, a c quark, or a light quark, respectively. These probabilities, called tagging matrices, are constructed on a per-jet basis, under the assumption that they depend only on the jet kinematic properties and not on the event topology. They are derived from simulated $b\bar{b}$, $c\bar{c}$, and light-quark samples generated with the full CDF II detector simulation.
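The template construction amounts to filling a histogram of $m_{12}$ with per-event weights given by the tagging probability of the untagged jet. The sketch below assumes a hypothetical tagging matrix stored as an $E_T$-binned lookup table; the bin edges and probabilities are placeholders, not the CDF values. The filling uses the `weights` argument of `numpy.histogram`.

```python
import numpy as np

# Hypothetical tagging matrix: probability that an untagged jet of a given
# flavor and E_T would be SECVTX-tagged. Placeholder numbers only.
ET_EDGES = np.array([22.0, 40.0, 60.0, 100.0, 1000.0])
TAG_PROB = {
    'B': np.array([0.35, 0.40, 0.42, 0.40]),
    'C': np.array([0.08, 0.09, 0.10, 0.10]),
    'Q': np.array([0.005, 0.007, 0.010, 0.015]),
}

def tag_probability(et: np.ndarray, flavor: str) -> np.ndarray:
    """Look up the per-jet tag probability in the E_T bins of the matrix."""
    idx = np.clip(np.digitize(et, ET_EDGES) - 1, 0, len(ET_EDGES) - 2)
    return TAG_PROB[flavor][idx]

def build_template(m12: np.ndarray, et_untagged: np.ndarray, flavor: str,
                   m12_edges: np.ndarray) -> np.ndarray:
    """Tag-probability-weighted m12 histogram, normalized to unit area."""
    weights = tag_probability(et_untagged, flavor)
    hist, _ = np.histogram(m12, bins=m12_edges, weights=weights)
    return hist / hist.sum()
```

Running the same double-tagged events through `build_template` with `'B'`, `'C'`, and `'Q'` yields the three flavor hypotheses for a given jet ordering; before normalization, the sum of weights is the expected yield for that hypothesis.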
The simulated $b\bar{b}$ sample includes contributions from flavor creation, flavor excitation, and gluon splitting, while the $c\bar{c}$ sample is generated assuming only flavor creation. Differences in the response of the online and offline b-tagging algorithms between jets in experimental and simulated data are corrected using scale factors evaluated on a dedicated data sample [28]. The trigger scale factor is 0.68 ± 0.03 and the offline b-tagging scale factor is 0.86 ± 0.05. The b-tagging data-to-simulation scale factors are determined as functions of the jet $E_T$ and applied to each simulated jet.
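As a rough illustration of how the quoted scale factors enter, the sketch below weights a simulated event by one trigger scale factor and one offline b-tagging scale factor per tagged jet. Treating both as constants (rather than the $E_T$-dependent functions used in the analysis), applying the trigger factor once per event, and treating the per-jet uncertainties as fully correlated are simplifying assumptions.

```python
import math

SF_TRIGGER, SF_TRIGGER_ERR = 0.68, 0.03   # online (trigger) b-tagging scale factor
SF_TAG, SF_TAG_ERR = 0.86, 0.05           # offline SECVTX scale factor per jet

def event_weight(n_tagged_jets: int) -> tuple[float, float]:
    """Return (weight, absolute uncertainty) for a simulated event.

    Assumes constant scale factors, one trigger factor per event, one offline
    factor per tagged jet, and fully correlated per-jet uncertainties.
    """
    w = SF_TRIGGER * SF_TAG ** n_tagged_jets
    rel = math.hypot(SF_TRIGGER_ERR / SF_TRIGGER,
                     n_tagged_jets * SF_TAG_ERR / SF_TAG)
    return w, w * rel

weight, weight_err = event_weight(3)   # triple-tagged simulated event
```

Under these assumptions, a triple-tagged simulated event receives a weight of about 0.43 with an 18% relative uncertainty, dominated by the cubed offline factor.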
To further discriminate the jet-flavor composition of the triple-tagged sample, a second variable, $x_{\text{tags}}$, is introduced alongside $m_{12}$. The $x_{\text{tags}}$ variable is derived from $M_{\text{SV}}$, the invariant mass of all tracks associated with the reconstructed secondary vertex, each assumed to be a charged pion. The $M_{\text{SV}}$ distribution is sensitive to the flavor of the parton initiating the jet. For jets initiated by c quarks, the distribution peaks at lower values than for jets initiated by b quarks. For jets initiated by light quarks or gluons, denoted as q, a secondary vertex can only be reconstructed because of track mismeasurements, and the $M_{\text{SV}}$ distribution falls exponentially. Following Ref. [9], the $x_{\text{tags}}$ variable is defined as a combination of $M_{\text{SV},1} + M_{\text{SV},2}$ and $M_{\text{SV},3}$, where $M_{\text{SV},1,2,3}$ is the $M_{\text{SV}}$ of the first, second, and third leading jet, respectively. The $x_{\text{tags}}$ variable helps to discriminate backgrounds with high $M_{\text{SV}}$ from backgrounds with low $M_{\text{SV}}$. In particular, the $M_{\text{SV},1} + M_{\text{SV},2}$ distribution is sensitive to the Cbb and Qbb contributions, while the $M_{\text{SV},3}$ distribution statistically discriminates between the bbC and bbQ cases.
To build the $x_{\text{tags}}$ variable for the background templates, the events of the double-tagged sample are weighted according to the flavor hypothesis for the untagged jet, using the simulation-based tagging matrices. Because no SV is associated with the untagged jet in double-tagged events, the computation of $x_{\text{tags}}$ assigns to this jet all possible $M_{\text{SV}}$ values, each weighted by the tagging matrices, which are also parametrized as functions of $M_{\text{SV}}$. By construction, each event has multiple entries in the background template, all with the same value of $m_{12}$ and different values of $x_{\text{tags}}$. Since the number of events used to build the templates is two orders of magnitude larger than the yield of the analysis sample, the correlated fluctuations introduced in the $x_{\text{tags}}$ templates by this construction are neglected.
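The multiple-entry construction can be sketched as follows. Each double-tagged event is expanded into one entry per $M_{\text{SV}}$ bin of the untagged jet, weighted by a tagging-matrix probability that is also binned in $M_{\text{SV}}$. Because the exact $x_{\text{tags}}$ formula of Ref. [9] is not reproduced in this text, the sketch records the pair ($M_{\text{SV},1} + M_{\text{SV},2}$, $M_{\text{SV},3}$) from which $x_{\text{tags}}$ is built; the binning, the probability table, and the function name are hypothetical.

```python
import numpy as np

MSV_CENTERS = np.array([0.5, 1.5, 2.5, 3.5])   # hypothetical M_SV bin centers (GeV/c^2)
# Hypothetical P(tag, M_SV bin | flavor) for the untagged jet (placeholder values).
TAG_MATRIX_MSV = {
    'B': np.array([0.05, 0.12, 0.14, 0.09]),
    'C': np.array([0.04, 0.03, 0.01, 0.00]),
    'Q': np.array([0.006, 0.002, 0.001, 0.000]),
}

def expand_event(m12, msv_tagged, untagged_rank, flavor):
    """Return (m12, M_SV1 + M_SV2, M_SV3, weight) entries for one event.

    msv_tagged: M_SV values of the two tagged jets, in E_T order.
    untagged_rank: E_T rank (1, 2, or 3) of the untagged jet.
    """
    entries = []
    for msv, w in zip(MSV_CENTERS, TAG_MATRIX_MSV[flavor]):
        if w == 0.0:
            continue  # this M_SV value cannot occur for the assumed flavor
        if untagged_rank == 3:
            msv12, msv3 = msv_tagged[0] + msv_tagged[1], msv
        else:
            # The untagged jet is one of the two leading jets, so the later
            # tagged jet in E_T order is the third leading jet.
            msv12, msv3 = msv + msv_tagged[0], msv_tagged[1]
        entries.append((m12, float(msv12), float(msv3), float(w)))
    return entries
```

Every entry of an event shares the same $m_{12}$, which is the source of the correlated fluctuations mentioned above.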
The bbC and bbQ template distributions are too similar to be discriminated by the fit. Therefore, their average distribution, bbX, is used, reducing the number of background templates to five. The bbX double-tagged sample contains 1.3 × 10$^5$ events and the Ybb double-tagged sample contains 1.4 × 10$^5$ events.

V. RESULTS UNDER THE BACKGROUND-ONLY HYPOTHESIS
The two-dimensional distribution in the variables $m_{12}$ and $x_{\text{tags}}$ for the 5 616 triple-tagged events is fitted under the hypothesis that no signal is present. A binned maximum-likelihood fit is used, with the likelihood function constructed from a joint two-dimensional probability density function of $m_{12}$ and $x_{\text{tags}}$. The entries in each bin follow a Poisson distribution, $\mu_{ij}^{n_{ij}} e^{-\mu_{ij}}/n_{ij}!$, where $n_{ij}$ is the number of observed events in the $i$th bin of $m_{12}$ and the $j$th bin of $x_{\text{tags}}$, and the expected yield $\mu_{ij}$ is given by

$$\mu_{ij} = \sum_b N_b\, f_{b,ij}.$$

The index $b$ runs over the five background templates, bbB, Bbb, Cbb, Qbb, and bbX. The parameters $f_{b,ij}$ are the fractions contributed by each background component to bin $(i, j)$. The yield $N_b$ of each background component, normalized to the total number of events, is determined by the fit.
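The likelihood above can be evaluated as a negative log-likelihood summed over the two-dimensional bins. The sketch below uses numpy only; the function names and the toy template shapes in the check are illustrative assumptions, and the minimization over $N_b$ (which in practice would be done with a numerical minimizer) is omitted. It uses $\ln n! = \texttt{lgamma}(n + 1)$ and assumes $\mu_{ij} > 0$ in every bin.

```python
import math
import numpy as np

def expected_yield(yields, templates):
    """mu_ij = sum_b N_b f_{b,ij}; templates has shape (n_bkg, n_m12, n_xtags)."""
    return np.tensordot(np.asarray(yields, dtype=float), templates, axes=1)

def nll(observed, yields, templates):
    """-ln L for independent Poisson bins: sum(mu - n ln mu + ln n!).

    Assumes mu > 0 in every bin.
    """
    mu = expected_yield(yields, templates)
    n = np.asarray(observed, dtype=float)
    log_fact = np.vectorize(math.lgamma)(n + 1.0)
    return float(np.sum(mu - n * np.log(mu) + log_fact))
```

With two toy templates that each populate one row of a 2 × 2 grid, the negative log-likelihood is smallest when each $N_b$ equals the observed row total, as expected for a Poisson fit.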
The control sample, which consists of the 2 359 events with two positively and one negatively b-tagged jets, is used to validate the background templates for light-flavor quarks. This sample, which is expected to contain almost exclusively Qbb and bbQ events, is fitted using all the background templates. The fit returns contributions only from the Qbb and bbX templates. Table I summarizes the fit results and compares them with an estimate based on the double-tagged sample. Studies using simulated samples in Ref. [9], where the relevant analysis conditions mirror those of the present analysis, show that in events with at least two b-jets, about 2% of the third jets are from b quarks, about 4% from c quarks, and the remaining 94% from light quarks or gluons, independently of the jet-energy ordering. The expected number of events for each background category in the triple-tagged sample is then estimated by multiplying the number of double-tagged events by these fractions.
The expected numbers of Qbb and bbQ events of the bbX template are extracted from the results of the fit to the negatively tagged control sample. The results of the fit to the triple-tagged sample under the background-only hypothesis are consistent with the predictions, with the exception of the Cbb component, whose mass shape is too similar to the bbB and Bbb shapes to allow a significant separation by the fit. The large uncertainties in the Bbb and bbB fractions determined by the fit are due to their correlation of −0.97, which indicates that the fit is unable to distinguish between the two components. This correlation between background components is taken into account in the limit calculation described in Sec. VIII.
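The effect of the −0.97 correlation can be made concrete with standard error propagation: for two yields with uncertainties $\sigma_1$ and $\sigma_2$ and correlation $\rho$, $\mathrm{Var}(N_1 \pm N_2) = \sigma_1^2 + \sigma_2^2 \pm 2\rho\sigma_1\sigma_2$. The sketch below uses equal, arbitrary unit uncertainties (an assumption; the Table I values are not reproduced here) to show that the sum of the Bbb and bbB yields is far better constrained than their difference.

```python
import math

def sum_and_difference_errors(sigma1: float, sigma2: float, rho: float):
    """Uncertainties on N1 + N2 and N1 - N2 for correlated estimates."""
    var_sum = sigma1**2 + sigma2**2 + 2.0 * rho * sigma1 * sigma2
    var_diff = sigma1**2 + sigma2**2 - 2.0 * rho * sigma1 * sigma2
    return math.sqrt(var_sum), math.sqrt(var_diff)

# Unit uncertainties on each component, correlation as quoted for Bbb and bbB
err_sum, err_diff = sum_and_difference_errors(1.0, 1.0, -0.97)
```

The sum is known to about 0.24 of the individual uncertainty, while the difference is uncertain by about twice it: the fit constrains the total b-jet contribution well but not how it splits between the two orderings.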

VI. SEARCH FOR RESONANCES
A search for a Higgs-like particle φ is performed in the mass range 100–300 GeV/$c^2$ by fitting the $m_{12}$ and $x_{\text{tags}}$ distributions with the procedure described in the previous section, allowing for a signal component in the expected number of events in each bin,

$$\nu_{ij} = N_s\, f_{s,ij} + \sum_b N_b\, f_{b,ij},$$

where $N_s$ is the total number of signal events, $f_{s,ij}$ is the fraction of the signal template in each bin, and $N_b$ and $f_{b,ij}$ have the same meaning as in Sec. V. The signal templates are obtained from the simulated signal samples with the requirement that three jets be b-tagged. Figure 2 shows the distributions of the leading-dijet mass $m_{12}$ and the flavor separator $x_{\text{tags}}$, with the results of the fit overlaid for a φ test mass of 160 GeV/$c^2$. In this case, the fit returns 130 ± 70 signal candidates, with a fit quality of $\chi^2$/d.o.f. = 16/21. This would correspond to a cross section times branching fraction of about 7 pb for the signal model, assuming a branching fraction of 90% to $b\bar{b}$ pairs and a width of 36 GeV/$c^2$. Only statistical uncertainties are considered here.
Fits performed under various assumptions for the relative proportions of the Cbb, Bbb, and bbB components yield consistent signal estimates, confirming that the similarity between the background mass shapes prevents the fit from precisely distinguishing among the components but does not bias the signal estimate.

VII. SYSTEMATIC UNCERTAINTIES
Systematic uncertainties affect both the signal and the background description. Uncertainties that affect the number of events of each component are classified as 'rate' uncertainties, and those that affect the shapes of the $m_{12}$ and $x_{\text{tags}}$ distributions are labeled 'shape' uncertainties. Table II summarizes the systematic uncertainties considered.
The luminosity uncertainty follows Ref. [29]. The online and offline b-tagging systematic uncertainties are taken from Ref. [28]. The systematic uncertainty in the signal efficiency due to the CDF jet-energy correction is estimated by shifting the correction by 1σ of its total uncertainty [31], which modifies both the acceptance and the shape of the signal. The resulting change in acceptance ranges from 7% to 4% over the 100–300 GeV/$c^2$ mass range of the φ particle.
The simulated signal samples are generated using the CTEQ5L set of PDFs. The uncertainty due to this choice is evaluated by generating simulated samples with the CTEQ6L [30] set and taking the difference in acceptance as the uncertainty. The uncertainty due to the finite size of the background templates is taken into account by assuming Poisson fluctuations in each bin. The secondary-vertex mass of the SECVTX tags used to build the $x_{\text{tags}}$ variable is varied by ±3% around its nominal value, following Ref. [9].
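Two of these variations lend themselves to a short sketch: a Poisson-fluctuated copy of a background template, representing its finite statistics, and a ±3% scaling of the secondary-vertex mass. The function names, the use of numpy's `default_rng`, and fluctuating unweighted bin counts (the actual templates are weighted) are illustrative assumptions about how such variations might be implemented.

```python
import numpy as np

def fluctuate_template(raw_counts, rng):
    """Poisson-fluctuate unnormalized template bin contents and renormalize."""
    counts = np.asarray(raw_counts, dtype=float)
    varied = rng.poisson(counts).astype(float)
    total = varied.sum()
    if total == 0:
        raise ValueError("fluctuated template is empty")
    return varied / total

def shift_msv(msv_values, fraction):
    """Scale secondary-vertex masses by (1 + fraction), e.g. fraction = +/-0.03."""
    return np.asarray(msv_values, dtype=float) * (1.0 + fraction)

rng = np.random.default_rng(12345)
template_variant = fluctuate_template([400, 900, 1200, 600, 150], rng)
msv_up = shift_msv([1.2, 2.5, 3.1], +0.03)
msv_down = shift_msv([1.2, 2.5, 3.1], -0.03)
```

The shifted $M_{\text{SV}}$ values would then be propagated through the $x_{\text{tags}}$ construction to obtain the up and down shape variations.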

VIII. LIMIT ON THE PRODUCTION CROSS SECTION
The signal yield fitted in Sec. VI does not constitute clear evidence of a narrow state in the triple-tagged data set, whose composition is instead consistent with the sum of the SM background components. Exclusion upper limits at the 95% confidence level (CL) on the production cross section times branching fraction are set as functions of the particle mass using a modified frequentist CL$_S$ method [32]. The limit calculation is based on the MCLIMIT package [33]. Simulated experiments are generated from the background model, with the normalization taken from the third column of Table I, and from the signal templates for each φ mass. The individual background normalizations and the signal yields are varied in each simulated experiment according to the systematic uncertainties in Table II.
These simulated experiments are then fitted under the background-only and the background-plus-signal hypotheses, with the φ mass varying between 100 and 300 GeV/$c^2$. The test statistic used to calculate the limit is the difference in $\chi^2$ between the fits under the two hypotheses. The expected limit on the signal yield as a function of the φ mass is the median of the limits obtained from simulated experiments in which no signal is present. The same procedure is repeated on data to determine the observed limit. The number of signal events is then translated into the cross section times branching fraction, $\sigma(p\bar{p} \to \varphi b)\,\mathcal{B}(\varphi \to b\bar{b})$, using the signal acceptance, the signal efficiency, the integrated luminosity, and the data-to-simulation scale factors for the online and offline b-tagging algorithms.
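The CL$_S$ construction and the final conversion to σ·B can be sketched with toy test-statistic values. Here $q$ denotes the $\chi^2$ difference between the two fits, with the sign chosen so that larger $q$ is more signal-like (an assumed convention). CL$_{s+b}$ and CL$_b$ are the fractions of toys under each hypothesis with $q$ at or below the observed value, and a signal hypothesis is excluded at 95% CL when CL$_{s+b}$/CL$_b$ < 0.05. This is not the MCLIMIT implementation. The conversion assumes that acceptance, efficiency, and scale factors can be folded into multiplicative factors; the numbers used in the check are hypothetical, apart from 5.4 fb$^{-1}$ = 5400 pb$^{-1}$.

```python
import numpy as np

def cls_value(q_obs, q_toys_sb, q_toys_b):
    """CL_s = CL_{s+b} / CL_b, with larger q meaning more signal-like."""
    cl_sb = np.mean(np.asarray(q_toys_sb) <= q_obs)   # P(q <= q_obs | s+b)
    cl_b = np.mean(np.asarray(q_toys_b) <= q_obs)     # P(q <= q_obs | b only)
    if cl_b == 0:
        raise ValueError("CL_b is zero; add more background-only toys")
    return float(cl_sb / cl_b)

def is_excluded(q_obs, q_toys_sb, q_toys_b, cl=0.95):
    """True if the signal hypothesis is excluded at the given confidence level."""
    return cls_value(q_obs, q_toys_sb, q_toys_b) < 1.0 - cl

def sigma_times_br(n_signal, lumi_pb_inv, acc_times_eff, scale_factor):
    """Convert a signal yield into sigma * B in pb (hypothetical inputs)."""
    return n_signal / (lumi_pb_inv * acc_times_eff * scale_factor)
```

Dividing by CL$_b$ protects against excluding signals to which the search has no sensitivity when the data fluctuate below the background expectation.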
The observed 95% CL limit and the median expected limit under the background-only hypothesis are summarized in Table III and shown in Fig. 3, with bands containing 68.3% (1σ) and 95.5% (2σ) of the expected limits.
All observed limits are within the 1σ band of the expected limit, indicating the absence of any statistically significant excess of events.
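The expected limit and its bands are quantiles of the distribution of limits obtained in background-only pseudo-experiments. A minimal sketch with `numpy.percentile` follows; the toy limit values are randomly generated placeholders, and the percentile points are the central 68.3% and 95.5% intervals around the median.

```python
import numpy as np

def expected_limit_bands(toy_limits):
    """Median expected limit with central 68.3% (1 sigma) and 95.5% (2 sigma) bands."""
    limits = np.asarray(toy_limits, dtype=float)
    q = np.percentile(limits, [2.25, 15.85, 50.0, 84.15, 97.75])
    return {
        'median': q[2],
        '1sigma': (q[1], q[3]),
        '2sigma': (q[0], q[4]),
    }

rng = np.random.default_rng(7)
# Placeholder toy limits (arbitrary units), not the results of this analysis
bands = expected_limit_bands(rng.lognormal(mean=np.log(5.0), sigma=0.4, size=20000))
```

An observed limit inside `bands['1sigma']` corresponds to the situation described above, in which the data show no statistically significant excess.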

IX. CONCLUSION
A search for a Higgs-like particle with mass in the 100–300 GeV/$c^2$ range, decaying into a pair of b quarks and produced in association with at least one additional b quark in $p\bar{p}$ collisions, is reported.
No significant deviations from the SM background expectations are observed. The sensitivity of this analysis is twice that of the previous CDF result. In that analysis [9], the most significant excess of events with respect to the expected background was observed at $m_\varphi$ = 150 GeV/$c^2$ with a significance of 2.8σ. This excess, if interpreted as a narrow scalar particle, corresponded to a production cross section times branching fraction of about 15 pb. The result reported here excludes such a signal rate at the 95% confidence level.