Studies of inclusive four-jet production with two b-tagged jets in proton-proton collisions at 7 TeV

Measurements are presented of the cross section for the production of at least four jets, of which at least two originate from b quarks, in proton-proton collisions. Data collected with the CMS detector at the LHC at a center-of-mass energy of 7 TeV are used, corresponding to an integrated luminosity of 3 inverse picobarns. The cross section is measured as a function of the jet transverse momentum for pt>20 GeV, and of the jet pseudorapidity for abs(eta)<2.4 (b jets), 4.7 (untagged jets). The correlations in azimuthal angle and pt between the jets are also studied. The inclusive cross section is measured to be sigma(pp to 2 b + 2 j + X) = 69 +/- 3 (stat) +/- 24 (syst) nb. The eta and pt distributions of the four jets and the correlations between them are well reproduced by event generators that combine perturbative QCD calculations at next-to-leading-order accuracy with contributions from parton showers and multiparton interactions.


Introduction
• POWHEG 1.0 [36,37] matched to the PYTHIA 8 parton showers including a simulation of MPI. The POWHEG event generator uses NLO dijet matrix elements implemented via 2→2 and 2→3 diagrams. These matrix elements include only LO effects for the four-jet configuration of the present analysis. For the hard-scattering process, the HERAPDF1.5NLO [38] PDF set is used with a minimump T of 5 GeV. The b quarks are treated as massless in the matrix element calculation. The UE provided by PYTHIA 8 is simulated with the CUETS1 tune, which uses the HERAPDF1.5LO [38] PDF set and reproduces with very high precision UE and jet observables at various collision energies. Since the POWHEG predictions contain both real and virtual corrections for the dijet matrix elements, they are used as the reference baseline in the present analysis. Therefore, the full theoretical uncertainty is provided for the POWHEG simulation, while only the central predictions are provided for the other MC simulations.
• MADGRAPH 5.1.5 [39] interfaced with PYTHIA 8. The MADGRAPH predictions use a LO multijet matrix element with up to four final-state partons, calculated with the CTEQ6L1 PDF, and a simulation of the UE provided by PYTHIA 8 tune CUETM1 [21], which uses the NNPDF2.3LO PDF set [40,41]. The p T sum of the four partons, H T , is required to be H T > 50 GeV, and the b quarks are treated as massless. The matching scale between the matrix element calculations and the parton shower simulation is taken to be 10 GeV, within the k T -MLM scheme [42]. Underlying event data are well described by this combination of matrix elements plus parton showers with a proper UE tune [21].
The detector response is simulated in detail with the GEANT4 package [43]. All simulated samples are processed and reconstructed in the same manner as collision data. The multijet final state can be mimicked by various background sources, such as Drell-Yan and W boson production associated to jets, and top-antitop events. The size of these backgrounds is estimated with PYTHIA 8 and found to be negligible, with a cross section in the measured phase space less than 0.5% of that for pure QCD multijet events. Therefore, these background sources are neglected in the following.

Event selection
This analysis uses data from pp collisions at √ s = 7 TeV recorded with the CMS apparatus in 2010 corresponding to an integrated luminosity of 3 pb −1 . The data were collected at low luminosity (<0.2 × 10 33 cm −2 s −1 ), and consequently with low probability of multiple pp interactions in the same bunch crossing (pileup). These running conditions correspond to a fraction of the total integrated luminosity of 36 pb −1 collected in 2010. The mean number of interactions per bunch crossing is around 1.6 for this sample, which results in small pileup effects in the measured distributions. The MC samples are reweighted to the number of interactions in the data in order to match the multiplicity of reconstructed primary vertices.
For the present study, three HLT single-jet trigger sets are analyzed: one with jet p T threshold of 15 GeV is used for leading jets with 20 < p T < 50 GeV, a second with p T threshold of 30 GeV for leading jets with 50 < p T < 140 GeV, and a third with p T threshold of 50 GeV for leading jets with p T above 140 GeV. In the region 20 < p T < 80 GeV, the trigger efficiency is less than 100%, increasing from 45% for leading jets with p T ≈ 20 GeV. A correction is thus applied as a function of the leading jet p T and η. For leading jet p T > 80 GeV, the trigger is fully efficient. The choice of such regions is a compromise between statistics and reliability of the trigger efficiency correction.
The physics objects used in this analysis are particle flow (PF) jets [44]. The PF algorithm [45] combines information from all relevant CMS subdetectors to identify and reconstruct all particle candidates in the event, namely leptons, photons, charged and neutral hadrons. The energy of the muons is obtained from the corresponding track momentum. Charged hadrons are reconstructed from tracks in the tracker. The energy of the electrons is determined from a combination of the track momentum at the main interaction vertex, the corresponding ECAL cluster energy, and the energy sum of all bremsstrahlung photons attached to the track. Photons and neutral hadrons are reconstructed from energy clusters in the ECAL and HCAL, respectively; only clusters far away from the extrapolated position of any track are used. Jets are reconstructed from the four-momenta of the PF candidates with the anti-k T algorithm [46] with a distance parameter of 0.5. A tight quality selection [47] is applied to suppress unphysical jets, i.e., jets resulting from noise in the ECAL and/or HCAL. Each jet is required to contain at least two PF candidates, one of which has to be a charged hadron. The jet energy fraction carried by neutral hadrons, photons, muons, and electrons must be less than 90%. With these criteria, jets are selected with an efficiency greater than 99% and a misidentification rate (i.e. the probability of selecting fake jets, like e.g., originating from leptons or calorimeter noise) smaller than 0.5% for jet p T > 20 GeV. A jet p T correction is applied to both data and simulation to account for the nonlinear response of the calorimeters and other instrumental effects. These corrections are based on in situ measurements using dijet, γ+jet, and Z+jet data samples [48].
A primary vertex (PV) is identified by a collection of tracks measured in the tracker. If more than one PV is present, the vertex with the highest sum of the squared p T of the tracks associated to it is selected. The selected vertex is required to be reconstructed from at least five charged-particle tracks and must satisfy a set of quality requirements, including |z PV | < 24 cm and ρ PV < 2 cm, where z PV and ρ PV are the longitudinal and transverse distances of the PV from the nominal interaction point in the CMS detector.
The b jets are identified by using information on the secondary decay vertex of the b hadrons, the impact parameter significance, i.e., the three-dimensional impact parameter divided by its resolution, and the tracks and jet kinematics [49], through the so-called "combined secondary vertex" (CSV) discriminant. A loose selection [49] is used in the b tagging algorithm, which gives a b tagging efficiency on single jets larger than 75% for jet p T > 20 GeV, with a maximum of 85% at p T ≈ 150 GeV, as estimated by simulation studies with the PYTHIA 6 sample. The light-flavor (u, d, s quark or gluon) mistag probability is 20%, 10% and 15% for p T ≈ 20, 75 and 300 GeV, respectively, for |η| < 2, increasing to 35% for jets in the region 2.0 < |η| < 2.4. This loose selection provides a high-statistics sample, though with relatively few genuine b jets. After requiring the two b tags, the b jet purity, i.e. the percentage of selected events where both tagged jets originate from b quarks, is about 12% for this loose selection. The highest-p T (leading) b-tagged jet is a genuine b jet in 18% of the selected events, while the fraction of events where the second-highest-p T (subleading) b-tagged jet originates from a b quark is about 14%. There is a high degree of correlation between the purities of the leading and the subleading jets. From simulation studies, about 65% of the selected events with a true leading b jet also contain a true subleading b jet. The b jet purity of the medium selection for the b tagging algorithm [49] is 58% for the current analysis. Since the results obtained with the medium selection are consistent with those obtained with the loose selection within the systematic uncertainties, we use the latter results, which have higher statistical accuracy.
The correction for the events with four jets that pass the selection criteria but for which the two b-tagged jets are not genuine b jets is performed through the unfolding procedure employed to obtain stable-particle level distributions (Sec. 4). The amount of this type of background is estimated from the purity of the measured distributions. The measurement of the b jet purity is based on fits of the track counting high efficiency (TCHE) distributions [49] of each b-tagged jet with three different shape templates obtained from MC simulation, corresponding to the TCHE values for light-quark and gluon, charm, and bottom jet flavors. The TCHE discriminant corresponds to the second-highest impact parameter significance among all selected tracks belonging to the considered jet. The b jet purities measured in the data and those in the simulation differ by 2-7%. Scale factors (SF b-purity ), depending on jet p T and η, are applied to the simulation to correct for this difference. By applying SF b-purity to the simulated events, the b jet purity of the data sample passing the analysis criteria is consistent with that of the MC. Compatible results are obtained if the CSV discriminant of the b tagged jets is used in the fitting procedure, instead of TCHE distributions. The b jet purity of the selection is estimated in the data separately for leading and subleading b-tagged jets in different bins of p T and η.
Additional scale factors (SF b-tag ) are applied to the simulation in order to match the b tagging efficiencies measured in data [49]. They depend on the jet p T , η, and flavor, and range between 0.9 and 1.1.
A further reweighting as a function ofp T is applied to the LO generators used for data correction, in order to improve their description of the measured distributions.
Events with at least one PV and at least four jets with p T > 20 GeV are selected for the analysis: two of the four jets are the two b-tagged jets with highest p T within |η| < 2.4, while the other two are the remaining highest-p T jets selected within |η| < 4.7 without any b tagging requirement. If two or more b-tagged jets are present, the two with the highest p T are taken as the "b quark jet pair" (referred to as "bottom" hereafter). The "untagged jet pair" (referred to as "light" hereafter) is taken as the remaining two leading jets. The two different η ranges are chosen because the absence of the tracker in the forward region does not allow b jets to be identified for |η| > 2.4.
About 60 000 events are left in the data after the offline selection described above. In Fig. 1, the shapes of the p T and η distributions of the leading b-tagged and the leading untagged jet are compared to predictions of PYTHIA 6 and HERWIG++, before unfolding to the stable-particle level. These shapes are well described by both MCs in the central region and over the whole range of p T , while there are differences of up to 20-40% for the most forward pseudorapidities (|η| > 3).
Differential cross sections (referred to as "absolute cross sections" hereafter) as a function of p T and η of each of the four jets are measured in this analysis. In addition, differential distributions normalized to the total number of selected events (referred to as "normalized cross sections") are measured as a function of jet correlation variables very similar to those used in the four-jet analysis of Ref. [11]: • the difference in azimuthal angle (in the plane transverse to the beam axis, in radians) between the jets belonging to the light-jet pair:

Corrections and systematic uncertainties
Particle-level distributions are inferred from the reconstructed data by correcting for selection efficiencies and detector effects. The results are corrected to particle level by applying an iterative unfolding [50] as implemented in the ROOUNFOLD package [51]. Particles are considered stable if their mean path length cτ is greater than 10 mm. MC jets are identified as "b jets" at the particle level if a b quark is found within a cone of radius R = √ (∆η) 2 + (∆φ) 2 = 0.3 around the jet axis. The background consisting of events with four jets that pass the selection criteria but for which the b-tagged jets are not genuine b jets is corrected for with PYTHIA 6 tune Z2*, after applying the SF b-tag and SF b-purity scale factors. The correlation between events selected at the reconstructed and particle levels is then studied by constructing the response matrix. The response matrix quantifies the migration probability between the particle level and reconstructed quantities, as well as the overall reconstruction efficiency. It is obtained for each observable with the PYTHIA 6 tune Z2* sample. Diagonal terms in the response matrix correspond to particle-level quantities that are reconstructed in the same bin after detector simulation. Off-diagonal terms represent the probability of migration between bins at the particle level and bins at the reconstructed level. As an example, Fig. 2 shows the response matrices for the p T and the η of the leading b-tagged and the leading untagged jet. They exhibit a diagonal structure, with off-diagonal terms less than 30-40%. The bin widths are larger than the detector resolution at each bin.
The response matrix obtained with PYTHIA 6 is used for the data unfolding. As a cross check, a sample of events generated with HERWIG++ tune UE-EE-3 is unfolded with the PYTHIA 6 response matrix. All distributions agree with the generated ones within 9-20%. The iterative unfolding procedure is regularized by limiting the number of iterations to a certain value for each measured distribution. The optimal number of iterations is determined by minimizing the difference between the distributions measured in the data and the ones obtained by applying backwards the detector effects to the unfolded distributions. The number of iterations ranges between 2 and 4 depending on the observable. As expected, the statistical uncertainties of the unfolded distributions are larger than those of the reconstructed data. The unfolding to particle level includes corrections for jet resolution, flavor misidentification, and pileup effects. The results are presented in the kinematic region defined in Table 1. At least four jets p T > 20 GeV Two leading b jets |η| < 2.4 Two leading other jets |η| < 4.7 All significant sources of systematic uncertainties are investigated and the corresponding uncertainty is calculated for each distribution. The total uncertainty is obtained by summing up the individual contributions in quadrature. The following systematic effects are considered: Model dependence the response matrix obtained with PYTHIA 6 is used for the final correction, and the difference between this and that obtained with HERWIG++ is taken as a measure of the model dependence of the unfolding, resulting in an uncertainty ranging from 9% to 20%.
Jet energy scale (JES) the momentum of the jets is varied according to the uncertainty associated with the reconstructed p T [48]. The resulting uncertainty is of the order of 20-25% (5%) for the absolute (normalized) cross sections.  Jet energy resolution (JER) the JER differs for data and simulation by 6-19% [48] depending on the η range, and introduces a systematic uncertainty of 4-8% in all results.
Pileup reweighting the effect of the pileup reweighting procedure is evaluated and found to be negligible (< 0.1%).
B tagging scale factor (SF b-tag ) the values of the scale factors are varied by 10% for each jet flavor [49]. This variation results in an uncertainty of 15-18% for absolute cross sections and of 1-2% for the normalized ones.
B jet purity the b jet purity of the sample is evaluated by fitting separately the TCHE distribution of the leading and of the subleading b-tagged jet in bins of p T , η and ∆S. The difference between the unfolded results when using the SF b-purity obtained from the two fits is used as a systematic uncertainty, resulting in values of 10-12% for the absolute cross sections and 1-2% for the normalized distributions.
Trigger efficiency the trigger efficiency correction is varied within its uncertainty and the resulting corrections are applied to the data. These variations result in an uncertainty ranging from 1 to 6%.
Integrated luminosity the systematic uncertainty on the luminosity of the 2010 data, affecting the absolute cross sections, is 4% [52].
The dominant source of uncertainty is the JES, which is considered as correlated among the measured bins. The following aspects of the theoretical uncertainty affecting the POWHEG predictions are also evaluated: PDF uncertainty the choice of the PDF set influences the theoretical predictions. The uncertainty related to the PDF is determined by generating predictions with various PDF eigenvectors. As central PDF set, the HERAPDF1.5NLO together with the PYTHIA 6 tune CUETS1 is used.
Scale uncertainty the default renormalization and the factorization scales (µ R and µ F ) in the matrix element calculations are chosen to be equal to the leading jet p T value. The uncertainty related to the µ R and µ F choices is estimated by using POWHEG interfaced to the UE simulation provided by PYTHIA 8 tune CUETS1-HERAPDF. Six combinations of the (µ R /p T , µ F /p T ) scales: (0.5,0.5), (0.5,1), (1,0.5), (1,2), (2,1), and (2,2), are used. The scale uncertainties are evaluated by taking the envelope of the predictions obtained with the listed scale choices.
A summary of all the systematic effects is given in Table 2.

Results
The absolute differential cross sections are measured as a function of the jet p T and η, along with the normalized cross sections as a function of the jet correlation variables. In Table 3, the cross section is given, and compared to predictions from different event generators at the particle level. The POWHEG event generator interfaced with PYTHIA 8 tune CUETS1, referred to in the following as "POWHEG", reproduces the measured cross section best. However, if the MPI simulation is switched off, the same POWHEG predictions, referred to in the following as Table 2: Systematic and statistical uncertainties affecting the absolute and the normalized cross sections for each measured observable: each source of uncertainty is specified and the value is the average over all the bins of the observable. The 4% uncertainty from the integrated luminosity is included in the total uncertainty affecting the absolute cross sections. The total uncertainty is obtained by summing the individual experimental uncertainties quadratically. The theoretical uncertainties, listed in the last two columns, affect all the predictions. The systematic uncertainties in the normalized cross sections are smaller than those for the absolute cross sections, since, among others, they are not affected by the migration effects from outside the selected phase space. "POWHEG MPI-off", underestimate the value of the measured cross section. All predictions are consistent with the data within uncertainties, although MADGRAPH+PYTHIA 8 tune CUETM1 ("MADGRAPH" in the following) tends to underestimate the data, and PYTHIA 8 to overestimate them.  Fig. 3, the absolute differential cross sections as a function of the p T and η of the selected jets are shown compared to predictions from POWHEG. Figures 4 and 5 present the same differential cross sections as ratios of theoretical predictions from various MC event generators to the data. The POWHEG predictions reproduce very well the measurements as a function of p T and η of each jet, in both the central and forward regions. The other MC simulations also describe the data satisfactorily, although HERWIG++ tune UE-EE-5C and MADGRAPH are systematically lower than the data. Similar conclusions about HERWIG++ and MADGRAPH have been already drawn for inclusive [21] and exclusive four-jet [11] final states.   Figure 4: Ratios of the absolute cross section predictions of POWHEG, MADGRAPH, PYTHIA 6 (P6), PYTHIA 8 (P8), and HERWIG++ over data (unfolded to the particle level) as a function of the jet transverse momenta p T for each jet. The error bars on the data represent the total uncertainties, i.e., statistical and systematic added quadratically. Data are shown with markers at unity. The band represents the theoretical uncertainty due to the choice of the scales and PDFs (shown only around the POWHEG ratio for clarity, but affecting all predictions in the same way).  Figure 5: Ratios of the absolute cross section predictions of POWHEG, MADGRAPH, PYTHIA 6 (P6), PYTHIA 8 (P8), and HERWIG++ over data (unfolded to the particle level) as a function of the jet pseudorapidity η for each jet. The error bars on the data represent the total uncertainties, i.e., statistical and systematic added quadratically. Data are shown with markers at unity. The band represents the theoretical uncertainty due to the choice of the scales and PDFs (shown only around the POWHEG ratio for clarity, but affecting all predictions in the same way).
Figures 6-8 show the normalized differential cross sections as a function of the correlation observables, ∆φ light , ∆ rel light p T , and ∆S. The data are compared to the MC simulations considered previously. In addition, predictions from POWHEG MPI-off are also shown. All MC simulations that include MPI contributions describe the data well. This is remarkable given that the predictions are based on MPI models tuned to data at softer scales (p T ≈ 3-5 GeV). Conversely, POWHEG MPI-off is ruled out by the data, especially at low values of ∆ rel light p T (<0.1) and for values of ∆S smaller than 2. This is a clear indication for the need of MPI contributions. The discrepancy between the measurement and the POWHEG MPI-off predictions goes up to 60% in the low ∆S region, while for the four-jet events of Ref.
[11] the disagreement is of about 40%. This shows that heavy-flavor multijet production with common jet threshold is more sensitive to a DPS contribution than an untagged four-jet sample with asymmetric p T thresholds. The fact that the normalized distribution as a function of ∆φ light is also described reasonably well by POWHEG MPI-off reflects the limited DPS sensitivity of this observable, as already observed for exclusive four-jet final states [11].
In summary, predictions using LO or NLO dijet matrix elements matched to the simulation of MPI effects reproduce the measured normalized cross sections, whereas those without MPI fail to describe them. This study demonstrates the presence of DPS in the data and confirms the sensitivity to such contributions of the jet correlation variables ∆S and ∆ rel light p T .

Summary
A study of events with at least four jets, at least two of which are b jets, in proton-proton collisions at 7 TeV is presented. The data, corresponding to an integrated luminosity of 3 pb −1 , were collected with the CMS experiment in 2010. The two b jets must be within pseudorapidity |η| < 2.4, and the two other jets must be within |η| < 4.7. The transverse momenta of all the jets are required to be greater than 20 GeV. The cross section is measured to be σ(pp → 2b + 2j + X) = 69 ± 3 (stat) ± 24 (syst) nb. The differential cross sections as a function of the p T and η of each of the four jets are presented, along with the cross sections as a function of kinematic jet correlation variables. The results are compared to several theoretical predictions with and without contributions from double parton scattering. The models based on leading order or next-to-leading-order dijet matrix element calculations, matched to parton shower and including multiparton interaction (MPI) contributions, describe well the differential cross sections as a function of p T and η in the whole measured region. The differential cross sections as a function of the jet correlation variables are poorly reproduced by models that do not include contributions from MPI. Specifically, the predictions of POWHEG interfaced with PYTHIA 8 without the simulation of multiple parton interactions underestimate the cross sections as a function of ∆S and ∆ rel light p T in the regions of the phase space where a double parton scattering (DPS) signal is expected. These results demonstrate, for the first time, the sensitivity of kinematic jet correlation variables, such as ∆S and ∆ rel light p T , to DPS processes in multijet final states with heavy-quarks. (1/rad)

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: