A search for bottom-type, vector-like quark pair production in a fully hadronic ﬁnal state in proton-proton collisions at √ s = 13 TeV

A search is described for the production of a pair of bottom-type vector-like quarks (VLQs), each decaying into a b or b quark and either a Higgs or a Z boson, with a mass greater than 1000 GeV. The analysis is based on data from proton-proton collisions at a 13 TeV center-of-mass energy recorded at the CERN LHC, corresponding to a total integrated luminosity of 137 fb − 1 . As the predominant decay modes of the Higgs and Z bosons are to a pair of quarks, the analysis focuses on ﬁnal states consisting of jets resulting from the six quarks produced in the events. Since the two jets produced in the decay of a highly Lorentz-boosted Higgs or Z boson can merge to form a single jet, nine independent analyses are performed, categorized by the number of observed jets and the reconstructed event mode. No signal in excess of the expected background is observed. Lower limits are set on the VLQ mass at 95% conﬁdence level equal to 1570 GeV in the case where the VLQ decays exclusively to a b quark and a Higgs boson, 1390 GeV for when it decays exclusively to a b quark and a Z boson, and 1450 GeV for when it decays equally in these two modes. These limits represent signiﬁcant improvements over the previously published VLQ limits. in Physical Review doi:10.1103/PhysRevD.102.112004 .”


Introduction
One of the biggest puzzles in elementary particle physics concerns the large difference between the electroweak scale and the Planck scale, and the related problem of the unexpectedly low value of the Higgs boson mass [1]. In the standard model (SM), the Higgs boson H is assumed to be a fundamental scalar (spin-0) particle. Unlike the fundamental fermions (leptons and quarks) and the vector gauge bosons, the corrections to the Higgs boson mass due to vacuum energy fluctuations are quadratic, driving the Higgs boson mass to the cutoff value of the vacuum energy fluctuations. In the absence of any new physics below the Planck scale, this cutoff is about 10 19 GeV. In that case, the Higgs boson mass would naturally be expected to be seventeen orders of magnitude greater than its measured mass of 125 GeV.
Although supersymmetry provides an elegant solution to this problem [2,3], the lack of evidence for the production of supersymmetric particles at the CERN LHC indicates that, if supersymmetry is realized in nature, it is broken at an energy scale greater than a few TeV and, therefore, does not solve the fine tuning of the 125 GeV Higgs boson mass. Several alternative theories have been proposed for solving this fine tuning problem. These theories include composite Higgs models [4][5][6], in which the Higgs boson is not a fundamental particle, but rather contains constituents bound by a new type of gauge interaction, and little Higgs models [7,8], in which the Higgs boson is a pseudo-Nambu-Goldstone boson that arises from spontaneous breaking of a global symmetry at the TeV energy scale. Both of these types of models predict a new class of vector-like fermions [9] with the same charges as the SM fermions, but with purely vector current couplings to the weak gauge bosons. In composite Higgs models, the vector-like quarks (VLQs) are excited bound-state resonances, while in little Higgs models they are fundamental particles that cancel loop divergences.
Since the VLQs are nonchiral, Lagrangian mass terms not arising from Yukawa couplings to the Higgs field are allowed, thereby avoiding the constraints on heavy, sequential fourthgeneration quarks set by the measured cross section for Higgs boson production at the LHC [10,11]. Requiring VLQs to have renormalizable couplings to the SM quarks permits only four types of VLQs, defined by their charge q: q = −1/3 (B), q = +2/3 (T), q = −4/3 (X), and q = +5/3 (Y) [12]. These are arranged into seven multiplets: two singlets (T and B), three doublets (TB, XT, and YB), and two triplets (XTB and TBY) [13]. This analysis focuses on the q = −1/3 (B) type of VLQ.
The branching fractions B of the T and B are model specific and depend upon the VLQ multiplet configuration, the mass of the VLQ, and the coupling of the VLQ to chiral quarks [14]. In general, up-type quark mass eigenstates will be mixtures of the chiral up-type quarks with the T VLQ, while down-type quark mass eigenstates will be mixtures of the chiral down-type quarks and the B VLQ. Precision measurements of the couplings of the first and second generation SM quarks constrain their mixings with VLQs and indicate that the only sizable couplings of the T and B VLQs allowed are to SM quarks of the third generation, although couplings to other quarks are not excluded [13,15,16]. In this analysis, we assume the B VLQ has three decay modes: B → bZ, B → bH, and B → tW. In most models, for a B VLQ mass greater than the current limit of approximately 1000 GeV, there is a small difference between B(B → bZ) and B(B → bH), depending on the VLQ mass, but the difference is essentially zero for masses greater than 2000 GeV. The expected values of B(B → bZ) and B(B → bH) also depend upon the multiplet configuration. They are 50% for both the XTB triplet and the BY doublet, and 25% for the TBY triplet and the B singlet. The branching fractions for the TB doublet depend upon the mixing of the T and B VLQs with chiral quarks. If the Tt mixing is zero, the B → bH and B → bZ branching fractions are 50%. If the Tt and Bb mixing are equal, these branching fractions are 25%. If the Bb mixing is zero, these branching fractions are zero [12].
Results from both the ATLAS and CMS Collaborations using events with fully hadronic final states, based on data from proton-proton (pp) collisions at √ s = 13 TeV with an integrated luminosity of 36 fb −1 , have excluded a pair-produced B with mass up to approximately 1100 GeV. The ATLAS analysis [17] was based on a classification of event signatures using neural networks, while the CMS analysis [18] used event shapes to identify Lorentz-boosted objects. The ATLAS (CMS) results exclude masses, at 95% confidence level (CL), up to 1010 (980), 710 (1070), and 950 (1025) GeV for the 100% B → bH, 100% B → bZ, and the BY doublet cases, respectively. In addition, an ATLAS analysis [19] combining both fully hadronic and leptonic channels excludes values of the B mass up to 1140 GeV for the BY doublet case. The analysis presented here improves on these results by using the full 137 fb −1 data set collected by CMS in 2016-2018, and by fully reconstructing the event kinematics, thereby allowing the mass of the B to be reconstructed.

Analysis overview
This analysis involves a search for the production of a pair of bottom-type VLQs with mass greater than 1000 GeV, using data from pp collisions at √ s = 13 TeV at the LHC collected by the CMS detector during 2016-2018, corresponding to an integrated luminosity of 137 fb −1 . The analysis is focused on events in which each of the VLQs decays to a b or b quark and to either a Higgs or Z boson. Since the dominant decay modes of the Higgs and Z bosons are to a quark and antiquark pair, we select final states consisting of jets resulting from the quarks and antiquarks produced in the decays of the VLQs and subsequent decays of the two bosons. The events are categorized into three modes, depending on the daughter bosons: bHbH, bHbZ, and bZbZ. Figure 1 shows the dominant Feynman diagrams for these three modes.
Background from SM processes (predominantly "multijet" events, consisting solely of jets produced through the strong interaction) is reduced by requiring that the jets are consistent with the production of a pair of bosons (either Higgs or Z), that the reconstructed VLQs have equal masses, and that some of the jets are tagged as originating from b quarks. For a highly boosted Higgs or Z boson, the two jets resulting from its daughter quarks might merge into a single reconstructed jet. In order to include these events, three orthogonal, fully independent analyses are carried out using exclusive sets of events categorized by the observed jet multiplicity: 4, 5, or 6 jets. The final result is obtained by combining these three independent analyses. In this paper, we use "jet tagging requirements" to refer to both single jets tagged as being from a b quark, and merged jets tagged as containing a bb pair.
To select the correct assignment of reconstructed jets to parent particles, a modified χ 2 metric, χ 2 mod , is used. The χ 2 mod value is determined by the differences between the masses of the two reconstructed bosons and the mass of the Higgs or Z boson, normalized by their resolutions, and by the reconstructed fractional mass difference of the two VLQs. The event mode is assigned as bHbH, bHbZ, or bZbZ, depending on which gives the smallest value of χ 2 mod . An upper cutoff on the value of χ 2 mod is applied to remove background. The expected background is first determined by fitting the distribution of the number of events as a function of the reconstructed VLQ mass, before jet tagging requirements are applied, so this sample is overwhelmingly background dominated. The fraction of background expected to remain after jet tagging requirements are applied, called the background jet-tagged fraction, is measured using events with VLQ candidate masses in the range 500-800 GeV, in which a VLQ signal has already been excluded, and then corrected for a possible dependence on the VLQ mass by using a control region with a higher χ 2 mod value. Both the χ 2 mod selection and jet tagging requirements are simultaneously optimized for maximal sensitivity to a potential signal. This optimization is done separately for each event mode and jet multiplicity. For the final result, all event mode and jet multiplicity analyses are combined using the procedure in Ref. [20] to obtain VLQ mass limits as a function of B(B → bH) and B(B → bZ), as described further in Section 10.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid.
The electromagnetic calorimeter consists of 75 848 lead tungstate crystals, which provide coverage in |η| < 1.48 in a barrel region and 1.48 < |η| < 3.0 in two endcap regions. In the region |η| < 1.74, the HCAL cells have widths of 0.087 in pseudorapidity and 0.087 in azimuth (φ). In the η-φ plane, and for |η| < 1.48, the HCAL cells map on to 5×5 arrays of ECAL crystals to form calorimeter towers projecting radially outwards from close to the nominal interaction point. For |η| > 1.74, the coverage of the towers increases progressively to a maximum of 0.174 in ∆η and ∆φ. Within each tower, the energy deposits in ECAL and HCAL cells are summed to define the calorimeter tower energies, which are subsequently used to provide the energies and directions of hadronic jets. When combining information from the entire detector, the jet energy resolution amounts typically to 15-20% at 30 GeV, 10% at 100 GeV, and 5% at 1 TeV [21].
Events of interest are selected using a two-tiered trigger system [22]. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a fixed time interval of about 4 µs. The second level, known as the high-level trigger (HLT), consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage.
A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [23].

Data and simulated events
The data used in this analysis were collected during the 2016-2018 LHC running periods and correspond to an integrated luminosity of 137 fb −1 [24][25][26].
Signal events with pair production of VLQs were simulated using the Monte Carlo generator MADGRAPH5 aMC@NLO [27], version v2.3.3 (v2.4.2) for samples corresponding to 2016 (2017-2018) data, at leading order with the NNPDF3.0 parton distribution functions (PDFs) [28]. The generated VLQ masses m B cover the range 1000-1800 GeV in steps of 100 GeV. Hadronization of the underlying partons was simulated using PYTHIA v8.212 [29] with the CUETP8M1 tune [30] for samples corresponding to 2016 data, and with the CP5 tune [31] for samples corresponding to 2017 and 2018 data. Corrections of the cross sections to next-to-next-to-leading order and next-to-next-to-leading logarithmic soft-gluon resummation were obtained using TOP++ 2.0 [32] with the MSTW2008NNLO68CL parton distribution set from the LHAPDF 5.9.0 library [33,34]. To simulate the effect of additional pp interactions within the same or nearby bunch crossings ("pileup"), PYTHIA v8.226 with a total inelastic pp cross section of 69.2 mb [35] was used. Following event generation, the GEANT4 package [36,37] was used to simulate the CMS detector response. Scale factors corresponding to jet energy corrections, jet energy resolutions [21], pileup, and jet tagging [38,39] are applied to the simulated signal events so that the corresponding distributions agree with those in data.

Jet reconstruction and tagging
The global event reconstruction, also called the particle-flow event reconstruction [40], aims to reconstruct and identify each individual particle in an event, with an optimized combination of all subdetector information. In this process, the identification of the particle type (photon, electron, muon, charged hadron, or neutral hadron) plays an important role in the determination of the particle direction and energy. First, photons, electrons, and muons are identified using ECAL energy clusters, tracks in the tracker, and hits in the muon system. Then, charged hadrons are identified as charged particle tracks neither identified as electrons, nor as muons. Finally, neutral hadrons are identified as HCAL energy clusters not linked to any charged hadron trajectory, or as a combined ECAL and HCAL energy excess with respect to the expected charged hadron energy deposit. The energy of charged hadrons is determined from a combination of the track momentum and the corresponding ECAL and HCAL energies, corrected for the response function of the calorimeters to hadronic showers, and the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energies.
For this analysis, two types of hadronic jets are clustered from these reconstructed particles, using the infrared and collinear safe anti-k T algorithm [41,42]. The first type, "AK4 jets", uses a distance parameter ∆R = √ (∆η) 2 + (∆φ) 2 of 0.4. However, since merged jets from a boosted Higgs or Z boson decay may be wider, a second set, using a distance parameter of 0.8 ("AK8 jets") is also reconstructed from the same set of input particles. Jet momentum is determined as the vectorial sum of all particle momenta in the jet, and is found from simulation to be, on average, within 5 to 10% of the true momentum over the whole transverse momentum (p T ) spectrum and detector acceptance. Pileup can contribute additional tracks and calorimetric energy depositions to the jet momentum. The pileup per particle identification algorithm (PUPPI) [43] is used to mitigate the effect of pileup at the reconstructed particle level, making use of local shape information, event pileup properties, and tracking information. A local shape variable is defined, which distinguishes between collinear and soft diffuse distributions of other particles surrounding the particle under consideration. The former is attributed to particles originating from the hard scatter and the latter to particles originating from pileup interactions. Charged particles identified as originating from pileup vertices are discarded. For each neutral particle, a local shape variable is computed using the surrounding charged particles compatible with the primary vertex within the tracker acceptance (|η| < 2.5), and using both charged and neutral particles in the region outside of the tracker coverage. The momenta of the neutral particles are then rescaled according to their probability to originate from the primary interaction vertex deduced from the local shape variable, superseding the need for jet-based pileup corrections [44]. Jet energy corrections are derived from simulation to bring the measured response of jets to that of particle level jets on average. In situ measurements of the momentum balance in dijet, photon + jet, Z + jet, and multijet events are used to account for any residual differences in the jet energy scale between data and simulation [21]. Additional selection criteria are applied to each jet to remove jets potentially dominated by anomalous contributions from various subdetector components or reconstruction failures. This analysis only uses AK4 jets with p T > 50 GeV and AK8 jets with p T > 200 GeV, both within |η| < 2.4. For AK8 jets, the constituents are reclustered using the Cambridge-Aachen algorithm [45,46]. The "modified mass drop tagger" algorithm [47,48], also known as the "soft drop" algorithm, with angular exponent β = 0, soft cutoff threshold z cut < 0.1, and characteristic radius R 0 = 0.8 [49], is applied to remove soft, wide-angle radiation from the jet. This results in a jet mass that, in the case of large mass, more accurately corresponds to the mass of the mother particle from which the jet originated. For events with boosted Higgs and Z bosons, the AK8 soft-drop mass is used to obtain the mass of the merged jet.
The event jet multiplicity is determined by the number of AK4 jets passing the requirements above. In signal events, decays of the VLQ pair and subsequent decays of the Higgs and Z boson daughters yield a total of six quarks. In all three event modes, at least two of these are b quarks. Considering the predominant H → bb decay, for the bHbH event mode, all six are b quarks; for the bHbZ event mode, four or six are b quarks; while for the bZbZ event mode, two, four or six are b quarks. We note that of the hadronic Z boson decays, 15% are to a bb pair; however, Z → cc decays also have a significant chance of passing the tagging requirements, as discussed in Section 6, and are also included (along with other possible Z boson decay modes) in the signal efficiency. If all six quark jets are individually reconstructed, a 6-jet event is produced; if two jets merge into a single reconstructed jet, this produces a 5jet event; and if two merged jets are produced, then a 4-jet event results. Note that the VLQ reconstruction does not consider the possibility of additional jets produced by initial state or final state radiation.
Because of the large number of b jets in signal events, b tagging is a powerful tool to signifi-cantly reduce the background from SM processes. Individual jets are tagged using the DeepJet b discriminant [38] applied to AK4 jets, while merged jets from bb pairs are double b tagged using the algorithm in Ref. [39], developed in the context of H → bb searches, applied to AK8 jets.

Event selection
The events used in the analysis are first selected online by the CMS trigger system. The HLT trigger used requires the total p T measured in the calorimeters to be at least 900 (1050) GeV for the 2016 (2017-2018) data set. Offline, events with H T > 1350 GeV are selected, where H T is defined as the scalar sum of the jet p T for all AK4 jets with p T > 50 GeV and |η| < 2.4. The requirement is set higher than the trigger threshold to avoid effects due to trigger turn on. In order to minimize bias when measuring the efficiency of the HLT triggers, the efficiencies are measured in a data set collected by an orthogonal trigger, which requires the event to have a single muon. For all years, the measured trigger efficiency for events with H T > 1350 GeV is at least 99.6%. Table 1 shows the efficiency for simulated VLQ signal events after the H T requirement for each of the three jet multiplicity channels and for each of three VLQ masses (1000, 1200, and 1400 GeV). The number of tagged jets required to select an event, as well as the working points for the taggers used, are optimized separately for each of the three jet multiplicities and event modes, in order to maximize the expected signal sensitivity. For the working points selected, the single b tagger has an efficiency of 82% for b jets in simulated tt events with p T > 30 GeV and a mistag rate of 1% for light quarks (u, d, or s) and approximately 17% for charm quarks [38]. The double b tagger has an efficiency of 75% in simulated H → bb events, and a mistag rate of 10% in simulated inclusive multijet events and 33% in simulated H → cc events [39], where a mistag in the double b tag case means that at least one non-b quark subjet is present in the tagged jet. The number of tags required depends on the jet multiplicity as follows: in the 6-jet case, four AK4 jets are required to have a b tag, except in the bZbZ event mode, for which three tags are required. In the 5-jet case, three of the AK4 jets not associated with the merged decay products are required to be b tagged; no double b tag requirement is applied to the AK8 jet associated with the merged decay. In the 4-jet case, two of the nonmerged AK4 jets are required to have a b tag, and one of the merged jets is required to have a double b tag, except in the bZbZ event mode, for which no double b tag is required. These requirements are summarized in Table 2.

Event reconstruction
In the case when the two jets produced from a H/Z boson decay are individually resolved, the mass of the parent boson can be estimated from the invariant mass of the two jets. In the case where the two jets are merged, the parent boson mass is instead estimated using the soft-drop mass of the AK8 jet. Only those AK8 jets that are within ∆R < 0.3 of an AK4 jet are used. However, if a second AK4 jet is within ∆R < 0.6 of the AK8 jet, this overlap could cause the AK8 jet mass to be misreconstructed, so in this case the AK8 jet is discarded and the two AK4 jets are treated as a resolved dijet boson candidate.
A central feature in the analysis is the selection of the correct way of combining jets in order to reconstruct the parent particles; this is a difficult task because of the large number of jets in the event. In 6-jet events, there are two pairs of jets originating from H/Z boson decay; and three jets (including two from the H/Z decay) associated with each VLQ decay. In 5-jet events, there is a pair of jets associated with one H/Z boson and a merged jet associated with the other H/Z boson; each of these is associated with one of the remaining two jets to form a VLQ candidate. In 4-jet events, there is a merged jet associated with each H/Z boson, each of which is paired to one of the remaining two jets to form a VLQ candidate. The final reconstructed VLQ mass, m VLQ , is defined as the average mass of the two individual reconstructed VLQs in the event.
The number of possible ways to combine the jets to reconstruct the two H/Z bosons and then to combine these with the two remaining jets to form the VLQ candidates is 720, 120, and 24 for the 6-, 5-, and 4-jet multiplicities, respectively. However, many of these different combinations simply involve different permutations among the jets that constitute the individual VLQs, and these permutations do not affect the reconstructed VLQ mass. The numbers of combinations that give distinct VLQ masses are only 10, 10, and 3 for the 6-, 5-, and 4-jet multiplicities, respectively. For the 5-and 4-jet multiplicities, however, the jet associated with the merged H/Z boson decay products distinguishes different combinations, since this jet is treated differently from the others by having a double b tag, rather than a single b tagging requirement. This doubles the number of distinct 5-jet combinations and quadruples the number of distinct 4-jet combinations. The final number of distinct combinations is then 10, 20, and 12 for the 6-, 5-, and 4-jet multiplicities, respectively.
For each jet combination, χ 2 mod is determined using Eqs. 1-3 below, which depend on the measured mass of a dijet Higgs or Z boson candidate (m dijet ), the measured soft-drop mass of a merged-jet Higgs or Z boson candidate (m merged ), and the fractional mass difference of the two VLQ candidates (∆m VLQ ), where ∆m VLQ is the difference of the masses of the two VLQ candidates divided by the average mass of the two. The only use of AK8 jets in this calculation is to determine the mass of the merged H/Z candidates using the soft-drop mass of the matched AK8 jet. All other quantities are determined using AK4 jet kinematics.
6-jet events: 5-jet events: 4-jet events: The means (m and ∆m VLQ ) and standard deviations (σ m and σ ∆m VLQ ) of the parameters used in these expressions are determined from simulated signal events in which the jets are matched to the generator-level quarks and H/Z bosons. These quantities are derived separately for each jet multiplicity, but do not depend on the simulated signal mass. For each parameter, the central core of the distribution is fit with a Gaussian function, whose mean and standard deviation are then used as the parameters in the expressions for χ 2 mod . As the distribution of the merged Higgs boson mass is asymmetrical, two Gaussian functions are separately fit above and below the peak of the distribution. Since the underlying distributions used in these expressions have non-Gaussian tails and are in some cases asymmetric, the values of χ 2 mod are not exactly distributed as a χ 2 variable. However, the difference is small, and χ 2 mod is only used to select events, so these deviations do not affect the analysis. Choosing the jet combination that has the lowest value of χ 2 mod gives a high probability of identifying the correct jet combination, and allows m VLQ to be reconstructed. In simulation, this can then be compared with the generated B mass m B . This is indicated in Fig. 2, which shows the average value of the reconstructed VLQ mass for the jet combination with the lowest χ 2 mod for simulated signal events with m B = 1200 GeV. In most cases, the VLQ mass is correctly reconstructed. We observe that the reconstructed mass distribution is consistently peaked at a value about 5% lower than the generated mass, for all generated signal mass values. The low-side tail is due to the presence of incorrectly reconstructed events. Figure 3 shows the distributions of χ 2 mod /ndf, where ndf is the number of degrees of freedom, for the best jet combination (i.e., the combination with the lowest χ 2 mod ), from simulated 1200 GeV VLQ signal events, at each jet multiplicity. Each χ 2 mod expression has three degrees of freedom, one for each term. The distributions for the data are also shown for comparison. In these plots, the simulated signal and data distributions are normalized to the same integral value within the displayed χ 2 mod range. This figure demonstrates that requiring a small χ 2 mod value for the best jet combination provides an effective method for removing background.
The χ 2 mod value is also used to select the event mode. There are three possible decays of the bottom-type VLQ: B → bH, B → bZ, and B → tW. This results in six possible modes for the BB pair production events: bHbH, bHbZ, bZbZ, tWbH, tWbZ, and tWtW. The latter three modes involve one or two decays of a VLQ to a t quark and a W boson. These events either have a jet multiplicity greater than six, or contain leptons and missing transverse energy from the W decays. Although this analysis is not optimized for sensitivity to these events, events with B → tW, if present, have some probability to be selected as one of the other three event modes and can affect the sensitivity of the analysis. These events are included in the signal simulation and are added according to their reconstruction efficiency. For events that satisfy the H T requirement and that are categorized as either 4-, 5-, or 6-jet multiplicity events, the χ 2 mod described above is calculated for each of the three event modes: bHbH, bHbZ, or bZbZ, and  Figure 3: Distribution of χ 2 mod /ndf for the best jet combination for simulated 1200 GeV VLQ events (red histogram) and data (black points), for 4-jet (left), 5-jet (center), and 6-jet events (right). The simulated signal events and data events are normalized to the same integral value within the displayed χ 2 mod range.
the mode of the event is selected as the one that has the best χ 2 mod value. Events are categorized by their jet multiplicity and their reconstructed mode, regardless of the underlying decay mode for simulated signal events.
Before examining the data in the potential signal region, the event selection parameters (the jet tagging parameters and χ 2 mod ) are optimized. This optimization is performed by varying the parameters and selecting the values that maximize the sensitivity to a 1600 GeV VLQ signal. The mass of 1600 GeV is chosen because it is the point with maximum sensitivity of the analysis; however, the optimized parameters are largely independent of the point chosen. The optimized jet tagging parameters are described in Section 6, and the optimized χ 2 mod /ndf values are shown below in Table 3. With the optimized selection, the overall signal efficiency measured in simulation for a generated VLQ mass of 1600 GeV is approximately 5% in the B(B → bZ) = 100% scenario, increasing to 10% for B(B → bH) = 100%.

Background estimation
The expected background is estimated in a low-mass sideband region in data, using the ratio of the number of events passing the tag requirements to the number before these requirements are applied. The estimation is done separately for each of the three event modes, jet multiplicities, and three data-taking years, for a total of 27 cases. Figure 4 shows the distribution of m VLQ for the jet combination with the lowest χ 2 mod for each of the three jet multiplicities. All events shown in this plot are required to pass a selection of χ 2 mod /ndf < 4. The falloff in the distribution at lower masses is due to the H T > 1350 GeV requirement. The distributions are then fit with an exponential function for VLQ candidate masses greater than 1000 GeV; in all three cases, the function (shown by the red line) agrees with the data. An F-test [50] shows that a more complex model, namely an exponential plus constant background, offers no significant improvement over the exponential distribution. The lower plots show the fractional difference between the data and the fit. At this stage, since there is no requirement made on jet tagging, the ratio of background to signal event acceptance is more than two orders of magnitude larger than after jet tagging, so the fits are insensitive to any possible signal events in the data.
The background jet-tagged fraction (BJTF) is the fraction of background events that remain after the jet tagging requirements, as described in Section 5, are applied. Since the BJTF for events with m VLQ > 1000 GeV could be biased due to signal events that might be in the data, the BJTF is initially determined only for events in which m VLQ is between 500 and 800 GeV, which is below the current lower exclusion limit on the VLQ mass [17,18]. Table 4 shows the BJTF for data events with m VLQ in the range 500-800 GeV for each of the three event modes and three jet multiplicities. Table 4: Values of the BJTF for data events with m VLQ in the range 500-800 GeV for each of the three event modes and three jet multiplicities. bHbH bHbZ bZbZ 4 jets 0.0042 ± 0.0014 0.0019 ± 0.0004 0.0025 ± 0.0004 5 jets 0.0041 ± 0.0003 0.0036 ± 0.0002 0.0048 ± 0.0009 6 jets 0.0019 ± 0.0002 0.0019 ± 0.0002 0.0020 ± 0.0005 Because the jet tagging efficiency depends on the p T of the jet, the BJTF might depend on the mass of the VLQ candidate, since events with greater VLQ mass generally have higher p T jets. A control region is therefore used to determine the VLQ mass dependence of the BJTF by offsetting the window of the χ 2 mod selection. The signal χ 2 mod region depends on the event mode and multiplicity, as determined by the optimization procedure described in Section 7, but in all cases is at most χ 2 mod /ndf < 5.5. A control region is defined by using a region of 12 < χ 2 mod /ndf < 48. Figure 5 shows the mass dependence of the BJTF for data events in the χ 2 mod control region. A first-order polynomial fit is used to determine the BJTF mass dependence. Another F-test shows that there is no improvement for a second-order polynomial fit compared to a first-order one, and also that the first-order polynomial fit performs better than a constant fit. A systematic uncertainty is assigned by comparing the first-order polynomial fit to an exponential fit, and using the average difference of these two fits over the mass range as the uncertainty. This covers the uncertainty due to the choice of the BJTF shape.
In order to validate that the control region used has the same BJTF behavior as the signal region, we perform a test where the BJTF in the low VLQ mass range (500-800 GeV) is plotted  Figure 5: Dependence of the BJTF on m VLQ in the control region 12 < χ 2 mod /ndf < 48, for 4-jet (left column), 5-jet (center column), and 6-jet (right column) multiplicities, and for the bHbH (upper row), bHbZ (middle row), and bZbZ (lower row) event modes. The data are shown as black points with vertical error bars, and the linear fit and associated uncertainty are shown as a solid red line and the shaded red band. as a function of χ 2 mod /ndf, in twelve equally spaced regions for χ 2 mod /ndf from 0 to 48. This is shown in Fig. 6. The slope of this plot is consistent with zero, indicating no statistically significant dependence on χ 2 mod /ndf.  Figure 6: Dependence of the BJTF on χ 2 mod /ndf in the low-mass (500-800 GeV) VLQ region, for 4-jet (left column), 5-jet (center column), and 6-jet (right column) multiplicities, and for the bHbH (upper row), bHbZ (middle row), and bZbZ (lower row) event modes. The data are shown as black points with vertical error bars, and the linear fit and associated uncertainty are shown as a solid red line and the shaded red band. Figure 7 shows the two-dimensional dependence of the BJTF in data on m VLQ and χ 2 mod /ndf, and Fig. 8 shows the corresponding distributions for simulated VLQ signal events with a generated VLQ mass of 1200 GeV. The signal region is indicated in these plots by the red rectangle, and is excluded from the data plots.
The final estimate of the number of background events n b as a function of VLQ mass m is given by the following expression:  Figure 7: Dependence of the BJTF on m VLQ and the best χ 2 mod /ndf in data events, for 4-jet (left column), 5-jet (center column), and 6-jet (right column) multiplicities, and for the bHbH (upper row), bHbZ (middle row), and bZbZ (lower row) event modes. The red box indicates the signal region, which is excluded from these plots.
where n(m) is the number of candidates as a function of m VLQ before jet tagging for candidates passing the χ 2 mod selection shown in Fig. 4, ε 0 is the BJTF at low VLQ mass as shown in Table 4, and the last factor accounts for the potential mass dependence of the BJTF, with ε(m) the distribution of the BJTF as a function of mass, as shown in Fig. 5; the factor of 300 GeV is to normalize over the range considered. The final estimate is also validated by comparison with the region 8 < χ 2 mod /ndf < 12 and with simulated events.

Systematic uncertainties
We consider two types of systematic uncertainties, those that are common to all event modes and jet multiplicities, and those that depend on the particular channel. The uncertainties in the first category are listed in Table 5; these are the integrated luminosity, trigger efficiency, and the choice of fit function for the dependence of the BJTF on VLQ mass. The integrated luminosities of the 2016, 2017, and 2018 data-taking periods are individually known with uncertainties in the range 2.3-2.5% [24][25][26], while the total 2016-2018 integrated luminosity has an uncertainty of 1.8%, the improvement in precision reflecting the (uncorrelated) time evolution of some systematic effects. The uncertainty associated with the choice of fit function for the m VLQ dependence of the BJTF is determined by finding the average difference between the fit functions used for each mode and multiplicity combination, and using the maximum one as the common uncertainty for all modes and multiplicities. Table 5 also indicates whether an uncertainty affects the signal efficiency or the background estimate. The uncertainties that depend on event mode and jet multiplicity are those due to the background estimation, jet tag scale factors, jet energy resolution and scale, choice of PDF, and pileup.
There are several sources of uncertainty in the background estimation, corresponding to the three terms in Eq. 4. The first uncertainty arises from the exponential fit to the distribution n(m), the number of events as a function of mass before jet tagging is applied. The uncertainties in the fit parameters p 0 and p 1 are used to determine the uncertainty in the fit value for a given mass. The second is the uncertainty in the BJTF determined for low-mass VLQ candidates, ε 0 , as shown in Table 4. Finally, the third uncertainty arises from the third term, to account for a potential mass dependence of the BJTF, and is obtained from the uncertainties in the fit parameters, as in the first case.
The efficiencies for jet tagging are measured in simulated events and then corrected to data events using a data-to-simulation scale factor. The uncertainty in this scale factor is propagated to the signal reconstruction efficiency by varying the scale factors within their uncertainties [39]. The uncertainties due to the scale factors for jet energy scale and resolution [21] are determined similarly. The uncertainty due to the choice of PDF weighting is calculated from a set of 100 weights selected from the NNPDF3.0 distribution, following the prescription in Ref. [51]. The pileup uncertainties are due to a 4.6% systematic uncertainty in the pp inelastic cross section. Table 6 summarizes these uncertainties, and indicates whether they affect the signal efficiency or the background estimate, and whether the uncertainty affects the overall rate or the shape of the mass distribution. For the PDF systematic uncertainties, the values refer only to the event acceptance rate. There is in addition an uncertainty in the VLQ pair production cross section. This uncertainty depends only weakly on VLQ mass and an average value of 6% is used for all masses [34]. Figure 9 shows the distribution of the reconstructed VLQ mass, after the optimized selections described in Section 7 have been applied, for data, the expected background, and for simulated signal events with a VLQ mass of 1200, 1400, 1600, and 1800 GeV and B(B → bH) = 100%. The signal distributions are normalized to the expected number of events as determined by the VLQ production cross section. No statistically significant excess of data over the background expectation is observed; the largest difference across all nine mass points and branching fraction scenarios is slightly less than 2σ.

Results
We proceed to set exclusion limits on the VLQ mass as a function of the branching fractions. The signal extraction procedure is based on a binned maximum likelihood fit. All systematic uncertainties are incorporated into the fit as nuisance parameters, where the effect of each systematic uncertainty is included as a lognormal probability distribution per bin. A limit at 95% CL is calculated using the CL s method [52,53] using the profile likelihood test statistic [54] with the asymptotic limit approximation. Figure 10 shows the final expected and observed limits on the VLQ mass as a function of B(B → bH) and B(B → bZ), after all of the individual jet multiplicities and event modes have been combined. Points for which the exclusion limit is less than 1000 GeV are not shown. Figure 11 shows the expected limits at 95% CL on the cross section of VLQ pair production as a function of VLQ mass assuming three different branching fraction combinations: B(B → bH) = 100%, B(B → bZ) = 100%, and B(B → bH) = B(B → bZ) = 50%. The observed limits at 95% CL are: 1570 GeV in the 100% bH case, 1390 GeV in the 100% bZ case, and 1450 GeV in the 50% bH plus 50% bZ case. In the fully B → bH and B → bZ modes, as well as the mixed bHbZ mode, where this analysis is most sensitive, these limits represent significant improvements over previously published VLQ limits (1010, 1070, and 1025 GeV respectively), extending the existing limits by several hundred GeV. These improvements can be   Figure 9: Data (black points), expected background (solid blue histogram), and expected background plus a VLQ signal for different VLQ masses (colored lines), for 4-jet (left column), 5-jet (center column), and 6-jet (right column) multiplicities and for bHbH (upper row), bHbZ (middle row), and bZbZ (lower row) event modes. For the signal, B(B → bH) = 100% is assumed. The hatched regions for the background and background plus signal distributions indicate the systematic uncertainties. All three data-taking years are combined.
attributed to the use of the χ 2 mod /ndf method, which allows the hadronic final state to be fully reconstructed, as well as to the increased size of the data sample.

Summary
This paper describes a search for bottom-type, vector-like quark (VLQ) pair production in data collected by the CMS detector in 2016-2018 at √ s = 13 TeV, where the VLQ B decays into a b or b quark and either a Higgs boson H or a Z boson. The analysis targets the fully hadronic B → bH and B → bZ decays by tagging jets and using a modified χ 2 metric to reconstruct the event. Different jet multiplicity categories were used to account for the fact that Higgs or Z boson decays can produce either two distinct jets or, if highly Lorentz boosted, a single merged jet. Backgrounds were estimated from a region of low VLQ mass and extrapolated into the signal region using a modified χ 2 control region. Limits were set on the VLQ mass at 95% confidence level as a function of the branching fractions for B → bH and B → bZ. Compared to previous measurements [17,18] [10] ATLAS Collaboration, "Combined measurements of Higgs boson production and decay using up to 80 fb −1 of proton-proton collision data at √ s = 13 TeV collected with the ATLAS experiment", Phys. Rev. D 101 (2020) 012002, doi:10.1103/PhysRevD.101.012002, arXiv:1909.02845.