Search for the standard model Higgs boson produced in association with a W or a Z boson and decaying to bottom quarks

A search for the standard model Higgs boson (H) decaying to b b-bar when produced in association with a weak vector boson (V) is reported for the following channels: W(mu nu)H, W(e nu)H, W(tau nu)H, Z(mu mu)H, Z(e e)H, and Z(nu nu)H. The search is performed in data samples corresponding to integrated luminosities of up to 5.1 inverse femtobarns at sqrt(s) = 7 TeV and up to 18.9 inverse femtobarns at sqrt(s) = 8 TeV, recorded by the CMS experiment at the LHC. An excess of events is observed above the expected background with a local significance of 2.1 standard deviations for a Higgs boson mass of 125 GeV, consistent with the expectation from the production of the standard model Higgs boson. The signal strength corresponding to this excess, relative to that of the standard model Higgs boson, is 1.0 +/- 0.5.


Introduction
At the Large Hadron Collider (LHC), the ATLAS and CMS collaborations have reported the discovery of a new boson [1,2] with a mass, m H , near 125 GeV and properties compatible with those of the standard model (SM) Higgs boson [3][4][5][6][7][8]. To date, significant signals have been observed in channels where the boson decays into γγ, ZZ, or WW. The interaction of this boson with the massive W and Z vector bosons indicates that it plays a role in electroweak symmetry breaking. The interaction with the fermions, and whether the Higgs field serves as the source of mass generation in the fermion sector through a Yukawa interaction, remain to be firmly established.
At m H ≈ 125 GeV the standard model Higgs boson decays predominantly into a bottom quark-antiquark pair (bb) with a branching fraction of ≈58% [9]. The observation and study of the H → bb decay, which involves the direct coupling of the Higgs boson to down-type quarks, is therefore essential in determining the nature of the newly discovered boson. The measurement of the H → bb decay will be the first direct test of whether the observed boson interacts as expected with the quark sector, as the coupling to the top quark has only been tested through loop effects.
In their combined search for the SM Higgs boson [10], the CDF and D0 collaborations at the Tevatron proton-antiproton collider have reported evidence for an excess of events in the 115-140 GeV mass range, consistent with the mass of the Higgs boson observed at the LHC. In that search, the sensitivity below a mass of 130 GeV is dominated by the channels in which the Higgs boson is produced in association with a weak vector boson and decays to bb [11]. The combined local significance of this excess is reported to be 3.0 standard deviations at m H = 125 GeV, while the expected local significance is 1.9 standard deviations. At the LHC, a search for H → bb by the ATLAS experiment using data samples corresponding to an integrated luminosity of 4.7 fb^−1 at √s = 7 TeV resulted in exclusion limits on Higgs boson production, at the 95% confidence level (CL), of 2.5 to 5.5 times the standard model cross section in the 110-130 GeV mass range [12].
This article reports on a search at the Compact Muon Solenoid (CMS) experiment for the standard model Higgs boson in the pp → VH production mode, where V is either a W or a Z boson and H → bb. The previous Higgs boson search in this production mode at CMS used data samples corresponding to integrated luminosities of up to 5.1 fb^−1 at √s = 7 TeV and up to 5.3 fb^−1 at √s = 8 TeV [13]. The results presented here combine the analysis of the 7 TeV data sample in Ref. [13] with an updated analysis of the full 8 TeV data sample corresponding to a luminosity of up to 18.9 fb^−1.
The following six channels are considered in the search: W(µν)H, W(eν)H, W(τν)H, Z(µµ)H, Z(ee)H, and Z(νν)H, all with the Higgs boson decaying to bb. Throughout this article the term "lepton" refers only to charged leptons and the symbol ℓ is used to refer to both muons and electrons, but not to taus. For the W(τν)H final state, only the 8 TeV data are included and only taus with 1-prong hadronic decays are explicitly considered; the τ notation throughout this article refers exclusively to such decays. The leptonic decays of taus in WH processes are implicitly accounted for in the W(µν)H and W(eν)H channels. Backgrounds arise from production of W and Z bosons in association with jets (from gluons and from light- or heavy-flavor quarks), singly and pair-produced top quarks (tt), dibosons, and quantum chromodynamics (QCD) multijet processes.
Simulated samples of signal and background events are used to provide guidance in the optimization of the analysis. Control regions in data are selected to adjust the event yields from simulation for the main background processes in order to estimate their contribution in the [...] approximately nine. During the 2012 period the LHC instantaneous luminosity reached 7.7 × 10^33 cm^−2 s^−1 and the average number of pp interactions per bunch crossing was approximately twenty-one. Additional simulated pp interactions overlapping with the event of interest in the same bunch crossing, denoted as pileup events, are therefore added in the simulated samples to reproduce the pileup distribution measured in data.

Triggers
Several triggers are used to collect events consistent with the signal hypothesis in the six channels under consideration.
For the W(µν)H and W(eν)H channels, the trigger paths consist of several single-lepton triggers with tight lepton identification. Leptons are also required to be isolated from other tracks and calorimeter energy deposits to maintain an acceptable trigger rate. For the W(µν)H channel and for the 2011 data, the trigger thresholds for the muon transverse momentum, p T , are in the range of 17 to 24 GeV. The higher thresholds are used for the periods of higher instantaneous luminosity. For the 2012 data the muon trigger p T threshold for the single-isolated-muon trigger is set at 24 GeV. For both the 2011 and 2012 data, a single-muon trigger with a 40 GeV p T threshold, but without any isolation requirements, is also used for this channel. The combined single-muon trigger efficiency is ≈90% for W(µν)H events that pass all offline requirements that are described in Section 5.
For the W(eν)H channel and for the 2011 data, the electron p T threshold ranges from 17 to 30 GeV. To maintain acceptable trigger rates during the periods of high instantaneous luminosity, the lower-threshold triggers also require two central (|η| < 2.6) jets, with a p T threshold in the 25-30 GeV range, and a minimum requirement on the value of an online estimate of the missing transverse energy, E miss T , in the 15-25 GeV range. E miss T is defined online as the magnitude of the vector sum of the transverse momenta of all reconstructed objects identified by a particle-flow algorithm [30,31]. This algorithm combines the information from all CMS subdetectors to identify and reconstruct online individual particles emerging from the proton-proton collisions: charged hadrons, neutral hadrons, photons, muons, and electrons. These particles are then used to reconstruct jets, E miss T and hadronic τ-lepton decays, and also to quantify the isolation of leptons and photons. For the 2012 data, the electron trigger uses a 27 GeV threshold on the p T and no other requirements on jets or E miss T are made. The combined efficiency for these triggers for W(eν)H events to pass the offline selection criteria is >95%.
For the W(τν)H channel trigger, a 1-prong hadronically-decaying tau is required. The p T of the charged track candidate coming from the tau decay is required to be above 20 GeV and the p T of the tau (measured from all reconstructed charged and neutral decay products) above 35 GeV. Additionally, the tau is required to be isolated: no reconstructed charged candidates with p T > 1.5 GeV may be found inside an annulus with inner radius ∆R = 0.2 and outer radius ∆R = 0.4. A further requirement of a minimum of 70 GeV is placed on the E miss T . The efficiency of this trigger for W(τν)H events that pass the offline selection criteria is >90%.
The Z(µµ)H channel uses the same single-muon triggers as the W(µν)H channel. For the Z(ee)H channel, dielectron triggers with lower p T thresholds, of 17 and 8 GeV, and tight isolation requirements are used. These triggers are nearly 100% efficient for all Z(ℓℓ)H signal events that pass the final offline selection criteria.
For the Z(νν)H channel, combinations of several triggers are used, all requiring E miss T to be above a given threshold. Extra requirements were added to keep the trigger rates manageable as the instantaneous luminosity increased and to reduce the E miss T thresholds in order to improve signal acceptance. A trigger with E miss T > 150 GeV is used for the complete data set in both 2011 and 2012. During 2011, additional triggers that required the presence of two central jets with p T > 20 GeV and E miss T thresholds of 80 or 100 GeV, depending on the instantaneous luminosity, were used. During 2012, an additional trigger that required two central jets with p T > 30 GeV and E miss T > 80 GeV was used. This last trigger was discontinued when the instantaneous luminosity exceeded 3 × 10^33 cm^−2 s^−1 and was replaced by a trigger that required E miss T > 100 GeV, at least two central jets with vectorial sum p T > 100 GeV and individual p T above 60 and 25 GeV, and no jet with p T > 40 GeV closer than 0.5 in azimuthal angle to the E miss T direction. In order to increase signal acceptance at lower values of E miss T , triggers that require jets to be identified as coming from b quarks are used. For these triggers, two central jets with p T above 20 or 30 GeV, depending on the luminosity conditions, are required. It is also required that at least one central jet with p T above 20 GeV be tagged by the online combined secondary vertex (CSV) b-tagging algorithm described in Section 4. This online b-tagging requirement has an efficiency that is equivalent to that of the tight offline requirement, CSV > 0.898, on the value of the output of the CSV discriminant. The E miss T is required to be greater than 80 GeV for these triggers. For Z(νν)H signal events with E miss T > 130 GeV, the combined trigger efficiency is near 100% with respect to the offline event reconstruction and selection, described in the next sections. For events with E miss T between 100 and 130 GeV the efficiency is 88%.

Event reconstruction
The characterization of VH events, in the channels studied here, requires the reconstruction of the following objects, all originating from a common interaction vertex: electrons, muons, taus, neutrinos, and jets (including those originating from b quarks). The charged leptons and neutrinos (reconstructed as E miss T ) originate from the vector boson decays. The b-quark jets originate from the Higgs boson decays.
The reconstructed interaction vertex with the largest value of $\sum_i p_{T,i}^2$, where $p_{T,i}$ is the transverse momentum of the ith track associated with the vertex, is selected as the primary event vertex. This vertex is used as the reference vertex for all relevant objects in the event, which are reconstructed with the particle-flow algorithm. The pileup interactions affect jet momentum reconstruction, missing transverse energy reconstruction, lepton isolation, and b-tagging efficiencies. To mitigate these effects, all charged hadrons that do not originate from the primary interaction are identified by a particle-flow-based algorithm and removed from consideration in the event. In addition, the average neutral energy density from pileup interactions is evaluated from particle-flow objects and subtracted from the reconstructed jets in the event and from the summed energy in the isolation cones used for leptons, described below [32]. These pileup-mitigation procedures are applied on an event-by-event basis. Jets are reconstructed from particle-flow objects using the anti-k T clustering algorithm [33], with a distance parameter of 0.5, as implemented in the FASTJET package [34,35]. Each jet is required to lie within |η| < 2.5, to have at least two tracks associated with it, and to have electromagnetic and hadronic energy fractions of at least 1%. The last requirement removes jets originating from instrumental effects. Jet energy corrections are applied as a function of pseudorapidity and transverse momentum of the jet [36]. The missing transverse energy vector is calculated offline as the negative of the vectorial sum of transverse momenta of all particle-flow objects identified in the event, and the magnitude of this vector is referred to as E miss T in the rest of this article.
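The primary-vertex choice described above can be sketched as follows. This is a minimal illustration under stated assumptions: the data layout (one list of track p T values per vertex) and the function name `select_primary_vertex` are hypothetical and do not correspond to the CMS reconstruction software.

```python
from typing import Sequence


def select_primary_vertex(vertices: Sequence[Sequence[float]]) -> int:
    """Return the index of the vertex with the largest sum of track pT^2.

    `vertices` is a hypothetical layout: one sequence of track pT values
    (in GeV) per reconstructed interaction vertex.
    """
    if not vertices:
        raise ValueError("no reconstructed vertices")
    sums = [sum(pt * pt for pt in tracks) for tracks in vertices]
    return max(range(len(sums)), key=sums.__getitem__)


# Illustrative values only (not taken from data).
vertices = [
    [1.0, 2.0, 1.5],    # soft pileup vertex: sum pT^2 = 7.25
    [25.0, 30.0, 4.0],  # hard-scatter candidate: sum pT^2 = 1541
    [3.0, 3.0],         # another pileup vertex: sum pT^2 = 18
]
primary = select_primary_vertex(vertices)
```

Squaring the track p T weights the choice toward vertices with a few hard tracks rather than many soft ones, which is the behaviour expected of a hard-scatter vertex among pileup vertices.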
Muons are reconstructed using two algorithms [37]: one in which tracks in the silicon tracker are matched to signals in the muon detectors, and another in which a global track fit is performed, seeded by signals in the muon systems. The muon candidates used in the analysis are required to be successfully reconstructed by both algorithms. Further identification criteria are imposed on the muon candidates to reduce the fraction of tracks misidentified as muons. These include the number of measurements in the tracker and in the muon systems, the fit quality of the global muon track and its consistency with the primary vertex. Muon candidates are considered in the |η| < 2.4 range.
Electron reconstruction requires the matching of an energy cluster in the ECAL with a track in the silicon tracker [38]. Identification criteria based on the ECAL shower shape, matching between the track and the ECAL cluster, and consistency with the primary vertex are imposed. Electron identification relies on a multivariate technique that combines observables sensitive to the amount of bremsstrahlung along the electron trajectory, the geometrical and momentum matching between the electron trajectory and associated clusters, as well as shower-shape observables. Additional requirements are imposed to remove electrons produced by photon conversions. In this analysis, electrons are considered in the pseudorapidity range |η| < 2.5, excluding the 1.44 < |η| < 1.57 transition region between the ECAL barrel and endcap, where electron reconstruction is suboptimal.
Charged leptons from the W and Z boson decays are expected to be isolated from other activity in the event. For each lepton candidate, a cone is constructed around the track direction at the event vertex. The scalar sum of the transverse momentum of each reconstructed particle compatible with the primary vertex and contained within the cone is calculated, excluding the contribution from the lepton candidate itself. If this sum exceeds approximately 10% of the candidate p T , the lepton is rejected; the exact requirement depends on the lepton η, p T , and flavor. Including the isolation requirement, the total efficiency to reconstruct muons is in the 87-91% range, depending on p T and η. The corresponding efficiency for electrons is in the 81-98% range.
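The isolation requirement can be illustrated with a short sketch. The cone size of 0.4 and the flat 10% threshold are assumptions made for illustration: the text does not quote the cone size, and the actual requirement depends on the lepton η, p T , and flavor. The particle list is assumed to be already restricted to candidates compatible with the primary vertex.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(d_eta^2 + d_phi^2), with d_phi wrapped to [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def relative_isolation(lepton, particles, cone=0.4):
    """Scalar pT sum of particles inside the cone, divided by the lepton pT.

    The lepton candidate itself is excluded from the sum.
    `cone` is an assumed value, not one quoted in the text.
    """
    total = sum(
        p["pt"]
        for p in particles
        if p is not lepton
        and delta_r(lepton["eta"], lepton["phi"], p["eta"], p["phi"]) < cone
    )
    return total / lepton["pt"]


def is_isolated(lepton, particles, cone=0.4, max_rel_iso=0.10):
    """Apply an approximate 10% relative-isolation requirement (assumed flat)."""
    return relative_isolation(lepton, particles, cone) <= max_rel_iso


# Illustrative event: one 40 GeV muon with nearby and distant activity.
muon = {"pt": 40.0, "eta": 0.0, "phi": 0.0}
particles = [
    muon,
    {"pt": 2.0, "eta": 0.1, "phi": 0.1},   # inside the cone
    {"pt": 10.0, "eta": 1.0, "phi": 2.0},  # outside the cone
]
rel_iso = relative_isolation(muon, particles)
```

In this toy event only the 2 GeV particle falls inside the cone, so the relative isolation is 2/40 and the muon is retained.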
The hadronically-decaying taus are reconstructed using the hadron plus strips (HPS) algorithm [39], which uses charged hadrons and neutral electromagnetic objects (photons) to reconstruct tau decays. Reconstructed taus are required to be in the range |η| < 2.1. In the first step of reconstruction, charged hadrons are reconstructed using the particle-flow algorithm. Since neutral pions are often produced in hadronic tau decays, the HPS algorithm is optimized to reconstruct neutral pions in the ECAL as objects called "strips". The strip reconstruction starts by centering one strip on the most energetic electromagnetic particle and then looking for other particles in a window of 0.05 in η and 0.20 in φ. Strips satisfying a minimum transverse momentum of p T (strip) > 1 GeV are combined with the charged hadrons to reconstruct the hadronic tau candidate. In the final step of reconstruction, all charged hadrons and strips are required to be contained within a narrow cone of size ∆R = 2.8/p T (τ), where p T (τ) is measured from the reconstructed hadronic tau candidate and is expressed in GeV. Further identification criteria are imposed on the tau candidate to reduce the fraction of electrons and muons misidentified as taus. These include the tau candidate passing an anti-electron discriminator and an anti-muon discriminator. The isolation requirement for taus is that the sum of transverse momenta of particle-flow charged hadron and photon candidates, with p T > 0.5 GeV and within a cone of ∆R < 0.5, be less than 2 GeV. The tau reconstruction efficiency is approximately 50% while the misidentification rate from jets is about 1%.
Jets that originate from the hadronization of b quarks are referred to as "b jets". The CSV b-tagging algorithm [40] is used to identify such jets. The algorithm combines the information about track impact parameters and secondary vertices within jets in a likelihood discriminant to provide separation between b jets and jets originating from light quarks, gluons, or charm quarks. The output of this CSV discriminant has values between zero and one; a jet with a CSV value above a certain threshold is referred to as being "b tagged". The efficiency to tag b jets and the rate of misidentification of non-b jets depend on the threshold chosen, and are typically parameterized as a function of the p T and η of the jets. These performance measurements are obtained directly from data in samples that can be enriched in b jets, such as tt and multijet events (where, for example, requiring the presence of a muon in the jets enhances the heavy-flavor content of the events). Several thresholds for the CSV output discriminant are used in this analysis. Depending on the threshold used, the efficiencies to tag jets originating from b quarks, c quarks, and light quarks or gluons are in the 50-75%, 5-25%, and 0.15-3.0% ranges, respectively.
Events from data and from the simulated samples are required to satisfy the same trigger and event reconstruction requirements. Corrections that account for the differences in the performance of these algorithms between data and simulations are computed from data and used in the analysis.

Event selection
The background processes to VH production with H → bb are the production of vector bosons in association with one or more jets (V+jets), tt production, single-top-quark production, diboson production (VV), and QCD multijet production. Except for dibosons, these processes have production cross sections that are several orders of magnitude larger than Higgs boson production. The production cross section for the VZ process, where Z → bb, is only a few times larger than the VH production cross section, and given the nearly identical final state this process provides a benchmark against which the Higgs boson search strategy can be tested.
The event selection is based on the reconstruction of the vector bosons in their leptonic decay modes and of the Higgs boson decay into two b-tagged jets. Background events are substantially reduced by requiring a significant boost of the p T of the vector boson, p T (V), or of the Higgs boson [41]. In this kinematic region the V and H bosons recoil away from each other with a large azimuthal opening angle, ∆φ(V, H), between them. For each channel, different p T (V) boost regions are selected. Because of different signal and background content, each p T (V) region has different sensitivity and the analysis is performed separately in each region. The results from all regions are then combined for each channel. The low-, intermediate-, and high-boost regions for the W(µν)H and W(eν)H channels are defined by 100 < p T (V) < 130 GeV, 130 < p T (V) < 180 GeV, and p T (V) > 180 GeV. For the W(τν)H channel a single p T (V) > 120 GeV region is considered. For the Z(ℓℓ)H channels, the low- and high-boost regions are defined by 50 < p T (V) < 100 GeV and p T (V) > 100 GeV. For the Z(νν)H channel E miss T is used to define the low-, intermediate-, and high-boost p T (V) regions as 100 < E miss T < 130 GeV, 130 < E miss T < 170 GeV, and E miss T > 170 GeV, respectively. In the rest of the article the term "boost region" is used to refer to these p T (V) regions.
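The boost-region definitions above lend themselves to a small lookup table. The sketch below is illustrative only: the channel labels and the function name `boost_region` are hypothetical, and the treatment of events exactly on a boundary (assigned here to the upper region) is an assumption, since the text quotes strict inequalities.

```python
import math

# Boost regions per channel as (name, lower edge, upper edge) in GeV.
# For the Z(nunu)H channel the variable is E_T^miss rather than pT(V).
BOOST_REGIONS = {
    "W(lnu)H": [("low", 100.0, 130.0), ("intermediate", 130.0, 180.0),
                ("high", 180.0, math.inf)],
    "W(taunu)H": [("single", 120.0, math.inf)],
    "Z(ll)H": [("low", 50.0, 100.0), ("high", 100.0, math.inf)],
    "Z(nunu)H": [("low", 100.0, 130.0), ("intermediate", 130.0, 170.0),
                 ("high", 170.0, math.inf)],
}


def boost_region(channel, pt_v):
    """Return the boost-region name for a given pT(V), or None if outside all regions.

    Intervals are treated as half-open [lower, upper); this boundary
    convention is an assumption made for the sketch.
    """
    for name, lower, upper in BOOST_REGIONS[channel]:
        if lower <= pt_v < upper:
            return name
    return None
```

For example, `boost_region("W(lnu)H", 150.0)` falls in the intermediate region, while a W(τν)H event with p T (V) = 100 GeV is outside the single region considered for that channel.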
Candidate W → ℓν decays are identified by requiring the presence of a single isolated lepton and additional missing transverse energy. Muons are required to have p T > 20 GeV; the corresponding thresholds for electrons and taus are 30 and 40 GeV, respectively. For the W(ℓν)H and W(τν)H channels, E miss T is required to be >45 and >80 GeV, respectively, to reduce contamination from QCD multijet processes. To further reduce this contamination, it is also required for the W(ℓν)H channels that the azimuthal angle between the E miss T direction and the lepton be <π/2, and that the lepton isolation for the low-boost region be tighter.
Candidate Z → ℓℓ decays are reconstructed by combining isolated, oppositely-charged pairs of electrons or muons and requiring the dilepton invariant mass to satisfy 75 < m ℓℓ < 105 GeV. The p T for each lepton is required to be >20 GeV.
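A sketch of the Z → ℓℓ candidate requirements follows. It assumes massless leptons and a hypothetical dictionary layout for each lepton (`pt`, `eta`, `phi`, `charge`, `flavor`); isolation is taken to have been applied upstream.

```python
import math


def four_vector(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) from collider coordinates; leptons are taken as massless."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(lep1, lep2):
    """Invariant mass of the two-lepton system, in the same units as pT."""
    e1, px1, py1, pz1 = four_vector(lep1["pt"], lep1["eta"], lep1["phi"])
    e2, px2, py2, pz2 = four_vector(lep2["pt"], lep2["eta"], lep2["phi"])
    m2 = (e1 + e2) ** 2 - (px1 + px2) ** 2 - (py1 + py2) ** 2 - (pz1 + pz2) ** 2
    return math.sqrt(max(m2, 0.0))  # guard against small negative rounding


def is_z_candidate(lep1, lep2, pt_min=20.0, m_lo=75.0, m_hi=105.0):
    """Same-flavor, opposite-charge pair with pT > 20 GeV and 75 < m_ll < 105 GeV."""
    if lep1["flavor"] != lep2["flavor"]:
        return False
    if lep1["charge"] * lep2["charge"] >= 0:
        return False
    if min(lep1["pt"], lep2["pt"]) <= pt_min:
        return False
    return m_lo < invariant_mass(lep1, lep2) < m_hi


# Illustrative back-to-back muon pair at central pseudorapidity.
mu_plus = {"pt": 45.6, "eta": 0.0, "phi": 0.0, "charge": +1, "flavor": "mu"}
mu_minus = {"pt": 45.6, "eta": 0.0, "phi": math.pi, "charge": -1, "flavor": "mu"}
m_ll = invariant_mass(mu_plus, mu_minus)
```

For two massless, back-to-back leptons at η = 0 the invariant mass is simply the sum of their transverse momenta, which makes the toy pair above easy to verify by hand.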
The identification of Z → νν decays requires the E miss T in the event to be within the boost regions described above. The QCD multijet background is reduced to negligible levels in this channel when requiring that the E miss T does not originate from mismeasured jets. To that end three event requirements are made. First, for the high-boost region, a ∆φ(E miss T , jet) > 0.5 radians requirement is applied on the azimuthal angle between the E miss T direction and the closest jet with |η| < 2.5 and p T > 20 GeV for the 7 TeV analysis or p T > 25 GeV for the 8 TeV analysis (where more pileup interactions are present). For the low- and intermediate-boost regions the requirement is tightened to ∆φ(E miss T , jet) > 0.7 radians. The second requirement is that the azimuthal angle between the missing transverse energy direction as calculated from charged tracks only (with p T > 0.5 GeV and |η| < 2.5) and the E miss T direction, ∆φ(E miss T , E miss T (tracks)), should be smaller than 0.5 radians. The third requirement is made for the low-boost region where the E miss T significance (defined as the ratio between the E miss T and the square root of the total transverse energy in the calorimeter, measured in GeV) should be larger than 3. To reduce background events from tt and WZ production in the W(ℓν)H, W(τν)H, and Z(νν)H channels, events with an additional number of isolated leptons, N aℓ > 0, with p T > 20 GeV are rejected.
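The three anti-QCD requirements for the Z(νν)H channel can be sketched as a single predicate. The function name, argument names, and the jet tuple layout (p T , η, φ) are hypothetical; the default jet p T threshold corresponds to the 8 TeV value quoted above, and "closest jet" is interpreted here as closest in azimuth to the E miss T direction.

```python
import math


def delta_phi(phi1, phi2):
    """Absolute azimuthal separation, wrapped to [0, pi]."""
    return abs(math.remainder(phi1 - phi2, 2.0 * math.pi))


def passes_znn_antiqcd(met, met_phi, track_met_phi, jets, sum_et, region,
                       jet_pt_min=25.0):
    """Apply the three requirements described in the text.

    jets: list of (pt, eta, phi) tuples; region: "low", "intermediate" or "high".
    sum_et: total calorimeter transverse energy in GeV (used only for "low").
    """
    # 1) E_T^miss must not point along the closest selected jet.
    min_dphi_jet = 0.5 if region == "high" else 0.7
    selected = [j for j in jets if j[0] > jet_pt_min and abs(j[1]) < 2.5]
    if selected and min(delta_phi(met_phi, j[2]) for j in selected) <= min_dphi_jet:
        return False
    # 2) Track-based and particle-flow E_T^miss directions must agree.
    if delta_phi(met_phi, track_met_phi) >= 0.5:
        return False
    # 3) E_T^miss significance requirement, low-boost region only.
    if region == "low" and met / math.sqrt(sum_et) <= 3.0:
        return False
    return True
```

As a usage example, an event with E miss T = 120 GeV, total transverse energy of 900 GeV (significance 4), and jets well separated in azimuth from the E miss T direction passes in the low-boost region, while a jet at ∆φ = 0.6 from the E miss T direction is tolerated only in the high-boost region.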
The reconstruction of the H → bb decay proceeds by selecting the pair of jets in the event, each with |η| < 2.5 and p T above a minimum threshold, for which the value of the magnitude of the vectorial sum of their transverse momenta, p T (jj), is the highest. These jets are then also required to be tagged by the CSV algorithm, with the value of the CSV discriminator above a minimum threshold. The background from V+jets and diboson production is reduced significantly when the b-tagging requirements are applied and processes where the two jets originate from genuine b quarks dominate the final selected data sample.
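The choice of the Higgs boson candidate jets can be sketched as a search over jet pairs. The jet dictionary layout and the default p T threshold of 30 GeV are assumptions for illustration (the actual thresholds are channel dependent); the b-tagging requirement is applied afterwards and is not part of this sketch.

```python
import math
from itertools import combinations


def dijet_pt(j1, j2):
    """Magnitude of the vectorial sum of the two jets' transverse momenta."""
    px = j1["pt"] * math.cos(j1["phi"]) + j2["pt"] * math.cos(j2["phi"])
    py = j1["pt"] * math.sin(j1["phi"]) + j2["pt"] * math.sin(j2["phi"])
    return math.hypot(px, py)


def select_higgs_candidate(jets, pt_min=30.0, eta_max=2.5):
    """Return the pair of central jets with the highest pT(jj), or None.

    pt_min is an assumed placeholder for the channel-dependent threshold.
    """
    good = [j for j in jets if j["pt"] > pt_min and abs(j["eta"]) < eta_max]
    if len(good) < 2:
        return None
    return max(combinations(good, 2), key=lambda pair: dijet_pt(*pair))


# Illustrative jets: A and B are collimated, C recoils against A, D is forward.
jets = [
    {"name": "A", "pt": 100.0, "eta": 0.2, "phi": 0.0},
    {"name": "B", "pt": 80.0, "eta": -0.4, "phi": 0.5},
    {"name": "C", "pt": 90.0, "eta": 1.0, "phi": math.pi},
    {"name": "D", "pt": 50.0, "eta": 3.0, "phi": 0.1},
]
pair = select_higgs_candidate(jets)
```

Selecting on p T (jj) rather than on the individual jet p T favors pairs that move together, as expected for the decay products of a boosted Higgs boson; in the toy event the collimated pair (A, B) is chosen over pairs involving the recoiling jet C.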
After all event selection criteria described in this section are applied, the dijet invariant-mass resolution of the two b jets from the Higgs decay is approximately 10%, depending on the p T of the reconstructed Higgs boson, with a few percent shift on the value of the mass peak. The Higgs boson mass resolution is further improved by applying multivariate regression techniques similar to those used at the CDF experiment [42]. An additional correction, beyond the standard CMS jet energy corrections, is computed for individual b jets in an attempt to recalibrate to the true b-quark energy. For this purpose, a specialized BDT is trained on simulated H → bb signal events with inputs that include detailed jet-structure information, which differs between jets from b quarks and jets from light-flavor quarks or gluons. These inputs include variables related to several properties of the secondary vertex (when reconstructed), information about tracks, jet constituents, and other variables related to the energy reconstruction of the jet. Because of semileptonic b-hadron decays, jets from b quarks contain, on average, more leptons and a larger fraction of missing energy than jets from light quarks or gluons. Therefore, in the cases where a low-p T lepton is found in the jet or in its vicinity, the following variables are also included in the BDT regression: the p T of the lepton, the ∆R distance between the lepton and the jet directions, and the transverse momentum of the lepton relative to the jet direction. For the Z(ℓℓ)H channels the E miss T in the event and the azimuthal angle between the E miss T and each jet are also considered in the regression. The output of the BDT regression is the corrected jet energy. The average improvement on the mass resolution, measured on simulated signal samples, when the corrected jet energies are used is ≈15%, resulting in an increase in the analysis sensitivity of 10-20%, depending on the specific channel. This improvement is shown in Fig. 1 for simulated samples of Z(ℓℓ)H(bb) events where the improvement in resolution is ≈25%. The validation of the regression technique in data is done with samples of Z → ℓℓ events with two b-tagged jets and in tt-enriched samples in the lepton+jets final state. In the Z → ℓℓ case, when the jets are corrected by the regression procedure, the p T balance distribution, between the Z boson, reconstructed from the leptons, and the b-tagged dijet system is improved to be better centered at zero and narrower than when the regression correction is not applied. In the tt-enriched case, the reconstructed top-quark mass distribution is closer to the nominal top-quark mass and also narrower than when the correction is not applied. In both cases the distributions for data and the simulated samples are in very good agreement after the regression correction is applied.

Figure 1: Dijet invariant mass distribution for simulated samples of Z(ℓℓ)H(bb) events, before (red) and after (blue) the energy correction from the regression procedure is applied. A Bukin function [43] is fit to the distribution and the fitted width of the core of the distribution is displayed on the figure.
The signal region is defined by events that satisfy the vector boson and Higgs boson reconstruction criteria described above together with the requirements listed in Table 1. In the final stage of the analysis, to better separate signal from background under different Higgs boson mass hypotheses, an event BDT discriminant is trained separately at each mass value using simulated samples for signal and all background processes. The training of this BDT is performed with all events in the signal region. The set of event input variables used, listed in Table 2, is chosen by iterative optimization from a larger number of potentially discriminating variables. Among the most discriminant variables for all channels are the dijet invariant mass distribution (m(jj)), the number of additional jets (N aj ), the value of CSV for the Higgs boson daughter with the second largest CSV value (CSV min ), and the distance between Higgs boson daughters (∆R(jj)). It has been suggested that variables related to techniques that study in more detail the substructure of jets could help improve the sensitivity of the H → bb searches [41]. In this analysis, several combinations of such variables were considered as additional inputs to the BDT discriminant. However they did not yield significant gains in sensitivity and are not included in the final training used.
A fit is performed to the shape of the output distribution of the event BDT discriminant to search for events resulting from Higgs boson production. Before testing all events through this final discriminant, events are classified based on where they fall in the output distributions of several other background-specific BDT discriminants that are trained to discern signal from individual background processes. This technique, similar to the one used by the CDF collaboration [44], divides the samples into four distinct subsets that are enriched in tt, V+jets, dibosons, and VH. The increase in the analysis sensitivity from using this technique in the Z(νν)H and W(ℓν)H channels is 5-10%. For the Z(ℓℓ)H channel the improvement is not as large and therefore the technique is not used for that case. The technique is also not used in the W(τν)H channel because of the limited size of the simulated event samples available for training multiple BDT discriminants. The first background-specific BDT discriminant is trained to separate tt from VH, the second one is trained to separate V+jets from VH, and the third one separates diboson events from VH. The output distributions of the background-specific BDTs are used to separate events in four subsets: those that fail a requirement on the tt BDT are classified as tt-like events, those that pass the tt BDT requirement but fail a requirement on the V+jets BDT are classified as V+jets-like events, those that pass the V+jets BDT requirement but fail the requirement on the diboson BDT are classified as diboson-like events and, finally, those that pass all BDT requirements are considered VH-enriched events. The events in each subset are then run through the final event BDT discriminant and the resulting distribution, now composed of four distinct subsets of events, is used as input to the fitting procedure.

Table 1: Selection criteria that define the signal region. Entries marked with "-" indicate that the variable is not used in the given channel. If different, the entries in square brackets indicate the selection for the different boost regions as defined in the first row of the table. The p T thresholds for the highest and second highest p T jets are p T (j 1 ) and p T (j 2 ), respectively. The transverse momentum of the leading tau track is p T (track). The values listed for kinematic variables are in units of GeV, and for angles in units of radians.
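The sequential classification into four subsets can be sketched as a simple cascade. The score dictionary, the threshold values, and the convention that a higher score is more signal-like are all assumptions made for illustration; the actual BDT outputs and requirements are not given in the text.

```python
def classify_event(scores, cuts):
    """Assign an event to one of four subsets using three background-specific BDTs.

    scores and cuts are dicts keyed by "ttbar", "vjets" and "diboson".
    An event "passes" a requirement when its score is at or above the cut
    (assumed convention: higher score means more VH-like).
    """
    if scores["ttbar"] < cuts["ttbar"]:
        return "ttbar-like"
    if scores["vjets"] < cuts["vjets"]:
        return "V+jets-like"
    if scores["diboson"] < cuts["diboson"]:
        return "diboson-like"
    return "VH-enriched"


# Hypothetical thresholds, for illustration only.
cuts = {"ttbar": -0.5, "vjets": -0.5, "diboson": -0.5}
subset = classify_event({"ttbar": 0.2, "vjets": 0.1, "diboson": -0.8}, cuts)
```

Because the requirements are applied in a fixed order, each event lands in exactly one subset, so the four subsets can be concatenated into a single distribution for the final fit without double counting.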
As a validation of the multivariate approach to this analysis, these BDT discriminants are also trained to find diboson signals (ZZ and WZ, with Z → bb) rather than the VH signal. The event selection used in this case is identical to that used for the VH search.
As a cross-check to the BDT-based analysis, a simpler analysis is done by performing a fit to the shape of the dijet invariant mass distribution of the two jets associated with the reconstructed Higgs boson, m(jj). The event selection for this analysis is more restrictive than the one used in the BDT analysis and is optimized for sensitivity in this single variable. Table 3 lists the event selection of the m(jj) analysis. Since the diboson background also exhibits a peak in the m(jj) distribution from Z bosons that decay into b quark pairs, the distribution is also used

Background control regions
Appropriate control regions are identified in data and used to validate the simulation modeling of the distributions used as input to the BDT discriminants, and to obtain scale factors used to adjust the simulation event yield estimates for the most important background processes: production of W and Z bosons in association with jets and tt production. For the W and Z backgrounds the control regions are defined such that they are enriched in either heavy-flavor (HF) or light-flavor (LF) jets. Furthermore, these processes are split according to how many of the two jets selected in the Higgs boson reconstruction originate from b quarks, and separate scale factors are obtained for each case. The notation used is: V + udscg for the case where none of the jets originate from a b quark, V + b for the case where only one of the jets is from a b quark, and V + bb for the case where both jets originate from b quarks.
To obtain the scale factors by which the simulated event yields are adjusted, a set of binned likelihood fits is simultaneously performed to CSV distributions of jets for events in the control regions. These fits are done separately for each channel. Several other distributions are also fit to verify consistency. These scale factors account not only for cross section discrepancies, but also for potential residual differences in physics object selection. Therefore, separate scale factors are used for each background process in the different channels. The uncertainties in the scale factor determination include two components: the statistical uncertainty due to the finite size of the samples and the systematic uncertainty. The latter is obtained by subtracting, in quadrature, the statistical component from the full uncertainty, which includes the effect of various sources of systematic uncertainty such as b-tagging, jet energy scale, and jet energy resolution.
Table 7 summarizes the fit results for all channels for the 8 TeV data. The scale factors are found to be close to unity for all processes except for V + b for which the scale factors are consistently found to be close to two. In this case, most of the excess occurs in the region of low CSV min values in which events with two displaced vertices are found relatively close to each other, within a distance ∆R < 0.5 defined by the directions of their displacement trajectories with respect to the primary vertex. This discrepancy is interpreted as arising mainly from mismodeling in the generator parton shower of the process of gluon-splitting to b-quark pairs. In this process the dominant contribution typically contains a low-p T b quark that can end up not being reconstructed as a jet above the p T threshold used in the analysis, or that is merged with the jet from the more energetic b quark. These discrepancies are consistent with similar observations in other studies of the production of vector bosons in association with heavy-flavor quarks by the ATLAS and CMS experiments [46][47][48].
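The quadrature subtraction used to isolate the systematic component of the scale-factor uncertainty can be sketched as follows. The function name and the numerical values are illustrative assumptions and are not taken from Table 7.

```python
import math


def systematic_component(total_unc, stat_unc):
    """Return sqrt(total^2 - stat^2) for two absolute uncertainties."""
    if stat_unc > total_unc:
        raise ValueError("statistical component exceeds the full uncertainty")
    return math.sqrt(total_unc**2 - stat_unc**2)


# Illustrative numbers only (not values from the analysis).
total, stat = 0.05, 0.03
syst = systematic_component(total, stat)
```

With these inputs the systematic component is 0.04, since 0.05² = 0.03² + 0.04².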

Uncertainties
The systematic uncertainties that affect the results presented in this article are listed in Table 8 and are described in more detail below.
The uncertainty in the CMS luminosity measurement is estimated to be 2.2% for the 2011 data [49] and 2.6% for the 2012 data [50]. Muon and electron trigger, reconstruction, and identification efficiencies are determined in data from samples of leptonic Z-boson decays. The uncertainty on the event yields resulting from the trigger efficiency estimate is 2% per lepton and the uncertainty on the identification efficiency is also 2% per lepton. The parameters describing the Z(νν)H trigger efficiency turn-on curve have been varied within their statistical uncertainties and also estimated for different assumptions on the methods used to derive the efficiency. This results in an event yield uncertainty of about 3%.
Figure (caption partially recovered): ... Table 7. Top left: Dijet p T distribution in the Z+jets control region for the Z(ee)H channel. Top right: p T distribution in the tt control region for the W(µν)H channel. Bottom left: CSV min distribution for the W+HF high-boost control region for the Z(νν)H channel. Bottom right: $E_T^{\mathrm{miss}}$ distribution for the Z+HF high-boost control region for the Z(νν)H channel. The bottom inset in each figure shows the ratio of the number of events observed in data to that of the Monte Carlo prediction for signal and backgrounds.
The jet energy scale is varied within its uncertainty as a function of jet p T and η. The efficiency of the analysis selection is recomputed to assess the variation in event yields. Depending on the process, a 2-3% yield variation is found. The effect of the uncertainty on the jet energy resolution is evaluated by smearing the jet energies according to the measured uncertainty. Depending on the process, a 3-6% variation in event yields is obtained. The uncertainties in the jet energy scale and resolution also have an effect on the shape of the BDT output distribution. The impact of the jet energy scale uncertainty is determined by recomputing the BDT output distribution after shifting the energy scale up and down by its uncertainty. Similarly, the impact of the jet energy resolution is determined by recomputing the BDT output distribution after increasing or decreasing the jet energy resolution. An uncertainty of 3% is assigned to the event yields of all processes in the W(ℓν)H and Z(νν)H channels due to the uncertainty related to the missing transverse energy estimate.
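The idea of recomputing the selection efficiency after shifting the jet energy scale can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the flat fractional shift (the analysis varies the scale as a function of jet p T and η), the two-jet threshold selection, and the event values.

```python
def passes(event, threshold, shift=0.0):
    """Toy selection: every jet pT, rescaled by (1 + shift), exceeds threshold."""
    return all(pt * (1.0 + shift) > threshold for pt in event)


def yield_variation(events, threshold, jes_unc):
    """Return (nominal, up, down) yields for a flat fractional scale shift."""
    nominal = sum(passes(e, threshold) for e in events)
    up = sum(passes(e, threshold, +jes_unc) for e in events)
    down = sum(passes(e, threshold, -jes_unc) for e in events)
    return nominal, up, down


# Hypothetical events, each a pair of jet pT values in GeV.
events = [(62.0, 31.0), (45.0, 29.5), (80.0, 30.5), (55.0, 40.0)]
nominal, up, down = yield_variation(events, threshold=30.0, jes_unc=0.03)
```

Jets close to the threshold migrate in or out of the selection when the scale is shifted, which is what produces the yield variation; the same shifted inputs would be used to recompute the BDT output shape.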
Data/MC b-tagging scale factors are measured in heavy-flavor enhanced samples of jets that contain muons and are applied consistently to jets in signal and background events. The measured uncertainties for the b-tagging scale factors are: 3% per b-quark tag, 6% per charm-quark tag, and 15% per mistagged jet (originating from gluons and light u, d, or s quarks) [40]. These translate into yield uncertainties in the 3-15% range, depending on the channel and the specific process. The shape of the BDT output distribution is also affected by the shape of the CSV distributions and an uncertainty is assigned according to a range of variations of the CSV distributions.
The total VH signal cross section has been calculated to NNLO accuracy, and the total theoretical uncertainty is ≈4% [51], including the effect of scale variations and PDF uncertainties [25,52-55]. This analysis is performed in the boosted regime, and differences in the p T spectrum of the V and H bosons between data and MC introduce systematic effects in the signal acceptance and efficiency estimates. Two calculations are available that evaluate the NLO electroweak (EW) [56][57][58] and NNLO QCD [59] corrections to VH production in the boosted regime. Both the electroweak and QCD corrections are applied to the signal samples. The estimated uncertainties of the NLO electroweak corrections are 2% for both the ZH and WH production processes. The estimate for the NNLO QCD correction results in an uncertainty of 5% for both the ZH and WH production processes.
The uncertainty in the background event yields estimated from data is approximately 10%. For V+jets, the difference between the shape of the BDT output distribution for events generated with the MADGRAPH and the HERWIG++ Monte Carlo generators is considered as a shape systematic uncertainty. For tt, the differences between the shape of the BDT output distribution obtained from the nominal MADGRAPH samples and those obtained from the POWHEG and MC@NLO [60] generators are considered as shape systematic uncertainties.
An uncertainty of 15% is assigned to the event yields obtained from simulation for single-top-quark production. For the diboson backgrounds, a 15% cross section uncertainty is assumed. These uncertainties are consistent with the CMS measurements of these processes [61,62]. The limited number of MC simulated events is also taken into account as a source of uncertainty.
The combined effect of the systematic uncertainties results in an increase of about 15% on the expected upper limit on the Higgs boson production cross section and in a reduction of 15% on the expected significance of an observation when the Higgs boson is present in the data at the predicted standard model rate.

Results
Results are obtained from combined signal and background binned likelihood fits to the shape of the output distribution of the BDT discriminants. These are trained separately for each channel and for each Higgs boson mass hypothesis in the 110-135 GeV range. In the simultaneous fit to all channels, in all boost regions, the BDT shape and normalization for signal and for each background component are allowed to vary within the systematic and statistical uncertainties described in Section 7. These uncertainties are treated as independent nuisance parameters in the fit. All nuisance parameters, including the scale factors described in Section 6, are adjusted by the fit.
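A binned likelihood fit for a signal strength can be sketched in a few lines. This is a deliberately simplified illustration: it fits a single parameter µ with fixed templates and no nuisance parameters, whereas the fit described above also adjusts shape and normalization nuisance parameters. The function names, the golden-section minimiser, and the template values are assumptions made for this example.

```python
import math


def nll(mu, data, signal, background):
    """Poisson negative log-likelihood (constant terms dropped)."""
    total = 0.0
    for n, s, b in zip(data, signal, background):
        expected = mu * s + b
        total += expected - n * math.log(expected)
    return total


def fit_mu(data, signal, background, lo=0.0, hi=5.0, tol=1e-6):
    """Minimise the NLL over mu by golden-section search on [lo, hi]."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    left, right = lo, hi
    while right - left > tol:
        c = right - phi * (right - left)
        d = left + phi * (right - left)
        if nll(c, data, signal, background) < nll(d, data, signal, background):
            right = d  # minimum lies in [left, d]
        else:
            left = c   # minimum lies in [c, right]
    return 0.5 * (left + right)


# Hypothetical three-bin templates; the data equal signal + background.
signal = [5.0, 4.0, 3.0]
background = [100.0, 50.0, 20.0]
data = [s + b for s, b in zip(signal, background)]
mu_hat = fit_mu(data, signal, background)
```

Because the toy data are set exactly to signal plus background, the fitted µ comes out at 1, which is a convenient closure check for the minimiser.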
In total 14 BDT distributions are considered. Figure 4 shows an example of these distributions after the fit for the high-boost region of the Z(νν)H channel, for the m H = 125 GeV mass hypothesis. The four partitions in the left panel correspond to the subsets enriched in tt, V+jets, diboson, and VH production, as described in Section 5. The right panel shows the right-most, VH-enriched, partition in more detail. For completeness, all 14 BDT distributions used in the fit are shown in Figs. 10-14 in Appendix A. Table 9 lists, for partial combinations of channels, the total number of events in the four highest bins of their corresponding BDT for the expected backgrounds, for the 125 GeV SM Higgs boson signal, and for data. An excess compatible with the presence of the SM Higgs boson is observed. Figure 5 combines the BDT outputs of all channels where the events are gathered in bins of similar expected signal-to-background ratio, as given by the value of the output of their corresponding BDT discriminant (trained with a Higgs boson mass hypothesis of 125 GeV). The observed excess of events in the bins with the largest signal-to-background ratio is consistent with what is expected from the production of the standard model Higgs boson.
Table 8: Information about each source of systematic uncertainty, including whether it affects the shape or normalization of the BDT output, the uncertainty in signal or background event yields, and the relative contribution to the expected uncertainty in the signal strength, µ (defined as the ratio of the best-fit value for the production cross section for a 125 GeV Higgs boson, relative to the standard model cross section). Due to correlations, the total systematic uncertainty is less than the sum in quadrature of the individual uncertainties. The last column shows the percentage decrease in the total signal strength uncertainty, including statistical, when removing that specific source of uncertainty. The ranges quoted are due to the difference between 7 and 8 TeV data, different channels, specific background processes, and the different Higgs boson mass hypotheses. See text for details.
Table 9: The total number of events for partial combinations of channels in the four highest bins of their corresponding BDT for the expected backgrounds (B), for the 125 GeV SM Higgs boson VH signal (S), and for data. Also shown is the signal-to-background ratio (S/B).
Figure (caption partially recovered): ... The two bottom insets show the ratio of the data to the background-only prediction (above) and to the predicted sum of background and SM Higgs boson signal with a mass of 125 GeV (below).
The results of all channels, for all boost regions and for the 7 and 8 TeV data, are combined to obtain 95% confidence level (CL) upper limits on the product of the VH production cross section times the H → bb branching fraction, with respect to the expectations for a standard model Higgs boson (σ/σ_SM). At each mass point the observed limit, the median expected limit, and the 1 and 2 standard deviation bands are calculated using the modified frequentist method CLs [63][64][65]. Figure 6 displays the results.
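The CLs construction can be illustrated for the simplest possible case, a single-bin counting experiment in which the observed count itself serves as the test statistic. This is an assumption made to keep the sketch short; the analysis fits the full BDT shapes in many channels, and the test statistic it uses is not described in this section. Function names and inputs are hypothetical.

```python
import math


def poisson_cdf(n_obs, mean):
    """P(N <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total


def cls(n_obs, signal, background):
    """CLs = CL_{s+b} / CL_b for a one-bin counting experiment."""
    return poisson_cdf(n_obs, signal + background) / poisson_cdf(n_obs, background)


def upper_limit(n_obs, background, cl=0.95, hi=100.0, tol=1e-6):
    """Signal yield at which CLs drops to 1 - cl, found by bisection."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls(n_obs, mid, background) > 1.0 - cl:
            lo = mid  # signal still allowed, move the lower edge up
        else:
            hi = mid  # signal excluded, move the upper edge down
    return 0.5 * (lo + hi)


limit_no_bkg = upper_limit(0, 0.0)
limit_with_bkg = upper_limit(0, 3.0)
```

For zero observed events CLs reduces to exp(-s) regardless of the background, so both limits above equal -ln(0.05) ≈ 3.0 signal events. Dividing by CL_b is what prevents a downward background fluctuation from excluding arbitrarily small signals.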
For a Higgs boson mass of 125 GeV the expected limit is 0.95 and the observed limit is 1.89. Given that the resolution for the reconstructed Higgs boson mass is ≈10%, these results are compatible with a Higgs mass of 125 GeV. This is demonstrated by the red dashed line in the left panel of Fig. 6, which is the expected limit obtained from the sum of expected background and the signal of a SM Higgs boson with a mass of 125 GeV.
For all channels an excess of events over the expected background contributions is indicated by the fits of the BDT output distributions. The probability (p-value) to observe data as discrepant as observed under the background-only hypothesis is shown in the right panel of Fig. 6 as a function of the assumed Higgs boson mass. For m H = 125 GeV, the excess of observed events corresponds to a local significance of 2.1 standard deviations away from the background-only hypothesis. This is consistent with the 2.1 standard deviations expected when assuming the standard model prediction for Higgs boson production.
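The correspondence between a local p-value and a significance in standard deviations can be sketched as below, assuming the one-sided Gaussian tail convention (an assumption here, since the convention is not spelled out in this section).

```python
from statistics import NormalDist


def p_from_significance(z):
    """One-sided Gaussian tail probability for a significance of z sigma."""
    return 1.0 - NormalDist().cdf(z)


def significance_from_p(p_value):
    """Inverse of p_from_significance."""
    return NormalDist().inv_cdf(1.0 - p_value)


p_local = p_from_significance(2.1)
z_back = significance_from_p(p_local)
```

Under this convention a significance of 2.1 standard deviations corresponds to a p-value of about 0.018.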
The relative sensitivity of the channels that are topologically distinct is demonstrated in Table 10 for m H = 125 GeV. The table lists the expected and observed limits and local significance for the W(ℓν)H and W(τν)H channels combined, for the Z(ℓℓ)H channels combined, and for the Z(νν)H channel.
The best-fit values of the production cross section for a 125 GeV Higgs boson, relative to the standard model cross section (signal strength, µ), are shown in the left panel of Fig. 7 for the W(ℓν)H and W(τν)H channels combined, for the Z(ℓℓ)H channels combined, and for the Z(νν)H channel. The observed signal strengths are consistent with each other, and the value for the signal strength for the combination of all channels is 1.0 ± 0.5. In the right panel of Fig. 7 the correlation between the signal strengths for the separate WH and ZH production processes is shown. The two production modes are consistent with the SM expectation, within uncertainties. This figure contains slightly different information than the one on the left panel as some final states contain signal events that originate from both WH and ZH production processes. The WH process contributes approximately 20% of the Higgs boson signal event yields in the Z(νν)H channel, resulting from events in which the lepton is outside the detector acceptance, and the Z(ℓℓ)H process contributes less than 5% to the W(ℓν)H channel when one of the leptons is outside the detector acceptance. The dependency of the combined signal strength on the value assumed for the Higgs boson mass is shown in the left panel of Fig. 8.
In the right panel of Fig. 8 the best-fit values for the $\kappa_V$ and $\kappa_b$ parameters are shown. The parameter $\kappa_V$ quantifies the ratio of the measured Higgs boson couplings to vector bosons relative to the SM value. The parameter $\kappa_b$ quantifies the ratio of the measured Higgs boson partial width into bb relative to the SM value. They are defined as: $\kappa_V^2 = \sigma_{VH}/\sigma_{VH}^{\mathrm{SM}}$ and $\kappa_b^2 = \Gamma_{bb}/\Gamma_{bb}^{\mathrm{SM}}$, with the SM scaling of the total width [66]. The measured couplings are consistent with the expectations from the standard model, within uncertainties.

Results for the dijet mass cross-check analysis
The left panel of Fig. 9 shows a weighted dijet invariant mass distribution for the combination of all channels, in all boost regions, in the combined 7 and 8 TeV data, using the event selection for the m(jj) cross-check analysis described in Section 5. For each channel, the relative event weight in each boost region is obtained from the ratio of the expected number of signal events to the sum of expected signal and background events in a window of m(jj) values between 105 and 150 GeV. The expected signal used corresponds to the production of the SM Higgs boson with a mass of 125 GeV. The weight for the highest-boost region is set to 1.0 and all other weights are adjusted proportionally. Figure 9 also shows the same weighted dijet invariant mass distribution with all backgrounds, except diboson production, subtracted. The data are consistent with the presence of a diboson signal from the ZZ and WZ channels (with Z → bb), with a rate consistent with the standard model prediction from the MADGRAPH generator, together with a small excess consistent with the production of the standard model Higgs boson with a mass of 125 GeV. For the m(jj) analysis, a fit to the dijet invariant mass distribution results in a measured Higgs boson signal strength, relative to that predicted by the standard model, of µ = 0.8 ± 0.7, with a local significance of 1.1 standard deviations with respect to the background-only hypothesis. For a Higgs boson of mass 125 GeV, the expected and observed 95% CL upper limits on the production cross section, relative to the standard model prediction, are 1.4 and 2.0, respectively.
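The event weighting described above, S/(S+B) in the 105-150 GeV m(jj) window with the highest-boost region normalised to 1.0, can be sketched as follows. The function name, region labels, and yields are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def boost_region_weights(expected, reference):
    """Compute S/(S+B) weights per boost region, scaled so that the
    reference (highest-boost) region has weight 1.0.

    expected: dict mapping region name -> (S, B), the expected signal and
    background yields inside the m(jj) window.
    """
    raw = {region: s / (s + b) for region, (s, b) in expected.items()}
    return {region: w / raw[reference] for region, w in raw.items()}


# Illustrative yields only (not values from the analysis).
yields = {"low": (2.0, 198.0), "high": (3.0, 57.0)}
weights = boost_region_weights(yields, reference="high")
```

Here the raw weights are 0.01 and 0.05, so after normalisation the low-boost events enter the combined histogram with weight 0.2 and the high-boost events with weight 1.0.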

Diboson signal extraction
As a validation of the multivariate technique, BDT discriminants are trained using the diboson sample as signal, and all other processes, including VH production (at the predicted standard model rate for a 125 GeV Higgs mass), as background. This is done for the 8 TeV dataset only. The observed excess of events for the combined WZ and ZZ processes, with Z → bb, differs by over 7 standard deviations from the event yield expectation from the background-only hypothesis. The corresponding signal strength, relative to the prediction from the diboson MADGRAPH generator mentioned in Section 2, and rescaled to the cross section from the NLO MCFM generator, is measured to be $\mu_{VV} = 1.19^{+0.28}_{-0.23}$. Upper limits, at the 95% confidence level, on the VH production cross section times the H → bb branching fraction, with respect to the expectations for a standard model Higgs boson, are derived for the Higgs boson in the mass range 110-135 GeV. For a Higgs boson mass of 125 GeV the expected limit is 0.95 and the observed limit is 1.89.

Summary
An excess of events is observed above the expected background with a local significance of 2.1 standard deviations. The expected significance when taking into account the production of the standard model Higgs boson is also 2.1 standard deviations. The sensitivity of this search, as represented by the expected significance, is the highest for a single experiment thus far. The signal strength corresponding to this excess, relative to that of the standard model Higgs boson, is µ = 1.0 ± 0.5. The measurements presented in this article represent the first indication of the H → bb decay at the LHC.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies:
Fig. 15 shows these distributions for the highest-boost region in each channel, normalized to unity. See Section 8 for more details.
Figure 12: Post-fit BDT output distributions for W(τν)H for 8 TeV data (points with error bars), all backgrounds, and signal, after all selection criteria have been applied. The bottom inset shows the ratio of the number of events observed in data to that of the Monte Carlo prediction for signal and backgrounds.