Combined search for the Standard Model Higgs boson in pp collisions at √s = 7 TeV with the ATLAS detector

The ATLAS Collaboration
A combined search for the Standard Model Higgs boson with the ATLAS detector at the LHC is presented. The datasets used correspond to integrated luminosities from 4.6 fb⁻¹ to 4.9 fb⁻¹ of proton-proton collisions collected at √s = 7 TeV in 2011. The Higgs boson mass ranges of 111.4 GeV to 116.6 GeV, 119.4 GeV to 122.1 GeV, and 129.2 GeV to 541 GeV are excluded at the 95% confidence level, while the range 120 GeV to 560 GeV is expected to be excluded in the absence of a signal. An excess of events is observed at Higgs boson mass hypotheses around 126 GeV with a local significance of 2.9 standard deviations (σ). The global probability for the background to produce an excess at least as significant anywhere in the entire explored Higgs boson mass range of 110–600 GeV is estimated to be ∼15%, corresponding to a significance of approximately 1σ.

I. INTRODUCTION
Probing the mechanism for electroweak symmetry breaking (EWSB) is one of the prime objectives of the Large Hadron Collider (LHC). In the Standard Model (SM) [1][2][3], the electroweak interaction is described by a local gauge field theory with an SU(2)_L ⊗ U(1)_Y symmetry, and EWSB is achieved via the Higgs mechanism with a single SU(2)_L doublet of complex scalar fields [4][5][6][7][8][9]. After EWSB the electroweak sector has massive W± and Z bosons, a massless photon, and a massive CP-even scalar boson, referred to as the Higgs boson. Fermion masses are generated from Yukawa interactions with couplings proportional to the masses of the fermions. The mass of the Higgs boson, m_H, is a free parameter in the SM. However, for a given m_H hypothesis the cross sections of the various Higgs boson production processes and the branching fractions of the decay modes can be predicted, allowing a combined search with data from several search channels.
Combined searches at the CERN LEP e+e− collider excluded the production of a SM Higgs boson with a mass below 114.4 GeV at 95% confidence level (CL) [10]. The combined searches at the Fermilab Tevatron pp̄ collider excluded the production of a SM Higgs boson with a mass between 147 GeV and 179 GeV, and between 100 GeV and 106 GeV, at 95% CL [11]. Precision electroweak measurements are sensitive to m_H via radiative corrections and indirectly constrain the SM Higgs boson mass to be less than 158 GeV at 95% CL [12].
In 2011, the LHC delivered an integrated luminosity of 5.6 fb⁻¹ of proton-proton (pp) collisions at a center-of-mass energy of 7 TeV to the ATLAS detector [13]. Of the 4.9 fb⁻¹ collected, the integrated luminosity used in the individual Higgs search channels is between 4.6 fb⁻¹ and 4.9 fb⁻¹, depending on the data quality requirements specific to each channel. This paper presents a combined search for the SM Higgs boson in the decay modes H → γγ, H → ZZ(*), H → WW(*), H → τ+τ−, and H → bb̄, with subsequent decays of the W, Z, and τ leading to different final states. Some searches are designed to exploit the features of the production modes pp → H (gluon fusion), pp → qqH (vector boson fusion), and pp → VH with V = W± or Z (associated production with a gauge boson). To enhance the search sensitivity, the various decay modes are further subdivided into sub-channels with different signal and background contributions and different sensitivities to systematic uncertainties. While the selection requirements for individual search channels are disjoint, each selection is, in general, populated by more than one combination of Higgs boson production and decay. For instance, Higgs boson production via vector boson fusion (VBF) can contribute significantly to a search channel optimized for gluon fusion production.
The ATLAS collaboration has previously published a similar but less extensive combined search for the Higgs boson [14] using data taken at the LHC in 2011. The CMS collaboration has also performed a combined analysis of Higgs boson searches with data collected in 2011 and has obtained similar results [15]. In comparison to the analysis of Ref. [14], the H → τ+τ− and H → bb̄ channels have been added. The H → WW(*) → ℓ+νℓ−ν̄ analysis has been updated and extended to cover the mass range of 110–600 GeV. The H → WW → ℓνqq′, H → ZZ → ℓ+ℓ−νν̄, and H → ZZ → ℓ+ℓ−qq̄ analyses have been updated to use the full 2011 dataset. Both the H → WW(*) → ℓ+νℓ−ν̄ and H → WW → ℓνqq′ analyses include a specific treatment of the two-jet final state targeted at the VBF production process.
The different channels entering the combination are summarized in Table I. After describing the general approach to statistical modeling in Section II, the individual channels and the specific systematic uncertainties are described in Section III and Section IV, respectively. The statistical procedure is described in Section V and the resulting exclusion limits and compatibility with the background-only hypothesis are presented in Section VI and Section VII, respectively.

II. STATISTICAL MODELING
TABLE I. Summary of the individual channels entering the combination. The transition points between separately optimized m_H regions are indicated when applicable. The symbols ⊗ and ⊕ represent direct products or sums over sets of selection requirements. The details of the sub-channels are given in Section III.

In this combined analysis, a given search channel, indexed by c, is defined by its associated event selection criteria, which may select events from various physical processes. In addition to the number of selected events, n, each channel may make use of an invariant or transverse mass distribution of the Higgs boson candidates. The discriminating variable is denoted x and its probability density function (pdf) is written as f(x|α). Here α represents both theoretical parameters, such as m_H, and nuisance parameters associated with various systematic effects. These distributions are normalized to unit probability. The predicted number of events satisfying the selection requirements is parameterized as ν(α). For a channel with n selected events, the data consist of the values of the discriminating variable for each event, D = {x_1, ..., x_n}. The probability model for this type of data is referred to as an unbinned extended likelihood or marked Poisson model f, given by

$$ f(D\,|\,\alpha) = \mathrm{Pois}(n\,|\,\nu(\alpha)) \prod_{e=1}^{n} f(x_e\,|\,\alpha), \qquad \mathrm{Pois}(n\,|\,\nu) = \frac{\nu^{n} e^{-\nu}}{n!}. \tag{1} $$

For each channel, several signal and background scattering processes contribute to the total rate ν and the overall pdf f(x|α). Here, the term process is used for any set of scattering processes that can be added incoherently. The total rate is the sum of the individual rates, and the total pdf is the rate-weighted sum of the individual pdfs:

$$ \nu(\alpha) = \sum_{k} \nu_k(\alpha), \tag{2} $$

$$ f(x\,|\,\alpha) = \frac{1}{\nu(\alpha)} \sum_{k} \nu_k(\alpha)\, f_k(x\,|\,\alpha). \tag{3} $$

Let e index the n_c events in the c-th channel. Then x_ce is the value of the observable x for the e-th event in channel c, with c running from 1 to c_max. The total data are a collection of data from the individual channels: D_com = {D_1, ..., D_cmax}.
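The combined model can then be written as follows:

$$ f_{\mathrm{com}}(D_{\mathrm{com}}\,|\,\alpha) = \prod_{c=1}^{c_{\max}} \left[ \mathrm{Pois}(n_c\,|\,\nu_c(\alpha)) \prod_{e=1}^{n_c} f_c(x_{ce}\,|\,\alpha) \right]. \tag{4} $$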

A. Parameterization of the Model
The parameter of interest is the overall signal strength factor µ, which acts as a scale factor on the total rate of signal events. This global factor is used for all pairings of production cross sections and branching ratios. The signal strength is defined such that µ = 0 corresponds to the background-only model and µ = 1 corresponds to the SM Higgs boson signal. It is convenient to separate the full list of parameters α into three parts: the parameter of interest µ, the Higgs boson mass m_H, and the nuisance parameters θ, i.e. α = (µ, m_H, θ).
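The marked Poisson model and the signal strength parameterization can be made concrete with a short sketch. The Python toy below computes the negative log-likelihood for one channel containing a single signal process (rate µs) and a single background process (rate b), then sums over independent channels. The rates, the Gaussian and uniform pdf shapes, and the names `channel_nll` and `combined_nll` are all hypothetical; they are not taken from the ATLAS model. Nuisance parameters are omitted.

```python
import math
from typing import Callable, Dict, List, Sequence

Pdf = Callable[[float], float]


def channel_nll(events: Sequence[float], mu: float, s: float, b: float,
                f_s: Pdf, f_b: Pdf) -> float:
    """-ln f(D | mu) for one channel, dropping the data-only constant ln(n!).

    Hypothetical toy: one signal process (rate mu*s, pdf f_s) and one
    background process (rate b, pdf f_b).
    """
    nu = mu * s + b                      # total rate: sum of the process rates
    n = len(events)
    nll = nu - n * math.log(nu)          # Poisson term, up to ln(n!)
    for x in events:
        # total pdf: rate-weighted sum of the process pdfs, normalized by nu
        f_x = (mu * s * f_s(x) + b * f_b(x)) / nu
        nll -= math.log(f_x)
    return nll


def combined_nll(channels: List[Dict], mu: float) -> float:
    """Channels are statistically independent, so their -ln likelihoods add."""
    return sum(channel_nll(ch["events"], mu, ch["s"], ch["b"], ch["f_s"], ch["f_b"])
               for ch in channels)


def gauss(mean: float, sigma: float) -> Pdf:
    return lambda x: math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


def uniform(lo: float, hi: float) -> Pdf:
    return lambda x: 1.0 / (hi - lo) if lo <= x <= hi else 0.0


# Hypothetical toy inputs: a narrow signal near 125 GeV on a flat background.
signal, bkg = gauss(125.0, 2.0), uniform(100.0, 160.0)
channels = [
    {"events": [118.2, 124.6, 125.3, 140.9], "s": 2.0, "b": 5.0, "f_s": signal, "f_b": bkg},
    {"events": [105.0, 126.1], "s": 0.5, "b": 3.0, "f_s": signal, "f_b": bkg},
]
print(combined_nll(channels, mu=0.0), combined_nll(channels, mu=1.0))
```

The n ln ν term cancels against the 1/ν normalization of each event pdf. Each channel's term therefore reduces to ν − Σ_e ln[µ s f_s(x_e) + b f_b(x_e)], which is a convenient cross-check.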
Each channel in the combined model uses either the reconstructed transverse mass or the invariant mass of the Higgs boson candidate as a discriminating variable. Two approaches are adopted to model the signal pdfs at intermediate values of m_H where full simulation has not been performed. The first, used in the H → γγ channel, is based on a continuous parameterization of the signal as a function of m_H, using an analytical expression for the pdf validated with simulated Monte Carlo (MC) samples. The second, used in channels where pdfs are modeled with histograms, is based on an interpolation procedure using the algorithm of Ref. [24].
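The details of the interpolation algorithm of Ref. [24] are not given here. As a hedged illustration of the general idea of horizontal (quantile) interpolation between two simulated mass points, the Python sketch below interpolates the quantile functions of two templates linearly in m_H. For simplicity, the templates are analytic Gaussians from the standard-library `statistics.NormalDist`, whereas real templates would be histograms. The mass points, widths, and the function name `morph_quantile` are illustrative assumptions, and the sketch is not claimed to reproduce Ref. [24] exactly.

```python
from statistics import NormalDist
from typing import Callable

Quantile = Callable[[float], float]


def morph_quantile(q_low: Quantile, q_high: Quantile,
                   m_low: float, m_high: float, m_target: float) -> Quantile:
    """Horizontal interpolation between templates simulated at m_low and m_high.

    For each cumulative probability p, the interpolated quantile is the linear
    interpolation (in the mass hypothesis) of the two template quantiles.
    """
    if not m_low <= m_target <= m_high:
        raise ValueError("m_target must lie between the two simulated mass points")
    t = (m_target - m_low) / (m_high - m_low)

    def q(p: float) -> float:
        return (1.0 - t) * q_low(p) + t * q_high(p)

    return q


# Hypothetical signal templates simulated at 120 GeV and 130 GeV.
low = NormalDist(mu=120.0, sigma=1.8)
high = NormalDist(mu=130.0, sigma=2.2)
q125 = morph_quantile(low.inv_cdf, high.inv_cdf, 120.0, 130.0, 125.0)
print([round(q125(p), 3) for p in (0.16, 0.5, 0.84)])
```

Interpolating quantiles shifts the peak position smoothly instead of blending two separated peaks. For two Gaussians, the result is exactly a Gaussian with linearly interpolated mean and width (here 125 GeV and 2.0 GeV).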

B. Auxiliary Measurements
The nuisance parameters represent uncertain aspects of the model, such as the background normalization, reconstruction efficiencies, energy scale and resolution, luminosity, and theoretical predictions. These nuisance parameters are often estimated from auxiliary measurements, such as control regions, sidebands, or dedicated calibration measurements. A detailed account of these measurements is beyond the scope of this paper and is given in the references for the individual channels [16][17][18][19][20][21][22][23].
Each parameter α_p with a dedicated auxiliary measurement f_aux(D_aux|α_p, α_other) provides a maximum likelihood estimate a_p for α_p and a standard error σ_p. Thus, the detailed probability model for an auxiliary measurement is approximated as

$$ f_{\mathrm{aux}}(D_{\mathrm{aux}}\,|\,\alpha_p, \alpha_{\mathrm{other}}) \approx f_p(a_p\,|\,\alpha_p, \sigma_p). \tag{5} $$

The f_p(a_p|α_p, σ_p) are referred to as constraint terms. The fully frequentist procedure applied in the present analysis includes randomizing the a_p when constructing the ensemble of possible experiment outcomes. In the hybrid frequentist-Bayesian procedures used at LEP and the Tevatron, the a_p are held constant and the nuisance parameters α_p are randomized according to the prior probability density

$$ \pi(\alpha_p) \propto f_p(a_p\,|\,\alpha_p, \sigma_p)\, \eta(\alpha_p), \tag{6} $$

where η(α_p) is an original prior, usually taken to be constant.
The set of nuisance parameters constrained by auxiliary measurements is denoted S. The set of estimates of those parameters, also referred to as global observables, augments D_com and is denoted G = {a_p} with p ∈ S. Including the constraint terms explicitly, the model can be rewritten as

$$ f_{\mathrm{tot}}(D_{\mathrm{com}}, G\,|\,\alpha) = \prod_{c=1}^{c_{\max}} \left[ \mathrm{Pois}(n_c\,|\,\nu_c(\alpha)) \prod_{e=1}^{n_c} f_c(x_{ce}\,|\,\alpha) \right] \prod_{p \in S} f_p(a_p\,|\,\alpha_p, \sigma_p). \tag{7} $$

The use of a Gaussian constraint term f_p(a_p|α_p, σ_p) = Gauss(a_p|α_p, σ_p) is problematic if the parameter is intrinsically non-negative, as is the case for event yields and energy scales. This is particularly important when the relative uncertainty is large. An alternative constraint term, defined only for positive parameter values, is the log-normal distribution, which is given by

$$ f_p(a_p\,|\,\alpha_p, \sigma_p) = \frac{1}{\sqrt{2\pi}\,\ln\kappa}\, \frac{1}{a_p} \exp\!\left[ -\frac{\left(\ln(a_p/\alpha_p)\right)^2}{2\,(\ln\kappa)^2} \right]. \tag{8} $$

The conventional choice κ = 1 + σ_rel is made, where σ_rel is the relative uncertainty σ_p/a_p from the observed auxiliary measurement [25].
Using the log-normal distribution for a_p is equivalent to using a Gaussian constraint of width ln κ on the transformed parameters a′_p = ln a_p and α′_p = ln α_p.
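The Python sketch below evaluates the log-normal constraint term with κ = 1 + σ_rel and checks the stated equivalence numerically. It verifies that a_p times the log-normal density equals a Gaussian in ln a_p with mean ln α_p and width ln κ; the factor a_p is the Jacobian of the change of variables. The auxiliary-measurement values are hypothetical, and the helper names are assumptions for illustration.

```python
import math


def kappa_from(a_p: float, sigma_p: float) -> float:
    """Conventional log-normal width parameter kappa = 1 + sigma_rel."""
    return 1.0 + sigma_p / a_p


def lognormal_constraint(a_p: float, alpha_p: float, kappa: float) -> float:
    """Log-normal constraint term f_p(a_p | alpha_p), defined only for a_p, alpha_p > 0."""
    if a_p <= 0.0 or alpha_p <= 0.0:
        raise ValueError("the log-normal constraint is defined only for positive values")
    w = math.log(kappa)
    z = math.log(a_p / alpha_p) / w
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * w * a_p)


def gauss(x: float, mean: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


# Hypothetical auxiliary measurement: estimate 1.0 with a 10% standard error.
kappa = kappa_from(a_p=1.0, sigma_p=0.1)
for alpha in (0.9, 1.0, 1.1):
    a = 1.05
    # Jacobian check: a * f_lognormal(a | alpha) == Gauss(ln a | ln alpha, ln kappa)
    print(alpha, a * lognormal_constraint(a, alpha, kappa),
          gauss(math.log(a), math.log(alpha), math.log(kappa)))
```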
For channels that use histograms based on simulated MC samples, the parametric pdf f(x|α) is formed by interpolating between histogram variations evaluated at α_p = a_p ± σ_p. Since the variations need not be symmetric, the function is treated piecewise. A sixth-order polynomial interpolates in the range α_p ∈ [a_p − σ_p, a_p + σ_p], with coefficients chosen to match the first and second derivatives at the boundaries of this range [26]. Henceforth, the prime will be suppressed and α_p will refer to the transformed nuisance parameter.
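The Python sketch below shows one way to build such a piecewise interpolation for a single histogram bin. It is written in terms of a standardized parameter α, equal to 0 at the nominal value and ±1 at the ±1σ variations. The behavior outside ±1σ is not specified in the text, so linear extrapolation is assumed here. The six polynomial coefficients are then fixed by requiring the value, first derivative, and second derivative to match the linear pieces at α = ±1. Solving those six conditions gives the closed form used below. The implementation in Ref. [26] may differ in detail, and the function name is hypothetical.

```python
def interpolate_bin(alpha: float, nominal: float, up: float, down: float) -> float:
    """Piecewise interpolation of one bin content in a standardized nuisance parameter.

    |alpha| >= 1: linear extrapolation from the +1 sigma (up) or -1 sigma (down)
                  variation (an assumption for this sketch).
    |alpha| <  1: sixth-order polynomial nominal + sum_{i=1..6} a_i alpha^i whose
                  value, first and second derivatives match the linear pieces at +-1.
    """
    d_up = up - nominal        # slope of the linear piece for alpha >= 1
    d_down = nominal - down    # slope of the linear piece for alpha <= -1
    if alpha >= 1.0:
        return nominal + alpha * d_up
    if alpha <= -1.0:
        return nominal + alpha * d_down
    s = 0.5 * (d_up + d_down)  # odd part of the polynomial turns out to be purely linear
    d = d_up - d_down          # asymmetry between variations drives the even part
    a2 = alpha * alpha
    even = d * (15.0 / 16.0 * a2 - 5.0 / 8.0 * a2 ** 2 + 3.0 / 16.0 * a2 ** 3)
    return nominal + s * alpha + even


# Hypothetical asymmetric variation: nominal 10, +1 sigma -> 12, -1 sigma -> 9.
print([round(interpolate_bin(a, 10.0, 12.0, 9.0), 4) for a in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)])
```

For symmetric variations (d = 0), the polynomial reduces to the straight line through the three points. The higher-order terms only smooth the change in slope at α = 0 that asymmetric variations would otherwise produce.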
Not all systematic uncertainties have an associated auxiliary measurement. For example, uncertainties associated with the choice of renormalization and factorization scales and missing higher-order corrections in a theoretical calculation are not statistical in nature. In these cases, the frequentist form of the constraint term is derived assuming, by convention, a log-normal prior probability density on these parameters and inverting Eq. (6).

III. INDIVIDUAL SEARCH CHANNELS
All the channels combined in the search for the SM Higgs boson use the complete 2011 dataset passing the relevant quality requirements. The Higgs boson decays considered are H → γγ, H → WW(*), H → ZZ(*), H → τ+τ−, and H → bb̄. In modes with a W or Z boson, an electron or muon is required for triggering. In the H → τ+τ− channel, almost all combinations of subsequent τ decays are considered. The results in the γγ and ℓ+ℓ−ℓ+ℓ− modes are the same as in the previously published combination [14], but all other channels have been updated. A summary of the individual channels contributing to this combination is given in Table I.
The invariant and transverse mass distributions for the individual channels are shown in Figs. 1 and 2, with several sub-channels merged.
• H → γγ: This analysis is unchanged with respect to the previous combined search [14,16] and is carried out for m_H hypotheses between 110 GeV and 150 GeV. Events are separated into nine independent categories of varying sensitivity. The categorization is based on three properties: the pseudorapidity of each photon, whether each photon was reconstructed as converted or unconverted, and the momentum component of the diphoton system transverse to the diphoton thrust axis (p_Tt). The mass resolution is approximately 1.7% for m_H ∼ 120 GeV.
[Figure caption fragments: "... shown before the final selection requirements are applied." "... the ZH → νν̄bb̄ channels. The vertical dashed lines illustrate the separation between the mass spectra of the subcategories in p_T^Z, p_T^W, and E_T^miss, respectively. The signal distributions are lightly shaded where they have been scaled by a factor of five or ten for illustration purposes."]
• H → ZZ(*): In the ZZ(*) decay mode, at least one Z boson is required to decay to charged leptons, while the other decays to leptons, neutrinos, or jets.
- H → ZZ(*) → ℓ+ℓ−ℓ+ℓ−: This analysis, described in Ref. [17], is performed for m_H hypotheses in the 110 GeV to 600 GeV mass range and is unchanged with respect to the previous combined search [14]. The main irreducible ZZ(*) background is estimated using a combination of Monte Carlo simulation and the observed data. The reducible Z+jets background, which mostly affects the low four-lepton invariant mass region, is estimated from control regions in the data. The top-quark (tt̄) background normalization is validated using a dedicated control sample. Four categories of events are defined by the lepton flavor combinations, and the four-lepton invariant mass is used as the discriminating variable. The mass resolution is approximately 1.5% in the four-muon channel and 2% in the four-electron channel for m_H ∼ 120 GeV.
- H → ZZ → ℓ+ℓ−νν̄: This analysis [18] is split into two regimes according to the level of pile-up, i.e. the average number of pp collisions per bunch crossing. The first 2.3 fb⁻¹ of data had an average of about six pile-up collisions per event and the subsequent 2.4 fb⁻¹ had an average of about twelve. The search is performed for m_H hypotheses ranging from 200 GeV to 600 GeV. The analysis is further categorized by the flavor of the leptons from the Z decay. The selection is optimized separately for Higgs boson masses above and below 280 GeV. The ℓ+ℓ− invariant mass is required to be within 15 GeV of the Z boson mass. The inverted requirement is applied to same-flavor leptons in the H → WW(*) → ℓ+νℓ−ν̄ channel to avoid overlap between the selections. The transverse mass (m_T), computed from the dilepton transverse momentum and the missing transverse momentum, is used as the discriminating variable.
- H → ZZ → ℓ+ℓ−qq̄: This search is performed for m_H hypotheses ranging from 200 GeV to 600 GeV. It is separated into search regions above and below m_H = 300 GeV, for which the event selections are independently optimized. The dominant background arises from Z+jets production, which is estimated using sidebands of the dijet invariant mass distribution in data. The signal contains a relatively large rate of b-jets from Z boson decays compared with the Z+jets background. To exploit this, the analysis is divided into two categories: the first contains events in which both jets are b-tagged, and the second contains events with fewer than two b-tags. The analysis [19] takes advantage of a highly efficient b-tagging algorithm [27] and uses the sidebands to constrain the background yield. Using the Z boson mass constraint improves the mass resolution of the ℓ+ℓ−qq̄ system by more than a factor of two. The invariant mass of the ℓ+ℓ−qq̄ system is used as the discriminating variable.
• H → WW(*): Two sets of channels are devoted to the decay of the Higgs boson into a pair of W bosons, namely the ℓ+νℓ−ν̄ and ℓνqq′ channels.
- H → WW(*) → ℓ+νℓ−ν̄: The updated analysis [20] is performed for m_H values from 110 GeV up to 600 GeV. Events with two leptons are classified by the number of associated jets (0, 1, or 2), where the two-jet category has selection criteria designed to enhance sensitivity to the VBF production process. The events are further divided by the flavors of the charged leptons (ee, eµ, and µµ), where the mixed-flavor mode (eµ) has a much smaller background from the Drell-Yan process. As in the case of H → ZZ → ℓ+ℓ−νν̄, the samples are split according to the pile-up conditions and analyzed separately. Each sub-channel uses the WW transverse mass distribution as the discriminating variable, except for the two-jet category, which does not use a discriminating variable.
- H → WW → ℓνqq′: This analysis is performed for m_H hypotheses ranging from 300 GeV to 600 GeV. A leptonically decaying W boson is tagged with an isolated lepton and missing transverse momentum (E_T^miss). Additionally, two jets with an invariant mass compatible with a second W boson are required [21]. The W boson mass constraint allows the Higgs boson candidate mass to be reconstructed event by event, by solving a quadratic equation for the component of the neutrino momentum along the beam axis. Events for which this equation has no real solution are discarded in order to reduce tails in the mass distribution. The analysis searches for a peak in the reconstructed ℓνqq′ mass distribution, with the background modeled by a smooth function. The analysis is further divided by lepton flavor and by the number of additional jets (0, 1, or 2), where the two-jet channel is optimized for the VBF production process.
• H → τ+τ−: The analyses [22] are categorized by the decay modes of the two τ leptons and are performed for m_H hypotheses ranging from 110 GeV to 150 GeV. Leptonically decaying τ leptons are denoted τ_lep and hadronically decaying τ leptons are denoted τ_had. Most of these sub-channels are triggered using leptons. The exception is the fully hadronic channel H → τ_had τ_had, which is triggered with dedicated double-hadronic-τ selections. All the searches using τ decay modes have a significant background from Z → τ+τ− decays. This background is modeled using an embedding technique in which Z → µ+µ− candidates selected in the data have the muons replaced by simulated τ decays [22].
- H → τ_lep τ_lep: In this channel, events are analyzed separately in four disjoint categories based on the number of reconstructed jets in the event [22]. Two categories are aimed specifically at the gluon fusion production process, with or without a jet. One targets the VBF production process, and one targets Higgs boson production in association with a hadronically decaying vector boson. Each jet category requires at least one jet with p_T above 40 GeV. The collinear approximation [28] is used to reconstruct the ττ invariant mass, which is used as the discriminating variable. All three combinations of e and µ are used, except in the 0-jet category. That category uses only eµ candidate events, with the effective mass as the discriminating variable.
- H → τ_lep τ_had: There are seven separate categories in this sub-channel. The selection of VBF-like events requires two jets with pseudorapidities η of opposite sign, |Δη_jj| > 3.0, and a dijet invariant mass larger than 300 GeV. In this category, events with electrons and muons are combined due to the limited number of candidates. In the other sub-channels, electron and muon final states are considered separately. The remaining candidate events are categorized according to the number of jets with transverse momenta in excess of 25 GeV. The 0-jet category is further subdivided according to whether E_T^miss exceeds 20 GeV. The Missing Mass Calculator (MMC) technique [29] is used to estimate the ττ invariant mass, which is used as the discriminating variable.
- H + jet → τ_had τ_had + jet: Events are triggered using a selection of two hadronically decaying τ leptons, with transverse energy thresholds that vary according to the running conditions [22]. The selection requires two oppositely charged hadronically decaying τ candidates, one jet with transverse momentum larger than 40 GeV, E_T^miss > 20 GeV, and a reconstructed invariant mass of the two τ leptons and the jet greater than 225 GeV. In addition to the Z background, there is a significant multijet background, which is estimated using data-driven methods. The ττ invariant mass is estimated via the collinear approximation. It is used as the discriminating variable after further selections on the momentum fractions carried by the visible τ decay products.
• H → bb̄: The ZH → ℓ+ℓ−bb̄, ZH → νν̄bb̄, and WH → ℓνbb̄ analyses [23] are performed for m_H ranging from 110 GeV to 130 GeV. All three analyses require two b-tagged jets, one with p_T > 45 GeV and the other with p_T > 25 GeV. The invariant mass of the two b-jets, m_bb, is used as the discriminating variable. The ZH → ℓ+ℓ−bb̄ analysis requires a dilepton invariant mass in the range 83 GeV < m_ℓℓ < 99 GeV and E_T^miss < 50 GeV to suppress the tt̄ background. The WH → ℓνbb̄ analysis requires E_T^miss > 25 GeV, a transverse mass of the lepton-E_T^miss system in excess of 40 GeV, and no additional leptons with p_T > 20 GeV. The ZH → νν̄bb̄ analysis requires E_T^miss > 120 GeV, as well as p_T^miss > 30 GeV, where p_T^miss is the missing transverse momentum determined from the tracks associated with the primary vertex. To increase the sensitivity of the search, the m_bb distribution is examined in sub-channels with different signal-to-background ratios. In the searches with one or two charged leptons, the division is made according to four bins in the transverse momentum p_T^V of the reconstructed vector boson V. In the ZH → νν̄bb̄ search, E_T^miss is used to define three sub-channels: 120 GeV < E_T^miss < 160 GeV, 160 GeV ≤ E_T^miss < 200 GeV, and E_T^miss ≥ 200 GeV. No categorization is made based on lepton flavor.
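Two kinematic reconstructions mentioned in the channel descriptions above can be sketched in Python.

The first solves the W boson mass constraint for the longitudinal neutrino momentum in the ℓνqq′ channel. It treats the lepton and neutrino as massless and returns nothing when the quadratic has no real solution, which corresponds to the discarded events.

The second implements the collinear approximation used in the τ channels. Each neutrino system is assumed to be collinear with the visible τ decay products, so E_T^miss fixes the visible momentum fractions x1 and x2, and m_ττ = m_vis/√(x1 x2).

Several choices here are assumptions for illustration: the default W mass of 80.4 GeV, returning both roots (the text does not say which root is used), rejecting x outside (0, 1], and all function names.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]   # (px, py, pz) in GeV


def neutrino_pz(lep: Vec3, met: Tuple[float, float],
                m_w: float = 80.4) -> Optional[Tuple[float, float]]:
    """Solve m_W^2 = (p_lep + p_nu)^2 for the neutrino p_z (massless lepton and neutrino).

    Returns both roots, or None when the discriminant is negative (no real solution).
    """
    px, py, pz = lep
    e_lep = math.sqrt(px * px + py * py + pz * pz)
    pt2_lep = px * px + py * py
    mu = 0.5 * m_w * m_w + px * met[0] + py * met[1]
    pt2_nu = met[0] ** 2 + met[1] ** 2
    disc = e_lep * e_lep * (mu * mu - pt2_lep * pt2_nu)
    if disc < 0.0:
        return None
    center = mu * pz / pt2_lep
    half_width = math.sqrt(disc) / pt2_lep
    return center - half_width, center + half_width


def collinear_mass(vis1: Vec3, vis2: Vec3, m_vis: float,
                   met: Tuple[float, float]) -> Optional[Tuple[float, float, float]]:
    """Collinear approximation: the neutrinos from tau_i carry (1/x_i - 1) * p_vis_i.

    Solves E_T^miss = r1 * pT_vis1 + r2 * pT_vis2 with r_i = 1/x_i - 1 (Cramer's rule).
    Returns (x1, x2, m_tautau), or None for back-to-back or unphysical topologies.
    """
    det = vis1[0] * vis2[1] - vis1[1] * vis2[0]
    if abs(det) < 1e-9:
        return None                      # visible products back-to-back in the transverse plane
    r1 = (met[0] * vis2[1] - met[1] * vis2[0]) / det
    r2 = (vis1[0] * met[1] - vis1[1] * met[0]) / det
    if r1 < 0.0 or r2 < 0.0:
        return None                      # would imply x_i > 1 (or x_i <= 0): unphysical
    x1, x2 = 1.0 / (1.0 + r1), 1.0 / (1.0 + r2)
    return x1, x2, m_vis / math.sqrt(x1 * x2)


# Hypothetical inputs for illustration only.
print(neutrino_pz((30.0, 0.0, 20.0), (-10.0, 25.0)))
print(collinear_mass((40.0, 0.0, 10.0), (0.0, 30.0, -5.0), 50.0, (40.0, 10.0)))
```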

IV. SYSTEMATIC UNCERTAINTIES
The sources of systematic uncertainty and their effects on the signal and background rates ν_k(α) and discriminating variable distributions f_k(x|α) are described in detail for each channel in Refs. [16][17][18][19][20][21][22][23]. The sources of systematic uncertainty are decomposed into uncorrelated components, such that the constraint terms factorize as in Eq. (7). The main focus of the combination of channels is the correlated effect of given sources of uncertainty across channels. Typically, the correlated effects arise from ingredients common to several channels, for example the simulation, the lepton and photon identification, and the integrated luminosity. The sources of systematic uncertainty affecting the signal model are frequently different from those affecting the backgrounds, which are often estimated from control regions in the data. The dominant uncertainties giving rise to correlated effects fall into two groups. The first comprises those associated with theoretical predictions for the signal production cross sections and decay branching fractions. The second comprises those related to the detector response affecting the reconstruction of electrons, photons, muons, jets, E_T^miss, and b-tagging. Log-normal constraint terms are used for uncertainties in the signal and background normalizations, while Gaussian constraints are used for uncertainties affecting the shapes of the pdfs.

B. Theoretical Uncertainties Affecting the Background
In the H → γγ and H → WW → ℓνqq′ channels, the backgrounds are estimated from a fit to the data, which removes almost all sensitivity to the corresponding theoretical uncertainties. In the case of the H → γγ analysis, an additional uncertainty is assigned to account for possible inadequacies of the chosen analytical background model. Theoretical uncertainties enter in all other channels where theoretical calculations are used for background estimates. In particular, both signal and background processes are sensitive to the parton distribution functions, the underlying event simulation, and the parton shower model.
The ZZ(*) continuum process is the main background for the H → ZZ(*) → ℓ+ℓ−ℓ+ℓ− and H → ZZ → ℓ+ℓ−νν̄ analyses and is also part of the background in the H → ZZ → ℓ+ℓ−qq̄ channel. An NLO prediction [65] is used for the normalization. The QCD scale uncertainty has a ±5% effect on the expected ZZ(*) background. The effect of the PDF and α_S uncertainties is ±4% for quark-initiated processes and ±8% for gluon-initiated processes. An additional theoretical uncertainty of ±10% on the inclusive ZZ(*) cross section is conservatively included to account for the missing higher-order QCD corrections to the gluon-initiated process. This theoretical uncertainty is treated as uncorrelated between channels for two reasons: the acceptances of the H → ZZ(*) → ℓ+ℓ−ℓ+ℓ− and H → ZZ → ℓ+ℓ−νν̄ channels differ, and its contribution to the H → ZZ → ℓ+ℓ−qq̄ channel is small.
In most other channels, the overall normalization of the main backgrounds is not estimated from theoretical predictions. However, simulations are used to model the pdfs f(x|α) or the scale factors used to extrapolate from the control regions to the signal regions. For example, in the H → WW(*) → ℓ+νℓ−ν̄ channel the main backgrounds are continuum WW(*) and tt̄ production. Their normalizations are estimated in control regions, but the factors used to extrapolate to the signal region are estimated with the NLO simulation [66].

C. Experimental Uncertainties
The uncertainty on the integrated luminosity is considered as being fully correlated among channels and amounts to ±3.9% [67,68].
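A fully correlated uncertainty such as the luminosity can be represented by a single nuisance parameter shared by all channels. The Python sketch below assumes the common exponential response ν_c(θ) = ν_c^nom κ^θ, with κ = 1.039 taken from the ±3.9% quoted above and θ constrained by a unit Gaussian. This response is equivalent to a log-normal constraint on the multiplicative factor. The channel names and nominal yields are hypothetical, and all other nuisance parameters are ignored.

```python
from typing import Dict

LUMI_KAPPA = 1.039  # 1 + relative luminosity uncertainty (3.9%)


def scaled_yields(nominal: Dict[str, float], theta_lumi: float,
                  kappa: float = LUMI_KAPPA) -> Dict[str, float]:
    """Apply one shared luminosity nuisance parameter to every channel's expected yield."""
    factor = kappa ** theta_lumi   # identical multiplicative shift in every channel
    return {name: nu * factor for name, nu in nominal.items()}


def lumi_constraint_nll(theta_lumi: float) -> float:
    """-ln of a unit-Gaussian constraint on theta_lumi (constant term dropped)."""
    return 0.5 * theta_lumi ** 2


# Hypothetical nominal yields in three channels.
nominal = {"gamgam": 120.0, "4l": 3.5, "lvlv": 45.0}
print({k: round(v, 3) for k, v in scaled_yields(nominal, theta_lumi=1.0).items()})
```

Because every channel reads the same θ, a pull in one channel moves all the others coherently, which is what "fully correlated" means in the combined fit.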
The detector-related sources of systematic uncertainty can affect various aspects of the analysis: (a) the overall normalization of the signal or background, (b) the migration of events between categories, and (c) the shape of the discriminating variable distributions f(x|α). As with the theoretical uncertainties, experimental uncertainties on the event yields (a) are treated using a log-normal constraint pdf f_p(a_p|α_p). In cases (b) and (c), a Gaussian constraint is applied.
The experimental sources of systematic uncertainty are modeled using the classification detailed below. Their effect on the signal and background yields in each channel separately is reported in Table II. The various sources of systematic uncertainty have in some cases been grouped for a concise presentation (e.g. the jet energy scale and b-tagging efficiencies), while the full statistical model of the data provides a more detailed account of the various systematic effects including the effect on the pdfs f (x|α). The assumptions made in the treatment of systematics are outlined below.
• The uncertainties in the trigger and identification efficiencies are treated as fully correlated for electrons and photons. The energy scale and resolution for photons and electrons are treated as uncorrelated sources of uncertainty.
• The uncertainties affecting muons are separated into those related to the inner detector (ID) and the muon spectrometer (MS) in order to provide a better description of the correlated effect among channels using different muon identification criteria and different ranges of muon transverse momenta.
• The jet energy scale (JES) and jet energy resolution (JER) are sensitive to a number of uncertain quantities, which depend on the p_T, η, and flavor of the jet. Measurements of the JES and JER result in complicated correlations among these components, so building a complete model of the response to these correlated sources of uncertainty is intricate. Here, a simplified scheme is used in which independent JES and JER nuisance parameters are assigned to two kinds of component: channels with significantly different kinematic requirements, and scattering processes with different kinematic distributions or flavor composition. This scheme includes a specific treatment for b-jets. The sensitivity of the results to various assumptions about the correlations between these sources of uncertainty has been found to be negligible. Furthermore, an additional component of the uncertainty in E_T^miss, which is uncorrelated with the JES uncertainty, is included.
• While the τ energy scale uncertainty is expected to be partially correlated with the JES, here it is treated as an uncorrelated source of uncertainty. This choice is motivated by its effect being largely degenerate with that of the uncertainty associated with the embedding procedure, in which the simulated detector response to hadronic τ decays is merged with a sample of Z → µ+µ− data events. Furthermore, the uncertainty of this embedding procedure is treated separately for signal and background processes. This is a conservative approach, given that the Z → τ+τ− sideband effectively constrains this nuisance parameter.
• The b-tagging systematic uncertainty is decomposed into five fundamental sources in the H → bb̄ channels, while a simplified model with a single source is used in the H → ZZ → ℓ+ℓ−qq̄ channel. The uncertainty in the b-veto is considered uncorrelated with the uncertainty in the b-tagging efficiency.
The effect of these systematic uncertainties depends on the final state, but is typically small compared to the theoretical uncertainty of the production cross section.
The electron and muon energy scales are directly constrained by Z → e + e − and Z → µ + µ − events; the impact of the resulting systematic uncertainty on the four-lepton invariant mass is of the order of ±0.5% for electrons and negligible for muons. The impact of the photon energy scale systematic uncertainty on the diphoton invariant mass is approximately ±0.6%.

D. Background Measurement Uncertainty
The estimates of background normalizations and model parameters from control regions or sidebands are the main remaining source of uncertainty. Because of the differences in control regions these uncertainties are not correlated across channels.
In the case of the H → bb̄ channels, the background normalizations are constrained both from sideband fits and from auxiliary measurements based on the MC prediction of the main background processes (Z+jets, W+jets, and tt̄).
The uncorrelated sources of systematic uncertainties are summarized as a single combined number in Table II for each channel.

E. Summary of the Combined Model
To cover the search range between m_H = 110 GeV and m_H = 600 GeV efficiently, the signal and backgrounds are modeled, and the combination is performed, in m_H steps that reflect the interplay between the invariant mass resolution and the natural width of the Higgs boson. In the low mass range, where the high mass-resolution H → γγ and H → ZZ(*) → ℓ+ℓ−ℓ+ℓ− channels dominate, the signal is modeled in steps of 500 MeV to 2 GeV. For higher masses, the combination is performed with step sizes ranging from 2 GeV to 20 GeV. The m_H step sizes are given in Table III. The combined model and statistical procedure are implemented within the RooFit, HistFactory, and RooStats software frameworks [26,69,70]. As shown in Table I, the number of channels included in the combination depends on the hypothesized value of m_H. The numbers of channels, nuisance parameters, and constraint terms for various m_H ranges are given in Table IV. For m_H = 125 GeV, the combined statistical model includes 70 channels, and the associated dataset comprises more than 22,000 unbinned events and 8,000 bins. For m_H = 350 GeV, the combined statistical model includes 46 channels, and the associated dataset comprises only binned distributions, with more than 4,000 bins in total. Due to the limited size of the MC samples, additional nuisance parameters and constraint terms are included in the model to account for the statistical uncertainty in the MC templates. Table IV reports both the number of nuisance parameters and the number of constraints. The difference between them is the number of nuisance parameters without external constraints, which are estimated in data control regions or sidebands.

V. STATISTICAL PROCEDURE
The procedures for computing frequentist p-values, used to quantify the agreement of the data with the background-only hypothesis and to determine exclusion limits, are based on the profile likelihood ratio test statistic.
For a given dataset D_com and values of the global observables G, there is an associated likelihood function of µ and θ. This function is derived from the combined model over all channels, including all constraint terms in Eq. (7):
$$
L(\mu, \theta; m_H, \mathcal{D}_{\rm com}, \mathcal{G}) = f_{\rm tot}(\mathcal{D}_{\rm com}, \mathcal{G} \,|\, \mu, m_H, \theta) . \tag{9}
$$
The notation L(µ, θ) leaves the dependence on the data implicit.
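The following Python sketch shows how a likelihood of the form of Eq. (9) combines channels. It evaluates −ln L(µ, θ) for a toy model of independent single-bin counting channels that share one nuisance parameter θ. Several choices here are assumptions made only for this example: the exponential background response κ_c^θ, the unit-Gaussian constraint term, and all function and argument names. None of them represents the ATLAS channel models or the RooFit/HistFactory interface.

```python
import math


def poisson_nll(n, nu):
    """-ln Pois(n | nu), including the ln(n!) term."""
    return nu - n * math.log(nu) + math.lgamma(n + 1)


def combined_nll(mu, theta, data, sig, bkg, kappa, global_obs=0.0):
    """-ln L(mu, theta) for independent counting channels (hypothetical toy model).

    Channel c expects nu_c = mu * s_c + b_c * kappa_c**theta events, i.e. the
    shared nuisance parameter theta scales every background (assumed response).
    Requires nu_c > 0 in every channel.
    """
    nll = 0.0
    for n_c, s_c, b_c, k_c in zip(data, sig, bkg, kappa):
        nu_c = mu * s_c + b_c * k_c ** theta
        nll += poisson_nll(n_c, nu_c)
    # Constraint term: unit Gaussian for the global observable around theta,
    # dropping the constant 0.5 * ln(2 * pi).
    nll += 0.5 * (global_obs - theta) ** 2
    return nll
```

Minimizing such a function over θ at fixed µ yields the conditional estimate of θ. A real analysis delegates that minimization to a fitting package rather than coding it by hand.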

A. The Test Statistics and Estimators of µ and θ
The statistics used to test different values of the strength parameter µ are defined in terms of the likelihood function L(µ, θ). The maximum likelihood estimates (MLEs) µ̂ and θ̂ are the values of the parameters that maximize L(µ, θ). The conditional maximum likelihood estimate (CMLE) θ̂̂(µ) is the value of θ that maximizes the likelihood function with µ fixed. The tests are based on the profile likelihood ratio λ(µ), which reflects the level of compatibility between the data and µ. It is defined as
$$
\lambda(\mu) = \frac{L(\mu, \hat{\hat{\theta}}(\mu))}{L(\hat{\mu}, \hat{\theta})} . \tag{10}
$$
Physically, the rate of signal events is non-negative, thus µ ≥ 0. However, it is convenient to define the estimator µ̂ as the value of µ that maximizes the likelihood, even if it is negative (as long as the pdf f_c(x_c|µ, θ) ≥ 0 everywhere). In particular, µ̂ < 0 indicates a deficit of events with respect to the background in the signal region. Following Ref. [71], a treatment equivalent to requiring µ ≥ 0 is to allow µ < 0 and impose the constraint in the test statistic itself, i.e.
$$
\tilde{\lambda}(\mu) =
\begin{cases}
\dfrac{L(\mu, \hat{\hat{\theta}}(\mu))}{L(\hat{\mu}, \hat{\theta})} & \hat{\mu} \ge 0 , \\[2ex]
\dfrac{L(\mu, \hat{\hat{\theta}}(\mu))}{L(0, \hat{\hat{\theta}}(0))} & \hat{\mu} < 0 .
\end{cases} \tag{11}
$$
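A model simple enough to profile analytically makes λ(µ) and its bounded variant λ̃(µ) concrete. The sketch below assumes a hypothetical on/off counting experiment: n ~ Pois(µs + b) in a signal region and m ~ Pois(τb) in a control region, with the background b as the only nuisance parameter. For fixed µ, the conditional MLE of b is the positive root of a quadratic. This toy model and the function names serve only as an illustration.

```python
import math


def xlogy(x, y):
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def log_like(mu, b, n, m, s, tau):
    """ln L(mu, b) of the on/off model, dropping the constant ln(n!) and ln(m!)."""
    nu = mu * s + b
    return xlogy(n, nu) - nu + xlogy(m, tau * b) - tau * b


def cond_mle_b(mu, n, m, s, tau):
    """Conditional MLE of b at fixed mu >= 0: positive root of
    (1+tau) b^2 + ((1+tau) mu s - n - m) b - m mu s = 0."""
    k = 1.0 + tau
    a = n + m - k * mu * s
    return (a + math.sqrt(a * a + 4.0 * k * m * mu * s)) / (2.0 * k)


def lambda_ratio(mu, n, m, s, tau):
    """Profile likelihood ratio lambda(mu); also returns the unbounded mu_hat."""
    b_hat = m / tau
    mu_hat = (n - b_hat) / s             # may be negative (deficit)
    num = log_like(mu, cond_mle_b(mu, n, m, s, tau), n, m, s, tau)
    den = log_like(mu_hat, b_hat, n, m, s, tau)
    return math.exp(num - den), mu_hat


def lambda_tilde(mu, n, m, s, tau):
    """Bounded ratio: if mu_hat < 0 the reference point is mu = 0 instead."""
    lam, mu_hat = lambda_ratio(mu, n, m, s, tau)
    if mu_hat >= 0:
        return lam
    num = log_like(mu, cond_mle_b(mu, n, m, s, tau), n, m, s, tau)
    den = log_like(0.0, cond_mle_b(0.0, n, m, s, tau), n, m, s, tau)
    return math.exp(num - den)
```

For n = 25, m = 20 and s = τ = 1, the sketch gives µ̂ = 5 and −2 ln λ(0) ≈ 0.56. With n = 15 the estimate µ̂ = −5 is negative, so λ̃ is referred to µ = 0 instead of µ̂.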
To quantify the significance of an excess, the test statistic q̃0 is used to test the background-only hypothesis µ = 0 against the alternative hypothesis µ > 0. It is defined as
$$
\tilde{q}_0 =
\begin{cases}
-2 \ln \lambda(0) & \hat{\mu} > 0 , \\
+2 \ln \lambda(0) & \hat{\mu} \le 0 .
\end{cases} \tag{12}
$$
Instead of defining the test statistic to be identically zero for µ̂ ≤ 0 as in Ref. [71], this sign change is introduced in order to probe p-values larger than 50%.
For setting an upper limit on µ, the test statistic q̃µ is used to test the hypothesis of signal events being produced at a rate µ against the alternative hypothesis of signal events being produced at a lesser rate µ′ < µ:
$$
\tilde{q}_\mu =
\begin{cases}
-2 \ln \tilde{\lambda}(\mu) & \hat{\mu} \le \mu , \\
+2 \ln \tilde{\lambda}(\mu) & \hat{\mu} > \mu .
\end{cases} \tag{13}
$$
Again, a sign change is introduced in order to probe p-values larger than 50%. The test statistic −2 ln λ(µ) is used to distinguish the hypothesis of signal events being produced at a rate µ from the alternative hypothesis of signal events being produced at a different rate µ′ ≠ µ.
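In the asymptotic (Gaussian) limit of Ref. [71], µ̂ is normally distributed with standard deviation σ. In that limit −2 ln λ(0) = (µ̂/σ)², and the bounded −2 ln λ̃(µ) has a closed form, so the sign conventions described above can be written out directly. The following Python sketch assumes that limit, and its function names are hypothetical.

```python
def q0_signed(mu_hat, sigma):
    """Signed discovery statistic: +(mu_hat/sigma)^2 for an excess, negative for a deficit."""
    t = (mu_hat / sigma) ** 2            # -2 ln lambda(0) in the Gaussian limit
    return t if mu_hat > 0 else -t


def q_mu_signed(mu, mu_hat, sigma):
    """Signed limit-setting statistic built from the bounded ratio lambda~(mu)."""
    if mu_hat < 0:
        # Reference point moved to mu = 0 (the mu >= 0 boundary)
        return (mu * mu - 2.0 * mu * mu_hat) / sigma ** 2
    t = ((mu - mu_hat) / sigma) ** 2
    return t if mu_hat <= mu else -t     # sign flip for mu_hat > mu
```

Both statistics decrease monotonically as µ̂ increases. This monotonicity is what lets the signed versions reach p-values above 50%.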
Tests of µ are carried out with the Higgs boson mass mH fixed to a particular value, and the entire procedure is repeated for values of mH spaced in small steps.

B. The Distribution of the Test Statistic and p-values
When calculating upper limits, a range of values of µ is explored using the test statistic q̃µ. The value of the test statistic for the observed data is denoted q̃µ,obs. One can construct the distribution of q̃µ assuming a different value of the signal strength µ′, which is denoted
$$
f(\tilde{q}_\mu \,|\, \mu', m_H, \theta) . \tag{14}
$$
The distribution depends explicitly on mH and θ. The p-value is given by the tail probability of this distribution, and thus the p-value also depends on mH and θ. The profile likelihood ratio is chosen as the basis of the test statistic for the following reason. With sufficiently large numbers of events, the distribution of the profile likelihood ratio with µ = µ′ is independent of the values of the nuisance parameters, and so are the associated p-values.
In practice, there is generally some residual dependence of the p-values on the value of θ. The values of the nuisance parameters that maximize the p-value are therefore sought. Following Refs. [25,72-75], the p-values for testing a particular value of µ are based on the distribution constructed at θ̂̂(µ, obs), the CMLE estimated with the observed data, as follows:
$$
p_\mu = \int_{\tilde{q}_{\mu,{\rm obs}}}^{\infty} f(\tilde{q}_\mu \,|\, \mu, m_H, \hat{\hat{\theta}}(\mu, {\rm obs}))\, d\tilde{q}_\mu . \tag{15}
$$
The ensemble includes randomizing both D and G.
Here the distribution of q̃µ assumes that the data D and the global observables G are both treated as measured quantities, i.e., they fluctuate upon repetition of the experiment according to the model f_tot(D_com, G|µ, mH, θ).
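Pseudo-experiments can replace asymptotic formulae when estimating these tail probabilities. The Python sketch below estimates p0 in a hypothetical on/off counting model, n ~ Pois(µs + b) and m ~ Pois(τb). The control-region count m plays the role of a global observable. Each background-only pseudo-experiment regenerates both n and m from the background value b̂̂(0) fitted to the observed data. The model, the simple Knuth Poisson sampler, and the parameter values are all assumptions for illustration.

```python
import math
import random


def poisson_sample(rng, lam):
    """Draw from Pois(lam) with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def xlogy(x, y):
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def q0_onoff(n, m, tau):
    """Signed q0 for the on/off model: -2 ln lambda(0), negated when mu_hat <= 0."""
    b_hat = m / tau                      # unconditional MLE of b
    b0 = (n + m) / (1.0 + tau)           # conditional MLE of b at mu = 0
    ll_max = xlogy(n, n) - n + xlogy(m, m) - m   # at the MLE: mu*s + b = n, tau*b = m
    ll_0 = xlogy(n, b0) - b0 + xlogy(m, tau * b0) - tau * b0
    q = 2.0 * (ll_max - ll_0)
    return q if n > b_hat else -q


def p0_from_toys(n_obs, m_obs, tau, n_toys=20000, seed=1):
    """Fraction of background-only pseudo-experiments with q0 >= q0_obs."""
    rng = random.Random(seed)
    q_obs = q0_onoff(n_obs, m_obs, tau)
    b0 = (n_obs + m_obs) / (1.0 + tau)   # b fitted to the observed data at mu = 0
    hits = 0
    for _ in range(n_toys):
        n = poisson_sample(rng, b0)          # fluctuate the main measurement (D)
        m = poisson_sample(rng, tau * b0)    # fluctuate the auxiliary count (G)
        if q0_onoff(n, m, tau) >= q_obs:
            hits += 1
    return hits / n_toys
```

For n = 25, m = 20 and τ = 1, the observed q0 is about 0.56. The toy estimate of p0 can then be compared with the asymptotic value 1 − Φ(√q0) ≈ 0.23.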
Upper limits for the strength parameter µ are calculated using the CLs procedure [76]. To calculate this limit, the quantity CLs is defined as the ratio
$$
{\rm CL}_s(\mu) = \frac{p_\mu}{1 - p_b} , \tag{16}
$$
where p_b is the p-value derived from the same test statistic under the background-only hypothesis,
$$
1 - p_b = \int_{\tilde{q}_{\mu,{\rm obs}}}^{\infty} f(\tilde{q}_\mu \,|\, 0, m_H, \hat{\hat{\theta}}(\mu = 0, {\rm obs}))\, d\tilde{q}_\mu . \tag{17}
$$
The CLs upper limit on µ is denoted µup and is obtained by solving CLs(µup) = 5%. A value of µ is regarded as excluded at the 95% confidence level if µ > µup.
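In the Gaussian limit of Ref. [71], the signed q̃µ decreases monotonically with µ̂. Both p_µ and 1 − p_b therefore reduce to Gaussian CDFs of µ̂, and the CLs limit follows from a one-dimensional root search. The Python sketch below assumes that limit, with µ̂ normally distributed and σ known. It is not the full procedure with θ̂̂(µ, obs) used in the analysis, and the function names are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist()  # standard Gaussian: PHI.cdf is Phi


def cls_gaussian(mu, mu_hat, sigma):
    """CLs(mu) = p_mu / (1 - p_b) in the Gaussian limit.

    Because q~_mu decreases monotonically with mu_hat:
      p_mu    = P(mu_hat' <= mu_hat | mu) = Phi((mu_hat - mu) / sigma)
      1 - p_b = P(mu_hat' <= mu_hat | 0)  = Phi(mu_hat / sigma)
    Assumes mu_hat is not so negative that Phi(mu_hat / sigma) underflows to 0.
    """
    p_mu = PHI.cdf((mu_hat - mu) / sigma)
    one_minus_pb = PHI.cdf(mu_hat / sigma)
    return p_mu / one_minus_pb


def upper_limit(mu_hat, sigma, alpha=0.05, hi=100.0, tol=1e-10):
    """Bisect for CLs(mu_up) = alpha; CLs decreases with mu on [0, hi]."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls_gaussian(mid, mu_hat, sigma) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With µ̂ = 0 and σ = 1 the sketch returns µup ≈ 1.96. This is the familiar median CLs limit of 1.96σ for a background-only outcome, where 1 − p_b = 0.5.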
The significance of an excess is based on the compatibility of the data with the background-only hypothesis. This compatibility is quantified by the following p-value:
$$
p_0 = \int_{\tilde{q}_{0,{\rm obs}}}^{\infty} f(\tilde{q}_0 \,|\, 0, m_H, \hat{\hat{\theta}}(\mu = 0, {\rm obs}))\, d\tilde{q}_0 . \tag{18}
$$
Note that p0 and p_b are both p-values of the background-only hypothesis. However, the test statistic q̃0 in Eq. (18) is optimized for discovery, while the test statistic q̃µ in Eq. (17) is optimized for upper limits.
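It is customary to convert the background-only p-value into an equivalent Gaussian significance Z (often written Zσ). The conversion is defined as
$$
p_0 = \int_{Z}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx = 1 - \Phi(Z)
\quad \Longleftrightarrow \quad
Z = \Phi^{-1}(1 - p_0) , \tag{19}
$$
where Φ−1 is the inverse of the cumulative distribution function of a standard Gaussian.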
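The Python standard library can evaluate this conversion through `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods give Φ and Φ−1. The helper names below are hypothetical. The example values reproduce a correspondence quoted in this paper: a probability of about 15% is equivalent to a significance of about 1σ.

```python
from statistics import NormalDist

_STD = NormalDist()  # mean 0, standard deviation 1


def z_from_p(p):
    """One-sided Gaussian significance Z = Phi^{-1}(1 - p)."""
    return _STD.inv_cdf(1.0 - p)


def p_from_z(z):
    """One-sided tail probability p = 1 - Phi(Z)."""
    return 1.0 - _STD.cdf(z)
```

For example, `z_from_p(0.15)` is about 1.04, and `p_from_z(3.0)` is about 1.35 × 10⁻³.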

C. Experimental Sensitivity and Bands
It is useful to quantify the experimental sensitivity by means of the significance one would expect to find if a given signal hypothesis were true. Similarly, the expected upper limit is the median upper limit one would expect to find if the background-only hypothesis were true. Although these are useful quantities, they are subject to a certain degree of ambiguity because the median values depend on the assumed values of all of the parameters of the model, including the nuisance parameters.
Here, the expected upper limit is defined as the median of the distribution f(µup|0, mH, θ̂̂(µ = 0, obs)), and the expected significance is based on the median of the distribution f(p0|1, mH, θ̂̂(µ = 1, obs)). The expected limit and significance thus have a small residual dependence on the observed data through θ̂̂(µ, obs). These distributions are also used to define bands around the median upper limit. The standard presentation of upper limits includes the observed limit, the expected limit, and ±1σ and ±2σ bands. More precisely, the edges of these bands, denoted µup±1 and µup±2, are defined by
$$
\int_0^{\mu_{{\rm up}\pm N}} f(\mu_{\rm up} \,|\, 0, m_H, \hat{\hat{\theta}}(\mu = 0, {\rm obs}))\, d\mu_{\rm up} = \Phi(\pm N) . \tag{20}
$$
For large data samples, the asymptotic distributions f(q̃µ|µ′, mH, θ) and f(q̃0|µ′, mH, θ) are known and are described in Ref. [71]. These formulae require the variance of the maximum likelihood estimate of µ, given that µ′ is the true value:
$$
\sigma_{\mu'}^2 = {\rm Var}[\hat{\mu} \,|\, \mu'] . \tag{21}
$$
One result of Ref. [71] is that σµ′ can be estimated with an artificial dataset referred to as the Asimov dataset. This dataset is defined as a binned dataset in which the number of events in bin b is exactly the number of events expected in bin b. The value of the test statistic evaluated on the Asimov data is denoted q̃µ,Aα. The subscript Aα denotes the Asimov data associated with the parameter point α = (µ′, mH, θ). A convenient way to estimate the variance of µ̂ is
$$
\sigma_{\mu'}^2 \approx \frac{(\mu - \mu')^2}{\tilde{q}_{\mu, A_\alpha}} . \tag{22}
$$
In the asymptotic limit, q̃µ is parabolic, thus σµ′ is independent of µ. In Ref. [71], the bands around the expected limit are given by
$$
\mu_{{\rm up}+N} = \sigma_0 \left[ \Phi^{-1}\!\left(1 - 0.05\, \Phi(N)\right) + N \right] , \qquad N = \pm 1, \pm 2 , \tag{23}
$$
where σ0 is the value of σµ′ for µ′ = 0. An improved procedure is used here to capture the leading deviations of q̃µ from a parabola. These deviations correspond to departures of the distribution of µ̂ from a Gaussian distribution centered at µ′ with variance σ²µ′. The procedure consists of finding the upper limit µup (still using the formulae in Ref. [71]) for an Asimov dataset constructed with αN = (µN, mH, θ̂̂(µN, obs)). Here µN is the value of µ corresponding to the edge of the µup±N band. This procedure is found to be more accurate than Eq. (23) in reproducing the bands obtained with ensembles of pseudo-experiments. The value of µN used for the +Nσ band is the value of µ that satisfies √(q̃µN,A0) = N. The choice θ̂̂(µN, obs) is more indicative of the fitted nuisance-parameter values in pseudo-experiments that have µup near the corresponding band.
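The two asymptotic ingredients can be sketched in Python for a hypothetical single-bin Poisson counting experiment with known background and no nuisance parameters. The first ingredient is σµ′, estimated from −2 ln λ(µ) evaluated on Asimov data n_A = µ′s + b. The second is the band edges of Ref. [71], µup+N = σ[Φ−1(1 − 0.05 Φ(N)) + N]. The improved ATLAS procedure based on Asimov datasets at µN is not reproduced here, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def q_mu_asimov(mu, mu_prime, s, b):
    """-2 ln lambda(mu) on Asimov data n_A = mu' s + b (single bin, known b).
    On Asimov data the unconditional fit gives mu_hat = mu'."""
    n_a = mu_prime * s + b
    nu = mu * s + b
    return 2.0 * (nu - n_a + n_a * math.log(n_a / nu))


def sigma_asimov(mu, mu_prime, s, b):
    """sigma_{mu'} ~ |mu - mu'| / sqrt(q_{mu,A}); requires mu != mu'."""
    return abs(mu - mu_prime) / math.sqrt(q_mu_asimov(mu, mu_prime, s, b))


def cls_band_edge(n_sigma, sigma, alpha=0.05):
    """Asymptotic CLs band edge; n_sigma = 0 gives the median expected limit."""
    return sigma * (PHI.inv_cdf(1.0 - alpha * PHI.cdf(n_sigma)) + n_sigma)
```

For s = 10 and b = 100, `sigma_asimov(1.0, 0.0, 10.0, 100.0)` is about 1.03, close to the Gaussian estimate √b/s = 1. The median edge `cls_band_edge(0, sigma)` equals 1.96σ.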

E. Bayesian Methods
A posterior distribution for the signal strength parameter µ can be obtained via Bayes' theorem from f_tot(D, G|α) and a prior η(α). The information about the nuisance parameters from auxiliary measurements is incorporated into f_tot(D, G|α) through the constraint terms f_p(a_p|α_p). As in Eq. (6), the prior η(α_p) is taken to be a uniform distribution. The upper limits are calculated for a given value of mH, so no prior on mH is needed. The prior on the signal strength parameter µ is also taken to be uniform. With this choice, the one-sided Bayesian credible intervals coincide numerically with the frequentist CLs upper limits in the simple Gaussian and Poisson cases, and they have been observed to agree in more complex situations as well. The one-sided 95% Bayesian credible region is defined by
$$
0.95 = \frac{\int_0^{\mu_{\rm up}} d\mu \int d\theta\; f_{\rm tot}(\mathcal{D}, \mathcal{G} \,|\, \mu, m_H, \theta)\, \eta(\mu, \theta)}{\int_0^{\infty} d\mu \int d\theta\; f_{\rm tot}(\mathcal{D}, \mathcal{G} \,|\, \mu, m_H, \theta)\, \eta(\mu, \theta)} . \tag{24}
$$
The integration over the nuisance parameters is carried out with Markov Chain Monte Carlo, using the Metropolis-Hastings algorithm in the RooStats package [26].
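The Metropolis-Hastings integration can be sketched for a hypothetical single-bin counting experiment, n ~ Pois(µs + b). The model has a uniform prior on µ ≥ 0, a uniform prior on b ≥ 0, and a Gaussian constraint term for b. Keeping only µ from each (µ, b) sample marginalizes over b, so the 95% quantile of the sampled µ values approximates the credible upper limit. This plain-Python illustration is not the RooStats implementation. The step sizes, chain length and seed are arbitrary choices.

```python
import math
import random


def log_posterior(mu, b, n, s, b_nominal, b_sigma):
    """ln posterior up to a constant: Pois(n | mu*s + b) times a Gaussian
    constraint on b, with flat priors on mu >= 0 and b >= 0."""
    if mu < 0.0 or b < 0.0:
        return -math.inf                 # outside the prior support
    nu = mu * s + b
    if n > 0 and nu <= 0.0:
        return -math.inf
    log_like = (n * math.log(nu) if n > 0 else 0.0) - nu
    log_constraint = -0.5 * ((b - b_nominal) / b_sigma) ** 2
    return log_like + log_constraint


def bayes_upper_limit(n, s, b_nominal, b_sigma, cl=0.95,
                      n_steps=100_000, burn_in=5_000, step=1.0, seed=7):
    """Metropolis-Hastings estimate of the one-sided credible upper limit on mu."""
    rng = random.Random(seed)
    mu, b = 1.0, max(b_nominal, 0.1)
    logp = log_posterior(mu, b, n, s, b_nominal, b_sigma)
    samples = []
    for i in range(n_steps):
        # Symmetric Gaussian proposal in (mu, b)
        mu_new = mu + rng.gauss(0.0, step)
        b_new = b + rng.gauss(0.0, step * b_sigma)
        logp_new = log_posterior(mu_new, b_new, n, s, b_nominal, b_sigma)
        # Metropolis rule: accept with probability min(1, p_new / p_old)
        if logp_new >= logp or rng.random() < math.exp(logp_new - logp):
            mu, b, logp = mu_new, b_new, logp_new
        if i >= burn_in:
            samples.append(mu)           # keeping only mu marginalizes over b
    samples.sort()
    return samples[int(cl * len(samples))]
```

For n = 0 the likelihood factorizes as e^(−µs)·e^(−b). The exact 95% limit is then ln(20)/s ≈ 3.0/s whatever the background constraint, which makes a convenient check of the sampler.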

F. Correction for the Look Elsewhere Effect
Scanning over mH and repeatedly testing the background-only hypothesis makes the procedure subject to the effects of multiple testing, referred to as the look-elsewhere effect. In principle, confidence intervals can be constructed in the µ-mH plane using the profile likelihood ratio λ(µ, mH). For µ = 0 there is no signal present, and thus the model is independent of mH. This leads to a background-only distribution for −2 ln λ(0, mH) that departs from a chi-square distribution, which complicates the calculation of a global p0 when mH is not specified. The global test statistic is the supremum of q0(mH) with respect to mH:
$$
q_0(\hat{m}_H) = \sup_{m_H} q_0(m_H) . \tag{25}
$$
In the asymptotic regime and for very small p-values, a procedure [77] based on the result of Ref. [78] exists to estimate the tail probability of q0(m̂H). The procedure requires an estimate of the average number of up-crossings of q0(mH) above some threshold. It is technically difficult to estimate this quantity with ensembles of pseudo-experiments, for three reasons: the mH dependence of the model, the mH dependence of the selection cuts, and the list of channels included. Instead, a simple alternative method is used, in which the average number of up-crossings is estimated by counting the number of up-crossings in the observed data. When the trials factor is large, the number of up-crossings at low thresholds is also large, and this count thus provides a satisfactory estimate of the average [25]. This procedure has been checked using a large number of pseudo-experiments in a simplified test case, where it provided a good estimate of the trials factor.
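The up-crossing method can be sketched in a few lines. The Python code below counts up-crossings of a reference level c0 in a scanned q0(mH) curve. It then plugs the count into one common form of the up-crossing approximation, p_global ≈ p_local + ⟨N(c0)⟩ exp(−(q_max − c0)/2), with p_local = 1 − Φ(√q_max) for the one-sided q0. The choice of c0, the function names and the example curve are illustrative assumptions.

```python
import math
from statistics import NormalDist


def count_upcrossings(q_scan, c0):
    """Number of times a scanned q0(mH) curve rises from below c0 to c0 or above."""
    return sum(1 for a, b in zip(q_scan, q_scan[1:]) if a < c0 <= b)


def global_p0(q_max, n_up, c0):
    """Up-crossing approximation for the tail probability of sup_mH q0(mH).

    p_local is the one-sided asymptotic p-value at the largest excess; the second
    term extrapolates the number of up-crossings from level c0 to level q_max.
    """
    p_local = 1.0 - NormalDist().cdf(math.sqrt(q_max))
    return p_local + n_up * math.exp(-(q_max - c0) / 2.0)
```

As in the text, ⟨N(c0)⟩ is replaced by the count in the single observed scan. For example, three up-crossings of c0 = 0.5 and a largest excess of q_max = 9 (a local 3σ) give p_global ≈ 0.044, a trials factor of about 33 relative to p_local ≈ 0.0013.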

VI. EXCLUSION LIMITS
The model discussed in Section II is used for all channels described in Section III, together with the systematic uncertainties summarized in Section IV. The statistical methods described in Section V are used to set limits on the signal strength as a function of mH.
The expected and observed limits from the individual channels entering this combination are shown in Fig. 3. The combined 95% CL exclusion limits on µ are shown in Fig. 4 as a function of mH. These results are based on the asymptotic approximation. The ±1σ and ±2σ variation bands around the median background expectation are calculated using the improved procedure described in Section V D. This procedure yields slightly wider bands than those in Ref. [14]: typically the bands increase by about 5% for the ±1σ band and by 10-15% for the ±2σ band. The procedure has been validated with ensemble tests, and with the Bayesian calculation of the exclusion limits using a uniform prior on the signal strength described in Section V E. These approaches yield limits on µ that typically agree with the asymptotic results to within a few percent.
The expected 95% CL exclusion region for the SM (µ = 1) hypothesis covers the mH range from 120 GeV to 560 GeV. The addition of the H → τ+τ− and H → bb̄ channels, as well as the update of the H → WW(*) → ℓ+νℓ−ν̄ channel, brings a significant gain in sensitivity in the low-mass region with respect to the previous combined search. For Higgs boson mass hypotheses below approximately 122 GeV the dominant channel is H → γγ. For mass hypotheses between 122 GeV and 200 GeV the H → WW(*) → ℓ+νℓ−ν̄ channel is the most sensitive. In the mass range between ∼200 GeV and ∼300 GeV the H → ZZ(*) → ℓ+ℓ−ℓ+ℓ− channel dominates. For higher mass hypotheses the H → ZZ → ℓ+ℓ−νν̄ channel leads the search sensitivity. The updates of the H → WW(*) → ℓ+νℓ−ν̄, H → WW → ℓνqq̄′, H → ZZ → ℓ+ℓ−νν̄, and H → ZZ → ℓ+ℓ−qq̄ channels improve the sensitivity in the high-mass region. The observed exclusion regions range from 111.4 GeV to 116.6 GeV, from 119.4 GeV to 122.1 GeV, and from 129.2 GeV to 541 GeV at 95% CL under the SM (µ = 1) hypothesis.
The mass range 122.1 GeV to 129.2 GeV is not excluded due to the observation of an excess of events above the expected background. This excess and its significance are discussed in detail in Section VII.
Two mass regions where the observed exclusion is stronger than expected can be seen in Fig. 4. In the low mass range, Higgs boson mass hypotheses in the 111.4 GeV to 116.6 GeV range are excluded mainly because of a local deficit of events in the diphoton channel with respect to the expected background. A similar deficit is observed in the high mass region, in the range 360 GeV to 420 GeV, resulting from deficits in the H → ZZ(*) → ℓ+ℓ−ℓ+ℓ− and H → ZZ → ℓ+ℓ−νν̄ channels. Both fluctuations correspond to approximately two standard deviations in the distribution of upper limits expected from background only.
A small mass region near mH ∼ 245 GeV was not excluded at 95% CL in the combined search of Ref. [14], mainly because of a slight excess in the H → ZZ(*) → ℓ+ℓ−ℓ+ℓ− channel. This mass region is now excluded. The CLs values for µ = 1 as a function of the Higgs boson mass are shown in Fig. 5. The figure also shows that the region between 130.7 GeV and 506 GeV is excluded at 99% CL. The observed exclusion covers a large part of the expected exclusion range.

VII. SIGNIFICANCE OF THE EXCESS
Fig. 6 shows the observed local p-values as a function of mH, calculated using the asymptotic approximation, over the entire search mass range and in the low mass range. The values expected in the presence of a SM Higgs boson signal at each mass are shown alongside them. The asymptotic approximation has been verified using ensemble tests, which yield numerically consistent results.
The largest significance for the combination is observed for mH = 126 GeV, where it reaches 3.0σ, compared with an expected value of 2.9σ in the presence of a SM signal at that mass. The observed (expected) local significances for mH = 126 GeV are 2.8σ (1.4σ) in the H → γγ channel and 2.1σ (1.4σ) in the H → ZZ(*) → ℓ+ℓ−ℓ+ℓ− channel. In the H → WW(*) → ℓ+νℓ−ν̄ channel, which has been updated and includes additional data, the observed (expected) local significance for mH = 126 GeV is 0.8σ (1.9σ); the observed significance was previously 1.4σ [14].
The significance of the excess is not very sensitive to the energy scale and resolution systematic uncertainties for photons and electrons. However, the presence of these uncertainties leads to a small deviation from the asymptotic approximation. The observed p0 including these effects is therefore estimated using ensemble tests, and the results are displayed in Fig. 6 as a function of mH. The energy scale systematic uncertainties increase the corresponding local p0 by approximately 30%. The maximum local significance decreases slightly to 2.9σ. The muon momentum scale systematic uncertainties are smaller and are therefore neglected.
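A few lines of arithmetic confirm that the two statements are consistent. Increasing the one-sided local p0 at 3.0σ by 30% and converting back gives about 2.92σ, in line with the quoted 2.9σ. A minimal Python sketch using `statistics.NormalDist` (the helper name is hypothetical):

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rescaled_significance(z_local, p_scale):
    """Significance after multiplying the one-sided local p0 by p_scale."""
    p0 = (1.0 - STD_NORMAL.cdf(z_local)) * p_scale
    return STD_NORMAL.inv_cdf(1.0 - p0)
```

Here `rescaled_significance(3.0, 1.3)` evaluates to about 2.92, while a scale factor of 1.0 returns the input significance unchanged.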
The global p-values for the largest excess depend on the range of mH and on the channels considered. The global p0 associated with a 2.8σ excess anywhere in the H → γγ search domain of 110-150 GeV is approximately 7%. A 2.1σ excess anywhere in the H → ZZ(*) → ℓ+ℓ−ℓ+ℓ− search range of 110-600 GeV corresponds to a global p0 of approximately 30%. The global probability for a 2.9σ excess in the combined search to occur anywhere in the mass range 110-600 GeV is estimated to be approximately 15%. This probability decreases to 5-7% in the range 110-146 GeV, which is not excluded at 99% CL by the LHC combined SM Higgs boson search [62]. The data are observed to be consistent with the background-only hypothesis except for the region around mH = 126 GeV. The observed and expected ratio −2 ln(λ(1)/λ(0)) is shown in Fig. 7. It indicates a departure from the background-only hypothesis similar to the signal-plus-background expectation.
The best-fit value of µ, denoted µ̂, is displayed for the combination of all channels in Fig. 8 and for individual channels in Fig. 9 as a function of the mH hypothesis. Fig. 10 summarizes the −2 ln λ(µ) < 1 intervals at three specific Higgs boson mass hypotheses (mH = 119 GeV, 126 GeV and 130 GeV) for each Higgs decay mode and for the combination. The bands around µ̂ illustrate the µ interval corresponding to −2 ln λ(µ) < 1 and represent an approximate ±1σ variation. In Figs. 8 and 9, the estimator µ̂ is allowed to be negative in order to illustrate the presence and extent of downward fluctuations. The µ parameter itself is bounded, however, to ensure non-negative values of the probability density functions in the individual channels. Hence, for negative µ̂ values close to the boundary, the −2 ln λ(µ) < 1 region does not correspond to a calibrated 68% confidence interval. Note also that µ̂ does not directly provide information on the relative strength of the production modes.
The excess observed for mH = 126 GeV corresponds to µ̂ = 1.1 ± 0.4, which is compatible with the signal strength expected from a SM Higgs boson at that mass (µ = 1).
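The −2 ln λ(µ) < 1 intervals are read off a likelihood scan. The Python sketch below does this for an assumed parabolic scan, −2 ln λ(µ) = ((µ − µ̂)/σ)², using µ̂ = 1.1 and σ = 0.4 from the result quoted above. Real scans are not exactly parabolic, particularly near the µ̂ < 0 boundary discussed above, so this only illustrates the reading-off step. The function name and grid spacing are hypothetical.

```python
def interval_from_scan(mu_grid, scan_values, level=1.0):
    """Endpoints of the region where -2 ln lambda(mu) < level, read off a scan.
    Assumes the region is a single contiguous interval (unimodal scan)."""
    inside = [mu for mu, v in zip(mu_grid, scan_values) if v < level]
    return min(inside), max(inside)


# Illustrative parabolic scan (Gaussian limit) with mu_hat = 1.1, sigma = 0.4
mu_hat, sigma = 1.1, 0.4
grid = [i * 0.001 for i in range(3001)]               # mu from 0 to 3
scan = [((mu - mu_hat) / sigma) ** 2 for mu in grid]
lo, hi = interval_from_scan(grid, scan)
```

With this grid, the endpoints land within one grid step of 0.7 and 1.5, i.e. µ̂ ± σ.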