Observation and measurement of Higgs boson decays to $WW^{\ast}$ with the ATLAS detector

We report the observation of Higgs boson decays to $WW^{\ast}$ based on an excess over background of $6.1$ standard deviations in the dilepton final state, where the Standard Model expectation is $5.8$ standard deviations. Evidence for the vector-boson fusion (VBF) production process is obtained with a significance of 3.2 standard deviations. The results are obtained from a data sample corresponding to an integrated luminosity of $25\,\textrm{fb}^{-1}$ from $\sqrt{s}=7$ and 8 TeV $pp$ collisions recorded by the ATLAS detector at the LHC. For a Higgs boson mass of 125.36 GeV, the ratio of the measured value to the expected value of the total production cross section times branching fraction is $1.09^{+0.16}_{-0.15}\,\textrm{(stat)}^{+0.17}_{-0.14}\,\textrm{(syst)}$. The corresponding ratios for the gluon fusion and vector-boson fusion production mechanisms are $1.02\pm 0.19\,\textrm{(stat)}^{+0.22}_{-0.18}\,\textrm{(syst)}$ and $1.27^{+0.44}_{-0.40}\,\textrm{(stat)}^{+0.30}_{-0.21}\,\textrm{(syst)}$, respectively. At $\sqrt{s}=8$ TeV, the total production cross sections are measured to be $\sigma(gg\to H\rightarrow WW^\ast) = 4.6\pm0.9\,\textrm{(stat)}\,^{+0.8}_{-0.7}\,\textrm{(syst)}\,\textrm{pb}$ and $\sigma(\textrm{VBF}\ H\rightarrow WW^\ast) = 0.51\,^{+0.17}_{-0.15}\,\textrm{(stat)}\,^{+0.13}_{-0.08}\,\textrm{(syst)}\,\textrm{pb}$. The fiducial cross section is determined for the gluon-fusion process in exclusive final states with zero or one associated jet.


I. INTRODUCTION
In the Standard Model of particle physics (SM), the Higgs boson results from the Brout-Englert-Higgs mechanism [1] that breaks the electroweak symmetry [2] and gives mass to the W and Z gauge bosons [3]. It has a spin-parity of $0^{+}$, with couplings to massive particles that are precisely determined by their measured masses. A new particle compatible with the spin and gauge-boson couplings of the SM Higgs boson was discovered in 2012 by the ATLAS and CMS experiments at the LHC using the $ZZ^{\ast}$, $\gamma\gamma$, and $WW^{\ast}$ final states [4][5][6][7][8]. Measurements of the particle's mass [8,9] yield a value of approximately 125 GeV, consistent with the mass of the SM Higgs boson provided by a global fit to electroweak measurements [10]. Evidence for production of this boson at the Tevatron [11] and for its decay to fermions at the LHC [12] is also consistent with the properties of the SM Higgs boson.
The direct observation of the Higgs boson in individual decay channels provides an essential confirmation of the SM predictions. For a Higgs boson with a mass of 125 GeV, the $H \to WW^{\ast}$ decay has the second largest branching fraction (22%) and is a good candidate for observation. The sequential decay $H \to WW^{\ast} \to \ell\nu\ell\nu$, where $\ell$ is an electron or muon, is a sensitive experimental signature. Searches for this decay produced the first direct limits on the mass of the Higgs boson at a hadron collider [13,14], and measurements following the boson discovery are among the most precise in determining its couplings and spin [5][6][7].
The dominant Higgs boson production mode in high-energy hadron collisions is gluon fusion (ggF), where the interacting gluons produce a Higgs boson predominantly through a top-quark loop. The next most abundant production mechanism, with a factor of twelve reduction in rate, is the fusion of vector bosons radiated by the interacting quarks into a Higgs boson (vector-boson fusion or VBF). At a further reduced rate, a Higgs boson can be produced in association with a W or Z boson (VH production). The leading-order production processes are depicted in Fig. 1.
This paper describes the observation and measurement of the Higgs boson in its decay to a pair of W bosons, with the Higgs boson produced by the ggF and VBF processes at center-of-mass energies of 7 and 8 TeV. The ggF production process probes Higgs boson couplings to heavy quarks, while the VBF and VH processes probe its couplings to W and Z bosons. The branching fraction $\mathcal{B}_{H \to WW^{\ast}}$ is sensitive to Higgs boson couplings to the fermions and bosons through the total width. To constrain these couplings, the rates of the ggF and VBF $H \to WW^{\ast}$ processes are measured, individually and combined, and normalized by the SM predictions for a Higgs boson with mass 125.36 GeV [9] to obtain the "signal strength" parameters $\mu$, $\mu_{\rm ggF}$, and $\mu_{\rm VBF}$. The total cross section for each process is also measured, along with fiducial cross sections for the ggF process.
A prior measurement of these processes with the same data set yielded a combined result of µ = 1.0 ± 0.3 [5]. The results presented here supersede this measurement and contain improvements in signal acceptance, background determination and rejection, and signal yield extraction. Together, these improvements increase the expected significance of an excess of $H \to WW^{\ast}$ decays over background from 3.7 to 5.8 standard deviations, and reduce the expected relative uncertainty on the corresponding µ measurement by 30%.
The paper is organized as follows. Section II provides an overview of the signal and backgrounds, and of the data analysis strategy. Section III describes the ATLAS detector and data, and the event reconstruction. The selection of events in the different final states is given in Sec. IV. Sections V and VI discuss the modeling of the signal and the background processes, respectively. The signal yield extraction and the various sources of systematic uncertainty are described in Sec. VII. Section VIII provides the event yields and the distributions of the final discriminating variables. The results are presented in Sec. IX, and the conclusions given in Sec. X.

II. ANALYSIS OVERVIEW
The $H \to WW^{\ast}$ final state with the highest purity at the LHC occurs when each W boson decays leptonically, $W \to \ell\nu$, where $\ell$ is an electron or muon. The analysis therefore selects events consistent with a final state containing neutrinos and a pair of opposite-charge leptons. The pair can be an electron and a muon, two electrons, or two muons. The relevant backgrounds to these final states are shown in Table I and are categorized as WW, top quarks, misidentified leptons, other dibosons, and Drell-Yan. The distinguishing features of these backgrounds, discussed in detail below, motivate the definition of event categories based on lepton flavor and jet multiplicity, as illustrated in Fig. 2. In the final step of the analysis, a profile likelihood fit is simultaneously performed on all categories in order to extract the signal from the backgrounds and measure its yield.
The Drell-Yan (DY) process is the dominant source of events with two identified leptons, and contributes to the signal final state when there is a mismeasurement of the net particle momentum in the direction transverse to the beam (individual particle momentum in this direction is denoted $p_T$). The DY background is strongly reduced in events with different-flavor leptons (eµ), as these arise through fully leptonic decays of τ-lepton pairs with a small branching fraction and reduced lepton momenta. The analysis thus separates eµ events from those with same-flavor leptons (ee/µµ) in the event selection and the likelihood fit. Pairs of top quarks are also a prolific source of lepton pairs, which are typically accompanied by high-momentum jets. Events are removed if they have a jet identified to contain a b-hadron decay (b-jet), but the $t\bar{t}$ background remains large due to inefficiencies in the b-jet identification algorithm. Events are therefore categorized by the number of jets. The top-quark background provides a small contribution to the zero-jet category but represents a significant fraction of the total background in categories with one or more jets.
In events with two or more jets, the sample is separated by signal production process ("VBF-enriched" and "ggF-enriched"). The VBF process is characterized by two quarks scattered at a small angle, leading to two well-separated jets with a large invariant mass [15]. These and other event properties are inputs to a boosted decision tree (BDT) algorithm [16] that yields a single-valued discriminant to isolate the VBF process. A separate analysis based on a sequence of individual selection criteria provides a cross-check of the BDT analysis. The ggF-enriched sample contains all events with two or more jets that do not pass either of the VBF selections.
Due to the large Drell-Yan and top-quark backgrounds in events with same-flavor leptons or with jets, the most sensitive signal region is in the eµ zero-jet final state. The dominant background to this category is WW production, which is effectively suppressed by exploiting the properties of W boson decays and the spin-0 nature of the Higgs boson (Fig. 3). This property generally leads to a lepton pair with a small opening angle [17] and a correspondingly low invariant mass $m_{\ell\ell}$, broadly distributed in the range below $m_H/2$. The dilepton invariant mass is used to select signal events, and the signal likelihood fit is performed in two ranges of $m_{\ell\ell}$ in eµ final states with $n_j \le 1$.

FIG. 2. Analysis divisions in categories based on jet multiplicity ($n_j$) and lepton-flavor samples (eµ and ee/µµ). The most sensitive signal region for ggF production is $n_j = 0$ in eµ, while for VBF production it is $n_j \ge 2$ in eµ. These two samples are underlined. The eµ samples with $n_j \le 1$ are further subdivided as described in the text.
Other background components are distinguished by $p_T^{\ell 2}$, the magnitude of the transverse momentum of the lower-$p_T$ lepton in the event (the "subleading" lepton). In the signal process, one of the W bosons from the Higgs boson decay is off shell, resulting in relatively low subleading lepton $p_T$ (peaking near 22 GeV, half the difference between the Higgs and W boson masses). In the background from W bosons produced in association with a jet or photon (misreconstructed as a lepton) or an off-shell photon producing a low-mass lepton pair (where one lepton is not reconstructed), the $p_T^{\ell 2}$ distribution falls rapidly with increasing $p_T$. The eµ sample is therefore subdivided into three regions of subleading lepton momentum for $n_j \le 1$. The jet and photon misidentification rates differ for electrons and muons, so this sample is further split by subleading lepton flavor.
FIG. 3. Illustration of the $H \to WW$ decay. The small arrows indicate the particles' directions of motion and the large double arrows indicate their spin projections. The spin-0 Higgs boson decays to W bosons with opposite spins, and the spin-1 W bosons decay into leptons with aligned spins. The H and W boson decays are shown in the decaying particle's rest frame. Because of the V − A decay of the W bosons, the charged leptons have a small opening angle in the laboratory frame. This feature is also present when one W boson is off shell.

Because of the neutrinos produced in the signal process, it is not possible to fully reconstruct the invariant mass of the final state. However, a "transverse mass" $m_T$ [18] can be calculated without the unknown longitudinal neutrino momenta:

$$m_T = \sqrt{\left(E_T^{\ell\ell} + p_T^{\nu\nu}\right)^2 - \left|\boldsymbol{p}_T^{\ell\ell} + \boldsymbol{p}_T^{\nu\nu}\right|^2}, \qquad (1)$$

where $E_T^{\ell\ell} = \sqrt{(p_T^{\ell\ell})^2 + (m_{\ell\ell})^2}$, $\boldsymbol{p}_T^{\nu\nu}$ ($\boldsymbol{p}_T^{\ell\ell}$) is the vector sum of the neutrino (lepton) transverse momenta, and $p_T^{\nu\nu}$ ($p_T^{\ell\ell}$) is its modulus. The distribution has a kinematic upper bound at the Higgs boson mass, effectively separating Higgs boson production from the dominant nonresonant WW and top-quark backgrounds. For the VBF analysis, the transverse mass is one of the inputs to the BDT distribution used to fit for the signal yield. In the ggF and cross-check VBF analyses, the signal yield is obtained from a direct fit to the $m_T$ distribution for each category.
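As an illustration of the transverse-mass definition, the following minimal sketch evaluates $m_T = \sqrt{(E_T^{\ell\ell} + p_T^{\nu\nu})^2 - |\boldsymbol{p}_T^{\ell\ell} + \boldsymbol{p}_T^{\nu\nu}|^2}$ with $E_T^{\ell\ell} = \sqrt{(p_T^{\ell\ell})^2 + (m_{\ell\ell})^2}$. The function name and the input layout (lepton four-vectors as `(px, py, pz, E)` tuples in GeV, and the measured missing transverse momentum standing in for the neutrino system) are assumptions made for this example and are not taken from the analysis software.

```python
import math

def transverse_mass(lep1, lep2, met_xy):
    """Transverse mass of the dilepton + missing-momentum system, in GeV.

    lep1, lep2: lepton four-vectors (px, py, pz, E) in GeV (assumed layout).
    met_xy: (px, py) of the missing transverse momentum, used here in place
            of the vector sum of the neutrino transverse momenta.
    """
    # Dilepton system
    px = lep1[0] + lep2[0]
    py = lep1[1] + lep2[1]
    pz = lep1[2] + lep2[2]
    energy = lep1[3] + lep2[3]

    pt_ll_sq = px * px + py * py
    # Dilepton invariant mass squared, clipped at zero against rounding
    m_ll_sq = max(energy * energy - pt_ll_sq - pz * pz, 0.0)
    et_ll = math.sqrt(pt_ll_sq + m_ll_sq)

    # Modulus of the neutrino (missing) transverse momentum
    pt_nunu = math.hypot(met_xy[0], met_xy[1])

    # Vector sum of dilepton and neutrino transverse momenta
    sum_x = px + met_xy[0]
    sum_y = py + met_xy[1]

    mt_sq = (et_ll + pt_nunu) ** 2 - (sum_x * sum_x + sum_y * sum_y)
    return math.sqrt(max(mt_sq, 0.0))
```

Because only transverse quantities of the neutrino system enter, the result does not depend on the unmeasured longitudinal neutrino momenta.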
Most of the backgrounds are modeled using Monte Carlo samples normalized to data, and include theoretical uncertainties on the extrapolation from the normalization region to the signal region, and on the shape of the distribution used in the likelihood fit. For the W+jet(s) and multijet backgrounds, the high rates and the uncertainties in modeling misidentified leptons motivate a model of the kinematic distributions based on data. For a few minor backgrounds, the process cross sections are taken from theoretical calculations. Details of the background modeling strategy are given in Sec. VI.
The analyses of the 7 and 8 TeV data sets are separate, but use common methods where possible; differences arise primarily because of the lower instantaneous and integrated luminosities in the 7 TeV data set. As an example, the categorization of 7 TeV data does not include a ggF-enriched category for events with at least two jets, since the expected significance of such a category is very low. Other differences are described in the text or in dedicated subsections.

III. DATA SAMPLES AND RECONSTRUCTION
This section begins with a description of the ATLAS detector, the criteria used to select events during data-taking (triggers) and the data sample used for this analysis. A description of the event reconstruction follows. The Monte Carlo simulation samples used in this analysis are described next, and then differences between the 2012 and 2011 analyses are summarized.

A. Detector and data samples
The ATLAS detector [19] is a multipurpose particle detector with approximately forward-backward symmetric cylindrical geometry. The experiment uses a right-handed coordinate system with the origin at the nominal pp interaction point at the center of the detector. The positive x-axis is defined by the direction from the origin to the center of the LHC ring, the positive y-axis points upwards, and the z-axis is along the beam direction. Cylindrical coordinates (r, φ) are used in the plane transverse to the beam; φ is the azimuthal angle around the beam axis. Transverse components of vectors are indicated by the subscript T. The pseudorapidity is defined in terms of the polar angle θ as η = − ln tan(θ/2).
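The pseudorapidity definition can be checked numerically with a short sketch. The function names below are chosen for this example only; the inverse relation θ = 2 arctan(e^{−η}) follows directly from the definition.

```python
import math

def pseudorapidity(theta):
    """eta = -ln tan(theta/2) for a polar angle theta in radians (0 < theta < pi)."""
    return -math.log(math.tan(theta / 2.0))

def polar_angle(eta):
    """Inverse relation: theta = 2 * atan(exp(-eta)), in radians."""
    return 2.0 * math.atan(math.exp(-eta))
```

For instance, `polar_angle(2.5)` gives roughly 0.164 rad, so a coverage of |η| < 2.5 extends to within about 9.4 degrees of the beam axis.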
The inner tracking detector (ID) consists of a silicon-pixel detector, which is closest to the interaction point, a silicon-microstrip detector surrounding the pixel detector (both covering |η| < 2.5), and an outer transition-radiation straw-tube tracker (TRT) covering |η| < 2. The TRT provides substantial discriminating power between electrons and pions over a wide energy range. The ID is surrounded by a thin superconducting solenoid providing a 2 T axial magnetic field.
The muon spectrometer (MS) surrounds the calorimeters and is designed to detect muons in the pseudorapidity range |η| < 2.7. The MS consists of one barrel (|η| < 1.05) and two endcap regions. A system of three large superconducting air-core toroid magnets, each with eight coils, provides a magnetic field with a bending integral of about 2.5 T·m in the barrel and up to 6 T·m in the endcaps. Monitored drift tube chambers in both the barrel and endcap regions and cathode strip chambers covering 2.0 < |η| < 2.7 are used as precision-measurement chambers, whereas resistive plate chambers in the barrel and thin gap chambers in the endcaps are used as trigger chambers, covering |η| < 2.4. The chambers are arranged in three layers, so high-$p_T$ particles traverse at least three stations with a lever arm of several meters.
A three-level trigger system selects events to be recorded for offline analysis. The first level (Level-1 trigger) is hardware-based, and the second two levels (High-level trigger) are software-based. This analysis uses events selected by triggers that required either a single lepton or two leptons (dilepton). The single-lepton triggers had more restrictive lepton identification requirements and higher $p_T$ thresholds than the dilepton triggers. The specific triggers used for the 8 TeV data with the corresponding thresholds at the hardware and software levels are listed in Table II. Offline, two leptons (either ee, µµ or eµ) with opposite charge are required. The leading lepton ($\ell_1$) is required to have $p_T \ge 22$ GeV and the subleading lepton ($\ell_2$) is required to have $p_T \ge 10$ GeV.
The efficiency of the trigger selection is measured using a tag-and-probe method with a data sample of $Z/\gamma^{\ast} \to ee, \mu\mu$ candidates. For muons, the single-lepton trigger efficiency varies with η and is approximately 70% for |η| < 1.05 and 90% for |η| > 1.05. For electrons, the single-lepton trigger efficiency increases with $p_T$, and its average is approximately 90%. These trigger efficiencies are for leptons that satisfy the analysis selection criteria described below. Dilepton triggers increase the signal acceptance by allowing lower leading-lepton $p_T$ thresholds to be applied offline while still remaining in the kinematic range that is in the plateau of the trigger efficiency. The trigger efficiencies for signal events satisfying the selection criteria described in Sec. IV are 95% for events with a leading electron and a subleading muon, 81% for events with a leading muon and subleading electron, 89% for µµ events and 97% for ee events. These efficiencies are for the $n_j = 0$ category; the efficiencies are slightly larger for categories with higher jet multiplicity.
The data are subjected to quality requirements: events recorded when the relevant detector components were not operating correctly are rejected. The resulting integrated luminosity is 20. Overlapping signals in the detector due to these multiple interactions, as well as signals due to interactions occurring in other nearby bunch crossings, are referred to as "pile-up."

B. Event reconstruction
The primary vertex of each event must have at least three tracks with $p_T \ge 400$ MeV and is selected as the vertex with the largest value of $\sum (p_T)^2$, where the sum is over all the tracks associated with that particular vertex.
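The primary-vertex choice described above can be sketched as follows. The representation of a vertex as a plain list of track $p_T$ values in GeV, and the function name, are assumptions for this illustration; the eligibility requirement counts tracks with $p_T \ge 0.4$ GeV, while the $\sum (p_T)^2$ ranking runs over all tracks associated with the vertex, as stated in the text.

```python
def select_primary_vertex(vertices, min_tracks=3, min_track_pt=0.4):
    """Return the index of the primary vertex, or None if no vertex qualifies.

    vertices: list of vertices, each given as a list of track pT values in GeV
              (hypothetical input layout for this sketch).
    """
    best_index = None
    best_sum = -1.0
    for index, track_pts in enumerate(vertices):
        # Eligibility: at least `min_tracks` tracks with pT >= 400 MeV
        n_good = sum(1 for pt in track_pts if pt >= min_track_pt)
        if n_good < min_tracks:
            continue
        # Ranking: sum of pT^2 over all tracks associated with the vertex
        sum_pt_sq = sum(pt * pt for pt in track_pts)
        if sum_pt_sq > best_sum:
            best_sum = sum_pt_sq
            best_index = index
    return best_index
```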
TABLE II. Summary of the minimum lepton $p_T$ trigger requirements (in GeV) during the 8 TeV data-taking. For single-electron triggers, the hardware and software thresholds are either 18 and 24i or 30 and 60, respectively. The "i" denotes an isolation requirement that is less restrictive than the isolation requirement imposed in the offline selection. For dilepton triggers, the pair of thresholds corresponds to the leading and subleading lepton, respectively; the "µ, µ" dilepton trigger requires only a single muon at Level-1. The "and" and "or" are logical.

Muon candidates are identified by matching a reconstructed ID track with a reconstructed MS track [20]. The MS track is required to have a track segment in all three layers of the MS. The ID tracks are required to have at least a minimum number of associated hits in each of the ID subdetectors to ensure good track reconstruction. This analysis uses muon candidates referred to as "combined muons" in Ref. [20], in which the track parameters of the MS track and the ID track are combined statistically. Muon candidates are required to have |η| < 2.50. The efficiencies for reconstructing and identifying combined muons are provided in Ref. [20].

Electron candidates are clusters of energy deposited in the electromagnetic calorimeter that are associated with ID tracks [21]. All candidate electron tracks are fitted using a Gaussian sum filter [22] (GSF) to account for bremsstrahlung energy losses. The GSF fit reduces the difference between the energy measured in the calorimeter and the momentum measured in the ID and improves the measured electron direction and impact parameter resolutions. The impact parameter is the lepton track's distance of closest approach in the transverse plane to the reconstructed position of the primary vertex. The electron transverse energy is computed from the cluster energy and the track direction at the interaction point.
Electron identification is performed in the range |η| < 2.47, excluding the transition region between the barrel and endcap EM calorimeters, 1.37 < |η| < 1.52. The identification is based on criteria that require the longitudinal and transverse shower profiles to be consistent with those expected for electromagnetic showers, the track and cluster positions to match in η and φ, and signals of transition radiation to be present in the TRT. The electron identification has been improved relative to that described in Ref. [5] by adding a likelihood-based method in addition to the selection-based method. The likelihood allows the inclusion of discriminating variables that are difficult to use with explicit requirements without incurring significant efficiency losses. Detailed discussions of the likelihood identification and selection-based identification and the corresponding efficiency measurements can be found in Ref. [23]. Electrons with $10 < E_T < 25$ GeV must satisfy the "very tight" likelihood requirement, which reduces backgrounds from light-flavor jets and photon conversions by 35% relative to the selection-based identification with the same signal efficiency. For $E_T > 25$ GeV, where misidentification backgrounds are less important, electrons must satisfy the "medium" selection-based requirement. The single-lepton trigger applies the medium selection-based requirements. Using a likelihood-based selection criterion in addition to this selection-based requirement would result in a loss of signal efficiency without sufficient compensation in background rejection. Finally, additional requirements reduce the contribution of electrons from photon conversions by rejecting electron candidates that have an ID track that is part of a conversion vertex or that do not have a hit in the innermost layer of the pixel detector.
To further reduce backgrounds from misidentified leptons, additional requirements are imposed on the lepton impact parameter and isolation. The significance of the transverse impact parameter, defined as the measured transverse impact parameter $d_0$ divided by its estimated uncertainty $\sigma_{d_0}$, is required to satisfy $|d_0|/\sigma_{d_0} < 3.0$; the longitudinal impact parameter $z_0$ must satisfy the requirement $|z_0 \sin\theta| < 0.4$ mm for electrons and 1.0 mm for muons.
Lepton isolation is defined using track-based and calorimeter-based quantities. Details about the definition of electron isolation can be found in Ref. [23]. The track isolation is based on the scalar sum $\sum p_T$ of all tracks with $p_T > 400$ MeV for electrons ($p_T > 1$ GeV for muons) that are found in a cone in η-φ space around the lepton, excluding the lepton track. Tracks used in this scalar sum are required to be consistent with coming from the primary vertex. The cone size is ∆R = 0.4 for leptons with $p_T < 15$ GeV, where $\Delta R = \sqrt{(\Delta\phi)^2 + (\Delta\eta)^2}$, and ∆R = 0.3 for $p_T > 15$ GeV. The track isolation selection criterion uses the ratio of the $\sum p_T$ divided by the electron $E_T$ (muon $p_T$). This ratio is required to be less than 0.06 for leptons with $10 < p_T < 15$ GeV, and this requirement increases monotonically to 0.10 for electrons (0.12 for muons) for $p_T > 25$ GeV.
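A minimal sketch of the track-isolation ratio follows, using ∆R = sqrt((∆φ)² + (∆η)²) with the azimuthal difference wrapped into [−π, π]. The function names and the input layout are hypothetical: tracks are `(pt, eta, phi)` tuples in GeV that are assumed to be already matched to the primary vertex and to exclude the lepton's own track. Only the ratio is computed; the upper bound between 15 and 25 GeV is not given numerically in the text, so no cut is applied here. Using the ∆R = 0.3 cone at exactly 15 GeV is also an assumption.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in eta-phi space, with the phi difference wrapped to [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)

def track_isolation_ratio(lep_pt, lep_eta, lep_phi, tracks, is_electron):
    """Scalar sum of nearby track pT divided by the lepton pT (E_T for electrons).

    tracks: iterable of (pt, eta, phi) in GeV, pre-filtered as described above.
    """
    # Cone size depends on the lepton transverse momentum
    cone = 0.4 if lep_pt < 15.0 else 0.3
    # Track pT threshold: 400 MeV for electrons, 1 GeV for muons
    min_track_pt = 0.4 if is_electron else 1.0
    sum_pt = sum(
        pt
        for pt, eta, phi in tracks
        if pt > min_track_pt and delta_r(lep_eta, lep_phi, eta, phi) < cone
    )
    return sum_pt / lep_pt
```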
The calorimeter isolation selection criterion, like the track isolation, is based on a ratio. The relative calorimetric isolation for electrons is computed as the sum of the cluster transverse energies $\sum E_T$ of surrounding energy deposits in the electromagnetic and hadronic calorimeters inside a cone of ∆R = 0.3 around the candidate electron cluster, divided by the electron $E_T$. The cells within 0.125 × 0.175 in η × φ around the electron cluster barycenter are excluded. The pile-up and underlying-event contributions to the calorimeter isolation are estimated and subtracted event-by-event.
The electron relative calorimetric isolation upper bound varies monotonically with electron $E_T$: it is 0.20 for $10 < E_T < 15$ GeV, increasing to 0.28 for $E_T > 25$ GeV. In the case of muons, the relative calorimetric isolation discriminant is defined as the $\sum E_T$ calculated from calorimeter cells within ∆R = 0.3 of the muon candidate, and with energy above a noise threshold, divided by the muon $p_T$. All calorimeter cells within the range ∆R < 0.05 around the muon candidate are excluded from $\sum E_T$. A correction based on the number of reconstructed primary vertices in the event is made to $\sum E_T$ to compensate for extra energy due to pile-up. The muon relative calorimetric isolation upper bound also varies monotonically with muon $p_T$; it is 0.06 for $10 < p_T < 15$ GeV, increasing to 0.28 for $p_T > 25$ GeV. The signal efficiencies of the impact parameter and isolation requirements are measured using a tag-and-probe method with a data sample of $Z/\gamma^{\ast} \to ee, \mu\mu$ candidates. The efficiencies of the combined impact parameter and isolation requirements range from 68% (60%) for electrons (muons) with $10 < p_T < 15$ GeV to greater than 90% (96%) for electrons (muons) with $p_T > 25$ GeV.
Jets are reconstructed using the anti-$k_t$ sequential recombination clustering algorithm [25] with a radius parameter R = 0.4. The inputs to the reconstruction are three-dimensional clusters of energy [26,27] in the calorimeter. The algorithm for this clustering suppresses noise by keeping only cells with a significant energy deposit and their neighboring cells. To take into account the difference between the calorimeter response to electrons and photons and its response to hadrons, each cluster is classified, prior to the jet reconstruction, as coming from an electromagnetic or hadronic shower using information from its shape. Based on this classification, the local cell signal weighting calibration method [28] applies dedicated corrections for the effects of calorimeter noncompensation, signal losses due to noise threshold effects and energy lost in regions that are not instrumented. Jets are corrected for contributions from in-time and out-of-time pile-up [29], and the position of the primary interaction vertex. Subsequently, the jets are calibrated to the hadronic energy scale using $p_T$- and η-dependent correction factors determined in a first pass from simulation and then refined in a second pass from data [27,28]. The systematic uncertainties on these correction factors are determined from the same control samples in data.
To reduce the number of jet candidates originating from pile-up vertices, a requirement is imposed on the jet vertex fraction, denoted jvf: for jets with $p_T < 50$ GeV and |η| < 2.4, more than 50% of the summed scalar $p_T$ of tracks within ∆R = 0.4 of the jet axis must be from tracks associated with the primary vertex (jvf > 0.50) [30]. No jvf selection requirement is applied to jets that have no associated tracks.
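The jet-vertex-fraction requirement can be sketched as below. The function names and the split of the input into two lists (track $p_T$ values from the primary vertex and from other vertices, all assumed to lie within ∆R = 0.4 of the jet axis) are choices made for this example, not the experiment's software interface.

```python
def jet_vertex_fraction(pv_track_pts, other_track_pts):
    """Fraction of the summed scalar track pT that comes from the primary vertex.

    Returns None when the jet has no associated tracks.
    """
    pv_sum = sum(pv_track_pts)
    total = pv_sum + sum(other_track_pts)
    if total == 0:
        return None
    return pv_sum / total

def passes_jvf(jet_pt, jet_eta, jvf):
    """Apply jvf > 0.50 only to jets with pT < 50 GeV and |eta| < 2.4."""
    if jvf is None:
        # Jets without associated tracks are not subject to the requirement
        return True
    if jet_pt < 50.0 and abs(jet_eta) < 2.4:
        return jvf > 0.50
    return True
```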
For the purposes of classifying an event in terms of jet multiplicity $n_j$, a jet is required to have $p_T^j > 25$ GeV for $|\eta_j| < 2.4$, and $p_T^j > 30$ GeV if $2.4 \le |\eta_j| < 4.5$. The increased threshold in the higher-|η| region suppresses jets from pile-up. The two highest-$p_T$ jets ($j_1$, $j_2$, ordered in $p_T$) are the "VBF jets" used to compute dijet variables in the VBF-enhanced $n_j \ge 2$ category.
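The jet-counting thresholds translate into a short function. This is a sketch only; the `(pt, eta)` tuple layout in GeV and the function name are assumptions.

```python
def count_jets(jets):
    """Count jets entering the multiplicity n_j.

    jets: iterable of (pt, eta) with pt in GeV (hypothetical layout).
    """
    n_j = 0
    for pt, eta in jets:
        abs_eta = abs(eta)
        if abs_eta < 2.4 and pt > 25.0:
            n_j += 1
        elif 2.4 <= abs_eta < 4.5 and pt > 30.0:
            # Higher threshold in the forward region to suppress pile-up jets
            n_j += 1
    return n_j
```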
Additional jets not counted in $n_j$ have lower thresholds in three scenarios. First, those used to reject events because they lie in the η range spanned by the two leading jets in the VBF-enriched selection (see Sec. IV C) are considered if they have $p_T^j > 20$ GeV. Second, the jets for b-jet identification, described below, are required to have $p_T^j > 20$ GeV and $|\eta_j| < 2.4$. Third, the jets used for the calculation of soft hadronic recoil (see Sec. IV A and the $f_{\rm recoil}$ definition therein) are required to have $p_T^j > 10$ GeV and have no jvf requirement. The calibration procedure described above is applied only to jets with $p_T^j > 20$ GeV. Jets with $10 < p_T^j < 20$ GeV are used only in the $f_{\rm recoil}$ definition, and the efficiency of the requirements on this quantity is measured directly from the data, so the analysis is not sensitive to the modeling of the energy scale of these soft jets in the Monte Carlo simulation.
The identification of b-quark jets (b-jets) is limited to the acceptance of the ID (|η| < 2.5). The b-jets are identified with a multivariate technique (the MV1 algorithm [31]) that is based on quantities that separate b and c jets from "light jets" arising from light-flavor quarks and gluons. The inputs [32] to this algorithm use quantities such as the presence of secondary vertices, the impact parameters of tracks, and the topologies of weak heavy-quark decays. The efficiency for identifying b-jets is measured [33] in a large data sample of dilepton $t\bar{t}$ pair candidates. An operating point that is 85% efficient for identifying b-jets is adopted. At this operating point, the probability of misidentifying a light jet as a b-jet is 10.3%.
Two leptons or a lepton and a jet may be close in η-φ space. The following procedure is adopted in the case of overlapping objects. Electron candidates that have tracks that extend to the MS are removed. If a muon candidate and an electron candidate are separated by ∆R < 0.1, then the muon is retained, and the electron is removed. These cases usually indicate a muon that has undergone bremsstrahlung in the ID material or calorimeter. A high-$p_T$ electron is always also reconstructed as a jet, so if an electron and the nearest jet are separated by less than ∆R = 0.3, the jet is removed. In contrast, if a muon and a jet are separated by less than ∆R = 0.3, the muon candidate is removed, as it is more likely to be a nonprompt muon from heavy-flavor decay. Finally, due to early bremsstrahlung, a prompt electron may produce more than one electron candidate in its vicinity. In the case of two electrons separated by less than ∆R = 0.1, the electron candidate with larger $E_T$ is retained.
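The overlap-removal procedure can be sketched as a sequence of filters. Applying the steps in the order in which the text lists them is an assumption of this example, as are the dictionary keys (`eta`, `phi`, `et`, `track_reaches_ms`) and the function names, which are hypothetical.

```python
import math

def delta_r(a, b):
    """Angular distance between two objects given as dicts with 'eta' and 'phi'."""
    dphi = math.remainder(a["phi"] - b["phi"], 2.0 * math.pi)
    return math.hypot(a["eta"] - b["eta"], dphi)

def remove_overlaps(electrons, muons, jets):
    """Return (electrons, muons, jets) after overlap removal."""
    # 1. Drop electron candidates whose track extends to the muon spectrometer
    electrons = [e for e in electrons if not e.get("track_reaches_ms", False)]

    # 2. Electron within dR < 0.1 of a muon: keep the muon, drop the electron
    electrons = [e for e in electrons
                 if all(delta_r(e, m) >= 0.1 for m in muons)]

    # 3. Jet within dR < 0.3 of an electron: drop the jet
    jets = [j for j in jets
            if all(delta_r(j, e) >= 0.3 for e in electrons)]

    # 4. Muon within dR < 0.3 of a remaining jet: drop the muon
    muons = [m for m in muons
             if all(delta_r(m, j) >= 0.3 for j in jets)]

    # 5. Two electrons within dR < 0.1: keep the one with larger E_T
    kept = []
    for e in sorted(electrons, key=lambda x: x["et"], reverse=True):
        if all(delta_r(e, k) >= 0.1 for k in kept):
            kept.append(e)

    return kept, muons, jets
```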
The signature of a high-momentum neutrino is a momentum imbalance in the transverse plane. This "missing" transverse momentum [34] is reconstructed as the negative vector sum of the momenta of objects selected according to ATLAS identification algorithms, such as leptons, photons, and jets, and of the remaining "soft" objects that typically have low values of $p_T$. The calculation can thus be summarized as

$$\boldsymbol{E}_T^{\rm miss} = -\left(\sum_{\rm selected} \boldsymbol{p}_T + \sum_{\rm soft} \boldsymbol{p}_T\right), \qquad (2)$$

where the reconstruction of soft objects and the choice of selected objects differ between different methods of evaluating the missing transverse momentum. Three methods of reconstruction are used in this analysis; $\boldsymbol{E}_T^{\rm miss}$ is used to represent one particular method, as described below.
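The negative vector sum over selected and soft objects can be written compactly. In this sketch each object is reduced to its transverse components `(px, py)` in GeV; the function name and input layout are assumptions for illustration, and the choice of which objects populate each list is what distinguishes the reconstruction methods.

```python
import math

def missing_transverse_momentum(selected, soft):
    """Negative vector sum of the transverse momenta of selected and soft objects.

    selected, soft: iterables of (px, py) in GeV (hypothetical layout).
    Returns (mex, mey, magnitude).
    """
    sum_px = sum(px for px, _ in selected) + sum(px for px, _ in soft)
    sum_py = sum(py for _, py in selected) + sum(py for _, py in soft)
    mex, mey = -sum_px, -sum_py
    return mex, mey, math.hypot(mex, mey)
```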
The large coverage in rapidity (y) of the calorimeter and its sensitivity to neutral particles motivate a calorimeter-based reconstruction of the missing transverse momentum. Selected objects are defined as the leptons selected by the analysis, and photons and jets with $E_T > 20$ GeV. The transverse momenta of these objects are added vectorially using object-specific calibrations. For the remaining soft objects, calibrated calorimeter cluster energy measurements are used to determine their net transverse momentum. The resulting missing transverse momentum is denoted $E_T^{\rm miss}$.

The significant pile-up present in the data degrades the resolution of the calorimeter-based measurement of missing transverse momentum. An O(20%) improvement in resolution is obtained using a track-based measurement of the soft objects, where the tracks are required to have $p_T > 0.5$ GeV and originate from the primary vertex. Tracks associated with identified leptons or jets are not included, as these selected objects are added separately to the calculation of the missing transverse momentum. This reconstruction of missing transverse momentum, denoted $p_T^{\rm miss}$, is used in the final fit to the $m_T$ distribution and improves the signal resolution relative to the $E_T^{\rm miss}$ used for the previous measurement [5]. Figure 4 shows the expected resolutions of the magnitudes of the calorimeter-based and track-based missing transverse momenta ($E_T^{\rm miss}$ and $p_T^{\rm miss}$, respectively), and for $m_T$ in the $n_j = 0$ category, all evaluated by subtracting the reconstructed quantity from the corresponding quantity obtained using generated leptons and neutrinos in ggF $H \to WW^{\ast}$ events. The r.m.s. of the $m_T$ difference decreases from 19 GeV to 14 GeV when using $p_T^{\rm miss}$ instead of $E_T^{\rm miss}$ in the reconstruction. The improved resolution significantly increases the discrimination between signal and certain background processes (such as $W\gamma$).

FIG. 4. Resolutions of (a) missing transverse momentum and (b) $m_T$ for the ggF signal MC in the $n_j = 0$ category (ATLAS Simulation). The comparisons are made between the calorimeter-based reconstruction ($E_T^{\rm miss}$) and the track-based reconstruction ($p_T^{\rm miss}$) of the soft objects [see Eq. (2)]. The resolution is measured as the difference of the reconstructed (Reco.) and generated (Gen.) quantities; the r.m.s. values of the distributions are given with the legends in units of GeV.

[...] a final-state lepton. A relative quantity $E_{T,{\rm rel}}^{\rm miss}$ is defined as follows:

$$E_{T,{\rm rel}}^{\rm miss} = \begin{cases} E_T^{\rm miss}\,\sin\Delta\phi_{\rm near} & \text{if } \Delta\phi_{\rm near} < \pi/2, \\ E_T^{\rm miss} & \text{otherwise,} \end{cases} \qquad (3)$$

where $\Delta\phi_{\rm near}$ is the azimuthal separation of the $E_T^{\rm miss}$ and the nearest high-$p_T$ lepton or jet. A similar calculation defines $p_{T,{\rm rel}}^{\rm miss}$ and $p_{T,{\rm rel}}^{\rm miss\,(trk)}$.
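The relative quantity can be illustrated with a small function. The piecewise form used here (the missing transverse momentum multiplied by sin ∆φ_near when ∆φ_near < π/2, and left unchanged otherwise) is the form commonly used for this variable and is stated as an assumption of this sketch; the function name is hypothetical.

```python
import math

def relative_met(met, dphi_near):
    """Missing transverse momentum projected relative to the nearest lepton or jet.

    met: magnitude of the missing transverse momentum in GeV.
    dphi_near: azimuthal separation (radians) between the missing transverse
               momentum and the nearest high-pT lepton or jet.
    """
    # Fold the separation into [0, pi]
    dphi = abs(math.remainder(dphi_near, 2.0 * math.pi))
    if dphi < math.pi / 2.0:
        # Keep only the component perpendicular to the nearby object
        return met * math.sin(dphi)
    return met
```

The projection reduces the variable when the momentum imbalance points along a nearby lepton or jet, as expected if it arises from a mismeasured object or from neutrinos collinear with a lepton.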

C. Monte Carlo samples
Given the large number of background contributions to the signal region and the broadly peaking signal m t distribution, Monte Carlo modeling is an important aspect of the analysis. Dedicated samples are generated to evaluate all but the W +jets and multijet backgrounds, which are estimated using data (see Sec. VI C). Most samples use the powheg [35] generator to include corrections at next-to-leading order in α S (NLO). In cases where higher parton multiplicities are important, alpgen [36] or sherpa [37] provide merged calculations at tree level for up to five additional partons. In a few cases, only leading-order generators (such as acermc [38] or gg2vv [39]) are available. Table III shows the generator and cross section used for each process. The matrix-element-level Monte Carlo calculations are matched to a model of the parton shower, underlying event and hadronization, using either pythia6 [40], pythia8 [41], herwig [42] (with the underlying event modeled by jimmy [43]), or sherpa. Input parton distribution functions (PDFs) are taken from ct10 [44] for the powheg and sherpa samples and cteq6L1 [45] for alpgen+herwig and acermc samples. The Z/γ * sample is reweighted to the mrstmcal PDF set [46].

TABLE III. Monte Carlo samples used to model the signal and background processes. The corresponding cross sections times branching fractions, σ · B, are quoted at √s = 8 TeV. The branching fractions include the decays t → W b, W → ℓν, and Z → ℓℓ (except for the process ZZ → ℓℓνν). Here ℓ refers to e, µ, or τ for signal and background processes. The neutral current Z/γ * → ℓℓ process is denoted Z or γ *, depending on the mass of the produced lepton pair. Vector-boson scattering (VBS) and vector-boson fusion (VBF) background processes include all leading-order diagrams with zero QCD vertices for the given final state (except for diagrams with Higgs bosons, which only appear in the signal processes).

Process | MC generator | σ · B (pb)
Drell-Yan Z (m ℓℓ > 10 GeV) | alpgen+herwig | 16500
VBF Z + 2 jets (m ℓℓ > 7 GeV) | sherpa | 5.36
Pile-up interactions are modeled with pythia8, and the ATLAS detector response is simulated [47] using either geant4 [48] or geant4 combined with a parametrized geant4-based calorimeter simulation [49]. Events are filtered during generation where necessary, allowing up to 2 ab −1 of equivalent luminosity for high cross section processes such as Z/γ * in the VBF category.
The ggF and VBF production modes for the H → W W * signal are modeled with powheg+pythia8 at m H = 125 GeV, and the corresponding cross sections are shown in Table III. A detailed description of these processes and their modeling uncertainties is given in Sec. V. The smaller contribution from the VH process, with subsequent H → W W * decay, is also shown in Table III. Not shown are the H → τ τ MC samples, which have an even smaller contribution but are included in the signal modeling for completeness using the same generators as for the H → W W * decay. The H → ZZ * decay contributes negligibly after event selection and is not included in the analysis.
Cross sections are calculated for the dominant diboson and top-quark processes as follows: the inclusive W W cross section is calculated to NLO with mcfm [50]; nonresonant gluon fusion is calculated and modeled to leading order in α S (LO) with gg2vv, including both W W and ZZ production and their interference; tt production is normalized to the calculation at next-to-next-to-leading order in α S (NNLO) with resummation of higher-order terms to the next-to-next-to-leading logarithms (NNLL), evaluated with top++2.0 [51]; and single-top processes are normalized to NNLL following the calculations from Refs. [52], [53], and [54] for the s-channel, t-channel, and W t processes, respectively. The W W kinematics are modeled using the powheg+pythia6 sample for the n j ≤ 1 categories and the merged multileg sherpa sample for the n j ≥ 2 categories. Section VI A describes this modeling and the normalization of the double parton interaction process (qq → W ) + (qq → W ), which is modeled using the pythia8 generator. For W W , WZ, and ZZ production via nonresonant vector-boson scattering, the sherpa generator provides the LO cross section and is used for event modeling. The negligible VBS ZZ process is not shown in the table but is included in the background modeling for completeness.
The process W γ * is defined as associated W +Z/γ * production, where there is an opposite-charge same-flavor lepton pair with invariant mass m ℓℓ less than 7 GeV. This process is modeled using sherpa with up to one additional parton. The range m ℓℓ > 7 GeV is simulated with powheg+pythia8 and normalized to the powheg cross section. The use of sherpa for W γ * is due to the inability of powheg+pythia8 to model invariant masses down to the dielectron production threshold. The sherpa sample requires two leptons with p t > 5 GeV and | η | < 3. The jet multiplicity is corrected using a sherpa sample generated with 0.5 < m ℓℓ < 7 GeV and up to two additional partons, while the total cross section is corrected using the ratio of the mcfm NLO to sherpa LO calculations in the same restricted mass range. A similar procedure is used to model Zγ *, defined as Z/γ * pair-production with one same-flavor opposite-charge lepton pair having m ℓℓ ≤ 4 GeV and the other having m ℓℓ > 4 GeV.
The W γ and DY processes are modeled using alpgen+herwig with merged tree-level calculations of up to five jets. The merged samples are normalized to the NLO calculation of mcfm (for W γ) or the NNLO calculation of DYNNLO [55] (for Z/γ *). The W γ sample is generated with the requirements p γ t > 8 GeV and ∆R(γ, ℓ) > 0.25. A W γ calculation at NNLO [56] finds a correction of less than 8% in the modeled phase space, which falls within the uncertainty of the NLO calculation.
A sherpa sample is used to model the Z(→ ℓℓ)γ background. The photon is required to have p γ t > 8 GeV and ∆R(γ, ℓ) > 0.1; the lepton pair must satisfy m ℓℓ > 10 GeV. The cross section is normalized to NLO using mcfm. Events are removed from the alpgen+herwig DY samples if they overlap with the kinematics defining the sherpa Z(→ ℓℓ)γ sample.
The uncertainties are discussed for each specific background in Sec. VI, and their treatment in the likelihood fit is summarized in Sec. VII.

D. Modifications for 7 TeV data
The 7 TeV data are selected using single-lepton triggers with a muon p t threshold of 18 GeV and with varying electron p t thresholds (20 or 22 GeV depending on the data-taking period). The identification of the electrons uses the "tight" selection-based requirement described in Ref. [57] over the entire E t range, and the GSF fit is not used. Muons are identified with the same selection used for the analysis of the 8 TeV data. The lepton isolation requirements are tighter than in the 8 TeV analysis due to a statistically and systematically less precise estimation of the backgrounds with misidentified leptons. The jet p t thresholds are the same as in the 8 TeV analysis, but due to less severe pile-up conditions, the requirement on the jet vertex fraction jvf > 0.75 can be stricter without loss in signal efficiency.
The MC samples used for the analysis of the 7 TeV data have been chosen to reflect closely the samples used for the 8 TeV data (see Table III). The same matrix-element calculations and parton-shower models are used for all samples except for the WZ and ZZ backgrounds where powheg+pythia6 is used instead of powheg+pythia8. The pile-up events are simulated with pythia6 instead of pythia8. The samples are normalized to inclusive cross sections computed following the same prescriptions described in Sec. III C.

IV. EVENT SELECTION
The initial sample of events is based on the data quality, trigger, lepton p t threshold, and two identified leptons discussed in the previous section. Events with more than two identified leptons with p t > 10 GeV are rejected.
After the leptons are required to have opposite charge and pass the p t -threshold selections, the eµ sample of approximately 1.33 × 10^5 events is composed primarily of contributions from Z/γ * → τ τ and tt, with approximately 800 expected signal events. The ee/µµ sample of 1.6 × 10^7 events is dominated by Z/γ * → ee, µµ production, which is largely reduced (by approximately 90%) by requiring | m ℓℓ − m Z | > 15 GeV. Low-mass meson resonances and Z/γ * (Drell-Yan or DY) events are removed with the m ℓℓ > 10 GeV (12 GeV) selection for the eµ (ee/µµ) samples. The DY, and W +jets and multijets (denoted as Misid.) events are further reduced with requirements on the missing transverse momentum distributions. Figure 5(a) shows the E miss t,rel distribution in the n j ≤ 1 ee/µµ sample, where the dominant Z/γ * → ee, µµ contribution is suppressed by the E miss t,rel > 40 GeV requirement. In the n j ≤ 1 and n j ≥ 2 ggF-enriched eµ samples, a p miss t > 20 GeV selection is applied to significantly reduce the Z/γ * → τ τ background and the multijet backgrounds with misidentified leptons (see Figs. 5(b) and 5(c) for the n j ≤ 1 categories). The n j ≥ 2 VBF-enriched eµ sample requires no missing transverse momentum selection, and thus recovers signal acceptance for the statistically limited VBF measurement. In the ee/µµ sample, more stringent selections are applied: E miss t > 45 GeV and p miss t > 40 GeV. Table IV lists these so-called preselection criteria.
The different background composition as a function of jet multiplicity motivates the division of the data sample into the various n j categories. Figures 6(a) and 6(b) show the jet multiplicity distributions in the ee/µµ and eµ samples, respectively. The Z/γ * → ee, µµ background dominates the n j ≤ 1 ee/µµ samples even after the above-mentioned missing transverse momentum requirements. The top-quark background becomes more significant at higher jet multiplicities. Its suppression is primarily based on the b-jet multiplicity; the distribution is shown in Fig. 6(c) for the eµ sample.
In each of the n j and lepton-flavor categories, further criteria are applied to increase the precision of the signal measurement. Sections IV A to IV D present the discriminating distributions and the resulting event yields. The selections are also listed in Table IV along with the preselection. Section IV E details the selection modifications for the 7 TeV data analysis. Section IV F concludes with the distributions after all requirements are applied.

FIG. 5. Missing transverse momentum distributions. The plots for E miss t,rel and p miss t [see Eq. (2)] are made after applying the preselection criteria common to all n j categories (see Table IV). The observed data points (Obs, •) with their statistical uncertainty (stat) are compared with the histograms representing the cumulative expected contributions (Exp, -), for which the systematic uncertainty (syst) is represented by the shaded band. The band accounts for experimental uncertainties and for theoretical uncertainties on the acceptance for background and signal and is only visible in the tails of the distributions. Of the listed contributions (see Table I), the dominant DY backgrounds peak at low values. The legend order follows the histogram stacking order of the plots with the exception of DY ee/µµ; it is at the top for (a) and at the bottom for the others. The arrows mark the threshold of the selection requirements.
In this section, the background processes are normalized using control regions (see Sec. VI). The distributions in the figures and the rates in the tables for the signal contribution correspond to the expectations for an SM Higgs boson with m H = 125 GeV. The VBF contribution includes the small contribution from VH production, unless stated otherwise.
A. nj = 0 category
Events with a significant mismeasurement of the missing transverse momentum are suppressed by requiring p miss t to point away from the dilepton transverse momentum (∆φ ℓℓ,met > π/2). In the absence of a reconstructed jet to balance the dilepton system, the magnitude of the dilepton momentum p ℓℓ t is expected to be small in DY events. A requirement of p ℓℓ t > 30 GeV further reduces the DY contribution while retaining the majority of the signal events, as shown for the eµ sample in Fig. 7(a). At this stage, the DY background is sufficiently reduced in the eµ sample, but still dominates in the ee/µµ one. In this latter sample, a requirement of p miss (trk) t,rel > 40 GeV is applied to provide further rejection against DY events.

FIG. 6. Jet multiplicity distributions (see Table IV). See Fig. 5 for plotting details.
The continuum W W production and the resonant Higgs boson production processes can be separated by exploiting the spin-0 property of the Higgs boson, which, when combined with the V − A nature of the W boson decay, leads to a small opening angle between the charged leptons (see Sec. II). A requirement of ∆φ ℓℓ < 1.8 reduces both the W W and DY backgrounds while retaining 90% of the signal. A related requirement of m ℓℓ < 55 GeV combines the small lepton opening angle with the kinematics of a low-mass Higgs boson (m H = 125 GeV). The m ℓℓ and ∆φ ℓℓ distributions are shown for the eµ sample in Figs. 7(b) and 7(c), respectively.
TABLE IV. Event selection summary. Selection requirements specific to the eµ and ee/µµ lepton-flavor samples are noted as such (otherwise, they apply to both); a dash (-) indicates no selection. For the n j ≥ 2 VBF-enriched category, met denotes all types of missing transverse momentum observables. Values are given for the analysis of 8 TeV data for m H = 125 GeV; the modifications for 7 TeV are given in Sec. IV E. All energy-related values are in GeV.

Objective | ggF-enriched | VBF-enriched

An additional discriminant, f recoil, based on soft jets, is defined to reduce the remaining DY contribution in the ee/µµ sample. This residual DY background satisfies the event selection primarily when the measurement of the energy associated with partons from initial-state radiation is underestimated, resulting in an apparent imbalance of transverse momentum in the event. To further suppress such mismeasured DY events, jets with p j t > 10 GeV, within a π/2 wedge in φ (noted as ∧) centered on −p ℓℓ t, are used to define a fractional jet recoil relative to p ℓℓ t:
$$f_{\rm recoil} = \Big|\sum_{{\rm jets}\ j\ {\rm in}\ \wedge} {\rm jvf}_j \cdot \boldsymbol{p}^{\,j}_{\rm t}\Big| \Big/ p^{\ell\ell}_{\rm t}. \qquad (4)$$
The jet transverse momenta are weighted by their associated jvf value to suppress the contribution from jets originating from pile-up interactions. Jets with no associated tracks are assigned a weight of 1. The f recoil distribution is shown in Fig. 7(d); a requirement of f recoil < 0.1 reduces the residual DY background in the ee/µµ sample by a factor of seven.
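The fractional jet recoil can be sketched as follows. This is an illustration under stated assumptions: the wedge is taken to have a total opening of π/2 (half-width π/4) around the direction opposite to the dilepton system, jets are passed as hypothetical `(pt, phi, jvf)` tuples, and `jvf=None` stands for a jet with no associated tracks. None of these names come from the analysis code.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal separation folded into the range [0, pi]."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    return 2.0 * math.pi - dphi if dphi > math.pi else dphi

def f_recoil(jets, ptll, phill, min_jet_pt=10.0, half_width=math.pi / 4.0):
    """|jvf-weighted vector sum of soft-jet pt in the wedge| / dilepton pt.

    jets  : iterable of (pt, phi, jvf); jvf=None marks a jet without tracks
    ptll  : magnitude of the dilepton transverse momentum in GeV
    phill : azimuthal angle of the dilepton transverse momentum
    """
    if ptll <= 0.0:
        raise ValueError("ptll must be positive")
    recoil_phi = phill + math.pi  # wedge is centered on -p_t(ll)
    sum_x = sum_y = 0.0
    for pt, phi, jvf in jets:
        if pt <= min_jet_pt or delta_phi(phi, recoil_phi) > half_width:
            continue
        weight = 1.0 if jvf is None else jvf  # trackless jets get weight 1
        sum_x += weight * pt * math.cos(phi)
        sum_y += weight * pt * math.sin(phi)
    return math.hypot(sum_x, sum_y) / ptll
```

An event would then be kept in the n j = 0 ee/µµ sample if the returned value is below 0.1.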
The expected signal and background yields at each stage of selection are shown in Table V, together with the observed yields. At the final stage, the table also shows the event yields in the range (3/4) m H < m t < m H where most of the signal resides. This m t selection is not used to extract the final results, but illustrates the expected signal-to-background ratios in the different categories.

FIG. 7. Distributions of (a) p ℓℓ t, (b) m ℓℓ, (c) ∆φ ℓℓ, and (d) f recoil for the n j = 0 category (see Table V). For each variable, the top panel compares the observed and the cumulative expected distributions; the bottom panel shows the overlay of the distributions of the individual expected contributions, normalized to unit area, to emphasize shape differences. See Fig. 5 for plotting details.

B. nj = 1 category
In the case of the eµ sample, a requirement is applied to the transverse mass defined for a single lepton ℓ i:
$$m^{\ell_i}_{\rm t} = \sqrt{2\, p^{\ell_i}_{\rm t}\, p^{\rm miss}_{\rm t}\, (1 - \cos\Delta\phi)}, \qquad (5)$$
where ∆φ is the angle between the lepton transverse momentum and p miss t. This quantity tends to have small values for the DY background and large values for the signal process. It also has small values for multijet production, where misidentified leptons are frequently measured with energy lower than the jets from which they originate. The m ℓ t distribution, chosen to be the larger of m ℓ1 t or m ℓ2 t, is presented in Fig. 8(a), and shows a clear difference in shape between the DY and multijet backgrounds, which lie mostly at low values of m ℓ t, and the other background processes. Thus, both the DY and multijet processes are substantially reduced with a requirement of m ℓ t > 50 GeV in the eµ sample.
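The single-lepton transverse mass can be illustrated with a short sketch. It assumes the standard massless form sqrt(2 · p t(lepton) · p miss t · (1 − cos ∆φ)); the function names and the `(pt, phi)` tuple layout are hypothetical.

```python
import math

def mt_single_lepton(lep_pt, lep_phi, met, met_phi):
    """Transverse mass of one lepton and the missing transverse momentum."""
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(lep_phi - met_phi)))

def mt_lepton_max(leptons, met, met_phi):
    """Larger of the two single-lepton transverse masses.

    leptons : list of (pt, phi) tuples for the two leptons
    """
    return max(mt_single_lepton(pt, phi, met, met_phi) for pt, phi in leptons)
```

A lepton that is collinear with the missing transverse momentum gives a value near zero, which is why this quantity is small for backgrounds in which the apparent missing momentum follows a lepton direction.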
The requirement of a jet allows for improved rejection of the Z/γ * → τ τ background. Using the direction of the measured missing transverse momentum, the mass of the τ -lepton pair can be reconstructed using the so-called collinear approximation [58]. A requirement of m τ τ < m Z − 25 GeV significantly reduces the remaining DY contribution in the eµ sample, as can be seen in Fig. 8(b). The remaining selection criteria (p miss (trk) t,rel, f recoil, m ℓℓ, ∆φ ℓℓ) are the same as in the n j = 0 category, except that p ℓℓ t is replaced with the magnitude of p ℓℓj t = p ℓℓ t + p j t in the calculation of f recoil, and the p miss (trk) t,rel threshold is reduced to 35 GeV. The m ℓℓ and ∆φ ℓℓ distributions are shown in Figs. 8(c) and 8(d), respectively. Differences between the shapes of the signal or W W processes and the Z/γ * background processes are more apparent in the ∆φ ℓℓ distribution of the eµ + ee/µµ events than of the eµ events.
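A minimal sketch of the collinear approximation is given below. It assumes the textbook form of the method: each neutrino system is taken to be collinear with its visible lepton, the missing transverse momentum is decomposed along the two lepton directions, and the visible mass is rescaled by the visible momentum fractions x 1 and x 2. The function name, the `(pt, eta, phi)` tuple layout, and the choice to return `None` for unphysical solutions are illustrative assumptions, not the treatment used in the analysis.

```python
import math

def collinear_mtautau(lep1, lep2, met_x, met_y):
    """Di-tau mass in the collinear approximation, or None if unphysical.

    lep1, lep2   : (pt, eta, phi) of the visible leptons, taken as massless
    met_x, met_y : components of the missing transverse momentum in GeV
    """
    pt1, eta1, phi1 = lep1
    pt2, eta2, phi2 = lep2
    px1, py1 = pt1 * math.cos(phi1), pt1 * math.sin(phi1)
    px2, py2 = pt2 * math.cos(phi2), pt2 * math.sin(phi2)
    # Solve met = a1 * p1 + a2 * p2 in the transverse plane (Cramer's rule).
    det = px1 * py2 - px2 * py1
    if abs(det) < 1e-9 * pt1 * pt2:
        return None  # leptons (anti)parallel in phi: no unique solution
    a1 = (met_x * py2 - px2 * met_y) / det
    a2 = (px1 * met_y - met_x * py1) / det
    if a1 < 0.0 or a2 < 0.0:
        return None  # would correspond to a momentum fraction above one
    x1, x2 = 1.0 / (1.0 + a1), 1.0 / (1.0 + a2)
    m_vis_sq = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m_vis_sq, 0.0) / (x1 * x2))
```

The method fails for back-to-back leptons, which is one reason the presence of a jet (giving the dilepton system a transverse boost) improves its usefulness.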

C. VBF-enriched nj ≥ 2 category
The n j ≥ 2 sample contains signal events produced by both the VBF and ggF production mechanisms. This section focuses on the former; the next section focuses on the latter.

TABLE V. Event selection for the n j = 0 category in the 8 TeV data analysis. The selection is presented separately for the eµ and ee/µµ samples. The summary columns give the observed yields (N obs), the expected background yields (N bkg), their ratios, and the expected signal yields (N sig). For the dominant backgrounds, the expected yields are normalized using control regions, as described in Sec. VI. The N sig values are given for m H = 125 GeV and are subdivided into the N ggF and N VBF contributions. The composition columns give the contributions to N bkg (see Sec. VI). The requirements are imposed sequentially from top to bottom; entries are shown as 0.0 (-) if they are less than 0.1 (0.01) events. The entries are rounded to a precision commensurate with the statistical uncertainties due to the random error associated with the central value of the yield (stat obs = √N obs) and the sampling error associated with the finite sample size used for the prediction for background type k (stat bkg,k). The errors on N obs /N bkg are due to the combined statistical uncertainty on stat obs and stat bkg. Energy-related quantities are in GeV.
The sample is analyzed using a boosted decision tree (BDT) multivariate method [16] that considers VBF Higgs boson production as signal and the rest of the processes as background, including ggF Higgs boson production. A cross-check analysis is performed using sequential selections on some of the variables that are used as inputs to the BDT. The W W and Z/γ * → τ τ backgrounds include both events with QCD vertices and electroweak events with VBS or VBF interactions (see Table III).
The VBF process is characterized by the kinematics of the pair of tag jets (j 1 and j 2) and the activity in the rapidity gap between them. In general, this process results in two highly energetic forward jets with ∆y jj > 3, where ∆y jj = | y j1 − y j2 |. The invariant mass of this tag-jet pair combines ∆y jj with p j t information since $m_{jj} \approx \sqrt{p^{j1}_{\rm t}\, p^{j2}_{\rm t}}\; e^{\Delta y_{jj}/2}$ for large values of ∆y jj. Both ∆y jj and m jj are input variables to the BDT; for the cross-check analysis, ∆y jj > 3.6 and m jj > 600 GeV are required [see Figs. 9(a) and 9(b)].
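The large-gap relation between the dijet mass, the jet transverse momenta, and the rapidity separation can be checked numerically. The sketch below assumes massless jets, for which m jj² = 2 p j1 t p j2 t (cosh ∆y jj − cos ∆φ jj); the function names are illustrative.

```python
import math

def mjj_exact(pt1, y1, phi1, pt2, y2, phi2):
    """Invariant mass of two massless jets."""
    return math.sqrt(2.0 * pt1 * pt2 * (math.cosh(y1 - y2) - math.cos(phi1 - phi2)))

def mjj_large_gap(pt1, pt2, dy):
    """Approximation valid when the rapidity gap dy is large."""
    return math.sqrt(pt1 * pt2) * math.exp(dy / 2.0)

# Two 50 GeV jets, back to back in phi, separated by five units of rapidity.
exact = mjj_exact(50.0, 2.5, 0.0, 50.0, -2.5, math.pi)
approx = mjj_large_gap(50.0, 50.0, 5.0)
print(f"exact = {exact:.1f} GeV, approximation = {approx:.1f} GeV")
```

For a gap of five units the two values agree at the percent level, while for a gap of about one unit the cos ∆φ jj term is no longer negligible and the approximation degrades.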
TABLE VII. Event selection for the n j ≥ 2 VBF-enriched category in the cross-check analysis (see Table V for presentation details). The N ggF, N VBF, and N VH expected yields are shown separately. The expected yields for W W and Z/γ * → τ τ are divided into QCD and electroweak (EW) processes, where the latter includes VBF production.

TABLE VIII. Event selection for the n j ≥ 2 VBF-enriched category in the BDT analysis (see Table V for presentation details). The event yields in (a), before the BDT classification, are shown after the preselection and the additional requirements applied before the BDT classification (see text). The event yields in (b) are given in bins of O BDT after the classification, with the normalization factors applied to the yields (see Table XX). In the specific case of (a), the normalization factors described in Sec. VI are not applied to the relevant backgrounds. The N ggF, N VBF, and N VH expected yields are shown separately.

The ∆y jj gap defines a "central region," where a relatively low level of hadronic activity is expected because the mediating weak bosons do not exchange color. The number of extra jets (n extra-j) in the ∆y jj gap quantifies the activity. Requiring the absence of such jets in this region is known as a "central-jet veto" [59] and it suppresses processes where the jets are produced via QCD radiation. The central-jet veto uses jets with p t > 20 GeV, and this requirement is applied in both the BDT and cross-check analyses. The selection can be expressed in terms of jet centrality, defined as:
$$C_{j3} = \left|\eta_{j3} - \frac{\Sigma\eta_{jj}}{2}\right| \Big/ \frac{\Delta\eta_{jj}}{2}, \qquad (6)$$
where η j3 is the pseudorapidity of an extra jet, Σ η jj = η j1 + η j2 and ∆η jj = | η j1 − η j2 |. The value of C j3 increases from zero, when η j3 is centered between the tag jets, to unity when η j3 is aligned in η with either of the tag jets, and is greater than unity when | η j3 | > | η j1 | or | η j3 | > | η j2 |. The centrality of any extra jet in the event is required therefore to be C j3 > 1.
The Higgs boson decay products tend to be in the central rapidity region. The centrality of a given lepton, C ℓ, with respect to the tag jets is defined similarly to that for extra jets in Eq. (6). A requirement of C ℓ < 1 is applied to each lepton in the BDT and cross-check analyses. The sum of lepton centralities Σ C ℓ = C ℓ1 + C ℓ2 is used as an input to the BDT. The C ℓ1 distribution is shown in Fig. 9(c).
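The centrality requirements can be sketched directly from the definitions of Σ η jj and ∆η jj. The form used here, |η − Σ η jj /2| divided by ∆η jj /2, is the one that reproduces the stated behavior (zero midway between the tag jets, unity when aligned with either tag jet). The function names are illustrative.

```python
def centrality(eta, eta_j1, eta_j2):
    """Centrality of an object at pseudorapidity eta relative to the tag jets."""
    gap = abs(eta_j1 - eta_j2)
    if gap == 0.0:
        raise ValueError("tag jets must be separated in eta")
    return abs(eta - 0.5 * (eta_j1 + eta_j2)) / (0.5 * gap)

def passes_centrality(extra_jet_etas, lepton_etas, eta_j1, eta_j2):
    """Central-jet veto (C_j3 > 1 for every extra jet) and C_l < 1 per lepton.

    extra_jet_etas should contain only jets above the veto pt threshold.
    """
    jets_ok = all(centrality(e, eta_j1, eta_j2) > 1.0 for e in extra_jet_etas)
    leptons_ok = all(centrality(e, eta_j1, eta_j2) < 1.0 for e in lepton_etas)
    return jets_ok and leptons_ok
```

With tag jets at η = −2 and η = 2, an extra jet at η = 3 has centrality 1.5 and passes the veto, while one at η = 0.3 lies inside the gap and fails it.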
Top-quark pair production has a large cross section and the same final state as VBF Higgs boson production, with the exception that its jets result from b-quarks. A requirement of n b = 0 with p t > 20 GeV is made in the BDT and cross-check analyses. This requirement is made on all jets in the event regardless of classification as tag jets. Significant top-quark background still remains because of the limited η coverage of the tracker, the p t threshold applied to the b-jets, and the inefficiency of the b-jet identification algorithm within the tracking region. Further reductions are achieved through targeted kinematic selections and the BDT.

FIG. 9. Distributions of (a) m jj, (b) ∆y jj, (c) C ℓ1, and (d) Σ m ℓj, for the n j ≥ 2 VBF-enriched category. The plot in (a) is made after requiring all selections up to m jj, (b) up to ∆y jj and (c) up to C ℓ1 (see Table VII). The signal is shown separately for the ggF and VBF production processes. The arrows mark the threshold of the selection requirements for the cross-check analysis in (a)-(c). There is no selection made on the variable in (d) since it is only used as an input to the training of the BDT. See Figs. 5 and 7 for plotting details.
The pair production of top quarks occurs dominantly through gluon-gluon annihilation, and is frequently accompanied by QCD radiation. This radiation is used as a signature to further suppress top-quark backgrounds using the summed vector p t of the final-state objects, p sum t = p ℓℓ t + p miss t + Σ p j t, where the last term is a sum of the transverse momenta of all jets in the event. Its magnitude is used as input to the BDT and is required to be p sum t < 15 GeV in the cross-check analysis.
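The summed transverse momentum is a plain vector sum in the transverse plane, sketched below. The function name and the `(pt, phi)` tuple layout are hypothetical; the dilepton term is formed here by adding the two leptons individually.

```python
import math

def pt_sum(leptons, met, jets):
    """Magnitude of the vector sum of lepton, missing, and jet transverse momenta.

    leptons, jets : lists of (pt, phi) tuples
    met           : (magnitude, phi) of the missing transverse momentum
    """
    vectors = list(leptons) + [met] + list(jets)
    sum_x = sum(pt * math.cos(phi) for pt, phi in vectors)
    sum_y = sum(pt * math.sin(phi) for pt, phi in vectors)
    return math.hypot(sum_x, sum_y)
```

A fully reconstructed and balanced event gives a value near zero, whereas additional unreconstructed radiation leaves a net imbalance that pushes the value above the 15 GeV threshold used in the cross-check analysis.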
The sum of the four combinations of lepton-jet invariant mass, Σ m ℓj = m ℓ1,j1 + m ℓ1,j2 + m ℓ2,j1 + m ℓ2,j2, is also used as an input to the BDT. In the VBF topology, tag jets are more forward whereas the leptons tend to be more central. This results in differences in the shapes of the Σ m ℓj distributions for the VBF signal and the background processes, as can be seen in Fig. 9(d). This variable is not used in the cross-check analysis.
The other BDT input variables are those related to the H → W W * → ℓνℓν decay topology (m ℓℓ, ∆φ ℓℓ, m t), which are also used in the n j ≤ 1 categories. The cross-check analysis requires ∆φ ℓℓ < 1.8 and m ℓℓ < 50 GeV.
Distributions from eight variables are input to the BDT: Σ C ℓ, ∆y jj, and m jj for VBF selection; p sum t and Σ m ℓj for tt rejection; and ∆φ ℓℓ, m ℓℓ, and m t for their sensitivity to the H → W W * → ℓνℓν decay topology. The BDT is trained after the common preselection criteria (as listed in Table IV) and the n b = 0 requirement. This event selection stage corresponds to the n b = 0 stage presented for the cross-check analysis in Table VII. Additional criteria, common to the BDT and cross-check analyses, are applied before the classification of the events based on the BDT output (described below). They include requirements on m τ τ, C j3 and C ℓ. The observed and expected event yields after all these requirements are shown in Table VIII(a) separately for the eµ and ee/µµ samples. The dominant background processes include tt and Z/γ * production. The normalization factors, described in Sec. VI, are not applied to these backgrounds at this stage.
The BDT is trained using the MC samples after the above-mentioned selections. The training starts with a single decision tree where an event is given a score of ± 1 if it satisfies particular sets of decisions ( + 1 leaf contains signal-like events and − 1 background-like ones). A thousand such trees are built and in each iteration the weight of miscategorized events is relatively increased, or "boosted." The final discriminant O BDT for a given event is the weighted average of the binary scores from the individual trees. The bin widths for the likelihood fit are optimized for the expected significance while keeping each bin sufficiently populated. The chosen configuration is four bins with boundaries at [−1, −0.48, 0.3, 0.78, 1], and with corresponding bin numbers from 0 to 3. The lowest bin contains the majority of background events and has a very small signal-to-background ratio. It is therefore not used in the likelihood fit. The expected and observed event yields after the classification in bins of O BDT are shown in Table VIII(b).

D. ggF-enriched nj ≥ 2 category
Events with n j ≥ 2 that are in neither the VBF-enriched category for the BDT analysis nor that for the cross-check analysis are used to measure ggF production. In this category only the eµ final state is analyzed due to the relatively low expected significance in the ee/µµ sample. Table IX shows the signal and background yields after each selection requirement.

FIG. 10. Distribution of dilepton invariant mass for the n j ≥ 2 ggF-enriched category. The plot is made after requiring all selections up to m ℓℓ (see Table IX). See Fig. 5 for plotting details.
The initial selection, n b = 0 and m τ τ < m Z − 25 GeV, is common to the other categories and reduces the top-quark and DY backgrounds. The ggF-enriched sample is forced to be mutually exclusive to the VBF-enriched sample by inverting at least one of the VBF-specific requirements: C j3 > 1, C ℓ < 1, or O BDT > −0.48. A similar inversion is done for the cross-check analysis: ∆y jj > 3.6, m jj > 600 GeV, n extra-j = 0, or C ℓ < 1. Both sets of orthogonality requirements for the BDT and the cross-check are imposed for the n j ≥ 2 ggF-enriched category.
The resulting sample contains events in a region sensitive to VH production where the associated W or Z boson decays hadronically. This region is suppressed by rejecting events in the region of ∆η jj ≤ 1.2 and | m jj − 85 | < 15 GeV. Figure 10 shows the m ℓℓ distribution after the VH orthogonality requirement. The H → W W * → ℓνℓν topological selections, m ℓℓ < 55 GeV and ∆φ ℓℓ < 1.8, further reduce the dominant top-quark background by 70%, resulting in a signal purity of 3.3%.
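The orthogonality logic can be written compactly: "inverting at least one requirement" is the same as failing the logical AND of all of them. The sketch below uses a hypothetical `DijetEvent` container whose field names are illustrative; it covers only the orthogonality and VH-region requirements, not the full ggF-enriched selection.

```python
from dataclasses import dataclass

@dataclass
class DijetEvent:
    extra_jets_outside_gap: bool  # every extra jet has C_j3 > 1
    leptons_central: bool         # both leptons have C_l < 1
    o_bdt: float                  # BDT output
    dy_jj: float                  # rapidity gap of the tag jets
    deta_jj: float                # pseudorapidity gap of the tag jets
    m_jj: float                   # tag-jet invariant mass in GeV
    n_extra_jets_in_gap: int      # number of extra jets in the gap

def in_vbf_bdt(ev):
    return ev.extra_jets_outside_gap and ev.leptons_central and ev.o_bdt > -0.48

def in_vbf_cross_check(ev):
    return (ev.dy_jj > 3.6 and ev.m_jj > 600.0
            and ev.n_extra_jets_in_gap == 0 and ev.leptons_central)

def in_vh_region(ev):
    return ev.deta_jj <= 1.2 and abs(ev.m_jj - 85.0) < 15.0

def in_ggf_enriched(ev):
    """Fails both VBF selections and lies outside the VH-sensitive region."""
    return not in_vbf_bdt(ev) and not in_vbf_cross_check(ev) and not in_vh_region(ev)
```

Writing the category as a negation of the VBF selections guarantees by construction that no event can enter both the VBF-enriched and ggF-enriched samples.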

E. Modifications for 7 TeV data
The 7 TeV data analysis closely follows the selection used in the 8 TeV analysis. The majority of the differences can be found in the object definitions and identifications, as described in Sec. III B. The lower average pile-up allows the loosening, or removal, of requirements on several pile-up sensitive variables from the selection.
The amount of DY background in the ee/µµ channel depends on the missing transverse momentum resolution. This background is reduced in a lower pile-up environment, allowing lower E miss t thresholds in the [...] t are removed entirely. The effect of the reduced E miss t thresholds is partially compensated by an increased p ℓℓ t requirement of 40 GeV in the n j = 0 category and a p j t > 35 GeV requirement added to the n j = 1 category. The f recoil criteria are loosened to 0.2 and 0.5 in the n j = 0 and n j = 1 categories, respectively.

TABLE IX. Event selection for the n j ≥ 2 ggF-enriched category in the 8 TeV data analysis (see Table V for presentation details). The N ggF, N VBF, and N VH expected yields are shown separately. The "orthogonality" requirements are given in the text.
In the n j ≥ 2 category, only the VBF-enriched analysis is considered; it follows an approach similar to the 8 TeV version. It exploits the BDT multivariate method and uses the same BDT classification and output binning as the 8 TeV data analysis. In the eµ sample, a two-bin fit of the O BDT is used (bins 2 and 3 are merged). In the ee/µµ sample, a one-bin fit is used (bins 1-3 are merged) due to the smaller sample size.
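The O BDT binning and the bin merging can be sketched as follows, using the bin boundaries quoted for the 8 TeV analysis. Two points are assumptions of this sketch rather than statements from the text: a score exactly on a boundary is assigned to the upper bin, and bin 0 is excluded from the fit in the 7 TeV configurations as it is at 8 TeV. The scheme labels are hypothetical.

```python
import bisect

BOUNDARIES = [-1.0, -0.48, 0.3, 0.78, 1.0]

# Map from O_BDT bin number to fit bin; bin 0 is absent because it is not fitted.
MERGING = {
    "8TeV": {1: 0, 2: 1, 3: 2},        # three separate fit bins
    "7TeV_emu": {1: 0, 2: 1, 3: 1},    # bins 2 and 3 merged
    "7TeV_eemumu": {1: 0, 2: 0, 3: 0}, # bins 1-3 merged
}

def bdt_bin(score):
    """O_BDT bin number (0 to 3) for a score in [-1, 1]."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    return min(bisect.bisect_right(BOUNDARIES, score) - 1, 3)

def fit_bin(score, scheme):
    """Fit-bin index for the given scheme, or None if the event is not fitted."""
    return MERGING[scheme].get(bdt_bin(score))
```

For example, a score of 0.5 falls in bin 2, which is its own fit bin at 8 TeV but shares a fit bin with bin 3 in the 7 TeV eµ configuration.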
The background estimation, signal modeling, final observed and expected event yields, and the statistical analysis and results, are presented in the next sections.

F. Summary
This section described the event selection in the n j and lepton-flavor categories. Each of these categories is treated independently in the statistical analysis using a fit procedure described in Sec. VII. Inputs to the fit include the event yields and distributions at the final stage of the event selection without the m t requirement.
The total signal efficiency for H → W W * → ℓνℓν events produced with ℓ = e and µ, including all signal categories and production modes, is 10.2% at 8 TeV for a Higgs boson mass of 125.36 GeV. The corresponding signal efficiency when considering only the VBF production mode is 7.8%. Figure 11 shows the m t distributions in the n j = 0, n j = 1 and n j ≥ 2 ggF-enriched categories for the 8 TeV data. The distributions for the n j ≤ 1 categories are shown in Fig. 12 for the 7 TeV data. The final O BDT output distribution, for the VBF-enriched category, is shown in Fig. 13 for the 7 TeV and 8 TeV data samples.
Figures 14 and 15 show the p ℓ2 t and m ℓℓ distributions at the end of the event selection in the n j ≤ 1 eµ categories for the 8 TeV data analysis. The distributions are shown for two categories of events based on the flavor of the lepton with the higher p t. This division is important for separating events based on the relative contribution from the backgrounds from misidentified leptons (W +jets and multijets); see Sec. VI C for details. The dependence of the misidentified lepton and V V background distributions on p ℓ2 t motivates the separation of the data sample into three bins of p ℓ2 t. The variations in the background composition across the m ℓℓ range motivate the division into two bins of m ℓℓ. Figure 16 shows the corresponding distributions in the eµ n j ≤ 1 samples in the 7 TeV data analysis.
The event displays in Fig. 17 show examples of the detector activity for two signal candidates: one in the n j = 0 eµ category for the 7 TeV data analysis, and one in the VBF-enriched n j ≥ 2 eµ category for the 8 TeV data analysis. Both events have a small value of ∆φ ℓℓ as is characteristic of the signal. The latter event shows two well-separated jets that are characteristic of VBF production.

FIG. 17. Event displays of two signal candidates. Both events have a small value of ∆φ ℓℓ, which is characteristic of the signal. The second event shows two well-separated jets that are characteristic of VBF production.

V. SIGNAL PROCESSES
The leading Higgs boson production processes are illustrated in Fig. 1. This section details the normalization and simulation of the ggF and VBF production modes. In both cases, the production cross section has been calculated to NNLO in QCD and next-to-leading order in the electroweak couplings. Resummation has been performed to NNLL for the ggF process. For the decay, the branching fraction is computed using the H → W W * and H → ZZ * partial widths from prophecy4f [60] and the width of all other decays from hdecay [61]. The H → W W * branching fraction is 22% with a relative uncertainty of 4.2% for m H = 125.36 GeV [62]. Interference with direct W W production [63] and uncertainties on VH production [64] have a negligible impact on this analysis. Uncertainties on the ggF and VBF production processes are described in the following subsections.

A. Gluon fusion
The measurement of Higgs boson production via gluon fusion, and the extraction of the associated Higgs boson couplings, relies on detailed theoretical calculations and Monte Carlo simulation. Uncertainties on the perturbative calculations of the total production cross section and of the cross sections exclusive in jet multiplicity are among the leading uncertainties on the expected signal event yield and the extracted couplings. The powheg [35] generator matched to pythia8 is used for event simulation and accurately models the exclusive jet multiplicities relevant to this analysis. The simulation is corrected to match higher-order calculations of the Higgs boson p t distribution.
Production of a Higgs boson via gluon fusion proceeds dominantly through a top-quark loop (the bottom-quark loop contributes 7% to the cross section). Higher-order QCD corrections include radiation from the initial-state gluons and from the quark loop. The total cross section is computed to NNLO [65] using the m t → ∞ approximation, where an effective point-like ggH coupling is introduced. Corrections for the finite top-quark mass have been computed to NLO and found to be a few percent [66]; this difference is applied as a correction to the NNLO cross section. Resummation of the soft QCD radiation has been performed to NNLL [67] in the m t → ∞ approximation and to the next-to-leading logarithms (NLL) for finite top-and bottom-quark masses. Electroweak corrections to NLO [68] are applied using the complete factorization approximation [69]. Together, these calculations provide the total inclusive cross section for the ggF process [70], which is 19.15 pb for m H = 125.36 GeV. The uncertainty on the total cross section is 10%, with approximately equal contributions from QCD scale variations (7.5%) and parton distribution functions (7.2%). The powheg MC generator used to model ggF production [71] is based on an NLO calculation with finite quark masses and a running-width Breit-Wigner distribution that includes electroweak corrections at next-to-leading order. The generator contains a scale for matching the resummation to the matrix-element calculation, which is chosen to reproduce the NNLO+NLL calculation of the Higgs boson p t [72]. To improve the modeling of this distribution, a reweighting scheme is applied to reproduce the prediction of the NNLO+NNLL dynamic-scale calculation given by the hres2.1 program [73]. The scheme separately weights the p t spectra for events with ≤ 1 jet and events with ≥ 2 jets, since the latter include jet(s) described purely by the pythia shower model, which underestimates the rate of two balancing jets producing low Higgs boson p t . 
Events with ≥ 2 jets are therefore reweighted to the p t spectrum predicted by the NLO powheg simulation of Higgs boson production in association with two jets (H + 2 jets) [74]. The reweighting procedure preserves agreement between the generated jet-multiplicity distribution and the predictions of higher-order calculations.
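The 10% uncertainty on the inclusive ggF cross section quoted above can be checked numerically. The sketch below assumes that the QCD-scale and PDF components are uncorrelated and combined in quadrature; the text does not state the combination rule, so this is an assumption, and the function name is illustrative.

```python
import math


def quadrature_sum(*components):
    """Combine uncorrelated relative uncertainties in quadrature."""
    return math.sqrt(sum(c * c for c in components))


# Relative uncertainties (in %) quoted in the text for the ggF cross section.
qcd_scale = 7.5
pdf = 7.2

# Assumption: the two sources are treated as uncorrelated.
total = quadrature_sum(qcd_scale, pdf)
print(f"combined uncertainty: {total:.1f}%")
```

Under this assumption the combination is about 10.4%, in line with the quoted 10%.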
The uncertainty on the jet multiplicity distribution is evaluated using the jet-veto-efficiency (JVE) method [72,75] for the ggF categories and the Stewart-Tackmann (ST) method [76] for the VBF category. The JVE method factorizes the total cross section from the acceptances of the jet vetoes in the zero-jet and one-jet channels, treating these components as uncorrelated. Three calculations of the jet-veto efficiency are defined based on ratios of cross sections with different jet multiplicities and at different orders (for example, $1 - \sigma^{\rm NLO}_{n_j \geq 1}/\sigma^{\rm NNLO}_{\rm tot}$ for the veto efficiency of the first jet). The three calculations differ by NNNLO terms in the inclusive perturbative series, so their comparison provides an estimate of the perturbative uncertainty on the jet veto. A second estimate is obtained by individually varying the factorization, renormalization, and resummation scales by factors of two or one-half, and by coherently varying the factorization and renormalization scales by these factors. These estimates are used to define an overall uncertainty, as described below.
For the efficiency $\epsilon_0$ of the jet veto that defines the zero-jet channel, the central value is evaluated at the highest available fixed order (NNLO), with NNLL resummation. The uncertainty is taken as the maximum effect of the scale variations on the calculation, or the maximum deviation of the other calculations from this one. The results using the JetVHeto computation [77] are shown in Fig. 18, along with the reweighted powheg+pythia8 prediction evaluated without hadronization or the underlying event. The results are consistent to within a few percent for a jet $p_{\rm T}$ threshold of 25 GeV, and the relative uncertainty at this threshold is 12%.
The efficiency of vetoing an additional jet, given the presence of a single jet, is defined as $\epsilon_1$. The NNLO $n_j \geq 1$ cross section needed for the highest-order calculation of the jet-veto-efficiency method is not available, though the other two calculations of the veto efficiency can be performed using the mcfm generator. The corresponding calculations bracket the highest-order calculation both in the case of $\epsilon_0$ and in the case of $\epsilon_1$ evaluated using a partial calculation of the NNLO $n_j \geq 1$ cross section. The central value of $\epsilon_1$ is thus estimated to be the average of the available calculations, with the uncertainty given by the maximum scale variation of either calculation. This results in a relative uncertainty of 14% on $\epsilon_1$, as shown in Fig. 18. The figure shows that the reweighted powheg+pythia8 prediction for $\epsilon_1$ agrees with the calculation to within a few percent for a jet $p_{\rm T}$ threshold of 25 GeV.
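As an illustration of how uncorrelated uncertainties on the total cross section and the two veto efficiencies propagate to the exclusive jet bins, the sketch below assumes the factorization $\sigma_0 = \epsilon_0\,\sigma_{\rm tot}$, $\sigma_1 = (1-\epsilon_0)\,\epsilon_1\,\sigma_{\rm tot}$, and $\sigma_{\geq 2} = (1-\epsilon_0)(1-\epsilon_1)\,\sigma_{\rm tot}$ with linear error propagation. This is one reading of the JVE method rather than the analysis code. The relative uncertainties (10%, 12%, 14%) are those quoted in the text; the efficiency values themselves are hypothetical placeholders.

```python
import math


def jve_relative_uncertainties(eps0, eps1, d_tot, d_eps0, d_eps1):
    """Relative uncertainties on the 0-jet, 1-jet and >=2-jet cross sections.

    Assumed factorization (illustrative):
      sigma_0   = eps0 * sigma_tot
      sigma_1   = (1 - eps0) * eps1 * sigma_tot
      sigma_ge2 = (1 - eps0) * (1 - eps1) * sigma_tot
    d_tot, d_eps0, d_eps1 are relative uncertainties, taken as uncorrelated.
    """
    # A relative shift d in eps changes (1 - eps) by -eps / (1 - eps) * d.
    k0 = eps0 / (1.0 - eps0)
    k1 = eps1 / (1.0 - eps1)
    d_0j = math.sqrt(d_tot**2 + d_eps0**2)
    d_1j = math.sqrt(d_tot**2 + (k0 * d_eps0) ** 2 + d_eps1**2)
    d_2j = math.sqrt(d_tot**2 + (k0 * d_eps0) ** 2 + (k1 * d_eps1) ** 2)
    return d_0j, d_1j, d_2j


# eps0 and eps1 are hypothetical values chosen only for illustration.
d_0j, d_1j, d_2j = jve_relative_uncertainties(
    eps0=0.6, eps1=0.6, d_tot=0.10, d_eps0=0.12, d_eps1=0.14
)
print(f"n_j = 0: {d_0j:.1%}, n_j = 1: {d_1j:.1%}, n_j >= 2: {d_2j:.1%}")
```

The zero-jet result depends only on the quoted inputs and comes out near 16%, comparable to the 15% quoted in the text for the JVE procedure in that category. The other two bins depend on the assumed efficiencies.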
A prior ATLAS analysis in this decay channel [5] relied on the ST procedure for all uncertainties associated with jet binning. The JVE estimation reduces uncertainties in the ggF categories by incorporating a resummation calculation (in $\epsilon_0$) and the NLO calculation of $H$ + 2 jets (in $\epsilon_1$). The uncertainties for the ST (JVE) procedure are 18% (15%), 43% (27%), and 70% (34%) for the cross sections in the $n_j = 0$, $n_j = 1$, and $n_j \geq 2$ ggF-enriched categories, respectively. These uncertainties are reduced when the categories are combined, and contribute a total of ≈ 5% to the uncertainty on the measured ggF signal strength (see Table XXVI). Additional uncertainties on the signal acceptance are considered in each signal category. The scale and PDF uncertainties are typically a few percent. A generator uncertainty is taken from a comparison between powheg+herwig and amc@nlo+herwig, which differ in their implementation of the NLO matrix element and the matching of the matrix element to the parton shower. Uncertainties due to the underlying event and parton shower models (UE/PS) are generally small, though in the $n_j = 1$ category they are as large as 14% in the signal regions where $p_{\rm T}^{\ell 2} < 20$ GeV. The UE/PS uncertainties are estimated by comparing predictions from powheg+herwig and powheg+pythia8.
The evaluation of the ggF background to the $n_j \geq 2$ VBF category includes an uncertainty on the acceptance of the central-jet veto. The uncertainty is evaluated to be 29% using the ST method, which treats the inclusive $H$ + 2-jet and $H$ + 3-jet cross sections as uncorrelated. Scale uncertainties are also evaluated in each measurement range of the BDT output, and are 3-7% in BDT bins 1 and 2, and 48% in BDT bin 3. Other uncertainties on ggF modeling are negligible in this category, except those due to UE/PS, which are significant because the second jet in ggF $H$ + 2-jet events is modeled by the parton shower in the powheg+pythia8 sample. A summary of the uncertainties on the gluon-fusion and vector-boson-fusion processes is given in Table X. The table shows the uncertainties for same-flavor leptons in the $n_j \leq 1$ categories, since events with different-flavor leptons are further subdivided according to $m_{\ell\ell}$ and $p_{\rm T}^{\ell 2}$ (as described in Sec. II).

B. Vector-boson fusion
The VBF total cross section is obtained using an approximate QCD NNLO computation provided by the vbf@nnlo program [78]. The calculation is based on the structure-function approach [79] that considers the VBF process as two deep-inelastic scattering processes connected to the colorless vector-boson fusion producing the Higgs boson. Leading-order contributions violating this approximation are explicitly included in the computation; the corresponding higher-order terms are negligible [64]. Electroweak corrections are evaluated at NLO with the hawk program [80]. The calculation has a negligible QCD scale uncertainty and a 2.7% uncertainty due to PDF modeling. The powheg generator is used to simulate the VBF process (see Table III). Uncertainties on the acceptance are evaluated for several sources: the impact of the QCD scale on the jet veto and on the remaining acceptance; PDFs; generator matching of the matrix element to the parton shower; and the underlying event and parton shower. Table X shows the VBF and ggF uncertainties in the most sensitive bin of the BDT output (bin 3). The other bins have the same or similar uncertainties for the VBF process, except for UE/PS, where the uncertainty is 5.2% (< 1%) in bin 2 (bin 1).

VI. BACKGROUND PROCESSES
The background contamination in the various signal regions (SR) comes from several physics processes that were briefly discussed in Sec. II and listed in Table I. They are:
• $WW$: nonresonant $W$ pair production;
• Top quarks (Top): top-quark pair production ($t\bar{t}$) and single-top production ($t$), both followed by the decay $t \to Wb$;
• Misidentified leptons (Misid.): $W$ boson production in association with a jet that is misidentified as a lepton ($Wj$) and dijet or multijet production with two misidentifications ($jj$);
• Other dibosons ($VV$): $W\gamma$, $W\gamma^{\ast}$, $WZ$, and $ZZ$; and
• Drell-Yan (DY): $Z/\gamma^{\ast}$ decay to $e$ or $\mu$ pairs ($ee/\mu\mu$) and $\tau$ pairs ($\tau\tau$).
The contamination of Higgs decays to non-$WW$ channels is small, but considered as signal. A few background processes, such as $Z\gamma$ and $WW$ produced by double parton interactions, are not listed because their contributions are negligible in the control and signal regions, but they are considered in the analysis for completeness. Their normalizations and acceptances are taken from Monte Carlo simulation.
For each background the event selection includes a targeted set of kinematic requirements (and sample selection) to distinguish the background from the signal. The background estimate is made with a control region (CR) that inverts some or all of these requirements and in many cases enlarges the allowed range for certain kinematic variables to increase the number of observed events in the CR. For example, the relevant selections that suppress the $WW$ background in the $n_j = 0$ SR are $m_{\ell\ell} < 55$ GeV and $\Delta\phi_{\ell\ell} < 1.8$. The $WW$ CR, in turn, is defined by requiring $55 < m_{\ell\ell} < 110$ GeV and $\Delta\phi_{\ell\ell} \leq 2.6$.
The most common use of a CR, like the $WW$ example above, is to determine the normalization factor $\beta$ defined by the ratio of the observed to expected yields of $WW$ candidates in the CR, where the observed yield is obtained by subtracting the non-$WW$ (including the Higgs signal) contributions from the data. The estimate $B^{\rm est}_{\rm SR}$ of the expected background in the SR under consideration can be written as

$$B^{\rm est}_{\rm SR} = B_{\rm SR} \cdot \underbrace{N_{\rm CR}/B_{\rm CR}}_{\beta} = N_{\rm CR} \cdot \underbrace{B_{\rm SR}/B_{\rm CR}}_{\alpha}, \qquad (7)$$

where $N_{\rm CR}$ and $B_{\rm CR}$ are the observed yield and the MC estimate in the CR, respectively, and $B_{\rm SR}$ is the MC estimate in the signal region. The first equality defines the data-to-MC normalization factor in the CR, $\beta$; the second equality defines the extrapolation factor from the CR to the SR, $\alpha$, predicted by the MC. With a sufficient number of events available in the CR, the large theoretical uncertainties associated with estimating the background directly from simulation are replaced by the combination of two significantly smaller uncertainties, the statistical uncertainty on $N_{\rm CR}$ and the systematic uncertainty on $\alpha$.
When the SR is subdivided for reasons of increased signal sensitivity, as is the case for the eµ sample for n j = 0, a corresponding α parameter is computed for each of the subdivided regions. The CR (hence the β parameter), however, is not subdivided for statistical reasons.
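The normalization method, with one shared $\beta$ from an undivided CR and one $\alpha$ per SR subregion, can be sketched as follows. All yields are hypothetical placeholders and the function name is illustrative; in the analysis itself these quantities are determined in the likelihood fit rather than by a stand-alone calculation.

```python
def normalization_estimate(n_data_cr, other_bkg_cr, b_cr_mc, b_sr_mc_by_region):
    """Data-driven background estimate from a control region (CR).

    n_data_cr         : events observed in the CR
    other_bkg_cr      : expected CR yield of all other processes (incl. signal)
    b_cr_mc           : MC prediction for the target background in the CR
    b_sr_mc_by_region : MC predictions for the target background in each SR
    """
    # Observed yield of the target background: data minus everything else.
    n_cr = n_data_cr - other_bkg_cr
    # Data-to-MC normalization factor, common to all SR subregions.
    beta = n_cr / b_cr_mc
    # One CR-to-SR extrapolation factor per subregion, taken from MC.
    alphas = [b_sr / b_cr_mc for b_sr in b_sr_mc_by_region]
    estimates = [n_cr * alpha for alpha in alphas]
    return beta, alphas, estimates


# Hypothetical yields, for illustration only.
beta, alphas, estimates = normalization_estimate(
    n_data_cr=3000.0, other_bkg_cr=900.0, b_cr_mc=1750.0,
    b_sr_mc_by_region=[500.0, 300.0],
)
print(beta, estimates)
```

Each estimate equals $\beta$ times the MC prediction in the corresponding subregion, so the two forms of the estimate agree by construction.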
The uncertainties described in this section are inputs to the extraction of the signal strength parameter using the likelihood fit, which is described in Sec. VII. An extension of this method is used when it is possible to determine the extrapolation factor $\alpha$ from data. As described in Secs. VI C and VI E, this can be done for the misidentified lepton backgrounds and in the high-statistics categories for the $Z/\gamma^{\ast} \to ee, \mu\mu$ background. For the former, the distribution of the discriminating variable of interest is also determined from data. For completeness, one should note that the smaller background sources are estimated purely from simulation.

TABLE XI. Background estimation methods summary. For each background process or process group, a set of three columns indicates whether data (●) or MC (○) samples are used to normalize the SR yield (N), determine the CR-to-SR extrapolation factor (E), and obtain the SR distribution of the fit variable (V). In general, the methods vary from one row to the next for a given background process; see Sec. VI for the details.

Table XI summarizes, for all the relevant background processes, whether data or MC is used to determine the various aspects of the method. In general, data-derived methods are preferred and MC simulation is used for a few background processes that do not contribute significantly in the signal region, that have a limited number of events in the control region, or both. MC simulation is used (open circles) or a data sample is used (solid circles) for each of the three aspects of a given method: the normalization (N), the extrapolation (E), and the distribution of the discriminating variable of interest (V). The plots in this section (Figs. 19-27) show contributions that are normalized according to these methods.

This section focuses on the methodology for background predictions and their associated theoretical uncertainties. The experimental uncertainties also contribute to the total uncertainty on these background predictions and are quoted here only for the backgrounds from misidentified leptons, for which the total systematic uncertainties are discussed in Sec. VI C. Furthermore, although the section describes one background estimation technique at a time, the estimates for most background contributions are interrelated and are determined in situ in the statistical part of the analysis (see Sec. VII).
The section is organized as follows. Section VI A describes the $WW$ background in the various categories. This background is the dominant one for the most sensitive $n_j = 0$ category. Section VI B describes the background from top-quark production, which is largest in the categories with one or more high-$p_{\rm T}$ jets. The data-derived estimate from misidentified leptons is described in Sec. VI C. The remaining backgrounds, $VV$ and $Z/\gamma^{\ast}$, are discussed in Secs. VI D and VI E, respectively. The similarities and modifications for the background estimation for the 7 TeV data analysis are described in Sec. VI F. Finally, Sec. VI G presents a summary of the background predictions in preparation for the fit procedure described in Sec. VII.
The nonresonant $WW$ production process, with subsequent decay $WW \to \ell\nu\ell\nu$, is characterized by two well-separated charged leptons. By contrast, the charged leptons in the $H \to WW^{\ast} \to \ell\nu\ell\nu$ process tend to have a small opening angle (see Fig. 3). The invariant mass of the charged leptons, $m_{\ell\ell}$, combines this angular information with the kinematic information associated with the relatively low Higgs boson mass ($m_H < 2m_W$), providing a powerful discriminant between the processes (see Fig. 7). This variable is therefore used to define $WW$ control regions in the $n_j \leq 1$ categories, where the signal is selected with the requirement $m_{\ell\ell} < 55$ GeV. For the $n_j \geq 2$ ggF and VBF categories, the $WW$ process is modeled with a merged multi-parton sherpa sample and normalized to the NLO inclusive $WW$ calculation from mcfm [50], since the large top-quark backgrounds make a control-region definition more challenging. The $n_j \leq 1$ analyses use a data-based normalization for the $WW$ background, with control regions defined by a range in $m_{\ell\ell}$ that does not overlap with the signal regions. The normalization is applied to the combined ($q\bar{q}$ or $qg$) $\to WW$ and $gg \to WW$ background estimate, and theoretical uncertainties on the extrapolation are evaluated.
To obtain control regions of sufficient purity, several requirements are applied. In order to suppress the $Z/\gamma^{\ast}$ background, the CRs use $e\mu$ events selected after the $p_{\rm T}^{\ell\ell} > 30$ GeV and $m_{\rm T}^{\ell} > 50$ GeV requirements in the $n_j = 0$ and $n_j = 1$ categories, respectively. The latter requirement additionally suppresses background from multijet production. A requirement of $p_{\rm T}^{\ell 2} > 15$ GeV is applied to suppress the large $W$+jets background below this threshold. Additional $Z/\gamma^{\ast} \to \tau\tau$ reduction is achieved by requiring $\Delta\phi_{\ell\ell} < 2.6$ for $n_j = 0$, and $|m_{\tau\tau} - m_Z| > 25$ GeV for $n_j = 1$, where $m_{\tau\tau}$ is defined in Sec. IV B. The $m_{\ell\ell}$ range is $55 < m_{\ell\ell} < 110$ GeV ($m_{\ell\ell} > 80$ GeV) for $n_j = 0$ (1), and is chosen to maximize the signal significance. The corresponding control-region distributions are shown in Fig. 19.
The $WW$ estimate $B^{\rm est}_{WW,i}$ in each signal region $i$ is given by Eq. (7). The control region is approximately 70% (45%) pure in the $n_j = 0$ (1) category. The contamination in the $n_j = 1$ category is dominated by $t\bar{t} \to WbWb$ events, where one jet is unidentified and the other is misidentified as a light-quark jet. The single-top contribution is one-third the size of this background for $n_j = 1$; for $n_j = 0$ this ratio is about one-half. All backgrounds are subtracted as part of the fit for $\beta$ described in Sec. VII B 1.
The CR-to-SR extrapolation factor has uncertainties due to the limited accuracy of the MC prediction. Uncertainties due to higher perturbative orders in QCD not included in the MC simulation are estimated by varying the renormalization and factorization scales independently by factors of one-half and two, keeping the ratio of scales in the range one-half to two [62]. An uncertainty due to higher-order electroweak corrections is determined by reweighting the MC simulation to the NLO electroweak calculation [81] and taking the difference with respect to the nominal sample. PDF uncertainties are evaluated by taking the largest difference between the nominal CT10 [44] PDF set and either the MSTW2008 [82] or the NNPDF2.3 [83] PDF set, and adding in quadrature the uncertainty determined using the CT10 error eigenvectors. Additional uncertainties are evaluated using the same procedures as for ggF production (Sec. V A): uncertainties due to the modeling of the underlying event, hadronization, and parton shower are evaluated by comparing predictions from powheg+pythia6 and powheg+herwig; a generator uncertainty is estimated with a comparison of powheg+herwig and amc@nlo+herwig. The detailed uncertainties in each signal subregion are given in Table XII.

TABLE XII. $WW$ theoretical uncertainties (in %) on the extrapolation factor $\alpha$ for $n_j \leq 1$. Total (Tot) is the sum in quadrature of the uncertainties due to the QCD factorization and renormalization scales (Scale), the PDFs, the matching between the hard-scatter matrix element to the UE/PS model (Gen), the missing electroweak corrections (EW), and the parton shower and underlying event (UE/PS). The negative sign indicates anti-correlation with respect to the unsigned uncertainties for SR categories in the same column. Energy-related values are given in GeV.
The corresponding uncertainties on the m t distribution are parametrized as linear variations between 90 and 170 GeV, giving a relative change of up to 20% between these points (depending on signal region). The contribution from the gg → W W process is 5.8% (6.5%) of the total W W background in the n j = 0 (1) category in the signal region and 4.5% (3.7%) in the control region. Its impact on the extrapolation factor is approximately given by the ratio of gg → W W to qq → W W events in the signal region, minus the corresponding ratio in the control region. The leading uncertainty on these ratios is the limited accuracy of the production cross section of the gluon-initiated process, for which a full NLO calculation is not available. The uncertainty evaluated using renormalization and factorization scale variations in the leading-order calculation is 26% (33%) in the n j = 0 (1) category [84]. An increase of the gg → W W cross section by a factor of two [85] increases the measured µ value by less than 3%.
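The size of the $gg \to WW$ effect on the extrapolation factor can be gauged with the numbers quoted above. The sketch below is a rough illustration, not the evaluation used in the analysis. It assumes that the total $WW$ yield is the sum of the $gg$- and $q\bar{q}$-initiated parts, so that a fraction $f$ of the total corresponds to a ratio $f/(1-f)$, and that the cross-section uncertainty scales the SR-minus-CR difference of these ratios linearly. The function names are illustrative.

```python
def gg_to_qq_ratio(gg_fraction):
    """Convert the gg->WW fraction of the total WW yield into a gg/qq ratio.

    Assumes the total WW yield is the sum of gg- and qq-initiated parts.
    """
    return gg_fraction / (1.0 - gg_fraction)


def extrapolation_shift(f_sr, f_cr, rel_xsec_unc):
    """Approximate relative shift of the CR-to-SR extrapolation factor."""
    # Sensitivity: gg/qq ratio in the SR minus the same ratio in the CR.
    sensitivity = gg_to_qq_ratio(f_sr) - gg_to_qq_ratio(f_cr)
    return sensitivity * rel_xsec_unc


# Fractions and cross-section uncertainties quoted in the text.
shift_0j = extrapolation_shift(f_sr=0.058, f_cr=0.045, rel_xsec_unc=0.26)
shift_1j = extrapolation_shift(f_sr=0.065, f_cr=0.037, rel_xsec_unc=0.33)
print(f"n_j = 0: {shift_0j:.2%}, n_j = 1: {shift_1j:.2%}")
```

Under these assumptions the effect is below the percent level for $n_j = 0$ and about one percent for $n_j = 1$.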

Boson pairs can be produced by double parton interactions (DPI) in $pp$ collisions. The DPI contribution is very small (0.4% of $WW$ production in the signal regions) and is estimated using pythia8 MC events normalized to the predicted cross section (rather than the $\beta$ parameter from the $WW$ CR). The cross section is computed using the NNLO $W^{\pm}$ production cross section and an effective multi-parton interaction cross section, $\sigma_{\rm eff} = 15$ mb, measured by ATLAS using $Wjj$ production [86]. An uncertainty of 60% is assigned to the value of $\sigma_{\rm eff}$ (and, correspondingly, to the DPI yields) using an estimate of $\sigma_{\rm eff} \approx 24$ mb for $WW$ production [87]. While these estimates rely on theoretical assumptions, an increase of the DPI cross section by a factor of ten only increases the measured $\mu$ by 1%. Background from two $pp \to W$ collisions in the same bunch crossing is negligible.
In the $n_j = 0$ SR, the ratio of signal to $WW$ background is about one to five, magnifying the impact of background systematic uncertainties. The definition of the CR as a neighboring $m_{\ell\ell}$ window reduces the uncertainty in the extrapolation to low $m_{\ell\ell}$. To validate the assigned uncertainties, the CR normalization is extrapolated to $m_{\ell\ell} > 110$ GeV and compared to data. The data are consistent with the prediction at the level of 1.1 standard deviations considering all systematic uncertainties.
The normalization factors determined using predicted and observed event yields are $\beta^{0j}_{WW} = 1.22 \pm 0.03$ (stat.) $\pm\, 0.10$ (syst.) and $\beta^{1j}_{WW} = 1.05 \pm 0.05$ (stat.) $\pm\, 0.24$ (syst.), which are consistent with the theoretical prediction at the level of approximately two standard deviations. Here the uncertainties on the predicted yields are included though they do not enter into the analysis. Other systematic uncertainties are also suppressed in the full likelihood fit described in Sec. VII B.
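The compatibility of these normalization factors with unity can be checked with a simple pull calculation. The sketch assumes that the statistical and systematic uncertainties are uncorrelated and combined in quadrature, which the text does not state explicitly; the function name is illustrative.

```python
import math


def pull_from_unity(beta, stat, syst):
    """Deviation of a normalization factor from 1, in standard deviations.

    Assumes stat and syst are uncorrelated and combined in quadrature.
    """
    return (beta - 1.0) / math.sqrt(stat**2 + syst**2)


# Values quoted in the text for the n_j = 0 and n_j = 1 WW control regions.
pull_0j = pull_from_unity(1.22, 0.03, 0.10)
pull_1j = pull_from_unity(1.05, 0.05, 0.24)
print(f"n_j = 0: {pull_0j:.1f} sigma, n_j = 1: {pull_1j:.1f} sigma")
```

With this treatment the $n_j = 0$ factor deviates from unity by about 2.1 standard deviations, matching the "approximately two" quoted above, while the $n_j = 1$ factor is well within one standard deviation.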

2. MC evaluation for nj ≥ 2
For the VBF and ggF n j ≥ 2 analyses, the W W background is estimated using sherpa. The MC samples are generated as merged multileg samples, split between the cases where final-state jets result from QCD vertices or from electroweak vertices. The interference between these diagrams is evaluated to be less than a few percent using madgraph; this is included as an uncertainty on the prediction.
For the processes with QCD vertices, uncertainties from higher orders are computed by varying the renormalization and factorization scales in madgraph and are found to be 27% for the VBF category and 19% for the ggF category. Differences between sherpa and madgraph predictions after selection requirements are 8-14% on the O BDT distribution and 1-7% on the m t distribution, and are taken as uncertainties. The same procedures are used to estimate uncertainties on processes with only electroweak vertices, giving a normalization uncertainty of 10% and an uncertainty on the O BDT (m t ) distribution of 10-16% (5-17%).
The MC prediction is validated using a kinematic selection that provides a reasonably pure sample of $WW$ + 2-jet events. Events are selected if they pass the preselection requirements on lepton $p_{\rm T}$ and $m_{\ell\ell}$, have two jets, and $n_b = 0$. An additional requirement of $m_{\rm T} > 100$ GeV is applied in order to enhance the $WW$ contribution. A final discriminant is the minimum of all possible calculations of $m_{\rm T2}$ [88] that use the momenta of a lepton and a neutrino, or the momenta of a lepton, a jet, and a neutrino. The possible momentum values of each neutrino, given $p_{\rm T}^{\rm miss}$, are scanned in order to calculate $m_{\rm T}$; this scan determines $m_{\rm T2}$. A requirement that the minimum $m_{\rm T2}$ be larger than 160 GeV provides a purity of 60% for $WW$ + 2 jets (see Fig. 20). The ratio of the observed to the expected number of $WW$ + 2-jet events in this region is $1.15 \pm 0.19$ (stat.).

B. Top quarks
At hadron colliders, top quarks are produced in pairs ($t\bar{t}$) or in association with a $W$ boson ($Wt$) or quark(s) $q$ (single-$t$). The leptonic decay of the $W$ bosons leads to a final state of two leptons, missing transverse momentum, and two $b$-jets (one $b$-jet) in $t\bar{t}$ ($Wt$) production. The single-$t$ production mode has only one $W$ boson in the final state, and the second, misidentified, lepton is produced by a jet. The background from these events is estimated together with the $t\bar{t}$ and $Wt$ processes in spite of the different lepton production mechanism, but the contribution from these processes to the top-quark background is small. For example, these events are 0.5% of the top-quark background in the $n_j = 0$ category. The top-quark background is estimated using the normalization method, as described in Eq. (7). In the $n_j = 0$ category, the SR definition includes a jet veto but the CR has no jet requirements. Because of this, the CR and the SR slightly overlap, but the $n_j = 0$ SR is only 3% of the CR and the expected total signal contamination is less than 1%, so the effect of the overlap on the results is negligible. In the $n_j = 1$ category, the SR definition requires $n_b = 0$ but the CR has $n_b \geq 1$. In the $n_j = 2$ VBF category, the CR is defined requiring one and only one $b$-tagged jet. Finally, in the $n_j = 2$ ggF category, to reduce the impact of $b$-tagging systematic uncertainties, the CR is defined for $n_b = 0$, and instead $m_{\ell\ell} > 80$ GeV is applied to remove overlap with the SR and minimize the signal contribution.
1. Estimation of jet-veto efficiency for nj = 0
For the $n_j = 0$ category, the CR is defined after the preselection missing transverse momentum cut, using only the $e\mu$ channel, with an additional requirement of $\Delta\phi_{\ell\ell} < 2.8$ to reduce the $Z/\gamma^{\ast} \to \tau\tau$ background. The CR is inclusive in the number of jets and has a purity of 74% for top-quark events. The extrapolation parameter $\alpha$ is the fraction of events with zero reconstructed jets and is derived from the MC simulation.
The value of $\alpha$ is corrected using data in a sample containing at least one $b$-tagged jet. A parameter $\alpha^{1b}$ is defined as the fraction of events with no additional jets in this region. The ratio $(\alpha^{1b}_{\rm data}/\alpha^{1b}_{\rm MC})^{2}$ corrects systematic effects that have a similar impact on the $b$-tagged and inclusive regions, such as jet energy scale and resolution. The square is applied to account for the presence of two jets in the Born-level $t\bar{t}$ production. The prediction can be summarized as

$$B^{\rm est}_{{\rm top},0j} = N_{\rm CR} \cdot \frac{B_{\rm SR}}{B_{\rm CR}} \cdot \left(\frac{\alpha^{1b}_{\rm data}}{\alpha^{1b}_{\rm MC}}\right)^{2},$$

where $N_{\rm CR}$ is the observed yield in the CR and $B_{\rm CR}$ and $B_{\rm SR}$ are the estimated yields from MC simulation in the CR and SR, respectively. Theoretical uncertainties arise from the different topologies of the $b$-tagged region and the CR, through the component of the background that is derived from MC-simulated top-quark events, the ratio $\alpha^{0j}_{\rm MC}/(\alpha^{1b}_{\rm MC})^{2}$. These uncertainties include variations of the renormalization and factorization scales, choice of PDFs, and the parton shower model. The procedure is sensitive to the relative rates of $Wt$ and $t\bar{t}$ production, so an uncertainty is included on this cross section ratio and on the interference between these processes. An additional theoretical uncertainty is evaluated on the efficiency $\epsilon_{\rm rest}$ of the additional selection after the $n_j = 0$ preselection, which is estimated purely from MC simulation. Experimental uncertainties are also evaluated on the simulation-derived components of the background estimate, with the main contributions coming from jet energy scale and resolution. The uncertainties on $\alpha^{0j}_{\rm MC}/(\alpha^{1b}_{\rm MC})^{2}$ and on $\epsilon_{\rm rest}$ are summarized in Table XIII. The resulting normalization factor is $\beta^{0j}_{\rm top} = 1.08 \pm 0.02$ (stat.), including the correction derived from the $b$-tagged sample.

In the $n_j = 1$ SR, top-quark production is the second leading background, after nonresonant $WW$ production. Summing over all signal regions with no $m_{\rm T}$ requirement applied, it is 36% of the total expected background and the ratio of signal to top-quark background is approximately 0.2.
It also significantly contaminates the n j = 1 W W CR with a yield as large as that of nonresonant W W in this CR. Two parameters are defined for the extrapolation from the top CR, one to the SR (α sr ) and one to the W W CR (α W W ).
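The $n_j = 0$ top-quark estimate described earlier in this subsection multiplies the CR yield by the MC extrapolation factor and by the squared data-to-MC ratio of the jet-veto fraction measured in the $b$-tagged sample. A minimal sketch is given below, with hypothetical yields and fractions and an illustrative function name.

```python
def top_estimate_0j(n_cr, b_cr_mc, b_sr_mc, alpha_1b_data, alpha_1b_mc):
    """Top-quark background estimate in the n_j = 0 signal region.

    n_cr                       : observed top-quark yield in the CR
    b_cr_mc, b_sr_mc           : MC yields in the CR and SR
    alpha_1b_data, alpha_1b_mc : fraction of b-tagged events with no
                                 additional jet, in data and in MC
    """
    # The ratio is squared because Born-level ttbar production has two jets.
    correction = (alpha_1b_data / alpha_1b_mc) ** 2
    return n_cr * (b_sr_mc / b_cr_mc) * correction


# Hypothetical inputs, for illustration only.
estimate = top_estimate_0j(
    n_cr=10000.0, b_cr_mc=9500.0, b_sr_mc=475.0,
    alpha_1b_data=0.21, alpha_1b_mc=0.20,
)
print(f"estimated top-quark yield in the SR: {estimate:.2f}")
```

In this example a 5% data-to-MC difference in the veto fraction changes the SR estimate by about 10%, which shows why the correction is taken from data rather than from simulation alone.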
The top CR is defined after the preselection in the eµ channel and requires the presence of exactly one jet, which must be b-tagged. There can be no additional b-tagged jet with 20 < p t < 25 GeV, following the SR requirement. The requirement m t > 50 GeV is also applied to reject jj background. As in the W W case, only the eµ events are used in order to suppress the Z/γ * contamination. The m t distribution in this control region is shown in Fig. 21(a).
The CR requires at least one $b$-jet, but the SR requires zero. In the case of a simple extrapolation using the ratio of the predicted yields in the signal and control regions, the impact of the $b$-tagging efficiency uncertainty on the measurement is substantial. A systematic uncertainty of 5% on the $b$-tagging efficiency would induce an uncertainty of about 20% on the estimated yield in the SR. In order to reduce this effect, the $b$-tagging efficiency $\epsilon^{\rm est}_{1j}$ is estimated from data. The efficiency $\epsilon_{2j}$ is the probability to tag an individual jet, measured in a sample selected similarly to the SR but containing exactly two jets, at least one of which is $b$-tagged. It can be measured in data and MC simulation, because a high-purity top sample can be selected. Most of the events in this sample are $t\bar{t}$ events with reconstructed jets from $b$-quarks, although there is some contamination from light-quark jets from initial-state radiation when a $b$-quark does not produce a reconstructed jet. Similarly, $\epsilon_{1j}$ is the efficiency to tag a jet in a sample with one jet, in events passing the signal region selection.
The efficiency measurement $\epsilon^{\rm data}_{2j}$ is extrapolated from the $n_j = 2$ sample to the $n_j = 1$ samples using $\gamma_{1j} = \epsilon_{1j}/\epsilon_{2j}$, which is evaluated using MC simulation. Jets in the $n_j = 2$ and jets in the $n_j = 1$ samples have similar kinematic features; one example, the jet $p_{\rm T}$, is illustrated in Fig. 21(b). In this figure, the $n_j = 2$ distribution contains the $p_{\rm T}$ of one of the two jets, chosen at random, provided that the other jet is tagged, so that the distribution contains the same set of jets as is used in the extrapolation to $n_j = 1$. Residual disagreements between the distributions are reflected in the deviation of $\gamma_{1j}$ from unity, which is small. The value of $\gamma_{1j}$ is $1.079 \pm 0.002$ (stat.) with an experimental uncertainty of 1.4% and a theoretical uncertainty of 0.8%. The experimental uncertainty is almost entirely due to uncertainties on the $b$-tagging efficiency. The theoretical uncertainty is due to the PDF model, renormalization and factorization scales, matching of the matrix element to the parton shower, top-quark cross sections, and interference between top-quark single and pair production.
The estimated b-tagging efficiency in the n j = 1 data is ε est 1j = γ 1j · ε data 2j , and the top-quark background estimate in the SR is then obtained from the yield observed in the CR and this estimated efficiency. The theoretical uncertainties are summarized in Table XIII. The normalization factor for this background is β 1j top = 1.06 ± 0.03 (stat.), and the total uncertainty on the estimated background in the n j = 1 signal region is 5%.

TABLE XIII. The theoretical systematic uncertainties are summarized for nj ≤ 1. The uncertainties on the extrapolation procedure for nj = 0 are given in (a); the uncertainties on the extrapolation factor αtop for nj = 1 are given in (b). The negative sign refers to the anti-correlation between the top-quark background predicted in the signal regions and in the W W CR. Only a relative sign between rows is meaningful; columns contain uncorrelated sources of uncertainty. Invariant masses are given in GeV.
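A minimal numerical sketch of the n j = 1 top-quark estimate follows. The relation ε est 1j = γ 1j · ε data 2j and the value γ 1j = 1.079 are taken from the text. The scaling of the CR yield by (1 − ε)/ε is an assumption suggested by the one-jet tag/veto split, and the two-jet efficiency and CR yield used here are hypothetical placeholders.

```python
def estimated_tag_efficiency(gamma_1j, eff_data_2j):
    """eps_est_1j = gamma_1j * eps_data_2j, as defined in the text."""
    return gamma_1j * eff_data_2j


def top_background_in_sr(n_top_cr, eff_est_1j):
    """Scale the CR top-quark yield to the SR.

    Assumption (not stated explicitly in the text): with exactly one jet,
    a top-quark event enters the b-tagged CR with probability eff and the
    b-vetoed SR with probability 1 - eff.
    """
    if not 0.0 < eff_est_1j < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return n_top_cr * (1.0 - eff_est_1j) / eff_est_1j


GAMMA_1J = 1.079      # from the text
EFF_DATA_2J = 0.70    # hypothetical two-jet tagging efficiency
N_TOP_CR = 5000.0     # hypothetical top-quark yield in the CR

eff_est = estimated_tag_efficiency(GAMMA_1J, EFF_DATA_2J)
print(eff_est, top_background_in_sr(N_TOP_CR, eff_est))
```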

Extrapolation from
The n j ≥ 2 categories have a large contribution from top-quark background events even after selection requirements, such as the b-jet veto, that are applied to reduce them, because of the two b quarks in tt̄ events. The majority of the residual top-quark events have a light-quark jet from initial-state radiation and a b-quark jet that is not identified by the b-tagging algorithm. The CR requires exactly one b-tagged jet to mimic this topology, so that at first order the CR-to-SR extrapolation factor (α) is the ratio of b-jet efficiency to b-jet inefficiency. The CR includes events from eµ and ee/µµ final states because the Z/γ * contamination is reduced by the jet selection.
The O BDT discriminant is based on variables, such as m jj , that depend on the jet kinematics, so the acceptance for top-quark events in each O BDT bin is strongly dependent on the Monte Carlo generator and modeling. Normalizing the top-quark background separately in bins of O BDT reduces the modeling uncertainties. Figure 22 shows the m jj and O BDT distributions in the top CR used for the VBF category. The two bins with the highest O BDT score are merged to improve the statistical uncertainty on the estimated background. The uncertainties on the extrapolation from the single bin in the CR to the two bins in the SR are separately evaluated. Table XIV shows the normalization factors β i and their uncertainties for each O BDT bin, as well as the theoretical uncertainties on the extrapolation factors α j to the corresponding SR bins.
The uncertainties on α were evaluated with the same procedure used for the W W background (see Sec. VI A 1). The only significant source is a modeling uncertainty evaluated by taking the maximum spread of predictions from powheg+herwig, alpgen+herwig and mc@nlo+herwig. The generators differ, for example, in the merging of LO matrix-element evaluations of up to three jets produced in association with the top-quark pair.

TABLE XIV. Top-quark background uncertainties (in %) for nj ≥ 2 VBF on the extrapolation factor α and normalization factor β. The contributions are given in bins of OBDT. The systematic uncertainty on β does not affect the measurement, but is shown to illustrate the compatibility of the normalization factor with unity. The values of β are also shown; bins 2 and 3 use a common value of β. Bin 0 is unused, but noted for completeness.

In the more inclusive phase space of the ggF-enriched n j ≥ 2 category, the tt̄ background remains dominant after the n b = 0 requirement, as is the case for the VBF-enriched category. The CR is defined with m ℓℓ > 80 GeV to distinguish it from the signal region (see Fig. 10) and reduce signal contamination. The CR is approximately 70% pure in top-quark events, and a normalization factor of β = 1.05 ± 0.03 (stat.) is obtained. The uncertainties on the extrapolation factor α to the SR are 3.2% from the comparison of mc@nlo+herwig, alpgen+herwig, and powheg+pythia; 1.2% for the parton shower and underlying-event uncertainties from the comparison of powheg+pythia6 and powheg+herwig; 1% from the missing higher-order contribution, evaluated by varying the renormalization and factorization scales; 0.3% from the PDF envelope evaluated as described in Sec. VI A 1; and 0.7% from the experimental uncertainties. The effect of the same set of variations on the predicted m t distribution in the signal region was also checked. The variations from the nominal distribution are small, at most 4% in the tails, but they are included as a shape systematic in the fit procedure.
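For orientation, the individual uncertainties quoted for the ggF-enriched n j ≥ 2 extrapolation factor can be combined into a single number. The text does not quote a total, so the sketch below assumes that the five sources are uncorrelated and adds them in quadrature; the dictionary labels are shorthand introduced here.

```python
import math

# Uncertainties (in %) on alpha quoted in the text for the ggF-enriched
# n_j >= 2 category; the labels are shorthand, not the paper's names.
ALPHA_UNCERTAINTIES_PCT = {
    "generator": 3.2,
    "parton shower / underlying event": 1.2,
    "scale": 1.0,
    "PDF": 0.3,
    "experimental": 0.7,
}


def quadrature_sum(values):
    """Combine uncertainties assumed to be mutually uncorrelated."""
    return math.sqrt(sum(v * v for v in values))


total_pct = quadrature_sum(ALPHA_UNCERTAINTIES_PCT.values())
print(f"combined uncertainty on alpha: {total_pct:.1f}%")
```

The generator comparison dominates the sum, so the combined value is only slightly larger than 3.2%.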

C. Misidentified leptons
Collisions producing W bosons in association with one or more jets (referred to here as W +jets) may enter the signal sample when a jet is misidentified as a prompt lepton. In this background, there is a prompt lepton and a transverse momentum imbalance from the leptonic decay of the W boson. Background can also arise from multijet production when two jets are misidentified as prompt leptons and a transverse momentum imbalance is reconstructed.

W +jets
The W +jets background contribution is estimated using a control sample of events where one of the two lepton candidates satisfies the identification and isolation criteria for the signal sample, and the other lepton fails to meet these criteria but satisfies less restrictive criteria (these lepton candidates are denoted "anti-identified"). Events in this sample are otherwise required to satisfy all of the signal selection requirements. The dominant component of this sample (85% to 90%) is due to W +jets events in which a jet produces an object reconstructed as a lepton. This object may be either a nonprompt lepton from the decay of a hadron containing a heavy quark, or else a particle (or particles) from a jet reconstructed as a lepton candidate.
The W +jets contamination in the signal region is obtained by scaling the number of events in the data control sample by an extrapolation factor. This extrapolation factor is measured in a data sample of jets produced in association with Z bosons reconstructed in either the e + e − or µ + µ − final state (referred to as the Z+jets control sample below). The factor is the ratio of the number of identified lepton candidates satisfying all lepton selection criteria to the number of anti-identified leptons measured in bins of anti-identified lepton p t and η. Anti-identified leptons must explicitly not satisfy the signal selection criteria (so that leptons counted in the numerator of this ratio exclude the anti-identified leptons counted in the denominator of this ratio) and the signal requirements for isolation and track impact parameters are either relaxed or removed. In addition, for anti-identified electrons the identification criteria specifically targeting conversions are removed and the anti-identified electron is explicitly required to fail the "medium" electron identification requirement specified in Ref. [23]. Figure 23 shows the p t distributions of identified muons [Fig. 23(a)], identified electrons [Fig. 23(b)], anti-identified muons [Fig. 23(c)], and anti-identified electrons [Fig. 23(d)] in the Z+jets control sample. The extrapolation factor in a given p t bin is the number of identified leptons divided by the number of anti-identified leptons in that particular bin. Each number is corrected for the presence of processes not due to Z+jets. The Z+jets sample is contaminated by other production processes that produce additional prompt leptons (e.g., WZ → ℓνℓℓ) or nonprompt leptons not originating from jets (e.g., Z/γ * and Zγ) that create a bias in the extrapolation factor. Kinematic criteria suppress about 80% of the contribution from these other processes in the Z+jets sample.
The remaining total contribution of these other processes after applying these kinematic criteria is shown in the histograms in Fig. 23. The uncertainty shown in these histograms is the 10% systematic uncertainty assigned to the contribution from these other processes, mainly due to cross section uncertainties. This remaining contribution from other processes is estimated using Monte Carlo simulation and removed from the event yields before calculating the extrapolation factor.
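The extrapolation-factor measurement described above can be sketched in a few lines. This is an illustrative outline only: the function names, the flat list-per-bin layout, and all numbers in the usage lines are hypothetical, and the per-bin subtraction simply follows the description that simulated non-Z+jets yields are removed before the ratio is taken.

```python
def extrapolation_factors(n_id, n_anti, other_id, other_anti):
    """Per-bin ratio of identified to anti-identified lepton candidates.

    Each argument is a list with one entry per (pT, eta) bin. The other_*
    lists hold the simulated yields from processes other than Z+jets,
    which are subtracted before forming the ratio.
    """
    factors = []
    for nid, nanti, oid, oanti in zip(n_id, n_anti, other_id, other_anti):
        denominator = nanti - oanti
        if denominator <= 0:
            raise ValueError("no anti-identified candidates left in bin")
        factors.append((nid - oid) / denominator)
    return factors


def wjets_estimate(control_counts, factors):
    """Scale the per-bin W+jets control-sample counts by the factors."""
    return sum(n * f for n, f in zip(control_counts, factors))


# Hypothetical yields in two bins of anti-identified lepton pT.
factors = extrapolation_factors([120, 60], [1000, 400], [20, 10], [0, 0])
print(factors, wjets_estimate([50, 20], factors))
```

In the analysis the factors are further multiplied by the MC-derived sample-composition correction discussed in the text; that step is omitted here.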
The composition of the associated jets (namely the fractions of jets due to the production of heavy-flavor quarks, light-flavor quarks and gluons) in the Z+jets sample and the W +jets sample may be different. Any difference would lead to a systematic error in the estimate of the W +jets background due to applying the extrapolation factor determined with the Z+jets sample to the W +jets control sample, so Monte Carlo simulation is used to determine a correction factor that is applied to the extrapolation factors determined with the Z+jets data sample. A comparison of the extrapolation factors determined with the Z+jets sample and the W +jets sample is made for three Monte Carlo simulations: alpgen+pythia6, alpgen+herwig and powheg+pythia8. For each combination of matrix-element and parton-shower simulations, a ratio of the extrapolation factors for W +jets versus Z+jets is calculated. These three ratios are used to determine a correction factor and an uncertainty that is applied to the extrapolation factors determined with the Z+jets data sample: this correction factor is 0.99 ± 0.20 for anti-identified electrons and 1.00 ± 0.22 for anti-identified muons.
The total uncertainties on the corrected extrapolation factors are summarized in Table XV. In addition to the systematic uncertainty on the correction factor due to the sample composition, the other important uncertainties on the Z+jets extrapolation factor are due to the limited number of jets that meet the lepton selection criteria in the Z+jets control sample and the uncertainties on the contributions from other physics processes in the identified and anti-identified lepton samples. The total systematic uncertainty on the corrected extrapolation factors varies as a function of the p t of the anti-identified lepton; this variation is from 29% to 61% for anti-identified electrons and 25% to 46% for anti-identified muons. The systematic uncertainty on the corrected extrapolation factor dominates the systematic uncertainty on the W +jets background.
The uncertainties on the signal strength µ are classified into experimental, theoretical, and other components, as described in Sec. IX and Table XXVI. The uncertainty on µ due to the correction factor applied to the extrapolation factor is classified as theoretical because the uncertainty on the correction factor is derived from a comparison of predictions from different combinations of Monte Carlo generators and parton shower algorithms. The uncertainty on µ due to the other uncertainties on the extrapolation factor (Z+jets control sample statistics and the subtraction of other processes from this control sample) is classified as experimental. Figure 24 shows the extrapolation factor measured in the Z+jets data compared to the predicted extrapolation factor determined using Monte Carlo simulated samples (alpgen+pythia6) of Z+jets and W +jets for anti-identified muons [Fig. 24(a)] and anti-identified electrons [Fig. 24(b)]. The values of the extrapolation factors are related to the specific criteria used to select the anti-identified leptons and, as a result, the extrapolation factor for anti-identified muons is about one order of magnitude larger than the extrapolation factor for anti-identified electrons. This larger extrapolation factor does not indicate a larger probability for a jet to be misidentified as a muon compared to an electron. In fact, misidentified electrons contribute a larger portion of the W +jets background in the signal region.
The W +jets background in the signal region is determined using a control sample in which the lepton and the anti-identified lepton are required to have opposite charge. A prediction of the W +jets background is also used for a data control sample consisting of events that satisfy all of the Higgs boson signal requirements except that the two lepton candidates are required to have the same charge. This same-charge control region is described in Sec. VI D.
The W +jets process is not expected to produce equal numbers of same-charge and opposite-charge candidates. In particular, associated production processes such as W c, where the second lepton comes from the semileptonic decay of a charmed hadron, produce predominantly opposite-charge candidates. Therefore, a separate extrapolation factor is applied to the same-charge W +jets control sample.
The procedure used to determine the same-charge extrapolation factor from the Z+jets data is the same as the one used for the signal region. Because of the difference in jet composition of the same-charge W +jets control sample, a different correction factor is derived from MC simulation to correct the extrapolation factor determined with the Z+jets data sample for application to the same-charge W +jets sample. Figure 24 compares the extrapolation factors in same-charge W +jets with the ones in Z+jets. The correction factor is 1.25 ± 0.31 for anti-identified electrons and 1.40 ± 0.49 for anti-identified muons; as with the opposite-charge correction factors, these factors and their systematic uncertainty are determined by comparing the factors determined with the three different samples of MC simulations mentioned previously in the text (alpgen+pythia6, alpgen+herwig and powheg+pythia8). The total uncertainties on the corrected extrapolation factors used to estimate the W +jets background in the same-charge control region are shown in Table XV. The correlation between the systematic uncertainties on the opposite-charge and same-charge correction factors reflects the composition of the jets producing objects misidentified as leptons. These jets have a component that is charge-symmetric with respect to the charge of the W boson as well as a component unique to opposite-charge W +jets processes. Based on the relative rates of same- and opposite-charge W +jets events, 60% of the opposite-charge correction factor uncertainty is correlated with 100% of the corresponding same-charge uncertainty.

TABLE XV. Uncertainties (in %) on the extrapolation factor α misid for the determination of the W +jets background. Total is the quadrature sum of the uncertainties due to the correction factor determined with MC simulation (Corr. factor), the number of jets misidentified as leptons in the Z+jets control sample (Stat) and the subtraction of other processes (Other bkg.).

Multijets
The background in the signal region due to multijets is determined using a control sample that has two anti-identified lepton candidates, but otherwise satisfies all of the signal region selection requirements. A separate extrapolation factor is measured for the multijet background, using a multijet sample, and is applied twice to this control sample. The sample used to determine the extrapolation factor is expected to have a similar sample composition (in terms of heavy-flavor jets, light-quark jets and gluon jets) to the control sample. Since the presence of a misidentified lepton in a multijet sample influences the sample composition (for example by increasing the fraction of heavy-flavor processes in the multijet sample), corrections to the extrapolation factor are made that take into account this correlation. The event-by-event corrections vary between 1.0 and 4.5 depending on the lepton flavor and p t of both misidentified leptons in the event; the electron extrapolation factor corrections are larger than the muon extrapolation factor corrections.

FIG. 24. Extrapolation factors α misid for anti-identified (a) muons and (b) electrons in Z+jets, opposite-charge (OC) W +jets, and same-charge (SC) W +jets. The bands represent the uncertainties: Stat. refers to the statistical component, which is dominated by the number of jets identified as leptons in Z+jets data; Background is due to the subtraction of other electroweak processes present in Z+jets data; and Sample is due to the variation of the α misid ratios in Z+jets to OC W +jets or to SC W +jets in the three MC samples. The symbols are offset from each other for presentation.
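The multijet estimate described in this subsection, with the extrapolation factor applied once per anti-identified lepton and an event-by-event correction for their correlation, might be organized as in the sketch below. How the correction enters is an assumption made here (a multiplicative per-event weight), and the two callables and all toy values are hypothetical stand-ins for the measured quantities.

```python
def multijet_estimate(events, extrapolation_factor, correction):
    """Sum per-event weights f(l1) * f(l2) * c(l1, l2) over the control sample.

    events: iterable of (lepton1, lepton2) pairs, each lepton a
        (flavor, pt) tuple for an anti-identified candidate.
    extrapolation_factor: callable giving the factor for one lepton.
    correction: callable giving the event-level correction for the pair.
    """
    total = 0.0
    for lep1, lep2 in events:
        weight = extrapolation_factor(lep1) * extrapolation_factor(lep2)
        total += weight * correction(lep1, lep2)
    return total


# Toy stand-ins: flavor-dependent factors and corrections (hypothetical).
def toy_factor(lepton):
    flavor, _pt = lepton
    return 0.02 if flavor == "e" else 0.10


def toy_correction(lep1, lep2):
    return 2.0 if "e" in (lep1[0], lep2[0]) else 1.5


toy_events = [(("e", 20.0), ("mu", 15.0)), (("mu", 30.0), ("mu", 12.0))]
print(multijet_estimate(toy_events, toy_factor, toy_correction))
```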

Summary
Table XVI lists the estimated event counts for the multijet and W +jets backgrounds in the eµ channel for the various jet multiplicities. The values are given before the m t fit for the ggF-enriched categories and after the VBF-selection for the VBF-enriched categories. The uncertainties are the combination of the statistical and systematic uncertainties and are predominantly systematic. The dominant systematic uncertainty is from the extrapolation factors. In the case of the W +jets background, these uncertainties are summarized in Table XV; in the case of the multijet background, the largest contribution is the uncertainty introduced by the correlations between extrapolation factors in an event with two misidentified leptons. For the n j = 0 and n j = 1 categories, the expected backgrounds are provided for both the opposite-charge signal region and the same-charge control region (described in Sec. VI D), and the multijet background is expected to be less than 10% of the W +jets background in these two categories. For higher jet multiplicities, the multijet background is expected to be comparable to the W +jets background because there is no selection criterion applied to m t . In this case, however, the multijet background has a very different m t distribution than the Higgs boson signal, so it is not necessary to suppress this background to the same extent as in the lower jet multiplicity categories.

TABLE XVI. W +jets and multijets estimated yields in the eµ category. For nj = 0 and nj = 1, yields for both the opposite-charge (OC) and same-charge (SC) leptons are given. The yields are given before the m t fit for the ggF-enriched categories and after the VBF-selection for the VBF-enriched categories. The uncertainties are from a combination of statistical and systematic sources.

D. Other dibosons
There are backgrounds that originate from the production of two vector bosons other than W W . These include W γ, W γ * , WZ and ZZ production and are referred to here as V V . The V V processes add up to about 10% of the total estimated background in the n j ≤ 1 channels and are of the same magnitude as the signal. The dominant sources of these backgrounds are the production of W γ and W γ * /WZ, where this latter background is a combination of the associated production of a W boson with a nonresonant Z/γ * or an on-shell Z boson.
The normalization of the V V background processes in the eµ channel is determined from the data using a same-charge control region, which is described below. The distribution of these various contributing processes in the different signal bins is determined using MC simulation. In the ee/µµ channels, both the normalization and the distributions of the V V processes are estimated with MC simulation. The details of these simulations are provided in Sec. III C.
Several specialized data sample selections are used to validate the simulation of the rate and the shape of distributions of various kinematic quantities of the W γ and W γ * processes and the simulation of the efficiency for rejecting electrons from photon conversions. The W γ background enters the signal region when the W boson decays leptonically and the photon converts into an e + e − pair in the detector material. If the pair is very asymmetric in p t , then it is possible that only the electron or positron satisfies the electron selection criteria, resulting in a Higgs boson signal candidate. This background has a prompt electron or muon and missing transverse momentum from the W boson decay and a nonprompt electron or positron. The prompt lepton and the conversion product are equally likely to have opposite electric charge (required in the signal selection) and the same electric charge, since the identification is not charge dependent.
A sample of nonprompt electrons from photon conversions can be selected by reversing two of the electron signal selection requirements: the electron track should be part of a reconstructed photon conversion vertex candidate and the track should have no associated hit on the innermost layer of the pixel detector. Using these two reversed criteria, a sample of eµ events that otherwise satisfy all of the kinematic requirements imposed on Higgs boson signal candidates is selected; in the n j = 0 category (n j = 1 category), 83% (87%) of this sample originates from W γ production. This sample is restricted to events selected online with a muon trigger to avoid biases in the electron selection introduced by the online electron trigger requirements. Figures 25(a) and 25(b) show the m t distribution and the p t distribution of the electron of the n j = 0 category of this W γ validation sample compared to expectations from the MC simulation.

Verifying that the simulation correctly models the efficiency of detecting photon conversions is important to ensure that the W γ background normalization and distributions are accurately modeled. To evaluate the modeling of photon conversions, a Z → µµγ validation sample consisting of either Zγ or Z boson production with final-state radiation is selected. The Z boson is reconstructed in the µ + µ − decay channel, and an electron (or positron) satisfying all the electron selection criteria except the two reversed criteria specified above is selected. The µ + µ − e ± invariant mass is required to be within 15 GeV of m Z to reduce contributions from the associated production of a Z boson and hadronic jets. The resulting data sample is more than 99% pure in the Z → µµγ process. A comparison between this data sample and a Z → µµγ MC simulation indicates some potential mismodeling of the rejection of nonprompt electrons in the simulation. Hence a p t -dependent systematic uncertainty ranging from 25% for 10 < p t < 15 GeV to 5% for p t > 20 GeV is assigned to the efficiency for nonprompt electrons from photon conversions to satisfy the rejection criteria.
The W γ * background originates from the associated production of a W boson that decays leptonically and a virtual photon γ * that produces an e + e − or µ + µ − pair  in which only one lepton of the pair satisfies the lepton selection criteria. This background is most relevant in the n j = 0 signal category, where it contributes a few percent of the total background and is equivalent to about 25% of the expected Higgs boson signal.
The modeling of the W γ * background is studied with a specific selection aimed at isolating a sample of W γ * → eνµµ candidates. Events with an electron and a pair of opposite-charge muons are selected with m µµ < 7 GeV, p miss t > 20 GeV and both muons must satisfy ∆φ(e, µ) < 2.8. Muon pairs consistent with originating from the decay of a J/ψ meson are rejected. The electron and the highest p t muon are required to satisfy the signal region lepton selection criteria and p t thresholds; however, the subleading-muon p t threshold is reduced to 3 GeV. The isolation criteria for the higher-p t muon are modified to take into account the presence of the lower-p t muon. The sherpa W γ * simulation sample with m γ * < 7 GeV is compared to the data selected with the above criteria; the distributions of the m t calculated using the electron and the higher-p t muon and the invariant mass of the two muons m µµ are shown in Figs. 25(c) and 25(d).
The WZ and ZZ backgrounds are modeled with MC simulation. No special samples are selected to validate the simulation of these processes. The ZZ background arises primarily when one Z boson decays to e + e − and the other to µ + µ − and an electron and a muon are not detected. This background is very small, amounting to less than 3% of the V V background. Background can also arise from Zγ * and Zγ production if the Z boson decays to ℓ + ℓ − and one of the leptons is not identified and the photon results in a second lepton. These backgrounds are also very small, and the Zγ * background is neglected.
The V V backgrounds arising from W γ, W γ * , and WZ are equally likely to result in a second lepton that has the same charge or opposite charge compared to the lepton from the W boson decay. For this reason, a selection of eµ events that is identical to the Higgs boson candidate selection except that it requires the two leptons to have the same charge is used to define a same-charge control region. The same-charge control region is dominated by V V processes. The other processes that contribute significantly to the same-charge sample are the W +jets process and, to a much lesser extent, the multijet process. The same-charge data sample can be used to normalize the V V processes once the contribution from the W +jets process is taken into account, using the method described in Sec. VI C. Figure 26 shows the distributions of the transverse mass [26(a) and 26(c)] and the subleading lepton p t [26(b) and 26(d)] for the same-charge data compared with the MC simulations after normalizing the sum of these MC predictions to the same-charge data. A single normalization factor is applied simultaneously to all four MC simulations of the V V backgrounds (shown separately in the figures). These normalization factors are β 0j = 0.92 ± 0.07 (stat.) and β 1j = 0.96 ± 0.12 (stat.) for the eµ channels in the n j ≤ 1 categories. The V V processes comprise about 60% of the total in both the zero-jet and one-jet same-charge data samples, with 30% coming from the W +jets process.
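The same-charge normalization can be illustrated with a simple counting calculation. The sketch assumes that the factor is obtained by subtracting the non-V V contributions (W +jets and the smaller remaining processes) from the same-charge data and dividing by the summed V V prediction; the function name and the yields are hypothetical, chosen only to resemble the quoted composition of roughly 60% V V and 30% W +jets.

```python
def vv_normalization(n_data, n_vv_mc, n_non_vv):
    """beta = (data - non-VV estimate) / VV prediction in the same-charge CR.

    Assumption: one common factor scales the sum of all VV simulations,
    as stated in the text; the subtraction form is illustrative.
    """
    if n_vv_mc <= 0:
        raise ValueError("VV prediction must be positive")
    return (n_data - n_non_vv) / n_vv_mc


# Hypothetical same-charge yields: W+jets (300) plus other processes (100).
beta = vv_normalization(n_data=1000.0, n_vv_mc=650.0, n_non_vv=400.0)
print(f"beta = {beta:.2f}")
```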
Theoretical uncertainties on the V V backgrounds are dominated by the scale uncertainty on the prediction for each jet bin. For the W γ process, a relative uncertainty of 6% on the total cross section is correlated across jet categories, and the uncorrelated jet-bin uncertainties are 9%, 53%, and 100% in the n j = 0, n j = 1, and n j ≥ 2 categories, respectively. For the W γ * process, the corresponding uncertainties are 7% (total cross section), 7% (n j = 0), 30% (n j = 1), and 26% (n j ≥ 2). No uncertainty is applied for the extrapolation of these backgrounds from the same-charge control region to the opposite-charge signal region, since it was verified in the simulation that these processes contribute equal numbers of opposite-charge and same-charge events.
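One way to picture the split between the correlated total cross-section uncertainty and the uncorrelated jet-bin uncertainties is as a covariance matrix over the three jet categories. The sketch below is an interpretation, not the fit model of the analysis: it assumes the correlated component is fully correlated across bins and that the jet-bin components are mutually independent. The W γ values are those quoted in the text.

```python
def jet_bin_covariance(total_rel, per_bin_rel):
    """Relative covariance matrix for yields in the jet categories.

    total_rel: relative uncertainty shared by all bins (fully correlated).
    per_bin_rel: list of relative uncertainties specific to each bin.
    """
    n = len(per_bin_rel)
    cov = [[total_rel ** 2 for _ in range(n)] for _ in range(n)]
    for i, unc in enumerate(per_bin_rel):
        cov[i][i] += unc ** 2
    return cov


# W gamma: 6% correlated; 9%, 53%, 100% for n_j = 0, 1, >= 2.
w_gamma_cov = jet_bin_covariance(0.06, [0.09, 0.53, 1.00])
for row in w_gamma_cov:
    print(["%.4f" % value for value in row])
```

The off-diagonal entries carry only the shared 6% component, so the jet categories remain almost independent for this process.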

E. Drell-Yan
The DY processes produce two oppositely charged leptons and some events are reconstructed with significant missing transverse momentum. This is mostly due to neutrinos produced in the Z boson decay in the case of the Z/γ * → τ τ background to the eµ channels. In contrast, in the case of the Z/γ * → ee, µµ background to the ee/µµ channels, it is mostly due to detector resolution that is degraded at high pile-up and to neutrinos produced in b-hadron or c-hadron decays (from jets produced in association with the Z boson). Preselection requirements, such as the one on p miss t , reduce the bulk of this background, as shown in Fig. 5, but the residual background is significant in all categories, especially in the ee/µµ samples. The estimation of the Z/γ * → τ τ background for the eµ samples is done using a control region, which is defined in a very similar way across all n j categories, as described below. Since a significant contribution to the Z/γ * → ee, µµ background to the ee/µµ categories arises from mismeasurements of the missing transverse momentum, more complex data-derived approaches are used to estimate this background, as described below.
Mismodeling of p Z/γ * t , reconstructed as p ℓℓ t , was observed in the Z/γ * -enriched region in the n j = 0 category. The alpgen + herwig MC generator does not adequately model the parton shower of soft jets that balance p ℓℓ t when there are no selected jets in the event. A correction, based on the weights derived from a data-to-MC comparison in the Z peak, is therefore applied to MC events in the n j = 0 category, for all leptonic final states from Drell-Yan production.
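A reweighting of this kind can be sketched as a binned data-to-MC ratio that is looked up for each simulated event. The binning, the yields, and the function names below are hypothetical; the sketch also assumes that the data and MC histograms are filled in the Z-peak region with a common binning in the dilepton transverse momentum.

```python
from bisect import bisect_right


def derive_weights(data_counts, mc_counts):
    """Per-bin data/MC ratio; bins with no MC events get weight 1."""
    return [d / m if m > 0 else 1.0 for d, m in zip(data_counts, mc_counts)]


def event_weight(pt_ll, bin_edges, weights):
    """Look up the weight for one MC event.

    bin_edges has len(weights) + 1 ascending entries; values outside the
    range are assigned to the first or last bin.
    """
    index = bisect_right(bin_edges, pt_ll) - 1
    index = min(max(index, 0), len(weights) - 1)
    return weights[index]


# Hypothetical binning (GeV) and Z-peak yields.
edges = [0.0, 10.0, 20.0, 40.0]
weights = derive_weights([110.0, 95.0, 50.0], [100.0, 100.0, 50.0])
print(event_weight(5.0, edges, weights), event_weight(100.0, edges, weights))
```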

Z/γ * → τ τ
The Z/γ * → τ τ background prediction is normalized to the data using control regions. The contribution from this background process is negligible in the ee/µµ channel, and in order to remove the potentially large Z/γ * → ee, µµ contamination, the CR is defined using the eµ samples in all categories except the n j ≥ 2 VBF-enriched one.
The control region in the n j = 0 category is defined by the requirements m ℓℓ < 80 GeV and ∆φ ℓℓ > 2.8, which select a 91%-pure region and result in a normalization factor β 0j = 1.00 ± 0.02 (stat.). In the n j = 1 category, the invariant mass of the τ τ system, calculated with the collinear mass approximation, and defined in Sec. IV B, can be used since the dilepton system is boosted. An 80%-pure region is selected with m ℓℓ < 80 GeV and m τ τ > (m Z − 25 GeV). The latter requirement ensures that there is no overlap with the signal region selection. The resulting normalization factor is β 1j = 1.05 ± 0.04 (stat.). The n j ≥ 2 ggF-enriched category uses a CR selection of m ℓℓ < 70 GeV and ∆φ ℓℓ > 2.8 providing 74% purity and a normalization factor β 2j = 1.00 ± 0.09 (stat.). Figure 27 shows the m t distributions in the control regions in the n j = 0 and n j = 1 categories. High purity and good data/MC agreement are observed. In order to increase the number of events available in the Z/γ * → τ τ control region in the n j ≥ 2 VBF-enriched category, ee/µµ events are also considered. The contribution from Z/γ * → ee, µµ decays is still negligible. The control region is defined by the invariant mass requirements: m ℓℓ < 80 GeV (75 GeV in ee/µµ) and | m τ τ − m Z | < 25 GeV. The resulting normalization factor is derived after summing all three bins in O BDT and yields β = 0.9 ± 0.3 (stat.).

TABLE XVII. Z/γ * → τ τ uncertainties (in %) on the extrapolation factor α, for the nj ≤ 1 and nj ≥ 2 ggF-enriched categories. Scale, PDF and generator modeling (Gen) uncertainties are reported. For the nj = 0 category, additional uncertainty due to p Z/γ * t reweighting is shown. The negative sign indicates anti-correlation with respect to the unsigned uncertainties in the same column.
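The kinematic requirements that define the Z/γ * → τ τ control regions can be collected in a single selection function. This is a sketch of the cuts quoted in the text only: the category labels and argument names are hypothetical, the lepton-flavor and preselection requirements are not reproduced, and the numerical value of m Z is the PDG value inserted here as an assumption.

```python
M_Z = 91.1876  # GeV; assumed PDG value, the text only uses the symbol m_Z


def in_ztautau_cr(category, m_ll, dphi_ll=None, m_tautau=None,
                  same_flavor=False):
    """Return True if the event passes the quoted CR requirements.

    category: "0j", "1j", "2j_ggF" or "2j_VBF" (labels introduced here).
    dphi_ll is needed for "0j" and "2j_ggF"; m_tautau for "1j" and "2j_VBF".
    same_flavor selects the tighter m_ll cut used for ee/mumu in "2j_VBF".
    """
    if category == "0j":
        return m_ll < 80.0 and dphi_ll > 2.8
    if category == "1j":
        return m_ll < 80.0 and m_tautau > M_Z - 25.0
    if category == "2j_ggF":
        return m_ll < 70.0 and dphi_ll > 2.8
    if category == "2j_VBF":
        m_ll_max = 75.0 if same_flavor else 80.0
        return m_ll < m_ll_max and abs(m_tautau - M_Z) < 25.0
    raise ValueError(f"unknown category: {category}")


print(in_ztautau_cr("0j", m_ll=60.0, dphi_ll=3.0))
print(in_ztautau_cr("2j_VBF", m_ll=78.0, m_tautau=95.0, same_flavor=True))
```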

Regions
Three sources of uncertainty are considered on the extrapolation of the Z/γ * → τ τ background from the control region: QCD scale variations, PDFs and generator modeling. The last of these is evaluated based on a comparison of the alpgen + herwig and alpgen + pythia generators. An additional uncertainty on the p Z/γ * t reweighting procedure is applied in the n j = 0 category. It is estimated by comparing the different effects of reweighting with the nominal weights and with an alternative set of weights derived with a p miss t > 20 GeV requirement applied in the Z-peak region. This requirement follows the event selection criteria used in the eµ samples where the Z/γ * → τ τ background contribution is more important. Table XVII shows the uncertainties on the extrapolation factor α to the signal regions and the W W control regions in the n j ≤ 1 and n j ≥ 2 ggF-enriched categories.

Z/γ * → ee, µµ in nj ≤ 1
The f recoil variable (see Sec. IV) shows a clear shape difference between DY and all processes with neutrinos in the final state, including signal and Z/γ * → τ τ , which are collectively referred to as "non-DY". A method based on a measurement of the selection efficiency of a cut on f recoil from data, and an estimate of the remaining DY contribution after such a cut, is used in the ee/µµ category. A sample of events is divided into two bins based on whether they pass or fail the f recoil requirement, and the former defines the signal region. The efficiency of this cut, ε = N pass /(N pass + N fail ), measured separately in data for the DY and non-DY processes, is used together with the fraction of the observed events passing the f recoil requirement to estimate the final DY background. It is analytically equivalent to inverting the matrix:

$$\begin{pmatrix} N_{\rm pass} \\ N_{\rm pass} + N_{\rm fail} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1/\varepsilon_{\rm dy} & 1/\varepsilon_{\text{non-dy}} \end{pmatrix} \cdot \begin{pmatrix} B_{\rm dy} \\ B_{\text{non-dy}} \end{pmatrix}$$

and solving for B dy , which gives the fully data-derived estimate of the DY yield in the ee/µµ signal region. The m t distribution for this background is taken from the Monte Carlo prediction, and the m t shape uncertainties due to the p Z/γ * t reweighting are found to be negligible. The non-DY selection efficiency ε non-dy is evaluated using the eµ sample, which is almost entirely composed of non-DY events. Since this efficiency is applied to the non-DY events in the final ee/µµ signal region, the event selection is modified to match the ee/µµ signal region selection criteria. This efficiency is used for the signal and for all non-DY backgrounds. The DY selection efficiency ε dy is evaluated using the ee/µµ sample satisfying the | m ℓℓ − m Z | < 15 GeV requirement, which selects the Z-peak region. An additional non-DY efficiency ε′ non-dy is introduced to account for the non-negligible non-DY contribution in the Z-peak, and is used in the evaluation of ε dy . It is calculated using the same m ℓℓ region but in eµ events.
Numerical values for these f recoil selection efficiencies are shown in Table XVIII(a). For the non-DY f recoil selection efficiencies ε non-dy and ε′ non-dy , the systematic uncertainties are based on the eµ-to-ee/µµ extrapolation. They are evaluated with MC simulations by taking the full difference between the selection efficiencies for eµ and ee/µµ events in the Z-peak and SR. Obtained uncertainties are validated with alternative MC samples and with data, and are added in quadrature to the statistical uncertainties on the efficiencies. The difference in the f recoil selection efficiencies for the signal and the other non-DY processes is taken as an additional uncertainty on the signal, and is 9% for the n j = 0 category and 7% for the n j = 1 one. Systematic uncertainties on the efficiencies related to the sample composition of the non-DY background processes were found to be negligible.
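The f recoil method amounts to solving two linear equations for the DY and non-DY yields in the passing sample. The sketch below assumes the system N pass = B dy + B non-dy and N pass + N fail = B dy /ε dy + B non-dy /ε non-dy , which follows from the efficiency definition given in the text; the function name and the toy numbers are hypothetical.

```python
def dy_yield_in_sr(n_pass, n_fail, eff_dy, eff_non_dy):
    """Solve the two-equation system for the yields passing f_recoil.

    Assumed system (from the efficiency definition eps = pass / total):
        n_pass          = b_dy + b_non_dy
        n_pass + n_fail = b_dy / eff_dy + b_non_dy / eff_non_dy
    Returns (b_dy, b_non_dy).
    """
    denominator = 1.0 / eff_dy - 1.0 / eff_non_dy
    if denominator == 0.0:
        raise ValueError("equal efficiencies give no separation power")
    n_total = n_pass + n_fail
    b_dy = (n_total - n_pass / eff_non_dy) / denominator
    return b_dy, n_pass - b_dy


# Closure check with toy inputs: 30 DY and 170 non-DY events pass the cut.
toy_eff_dy, toy_eff_non_dy = 0.2, 0.7
toy_total = 30.0 / toy_eff_dy + 170.0 / toy_eff_non_dy
b_dy, b_non_dy = dy_yield_in_sr(200.0, toy_total - 200.0,
                                toy_eff_dy, toy_eff_non_dy)
print(round(b_dy, 6), round(b_non_dy, 6))
```

The estimate becomes unstable when the two efficiencies approach each other, which is one reason a variable with a clear DY versus non-DY shape difference is needed.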
The systematic uncertainties on ε dy are based on the extrapolation from the Z peak to the SR and are evaluated with MC simulation by comparing the f recoil selection efficiencies in these two regions. This procedure is checked with several generators, and the largest difference in the selection efficiency is taken as the systematic uncertainty on the efficiency. It is later added in quadrature to the statistical uncertainty. The procedure is also validated with the data.

The Z/γ * → ee, µµ background in the VBF-enriched channel is estimated using an ABCD method. The BDT shape for this process is taken from a high-purity data sample with low m ℓℓ and low p miss t (region B). It is then normalized with a p miss t cut efficiency, derived from the data using the Z-peak region separated into low- and high-p miss t regions (C and D, respectively). This efficiency is 0.43 ± 0.03. The final estimate in the signal region (A) is corrected with a nonclosure factor derived from the MC, representing the differences in p miss t cut efficiencies between the low-m ℓℓ and Z-peak regions. This factor is 0.83 ± 0.22. Bins 2 and 3 of O BDT are normalized using a common factor due to the low number of events in the highest O BDT bin in region B. The normalization factors, applied to the Z/γ * → ee, µµ background in the ee/µµ channel in the signal region, are β bin1 = 1.01 ± 0.15 (stat.) and β bin2+3 = 0.89 ± 0.28 (stat.).
The uncertainty on the nonclosure factor is 17% (taken as its deviation from unity), and is fully correlated across all O BDT bins. Uncertainties are included on the O BDT shape due to QCD scale variations, PDFs, and the parton shower model, and are 11% in the bin with the highest O BDT score. No dependence of the BDT response on p miss t is observed in MC, and an uncertainty is assigned based on the assumption that they are uncorrelated (4%, 10%, and 60% in the bins with increasing O BDT score).

F. Modifications for 7 TeV data
The background estimation techniques in the n j ≤ 1 channels for 7 TeV data closely follow the ones applied to 8 TeV data. The definitions of the control regions of W W , top-quark, and Z/γ * → τ τ are the same. The Z/γ * → ee, µµ background is estimated with the same method based on the f recoil selection efficiencies. The f recoil requirements are loosened (see Sec. IV E). The calculation of the extrapolation factor in the W +jets estimate uses a multijet sample instead of a Z+jets sample, which has a limited number of events. The V V backgrounds are estimated using Monte Carlo predictions because of the small number of events in the same-charge region. In the n j ≥ 2 VBF-enriched category, the background estimation techniques are the same as in the 8 TeV analysis. The normalization factors from the control regions are given in Table XX in the next section along with the values for the 8 TeV analysis.
The theoretical uncertainties on the extrapolation factors used in the W W , top-quark, and Z/γ * → τ τ background estimation methods are assumed to be the same as in the 8 TeV analysis. Uncertainties due to experimental sources are unique to the 7 TeV analysis and are taken into account in the likelihood fit. The uncertainties on the f recoil selection efficiencies used in the Z/γ * → ee, µµ background estimation were evaluated following the same technique as in the 8 TeV analysis. The dominant uncertainty on the extrapolation factor in the W +jets estimate is due to the uncertainties on the differences in the compositions of the jets in the multijet and W +jets samples and is 29% (36%) for muons (electrons).

G. Summary
This section described the control regions used to estimate, from data, the main backgrounds to the various categories in the analysis. An overview of the observed and expected event yields in these control regions is provided in Table XIX for the 8 TeV data. This shows the breakdown of each control region into its targeted physics process (in bold) and its purity, together with the other contributing physics processes. The W W CR in the n j = 1 category is relatively low in W W purity but the normalization for the large contamination by N top is determined by the relatively pure CR for top quarks.
The normalization factors β derived from these control regions are summarized in Table XX, for both the 7 and 8 TeV data samples. Only the statistical uncertainties are quoted and in most of the cases the normalization factors agree with unity within the statistical uncertainties. In two cases where a large disagreement is observed, the systematic uncertainties on β are evaluated. One of them is the W W background in the n j = 0 category, where adding the systematic uncertainties reduces the disagreement to about two standard deviations: β = 1.22 ± 0.03 (stat.) ± 0.10 (syst.). The systematic component includes the experimental uncertainties and additionally the theoretical uncertainties on the cross section and acceptance, and the uncertainty on the luminosity determination. Including the systematic uncertainties on the normalization factor for the top-quark background in the first bin in the n j ≥ 2 VBF-enriched category reduces the significance of the deviation of the normalization factor from unity: β = 1.58 ± 0.15 (stat.) ± 0.55 (syst.). In this case, the uncertainty on MC generator modeling is also included. The systematic uncertainties quoted here do not have an impact on the analysis since the background estimation in the signal region is based on the extrapolation factors and their associated uncertainties, as quoted in the previous subsections. In addition, the sample statistics of the control region, the MC sample statistics and the uncertainties on the background subtraction all affect the estimation of the backgrounds normalized to data.

TABLE XIX. Control region event yields for 8 TeV data. All of the background processes are normalized with the corresponding β given in Table XX or with the data-derived methods as described in the text; each row shows the composition of one CR. The N sig column includes the contributions from all signal production processes. For the VBF-enriched n j ≥ 2 category, the values for the bins in O BDT are given. The entries that correspond to the target process for the CR are given in bold; this quantity corresponds to N bold considered in the last column for the purity of the sample (in %). The uncertainties on N bkg are due to sample size.
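The statement that the systematic uncertainties reduce the two large deviations of β from unity can be checked with simple arithmetic. The sketch below assumes the statistical and systematic components are independent and can be added in quadrature; that treatment and the function name are assumptions made for illustration, while the β values are those quoted in the text.

```python
import math


def deviation_from_unity(beta, stat, syst=0.0):
    """Deviation of beta from 1 in units of its uncertainty.

    Assumes stat and syst are independent and combined in quadrature.
    """
    return (beta - 1.0) / math.hypot(stat, syst)


# WW normalization factor in the n_j = 0 category
ww_stat_only = deviation_from_unity(1.22, 0.03)
ww_total = deviation_from_unity(1.22, 0.03, 0.10)

# Top-quark normalization factor, first bin of the VBF-enriched category
top_stat_only = deviation_from_unity(1.58, 0.15)
top_total = deviation_from_unity(1.58, 0.15, 0.55)
```

Under this assumption the W W deviation drops from about seven to about two standard deviations, and the top-quark deviation from about four to about one.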

VII. FIT PROCEDURE AND UNCERTAINTIES
The signal yields and cross sections are obtained from a statistical analysis of the data samples described in Sec. IV. A likelihood function, defined to simultaneously model, or "fit," the yields of the various subsamples, is maximized.
The signal strength parameter µ, defined in Sec. I, is the ratio of the measured signal yield to the expected SM value. Its expected value (µ exp ) is unity by definition. A measurement of zero corresponds to no signal in the data. The observed value µ obs , reported in Sec. IX, is one of the central results of this paper.
In this section, the fit regions are described in Sec. VII A followed by the details of the likelihood function and the test statistic in Sec. VII B. Section VII C summarizes the various sources of uncertainty that affect the results. A check of the results is given in Sec. VII D.

A. Fit regions
The fit is performed over data samples defined by fit regions listed in Table XXI. The profiled CRs determine the normalization of the corresponding backgrounds through a Poisson term in the likelihood, which, apart from the Drell-Yan τ τ CR, use the eµ sample. The nonprofiled CRs do not have a Poisson term and enter the fit in other ways. The details are described in the next section.
The SR categories i and fit distribution bins b that contribute to the likelihood were briefly motivated in Sec. II.
The eµ samples in the n j ≤ 1 categories, the most signal-sensitive of all channels, are each divided into twelve kinematic regions (12 = 2 · 3 · 2): two regions in m ℓℓ , three regions in p ℓ2 t , and two regions for the subleading lepton flavors. In contrast, the less sensitive ee/µµ samples for the n j ≤ 1 categories use one range of m ℓℓ and p ℓ2 t . The m t distribution is used to fit all of the ggF-enriched categories. Its distribution for the signal process has an upper kinematic edge at m H , but, in practice, m t can exceed m H because of detector resolution. There is also a kinematic suppression below a value of m t that increases with increasing values of m ℓℓ and p ℓ2 t due to the kinematic requirements in each of the n j ≤ 1 categories.
The m t distribution for the n j = 0 category in the eµ (ee/µµ) samples uses a variable binning scheme that is optimized for each of the twelve (one) kinematic regions. In the kinematically favored range of the eµ and ee/µµ samples, there are ten bins, each approximately 5 GeV wide, in a range from x to y, where x is approximately 80 GeV and y is approximately 130 GeV. A single bin at low m t , from 0 to x, has a few events in each category; another bin at high m t , from y to ∞, is populated dominantly by W W and top-quark events, constraining these backgrounds in the fit.
The m t distribution for the n j = 1 category follows the above scheme with six bins. The bins are approximately 10 GeV wide in the same range as for n j = 0.
The m t distribution of the eµ events in the ggF-enriched n j ≥ 2 category uses four bins specified by the bin boundaries [0, 50, 80, 130, ∞] GeV.
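As a small illustration of the four-bin scheme with boundaries [0, 50, 80, 130, ∞] GeV, the sketch below assigns an m t value to a bin index. The helper name and the convention that a value on a boundary belongs to the higher bin are assumptions; the text does not specify the edge convention.

```python
from bisect import bisect_right

# Lower edges of the four m_T bins in GeV; the last bin is open-ended.
MT_EDGES = [0.0, 50.0, 80.0, 130.0]


def mt_bin(m_t):
    """Return the 0-based bin index for a transverse mass in GeV.

    Assumed convention: each bin includes its lower edge.
    """
    if m_t < 0.0:
        raise ValueError("m_T must be non-negative")
    return bisect_right(MT_EDGES, m_t) - 1


example_bins = [mt_bin(x) for x in (10.0, 50.0, 125.0, 500.0)]
```

Any value above 130 GeV falls in the last bin, which mirrors the open upper boundary in the text.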
The O BDT distribution is used to fit the VBF-enriched n j ≥ 2 samples. The signal purity increases with increasing value of O BDT , so the bin widths decrease accordingly.  Table XXIX.
The interplay of the various fit regions is illustrated for one kinematic region of the n j = 0 category in Fig. 28. The shape of the m t distribution is used in the fit to discriminate between the signal and the background as shown in the top row for the SR. Three profiled CRs determine the normalization factors (β k ) of the respective background contributions in situ. The variables and selections used to separate the SR from the CRs are given in the second row: for W W the m ℓℓ variable divides the SR and CR, but also the validation region (VR) used to test the W W extrapolation (see Sec. VI A); for DY the ∆φ ℓℓ variable divides the SR and CR with a region separating the two; for V V the discrete same/opposite charge variable is used. The last row shows the backgrounds whose normalizations are not profiled in the fit, but are computed prior to the fit.
The treatment of a given region as a profiled or nonprofiled CR depends on the complexity of its implementation in the fit, the impact of the estimated background on the analysis, and the level of contamination from other processes in the corresponding CR. Subdominant backgrounds and those whose estimation is not largely affected by the post-fit yield of the other backgrounds, like Wj and multijet backgrounds, are not profiled.

TABLE XXI. ... (11), not including the terms used for MC statistics. The signal region categories i are given in (a). The definitions for bins b are given by listing the bin edges, except for m t and O BDT , which are given in the text and noted as the fit variables in the right-most column. The background control regions are given in (b), which correspond to the ones indicated as using data in Table XI. The profiled CRs are marked by • and the others are marked by ◦. "Sample" notes the lepton flavor composition of the CR that is used for all the SR regions for a given n j category: "eµ" means that an eµ CR sample is used for all SR regions; the Wj and jj CRs use the same lepton-flavor samples as in the SR (Same), i. e., "eµ" CR for "eµ" SR and "ee/µµ" CR for "ee/µµ" SR; the DY, ee/µµ sample is used only for the ee/µµ SR; the two rows in n j ≥ 2 VBF use a CR that combines the two samples (Both); see text for details. Energy-related quantities are in GeV.

B. Likelihood, exclusion, and significance
The statistical analysis involves the use of the likelihood L(µ, θ | N ), which is a function of the signal strength parameter µ and a set of nuisance parameters θ = {θ a , θ b , . . .} given a set of the numbers of events N = {N A , N B , . . .}. Allowed ranges of µ are found using the distribution of a test statistic q µ .

Likelihood function
The likelihood function L (Eq. (11) below) is the product of four groups of probability distribution functions:
• Poisson function f (N ib | . . .) used to model the event yield in each bin b of the variable fit to extract the signal yield for each category i;
• Poisson function f (N l | Σ k β k B kl ) used to model the event yield in each control region l with the total background yield summed over processes k (B kl );
• Gaussian functions g(ϑ t | θ t ) used to model the systematic uncertainties t; and
• Poisson functions f (ξ k | . . .) used to account for the MC statistics k.

$$
L = \underbrace{\prod_{i,b} f\Big(N_{ib} \,\Big|\, \mu \cdot S_{ib} \cdot \prod_{r} \nu_{br}(\theta_r) + \sum_{k} \beta_k \cdot B_{kib} \cdot \prod_{s} \nu_{bs}(\theta_s)\Big)}_{\text{Poisson for SR}}
\cdot \underbrace{\prod_{l} f\Big(N_l \,\Big|\, \sum_{k} \beta_k \cdot B_{kl}\Big)}_{\text{Poisson for profiled CRs}}
\cdot \underbrace{\prod_{t} g(\vartheta_t \mid \theta_t)}_{\text{Gauss. for syst.}}
\cdot \underbrace{\prod_{k} f(\xi_k \mid \zeta_k \cdot \theta_k)}_{\text{Poiss. for MC stats}}
\tag{11}
$$

The statistical uncertainties are considered explicitly in the first, second, and fourth terms. The first and second terms treat the random error associated with the predicted value, i. e., for a background yield estimate B the √ B error associated with it. The fourth term treats the sampling error associated with the finite sample size used for the prediction, e. g., the √ N mc "MC statistical errors" when MC is used. All of the terms are described below and summarized in Eq. (11).
The first term of L is a Poisson function f for the probability of observing N events given λ expected events, f (N | λ) = e −λ λ N /N !. The expected value λ is the sum of event yields from signal (S) and the sum of the background contributions (Σ k B k ) in a given signal region, i. e., λ = µ · S + Σ k B k . The parameter of interest, µ, multiplies S; each background yield in the sum is evaluated as described in Sec. VI. In our notation, the yields are scaled by the response functions ν that parametrize the impact of the systematic uncertainties θ. The ν and θ are described in more detail below when discussing the third term of L.
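A minimal sketch of one factor of this first term follows. It evaluates the logarithm of f (N | λ) with λ = µ · S + Σ k B k for a single bin, leaving out the response functions ν and the normalization factors β. The function names and the yields are illustrative assumptions, not values from the analysis.

```python
import math


def poisson_log_pmf(n, lam):
    """Log of f(N | lambda) = exp(-lambda) * lambda**N / N! for lambda > 0."""
    return -lam + n * math.log(lam) - math.lgamma(n + 1)


def signal_region_log_term(n_obs, mu, signal, backgrounds):
    """Log-likelihood of one SR bin with lambda = mu * S + sum_k B_k."""
    lam = mu * signal + sum(backgrounds)
    return poisson_log_pmf(n_obs, lam)


# Toy bin: 100 observed events, S = 10, two backgrounds of 60 and 30 events.
logl_signal = signal_region_log_term(100, 1.0, 10.0, [60.0, 30.0])
logl_no_signal = signal_region_log_term(100, 0.0, 10.0, [60.0, 30.0])
```

Working with the logarithm turns the product over bins in Eq. (11) into a sum, which is the usual way to keep the evaluation numerically stable.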
The second term constrains the background yields with Poisson components that describe the profiled control regions. Each term is of the form f (N l | λ l ) for a given CR labeled by l, where N l is the number of observed events in l, λ l = Σ k β k · B kl is the predicted yield in l, β k is the normalization factor of background k, and B kl is the MC or data-derived estimate of background k in l. The β k parameters are the same as those that appear in the first Poisson component above.
The third term constrains the systematic uncertainties with Gaussian terms. Each term is of the form $g(\vartheta \mid \theta) = e^{-(\vartheta-\theta)^2/2}/\sqrt{2\pi}$, where ϑ represents the central value of the measurement and θ the associated nuisance parameter for a given systematic uncertainty. The effect of the systematic uncertainty on the yields is through an exponential response function $\nu(\theta) = (1+\epsilon)^{\theta}$ for normalization uncertainties that have no variations among bins b of the fit variable, where ϵ is the value of the uncertainty in question. In this case, ν follows a log-normal distribution [90]. In this notation, ϵ = 3% is written if the uncertainty that corresponds to one standard deviation affects the associated yield by ± 3% and corresponds to θ = ± 1, respectively.
For the cases where the systematic uncertainty affects a given distribution differently in each bin b, a different linear response function is used in each bin; this function is written as ν b (θ) = 1 + ϵ b · θ. In this case, ν b is normally distributed around unity with width ϵ b , and is truncated by the ν b > 0 restriction to avoid unphysical values. Both types of response function impact the predicted S and B k in the first Poisson component.
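The two response functions can be compared with a short sketch. The symbol eps stands for the size of the uncertainty, and the way the ν b > 0 restriction is enforced here (raising an error) is an assumption chosen for illustration; the text only states that the distribution is truncated.

```python
def nu_normalization(theta, eps):
    """Exponential response (1 + eps)**theta for normalization uncertainties."""
    return (1.0 + eps) ** theta


def nu_shape(theta, eps_b):
    """Linear per-bin response 1 + eps_b * theta, restricted to positive values."""
    value = 1.0 + eps_b * theta
    if value <= 0.0:
        raise ValueError("unphysical response: nu_b must be positive")
    return value


# A 3% uncertainty evaluated at theta = +1, 0, -1
norm_values = [nu_normalization(t, 0.03) for t in (1.0, 0.0, -1.0)]
shape_values = [nu_shape(t, 0.03) for t in (1.0, 0.0, -1.0)]
```

The exponential form stays positive for any θ, whereas the linear form needs the explicit restriction; for small uncertainties the two agree closely near θ = 0.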
The fourth term treats the sampling error due to the finite sample size [89], e. g., the sum of the number of generated MC events for all background processes, B = Σ k B k . The quantity B is constrained with a Poisson term f (ξ | λ), where ξ represents the central value of the background estimate and λ = ζ · θ. The ζ = (B/δ) 2 defines the quantity with the statistical uncertainty of B as δ. For instance, if a background yield estimate B uses N mc MC events that correspond to a data sample with effective luminosity L mc , then for a data-to-MC luminosity ratio r = L data /L mc the background estimate is B = r · N mc , and the uncertainty (parameter) in question is δ = r · √ N mc (ζ = N mc ). In this example, the Poisson function is evaluated at N mc given λ = θ · N mc . Similar to the case for the third term, a linear response function ν(θ) = θ impacts the predicted S and B k in the first Poisson component.
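The worked example in this paragraph can be verified directly: with B = r · N mc and δ = r · √ N mc , the quantity ζ = (B/δ) 2 reduces to N mc regardless of r. The function name and the numbers below are illustrative assumptions.

```python
import math


def mc_stat_parameters(n_mc, lumi_ratio):
    """Return (B, delta, zeta) for a background predicted from n_mc MC events.

    lumi_ratio is r = L_data / L_mc.
    """
    b = lumi_ratio * n_mc
    delta = lumi_ratio * math.sqrt(n_mc)
    zeta = (b / delta) ** 2
    return b, delta, zeta


# 400 generated events scaled by r = 0.25 (made-up inputs)
b_est, delta_est, zeta_est = mc_stat_parameters(400, 0.25)
```

Because ζ equals the raw number of generated events, the Poisson constraint acts on the unweighted MC count rather than on the luminosity-scaled yield.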
In summary, the likelihood is the product of the four above-mentioned terms and can be written schematically as done in Eq. (11), where the ν br and ν bs are implicitly products over all three types of response functions (normalization, shape of the distribution, and finite MC sample size) whose parameters are constrained by the second, third, and fourth terms, respectively. In the case of finite MC sample size, θ is unique to each bin, which is not shown in Eq. (11). The statistical treatments of two quantities, the Z/γ * → ee, µµ estimate in n j ≤ 1 and the top-quark estimate in n j = 1, are constrained with additional multiplicative terms in the likelihood (see Appendix A).
To determine the observed value of the signal strength, µ obs , the likelihood is maximized with respect to its arguments, µ and θ, and evaluated at ϑ = 0 and ξ = ζ.

Test statistic
The profiled likelihood-ratio test statistic [91] is used to test the background-only or background-and-signal hypotheses. It is defined as

$$
q(\mu) = -2 \ln \frac{L(\mu, \hat{\theta}_{\mu})}{L(\hat{\mu}, \hat{\theta})} \tag{12}
$$

and it is also written as q µ ; the argument of the logarithm is written as Λ in later plots. The denominator of Eq. (12) is unconditionally maximized over all possible values of µ and θ, while the numerator is maximized over θ for a conditional value of µ. The latter takes the values θ̂ µ , which are θ values that maximize L for a given value of µ. When the denominator is maximized, µ takes the value of µ̂. The p 0 value is computed for the test statistic q 0 , Eq. (12), evaluated at µ = 0, and is defined to be the probability to obtain a value of q 0 larger than the observed value under the background-only hypothesis. There are no boundaries on µ̂, although q 0 is defined to be negative if µ̂ ≤ 0. All p 0 values are computed using the asymptotic approximation that −2 ln Λ(µ) follows a χ 2 distribution [91].
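To make the definition of q 0 and p 0 concrete, the sketch below treats a single-bin counting experiment with a known background and no nuisance parameters, which is a strong simplification of the fit described here. In that case µ̂ = (N − B)/S and the likelihood ratio has a closed form. The mapping from q 0 to p 0 through the signed square root is the common asymptotic convention and is an assumption of this example, as are the function names and input numbers.

```python
import math


def q0_counting(n_obs, background):
    """q0 = -2 ln [L(mu=0) / L(mu_hat)] for one Poisson bin, no nuisances.

    Returned with a negative sign when mu_hat <= 0, i.e. when n_obs <= background.
    """
    if n_obs == 0:
        q = 2.0 * background
    else:
        q = 2.0 * (n_obs * math.log(n_obs / background) - (n_obs - background))
    return q if n_obs > background else -q


def p0_asymptotic(q0):
    """One-sided p-value from the signed square root of q0 (asymptotic form)."""
    z = math.copysign(math.sqrt(abs(q0)), q0)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Toy inputs: 130 events observed on an expected background of 100.
q0_toy = q0_counting(130, 100.0)
z_toy = math.sqrt(q0_toy)
p0_toy = p0_asymptotic(q0_toy)
```

For these toy inputs the excess corresponds to a little under three standard deviations. In the full analysis the numerator and denominator of Eq. (12) are each maximized over all nuisance parameters, which this sketch does not attempt.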
A modified frequentist method known as CL S [92] is used to compute the one-sided 95% confidence level (C.L.) exclusion regions.

Combined fit
The combined results for the 7 and 8 TeV data samples account for the correlations between the analyses due to common systematic uncertainties.
The correlation of all respective nuisance parameters is assumed to be 100% except for those that are statistical in origin or have a different source for the two data sets. Uncorrelated systematics include the statistical component of the jet energy scale calibration and the luminosity uncertainty. All theoretical uncertainties are treated as correlated.

C. Systematic uncertainties
Uncertainties enter the fit as nuisance parameters in the likelihood function [Eq. (11)]. Uncertainties (both theoretical and experimental) specific to individual processes are described in Secs. V and VI; experimental uncertainties common to signal and background processes are described in this subsection. The impact on the yields and distributions from both sources of uncertainty is also discussed.

Sources of uncertainty
The dominant sources of experimental uncertainty on the signal and background yields are the jet energy scale and resolution, and the b-tagging efficiency. Other sources of uncertainty are the lepton resolutions and identification and trigger efficiencies, missing transverse momentum measurement, and the luminosity calculation. The uncertainty on the integrated luminosity in the 8 TeV data analysis is 2.8%. It is derived following the same methodology as in Ref. [93], from a preliminary calibration of the luminosity scale derived from beam-separation scans. The corresponding uncertainty in the 7 TeV data analysis is 1.8%.
The jet energy scale is determined from a combination of test beam, simulation, and in situ measurements [28]. Its uncertainty is split into several independent categories: modeling and statistical uncertainties on the extrapolation of the jet calibration from the central region (η intercalibration), high-p t jet behavior, MC nonclosure uncertainties, uncertainties on the calorimeter response and calibration of the jets originating from light quarks or gluons, the b-jet energy scale uncertainties, uncertainties due to modeling of in-time and out-of-time pile-up, and uncertainties on in situ jet energy corrections. All of these categories are further subdivided by the physical source of the uncertainty. For jets used in this analysis (p t > 25 GeV and | η | ≤ 4.5), the jet energy scale uncertainty ranges from 1% to 7%, depending on p t and η. The jet energy resolution varies from 5% to 20% as a function of the jet p t and η. The relative uncertainty on the resolution, as determined from in situ measurements, ranges from 2% to 40%, with the largest value of the resolution and relative uncertainty occurring at the p t threshold of the jet selection.
The method used to evaluate the b-jet tagging efficiency uses a sample dominated by dileptonic decays of top-quark pairs. This method is based on a likelihood fit to the data, which combines the per-event jet-flavor information and the expected momentum correlation between the jets to allow the b-jet tagging efficiency to be measured to high precision [31]. To further improve the precision, this method is combined with a second calibration method, which is based on samples containing muons reconstructed in the vicinity of the jet. The uncertainties related to b-jet identification are decomposed into six uncorrelated components using an eigenvector method [33]. The number of components is equal to the numbers of p t bins used in the calibration, and the uncertainties range from < 1% to 7.8%. The uncertainties on the misidentification rate for light-quark jets depend on p t and η, and have a range of 9-19%. The uncertainties on c-jets reconstructed as b-jets range between 6% and 14% depending on p t only.
The reconstruction, identification, isolation, and trigger efficiencies for electrons and muons, as well as their momentum scales and resolutions, are estimated using Z → ee, µµ, J/ψ → ee, µµ, and W → eν, µν decays [20,23]. The uncertainties on the lepton and trigger efficiencies are smaller than 1% except for the uncertainty on the electron identification efficiency, which varies between 0.2% and 2.7% depending on p t and η, and the uncertainties on the isolation efficiencies, which are the largest for p t < 15 GeV and yield 1.6% and 2.7% for electrons and muons, respectively.
The changes in jet energy and lepton momenta due to varying them by their systematic uncertainties are propagated to E miss t ; the changes in the high-p t object momenta and in E miss t are, therefore, fully correlated [34]. Additional contributions to the E miss t uncertainty arise from the modeling of low-energy particle measurements (soft terms). In the calorimeter, these particles are measured as calibrated clusters of cells that are above a noise threshold but not associated with reconstructed physics objects. The longitudinal and perpendicular (with respect to the hard component of the missing transverse momentum) components of the soft terms are smeared and rescaled in order to assess the associated uncertainties. The uncertainties are parametrized as a function of the magnitude of the summed p t of the high-p t objects, and they are evaluated in bins of the average number of interactions per bunch crossing. This results in variations on the mean of the longitudinal component of 0.2-0.3 GeV, where the upper bound corresponds to the hard objects with p t > 60 GeV. The resolution of the longitudinal and perpendicular components varies between 1% and 4%, where the largest uncertainties are for objects with p t < 30 GeV.
Jet energy and lepton momentum scale uncertainties are also propagated to the p miss t calculation. The systematic uncertainties related to the track-based soft term are based on the balance between tracks not associated with charged leptons and jets and the total transverse momentum of the hard objects in the event. These uncertainties are calculated by comparing the properties of p miss t in Z → ee, µµ events in real and simulated data, as a function of the magnitude of the summed p t of the hard p t objects in the event. The variations on the mean of the longitudinal component are in the range 0.3-1.4 GeV and the uncertainties on the resolution on the longitudinal and perpendicular components are in the range 1.5-3.3 GeV, where the lower and upper bounds correspond to the range of the sum of the hard p t objects below 5 GeV and above 50 GeV, respectively.

Impact on yields and distributions
In the likelihood fit, the experimental uncertainties are varied in a correlated way across all backgrounds and all signal and control regions, so that uncertainties on the extrapolation factors α described in Sec. VI are correctly propagated. If the normalization uncertainties are less than 0.1% they are excluded from the fit. If the shape uncertainties (discussed below) are less than 1% in all bins, they are excluded as well. Removing such small uncertainties increases the performance and stability of the fit.
In the fit to the m t distribution to extract the signal yield, the predicted m t shape from simulation is used for all of the backgrounds except W +jets and multijets. The impact of experimental uncertainties on the m t shapes for the individual backgrounds and signal are evaluated, and no significant impact is observed for the majority of the experimental uncertainties. Those experimental uncertainties that do produce statistically significant variations of the shape have no appreciable effect on the final results, because the uncertainty on the m t shape of the total background is dominated by the uncertainties on the normalizations of the individual backgrounds. The theoretical uncertainties on the W W and W γ * m t shape are considered in the n j ≤ 1 categories, as discussed in Secs. VI A 1 and VI D. In the n j ≥ 2 ggF-enriched category, only the theoretical uncertainties on the top-quark m t shape are included (see Sec. VI B 4).
The O BDT output distribution is fit in the n j ≥ 2 VBF-enriched category, and as with the m t distribution its shape is taken from the MC simulation, except for the W +jets and multijet background processes. The theoretical uncertainties on the top-quark O BDT shape are included in the analysis, as described in Sec. VI B 4.
Table XXII(a) shows the relative uncertainties on the combined predicted signal yield, summed over all the lepton-flavor channels, for each n j category for the 8 TeV analysis. They represent the final post-fit uncertainties on the estimated yields. The first two entries show the perturbative uncertainties on the ggF jet-bin acceptances in the exclusive n j = 0 and n j = 1 categories. The following entries are specific to the QCD scale uncertainties on the inclusive n j ≥ 2 and n j ≥ 3 cross sections, and on the total cross section and the acceptance. The latter includes the uncertainties due to the PDF variations, UE/PS and generator modeling, as described in Table X. The uncertainties on the VBF production process are also shown but are of less importance. The dominant uncertainties on the signal yields are theoretical. The uncertainties on the f recoil selection efficiency (relevant to the Z/γ * → ee, µµ estimate in the n j ≤ 1 categories) are applied only in the ee/µµ channels.

TABLE XXII. Sources of systematic uncertainty (in %) on the predicted signal yield (N sig ) and the cumulative background yields (N bkg ). Entries marked with a dash (-) indicate that the corresponding uncertainties either do not apply or are less than 0.1%. The values are post-fit and given for the 8 TeV analysis. (a) Uncertainties on N sig (in %).
Table XXII(b) shows the leading uncertainties on the cumulative background yields for each n j category. The first three entries are theoretical and apply to the W W , top-quark, and V V processes (see Sec. VI). The remaining uncertainties arise from the modeling of specific backgrounds and from experimental uncertainties.

TABLE XXIII. Composition of the post-fit uncertainties (in %) on the total signal (N sig ), total background (N bkg ), and individual background yields in the signal regions. The total uncertainty (Total) is decomposed into three components: statistical (Stat.), experimental (Expt.) and theoretical (Theo.). Entries marked with a dash (-) indicate that the corresponding uncertainties either do not apply or are less than 1%. The values are given for the 8 TeV analysis.

Table XXIII summarizes the above post-fit uncertainties on the total signal and background yields. The uncertainties shown are divided into three categories: statistical, experimental and theoretical. The statistical uncertainties are only relevant in the cases where the background estimates rely on the data. For example, the entry under N W W in n j = 0 represents the uncertainty on the sample statistics in the W W control region. The uncertainties on N top in the n j ≤ 1 categories also include the uncertainties on the corrections applied to the normalization factors. The uncertainties from the number of events in the control samples used to derive the W +jets and multijet extrapolation factors are listed under the experimental category, as discussed in Sec. VI C. Uncertainties on the total W +jets estimate are reduced compared to the values quoted in Table XV, because they are largely uncorrelated between lepton p t bins (statistical uncertainties on the Z+jets data sample) and between the lepton flavors (systematic uncertainties on the OC correction factor). The uncertainty due to the limited sample of background MC events for all the considered processes is included in the experimental component.
Background contamination in the control regions causes anti-correlations between different background processes, resulting in an uncertainty on the total background smaller than the sum in quadrature of the individual process uncertainties. This effect is called "cross talk" and is most prominent between the W W and top-quark backgrounds in the n j = 1 category. The uncertainties on the background estimates, as described in Sec. VI, cannot be directly compared to the ones presented in Table XXIII. The latter uncertainties are post-fit and are subject to subtle effects, such as the cross talk mentioned above, and also pulls and data-constraints (defined below) on the various nuisance parameters.
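The effect of such anti-correlations on the total uncertainty follows from standard error propagation for a sum of two yields. The sketch below uses made-up uncertainties and correlation coefficients; none of the numbers come from Table XXIII, and the function name is hypothetical.

```python
import math


def total_uncertainty(sigma_a, sigma_b, rho):
    """Uncertainty on the sum of two yields with correlation coefficient rho."""
    variance = sigma_a ** 2 + sigma_b ** 2 + 2.0 * rho * sigma_a * sigma_b
    return math.sqrt(variance)


# Illustrative values only: two backgrounds with uncertainties of 3 and 4 events.
uncorrelated = total_uncertainty(3.0, 4.0, 0.0)      # plain sum in quadrature
anticorrelated = total_uncertainty(3.0, 4.0, -0.5)   # cross talk reduces the total
```

With a negative correlation coefficient the total is smaller than the sum in quadrature, which is the behaviour described for the W W and top-quark backgrounds.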

D. Checks of fit results
The fit simultaneously extracts the signal strength µ and the set of auxiliary parameters θ. This process adjusts the initial pre-fit estimation of every parameter θ as well as its uncertainty, ∆ θ . The fit model is designed to avoid any significant constraints on the input uncertainties to minimize the assumptions on the correlations between the phase spaces in which they are measured and applied. This is achieved by having mostly single-bin control regions. Of central importance is the pre- and post-fit comparison of how the variation of a given systematic source translates to an uncertainty on µ.
The impact of a single nuisance parameter θ is assessed by considering its effect on the signal strength, i. e.,

$$
\Delta\hat{\mu}_{\pm} = \hat{\mu}(\hat{\theta} \pm \Delta_{\theta}) - \hat{\mu},
$$

where µ̂ is the post-fit value of the signal strength. In the following, quantities with a hat represent post-fit parameter values or their uncertainties. The values µ̂(θ̂ ± ∆ θ ) are the result of a fit with one θ varied by ±∆ θ around the post-fit value for θ, namely θ̂. All other θ are floating in these fits. In the pre-fit scenario, the ∆ θ are taken as their pre-fit values of ± 1, as θ is constrained by a unit Gaussian. The post-fit scenario is similar, but with θ̂ varied by its post-fit uncertainty of ∆ θ̂ . This uncertainty is found by a scan about the maximum so that the likelihood ratio takes the values −2 ln(L(θ̂ ± ∆ θ̂ )/L(θ̂)) = 1. The corresponding impact on µ̂ is ∆µ̂. When ∆ θ̂ is less than the pre-fit value, θ is said to be data-constrained. In this case the systematic uncertainty is reduced below its input value given the information from the data. This can result from the additional information that the data part of the likelihood injects. As can be seen from Table XXIV, only a few of the uncertainties are data-constrained, and only one of them is data-constrained by more than 20%. That is the W W generator modeling that includes the m t shape uncertainties correlated with the uncertainties on the extrapolation factor α W W . The data-constraint in this case comes from the high-m t tail of the signal region, which contains a large fraction of W W events.
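The impact calculation can be illustrated with a deliberately simplified toy: one counting bin, one background, and a single nuisance parameter that scales the background through the exponential response function. In this toy the best-fit signal strength at fixed θ has the closed form µ̂(θ) = (N − B · (1 + ϵ) θ )/S, so no refit of other parameters is involved, unlike in the analysis. All names and numbers are assumptions.

```python
def mu_hat_at(theta, n_obs, signal, background, eps):
    """Best-fit mu for one bin with the background scaled by (1 + eps)**theta."""
    return (n_obs - background * (1.0 + eps) ** theta) / signal


def impact(n_obs, signal, background, eps, theta_hat=0.0, delta_theta=1.0):
    """Return (up, down) shifts of mu_hat when theta moves by +/- delta_theta."""
    central = mu_hat_at(theta_hat, n_obs, signal, background, eps)
    up = mu_hat_at(theta_hat + delta_theta, n_obs, signal, background, eps) - central
    down = mu_hat_at(theta_hat - delta_theta, n_obs, signal, background, eps) - central
    return up, down


# Toy inputs: N = 120, S = 20, B = 100 with a 10% normalization uncertainty.
prefit_up, prefit_down = impact(120.0, 20.0, 100.0, 0.10)
# A data-constrained parameter (post-fit uncertainty 0.5) has a smaller impact.
postfit_up, postfit_down = impact(120.0, 20.0, 100.0, 0.10, delta_theta=0.5)
```

The shifts are asymmetric because of the exponential response, and halving the uncertainty on θ roughly halves the impact on µ̂.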
The post-fit values for θ modify the rates of signal and background processes, and the data-constraints affect the corresponding uncertainties. The results of these shifts are summarized in Table XXIV for a set of twenty nuisance parameters ordered by the magnitude of ∆µ̂ (the Higgs signal hypothesis is taken at m H = 125 GeV). The highest-ranked nuisance parameter is the uncertainty on the total ggF cross section due to the PDF variations. It changes µ̂ by −0.06/+0.06 when varied up and down by ∆ θ , respectively. It is followed by the uncertainty on the total ggF cross section due to QCD scale variations and the W W generator modeling uncertainty. Other uncertainties that have a significant impact on µ̂ include the effects of generator modeling on α top , the systematic uncertainties on α misid originating from a correction for oppositely charged electrons and muons, the luminosity determination for 8 TeV data, and various theoretical uncertainties on the ggF and VBF signal production processes. In total there are 253 nuisance parameters, which are divided into three main categories: experimental uncertainties (137 parameters), theoretical uncertainties (72 parameters), and normalization uncertainties (44 parameters). They are further divided into more categories as shown in Table XXVI.

VIII. YIELDS AND DISTRIBUTIONS
The previous section described the different parameters of the simultaneous fit to the various signal categories defined in the preceding sections. In particular, the signal and background rates and shapes are allowed to vary in order to fit the data in both the signal and control regions, within their associated uncertainties.
In the figures and tables presented in this section, background processes are individually normalized to their post-fit rates, which account for changes in the normalization factors (β) and for pulls of the nuisance parameters (θ). The varying background composition as a function of m t (or O BDT in the n j ≥ 2 VBF-enriched category) induces a shape uncertainty on the total estimated background. As described in Sec. VII C, additional specific shape uncertainties are included in the fit procedure and are accounted for in the results presented in Sec. IX. No specific m t shape uncertainties are applied to the figures since their contribution to the total systematic uncertainty band was found to be negligible. The Higgs boson signal rate is normalized to the observed signal strength reported in Sec. IX.
This section is organized as follows. The event yields are presented in Sec. VIII A for each signal category including the statistical and systematic uncertainties. The relevant distributions in the various signal regions are shown in Sec. VIII B. Section VIII C summarizes the differences in the event and object selection, the signal treatment and the background estimates with respect to the previously published analysis [5]. Table XXV shows the post-fit yields for all of the fitted categories in the 8 TeV [Table XXV(a)] and 7 TeV [Table XXV(b)] data analyses. The signal yields are scaled with the observed signal strength derived from the simultaneous combined fit to all of the categories. All of the background processes are normalized to the post-fit β values (where applicable) and additionally their rates take into account the pulls of the nuisance parameters. The observed and expected yields are shown, for each n j category, separately for the eµ and ee/µµ channels. The sum of the expected and observed yields is also reported. The uncertainties include both the statistical and systematic components.

A. Event yields
As described in the previous section, the changes in the normalization factors and the pulls of the nuisance parameters can affect the expected rates of the signal and background processes. The differences between the pre-fit (tables in Sec. IV) and post-fit (Table XXV) expected rates for each background process are compared to the total uncertainty on that expected background, yielding a significance of the change. In the analysis of the n j ≤ 1 category of the 8 TeV data, most of the changes are well below one standard deviation. In the eµ n j = 0 sample, the expected multijet background is increased by 1.3 standard deviations (equivalent to a 30% increase in the expected multijet background prediction which corresponds to 2% of the signal prediction) due to the positive pulls of the three nuisance parameters assigned to the uncertainties on the extrapolation factor. A negative pull of the nuisance parameter associated with the uncertainties on the DY f recoil selection efficiency changes the Z/γ * → ee, µµ yield in the ee/µµ n j = 0 sample by 1.6 standard deviations (equivalent to a 40% decrease in DY in this category which corresponds to 25% of the signal prediction).

B. Distributions
The transverse mass formed from the dilepton and missing transverse momenta (m t ) is used as the final discriminant in the extraction of the signal strength in the n j ≤ 1 and n j ≥ 2 ggF-enriched categories. The likelihood fit exploits the differences in m t shapes between the signal and background processes.
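As a reminder of how this discriminant is built from the dilepton and missing transverse momenta, the following minimal sketch evaluates the transverse mass under the commonly used definition $m_T^2 = (E_T^{\ell\ell} + p_T^{\rm miss})^2 - |\vec{p}_T^{\,\ell\ell} + \vec{p}_T^{\,\rm miss}|^2$ with $E_T^{\ell\ell} = \sqrt{|\vec{p}_T^{\,\ell\ell}|^2 + m_{\ell\ell}^2}$. This form is assumed here to match the definition given earlier in the paper; the function name and argument layout are hypothetical.

```python
import math


def transverse_mass(ptll, mll, ptmiss):
    """m_T from the dilepton pT vector (x, y), the dilepton mass, and the missing pT vector (x, y).

    All inputs are in the same energy units (e.g. GeV).
    """
    et_ll = math.sqrt(ptll[0] ** 2 + ptll[1] ** 2 + mll ** 2)
    met = math.hypot(ptmiss[0], ptmiss[1])
    sum_x = ptll[0] + ptmiss[0]
    sum_y = ptll[1] + ptmiss[1]
    mt2 = (et_ll + met) ** 2 - (sum_x ** 2 + sum_y ** 2)
    # Guard against tiny negative values from floating-point rounding
    return math.sqrt(max(mt2, 0.0))
```

For a massless dilepton system recoiling back-to-back against missing transverse momentum of equal magnitude p, this gives m T = 2p; when the two vectors are collinear it gives zero.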
Several of the m t distributions for the eµ sample (corresponding to different choices of the m ℓℓ and p ℓ2 t bins) in the n j ≤ 1 categories are shown in Fig. 29. The background composition, signal contribution, and the separation in the m t distributions between signal and background are different for each region. In general, as shown in Figs. 29(a)-(c), the W W process dominates the background contributions in regions with n j = 0; the difference between these distributions is due to the varying signal contribution and background m t shape. In contrast, Fig. 29(d) shows that V V and W +jets processes are dominant backgrounds in the 10 < m ℓℓ < 30 GeV and 10 < p ℓ2 t < 15 GeV region. For most of the distributions shown in Fig. 29, agreement between data and MC is improved qualitatively when including the expected signal from a Standard Model Higgs boson with m H = 125 GeV.
The m t distributions for the ee/µµ samples in the n j ≤ 1 categories are shown in Fig. 30. In contrast to the eµ distributions, the residual DY background is present in these samples at low values of m t .
For the ggF-enriched n j ≥ 2 category, Fig. 31 shows the m t distribution. In contrast to the n j ≤ 1 distributions, the dominant backgrounds arise from top-quark and Z/γ * → τ τ production (shown together with the negligible contribution from Z/γ * → ee, µµ).
TABLE XXV. Signal region yields with uncertainties. The tables give the ggF- and VBF-enriched post-fit yields for each nj category, separated for the 8 and 7 TeV data analyses. The N signal columns show the expected signal yields from the ggF and VBF production modes, with values scaled to the observed combined signal strength (see Sec. IX C). For each group separated by a horizontal line, the first line gives the combined values for the different subchannels or BDT bins. The yields and the uncertainties take into account the pulls and data-constraints of the nuisance parameters, and the correlations between the channels and the background categories. The quoted uncertainties include the theoretical and experimental systematic sources and those due to sample statistics. Values less than 0.1 (0.01) events are written as 0.0 (-).
For the VBF-enriched n j ≥ 2 category, a selection-based analysis, which uses the m t distribution as the discriminant, is used as a cross-check of the BDT result. In this case, m t is divided into three bins (with boundaries at 80 and 130 GeV) and an additional division in m jj at 1 TeV is used in the eµ channel to profit from the difference in shapes between signal and background processes. Figure 32(a) shows the m t distribution before the division into the high- and low-m jj regions. Figure 32(b) shows the scatter plot of m jj versus m t . The areas with the highest signal-to-background ratio are characterized by low m t and high m jj . Figures 33(a) and 33(c) show the O BDT outputs in the eµ and ee/µµ samples, respectively. In terms of VBF signal production, the third BDT bin provides the highest purity, with a signal-to-background ratio of approximately two. The m t variable is an input to the BDT and its distributions after the BDT classification are shown in Figs. 33(b) and 33(d), combining all three BDT bins, for the eµ and ee/µµ samples, respectively. Figure 34 shows the m t distributions in the 7 TeV analysis in the various signal regions in the n j ≤ 1 categories. Characteristics similar to those in the 8 TeV analysis are observed, but with fewer events.
Finally, Fig. 35(a) shows the combined m t distribution, summed over the lepton-flavor samples and the n j ≤ 1 categories for the 7 and 8 TeV data analyses. To illustrate the significance of the excess of events observed in data with respect to the total background, the systematic uncertainty on the signal is omitted. The uncertainty band accounts for the correlations between the signal regions, including between the 7 and 8 TeV data, and for the varying size of the uncertainties as a function of m t . Figure 35(b) shows the residuals of the data with respect to the total estimated background compared to the expected m t distribution of an SM Higgs boson with m H = 125 GeV scaled by the observed combined signal strength (see Sec. IX). The level of agreement observed in Fig. 35(b) between the background-subtracted data and the expected Higgs boson signal strengthens the interpretation of the observed excess as a signal from Higgs boson decay.

C. Differences with respect to previous results
The analysis presented in this paper has better sensitivity than the previous ATLAS analysis [5]. The most important changes, described in detail below, include improvements in the object identification, the signal acceptance, the background estimation and modeling, and the fit procedure.
Electron identification is based on a likelihood technique [23] that improves background rejection. An improved definition of missing transverse momentum, p miss t , based on tracks, is introduced in the analysis since it is robust against pile-up and provides improved resolution with respect to the true value of missing transverse momentum.
Signal acceptance is increased by 75% (50%) in the n j = 0 (1) category. This is achieved by lowering the p ℓ2 t threshold to 10 GeV. Dilepton triggers are included in addition to single-lepton triggers, which allows reduction of the p ℓ1 t threshold to 22 GeV. The signal kinematic region in the n j ≤ 1 categories is extended from 50 to 55 GeV. The total signal efficiency, including all signal categories and production modes, at 8 TeV and for a Higgs boson mass of 125.36 GeV, increased from 5.3% to 10.2%.
The methods used to estimate nearly all of the background contributions in the signal region are improved. These improvements lead to a better understanding of the normalizations and thus the systematic uncertainties. […] uncertainties between the top-quark control region and signal regions in the n j = 1 categories. The Z/γ * → τ τ background process is normalized to the data in a dedicated high-statistics control region in the n j ≤ 1 and n j ≥ 2 ggF-enriched categories. The V V backgrounds are normalized to the data using a new control region, based on a sample with two same-charge leptons. Introducing this new control region results in the cancellation of most of the theoretical uncertainties on the V V backgrounds. The multijet background is now explicitly estimated with an extrapolation factor method using a sample with two anti-identified leptons. Its contribution is negligible in the n j ≤ 1 category, but it is at the same level as W +jets background in the n j ≥ 2 ggF-enriched category. A large number of improvements are applied to the estimation of the W +jets background, one of them being an estimation of the extrapolation factor using Z+jets instead of dijet data events.
Signal yield uncertainties are smaller than in the previous analysis. The uncertainties on the jet multiplicity distribution in the ggF signal sample, previously estimated with the Stewart-Tackmann technique [76], are now estimated with the jet-veto-efficiency method [75]. This method yields more precise estimates of the signal rates in the exclusive jet bins in which the analysis is performed. […] selection-based approach, is used for the VBF category. This improves the sensitivity of the expected VBF results by 60% relative to the previously published analysis. The ggF-enriched category is a new subcategory that targets ggF signal production in this sample.
In summary, the analysis presented in this paper brings a gain of 50% in the expected significance relative to the previously published analysis [5].

IX. RESULTS AND INTERPRETATIONS
Combining the 2011 and 2012 data in all categories, a clear excess of signal over the background is seen in Fig. 35. The profile likelihood fit described in Sec. VII B is used to search for a signal and characterize the production rate in the ggF and VBF modes. Observation of the inclusive Higgs boson signal, and evidence for the VBF production mode, are established first. Following that, the excess in data is characterized using the SM Higgs boson as the signal hypothesis, up to linear rescalings of the production cross sections and decay modes. Results include the inclusive signal strength as well as those for the individual ggF and VBF modes. This information is also interpreted as a measurement of the vector-boson and fermion couplings of the Higgs boson, under the assumptions outlined in Ref. [62]. Because this is the first observation in the W W * → ℓνℓν channel using ATLAS data, the exclusion sensitivity and observed exclusion limits as a function of m H are also presented to illustrate the improvements with respect to the version of this analysis used in the 2012 discovery [4]. Finally, cross-section measurements, both inclusive and in specific fiducial regions, are presented. All results in this section are quoted for a Higgs boson mass corresponding to the central value of the ATLAS measurement in the ZZ → 4ℓ and γγ decay modes, m H = 125.36 ± 0.41 GeV [9].

A. Observation of the H → W W * decay mode
The test statistic q µ , defined in Sec. VII B, is used to quantify the significance of the excess observed in Sec. VIII. The probability that the background can fluctuate to produce an excess at least as large as the one observed in data is called p 0 and is computed using q µ with µ = 0. It depends on the mass hypothesis m H through the distribution used to extract the signal (m t or O BDT ). The observed and expected p 0 are shown as a function of m H in Fig. 36. The observed curve presents a broad minimum centered around m H ≈ 130 GeV, in contrast with the higher p 0 -values observed for lower and higher values of m H . The shapes of the observed and expected curves are in good agreement.
The probability p 0 can equivalently be expressed in terms of the number of standard deviations, referred to as the local significance (Z 0 ). The value of p 0 as a function of m H is found by scanning m H in 5 GeV intervals. The minimum p 0 value is found at m H = 130 GeV and corresponds to a local significance of 6.1 standard deviations. The same observed significance within the quoted precision is found for m H = 125.36 GeV. This result establishes a discovery-level signal in the H → W W * → ℓνℓν channel alone. The expected significance for a SM Higgs boson at the same mass is 5.8 standard deviations.
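The correspondence between p 0 and Z 0 can be made concrete with a short sketch. It assumes the usual one-sided Gaussian convention, Z 0 = Φ⁻¹(1 − p 0 ), which is not spelled out in this section; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def p0_to_z0(p0):
    """One-sided Gaussian significance corresponding to a background-only p-value."""
    # By symmetry, Phi^-1(1 - p0) = -Phi^-1(p0); this form stays accurate for tiny p0
    return -NormalDist().inv_cdf(p0)


def z0_to_p0(z0):
    """One-sided tail probability above z0 standard deviations."""
    return 0.5 * math.erfc(z0 / math.sqrt(2.0))
```

For instance, `z0_to_p0(6.1)` gives the tail probability associated with the observed local significance, and `p0_to_z0` inverts it.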
In order to assess the compatibility with the SM expectation for a Higgs boson of mass m H , the observed best-fit µ̂ value as a function of m H is shown in Fig. 37. The assumption that the total yield is predicted by the SM is relaxed to evaluate the two-dimensional likelihood contours of (m H , µ), shown in Fig. 38. The value (µ = 1, m H = 125.36 GeV) lies well within the 68% C.L. contour, showing that the signal observed is compatible with those in the high-resolution channels.

B. Evidence for VBF production
The n j ≥ 2 VBF-enriched signal region was optimized for its specific sensitivity to the VBF production process, as described in particular in Sec. IV. Nevertheless, as can be seen in Table XXV, the ggF contribution to this signal region is large, approximately 30%, so it has to be profiled by the global fit together with the extraction of the significance of the signal strength of the VBF production process.
The global likelihood can be evaluated as a function of the ratio µ vbf /µ ggf , with both signal strengths varied independently. The result is illustrated in Fig. 39, which has a best-fit value for the ratio of […]. The value of the likelihood at µ vbf /µ ggf = 0 can be interpreted as the observed significance of the VBF production process for m H = 125.36 GeV, and corresponds to 3.2 standard deviations; the expected significance is 2.7 standard deviations. This establishes the evidence for the VBF production mode in the H → W W * → ℓνℓν final state. The significance derived from testing the ratio µ vbf /µ ggf = 0 is equivalent to the significance of testing µ vbf = 0, though testing the ratio is conceptually advantageous since the branching fraction cancels in this parameter, while it is implicit in µ vbf .
This result was verified with the cross-check analysis described in Sec. IV C, in which the multivariate discriminant is replaced with a series of event selection requirements motivated by the VBF topology. The expected and observed significances at m H = 125.36 GeV are 2.1 and 3.0 standard deviations, respectively. The compatibility of the 8 TeV results from the cross-check and O BDT analyses was checked with pseudo-experiments, considering the statistical uncertainties only and fixing µ ggf to 1.0. With those caveats, the probability that the difference in Z 0 values is larger than the one observed is 79%, reflecting good agreement.

C. Signal strength µ
The parameter µ is used to characterize the inclusive Higgs boson signal strength as well as subsets of the signal regions or individual production modes. First, the ggF and VBF processes can be distinguished by using the normalization parameter µ ggf for the signal predicted for the ggF signal process, and µ vbf for the signal predicted for the VBF signal process. This can be done for a fit to any set of the signal regions in the various categories. In addition, to check that the measured value is consistent among categories, different subsets of the signal regions can be fit. For example, the n j = 0 and n j = 1 categories can be compared, or the eµ and ee/µµ categories. To derive these results, only the signal regions are separated; the control region definitions do not change. In particular, the control regions defined using only eµ events are used, even when only ee/µµ signal regions are considered.
The combined Higgs signal strength µ, including 7 and 8 TeV data and all signal region categories, is $\mu = 1.09^{+0.16}_{-0.15}\,\textrm{(stat)}^{+0.17}_{-0.14}\,\textrm{(syst)}$. The uncertainties are divided according to their source. The statistical uncertainty accounts for the number of observed events in the signal regions and profiled control regions. The statistical uncertainties from Monte Carlo simulated samples, from nonprofiled control regions, and from the extrapolation factors used in the W +jets background estimate are all included in the experimental uncertainties here and for all results in this section. The theoretical uncertainty includes uncertainties on the signal acceptance and cross section as well as theoretical uncertainties on the background extrapolation factors and normalizations. The expected value of µ is $1^{+0.16}_{-0.15}\,\textrm{(stat)}^{+0.17}_{-0.13}\,\textrm{(syst)}$.
In order to check the compatibility with the SM predictions of the ggF and VBF production processes, µ ggf and µ vbf can be simultaneously determined through a fit to all categories because of the different sensitivity to these processes in the various categories. In this fit, the VH contribution is included although there is no dedicated category for it, and the SM value for the ratio σ vbf /σ vh is assumed. Technically, the signal strength µ vbf+vh is measured, but because the contribution from VH is negligible, the notation µ vbf is used. The corresponding two-dimensional likelihood contours as a function of µ ggf and µ vbf are shown in Fig. 40, and the fitted values are $\mu_{\rm ggf} = 1.02\pm 0.19\,\textrm{(stat)}^{+0.22}_{-0.18}\,\textrm{(syst)}$ and $\mu_{\rm vbf} = 1.27^{+0.44}_{-0.40}\,\textrm{(stat)}^{+0.30}_{-0.21}\,\textrm{(syst)}$. The details of the uncertainties on µ, µ ggf , and µ vbf are shown in Table XXVI. The statistical uncertainty is the largest single source of uncertainty on the signal strength results, although theoretical uncertainties also play a substantial role, especially for µ ggf .
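To see how the separately quoted components compare with a single total uncertainty, the sketch below combines the statistical and systematic parts of the combined signal strength given in the abstract. It assumes the components are independent and that the upward and downward parts can each be added in quadrature, which is a common approximation and not necessarily the procedure used in the fit; the helper `combine` is hypothetical.

```python
import math


def combine(*components):
    """Quadrature sum of uncertainty components assumed to be independent."""
    return math.sqrt(sum(c * c for c in components))


mu = 1.09
total_up = combine(0.16, 0.17)     # stat (+) and syst (+)
total_down = combine(0.15, 0.14)   # stat (-) and syst (-)
```

The same helper can be applied to the µ ggf and µ vbf components, which makes it easy to compare the relative weight of statistical and systematic sources for each production mode.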
The signal strength results are shown in Table XXVII for m H = 125.36 GeV. The table includes inclusive results as well as results for individual categories and production modes. The expected and observed significance for each category and production mode is also shown. The µ values are consistent with each other and with unity within the assigned uncertainties. In addition to serving as a consistency check, these results illustrate the sensitivity of the different categories. For the overall signal strength, the contribution from the n j ≥ 2 VBF category is second only to the n j = 0 ggF category, and the n j ≥ 2 ggF category contribution is comparable to those in the n j = 0 and n j = 1 ee/µµ categories.
For all of these results, the signal acceptance for all production modes is evaluated assuming a SM Higgs boson. The VH production process contributes a small number of events, amounting to about 1% of the expected signal from the VBF process. It is included in the predicted signal yield, and where relevant, is grouped with the VBF signal assuming the SM value of the ratio σ vbf /σ vh . The small (< 1%) contribution of H → τ τ to the signal regions is treated as signal, assuming the branching fractions as predicted by the SM.

D. Higgs couplings to fermions and vector bosons
The values of µ ggf and µ vbf can be used to test the compatibility of the fermionic and bosonic couplings of the Higgs boson with the SM prediction using a framework motivated by the leading-order interactions [62]. The parametrization uses the scale factors κ F , applied to all fermionic couplings, and κ V , applied to all bosonic couplings; these parameters are unity for the SM.
TABLE XXVI. Summary of uncertainties on the signal strength µ. The table gives the relative uncertainties for inclusive Higgs production (left), ggF production (middle), and VBF production (right). For each group separated by a horizontal line, the first line gives the combined result. The "profiled signal region" indicates the contribution of the uncertainty on the ggF signal yield to the µvbf measurement and vice versa. The "misid. factor" is the systematic uncertainty related to the W +jets background estimation. The "Z/γ * → ee, µµ" entry corresponds to uncertainties on the f recoil selection efficiency for the nj ≤ 1 ee/µµ category. The "muons and electrons" entry includes uncertainties on the lepton energy scale, lepton momentum corrections, lepton trigger efficiencies, and lepton isolation efficiencies. The "jets" uncertainties include the jet energy scale, jet energy resolution, and the b-tagging efficiency. Values are quoted assuming mH = 125.36 GeV. The plot for VBF (third column) has a different scale than the other columns to show the relative uncertainties per column. The entries marked with a dash are smaller than 0.01 or do not apply.
In particular, the ggF production cross section is proportional to $\kappa_F^2$ through the top-quark or bottom-quark loops at the production vertex, and the VBF production cross section is proportional to $\kappa_V^2$. The branching fraction B H → W W * is proportional to $\kappa_V^2$ and inversely proportional to a linear combination of $\kappa_F^2$ and $\kappa_V^2$. This model assumes that there are no non-SM decay modes, so the denominator corresponds to the total decay width in terms of the fermionic and bosonic decay amplitudes. The formulae, following Ref. [62], are
$$\mu_{\rm ggf} \propto \frac{\kappa_F^2\,\kappa_V^2}{(B_{H\to f\bar{f}} + B_{H\to gg})\,\kappa_F^2 + B_{H\to VV}\,\kappa_V^2}, \qquad \mu_{\rm vbf} \propto \frac{\kappa_V^4}{(B_{H\to f\bar{f}} + B_{H\to gg})\,\kappa_F^2 + B_{H\to VV}\,\kappa_V^2}. \qquad (17)$$
The small contribution from B H → γγ depends on both κ F and κ V and is not explicitly shown.
Because the fermionic and gluonic decay modes dominate the total width in the SM, the $\kappa_F^2$ term dominates the denominator. As a result, the $\kappa_F^2$ dependence for the ggF process approximately cancels, but the rate remains sensitive to κ V . Similarly, the VBF rate scales approximately with $\kappa_V^4/\kappa_F^2$ and the VBF channel provides more sensitivity to κ F than the ggF channel does in this model. Because Eq. (17) contains only $\kappa_F^2$ and $\kappa_V^2$, this channel is not sensitive to the sign of κ F or κ V .
TABLE XXVII. Signal significance Z0 and signal strength µ. The expected (Exp) and observed (Obs) values are given; µexp is unity by assumption. For each group separated by a horizontal line, the highlighted first line gives the combined result. The plots correspond to the values in the table as indicated. For the µ plot, the thick line represents the statistical uncertainty (Stat) in the signal region, the thin line represents the total uncertainty (Tot), which includes the uncertainty from systematic sources (Syst). The uncertainty due to background sample statistics is included in the latter. The last two rows report the results when considering ggF and VBF production modes separately. The values are given assuming mH = 125.36 GeV.
The likelihood scan as a function of κ V and κ F is shown in Fig. 41. Both the observed and expected contours are shown, and are in good agreement. The relatively low discrimination among high values of κ F in the plot is due to the functional behavior of the total ggF yield. The product σ ggf · B does not depend on κ F in the limit where κ F ≫ κ V , so the sensitivity at high κ F values is driven by the value of µ vbf . The VBF process rapidly vanishes in the limit where κ F ≫ κ V due to the increase of the Higgs boson total width and the consequent reduction of the branching fraction to W W bosons. Therefore, within this framework, excluding µ vbf = 0 excludes κ F ≫ κ V . The best-fit values of κ F and κ V are […] (18) and their correlation is ρ = 0.47. The correlation is derived from the covariance matrix constructed from the second-order mixed partial derivatives of the likelihood, evaluated at the best-fit values of κ F and κ V .
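The limiting behavior described in this section can be checked numerically with a minimal sketch. It assumes the scaling stated in the text (ggF production ∝ κ F ², VBF production ∝ κ V ², B H → W W * ∝ κ V ² divided by a total width that is a linear combination of κ F ² and κ V ²), normalized so that both signal strengths equal one in the SM. The two branching-fraction weights are rounded, illustrative values for a Higgs boson near 125 GeV and are not taken from the paper; the function names are hypothetical.

```python
# Rounded, illustrative SM branching-fraction sums near m_H = 125 GeV (assumptions)
B_FERMION_GLUON = 0.75   # H -> ff plus H -> gg, scaling with kappa_F^2
B_VECTOR = 0.25          # H -> WW*, ZZ*, scaling with kappa_V^2


def width_scale(kf, kv):
    """Total-width scale factor relative to the SM, assuming no non-SM decay modes."""
    return B_FERMION_GLUON * kf ** 2 + B_VECTOR * kv ** 2


def mu_ggf(kf, kv):
    """ggF signal strength: production ~ kf^2, decay ~ kv^2 / total width."""
    return kf ** 2 * kv ** 2 / width_scale(kf, kv)


def mu_vbf(kf, kv):
    """VBF signal strength: production ~ kv^2, decay ~ kv^2 / total width."""
    return kv ** 4 / width_scale(kf, kv)
```

Evaluating `mu_ggf(kf, 1.0)` for increasing `kf` shows the ggF yield flattening toward a constant, while `mu_vbf(kf, 1.0)` falls toward zero, which is why the VBF measurement drives the sensitivity at high κ F . Both functions depend only on the squares of the scale factors, so they are blind to the signs.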

E. Exclusion limits
The analysis presented in this paper has been optimized for a Higgs boson of mass m H = 125 GeV, but, due to the low mass resolution of the ℓνℓν channel, it is sensitive to SM-like Higgs bosons of mass up to 200 GeV and above. The exclusion ranges are computed using the modified frequentist method CL S [92]. A SM Higgs boson of mass m H is considered excluded at 95% C.L. if the value µ = 1 is excluded at that mass. The analysis is expected to exclude a SM Higgs boson with mass down to 114 GeV at 95% C.L. The clear excess of signal over background, shown in the previous sections, results in an observed exclusion range of 132 < m H < 200 GeV, extending to the upper limit of the search range, as shown in Fig. 42.
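The CL S criterion can be illustrated with a deliberately simplified sketch: a single-bin counting experiment in which the observed event count itself serves as the test statistic. The analysis uses a profile-likelihood test statistic with many bins and nuisance parameters, so this is only a toy under stated assumptions; the function names and example yields are hypothetical.

```python
import math


def poisson_cdf(n_obs, mean):
    """P(n <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total


def cls(n_obs, s, b):
    """CL_s = CL_{s+b} / CL_b for a single-bin counting experiment."""
    return poisson_cdf(n_obs, s + b) / poisson_cdf(n_obs, b)


def excluded_at_95(n_obs, s, b):
    """A signal hypothesis is excluded at 95% C.L. when CL_s falls below 0.05."""
    return cls(n_obs, s, b) < 0.05
```

Dividing by CL b protects against excluding signals to which the experiment has little sensitivity. When the data show an excess, as here near 125 GeV, CL S stays large and the corresponding mass hypothesis is not excluded.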

F. Higgs production cross sections
The measured signal strength can be used to evaluate the product σ · B H → W W * for Higgs boson production at m H = 125.36 GeV, as well as for the individual ggF and VBF production modes. The central value is simply the product of µ and the predicted cross section used to define it. The uncertainties are similarly scaled, except for the theoretical uncertainties related to the total production yield, which do not apply to this measurement. These are the QCD scale and PDF uncertainties on the total cross sections, and the uncertainty on the branching fraction for H → W W * , as described in Sec. V. In practice, the corresponding nuisance parameters are fixed to their nominal values in the fit, effectively removing these uncertainties from consideration. Inclusive cross-section measurements are performed for ggF and VBF production. The cross section is also measured for ggF production in defined fiducial volumes; this approach minimizes the impact of theoretical uncertainties.

Inclusive cross sections
Inclusive cross sections are evaluated at both 7 and 8 TeV for the ggF production process and at 8 TeV for the VBF production process. The 7 TeV VBF cross section is not measured because of the large statistical uncertainty. The signal strengths used for ggF and VBF are determined through a simultaneous fit to all categories as described in Sec. IX C. The small VH contribution, corresponding to 0.9%, is neglected, and its expected fractional yield is added linearly to the total error. The […] where (sig.) indicates the systematic uncertainties on the total signal yield for the measured process, which do not affect the cross-section measurement. The effect of uncertainties on the signal yield for other production modes is included in the systematic uncertainties. In terms of the measured signal strength, the inclusive cross section is defined as
$$(\sigma\cdot B_{H\to WW^{\ast}})_{\rm obs} = \frac{(N_{\rm sig})_{\rm obs}}{A\cdot C\cdot B_{WW\to\ell\nu\ell\nu}}\cdot\frac{1}{\int L\,dt} = \hat{\mu}\cdot(\sigma\cdot B_{H\to WW^{\ast}})_{\rm exp}.$$
In this equation, A is the kinematic and geometric acceptance, and C is the ratio of the number of measured events to the number of events produced in the fiducial phase space of the detector. The product A × C is the total acceptance for reconstructed events.
The predicted cross-section values for ggF production at 7 TeV, ggF production at 8 TeV, and VBF production at 8 TeV are 3.3 ± 0.4 pb, 4.2 ± 0.5 pb, and 0.35 ± 0.02 pb, respectively. These are derived as described in Sec. V, and the acceptance is evaluated using the standard signal MC samples.
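Since the central value of each measured cross section is the product of a signal strength and the predicted cross section, the relation can be inverted to compare measurement and prediction directly. The sketch below uses the 8 TeV measured central values quoted in the abstract and the predictions listed above; pairing 4.2 pb with ggF and 0.35 pb with VBF at 8 TeV is an assumption based on the order in which the processes are introduced. Uncertainties are ignored.

```python
# Central values of sigma x B in pb at 8 TeV (uncertainties omitted in this sketch)
measured = {"ggF": 4.6, "VBF": 0.51}     # from the abstract
predicted = {"ggF": 4.2, "VBF": 0.35}    # assumed assignment of the listed predictions

# Ratio of measured to predicted cross section for each production mode
ratio = {mode: measured[mode] / predicted[mode] for mode in measured}
```

These simple ratios need not coincide with the combined signal strengths quoted elsewhere in the paper, which come from a fit that also includes the 7 TeV data and a different treatment of the theoretical uncertainties.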

Fiducial cross sections
Fiducial cross-section measurements enable comparisons to theoretical predictions with minimal assumptions about the kinematics of the signal and possible associated jets in the event. The cross sections described here are for events produced within a fiducial volume closely corresponding to a ggF signal region. The fiducial volume is defined using generator-level kinematic information, as specified in Table XXVIII. In particular, the total p t of the neutrino system (p νν t ) replaces the p miss t , and each lepton's p t is replaced by the generated lepton p t , where the lepton four-momentum is corrected by adding the four-momenta of all photons within a cone of size ∆R = 0.1 to account for energy loss through QED final-state radiation. These quantities are used to compute m t . Jets are defined at hadron level, i.e., after parton showering and hadronization but before detector simulation. To minimize dependence on the signal model, and therefore the theoretical uncertainties, only eµ events in the n j ≤ 1 categories are used. Also, only the 8 TeV data sample is used for these measurements. The measured fiducial cross section is defined as
$$\sigma_{\rm fid} = \frac{(N_{\rm sig})_{\rm obs}}{C}\cdot\frac{1}{\int L\,dt} = \hat{\mu}\cdot(\sigma\cdot B_{H\to WW^{\ast}\to e\nu\mu\nu})_{\rm exp}\cdot A,$$
with the multiplicative factor A being the sole difference with respect to the inclusive cross-section calculation. The measured fiducial cross section is not affected by the theoretical uncertainties on the total signal yield nor by the theoretical uncertainties on the signal acceptance. The total uncertainty is reduced compared to the value for the inclusive cross section because the measured signal yield is not extrapolated to the total phase space. The correction factors for n j = 0 and n j = 1 events, C ggf 0j and C ggf 1j , are evaluated using the standard signal MC sample. The reconstructed events include leptons from τ decays, but for simplicity, the fiducial volume is defined without these contributions.
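The photon-recovery step in the fiducial definition can be sketched as follows. The sketch assumes four-momenta stored as (px, py, pz, E) tuples and ∆R computed from pseudorapidity and azimuthal angle; the function names are hypothetical, and particles with zero transverse momentum are not handled.

```python
import math


def eta_phi(p):
    """Pseudorapidity and azimuth of a four-momentum (px, py, pz, E) with nonzero pT."""
    px, py, pz, _ = p
    pt = math.hypot(px, py)
    return math.asinh(pz / pt), math.atan2(py, px)


def delta_r(p1, p2):
    """Angular distance sqrt(d_eta^2 + d_phi^2) between two four-momenta."""
    eta1, phi1 = eta_phi(p1)
    eta2, phi2 = eta_phi(p2)
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi   # wrap to [-pi, pi)
    return math.hypot(eta1 - eta2, dphi)


def dress_lepton(lepton, photons, cone=0.1):
    """Add to the lepton the four-momenta of all photons with Delta R < cone.

    The cone is always measured around the original (undressed) lepton direction.
    """
    dressed = list(lepton)
    for photon in photons:
        if delta_r(lepton, photon) < cone:
            dressed = [a + b for a, b in zip(dressed, photon)]
    return tuple(dressed)
```

For example, a lepton with four-momentum (30, 0, 0, 30) accompanied by a collinear photon (3, 0, 0, 3) and a photon at right angles (0, 5, 0, 5) is dressed to (33, 0, 0, 33); only the collinear photon falls inside the cone.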
According to the simulation, the fraction of measured signal events within the fiducial volume is 85% for n j = 0 and 63% for n j = 1.
The values of the correction factors are C ggf 0j = 0.507 ± 0.027 and C ggf 1j = […]. The experimental systematic uncertainty is approximately 5%. Remaining theoretical uncertainties on the C ggf values were computed by comparing the ggF predictions of powheg+herwig, powheg+pythia8 and powheg+pythia6; they are found to be approximately 2% and are neglected. The acceptance of the fiducial volume is […]. The uncertainties on the acceptance are purely theoretical in origin and the largest contributions are from the effect of the QCD scale on the jet multiplicity requirements.
The cross-section values are computed by fitting the µ values in the n j = 0 and n j = 1 categories. The VBF contribution is subtracted assuming the expected yield from the SM instead of using the simultaneous fit to the VBF signal regions as is done for the inclusive cross sections. The non-negligible ggF yield in the VBF categories would require an assumption on the ggF acceptance for different jet multiplicities, whereas the fiducial cross-section measurement is intended to avoid this type of assumption. The effect of the theoretical uncertainties on the VBF signal yield is included in the systematic uncertainties on the cross sections. The obtained signal strengths are […]. The predicted fiducial cross-section values for the n j = 0 and n j = 1 categories are 19.9 ± 3.3 fb and 7.3 ± 1.8 fb, respectively.

X. CONCLUSIONS
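An observation of the decay $H \to WW^{\ast} \to \ell\nu\ell\nu$ with a significance of 6.1 standard deviations is achieved by an analysis of ATLAS data corresponding to 25 fb$^{-1}$ of integrated luminosity from $\sqrt{s} = 7$ and 8 TeV $pp$ collisions produced by the Large Hadron Collider at CERN. This observation confirms the predicted decay of the Higgs boson to $W$ bosons, at a rate consistent with that given by the Standard Model. The SM predictions are additionally supported by evidence for VBF production in this channel, with an observed significance of 3.2 standard deviations. For a Higgs boson with a mass of 125.36 GeV, the ratios of the measured cross sections to those predicted by the Standard Model are consistent with unity for both gluon-fusion and vector-boson-fusion production:
$$\mu_{\rm ggF} = 1.02 \pm 0.19\,\textrm{(stat)}\,^{+0.22}_{-0.18}\,\textrm{(syst)}, \qquad \mu_{\rm VBF} = 1.27\,^{+0.44}_{-0.40}\,\textrm{(stat)}\,^{+0.30}_{-0.21}\,\textrm{(syst)}.$$
At $\sqrt{s} = 8$ TeV, the corresponding total cross sections are $\sigma(gg \to H \to WW^{\ast}) = 4.6 \pm 0.9\,\textrm{(stat)}\,^{+0.8}_{-0.7}\,\textrm{(syst)}$ pb and $\sigma(\textrm{VBF}\ H \to WW^{\ast}) = 0.51\,^{+0.17}_{-0.15}\,\textrm{(stat)}\,^{+0.13}_{-0.08}\,\textrm{(syst)}$ pb. These total cross sections, as well as the fiducial cross sections measured in the exclusive $n_j = 0$ and $n_j = 1$ categories, allow future comparisons to the more precise cross-section calculations currently under development.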
The analysis strategies described in this paper set the stage for more precise measurements using future collisions at the LHC. The larger data sets will significantly reduce statistical uncertainties; further modeling and analysis improvements will be required to reduce the leading systematic uncertainties. Future precise measurements of the $H \to WW^{\ast} \to \ell\nu\ell\nu$ decay will provide more stringent tests of the detailed SM predictions of the Higgs boson properties.
The $m_{\rm T}$ distribution is used in the likelihood fit for the ggF-enriched $n_j$ samples (see Sec. VII). Figure 43 shows an example of the binned $m_{\rm T}$ distribution in the most sensitive kinematic region of the $n_j = 0$, $e\mu$ lepton-flavor category. The optimization procedure for the bin widths was discussed in Sec. VII A. Table XXIX gives the details of the binning for every kinematic region. The $m_{\rm T}$ range between the first bin (ending around 80 GeV) and the last bin (starting around 120 GeV) is divided into bins of variable width. For kinematic regions in the $n_j = 0$ category, the variable widths are approximately 5 GeV; for $n_j = 1$, the widths are approximately 10 GeV. For both samples, the r.m.s. deviation of the bin widths from the mean bin width is approximately 1 GeV. Lastly, the ggF-enriched $n_j \ge 2$ and the cross-check VBF-enriched $n_j \ge 2$ categories use the same set of fixed $m_{\rm T}$ bin boundaries with bins of variable width.

The details of the treatment of the Drell-Yan estimate for the $ee/\mu\mu$ category in the $n_j \le 1$ sample are described below.
The method uses additional control regions to constrain the parameters corresponding to the selection efficiencies of contributing processes categorized into "DY" and "non-DY"; the latter includes the signal events. The variable $f_{\rm recoil}$ is used to separate the two categories, and to divide the sample into "pass" and "fail" subsamples. In the $ee/\mu\mu$ categories, the pass samples are enriched in non-DY events and, conversely, the fail samples are enriched in DY events. The residual cross-contamination is estimated using additional control regions.
Of particular interest is the data-derived efficiency of the $f_{\rm recoil}$ selection for the DY and non-DY events. The efficiency of the applied $f_{\rm recoil}$ selection on DY events ($\varepsilon_{\rm DY}$) is obtained from the $ee/\mu\mu$ sample in the $Z$ peak (the $Z$ CR), defined by the dilepton mass range $|m_{\ell\ell} - m_Z| < 15$ GeV. Events in the $Z$ CR are relatively pure in DY. The parameter $\varepsilon_{\rm DY}$ estimates the efficiency of the selection for neutrinoless events whose missing transverse momentum is due to misreconstruction, or "fake missing transverse momentum." The same parameter appears in two terms of the likelihood, one for the $Z$ CR and the other for the signal region, each composed of two Poisson functions.
The non-DY events with neutrino final states, or "real missing transverse momentum," contaminate both the $Z$ CR and the SR, and the two corresponding $f_{\rm recoil}$ selection efficiencies, $\varepsilon_{\textrm{non-DY}}$ and $\varepsilon'_{\textrm{non-DY}}$, are evaluated separately. The non-DY efficiency in the $Z$ CR, $\varepsilon_{\textrm{non-DY}}$, is evaluated using the $Z$ CR selection except with the $e\mu$ sample, which is pure in non-DY events. The SR efficiency $\varepsilon'_{\textrm{non-DY}}$ is evaluated using the $ee/\mu\mu$ SR selection (described in Sec. VI E 2) applied to an $e\mu$ sample.

The fit CR part of the likelihood function [Eq. (11)] contains two Poisson functions that represent events, in the $Z$ mass window in the $ee/\mu\mu$ category, that pass or fail the $f_{\rm recoil}$ selection:
$$f\!\left(N^{Z{\rm CR}}_{\rm pass} \,\middle|\, \beta_{\rm DY} \cdot \varepsilon_{\rm DY} \cdot B^{Z{\rm CR}}_{\rm DY} + \varepsilon_{\textrm{non-DY}} \cdot B^{Z{\rm CR}}_{\textrm{non-DY}}\right) \cdot f\!\left(N^{Z{\rm CR}}_{\rm fail} \,\middle|\, \beta_{\rm DY} \cdot (1 - \varepsilon_{\rm DY}) \cdot B^{Z{\rm CR}}_{\rm DY} + (1 - \varepsilon_{\textrm{non-DY}}) \cdot B^{Z{\rm CR}}_{\textrm{non-DY}}\right), \tag{A1}$$
where $N$ is the observed number of events and $B$ the background estimate without applying an $f_{\rm recoil}$ selection. The superscript denotes the $Z$ CR mass window; the subscript pass (fail) denotes the sample of events that pass (fail) the $f_{\rm recoil}$ selection; and the subscripts DY (non-DY) denote background estimates for the Drell-Yan (all except Drell-Yan) processes. The non-DY estimate, $B^{Z{\rm CR}}_{\textrm{non-DY}}$, is a sum of all contributing processes listed in Table I; normalization factors, such as $\beta_{WW}$, that are described in Sec. VI are implicitly applied to the corresponding contributions. The Drell-Yan estimate is normalized explicitly by a common normalization factor $\beta_{\rm DY}$ applied to both the passing and failing subsamples of the $Z$ peak.

TABLE XXIX. $m_{\rm T}$ bins for the likelihood fit in the 8 TeV analysis. The first bin spans 0 to "bin 2 left edge"; the last bin spans "last bin left edge" to $\infty$. The bin widths $w_b$ of those between the first and last bins are given. The mean of the variable-width bins, $\bar{w} = \sum_b w_b / (n_{\rm bins} - 2)$, is given as well as the r.m.s. of the deviation with respect to the mean, $\sqrt{\sum_b (\bar{w} - w_b)^2 / (n_{\rm bins} - 2)}$. All energy-related quantities are in GeV.
Columns: Category; Bin left edge; Bin widths $w_b$ for bin $b$; Mean width, r.m.s. of deviation.
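The $\varepsilon_{\textrm{non-DY}}$ parameter above is determined using events in the $e\mu$ category. The corresponding Poisson functions are included in the likelihood:
$$f\!\left(N^{Z{\rm CR},e\mu}_{\rm pass} \,\middle|\, \varepsilon_{\textrm{non-DY}} \cdot B^{Z{\rm CR},e\mu}_{\textrm{non-DY}}\right) \cdot f\!\left(N^{Z{\rm CR},e\mu}_{\rm fail} \,\middle|\, (1 - \varepsilon_{\textrm{non-DY}}) \cdot B^{Z{\rm CR},e\mu}_{\textrm{non-DY}}\right), \tag{A2}$$
where the $e\mu$ in the superscript denotes the $Z$ CR mass window for events in the $e\mu$ category; all other notation follows the convention for Eq. (A1). The DY contamination in this region is implicitly subtracted.

The SR part of the likelihood also contains two Poisson functions, using the same $\varepsilon_{\rm DY}$ as above but a different normalization $\beta'_{\rm DY}$ and efficiency $\varepsilon'_{\textrm{non-DY}}$ corresponding to the SR:
$$f\!\left(N^{\rm SR}_{\rm pass} \,\middle|\, \beta'_{\rm DY} \cdot \varepsilon_{\rm DY} \cdot B^{\rm SR}_{\rm DY} + \varepsilon'_{\textrm{non-DY}} \cdot B^{\rm SR}_{\textrm{non-DY}}\right) \cdot f\!\left(N^{\rm SR}_{\rm fail} \,\middle|\, \beta'_{\rm DY} \cdot (1 - \varepsilon_{\rm DY}) \cdot B^{\rm SR}_{\rm DY} + (1 - \varepsilon'_{\textrm{non-DY}}) \cdot B^{\rm SR}_{\textrm{non-DY}}\right), \tag{A3}$$
where SR denotes the signal region selection and $\beta'_{\rm DY}$ is the common normalization factor for the Drell-Yan estimate for the pass and fail subsamples. The parameter $\varepsilon'_{\textrm{non-DY}}$ is constrained following the same strategy as Eq. (A2) with
$$f\!\left(N^{{\rm SR},e\mu}_{\rm pass} \,\middle|\, \varepsilon'_{\textrm{non-DY}} \cdot B^{{\rm SR},e\mu}_{\textrm{non-DY}}\right) \cdot f\!\left(N^{{\rm SR},e\mu}_{\rm fail} \,\middle|\, (1 - \varepsilon'_{\textrm{non-DY}}) \cdot B^{{\rm SR},e\mu}_{\textrm{non-DY}}\right), \tag{A4}$$
where the $e\mu$ in the superscript denotes the $ee/\mu\mu$ SR selection (including the one on $f_{\rm recoil}$) applied to events in the $e\mu$ category. As noted before, the DY contamination in this region is implicitly subtracted.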
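The structure of the pass/fail Poisson terms can be illustrated with a short numerical sketch. It evaluates the negative log-likelihood of the four $Z$ CR terms (the $ee/\mu\mu$ pass and fail terms and the $e\mu$ pass and fail terms); the SR terms have the same form with SR-specific parameters. The function names, the dictionary layout, and all yields and counts are hypothetical, and the full analysis likelihood contains many additional terms that are not modeled here.

```python
import math


def poisson_log_pmf(n_obs, expected):
    """Return log f(n_obs | expected) for a Poisson distribution."""
    if expected <= 0:
        raise ValueError("expected yield must be positive")
    return n_obs * math.log(expected) - expected - math.lgamma(n_obs + 1)


def zcr_negative_log_likelihood(counts, yields, beta_dy, eps_dy, eps_non_dy):
    """Negative log of the product of the four Z CR Poisson terms.

    counts: observed events, keys 'pass', 'fail', 'pass_emu', 'fail_emu'
    yields: estimates before the f_recoil selection,
            keys 'dy', 'non_dy', 'non_dy_emu'
    """
    expected = {
        "pass": beta_dy * eps_dy * yields["dy"]
        + eps_non_dy * yields["non_dy"],
        "fail": beta_dy * (1 - eps_dy) * yields["dy"]
        + (1 - eps_non_dy) * yields["non_dy"],
        "pass_emu": eps_non_dy * yields["non_dy_emu"],
        "fail_emu": (1 - eps_non_dy) * yields["non_dy_emu"],
    }
    return -sum(poisson_log_pmf(counts[k], expected[k]) for k in expected)


# Hypothetical inputs: the counts equal the expectations obtained with
# beta_dy = 1.0, eps_dy = 0.2, eps_non_dy = 0.7.
yields = {"dy": 1000.0, "non_dy": 100.0, "non_dy_emu": 200.0}
counts = {"pass": 270, "fail": 830, "pass_emu": 140, "fail_emu": 60}

nll_best = zcr_negative_log_likelihood(counts, yields, 1.0, 0.2, 0.7)
nll_shifted = zcr_negative_log_likelihood(counts, yields, 1.0, 0.3, 0.7)
print(nll_best < nll_shifted)
```

The $e\mu$ terms depend only on the non-DY efficiency, which is why they constrain it independently of the DY normalization and efficiency.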
3. Top-quark estimate for $n_j = 1$
The details of the in situ treatment of the $b$-tagging efficiency for the top-quark estimate in the $n_j = 1$ category are described below.
The method uses two control regions within the $n_j = 2$ sample: those with one and two $b$-jets. These CRs constrain the normalization parameter for the $b$-tagging efficiency of top-quark events ($\beta_{b\textrm{-tag}}$) and for the top-quark cross section in these regions ($\beta_{\rm top}$).
The Poisson terms for the control regions are given in Eq. (A5), where $N^{1b}_{2j}$ ($N^{2b}_{2j}$) corresponds to the number of observed events with one (two) $b$-jets; $B^{1b}_{\rm top}$ ($B^{2b}_{\rm top}$) is the corresponding top-quark estimate from MC samples; and $B_{\rm other}$ is the sum of the remaining processes contributing to the sample.
The parameter $\beta_{\rm top}$ enters only in the above terms, while $\beta_{b\textrm{-tag}}$ is applied to other regions. In the top-quark CR, one factor of $\beta_{\rm top}$ is applied to the expected top-quark yield. In the SR and the $WW$ CR, the treatment is of the same form as the second line of Eq. (A5) applied to the $n_j = 1$ sample, i.e., the estimated top-quark background is $B^{0b}_{\rm top} + (1 - \beta_{b\textrm{-tag}}) \cdot B^{1b}_{\rm top}$. In summary, the difference between the observed and the expected $b$-tagging efficiency corrects the number of estimated untagged events in the SR.
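As a small worked example of the correction just described, the sketch below evaluates the untagged top-quark estimate $B^{0b}_{\rm top} + (1 - \beta_{b\textrm{-tag}}) \cdot B^{1b}_{\rm top}$ for hypothetical yields. The tagged yield $\beta_{b\textrm{-tag}} \cdot B^{1b}_{\rm top}$ returned alongside it is an assumption made here to show that the total number of top-quark events is conserved; it is not quoted from the text.

```python
def top_yields_after_btag_correction(b_top_0b, b_top_1b, beta_btag):
    """Return (untagged, tagged) top-quark yields for the n_j = 1 sample.

    untagged follows the expression in the text:
        B_top^0b + (1 - beta_btag) * B_top^1b
    tagged = beta_btag * B_top^1b is an illustrative assumption.
    """
    untagged = b_top_0b + (1.0 - beta_btag) * b_top_1b
    tagged = beta_btag * b_top_1b
    return untagged, tagged


# Hypothetical MC yields: 100 untagged and 50 tagged top-quark events.
for beta in (1.0, 1.1):
    untagged, tagged = top_yields_after_btag_correction(100.0, 50.0, beta)
    print(f"beta_btag = {beta}: untagged = {untagged:.1f}, tagged = {tagged:.1f}")
```

With $\beta_{b\textrm{-tag}} = 1$ the MC estimate is unchanged; a value above one moves events from the untagged to the tagged sample, lowering the top-quark background in the $b$-vetoed SR.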

Appendix B: BDT performance
Section IV C motivated the choice of variables used in the $n_j \ge 2$ VBF-enriched category based on their effectiveness in the cross-check analysis. Many of the variables exploit the VBF topology with two forward jets and no activity in the central region. The main analysis in this category is based on the multivariate technique that uses those variables as inputs to the training of the BDT. The training is optimized on the simulated VBF signal production and it treats simulated ggF production as a background. Figures 44 and 45 show the distributions of the input variables in the $e\mu$ and $ee/\mu\mu$ samples, respectively. The comparison is based only on MC simulation and it shows the separation between the VBF signal and the background processes, motivating the use of the chosen variables.
The $O_{\rm BDT}$ distributions are shown in Fig. 46. The lowest $O_{\rm BDT}$ score is assigned to the events that are classified as background, and the highest score selects the VBF signal events. This separation can be seen in these distributions. The final binning configuration is four bins with boundaries at $[-0.48, 0.3, 0.78]$, and with bin numbering from 0 to 3. The background estimation and the signal extraction are then performed in bins of $O_{\rm BDT}$. Figure 47 shows the data-to-MC comparison of the input variables in the three highest $O_{\rm BDT}$ bins. Good agreement is observed in all the distributions. The event properties of the observed events in the highest BDT bin in the $n_j \ge 2$ VBF-enriched category are shown in Table XXX.

FIG. 44. Distributions of the variables used as inputs to the training of the BDT in the $e\mu$ sample in the 8 TeV data analysis. The variables are shown after the common preselection and the additional selection requirements in the $n_j \ge 2$ VBF-enriched category, and they include: $m_{\ell\ell}$, $\Delta\phi_{\ell\ell}$, $m_{\rm T}$, and $\Delta y_{jj}$ (top two rows); $m_{jj}$, $p_{\rm T}^{\rm sum}$, $\Sigma\,C_{\ell}$, and $\Sigma\,m_{\ell j}$ (bottom two rows). The distributions show the separation between the VBF signal and background processes (ggF signal production is treated as such). The VBF signal is scaled by fifty to enhance the differences in the shapes of the input variable distributions. The SM Higgs boson is shown at $m_H = 125$ GeV. The uncertainties on the background prediction are only due to MC sample size.

FIG. 45. Distributions of the variables used as inputs to the training of the BDT in the $ee/\mu\mu$ sample in the 8 TeV data analysis. The variables are shown after the common preselection and the additional selection requirements in the $n_j \ge 2$ VBF-enriched category, and they include: $m_{\ell\ell}$, $\Delta\phi_{\ell\ell}$, $m_{\rm T}$, and $\Delta y_{jj}$ (top two rows); $m_{jj}$, $p_{\rm T}^{\rm sum}$, $\Sigma\,C_{\ell}$, and $\Sigma\,m_{\ell j}$ (bottom two rows). The distributions show the separation between the VBF signal and background processes (ggF signal production is treated as such). The VBF signal is scaled by fifty to enhance the differences in the shapes of the input variable distributions. The SM Higgs boson is shown at $m_H = 125$ GeV. The uncertainties on the background prediction are only due to MC sample size.
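The BDT binning quoted above, four bins numbered 0 to 3 with boundaries at $-0.48$, $0.3$, and $0.78$, can be sketched as a simple lookup. The score range of $[-1, 1]$ and the convention that a score exactly on a boundary falls in the higher bin are assumptions made for this illustration; the function name is hypothetical.

```python
from bisect import bisect_right

# Bin boundaries of the BDT output quoted in the text.
BDT_BOUNDARIES = [-0.48, 0.3, 0.78]


def bdt_bin(score):
    """Return the bin index (0 to 3) for a BDT score.

    Assumes scores lie in [-1, 1] and that a score equal to a boundary
    is assigned to the higher bin.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("BDT score is assumed to lie in [-1, 1]")
    return bisect_right(BDT_BOUNDARIES, score)


for s in (-0.9, 0.0, 0.5, 0.9):
    print(s, bdt_bin(s))
```

Bin 0 collects the background-like events, while bins 1 to 3 are the three highest bins referred to in the data-to-MC comparison.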