Constraints on anomalous Higgs boson couplings to vector bosons and fermions from the production of Higgs bosons using the $\tau\tau$ final state

A study of anomalous couplings of the Higgs boson to vector bosons and fermions is presented. The data were recorded by the CMS experiment at a center-of-mass energy of pp collisions at the LHC of 13 TeV and correspond to an integrated luminosity of 138 fb$^{-1}$. The study uses Higgs boson candidates produced mainly in gluon fusion or electroweak vector boson fusion at the LHC that subsequently decay to a pair of $\tau$ leptons. Matrix-element and machine-learning techniques were employed in a search for anomalous interactions. The results are combined with those from the four-lepton and two-photon decay channels to yield the most stringent constraints on anomalous Higgs boson couplings to date. The pure $CP$-odd scenario of the Higgs boson coupling to gluons is excluded at 2.4 standard deviations. The results are consistent with the standard model predictions.


Introduction
The discovery of the Higgs boson (H) by the ATLAS and CMS experiments at the LHC [1-3] has opened a new era for particle physics, wherein the characterization of the new boson is of crucial importance. Studies of the Higgs boson test the standard model (SM) of particle physics and probe for new physics. Thus far, the properties of the H are found to be consistent with the SM predictions [4][5][6][7][8][9][10]. In particular, nonzero spin assignments of the H have been excluded [11,12], and its spin-parity quantum numbers are consistent with J PC = 0 ++ . However, the limited precision of current studies allows for anomalous couplings of the H with two electroweak gauge bosons (HVV) or gluons (Hgg). Possible CP-violating effects in H couplings to fermions (Hff) have been constrained by the CMS and ATLAS Collaborations in ttH production [20,21,30], and by the CMS Collaboration in the H → ττ decay [22], where CP-odd couplings may appear at tree level, and are not suppressed by loop effects. In the SM, Hgg is mediated via loops, where the top quark dominates. Any observed CP violation in the Hgg interaction would indicate either a CP-odd Higgs coupling to top quarks (Htt) or a new effective interaction requiring new particles. Thus, a study of the Hgg coupling provides complementary information on the nature of the H and serves as an indirect search for new phenomena. Both the CMS and ATLAS Collaborations have previously searched for CP-violation in the Hgg coupling, but these constraints are quite weak [21,31].
In this paper, we report on a search for anomalous effects, including possible signs of CP violation, in the tensor structure of the H interactions with electroweak bosons and gluons in the production of the H. The analysis is performed in four ττ final states: eµ, eτ h , µτ h , and τ h τ h , where e, µ and τ h indicate τ decays into electrons, muons and hadrons, respectively. We follow the formalism used in previous CMS studies of anomalous couplings in Run 1 and Run 2, described in Refs. [11,[14][15][16][17][18][19][20][21]. The two dominant production channels employed in this study are electroweak vector boson fusion (VBF) and gluon fusion (ggH). Compared to our previous study in the H → ττ channel [19], we have improved the sensitivity to anomalous effects with multivariate tools, optimization of the final state categorization, and an increased data sample. The analysis utilizes a matrix element likelihood approach (MELA) [13,[32][33][34][35] and a neural network to optimize the measurement of anomalous couplings using production and decay kinematic information. Compared to our similar study in the H → 4ℓ channel [21], where ℓ denotes an electron or muon and both production and decay information is used, the inclusion of the H → ττ channel leads to a substantial improvement in constraints on anomalous couplings due to a larger sample of VBF and ggH events reconstructed in association with two jets. The results obtained from the two decay channels are further combined to form the most stringent constraint on anomalous couplings. The combined ggH results are further combined with the ttH analysis using the H → γγ and H → 4ℓ decays [20,21] under the assumption of top quark dominance in ggH to constrain the Htt anomalous couplings.
The paper is organized as follows. The phenomenology of anomalous HVV and Hff couplings is discussed in Sec. 2. The kinematics of the processes studied and the observables utilized in this study to search for anomalous contributions are described in Sec. 3. The CMS detector is described in Sec. 4. The data used in this study, the Monte Carlo (MC) simulation, as well as event reconstruction methods are described in Sec. 5. The event selection and categorization is documented in Sec. 6. Methods to estimate backgrounds are given in Sec. 7 and the sources of systematic uncertainty are listed in Sec. 8. The analyses of the Hgg and HVV interactions using H → ττ decays are presented in Secs. 9 and 10, respectively. The combination of the H → ττ results with the H → 4ℓ and H → γγ decay channels is detailed in Sec. 11. Section 12 summarizes the results. Tabulated results are provided in the HEPData record for this analysis [36].
Interactions of a spin-0 H with two spin-1 gauge bosons VV, such as WW, ZZ, Zγ, γγ, and gg, are parametrized by a scattering amplitude that includes three tensor structures with expansion of coefficients up to (p 2 /Λ 2 1 ): where p i , ϵ Vi , and m V1 are the four-momentum, polarization vector, and pole mass of the gauge boson, indexed by i = 1, 2. The gauge boson's field strength tensor and the dual field strength The coupling coefficients a VV i , which multiply the three tensor structures, and κ VV i /(Λ VV 1 ) 2 , which multiply the next term in the p 2 expansion for the first tensor structure, are to be determined from data, where Λ 1 is the scale of beyond the SM (BSM) physics. The convention ϵ 0123 = +1 defines the relative sign of the CPodd and CP-even couplings. The sign in front of the gauge fields in the covariant derivative defines the sign of the photon field and sets the sign convention of the Zγ couplings. The conventions adopted in this analysis are discussed in Sec. 5.
In Eq. (1), the only nonzero SM contributions at tree level are a WW 1 and a ZZ 1 , which are assumed to be equal under custodial symmetry. All other ZZ and WW couplings are considered anomalous contributions, which are either due to BSM physics or small contributions arising in the SM from loop effects that cannot be detected with the current precision [46]. Among the anomalous contributions, considerations of symmetry and gauge invariance require a = 0 [47]. For the gg couplings, the only nonzero couplings are a gg 2 and a gg 3 , which are anomalous contributions due to BSM physics and do not account for interactions mediated by SM particles via loops. Therefore, in total there are 13 independent parameters that describe the H coupling to the electroweak gauge bosons and two that describe the coupling to gluons. The a VV 3 couplings are CP-odd, and their presence together with any other CP-even couplings would result in CP violation in a given process.
Our earlier measurements [11] and a more recent phenomenological study [46] indicated substantially stronger limits on a γγ, Zγ 2 and a γγ, Zγ 3 couplings from H → Zγ and H → γγ decays with on-shell photons than from measurements with virtual photons, so we do not pursue measurements of these parameters in this paper, and they are set to zero when measuring other anomalous couplings.
As the event kinematics of the H production in WW fusion and in ZZ fusion are very similar, it is essentially impossible to distinguish between a WW i and a ZZ i in the VBF production. It is therefore necessary to choose a convention to set the relative size of the HWW and HZZ couplings. The results can be reinterpreted for any chosen relationship between the a WW i and a ZZ i couplings [18].
In our measurements, we adopt two approaches to set the relationship between the a ) 2 . In the second approach (Approach 2) we reinterpret the results for the CP-violating coupling a 3 following the procedure described in Ref. [18].
In this reinterpretation we apply additional considerations of custodial and SU(2)×U(1) symmetries in the relationships of anomalous couplings [47,48] It is convenient to measure the effective cross section ratios f ai rather than the anomalous couplings a i themselves, as most uncertainties cancel in the ratio. Moreover, the effective fractions are conveniently bounded between −1 and 1, independent of the coupling convention. The effective fractional cross sections f ai are defined as follows [21]: where σ i is the cross section for the process corresponding to a i = 1 with all other couplings set to zero. The choice of the sign for the κ 1 and κ Zγ 2 terms follows the convention introduced in the prior results [11,17,18,21]. The other sign conventions follow the JHUGEN 7.0.2 [32][33][34][35] event generator, as discussed in Section 5 and Ref. [46]. For consistency with previous CMS measurements in the H → 4ℓ channel [11,21], the σ i coefficients are defined for the gg → H → VV → 2e2µ process. The numerical values are given in Table 1 as calculated using the JHUGEN event generator. It is assumed that the couplings in Eq. (1) are constant and real, and therefore this formulation is equivalent to an effective Lagrangian formalism. Table 1: Cross sections for the anomalous contributions (σ i ) used to define the fractional cross sections [21]. The σ i values are defined as the cross section computed with a i = 1 and all other couplings set to zero. All cross sections are given relative to the SM value (σ 1 ). In the case of the κ 1 and κ Zγ 2 couplings, the numerical values Λ 1 = Λ Zγ 1 = 100 GeV are considered so as to keep all coefficients of similar order of magnitude.
The ggH process is a purely loop-induced process, which in the SM is generated by the top quark, with a smaller contribution from the bottom quark [49]. This interaction is CP-even in the SM. However, a contribution of the CP-odd interaction in the H coupling to fermions is not ruled out, and the search for such a CP-violating interaction can be performed in ttH production and H → ττ decay. Under the assumption that other BSM particles do not contribute to the gluon fusion loop, a CP-structure measurement in the ggH process is equivalent to the measurement of the CP structure in Yukawa interactions, which can be parametrized with the amplitude The effective fractional cross section for Hff couplings is defined as [20] f Hff An equivalent effective mixing angle α Hff is also used to describe the CP-odd contribution to the H Yukawa couplings and is defined as where | f Hff CP | = sin 2 α Hff . Therefore, with just two contributions to the gluon fusion loop (CPeven and CP-odd fermion couplings), the two parameters are equivalent. However, with consideration of multiple contributions, as discussed in the case of electroweak HVV couplings above, multiple fractional contributions have to be defined and a single angle is not sufficient. The ggH loop can be generated by unknown heavy BSM particles, in addition to the SM fermions, and the effective coupling results in the CP-even a gg 2 and CP-odd a gg 3 couplings, defined in Eq. (1). In the effective field theory (EFT) approach [48], they correspond to two EFT couplings in the Higgs basis: where α S is the running strong coupling constant. Therefore, there are at least four contributions to consider (κ t , κ t , c gg , c gg ), where in the SM we have (κ t , κ t , c gg , c gg ) = (1, 0, 0, 0). The dependence of the ggH cross section and H branching fractions on these parameters is given in Ref. [46]. Under the assumptions that the only SM particles contributing to the loop are the top and bottom quarks and (κ b , κ b ) = (1, 0), the ggH cross section relative to the SM expectation is given as Within the framework of our analysis, however, it is hard to distinguish between the κ f and a gg 2 contributions, or between κ f and a gg 3 . There are small differences in the transverse momentum p T distributions of the H, and one can also observe effects in the off-shell H production [47]. However, the former is too small to have a noticeable effect in this analysis, and the latter does not come within the scope of our analysis based on the on-shell production. Therefore, we absorb the SM fermion loop contribution, dominated by the heavy top quark, into the overall a Under the assumption that only the top and bottom quarks contribute to gluon fusion with κ t = κ b and κ t = κ b , the following relationship [47] holds: In this paper, we present a search for anomalous Hgg couplings in the gluon fusion production and anomalous HVV couplings in VBF and associative H production with a W or Z boson (VH). In addition, the Hgg measurement is interpreted in terms of constraints on Hff couplings under the assumption of top quark dominance in gluon fusion. We measure a given anomalous coupling while setting the values of all other anomalous coupling parameters to zero, with the exception of measuring the CP-odd parameters, f a3 and f ggH a3 , as CP violation in VBF and VH production would modify the same kinematic distributions as those in the ggH process. Therefore, we treat f a3 as an unconstrained parameter when we measure f ggH a3 , and vice versa. The presence of CP violation in the decay of the H to a pair of τ leptons does not affect the measurements of the production process, and thus we assume the SM kinematics for the H decays.

Production and decay kinematics, and discriminants
Because exotic nonzero spin assignments of the H have been excluded , we focus on the analysis of couplings of a spin-0 H. When combined with the momentum transfer squared of the vector bosons, p 2 1 and p 2 2 , the five angles in Fig. 1 provide complete kinematic information for production and decay of the H.  [32,34]. The illustration for H production via ggH in association with two jets is identical to the VBF diagram, except with V = g.
There are four possible and practical ways to access CP-violating effects (or more generally anomalous HVV or Hff couplings) using the reconstructed H → ττ events: 1. correlation of H and two quark jets or leptons in VBF and VH production; 2. correlation of H and two quark jets in ggH production; 3. correlation of H and quark jets in ttH or tH production; and 4. correlation of decay products of two τ leptons.
There are no spin correlations between the production and decay through a spin-0 object. Therefore, all four of the above processes can be studied independently and they target different parameters that are independent, even though all of them may be related to anomalous effects. This analysis focuses on searching for anomalous effects in the topologies described as the first and second items above. We refer to those as the anomalous HVV and Hgg couplings, respectively.

Correlation of H and two quark jets or leptons in VBF and VH production
Kinematic distributions of associated particles in VBF and VH production are sensitive to the quantum numbers and anomalous couplings of the H. A set of observables could be defined in production, such as } for the VH process (as shown in Fig. 1 and discussed in Ref. [34]). It is a challenging task to perform an optimal analysis in a multidimensional space of observables. The MELA method introduced earlier [2, [32][33][34][35] is designed to reduce the number of observables to the minimum, while retaining all essential information. Two types of discriminants are defined for the production process. One type of discriminant separates the process with anomalous couplings (denoted as generic BSM here) from the SM one: where the probability density P of a certain process (either SM or anomalous signal) is calculated using the MELA [32][33][34][35] package, that contains a library of matrix elements for the signal processes from JHUGEN. The discriminant for each anomalous coupling is listed in Table 2.
The second type of discriminant isolates the interference contribution: where P int SM−BSM is the interference part of the probability distribution for a process with a mixture of the SM and anomalous contributions. This discriminant is called D VBF CP (D ggH CP ) in the CP-odd VBF (ggH) amplitude analysis. The discriminant is this case is a CP-odd observable, and a forward-backward asymmetry in its distribution would indicate CP violation. Probabilities are normalized for the matrix elements to give the same cross sections in the relevant phase space of each process. Such normalization leads to a balanced distribution of events in Table 2: List of discriminants for separating anomalous couplings from the SM contribution. The third column indicates the hypothesis that was assumed for the signal production process when computing the matrix elements that are inputs to the discriminants. For the D ggH 0− discriminant, the "ggH" label indicates that this observable is constructed using matrix elements computed for the ggH production process to differentiate it from the equivalent discriminant for the VBF process (D 0− ). For the D Zγ Λ1 for discriminant, the "Zγ" label is used to indicate that it targets the κ Zγ 2 anomalous coupling parameter to differentiate it from the D Λ1 discriminant that targets κ 1 .
VBF the range between 0 and 1, or between −1 and 1, for the D BSM and D int discriminants, respectively.
The two other observables in Eqs. (11) and (12) rely only on signal matrix elements and are well defined. One can apply the Neyman-Pearson lemma to prove that, in the absence of detector smearing, they become the minimal and complete set of optimal observables [34,35] for the measurement of the f ai parameters defined in Sec. 2.
In application to the CP measurement with the f a3 parameter, the two optimal observables are called D 0− and D VBF CP , because J P = 0 − is the BSM hypothesis in this case, and the interference discriminant is an unambiguously CP-sensitive observable. A distinct forward-backward asymmetry in the D VBF CP distribution (forward defined as D VBF CP > 0 and backward as D VBF CP < 0) appears only in the presence of CP violation. These observables could be defined for both VBF and VH processes. However, since the analysis selection is optimized for the VBF process, the probabilities in the discriminant calculation in Eqs. (11) and (12) are defined for the VBF process.

Correlation of H and two jets in the production of H via ggH
Kinematic distributions of associated particles in ggH production are also sensitive to the quantum numbers and anomalous Hgg couplings. The set of observables Ω in this topology is identical to the VBF process (as shown in Fig. 1 and discussed in Ref. [34]).
Similar to the VBF and VH study, we form optimal D ggH 0− and D ggH CP observables that are sensitive to CP violation. However, unlike in the VBF production study, the sensitivity to CP violation using MELA observables in ggH production is comparable to the signed azimuthal difference between two leading jets, defined as [50] The sign is defined by ordering the jets in η, which ensures that the observable is sensitive to the interference between the CP-even and CP-odd contributions [37,38,51]. Thus, in this study we cross check the results obtained using the MELA discriminants with a simplified approach that does not rely on multivariate techniques and utilizes ∆ϕ jj as the CP-sensitive observable. We will refer to these approaches as the "MELA method" and the "∆ϕ jj method", respectively.

The CMS detector
The main feature of the CMS detector is a superconducting solenoid of 6 m internal diameter, that provides a magnetic field of 3.8 T. Within the volume of the solenoid, there are a silicon pixel and strip tracker detectors, as well as a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter; both calorimeters are composed of a barrel and two endcap sections. Forward calorimeters extend the coverage in pseudorapidity, η. The CMS muon system is comprised of gas-ionization chambers embedded in the steel fluxreturn yoke outside the CMS solenoid.
The CMS data acquisition system employs a two-tiered trigger system [52] to select events of interest. The first level (L1), composed of custom hardware processors, utilizes information from muon detectors and both calorimeters to select collision events at a rate of about 100 kHz within a fixed latency below 4 µs. The second level, also known as the high-level trigger, further reduces the event acceptance rate to about 1 kHz before data storage by using a full event reconstruction software, optimized for fast processing, running on a computing farm.
A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [53].

Data and simulation samples
The data samples used in this analysis correspond to integrated luminosities of 36. Following the formalism discussed in Sec. 2, the samples with the SM and anomalous H couplings in VBF and VH production are generated with the JHUGEN program at leading order (LO) in quantum chromodynamics (QCD). All the simulated scenarios are reweighted to model any other set of H couplings using the MELA package. The VBF and VH JHUGEN SM simulations, after parton showering modeling, are explicitly compared with the next-to-leading order (NLO) QCD SM simulations produced by POWHEG 2.0 [59][60][61][62][63][64] and no significant differences are found in kinematic observables. Therefore, the JHUGEN simulation is used to describe kinematics of the VBF and VH processes with anomalous coupling effects in VBF and VH processes, with the expected yields scaled to match the SM theoretical predictions for inclusive cross sections and H → ττ branching fraction from Ref. [48], and the POWHEG 2.0 SM prediction of relative event yields in the categorization of events based on associated particles.
Anomalous ggH events are produced with up to two jets at NLO QCD accuracy using MAD-GRAPH5 aMC@NLO 2.6.0 [65][66][67] and are also studied with JHUGEN at LO. The inclusive cross section and H → ττ branching fraction are scaled to match the SM theoretical predictions from Ref. [48], and the p T and jet multiplicity distributions are reweighted to match the POWHEG NNLOPS predictions [49,68]. The relationship between the Hff and Hgg couplings follows JHUGEN with the relative sign of CP-odd and CP-even coefficients opposite to that assumed in MADGRAPH5 aMC@NLO 2.6.0. This choice corresponds to the convention ϵ 0123 = +1 [47]. The sign convention of the photon field in JHUGEN is opposite to that in MADGRAPH5 aMC@NLO, which leads to the opposite sign of the HZγ couplings. This sign convention depends on the sign in front of the gauge fields in the covariant derivative and this analysis follows [46]. The PYTHIA event generator is used to model the H decay to τ leptons and the decays of the τ leptons. Both scalar and pseudoscalar H → ττ decays and their interference have been simulated to confirm that the observables used in the analysis are not sensitive to anomalous couplings affecting the H → ττ decays. Thus, the default samples are generated with the SM H → ττ decay process.
The MADGRAPH5 aMC@NLO [65] generator is used to produce W+jets and Z → ee/µµ+jets samples at LO accuracy. The MADGRAPH5 aMC@NLO generator is also used for diboson production simulated at NLO, whereas POWHEG version 2.0 is used for tt [69] and single top quark (t-channel) production [70], and POWHEG version 1.0 is used for single top quark production in association with a W boson [71]. Simulated events include the contribution from additional pp interactions within the same or adjacent bunch crossings (pileup) and are weighted to reproduce the observed pileup distribution in data.

Event selection
The reconstruction of recorded and simulated events relies on the particle-flow (PF) algorithm [78], which combines the information from the CMS subdetectors to identify and reconstruct muons, electrons, photons, and charged and neutral hadrons emerging from pp collisions. Combinations of these PF candidates are used to reconstruct higher-level objects such as jets, τ candidates, or missing transverse momentum, ⃗ p miss T .
The primary vertex (PV) is taken to be the vertex corresponding to the hardest scattering in the event, evaluated using tracking information alone, as described in Sec. 9.4.1 of Ref. [79].
Electrons are identified with a multivariate discriminant combining several quantities describing the track quality, the shape of the energy deposits in the ECAL, and the compatibility of the measurements from the tracker and the ECAL [80]. Muons are identified with requirements on the quality of the track reconstruction and on the number of measurements in the tracker and the muon systems [81]. A relative isolation variable, I ℓ , is defined as the total energy deposited in a cone of size of R < 0.3 (0.4) centered on the electron (muon) direction divided by the p T of the lepton. The expected contribution to the energy sum from pileup interactions is estimated and subtracted from the total. To reject lepton candidates arising from misidentified jet constituents or from hadron decays, we require that I ℓ < 0.15.
Hadronic jets are clustered from the reconstructed PF particles using the infrared and collinear safe anti-k T algorithm [82,83] with a distance parameter ∆R = √ (∆η) 2 + (∆ϕ) 2 of 0.4. Jet momentum is determined as the vector sum of all particle momenta in the jet, and is found from simulation to be, on average, within 5 to 10% of the true momentum over the whole p T spectrum and detector acceptance. Pileup interactions can contribute additional tracks and calorimetric energy depositions to the jet momentum. To mitigate this effect, charged particles identified to be originating from pileup vertices are discarded and an offset correction is applied to correct for the remaining contributions. Jet energy corrections are derived from simulation to bring the measured response of jets to that of particle level jets on average. In situ measurements of the momentum balance in dijet, photon + jet, Z + jet, and multijet events are used to account for any residual differences in the jet energy scale between data and simulation [84]. The jet energy resolution amounts typically to 15%-20% at 30 GeV, 10% at 100 GeV, and 5% at 1 TeV [84]. Additional selection criteria are applied to each jet to remove jets potentially dominated by anomalous contributions from various subdetector components or reconstruction failures. In this analysis, jets are required to have p T > 30 GeV and |η| < 4.7, and to be separated from the reconstructed visible τ decay products by a distance parameter of at least 0.5, where ϕ is the azimuthal angle in radians. Data collected in the most forward region of the ECAL endcaps were affected by large amounts of noise during the 2017 run, which led to disagreements between simulation and data. To mitigate this effect, jets used in the analysis of the 2017 data are discarded if they have p T < 50 GeV and 2.650 < |η| < 3.139. Hadronic jets that contain b quarks ("b jets") are identified using a deep neural network (DNN), called the deep combined secondary vertex algorithm [85].
Hadronically decaying τ leptons are reconstructed with the hadron-plus-strips algorithm [86,87], which is seeded with anti-k T jets with p T > 14 GeV. This algorithm reconstructs τ h candidates based on the number of tracks and the number of ECAL strips with energy deposits within the associated η-ϕ plane and reconstructs one-prong, one-prong+π 0 (s), and three-prong decay modes (where a "prong" refers to a charged hadron constituent). For this analysis, a DNN discriminator is used to identify hadronic decays of τ leptons [88]. The input variables to the DNN include variables related to the τ h isolation, τ h lifetime, and other detector-related variables. These variables serve as input to a DNN, which provides an output discriminant. The threshold on the output discriminant depends on the τ h p T and provides a τ h identification (ID) and reconstruction efficiency of about 60%. Two other DNNs are used to reject electrons and muons misidentified as τ h candidates using dedicated criteria based on the consistency between the measurements in the tracker, the calorimeters, and the muon detectors.
The ⃗ p miss T is defined as the negative vector sum of the p T of all PF candidates [89]. Its magnitude is referred to as p miss T . The invariant mass of the ττ system m τ τ is a key variable for separating H candidate events from the background in this analysis. The m τ τ is reconstructed using the FASTMTT algorithm, which is similar to the SVFIT algorithm [90] used in previous CMS publications, except that it uses a simplified mass likelihood function to reduce the computation time. This algorithm makes use of the ⃗ p miss T and its uncertainty and the four-vectors of the reconstructed visible τ decay products to calculate an estimate of the mass of the parent boson and the full fourmomenta of the H decay products needed to calculate MELA kinematic observables discussed in Sec. 3. Compared to the procedure described in Ref. [90], the FASTMTT algorithm removes the contributions of the leptonic and hadronic τ decay matrix elements to the likelihood function, and assumes that the neutrinos are collinear to the visible τ leptons. This gives a similar m τ τ resolution as the SVFIT algorithm, but the computation time is reduced by two orders of magnitude.

Event categorization
Selected events are classified according to four decay channels, eµ, eτ h , µτ h , and τ h τ h , where e and µ indicate τ decays into electrons and muons, respectively. The resulting event samples are made mutually exclusive by discarding events that have additional loosely identified and isolated electrons or muons. In cases where multiple pairs can be formed due to the presence of additional τ h candidates, we select the pair that includes the τ h candidate(s) with the largest value of the τ h ID discriminant.
The largest irreducible source of background is Drell-Yan production of Z/γ * → ττ, while the dominant background sources with jets misidentified as leptons are QCD multijet and W+jets. Other contributing background sources are tt, single top quark, Z/γ * → ee, Z/γ * → µµ, and diboson production.
The two τ lepton candidates assigned to the H decay are required to have opposite charges. Events are selected online using a combination of single-lepton, lepton+τ h , double-τ h , and electron+muon triggers. The trigger requirements, geometrical acceptances, and p T criteria are summarized in Table 3. The p T thresholds in the selections are optimized to increase the sensitivity to the H → ττ signal, while also satisfying the trigger requirements. The η selections are driven by reconstruction and trigger requirements. Table 3: Kinematic selection requirements for the four di-τ decay channels. The trigger requirement is defined by a combination of trigger candidates with p T over a given threshold, indicated inside parentheses in GeV. The pseudorapidity thresholds come from trigger and object reconstruction constraints. The p T thresholds for the lepton selection are driven by the trigger requirements, except for the τ h candidate in the µτ h and eτ h channels, and the sub-leading lepton in the eµ channel, where they have been optimized to increase the analysis sensitivity.

Channel Trigger
Year Selection criteria requirement In the ℓτ h channels, the large W+jets background is reduced by requiring the transverse mass, m T , to be less than 50 GeV. The m T is defined as follows, where p ℓ T is the transverse momentum of the electron or muon and ∆ϕ is the azimuthal angle between the lepton momentum and ⃗ p miss T .
In the eµ and ℓτ h channels, events with b jets are vetoed to reduce the background from tt production. In the eµ channel, this background is further mitigated by requiring p ζ = p miss along the bisector of the p T of the two leptons and p vis ζ is the sum of the components of the lepton p T along the same direction [91]. Event categories are designed to increase the sensitivity to the signal by isolating regions with large signal-to-background ratios, and to provide sensitivity to the Hgg and HVV parameters. They follow closely the selection in Ref. [92]: • 0-jet category: This category targets H events produced via ggH. Events containing no jets with p T > 30 GeV are selected. • VBF category: This category targets H events produced via the VBF process and ggH in association with two jets. Events are selected with at least two jets with p T > 30 GeV. In the eµ and ℓτ h channels the invariant mass of the two leading jets m jj is required to be larger than 300 GeV. In the τ h τ h channel we require the separation |∆η jj | between the two leading jets to be greater than 2.5, and the transverse component of the vector sum of the ⃗ p miss T and the ⃗ p T of the visible decay products of the τ leptons, defined as ⃗ p τ τ T , to have a magnitude (p τ τ T ) greater than 100 GeV. • Boosted category: This category contains all the events that do not enter one of the previous categories, namely events with one jet and events with several jets that fail the requirements of the VBF category. It targets events with an H produced in ggH and recoiling against an initial-state radiation jet.

Background estimation
In this section we describe the background processes to the H → ττ signal and methods to estimate their contributions. The major background is from Drell-Yan production, where the Z boson decays to a pair of τ leptons, followed by backgrounds from jets misidentified as τ lepton candidates. Whenever possible we rely on data to estimate background contributions. The data, signal, and background predictions in the VBF category are illustrated in Fig. 2.

Backgrounds due to ττ events
The Drell-Yan Z/γ * → ττ process is the dominant background to the H → ττ signal as both processes share the same final state and have very similar kinematic properties. Additionally, several other process such as tt can produce a ττ final state. Given the dominance of these backgrounds in data, we rely on an embedding method [93] to simulate them: in a dedicated control region, Z/γ * → µµ candidate events are selected from data; calorimeter deposits and tracks produced by the pair of muons in the event are removed; the muons in the event are replaced with simulated τ leptons with the same kinematic properties as the removed muons; and the PYTHIA generator is used to model the decay of the τ leptons in the same way as a typical Z/γ * → ττ decay.
The embedding process is performed separately for each decay channel and results in greatly improved statistical accuracy compared to that of a typical MC simulation. Using data to describe effects such as pileup and detector noise results in a much more reliable description of p miss T and jet-related variables, which in turn reduces systematic uncertainties arising from, e.g., jet energy corrections.

Background due to jet misidentification
One of the major backgrounds to the H → ττ signal in the eτ h , µτ h , and τ h τ h channels originates from events where a jet is misidentified as a τ h candidate. We refer to such backgrounds as jet → τ h processes. The background processes include QCD multijet production, W+jets, tt, and diboson processes. As the jet → τ h misidentification rates are typically ∼ 0.1-1%, the use of MC simulation to model this background is undesirable due to the statistical limitation and systematic uncertainty associated with the correct modeling of the detector response to jets identified as τ leptons. Thus, we rely on data to model this background, using the "fake factors" (F F ) method [94].
To estimate the jet → τ h background in the signal region, we define a sideband region enriched in jet → τ h events. The sideband region is defined by selecting τ h candidates that fail the nominal requirement on the τ h DNN ID discriminator but pass a looser requirement. This looser requirement will be referred to in the following as "relaxed". To estimate the jet → τ h background in the signal region, the events in the sideband region are scaled by extrapolation factors known as the F F , which are defined as the ratio between the nominal and relaxed jet → τ h misidentification rates. In the τ h τ h channel, there are two τ h candidates and it is possible that either one or both candidates originate from a jet → τ h . However, the dominant process producing jet → τ h in this channel is multijet QCD, which results in both τ h candidates originating from jet → τ h misidentifications. In this case, we define the sideband region by requiring only the leading τ h to meet the relaxed ID requirement.
The F F are determined separately for each channel, and for each of the dominant processes contributing to the jet → τ h background. For the τ h τ h channel, the jet → τ h background originates almost entirely from QCD multijet events, and therefore F F are derived only for this process. For the eτ h and µτ h channels, separate F F are derived for QCD multijet, W+jets, and tt processes. The QCD and W+jets F F are measured in dedicated control regions enriched with the events from the given process. The QCD control region is defined by inverting the opposite-sign charge requirement on the di-τ candidate pair. The W+jets control region is defined by selecting events with m T > 70 GeV. Obtaining a tt control region with high purity is not possible, and the F F are therefore measured in simulation for this subdominant process. For all control regions, we subtract the contributions of events with τ h candidates arising from genuine hadronic τ decays or from misidentified electrons or muons using simulations. We parametrize the F F as a function of the τ h p T . As events in this study are categorized primarily by the number of jets in the event, the F F are measured in jet multiplicity bins: no jets, one jet, and two or more jets.
The F F for each of the individual processes are then weighted into the overall F F to account for their relative contributions to the events in the relaxed identification region. For this purpose, simulated events are used to determine the expected contributions of W+jets and tt events, and the QCD contribution is estimated by subtracting all simulated non-QCD processes from the data. For the eτ h and µτ h channels, the F F measured for the W+jets process are weighted to also account for all other subdominant jet → τ h processes (all processes except multijet QCD, W+jets, and tt). For the τ h τ h channel, the multijet QCD F F account also for all other subdominant processes where the leading τ h candidate is a misidentified jet. The events in which the subleading τ h candidate is a misidentified jet and the leading τ h candidate is a genuine τ lepton are modeled via simulation; these events constitute only a small fraction (O(2%)) of the total misidentified jet background in this channel.
Finally, the F F are further corrected to accommodate residual differences observed when applying the measured F F to events in control regions. Such corrections are needed to account for: differences in the jet → τ h misidentification rates in control and signal regions arising from, for example, slight differences in the jet flavor compositions, the choices of functional forms for parametrizing the p T dependence, the finite binning of the parametrizing variables, and the omission of dependencies on kinematic or topological variables, such as η for which F F values are averaged out. Sub-leading dependencies of the F F on p ℓ T and m T for the ℓτ h channels, or the p T of the subleading τ h candidate and the mass of the visible ττ decay products for the τ h τ h channel, enter via these corrections.
In the eµ channel, one of the minor backgrounds stems from the multijet QCD process, where at least one jet is misidentified as an electron or a muon candidate. The majority of these events involve bb production, with electron and muon candidates produced in semileptonic decays of heavy-flavor quarks. This background is estimated from a sideband region using events with an electron and a muon with same-sign (SS) electric charges. Scale factors are then applied to extrapolate from this sideband region to the signal region, where the electron and the muon have opposite-sign (OS) electric charges. These so-called OS/SS scale factors are derived from a control region where the muons pass a relaxed isolation requirement but fail the signal region isolation criteria. The dependence of the OS/SS factors on the ∆R between the two leptons and jet multiplicity in the event is taken into account. A correction is also applied to account for any bias introduced by the inversion of the isolation requirement on the muon candidate. Finally, we subtract contributions from known SM processes using embedded and MC simulation samples.

Other backgrounds processes
The remaining backgrounds include Drell-Yan processes, where the Z boson decays to a pair of electrons or muons and one or more of the final state leptons is misidentified as the τ lepton, as well as tt, single top quark and multiboson production with fewer than two genuine τ leptons and additional electrons or muons that are misidentified as leptonically or hadronically decaying τ leptons. These backgrounds are small and we rely on MC simulation to estimate their contribution to the signal region. To avoid double-counting of backgrounds arising from jet misidentification, we remove the events with the generator-level quark or gluon matched to the reconstructed τ lepton candidate in the final state. Similarly, Drell-Yan MC events as well as any other MC simulation events with two genuine τ leptons are discarded to avoid overlap with the embedded samples.

Corrections to simulated data
To improve the agreement between the signal and background processes modeled with simulations and the data, the following corrections are applied to the simulated events (including ττ-embedded events for corrections pertaining to the simulated τ leptons): • The pileup distribution in simulation is reweighted in order to match the pileup in data.
• The electrons and muons channels are corrected to account for their trigger efficiencies, reconstruction, identification, and isolation requirements. The channels containing τ h candidates are corrected for their trigger efficiencies, reconstruction, and identification requirements. Corrections are also applied to events including e → τ h and µ → τ h candidates to account for differences in the misidentification probabilities.
• For the eµ and ℓτ h channels, corrections are applied to account for differences in the number of events passing the b jet veto, as a result of variations in the probabilities for jets to be tagged as b jets.
• The τ h energy scales are corrected per decay mode to match the energy scale in data using the Z/γ * → ττ visible mass peak. Separate corrections are derived for τ h candidates that originate from genuine hadronically decaying τ leptons and those that originate from electron and muon misidentifications. The electron energy scale is adjusted in data and simulation using the Z boson mass peak, and the resolution of the simulated electrons is also adjusted to match the data.
• Jet energy scale corrections are applied to both data and simulated events, and the energy resolution of simulated jets is adjusted to match the resolution in data. For the Drell-Yan, W+jets, and H events estimated from simulation, corrections are applied to the p miss T based on the vector difference of the measured p miss T and the total p T of the neutrinos from the Z, W, or H decay products ("recoil corrections").
• The Z boson mass and p T spectra in simulation are corrected to better match the data.
To this purpose the Z mass and p T are measured in data and simulation in dimuon events. The observed differences between the data and simulations are taken as event weights that are subsequently applied to the simulated Z/γ * → ℓℓ events. The size of the corrections are typically less than 20%. For tt events, the top quark p T spectra are also reweighted to match the p T spectra in data. The procedure used to derive these corrections is described in Ref. [95]. The sizes of the corrections are less than 20%.
• During the 2016-2017 data-taking, a gradual shift in the timing of the inputs of the ECAL L1 trigger in the region at |η| > 2.0 caused a specific trigger inefficiency [96]. For events containing an electron (a jet) with p T larger than ≈50 GeV (≈100 GeV), in the region 2.5 < |η| < 3.0 the efficiency loss is ≈10-20%, depending on p T , η, and time. Correction factors were computed from data and applied to the acceptance evaluated by simulation to account for this inefficiency. This results in a small decrease in the estimated signal and background yields, e.g., the inclusive SM VBF yields are reduced by about 2%-3%.

Systematic uncertainties
A variety of systematic uncertainties are taken into account in the analysis. The uncertainty model consists of normalization uncertainties that only scale the yield of a distribution while leaving its shape unchanged, and shape uncertainties that also alter the shapes of the distributions. The leading systematic uncertainty sources result from the jet energy scales and resolutions, and the statistical uncertainties in the background predictions. All systematic uncertainties are implemented in the form of nuisance parameters in the likelihood, which can be further constrained by the fit to the data. The uncertainties considered in this analysis are summarized in Table 4 and detailed below. The integrated luminosities of the 2016, 2017, and 2018 data-taking periods are individually known with uncertainties in the 1.2-2.5% range [54][55][56], while the total Run 2 (2016-2018) integrated luminosity has an uncertainty of 1.6%, the improvement in precision reflecting the (uncorrelated) time evolution of some systematic effects. The uncertainty in the L1 ECAL trigger timing correction factors described in Sec. 7 ranges between 0.2-15%.
The uncertainties in the (electron) muon reconstruction, identification, and isolation efficiencies amount to (2) 1%. The electron and muon triggers contribute an additional 2% uncertainty in the yield of simulated processes. The uncertainty in the electron energy scale depends on p T and η and is typically less than 1%. The muon energy scale uncertainty varies between 0.4 and 2.7% depending on η.
The τ h reconstruction and identification efficiency is measured in three p T bins (30-35, 35-40, > 40 GeV) or four τ h decay mode bins and its uncertainty is dominated by the statistical component. The uncertainty is taken to be uncorrelated for the individual measurements and is within the 3%-10% range. In addition, a yield uncertainty of 3% is taken into account for genuine τ h candidates due to the discrimination against electrons and muons. An additional uncertainty is applied for the τ h reconstruction in the embedded samples to account for differences in the charged hadrons and π 0 reconstruction efficiencies, which ranges from 0.8 to 3%. The uncertainty in the τ h trigger efficiencies depends on the p T and decay mode of the τ h candidates, and is therefore treated as a shape uncertainty. The magnitude of this uncertainty is typically O(10%). For electrons and muons misidentified as τ h candidates, an uncertainty derived in bins of p T , η, and decay mode of the misidentified τ h candidate is applied and amounts to between 9%-40% and 10%-70%, respectively. The uncertainty in the τ h energy scale ranges from 0.2 to 1.1% depending on the decay mode. For electrons and muons misidentified as τ h candidates, the uncertainty in the energy scale amounts to 1%-6.5% for electrons and 1% for muons.
Uncertainties in the jet energy scale come from different sources and with partial correlations. These sources typically affect different regions of the detector and their magnitude depends on the jet p T and η. The collective magnitude of these uncertainties per jet typically ranges from 0.5 to 14%. Uncertainties in the jet energy resolution are also taken into account and range from 2 to 95% per jet depending on η. The jet energy scale and resolution uncertainties create migrations between categories defined on the basis of the jet multiplicity or m jj , and affect the shapes of the ∆ϕ jj and MELA discriminants. For all MC samples without recoil corrections applied, the uncertainties in the jet energy scale and resolution are also propagated to the ⃗ p miss T .
For simulated events that have recoil corrections, the uncertainties in the resolution and response of the ⃗ p miss T are derived as part of the estimate of the recoil corrections and range from 0.3 to 5.8%. Other processes suffer from uncertainties in the energy measurement for the energy depositions in the calorimeter, not associated with jets and photon candidates, so-called unclustered energy scale uncertainties. The magnitudes of these uncertainties depend on the p T , η, and types of the unclustered PF candidates. The overall sizes of these uncertainties are typically less than 20%.
The yield uncertainty related to discarding events with a b-tagged jet varies up to 10% for backgrounds with heavy-flavor jets, whereas for backgrounds with mostly gluon and lightflavor jets it is less than 1%.
For background with jet → τ h misidentifications, the uncertainties in the measured F F are propagated to the background predictions as shape uncertainties. This includes statistical uncertainties in the fitted functions as well as systematic uncertainties coming from residual differences observed in control regions. Altogether the uncertainties on the F F are O(10%). Similarly, for the multijet QCD estimation in the eµ channel uncertainties in the OS/SS extrapolation factors are taken into account. Altogether the uncertainties amount to O(20%).
Uncertainties related to the embedding method are taken into account in addition to those pertaining to the simulated τ lepton decay products described previously. Embedded samples include all events with two τ lepton candidates, essentially Drell-Yan events, but also contain small fractions of diboson and tt events. A shape uncertainty is applied to take into account the contamination from these non Drell-Yan events, which amounts to 10% of the tt and diboson contribution to embedded samples, as estimated from simulation. Data events with muons are selected with a muon trigger before embedding the simulated τ leptons. The uncertainty in this trigger requirement amounts to 4%.
Uncertainties in the tt, Drell-Yan, diboson, and single top quark production cross sections amount to 4.2, 2.0, 5.0, and 5.0%, respectively. This includes uncertainties due to missing higher-order corrections, the PDFs, and α S . For the tt cross section the uncertainty in the top quark mass is also included. Uncertainties due to the reweighting of the top p T and Drell-Yan p T and mass spectra are also included. For the tt samples, the size of the correction is taken as the uncertainty, while for the Drell-Yan samples, the correction is varied by 10%.
The theoretical uncertainties in the H production cross sections and H → ττ branching fraction follow the recommendations in Ref. [48]. The uncertainty in the branching fraction of the H to τ leptons includes a 1.7% uncertainty due to missing higher-order corrections, a 1% parametric uncertainty in the quark masses, and a 0.62% parametric uncertainty in α S . The inclusive uncertainty related to the PDFs amounts to 3.2, 2.1, 1.8, and 1.3%, respectively, for the ggH, VBF, WH, and ZH production modes. Acceptance uncertainties for the ggH signal due to renormalization and factorization scale variations are applied following the uncertainty schemes proposed in Ref. [48]. The sizes of these uncertainties are typically smaller than 25%. Acceptance uncertainties for the VBF signal due to renormalization and factorization scale variations are applied as yield uncertainties. The sizes of the uncertainties are typically smaller than 5%.
Uncertainties arising from the limited sample size of the simulated events, or data control regions, are taken into account using the"Barlow-Beeston" method [97,98]. They are considered for all bins of the distributions used to extract the results.

Analysis of ggH production
In this section we first review the analysis methods employed to extract the Hgg anomalous coupling parameters. In Sec. 9.1 we describe the MELA method from which we obtain our most stringent expected limits on the anomalous coupling parameters. The ∆ϕ jj method, which is used to cross check the results obtained by the MELA method, is briefly described in Sec. 9.2. The results obtained are then presented in Sec. 9.3.

The MELA method
For events entering the VBF category, a combination of simple neural networks and MELA discriminants is used. The former provide optimal separation of the dominant backgrounds for a given channel from the H production, while the latter offer powerful handles to distinguish different signal hypotheses.
A feed-forward network containing two hidden layers is used in each channel. As dominant backgrounds vary by channel, the observables used in the neural network training change and therefore the architecture of the network is modified for each channel. The number of nodes per layer is kept to a minimum to reduce the complexity of the neural network without compromising its performance. As sensitivity to the Hgg anomalous coupling is maximal for events with kinematics similar to those of VBF production, we use VBF signal events as the signal process for all neural networks. This provides the added benefit that the same network can be used in the analysis of both ggH and VBF production processes.
The simplest neural network is employed in the eτ h and µτ h channels where the background is dominated by the Z/γ * → ττ production. Thus, a simple binary classifier is trained to distinguish VBF production from the Z/γ * → ττ process. We use all seven MELA input variables (Ω assoc ) defined for the VBF process in Sec. 3, m τ τ , m jj , and p τ τ T as input features for the network.
In the τ h τ h and eµ channels, there are two background processes that have significant event yields in the VBF category. For the τ h τ h channel, the two background processes are Z/γ * → ττ and jet → τ h . While for the eµ channel the backgrounds are Z/γ * → ττ and tt. We thus train multiclass neural networks to divide events into three classes: two background classes and one signal class targeting the VBF H production process. The same input features used for the binary ℓτ h networks are also used for τ h τ h and eµ multiclass neural networks. However, in the eµ channel two additional features, the jet multiplicity and p ζ , are included to improve the rejection of tt events.
For the binary classifiers we use the neural network output scores as discriminating variables, whereas for the multiclass networks we use the output scores for the VBF signal classes. We will refer to these discriminants collectively as D NN .
Three MELA discriminants are used in the Hgg analysis. In order to separate ggH production from VBF production, D VBF 2jet as defined in Eq. (15) The results of the analysis are extracted with a global maximum likelihood fit based on signalsensitive observables. We summarize the observables utilized in the analysis of the ggH production in Table 5. As the same set of observables is used in the HVV study, except for the superscripts of anomalous coupling specific MELA discriminators, we define them in Table 5 as well.
We use four observables in total to construct fitted distributions in the VBF category: D ggH 0− , D ggH CP , D NN , and D VBF 2jet . The selected events are binned in multidimensional histograms (templates) of these observables. The binning of these templates has been optimized to ensure sufficient statistical populations of all bins, to retain kinematic information, and for memory usage and speed of computer calculations.
The inclusion of the D ggH CP observable is intended to bring sensitivity to the sign of the interference between the CP-even and CP-odd contributions, which would manifest as an asymmetry between the number of events detected with positive and negative values of D ggH CP . We therefore include two bins in this discriminant, D ggH CP < 0 and D ggH CP ≥ 0. It should be noted that  For the eµ channel, which is the least sensitive channel in this analysis, we respectively use 3, 2, and 4 bins for these observables. In all cases, neighboring bins are merged such that the background prediction has no bins with statistical uncertainty larger than 50% to prevent cases where bins have very low statistical populations. For the τ h τ h channel, it was not possible to define a suitable set of equally spaced bins that fulfilled the optimization criteria. Therefore, we employ variable bin widths for the D ggH 0− , D NN , and D VBF 2jet observables, and select bin boundaries that optimize the expected sensitivity, while minimizing the total number of bins in the templates.
We use two observables to construct our templates in the boosted category, m τ τ and p τ τ T , and one observable in the 0-jet category, m τ τ . There are no dedicated MELA observables sensitive to anomalous couplings in these channels, as the events either have fewer than the two jets needed to construct the observables, or do not display significant separation between different signal scenarios to justify their inclusion. However, the p τ τ T observable used in the boosted category has some sensitivity to anomalous HVV couplings, as the BSM VBF events generally have larger p τ τ T . Similarly, the relative yields of the signal events across categories also has some sensitivity to anomalous HVV couplings, as the signal acceptance in each category will vary depending on the p T of the H. Despite not bringing any sensitivity to anomalous Hgg couplings, the 0-jet and boosted categories are included in the fit nonetheless to constrain backgrounds and to provide sensitivity to the inclusive ggH cross section. The chosen binning in these categories is thus similar to what was employed in previous CMS measurements [92].
Example distributions of the observables in the most sensitive τ h τ h and µτ h channels are given in Fig. 3.

The ∆ϕ jj method
This cross check method is based on the strategy proposed in Ref. [50]. The ∆ϕ jj variable, defined in Sec. 3.2, provides sensitivity to the CP properties of the Hgg vertex. The VBF-like ggH events are targeted as they have been shown to be most sensitive to the Hgg anomalous couplings [50].
The event selection and categorization follows closely those described in Sec. 6 Obs. / Exp.   The VBF category definition described in Sec. 6.1 is adopted for the eµ and ℓτ h channels. For the τ h τ h channel, the VBF category selections defined previously for the ℓτ h channels are utilized. The selected VBF-like events are then further subdivided into four categories based on m jj and p τ τ T to enhance the separation between different CP scenarios and to provide additional differentiation between the signal and backgrounds. The four categories are defined as follows: events with m jj < 500 GeV and p τ τ T < 150 GeV (low-m jj category); events with m jj < 500 GeV and p τ τ T ≥ 150 GeV (low-m jj boosted category); events with m jj ≥ 500 GeV and p τ τ T < 150 GeV (high-m jj category); and events with m jj ≥ 500 GeV and p τ τ T ≥ 150 GeV (high-m jj boosted category).
We summarize the observables utilized in Table 6. In this case we use two-dimensional (2D) templates in the VBF categories to extract the results. These templates are constructed using the ∆ϕ jj and m τ τ observables. We use 12 equally-spaced bins for ∆ϕ jj . Variable bin widths are used for the m τ τ observable, where the bin boundaries are selected to capture the peaking structures of the signal distributions close to m τ τ ∼ 125 GeV. The initial choice of the m τ τ bin boundaries is the same for all channels and categories but we apply an additional merging of neighboring bins in cases where the statistical fluctuations in the signal or background templates are excessive.

Results of the ggH analysis
The results are extracted by performing a binned maximum likelihood fit to the data combining all categories for the different channels and data-taking years. The likelihood function is defined as a product of conditional probabilities over all bins i: where n i is the observed number of data events in each bin. The signal and background expectations are given by s i and b i respectively, which are functions of θ, that represents the full set of nuisance parameters corresponding to the systematic uncertainties, and the parameters that modify the H signal processes: µ ggH , µ qqH , and ⃗ f . The parameters µ ggH and µ qqH are the H signal strength modifiers that respectively modify the ggH and VBF+VH cross sections with respect to the SM values. The ⃗ f term represents the set of anomalous coupling parameters that modify the distributions of the ggH and/or VBF+VH signals. In the case of Hgg anomalous coupling measurements, ⃗ f = ( f ggH a3 , f a3 ). Finally, the p(θ|θ) term represents the full set of probability density functions of the uncertainties in the nominal values of the nuisance parameters θ. The systematic uncertainties that affect only the normalizations of the signal and background processes are assigned log-normal external constraints, whereas the shape altering systematic uncertainties are assigned Gaussian external constraints. The negative log-likelihood is defined withμ ggH ,μ qqH , ⃗f , andθ as the best fit values of the signal modifiers and nuisance parameters. The 68 and 95% confidence level (CL) intervals are identified when −2∆ ln L = 1.00 and 3.84, respectively, for which exact coverage is derived using the asymptotic approximation [99].
The measurements of f ggH a3 , or equivalently f Htt CP or α Hff according to Eq. (10), is performed using the two methods based on MELA and ∆ϕ jj . An example of a pre-fit distribution for the MELA method is given in Fig. 4 for one of the most sensitive signal categories. Figure 5 shows the postfit distribution in the VBF high-m jj boosted category in the τ h τ h channel, which is the most sensitive category used to extract the results using the ∆ϕ jj method. The results of the likelihood scans are shown in Figs. 6-7 and listed in Table 7.  Figure 4: The observed and predicted 2D distribution of (D ggH 0− , D NN ) before the fit to data in the most sensitive VBF category region with 0.3 < D VBF 2jet < 0.7 in the τ h τ h channel. The total H signal, including VBF, ggH, and VH processes, is shown stacked on top of the background in the solid red histogram. The ggH signal for the CP-even (CP-odd) scenario is also shown overlaid by the red (blue) line. Only the statistical uncertainties are included in the uncertainty band.
For the MELA method, the maximum value of the −2∆ ln L is noticeably larger for the observed scan (occurring at f ggH a3 ≈ −0.8) compared to the expected. We checked and excluded a possibility that the discrepancy originated from artificial effects of our analysis procedure. We used pseudoexperiments to estimate the probability of obtaining a maximum value of the −2∆ ln L greater than or equal to the maximum −2∆ ln L of the observed scan. This probability was determined to be 33%. We thus conclude that the observed results are affected by statistical fluctuations but compatible with the signal plus background model.
The use of the MELA method is shown to improve the expected uncertainty in f ggH a3 (α Hff ) by 8 (4)%. This improvement owes partly to the use of neural networks that differentiate the H signal from the backgrounds more effectively, and partly to the inclusion of matrix-element discriminants that improve the separation between CP-even, CP-odd, and mixed CP scenarios. In  Figure 5: Observed and predicted 2D distributions after the fit to data in the VBF high-m jj boosted category in the τ h τ h channel. The total H signal, including VBF, ggH, and VH processes, is shown stacked on top of the background in the solid red histogram. The ggH signal for the CP-even (CP-odd) scenario is also shown overlaid by the red (blue) line. The uncertainty band accounts for all sources of systematic uncertainty in the signal and background predictions. The expectation in the ratio panel is the sum of the estimated backgrounds and the best fit signal. Table 7: Allowed 68% (central values with uncertainties) and 95% CL (in square brackets) intervals on anomalous Hgg coupling parameters using the H → ττ decay. The use of "-" indicates cases where no exclusion at the 95% CL was found. As indicated in the Table, the case of the ggH production, the SM process is generated by the quark loop represented by the a gg 2 term in Eq. (1). This makes it harder to distinguish the anomalous a gg 3 contribution from the SM, as both are generated by dimension-six operators and many of their kinematic features are similar. Most of the sensitivity to CP-odd couplings is primarily in the azimuthal correlation of two jets, which explains why the full multivariate MELA treatment of kinematic information does not bring as much additional information as in the case of the VBF production, described in the following section, where the SM process is generated by the tree-level coupling g VV 1 in Eq. (1), which is a dimension-four operator. This leads to kinematic differences between the anomalous contributions and the SM in multiple observables and a much larger gain from the full multivariate MELA treatment.
We also cross-checked the results neglecting CP-odd contributions to the VBF and VH processes ( f a3 fixed to zero). This was found to have only a minor effect on the best fit value and uncertainty of f ggH a3 (α Hff ). Therefore, we present results only for the more general case where

Analysis of VBF production
As MELA-based observables offer superior sensitivity and discrimination among different possible anomalous HVV couplings in the VBF and VH productions compared to single kinematic observables, such as ∆ϕ jj , the VBF study is conducted with MELA-based observables only. As the sensitivity to VH anomalous effects is small with the present dataset, the analysis is optimized for the VBF process. However, we allow the anomalous couplings to modify the VH kinematics in the fit to data.
We employ the same neural networks to separate VBF-like signal and background processes as in the ggH analysis, while MELA discriminants offer optimal separation between different signal hypotheses (shown in Table 5). We use D VBF 2jet , defined in Eq. (15), to separate SM ggH production from the SM VBF production. Several MELA discriminants, as defined in Eq. (20), are constructed to optimally separate the SM hypothesis from the potential anomalous coupling in the VBF production: As for the Hgg analysis, we also define a pure CP-odd MELA discriminant D VBF CP in Eq. (21), that is sensitive to interference effects between the SM and pseudoscalar H contributions to directly probe for CP-violation in the HVV vertex: The results of the VBF analysis are extracted with a global maximum likelihood fit based on 4D or 3D, 2D, and 1D distributions built in each of the VBF, boosted, and 0-jet categories, respectively. The templates constructed for the boosted and 0-jet categories are identical to those described for the ggH analysis in Sec. 9.1, although we note that in this case, in contrast to the former case, the chosen observables do provide some differentiation between anomalous coupling scenarios.
Depending on the anomalous coupling parameter being measured, we use three or four observables in total to construct the distributions in the VBF category. In all cases we include the D NN and D VBF 2jet observables. We additionally include D 0− , D 0h+ , D Λ1 , or D Zγ Λ1 , when we measure f a3 , f a2 , f Λ1 , or f Zγ Λ1 , respectively; which we will collectively refer to as D BSM in the following. The fourth observable, which is included only for the measurement of the CP-odd parameter f a3 , is D VBF CP . The selected events are binned into templates constructed using these observables.
The binning of the templates has been optimized following the criteria outlined in Sec. 9.1. For the ℓτ h and τ h τ h channels, we use 10, 8, and 4 equally sized bins for D BSM , D NN , and D VBF 2jet , respectively. For the eµ channel, we respectively use 3, 2, and 4 bins for these observables. In all cases, neighboring bins are merged such that the background prediction has no bins with statistical uncertainty larger than 50%. For the measurement of the f a3 parameter we include two bins in the D VBF CP discriminant, D VBF CP < 0 and D VBF CP ≥ 0, to bring sensitivity to the sign of the interference between the CP-even and CP-odd contributions. The expected symmetry between these bins is enforced for the background and CP-conserving signal templates to reduce the influence of statistical fluctuations.

Results of the HVV analysis
The four f ai parameters describing anomalous HVV couplings, as defined in Eqs. (1) and (3), are tested against the data according to the likelihood function defined in Eq. (18), following the same approach as that utilized in the analysis of the Hgg vertex.
An example of a pre-fit distribution in the most sensitive VBF category for the τ h τ h channel is shown in Fig. 8. The results of the likelihood scans for Approaches 1 and 2 (defined in Sec. 2) are listed in Table 8   For the observed f a2 scan, there is a second region allowed at the 68% CL away from the best fit value. We use the union symbol (∪) to display the additional allowed f a2 range in this case. The presence of two minima in the observed likelihood scan for f a2 (and to a lesser extent f Zγ Λ1 ) is a result of the limited sensitivity to the sign of the interference between the a 1 and a 2 (κ Zγ 2 ) couplings, which in turn limits the sensitivity to the signs of the f a2 ( f Zγ Λ1 ) parameters. The CL intervals on f ai at 95% and 68% are more stringent compared to those utilizing the H decay information in the H → 4ℓ channel [21] because the VBF and VH production processes are sensitive to higher values of p 2 i appearing in Eq. (1). Therefore, the cross section of anomalous contributions in VBF and VH production increases quickly with f ai . As the cross section  increases with respect to f ai at different rates for production and decay, relatively small values of f ai correspond to a substantial anomalous contribution to the production cross section. This leads to the plateau in the −2 ln L distributions for larger values of f ai in Fig. 9. By using the cross section ratios for VBF production in the f ai definition in Eq. (3), the appearance of the plateau and the narrow exclusion range would change. The f ai constraints in Ref. [21] also utilize the VBF and VH production information, but the number of reconstructed H → 4ℓ events in these production modes is still low compared to this analysis.

Combination of the results with other decay channels
The results of the anomalous coupling measurements presented in the previous sections can be further improved by combining with other H production and decay channels. The precision of the anomalous HVV and Hgg coupling measurements is improved by combining the H → ττ and H → 4ℓ decay channels, where we consider H production via VBF, VH, and ggH. We additionally constrain the anomalous Htt couplings by combining the ggH → ττ/4ℓ and ttH/tH → γγ/4ℓ channels.
For all combinations, each H decay channel treats anomalous couplings in H production processes in the likelihood in a consistent manner. As with the H → ττ only fits, in the likelihood fit for a given parameter the values of the other anomalous couplings are set to zero with the exception of the fits to f a3 and f ggH a3 , and the signal strength parameters are profiled in the combined likelihood fit. The number of signal strength parameters in the combined fit can be reduced by using a relationship between the production cross section ratios. For example, there are in principle four signal strength parameters for the combination of the H → ττ and H → 4ℓ channels (µ τ τ qqH , µ τ τ ggH , µ ZZ qqH , µ ZZ ggH ). However, one degree of freedom is removed because the ratio between the ggH and VBF+VH cross sections is the same in both channels, µ τ τ qqH /µ τ τ ggH = µ ZZ qqH /µ ZZ ggH . Therefore, we can parametrize the combined fit with three signal strength parameters µ qqH , µ ggH , and η τ , where η τ stands for the relative strength of the H coupling to the τ leptons. For the combination with the ttH and tH results using the H → 4ℓ and H → γγ channels, the signal strengths µ ZZ tt H and µ γγ tt H are not related for the f Htt CP measurement because they could differ by the loop involved in the H → γγ decay. In the EFT approach, the fully-resolved loop parametrization following Ref. [46] is used to correlate them. All common systematic uncertainties are treated as being correlated between the channels in the combined likelihood fit.
The measurements of anomalous Hgg and HVV couplings using the MELA method are combined with the results using the on-shell H → 4ℓ decay [21]. In the H → 4ℓ analysis, anomalous HVV couplings can affect both production (VBF+VH) and decay (H → VV → 4ℓ) processes. Information from both processes is taken into account in the analysis. The combination improves the limits on the anomalous coupling parameters typically by about 20%-50%.
The combined likelihood scans for the HVV anomalous coupling measurements are shown in Figs. 11-12, and the allowed 68 and 95% CL intervals are listed in Table 9. The H → ττ channel results mainly constrain small values of f ai where the H production information is the dominant factor, whereas the H → 4ℓ analysis provides major constraints at large values of f ai based on the decay information.
The combined likelihood scans for the Hgg anomalous coupling measurements are shown in Fig. 13, and the allowed 68 and 95% CL intervals are listed in Table 10. The H → ττ channel is more sensitive to f ggH a3 than the H → 4ℓ channel is, but there is a significant improvement from  including both channels in the combination. Previous measurements by the CMS and ATLAS Collaborations [21,31] were only able to differentiate between the CP-even and CP-odd scenar- ios with a significance slightly less than 1 standard deviation. With the current measurement, the pure CP-odd scenario is excluded with a observed (expected) significance of 2.4 standard deviations (1.8 standard deviations), which is cross-checked with pseudoexperiments. Constraints on anomalous Htt couplings are obtained through the combination of the Hgg results with measurements of the ttH and tH processes in the H → 4ℓ [21] and H → γγ [20] channels. We measure the f Htt CP parameter by relating f ggH a3 and f Htt CP as described in Eq. (10), under the assumption of top quark dominance in the ggH loop. The results are presented in Fig. 13 and Table 10.
The combination of Hgg, ttH, and tH results can be reinterpreted in the EFT approach as constraints on c gg and c gg . The likelihood scans for c gg and c gg are performed with κ t and κ t either profiled or fixed to SM expectation (κ t = 1, κ t = 0). The reinterpretation is presented in Fig. 14 and Table 11. We note that in both c gg scans there is a second minimum away from c gg = 0 due to the negative interference between the c gg and κ t contributions, as follows from Eq. (8). The value of the −2∆ ln L at c gg between the two minima points is larger for the observed scan compared to the expected, due to the statistical fluctuation in the H → ττ channel data described in Sec. 9.3. obtained with the combination of results using the H → ττ and H → 4ℓ [21] decay channels. Right: The observed (solid) and expected (dashed) likelihood scans of f Htt CP obtained with the combination of results using the H → ττ, H → 4ℓ [21], and H → γγ [20] decay channels. Table 11: Allowed 68% (central values with uncertainties) and 95% CL (in square brackets) intervals on c gg and c gg using the H → ττ, H → 4ℓ [21], and H → γγ [20] decay channels. Results are presented for two scenarios: κ t and κ t profiled in the fit, and κ t and κ t fixed to the SM expectation. In instances where there is a second allowed region away from the best fit value at a given CL, we use the union symbol (∪) to display the additional allowed c gg /c gg range.   Figure 14: Observed (solid) and expected (dashed) likelihood scans of c gg (left) and c gg (right) with κ t and κ t profiled (upper) and fixed to SM expectation (lower) using the H → ττ, H → 4ℓ [21], and H → γγ [20] decay channels.

Summary
A study is presented of anomalous interactions of the Higgs boson (H) with vector bosons, including CP violation, using its associated production with two hadronic jets in gluon fusion (ggH), vector boson fusion (VBF), and associated production with a vector boson, and a subsequent decay to a pair of τ leptons. Constraints have been set on the CP-violating effects in ggH production in terms of the effective cross section ratio f ggH a3 , or equivalently the effective mixing angle α Hff , using matrix element techniques. The ggH production analysis results in the most stringent limits on CP violation in ggH production to date. In the VBF production analysis, constraints on the CP-violating parameter f a3 and on the CP-conserving parameters f a2 , f Λ1 , and f Zγ Λ1 have been set using matrix element techniques. Further constraints were obtained in the combination of the H → ττ, H → 4ℓ, and H → γγ channels. The combination improves the limits on the anomalous coupling parameters typically by about 20%-50%. The analysis excludes the pure CP-odd scenario of the Higgs coupling to gluons with a significance of 2.4 standard deviations.