Measurement of the Soft-Drop Jet Mass in pp Collisions at $\sqrt{s} = 13$ TeV with the ATLAS Detector.

Jet substructure observables have significantly extended the search program for physics beyond the Standard Model at the Large Hadron Collider. The state-of-the-art tools have been motivated by theoretical calculations, but there has never been a direct comparison between data and calculations of jet substructure observables that are accurate beyond leading-logarithm approximation. Such observables are significant not only for probing the collinear regime of QCD that is largely unexplored at a hadron collider, but also for improving the understanding of jet substructure properties that are used in many studies at the Large Hadron Collider. This Letter documents a measurement of the first jet substructure quantity at a hadron collider to be calculated at next-to-next-to-leading-logarithm accuracy. The normalized, differential cross-section is measured as a function of $\log_{10}\rho^2$, where $\rho$ is the ratio of the soft-drop mass to the ungroomed jet transverse momentum. This quantity is measured in dijet events from 32.9 fb$^{-1}$ of $\sqrt{s} = 13$ TeV proton–proton collisions recorded by the ATLAS detector. The data are unfolded to correct for detector effects and compared to precise QCD calculations and leading-logarithm particle-level Monte Carlo simulations.

resummation dominates over non-perturbative or fixed-order parts of the recent precision calculations; studying the distribution in log-scale allows this region to be studied more closely.
After the event selection, the data are unfolded to correct for detector effects. MC simulations are used to perform the unfolding and for comparisons with the corrected data. The unfolding procedure corrects detector-level$^{2}$ observables to particle-level. The particle-level selection is defined to be as close as possible to the detector-level selection in order to minimize the size of simulation-based corrections when unfolding. Particle-level jets are clustered from simulated particles with a mean lifetime $\tau > 30$ ps, excluding muons and neutrinos. These jets are built using the same algorithm as for detector-level jets, and particle-level events must pass the same dijet requirement. The experimental resolution of the $\log_{10}(\rho^2)$ distribution depends on the jet $p_\mathrm{T}$, so the $\log_{10}(\rho^2)$ and $p_\mathrm{T}$ distributions are simultaneously unfolded. After correcting for the acceptance of the event selection, the full two-dimensional distribution is unfolded using an iterative Bayesian (IB) technique [31] with four iterations, as implemented in the RooUnfold framework [32]. The acceptance corrections are largely independent of $\log_{10}(\rho^2)$, with a small effect below $-3$ due to the ρ 0 requirement.
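The iterative Bayesian update can be illustrated with a minimal NumPy sketch. This is not the RooUnfold implementation used in the analysis: the function name `iterative_bayesian_unfold`, the toy response matrix, and the toy spectra are all hypothetical and chosen only to show the structure of the update. The two-dimensional ($\log_{10}(\rho^2)$, $p_\mathrm{T}$) histogram is assumed to be flattened into a single bin index.

```python
import numpy as np


def iterative_bayesian_unfold(response, measured, prior, n_iterations=4):
    """Unfold `measured` with an iterative Bayesian (D'Agostini-style) update.

    response[j, i] is the probability that an event generated in particle-level
    bin i is reconstructed in detector-level bin j. A two-dimensional
    histogram can be handled by flattening it to one index beforehand.
    """
    response = np.asarray(response, dtype=float)
    measured = np.asarray(measured, dtype=float)
    estimate = np.asarray(prior, dtype=float).copy()
    efficiency = response.sum(axis=0)  # probability to be reconstructed at all
    for _ in range(n_iterations):
        folded = response @ estimate  # detector-level yields implied by the estimate
        ratio = np.divide(measured, folded,
                          out=np.zeros_like(measured), where=folded > 0)
        # Bayes' theorem redistributes each measured bin over particle-level bins.
        update = np.divide(response.T @ ratio, efficiency,
                           out=np.zeros_like(estimate), where=efficiency > 0)
        estimate = estimate * update
    return estimate


# Toy example (all numbers are invented for illustration).
toy_response = np.array([[0.8, 0.2, 0.0],
                         [0.2, 0.6, 0.3],
                         [0.0, 0.2, 0.7]])
toy_truth = np.array([100.0, 300.0, 200.0])
toy_measured = toy_response @ toy_truth
toy_unfolded = iterative_bayesian_unfold(toy_response, toy_measured, prior=np.ones(3))
```

A small, fixed number of iterations acts as a regularization: the result retains some dependence on the prior, which is why the choice of prior and iteration count enters the systematic uncertainties.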
Several MC simulations are used to unfold and compare to the data. Dijet events were generated at LO using Pythia [33] 8.186, with the $2 \to 2$ matrix element (ME) convolved with the NNPDF2.3LO parton distribution function (PDF) set [34], and using the A14 [35] set of tuned PS and underlying-event model parameters. Additional radiation beyond the ME was simulated in Pythia 8 using the LL approximation for the $p_\mathrm{T}$-ordered PS [36]. To provide several comparisons to data, additional dijet samples were simulated using different generators. Sherpa 2.1.1 [37] generates events using multi-leg $2 \to 3$ matrix elements, which are matched to the PS following the CKKW prescription [38]. These Sherpa events were simulated using the CT10 LO PDF set [39] and the default Sherpa event tune. Herwig++ 2.7.1 [40,41] events were generated with the $2 \to 2$ matrix element, convolved with the CTEQ6L1 PDF set [42] and configured with the UE-EE-5 tune [43]. Both Sherpa and Herwig++ use angular ordering in the PS and a cluster model for hadronization [44]. All MC samples use Pythia 8 minimum bias events (MSTW2008LO PDF set [45] and A2 tune [46]) to simulate pileup. They were processed using the full ATLAS detector simulation [47] based on Geant4 [48]. Figure 1 shows the uncorrected data compared with detector-level simulation for Pythia, Sherpa, and Herwig++ as well as particle-level simulation for Pythia. There are substantial migrations between the detector- and particle-level distributions, which cause large off-diagonal terms in the unfolding matrix, especially at low values of $\log_{10}(\rho^2)$.
Various systematic uncertainties impact the soft-drop mass distribution. The sources of uncertainty can be classified into two categories: experimental and theoretical modeling. Experimental uncertainties are due to limitations in the accuracy of the modeling of calorimeter-cell cluster energies and positions as well as their reconstruction efficiency, and are evaluated as follows. Isolated calorimeter-cell clusters are matched to tracks; the mean and standard deviation of the energy-to-momentum ratio ($E/p$) are used for the cluster energy scale and resolution uncertainties, and the standard deviation of the relative position is used for the cluster angular resolution. In the track-momentum range 30 GeV $< p <$ 350 GeV, $E/p$ is augmented with information from test-beam studies [49]. For $|\eta| > 0.6$ in that $p$ range or for $p > 350$ GeV (and any $|\eta|$), a flat 10% uncertainty is estimated for both the energy scale and resolution, motivated by earlier studies [50]. The reconstruction efficiency is studied using the fraction of tracks without a matched calorimeter-cell cluster. A series of validation studies are performed to ensure that these uncertainties are valid also for non-isolated clusters. Jets clustered from tracks are geometrically matched to calorimeter jets and the ratio of their $p_\mathrm{T}$ and mass is sensitive to the jet energy scale (JES) and jet mass scale. Furthermore, the decomposition method [50-52] is used to propagate the cluster-based uncertainties to an effective JES, which agrees well with the observed in-situ shift for $R = 0.4$ ungroomed jets [29]. Finally, the jet mass scale and resolution are tested using the observed W mass peak in $t\bar{t}$ events. The same event selection is used, and the same level of agreement is observed, as in Ref. [53]. These additional studies confirm that the cluster-based uncertainties are valid for $\log_{10}(\rho^2)$.
Figure 1: Distributions of $\log_{10}(\rho^2)$ in data compared to reconstructed detector-level (Reco.) Pythia, Sherpa, and Herwig++, and particle-level (Truth) Pythia simulations for $\beta = 0$ (left), $\beta = 1$ (right), and $\beta = 2$ (bottom). The ratio of the three detector-level MC predictions to the data is shown in the middle panel, and the size of the detector-level to particle-level corrections for Pythia is shown as the ratio in the bottom panel. The error bars on the data points and in the first ratio include the experimental systematic uncertainties in the cluster energy, angular resolution, and efficiency. The distributions are normalized to the integrated cross-section, $\sigma_\mathrm{resum}$, measured in the resummation region, $-3.7 < \log_{10}(\rho^2) < -1.7$.
Footnote 2: Detector level refers to the measured outputs of the detector; particle level refers to the particles that interact with the detector.
One of the dominant uncertainties is due to the theoretical modeling of jet fragmentation (QCD Modeling). In particular, as dijet simulation is used to unfold the data, the results of the analysis are sensitive to the choice of MC generator used for this procedure. The Pythia generator is used for the nominal sample, and comparisons are made with Sherpa and Herwig++. The Sherpa and Herwig++ generators give compatible results, so only the variation with Sherpa is used as a systematic uncertainty. The impact of this uncertainty is assessed by unfolding the data with the alternative response matrix. In addition to directly varying the model used to derive the response matrix, a data-driven nonclosure technique is used to estimate the potential bias from a given choice of prior and the number of iterations in the IB method [54]. The inverse of the response matrix is applied to the particle-level spectrum, which is reweighted until the folded spectrum agrees with data. This modified detector-level distribution is unfolded with the nominal response matrix, and the difference between the result and the reweighted particle-level spectrum is taken as an uncertainty. Finally, the sensitivity of the unfolding procedure to pileup is assessed by reweighting events to vary the distribution of the number of interactions in the MC simulation by 10%; the impact on the measurement is small. This is expected, since the soft-drop algorithm is designed to remove the soft, wide-angle radiation that pileup contributes.
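The last step of the nonclosure test (unfold the folded, reweighted spectrum with the nominal setup and compare with the reweighted particle-level spectrum) might be sketched as follows. This is only an illustration under stated assumptions: the reweighted particle-level spectrum is taken as an input, since the reweighting procedure itself is not specified here, and the names `ib_unfold` and `nonclosure_bias` as well as all toy numbers are hypothetical.

```python
import numpy as np


def ib_unfold(response, measured, prior, n_iterations=4):
    """Compact iterative Bayesian unfolding; assumes nonzero efficiency and yields."""
    response = np.asarray(response, dtype=float)
    measured = np.asarray(measured, dtype=float)
    estimate = np.asarray(prior, dtype=float).copy()
    efficiency = response.sum(axis=0)
    for _ in range(n_iterations):
        folded = response @ estimate
        estimate = estimate * (response.T @ (measured / folded)) / efficiency
    return estimate


def nonclosure_bias(response, nominal_prior, reweighted_truth, n_iterations=4):
    """Relative difference between unfolded pseudo-data and the reweighted truth."""
    reweighted_truth = np.asarray(reweighted_truth, dtype=float)
    # Fold the reweighted particle-level spectrum to obtain detector-level pseudo-data.
    pseudo_data = np.asarray(response, dtype=float) @ reweighted_truth
    # Unfold with the nominal response matrix, prior, and number of iterations.
    unfolded = ib_unfold(response, pseudo_data, nominal_prior, n_iterations)
    return (unfolded - reweighted_truth) / reweighted_truth


# Toy inputs (invented): nominal prior differs from the data-like reweighted truth.
toy_response = np.array([[0.8, 0.2, 0.0],
                         [0.2, 0.6, 0.3],
                         [0.0, 0.2, 0.7]])
toy_nominal_prior = np.array([150.0, 250.0, 200.0])
toy_reweighted_truth = np.array([100.0, 300.0, 200.0])
toy_bias = nonclosure_bias(toy_response, toy_nominal_prior, toy_reweighted_truth)
```

If the nominal prior already equals the reweighted spectrum, the bias vanishes; a nonzero value quantifies how much the prior and the finite number of iterations pull the result.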
The uncertainties are dominated by QCD modeling and the cluster energy scale. The former are largest ( 20%) at low $\log_{10}(\rho^2)$, where non-perturbative effects introduce a sensitivity to the $\log_{10}(\rho^2)$ distribution prior, and are 10% for the rest of the distribution. Cluster energy uncertainties are large ( 5%) at low $\log_{10}(\rho^2)$, where the cluster multiplicity is low, and also at high $\log_{10}(\rho^2)$, where the energy of the hard prongs, rather than their opening angle, dominates the mass resolution. Other sources of uncertainty are typically below 5% across the entire distribution. A summary of the relative sizes of the various systematic uncertainties for $\beta = 0$ is shown in Fig. 2. The relative sizes of the different sources of systematic uncertainty are similar for $\beta = 1$ and $\beta = 2$, except that the large uncertainty at low $\log_{10}(\rho^2)$ values spans a larger range.
The unfolded data shown in Fig. 3 are normalized to the integrated cross-section, $\sigma_\mathrm{resum}$, measured in the resummation region, $-3.7 < \log_{10}(\rho^2) < -1.7$. The uncertainties due to the analytical calculation come from independently varying each of the renormalization, factorization, and resummation scales by factors of 2 and 1/2. The NLO+NLL calculation is also given with non-perturbative (NP) corrections based on the average of various MC models with NP effects turned on and off; the envelope of predictions is added as an uncertainty [15]. The LO+NNLL predictions do not contain NP effects, but the open markers in Fig. 3 indicate where NP effects are expected to be large ('large NP effects').
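The normalization to $\sigma_\mathrm{resum}$ can be sketched as below. This is a hedged illustration rather than the analysis code: the function name, the toy binning, and the toy cross-sections are hypothetical, and the bin edges are assumed to line up with the boundaries $-3.7$ and $-1.7$.

```python
import numpy as np


def normalize_to_resummation_region(bin_edges, cross_section_per_bin,
                                    low=-3.7, high=-1.7):
    """Return (1/sigma_resum) dsigma/dlog10(rho^2) per bin, and sigma_resum.

    sigma_resum is the sum of the per-bin cross-sections over all bins that lie
    fully inside [low, high]; bin edges are assumed to align with both limits.
    """
    edges = np.asarray(bin_edges, dtype=float)
    xsec = np.asarray(cross_section_per_bin, dtype=float)
    tolerance = 1e-9  # guards against floating-point noise in the edge values
    inside = (edges[:-1] >= low - tolerance) & (edges[1:] <= high + tolerance)
    sigma_resum = xsec[inside].sum()
    widths = np.diff(edges)
    return xsec / (sigma_resum * widths), sigma_resum


# Toy binning and per-bin cross-sections (invented, arbitrary units).
toy_edges = np.array([-4.5, -3.7, -2.7, -1.7, -0.5])
toy_xsec = np.array([1.0, 2.0, 3.0, 4.0])
toy_density, toy_sigma_resum = normalize_to_resummation_region(toy_edges, toy_xsec)
```

With this convention the normalized distribution integrates to one over the resummation region only, so shape comparisons there are insensitive to the overall cross-section.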
The MC predictions and the analytical calculations are expected to be accurate in different regions of $\log_{10}(\rho^2)$ [15, 17, 18]. In general, non-perturbative effects are large for $\log_{10}(\rho^2) < -3.7$ (where small-angle or soft gluon emissions dominate) and small for $-3.7 < \log_{10}(\rho^2) < -1.7$, where resummation dominates. Fixed, higher-order corrections are expected to be important for $\log_{10}(\rho^2) > -1.7$, where large-angle gluon emission can play an important role. This implies that the region $-3.7 < \log_{10}(\rho^2) < -1.7$ (the resummation region) should have the most reliable predictions for both the MC generators and the LO+NNLL analytical calculation, while the NLO+NLL calculation should also be accurate for $\log_{10}(\rho^2) > -1.7$. For all values of $\beta$, the measured and predicted shapes agree well in the resummation region, and the data and NLO+NLL prediction continue to agree well at higher values of $\log_{10}(\rho^2)$. At more negative values of $\log_{10}(\rho^2)$, non-perturbative effects lead to distinctly different predictions between the MC generators and the calculations without NP corrections; the data fall below the predictions for all $\beta$ values. Interestingly, the NNLL calculation is not everywhere a better model of the data than the NLL calculation in the resummation regime, and NP effects can also be comparable to the higher-order resummation corrections in this regime. Therefore, improved precision in the future will require a careful comparative analysis of the different perturbative calculations as well as a deeper and possibly analytic understanding of NP effects.
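To give a feeling for these boundaries, they can be translated into soft-drop masses using the definition of $\rho$ as the ratio of the soft-drop mass to the ungroomed jet transverse momentum. The value $p_\mathrm{T} = 600$ GeV used below is only an illustrative choice (it matches the leading-jet threshold quoted in the caption of Fig. 3); the symbols $m^\mathrm{SD}$ and $x$ are shorthand introduced here.

```latex
\begin{align}
  x \equiv \log_{10}(\rho^2), \qquad \rho = \frac{m^\mathrm{SD}}{p_\mathrm{T}}
  \quad &\Longrightarrow \quad m^\mathrm{SD} = p_\mathrm{T} \cdot 10^{x/2}, \\
  x = -3.7: \quad m^\mathrm{SD} &\approx 600~\mathrm{GeV} \times 10^{-1.85} \approx 8.5~\mathrm{GeV}, \\
  x = -1.7: \quad m^\mathrm{SD} &\approx 600~\mathrm{GeV} \times 10^{-0.85} \approx 85~\mathrm{GeV}.
\end{align}
```

Under this assumption the resummation region corresponds roughly to soft-drop masses between 8.5 GeV and 85 GeV; for higher jet $p_\mathrm{T}$ both limits scale up proportionally.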
As $\beta$ increases, the fraction of radiation removed by soft-drop grooming decreases and the impact of non-perturbative effects grows larger [17, 18], so the range over which the analytical calculations are accurate also decreases. The degree of agreement between data and all the calculations for $\log_{10}(\rho^2) < -3$ worsens substantially for $\beta \in \{1, 2\}$, especially when NP corrections are not included. Agreement between the data and the MC generators remains generally within uncertainties for all values of $\beta$. Digitized versions of the results, along with versions binned in jet $p_\mathrm{T}$, can be found at Ref.
Figure 3: The unfolded $\log_{10}(\rho^2)$ distribution for anti-$k_t$ $R = 0.8$ jets with $p_\mathrm{T}^\mathrm{lead} > 600$ GeV, after the soft-drop algorithm is applied for $\beta \in \{0, 1, 2\}$, in data compared to Pythia, Sherpa, and Herwig++ particle-level (left), and NLO+NLL(+NP) [15] and LO+NNLL [17, 18] theory predictions (right). The LO+NNLL calculation does not have non-perturbative (NP) corrections; the region where these are expected to be large is shown with an open marker (but no correction is applied), while regions where they are expected to be small are shown with a filled marker. All uncertainties described in the text are shown on the data; the uncertainties from the calculations are shown on each one. The distributions are normalized to the integrated cross-section, $\sigma_\mathrm{resum}$, measured in the resummation region, $-3.7 < \log_{10}(\rho^2) < -1.7$. The NLO+NLL+NP cross-section in this resummation regime is 0.14, 0.19, and 0.21 nb for $\beta = 0, 1, 2$, respectively [15].
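The statement that larger $\beta$ removes less radiation can be illustrated with the standard soft-drop condition, $z > z_\mathrm{cut} (\Delta R / R)^\beta$, where $z$ is the momentum fraction of the softer branch. The sketch below checks only this single condition, not the full declustering loop. The value $z_\mathrm{cut} = 0.1$ is an assumption, since this section does not state it; $R = 0.8$ is taken from the caption of Fig. 3. The function name and the example kinematics are hypothetical.

```python
def passes_soft_drop(pt_1, pt_2, delta_r, beta, z_cut=0.1, jet_radius=0.8):
    """Check the soft-drop condition for one pair of subjet branches.

    z_cut=0.1 is an assumed value, not taken from the text above.
    Returns True if the softer branch is kept, False if it would be groomed away.
    """
    z = min(pt_1, pt_2) / (pt_1 + pt_2)  # momentum fraction of the softer branch
    threshold = z_cut * (delta_r / jet_radius) ** beta
    return z > threshold


# A soft branch (z = 0.05) at an opening angle of 0.2, i.e. delta_r / R = 0.25.
soft_branch_kept = {
    beta: passes_soft_drop(pt_1=95.0, pt_2=5.0, delta_r=0.2, beta=beta)
    for beta in (0, 1, 2)
}
```

For this example the thresholds are 0.1, 0.025, and 0.00625 for $\beta = 0, 1, 2$, so the soft branch is removed only for $\beta = 0$ and kept for the larger values.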
In summary, a measurement of the soft-drop jet mass is reported. The measurement provides a comparison of the internal properties of jets between 32.9 fb$^{-1}$ of 13 TeV pp collision data collected by the ATLAS detector at the LHC and precision QCD calculations accurate beyond leading logarithm. Where the calculations are well defined perturbatively, they agree well with the data; in regions where non-perturbative effects are expected to be significant, the calculations disagree with the data and the predictions from MC simulation are better able to reproduce the data. The result is presented as a normalized fiducial dijet differential cross-section as a function of $\log_{10}(\rho^2)$ for each jet, allowing the results to be used to constrain future calculations and MC generator predictions.

References
[1] There are nearly 100 public search results from ATLAS and CMS as well as an even larger number of phenomenological proposals to use jet substructure to enhance various searches.