Measurement of soft-drop jet observables in pp collisions with the ATLAS detector at √s = 13 TeV

Jet substructure quantities are measured using jets groomed with the soft-drop grooming procedure in dijet events from 32.9 fb⁻¹ of pp collisions collected with the ATLAS detector at √s = 13 TeV. These observables are sensitive to a wide range of QCD phenomena. Some observables, such as the jet mass and opening angle between the two subjets which pass the soft-drop condition, can be described by a high-order (resummed) series in the strong coupling constant α S . Other observables, such as the momentum sharing between the two subjets, are nearly independent of α S . These observables can be constructed using all interacting particles or using only charged particles reconstructed in the inner tracking detectors. Track-based versions of these observables are not collinear safe, but are measured more precisely, and universal nonperturbative functions can absorb the collinear singularities. The unfolded data are directly compared with QCD calculations and hadron-level Monte Carlo simulations. The measurements are performed in different pseudorapidity regions, which are then used to extract quark and gluon jet shapes using the predicted quark and gluon fractions in each region. All of the parton shower and analytical calculations provide an excellent description of the data in most regions of phase space.


Introduction
Jets are collimated sprays of particles that are initiated by high-energy quarks and gluons. Grooming techniques systematically remove soft and wide-angle radiation, making the structure of the jet robust against contamination from multiple simultaneous proton-proton interactions (pileup) as well as against final-state radiation and the underlying event. This internal structure of a jet has been successfully used to tag the origin of jets in precision measurements and searches at the Large Hadron Collider (LHC) [1,2]. While grooming has been a powerful tool for applications of jet substructure techniques, it also provides a unique opportunity for the study of the strong force itself. If groomed in a suitable way, the radiation pattern inside the resulting jet can be predicted from first principles in QCD. The differential cross sections as a function of key observables such as the groomed jet mass have been computed beyond leading-logarithmic accuracy [3][4][5][6][7][8] as an expansion in the strong coupling constant α S along with logarithms of ratios of physical scales. New 'Sudakov safe' observables [9] that are the ratio of attributes that are both infrared-safe and collinear-safe cannot be expressed as an expansion in α S , but can be described with a series in fractional powers of α S . For particular grooming configurations, observables such as the ratio of subjet energies can be independent of α S [9]. These nonstandard and universal behaviors are now being tested with precision at the LHC and the Relativistic Heavy Ion Collider (RHIC).
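The soft-drop condition for the two subjets j 1 and j 2 obtained by undoing the last clustering step of a jet j is

$\min(p_{T,j_1}, p_{T,j_2}) / (p_{T,j_1} + p_{T,j_2}) > z_{\mathrm{cut}} \, (\Delta R_{12}/R)^{\beta}$, (1)

where z cut and β are the algorithm parameters, ∆R 12 is the opening angle between the two subjets, and R is the jet radius parameter. If j 1 and j 2 fail the soft-drop condition, then the subjet with the lower p T is removed, the one with the higher p T is relabeled as j, and the procedure is iterated. If the soft-drop condition is satisfied, then the algorithm is stopped, and the resulting jet j is the soft-dropped jet. If no pairs of subjets in the declustering satisfy the soft-drop condition, then the resulting jet is the zero vector.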
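The declustering loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code: it assumes the standard soft-drop condition min(p T,j1 , p T,j2 )/(p T,j1 + p T,j2 ) > z cut (∆R 12 /R)^β, and the `Node` class, the `subjets` field, and the function names are hypothetical stand-ins for a clustering history that a jet library would normally provide.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """One node of a (hypothetical) clustering tree, e.g. from Cambridge/Aachen."""
    pt: float                 # transverse momentum
    y: float                  # rapidity
    phi: float                # azimuthal angle in radians
    subjets: Optional[Tuple["Node", "Node"]] = None  # None for a single constituent


def delta_r(a: Node, b: Node) -> float:
    """Angular distance in the (y, phi) plane, with the phi difference wrapped."""
    dphi = (a.phi - b.phi + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a.y - b.y, dphi)


def soft_drop(jet: Node, z_cut: float = 0.1, beta: float = 0.0, radius: float = 0.8):
    """Return (groomed_jet, z_g, r_g), or None if no splitting passes."""
    j = jet
    while j.subjets is not None:
        j1, j2 = j.subjets
        z = min(j1.pt, j2.pt) / (j1.pt + j2.pt)
        dr = delta_r(j1, j2)
        if z > z_cut * (dr / radius) ** beta:
            return j, z, dr  # condition satisfied: stop declustering here
        # Condition failed: drop the softer branch and keep declustering.
        j = j1 if j1.pt >= j2.pt else j2
    return None  # analogue of the "zero vector" outcome in the text
```

For a jet made of a hard two-prong core plus one soft wide-angle constituent, calling `soft_drop` with β = 0 removes the soft constituent and returns the core splitting, while a larger β keeps the first splitting, in line with the statement that grooming weakens as β grows.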
The parameters z cut and β determine the sensitivity of the algorithm to soft and wide-angle radiation. As β → ∞ (and z cut < 1), the soft-drop condition is always satisfied, and no grooming is applied. Decreasing β preferentially removes wide-angle radiation and increasing z cut preferentially removes soft radiation. The theoretical calculations are performed for a range in β and assume z cut is small enough so that it does not introduce large logarithms (which was explicitly checked in Refs. [5,6]). This measurement adopts the same choice as the available theoretical calculations: z cut = 0.1 and β ≥ 0. Several β values are tested to probe different scales of angular structure inside the jets. This paper measures three closely related substructure observables, which are calculated from jets after they have been groomed with the soft-drop algorithm. These are the jet mass, the p T balance z g (which is the left-hand side of Eq. (1)) of the splitting which passes the soft-drop condition, and r g , which is the opening angle ∆R 12 of this splitting in Eq. (1). These three observables (the jet mass, z g and r g ) are described in greater detail in Section 5.2. These observables are approximately related by $m^2/p_T^2 \sim z_g r_g^2$, and each probes different aspects of the structure of the jet.
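The approximate relation between the three observables can be motivated with a short worked derivation. The sketch below assumes two massless subjets at small opening angle and identifies the jet p T with the scalar sum of the two subjet transverse momenta; both are simplifying assumptions introduced here for illustration.

```latex
\begin{align}
m^2 &= 2\, p_{T,j_1}\, p_{T,j_2} \left(\cosh\Delta y_{12} - \cos\Delta\phi_{12}\right)
     \approx p_{T,j_1}\, p_{T,j_2}\, \Delta R_{12}^2 , \\
\frac{m^2}{p_T^2} &\approx z_g \,(1 - z_g)\, r_g^2 ,
\qquad p_T \equiv p_{T,j_1} + p_{T,j_2},\quad
z_g = \frac{\min(p_{T,j_1}, p_{T,j_2})}{p_T},\quad r_g = \Delta R_{12}.
\end{align}
```

For asymmetric splittings the factor (1 - z g ) is close to one, which gives the scaling m²/p T² ∼ z g r g².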

ATLAS detector
The ATLAS detector [30] at the LHC covers nearly the entire solid angle around the collision point. It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadronic calorimeters, and a muon spectrometer incorporating three large superconducting toroidal magnets.
The inner-detector system (ID) is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range |η| < 2.5. The high-granularity silicon pixel detector, the innermost layer of the tracking detector, covers the vertex region and typically provides four measurements per track, the first hit being typically recorded in the insertable B-layer that was installed before Run 2 [31,32]. It is followed by the silicon microstrip tracker, which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker, which enables radially extended track reconstruction up to |η| = 2.0.
The calorimeter system covers the pseudorapidity range |η| < 4.9. Within the region |η| < 3.2, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) detectors, with an additional thin LAr presampler covering |η| < 1.8, to correct for energy loss in material upstream of the detectors. Hadronic calorimetry is provided by the steel/scintillator-tile detector, segmented into three barrel structures within |η| < 1.7, and two copper/LAr hadronic endcap calorimeters which cover 1.5 < |η| < 3.2. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules covering 3.1 < |η| < 4.9, which are optimized for electromagnetic and hadronic measurements respectively.
Interesting events are selected for recording by the first-level trigger system implemented in custom hardware, followed by selections made by algorithms implemented in software in the high-level trigger [33]. The first-level trigger makes decisions at the 40 MHz bunch crossing rate to keep the accepted-event rate below 100 kHz, which the high-level trigger further reduces in order to record events to disk at about 1 kHz.

Datasets
These measurements use the dataset of pp collisions recorded by the ATLAS detector in 2016, corresponding to an integrated luminosity of 32.9 fb⁻¹ [34,35] at a center-of-mass energy of √s = 13 TeV. Events are only considered if they were collected during stable beam conditions and satisfy all data quality requirements [36]. Due to the high instantaneous luminosity and the large total inelastic proton-proton (pp) cross section, on average there are about 25 simultaneous (pileup) collisions in each bunch crossing.
The measurements presented in this paper use a variety of Monte Carlo (MC) event generator samples to estimate the impact of detector efficiency and resolution as well as for comparison with the unfolded data. Dijet events were generated at leading order (LO) with Pythia 8.186 [37,38], with the 2 → 2 matrix element convolved with the NNPDF2.3LO parton distribution function (PDF) set [39] and using the A14 set of multiple-parton-interaction and parton-shower parameters [40]. Pythia 8 uses a p T -ordered parton shower model. Additional dijet events were generated using different generators, in order to study the impact of modeling uncertainties. Sherpa 2.1 [41,42] was used to generate events using multileg 2 → 2 and 2 → 3 matrix elements, which were matched to parton showers following the CKKW prescription [43]. These Sherpa events were generated using the CT10nlo PDF set [44] and the default Sherpa set of tuned parameters. Herwig++ 2.7 [45,46] was used to provide a sample of events with an angle-ordered parton shower model. These events were generated with the 2 → 2 matrix element, convolved with the CTEQ6L1 PDF set [47] and configured with the UE-EE-5 set of tuned parameters [48].
All generator events were passed through a full simulation of the ATLAS detector [49] implemented in Geant4 [50], which describes the interactions of particles with the detector and the subsequent digitization of analog signals. The effects of pileup were simulated with unbiased pp collisions using the Pythia 8.186 generator with the A2 [51] set of tuned parameters and the MSTW2008LO [52] PDF set; these events were overlaid on the nominal dijet events. These events are then reweighted such that the distribution of the average number of interactions per bunch crossing matches that seen in data.
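The pileup reweighting step can be illustrated with a small sketch. It assumes that the average number of interactions per bunch crossing is binned in integer values and that the weight is the ratio of the normalized data and simulation distributions; the function name and the binning are hypothetical choices made for this example.

```python
from collections import Counter


def pileup_weights(mu_data, mu_mc):
    """Map each <mu> value found in simulation to the weight data_fraction / mc_fraction."""
    data_counts = Counter(mu_data)
    mc_counts = Counter(mu_mc)
    n_data, n_mc = len(mu_data), len(mu_mc)
    weights = {}
    for mu, n in mc_counts.items():
        # Fraction of data events at this <mu> divided by the simulated fraction.
        weights[mu] = (data_counts.get(mu, 0) / n_data) / (n / n_mc)
    return weights
```

Applying `weights[mu]` to each simulated event makes the weighted simulated distribution match the data distribution bin by bin, while preserving the total number of simulated events when every data value also occurs in simulation.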

Event selection and object reconstruction
Since the data are unfolded to particle level, it is necessary to define both the particle-level and detector-level objects used in the measurement. The former are chosen to be as close as possible to the latter in order to minimize the model dependence caused by an extrapolation from the phase space measured at detector level to the phase space measured at particle level. Section 5.1 describes the particle-level and detector-level event selection criteria. Following this, Section 5.2 describes the particle-level and detector-level jet reconstruction procedure for both the calorimeter-based (all-particle) observables and the track-based (charged-particle) observables.

Jet and event selection
Detector-level events are required to have at least one primary vertex reconstructed from at least two tracks with p T greater than 400 MeV. The primary hard-scattering vertex of the event is chosen to be the one with the highest $\sum p_T^2$ of its associated tracks. The inputs to the jet clustering algorithm are locally calibrated topological calorimeter-cell clusters [53].
Jets are clustered with FastJet [54] using the anti-k t [55] algorithm with radius parameter R = 0.8. A series of simulation-based calibration factors are applied to ensure that the detector-level jet p T is the same as the particle-level value on average [56]. Each event is required to have at least two reconstructed jets, where the transverse momentum of the leading jet, $p_T^{\text{lead}}$, is greater than 300 GeV. The jet selection is applied to ungroomed jets, which ensures that the same jets are studied for all grooming configurations. In order to enhance the dijet topology and allow an interpretation of quark or gluon origin of the jets in the event, the leading two jets are required to be well-balanced: $p_T^{\text{lead}}/p_T^{\text{sublead}} < 1.5$. Both jets are required to have |η| < 1.5, and only jets with a nonzero mass are retained.
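The dijet selection can be summarized as a short function. This is a sketch under stated assumptions: the `Jet` class and `select_dijet` are hypothetical names, jets are taken to be calibrated and ungroomed, and the nonzero-mass requirement is interpreted as a per-jet filter applied after the event-level cuts.

```python
from dataclasses import dataclass


@dataclass
class Jet:
    pt: float    # GeV, calibrated and ungroomed
    eta: float
    mass: float  # GeV


def select_dijet(jets, pt_lead_min=300.0, balance_max=1.5, eta_max=1.5):
    """Return the retained leading/subleading jets, or an empty list if the event fails."""
    if len(jets) < 2:
        return []
    lead, sublead = sorted(jets, key=lambda j: j.pt, reverse=True)[:2]
    if lead.pt <= pt_lead_min:
        return []
    if lead.pt / sublead.pt >= balance_max:   # dijet balance requirement
        return []
    if abs(lead.eta) >= eta_max or abs(sublead.eta) >= eta_max:
        return []
    # Only jets with a nonzero mass are retained.
    return [j for j in (lead, sublead) if j.mass > 0.0]
```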
Events are selected using single-jet triggers. Due to the large cross section for jet production, most of the jet triggers are prescaled. Therefore events which pass these triggers are randomly discarded with some fixed probability. The lowest-p T -threshold unprescaled R = 0.4 single-jet trigger in 2016 is fully efficient for R = 0.8 dijet events where the leading-jet p T is greater than 600 GeV. In events where the leading jet has 300 GeV < p T < 600 GeV, a prescaled trigger is used with an average prescale value of 1000 (the inverse of the probability to be recorded). While this results in a lower effective luminosity, it provides access to the lower p T region.
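The effect of the prescale on event weights and on the effective luminosity can be made concrete with a small sketch. Using the quoted average prescale of 1000 as a fixed per-event weight is a simplifying assumption made here, and the function names are hypothetical.

```python
def trigger_weight(lead_pt_gev, prescale=1000.0, unprescaled_threshold=600.0, pt_min=300.0):
    """Weight that compensates for the prescale of the trigger used in each p T range."""
    if lead_pt_gev <= pt_min:
        return 0.0       # outside the measured phase space
    if lead_pt_gev > unprescaled_threshold:
        return 1.0       # unprescaled trigger: every event is kept
    return prescale      # prescaled trigger: one recorded event stands for `prescale` events


def effective_luminosity(luminosity_fb, prescale):
    """Luminosity effectively sampled by a trigger with the given prescale."""
    return luminosity_fb / prescale
```

With the numbers quoted in the text, the prescaled region samples roughly 32.9 fb⁻¹ / 1000 ≈ 0.033 fb⁻¹, which is the lower effective luminosity mentioned above.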
The inputs to particle-level jets are stable particles (cτ > 10 mm) excluding muons and neutrinos. These jets are clustered using the same radius parameter as the detector-level jets and have the same η and p T cuts as for the detector-level selection.

Inputs for jet substructure
Two types of jet substructure observables are measured: calorimeter-based observables, which correspond to observables reconstructed from all particles inside the jet at particle level, and track-based observables, which correspond to observables reconstructed from charged particles. Track-based observables are theoretically more complicated to describe, but are experimentally cleaner to measure due to the precise angular measurement from the ID. For both the calorimeter-based and track-based measurements, the jet selection is performed on the calorimeter-based jets, while the soft-drop grooming is applied to the cluster inputs and the track inputs respectively (Section 2). The jets after the application of this algorithm are often referred to as groomed, and the constituents of these jets are used to compute the jet substructure observables. It is noted that since the event selection is applied to un-groomed jets, some selected jets are left with one constituent after grooming, resulting in jets with a mass of zero.
For the calorimeter-based observables, the same constituents are used to calculate the observables as are used to create the jets described in Section 5.1 for both detector level and particle level. For detector-level track-based observables, the soft-drop procedure is applied to tracks matched to the ungroomed jet via ghost association [57], and jet substructure observables are calculated using the groomed tracks. These tracks are selected with a p T > 500 MeV requirement and assigned to the primary vertex in accord with the track-to-vertex matching. Tracks not included in vertex reconstruction are assigned to the primary vertex if it has the smallest |∆z 0 sin θ| compared to any other reconstructed vertex, up to a maximum distance of 3.0 mm. Tracks not matched to the primary vertex are not considered. At particle level, these track-based observables are built using the charged-particle constituents of the particle-level jets, excluding muons.
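The track selection relative to the primary vertex can be sketched as follows. The example covers only the |∆z 0 sin θ| matching described for tracks not used in vertex reconstruction; tracks already used in a vertex fit would simply inherit that vertex. The `Track` class, the function names, and the convention that the primary vertex has index 0 are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    pt: float     # GeV
    theta: float  # polar angle in radians
    z0: float     # longitudinal position at the beam line, in mm


def matched_to_primary_vertex(track, vertex_z, pv_index=0, max_distance_mm=3.0):
    """True if the primary vertex is the closest vertex in |dz0 sin(theta)| and within range."""
    distances = [abs((track.z0 - z) * math.sin(track.theta)) for z in vertex_z]
    closest = min(range(len(vertex_z)), key=distances.__getitem__)
    return closest == pv_index and distances[pv_index] <= max_distance_mm


def select_tracks(tracks, vertex_z, pt_min_gev=0.5):
    """Keep tracks above the p T threshold that are matched to the primary vertex."""
    return [t for t in tracks
            if t.pt > pt_min_gev and matched_to_primary_vertex(t, vertex_z)]
```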
Both the leading and subleading jet are used in this measurement. In order to expose differences between quark and gluon jets, the more forward and more central of the two jets are distinguished and measured separately. Between the leading and subleading jets, the one with the smaller |η| will be referred to as the "central" jet, and the other one as the "forward" jet. For a fixed jet p T at high rapidity where the high-x contribution is more important, jets are more often quark-initiated due to the large contribution of valence quarks.

Observables
Three substructure observables are calculated from the two jets groomed with the soft-drop algorithm (using the C/A algorithm with R = 0.8 to recluster the jets), including the jet mass, z g , and r g . These three observables completely characterize the splitting from the soft-drop condition, and they are all measured using both the calorimeter and tracker inputs.
Jet mass: One of the most basic and important jet substructure observables is the jet mass:

$m^2 = \left(\sum_i E_i\right)^2 - \left(\sum_i \vec{p}_i\right)^2$,

where i refers to the constituents of the jet, with energy E i and momentum vector p i . The measurement is performed for a dimensionless version of the jet mass: the relative mass $\rho \equiv \log(m^2/p_T^2)$, where m is groomed and p T is ungroomed (groomed jet p T is not infrared- and collinear-safe [5]). The calorimeter-cluster inputs are treated as massless and tracks are assigned the pion mass. Since the probability distribution of ρ is approximately linear in the resummation regime ($\Lambda_{\text{QCD}}/p_T \ll m/p_T \ll z_{\text{cut}}$, where Λ QCD is the energy scale of hadronization) [3][4][5][6][7][8], the binning for ρ is evenly spaced. For ρ, the distributions are normalized to the integrated cross section, σ resum , measured in the resummation region, −3.7 < ρ < −1.7. As β increases, the distribution shifts to higher values because fewer constituents are removed from the jets during grooming.
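As an illustration of the mass definition, the sketch below builds four-vectors from (p T , η, φ, mass) and computes the invariant mass and the relative mass. The function names are hypothetical, and the base-10 logarithm is an assumption made for this example, since the text writes log without specifying a base.

```python
import math


def four_vector(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) for a constituent with the given kinematics."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def jet_mass(constituents):
    """Invariant mass of the summed constituent four-vectors."""
    e = sum(c[0] for c in constituents)
    px = sum(c[1] for c in constituents)
    py = sum(c[2] for c in constituents)
    pz = sum(c[3] for c in constituents)
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors


def relative_mass(groomed_mass, ungroomed_pt):
    """rho = log10(m^2 / pT^2); the base-10 logarithm is an assumption of this sketch."""
    return math.log10((groomed_mass / ungroomed_pt) ** 2)
```

Calorimeter clusters would enter with `mass=0.0` and tracks with the pion mass, following the conventions stated in the text.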
An example of the distribution of ρ in simulation at detector level (particle level) for the calorimeter-based (all particles) definition is shown in Figure 1(a) for the more central of the two jets and for β = 0. For this observable, particularly in the lower-relative-mass region, there are nontrivial detector effects which occur due to the calorimeter granularity, resulting in a distribution with different shapes at particle and detector levels. As expected, the distribution of $\log(m^2/p_T^2)$ is approximately linear for β = 0 in the resummation regime.
One way to reduce the impact of these detector corrections is to consider track-based (charged-particle) observables. An example of the track-based (charged-particle-based) ρ is shown in Figure 1(b), where tracks (charged particles) are used for both the mass and the p T . As in the calorimeter case, the mass is calculated using the groomed jet, while the p T is calculated using the ungroomed constituents, but no calibration is applied to the ungroomed jet since no such calibration exists for track-based inputs. Although the particle-level distributions only include charged particles, the distributions are similar to those shown in Figure 1(a), but in this case the impact of the detector corrections is significantly smaller.

z g : An important quantity when describing the hard splitting scale that defines the mass is z g , which is $\min(p_{T,j_1}, p_{T,j_2})/(p_{T,j_1} + p_{T,j_2})$ for the splitting that satisfies the soft-drop condition. If no such splitting occurs, then the jet is not included in the measurement. Symmetric splittings are characterized by z g ∼ 0.5. Figure 2 shows an example of the normalized distribution in simulation of z g at detector level (particle level) with β = 0 for both the calorimeter-based (all particles) and track-based (charged particles) definitions. For β = 0 and z cut = 0.1, z g must be greater than 0.1 in order to pass the soft-drop condition, and therefore bins with z g values less than 0.1 are not shown (this is not the case for β > 0). As in the case with the mass, the distributions of the charged-particles and all-particles versions of z g are similar. Detector effects for the calorimeter-based z g are smaller than for the relative mass, because z g is less sensitive to the angular distribution of energy within the jet.
The binning is evenly spaced in z g and the distributions are normalized to the integrated cross section σ.

r g : The opening angle ∆R 12 between the two subjets that pass the soft-drop condition is r g . This angle is smaller than the jet radius by definition. Although r g is highly correlated with the relative mass and z g , it is useful for explicitly exposing the angular distribution. Figure 3 shows an example of the normalized calorimeter-based (all particles) and track-based (charged particles) r g distributions. As expected, there are large detector effects for the calorimeter-based case, especially at low angles. Due to the correlation between mass and r g , the distribution shapes and detector effects look similar to the ones shown in Figure 1.
The binning for r g is logarithmically spaced. The distributions are normalized to the integrated cross section σ. As for ρ, increasing β shifts the distribution to higher values as there is less grooming.

Figure 2: The distribution in simulation of z g at detector level and particle level for β = 0 for (a) calorimeter-based (all particles), and (b) track-based (charged particles). The statistical uncertainties are drawn, but are too small to be visible.

Figure 3, panel (a): r g distribution, β = 0, calorimeter-based.

Unfolding
The substructure observables are reconstructed in bins of the transverse momentum of the jet, and the double-differential distributions are unfolded using Pythia 8.186. An iterative Bayesian technique [58] is used with one (four) iterations for track-based (calorimeter-based) observables. The number of iterations was chosen to minimize the total uncertainty, and the unfolding is implemented in the RooUnfold framework [59].
The probability distributions of obtaining a particle-level value given a detector-level observation, Pr(particle-level|detector-level), in Pythia 8 for β = 0 are presented for all three observables for the calorimeter-based and track-based definitions in Figure 4. While the unfolding is done simultaneously in p T and the jet observable, the unfolding matrices are shown inclusively in p T for simplicity. As anticipated, the unfolding matrices for the track-based observables have significantly smaller off-diagonal elements than their calorimeter-based analogs.
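The idea behind the iterative Bayesian technique can be illustrated with a compact, pure-Python sketch. It is not the RooUnfold implementation used in the measurement: it ignores fakes, regularization details, and uncertainty propagation, and the function name and argument layout are hypothetical. The response matrix is assumed to be stored as `response[j][i]` = Pr(detector bin j | particle bin i).

```python
def bayes_unfold(measured, response, prior, iterations):
    """Iteratively update the particle-level spectrum using Bayes' theorem."""
    n_true, n_meas = len(prior), len(measured)
    # Efficiency of each particle-level bin: probability to be reconstructed anywhere.
    eff = [sum(response[j][i] for j in range(n_meas)) for i in range(n_true)]
    unfolded = list(prior)
    total = sum(prior)
    p = [x / total for x in prior]
    for _ in range(iterations):
        new = [0.0] * n_true
        for j in range(n_meas):
            norm = sum(response[j][k] * p[k] for k in range(n_true))
            if norm == 0.0:
                continue
            for i in range(n_true):
                # Pr(particle bin i | detector bin j) for the current prior p.
                new[i] += response[j][i] * p[i] / norm * measured[j]
        unfolded = [new[i] / eff[i] if eff[i] > 0.0 else 0.0 for i in range(n_true)]
        total = sum(unfolded)
        p = [u / total for u in unfolded]
    return unfolded
```

A small number of iterations keeps the result closer to the prior and limits statistical fluctuations, while more iterations reduce the dependence on the prior, which is the trade-off behind choosing the iteration count to minimize the total uncertainty.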

Uncertainties
Several sources of statistical and systematic uncertainties are considered for this analysis. The data and simulation statistical uncertainties are evaluated from pseudo-experiments using the bootstrap method [60]. The uncertainties from the calorimeter-cell reconstruction, track reconstruction, and MC modeling are determined by applying variations to the simulation, as detailed in Sections 8.1, 8.2, and 8.3, respectively. The impact of the calorimeter-cell cluster uncertainties on the jets is taken into account for both the calorimeter-based measurement as well as the track-based measurement since it impacts the selection of jets. The varied simulation is then used to repeat the unfolding procedure and the deviation from the nominal result is used to estimate the uncertainty. The uncertainty in the pileup modeling is determined by reweighting the pileup profile up by 10% in MC simulation. The uncertainty in the unfolding procedure (unfolding nonclosure) is computed using a data-driven reweighting procedure [61]. In this method, the particle-level spectrum is reweighted such that the reconstructed spectrum better matches the data distribution, while the response matrix is left unchanged. The difference between the reweighted detector-level simulation after unfolding and the generator-level simulation from the same generator is then taken as an uncertainty. All uncertainties are symmetrized unless stated otherwise.
A summary of all the uncertainties considered is given in Section 8.4.
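One common way to build bootstrap pseudo-experiments is to give every event an independent Poisson-distributed weight with unit mean in each replica and to take the spread of the replicas as the statistical uncertainty. The sketch below follows that approach as an assumption; the text only states that the bootstrap method is used, and the function names are hypothetical.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw a Poisson-distributed integer with mean lam (Knuth's algorithm)."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def bootstrap_histograms(values, edges, n_replicas=100, seed=1):
    """Fill one histogram per replica, weighting each event by a Poisson(1) draw."""
    rng = random.Random(seed)
    replicas = []
    for _ in range(n_replicas):
        hist = [0.0] * (len(edges) - 1)
        for v in values:
            w = poisson(rng, 1.0)
            for b in range(len(hist)):
                if edges[b] <= v < edges[b + 1]:
                    hist[b] += w
                    break
        replicas.append(hist)
    return replicas


def bin_std(replicas):
    """Per-bin standard deviation across replicas."""
    n_bins = len(replicas[0])
    return [statistics.pstdev([h[b] for h in replicas]) for b in range(n_bins)]
```

In a full analysis each replica would be passed through the unfolding, so that the spread reflects the statistical uncertainty of the unfolded result rather than of the raw histogram.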

Calorimeter-cell cluster uncertainties
Uncertainties on the reconstruction of calorimeter cell clusters are estimated using comparisons between tracks with momentum p and clusters with energy E in data and in simulation.
Calorimeter cell clusters require seed cells that exceed the noise threshold; if a particle interacts with the material in front of the calorimeter and produces many spread-out low-energy secondary particles, there may not be sufficient localized energy to seed a cluster. The rate at which particles do not seed a cluster is studied with tracks that do not match a calorimeter cell cluster within ∆R < 0.2, where tracks are extrapolated to the calorimeter layer corresponding to the energy-weighted position of the calorimeter cell cluster. This rate is studied at 13 TeV pp collisions using tracks isolated from all other track candidates by at least ∆R = 0.4. The data/MC difference is then used to derive the cluster reconstruction efficiency uncertainty, which is evaluated in bins of pseudorapidity and energy [62]. To assess the impact of this uncertainty on the unfolded results, clusters are randomly removed at a rate determined by the measured difference between data and simulation, which is less than 5% for low-momentum clusters and negligible beyond 10 GeV.
The cluster energy scale and resolution uncertainties are determined in three separate regions. For E < 30 GeV, there are enough events to derive these uncertainties using the full E/p distribution in data [62]. For any clusters with 30 < E < 350 GeV, the uncertainties are derived from the combined testbeam data [63]. Finally, for regions outside of the testbeam and E/p coverage, a p T - and η-independent 10% uncertainty is assigned as a conservative estimate of the uncertainty, as done in previous studies [62].
For the regions where the uncertainty is derived using E/p, the mean and standard deviation of the distributions are extracted in bins of E and |η|. Only tracks with at least one associated cluster are included, using the same matching criteria as for the cluster efficiency. Depending on the fit quality, either the mean and σ of a Gaussian fit to the data, or the distribution mean and RMS values are used. For example, for p ≈ 25 GeV and η ≈ 0, the data and simulation are consistent with E/p = 1 and σ(E/p) = 0.22 within 1% for the mean and 5% for the standard deviation.
To evaluate the cluster energy scale uncertainty, the cluster energy in simulation is shifted according to the difference of the E/p mean value between data and MC simulation. Similarly, to evaluate the cluster energy resolution uncertainty, cluster energies are smeared according to data/MC differences in the E/p distribution by one standard deviation. The effect of the energy scale and resolution uncertainties is defined as the relative difference between the nominal and modified jet substructure observable. A series of validation studies which probe the jet energy scale, jet mass scale, and jet mass resolution were performed to ensure that this prescription is also valid for nonisolated clusters within jets.
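The shift-and-smear prescription can be sketched as follows. The numerical sizes of the shift and of the additional resolution are inputs that would come from the E/p comparison in bins of E and |η|; any values passed in the example usage are placeholders, and the function names are hypothetical.

```python
import random


def shift_energies(energies, scale_shift):
    """Scale every cluster energy by (1 + scale_shift), e.g. scale_shift = 0.01 for 1%."""
    return [e * (1.0 + scale_shift) for e in energies]


def smear_energies(energies, extra_resolution, seed=0):
    """Multiply each cluster energy by a Gaussian factor of width extra_resolution."""
    rng = random.Random(seed)
    return [e * rng.gauss(1.0, extra_resolution) for e in energies]


def relative_difference(nominal, varied):
    """Relative change of an observable between the nominal and the varied simulation."""
    return (varied - nominal) / nominal
```

The observable of interest would be recomputed from the varied cluster energies and compared with its nominal value through `relative_difference`.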
The cluster angular resolution is estimated using a similar method by studying the modeling of the ∆R distribution between tracks and calorimeter-cell clusters.

Tracking uncertainties
Systematic uncertainties are evaluated for the track reconstruction efficiency, fake rate, and momentum scale. The efficiency is decomposed into two components: one from the uncertainty in the inner detector material ('inclusive efficiency') and one from the modeling of pixel cluster merging inside dense environments, such as inside the core of high-energy jets ('efficiency within jets').
The inclusive efficiency uncertainty is due to the material uncertainty, which is constrained by detector construction knowledge and photon conversions as well as hadronic interactions [64]. The total relative uncertainty on the efficiency is 0.5% for |η| < 0.1 and grows to 2.7% for the 2.3 < |η| < 2.5 region.
The impact of this uncertainty in the measured distributions is evaluated by randomly removing tracks in simulation with a p T - and |η|-dependent probability.
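A random track-removal variation of this kind can be sketched as below. The removal probability is supplied by the caller; the example map is purely hypothetical, interpolating linearly in |η| between the 0.5% and 2.7% values quoted above and ignoring any p T dependence, and tracks are represented as dictionaries with an "eta" key for illustration only.

```python
import random


def remove_tracks(tracks, drop_probability, seed=0):
    """Drop each track independently with probability drop_probability(track)."""
    rng = random.Random(seed)
    return [t for t in tracks if rng.random() >= drop_probability(t)]


def example_drop_probability(track):
    """Hypothetical map: linear in |eta| from 0.5% at 0 to 2.7% at 2.5 (not from the paper)."""
    low, high = 0.005, 0.027
    return low + (high - low) * min(abs(track["eta"]), 2.5) / 2.5
```

The observables are then recomputed from the surviving tracks and unfolded again, and the deviation from the nominal result is taken as the uncertainty.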
The uncertainty in the tracking efficiency in dense environments is due to the modeling of pixel cluster merging. This is studied using the dE/dx method [65,66]: the rate of pixel clusters assigned to single tracks with a large charge (comparable to twice a minimum ionizing particle charge) in the core of jets is measured in data and in simulation. The comparison between data and simulation results in an additional 0.4% (absolute) uncertainty that is only applied to tracks within a ∆R = 0.1 of a jet.
Fake tracks result from random combinations of hits mostly from charged particles that happen to overlap in space. Outside of jets, the fake rate is highly pileup dependent, as the chances for many low-p T particles to be close increases with the number of particles in the event. However, inside jets, the density from primary charged particles is also high and can result in an increased fake rate. The fake rate itself is much less than 1%, but fake tracks can have a large p T . The modeling of the fake rate is studied with a dedicated measurement that enriches the rate of fake tracks by inverting various track quality criteria [67]. The simulation reproduces the fake rate to within about 30% of the observed rate in data. The fake-rate uncertainty is estimated by randomly removing 30% of fake tracks.
The leading source of uncertainty in the track parameters is in the q/p T (q is the electric charge) from a potential sagitta distortion due to detector-misalignment weak modes [68]. This bias is corrected for, once per data-taking period, and the correction is about 0.1/TeV except at φ ≈ 0 and |η| ≈ 2.5 where the correction can reach 1/TeV. The impact on the measurement is smaller than that of the other tracking uncertainties.

Modeling uncertainty
Since the detector response depends on the energy and angular distribution of particles inside jets, it is sensitive to the fragmentation model used for the unfolding. An uncertainty is estimated by repeating the unfolding procedure using Sherpa and comparing that with the nominal unfolding that uses Pythia 8, and taking the full difference as the uncertainty. The result of performing this procedure with Herwig++ instead of Sherpa produces a similar uncertainty. In addition to the direct sensitivity to the fragmentation modeling, there is also an indirect sensitivity to the quark/gluon fractions and the jet momentum distribution. An uncertainty due to the PDFs is evaluated as the spread in the unfolded distributions from 100 NNPDF2.3LO eigenvector variations.

Figure 5 presents a summary of the total and individual uncertainties for all observables and β = 0 for both the calorimeter-based and track-based measurements, where all of the uncertainties are summed in quadrature to obtain the total uncertainty. The uncertainties change with β, due to the differing angular sensitivity, but the overall conclusions are similar. For the calorimeter-based ρ, the fragmentation modeling is the dominant uncertainty for most of the mass range, while the pileup modeling and cluster energy scale uncertainties dominate at high relative mass. A similar description is true for the track-based ρ, where the fragmentation modeling is the dominant uncertainty across the entire ρ range and the effects from the unfolding nonclosure are subdominant, while the tracking uncertainties are typically negligible. Analogous results hold for the calorimeter-based r g observable, while for the track-based r g measurement, subdominant effects are seen from the cluster energy scale, fake rate, and data statistical uncertainty. For calorimeter-based z g , the cluster energy scale and modeling uncertainties are most important, and the uncertainties are generally smaller than for ρ and r g .
A similar description holds for the track-based z g , whose uncertainty is dominated by the modeling and unfolding nonclosure uncertainties.
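The combination of uncertainties can be illustrated with a short sketch: a two-point modeling uncertainty taken as the full relative difference between two unfolded results, and a per-bin sum in quadrature over all sources. The function names and the dictionary layout are hypothetical.

```python
import math


def full_difference(nominal, alternative):
    """Per-bin relative difference between two unfolded spectra."""
    return [abs(a - n) / n for n, a in zip(nominal, alternative)]


def total_uncertainty(sources):
    """Sum per-bin relative uncertainties from all sources in quadrature."""
    n_bins = len(next(iter(sources.values())))
    return [math.sqrt(sum(values[b] ** 2 for values in sources.values()))
            for b in range(n_bins)]
```

For example, two sources of 3% and 4% in the same bin combine to a 5% total, which shows why the largest source dominates the total uncertainty.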

Results
The unfolded data are presented in several different ways, in order to highlight various aspects of the measurement. Since these distributions change slowly as a function of p T , most of the results are shown inclusively in p T . Section 9.1 provides a comparison between the unfolded data and several MC predictions, highlighting the various regions of each measurement which are well-modeled by simulation. This is followed by a comparison between the unfolded data and state-of-the-art analytical predictions in Section 9.2. Section 9.3 directly compares the results of the measurements of the calorimeter- and track-based observables. While these observables are unfolded to different particle-level definitions, this comparison highlights the similarities between the different definitions, as well as demonstrating the improved precision in track-based measurements of observables sensitive to the angular structure of the jet. The forward and central measurements are compared in Section 9.4, and these measurements are used as input to the extraction of the quark- and gluon-jet distributions of these observables, which are shown in Section 9.5.

The accompanying figures compare the unfolded data from both jets with the particle-level distributions from the MC generators described in Section 4. Several trends are visible in these results. For ρ, the MC predictions are mostly accurate within 10% except for the lowest relative masses, which are dominated by nonperturbative physical effects. This becomes more visible for larger values of β, where more soft radiation is included within the jet, increasing the size of the nonperturbative effects. In addition, in the high-relative-mass region, where the effects of the fixed-order calculation are relevant, some differences between MC generators are seen. A similar trend may be seen for r g , where the small-angle region shows more pronounced differences between MC generators, since this corresponds to the region where nonperturbative effects are largest.
Overall, these effects are smaller than for the relative mass. Unlike the other two observables, z g is modeled well within about 10% across most of the spectrum. However, there is some tension between the predictions and the unfolded data, which is visible particularly for the track-based observables, which have better precision.

Comparison with MC predictions
In general, the MC predictions show similar behavior for the calorimeter-based and track-based definitions, both in their overall distributions and in their agreement with the unfolded data distribution. However, as the tracking measurement is more precise, the disagreement between data and MC simulation in the nonperturbative regions is more significant. For instance, in Figure 7(e)-7(f), the Herwig++ prediction does not agree with the unfolded distribution at high values of z g for the track-based case, but it does agree in the calorimeter-based case. Pr(particle-level | detector-level) 4 Particle-level Pr(particle-level | detector-level) 4 Particle-level Pr(particle-level | detector-level) Detector-level log Pr(particle-level | detector-level) Detector-level log    (e) r g , β = 0, calorimeter-based Ratio to Data (c) β = 1, calorimeter-based Ratio to Data (e) β = 2, calorimeter-based Ratio to Data Ratio to Data (a) β = 0, calorimeter-based Ratio to Data Ratio to Data (c) β = 1, calorimeter-based Ratio to Data (d) β = 1, track-based Ratio to Data (e) β = 2, calorimeter-based

Comparison with analytical predictions
Analytical predictions are currently available only for observables that include both charged and neutral particles, so the comparisons in this section use only the calorimeter-based results. Only ρ and r g are studied: subleading logarithms have been computed for these two observables, as described below, whereas no predictions exist for z g beyond leading-logarithmic accuracy. Several calculations of the ρ distribution have been performed, and these predictions are compared with the unfolded data.
The LO+NNLL and NNLL calculations are based on soft collinear effective theory (SCET) [69,70]. The former is matched to leading order using MadGraph5_aMC@NLO [71] with the MSTW2008LO PDF set. The latter uses the CT14nlo [72] PDF set and includes finite z cut resummation as well as nonperturbative corrections based on an analytic shape function with one free parameter, which is chosen based on comparisons with Pythia 8. Although the NNLL calculation is strictly for inclusive jets, it is also applicable here because, at high jet p T , the difference between inclusive jets and dijets is negligible. The NLO+NLL calculation is matched to fixed order using NLOJet++ [73,74] with the CT14nlo PDF set and includes finite z cut resummation as well as nonperturbative corrections taken from the envelope of parton shower MC predictions from Herwig 6.521 [75] [81].
These predictions are compared with the unfolded data in Figure 9. Because the LO+NNLL and NLO+NLL calculations for ρ are only available for p T > 600 GeV, the unfolded data are shown for both a low-p T jet selection (p T > 300 GeV) and a high-p T jet selection (p T > 600 GeV). The calculations model the data in the resummation region (approximately −3 < ρ < −1) to within about 10%. The NLO+NLL calculation also provides an accurate model of the data at high values of ρ, while the LO+NNLL and NNLL calculations do not model this region as accurately. This behavior is expected, since fixed-order effects are dominant in this region.
At lower values of relative mass, nonperturbative corrections are needed to describe the data. This can be seen particularly in the low-p T results, which show the NNLL prediction with and without nonperturbative effects. As expected, including these effects brings the prediction much closer to the unfolded data, although the level of agreement is still not as good as in the resummation region. The region where nonperturbative corrections are relevant shifts to higher relative mass with increasing β, since more soft radiation is included within the jet. In general, similar levels of agreement are seen in the low-p T and high-p T cases, although the nonperturbative region shifts to slightly lower relative mass in the high-p T case.
An NLL calculation of r g has been performed recently [82], and its results are compared with the unfolded data in Figure 10. Unlike in the jet mass case, nonglobal logarithms are neither absent (β = 0) nor power suppressed (β > 0) for r g . The calculation therefore includes both the nonglobal and clustering logarithms to achieve full NLL accuracy. In general, the prediction agrees with the data within uncertainties in the region where nonperturbative effects are expected to be small, while it is systematically higher than the data in the regions where nonperturbative effects are large.

Comparison of track-based and calorimeter-based measurements
On a jet-by-jet basis, the values of the all-particles and charged-particles jet substructure observables are largely uncorrelated. However, due to isospin symmetry, the all-particles and charged-particles probability distributions are nearly identical. This is studied by comparing the unfolded distributions for the cluster-based and track-based measurements, which are shown in Figures 11-13 for the selection that includes both jets in the dijet system. The results generally agree in the perturbative regions at high values of ρ and r g , while they disagree in the low-relative-mass regions. There is also some disagreement at low values of z g for β > 0.
These studies also enable a comparison of the sizes of the uncertainties for calorimeter-based and track-based observables. For all of these observables, the uncertainties for the track-based observables are significantly smaller than those for the calorimeter-based observables, particularly for higher values of β, where more soft radiation is included within the jet. However, since no track-based calculations exist at the present time, calorimeter-based measurements are still useful for precision QCD studies.

Comparison of forward and central measurements
The distribution of a substructure observable at a given p T depends on the type of the initiating parton and should not be affected by where the jet is produced within the detector. Therefore, any differences between the distributions of the observable in different regions of the detector are related to the quark-gluon composition of the events produced there. Since this measurement was performed separately for the more forward and more central jet in the dijet sample, these distributions can be compared to see whether such effects are visible. The fraction of central and forward jets originating from gluons, f G , in Pythia 8 multijet events is shown in Table 1, where the jet flavor is determined by the highest-energy parton inside the jet cone. The gluon fractions in the forward and central regions differ by about 5-10%. For each of the three observables, Figure 14 compares the unfolded distributions for jets in the forward region with those for jets in the central region. As expected, since the forward region is quark-enhanced, it has more jets at lower relative masses. These differences are numerically small because the gluon fractions are similar for the forward and central jets.

Quark-gluon extraction of the observables
Since the shape of the ρ, r g , and z g distributions at a given jet p T depends only on the flavor of the initiating parton and not on the rapidity, the quark and gluon distributions may be extracted from the measurements of the central and forward distributions if the quark-gluon fraction is known for each region. In particular, the central and forward distributions for these observables may be described as the sum of the quark and gluon distributions, weighted by the quark and gluon composition of the sample:

$$h_i^{F} = f_Q^{F}\, h_i^{Q} + f_G^{F}\, h_i^{G}, \qquad h_i^{C} = f_Q^{C}\, h_i^{Q} + f_G^{C}\, h_i^{G}, \tag{2}$$

where h i is a bin of a histogram for an observable, F and C represent the forward and central regions, and Q and G represent quark or gluon. The quark and gluon fractions ( f Q and f G ) for the more forward and more central jets are determined from the nominal Pythia 8 MC event sample, where the quark fraction f Q is given by 1 − f G . Table 1 shows these values for each p T bin. Equations (2) may then be solved for h G i and h Q i to extract these distributions from the forward and central distributions. This extraction is model-dependent, but the more forward and more central distributions are made public for reinterpretation using any model.
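The bin-by-bin solution of the two-region mixture described above can be sketched in a few lines. This is a minimal illustration, not the analysis code: the function name `extract_quark_gluon`, the toy histograms, and the gluon fractions 0.30 and 0.40 are all hypothetical values chosen for the example, and the real fractions would come from Table 1.

```python
import numpy as np

def extract_quark_gluon(h_forward, h_central, f_g_forward, f_g_central):
    """Solve, bin by bin, the pair of mixture equations
        h_F = (1 - f_G^F) h_Q + f_G^F h_G
        h_C = (1 - f_G^C) h_Q + f_G^C h_G
    for the quark (h_Q) and gluon (h_G) histograms."""
    h_forward = np.asarray(h_forward, dtype=float)
    h_central = np.asarray(h_central, dtype=float)
    # Determinant of the 2x2 mixing matrix; it vanishes when both
    # regions have the same composition.
    det = f_g_central - f_g_forward
    if np.isclose(det, 0.0):
        raise ValueError("gluon fractions are equal; the system is singular")
    h_quark = (f_g_central * h_forward - f_g_forward * h_central) / det
    h_gluon = ((1.0 - f_g_forward) * h_central
               - (1.0 - f_g_central) * h_forward) / det
    return h_quark, h_gluon

# Hypothetical closure check: build forward/central histograms from known
# quark and gluon shapes, then recover those shapes.
true_q = np.array([0.5, 0.3, 0.2])
true_g = np.array([0.2, 0.3, 0.5])
f_g_fwd, f_g_cen = 0.30, 0.40  # assumed gluon fractions, not from Table 1
h_fwd = (1 - f_g_fwd) * true_q + f_g_fwd * true_g
h_cen = (1 - f_g_cen) * true_q + f_g_cen * true_g

h_q, h_g = extract_quark_gluon(h_fwd, h_cen, f_g_fwd, f_g_cen)
```

Because the solution divides by the difference of the two gluon fractions, the extraction becomes less stable as the compositions of the two regions approach each other.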
The extracted quark and gluon distributions are shown in Figures 15-17 for track-based observables. Cluster-based observables are not shown, but exhibit similar behavior overall. For these results, the PDF uncertainties and the uncertainties in the jet inputs are taken to be fully correlated between the more forward and more central jets, while all other uncertainties are considered fully uncorrelated. In addition, to account for the uncertainty in the composition of the sample, the difference between the distributions extracted using the Pythia 8 and Sherpa compositions is taken as an uncertainty. A few observations can be made about the differences between the quark and gluon distributions. For the jet mass, the gluon distribution tends towards higher values of the mass, which is expected due to the larger color factor associated with gluons. These differences become more apparent at larger values of β. For β = 0, z g is independent of α S to leading order, and the quark and gluon distributions are very similar, while for β > 0, some differences begin to appear. Finally, for r g , the gluon distribution tends towards a larger splitting angle, which is similarly more apparent at larger values of β.

Figure 17: Comparison of the quark and gluon unfolded r g distribution for the track-based measurement. The uncertainty bands include all sources: data and MC statistical uncertainties, nonclosure, modeling, and tracking uncertainties where relevant.
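The effect of the correlation treatment described above can be illustrated with a toy propagation through the linear extraction. This is a sketch of one common approach and is not confirmed to be the procedure used in the analysis: correlated sources shift the forward and central histograms together, uncorrelated sources shift them one at a time and are added in quadrature, and the composition uncertainty is the difference between two extractions. All names and numbers (`extract`, `propagate`, the histograms, the fractions, and the 0.01 shift) are hypothetical.

```python
import numpy as np

def extract(h_f, h_c, g_f, g_c):
    """Solve the forward/central mixture for (quark, gluon) per bin,
    given gluon fractions g_f (forward) and g_c (central)."""
    det = g_c - g_f
    quark = (g_c * h_f - g_f * h_c) / det
    gluon = ((1 - g_f) * h_c - (1 - g_c) * h_f) / det
    return quark, gluon

def propagate(h_f, h_c, d_f, d_c, g_f, g_c, correlated):
    """Absolute (quark, gluon) uncertainty from one source with per-bin
    shifts d_f (forward) and d_c (central)."""
    nominal = extract(h_f, h_c, g_f, g_c)
    if correlated:
        # Shift both regions at once and re-extract.
        shifted = extract(h_f + d_f, h_c + d_c, g_f, g_c)
        return tuple(np.abs(s - n) for s, n in zip(shifted, nominal))
    # Shift each region separately and add the effects in quadrature.
    only_f = extract(h_f + d_f, h_c, g_f, g_c)
    only_c = extract(h_f, h_c + d_c, g_f, g_c)
    return tuple(np.hypot(a - n, b - n)
                 for a, b, n in zip(only_f, only_c, nominal))

# Hypothetical inputs (not measured values).
h_f = np.array([0.41, 0.30, 0.29])
h_c = np.array([0.38, 0.30, 0.32])
g_f, g_c = 0.30, 0.40
shift = np.full(3, 0.01)

q_corr, g_corr = propagate(h_f, h_c, shift, shift, g_f, g_c, correlated=True)
q_unc, g_unc = propagate(h_f, h_c, shift, shift, g_f, g_c, correlated=False)

# Composition uncertainty: difference between extractions that use two
# sets of gluon fractions (the second set is made up for illustration).
q_nom, g_nom = extract(h_f, h_c, g_f, g_c)
q_alt, g_alt = extract(h_f, h_c, 0.28, 0.41)
q_comp, g_comp = np.abs(q_alt - q_nom), np.abs(g_alt - g_nom)
```

In this toy setup, an equal shift of 0.01 in both regions propagates to 0.01 in the extracted distributions when treated as fully correlated, but to 0.05 in the quark distribution when treated as uncorrelated, because the solution divides by the small difference between the gluon fractions.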

Conclusion
This paper presents a measurement of soft-drop jet substructure observables in dijet events in pp collisions at √s = 13 TeV using a dataset corresponding to an integrated luminosity of 32.9 fb⁻¹ collected with the ATLAS detector at the LHC. Unfolded measurements of three substructure observables are shown for both the calorimeter-based observables unfolded to the all-particle level and track-based observables unfolded to the charged-particle level. These two types of measurements allow a direct comparison of how the different object definitions affect the measured observables. The calorimeter-based measurements of the relative jet mass and r g are compared with analytical predictions and are shown to be in good agreement in the perturbative region. In particular, this provides the first comparison between an analytical prediction and an unfolded measurement of r g . Particularly for observables that are sensitive to the angular distribution of radiation within a jet, track-based observables are shown to be more precise than calorimeter-based observables, due to the better angular resolution of tracks. Since analytical predictions of track-based observables are not currently available, cluster-based observables are still relevant for probing the perturbative region. The forward and central jets are measured separately, which enables an extraction of the quark- and gluon-jet distributions using input from simulation. The extractions demonstrate differences between the observables in their sensitivity to the quark and gluon composition of the sample, which are most pronounced when the least grooming is applied.