Identifying groomed jet splittings in heavy-ion collisions

Measurements of jet substructure in heavy-ion collisions may provide key insight into the nature of jet quenching in the quark-gluon plasma. Jet grooming techniques from high-energy physics have been applied to heavy-ion collisions in order to isolate theoretically controlled jet observables and explore possible modification of the hard substructure of jets. However, the grooming algorithms used have not been tailored to the unique considerations of heavy-ion collisions, in particular to the experimental challenge of reconstructing jets in the presence of a large underlying event. We report a set of simple studies illustrating the impact of the underlying event on identifying groomed jet splittings in heavy-ion collisions, and on associated groomed jet observables. We illustrate the importance of the selection of grooming algorithm, as certain groomers are more robust to these effects, while others, including those commonly used in heavy-ion collisions, are susceptible to large background effects, which, when uncontrolled, can mimic a jet quenching signal. These experimental considerations, along with appropriate theoretical motivation, provide input to the choice of grooming algorithms employed in heavy-ion collisions.

I. INTRODUCTION
Jet grooming techniques were developed in the high-energy physics community to mitigate pileup contamination and improve the theoretical calculability of jet observables in pp collisions. The Soft Drop algorithm, for example, reduces non-perturbative effects by selectively removing soft large-angle radiation, which allows for well-controlled comparisons of measurements to pQCD calculations [1][2][3]. Grooming techniques have recently been applied to heavy-ion collisions, in order to establish whether jet quenching in the quark-gluon plasma modifies the hard substructure of jets, such as the splitting function, and to elucidate whether jets lose energy coherently, as a single color charge, or incoherently, as multiple independent substructures [4][5][6][7][8][9][10][11][12]. Moreover, MC event generators suggest that jet splittings identified by grooming algorithms are correlated with parton shower splittings, raising the possibility that identifying groomed jet splittings in heavy-ion collisions may provide a handle on the spacetime evolution of jet propagation through the hot QCD medium.
Measurements of the Soft Drop groomed momentum fraction, $z_g$, have been made in pp and heavy-ion collisions at the LHC and RHIC [13][14][15][16][17]. These measurements have opened a new avenue in heavy-ion jet physics. Measurements by CMS and ALICE show a modification of the $z_g$ distribution in Pb-Pb collisions relative to pp collisions; however, the results have not been corrected for background effects. Local background fluctuations in a heavy-ion environment can result in an incorrect splitting (unrelated to the jet) being identified by the grooming algorithm. This problem is analogous to the well-known experimental problem of 'combinatorial' jets in heavy-ion collisions, which is typically treated by either (1) reporting jet measurements in the background-free region of phase space, namely at sufficiently large $p_T$ and/or small $R$, or (2) subtracting the combinatorial jet distribution on an ensemble basis. In the case of groomed jet observables, the scale at which background effects occur is set by the subleading prong of the groomed jet, rather than the jet $p_T$ and $R$. The presence of background contamination in groomed jet observables has been recognized to some extent since the first measurements in heavy-ion collisions; however, the magnitude of the effect has not been quantified, nor has its qualitative impact been understood. Since the reported distributions contain a significant number of 'mis-tagged' splittings, it remains unclear how to interpret the observed modifications.
Since the characteristic scale of these effects is set by the subleading prong of the groomed jet, the impact of local background fluctuations on groomed jet observables is dependent on the grooming algorithm employed. In this article, we present a simple set of studies on the performance of various grooming algorithms with respect to background contamination effects in heavy-ion collisions, in order to confront the experimental question: How are grooming algorithms affected by the presence of a heavy-ion background? We identify groomers that are relatively robust to background effects, as well as those that are susceptible to contamination. Finally, we discuss implications for the interpretation of previous measurements.
arXiv:2006.01812v1 [hep-ph] 2 Jun 2020

II. ANALYSIS SETUP
We reconstruct jets from charged particles at central rapidity generated by PYTHIA [18] for proton-proton collisions at $\sqrt{s} = 5$ TeV using the anti-$k_T$ algorithm from the FASTJET [19] package with resolution parameter $R = 0.4$. Before the jet finding we select particles with $p_T > 0.15$ GeV/$c$. This setup corresponds to typical experimental configurations at the LHC. To approximate the heavy-ion background, we use a thermal model consisting of $N$ particles drawn from a Gaussian with $dN/d\eta \approx 1800$ and $p_T$ sampled from a Gamma distribution, $f_\Gamma(p_T; \alpha, \beta) \propto p_T^{\alpha-1} e^{-p_T/\beta}$, with $\alpha = 2$. We select $\beta = 0.5$ in order to roughly fit the width of the $R = 0.4$ $\delta p_T$ distribution in 0-10% Pb-Pb data of $\sigma \approx 11$ GeV/$c$ [20]. We perform event-wide constituent subtraction on the combined event consisting of the charged particles from the PYTHIA event together with the thermal background particles, using $R_{\max} = 0.25$ [21]. We then cluster the subtracted particles into jets, and match these 'combined' jets to those jets found by clustering only the PYTHIA particles.
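As an illustration of the thermal model described above, the following is a minimal sketch of a toy background generator using only the Python standard library. Only $dN/d\eta \approx 1800$, $\alpha = 2$ and $\beta = 0.5$ come from the text; the function name `generate_thermal_background`, the acceptance `eta_max`, the relative width `rel_width` of the Gaussian multiplicity, and the uniform sampling in $\eta$ and $\varphi$ are assumptions made purely for this example.

```python
import math
import random


def generate_thermal_background(rng, dn_deta=1800.0, eta_max=0.9,
                                rel_width=0.05, alpha=2.0, beta=0.5):
    """Return a list of (pt, eta, phi) tuples for one toy background event.

    eta_max, rel_width and the uniform eta/phi sampling are assumptions of
    this sketch; only dn_deta, alpha and beta are quoted in the text.
    """
    # Expected multiplicity inside |eta| < eta_max.
    mean_n = dn_deta * 2.0 * eta_max
    # Multiplicity drawn from a Gaussian (width is a hypothetical choice).
    n = max(0, round(rng.gauss(mean_n, rel_width * mean_n)))
    particles = []
    for _ in range(n):
        # Gamma pdf with shape alpha and scale beta:
        # f(pt) proportional to pt^(alpha-1) * exp(-pt/beta).
        pt = rng.gammavariate(alpha, beta)
        eta = rng.uniform(-eta_max, eta_max)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        particles.append((pt, eta, phi))
    return particles


rng = random.Random(12345)  # fixed seed so the example is reproducible
event = generate_thermal_background(rng)
mean_pt = sum(p[0] for p in event) / len(event)
```

For a Gamma distribution with shape $\alpha$ and scale $\beta$ the mean is $\alpha\beta$, so with the quoted parameters the average background particle $p_T$ in this sketch is about 1 GeV/$c$.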

A. Groomers
In order to study the performance of different grooming criteria, we use the Soft Drop algorithm [1] and the Dynamical Grooming algorithm [22,23], as well as new, rather simple groomers which we call max-$z$, max-$p_T^{\mathrm{soft}}$, max-$\kappa$, max-$k_T$, and min-$t_f$. These are all defined by re-clustering the jet with the Cambridge/Aachen algorithm, where every step of the clustering history is defined by a radiator and two prongs that it decays to. We use the following notation for two prongs $a$ and $b$ such that $p_T^{\mathrm{radiator}} = p_T^a + p_T^b$, where $p_T^b < p_T^a$, and $R_g = \sqrt{(y_a - y_b)^2 + (\varphi_a - \varphi_b)^2}$ is the angular separation between the two, with $\varphi$ being the azimuthal angle and $y$ the rapidity of the prongs (used interchangeably with $\theta_g \equiv R_g/R$). Therefore, $k_T \equiv p_T^b R_g$, $z \equiv p_T^b / p_T^{\mathrm{radiator}}$, and $\kappa \equiv z R_g$. We briefly describe the algorithms that we use below:
• Soft Drop with $\beta = 0$ with three values of the symmetry parameter $z_{\mathrm{cut}} = 0.1, 0.2, 0.3$.
• Dynamical Grooming with three values of the grooming parameter $a = 0.1, 1.0, 2.0$.
• max-$z$: For every jet that contains more than one particle, identify the splitting where $z$ is the largest from all the splittings in the primary Lund plane.
• max-$p_T^{\mathrm{soft}}$: For every jet that contains more than one particle, identify the splitting where the soft prong has the largest $p_T$ from all of the softer prongs within any pair in the primary Lund plane.
• max-$\kappa$: For every jet that contains more than one particle, identify the splitting where $\kappa$ is the largest from all splittings in the primary Lund plane.
• max-$k_T$: For every jet that contains more than one particle, identify the splitting where $k_T$ is the largest from all splittings in the primary Lund plane.
• min-$t_f$: For every jet that contains more than one particle, identify the splitting where $z R_g^2$ is the largest from all the splittings in the primary Lund plane (in relation to the estimate of the formation time for the pair, $t_f$, which scales inversely with $z R_g^2$).
For an overview of the phase space that each of the grooming algorithms selects, we plot the primary Lund plane density $\rho(\kappa, R_g) = \frac{1}{N_{\mathrm{jet}}} \frac{d^2 N}{d\ln(\kappa)\, d\ln(1/R_g)}$ for identified splittings in Fig. 2 [24]. We note that several of these groomers are expected to select similar phase space: max-$z$, max-$p_T^{\mathrm{soft}}$, and Dynamical Grooming $a = 0.1$ select approximately on the longitudinal momentum of the splitting; max-$\kappa$, max-$k_T$, and Dynamical Grooming $a = 1.0$ select approximately on the transverse momentum of the splitting; min-$t_f$ and Dynamical Grooming $a = 2.0$ select approximately on the mass of the splitting.
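The selection rules listed above can be sketched in a few lines of Python. This is an illustrative toy and not the analysis code: the `Splitting` class, the function names, and the example numbers are hypothetical, the input list is assumed to hold the primary Lund plane splittings already ordered from the widest to the narrowest angle (Cambridge/Aachen declustering order), and Dynamical Grooming is left out because its grooming condition is not spelled out in the text.

```python
from dataclasses import dataclass


@dataclass
class Splitting:
    pt_a: float  # pT of the harder prong
    pt_b: float  # pT of the softer prong (pt_b < pt_a)
    rg: float    # angular separation R_g between the two prongs

    @property
    def z(self):
        # z = pT(soft) / pT(radiator), with pT(radiator) = pt_a + pt_b
        return self.pt_b / (self.pt_a + self.pt_b)

    @property
    def kt(self):
        return self.pt_b * self.rg

    @property
    def kappa(self):
        return self.z * self.rg


def max_z(splittings):
    return max(splittings, key=lambda s: s.z)


def max_pt_soft(splittings):
    return max(splittings, key=lambda s: s.pt_b)


def max_kappa(splittings):
    return max(splittings, key=lambda s: s.kappa)


def max_kt(splittings):
    return max(splittings, key=lambda s: s.kt)


def min_tf(splittings):
    # Smallest formation time corresponds to the largest z * R_g^2.
    return max(splittings, key=lambda s: s.z * s.rg ** 2)


def soft_drop_beta0(splittings, z_cut):
    """First splitting in declustering order with z > z_cut, else None."""
    for s in splittings:
        if s.z > z_cut:
            return s
    return None


# Hypothetical primary Lund sequence of a 100 GeV/c jet, widest angle first.
lund = [
    Splitting(pt_a=95.0, pt_b=5.0, rg=0.35),
    Splitting(pt_a=70.0, pt_b=25.0, rg=0.10),
    Splitting(pt_a=40.0, pt_b=30.0, rg=0.02),
]
```

With these example numbers the groomers disagree: max-$z$ and max-$p_T^{\mathrm{soft}}$ pick the narrow, symmetric third splitting, max-$\kappa$ and max-$k_T$ pick the second, min-$t_f$ picks the soft wide-angle first splitting, and Soft Drop moves from the second to the third splitting as $z_{\mathrm{cut}}$ is raised from 0.1 to 0.3.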

B. Prong matching
In order to study the impact of the heavy-ion background on the reconstruction of groomed splittings, we examine where > 50% of the PYTHIA subleading prong (by $p_T$) is reconstructed in the combined event. We consider only the case where both the PYTHIA jet and the combined jet pass the grooming condition. We categorize six possibilities. The PYTHIA subleading prong is:
1. Correctly reconstructed in the subleading prong of the combined jet.
2. Reconstructed in the leading prong of the combined jet, and the PYTHIA leading prong is reconstructed in the subleading prong of the combined jet. That is, both prongs are correctly identified, but they 'swap' which is leading and which is subleading. In this case, $z_g$ and $\theta_g$ are invariant, although iterative observables are not.
3. Reconstructed in the leading prong of the combined event, and the PYTHIA leading prong is not reconstructed in the subleading prong of the combined event. This is the most common way that an incorrect splitting is reconstructed, typically by a background fluctuation at large angle passing the grooming condition. Due to angular clustering, this by definition results in the subleading prong being absorbed in the leading prong, as shown in Fig. 1.
4. Reconstructed in the groomed-away constituents of the combined jet.
5. Reconstructed nowhere in the combined jet, but rather its constituents are elsewhere in the combined event.
6. Not reconstructed in any of the above categories; for example, it may have 1/3 of its $p_T$ split between three categories.
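The six categories above amount to a simple decision rule on $p_T$ fractions. The sketch below assumes that, for each matched jet pair, one has already computed the fraction of the PYTHIA subleading prong's $p_T$ found in each region of the combined event, and the fraction of the PYTHIA leading prong's $p_T$ found in the combined subleading prong; the function name and argument names are hypothetical.

```python
def classify_subleading_prong(frac_in_subleading, frac_in_leading,
                              frac_in_groomed_away, frac_outside_jet,
                              leading_frac_in_subleading, threshold=0.5):
    """Return the category (1-6) of the PYTHIA subleading prong.

    The first four arguments are the fractions of the PYTHIA subleading
    prong pT found in each region of the combined event. The fifth is the
    fraction of the PYTHIA leading prong pT found in the combined
    subleading prong, which separates the 'swap' case (2) from case (3).
    """
    if frac_in_subleading > threshold:
        return 1  # correctly tagged
    if frac_in_leading > threshold:
        if leading_frac_in_subleading > threshold:
            return 2  # both prongs found, but leading/subleading swapped
        return 3      # absorbed into the leading prong (mis-tagged)
    if frac_in_groomed_away > threshold:
        return 4      # ended up in the groomed-away constituents
    if frac_outside_jet > threshold:
        return 5      # constituents are elsewhere in the combined event
    return 6          # no single region holds more than the threshold
```

Because the four fractions of the subleading prong sum to at most one, no two of them can exceed 50% simultaneously, so the order of the checks does not affect the result.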

III. PERFORMANCE OF GROOMERS
For each groomer, we plot the fraction of subleading prongs in the combined events that are correctly tagged in Figure 3, as a function of jet $p_T$. It is immediately apparent that to increase the subleading prong purity one should (i) choose a suitable groomer, and/or (ii) measure high-$p_T$ jets. Groomers with an angular selection perform the worst, which is unsurprising given that combinatorial background preferentially occupies large-angle phase space, as compared to jets. Groomers which select on longitudinal momentum (Dynamical Grooming $a = 0.1$, max-$p_T^{\mathrm{soft}}$, max-$z$) perform well, with Dynamical Grooming performing slightly worse, presumably due to its small angular component in the grooming condition. Soft Drop performs similarly to these for $z_{\mathrm{cut}} = 0.2, 0.3$, where above $p_T = 70$ GeV/$c$ there appears to be an approximate saturation, in which case further increasing $z_{\mathrm{cut}}$ does not increase the purity. Soft Drop with $z_{\mathrm{cut}} = 0.1$, which is the most common configuration used in heavy-ion collisions, performs notably worse. This suggests that mis-tagged splittings arise from a characteristic longitudinal momentum scale above which background is suppressed, due to uncorrelated background fluctuations on the geometric scale of a prong.
In order to determine the dependence of the mis-tagging fraction on the splitting observables, we decompose the distributions of $z_g$ and $\theta_g$ according to where the PYTHIA subleading prong is reconstructed in the combined event, as described in Section II B. Figure 4 shows the $z_g$ (left) and $\theta_g$ (right) distributions when PYTHIA is embedded in the heavy-ion background. For smaller $z_{\mathrm{cut}}$ and lower $p_T$ (top row), there is a large fraction of mis-tagged splittings, predominantly from the case where the subleading prong is mis-tagged in the leading prong (Fig. 1). The mis-tagged prongs are most prominent at small $z$ (where the true $z_g$ distribution is naturally peaked) and large $\theta$ (in the tail of the true $\theta_g$ distribution); however, they are not limited to these regions of phase space. We note that in all cases, the correctly tagged distributions exhibit significant deviations from the true distributions, suggesting that there are strong correlations between the structure of the jet and its susceptibility to mis-tagging. By raising $z_{\mathrm{cut}}$ (middle row) or increasing $p_T$ (bottom row), the mis-tagging rates are significantly reduced. This suggests that at low $p_T$, the Soft Drop groomer with $z_{\mathrm{cut}} = 0.1$ is undesirable in heavy-ion collisions, and that even with larger $z_{\mathrm{cut}}$ or higher $p_T$ one should proceed with caution. The bottom panels of Fig. 4 show the fraction of subleading prongs in the embedded events that are correctly tagged, which is denoted as tagging purity (where we now include cases (1) and (2) from Section II B as correct identification). We additionally plot the ratio of the embedded distribution to the true distribution, which shows significant deviations, typically larger for $\theta_g$ than $z_g$.
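The tagging purity defined above is a per-bin ratio of counts, which can be sketched as follows. The function name `tagging_purity`, the input layout (one `(value, category)` pair per combined jet, with the category numbered as in Section II B), and the example entries are assumptions for illustration only, not results from this study.

```python
def tagging_purity(entries, bin_edges, correct_categories=(1, 2)):
    """Fraction of correctly tagged subleading prongs in each bin.

    entries: iterable of (observable_value, category) pairs, e.g. the z_g
    of the combined jet and the prong-matching category (1-6).
    Returns one purity per bin, or None for an empty bin.
    """
    n_bins = len(bin_edges) - 1
    total = [0] * n_bins
    correct = [0] * n_bins
    for value, category in entries:
        for i in range(n_bins):
            if bin_edges[i] <= value < bin_edges[i + 1]:
                total[i] += 1
                if category in correct_categories:
                    correct[i] += 1
                break
    return [c / t if t else None for c, t in zip(correct, total)]


# Hypothetical toy input: four jets at small z_g, four at large z_g.
toy_entries = [(0.12, 3), (0.15, 1), (0.18, 3), (0.14, 4),
               (0.45, 1), (0.42, 2), (0.48, 1), (0.41, 3)]
purity = tagging_purity(toy_entries, bin_edges=[0.1, 0.3, 0.5])
```

In this toy input one of four small-$z_g$ splittings and three of four large-$z_g$ splittings fall in categories (1) or (2), giving purities of 0.25 and 0.75 in the two bins.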
In order to investigate the robustness of the choice of grooming algorithm to these experimental background effects, we plot the two ratios from the bottom panels of Fig. 4 for a variety of groomers. In Fig. 5, we plot the subleading prong tagging purity. For $z_g$, the purity is high at large $z_g$, but decreases substantially at small $z_g$. For $\theta_g$, on the other hand, the purity is typically highest at low $\theta_g$, and decreases at large $\theta_g$. Groomers which select on the longitudinal hardness of the splitting (Soft Drop, Dynamical Grooming $a = 0.1$, max-$p_T^{\mathrm{soft}}$, and max-$z$) perform the best; however, even in these cases the purity becomes low when the absolute scale of $z$ becomes small (Soft Drop $z_{\mathrm{cut}} = 0.1$, and all others for small $z_g$). Of the groomers considered here, Soft Drop is the only one with an absolute cutoff in the grooming condition, which constrains the observable to the high-purity region. This, in combination with the well-studied theoretical benefits of Soft Drop, suggests that Soft Drop with sufficiently large $z_{\mathrm{cut}}$ is an appealing groomer for heavy-ion collisions. We note, however, that in this $p_T$ range the purity remains significantly less than unity, which must be treated carefully. Nevertheless, by maximizing the purity, one can achieve improved experimental control, both by reducing the magnitude of corrections and modeling needed in the measurement, and by enabling a stable unfolding procedure due to the rejection of large off-diagonal contamination of the response matrix, which is otherwise often unfeasible.
Figure 6 shows the ratio of the embedded $z_g$ and $\theta_g$ distributions to the PYTHIA distributions for a variety of groomers. This provides complementary information to the purity, since it describes the impact not only of the fraction of mis-tagged splittings, but also of how different the mis-tagged splittings are from the true splittings. Similar to the purity, the Soft Drop $z_{\mathrm{cut}} = 0.1$ and max-$\kappa$ groomers perform poorly, whereas the other groomers perform relatively well. We see that this ratio is typically nearer to unity for $z_g$ compared to $\theta_g$, since for $z_g$ the mis-tagged splittings typically deplete and re-populate the low-$z$ region, whereas for $\theta_g$ the mis-tagged splittings are likely to populate large angles.

IV. RELEVANCE TO PREVIOUS MEASUREMENTS
In this section, we briefly outline the implications of our studies for the interpretation of published measurements of $z_g$ [15,16]. These measurements are reported without corrections for background effects or detector effects; instead, Pb-Pb data are compared to an embedded reference. In both [15,16], cuts on $R_g$ are employed, which are expected to introduce suppression (or enhancement) of the remaining $z_g$ distribution in Pb-Pb relative to pp. There are two relevant effects that the presence of mis-tagged splittings can have on such measurements.
First, mis-tagged splittings dilute quenching effects, which can change the shape of apparent modifications. When comparing Pb-Pb data to an embedded reference, mis-tagged subleading prongs are not expected to exhibit jet quenching, since they arise from the combinatorial background. Since the tagging purity varies with $z_g$, this means that non-trivial changes to the shape of the Pb-Pb/pp ratio can be introduced. In particular, the tagging purity is low at small values of $z_g$, and high at large values of $z_g$. To illustrate the impact of this, consider a simple toy example for kinematics similar to the ALICE measurement with $\Delta R > 0.2$, shown in Fig. 7 (left). Suppose that the true $R_{AA}$ induced by the $R_g$ cut is 0.5, independent of $z_g$. If we assume that mis-tagged splittings are unaffected by jet quenching, then the observed AA distribution will be given by
$P_{AA}(z_g) = f_{\mathrm{matched}}\, R_{AA}\, P_{pp}(z_g) + (1 - f_{\mathrm{matched}})\, P_{pp}(z_g)$,
where $f_{\mathrm{matched}}$ is the tagging purity. Note that as $f_{\mathrm{matched}} \to 1$, $P_{AA}(z_g) \to R_{AA} P_{pp}(z_g)$, whereas if $f_{\mathrm{matched}} \to 0$, $P_{AA}(z_g) \to P_{pp}(z_g)$. Since the tagging purity is low at small $z_g$ and high at large $z_g$, this generically causes the observed $R_{AA}$ to exhibit an apparent relative suppression of symmetric splittings, due entirely to background effects and unrelated to jet quenching. We note that the exact shape of the apparent relative suppression is model-dependent; there are many model-dependent choices one could make which we do not pursue further here. However, the feature that the measured $R_{AA}$ will exhibit a spurious relative suppression emerges generically, independent of the details of jet quenching, and depending only on the fact that the purity is low at small $z_g$ and high at large $z_g$. Based on these considerations it is difficult to conclude that symmetric splittings are more suppressed than asymmetric splittings using the ALICE measurement alone. The right panel of Fig. 7 shows a similar toy example corresponding approximately to CMS kinematics, which suggests that dilution effects are substantially smaller due to the higher purity at high $p_T$, but may still be significant. Note that if one fully corrects the distributions via unfolding instead of performing detector-level embedding comparisons, one eliminates the susceptibility to dilution effects, since the response matrix encodes appropriate corrections of any residual mis-tagged splittings to their true splittings.
[Fig. 7 caption, beginning lost in extraction: ... presence of mis-tagged splittings can induce an artificial shape in the $z_g$ ratio, unrelated to jet quenching. Here, the normalization due to the $R_g$ selection (typically denoted as $\Delta R$ cuts) is taken in both numerator and denominator to be from the PYTHIA distribution, in order to remove smearing effects (but keep the suppression quantified by $R_{AA}$). Note that the momentum scale here is taken from PYTHIA, whereas the experimental selection is an uncorrected Pb-Pb scale.]
Second, the magnitude of MC-based corrections (relevant to [16]) grows as the number of mis-tagged splittings grows. In Fig. 7 (left), the ratio 'Embedded/Truth' gives an estimate of the size of MC-based corrections one has to perform to compare Pb-Pb data to an embedded reference, and is on the order of 100%. Note that the shape of this correction is correlated with the experimentally observed modification. Moreover, the distributions are effectively self-normalized, aside from the suppression induced by the $R_g$ cut, meaning that small-$z_g$ modification necessarily causes large-$z_g$ modification.
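The dilution effect in the toy example can be made concrete with a short numerical sketch. It assumes a linear mixing of quenched (matched) and unquenched (mis-tagged) splittings, chosen to be consistent with the two limits quoted in the text, so that the observed Pb-Pb/pp ratio in a given $z_g$ bin is $f_{\mathrm{matched}} R_{AA} + (1 - f_{\mathrm{matched}})$. The function name and the purity values are hypothetical and are not taken from the figures.

```python
def observed_ratio(purity, r_aa):
    """Observed Pb-Pb/pp ratio in a bin under the linear-mixing assumption.

    Matched splittings (fraction `purity`) are suppressed by r_aa, while
    mis-tagged splittings (fraction 1 - purity) are assumed unmodified.
    """
    return purity * r_aa + (1.0 - purity)


r_aa_true = 0.5  # flat true suppression, as in the toy example of the text

# Hypothetical purities: low at small z_g, high at large z_g.
toy_purity = {0.15: 0.4, 0.30: 0.7, 0.45: 0.9}

ratios = {zg: observed_ratio(f, r_aa_true) for zg, f in toy_purity.items()}
```

With these illustrative purities, a flat true suppression of 0.5 appears as an observed ratio of about 0.80 at $z_g = 0.15$, 0.65 at $z_g = 0.30$, and 0.55 at $z_g = 0.45$: an apparent relative suppression of symmetric splittings that comes only from the $z_g$ dependence of the purity.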

V. CONCLUSION
We performed a set of basic studies on the behavior of various jet grooming algorithms in the presence of the large combinatorial background characteristic of heavy-ion collisions. We find that such background and its region-to-region density fluctuations cause a significant number of splittings to be incorrectly identified as a genuine structure of the signal jets. The robustness of groomers against this experimental challenge is an important criterion for their usage in jet substructure measurements in heavy-ion collisions. We quantify the performance of grooming algorithms using the purity of the identified splittings as a function of jet momentum. Our studies show that predominantly the subleading prongs are prone to mis-identification (lost, replaced by a background flux of particles, and thus merged into the leading prong) and that the effect depends on the jet momentum. We have identified a set of grooming algorithms that perform relatively well; however, in our test setup, we found that groomers used in some of the existing heavy-ion measurements result in a significant contamination of the reported distributions with false splittings. We find that in general the contamination decreases (the groomer performance improves) with the $p_T$ of the jets. Since these background-induced splittings can generically mimic jet quenching effects, future measurements at the LHC and RHIC aiming at an improved accuracy of physics conclusions will need to leverage the grooming algorithms that maximize the purity of the genuine splittings. One of the important challenges will be to properly quantify the residual uncertainties in the reported quantities due to the contamination effects. The studies presented here ought to be extended to explore the model-dependence of the background and the impact of jet fragmentation on the performance of grooming algorithms. Furthermore, the groomers that we have considered can be refined and further expanded. In particular, a promising direction to explore would be to combine a robust groomer with an additional phase space selection (e.g. $\kappa$, $t_f$). This, of course, calls for further theoretical guidance.