On the use of charged-track information to subtract neutral pileup

The use of charged pileup tracks in a jet to predict the neutral pileup component in that same jet could potentially lead to improved pileup removal techniques, provided there is a strong local correlation between charged and neutral pileup. In Monte Carlo simulation we find that the correlation is however moderate, a feature that we attribute to characteristics of the underlying non-perturbative dynamics. Consequently, `neutral-proportional-to-charge' (NpC) pileup mitigation approaches do not outperform existing, area-based, pileup removal methods. This finding contrasts with the arguments made in favour of a new method,"jet cleansing", in part based on the NpC approach. We identify the critical differences between the performances of linear cleansing and trimmed NpC as being due to the former's rejection of subjets that have no charged tracks from the leading vertex, a procedure that we name"zeroing". Zeroing, an extreme version of the"charged-track trimming"proposed by ATLAS, can be combined with a range of pileup-mitigation methods, and appears to have both benefits and drawbacks. We show how the latter can be straightforwardly alleviated. We also discuss the limited potential for improvement that can be obtained by linear combinations of the NpC and area-subtraction methods.


Introduction
Pileup, the superposition of many soft proton-proton collisions over interesting hardscattering events, is a significant issue at CERN's Large Hadron Collider (LHC) and also at 1 Another experimental input that could conceivably help reject pileup in future detectors is precise timing information. One might also wonder about the potential benefit from calorimetric pointing information for photons, given that this is already being used to locate the primary vertex in Higgs decays to two photons [6]. However it seems likely that the degradation of pointing angular resolution [7] due to the lower energy of pileup photons and the higher detector occupancy would render this approach impractical. We thank Isabelle Wingerter for helpful explanations on this point. 2 They also investigated the use of a variable known as the jet vertex fraction, widely used experimentally to reject jets from a vertex other than the leading one [1,10,11].
investigate what properties of events can provide insight into the performance of the NpC method. We will also be led to discuss the possible value of charged tracks from the leading vertex in deciding whether to keep or reject individual subjets (as in charged-track based trimming of Ref. [11]). Finally we shall also examine how one might optimally combine NpC and area-median subtraction.

The Neutral-proportional-to-Charged method
NpC subtraction relies on the experiments' ability to identify whether a given charged track is from a pileup vertex, in order to measure the charged pileup entering a particular jet. To a good extent this charged component can be removed, for example as in CMS's Charged-Hadron Subtraction (CHS) procedure in the context of particle flow [5]. The NpC method then further estimates and subtracts the neutral pileup component by assuming it to be proportional to the charged pileup component. At least two variants can be conceived of.
If the charged pileup particles are kept as part of the jet during clustering, then the corrected jet momentum is [8] p jet,sub where p jet,chg-PU µ is the four-momentum of the charged-pileup particles in the jet and γ 0 is the average fraction of pileup transverse momentum that is carried by charged particles. Specifically, one can define γ 0 ≡ i∈charged particles p ti i∈all particles p ti where the sums run over particles in a given event (possibly limited to some central region with tracking), and the average is carried out across minimum-bias events. If the charged pileup particles are not directly included in the clustering (i.e. it is the CHS event that is provided to the clustering), then one does not have any information on which charged particles should be used to estimate the neutral pileup in a given jet. This problem can be circumvented by a clustering an "emulated" CHS event, in which the charged-pileup particles are kept, but with their momenta rescaled by an infinitesimal factor ǫ. In this case the correction becomes where p jet,CHS µ is the momentum of the jet as obtained from the emulated CHS event, while p jet,rescaled-chg-PU µ is the summed momentum of the rescaled charged-pileup particles that are in the jet. When carrying out NpC-style subtraction, this is our preferred approach because it eliminates any backreaction associated with the charged pileup (this is useful also for area-based subtraction), while retaining the information about charged pileup tracks.
There are multiple issues that may be of concern for the NpC method. For example, calorimeter fluctuations can limit the experiments' ability to accurately remove the charged pileup component as measured with tracks. For out-of-time pileup, which contributes to calorimetric energy deposits, charged-track information may not be available at all. In any case, charged-track information covers only a limited range of detector pseudorapidities. Additionally there are subtleties with hadron masses: in effect, γ 0 is different for transverse components and for longitudinal components. In this work we will avoid this problem by treating all particles as massless. 3 The importance of the above limitations can only be fully evaluated in an experimental context.
We will be comparing NpC to the area-median method. The latter makes a global estimate of pileup transverse-momentum flow per unit area, ρ, by dividing an event into similarly sized patches and taking the median of the transverse-momentum per unit area across all the patches. It then corrects each jet using the globally estimated ρ and the individual jet's area, A µ , 4 p jet,sub Like NpC, the area-median method has potential experimental limitations. They include include questions of non-trivial rapidity dependence and detector non-linearities (the latter are relevant also for NpC). These have, to a reasonable extent, been successfully overcome by the experiments [1,2]. One respect in which NpC may have advantages over the area-median method is that the latter fails to correctly account for the fact that pileup fluctuates from point to point within the event, a feature that cannot be encoded within the global pileup estimate ρ. 5 Furthermore NpC does not need a separate estimation of the background density ρ, which can have systematics related to the event structure (e.g. tt events v. dijet events); and there is no need to include large numbers of ghosts for determining jet areas, a procedure that has a non-negligible computational cost. Let us now proceed with an investigation of NpC's performance, focusing our attention on particle-level events for simplicity. The key question is the potential performance gain due to NpC's use of local information. To study this quantitatively, we consider a circular patch of radius R centred at y = φ = 0 and examine the correlation coefficient of the actual neutral energy flow in the patch with two estimates: (a) one based on the charged energy flow in the same patch and (b) the other based on a global energy flow determination from the neutral particles, ρ ntr . Fig. 1 (left) shows these two correlation coefficients, "ntr v. chg" and "ntr v. ρ ntr A", as a function of R, for two average pileup multiplicities, µ = 20 and µ = 100. One sees that the local neutral-charged correlation is slightly lower, i.e. slightly worse, than the neutral-ρ ntr correlation. Both correlations decrease for small patch radii, ntr. -ρ ntr A (µ = 100) Figure 1: Left: the correlation coefficient between the neutral transverse momentum in a central patch and either the charged transverse momentum in that patch (rescaling this component would not change the value of the correlation coefficient) or the prediction using the area-median method, i.e. ρ ntr A. Right: the standard deviation of the difference between neutral transverse momentum in a central patch and either the rescaled charged transverse momentum in that patch or the prediction using the area-median method, i.e. ρ ntr A. The events are composed of superposed zero-bias collisions simulated with Pythia 8, tune 4C, and the number of collisions per event is Poisson distributed with average µ.
as is to be expected, and the difference between them is larger at small patch radii. The correlation is largely independent of the number of pileup events being considered, which is consistent with our expectations, since all individual terms in the determination of the correlation coefficient should have the same scaling with N PU .
Quantitative interpretations of correlation coefficients can sometimes be delicate, as we discuss in Appendix B, essentially because they combine the covariance of two observables with the two observables' individual variances. We find that it can be more robust to investigate a quantity σ ∆p ntr t , the standard deviation of where the estimate of neutral energy flow, p ntr,estimated t , may be either from the rescaled charged flow or from ρ ntr A. The right-hand plot of Fig. 1 shows σ ∆p ntr t for the two methods, again as a function of R, for two levels of pileup. It is normalised to R √ µ, to factor out the expected dependence on both the patch radius and the level of pileup. A lower value of σ ∆p ntr t implies better performance, and as with the correlation we reach the conclusion that a global estimate of ρ ntr appears to be slightly more effective at predicting local neutral energy flow than does the local charged energy flow. If one hoped to use NpC to improve  Figure 2: The filled histogram shows the distribution, for simulated zero-bias collisions, of the fraction, r, of the transverse momentum in a central circular patch of radius R = 0.4 that is due to charged particles. It is separated into components according to the multiplicity of particles in the patch. The dashed and dotted histograms show the corresponding chargedfraction distributions for each of the two hardest anti-k t , R = 0.4 jets in simulated dijet events, with two choices for the hard generation cut p gen t,min .
on the performance of area-median subtraction, then figure 1 suggests that one will be disappointed.
In striving for an understanding of this finding, one should recall that the ratio of charged-to-neutral energy flow is almost entirely driven by non-perturbative effects. Inside an energetic jet, the non-perturbative effects are at scales ∼ Λ QCD that are tiny compared to the jet transverse momentum p t . There are fluctuations in the relative energy carried by charged and neutral particles, for example because a leading u-quark might pick up ā d or aū from the vacuum. However, because Λ QCD ≪ p t , the charged and neutral energy flow mostly tend to go in the same direction.
The case that we have just seen of an energetic jet gives an intuition that fluctuations in charged and neutral energy flow are going to be locally correlated. It is this intuition that motivates the study of NpC. We should however examine if this intuition is actually valid for pileup. We will examine one step of hadronisation, namely the production of short-lived hadronic resonances, for example a ρ + . The opening angle between the π + π 0 decay products of the ρ + is of order 2m ρ /p t,ρ . Given that pileup hadrons are produced mostly at low p t , say 0.5 − 2 GeV, and that m ρ ≃ 0.77 GeV, the angle between the charged and neutral pions ends up being of order 1 or even larger. As a result, the correlation in direction between charged and neutral energy flow is lost, at least in part. Thus, at low p t , non-perturbative effects specifically tend to wash out the charged-neutral angular correlation.
This point is illustrated in Fig. 2. We consider zero-bias events and examine a circular patch of radius R = 0.4 centred at y = φ = 0. The figure shows the distribution of the charged p t fraction, r, in the patch (filled histogram, broken into contributions where the patch contains 1, 2 or more particles). The same plot also shows the distribution of the charged p t fraction in each of the two leading anti-k t , R = 0.4 jets in dijet events (dashed and dotted histograms). Whereas the charged-to-total ratio for a jet has a distribution peaked around 0.6, as one would expect, albeit with a broad distribution, the result for zero-bias events is striking: in about 60% of events the patch is either just charged or just neutral, quite often consisting of just a single particle (weighting by the p t flow in the patch, the figure goes down to 30%). This is probably part of the reason why charged information provides only limited local information about neutral energy flow in pileup events. These considerations are confirmed by an analysis of the actual performance of NpC and area-median subtraction. We reconstruct jets using the anti-k t algorithm [19], as implemented in FastJet 6 [20], with a jet radius parameter of R = 0.4. We study dijet and pileup events generated with Pythia 8.176 [21], in tune 4C; we assume idealised CHS, treating the charged pileup particles as ghosts. In the dijet ("hard") event alone, i.e. without pileup, we run the jet algorithm and identify jets with absolute rapidity |y| < 2.5 and transverse momentum p t > 150 GeV. Then in the event with superposed pileup (the "full" event) we rerun the jet algorithm and identify the jets that match those selected in the hard event 7 and subtract them using either NpC, Eq. (3), or the area-median method, Eq. (4), with ρ estimated from the CHS event. The hard events are generated with the underlying event turned off, which enables us to avoid subtleties related to the simultaneous subtraction of the underlying event. Figure 3 provides the resulting comparison of the performance of the NpC and areamedian subtraction methods (the latter in CHS and in full events). The left-hand plot shows the average difference between the subtracted jet p t and the p t of the corresponding matched hard jet, as a function of the number of pileup interactions. Both methods clearly perform well here, with the average difference systematically well below 1 GeV even for very high pileup levels. The right-hand plot shows the standard deviation of the difference between the hard and subtracted full jet p t . A lower value indicates better performance, and one sees that in CHS events the area-median method indeed appears to have a small, but consistent advantage over NpC. Comparing area-median subtraction in CHS and full events, one observes a significant degradation in resolution when one fails to use the available information about charged particles in correcting the charged component of pileup, as is to be expected for a particle-level study.
The conclusion of this section is that the NpC method fails to give a superior performance to the area-median method in CHS events. This is because the local correlations of neutral and charged energy flow are no greater than the correlations between local neutral energy flow and the global energy flow. We believe that part of the reason for this is 6 Results shown in this paper have been obtained in some cases with a development snapshot of version 3.1, in others with versions 3.1.0 and 3.1.1. 7 For the matching, we introduce a quantity p shared t (j hard i , j full j ), the scalar sum of the p t 's of the constituents that are common to a given pair i, j of hard and full jets. For a hard jet i, the matched jet in the full event is the one that has the largest p shared t (j hard i , j full j ). In principle, one full jet can match two hard jets, e.g. if two nearby hard jets end up merged into a single full jet due to back-reaction effects. However this is exceedingly rare. The right-hand plot shows the standard deviation of ∆p t (lower values are better). NpC is shown only for CHS events, while area-median subtraction is shown both for events with CHS and for events without it ("full").
that the hadronisation process for low p t particles intrinsically tends to produce hadrons separated by large angles, as illustrated concretely in the case of ρ ± resonance decay.

Cleansing
Part of the original motivation for our work here was to cross check a method recently introduced by Krohn, Low, Schwartz and Wang (KLSW) and called jet cleansing [9]. Cleansing comes in several variants. We will concentrate on linear cleansing, which was seen to perform well across a variety of observables by KLSW. 8 It involves several elements: it breaks a jet into multiple subjets, as done for grooming methods like filtering and trimming [15,14] (cf. also the early work by Seymour [22]). In its "linear" variant, it then corrects individual subjets for pileup by a method that is essentially the same as the NpC approach described in the previous section. Cleansing may also be used in conjunction with trimming-style cuts to the subtracted subjets, specifically it can remove those whose corrected transverse momentum is less than some fraction f cut of the overall jet's transverse momentum (as evaluated before pileup removal). 9 The top left-hand plot of Fig. 4 shows the correlation coefficient between the dijet mass in a hard event and the dijet mass after addition of pileup and application of each of several pileup mitigation methods. The results are shown as a function of N PU . The pileup mitigation methods include two forms of cleansing (with f cut = 0), area-median subtraction, CHS+area subtraction, and CHS+area subtraction in conjunction with trimming (also with f cut = 0). The top right-hand plots shows the corresponding results for the jet mass. For the dijet mass we see that linear (and Gaussian) cleansing performs worse than area subtraction, while in the right-hand plot, for the jet mass, we see linear (and Gaussian) cleansing performing better than area subtraction, albeit not to the extent found in Ref. [9]. These (and, unless explicitly stated, our other Z ′ results) have been generated with the Z ′ decaying to all flavours except tt, and B-hadrons have been kept stable. 10 The lower plot shows the dijet mass for a different Z ′ sample, one that decays only to u, d and s quarks, but not c and b quarks. Most of the results are essentially unchanged. The exception is cleansing, which turns out to be very sensitive to the sample choice. Without stable B-hadrons in the sample, its performance improves noticeably and at high pileup becomes comparable to that of area-subtraction. Both of the left-hand plots in our Fig. 4 differ noticeably from Fig. 4 (left) of Ref. [9] and in particular they are not consistent with KLSW's observation of much improved correlation coefficients for the dijet mass with cleansing relative to area+CHS subtraction. 11 Given our results on NpC in section 2, we were puzzled by the difference between the performance of area-subtraction plus trimming versus that of cleansing: our expectation is that their performances should be very similar. 12 The strong sample-dependence of the 9 In v1 of this article as submitted to arXiv in April 2014, and also in a version circulated to the authors of Ref. [9] several months before the arXiv submission, we used f cut = 0.05 for cleansing, reflecting our understanding of the choices made in v1 of Ref. [9], which stated "[we] supplement cleansing by applying a cut on the ratio f of the subjet p T (after cleansing) to the total jet p T . Subjets with f < f cut are discarded.
[...] Where we do trim/cleanse we employ R sub = 0.3 subjets and take f cut = 0.05." Subsequent to the appearance of v1 of our article, the authors of Ref. [9] clarified that the results in their Fig. 4 had used f cut = 0. This is the choice that we adopt throughout most of this version, and it has an impact notably on the conclusions for the jet-mass performance. 10 We often find this to be useful for particle-level b-tagging studies. Experimentally, in the future, one might even imagine an "idealised" form of particle flow that attempts to reconstruct B-hadrons (or at least their charged part) from displaced tracks before jet clustering. 11 We remain puzzled also by the relative pattern of area+CHS v. area-subtracted results in Fig. 4 (left) of Ref. [9], since the area+CHS curves appears to tend towards area at large pileup, whereas the use of CHS should significantly reduce the impact of pileup. 12 KLSW state in [9] that fluctuations around a 'best' charged fractionγ 0 decrease with increasing N PU and suggest (see also [23], pp. 16 and 17) that this will improve the determination of this fraction and therefore the effectiveness of a method like cleansing, based on a neutral-proportional-to-charged approach. However, this does not happen because, while relative fluctuations aroundγ 0 do indeed decrease proportionally to 1/ √ N PU (a result of the incoherent addition of many pileup events and of the Central Limit Theorem), the absolute uncertainty that they induce on a pileup-subtracted quantity involves an  Jets are reconstructed, as in Ref. [9], with the anti-k t algorithm with R = 1. For both trimming and cleansing, subjets are reconstructed with the k t algorithm with R sub = 0.3 and the f cut value that is applied is f cut = 0. cleansing performance also calls for an explanation. We thus continued our study of the question.
According to the description in Ref. [9], one additional characteristic of linear cleansing relative to area-subtraction is that it switches to jet-vertex-fraction (JVF) cleansing when the NpC-style rescaling would give a negative answer. In contrast, area-subtraction plus trimming simply sets the (sub)jet momentum to zero. We explicitly tried turning the switch to JVF-cleansing on and off and found it had a small effect and did not explain the differences.
Study of the public code for jet cleansing 13 reveals an additional condition being applied to subjets: if a subjet contains no charged particles from the leading vertex (LV), then its momentum is set to zero. This step appears not to have been mentioned in Ref. [9]. Since we will be discussing it extensively, we find it useful to give it a name, "zeroing". Zeroing can be thought of as an extreme limit of the charged-track based trimming procedure introduced by ATLAS [11], whereby a JVF-style cut is applied to reject subjets whose charged-momentum fraction from the leading vertex is too low. Zeroing turns out to be crucial: if we use it in conjunction with CHS area-subtraction (or with NpC subtraction) and f cut = 0 trimming, we obtain results that are very similar to those from cleansing. Conversely, if we turn this step off in linear-cleansing, its results come into accord with those from (CHS) area-subtraction or NpC-subtraction with f cut = 0 trimming.
To help illustrate this, Fig. 5 shows a "fingerprint" for each of several pileup-removal methods, for both the jet p t (left) and mass (right). The fingerprint includes the average shift ( ∆p t or ∆m ) of the observable after pileup removal, shown in black. It also includes two measures of the width of the ∆p t and ∆m distributions: the dispersion (i.e. standard deviation) in red and an alternative peak-width measure in blue. The latter is defined as follows: one determines the width of the smallest window that contains 90% of entries and then scales this width by a factor 0.304. For a Gaussian distribution, the rescaling ensures that the resulting peak-width measure is equal to the dispersion. For a non-Gaussian distribution the two measures usually differ and the orange shaded region quantifies the extent of this difference. The solid black, blue and red lines have been obtained from samples in which the Z ′ decays just to light quarks; the dotted lines are for a sample including cc and bb decays (with stable B-hadrons), providing an indication of the sample dependence; in many cases they are indistinguishable from the solid lines.
Comparing f cut = 0 grooming for NpC, area (without zeroing) and cleansing with zeroing manually disabled, all have very similar fingerprints. Turning on zeroing in the different methods leads to a significant change in the fingerprints, but again NpC, area and cleansing are very similar. 14 When used with f cut = 0 trimming, and when examining quality measures such as the additional factor N PU . The product of the two terms is therefore proportional to √ N PU , i.e. the same scaling as the area-median method. This is consistent with our observations. Note that for area subtraction, the switch from full events to CHS events has the effect of reducing the coefficient in front of √ N PU . 13 Version 1.0.1 from fjcontrib [24]. 14 For the jet mass, Gaussian cleansing appears a little different in the f cut = 0 case with zeroing, suggesting that there may be an advantage from combinations of different constraints on subjet momenta. The peak width is defined as the smallest window of ∆p t that contains 90% of the ∆p t distribution, scaled by a factor ≃ 0.304 such that in the case of a Gaussian distribution the result agrees with the dispersion. The right-hand plot shows the same set of results for the jet mass. The results are obtained in a sample of events with the number of pileup vertices distributed uniformly between 0 and 140. The hard events consist of hadronic Z ′ decays: for the solid vertical lines the sample is Z ′ → dd, uū, ss, while for the dotted lines (sometimes not visible because directly over the solid lines), the sample additionally includes Z ′ → cc, bb with B hadrons kept stable. The Z ′ mass is m Z ′ = 500 GeV and jets are reconstructed with the anti-k t algorithm with R = 1. All results in this figure include charged-hadron subtraction by default. The default form of cleansing, as used e.g. in Fig. 4, is "f cut = 0 zeroing". dispersion (in red, or the closely related correlation coefficient, cf. Appendix B), subjet zeroing appears to be advantageous for the jet mass, but potentially problematic for the jet p t and the dijet mass. However, the dispersion quality measure does not tell the full story regarding the impact of zeroing. Examining simultaneously the peak-width measure (in blue) makes it easier to disentangle two different effects of zeroing. On one hand we find that zeroing correctly rejects subjets that are entirely due to fluctuations of the pileup. This narrows the peak of the ∆p t or ∆m distribution, substantially reducing the (blue) peak-width measures in Fig. 5. On the other hand, zeroing sometimes incorrectly rejects subjets that have no charged tracks from the LV but do have significant neutral energy flow from the LV. This can lead to long tails for the ∆p t or ∆m distributions, adversely affecting the dispersion. 15 It is the interplay between the narrower peak and the longer tails that affects whether overall the dispersion goes up or down with zeroing. In particular the tails appear to matter more for the jet p t and dijet mass than they do for the single-jet mass. Note that accurate Monte Carlo simulation of such tails may be quite challenging: they appear to be associated with configurations where a subjet contains an unusually small number of energetic neutral particles. Such configurations are similar to those that give rise to fake isolated photons or leptons and that are widely known to be difficult to simulate correctly. We commented earlier that the cleansing performance has a significant sample dependence. This is directly related to the zeroing: indeed Fig. 5 shows that for cleansing without zeroing, the sample dependence (dashed versus solid lines) vanishes, while it is substantial with zeroing. Our understanding of this feature is that the lower multiplicity of jets with undecayed B-hadrons (and related hard fragmentation of the B-hadron) results in a higher likelihood that a subjet will contain neutral but no charged particles from the LV, thus enhancing the impact of zeroing on the tail of the ∆p t or ∆m jj sample.
The long tails produced by the zeroing are not necessarily unavoidable. In particular, they can correspond to the loss of subjets with tens of GeV, yet it is very unlikely that a subjet from a pileup collision will be responsible for such a large energy. Therefore we introduce a modified procedure that we call "protected zeroing": one rejects any subjet without LV tracks unless its p t after subtraction is n times larger than the largest charged p t in the subjet from any single pileup vertex (or, more simply, just above some threshold p t,min ; however, using n times the largest charged subjet p t could arguably be better both in cases where one explores a wide range of N PU and for situations involving a hard subjet from a pileup collision). Taking n = 10 (or a fixed p t,min = 20 GeV) we have found reduced tails and, consequently, noticeable improvements in the jet p t and dijet mass dispersion (with little effect for the jet mass). This is visible for area and NpC subtraction in Fig. 5. Protected zeroing also eliminates the sample dependence. Several additional comments can be made about f cut = 0 trimming combined with zeroing. Firstly, f cut = 0 trimming alone introduces a bias in the jet p t , which is clearly visible in the f cut = 0 no-zeroing shifts in Fig. 5. This is because the trimming removes negative fluctuations of the pileup, but keeps the positive fluctuations. Zeroing then counteracts that bias by removing some of the positive fluctuations, those that happened not to have any charged tracks from the LV. It also introduces further negative fluctuations for subjets that happened to have some neutral energy flow but no charged tracks. Overall, one sees that the final net bias comes out to be relatively small. This kind of cancellation between different biases is common in noise-reducing pileup-reduction approaches [26,27,28].
Most of the studies so far in this section have been carried out with a setup that is similar to that of Ref. [9], i.e. R = 1 jets in a Z ′ sample with f cut = 0 trimming. This is not a common setup for most practical applications. For most uses of jets, R = 0.4 is a standard choice and pileup is at its most severe at low to moderate p t . Accordingly, in Fig. 6 (left)   had p t > 50 GeV. We see that qualitatively the pattern is quite similar to that in Fig. 5. 17 Quantitatively, the difference between the various choices is much smaller, with about a 10% reduction in dispersion (or width) in going from ungroomed CHS area-subtraction to the f cut = 0 protected subjet-zeroing case. One should be aware that this study is only for a single p t , across a broad range of pileup. The dispersions for a subset of the methods are shown as a function of the number of pileup vertices in the right-hand plot of Fig. 6. That plot also includes results from the SoftKiller method [27] and illustrates that the benefit from protected zeroing (comparing the solid and dashed blue curves) is about half of the benefit that is brought from SoftKiller (comparing solid blue and black curves). These plots show that protected zeroing is potentially of interest for jet p t determinations in realistic conditions. Thus it would probably benefit from further study: one should, for example, check its behaviour across a range of transverse momenta, determine optimal choices for the protection of the zeroing and investigate also how best to combine it with particle-level subtraction methods such as SoftKiller. 18 Turning now to jet masses, the use of R = 1 is a not uncommon choice, however most applications use a groomed jet mass with a non-zero f cut (or its equivalent): this improves mass resolution in the hard event even without pileup, and it also reduces backgrounds, 17 Note, however, that at even lower jet p t 's, the difference between zeroing and protected zeroing might be expected to disappear. This is because the long negative tails are suppressed by the low jet p t itself. 18 An interesting feature of protected zeroing, SoftKiller and another recently introduced method, PUPPI [28], is that the residual degradation in resolution from pileup appears to scale more slowly than the √ N PU pattern that is observed for area and NpC subtraction alone.
changing the perturbative structure of the jet [29,30] even in the absence of pileup. 19 Accordingly in Fig. 7 we show f cut = 0.05 results (with shifts and widths computed relative to f cut = 0.05 trimmed hard jets) for a hard W W sample where the hard fat jets are required to have p t > 500 GeV. Zeroing, whether protected or not, appears to have little impact. One potential explanation for this fact is as follows: zeroing's benefit comes primarily because it rejects fairly low-p t pileup subjets that happen to have no charged particles from the leading vertex. However for a pileup subjet to pass the f cut = 0.05 filtering criterion in our sample, it would have to have p t > 25 GeV. This is quite rare. Thus filtering is already removing the pileup subjets, with little further to be gained from the chargedbased zeroing. As in the plain jet-mass summary plot, protection of zeroing appears to have little impact for the trimmed jet mass. 20 Does that mean that (protected) zeroing has no scope for improving the trimmed-jet mass? The answer is "not necessarily": one could for example imagine first applying protected zeroing to subjets on some angular scale R zero in order to eliminate low-p t contamination; then reclustering the remaining constituents on a scale R trim R zero , subtracting according to the area or NpC methods, and finally applying the trimming momentum cut (while also keeping in mind the considerations of footnote 20).
We close this section with a summary of our findings. Based on its description in Ref. [9] and our findings about NpC v. area subtraction, cleansing with f cut = 0 would be expected to have a performance very similar to that of CHS+area subtraction with f cut = 0 trimming. Ref. [9] however reported large improvements for the correlation coefficients of the dijet mass and the single jet mass using R = 1 jets. In the case of the dijet mass we do not see these improvements, though they do appear to be there for the jet mass. The differences in behaviour between cleansing and trimmed CHS+area-subtraction call for an explanation, and appear to be due to a step in the cleansing code that was undocumented in Ref. [9] and that we dubbed "zeroing": if a subjet contains no charged tracks from the leading vertex it is discarded. Zeroing is an extreme form of a procedure described in Ref. [11]. In can be used also with area or NpC subtraction, and we find that it brings a benefit for the peak of the ∆p t and ∆m distributions, but appears to introduce long tails in ∆p t . A variant, "protected zeroing", can avoid the long tails by still accepting subjets without leading-vertex tracks, if their p t is above some threshold, which may be chosen dynamically based on the properties of the pileup. In our opinion, a phenomenologicaly realistic estimate of the benefit of zeroing (protected or not) requires study not of R = 1 plain jets, but instead of R = 0.4 jets (for the jet p t ) or larger-R 19 In contrast, for f cut = 0 trimming, the jet structure is unchanced in the absence of pileup. 20 Cleansing appears to perform slightly worse than trimming with NpC or area subtraction. One difference in behaviour that might explain this is that the p t threshold for cleansing's trimming step is f cut p full,no-CHS t (even in the CHS-like input nc separate mode that we use). In contrast, for the area and NpC-based results, it is f cut p full,CHS t . In both cases the threshold, which is applied to subtracted subjets, is increased in the presence of pileup, but this increase is more substantial in the cleansing case. This could conceivably worsen the correspondence between trimming in the hard and full samples. For the area and NpC cases, we investigated the option of using f cut p area-sub,CHS t or f cut p NpC-sub,CHS t and found that this brings a small additional benefit.
trimmed jets with a non-zero f cut (for the jet mass). In a first investigation, there appear to be some phenomenological benefits from protected zeroing for the R = 0.4 jet p t , whereas to obtain benefits for large-R trimmed jets would probably require further adaptation of the procedure. In any case, additional study is required for a full evaluation of protected zeroing and related procedures.  Reading r = 0.67 from Fig. 8 (left) for R = 0.4, and σ NpC ≃ 1.14 σ ρA from Fig. 1 (right), one finds σ ≃ 0.96 σ ρA . Because of the substantial correlation between the two methods, ones expects only a modest gain from their linear combination.
In Fig. 9 we compare the performance of pileup subtraction from the combination of the NpC and the area-median methods, using the optimal value f = 0.31 that can be read from Fig. 8 (right) for R = 0.4, both for the jet p t and the jet mass. The expected small gain is indeed observed for the jet p t , and it is slightly larger for the jet mass. 21 Given the modest size of the gain, one may wonder how phenomenologically relevant it is likely to be. Nevertheless, one might still consider investigating whether the gain carries over also to a realistic experimental environment with full detector effects.

Conclusions
One natural approach to pileup removal is to use the charged pileup particles in a given jet to estimate the amount of neutral pileup that needs to be removed from that same jet. In this article, with the help of particle-level simulations, we have studied such a method (NpC) and found that it has a performance that is similar to, though slightly worse than the existing, widely used area-median method. This can be related to the observation that the correlations between local charged and neutral energy flow are no larger than those between global and local energy flow. Tentatively, we believe that this is in part because the non-perturbative effects that characterise typical inelastic proton-proton collisions act to destroy local charged-neutral correlation.
The absence of benefit that we found from the NpC method led us to question the substantial performance gains quoted for the method of cleansing in Ref. [9], one of whose key differences with respect to earlier work is the replacement of the area-median method with NpC. For the dijet mass, we are unable to reproduce the large improvement observed in Ref. [9], in the correlation coefficient performance measure, for cleansing relative to area subtraction. We do however see an improvement for the jet mass. We trace a key difference in the behaviour of cleansing and area subtraction to the use in the cleansing code of a step that was not documented in Ref. [9] and that discards subjets that contain no tracks from the leading vertex. This "zeroing" step, similar to the charged-track based trimming introduced by ATLAS [11], can indeed be of benefit. It has a drawback of introducing tails in some distributions due to subjets with a substantial neutral p t from the leading vertex, but no charged tracks. As a result, different quality measures lead to different conclusions as to the benefits of zeroing. The tails can be alleviated by a variant of zeroing that we introduce here, "protected zeroing", whereby subjets without LV charged tracks are rejected only if their p t is below some (possibly pileup-dependent) threshold. Protected zeroing does in some cases appear to have phenomenological benefits, which are observed across all quality measures.
Given two different methods for pileup removal, NpC and area-median subtraction, it is natural to ask how independent they are and what benefit might be had by combining them. This was the question investigated in section 4, where we provided a formula for an optimal linear combination of the two methods, as a function of their degree of correlation. Ultimately we found that NpC and area-median subtraction are quite highly correlated, which limits the gains from their combination to about a 5% percent reduction in dispersion. While modest, this might still be sufficient to warrant experimental investigation, as are other methods, currently being developed, that exploit constituent-level subtraction [31,32,27,28]. A study of the integration of those methods with protected zeroing would also be of interest.
Code for our implementation of area subtraction with positive-definite mass is available as part of FastJet versions 3.1.0 and higher. Public code and samples for carrying out a subset of the comparisons with cleansing described in section 3, including also the NpC subtraction tools, are available from Ref. [36]. grant ANR-10-CEXC-009-01, by the EU ITN grant LHCPhenoNet, PITN-GA-2010-264564 and by the ILP LABEX (ANR-10-LABX-63) supported by French state funds managed by the ANR within the Investissements d'Avenir programme under reference ANR-11-IDEX-0004-02. GPS wishes to thank Princeton University for hospitality while this work was being carried out. GS wishes to thank CERN for hospitality while this work was being finalised.

A Details of our study
Let us first fully specify what we have done in our study and then comment on (possible) differences relative to KLSW.
Our hard event sample consists of dijet events from pp collisions at √ s = 14 TeV, simulated with Pythia 8.176 [21], tune 4C, with a minimum p t in the 2 → 2 scattering of 135 GeV and with the underlying event turned off, except for the plots presented in Figs. 4 and 5, where we use Z ′ events with m Z ′ = 500 GeV. Jets are reconstructed with the anti-k t algorithm [19] after making all particles massless (preserving their rapidity) and keeping only particles with |y| < 4. We have R = 0.4, except for the some of the results presented in Section 3 and Appendix B, where we use R = 1 as in Ref. [9]. Given a hard event, we select all the jets with p t > 150 GeV and absolute rapidity |y| < 2.5. We then add pileup and cluster the resulting full event, i.e. including both the hard event and the pileup particles, without imposing any p t or rapidity cut on the resulting jets. For each jet selected in the hard event as described above, we find the jet in the full event that overlaps the most with it. Here, the overlap is defined as the scalar p t sum of all the common jet constituents, as described in footnote 7 on p. 7. Given a pair of jets, one in the hard event and the matching one in the full event, we can apply subtraction/grooming/cleansing to the latter and study the quality of the jet p t or jet mass reconstruction. For studies involving the dijet mass (cf. Fig. 4) we require that at least two jets pass the jet selection in the hard event and use those two hardest jets, and the corresponding matched ones in the full event, to reconstruct the dijet mass. 22 This approach avoids having to additionally consider the impact of pileup on the efficiency for jet selection, which cannot straightforwardly be folded into our quality measures. 23 Most of the studies shown in this paper use idealised particle-level CHS events. In these events, we scale all charged pileup hadrons by a factor ǫ = 10 −60 before clustering, to ensure that they do not induce any backreaction [4]. The jet selection and matching procedures are independent of the use of CHS or full events. When we plot results as a function of the quantity N PU , this corresponds to the actual (fixed) number of zero-bias events superimposed onto the hard collision. For results shown as a function of µ, the average number of zero-bias events, the actual number of zero-bias events has a Poisson distribution. Clustering and area determination are performed with a development version of FastJet 3.1 (which for the features used here behaves identically to the 3.0.x series) and with FastJet 3.1.0 and 3.1.1 for the results in section 3.
Details of how the area-median subtraction is performed could conceivably matter. Jet areas are obtained using active area with explicit ghosts placed up to |y| = 4 and with a default ghost area of 0.01. We use FastJet's GridMedianBackgroundEstimator with a grid spacing of 0.55 to estimate the event background density ρ. The ρ estimation is performed using the particles (up to |y| = 4) from the full or the CHS event as appropriate. When subtracting pileup from jets, we account for the rapidity dependence of ρ, based on the rapidity dependence in a pure pileup sample (as discussed in Refs. [20,35]). We carry out 4-vector subtraction p sub µ = p µ − A µ ρ(y jet ). A few obviously unphysical situations need special care. For jets obtained from the full event, if A t ρ > p jet t , we set p sub µ to a vector with zero transverse momentum, zero mass, 22 In the case of the Z ′ events used for Fig. 4, this does not exactly reflect how we would have chosen to perform a dijet (resonance) study ourselves. One crucial aspect is that searches for dijet resonances always impose a rapidity cut between the two leading jets, such as |∆y| < 1.2 [33,34]. This ensures that high dijet-mass events are not dominated by low p t forward-backward jet pairs, which are usually enhanced in QCD v. resonance production. Those forward-backward pairs can affect conclusions about pileup, because for a given dijet mass the jet p t 's in a forward-backward pair are lower than in a central-central pair, and so relatively more sensitive to pileup. Also the experiments do not use R = 1 for their dijet studies: ATLAS uses R = 0.6 [33], while CMS uses R = 0.5 with a form of radiation recovery based on the inclusion of any additional jets with p t > 30 GeV and within ∆R = 1.1 of either of the two leading jets ("wide jets") [34]. This too can affect conclusions about pileup. 23 One alternative would have been to impose the cuts on the jets in the full event (with pileup and subtraction/grooming/cleansing) and consider as the "hard jet", the subset of the particles in the fullevent jet that come from the leading vertex (i.e. the hard event). We understand that this is close to the choice made in Ref. [9]. This can give overly optimistic results, because it neglects backreaction. However in our studies it did not appear to significantly modify the conclusions on relative performances of different methods. and the rapidity and azimuth of the original unsubtracted jet; and if (p sub ) 2 is negative, an unphysical situation since it would lead to an imaginary mass, we replace p sub µ with a vector with the same transverse components, zero mass, and the rapidity of the original unsubtracted jet. 24 This is essentially equivalent to replacing negative squared masses with zero.
The case of CHS events is a bit more delicate. Let p chg µ denote the 4-momentum of the charged component of the jet. 25 Then, if p sub t < p chg t , we set p sub µ = p chg µ , and when (p sub ) 2 < (p chg ) 2 , we replace p sub µ with a vector with the same transverse components, and the mass and rapidity of p chg µ . For jets with no charged component, whenever the resulting 4-vector has an ill-defined rapidity or azimuthal angle, we use those of the original jet. Corresponding tests that the subtracted transverse momentum and mass are non-negative are also applied in our NpC subtraction. These safety requirements have little impact on the single-jet p t , limited impact on the dijet mass, and for the single-jet mass improve the dispersion of the subtraction relative to the choice (widespread in computer codes) of taking m ≡ − |m 2 | when m 2 < 0. 26 One difference between our study and KLSW's is that we carry out a particle-level study, whereas they project their event onto a toy detector with a 0.1 × 0.1 tower granularity, removing charged particles with p t < 0.5 GeV and placing a 1 GeV threshold on towers. In our original (v1) studies with f cut = 0.05 we tried including a simple detector simulation along these lines and did not find any significant modification to the pattern of our results, though CHS+area subtraction is marginally closer to the cleansing curves in this case. 27 Cleansing has two options: one can give it jets clustered from the full event, and then it uses an analogue of Eq. (1): this effectively subtracts the exact charged part and the NpC estimate of the neutrals. Or one can give it jets clustered from CHS events, and it then applies the analogue of Eq. (3), which assumes that there is no charged pileup left in the jet and uses just the knowledge of the actual charged pileup to estimate (and subtract) the neutral pileup. These two approaches differ by contributions related to backreaction. Our understanding is that KLSW took the former approach, while we used the latter. Specifically, our charged-pileup hadrons, which are scaled down in the CHS event, are scaled back up to their original p t before passing them to the cleansing code, in its 24 In versions of FastJet prior to 3.1.0 this had to be done manually: the treatment of unphysical 4-vectors was left to the user, since the optimal treatment may depend on the context. As of version 3.1.0, FastJet provides the option of "safe" subtraction, whereby subjets with negative mass squared after subtraction are automatically assigned a zero mass (in CHS events, the safe procedure follows the description below in the text). 25 For our "emulated" CHS events, the charged contribution from pileup is negligible since it has been scaled down, and only the contribution from the hard interaction counts. 26 As this work was being finalised, an alternative approach to using area-median information to reconstruct the 4-vector (and shapes) of a jet was proposed in Ref. [32], based on a constituent-level assignment of the subtraction. It appears to bring further improvements in performance for observables like the jet mass. 27 We choose to show particle-level results here because of the difficulty of correctly simulating a full detector, especially given the non-linearities of responses to pileup and the subtleties of particle flow reconstruction in the presence of real detector fluctuations. input nc separate mode. If we use cleansing with full events, we find that its performance worsens, as is to be expected given the additional backreaction induced when clustering the full event. Were it not for backreaction, cleansing applied to full or CHS events should essentially be identical.
Regarding the NpC and cleansing parameters, our value of γ 0 = 0.612 differs slightly from that of KLSW's γ 0 = 0.55, and corresponds to the actual fraction of charged pileup in our simulated events. In our tests with a detector simulation like that of KLSW, we adjusted γ 0 to its appropriate (slightly lower) value.
Finally, for trimming we use R trim = 0.3 and the reference p t is taken unsubtracted, while the subjets are subtracted before performing the trimming cut, which removes subjets with p t below a fraction f cut times the reference p t . Compared to using the subtracted p t as the reference for trimming, this effectively places a somewhat harder cut as pileup is increased. 28 For comparisons with cleansing we generally use f cut = 0 unless explicitly indicated otherwise.

B The correlation coefficient as a quality measure
In this appendix, we discuss some characteristics of correlation coefficients that affect their appropriateness as generic quality measures for pileup studies.
Suppose we have an observable v. Define to be the difference, in a given event, between the pileup subtracted observable and the original "hard" value without pileup. Two widely used quality measures for the performance of pileup subtraction are the average offset of v, ∆v and the standard deviation of ∆v, which we write as σ ∆v . One might think there is a drawback in keeping track of two measures, in part because it is not clear which of the two is more important. It is our view that the two measures provide complementary information: if one aims to reduce systematic errors in a precision measurement then a near-zero average offset may be the most important requirement, so as not to be plagued by issues related to the systematic error on the offset. In a search for a resonance peak, then one aims for the narrowest peak, and so the smallest possible standard deviation. 29 The quality measure advocated in [9] is instead the correlation coefficient between v sub and v hard . This has the apparent simplifying advantage of providing just a single quality measure. However, it comes at the expense of masking potentially important information: for example, a method with a large offset and one with no offset will give identical correlation coefficients, because the correlation coefficient is simply insensitive to (constant) offsets.
The correlation coefficient has a second, more fundamental flaw, as illustrated in Fig. 10. On the left, one has a scatter plot of the dijet mass in PU-subtracted events versus the The hard dijet sample and the analysis are as described in appendix A, with a jet radius of R = 1. The right-hand plot is identical except for the following additional condition on the hard event: 450 < m jj < 550 GeV. Note the lower correlation coefficient, even though the lower σ ∆m jj suggests better typical subtraction in this specific mass bin.
dijet mass in the corresponding hard events, as obtained in an inclusive jet sample. There is a broad spread of dijet masses, much wider than the standard deviation of ∆m jj , and so the correlation coefficient comes out very high, c = 0.988. Now suppose we are interested in reconstructing resonances with a mass near 500 GeV, and so consider only hard events in which 450 < m jj < 550 GeV (right-hand plot). Now the correlation coefficient is 0.59, i.e. much worse. This does not reflect a much worse subtraction: actually, σ ∆m jj is better (lower) in the sample with a limited m jj window, σ ∆m jj = 34 GeV, than in the full sample, σ ∆m jj = 41 GeV. The reason for the puzzling decrease in the correlation coefficient is that the dispersion of m jj is much smaller than before, and so the dispersion of ∆m jj is now comparable to that of m jj : it is this, and not an actual degradation of performance, that leads to a small correlation. This can be understood quantitatively in a simple model with two variables: let v hard have a standard deviation of σ v,hard , and for a given v hard let v sub be distributed with a mean value equal to v hard (plus an optional constant offset) and a standard deviation of jet to the trimmer, in which case the reference p t will be the subtracted one. 29 This statement assumes the absence of tails in the ∆v distribution. For some methods the long tails can affect the relevance of the standard-deviation quality measure.
σ ∆v (independent of v hard ). Then the correlation coefficient of v hard and v sub is i.e. it tends to zero for σ v,hard ≪ σ ∆v and to 1 for large σ v,hard ≫ σ ∆v , in accord with the qualitative behaviour seen in Fig. 10. The discussion becomes more involved if v sub has a more complicated dependence on v hard or if σ ∆v itself depends on v hard , for example as is actually the case for the dijet mass with the analysis of Appendix A. The main conclusion from this appendix is that correlation coefficients mix together information about the quality of pileup mitigation and information about the hard event sample being studied. It is then highly non-trivial to extract just the information about the pileup subtraction. This can lead to considerable confusion, for example, when evaluating the robustness of a method against the choice of hard sample. Overall therefore, it is our recommendation that one consider direct measures of the dispersion introduced by the pileup and subtraction and not correlation coefficients. In cases with severely non-Gaussian tails in the ∆v distributions it can additionally be useful to consider quality measures more directly related to the peak structure of the ∆v distribution.