Analyzing N-point Energy Correlators Inside Jets with CMS Open Data

Jets of hadrons produced at high-energy colliders provide experimental access to the dynamics of asymptotically free quarks and gluons and their confinement into hadrons. In this paper, we show that the high energies of the Large Hadron Collider (LHC), together with the exceptional resolution of its detectors, allow multipoint correlation functions of energy flow operators to be directly measured within jets for the first time. Using Open Data from the CMS experiment, we show that reformulating jet substructure in terms of these correlators provides new ways of probing the dynamics of QCD jets, which enables direct imaging of the confining transition to free hadrons as well as precision measurements of the scaling properties and interactions of quarks and gluons. This opens a new era in our understanding of jet substructure and illustrates the immense unexploited potential of high-quality LHC data sets for elucidating the dynamics of QCD.

Introduction.-High-energy jets produced at the Large Hadron Collider (LHC) provide a unique opportunity to study the nearly conformal dynamics of high-energy quarks and gluons in Quantum Chromodynamics (QCD) as well as their confinement into hadrons. The seminal introduction of robust jet algorithms [1-3] has enabled detailed measurements of the structure of energy flow within jets, providing a new window into these phenomena. This in turn has transformed our ability to search for new physics at the LHC [4-6] and offers the opportunity to transform our understanding of QCD itself [7,8].
The study of energy flow in QCD collisions has a long history [9-15]. Event shape observables were first introduced as resolution variables acting as infrared-safe proxies for the underlying S-matrix elements of quarks and gluons. These observables were well suited for the LEP era, where the primary interest was in the distribution of jets themselves, with each individual jet being relatively low energy and consisting of only a few hadrons. By contrast, the LHC provides high-statistics samples of individual jets with high energies ($p_T > 500$ GeV) and high particle multiplicities, and the substructure of jets can be measured with remarkable angular resolution [16-18]. This massive leap provides an opportunity to rethink the language used for characterizing energy flow in QCD.
Instead of using shape observables, which take as primary the underlying S-matrix elements, it was argued in Ref. [19] that as QCD approaches its conformal limit, one should switch to a characterization of jets in terms of correlation functions. This enables a beautiful reframing of jet substructure in terms of universal scaling behavior and the operator product expansion (OPE) algebra of light-ray operators. Despite the theoretical elegance of the correlator-based approach, measurements of correlators in the perturbative regime require truly high-energy jets, measured with excellent angular resolution, far beyond what was available in the LEP era. Early studies of these observables in both theory [20-24] and experiment [25-34] were thus largely lost to history. With the advent of the LHC, the strong historical preference for jet shapes has left the simplest questions about correlations of energy flow in gauge theories experimentally unanswered.$^{1}$
To bridge the gap between the real-world environment of QCD at the LHC and theoretical developments in conformal field theory, a program was initiated in Ref. [36] to reformulate jet substructure in terms of correlators. This program builds on earlier visionary work in the context of conformal field theories [19,37-41]. In this paper, we take the next step and use publicly available data released by the CMS experiment to perform the first ever analysis of correlation functions of energy flow operators in high-energy jets.$^{2}$ These studies reveal new ways of probing jets at the LHC and transform the beautiful underlying theoretical structures into experimental realities.
$^{1}$ Figs. 1 and 2 provide an affirmative answer to Polchinski's question at 47:04 of [35]. We also hope that this introduction provides a historical explanation (although not an excuse!) for Maldacena's response: "People do not do this. I haven't figured out why they don't."
$^{2}$ We use the term "analysis" instead of "measurement" to highlight that we have not corrected the data for detector effects.
Observables from Correlators.-Correlation functions are a standard approach to characterizing physical systems, typically building in complexity from simple low-point correlators to more complicated higher-point correlators. Instead of correlation functions of local operators familiar from condensed matter systems, the objects of interest in collider experiments are correlation functions of the asymptotic energy flow operator [19,37,38,42-46]:
where $T_{\mu\nu}$ is the stress-energy tensor. These correlation functions (which we refer to generically as EECs) are the fundamental objects of the theory, and are described by an OPE structure [19,46,48-50] that encodes the internal structure of jets.
Of central physical importance is the scaling behavior of correlators as a function of angular size. To isolate this feature, Ref. [36] introduced one-dimensional projections of the higher-point correlators obtained by integrating over their shape, keeping only their longest side fixed. This defines the $N$-point projected correlators:
where $d\Omega_n$ is the area element on the detector, $\Delta R_L$ is an operator selecting the largest angular distance between the $N$ measured directions, and the average is over an ensemble of high-energy jets with energy $E_{\rm jet}$. For hadron collider measurements, we use the standard longitudinally boost-invariant transverse momentum $p_T$ as the energy coordinate and $\Delta R = \sqrt{\Delta y^2 + \Delta\phi^2}$ in the rapidity-azimuth plane as the angular coordinate. In the perturbative regime, the projected correlators exhibit a single-logarithmic scaling governed by the twist-2 spin $j = N + 1$ anomalous dimensions [36]. They therefore capture the scaling properties of a generic $N$-point correlator in a simple one-dimensional observable.
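For orientation, the two definitions described above are commonly written in the energy-correlator literature in the form shown below. These expressions are a reconstruction under that assumption, not a transcription of the displayed equations, and the notation ($\vec n_k$ for the measured directions, a hat on $\Delta\hat R_L$ for the operator selecting the largest angular distance) is chosen here for illustration.

```latex
\mathcal{E}(\vec n) = \lim_{r\to\infty} \int_0^\infty dt\, r^2\, n^i\, T_{0i}(t, r\vec n),
\qquad
\mathrm{ENC}(R_L) = \left(\prod_{k=1}^{N} \int d\Omega_{\vec n_k}\right)
\delta\!\left(R_L - \Delta\hat R_L\right)
\frac{1}{(E_{\rm jet})^{N}}
\left\langle \mathcal{E}(\vec n_1)\,\mathcal{E}(\vec n_2)\cdots\mathcal{E}(\vec n_N) \right\rangle .
```

In this form, each set of $N$ directions contributes with a weight given by the product of its energy fractions, at the value of $R_L$ set by its largest pairwise separation.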
CMS Open Data.-Despite being the fundamental objects of the theory, none of these correlators, nor their scalings, have ever been measured at the LHC. Furthermore, to our knowledge, no $k$-point correlator with $k \geq 3$ has ever been measured at a collider experiment. Fortunately, the public release [59] of research-grade collider datasets by the CMS experiment [60,61] has enabled a new era of open exploratory studies [62-73], allowing us to analyze these correlators on real data. We have found the use of Open Data to be essential for extracting a consistent picture of the behavior of higher-point correlators, which are not guaranteed to be accurately described by the parton shower generators commonly used to study jet substructure observables. While official measurements by the experimental collaborations remain the gold standard in the field, we believe that Open Data studies are an essential tool for theorists exploring the frontiers of QCD.
Our analysis is based on a reprocessed dataset of jets culled from the CMS 2011A Open Data [74] and made public in a simple, reusable "MIT Open Data" (MOD) format by Refs. [69,75]. These jets, clustered using the anti-$k_t$ algorithm with $R = 0.5$ [2,3], have transverse momenta $p_T \in [500, 550]$ GeV and pseudo-rapidity $|\eta| < 1.9$. To minimize detector effects, we focus on track-based observables (i.e., those using only charged particles) for most of this paper, given the excellent track reconstruction performance of CMS [76], including within jets [77]. Tracks are easily incorporated into the theoretical description of correlators using track functions [78-82]. We identify charged particles from the particle flow candidates (PFCs) [83] provided by CMS, which synthesize tracking and calorimeter information. We follow the procedure in Ref. [69] of using charged hadron subtraction (CHS) [84] to mitigate pileup and restricting to PFCs with $p_T > 1$ GeV to minimize acceptance effects. More detailed studies incorporating detector unfolding will be presented elsewhere.
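The kinematic selection quoted above can be summarized in a few lines. The following is a minimal sketch, assuming a hypothetical record layout in which jets and PFCs are dictionaries with keys `pt`, `eta`, and `charge`; it does not reproduce the MOD file format, and it does not implement charged hadron subtraction, which is assumed to have been applied upstream.

```python
def select_jet(jet):
    """Jet-level cuts quoted in the text: pT in [500, 550] GeV and |eta| < 1.9."""
    return 500.0 <= jet["pt"] <= 550.0 and abs(jet["eta"]) < 1.9


def select_tracks(pfcs, pt_min=1.0):
    """Keep charged particle-flow candidates with pT above pt_min (in GeV).

    The dictionary keys are hypothetical; pileup mitigation (CHS) is not done here.
    """
    return [p for p in pfcs if p["charge"] != 0 and p["pt"] > pt_min]
```

Only jets passing `select_jet` would enter the ensemble average, and only the constituents returned by `select_tracks` would enter the track-based correlators.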
Imaging the Confining Transition to Free Hadrons.-The simplest jet substructure observable is the two-point correlator, which probes the dynamics of a jet as a function of the angular scale $R_L$. Here, $R_L$ is associated with a transverse-momentum exchange of $\sim p_T^{\rm jet} R_L$ between two idealized calorimeters at infinity. Since QCD confines, we expect to see two distinct scaling regimes, corresponding to the nearly conformal dynamics of quarks and gluons at large angular scales and to free hadrons at small angular scales.
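To make the observable concrete, the sketch below accumulates the entries of a projected $N$-point correlator for a single jet. It is an illustrative brute-force implementation under stated assumptions, not the analysis code: constituents are hypothetical `(pt, y, phi)` tuples, the jet $p_T$ is approximated by the scalar sum of constituent $p_T$, and every ordered $N$-tuple (including repeated particles) contributes a weight $\prod_i p_{T,i} / p_{T,\rm jet}^N$ at $R_L$ equal to its largest pairwise distance.

```python
import itertools
import math


def delta_r(y1, phi1, y2, phi2):
    """Rapidity-azimuth distance, with the azimuthal difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(y1 - y2, dphi)


def projected_correlator(particles, n=2):
    """Return a list of (R_L, weight) entries for one jet.

    particles: list of (pt, y, phi) tuples (hypothetical input format).
    The cost grows as len(particles)**n, so this is only practical for small n.
    """
    pt_jet = sum(p[0] for p in particles)  # assumption: scalar pT sum as the jet pT
    entries = []
    for combo in itertools.product(particles, repeat=n):
        weight = math.prod(p[0] for p in combo) / pt_jet ** n
        # Largest pairwise distance in the tuple; zero if all entries coincide.
        r_l = max(
            (delta_r(a[1], a[2], b[1], b[2])
             for a, b in itertools.combinations(combo, 2)),
            default=0.0,
        )
        entries.append((r_l, weight))
    return entries
```

Histogramming these entries in logarithmic bins of $R_L$ and averaging over jets would give a distribution of the kind discussed here. With this normalization, the weights of each jet sum to one.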
In Fig. 1, we show the two-point correlator extracted from the CMS Open Data, which provides a striking confirmation of this picture. We now describe each region of this plot, working from large to small angular scales. For $R_L \gtrsim 0.5$, the angular size of the correlator is larger than the $R = 0.5$ radius of the jet, leading to a behavior that is an artifact of the jet clustering algorithm. Moving to smaller angles, we enter a wide regime of universal scaling behavior associated with the perturbative interactions of quarks and gluons, and more explicitly with the light-ray OPE and the twist-2 spin-3 anomalous dimensions. This pristine scaling behavior persists for over a decade, until at $R_L \sim \Lambda_{\rm QCD}/p_T^{\rm jet} \sim 10^{-2}$ there is a clear break in the scaling behavior, corresponding to the confinement of quark and gluon degrees of freedom into hadrons. Below this, we observe a nearly perfect $R_L\, d\sigma/dR_L \propto R_L^2$ scaling, corresponding to uniformly distributed hadrons. Quite remarkably, even if we had no understanding of QCD, we would be able to infer from this analysis that hadrons propagate freely at long distances.
The ability to directly observe a clear transition between interacting partons and free hadrons relies on the high energies of the LHC, where these phases are cleanly separated. Unlike in condensed matter systems, where confinement can be imaged as a function of time [87], one might have naively thought that observing this transition at the LHC would be impossible using only asymptotic measurements. Fortunately, the time evolution of the jet formation is faithfully imprinted into the angular scale of the correlator, $\tau \sim 1/(p_T R_L^2)$, allowing us to image the jet. We believe this opens the door to further studies of the confinement transition using LHC data, complementary to the recent Lund plane measurement from ATLAS [88], as well as applications to the understanding of the time structure of jet quenching in heavy-ion collisions [89-92].
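The relation between angular scale and formation time quoted above can be turned into rough numbers. The sketch below is an order-of-magnitude estimate only, assuming natural units converted with $\hbar c \approx 0.1973$ GeV fm and dropping all order-one factors; the function name is illustrative.

```python
HBARC_GEV_FM = 0.1973269804  # hbar * c in GeV fm, used to convert GeV^-1 to fm


def formation_time_fm(pt_gev, r_l):
    """Estimate tau ~ 1 / (pT * R_L^2) in fm, up to order-one factors."""
    return HBARC_GEV_FM / (pt_gev * r_l ** 2)
```

For a jet with $p_T = 500$ GeV, this estimate gives about 0.04 fm at $R_L = 0.1$ and about 4 fm at $R_L = 0.01$, which illustrates how smaller angles probe later times in the evolution of the jet.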
Ratios of Projected Correlators.-In the wide perturbative window in Fig. 1, the projected $N$-point correlators exhibit a scaling governed by the twist-2 spin-$(N+1)$ anomalous dimensions, providing a precision test of perturbative QCD and a measure of the strong coupling $\alpha_s$ [36]. These correlators have closely related leading non-perturbative corrections for different values of $N$, and thus by taking the ratio to the two-point correlator, we can cancel the leading non-perturbative contribution and isolate a clean perturbative scaling. Taking the ratio has the added benefit that it removes classical scaling contributions: in the absence of anomalous dimensions, this ratio would be unity. A non-vanishing scaling in the ratio is therefore a genuine quantum effect associated with the scaling behavior of the light-ray OPE.
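One simple way to quantify the scaling of such a ratio is to fit a straight line in log-log space over the perturbative window. The sketch below is illustrative only: it assumes binned correlators on a common $R_L$ grid, uses an unweighted least-squares fit that ignores statistical uncertainties, and the function names are hypothetical.

```python
import math


def bin_ratio(enc, e2c):
    """Bin-by-bin ratio ENC/E2C; bins with an empty denominator give None."""
    return [a / b if b > 0 else None for a, b in zip(enc, e2c)]


def fit_power_law_slope(r_values, ratios, r_min, r_max):
    """Least-squares slope of log(ratio) versus log(R_L) for r_min <= R_L <= r_max."""
    pts = [
        (math.log(r), math.log(v))
        for r, v in zip(r_values, ratios)
        if v is not None and v > 0 and r_min <= r <= r_max
    ]
    if len(pts) < 2:
        raise ValueError("need at least two bins in the fit window")
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx
```

A ratio that is flat in $R_L$ would return a slope of zero, so a non-zero fitted slope is a direct measure of the anomalous scaling discussed above.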
In Fig. 2, we show the ratios of projected correlators up to the six-point correlator. In the perturbative regime, a clear scaling behavior is observed. The slope increases as $N$ is increased because the twist-2 anomalous dimension governing the scaling grows monotonically with spin. This provides a validation of the predictions of Ref. [19] in public collider data. Precision measurements of these correlators would be extremely interesting for probing implementations of higher-order DGLAP in parton showers [93] and further testing the light-ray OPE.
Additionally, measurements of this scaling behavior provide direct access to $\alpha_s$ and offer a number of advantages over previous proposals to extract $\alpha_s$ from jet shapes. In particular, this scaling can be measured directly without grooming algorithms [94,95], and can be computed on tracks to significantly reduce experimental uncertainties. Furthermore, measuring the scaling for a family of projected correlators enables one to disentangle the effects of the parton distribution functions. We show a comparison of CMS Open Data to leading-logarithmic QCD predictions in the Supplemental Material.
Shapes of Energy Correlators.-Moving beyond scaling behavior, the shape dependence of higher-point correlators yields insights into the detailed structure of interactions between quarks and gluons. For example, three-point correlators encode spin correlations [96-98] arising from the spin-1 nature of gluons. Measurements of higher-point correlators are also useful for testing the incorporation of higher-point splitting functions in parton shower generators.
Here, we focus on the three-point correlator. For fixed $R_L$, the three-point correlator is a function of two cross-ratios whose analytic form was computed in Ref. [99] to leading order (LO) in QCD. For histogrammed analyses, it is convenient to map the domain of definition of the three-point correlation function to a rectangular grid. Denoting the long, medium, and small sides of the triangle spanned by the operators as $(R_L, R_M, R_S)$, we define the coordinates:
This parametrization blows up the OPE region into a line, with $\xi$ and $\phi$ the radial and angular coordinates about the OPE limit, respectively. More details can be found in the Supplemental Material.
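A mapping with the properties described above can be sketched as follows. The explicit form used here, $\xi = R_S/R_M$ and $\phi = \arcsin\sqrt{1 - (R_L - R_M)^2/R_S^2}$, is an assumption about the coordinates of Eq. (3) and should be checked against the published equation; the function name is hypothetical.

```python
import math


def triangle_to_xi_phi(r_l, r_m, r_s):
    """Map ordered triangle sides (R_L >= R_M >= R_S > 0) to (xi, phi).

    Assumed form: xi = R_S / R_M in (0, 1], phi in [0, pi/2].
    """
    if not (r_l >= r_m >= r_s > 0.0):
        raise ValueError("sides must satisfy R_L >= R_M >= R_S > 0")
    xi = r_s / r_m
    x = (r_l - r_m) / r_s  # lies in [0, 1] by the triangle inequality
    # Clip to guard against rounding slightly outside [0, 1].
    arg = min(1.0, max(0.0, 1.0 - x * x))
    phi = math.asin(math.sqrt(arg))
    return xi, phi
```

With this choice, an equilateral triangle maps to $(\xi, \phi) = (1, \pi/2)$, a flattened triangle with $R_L = R_M + R_S$ maps to $\phi = 0$, and the squeezed limit $R_S \to 0$ is spread along the line $\xi = 0$.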
In Fig. 3, we show the shape dependence of the three-point correlator in the CMS Open Data, fixing $R_L \sim 0.25$. It exhibits a rich shape characteristic of the $1 \to 3$ interaction in QCD. This is the first analysis of a three-point correlator in QCD, and more generally, we believe that it is the first experimental analysis of a three-point correlator of light-ray operators in any theory. The rich LHC data will also enable the measurement of higher-point correlators, as their calculations become available.
Higher-Point Scaling.-In addition to measuring the shape of the three-point correlator for fixed $R_L$, one can also measure the scaling with $R_L$ for fixed shapes. One of the remarkable features of the light-ray OPE structure of the energy correlators is that this scaling can be predicted for arbitrary-point correlators in conformal field theory [19]. In the perturbative regime, where the light-ray OPE is applicable in QCD, it predicts that the scaling of an $N$-point correlator of fixed shape is the same as for the projected $N$-point correlator. This is a much more non-trivial prediction of perturbative QCD, which unlike the projected scaling is not guaranteed to be described by parton shower simulations, making it particularly interesting to study in data.
We focus for concreteness on the scaling of the three-point correlator for fixed shapes. Unfortunately, a LO calculation of the three-point correlator on tracks is not yet available, although it can in principle be obtained using the track function formalism [78-82]. We therefore consider only the measurement on all hadrons, though detector effects (which have not been corrected) are larger. In Fig. 4, we show the scaling for the three-point correlator measured on all hadrons for three different shapes, denoted by A, B, and C, whose precise parametrization is given in the Supplemental Material. The ratio to the projected three-point correlator is shown in the bottom panel. We see consistency with the prediction that the scaling for the shapes is the same as for the projected correlators, though more data and a proper unfolding would be required to make a definitive statement. Interestingly, as shown in the Supplemental Material, this behavior is in tension with the default parton shower in Pythia 8.226 [100]. This strongly motivates both more precise measurements of this scaling and further work to implement the $1 \to 3$ splitting functions into parton showers [101-103].
Conclusions.-In this paper, we argued that taking full advantage of the high energies, multiplicities, and angular resolution of the LHC for studying QCD enables a paradigm shift to thinking about jet substructure in terms of correlation functions of energy flow operators. Using publicly available CMS Open Data, we showed that the underlying theoretical beauty of the correlator-based approach could be accessible in future experimental analyses, and we illustrated how it provides new perspectives on jets at the LHC.
The focus of this paper has been on the phenomenological applications of correlators to jets at the LHC. But the rich theoretical structure underlying energy correlators, which has seen remarkable recent progress from numerous directions [48-50, 97, 99, 104-112], also provides significant motivation for reformulating jet substructure in this language. This combination of new theoretical techniques and phenomenological applications is truly exciting and opens the door to significant progress in our understanding of QCD using the unique experimental capabilities of the LHC.

Supplemental Material to
Analyzing N-point Energy Correlators Inside Jets with CMS Open Data
Patrick T. Komiske, Ian Moult, Jesse Thaler, Hua Xing Zhu
In this Supplemental Material, we provide more detailed results on energy correlators from Open Data, simulation, and theory.

PARAMETRIZATION OF ENERGY CORRELATORS
At hadron colliders, an $N$-point energy correlator is specified by the rapidity and azimuthal angles of $N$ points on an idealized cylindrical calorimeter at infinity. In the jet substructure (collinear) limit we are considering, these configurations can be well approximated by $N$-sided polygons (not necessarily convex), with the side lengths specified by the mutual angular distances of the points, $\Delta R = \sqrt{\Delta y^2 + \Delta\phi^2}$. Two-point, three-point, and four-point correlators are shown schematically in Fig. 5. A three-point projected energy correlator (E3C) is a three-point energy correlator with $R_S$ and $R_M$ integrated over, while maintaining the hierarchy $R_S \leq R_M \leq R_L$.
FIG. 5. Example configurations of two-point, three-point, and four-point energy correlators, labeled by the longest side $R_L$, medium side $R_M$ (in the case of the three-point correlator), and shortest side $R_S$.
In our analysis of the three-point energy correlator, we only distinguish inequivalent configurations up to translation, rotation, and reflection. We use the configuration space of a triangle to label inequivalent configurations. This is illustrated by the green region in Fig. 6a. The squeezed (OPE) limit is located at the bottom left corner. We also label the three triangles plotted in Fig. 4 by A, B, and C in Fig. 6a. To simplify data binning, we make a coordinate transformation of the configuration space to a square, as in Eq. (3). A schematic illustration of the mapping is shown in Fig. 6b, where the squeezed limit has been blown up into a line at $\xi = 0$.
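Because configurations are identified up to translation, rotation, and reflection, a triangle can be labeled by its three ordered side lengths alone. The sketch below computes them from three points in the rapidity-azimuth plane; the `(y, phi)` tuple format and the function names are illustrative assumptions.

```python
import itertools
import math


def delta_r(a, b):
    """Distance between two (y, phi) points, with the azimuth wrapped to [-pi, pi)."""
    dphi = (a[1] - b[1] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a[0] - b[0], dphi)


def triangle_sides(points):
    """Return (R_L, R_M, R_S) for three (y, phi) points, ordered long to short."""
    if len(points) != 3:
        raise ValueError("expected exactly three points")
    sides = [delta_r(a, b) for a, b in itertools.combinations(points, 2)]
    r_s, r_m, r_l = sorted(sides)
    return r_l, r_m, r_s
```

The ordered sides are unchanged if all three points are shifted together or reflected, so two triples with the same `(R_L, R_M, R_S)` fall in the same bin of the configuration space.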

COMPARISON WITH LEADING-LOGARITHMIC PREDICTIONS
It is instructive to compare theory predictions for the $N$-point projected correlator against results obtained from the CMS Open Data. For simplicity, we restrict our theory prediction to leading-logarithmic (LL) accuracy. In principle, a next-to-leading logarithmic analysis could be carried out using the formalism in Refs. [36,105], combined with the use of fragmenting jet functions to incorporate the jet algorithm dependence [113-115]. Next-to-next-to-leading logarithmic predictions are also available for $e^+e^-$ collisions [105], while for hadronic collisions, an infrared subtraction algorithm for collinear-unsafe final-state observables is needed, which is currently not available.
In the LL approximation, the $N$-point projected energy correlator is given by the following factorization formula at the factorization scale $\mu$ [36]:
where $\beta_0 = 11C_A/3 - 2N_f/3$ is the one-loop coefficient of the QCD beta function, $x_q$ ($x_g$) is the fraction of quark (gluon) jets in the sample, and $H_J$ is the production cross section for a jet under the $p_T$ and rapidity selection cuts. Note that at LL, the $N$ dependence only enters through $\gamma^{(0)}(N+1)$. At leading order, $\gamma^{(0)}(j)$ is the anomalous dimension matrix of the twist-2 local Wilson operators for quarks and gluons:
with matrix entries given by
where $\Psi(z) = \Gamma'(z)/\Gamma(z)$ is the logarithmic derivative of the gamma function.
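As a quick numerical check of the beta-function coefficient quoted above, one can insert $C_A = 3$ for QCD together with an assumed $N_f = 5$ active quark flavors, which is a common choice at scales of order 100 GeV but is not stated in the text:

```latex
\beta_0 = \frac{11}{3}\, C_A - \frac{2}{3}\, N_f
        = \frac{11}{3}\cdot 3 - \frac{2}{3}\cdot 5
        = \frac{23}{3} \approx 7.67 .
```

The positive value of $\beta_0$ reflects asymptotic freedom: in this convention the coupling decreases as the scale increases.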
The expression in Eq. (4) is a LL prediction at parton level. At small $R_L$, it scales as $1/R_L$ and is the dominant perturbative contribution. It is known, however, that EEC-type observables suffer from large hadronization corrections, which scale as $1/R_L^2$ [23,44,116,117]. When taking the ratio of projected energy correlators, though, a large part of the hadronization corrections cancels. In addition, taking the ratio also largely cancels the hard function $H_J$. Thus, up to the overall quark/gluon composition, the LL prediction is independent of the parton distribution functions and underlying hard scattering processes that produce the jet ensemble. This makes the ratio of projected energy correlators an ideal candidate for precision QCD measurements.
In Fig. 7, we compare the partonic predictions with CMS Open Data for the ratios of projected energy correlators. The Open Data results are shown for all hadrons (black) and charged hadrons only (red), and their relative agreement is one piece of evidence for the non-perturbative robustness of these ratios. The close agreement between the scaling for the ratios of projected correlators as measured on all hadrons and on tracks arises from a combination of three non-trivial features of these observables. First, due to the renormalization group consistency of the hard-collinear factorization formula in Eq. (4) [36], the use of tracks does not modify the anomalous dimension of the jet or hard functions. Second, as shown in Ref. [36], in a pure gluon theory, the track functions are governed by the same anomalous dimensions as the jet function but with a non-trivial mixing structure, leading to an interesting cancellation and resulting in the same LL scaling behavior whether measured on all hadrons or tracks. Finally, corrections to this picture in QCD are suppressed by the difference of the first moments of the track functions for quarks and gluons. Since high-energy jets in QCD are dominated by pions, the first moments satisfy the approximate relation $T_g(1) \simeq T_q(1) \simeq 2/3$, and hence $\Delta = T_q(1) - T_g(1) \ll 1$ is highly suppressed [82,118].
For our LL calculation, we choose $\alpha_s(M_Z) = 0.118$ and use two-loop running of the strong coupling. We set $\mu = p_T^{\rm jet}/5$ as the nominal scale, as motivated by the fragmenting jet formalism, and vary around the nominal scale by a factor of 2 to estimate the theory uncertainty. The partonic predictions are shown for a pure-quark sample ($x_q = 1$, $x_g = 0$) and a pure-gluon sample ($x_q = 0$, $x_g = 1$). We see that reasonably good agreement can be achieved if a large gluon jet fraction is chosen. The fact that good agreement persists out to $N = 6$ is evidence that hadronization corrections are indeed largely cancelled in the ratios. In future work, it would be interesting to fit the quark/gluon composition to the data using the technique of Ref. [119].
FIG. 7. Ratios of $N$-point projected energy correlators for $N$ ranging from 2 to 6. We show results from the CMS Open Data for both the all-hadron case (black) and the charged-hadron case (red). We stress that these results do not involve detector unfolding and that the error bars represent only statistical uncertainties. We also show LL predictions for quark and for gluon jets, with the corresponding scale uncertainty bands.
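The scale setting described above can be illustrated numerically. The sketch below is a simplification, not the setup of the LL calculation: it uses one-loop running with a fixed $N_f = 5$ (both assumptions made here for brevity, whereas the text uses two-loop running), and the function names are hypothetical.

```python
import math

M_Z = 91.1876       # Z boson mass in GeV
ALPHA_S_MZ = 0.118  # reference value of the coupling quoted in the text


def beta0(n_f=5, c_a=3.0):
    """One-loop coefficient beta_0 = 11 C_A / 3 - 2 N_f / 3."""
    return 11.0 * c_a / 3.0 - 2.0 * n_f / 3.0


def alpha_s_one_loop(mu, n_f=5):
    """One-loop running coupling at scale mu (GeV), fixed n_f, no flavor thresholds."""
    log_term = math.log(mu ** 2 / M_Z ** 2)
    return ALPHA_S_MZ / (1.0 + ALPHA_S_MZ * beta0(n_f) / (4.0 * math.pi) * log_term)


def scale_variations(pt_jet, factor=2.0):
    """Nominal scale mu = pT/5 and its variations down and up by the given factor."""
    mu0 = pt_jet / 5.0
    return {"down": mu0 / factor, "nominal": mu0, "up": mu0 * factor}
```

For a 500 GeV jet, `scale_variations(500.0)` gives scales of 50, 100, and 200 GeV, and evaluating `alpha_s_one_loop` at these three scales indicates how much the coupling entering the prediction changes across the uncertainty band.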
For completeness, in Fig. 8 we show the two-point correlator for the all-hadron case. As for the charged-hadron version in Fig. 1, the different phases of QCD are still visible. That said, in the quark/gluon phase, the scaling law seems to be weakly violated. We suspect this is due to detector effects, so it will be interesting to see if these features are absent once unfolding is performed. Perhaps counterintuitively, the $R_L\, d\sigma/dR_L \propto R_L^2$ scaling is robust in the all-hadron case, despite the worse angular resolution for neutral hadrons. One has to remember, though, that detector smearing effects also induce decorrelation, so detailed studies are needed to disentangle detector effects from a genuine QCD phase transition.
In future work, it will be interesting to understand the mismatch between the parton shower and LO predictions. While Pythia includes LL and partial NLL resummation of logarithms of $R_S$ and the fixed-order prediction does not, the configurations A, B, and C are chosen such that large logarithms of $R_S$ are less important. Furthermore, the fixed-order prediction includes matrix-element corrections for $1 \to 3$ splittings, whereas the default parton shower in Pythia does not. Understanding the origin of the differences between Pythia and fixed-order theory, and between Pythia and the CMS Open Data, might shed light on the resummation of the $R_L$ scaling and on the matrix-element corrections for $1 \to 3$ splittings.

FIG. 3. The normalized shape dependence of the three-point correlator. Shown here is a slice of the data at $R_L \sim 0.25$ with the coordinates $(\xi, \phi)$ defined in Eq. (3).

FIG. 4. Scaling behavior for fixed shapes of the three-point correlator, whose parametrization is given in the Supplemental Material. The ratio to the projected three-point correlator is shown in the bottom panel, where flat ratios correspond to the perturbative prediction in the shaded region. Unlike the previous plots, these results are for all hadrons.