ParticleNet: Jet Tagging via Particle Clouds

How to represent a jet is at the core of machine learning on jet physics. Inspired by the notion of point clouds, we propose a new approach that considers a jet as an unordered set of its constituent particles, effectively a "particle cloud". Such a particle cloud representation of jets is efficient in incorporating raw information of jets and also explicitly respects the permutation symmetry. Based on the particle cloud representation, we propose ParticleNet, a customized neural network architecture using Dynamic Graph Convolutional Neural Network for jet tagging problems. The ParticleNet architecture achieves state-of-the-art performance on two representative jet tagging benchmarks and improves significantly over existing methods.


I. INTRODUCTION
A jet is one of the most ubiquitous objects in proton-proton collision events at the LHC. In essence, a jet is a collimated spray of particles. It serves as a handle to probe the underlying elementary particle produced in the hard scattering process that initiates the cascade of particles contained in the jet.
One of the most important questions about a jet is which type of elementary particle initiates it. Jets initiated by different particles exhibit different characteristics. For example, jets initiated by gluons tend to have a broader energy spread than jets initiated by quarks. High-momentum heavy particles (e.g., top quarks and W, Z, and Higgs bosons) that decay hadronically can lead to jets with distinct multi-prong structures. Therefore, the identity of the source particle can be inferred from properties of the reconstructed jet. Such particle identity information provides powerful insights into the collision events under study and therefore can help greatly in separating events originating from different physics processes and improving the sensitivity of both searches for new particles and measurements of the standard model processes.
In this paper, we propose a new deep-learning approach for jet tagging using a novel way to represent jets. Instead of organizing a jet's constituent particles into an ordered structure (e.g., a sequence or a tree), we treat a jet as an unordered set of particles [57]. This is closely analogous to the point cloud representation of three-dimensional (3D) shapes used in computer vision, where each shape is represented by a set of points in space, and the points themselves are also unordered. Therefore, a jet can be viewed as a "particle cloud". Based on Dynamic Graph Convolutional Neural Network (DGCNN) [58], we design ParticleNet, a customized neural network architecture that operates directly on particle clouds for jet tagging. The ParticleNet architecture is evaluated on two jet tagging benchmarks and is found to achieve significant improvements over all existing methods.

II. JET REPRESENTATIONS
The efficiency and effectiveness of ML techniques on jet physics rely heavily on how a jet is represented. In this section, we review the mainstream jet representations and introduce the particle cloud representation.

A. Image-based representation
The image representation has its root in the reconstruction of jets with calorimeters. A calorimeter measures the energy deposition of a jet on fine-grained spatial cells. Treating the energy deposition on each cell as the pixel intensity naturally creates an image for a jet. When jets are formed by particles reconstructed with the full detector information (e.g., using a particle-flow algorithm [59,60]), a jet image can be constructed by mapping each particle onto the corresponding calorimeter cell and summing the energy if more than one particle is mapped to the same cell.

arXiv:1902.08570v3 [hep-ph] 30 Mar 2020
The image-based approach has been extensively studied for various jet tagging tasks, e.g., W boson tagging [25-29,35], top tagging [32-34], and quark-gluon tagging [30,31]. Convolutional neural networks (CNNs) with various architectures were explored in these studies, and they were found to achieve sizable improvement in performance compared to traditional multivariate methods using observables motivated by QCD theory. However, the architectures investigated in these papers are in general much shallower than state-of-the-art CNN architectures used in image classification tasks (e.g., ResNet [61] or Inception [62]); therefore, it remains to be seen whether deeper architectures can further improve the performance.
Despite the promising performance, the image-based representation has two main shortcomings. While it can include all information without loss when a jet is measured by only the calorimeter, once the jet constituent particles are reconstructed, how to incorporate additional information of the particles is unclear, as it involves combining non-additive quantities (e.g., the particle type) of multiple particles entering the same cell. Moreover, treating jets as images also leads to a very sparse representation: a typical jet has O(10) to O(100) particles, while a jet image typically needs O(1000) pixels (e.g., 32 × 32) in order to fully contain the jet; therefore, more than 90% of the pixels are blank. This makes the CNNs highly computationally inefficient on jet images.
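The sparsity argument can be checked with a toy example. The sketch below is illustrative only: the particle positions and energies are random stand-ins rather than simulated jets, and the function name `jet_image`, the 32 × 32 grid, and the half-width of 0.8 are assumptions chosen for the demonstration.

```python
import numpy as np

def jet_image(d_eta, d_phi, energy, n_pixels=32, half_width=0.8):
    """Sum particle energies into an n_pixels x n_pixels grid around the jet axis."""
    edges = np.linspace(-half_width, half_width, n_pixels + 1)
    image, _, _ = np.histogram2d(d_eta, d_phi, bins=(edges, edges), weights=energy)
    return image

# Toy "jet": 50 particles scattered around the jet axis (random stand-in data).
rng = np.random.default_rng(0)
n_particles = 50
d_eta = rng.normal(0.0, 0.2, n_particles)
d_phi = rng.normal(0.0, 0.2, n_particles)
energy = rng.exponential(10.0, n_particles)

image = jet_image(d_eta, d_phi, energy)
blank_fraction = np.mean(image == 0.0)
print(f"{image.size} pixels, {blank_fraction:.1%} blank")
```

Since each particle lands in at most one pixel, a jet with 50 particles can fill at most 50 of the 1024 pixels, so at least 95% of the image is blank regardless of how the particles are distributed.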

B. Particle-based representation
A more natural way to represent a jet, when particles are reconstructed, is to simply view the jet as a collection of its constituent particles. This approach allows for the inclusion of any kind of features for each particle and therefore is significantly more flexible than the image representation. It is also much more compact compared to the image representation, though at the cost of being variable length, as each jet may contain a different number of particles.
A collection of particles, though, is a rather general concept. Before applying any deep-learning algorithm, a concrete data structure has to be chosen. The prevailing choice is a sequence, in which particles are sorted in a specific way (e.g., with decreasing transverse momentum) and organized into a one-dimensional (1D) list. Using particle sequences as inputs, jet tagging tasks have been tackled with recurrent neural networks (RNNs) [36-39,45], 1D CNNs [40-44], and physics-oriented neural networks [46-48]. Another interesting choice is a binary tree, which is well motivated from the QCD theory perspective. Recursive neural networks (RecNNs) are then a natural fit and have been studied in Refs. [49,50].
One thing to note about the sequence or tree representation is that they both need the particles to be sorted in some way, as the order of the particles is used implicitly in the corresponding RNNs, 1D CNNs, or the RecNNs. However, the constituent particles in a jet have no intrinsic order; thus, the manually imposed order may turn out to be suboptimal and impair the performance.

C. Jet as a particle cloud
An even more natural representation than particle sequences or trees would be an unordered, permutation-invariant set of particles. As a special case of the particle-based representations, it shares all the advantages of particle-based representations, especially the flexibility to include arbitrary features for each particle. We refer to such representation of a jet as a particle cloud, analogous to the point cloud representation of 3D shapes used in computer vision. They are actually highly similar, as both are essentially unordered sets of entities distributed irregularly in space. In both clouds, the elements are not unrelated individuals but are rather correlated, as they represent higher-level objects (i.e., jets or 3D shapes) that have rich internal structures. Therefore, deep-learning algorithms developed for point clouds are likely to be helpful for particle clouds, i.e., jets, as well.
The idea of regarding jets as unordered sets of particles was also proposed in Ref. [52] and is in parallel to our work. The Deep Sets framework [63] was adapted to construct the infrared and collinear safe Energy Flow Network and the more general Particle Flow Network. However, different from the DGCNN [58] approach adopted in this paper, the Deep Sets approach does not explicitly exploit the local spatial structure of particle clouds, but only processes the particle clouds in a global way. Another closely related approach is to represent a jet as a graph whose vertices are the particles. Message-passing neural networks (MPNNs) with different variants of adjacency matrices were explored on such jet graphs and were found to show better performance than the RecNNs [51]. However, depending on how the adjacency matrix is defined, the MPNNs may not respect the permutation symmetry of the particles.
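To make the "global" character of the Deep Sets approach concrete, the following sketch applies a per-particle map, sums over all particles, and then applies a jet-level map. It is a schematic illustration, not the Energy Flow or Particle Flow Network of Ref. [52]: the layer sizes, the random weights `W_phi` and `W_f`, and the function name `deep_sets` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
W_phi = rng.normal(size=(4, 8))  # per-particle map (hypothetical sizes, untrained)
W_f = rng.normal(size=(8, 2))    # jet-level map acting on the pooled vector

def deep_sets(particles):
    """particles: (N, 4) array of per-particle features."""
    latent = np.maximum(particles @ W_phi, 0.0)  # each particle is processed independently
    pooled = latent.sum(axis=0)                  # global, permutation-symmetric sum
    return pooled @ W_f

jet = rng.normal(size=(30, 4))
shuffled = jet[rng.permutation(len(jet))]
print(np.allclose(deep_sets(jet), deep_sets(shuffled)))
```

The output is unchanged under any reordering of the particles, but no step looks at which particles are close to each other, which is the local information that EdgeConv is designed to use.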

III. NETWORK ARCHITECTURE
The permutation symmetry of the particle cloud makes it a natural and promising representation of jets. However, to achieve the best possible performance, the architecture of the neural network has to be carefully designed to fully exploit the potential of this representation. In this section, we introduce ParticleNet, a CNN-like deep neural network for jet tagging with particle cloud data.

A. Edge convolution
CNNs have achieved overwhelming success in all kinds of machine-learning tasks on visual images. Two key features of CNNs contribute significantly to their success.
First, the convolution operation exploits translational symmetry of images by using shared kernels across the whole image. This not only greatly reduces the number of parameters in the network but also allows the parameters to be learned more effectively, as each set of weights will use all locations of the image for learning. Second, CNNs exploit a hierarchical approach [64] for learning image features. The convolution operations can be effectively stacked to form a deep network. Different layers in the CNNs have different receptive fields and therefore can learn features at different scales, with the shallower layers exploiting local neighborhood information and the deeper layers learning more global structures. Such a hierarchical approach proves an effective way to learn images.
Motivated by the success of CNNs, we would like to adopt a similar approach for learning on point (particle) cloud data. However, the regular convolution operation cannot be applied on point clouds, as the points there can be distributed irregularly, rather than following some uniform grids as the pixels in an image. Therefore, the basis for a convolution, i.e., a "local patch" of each point on which the convolution kernel operates, remains to be defined for point clouds. Moreover, a regular convolution operation, typically in the form $\sum_j K_j x_j$ where $K$ is the kernel and $x_j$ denotes the features of each point, is not invariant under permutation of the points. Thus, the form of a convolution also needs to be modified to respect the permutation symmetry of point clouds.
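The permutation problem can be seen in a few lines. In this sketch (random numbers, hypothetical names `conv_like` and `symmetric`), a position-dependent kernel gives a different answer when the points of a patch are reordered, whereas shared weights followed by a symmetric aggregation do not.

```python
import numpy as np

rng = np.random.default_rng(2)
k, F = 5, 3
K = rng.normal(size=(k, F))   # one kernel weight vector K_j per position j
x = rng.normal(size=(k, F))   # features x_j of the k points in a patch

def conv_like(points):
    # sum_j K_j . x_j: the result depends on which point sits at position j
    return np.sum(K * points)

def symmetric(points):
    # the same weights for every point, followed by a symmetric mean
    return np.mean(points @ K[0])

perm = np.roll(np.arange(k), 1)  # cyclic reordering of the points
print(conv_like(x), conv_like(x[perm]))
print(symmetric(x), symmetric(x[perm]))
```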
Recently, the edge convolution ("EdgeConv") operation has been proposed in Ref. [58] as a convolution-like operation for point clouds. EdgeConv starts by representing a point cloud as a graph, whose vertices are the points themselves, and the edges are constructed as connections between each point to its $k$ nearest neighboring points. In this way, a local patch needed for convolution is defined for each point as the $k$ nearest neighboring points connected to it. The EdgeConv operation for each point $x_i$ then has the form
$$x_i' = \mathop{\square}_{j=1}^{k} h_\Theta(x_i, x_{i_j}),$$
where $x_i \in \mathbb{R}^F$ denotes the feature vector of the point $x_i$ and $\{i_1, ..., i_k\}$ are the indices of the $k$ nearest neighboring points of the point $x_i$. The edge function $h_\Theta : \mathbb{R}^F \times \mathbb{R}^F \to \mathbb{R}^{F'}$ is some function parametrized by a set of learnable parameters $\Theta$, and $\square$ is a channel-wise symmetric aggregation operation, e.g., max, sum, or mean. The parameters $\Theta$ of the edge function are shared for all points in the point cloud. This, together with the choice of a symmetric aggregation operation $\square$, makes EdgeConv a permutationally symmetric operation on point clouds [65].
In this paper, we follow the choice in Ref. [58] to use a specialized form of the edge function,
$$h_\Theta(x_i, x_{i_j}) = \bar{h}_\Theta(x_i, x_{i_j} - x_i),$$
where the feature vectors of the neighbors, $x_{i_j}$, are substituted by their differences from the central point $x_i$, and $\bar{h}_\Theta$ can be implemented as a multilayer perceptron (MLP) whose parameters are shared among all edges. For the aggregation operation $\square$, however, we use mean, i.e., $\frac{1}{k}\sum$, throughout this paper, which shows better performance than the max operation used in the original paper.
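The EdgeConv operation with the difference-based edge function and mean aggregation can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the MLP is reduced to bias-free linear layers with ReLU (no batch normalization), the weights are random, the neighbor indices are chosen arbitrarily instead of by a nearest-neighbor search, and the name `edge_conv` is hypothetical.

```python
import numpy as np

def edge_conv(feats, neighbour_idx, weights):
    """Mean-aggregated EdgeConv.

    feats:         (N, F) feature vectors x_i
    neighbour_idx: (N, k) integer indices i_1..i_k of the neighbours of each point
    weights:       list of MLP weight matrices; the first one has 2F rows
    """
    k = neighbour_idx.shape[1]
    center = np.repeat(feats[:, None, :], k, axis=1)   # x_i, shape (N, k, F)
    offset = feats[neighbour_idx] - center             # x_{i_j} - x_i
    h = np.concatenate([center, offset], axis=-1)      # edge features, (N, k, 2F)
    for W in weights:                                  # MLP shared among all edges
        h = np.maximum(h @ W, 0.0)
    return h.mean(axis=1)                              # (1/k) * sum over the k edges

rng = np.random.default_rng(3)
N, F, k = 10, 4, 3
feats = rng.normal(size=(N, F))
# Arbitrary neighbour lists for the demo (each point gets k other points).
idx = np.stack([rng.choice(np.delete(np.arange(N), i), size=k, replace=False)
                for i in range(N)])
weights = [rng.normal(size=(2 * F, 8)), rng.normal(size=(8, 8))]

out = edge_conv(feats, idx, weights)
print(out.shape)
```

The output has one feature vector per input point, with the feature dimension set by the last MLP layer, and it does not depend on the order in which the neighbors of a point are listed.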
One important feature of the EdgeConv operation is that it can be easily stacked, just as regular convolutions. This is because EdgeConv can be viewed as a mapping from a point cloud to another point cloud with the same number of points, only possibly changing the dimension of the feature vector for each point. Therefore, another EdgeConv operation can be applied subsequently. This allows us to build a deep network using EdgeConv operations which can learn features of point clouds hierarchically.
The stackability of EdgeConv operations also brings another interesting possibility. Basically, the feature vectors learned by EdgeConv can be viewed as new coordinates of the original points in a latent space, and then, the distances between points, used in the determination of the k nearest neighbors, can be computed in this latent space. In other words, the proximity of points can be dynamically learned with EdgeConv operations. This results in the DGCNN [58], in which the graph describing the point cloud is dynamically updated to reflect the changes in the edges, i.e., the neighbors of each point. Reference [58] demonstrates that this leads to better performance than keeping the graph static.
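The dynamic graph update amounts to rerunning the nearest-neighbor search in whatever space the current feature vectors live in. The sketch below illustrates this with a brute-force search; the helper name `knn_indices` is hypothetical, the "latent" features are a random stand-in for learned ones, and excluding each point from its own neighbor list is an assumption.

```python
import numpy as np

def knn_indices(points, k):
    """Brute-force k nearest neighbours of each row of `points` (Euclidean distance)."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist2, np.inf)          # assumption: a point is not its own neighbour
    return np.argsort(dist2, axis=1)[:, :k]

rng = np.random.default_rng(4)
coords = rng.normal(size=(20, 2))            # e.g. positions in a 2D plane
# Random stand-in for feature vectors produced by a previous EdgeConv layer.
latent = np.tanh(coords @ rng.normal(size=(2, 8)) + rng.normal(size=(20, 8)))

static_graph = knn_indices(coords, k=4)      # neighbours in the original coordinates
dynamic_graph = knn_indices(latent, k=4)     # neighbours recomputed in the latent space
overlap = np.mean([len(set(a) & set(b)) / 4
                   for a, b in zip(static_graph, dynamic_graph)])
print(f"average neighbour overlap: {overlap:.2f}")
```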

B. ParticleNet
The ParticleNet architecture makes extensive use of EdgeConv operations and also adopts the dynamic graph update approach. However, a number of different design choices are made in ParticleNet compared to the original DGCNN to better suit the jet tagging task, including the number of neighbors, the configuration of the MLP in EdgeConv, the use of shortcut connection, etc. Figure 1 illustrates the structure of the EdgeConv block implemented in this paper. The EdgeConv block starts with finding the k nearest neighboring particles for each particle, using the "coordinates" input of the EdgeConv block to compute the distances. Then, inputs to the EdgeConv operation, the "edge features", are constructed from the "features" input using the indices of k nearest neighboring particles. The EdgeConv operation is implemented as a three-layer MLP. Each layer consists of a linear transformation, followed by a batch normalization [66] and then a rectified linear unit (ReLU) [67]. Inspired by ResNet [61], a shortcut connection running parallel to the EdgeConv operation is also included in each block, allowing the input features to pass through directly. An EdgeConv block is characterized by two hyperparameters, the number of neighbors k, and the number of channels C = (C_1, C_2, C_3), corresponding to the number of units in each linear transformation layer.
The ParticleNet architecture used in this paper is shown in Fig. 2a. It consists of three EdgeConv blocks. The first EdgeConv block uses the spatial coordinates of the particles in the pseudorapidity-azimuth space to compute the distances, while the subsequent blocks use the learned feature vectors as coordinates. The number of nearest neighbors k is 16 for all three blocks, and the number of channels C for each EdgeConv block is (64, 64, 64), (128, 128, 128), and (256, 256, 256), respectively.
After the EdgeConv blocks, a channel-wise global average pooling operation is applied to aggregate the learned features over all particles in the cloud. This is followed by a fully connected layer with 256 units and the ReLU activation. A dropout layer [68] with a drop probability of 0.1 is included to prevent overfitting. A fully connected layer with two units, followed by a softmax function, is used to generate the output for the binary classification task.
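The classification head described above can be sketched as follows. The weights are random placeholders, the function names are hypothetical, and dropout is omitted because it is inactive at inference time; only the layer sizes (256 input channels, 256 hidden units, two outputs) follow the text.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classifier_head(particle_feats, W1, b1, W2, b2):
    """particle_feats: (N, C) features of the N particles after the EdgeConv blocks."""
    pooled = particle_feats.mean(axis=0)          # channel-wise global average pooling
    hidden = np.maximum(pooled @ W1 + b1, 0.0)    # fully connected layer + ReLU
    return softmax(hidden @ W2 + b2)              # two-class probabilities

rng = np.random.default_rng(6)
W1, b1 = rng.normal(size=(256, 256)) * 0.05, np.zeros(256)
W2, b2 = rng.normal(size=(256, 2)) * 0.05, np.zeros(2)

particle_feats = rng.normal(size=(37, 256))       # a jet with 37 particles
probs = classifier_head(particle_feats, W1, b1, W2, b2)
print(probs)
```

Because the pooling averages over particles, the head accepts jets with any number of particles and its output does not depend on their order.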
A similar network with reduced complexity is also investigated. Compared to the baseline ParticleNet architecture, only two EdgeConv blocks are used, with the number of nearest neighbors k reduced to 7 and the number of channels C reduced to (32,32,32) and (64,64,64) for the two blocks, respectively. The number of units in the fully connected layer after pooling is also lowered to 128. This simplified architecture is denoted as "ParticleNet-Lite" and is illustrated in Fig. 2b. The number of arithmetic operations is reduced by almost an order of magnitude in ParticleNet-Lite, making it more suitable when computational resources are limited.
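As a rough cross-check of the two configurations, the number of trainable parameters can be tallied from the channel counts. The bookkeeping below rests on assumptions that the text does not state: the linear layers inside EdgeConv carry no bias, every batch normalization contributes a scale and a shift per channel, the shortcut is a bias-free linear layer followed by batch normalization, the fully connected layers have biases, and the input has the seven kinematic features of Table I. The function names are hypothetical.

```python
def edgeconv_block_params(f_in, channels):
    """Parameter count of one EdgeConv block under the assumptions listed above."""
    total = 0
    width = 2 * f_in                                   # edge features: (x_i, x_j - x_i)
    for c in channels:
        total += width * c + 2 * c                     # bias-free linear + batch-norm scale/shift
        width = c
    total += f_in * channels[-1] + 2 * channels[-1]    # shortcut: linear + batch norm
    return total

def network_params(n_features, blocks, fc_units, n_classes=2):
    total, width = 0, n_features
    for channels in blocks:
        total += edgeconv_block_params(width, channels)
        width = channels[-1]
    total += width * fc_units + fc_units               # fully connected layer with bias
    total += fc_units * n_classes + n_classes          # output layer with bias
    return total

particlenet = network_params(7, [(64,) * 3, (128,) * 3, (256,) * 3], fc_units=256)
particlenet_lite = network_params(7, [(32,) * 3, (64,) * 3], fc_units=128)
print(particlenet, particlenet_lite)
```

Under these assumptions the ParticleNet-Lite count comes out at about 26k, in line with the figure quoted in Sec. V, and the full ParticleNet is roughly 14 times larger.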
The networks are implemented with Apache MXNet. An optimizer [70] with a weight decay of 0.0001 is used to minimize the cross-entropy loss. The one-cycle learning rate (LR) schedule [71] is adopted in the training, with the LR selected following the LR range test described in Ref. [71], and slightly tuned afterward with a few trial trainings. The training of the ParticleNet (ParticleNet-Lite) network uses an initial LR of $3 \times 10^{-4}$ ($5 \times 10^{-4}$), rising to the peak LR of $3 \times 10^{-3}$ ($5 \times 10^{-3}$) linearly in eight epochs and then decreasing to the initial LR linearly in another eight epochs. This is followed by a cooldown phase of four epochs which gradually reduces the LR to $5 \times 10^{-7}$ ($1 \times 10^{-6}$) for better convergence. A snapshot of the model is saved at the end of each epoch, and the model snapshot showing the best accuracy on the validation dataset is selected for the final evaluation.
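The learning rate schedule described above can be written as a simple function of the epoch. The defaults below are the ParticleNet values from the text; the function name `one_cycle_lr` is hypothetical, and the linear shape of the cooldown phase is an assumption, since the text only says that the LR is reduced gradually.

```python
def one_cycle_lr(epoch, lr_init=3e-4, lr_peak=3e-3, lr_final=5e-7, ramp=8, cooldown=4):
    """Learning rate at a (possibly fractional) epoch in [0, 2 * ramp + cooldown]."""
    if epoch <= ramp:                                   # linear rise to the peak
        return lr_init + (lr_peak - lr_init) * epoch / ramp
    if epoch <= 2 * ramp:                               # linear fall back to the initial LR
        return lr_peak - (lr_peak - lr_init) * (epoch - ramp) / ramp
    frac = min((epoch - 2 * ramp) / cooldown, 1.0)      # cooldown (assumed linear)
    return lr_init + (lr_final - lr_init) * frac

for epoch in (0, 4, 8, 12, 16, 18, 20):
    print(epoch, f"{one_cycle_lr(epoch):.2e}")
```

For ParticleNet-Lite the same function would be called with `lr_init=5e-4`, `lr_peak=5e-3`, and `lr_final=1e-6`.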

IV. RESULTS
The performance of the ParticleNet architecture is evaluated on two representative jet tagging tasks: top tagging and quark-gluon tagging. In this section, we show the benchmark results.

A. Top tagging
Top tagging, i.e., identifying jets originating from hadronically decaying top quarks, is commonly used in searches for new physics at the LHC. We evaluate the performance of the ParticleNet architecture on this task using the top tagging dataset [72], which is an extension of the dataset used in Ref. [46] with some modifications. Jets in this dataset are generated with Pythia8 [73] and passed through Delphes [74] for fast detector simulation. No multiple parton interaction or pileup is included in the simulation. Jets are clustered from the Delphes E-Flow objects with the anti-$k_T$ algorithm [75] using a distance parameter $R = 0.8$. Only jets with transverse momentum $p_T \in [550, 650]$ and pseudorapidity $|\eta| < 2$ are considered. Each signal jet is required to be matched to a hadronically decaying top quark within $\Delta R = 0.8$, and all three quarks from the top decay also within $\Delta R = 0.8$ of the jet axis. The background jets are obtained from a QCD dijet process. This dataset consists of 2 million jets in total, half signal and half background. The official splitting for training (1.2M jets), validation (400k jets) and testing (400k jets) is used in the development of the ParticleNet model for this dataset.

TABLE I: Input variables used in the top tagging task (TOP) and the quark-gluon tagging task (QG) with and without PID information.

| Variable | Definition | TOP | QG | QG-PID |
|---|---|---|---|---|
| $\Delta\eta$ | difference in pseudorapidity between the particle and the jet axis | x | x | x |
| $\Delta\phi$ | difference in azimuthal angle between the particle and the jet axis | x | x | x |
| $\log p_T$ | logarithm of the particle's $p_T$ | x | x | x |
| $\log E$ | logarithm of the particle's energy | x | x | x |
| $\log\frac{p_T}{p_T(\mathrm{jet})}$ | logarithm of the particle's $p_T$ relative to the jet $p_T$ | x | x | x |
| $\log\frac{E}{E(\mathrm{jet})}$ | logarithm of the particle's energy relative to the jet energy | x | x | x |
| $\Delta R$ | angular separation between the particle and the jet axis ($\sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$) | x | x | x |
| $q$ | electric charge of the particle | | | x |
| isElectron | if the particle is an electron | | | x |
| isMuon | if the particle is a muon | | | x |
| isChargedHadron | if the particle is a charged hadron | | | x |
| isNeutralHadron | if the particle is a neutral hadron | | | x |
| isPhoton | if the particle is a photon | | | x |
In this dataset, up to 200 jet constituent particles are stored for each jet. Only kinematic information, i.e., the 4-momentum $(p_x, p_y, p_z, E)$, of each particle is available. The ParticleNet model takes up to 100 constituent particles with the highest $p_T$ for each jet, and uses seven variables derived from the 4-momentum for each particle as inputs, which are listed in Table I. The $(\Delta\eta, \Delta\phi)$ variables are used as coordinates to compute the distances between particles in the first EdgeConv block. They are also used together with the other five variables, $\log p_T$, $\log E$, $\log\frac{p_T}{p_T(\mathrm{jet})}$, $\log\frac{E}{E(\mathrm{jet})}$, and $\Delta R$, to form the input feature vector for each particle.
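The seven kinematic inputs can be derived from the stored 4-momenta along the following lines. This is a sketch under assumptions rather than the preprocessing actually used: the jet 4-momentum is taken as the sum of all stored constituents, the sign conventions of $\Delta\eta$ and $\Delta\phi$ are arbitrary, the toy jet is made of random massless particles, and the function name `particle_features` is hypothetical.

```python
import numpy as np

def particle_features(p4, max_particles=100):
    """p4: (N, 4) array of (px, py, pz, E). Returns (min(N, max_particles), 7) features."""
    px, py, pz, e = p4.T
    jet_px, jet_py, jet_pz, jet_e = p4.sum(axis=0)     # assumption: jet = sum of constituents
    pt = np.hypot(px, py)
    keep = np.argsort(-pt)[:max_particles]             # highest-pT particles first
    px, py, pz, e, pt = px[keep], py[keep], pz[keep], e[keep], pt[keep]
    jet_pt = np.hypot(jet_px, jet_py)
    d_eta = np.arcsinh(pz / pt) - np.arcsinh(jet_pz / jet_pt)
    d_phi = np.arctan2(py, px) - np.arctan2(jet_py, jet_px)
    d_phi = (d_phi + np.pi) % (2 * np.pi) - np.pi      # wrap into [-pi, pi)
    return np.stack([d_eta, d_phi, np.log(pt), np.log(e),
                     np.log(pt / jet_pt), np.log(e / jet_e),
                     np.hypot(d_eta, d_phi)], axis=1)

# Toy jet of 150 massless particles (random stand-in data).
rng = np.random.default_rng(5)
n = 150
pt = rng.exponential(5.0, n) + 0.1
eta = rng.normal(0.0, 0.2, n)
phi = rng.normal(0.0, 0.2, n)
p4 = np.stack([pt * np.cos(phi), pt * np.sin(phi),
               pt * np.sinh(eta), pt * np.cosh(eta)], axis=1)

feats = particle_features(p4)
coords = feats[:, :2]      # (d_eta, d_phi): coordinates for the first EdgeConv block
print(feats.shape, coords.shape)
```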
We compare the performance of ParticleNet with three alternative models [76]:
• ResNeXt-50: The ResNeXt-50 model is a very deep two-dimensional (2D) CNN using jet images as inputs. The ResNeXt architecture [77] was proposed for generic image classification, and we modify it slightly for the jet tagging task. The model is trained on the top tagging dataset starting from randomly initialized weights. The implementation details can be found in Appendix A. Note that the ResNeXt-50 architecture is much deeper and therefore has a much larger capacity than most of the CNN architectures [25,27-35] explored for jet tagging so far, so evaluating its performance on jet tagging will shed light on whether architectures for generic image classification are also applicable to jet images.
• P-CNN: The P-CNN is a 14-layer 1D CNN using particle sequences as inputs. The P-CNN architecture was proposed in the CMS particle-based DNN boosted jet tagger [42] and showed significant improvement in performance compared to a traditional tagger using boosted decision trees and jet-level observables. The model is also trained on the top tagging dataset from scratch, with the implementation details in Appendix B.
• PFN: The Particle Flow Network (PFN) [52] is a recent architecture for jet tagging which also treats a jet as an unordered set of particles, the same as the particle cloud approach in this paper. However, the network is based on the Deep Sets framework [63], which uses global symmetric functions and does not exploit local neighborhood information explicitly as the EdgeConv operation. Since the performance of PFN on this top tagging dataset has already been reported in Ref. [52], we did not reimplement it but just include the results for comparison.
The results are summarized in Table II and also shown in Fig. 3 in terms of receiver operating characteristic (ROC) curves. A number of metrics are used to evaluate the performance, including the accuracy, the area under the ROC curve (AUC), and the background rejection ($1/\varepsilon_b$, i.e., the reciprocal of the background misidentification rate) at a certain signal efficiency ($\varepsilon_s$) of 50% or 30%. The background rejection metric is particularly relevant to physics analysis at the LHC, as it is directly related to the expected contribution of background, and is commonly used to select the best jet tagging algorithm. The ParticleNet model achieves state-of-the-art performance on the top tagging benchmark dataset and improves over previous methods significantly. Its background rejection power at 30% signal efficiency is roughly 1.8 (2.1) times as good as PFN (P-CNN), and about 40% better than ResNeXt-50. Even the ParticleNet-Lite model, with significantly reduced complexity, outperforms all the previous models, achieving about 10% improvement with respect to ResNeXt-50. The large performance improvement of the ParticleNet architecture over the PFN architecture is likely due to a better exploitation of the local neighborhood information with the EdgeConv operation.

TABLE II: Performance comparison on the top tagging benchmark dataset. The ParticleNet, ParticleNet-Lite, P-CNN and ResNeXt-50 models are trained on the top tagging dataset starting from randomly initialized weights. For each model, the training is repeated 9 times using different randomly initialized weights. The table shows the result from the median-accuracy training, and the standard deviation of the 9 trainings is quoted as the uncertainty to assess the stability to random weight initialization. Uncertainties on the accuracy and AUC are negligible and therefore omitted. The performance of PFN on this dataset is reported in Ref. [52], and the uncertainty corresponds to the spread in 10 trainings.
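The background rejection at a fixed signal efficiency can be computed from classifier scores as sketched below. The scores here are random Gaussian stand-ins, not outputs of any of the models compared in this paper, and the function name `background_rejection` and the quantile-based choice of threshold are assumptions.

```python
import numpy as np

def background_rejection(scores, labels, signal_eff):
    """1 / eps_b at the score threshold that keeps a fraction `signal_eff` of signal jets."""
    sig = scores[labels == 1]
    bkg = scores[labels == 0]
    threshold = np.quantile(sig, 1.0 - signal_eff)   # cut that passes signal_eff of the signal
    eps_b = np.mean(bkg > threshold)                 # background misidentification rate
    return np.inf if eps_b == 0 else 1.0 / eps_b

# Toy scores: signal and background drawn from two shifted Gaussians.
rng = np.random.default_rng(7)
n = 20000
scores = np.concatenate([rng.normal(1.0, 1.0, n), rng.normal(-1.0, 1.0, n)])
labels = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])

rej50 = background_rejection(scores, labels, 0.5)
rej30 = background_rejection(scores, labels, 0.3)
print(f"1/eps_b at eps_s = 50%: {rej50:.0f}, at eps_s = 30%: {rej30:.0f}")
```

Tightening the working point from 50% to 30% signal efficiency raises the threshold, so the rejection at 30% is always at least as large as the one at 50%.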

B. Quark-gluon tagging
Another important jet tagging task is quark-gluon tagging, i.e., discriminating jets initiated by quarks and by gluons. The quark-gluon tagging dataset from Ref. [52] is used to evaluate the performance of the ParticleNet architecture on this task. The signal (quark) and background (gluon) jets are generated with Pythia8 using the Z(→ νν) + (u, d, s) and Z(→ νν) + g processes, respectively. No detector simulation is performed. The final state non-neutrino particles are clustered into jets using the anti-$k_T$ algorithm [75] with $R = 0.4$. Only jets with transverse momentum $p_T \in [500, 550]$ and rapidity $|y| < 2$ are considered. This dataset consists of 2 million jets in total, half signal and half background. We follow the recommended splitting of 1.6M/200k/200k for training, validation and testing in the development of the ParticleNet model on this dataset.
One important difference of the quark-gluon tagging dataset is that it includes not only the four-momentum, but also the type of each particle (i.e., electron, photon, pion, etc.). Such particle identification (PID) information can be quite helpful for jet tagging. Therefore, we include this information in the ParticleNet model and compare it with the baseline version using only the kinematic information. The PID information is included in an experimentally realistic way by using only five particle types (electron, muon, charged hadron, neutral hadron, and photon), as well as the electric charge, as inputs. These six additional variables, together with the seven kinematic variables, form the input feature vector of each particle for models with PID information, as shown in Table I.
Table III compares the performance of the ParticleNet model with a number of alternative models introduced in Sec. IV A. Model variants with and without PID inputs are also compared. Note that for the ResNeXt-50 model only the version without PID inputs is presented, as it is based on jet images which cannot incorporate PID information straightforwardly. The corresponding ROC curves are shown in Fig. 4. Overall, the addition of PID inputs has a large impact on the performance, increasing the background rejection power by 10%-15% compared to the same model without using PID information. This clearly demonstrates the advantage of particle-based jet representations, including the particle cloud representation, as they can easily integrate any additional information for each particle. The best performance is obtained by the ParticleNet model with PID inputs, achieving almost 15% improvement on the background rejection power compared to the PFN-Ex (PFN using experimentally realistic PID information) and P-CNN models. The ParticleNet-Lite model achieves the second-best performance and shows about 7% improvement with respect to the PFN-Ex and P-CNN models.

TABLE III: Performance comparison on the quark-gluon tagging benchmark dataset. The ParticleNet, ParticleNet-Lite, P-CNN, and ResNeXt-50 models are trained on the quark-gluon tagging dataset starting from randomly initialized weights. The training is repeated 9 times for the ParticleNet model using different randomly initialized weights. The table shows the result from the median-accuracy training, and the standard deviation of the 9 trainings is quoted as the uncertainty to assess the stability to random weight initialization. Because of limited computational resources, the training of other models is performed only once, but the uncertainty due to random weight initialization is expected to be fairly small. The performance of PFN on this dataset is reported in Ref. [52], and the uncertainty corresponds to the spread in ten trainings. Note that a number of PFN models with different levels of PID information are investigated in Ref. [52], and "PFN-Ex", also using experimentally realistic PID information, is shown here for comparison.
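The six PID inputs of Table I (the electric charge and five particle-type flags) can be built from a particle-type code as sketched below. The use of PDG Monte Carlo codes as the input encoding, the restriction to the particle species listed in `CHARGE_OF_PARTICLE`, and the function name `pid_features` are assumptions made for illustration; the grouping into five experimentally realistic classes follows the text.

```python
# Charge of the particle (positive PDG code); antiparticles carry the opposite sign.
CHARGE_OF_PARTICLE = {11: -1, 13: -1,            # electron, muon
                      211: 1, 321: 1, 2212: 1,   # charged pion, charged kaon, proton
                      22: 0, 130: 0, 2112: 0}    # photon, K-long, neutron

def pid_features(pdg_id):
    """Return [q, isElectron, isMuon, isChargedHadron, isNeutralHadron, isPhoton]."""
    code = abs(pdg_id)
    charge = CHARGE_OF_PARTICLE[code] * (1 if pdg_id > 0 else -1)
    flags = [code == 11,
             code == 13,
             code in (211, 321, 2212),
             code in (130, 2112),
             code == 22]
    return [float(charge)] + [float(flag) for flag in flags]

for pdg_id in (-11, 211, 2112, 22):
    print(pdg_id, pid_features(pdg_id))
```

These six values would be appended to the seven kinematic variables to form a 13-dimensional input vector per particle.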

V. MODEL COMPLEXITY
Another aspect of machine-learning models is the complexity, e.g., the number of parameters and the computational cost. Table IV compares the number of parameters and the computational cost of all the models used in the top tagging task in Sec. IV A. The computational cost is evaluated using the inference time per object, which is a more relevant metric than the training time for real-life applications of machine-learning models. The inference time of each model is measured on both the CPU and the GPU, using the implementations with Apache MXNet. For the CPU, to mimic the event processing workflow typically used in collider experiments, a batch size of 1 is used, and the inference is performed in single-thread mode. For the GPU, a batch size of 100 is used instead, as the full power of the GPU cannot be revealed with a very small batch size (e.g., 1) due to the overhead in data transfer between the CPU and the GPU. The ParticleNet model achieves the best classification performance at the cost of speed, being more than an order of magnitude slower than the PFN and the P-CNN models, but still it is not prohibitively slow even on the CPU. In addition, the current implementation of the EdgeConv operation used in the ParticleNet model is not as optimized as the regular convolution operation; therefore, further speed-up is expected from an optimized implementation of EdgeConv. On the other hand, the ParticleNet-Lite model provides a good balance between speed and performance, showing more than 40% improvement in performance while being only a few times slower than the PFN and P-CNN models. Notably, it is also the most economical model, outperforming all previous approaches with only 26k parameters, thanks to the effective exploitation of the permutation symmetry of the particle clouds. Overall, PFN is the fastest model on both the CPU and the GPU, making it a suitable choice for extremely time-critical tasks.

VI. CONCLUSION
In this paper, we present a new approach for machine learning on jets. The core of this approach is to treat jets as particle clouds, i.e., unordered sets of particles. Based on this particle cloud representation, we introduce ParticleNet, a network architecture tailored to jet tagging tasks. The performance of the ParticleNet architecture is compared with alternative deep-learning architectures, including the jet image-based ResNeXt-50 model, the particle sequence-based P-CNN model and the particle set-based PFN model. On both the top tagging and the quark-gluon tagging benchmarks, ParticleNet achieves state-of-the-art performance and improves significantly over existing methods. Although the very deep image-based ResNeXt-50 model also shows significant performance improvement over shallower models like P-CNN and PFN on the top-tagging benchmark, indicating that deeper architectures can generally lead to better performance, the gain with the ParticleNet architecture is more substantial. Moreover, the high performance is achieved in a very economical way, as the number of trainable parameters is a factor of 4 (56) lower in ParticleNet (ParticleNet-Lite) compared to ResNeXt-50. Such lightweight models are particularly useful for applications in high-energy physics experiments, especially for online event processing in which low latency and memory consumption are critical.
While we only demonstrate the power of the particle cloud representation in jet tagging tasks, we think that it is a natural and generic way of representing jets (and even the whole collision event) and can be applied to a broad range of particle physics problems. Applications of the particle cloud approach to, e.g., pileup identification, jet grooming, jet energy calibration, etc., would be particularly interesting and worth further investigation.