CaloGAN: Simulating 3D High Energy Particle Showers in Multi-Layer Electromagnetic Calorimeters with Generative Adversarial Networks

The precise modeling of subatomic particle interactions and propagation through matter is paramount for the advancement of nuclear and particle physics searches and precision measurements. The most computationally expensive step in the simulation pipeline of a typical experiment at the Large Hadron Collider (LHC) is the detailed modeling of the full complexity of physics processes that govern the motion and evolution of particle showers inside calorimeters. We introduce \textsc{CaloGAN}, a new fast simulation technique based on generative adversarial networks (GANs). We apply these neural networks to the modeling of electromagnetic showers in a longitudinally segmented calorimeter, and achieve speedup factors comparable to or better than existing full simulation techniques on CPU ($100\times$-$1000\times$) and even faster on GPU (up to $\sim10^5\times$). There are still challenges for achieving precision across the entire phase space, but our solution can reproduce a variety of geometric shower shape properties of photons, positrons and charged pions. This represents a significant stepping stone toward a full neural network-based detector simulation that could save significant computing time and enable many analyses now and in the future.


I. INTRODUCTION
The physics programs of all experiments based at the LHC rely heavily on detailed simulation for all aspects of event reconstruction and data analysis. Simulated particle collisions, decays, and material interactions are used to interpret the results of ongoing experiments and estimate the performance of new ones, including detector upgrades.
State-of-the-art simulations are able to precisely model detector geometries and physical processes spanning distance scales as small as $10^{-20}$ m for the initial parton-parton scattering, all the way to the material interactions at meter length scales. These processes, which include nuclear and atomic interactions, such as ionization, as well as strong, weak, and electromagnetic processes, will alter the state of incoming particles as they propagate through and interact with layers of material in the various detector components. Detection techniques such as calorimetry exploit these physical interactions to detect the presence and measure the energy of particles such as photons, electrons and hadrons via their interactions with hundreds of thousands of detector components. Upon interaction with a calorimeter, a cascade (shower) of secondary particles is produced and their energy is collected and transformed into electric signals.
Physics-based (full simulation) modeling of particle showers in calorimeters (with Geant4 [1] as the state of the art) is the most computationally demanding part of the whole simulation process, and can take minutes per event on modern, distributed high performance platforms [2,3]. The production of physics results is often limited by the absence of adequate Monte Carlo (MC) simulation, and the increase in luminosity at the LHC will only exacerbate the problem. For example, the ATLAS and CMS experiments at the high-luminosity phase of the LHC (HL-LHC) will each see about 3 billion top quark pair events [4][5][6][7][8][9][10]; for an MC statistical uncertainty that is significantly below the data uncertainty, hundreds of billions of simulated events would be required. This is not possible using full detector simulation techniques with existing computing resources. Currently, full MC simulation occupies 50-70% of the experiments' worldwide computing resources, equivalent to billions of CPU hours per year [11][12][13].
The relevance of the calorimeter simulation step has sparked the development of approximate, fast simulation solutions to mitigate its computational complexity. Fast simulation techniques rely on parametrized showers [14][15][16] for fluctuations, and look-up tables for low energy interactions [17]. For many applications, these techniques are sufficient. However, analyses that utilize the detailed structure of showers for particle identification as well as energy and direction calibration may not be able to rely on these simplified approaches [18].
We introduce a Deep Learning model to enable high-fidelity fast simulation of particle showers in electromagnetic calorimeters. Previous work [19] assessed the viability of GAN-based simulation of jet-images [20] (sparse, structured, 2D representations of jet fragmentation analogous to a single-layer, idealized calorimeter) and focused on providing architectural guidelines for this regime. Neural network-based generation, including GANs, Variational Auto-Encoders [21], and Adversarial Auto-Encoders [22], has also been tested in other areas of science, such as Cosmology [23,24], Condensed Matter Physics [25], and Oncology [26]. The longitudinally segmented calorimeter simulation addressed in this work offers unique challenges due to the sparsity of hit cells, the non-uniform granularity among the detector layers, and their sequential structure. In addition to enabling physics analyses at the LHC, the CaloGAN may form a base for solving similar computationally intensive modeling problems in other domains of science, medicine, and technology.
The paper is organized as follows. Section II introduces the dataset of calorimeter showers and Sec. III briefly reviews the generic GAN setup. The CaloGAN is described in Sec. IV and first results of its performance are documented in Sec. V. The paper ends with conclusions and future outlook in Sec. VI.

II. DATASET
A detector simulation begins with a list of particles with lifetimes greater than O(mm/c). For each particle, we are given its type (e.g. electron, pion, etc.), its energy, and its direction. The particle type determines when and how the particle interacts with the material along its trajectory. Material interactions with the detector factorize [27]: the energy deposited in a calorimeter by various particles is the sum of the energy from each shower treated independently.
There are two flavors of calorimeters: electromagnetic and hadronic. Electromagnetic calorimeters are designed to stop electrons and photons, which have shallower and narrower showers compared with protons, neutrons, and charged pions. Hadronic calorimeters are thicker and deeper in order to capture penetrating radiation that forms irregular showers from nuclear interactions. In this first application of GANs to a longitudinally segmented calorimeter, we choose to focus only on electromagnetic showers. In addition to already providing the capability to simulate electrons and photons, the electromagnetic shower contains all of the new challenges described in Sec. I.
Transverse segmentation is critical for particle identification and energy calibration in an electromagnetic calorimeter. For example, the radiation pattern can be used to distinguish prompt photons from $\pi^0 \to \gamma\gamma$, where the distance between the two photons is O(cm) for a 10 GeV $\pi^0$ at one meter from the interaction point. Pion rejection and an excellent resolution for photons in the Higgs boson $H \to \gamma\gamma$ discovery channel were driving factors for the design of the ATLAS Liquid Argon (LAr) electromagnetic calorimeter [28], which serves as an inspiration for the calorimeter used in this study. In particular, the calorimeter used in this study is a cube of side length 480 mm with no material in front of it. There are three instrumented layers in the radial (z) direction [29] with thicknesses 90 mm, 347 mm, and 43 mm. The active material is LAr and the absorber material is lead. Only the total energy per layer, which includes both the active and inactive contributions, is used in what follows.
In contrast to the complex accordion geometry in the actual ATLAS calorimeter, our simplified setup (built on the Geant4 B4 example) uses flat alternating layers of lead and LAr that are 2 mm and 4 mm thick, respectively. Each of the three layers has a different segmentation, which is also not square in the first and third layers. In particular, the cells in the first layer are 160 × 5 mm$^2$, the cells in the second layer are 40 × 40 mm$^2$, and the cells in the third layer are 40 × 80 mm$^2$. The short direction in the first layer (η) corresponds to what would be the pp beam direction in a full experiment. In contrast, the short direction in the third layer (φ) is perpendicular to η. Table I summarizes the calorimeter geometry.
The training data set [30] is prepared as follows. Geant4 10.2.0 [1] is used to generate particles and simulate their interaction with our calorimeter using the FTFP_BERT physics list based on the Fritiof [31][32][33][34] and Bertini intra-nuclear cascade [35][36][37] models with the standard electromagnetic physics package [38]. Positrons, photons, and charged pions with various energies are incident perpendicularly on the center of the calorimeter front. Energies in the training are uniform in the range between 1 GeV and 100 GeV. Fig. 1 shows an example 10 GeV electron event with the exact energy deposits from Geant4 (Fig. 1(a)) and after discretizing them according to our calorimeter geometry (Fig. 1(b)). For visualization purposes, a 3-dimensional particle energy signature (Fig. 2) will be displayed in the rest of this paper as a series of three 2D images in η-φ space (Fig. 3), where the pixel intensity represents the sum of the energies of all particles incident to that cell [39]. The first layer can be represented as a 3 × 96 image, the middle layer as a 12 × 12 image, and the last layer as a 12 × 6 image.
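The discretization step described above can be sketched as follows. This is a minimal illustration and not the code used to build the dataset: the function name `discretize`, the hit layout (η, φ, z, energy), the choice of coordinate origin at one corner of the front face, and the (φ, η) axis order of the images are all assumptions made for the example.

```python
import numpy as np

SIDE_MM = 480.0                               # transverse size of the calorimeter
Z_EDGES_MM = (0.0, 90.0, 437.0, 480.0)        # layer boundaries: 90 + 347 + 43 mm
IMAGE_SHAPES = ((3, 96), (12, 12), (12, 6))   # assumed (n_phi, n_eta) per layer


def discretize(hits):
    """Bin hits with columns (eta_mm, phi_mm, z_mm, energy) into three images.

    Coordinates are assumed to be measured from one corner of the front face,
    so eta, phi and z all lie in [0, 480).
    """
    hits = np.asarray(hits, dtype=float)
    images = []
    for layer, (n_phi, n_eta) in enumerate(IMAGE_SHAPES):
        z_lo, z_hi = Z_EDGES_MM[layer], Z_EDGES_MM[layer + 1]
        sel = hits[(hits[:, 2] >= z_lo) & (hits[:, 2] < z_hi)]
        # Sum the energy of all hits that fall into the same cell.
        image, _, _ = np.histogram2d(
            sel[:, 1], sel[:, 0],
            bins=(n_phi, n_eta),
            range=((0.0, SIDE_MM), (0.0, SIDE_MM)),
            weights=sel[:, 3],
        )
        images.append(image)
    return images
```

With these bin counts the cell sizes come out as 160 × 5 mm$^2$, 40 × 40 mm$^2$, and 40 × 80 mm$^2$, matching the segmentation quoted in the text.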

III. GENERATIVE ADVERSARIAL NETWORKS
Since their first formulation [40], Generative Adversarial Networks have become a rapidly increasing area of attention in the Machine Learning literature with many applications in natural image processing. However, there are far fewer applications in basic science and prior to this work, no applications in high energy physics and nuclear physics.
Generative Adversarial Networks (GANs) cast the task of training a deep generative model as a two-player noncooperative minimax game, in which a generator network $G$ is trained concomitantly with an adversary, the discriminator network $D$, in order to learn a target distribution $f$. The generator $G$ learns a map from a latent space $z \sim p_z(z)$ (usually chosen to be $\mathcal{N}(0, 1)$) to the space of generated samples, while $D$ learns a map from the sample space to $[0, 1]$, the probability that a shown sample is real. Note that the map that the generator learns implicitly defines a density $g$. The game-theoretical basis for this framework [40,41] ensures that if we extend the space of allowed functions that $G$ and $D$ can draw from to be the space of all continuous functions, then there exists some $G$ (and, by construction, an implicit $g$) that exactly recovers the target distribution $f$, i.e., $g \to f$, while for every sample produced by the generator, the discriminator is maximally confused and admits a posterior of being real of 1/2. In order to train both $G$ and $D$, the traditional formulation of GANs [40] utilizes the loss function $L_{adv}$ shown in Eqn. 1.
$$L_{adv} = \underbrace{\mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]}_{\text{term associated with the discriminator perceiving a generated sample as fake}} + \underbrace{\mathbb{E}_{I \sim f}\left[\log D(I)\right]}_{\text{term associated with the discriminator perceiving a real sample as real}} \qquad (1)$$
Though the GAN framework has shown promise, stability is still a major roadblock, and various ad-hoc and theoretical improvements have been suggested, from architectural guidelines [42][43][44][45][46] to reformulations of the loss specified in Eqn. 1 to move away from the Jensen-Shannon divergence [47][48][49][50][51][52]. As suggested in [53], we are able to impose task-specific metrics which allow us to move away from loss-level notions of quality and focus on task-level fidelity measures. We make the conscious decision to utilize the vanilla loss formulation as we find adequate performance with this version.
TABLE I: Dimensions of a calorimeter cell. The z direction is the direction of particle propagation (radial direction in a full experiment), the η direction would be along the pp beam axis in a full experiment, and φ is perpendicular to z and η.
Layer   z length [mm]   η length [mm]   φ length [mm]
0       90              5               160
1       347             40              40
2       43              80              40
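As a small numerical illustration of the adversarial loss, the sketch below estimates $L_{adv}$ on a minibatch of discriminator outputs. It assumes the standard form from [40], with $D$ returning the probability that a sample is real; the function name `adversarial_loss` and the clipping constant `eps` are choices made for this example only.

```python
import numpy as np


def adversarial_loss(d_fake, d_real, eps=1e-12):
    """Minibatch estimate of E[log(1 - D(G(z)))] + E[log D(I)].

    d_fake: discriminator outputs on generated samples, in [0, 1]
    d_real: discriminator outputs on real samples, in [0, 1]
    """
    # Clip to avoid log(0) for saturated discriminator outputs.
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    fake_term = np.mean(np.log(1.0 - d_fake))  # generated sample seen as fake
    real_term = np.mean(np.log(d_real))        # real sample seen as real
    return fake_term + real_term
```

For a maximally confused discriminator that outputs 1/2 everywhere, this evaluates to $2\log(1/2) = -\log 4$, the value at the equilibrium discussed above.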

IV. THE CALOGAN
Generative Adversarial Networks are explored as a tool to speed up full simulation of particle showers in an EM calorimeter. We identify this solution with the name CaloGAN.
For it to be useful in realistic physics applications, such a system needs to be able to accept requests for the generation of showers originating from an incoming particle of type P at energy E [54]. We introduce an auxiliary task of energy reconstruction to condition on E, a real valued variable. The Auxiliary Classifier GAN [43] formalism is tested to also condition on class P , but ultimately abandoned in favor of training a specific generative model for each particle type, as the authors expect that versioning and particle-specific improvements will be prioritized in any practical implementation.
In practice, the energy is scaled by a factor of $10^2$ and multiplied with the 1024-dimensional [55] latent space vector $z \in \mathbb{R}^{1024}$. The generator G then maps this input to three gray-scale image outputs with different numbers of pixels, which represent the energy patterns collected by the three calorimeter layers as the requested particle propagates through them. The discriminator D accepts the three images as inputs, along with E, the chosen value for the particle energy. The inputs are mapped to a binary output that classifies showers into real and fake, and a continuous output which calculates the total energy deposited in the three layers, then compares it with the requested energy E.
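The conditioning on energy can be illustrated with a short sketch. The direction of the scaling is an assumption here: the example divides the requested energy in GeV by $10^2$ before multiplying it into the latent vector, and the name `conditioned_latent` is hypothetical.

```python
import numpy as np

LATENT_DIM = 1024
ENERGY_SCALE = 100.0  # assumption: E in GeV is divided by 10^2 before use


def conditioned_latent(energies_gev, rng):
    """Return one energy-scaled latent vector per requested energy."""
    energies = np.asarray(energies_gev, dtype=float).reshape(-1, 1)
    z = rng.standard_normal((energies.shape[0], LATENT_DIM))  # z ~ N(0, 1)
    return z * (energies / ENERGY_SCALE)
```

Under this convention a 100 GeV request leaves the latent vector at unit scale, while a 1 GeV request shrinks it by a factor of 100, so the requested energy is encoded in the overall magnitude of the generator input.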

A. Model Architecture
Given the sparsity levels and high dynamic range in the data described in Section II, we follow the LAGAN guidelines [19] to modify the DCGAN [42] architecture for this specific regime.
In the generator (shown in Fig. 4), our design combines parallel LAGAN-like processing streams with a trainable attention mechanism that encodes the sequential connection among calorimeter layers. The LAGAN submodules are composed of a 2D convolutional unit followed by two locally-connected units with batch-normalization [56] layers in between. The dimensionality and granularity mismatch among the three longitudinal segmentations of the detector demand separate streams of operations with suitably sized kernels. Towards providing a readily adaptable tool, we provide an architecture construction that is simply a function of the desired output image size, as we seek a common denominator that can be readily applied to a variety of particles in order to obtain reasonable baselines in a quick R&D cycle.
Modelling the sequential nature of the relationship among the energy patterns collected by the three layers requires extra care. Drawing inspiration from [46], we choose an attention mechanism to allow dependence among layers, in which we define trainable transfer functions to optimally resize and apply knowledge of the energy pattern in previous layers to the generation of the subsequent layer readout. More specifically, in-painting takes as input a resized image from a previous layer, $I'$, and the hypothesized image from the current layer, $I$, and learns a per-pixel attention weight $W$ via a weighted sum $I \leftarrow W \odot I' + (1 - W) \odot I$, where $\odot$ is the Hadamard product. This end-to-end trainable unit can utilize information about the two layers to decide what information to propagate through from the previous particle deposition. An alternative architectural choice that includes a recurrent connection will be the subject of future studies.
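A minimal sketch of such an in-painting step is given below. It is only illustrative: the blend $W \odot I' + (1 - W) \odot I$ with a sigmoid-bounded weight is an assumed form consistent with the Hadamard product mentioned in the text, the fixed nearest-neighbour resize stands in for the trainable transfer function, and the names `resize_nearest` and `inpaint` are hypothetical.

```python
import numpy as np


def resize_nearest(image, shape):
    """Nearest-neighbour resize of a 2D image to the requested shape."""
    rows = np.arange(shape[0]) * image.shape[0] // shape[0]
    cols = np.arange(shape[1]) * image.shape[1] // shape[1]
    return image[np.ix_(rows, cols)]


def inpaint(previous, current, logits):
    """Blend a resized previous-layer image into the current-layer image.

    logits has the shape of `current`; in a trained model it would be
    produced by learned parameters, here it is simply passed in.
    """
    resized = resize_nearest(previous, current.shape)
    weight = 1.0 / (1.0 + np.exp(-logits))           # per-pixel weight in (0, 1)
    return weight * resized + (1.0 - weight) * current  # element-wise products
```

Large positive logits copy the previous layer into the current one, large negative logits ignore it, and intermediate values mix the two pixel by pixel.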
Leaky Rectified Linear Units [57] are chosen as activation functions throughout the system, with the exception of the output layers of G, in which we prefer Rectified Linear Units [58] for the creation of sparse samples [19].
In the discriminator (shown in Fig. 5), the feature space produced by each LAGAN-style output stream is augmented with a sub-differentiable version of sparsity percentage [59], as well as minibatch discrimination [48] on both the standard locally connected network-produced features and the output sparsity itself, to ensure a well examined space of sparsities. These are represented in Fig. 5 by the 'features' vector. The discriminator is further customized with domain-specific features to ensure fidelity of samples. Given the importance of matching the requested energy $E$, $D$ directly calculates the empirical energy per layer $\hat{E}_i$, $i \in \{0, 1, 2\}$, as well as the total energy $\hat{E}_{tot}$. Minibatch discrimination is performed on this vector of per-layer energies to ensure a proper distributional understanding. We also add $|E - \hat{E}_{tot}|$ as a feature, as well as $\mathbb{I}_{\{|E - \hat{E}_{tot}| > \epsilon\}}$ with $\epsilon = 5$ GeV, a binary, sub-differentiable feature which encodes the tolerance for GAN-produced scatterings to be incorrect in their reconstructed energy.
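The domain-specific features can be sketched for a single shower as follows. This is a plain NumPy illustration rather than the differentiable version used inside a network: the sparsity is computed with a hard threshold, pixel intensities are assumed to be in GeV, "sparsity" is taken to mean the fraction of cells with a non-zero deposit, and the function name `energy_features` is hypothetical.

```python
import numpy as np


def energy_features(layers, requested_e, tol=5.0):
    """Hand-built features for one shower given as three 2D arrays (GeV)."""
    per_layer = np.array([layer.sum() for layer in layers])   # E_hat_i
    total = per_layer.sum()                                   # E_hat_tot
    # Assumed convention: fraction of cells with a non-zero deposit.
    sparsity = np.array([np.mean(layer > 0.0) for layer in layers])
    abs_err = abs(requested_e - total)                        # |E - E_hat_tot|
    return {
        "per_layer_energy": per_layer,
        "total_energy": total,
        "sparsity": sparsity,
        "abs_error": abs_err,
        "out_of_tolerance": float(abs_err > tol),             # indicator feature
    }
```

In a real discriminator these quantities would be computed on batches of tensors and concatenated with the learned features before the final classification layer.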
Further specifications of the exact hyper-parameter and architectural choices as well as software versioning constraints are available in the source code [60].
Two additional architectural modifications were tested in order to build a particle-type conditioning system directly into the learning process. Neither the AC-GAN [43] nor the conditional GAN [44] framework was able to handle the substantial differences among the three particle types. We suspect that both a significantly richer model and a larger latent space could alleviate some problems associated with conditioning using the investigated approaches. Although building a fully joint model is an interesting Machine Learning challenge, the practicality and flexibility of this application may suffer from having one single model for all particle showers.

B. Loss Formulation
In this work, we augment the classical adversarial loss term $L_{adv}$ (Eqn. 1), which penalizes the system whenever D fails to classify samples originating from generated or target distributions, with a mean absolute error term:
$$L_E = \mathbb{E}\left[\delta(E, \hat{E})\right] \qquad (2)$$
where $\delta(e, e') = |e - e'|$, $E$ is the requested energy, and $\hat{E}$ is the reconstructed energy. This allows us to penalize instances of too little or too much deposited energy. This solution not only helps ensure the confinement of the generated energy to a desirable range, but also allows us to encode a 'soft' physical notion of conservation of energy, according to which no more energy than the initial $E$ of the incoming particle can be physically collected by the detector.
Note, however, that this formulation discourages, but does not forbid, a deposition of more energy than was requested. We can remedy this unphysical result by sampling from the conditional distribution until energy conservation is satisfied. This issue is further addressed in Sec. V B.
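The energy term and the resampling remedy can be sketched as below. This is a hedged illustration: `generate` is a hypothetical callable standing in for a trained generator that returns three layer images for a requested energy, and the cap `max_tries` is an addition for safety that is not part of the text.

```python
import numpy as np


def energy_loss(requested, reconstructed):
    """Mean absolute error between requested and reconstructed energies."""
    requested = np.asarray(requested, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return np.mean(np.abs(requested - reconstructed))


def sample_physical(generate, requested_e, rng, max_tries=1000):
    """Redraw showers until the deposited energy does not exceed requested_e."""
    for _ in range(max_tries):
        layers = generate(requested_e, rng)
        if sum(layer.sum() for layer in layers) <= requested_e:
            return layers
    raise RuntimeError("no physical shower found within max_tries")
```

If the generated energy distribution is roughly symmetric around the requested value, about half of the draws would be rejected, so the expected cost of this filter is a small constant factor in generation time.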
During training, the generator will maximize Eqn. 3, and the discriminator will maximize Eqn. 4.

C. Training Strategy
The weight $\lambda_E$ on $L_E$ is set to 0.05 to down-weight the importance of $L_E$ compared to $L_{adv}$ and rescale the absolute error, which is measured in GeV, to a comparable range with respect to $L_{adv}$. This hyper-parameter can be tuned in a systematic way, but with minimal tuning, we were able to find a reasonable value.
The weights in the generator and discriminator are optimized in an alternating fashion over a set of 100,000 Geant4-simulated events for each particle type in batches of 256, using the Adam optimizer [61]. The discriminator has a learning rate of $2 \times 10^{-5}$, and the generator has a learning rate of $2 \times 10^{-4}$. We note that outside of initial rough hyper-parameter tuning, we perform no dedicated optimization per particle type, and simply apply the same training parameters to all three networks. We expect significant performance improvements (especially for pions) with dedicated training.
Each system is trained for 50 epochs. Sixteen NVIDIA K80 graphics cards are used for initial hyper-parameter sweeps, with two Titan X Pascal Architecture cards used for final training. Keras v2.0.3 [62] is used to construct all models, with the TensorFlow v1.1.0 backend [63].
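For orientation, the training schedule implied by these numbers can be worked out in a few lines. The constant names are illustrative, and the step count assumes that the final partial batch of each epoch is kept, which the text does not specify.

```python
import math

N_EVENTS = 100_000        # Geant4 showers per particle type
BATCH_SIZE = 256
N_EPOCHS = 50
LR_DISCRIMINATOR = 2e-5
LR_GENERATOR = 2e-4

# Assumption: the last, smaller batch of each epoch is not dropped.
steps_per_epoch = math.ceil(N_EVENTS / BATCH_SIZE)
total_steps = steps_per_epoch * N_EPOCHS
lr_ratio = LR_GENERATOR / LR_DISCRIMINATOR

print(steps_per_epoch, total_steps, round(lr_ratio))
```

Under these assumptions each network sees 391 alternating updates per epoch and 19,550 in total, with the generator stepping ten times more aggressively than the discriminator.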

V. PERFORMANCE
As discussed in [53], there exist several methods to qualitatively and quantitatively assess the performance of generative networks, but not all evaluation criteria are equally suitable and reliable for all applications. In this paper, we choose application-driven methods focused on sample quality. A first qualitative assessment will be accompanied by a quantitative evaluation based on physics-driven similarity metrics. The choice reflects the domain-specific procedure for data-simulation comparison. These similarity metrics are based on one-dimensional statistics of the shower probability distribution. Visualizing and verifying the performance in higher dimensions is a challenge. One way to probe the modeling of higher dimensions is to study the ability to classify showers from different particles. This is studied in Sec. V C.

A. Qualitative Assessment
We first examine the average calorimeter deposition per voxel (a volumetric pixel). On average, the systems learn a complete picture of the underlying physical processes governing the cascades of $e^+$, $\gamma$, and $\pi^+$ with uniform energy between 1 GeV and 100 GeV (Figs. 6, 7, and 8).
Diversity and overtraining concerns can be investigated by considering the nearest neighbors among the training and generated datasets. Figs. 9, 10, and 11 show five randomly selected events and their GAN-generated nearest neighbors for all three calorimeter layers for $e^+$, $\gamma$ and $\pi^+$ showers, respectively. Good qualitative agreement can be found between the two distributions across all layers, without obvious signs of mode collapse, a failure mode in which the generator learns to produce only a small subset of samples from the distribution. Compared to the other two particle types explored in this application, at the individual image level, charged pions clearly display a higher degree of complexity and diversity in their showers. Some $\pi^+$ deposit energy in all cells of a given layer, while others hit only a handful of them. This is because charged pions undergo nuclear interactions in addition to electromagnetic interactions.
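A nearest-neighbor search of this kind can be sketched as follows. The distance metric is not specified in the text; the example assumes Euclidean distance on the concatenated pixel intensities of the three layers, and the function names are hypothetical.

```python
import numpy as np


def flatten_showers(showers):
    """Concatenate the three layer images of each shower into one vector."""
    return np.stack(
        [np.concatenate([layer.ravel() for layer in s]) for s in showers]
    )


def nearest_generated(reference, generated):
    """For each reference shower, index of the closest generated shower."""
    ref = flatten_showers(reference)
    gen = flatten_showers(generated)
    # Squared Euclidean distances, shape (n_reference, n_generated).
    d2 = ((ref[:, None, :] - gen[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```

The pairwise distance array grows as the product of the two sample sizes times the 504 pixels per shower, so for large datasets the search would need to be done in chunks or with a dedicated neighbor index.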

B. Shower Shapes
Electron and photon classification and energy calibration use properties of the calorimeter shower [64][65][66][67]. These same features can be used to quantitatively assess the quality of the GAN samples. The list of features used for evaluation is provided in Table IV in Appendix A.
The key physical quantity that governs the shapes of these distributions is the number of radiation lengths $X_0$ that are traversed by the particle. By definition, $X_0$ is the distance an electron will travel before its energy is reduced to a fraction 1/e of its initial value on average. The equivalent distance for photons is slightly longer (by a factor of 9/7 [68]) and is set by the mean free path for pair production. The transverse shower size is also proportional to $X_0$. For a brief review, see e.g. [68].
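These two definitions translate directly into exponential laws, sketched here for concreteness. Depths are expressed in units of $X_0$; the function names are illustrative, and the formulas cover only the average bremsstrahlung energy loss and the pair-production mean free path, not the full shower development.

```python
import math


def mean_electron_energy(e0, depth_x0):
    """Mean electron energy left after depth_x0 radiation lengths."""
    return e0 * math.exp(-depth_x0)


def photon_survival_probability(depth_x0):
    """Probability that a photon has not pair-produced after depth_x0.

    The mean free path for pair production is (9/7) X0.
    """
    return math.exp(-7.0 * depth_x0 / 9.0)
```

For example, an electron retains on average a fraction 1/e of its energy after one radiation length, while a photon needs 9/7 radiation lengths for its survival probability to drop to the same value.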
The 1-dimensional distributions for Geant4- and GAN-generated samples are available in Fig. 12. Although the sparsity levels per layer are only roughly matched, note that, for the majority of the remaining variables, the GAN picks up on complex features in the distributions across several orders of magnitude and all particle types. The unique features that pions exhibit, compared to the other particles, make it unfavorable to train a single model for multiple particle types.
Note that shower shape variables were not explicitly part of the training, which is based only on the distribution of pixel intensities and energy. In the future, one can integrate the shower shape distributions into the loss function itself. For now, we have left them out for a comprehensive validation assessment. In addition to comparing shower shapes to reference distributions, we want to measure the quality of conditioning on energy. As outlined in Sec. IV B, we cannot explicitly impose conservation of energy, but one can devise a simple sampling system to only keep simulated showers that obey this constraint.
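To make the notion of a shower shape variable concrete, the sketch below computes two simple examples from the three layer images: the fraction of energy per layer and an energy-weighted lateral width in units of cell index. These are illustrative definitions chosen for this example and are not claimed to reproduce the exact list in Table IV; both functions assume a shower with non-zero deposited energy.

```python
import numpy as np


def layer_fractions(layers):
    """Fraction of the total deposited energy found in each layer."""
    energies = np.array([layer.sum() for layer in layers], dtype=float)
    return energies / energies.sum()


def lateral_width(image, axis):
    """Energy-weighted standard deviation of the cell index along one axis."""
    profile = image.sum(axis=1 - axis)   # collapse the other axis
    idx = np.arange(profile.size)
    mean = np.average(idx, weights=profile)
    return np.sqrt(np.average((idx - mean) ** 2, weights=profile))
```

Histogramming such quantities separately for Geant4 and generated showers gives one-dimensional comparisons of the kind discussed in this section.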
As can be noted in the $e^+$ example in Fig. 13, our loss formulation coupled with the uniform training distribution admits an approximately symmetric conditional output energy distribution. In Fig. 13, note that the vertical lines that approximately coincide with the mode of each distribution represent the requested energy, and could easily be used as a threshold on selecting physical events. A noteworthy feature of this system is that one can request energies that lie outside the trained region (capped at 100 GeV in this application), to which a trained CaloGAN will return samples around the requested energy level, though with broader width and with the mode shifted towards the training domain. Whether or not these extrapolated samples obey shower shape distributions and other metrics is left as future work.

C. Classification as a Performance Proxy
Transferability of classification performance from GAN-generated samples to Geant4-generated samples can be used as a proxy both for CaloGAN image quality and potential utility in a practical fast simulation setting.
We perform ten identical trainings of simple six-layer fully-connected $e$-$\gamma$ and $e$-$\pi$ classifiers, and we report the accuracies for in-domain and out-of-domain testing (Table II) along with the following observations:
• when training on Geant4, testing on the generated CaloGAN data set yields similar results to testing on a separate Geant4 data set, leading us to believe that the GAN has learned most of the discriminating physics between the classes of particles. Note that percent-level differences in accuracy may however be relevant for particular applications;
• the significantly higher performance obtained on the CaloGAN-generated test set when training on a separate dataset of CaloGAN-generated images highlights a greater inter-class differentiation in the GAN synthetic dataset than originally present in the target Geant4 distribution.
This could either be due to new unphysical, class-dependent features produced by the GAN, or to the inability of the GAN to cover the entire feature space for at least one of the particle classes. It is likely that both of these contribute. To some extent, unphysical features are mitigated by the discriminator network of the GAN training itself, but both physical and unphysical features that are not very useful for distinguishing real from fake could turn into very useful features for the two-particle classification case. Such information would therefore appear discriminative in GAN images but not in Geant4. While classification is a useful metric for probing the high-dimensional feature space and shows promising results, there are still challenges for interpreting and improving upon the outcome.

D. Computational Performance
In addition to the promise of being a high-fidelity fast simulation paradigm and respecting many shower shape variables, the CaloGAN affords many orders of magnitude in computational speedups [69]. We benchmark generation time on $e^+$ with incident energy drawn uniformly between 1 GeV and 100 GeV. Geant4 and CaloGAN on CPU are benchmarked on nearly identical compute nodes on the PDSF distributed cluster at the National Energy Research Scientific Computing Center (NERSC), and numerical results are obtained over an average of 100 runs. CaloGAN on GPU hardware is benchmarked on an Amazon Web Services (AWS) p2.8xlarge instance, where a single NVIDIA K80 is used for the purposes of benchmarking.
In Table III, we show the time-to-generate a single particle shower in milliseconds. We provide different batch sizes for CaloGAN, as we expect different use-cases will have different demands around batching computation. We note that a batch can accept any number of different requested energies. With the largest batch sizes on GPU, our method admits a speedup of 5 orders of magnitude compared to the single-threaded Geant4 benchmark. In addition, generation time with Geant4 scales with incident energy, whereas computational time is flat as a function of incident energy for the CaloGAN.
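A benchmark of this kind can be sketched with a small timing harness. This is an assumed setup, not the benchmarking code behind Table III: `generate` is a hypothetical callable that produces one batch of showers, and the helper names are illustrative.

```python
import time

import numpy as np


def ms_per_shower(generate, batch_size, n_runs=100):
    """Average wall-clock time per generated shower, in milliseconds."""
    start = time.perf_counter()
    for _ in range(n_runs):
        generate(batch_size)
    elapsed = time.perf_counter() - start
    return 1e3 * elapsed / (n_runs * batch_size)


def orders_of_magnitude(reference_ms, candidate_ms):
    """Speedup of the candidate over the reference as log10 of the ratio."""
    return np.log10(reference_ms / candidate_ms)
```

Repeating the measurement for several batch sizes shows how the fixed per-call overhead is amortized, which is why the per-shower time is expected to drop as the batch grows.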

Implementation Notes
As noted previously in Sec. IV A, separating per-particle-type CaloGAN architectures and implementations affords many benefits. It is easy to imagine a situation where the life cycles surrounding models for different particle types are very different. In addition, this allows for total independence of versioning, framework, or language.
When possible, any GAN should maximally employ batching; we imagine most applications can request all showers from one event simultaneously, maximally taking advantage of CPU/GPU while minimizing data transfer overhead.

VI. CONCLUSIONS AND FUTURE OUTLOOK
Using modern generative deep neural network techniques, we have generated three-dimensional electromagnetic showers in a multi-layer sampling LAr calorimeter with uneven spatial segmentation, while attempting to preserve spatio-temporal relation among layers. Our approach infused Physics domain knowledge and reproduced many aspects of key shower shape properties comparable to the ones in the Geant4 full simulation. We showed the possibility of up to five orders of magnitude decrease in computing time.
Future work will focus on improving performance by drawing from the recent Machine Learning developments in GAN training procedures, as well as testing the direct inclusion of important shower shape variables as constraints at training time. Further developments will build on this result and continue expanding the complexity of the training dataset to include incoming particles at different locations and angles within the detector, as well as the hadronic calorimeter. Concurrent plans include contributing to testing the computational performance on high performance computing (HPC) clusters, and porting these solutions into the simulation packages used by the nuclear and particle physics communities, in order for the various experiments to be able to maximally benefit from this new technology.
TABLE II: Mean and standard deviation over 10 particle classification trials using a six-layer fully-connected network with dropout. The networks are trained using a dataset from the domain specified in the first column, and tested on an independent dataset from the domain specified in the header.