Estimating Elliptic Flow Coefficient in Heavy Ion Collisions using Deep Learning

Machine Learning (ML) techniques have been employed by the high energy physics (HEP) community since the early 80s to deal with a broad spectrum of problems. This work explores the prospects of using Deep Learning techniques to estimate elliptic flow ($v_2$) in heavy-ion collisions at RHIC and LHC energies. A novel method is developed to process the input observables from particle kinematic information. The proposed DNN model is trained with minimum bias Pb-Pb collisions at $\sqrt{s_{\rm NN}} = 5.02$ TeV simulated with the AMPT model. The predictions from the ML technique are compared to both simulation and experiment. The Deep Learning model appears to preserve the centrality and energy dependence of $v_2$ at LHC and RHIC energies. The DNN model is also quite successful in predicting the $p_{\rm T}$ dependence of $v_2$. When subjected to simulated events with additional noise, the proposed DNN model keeps its robustness and prediction accuracy intact to a reasonable extent.


I. INTRODUCTION
The extremely hot and dense state of strongly interacting matter has been investigated in collisions of heavy nuclei at ultra-relativistic energies for decades. In high-energy particle colliders, it is believed that the state of the early Universe can be recreated in tiny volumes of the order of a few cubic fermi (fm$^3$). This state of deconfined color partons is called the Quark-Gluon Plasma (QGP). Due to the very short duration of its existence, direct observation of the QGP is not possible; however, signatures of its presence can be measured.
The presence of transverse collective expansion (i.e. transverse flow) in the strongly coupled dense nuclear matter formed in relativistic heavy-ion collisions serves as an important signature of the QGP [1]. In particular, the transverse flow is anisotropic in nature, meaning particle emission is anisotropic in momentum space. Anisotropic flow of different orders can be expressed through the Fourier expansion coefficients of the produced particles' azimuthal distribution in momentum space. Experiments observe finite anisotropic flow, mainly the elliptic flow ($v_2$), in heavy-ion collisions [2][3][4][5]. For decades, investigations have been performed to solve the equations of relativistic fluid dynamics to model the elliptic flow theoretically. Although these theoretical models based on relativistic fluid dynamics were successful in describing some of the low-$p_{\rm T}$ phenomena, they over-predicted the observed elliptic flow [6,7]. The solution appeared in the form of hybrid models that couple (ideal) fluid dynamics applied to the QGP phase with kinetic descriptions applied to the microscopic hadron cascade phase [8][9][10][11]. At intermediate to high $p_{\rm T}$, elliptic flow suffers the combined consequences of jet quenching and parton energy loss in the dense medium, and thus simple hydrodynamic models fail to explain the data [7,12]. A recent study has presented the implementation of hydro freeze-out at low $p_{\rm T}$, parton coalescence at intermediate $p_{\rm T}$, and fragmentation at high $p_{\rm T}$, along with a coupled linear Boltzmann transport-hydro model [13]. This could explain both $R_{\rm AA}$ and $v_2$ simultaneously from low to intermediate and high $p_{\rm T}$ in high-energy heavy-ion collisions.
Apart from understanding various theoretical aspects of elliptic flow, there are challenges in estimating $v_2$ in experiments. By definition, $v_2$ requires the information of the reaction plane angle on an event-by-event basis, whose measurement is non-trivial in experiments. There are a couple of methods that offer a solution, such as the complex reaction plane identification method [14], the cumulant method [15], and the Lee-Yang zeroes method [16]. There are even attempts to implement Principal Component Analysis (PCA) methods to estimate $v_2$ as well [17][18][19][20]. For the first time, we propose a Deep Learning method in the Machine Learning framework to estimate $v_2$. Machine Learning (ML) techniques have been employed by the high energy physics (HEP) community since the early 80s to deal with a broad spectrum of problems [21][22][23][24]. With the advancement of superior hardware and smart algorithms, ML has become a very popular tool for statistical, data-driven, and prediction-based applications. It has the ability to perceive unique features and patterns in data to solve unconventional problems such as classification, regression, and clustering, to name a few.
The motivation of this study is to explore the prospects of using Deep Learning techniques to estimate elliptic flow. We also show that the model is capable of learning the centrality and transverse momentum dependence of $v_2$. A simple attempt is made in this paper to estimate elliptic flow in heavy-ion collisions event-by-event from various particle kinematic information using a feed-forward Deep Neural Network (DNN). The proposed DNN model is trained with minimum bias Pb-Pb collisions at $\sqrt{s_{\rm NN}} = 5.02$ TeV simulated with the AMPT model. The predictions of the machine are compared to both simulation and experiment. Although the training of the model is somewhat CPU intensive, the trained model can be applied independently to estimate $v_2$, which is faster and economical. This paper is organised as follows. We begin with a brief introduction to event generation using the AMPT model and the target observable in Sec. II. In Sec. III, we describe the proposed ML regression model based on Deep Learning in detail and provide some quality assurance plots along with the estimation of systematic uncertainties. Finally, the results are presented in Sec. IV and we summarise our findings in Sec. V.

II. EVENT GENERATION AND TARGET OBSERVABLE
In this section, we briefly describe event generation using a multi-phase transport model and then define the target observable, the elliptic flow.

A. AMPT Model
A multi-phase transport model (AMPT) is a Monte Carlo event generator for simulating p-A and A-A collisions from RHIC to LHC energies to study the properties of hot and dense nuclear matter [25]. It has four main components, namely, the fluctuating initial conditions, the parton cascade, the hadronization mechanism, and the hadron cascade. These are discussed below.
• Initialization of the collision is done by obtaining the spatial and momentum distributions of the hard mini-jet partons and soft string excitations from the HIJING model [26]. The inbuilt Glauber model is used to convert the mini-jet production cross-section in pp collisions to that in heavy-ion collisions.
• Zhang's parton cascade (ZPC) model is used to perform the partonic interactions and the parton cascade, which currently includes two-body scatterings with cross-sections obtained from pQCD with screening masses [27].
• The hadronization mechanism includes the default mode, where the Lund string fragmentation model is used to recombine the partons with their parent strings and the strings are then converted to hadrons, whereas in the string melting mode the transported partons are hadronized via a quark coalescence mechanism [28,29].
• Finally, the scattering among the produced hadrons is performed using a relativistic transport model (ART), including meson-meson, meson-baryon and baryon-baryon interactions [30,31].
We have used the AMPT string melting mode (AMPT version 2.26t9b) for event generation, as the anisotropic flow and particle spectra in the intermediate-$p_{\rm T}$ region are well explained by the quark coalescence mechanism for hadronization [32][33][34]. The AMPT settings in the current work are the same as those reported in Refs. [35][36][37] for heavy-ion collisions at LHC energies. For the definition of collision centrality, we have followed Ref. [38]. We have simulated 200K minimum bias events for Pb-Pb collisions at $\sqrt{s_{\rm NN}} = 5.02$ TeV, 100K events for Pb-Pb collisions at $\sqrt{s_{\rm NN}} = 2.76$ TeV, and 100K events for Au-Au collisions at $\sqrt{s_{\rm NN}} = 200$ GeV using the same settings in the AMPT model.

B. Elliptic flow
In non-central heavy-ion collisions, a finite azimuthal momentum-space anisotropy is observed, which can be expressed through the Fourier decomposition of the azimuthal momentum distribution of particles in an event as,

$$\frac{dN}{d\phi} \propto 1 + 2\sum_{n=1}^{\infty} v_n \cos[n(\phi - \psi_n)]. \quad (1)$$

Here, $v_n = \langle \cos[n(\phi - \psi_n)] \rangle$ is the $n^{\rm th}$-order harmonic flow coefficient, $\phi$ is the azimuthal angle, and $\psi_n$ is the $n^{\rm th}$-harmonic symmetry plane angle [39]. Anisotropic flow may be a combined outcome of the initial spatial anisotropy of the nuclear overlap region, the transport properties, and the equation of state of the produced system [40,41]. Due to the almond-shaped ellipsoidal nuclear overlap region, the dominant contribution to anisotropic flow comes from the elliptic flow, i.e., the second-order coefficient ($v_2$) in Eq. (1). To calculate the elliptic flow coefficient event-by-event, we have followed the event plane method [42]. In the AMPT simulation, there is a provision of setting the reaction plane angle $\psi_n = 0$, although this is non-trivial in experiments. From this, one can obtain the elliptic flow coefficient as

$$v_2 = \langle \cos[2(\phi - \psi_2)] \rangle. \quad (2)$$

The average is taken over all the chosen charged-particle tracks in an event. For this study, we have used a fixed reaction plane angle, i.e., $\psi_n = 0$, for both training and testing data sets.
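As a minimal sketch (not the authors' actual analysis code), the event-plane definition above with a fixed reaction plane angle $\psi_2 = 0$ can be computed directly from the azimuthal angles of an event's tracks. The toy event below imprints a known $v_2 = 0.1$ via accept-reject sampling and recovers it:

```python
import numpy as np

def elliptic_flow(phi, psi_2=0.0):
    """Event-by-event v2 = <cos[2(phi - psi_2)]>, averaged over the
    chosen charged-particle tracks of a single event (Eq. (2))."""
    phi = np.asarray(phi, dtype=float)
    return float(np.mean(np.cos(2.0 * (phi - psi_2))))

# Toy event: sample azimuthal angles from dN/dphi ∝ 1 + 2*(0.1)*cos(2*phi)
rng = np.random.default_rng(0)
phi = rng.uniform(0.0, 2.0 * np.pi, 200000)
accept = rng.uniform(0.0, 1.2, phi.size) < 1.0 + 0.2 * np.cos(2.0 * phi)
v2 = elliptic_flow(phi[accept])  # recovers ~0.1 up to statistical noise
```

The same function applied with $\psi_2 = 0$ to AMPT final-state tracks would yield the $v_2^{\rm true}$ labels used for supervised training.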

III. MACHINE LEARNING BASED REGRESSION
In this section, we discuss the detailed analysis procedure, including the input and output observables, the DNN architecture, training, evaluation, and quality assurance. The estimation of systematic uncertainties is also covered.
Computers typically require a specific algorithm to perform a task, in which the solution to the problem is written in a top-down approach and the control flows accordingly, producing an outcome. Yet many problems do not admit such explicit algorithms; Machine Learning instead gives computers the ability to learn correlations from the components of the data. In the field of astronomy, DNN models have been used to map complex non-linear functions using simulated data [43]. This ability could be exploited to train ML-models to look for the hidden physics laws that govern particle production, anisotropic flow, spectra, etc., in heavy-ion collisions. A small attempt is made here to estimate the elliptic flow, $v_2$, by using the powerful statistical tools implemented in DL algorithms to reproduce the observable of interest. For this purpose, we suggest the implementation of a feed-forward DNN in the ML framework, similar to Refs. [44][45][46]. Generally, a heavy-ion collision event produces a multitude of particles in the final state. Each particle interacts with designated detectors to leave a track. These detectors can gather track information such as the phase space observables, i.e. $p_{\rm T}$, $\eta$, $\phi$, charge, etc., and help in identifying the particles. To estimate the elliptic flow coefficient ($v_2$) on an event-by-event basis, we propose to train the machine with various particle kinematic properties. As the number of tracks varies from event to event, it is tricky to follow the conventional matrix-based input space of a fixed order in a feed-forward DNN model to map the single output observable, here the $v_2$, and the energy or multiplicity scaling of $v_2$. We propose an alternative yet convincing way to deal with this problem, described below.

A. Input to the Machine
In this study, to train a DNN based ML-regression model for the target (output) variable $v_2$, the binned ($\eta$-$\phi$) coordinate space of all charged hadrons in an event has been taken as the primary input space. Here, $\eta \in [-0.8, 0.8]$ and $\phi \in [0, 2\pi]$. The bin number is chosen through a proper evaluation, which is discussed in Section III C. To add further kinematic information necessary for the estimation of $v_2$, we have included three secondary layers in the ($\eta$-$\phi$) space. The three layers carry different weights as additional input to the ML-model: the transverse momentum ($p_{\rm T}$), the mass of the charged particle, and $\log(\sqrt{s_{\rm NN}}/\sqrt{s_0})$, a term related to the center-of-mass energy, where $\sqrt{s_0} = 1$ GeV makes the ratio unitless. Figure 1 shows the three layers of the weighted ($\eta$-$\phi$) space, having $32 \times 32$ bins each, for a single minimum bias Pb-Pb collision at $\sqrt{s_{\rm NN}} = 5.02$ TeV from AMPT model simulation. Different colours are used to represent the different layers: the $p_{\rm T}$, mass and energy weighted plots are shown in blue, red and green, respectively. The number of features would thus be $32 \times 32 \times 3 = 3072$. These feature values for an event can be extracted as the bin contents after filling the respective two-dimensional histograms with the tracks, as shown in Fig. 1. Thus, all the kinematic information from an event is mapped to these 3072 features, which serve as the input neurons of the DNN. Now, the input space with a fixed feature set (a matrix of dimension $1 \times 3072$) is ready to be used for training. We have considered all charged particles with transverse momentum cut $0.2 < p_{\rm T} < 5.0$ GeV/c in pseudorapidity $|\eta| < 0.8$ for the training of the minimum bias ML-regression model.
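The feature construction described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions (the function and variable names are our own, not from the paper), including the L2 normalisation of the input array mentioned in the text:

```python
import numpy as np

N_BINS = 32
SQRT_S0 = 1.0  # GeV; makes the energy ratio unitless

def event_features(pt, eta, phi, mass, sqrt_snn, n_bins=N_BINS):
    """Build the 32x32x3 binned (eta-phi) input for one event:
    layers weighted by pT, mass, and log(sqrt(s_NN)/sqrt(s0))."""
    edges_eta = np.linspace(-0.8, 0.8, n_bins + 1)
    edges_phi = np.linspace(0.0, 2.0 * np.pi, n_bins + 1)
    bins = (edges_eta, edges_phi)
    h_pt, _, _ = np.histogram2d(eta, phi, bins=bins, weights=pt)
    h_m, _, _ = np.histogram2d(eta, phi, bins=bins, weights=mass)
    w_e = np.full_like(pt, np.log(sqrt_snn / SQRT_S0))
    h_e, _, _ = np.histogram2d(eta, phi, bins=bins, weights=w_e)
    x = np.stack([h_pt, h_m, h_e]).ravel()  # 3 x 32 x 32 = 3072 features
    x /= np.linalg.norm(x)                  # L2-normalise the input array
    return x

# Toy event with uniformly distributed tracks (pion mass assumed)
rng = np.random.default_rng(1)
n = 1500
x = event_features(pt=rng.uniform(0.2, 5.0, n),
                   eta=rng.uniform(-0.8, 0.8, n),
                   phi=rng.uniform(0.0, 2.0 * np.pi, n),
                   mass=np.full(n, 0.140),   # GeV/c^2
                   sqrt_snn=5020.0)          # GeV
```

The resulting vector has unit L2 norm and a fixed length of 3072, regardless of the event's track multiplicity, which is the key point of the method.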

B. Deep Neural Network
Having described the input and output observables, we now discuss the ML-regression algorithm. In high energy nuclear and particle physics, two of the most preferred ML algorithms are BDT and DNN. The DNN is a powerful tool in Machine Learning and has been applied to numerous problems in HEP, such as the classical papers [47,48], jet tagging [49][50][51], PID and track reconstruction [52][53][54], and heavy-ion physics [45,46,55,62]. The interested reader may refer to [56], which contains an excellent collection of ML papers in particle physics, cosmology and beyond. A neural network is inspired by the biological neurons in animal brains, where information is processed and communicated as signals through a proper pathway of neurons to realize an action or take a decision. A DNN has several components, starting from the input layer, where all the features that need to be mapped to the output are present. The exact mapping function is not known, and the DNN is trained to learn the mapping by itself. The network consists of further intermediate (hidden) layers with different sets of nodes, and finally the output layer. Each layer consists of several nodes connected to a subset of nodes in the next layer. If all the nodes of the previous layer connect to each node of the present layer, it is called a dense layer. A network consisting of several hidden dense layers is called a deep network. The connection between two nodes in adjacent layers is made mathematically with weights and biases.
The existing problem of estimating $v_2$ from particle kinematic properties is of the supervised regression kind. To learn the mapping function from the data, the neural network is shown many events consisting of the input (3072 features) and the corresponding true value of the output ($v_2$) for each event. A loss function evaluates the difference between the true value from simulation and the network output. The minimization of the loss function is performed through an optimizer algorithm, which updates the weights and biases at each node at every stage of the training. An activation function is used to introduce nonlinearity into the model; this step is one of the crucial aspects of the network. Together with the linear transformations carried out by the well-optimised weights and biases, the nonlinear effect of the activation function lets a DNN approximate solutions to complex nonlinear mapping functions.
The Deep Neural Network used in this regression problem is shown pictorially in Fig. 2. The network begins with the input feature layer, which is mapped to the first dense layer with 128 nodes. It has three further hidden dense layers, namely dense 1, dense 2 and dense 3, each having 256 nodes. For the input and hidden layers, the rectified linear unit (ReLU) activation function is used [57], which is of the form ReLU($x$) = max{0, $x$}. The output dense layer, named dense 4, consists of a single node representing $v_2$, and the linear activation function is used for this layer. The network is trained with the adam algorithm [58] as the optimizer and the mean-squared-error (MSE) loss function. All the layers are dense layers.
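Assuming the architecture as described (3072 input features, a 128-node first dense layer, three 256-node hidden layers, and a single linear output node), a Keras sketch might look like the following; any hyperparameter not stated in the text is left at its Keras default:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_dnn(n_features=3072):
    """Feed-forward DNN sketch: 3072 inputs -> 128 -> 3 x 256 -> 1,
    ReLU hidden activations, linear output, adam optimiser, MSE loss."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(128, activation="relu"),
        layers.Dense(256, activation="relu", name="dense_1"),
        layers.Dense(256, activation="relu", name="dense_2"),
        layers.Dense(256, activation="relu", name="dense_3"),
        layers.Dense(1, activation="linear", name="dense_4"),
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

model = build_dnn()
```

With 3072 inputs, this network has roughly 0.56 M trainable parameters; scaling the input to $64 \times 64 \times 3$ bins would raise this to roughly 1.7 M, consistent with the training-time discussion in the quality assurance subsection.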
Deep Neural Networks also face the standard over-fitting issue like every other ML algorithm. Over-fitting is the scenario where the model picks up super-fine details of the training data set but performs poorly on the validation set. A properly trained model should be stable over a large set of the training data and achieve a minimal difference between the training and validation loss without compromising the accuracy of the prediction. To tackle the over-fitting issue, we have tried (i) the dropout technique [59] and (ii) L2-regularization [60]. These methods, although they helped in mitigating the over-fitting issue, drove the network far from accurate prediction and hence are not used. Finally, a simple early stopping mechanism is used to solve this problem. The training is stopped if over-fitting is observed over a specified patience level; in this case, we have used a maximum patience of 60 epochs. The mean-absolute-error (MAE) for $v_2$, defined as $\Delta v_2 = \langle |v_2^{\rm true} - v_2^{\rm pred}| \rangle$, is found to be $\Delta v_2 = 0.0073$. The predicted $v_2$ has very good agreement with the true $v_2$, as the $v_2^{\rm true} = v_2^{\rm pred}$ straight line, shown as a dashed black line, is seen to be nicely populated in Fig. 4.
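The early stopping setup can be sketched with a Keras callback. The exact arguments below (the monitored quantity and weight restoration) are our assumptions; only the patience level of 60 follows the text:

```python
from tensorflow import keras

# Stop training once the validation loss has not improved for `patience`
# consecutive epochs, and roll back to the best weights seen so far.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",          # assumed monitored quantity
    patience=60,                 # patience level as per the text
    restore_best_weights=True,   # assumed; keeps the best checkpoint
)

# Hypothetical usage (x_train, y_train: 3072-feature inputs, true v2):
# model.fit(x_train, y_train, validation_split=0.2, epochs=1000,
#           batch_size=32, callbacks=[early_stop])
```

This reproduces the behaviour described: training runs until over-fitting persists beyond the patience window, then halts.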
C. Quality assurance
The selection of the bin number (grid dimension in Fig. 1) for the two-dimensional ($\eta$-$\phi$) space is done by training the model with inputs of varying bin numbers. The performance of the DNN with different numbers of bins in the input space is listed in Table I. Deep Neural Networks are known to be quite stable and robust to random noise or fluctuations in the data [50]. To check the noise sensitivity of the prediction of our working DNN model, we proceed with the following simple test. In this study, there are 3072 input features ($F_{i,j}$) for each event, where $i$ labels the event and $j$ labels the feature. To introduce noise to each of these features, one needs to evaluate the standard deviation ($\sigma_j$) associated with each feature $j$. Following the central limit theorem, each of these features should approximately follow a Gaussian density function. We assume that the noise ($X_{i,j}$) introduced to each feature value ($F_{i,j}$) should be proportional to a random number between $(-\sigma_j, \sigma_j)$. Hence, we randomly select a number $X_{i,j}$ such that $X_{i,j} \in (-\sigma_j, \sigma_j)$. We then introduce a weight factor $w$, which sets the magnitude of the noise. For each feature value $F_{i,j}$, we add the noise $X_{i,j}/w$, such that $F_{i,j} \rightarrow F_{i,j} + X_{i,j}/w$. A higher value of $w$ corresponds to lower noise and vice-versa. A smaller value of $w$ broadens the width of the feature distribution, changing its true shape and evidently affecting the DNN input. This can be treated as an imperfect simulation dataset. In Fig. 5, we have evaluated the noise sensitivity of the DNN model by taking different weights. The ratio of DNN to AMPT shows the degree of agreement between the simulation and the machine prediction. For a perfect prediction, the ratio should be exactly equal to one. The width of each band shows the statistical uncertainty associated with each centrality class. As we can clearly observe, a smaller value of $w$ gives larger noise in the dataset, resulting in a greater deviation from the true value, whereas a reasonable amount of noise in the dataset does not affect the prediction of the DNN much. This test helps us understand the noise sensitivity of the DNN model. We can conclude that the DNN model is not very sensitive to random noise and hence is quite stable and accurate.
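The noise-injection procedure above can be sketched as follows; this is a toy illustration on random features (names are our own), not the paper's code:

```python
import numpy as np

def add_noise(features, w, rng):
    """Perturb each feature F[i,j] by X[i,j]/w, with X[i,j] drawn
    uniformly from (-sigma_j, +sigma_j), where sigma_j is the
    per-feature standard deviation. Larger w means smaller noise."""
    sigma = features.std(axis=0)                       # sigma_j per feature
    noise = rng.uniform(-sigma, sigma, features.shape)  # X[i,j]
    return features + noise / w

# Toy feature matrix: 500 events x 3072 features
rng = np.random.default_rng(2)
F = rng.normal(0.0, 1.0, size=(500, 3072))
F_mild = add_noise(F, w=5.0, rng=rng)    # small perturbation
F_heavy = add_noise(F, w=0.5, rng=rng)   # large perturbation, as for the
                                         # systematic uncertainty estimate
```

Feeding `F_heavy` to the trained network and comparing against the prediction on `F` then mimics the imperfect-simulation test described above.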
The deviation between the model prediction for a fair simulation and a noisy simulation with a certain $w$ could constitute the systematic uncertainty of the method. We consider $w = 0.5$, which is reasonable for the estimation of the systematic uncertainties that are included in the centrality-wise predictions. Note that similar supervised regression problems on the estimation of the impact parameter in high-energy heavy-ion collisions have been investigated with convolutional neural network (CNN) based models [62,63] and a PointNet model [64], where the authors use image-like inputs for RHIC and lower energies. For our case, we have used a matrix-based input derived from the image-like histograms, and the proposed DNN model is found to be quite efficient in this study. We also recommend trying CNN and PointNet based approaches, which are specialized algorithms for handling image-like inputs directly, for mapping the flow coefficients. This can be taken as an outlook for the present work and is not covered here.

IV. RESULTS AND DISCUSSIONS
We have made an attempt to implement the DNN in the ML framework in heavy-ion collisions using a novel technique to include the particle kinematic properties to estimate the elliptic flow coefficient, $v_2$. The training is done with minimum bias Pb-Pb collisions at $\sqrt{s_{\rm NN}} = 5.02$ TeV generated from AMPT, and the model is applied to successfully predict centrality-wise $v_2$ for Pb-Pb collisions at $\sqrt{s_{\rm NN}} = 5.02$ and 2.76 TeV and Au-Au collisions at $\sqrt{s_{\rm NN}} = 200$ GeV, shown in Fig. 6. We have compared the DNN predicted $v_2$ values with the experimental observations at the LHC (left and middle) [41] and RHIC (right) [65].
One can clearly see from the AMPT to data ratio plots for the LHC energies that, with the current settings, AMPT reproduces the data very nicely. However, there is some level of discrepancy for the most central (peripheral) cases. It should also be noted here that the estimation of $v_2$ by ALICE using $|\Delta\eta| > 1$ could have some level of non-flow effects. In addition, different methods of flow estimation also introduce a degree of uncertainty/mismatch. For the RHIC energy, the current settings over-estimate $v_2$ compared to the PHENIX result. This could be avoided using a specific tune valid for the top RHIC energies as proposed by the authors of AMPT [66]. Here, we stick to the LHC tunes of AMPT mentioned in Section II A, as the primary focus of the paper is the study of the DNN implementation in heavy-ion collision systems. The DNN predictions follow the simulated $v_2$ closely across centrality classes for all three collision energies, which is well understood from the bottom ratio plots. One can interpret these results as the fact that the proposed DNN model with the selected input parameters learns and preserves the centrality and energy dependence of elliptic flow. The maximum relative error, in this case, is less than 6.0%. For the region $v_2^{\rm true} > 0.2$, $\Delta v_2$ seems to increase sharply for both RHIC and LHC energies. This could be due to the fact that event statistics for higher elliptic flow values ($v_2^{\rm true} > 0.2$) are naturally lower, and the model sees fewer examples from this region in an inclusive and unbiased training data set. However, the maximum relative error in this region is found to be only (6-10)%, which reflects the overall satisfactory performance of the proposed DNN model over a wide range of $v_2^{\rm true}$.

V. SUMMARY
In summary, we report the implementation of a DNN model in the ML framework with a novel technique to include particle kinematic properties in heavy-ion collisions and estimate the elliptic flow. The proposed DNN model uses kinematic properties such as $p_{\rm T}$, mass and $\log(\sqrt{s_{\rm NN}}/\sqrt{s_0})$, a term related to the center-of-mass energy, as model input. This information is encoded as secondary layers in the ($\eta$-$\phi$) primary input space. The model is trained with minimum bias Pb-Pb collisions at $\sqrt{s_{\rm NN}} = 5.02$ TeV from AMPT simulations. The performance of the model is tested under random fluctuations in the data, and the systematic uncertainties are calculated. Further, the trained model is applied to predict the centrality-wise evolution of $v_2$ for Pb-Pb collisions at $\sqrt{s_{\rm NN}} = 5.02$ and 2.76 TeV and Au-Au collisions at $\sqrt{s_{\rm NN}} = 200$ GeV. From the results, a very good agreement is observed between the model prediction and the simulated true value. The proposed DNN model seems to learn and preserve the centrality and energy dependence of elliptic flow quite nicely. In addition, the same model is successfully applied to study the $p_{\rm T}$ dependence of $v_2$; the proposed DNN model seems to preserve the $p_{\rm T}$ dependence of $v_2$ in heavy-ion collisions as well.
It should be noted here that a machine learning-based model, especially one used for high-energy physics, becomes more useful when the Monte Carlo simulations describe the experimental data as closely as possible. The ML-based model could be made more realistic by taking into account the detector resolution and noise during training, which would contain both correlated and uncorrelated noise. The authors plan to investigate these issues in a future study.

Software
Although there is plenty of software available for implementing a DNN model, we have specifically used the Keras v2.6.0 Deep Learning API [67] with a TensorFlow v2.6.0 [68] backend in Python to implement the DNN model used in this work. We also found the scikit-learn ML framework very helpful [69].

FIG. 1: (Color online) The (η−φ) space with 32 × 32 bins showing the three layers of information for a single minimum bias Pb-Pb collision at √sNN = 5.02 TeV from the AMPT model. The pT, mass and log(√sNN/√s0) weighted plots are shown in blue, red and green colours, respectively.

FIG. 3: (Color online) Evaluation of the mean squared loss using the adam optimiser as a function of epochs in the DNN during training and validation for minimum bias Pb-Pb collisions at √sNN = 5.02 TeV from the AMPT model.

FIG. 4: (Color online) Correlation between the true and predicted values of v2.

Figure 3 shows the training and validation performance of the DNN model by evaluating the mean-squared-error (MSE) loss as a function of epochs. The training and validation curves show a sharp decrease with increasing epoch. It is interesting to note that, after a certain epoch, the training is stopped with the early stopping callback to ensure there is minimal over-fitting; the difference between the validation and training loss remains small. With 8 × 8, 16 × 16, 32 × 32 and 64 × 64 bins, the model is evaluated with 50K training events and 5K testing events using the exact DNN settings mentioned in Section III B. From the performance of the model, it is seen that input spaces with both 8 × 8 and 16 × 16 bins perform poorly, with testing mean-absolute-errors (MAE) of 0.0292 and 0.0171, respectively. For 32 × 32 bins, the model performs decently, with testing MAE = 0.0102. Training with even more bins in the input space, i.e. 64 × 64, not only renders the training slower, as seen from the very high CPU time per epoch (≈ 6 s), but also gives a worse testing MAE = 0.0113. In this case, the number of trainable parameters of the DNN becomes too high (≈ 1.7 M), which naturally slows down the training process. Evidently, 32 × 32 bins is taken as the ideal input size with respect to prediction accuracy and training time. The specifications of the CPU used for these measurements are as follows. The CPU is an Intel(R) Core(TM) i5-8279U (released Q2'19) with four cores (eight threads) clocked at a base frequency of 2.40 GHz and a max turbo boost frequency of 4.10 GHz [61]. The system has 8 GB of LPDDR3 RAM clocked at 2133 MHz. The training is done with the batch size fixed at 32. It may be noted that, due to the stochastic nature of the training process and the type of CPU used, a slight discrepancy in the performance of the DNN is expected.

FIG. 5: (Color online) Noise sensitivity test of the DNN model with respect to simulated values from AMPT. A smaller value of w corresponds to a greater amount of noise in the dataset.

Finally, Fig. 8 shows the $v_2(p_{\rm T})$ for (30-40)% central Pb-Pb collisions at $\sqrt{s_{\rm NN}} = 5.02$ TeV predicted by the DNN model. For comparison, we present the ALICE measurement [41] and $v_2^{\rm true}$ calculated from AMPT simulations for the same centrality class. The DNN prediction has a similar trend of $v_2(p_{\rm T})$, but it over-predicts the ALICE data at intermediate $p_{\rm T}$, while it explains the data at low and high $p_{\rm T}$. On the other hand, the predicted $v_2$ has very good agreement with $v_2^{\rm true}$ from AMPT over the full range of $p_{\rm T}$. This can be seen from the DNN to AMPT ratio plot in the bottom panel, which stays close to unity. The quadratic sum of the systematic and statistical uncertainties is shown as the solid red band in the upper panel of the $v_2(p_{\rm T})$ plot. In the DNN to AMPT ratio plot in the bottom panel, the red band shows the statistical uncertainty and the dashed band shows the systematic uncertainty for this centrality class.
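The $p_{\rm T}$-differential flow can be sketched by evaluating the event-plane average of $\cos[2(\phi - \psi_2)]$ in $p_{\rm T}$ bins. The toy sample below imprints a linear $v_2(p_{\rm T})$ and recovers it; this is an illustration, not the paper's analysis code:

```python
import numpy as np

def v2_vs_pt(pt, phi, pt_edges, psi_2=0.0):
    """pT-differential elliptic flow: average cos[2(phi - psi_2)]
    over the tracks falling in each pT bin."""
    pt, phi = np.asarray(pt), np.asarray(phi)
    v2 = np.full(len(pt_edges) - 1, np.nan)
    for k in range(len(pt_edges) - 1):
        sel = (pt >= pt_edges[k]) & (pt < pt_edges[k + 1])
        if sel.any():
            v2[k] = np.cos(2.0 * (phi[sel] - psi_2)).mean()
    return v2

# Toy sample with an imprinted v2(pT) = 0.05 * pT (accept-reject)
rng = np.random.default_rng(3)
n = 200000
pt = rng.uniform(0.2, 5.0, n)
phi = rng.uniform(0.0, 2.0 * np.pi, n)
amp = 0.05 * pt
accept = rng.uniform(0.0, 1.5, n) < 1.0 + 2.0 * amp * np.cos(2.0 * phi)
edges = np.array([0.2, 1.0, 2.0, 3.0, 4.0, 5.0])
v2_pt = v2_vs_pt(pt[accept], phi[accept], edges)  # rises with pT
```

Comparing such binned values from AMPT tracks against the network output per $p_{\rm T}$ bin is the kind of check underlying Fig. 8.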

FIG. 8: (Color online) Estimation of v2 as a function of pT for the (30-40)% centrality class in Pb-Pb collisions at √sNN = 5.02 TeV from AMPT using the DNN model. ALICE results are shown for comparison.
Before moving to the next step, which is the model design and training, the input array is normalised using the L2-norm, which makes the square root of the sum of squares of the elements in the array equal to one. This way, the raw feature vectors are transformed into a more machine-friendly representation by standardisation of the data. It helps in faster convergence of the regression estimators and also ensures that the coefficients of the model stay small, keeping the model less complex.

TABLE I: Performance of the DNN model with different bin numbers in the (η−φ) input space.