Detecting Chiral Magnetic Effect via Deep Learning

The search for the chiral magnetic effect (CME) in heavy-ion collisions has attracted long-standing attention. Multiple observables have been proposed, but all suffer from large background contamination. In this Letter, we construct an observable-independent CME-meter based on a deep convolutional neural network. After being trained on a data set generated by a multiphase transport model, the CME-meter shows high accuracy in recognizing the CME-featured charge separation from the final-state pion spectra. It also exhibits remarkable robustness to diverse conditions, including different collision energies, centralities, and elliptic flow backgrounds. Through transfer learning, the CME-meter is validated in isobaric collision systems, showing good transferability among different colliding systems. Based on variational approaches, we use the DeepDream method to derive the most responsive CME spectra, which reveal the physical content the machine learns.

Introduction.-Quantum chromodynamics (QCD) is the standard theory describing the physics of the strong interaction. Among the studies on QCD, the proposal of using the chiral magnetic effect (CME) to reveal the vacuum structure of QCD is of great importance [1][2][3]. It predicts that in the hot and dense quark-gluon plasma (QGP), topological fluctuations of the gluon fields can cause an imbalance between the numbers of left-handed and right-handed quarks, and that this imbalance can induce an electric current in an external magnetic field.
High-energy heavy-ion collisions (HICs) provide an environment for CME to take place. However, the QGP and the strong magnetic field required to give rise to CME exist only in the early stages of the collisions. To retrieve the information of possible CME from the final-state hadrons, multiple observables have been proposed [4][5][6][7][8], such as the γ-correlator (defined below). However, due to the large contributions of elliptic flow and other background noise [9][10][11], these observables cannot clearly identify CME or its induced charge separation (CS) in the QGP along the magnetic-field direction.
Although it is difficult to detect CME through specific observables, analyzing the final-state hadronic spectrum as a whole, in the sense of big data, may help reveal hidden fingerprints of CME. Deep learning is a branch of machine learning that has shown a powerful ability to recognize patterns and structures with complex correlations [12,13]. With the hierarchical structure of artificial neural networks, deep learning is particularly effective in tackling complex systems that cannot be easily handled by conventional techniques. Recently, significant progress has been made in applying deep learning to physics studies, including nuclear physics [14][15][16][17][18][19][20][21], particle physics [22][23][24][25][26], and condensed matter physics [27][28][29][30][31]. In this Letter, we explore the possibility of using deep learning to determine whether there are detectable final-state signals of CME that survive the collision dynamics and background interference, thus offering a new way to detect CME in HICs.
FIG. 1. The convolutional neural network architecture with the $\pi^+$ and $\pi^-$ spectra $\rho^{\pm}(p_T, \phi)$ as input. The periodic boundary condition (p.b.c.) is recovered both for the input spectra and for the 2D convolution operation.
Method.-In this section, we introduce a deep learning model based on convolutional neural networks (CNNs) to detect CME signals in HIC systems. The architecture of the neural network is shown in Fig. 1. For supervised learning, we prepare the training data-set from the string-melting AMPT model [32]¹. To implement the CME in the AMPT model, we adopt a global CS scheme first employed for Au+Au collisions in Ref. [41]. Within this scheme, the y-components of the momenta of a fraction of downward-moving $u$ quarks and upward-moving $\bar{u}$ quarks are switched, and likewise for $\bar{d}$ and $d$ quarks ("upward" and "downward" refer to the y-axis, which is perpendicular to the reaction plane). This gives a right-dominant CME event; for a left-dominant one, we switch quarks with momenta opposite to those in the right-dominant case, or simply rotate the event about the z-axis (the beam direction) by 180°. The CS fraction $f$ is introduced as

$$f = \frac{N^{+(-)}_{\uparrow(\downarrow)} - N^{+(-)}_{\downarrow(\uparrow)}}{N^{+(-)}_{\uparrow(\downarrow)} + N^{+(-)}_{\downarrow(\uparrow)}},$$

where $N$ is the number of quarks of a given species, the superscript labels the sign of the charge, and the subscript labels the direction of the momentum along the y-axis. Events with $f = 0\%$ belong to the "no CS" class and are labeled "0", while events with $f > 0\%$ belong to the "CS" class and are labeled "1". With supervised learning, the deep CNNs are trained to distinguish these two classes from the labeled data. As input to the CNNs, we prepare from the AMPT simulation a series of 2-D spectra $\rho^{\pm}(p_T, \phi)$ of charged pions ($\pi^+$ or $\pi^-$) in the final state, with 20 transverse-momentum ($p_T$) bins and 24 azimuthal-angle ($\phi$) bins². The data-set consists of Au+Au collisions at colliding energies $\sqrt{s_{NN}}$ = 7.7, 11.5, 14.5, 19.6, 27, 39, 62.4 and 200 GeV, with CS fraction $f = 0\%$ or $f > 0\%$, all divided into 6 centrality bins in the range 0-60%. Each set of collision conditions contains 50,000 events.

¹ The AMPT model is a transport model that is widely used to simulate the evolution of both partonic and hadronic matter in HICs and has been proven successful in describing experimental data on harmonic flows [33][34][35][36], global polarization [37][38][39], QCD phase transitions [40], and so on.

² We maintain the p.b.c. in $\phi$ in all the convolution layers of the network, so as not to lose correlations near $\phi = 0$ and $2\pi$, while $p_T$ follows the original boundary condition of the convolution layer.

arXiv:2105.13761v3 [hep-ph] 17 Jun 2021

To reduce fluctuations, 100 events with the same collision condition and dominant chirality are randomly picked out. Their pion spectra are averaged and normalized to form a single sample in our training batch. To preserve the mirror symmetry rooted in the data, every sample is accompanied by a copy that is flipped along the y-axis. This can also be viewed as exchanging the initial distribution of nucleons between the projectile and target nuclei, which naturally provides data augmentation and reduces redundancy in training the CNNs. To eliminate the ambiguity of introducing the CME under various conditions, we take two data-sets with different CS fractions to train the deep CNNs: "CS" events with $f = 5\%$ and with $f = 10\%$, each mixed with an equal number of "no CS" events. The corresponding well-trained CNN models are named (0%+5%) and (0%+10%), respectively, in Table I, where their performance in recognizing the CS signal is also shown. The validation accuracy of the (0%+5%) model is lower than that of the (0%+10%) model, indicating that the weaker contrast between the two classes in the (0%+5%) case makes high classification accuracy harder to achieve than with the more discriminative data of the (0%+10%) case. Despite the discrepancy between the two models, their performance is robust against various collision conditions, such as $\sqrt{s_{NN}}$ and centrality; see Fig. 2 for the centrality dependence. The performance of the network also shows that the CS signals are not completely washed out or contaminated by the collision dynamics and remain visible with the help of deep learning.
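The sample preparation described above (event averaging, normalization, mirror augmentation, and periodic treatment of $\phi$) can be illustrated with a minimal sketch. It is not the authors' pipeline: the unit-sum normalization, the bin-centre convention $\phi_j = (j + 0.5)\,2\pi/24$, the reading of the flip as $\phi \to \pi - \phi$, and all function names are assumptions made for illustration, and random counts stand in for AMPT output.

```python
import random

N_PT, N_PHI = 20, 24  # numbers of p_T and phi bins quoted in the text


def average_and_normalize(events):
    """Average event histograms bin by bin, then scale to unit sum
    (unit-sum normalization is an assumption of this sketch)."""
    n = len(events)
    mean = [[sum(ev[i][j] for ev in events) / n for j in range(N_PHI)]
            for i in range(N_PT)]
    total = sum(sum(row) for row in mean)
    return [[v / total for v in row] for row in mean]


def mirror_x(spectrum):
    """Mirror copy phi -> pi - phi (assumed meaning of the flip).
    With bin centres (j + 0.5) * 2*pi / N_PHI, bin j maps to bin
    (N_PHI // 2 - 1 - j) mod N_PHI."""
    return [[row[(N_PHI // 2 - 1 - j) % N_PHI] for j in range(N_PHI)]
            for row in spectrum]


def pad_phi_periodic(spectrum, k=1):
    """Wrap k columns around in phi so that a convolution kernel sees
    phi = 0 and phi = 2*pi as neighbours; p_T is left unpadded here."""
    return [row[-k:] + row + row[:k] for row in spectrum]


# Hypothetical stand-in for 100 simulated events of one charge species.
random.seed(0)
events = [[[random.randint(0, 5) for _ in range(N_PHI)] for _ in range(N_PT)]
          for _ in range(100)]

sample = average_and_normalize(events)
augmented = [sample, mirror_x(sample)]   # sample plus its mirrored copy
padded = pad_phi_periodic(sample, k=1)   # 20 x 26 array for a 3-wide kernel
```

In a deep learning framework the same periodic treatment is usually available as a circular padding mode on the convolution layer; the explicit wrap here only makes the idea visible.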
As for the generalization ability of the trained network, both models are tested at different collision energies and initial CS setups, as shown in Suppl. I. As the CS fraction $f$ increases, the predictions of both models improve. In Fig. 3, predictions are sorted into true-positive (TP) and false-negative (FN) cases, where true/false indicates whether the prediction agrees with the ground-truth label and positive/negative is the predicted label. The FN curve stops at $f = 10\%$ because both models recognize CS events with larger CS fractions with 100% accuracy.

CME-meter.-In effect, the trained neural network is a mapping from the input (the charged-pion spectra) to the output (the CS signal), $P(\rho^{\pm}(p_T, \phi))$, which is learned in a supervised manner from the training data. The output of the network has two components, $(P_0, P_1)$, for each input spectrum, with $P_0 + P_1 = 1$. The value of $P_1$ is interpreted as the probability that the network assigns the input spectrum to the CS class, and it is thus closely related to the CS signal intensity in the event. Above, we have demonstrated that the trained network can efficiently decode the initial CS information purely from $\rho^{\pm}(p_T, \phi)$. In this sense, the network acts as a meter that measures the probability of CME occurring in HICs. In the following, we investigate the correlation between $P_1$ and the γ-correlator to give a coherent account of this CME-meter.
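The two-component output $(P_0, P_1)$ and the TP/FN bookkeeping can be sketched as follows. The text only states that $P_0 + P_1 = 1$; obtaining the pair from a softmax over two raw scores, the 0.5 decision threshold, and the example scores are assumptions of this sketch.

```python
import math


def softmax2(z0, z1):
    """Two-class softmax: returns (P0, P1) with P0 + P1 = 1."""
    m = max(z0, z1)                      # subtract the max for stability
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)


def tp_fn_rates(p1_values, threshold=0.5):
    """For samples whose true label is "CS": fraction predicted "CS"
    (true positive) and fraction predicted "no CS" (false negative)."""
    n = len(p1_values)
    tp = sum(1 for p in p1_values if p > threshold)
    return tp / n, (n - tp) / n


# Hypothetical raw network scores for four "CS" samples.
scores = [(0.2, 1.5), (1.0, -0.3), (-2.0, 2.0), (0.1, 0.4)]
p1_values = [softmax2(z0, z1)[1] for z0, z1 in scores]
tp_rate, fn_rate = tp_fn_rates(p1_values)
```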
The γ-correlator measures the event-by-event two-particle azimuthal correlation of charged hadrons, which is considered sensitive to CME [4]. It is defined as

$$\gamma_{\rm same(opp)} = \langle \cos(\phi_\alpha + \phi_\beta - 2\Phi_R) \rangle,$$

where $\phi_{\alpha(\beta)}$ is the azimuthal angle of particle $\alpha$ ($\beta$) with positive or negative charge, the pair $\alpha$, $\beta$ carrying the same (opposite) charge, $\Phi_R$ is the azimuthal angle of the reaction plane ($\Phi_R = 0$ in this work), and $\langle \cdots \rangle$ represents an average over particles in the event. To subtract charge-independent backgrounds, $\Delta\gamma = \gamma_{\rm same} - \gamma_{\rm opp}$ is also often used. To compare with the CME-meter, we define the relative difference $R_\gamma$ of $\Delta\gamma$ between the two classes, with 0 and 1 denoting the $\Delta\gamma$ of "no CS" and "CS" events as defined above. The angle brackets denote an average over events.
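As an illustration of the standard γ-correlator $\langle\cos(\phi_\alpha + \phi_\beta - 2\Phi_R)\rangle$, the sketch below evaluates it for a single event by averaging over distinct pairs. The function names are hypothetical, the same-charge average pools ++ and −− pairs (an assumption), and the sign convention $\Delta\gamma = \gamma_{\rm same} - \gamma_{\rm opp}$ follows the text.

```python
import math
from itertools import combinations


def same_charge_terms(phis, phi_r):
    """cos(phi_a + phi_b - 2 phi_R) for all distinct pairs in one charge."""
    return [math.cos(a + b - 2.0 * phi_r) for a, b in combinations(phis, 2)]


def gamma_correlator(phi_pos, phi_neg, phi_r=0.0):
    """Return (gamma_same, gamma_opp, delta_gamma) for one event.
    Needs at least one same-charge pair and one opposite-charge pair."""
    same = same_charge_terms(phi_pos, phi_r) + same_charge_terms(phi_neg, phi_r)
    opp = [math.cos(a + b - 2.0 * phi_r) for a in phi_pos for b in phi_neg]
    gamma_same = sum(same) / len(same)
    gamma_opp = sum(opp) / len(opp)
    return gamma_same, gamma_opp, gamma_same - gamma_opp


# Idealized fully separated event: positives along +y, negatives along -y.
phi_pos = [math.pi / 2] * 3
phi_neg = [3 * math.pi / 2] * 3
g_same, g_opp, d_gamma = gamma_correlator(phi_pos, phi_neg)
# Here g_same = -1, g_opp = +1, so delta_gamma = -2 in this convention.
```

A realistic event adds flow-driven and other correlated backgrounds on top of this idealized limit, which is why the measured $\Delta\gamma$ is much harder to interpret.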
Similarly, for the trained network, we define the relative difference $R_{\rm CNN}$ between the network outputs for the two classes, where the angle brackets denote an average over the testing data-set. The relative differences $R_\gamma$ and $R_{\rm CNN}$ both reflect the difference between the "no CS" and "CS" classes. In Fig. 4 (a), the γ-correlator from the AMPT simulation of Au+Au collisions at 200 GeV is shown over the centrality range (0, 60%), with experimental data also shown as triangles [42,43]. In Fig. 4 (b), the relative differences $R_\gamma$ and $R_{\rm CNN}$ are plotted at different centralities and show a clear contrast. $R_\gamma$ deteriorates with increasing centrality, while $R_{\rm CNN}$ remains good and robust at all centralities. This indicates that possible background contamination (increasing with centrality) may mask the γ-correlator while hardly disturbing the CME-meter.
Such contamination is potentially proportional to the elliptic flow $v_2$. In fact, previous studies reveal that $v_2$-driven backgrounds can strongly interfere with the CME signal in the γ-correlator because the magnetic field and $v_2$ have similar centrality dependence [44,45]. Thus, a $v_2$-induced $\Delta\gamma$ can emerge in both "CS" and "no CS" events, making it difficult for $\Delta\gamma$ to distinguish these two classes of events. To examine more closely whether the CME-meter is influenced by $v_2$, we depict $P_1$ versus $v_2/N$ ($N$ is the multiplicity) in [...] in isobaric collisions [46][47][48][49][50]. In our case, the network is trained to recognize the CS signal in Au+Au collisions, whereas its generalization ability is tested on Zr+Zr and Ru+Ru collisions, isobaric collision systems that were recently proposed specifically for the CME search. Although Zr and Ru are deformed nuclei whose masses are about half that of Au, the CME-meter built from the network trained on the Au+Au data-set clearly recognizes the CME signals in the isobaric collision systems, as shown in Fig. 6, where the quantity presented is $R_{\rm iso} = 2(\mathrm{logit}(P_1^{\rm Ru}) - \mathrm{logit}(P_1^{\rm Zr}))/(\mathrm{logit}(P_1^{\rm Ru}) + \mathrm{logit}(P_1^{\rm Zr}))$. It indicates a distinguishable difference between the two isobaric collision systems, with $P_1^{\rm Ru} > P_1^{\rm Zr}$ as measured by the CME-meter. Physically, this is because Ru has more protons, which may induce a larger magnetic field and thus a larger CS signal in Ru+Ru collisions. The results demonstrate that the CME-meter could offer an alternative way to measure the CS signal effectively in a range of collision systems, and that it remains robust under different test conditions. This robustness is largely due to a series of preprocessing operations inspired by physical insight, including normalization, symmetrization, and the boundary-condition treatment.
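The isobar ratio $R_{\rm iso}$ quoted above can be evaluated with a few lines. The sketch assumes the usual definition $\mathrm{logit}(P) = \ln[P/(1-P)]$, which the text does not spell out, and the two $P_1$ values are hypothetical placeholders rather than results from the paper.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1); assumed definition."""
    return math.log(p / (1.0 - p))


def relative_difference(a, b):
    """Symmetric relative difference 2 (a - b) / (a + b)."""
    return 2.0 * (a - b) / (a + b)


# Hypothetical event-averaged CME-meter outputs for the two isobars.
p1_ru, p1_zr = 0.90, 0.80
r_iso = relative_difference(logit(p1_ru), logit(p1_zr))
```

Working with the logit rather than with $P_1$ itself stretches the region near $P_1 = 1$, where the probabilities of the two systems would otherwise look almost identical.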
Interpretable deep learning for CME.-The prediction $P_1(\rho^{\pm}(p_T, \phi))$ from the well-trained network can also be understood as a CME-signal response to the spectrum $\rho^{\pm}(p_T, \phi)$, which can be used to find the most responsive spectrum via a variational treatment. Specifically, taking the pion spectrum as a variational Ansatz, we start from a flat spectrum $\rho^{\pm}(p_T, \phi) = 1/X$, with $X = 480$ the total number of pixels of the spectrum, which yields $P_1 = 0$, and gradually tune the functional form of $\rho^{\pm}(p_T, \phi)$ with the variational target of maximizing $P_1(\rho^{\pm}_0(p_T, \phi))$, that is, approaching $P_1 = 1$. Note that the trained CME-meter network is kept fixed; the gradient of its output with respect to its input, $\delta P_1(\rho^{\pm}(p_T, \phi))/\delta\rho^{\pm}(p_T, \phi)$, can be evaluated via back propagation and guides the spectrum tuning. The resultant "ground state" $\rho^{\pm}_0(p_T, \phi)$ can disclose the crucial patterns that manifest the CS signal from the perspective of the trained network. In machine learning language, the above procedure is the so-called DeepDream method [51], in which the variational tuning is implemented as a gradient ascent algorithm. In Fig. 7, the "ground state" pion spectrum obtained with the DeepDream method is visualized (see also Suppl. II). Although this spectrum may be neither real nor physical, it shows the "CME pattern" to which the network responds most strongly. A similar visualization method has also been demonstrated in image recognition tasks [52]. The spectrum $\rho^{\pm}_0(p_T, \phi)$ exhibits the following basic features:
• Charge conservation. During the variation, charge conservation is reasonably well preserved, as verified by tracking the charge density of the spectrum.
• Dipole structure. The "ground state" spectrum from the DeepDream variation intuitively displays the CS pattern. In the low-$p_T$ regime (center), the distribution of pions induces an electric current, or a downward dipole, whereas in the high-$p_T$ regime ($p_T \sim 3$ GeV) it presents a larger current in the opposite direction.
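The gradient-ascent tuning behind the DeepDream procedure can be sketched with a toy model. Everything specific here is an assumption for illustration: the trained CNN is replaced by a hypothetical logistic "meter" whose weights are $+\sin\phi$ for $\pi^+$ and $-\sin\phi$ for $\pi^-$, the gradient is written analytically instead of being obtained by back propagation, and no positivity constraint is imposed on the spectrum.

```python
import math

N_PT, N_PHI = 20, 24
PHI = [(j + 0.5) * 2.0 * math.pi / N_PHI for j in range(N_PHI)]
CHARGES = (+1, -1)

# Hypothetical fixed "meter": P1 = sigmoid(BIAS + sum_q sum_ij W[q][i][j] * rho[q][i][j]).
W = {q: [[q * math.sin(p) for p in PHI] for _ in range(N_PT)] for q in CHARGES}
BIAS = -4.0


def p1(rho):
    """Toy stand-in for the trained network's CS probability."""
    z = BIAS + sum(W[q][i][j] * rho[q][i][j]
                   for q in CHARGES for i in range(N_PT) for j in range(N_PHI))
    return 1.0 / (1.0 + math.exp(-z))


def net_charge(rho):
    """Total pi+ content minus total pi- content of the spectrum."""
    return sum(map(sum, rho[+1])) - sum(map(sum, rho[-1]))


def deep_dream(rho, steps=200, lr=0.01):
    """Gradient ascent on the input with the model held fixed.
    For this toy model dP1/drho = P1 * (1 - P1) * W. Mutates rho in place."""
    history = [p1(rho)]
    for _ in range(steps):
        p = history[-1]
        scale = lr * p * (1.0 - p)
        for q in CHARGES:
            for i in range(N_PT):
                for j in range(N_PHI):
                    rho[q][i][j] += scale * W[q][i][j]
        history.append(p1(rho))
    return rho, history


# Start from the flat spectrum 1/X with X = 480 pixels per charge.
flat = {q: [[1.0 / (N_PT * N_PHI)] * N_PHI for _ in range(N_PT)] for q in CHARGES}
dream, history = deep_dream(flat)
```

In this toy setting the response rises monotonically from the flat-spectrum value toward 1, the tuned $\pi^+$ spectrum piles up at $\sin\phi > 0$, and `net_charge` stays at its initial value, mirroring the kind of tracking mentioned in the list above. With a real CNN the gradient would come from automatic differentiation with respect to the input.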
It should be mentioned that the $\rho^{\pm}_0(p_T, \phi)$ derived from DeepDream is a virtual spectrum whose local properties depend on the AMPT simulation and the well-trained neural networks. Nevertheless, it offers a reliable way to evaluate the effectiveness of $P_1$ in detecting CME and helps us reveal the physical content the machine learns.
Summary.-In this Letter, we propose a deep convolutional neural network (CNN) model to detect the CME signal in a simulated data-set from a multiphase transport model. With two different charge separation fractions (5% and 10%), the machine is trained under supervision to recognize the CME signal. The well-trained machine provides a powerful meter to quantify the CME at different collision energies and centralities. The meter also shows robust performance at different charge separation fractions. In comparison with the conventional γ-correlator, the CME-meter remains insensitive to the backgrounds dominated by the elliptic flow $v_2$. We also extend the well-trained machine to other collision systems by means of transfer learning. Remarkably, the meter successfully recognizes the CME signals in Zr+Zr and Ru+Ru collisions and the difference between them. This indicates that the knowledge of identifying the CME signal in Au+Au collisions can be transferred to detecting CME in other collision systems. Finally, DeepDream, a method used to visualize the patterns learned by CNNs, is applied as a validation test of adopting $P_1$ to detect the CME. It helps us extract the physical knowledge hidden in the well-trained machine, including charge conservation and a special charge distribution.

Although the (0%+5%) model does not outperform the (0%+10%) model, as shown in Fig. 8, the situation reverses in cases with diverse CS fractions. Figure 9 shows that the (0%+5%) model performs better than the (0%+10%) one, achieving higher accuracy on the testing data-sets. The (0%+5%) model could ameliorate the over-fitting problem, which is caused by modeling the random noise in each event rather than the intended outputs [53]. This endows the (0%+5%) model with a stronger generalization ability. However, the (0%+5%) model captures the CS signal so sensitively that it can overreact, and it eventually yields predictions without enough resolution to separate the different cases.

DeepDream
DeepDream modifies an image by gradient ascent to increase the activation of certain patterns. Figure 10 shows the DeepDream result revealing the patterns of the CS signal. The upper row of polar plots, colored blue, shows the spectra of $\pi^-$, and the lower row, colored orange, shows those of $\pi^+$. The two spectra in a column form one sample that is fed to the network and then updated with the gradient fed back by the network. From left to right, the DeepDream method gradually strengthens the activation of the "CS" class, thus visualizing the knowledge, or physical content, learned in the training.