Deep Learning Exotic Hadrons

We perform the first model-independent analysis of experimental data using deep neural networks to determine the nature of an exotic hadron. Specifically, we study the line shape of the $P_c(4312)$ signal reported by the LHCb collaboration and find that its most likely interpretation is that of a virtual state. This method can be applied to other near-threshold resonance candidates.


I. INTRODUCTION
Many hadron candidates that deviate from quark model expectations [1] have been discovered in recent years [2,3]. The field of hadron spectroscopy has flourished in the attempt to provide a comprehensive picture of the new states. Many different approaches have been proposed to explain their underlying nature, and the field has become a playground for testing new techniques and novel physical interpretations [4-7].
To determine if an experimental signal corresponds to a hadron resonance, it is necessary to perform an amplitude analysis in order to extract its physical properties such as mass, width, couplings, and quantum numbers. Most of the data analyses follow a top-down approach, where the amplitudes are derived from a microscopic model. The advantage is that it assigns a physical interpretation to the signal. The caveat is that the results are biased by the assumed dynamics. Another possibility is to proceed in a bottom-up approach. By considering a number of minimally-biased amplitudes compatible with physical principles and fitting them to data, one can determine the existence and properties of resonances in the least model-dependent way. Even though in this approach there is no assumed microscopic model, it is still possible to deduce the nature of the underlying dynamics from the analytic properties of the amplitudes.
For example, both methods were recently used to provide an interpretation of the $P_c(4312)$ signal found by LHCb in the $\Lambda_b^0 \to K^- J/\psi p$ decay [8]. This measurement is of particular interest because, if the signal is due to a resonance, that resonance would contain five valence quarks, which goes beyond the conventional three-quark picture of baryons. The signal peaks approximately 5 MeV below the $\Sigma_c^+ \bar{D}^0$ threshold, making it a primary candidate for a hadron molecule. Near-threshold enhancements of the cross section are known phenomena in particle physics, e.g., the weakly bound deuteron in proton-neutron scattering. The molecular interpretation was found to be compatible with the data in [9]. Another microscopic interpretation is that the signal is a kinematical effect generated by particle rescattering [10]. The $P_c(4312)$ pole position was first obtained in [11] following a bottom-up approach, favoring a virtual state interpretation, i.e., an attractive interaction that is not strong enough to bind a state, as happens, for example, in neutron-neutron scattering [12].
The evolution of computing capabilities during the last decades has made it possible to develop and employ powerful numerical techniques to unravel the structure of matter, with machine learning acquiring a prominent role. In theoretical hadron physics, techniques such as genetic algorithms [13] and neural networks [14] have been exploited as fitters and/or interpolators. Recently, the idea of using deep neural networks (DNNs) as model classifiers was benchmarked against the well-known nucleon-nucleon bound state [15] and pion-nucleon resonances [16].
In this work we develop and benchmark a systematic approach to apply DNNs as a model-independent tool to analyze and interpret experimental data. Following the bottom-up strategy, we construct generic amplitudes to train the DNNs. We then use them as model classifiers to infer the physical content of the data. One clear advantage compared to standard $\chi^2$ fits is that DNNs determine the probability of each physical interpretation, as they learn the subtle classification boundary between them. We teach DNNs how to recognize these different phases by targeting the specific regions of the parameter space (which yield stable solutions) that might be difficult to reach during optimization, or might require high-resolution spectra to detect. As a result, we no longer need to explore large parameter spaces, but can use DNNs to efficiently extract the information from the spectra that determines on which side of the boundary we are located. As a proof of concept, we apply this method to the $P_c(4312)$ signal.

II. PHYSICS BASIS FOR THE NEURAL NETWORK
We focus on the $J/\psi p$ invariant mass distribution reported by LHCb in [8]. This can be parametrized as in Eq. (1) [11,17], where $s$ is the $J/\psi p$ invariant mass squared, $B(s)$ and $P(s)$ are smooth functions, and $\rho(s)$ is the three-body phase space. The amplitude $T(s)$ encodes the dynamics of the $J/\psi p$ rescattering and, in particular, contains the details of the $P_c(4312)$. Close to the $\Sigma_c^+ \bar{D}^0$ threshold, it can be expanded as in Eq. (2). This Taylor expansion could originate from any microscopic model. The question is its range of validity. If other singularities were present close by, the expansion would break down in this region. The impact of triangle singularities was estimated to be small in [8], and so was the relevance of higher-order terms in the expansion [11]. It should be noted, though, that if other singularities were close enough to have an impact, then the model that describes them could be included in the training set of the DNN. Here we mean to benchmark the approach by comparing to the known result in [11] as described by Eq. (2). This function can be analytically continued to complex values of $s$. Since the square roots are multivalued, the amplitude maps onto four Riemann sheets, represented in Fig. 1. Because Eq. (1) is based on an expansion around the $\Sigma_c^+ \bar{D}^0$ threshold, it is only reliable in its vicinity. We thus have to ensure that the DNN learns from the appropriate invariant mass window.
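For reference, a form of Eqs. (1) and (2) consistent with the symbols defined above can be sketched as follows. This assumes a two-channel scattering-length parametrization of the kind used in [11]. The channel momenta $k_i$ and threshold values $s_i$ are notation introduced here for illustration, with channel 1 taken to be $J/\psi p$ and channel 2 taken to be $\Sigma_c^+ \bar{D}^0$.

```latex
\begin{align}
\frac{dN}{d\sqrt{s}} &= \rho(s)\left[\,|P(s)\,T(s)|^2 + B(s)\,\right], \tag{1}\\
T(s) &= \frac{m_{22} - i k_2}{(m_{11} - i k_1)(m_{22} - i k_2) - m_{12}^2}, \tag{2}\\
k_i &= \sqrt{s - s_i}, \qquad
s_1 = (m_{J/\psi} + m_p)^2, \quad
s_2 = (m_{\Sigma_c^+} + m_{\bar{D}^0})^2 . \notag
\end{align}
```

In a form like this, each of the two square roots $k_1$ and $k_2$ has two branches, which is what produces four Riemann sheets, and $m_{12}$ controls the coupling between the two channels.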
If $m_{12} \to 0$, then the $\Sigma_c^+ \bar{D}^0$ channel decouples from the $J/\psi p$ one. In this limit, the $P_c(4312)$ pole would become either a stable bound state or a virtual threshold enhancement, depending on whether the pole approaches the positive or negative $\mathrm{Im}\,k_2$ axis, as shown in Fig. 1. This is controlled by the sign of the $m_{22}$ parameter: if it is positive (negative), then the resonance corresponds to a virtual (bound) state. From the figure one can also appreciate that poles on the II (IV) sheet are more likely to be bound (virtual) states, as the sheet borders the positive (negative) semiaxis.
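The role of the sign of $m_{22}$ can be made explicit with a short worked step. This is a sketch that assumes the amplitude denominator has the two-channel form $(m_{11} - i k_1)(m_{22} - i k_2) - m_{12}^2$, with $k_2$ the momentum in the $\Sigma_c^+ \bar{D}^0$ channel. That form is an assumption of this sketch, not something spelled out in the text above.

```latex
\begin{align*}
m_{12} \to 0:\quad & (m_{11} - i k_1)(m_{22} - i k_2) = 0
  \;\Longrightarrow\; k_2^{\text{pole}} = -\,i\, m_{22},\\
m_{22} > 0:\quad & \operatorname{Im} k_2^{\text{pole}} = -m_{22} < 0
  \quad \text{(virtual state)},\\
m_{22} < 0:\quad & \operatorname{Im} k_2^{\text{pole}} = -m_{22} > 0
  \quad \text{(bound state)}.
\end{align*}
```

Under this assumption the decoupled pole sits on the imaginary $k_2$ axis in both cases, and only the sign of $\operatorname{Im} k_2$, fixed by the sign of $m_{22}$, distinguishes a bound from a virtual state.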
We construct a training dataset of $10^5$ line shapes, generated by evaluating Eq. (1) for intensity parameters uniformly sampled within a wide range of values (see Supplemental Material [19] for details). We then obtain 65 intensity values by evaluating the line shape in the [4.251, 4.379] GeV invariant mass region in 2 MeV bins. The intensity line shapes are convolved with the experimental resolution, and 5% Gaussian noise is added to the signal, so that the statistical uncertainties resemble the ones reported in Ref. [8]. An additional validation set is generated to monitor the generalization performance of the model during training. To each line shape we attach a label according to the bound/virtual nature and the Riemann sheet where the pole lies, i.e., b|2, b|4, v|2, and v|4. Figure 1 shows examples of (noiseless) training line shapes for each class.
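The binning and noise steps above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: `toy_line_shape` is a hypothetical stand-in for Eq. (1) (a smooth bump on a flat background), the resolution convolution is omitted, the class label string is a placeholder, and the 5% noise is assumed to be Gaussian and relative to each bin's intensity.

```python
import random

# Invariant-mass grid: [4.251, 4.379] GeV in 2 MeV steps gives 65 points.
M_MIN_MEV, M_MAX_MEV, STEP_MEV = 4251, 4379, 2
MASSES_GEV = [m / 1000.0 for m in range(M_MIN_MEV, M_MAX_MEV + 1, STEP_MEV)]


def toy_line_shape(mass_gev, peak=4.312, width=0.010):
    """Hypothetical stand-in for Eq. (1): a smooth bump on a flat background."""
    return 1.0 + 0.5 / (1.0 + ((mass_gev - peak) / width) ** 2)


def add_relative_noise(intensities, rel_sigma=0.05, rng=random):
    """Smear each bin with Gaussian noise of width rel_sigma * intensity
    (assumed noise model)."""
    return [rng.gauss(y, rel_sigma * abs(y)) for y in intensities]


def make_example(label, rng=random):
    """Return one (noisy line shape, class label) training pair."""
    clean = [toy_line_shape(m) for m in MASSES_GEV]
    return add_relative_noise(clean, rng=rng), label


rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy, label = make_example("virtual, sheet IV", rng=rng)
print(len(MASSES_GEV), len(noisy), label)
```

In a full pipeline the toy function would be replaced by the physical line shape evaluated for randomly sampled parameters, and the label would be derived from the pole position of each sampled amplitude.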

III. TRAINING, VALIDATION, AND INVARIANT MASS WINDOW
We build a deep neural network using the PyTorch framework [20] that takes (noisy) line shapes as input and predicts the corresponding class in {b|2, b|4, v|2, v|4}. For optimization purposes, the line shapes are first rescaled to lie between 0 and 1 before we feed them into the network. The deep neural network consists of an input layer with as many nodes as there are energy bins, followed by two fully connected hidden layers with 400 and 200 nodes, respectively, and finally an output layer with four nodes that correspond to the four classes. After each hidden layer we set a dropout probability that randomly sets nodes to zero, to improve generalization performance. The deep neural network is trained using the Adam optimizer [21]. The details of the procedure are given in the Supplemental Material [19]. We train the DNN for 100 epochs, i.e., 100 passes of the full training dataset through the DNN. Figure 2 shows the training and validation set accuracy for different levels of Gaussian noise, as well as the confusion matrix for the case of 5% noise. This shows how the experimental uncertainty limits the accuracy, as expected. With this setup, the DNN learns the subtleties of the intensity line shapes associated with each one of the four different resonant pole structures. However, in order to obtain our final DNN classifier, we need to select the appropriate invariant mass window around the $P_c(4312)$ signal, where we allow the DNN to attribute importance. We introduce a systematic method based on Shapley additive explanations (SHAP) values [22] to select a proper window. Using SHAP values, we can break down a prediction to show how each bin impacts the classification. Therefore, we first train a deep neural network on a wide range of invariant masses. Large SHAP values imply a large impact of a given mass bin on the classification, as shown in Fig. 3. The mass interval used here is the same as in [11].
This choice is confirmed by the SHAP values analysis shown in Fig. 3. However, we checked that the results are qualitatively unchanged even if a wider window is selected.
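The architecture described in this section can be sketched in PyTorch as follows. This is a minimal sketch, not the authors' code: the ReLU activations, the dropout probability of 0.2, the learning rate, the cross-entropy loss, and the helper names `build_classifier` and `min_max_rescale` are assumptions made for illustration, since the text defers those details to the Supplemental Material [19].

```python
import torch
from torch import nn

N_BINS = 65     # one input node per energy bin
N_CLASSES = 4   # bound/virtual x Riemann sheet II/IV


def build_classifier(n_bins=N_BINS, p_drop=0.2):
    """Two fully connected hidden layers (400, 200), dropout after each."""
    return nn.Sequential(
        nn.Linear(n_bins, 400), nn.ReLU(), nn.Dropout(p_drop),
        nn.Linear(400, 200), nn.ReLU(), nn.Dropout(p_drop),
        nn.Linear(200, N_CLASSES),  # raw class scores (logits)
    )


def min_max_rescale(x):
    """Rescale each line shape (row) to the range [0, 1]."""
    lo = x.min(dim=1, keepdim=True).values
    hi = x.max(dim=1, keepdim=True).values
    return (x - lo) / (hi - lo)


torch.manual_seed(0)
model = build_classifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is an assumed value
loss_fn = nn.CrossEntropyLoss()  # assumed loss for 4-class classification

# One optimization step on random stand-in data (not physical line shapes).
x = min_max_rescale(torch.rand(32, N_BINS))
y = torch.randint(0, N_CLASSES, (32,))
model.train()
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()

model.eval()  # disables dropout for prediction
with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)  # class probabilities per line shape
print(probs.shape)
```

Training for 100 epochs would repeat the optimization step over mini-batches of the full training set 100 times, with the validation set evaluated in `eval()` mode after each epoch.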

IV. SIGNAL ANALYSIS
We are now in a position to generate predictions on the nature of the pole from the actual experimental LHCb data. We pass the three datasets from [8] through the DNN. Recall that one is the original $\Lambda_b^0 \to K^- J/\psi p$ dataset, while the other two have sharp or smooth cuts that suppress the background from $\Lambda^*$ resonances. The output probabilities for each class are summarized in Table I. It is apparent that the virtual interpretation is strongly favored, specifically the v|4 class. To properly quantify the uncertainty of this prediction, we use two Monte Carlo based methods: bootstrap [23,24] and dropout [25]. Both methods aim at producing probability densities for the generated predictions on the LHCb data, as detailed in the Supplemental Material [19], and yield the same conclusions. The probability densities of the four classes are shown in Fig. 4. Class v|4 is heavily preferred, while b|4 is strongly rejected. Classes b|2 and v|2 attain low probabilities, in particular for the datasets with background rejection. Hence, we conclude that a virtual state with its pole on the IV Riemann sheet is the strongly preferred interpretation of the $P_c(4312)$ signal.
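One way to picture the bootstrap step is the following sketch. It assumes the bootstrap is done by resampling each data point from a Gaussian centered on its measured value with its reported uncertainty, and then classifying every pseudo-dataset. The classifier (`toy_classifier`) and the input numbers are hypothetical placeholders, not the trained DNN or LHCb data.

```python
import math
import random
import statistics


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_classifier(spectrum):
    """Hypothetical stand-in for the trained DNN: returns 4 class probabilities."""
    mean = sum(spectrum) / len(spectrum)
    return softmax([mean * w for w in (0.5, -1.0, 1.0, 2.0)])


def bootstrap_probabilities(values, errors, classify, n_resamples=1000, seed=0):
    """Resample the data within uncertainties and collect class probabilities."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_resamples):
        pseudo = [rng.gauss(v, e) for v, e in zip(values, errors)]
        samples.append(classify(pseudo))
    return samples


# Placeholder spectrum: 65 bins with 5% uncertainties (not experimental data).
values = [1.0 + 0.01 * i for i in range(65)]
errors = [0.05 * v for v in values]
samples = bootstrap_probabilities(values, errors, toy_classifier)

# Mean and spread of the predicted probability for each of the four classes.
for k in range(4):
    column = [s[k] for s in samples]
    print(k, round(statistics.mean(column), 3), round(statistics.stdev(column), 3))
```

The distribution of each class probability over the resamples is what would be shown as a probability density. A dropout-based estimate, as commonly done in Monte Carlo dropout, would instead keep the data fixed and leave dropout active at prediction time, so that repeated forward passes sample different subnetworks.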
The DNN classifier can provide further information on which invariant mass region contributes most strongly to this prediction, by repeating the SHAP analysis for the experimental data, as shown in Fig. 5. It is apparent that the region close to threshold determines the DNN classification. Slightly above the $\Sigma_c^+ \bar{D}^0$ threshold, the data favor the v|4 class while rejecting the v|2 one. Below threshold, the v|4 and v|2 classes are preferred to b|2, and b|4 is rejected.
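To illustrate what a per-bin attribution means, the sketch below computes exact Shapley values for a toy model with three input bins. This is not the SHAP library and not the trained DNN: `toy_score`, the baseline, and the input values are hypothetical, and a "missing" bin is modeled by replacing it with its baseline value. SHAP [22] approximates this kind of attribution for models with many inputs, where the exact sum over subsets is too expensive.

```python
from itertools import combinations
from math import factorial


def toy_score(x):
    """Hypothetical class score depending on three mass bins."""
    return 2.0 * x[0] + x[1] * x[2]


def shapley_values(f, x, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, x, baseline)
print(phi)
```

Up to rounding, the attributions add up to the difference between the score at `x` and at the baseline, so each value can be read as the share of the prediction carried by one bin. A positive value pushes toward the class and a negative value pushes away from it.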

V. CONCLUSIONS
We presented a proof of concept of how machine learning can be used to further our understanding of exotic hadrons. We trained a neural network to learn the details of line shapes corresponding to different resonance interpretations, based on an effective range expansion of the amplitude close to the relevant threshold. We applied this method to determine the nature of the $P_c(4312)$ signal seen by LHCb, and determined the probability of each of the classes of interest, given the experimental uncertainties and resolution. The DNN classifier strongly favors a virtual state interpretation, i.e., a signal generated by an attractive force not strong enough to form a bound state, thereby confirming the findings in Ref. [11]. By adding SHAP value analyses, we studied how each data point impacts the selection. This technique also allows one to assess which set of physical variables (in this case the energy range) is relevant to a specific hypothesis, which in standard approaches is often a heuristic guess. Our technique can be directly applied to other (non)exotic signals close to a threshold opening.
We foresee various follow-ups. For example, one can reuse the parts of DNN classifiers that generate line shape representations (i.e., the parameters in the first layers) and reapply them to new data (so-called "transfer learning") to obtain general resonance classifiers across scattering channels, and to predict which physics underlies other reaction data.