Classifying Pole of Amplitude Using Deep Neural Network

Most of the exotic resonances observed in the past decade appear as peak structures near some threshold. These near-threshold phenomena can be interpreted as genuine resonant states or enhanced threshold cusps. Apparently, there is no straightforward way of distinguishing the two structures. In this work, we employ the strength of a deep feed-forward neural network in classifying objects with nearly identical features. We construct a neural network model with the scattering amplitude as input and the nature of the pole causing the enhancement as output. The training data is generated by an S-matrix satisfying the unitarity and analyticity requirements. Using the separable potential model, we generate a validation dataset to measure the network's predictive power. We find that our trained neural network model gives high accuracy when the cut-off parameter of the validation data is within $400$-$800\mbox{ MeV}$. As a final test, we use the Nijmegen partial wave and potential models for nucleon-nucleon scattering and show that the network gives the correct nature of the pole.


I. INTRODUCTION
Renewed interest in hadron spectroscopy started after the discovery of X(3872) in 2003 [1]. Since then, several candidates for nonstandard exotic hadrons have been proposed. One common feature of these phenomena is that they manifest as a sharp peak structure near some threshold [2]. The proximity of an enhancement to the threshold allows several possible origins of the peak. One of the appealing possibilities is a weakly bound hadronic molecule composed of two hadrons [3,4], which can be associated with the presence of a pole near the two-particle threshold. Other possibilities are purely kinematical in nature, such as cusps or triangle singularities [5]. On the one hand, a threshold cusp is always present in s-wave scattering whenever an inelastic channel opens. However, it has been shown in [6][7][8][9] that a threshold cusp can only produce a significant enhancement provided that there is some near-threshold pole, even if it is not located in the relevant region of the unphysical sheet. On the other hand, a triangle singularity does not need a nearby pole to produce a pronounced enhancement but instead requires that the intermediate particles be simultaneously on-shell [5].
The purpose of this paper is to address the origin of the sharp peak observed around the threshold of two-body hadron scattering problems. We specifically focus on the case where a near-threshold pole causes the peak structure and attempt to identify its nature, i.e. whether it is a bound, resonance or virtual state pole. Until now, there has not been a method to distinguish the pole origin of a peak structure around the threshold. In general, this is a difficult program because of the limited resolution of experimental data.
Here, we treat the identification of the nature of the pole causing the enhancement as a classification task [10] and solve it using supervised machine learning. The machine learning technique is ubiquitous even in the physical sciences [11] and it is well known that deep neural networks excel in solving classification tasks. In this work we demonstrate how a deep neural network can be applied to identify the pole origin of a cross-section enhancement. This includes defining the appropriate input-output data, setting up the network architecture and generating the training dataset. As a first effort to apply deep learning to the classification of the pole causing a cross-section enhancement, we only consider single-channel scattering here.
* sombillo@rcnp.osaka-u.ac.jp
This paper is organized as follows. In section II we give a short background on how a neural network works. One of the crucial parts of deep learning is the preparation of the dataset. In section III we describe how the training dataset is generated using the general properties of the S-matrix. The performance of our neural network model using the training dataset is discussed in section IV. In section V we explore the applicability of our trained network using a separable potential model to generate a validation dataset. We also use the partial wave and potential models of the Nijmegen group as a final test in the same section. Finally, we formulate our conclusion in section VI.

II. DEEP NEURAL NETWORK FOR POLE CLASSIFICATION
We briefly review the basic operation in deep learning [12] and discuss how it can be applied to the pole classification problem. A neural network consists of an input layer, hidden layers and an output layer, where each layer contains a certain number of nodes. We use the term deep neural network for architectures having more than one hidden layer. Fig. 1 shows the deep neural network setup that we used in this study. The nodes $x_i$ in the input layer contain numerical values describing certain features of the input data, while nodes that are not in the input layer are equipped with activation functions with range (0, 1) or (0, ∞), whichever is applicable. The nodes in the $(\ell-1)$th layer are sent to each $\ell$th layer node by putting them in a linear combination
$$z_i^{(\ell)} = \sum_j w_{ij}\, h_j^{(\ell-1)} + b^{(\ell-1)}, \tag{1}$$
where $z_i^{(\ell)}$ is the $i$th node pre-activation value in the $\ell$th layer, $h_j^{(\ell-1)}$ is the $j$th node post-activation value of the $(\ell-1)$th layer, $w_{ij}$ is the weight connecting the $j$th node of the $(\ell-1)$th layer to the $i$th node of the $\ell$th layer and $b^{(\ell-1)}$ is the bias in the $(\ell-1)$th layer. In this notation, input nodes are represented as $h_i^{(0)} = x_i$. The pre-activation value $z_i^{(\ell)}$ is fed to the activation function, denoted here by $\sigma$, to get the node's post-activation value:
$$h_i^{(\ell)} = \sigma\left(z_i^{(\ell)}\right). \tag{2}$$
This arrangement of layers and nodes together with the choice of activation functions allows the neural network to build a nonlinear mapping of the input vector $\vec{x}$ to the output vector $\vec{y}$. The goal of deep learning is to find an optimal mapping between $\vec{x}$ and $\vec{y}$. To do this, one has to prepare a training dataset containing inputs with known outputs. Initially, some random weights and biases are assigned to the neural network. Then we perform a forward pass, i.e. we feed all the training inputs and let the network calculate all the outputs. The average difference between the true output and the network's output defines the cost function $C(\hat{w}, \vec{b})$, where $\hat{w}$ and $\vec{b}$ are the initial weight matrix and bias vector, respectively. The weights and biases are updated using the gradient descent method via backpropagation [13].
One forward pass together with one backpropagation of the entire training dataset comprises one epoch of training. Several epochs are normally executed to update the weights and biases until the cost function reaches its minimum. The neural network architecture with its updated weights and biases corresponds to the optimal map that we seek.
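The layer-by-layer operation described above can be sketched in a few lines of plain Python. This is only an illustration: the function names `dense` and `forward`, the use of one bias entry per receiving node, and the toy weights are assumptions made for the example and are not taken from the implementation used in this work.

```python
def relu(z):
    """Rectified linear unit: returns z for z > 0 and 0 otherwise."""
    return max(0.0, z)


def identity(z):
    """Trivial activation, used here only for the toy output node."""
    return z


def dense(h_prev, weights, biases, activation):
    """One layer: z_i = sum_j w_ij * h_j + b_i, then h_i = activation(z_i).

    Assumption: one bias entry per receiving node (a common convention).
    """
    return [
        activation(sum(w * h for w, h in zip(row, h_prev)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Forward pass: feed the input vector x through every layer in turn."""
    h = list(x)  # the input nodes play the role of h^(0)
    for weights, biases, activation in layers:
        h = dense(h, weights, biases, activation)
    return h


# Toy network (hypothetical numbers): 2 inputs -> 2 hidden (ReLU) -> 1 output
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0], relu),
    ([[2.0, 1.0]], [0.5], identity),
]
toy_output = forward([1.0, 2.0], toy_layers)
```

Training then amounts to repeating such forward passes and adjusting the entries of `weights` and `biases` by gradient descent, which is left out of this sketch.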
In this study, we construct a deep neural network with the cross-section of two-body scattering, $|f(E_{cm})|^2$, on a discretized center-of-mass energy axis [0, 100 MeV] with 0.5 MeV spacing as input and a vector with three elements as output. The output nodes correspond to three distinct pole classifications, i.e. bound state, virtual state or resonance, as shown in Fig.1. The classification of the pole is described as follows. Suppose $p_0$ represents the pole position on the complex momentum plane $\mathbb{C}$; then we say that it is a bound state pole if $p_0$ is positive pure imaginary. If $\mathrm{Im}\,p_0 < 0$ and $|\mathrm{Im}\,p_0| > |\mathrm{Re}\,p_0|$, then $p_0$ is a virtual state pole. Otherwise, if $|\mathrm{Im}\,p_0| < |\mathrm{Re}\,p_0|$ we call it a resonance (see the Appendix of [14] for a detailed explanation).
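The classification rule just stated translates directly into a small function. The sketch below is illustrative only: the name `classify_pole`, the tolerance `tol` used to decide whether a pole sits on the imaginary axis, and the explicit requirement that a resonance lies in the lower half plane are assumptions added for the example.

```python
def classify_pole(p0, tol=1e-9):
    """Classify a pole position p0 (complex momentum, e.g. in MeV).

    bound     : p0 on the positive imaginary axis
    virtual   : Im p0 < 0 and |Im p0| > |Re p0|
    resonance : Im p0 < 0 and |Im p0| < |Re p0|
                (the lower-half-plane condition is an assumption made here)
    """
    re, im = p0.real, p0.imag
    if abs(re) <= tol and im > 0:
        return "bound"
    if im < 0 and abs(im) > abs(re):
        return "virtual"
    if im < 0 and abs(im) < abs(re):
        return "resonance"
    return "unclassified"


examples = {
    "shallow bound state": classify_pole(30j),
    "virtual state": classify_pole(-30j),
    "virtual state with width": classify_pole(20 - 100j),
    "resonance": classify_pole(100 - 20j),
}
```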
To obtain the optimal values of weights and biases, the network must be trained using a dataset of cross-section with known enhancement origin. This will be explained in the next section.

A. General Properties of S-Matrix
Ideally, a reliable neural network model that can distinguish the nature of the pole responsible for the cross-section enhancement must be optimized using a training dataset generated from an exact S-matrix. However, such an S-matrix cannot be derived from QCD, the fundamental theory of the strong interaction, for hadrons due to its nonperturbative nature. In such a situation, we can still deduce the general form of the S-matrix using the analyticity and unitarity requirements [15][16][17].
Consider the s-wave scattering of two particles with masses $m_1$ and $m_2$, reduced mass $\mu = m_1 m_2/(m_1 + m_2)$ and relative momentum of magnitude $p$. The S-matrix can be parametrized as
$$S(p) = \frac{1 - i\mu p K(p)}{1 + i\mu p K(p)}, \tag{3}$$
satisfying unitarity provided that $K(p)$ is the real-valued K-matrix [18][19][20]. At energies near the location of the K-matrix pole $M'$, we can write $K = g'^2/(E - M') + c$, where $E = E_1 + E_2$ with $E_i$ as the energy of particle $m_i$, and $g'$, $c$ are real. Analyticity and $K(-p) = K(p)$ are satisfied in the non-relativistic case, i.e. $E = p^2/(2\mu)$, by the parametrization
$$K(p) = \frac{g^2}{p^2 - M} + c, \tag{4}$$
where $g^2 = 2\mu g'^2$ and $M = 2\mu(M' - m_1 - m_2)$. From the S-matrix in (3), one can obtain the partial wave amplitude using the relation
$$f(p) = \frac{S(p) - 1}{2ip}. \tag{5}$$
Consider now how the K-matrix parameters dictate the singularities of the S-matrix in (3). If we substitute $K(p)$ into $S(p)$, we get
$$S(p) = \frac{(p^2 - M) - i\mu p\left[g^2 + c\,(p^2 - M)\right]}{(p^2 - M) + i\mu p\left[g^2 + c\,(p^2 - M)\right]}, \tag{6}$$
and the pole position is obtained from
$$i\mu c\,p^3 + p^2 + i\mu\,(g^2 - cM)\,p - M = 0. \tag{7}$$
Taking the complex conjugate of (7) and knowing that $\mu$, $g^2$, $c$ and $M$ are real, we recover the same equation as that for $p$, i.e. $-p^*$ satisfies the same cubic equation. This means that the denominator of $S(p)$ in (6) contains a factor $(p + i\beta)^2 - \alpha^2$, which gives a conjugate pair of poles with real $\alpha$, $\beta$. The third unpaired solution to (7) must have the property $p = -p^*$. This can only be true if $p$ is pure imaginary. In fact it is possible that all the solutions of (7) are pure imaginary. It follows that we can write (6) in factored form as
$$S(p) = \left(-\frac{p + i\gamma}{p - i\gamma}\right)\left[\frac{(p - i\beta)^2 - \alpha^2}{(p + i\beta)^2 - \alpha^2}\right], \tag{8}$$
where $\alpha$, $\beta$, $\gamma$ are real numbers that are related to the $g^2$, $M$ and $c$ parameters. For $c = 0$ we only have a pair of conjugate poles given by
$$p = -i\,\frac{\mu g^2}{2} \pm \sqrt{M - \left(\frac{\mu g^2}{2}\right)^2}, \tag{9}$$
and we readily identify $\beta = \mu g^2/2$ and $\alpha = \sqrt{M - (\mu g^2/2)^2}$. Note that $\beta > 0$ is required to avoid having S-matrix poles on the upper half momentum plane (other than on the imaginary axis), otherwise causality is violated [15,21]. For $c \neq 0$, a third imaginary pole $i\gamma$ appears and $\alpha$, $\beta$ are modified according to (10), which is obtained by comparing the expansion of the denominator of (8) with that of (7).
A dimensionless quantity $\xi$ is introduced to facilitate the comparison. For given values of $\mu$, $g^2$ and $M$, $\xi$ is an implicit function of $c$, with $\xi \to 1$ as $c \to 0$ or $c \to \pm\infty$ (see Fig.2). The bounded $\xi = \xi(c)$ implies that the third pole $i\gamma$ will originate from $\pm i\infty$ as $c$ becomes nonzero. For $c > 0$, we can generate a simple pole at $p_0 = i\gamma$ on the upper half momentum plane, and if we let $c \to +\infty$, this pole gets very close to the threshold. This corresponds to a bound state in accordance with the completeness relation in [22]. Now, as we vary $c$ from zero to some negative value, the poles redistribute themselves as shown in Fig.3. Here, we see an instance when all three poles are pure imaginary, and at some finite value of $c$, two of the poles merge and turn into a conjugate pair as seen in Fig.3(a) and (b). The merging of poles happens at some $c < 0$ where the slope of $\xi$ becomes infinite, as shown in Fig.2. This demonstrates that the constant term in (4) is capable of generating an S-matrix pole and should not be treated as background (see also [23]).
The conjugate pair of poles in (8) will always have β > 0 for all values of c. For c → 0, ξ approaches unity and we recover (9) with β > 0. Also, as c → +∞, (10) gives a positive β since 0 < ξ < 1. Finally, if c < 0 we see from demonstrating that causality is not violated for all values of c.
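The pole structure discussed in this subsection can be explored numerically. The sketch below is a hedged illustration, not the procedure used to produce the figures: it assumes that the poles are the zeros of $1 + i\mu p K(p)$ with $K(p) = g^2/(p^2 - M) + c$, which leads to the cubic $i\mu c\,p^3 + p^2 + i\mu(g^2 - cM)p - M = 0$. Writing $p = iy$ turns this into a real cubic in $y$, so one pure-imaginary pole can be found by bisection and the remaining two follow from the deflated quadratic. The function name `smatrix_poles` and the numerical parameter values are hypothetical.

```python
import cmath


def smatrix_poles(mu, g2, M, c):
    """Poles of S(p), assuming they solve 1 + i*mu*p*K(p) = 0
    with K(p) = g2 / (p**2 - M) + c (an assumption of this sketch)."""
    if c == 0.0:
        # Quadratic case: (p + i*beta)^2 = alpha^2
        beta = mu * g2 / 2.0
        alpha = cmath.sqrt(M - beta ** 2)
        return [-1j * beta + alpha, -1j * beta - alpha]

    # With p = i*y the cubic becomes real: a*y^3 - y^2 - b*y - M = 0
    a = mu * c
    b = mu * (g2 - c * M)

    def cubic(y):
        return a * y ** 3 - y ** 2 - b * y - M

    # Bracket one real root by widening a symmetric interval, then bisect
    lo, hi = -1.0, 1.0
    while cubic(lo) * cubic(hi) > 0:
        lo, hi = 2.0 * lo, 2.0 * hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cubic(lo) * cubic(mid) <= 0:
            hi = mid
        else:
            lo = mid
    y0 = 0.5 * (lo + hi)

    # Deflate: cubic(y) = (y - y0) * (a*y^2 + q1*y + q0)
    q1 = a * y0 - 1.0
    q0 = q1 * y0 - b
    disc = cmath.sqrt(q1 ** 2 - 4.0 * a * q0)
    y1 = (-q1 + disc) / (2.0 * a)
    y2 = (-q1 - disc) / (2.0 * a)
    return [1j * y0, 1j * y1, 1j * y2]


# Hypothetical parameter values (mu in MeV, M in MeV^2, c in MeV^-2)
mu, g2, M = 469.5, 0.2, 1.0e4
poles_no_c = smatrix_poles(mu, g2, M, 0.0)       # conjugate pair only
poles_pos_c = smatrix_poles(mu, g2, M, 5.0e-5)   # extra pole on the upper imaginary axis
poles_neg_c = smatrix_poles(mu, g2, M, -5.0e-5)  # extra pole on the lower imaginary axis
```

For these sample values the paired poles stay in the lower half plane for all three choices of `c`, in line with the statement that $\beta > 0$.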
The form of the S-matrix in (8) and its relation to the K-matrix in (4) allow us to identify the parenthetical factor as the generator of the pure imaginary momentum pole and the square-bracket factor as the generator of conjugate poles. To avoid ambiguity in the classification, it is preferable to separate these two factors. That is, the first factor will only be used to generate the bound-virtual dataset while the second factor will be used to generate the conjugate virtual-resonance dataset. The two datasets will be combined as a single classification dataset before we use it to optimize the parameters of our neural network. This will suffice to assign three distinct outputs in our neural network, i.e. bound, virtual and resonance. Note that one can also use the combined form in (8), but a "bound with resonance" category must then be added to the output. This additional category is not yet relevant in the current study.

B. Bound State and Virtual State
Let us first consider the threshold enhancement caused by a shallow bound state or a virtual state in the s-wave amplitude. From the previous discussion, we learned that the first factor in (8) can be used to generate a near-threshold bound or virtual state pole. A closer look reveals that this gives an identical cross-section whatever the sign of $\gamma$. That is, with $S(p) = -(p + i\gamma)/(p - i\gamma)$ we get $|f(p)|^2 = 1/(p^2 + \gamma^2)$ and there is no way to distinguish between virtual and bound state enhancements. In general, there is a background contribution in addition to the pole part of the S-matrix, making it possible to distinguish the two enhancements. Thus, it is imperative to include a background in the S-matrix parametrization for the bound-virtual dataset, i.e.
$$S(p) = e^{2i\delta_{bg}(p)}\left(-\frac{p + i\gamma}{p - i\gamma}\right), \tag{12}$$
where $\delta_{bg}(p)$ is the background phase. The form of $\delta_{bg}(p)$ is restricted by the unitarity and analyticity requirements. First, unitarity dictates that $\delta_{bg}(p)$ be a real-valued function for real momentum $p$. Second, analyticity requires that there be no poles in the analytically continued $e^{2i\delta_{bg}(p)}$ on the upper-half momentum plane and that the reflection principle be satisfied. Here, we introduce the background phase shift given by
$$\delta_{bg}(p) = \eta\,\tan^{-1}\!\left(\frac{p}{\Lambda_{bg}}\right), \tag{13}$$
where $\eta$ is a real parameter and $\Lambda_{bg} > 0$ is the training S-matrix cut-off parameter. If we let $\eta < 0$, (13) reduces to a repulsive hard-core type background used in [24], with $-\eta/\Lambda_{bg}$ as the core radius, if $p$ is near the threshold. Also, (13) can simulate the left-hand cut both in the physical and unphysical sheets even in the non-relativistic case, since the analytically continued $\tan^{-1}(p/\Lambda_{bg})$ has branch cuts in $\mathbb{C}$ along the strip $(-i\infty, -i\Lambda_{bg}) \cup (i\Lambda_{bg}, i\infty)$ [25].
Using the parameters of the background phase in (13), we prepared three training datasets that will be used in the subsequent numerical experiments. These are shown in Table I. The purpose of each dataset is as follows: Set 0 is used to experiment with different neural network architectures in section IV, while Set 1 and Set 2 are used to train two deep neural network models for the numerical experiments in section V. For each dataset, we choose negative values for $\eta$ to mimic a repulsive background, since the attractive case is already taken care of by the pole factor in (12). It suffices to use the integer values shown in the second column of Table I for the purpose of this study. Then, for each $\eta$ we generate 500 random values of $\Lambda_{bg}$ in the range specified in the third column of Table I. The size of each dataset is determined by the parameters of the pole part.
The parameters for the pole part of the bound-virtual S-matrix in (12) are generated as follows. For each $\eta$ and $\Lambda_{bg}$ in Table I, we choose 1,000 random values of $\gamma$ in the interval $(-0.9\Lambda_{bg}, -10\mbox{ MeV}) \cup (10\mbox{ MeV}, 200\mbox{ MeV})$. This choice gives a range of bound state binding energies from 0.106 MeV to 42.55 MeV. We ensure that the range of $\gamma$ is cut so that equal numbers of near-threshold virtual and bound state poles are generated. With the values of $\eta$, $\Lambda_{bg}$ and $\gamma$ specified, the S-matrix in (12) can now be used to calculate the input $|f(E)|^2$ from the partial wave amplitude in (5). For each input, we assign an output label based on the sign of $\gamma$, i.e. label 0 if $\gamma > 0$ (bound state) and 1 if $\gamma < 0$ (virtual state). The number of parameters used results in a total of 4 × 500 × 1000 = 2,000,000 input-output samples for the bound and virtual states.
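The generation of one bound-virtual sample can be sketched as follows. Several ingredients are assumptions made only for this illustration: the background phase is taken as $\eta\tan^{-1}(p/\Lambda_{bg})$, the reduced mass `MU` is set to 470 MeV (a value roughly consistent with the quoted binding energies, not one stated in the text), the energy grid starts at 0.5 MeV to avoid $p = 0$, and the sign of $\gamma$ is drawn with equal probability. All function names are hypothetical.

```python
import cmath
import math
import random

MU = 470.0  # assumed reduced mass in MeV (not given explicitly in the text)
ENERGIES = [0.5 * (k + 1) for k in range(200)]  # assumed grid: 0.5, ..., 100 MeV


def background_phase(p, eta, lam_bg):
    """Assumed background phase: eta * arctan(p / lam_bg)."""
    return eta * math.atan(p / lam_bg)


def cross_section_bound_virtual(energy, gamma, eta, lam_bg, mu=MU):
    """|f(E)|^2 for S = exp(2i*delta_bg) * (-(p + i*gamma) / (p - i*gamma))."""
    p = math.sqrt(2.0 * mu * energy)
    pole_part = -(p + 1j * gamma) / (p - 1j * gamma)
    s = cmath.exp(2j * background_phase(p, eta, lam_bg)) * pole_part
    f = (s - 1.0) / (2j * p)
    return abs(f) ** 2


def sample_gamma(lam_bg, rng):
    """Draw gamma from (-0.9*lam_bg, -10) U (10, 200) MeV.

    The 50/50 split between the two signs is an assumption of this sketch.
    """
    if rng.random() < 0.5:
        return rng.uniform(10.0, 200.0)
    return -rng.uniform(10.0, 0.9 * lam_bg)


def make_sample(eta, lam_bg, rng):
    """Return (input vector, label): label 0 = bound, 1 = virtual."""
    gamma = sample_gamma(lam_bg, rng)
    features = [cross_section_bound_virtual(e, gamma, eta, lam_bg) for e in ENERGIES]
    label = 0 if gamma > 0 else 1
    return features, label


rng = random.Random(0)
features, label = make_sample(eta=-2.0, lam_bg=300.0, rng=rng)
```

With `eta=0.0` the two signs of `gamma` give the same cross-section, which is why a nonzero background is needed to separate the two classes.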

C. Virtual State and Resonance
Using the same background phase in (13) and the second factor of (8), the S-matrix with a conjugate pair of poles is written as
$$S(p) = e^{2i\delta_{bg}(p)}\left[\frac{(p - i\beta)^2 - \alpha^2}{(p + i\beta)^2 - \alpha^2}\right]. \tag{14}$$
The values of $\eta$ and $\Lambda_{bg}$ are again chosen from Table I, but this time we only choose 50 random values for $\Lambda_{bg}$. For the pole parameters, 100 values of $\beta$ are chosen in the interval (50 MeV, 200 MeV) and 100 values of $\alpha$ in (1 MeV, 300 MeV). These choices can give us resonance peaks with widths ranging from 0.12 MeV to 64 MeV. We calculate the input amplitude $|f(E)|^2$ using the above parameters and assign an output label of 1 for a virtual state pole ($\beta > \alpha$) and 2 for a resonance ($\beta < \alpha$). This is just a continuation of the output assignment in the previous subsection. We have a total of 4 × 50 × 100 × 100 = 2,000,000 input-output samples for the resonance-virtual classification.
It is interesting to point out that the enhancement due to a resonance pole is not completely distinguishable from that of a virtual state pole. Both of these singularities are capable of producing near-threshold peak structures in the scattering region, as shown in Fig.4(d). This is true if we include a background phase in the S-matrix as in (14). A virtual state pole ($\beta > \alpha$) that is far from the threshold but close to the imaginary axis of the unphysical sheet, as shown in Fig.4(c), will produce a peak above the threshold due to the distortion caused by the branch point. Normally, if there is no S-matrix background, the conjugate partner of a virtual state with width is sufficient to suppress the appearance of a peak even if the poles are far from the threshold [26]. This is no longer the case in the presence of a background, and the conjugate pole must be near the threshold to suppress the peak, as demonstrated in Fig.4(b).
A slightly different scenario happens for a resonance pole and its conjugate. If it is close to the threshold, a peak structure appears close to the real part of the pole. Here, the conjugate partner is already blocked by the branch cut and can no longer modify the line shape of the amplitude. If the resonance pole is moved away from the threshold but close to the imaginary axis, the branch point causes the peak structure to appear farther from the pole's real part, resulting in an almost identical line shape to that of the virtual pole (see Fig.4(d)). It is therefore crucial to have a neural network trained to distinguish between these two almost-identical peak structures.
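A matching sketch for the conjugate-pair case is given below. As before, the background phase $\eta\tan^{-1}(p/\Lambda_{bg})$, the reduced mass `MU`, and all function names are assumptions introduced for the illustration; only the labeling rule and the sample count are taken from the text.

```python
import cmath
import math

MU = 470.0  # assumed reduced mass in MeV (not given explicitly in the text)


def cross_section_conjugate_pair(energy, alpha, beta, eta, lam_bg, mu=MU):
    """|f(E)|^2 for a conjugate pair of poles at p = -i*beta +/- alpha,
    multiplied by the assumed background exp(2i * eta * arctan(p / lam_bg))."""
    p = math.sqrt(2.0 * mu * energy)
    pole_part = ((p - 1j * beta) ** 2 - alpha ** 2) / ((p + 1j * beta) ** 2 - alpha ** 2)
    s = cmath.exp(2j * eta * math.atan(p / lam_bg)) * pole_part
    f = (s - 1.0) / (2j * p)
    return abs(f) ** 2


def label_conjugate_pair(alpha, beta):
    """1 = virtual state with width (beta > alpha), 2 = resonance (beta < alpha)."""
    return 1 if beta > alpha else 2


# Sample count quoted in the text: 4 eta values x 50 cut-offs x 100 beta x 100 alpha
n_samples = 4 * 50 * 100 * 100
```

Because the pole factor has unit modulus for real `p`, the resulting cross-section always respects the unitarity bound $|f|^2 \le 1/p^2$, which is a convenient sanity check on generated samples.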

IV. ARCHITECTURE AND TRAINING
Now that we have the classification dataset ready, we proceed with the construction of the neural network. To determine the optimal architecture for our task, we experiment with different architectures. The Chainer framework [27] is used to build the neural network and to carry out the training. Here, we only use Set 0 of Table I, which consists only of bound-virtual samples. This dataset is chosen to deliberately make the classification difficult by putting some of the relevant poles in the branch cut of the background. We further split the classification dataset into two such that 80% is used for training, which optimizes the weights and biases, and the remaining 20% for testing.
Four neural network architectures are used in this experiment. We describe them by the number of nodes $N_\ell$ in the $\ell$th layer ($\ell = 0, 1, \cdots, L$), with $L$ as the total number of hidden layers; an added (+1) denotes the bias. For all architectures, we have $N_0 = 200$ nodes for the input layer and three nodes for the output. We assign the ReLU as the activation function for hidden-layer nodes,
$$\mathrm{ReLU}(z_i) = \max(0, z_i),$$
and use softmax for output nodes,
$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}.$$
In the classification problem, the cost function to be minimized is the softmax cross entropy given by
$$C(\hat{w}, \vec{b}) = -\frac{1}{X}\sum_{\vec{x}} \vec{a}(\vec{x}) \cdot \ln \vec{y}_{w,b}(\vec{x}),$$
where $\hat{w}$ is the weight matrix, $\vec{b}$ is the bias vector, $\vec{x}$ is one of the training inputs with $\vec{a}(\vec{x})$ as the correct answer, $X$ is the size of the training sample and $\vec{y}_{w,b}(\vec{x})$ is the network's output. We use the standard stochastic gradient descent [28,29] to optimize the weights and biases with a learning rate of 0.01 and a batch size of 1600. The performance of each network architecture is measured by feeding the testing input to the network and comparing the network's output to the correct label. Then, we count the number of correct predictions. The test accuracy of each architecture is shown in Fig.5. The vertical axis gives the accuracy of the neural network's predictions on the testing set and the horizontal axis is the training epoch. Generally, the testing accuracy shows large fluctuations due to the stochasticity introduced in the calculation of the cost function. It is interesting to find that the performance of the $L = 1$ architectures shown in Fig.5(a) and Fig.5(b) did not improve much even when we added more nodes. After 1000 epochs, the testing accuracies are 94.4% for the $N_1 = 100$ architecture and 94.5% for $N_1 = 150$. This is just a 0.1% improvement in accuracy. However, we get a significant increase when the additional 50 nodes are placed in a second hidden layer. For a deep neural network with $L = 2$, $N_1 = 100$ and $N_2 = 50$, the performance is shown in Fig.5(c).
Here, we get a 97.2% testing accuracy after 1000 epochs, a significant improvement compared to the $L = 1$ architecture with the same number of nodes. We also check if increasing $L$, while keeping the total number of nodes fixed, will further improve the performance. The result for $L = 3$ with $N_1 = N_2 = N_3 = 50$ is shown in Fig.5(d), giving a testing accuracy of 97.3% after 1000 epochs. The result is comparable with that of the $L = 2$ architecture. However, the $L = 2$ architecture is more practical to use since it is much faster to train than $L = 3$. Thus, for the rest of this study we will use the two-hidden-layer neural network described in Table II. We now proceed to train our chosen network architecture using the classification Set 1 and Set 2 datasets in Table I. Each of these datasets contains 4,000,000 training input-output tuples for the bound-virtual and resonance-virtual cases. The network's performance with the Set 1 and Set 2 datasets is shown in Fig.6 and Fig.7, respectively. Optimization using Set 1 shows that the accuracy saturates as early as 400 epochs, indicating that the minimum of the cost function is already reached. The network's accuracy is 99.7% for the testing of the Set 1 dataset after 1,000 epochs. The same saturation behavior is observed for Set 2. However, the accuracy after 1,000 epochs is only 97.3% for testing. The lower accuracy is due to the inclusion of $\eta = 0$, which corresponds to the no-background case. This gives rise to identical enhancements at threshold whether the pole is a bound or virtual state. Despite its lower accuracy, this dataset is still useful in our subsequent numerical experiment.
We now have two deep neural network models with the same architecture but trained on two slightly different datasets, i.e. Set 1 and Set 2. In the next section we will study the applicability of these models using an exactly solvable separable potential and then apply them to the nucleon-nucleon scattering data.
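The output activation, cost function and accuracy count used in this section can be sketched in plain Python. This is a generic illustration of softmax cross entropy, not the Chainer code used for the results; the function names and the toy numbers are assumptions, and the correct answer is represented by a class index rather than a one-hot vector.

```python
import math


def softmax(z):
    """Softmax over a list of pre-activation values (shifted for stability)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits_batch, labels):
    """Mean of -ln y_k over the batch, where k is the correct class index."""
    losses = [-math.log(softmax(z)[k]) for z, k in zip(logits_batch, labels)]
    return sum(losses) / len(losses)


def accuracy(logits_batch, labels):
    """Fraction of samples whose largest output node matches the label."""
    hits = 0
    for z, k in zip(logits_batch, labels):
        predicted = max(range(len(z)), key=z.__getitem__)
        hits += 1 if predicted == k else 0
    return hits / len(labels)


# Toy batch with three output nodes: 0 = bound, 1 = virtual, 2 = resonance
toy_logits = [[2.0, 1.0, 0.0], [0.0, 1.0, 5.0]]
toy_labels = [0, 1]
toy_loss = softmax_cross_entropy(toy_logits, toy_labels)
toy_accuracy = accuracy(toy_logits, toy_labels)
```

In a training loop, the gradient of `softmax_cross_entropy` with respect to the weights and biases would be evaluated on each mini-batch and used for the stochastic gradient descent update.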

V. VALIDATION OF NEURAL NETWORK MODEL
We now explore whether the trained neural network has the ability to generalize beyond the training dataset. This is done by generating a validation dataset using an exactly solvable model. It is important that the validation set be different from the training set in order to draw a valid conclusion about the network's ability to generalize.

A. Separable Potential
The simplest model that can give us an exact solution to the Lippmann-Schwinger equation is a separable potential [18,19]. Here, we consider the s-wave potential given by $V(p, p') = \lambda g(p')g(p)$ with the Yamaguchi form factor $g(p) = \Lambda^2/(p^2 + \Lambda^2)$, where $\lambda$ is an energy-independent coupling strength and $\Lambda$ is a cut-off parameter [30]. The single-channel S-matrix for this model is given by (19). We can introduce a dimensionless parameter $\zeta = \pi\mu\lambda\Lambda/2$ to rescale the momentum plane with the cut-off $\Lambda$ as the scaling parameter. Fig. 8(a) shows the trajectory of the pole along the imaginary momentum axis as $\zeta$ is varied. At $\zeta = 0$, the pole starts at $p = -i\Lambda$ and, as $\zeta$ becomes increasingly negative, the pole splits into two. One of the poles moves beyond the cut-off limit while the other one gets closer to the threshold. If $-1 < \zeta < 0$, the near-threshold pole $p_0 = i\Lambda(-1 + \sqrt{-\zeta})$ is a virtual state. If we further make the potential attractive by letting $\zeta < -1$, the near-threshold pole crosses the threshold and becomes a bound state pole. The adjustable parameter $\zeta$ can be used to produce different amplitudes to test the network's prediction.
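The pole trajectory along the imaginary axis can be tabulated with a short sketch. It assumes that the two poles of the energy-independent model are $p = i\Lambda(-1 \pm \sqrt{-\zeta})$ for $\zeta \le 0$, which is the reading of the near-threshold pole formula adopted here; the function name and the sample cut-off of 500 MeV are hypothetical.

```python
import math


def yamaguchi_poles(zeta, cutoff):
    """Return (near, far) poles on the imaginary momentum axis.

    Assumes p = i * cutoff * (-1 +/- sqrt(-zeta)), valid for zeta <= 0
    (attractive coupling) only.
    """
    if zeta > 0:
        raise ValueError("this sketch covers the attractive case zeta <= 0 only")
    root = math.sqrt(-zeta)
    near = 1j * cutoff * (-1.0 + root)
    far = 1j * cutoff * (-1.0 - root)
    return near, far


cutoff = 500.0  # MeV, hypothetical value
virtual_pole, _ = yamaguchi_poles(-0.25, cutoff)  # -1 < zeta < 0: below threshold
bound_pole, far_pole = yamaguchi_poles(-4.0, cutoff)  # zeta < -1: above threshold
```

The second pole always lies at or below $-i\Lambda$, i.e. beyond the cut-off, which is why only the near-threshold pole is relevant for the line shape in this model.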
S-wave bound and virtual enhancements at the threshold are possible for a separable potential with energy-independent coupling $\lambda$. The absence of a centrifugal barrier makes it impossible to produce resonances with an attractive interaction [7]. This can be modified, however, by allowing the coupling to be energy dependent [31]. A minimal number of conjugate poles is produced if we let the energy dependence be
$$\lambda \to \lambda\,(E - M_{sep}), \tag{20}$$
where $E = p^2/(2\mu)$ with the threshold at $E = 0$. The parameter $M_{sep}$ is the zero of the partial wave amplitude, such that when $E = M_{sep}$ there is no scattering. The energy-dependent coupling gives an S-matrix with the pole position at
$$p = \Lambda\,\frac{-i \pm \sqrt{\zeta\,(\epsilon\zeta - 1 - \epsilon)}}{1 - \zeta}, \tag{21}$$
where we introduce a new set of dimensionless parameters $\zeta = \pi\Lambda^3\lambda/4$ and $\epsilon = 2\mu M_{sep}/\Lambda^2$. Consider the case when the zero of the amplitude is in the scattering region, i.e. $M_{sep} > 0$ or $\epsilon > 0$. We get a conjugate pair of poles provided that $\zeta(\epsilon\zeta - 1 - \epsilon) > 0$. This is true for the case of an attractive potential, i.e. $\lambda < 0$ or $\zeta < 0$, and for the repulsive case when $\zeta > (1+\epsilon)/\epsilon > 0$. We consider only the attractive case, which is physically meaningful for the discussion of resonances. Fig.8(b) shows the trajectory of the poles as $\zeta$ is varied. The conjugate poles start at $p = -i\Lambda$ when $\zeta = 0$ and move in opposite directions as $\zeta$ becomes negative. The pole remains below the line $|\mathrm{Re}\,p| = |\mathrm{Im}\,p|$ when $\zeta > \zeta_{crit}$, where
$$\zeta_{crit} = \frac{(1+\epsilon) - \sqrt{1 + 6\epsilon + \epsilon^2}}{2\epsilon}.$$
Here, we only have a virtual state with width. If we make $\zeta$ more negative, such that $\zeta < \zeta_{crit}$, the pole moves above the line and turns into a resonance pole. As $\zeta \to -\infty$, the pole approaches the point $p = \pm\sqrt{2\mu M_{sep}}$ on the real axis. To ensure that the zero will appear in the cross-section, we let the values of $M_{sep}$ be within [0, 100 MeV]. The pole trajectory for $M_{sep} < 0$ is more involved compared to the previous case. Here, a resonance pole can only be produced provided that $-(3 - \sqrt{8}) < \epsilon < 0$, otherwise $\zeta$ will have to be complex.
From Fig.8(c), we start producing virtual states with widths when $\zeta_+ < \zeta < 0$ and then resonances when $\zeta_- < \zeta < \zeta_+$, where
$$\zeta_\pm = \frac{(1+\epsilon) \mp \sqrt{1 + 6\epsilon + \epsilon^2}}{2\epsilon}.$$
As $\zeta$ becomes more negative, i.e. $\zeta_v < \zeta < \zeta_-$ where $\zeta_v = (1+\epsilon)/\epsilon$, the resonance pole will again cross the equal-line and turn into a virtual state with width. The two poles will then merge on the zero of the amplitude.
We separate the validation dataset into three. The first one is generated using the energy-independent coupling, which gives an amplitude enhancement at threshold. The second and third datasets are generated using the energy-dependent coupling, one with $M_{sep} > 0$ and the other with $M_{sep} < 0$. The last two datasets are capable of producing peak structures above the threshold. Also, for convenience, we restrict the third dataset, i.e. the one with $M_{sep} < 0$, to produce conjugate poles only. In each set, we choose a range of cut-off parameter $(\Lambda_{min}, \Lambda_{max})$ and generate 100,000 amplitudes using different combinations of parameters.
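The pole trajectories for the energy-dependent coupling can be checked numerically. The sketch below rests on an assumption: the poles are taken to solve $(p + i\Lambda)^2 = \zeta\,(p^2 - \epsilon\Lambda^2)$, with $\epsilon = 2\mu M_{sep}/\Lambda^2$, which reproduces the condition $\zeta(\epsilon\zeta - 1 - \epsilon) > 0$ for conjugate poles, the limit $p \to \pm\sqrt{2\mu M_{sep}}$ as $\zeta \to -\infty$, and the bound $-(3 - \sqrt{8}) < \epsilon$ quoted in the text. The function names and the sample values are hypothetical.

```python
import cmath
import math


def energy_dependent_poles(zeta, eps, cutoff):
    """Poles assumed to solve (p + i*cutoff)^2 = zeta * (p^2 - eps * cutoff^2)."""
    root = cmath.sqrt(zeta * (eps * zeta - 1.0 - eps))
    return [cutoff * (-1j + root) / (1.0 - zeta),
            cutoff * (-1j - root) / (1.0 - zeta)]


def equal_line_crossings(eps):
    """Values of zeta at which the pole sits on |Re p| = |Im p|.

    Returns (zeta_plus, zeta_minus); for eps > 0 the first entry is the
    negative root, which plays the role of zeta_crit.
    """
    s = math.sqrt(1.0 + 6.0 * eps + eps ** 2)
    return ((1.0 + eps) - s) / (2.0 * eps), ((1.0 + eps) + s) / (2.0 * eps)


def nature(pole):
    """'resonance' if |Re p| > |Im p|, otherwise 'virtual' (conjugate poles only)."""
    return "resonance" if abs(pole.real) > abs(pole.imag) else "virtual"


cutoff = 500.0  # MeV, hypothetical value
zeta_crit, _ = equal_line_crossings(0.1)         # zero above threshold
zeta_plus, zeta_minus = equal_line_crossings(-0.1)  # zero below threshold
```

For `eps = -0.1`, scanning `zeta` from 0 downward gives virtual, then resonance, then virtual again, matching the sequence described for Fig.8(c).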
We must point out that (19) and (21) have no background branch cuts along the imaginary axis compared to S-matrix of training data in (12) and (14). Instead, the validation data has isolated second order pole at p = iΛ. This might have some repercussions on the predictive power of the trained neural network when applied to the separable potential.

B. Validation of Neural Network Model Trained Using Set 1
We now proceed to test our trained neural network using the validation dataset. In particular, we want to investigate if the network can generalize beyond the training set, i.e. whether we still get accurate predictions even if the validation set is different from the training dataset. Note that if the validation set is just a subset of the training dataset, then we expect the accuracy of prediction to be high. We also want to explore the region of applicability of the trained neural network. We can assess both the ability of the network to generalize and its applicability by changing the value of the cut-off $\Lambda$, since this parameter controls the position of the background singularity.
Consider first the accuracy of prediction with respect to the energy-independent coupling set. From Fig.9(a), we obtain optimal accuracy in the cut-off region between 400-1000 MeV despite the fact that the background singularity of the validation set is different from that of the training set. We can say that, within this region, the neural network generalizes beyond the training data in distinguishing bound and virtual state enhancements. Below 400 MeV, the difference between the training and the validation background starts to manifest, as seen from the decrease in accuracy as the cut-off is decreased. We also observe a decrease in accuracy in the cut-off region above 1000 MeV. Here, increasing the cut-off pushes the background far from the scattering region; consequently, bound and virtual near-threshold pole enhancements become identical, as we have discussed in section III.
It is interesting to find that the accuracy of prediction is different in the energy-dependent sets, as shown in Fig.9(b) and Fig.9(c), even though the neural network is just distinguishing resonance and virtual-state-with-width enhancements in both cases. This difference is probably due to the position of the amplitude's zero, $M_{sep}$. For the case of $M_{sep} > 0$, i.e. the zero is above the threshold, the second order pole background in (21) can produce a bound-like enhancement at the threshold. This is the reason why we get lower accuracy in Fig.9(b) below 400 MeV. In fact, the network gives a bound state prediction even if there is no bound state in the validation set. This is, however, suppressed in the $M_{sep} < 0$ case in Fig.9(c), where the zero below the threshold cancels the effect of the isolated background pole. The absence of extra structure near the threshold allows the network to distinguish a resonance from a virtual state with width.
The situation is reversed as we go to the higher cut-off region. This time, $M_{sep} > 0$ gives high accuracy for $\Lambda > 600$ MeV, as shown in Fig.9(b), compared to $M_{sep} < 0$ in Fig.9(c). If $\Lambda$ is large, the resonance peak can go beyond the center-of-mass energy range. For $M_{sep} < 0$, the zero below the threshold causes the cross-section to rise monotonically from some small value to some maximum at $E_{cm} = 100$ MeV. In the absence of a peak, the structures for a resonance and a virtual state with width become almost identical. This is the reason why we have decreasing accuracy in Fig.9(c) as the cut-off increases. On the other hand, for $M_{sep} > 0$, the large $\Lambda$ means that no bound-like enhancement will appear at the threshold. The structure between the threshold and the zero at $E = M_{sep}$ can still be used by the network to distinguish a resonance from a virtual state with width even if the relevant peak goes beyond the range of center-of-mass energy. This is the reason why we have high accuracy for the $M_{sep} > 0$ validation set in the high-$\Lambda$ region.

C. Validation of Neural Network Model Trained Using Set 2
For certain values of the parameters, the training and validation backgrounds can have similar forms. That is, if we set $\eta = -2$, the training background $e^{2i\delta_{bg}}$ reduces to $(p + i\Lambda_{bg})^2/(p - i\Lambda_{bg})^2$ but with domain $\mathbb{C}\setminus\left[(-i\infty, -i\Lambda_{bg}) \cup (i\Lambda_{bg}, i\infty)\right]$. One may attribute the good performance of our neural network to this similarity. We can test this assumption by using the training Set 2 in Table I, where $\eta = -2$ is replaced with $\eta = 0$. The accuracy of the network trained using Set 2 is shown in Fig.10. Notice that above 600 MeV, the results are all similar to the performance of the network trained using Set 1 in Fig.9. This demonstrates that even if the validation dataset is not in the training set, the neural network can still give highly accurate predictions. This also illustrates that the decrease in accuracy as the cut-off increases, shown in Fig.10(a) and Fig.10(c), is an intrinsic part of the pole classification problem.
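The stated reduction of the training background for $\eta = -2$ can be verified numerically for real momenta. The check below assumes that the background phase has the form $\delta_{bg}(p) = \eta\tan^{-1}(p/\Lambda_{bg})$, which is the form consistent with this limit; the function names and sample values are hypothetical.

```python
import cmath
import math


def training_background(p, eta, lam_bg):
    """exp(2i * delta_bg) with the assumed phase delta_bg = eta * arctan(p / lam_bg)."""
    return cmath.exp(2j * eta * math.atan(p / lam_bg))


def double_pole_background(p, lam_bg):
    """Rational form with second order singularities at p = +/- i * lam_bg."""
    return ((p + 1j * lam_bg) / (p - 1j * lam_bg)) ** 2


lam_bg = 300.0  # MeV, hypothetical value
momenta = [1.0, 50.0, 300.0, 1000.0]  # real momenta in MeV
max_difference = max(
    abs(training_background(p, -2.0, lam_bg) - double_pole_background(p, lam_bg))
    for p in momenta
)
```

On the real axis the two expressions coincide up to rounding, so any difference seen by the network between training and validation backgrounds at $\eta = -2$ comes from the other parameters rather than from the functional form.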
We pointed out in the previous subsection that the difference between the training and validation backgrounds manifests in the low cut-off region. The presence of a second-order pole in the background of the validation dataset and the absence of η = −2 in the training parameters aggravate the situation. This is seen as a drastic drop in accuracy in Fig.10(b) and Fig.10(c) below 200 MeV. This means that, in this region, the accuracy of the network's prediction is sensitive to the nature of the background singularity.
We give a short comment on the network's performance on the shallow bound and virtual states produced by the energy-dependent coupling with $M_{\rm sep} < 0$. Generally, the network's performance is poor, i.e., the accuracy is below 50%. This is expected, since the near-threshold bound or virtual state generated in (21) with $M_{\rm sep} < 0$ is always accompanied by another nearby virtual pole (see the trajectory in Fig.8(c)). Our training S-matrix in (12), however, is only capable of producing a structure caused by a single near-threshold pole. This leaves the network ill-equipped to distinguish the bound-virtual enhancement produced in (21) with $M_{\rm sep} < 0$. An improvement can be made by adding another pole part to (12) to simulate a nearby virtual pole in addition to the near-threshold bound or virtual pole. The situation is different for the S-matrix in (19), which also produces a virtual state pole in addition to a near-threshold bound state (see Fig.8(a)). However, this virtual state is pushed beyond the cut-off Λ and has negligible influence in the scattering region.

D. Application to Nucleon-Nucleon System
As a final validation, we use the partial-wave analyses and potential models of the Nijmegen group [32-35] as input to our neural network. These models are fitted to the nucleon-nucleon scattering data published between 1955 and 1992. They give the correct phase shifts at any laboratory kinetic energy below 350 MeV. The fitting results are summarized in Table III. Here, PWA93 corresponds to the multienergy partial-wave analyses of the pp data, the np data, and the combined pp and np database [33]. All three analyses give an excellent fit of $\chi^2/N \sim 1$, where N denotes the number of scattering data. Nijm93 is the Nijmegen soft-core potential model introduced in [34], with NijmI as the nonlocal Reid-like version and NijmII as the local version. The same paper also introduces Reid93, a regularized Reid soft-core potential. All of these contain the charge-dependent one-pion-exchange tail. Lastly, two-meson exchange is included in the extended soft-core ESC96 model of [35]. Based on our analysis with the separable potential model, our neural network can classify a bound-virtual enhancement with 98% accuracy within the cut-off range from 400 MeV to 1000 MeV. Now, using the $^1S_0$ and $^3S_1$ phase shifts of the mentioned models, we generate the input amplitude on the center-of-mass energy interval [0, 100 MeV]. The resulting amplitude is then fed to the neural network, and the results are shown in Table IV. All the predictions are correct, i.e., the network identifies that the $^1S_0$ partial-wave threshold enhancement is due to a virtual state pole while that of $^3S_1$ is due to a bound state pole. It is interesting to point out that the small differences among the models do not affect the network's prediction. This means that if the input data fall within some error band, the neural network can still give a consistent classification.
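As a rough cross-check of the expected labels, the sketch below builds an s-wave amplitude on the same energy interval and locates the near-threshold pole. It does not use the Nijmegen phase shifts: it substitutes the effective-range expansion p cot δ = −1/a + r p²/2, with approximate textbook np scattering lengths and effective ranges. Those numbers, the nonrelativistic relation between energy and momentum, and all function names are assumptions of this example and are not taken from the analysis above. The expansion is only a crude approximation toward the upper end of the interval.

```python
import math

HBARC = 197.327        # MeV fm
MU = 938.918 / 2.0     # MeV, np reduced mass from the average nucleon mass

# ASSUMPTION: approximate textbook effective-range parameters in fm,
# used as a stand-in for the Nijmegen phase shifts.
PARAMS = {"1S0": (-23.7, 2.7), "3S1": (5.4, 1.75)}

def ere_amplitude(e_cm, a, r):
    """Amplitude f = 1 / (p cot(delta) - i p) in fm at c.m. energy e_cm (MeV)."""
    p = math.sqrt(2.0 * MU * e_cm) / HBARC          # momentum in fm^-1
    return 1.0 / complex(-1.0 / a + 0.5 * r * p * p, -p)

def near_threshold_kappa(a, r):
    """Pole at p = i*kappa: root of -1/a - r*kappa^2/2 + kappa = 0 nearest threshold."""
    return (1.0 - math.sqrt(1.0 - 2.0 * r / a)) / r

def pole_nature(a, r):
    # kappa > 0: pole on the physical sheet (bound); kappa < 0: virtual state.
    return "bound" if near_threshold_kappa(a, r) > 0 else "virtual"

# Input-like amplitude sampled on [0, 100] MeV for each partial wave.
energies = [i * 1.0 for i in range(101)]
amplitudes = {
    wave: [ere_amplitude(e, a, r) for e in energies]
    for wave, (a, r) in PARAMS.items()
}

labels = {wave: pole_nature(a, r) for wave, (a, r) in PARAMS.items()}

# Binding energy implied by the 3S1 pole, in MeV.
kappa_t = near_threshold_kappa(*PARAMS["3S1"])
binding_3s1 = (HBARC * kappa_t) ** 2 / (2.0 * MU)
```

In this approximation the $^3S_1$ pole lies on the positive imaginary momentum axis, with a binding energy close to that of the deuteron, while the $^1S_0$ pole lies on the negative imaginary axis. This agrees with the classification reported in Table IV.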

VI. CONCLUSION
This study set out to demonstrate how deep learning can be applied to classify the nature of the pole causing a cross-section enhancement. The method is straightforward in the sense that we can use a simple S-matrix parametrization to generate all the possible line shapes that can emerge in the scattering region. We have shown that our neural network model gives an accuracy of more than 90% in the acceptable range of the cut-off parameter (400 − 800 MeV). This suffices for an accurate prediction on the nucleon-nucleon scattering data. The study also shows that a neural network trained using a simple S-matrix parametrization is able to generalize beyond the training set. This is demonstrated by the validation of our neural network using separable potential models and the nucleon-nucleon Nijmegen models. However, there are limitations to the applicability of deep learning for enhancement classification. One example is the noticeable decrease in accuracy if the cut-off parameter is too large. For the bound-virtual classification, the effect of the background is important in distinguishing the two structures. For the virtual-resonance classification, the peak structure tends to appear beyond the center-of-mass energy range if the cut-off is very large, making the classification difficult.
It is important to extend our approach to the coupled-channel case, since most of the exotic phenomena are believed to be generated by coupled-channel interactions. Although the current study deals with single-channel scattering, the findings can still be used in coupled-channel analysis. In particular, we found that if the validation cut-off is too small, then the neural network's prediction becomes sensitive to the nature of the background singularity. This observation should extend to the coupled-channel case, and it is appropriate to explore other possible background parametrizations such as the one used in [36,37]. This will be done elsewhere.

ACKNOWLEDGMENT
This study is supported in part by JSPS KAKENHI Grants Number JP17K14287, and by MEXT as "Priority Issue on Post-K computer" (Elucidation of the Fundamental Laws and Evolution of the Universe) and SPIRE (Strategic Program for Innovative Research). AH is supported in part by JSPS KAKENHI No. JP17K05441 (C) and Grants-in-Aid for Scientific Research on Innovative Areas, No. 18H05407, 19H05104. DLBS is supported by the UP OVPAA FRASDP and DOST-PCIEERD postdoctoral research grant.