Model independent analysis of coupled-channel scattering: a deep learning approach

We develop a robust method to extract the pole configuration of a given partial-wave amplitude. In our approach, a deep neural network is constructed where the statistical errors of the experimental data are taken into account. The teaching dataset is constructed using a generic S-matrix parametrization, ensuring that all the poles produced are independent of each other. The inclusion of statistical errors results in a noisy classification dataset, which we solve using the curriculum method. As an application, we use the elastic $\pi N$ amplitude in the $I(J^P)=1/2(1/2^{-})$ sector, where $10^6$ amplitudes are produced by combining points in each error bar of the experimental data. We feed the amplitudes to the trained deep neural network and find that the enhancements in the $\pi N$ amplitude are caused by one pole in each nearby unphysical sheet and at most two poles in the distant sheet. Finally, we show that the extracted pole configurations are independent of the way points in each error bar are drawn and combined, demonstrating the statistical robustness of our method.


I. INTRODUCTION
Identifying the nature of observed enhancements in hadron-hadron scatterings is one of the central goals of hadron spectroscopy [1,2]. The primary concern is to associate the enhancements with nearby S-matrix poles [3-7]. Ideally, we make a complete measurement of observables, such as elastic and reaction cross-sections with spin dependence if possible. For low-energy processes, the measurement is followed by a partial-wave analysis of the experimental data to obtain the amplitudes. The poles, Riemann sheets, and other resonance parameters are then extracted from the amplitudes. To carry out this program, one conventionally uses a simple parametrization of the amplitude, such as the Breit-Wigner or Flatté forms, or a more rigorous formulation like the K-matrix formalism [8]. Alternatively, one can use a dynamical model with suitable hadron degrees of freedom and their interactions. In these methods, some parameters are determined by fitting the experimental data. In general, however, the applicability of these methods is limited by the assumptions of the model.
In this study, we propose an alternative to the above parameter-fitting approach using deep learning [9-11]. Ultimately, we aim at a program that determines the detailed properties of the poles of the amplitude, including their positions and residues, among others. However, this is extremely difficult at the moment, and we therefore limit our program to identifying the pole configuration, that is, the number of poles in each Riemann sheet associated with the structures in the amplitude. The information we obtain is limited but valuable enough to draw some physical insights (see Refs. [12-15]).

* sombillo@rcnp.osaka-u.ac.jp; dbsombillo@up.edu.ph

We design and develop a deep neural network (DNN) to detect the pole configuration using only the partial-wave amplitude. The idea is to produce a general set of simulated amplitudes with known pole configurations to build the training dataset. In doing so, the general properties of the S-matrix (analyticity, unitarity, and hermiticity) are satisfied. Furthermore, the poles that we generate must be independent of each other to ensure that the poles detected by the DNN are free from any bias. The next step is to construct a DNN program. In the optimization of the DNN, we use the training dataset to teach the DNN to recognize physically realizable amplitudes in the experimental data and extract the pole configuration. As anticipated, our method is a classification program that gives a general description of the data. Therefore, to extract the physical properties of the enhancement structures in the amplitudes, a dynamical model must be used while maintaining the result obtained by our program. We emphasize that once our DNN model is constructed and trained, its optimized parameters can be reused to analyze different processes. This is possible because our formulation does not rely on a specific functional form of the amplitudes.
We consider two-hadron scatterings with two coupled channels. For a more specific demonstration, we take the elastic πN partial-wave amplitude, collected and analyzed by the GW-SAID group in Refs. [16,17], as our experimental data. We choose this amplitude because of the interesting role of threshold effects in the noticeable structures seen just above the ηN and below the KΣ thresholds, as shown in Fig. 1. We treat the coupled πN and ηN channels in the present study. In addition to the deep learning approach, we also introduce an alternative way to utilize the error bars in interpreting experimental data. Instead of generating several parameter sets of a model that fit the experimental result, we do the reverse. We produce several amplitudes directly from the experimental results by combining points in the error bars and then feed them to the trained DNN. Since there are infinitely many ways to combine points in each error bar, we expect to get different pole configurations. We interpret the DNN inference with the highest count as the best pole configuration associated with the experimental data. Thus, we extract the information directly from the experimental data rather than just constraining a model. This paper serves as the companion paper of Ref. [11]. The structure is as follows. First, we discuss in section II the general properties of the S-matrix used to generate the training dataset. Then, we proceed with the construction and optimization of our DNN in section III. The treatment of experimental data for the DNN inference is done in section IV. Finally, we give our conclusion in section V.

II. GENERAL PROPERTIES OF S-MATRIX
In this section, we review the general properties of the S-matrix that we will use in the generation of the training dataset. For the present purpose, it suffices to consider two-hadron scatterings with two channels. Here, we take πN as channel 1 and ηN as channel 2. The KΛ channel can be ignored due to its weak coupling to the πN channel [18]. We can also ignore the KΣ channel by restricting the energy region of analysis up to the KΣ threshold energy. The proximity of the peak structures to the ηN threshold allows us to utilize the non-relativistic relation

$$E = \frac{p_i^2}{2\mu_i} + T_i, \qquad (1)$$

where $E$ is the center-of-mass scattering energy and $p_i$ is the relative momentum of the $i$th channel ($i \in \{1, 2\}$) with two-particle reduced mass $\mu_i$ and threshold energy $T_i$. Here, the smooth variation of the lower channel is approximated using $p_1 = \sqrt{2\mu_1 (E - T_1)}$.
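As a concrete illustration, the channel momenta of Eq. (1) can be evaluated numerically. The sketch below uses approximate hadron masses (assumed values for illustration only; they are not taken from the GW-SAID analysis):

```python
import cmath

# Approximate hadron masses in MeV (assumed values, for illustration only).
M_PI, M_N, M_ETA = 139.57, 938.92, 547.86

def reduced_mass(m1, m2):
    return m1 * m2 / (m1 + m2)

MU = (reduced_mass(M_PI, M_N), reduced_mass(M_ETA, M_N))  # mu_1, mu_2
T = (M_PI + M_N, M_ETA + M_N)                             # thresholds T_1, T_2

def momentum(E, i):
    """Principal-branch p_i = sqrt(2 mu_i (E - T_i)) from Eq. (1); the second
    branch (the other Riemann sheet of channel i) is -momentum(E, i)."""
    return cmath.sqrt(2.0 * MU[i - 1] * (complex(E) - T[i - 1]))

# Above threshold the momentum is real; Eq. (1) is recovered by inversion.
p1 = momentum(1200.0, 1)
assert abs(p1.imag) < 1e-9 and p1.real > 0
assert abs(p1**2 / (2 * MU[0]) + T[0] - 1200.0) < 1e-9
```

The two branch choices of each square root are what generate the four Riemann sheets discussed next.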
To reveal the full analytic structure of the S-matrix, we let the energy and momentum take complex values. With the relation in Eq. (1), each complex energy corresponds to two values of the complex momentum, one with positive and one with negative imaginary part. We call the energy plane associated with the positive (negative) imaginary part of the ith channel momentum the top [t] (bottom [b]) Riemann sheet. For the coupled two channels, the four Riemann sheets are conveniently labeled using the intuitive notation of Ref. [20] as [s_1 s_2], where s_i ∈ {t, b} is the sheet of the ith channel. The scattering region, the energy region where we plot the cross-section, lies along the real axis on the upper half of the physical [tt] sheet. The other Riemann sheets are collectively called unphysical sheets; they are connected, together with the physical sheet, in a nontrivial way due to the presence of the branch cuts. Fig. 2 shows how the sheets are connected.

The properties of the S-matrix govern the collision of particles with short-range interaction. The most important property is causality, which means that no scattered wave reaches the detector before the incident wave reaches the scattering target. Causality imposes that the scattering amplitudes are the real-axis boundary values of functions analytic in the complex energy of the physical sheet [5,21]. It follows that there should be no pole on the physical sheet, except possibly bound-state poles on the real axis below the lowest threshold. Next, we also expect the total probability to be conserved, which requires the S-matrix S to be unitary, i.e., SS† = 1. This property restricts the form of the full S-matrix and its elements in the corresponding interval of scattering energy. Finally, below the lowest threshold, the S-matrix should be real-valued or Hermitian [22]. The well-known consequence of hermiticity is the reflection principle [3,4], i.e., the energy poles come in conjugate pairs.
By combining analyticity, unitarity, and hermiticity on the relevant energy region, we can construct a general parametrization for our S-matrix.
Using hermiticity and unitarity, we can perform analytic continuation [23,24] and obtain the following S-matrix elements:

$$S_{11} = \frac{D(-p_1, p_2)}{D(p_1, p_2)}, \quad S_{22} = \frac{D(p_1, -p_2)}{D(p_1, p_2)}, \quad S_{11}S_{22} - S_{12}^2 = \frac{D(-p_1, -p_2)}{D(p_1, p_2)}, \qquad (2)$$

where the subscripts refer to the channels. Here, the function $D(p_1, p_2)$ determines the pole positions of the S-matrix. We can calculate the scattering amplitudes using the relation $S_{ii'} = \delta_{ii'} + 2iT_{ii'}$, where $\delta_{ii'}$ is the Kronecker delta and $i, i' \in \{1, 2\}$. If we can control the singularities through a specified form of $D(p_1, p_2)$, then we can generate a set of simulated amplitudes $T_{ii'}(p_1, p_2)$ with known pole configuration.
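The relation $S_{ii'} = \delta_{ii'} + 2iT_{ii'}$ and the unitarity constraint on T can be checked with a short numerical sketch. The toy S-matrix below is built from an arbitrary real symmetric matrix merely to have a unitary example to test against (an assumption for illustration; it is not the parametrization of Eq. (2)):

```python
import numpy as np

def t_matrix(S):
    """Amplitudes from S_{ii'} = delta_{ii'} + 2i T_{ii'}."""
    return (S - np.eye(S.shape[0])) / 2j

# Toy unitary two-channel S-matrix: exponentiate a real symmetric matrix.
H = np.array([[0.3, 0.1], [0.1, 0.5]])
w, V = np.linalg.eigh(H)
S = V @ np.diag(np.exp(2j * w)) @ V.T

T = t_matrix(S)
# Unitarity of S implies the two-channel optical theorem Im T = T^dagger T.
assert np.allclose(S @ S.conj().T, np.eye(2))
assert np.allclose((T - T.conj().T) / 2j, T.conj().T @ T)
```

The same consistency checks apply to any amplitude generated from Eq. (2) in the physical scattering region.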

A. Controlled poles and Riemann sheets
To accommodate a large pole-configuration space, we have to generate a set of simulated amplitudes with an arbitrary number of independent poles. We look for $D_j(p_1, p_2) = 0$ that gives exactly one pole $E^{(j)}_{\mathrm{pole}}$ occupying one of the unphysical Riemann sheets, and then form the product

$$D(p_1, p_2) = \prod_j D_j(p_1, p_2). \qquad (3)$$

This prescription ensures that all the $E^{(j)}_{\mathrm{pole}}$'s are produced independently of each other. Now, from the reflection principle, we know that if $E^{(j)}_{\mathrm{pole}}$ is a pole, then its complex conjugate $E^{(j)*}_{\mathrm{pole}}$ must also be a pole. To fix the Riemann sheet of $E^{(j)}_{\mathrm{pole}}$ and accommodate its conjugate partner, we use

$$D_j(p_1, p_2) = \left[\left(p_1 - i\beta_1^{(j)}\right)^2 - \left(\alpha_1^{(j)}\right)^2\right] + \lambda_j \left[\left(p_2 - i\beta_2^{(j)}\right)^2 - \left(\alpha_2^{(j)}\right)^2\right]. \qquad (4)$$

The absolute values of the real parameters $\alpha_i^{(j)}$ and $\beta_i^{(j)}$ set the position of $E^{(j)}_{\mathrm{pole}}$, while their signs select its Riemann sheet. The importance of the extra parameter $\lambda_j$ is discussed in the following.
A quick inspection shows that, if Eq. (1) is imposed on $D_j(p_1, p_2) = 0$ with $D_j(p_1, p_2)$ given by Eq. (4), we get four solutions in either $p_1$ or $p_2$. Two of these are the assigned pole $E^{(j)}_{\mathrm{pole}}$ and its conjugate partner, while the other two are known as the shadow and its conjugate partner [25-28]. The position and Riemann sheet of the shadow pole are obtained by expressing $D_j(p_1, p_2) = 0$ in terms of $p_1$ (or $p_2$) and factoring out the roots corresponding to the assigned pole. The reversed sign of Im $p_2$ tells us that the Riemann sheet of the shadow is different from that of the assigned pole. This means that $E^{(j)}_{\mathrm{pole}}$ can respect analyticity while the shadow may not. To avoid a possible violation of analyticity, we throw the shadow far from the relevant scattering region. In particular, we choose a sufficiently negative $\lambda_j$ that puts the shadow (and its conjugate partner) on the real axis below the lowest threshold. By doing this, each $E^{(j)}_{\mathrm{pole}}$ has its own distant background pole. Thus, we can guarantee that each $D_j(p_1, p_2)$ produces exactly one isolated pole on a specified Riemann sheet. In this way, the poles are produced independently of each other while respecting analyticity, giving us a general parametrization.
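In code, the product prescription of Eq. (3) can be sketched as follows. The explicit factor form inside `make_factor` is our reading of Eq. (4) and should be treated as an assumption; only the product structure and the role of a sufficiently negative λ are taken from the text:

```python
import cmath
from functools import reduce

def make_factor(a1, b1, a2, b2, lam):
    """One factor D_j(p1, p2); (a_i, b_i) place the assigned pole, and lam
    (chosen sufficiently negative) pushes the shadow below threshold.
    Assumed functional form, following our reading of Eq. (4)."""
    def Dj(p1, p2):
        return ((p1 - 1j * b1) ** 2 - a1 ** 2) + lam * ((p2 - 1j * b2) ** 2 - a2 ** 2)
    return Dj

def make_D(factors):
    """D = prod_j D_j, so every assigned pole is generated independently."""
    def D(p1, p2):
        return reduce(lambda acc, Dj: acc * Dj(p1, p2), factors, 1.0 + 0.0j)
    return D

# D vanishes whenever any single factor vanishes:
Dj = make_factor(1.5, 0.4, 0.8, 0.3, -2.0)
p2 = 0.6 - 0.2j
p1 = 1j * 0.4 + cmath.sqrt(1.5**2 - (-2.0) * ((p2 - 1j * 0.3) ** 2 - 0.8**2))
D = make_D([Dj, make_factor(0.9, 0.2, 0.5, 0.6, -1.0)])
assert abs(Dj(p1, p2)) < 1e-9
assert abs(D(p1, p2)) < 1e-9
```

Because each factor carries its own parameters, adding or removing a pole never disturbs the others, which is the key property needed for unbiased training labels.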

B. Observable effects of shadow
We push the shadow-pole solution of $D_j(p_1, p_2) = 0$ far from the relevant scattering region to arrive at a general parametrization. However, there are situations where a shadow pole exists and may have observable consequences in the scattering region around the relevant threshold. Consider, for example, the familiar two-channel Breit-Wigner model [25,26] with the pole-position condition taking the form

$$E - E_{\mathrm{BW}} + i\gamma_1 p_1 + i\gamma_2 p_2 = 0,$$

where $E_{\mathrm{BW}}$ is the Breit-Wigner mass and $\gamma_i \geq 0$ is the coupling to the ith channel. For $\gamma_2 < \gamma_1$, we can have a pole and its shadow on different Riemann sheets. If we slowly turn off $\gamma_2$, the pole and shadow move towards a common energy point (the same real and imaginary parts) but on different Riemann sheets. The cusp at $T_2$, due to $p_2 \propto \sqrt{E - T_2}$, disappears from the scattering amplitude, indicating that the situation is now reduced to a single-channel scattering.
It turns out that putting another independent pole in our parametrization can mimic the effect of a shadow. To demonstrate this effect, let us consider the extreme case where we fix the position of an isolated pole on the [bt] sheet such that its real part is equal to T_2. Fig. 3 shows the resulting |T_11|^2 (solid blue line), where a peak structure below T_2 with a noticeable cusp (infinite slope) at the threshold is observed. The addition of an arbitrary [bb] pole below T_2 pushes the peak structure slightly closer to T_2 (dashed red line) and slightly diminishes the cusp structure. Finally, if the [bb] pole is placed exactly at the same position as the fixed [bt] pole, the peak occurs at the actual real part of the pole (which is at T_2), and the cusp disappears (dotted green line). This behavior indicates that the pole decouples from the second channel, similar to turning off γ_2 in the two-channel Breit-Wigner model. It follows that we can turn off the channel coupling by using an arbitrary trajectory for two independent poles on different sheets. This behavior shows that the arbitrary pole we placed on a different Riemann sheet effectively functions as a shadow of the fixed pole. In our formulation, we can generate the pole and shadow independently. The decoupling effect of a pair of independent poles on different Riemann sheets can be proven in a general way, as discussed in the Appendix.
We now have a parametrization that can produce as many poles as we want with an arbitrary way of controlling the strength of channel coupling. In the next section, we use our general S-matrix parametrization to produce a set of simulated amplitudes for the teaching dataset.

III. CONSTRUCTION OF DEEP NEURAL NETWORK MODEL
A. Generation of training dataset

The design of our DNN generally depends on the features of the input data, which we take to be the real and imaginary parts of the amplitude, and on the number of possible classifications in the teaching dataset. In the final stage of analysis, we will use the experimental data to make DNN inferences. It is, therefore, essential to include the limited energy resolution in the design of the DNN. To do this, we generate a set of amplitudes on randomly spaced energy points. Specifically, we divide the region of interest in the experimental data into some number of bins, say B bins. One can think of each bin as the energy resolution of a particle detector. It is safe to assume that some probability distribution determines the energy of a particle entering a detector. We can use this distribution to pick one representative energy in each bin to calculate the real and imaginary parts of the amplitude. For our specific task, we divide the πN center-of-mass energy range, from the πN to the KΣ threshold, into 37 bins since there are 37 data points in the GW-SAID data [16,17]. Also, for simplicity, we use a uniform distribution to choose a point in each energy bin.
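A minimal sketch of the bin-sampling step, assuming equal-width bins and the uniform per-bin distribution described above (the threshold values below are approximate placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_energies(e_min, e_max, n_bins, rng):
    """Split [e_min, e_max] into n_bins equal bins and draw one uniform
    point per bin, mimicking a detector's finite energy resolution."""
    edges = np.linspace(e_min, e_max, n_bins + 1)
    return rng.uniform(edges[:-1], edges[1:])

# e.g. 37 bins between the piN and KSigma thresholds (approximate values, MeV)
energies = sample_energies(1077.0, 1689.0, 37, rng)
assert energies.shape == (37,)
assert np.all(np.diff(energies) > 0)  # one point per disjoint bin => sorted
```

Each training amplitude is then evaluated on its own random grid, so the DNN never sees a fixed set of energy points.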
The number of possible pole-configuration classifications depends on how many poles we are willing to count within some specified Riemann sheet region. The number of configurations also determines the number of output nodes assigned to our DNN model. Appealing once again to the GW-SAID data, there are two prominent structures: one is around the ηN threshold, and the other is between the KΛ and KΣ thresholds. The number of poles that we count should therefore not be less than two. In our study, it suffices to consider a maximum of four poles distributed over any combination of Riemann sheets. With this restriction, we identified 35 possible configurations. Each of these configurations corresponds to one output node in our DNN. The possible configurations and the output-node assignment are shown in Table I.
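The count of 35 follows from distributing at most four poles over the three unphysical sheets [bt], [tb], [bb], where only the per-sheet multiplicities matter. A quick enumeration confirms this:

```python
from itertools import product

# Each configuration is a triple of pole counts (n_bt, n_tb, n_bb) on the
# three unphysical sheets, with at most four poles in total.
configs = [
    (n_bt, n_tb, n_bb)
    for n_bt, n_tb, n_bb in product(range(5), repeat=3)
    if n_bt + n_tb + n_bb <= 4
]
assert len(configs) == 35  # matches the 35 output nodes in Table I
```

Equivalently, the count is the sum of multiset coefficients 1 + 3 + 6 + 10 + 15 = 35 for zero through four poles.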
The enhancements in the GW-SAID πN amplitude occur only in a limited portion of the scattering region. Thus, it is practical to limit the region where we produce and count our poles in each Riemann sheet. We define the counting region by bounding the real and imaginary parts of the pole energies, in units of MeV. Here, we do not have to count the conjugate poles on the upper half of [bt] or [bb] and the lower half of [tb], since they do not correspond to different independent states. In addition to the counted poles, we also produce distant background poles far from the scattering region, but these are not counted. By randomly generating at most four poles inside the counting region, we constructed 1.8 × 10^6 labeled amplitudes, divided uniformly into the 35 configurations, for the training set. Our task is to find a map from the input amplitude to the corresponding output pole configuration in the form of a DNN model. To monitor the performance of the DNN, we also produce an independent 3.5 × 10^4 labeled amplitudes for the testing set.

Fig. 4 shows the basic architecture of our DNN model. The first 37 nodes in the input layer are for the random energy points chosen in each bin, and the next two sets of 37 nodes are for the real and imaginary parts of the amplitude. For the output layer, we use 35 nodes such that each one corresponds to a pole configuration listed in Table I. There are no general guidelines on how many hidden layers, and how many nodes per layer, we should assign to obtain an optimal result. It is, therefore, useful to test several architectures. In this study, we experiment with six different architectures, with the hidden-layer designs shown in Table II.
Except for the input-layer nodes and the biases, all the other nodes are equipped with an activation function. For the hidden-layer nodes, we use the rectified linear unit (ReLU), $a_n^{(l)} = \max(0, z_n^{(l)})$ for the nth node of the lth hidden layer ($l = 1, \dots, L$), where L is the last of the hidden layers. We also use the softmax cross entropy as our cost function. The construction of the DNN models and the execution of the training loop are all done in Chainer [29-32].
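For reference, the ReLU activation and the softmax cross entropy can be written in a few lines of numpy (a generic sketch, not the Chainer implementation used in the paper):

```python
import numpy as np

def relu(z):
    """Rectified linear unit, applied elementwise."""
    return np.maximum(0.0, z)

def softmax_cross_entropy(logits, label):
    """Cross entropy of softmax(logits) against an integer class label,
    computed in a numerically stable way (log-sum-exp with max shift)."""
    z = logits - logits.max()
    log_softmax = z - np.log(np.exp(z).sum())
    return -log_softmax[label]

logits = np.array([2.0, 1.0, -1.0])
loss = softmax_cross_entropy(logits, 0)
assert loss > 0  # softmax probability < 1 => positive loss
assert np.all(relu(np.array([-1.0, 0.0, 0.5])) == np.array([0.0, 0.0, 0.5]))
```

The loss decreases as the logit of the correct class grows relative to the others, which is what the stochastic gradient descent below exploits.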
In a typical training loop, we feed the teaching dataset using some mini-batch procedure to add stochasticity in estimating the cost function. Then, we execute the optimization of the weights and biases using some variant of stochastic gradient descent [33]. We performed these procedures on all our models and found that none learns the classification problem. In particular, the training and testing accuracies remain at around 2.86% no matter how many training epochs we use. The observed accuracy is, in fact, what we would get by making a random guess out of 35 possibilities. We tried different optimizers and different combinations of the hyperparameters shown in Table III but still obtained non-improving performance.
The difficulty in learning the classification problem is due to the noise introduced in the generation of the training dataset. This noise includes the random choice of energy points in evaluating amplitudes and the randomly added unitary background poles. It is often advisable to use a dataset with less noise to improve the training performance [34], but this defeats the purpose of simulating the energy resolution in the experimental data. The inclusion of energy resolution in the design of DNN inevitably results in a noisy dataset. Thus, we resort to a different approach to initiate the learning process without tampering with our teaching dataset. In the following, we introduce the idea of the curriculum method and how it can start the learning of a complicated classification problem.

C. Curriculum method
The curriculum method was first developed in Ref. [35] for simple cases. Depending on the difficulty of the classification problem, a more rigorous treatment requires a well-organized training dataset [36-38]. For our purpose, it is sufficient to adopt a heuristic approach where we subjectively identify the simplest dataset and introduce new classifications until we have presented the entire training set. Of the 35 pole-configuration classifications, four correspond to at-most-one-pole configurations (labels 0, 1, 11, and 21 of Table I). We treat these four classes as the simplest examples and call them curriculum 1. Then we add one of the two-pole classifications and call the new set curriculum 2. We continue until we have included all 35 classifications in the final curriculum 32. Table IV shows the incremental progression of the curriculum training. Note that by the end of curriculum 7, we have introduced all the two-pole configurations. Similarly, all the three-pole configurations are presented by curriculum 17 and all the four-pole configurations by curriculum 32. The code used to build each curriculum is in the public repository [39].
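The incremental construction can be sketched as below. The insertion order of the new classes here is a placeholder (the actual order follows Table IV, and the code used in the paper is in Ref. [39]):

```python
# Curriculum 1 holds the four at-most-one-pole classes (labels 0, 1, 11, 21
# per Table I); each later curriculum adds exactly one new class.
base = [0, 1, 11, 21]
remaining = [k for k in range(35) if k not in base]  # placeholder order

curricula = []
for step in range(32):
    curricula.append(sorted(base + remaining[:step]))

assert len(curricula) == 32
assert len(curricula[0]) == 4    # curriculum 1: simplest classes only
assert len(curricula[-1]) == 35  # curriculum 32: all classes present

def curriculum_subset(dataset, labels):
    """Keep only the (input, label) pairs belonging to the current curriculum."""
    allowed = set(labels)
    return [(x, y) for x, y in dataset if y in allowed]
```

Training then proceeds on `curriculum_subset(train_set, curricula[k])` for increasing k, so the DNN always starts a new curriculum from weights that already solve the easier problem.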
We trained all six models using curriculum 1 for the first 300 epochs, curriculum 2 for the subsequent 300 epochs, and then curriculum 3 until epoch 650. The goal here is to find which architecture performs best and then devote the rest of the training to the chosen DNN. Fig. 5 shows the performance of the six DNNs. All models give decent accuracies at the start of curriculum 1, indicating that the chosen simple classification is indeed learnable. There is a noticeable accuracy drop at the start of curriculum 2 and curriculum 3 due to the introduction of a new classification set. Nevertheless, a few more training epochs within each curriculum improve the accuracy. The essential point here is that all our models start to learn the classification problem using the curriculum approach. Of the six models, the architecture with hidden layers [200-200-200] shows the most promising performance, especially at the onset of curriculum 3. We therefore devote the rest of our computing resources to DNN model 2.
Note that it is not practical to perform many epochs per curriculum since the accuracy will drop at the start of each new curriculum anyway. It suffices to reach some decent accuracy for each curriculum and continue the training after all the classifications have been introduced. Hence, we restart the training of the chosen DNN and, to accelerate the process, use only 100 epochs per curriculum, continuing the training once all the pole configurations are introduced. We also vary the mini-batch size, from 512 in the early part of the curriculum epochs, to take advantage of the larger stochasticity, to 4,608 in the later part, to stabilize the accuracy. At the end of epoch 3,200 (end of curriculum 32), we have a DNN model that can detect up to four poles in any Riemann sheet, with training and testing accuracies of 63.5% and 68.3%, respectively. We continue until epoch 31,050 using the typical training loop and obtain final training and testing accuracies of 76.5% and 80.4%, respectively [11].
Recall that in the construction and training of our DNN model, only the energy resolution is included, not the fluctuations of the amplitudes that correspond to the statistical errors in the experimental data. The goal of using a generic S-matrix is to teach the DNN to recognize only those amplitudes that satisfy the unitarity, analyticity, and hermiticity requirements. This restriction is justified because we expect the actual experimental data to conform to these general properties. Inserting a random error or offset into the training amplitudes would violate such requirements. In the inference stage, the trained DNN reinterprets the input experimental amplitudes as if they satisfy the expected properties, even in the presence of some offsets. This feature is the advantage of having a model with many parameters (weights and biases): the trained DNN automatically discards the irrelevant peculiarities of the input data. We further describe the inference stage in the next section.

IV. APPLICATION TO πN SCATTERING
We can now use our trained DNN to make inferences on the experimental data. Due to the error bars of the data, we can expect multiple possible interpretations of the amplitude of interest. The result of the DNN inferences must reflect the uncertainty due to the error bars in the experimental data. We accomplish this by combining points in each error bar to produce several amplitudes. Here, we can further assume that some probability distribution weights the points in each error bar. Except for the assumed probability distribution, the amplitudes generated for the DNN inference come directly from the experimental data without imposing anything.

A. Interpreting the error bar
We usually interpret each error bar as one standard deviation σ of a Gaussian distribution around the central value. Therefore, we can generate 10^6 amplitudes directly from the experimental data by drawing points in each error bar using the Gaussian distribution. Note that the way we interpret the error bar corresponds to the confidence level that the points used to generate the amplitudes are within the error bar of the data. For example, interpreting the error bar as 2σ instead of σ raises the confidence level from 68% to 95%, and so on. In Fig. 6 we use different interpretations of the error bar in the generation of 10^6 amplitudes and then count the number of DNN inferences for each pole configuration. Notice that as we increase the confidence level of the error bar, the most likely configuration emerges. Specifically, we find that the structures in the elastic πN amplitude are caused by one pole in [bt], one pole in [bb], and two poles in [tb]. The other three configurations are suppressed as the confidence level is increased, implying that these three configurations correspond to amplitudes produced by points outside the error bars.
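The Gaussian resampling of the error bars can be sketched as follows; the data arrays here are toy placeholders, not the GW-SAID values:

```python
import numpy as np

rng = np.random.default_rng(1)

def resample_amplitudes(re_t, re_err, im_t, im_err, n_samples, rng, scale=1.0):
    """Draw n_samples amplitudes, treating each error bar as `scale` Gaussian
    standard deviations around the central value (scale=1 -> 68% CL)."""
    re = rng.normal(re_t, scale * re_err, size=(n_samples, len(re_t)))
    im = rng.normal(im_t, scale * im_err, size=(n_samples, len(im_t)))
    return np.stack([re, im], axis=1)  # shape (n_samples, 2, n_points)

# Toy "data": 37 points with small errors (illustration only).
re_t = np.linspace(-0.2, 0.4, 37)
im_t = np.linspace(0.0, 0.6, 37)
samples = resample_amplitudes(re_t, 0.05 * np.abs(re_t) + 1e-3,
                              im_t, 0.05 * np.abs(im_t) + 1e-3, 1000, rng)
assert samples.shape == (1000, 2, 37)
```

Feeding each row of `samples` (together with its energy grid) to the trained DNN yields one inference per resampled amplitude.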
One advantage of our approach is that we can go beyond the typical Gaussian distribution of points in each error bar and use other distributions. In particular, all the points in each error bar may be equally important, in which case a uniform distribution is appropriate. In this way, all points used to generate the amplitudes are guaranteed to remain within the error bar. The DNN inference result, with uniformly distributed points in each error bar, is shown in Table V. It is interesting to note that changing the weights with which the points are combined to produce the amplitudes does not affect the DNN inference very much. That is, we still reach the same conclusion that the πN amplitude is best described by one pole in [bt], one pole in [bb], and two poles in [tb].

Note that we made no assumptions on the poles detected by the DNN. At this point, one can now use a model to interpret the origin of the detected poles. Here, we give some general statements based on the expected effects of Riemann-sheet poles on the scattering amplitude. First, the most prominent structure in Fig. 1 is the enhancement between the KΛ and KΣ thresholds. For the two-channel analysis, such an enhancement can only be produced by at least one [bb] pole above the second threshold. This means we can associate this enhancement with the [bb] pole that our trained DNN consistently identifies.
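Selecting the best pole configuration then amounts to a majority vote over the resampled amplitudes. In the sketch below, `predict` is a hypothetical placeholder for the trained DNN's most-probable class:

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(2)

def predict(amplitude):
    """Hypothetical stand-in for the trained DNN: in practice this would be
    the argmax over the 35 output nodes for one resampled amplitude."""
    return int(rng.integers(0, 35))

def modal_configuration(amplitudes):
    """Tally the DNN inferences and return the modal class and its vote share."""
    votes = Counter(predict(a) for a in amplitudes)
    label, count = votes.most_common(1)[0]
    return label, count / len(amplitudes)

label, frac = modal_configuration(range(1000))
assert 0 <= label < 35 and 0.0 < frac <= 1.0
```

With the real DNN, the vote share of the winning class grows as the error-bar confidence level is increased, which is the behavior seen in Fig. 6.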
The next noticeable structure is around the ηN threshold. As an intuitive interpretation, one may associate the detected [bt] pole with this near-ηN structure. However, a near-threshold [bt] pole is expected to give rise to an amplitude peak below the threshold with an imaginary part close to unity, i.e., reaching the unitarity limit. This is not the case for the near-ηN enhancement. Also, notice that the DNN sometimes identifies the [bt] pole as a [tb] pole (compare the first and third rows of Table V). The comparison suggests that the [bt] pole is near the [bt]-[tb] interface, which is located above the ηN threshold (see Fig. 2(b)). The detected [bt] pole may instead be associated with the enhancement between the KΛ and KΣ thresholds in the form of a shadow pole. Thus, we can only attribute the ηN threshold enhancement to a non-[bt] sheet pole.
The enhancement around the ηN threshold could be due to the threshold cusp effect. However, to make the structure prominent, some nearby poles must be associated with it. We can only make further statements about the origin of the detected poles by appealing to some dynamical model. Nevertheless, we can offer some general speculations on the nature of the detected poles. It might be the case that, in the zero-coupling limit of the πN and ηN channels, the detected [bb] and [bt] poles move towards a common position, resulting in a typical Breit-Wigner resonance of the lower channel. Such behavior is the typical feature of a pole-shadow pair. On the other hand, in the zero-coupling limit, one of the [tb] poles may go to the [bb] sheet and approach the same position as the other [tb] pole. In this way, the pole decouples from the lower channel and results in either a near-threshold resonance or just a virtual state of the higher channel. A dynamical model is needed to verify the above speculations.

V. CONCLUSION AND OUTLOOK
We have demonstrated how to design a DNN that can extract the pole configuration of a given scattering amplitude. Using a generic S-matrix, we can produce a sizable teaching dataset to train our model to detect the number of poles in each Riemann sheet. Pole generation using Eqs. (2)-(4) gives us the advantage of performing deep learning analysis on any element of the S-matrix. This prescription allows us to use the method on scattering processes such as 2 → 2 or 1 ↔ 2. Our approach can be extended to more channels by adding an extra $\lambda' [(p_3 - i\beta_3)^2 - \alpha_3^2]$ term, where the new parameter $\lambda'$ can be adjusted to control the new shadow produced by the new channel. Additionally, the uniformization method introduced in Refs. [40,41] provides an alternative way to control the poles and generate the training dataset. One possible advantage of uniformization is that it can be extended to relativistic scattering; this will be considered elsewhere. Also, we have shown the effectiveness of the curriculum method in initiating the learning process with a noisy dataset. The curriculum method allows us to accommodate the limited energy resolution in the design of the DNN. We also provided a method of utilizing the error bars in the experimental data in a statistically robust way. By generating several experimental amplitudes, we can make a reasonable interpretation of the data based on the most consistent inference of the DNN.
Finally, our proposed S-matrix treats the experimental results in a model-independent way. The poles produced for the training dataset are independent of each other and are not constrained by any a priori trajectory. The above implies that the trained DNN makes no assumptions in detecting the poles associated with the experimental data. It is now up to some dynamical model to interpret which set of poles are supposed to be paired as pole-shadow partners or which one is independent of the other.

ACKNOWLEDGMENT
This study was supported in part by MEXT as "Program for Promoting Researches on the Supercomputer Fugaku" (Simulation for basic science: from fundamental laws of particles to creation of nuclei). DLBS is supported in part by the DOST-SEI ASTHRDP postdoctoral research fellowship. YI is partly supported by JSPS KAKENHI Nos. JP17K14287 (B) and 21K03555 (C). AH is supported in part by JSPS KAKENHI No. JP17K05441 (C) and Grants-in-Aid for Scientific Research on Innovative Areas, Nos. 18H05407 and 19H05104.

APPENDIX
We show that an irrelevant independent pole can remove the cusp at the higher threshold of a coupled channel scattering. Note that the origin of a threshold cusp is purely kinematical. This will occur at E = T 2 when the amplitude has an explicit dependence on p 2 , where p 2 ∝ √ E − T 2 . If we can manage to remove the p 2 (like turning off γ 2 in two-channel Breit-Wigner) or turn p 2 into some analytic function of E in the amplitude, then the cusp singularity should disappear.
Let the relevant pole $E_{\mathrm{pole}}$ in, say, the [bb] sheet (i.e., Re $E_{\mathrm{pole}} > T_2$) satisfy the condition

$$F(E_{\mathrm{pole}}) + ig_1(E_{\mathrm{pole}})\,p_1 + ig_2(E_{\mathrm{pole}})\,p_2 = 0,$$

where $F, g_1, g_2$ are analytic functions of E. We can always find $F, g_1, g_2$ such that the nearest singularity to the scattering region is an isolated simple pole. Since $E_{\mathrm{pole}}$ is in [bb], we can express the momentum poles $(p_1, p_2)$ as

$$p_1 = -i\beta_1 \pm \alpha_1, \qquad p_2 = -i\beta_2 \pm \alpha_2,$$

where we set $\alpha_1, \alpha_2, \beta_1, \beta_2$ to be positive. It is understood that $p_1$ and $p_2$ are related to $E_{\mathrm{pole}}$ via the energy constraint in Eq. (1). It follows that the modified equation

$$F(E) + ig_1(E)\,p_1 - ig_2(E)\,p_2 = 0$$

is satisfied by the same $E = E_{\mathrm{pole}}$ but in the [bt] sheet, i.e., the momentum poles are

$$p_1 = -i\beta_1 \pm \alpha_1, \qquad p_2 = +i\beta_2 \mp \alpha_2.$$

Now, consider the S-matrix element $S_{11}(p_1, p_2)$ given by

$$S_{11}(p_1, p_2) = \frac{\left[F - ig_1 p_1 + ig_2 p_2\right]\left[F - i\bar{g}_1 p_1 - i\bar{g}_2 p_2\right]}{\left[F + ig_1 p_1 + ig_2 p_2\right]\left[F + i\bar{g}_1 p_1 - i\bar{g}_2 p_2\right]}.$$

The above form is consistent with $S_{11}(p_1, p_2) = D(-p_1, p_2)/D(p_1, p_2)$ in Eq. (2). From the previous discussion, we know that the first parenthetical factor of the denominator will generate the relevant pole $E_{\mathrm{pole}}$ in the [bb] sheet.
The explicit dependence of $S_{11}(p_1, p_2)$ on $p_2$ is now replaced with $p_2^2$. We still have a peak at around $E = \mathrm{Re}\,E_{\mathrm{pole}}$ due to the relevant pole, since the denominator does not cancel out as we perform the limiting procedure. However, the amplitude no longer has a branch cut or threshold cusp at $E = T_2$ due to the absence of explicit $p_2$ dependence. Furthermore, from Eq. (2), the limit $\bar{g}_1 \to g_1$ and $\bar{g}_2 \to g_2$ gives $S_{22} \to 1$ and $S_{12}^2 \to 0$. This means that putting an arbitrary irrelevant pole at the same position as the main pole, but on a different Riemann sheet, effectively decouples the two channels.
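The decoupling argument can be verified numerically: in the limit $\bar{g}_i = g_i$, the denominator collapses to $(F + ig_1 p_1)^2 + (g_2 p_2)^2$, so flipping the sign of $p_2$ (crossing the channel-2 cut) leaves $S_{11}$ unchanged. A quick check with arbitrary values:

```python
# Denominator of S_11 in the gbar_i = g_i limit of the appendix:
# D(p1, p2) = (F + i g1 p1 + i g2 p2)(F + i g1 p1 - i g2 p2)
#           = (F + i g1 p1)^2 + (g2 p2)^2,
# which depends on p2 only through p2^2.
def D(F, g1, g2, p1, p2):
    return (F + 1j * g1 * p1 + 1j * g2 * p2) * (F + 1j * g1 * p1 - 1j * g2 * p2)

# Arbitrary complex test values (illustration only).
F, g1, g2 = 0.3 - 0.1j, 0.7, 0.4
p1, p2 = 1.2 - 0.3j, 0.5 - 0.8j

def S11(p1, p2):
    """S_11 = D(-p1, p2) / D(p1, p2) as in Eq. (2)."""
    return D(F, g1, g2, -p1, p2) / D(F, g1, g2, p1, p2)

# No channel-2 branch cut: p2 -> -p2 changes nothing.
assert abs(D(F, g1, g2, p1, p2) - D(F, g1, g2, p1, -p2)) < 1e-12
assert abs(S11(p1, p2) - S11(p1, -p2)) < 1e-12
```

The invariance under $p_2 \to -p_2$ is exactly the disappearance of the $T_2$ cusp discussed above.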