Nuclear mass predictions using machine learning models


I. INTRODUCTION
The atomic nucleus, a strongly correlated many-body system, is characterized by its proton (Z) and neutron (N) numbers. Mass is a fundamental property of atomic nuclei, playing a crucial role in our understanding of strong nuclear interactions and a vital role in nuclear astrophysical calculations, such as r-process simulations, where masses serve as inputs [1,2]. Experimentally, it is possible to study nuclei near the stability line, and accurate data for the masses of these nuclei are available [3,4]. However, despite significant advancements in nuclear facilities, measurements on the neutron-rich side of the nuclear chart will remain unfeasible in the near future, and the majority of nuclei involved in the r-process have yet to be explored. For instance, the neutron drip line, which indicates the position of the last bound nucleus, has been confirmed only up to Z = 10 [5], and the boundaries of the nuclear landscape are not known experimentally. Therefore, our understanding of nuclear properties away from the stability line and of the limits of the nuclear landscape relies heavily on theoretical calculations.
Up to now, several theoretical models have been used to investigate nuclear properties and determine the location of the drip lines. Among them, microscopic-macroscopic (mic-mac) global nuclear mass models, such as the Weizsäcker-Skyrme nuclear mass tables (WS4) [6] and the Finite-Range Droplet Model (FRDM(2012)) [7], have demonstrated considerable success and have been extensively used in r-process calculations over the years. However, despite the success of the mic-mac models in fitting the measured masses of nuclei, the root mean square (rms) errors with respect to the known experimental data are not at the desired level. The rms error with respect to the available mass data is 0.298 MeV for the WS4 model [6] and 0.662 MeV for the FRDM(2012) model [7]. Furthermore, these models have been fitted to the available experimental data, which leaves the behavior of nuclei on the neutron-rich side of the nuclear chart somewhat uncertain.
More sophisticated methods, such as self-consistent mean-field (SCMF) theories based on the Hartree-Fock-Bogoliubov (HFB) approach with nuclear energy density functionals (EDFs), have also long been employed to investigate the properties of nuclei. Although microscopic calculations are computationally demanding, large-scale computations of the nuclear chart are nonetheless available. In recent years, both relativistic and non-relativistic calculations have been performed using different EDFs to probe the properties of nuclei and to define the boundaries of the nuclear landscape [8-17]. While these models perform well around the stability line with respect to the available experimental data, they reveal local variations and significant discrepancies that increase with neutron number, ultimately impacting the location of the drip lines. The major sources of these differences are the correlations missing at the purely mean-field level of the HFB description, the use of different interactions that are optimized using different strategies, pairing correlations, and the impact of the continuum. Typically, the rms errors of these mass tables with respect to the available experimental data are quite high and range between 2.0 and 5.0 MeV, depending on the interaction used in the calculations. Some of the most recent functionals, which are optimized using the experimental data of all available nuclei, have reached rms errors between 0.5 and 0.6 MeV [14, 18-21]. Considering all of these factors, there is a need for fast and reliable methods for determining nuclear properties, especially away from the stability line.
In recent years, machine learning (ML) models have gained considerable attention within the scientific community and have demonstrated notable success, including in the field of nuclear physics (see Ref. [22] and references therein). These models have proven capable of directly predicting nuclear properties using experimental data [23-32]. Recently, ML models have also been used to improve mic-mac and microscopic model predictions. Within this framework, the most popular tool is the Bayesian neural network (BNN), which has been used to improve the results of microscopic calculations by training on the residuals, i.e., the differences between experimental data and microscopic calculations. BNNs have gained considerable attention and success in that respect [33-36]. In this context, ML models can be used as reliable and efficient tools to probe nuclear properties; however, more studies are necessary to better understand their predictive capability.
In this study, our goal is to assess the performance of two ML models in predicting the nuclear mass excess (M) of nuclei, rather than correcting existing mic-mac or microscopic model predictions. We use the Support Vector Regression (SVR) and Gaussian Process Regression (GPR) ML models to calculate the mass excess of nuclei. These models are trained using available experimental data along with a relevant physics-based feature space. Then, we evaluate the performance of these ML models in predicting the mass excess of nuclei, also examining their extrapolation capabilities far beyond the training and test regions.

II. MACHINE LEARNING MODELS
In this section, we present an overview of the ML models employed in our calculations: SVR and GPR. Additionally, we describe the experimental data used to train these models and provide details about the physics-based feature space involved.

A. Support Vector Regression
The SVR [37] is an ML model specifically designed for regression tasks, offering a distinct approach to predicting continuous outcomes by leveraging the principles of support vector machines (SVMs) [38]. In contrast to classification-focused methods, SVR seeks a hyperplane that optimally fits the training data with minimized error. Central to SVR is the concept of support vectors, which are the critical data points closest to the hyperplane's boundaries. The model aims to fit as many data points as possible within a specified tolerance margin around the optimal hyperplane. It simultaneously controls margin violations, addressing instances where data points fall outside the boundaries. This introduces a hyperparameter ϵ, which controls the width of the tolerance margin around the hyperplane [39].
The strength of SVR is particularly notable in addressing non-linear regression problems, often yielding enhanced results [40]. At the core of SVR's approach to these problems is the kernel trick. This technique is crucial when the relationship between the inputs and the target is not linear in the original feature space. By employing the kernel trick, SVR implicitly projects the data into a higher-dimensional space in which a linear fit can capture nonlinear relationships. This projection is facilitated by a kernel function, which efficiently calculates the dot product of data point pairs in the higher-dimensional space without the need for explicit calculation of the transformed features. Among various kernel functions, the Radial Basis Function (RBF) kernel is a widely used choice in SVR applications, defined as:
$$K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^{2}\right) \tag{1}$$
In this context, $x$ and $x'$ represent two data points, and their Euclidean distance is denoted as $\lVert x - x' \rVert$, while γ is the kernel coefficient. Eq. 1 quantifies the similarity or dissimilarity between these data points, based on their distance in the input feature space. This results in higher similarity for closer data points, and conversely, lower similarity for those more distant [39,41].
The tuning of hyperparameters plays a crucial role in SVR. Proper parameter adjustment is important to prevent overfitting or underfitting, ensuring the model generalizes well to unseen data. The regularization hyperparameter, denoted as C, is essential in striking the balance between maximizing the margin and minimizing the training error. Additionally, the ϵ hyperparameter determines the tolerance margin within which the ϵ-insensitive loss function does not penalize errors. Data points within this margin do not contribute to the loss function, which makes the model robust against minor prediction errors. In our experiments, we set ϵ to 0.002, C to 1000, and γ to 0.03. Another hyperparameter, the tolerance, which indicates the desired precision for convergence, is set to $10^{-5}$. We performed calculations with several hyperparameter configurations to determine the optimal setting for our task, and ultimately report the model that exhibits the highest performance.
SVR offers a versatile framework for regression tasks, utilizing kernels to capture diverse relationships and incorporating a margin of tolerance to enhance robustness. Practical hyperparameter tuning and an understanding of the role of kernels are fundamental for optimizing the model performance across various datasets and maximizing its efficiency in real-world applications.
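As an illustration of the configuration described above, the following is a minimal sketch that assumes a scikit-learn implementation (the text does not name the library). Only the hyperparameter values (ϵ, C, γ, tolerance) come from the text; the training data are a synthetic one-dimensional stand-in, not nuclear masses.

```python
import numpy as np
from sklearn.svm import SVR

# Hyperparameters quoted in the text; the library choice is an assumption.
svr = SVR(kernel="rbf", C=1000, epsilon=0.002, gamma=0.03, tol=1e-5)

# Synthetic stand-in data (NOT nuclear masses): a smooth 1-D target.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 10.0, size=(200, 1))
y_train = np.sin(0.5 * X_train[:, 0])

svr.fit(X_train, y_train)

# Predict on points inside the training range and measure the worst error.
X_new = np.linspace(1.0, 9.0, 50).reshape(-1, 1)
y_pred = svr.predict(X_new)
max_abs_error = float(np.max(np.abs(y_pred - np.sin(0.5 * X_new[:, 0]))))
```

In a setup like this, only the training points lying on or outside the ϵ-tube become support vectors, so `svr.support_` gives a direct view of which samples shape the fit.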

B. Gaussian Process Regression
When we consider a linear model expressed as $y = w^{T} x$, this model describes a different linear relationship for every value of $w$. If we introduce a prior distribution for $w$, denoted as $p(w)$, the distribution of possible $y$ values at any given $x$, $y(x|w)$, emerges from sampling $w$ from $p(w)$. This is the main idea of a Gaussian process. When $p(w)$ follows a Gaussian distribution, each resulting $y$ is also Gaussian, being a linear combination of Gaussians. Specifically, our interest lies in the joint Gaussian distribution of the $y$ values computed at the $N$ input data points $x_t$, where $t = 1, \ldots, N$ [42,43]. We typically assume a zero-mean Gaussian prior for $w$, as shown in Eq. 2:
$$p(w) \sim \mathcal{N}\left(0, \tfrac{1}{\alpha} I\right) \tag{2}$$
GPR [42,44] operates by leveraging Gaussian processes to model distributions over functions. Initially, the algorithm establishes a prior distribution over functions, assuming Gaussian-distributed function values at the input points. This prior distribution is characterized by a mean function and a kernel function. As training data are observed, the prior is updated to a posterior distribution using Bayes' theorem. This update incorporates the observed data, refining the model's beliefs about the underlying function. The resulting posterior distribution enables predictions at new data points, providing not only a mean prediction but also an associated measure of uncertainty, which is crucial for decision-making in uncertain scenarios.
The choice of kernels is important in GPR. In our work, we utilize a combination of the RBF kernel (Eq. 1) and the White kernel (Eq. 3). The RBF kernel is particularly effective at capturing intricate data patterns, adapting to various scales, and ensuring smooth connections between data points. Meanwhile, the White kernel models noise within the dataset. Adjusting the noise level hyperparameter of the White kernel is essential for striking a balance between capturing the underlying signal and accommodating inherent noise. This interplay between kernels enables our GPR model to provide robust predictions while acknowledging and quantifying uncertainties. To optimize the model, we scan a range of kernel parameter values to obtain the optimum values for our calculations. In our experiments, the length scale of the RBF kernel is set to 1.0, with lower and upper bounds of $(10^{-4}, 10^{5})$. For the White kernel, the noise level is set to 1, with lower and upper bounds of $(10^{-10}, 10)$.
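The kernel combination and bounds quoted above can be sketched as follows, again assuming scikit-learn (an assumption; the text does not name the library). In that library the White kernel equals the noise level when the two inputs are identical and zero otherwise. The data are a synthetic stand-in, and the factor 1.96 for an approximate 95% interval is the usual Gaussian value rather than something stated in the text.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Initial values and bounds quoted in the text; the optimizer refines them.
kernel = RBF(length_scale=1.0, length_scale_bounds=(1e-4, 1e5)) + WhiteKernel(
    noise_level=1.0, noise_level_bounds=(1e-10, 10.0)
)
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0)

# Synthetic stand-in data (NOT nuclear masses): noisy samples of a smooth curve.
rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 5.0, size=(40, 1))
y_train = np.sin(X_train[:, 0]) + 0.05 * rng.normal(size=40)
gpr.fit(X_train, y_train)

# One query inside the training range, one far outside it.
X_query = np.array([[2.5], [15.0]])
mean, std = gpr.predict(X_query, return_std=True)
lower, upper = mean - 1.96 * std, mean + 1.96 * std  # approx. 95% interval
```

The predictive standard deviation at the distant query point is larger than at the interior point, which is the behaviour one expects from a GP away from its training data. After fitting, `gpr.kernel_` holds the optimized length scale and noise level.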
GPR's strength lies not only in its predictive accuracy but also in its ability to indicate how reliable those predictions are. This is achieved through its uncertainty quantification, which offers a probabilistic measure of confidence in each prediction. This dual capability of delivering precise predictions while simultaneously assessing their reliability makes GPR a valuable tool across a wide range of applications.
FIG. 1: The training set (gray circles), test set (red circles), and extrapolation set (blue circles) used in the ML models. Both the training and test sets include nuclei from AME2020, with the exception of the newly measured 71 nuclei from AME2020, which are exclusively designated for the extrapolation set [4].

C. The experimental data and feature space
In this study, our objective is to develop an ML model that predicts the mass excess of atomic nuclei using both experimental data on the mass excess of nuclei and a physics-based feature space. The experimental mass excess values are taken from the atomic mass evaluation 2020 (AME2020) [4] for nuclei with Z, N ≥ 8 (2386 nuclei). The experimental data are then randomly divided into two subsets: 75.0% (1789 nuclei) for training and 25.0% (597 nuclei) for testing. The nuclei in the training and test sets remain the same for all calculations. The performance of the ML models is also assessed beyond the training and test data sets. The AME2020 data include new experimental information for 71 nuclei compared to the previous AME2016 [3]. These nuclei are used to test the extrapolation capabilities of the models. The estimated mass excess values, derived from the trends in the mass surface (TMS) of nuclei, are also used for comparison with our findings in the extrapolation region. The selection of the training and test sets, as well as the new data from AME2020 (extrapolation set), is shown in Figure 1. Additionally, we evaluate the extrapolation performance of the models by extending calculations to the neutron-rich region beyond the reach of current experimental facilities, probing the limits of their predictive capabilities.
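A fixed random 75%/25% split of this kind can be sketched as below, assuming scikit-learn's `train_test_split`. The seed value is hypothetical and is only there to show how the same partition can be reused across all models; the indices stand in for the 2386 nuclei.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_nuclei = 2386                 # nuclei with Z, N >= 8, as stated in the text
indices = np.arange(n_nuclei)   # stand-in for the rows of the mass table

# random_state is a hypothetical seed; fixing it keeps the split identical
# for every model that is trained afterwards.
train_idx, test_idx = train_test_split(indices, test_size=0.25, random_state=42)
```

With these settings the split sizes come out as 1789 and 597, matching the numbers quoted above.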

TABLE I: The ML models with different features (columns: Model, Feature Space).

As is well known, the use of appropriate inputs during training can significantly impact the performance of ML models [26-28, 30, 31]. Therefore, in our models, we incorporate relevant features of nuclei that can influence mass predictions. Our feature space consists of 12 inputs: $Z$, $N$, $A$, $A^{2/3}$, $(N - Z)/A$, $Z_{eo}$, $N_{eo}$, $\nu_Z$, $\nu_N$, $P_F$, $Z_{shell}$, and $N_{shell}$. Here, the bulk properties are defined as the proton (neutron) number $Z$ ($N$), the mass number ($A$), and $A^{2/3}$ for volume and surface terms. The term $(N - Z)/A$ is a measure of isospin asymmetry. The odd-even nature of protons ($Z_{eo}$) and neutrons ($N_{eo}$) is defined as follows: $Z_{eo}$ ($N_{eo}$) equals zero when $Z$ ($N$) is even and one when $Z$ ($N$) is odd. We also provide information about the nuclear magic numbers: $\nu_Z$ and $\nu_N$ represent the valence number of protons and neutrons measured from the nearest closed shell. The nuclear magic numbers for protons and neutrons are taken as $Z(N) = 8, 20, 28, 50, 82, 126, 184$. The promiscuity factor is defined as $P_F = \nu_Z \nu_N / (\nu_Z + \nu_N)$ and serves as a measure of valence proton-neutron (p-n) interactions [45]. Lastly, the system is informed about the nuclear shells through $Z_{shell}$ and $N_{shell}$, which represent the shell model orbitals of the last proton and neutron. The values of $Z_{shell}$ and $N_{shell}$ are defined as 0, 1, 2, 3, or 4, depending on whether the proton or neutron number falls within the ranges 1-28, 29-50, 51-82, 83-126, or 127 and above, respectively [46]. In order to assess the importance of the feature space in model calculations, we implement ML models with different features. The inputs used in our ML models are given in Table I.
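The 12 inputs described above can be assembled with a few lines of code. The sketch below makes three labeled assumptions: the valence number is taken as the absolute distance to the nearest magic number, $P_F$ is set to zero when both valence numbers vanish (the formula is otherwise undefined there), and the last shell bin starts at 127. The function names are hypothetical.

```python
MAGIC = (8, 20, 28, 50, 82, 126, 184)   # magic numbers quoted in the text
SHELL_UPPER = (28, 50, 82, 126)         # upper edges of shell bins 0..3

def valence(n):
    """Distance to the nearest magic number (assumed reading of nu_Z, nu_N)."""
    return min(abs(n - m) for m in MAGIC)

def shell_index(n):
    """Shell bin: 0 for 1-28, 1 for 29-50, 2 for 51-82, 3 for 83-126, else 4."""
    for idx, upper in enumerate(SHELL_UPPER):
        if n <= upper:
            return idx
    return 4

def features(Z, N):
    """Return the 12 inputs in the order listed in the text."""
    A = Z + N
    nu_z, nu_n = valence(Z), valence(N)
    # Assumption: P_F = 0 when nu_Z + nu_N = 0 (doubly magic nuclei).
    p_f = nu_z * nu_n / (nu_z + nu_n) if (nu_z + nu_n) > 0 else 0.0
    return [Z, N, A, A ** (2 / 3), (N - Z) / A, Z % 2, N % 2,
            nu_z, nu_n, p_f, shell_index(Z), shell_index(N)]

nd150 = features(60, 90)    # 150Nd
pb208 = features(82, 126)   # 208Pb, doubly magic
```

For 150Nd this gives $\nu_Z = 10$, $\nu_N = 8$, and $P_F = 80/18 \approx 4.44$, while for doubly magic 208Pb both valence numbers and $P_F$ are zero.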

III. RESULTS
Figure 2 displays the absolute differences between the results of GPR (upper panels) and SVR (lower panels) with different inputs and the experimental data taken from AME2020 [4]. The feature space of the ML models is provided in Table I. The rms errors for the training and test sets of each selected model are also presented in Figure 2. Using only the bulk properties of nuclei to construct the model, GPR-5 yields reasonable results, with rms errors of 0.91 and 1.08 MeV for the training and test sets, respectively, better than most of the microscopic model calculations. On the other hand, SVR-5 performs worse than GPR-5, with rms errors of 2.40 and 2.55 MeV for the training and test sets, respectively.
In Figure 2, it is evident that enlarging the physics-based feature space significantly improves the performance of the models. The importance of the physics-based feature space has also been discussed in previous studies, with similar results obtained using different ML models [26,28,30,31]. The inclusion of the odd-even nature of protons and neutrons ($Z_{eo}$, $N_{eo}$) leads to a significant improvement in the ML predictions. The results improve further with the inclusion of information on the nuclear shells, $Z_{shell}$ and $N_{shell}$. Utilizing 12 inputs (GPR-12) in the calculations, we achieve rms errors of 0.14 and 0.26 MeV for the training and test sets, respectively. These rms errors are even better than those of well-known mic-mac mass models, suggesting that the GPR model effectively captures the given information on nuclei and makes reasonable predictions. Additionally, we observe that GPR performs better for medium-heavy and heavy nuclei, while errors are slightly higher for light nuclei. The poorer performance in light nuclei is attributed to the smaller amount of available experimental data in this region. Similar results are also obtained using the SVR model. However, we find that SVR requires more data and information to make reasonable predictions for training and test set nuclei, and GPR outperforms SVR in that respect.
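The rms errors quoted throughout this section are assumed here to follow the usual definition, $\sigma_{rms} = \sqrt{\frac{1}{n}\sum_i (M_i^{pred} - M_i^{exp})^2}$; the text does not spell the formula out. A minimal sketch with made-up numbers (not results from this work):

```python
import math

def rms_error(predicted, experimental):
    """Root-mean-square deviation between two equally long sequences (MeV)."""
    if len(predicted) != len(experimental):
        raise ValueError("sequences must have the same length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))

# Illustrative values only: one of three predictions is off by 2 MeV.
example = rms_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because the deviations are squared before averaging, a single large outlier dominates the result, which is why a few poorly described light nuclei can noticeably raise the overall rms error.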
We also compare our findings with previous ML studies in which different ML models have been used to predict the nuclear mass excess. One of the earliest applications of ML models in nuclear physics used SVMs to predict the nuclear mass excess [24]. It yielded rms errors of 0.35, 0.5, and 0.71 MeV for the training, validation, and test sets, respectively. A recent application of a probabilistic ML algorithm, the Mixture Density Network (MDN), has yielded rms errors of around 0.5-0.6 MeV with respect to the AME2016 [30] when supplemented with a physics-based feature space. Recently, it has been shown that the inclusion of a soft physical constraint in the MDN achieved an rms error of 0.186 MeV for the training data (consisting of only 450 nuclei, approximately 20% of the AME2016 dataset) and an rms error of 0.316 MeV for the remainder of the AME2016 data with Z ≥ 20 [31]. Therefore, we also performed calculations using different train-test set ratios to assess the performance of our ML models, and the results are presented for nuclei with Z ≥ 8 in Table II. Our findings indicate that our ML models exhibit robust predictive capabilities even when trained on a mere 25% of the available experimental data. However, as expected, the models' performance on the test set declines with reduced training data, as they struggle to grasp details with limited information. Conversely, increasing the amount of training data yields noticeable improvements in the models' test set performance, while the performance on the training set remains relatively stable. Similar results are also obtained in Ref. [32] using the MDN, although our ML models require more training data than the MDN to learn and generalize to unseen data. We anticipate that incorporating physical constraints into ML models, such as the Garvey-Kelson (GK) relations, can also enhance the predictive power of the ML model on unseen data, particularly with a limited amount of training data [31]. Alternatively, increasing the size of the training data, as demonstrated in our work, can also improve model performance on unseen data. Our results, even without applying a physical constraint, are in good agreement with the findings in Refs. [31,32] when using train-test set ratios of 50%-50% and 75%-25%.
FIG. 2: The absolute value of mass excess differences between the GPR and SVR predictions for the training and test sets using different features (see Table I) and the AME2020 data [4]. The rms errors for the training and test sets are also provided.
We conclude that our findings, obtained using different ML models, not only align with these previous studies but also establish GPR and SVR as alternative and reliable tools for ML studies in nuclear physics.
Extrapolation performance of ML models - One of the most important issues in ML studies is the low performance of ML models when it comes to extrapolation, namely, outside the training and test set regions. It is essential to develop models that not only predict well-known experimental data (training and test data) effectively but can also make accurate predictions for parts of the nuclear chart that are challenging to measure experimentally. Therefore, in this subsection, we assess the extrapolation capabilities of the ML models by extending beyond the experimentally known region. Initially, we test the performance of the ML models on the newly measured 71 nuclei from the AME2020 data [4] (see Fig. 1). We present the rms errors of each model in Table III.
Clearly, the accuracy of model predictions improves with the use of appropriate features. Specifically, increasing the number of inputs from 5 to 12 improves the performance of the GPR and SVR models in the extrapolation region by 54.73% and 67.96%, respectively. Furthermore, the low rms errors of these ML models, which are comparable to those of modern nuclear mass models, indicate that ML models are able to make reasonable predictions even outside the training region. How far can we go from the experimentally known region and still get reasonable results using ML models? In order to assess the extrapolation performance of the ML models, we extend our calculations through the proton-rich and neutron-rich regions. The results are presented for both the training and test regions (gray region) and the extrapolation region (white regions), where no experimental data currently exist. In Figure 3, we depict the predictions for the mass excess of nuclei using GPR-5 and SVR-5 for selected isotopic chains from various parts of the nuclear chart. The estimated values of the mass excess from the trends in the mass surface (TMS) are also used to assess the performance of the models in the extrapolation region [4]. Additionally, we compare these predictions with results from well-known mass tables: the mic-mac model WS4+RBF [6] and the non-relativistic Skyrme-type BSk24 [20] calculations. The relativistic calculations with the point-coupling interaction DD-PCX [47] are performed for even-even nuclei using the axially deformed relativistic Hartree-Bogoliubov (RHB) model with separable pairing [48], employing 20 harmonic oscillator shells for convergence in the calculations [15].
In GPR, the uncertainty is represented by the blue shaded region. It represents the probability distribution over the possible functions. This distribution is updated as more data or features are observed, which leads to a more precise estimate of the function. Therefore, it is expected that the uncertainty increases away from the training data, a direct consequence of the Gaussian process's roots in probability theory and Bayesian inference. As can be seen from the upper panels of Figure 3, the GPR with only 5 features performs poorly when we move away from the training-test region, and the uncertainty is quite high in the extrapolation region. Apart from the Mg chain, the GPR makes reasonable predictions for the isotopic chains up to about 4 or 5 neutrons beyond the training-test region. Beyond that, the results start to deviate and do not follow the trends obtained in different mass models. Although the rms errors for the training and test sets are higher for the SVR-5 model, it captures the trends better in the extrapolation region.
By increasing the number of features in the GPR model, we observe a significant improvement in the model's performance in the extrapolation region (Figure 4). Firstly, we note a considerable reduction in the uncertainties of the predictions. Secondly, the predictions of the GPR-12 model follow a trend that is similar and comparable to those obtained in different mass models, albeit slightly higher near the drip line. Increasing the number of features in the GPR model thus enhances its generalizability and improves its uncertainty estimation. As mentioned above, SVR-12 demonstrates improved predictions in both the training and test regions when the number of features is increased. However, an increase in the number of features in the SVR model does not lead to better results in the extrapolation region. The predictions of the SVR models start to deviate from other mass models and underestimate the mass excess values compared to them near the drip lines.
Finally, we explore the one- and two-neutron separation energies calculated using the mass excess M values obtained from our ML models and compare them with those from other models and available experimental data. The one- and two-neutron separation energies are calculated by
$$S_{n}(Z, N) = M(Z, N-1) - M(Z, N) + m_{n}, \qquad S_{2n}(Z, N) = M(Z, N-2) - M(Z, N) + 2m_{n} \tag{4}$$
where $m_n$ represents the mass of the neutron. In the upper panels of Figure 5, the results are displayed for the one-neutron separation energies of the Kr (a) and Nd (b) isotopic chains. It is evident that the ML models provide reasonable predictions and are in agreement with the experimental data, exhibiting the well-known odd-even staggering (OES) in binding energies. As the neutron number increases, the results remain comparable with other theoretical model calculations. However, near the drip lines, the ML models start to deviate from other model calculations.
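A short sketch of how separation energies follow from a table of mass excesses is given below. It assumes the standard definitions $S_n = M(Z, N-1) - M(Z, N) + m_n$ and $S_{2n} = M(Z, N-2) - M(Z, N) + 2m_n$. When $M$ is a mass excess, the neutron term has to be the neutron mass excess (about 8.0713 MeV), which is the value assumed here. The dictionary layout and the toy mass-excess values are hypothetical, not data from this work.

```python
NEUTRON_MASS_EXCESS = 8.0713  # MeV; assumed value of the neutron term for mass excesses

def s_n(mass_excess, Z, N):
    """One-neutron separation energy (MeV) from a {(Z, N): M} table."""
    return mass_excess[(Z, N - 1)] - mass_excess[(Z, N)] + NEUTRON_MASS_EXCESS

def s_2n(mass_excess, Z, N):
    """Two-neutron separation energy (MeV) from a {(Z, N): M} table."""
    return mass_excess[(Z, N - 2)] - mass_excess[(Z, N)] + 2 * NEUTRON_MASS_EXCESS

# Toy mass excesses in MeV (illustrative only).
toy = {(8, 8): -4.0, (8, 9): -1.0, (8, 10): -0.5}
sn_value = s_n(toy, 8, 10)
s2n_value = s_2n(toy, 8, 10)
```

Because $S_n$ involves neighbours of different neutron-number parity, it shows odd-even staggering, whereas $S_{2n}$ connects nuclei of the same parity and varies more smoothly along an isotopic chain.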
In the lower panels of Fig. 5, the two-neutron separation energies are displayed for the Kr (c) and Nd (d) isotopic chains. It can be observed that the ML models make reasonable predictions for the Kr chain. In comparison to the SVR-12 model, the GPR-12 model's predictions are more reasonable near the drip lines and follow a smooth decreasing behaviour with increasing neutron number. Additionally, the predictions of the GPR-12 model are comparable to the WS4 model, while the SVR-12 model results align with the BSk24 model as the neutron number increases. When it comes to nuclei near the drip lines, the predictions of the SVR-12 model become inaccurate and exhibit an increasing pattern. The ML model predictions deviate from other mass models, particularly for heavier Nd nuclei. It is also seen that the uncertainty in the GPR-12 predictions is higher for this chain in the extrapolation region. This discrepancy is a natural consequence of both the limited number of available experimental data points and the absence of information in the physics-based feature space in this particular region.
FIG. 3: The GPR-5 and SVR-5 mass excess predictions are shown for the selected isotopic chains as a function of the neutron number. The blue shaded region represents the 95.0% confidence interval, and the gray region indicates the training and test set area, while the white region is used as the extrapolation region. The estimated values for the mass excess predictions, away from the training and test set region, are derived from the trends in the mass surface (TMS) and are taken from Ref. [4]. Predictions of other mass models, namely the mic-mac model WS4 [6], the non-relativistic Skyrme-type BSk24 interaction [20], and the relativistic point-coupling interaction DD-PCX [15,47], are also provided for comparison.
FIG. 4: The same as in Fig. 3 but using the GPR-12 and SVR-12 ML models.
FIG. 5: Upper panels: one-neutron separation energies for the Kr (a) and Nd (b) isotopic chains using the GPR-12 and SVR-12 models. Lower panels: two-neutron separation energies for the Kr (c) and Nd (d) isotopic chains. The blue shaded region represents the 95.0% confidence interval. Theoretical model calculations (WS4, BSk24, DD-PCX) and experimental data are also provided when available [4].
Do the results of the ML models satisfy the Garvey-Kelson mass relations? The Garvey-Kelson relations [49], which are based on the independent particle shell model, consist of mathematical expressions that establish links among the masses of neighboring nuclides. These relations arise from the condition that various interactions between nucleons cancel out at first order, resulting in a series of mass relations between adjacent nuclei [49,50]. The GK mass relation for nuclei with N ≥ Z is given by Eq. 5, and that for nuclei with Z < N by Eq. 6. Using the results obtained from the GPR-12 and SVR-12 models, we also assess whether the GK relations are maintained in our ML models, serving as an additional evaluation of the ML models and their extrapolation abilities. In Fig. 6, we present the results of the GK relations described by Eqs. 5 and 6. It is evident that the GK relations are well maintained within the training and test set regions for the ML models under consideration. However, deviations become apparent with increasing proton and neutron numbers, especially for low-mass nuclei and along the neutron drip lines. Interestingly, while the GPR-12 model seems to perform better than the SVR-12 model near the neutron drip line (see Fig. 5), we find that the SVR-12 model exhibits better performance in the neutron-rich region concerning the GK mass relations. The differences between GPR and SVR predictions can be attributed to their distinct mathematical principles and model complexities. Including physical constraints, such as GK mass relations, alongside the physical feature space in the ML models may enhance the model predictions in the extrapolation region [31].
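A check of this kind can be sketched as follows. The code assumes the commonly quoted transverse form of the GK relation, $M(N+2, Z-2) - M(N, Z) + M(N, Z-1) - M(N+1, Z-2) + M(N+1, Z) - M(N+2, Z-1) \approx 0$; whether this matches the exact form and sign convention of Eqs. 5 and 6 is an assumption. The function name, the dictionary layout, and the toy mass surface are hypothetical.

```python
def gk_transverse(mass, Z, N):
    """Residual of the (assumed) transverse GK relation for a {(Z, N): M} table."""
    return (mass[(Z - 2, N + 2)] - mass[(Z, N)]
            + mass[(Z - 1, N)] - mass[(Z - 2, N + 1)]
            + mass[(Z, N + 1)] - mass[(Z - 1, N + 2)])

# Toy smooth mass surface (illustrative only): the six terms cancel exactly
# for any surface built from constant, linear, and Z*N contributions.
smooth = {(Z, N): 2.0 * Z - 3.0 * N + 0.1 * Z * N + 5.0
          for Z in range(18, 23) for N in range(20, 25)}
residual_smooth = gk_transverse(smooth, 20, 22)

# Shifting a single nucleus by 0.5 MeV shows up directly in the residual.
perturbed = dict(smooth)
perturbed[(20, 22)] += 0.5
residual_perturbed = gk_transverse(perturbed, 20, 22)
```

Applied to a table of predicted masses, a residual that grows away from the training region signals local irregularities in the predicted mass surface, independently of any comparison with other mass models.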
Explainable AI - The implementation of ML models often faces the challenge of their perceived 'black box' nature. To counter this issue, Explainable AI (XAI) techniques have become increasingly popular for their role in demystifying these models and enhancing understanding. Among the range of XAI techniques, SHapley Additive exPlanations (SHAP) [51] has emerged as a prominent technique that has achieved widespread recognition.
The SHAP technique utilizes the concept of SHAP values, derived from game theory, which quantify the individual contributions of players in a cooperative coalition. This concept, originally known as Shapley values [52], has been extensively studied in the game theory literature [53]. Recently adapted to AI research, specifically in XAI, this approach treats model features as 'players' and the prediction as the 'game'. The SHAP values assigned to these features indicate their relative importance compared to a baseline reference. Thus, this technique effectively highlights the features most influential in the model's decision-making process. We apply the SHAP technique to interpret the results of the GPR-12 model in more depth. For the test dataset, we compute the SHAP values, where each value indicates the contribution of a specific feature to the model's prediction. These SHAP values are visually summarized in Figure 7. The SHAP summary plot illustrates how each feature influences the predictions of the GPR-12 model. In this plot, features are ordered on the y-axis based on their impact, with the most impactful feature positioned at the top and the least impactful at the bottom. To manage the extensive computational demands of calculating SHAP values for the GPR-12 model, we followed the guidelines suggested in the official SHAP documentation [54], utilizing k-means clustering on the training data. We condensed the training data into three clusters using k-means, assigning weights to each cluster proportional to the number of data points it encompasses. Experiments with varying numbers of clusters, including more than three, consistently yielded comparable results.
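To make the 'features as players' idea concrete, the sketch below computes exact Shapley values for a small toy model by enumerating all coalitions, with absent features replaced by a baseline value. This is not the approximation scheme of the SHAP library and not the GPR-12 model; the function names and the toy model are hypothetical, and exhaustive enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline input."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take their actual value, others the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    # One additive term and one interaction term between features 1 and 2.
    return 2.0 * z[0] + z[1] * z[2]

phi = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

Here the additive feature receives its full contribution (2.0), while the interaction term of size 6.0 is shared equally between the two interacting features (3.0 each). The values sum to the difference between the prediction and the baseline output, which is the property that makes SHAP-type attributions additive.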
The analysis reveals that A^{2/3} is the most impactful factor in predicting the mass, as shown by the SHAP values. It is closely followed by Z, A, and N, all of which make noteworthy contributions to the model's predictions. In contrast, Z_eo and N_eo demonstrate a limited impact on predicting the mass, as indicated by their lower placement on the plot. Nonetheless, their inclusion is important for improving the model predictions, as discussed above in connection with Fig. 2. The SHAP values depicted in the Figure 7  It is worth noting that we also examined the SHAP summary plot for the SVR-12 model, and the results are found to be identical. In Figure 7, the contributions of features beyond the top five may appear minimal. However, as previously explained, the GPR-12 model outperforms its versions with fewer features (see Fig. 2). This enhanced performance of the GPR-12 model can be attributed to the interactions between features, which can also be examined in detail through SHAP analysis. Although the SHAP summary plot in Figure 7 pinpoints the most impactful features in the ML models, interactions between these features also play an important role in the models' performance. SHAP interaction plots allow us to visualize the pairwise interactions between features across different parts of the nuclear chart and to better understand the working mechanism of the ML models by making them more transparent.
In Figure 8, we present selected interaction plots derived from the SHAP values of the GPR-12 model. While many interaction plots can be generated from the SHAP analysis, we focus on interactions between the proton number Z and the other features to simplify our discussion. A predominance of red in an interaction plot suggests a positive joint contribution of both features to the model's prediction. This means that higher values of these features together are likely to elevate the model's output. In contrast, a predominance of blue suggests that the combined features negatively influence the model's prediction, with lower values of both features together expected to decrease the model's output. Thus, we can pinpoint critical feature interactions and enhance our understanding of the model's decision-making based on feature combinations. For instance, the combined effect of higher values of Z and A (see Fig. 8(a)) impacts the model's prediction positively, while lower values have a negative impact on the output. A similar situation is also observed for the interactions of Z with ν_N (c) and Z_shell (f). The interaction between Z and P F (e) varies according to the region of interest. Nonetheless, low values of Z and P F have a negative impact on the output. On the other hand, there is no such trend in the interaction between Z and (N − Z)/A or Z_eo, as shown in panels (b) and (d) of Fig. 8. The combined effects of Z and (N − Z)/A, and of Z and Z_eo, can be both positive and negative across all regions. In the majority of plots, the interaction of the proton number Z with other features has a negative impact on the predictions for nuclei with low mass. Similar results are also observed for other impactful features, such as the neutron number (N) and mass number (A), indicating the necessity to identify relevant features to probe these regions more effectively. Therefore, interaction plots can be useful for identifying the relevant features to enhance the predictions of ML models in regions with low prediction capability.
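The pairwise interaction values underlying such plots can be illustrated with a short sketch of the Shapley interaction index, assuming a toy three-feature model and a single all-zero baseline. The model `toy_model`, the input `x`, and the baseline are hypothetical and serve only to show how a joint effect is separated from individual contributions; they do not reproduce the GPR-12 analysis.

```python
from itertools import combinations
from math import factorial


def shapley_interaction(value, n, i, j):
    """Shapley interaction index between features i and j (i != j).

    `value(S)` must return the model output when only the features in S
    are set to the instance values and the rest are set to the baseline.
    """
    others = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for size in range(n - 1):
        coeff = factorial(size) * factorial(n - size - 2) / (2 * factorial(n - 1))
        for subset in combinations(others, size):
            s = frozenset(subset)
            # Joint effect of i and j minus their separate effects.
            delta = value(s | {i, j}) - value(s | {i}) - value(s | {j}) + value(s)
            total += coeff * delta
    return total


def toy_model(v):
    # Hypothetical model: features 0 and 1 interact, feature 2 is purely additive.
    return v[0] + 2.0 * v[1] + 0.5 * v[0] * v[1] + v[2]


x = [2.0, 3.0, 1.0]          # instance to explain (illustrative values)
baseline = [0.0, 0.0, 0.0]   # single reference point (assumption)


def value(subset):
    mixed = [x[k] if k in subset else baseline[k] for k in range(len(x))]
    return toy_model(mixed)


inter_01 = shapley_interaction(value, 3, 0, 1)
inter_02 = shapley_interaction(value, 3, 0, 2)
print("interaction(0, 1):", inter_01)
print("interaction(0, 2):", inter_02)
```

In this convention the total interaction between two features is split equally between the (i, j) and (j, i) entries, so a purely additive pair such as features 0 and 2 yields zero, while the product term shows up only in the (0, 1) entry.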

IV. CONCLUSION
This study presents successful implementations of two ML models, SVR and GPR, using the available experimental data and a physics-based feature space to make predictions for the mass excess of atomic nuclei. The ML models achieve good results not only in accurately predicting nuclear mass excesses for the training and test sets but also in demonstrating robust extrapolation capabilities. Our comprehensive analysis, which includes the extrapolation region using the newly measured data from AME2020 and the region beyond, underscores the models' success in handling a diverse range of nuclear data. In addition to demonstrating the effective application of ML models, our study incorporates SHAP, an Explainable AI (XAI) technique, enhancing the interpretability of our ML models.
It is evident that SVR and GPR can be effectively utilized as reliable and efficient tools for predicting the mass excess of atomic nuclei. This study highlights the potential of these ML models as powerful tools in nuclear physics and opens up new avenues for future research. These ML models can be further refined to improve their performance, especially near the drip lines. While the chosen ML models demonstrated success in predicting the mass excess of atomic nuclei, applying them to additional nuclear properties and evaluating their performance there remain tasks for future research.
[Figure residue: legend entries "Test set" and "Extrapolation"; axis labels "Neutron number (N)", "Proton number (Z)", and "excess [MeV]"; panel label "SVR-12".]

FIG. 6: The GK mass relations for the results obtained using the (a) GPR-12 and (b) SVR-12 ML models. The dashed gray lines indicate the borders of the training and test set regions.

FIG. 7: SHAP summary plot for the GPR-12 model. Each input is represented by a horizontal bar on the plot, where the length of the bar reflects the SHAP values' magnitude. The color of each bar indicates the direction of the feature's influence on the prediction: blue for a decrease with lower feature values and red for an increase with higher feature values, with the intensity of the color denoting the magnitude of the feature's value.

FIG. 8: Selected SHAP interaction plots. In these plots, an intense red color indicates higher positive SHAP values, while a deep blue color signifies lower negative SHAP values.

TABLE II: Root mean square errors σ_rms (in MeV) for the GPR-12 and SVR-12 ML models, indicating their performance on training and test sets for Z ≥ 8 across varying train-test data ratios from the AME2020 set [4]. The percentages represent the proportion of data allocated to the training and test sets.

TABLE III: The root mean square errors (given in MeV) for the extrapolation set (71 nuclei from AME2020). The calculations are performed using different inputs.