Novel techniques for alpha/beta pulse shape discrimination in Borexino

Borexino could efficiently distinguish between alpha and beta radiation in its liquid scintillator by the characteristic time profile of their scintillation pulse. This alpha/beta discrimination, first demonstrated at the tonne scale in the Counting Test Facility prototype, was used throughout the lifetime of the experiment between 2007 and 2021. With this method, alpha events are identified and subtracted from the beta-like solar neutrino events. This is particularly important in liquid scintillator, as alpha scintillation is quenched many-fold. In Borexino, the prominent Po-210 decay peak was a background in the energy range of electrons scattered from Be-7 solar neutrinos. Optimal alpha/beta discrimination was achieved with a multi-layer perceptron neural network, thanks to its higher ability to leverage the timing information of the scintillation photons detected by the photomultiplier tubes. An event-by-event, high efficiency, stable, and uniform pulse shape discrimination was essential in characterising the spatial distribution of background in the detector. This benefited most Borexino measurements, including solar neutrinos in the pp chain and the first direct observation of the CNO cycle in the Sun. This paper presents the key milestones in alpha/beta discrimination in Borexino as a term of comparison for current and future large liquid scintillator detectors.


INTRODUCTION
For as long as it operated, Borexino was the only detector capable of measuring solar neutrino interactions (position and energy) on an event-by-event basis with a threshold ≳ 150 keV, i.e., down to the ¹⁴C β-spectrum end-point. An important feature of Borexino was the possibility to efficiently separate events initiated by recoiling electrons (β-like events) from those initiated by α particles. The former include solar neutrino interactions as well as background from β and γ decays. This separation is possible via pulse-shape discrimination (PSD) techniques that exploit the different time profiles of the scintillation emission for α and β-like events (see, e.g., [1]). The so-called α/β discrimination played an important role in solar neutrino measurements throughout the Borexino data taking between 2007 and 2021. It is worth noting that Borexino also achieved β⁻/β⁺ separation via PSD, as reported in [2] and [3]; that topic is, however, outside the scope of the present article.
PSD for α/β separation was first studied within the Borexino program with the 4-tonne "Counting Test Facility" (CTF) prototype [5,6]. The original method is based on the Gatti parameter [4] and enabled a statistical subtraction of the α background, especially from ²¹⁰Po, from the measured energy spectrum. Monochromatic, 5.3 MeV ²¹⁰Po alphas (Q_α = 5407 keV) appeared in the Borexino liquid scintillator as a peak at ∼500 keV of electron-equivalent energy, due to a greater than ten-fold quenching of the scintillation for these highly ionizing tracks [6]. At the start of the Borexino data taking, the ²¹⁰Po rate was ∼8000 counts per day per 100 tonnes (hereafter, cpd/100 t). Quenching was also observed for other α particles. These include those from the thoron (²²⁰Rn) and radon (²²²Rn) decay chains, which are readily identified using their time coincidence, e.g., ²¹²Po (8954 keV), ²¹⁴Po (7833 keV), and ²¹⁸Po (6114 keV).
Because of its quenching, the ²¹⁰Po peak falls within the ⁷Be solar neutrino Compton-like energy spectrum, which presents a characteristic shoulder at 662 keV. In this case the ⁷Be shoulder appears at higher energy than the ²¹⁰Po peak, so the multi-parameter spectral fit can clearly identify the two components separately. Nevertheless, the Borexino analysis was performed both with and without bin-by-bin statistical α/β subtraction of the ²¹⁰Po peak, to ensure that there was no subtle bias due to the presence of the α background. This check was particularly relevant in Phase-I of the experiment, when the ²¹⁰Po activity was more than two orders of magnitude greater than the ⁷Be event rate (∼50 cpd/100 t over the entire energy range). In each bin, we assumed the Gatti parameter to be normally distributed with a mean value linearly dependent on energy [7][8][9].
As data-taking progressed, the ²¹⁰Po naturally decayed with a lifetime τ = 199.6 d. The reduction of this background was, however, counterbalanced by a progressive degradation of the energy resolution of the detector due to the loss of photomultiplier tubes (PMTs). The count of working PMTs decreased from ∼2000 units in mid-2007 to ∼1000 units at the end of 2021. This effect, together with the need for a more uniform, stable, and higher efficiency α/β discrimination for the study of CNO solar neutrinos, suggested exploring novel techniques based on neural networks, which are already extensively employed in particle physics. We pursued neural networks based on the multi-layer perceptron (MLP). The subject of this paper is the description of the MLP input parameters, structure, and training strategy given the Borexino scintillator properties and layout. It also presents studies of the network's efficiency for α/β discrimination.
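As a worked illustration of the decay law behind the quoted lifetime, the short Python sketch below evaluates the surviving fraction exp(−t/τ) and the corresponding half-life. It assumes an unsupported ²¹⁰Po population, i.e. no feeding from parent isotopes and no reintroduction during detector operations, which is a simplification of the real detector conditions; the function names are illustrative only.

```python
import math

TAU_PO210_DAYS = 199.6  # mean lifetime quoted in the text


def surviving_fraction(t_days, tau_days=TAU_PO210_DAYS):
    """Fraction of an unsupported population left after t_days (assumption: no feeding)."""
    return math.exp(-t_days / tau_days)


def half_life(tau_days=TAU_PO210_DAYS):
    """Half-life corresponding to a mean lifetime tau."""
    return tau_days * math.log(2.0)


if __name__ == "__main__":
    print(f"half-life: {half_life():.2f} d")
    for years in (1, 2, 4):
        frac = surviving_fraction(365.25 * years)
        print(f"after {years} y: surviving fraction {frac:.3e}")
```

Any measured activity that decreases more slowly than this curve points to a source term, such as the reintroduction effects discussed later in the text.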
In Sec. I, the main characteristics of the Borexino detector and its main physics results relevant to this article are briefly reviewed. In Sec. II, the α/β PSD in Borexino using the Gatti parameter is presented. In Sec. III, the implementation strategy of the MLP on the Borexino scintillator time profile is described, along with an evaluation of its performance and efficiency. Finally, in Sec. IV, the impact of α/β MLP discrimination on the CNO solar neutrino analysis and on other Borexino results over its 14-year lifetime is discussed.

I. THE BOREXINO EXPERIMENT
Borexino was located in Hall C of the Laboratori Nazionali del Gran Sasso (LNGS) of the Italian Institute of Nuclear Physics (INFN) [10]. The detector took data from mid-2007 to the end of 2021, and is currently being decommissioned. The detector is made of concentric layers of increasing radiopurity (for details see, e.g., [11]): the innermost core, called the Inner Vessel (IV), consists of about 280 tons of liquid scintillator (pseudocumene mixed with 1.5 g/l of PPO as scintillating solute) contained inside an ultra-pure nylon vessel with a thickness of 125 µm and a radius of 4.25 m. A Stainless Steel Sphere (SSS), filled with the remaining 1000 m³ of buffer liquid (pseudocumene mixed with DMP quencher), is instrumented with more than 2000 PMTs for detecting the scintillation light inside the IV. Finally, the SSS is immersed in a Water Tank (WT) of about 2000 m³.
The Borexino data set is divided, according to the conventional internal subdivision of the experimental program, into three phases. Phase-I, from May 2007 to May 2010, ended with the calibration campaign; in this phase the first measurement of the ⁷Be solar neutrino interaction rate [7][8][9] and the first evidence of the pep neutrinos [2] were obtained. Phase-II, from December 2011 to May 2016, started after an intense purification campaign that achieved an unprecedented reduction of the scintillator radioactive contaminants; in this phase the first spectroscopic observation of the pp neutrinos, at the 10% level, [12] was published, and later updated in the comprehensive solar neutrino analysis of all pp chain neutrino fluxes [3,13,14]. Finally, Phase-III, from July 2016 to October 2021, followed the thermal stabilisation program; in this phase the first detection of the CNO neutrinos [15] and its subsequent improvements [16,17] were achieved. The most important solar neutrino results in terms of interaction rates and corresponding fluxes are summarised in Tab. I.
Thanks to its unprecedented radio-purity, Borexino also set many limits on rare processes [19][20][21][22][23] and performed other neutrino physics studies, e.g. geo-neutrino detection (for a review, see e.g. [24]). As will be highlighted in Sec. IV, all these important results depend strongly on the α/β PSD optimisation. In the following Section, the α/β discrimination problem is introduced, starting from the first method exploited by the Collaboration, based on the Gatti parameter.

II. α/β DISCRIMINATION: THE GATTI PARAMETER
The α/β discrimination in Borexino is possible thanks to the sizeable difference between the time distributions of the scintillation light (pulse shape) for α and β-like events (see Fig. 1). For each event meeting the threshold […] very frequent ¹⁴C events), more than one cluster could be identified as a superposition of two or more different events. The position of events is determined via a photon time-of-flight maximum likelihood method with probability density functions (PDFs) based on experimental data and Monte Carlo simulations, resulting in an uncertainty of 10 cm for each of the three Cartesian spatial coordinates. The spatial resolution is expected to scale naively as 1/√N_PE, where N_PE is the number of detected photoelectrons [25]. For the most important analyses in Borexino, the fundamental event selection is based on the following criteria: internal-only trigger (no muon veto coincidence), event time more than 2 ms after a preceding muon event, a single cluster in the acquisition window, and a reconstructed position with r ≲ 3 m. These cuts guarantee that the selected event is a neutrino-like candidate, i.e. an event that occurred in the innermost part of the IV (≲100 t) and far enough from the external background coming from the SSS and from the IV structures.
After the application of the selection criteria listed above, the typical Borexino spectrum shows a prominent ²¹⁰Po α peak at about 500 keV, which falls inside the ⁷Be energy window (see e.g. [8]). At the beginning of Phase-I the ²¹⁰Po activity was of order 10⁴ cpd/100 t. At the beginning of Phase-II, more than 4 years later, the activity had gone down by one order of magnitude to ∼10³ cpd/100 t, a level somewhat higher than expected because a small amount of ²¹⁰Po was reintroduced by the water extraction (WE) campaign. Finally in Phase-III, after more than 4 years and thanks to the thermal insulation campaign, which drastically reduced the convective motions of the scintillator (see Sec. IV for further details), the ²¹⁰Po activity was lowered by another order of magnitude, to ∼10² cpd/100 t. This made it possible to reach the conditions required for the CNO measurement via the so-called ²¹⁰Bi-²¹⁰Po link [15].
An estimate of the ²¹⁰Po activity, and an independent quantification of it for the Borexino analysis, can be obtained for example by defining a simple parameter called tail-to-tot (t2t). It is defined as the fraction of the time distribution of the hits above a given characteristic time t_0 with respect to the beginning of the scintillation, namely

$$ \mathrm{t2t}(t_0) = \frac{\int_{t_0}^{\infty} S(t)\, dt}{\int_{0}^{\infty} S(t)\, dt}, \qquad (1) $$

where S(t) is the scintillation time distribution. The characteristic time t_0 can be optimised by maximising the figure of merit defined as the difference between the t2t populations for α and β events. This sort of parameter works very well, for example, for separating electron and nuclear recoils in liquid argon scintillation chambers [26], where the scintillation light is essentially a combination of two exponential components whose decay times differ by nearly three orders of magnitude (typically 6 and 1600 ns). This is not the case for the Borexino scintillator, where the time behaviour is more complicated and less specific to the particle type [27].
As a consequence, t2t in Borexino gives only a mild α/β separation rather than a true high-efficiency event classification.
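The tail-to-tot definition can be illustrated with a minimal Python sketch operating on the list of hit times of one event. Taking the first hit as the beginning of the scintillation is an assumption made here for simplicity; the function name and the toy hit times are hypothetical.

```python
def tail_to_tot(hit_times_ns, t0_ns):
    """Fraction of hits later than t0_ns, measured from the first hit of the event.

    Assumption: the first hit marks the start of the scintillation pulse.
    """
    if not hit_times_ns:
        raise ValueError("empty cluster")
    start = min(hit_times_ns)
    n_tail = sum(1 for t in hit_times_ns if t - start > t0_ns)
    return n_tail / len(hit_times_ns)


# Toy event with eight hits (times in ns, hypothetical values)
hits = [0.0, 2.0, 5.0, 9.0, 20.0, 41.0, 80.0, 150.0]
print(tail_to_tot(hits, 35.0))  # three of the eight hits lie beyond 35 ns
```

Because the tail fraction is a ratio of integer counts, it takes only a discrete set of values for events with few hits, which limits its resolving power at low energy.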
A more efficient α/β identification can instead be performed using discriminating procedures like the Gatti optimal filter [4]. The latter makes it possible to classify two types of events with different, but known, time distributions of hits. Their reference shapes P_α(t) and P_β(t) are created by averaging the time distributions of a large sample of events selected independently, without any use of pulse shape variables. A practical way to build the reference shapes is to use the space-time correlation of ²¹⁴Bi-²¹⁴Po coincidences from radon events, which can be tagged with about 90% efficiency, limited essentially by the trigger threshold for the preceding β event. The first of the two events in time provides a pure β-like sample, with events mostly located in the energy interval 1500-3000 keV as a superposition of different β and γ lines, while the second provides a pure α sample, with events peaked at about 800 keV and smeared only by the detector resolution. The radon events in the IV (with a lifetime of about one week) are strictly related to invasive operations on the scintillator (especially at the beginning of Phase-I and during the water extraction (WE) campaign), while they are essentially absent in quiet periods such as Phase-II and especially Phase-III.
The functions P_α(t) and P_β(t) represent the PDFs, as a function of time, of detecting a PE for events of type α or β, respectively. Let e(t) be the normalised time distribution of the light for each event. The Gatti parameter G is defined as

$$ G = \int e(t)\, w(t)\, dt, \qquad (2) $$

where w(t) are the weights given by

$$ w(t) = \frac{P_\alpha(t) - P_\beta(t)}{P_\alpha(t) + P_\beta(t)}. \qquad (3) $$

The G parameter follows a probability distribution with the mean value ⟨G_α,β⟩, which depends on the particle type, namely

$$ \langle G_{\alpha,\beta} \rangle = \int P_{\alpha,\beta}(t)\, w(t)\, dt. \qquad (4) $$

In the Borexino scintillator, the Gatti mean values are empirically found to decrease linearly with energy. Finally, considering the Poissonian statistical fluctuations of the entries in each time bin, the corresponding variance, following the variance expansion identity, reads

$$ \sigma^2_{G_{\alpha,\beta}} = \frac{1}{N} \left[ \int P_{\alpha,\beta}(t)\, w^2(t)\, dt - \langle G_{\alpha,\beta} \rangle^2 \right], \qquad (5) $$

where N is the number of hits in the event. In the real experimental case, the integrations in Eqs. (4) and (5) are converted into sums over histograms binned at 1 ns from zero to about 1.5 µs. In the scintillator used by Borexino, α pulses are slower and therefore have a longer tail than β pulses. This feature is the key to the α/β separation. Examples of reference shapes P_α(t) and P_β(t), from the ²¹⁴Bi-²¹⁴Po tagging of the radon events from Phase-I, are shown in Fig. 1: the dip at 180 ns is due to the dead time applied on every individual electronic channel after each detected hit. The small knee around 60 ns is due to the light reflected on the SSS surface and on the PMT photo-cathodes. The distributions of the corresponding G parameters (G_α and G_β) for events with respect to these reference PDFs are shown in Fig. 2.
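The binned form of the Gatti filter can be sketched in a few lines of Python. The sketch assumes the standard Gatti weights w = (P_α − P_β)/(P_α + P_β), with the sign chosen so that α-like events give positive G; the four-bin reference shapes are hypothetical toy numbers, far coarser than the 1 ns binning described in the text.

```python
def normalise(hist):
    """Scale a histogram of hit counts to unit area."""
    total = float(sum(hist))
    if total <= 0.0:
        raise ValueError("histogram has no entries")
    return [h / total for h in hist]


def gatti_weights(p_alpha, p_beta):
    """Bin-wise weights (P_alpha - P_beta) / (P_alpha + P_beta); zero where both vanish."""
    weights = []
    for pa, pb in zip(p_alpha, p_beta):
        total = pa + pb
        weights.append((pa - pb) / total if total > 0.0 else 0.0)
    return weights


def gatti_parameter(event_hist, p_alpha, p_beta):
    """Sum over time bins of the normalised event shape times the Gatti weights."""
    shape = normalise(event_hist)
    weights = gatti_weights(p_alpha, p_beta)
    return sum(e * w for e, w in zip(shape, weights))


# Toy reference shapes on four coarse time bins (hypothetical numbers)
P_ALPHA = [0.40, 0.30, 0.20, 0.10]  # slower pulse, more weight in the tail
P_BETA = [0.60, 0.30, 0.08, 0.02]

# Expected G for an ideal alpha-like and an ideal beta-like event
g_mean_alpha = gatti_parameter(P_ALPHA, P_ALPHA, P_BETA)
g_mean_beta = gatti_parameter(P_BETA, P_ALPHA, P_BETA)
print(g_mean_alpha, g_mean_beta)
```

With normalised reference shapes and these weights, the two mean values are equal in magnitude and opposite in sign, so the separation is governed by the width of the two G distributions.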
The two distributions, which resemble Gaussian shapes, partially overlap due to the sizeable variance of G. As a consequence, when the number of α events largely exceeds that of the β events, a high efficiency event-by-event α/β selection is limited. In principle, a bin-by-bin statistical separation of the two event populations is possible whenever the G_α,β distributions are known, either analytically or through a Monte Carlo simulation. Since the mean values and the variances of G_α,β are energy dependent, their distributions are fitted to two Gaussian models for each bin in the energy spectrum of interest, and their values are constrained around the expected linear dependence. The integral of each fitted curve represents the relative contribution of that species in the energy bin; the α contribution is subtracted from the total bin content, thus yielding the β-like spectrum by statistical subtraction.
We make the reasonable hypothesis that the underlying distributions are Gaussian. The fit procedure also provides the error on the estimated particle population, which is assigned to the corresponding bin in which the subtraction is performed. In bins where one species greatly outnumbers the other, for example in the energy bins where the ²¹⁰Po is peaked, the mean values of the Gaussians are fixed to their predicted values, extrapolated from the energy dependence trend, in order to avoid any possible bias in the subtraction procedure. Figure 5 shows an example of the G_α,β parameter in the energy bin 200-205 NPE and its fit to the analytical model. Furthermore, any other possible bias of the double Gaussian fit, due to the large difference in the statistics of the two populations, is corrected according to toy Monte Carlo simulations with the same population ratio.
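The idea of splitting one energy bin into two Gaussian populations can be illustrated with a short Python sketch. Note that it uses an unbinned expectation-maximisation (EM) mixture fit rather than the binned fit described in the text, purely to stay within the standard library; the option to keep the means fixed mimics the treatment of bins dominated by one species. All names and toy values are hypothetical.

```python
import math
import random


def gauss(x, mu, sigma):
    """Normal probability density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_alpha_fraction(g_values, mu_beta, mu_alpha, sigma0, fix_means=False, n_iter=100):
    """EM fit of a two-Gaussian mixture; returns the estimated alpha fraction."""
    n = len(g_values)
    f_alpha, s_alpha, s_beta = 0.5, sigma0, sigma0
    for _ in range(n_iter):
        # E-step: probability that each event belongs to the alpha Gaussian
        resp = []
        for g in g_values:
            pa = f_alpha * gauss(g, mu_alpha, s_alpha)
            pb = (1.0 - f_alpha) * gauss(g, mu_beta, s_beta)
            resp.append(pa / (pa + pb) if pa + pb > 0.0 else 0.5)
        n_alpha = sum(resp)
        n_beta = n - n_alpha
        if n_alpha < 1e-9 or n_beta < 1e-9:
            break  # one component vanished; keep the last estimate
        # M-step: update means (unless fixed), widths, and mixing fraction
        if not fix_means:
            mu_alpha = sum(r * g for r, g in zip(resp, g_values)) / n_alpha
            mu_beta = sum((1.0 - r) * g for r, g in zip(resp, g_values)) / n_beta
        var_a = sum(r * (g - mu_alpha) ** 2 for r, g in zip(resp, g_values)) / n_alpha
        var_b = sum((1.0 - r) * (g - mu_beta) ** 2 for r, g in zip(resp, g_values)) / n_beta
        s_alpha = max(1e-6, math.sqrt(var_a))
        s_beta = max(1e-6, math.sqrt(var_b))
        f_alpha = n_alpha / n
    return f_alpha


# Toy energy bin: 700 alpha-like and 300 beta-like G values (hypothetical numbers)
rng = random.Random(7)
toy = [rng.gauss(0.02, 0.012) for _ in range(700)]
toy += [rng.gauss(-0.02, 0.012) for _ in range(300)]

f_alpha = fit_alpha_fraction(toy, mu_beta=-0.02, mu_alpha=0.02, sigma0=0.012)
n_beta_like = (1.0 - f_alpha) * len(toy)  # bin content left after subtracting alphas
print(f_alpha, n_beta_like)
```

Repeating such a fit on toy samples with a known population ratio is one way to estimate the residual bias of the two-Gaussian model, in the spirit of the toy Monte Carlo correction mentioned above.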
The statistical subtraction can be applied in the full ⁷Be energy window, removing all α backgrounds, which come mostly from ²¹⁰Po but also from other α-emitting ²²²Rn daughters such as ²¹⁴Po and ²¹⁸Po that leak through the fast coincidence cut. This secondary, subdominant contamination partially affects Phase-I, but it is completely negligible in Phase-II and Phase-III, thanks to the better background conditions achieved after the WE campaign and the ²¹⁰Po decay. The error associated with the statistical subtraction is propagated as a systematic uncertainty on the final neutrino interaction rates [7][8][9]. It is worth mentioning that a possible bias due to the presence of the ²¹⁰Po peak is non-negligible only in Phase-I, when the ²¹⁰Po activity is much larger than the ⁷Be neutrino interaction rate. In fact, in Phase-II and in Phase-III the statistical subtraction of the α component is not applied and the ²¹⁰Po is simply quantified by the spectral fit, see e.g. [3,15]. Figure 3 shows the Gatti distribution as a function of the event energy in NPE for the first 300 days of Borexino Phase-II. The large cluster at the top represents the α distribution, consisting essentially of ²¹⁰Po events, while the horizontal band at the bottom represents the β-like component (solar neutrinos and background). The Gatti parameter shows a neat separation of the α and β populations as a function of the energy.
[Figure caption: The blue and red lines show the individual Gaussian fits to the Gatti parameter distributions for the β and α components, respectively, while the green line is the total fit.]
Figure 4 shows the implementation of the α/β statistical subtraction in Phase-I: the black curve represents the energy distribution of all events before applying the basic selection criteria. The blue curve represents the event energy distribution after the fiducial volume selection: below 100 NPE the spectrum is dominated by ¹⁴C decay (β⁻, Q = 156 keV) [28], and the peak at 200 NPE is dominated by ²¹⁰Po decays. The red curve is the final spectrum after the statistical subtraction of the α component. The prominent feature around 300 NPE includes the Compton-like edge due to ⁷Be solar neutrinos. Finally, the large bump peaked around ∼600 NPE is the spectrum of the cosmogenic ¹¹C (β⁺, Q = 1.98 MeV, created in situ by cosmic ray-induced showers).
Thanks to its discrimination power, the α/β separation based on the Gatti parameter has been applied in many Borexino analyses, both as a soft cut for pre-selecting events and as a hard cut to locate the main α contaminants, such as the ²¹⁰Po within the fiducial volume, and to understand the nature of the main backgrounds [1], e.g. in the geo-neutrino analysis [24]. In those cases, no statistical subtraction is applied, and the Gatti parameter selects the α population with a given efficiency, depending on the position of the cut itself.
The optimisation of the Gatti filter, already exploited in the Borexino CTF, played a crucial role in many important Borexino studies. Nevertheless, new requirements and some drawbacks pushed the Collaboration to investigate novel techniques based on neural networks. As will be described in Sec. IV, the CNO feasibility study required, since the beginning of Phase-II, a deep understanding of the spatial evolution of the ²¹⁰Po contamination. Instead of a statistical subtraction, this analysis required a high efficiency event-by-event selection that is uniform in space and easy to model in energy and time. The PMT loss, and the consequent resolution degradation, affects the Gatti parameter distributions; more importantly, the Gatti parameter has shown a spatial dependence since the beginning, especially along the radial direction. Figure 6 shows the shift of G_α,β as a function of r³ for ²¹⁴Bi-²¹⁴Po events (N.B.: plotting data as a function of r³ removes the dependence of the spherical volume element on r). This dependence is neither easily modelled nor completely reproducible in Monte Carlo simulations. In contrast to the Gatti filter, artificial neural networks accept many input parameters and return their ranking for the specific case of the α/β selection, which helped considerably in understanding the origin of the radial dependence of the PSD. They offered a more uniform selector, with a controllable dependence of the efficiency upon energy and time, as discussed later in Sec. III C. In the next Section, the strategy for the implementation and tuning of a class of multi-layer perceptrons is reviewed.

III. IMPROVING THE SELECTION WITH MLP
A. Artificial neural networks
Artificial neural networks (ANNs) are a major component of machine learning and are designed to detect patterns in data [29]. This makes ANNs well suited to classifying (sorting data into predetermined categories), grouping (finding similar characteristics among data and combining those data into categories), and making forecasts based on the data.
More generally speaking, an ANN is any simulated collection of interconnected neurons, with each neuron producing a certain response to a given set of input signals. The input data can be values of the characteristics of an external data sample, such as images or documents, or they can be outputs from other neurons.
By applying an external signal to some input neurons, the network is put into a defined state that can be measured from the response of one or several output neurons. One can therefore view the neural network as a mapping from a space of input variables x_1, ..., x_nvar onto a one-dimensional (e.g. in the case of a signal-versus-background discrimination problem) or multi-dimensional space of output variables. The mapping is nonlinear if at least one neuron has a nonlinear response to its input. It is worth noting here that the Gatti parameter is linear in the input parameters, as one can easily see from Eq. (2); therefore, given the input data set, it cannot do better than their intrinsic statistical power.
A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An MLP consists of at least three layers of nodes: an input layer, a hidden layer and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. The MLP utilises a supervised learning technique called back-propagation for training.
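The structure just described can be made concrete with a minimal forward pass of a one-hidden-layer MLP in Python. This is only a sketch: the tanh hidden activation, the sigmoid output, and the toy weights are assumptions chosen for illustration, not the configuration used in the Borexino analysis, and training by back-propagation is not shown.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> tanh hidden layer -> single sigmoid output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Toy network: 3 inputs, 2 hidden neurons, 1 output (hypothetical weights)
W_HIDDEN = [[0.5, -1.0, 0.2], [-0.3, 0.8, 0.1]]
B_HIDDEN = [0.0, 0.1]
W_OUT = [1.5, -2.0]
B_OUT = 0.05

score = mlp_forward([0.2, 0.4, 0.1], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
print(score)
```

Replacing tanh with the identity would make the argument of the output neuron a purely linear combination of the inputs, which is the limitation noted above for the Gatti parameter.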

B. TMVA package
The Toolkit for Multivariate Analysis (TMVA) provides a ROOT-integrated environment for the processing, parallel evaluation and application of multivariate classification techniques [30]. TMVA is specifically designed for the needs of high-energy physics (HEP) applications, where the search for ever smaller signals in ever larger data sets has made it essential to extract a maximum of the available information from the data. Multivariate classification methods based on machine learning techniques have become an essential ingredient in most HEP analyses. The package hosts a large variety of multivariate classification algorithms, e.g. artificial neural networks (three different MLP implementations), support vector machines (SVM), boosted decision trees (BDT), etc.
Independent input data sets used for training and testing of multivariate methods must be defined prior to the algorithm implementation. The most important step is to identify the input parameters that matter most for obtaining the highest possible pulse shape discrimination efficiency; this can be done by looking at the ranking of the variables returned at each controlled trial.

C. Selection of input variables and different versions
It is standard practice to normalise the input variables before feeding them into the ANN. In the Borexino case of α/β discrimination, a set of t2t variables was defined for ten different t_0, according to Eq. (1). Because the distributions for α and β (Fig. 1) differ mainly in the tails, times after 10 ns were chosen, i.e. t_0 in the set {35, 70, 105, 140, 175, 210, 245, 280, 315, 350} ns. To this set, the root mean square (RMS) and kurtosis of the photoelectron time distribution were added. At this stage, with an input vector made of the t2t's, RMS and kurtosis, the MLP algorithm returns a discrimination efficiency similar to Gatti, as expected: no classifier can extract more than the statistical information contained in its inputs, so it was necessary to add information not present in the first trial. A breakthrough came when it was noticed that the time distribution of the scintillation events is analysed after the event position correction, which is determined from the time-of-flight of photons originating in a point-like scintillation and propagating towards the PMTs. In subsequent trials, it was observed that the mean time of the hits calculated before the position reconstruction ("non-reconstructed cluster") added some of the missing information, possibly lost with the position correction. This recovered information improved the α/β discrimination, and even resolved the radial dependence observed in the Gatti parameter. This mean time is essentially the mean of the temporal PDF of the scintillation event, in which the times are expressed in the photomultiplier reference system. This finding also clarified why the α/β discrimination worked more efficiently in CTF: since the CTF detector was only a few meters in size, there was essentially no bias due to off-centre event reconstruction. This interpretation is also supported by events located very close to the centre of Borexino, where the Gatti parameter does not show a substantial bias and exhibits a very high efficiency.
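The shape variables mentioned above (mean time, RMS and kurtosis of the hit-time distribution) can be computed as in the following Python sketch. It assumes that "RMS" denotes the standard deviation about the mean, as is common usage in HEP software, and that the kurtosis is the excess kurtosis (zero for a Gaussian); both conventions are assumptions, as is the function name. The time-of-flight correction is not modelled, so the same function would be applied to corrected or uncorrected hit times.

```python
import math


def time_moments(hit_times_ns):
    """Return (mean, rms, excess kurtosis) of a list of hit times.

    Assumptions: rms is the standard deviation about the mean;
    kurtosis is m4 / m2**2 - 3.
    """
    n = len(hit_times_ns)
    if n == 0:
        raise ValueError("empty cluster")
    mean = sum(hit_times_ns) / n
    m2 = sum((t - mean) ** 2 for t in hit_times_ns) / n
    m4 = sum((t - mean) ** 4 for t in hit_times_ns) / n
    rms = math.sqrt(m2)
    kurtosis = m4 / (m2 * m2) - 3.0 if m2 > 0.0 else 0.0
    return mean, rms, kurtosis


# Symmetric two-valued toy distribution (times in ns)
print(time_moments([0.0, 0.0, 2.0, 2.0]))
```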
A further improvement is achieved in MLPs in which the ten t2t input variables are replaced with the ten PDF quantiles. Quantiles give the same statistical weight to each input variable, since each is defined by one tenth of the PDF area. This definition avoids the numerical quantisation problem of the t2t's, coming from the integer definitions of the t_0's (see Fig. 7), and, more importantly, removes the correlation present by construction among the t2t inputs, which is due to the partial overlap of the integrals for different t_0. The Gatti parameter in Borexino was initially tuned on 7000 ²¹⁴Bi-²¹⁴Po events collected during the scintillator operations, before the official start of data acquisition in mid-2007. Subsequently, during the six cycles of the WE campaign, which took place between 2010 and 2011, another and larger sample of ²¹⁴Bi-²¹⁴Po events was collected. This sample, on which the Gatti parameter was upgraded and the MLP studies are based, contains 85,000 events in total, of which 27,000 lie in the fiducial volume region (r ≲ 3 m). The only issue that must be taken into account and controlled is that, in both data-sets (Phase-I and WE), the radon events were observed mainly at the top of the detector, in the region above the equator (z > 0). This evidence, also supported by the fluid dynamic simulations performed for the CNO analysis [15], is a consequence of the effective separation between the two hemispheres due to the fluid motion in relation to the spherical geometry. After the MLP training, a slight top-bottom asymmetry was actually observed, but it was found to be of no practical relevance inside the analysis fiducial volume.
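The quantile inputs described at the start of this paragraph can be illustrated with a short Python sketch that returns the times at which the cumulative hit distribution of one event reaches each tenth of its area. Measuring times from the first hit and including the 100% quantile are assumptions made here; the exact definition used in the analysis may differ.

```python
def time_quantiles(hit_times_ns, n_quantiles=10):
    """Times (from the first hit) at which k/n_quantiles of the hits have arrived."""
    if not hit_times_ns:
        raise ValueError("empty cluster")
    start = min(hit_times_ns)
    ordered = sorted(t - start for t in hit_times_ns)
    n = len(ordered)
    quantiles = []
    for k in range(1, n_quantiles + 1):
        # Index of the hit that completes the k-th fraction (integer ceiling)
        idx = (k * n + n_quantiles - 1) // n_quantiles - 1
        quantiles.append(ordered[idx])
    return quantiles


# Toy event: twenty hits at 0, 1, ..., 19 ns
toy_hits = [float(i) for i in range(20)]
print(time_quantiles(toy_hits))
```

Unlike the t2t integrals, which all extend to the end of the pulse and therefore overlap, each quantile interval covers its own tenth of the hits.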
The final training sample (Sample-WE) contains, in each MLP version, 25000 events with r < 3 m from the WE period. The comparison of performance among the MLPs and Gatti was done on a reduced sample made of about 15000 events for training and about 15000 events for testing (Sample-WEX), both with larger radii in order to study the radial dependence as well. Another test sample (Sample-Ph23), for double-checking the evolution of the efficiency in time, space and energy, was chosen by selecting an α/β sample not from the ²¹⁴Bi-²¹⁴Po events (essentially absent after WE), but from two energy intervals of the Borexino spectrum in the first 1000 days of Phase-II, in which the contribution of the ²¹⁰Po activity is still sizeable. The α sample is selected in a very narrow region of the ²¹⁰Po peak (209-210 NPE), with a very small contamination from the underlying β-like component due to solar neutrinos and β decays; the β sample is selected in the ⁷Be shoulder region (320-400 NPE), where the leakage of α events from the right tail of the ²¹⁰Po peak is also negligible.
As anticipated, after the WE period the MLP shows a performance degradation because of the PMT loss and the corresponding degradation of the event reconstruction resolution, resulting in a time and space (radial) dependence. Furthermore, ²¹⁴Po emits a mono-energetic α line at an energy about 50% higher than that of the ²¹⁰Po peak, which falls in the region of interest for the solar neutrino analysis. As a consequence of the energy dependence of the scintillation temporal PDFs, the MLP efficiency evaluated at the ²¹⁴Po line is not directly applicable, for example, to the ²¹⁰Po analysis. The correct assessment of the space, time, and energy dependence of the MLP was studied using calibration data and Monte Carlo simulations as well. In the following paragraphs we report the main MLP features investigated in the Borexino analysis.

E. Different versions of α/β MLP
The most important MLP versions, which gave similar performances and were used in the main Borexino analyses, are listed here:
1. MLPv8: This version is the first showing a significant improvement with respect to Gatti. The input variables are the ten t2t's described above, in addition to the RMS, the kurtosis and the mean time of the non-reconstructed cluster.
2. MLPv10: This version is similar to MLPv8, but the t2t's are replaced with 10 quantiles. In some cases, this version shows a slightly better performance compared with MLPv8, for the reasons described above, especially for low energy events.
3. MLPv12: This version was meant to solve problems coming from the energy difference between the ²¹⁴Bi-²¹⁴Po training sample and the low energy region where the α/β discrimination is actually applied. The energy of the training sample is artificially reduced by thinning out the number of photoelectrons (randomly removed), in a ratio of 1:2 for the ²¹⁴Po and of 1:4 for the ²¹⁴Bi. Although this method assigns the correct statistical weight to the training samples (making them similar to the low energy region), it cannot reproduce the real dependence of the scintillation PDFs upon the energy.
In order to compare and contrast the different versions of MLPs, several studies have been performed, especially in terms of efficiency, space uniformity and time stability.
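The photoelectron thinning used for MLPv12 can be sketched in Python as an independent random removal of hits. Reading the ratio 1:2 as keeping each hit with probability 1/2, and 1:4 as keeping it with probability 1/4, is an assumption about the notation; the function name and the toy event are hypothetical.

```python
import random


def thin_hits(hit_times_ns, keep_fraction, rng=None):
    """Keep each hit independently with probability keep_fraction."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    rng = rng or random.Random()
    return [t for t in hit_times_ns if rng.random() < keep_fraction]


# Toy event with 10000 hits, thinned as assumed for the two sample types
rng = random.Random(1)
toy_event = [float(i) for i in range(10000)]
po214_like = thin_hits(toy_event, 0.5, rng)   # ratio 1:2 (assumed meaning)
bi214_like = thin_hits(toy_event, 0.25, rng)  # ratio 1:4 (assumed meaning)
print(len(po214_like), len(bi214_like))
```

Random removal preserves the shape of the hit-time distribution while lowering the number of hits, which is why it reproduces the statistical weight of low energy events but not any genuine change of the pulse shape with energy.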
The TMVA package returns the normalised selector in the 0-1 interval, sharply peaked at 0 for α's and at 1 for β's in the Borexino choice. Figure 8 shows the distribution of the 0-1 selector for the versions of interest for α's (red) and β's (blue) from the test Sample-WEX. All of them are essentially comparable, even if MLPv8 shows in general a better symmetry and sharper distributions. MLPv12, for the reasons discussed above, shows a more smeared distribution, though still with a good separation. These differences can be understood by comparing the same discriminators with the Receiver Operating Characteristic (ROC) plot, as reported in Fig. 9. In particular, assuming α's as signal and β's as background (N.B.: opposite to the typical TMVA convention in this particular analysis), the ROC curve reports the True Positive Rate (TPR) as a function of the False Positive Rate (FPR) when changing the selection threshold m_0 in the interval 0 < m_0 < 1, that is

$$ \mathrm{TPR}(m_0) = \int_0^{m_0} M_\alpha(m)\, dm, \qquad (6) $$

$$ \mathrm{FPR}(m_0) = \int_0^{m_0} M_\beta(m)\, dm, \qquad (7) $$

where M_α,β are the corresponding PDFs of the MLP parameters, always from the test Sample-WEX. From Fig. 9, one can compare the overall performance of the different MLP versions (bluish and greenish curves) and of the Gatti parameter (red), as reported in the Figure legend. The better the discriminator, the more its curve approaches a right angle in the top-left corner. From this Figure, we can conclude that all MLPs have a good overall performance, considerably better than Gatti for FPR < 0.2%; e.g. at 99.75% TPR the β contamination (FPR) is lower by a factor of 2.
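The ROC construction can be illustrated with a minimal Python sketch that counts, for a given threshold m_0, the fraction of α and β events with selector value below the threshold. It follows the convention stated in the text (α's as signal, selector peaked at 0 for α's); the strict inequality, the function names and the toy scores are assumptions.

```python
def roc_point(alpha_scores, beta_scores, m0):
    """Return (TPR, FPR) for the cut m < m0, with alphas taken as signal."""
    tpr = sum(1 for m in alpha_scores if m < m0) / len(alpha_scores)
    fpr = sum(1 for m in beta_scores if m < m0) / len(beta_scores)
    return tpr, fpr


def roc_curve(alpha_scores, beta_scores, n_steps=100):
    """Scan the threshold from 0 to 1 and collect the (TPR, FPR) points."""
    return [
        roc_point(alpha_scores, beta_scores, k / n_steps)
        for k in range(n_steps + 1)
    ]


# Toy selector outputs (hypothetical values)
ALPHA_SCORES = [0.01, 0.02, 0.03, 0.40]
BETA_SCORES = [0.60, 0.97, 0.98, 0.99]

print(roc_point(ALPHA_SCORES, BETA_SCORES, 0.05))
```

A tighter threshold lowers the β leakage at the price of α efficiency, so the working point is a trade-off read off the curve.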

F. MLP radial dependency
The radial dependence of the MLP selector, strictly related to the position dependence of the reconstructed cluster, plays a crucial role in the ²¹⁰Po spatial analysis, as described at the end of Sec. II. It is therefore important to study, through Monte Carlo simulations and through the test samples, any possible feature of the MLP related to the position and its possible bias in the ²¹⁰Po activity determination. Figure 10 shows the radial efficiency determined from the test Sample-Ph23 for MLPv8 (green) and Gatti (red): the former shows a better behaviour in terms of spatial uniformity. If one considers the CNO fiducial volume, located at about 21 m³ on the x-axis, the efficiency is quite uniform. As discussed in [15], the non-uniformity of the MLPv8 efficiency is indeed negligible compared with the energy and time dependence, which will be discussed below.
Such studies have been performed with different methods and with different values of the MLP selection threshold.
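A radial efficiency of this kind can be sketched as follows: events of a known α sample are binned in r³ (so that a spatially uniform sample populates the bins evenly), and the fraction passing the cut is computed per bin. This is a minimal sketch under stated assumptions: the function name `efficiency_vs_r3`, the binomial uncertainty, and the input values are hypothetical and do not reproduce the analysis of Fig. 10.

```python
import numpy as np

def efficiency_vs_r3(r, selector, cut, n_bins, r_max):
    """Fraction of events with `selector < cut` in equal-width bins of r^3.

    Returns the bin edges (in r^3), the efficiency per bin, and a simple
    binomial uncertainty. Empty bins yield NaN.
    """
    r3 = np.asarray(r, dtype=float) ** 3
    passed = np.asarray(selector, dtype=float) < cut
    edges = np.linspace(0.0, r_max ** 3, n_bins + 1)
    total, _ = np.histogram(r3, bins=edges)
    selected, _ = np.histogram(r3[passed], bins=edges)
    with np.errstate(invalid="ignore", divide="ignore"):
        eff = selected / total
        err = np.sqrt(eff * (1.0 - eff) / total)
    return edges, eff, err

# Tiny hand-made example (assumption): four alpha events, two radial bins.
edges, eff, err = efficiency_vs_r3(
    r=[1.0, 1.0, 2.0, 2.0],
    selector=[0.1, 0.9, 0.1, 0.1],
    cut=0.3,
    n_bins=2,
    r_max=2.0,
)
```

In this toy case one of the two inner events fails the cut, so the inner bin has a lower efficiency than the outer one.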

G. Stability and energy dependence of MLPs
The time dependence of the MLP, for a given selection cut, in the ²¹⁰Po energy region has been carefully studied in view of the time stability of the ²¹⁰Bi and ²¹⁰Po activity in the context of the CNO neutrino analysis, see Sec. IV. In order to obtain the selection efficiency of ²¹⁰Po and the corresponding leakage of β events through the cut itself, events in the analysis fiducial volume are fitted with the so-called MLP-complementary method. In the latter, the data set, year by year from 2011, is split into two histograms depending on whether or not the events pass the MLP cut, named "MLP-subtracted" and "MLP-complementary", respectively, as reported in Fig. 11.
For a typical analysis, a PSD threshold of $m_0 < 0.05$ and an energy in the 150-300 NPE interval are used. Under these conditions, the fitted spectra, as a function of the energy E [NPE], are defined through the model of Eq. (8), where $S_{bx}(E)$ is the typical Borexino spectrum with all fitted species (see e.g. [13]), $S_{\alpha,\beta}$ are the resulting α and β selected spectra and, finally, A and $E_0$ are two free parameters. Notice that the ansatz that the energy dependence of the MLP cut is exponential is suggested by Monte Carlo simulations and calibration data, and also by general considerations about the statistical nature of the neural network output. Figures 12 and 13 show the exponential energy dependence of the MLP cut from Monte Carlo simulation and from calibration data, respectively. In both Figures, the energy estimator is npmt, i.e. the number of hit PMTs during a scintillation event without double counting piled-up events, see [1] for further details. Both in calibration data and in MC events, the percentages left after the MLP cut show an exponential behaviour, supporting the choice of the energy dependence of the efficiency assumed in the MLP-complementary fit. Finally, Fig. 14 shows the time evolution of the MLP efficiency for the ²¹⁰Po energy range year by year, resulting from the model in Eq. (8). The slightly decreasing linear trend is compatible with the expected degradation of the event reconstruction, mainly related to the linear PMT loss. This dependence is used to correct the measurement of the ²¹⁰Po activity and to determine the corresponding systematic uncertainty for the final result on the CNO neutrino interaction rate.
The possibility of tagging α events with high efficiency in space and time was of crucial importance for the first measurement of neutrinos from the CNO cycle [15] with Borexino and its subsequent update [16].
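The exponential ansatz for the energy dependence of the MLP cut, discussed above, can be illustrated with a short fit. The sketch below assumes a leakage fraction of the form A·exp(−E/E₀), which is only one possible parametrisation with two free parameters A and E₀ and is not taken from Eq. (8); the function name `fit_exponential` and the toy numbers are hypothetical.

```python
import numpy as np

def fit_exponential(energy, fraction):
    """Fit fraction = A * exp(-energy / E0) by a straight-line fit to log(fraction).

    Assumes all fractions are strictly positive. Returns (A, E0).
    """
    energy = np.asarray(energy, dtype=float)
    fraction = np.asarray(fraction, dtype=float)
    slope, intercept = np.polyfit(energy, np.log(fraction), 1)
    return np.exp(intercept), -1.0 / slope

# Toy data (assumption): noiseless exponential with A = 0.02 and E0 = 60,
# sampled on an energy-estimator grid between 150 and 300.
energy = np.linspace(150.0, 300.0, 16)
fraction = 0.02 * np.exp(-energy / 60.0)

A, E0 = fit_exponential(energy, fraction)
```

On a logarithmic plot such data fall on a straight line, which is how the fitted curves of Figs. 12 and 13 appear; with real data a weighted fit accounting for the statistical uncertainties would be preferable to this unweighted one.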
Given the degeneracy, and hence the correlation, between the ²¹⁰Bi, pep and CNO spectra, the sensitivity to CNO neutrinos through the spectral analysis is rather poor, unless the ²¹⁰Bi and pep-ν rates are independently constrained in the spectral fit [31]. In particular, the pep-ν rate can be constrained to 1.4% precision [31] using the solar luminosity along with robust assumptions on the pp to pep neutrino rate ratio, the global analysis of existing solar neutrino data [32,33], and the most recent oscillation parameters [34]. The pep constraint is essentially independent of any reasonable assumption on the CNO rate, as the solar luminosity depends only weakly on the contribution of the CNO cycle itself.
In practice, the only crucial element at play is the rate of ²¹⁰Bi, a β emitter with a short half-life (5 days) coming from the ²¹⁰Pb (present in the scintillator at the beginning of Phase-II) through the decay chain
$$^{210}\mathrm{Pb} \xrightarrow{\;\beta^-\;} {}^{210}\mathrm{Bi} \xrightarrow{\;\beta^-\;} {}^{210}\mathrm{Po} \xrightarrow{\;\alpha\;} {}^{206}\mathrm{Pb}. \qquad (10)$$
Assuming secular equilibrium, the ²¹⁰Bi rate can be determined from the ²¹⁰Po activity [31,35,36]. Since the ²¹⁰Po activity can be measured precisely through the high-efficiency MLP α/β tagging, this strategy provided the key to tackling the correlation among species in the spectral fit, and led the Borexino Collaboration to the first observation of the CNO neutrino interaction rate and its subsequent update [15,16]. However, the situation was not that simple: at the beginning of Borexino Phase-II (early 2013), it was clear that the presence of convective motions, caused by the seasonal change of the temperature in the Gran Sasso experimental hall, made it impossible to apply the ²¹⁰Bi-²¹⁰Po link suggested by the sequence in Eq. (10).
In order to solve this problem, a long and challenging thermal stabilisation program was undertaken by the Collaboration to prevent convective motion in the scintillator and the consequent mixing of contaminants. This program, started in mid-2014, consisted of different phases: (i) installation of high precision temperature probes inside and outside the detector, (ii) thermal insulation of the detector with different layers of rock wool, (iii) active temperature control of the detector and (iv) active temperature control of the experimental room. This long-standing effort worked properly and allowed one to set an upper limit on the ²¹⁰Bi rate, a crucial ingredient for the final extraction of the CNO neutrino interaction rate from the spectral analysis.
It is worth mentioning that the MLP tagging, with its high efficiency, uniformity and stability, helped in all the stages of this enterprise: from the understanding of the ²¹⁰Po migration in the scintillator, through the study of the effects of the different phases of the thermal insulation program, to the determination of the upper limit on the ²¹⁰Bi rate (for details, see the appendix of [15]).
For the CNO analysis, the space and time dependence of the α/β tag in the ²¹⁰Po region was studied carefully using the MLP-complementary analysis and Monte Carlo simulations. The latter were crucial for the optimisation of the cut and for determining the dependence of the efficiency upon radial position and time. In particular, the best cut was defined by maximising the standard signal-to-background (S/B) figure of merit (FoM), where S is taken as the true positive events (real α's) and B as the false positive events (β-like events leaking out from the distribution tail). For MLPv8 the best α cut, corresponding to the Phase-III data set, was found at $m_0 < 0.3$.
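A cut optimisation of this type can be sketched as a simple threshold scan. Since the exact expression of the figure of merit is not reproduced here, the sketch below assumes S/√(S+B) as a default and lets the caller supply a different one; the function name `best_cut`, the default FoM, and the toy selector values are all hypothetical.

```python
import numpy as np

def best_cut(alpha_scores, beta_scores, thresholds, fom=None):
    """Scan thresholds m0 for the cut `selector < m0` and return the best one.

    S counts alphas passing the cut (true positives), B counts betas passing
    it (false positives). The default figure of merit S / sqrt(S + B) is an
    assumption made for this sketch.
    """
    if fom is None:
        def fom(s, b):
            return s / np.sqrt(s + b) if (s + b) > 0 else 0.0
    alpha_scores = np.asarray(alpha_scores, dtype=float)
    beta_scores = np.asarray(beta_scores, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    values = np.array([
        fom(np.count_nonzero(alpha_scores < m0),
            np.count_nonzero(beta_scores < m0))
        for m0 in thresholds
    ])
    i = int(np.argmax(values))
    return thresholds[i], values[i]

# Toy example (assumption): four alpha and four beta selector values.
m0_best, fom_best = best_cut(
    alpha_scores=[0.05, 0.1, 0.2, 0.6],
    beta_scores=[0.4, 0.7, 0.9, 0.95],
    thresholds=[0.1, 0.3, 0.5, 0.8],
)
```

In this toy case the scan selects the threshold that keeps three of the four α's before any β leaks in; with realistic samples the optimum depends on the relative α and β rates in the energy window considered.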

B. MLP in other analyses
Besides the CNO analysis, the MLP α/β tagging was used in many other analyses published by the Borexino Collaboration. In particular, it played an important role in the high-significance detection of the seasonal modulation of ⁷Be neutrinos due to the eccentricity of the Earth's orbit [37], through the reduction of the ²¹⁰Po component in the region of interest of the ⁷Be spectrum. This analysis was updated to include the entire Phase-II and Phase-III data set, leading to the first independent measurement of the eccentricity of the Earth's orbit with solar neutrinos only [38].
In the geo-neutrino analysis, the MLP was used in the event selection, with a high performance even at large radii (∼4 m) close to the nylon vessel [24], and for the estimation of the ²¹⁰Po contribution to the neutron background induced by α decays. In addition, the MLP selection was used for the space and time selection of the ²¹⁰Po data events for the accurate tuning of the Monte Carlo used for simulating the ²¹⁰Po spectrum in Phase-II [39] and Phase-III. In particular, this study played an important role in the comprehensive analysis of the pp chain [12,13], which provided a measurement of the most important solar neutrino fluxes and favours the MSW-LMA neutrino oscillation scenario at 98% CL (see e.g. [40] and Refs. therein).

CONCLUSIONS
In this paper, we offer a detailed review of the α/β pulse shape discrimination adopted in Borexino. We present various implementations used during more than a decade of data taking, starting with the Gatti optimal filter and the corresponding statistical subtraction of the α component from the energy spectrum, and ending with the more sophisticated PSDs based on ANNs, specifically exploiting the MLP. The latter, with its high efficiency, spatial uniformity and time stability, allowed us to select the ²¹⁰Po events on an event-by-event basis, a crucially important background reduction which made possible the observation of CNO solar neutrinos in Borexino.
Compared to the Gatti parameter approach, ANNs single out parameters relevant to PSD in a highly non factorizable way.In the case of Borexino, the Gatti parameter was limited by information loss in the photon arrival times after position reconstruction correction.The integration of variables in the MLP before event reconstruction improved the performance of the α/β selection.The MLP implementation required careful calibration to select the best input parameters, tune the algorithm, and evaluate its performance.Its spatial and time efficiency were monitored and used for the evaluation of the global systematic uncertainty of some of the most important Borexino results for which the method was used.
The α/β pulse shape discrimination, allowed by the intrinsic properties of the scintillator, was proven to be fully exploitable in an ultra-pure, large-volume detector such as Borexino. In particular, it played an essential role in the neutrino spectroscopy of the entire pp chain and the first observation of neutrinos from the CNO cycle.

FIG. 3. Example of α/β separation in the Gatti-Energy (NPE) space (first 300 days of Borexino Phase-II). The big blob at the top represents the α distribution (²¹⁰Po), while the bottom horizontal belt represents the β-like component.

FIG. 5. Example of α/β statistical subtraction with the analytical curves for events in the energy range 200-205 NPE. The blue and red lines show the individual Gaussian fits to the Gatti parameter distributions for the β and α components, respectively, while the green line is the total fit.

FIG. 6. Radial (r³) dependence of the Gatti parameter G (grey) for a sample of ²¹⁴Bi-²¹⁴Po events. Red and blue points represent the mean values with their uncertainty for α and β-like events, respectively.

FIG. 7. Limitations of the t2t input variable in comparison with quantiles. Top: last t2t (t > 310 ns) as a function of the energy in NPE for the α (red) and β distributions. Bottom: same distributions for the last quantile (10% tail of the time PDF).

FIG. 10. Comparison between the radial efficiencies of the PSD cut for MLPv8 < 0.3 (green) and Gatti > 0 (red). In order to account for the cubic increase of the statistics for the radial distribution, the efficiency is reported as a function of r³, instead of r.

FIG. 12. Example of energy dependence (npmt energy estimator) of the MLP efficiency, obtained from Monte Carlo data. The percentages of β-like events left after the MLP < 0.015 cut are reported for the versions MLPv12 (orange) and MLPv10 (cyan). Both curves are fitted with exponential functions, namely the red and the blue lines in the logarithmic plot.

FIG. 14. Time dependence of the MLP efficiency from the MLP-complementary fit for the combined Phase-II and Phase-III period for MLPv8, performed over one-year time intervals (blue histogram). A fitted linear trend (red) shows, at leading order, the degradation of the MLP efficiency due to the loss of PMTs (p-value = 0.48). The uncertainty increases over time because of the reduced ²¹⁰Po statistics due to its decay.

TABLE I. Solar neutrino interaction rates in Borexino and extrapolated solar neutrino fluxes for the different components of the pp chain and CNO cycle. Rates are reported in cpd/100 t, while fluxes are reported in cm⁻² s⁻¹. N.B.: HZ stands for the high metallicity assumption.
with an accuracy of ∼10 cm (at 1 MeV) and the event energy with a resolution following approximately the relation $\sigma(E)/E \simeq 5\%/\sqrt{E/[\mathrm{MeV}]}$.

4. MLPv14: Finally, this version attempts to solve the problem of the low energy extrapolation using ²¹⁸Po events for α's (with lower energy and hence closer to ²¹⁰Po). Since the ²¹⁸Po precedes the ²¹⁴Bi-²¹⁴Po fast coincidence by about 30 minutes, a space-time cut (1 meter radius and 1.5 hours before) was able to select a pure sample of about 1800 event candidates. Due to the very low efficiency of the ²¹⁸Po tagging, this sample contains a properly representative set of low energy events on the one hand, but has limited statistics on the other.