Simultaneous precision spectroscopy of pp , 7 Be , and pep solar neutrinos with Borexino Phase-II

We present the simultaneous measurement of the interaction rates R pp , R Be , R pep of pp , 7 Be, and pep solar neutrinos performed with a global fit to the Borexino data in an extended energy range (0.19 – 2.93) MeV with particular attention to details of the analysis methods. This result was obtained by analyzing 1291.51 days of Borexino Phase-II data, collected after an extensive scintillator purification campaign. Using counts per day ð cpd Þ = 100 ton as unit, we find R pp ¼ 134 (cid:2) 10 ð stat Þ þ 6 − 10 ð sys Þ , R Be ¼ 48 . 3 (cid:2) 1 . 1 ð stat Þ þ 0 . 4 − 0 . 7 ð sys Þ ; and R HZ pep ¼ 2 . 43 (cid:2) 0 . 36 ð stat Þ þ 0 . 15 − 0 . 22 ð sys Þ assuming the interaction rate R CNO of CNO-cycle (Carbon, Nitrogen, Oxigen) solar neutrinos according to the prediction of the high metallicity standard solar model, and R LZ pep ¼ 2 . 65 (cid:2) 0 . 36 ð stat Þ þ 0 . 15 − 0 . 24 ð sys Þ according to that of the low metallicity model. An upper limit R CNO < 8 . 1 cpd = 100 ton (95% C.L.) is obtained by setting in the fit a constraint on the ratio R pp =R pep ( 47 . 7 (cid:2) 0 . 8 cpd = 100 ton or 47 . 5 (cid:2) 0 . 8 cpd = 100 ton according to the high or low metallicity hypothesis). DOI: 10.1103/PhysRevD.100.082004 as in the real data analysis, penalty terms are added in the likelihood to constrain the values of the 14 C and pileup rates within the measured ones. It is interesting to note the correlation between the pp and 85 Kr rates, physically driven by the fact that a not negligible portion of the 85 Kr spectrum lies in the energy region around about 200 keV where we are sensitive to the pp ν s signal. In the left plot, 6700 pseudoexperiments have been generated assuming the R CNO according to HZ-SSM and fitted imposing a constraint on R CNO to the same value. The same MC PDFs have been used to simulate and fit data, so these plots show only uncertainties due to statistical fluctuations and the effects of the correlations among the various components. The top right inset represents the results of the fit of 10000 pseudo-experiments fitted with the MC method while keeping the R CNO free but constraining the R pp =R pep ratio to ( 47 . 7 (cid:2) 0 . 8 ) (HZ-SSM [3,26]). Constraining R pp =R pep to the LZ-SSM prediction, 47 . 5 (cid:2) 0 . 8 , gives consistent results. The study included all the background and neutrino species: here we only show those components that mostly influence the sensitivity to CNO neutrinos.


I. INTRODUCTION
Solar neutrinos produced in electron flavor (ν e )in fusion reactions occurring in the Sun provide a unique and direct way to study the interior of our star. The main contribution to the solar luminosity (∼99%) comes from reactions belonging to the pp chain, while the Carbon-Nitrogen-Oxigen (CNO) cycle is expected to play a subdominant role [1].
The solar neutrino (ν) spectrum, as predicted by the standard solar model (SSM) [2,3], is dominated by the low-energy neutrinos produced in the primary pp reaction (E ν < 0.42 MeV) and it extends up to ∼18.8 MeV (maximum energy of the hep νs). It also features two monoenergetic lines from 7 Be νs ðE ν ¼ 0.384 MeV and 0.862 MeV) and one monoenergetic line from pep νs (E ν ¼ 1.44 MeV). Neutrinos from the CNO cycle are expected to have a continuous energy spectrum extending up to 1.74 MeV. The spectrum of 8 B νs is also continuous and it ends up at about 16.5 MeV.
The 50-year-long experimental effort to study solar neutrinos [4,5] has been extremely rewarding both in terms of solar physics, by confirming the SSM predictions [3], and in terms of particle physics, by giving a substantial contribution to the discovery of neutrino flavor oscillations [6,7]. The present-day precision spectroscopy of solar neutrinos aims at studying the details of their energy spectrum by disentangling the contributions from the different reactions (pp chain νs, namely pp, 7 Be, pep, 8 B, and hep νs, and CNO cycle νs).
On the one hand, if the SSM predictions of solar fluxes ϕ are assumed, measuring the solar neutrino interaction rates R for different reactions helps to pin down the electron-flavor neutrino survival probability P ee for different energies (that is the probability that ν e s do not undergo flavor oscillations while traveling from their Sun production point to the detector). Consequently, it probes the predictions of the MSW-LMA model [8] and can set constraints on possible deviations, e.g., due to nonstandard interactions (NSI) [9].
On the other hand, if the neutrino oscillation parameters are assumed, the study of specific components of the solar neutrino spectrum can cross-check the SSM predictions. In particular, the experimental determination of the fluxes ϕ of 7 Be, 8 B, or CNO neutrinos, which are the most sensitive ones to the solar metallicity (the abundance of the elements heavier than He in the Sun), can help to settle the question of high (HZ) versus low (LZ) metallicity [3].
The Borexino experiment has recently reported a comprehensive measurement of the solar neutrino spectrum from the whole pp nuclear fusion chain in the energy range of (0.  MeV. These results are presented in [10] together with their physical implications. They include the updated values of the neutrino survival probability P ee as a function of the neutrino energy, the first direct measurement of the ratio R between the 3 He þ 4 He (pp-II) and the 3 He þ 3 He (pp-I) branches of the pp chain obtained by combining our results on the 7 Be and pp νs, and finally a preference for the HZ-metallicity choice in the SSM.
In this paper we present the details of the analysis of the data belonging to the lowest part of the energy spectrum which extends from 0.19 to 2.93 MeV. This low energy region (LER) is used to extract the interaction rates R pp , R Be , R pep , as well as to set the limit on R CNO . The analysis of the data from the so-called high energy region (HER) from 3.2 to 16 MeV, where our sensitivity to 8 B νs is maximized and from 11 to 20 MeV energy region, in which the first Borexino limit on hep νs is set, is discussed in [11].
While our previous measurements of the pp [12], 7 Be [5], pep [13], and 8 B [14] νs were obtained separately by analyzing data in restricted energy ranges, the results of [10] provide a unified analysis over the interval covering the LER and HER. The experience from the previous analyses in different energy intervals, each of them having specific difficulties, was fundamental in the process of building up the comprehensive understanding of our data and of the detector response across the combined energy interval as a whole. In addition, other important elements of the measurement are: an accurate calibration campaign [15] in the energy interval ranging from 0.15 to 9 MeV carried out by deploying several radioactive sources inside the detector, a detailed Monte Carlo (MC) simulation finetuned to reproduce the calibration data simultaneously at low and at high energies [16], and the use of data processing and data selection as well as backgroundrejection tools common to the whole energy range.
The unified analysis approach in the LER, described in this work, together with a larger exposure and a reduction of the most relevant backgrounds in the Phase-II lead to a significant improvement of the accuracy of our previous Phase-I results about the R Be (from 4.8% to 2.7%) and R pep (from 21.6% to 17.4=16.3%, depending on the HZ/LZ-SSM assumption, respectively). For R pp the improvement is smaller, from the precision of 11.4% to 10.6%.

II. THE BOREXINO DETECTOR AND THE DATA SELECTION
The Borexino experiment is located at the Laboratori Nazionali del Gran Sasso in Italy. The core of the detector [17] is 278 ton of ultrapure organic liquid scintillator, namely PC (pseudocumene, 1,2,4-trimethylbenzene) as a solvent and 1.5 g=l of fluor PPO (2,5-diphenyloxazole) as a solute, contained in a 125 μm-thick nylon inner vessel (IV) of 4.25 m radius, surrounded by nominally 2212 8-inch ETL 9351 photomultipliers (PMTs). Since the beginning of the data taking, we observed a slow PMT failure rate over time. As a reference, the number of working channels was 1769 at the beginning of the data-taking period considered in this work while it was 1383 at its end.
Neutrinos of any flavor interact by elastic scattering with electrons, whose recoil produces scintillation light (∼500 photoelectrons=MeV=2000 PMTs). The density of target electrons in the scintillator is ð3.307 AE 0.003Þ× 10 31 =100 ton. A nonscintillating buffer fills the space between the IV and a stainless-steel sphere (SSS) of 6.85 m radius, which supports the PMTs. The buffer liquid is further divided in two regions by another nylon vessel of radius 5.5 m which prevents radon emanating from the SSS and the PMTs to enter the core of the detector. The entire detector is enclosed in a cylindrical tank filled with ultrapure water and instrumented with 208 PMTs, acting as an active Cherenkov muon veto and as a passive shield against external γs and neutrons.
The present analysis is based on the data collected between December 14, 2011 to May 21, 2016, which corresponds to an exposure of 1291.51 days × 71.3 t (∼1.6 times the exposure used in [5]). This period belongs to the so-called Borexino Phase-II, which started after an extensive purification campaign of the scintillator with 6 cycles of closed-loop water extraction, which has significantly reduced the radioactive contaminants: 238 U < 9.4 × 10 −20 g=g (95% C.L.), 232 Th < 5.7 × 10 −19 g=g (95% C.L.), 85 Kr and 210 Bi reduced respectively by a factor ∼4.6 and ∼2.3 (see this work).
The expected solar νs interaction rate in Borexino ranges from few to ∼100 cpd=100 ton depending on the neutrino component. Together with the lack of directionality information from the scintillation light, this low rate demands a high detector radio-purity, a deep understanding of the backgrounds, and an accurate modeling of the detector response.
The position and pulse-shape of each event are reconstructed by exploiting the number of detected photons and their detection times. The information about the event energy is carried by the number of detected photoelectrons or just the number of hit PMTs, as in our energy range the PMTs mainly work in a single photoelectron regime. In detail, we define different energy estimators: N p which is the total number of hit PMTs in the event or N dt 1ð2Þ p , the number of hit PMTs happening within a fixed time interval of 230 (400) ns; N h the number of detected hits, including multiple hits on the same PMT and, finally N pe , the total charge collected by each PMT anode, that is the number of photoelectrons, p.e. As it will be detailed in Sec. V C, the energy is not reconstructed meaning that, during the analysis procedure, we do not convert the values of the energy estimator into the event energy. On the contrary, we build the prediction of the measured variables transforming the theoretical event energy into the corresponding value of a given energy estimator. As a reference, at 1 MeV, the energy and position reconstruction resolutions are ∼50 keV and ∼10 cm, respectively. The trigger threshold is N p > 20 in a 100 ns time window, which corresponds to ∼50 keV.
To account for the variation in the number of working channels as a function of time, in the analysis and simulation procedures, all the energy estimators are normalized to a fixed number N tot of PMTs (typically N tot ¼ 2000 PMTs) [18] through the relation N p;h;pe ¼ N m p;h;pe · N tot =N 0 ðtÞ, with N m p;h;pe being the measured value of the energy estimator and N 0 ðtÞ is the time-dependent number of working PMTs.
Events in the entire LER are selected using the same cuts described in [12]: we remove internal (external) muons [19] and we apply a 300 (2) ms veto to suppress cosmogenic backgrounds. The total dead-time introduced by these vetoes is 1.5%. We remove 214 Bi -214 Po fast coincidences from the 238 U chain and unphysical noise events. The fraction of good events removed by these cuts, estimated using MC simulations [16] and calibration data [15], is ∼0.1%. Background from sources external to the scintillator (nylon vessel, SSS, and PMTs) is reduced with a fiducial volume (FV) cut, which selects the innermost region of the scintillator (71.3 ton), contained within the radius R < 2.8 m and the vertical coordinate −1.8 < z < 2.2 m.
The 11 C isotope is continuously produced in the liquid scintillator by muons through spallation on 12 C. In order to limit its effect on the sensitivity to pep νs, we exploit the so-called three-fold coincidence (TFC) method and e þ =e − pulse-shape discrimination [13,18].
The TFC takes advantage of the fact that 11 C is often produced together with one or even a burst of neutrons. The principle of the method is thus to tag events correlated in space and time with a muon and a neutron. We have improved the TFC technique already employed by us [13] by implementing a new algorithm, which evaluates the likelihood L TFC that an event is a 11 C candidate, considering relevant observables such as the distance in space and time from the parent muon, the distance from the neutron, the neutron multiplicity, and muon dE=dx. Based on this probability, the data set is divided in two samples: one depleted (TFC-subtracted), obtained removing the 11 C tagged events, and one enriched (TFC-tagged) in 11 C. These two sets are separately fitted in the multivariate scheme (see later). The new TFC algorithm has (92 AE 4)% 11 C-tagging efficiency, while preserving (64.28 AE 0.01)% of the total exposure in the TFC-subtracted spectrum. Figure 1 shows the distribution of logðL TFC Þ of the present data set as a function of the N dt 1 p energy estimator and it demonstrates how 11 C decays can be identified by cutting the events on the basis of the value of L TFC .
A. Pulse shape discrimination of β + =β − events The residual amount of 11 C in the TFC-subtracted spectrum can be disentangled from the neutrino signal through variables with β þ =β − pulse-shape discrimination capability [13,18]. We build these variables considering that the probability density function (PDF) of the time detection of the scintillation light is different for β þ and β − events for two reasons: (i) for β þ events, in 50% of the cases, the e þ annihilation is delayed by ortho-positronium formation, which survives in the liquid scintillator with a mean time τ ∼ 3 ns [20]; (ii) the topology e þ energy deposit is not pointlike, due to the two back-to-back FIG. 1. Distribution of logðL TFC Þ as a function of the N dt 1 p energy estimator. The plot is built using the entire set of data surviving the selection cuts described in Sec. II. The regions dominated by the abundant internal background of 14 C and 210 Po are indicated by the corresponding labels. The green-dashed horizontal line represents the L TFC -threshold, above/below which the events are assigned to the TFC-tagged/subtracted energy spectrum. It is clearly visible that the majority of the events of the 11 C energy decay spectrum lies above this threshold. 511 keV annihilation γs. These two features originate a pattern of the energy deposit of β þ with a larger time and spatial spread than the corresponding one generated by β − . Based on this fact, a pulse-shape (PS) discrimination algorithm has been constructed using the neural network of a boosted decision tree (BDT) and used for previous analysis as detailed in [18]. In the present analysis, we have introduced a novel discrimination parameter, called PS-L PR , defined as the maximum value of the likelihood function L PR used in the position reconstruction (PR), divided by the value of the energy estimator. The latter normalization removes the L PR energy-dependence, since it is calculated as the summation over the collected hits [18]. The PR-algorithm is based on the expected distribution of the arrival times of optical photons on the PMTs. For all events, the algorithm uses the scintillation light emission PDF of point-like β − events. For this reason the distribution of the maximum likelihood value shows some discrimination capability for different types of particles, if they originate photon time patterns distinct from that of β − .
The study of the performances of the PS-L PR variable demands, from one side, the identification of samples of true β − and β þ events and, from another side, it requires to properly account for the variable number of working channels that influences its value.
A pure, high-statistics β − sample can only be obtained from a limited time period of the water-extraction phase of the scintillator purification campaign. During this time, a temporary 222 Rn contamination entered the detector. Using the space-and-time correlation of the fast coinciding Bi events. The ability of the MC to reproduce the PS-L PR parameter of these events and the comparison to data is shown in Fig. 2. The agreement between data and simulation demonstrates that the MC can accurately construct the PDF of this parameter for the entire set of data thus accounting for the variable number of working channels.
Our best β þ sample is obtained from the TFC tagged events with hard cuts on the energy and on the time correlation with the neutron and muon tracks. These events are selected from the whole data set and thus they naturally follow the live-channels distribution. The discrimination capability of PS-L PR is demonstrated comparing them with a MC sample of pure electrons with a flat energy distribution in the energy interval of the 11 C events, while also following the realistic live channel distribution over the whole data set. The PS-L PR for these MC generated electrons was used as β − sample in a further analysis (analytical multivariate fit) described in Sec. IV). Figure 3 shows the distribution of the PS-L PR parameter for the MC generated electrons compared with that of β þ events obtained from 11 C data. The difference between the two distributions at high values of PS-L PR is the key element allowing the discrimination between β − and β þ . Note that we do not need to build a position reconstruction algorithm based on the time profile of the scintillation light of β þ events. Figure 4 shows the PS-L PR pulse-shape discriminator as a function of N dt 1 p energy estimator for events selected with the cuts described in Sec. II and used in the present analysis.
It is interesting to note that the comparison between the BDT and PS-L PR parameters, using the samples of true β − and β þ events, shows that they have similar discrimination power and they similarly help in reducing the systematic uncertainty of the pep νs result. However, the use of PS-L PR offers some advantages like its simplicity, the fact that it can be calculated without the training procedure necessary for BDT (that suffers the limited size of the available β − training sample), and finally, the possibility to easily reproduce it through the MC.

IV. MULTIVARIATE FIT
The most powerful signatures for the detection of solar neutrinos in Borexino are the shapes of the energy spectra from electrons that underwent elastic scattering interactions with neutrinos. However, the recognition of these shapes is somewhat hindered by the contribution of various types of background events. In addition, the spectral details are also masked through the finite energy resolution of the detector and eventually distorted by nonlinear effects linking the energy deposit in the scintillator and the observed energy estimator.
Signal and background can be disentangled through an accurate fit. In order to enhance our sensitivity to the neutrino signal, we have adopted in the entire LER the multivariate fit approach already exploited in [13]. We maximize a binned likelihood function containing the information from the TFC-tagged and TFC-subtracted energy spectra. Additional information from the PS-L PR parameter and the radial distributions of the events in the optimized energy regions are included in the fit. The radial information is important to accurately measure the background rates due to external γs produced by the contamination of the PMTs and the supporting SSS. The pulse shape parameter PS-L PR helps in the separation of the residual 11 Cðe þ Þ background from the e − -like components, and this is relevant for the determination of R pep and R CNO .
Several ingredients are necessary to perform the fit. The first one is a background model, that is a list of possible radioactive contaminants that we assume give a contribution to the measured signal. The second one is the detector response function, i.e., a full model of the distributions of all the physical variables that we measure. The knowledge of the detector response function allows the prediction of the probability density functions of all the quantities entering the fit procedure.
As done in previous Borexino analyses, we have adopted two complementary methods to build the detector response function: an analytical approach and a MC based procedure. The only free parameters of the fit in the MC approach are the interaction rates of neutrino and background species, while in the analytical method (see later), in addition, some of the parameters related to the response function and to the energy scale are also free and determined by the fit procedure. These two methods share the same background model.
Fitting tools based on the use of graphical processing units (GPU) have been developed and used with the analytical fit method. They decrease the computation time by about 3 orders of magnitude compared with the standard CPU based algorithms previously used [21].

A. Multivariate likelihood function
The TFC-subtracted and TFC-tagged data sets are fitted simultaneously by maximizing a likelihood function The symbol ⃗ θ indicates the set of the arguments with respect to which the function is maximized and ⃗ k generically indicates the set of the experimental data used to evaluate the likelihood. The two factors in Eq. (1) are the likelihood functions related to TFC-subtracted and TFCtagged energy spectra, respectively.
L TFC sub ð ⃗ kj ⃗ θÞ is the standard Poisson binned likelihood function: where ⃗ k in this case is the ensemble of the data entries k j;l;m in the energy bin j, position bin l, and pulse shape parameter bin m; λ j;l;m ð ⃗ θÞ are the expected number of entries in the same bins, and N E;R;P are the total number of energy, radial, and pulse shape parameter bins. L TFC tag ð ⃗ kj ⃗ θÞ is constructed in a similar way but it does not include the pulse shape variable: and ⃗ k represents in this case the set of data entries k j;l in the energy and radial bins j, l integrated with respect to the pulse shape parameter. The signal of 11 C in the TFC-tagged spectrum is relatively strong compared to the other spectral components and the fit procedure extracts it very efficiently thanks to its spectral shape. This is the reason driving the choice of using the two dimensional (2D) likelihood function of Eq. (3) for the TFC-tagged spectrum instead of a complete function of Eq. (2) that, for the TFC-tagged spectrum, only increases the computation time without bringing additional information.
Both the TFC-subtracted and TFC-tagged spectra are fitted keeping the rates of the majority of the components in common, except 11 C itself, 6 He and 10 C (which have cosmogenic origin), and 210 Po, that is not distributed homogeneoulsy through the detector volume.
Constraints on the values of the multivariate fit parameters are implemented (if not specified otherwise) as multiplicative Gaussian terms in the likelihood function.
The likelihood function of Eq. (2) and Eq. (3) are exactly the ones which are maximized using our most recent version of the MC-based fit procedure (see Sec. VA). Precisely, we generate with the MC every signal and background component and we build and properly normalize 3D (or 2D) histograms of the simulated number of events as a function of the energy estimator, PS-L PR parameter, and radius (or of the energy estimator and radius only). The quantities λ jlm and λ jl of Eq. (2) and Eq. (3) represent the sum of the bin content of the histograms, each one weighted by the rate of the specific component ( ⃗ θ). Earlier versions of the MC fit and the present analytical fit maximize an approximated version of the likelihood L 3D ð ⃗ kj ⃗ θÞ, as already described in [18]. This function, called Lð ⃗ kj ⃗ θÞ, is written as a product of four factors coming from the TFC-subtracted and TFC-tagged energy spectra ( L TFC E;sub and L TFC E;tag ) and from the PS-L PR (L P ) and radial (L R ) distributions of events in the 11 C-energy-range of the TFC-subtracted spectrum: The first two terms, L TFC E;sub ð ⃗ kj ⃗ θÞ and L TFC E;tag ð ⃗ kj ⃗ θÞ, are Poisson likelihoods [like Eq. (2) and (3)] with ⃗ k being the data entries k j in the energy bin j integrated with respect to the other variables.
The other two terms in Eq. (4) have been built considering that in the framework of the analytical approach, there is no model able to produce precise multidimensional PDFs. Thus we have projected the events, from the optimized energy intervals of the TFC-subtracted spectrum and integrated over energy ranges larger than the binning of the energy spectrum, into 1D histograms of the pulse-shape and radial distributions. L P ð ⃗ kj ⃗ θÞ and L R ð ⃗ kj ⃗ θÞ of Eq. (4) are then built fitting these 1D distributions using PDFs obtained either from the data (high purity 11 C sample for β þ pulse shape) or based on the MC simulation (β − pulse shape, radial distributions). In the calculation of the corresponding likelihoods, we introduce a correlation between the number of counts in different histograms, as events that are in the energy spectrum will also be entries in the projections. To handle this issue, we normalize the functions to the total number of entries N in the projected data histograms. Consequently, we define the likelihood of the PS-L PR parameter as we did in [18] for the previously used PS-BDT parameter: where the scaling parameter a enforces the normalization and is set such where N is the total number of entries in the projected histogram and a is a scaling factor. Here, k m is the actual number of entries of bin m of the 1D projection of the PS-L PR distribution in a fixed energy interval, N 1D P is the total number of bins of this histogram, and λ m ð ⃗ θÞ represents the expected content in bin m. L R ð ⃗ kj ⃗ θÞ is defined in a way similar to L P ð ⃗ kj ⃗ θÞ. The results of the MC-based fit, which is either performed using L 3D ð ⃗ kj ⃗ θÞ or Lð ⃗ kj ⃗ θÞ, are consistent, confirming that no systematic uncertainty is introduced when using the approximated likelihood function.

A. The Monte Carlo method
The MC code developed for Borexino [16] is a customized Geant4-based simulation package [22], which can simulate all processes following the interaction of a particle in the detector (energy loss including ionisation quenching in the scintillator; scintillation and Cherenkov light production; optical photon propagation and interaction in the scintillator modelling absorption and re-emission, Rayleigh scattering, interaction of the optical photons with the surface of the materials; photon detection on the PMTs, and response of the electronics chain) including all known characteristics of the apparatus (geometry, properties of the materials, variable number of the working channels over the duration of the experiment as in the real data) and their evolution in time. The code thus produces a fully simulated detector response function because it provides a simulated version of all the measured physical variables.
All the MC input parameters have been chosen or optimized using samples of data independent from the ones used in the present analysis (laboratory measurements and Borexino calibrations with radioactive sources [15]) and the simulation of the variables relevant for the present analysis has reached sub-percent precision [16].
Once the MC input parameters have been tuned, the PDFs of all the needed variables related to each of the ν and background components are built simulating events according to the specific energy spectrum. In order to properly reproduce the spatial dependence of the energy response, events are simulated in the detector following their expected spatial distribution: while the ν and most of background events are expected to be uniformly distributed in the detector, 210 Po decays are simulated according to their actual spatial and time distribution obtained from experimental data. Note that data events due to the α decay of 210 Po are efficiently identified by tagging 210 Po with a pulse-shape discrimination method based on the multilayer perceptron (MLP) algorithm [23] (a particular class of neural network algorithms). Similarly, γs from external background are generated on the SSS and PMTs surfaces so that the radial distribution of the interactions inside the scintillator volume shows a clear decrease from the outer region of the detector toward the center.
Events generated according to the theoretical signal and background energy spectra are then processed as real data. As already anticipated, for every species, 3D or 2D histograms are built for the energy estimators, the reconstructed radius, and the PS-L PR variable. When properly binned and normalized, these histograms represent the PDFs to be used in the fit and they provide the values λ jlm ð ⃗ θÞ in Eq. (2) and λ jl ð ⃗ θÞ in Eq. (3). In the MC approach there are no free fit parameters other than the interaction rates of all species. The goodness of the fit simultaneously demonstrates the accuracy of the MC simulation, as well as the stability of the detector response over the period of five years.
In the wide energy range covered by this analysis, there is a huge difference between the number of measured counts per bin in the lower and in the higher energy regions. In the construction of the 3D PDFs, the need to simulate large numbers of events becomes really important, since they are scattered over a larger number of bins. To mitigate the consequences due to low populated bins and to have a good approximation to a χ 2 , we have replaced the energy estimator and the radius R with some transformed variables. We choose to use R 3 instead of R, thus using bins of 5 m 3 each and still achieving a very effective separation of the external background from the bulk components. Similarly, we introduced a transformed variable N h 0 based on the N h energy estimator: this change of variable is equivalent to adopting a variable binning size that scales with energy proportionally to the width of the N h distribution obtained simulating monoenergetic electrons. This approach allows to reduce the statistical fluctuations without losing any physical information. As a by-product, this efficient binning significantly reduced the computing time needed to perform a single fit, speeding up the analysis of the MC pseudoexperiments used to estimate the statistical and systematic uncertainties of the measurement described in Sec. VI.
The multivariate analysis was not applied on the whole energy range: the radial information was considered only for N h > 290 to exclude from the analysis the spatial distribution of 210 Po, while the PS-L PR was used where 11 C is present (409 < N h < 645). The shape of the probability density function of the PS-L PR variable for β þ was obtained from an empirical parametrization of the distribution generated by the MC, with an additional small shift to compensate differences between the MC simulation outcome and a sample of strictly selected 11 C events.
B. The analytical model of the energy response function In the analytical approach, we introduce a PDF for the energy estimator under consideration and analytical expressions for its mean value and variance. This PDF describes the detector's energy response function to monoenergetic events and, in brief, it is mainly influenced by the number of scintillation and Cherenkov photons and effects due to the nonuniformity of the light collection. As already anticipated, we then transform the energy spectra of each species into the corresponding distributions of the energy estimators. Effects like the ionization quenching in the scintillator, the contribution of the Cherenkov light, the spatial dependence of the reconstructed energy and its resolution are accounted for through some parameters, part of which are fixed, while others are free to vary in the final fit.
We describe here the present model for N p which is derived from [18], with several improvements to extend the energy range of the fit to the entire LER. The same model describes the variables N dt 1ð2Þ p . All used energy estimators are obtained after normalising the corresponding measured values to a reference configuration of N tot ¼ 2000 channels (defined in Sec. II).
As energy response function for the entire LER, we use the scaled Poisson function fðN p Þ [and similarly fðN dt 1ð2Þ p Þ] already introduced for analyzing events in the lowest region of the energy spectrum and detailed in [24] and in [25] The two free parameters of this function, m and s, are fixed using the expressions for the mean valueN p ðEÞ and variance σ 2 p developed in the context of our model and described below: In order to obtainN p ðEÞ we first consider that the mean number of photoelectronsN pe ðEÞ for each event of energy E takes its main contribution from the scintillation photons with a subdominant correction from the Cherenkov light, and it can be written as follows where Y pe 0 is the photoelectron yield expressed in photoelectrons/MeV for events in the detector center; the quenching term, QðEÞ, accounts for nonlinearity of the scintillator response; F Ch ðEÞ, an analytical parametrization of Cherenkov light dependence on energy valid for electrons, provides the smooth transition between linear dependency at the energies above 1-2 MeV and zero contribution for electrons below the Cherenkov threshold E 0 ¼ 0.165 MeV; f Ch is a parameter allowing to adjust the relative weight of the scintillation and Cherenkov light. Table I reports details of the analytical expressions.
Similarly to that described in [18],N p ðEÞ, is linked tō N pe ðEÞ through where μ ¼N pe ðEÞ N tot , g C is a geometric correction factor, calculated for the given fiducial volume, and p t is the fraction of a single photoelectron signal below the electronics threshold. These expressions extend the ones previously used in [18] with the introduction of the f Ch and p t parameters.
The second ingredient of the analytical model is the variance σ 2 p of the N p energy estimator. It is described by 0.12 f Ch Fixed Relative weight of the scintillation and Cherenkov light; fixed by performing many analytical fits on data with it as a free/fixed parameter.
QðEÞ Fixed Quenching term summarizing the effects related to nonlinearity of the scintillator response according to Birk's quenching model [18]: is the Birk's constant, and Q(E) can be parametrized as: QðE; k B Þ ¼ A 1 þA 2 ln EþA 3 ln E 2 1þA 4 ln EþA 5 ln E 2 ; fixed from the fit of N pe vs E with MC simulation of γ calibration data.
Relative variance of the probability that a PMT triggers for events uniformly distributed in the detector volume, calculated using dedicated MC studies. It has some energy dependence and then we are using a value averaged over the LER. p the following expression which extends the model already described in [18] in particular with the modification of the term linear withN p ðEÞ and the addition of a quadratic one where v 1 is the relative variance of the PMT triggering probability for events uniformly distributed in the detector volume, p 1 ¼ 1 − e −μ is the probability of having a signal at any PMT, p 0 ¼ e −μ is the probability of absence of the signal, v 0 T accounts for the spatial nonuniformity of the number of triggered PMTs, v q T accounts for the nonuniformity of the light collection, v N is the intrinsic resolution parameter of the scintillator for βs that effectively includes other contributions at low energies, and the last term σ d describes the effects of the dark noise an of PMTs. The channel equalization factor f eq is the ratio between N tot and the actual number of working PMTs and it changes during the data taking period.
In summary, the cubic term takes into account the variance of the number of the triggered PMTs for the events with fixed collected charge in the IV. The quadratic term takes into account the variance of the light collection function over the detector, and is generally weaker compared to the cubic term (and was neglected in previous analyses with more uniform PMTs distribution).
Formula (12) was derived analytically and verified against the MC simulations. For α particles we are using a simplified form with only the first and cubic term of relation 15 since we need to model a single energy point ( 210 Po). It is thus not necessary to follow the energy dependence of the variance. The coefficient of the cubic term is called v α T and it corresponds to the width of the 210 Po-α peak. As anticipated, we use the previous relations also for describing the mean value and variance of the estimators N dt 1ð2Þ p . Most of the above listed parameters are tuned using data independent from the ones used in the solar neutrino fit, calibrations or MC and are fixed in the fit (QðEÞ, f Ch , p t , are left free to vary in the fit, together with the neutrino and background interaction rates. The two parameters p t , g C could be in principle free fit parameters, however they are fixed because the fit results have a low sensitivity to them.
In summary, the model has one free parameter describing the yield and three free parameters describing the energy resolution. Leaving the above listed parameters free gives the analytical fit the freedom to account for unexpected effects or unforeseen variations of the detector response in time. Table I reports all the parameters, free or fixed, appearing in the analytical fit with a short explanation about how they are obtained. In case of parameters kept free in the fit, we report in the table the values obtained fitting the present data set as described in this paper. The corresponding values of the ν interaction rates and background are reported in Sec. VII.

C. Handling of the energy variables in the fit
We perform the fit of the energy spectra with the experimental data binned as a function of the energy estimators instead of transforming that distribution into the energy scale. Among the reasons driving this choice we remark that the analytical approach does not assume a priori knowledge of the prec energy transformation rules and the energy scale is automatically adjusted while fitting the experimental data. The use of the transformed experimental spectra would significantly slow down the fitting procedure, as the data reprocessing will be needed each time the energy scale parameters are changed in the fit. In addition, the presence of the contributions from 14 C and 210 Po with very high statistics makes the fit sensitive to tiny details of the energy response function (response to the monoenergetic event with a fixed energy distributed uniformly in the detector's volume). The shape of the energy response for the detected number of p.e. (or the number of the triggered PMTs) in the sub-MeV energy region is defined mainly by the statistical factor, with small additional smearing due to the nonuniformity of the amount of the collected light throughout the detector. The study performed using MC model showed that the shape of the charge response can be approximated by the generalized gamma function, and the shape of the N p response can be approximated by the scaled Poisson function. But the energy response function in the energy scale does not allow a simple description with an analytical function, and thus complex calculations would be necessary if the transformed energy is used. In the MC approach the transformation to the energy scale is in principle feasible, because the energy scale and energy response in this approach are fixed from the calibrations, but it was not applied to keep internal consistency with the analytical approach.
Moreover, the amount of light emitted for a given energy deposit in the scintillator differs for the electrons, γs and α particles and then the energy scale calibrated for electrons is not valid for αs and γs. The experimental spectrum contains contributions from all these types of particle and the event-by-event identification of the type of interaction is not possible while the different contributions are statistically identified using the fit procedure. The binning of the data in the physical energy scale (as shown in the figures reporting the fit results) is performed only after the fit is completed.

VI. SENSITIVITY STUDIES
Sensitivity studies have been performed by generating many pseudoexperiments with the MC and fitting this simulated data using the same response functions adopted for fitting the real experimental data, using both analytical and MC procedures. The simulated data of the pseudoexperiments are obtained from a random sampling of PDFs produced with the full Borexino MC, including solar neutrino interaction rates as predicted by the HZ/LZ-SSM and with the rates of the different background components compatible with the final results presented  rates (cpd=100 ton) of solar ν and of the background species as they result from the MC fit of pseudo-experiments simulated with the same exposure as the experimental data discussed in this paper. The fit is performed in the entire LER region and, as in the real data analysis, penalty terms are added in the likelihood to constrain the values of the 14 C and pileup rates within the measured ones. It is interesting to note the correlation between the pp and 85 Kr rates, physically driven by the fact that a not negligible portion of the 85 Kr spectrum lies in the energy region around about 200 keV where we are sensitive to the pp νs signal. In the left plot, 6700 pseudoexperiments have been generated assuming the R CNO according to HZ-SSM and fitted imposing a constraint on R CNO to the same value. The same MC PDFs have been used to simulate and fit data, so these plots show only uncertainties due to statistical fluctuations and the effects of the correlations among the various components. The top right inset represents the results of the fit of 10000 pseudo-experiments fitted with the MC method while keeping the R CNO free but constraining the R pp =R pep ratio to (47.7 AE 0.8) (HZ-SSM [3,26]). Constraining R pp =R pep to the LZ-SSM prediction, 47.5 AE 0.8, gives consistent results. The study included all the background and neutrino species: here we only show those components that mostly influence the sensitivity to CNO neutrinos. in this work. As an example, Fig. 5 shows the distribution of the results of the MC fit of 6700 pseudoexperiments each one with the same exposure as the real data. In this particular example, by construction, the fit model perfectly matches the simulated data. The 1D distributions of the fit results, i.e., the rates R of different solar neutrino and background species, are Gaussian and do not show any significant biases with respect to the rates used as simulation inputs. The widths of these distributions show the expected statistical precision of the measurement of the corresponding component. The shapes of the analogous 2D distributions visualize the correlations among the different components. In particular, we underline that since the energy spectrum of the CNO neutrinos is quite similar to that of the 210 Bi internal contamination and the fit procedure cannot separate them, the sensitivity studies for all the pp-cycle neutrino and background components are performed by constraining the CNO rate. These results are depicted in the left portion of Fig. 5 with R CNO generated and constrained assuming, as an example, the HZ-SSM. The same constraint on R CNO is used in fitting the real data, as it will be reported below. Some additional significant correlations are present among some of the various species, as the figure is showing. This is one of the reasons why the best accuracy in the determination of the interaction rates of solar neutrinos is obtained by fitting the entire energy spectrum, as in the present analysis, thus best using all the available information about details of the entire spectral shapes, instead of choosing partial energy regions.
The top right inset in Fig. 5 demonstrates the sensitivity of the present data set to CNO neutrinos. In this case, no constraint on R CNO is applied, but, to decrease the effect of the degeneracy of the spectral shapes, a constraint on the ratio between R pp and R pep , as expected from the SSM, is applied. It is interesting to note the strong anticorrelation between the 210 Bi and CNO components which is originated by the above discussed similarities of their energy spectra.
Finally, Fig. 6 is obtained removing all the constraints on the CNO and pep components and clearly shows that the strong correlations (and anticorrelations) among R CNO , R pep , and the 210 Bi decay rate significantly limit the possibility to determine all the three species at the same time.
Similar MC studies have been performed to quantify the systematic uncertainty associated to the fit models, by generating MC data with a response function modified with respect to the one used in the fit (see next section). Finally, pseudo-experiments MC data have been used to obtain the distribution of the likelihood functions and thus evaluate the p-values of our results.

VII. RESULTS
The interaction rates R pp , R Be , R pep are obtained from the fit together with the decay rates of 85 Kr, 210 Po, 210 Bi, 11 C internal backgrounds, and the external backgrounds rates ( 208 Tl, 214 Bi, and 40 K γ rays).
In the MC approach, the MC-based pile-up spectrum [16] is included in the fit with a constraint of (137.5 AE 2.8 cpd=100 ton) on the 14 C − 14 C contribution based on an independent measurement of the 14 C rate [12]. In the analytical approach, pile-up is taken into account with the convolution of each spectral component with the solicitedtrigger spectrum [12]. Alternatively, the analytical fit uses a synthetic pile-up spectrum [12] built directly from data. The differences between these methods are quoted in the systematic error (see Table IV).
In order to break the degeneracy between the 210 Bi and the CNO ν spectral shapes, we constrain the CNO ν interaction rate to the HZ-SSM predictions, including MSW-LMA oscillations (4.92 AE 0.56 cpd=100 ton) [3,26] as anticipated in Sec. VI. The analysis is repeated constraining the CNO ν rate to the LZ-SSM predictions (3.52 AE 0.37 cpd=100 ton) and in case of difference, the two results are quoted separately. The contribution of 8 B νs is small and its rate was constrained to the value obtained from the HER analysis [11]. We only show here the correlation between pep, CNO, 11 C and 210 Bi, but the study included all the spectral components. The significant correlations among these species forbid to measure at the same time R CNO and R pep and to determine the 210 Bi decay rate. As described in the text, we have constrained the CNO rate to get the pep one and set a constraint on the ratio R pp =R pep to obtain a limit on the CNO flux.
The interaction rates of solar neutrinos and the decay rates of background species, obtained by averaging the results of the analytical and MC approaches, are summarized in Tables II and III, respectively.
An example of the multivariate fit (with the MC approach) is shown in Fig. 7 (TFC-subtracted and TFCtagged energy spectra) and in Fig. 8 (radial distribution and PS-L PR pulse-shape distribution). The details of the fit at low energies (between ∼230 and 830 keV) can be appreciated in Fig. 9. In this example, obtained with the analytical fit procedure, the pile-up is not present as a separate fit component, since it is taken into account with the convolution method mentioned above.
To recognize the pep ν contribution to the measured electron-recoil spectrum, the TFC-subtracted spectrum, zoomed into the highest energy region (between 800 and 2700 keV), is shown after applying stringent selection cuts on the radial distribution (R < 2.4 m) and on the pulse-shape variable distribution (PS-L PR < 4.8) (see Fig. 10): the CNO and pep neutrino interactions are clearly visible between 1250 and 1500 keV, and the spectrum is consistent with the Compton-like shoulder expected from the pep line.
An extensive study of the systematic errors has been performed and the results are summarized in Table IV.
Differences between the results of the analytical and the MC fits are quoted as systematic errors. Further systematic uncertainties associated with the fitting procedure were studied by performing the fit in many different configurations by generating simulated data using a family of response functions whose parameters has been varied within calibration accuracy with respect to the nominal response function and by varying the energy estimator, the number and width of the bins, as well as the fit range).
Systematic uncertainties related to the fit models were evaluated using the method described in Sec. VI. Ensembles of pseudoexperiments were generated from a family of PDFs based on the full MC simulations and fitted using both the MC and analytical methods. PDFs including deformations due to possible inaccuracies in the modeling of the detector response (energy scale, uniformity of the energy response, shape of PS-L PR ) and uncertainties in the theoretical energy spectra ( 210 Bi) were considered. The magnitude of the deformation was chosen to be within the range allowed by the available calibration data.
In an additional systematic study, the fit was repeated taking into account the upper limit on the 85 Kr decay rate following the procedure described in [18], which exploits the 85 Kr − 85m Rb delayed coincidences ( 85 Kr rate < 7.5 cpd=100 ton at 95% C.L.).
The last three lines of Table IV list the uncertainties associated with the determination of the exposure. The one about the fiducial volume is one of the dominant. Its value is the same as quoted in [5] and it is estimated using calibration sources of known positions.
Fully consistent results are obtained when adopting a larger fiducial volume (R < 3.02 m, jzj < 1.67 m). This FV contains more external background (critical for the  384 keV), pep, and CNO solar νs: interaction rates and fluxes inferred assuming the MSW-LMA oscillation parameters [26]. The first error is the statistical derived by profiling the likelihood under Wilks' approximation. The interval extracted is consistent with the expectation from the MC sensitivity study. The second error is the systematic uncertainty. Different contributions to the systematic error are detailed in Table IV. The result on pep νs depends on whether we assume HZ-SSM or LZ-SSM metallicity for CNO νs. The remaining columns show the theoretical interaction rates and fluxes predicted by the standard solar model under the high and low metallicity assumptions [3].   The analysis has been performed using N h as energy estimator and the transformation to keV-energy scale was performed only for the plotting purposes. The residuals are calculated in every bin as the difference between the data counts and the fit result, divided by the square root of the data counts.
Finally, the analytical fit performed on a restricted energy range (not sensitive to pp neutrinos) using the N pe energy estimator gives consistent results (within 2σ) for R Be and R pep .
The 7 Be solar ν flux listed in Table II is the sum of the two monoenergetic lines at 384 and 862 keV. It corresponds to a rate for the 862 keV line of 46.3 AE 1.1 þ0.4 −0.7 cpd=100 ton, fully compatible with the Borexino Phase-I measurement [5]. The 7 Be solar ν flux is determined with a total uncertainty of 2.7%, which represents a factor of 1.8 improvement with respect to our previous result [5] and is two times smaller than the theoretical uncertainty.
The present value of R pp is consistent with our previous result and the uncertainty is reduced by about 7.9%.
The correlation between the CNO and pep ν is broken by constraining the R CNO in the fit. The values of R Be and R pp are not affected by the hypothesis on CNO νs within our sensitivity. However, R pep depends on it, being 0.22 cpd=100 ton higher if the LZ hypothesis is assumed (see Table II).
The Δχ 2 profile obtained by marginalizing the pep rate is shown in Fig. 11 (left) for both the HZ and LZ assumptions on CNO ν rate. Both curves are symmetric and allow us to establish, for the first time, that the absence of the pep reaction in the Sun is rejected at more than 5σ.
As anticipated, the similarity between the e − recoil spectrum induced by the CNO neutrinos and the 210 Bi spectrum makes it impossible to disentangle the two contributions with the spectral fit without an external constraint on the 210 Bi rate. For this reason, we can only provide an upper limit on the CNO neutrinos interaction rate R CNO . In order to do so, we need further to break the correlation between the CNO and pep contributions. In Phase-I, this was achieved by fixing the pep ν rate to the theoretical value [13]. In the current analysis, where pp νs are included in the extended energy range of the fit, we place an indirect constraint on pep νs by exploiting the theoretically well-known pp and pep flux ratio. The interaction rate ratio R pp =R pep , is constrained to (47.7 AE 0.8) (HZ) [3,26]. Constraining R pp =R pep to the LZ hypothesis value 47.5 AE 0.8 gives identical results.
We carried out a sensitivity study by performing the analysis on thousands of data sets simulated with the MC sensitivity tool: this study shows that under the current FIG. 10. TFC-subtracted energy spectrum zoomed between 800 keV and 2700 keV after applying stringent selection cuts on the radial distribution (R < 2.4 m) and on the pulse-shape variable distribution (PS-L PR < 4.8) to better see features due to pep νs interactions. The residuals (bottom plot) are the ratio between the data and the fit model. experimental conditions the total expected uncertainty (statistical plus systematical) is 3.4 cpd=100 ton. With this error, we expect the median 95% C.L. upper limit for R CNO to be ∼9 cpd=100 ton and 10 cpd=100 ton, for low and high metallicity, respectively. On data, we obtain the upper limit on R CNO ¼ 8.1 cpd=100 ton (95% C.L.) (see Table II), which is slightly stronger than the median limit expected from the MC based sensitivity study. The Δχ 2 profile for the CNO rate is shown in Fig. 11 (bottom). This result, using a weaker hypothesis on pep ν, confirms the current best limit on the flux of CNO νs previously obtained with Borexino Phase-I data [13].

VIII. CONCLUSIONS
In summary, we have reported the details of the analysis and the results of the first simultaneous measurement of the pp, 7 Be, and pep components of the solar neutrino spectrum providing a comprehensive investigation of the main pp chain in the Sun [10]. These results are in agreement with and improve the precision of our previous measurements. In particular, R Be is measured with an unprecedented precision of 2.7%. The absence of pep neutrinos is rejected for the first time at more than 5σ. These data, together with our measurement about 8 B ν flux in the HER [11], provide a unique measurement of the interaction rates and thus of the fluxes of the different components of the solar neutrinos from the pp chain with a single detector and a unified analysis approach.
The upper limit on R CNO has the same significance as that of Borexino Phase-I and currently is providing the tightest bound on this component.
Several analysis methods and details here reported and discussed have a general interest which is going beyond the understanding of the Borexino results: as example the 11 C suppression, the multivariate fit, the analytical model of the energy response, the full MC description of the detector and the fitting procedures can be easily adapted to large volume liquid scintillator based detectors similar to Borexino [27,28].