UvA-DARE (Digital Academic Repository) Neutrino interaction classification with a convolutional neural network in the DUNE far detector

The Deep Underground Neutrino Experiment is a next-generation neutrino oscillation experiment that aims to measure CP-violation in the neutrino sector as part of a wider physics program. A deep learning approach based on a convolutional neural network has been developed to provide highly efficient and pure selections of electron neutrino and muon neutrino charged-current interactions. The electron neutrino (antineutrino) selection efficiency peaks at 90% (94%) and exceeds 85% (90%) for reconstructed neutrino energies between 2-5 GeV. The muon neutrino (antineutrino) event selection is found to have a maximum efficiency of 96% (97%) and exceeds 90% (95%) efficiency for reconstructed neutrino energies above 2 GeV. When considering all electron neutrino and antineutrino interactions as signal, a selection purity of 90% is achieved. These event selections are critical to maximize the sensitivity of the experiment to CP-violating effects.

I. INTRODUCTION TO DUNE
Over the last twenty years neutrino oscillations [1,2] have become well-established [3-10]. The neutrino oscillation formalism describes observed data with six fundamental parameters. These are three angles describing the rotation between the neutrino mass and flavor eigenstates, two mass splittings (differences between the squared masses of the neutrino mass states), and the CP-violating phase, $\delta_{CP}$. If $\sin(\delta_{CP})$ is nonzero then the vacuum oscillation probabilities of neutrinos and antineutrinos will be different. DUNE [11] is a next-generation neutrino oscillation experiment with a primary scientific goal of making precise measurements of the parameters governing long-baseline neutrino oscillation. A particular priority is the observation of CP-violation in the neutrino sector. In DUNE, a muon neutrino ($\nu_\mu$)- or muon antineutrino ($\bar{\nu}_\mu$)-dominated beam will be produced by the Long-Baseline Neutrino Facility (LBNF) beam line and characterized by a near detector (ND) at Fermilab before the neutrinos travel 1285 km to the Sanford Underground Research Facility (SURF). The far detector (FD) will consist of four 10 kt (fiducial) liquid argon time projection chamber (LArTPC) detectors. Oscillation probabilities are inferred from a comparison of the observed neutrino spectra at the near and far detectors, which is used to constrain the values of the neutrino oscillation parameters.

A. CP-violation measurement
Symmetries under charge conjugation and parity inversion are both maximally violated by the weak interaction. Their combined operation has been shown to be violated, to a small degree, by quark mixing processes [12,13]. The neutrino oscillation formalism allows for an analogous process in lepton flavor mixing which can be measured with neutrino oscillations. DUNE is sensitive to four neutrino oscillation parameters, namely $\Delta m^2_{31}$, $\theta_{23}$, $\theta_{13}$ and $\delta_{CP}$, which can be measured using four data samples: two for neutrinos and two for antineutrinos. Two beam configurations with opposite polarities of the magnetic focusing horns are used to produce these samples: the "forward horn current" (FHC) mode produces a predominantly $\nu_\mu$ beam while a primarily $\bar{\nu}_\mu$ beam is produced in the "reverse horn current" (RHC) mode. The FD data used in the oscillation analysis measure the "disappearance" channels (i.e., $\nu_\mu \to \nu_\mu$ and $\bar{\nu}_\mu \to \bar{\nu}_\mu$), which are primarily sensitive to $|\Delta m^2_{31}|$ and $\sin^2 2\theta_{23}$, and the "appearance" channels (i.e., $\nu_\mu \to \nu_e$ and $\bar{\nu}_\mu \to \bar{\nu}_e$), which are sensitive to all four parameters, including the sign of $\Delta m^2_{31}$. In all of these samples, interactions where the neutrinos scatter via charged-current (CC) exchange off the nuclei in the far detector are selected. In a CC interaction, the final state includes a charged lepton with the same flavor as the incoming neutrino and one or more hadrons, depending on the details of the interaction. Therefore, a critical aspect of event selection is the ability to identify the flavor of the final-state lepton. It is thus key to identify the signal (i.e., CC $\nu_\mu$, CC $\bar{\nu}_\mu$, CC $\nu_e$ and CC $\bar{\nu}_e$) interactions efficiently and to reject background events powerfully. At the energies relevant to the DUNE oscillation analysis, a final-state muon produces a long, straight track in the detector, while a final-state electron produces an electromagnetic (EM) shower.
Examples of signal CC ν e and CC ν μ interactions are shown in Figs. 2 and 3(a), respectively.
The main backgrounds to the CC $\nu_\mu$ and CC $\bar{\nu}_\mu$ event selections are neutral current (NC) interactions with charged pions ($\pi^\pm$) in the final state that can mimic the $\mu^\pm$, an example of which is shown in Fig. 3(b). Neutral current interactions with a final-state $\pi^0$ meson, such as the one shown in Fig. 3(c), where the photons from $\pi^0$ decay may mimic the EM shower from an electron, form the primary reducible background to the CC $\nu_e$ and CC $\bar{\nu}_e$ event selections. A small fraction of electron neutrinos are intrinsic to the beam (and thus are not the result of neutrino oscillations). These events form a background for the oscillation analysis as they are indistinguishable from CC $\nu_e$ appearance events. Once the four samples have been selected and the neutrino energy has been reconstructed, a fit is performed to the reconstructed neutrino energy distributions in the four samples to extract the neutrino oscillation parameters $\theta_{13}$, $\theta_{23}$, $\Delta m^2_{31}$, and $\delta_{CP}$. This fit accounts for the effects of systematic uncertainties, including the constraints on those uncertainties from fits to ND data. Figure 1 shows the appearance samples and how they are expected to vary with the true value of $\delta_{CP}$, for a data collection period of 3.5 yr staged running in both FHC and RHC beam modes. The staging plan assumes two FD modules are ready at the start of the beam data taking, and modules three and four become operational after one year and two years, respectively. Full details of the DUNE staging plan and the oscillation analysis, including the assumed oscillation parameters, are provided in Ref. [14].

B. DUNE far detector
Neutrinos are detected via their interaction products, i.e., by observing the leptons and hadrons that are produced when the neutrinos interact in the detector. In the single-phase LArTPC design that will be used for the first DUNE FD module, three wire read-out planes collect the ionization charge that is generated when charged particles traverse the liquid argon volume. The ionization charge drifts in a constant electric field to the read-out planes, and the drift time provides a third dimension of position information, giving rise to the name "time projection chamber." The position of the charge observed in each of the three planes is combined with the drift time to create three views of each neutrino interaction. The wires that form the planes are separated by approximately 5 mm, giving the FD a fine-grained sampling of the neutrino interaction products. The electronic signals from the wires are sampled at a rate of 2 MHz, giving a similar effective spatial resolution in the time direction. Two of the wire planes are induction planes, biased to be transparent to the drifting electrons, such that the electrons induce a net-zero fluctuation in the wire current as they pass the wire plane. The third view is called the collection plane as it actually collects the drifting electrons. The four DUNE FD modules may not all have identical designs, but they will all produce similar images of the neutrino interactions, so the performance of the single-phase design is used throughout this article. Other potential designs must have at least the same sampling capabilities as the single-phase design, if not better, to be considered.

C. DUNE simulation and reconstruction
Neutrino interactions in the far detector are simulated within the LArSoft [15] framework, using the neutrino flux from a GEANT4-based [16] simulation of the LBNF beam line, the GENIE [17] neutrino interaction generator (version 2.12.10), and a GEANT4-based (version 10.3.01) detector simulation. Detector response to, and read-out of, the ionization charge is also simulated in LArSoft. Raw detector waveforms are processed to remove the impact of the electric field and electronics response; this process is referred to as "deconvolution" and the resulting deconvolved waveforms contain calibrated charge information. Current fluctuations in the wires above threshold, or "hits," are parametrized by Gaussian functions fit to deconvolved waveforms around local maxima. A reconstruction algorithm is used to cluster hits linked in space and time into groups associated with a particular physical object, such as a track or shower. More details of the DUNE simulation and reconstruction are available in Ref. [11].
The energy of the incoming neutrino in CC events is estimated by a dedicated algorithm that adds the reconstructed lepton and hadronic energies, using particles reconstructed by Pandora [18,19]. Pandora uses a multialgorithm approach to reconstruct all the visible particles produced in neutrino interactions. It provides a hierarchy of reconstructed particles, representing particles produced at the interaction vertex and their decays or subsequent interactions. If the event is selected as CC $\nu_\mu$, the neutrino energy is estimated as the sum of the energy of the longest reconstructed track and the hadronic energy, where the energy of the longest reconstructed track is estimated from its range if the track is contained in the detector and from multiple Coulomb scattering if the track exits the detector. The hadronic energy is estimated from the energy associated with reconstructed hits that are not in the longest track. If the event is selected as CC $\nu_e$, the energy of the neutrino is estimated as the sum of the energy of the reconstructed shower with the highest energy and the hadronic energy. In all cases, simulation-based corrections for missing energy (due to undetected particles, reconstruction errors, etc.) are applied.

II. CVN NEUTRINO INTERACTION CLASSIFIER
The DUNE convolutional visual network (CVN) classifies neutrino interactions in the DUNE FD through image recognition techniques. In general terms it is a convolutional neural network (CNN) [20]. The main feature of CNNs is that they apply a series of filters (using convolutions, hence the name of the CNN) to the images to extract features that allow the CNN to classify the images [21]. Each of the filters, also known as kernels, consists of a set of values that are learnt by the CNN through the training process. CNNs are typically deep neural networks that consist of many convolutional layers, with the output from one convolutional layer forming the input to the next. Similar techniques have been demonstrated to outperform traditional event reconstruction-based methods to classify neutrino interactions [22,23]. Convolutional neural networks make use of learned kernel operations, usually followed by spatial pooling, applied in sequence to extract increasingly powerful and abstract features. In domains such as natural image analysis where important features of the data are locally spatially correlated they now greatly outperform previous state-of-the-art techniques that relied on manual feature extraction and simpler machine learning methods [24-27]. Recently they have proven to also be appropriate for the analysis of signals in particle physics detectors [28-30]. They have found particular success in neutrino experiments where signals can arrive at any location in large uniform detector volumes [22,23,31,32], and the characteristic translational invariance of CNN methods represents an advantage rather than a challenge.

FIG. 1. Reconstructed energy distribution of $\nu_e$ and $\bar{\nu}_e$ CC-like events selected by the convolutional neural network algorithm (CVN) assuming 3.5 yr (staged) running in the neutrino-beam mode (a) and antineutrino-beam mode (b), for a total of seven years (staged) exposure. The plots assume normal mass ordering and include curves for $\delta_{CP} = -\pi/2$, 0, and $\pi/2$. Background from $\nu_\mu$-CC, $\nu_\tau$-CC, intrinsic $\nu_e$-CC, and NC interactions are shown as stacked, filled histograms. The figure is reproduced from Ref. [14].
A. Inputs to the CVN
Figure 4 shows that there are three inputs to the CVN. The three inputs are 500 × 500 pixel images of simulated neutrino interactions with one image produced for each of the three read-out views of the LArTPC. The images are produced at the hit-level stage of the reconstruction algorithms and are hence independent of any potential errors in high-level reconstruction such as clustering, track-finding and shower reconstruction. The images are produced in (wire number, time) coordinates, where the wire number is simply the wire on which the reconstructed hit was detected, and the time is the interval from when the interaction happened to when the hit was detected on that wire (given by the peak time of the hit). The color of the pixel gives the hit charge, where white shows that no hit was recorded for that pixel. Each pixel represents approximately 5 mm in the wire coordinate due to the spatial separation of the wires in the read-out plane, and the time coordinate is down-sampled to approximately correspond to the same 5 mm size after consideration of the electron drift velocity within the LArTPC.
Convolutional neural networks operate on fixed-size images; hence the neutrino interaction images must all be of a fixed size. To facilitate this, interactions that span more than 500 wires in a given view are cropped to fit in 500 × 500 pixel images. The steps below are used to find the 500 pixels in the wire coordinate:
(1) Integrate the charge on each wire.
(2) Scan from low wire number, where a low wire number corresponds to the upstream end of the detector, to a high wire number and check the following 20 wires for recorded signals. If fewer than five of the 20 subsequent wires have no signals then this wire is chosen as the first column of the image.
(3) If no wire satisfies the requirement in step 2, choose the continuous 500 wire range that contains the most deposited charge.
For the time axis, a window of 3200 μs centered on the mean time of the hits is formed and divided into 500 bins that fill the 500 pixels. As such, no analogous region-of-interest search is performed.
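The cropping steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the DUNE implementation: the function names, the treatment of "signal" as nonzero integrated charge, and the clamping of the window to the detector edge are assumptions made for the example.

```python
import numpy as np

def select_wire_window(charge_per_wire, width=500, lookahead=20, max_empty=5):
    """Return (first, last_exclusive) wire indices for the image columns.

    charge_per_wire: integrated charge on each wire (step 1), ordered from
    upstream (low wire number) to downstream.
    """
    charge = np.asarray(charge_per_wire, dtype=float)
    n = charge.size
    if n <= width:
        return 0, n  # nothing to crop
    has_signal = charge > 0  # assumption: "signal" means nonzero charge

    # Step 2: scan upstream to downstream; accept the first wire whose
    # following `lookahead` wires contain fewer than `max_empty` empty wires.
    for w in range(n - lookahead):
        following = has_signal[w + 1 : w + 1 + lookahead]
        n_empty = lookahead - int(following.sum())
        if n_empty < max_empty:
            start = min(w, n - width)  # assumption: clamp to the detector edge
            return start, start + width

    # Step 3: fall back to the contiguous window holding the most charge.
    csum = np.concatenate(([0.0], np.cumsum(charge)))
    window_charge = csum[width:] - csum[:-width]
    start = int(np.argmax(window_charge))
    return start, start + width

def time_bins(hit_times_us, window_us=3200.0, n_bins=500):
    """Map hit times to pixel rows in a window centered on the mean hit time."""
    t = np.asarray(hit_times_us, dtype=float)
    t0 = t.mean() - window_us / 2.0
    bins = np.floor((t - t0) / (window_us / n_bins)).astype(int)
    inside = (bins >= 0) & (bins < n_bins)  # hits outside the window are dropped
    return bins, inside
```

With a dense block of hits, the scan in step 2 starts the image a few wires upstream of the block; with only sparse hits, step 3 picks the window around the largest charge deposit.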
In order to ensure high quality images of the interactions, images were only produced for events that have their true neutrino interaction vertex within the detector fiducial volume described in Ref. [14]. Once the images have been produced, any events that contain any view with fewer than ten nonzero pixels are removed in order to discount empty and almost empty images from the training and testing data sets. Figure 2 shows a signal CC $\nu_e$ event as seen in the three detector read-out views. Figure 3(a) shows a signal CC $\nu_\mu$ interaction, and example NC background images containing a long $\pi^\pm$ track and a $\pi^0$ are given in Figs. 3(b) and 3(c), respectively. The number of pixels in the images was chosen to maximize the size of the image whilst ensuring that the memory usage during training and inference of the network was manageable. The spatial dimension of the images covers 2.5 m (500 pixels of approximately 5 mm each), meaning any tracks with projected lengths in the read-out planes above 2.5 m will not be fully contained within the image, as is the case for the majority of muon tracks, including the one shown in Fig. 3(a). However, the key details for the neutrino interaction classification come from the region surrounding the vertex, so this choice of image size does not significantly impact the classification performance.

B. Network architecture
A simple overview of the architecture is shown in Fig. 4. The detailed architecture of the CVN is based on the 34-layer version of the SE-ResNet architecture, which consists of a standard ResNet (residual neural network) architecture [33,34] along with squeeze-and-excitation blocks [35]. Residual neural networks allow the $n$th layer access to the output of both the $(n-1)$th layer and the $(n-k)$th layer via a residual connection, where $k$ is a positive integer $\geq 2$. This is an important feature for the DUNE CVN as it allows the fine-grained detail of a LArTPC encoded in the input images to be propagated further into the CVN than would be possible using a traditional CNN such as the GoogLeNet (also called Inception v1) [36] inspired network used by NOvA [22].
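The idea of a residual connection can be illustrated with a minimal NumPy sketch for the case $k = 2$. Dense (matrix-multiply) layers stand in for convolutions to keep the example short, and the function and weight names are hypothetical; this is not the CVN code and it omits the squeeze-and-excitation blocks.

```python
import numpy as np

def relu(x):
    """Rectified linear unit applied elementwise."""
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Two-layer block whose output also sees the block input (k = 2).

    x  : input vector, i.e. the output of layer n-2
    w1 : weights of layer n-1 (square matrix, so shapes match for the sum)
    w2 : weights of layer n
    """
    hidden = relu(x @ w1)          # layer n-1
    transformed = hidden @ w2      # layer n, before the skip connection
    return relu(transformed + x)   # residual connection adds the block input
```

If the second layer contributes nothing (all-zero `w2`), the block simply passes `relu(x)` through, which is why detail in the input can survive many such blocks.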
The DUNE CVN differs from the architectures of other residual networks discussed in the literature [33,34] in the following ways:
(i) The input and the shallower layers of the CVN are forked into three branches, one for each view, to let the model learn parameters from each individual view (see Sec. II A for more details). The outputs of the three branches are merged together by using a concatenation layer that works as input for the deeper layers of the model, as shown in Fig. 4.
(ii) The CVN returns scores for each event through seven individual outputs (see Sec. II C and Fig. 4 for more details). Since the deeper layers of the CVN contain the model parameters that are simultaneously in charge of the classification for the different outputs of the network, some outputs might take advantage of the learning process of other outputs to improve their performance. Also, a multioutput network lets us weight the outputs in order to make the network pay more attention to some specific outputs (see Sec. II D for more details).
(iii) Each of the three branches (blocks 1-2, the shallower layers of the architecture shown in Fig. 4) consists of seven convolutional layers, while the deeper layers (blocks 3-N in Fig. 4) consist of 29 convolutional layers, making a total of 3 × 7 + 29 = 50 convolutional layers for the entire network.

C. Outputs from the CVN
As shown on the right of Fig. 4, there are seven outputs from the CVN, each consisting of a number of neurons with values $v_i$ for $i = 1, \dots, n$, where $n$ is the number of neurons. The sum of neuron values for each output (except for the last output since it consists of a single neuron) is given by $\sum_{i=1}^{n} v_i = 1$ such that each value of a neuron within a single output gives a fractional score that can be used to classify images.
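One common way to obtain neuron values that sum to one is a softmax over the raw neuron activations, with a sigmoid for a single-neuron output. The text does not name the activation functions, so the sketch below is an assumption used only to illustrate the normalization property.

```python
import numpy as np

def softmax(z):
    """Turn raw activations into scores that are positive and sum to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the maximum for numerical stability
    return e / e.sum()

def sigmoid(z):
    """Score in (0, 1) for a single-neuron (binary) output."""
    return 1.0 / (1.0 + np.exp(-z))

# Example: four raw activations for a four-neuron output.
scores = softmax([2.0, 1.0, 0.1, -1.0])
```

Each entry of `scores` can then be read as the fractional score for the corresponding class.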
The first output, which has four neurons to classify the flavor of the neutrino interaction, is the primary output, and it is the only one used in the oscillation sensitivity analysis presented in Refs. [11,14]. The other outputs are included in the architecture for potential use in future analyses.
(1) The first output (four neurons) returns scores for each event to be one of the following flavors: CC $\nu_\mu$, CC $\nu_e$, CC $\nu_\tau$ and NC. This is the primary output of the network used for the main goal of neutrino interaction flavor classification.
(2) The second output returns scores for the interaction type of each event.
(3) The third output returns scores for the number of protons in each event.
(4) The fourth output returns scores for the number of charged pions in each event.
(5) The fifth output (four neurons) returns scores for each event to contain the following number of neutral pions: 0, 1, 2, > 2.
(6) The sixth output (four neurons) returns scores for each event to contain the following number of neutrons: 0, 1, 2, > 2.
(7) The seventh output (one neuron) returns the score for each event to be a neutrino as opposed to an antineutrino.
Outputs 2, 6 and 7 are not considered in the analyses presented here and are hence not further discussed, but they are included in the training and the overall loss calculations. The prediction of an event as a given underlying (anti)neutrino interaction type is highly model-dependent and not as important as the number of final-state particles that can be observed in the detector; hence output 2 is not used. The neutron counting is very difficult since it is hard to define whether a neutron interaction would be visible and identifiable in the detector, so this output will not be used until it has been shown to work reliably. Finally, the antineutrino vs neutrino output is not likely to provide highly efficient or pure event selections since the event observables that could differentiate neutrinos from antineutrinos depend only weakly on which of the two interacted.

D. Training the CVN
The CVN was trained using Python 3.5.2 and Keras 2.2.4 [37] on top of Tensorflow 1.12.0 [38], on eight NVIDIA Tesla V100 GPUs. Stochastic gradient descent (SGD) is used as the optimizer, with a minibatch size of 64 events (192 views), a learning rate of 0.1 (divided by 10 when the error plateaus, as suggested in [33]), a weight decay of 0.0001, and a momentum of 0.9. The network was trained/validated/tested on 3,212,351 events (9,637,053 images/views), consisting of 27% CC $\nu_\mu$, 27% CC $\nu_e$, 6% CC $\nu_\tau$ and 40% NC, from a single Monte Carlo sample as follows: training (∼98%), validation (∼1%) and test (∼1%). The sample of events is an MC prediction for the DUNE unoscillated FD neutrino event rate (flux times cross section) distribution in FHC beam mode as described in Ref. [11]. Samples where the input fluxes to the MC are "fully oscillated" (i.e., all $\nu_\mu$ are replaced with $\nu_e$, or all $\nu_\mu$ are replaced with $\nu_\tau$) are also used (these samples are usually weighted by oscillation probabilities and combined to produce oscillated FD event rate predictions). Analogous versions of each input sample are used for the RHC beam mode. For training purposes all CC $\nu_e$ events were considered signal since the intrinsic beam $\nu_e$ are indistinguishable from signal (appearance) $\nu_e$ at any given energy. The results presented in the following sections use a statistically independent Monte Carlo sample.
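The learning-rate rule quoted above (start at 0.1 and divide by 10 when the error plateaus) can be sketched in plain Python. How a plateau was detected is not stated, so the `patience` and `min_delta` parameters and the function name are hypothetical choices for illustration only.

```python
def step_learning_rate(lr, val_errors, patience=3, factor=0.1, min_delta=1e-4):
    """Return the learning rate to use for the next epoch.

    lr         : current learning rate
    val_errors : validation error after each completed epoch, oldest first
    patience   : number of recent epochs inspected for improvement (assumed)
    factor     : multiplicative reduction, 0.1 means "divide by 10"
    min_delta  : smallest decrease that counts as an improvement (assumed)
    """
    if len(val_errors) <= patience:
        return lr  # not enough history to declare a plateau
    best_before = min(val_errors[:-patience])
    best_recent = min(val_errors[-patience:])
    if best_recent > best_before - min_delta:
        return lr * factor  # no meaningful improvement: reduce the rate
    return lr
```

Called once per epoch, this keeps the rate at 0.1 while the validation error keeps falling and drops it to 0.01 once the error stops improving.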
The individual loss functions for the different outputs that were used for training the model, as well as the overall loss function, are given below:
(i) Neutrino flavor ID, interaction type, proton count, charged pion count, neutral pion count, and neutron count loss functions ($J_1$, $J_2$, $J_3$, $J_4$, $J_5$, and $J_6$, respectively): categorical cross-entropy, the loss function needed for multiclass classification.
(ii) Neutrino/antineutrino ID loss function ($J_7$): binary cross-entropy, the loss function needed for binary classification, in which event $i$ with true label $y^{(i)}$ and predicted score $\hat{y}^{(i)}$ contributes the term
$$-\left[\, y^{(i)} \log\left(\hat{y}^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right].$$
(iii) Overall loss function, which combines the individual losses $J_1$ through $J_7$.
The CVN was trained for 15 epochs over ∼4.5 days (7 h per epoch), and similar classification performance was obtained for the training and test samples. Figure 5 shows the loss and accuracy training and validation results for the four main CVN outputs, where accuracy is defined as the fraction of events correctly classified for a given output. The red vertical lines show the epoch at which the CVN weights were taken for the model used in the presented analysis. After that epoch, the validation accuracy remains constant and small signs of overtraining begin to emerge (a small divergence of the training and validation accuracy curves). The relatively small difference between training and validation seen at epoch 10 has a negligible effect.
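The loss functions named in this section can be sketched with NumPy using their standard textbook definitions, averaged over the events in a batch. The combination of the seven losses is written here as a weighted sum; that form and any weight values passed to it are assumptions for illustration, not the values used for the CVN.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)

def categorical_cross_entropy(y_true, y_pred):
    """Mean categorical cross-entropy.

    y_true: one-hot labels, shape (events, classes)
    y_pred: predicted scores, same shape, each row summing to 1
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), EPS, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

def binary_cross_entropy(y_true, y_pred):
    """Mean binary cross-entropy for labels in {0, 1} and scores in (0, 1)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_pred, dtype=float), EPS, 1.0 - EPS)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def overall_loss(losses, weights):
    """Weighted sum of the per-output losses (assumed form, hypothetical weights)."""
    return float(np.dot(np.asarray(weights, dtype=float),
                        np.asarray(losses, dtype=float)))
```

For example, a four-class output that assigns a score of 0.25 to every class has a categorical cross-entropy of log 4, and a binary output stuck at 0.5 has a binary cross-entropy of log 2.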

E. Feature maps
To study how the CVN is classifying the interactions it is advisable to look at feature maps at different points in the network architecture. An example is shown in Fig. 6 for a CCν e interaction, demonstrating the position from which two sets of feature maps are viewed within the network. The set of images in the top right shows the response of the filters in the first convolutional layer to the input electromagnetic shower image, where red shows a high response to a given filter, and yellow shows a low response. Across the different particle types and event topologies, the filters respond to different components in the images. The 512 feature maps from the final convolutional layer, shown at the bottom of Fig. 6 for the aforementioned CCν e interaction, are much more abstract in appearance since the input images have passed through many convolutions and have hence effectively been down-sampled to a size of 16 × 16 pixels from their original 500 × 500 pixel size.

III. NEUTRINO FLAVOR IDENTIFICATION PERFORMANCE
The primary goal of the CVN is to accurately identify CC $\nu_e$, CC $\bar{\nu}_e$, CC $\nu_\mu$ and CC $\bar{\nu}_\mu$ interactions for the selection of the samples required for the neutrino oscillation analysis. The values of the neurons in the flavor output give the score for each neutrino interaction to be one of the neutrino flavors. The CVN CC $\nu_e$ score distribution, $P(\nu_e)$, is shown for the FHC beam mode (left) and RHC beam mode (right) in Fig. 7 for all interactions with a reconstructed event vertex within the FD fiducial volume, as described in Ref. [14]. The contributions from neutrino and antineutrino components for each flavor are combined since the detector cannot easily distinguish between them. Very clear separation is seen between the signal (CC $\nu_e$ and CC $\bar{\nu}_e$) interactions and the background interactions including those from NC $\nu$ and NC $\bar{\nu}$ events. The beam CC $\nu_e$ background is seen to peak in the same way as the CC $\nu_e$ signal, which is expected since both arise from the same type of neutrino interaction. Figure 8 shows the corresponding plots for $P(\nu_\mu)$ for FHC and RHC beam modes for the same set of interactions. In all four histograms the signal interactions are peaked close to score values of unity and the backgrounds lie close to zero score, as expected.
[Figure caption] For simplicity, neutrino and antineutrino interactions have been combined within each histogram category. Backgrounds from CC $\nu_e$ interactions are negligible and not shown. A log scale is used on the y-axis, normalized to 3.5 years of staged running, and the arrows denote the cut values applied for the DUNE TDR analyses [11].

The CC $\nu_e$ event selection criteria are chosen to maximize the oscillation analysis sensitivity to CP-violation, i.e., the significance of the determination that $\sin(\delta_{CP}) \neq 0$ [14]. The optimization was performed using a simple scan of cuts on $P(\nu_e)$ for a single true value of $\delta_{CP}$. CP-violation sensitivity does not strongly depend on the selection criterion for $P(\nu_\mu)$ so this cut was chosen by inspection of Fig. 8. The resulting requirements are $P(\nu_e) > 0.85$ for an interaction to be selected as a CC $\nu_e$ candidate and $P(\nu_\mu) > 0.5$ for an interaction to be selected as a CC $\nu_\mu$ candidate. These cut values are represented by the red arrows in Figs. 7 and 8. Since all of the flavor classification scores must sum to one, these two samples are mutually exclusive. The same CVN and selection criteria are used for both FHC and RHC event selections. Figure 9 shows the efficiency as a function of reconstructed energy (under the electron neutrino hypothesis, as discussed in Sec. I C) for the CC $\nu_e$ and CC $\bar{\nu}_e$ event selections. The efficiency for the CVN is shown compared to the predicted efficiency used in the DUNE conceptual design report (CDR) [39], demonstrating that, across the most important part of the flux distribution (less than 5 GeV), the performance can exceed the CDR assumption. The efficiency in FHC (RHC) mode peaks at 90% (94%) and exceeds 85% (90%) for reconstructed neutrino energies between 2-5 GeV. Antineutrino interactions, on average, produce more energetic leptons and fewer hadrons than neutrino events, leading to greater lepton tagging efficiency with respect to neutrino-induced events. The training was optimized over the oscillation peak between 1 GeV and 5 GeV, and hence the CVN performs best in this region where the sensitivity to neutrino oscillations is greatest. Improvements to the efficiency above 5 GeV may be achieved through the inclusion of more relevant training data, but this requires more study. The CDR analysis was based on a fast simulation that employed a parametrized detector response based on GEANT4 single particle simulations, and a classification scheme that classified events based on the longest muon/charged pion track, or the largest EM shower if no qualifying track was present. The efficiencies at low energy were tuned to hand scan results as a function of lepton energy and event inelasticity. Figure 10 shows the corresponding selection efficiency for the CC $\nu_\mu$ event selection. The efficiency has a maximum of 96% (97%) and exceeds 90% (95%) for reconstructed neutrino energies above 2 GeV for the FHC (RHC) beam mode. The optimized cut values permit a larger background component than the CDR analysis, but the overall performance of the selection is increased due to the significantly improved signal efficiency. Considering all electron neutrino interactions (both appeared and beam background CC $\nu_e$ and CC $\bar{\nu}_e$ events) as signal interactions, the CVN has a selection purity of 91% (89%) for the FHC (RHC) beam mode, assuming the normal neutrino mass ordering and $\delta_{CP} = 0$ [14].
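The selection cuts and the efficiency and purity figures of merit discussed in this section can be sketched as follows. The cut values 0.85 and 0.5 come from the text; the function names and the toy inputs are hypothetical, and the real analysis applies event weights and energy binning that are omitted here.

```python
import numpy as np

NUE_CUT = 0.85   # P(nu_e) threshold quoted in the text
NUMU_CUT = 0.5   # P(nu_mu) threshold quoted in the text

def select_candidates(p_nue, p_numu):
    """Return boolean masks for CC nu_e and CC nu_mu candidates."""
    p_nue = np.asarray(p_nue, dtype=float)
    p_numu = np.asarray(p_numu, dtype=float)
    return p_nue > NUE_CUT, p_numu > NUMU_CUT

def efficiency_and_purity(selected, is_signal):
    """Efficiency = selected signal / all signal; purity = selected signal / all selected."""
    selected = np.asarray(selected, dtype=bool)
    is_signal = np.asarray(is_signal, dtype=bool)
    n_selected_signal = np.count_nonzero(selected & is_signal)
    n_signal = np.count_nonzero(is_signal)
    n_selected = np.count_nonzero(selected)
    efficiency = n_selected_signal / n_signal if n_signal else float("nan")
    purity = n_selected_signal / n_selected if n_selected else float("nan")
    return efficiency, purity
```

Because the flavor scores of one event sum to one, an event with $P(\nu_e) > 0.85$ has $P(\nu_\mu) < 0.15$, so the two masks returned by `select_candidates` never overlap for properly normalized scores.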

IV. EXCLUSIVE FINAL STATE RESULTS
The CVN has three outputs that count the number of final-state particles for the following species: protons, charged pions, and neutral pions. Neutrino interactions with different final-state particles can have different energy resolutions and systematic uncertainties depending on the complexity and particle multiplicity of the interaction. It may be possible to improve the oscillation sensitivity of the analysis by identifying subsamples of events with specific interaction topologies and very good energy resolution.
The individual output scores from the CVN can be multiplied together to give compound scores for exclusive selections. For example, the left plot in Fig. 11 shows the combined score for an event to be CC $\nu_\mu$ with only a single proton in the final-state hadronic system, formed by the product
$$P(\mathrm{CC}\,\nu_\mu\,1p) = P(\nu_\mu)\,P(1\,p)\,P(0\,\pi^\pm)\,P(0\,\pi^0).$$
Similarly, the right plot of Fig. 11 shows the NC $1\pi^0$ score, for events that contain only a single visible $\pi^0$ meson in the final state, defined as
$$P(\mathrm{NC}\,1\pi^0) = P(\mathrm{NC})\,P(0\,p)\,P(0\,\pi^\pm)\,P(1\,\pi^0).$$
The background and signal distributions, closely peaked toward 0 and 1 respectively, demonstrate that the efficient selection of exclusive final states will be possible with the DUNE CVN technique. However, it is possible that the CVN is keying in on features of the model that are not well-supported by data (e.g., kinematic distributions of particles in the hadronic shower) rather than well-supported features, like the individual particle energy deposition patterns. Studies of potential bias from selections based on these classifiers are required before they can be used to generate analysis samples. Provided that the particle counting outputs can be shown to work in a robust manner for simulations and experimental data, these detailed selections have the potential to significantly improve the scientific output of DUNE FD data.
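A compound score of this kind is a plain product of one score from each relevant output, as the following sketch shows. The index conventions are assumptions: the flavor order follows the list in Sec. II C (CC $\nu_\mu$, CC $\nu_e$, CC $\nu_\tau$, NC), and all three counting outputs are assumed to use the classes 0, 1, 2, > 2 in that order. The score values in the example are invented.

```python
import numpy as np

FLAVORS = ("CC numu", "CC nue", "CC nutau", "NC")  # assumed neuron order

def compound_score(flavor_scores, proton_scores, charged_pion_scores,
                   neutral_pion_scores, flavor, n_protons, n_charged_pions,
                   n_neutral_pions):
    """Product of one flavor score and one score per particle-count output.

    Count arguments are class indices: 0, 1, 2, or 3 (meaning more than 2).
    """
    return float(flavor_scores[FLAVORS.index(flavor)]
                 * proton_scores[n_protons]
                 * charged_pion_scores[n_charged_pions]
                 * neutral_pion_scores[n_neutral_pions])

# Toy event (invented scores): looks like CC numu with one proton and no pions.
flavor_scores = np.array([0.90, 0.02, 0.01, 0.07])
proton_scores = np.array([0.10, 0.80, 0.08, 0.02])
charged_pion_scores = np.array([0.95, 0.03, 0.01, 0.01])
neutral_pion_scores = np.array([0.90, 0.08, 0.01, 0.01])

cc_numu_1p = compound_score(flavor_scores, proton_scores, charged_pion_scores,
                            neutral_pion_scores, "CC numu", 1, 0, 0)
nc_1pi0 = compound_score(flavor_scores, proton_scores, charged_pion_scores,
                         neutral_pion_scores, "NC", 0, 0, 1)
```

Since every factor lies between 0 and 1, the compound score is never larger than any single factor, so an event only scores highly when all of the contributing outputs agree.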

V. ROBUSTNESS
A common concern about the applications of deep learning in high energy physics is the difference in performance between data and simulation. A straightforward check of the CVN robustness is to inspect plots of the CVN efficiency as a function of various kinematic quantities. More advanced studies could be imagined where the underlying input physics model is changed to produce alternate input samples for training and testing purposes. Studies of this nature are beyond the scope of this paper, but should be part of the validation scheme for any deep learning discriminant used in eventual analyses of DUNE data.
FIG. 11. The CC νμ 1 proton (left) and NC 1π0 (right) combined score distributions from the CVN. In both cases the number of other particles is required to be zero. All events that do not fit the signal description comprise the "all backgrounds" histograms. The histograms are shown in the expected relative fractions but the overall scale is arbitrary.
To be considered well-behaved, the CVN flavor identification should be sensitive to the presence of a visible charged lepton and not highly dependent on the details of the hadronic system, which could be poorly modeled. A visible charged lepton requires that the track or shower that it produces has clearly distinguishable features that are not masked by the presence of many overlapping energy depositions from particles from the hadronic shower. Furthermore, background interactions selected by the CVN should be those containing charged pions (for CC νμ) or neutral pions (for CC νe) that mimic the charged leptons in the signal interactions. Plots of selection efficiency for signal and background interactions were generated as a function of a variety of true and reconstructed quantities, several of which are highlighted here. Figure 12 shows the variation of the signal selection efficiency as a function of the charged lepton energy for three ranges of hadronic energy for the CC νe (left) and CC νμ (right) selections. There is a threshold around 0.1 GeV below which no events are correctly identified, and a region at higher lepton energy where the efficiency reaches a maximum and remains relatively flat. As the hadronic energy increases the maximum efficiency decreases, and this effect is more pronounced for the CC νe selection since EM showers are more easily masked by hadronic shower energy depositions, as compared with long, straight muon tracks.
The CC νμ efficiency as a function of the true muon energy also demonstrates that the performance is not affected by the lack of confinement of higher energy muons (≳1 GeV) within the 500 × 500 pixel images, as was discussed in Sec. II A.
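An efficiency check of this kind, binned in charged lepton energy and split into ranges of hadronic energy, could be sketched as follows. This is an illustrative outline only: the function name, the bin edges, and the toy events are assumptions made for the example and do not reproduce the binning or the data behind Fig. 12.

```python
def binned_efficiency(values, selected, edges):
    """Return the selected fraction in each [edges[i], edges[i+1]) bin.

    Bins that contain no events are reported as None.
    """
    n_bins = len(edges) - 1
    total = [0] * n_bins
    passed = [0] * n_bins
    for value, is_selected in zip(values, selected):
        for i in range(n_bins):
            if edges[i] <= value < edges[i + 1]:
                total[i] += 1
                passed[i] += int(is_selected)
                break
    return [p / t if t else None for p, t in zip(passed, total)]


# Toy signal events: (lepton energy [GeV], hadronic energy [GeV], selected?).
events = [
    (0.05, 0.5, False),
    (0.8, 0.5, True),
    (0.9, 0.5, True),
    (0.7, 3.0, True),
    (0.6, 3.0, False),
    (2.5, 3.0, True),
]
lepton_edges = [0.0, 0.1, 1.0, 5.0]          # hypothetical bin edges
hadronic_ranges = [(0.0, 1.0), (1.0, 5.0)]   # hypothetical energy ranges

efficiency_by_range = {}
for low, high in hadronic_ranges:
    subset = [e for e in events if low <= e[1] < high]
    efficiency_by_range[(low, high)] = binned_efficiency(
        [e[0] for e in subset], [e[2] for e in subset], lepton_edges
    )
```

Plotting each list in `efficiency_by_range` against the lepton energy bins gives one curve per hadronic energy range, which is the form of comparison discussed in the text.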
The plot on the left of Fig. 13 shows the efficiency in the CC νe selection for background interactions containing a π0 meson as a function of the reconstructed νe energy for three ranges of π0 energy, Eπ0. As expected, the selection efficiency is larger for the background interactions with higher energy π0 mesons. Similarly, the selection efficiency for background interactions containing a π+ meson in the CC νμ selection is shown on the right of Fig. 13 to be larger for higher energy mesons. Figure 14 shows the selection efficiency for CC νμ and CC νe interactions as a function of the charged lepton angle, defined with respect to the neutrino direction. This angle is defined in 3D; hence an angle of 90° corresponds to two cases where the efficiency is expected to be lower: the lepton is traveling almost perpendicular to the read-out planes, or the lepton is traveling parallel to the collection plane (view 2) wires. In these two cases the CVN does not have clear images of the charged lepton in one or more read-out views. This angle is also strongly correlated with the charged lepton energy, explaining the lower efficiency for events containing backward-going, and hence lower energy, charged leptons.
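For concreteness, a 3D angle between the charged lepton and the neutrino direction can be computed from the two direction vectors with a normalized dot product. The sketch below is a generic calculation under the assumption that both directions are available as Cartesian 3-vectors; the function name and the choice of the z axis as the neutrino direction in the usage lines are hypothetical.

```python
import math


def opening_angle_deg(lepton_dir, neutrino_dir):
    """Return the 3D angle in degrees between two direction vectors."""
    dot = sum(a * b for a, b in zip(lepton_dir, neutrino_dir))
    norm = math.sqrt(sum(a * a for a in lepton_dir)) * math.sqrt(
        sum(b * b for b in neutrino_dir)
    )
    # Clamp to [-1, 1] to guard against rounding before taking the arccosine.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Usage with a neutrino direction taken along z (an assumption of this example).
forward = opening_angle_deg((0.0, 0.0, 2.0), (0.0, 0.0, 1.0))
transverse = opening_angle_deg((1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
backward = opening_angle_deg((0.0, 0.0, -1.0), (0.0, 0.0, 1.0))
```

Angles above 90° in this convention correspond to the backward-going leptons mentioned in the text.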
Additional studies, not shown here, help to elucidate other features of these distributions. For example, a small fraction of events with very low energy leptons are still correctly identified. These events can be shown to contain high energy pions, which are likely responsible for their strong CVN flavor identification scores. Also of note are studies of the efficiency for other kinematic variables, which showed no dependence other than that induced by their correlations with the leptonic and hadronic system energies. Finally, studies of CC ντ events showed that efficiencies were consistent with the tau decay rates to muons and electrons. Roughly 17% of CC ντ events were classified as CC νμ, and about 17% as CC νe. The primary τ± decays before leaving a track in the detectors, and though CC ντ event kinematics are different from CC νe and CC νμ events, these events are classified based on the visible charged lepton in the event.
The outcome of these studies provides confidence that the CVN classification is strongly tied to the charged lepton features: EM showers and muon tracks. The lowest performance is seen for indistinguishable intrinsic backgrounds, such as beam-induced electron neutrinos, and events with a misidentified hadron and no visible, lepton-induced track or shower.

VI. CONCLUSION
The DUNE CVN algorithm provides excellent neutrino flavor classification, reaching efficiencies of 90% for electron neutrinos and 95% for muon neutrinos. These efficiencies have basic features that are consistent with those presented in the DUNE CDR [39]. The CVN outperforms the CDR estimates, exceeding the signal selection efficiency over most of the energy ranges shown, albeit with slightly decreased background rejection capability. The results presented here form a key part of the neutrino oscillation analysis sensitivities presented in the DUNE TDR [11]. A proof-of-principle demonstration of final-state particle counting showed a potential mechanism by which to subdivide the event selections to further improve the analysis sensitivity. Future studies of possible systematic biases arising from physics models are planned to ensure the robustness of the particle counting outputs.