Uncertainty aware anomaly detection to predict errant beam pulses in the SNS accelerator

High-power particle accelerators are complex machines with thousands of pieces of equipment that are frequently running at the cutting edge of technology. In order to improve the day-to-day operations and maximize the delivery of the science, new analytical techniques are being explored for anomaly detection, classification, and prognostications. As such, we describe the application of an uncertainty aware Machine Learning method, the Siamese neural network model, to predict upcoming errant beam pulses using the data from a single monitoring device. By predicting the upcoming failure, we can stop the accelerator before damage occurs. We describe the accelerator operation, related Machine Learning research, the prediction performance required to abort beam while maintaining operations, the monitoring device and its data, and the Siamese method and its results. These results show that the researched method can be applied to improve accelerator operations.


I. INTRODUCTION
The Spallation Neutron Source (SNS) facility is the world's highest power proton accelerator, delivering 1.4 MW of a 1 GeV pulsed beam at 60 Hz. The beam is accelerated in the linear accelerator, which has both a warm, normal conducting section and a cold, superconducting section. The accelerated beam is injected into the accumulator to form a very short but intense pulse of intensities up to 1.4 × 10^14 protons per pulse that is sent to a stainless steel vessel filled with liquid mercury [1]. The impact of the protons spalls the mercury atoms and neutrons are released. These neutrons are then guided to experimental beam lines where the material research takes place.
Achieving high availability is extremely difficult in high-power proton beam accelerators. These accelerators use thousands of subsystems, with many running on the cutting edge of technology. Errant beam pulses can cause damage to the accelerator and negatively impact the research program. To minimize downtime, accelerator operations include preemptively replacing equipment, carefully scheduling maintenance periods, applying diagnostic instruments to equipment, and tracking downtime statistics and patterns in detail. These measures have limitations, as failures still happen unexpectedly. Adding more diagnostic instruments could help but is very expensive.
There is therefore a need for methods that can utilize existing diagnostic data to identify the onset of errant beam pulses. Years of analysis of beam trip data indicate that errant beam pulses are caused by equipment failure. As nearly all equipment involved in the acceleration process must have an effect on the beam, we assume that conditions leading to errant beam pulses can be identified by monitoring signals from beam measurements.
This paper describes the results of research being conducted to exploit the extraordinary advantages of machine learning (ML) and the vast amounts of accelerator data in a neutron production facility to improve accelerator availability. The focus of this research is on using ML to predict beam loss due to the failure of various accelerator equipment, using data from a single existing beam monitoring device. If successful, we can avoid the cost of having to upgrade or install additional monitoring devices to predict upcoming failures.

II. PREVIOUS WORK
To date, ML has been used in a limited capacity in the accelerator and target community and has primarily focused on improvements in beam tuning and beam quality [2,3].
The focus of recent studies is on improving accelerator operations by detecting deviations from normal conditions. To this end, several approaches have been considered. Relevant literature focuses on building models of the beam behavior in order to identify scenarios that correspond to beam or equipment errors. Fol et al. [4] provide an overview of several potential applications of ML for accelerators, including optimization and prediction for tuning accelerator operations, lattice imperfection corrections, and anomaly detection. In addition to this high level overview, two specific applications of ML at the Large Hadron Collider (LHC) are discussed: optics correction (predicting control knob settings to cancel quadrupole field errors) and anomaly detection (detecting faulty Beam Position Monitors (BPMs)) [5]. In both cases, the proposed ML methods produce reasonable results, though the results described seem to show room for improvement in terms of True Positive (TP) and False Positive (FP) rates. Furthermore, Fol et al. [6] utilize the Isolation Forest (IF) method to identify faulty BPM data that cannot be detected with the singular value decomposition (SVD) method. Results of the study illustrate that 90% of unidentified bad BPM data can be isolated when the SVD and IF methods are applied together. Emma et al. [3] describe a multilayer perceptron (MLP) based prediction of the longitudinal phase space of particle accelerators based on diagnostic measurements. Results from simulation and data from the LCLS indicate good prediction accuracy (mean prediction accuracy ranging from 0.6 to 0.85 depending on the data set used for the training). Similarly, the authors of [7] discuss a gradient boosting classifier for identifying beam loss plane contributions to the measured Beam Loss Monitor (BLM) data at the LHC. Other studies [8] discuss the application of ML for Superconducting Radio Frequency (SRF) cavity fault classification. In this study, Radio-Frequency (RF) waveforms from fault conditions are used to assess the potential of different ML algorithms for the classification task. Reported results indicate the ability to accurately (85% accuracy) identify the cavity and cavity fault mode. Wielgosz and Skoczeń [9] propose using a Long Short-Term Memory (LSTM) cell to model superconducting magnets in the LHC in order to imitate operational time series data. After ensuring the ability of the model to represent the system accurately, the authors test the anomaly detection capabilities of their approach. Results indicate that the anomalous behavior can be captured through deviation of data from model predictions. Sanchez-Gonzalez et al. [10] undertake prediction of shot-to-shot X-ray properties in an X-ray Free-Electron Laser (XFEL) using standard machine learning tools such as artificial neural networks and support vector regression. The presented performance metrics show that high agreement (as high as 97% on specific variables) between predictions and measured data can be achieved using basic ML models on high-repetition-rate data, which allows more accurate and preemptive diagnostics for particle accelerators.
While such anomaly detection techniques are promising, there are few studies using data from a limited number of beam monitors to identify conditions leading to beam loss. The goal in this paper is to predict the onset of conditions leading to errant beam pulses at least one pulse in advance from beam measurements. Similar questions are studied in [11] and [12]. Rescic et al. [11] demonstrate that a Random Forest (RF) is capable of identifying a beam loss event at SNS one pulse in advance using data from one beam current monitor. In [12], the authors examine the potential to predict interlocks (reflecting beam shutoff) using multiple measurements along the accelerator. A recurrence-plots based convolutional neural network (RPCNN) is used to convert the measurements into recurrence plots that are then classified using a convolutional neural network (CNN). Comparisons with an RF approach indicate that the performance of the RF and RPCNN are comparable, though the RPCNN is more successful at identifying anomalies that build up over time. The RPCNN can identify the onset of faults earlier than the RF, although the approach requires data from multiple measurements and may take additional computational time to generate the recurrence plots. It is worth noting that both solutions are challenged when faced with beam loss events that are dissimilar to those in the training data set.
Here, we propose to use a Siamese Neural Network model to provide a similarity ranking between a normal and an anomalous pulse in order to predict an upcoming errant beam pulse. We utilize a Differential Current Monitor (DCM) that uses data from two beam current sensors located upstream and downstream of the source. The sensors used in this study are the same as those in [11]. Our approach specifically addresses the use of a limited number of monitoring devices to identify beam loss events, avoids extensive computational loads and the use of data from multiple measurements, can adjust the threshold to change the FP and TP rates, can determine if the trained model is outdated, and is not affected by anomalies the model is not trained on.

III. ACCELERATOR PERFORMANCE
To evaluate whether our ML method can be applied in practice, we must use a figure of merit for accelerator performance expressed in ML terms. One of the key metrics to track the performance of the SNS facility is the beam availability. The beam availability is defined as the number of beam hours delivered versus the number of beam hours scheduled. Ideally, the beam availability should be 100%, but in practice the maximum achievable yearly beam availability for the facility is currently 95%. The reduction in beam availability is tracked closely by recording beam trip causes, frequencies, and durations. To prevent a negative impact on the scientific research, any method that preemptively and wrongly aborts the accelerator beam should not noticeably increase the current levels of beam trip frequency or duration. While many machine learning applications allow single digit percentages of FP rates, in the SNS case a 5% FP level would reduce our integrated power on target by 5%, which would adversely affect the material science research program. We will define an acceptable FP rate by analyzing the number of beam trips and trip durations.

A. Classes of errant beam
In the context of this paper, there are two categories of errant beam pulses. In the first, labeled 1100 events, the beam pulse is aborted and thus shortened but does not cause significant beam loss in the Superconducting Cavity Linac (SCL). In the second, labeled 1111 events, the pulse does cause significant beam loss in the SCL. The first two digits reflect beam truncation while the last two digits reflect beam loss. It is currently difficult to find a direct correlation between errant beam pulses without beam loss and SCL degradation, but there are clear correlations between beam loss and SCL degradation.

FIG. 1: Histogram of the duration between the errant beam pulse and the next available beam pulse for 1111 events (beam truncation with SCL beam loss).
When the errant beam pulse with beam loss is aborted, the accelerator does not automatically restart and an operator has to manually intervene to restart the accelerator. This setup allows the operator to evaluate whether the beam can be turned back on without causing additional beam loss in the accelerator. For the analyzed data set for the month of March 2021, the total beam production time was ≈26.4 days, with an average frequency of SCL beam loss trips of ≈5 per day and an average beam off time of ≈9 seconds (see Figure 1). Not included in the recovery time is a required 30 second beam power ramp to full beam power, which then indicates an end to the downtime. Including the 30 second recovery, a total of ≈309,000 pulses were lost over the month, or 0.22% of beam. Though the number of pulses missed from beam loss events was low for this particular month, previous experience has shown that beam loss can trigger SRF cavity degradation. In a recent beam loss event in July 2021, an SRF issue was triggered by beam loss from an upstream RF cavity fault, and the beam recovery for that particular event took ≈1 hour. Not only did a long downtime result, but SRF cavity gradients were reduced to maintain high reliability. Such long downtimes are more troublesome to experimenters, as they can lose a significant amount of their allotted beam time, while the lower cavity gradients might require a long maintenance period to treat the cavities and recover their performance. Avoiding pulses with beam loss can thus avoid not just the downtimes of less than one minute but also the rarer, longer downtimes.

FIG. 2: Histogram of the duration between the errant beam pulse and the next available beam pulse for 1100 events (beam truncation without SCL beam loss).
The 1100 events can auto restart in as little as 4 pulses. Figure 2 shows the histogram of 1100 events for the month of March 2021 that are shorter than 0.4 seconds to illustrate the auto restart times. Interestingly, this figure does show a longer duration between the errant beam and the first pulse after restart: 8 pulses (pulses are 16.6 ms apart). This is due to an inadvertent abort system setting for this period, but it can be shortened to 4 pulses. Many of the shorter 1100 events are due to dropouts of the Radio Frequency Quadrupole (RFQ) cavity, about 5500 over the data set. This equates to ≈200 per day, or 1,600 pulses per day, or 0.03% of beam. Note that these events are not noticeable in the beam power, as they are typically spread out over minutes and fall within the noise of the beam power measurements, and thus they have not been part of the accelerator metrics. While most of the truncated beam pulses have no further effects, it is possible that problems occur in a cavity that was loaded for a longer beam pulse. The fields will temporarily increase beyond the intended maximum strength and exponentially increase field-emitted electrons, which could cause cavity problems and longer downtimes. We have not analyzed the cause of the longer downtimes of the 1100 events; the overall associated downtime is 30,000 seconds over the month, or 1.3%. Not all of this can be prevented by aborting beam before the errant beam pulse. For example, 2.5 hours of this downtime was caused by the Personnel Protection System, which is unrelated to the acceleration process.
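The beam-loss fractions quoted in this subsection follow from simple pulse counting. The sketch below redoes that arithmetic from the rounded figures given in the text; the variable names are illustrative, and because the inputs are rounded the results agree with the quoted percentages only approximately.

```python
PULSE_RATE_HZ = 60          # macro-pulse repetition rate
SECONDS_PER_DAY = 86_400
PRODUCTION_DAYS = 26.4      # beam production time in the analyzed month

pulses_per_day = PULSE_RATE_HZ * SECONDS_PER_DAY
pulses_in_month = pulses_per_day * PRODUCTION_DAYS

# 1111 events: about 5 trips per day, each costing about 9 s of
# beam-off time plus the 30 s power ramp.
lost_1111 = 5 * PRODUCTION_DAYS * (9 + 30) * PULSE_RATE_HZ
frac_1111 = lost_1111 / pulses_in_month

# Short 1100 events: about 200 per day, each costing 8 pulses.
lost_1100_per_day = 200 * 8
frac_1100 = lost_1100_per_day / pulses_per_day

# Longer 1100 downtimes: about 30,000 s over the month.
frac_1100_long = 30_000 / (PRODUCTION_DAYS * SECONDS_PER_DAY)

print(f"1111 events: {lost_1111:,.0f} pulses, {frac_1111:.2%} of beam")
print(f"short 1100 events: {lost_1100_per_day} pulses/day, {frac_1100:.2%} of beam")
print(f"long 1100 downtime: {frac_1100_long:.1%} of beam")
```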

B. Acceptable lost pulses
When evaluating the benefit of the ML implementation in reducing the frequency of beam loss events in the SCL, we deal with on the order of 0.22% of beam lost. A TP rate of 0.5 would gain over 0.1% of beam. The ML implementation should predict this type of event and inform the Machine Protection System (MPS) to not send a beam pulse until the fault condition has cleared. This clearing of the fault could be very quick, assuming a glitch in the equipment, and we can reissue beam after the minimal hold-off of 4 pulses. This type of fault would then be an auto restart fault instead of one requiring the longer manual reset. If we keep our erroneously aborted beam pulses below 0.1%, we would not lose any additional pulses and would protect our accelerator from damage due to beam loss. While the truncated beam pulses due to the RFQ dropouts might be predicted, the cost of avoiding them is the same as the cost of the actual abort. Still, the ML implementation could potentially prevent the longer downtimes accounting for about 1% of beam lost.
We will not know the actual benefit of the ML implementation until it is activated. Given the currently tolerated downtimes, we will aim for an FP rate resulting in a loss of around 0.2% of beam with a TP rate for 1111 events of 50%, while acknowledging that slightly larger or smaller FP rates are a trade-off between losing beam pulses and minimizing damage to the accelerator and interruptions to the research program. Given the 4 pulses of downtime per abort, we need an FP rate of 0.05%. This would put our recall, TP/(TP + FN), at 0.5 and our precision, TP/(TP + FP), at 0.999.
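The step from a 0.2% beam budget to a 0.05% FP rate is a division by the number of pulses lost per wrongful abort. A minimal sketch of that step and of the recall definition used above follows; the function names are illustrative, and the sketch does not attempt to reproduce the quoted precision, which depends on event counts not given in the text.

```python
def max_false_positive_rate(beam_loss_budget: float, pulses_lost_per_abort: int) -> float:
    """Per-pulse FP rate that keeps wrongly lost beam within the budget.

    Each false positive costs `pulses_lost_per_abort` pulses, so the
    fraction of beam lost is fp_rate * pulses_lost_per_abort.
    """
    return beam_loss_budget / pulses_lost_per_abort


def recall(tp: float, fn: float) -> float:
    """Recall, TP / (TP + FN): fraction of true anomalies that are caught."""
    return tp / (tp + fn)


# 0.2% of beam tolerated, 4 pulses of hold-off per abort.
fp_rate = max_false_positive_rate(0.002, 4)
print(f"required FP rate: {fp_rate:.2%}")

# Catching half of the 1111 events corresponds to a recall of 0.5.
print(f"recall: {recall(tp=50, fn=50):.2f}")
```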

IV. OVERVIEW OF DIFFERENTIAL CURRENT MONITOR
The beam monitoring device providing the data is the Differential Current Monitor (DCM) [13,14]. This device monitors the beam current upstream and downstream of the superconducting section of the accelerator. It aborts beam when it detects a difference between the upstream and downstream beam current, the 1111 event. The DCM can also abort when there is a significant pulse-to-pulse difference, the 1100 event. This form of errant beam must also be aborted, as the cavities' controller assumes the next pulse is also of the shorter length and will not output enough power for the next pulse, resulting in beam losses. The DCM can abort for this scenario, and this will tell the cavity controller to not learn on the aborted pulse and to keep the settings for the normal pulse. The DCM setup is shown in Figure 3. The two beam current signals are fed into the Field Programmable Gate Array (FPGA) for comparison and generation of the abort signal when the difference exceeds a threshold. The difference signal is fed into two integrating sliding windows, each having its own window length and threshold. The shorter window is set up to detect sudden large losses and the longer window to detect gradual losses.
The top plot of Figure 4 shows what is referred to as a series of macro-pulses, a 1 ms long pulse repeated at 60 Hz. This macro-pulse consists of approximately 1000 mini-pulses. Each mini-pulse is 650 ns long and is followed by a gap of 350 ns. Later in the accelerator, these mini-pulses are accumulated in the ring and stacked on top of each other to create a short, 650 ns high intensity pulse that is directed to the target to produce the neutrons. The bottom plot shows an actual measured beam current signal with the initial ramp-up in intensity at the beginning of the macro-pulse as well as the different widths of the mini-pulses during the macro-pulse. This setup is typical for production style beam.
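The two-window abort logic described in this section can be illustrated with a short NumPy sketch. The window lengths and thresholds below are hypothetical placeholders, not the values configured in the DCM FPGA, and the real device operates on streaming samples rather than on complete arrays.

```python
import numpy as np


def sliding_sum(x: np.ndarray, window: int) -> np.ndarray:
    """Sum of x over every full window of `window` consecutive samples."""
    c = np.cumsum(np.concatenate(([0.0], x)))
    return c[window:] - c[:-window]


def dcm_abort(upstream, downstream, short=(10, 50.0), long=(200, 200.0)) -> bool:
    """Return True if either integrating window exceeds its threshold.

    `short` and `long` are (window length in samples, threshold) pairs;
    both are assumed values chosen only for this illustration.
    """
    diff = np.asarray(upstream, float) - np.asarray(downstream, float)
    for window, threshold in (short, long):
        if len(diff) >= window and np.any(sliding_sum(diff, window) > threshold):
            return True
    return False


upstream = np.full(1000, 30.0)          # flat 30 mA beam current (synthetic)

sudden = upstream.copy()
sudden[500:510] = 0.0                   # 10 samples of total loss
gradual = upstream - 1.5                # small constant loss everywhere
tolerable = upstream - 0.5              # offset below both thresholds

print(dcm_abort(upstream, upstream))    # no difference, no abort
print(dcm_abort(upstream, sudden))      # caught by the short window
print(dcm_abort(upstream, gradual))     # caught by the long window
print(dcm_abort(upstream, tolerable))   # below both thresholds
```

Using two windows lets a single threshold pair cover both failure shapes: a brief, large difference saturates the short integral, while a small persistent difference only accumulates enough in the long one.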

FIG. 4: Pulse pattern and beam current trace.
While other devices such as BLMs can also abort beam, the DCM is up to three times quicker to abort the beam by virtue of the fast processing in its FPGA and its special, dedicated connection to the abort system. Another unique feature of the DCM, one that makes it possible to use it for ML, is that it archives not only the beam current traces during the errant beam pulse but also the beam current traces before the errant beam pulse, as well as a normal beam trace on a regular schedule. The traces can now be used for (semi-)supervised learning, with the trace before an errant beam pulse event as the anomalous class and the trace before a normal pulse as the normal class.

A. Siamese Model
For this study, we explore the use of Siamese neural network models [15] to provide a natural similarity ranking between two inputs. Our aim is to use the Siamese model to learn the similarities between normal and anomalous pulses, as measured by the SNS DCM sensors. Developing a machine learning model based on a similarity score provides robustness against previously unseen anomalies that could be introduced to the system. Additionally, the similarity score can be used to re-evaluate the applicability of the current model by comparing samples used for training with the current normal pulse. A trending change in the similarity score would indicate the need to re-train the current Siamese model.
A Siamese model consists of twin networks that accept unique inputs. The twin networks are used to shrink the large raw data input to a reduced representation that captures the salient features. The reduced representations of each input are then compared using a modified contrastive loss function [16] (Equation 1). The contrastive loss function is composed of two terms used to decrease the output of like pairs and increase the output of unlike pairs. Here y is the truth value, ŷ is the predicted value, α is a tuning parameter used to emphasize similar pulses, set to 2 for this study, and β is a second tuning parameter, set to 1, used to emphasize dissimilar pulses. We used a ResNet [17] model for the twin network. ResNets consist of several stacked residual units, which can be thought of as a collection of convolutional layers coupled with a 'shortcut' that improves the propagation of the signal in a neural network. This 'shortcut' allows for the construction of much deeper networks, since keeping a 'clean' information path in the network facilitates optimization. The architecture of the ResNet model is shown in Figure 5.
The model was developed using Keras [18] with a TensorFlow back-end [19]. We used the Adam optimizer [20] and a cost function as defined in Equation 1. For our study, we used the similarity metric as defined in Equation 2. Here x_1 and x_2 are the latent vector outputs from the ResNet model for input pulses 1 and 2, and i is the element-wise index.
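Since Equations 1 and 2 are not reproduced in this text, the following NumPy sketch shows one plausible form of each, purely as an illustration. It assumes a Euclidean distance between the latent vectors and a margin-based contrastive loss with the two terms weighted by α and β; the margin value, the exact weighting, and the function names are assumptions and may differ from the authors' definitions.

```python
import numpy as np


def euclidean_distance(x1, x2):
    """Assumed similarity metric: sqrt(sum_i (x1_i - x2_i)^2)."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    return np.sqrt(np.sum((x1 - x2) ** 2, axis=-1))


def weighted_contrastive_loss(y, y_hat, alpha=2.0, beta=1.0, margin=1.0):
    """Assumed form of a weighted contrastive loss.

    y     : 0 for like pairs (normal/normal), 1 for unlike pairs.
    y_hat : predicted distance between the two latent vectors.
    The alpha term pulls like pairs toward zero distance; the beta term
    pushes unlike pairs out to at least `margin` (a hypothetical value).
    """
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    like_term = alpha * (1.0 - y) * y_hat ** 2
    unlike_term = beta * y * np.maximum(margin - y_hat, 0.0) ** 2
    return float(np.mean(like_term + unlike_term))


d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
print(d)
print(weighted_contrastive_loss(y=[0, 1], y_hat=[0.0, 1.0]))  # both pairs well placed
print(weighted_contrastive_loss(y=[0], y_hat=[1.0]))          # like pair too far apart
```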

B. Uncertainty Aware Siamese Model
Providing methods to reliably quantify the predictive uncertainty for our models is critical for real-world applications. This is acutely visible when the input samples are dissimilar to the training sample. The use of distance-awareness is particularly [...] Gaussian Processes are able to provide predictions with uncertainties; however, this technique does not scale to large data samples and high dimensional problems. For this study, we extend the Siamese model described in Section V A by replacing the output layer with a Gaussian Process (GP) as described in [21]. A classic deep learning model maps the input space to a hidden representation space, and its output layer maps the hidden representation h(x) to the label space y. By wrapping a GP layer around the output layer, we make it distance aware such that it outputs an uncertainty score representing the distance |h(x) − h(x′)| between the hidden representation of the test data and that of the training data (the distribution that the model is trained on).
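The distance-aware behavior described above can be seen in a small exact GP with an RBF kernel acting on hidden representations. This is only a sketch of the principle: the kernel, length scale, noise level, and synthetic features are assumptions, and it is not the scalable GP output layer of [21] used in the study.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """RBF kernel k(a, b) = exp(-|a - b|^2 / (2 * length_scale^2))."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_predict(h_train, y_train, h_test, noise=1e-2, length_scale=1.0):
    """Exact GP regression: predictive mean and variance at h_test."""
    K = rbf_kernel(h_train, h_train, length_scale) + noise * np.eye(len(h_train))
    K_s = rbf_kernel(h_test, h_train, length_scale)
    mean = K_s @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, K_s.T)
    var = 1.0 - np.sum(K_s * v.T, axis=1)   # k(x, x) = 1 for the RBF kernel
    return mean, var


rng = np.random.default_rng(0)
h_train = rng.normal(size=(50, 2))          # synthetic hidden representations
y_train = np.zeros(50)                      # all labeled as the normal class

h_near = np.array([[0.0, 0.0]])             # inside the training distribution
h_far = np.array([[10.0, 10.0]])            # far from anything seen in training

_, var_near = gp_predict(h_train, y_train, h_near)
_, var_far = gp_predict(h_train, y_train, h_far)
print(var_near[0], var_far[0])              # variance grows with distance
```

The predictive variance returns to the prior value of the kernel as the test point moves away from every training point, which is what lets the uncertainty act as a flag for inputs unlike the training data.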

VI. RESULTS
For this study, we used the Receiver Operating Characteristic (ROC) curve to quantify the performance of our models. The ROC curve indicates the relationship between true positives (TP) and false positives (FP). In our case, a true positive is defined as correctly identifying an anomaly and a false positive is defined as incorrectly identifying a normal pulse as anomalous. We trained two Siamese models with identical architectures and configurations, with the exception of the output layer as explained in Section V.
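Because the operating point of interest is a single, very low FP rate rather than the whole ROC curve, it can be useful to read the TP rate off at a fixed FP budget. The sketch below does this for arbitrary score arrays; the scores are synthetic Gaussian draws and the function name is illustrative, so the numbers it produces say nothing about the actual model.

```python
import numpy as np


def tpr_at_fpr(scores_normal, scores_anomalous, max_fpr):
    """Pick the lowest threshold whose FP rate does not exceed max_fpr.

    A sample is flagged as anomalous when its score is strictly greater
    than the threshold. Returns (threshold, TP rate, FP rate).
    """
    normal = np.sort(np.asarray(scores_normal, float))
    anomalous = np.asarray(scores_anomalous, float)
    n = len(normal)
    allowed_fp = int(np.floor(max_fpr * n))     # normal samples we may misflag
    threshold = normal[n - allowed_fp - 1]
    tpr = float(np.mean(anomalous > threshold))
    fpr = float(np.mean(normal > threshold))
    return threshold, tpr, fpr


rng = np.random.default_rng(1)
normal_scores = rng.normal(0.0, 1.0, size=10_000)      # synthetic
anomalous_scores = rng.normal(6.0, 1.0, size=1_000)    # synthetic

threshold, tpr, fpr = tpr_at_fpr(normal_scores, anomalous_scores, max_fpr=0.0005)
print(f"threshold={threshold:.2f}  TPR={tpr:.3f}  FPR={fpr:.4%}")
```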

A. Data Environment
A typical trace for training is shown in Figure 8. Each trace contains 120,000 samples, of which samples 3,000-13,000 are used for the Siamese model. Figure 9 displays the box plots for 20 randomly selected traces from the normal set of traces. To remove statistically anomalous traces, each trace is required to have one peak every 750 ns. Peaks were identified using the find_peaks function from the SciPy library [22] with a minimum height of 2 mA and a distance of 75 samples (750 ns) between two neighboring peaks. A sample trace demonstrating the identified peaks is shown in Figure 10. In addition, we required a minimum of 900 mini-pulses, as set up for accelerator operations, in order to exclude non-production setups.
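The peak-based selection can be sketched with `scipy.signal.find_peaks` using the height and distance values quoted above. The synthetic trace, the 10 ns sample spacing implied by 75 samples per 750 ns, the half-sine pulse shape, and the helper names are assumptions made for this illustration.

```python
import numpy as np
from scipy.signal import find_peaks


def count_mini_pulses(trace_ma, min_height_ma=2.0, min_distance=75):
    """Count peaks at least 2 mA high and at least 75 samples apart."""
    peaks, _ = find_peaks(trace_ma, height=min_height_ma, distance=min_distance)
    return len(peaks)


def is_production_trace(trace_ma, min_pulses=900):
    """Keep only traces with at least `min_pulses` mini-pulses."""
    return count_mini_pulses(trace_ma) >= min_pulses


def make_trace(n_pulses, period=100, width=65, amplitude_ma=30.0):
    """Synthetic macro-pulse: one half-sine mini-pulse per 100-sample period."""
    phase = np.arange(n_pulses * period) % period
    bump = amplitude_ma * np.sin(np.pi * phase / width)
    return np.where(phase < width, bump, 0.0)


production = make_trace(1000)     # about 1000 mini-pulses, as in production beam
short_setup = make_trace(500)     # fewer mini-pulses, a non-production setup

print(count_mini_pulses(production), is_production_trace(production))
print(count_mini_pulses(short_setup), is_production_trace(short_setup))
```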
To generate the data set for the Siamese model, we extracted normal and anomalous 'Before' traces from the upstream sensor in the archived data from March 2021. From this we selected 4000 anomalous traces and compared each of them to 15 randomly selected normal traces. We also used 4000 normal traces and compared each of them to 15 randomly selected normal traces. The comparisons between normal traces are assigned a label of 0 and comparisons between normal and anomalous traces are assigned a label of 1. After applying the aforementioned data pre-processing, the data is divided into orthogonal training, testing, and validation data sets.
For the developed solution to be of practical use, we must maximize the number of correctly identified anomalies while maintaining the FP rate below the established 0.05%. As displayed in Figure 12 (zoomed-in sub-plot), we have a true positive rate of more than 60% on both the train and test data sets.
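The pair construction described at the start of this subsection (each selected trace compared with 15 randomly chosen normal traces) can be sketched as index bookkeeping. The function name, the tuple layout, and the choice to exclude pairing a normal trace with itself are assumptions; the text does not specify how partners were drawn.

```python
import numpy as np


def build_pairs(n_anomalous, n_normal, partners=15, seed=0):
    """Return rows of (left_kind, left_index, right_normal_index, label).

    Label 1 marks a normal/anomalous comparison, label 0 a
    normal/normal comparison.
    """
    rng = np.random.default_rng(seed)
    rows = []
    for i in range(n_anomalous):
        for j in rng.choice(n_normal, size=partners, replace=False):
            rows.append(("anomalous", i, int(j), 1))
    for i in range(n_normal):
        others = rng.choice(n_normal - 1, size=partners, replace=False)
        others[others >= i] += 1        # assumed: never pair a trace with itself
        for j in others:
            rows.append(("normal", i, int(j), 0))
    return rows


pairs = build_pairs(n_anomalous=4000, n_normal=4000)
n_unlike = sum(label for *_, label in pairs)
print(len(pairs), n_unlike, len(pairs) - n_unlike)
```

With 4000 traces of each kind and 15 partners each, this yields an equal number of like and unlike pairs, so the two labels are balanced before the train, test, and validation split.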

C. Uncertainty Aware Siamese Model Results
To implement the uncertainties associated with the outputs of our Siamese model, we wrapped the last layer of the model with a Gaussian process layer. The uncertainty aware Siamese model provides not only a classifier output but also the uncertainty of the predictions. We can explore how the model behaves in both dimensions.
As shown in Figure 13, we introduced sample pulses (red dots) with anomalies that the model was not trained on. We can see that the model predicts that these are anomalies with larger uncertainties, meaning that the model is less confident in the predictions for previously unseen anomaly types. The increase in uncertainty for these samples is consistent with our expectation.
In order to incorporate the model prediction uncertainties into the ROC curve, we smeared the model prediction output with its associated uncertainty. Figure 15 shows the ROC curve uncertainty band for the unseen anomalies (type 1111) discussed above. Even though the model was not trained on these anomalies, it is able to identify more than 45% of the anomalies correctly while keeping the false positive rate below the threshold of 0.05%, though the predictions have higher uncertainties, as can be seen in Figure 13. It should be noted that the scatter plot also shows that there is a threshold where the FP rate is negligible.

FIG. 15: ROC curve with uncertainty band for the inferences made on the anomaly type 1111 with the model trained on anomaly type 1100. The dark and lighter bands represent the same ranges as in Figure 14.
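One way to turn per-sample uncertainties into a band on the ROC operating point is to redraw every prediction from a Gaussian centered on the model output and repeat the TP-rate calculation. The sketch below assumes exactly that; the Gaussian smearing, the number of draws, the percentile levels, and the synthetic inputs are all assumptions, since the details of the smearing procedure are not given in this text.

```python
import numpy as np


def smeared_tpr_band(mean_norm, std_norm, mean_anom, std_anom,
                     max_fpr, n_draws=200, seed=0):
    """Percentiles of the TP rate at a fixed FP rate under Gaussian smearing.

    Returns the 2.5, 16, 50, 84 and 97.5 percentiles of the TP rate
    over `n_draws` smeared copies of the predictions.
    """
    rng = np.random.default_rng(seed)
    tprs = np.empty(n_draws)
    for k in range(n_draws):
        s_norm = rng.normal(mean_norm, std_norm)    # smeared normal predictions
        s_anom = rng.normal(mean_anom, std_anom)    # smeared anomaly predictions
        threshold = np.quantile(s_norm, 1.0 - max_fpr)
        tprs[k] = np.mean(s_anom > threshold)
    return np.percentile(tprs, [2.5, 16, 50, 84, 97.5])


# Synthetic model outputs and uncertainties, for illustration only.
mean_norm = np.full(5000, 0.10)
std_norm = np.full(5000, 0.05)
mean_anom = np.full(500, 0.90)
std_anom = np.full(500, 0.10)

band = smeared_tpr_band(mean_norm, std_norm, mean_anom, std_anom, max_fpr=0.0005)
print(band)
```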

D. Siamese Inference CPU timing Results
The Siamese model has been tested on an Intel Core i9, with timing results averaging about 2 ms per inference for the deterministic model and 4 ms for the distance aware model. The current DCM CPU is a Core i7, and based on [23] it would still be able to infer within 10 ms. It should be noted that the DCM is scheduled to be upgraded to one of the Xeon CPUs (W-2245 or E5-2618) listed in Table I. As such, all CPUs considered are expected to easily complete the inference within the allotted time. As the data is transferred by DMA over a PXIe bus on a point by point basis as it is being sampled, the data is almost instantly available to the Real-Time CPU for processing. Current CPU usage is around 2-3 ms per 16.6 ms cycle, which leaves 16.6 − 1 − 3 = 12.6 ms of CPU time available to convert the data from fixed point to float, evaluate it with the Siamese model, and send an abort signal back to the FPGA. We also tested the deterministic model inference running as C++ code on an NI PXIe-8840 (Core i5-4400E CPU) system with the LabVIEW Real-Time OS; one inference took under 4 ms.
In addition to the uncertainty quantification, we aim to implement a Class Activation Map (CAM) to highlight the distinct region(s) of the pulse the model focuses on when making a similarity classification. This method has been used extensively in recent years upon the realization that Convolutional Neural Networks can perform object localization without explicit supervision of the object [24]. This can be used to determine specific equipment failure classes. In the future, we can then compare these classes with failed equipment as indicated by the Machine Protection System (MPS) information.
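For a one-dimensional trace, a class activation map reduces to a weighted sum of the last convolutional feature maps along the time axis. The sketch below assumes the network ends with global average pooling followed by a linear layer, which is the setting in which CAM is normally defined but is not stated for the model in this paper; the array shapes and names are illustrative.

```python
import numpy as np


def class_activation_map(feature_maps, class_weights):
    """1D CAM: cam[t] = sum_k class_weights[k] * feature_maps[t, k].

    feature_maps  : (time, channels) output of the last convolutional layer.
    class_weights : (channels,) weights of the linear layer that follows
                    global average pooling, for the class of interest.
    The result is shifted and scaled to the range [0, 1].
    """
    cam = np.asarray(feature_maps, float) @ np.asarray(class_weights, float)
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Synthetic example: channel 2 fires only on samples 40-49 of the trace.
features = np.zeros((100, 4))
features[40:50, 2] = 1.0
weights = np.array([0.0, 0.0, 1.0, 0.0])

cam = class_activation_map(features, weights)
print(np.flatnonzero(cam > 0.5))    # region of the pulse driving the decision
```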

B. Implementation
Given the performance of the method in both execution times and TP and FP rates, we plan to implement the model on the real-time system of the actual DCM. In this paper we only analyzed the pulse immediately before the fault; however, the DCM also archives up to 25 preceding pulses. As such, we plan to study whether the preceding pulses can provide additional discriminating power, which would allow us to identify faults earlier and more accurately. We plan to determine if different equipment failures have different durations from the first detection of an anomaly to the actual errant beam pulse and then apply the appropriate hold-off time for each type of equipment failure.
We now also have data becoming available from the Beam Position Monitors (BPM). The phase data from the BPMs is especially interesting, as the phase relates directly to the momentum of the beam particles and the momentum is directly related to the acceleration process. Thus, if the acceleration process is failing, we should see this in the phase data immediately. We hope that even more precursors can be found in this data.

VIII. CONCLUSION
In this paper, we have described an uncertainty aware method to predict impending faults using data from a single data source. The FP rate can be set low enough that the performance of the accelerator is insignificantly affected while maintaining a TP rate high enough to benefit the accelerator. The method allows us to adjust our FP rate so that fewer beam pulses are wrongly aborted while still preventing over 40 percent of the beam loss events. Another practical aspect of the Siamese model is that we can also feed it pulses that we know are normal, with no errant beam pulse either before or afterwards, and determine if this new normal pulse is still similar to the trained normal pulse. This will help us determine if we need to retrain the model. Execution times of the model are such that a practical implementation is possible, which will help us determine the benefits of errant beam prediction.

FIG. 3: Setup of the Differential Current Monitor showing one mini-pulse as it passes through the SCL.

FIG. 8: Digitized trace of the beam current before the errant beam pulse.

Figure 11 displays the results of the classification. The model identifies most of the anomalies with very high confidence while the remaining anomalies are misclassified as normal.

FIG. 11: Classifier output histogram (the sub-plot represents a zoomed-in version of the same histogram plot).

FIG. 13: Model predicted uncertainty vs uncertainty aware model prediction on a test data set. The blue dots are the normal pulses, the orange dots are the anomaly type used for training, and the red dots are the anomalous pulses that the model was not trained on.

TABLE I: Expected model inference time for select computing hardware. Core i9-9880H and Core i5-4400E times were measured.