Anomaly Detection at the European XFEL using a Parity Space based Method

A novel approach to detect anomalies in superconducting radio-frequency cavities is presented, based on the parity space method with the goal to detect quenches and distinguish them from other anomalies. The model-based parity space method relies on analytical redundancy and generates a residual signal computed from measurable RF waveforms. The residual is a sensitive indicator of deviation from the model and provides different signatures for different types of anomalies. This new method not only helps with detecting faults, but also provides a catalogue of unique signatures, based on the detected fault. The method was experimentally verified at the European X-ray Free Electron Laser (EuXFEL). Various types of anomalies incorrectly detected as quenches by the current quench detection system are analysed using this new approach.


I. INTRODUCTION
Large-scale superconducting particle accelerators such as the European X-ray Free Electron Laser (EuXFEL) comprise several hundred superconducting radio-frequency (SRF) cavities. A user facility of this size requires high RF availability in order to accelerate beam and provide users with reliable and predictable photon light. Automation algorithms [1] are necessary to monitor the RF and to recover, as fast as possible, stations where a trip took place. One typical issue causing RF downtime is a cavity quench, where an area of the cavity walls becomes normal conducting (thermal breakdown), which translates into a drop of the cavity quality factor by several orders of magnitude, leading to a collapse of its accelerating field [2]. The likelihood of a quench increases when cavities are operated at high gradient, close to their quench limits (20-30 MV/m for EuXFEL cavities). For pulsed accelerators, the usual approach to quench detection consists of measuring the loaded quality factor $Q_L$ during the decay phase of the RF pulse [3]. While this approach has proven to be robust in detecting actual quenches, it also produces false positives when anomalies in machine conditions have an impact on the $Q_L$ computation. In its current implementation, the quench detection system cannot discriminate a real from a "fake" quench and provides by default the same reaction: switching off the RF. However, more information than what is currently used by the quench detection server is available in the control system, and can be used to analyse each RF pulse with the intent to discriminate real quenches from other anomalies. The current quench detection server uses only the cavity probe amplitude to compute $Q_L$. The probe phase and the RF forward and reflected signals are also available and can be used to provide a more accurate description of the type of fault taking place. The electrical and mechanical behaviour of SRF cavities is well understood and can be modelled [4].
Thus, model-based methods can be used to detect faults, as demonstrated in [5][6][7]. These model-based approaches provide promising insight into anomaly detection and categorization. As an alternative, a fully data-based approach is also possible, as presented in [8], where machine learning is applied to the identification and classification of faults in SRF cavities for the Continuous Electron Beam Accelerator Facility (CEBAF) at Jefferson Laboratory. The basis for this approach was a data set of several thousand manually labeled events.
In addition to cavity failures, fault and anomaly detection has gained significant interest for accelerator operation in general due to the great potential to increase availability. Mainly data-driven methods have been used, exploiting tools from machine learning. Application examples range from magnet fault detection at CERN [9] and at the Advanced Photon Source at Argonne National Laboratory [10] over the detection of faulty beam position monitors at the Large Hadron Collider [11] to fault detection at the digital electronics level [12,13].
In this paper, a novel approach to detect cavity anomalies is presented based on the parity space method [5]. This method relies on the well-known cavity model, since no extensive labeled data set is available at DESY for a purely data-driven approach. The parity space method makes use of analytical redundancy to generate a residual signal from measurable RF waveforms. This residual can be further analysed with statistical tools to evaluate significant deviations from the model, i.e., a fault. The resulting measures provide different signatures depending on the type of detected anomaly. The method thus not only detects faults but also provides a catalogue of signatures for the detected fault types, allowing for a more robust cavity quench detection by reducing the number of false positives. The method was experimentally verified at EuXFEL and applied to various types of anomalies incorrectly detected as quenches by the current quench detection server. Section II provides general information about SRF cavities and their mathematical model. The different types of faults observed over four years of operation at EuXFEL are summarized in Section III. Section IV gives insight into the parity space method for fault detection and evaluation exploiting the cavity model. Experimental results are given in Section V.

II. SRF CAVITIES
SRF cavities are electromagnetic resonators that can be modelled as second-order systems in the I (in-phase) and Q (quadrature) domain [4], with the state equations for $V_{P,I}(t)$ and $V_{P,Q}(t)$ denoted as model (1). Here, $V_F(t) = V_{F,I}(t) + jV_{F,Q}(t) \in \mathbb{C}$ is the forward field coupled into the cavity, with $V_{F,I}(t)$ and $V_{F,Q}(t)$ its I and Q components, respectively. The state $V_P(t) = V_{P,I}(t) + jV_{P,Q}(t) \in \mathbb{C}$ is the probe signal, i.e., the field which builds up inside the cavity, and $V_B(t) \in \mathbb{C}$ is the field induced by the beam. Note that all signals are considered in base-band. This means that $V_F(t)$ represents the envelope of the electromagnetic forward wave, which is driven by the main oscillator with a frequency of $f = 1.3$ GHz. Currently, the European XFEL is operated in pulsed mode with an RF pulse repetition rate of 10 Hz. Each pulse lasts for approximately 1.8 ms and can be divided into filling, flattop and decay. During filling, the electromagnetic forward wave is coupled into the cavity so that the probe electromagnetic standing wave increases up to the desired field gradient. It is held constant via a feedback controller during the flattop, to accelerate the arriving electron beam by the desired energy level. In the decay, the forward electromagnetic wave coupled into the cavity is switched off and the probe wave decays. Figure 1 shows the amplitude of the cavity signals in the different operating regions. The parameters influencing the dynamic behavior of the SRF cavities are the half bandwidth $\omega_{1/2}$ and the detuning $\Delta\omega(t)$. The half bandwidth $\omega_{1/2} = \pi f / Q_L$ is defined by the loaded quality factor $Q_L$, which expresses the ratio of the energy stored inside the cavity to the energy dissipated per RF cycle (up to a factor of $2\pi$). The external input power coupling to the cavity, defined by the penetration depth of the coupler antenna, is denoted as $Q_{ext}$.
The unloaded ($Q_0$), external ($Q_{ext}$) and loaded ($Q_L$) quality factors are linked by the following relationship [2]:
$$\frac{1}{Q_L} = \frac{1}{Q_0} + \frac{1}{Q_{ext}}. \tag{2}$$
Due to the superconductivity of the cavities at the European XFEL, the unloaded quality factor is very high ($\approx 10^{10}$), so that $Q_L$ is dominated by $Q_{ext}$. The external quality factor can be altered to tune the loaded quality factor. In nominal operation the European XFEL is operated at a quality factor around $Q_L = 4.6 \cdot 10^{6}$, resulting in a small half bandwidth ($\approx 140$ Hz). The half bandwidth can be easily estimated during the decay region of the RF pulse as one over the time constant of the exponential decay of the probe signal.
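The arithmetic behind these numbers can be checked with a short sketch. It is only an illustration: the function names, the noise-free synthetic decay and its 20 MV/m starting amplitude are assumptions made here, and the log-linear fit is one simple way to obtain the decay rate, not necessarily the estimator used at EuXFEL.

```python
import math

F_RF = 1.3e9  # driving frequency in Hz, as stated in the text


def loaded_q(q0, q_ext):
    """Loaded quality factor from 1/Q_L = 1/Q_0 + 1/Q_ext."""
    return 1.0 / (1.0 / q0 + 1.0 / q_ext)


def half_bandwidth(q_l, f=F_RF):
    """Half bandwidth in rad/s, omega_1/2 = pi * f / Q_L."""
    return math.pi * f / q_l


def estimate_half_bandwidth(amplitude, dt):
    """Least-squares slope of ln|V_P| over the decay; returns the decay rate in 1/s."""
    n = len(amplitude)
    t = [k * dt for k in range(n)]
    y = [math.log(a) for a in amplitude]
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return -num / den


w12 = half_bandwidth(4.6e6)
# Synthetic, noise-free decay sampled at 1 MHz (hypothetical 20 MV/m start value)
decay = [20.0 * math.exp(-w12 * k * 1e-6) for k in range(500)]
w12_est = estimate_half_bandwidth(decay, 1e-6)
print(w12 / (2 * math.pi), w12_est / (2 * math.pi))  # both in Hz
```

With $Q_0 \approx 10^{10}$ the loaded quality factor differs from $Q_{ext}$ by well under a tenth of a percent, which is why a change of the coupler position shows up almost one-to-one in $Q_L$.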
Due to its small half bandwidth, the cavity is sensitive to the detuning $\Delta\omega(t)$, the second parameter that influences the cavity dynamics. The detuning is the difference between the driving frequency $f$ and the cavity's resonance frequency $f_0(t)$. Ideally, this should be zero, but the resonance frequency is determined by the cavity's shape, which changes over time as the cavity is exposed to forces, e.g., mechanical forces and Lorentz forces. In principle the detuning can be modelled as a second-order mechanical resonator. However, due to the short duty cycle, an approximation as a sum of first-order systems is given in [4] as
$$\dot{\Delta\omega}_n(t) = -\frac{1}{\tau_n}\Delta\omega_n(t) + K_n\left(V_{P,I}^2(t) + V_{P,Q}^2(t)\right), \quad n = 1, \dots, N, \tag{3}$$
with the total detuning given by the sum of the modal contributions, $\Delta\omega(t) = \sum_{n=1}^{N} \Delta\omega_n(t)$. Here $N$ defines the number of relevant mechanical modes (index $n$), $\tau_n$ is the time constant of the response of the respective mode and $K_n$ is the Lorentz force constant.
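A minimal sketch of how the first-order detuning modes in (3) could be integrated numerically is given below. The forward-Euler scheme, the function name and all parameter values (time constant, Lorentz force constant, field amplitude) are placeholders chosen for illustration and are not taken from the paper.

```python
def simulate_detuning(v_i, v_q, tau, k_lfd, dt):
    """Forward-Euler integration of N first-order detuning modes.

    v_i, v_q : probe I and Q samples
    tau      : list of mode time constants tau_n (s)
    k_lfd    : list of Lorentz force constants K_n (placeholder units)
    Returns the total detuning (sum over modes) at each sample.
    """
    modes = [0.0] * len(tau)
    total = []
    for vi, vq in zip(v_i, v_q):
        total.append(sum(modes))
        v2 = vi * vi + vq * vq
        modes = [m + dt * (-m / t + k * v2) for m, t, k in zip(modes, tau, k_lfd)]
    return total


# Hypothetical single mode driven by a constant field amplitude of 20 (arbitrary units)
n_samples = 2000
detuning = simulate_detuning([20.0] * n_samples, [0.0] * n_samples,
                             tau=[1e-4], k_lfd=[-1.0], dt=1e-6)
print(detuning[-1])  # approaches the steady state K * tau * |V_P|^2
```

For a constant field amplitude each mode settles at $K_n \tau_n |V_P|^2$, which makes the quadratic dependence of the Lorentz force detuning on the gradient explicit.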

III. ANOMALIES THAT TRIGGERED THE CURRENT QUENCH DETECTION
The quench detection and reaction system deployed at EuXFEL [3] triggers if the $Q_L$ computed for a single pulse drops below the running average by more than a user-defined threshold (typically 10% of the nominal $Q_L$ value). In this contribution, four particular conditions or anomalies yielding false positives on the quench detection system are investigated: a controlled cavity detuning, a controlled change of cavity bandwidth, an electron burst linked to a spontaneous field emission, and finally a digital glitch corrupting the data transfer at the firmware or software level. These four cases triggered the quench detection system, although no real quench occurred.
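The trigger rule described above can be sketched as follows. This is a hypothetical illustration, not the deployed server: the class name, the exponential running average and the `learning_rate` parameter are assumptions made here to make the rule concrete.

```python
class QuenchDetectorSketch:
    """Trips when Q_L of one pulse falls below the running average by more than
    a fixed fraction of the nominal Q_L (hypothetical re-implementation)."""

    def __init__(self, q_nominal, rel_threshold=0.10, learning_rate=0.05):
        self.threshold = rel_threshold * q_nominal
        self.learning_rate = learning_rate
        self.running_avg = q_nominal

    def update(self, q_l):
        """Process the Q_L of one pulse; return True if the detector trips."""
        tripped = (self.running_avg - q_l) > self.threshold
        if not tripped:
            # Exponential moving average: an assumed form of the running average
            self.running_avg += self.learning_rate * (q_l - self.running_avg)
        return tripped


det = QuenchDetectorSketch(4.6e6, rel_threshold=0.10, learning_rate=0.5)
slow_drift = [det.update(4.6e6 * (1 - 0.01 * i)) for i in range(1, 21)]
sudden_drop = det.update(0.5 * 4.6e6)
print(any(slow_drift), sudden_drop)
```

In this sketch a slow drift of 1% per pulse is absorbed by the running average, whereas a sudden drop trips the detector regardless of its physical origin, which is how anomalies that merely distort the $Q_L$ computation can end up being reported as quenches.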

A. Controlled change of bandwidth
Coupler heating (due to operation) can change the cavity external quality factor $Q_{ext}$ by up to 25% [14] for TESLA cavities. When increasing or decreasing the XFEL linac energy to accommodate different user run requirements, the change in cavity forward power is accompanied by a net change in $Q_{ext}$, due to heating effects. At EuXFEL, the input power couplers are motorized, allowing for remote cavity bandwidth adjustments. As routine maintenance, the $Q_{ext}$ values are monitored and readjusted. While the thermal effects are very slow (several hours), a motorized coupling readjustment can produce a rapid change in $Q_{ext}$. As can be seen from Eq. (2), this change is detected by the quench detection system as a change in $Q_L$. In some conditions, if the change is fast enough, the quench detection will falsely interpret it as a quench. Several countermeasures are possible: one can mask the quench detection system when adjusting $Q_{ext}$, or adjust the learning rate to low-pass filter these rapid controlled $Q_L$ variations. Alternatively, one could slow down the motorized coupler drive or increase the complexity of the quench detection system by having it check for motor movements. Having an independent measurement that can provide additional information to help in the algorithm decision process is the approach chosen in this contribution.

B. Controlled detuning
As reported in [3], changing the cavity tuning has an influence on the measured cavity coupling. A 1 kHz change in cavity resonance can induce a deviation of up to 15% in the measured cavity bandwidth at EuXFEL. This coupling comes from the measurement error due to the limited isolation between the forward and reflected ports of the waveguide bi-directional couplers. When detuning the cavity, the increased reflected traveling wave couples into the incident signal, changing its amplitude and phase. While the cavity external quality factor $Q_{ext}$ remains constant, the measured loaded quality factor $Q_L$ is affected. In some cases, a fast detuning (requested by operation, for example) can fool the quench detection system and result in a station trip. Some safety mechanisms are built in to avoid such false positives, for example ignoring detected quenches if the cavity gradient is below a user-defined minimal threshold (typically 2.5 MV/m). Heavily detuned cavities are hence ruled out, but this exception handling does not cover all cases. Mechanical coupling between the deformation of the cavity due to the tuner motor and the penetration of the coupler's antenna can also be observed. At EuXFEL, however, the impact of this mechanical coupling is of second order compared to the electrical cross talk described above.

C. Field emitter and other electronic processes
Spontaneous field emission and multipacting [15][16] can be observed sporadically on some cavities. During the cryomodule test phase [17], prior to XFEL tunnel installation, cavities prone to field emission were identified and flagged. In some cases, their operating gradient was limited to keep the emitted radiation below the acceptance threshold [18]. However, new field emitters can appear during operation, and can be triggered through secondary emission due to the onset of another field emitter. Such events produce erratic beam loading, often observed as flickering and noisy cavity RF traces at random times during the RF pulse. In some extreme cases, the produced dark current is large enough to discharge the cavity field within tens of nanoseconds. This effect, also referred to as plasma discharge, can affect neighbouring cavities, which will see this beam loading, albeit with reduced magnitude. If this disturbance takes place during the decay phase of the RF pulse, it will influence the computation of the $Q_L$ value, falsely triggering the quench detection system. These spontaneous electronic processes are typically accompanied by radiation bursts, observed by gamma and neutron detectors in the tunnel. In most cases, field emitters are conditioned away after a few pulses, but a simple conditioning attempt is not always successful and can even generate new field emitters via secondary emission. Techniques such as plasma processing have been developed to cure field emission [19] but are currently not implemented at DESY.

D. Digital glitches
Several failures in the digital domain have been observed so far, resulting in a corruption of the data transferred between electronic boards. Because all electronics responsible for cavity field control are located inside the tunnel, they are subject to radiation showers that can produce single event upsets, flipping a bit in the digital data stream. There are countermeasures in place to track and fix such events (checksums or cyclic redundancy checks), but multiple bit flips taking place simultaneously cannot be fixed with this approach. Another failure mode has been observed when the CPUs in the tunnel become temporarily overloaded, delaying data memory accesses to the time when the FPGA is writing to memory. These read/write collisions result in data loss, where not all data points are recorded. Both failure cases can result in discontinuities in the cavity waveforms, corrupting the $Q_L$ computation. Such events typically trigger the quench detection system, although no real quench occurred.

IV. FAULT DETECTION
In the following section we present a method for fault detection, the parity space method. A model-based approach is chosen because it does not require large data sets for training, and since a good model description of the cavities is available, the method is robust against changes in operating point. With the parity space method a residual is generated, as described in Section IV B, that can be further evaluated by statistical tests, see Section IV C.

A. Parity Space Method
The parity method is a method for fault detection that is based on analytical redundancy. Analytical redundancy relations are derived from an analytical model, as the cavity model in (1) and (3), and only involve measured variables [20]. These have to hold in absence of a fault. Thus, the analytical model has to represent the nominal behavior of the system. The deviation from the analytical relation is called residual, i.e., if the residual is zero, the system behaviour is as described by the model, thus behaving nominally, otherwise it is behaving faultily.
Consider the nonlinear system
$$\dot{x}(t) = f\big(x(t), u(t), d(t), \nu(t)\big), \tag{4}$$
$$y(t) = h\big(x(t), u(t), d(t), \nu(t)\big). \tag{5}$$
Here (4) is called the state equation and $x(t) \in \mathbb{R}^{n}$ is the state vector at time $t$, $u(t) \in \mathbb{R}^{m}$ is the control input and $d(t) \in \mathbb{R}^{m_d}$ is the disturbance vector, representing sensor noise, unmodelled dynamics, etc. The fault signals are described by $\nu(t) \in \mathbb{R}^{m_\nu}$. Equation (5) is called the output equation and $y(t) \in \mathbb{R}^{n}$ is the output vector. For fault detection a residual $r(t)$ has to be defined, depending on known signals only, that is zero in the absence of faults and different from zero otherwise [21], i.e.,
$$r(t) = 0 \ \text{if } \nu(t) = 0, \qquad r(t) \neq 0 \ \text{if } \nu(t) \neq 0. \tag{6}$$
In a real system the residual will never be exactly zero due to the disturbances $d(t)$, which cannot be measured. Therefore, instead of satisfying (6) exactly, the residuals need to be robust against disturbances $d(t)$ but at the same time sensitive to faults $\nu(t)$. We tackle this in the following with a stochastic interpretation of the residuals and introduce the generalized likelihood ratio as a measure of when the residual deviates significantly from zero, i.e., when the behaviour is not consistent with the nominal behaviour and there is a fault. This is presented in detail in Section IV C.

B. Residual Generation
It is assumed in the following that the output $y(t)$ and the input $u(t)$ as well as multiple derivatives of these signals are known, but not the states $x(t)$ (and neither $d(t)$ nor $\nu(t)$). To calculate the residuals, analytically redundant expressions of the model description are exploited. These can be generated by differentiating the individual equations of (4) and (5) multiple times and eliminating the unknown states, to obtain residuals that depend only on the known signals,
$$r_i = P_i\left(u, y, u^{(1)}, y^{(1)}, \dots, u^{(\alpha_i)}, y^{(\alpha_i)}\right), \quad i = 1, 2, \dots,$$
where $i$ denotes the $i$-th residual and $u^{(k)}$ denotes the $k$-th derivative of $u(t)$. Depending on the residual $i$, the highest derivative order $\alpha_i$ might be different. While the elimination of the unknown states is straightforward for linear systems [20], it can be very involved for nonlinear ones [21].
Given the model of SRF cavities (1) and (3), three (redundant) relations for the non-measurable detuning $\Delta\omega(t)$ can be derived, denoted $\Delta\omega_I(t)$, $\Delta\omega_{II}(t)$ and $\Delta\omega_{III}(t)$. While $\Delta\omega_I(t)$ and $\Delta\omega_{II}(t)$ are already functions of known signals only, $\Delta\omega_{III}(t)$ is a function of the non-measurable states $\Delta\omega_n(t)$ for $n = 1, \dots, N$. As the relationship is linear, standard procedures such as those given in [20] exist to eliminate the non-measurable states. Then, three different residuals can be derived, i.e., $\Delta\omega_I(t) - \Delta\omega_{II}(t)$, $\Delta\omega_I(t) - \Delta\omega_{III}(t)$ and $\Delta\omega_{II}(t) - \Delta\omega_{III}(t)$. As the first has been shown in [5] to be the most informative, we will concentrate on
$$r(t) = \Delta\omega_I(t) - \Delta\omega_{II}(t). \tag{7}$$
Further reasons for this choice are that (i) for this residual no additional differentiation of the data is required, which would amplify noise, and (ii) $\Delta\omega_I(t)$ and $\Delta\omega_{II}(t)$ only depend on $\omega_{1/2}$, which is either known or can be easily determined, as described in Section II, while $\Delta\omega_{III}(t)$ depends on $\tau_n$ and $K_n$ for $n = 1, \dots, N$, for which an additional system identification step is necessary. The continuous physical cavity system, $y(t)$ with $t \in \mathbb{R}$, is sampled in discrete time, $y(t_0 + kT)$ for $k \in \mathbb{Z}$, where $T$ is the sampling period. This will be abbreviated as $y(k)$ in the following. In order to account for this, the residual (7) needs to be discretized. For the sampling rates in question (1 MHz or 9 MHz, depending on the layer at which the algorithm is implemented), Euler forward discretization can be used with $\dot{y}(t) \approx \frac{y(k+1) - y(k)}{T}$. Replacing $\dot{V}_{P,I}(t)$ and $\dot{V}_{P,Q}(t)$ accordingly in (7) leads to the discrete-time residual (8). To improve the numerical properties of this residual, for implementation (8) is multiplied by $V_{P,I}(k)V_{P,Q}(k)T$.
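To make the construction concrete, the following sketch computes a scaled residual of this kind on simulated data. It rests on an assumption that is not confirmed by the text above: the cavity is taken to follow the standard base-band form $\dot{V}_P = -(\omega_{1/2} - j\Delta\omega)V_P + 2\omega_{1/2}V_F$ without beam. Under that assumption, solving the Q row and the I row for the detuning gives two expressions whose difference, multiplied by $V_{P,I}(k)V_{P,Q}(k)T$, is computed below. Function names, signal shapes and the injected bandwidth jump are hypothetical.

```python
import math


def simulate_cavity(v_f, w12, detuning, dt):
    """Forward-Euler simulation of the assumed model
    dV_P/dt = -(w12 - j*detuning) * V_P + 2 * w12 * V_F (no beam).
    w12 is a per-sample list so that a bandwidth change can be injected."""
    v = 0j
    v_p = []
    for k, f in enumerate(v_f):
        v_p.append(v)
        v = v + dt * (-(w12[k] - 1j * detuning) * v + 2.0 * w12[k] * f)
    return v_p


def scaled_residual(v_p, v_f, w12_nominal, dt):
    """Difference of the two detuning estimates, times V_I * V_Q * T.
    With Euler differences this reduces to Re(conj(V_P) * increment)."""
    res = []
    for k in range(len(v_p) - 1):
        inc = v_p[k + 1] - v_p[k] + dt * w12_nominal * (v_p[k] - 2.0 * v_f[k])
        res.append((v_p[k].conjugate() * inc).real)
    return res


T = 1e-6                             # 1 MHz sampling
W12 = math.pi * 1.3e9 / 4.6e6        # nominal half bandwidth in rad/s
N = 1800
forward = [1.0 + 0j if k < 1300 else 0j for k in range(N)]  # simplified drive
detuning = 2 * math.pi * 200.0       # constant 200 Hz detuning (hypothetical)

nominal = simulate_cavity(forward, [W12] * N, detuning, T)
r_nominal = scaled_residual(nominal, forward, W12, T)

# Crude stand-in for a drop in Q_L: half bandwidth jumps by a factor of ten
w12_fault = [W12 if k < 1000 else 10.0 * W12 for k in range(N)]
faulty = simulate_cavity(forward, w12_fault, detuning, T)
r_faulty = scaled_residual(faulty, forward, W12, T)
print(max(abs(r) for r in r_nominal), max(abs(r) for r in r_faulty))
```

In this sketch the detuning cancels out of the residual, so a constant detuning leaves it at numerical zero, while a change of the half bandwidth produces a clear deviation from the sample at which it occurs.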

C. Residual Evaluation
There are several approaches to evaluate residuals; in this work we apply statistical tests. Likelihood ratios indicate the goodness of fit of a null hypothesis $H_0$ versus an alternative hypothesis $H_1$ through the ratio of their likelihoods, assessing whether the observed residuals support or significantly disagree with the null hypothesis:
$$\lambda_{LR}(k) = \ln \frac{p(\mu_1 | r(k))}{p(\mu_0 | r(k))}. \tag{9}$$
Here $p(\mu_1 | r(k))$ is the probability of the alternative hypothesis $H_1$ given the observed residual $r(k)$; the complementary probability $p(\mu_0 | r(k))$ is defined accordingly.
For the application at hand, we assume that a fault or anomaly, i.e., the alternative hypothesis, manifests as a jump in the mean value of a Gaussian distribution:
$$H_0: r(k) \sim \mathcal{N}(\mu_0, \Sigma), \ \mu_0 = 0, \qquad H_1: r(k) \sim \mathcal{N}(\mu_1, \Sigma), \ \mu_1 \neq 0. \tag{10}$$
Thus, in nominal operation we expect the residuals to follow a zero-mean Gaussian distribution with variance $\Sigma$, while in case of a fault or anomaly in the system a jump in the mean value appears: the data is still Gaussian distributed with the same variance, but the mean value is different. While the variance $\Sigma$ can be calculated from the given nominal data, the mean $\mu_1$ is unknown. For estimating $\mu_1$, the maximum log-likelihood ratio, i.e., the generalized likelihood ratio (GLR), is considered, derived from (9) by replacing $\mu_1$ by its maximum likelihood estimate, given as [22]
$$\lambda_{GLR}(k) = \max_{\mu_1} \lambda_{LR}(k). \tag{11}$$
Under the assumption (10), $2\lambda_{GLR}$ follows a $\chi^2(n)$ distribution. Here, $n$ is the dimension of the residuals, which in our case is one, since only the one-dimensional residual (7) is considered. With this, an error threshold $\bar{\lambda}_{GLR}$ can be chosen from the $\chi^2(1)$ distribution, so that $\lambda_{GLR}(k) > \bar{\lambda}_{GLR}$ is considered to be erroneous (or anomalous), according to a chosen acceptable probability of false-positive alarms given as $P(Q > 2\bar{\lambda}_{GLR})$ with $Q \sim \chi^2(1)$. In order to fulfill the assumption (10) for the nominal case, the residuals coming from the parity space (7) need to be zero mean. Thus, we correct them for the mean value as
$$\tilde{r}(k) = r(k) - \bar{r}(k), \tag{12}$$
where $\bar{r}(k)$ is the mean value of sample $k$ over $P$ pulses. This would not be necessary if the model equations (1) and (3) fit perfectly and (10) contained only the measurement noise. But as discussed above, this is never the case in reality. Thus, repetitive disturbances or model mismatches can be corrected using Eq. (12). This approach also allows taking the beam contribution into account. As is apparent from model (1), the beam is an input. Although it can be measured, the beam information was not available in the given data sets. Thus, the beam loading is considered as a repetitive disturbance, whose effect is cancelled by the mean value correction in (12).
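The evaluation chain can be sketched for a scalar residual as follows. For the Gaussian mean-jump hypotheses, maximizing the log-likelihood ratio over $\mu_1$ gives $\mu_1 = \tilde{r}(k)$ and hence $\lambda_{GLR}(k) = \tilde{r}(k)^2 / (2\Sigma)$; the sketch assumes this sample-wise evaluation, which may differ from a windowed implementation. All data are synthetic, and the function names, the sinusoidal repetitive disturbance and the injected jump are hypothetical.

```python
import math
import random


def mean_over_pulses(pulses):
    """Mean of sample k over all pulses, used for the mean value correction."""
    n = len(pulses)
    return [sum(p[k] for p in pulses) / n for k in range(len(pulses[0]))]


def pooled_variance(pulses, mean):
    """Variance of the mean-corrected nominal residuals, pooled over all samples."""
    count = len(pulses) * len(mean)
    return sum((p[k] - mean[k]) ** 2 for p in pulses for k in range(len(mean))) / count


def glr(residual, mean, variance):
    """Sample-wise GLR for a jump in the mean of a scalar Gaussian residual."""
    return [(r - m) ** 2 / (2.0 * variance) for r, m in zip(residual, mean)]


random.seed(0)
# Repetitive disturbance (stand-in for beam loading or model mismatch) plus noise
disturbance = [0.5 * math.sin(k / 10.0) for k in range(100)]
nominal = [[d + random.gauss(0.0, 0.1) for d in disturbance] for _ in range(200)]

r_bar = mean_over_pulses(nominal)
sigma2 = pooled_variance(nominal, r_bar)

# Test pulse: same disturbance, with a hypothetical fault shifting the mean from sample 60
test_pulse = [d + (2.0 if k >= 60 else 0.0) for k, d in enumerate(disturbance)]
values = glr(test_pulse, r_bar, sigma2)
THRESHOLD = 10.8
print(max(values[:60]) < THRESHOLD, min(values[60:]) > THRESHOLD)
```

The repetitive disturbance is removed by subtracting the per-sample mean, so only the deviation from the typical pulse shape enters the GLR.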

V. RESULTS
In the following section, the parity space method for fault detection is applied to real operational data, and the different cases described in Section III, that falsely trigger the quench detection system, are presented and analysed.

A. Data
The data analysed in this section is provided by a snapshot recorder, triggered by any interlock occurring in the RF system. For each snapshot, the cavity signals of 250 pulses, i.e., 25 seconds, are saved, with 200 pulses before the event and 50 afterwards. Unfortunately, the beam signal is not saved, but a good workaround to deal with the missing signal is presented in Section V B. Each pulse in the data set is labeled with a unique pulse identifier, the ID, and the data also includes a time vector. The data is saved as HDF5 files. Each data set contains the forward, reflected and probe signals in amplitude and phase, which can be easily transferred to the I and Q domain. Time domain signals are 1.82 ms long, sampled at 1 MHz, yielding 1820 samples per waveform.
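The transfer from amplitude and phase to the I and Q domain is a polar-to-Cartesian conversion, sketched below. The function name is hypothetical, and since the phase unit of the stored data is not stated here, it is exposed as a parameter rather than assumed.

```python
import math


def to_iq(amplitude, phase, degrees=True):
    """Convert amplitude/phase waveforms to I and Q waveforms.
    The phase unit of the recorded data is an assumption to be checked."""
    i_wave, q_wave = [], []
    for a, p in zip(amplitude, phase):
        phi = math.radians(p) if degrees else p
        i_wave.append(a * math.cos(phi))
        q_wave.append(a * math.sin(phi))
    return i_wave, q_wave


# Waveform length implied by the text: 1.82 ms sampled at 1 MHz
n_samples = round(1.82e-3 * 1e6)
i_wave, q_wave = to_iq([2.0, 1.0], [90.0, 0.0])
print(n_samples, i_wave, q_wave)
```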

B. Implementation
The fault detection infrastructure is implemented in C++. The analysis is automated with an easy-to-use console command. Although only results of a-posteriori analysis are presented here, the algorithm is optimized for real-time operation and can cope with the 10 Hz RF pulse repetition rate for pulse-to-pulse detection. This run-time capability was demonstrated for a small number of cavities. See [23] for further implementation details.

C. Statistics
From 07/08/2020 till 11/18/2020, 34 snapshots were saved, triggered by the quench detection system. After a post-mortem analysis of these trips, 18 of them were identified as real quenches. This yields a false positive rate of 47%. Thus, almost half of the events identified as quenches during this period of operation were not quenches. Within this period of 15 weeks, the European XFEL was operated at high gradient for five weeks (i.e., 16.5 GeV beam energy) and at low gradient for ten weeks (14 GeV beam energy). Two thirds of the real quenches occurred within the weeks of high gradient. Figure 2a shows the distribution of the events, i.e., real quenches and false positives, over the different stations (denoted as A2 to A25). Station A11 is the highest performing station and was operating very close to its quench gradient during the high-gradient study period. This explains the higher quench rate. Otherwise, the quenches are relatively evenly spread over the stations. The duration of the down-time caused by the respective event is shown in Fig. 2b. One real quench with a down-time of more than an hour is not shown. Due to this outlier, the mean down-time for the real quenches is almost seven minutes, while it is two minutes for the false positives. The median, however, is very similar in both cases, around 100 s.
The residuals are calculated as given in (8), scaled with $V_{P,I}(k)V_{P,Q}(k)T$ for numerical reasons as discussed in Section IV B. Figure 3 shows the residuals of nominal pulses. The averaged residuals without beam show that the model fits very well and the residuals are zero mean. Since the beam is not yet implemented as an input to the model and is not available as a measurement in the data, the presence of accelerated beam is clearly visible as a model mismatch, as shown in Fig. 3. As the beam signal does not change during the short snapshot data sets that we consider, we can correct for this using (12). Furthermore, the residual calculation operates in the resolution range of the considered signals, as quantization effects are clearly visible. This is confirmed by the histogram in Fig. 4a. As the Q-Q plot in Fig. 4b appears to be relatively linear, there might be an underlying Gaussian distribution; however, it is clearly distorted by quantization. With this, the assumption in (10) does not hold. Nevertheless, the GLR can be used as a metric, as it can be interpreted as a weighted error norm, and the experimental distribution of nominal data can be used to choose a reasonable threshold. The upper bound $\bar{\lambda}_{GLR} = 10.8$ is chosen empirically, such that the given nominal data would not have led to a false positive alarm. This bound would have led to a false-positive rate of 0.0003% if $2\lambda_{GLR}$ followed the $\chi^2(1)$ distribution.
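The two percentages quoted above can be reproduced with a few lines. The sketch uses the identity that a $\chi^2(1)$ variable is the square of a standard normal variable, so its tail probability can be written with the complementary error function; the function name is chosen here for illustration.

```python
import math


def chi2_1_tail(q):
    """P(Q > q) for Q ~ chi2(1). Since Q = Z^2 with Z standard normal,
    P(Q > q) = P(|Z| > sqrt(q)) = erfc(sqrt(q / 2))."""
    return math.erfc(math.sqrt(q / 2.0))


# Share of triggered snapshots that were not real quenches: (34 - 18) / 34
false_positive_share = (34 - 18) / 34

# False-alarm probability for the GLR threshold of 10.8, evaluated at 2 * threshold
p_false_alarm = chi2_1_tail(2.0 * 10.8)
print(round(100 * false_positive_share), 100 * p_false_alarm)  # both in percent
```

The second number is only nominal, since it relies on the Gaussian assumption that the quantized residuals do not strictly satisfy.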

E. Analysis of the Generalized Likelihood Ratio
1. Quenches
Figure 5 shows the cavity signals of a quench event together with the GLR for the first pulse in which the threshold $\bar{\lambda}_{GLR} = 10.8$ is exceeded. The GLR for this pulse and the three consecutive ones is shown in Fig. 6. The RF was switched off on the following pulse by the quench detection system. As shown here, there can be a delay of multiple pulses between quench detection and reaction, due to the current software implementation of the quench detection. Thus, it is unclear at which pulse the quench detection system was actually triggered. All GLR traces follow a characteristic bell curve, starting during the decay of the first pulse and occurring at earlier times for each successive pulse. One quench event is selected here for demonstration, but all of them show very similar GLR traces.

2. Field emitter and other electronic processes
The RF signals observed during a plasma discharge event and the corresponding GLR for cavity 4 of cryomodule 3 (C4.M3) are shown in Fig. 7, where the accelerating field is discharged within tens of nanoseconds. This strong anomaly is confirmed by an extremely high GLR value. For this event, neighbouring cavities have also seen this beam loading effect, albeit with reduced magnitude. Figure 8 shows the discharge of a fraction of the probe field (see Box 1) for a neighbouring cavity within the same cryomodule (C1.M3). The propagation of the effect downstream, but also upstream, is further illustrated in Fig. 9, where the maximal GLR value over the whole pulse is plotted for all cavities within the station. The cavity where the event took place (C4.M3) is clearly identified, its GLR being one order of magnitude higher than all others. One can further see the slow decay of the effect on downstream cavities (C5 to C8 in M3 and C1 to C4 in M4) and upstream cavities (C3 to C1 in M3 and C8 in M2), where the threshold $\bar{\lambda}_{GLR} = 10.8$, marked in red, is exceeded. Most of the energy is scattered at the quadrupole located at the end of each cryomodule, due to the mismatch between the quadrupole setting and the energy of the accelerated dark current.
The GLR signature is clearly distinct from that observed in case of quenches. It can be further noticed for example in Fig. 8 that very small effects in the cavity signals get strongly emphasized by the GLR.

3. Glitch
The cavity signals for a digital glitch together with the GLR are shown in Fig. 10. Here the signals of one cavity are shown as an example, but all cavities within the module show the same distortions. It is obvious that this cannot be a physical fault, since forward and reflected signals are shifted in time during the flattop region, one compared to the other (see Box 1), while the probe and reflected waveforms differ during the decay (see Box 2). This is not physically possible, as the probe equals the sum of the forward and reflected signal, and the forward signal is zero during the decay.

4. $Q_L$ change
To study the effect of a controlled change in $Q_L$, the motorized antenna of an input power coupler was deliberately moved over several pulses. This change in $Q_{ext}$ directly impacts the measured $Q_L$, as shown in Fig. 11. Figure 12 shows the GLRs corresponding to the eight successive pulses marked by red crosses in Fig. 11. While the first one looks normal (numerical noise), a clear signature is visible starting from the second pulse: the GLR remains relatively flat in the filling, followed by a first step up when the flattop begins and a second one at the beginning of the decay, followed by a linear decay. The amplitude of this specific signature increases as $Q_{ext}$ is further reduced. Two specific pulses (the second and the last one of those depicted in Fig. 12) are shown in detail in Fig. 13 and 14. While the threshold is not hit for the second pulse in Fig. 13, one clearly sees the distinct signature. The threshold is hit two pulses later. After the eighth pulse, shown in detail in Fig. 14, the quench detection system has switched off the cavity.
Figures 15 and 16 show an event where the quench detection system is fooled by a fast detuning change. The first change in cavity detuning is observed in Fig. 15 (positive tilt of the reflected signal during the flattop region). Larger detuning is evidenced in Fig. 16, characterized by the rounded drop of the cavity probe gradient and a steeper increase of the reflected signal during the flattop. Here, too, the GLR shows a special signature, distinct from the ones observed before: a linear increase during the filling and the flattop, while it is almost zero during the decay. It is interesting to note that the GLR stays below the threshold despite the heavy detuning. This is to be expected, as the residual in (7) is the difference between detuning values resulting from two differential equations of the same electromagnetic model (1). If we assume that the model holds, a common-mode detuning change will affect both equations identically, hence yielding a zero residual. The distinct traces that are nevertheless visible can be explained by model imperfections, but they remain below the threshold and therefore do not trigger the fault detection, which is the desired behaviour here.

VI. CONCLUSION
This work presents the parity space method for fault detection in SRF cavities followed by statistical evaluation using the generalized likelihood ratio. It validates the good performance of the method in detecting faults and also shows that different fault types result in clearly distinct signatures of the generalized likelihood ratio. The distinction of different fault types could also be achieved by a thorough analysis of all six cavity signals (I and Q values of probe, forward and beam signals). However, with the generalized likelihood ratio, all information is embedded into one single signal. This is of practical use for online operation, as the operator only needs to observe one signal, and it also facilitates the a-posteriori classification of faults. Furthermore, this opens the door towards automatic classification of faults using classification tools from machine learning, which is subject to future work. Such an online automatic classification would be of great support for linac operation.
It could be demonstrated that, due to the choice of residuals, the method presented here is robust against changes in cavity detuning. Thus, in contrast to the current quench detection system, the proposed approach would not result in false positives of this kind. Detecting such abrupt detuning changes could be realized using additional residuals. One possibility would be to solve the electromagnetic model for the half bandwidth. It is expected that this should be independent of a change of $Q_L$, but detect a change in the detuning.
In this work, the cavity faults have been analysed post mortem. However, the code for fault detection and evaluation is able to cope with real-time requirements and can be switched between offline and online analysis. The online analysis has already been demonstrated for a small number of cavities. It will be subject to future work to bring this to normal operation for all cavities.