Application of dictionary learning to denoise LIGO's blip noise transients

Data streams of gravitational-wave detectors are polluted by transient noise features, or"glitches", of instrumental and environmental origin. In this work, we investigate the use of total-variation methods and learned dictionaries to mitigate the effect of those transients in the data. We focus on a specific type of transient,"blip"glitches, as this is the most common type of glitch present in the LIGO detectors and their waveforms are easy to identify. We randomly select 80 blip glitches scattered in the data from advanced LIGO's O1 run, as provided by the citizen-science project Gravity Spy. Our results show that dictionary-learning methods are a valid approach to model and subtract most of the glitch contribution in all cases analyzed, particularly at frequencies below $\sim 1$ kHz. The high-frequency component of the glitch is best removed when a combination of dictionaries with different atom length is employed. As a further example, we apply our approach to the glitch visible in the LIGO-Livingston data around the time of merger of binary neutron star signal GW170817, finding satisfactory results. This paper is the first step in our ongoing program to automatically classify and subtract all families of gravitational-wave glitches employing variational methods.


I. INTRODUCTION
The third observational campaign of the advanced gravitational-wave (GW) detectors LIGO [1] and Virgo [2], O3, is currently ongoing. During the previous two campaigns, O1 and O2, the GW detector network reported the observation of eleven compact binary mergers [3] comprising ten binary black holes and one binary neutron star. The latter, GW170817 [4], was accompanied by extensive and very successful follow-up observations of electromagnetic emission originated from the same astronomical source [5]. Moreover, neutrino searches were also carried out, yet no detection has been reported [6]. The entire GW strain data from O1/O2 has been made publicly available and, in particular, the data around the time of each of the eleven O1/O2 events are accessible through the Gravitational-Wave Open Science Center 1 .
Since the start of O3 on April 1st 2019, GW candidate events are being released as public alerts to facilitate the rapid identification of electromagnetic or neutrino counterparts. The growing list of candidates can be inspected at the GW Candidate Event Database [7]. Toward the end of O3, the GW detector network may be increased by yet another facility with the addition of the KAGRA detector [8].
The detection of GWs is severely hampered by many sources of noise that contribute to a non-stationary background in the time series of data in which actual GW signals reside. The sensitivity of the instruments is limited at low frequencies (below ∼ 20 Hz) by gravity-gradient (seismic) noise and at high frequencies (above ∼ 2 kHz) by photon-shot 1 https://gw-openscience.org noise originated by quantum fluctuations of the laser. The detectors are most sensitive at intermediate frequencies (∼ 200 Hz) where the Brownian motion of the suspensions and mirrors is the limiting source of so-called thermal noise. Moreover, the data stream is polluted with the presence of transient (short duration) noise signals, commonly known as "glitches", whose origin is not astrophysical but rather instrumental and environmental. We refer to [9] for a comprehensive overview of the LIGO/Virgo detector noise and the extraction of GW signals.
Glitches difficult GW data analysis for a number of reasons. By their short-duration nature they contribute significantly to the background of transient GW searches. Glitches may occur sufficiently frequently to potentially affect true signals, particularly when occurring in (or almost in) coincidence. Furthermore, some types of glitches show time-frequency morphologies remarkably similar to actual transient astrophysical signals, which increases the false-alarm rate of potential triggers. Moreover, having to remove portions of data in which glitches are present downgrades the duty cycle of the detectors. It is however not trivial to remove defective segments of data. The simplest approach, i.e. setting them to zero, might result in a leakage of excess power, which may turn the mitigating approach more damaging than the very effect of the glitch.
For all these reasons, understanding the origin of glitches and mitigating their effects is a major effort in the characterization of GW detectors [10,11]. Indeed, in recent years many strategies have been developed to automatically classify glitches. The approaches are as diverse as Bayesian inference, machine learning, deep learning, and citizen science [12][13][14][15][16][17][18][19][20]. Recent examples of glitch mitigation are reported in [21][22][23][24]. Ref. [21] describes various deglitching methods to extract the strong glitch present in the LIGO-Livingston detector about 1s arXiv:2002.11668v1 [gr-qc] 26 Feb 2020 before the merger of the binary neutron star that produced the signal GW170817 [4]. In [22,23] the impact of loud glitches is reduced by using an inpainting filter that fills the hole created after windowing the glitch. Glitch reduction, together with other techniques, was shown to improve the statistical significance of a GW trigger. In addition, deep learning approaches have also proven very effective to recover the true GW signal even in the presence of glitches [24].
There are many different families of glitches identified during the advanced LIGO-Virgo observing runs [14,25,26]. Glitches from each family have a similar morphology, although the characteristics of each specific glitch in terms of duration, bandwidth and signal-to-noise (SNR) ratio can vary significantly even for glitches inside the same family. In this work we focus on blip glitches, a noise transient characterised by a duration of about 10 ms and a frequency bandwidth of about 100 Hz. This type of glitches, which has mainly been found in the two LIGO detectors, significantly reduce the sensitivity of searches for high-mass compact binary coalescences. Blip glitches in LIGO data are identified using both the PyCBC pipeline search (see [25] and references therein) and the citizen-science effort Gravity Spy [14]. The recent study of [25] based on PyCBC has found that Advanced LIGO data during O1/O2 contains approximately two blip glitches per hour of data (amounting to thousands of blip glitches in total). The physical origin of most of them remains unclear.
This paper explores the performance of dictionary-learning methods [27] to mitigate the presence of blip glitches in advanced LIGO data. To this aim we select a large number of blip glitches randomly distributed along the data stream from advanced LIGO's first observing run. For each glitch, the data correspond to a one-second window centered at the GPS time of the glitch as provided by Gravity Spy [14]. As in [21,22,24] the goal of our work is to mitigate the impact of glitches in GW data in order to increase the statistical significance of astrophysical triggers. Our results show that dictionary-learning techniques are able to model blip glitches and to subtract them from the data without significantly disturbing the background.
The paper is organized as follows: In Section II we summarize the mathematical framework of the variational methods which are at the core of the dictionary-learning approach we use. Section III discusses technical aspects, namely the whitening procedure we employ to remove noise lines and other artefacts, the training of the dictionaries, and how we perform the reconstruction of the glitches. The results of our study are presented in Section IV. Finally, a summary is provided in Section V. Appendix A shows the spectrograms of the 16 blip glitches from O1 we employ in our test set and reports their main characteristics.

A. Total variation methods
In this paper we employ two different variational techniques based on the L 1 norm, the Rudin-Osher-Fatemi (ROF) method [28] and a Dictionary Learning method [27]. We have recently begun to use these procedures in the context of GW data analysis in [18,[29][30][31]. Both approaches solve the denoising problem, y = u + n, where u is the true signal and n is the noise, as a variational problem. The solution u is thus obtained as where R is the regularization term, i.e. the constrain to impose in the data and F is the fidelity term, which measures the similarity of the solution to the data. The parameter λ is the regularization parameter and controls the relative weight of both terms in the equation. Even though both methods solve the same general problem, each one of them approaches the problem in a different way and, therefore, the regularization term and the fidelity term have different expressions.
In 1992, Rudin, Osher and Fatemi [28] proposed the use of the so.called total-variation (TV) norm as the regularization term R(u) = Ω |∇u| constrained to ||y − u|| 2 . Note that | · | and || · || represent the L 1 and L 2 norms, respectively. This specific formulation of the variational problem (2.1) is called ROF model and reads This model preserves steep gradients, reduces noise by sparsifying (i.e. promoting zeros) the gradient of the signal and avoids spurious oscillations (Gibbs effect). However, the associated Euler-Lagrange equation, given by becomes singular when |∇u| = 0. This issue can be easily solved by changing the standard TV norm by a slightly perturbed version (see [29] for a detailed explanation), where β is a small positive parameter. We refer to this modified version of the regularization term in the method as regularized ROF (rROF).

B. Sparse reconstruction over a fixed dictionary
In dictionary-based methods, the denoising is performed by assuming that the true signal u can be represented as a linear combination of the columns (atoms) of a matrix D called the dictionary. If the signal can be represented with a few columns of D, the dictionary is adapted to u. In other words, there exists a "sparse vector" α such that u ∼ Dα. As a result, the fidelity term in Eq. (2.1) reads, (2.5) In other words, the problem reduces to finding a sparse vector α that represents the signal u over the columns of the dictionary. The next step is to find a regularisation term that induces sparsity over the coefficients of α. Classical dictionarylearning techniques [32,33] use as regularisation term the L 0 -norm, which is chosen to ensure that the solution has the fewest possible number of nonzero coefficients. However, this problem is not convex and is NP-hard, i.e. it can be solved in non-deterministic polynomial-time. If the L 1 -norm is used instead of the L 0 -norm, the problem becomes convex. This type of regularisation promotes zeros in the components of the vector coefficient α, and the solution is the sparsest one in most cases. The variational problem thus reads, which is known as basis pursuit [34] or LASSO [35]. In this paper we solve Eq. (2.6) using the Alternating Direction Method of Multipliers (ADMM) algorithm [36].

C. Dictionary Learning
In the previous section we have assumed that the dictionary D is fixed and we only solve the problem of representation. Traditionally, predefined dictionaries based on wavelets, curvelets, etc, have been used. However, signal reconstruction can be dramatically improved by learning the dictionary instead of using a predefined one [37]. In this approach a set of training signals is divided into patches in such a way that the length of the patches is less than the total length of the training signals. In most common problems, the number of training patches m is large compared with the length of each patch n, n m. The procedure to train the dictionary is similar to Eq. (2.6) except that the dictionary D should now be added as variable, where x i denotes the i-th training patch. Unfortunately, this problem is not jointly convex unless the variables are considered separately. In [27] a method based on stochastic approximations was proposed. These approximations process one sample at a time and the method takes advantage of the problem structure to efficiently solve it. For each element in the training set, the algorithm alternates a classical sparse coding step, to solve for α using a dictionary D obtained in the previous iteration, with a dictionary update step, where the new dictionary is calculated with the recently calculated values of α, namely As in [27] we use a block-coordinate descent method [38] for solving D and α i iteratively.

III. DATA SELECTION AND DICTIONARY GENERATION
In our previous work we applied learned dictionaries to denoise numerically generated gravitational waveforms from simulations of supernovae core-collapse and binary black hole mergers [30] and to classify simulated glitches [18]. The data employed was in either case embedded in non-white Gaussian noise to simulate the background noise of advanced LIGO in its broadband configuration. This work takes a step further in our efforts by tackling the denoising problem with dictionaries employing real data, in the form of actual glitches from advanced LIGO's O1 data.
We focus on blip glitches, the most common type of glitch found in the two LIGO detectors and whose origin remains mostly unknown [25]. Blip glitches, characterised by a duration of ∼ 10 ms and a frequency bandwidth of ∼ 100 Hz, have a distinctive tear-drop shape morphology when seen is a time-frequency (spectrogram) plot. By their intrinsic properties and recurring presence they can significantly reduce the sensitivity of searches for high-mass compact binary coalescences. In order to focus only in these noise transients, we apply a whitening procedure to the O1 data to remove all systematic sources of noise from the data, like the calibration and control signals that appear as lines in the spectrum, and also to flatten the data in frequency. The whitening algorithm uses the autoregressive (AR) model of [39,40]. The AR model employs 3000 coefficients estimated using 300 s of data at the beginning of the corresponding science segment of every glitch. This type of whitening has proved to be very robust and, as it is applied in the time domain, it is not affected by the typical border problems that appear in frequency-domain methods.
To measure the accuracy of the glitch reconstruction and mitigation we employ two quantitative estimators. The first one is based on the time-frequency distribution of the power of the signal. We integrate the power spectrum for all frequencies for each temporal bin, and then we calculate the ratio between the maximum power and the mean power for all times. We will refer to this estimator as SNR: where S(t, f ) is the time-frequency representation of the data, and f s is the sampling frequency. This SNR estimator is different from the one provided by the optimal filter, which is based in theoretical templates. Our second estimator is called the "whiteness" [40] where P (f ) is the power spectral density (PSD) of the data. The whiteness measures the spectral flatness. As the presence of glitches implies an increment of power with respect to a glitch-free background, if P (f ) is very peaky then W ∼ 0, and if P (f ) is flat then W = 1.

A. Training
We randomly select 80 blip glitches scattered in the data from advanced LIGO's first observing run. While this is not a big sample, it seems sufficient to assess the performance of learned dictionaries in removing glitches. The data corresponds to a window of 1 s centred at the GPS time of the glitch as provided by Gravity Spy [14]. The data is divided in two different sets; 80% is used to train the dictionary while the remaining 20% (which includes 16 blip glitches) is used to test the algorithm. The morphologies of all 16 glitches are presented in Appendix A. The data is downsampled from their original 16384 Hz to 8192 Hz to speed up the algorithm and reduce the computational cost.
The training process is performed as follows. After whitening all the data, we select the main glitch morphology using a window of 1024 samples around the GPS time of the glitch. Then, data from all glitches is aligned and organised in a matrix to build the initial dictionary. Next, we select 30000 random patches of a given length and start the block-coordinate descend method to obtain the trained dictionary. The length of the patches is the same as the atoms of the dictionary and, jointly with the number of atoms, is a hyperparameter of the model.
We explore dictionaries formed by atoms of length in the range [2 3 , 2 9 ]. We also vary the number of atoms of each length to understand its possible effect on the results. Our study shows that a dictionary of 128 samples for atoms is a good choice. The number of atoms seems not to be very relevant as long as the dictionary is over-completed, i.e. the number of atoms is larger than the length of the atoms.
One intrinsic difficulty of applying dictionaryreconstruction techniques to noise transients instead of to actual signals is that we lack of a "clean" signal to use as a model to the dictionary. Glitches have a random component due to the background. Even though the learning step has denoising capabilities, the block-coordinate descend method can have problems of convergence when the patches contain a large stochastic component. To improve the convergence of the learning step and the extraction results, we introduce an additional step between the whitening and the patch extraction. Namely, we use the rROF method to reduce the variance of the data used for training. This process results in smoother atoms and in a cleaner reconstruction of the glitch, which translates in a better separation between the background and the glitch morphology. An example of a dictionary is shown in Fig. 1.

B. Reconstruction
Once the training step is complete, we use the resulting dictionary to extract the blip glitch from the background. As the length of the atoms is always shorter than the length of the test signals, we perform the reconstruction with a sliding window with an overlap of n − 4 samples, where n is the length of the atoms. The overlapped samples are averaged to obtain the final reconstruction.
In addition, we apply the ADMM algorithm in an iterative way. Starting with the original data y, we perform a reconstruction over the dictionary u. Then, this reconstructed signal is subtracted from the original data and the resulting residual is used as the new input. This procedure converges in the sense that in each iteration we subtract less signal from the background. Therefore, this iterative produce is applied until the differences between the residuals of consecutive iterations is less than a given tolerance. For most cases, a typical tolerance of 10 −3 − 10 −4 is enough to produce good results.

C. Regularization parameter search
Reconstruction results heavily depend on the value of the regularization parameter λ. If its value is large, the relative weight of the regularisation term in Eq. (2.6) is larger and more atoms are used. On the contrary, with a low value of λ less atoms are used and more details of the signal (and noise) are recovered. In previous papers [29][30][31], we found the optimal value, i.e. the one that produces the best results, comparing the denoised signal with the original one from GW catalogs from numerical relativity. In the present case, as there is not a true signal to compare with, we cannot determine the optimal λ in the same way.
However, as our goal is to extract the glitch trying to keep the background unaltered, we design a different approach to obtain an optimal value of λ. One of the good properties of dictionary-learning methods we found in our previous studies is that the reconstruction returns zero when the data are very different from the dictionary. This statement is true for sufficiently large values of λ. As mentioned before, if the value is too low, the regularization term in Eq. (2.6) becomes negligible and we would be solving essentially a least meansquares fit, which in practice translates in a very oscillating reconstruction. If the value is very large, it is the fidelity term the one that becomes negligible, and the problem transforms in the minimization of the L 1 -norm of the vector α, whose solution is the vector zero. Therefore, our goal is to find the first value of λ that returns zeros in the part of the data dom-inated by the background but also produces a reconstruction for the glitch. The resulting reconstruction will be zero on the window border and will avoid discontinuities. We select a small window at the beginning of the data window where we are sure that the data is mostly dominated by the background. After that, a bisection algorithm tries to find the largest value that returns a non-zero reconstruction. We will refer to this value as λ min .

A. Blip glitch subtraction with a single dictionary
We start applying dictionaries of different length and number of atoms to the 16 blip glitches we used as tests. The bisection procedure introduced in Section III C is used to find the lower value of λ that returns zeros at the beginning of the data stream. The reconstructed glitch is then subtracted from the data to obtain a cleaner background.
As mentioned previously, we explore the results of us-ing dictionaries formed by atoms of length inside the range [2 3 , 2 9 ]. All options are able to reconstruct the glitches and reduce their impact significantly. However, we observe slightly different results depending of the length of the atoms. In particular, dictionaries with shorter atom length are able to extract more high frequency features than those with larger atom length, but at the expense of leaving more energy at lower and middle frequencies. We have also explored the effect of the number of atoms. We find that once the number of atoms is sufficient large, above ∼ 2.5 times the length of the atoms, there is no appreciable improvement in the results.  Fig. 3 shows the time-frequency diagram (spectrogram) of the same data. In all these examples (and in the whole set of glitches of our sample), the amplitude of the reconstruction is almost zero in the parts of the data stream that contain only background, and only the parts of the signals with larger amplitude are reconstructed. The examples reveal that even in the case when multiples glitches are present in the same data (as in the right panel of Fig. 2) the algorithm is able to reconstruct all of them. The residuals of the subtraction (bottom panels of Fig. 2) show that the impact of the glitches is reduced in all the examples. However, some non-negligible part of the glitch remains in the data (see left and middle panels).
Let us now focus on the spectrograms shown in Fig. 3. The upper panels display the original data (after the whitening procedure) and the lower panels show the reconstructed spectrograms obtained when using a single dictionary of 192 atoms, each with a length of 128 samples. The three blips chosen to illustrate our procedure show several features that make them interesting cases of study. A strong spectral line around 1400 Hz is well visible in the right plot and overlaps with the blip glitch. The middle plot reveal a glitch with almost rectangular shape with a strong contribution at high frequencies. Finally, the right panel shows the presence of three blips with typical tear-drop shape morphologies, very close to each other, along with some spectral lines between them at low frequencies. The spectrograms of the reconstructed data confirm the analysis of the time-series plots. The power of the glitches is reduced for the low and middle frequencies (up to ∼ 1000 Hz) while in others part of the spectrogram both the signal and the background remain unaltered. This effect is clearly visible in the left plot of the lower panel of Fig. 3 which shows that the prominent spectral line is still present after the reconstruction. This is both a good and a bad feature of the method. On the one hand, it is a good result because, as the line has a totally different morphology than the blip glitch, the dictionary does not reconstruct it; that would be a nice feature of the method in the case of a coincidence of a blip glitch with an actual GW signal. On the other hand, it is a bad result because, as spectral lines are another source of noise, they require additional techniques to mitigate their impact.
The most obvious conclusion from the spectrograms is that the dictionary has problems to reduce the high frequency content of the glitches. In the next subsection we discuss how to improve the performance at high frequencies and present quantitative estimates, using the SNR and W metrics, of our two deglitching procedures.

B. Combining dictionaries
We turn next to describe the results obtained when using a combination of different dictionaries to improve the results at higher frequencies. This combination is a natural extension of our original algorithm based on a single dictionary. Now the algorithm reads as follows: first we perform the reconstruc-  tion in the same way than before using the largest dictionary (i.e., 192 atoms of 128 samples length each). Then, a second dictionary is applied to the residual obtained from the first one. However, we do not use λ min for this second dictionary. As our goal is to improve the results at high frequencies, we use a slightly lower value, around 75% less. To avoid the reduction of the background due to this lower value of λ we restrict the reconstruction to a window which contains the most significant part of the blip. This window is determined by the reconstruction using the first dictionary, selecting the part of the signal which is not zero.
As an example we show results using a combination of two dictionaries, one formed by 192 atoms of 128 samples and another one formed by 40 atoms of only 8 samples. The results are presented in Figs. 4 and 5. The comparison of the time-series plots of Fig. 4 and Fig. 2 does not show an obvious improvement when two dictionaries are used instead of one. A careful inspection of Fig. 4 reveals that two of the peaks at the maximum of the glitch disappear in the right and left panels while part of the peaks in the center panel are also reduced. Comparing the spectrograms (i.e. Fig. 3 and Fig. 5) yields more meaningful information. One can observe that the high-frequency contribution that remains from the reconstruction with a single dictionary is further reduced by using a second, smaller dictionary. In addition, the background is not significantly perturbed. This also holds when the first dictionary is able to extract the blip glitch completely, as shown in the middle panel of Fig. 5.
We have also analyzed the results for lower values of λ than the 0.75λmin used in this example. We find that the highfrequency component of the glitch can be reduced even more. However, at low and middle frequencies the background is affected and the spectrogram shows a significant reduction of power in the glitch there. Our tests indicate that a value of λ around 0.75λmin yields a good tradeoff. Table I reports the two metrics, SNR and W, for all test cases shown in Figs. [2][3][4][5]. The values of SNR and W for the original signals are shown in columns 2 and 5, respectively. The comparison using a single dictionary (columns 3 and 6) indicates that the dictionary is able to reduce the SNR significantly. The values of the whiteness increase sightly as expected. Note that as the data is whitened before the reconstruction, most of the original values of W are already close to one. For our procedure combining two dictionaries (columns 4 and 7 in Table I) the estimators show that in those cases where the first denoising does not reduce the glitch completely, the second dictionary is able to improve the results. In addition, in cases where the first dictionary already shows good performance and the SNR is reduced significantly, the addition of a second dictionary barely modifies the results. As a summary we conclude that, in general, glitch denoising with multiple dictionaries is a convenient strategy. For our test set of 16 blip glitches it reduces the SNR by a factor of ∼ 11 on average, while with a single dictionary the average reduction is ∼ 7.5, with a negligible increment in computational cost.

C. Deglitching of GW170817
In [4] the LIGO-Virgo Collaboration reported the first detection of GWs from a binary neutron star inspiral, GW170817. About 1.1s before the coalescence time of GW170817, a short instrumental noise transient appeared in the LIGO-Livingston detector (see Fig. 6, upper panel). The glitch was modeled with a time-frequency wavelet reconstruction and subtracted from the data, as shown in Fig. 2 of [4].
In this section we evaluate the use of learned dictionaries to deglitch the noise transient appearing in GW170817. The results are plotted in the middle and bottom panels of Fig. 6. We note that the spectrograms in this figure look different than those shown in the previous sections. This is because we use the routines of the Q-transform included in the GWpy libraries [41] in order to obtain similar plots to those reported in [4] to facilitate the comparison. The shape of the GW170817 glitch is not exactly the same as that of the blips we have used for training our dictionaries. However, this example is an excellent test to assess the capabilities of our trained dictionaries to reconstruct other types of glitches. In addition, the presence of the binary neutron star signal allows us to analyse if it is affected by the deglitching.
The middle panel of Fig. 6 shows the results after applying one single dictionary of 256 samples, which is the dictionary that produces the best results in terms of reducing the contribution of both high and low frequencies. This is expected because the GW170817 glitch lasted significantly longer than the test cases discussed before. For the most part the glitch has been removed from the data and the actual chirp signal from the inspiraling neutron stars behind the glitch is recovered almost completely. Nevertheless, part of the glitch at fre- quencies ∼ 400 Hz remains after the first reconstruction. To improve the results, we apply a second dictionary, in this case composed by atoms of 16 samples. The resulting spectrogram is presented in the lower panel of Fig. 6. Using this second dictionary, the high-frequency contribution disappears while the chirp signal remains almost unaltered.

V. SUMMARY
We have investigated the application of learned dictionaries to mitigate the effect of noise transients in the data of GW detectors. Although the data show the presence of many families of glitches, each with a different morphology and timefrequency shape, we have focused on "blip" glitches because they are the most common type of glitches found in the twin LIGO detectors and their waveforms are easy to identify. This paper is the first step in our ongoing program to automatically classify and subtract all families of glitches employing variational methods.
Our approach is based on the combination of two different variational techniques based on the L 1 norm, namely the Rudin-Osher-Fatemi method [28] and a Dictionary Learning method [27]. We have randomly selected 80 blip glitches scat-tered in the data from advanced LIGO's O1 run. The data corresponds to a window of 1 s centred at the GPS time of each glitch as provided by Gravity Spy [14]. 85% of the glitches have been used to train the dictionary while the other 15% have been employed as examples to test the performance of the algorithm. The test set has included 16 blip glitches. In our approach we have incorporated a regularized ROF denoising step before the training step to obtain a smooth dictionary, which has proved to be more effective to model and subtract the glitches from the data.
The determination of a good value of the regularization parameter λ is not a trivial task. In this paper we have deviated from our previous works where the optimal value of λ could be determined by comparison with an analytical or numerical template [18,[29][30][31]. To preserve the background as unaltered as posible, we find the first value of λ that only produces a reconstruction of the glitch and returns zeros for the surrounding background. This procedure has the advantage that λ is calculated only from the data and does not require any systematic search using templates. Our results have shown that this approach is valid to model and subtract most of the contribution of the blip glitches in all cases analyzed. They also have revealed, however, that the high-frequency component of the blips is not completely removed. This issue has been ameliorated by using a combination of dictionaries with different atom length, one much shorter than the other. This seems to be the best strategy to mitigate the blips at all frequencies. We have also shown that our approach yields satisfactory results when applied to the GW170817 glitch [4]. Learned dictionaries trained with a different set of glitches can remove the GW170817 glitch from the data without affecting the actual signal from the binary neutron star inspiral.
In order to eventually employ this approach in a low-latency pipeline in actual GW detectors, it would be necessary to improve the computational performance of the method. The current version of the code is implemented in MATLAB and not optimized. On average, the amount of time to process 1 s of data sampled at 8192 Hz is ∼ 1.5 s using a MacBook Pro with a 2,3 GHz Intel Core i5 processor and 16 Gb of RAM memory using 4 cores. We note, however, that the reconstruction can be trivially parallelised since each segment can be processed independently of the others. Therefore, there is still a considerable margin for improvement in execution time, developing an optimized and compiled version of the code. In the near future, we plan to extend the work initiated in [18] to account for other families of glitches and use classification algorithms as a previous step to the glitch subtraction procedure presented in this work.