Bayesian approach to radio frequency interference mitigation

Interfering signals such as Radio Frequency Interference from ubiquitous satellite constellations are becoming an endemic problem in fields involving physical observations of the electromagnetic spectrum. To address this, we propose a novel data cleaning methodology. Contamination is simultaneously flagged and managed at the likelihood level, modelled in a Bayesian fashion through a piecewise likelihood constrained by a Bernoulli prior distribution. The techniques described in this paper can be implemented with just a few lines of code.


I. INTRODUCTION
Satellite constellations in low Earth orbit such as SpaceX's Starlink will likely number 100,000 by 2030 [1]. Described in a recent Nature article as 'horrifying' [2], the impact of Radio Frequency Interference (RFI) created by these 'megaconstellations' on Astronomy is of significant concern. Interfering signals like RFI are problematic because they cause information in contaminated frequency channels to be lost, which can lead to significant systematic error if not properly modelled.
The frequency bands in which these satellites and other modern telecommunications devices operate are not protected or reserved for Astronomy. They clash with current and next-generation experiments such as the ngVLA [3] and the SKA [4], which plan to probe as-yet-unseen epochs of the universe. Furthermore, the latest telescopes observe large regions of the sky across wide frequency bands, making it impossible to avoid these satellite swarms. This is pushing Astronomy to remote corners of the Earth and beyond, to space [5].
However, moving projects to space is not a long-term solution. There is only a short time window until the RFI-quiet far side of the Moon becomes contaminated by satellites and other devices required by projects such as LuSEE [6] and the LCRT [7], which themselves hope to exploit the clear lunar skies. Furthermore, space itself is filled with cosmic rays, which lead to RFI-like interference and have been problematic for the JWST's near-infrared spectrometer [6].
Consequently, RFI is becoming the fundamental bottleneck in modern Astronomy and beyond, with modern observations being at the forefront of many fields of fundamental Physics, such as dark matter detection. New statistical techniques are therefore urgently required to address this rapidly growing problem.
Furthermore, projects such as the Event Horizon Telescope [8], Planck [9] and BICEP [10] all used Bayesian statistics in their data analysis pipelines, but there is currently no way to manage RFI in a Bayesian sense, forcing astronomers to patch traditional RFI mitigation algorithms into their Bayesian systems. Modern projects implement Singular Value Decomposition [11], watershed segmentation [12], Deep Learning methods [13-15] and, more recently, Gibbs sampling [16].
Scientists are constantly searching for ever more elusive and faint signals, so data cleaning is becoming increasingly important. The Laser Interferometer Gravitational-wave Observatory (LIGO) [17], for example, is sensitive enough to detect a car starting miles away and is thus extremely sensitive to data corruption [18]. The next-generation aLIGO [19] will be even more so. As such, there is a need for novel data cleaning methodologies beyond fields involving measurements of the electromagnetic spectrum.
In this paper we propose a data cleaning methodology that takes a Bayesian approach, where contamination is both flagged and managed at the likelihood level. Detailed benchmarking of this technique will be provided when it is applied to data from a next-generation low-frequency global experiment, REACH [20], which will see first light this year. A usage example of the methodologies described in this work can be found at github.com/samleeney/publications.

II. THEORY

A. Bayesian Inference
Bayesian methods can be used to perform parameter estimation and model comparison. A model M uses data D to infer its free parameters θ. Using Bayes' theorem,

$$\mathcal{P}(\theta|D, M) = \frac{\mathcal{L}(D|\theta, M)\,\pi(\theta|M)}{\mathcal{Z}(D|M)},$$

the prior π is updated onto the posterior P in light of the likelihood L, and furthermore the Bayesian Evidence Z can be inferred by computing the integral

$$\mathcal{Z} = \int \mathcal{L}(D|\theta, M)\,\pi(\theta|M)\,\mathrm{d}\theta.$$

In practice, P and Z can be determined simultaneously using a Bayesian numerical solver. We use the Nested Sampling algorithm polychord [21], in which a series of points generated within π are updated such that they sequentially contract around the peak(s) of the likelihood, forming the posterior, which can be used to generate parameter estimates. The artefacts of this process can then be used to compute Z, which is used for model comparison. For a more detailed description of Bayesian Inference and Nested Sampling see [22, 23].
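For concreteness, the quantities above can be illustrated with a minimal Python sketch that evaluates the posterior and evidence for a toy one-parameter Gaussian-mean model on a grid (rather than with Nested Sampling, which is what is actually used in this work); all names and values are illustrative, not drawn from any pipeline.

```python
# Grid-based illustration of Bayes' theorem: posterior P = L * pi / Z and
# evidence Z = integral of L * pi, for a one-parameter Gaussian-mean model.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=50)            # D: noisy draws around mu = 2
mu_grid = np.linspace(-5, 10, 2001)             # theta: candidate means
dmu = mu_grid[1] - mu_grid[0]
prior = np.full_like(mu_grid, 1.0 / 15.0)       # pi: uniform over the grid range

# log L(D | mu), assuming unit-variance Gaussian noise
loglike = np.array([-0.5 * np.sum((data - mu) ** 2 + np.log(2 * np.pi))
                    for mu in mu_grid])

Z = np.sum(np.exp(loglike) * prior) * dmu       # evidence by simple quadrature
posterior = np.exp(loglike) * prior / Z         # P = L * pi / Z

print("log Z =", np.log(Z))
print("posterior mean =", np.sum(mu_grid * posterior) * dmu)
```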

B. Data cleaning likelihood
A sufficiently contaminated data point can be considered corrupted: any information relevant to the model is lost, and it can no longer be modelled as Gaussian noise. Assuming D is uncorrelated, the likelihood

$$\mathcal{L} = \prod_i \mathcal{L}_i, \qquad \mathcal{L}_i \equiv P(D_i|\theta), \tag{4}$$

where i labels the i'th data point, is insufficient to model such contaminated data. It is therefore necessary to model the likelihood that each data point is corrupted. Thus, we introduce a piecewise likelihood including the possibility of corruption of the data,

$$P(D_i|\theta, \varepsilon_i) = \begin{cases} \mathcal{L}_i & \text{if } \varepsilon_i = 1 \text{ (uncorrupted)}, \\ 1/\Delta & \text{if } \varepsilon_i = 0 \text{ (corrupted)}. \end{cases} \tag{5}$$

Corruption is modelled as the data becoming completely unreliable and therefore being distributed uniformly within some range ∆ (which, as a scale of corruption, has the same dimensions as the data). An efficient way to write this likelihood is

$$P(D|\theta, \varepsilon) = \prod_i \left[\varepsilon_i \mathcal{L}_i + \frac{1-\varepsilon_i}{\Delta}\right], \tag{6}$$

where the Boolean mask vector ε has an i'th component which takes the value 1 if datum i is uncorrupted and 0 if corrupted. We do not know before the data arrive whether or not they are corrupted. We may infer this in a Bayesian fashion, by ascribing a Bernoulli prior probability p_i of corruption (which is a dimensionless probability), i.e.

$$P(\varepsilon) = \prod_i p_i^{1-\varepsilon_i} (1-p_i)^{\varepsilon_i}. \tag{7}$$

Both ∆ and p_i are required for a dimensionally consistent analysis. It should be noted that above we assume the a-priori probability that each bin is corrupted is uncorrelated, i.e. $P(\varepsilon) = \prod_i P(\varepsilon_i)$, which in practice will almost certainly not be true. We will discuss later the extent to which this assumption can be considered valid. Multiplying Eqs. (6) and (7) yields

$$P(D, \varepsilon|\theta) = \prod_i \left[\varepsilon_i \mathcal{L}_i (1-p_i) + (1-\varepsilon_i)\frac{p_i}{\Delta}\right], \tag{8}$$

and to recover a likelihood independent of ε we can formally marginalise it out:

$$P(D|\theta) = \sum_{\varepsilon} P(D, \varepsilon|\theta) \tag{9}$$

$$= \sum_{\varepsilon_1=0}^{1} \cdots \sum_{\varepsilon_N=0}^{1} \prod_i \left[\varepsilon_i \mathcal{L}_i (1-p_i) + (1-\varepsilon_i)\frac{p_i}{\Delta}\right]. \tag{10}$$

This would require the computation of all 2^N terms in Eq. (10). For realistic values of N, this computation becomes impractical. However, if it is assumed that the most likely model (i.e. the maximum term in Eq. (10)) dominates over the next-to-leading-order terms, we can make the approximation

$$P(\varepsilon|D, \theta) \approx \delta_{\varepsilon\varepsilon_{\max}}, \tag{11}$$

where δ_{ij} is the usual Kronecker delta function, and ε_max is the mask vector which maximises the likelihood P(D, ε|θ), namely

$$\varepsilon_{\max} = \underset{\varepsilon}{\arg\max}\; P(D, \varepsilon|\theta). \tag{12}$$

Under this approximation we find that the sum in Eq. (10) becomes

$$P(D|\theta) \approx P(D, \varepsilon_{\max}|\theta). \tag{13}$$

In practice the approximation in Eq. (13) is only valid if the next-to-leading-order term is much smaller, such that

$$P(D, \varepsilon_{\max}|\theta) \gg \max_j P(D, \varepsilon^{(j)}|\theta), \tag{14}$$

where ε^(j) is ε_max with its j'th bit flipped:

$$\varepsilon^{(j)}_i = \begin{cases} \varepsilon_{\max,i} & i \neq j, \\ 1 - \varepsilon_{\max,j} & i = j, \end{cases} \tag{15}$$

and we may use Eq. (14) as a consistency check.
To summarise, under these approximations we can correct for contamination by replacing the original likelihood $\mathcal{L} = \prod_i \mathcal{L}_i$ in Eq. (4) with

$$P(D|\theta) \approx P(D, \varepsilon_{\max}|\theta) = \prod_i \left[\varepsilon_{\max,i}\, \mathcal{L}_i\,(1 - p_i) + (1 - \varepsilon_{\max,i})\,\frac{p_i}{\Delta}\right], \tag{16}$$

where ε_max is defined by Eq. (12).
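As a concrete illustration, this corrected likelihood can indeed be written in a few lines of code. Below is a minimal Python sketch assuming Gaussian noise on uncorrelated data; because the joint likelihood factorises per datum, the maximisation in Eq. (12) reduces to a per-datum comparison of the two branches. The function and argument names are illustrative and not taken from the repository linked above.

```python
import numpy as np

def loglike_corrected(y, y_model, sigma, log_p, delta):
    """Corrected log-likelihood of Eq. (16) and the maximising mask eps_max.

    y       : observed data (1D array)
    y_model : model prediction at the current parameters theta
    sigma   : Gaussian noise level
    log_p   : log prior probability that a datum is corrupted
    delta   : corruption scale Delta (range of the uniform distribution)
    """
    # Gaussian log-likelihood of each datum under the uncorrupted branch
    logL_i = -0.5 * ((y - y_model) / sigma) ** 2 - 0.5 * np.log(2 * np.pi * sigma ** 2)
    log_1mp = np.log1p(-np.exp(log_p))          # log(1 - p)

    keep = logL_i + log_1mp                     # datum kept:    log[L_i (1 - p)]
    drop = log_p - np.log(delta)                # datum flagged: log[p / Delta]

    eps_max = (keep > drop).astype(int)         # per-datum maximisation, Eqs. (12)/(18)
    logL = np.sum(np.where(eps_max == 1, keep, drop))   # Eq. (17)
    return logL, eps_max
```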

C. Computing the posterior
The posterior and evidence are computed from Eq. (16) via Nested Sampling (although any numerical Bayesian sampling method could be used). Taking logs for convenience gives

$$\log P(D|\theta) \approx \sum_i \left[\varepsilon_{\max,i}\left(\log\mathcal{L}_i + \log(1-p_i)\right) + (1-\varepsilon_{\max,i})\left(\log p_i - \log\Delta\right)\right], \tag{17}$$

yielding a masked chi-squared-like term which can be used to distinguish whether there is a statistically significant difference between the two classes of data, i.e. corrupted or non-corrupted. Furthermore, the second term in Eq. (17) introduces an Occam penalty: each time a data point is predicted to be contaminated, its likelihood is replaced with the penalty rather than being removed entirely. Without this term, the likelihood in which all data points are flagged would be larger and thus 'more likely' than all other possibilities, so flagging every datum would always be preferable.
We compute this by imposing the condition in Eq. (12) on Eq. (17) as follows,

$$\varepsilon_{\max,i} = \begin{cases} 1 & \text{if } \log\mathcal{L}_i + \log(1-p_i) > \log p_i - \log\Delta, \\ 0 & \text{otherwise.} \end{cases} \tag{18}$$

The corrected likelihood is then updated iteratively via the selected Bayesian sampling method, compressing the prior onto the posterior while simultaneously correcting for contamination. One may also notice that the condition log L_i + log(1 − p_i) > log p_i − log ∆ in Eq. (18) relates to a logit function, since it can be rearranged as

$$\log(\mathcal{L}_i\,\Delta) > \log\frac{p_i}{1-p_i} = \operatorname{logit}(p_i). \tag{19}$$

Logit functions are used routinely as activation functions in binary classification tasks, hinting at the potential of a future extension of this work using machine learning.
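A quick numerical sanity check of this rearrangement is sketched below; the values of p, ∆ and the per-datum log-likelihoods are arbitrary stand-ins.

```python
# Check that the flagging condition of Eq. (18) is equivalent to thresholding
# log(L_i * Delta) against the logit of p, as in Eq. (19).
import numpy as np

rng = np.random.default_rng(1)
logL_i = rng.normal(-3, 2, size=1000)       # assorted per-datum log-likelihoods
p, delta = 0.01, 50.0

lhs = logL_i + np.log1p(-p) > np.log(p) - np.log(delta)   # condition in Eq. (18)
rhs = logL_i + np.log(delta) > np.log(p / (1 - p))         # logit form, Eq. (19)
print(np.array_equal(lhs, rhs))                            # expect True
```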

III. TOY EXAMPLE
We will initially test this approach on a simple toy model with a basic contaminant signal injected. We then move on to a more realistic and complex case in Section IV.

A. Initial Testing
Two simple datasets are generated for comparison, each consisting of a line with m = 1 and c = 1 plus Gaussian noise with σ = 5. One is contaminated by an RFI-like signal and the other (the ground truth) is not. They are fit in a Bayesian sense, attempting to recover the two free parameters m and c using the correcting likelihood in Eq. (18) with

$$\log\mathcal{L}_i = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{\left(y_i - y_S(x_i; m, c)\right)^2}{2\sigma^2},$$

where θ = (m, c, σ), y_i is the simulated data and y_S(x_i; m, c) = m x_i + c is the model evaluated at the parameter estimates of the i'th sampling iteration. We set ∆ = D_max, to encapsulate the full range of possible data values. ∆ could likely be fit as a free parameter, as will be investigated further in future works.
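A minimal sketch of generating such a toy dataset is shown below; the x grid, the spike amplitude and its position are illustrative choices rather than the exact values used to produce the figures.

```python
# Toy setup: a straight line with Gaussian noise (sigma = 5) plus an injected
# RFI-like spike in one channel, and the model y_S(x; m, c).
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 100, 101)
m_true, c_true, sigma_true = 1.0, 1.0, 5.0

y_clean = m_true * x + c_true + rng.normal(0, sigma_true, size=x.size)
y_rfi = y_clean.copy()
y_rfi[40] += 300.0                      # RFI-like contamination (illustrative)

delta = y_rfi.max()                     # Delta = D_max, as in the text

def y_signal(x, m, c):
    return m * x + c                    # y_S(x_i; m, c)
```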
Evaluating the posterior on ε, we can assess how frequently across the entire sampling run each datum was believed to fit (non-corrupted) or not fit (corrupted) the model. Contaminated points should make up a near-zero fraction of the posterior. Conversely, points that are not contaminated will often fit the model and as such contribute significantly to the final posterior distribution. There can also be some points that lie somewhere in between, which the model is less confident are uncontaminated. This may occur for data that deviate the most from the true signal due to unusually large Gaussian noise excursions, for example.
It should be emphasised that although ε_i is constrained to binary values, the subsequent mask on ε is not. Unlike traditional RFI flagging algorithms, points are not classified in a binary manner; the mask takes the weighted mean of ε across the posterior. Thus, points more likely to contain RFI will have less 'impact' on the final posterior distribution than points believed to be uncontaminated. The mask could be thought of as being slightly opaque to these data points, accounting for the model's uncertainty. Incorporating the model's confidence in its correction directly into the subsequent parameter estimates makes this approach unique in comparison with its counterparts.
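The weighted-mean mask can be computed in one line from sampler output. The sketch below assumes one binary mask per (weighted) posterior sample; it illustrates the idea rather than the pipeline's implementation.

```python
# Posterior-weighted, non-binary mask: average each sample's binary eps_max
# over the weighted posterior samples, giving a value in [0, 1] per datum.
import numpy as np

def posterior_mask(eps_samples, weights):
    """eps_samples: (n_samples, n_data) binary masks from posterior samples.
    weights: posterior weights of those samples (e.g. from nested sampling)."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    return (w[:, None] * eps_samples).sum(axis=0)

# Tiny illustrative usage
eps_samples = np.array([[1, 1, 0], [1, 0, 0], [1, 1, 1]])
weights = np.array([0.5, 0.3, 0.2])
print(posterior_mask(eps_samples, weights))     # [1.0, 0.7, 0.2]
```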

B. Model Evaluation
The two aforementioned datasets are evaluated when fit using the likelihood capable of correcting for contamination and also when using a standard likelihood, which cannot natively account for contamination. This generates a total of four posterior distributions for comparison. Of these, all but the contaminated, uncorrected case would be expected to perform similarly if RFI has been effectively mitigated. From a Bayesian standpoint, the simplest model will always be preferable; thus, for the clean dataset it would be expected that the standard likelihood would be preferred slightly over the correcting likelihood. Fig. 1 shows the parameter distributions inferred from the data. The results are as expected; parameters in all but the 'RFI No correction' case are inferred to within 1σ of their true values.
As seen in Fig. 2, the model that does not correct for RFI is slightly preferred for uncontaminated data. Conversely, the correcting model is strongly preferred on the contaminated data, indicating that the correction is working as predicted.
Viewing the posterior plots of P(y|x, D) in Fig. 2, it is clear that when RFI is not corrected, the true parameter values are outside the 1σ and sometimes 2σ confidence bounds. Conversely, the other three cases fit almost entirely within the 1σ bounds, indicating that the RFI has been mitigated.

C. Evaluating the log p dependence
FIG. 2. Inferred parameter estimates shown as a contour plot of the reconstructed signal y against x, where darker tones indicate higher σ confidence in the parameter estimates. Generated from the dataset described in Section III A. The Bayes factor is −1.4 ± 0.3 for the no-RFI case and 25.0 ± 0.4 for the RFI case. The plots are generated using the functional posterior plotter fgivenx [25].

Proper selection of the probability thresholding term log p is essential. From a Bayesian standpoint it should be set to represent our prior degree of belief in there being RFI in each datum. We assess the log p dependence by varying log p and tracking the RMSE of the fit generated from the parameter estimates, the log Bayesian Evidence, and the mean number of points flagged across all samples.
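The sketch below illustrates how such diagnostics can be computed for a single log p run from weighted posterior samples; the sample arrays are random stand-ins rather than real sampler output.

```python
# Diagnostics for one log p value: RMSE of the reconstructed line and the
# posterior-weighted mean number of flagged points.
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 100, 101)
y_true = 1.0 * x + 1.0

# Stand-ins for sampler output: posterior samples of (m, c), weights, masks
m_s = rng.normal(1.0, 0.05, 500)
c_s = rng.normal(1.0, 0.5, 500)
w = rng.random(500)
w /= w.sum()
eps_s = rng.integers(0, 2, size=(500, x.size))

y_fit = np.average(m_s[:, None] * x + c_s[:, None], weights=w, axis=0)
rmse = np.sqrt(np.mean((y_fit - y_true) ** 2))
n_flagged = np.average((1 - eps_s).sum(axis=1), weights=w)
print(rmse, n_flagged)
```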
For high log p, the RMSE is high and we observe in Fig. 3 that the model generates less accurate parameter estimates. Here the threshold is so high that the model is more confident that any given point is RFI than not. This matches the corresponding low evidence. The RMSE drops to near its minimum as log p decreases. The model incorrectly flags ≈ 5 data points while the RMSE is low, showing that it is able to generate accurate parameter estimates whilst over-flagging, indicating that it is insensitive to false positives. As log p decreases further, the model becomes better able to distinguish genuine RFI from unusually large Gaussian noise excursions, and as such the average number of points predicted to be RFI approaches the true value. As this happens the evidence also reaches its maximum, indicating that the Bayesian Evidence appropriately reflects how well each of the many models created by different log p values fits the data.

A key assumption made in Eq. (13) is that the leading-order term, Eq. (17), is considerably larger than all the other possible terms for ε ∈ {0, 1}^N. It is necessary to test the validity of this approximation by computing Eq. (18) and comparing the result (P_max) with the next-to-leading-order term (P_NLO) as calculated by Eq. (14), for −5 < log p < −0.1. These results are displayed in the bottom pane of Fig. 3. P_max is 18 times larger than P_NLO at peak log Z and increases linearly for log p below this. Depending on the log p selection strategy, P_max is at least 11 times more likely than the next-to-leading-order term. Assuming an appropriate log p selection strategy, the ratio would be higher, indicating that P(D|θ) ≈ P(D, ε_max|θ) is valid.
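A minimal sketch of this consistency check is given below. It exploits the fact that the joint likelihood factorises per datum, so each single-bit flip of ε_max changes exactly one factor; names and values are illustrative.

```python
# Ratio check for the approximation in Eq. (13): compare the leading term
# P_max against the largest single-bit-flip term P_NLO (Eqs. 14 and 15).
import numpy as np

def nlo_log_ratio(logL_i, log_p, delta):
    log_1mp = np.log1p(-np.exp(log_p))
    keep = logL_i + log_1mp                   # kept-datum contribution
    drop = log_p - np.log(delta)              # flagged-datum contribution

    per_datum = np.maximum(keep, drop)        # contributions under eps_max
    log_P_max = per_datum.sum()

    # Flipping bit j swaps that datum's contribution for the other branch
    flipped = np.minimum(keep, drop)
    log_P_nlo = log_P_max + (flipped - per_datum).max()
    return log_P_max - log_P_nlo              # log(P_max / P_NLO) >= 0

# Illustrative usage with stand-in per-datum log-likelihoods
rng = np.random.default_rng(4)
logL_i = -0.5 * rng.normal(0, 1, 100) ** 2 - 0.5 * np.log(2 * np.pi)
print(nlo_log_ratio(logL_i, log_p=-2.7, delta=50.0))
```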

E. Selection Strategy for log p
Various selection strategies could be used to choose the optimal log p value. For each log p, the model changes; as such, selecting the log p that maximises the evidence seems the most obvious selection strategy. In the case of the toy model, the peak log Z occurs at log p = −2.7, where P_max is 18 times larger than P_NLO. Another possible strategy could be to select the log p at which the number of points flagged is at its minimum.
It is also possible to ascribe a prior to log p, fitting it as a free parameter, thus fully automating the approach. This will be examined further in future works.

IV. REACH EXAMPLE
Finally, we examine a real use case for this method: the REACH [20] radio telescope, designed to detect the 21cm signal from the Cosmic Dawn. We select REACH as a testing ground for our methodology because it operates in the same low-frequency ranges as the next generation of Astronomical experiments, such as the ngVLA and the SKA, and takes a Bayesian approach to data analysis [26]. The 21cm signal is expected to take the shape of an inverted Gaussian, so the model takes the form

$$y_S(\nu; A, \mu, \sigma) = -A \exp\left(-\frac{(\nu - \mu)^2}{2\sigma^2}\right),$$

with centre frequency μ, standard deviation σ and magnitude A all free parameters. The four cases discussed in Section III are then examined, this time on a simulated sky dataset containing a 21cm signal with two RFI spikes injected. The No RFI Correction and ground truth cases are very similar, with the simpler (ground truth) case marginally preferred as expected. The RFI Corrected case is again similar, with a slightly lower evidence due to the penalties incurred during the corrections. This is also evident when viewing Fig. 4; the reconstructed signal is within 2σ of the true signal for all but the RFI No Correction case.
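For concreteness, a minimal sketch of such an inverted-Gaussian model is shown below; the frequency band and parameter values are illustrative and not those used in the REACH pipeline.

```python
# Inverted-Gaussian 21cm absorption model with free parameters A, mu, sigma.
import numpy as np

def t21_model(nu, A, mu, sigma):
    """Inverted Gaussian with magnitude A, centre frequency mu and width sigma."""
    return -A * np.exp(-0.5 * ((nu - mu) / sigma) ** 2)

nu = np.linspace(50, 150, 401)          # illustrative frequency grid (MHz)
signal = t21_model(nu, A=0.15, mu=80.0, sigma=10.0)
```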
V. CONCLUSIONS
In this paper, which serves as a proof of concept, we show that contamination can be both flagged and corrected in a fully Bayesian sense at the likelihood level. We demonstrate our general approach in the context of signal processing for Astronomy, but these methods will likely be useful beyond it. Forthcoming results from current state-of-the-art low-frequency Astronomy experiments [27] (where these methods will be benchmarked in the coming months) and extensive testing of this approach as a general Bayesian data cleaning methodology will provide a more detailed analysis in the future.

FIG. 1. Parameter distributions inferred from the dataset described in Section III A. The top-left to bottom-right panes show probability distribution functions for m, c and σ, respectively. Plots generated using the posterior plotting tool anesthetic [24].

FIG. 3. Assessing how various methods of model evaluation vary as a function of log p. From top to bottom: the RMSE, the log of the Bayesian Evidence, the weighted average number of points flagged, and finally the ratio between P_max and P_NLO. All dependent variables (excluding log Z) are averaged over the weighted posterior samples. The noise observable in these plots is sampling noise; the noise in the simulated data is seeded.

FIG. 4. Results when the RFI correction is applied to simulated data in the REACH data analysis pipeline. The Bayes factor is −0.6 ± 0.6 for the no-RFI case and 9e7 ± 0.6 for the RFI case.