Nonlinear Mode Decomposition: a new noise-robust, adaptive decomposition method

We introduce a new adaptive decomposition tool, which we refer to as Nonlinear Mode Decomposition (NMD). It decomposes a given signal into a set of physically meaningful oscillations for any waveform, simultaneously removing the noise. NMD is based on the powerful combination of time-frequency analysis techniques - which together with the adaptive choice of their parameters make it extremely noise-robust - and surrogate data tests, used to identify interdependent oscillations and to distinguish deterministic from random activity. We illustrate the application of NMD to both simulated and real signals, and demonstrate its qualitative and quantitative superiority over the other existing approaches, such as (ensemble) empirical mode decomposition, Karhunen-Loeve expansion and independent component analysis. We point out that NMD is likely to be applicable and useful in many different areas of research, such as geophysics, finance, and the life sciences. The necessary MATLAB codes for running NMD are freely available at http://www.physics.lancs.ac.uk/research/nbmphysics/diats/nmd/.


Introduction
In real life, measured signals are often composed of many different oscillations (or modes), with complex waveforms and time-varying amplitudes and frequencies. These oscillations contain a great deal of valuable information about the originating system and they are therefore worthy of very careful study. Their properties can be used e.g. to predict earthquakes from geophysical signals [1], or to assess health from signals generated by the human body [2,3]. However, to gain access to all of the properties of a particular mode, it should be first extracted from the signal. For a reliable analysis, therefore, one should decompose the signal, i.e. recover the individual oscillations that are present in it, separating them both from each other and from the inevitable background of noise.
The flaws mentioned greatly restrict the applicability of the approaches currently in use, so that only for a very small class of signals can the decomposition be carried out successfully. To overcome these limitations, we now introduce a new method -Nonlinear Mode Decomposition (NMD) -which is free from all the drawbacks considered above. It is based on the powerful combination of the time-frequency analysis techniques reviewed/developed in [12,24,19], surrogate data tests [26,27,28,29], and the idea of harmonic identification [30]. NMD has proven to be extremely noise-robust, and it returns only the physically meaningful oscillations with any waveforms, at the same time removing the noise. Furthermore, we develop a set of criteria using which almost all NMD settings can be adapted automatically to the signal, greatly improving its performance and making it a kind of super-adaptive approach. These features make NMD unique, providing many significant advantages over other existing approaches, as we demonstrate below on both simulated and real data.
The structure of the work is as follows. In Sec. 2 we summarize the notation used and review the necessary background. The basic idea of the NMD and its implementation are considered in Sec. 3. In Sec. 4 we introduce some important upgrades to improve the NMD performance, while the choice of its parameters is considered in Sec. 5. We demonstrate the power of the method by applying it to simulated signals in Sec. 6, and to real signals in Sec. 7. We conclude in Sec. 8.

Main notation
Given some function f (t), we will denote its Fourier transform, positive frequency part, timeaverage and standard deviation asf (ξ), f + (t), f (t) and std[ f (t)], respectively: where T denotes the overall time-duration of f (t) (in theory T → ∞). Next, the AM/FM component (or simply component) is defined as the function of time t having the form x(t) = A(t) cos φ(t), (A(t) > 0, ν(t) = φ (t) > 0 ∀t), (2.2) where A(t), φ(t) and ν(t) are respectively the instantaneous amplitude, phase and frequency of the component. We restrict the consideration to components satisfying the analytic approximation because, otherwise, the amplitude and phase are not uniquely defined and cannot in principle be recovered reliably by time-frequency analysis methods [12]. For a more detailed discussion of the definitions of A(t), φ(t), ν(t) and of some related issues see e.g. [31,32,33]. In real cases, a signal usually contains many components of the form (2.2). Moreover, some of them typically do not correspond to an independent activity, but arise due to complex waveform of a particular mode to which they are related (see below). This is because real oscillations are rarely purely sinusoidal, but have more complicated shapes as the result of nonlinearities in the generating system and/or the measurement apparatus. For example, the AM/FM component 3 (2.2) raised to the third power [A(t) cos φ(t)] 3 = 3 4 A 3 (t)[cos φ(t) + 1 3 cos 3φ(t)] consists already of two components, although there is obviously only one meaningful oscillation.
It is therefore better to consider the full Nonlinear Modes, defined as the sum of all components corresponding to the same activity: h a h cos(hφ(t) + ϕ h ). (2.4) where v(φ(t)) = v(φ(t) + 2π) is some periodic function of phase (also called the "wave-shape function" [34]), which due to its periodicity can always be expanded in the Fourier series (2.4). Without loss of generality, we fix the normalization of A(t) and φ(t) in (2.4) by setting a 1 = 1, ϕ 1 = 0. Then the instantaneous phase and frequency of the whole NM are φ(t) and ν(t) = φ (t), respectively [34]. In what follows, the AM/FM components composing the NM will be referred to as harmonics, with the h th harmonic being represented by a term ∼ cos(hφ(t) + ϕ h ) in (2.4). We assume that the signal is composed of the NMs c i (t) of form (2.4) plus some noise η(t) (the class of noise will be considered later, in Sec. 3.5): (2.5) Our ultimate goal is then to extract all the NMs present in the signal and to find their characteristics, such as their amplitudes A(t), phases φ(t) and frequencies ν(t), as well as the amplitude scaling factors a h and phase shifts ϕ h of the harmonics.

Time-frequency representations
The time-frequency representation (TFR) of the signal is constructed by projecting it onto the time-frequency plane, which allows us to visualize and follow the evolution of the signal's spectral content in time. Depending on how the projection is constructed, there are many different types of TFRs. In our case, however, we will need those from which the components present in the signal can be extracted and reconstructed in a straightforward way. TFRs meeting this criterion are the windowed Fourier transform (WFT) G s (ω, t) and the wavelet transform (WT) W s (ω, t). All aspects of their use in practice, and our implementation of them, are discussed in [12] (see also the classical reviews and books [13,14,15,16,17]), to which we refer the reader for detail; here we just review briefly the minimal necessary background.
We choose a Gaussian window for the WFT and a lognormal wavelet for the WT: where f 0 is the resolution parameter determining the tradeoff between time and frequency resolution of the resultant transform (by default we are using f 0 = 1). The forms (2.7) appear to be the most useful ones in terms of the joint time-frequency resolution and the reconstruction of components [12]. Note, however, that all of the considerations that follow are applicable for anŷ g(ξ) andψ(ξ > 0) which are real, positive and unimodal around the single peak. The difference between the WFT and WT lies in the way they resolve the components present in the signal. The WFT has linear frequency resolution, i.e. distinguishes components on the basis of their frequency differences, whereas the WT has logarithmic frequency resolution, distinguishing components on the basis of ratios (or the differences between the logarithms) of their frequencies. As a result, the time resolution of the WT increases with frequency, so that the time variability of the higher frequency components is represented better than that for the components of lower frequency, while the time resolution of the WFT is the same at all frequencies.
In theory, the WFT and WT (2.6) depend on continuous ω, but in practice they are calculated at particular discrete ω k . The latter can be chosen as ω k = (k − k 0 )∆ω for the WFT and log ω k = (k − k 0 ) log 2 n v for the WT, reflecting the linear and logarithmic frequency resolution of the corresponding TFRs; the discretization parameters ∆ω and n v can be selected using the criteria established in [12]. Due to the finite time-length of real signals, all TFRs suffer from boundary distortions and, to reduce them, we use the predictive padding suggested in [12].

Nonlinear Mode Decomposition: basic idea and ingredients
The main goal of NMD is to decompose a given signal into a set of Nonlinear Modes (2.4). To do this, four steps are necessary: (a) Extract the fundamental harmonic of an NM accurately from the signal's TFR.
(b) Find candidates for all its possible harmonics, based on its properties.
(c) Identify the true harmonics (i.e. corresponding to the same NM) among them.
(d) Reconstruct the full NM by summing together all the true harmonics; subtract it from the signal, and iterate the procedure on the residual until a preset stopping criterion is met.
These individual sub-procedures are explained in detail in the Sections below. Note that NMD can be based either on the WFT, or on the WT, although in what follows (Sec. 5.2) we will combine them to make the procedure more adaptive. Nevertheless, for the time being we assume that we have selected one of these TFR types and that we will use it for all operations.

Component extraction and reconstruction
Provided that the TFR has sufficient resolution in time and frequency to resolve components that are close in frequency and, at the same time, to portray reliably their amplitude/frequency modulation, each component will be represented by a unique sequence of TFR amplitude peaks (also called ridge points). We will call such a sequence a ridge curve, and denote it as ω p (t); an illustrative example showing the WFT and WT of a signal and the ridge curves of its components is presented in Fig. 1. Note that oscillations with a non-sinusoidal waveform will appear as two or more components in a TFR, consisting of the fundamental and its harmonics. Figure 1: (a) The central 50 s section of the signal s(t) specified at the top, which is composed of three oscillations corrupted with white Gaussian noise of standard deviation equal to 0.5. The oscillations are: 1) near 10 Hz, with frequency modulation (chirp); 2) near 6 Hz, with amplitude modulation; 3) near 1 Hz, with both modulations and a non-sinusoidal waveform. (b),(c) Respectively, the central parts of the WFT and WT of the signal s(t) shown in (a), where the ridge curves related to each oscillation are shown by solid lines of the corresponding colors; the third oscillation is represented by two curves because of its complex waveform, while the apparent distortions in the ridge frequency profiles are due to noise. The signal was sampled at 100 Hz for 100 s.
To extract particular a component from a TFR, one needs first to find its ridge curve. This is a non-trivial task, because in real cases it is not always clear which sequence of peaks corresponds to which component, and which peaks are just noise-induced artifacts. The procedures for ridge curve extraction are discussed and developed in [19]. By default we use here scheme II(1,1) from the latter, as it appears to be very universal in applicability. In short, in the case of WFT we select the ridge curve ω p (t) as being that sequence of peaks (among all those that are possible) which maximizes the path functional 2 ] 1/2 denote the time-average and standard deviation, respectively. In the case of the WT, one uses the same functional (3.1), except that now all is taken on a logarithmic frequency scale (ω p → log ω p , ∆ω p → ∆ log ω p ). The approximate optimal ω p (t) in terms of (3.1) can be found in O(N) steps as described in [19].
Having found the ridge curve ω p (t), the amplitude A(t), phase φ(t) and frequency ν(t) ≡ φ (t) of the corresponding component x(t) = A(t) cos φ(t) can be reconstructed by either of two methods, ridge and direct [12]. The ridge method utilizes only the TFR values at the peak, and the corresponding formulas read , where δν d (t) and δ log ν d (t) are the corrections for discretization effects. Thus, in theory one has a continuous frequency scale, so that the peak positions ω p (t) are exact. In practice, however, TFRs are calculated at the discrete values ω k , thereby introducing errors into the parameter estimates.
To reduce these errors, one can use quadratic interpolation to better locate the peak positions (that are also used subsequently to improve the accuracy of the amplitude estimates in (3.2)), with the corresponding correction terms being where k p (t) denote the discrete index of the peak at each time: Another way is to reconstruct the component from the full time-frequency region into which it is mapped for a given TFR. We call this region the time-frequency support and denote it as [ω − (t), ω + (t)]. It can be defined as the widest region of unimodal and non-zero TFR amplitude around the ridge point ω p (t) at each time [12]. Thus, at ω ± (t) the TFR amplitude either becomes zero or starts growing with increasing distance from the peak |ω − ω p (t)|. The parameters of the component can be reconstructed from its time-frequency support as direct[WFT]: (ω g = 0 for symmetricĝ(ξ), e.g. Gaussian window),

direct[WT]:
(3.4) The difference and relative performance of the two reconstruction methods were studied in [24]. When the component is perfectly represented in the TFR, i.e.
[ω − (t), ω + (t)] contain all of its power and no side influences, direct reconstruction is by definition exact (up to the error of the analytic signal approximation, see [12]), while the ridge estimates contain an error proportional 7 Frequency (Hz) TFR amplitude hν (1) (t 2 ) Figure 2: Illustration of the extraction of the h th harmonic ridge curve ω (h) p (t) based on the fundamental instantaneous frequency ν (1) (t). At each time, starting from the expected harmonic frequency hν (1) (t) (blue points), one climbs (i.e. follows in the direction of TFR amplitude increase) to the nearest peak, which is then assigned to ω (h) p (t) (red points).
to the strength of the amplitude and frequency modulation of the component. However, noise and interference between the components introduce large errors into the direct estimates, whereas the ridge estimates are more robust in the face of such effects. The best choice between the two therefore depends on the amount of noise and characteristic amplitude/frequency variation of the component to be extracted. For the time being, to avoid complications, let us assume that we reconstruct everything by a particular method, e.g. direct, while its adaptive choice will be discussed later in Sec. 5.3.

Harmonics: extracting the candidates
The component extracted in the previous step will generally represent a harmonic of some Nonlinear Mode. For simplicity, however, in this and the next subsection we assume that it corresponds to a fundamental, i.e. the first, harmonic (we will get rid of this assumption in Sec. 3.4). Then, given its instantaneous frequency ν (1) (t), reconstructed by (3.2) or (3.4), the harmonics for h = 2, 3, ... should be located at frequencies hν (1) (t). Therefore, the ridge curve ω (h) p (t) corresponding to the h th harmonic can be extracted simply as the sequence of peaks which are located in the same time-frequency support (region of unimodal TFR amplitude at each time) as hν (1) (t); or, in other words, the sequence of peaks nearest to hν (1) (t) in the direction of TFR amplitude increase. This is illustrated in Fig. 2. Having found ω (h) p (t), the parameters of the harmonic A (h) (t), φ (h) (t), ν (h) (t) can be reconstructed as discussed in the previous section.

Harmonics: identifying the true ones
In general, the procedure of the previous subsection yields what is not necessarily a genuine harmonic, but just a candidate for one. Thus, even if the NM does not contain a particular harmonic, one will still obtain some signal for it, consisting of noise or components lying near the frequency of the putative harmonic. Hence, having extracted the h th harmonic candidate, we must next determine whether it is a true harmonic or not. To tackle this problem, we use the method of surrogate data [26,27], testing against the null hypothesis of independence between the first harmonic and the extracted harmonic candidate. Thus, we first select a measure to quantify the degree of dependence between the dynamics of two harmonics, and we then calculate it for the original harmonic and for a collection of surrogates -a specially constructed time-series consistent with the null hypothesis being tested. If the original value of the selected measure lies outside the distribution of its surrogate values, this indicates genuine interdependence and the harmonic is regarded as true; otherwise, it is discarded as false.
The amplitude, phase and frequency dynamics of a true harmonic should depend on those for the first harmonic (fundamental) as A (h) (t)/A (1) A,φ,ν ∈ [0, 1] quantifying the degree of consistency with these laws (0 -no consistency, 1 -full consistency), i.e. the dependence of the parameters of the h th harmonic on the corresponding parameters of the first one: (3.5) The overall measure of interdependence between the harmonics can then be taken as with the parameters w A,φ,ν giving weights to each of the consistencies q (h) A,φ,ν . By default, we use ρ (h) ≡ ρ (h) (1, 1, 0), giving equal weights to the amplitude and phase consistencies, and no weight to the frequency consistency. The latter is because the procedure of harmonic extraction (see Sec. 3.2), being based on the instantaneous frequency of the first harmonic, in itself introduces a dependence of ν (h) (t) on ν (1) (t), so it is better not to base any conclusions on it.
Ideally, for true harmonics one should have q (h) A,φ,ν = 1, but noise, finite frequency and time resolutions, and the interference with the other components all introduce errors. In reality, therefore, the consistencies will be smaller than unity even for true harmonics. Hence one cannot identify harmonics based only on the value of ρ (h) but also needs to use the surrogate test. We employ the idea of time-shifted surrogates [28,29], using as a first harmonic surrogate its timeshifted version and as the other harmonic surrogate the corresponding candidate harmonic extracted from the time-shifted TFR. Such time-delay destroys any temporal correlations between the signals while preserving all other their properties, thus creating surrogates consistent with the null hypothesis of independence.
Given the maximal time-shift (in samples) M, the surrogate parameters for the first harmonic are taken as shifted on ∆T d /2 backwards where f s is the signal sampling frequency and N is its total length in samples (note, that the length of the surrogate time series is smaller than the original ones, being N − M). Using ν (1) d (τ) (3.7) as a reference profile, the surrogate h th harmonic and its parameters are extracted in the same way as described in Sec. 3.2, but from the signal's TFR shifted on ∆T d /2 forward in time (G(ω, τ + ∆T d /2) or W(ω, τ + ∆T d /2)).
The extraction of the corresponding curves for all surrogates can greatly be accelerated by initial preprocessing of the TFR, constructing its skeleton [24] at the beginning and utilizing it for each surrogate. Thus, at each time t we break the TFR into the regions of unimodal TFR amplitude [ω −,m (t), ω +,m (t)], m = 1, 2, . . . , N p (t), and from them reconstruct by (3.2) or (3.4) 9 the corresponding amplitudes A m (t), phases φ m (t) and frequencies ν m (t) (which form the TFR skeleton). Then what is left for each surrogate is to find the consequence of the indices m d (t) of the supports in which the expected harmonic frequency lies and to take the corresponding parameters: Remark 3.1. Alternatively, one can construct the surrogates by shifting the already reconstructed parameters of the first and h th harmonic relative to each other, without re-extracting them for each surrogate. In fact, we have found that such an approach gives almost the same results and at the same time is faster. However, as mentioned before, the original procedure of harmonic extraction introduces the correlation between ν (h) (t) and ν (1) (t), and can therefore introduce dependence between other parameters, such as phases. Therefore, to be rigorous, the procedure should be repeated for each surrogate, thus automatically taking into account possible bias of this kind and producing more reliable estimates of the significance.
Summarizing, we calculate the amplitude-phase consistencies ρ (h) d=1,...,N d (1, 1, 0) (3.6) for the surrogate parameters (3.7), (3.8) and compare them with the value ρ (h) 0 (1, 1, 0) calculated in the same way but for the zero time shift ∆T 0 = 0. The probability measure (although mathematically not the true probability) that the extracted h th harmonic curve is a true harmonic of the main one is then quantified by the significance of the surrogate test, i.e. by the relative part of surrogates for which ρ (h) d < ρ (h) 0 . For example, if we found 1000 surrogate amplitude-phase consistencies ρ (h) d=1,..,1000 and 792 of them are smaller than the original value ρ (h) 0 , then the rough probability that the extracted curve represents a true harmonic is 79.2%. We regard a harmonic as true if the probability calculated in this way is ≥ 95%. By default we use N d = 100 surrogates and a maximum time shift that equals quarter-length of the signal M = N/4, so that the surrogates are of N − M = 3N/4 length each.
Note that the significance of the surrogate test does not depend on the magnitude of ρ (h) (3.6). Thus, there might be an independent component located at the frequency of a possible harmonic, so that the amplitude-phase consistency would be high but, because it does not fully adjust its amplitude and phase to that of the fundamental harmonic, the surrogate test will not reject the null hypothesis of independence (e.g. see below, Table 1). Thus, the possibility of picking up a spurious component that is nearby in frequency is largely eliminated.
Remark 3.2. Since we test many harmonic candidates, it is natural to expect that sometimes we will encounter false positives, i.e. the surrogate test will regard as true a harmonic which is actually false. Thus, for some noise realizations the parameters of the TFR around the expected harmonic frequency might indeed appear to be correlated with the parameters of the fundamental harmonic. However, in such cases the extracted harmonic candidate will usually have quite small amplitude-phase consistency ρ (h) (3.6). To reduce the probability of false positives, therefore, we introduce some threshold ρ min and regard the harmonic as true only if it both passes the surrogate test and at the same time is characterized by ρ (h) ≥ ρ min . Empirically, we set this threshold as where w A , w φ are the weightings used in (3.6) (as mentioned, we use w A = w φ = 1); the value (3.9) was found to be quite universal and to work well in most cases. Note, that for a true harmonic one can also have ρ (h) < ρ min , but this will usually mean that it is badly damaged by noise or other influences and cannot be recovered without large errors.

Harmonics: practical issues
Extracting in order. To improve the accuracy of reconstruction, each harmonic which is identified as true should be subtracted from the signal before extracting and testing the next one. This decreases the errors related to interference between the harmonics and makes all procedures more accurate. The same consideration applies to the first harmonic which, after being found by the methods described in Sec. 3.1, should be subtracted from the signal before extraction of any of the other harmonics.
How many harmonics to extract? Clearly, the maximum number of harmonics one can extract in principle is h max = ( f s /2)/ ν (1) (t)/2π , where f s is the sampling frequency of the signal (so that f s /2 is the maximally achievable, Nyquist frequency) and ν (1) (t) denote the extracted instantaneous frequency of the first harmonic. However, checking all harmonics might be computationally very expensive, and is often not needed. In practice, it is better to stop the search after some chosen number of sequential harmonics has been identified as false, making it likely that there are no more true harmonics. We choose this number to be S = 3 by default.
What if the extracted component is not the first harmonic? Although intuitively the first harmonic should have the highest amplitude (and will therefore be extracted as the dominant curve), for some complicated waveforms (e.g. with sharp peaks) this might be untrue. Therefore, before starting extraction of the harmonics, one should first ensure that one is starting with the first harmonic; and, if not, then find it and switch to it. To do this, we apply the same procedure described for harmonic extraction and identification, but in the reverse direction, i.e. using h = 1/2, 1/3, 1/4, .... Then if some of these are identified as true, we switch to the one with the smallest frequency and start from there. The minimal h one can go for can be set as h min = (1/T )/ ν (1) (t)/2π , although the statistics will be bad already for h < 5h min , when the related oscillation has less than 5 cycles over the whole signal time-duration [12]. Nevertheless, it is better to stop after S 0 = 3 consecutive 1/n harmonics have been identified as false, in the same manner as is done for the usual harmonics.

Stopping criterion
Once all harmonics are identified and reconstructed, they are summed up into the Nonlinear Mode, which is then subtracted from the signal and the procedure is repeated on the residual. The natural question arises, therefore, of how many Nonlinear Modes to extract, i.e. when to stop the decomposition. Obviously, decomposition of any noise, e.g. white, Brownian, any other correlated or not, does not make much sense, so the reasonable goal is to extract all oscillatory components present in the signal and leave the noise and trends as the residual. Therefore, after extraction of each NM one needs to decide whether what is left contains any more meaningful oscillations, in which case one continues the decomposition, or whether it just represents noise, in which case one should stop. The problem is thus reduced to distinguishing deterministic from random dynamics.
To solve it, one can use the surrogate test against the null hypothesis of linear noise [26,27], which includes white and colored noises (e.g. Brownian). The surrogates for this task, called FT surrogates, are constructed by taking the inverse Fourier transform of the signal's FT with randomized phases of the Fourier coefficients: s s (t) = (2π) −1 [ŝ(ξ)e iϕ s (ξ) ]e iξt dξ, where ϕ s (ξ) = −ϕ s (−ξ) denote the phases taken at random uniformly on [0, 2π] for each frequency ξ > 0. The reason for this is that any linear noise (an ARMA process denotes Gaussian white noise of unit variance) is characterized only by the amplitudes of the Fourier coefficients. Randomization of the Fourier phases preserves the power spectrum, so that the surrogate time series will represent another realization of the same random process if the original time series is noise, thus being consistent with the tested null hypothesis. On the other hand, if some meaningful NMs are present in the signal, the randomization of the phases will destroy the particular phase relationships responsible for the amplitude and frequency modulations, making their behavior less deterministic. Remark 3.3. Instead of this stopping criterion, one can use the conventional approach of stopping when the standard deviation of the residual is lower than some percentage of that for the original signal. Another way is to extract some predefined number of NMs, e.g. 10. However, the criterion based on the surrogate test seems to be the most physically meaningful.
One now needs to select the discriminating statistics, which is calculated for the original signal and the surrogates, so that the null hypothesis of linear noise is accepted if the original value lies within the surrogate values and rejected otherwise. The commonly used statistics involve one of the generalized Renyi dimensions [35,36,37], with the correlation dimension calculated by Grassberger-Procaccia approach [38,39,40] remaining the most popular choice. However, we find that the surrogate test based on such measures is extremely sensitive to noise, prohibiting their use in the NMD which is intended to be noise-robust.
Therefore, we need to devise another discriminating statistics. There are virtually no restrictions on its choice [41], so that almost any measure can be used. The only question to be asked is how powerful it is, i.e. how good in distinguishing the deterministic dynamics from noise. Given that NMD is based on the WFT/WT, it seems reasonable to select statistics based on the properties of the time-frequency representation obtained. Namely, since at the first step of the NMD process we extract the component from the signal's TFR (see Sec. 3.1), we can base our statistics on the properties of the components extracted (in the same way) from the original signal and from its FT surrogates. Thus, if the original component is true (and not just formed from noise peaks picked in the time-frequency plane), then it is expected to have more deterministic amplitude and frequency modulation than the surrogate components, which should be more stochastic; otherwise, there will be no difference.
The degree of order in the extracted amplitude A(t) or frequency ν(t) can be quantified by their spectral entropies Q[Â(ξ)], Q[ν(ξ)], so that the discriminating statistics D for the surrogate test can be taken as their combination Note that, in practice, due to the finite sampling frequency f s and sample length N of real signals, the integrals over frequency ξ in Q[Â(ξ)], Q[ν(ξ)] (3.10) are discretized into sums over the discrete FT frequencies ξ n = (n/N − 1/2) f s , n = 1, . . . , N.
In the present context, we have found the statistics D(α A , α ν ) (3.10) to be more meaningful and much more powerful than other choices (e.g. the popular correlation dimension [38,39,40]). This statistics is directly related to the quality of the representation of component in the signal's TFR, so that the significance of the surrogate test based on it reflects the proportion of the "deterministic" part in the extracted amplitude and frequency dynamics. Thus, if the residual does not pass the surrogate test (null hypothesis is not rejected), this might mean either that the residual is indeed noise, or that the component simply cannot be reliably extracted from the TFR (e.g. because resolution characteristics of the TFR are not appropriate to reliably represent the related amplitude/frequency modulation and/or to segregate the component from noise).
The power of D(α A , α ν ), i.e. its ability to distinguish deterministic from random dynamics, depends strongly on the complexity of the component's amplitude and frequency modulations: the lower the original spectral entropies Q[Â(ξ)], Q[ν(ξ)] are, the more powerful is the test. However, even in the (quite unrealistic) case when the signal contains a meaningful component without any amplitude or frequency modulation, i.e. a pure tone A cos νt, due to numerical issues [42,43] the surrogate test will still be quite powerful in rejecting the null hypothesis (unless this tone has an integer number of cycles over the time-length of the signal). The power of the test is also inversely proportional to the spread ofÂ(ξ),ν(ξ): the more concentrated they are, the narrower the frequency band that the component A(t) cos φ(t) occupies, so that the less the noise power that is contained in it.
As to the choice of α A , α ν in (3.10), we have found that the powers of D(1, 0) = Q[Â(ξ)] and D(0, 1) = Q[ν(ξ)] are often inversely proportional to the strengths of the amplitude and frequency modulation, respectively. Thus, D(1, 0) is preferable for components with relatively small amplitude variability and considerable frequency variability, while D(0, 1) is better otherwise. Although the procedure is not fully rigorous mathematically, we therefore perform three tests, using D(1, 0), D(0, 1) and D(1, 1), and then select the significance as the maximum among them. It remains to be established, however, whether some better statistics not having the drawback mentioned can be found.
Summarizing, we extract the components from the TFR of the original signal and compute the corresponding D 0 (α A , α ν ) (3.10); then we create N s FT surrogates of the signal, for each of them calculate the corresponding TFR, extract the component from it and compute the respective D s=1,...,N s (α A , α ν ). We use N s = 40 surrogates and set the significance level to 95%, rejecting the tested null hypothesis of noise if the number of surrogates with D s > D 0 is equal or higher than 0.95 × 40 = 38. The test is performed for three different (α A , α ν ) in (3.10), using D(1, 1), D(0, 1) and D(1, 0) as a discriminating statistic; if at least for one of them the null hypothesis is rejected, we regard the signal as not noise and continue the decomposition.
Remark 3.4. At the very start of each NMD iteration, the signal should be detrended, which we do by subtracting a third order polynomial fit from it. This is especially important for the surrogate test, as trends might greatly affect its performance, In particular, the FT surrogates should be constructed using an already detrended signal as a base, because otherwise one might end up with testing the trend against noise rather than the oscillatory component of interest.
Additionally, to guarantee that the boundary distortions are of the same nature in the original TFR and those for the surrogates, it is recommended to use padding with zeros [12]; the original D 0 should thus be recalculated using parameters of the component extracted from the TFR of the zero-padded signal. This issue, however, concerns only the surrogate test, while in all other cases we use the more accurate (but at the same time more computationally expensive) predictive padding [12].
Remark 3.5. Instead of the FT surrogates used here, one can alternatively utilize the more advanced IAAFT surrogates [44], which are devoted to testing against the null hypothesis of an invertibly rescaled linear stochastic process (e.g. Brownian noise taken to the third power). These surrogates preserve both the power spectrum and, as much as possible, the distribution of values of the original signal. However, as can be seen from (2.6), the WFT and WT do not 13 explicitly take into account the distribution of signal values, only its FT. Therefore, the simpler and easier-to-compute FT surrogates seem to be a more natural choice in our case, and we have found that the test developed above works well even for rescaled (either invertibly or noninvertibly), non-Gaussian and non-stationary linear noise. In fact, the outcome of the test seems to be determined primarily by the "goodness" of the component representation in TFR, which is affected only by the noise power and its distribution in the time-frequency region around the component's instantaneous frequency.

Improvements
The NMD as described in the previous section already represents a very powerful decomposition tool. However, it may be made even better with the help of the upgrades outlined below, although at the expense of greatly increased computational cost for some of them (though it still remains O(N log N)).

Adaptive representation of the harmonics
Even if the first harmonic of some NM is well represented in the TFR and can be accurately extracted and reconstructed, it does not mean that the same applies to all the other harmonics too. For example, harmonics of an NM with only amplitude modulation require the same time and frequency resolution, so that the WFT can represent them all well, while for the WT, where time and frequency resolution scale with frequency, one will need to adjust the resolution parameter f 0 for each harmonic. Thus, consider an NM with simple sinusoidal amplitude modulation: (4.1) Clearly, all harmonics have the same amplitude variability (in relative terms), so that the TFR time-resolution should also be the same for them. This is reflected in the fact that each harmonic x (h) (t) is composed of three Fourier terms, which all have identical amplitude ratios, frequency differences and phase relationships for each h. Furthermore, the frequency distance between two consecutive harmonics remains the same, meaning that the frequency resolution also should not be changed for harmonics. Therefore, the WFT, having constant time and frequency resolution, will be a perfect match for this case. This means, first, that if the amplitude modulation of the first harmonic is represented reliably in the WFT, then the same will also apply to all other harmonics and, secondly, that if two first harmonics do not interfere in the WFT, then any two harmonics will also be wellseparated. For the WT, the former is also true, as the time-resolution, i.e. the ability to reflect changes in time, increases with frequency for this type of TFR. However, the frequency resolution of the WT progressively worsens with the increase of frequency, so that the higher are the harmonics, the harder it is to resolve them. This is illustrated in Fig. 3, where from (b) and (c) it is clear that all harmonics can be well represented in the WFT, but for the WT higher harmonics begin to interfere. This issue is explained schematically in Fig. 3(d) and (e). Thus, both the WFT and WT can be seen as convolutions in the frequency domain of the signal with a windowĝ(ω − ξ) and waveletψ * (ω ψ ξ/ω), as seen from (2.6). Figs. 3(d) and (e) show the signal's discrete FTŝ(ξ) together with the (rescaled and centered at the mean frequencies of the harmonics ω/2π = 1, 2, 5, 6 Hz) window and wavelet FTsĝ(ω − ξ) andψ * (ω ψ ξ/ω) (2.7), with whichŝ(ξ) is convoluted while constructing the WFT and WT, respectively.
Roughly speaking, the quality of the representation of time-variability (the amplitude modulation in the present case) for each harmonic can be estimated based from the proportion of the corresponding Fourier coefficients lying below the shown window/wavelet FTs. For both the WFT and WT, each harmonic has three Fourier coefficients (4.1), all of which lie in the correspondingly shaded areas, meaning that the time-resolution in each case is sufficient to represent amplitude modulation appropriately. The degree of interference between the harmonics in the WFT and WT can be estimated from the area of overlap between the corresponding window/wavelet FTs. For the WFT (Fig. 3(d)) the interference between harmonics does not depend on their frequency, while for the WT (Fig. 3(e)) it increases for the higher harmonics, which results in the fifth and sixth harmonics being represented as a single component (Fig. 3(c)). This is because the minimal frequency difference between two harmonics is equal to the frequency of the first harmonic, thus being defined on a linear scale (which is natural for the WFT), but not on a logarithmic one (natural for the WT).
The situation changes when there is frequency modulation. In this case neither the WFT nor the WT provide optimal representation of the harmonics; in addition, some harmonics cannot in principle be reliably resolved with time-frequency analysis methods. Thus, consider an NM with simple sinusoidal frequency modulation where we have used the formula e ia sin φ = ∞ n=−∞ J n (a)e inφ with J n (a) = (−1) n J −n (a) denoting a Bessel function of the first kind. Since after some |n| all terms in (4.2) become negligible, one can in practice restrict the summation to |n| ≤ n J (ha), with the maximum non-negligible order n J being determined as where p denote a chosen accuracy threshold. The value of n J (a) increases with a, being (for As a result, the higher the harmonic, the larger the frequency range it occupies, i.e. the larger is the number of non-negligible terms in (4.2). Consequently, to reflect reliably the frequency modulation of the higher harmonics, one needs higher time-resolution for them, a requirement that is satisfied by the WT, but not by the WFT. This issue is illustrated in Fig. 4, where it can be seen that the WT can represent reliably both the first and third harmonics, while the WFT reflects appropriately only the first one. However, the increased time-resolution of the WT is provided at the expense of decreased frequency resolution, leading to stronger interference between harmonics, as seen from the previous case (Fig.  3). Figure 4 also shows, that in some cases it might in principle be impossible to represent reliably two harmonics in the TFR. Thus, as can be seen from Fig. 4(d) and (e), the FTsx (h) (ξ) of the sixth and seventh harmonics are "entangled", i.e. the frequency regions in which they are contained overlap. Therefore, unless specifically designed for this case, any window/wavelet function which picks the Fourier coefficients of one harmonic will inevitably also pick those corresponding to the other one too.
Remark 4.1. Interestingly, the (E)EMD [4,5] has logarithmic frequency resolution [45,46], and therefore suffers from the same drawbacks as the WT in relation to representation of harmonics. Thus, (E)EMD applied to the NM with amplitude modulation shown in Fig. 3 will decompose it into its first and second harmonics, but will merge the fifth and sixth ones into a single mode. Note also that, while for the WT the logarithmic frequency resolution can be tuned by changing the resolution parameter f 0 in (2.6), for (E)EMD the resolution is fixed around some implicit value [45,46].
Summarizing, for the general case where both amplitude and frequency modulation are present in the NM, accurate representation of higher harmonics requires higher time resolution, but the same frequency resolution. However, an increase in time resolution inevitably leads to a decrease of frequency resolution, so usually one needs to search for some compromise. Due to this, it is often the case that neither the WFT nor the WT with a constant resolution parameter f 0 in (2.7) can represent all harmonics reliably. To tackle this problem, one can adaptively adjust f 0 for each harmonic individually. Assuming that the extracted first harmonic is reconstructed well, the quality of the representation of the h th harmonic can be assessed through its consistency ρ (h) (3.6) with the first one. Therefore, the optimal resolution parameter for the h th harmonic, f (h) 0 , can be selected as that for which ρ (h) (3.6) achieves its maximum and at the same time the harmonic passes the surrogate test, i.e. is identified as a true harmonic.
The remaining question is the region within which to search for f (h) 0 . To find it, we first need is concentrated, given the frequency band of the first harmonic. We define ω (h) f and ∆ω (h) f as is the total energy of the harmonic and p denotes the chosen accuracy threshold (note that, as everywhere else, we assume that the analytic approximation (2.3) holds for x (h) (t) = A (h) (t) cos φ (h) (t)). Then it can be shown that, given ω (1) f , ∆ω (1) f , the frequency range for the h th harmonic will be It is clear from the preceding discussion that, for the simple AM mode (4.1), one has ω (h) f = hω (1) f , ∆ω (h) f = ∆ω (1) f . The same is true for any AM mode, as can be shown by representing the 17 amplitude modulation through the Fourier series A(t) = n r n cos(ν n t + ϕ n ) and expanding each harmonic in the same way as in (4.1). For the FM mode (4.2), the expression (4.5) follows from the fact that, at least for small enough p (e.g. 0.001) in the definition (4.3), one has n J (a) n J (ha) hn J (a), (4.6) for most a, as can be confirmed numerically. Then, taking into account (4.6) in the expansion (4.2), one obtains (4.5). For the h th harmonic of the NM with a more complicated doublesinusoidal frequency modulation one has x (h) (t) =A cos(hνt + ϕ + hr 1 sin(ν 1 t + ϕ 1 ) + hr 2 sin(ν 2 t + ϕ 2 )) =A Re e i(hνt+ϕ) Clearly, it can be viewed as a sum of components with singe-sinusoidal frequency modulation, but amplitudes AJ n (hr 1 ) and frequencies (hν + nν 1 ). The frequency range of each of these components, and the number of them with a non-negligible amplitudes, both scale with harmonic number h according to (4.5), (4.6). Therefore, the frequency range of the whole x (h) (t) (4.7) satisfies (4.5). Since frequency modulation can always be expanded in a Fourier series, using the same trick as with (4.7), it can be shown by induction that (4.5) holds for any FM mode. The generalization to the case of any joint amplitude and frequency modulation is then straightforward. Based on (4.5) and the scaling properties of the WFT/WT, one can now determine the region within which the optimal resolution parameter f (h) 0 for the h th harmonic lies. Given that the first harmonic was extracted from the TFR calculated with resolution parameter f (1) 0 , and assuming that the corresponding time and frequency resolutions are appropriate for this first harmonic, one has We search for the optimal f (h) 0 by first breaking the region (4.8) into N r values f (h) r=1,...,N r (we use N r = 10). For each of them we calculate the TFR, extract the harmonic from it (see Sec. 3.2), estimate the corresponding consistency ρ (h) r (3.6), and test the harmonic for being true (see Sec. 3.3). Among the f (h) r for which the harmonic was identified as true, we select the one characterized by the highest ρ (h) r . It is then used as the starting guess for an iterative golden section search of the optimal f (h) 0 (with default accuracy being f = 0.01), maximizing the consistency ρ (h) (3.6).
Note that (4.8) does not take into account the interference between harmonics and with the other components which might lie nearby in frequency, so that in some cases the upper bound on f (h) 0 might be higher than (4.8); the same consideration applies to the lower bound. Therefore, if near the minimum or maximum of the tested f (h) 0 the consistency ρ (h) is found to grow, we continue the search in that direction until the peak appears.

Improved reconstruction of nonlinear modes
Given the reconstructed amplitudes A (h) (t), phases φ (h) (t) and frequencies ν (h) (t) of all the true harmonics, the most straightforward way to reconstruct the full NM is just to add all A (h) cos φ (h) (t) together. However, in this way the NM picks up noise contributions from all harmonics, which can make it quite inaccurate.
Fortunately, there is a cleverer way to perform the reconstruction, also yielding more accurate parameters for the individual harmonics. Thus, one can utilize the theoretical amplitude, phase and frequency relationships between the harmonics, i.e.
Then, because the components with the higher amplitudes are expected to be less noise-corrupted, one can refine the parameters of each harmonic by weighted averaging over the parameters of all harmonics: , and I[. . . ] denotes rounding to the nearest integer, so that I[0.8] = 1, I[−0.6] = −1 (the corresponding term is needed to eliminate possible noiseinduced phase slips, i.e. the rapid growth by 2π in the phase differences). Note also the multiplier min(1, h /h), appearing for phase and frequency refinement in (4.9). It is needed to account for the scaling of phase and frequency errors of lower harmonics when they are mapped to higher ones. Thus, if ν (1) (t) has an error ε(t), then hν (1) (t) will have error hε(t).
Remark 4.3. The refinement (4.9) assumes that the reconstruction error for each harmonic is directly proportional to its amplitude. However, this is true only if, firstly, there are no side components with which harmonics interfere and, secondly, the amount of noise picked up while reconstructing harmonics is the same for each of them. These criteria are rarely both satisfied in practice, so in general one should take into account the mean absolute reconstruction errors c(h) of the h th harmonic parameters. This can be done by changing A (h ) (t) → c −1 (h )A (h ) (t) in all the expressions (4.9). Unfortunately, the errors c(h) are very hard to estimate, even roughly, and therefore we do not utilize them.
The refinement (4.9) not only makes the reconstructed NM much more accurate, solving the problem of picking up the cumulative noise of all the harmonics, but also reduces the noise level in each harmonic separately. Thus, noise-induced variations in the parameters of different harmonics are expected to be mutually independent, and so they average out being added together. As a result, the more harmonics are contained in the NM, the more accurately it can be reconstructed. While NMD is generally noise-robust, due to being based on time-frequency methods for which only the spectral power of noise in the vicinity of components' frequencies matters, NM reconstruction by (4.9) raises this robustness to an extreme extent.

Remark 4.4.
It should be noted that the refinement (4.9) is not valid if the NM waveform changes in time, i.e. the relationships between amplitudes of the harmonics and their phase shifts are nonconstant. In this case one should just sum all extracted harmonics, even though some of them might be highly corrupted. Nevertheless, in this work we assume that the waveform is constant, as otherwise even the harmonic identification scheme discussed in Sec. 3.3 might become inaccurate (if the waveform changes slowly enough, however, it still works well).

Speed improvement
To extract a particular harmonic candidate, one does not need to calculate the TFR over the whole frequency range, which would be computationally expensive. Rather, one can calculate it only in the range where the possible harmonic is expected to lie. As discussed previously, given the frequency range [ω (1) min where the first harmonic is concentrated in TFR, it is clear from (4.5) that the appropriate frequency range for the h th harmonic will be characterized by ω (h) min,max = hω (1) c ∓ max(1, h)∆ω (1) c . The multiplier max(1, h) takes into account that h can be smaller than unity, since we use h = 1/2, 1/3, . . . when checking whether the extracted component is first (or higher) harmonic (see Sec. 3.4); the frequency range should not be squeezed for h < 1 because e.g. for NM with only amplitude modulation all harmonics will have the same bandwidth (see Sec. 4.1).
Given the extracted time-frequency support of the first harmonic [ω − (t), ω + (t)], the required frequency range for it to be fully contained in the TFR can be estimated as [ω (1) min , ω (1) max ] = [min ω − (t), max ω + (t)]. However, the maximum value of the ω + (t) might be excessively large (it is in theory infinite e.g. if the signal is represented by a single tone); the same applies to minimum ω − (t). Therefore, for determination of the harmonic frequency range it is better to use a support [ω − (t),ω + (t)] that is narrower in frequency, determined as the maximum region where the TFR amplitude of the first harmonic is non-negligible. Mathematically it can be defined as where ω p (t) ∈ [ω − (t), ω + (t)] is the position of the peak, as always. Ifω − (t) (4.10) does not exist, e.g. if the |H s (ω − (t), t)| ≥ s |H s (ω p (t), t)|, we assignω − (t) = ω − (t); we do the same forω + (t) in a similar case. In this work we set s = 0.001. Using (4.10), the frequency range for the h th harmonic can be estimated as where perc p denotes the p'th percentage largest value of the argument (with perc 0 and perc 1 corresponding to usual minimum and maximum, respectively). The 95% largest value ofω + (t) instead of the overall maximum is taken in (4.11) to prevent selection of an excessively wide region, because due to noiseω + (t) might sometimes be too large; the same applies toω − (t). In the process of harmonic extraction for h > 1 (h < 1) it is also useful to restrict ω (h) min (ω (h) max ) to be higher than the 5% (smaller than the 95%) value of the instantaneous frequency of the last true harmonic being extracted.
The range [ω (h) min , ω (h) max ] can, however, be more compressed or stretched, depending on the window/wavelet resolution parameter f (h) 0 used for the analysis of the h th harmonic. The expression (4.11) is optimal only for , the range (4.11) is too large; although one can in principle "squeeze" it in this case, due to noise the appropriate squeezing cannot always be estimated well, so it is better to leave all as it is. On the other hand, for f (h) , one might need a wider range than (4.11), so it should be stretched. Therefore, if (4.12) is not the case, the values (4.11) should be updated as where ω (h) ≡ h 2 ω (1) min + ω (1) max and ∆ω (h) min,max ≡ ω (h) min,max − ω (h) are calculated from the estimates (4.11). The expressions (4.11) and (4.13) together give the frequency range in which to calculate the WFT/WT (based on the window/wavelet (2.7) with the resolution parameter f (h) 0 ) for the h th harmonic.
To reduce the computational cost of the surrogate test against noise, utilized for the stopping criterion (see Sec. 3.5), one can use the same trick as above, calculating TFRs for surrogates in the frequency range where the original component resides. After extracting the component from the original TFR, this range can be estimated by (4.11) with h = 1.

Parameters and their choice
There are different settings that can be used while applying NMD. However, many of the parameters are either set at well-established values, or can be chosen adaptively, thus removing the ambiguity and making NMD a kind of superadaptive method. The settings and their choice are discussed below.

Resolution parameter f 0
The resolution parameter f 0 of the window/wavelet (2.7) determines the time and frequency resolution of the WFT/WT, i.e. how fast time changes can be reflected and how close in frequency components can be resolved, respectively. Time and frequency resolutions are not independent, being inversely proportional to each other, and an increase of f 0 increases the frequency resolution but decreases the time resolution. This inability to locate the component precisely in both time and frequency is a very general issue, which manifests itself in nearly every aspect of signal analysis and represents the time-frequency analogue of the Heisenberg uncertainty principle in quantum mechanics [47,15,48,13].
Although we have discussed how to choose f 0 for harmonics given the extracted component, it is still unclear how to choose it at the first step, when we extract the main component. Generally, one needs to select f 0 based on a compromise between better reflecting time variability and better resolving components in frequency, so the optimal choice depends on the characteristics of the components contained in the signal. Unfortunately, at present there does not seem to be any universal approach enabling one to choose f 0 appropriately for any given signal (see [24,13] for a discussion of this issue and the effects of different choices), so it remains the only important parameter of NMD that cannot be adaptively selected. Its choice, however, seems to be slightly more universal for the WT, as the latter adjusts its resolution in relation to the frequency band being studied; for it, one usually uses f 0 = 1, setting some standard limits on the allowable relative amplitude/frequency modulation of the components and the frequency distances between them.
Nevertheless, the very possibility of adjusting the time and frequency resolution is a great advantage of TFR-based methods. For example, as discussed previously in Remark 4.1, the (E)EMD has time and frequency resolution properties similar to WT but, in contrast to the latter, its resolutions are set around some implicit values and cannot be changed. The choice of f 0 therefore gives NMD more flexibility in comparison to methods not possessing such a parameter.

TFR type: WFT or WT?
The difference between the WFT and the WT lies in the type of frequency resolution, which is linear for the former and logarithmic for the latter. Thus, the ability of the WT to resolve components in frequency worsens with increasing frequency, while its ability to reflect time variations improves; in contrast, the WFT does not discriminate between components based on their characteristic frequencies. Preference for one type of TFR over the other therefore depends on the signal structure. The WT is especially suitable if the higher frequency components contained in the signal are more distant in frequency and have higher time variability than those at lower frequencies, which is often the case for real signals; otherwise, the WFT is preferable.
Without some a priori knowledge, it is hard to select adaptively (i.e. automatically) the most appropriate type, especially given the associated problems related to the choice of the resolution parameter, as discussed above. However, after one component has been extracted (even roughly), selection of the optimal resolution type can be made based on its properties. Thus, if the timevariability of the component's parameters increases with frequency, then the WT is the most suitable, whereas otherwise one should prefer the WFT. Given the initially extracted component's amplitude A(t), phase φ(t) and frequency ν(t), an empirical condition for selecting the most appropriate TFR type can be stated as:

Thus, the values of V[∂ t A(t), ν(t)]
and V[∂ t ν(t), ν(t)] quantify whether the amplitude and frequency modulation of the component become stronger with increasing frequency (V < 1) or not (V > 1). In the former case a reliable representation of the component requires higher timeresolution for higher frequencies, so the WT is preferred, while for the latter case the WFT should be used. For example, for the linear (hyperbolic) chirp ν(t) = ν 0 + at (ν(t) = exp(at)) one has V[∂ t ν(t), ν(t)] = ∞ (V[∂ t ν(t), ν(t)] = 0), reflecting the fact that the WFT (WT) is most appropriate for its representation, in accordance with what is known [13]. The derivatives ∂ t ν(t) and ∂ t A(t) in (5.2) can be estimated by numerical differentiation. However, when noise is present in A(t) and ν(t), it will be greatly amplified in such estimates, so they will be generally quite noisy. Consequently, instead of standard deviations std[x(t)] in (5.2), it is better to use 75-percentiles, i.e. the width of the range within which 75% of the values of x(t) are contained. Alternatively, one can reconstruct ∂ t ν(t) and ∂ t A(t) by deriving the direct reconstruction formulas for them as explained in [12].
The remained question is which TFR type to use for the preliminary signal exploration, i.e. extraction of the component's parameters to be used in (5.2). As was discussed, the answer depends on the signal structure, and there is no universal criterion. However, since the WT is usually faster to calculate (due to its logarithmic scale) and generally has a more universal choice of the resolution parameter, we use it.
Summarizing, we calculate the WT of the signal and extract from it the component and its parameters. Then we utilize the criterion (5.2) and determine which is the best type of TFR to use in the given case. If this is the WT, we retain the component already extracted; otherwise, we calculate the WFT of the signal in the corresponding frequency range (4.11) and re-extract all the parameters from it. To preserve the time and frequency resolution properties for which the component was extracted from the WT (which we assume to be appropriate for it), the resolution parameter for the WFT f (WFT ) 0 should be adjusted accordingly. If the WFT and WT are calculated using a Gaussian window and lognormal wavelet (2.7), the rule is It follows from the fact that, for window and wavelet (2.7) and not too small f (WT ) 0 , the frequency resolution of the WT around ω = 2π (if "linearized") is similar to the frequency resolution of the WFT for the same resolution parameter [12]; taking into account the scaling nature of the WT, i.e. its frequency-dependent resolution, one then obtains (5.3). Since for all harmonics by definition one would have identical V[∂ t ν(t), ν(t)] and V[∂ t A(t), ν(t)] in (5.2), the same type of frequency resolution (linear or logarithmic) is appropriate for all of them. Hence, for the extraction of harmonics we use the same TFR type as was selected for the original component.
Remark 5.1. The logarithmic frequency resolution of the WT might by itself introduce correlation between the frequency and the amplitude/frequency variations of the component. For example, when the signal is corrupted with white noise, its power in the WT will be proportional to frequency, leading to larger noise-induced amplitude/frequency variations in the extracted component at higher frequencies. Such artificial correlation might cause the criterion (5.2) to select the WT as the preferred representation even when this is not the case. To avoid this, we use threshold 1.1 instead of 1 in (5.2).

Reconstruction method: direct or ridge?
As mentioned in Sec. 3.1, one can use either of two alternative methods for reconstruction of the components from the WFT/WT: direct (3.4) or ridge (3.2). The differences and errors of each method were studied in detail in [24]. It was found that the ridge method is more noise-robust, but that the direct method allows the time variability in the component's parameters at low noise levels to be followed more accurately.
For some parts of the NMD procedure it is inherently better to use a particular reconstruction method. Thus, in the criterion for selecting the TFR type (5.2) we always use the parameters reconstructed by the ridge method due to the noise-robustness of such estimates. Additionally, while testing the signal against noise in the stopping criterion (Sec. 3.5) we also use the ridge estimates for calculating the discriminating statistics D (3.10). This seems natural because the curve extraction is based on the amplitudes and frequencies of the peaks (see Sec. 3.1), and we have found such an approach to be more stable than using the direct estimates; the noiserobustness of ridge reconstruction is advantageous here as well.
For the other parts of the procedure, the reconstruction method can be chosen adaptively. To choose it at the first step, when we have extracted the time-frequency support [ω − (t), ω + (t)] of some component and need to reconstruct its parameters, one can use the approach suggested in [24]. Thus, we first reconstruct the parameters by both methods, getting A (d,r) (t), φ (d,r) (t), ν (d,r) (t), where "d" and "r" denote direct and ridge estimates, respectively. Then we compute the TFR (using the same parameters as originally) of the signal s (d) (t) = A (d) cos φ (d) , extract the ridge curve from it (taking simple maximuma ω p (t) = argmax ω |H s (ω, t)| is sufficient here), and reconstruct by direct method the "refined" parametersÃ (d) ,φ (d) ,ν (d) . We do the same for the "ridge" signal s (r) (t) = A (r) cos φ (r) , now using ridge method to reconstruct the refined estimates. Then we compute inconsistencies ε (d,r) a,φ,ν between the original and refined estimates for each method: where κ (d,r) A,φ,ν are the coefficients to tune the performance of the approach (we empirically found Obviously, for the exact estimates one has ε A,φ,ν = 0, so it is natural to assess the relative performance of the reconstruction methods based on their inconsistencies (5.4). Therefore, we use the direct amplitude estimate if ε (d) A < ε (r) A , and the ridge amplitude estimate otherwise; the same applies to the phase and frequency estimates. Although empirical, this approach works very well in practice.
Remark 5.2. To implement the criterion (5.4), one needs to compute two new TFRs, for the direct and ridge signals s (d,r) (t). Obviously, one does not need to calculate them for all frequencies, and one can restrict the frequency range to that of the original component [ω (h) min , ω (h) max ], which can be estimated by (4.11). This will improve the speed of the procedure.
For harmonics the appropriate reconstruction method can be determined simply as that giving the highest consistency ρ (h) (3.6). Therefore, while adapting the resolution parameter for the harmonic, at each f 0 we reconstruct its parameters by both methods and at the end select the one characterized by highest ρ (h) .
An adaptive choice of reconstruction method, as outlined above, ensures that in most cases we get the best possible estimates: when the amplitude/frequency modulation is low and noise is high the noise-robust ridge reconstruction method is used, while in the opposite case the direct method is applied, giving more accurate amplitude/frequency variations. This approach further increases the noise-robustness and the accuracy of the NMD.

Other parameters
All the other parameters can be partitioned onto two groups: some pre-fixed settings and the parameters of numerical accuracy. The former group includes the significance levels for the two surrogate tests (used for identification of the harmonics, Sec. 3.3, and for the stopping criterion, Sec. 3.5), the minimal consistency ρ min (3.9) etc. Each of these parameters is set to a well-established value, either corresponding to a standard convention (such as 95% significance of the surrogate tests [26,27]) or found empirically (e.g. the expression (3.9) for ρ min ). The second group includes the accuracy with which to determine the optimal f 0 while adapting it to the harmonics, the precision s which to use for determining the minimal support (4.10) etc. Here one faces the usual tradeoff between accuracy and computational cost. The default values, however, are sufficient in most cases and further increase of precision leads to only a slight improvement in the accuracy of the method.

Simulation examples
Having described all parts of the NMD procedure, we now illustrate the method by consideration of some specific examples. In this section we consider simulated signals whose composition is precisely known, so that one can assess reliably the performance of NMD and compare it to that of the other methods.

Example 1
As a first, simple and illustrative example, we take the signal depicted in Fig. 5(a). Its WFT and WT are shown in Fig. 5(b) and (c), respectively. From the figure it is immediately evident that, while in the WFT the harmonics with frequencies around 3 and 4 Hz are distinguishable, in the WT they interfere strongly and cannot be resolved; in contrast, the WT has high enough timeresolution at 7 Hz to represent the frequency modulation of the corresponding harmonic, while in the WFT this highest harmonic self-interferes (there appear the "bridges" between consecutive frequency modulation cycles), indicating that the time resolution is insufficient. Therefore, for the present case one cannot appropriately extract all harmonics using either WFT or WT with constant f 0 . However, adaptive representation of the harmonics, as discussed in Sec. 4.1, solves the problem.
The NMD proceeds as follows. First it extracts and reconstructs the dominant component from the signal's WT (located at 1 Hz and corresponding to the first harmonic of the NM in the present case). It is tested with the surrogates against noise (see Sec. 3.5) and passes the test (100% 25 Figure 6: The WFTs from which each true harmonic is reconstructed, with the corresponding extracted ridge curves being shown by solid red lines. The window resolution parameter f 0 is adjusted individually for each harmonic. After the harmonic is reconstructed, it is subtracted from the signal, so that e.g. the first harmonic no longer appears in the WFTs of (b)-(d). Note that the color scaling in (b)-(d) covers half the amplitude range of that in (a). significance). Using (5.2), the WFT is determined to be more suitable for the representation of this component, because its amplitude and frequency modulations do not depend on frequency; this type of TFR is then used in what follows. The component is re-extracted from the signal's WFT calculated in the corresponding frequency range (around 1 Hz), and its parameters are reconstructed by both direct (3.4) and ridge (3.2) methods. Using (5.4), it is established that ridge estimates of amplitude, phase and frequency seem to be more accurate in the present case. This is to be expected, given the high noise level around 1 Hz and the pronounced noise-susceptibility of direct estimates. The ridge estimates are therefore taken as the ones to be used. Next, we test whether the extracted component is the first harmonic by checking its possible 1/h harmonics. Thus, first the 1/2 harmonic is extracted and tested for being true (see Sec. 3.3) using different resolution parameters f 0 within the range (4.8); among the direct and ridge estimates obtained for f 0 at which the harmonic was identified as true, we choose those maximizing the consistency (3.6). In the present case, the 1/2 harmonic is identified as false for all tested f 0 , so it is discarded. We do the same for 1/3 and 1/4 harmonics, which are both identified as false. Since there are S 0 = 3 consecutive false harmonics (1/2, 1/3 and 1/4), we stop the procedure and correctly conclude that the extracted component is first (and not higher) harmonic of the NM.
We then extract and test the higher harmonics h = 2, 3, ... in qualitatively the same way as was done for h = 1/2, 1/3, .... If some harmonic is identified as true, we subtract it from the signal to remove its interference with higher harmonics. As a result of the procedure, all genuine harmonics h = 3, 4, 7 were correctly identified as true, and all the others as false. The resolution parameters were adapted for each harmonic so as to optimally represent and reconstruct it, as discussed in Sec. 4.1 and illustrated in Fig. 6 for the present case. Harmonic extraction was stopped when S = 3 consecutive harmonics h = 8, 9, 10 were identified as false.
We then refine the harmonics' parameters using (4.9) and reconstruct the full Nonlinear Mode. This NM is then subtracted from the signal, and the procedure is repeated. However, when we extract the next component, it does not pass the surrogate test against noise (see Sec. 3.5), and therefore NMD is stopped, with one NM being extracted and the residual correctly identified as noise (in the present case Brownian). The result of NMD is shown in Fig. 7, from which one can see that even the residual Brownian noise is well-recovered. To the best of the authors' knowledge, there is at present no other method that can decompose even the current relatively simple signal (shown in Fig. 5(a)) in such an accurate and physically meaningfully   Fig. 5. Red lines, where present, show the real 1st harmonic (in C4 for EMD and C5 for EEMD), sum of the 3rd and 4th harmonics (in C3 for EMD and C4 for EEMD) and the 7th harmonic (in C2 for EMD and C3 for EEMD). The bottom panels show the sum of the components 7 to 13. For EEMD, we have used 1000 noise realizations with standard deviations ten times smaller than that of the signal.

way.
For example, the results of EMD [4] and EEMD [5] procedures are shown in Fig. 8. In contrast to NMD, (E)EMD produces 13 distinct components, with only the first harmonic of the NM being more-or-less reliably recovered in one of them. Thus, in Fig. 8 C4 for EMD and C5 for EEMD represent the first harmonic, C3 for EMD and C4 for EEMD is the noise-spoiled mix of the 3 and 4 harmonics, C2 for EMD and C3 for EEMD is the badly corrupted 7th harmonic (with influence from harmonics 3 and 4 in the case of EEMD), while none of the other "components" has any physical meaning at all.

Example 2
To demonstrate the exceptional noise-robustness of NMD and the power of the surrogate test in distinguishing true from false harmonics, we consider the signal shown together with its WFT and WT in Fig. 9. As can be seen, the harmonics of the second NM are located exactly at the places where the harmonics of the first NM are expected to be, so that they can very easily be confused with them. Moreover, it is very hard to distinguish true from false harmonics in the present case because each NM has constant amplitudes and only very small frequency modulation (the absolute deviation between the expected frequency of the second harmonic of the first NM and the frequency of the first harmonic of the second NM is only |2ν (1) 1 (t)−ν (1) 2 |/2π = 0.016 ± 0.014 Hz). Furthermore, the noise is exceedingly strong, in standard deviation being 1.5 times that of the full noise-free signal, 1.8 times that of the first NM, 2.7 times that of the second NM, and 12.2 times that of the smallest (3rd) harmonic of the second NM, located at around 6 Hz (the latter is buried under the noise and not even visible in the WFT shown in Fig. 9(b)).
Because the noise is white in the present case, its power in the WT grows with frequency, as seen from Fig. 9 (note, however, that such a situation rarely arises for real signals). Consequently, instead of using the WT and then adaptively selecting the appropriate type of TFR, for the present case we use the WFT from the very beginning. Nonlinear Mode Decomposition then proceeds as usual: it first extracts the dominant component and tests it against noise; if it passes the test, NMD extracts its harmonics and identifies the true ones; then the full NM is reconstructed and subtracted from the signal, after which the procedure is repeated on the residual until it does not pass the surrogate test against noise.
The relevant information about the NMD outcome is summarized in Table 1. As can be seen, NMD correctly identifies all harmonics and joins them into two distinct modes. Moreover, the amplitude ratios a h and phase shifts ϕ h for each NM (see (2.4)), calculated from the reconstructed amplitude and phases as a h = A (h) / A (1) and ϕ h ≡ arg e i(φ (h) −hφ (1) ) , are very close to their actual values; this is true even for the (buried in noise) 3rd harmonic of the second NM. The   Fig. 9. For each NM the significance of the surrogate test against noise (see Sec. 3.5) was based on 100 surrogates. The columns left-to-right provide information for the h th harmonic: the value of the consistency ρ (h) (1, 1, 0) (3.6); the significance of the test against independence (see Sec. 3.3); the method chosen for reconstruction of the harmonic (see Sec. 5.3); the resolution parameter f (h) 0 adapted for the harmonic considered (see Sec. 4.1); the true amplitude ratio a h ≡ A (h) (t)/A (1) (t); the extracted amplitude ratio; the true phase shift ϕ h ≡ φ (h) (t) − hφ (1) (t); the extracted phase shift. Harmonic was identified as true if both ρ (h) ≥ ρ min = 0.25 (3.9) and the significance level is ≥ 95%, with the corresponding rows being shaded with gray. The numbers of real harmonics are emboldened, and it can be seen that only they were identified as being true. For simplicity, the results for h = 1/n harmonics (which are tested to check whether the extracted component is the first harmonic) are not shown; in each case all three consecutive h = 1/2, 1/3, 1/4 harmonics were identified as false.
ridge method is automatically selected for reconstructing of all harmonics, which is indeed the preferable choice due to its noise-robustness [24].
From Table 1 it can be noted that, for all true harmonics, the resolution parameter f 0 used is higher than the original. This is because the higher the f 0 , the easier it is to segregate the component from noise [24]. However, increasing f 0 at the same time worsens the accuracy of representation of amplitude/frequency modulation of the component [24], so its choice is determined by a compromise between reflecting well the time-variability of component's parameters and suppressing the noise; the adaptive scheme that we use, described in Sec. 4.1, effectively implements this criterion.
The final results of NMD are shown in Fig. 10. Both NMs are reconstructed with great accuracy, which is like a miracle given such strong noise corruption of the original signal (see Fig. 9); even the residual noise is recovered almost exactly. Such performance is unachievable with the other existing methods, e.g. (E)EMD in the present case produces 13 components, and none of them reliably represent even a single harmonic (not shown). Note, that NMD can produce even more accurate results if the resolution parameter was adjusted from the very beginning, i.e. for the first harmonics (and not only the higher ones). However, as mentioned in Sec. 5.1, at present there does not seem to be a good and universal way of doing this. 29

Real life examples
After demonstrating its success in the analysis of simulated signals, we now apply NMD to real data.

Decomposing human blood flow signals
The decomposition of skin blood flow signals, measured non-invasively by laser-Doppler flowmetry (LDF) [49], is believed to be a very tough task, with no method at present being able to do it well. This was demonstrated in [11] for the example of Karhunen-Loève decomposition, while (E)EMD usually also fails. Thus, blood flow signals contain large amount of "physiological" noise, as well as having components with amplitude/frequency modulation whose strength and speed change with time. Nonetheless, as will be seen, NMD can often tackle even such complicated cases.
Blood flow signals exhibit oscillatory activity at multiple frequencies, and the WT has been found especially useful in studies of their structure [50]. Six oscillations have been identified in blood flow and attributed to different physiological mechanisms [3,50,51,52], with characteristic frequency ranges of (approximately): 0.6-2 Hz (I), 0.15-0.6 Hz (II), 0.04-0.15 Hz (III), 0.02-0.04 Hz (IV), 0.01-0.02 Hz (V) and 0.005-0.01 Hz (VI). Range I corresponds to cardiac oscillations, which originate from heart activity, rhythmically pushing blood through the circulatory system, thus directly propagating into blood pressure and flow. Range II corresponds to respiratory oscillations, which are the consequence of the mechanical influence of respiration on the cardiac output and, to a smaller extent, the respiratory modulation of the heart rate [53,54]. The III range oscillations by some authors are attributed to the sympathetic nerve activity, being regarded as a result of time-delays in the baroreflex feedback loop [55,56], while the others relate them to myogenic activity of the smooth muscle cells [3,50,51]. Finally, the oscillations in the IV, V and VI ranges were attributed to the neurogenic, NO-dependent endothelial and NO-independent endothelial activity [3,50,51,52], respectively. Figure 11: (a) An example of the blood flow signal, measured by LDF with the probe over the right wrist caput ulna (for more details, see [51,3]); the signal is sampled at 40 Hz for 1800 s, and the panel shows its central 600 s part. (b) The WT of the signal shown in (a). Gray-dotted lines partition the frequency axis into the regions within which the physiologically meaningful oscillations are located (according to [3,50,51,52], see the text). Bold colored lines show those extracted components which pass the surrogate test against noise, with thinner lines of the same color showing their higher harmonics. Armed with this knowledge, we now apply NMD. To better utilize the prior information, we apply the procedure to each of the mentioned physiological frequency ranges individually, starting from the first one. Thus, for a given range we extract the dominant component and test it against noise; if it passes the test, we then extract its harmonics, reconstruct the NM and subtract it from the signal; we then repeat the procedure for the same range until the next extracted component does not pass the test against noise.
The results of the procedure are shown in Figs. 11 and 12 for the examples of two blood flows. Clearly, NMD is able to decompose these signals into a physically meaningful oscillations with complex waveforms (and it also returns their amplitudes, phases and frequencies). In both cases, we were able to extract the cardiac component (around 1 Hz), while activity in ranges IV, V and VI did not pass the test against noise. However, in the example of Fig. 11 there are strong sympathetic/myogenic oscillations (around 0.1 Hz), which were extracted, while we see no activity in the frequency range II (respiratory) (Fig. 11(b)). In contrast, for the example of Fig. 11 the respiratory oscillations are present and the sympathetic/myogenic are buried under noise. The waveforms of the cardiac oscillations are also different in two cases.
In general, the properties and presence of the oscillations in blood flow varies from subject to subject, being influenced by many factors, such as the state of the microvasculature (which might in turn be influenced by age and gender), properties of the skin etc. As was demonstrated, NMD can be very useful for the study and classification of these effects, e.g. it can be used to investigate the relationship between the cardiac waveform (as well as other blood flow properties) and the health conditions. Remark 7.1. As discussed in Sec. 3.5, even if the component extracted from a particular frequency range does not pass the surrogate test against noise (as e.g. for ranges IV, V and VI in the above examples), this does not necessarily mean that there are no physiologically meaningful activity there. Thus, the underlying oscillations might be simply very small so that they are easily masked by noise, which is often the case for the respiratory oscillations. The other possibility is that the resolution parameter used is not appropriate to represent reliably the component of interest. This is often the case for sympathetic/myogenic oscillations, which might change their amplitude/frequency very rapidly at certain times. In fact, the best choice in such situations would be probably some time-varying f 0 (t), but its form is generally very hard to choose.
It should be noted, that NMD can also serve as an initial preprocessing which needs to be performed prior to applying any of the numerous existing methods for studying monocomponent signals or their sets. Thus, having first decomposed the original signal into its constituent NMs, one can then investigate the latter using one or some of the huge diversity of available techniques. For example, the structure of the interactions between different modes can be recovered by applying Bayesian inference to the extracted phases [57,58], as was done in [2] to reconstruct the cardiorespiratory interaction; in this application, the high accuracy of the phases returned by NMD is especially advantageous.
Another important application is the classification of the oscillations present in the signal, which is of particular interest because it might yield valuable insights into the nature and properties of the underlying phenomena. Following the recent introduction of chronotaxic systems [59,60,61], it is clearly desirable to be able to determine whether the originating systems generating the different modes are chronotaxic or not. Roughly speaking (see [59,60] for a more detailed definition), the system is chronotaxic if it is: (a) oscillatory (i.e. characterized by a limit 32 cycle); and (b) its phase φ(t) does not just move freely along the cycle, as conventionally assumed, but is attracted to some φ A (t), conferring an ability to resist external perturbations. This is exactly what one often observes in living systems, which are able to maintain their activity within physiological ranges even when strongly perturbed, so that chronotaxic behavior is expected to be abundant within the life sciences.
To establish which oscillations contained in the signal are chronotaxic, one needs to study them separately, i.e. the signal should be first decomposed into its NMs; this can conveniently and accurately be achieved with NMD. Subsequently, for each mode one can apply the approach suggested by Clemson et al. [61]. Basically, by applying different types of filters to the extracted phase one estimates the expected difference φ(t) − φ A (t) and uses detrended fluctuation analysis [62,3] to analyze the associated correlations, which are expected to differ between chronotaxic and non-chronotaxic systems. We have applied this method to the NMs of both of the examples in Figs. 12 and 11, but have not found clear evidence of chronotaxicity in any of the corresponding oscillations. However, since the method of Ref. [61] is based on a particular set of assumptions (e.g. that the dynamical perturbations take form of the Gaussian white noise), these oscillations could in principle still be chronotaxic though falling outside the model considered in [61].

Removing cardiac artifacts from a human EEG signal
Nonlinear Mode Decomposition can also be used to filter the signal from an extraneous oscillatory activity, provided that there is an associated signal from which the phase and frequency of the latter can accurately be extracted. Thus, using the NMD harmonic extraction procedure, the fundamental harmonic of the extraneous mode can be extracted as h = 1 harmonic of the reference component based on its (reference) phase and frequency. In this way the resolution parameter is adjusted from the start, allowing one to represent this fundamental oscillation well even if it is strongly corrupted by noise or other influences. We will illustrate this alternative use of NMD through the example of removing cardiac artifacts from the human electroencephalogram (EEG) recording, using the electrocardiogram (ECG) as the reference signal.
The EEG often contains artifacts related to heart activity, which arise due to blood flow pulsations underneath the probes (the so-called ballistocardiogram (BCG) artifacts [63]) and possibly also due to direct pick up of heart electrical activity. The BCG artifacts are extremely prominent in the EEG measured in a magnetic field (e.g. simultaneously with a magnetic resonance imaging scan), in which case they are usually filtered out by independent component analysis (ICA) [8,9]. However, ICA requires as many simultaneous EEG measurements as possible, with cardiac artifacts being prominent and taking a similar form in most of them. Therefore, it will obviously fail to remove the artifacts if only one EEG and one ECG signal are available, as the form taken by cardiac activity in the EEG is completely different from its form in the ECG signal (see below), contradicting the assumption of linear mixing on which ICA is based. In fact, we have found that ICA fails even given four EEGs, which is probably because the artifacts in them are relatively small, being hard to distinguish (although still capable of affecting some time-frequency measures); and, additionally, because the form of these artifacts might be different in different EEGs (perhaps dependent on probe position). The (E)EMD method, too, fails to provide meaningful results in the present case.
Simultaneously measured EEG and ECG time series for the same subject are presented in Fig. 13(a) and (b), with their WFTs for the default resolution parameter f 0 = 1 being shown in (c) and (d), respectively. Clearly, for f 0 = 1 the cardiac harmonics are not well distinguishable in the EEG's WFT (Fig. 13(c)), so that the extracted curve might not be very accurate. However, the cardiac phase φ (1) (t) and frequency ν (1) (t) can of course be extracted directly from the WFT of 33 Figure 13: (a) The EEG signal (measured using BIS electrode placed on the forehead, as described in [64]). (b) The ECG signal (3-lead, with electrodes on shoulders and the lowest left rib, see e.g. [65]). (in the notation of (2.4)); the gray-dashed line represents a pure sinusoid with the amplitude of the first harmonic, for comparison. The full signals are of duration 20 min. and are sampled at 80 Hz; the EEG was actually measured under anaesthesia, but the same artifacts arise under all conditions. the ECG (Fig. 13(d)). They should be the same for the first cardiac harmonic in the EEG and in ECG, because both activities obviously have the same rhythmicity; however, the corresponding amplitudes might be different, and perhaps not even proportional to each other. This is because, depending on the measurement and the environment, the same activity might undergo various transformations that can change its amplitude dynamics and the corresponding waveform, but leave phase dynamics largely unaltered. For example, Nonlinear Modes c(t) = A(t)[cos φ(t) + a cos 2φ(t)] and exp[c(t)] will have the same fundamental phase and frequency, but different amplitude dynamics (i.e. the ratio between the amplitudes of the corresponding fundamental components will be time-varying) and different relationships between the harmonics. Therefore, we use the cardiac phase φ (1) (t) and frequency ν (1) (t) extracted from the ECG as the parameters of the reference component, and extract the main cardiac component from the EEG as its h = 1 harmonic. This is done in the same way as discussed in Sec. 3.2, i.e. selecting the peaks nearest to the expected frequency ν (1) (t). We test this harmonic for being true (Sec. 3.3) and adapt the resolution parameter for its best representation (Sec. 4.1) in the usual way, except that now we use everywhere the phase consistency ρ (1) (0, 1, 0) (3.6) instead of the usual amplitude-phase consistency ρ (1) (1, 1, 0). This is because of the possible discrepancy between the amplitude dynamics of the components mentioned above (note that the threshold (3.9) also becomes ρ min = 0.5). If the extracted first harmonic is regarded as true, we use its parameters to extract the higher cardiac harmonics from EEG (now using unmodified procedures) and reconstruct the corresponding NM.
Remark 7.2. Note that there might in general be a time delay between related activities in different signals, e.g. ECG and cardiac artifacts in EEG. However, unless it exceeds the characteristic time of amplitude/frequency variations or the minimal surrogate time shift (see Sec. 3.3), it should not influence the results significantly. Nevertheless, to be fully rigorous one can additionally adjust the timings of both components by maximizing frequency consistency ρ (1) (0, 0, 1) (3.6) between them.
The EEG's WFT adapted (by maximizing phase consistency ρ (1) (0, 1, 0)) for representation of the cardiac component is shown in Fig. 13(e). Clearly, the corresponding ridge curves became much more visible than in the default WFT presented in Fig. 13(c). The cardiac artifacts extracted from the EEG are shown in Fig. 13(f). Their waveform, presented in Fig. 13(g), very much resembles the shape of the cardiac waves in the blood flow (cf. Fig. 12), but not that of the ECG. This is an indication of the BCG mechanism by which these cardiac artifacts are generated, which is also supported by the fact that their strength (and even their shape) might be different in EEGs from different probes for the same subject. Note that, depending on the particular EEG measurement, the artifacts might be inverted.
There are many other possible applications of the discussed NMD-filtering. For example, given the reference ECG and respiration signals, one can use the same approach to extract the heart and respiratory components from the blood flow more accurately (especially for respiratory activity) than is possible using the straightforward decomposition as in Sec. 7.1. Note that if in the given signal there is no oscillation related to the reference phase and frequency, then the corresponding first harmonic extracted from it will be regarded as a false harmonic of the reference component. Thus, we have not found any respiratory oscillations in the EEG, implying that the measurement process is almost unaffected by breathing.
In general, the situation when one signal contains components related to other signals is ubiquitous in real life, so that the NMD-based filtering is expected to be very useful in many contexts. A great advantage of this approach is that it does not require related oscillations in different signals to be of the same form (as is the case e.g. for ICA), but only to have the same phase dynamics.

Conclusions
We have introduced a new decomposition method -Nonlinear Mode Decomposition (NMD), which is based on the time-frequency analysis techniques [12,24,19], surrogate data tests [26,27,28,29], and the idea of harmonic identification [30]. NMD consists of a number of ingredients, as described in Sec. 3, each of which is a useful technique in its own right. For example, one can use the approach discussed in Sec. 3.5 for testing the signal against noise, which proves to be much more noise-robust than the conventional tests based on the correlation dimension or other nonlinearity measures [66,26,27].
We have successfully applied NMD to both simulated and real data, and demonstrated its advantages over the existing approaches. Unlike many previous methods, NMD recovers the full underlying oscillations of any waveform. It is also extremely noise-robust and in a sense super-adaptive, meaning that most of its settings are automatically adapted to the properties of the particular signal under investigation. Finally, in contrast to the other methods, NMD returns only physically meaningful modes, stopping the decomposition when the residual is just noise.
The area of applicability of NMD is extremely wide. For example, it would now be reasonable to reconsider all those cases where (E)EMD has been applied to date (e.g. see references in [67,5]). Furthermore, as demonstrated above, the exceptional noise-robustness of NMD and its other advantages allow one to apply it even in cases where all the other methods fail completely. Thus, it can be applied to almost all multicomponent signals of the kind that are ubiquitous in the life sciences, geophysics, astrophysics, econometrics etc. We therefore expect that, with time, NMD will become a new standard in the field.
The latest MATLAB codes needed for running NMD are freely available [68], together with detailed instructions and examples, in both text and video formats.