Decomposing spectral and phasic differences in non-linear features between datasets

When employing non-linear methods to characterise complex systems, it is important to determine to what extent they are capturing genuine non-linear phenomena that could not be assessed by simpler spectral methods. Specifically, we are concerned with the problem of quantifying spectral and phasic effects on an observed difference in a non-linear feature between two systems (or two states of the same system). Here we derive, from a sequence of null models, a decomposition of the difference in an observable into spectral, phasic, and spectrum-phase interaction components. Our approach makes no assumptions about the structure of the data and adds nuance to a wide range of time series analyses.

Non-linear methods are useful for characterising differences between various states of a complex system, and have found applications in a wide range of scientific domains. For example, Lempel-Ziv (LZ) complexity [1] and multiscale entropy [2] have been successful in discriminating between conscious and unconscious brain activity [3], and have yielded insights into physiological pathologies [4] and price dynamics [5]. However, more refined conclusions could be obtained if there were a principled way to assess how much of the differences in such measures are due to genuine non-linear effects, and how much is explainable by changes in the power spectrum.
A popular approach to study the effect of spectral and phasic contributions on an observable is via surrogate data methods [6], which examine whether its value is representative of a null distribution obtained from surrogate data. Such surrogate methods are regarded as a basic constituent of the data analyst's toolkit [7], and have been extended to a range of scenarios including multivariate time series [8], non-stationary data [9], and many others. However, surrogate methods are typically designed to be applied to a single dataset, and it is not straightforward to use them to disentangle spectral and phasic contributions to differences in an observable between two datasets, e.g. how much of the difference in LZ complexity between two neurological conditions simply reflects the known spectral changes between them [10]. The crux of why this is challenging, and why naive applications of typical surrogate methods fail, is that the difference between two null models is not necessarily a good null model of the difference (see Supp. Mat. for a detailed example).
To deal with this issue, here we present a novel decomposition of the difference in an observable between two time series datasets into spectral, phasic, and spectrum-phase interaction components. The decomposition makes no assumptions about the structure of the data, and is applicable to a broad range of scenarios of interest. We illustrate our method by analysing LZ complexity on neuroimaging data, where our decomposition identifies phasic and spectrum-phase interaction components that take the opposite sign to the predominantly spectral overall effect, and which would not have been detectable by previously existing methods.
The decomposition. Let us consider a scientist who is interested in an observed difference in some quantity f between data recorded in two different conditions, denoted by X and Y. The data consist of time series recordings, and a set of time series segments is obtained from each condition. Each segment could correspond to data recorded from, e.g., different participants in an experiment, or different time periods from the same participant. The whole dataset from the first condition is denoted as $x^N = \{x_1, x_2, \dots, x_N\}$, where N is the population size of these data. Similarly, for the second condition one has $y^M = \{y_1, \dots, y_M\}$. Our goal is to decompose the difference in f between X and Y into spectral, phasic and spectrum-phase interaction components, i.e. to decompose
$$\bar{f}(x^N) - \bar{f}(y^M), \quad \text{where} \quad \bar{f}(x^N) = \frac{1}{N}\sum_{j=1}^{N} f(x_j) \quad \text{and} \quad \bar{f}(y^M) = \frac{1}{M}\sum_{k=1}^{M} f(y_k)$$
are the empirical ensemble averages of the function in question, f. This is achieved by a series of comparisons between expected f values on the data and those on a set of progressively more constrained null models for the stochastic processes underlying the data.
Formally, we consider $x_1, \dots, x_N$ to be independent and identically distributed (i.i.d.) realisations of a stochastic process sampled under condition X, and $y_1, \dots, y_M$ to be i.i.d. realisations of another stochastic process sampled under condition Y, with $x_j, y_k \in \mathbb{R}^T$, where T is the length of each time series. The decomposition utilises the discrete Fourier transform, which is denoted by $\hat{x} = \mathcal{F}\{x\} \in \mathbb{C}^T$, given a time series x. The amplitudes of the Fourier components are denoted by $A(x) = \{A_1(x), \dots, A_T(x)\} \in \mathbb{R}^T$, and their phases by $\phi(x) = \{\phi_1(x), \dots, \phi_T(x)\} \in [0, 2\pi]^T$. Thus, the data for X can be represented in the frequency domain as i.i.d. phase-amplitude tuples $(A(x_j), \phi(x_j))$, following a distribution $p_X(A, \phi)$ induced by X, and similarly for the $y_k$.
We begin by considering a null model $\mathcal{M}_i$ under which amplitudes and phases have no interaction, i.e. are statistically independent. Accordingly, we construct new time series $x_j^{(w)}$ that satisfy this null model by combining the spectrum of each $x_j$ with the phases of another randomly chosen time series from within condition X (and similarly for the $y_k$). That is, we construct
$$x_j^{(w)} = \mathcal{F}^{-1}\{A(x_j)e^{i\phi(x_{\alpha_j})}\}, \qquad y_k^{(w)} = \mathcal{F}^{-1}\{A(y_k)e^{i\phi(y_{\beta_k})}\},$$
where the $\alpha_j$ and $\beta_k$ are distributed uniformly over $\{1, \dots, N\}$ and $\{1, \dots, M\}$, respectively. We then consider the mean value of f on these phase-shuffled data, given by
$$\nu^i(x^N) = \frac{1}{N}\sum_{j=1}^{N} f(x_j^{(w)}).$$
The spectrum-phase interaction contribution to the value of f in condition X is then calculated as
$$\Delta^i(x^N) = \bar{f}(x^N) - \mathbb{E}\{\nu^i(x^N) \mid x^N\},$$
where the conditional expectation averages out the effect of the random integers $\alpha_j$ on $\nu^i$. Similarly, $\Delta^i(y^M)$ can be calculated for Y. When estimating $\Delta^i(x^N)$ and $\Delta^i(y^M)$ in practice, one approximates the distribution of $\nu^i$ by averaging multiple realisations of it. The quantity $\Delta^i(x^N)$ measures the extent to which the expected value of f would be affected if one were to break any dependence that exists between the amplitudes and phases of the $x_j$. Equivalently, $\Delta^i(x^N)$ accounts for the deviation of the mean value of f in condition X from that which would be expected if the null model $\mathcal{M}_i$ holds. For large N, the law of large numbers guarantees that
$$\lim_{N \to \infty} \Delta^i(x^N) = 0$$
in the absence of any dependency between the phases and spectra, i.e. when $p_X(A, \phi) = p_X(A)\,p_X(\phi)$. Next, we focus on the phasic effect on f, i.e. the effect of differences between the phase distributions of X and Y. For this, we consider a second null model $\mathcal{M}_\phi$ under which phases are not only independent of amplitudes, but also follow the same distribution in each of the conditions X and Y.
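As a concrete illustration of this first step, the following sketch builds the within-condition phase-shuffled surrogates $x_j^{(w)}$ and estimates the interaction term for one condition (a minimal Python sketch using NumPy's real FFT; the names `within_condition_shuffle` and `delta_i` are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def within_condition_shuffle(X):
    """Build the surrogates x_j^(w): the amplitude spectrum of each x_j
    combined with the phases of a randomly chosen segment from the same
    condition (null model M_i: amplitudes independent of phases)."""
    out = []
    for x in X:
        donor = X[rng.integers(len(X))]
        amps = np.abs(np.fft.rfft(x))
        phases = np.angle(np.fft.rfft(donor))
        # rfft/irfft keep the surrogate real-valued by construction
        out.append(np.fft.irfft(amps * np.exp(1j * phases), n=len(x)))
    return out

def delta_i(f, X, n_shuffles=50):
    """Interaction term Delta^i: mean f on the data minus a Monte Carlo
    estimate of the mean f under the null model M_i."""
    f_bar = np.mean([f(x) for x in X])
    nu_i = np.mean([np.mean([f(s) for s in within_condition_shuffle(X)])
                    for _ in range(n_shuffles)])
    return f_bar - nu_i
```

Since the variance depends only on the amplitude spectrum (by Parseval's theorem), `delta_i(np.var, X)` vanishes up to floating-point error, as expected under $\mathcal{M}_i$; a phase-sensitive observable can give a non-zero value.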
We construct phase-shuffled time series $x_j^{(a)}, y_k^{(a)}$ that satisfy this null model by replacing the phases of each time series with those of another randomly chosen time series from the whole dataset:
$$x_j^{(a)} = \mathcal{F}^{-1}\{A(x_j)e^{i\phi(\hat{w}_j)}\}, \qquad y_k^{(a)} = \mathcal{F}^{-1}\{A(y_k)e^{i\phi(\hat{z}_k)}\},$$
where $\hat{w}_j, \hat{z}_k$ are the discrete Fourier transforms of independently and randomly chosen time series that are each drawn from X with probability 1/2, and from Y with probability 1/2. Then, we consider the mean value of f on these phase-shuffled data,
$$\nu^\phi(x^N \mid y^M) = \frac{1}{N}\sum_{j=1}^{N} f(x_j^{(a)}),$$
and define the phasic contribution in condition X as
$$\Delta^\phi(x^N) = \mathbb{E}\{\nu^i(x^N) \mid x^N\} - \mathbb{E}\{\nu^\phi(x^N \mid y^M) \mid x^N, y^M\}.$$
We define $\Delta^\phi(y^M)$ analogously. Again, when estimating these quantities in practice, one can approximate the distributions of $\nu^i$ and $\nu^\phi$, for each condition, by averaging multiple realisations of them.
The quantity $\Delta^\phi(x^N)$ measures the expected effect on the mean value of f in condition X if $\mathcal{M}_\phi$ holds, i.e. the effect of changing the probability distribution of the phases from $p_X(\phi)$ to the mixture $(p_X(\phi) + p_Y(\phi))/2$. Note that if the distribution of phases is the same for both conditions, so that $p_X(\phi) = p_Y(\phi)$, then the law of large numbers guarantees that $\lim_{N,M \to \infty} \Delta^\phi(x^N) = 0$. Finally, we consider the effect of spectral differences between the conditions on the difference in f. For this, we consider the deviation of the phase-shuffled data above from a further constrained null model $\mathcal{M}_A$, in which both amplitudes and phases are statistically independent and distributed identically in X and Y. Specifically, since the phase-shuffled data underlying $\nu^\phi(x^N \mid y^M)$ and $\nu^\phi(y^M \mid x^N)$ have, by definition, the same phase statistics, these two quantities will, on average, differ only because of differences between the distributions of the spectra of X and Y. Therefore, we introduce
$$\Delta^A(x^N, y^M) = \mathbb{E}\{\nu^\phi(x^N \mid y^M) \mid x^N, y^M\} - \mathbb{E}\{\nu^\phi(y^M \mid x^N) \mid x^N, y^M\}$$
as a metric of the spectral effect. If the distribution of the spectrum is the same for both conditions, then $\lim_{N,M \to \infty} \Delta^A(x^N, y^M) = 0$ by the law of large numbers. Again, when estimating this quantity in practice, one approximates the distribution of $\nu^A(x^N, y^M) := \nu^\phi(x^N \mid y^M) - \nu^\phi(y^M \mid x^N)$ by obtaining multiple realisations.
With these quantities at hand, via a telescopic sum we can obtain a decomposition of the total difference in f between the two conditions into spectral, phasic, and spectrum-phase interaction terms. We have that the difference in mean f values between the conditions decomposes into
$$\bar{f}(x^N) - \bar{f}(y^M) = \Delta^A(x^N, y^M) + \left[\Delta^\phi(x^N) - \Delta^\phi(y^M)\right] + \left[\Delta^i(x^N) - \Delta^i(y^M)\right]. \tag{4}$$
Of these, the first term is the difference in f that persists on data modified so that the phases have the same distribution across conditions, and so corresponds to the difference attributable to spectral changes only. Similarly, by comparing the data with the observed phase distributions against data with identically distributed phases, the second term measures the difference in f attributable to phase changes. Finally, the third term compares the observed data with phase-shuffled time series to account for changes due to the phase-spectrum interaction in both conditions. Accordingly, each of the $\Delta$'s can be considered to be comparing expected f values on the data against f values on a set of increasingly restrictive null models, $\mathcal{M}_i \to \mathcal{M}_\phi \to \mathcal{M}_A$. We note that this decomposition is invariant to the order in which it is constructed, i.e. it makes no difference whether phasic effects are considered before spectral contributions (as described here), or vice versa (proof in Supp. Mat.).
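The full telescoping decomposition can be sketched as follows (a hypothetical Python implementation; `decompose`, `recombine`, and the Monte Carlo averaging scheme are illustrative choices, not the authors' code). By construction, the three returned components sum exactly to the observed difference in mean f:

```python
import numpy as np

rng = np.random.default_rng(0)

def recombine(amp_src, phase_src):
    """Inverse-FFT the amplitude spectrum of amp_src combined with the
    phases of phase_src; rfft/irfft keep the result real-valued."""
    A = np.abs(np.fft.rfft(amp_src))
    phi = np.angle(np.fft.rfft(phase_src))
    return np.fft.irfft(A * np.exp(1j * phi), n=len(amp_src))

def decompose(f, X, Y, n_shuffles=50):
    """Sketch of the spectral/phasic/interaction decomposition of the
    difference in mean f between two lists of segments X and Y."""
    f_x = np.mean([f(x) for x in X])
    f_y = np.mean([f(y) for y in Y])

    def pooled_donor():
        # Draw a phase donor from X or Y with probability 1/2 each (M_phi).
        src = X if rng.random() < 0.5 else Y
        return src[rng.integers(len(src))]

    nu_i_x = nu_i_y = nu_p_x = nu_p_y = 0.0
    for _ in range(n_shuffles):
        # nu^i: phases shuffled within each condition (null model M_i)
        nu_i_x += np.mean([f(recombine(x, X[rng.integers(len(X))])) for x in X])
        nu_i_y += np.mean([f(recombine(y, Y[rng.integers(len(Y))])) for y in Y])
        # nu^phi: phases drawn from the pooled conditions (null model M_phi)
        nu_p_x += np.mean([f(recombine(x, pooled_donor())) for x in X])
        nu_p_y += np.mean([f(recombine(y, pooled_donor())) for y in Y])
    nu_i_x, nu_i_y = nu_i_x / n_shuffles, nu_i_y / n_shuffles
    nu_p_x, nu_p_y = nu_p_x / n_shuffles, nu_p_y / n_shuffles

    d_interaction = (f_x - nu_i_x) - (f_y - nu_i_y)
    d_phase = (nu_i_x - nu_p_x) - (nu_i_y - nu_p_y)
    d_spectral = nu_p_x - nu_p_y
    return d_spectral, d_phase, d_interaction, f_x - f_y
```

Note that the telescoping identity holds exactly for any finite number of shuffles; only the attribution of the difference to the individual components is a Monte Carlo estimate.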
Example. As an illustration, we present an analysis of the entropy rate of binarised magnetoencephalographic (MEG) signals, as measured with LZ complexity. We use the Cambridge Centre for Ageing and Neuroscience (CAMCAN) dataset [11], which includes a large-scale MEG dataset of participants undergoing several cognitive tasks, and study the differences in Lempel-Ziv complexity [1] between participants in wakeful rest, and participants performing a simple cognitive stop/no-go task [11]. This measure (or minor variations of it) has been widely used in the neuroscience literature [12][13][14][15], showing a remarkable performance in discriminating between different states of consciousness, for instance normal wakefulness versus sleep [3].
In this application, we consider data from 131 participants in both "task" and "rest" conditions. The data from each participant were divided into 100 non-overlapping windows of length T = 1024 (which corresponds to approximately 4 s given the sampling rate of 250 Hz). To compute the LZ complexity, time series were binarised, and then the original (1976) version of the LZ complexity described in Ref. [1] was computed. Binarisation was carried out based on the mean value of the time series in question, so the binarised time series contained ones where the raw value was greater than the mean, and zeros where it was less than the mean.
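This pipeline step can be sketched as mean-split binarisation followed by the 1976 Lempel-Ziv phrase-counting scheme, here in the Kaspar-Schuster formulation (an illustrative Python sketch; function and variable names are ours):

```python
import numpy as np

def binarise(x):
    """Mean-split binarisation: '1' where the signal exceeds its mean."""
    m = np.mean(x)
    return ''.join('1' if v > m else '0' for v in x)

def lz76(s):
    """Count the phrases in the exhaustive LZ76 parsing of a binary
    string (Kaspar-Schuster algorithm)."""
    n = len(s)
    i, k, l = 0, 1, 1
    c, kmax = 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:       # reached the end while still matching
                c += 1
                break
        else:
            kmax = max(kmax, k)
            i += 1
            if i == l:          # no earlier match: close a new phrase
                c += 1
                l += kmax
                if l + 1 > n:
                    break
                i, k, kmax = 0, 1, 1
            else:
                k = 1
    return c
```

A constant signal yields the minimal count (`lz76('0' * 50)` is 2), alternating or random strings score higher; in practice the count is often normalised by its length-dependent maximum before group comparisons.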
For each of the 204 MEG channels of each participant, the decomposition in Eq. (4) was applied considering x N to be the windowed data during task and y M to be the windowed data during rest, and using 500 realisations of the random variables involved (i.e. 500 random phase shufflings). Thus, a set of ∆'s was obtained for each channel, for each participant. Then, to assess whether differences were significant at the group level, 1-sample t-tests were carried out across participants -for each of the ∆'s, for each channel. The mean value of each of the ∆'s at each MEG channel is shown in Fig. 1.
Our decomposition reveals information about the relation between task and rest that is not captured by other statistical tools. First, studying the direct difference in LZ complexity between task and rest, our results show a reduction of complexity in frontal regions, and an increase in the rest of the brain during the task (Fig. 1a). Our decomposition shows that the vast majority of this difference (approximately 7.5 out of 8 units) can be explained by spectral effects (Fig. 1b). Interestingly, contrasting effects are found in the phase and interaction components. In particular, during task there is a strong and heavily localised phase-amplitude interaction component, which becomes much weaker and more spatially homogeneous during rest (Fig. 1c). Notably, both of these components show the opposite trend to the direct difference, with an increase in frontal regions and reductions elsewhere during task. The neurobiological implications of these findings will be developed in a separate publication.
Conclusion. In this paper we have tackled the problem of determining to what extent a measured difference in some quantity between two time series datasets can be attributed to differences between their power spectra. For this, we introduced a decomposition that uses a sequence of null models to disentangle the effect of spectral, phasic, and phase-amplitude interaction effects. Our decomposition requires no assumptions on the data (beyond that distinct samples within the data are independent), and is easy to compute. As a proof of concept, we provided an example of the decomposition yielding novel results on some neuroimaging data, more nuanced than what was previously possible with a standard analysis of LZ complexity.
Since this decomposition can be applied to any observed difference between two datasets, it promises to be a valuable tool for practitioners in multiple scientific disciplines. Moreover, it will help to deepen our understanding of the behaviour of non-linear properties on datasets describing complex systems.
The authors thank Lionel Barnett and Anil Seth for valuable discussions, and two anonymous referees for comments on earlier versions of this manuscript. We also thank Aleksi Ikkala and Darren Price for vital background work, and Yike Guo for supporting this research. P.M. and D.B. are funded by the Wellcome Trust (grant no. 210920/Z/18/Z). F.R. is supported by the Ad Astra Chandaria foundation. D.B. conceptualised the work. A.B.B. guided the writing of the paper.
Let us start by considering the problem of testing whether an observable of interest $f(x) \in \mathbb{R}$ depends only on the power spectrum of a single dataset composed of the time series $x = (x_1, \dots, x_T) \in \mathbb{R}^T$. The discrete Fourier transform of x is denoted as $\hat{x} = \mathcal{F}\{x\} \in \mathbb{C}^T$, with amplitude $A(x) = |\hat{x}| \in \mathbb{R}^T$ and phase $\phi(x) = \arctan(\mathrm{Im}(\hat{x})/\mathrm{Re}(\hat{x})) \in [0, 2\pi]^T$. A simple procedure, known as phase randomisation [1], is to compare the value of f(x) against a null distribution given by the random variable $f(x_{\mathrm{pr}})$, where $x_{\mathrm{pr}}$ is surrogate data obtained by taking the Fourier transform of x, adding an independent random phase to each component, and then taking the inverse Fourier transform. Hence, $x_{\mathrm{pr}} = \mathcal{F}^{-1}\{A(x)e^{i\phi}\}$, where $\phi$ is a random vector of uniformly distributed phases, denoted by $\phi \sim \mathcal{R}$. (Technically, half of the entries of $\phi$ are uniformly distributed over $[0, 2\pi]$ while the other half are their complex conjugates, so that $x_{\mathrm{pr}}$ is a real vector [2].) Accordingly, the null hypothesis that there is no genuine non-linear structure is rejected if the quantity $f(x) - \mathbb{E}\{f(x_{\mathrm{pr}})\}$ is significantly different from zero [3].
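A minimal sketch of this phase-randomisation procedure (assuming NumPy; using `rfft`/`irfft` enforces the conjugate-symmetry constraint mentioned above, so the surrogate is real by construction):

```python
import numpy as np

rng = np.random.default_rng(1)

def phase_randomise(x):
    """Phase-randomised surrogate: keep the amplitude spectrum |FFT(x)|,
    draw new uniformly distributed phases for all other bins."""
    xf = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(xf))
    phases[0] = 0.0               # DC component must stay real
    if len(x) % 2 == 0:
        phases[-1] = 0.0          # Nyquist bin must stay real too
    return np.fft.irfft(np.abs(xf) * np.exp(1j * phases), n=len(x))
```

One caveat of this sketch: the DC bin keeps only the magnitude of the mean, so signals are typically demeaned before surrogate generation.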
Let us now consider a slightly more complex scenario with two time series x and y, and ask whether the difference $\delta := f(x) - f(y)$ can be attributed solely to differences in their spectra. A naive approach to this problem, considered here for illustration purposes, would be to compare $\delta$ against the null distribution that corresponds to the random variable
$$\tilde{\delta} := f(x_{\mathrm{pr}}) - f(y_{\mathrm{pr}}), \tag{1}$$
where $x_{\mathrm{pr}} = \mathcal{F}^{-1}\{A(x)e^{i\phi_1}\}$ and $y_{\mathrm{pr}} = \mathcal{F}^{-1}\{A(y)e^{i\phi_2}\}$, with $\phi_1, \phi_2 \sim \mathcal{R}$ being statistically independent of $\hat{x}$, $\hat{y}$, and of each other. One would then reject the null hypothesis if $\delta - \mathbb{E}\{\tilde{\delta}\}$ is significantly different from zero. This is equivalent to testing the difference between the "corrected" values $f(x) - \mathbb{E}\{f(x_{\mathrm{pr}})\}$ and $f(y) - \mathbb{E}\{f(y_{\mathrm{pr}})\}$. Unfortunately, this test generally fails because the difference between two null models is not necessarily a good null model of the difference. More specifically, the differences $\tilde{\delta}$ that are seen when injecting randomised phases may not be representative of the differences that are seen when injecting phases from other (plausibly more realistic) distributions.

[Fig. 1 caption: b) To generate phases we used a simple model with a roughness parameter c that interpolates between constant phase (c = 0) and fully random phases (c = 1). The Lempel-Ziv complexity [5] of signals generated with both spectra and the same phase is shown in blue (bottom) and red (top). c) Naive applications of typical surrogate methods, like the one described in Eq. (1), incorrectly reject the null hypothesis when applied to two time series with identical phases but different spectra.]
To illustrate these ideas, let us consider two given power spectra, shown in Fig. 1a, and generate random phases via a simple model equipped with a roughness parameter $c \in [0, 1]$ that interpolates between a constant phase (c = 0) and fully random phases (c = 1) [4]. With this, we can build $(x_c, y_c)$ pairs by combining the two spectra in Fig. 1a with the same phase $\phi_c$ generated for a given $c \in [0, 1]$. The null hypothesis is true (by construction) for all these pairs, as $x_c$ and $y_c$ differ only in their spectra. However, as shown in Fig. 1c, the test in Eq. (1) results in a large number of false positives.
To understand this result, note that the value of $\mathbb{E}\{f(x_c) - f(y_c)\}$ (i.e. the expected value of $\delta$ under the null hypothesis) roughly corresponds to the gap between the point clouds in Fig. 1b. However, the test in Eq. (1) only probes this quantity at c = 1. The crux is that the two spectra "max out" at different values of f, making the quantity $\mathbb{E}\{\tilde{\delta}\}$ non-zero and, in turn, introducing a bias in the test for c < 1.

THE DECOMPOSITION IS INVARIANT TO ORDERING OF THE NULL MODELS
Here we prove that our proposed decomposition does not depend on the order in which phasic and spectral effects are considered. In particular, instead of the studied sequence of null models $\mathcal{M}_i \to \mathcal{M}_\phi \to \mathcal{M}_A$, one could also consider $\mathcal{M}_i \to \mathcal{M}_A \to \mathcal{M}_\phi$, in which the effect of the spectrum is considered before the effect of the phases. In the rest of this section, we prove that this second sequence of null models gives the same decomposition.
Proposition. The decomposition $D_1$, obtained from the sequence of null models $\mathcal{M}_i \to \mathcal{M}_\phi \to \mathcal{M}_A$, is equivalent to the decomposition $D_2$ obtained from $\mathcal{M}_i \to \mathcal{M}_A \to \mathcal{M}_\phi$.

Proof. Because both decompositions start with $\mathcal{M}_i$, it is clear that the terms corresponding to the spectrum-phase interaction coincide. Therefore, it is enough to show that either the spectral or the phasic contribution is the same, as proving one implies the other. Our strategy is to prove that the spectral component assessed in the third step of $D_1$, given by $\Delta^A(x^N, y^M)$, is equal to the spectral component estimated in the second stage of $D_2$.
To prove this, let us introduce $x_j^y = \mathcal{F}^{-1}\{A(x_j)e^{i\phi(\hat{y}_\beta)}\}$ and $y_k^x = \mathcal{F}^{-1}\{A(y_k)e^{i\phi(\hat{x}_\alpha)}\}$, where $\alpha$ and $\beta$ are integers sampled uniformly at random from $\{1, \dots, N\}$ and $\{1, \dots, M\}$, respectively. Put simply, $x_j^y$ and $y_k^x$ have the spectrum of one process combined with a randomly sampled phase from the other. Furthermore, let us use the notation $\nu^c(x^N) = \frac{1}{N}\sum_{j=1}^{N} f(x_j^y)$ and $\nu^c(y^M) = \frac{1}{M}\sum_{k=1}^{M} f(y_k^x)$.