Fourier-domain transfer entropy spectrum

We propose the Fourier-domain transfer entropy spectrum, a generalization of transfer entropy, as a model-free metric of causality. For arbitrary systems, this approach systematically quantifies the causality among different system components rather than merely analyzing systems as entireties. The generated spectrum offers an information-rich representation of time-varying latent causal relations and deals efficiently with non-stationary processes and high-dimensional conditions. We demonstrate its validity in the aspects of parameter dependence, statistical significance testing, and sensitivity. An open-source multi-platform implementation of this metric is developed and applied to neuroscience data sets and diffusively coupled logistic oscillators.


Introduction
Measuring causality between systems is critical in numerous scientific disciplines, ranging from physics [1] to neuroscience [2]. The challenge of such measurement arises from a widespread situation where only a limited set of time series of the systems is given, and prior knowledge of their coupling relations remains absent. Over the last decades, identifying and quantifying causal relations with insufficient information has become one of the frontier problems in physics, mathematics, and statistics [3]. Substantial progress has been accomplished in measuring causality from statistics-theoretical (e.g., Granger causality [4] and its generalizations [5,6,7,8,9,10]) and information-theoretical (e.g., transfer entropy [11] and its generalizations [12,13,14,15,16,17,18,19,20]) perspectives. Although the nature of causality remains controversial [21], these two perspectives have been demonstrated to have non-parametric connections in mathematics [22,23] and share equivalent ideas in potential causality detection. Following their ideas, the existence of causality between systems X and Y (e.g., X is the cause of Y) can be interpreted as

P(Y(t) | Y[t,β,∆t]) ≠ P(Y(t) | Y[t,β,∆t], X[t,β,∆t]), (1)

where P(·) denotes probability. We mark X[t,β,∆t] = (X(t − β∆t), ..., X(t − ∆t)) as the history of X with time lag unit ∆t and maximum lag number β, representing the information of system X during the time interval [t − β∆t, t − ∆t]. In general, inequality (1) means that the uncertainty of Y at moment t given its own historical information (term Y[t,β,∆t]) will be regulated if the historical information of X (term X[t,β,∆t]) is added.
Among various causality metrics, transfer entropy [11] features plentiful applications (e.g., in neuroscience [24,25,26,27,28] and economics [29,30,31,32,33]) because of its robust capacity to capture non-linear causality and its intrinsic connections with dynamics theory and information theory [11,3]. The initial version of transfer entropy can be defined as

T_{X→Y} = E[ log ( P(Y(t) | Y[t,β,∆t], X[t,β,∆t]) / P(Y(t) | Y[t,β,∆t]) ) ] = I(Y(t); X[t,β,∆t] | Y[t,β,∆t]), (2)

which is actually equivalent to conditional mutual information I(·;·|·) [34,3]. Equality (2) quantifies the regulating effects in inequality (1) in terms of the expectation of the mutual information between Y(t) and X[t,β,∆t] given Y[t,β,∆t]. While the model-free property of transfer entropy suggests its general applicability, its application in many real situations is still limited by various deficiencies, such as the curse of dimensionality [15] and noise sensitivity [35]. Although extensive efforts have been devoted to optimizing transfer entropy estimation in real data (e.g., through symbolization [12], phase time-series [13], ensemble evaluation [16], Lempel-Ziv complexity [18], artificial neural networks [19], and k-nearest-neighbor estimation [36]), several perspectives remain to be improved: (I) previous works primarily analyze systems as entireties, yet it is more ubiquitous that causality only originates from specific system components (e.g., there may only exist causality between the low-frequency components of X and Y); such causality may be veiled if we do not distinguish between different system components; (II) causality is time-varying; although the initial version of transfer entropy can be calculated cumulatively or in sliding time windows, a more natural and accurate measurement remains absent; (III) probability density estimation, a prerequisite of transfer entropy calculation, may be costly (e.g., critically requiring a large sample size) or inaccurate (e.g., in non-stationary systems and high-dimensional spaces).
Although some approaches (e.g., ensemble evaluation in cyclo-stationary systems [16,37] and k-nearest-neighbor estimation [36] in high-dimensional spaces) have been developed to optimize the estimation, a low-cost transfer entropy generalization designed from the outset to deal with non-stationary systems is still demanded.
In this letter, we discuss the possibility of generalizing transfer entropy into a form that maintains its original advantages and makes progress on aspects (I-III).
In the spirit of open science, we release a multi-platform code implementation of our metric and demonstrate its validity on multiple data sets in neuroscience and physics.
Fourier-domain representation of systems

An arbitrary system X can be subdivided into a set of components according to different criteria. To maintain the model-free property of transfer entropy measurement, we concentrate on a universal criterion subdividing X into different frequency components, namely the Fourier-domain (frequency-domain) representation. Such a representation features widespread applications in physics [38], mathematics [39], neuroscience [40], and engineering [41], partly ensuring the general applicability of our approach.
To avoid the stationarity assumption, one can derive the Fourier-domain representation F(X, t, ω) of X (here ω denotes frequency) by applying the wavelet transform (WT) [42] or the Hilbert-Huang transform (HHT) [43] rather than the Fourier transform (FT), owing to the limitations of FT for non-stationary processes [44]. In general, WT convolves X with oscillatory kernels of different frequency bands to realize time-frequency localization [42], while HHT directly measures the instantaneous frequency of X [43]. Although WT and HHT support time-frequency analysis without a stationarity assumption, they are still not ideal owing to certain limitations and should be selected according to circumstances [45,43]. Here we choose WT to conduct our derivations owing to its complete mathematical foundations, yet other frameworks (e.g., HHT and its variants [46]) can also be applied.
In terms of the continuous WT spectrum, the Fourier-domain representation F(X, t, ω) can be obtained through [47,48,49]

W(X, s, τ) = (1/√s) ∫ X(t) ψ*((t − τ)/s) dt, (3)
F(X, t, ω) = |W(X, ζψ(ω), t)|², (4)

where ψ ∈ L²(R) denotes a mother wavelet function (e.g., the Morlet wavelet) and the superscript * represents complex conjugation. Parameters s and τ denote the scaling (frequency localization) and the translation (time localization) parameters, respectively. Function ζψ(·) is the bijective mapping from frequency ω to scale s, varying according to the selection of the mother wavelet function ψ. Because the approaches in (3-4) have been extensively studied and applied [47,48,49], we do not repeat their derivations.
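As a concrete illustration, the wavelet power construction described above can be sketched in a few lines of Python. The function below (its name, the direct convolution scheme, and the Morlet centre frequency are our own assumptions, not the released toolbox) computes a time-frequency power spectrum over a chosen set of frequencies:

```python
import numpy as np

def morlet_power_spectrum(x, freqs, fs, w0=6.0):
    """Sketch of a time-frequency power spectrum F(X, t, w) via a
    continuous Morlet wavelet transform.

    x     : 1-D signal
    freqs : frequencies of interest (Hz)
    fs    : sampling rate (Hz)
    w0    : Morlet centre frequency (dimensionless)
    """
    t = np.arange(len(x)) / fs
    power = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        s = w0 / (2 * np.pi * f)                 # frequency -> scale mapping
        u = (t - t[len(t) // 2]) / s             # centred wavelet support
        psi = (np.exp(1j * w0 * u) * np.exp(-u**2 / 2)
               / (np.pi**0.25 * np.sqrt(s)))     # scaled Morlet wavelet
        # correlation with the conjugated wavelet, realized as convolution
        coef = np.convolve(x, np.conj(psi)[::-1], mode="same") / fs
        power[i] = np.abs(coef) ** 2             # squared modulus = power
    return power
```

For a pure 5 Hz sine sampled at 100 Hz, the power concentrates in the 5 Hz row of the returned array, as expected of a time-frequency localization.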
The Fourier-domain representation F(X, t, ω) can function as a time-frequency spectrum (t, ω) of system X, where the spectrum value denotes power. In Fig. 1a, we show the Fourier-domain representations of an open-source near-infrared spectroscopy (NIRS, a kind of electromagnetic-spectrum-based brain imaging approach) data set recorded in the superior frontal cortex [50,51]: F(X, t, ω) (subject 1) and F(Y, t, ω) (subject 2). Function ψ is set as the Morlet wavelet.

Fourier-domain transfer entropy spectrum
Once F (X, t, ω) and F (Y, t, ω) have been given, we turn to quantify the time-varying latent causality between them in terms of transfer entropy. Below, we discuss one possibility of this measurement.
To overcome the deficiency of probability density estimation (see (III)), we transform these spectra by symbolization. Specifically, we generalize the approach in [52] to implement a two-dimensional symbolization F̂(X, t, ω) with a time symbolization order λ and a frequency symbolization order θ. Formulas (5-9) map each F(X, t, ω) to a two-dimensional coordinate (Λ_X(t, ω), Θ_X(t, ω)). The first coordinate component Λ_X(t, ω) corresponds to the symbolization along the time-line, while the second corresponds to the symbolization along the frequency-line. Both coordinate components are decimal numbers. The symbolization relies on a permutation γ(·) that returns the order of a sequence in ascending sort (e.g., γ(1, 6, 3, 10) = (1, 3, 2, 4) because 1 < 3 < 6 < 10). After subtracting one from each order (e.g., (1, 3, 2, 4) becomes (0, 2, 1, 3)), the sequence becomes a number in λ-base (or θ-base) and can be transformed into a decimal number following (6-7). In Fig. 1b, we show the symbolization results of the NIRS data under two different conditions.
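The permutation scheme described above can be sketched as follows. This is a minimal illustration of the ordinal λ-base encoding; the edge handling, unit stride, and all names are our own assumptions rather than the released implementation:

```python
import numpy as np

def ordinal_code(window):
    """Map a 1-D window to a decimal ordinal-pattern code: rank the
    values in ascending sort (the permutation gamma, minus one), then
    read the rank sequence as a base-len(window) number.
    Ties are broken by position (argsort order)."""
    ranks = np.argsort(np.argsort(window))  # e.g. (1, 6, 3, 10) -> (0, 2, 1, 3)
    base = len(window)
    code = 0
    for r in ranks:
        code = code * base + int(r)         # base-expansion to a decimal number
    return code

def symbolize_spectrum(F, lam=3, theta=3):
    """Two-dimensional symbolization of a power spectrum F[freq, time]:
    Lambda codes slide along the time axis, Theta codes along the
    frequency axis."""
    n_w, n_t = F.shape
    Lam = np.zeros((n_w, n_t - lam + 1), dtype=int)
    The = np.zeros((n_w - theta + 1, n_t), dtype=int)
    for i in range(n_w):
        for j in range(n_t - lam + 1):
            Lam[i, j] = ordinal_code(F[i, j:j + lam])      # time-line symbols
    for i in range(n_w - theta + 1):
        for j in range(n_t):
            The[i, j] = ordinal_code(F[i:i + theta, j])    # frequency-line symbols
    return Lam, The
```

For the example in the text, (1, 6, 3, 10) yields the rank sequence (0, 2, 1, 3) and hence the base-4 number 0·64 + 2·16 + 1·4 + 3 = 39.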
The symbolization benefits our research by embedding F(X, t, ω), a space with a large cardinal number (the spectrum values of F(X, t, ω) admit a mass of possibilities because the power in (3-4) is a continuous variable), into a space with a much smaller cardinal number (F̂(X, t, ω) is a discrete space whose cardinal number is no more than λ!) [52]. It functions as a coarse-graining approach to control noise and supports a more adequate sampling of the historical information (e.g., X[t,β,∆t] in (2)) during probability density estimation (because the variable becomes discrete and its variability is reduced) [17]. These benefits should be emphasized especially when X is non-stationary and repetitive sampling is not feasible. In sum, the symbolization in (5-9) can, to a certain extent, alleviate deficiency (III). Parameters λ and θ control the granularity of coarse-graining (e.g., see Fig. 1b). They should not be too large; otherwise, the symbolization becomes too fine-grained and unnecessary [53].
In Fig. 1c, we show the Fourier-domain transfer entropy spectrum of the NIRS data. The symbolization is implemented under a relatively coarse-grained condition (λ, θ) = (3, 3). Each T(X, Y, t, ω) is measured with β = 5. For convenience, we set each negative T(X, Y, t, ω) to 0 (this setting can be dropped when decoupling effects are of interest). In our results, we can see relatively strong and oscillating causality between X and Y around ω ∼ 0.1, ω ∼ 0.25, and ω ∼ 0.4. The causality in other frequency regions is relatively weak. We also compare T(X, Y, t, ω) with the well-known wavelet coherence spectrum [54], which quantifies the time-varying linear coherence between X and Y. Consistent with common knowledge, the causality measured by T(X, Y, t, ω) cannot be trivially reduced to linear coherence because the two metrics show no clearly accordant pattern.
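For reference, a plug-in transfer entropy estimate over symbolized (discrete) sequences, in the spirit of (2) with lag unit ∆t = 1, can be sketched as below. This toy estimator (its naming and the simple counting scheme are our own assumptions, not the released implementation) computes the conditional mutual information from empirical counts:

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y, beta=1):
    """Plug-in transfer entropy T(X -> Y) for discrete symbol sequences:
    the conditional mutual information I(Y(t); X_past | Y_past) with
    history length beta and lag unit 1, in bits."""
    n = len(y)
    triples = Counter()
    for t in range(beta, n):
        yp = tuple(y[t - beta:t])          # history of the target Y
        xp = tuple(x[t - beta:t])          # history of the source X
        triples[(y[t], yp, xp)] += 1
    total = sum(triples.values())
    # marginal counts needed for the CMI decomposition
    c_yp_xp = Counter(); c_y_yp = Counter(); c_yp = Counter()
    for (yt, yp, xp), c in triples.items():
        c_yp_xp[(yp, xp)] += c
        c_y_yp[(yt, yp)] += c
        c_yp[yp] += c
    te = 0.0
    for (yt, yp, xp), c in triples.items():
        p = c / total
        te += p * np.log2((c * c_yp[yp]) / (c_yp_xp[(yp, xp)] * c_y_yp[(yt, yp)]))
    return te
```

When Y simply copies X with a one-step lag, the estimate approaches 1 bit in the X → Y direction and stays near 0 in the reverse direction, as the asymmetry of transfer entropy demands.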
Meanwhile, we can sum T(X, Y, t, ω) over frequency at each moment (see (11)), over time at each frequency (see (12)), or over both time and frequency (see (13)):

T(X, Y, t) = Σ_ω T(X, Y, t, ω), (11)
T(X, Y, ω) = Σ_t T(X, Y, t, ω), (12)
T(X, Y) = Σ_t Σ_ω T(X, Y, t, ω). (13)
Formulas (11-13) denote the varying Fourier-domain transfer entropy across time, across frequency, and over all possibilities, respectively. Note that T(X, Y) is always non-negative. Combining (11-13) with the asymmetry property of transfer entropy, we define the difference T(X, Y) − T(Y, X) (and its time- and frequency-resolved analogues) to quantify the directional information transfer between X and Y. In Fig. 1d, we illustrate these quantities based on the results in Fig. 1c. A stronger information transfer from X to Y can be observed along both the time- and the frequency-line, surpassing the transfer from Y to X. Such a directional transfer is significant when ω ∼ 0.1.
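In code, these marginals and the asymmetry-based directional measure reduce to simple array sums. The sketch below (its names and the assumed array layout T_spec[t, ω] are our own) mirrors (11)-(13):

```python
import numpy as np

def te_marginals(T_spec):
    """Marginals of a non-negative TE spectrum T_spec[t, w]:
    time marginal (sum over frequency), frequency marginal (sum over
    time), and the total over both axes."""
    T_time = T_spec.sum(axis=1)   # sum over frequency at each moment
    T_freq = T_spec.sum(axis=0)   # sum over time at each frequency
    return T_time, T_freq, float(T_spec.sum())

def net_transfer(T_xy, T_yx):
    """Directional information transfer exploiting the asymmetry of
    transfer entropy (an assumed difference form): positive values
    indicate a net X -> Y transfer."""
    return T_xy - T_yx
```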

Validity of Fourier-domain transfer entropy spectrum
After proposing the Fourier-domain transfer entropy spectrum, we turn to analyzing its validity.
A valid causality metric should have clear dependence relations with all involved parameters. Although the Fourier-domain transfer entropy spectrum is model-free and makes no pre-assumption about system properties, the symbolization and the transfer entropy definition itself still introduce three parameters: λ, θ, and β. Serving as coarse-graining parameters, λ and θ determine the granularity of symbolization. Excessively large λ and θ may not only make the symbolization meaningless [53] but also hamper probability density estimation (because the variability of the variables increases). An inexhaustive sampling frequently underestimates specific parts of the probability density, especially in high-dimensional spaces. Therefore, we speculate that T(X, Y, t, ω) may be underestimated when λ and θ become too large. In Fig. 2a, we calculate T(X, Y, t, ω) in the NIRS data with different parameters. For convenience, we set each negative T(X, Y, t, ω) to 0. Consistent with our inference, the quantities E_{t,ω}[T(X, Y, t, ω)] (expectation), Var_{t,ω}[T(X, Y, t, ω)] (variance), and T(X, Y) approach 0 as λ and θ increase. Similar effects can be observed for the lag parameter β but are much slighter. In sum, the Fourier-domain transfer entropy spectrum should be implemented with relatively small λ and θ. The selection of β is less restricted and can be carried out according to different demands.
Moreover, a valid causality metric should be equipped with a statistical significance test. Information-theoretical metrics, including our Fourier-domain transfer entropy spectrum, frequently involve biases in finite data sets [55]. Therefore, the absolute value of T(X, Y, t, ω) has only limited meaning until it is statistically tested. Combining time-shifting surrogates [56,57,58] with the permutation test [59,60], we design the significance test as follows: (1) generate a set of surrogates {X_δ | X_δ(t) = X(t + δ), δ ∈ N+} and calculate {T(X_δ, Y, t)}_t for each surrogate X_δ; (2) implement a permutation test for the difference in mean value between {T(X, Y, t)}_t and ∪_δ {T(X_δ, Y, t)}_t to calculate the statistical significance p and the effect size ξ. A similar idea can be seen in [61]. In practice, a minimal condition with p ≤ 10^-2 and ξ ≥ 0.2 (in the comparison of mean values, ξ ∼ 0.2, ξ ∼ 0.5, and ξ ∼ 0.8 correspond to small, medium, and large effect sizes, respectively) should be satisfied for the Fourier-domain transfer entropy spectrum to be statistically significant. In Fig. 2b, we test the results in Fig. 1c-1d, where 50 surrogates are generated by randomizing δ ∈ [3, 100]. The generated ∪_δ {T(X_δ, Y, t)}_t features a large sample size of 191350, supporting a robust statistical test. We implement a large-scale permutation test with 5000 permutations, demonstrating the statistical significance of T(X, Y, t) and T(Y, X, t) with ideal effect sizes. Meanwhile, the effect size for T(Y, X, t) is smaller than that for T(X, Y, t), corroborating our observations in Fig. 1d about the directional information transfer in the NIRS data.
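The two-step test can be sketched as below, assuming a user-supplied function that returns the series {T(X, Y, t)}_t. The circular shifting, the pooled-variance Cohen's d as the effect size ξ, and all names are our own simplifying assumptions:

```python
import numpy as np

def surrogate_permutation_test(te_series, x, y, shifts, n_perm=1000, seed=0):
    """Time-shift surrogate significance test (sketch).

    te_series(x, y) -> 1-D array of TE values over time (assumed).
    Surrogates X_delta(t) = X(t + delta) are built by circular shifts;
    the difference in means between the original series and the pooled
    surrogate values is assessed by a permutation test, with Cohen's d
    on the pooled standard deviation as the effect size."""
    rng = np.random.default_rng(seed)
    orig = np.asarray(te_series(x, y), dtype=float)
    sur = np.concatenate([np.asarray(te_series(np.roll(x, -d), y), dtype=float)
                          for d in shifts])
    obs = orig.mean() - sur.mean()
    n1, n2 = len(orig), len(sur)
    pooled = np.concatenate([orig, sur])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                     # relabel original vs surrogate
        if pooled[:n1].mean() - pooled[n1:].mean() >= obs:
            hits += 1
    p = (hits + 1) / (n_perm + 1)               # one-sided permutation p-value
    s = np.sqrt(((n1 - 1) * orig.var(ddof=1) + (n2 - 1) * sur.var(ddof=1))
                / (n1 + n2 - 2))
    xi = obs / s if s > 0 else float("inf")     # effect size (Cohen's d)
    return p, xi
```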
Finally, a valid causality metric is expected to be sensitive to changes of the coupling strength between X and Y. In other words, T(X, Y, t, ω) should be regulated in consequence if we manipulate the coupling strength. To realize the manipulation, we consider two diffusively coupled logistic oscillators X → Y in Fig. 2c-2d, where → denotes the direction of coupling. Oscillators X and Y are defined with a bifurcation parameter µ = 4, and their coupling strength is denoted by α. For more mathematical details, we refer to [62]. We manipulate the coupling strength α from 0 to 0.4 and apply an open-source algorithm [63] to generate the time series of X and Y under each α condition. The generation is repeated 50 times in each case, corresponding to different random initial values. For each pair of time series, we calculate the Fourier-domain transfer entropy spectrum with λ = θ = 3 and β = 1. Meanwhile, 5 surrogates are generated by randomizing δ ∈ [3, 100], leading to a large sample size of 24985 in the statistical significance test. The number of permutations is set to 5000. In Fig. 2c-2d, we demonstrate that the Fourier-domain transfer entropy spectrum and its effect sizes (with p < 10^-2) increase with the coupling strength α. In other words, the quantified latent causality increases as a consequence of coupling strength enhancement.
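One common diffusive coupling form for such logistic maps (an assumption on our part; the scheme in [62,63] may differ in detail) can be simulated as:

```python
import numpy as np

def coupled_logistic(alpha, n=2000, mu=4.0, seed=None):
    """Two unidirectionally (diffusively) coupled logistic maps X -> Y:
        x_{t+1} = f(x_t)
        y_{t+1} = f((1 - alpha) * y_t + alpha * x_t),  f(u) = mu*u*(1-u).
    alpha = 0 decouples the pair; larger alpha strengthens the X -> Y
    drive. One common coupling form, assumed here for illustration."""
    rng = np.random.default_rng(seed)
    x, y = rng.uniform(0.1, 0.9, size=2)   # random initial values
    xs = np.empty(n)
    ys = np.empty(n)
    f = lambda u: mu * u * (1.0 - u)
    for t in range(n):
        xs[t], ys[t] = x, y
        x, y = f(x), f((1.0 - alpha) * y + alpha * x)
    return xs, ys
```

At α = 1 the driven map reproduces the driver exactly after one step, while at α = 0 the two chaotic trajectories evolve independently, which makes this system a convenient ground truth for sensitivity tests.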

Conclusion
We present the Fourier-domain transfer entropy spectrum, a model-free metric of causality, to extract the time-varying latent causality among the components of an arbitrary pair of systems in the Fourier domain. Built on the wavelet transform and symbolization, our approach can partly alleviate deficiencies (I-III) and is generally applicable to various non-stationary or high-dimensional processes (e.g., neuroscience data). We have demonstrated the validity of our approach in the aspects of parameter dependence, statistical significance testing, and sensitivity. An open-source release of our algorithm can be found in [64].