Neuronal Temporal Filters as Normal Mode Extractors

To generate actions in the face of physiological delays, the brain must predict the future. Here we explore how prediction may lie at the core of brain function by considering a neuron predicting the future of a scalar time series input. Assuming that the dynamics of the lag vector (a vector composed of several consecutive elements of the time series) are locally linear, Normal Mode Decomposition decomposes the dynamics into independently evolving (eigen-)modes allowing for straightforward prediction. We propose that a neuron learns the top mode and projects its input onto the associated subspace. Under this interpretation, the temporal filter of a neuron corresponds to the left eigenvector of a generalized eigenvalue problem. We mathematically analyze the operation of such an algorithm on noisy observations of synthetic data generated by a linear system. Interestingly, the shape of the temporal filter varies with the signal-to-noise ratio (SNR): a noisy input yields a monophasic filter and a growing SNR leads to multiphasic filters with progressively greater number of phases. Such variation in the temporal filter with input SNR resembles that observed experimentally in biological neurons.


I. INTRODUCTION
The brain must generate behavior in the face of physiological delays on multiple levels: from sensory transduction to axonal conduction to synaptic transmission and muscle activation.To compensate for such delays, it would be useful even for individual neurons to predict future inputs.Not surprisingly, the paradigm of optimal prediction has been used to derive normative models of neurons.Such models can be loosely classified into two categories [1,2]: predictive coding [3][4][5][6] and predictive information [7,8].
In predictive coding, a neuron computes an optimal prediction of the signal and subtracts it from the actual value, transmitting prediction error downstream [3][4][5].This results in an optimal linear filter whose shape depends on the statistics of the input.Predictive coding approximately accounts for the change in the temporal receptive field with the input SNR (Fig. 1A) but fails to reproduce the exact shape of the filter because it has a narrow unitary peak at the exact time of the actual signal (usually present time) [3].Such peak can be smoothed by adding a de-noising objective but that introduces additional parameters to the model.
In predictive information, a neuron filters the past signal to preserve only the part which is most informative about the future [2,7,8].Such framework expands the assortment of temporal filters depending on the various * Equal contribution parameters such as the balance between the past/output and output/future mutual information.Recently, it was shown that multi-phasic filter can be derived using this formalism as well [9].
A shortcoming of both approaches is that they require setting the time point relative to the present for which the neuron generates prediction.Such formulation of the prediction problem does not seem suitable when neurons might have different amounts of delay, or the goal is to predict (and act upon) a trend or a mode of the signal that spans multiple time points in the future.Moreover, both approaches view the neuronal filter as a trade off between different mathematical terms in the optimization objective and therefore depend on the relative weighting of these terms.Therefore, the shape of the temporal kernel depends on these -as well as other -hyperparameters of the model.
An alternative approach to prediction assumes that the sensory stimuli are generated by a (generally nonlinear) dynamical system.If such dynamics can be learned by the brain from previously seen trajectories, then prediction can be cast as identifying long-living (growing or slow) modes (or manifolds) in the input and extrapolating them into the future.To take the first step in this direction, we take advantage of the fact that in the vicinity of a hyperbolic fixed point the invariant manifolds can be well approximated by the invariant subspaces of the linearized dynamics (see e.g.Sec.19.12a of [10]).Therefore, a neuron that identifies the least decaying linear mode can identify an invariant manifold of a fixed point and generate a non-trivial prediction.Here, we explore the hypothesis that a neuron extracts the least decaying mode from the incoming scalar time series as in Normal Mode Decomposition [11].To cast this problem in the language of dynamics, we construct time-lag vectors by windowing the scalar time series.We assume that the system is in the vicinity of a hyperbolic fixed point and identify the linear dynamics of the lag vectors.The eigen-decomposition of the linear transition matrix yields the eigenmodes that evolve independently as the right eigenvectors and the corresponding linear filters as the left eigenvectors.Therefore, the top left eigenvector yields a linear filter that projects onto the most dominant mode, representing neuronal output.Thus, unlike the previous predictive coding and predictive information approaches which optimize prediction for a certain time in the future, in our proposal, a neuron learns a rank-1 approximation of the dynamical system generating the input and outputs the top mode which develops into the future in a predictable way.
Our theoretical proposal makes a prediction that can be compared with experimental data.We find that the shape of the top left eigenvector depends on the input SNR: monophasic filter at low SNR with adding phases for growing SNR (Fig. 1C).This prediction is reminiscent of the SNR-induced variation in the form of the Spike-Triggered Average (STA) of various types of biological neurons (Fig. 1AB) [3,[12][13][14][15][16].Although this does not prove our proposal it suggests that this approach may be a step in the right direction.

II. NORMAL MODE DECOMPOSITION (NMD)
In this Section, we review NMD [11] in the context of a neuron with a Single Input and a Single Output (SISO) [17].

A. Problem statement and linearization
Let {x(t)} t=1,2,... be a scalar time series satisfying the non-linear relation where f : R n → R is a continuously differentiable function and n ≥ 1 is referred to as the lag.We say x * ∈ R is a fixed point if x * = f (x * , . . ., x * ).We embed a scalar time series x(t), t = 1, . . ., T + n, into a sequence of lag vectors The dynamics of the first n − 1 components of the lag vector is given by a simple shift in the time-step (i.e.
and the dynamics of the final component x(t) is given by Eq. (1 ).These can be summarized as x t+1 = F(x t ), where the function F is given by (2) We assume that the dynamics are dominated by the presence of a hyperbolic fixed point at x * .We approximate dynamics (2) by expanding around this fixed point.Defining δx t = x t − x * , we have: where we have used the fact that F(x * ) = x * and ∇F(x * ) is the Jacobian evaluated at x * .After subtracting x * from both sides: Letting A := ∇F(x * ) and assuming without loss of generality that x * = 0, we have the following linear approximation to the nonlinear dynamics in equation ( 2) near the fixed point: From the definition of the function F in Eq. ( 2), we see that A = ∇F(0) takes a companion matrix form [18] where

B. Eigendecomposition of the dynamics
The matrix A ∈ R n×n is generically a non-normal matrix possessing an eigendecomposition: where λ i are the eigenvalues and w i (resp.v ⊤ i ) are the right (resp.left) eigenvectors.Here, we consider matrix A describing a hyperbolic fixed point (λ i ̸ = 1).Moreover, we assume, that all eigenvalues are real and are sorted in descending order λ i ≥ λ i+1 .We also assume that the top mode is non-degenerate (i.e.λ 1 > λ 2 ).
The normal modes are found by eigendecomposition of A. As a matrix of companion form, A is diagonalized using the Vandermonde matrix and its inverse [18]: where V is the Vandermonde matrix and Λ := diag(λ 1 , . . ., λ n ) is the diagonal matrix of eigenvalues λ i , which are the roots of the characteristic polynomial with coefficients c i .The left eigenvectors of A, are given by the rows of the inverse of the Vandermonde matrix V −1 .It is easily verified that the right eigenvectors of A, corresponding to the columns of the Vandermonde matrix, represent the individual modes of the dynamical system.

C. Projecting onto the dominant mode
We are interested in finding the dominant exponential mode of the dynamics.As stated above, the dominant exponential mode is represented as the top right eigenvector of A. Therefore, by projecting onto the top right eigenvector, we can isolate the dominant mode.
Because of the bi-orthogonality of the left and the right eigenvectors, the top left eigenvector (henceforth referred to as v 1 ) is orthogonal to all but the top right eigenvector.This allows us to use v 1 as a projector that zeros out all but the most dominant mode of the dynamics.We therefore propose that it is the neuron's goal to learn the top left eigenvector and project its input onto this vector.
In order to learn the top left eigenvector, one approach would be to first find matrix A and then perform the eigendecomposition on the inferred A matrix.This matrix can be found from data x(t) by minimizing the mean squared error: Eq. ( 6) can be solved via the following system of equations: where we introduce a matrix notation X = [x n , ..., x T +n−1 ] and X + = [x n+1 , ..., x T +n ].
In this paper, however, instead of directly solving for A via Eqs.( 6) and ( 7), we substitute the eigendecomposition, Eq.( 5) and multiply both sides by v ⊤ i on the left, leading to the equivalent generalized eigenvalue problem: This allows us to circumvent the intermediate step of computing the A matrix explicitly.

III. DETECTING THE DOMINANT MODE
In this Section, we apply our formalism to time-series synthetically generated by a linear dynamical system of order k.

A. Problem formulation
As stated in the introduction, we assume that we are in the vicinity of a hyperbolic fixed point of the dynamics of the lag vector.This means that the dynamics of the linearized system can be decomposed as the sum of k exponentials with real exponents determined by the Jacobian of f at the fixed point and coefficients determined by the initial conditions of the system.We also assume that an uncorrelated observation noise (see Fig. 2): In general, NMD extracts the exponents via an eigendecomposition, i.e. the eigenvalues of A, λ i = e ai .Since a neuron has a single output it can extract only one constituent.Because the greatest contribution to the future is given by the largest exponent (Figure 2) we propose that a neuron learns the top left eigenvector of A and projects its lag vector, x t , onto this eigenvector which can then be used for prediction and control.

B. Dependence of the filter on the noise level
In this section, we look at the effect of noise on the shape of the time kernel.We first discuss the effect of additive Gaussian noise analytically, we then verify this numerically on synthetically generated data.Previous work analyzing the application of NMD to data with observation noise have primarily focused on extracting the true eigenvalues and eigenvectors of the system, that is the dynamic modes of the dynamical system in the absence of noise [19,20].Here, we are interested in extracting the most dominant mode from noisy data, and as we will see (Fig. 5), the eigenvector of the noiseless system is not a good candidate for this task.
a. Analytical calculation.Let us assume that the time series given in Eq. ( 9) has additive white observation noise, that is we assume that X satisfies Eq. (3) but we observe X ϵ = X + η with η ∼ N (0, ϵ 2 ).We denote the noiseless data and ground truth dynamics by X and A, and call the data and the inferred dynamics in the presence of noise X ϵ and A ϵ .When averaging over the different draws of the noise, we have E , where I k is the identity matrix and S is the matrix of off-diagonal ones: Note that here E η denotes averaging only over the noise draws and the averaging over the samples is performed in the product of data matrices X + X ⊤ and X + X ⊤ .The putative A ϵ matrix (that is the A matrix that minimizes the MSE objective with noisy data X ϵ ) is now given by That is, given the ground truth dynamics and noiseless data covariances (i.e.A and XX ⊤ ), we can determine the inferred dynamics and eigenvectors in the presence of white noise.We provide further details of this computation in the supplementary materials Sec. A. While the eigenvalue equations are derived analytically, they are not solvable in closed form and we look at their numerical solutions for varying noise and exponents.Here, we summarize the results and provide some intuition.
When looking at the eigenvalues of A ϵ (Figure 3), we see that as the noise level ϵ increases, the real parts slowly decrease and eventually cross zero.At the same time, the imaginary parts of the eigenvalues start at zero and grow when the real part becomes negative.This is intuitively understandable: when the increased amount of noise reduces the ability of the system to differentiate different modes, it starts replacing these with complex oscillatory modes.Furthermore, at the noise value for which a mode's eigenvalue's real part crosses zero, we see that the top left eigenvector's number of phases decreases by one.This is due to the fact that the multi-phasic structure of the left eigenvectors are responsible for the bi-orthogonal structure of the left/right eigenvectors.
b. Numerical verification.Above we looked at the eigenvectors of A ϵ , i.e. after averaging over different noise draws.Here, we look at these for specific noise draws, that is for specific simulations of linear dynamical systems.We looked at the response of the algorithm to synthetic data constructed from dynamics in the vicinity of a hyperbolic fixed point.We initiated a number of trajectories with different initial conditions and noise draws.Our algorithm finds the top eigenvector on the dynamics inferred from these trajectories.We then applied the learned eigenvector to a previously unseen trajectory.Figure 2(Right) shows the projection of this time series onto the largest left eigenvector and we see that we correctly recover the dominant exponential.For details of this experiment and further results and figures see supplementary materials Sec. A.
Figure 4 shows the effect of various parameters such as the noise, the system order, and the lag vector length on the shape of the filter.We see that adding noise decreases the number of the phases as expected (Fig. 4A).Similarly as we decrease the system order k, the number of phases decrease (Fig. 4B) as expected from the bi-orthogonality argument given at the end of Sec.II. Figure 4C shows that with a finite amount of noise, as the length of the lag vector is increased, the number of phases can slowly increase.This is due to the fact that with longer lag vectors, there is higher noise tolerance.Indeed as we take the noise values to zero, we see that the lag vector length no longer affects the number of phases in the filter.Increasing the length of the lag vector merely stretches the filter shape (Fig. 6 in the supplementary materials).From these trajectories, we learn the top left eigenvector of the inferred A matrix.We use this as the time kernel of our proposed neuron.(C) We apply the time kernel to a previously unseen trajectory from the same dynamical system.(D) We extract the most dominant mode by convolving the time series with the computed time kernel.In this example, the presence of the dominant mode is clear from the projection starting at t = 1.However, if we only look at the time series in (C), the presence of a growing mode would not be clear until around time t = 2. Dynamics details: the time series is comprised of five different exponents with exponents given by {−1.5, −1, 0, 1, 1.5}.The time series is discretized with time-step 0.05.

C. Comparison with optimal projection under noisy observations
To verify that in the presence of noise our algorithm provides a sensible approximation of the dominant dynamical mode, in this section we discuss a noise-optimal projection alternative and provide empirical comparisons.Let x t be a time series and suppose our goal is to extract the dominant exponential mode by projecting x t onto the filter given by v (derived as the top eigenvector of (8) when there is no observation noise).We refer to v as the noiseless filter.Suppose, however, we do not have access to x t , but only a noisy observation y t = x t + n t , where n t is isotropic Gaussian noise with variance α 2 .What is the optimal projection of y t ?Consider the optimization problem where we have used the fact that n t is mean zero isotropic noise independent of x t .Solving for the optimal w we find: In order to apply this method, we need to know the noise variance α and also the noise-free projector v and covariance C xx .Computing these noise-free quantities is challenging, especially in a problem where the dynamics might be changing over time.Furthermore, computing the full covariance matrix C xx requires more samples than just computing the top subspace in our generalized eigenvalue problem (Eq.( 8)).In practice, we find that even if we know α and v, our method gives comparable performance to this noise-optimal solution (Fig. 5).Furthermore, our method provides a better estimate of the most dominant mode than the non-adaptive method which uses the true (i.e.noiseless) eigenvector.In practice, in order to compute the noiseless filter v, we set the noise variance α = 10 −6 .FIG. 4: Dependence of the filter shape for the rank 5 system in Fig. 2 on system parameters: (A) noise amplitude, (B) system order (the number of exponential constituents, k), and (C) length of the lag vector, n.In (B) and (C) the noise is fixed at 0.01%.In (B), to achieve a comparable system of lower rank k, we keep the top k exponents of the system.FIG.5: Comparison of the performance with respect to the mean square error between the recovered dominant exponential and the ground truth.The 'noise optimal' solution is given by the filter in Eq. ( 11) and 'noiseless filter' refers to using the filter compted in the absence of noise but applied to noisy observations.The system is the same as in Fig. 2.

IV. DISCUSSION
We propose to model a neuron on an algorithmic level as performing a NMD on its input.Specifically, the temporal filter of a neuron is given by the top left eigenvector of the generalized eigenproblem formulated in terms of covariances of lag vectors.The neuron outputs the fastest growing mode present in the input thus predicting a future trend.
How could a biological neuron find the top eigenvector of the generative matrix?One possibility is for a neuron to solve the generalized eigenproblem (8) using a power iteration.Specifically, this would require two types of history-dependent active conductances (ion channels) which encode input covariances, X + X ⊤ and XX ⊤ , with opposite signs (depolarizing and hyperpolarizing).Then the solution of (8) would be given by the voltage that balances the two currents.The output of a neuron would reflect such voltage, thus projecting the input on the top left eigenvector of the transition matrix.
Whereas a single neuron computing a scalar signal can represent only one eigenmode, multiple neurons may extract multiple modes (not just the top eigenmode).For each neuron to represent a different mode their output must be decorrelated, for example, by the lateral inhibitory connections.The strengths of such connections can be adjusted using biologically plausible local learning rules as has been shown by some of the authors previously in the context of extracting eigenmodes of the multichannel covariance [21][22][23].Also, we proposed a framework for deriving multichannel neural networks for solving symmetric generalized eigenvalue problems [24] and the approach can potentially be extended to solve non-symmetric generalized eigenvalue problems.We anticipate that a similar approach can be applied to extracting eigenmodes of the time-lagged dynamics using local learning rules.
Because the matrix A in Eq.( 3) is generically nonnormal, its eigenvectors are not guaranteed to be orthogonal.In practice, this means that they are not robust to perturbations to the elements of the matrix A. This requires particular care to ensure that the eigenvalues are sufficiently distinct as determined by the pseudospectrum of A [25].
Neuronal projection of its input onto the subspace corresponding to the fastest growing mode has an interpretation in terms of the phase portrait of the generative dynamical system.Close to a hyperbolic fixed point the dynamics are linear and the fastest growing mode would correspond to the unstable invariant subspace which evolves into an attracting manifold away from the fixed point.Therefore, the sign of the projection onto the unstable subspace predicts along which unstable manifold the future trajectory of the system will develop beyond the linear regime.If the neuronal output is rectified, it effectively assigns the trajectory to a particular future state.Output of a layer of such rectifying neurons becomes a latent vector variable that can in turn be an input to the next layer.This opens a path to stacking the layers of such neurons to discover more and more abstract latent variables.

FIG. 2 :
FIG. 2: Problem formulation.(A)We generate a number of trajectories in the vicinity of a hyperbolic fixed point.Each trajectory is comprised of a number of growing and decaying exponentials and white noise (see panel C).(B) From these trajectories, we learn the top left eigenvector of the inferred A matrix.We use this as the time kernel of our proposed neuron.(C) We apply the time kernel to a previously unseen trajectory from the same dynamical system.(D) We extract the most dominant mode by convolving the time series with the computed time kernel.In this example, the presence of the dominant mode is clear from the projection starting at t = 1.However, if we only look at the time series in (C), the presence of a growing mode would not be clear until around time t = 2. Dynamics details: the time series is comprised of five different exponents with exponents given by {−1.5, −1, 0, 1, 1.5}.The time series is discretized with time-step 0.05.

FIG. 3 :
FIG.3: Analytically derived eigenvectors and eigenvalues for the lag n = 5 for different noise levels.The x axis for the eigenvalue plots denotes the eigenvalues sorted from largest to smallest real part.System with exponents {0.7, 0.2, −0.1, −0.4,−1.6} and time-step equal to 1 for simplicity.

FIG. 6 :
FIG.6:The dependence of the filter shape vs the length of the lag vector at low noise values (0.00001%).

FIG. 7 :
FIG. 7: We reproduce Figure 2 with imaginary parts in the exponents.(A) Imaginary parts in all trajectories except the one with the largest eigenvalue.(B) Imaginary parts in all trajectories including the one with the largest eigenvalue.(C) Imaginary parts in all trajectories including the one with the largest eigenvalue, but lower noise level.