Bridging the information and dynamics attributes of neural activities

The brain works as a dynamic system to process information. Various challenges remain in understanding the connection between the information and dynamics attributes of the brain. The present research explores how the characteristics of neural information functions are linked to neural dynamics. We attempt to bridge dynamics metrics (e.g., Kolmogorov-Sinai entropy) and information metrics (e.g., mutual information and Fisher information) on the stimulus-triggered stochastic dynamics of neural populations. On the one hand, our unified analysis identifies various essential features of information-processing-related neural dynamics: we discover spatiotemporal differences in the dynamic randomness and chaotic degree of neural dynamics during neural information processing. On the other hand, our framework reveals the fundamental role of neural dynamics in shaping neural information processing: under specific conditions, neural dynamics drives the encoding and decoding properties in opposite directions, and it determines the neural representation of the stimulus distribution. Overall, our findings demonstrate a potential direction for explaining the emergence of neural information processing from neural dynamics and help clarify the intrinsic connections between the informational and the physical brain.


I. INTRODUCTION
Understanding how the brain works is among the most active frontier directions at the intersection of biology and physics [1]. To date, substantial progress has demonstrated that the brain can be treated as a dynamic system that processes information [2]. The brain is frequently driven out of equilibrium by external stimuli [3][4][5] or internal events [6,7] and creates multifarious dynamics [8][9][10][11][12]. In the meantime, when the dynamics is stimulus-triggered, the stimulus information is coded [13] and memorized [14] by the brain to support cognitive functions [15][16][17], making the brain an information system as well [18]. Research into such dual attributes of the brain features a long history. Extensive connections between the dynamics and information attributes have been discovered in the brain [4,[19][20][21], indicating that the cognitive functions that process external information (also referred to as information functions) are essentially rooted in neural dynamics [22][23][24].
The present research aims to probe the intrinsic relations between information and dynamics in the brain. To build an operable framework, we concentrate on the elementary neural information functions from the encoding and decoding perspectives and on the stimulus-triggered dynamics of neural populations (see Fig. 1). We attempt to implement a unified analysis of these elements and explore how the characteristics of neural information functions emerge from the dynamics of neural ensembles. Technically, our research may contribute a possible description of stimulus-triggered neural activities in neural populations, supporting the analytic measurement of dynamics and information quantities during neural information processing. Theoretically, the significance of our pursuit lies in the possibility of exploring the fundamental connections between the physical (dynamics aspect) and informational (information aspect) brain [1]. These connections may reveal a potential direction for studying why the complex and remarkable characteristics of neural information processing and cognition can naturally emerge in the brain, a system of neurons that individually have only elementary functions.
The paper is organized as follows. In Sec. II, we introduce a mathematical characterization of the stimulus-triggered neural activities during information processing, whose foundation was established in our previous research [51]. Based on this characterization, the dynamic randomness and chaotic degree of stimulus-triggered neural activities are quantified in terms of the Kolmogorov-Sinai entropy (Sec. III). After reviewing the quantification of neural information function attributes (e.g., encoding and decoding efficiency), we explore the substantive characteristics of information-processing-related neural dynamics and analyze the emergence of neural information function attributes from neural dynamics; various potential connections between the information and dynamics attributes of neural activities are observed (Sec. IV). In Sec. V, we provide an integrated and multi-scale perspective on our theoretical framework and computational findings. We attempt to verify the validity and generalizability of our discoveries by relating them to existing experiment-validated studies or by proposing mechanistic insights into why they arise. Finally, we discuss several potential directions and remaining challenges for future explorations. While we concentrate on physical pictures and neuroscience backgrounds throughout the paper, the systematic description of all mathematical implementations can be found in the appendixes.

A. Neural population description
We begin with a neural population N(V, E), where V is the neuron set and E is the synapse set. The synaptic connection strength is defined by an adjacency matrix C (C_ij ∈ [−1, 1]). We randomize V, E, and C for generality (see appendix A 1). In real neural populations, stimuli are not received simultaneously by all neurons. Stimulus information experiences a complex diffusion process among neurons, creating time differences for neurons to receive stimuli. Therefore, it is biologically reasonable to classify neurons into input neurons (which receive inputs directly) and intermediary neurons (which are triggered by their pre-synaptic neurons and process inputs indirectly) (see Fig. 2(a) for an example).
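As an illustration, the population construction described above can be sketched in a few lines. The function name, population size, connection probability, and input-neuron fraction below are our own illustrative choices, not the paper's actual parameters (the authentic randomization is given in appendix A 1).

```python
import numpy as np

rng = np.random.default_rng(0)

def build_population(n_neurons=100, p_edge=0.1, frac_input=0.5):
    """Build a random neural population N(V, E) with signed synaptic
    strengths C_ij in [-1, 1] and a random input/intermediary split.

    Returns the adjacency-strength matrix C (0 where no synapse exists)
    and a boolean mask marking the input neurons.
    """
    # E: random directed synapses; C: signed strengths on those synapses
    edges = rng.random((n_neurons, n_neurons)) < p_edge
    np.fill_diagonal(edges, False)  # no self-loops
    C = np.where(edges, rng.uniform(-1.0, 1.0, edges.shape), 0.0)
    # Input neurons receive stimuli directly; the rest are intermediary
    is_input = np.zeros(n_neurons, dtype=bool)
    chosen = rng.choice(n_neurons, int(frac_input * n_neurons), replace=False)
    is_input[chosen] = True
    return C, is_input

C, is_input = build_population()
```

Any graph-based quantity used later (e.g., the shortest path length from the input ports) can then be computed directly on the support of C.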

B. Neural activities of input neurons
Input neurons process stimuli directly; their activity profiles are mainly determined by their neural tuning properties. A standard characterization of the tuning property is the tuning curve [2,53], where the stimulus at the peak evokes the highest response rate [53]. In our research, each input neuron N_i has a bell-shaped tuning curve G_i(s) with a maximum response coefficient R_i, a preferred stimulus s_i, and a curve width σ_i (see (A1) in appendix A 2 and Fig. 2(a)). Assuming a randomized stimulus sequence S occurs in a time interval [0, t], we implement the neural activity characterization as the probability P_i(r | S, 0, t) for the cumulative neural response count of input neuron N_i to reach a specific quantity r at any moment in [0, t]. This probability can be efficiently approximated by a Poisson process [2]. Specifically, we use the tuning curve G_i(s) as the intensity function of the Poisson process to describe the response selectivity of N_i on S. This definition makes the Poisson process non-homogeneous, realizing a time-varying neural activity intensity controlled by neural tuning properties (see Fig. 2(b) and (A2-A4) in appendix A 2). Given the probability distribution of the Poisson process, we can generate possible neural activities by predicting the arrival time of each neural response of N_i. We implement the prediction with the maximum probability (MP) method. This method estimates the arrival time of the r-th neural response as the moment t_r when the response frequency is most likely to reach r (see (A5) in appendix A 2). Fig. 2(b) illustrates an example of the predicted arrival time sequence and the time difference between consecutive neural responses.
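The non-homogeneous Poisson characterization and the MP prediction can be sketched as follows. This is a minimal reimplementation under simplifying assumptions (a single Gaussian tuning curve, a time-discretized stimulus sequence, and our own function names); the key fact used is that the Poisson count probability P(N(t) = r) is maximized where the cumulative intensity Λ(t) first reaches r.

```python
import numpy as np

def tuning_curve(s, R=1.0, s_pref=0.0, sigma=1.0):
    """Bell-shaped tuning curve G_i(s): peak response R at the
    preferred stimulus s_pref, curve width sigma (cf. (A1))."""
    return R * np.exp(-((s - s_pref) ** 2) / (2.0 * sigma ** 2))

def mp_arrival_times(stimulus, dt=0.01, **tuning_kwargs):
    """Maximum-probability (MP) prediction of response arrival times for
    an inhomogeneous Poisson process whose intensity is the tuning curve
    evaluated on the stimulus sequence.

    For Poisson counts, P(N(t) = r) is maximal where the cumulative
    intensity Lambda(t) first reaches r, so t_r is read off Lambda.
    """
    intensity = tuning_curve(stimulus, **tuning_kwargs)  # lambda(t)
    Lam = np.cumsum(intensity) * dt                      # Lambda(t)
    times, r = [], 1
    for k, lam in enumerate(Lam):
        if lam >= r:  # Lambda first crosses the integer r at step k
            times.append(k * dt)
            r += 1
    return np.array(times)

# A stimulus sweeping across the preferred value drives a burst of
# predicted responses near the tuning-curve peak (around t = 5 here).
s_seq = np.linspace(-3, 3, 1000)
t_r = mp_arrival_times(s_seq, dt=0.01, R=5.0, s_pref=0.0, sigma=0.5)
```

The predicted sequence {t_r} is strictly increasing by construction, mirroring the arrival sequence T_i used below.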
With the predicted sequence T_i = {t_r}_{r∈N+}, we can mark every neural response of N_i on the time line by R_i(t), where R_i(t) = 1 stands for a response and R_i(t) = 0 stands for no response (see (A6) in appendix A 2).

FIG. 2. (b) The stochastic process of the neural activities of a randomly picked input neuron, the predicted neural response arrival sequence, and the time difference between each pair of predicted responses. (c) The pre-synaptic inputs, the estimated stochastic process of neural activities (h = 100), the observed neural response sequence, and the estimated tuning curve (for visualization, smoothed from the raw data using the Savitzky-Golay filter [52]) of the intermediary neuron. (d) The estimated neural response train.

C. Neural activities of intermediary neurons
Intermediary neurons are driven by their pre-synaptic neurons rather than by direct stimulus inputs. It is unreasonable to constrain their activity profiles by pre-setting their tuning curves. Moreover, their activities are significantly affected by the network dynamics, which obstructs an a priori stochastic characterization.
We develop a two-step statistical approach to overcome these obstacles. For an intermediary neuron N_j that features a receptive field RF(N_j) (the set of its pre-synaptic neurons), the first step is to predict the neural activity arrival sequence of each neuron N_k in RF(N_j) and define the synaptic input given from N_k to N_j as R_k(t) C_kj (see Fig. 2(c)). By summing these inputs over all neurons in RF(N_j), we obtain the total synaptic input Ψ_j of neuron N_j. Then we characterize the neural response of N_j following

R_j(t) = υ(Ψ_j(t) − Ω(t) − Υ), (4)

where υ(·) denotes the unit step function. This integrate-and-fire response mechanism has a dissipation term Ω (e.g., the leaky term in the leaky integrate-and-fire neuron [32]) and a neural response threshold Υ (e.g., the spiking threshold [32]), implementing that N_j emits a response only if cumulative synaptic inputs − dissipation > threshold (see (A7-A10) in appendix A 3). Concrete examples of this mechanism can be seen in existing deterministic models [33][34][35][36][37]. The second step is to treat the generated neural activities of N_j as observed samples and repeat the generation with S for h times. This repeated sampling supports a maximum likelihood estimation of the intensity function of the neural activities of N_j, constructing an observed distribution of the Poisson process (5), which can be further applied to estimate the neural response arrival sequence and the neural tuning curve of N_j (see Fig. 2(c) and (A11-A14) in appendix A 3). Such a two-step approach takes advantage of deterministic models in describing inter-neuron interactions (see (4)). By estimating Poisson processes based on the activities generated by (4), the difficulties underlying a direct probabilistic description of network dynamics are avoided. As a summary, we define P_n^♥(r | S, 0, t) to describe the neural activities of neuron N_n, where ♥ denotes the neuron type (♥ = 1 for input neurons and ♥ = 0 for intermediary neurons).
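A toy sketch of the two-step approach is given below. The leak and threshold values (playing the roles of Ω and Υ), the reset-on-spike rule, and the crude per-bin estimator are our own illustrative assumptions; the paper's exact forms live in (A7)-(A14).

```python
import numpy as np

rng = np.random.default_rng(1)

def intermediary_spikes(presyn_spikes, weights, leak=0.05, threshold=1.0):
    """Step 1: integrate-and-fire response of an intermediary neuron N_j.

    presyn_spikes : (n_pre, n_steps) 0/1 trains R_k(t) of RF(N_j)
    weights       : (n_pre,) synaptic strengths C_kj
    The membrane variable accumulates the summed input Psi_j, dissipates
    by `leak` per step, and emits a response (then resets) when it
    exceeds `threshold`: cumulative input - dissipation > threshold.
    """
    psi = weights @ presyn_spikes  # total synaptic input per step
    v, out = 0.0, np.zeros(psi.shape[0], dtype=int)
    for t, x in enumerate(psi):
        v = max(v - leak, 0.0) + x
        if v > threshold:  # unit-step response condition
            out[t], v = 1, 0.0
    return out

def mle_intensity(spike_trains, dt=1.0):
    """Step 2: maximum-likelihood intensity of a Poisson process from h
    repeated trains: per-bin rate = mean spike count / bin width."""
    return spike_trains.mean(axis=0) / dt

# Toy run: h = 100 repetitions of noisy pre-synaptic drive.
h, n_pre, n_steps = 100, 8, 200
w = rng.uniform(-1, 1, n_pre)
trains = np.stack([
    intermediary_spikes(rng.random((n_pre, n_steps)) < 0.2, w)
    for _ in range(h)
])
lam_hat = mle_intensity(trains)
```

The estimated intensity `lam_hat` can then feed the same MP arrival-time prediction used for input neurons.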
The algorithm in appendix A 4 depicts our framework. Fig. 2(d) illustrates the estimated neural response train by this framework.

III. DYNAMICS IN STIMULUS-TRIGGERED NEURAL ACTIVITIES
A. Neural tuning Kolmogorov-Sinai entropy

A remaining challenge is to develop a dynamics-theoretical metric for stimulus-triggered neural activities. Although a natural choice is the Kolmogorov-Sinai entropy, which characterizes the randomness of a dynamic system forward in time [54], its implementation for neural dynamics remains non-trivial. To simplify the calculation of the Kolmogorov-Sinai entropy, we reformulate the proposed Poisson processes as continuous Markov chains, where the parameter τ ≥ 0 measures the interval length of variation. We define the joint probability Ŵ_ij(S, t, t + τ) = W_ij(S, t, t + τ) P_n^♥(i | S, 0, t), in which W denotes the transition probability matrix (see (B1-B4) in appendix B 1 for details). Then, a new Kolmogorov-Sinai entropy H_KS(N_n, S, t, t + τ) (referred to as the neural tuning Kolmogorov-Sinai entropy) is proposed as a metric of the dynamic randomness of the activities of neuron N_n in a time interval [t, t + τ] (see (B5-B7) in appendix B 2). Note that the interval length τ should not be too small, since any biological variation takes time in the neural system; our research considers the cases where τ ≥ 1. Fig. 3(a) shows that the maximum possible variation amplitude of neural activities (the maximum change r′ − r with non-zero P_n^♥(r′ − r | S, t, t + τ)) is positively correlated with τ. In Fig. 3(b), we find that H_KS(N_n, S, t, t + τ) is larger when τ is relatively small. Together, these results show that while short-term neural activities have relatively small variation amplitudes, the dynamic randomness of those variations is more complex; in comparison, long-term variations feature larger amplitudes, but their tendency is relatively stable. Moreover, Fig. 3(c) shows the temporal and spatial distribution of H_KS with τ = 1 (measuring the short-term dynamic randomness) on the population scale, and a spatial distribution of H_KS with τ = 1 and t = 100.
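For a discrete-state Markov chain, the Kolmogorov-Sinai entropy reduces to H_KS = −Σ_i p_i Σ_j W_ij ln W_ij. The sketch below assumes, as a simplification of the construction in (B1)-(B7), that the transition from count i to count j ≥ i over [t, t + τ] is a Poisson increment with mean Λ(t + τ) − Λ(t); the state space is truncated for tractability.

```python
import numpy as np
from math import exp, factorial

def poisson_pmf(k, mu):
    """Poisson probability mass function P(X = k) with mean mu."""
    return exp(-mu) * mu ** k / factorial(k)

def ks_entropy(state_probs, d_lambda, n_states=50):
    """Kolmogorov-Sinai entropy of a Markov chain whose transition
    i -> j (j >= i) is a Poisson increment with mean d_lambda, weighted
    by the current state distribution:
        H_KS = -sum_i p_i sum_j W_ij ln W_ij.
    """
    h = 0.0
    for i, p_i in enumerate(state_probs):
        if p_i == 0.0:
            continue
        for j in range(i, n_states):
            w = poisson_pmf(j - i, d_lambda)
            if w > 0.0:
                h -= p_i * w * np.log(w)
    return h

# State distribution at time t (counts 0..49) and two increment sizes
# Lambda(t + tau) - Lambda(t), standing in for a short and a long tau.
p_t = np.array([poisson_pmf(k, 3.0) for k in range(50)])
h_small_inc = ks_entropy(p_t, d_lambda=0.5)
h_large_inc = ks_entropy(p_t, d_lambda=5.0)
```

Note that in this toy model the transition entropy grows with the increment mean; the paper's statement that H_KS is larger for small τ concerns its own (B5-B7) construction and should not be read off this simplification.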

B. Chaos in neural activities
An important property of the Kolmogorov-Sinai entropy is that it is bounded by the sum of all the positive Lyapunov exponents of the dynamic system (see the Pesin identity [55] and, for a generalization, the Ruelle inequality [56]). This mathematical relation bridges our metric and the Lyapunov spectrum analysis [49,50]. Each Lyapunov exponent characterizes the separation or convergence rate of infinitesimally close trajectories in the phase space of a dynamic system, and a positive Lyapunov exponent reflects the existence of chaos. This property suggests that the neural activities are chaotic when the corresponding H_KS is positive, and that the chaos is intensified when H_KS increases (see appendix B 3).

A. Information-theoretical metrics reformulation
Our research concentrates on the relations between the stimulus-triggered neural dynamics and the elementary neural information functions (studied from the aspects of encoding and decoding). The proposed neural activity characterization supports the analytical calculation of encoding-efficiency-related and decoding-efficiency-related metrics.
For neural encoding, we concentrate on widely-used metrics such as the total response entropy H_TE (TE), the noise entropy H_NE (NE), and the mutual information H_MI (MI). They respectively measure the total variability of neural responses to stimuli, the part of that variability unexplained by stimuli, and the part explained by stimuli. Their classic definitions are summarized in [2]. In the present research, they are reformulated to be time-dependent to fit dynamic situations (see (C3-C5) in appendix C 1). At each moment t, we can directly measure H_TE(N_n, t), H_NE(N_n, t), and H_MI(N_n, t) for every neuron N_n. Meanwhile, we can measure the entropy of the stimulus sequence (ES) as H_S(S, t) (see (C6) in appendix C 1). Moreover, we can use H_MI/H_TE (MI/TE) to measure the interpretability of neural activities based on stimuli, and H_MI/H_S (MI/ES) to measure the encoding efficiency for stimuli based on neural activities.
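The classic (time-independent) versions of these quantities can be computed from a joint response-stimulus distribution; the minimal sketch below uses an assumed toy joint distribution, not the paper's time-dependent forms (C3)-(C6).

```python
import numpy as np

def encoding_metrics(p_rs):
    """Encoding metrics from a joint distribution p(r, s) of response
    count r (rows) and stimulus s (columns):
      H_TE = H(r)          total response entropy
      H_NE = H(r | s)      noise entropy
      H_MI = H_TE - H_NE   mutual information
      H_S  = H(s)          stimulus-sequence entropy
    """
    def ent(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))
    p_r, p_s = p_rs.sum(axis=1), p_rs.sum(axis=0)
    h_te, h_s = ent(p_r), ent(p_s)
    # H(r|s) = sum_s p(s) H(r|s)
    h_ne = sum(p_s[j] * ent(p_rs[:, j] / p_s[j])
               for j in range(len(p_s)) if p_s[j] > 0)
    return h_te, h_ne, h_te - h_ne, h_s

# Toy joint distribution: 3 response levels x 2 equiprobable stimuli.
p_rs = np.array([[0.30, 0.05],
                 [0.15, 0.15],
                 [0.05, 0.30]])
h_te, h_ne, h_mi, h_s = encoding_metrics(p_rs)
interpretability = h_mi / h_te  # MI/TE
encoding_eff = h_mi / h_s       # MI/ES
```

The identities H_MI = H_TE − H_NE and H_MI ≤ min(H_TE, H_S) used later in the paper hold by construction here.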
The neural decoding efficiency is measured by the Fisher information (FI) [2,57]. In the present research, the time-dependent Fisher information F(N_n, t) of each neuron N_n is defined by (C16) in appendix C 2. The Cramér-Rao bound suggests that the Fisher information limits the accuracy with which any decoding technique can estimate the target stimulus parameter based on neural activities [2]. Thus, the inverse of the time-dependent Fisher information, 1/F(N_n, t), acts as the lower bound of the variance of any unbiased decoding technique applied to neuron N_n; any decoding scheme that reaches this bound is optimal [2].

B. Finding 1: The difference of dynamic randomness between the short-term and long-term variations in neural activities

As suggested above, compared with the long-term (large τ) variations of neural activities, the short-term (small τ) variations have small amplitudes but much more complex dynamic randomness (Fig. 3(a-b)). In Fig. 5(a), this finding is further verified under more general conditions. Based on the illustrated instances and statistical results, we find that the dynamic randomness measured by H_KS of each neuron, and the diversity of dynamic randomness between neurons, are negatively correlated with τ ∈ [1, 50] (the diversity of dynamic randomness is quantified by the variance Var(H_KS) among neurons). This finding indicates that short-term neural activities have smaller variation amplitudes but more complex dynamic randomness, accompanied by larger inter-neuron diversity in dynamic randomness. Opposite characteristics are featured by long-term neural activities, namely, larger variation amplitudes but more stable trends, and the activities of all neurons tend to be more homogeneous. As shown in Fig. 3(c), the spatial distribution of the Kolmogorov-Sinai entropy H_KS in the neural population is uneven: the dynamic randomness varies between neurons, whether at any specific moment or through the whole time interval.
C. Finding 2: The uneven spatial distribution of dynamic randomness and chaos in neural activities

To offer a more solid verification, we set up another random neural population with 500 neurons (the ratio of input neurons to intermediary neurons is 1:1). We treat all input neurons as the input ports of this population and calculate the mean minimum distance (the shortest path length on the graph) between each neuron N_n and these input ports (the distance is 0 when N_n happens to be an input neuron). An intermediary neuron with a larger distance is treated as lying in a deeper layer, defining a direction from shallow to deep layers.
In the upper panel of Fig. 5(b), we analyze the dynamic randomness quantified by the mean H_KS of each neuron (averaged through the time interval [0, 500]). We find that the neural spiking probability (approximated by the spiking frequency in the interval [0, 500]) declines along with the mean distance to input neurons. Because of the positive correlation between the mean H_KS and the spiking probability, the mean H_KS decreases from shallow layers to deep layers. In the bottom panel of Fig. 5(b), we concentrate on the dynamic randomness of the neural responses to stimuli. For each neuron, we compute the mean H_KS using only the raw H_KS data from the moments at which it spikes. The newly calculated mean H_KS of each neuron is then normalized by dividing it by the corresponding spiking probability. This normalized H_KS reflects the dynamic randomness when a neuron responds to stimuli (i.e., generates spikes). The normalization prevents this metric from increasing sharply with the spiking probability. Thus, we can see that the normalized H_KS relatively increases from shallow layers to deep layers.
Given the connection between H_KS and chaos, a positive H_KS suggests chaos in neural activities, and the chaos is intensified by a larger H_KS. Together, we conclude that the spatial distribution of chaos in neural activities is uneven. From the perspective of the mean H_KS, the neural activities (including both spiking and resting) in shallow layers exhibit more intense chaos, while those in deep layers have less intense chaos and are more stable. As verified by the normalized H_KS, the stimulus-triggered responses of shallow-layer neurons are more regular and stable, while those of deep-layer neurons are more chaotic.

D. Finding 3: The restrictive relationship between the encoding and decoding properties implied by dynamics
Now, we take information functions into the analyses. Fig. 6(a) illustrates the temporal and spatial distributions of the total entropy H_TE (TE), the noise entropy H_NE (NE), the mutual information H_MI (MI), and the Fisher information F (FI). In Fig. 6(b), we explore the relations between ΔH_KS (τ = 1), ΔH_TE, ΔH_NE, ΔH_MI, and ΔF, revealing that H_TE and H_NE frequently share the same variation trends with H_KS (e.g., ΔH_TE ≥ 0 and ΔH_NE ≥ 0 frequently hold when ΔH_KS ≥ 0). This finding meets our expectation because H_KS and H_TE are both metrics of disorder and randomness. Given the entropy of the stimulus sequence H_S, the minimum noise entropy is bounded by H_NE ≥ H_TE − min(H_TE, H_S). Therefore, H_NE usually increases along with H_TE when H_S is given. However, the variation trends of ΔH_MI and ΔF cannot be completely predicted by ΔH_KS.
To reveal the underlying patterns, we organize the experiment results as follows: first, we average these parameters of each neuron through the time interval; second, we arrange the averaged parameters of each neuron according to the mean H_KS (Fig. 6(c)); third, we bin H_KS and average F, H_MI/H_TE (MI/TE), and H_MI/H_S (MI/ES) within those bins (Fig. 6(d)).
In Fig. 6(c), we can see that H_TE, H_NE, and H_MI increase along with the mean H_KS (both for input and intermediary neurons), while F has more complex variation trends. The two-cluster distribution of F can be explained by the two classes of neuron type (input and intermediary). This phenomenon is in line with our expectations. The mathematical definition of F (see (C16) in appendix C 2) makes it depend on the intensity fluctuations of neural responses towards different stimuli. Consistent with previous studies [2,53], we discover that a stimulus s can cause large fluctuations when it is located at the steep gradients around the peaks of neural tuning curves, leading to a large F towards it (see Fig. 6(d)). Because the neural tuning curves of intermediary neurons usually feature more peaks, these neurons frequently have higher F than input neurons.
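The gradient dependence of the Fisher information can be made concrete with the textbook case of a Poisson neuron with a Gaussian tuning curve, for which I(s) = T G′(s)²/G(s); this standard formula is used here for illustration and may differ in detail from the paper's (C16).

```python
import numpy as np

def fisher_information(s, R=1.0, s_pref=0.0, sigma=1.0, T=1.0):
    """Fisher information of a Poisson neuron with a Gaussian tuning
    curve G(s), observed for duration T:  I(s) = T * G'(s)^2 / G(s).
    By the Cramer-Rao bound, any unbiased decoder satisfies
    Var(s_hat) >= 1 / I(s).
    """
    g = R * np.exp(-((s - s_pref) ** 2) / (2 * sigma ** 2))
    dg = -g * (s - s_pref) / sigma ** 2  # tuning-curve gradient G'(s)
    return T * dg ** 2 / g

s = np.linspace(-3, 3, 601)
fi = fisher_information(s, R=10.0, s_pref=0.0, sigma=1.0)
# FI vanishes at the tuning-curve peak (where G'(s) = 0) and is largest
# on the steep flanks, at s = s_pref +/- sqrt(2) * sigma for a Gaussian.
```

This reproduces the observation above: a stimulus near the curve's steep gradients yields large F, while one at the peak itself yields none.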
The results in Fig. 6 further suggest that: (1) F and H_MI/H_S share the same variation trend when H_KS is in (0, 0.2] (the left side of the black dashed vertical line); (2) F and H_MI/H_TE decrease while H_MI/H_S continues increasing when H_KS is in [0.2, 0.5] (the right side of the black dashed vertical line). Note that these phenomena do not depend critically on the binning approach. Overall, we conclude that when the dynamic randomness is relatively small (H_KS ∈ (0, 0.2]), the encoding (MI/ES) and decoding (FI) efficiency quantities share the same variation trend. When the dynamic randomness is relatively large (H_KS ∈ [0.2, 0.5]), an either-or situation emerges between encoding and decoding efficiency, since these quantities have opposite trends along with H_KS (MI/ES increases but FI decreases). In other words, the encoding efficiency has a restrictive relationship with the decoding efficiency under this condition, since the two cannot be optimized synchronously.

E. Finding 4: The relation between neural dynamics and the representation of the stimulus distribution
We further analyze how neural dynamics determines the neural representation of the stimulus distribution.
We have previously defined H_MI/H_TE to measure the interpretability of neural activities based on stimuli, or more specifically, based on the global stimulus distribution. Here the interpretability is defined in terms of the mutual information H_MI, measuring the degree of synergy between neural activities and stimuli. As illustrated in Fig. 4, neural activities become completely explainable when H_MI/H_TE approaches 1. To analyze the interpretability based on the local stimulus distribution, we introduce the concepts of the encoding scope µ and the local interpretability (Local MI/TE) (see (C8-C11) in appendix C 1). Based on the definition of H_NE, we can directly obtain the noise entropy H_s referring to each stimulus s. The processing of s by neuron N_n produces less noise if H_s < H_NE; namely, s has better interpretability for the neural activities of N_n. Then, we define the encoding scope µ at moment t as the proportion of the stimuli with better interpretability among all stimuli in S(0, t); these stimuli form the set S_µ. When µ approaches 1, it refers more to the global stimulus distribution; otherwise, it refers more to specific local parts of the stimulus distribution. By recalculating the total response entropy H_TE^µ, the noise entropy H_NE^µ, and the mutual information H_MI^µ only based on the stimuli in S_µ, we can further calculate the interpretability H_MI^µ/H_TE^µ of neural activities based on S_µ (referred to as the local interpretability).
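The encoding scope can be sketched as below. The aggregation against which each per-stimulus noise entropy H_s is compared is assumed here to be the mean over stimuli; the paper's exact definitions are (C8)-(C11), and the function name is our own.

```python
import numpy as np

def encoding_scope(h_noise_per_stim, h_ne=None):
    """Encoding scope mu: the fraction of stimuli whose per-stimulus
    noise entropy H_s lies below the neuron's noise entropy H_NE, i.e.
    the stimuli with better interpretability. Returns mu and the
    boolean mask selecting the local stimulus set S_mu.
    """
    h_s = np.asarray(h_noise_per_stim, dtype=float)
    if h_ne is None:
        h_ne = h_s.mean()  # assumed aggregation; see (C8-C11)
    mask = h_s < h_ne      # stimuli with better interpretability
    return mask.mean(), mask

# Toy per-stimulus noise entropies for 8 stimuli in S(0, t).
h_per_stim = np.array([0.2, 0.9, 0.4, 1.1, 0.3, 0.8, 1.0, 0.5])
mu, s_mu = encoding_scope(h_per_stim)
```

The local quantities H_TE^µ, H_NE^µ, and H_MI^µ would then be recomputed by restricting the joint response-stimulus distribution to the stimuli selected by `s_mu`.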
In Fig. 6(f), we show the variation trend of µ with respect to H_KS, suggesting that µ is negatively correlated with H_KS. Meanwhile, the local interpretability (Local MI/TE) relatively increases along with H_KS. Taken together, we can conclude that when H_KS is in (0, 0.25] (the dynamics is more stable), neural activities can be better explained by the global stimulus distribution (larger encoding scope); once the dynamic randomness becomes relatively large (H_KS in (0.25, 0.5]), neural activities can be better explained by specific local parts of the stimulus distribution (smaller encoding scope). During this process, although the neurons with small dynamic randomness are mainly driven by the global stimulus distribution, they usually feature weaker neuron-stimulus synergy (lower Local MI/TE). This phenomenon is related to the low spiking probability of these neurons (see Fig. 5(b)): their activation requires the inputs to contain enough global stimulus information, which cannot be frequently satisfied. The opposite situation can be observed in the neurons with higher dynamic randomness, which feature stronger neuron-stimulus synergy (higher Local MI/TE).

A. Significance of our work
In the current research, we present an original theoretical framework and demonstrate the intrinsic connections between dynamics and information in neural activities.
In the theoretical part, we propose the stimulus-triggered neural activity characterization as a bridge between the dynamics-theoretical and the information-theoretical metrics. We build our characterization only on several basic and common neural characteristics, such as tuning properties [2,53] and neural spike mechanisms [32]. These settings enable us to model real neural populations at a biologically authentic level. The proposed framework takes advantage of both stochastic [29][30][31][32] and deterministic [33][34][35][36][37] models to offer a practical description of the collective neural activities involved in information processing. Specifically, we drive neural activities by the combined effects of stimulus-neuron synergy and network dynamics. While neural tuning properties directly govern the stimulus-neuron synergy, the network dynamics is captured by estimating the stochastic process of each neuron from the neural activity samples generated by the neural response mechanism used in deterministic models. Such an approach principally avoids the difficulty underlying a direct probabilistic description of the collective activities of coupled neurons. Based on this framework, the neural tuning Kolmogorov-Sinai entropy is introduced as a metric of the information-processing-related neural dynamics. Although Lyapunov exponents cannot be classically defined for stochastic processes, since most trajectories in the phase space only spend a finite time in the system [58], a general connection between the entropy production and the Lyapunov exponents can be established based on the Ruelle inequality [56] (or the Pesin identity [55]), which enables the proposed neural tuning Kolmogorov-Sinai entropy to measure the dynamic randomness and identify chaos in the characterized neural activities analytically [54]. Taken together, the proposed framework supports calculating the dynamics-theoretical and the information-theoretical metrics analytically, achieving our objective of a unified analysis of dynamics and information. The analytical calculation of these metrics prevents our findings from depending critically on computational approximation.

FIG. 7. Summary of our findings. A stimulus sequence triggers series of neural dynamics in a neural population. The dynamic variation of neural activities usually features small amplitudes and high dynamic randomness in the short term, while the opposite properties can be seen in the long-term dynamic variation (see Finding 1). One can see an uneven spatial distribution of dynamic randomness in the neural population. Specifically, the mean H_KS declines from shallow-layer neurons to deep-layer neurons since neural spikes gradually reduce. Meanwhile, the normalized H_KS increases along with the mean distance to input neurons, suggesting that the chaotic degrees of stimulus-triggered neural responses increase from shallow-layer neurons to deep-layer neurons (see Finding 2). Turning to the information quantities, the encoding efficiency H_MI/H_S (MI/ES) shares the same variation trend with the decoding efficiency F (FI) only when the dynamic randomness (mean H_KS) is relatively small (in deep-layer neurons). When the dynamic randomness is relatively large (in shallow-layer neurons), the encoding efficiency increases along with the dynamic randomness while the decoding efficiency does not, leading to an either-or situation since an increase of either efficiency implies a reduction of the other (see Finding 3). Furthermore, the encoding scope gradually declines as the dynamic randomness increases. Thus, the shallow-layer neurons (with high dynamic randomness) mainly account for the specific encoding of the local stimulus distribution, while the deep-layer neurons (with small dynamic randomness) support nonspecific encoding of the global stimulus distribution (see Finding 4). The observation of Findings 2-4 requires the dynamic randomness to be analyzed in the short term; otherwise, the dynamic randomness would be small and homogeneous among all neurons (see Finding 1).
In the experimental part, we implement a unified analysis of the relations between neural information functions (quantified from the aspects of encoding and decoding) and neural dynamics. We discover that short-term neural activities have smaller variation amplitudes but greater dynamic randomness, while long-term variations feature exactly the opposite properties (Finding 1). Another relevant finding is that the spatial distribution of chaos of neural activities in a neural population is uneven (Finding 2). Then, we reveal the existence of specific restrictive relationships between the encoding and decoding efficiency when the dynamic randomness is relatively large (Finding 3). Finally, we identify that neural activities with chaotic dynamics are more related to the processing of the local stimulus distribution, while stable neural dynamics is more relevant to the processing of the global stimulus distribution (Finding 4). Fig. 7 offers a discussion of these findings and their relations.
Finding 1 might be related to previous neural signal recording studies. It has been found that a signal recording scheme with a low temporal sampling rate (e.g., fMRI) cannot directly reflect the underlying stimulus-triggered neural activities [59]. Before being recorded by those low temporal resolution techniques, the short-term neural activity variations need to accumulate until they are intense and robust enough [59], accompanied by a significant loss of neural activity information [59][60][61][62]. A classical solution to this limitation is to develop high temporal resolution recording techniques (e.g., multiphoton microscopy [63][64][65]). While Finding 1 partly supports this idea, it also suggests a contradiction between robust signal trends and the preservation of neural activity information. A recording scheme with a high sampling rate preserves the information of high-frequency neural activity variations and ensures the separability of neurons, but obtains less robust signal trends than a scheme with a low sampling rate. Thus, improving temporal resolution is a necessary but not sufficient solution.
If we do not control for the effects of spiking probability, the variation trend of the mean H_KS in Finding 2 is consistent with several findings in computational neuroscience. In an early study [66], Diesmann et al. found an attractor of the propagation of synchronized action potentials, which governs the neural activity dynamics. A more recent study [67] confirms this attractor as a line attractor in the phase space of neural activities. These previous studies suggest that the dynamic randomness is gradually reduced during a long enough propagation (e.g., from shallow layers to deep layers), accompanied by the reduction of spike rates. Compared with these previous results, our finding is not limited to a strictly hierarchical network topology and linear propagation process (our neural population is random), which ensures universality. When we turn to the normalized H_KS that quantifies the dynamic randomness of stimulus-triggered neural responses, one can see the consistency between its variation and the well-known phenomenon that stimulus drives suppress dynamic randomness [68]. Combining our results with these previous explorations [68], we suggest that the synergy between neurons and input drives controls the chaos in stimulus-triggered neural responses (spikes). This neuron-stimulus synergy declines along with the distance to input neurons (the input ports of neural populations) and is gradually overridden by network dynamics. The chaotic degree of neural responses to stimuli becomes larger when network dynamics takes charge, because the stimulus effects are suppressed by inherent chaos. However, this does not mean that the neural responses to stimuli are completely chaotic and unrepeatable (in other words, unreliable). As demonstrated by Fig. 6(d), the observed neural tuning curves of intermediary neurons are not fully stochastic, featuring specific patterns instead. Certain neural selectivity towards stimuli can still emerge.
Therefore, the network dynamics and the inherent chaos can coexist with the regular neural activities governed by neural tuning properties. Although the regularity of neural activities is frequently broken by chaos, the chaotic degree of neural activities neither grows nor remains steady over time (e.g., see Fig. 3(b)). These phenomena are consistent with previous studies on the chaos and reliability of stimulus-triggered neural activities [69,70].
Finding 3 may have potential implications for diverse topics, especially for studies that aim at locating the neuronal or cortical foundations of cognitive functions [15,71,72]. These studies analyze the information processing properties of specific neurons or brain regions. The analysis usually relies on specific signal recording schemes (e.g., multiphoton microscopy [73,74] and microelectrode recording [75,76]) and measures the information processing properties from the aspect of encoding (e.g., with entropy-based metrics [13,77]) or decoding (e.g., with Fisher information F [78][79][80]). Based on Finding 3, a potential risk is that direct and indirect measurement methods may yield two separate, and even competing, results. For example, on the one hand, the truly efficient neurons/cortices in the encoding process might be neglected because their recorded signals are measured with low decoding efficiency; on the other hand, neurons/cortices that are efficient in decoding might exhibit many activities that cannot be explained by stimuli and thus imply noisy conditions in the analysis. These risks may further lead to false-negative problems and repeatability problems in functional studies [81].
Moreover, Finding 1 and Finding 3 might pose challenges for neural signal recording by revealing (1) the contradictory relation between robust signal trends and the capacity to preserve neural activity information in the recording stage and (2) the restrictive relations between encoding and decoding properties in the analysis of recorded signals. Although recording with high temporal resolution features the capacity to reflect the underlying frequent neural activity variations, the high dynamic randomness of short-term neural activities usually limits the possibility of obtaining a robust and repeatable result and implies restrictive relations between encoding and decoding properties. A recording scheme with low temporal resolution cannot record high-frequency neural activities, but it can obtain signals with relatively (though not completely) controlled dynamic randomness, ensuring stable macroscopic signal trends and avoiding the restrictive relations between encoding and decoding. Therefore, a multi-temporal-resolution recording may combine the advantages of both high and low temporal resolutions and overcome their shortcomings. This provides theoretical interpretations for the intrinsic advantages of the multi-temporal-resolution recording of neural signals [82][83][84].
Finding 4 might be inspiring for cognitive neuroscience. Both Finding 2 and Finding 4 reveal that the neural activities in shallow layers are more related to the local stimulus distribution, while those in deep layers are more related to the global stimulus distribution. To some extent, interpretability by the global stimulus distribution can be treated as an increase in the generalization ability of neural responses to stimuli. This emergent phenomenon suggests the possibility that neurons in shallow layers concentrate on specific local information of stimuli, while neurons in deep layers process the general information of stimuli. To our knowledge, there is no experimental evidence for such a finding yet, and it may inspire further studies on the spontaneous formation of a division of labor among neurons during data-driven (bottom-up) processing.

B. Validity and limitations
In the above discussion, we have sketched how our findings may relate to neuroscience studies. Here we attempt to delineate the scope within which our work can be applied robustly and validly. Although we have pursued a unified and biologically valid analysis of information and dynamics, there are specific limitations in the current research. The validity of the mathematical descriptions of neural characteristics (e.g., tuning properties [2,53] and neural spike mechanisms [32]) is ensured only conditionally.
The necessary condition for the proposed Poisson processes to validly approximate neural activities is that the time step ∆t in discretization corresponds to a sufficiently short physical time (∼ 5 ms). With a sufficiently small physical time step, the activities of every neuron follow a non-homogeneous Poisson process because a neuron cannot emit more than one spike simultaneously (satisfying the Poisson condition that the probability of two or more changes in a sufficiently small interval is 0). Although the dependence on short physical time steps does not threaten our theory and findings mathematically, it is essentially a limitation in computational implementations. The current version of the stimulus-triggered neural activity characterization is extremely computationally costly, making it impossible to deploy large-scale and long-run experiments (e.g., running on an ultra-large neural population with 10^9 neurons that approximates a human cortical area, or generating the spike trains corresponding to a physical time interval of 24 hours). The costs increase significantly if we further take into account the maximum likelihood estimation used for defining the Poisson processes of intermediary neurons (see (A11)). However, the formation of neural characteristics (e.g., the tuning properties of intermediary neurons) in real neural systems usually requires long-term processes and the involvement of large-scale neuron ensembles. The difficulty we meet here is a potential threat to the generalization ability of our findings, raising the question of whether the discovered phenomena depend critically on our experimental settings.
Proposing mechanistic insights into why our findings arise is a possible approach to verifying their generalization ability. Finding 1 arises when we compare the short-term (small τ) and long-term (large τ) variations of neural activities. As suggested by Fig. 3(a) and Fig. 5(a), the differences between short-term and long-term neural activity variations (amplitude and dynamic randomness) hold across different neurons and throughout the time interval. The amplitude differences are easy to understand since the variation amplitude accumulates during the variation interval [t, t + τ]: a larger τ naturally implies a larger accumulated amplitude. The difference related to dynamic randomness mainly arises from the mathematical definition of the neural tuning Kolmogorov-Sinai entropy H KS (see (B7)). One can reorganize (B7) into a form whose second part relates to τ and is subject to $\sum_{r' \geq r} W_{rr'}(S, t, t+\tau) = 1$. Based on information theory, the term $\sum_{r' \geq r} W_{rr'}(S, t, t+\tau) \ln \frac{1}{W_{rr'}(S, t, t+\tau)}$ will be maximized if the probability distribution $\{W_{rr'}\}_{r'}$ approaches the uniform distribution on $[r, \overline{r}]$ (here $\overline{r}$ denotes the maximum response rate that can be reached by the neuron). This is because the uniform distribution is the maximum entropy distribution defined on a finite interval and has no constraint on moments [85]. As shown by Fig. 3(a), the distribution $\{W_{rr'}\}_{r'}$ tends to approach the uniform distribution as τ increases (distribution peaks become broader). Therefore, one can verify that $\sum_{r' \geq r} W_{rr'}(S, t, t+\tau) \ln \frac{1}{W_{rr'}(S, t, t+\tau)}$ increases along with τ. However, the actual second part in (11) is $\sum_{r' \geq r} W_{rr'}(S, t, t+\tau) \ln \sqrt[\tau]{\frac{1}{W_{rr'}(S, t, t+\tau)}}$, featuring an opposite variation trend: the increase is reversed by the τ-th root $\sqrt[\tau]{\cdot}$ and becomes a decrease, implying that the entropy H KS reduces along with τ. In summary, Finding 1 mainly emerges from the mathematical nature of the proposed H KS.
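This reversal can be illustrated numerically. The sketch below uses an arbitrary four-state transition distribution (not taken from the paper's experiments) to check that the entropy term is maximized by the uniform distribution and that the division by τ implied by the τ-th root can reverse the increase.

```python
import numpy as np

def entropy_term(w):
    """Compute sum_{r'} W ln(1/W) for a probability vector W (zero entries skipped)."""
    w = np.asarray(w, dtype=float)
    w = w[w > 0]
    return float(np.sum(w * np.log(1.0 / w)))

# A peaked transition distribution (small tau) vs. a broader one (large tau).
peaked = np.array([0.7, 0.2, 0.05, 0.05])
broad = np.full(4, 0.25)            # uniform: the maximum-entropy distribution

h_peaked = entropy_term(peaked)
h_broad = entropy_term(broad)
assert h_broad > h_peaked           # broadening raises sum W ln(1/W)
assert np.isclose(h_broad, np.log(4))   # the uniform case attains ln(n)

# ln((1/W)^(1/tau)) divides the term by tau, which can reverse the increase:
# the broad distribution at tau = 4 scores lower than the peaked one at tau = 1.
assert h_broad / 4 < h_peaked / 1
```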
Finding 2 arises when we compare the dynamic randomness between neurons. In Fig. 5(b), the mean H KS declines along with the mean distance to input neurons because neural spike rates reduce. Similar to Finding 1, this phenomenon results from the mathematical definition of the entropy H KS as well. One can verify that H KS increases along with $P_n^{\heartsuit}(r \mid S, 0, t)$ (see (11)), while the latter governs the spiking probability of neurons. Thus, the dynamic randomness will increase if the neuron tends to emit spikes. In our experiments, we also calculate the normalized H KS (averaged from the raw data of H KS corresponding to neural spikes and normalized by spiking probability). By controlling for the effects of spiking probability, the observed variation trend of the normalized H KS is consistent with previous studies showing that stimulus drives suppress dynamic randomness (or chaotic degrees) [68]. These phenomena can be related to the mathematical properties of H KS in general.
Finding 3 arises when we analyze the encoding and decoding properties as functions of dynamic randomness. For encoding properties, one can see an increasing encoding efficiency (MI/TE) along with the dynamic randomness (measured by the mean H KS). Although this phenomenon is consistent with the common belief that representing variable stimulus information requires high variability of neural activities, it cannot be derived directly from the mathematical definition of mutual information. For decoding properties, we have demonstrated that input neurons (with higher dynamic randomness) can feature less Fisher information (FI) than intermediary neurons (with lower dynamic randomness), which mainly results from the differences between the tuning properties of these two kinds of neurons (see Fig. 6(d)). Therefore, the decoding efficiency does not necessarily increase along with the mean H KS. Limited by the size of our experiments, the observed phenomenon in Fig. 6(e) is that the decoding efficiency (FI) increases when the mean H KS is relatively small and decreases when the randomness is sufficiently large. If the analysis is implemented on a sufficiently large neural population (enlarging the variation range of the mean H KS) that processes stimuli over a sufficiently long interval (making the tuning curves of intermediary neurons smoother), we hypothesize that the variation trend of decoding efficiency will become smoother and completely decreasing along with the mean H KS. Meanwhile, the quantitative gap in decoding efficiency between input neurons and intermediary neurons will shrink to a reasonable range. In brief, Finding 3 mainly arises from neural tuning properties rather than the mathematical attributes of the proposed information-theoretical metrics. Although we hypothesize that Finding 3 principally holds in most cases, any generalization of Finding 3 should be verified carefully.
Finding 4 arises when we explore the neural representation of the stimulus distribution. In our research, we propose a new conception, the encoding scope µ, to capture the characteristics of neural representation. The encoding scope µ principally measures the proportion, among all stimuli, of the stimulus sub-set S µ that has better interpretability for neural activities (the noise entropy quantities of encoding them are below average). In Fig. 6(f), we have observed that the encoding scope decreases along with the mean H KS, implying that shallow-layer neurons have smaller encoding scopes than deep-layer neurons. Moreover, the local interpretability (the interpretability of neural activities by the stimuli within S µ) decreases from shallow-layer neurons to deep-layer neurons. In general, we suggest that these phenomena result from the mathematical nature of noise entropy (see (C4) and (C7)) as well as the differences between shallow-layer and deep-layer neurons in neural tuning properties. Please note that the noise entropy towards stimulus s (see (C7)) is defined as $H_s(N_n, t) = -\sum_{r} P_n^{\heartsuit}(r \mid s, 0, \tau_S(t)) \log_2 P_n^{\heartsuit}(r \mid s, 0, \tau_S(t))$. Following a similar idea to the one applied to Finding 1, the entropy quantity will be maximized if the probability distribution $\{P_n^{\heartsuit}(r \mid s, 0, \tau_S(t))\}_r$ approaches the uniform distribution on $[0, \overline{r}]$ (e.g., the peaks of the distribution become broader). Because $P_n^{\heartsuit}(r \mid s, 0, \tau_S(t))$ is governed by the tuning curve G n of neuron N n, this approaching process essentially requires the response coefficient G n(s) to be large. Otherwise, the density of $\{P_n^{\heartsuit}(r \mid s, 0, \tau_S(t))\}_r$ will concentrate on a narrow sub-interval of $[0, \overline{r}]$ where r is small. One can verify that shallow-layer neurons (e.g., input neurons) usually feature broader tuning curve peaks and higher maximum response coefficients than deep-layer neurons (e.g., see Fig. 6(d)).
For shallow-layer neurons, this property makes the noise entropy towards a wide range of stimuli located near tuning curve peaks sufficiently large, leading to higher noise entropy quantities (see Fig. 6(c)). Meanwhile, relatively few stimuli have low response coefficients, implying a small size of S µ because most stimuli correspond to high noise entropy. Therefore, the encoding scopes of shallow-layer neurons are frequently small. Opposite situations can be seen in deep-layer neurons because of their relatively low response coefficients to stimuli and narrower tuning curve peaks. As for the differences between shallow-layer and deep-layer neurons in local interpretability, although we hypothesize that this phenomenon arises from neural tuning properties, we cannot derive it mathematically in the current work.
In summary, we suggest that Finding 1 and Finding 2 emerge from the mathematical properties of the neural tuning Kolmogorov-Sinai entropy H KS and remain consistent with neural characteristics. The validity and generalization ability of these two findings can be partly ensured. As for Finding 3, which results from neural tuning properties, we hypothesize that it principally holds under different conditions because the involved neural tuning properties are basic properties of neural systems. However, we need to emphasize that our results have not been verified in large-scale and long-run experiments. The phenomenon itself, as well as the related discussions, should be treated carefully. For Finding 4, although the phenomenon relevant to the encoding scope can be mathematically derived from noise entropy and neural tuning properties, the phenomenon related to local interpretability remains a subject for further investigation.

C. Future directions
Understanding the connection between dynamics and information in the brain has become one of the most critical challenges in physics and neuroscience, leading to a promising way to improve the interpretability of information and cognitive functions based on physical foundations [4].
Our theoretical framework is a possible paradigm towards the unified measurement and analysis of the dynamics and information attributes of neural activities. As suggested above, more verification is necessary for the potential connections between information and dynamics identified by this framework. In future works, one can further explore our framework in the context of the stochastic dynamics of the master equation [86,87] (also known as the Schnakenberg network theory [88,89]). Another valuable direction in the theoretical analysis is to further study the potential connections between the proposed neural tuning Kolmogorov-Sinai entropy and the Lyapunov spectra (one can turn to [49,50] for applications of Lyapunov spectra in neural network studies). Building on the current qualitative connection established by the Ruelle inequality [56] (or the Pesin identity [55]), more intrinsic properties of neural dynamics might be revealed if a systematic unification between these two kinds of metrics of neural activities is implemented at a more quantitative level. Moreover, although the roles of neural plasticity (e.g., STDP [90,91]) have not been included in our current analysis, the proposed stimulus-triggered neural activity characterization can easily be generalized to plasticity conditions to study the effects of memory (e.g., by defining the neural response threshold Υ in (A10) as time-dependent). We expect this unified framework to deepen our understanding of the emergence of various characteristics of neural information processing from physical bases.
In the current research, we have focused on implementing the unification at the sub-cortex scale (e.g., a neural population with thousands of neurons), where each neuron plays a critical role and cognitive functions have not yet emerged. Our future pursuit is to generalize the present framework to the cortex scale and further analyze cognitive functions. On the cortex surface, the ultra-dense distributions of neurons and synapses cause the roles of individual neurons or local network topology to be overshadowed by the global cortex dynamics during information processing [92]. Therefore, the challenge we face is to develop a macroscopic description of the information-processing-related neural dynamics. One possible approach is to apply the renormalization group [93][94][95] to neural dynamics and analyze the multi-scale transformation of dynamics. Such an analysis allows us to understand the accumulation of dynamics from the cellular scale to the cortical scale. Another potential way is to develop a continuous formulation of neural dynamics by approaching the thermodynamic limit of the proposed neural activity characterization [48]. Then, we can combine our characterization framework with specific cortical field models of perception (e.g., the ring models of the primary visual cortex [96,97]) to generate the information-processing-related dynamics in the ultra-large neural populations of certain sensory cortices. Based on these implementations, it might be possible to apply our framework to further study how cognitive functions are shaped by global cortex dynamics.
In the viewpoint of Charles F. Stevens [98], models are common in neuroscience, but theories are relatively scarce. Neuroscience has amassed various models to describe specific phenomena, but few theories offer general frameworks that cover a wide range of facts and uncover the underlying connections between different issues. Our research, as well as diverse previous explorations (e.g., the works on neural information functions [18] and on neural dynamics [9]), demonstrates a possible way to identify the general connections between information and dynamics in the brain. These theories and discoveries suggest an evolutionary perspective in which the characteristics of neural information functions, and further cognitive functions, naturally emerge from the physically fundamental properties of neural ensembles rather than being designed by complex high-level mechanisms. This perspective is worthy of further exploration in the future.

The neural population

We begin with a random neuron population N(V, E), where V is the set of all neurons and E is the set of all synapses. We use the weighted adjacency matrix C to describe the synaptic connection strength between any two neurons N i and N j as C ij ∈ [−1, 1] (here C ij < 0 stands for an inhibitory connection, C ij > 0 stands for an excitatory connection, and C ij = 0 means that there is no synaptic connection). In experiments, we randomly generate V, E, and C for universality. Specifically, the randomization of E utilizes the basic approach introduced by Erdős and Rényi [99,100]: the probability for any two neurons to feature a synaptic connection is set as p, implying that the average degree of neurons equals p(|V| − 1). For convenience, we randomize p ∈ [0.02, 0.025] in every experiment. As for C, each element is uniformly randomized from [−1, 1].
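The population construction can be sketched as follows; the directed-connection convention and the exclusion of self-connections are our assumptions, not details stated in the text.

```python
import numpy as np

def random_population(n_neurons, rng=None):
    """Generate a random neural population as a weighted adjacency matrix C.

    Synapses follow an Erdos-Renyi scheme: each ordered pair of distinct
    neurons is connected with probability p drawn from [0.02, 0.025], and
    each existing synaptic weight is uniform on [-1, 1] (negative values
    are inhibitory, positive excitatory, zero means no synapse).
    """
    rng = np.random.default_rng(rng)
    p = rng.uniform(0.02, 0.025)
    mask = rng.random((n_neurons, n_neurons)) < p
    np.fill_diagonal(mask, False)      # assumption: no self-connections
    C = np.where(mask, rng.uniform(-1.0, 1.0, (n_neurons, n_neurons)), 0.0)
    return C, p

C, p = random_population(1000, rng=0)
# The average degree should be close to p * (|V| - 1).
mean_degree = (C != 0).sum(axis=1).mean()
```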
In a neural population, stimulus inputs cannot be received simultaneously by all neurons. We refer to the neurons that receive inputs directly and instantaneously as input neurons. The neurons that are not directly triggered by stimulus inputs can be activated by stimulus information transmitted from the neurons in their receptive fields (the sets of their pre-synaptic neurons); we call them intermediary neurons. To keep our experiments universal, we randomly pick a subset of neurons in a generated neural population as input neurons, and the remaining neurons are considered intermediary neurons.

The neural activity of input neuron
Each input neuron N i processes stimuli directly, and its activity profile is shaped by its tuning properties (i.e., its response selectivity to stimuli). The neural tuning curve is a standard model to describe this response selectivity [2,53], where the stimulus at the peak evokes the highest response rate [53]. A representative example is the bell-shaped tuning curve [2,53],
$G_i(s) = R_i \exp\left(-\frac{(s - s_i)^2}{2\sigma_i^2}\right)$, (A1)
where R i is the maximum response coefficient, s i is the preferred stimulus selected from the stimulus set S, and σ i represents the width of the tuning curve. Assume that a stimulus sequence S occurs in a given time interval [0, t′]. For convenience, the sequence S is sampled from S uniformly in our experiments, and the time unit ∆t (the minimum time step for discretization) is set as 1. Note that other kinds of randomization and discretization can also be applied.
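A minimal sketch of this tuning curve, assuming the standard Gaussian bell shape consistent with the three stated parameters (maximum response coefficient, preferred stimulus, width):

```python
import numpy as np

def tuning_curve(s, R_i, s_i, sigma_i):
    """Bell-shaped (Gaussian) tuning curve G_i(s).

    R_i: maximum response coefficient, s_i: preferred stimulus,
    sigma_i: tuning-curve width. The exact Gaussian form is an
    assumption consistent with the stated parameters.
    """
    return R_i * np.exp(-((s - s_i) ** 2) / (2.0 * sigma_i ** 2))

# Illustrative parameters (not from the paper's experiments).
s = np.linspace(0.0, 10.0, 101)
G = tuning_curve(s, R_i=5.0, s_i=4.0, sigma_i=1.5)
# The preferred stimulus s_i evokes the highest response rate R_i.
```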
Following the perspective of rate coding [2], we implement the neural activity characterization as the probability for the neural response to reach a specific frequency at a given moment. The Poisson process is efficient in approximating this probability in most cases [2]. Given the stimulus sequence S, the probability for the cumulative neural response count of an input neuron N i to reach a specific quantity r at moment t (t ∈ [0, t′]) can be approximated by a non-homogeneous Poisson process, whose probability distribution is
$P_i(r \mid S, 0, t) = \frac{[\Lambda_i(0, t)]^r}{r!} \exp[-\Lambda_i(0, t)]$, (A2)
where $\Lambda_i(0, t)$ is the cumulative intensity function
$\Lambda_i(0, t) = \sum_{m=0}^{t} \lambda_i(m)\,\Delta t$, (A3)
and $\lambda_i(m)$ denotes the time-varying intensity function
$\lambda_i(m) = G_i(S(m))$. (A4)
Given (A2-A4), the activity profile of input neuron N i is defined based on the interactions between the stimulus sequence S and the neural tuning properties G i(s). Given the above definition, we can further generate possible neural activities. This research generates neural activities by predicting the arrival time sequence of neural responses with the maximum probability (MP) method. For the r-th neural response, we can find the location of its maximum probability by working out
$\widetilde{t}_r = \mathrm{argmax}_t\, P_i(r \mid S, 0, t)$. (A5)
The result $\widetilde{t}_r$ obtained by the operator argmax is the moment that maximizes the probability $P_i(r \mid S, 0, t)$, meaning that the probability for the r-th neural response to arrive at moment $\widetilde{t}_r$ is highest. Therefore, the r-th neural response most frequently occurs at moment $\widetilde{t}_r$ in real situations. The MP method benefits neural response generation for its low computational costs and its ability to approximate neural response generation by large-scale Monte Carlo sampling [101] on probability distributions (the maximum probability implies the highest occurrence frequency when the sampling size is large enough). One can replace the MP method with large-scale Monte Carlo sampling when computing power allows.
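The MP method in (A5) can be sketched as follows; the constant-intensity example at the end is purely illustrative, and working in log-probability is our implementation choice.

```python
import math
import numpy as np

def mp_spike_times(lam, r_max):
    """Maximum-probability (MP) arrival times for a non-homogeneous
    Poisson process with per-step intensities lam[m] (time unit dt = 1).

    For each response count r, pick t_r = argmax_t P(r | 0, t), where
    P(r | 0, t) is a Poisson pmf with cumulative intensity Lambda(0, t).
    The argmax is taken over the log-pmf for numerical stability.
    """
    Lambda = np.cumsum(lam)                  # cumulative intensity Lambda(0, t)
    times = []
    for r in range(1, r_max + 1):
        logp = r * np.log(Lambda) - Lambda - math.lgamma(r + 1)
        times.append(int(np.argmax(logp)))
    return times

lam = np.full(100, 0.2)                      # constant-intensity example
t = mp_spike_times(lam, r_max=5)
# The arrival moments are non-decreasing, as the Poisson process guarantees.
```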
A potential limitation of the MP method is that it essentially creates a deterministic mapping from the probability distribution $P_i(r \mid S, 0, t)$ to the neural response. Although this property does not affect our research, because neural response trains that strictly reflect $P_i(r \mid S, 0, t)$ are exactly what our analysis demands, the MP method is not applicable when one demands more randomness in neural response generation (e.g., to simulate noisy neural responses that do not follow $P_i(r \mid S, 0, t)$ strictly). In the latter situation, a relatively small-scale Monte Carlo sampling can be used to introduce more randomness.
By traversing all possible response rates r, we can obtain a set of moments $\{\widetilde{t}_r\}_{r \in \mathbb{N}^+}$ based on (A5). Given the properties of the Poisson process, $\{\widetilde{t}_r\}_{r \in \mathbb{N}^+}$ is naturally ensured to be non-decreasing. For convenience, we mark every neural response on the timeline following
$R_i(t) = \sum_{\widetilde{t}_r \in T_i} \delta(t - \widetilde{t}_r)$, (A6)
where $T_i = \{\widetilde{t}_r\}_{r \in \mathbb{N}^+}$ and δ denotes the Dirac delta function. The obtained sequence $R_i(t)$ equals +∞ if a neural response arrives at moment t and equals 0 otherwise. Please note that one should replace every +∞ in $R_i$ by 1 in computational implementations.

The neural activity of intermediary neuron
Each intermediary neuron N j is driven by its pre-synaptic neurons RF(N j) and affected by the network dynamics, so its activities cannot be simplified as (A2). This research characterizes the neural activity profile of N j by describing its synaptic inputs and its neural responses to these inputs.
For each neuron N k in RF(N j), let its neural response arrival sequence be R k(t); then we can describe the synaptic input from N k to N j as R k(t) C kj. In most cases, the cumulative synaptic inputs on neuron N j have a dissipation term Ω j(t) (e.g., the leaky term in the leaky integrate-and-fire neuron [32]), and N j emits a neural response only if the cumulative synaptic inputs reach a specific response threshold Υ j(t) (e.g., the spiking threshold [32]). Therefore, we can define the neural response of N j as in (A7), where υ(·) denotes the unit step function and $\Psi_j(x) := \sum_{N_k \in RF(N_j)} R_k(x)\, C_{kj}$ denotes the synaptic inputs. In our experiments, the definitions of Ω j(t) and Υ j(t) are proposed in general forms. Specifically, Ω j(t) satisfies (A8), where τ denotes the minimum time consumption of generating spikes and ε(t) is a time-dependent perturbation. Based on (A8), we can see that (1) for t ∈ [0, τ), the cumulative synaptic inputs of N j during [0, t] will only be perturbed by ε(t) rather than dissipated: this is natural because the first spike generation has not ended yet and historical accumulations should not dissipate; (2) for t ∈ [τ, t′], the cumulative synaptic inputs of N j during [0, t] will be dissipated down to a small quantity of remaining historical perturbations. In computational implementations, we can discretize (A8) (setting the minimum time step ∆t = τ) so that historical accumulations before the (n − 1)-th step dissipate at the n-th step and only leave behind specific perturbations. A concrete example of the general definition in (A8) is the leaky term in the leaky integrate-and-fire neuron [32], where the time-dependent perturbation ε(t) is frequently omitted. In our research, this perturbation is defined following (A9), in which γ ε is the degree of perturbation. In our experiments, we randomly set γ ε ∈ [20, 50] to control the intensity of the perturbation.
As for the time-dependent neural response threshold Υ j(t), it can be defined in various forms to implement plasticity mechanisms [90,91]. In our experiments, we use the simplified definition (A10), in which we randomly set θ ∈ [1/4, 3/4] based on the ratio of the difference between the response threshold and the resting state to the difference between the response apex and the resting state [35,102,103].
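A heavily simplified sketch of the intermediary-neuron response rule described above. The full per-step dissipation (∆t = τ discretization), the absent perturbation term, and the concrete threshold choice are our assumptions for illustration, not the paper's exact (A7)-(A10).

```python
import numpy as np

def intermediary_responses(inputs, C_col, theta=0.5):
    """Sketch of the threshold response of an intermediary neuron N_j.

    inputs: (n_pre, T) array of presynaptic response trains R_k(t)
    (0/1 after replacing +inf by 1); C_col: synaptic weights C_kj;
    theta: threshold parameter. Assumptions: accumulated input fully
    dissipates after each step and the perturbation term is omitted,
    so a spike is emitted whenever the per-step synaptic input
    Psi_j(t) reaches a simplified threshold Upsilon_j.
    """
    psi = C_col @ inputs                     # Psi_j(t): summed synaptic input
    upsilon_j = theta * np.max(np.abs(psi))  # simplified response threshold
    return (psi >= upsilon_j).astype(int)    # unit step decision per step

# Two presynaptic neurons (one excitatory, one inhibitory), four time steps.
R_pre = np.array([[1, 0, 1, 1],
                  [0, 1, 1, 0]])
weights = np.array([0.8, -0.3])
spikes = intermediary_responses(R_pre, weights)
```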
Based on (A7), we can obtain an observed neural response sequence of the intermediary neuron N j. We repeat the experiment with the stimulus sequence S for h times; each time we obtain an observed result R j,a(t) (a ∈ ℤ ∩ [1, h]). Then we can use maximum likelihood estimation to construct the observed non-homogeneous Poisson process of the neural activities of N j, where the observed time-varying intensity function $\widehat{\lambda}_j(t)$ is given by (A11). Based on (A11), the observed Poisson process of N j features the probability distribution (A12), where $\widehat{\Lambda}_j(0, t)$ is the observed cumulative intensity function (A13). Given (A12), the MP method in (A5) can be applied to obtain an observed neural response sequence $\widehat{R}_j(t)$. Moreover, if S ⊆ S, then we can also obtain the observed tuning curve of N j as $\widehat{G}_j(S(m)) = \widehat{\lambda}_j(m)$. (A14)
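Assuming (A11) is the standard Poisson maximum-likelihood estimate (the sample mean of the observed response trains across trials), the estimation can be sketched as:

```python
import numpy as np

def mle_intensity(trials):
    """Maximum-likelihood intensity for a non-homogeneous Poisson process
    observed over h repeated trials.

    trials: (h, T) array of observed 0/1 response trains R_{j,a}(t).
    For Poisson counts, the MLE of the per-step intensity is the sample
    mean across trials (assumed to match the elided (A11)).
    """
    trials = np.asarray(trials, dtype=float)
    return trials.mean(axis=0)

# h = 3 repeated trials of a 3-step response train (illustrative data).
R = np.array([[1, 0, 1],
              [1, 1, 0],
              [1, 0, 1]])
lam_hat = mle_intensity(R)
```

By (A14), `lam_hat` evaluated on the stimulus sequence would also serve as the observed tuning curve of the intermediary neuron.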

Summary of neural activity characterization
To this point, we have characterized the stimulus-triggered neural activities in a neural population. For convenience, we denote the stochastic process of each neuron N n as $P_n^{\heartsuit}$, where ♥ acts as an index of the neuron type: ♥ = 1 indicates that N n is an input neuron whose neural activities are defined by (A2), while ♥ = 0 indicates that N n is an intermediary neuron that follows (A12).
In our characterization, we distinguish between input and intermediary neurons because input neurons are similar to the sensory cells that perceive external stimuli in the early information processing stage. From a physics perspective, input neurons serve to transform stimuli into non-homogeneous Poisson processes following their neural tuning properties. Although this physics property can be mathematically simplified by direct signal transformation (e.g., see [2]), we suggest that the study of neural information function benefits from including input neurons into neural populations.
To computationally generate neural activities in neural populations, one can consider the following algorithm. In (A2) and (A12), we have described the neural activities of both input and intermediary neurons in a neural population with non-homogeneous Poisson processes. A beneficial property for further analysis is that any Poisson process is a kind of continuous-time Markov chain. Therefore, we reformulate neural activities in the form of a Markov chain.
Assume the neural population encodes a stimulus sequence S in an interval [0, t′]. For a neuron N n in the neural population, we concentrate on the probability that the neural response rate at moment t is r and that at moment t + τ (τ ≥ 0) is r′. This probability describes the transition possibility from neural response state r to state r′, which can be defined as $\upsilon(r' - r)\, P_n^{\heartsuit}(r' - r \mid S, t, t + \tau)$. The first step to construct the Markov chain is to define the time-varying transition matrix W(S, t, t + τ) as
$W_{rr'}(S, t, t + \tau) = \upsilon(r' - r)\, P_n^{\heartsuit}(r' - r \mid S, t, t + \tau)$. (B1)
The second step is to propose the master equation of the Markov chain of neuron N n, which is given as
$P_n^{\heartsuit}(r' \mid S, 0, t + \tau) = \sum_{r} P_n^{\heartsuit}(r \mid S, 0, t)\, W_{rr'}(S, t, t + \tau)$, (B4)
where τ ≥ 0 and $\sum_{r' \geq r} W_{rr'}(S, t, t + \tau) = 1$. Based on (B4), we can describe the stimulus-triggered neural activity of N n with a non-homogeneous continuous-time Markov chain. [Algorithm listing: the observed Poisson processes of intermediary neurons are estimated from the set $\{\widehat{R}_{n,a}\}_{a \in \mathbb{Z} \cap [1, h]}$ following (A11-A12), and the neural response sequences $R_n(\tau)$ with $t - \Delta t \leq \tau \leq t$ are updated following (A5-A6).]
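The transition-matrix construction and the master-equation update can be sketched as follows; the Poisson increment distribution is an illustrative stand-in for the process $P_n^{\heartsuit}$, and the row renormalization handles the truncation to finitely many states.

```python
import math
import numpy as np

def transition_matrix(increment_pmf, n_states):
    """Build the time-varying transition matrix W over response-rate states.

    W[r, r'] = step(r' - r) * P(r' - r): response counts never decrease,
    and moving from r to r' depends only on the number of additional
    responses in [t, t + tau]. increment_pmf(k) returns the probability
    of k new responses (an illustrative choice, not the paper's exact P_n).
    """
    W = np.zeros((n_states, n_states))
    for r in range(n_states):
        for rp in range(r, n_states):
            W[r, rp] = increment_pmf(rp - r)
        W[r] /= W[r].sum()           # renormalize the truncated row to 1
    return W

def master_step(p, W):
    """One master-equation update: P(r', t + tau) = sum_r P(r, t) W[r, r']."""
    return p @ W

# Poisson-distributed increments with mean 0.5 as an illustrative pmf.
pmf = lambda k: 0.5 ** k * math.exp(-0.5) / math.factorial(k)
W = transition_matrix(pmf, n_states=20)
p0 = np.zeros(20)
p0[0] = 1.0                          # start with zero responses at t = 0
p1 = master_step(p0, W)              # state distribution after one step
```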

Defining Kolmogorov-Sinai entropy depending on neural tuning properties
The Kolmogorov-Sinai entropy characterizes the randomness of a dynamical system forward in time [54,58]. The classical definition of the Kolmogorov-Sinai entropy is
$h_{KS} = -\lim_{n \to \infty} \frac{1}{n\tau} \sum_{C_0, \ldots, C_{n-1}} P(C_0, \ldots, C_{n-1}) \ln P(C_0, \ldots, C_{n-1})$, (B5)
where τ is the variation time step and each C i denotes a state of the dynamical system (e.g., the neural response rate we have analysed before). In general, the parameter τ can be understood as the time interval between any two samplings of the dynamical system. Enlarging τ is similar to coarse graining. One can turn to [54,58] for a systematic analysis of the Kolmogorov-Sinai entropy in statistical physics.
This research defines the neural activities of each neuron as a kind of non-homogeneous continuous-time Markov chain. However, this does not mean that we need to consider the Kolmogorov-Sinai entropy in the continuous-time limit (the variation time step satisfying τ → 0). In the neural system, any variation of neural states takes time (e.g., the time cost of biochemical reactions and the absolute refractory period). Thus, in the calculation of the Kolmogorov-Sinai entropy, we only need to consider the situation where the selectable moment (e.g., t in (B4)) is continuous (this is ensured by the Poisson process) and the variation time step (e.g., τ in (B4)) does not approach 0, so as to match the properties of the neural system.
Based on the knowledge of Markov chains, it is trivial that (B5) can be written as
$h_{KS} = -\frac{1}{\tau} \sum_{C} P(C) \sum_{C'} W(C \to C') \ln W(C \to C')$, (B6)
where P(C) is the probability of state C, and W(C → C′) defines the transition probability from state C to state C′ [54,58]. In this research, the probability of neural activities is described in (B4). For each neuron N n, the Kolmogorov-Sinai entropy H KS can be defined as a metric of its stimulus-triggered neural dynamics (we refer to it as the neural tuning Kolmogorov-Sinai entropy)
$H_{KS} = -\frac{1}{\tau} \sum_{r} P_n^{\heartsuit}(r \mid S, 0, t) \sum_{r' \geq r} W_{rr'}(S, t, t + \tau) \ln W_{rr'}(S, t, t + \tau)$. (B7)
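A direct implementation of the Markov-chain entropy rate in (B6), checked on two limiting cases (the three-state chains are illustrative, not from the paper's experiments):

```python
import numpy as np

def ks_entropy(p, W, tau=1.0):
    """Kolmogorov-Sinai entropy rate of a Markov chain:
    H_KS = -(1/tau) * sum_C P(C) * sum_C' W(C -> C') ln W(C -> C').
    Zero entries of W contribute nothing (0 * ln 0 := 0).
    """
    W = np.asarray(W, dtype=float)
    safe_W = np.where(W > 0, W, 1.0)          # ln(1) = 0 kills zero entries
    return float(-(p * (W * np.log(safe_W)).sum(axis=1)).sum() / tau)

# A deterministic chain has zero dynamic randomness, while a fully
# random (uniform) chain over n states attains the maximum ln(n).
p = np.full(3, 1.0 / 3.0)                     # state distribution
W_det = np.eye(3)                             # deterministic transitions
W_rand = np.full((3, 3), 1.0 / 3.0)           # uniform transitions
```

For the neural tuning entropy in (B7), `p` would be $P_n^{\heartsuit}(r \mid S, 0, t)$ and `W` the matrix $W_{rr'}(S, t, t + \tau)$.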
Equation (B7) defines the Kolmogorov-Sinai entropy $H_{KS}$ of the probability distribution $P^{\heartsuit}_{n}(r \mid \mathscr{S}, 0, t)$ and, therefore, proposes a natural approximation of the entropy $H_{KS}$ of the neural response trains generated by $P^{\heartsuit}_{n}(r \mid \mathscr{S}, 0, t)$. The approximation is valid when neural response generation is implemented by the MP method or a large-scale Monte Carlo sampling on $P^{\heartsuit}_{n}(r \mid \mathscr{S}, 0, t)$ [101], because these two approaches generate neural response trains that strictly follow $P^{\heartsuit}_{n}(r \mid \mathscr{S}, 0, t)$. However, the approximation becomes invalid when one uses small-scale Monte Carlo sampling or other randomization methods that introduce additional randomness into neural response generation (e.g., while simulating noisy neural responses). In that case, neural response trains do not strictly follow $P^{\heartsuit}_{n}(r \mid \mathscr{S}, 0, t)$ and may have their own entropy quantities.

Chaos of neural activities
An important property of the Kolmogorov-Sinai entropy is that the Kolmogorov-Sinai entropy of a dynamical system is no more than the sum of all the positive Lyapunov exponents of this system (see the Ruelle inequality [56]). This property connects the Kolmogorov-Sinai entropy with Lyapunov spectrum analysis [49,50], supporting an analysis of chaos.
For each neuron $N_n$, we have defined its neural tuning Kolmogorov-Sinai entropy by (B7). Based on the Ruelle inequality, there is
$$H_{KS} \leq \sum_{i} \xi_i \, \mathbb{I}(\xi_i > 0), \tag{B8}$$
where each $\xi_i$ is a Lyapunov exponent of the dynamical system that describes the neural activities of $N_n$, and $\mathbb{I}$ is the indicator function. The equality holds only when the system is endowed with a Sinai-Ruelle-Bowen (SRB) measure [104]. Under this condition, (B8) is usually referred to as the Pesin identity [55]. In dynamical systems theory, a Lyapunov exponent $\xi$ characterizes the separation or convergence rate of infinitesimally close trajectories in the phase space (the space of all states of the dynamical system). Several key properties of Lyapunov exponents need to be emphasized:
• For a dynamical system with $n$ parameters, the phase space is $n$-dimensional and there are $n$ Lyapunov exponents;
• If a Lyapunov exponent $\xi$ is positive, it measures the separation rate of close trajectories in the corresponding direction; if it is negative, it measures the convergence rate; if it equals 0, the trajectories neither separate nor converge in this direction;
• For a dynamical system with $n$ Lyapunov exponents, if at least one Lyapunov exponent is positive, then the dynamical system can be chaotic.
Therefore, if $H_{KS}$ is positive for neuron $N_n$, then the neural activities of this neuron are chaotic. When $H_{KS}$ increases, the chaos is intensified.
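To connect the chaos criterion with a concrete system, the sketch below estimates the single Lyapunov exponent of the logistic map, a standard one-dimensional example used here purely as an assumed stand-in for a neural state variable, not the paper's model. At parameter $a = 4$ the exponent is known to be $\ln 2 > 0$, so the dynamics is chaotic by the criterion above.

```python
import numpy as np

# Illustrative chaos check on the logistic map x -> a*x*(1-x) (an assumed
# stand-in, not the paper's neural model). The Lyapunov exponent is the
# long-time average of log|f'(x)| along the orbit.

def lyapunov_logistic(a, x0=0.3, n=100_000, burn=1_000):
    x = x0
    for _ in range(burn):                        # discard transients
        x = a * x * (1.0 - x)
    acc = 0.0
    for _ in range(n):
        x = a * x * (1.0 - x)
        acc += np.log(abs(a * (1.0 - 2.0 * x)))  # log |f'(x)|
    return acc / n

xi = lyapunov_logistic(4.0)
print(xi)  # close to ln(2); positive, hence chaotic
```

A positive estimate here plays the same role as a positive $H_{KS}$ in (B8): it certifies exponential separation of nearby trajectories.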

Appendix C: Properties of the neural information processing
In this section, we calculate the parameters related to neural encoding and decoding properties. To provide a clear vision, we first give basic explanations of these two conceptions:
• Neural encoding concerns how neural responses encode and represent the input stimulus;
• Neural decoding studies how to decode the coded stimulus information from neural signals.

Properties of neural encoding
To measure the encoding efficiency of neurons, there are three widely used parameters:
• Total response entropy $H$, which measures the total variation of neural responses to stimuli;
• Noise entropy $H^{\mathrm{noise}}$, which measures the variation of neural responses that cannot be explained by stimuli;
• Mutual information $H^{\mathrm{mut}}$, which measures the variation of neural responses that can be explained by stimuli.
The classic definitions of these three parameters have been summarized in [2]. In this research, we reformulate their calculation to fit dynamic situations. Before implementing the reformulation, several necessary derivations must be carried out.
First, for the stimulus sequence $\mathscr{S}$ that occurs in $[0, t']$, it can be recorded to obtain a posterior probability distribution of stimuli based on frequency statistics. A stimulus $s$ cannot be recorded until it occurs. Thus, the posterior probability distribution of stimuli is time-dependent (the frequency statistics of stimuli are updated in real time). For convenience, we use $\mathscr{S}(0, t)$ to represent the stimulus sequence that has been recorded in $[0, t]$. For each moment $t$ in $[0, t']$, the corresponding distribution is defined as $P(\mathscr{S}, 0, t)$.
Second, for each stimulus $s$, the occurrence duration can be recorded as well. A stimulus might occur multiple times, and each occurrence corresponds to an occurrence duration. Although the stimulus sequence $\mathscr{S}$ is sampled from a stationary process in our experiments (therefore, the occurrence duration is fixed as 1), we still present our theory in a general form to fit more situations. For every moment $t$ in $[0, t']$, we can apply frequency statistics to obtain $P(\tau_s, 0, t)$ as the posterior probability distribution of the occurrence duration length of stimulus $s$ based on the records in $[0, t]$. Then, we define
$$\tau_{S}(t) = \sum_{s \in \mathscr{S}(0, t)} P(s, 0, t) \sum_{\tau_s} P(\tau_s, 0, t)\, \tau_s, \tag{C1}$$
where $\tau_{S}(t)$ is the mean duration length averaged over all possible duration lengths of every stimulus in $\mathscr{S}$ that occurs in $[0, t]$.
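The running frequency statistics behind $\tau_S(t)$ can be sketched as follows; the event record (stimulus labels and durations) is invented data for illustration:

```python
from collections import Counter, defaultdict

# Sketch of the mean-duration statistic: from a record of
# (stimulus, duration) events observed in [0, t], estimate P(s),
# P(tau_s | s), and the duration averaged over all stimuli.

events = [("A", 1), ("A", 2), ("B", 1), ("A", 1), ("B", 3)]  # invented record

def mean_duration(events):
    n = len(events)
    stim_counts = Counter(s for s, _ in events)
    dur_by_stim = defaultdict(Counter)
    for s, d in events:
        dur_by_stim[s][d] += 1
    tau = 0.0
    for s, c in stim_counts.items():
        p_s = c / n                        # posterior P(s, 0, t)
        for d, cd in dur_by_stim[s].items():
            tau += p_s * (cd / c) * d      # P(tau_s | s) weighted duration
    return tau

print(mean_duration(events))
```

Each new event simply updates the counters, which is what makes the statistic time-dependent as the recording window $[0, t]$ grows.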
Third, based on the proposed stimulus-triggered neural activity characterization, we can define
$$P^{\heartsuit}_{n}(r, 0, t) = \sum_{s \in \mathscr{S}(0, t)} P^{\heartsuit}_{n}\left(r \mid s, 0, \tau_{S}(t)\right) P(s, 0, t), \tag{C2}$$
where $P^{\heartsuit}_{n}(r \mid s, 0, \tau_{S}(t))$ denotes the probability for the neural response rate to reach $r$ under a stimulus $s$ lasting for $\tau_{S}(t)$, which can be calculated by (A15). It can be seen that (C2) defines a neural response probability distribution of $N_n$ based on the frequency statistics in $[0, t]$.
Given the above derivations, we can define the time-dependent total response entropy as
$$H(N_n, t) = -\sum_{r} P^{\heartsuit}_{n}(r, 0, t) \log_2 P^{\heartsuit}_{n}(r, 0, t). \tag{C3}$$
Similarly, the time-dependent noise entropy averages the conditional response entropy over stimuli:
$$H^{\mathrm{noise}}(N_n, t) = -\sum_{s \in \mathscr{S}(0, t)} P(s, 0, t) \sum_{r} P^{\heartsuit}_{n}\left(r \mid s, 0, \tau_{S}(t)\right) \log_2 P^{\heartsuit}_{n}\left(r \mid s, 0, \tau_{S}(t)\right). \tag{C4}$$
Finally, we can calculate the time-dependent mutual information as
$$H^{\mathrm{mut}}(N_n, t) = H(N_n, t) - H^{\mathrm{noise}}(N_n, t). \tag{C5}$$
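As a toy numerical check of these encoding quantities, the sketch below computes the marginal response distribution as in (C2), then the total and noise entropies and their difference, from an assumed stimulus prior and conditional response table (all numbers invented):

```python
import numpy as np

# Toy check of the encoding quantities: total response entropy, noise
# entropy, and mutual information, from an assumed stimulus prior P(s)
# and conditional response table P(r | s). All numbers are invented.

P_s = np.array([0.5, 0.5])             # two stimuli, equal posteriors
P_r_given_s = np.array([[0.9, 0.1],    # P(r | s = 0)
                        [0.2, 0.8]])   # P(r | s = 1)

P_r = P_s @ P_r_given_s                # marginal response distribution

H_total = -np.sum(P_r * np.log2(P_r))                                  # total entropy
H_noise = -np.sum(P_s[:, None] * P_r_given_s * np.log2(P_r_given_s))  # noise entropy
H_mut = H_total - H_noise                                              # mutual information

print(H_total, H_noise, H_mut)
```

As expected, the mutual information is non-negative and bounded by the total response entropy; it vanishes only when the response table is identical across stimuli.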
Moreover, we can also measure the time-dependent entropy of the stimulus sequence as
$$H(\mathscr{S}, t) = -\sum_{s \in \mathscr{S}(0, t)} P(s, 0, t) \log_2 P(s, 0, t),$$
and, with $\mathscr{S}_\mu(0, t)$ denoting the subset of stimuli that can better explain neural activities, define
$$\mu(N_n, t) = \frac{|\mathscr{S}_\mu(0, t)|}{|\mathscr{S}(0, t)|},$$
so that $\mu(N_n, t)$ measures the proportion of the stimuli that can better explain neural activities among all stimuli. We refer to $\mu$ as the encoding scope. We can then calculate the total response entropy, noise entropy, and mutual information based only on the stimuli in $\mathscr{S}_\mu$ (here $\mathscr{S}_\mu$ can be treated as a specific local stimulus distribution). Most of the calculations remain the same as defined above. The only two differences are that we need to recalculate the occurrence probability of the stimuli in $\mathscr{S}_\mu$ and the occurrence duration length (since we only focus on the local part of the stimulus distribution). Specifically, for each $s$ in $\mathscr{S}_\mu$, its new probability $P'(s, 0, t)$ is recalculated as
$$P'(s, 0, t) = \frac{P(s, 0, t)}{\sum_{s_i \in \mathscr{S}_\mu} P(s_i, 0, t)}. \tag{C10}$$
Apart from that, the new occurrence duration length is
$$\tau'_{S}(t) = \sum_{s \in \mathscr{S}_\mu(0, t)} P'(s, 0, t) \sum_{\tau_s} P(\tau_s, 0, t)\, \tau_s. \tag{C11}$$
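The renormalisation in (C10) and the encoding scope $\mu$ can be sketched directly; the stimulus names, the prior, and the chosen subset $\mathscr{S}_\mu$ are invented for illustration:

```python
# Sketch of (C10): restrict the stimulus prior to a subset S_mu and
# rescale so the restricted probabilities sum to 1. The prior and the
# subset membership below are illustrative assumptions.

P = {"s1": 0.5, "s2": 0.3, "s3": 0.2}   # posterior P(s, 0, t) over all stimuli
S_mu = {"s1", "s3"}                     # stimuli that better explain responses

Z = sum(P[s] for s in S_mu)             # normaliser over S_mu
P_mu = {s: P[s] / Z for s in S_mu}      # renormalised prior

mu = len(S_mu) / len(P)                 # encoding scope
print(P_mu, mu)
```

After this rescaling, the entropy and mutual-information formulas above can be reused unchanged on the restricted distribution.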
Based on the new probabilities redefined in (C10-C11), we can further calculate all parameters following (C2-C5).

Properties of neural decoding
Decoding the stimulus parameter from given neural signals is important for neuroscience studies. With a decoding (or estimation) technique, researchers can predict the input stimulus based on the neural response.
For neuron $N_n$, assume that we have recorded its neural response sequence to $\mathscr{S}$ in $[0, t']$. At any moment $t$ in $[0, t']$, a decoding technique is applied to obtain a time-dependent estimated stimulus sequence $\mathscr{S}_{\mathrm{est}}(0, t)$ based on the neural response sequence in $[0, t]$. If we repeat the experiment with the same stimulus sequence $\mathscr{S}$ for $k$ times and do an estimation each time, then we can obtain an averaged time-dependent estimate
$$\forall s \in \mathscr{S}(0, t), \quad \overline{s}_{\mathrm{est}}(0, t) = \frac{1}{k} \sum_{i=1}^{k} s^{(i)}_{\mathrm{est}}(0, t),$$
where $s^{(i)}_{\mathrm{est}}(0, t)$ denotes the estimate of $s$ in the $i$-th repetition. A decoding technique is increasingly optimized as its variance decreases.
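The repeated-trial averaging described above can be simulated with an assumed noisy decoder; the Gaussian noise model for the decoder output is purely an illustrative assumption:

```python
import numpy as np

# Sketch of repeated-trial decoding: estimate the same stimulus k times
# (here with an assumed Gaussian-noise decoder) and average the estimates,
# then report the empirical bias and variance used in the Cramer-Rao
# discussion. The noise model is illustrative, not the paper's decoder.

rng = np.random.default_rng(0)
s_true = 1.0
k = 10_000
estimates = s_true + 0.1 * rng.standard_normal(k)  # assumed decoder outputs

s_est = estimates.mean()        # averaged estimate over k repetitions
bias = s_est - s_true           # empirical bias
var = estimates.var()           # empirical variance of single-trial estimates
print(bias, var)
```

With this assumed decoder the empirical bias is near zero and the single-trial variance matches the injected noise level, the two quantities that the Cramér-Rao bound constrains below.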
In statistical theory, the Cramér-Rao bound suggests that the Fisher information limits the accuracy with which any decoding technique can estimate the target parameter of the stimulus. Assume that $B^{E}_{n}(t)$ and $D^{E}_{n}(t)$ are the bias and variance obtained with a decoding technique $E$ applied on $N_n$. Then the Cramér-Rao bound can be given as [2]
$$\forall E, \quad D^{E}_{n}(t) \geq \frac{\left(1 + \frac{\partial}{\partial s} B^{E}_{n}(t)\right)^{2}}{F(N_n, t)}, \tag{C15}$$
where the derivative is taken with respect to the stimulus parameter and $F(N_n, t)$ denotes the time-dependent Fisher information of neuron $N_n$. Based on (C15), it can be seen that the calculation of $F(N_n, t)$ is important since it acts as the lower bound of the variance of any decoding technique applied on neuron $N_n$. Even for unbiased decoding ($B^{E}_{n}(t) \equiv 0$, thus $\frac{\partial}{\partial s} B^{E}_{n}(t) \equiv 0$), the variance is still no less than $\frac{1}{F(N_n, t)}$ [2]. To measure the precision limit of any possible decoding technique that can be applied on neuron $N_n$, we calculate the time-dependent Fisher information $F(N_n, t)$ as
$$F(N_n, t) = \sum_{s \in \mathscr{S}(0, t)} P(s, 0, t) \sum_{r} P^{\heartsuit}_{n}\left(r \mid s, 0, \tau_{S}(t)\right) \left[\Delta_s \ln P^{\heartsuit}_{n}\left(r \mid s, 0, \tau_{S}(t)\right)\right]^{2}, \tag{C16}$$
where $\Delta_s(\cdot)$ is the first-order difference with respect to $s$. The Fisher information proposed in (C16) is in discrete form. In special cases, if $P^{\heartsuit}_{n}(r \mid s, 0, \tau_{S}(t))$ is sufficiently smooth with respect to $s$, we can also replace the first-order difference by the partial derivative to obtain a continuous form.
Moreover, the quantity calculated in (C16) is the Fisher information of the whole stimulus sequence. It can be treated as the expectation, over stimuli, of the per-stimulus Fisher information $\sum_{r} P^{\heartsuit}_{n}\left(r \mid s, 0, \tau_{S}(t)\right) \left[\Delta_s \ln P^{\heartsuit}_{n}\left(r \mid s, 0, \tau_{S}(t)\right)\right]^{2}$ of each stimulus $s$ in the sequence. A decoding method $E$ is optimal if and only if its variance satisfies $D^{E}_{n}(t) = \frac{1}{F(N_n, t)}$. It can be seen that for neuron $N_n$, if its Fisher information increases, then the optimal decoding technique applied on it can realize a better precision.
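The discrete Fisher information of (C16) can be sketched with a forward difference over an assumed tuning table (the stimulus grid and response probabilities are invented); the Cramér-Rao bound then gives $1/F$ as the best achievable decoding variance:

```python
import numpy as np

# Hedged sketch of the discrete Fisher information: a forward difference
# of ln P(r | s) across neighbouring stimulus values, averaged over
# responses and stimuli. The tuning table below is invented; a forward
# difference means the last stimulus contributes no term.

s_vals = np.array([0.0, 1.0, 2.0])       # ordered stimulus values
P_s = np.array([1 / 3, 1 / 3, 1 / 3])    # stimulus posterior
P_r_given_s = np.array([[0.8, 0.2],      # P(r | s), one row per stimulus
                        [0.5, 0.5],
                        [0.2, 0.8]])

def fisher_discrete(P_s, P_r_given_s, s_vals):
    F = 0.0
    for i in range(len(s_vals) - 1):     # forward difference Delta_s
        ds = s_vals[i + 1] - s_vals[i]
        d_lnP = (np.log(P_r_given_s[i + 1]) - np.log(P_r_given_s[i])) / ds
        F += P_s[i] * np.sum(P_r_given_s[i] * d_lnP ** 2)
    return F

F = fisher_discrete(P_s, P_r_given_s, s_vals)
print(F, 1.0 / F)  # Fisher information and the Cramer-Rao variance floor
```

Sharper tuning (rows of `P_r_given_s` changing faster with $s$) raises $F$ and thus lowers the variance floor $1/F$, matching the statement that higher Fisher information permits more precise decoding.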
To this point, we have defined the information-processing properties of neural activities from the perspectives of encoding and decoding. Similar to the neural tuning Kolmogorov-Sinai entropy $H_{KS}$, these information-processing properties are defined based on the probability distribution $P^{\heartsuit}_{n}(r \mid \mathscr{S}, 0, t)$, approximating the properties of neural response trains. The validity of the approximation relies on whether neural responses strictly follow $P^{\heartsuit}_{n}(r \mid \mathscr{S}, 0, t)$ and, consequently, cannot be ensured when one generates noisy neural responses with additional randomness.