A Unifying Framework for Information Processing in Stochastically Driven Dynamical Systems

A dynamical system can be regarded as an information processing apparatus that encodes input streams from the external environment into its state and processes them through state transitions. The information processing capacity (IPC) is an excellent tool for comprehensively evaluating these processed inputs, providing details of the unknown information processing in black-box systems; however, this measure can be applied only to time-invariant systems. This paper extends its applicable range to time-variant systems and further reveals that the IPC is equivalent to the coefficients of a polynomial chaos (PC) expansion in more general dynamical systems. To achieve this objective, we tackle three issues. First, we establish a connection between the IPC for time-invariant systems and the PC expansion, which is a type of polynomial expansion using orthogonal functions of the input history as bases. We prove that the IPC corresponds to the squared norm of the coefficient vector of the corresponding basis in the PC expansion. Second, we show that an input following an arbitrary distribution can be used for the IPC, removing previous restrictions to specific input distributions. Third, we extend the conventional orthogonal bases to functions of both time and input history and propose an IPC for time-variant systems. To show the significance of our approach, we demonstrate that our measure can reveal information representations not only in machine learning networks but also in a real, cultured neural network. Our generalized measure paves the way for unveiling the information processing capabilities of a wide variety of physical dynamics in nature that have so far been left unexplored.


I. INTRODUCTION
Dynamical systems driven by external stimuli can be found universally in nature, especially in biology. The dynamical aspects of information processing found in biology have long been a source of inspiration for researchers who wish to create high-speed, energy-efficient, and robust real-time information processing devices that resolve the von Neumann bottleneck [1]. Reservoir computing (RC) [2][3][4] is a bio-inspired information processing paradigm that capitalizes on this dynamical perspective and has been widely utilized in various fields in recent years [5][6][7][8][9][10][11][12]. It is a learning framework for recurrent neural networks (RNNs) whose intermediate layer is referred to as the reservoir. In a reservoir composed of N nodes, the i-th (i = 1, 2, . . . , N) node state x_{i,t} at the t-th time step can be written as follows:

x_{i,t} = f( Σ_{j=1}^{N} w_{ij} x_{j,t−1} + w_{in,i} u_t ),   (1)

where f is the activation function, and w_{ij} and w_{in,i} are the internal and input weights, respectively. To emulate the target output z_t, we use linear regression to obtain an estimate ẑ_t of z_t, as follows:

ẑ_t = ŵ_out^⊤ x_t,   (2)
ŵ_out = argmin_{w_out} Σ_t (z_t − w_out^⊤ x_t)²,   (3)

where w_out and ŵ_out ∈ R^N are the weight vector and its solution for the target output, respectively. This learning method leverages the dynamical resource through training without affecting the state of the reservoir but places a constraint on x_t. To perform reproducible computation, RC requires x_t to produce the same response to identical input time-series (i.e., the state needs to be an echo function, a function of only the past input time-series u^t = {u_{t−s}}_{s=1}^{t}). This dynamical property is referred to as the echo state property (ESP) [13][14][15] or the fading memory property (FMP) [3,16,17], which are slightly different from each other (see the Appendix for further details). According to these properties, various activation functions can be used for the reservoir node.
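As a minimal numerical sketch of the RC scheme above (ours, not the paper's code; all sizes, the spectral scaling of 0.9, and the toy target are illustrative choices), the state update and the least-squares readout can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 1000

# Illustrative weights: internal matrix rescaled to spectral radius 0.9,
# input weights uniform in [-1, 1].
W = rng.uniform(-1, 1, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
w_in = rng.uniform(-1, 1, N)

u = rng.uniform(-1, 1, T)              # input stream u_t
x = np.zeros(N)
X = np.empty((T, N))
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])   # state update x_t = f(W x_{t-1} + w_in u_t)
    X[t] = x

# Linear readout trained by least squares to emulate a target z_t
z = u                                  # toy target: reproduce the current input
w_out, *_ = np.linalg.lstsq(X, z, rcond=None)
z_hat = X @ w_out                      # estimate of the target from the state
```

Training touches only the readout weights, leaving the reservoir dynamics untouched, which is the defining constraint of the RC framework described above.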
Furthermore, as the reservoir is not limited to a computer-generated system, it can be replaced with a real physical system. A reservoir using such a physical system is called a physical reservoir [18]. Some of the above systems have a wide range of dynamics that are not readily found in conventional neural networks.
In the literature, a measure called the information processing capacity (IPC) [19] has been proposed to quantify the information processing capability of dynamical systems that have the ESP or FMP. The IPC measures the type and quantity of input history that is handled and held in the system by decomposing the system state into an orthogonal basis [20][21][22]. Using the N-dimensional state x_t ∈ R^N and the one-dimensional stochastic input ζ_t ∈ R at the t-th (t ∈ Z) time step, the input-driven dynamical system (IDS) determines the next state, as shown below:

x_{t+1} = f(x_t, ζ_{t+1}),   (4)

where f maps R^N × R → R^N. The IPC evaluates the emulation ability of the i-th target output

z^{(i)}_t = Π_s F_{n^{(i)}_s}(ζ_{t−s}),   (5)

where F_n(ζ) represents the n-th-order orthogonal polynomial of ζ. From Eqs. (2) and (3), we obtain an estimate of z_t. When z_t is an orthogonal function of the independent variables {ζ_{t−s}}_{s=1}^{∞}, the IPC is defined using a normalized emulation error of the reservoir, as follows:

C[X, z^{(i)}] = 1 − min_w ||z^{(i)} − Xw||² / ||z^{(i)}||²,   (6)

where w ∈ R^N, X = [x_1 ··· x_T]^⊤ ∈ R^{T×N}, and z^{(i)} = [z^{(i)}_1 ··· z^{(i)}_T]^⊤ ∈ R^T are the weight vector, state, and target output, respectively. In this case, the uniform random variable and the Legendre polynomial are the stochastic variable ζ and orthogonal polynomial F_n(ζ), respectively, although the combination is not restricted. For example, a Gaussian random variable and a Hermite polynomial are also suitable [19]. Therefore, the IPC is a measure used to evaluate the input information held by the state via the emulation ability of the orthogonal basis.
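The normalized-error form of the capacity can be illustrated with a one-node linear reservoir x_t = ρx_{t−1} + ζ_t, for which the first-order capacities are known in closed form, C_s = (1 − ρ²)ρ^{2s}, and sum to one. The following is our own minimal sketch (ρ = 0.5 and the series length are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
T, burn = 20000, 100
zeta = rng.uniform(-1, 1, T + burn)

# A linear one-node reservoir: x_t = rho * x_{t-1} + zeta_t
rho = 0.5
x = np.zeros(T + burn)
for t in range(1, T + burn):
    x[t] = rho * x[t - 1] + zeta[t]
X = x[burn:, None]                       # state matrix after discarding a transient

def capacity(X, z):
    """C = 1 - min_w ||z - Xw||^2 / ||z||^2, the normalized emulation error."""
    w, *_ = np.linalg.lstsq(X, z, rcond=None)
    return 1.0 - np.sum((z - X @ w) ** 2) / np.sum(z ** 2)

# First-order Legendre targets P1(zeta_{t-s}) = zeta_{t-s} for delays s = 0..10
caps = [capacity(X, zeta[burn - s:T + burn - s]) for s in range(11)]
```

Here caps[0] approaches 1 − ρ² = 0.75, and the capacities over all delays sum to one, the total capacity of this rank-one state.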
In this connection, there is a theory about a deterministic dynamical system with a stochastic input ζ_t in a different context. The system can be described as an operator of {ζ_{t−s}}_{s=1}^{∞} according to the polynomial chaos expansion [23], which is a series expansion using the target outputs of the IPC as the bases (i.e., the multivariate orthogonal polynomials of the random variables, z^{(i)}_t, described in Eq. (5)). The polynomial chaos expansion has been frequently utilized to determine the evolution of uncertainty in a dynamical system when there is probabilistic uncertainty in the system parameters [24,25]. These multivariate polynomials are referred to as polynomial chaoses (PCs), and the space spanned by PCs is called homogeneous chaos [23], expressed as

x_t = Σ_i c_i z^{(i)}_t,   (7)

where c_i ∈ R^N is the i-th coefficient vector. If the input ζ_t follows a certain distribution, the PC is determined based on its orthogonality. Let ζ be a vector notation of ζ sampled T times, ζ = {ζ_1, . . . , ζ_T}. If a weighting function w(ζ) specific to the PCs {z^{(i)}_t} exists, the following orthogonality relations should be satisfied:

⟨z^{(i)}_t, z^{(j)}_t⟩ = ⟨z^{(i)}_t, z^{(i)}_t⟩ δ_ij,   (8)
⟨f(ζ), g(ζ)⟩ = Σ_ζ w(ζ) f(ζ) g(ζ),   (9)

where δ_ij is the Kronecker delta function. The PC expansion can use various combinations of inputs and polynomials. The generalized polynomial chaos (gPC) [24] supplies the PC for specific combinations, e.g., the Hermite polynomial for Gaussian distributions, the Legendre polynomial for uniform distributions, and the Charlier polynomial for Poisson distributions are available as F_n(ζ). Furthermore, the arbitrary polynomial chaos (aPC) [25] is the PC for random variables following an arbitrary probability distribution, obtained by the Gram-Schmidt orthogonalization procedure (see the Appendix for both schemes). As previously described, the IPC and PC expansion have a number of similarities and differences.
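The aPC construction can be sketched numerically: Gram-Schmidt orthonormalization of the monomials under the empirical inner product of the sampled input yields an orthonormal polynomial basis for any distribution with finite moments. The following is our own illustration (the two-component Gaussian mixture, sample sizes, and degree are arbitrary choices, chosen because the mixture lies outside the Askey scheme):

```python
import numpy as np
from numpy.polynomial import polynomial as P

def apc_basis(samples, degree):
    """Coefficients (monomial basis, low to high) of polynomials made
    orthonormal under the empirical measure of `samples` via Gram-Schmidt."""
    basis = []
    for k in range(degree + 1):
        c = np.zeros(degree + 1)
        c[k] = 1.0                                   # start from the monomial x^k
        for b in basis:
            proj = np.mean(P.polyval(samples, c) * P.polyval(samples, b))
            c = c - proj * b                         # remove projection on earlier polys
        c = c / np.sqrt(np.mean(P.polyval(samples, c) ** 2))
        basis.append(c)
    return np.array(basis)

def mixture(n, rng):
    """A mixed-Gaussian input, as one example outside the Askey scheme."""
    return np.where(rng.random(n) < 0.5,
                    rng.normal(-2.0, 1.0, n), rng.normal(2.0, 1.0, n))

B = apc_basis(mixture(200000, np.random.default_rng(2)), 3)
```

Evaluating the resulting polynomials on a fresh sample from the same distribution gives a Gram matrix close to the identity, i.e., they form an orthonormal system for that input distribution.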
Both schemes use the IDS with stochastic inputs and multivariate orthogonal polynomials, whereas the types of input distributions and orthogonal polynomials are more limited in the IPC than in the PC expansion. One objective of this paper is to establish a connection between the IPC and the IDS expanded with PCs. Thus, we first aim to reveal this relationship by deriving the IPC from the state expanded with PCs. Second, to enlarge the applicable range of the IPC, we extend this relationship to time-variant systems. So far, the IPC has assumed that the system is a function of only the input time-series. However, a solution of the dynamical system in Eq. (4) is a function of the input time-series and time, with a given initial state. By introducing time-dependent orthogonal bases to PCs, we aim to derive the IPC for time-variant systems and illustrate that input information processing is performed by coupling terms of time and the input time-series. Finally, to demonstrate the potential of our approach, we apply our theory to three cases, a model that is frequently used as a benchmark task in the context of temporal machine learning, an artificial neural network, and a real, cultured neural network, to reveal the information processing within these systems.

A. Singular value decomposition
Singular value decomposition (SVD) breaks down the state matrix as

X = P Σ Q^⊤,

where P ∈ R^{T×r} and Q ∈ R^{N×r} are matrices whose column vectors are the left and right singular vectors, respectively, Σ = diag{σ_1, . . . , σ_r} ∈ R^{r×r} is a diagonal matrix containing the singular values σ_i (i = 1, . . . , r), and r(≤ N < T) is the rank of X.
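As a minimal numerical sketch (ours; the synthetic rank-5 matrix below is illustrative), the decomposition and rank extraction can be done with numpy's economy SVD:

```python
import numpy as np

rng = np.random.default_rng(3)
T, N, r = 500, 20, 5

# A synthetic rank-5 state matrix: T time steps of N node states
X = rng.standard_normal((T, r)) @ rng.standard_normal((r, N))

# Economy SVD: X = P @ diag(sigma) @ Q.T with orthonormal columns in P and Q
P, sigma, Qt = np.linalg.svd(X, full_matrices=False)
rank = int(np.sum(sigma > 1e-10 * sigma[0]))   # numerical rank of X
```

The columns of P corresponding to nonzero singular values are the r linearly independent temporal vectors p_j used in the derivation of Sec. A of the results.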

B. ESN
Let the i-th (i = 1, 2, . . . , N) state of the ESN at the t-th step be x_{i,t}. The state equation is given as follows:

x_{i,t+1} = f( ρ Σ_{j=1}^{N} w_{ij} x_{j,t} + ι ω_i u_t ),

where w_{ij} was initialized with uniform random numbers in the range [−1, 1] and multiplied by a constant so that the maximum eigenvalue of the matrix W = [w_{ij}] was 1. The input weight ω_i was also set to a uniform random number in [−1, 1], and ρ, ι(= 0.1), and N(= 50) represent the spectral radius of ρW, the input intensity, and the number of nodes, respectively. f denotes the activation function, three types of which were used to solve the NARMA10 task [4,26]: a linear function, a hyperbolic tangent function, and an analog integrator function, for which the time constant τ was set to 1.25.

C. One-dimensional ESN
The one-dimensional ESN is described as

x_{t+1} = tanh(ρ x_t + σ u_t),  u_t = ζ_t + µ,

where x_t, u_t, and ζ_t are the state, input, and random variable at the t-th step, respectively, and ρ and σ are the inner weight and input intensity, respectively. To keep the state bounded and time-invariant, we chose ρ = 0.95 and µ = 0.
D. The NARMA10 model

Let the state and input at the t-th step be y_t and u_t, respectively. The NARMA10 model is given by

y_{t+1} = α y_t + β y_t Σ_{s=0}^{9} y_{t−s} + γ u_{t−9} u_t + δ,   (17)
u_t = µ + κ ζ_t,   (18)

where the default constant parameters (α, β, γ, δ) are set to (0.3, 0.05, 1.5, 0.1), ζ_t is the random variable at the t-th step and follows a uniform distribution in the [−1, 1] interval, while µ and κ are the mean of the input u_t and the input intensity parameter, respectively. This paper uses two ranges: u_t ∈ [0, σ] (µ = κ = σ/2) and u_t ∈ [−σ, σ] (µ = 0, κ = σ). All the initial values of y_t (t = 0, 1, . . . , 9) were set to zero, except when the basin of attraction was examined.
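The recursion with these constants is the standard NARMA10 benchmark form, and a minimal sketch of the generator (ours; the input range σ = 0.2 below is an arbitrary non-divergent choice) is:

```python
import numpy as np

def narma10(u, alpha=0.3, beta=0.05, gamma=1.5, delta=0.1):
    """y_{t+1} = alpha*y_t + beta*y_t*sum_{s=0}^{9} y_{t-s}
                 + gamma*u_{t-9}*u_t + delta  (standard NARMA10 form)."""
    y = np.zeros(len(u))
    for t in range(9, len(u) - 1):
        y[t + 1] = (alpha * y[t] + beta * y[t] * np.sum(y[t - 9:t + 1])
                    + gamma * u[t - 9] * u[t] + delta)
    return y

rng = np.random.default_rng(4)
zeta = rng.uniform(-1, 1, 2000)
u = 0.1 + 0.1 * zeta        # asymmetric input u_t in [0, 0.2]: mu = kappa = sigma/2
y = narma10(u)              # bounded trajectory for this small input intensity
```

For larger σ the quadratic feedback term β y_t Σ y_{t−s} can drive divergence, which is why the probability of non-divergence is examined as a function of σ in the results.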
E. The limit cycle system

The simple limit cycle system with radius r and azimuth θ in polar coordinates [48] is discretized as follows:

r_{t+1} = r_t + τ ( r_t (1 − r_t²) + u_t ),   (19)
θ_{t+1} = θ_t + ω τ,   (20)
u_t = µ + σ ζ_t,   (21)

where ω = 2π/3 and τ = 0.1 are the angular velocity and time-step width, respectively, and ζ_t is a uniform random number in the [−1, 1] interval. Therefore, the input u_t follows the uniform distribution in the [µ−σ, µ+σ] range (µ = 0.2 and σ = 1.5) and is applied in the radial direction. The Cartesian coordinates are given by x_t = r_t cos θ_t and y_t = r_t sin θ_t.
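A minimal simulation sketch (ours, assuming an Euler discretization of the normal form ṙ = r(1 − r²), θ̇ = ω with the noise applied radially; initial values and step counts are arbitrary):

```python
import numpy as np

omega, tau, steps = 2 * np.pi / 3, 0.1, 3000
rng = np.random.default_rng(5)

def simulate(mu, sigma):
    """Euler steps of dr/dt = r(1 - r^2) + u(t), dtheta/dt = omega."""
    r, theta = 0.5, 0.0
    xs = np.empty(steps)
    for t in range(steps):
        u = mu + sigma * rng.uniform(-1.0, 1.0)   # radial input u_t
        r = r + tau * (r * (1.0 - r ** 2) + u)
        theta = theta + omega * tau               # constant angular velocity
        xs[t] = r * np.cos(theta)                 # Cartesian x_t
    return r, xs

r_free, _ = simulate(0.0, 0.0)            # without input the radius settles on 1
r_driven, xs_driven = simulate(0.2, 1.5)  # driven case fluctuates around the cycle
```

Without input the trajectory converges to the unit circle; with input it fluctuates around that periodic attractor, which is the time-variant situation analyzed with the TIPC in the results.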

A. The equivalence of the IPC and coefficient in PC expansion
To show the relationship between the IPC and PC expansion, we derive the IPC from the state expanded in terms of PCs. First, we transform the IPC into a simpler form using SVD, which reduces the N state time-series to r(≤ N) linearly independent time-series vectors p_j ∈ R^T (j = 1, . . . , r). Using the decomposed state, we can rewrite the IPC relative to z^{(i)} as

C[X, z^{(i)}] = Σ_{j=1}^{r} (φ^{(i)⊤} p_j)²,   (22)

where φ^{(i)} = z^{(i)}/||z^{(i)}|| is the normalized output. Next, assuming that x_t can be expanded with PCs, the state X is described as

X = Φ Ĉ^⊤,   (23)

where Φ = [φ^{(1)} φ^{(2)} ···] and Ĉ = [ĉ_1 ĉ_2 ···] are the basis matrix and coefficient matrix, respectively. Comparing the decomposed state X = PΣQ^⊤ (see Methods) and Eq. (23), we obtain the matrix

P = Φ Λ^⊤,   (24)

where P = [p_1 p_2 ··· p_r] and Λ = Σ^{−1} Q^⊤ Ĉ = [λ_1 λ_2 ···] are a matrix form of linearly independent vectors and a constant matrix, respectively, while λ_i ∈ R^r. From Eqs. (22) and (24), the IPC becomes

C[X, z^{(i)}] = ||λ_i||².   (25)

Because P can also be written as Eq. (24), Eq. (25) illustrates that computing the i-th IPC is equivalent to expanding the temporal bases p_j with the PCs and calculating the squared norm of the coefficients of the i-th PC.
In addition, the IPC has an important summation property. The total capacity is described as

C_tot = Σ_i C[X, z^{(i)}] = Σ_{j=1}^{r} ||Φ^⊤ p_j||².   (26)

Eq. (26) yields the sum of the squared norms of p_j (j = 1, . . . , r) projected into an infinite-dimensional space that contains the orthogonal vectors φ^{(i)} (i = 1, 2, . . .); therefore, if all of the p_j are functions of only the past input history, ||Φ^⊤ p_j|| becomes one, resulting in

C_tot = r.   (27)

We call this condition the integrity property. Under the assumption that the state is a function of only the past input history, IPCs hold the integrity property in information processing. These results provide a new perspective: the coefficients of the PC expansion represent the amount of processed input.
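The summation property can be checked numerically. In the following minimal sketch (ours; the two leak rates and delay cutoff are arbitrary), a two-node linear reservoir has a rank-2 state matrix, and the first-order capacities summed over delays recover C_tot = r = 2:

```python
import numpy as np

rng = np.random.default_rng(6)
T, burn = 20000, 100
zeta = rng.uniform(-1, 1, T + burn)

# Two linear nodes with different decay rates: the state matrix has rank 2
x = np.zeros((T + burn, 2))
for t in range(1, T + burn):
    x[t, 0] = 0.5 * x[t - 1, 0] + zeta[t]
    x[t, 1] = -0.2 * x[t - 1, 1] + zeta[t]
X = x[burn:]

def capacity(X, z):
    """C = 1 - min_w ||z - Xw||^2 / ||z||^2."""
    w, *_ = np.linalg.lstsq(X, z, rcond=None)
    return 1.0 - np.sum((z - X @ w) ** 2) / np.sum(z ** 2)

# Summing first-order capacities C[P1(zeta_{t-s})] over delays recovers rank 2
c_tot = sum(capacity(X, zeta[burn - s:T + burn - s]) for s in range(26))
```

Both nodes are functions of only the past input history, so the integrity property holds and the truncated sum approaches the rank.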

B. Demonstration of comprehensive computational capabilities using general input distribution
To illustrate that various orthogonal polynomials can be used as target outputs for the IPC, we examine the total capacities for eight target gPCs and four target aPCs using one-dimensional ESNs, which are time-invariant systems. FIG. 1(a) shows the IPC breakdowns with eight types of gPCs in the Askey scheme. The utilized gPCs are the Hermite-chaos of a Gaussian distribution, Laguerre-chaos of a gamma distribution, Jacobi-chaos of a beta distribution, Legendre-chaos of a uniform distribution, Charlier-chaos of a Poisson distribution, Krawtchouk-chaos of a binomial distribution, Meixner-chaos of a negative binomial distribution, and Hahn-chaos of a hypergeometric distribution. The total IPCs are all one, suggesting that the gPCs form a complete orthogonal system for each combination of distribution and orthogonal polynomial and can be used for the IPC.
Furthermore, to demonstrate that aPCs can also be used as the target outputs for the IPC, we estimated the IPCs with the ESN given four types of inputs following a mixed Gaussian distribution, Pareto distribution, Zipf distribution, and Bernoulli distribution, which do not follow the Askey scheme. To investigate the IPC of the ESN, we built Gram-Schmidt-chaoses. As shown in FIG. 1(a), the total IPCs were one, indicating that the aPCs formed a complete orthogonal system with the input distribution and Gram-Schmidt-chaoses and can be used for the IPC.
These results suggest that both the gPC and aPC are suitable as the target output of the IPC.

C. Extending the integrity in information processing to a time-variant domain
The information processing integrity shown so far holds only if the system is a function of the past input series. Using the derived relationship between the IPC and PC expansion, we extend the application range of the IPC to time-varying IDSs. First, we introduce a classification of IDSs based on the conditions imposed on the solution of Eq. (4). From the connection between the IPC and PC, the IPC is calculated by extracting the second-order process x_t in the 1 ≤ t ≤ T range from the original sequence, which is obtained according to Eq. (4). By recursively applying Eq. (4) from t = 0, the solution is clearly determined by the time t, the input sequence ζ^t = {ζ_{t−s}}_{s=1}^{t}, and the initial state x_0, as follows:

x_t = ξ(t, ζ^t, x_0),

where ξ is determined by f in Eq. (4). However, in the PC expansion the state is described as an operator of only {ζ_{t−s}}, showing that the PC expansion assumes that x_t is time-invariant. Therefore, two conditions are imposed on the time-series. First, every i-th (i = 1, . . . , N) state time-series x_{i,t} (t = 1, . . . , T) must be a second-order process; the specific condition is that the second moment of x_{i,t} must be finite, E[x_{i,t}²] < ∞, according to the Cameron-Martin theorem. Second, the solution ξ should be time-invariant: the extracted time-series needs to be described only with the input time-series ζ^t.

FIG. 1. Demonstration of theories. (a) IPC breakdowns of the one-dimensional ESN with eight types of target generalized/arbitrary PCs of a random variable ζ_t: Hermite-chaos with Gaussian random variables, Laguerre-chaos with gamma random variables, Jacobi-chaos with beta random variables, Legendre-chaos with uniform random variables, Charlier-chaos with Poisson random variables, Krawtchouk-chaos with binomial random variables, Meixner-chaos with negative binomial random variables, Hahn-chaos with hypergeometric random variables, and Gram-Schmidt-chaos with mixed-Gaussian, Pareto, Zipf, and Bernoulli random variables. The n-th-order capacities (n = 1, . . . , 8) in the total capacity C_tot are summarized. The variable p represents the probability distribution of ζ_t. (b) The TIPC of the limit cycle system in Eqs. (19)-(21). Upper panel: the phase plane of x_t and y_t; the system without input (µ = σ = 0; red line) and with input (µ = 0.2, σ = 1.5; black line) are shown. Middle panel: the first-order capacities of P1(ζ_{t−s1}) cos(ωτt + α_{1,s1}) with the delay step s1. Lower panel: the second-order capacities relative to the delay steps s1 and s2; the diagonal (s1 = s2) and upper-left-triangle (s1 < s2) dots show the capacities of P2(ζ_{t−s1}) cos(ωτt + α_{2,s1}) and P1(ζ_{t−s1})P1(ζ_{t−s2}) cos(ωτt + α_{2,s1,s2}), respectively. In the white area, the TIPC calculation is omitted.
Therefore, the PC expansion assumes the non-divergence and time-invariance of the system.
In contrast, if x_t is time-variant, the homogeneous chaos is no longer a complete orthogonal system, and x_t is represented by the time-dependent polynomial chaos (TDPC) expansion

x_t = Σ_i c_i φ^{(i)}_t ψ^{(i)}_t,

where c_i ∈ R^N, φ^{(i)}_t, and ψ^{(i)}_t are the i-th coefficient vector depending on x_0, the polynomial chaos, and the time-dependent basis, respectively. We can construct a complete orthogonal system from t and {ζ_{t−s}}_{s=1}^{∞} by adding bases of t that are orthogonal to the PCs of {ζ_{t−s}}. In the following discussion, we assume that x_0 is given and fixed. By converting the summation over the input time-series ζ^t into a summation over time t, Eq. (8) can be rewritten as follows:

⟨f_t, g_t⟩ = (1/T) Σ_{t=1}^{T} f_t g_t,   (32)

which can be satisfied by the PCs. Next, we define the time-dependent basis that satisfies this inner product. For example, {1, cos ωt, sin ωt, cos 2ωt, sin 2ωt, . . .} (ω = 2π/T) is used for the Fourier series expansion and is a complete orthogonal basis of time. Since φ^{(i)}_t and ψ^{(i)}_t are uncorrelated with each other, we expect that the following orthogonality relation will be satisfied for a sufficiently long period T:

⟨φ^{(i)}_t ψ^{(i)}_t, φ^{(j)}_t ψ^{(j)}_t⟩ = ⟨(φ^{(i)}_t ψ^{(i)}_t)²⟩ δ_ij.   (33)

In this paper, we call the time- and input-time-series-dependent basis φ^{(i)}_t ψ^{(i)}_t the TDPC and define the space spanned by the time-dependent polynomial chaoses as the time-dependent homogeneous chaos (TDHC).
The IPCs of x_t in the TDHC can be estimated by replacing the target output φ^{(i)} with the TDPC φ^{(i)}_t ψ^{(i)}_t. We define the IPC with the target TDPCs as the temporal information processing capacity (TIPC). As with the IPC of a time-invariant system, the TIPC of a time-variant system is equivalent to expanding the state with TDPCs and calculating the squared norm of the coefficient of the i-th TDPC of the temporal bases p_j (j = 1, . . . , r) expanded by the TDPCs. Note that, to remove the terms that do not include the input, the time average of x_t and the purely time-varying terms are subtracted; for example, A_n cos(2πf_n t + θ_n) (n = 1, 2, . . .), where the amplitude A_n, the frequency f_n, and the phase θ_n are estimated by the Fourier transform. Since TDPCs constitute a complete orthogonal system, the norm of each of the r time-series vectors in the space is one, and the total TIPC retains r. This extension can reveal aspects of the information processing performed by time-variant systems.
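The approximate orthogonality of such products can be verified numerically. In the following minimal sketch (ours; the choice of delays, degrees, and a single Fourier frequency is arbitrary), products of Legendre PCs of a uniform input and Fourier bases of time are pairwise orthogonal under the time-average inner product up to sampling fluctuations:

```python
import numpy as np

T = 10000
rng = np.random.default_rng(7)
zeta = rng.uniform(-1, 1, T + 1)     # zeta[1:] plays zeta_t, zeta[:-1] plays zeta_{t-1}
t = np.arange(T)
Omega = 2 * np.pi / T

P1 = lambda z: z                      # Legendre P1
P2 = lambda z: 0.5 * (3 * z ** 2 - 1)  # Legendre P2

# A few TDPCs: products of input PCs and Fourier bases of time
basis = [
    P1(zeta[1:]),                      # pure PC, delay 0
    P1(zeta[:-1]),                     # pure PC, delay 1
    P2(zeta[1:]),                      # second-order PC
    P1(zeta[1:]) * np.cos(Omega * t),  # input-time coupling terms
    P1(zeta[1:]) * np.sin(Omega * t),
    np.cos(Omega * t),                 # purely temporal basis
]
# Empirical time-average inner products <f, g> = (1/T) sum_t f_t g_t
G = np.array([[np.mean(f * g) for g in basis] for f in basis])
```

The off-diagonal entries of G vanish as T grows, so normalized versions of these products can serve as mutually orthogonal TIPC targets.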

D. Demonstration of complete computational capabilities in a time-variant system
To demonstrate the extension of the IPC, we show the TIPC of a two-dimensional limit cycle system. As shown in FIG. 1(b), input following a uniform distribution forces the system to fluctuate around the non-input trajectory. Using this time-variant system, we calculated the TIPCs, whose TDPCs were constructed from the PCs {φ^{(i)}_t} with degrees n_s (Σ_s n_s < 5) and delays s < 10, and the temporal basis ψ_t ∈ {cos Ωt, sin Ωt, cos 2Ωt, sin 2Ωt, . . . , cos (T/2)Ωt, sin (T/2)Ωt}, where Ω = 2π/T. From the estimated TIPCs, the state on the two-dimensional plane x_t = [x_t y_t]^⊤ can be expressed in the following form:

x_t = Σ_{s1} p P1(ζ_{t−s1}) cos(ωτt + α_{1,s1}) + Σ_{s1} q P2(ζ_{t−s1}) cos(ωτt + α_{2,s1}) + Σ_{s1<s2} q P1(ζ_{t−s1}) P1(ζ_{t−s2}) cos(ωτt + α_{2,s1,s2}) + ···,   (35)

where P_n(ζ) represents the n-th-order Legendre polynomial, and the phases α_1 and α_2 depend on the initial values. Note that although TIPCs in general depend on the initial values of the system, the final outcome of the TIPCs is the same in this case because the choice of initial values does not affect the coefficient vectors of Eq. (35). The coefficient vectors for the first- and second-order terms are p and q ∈ R², respectively, indicating that the TIPCs of the system are composed of capacities of products of the time-varying basis vectors ψ_t ∈ {cos ωτt, sin ωτt} and the PCs {P_n(ζ_{t−s})}.
In addition, in the case of using the conventional IPC, the total IPC saturates at the rank of the state matrix, r, only in systems with a negative maximum Lyapunov exponent [19]. In the limit cycle system we adopted, the Lyapunov exponents of Eqs. (19)-(21) are zero in the azimuthal direction and negative in the radial direction; therefore, the maximum Lyapunov exponent is zero. Furthermore, the system does not satisfy the ESP because it is time-variant, and the phases α_1 and α_2 depend on the initial values. Although the system, whose rank is two, does not satisfy these conventional conditions, the total TIPC saturates at C_tot = 1.987 (1st order, C_tot = 1.952; 2nd order, C_tot = 3.45 × 10⁻²), as shown in FIG. 1(b).
Therefore, these results suggest that the information processing missed by the conventional IPC can be measured by adding time-varying bases, and that the total TIPC can reach the rank even if the maximum Lyapunov exponent is not negative or the system state depends on the initial values.

E. Application #1: the benchmark task
To illustrate the usefulness of our theory, we demonstrate the information processing capabilities of three systems. First, we analyze the computational capabilities required to emulate a simple model for a time-series benchmark test, the well-known NARMA10 model. The NARMA10 task is widely utilized to evaluate the computational capabilities of dynamical systems, but what predicting this model actually means has remained unclear. We classified the model with certain parameter regions to be time-invariant (see the Appendix). In FIG. 2, p expresses the probability of not diverging as a function of σ for different random series {ζ_t}, showing that the model is stable for certain parameters. Using the non-divergent model, we estimated its IPCs for the target Legendre-chaoses with delayed time steps s(< 16) and degrees n_s (Σ_s n_s < 9).
Since the NARMA10 model is a one-dimensional system, the total capacity is one, but the breakdown of the IPC changes with the parameters. FIG. 2(a) and (b) show that the capacities using a uniform random variable in the asymmetric range u_t ∈ [0, σ] differ significantly from those using input in the symmetric range u_t ∈ [−σ, σ]. The capacity breakdown with the symmetric input includes only second-order capacities (Σ_s n_s = 2). In contrast, the capacities with the asymmetric input contain first-order ones because the input term in Eq. (17) emerges as u_{t−9} u_t = (σ²/4)(P1(ζ_t)P1(ζ_{t−9}) + P1(ζ_t) + P1(ζ_{t−9}) + 1), which includes first-order terms of ζ. Hence, in the case of an asymmetric input, we can regard the model as a system receiving first-order inputs, which are retained for a few steps. These results suggest that the input should be chosen according to the dynamical system when one uses the NARMA10 task; for example, as the nodes of an ESN are represented by an odd function, such as a hyperbolic tangent, the ESN has only odd capacities [19]. From this property and our results, an ESN with u_t ∈ [0, σ] emulates the NARMA10 model, whereas one with u_t ∈ [−σ, σ] does not predict it at all.

F. Application #2: the machine learning network

Second, we analyzed the performance of a machine learning network that solved the benchmark task, using the ESN as an example. As shown in FIG. 3(a), we emulated the target NARMA10 model using 50-node ESNs with three activation functions (linear, hyperbolic tangent, and analog integral functions) and compared the output of the ESN, ŷ_t, and the target y_t with the normalized root-mean-square error (NRMSE). For all the functions, the NRMSE decreased as the spectral radius ρ increased and increased when ρ ≥ 1. To analyze the outputs of the ESNs trained on the NARMA10 model, the IPCs of the output ŷ_t were estimated. FIG. 3 shows the change in the IPC breakdown of the output ŷ_t with the increase in the spectral radius ρ of the ESN with the linear, hyperbolic tangent, and analog integral functions, respectively. To emulate the NARMA10 model, whose IPC breakdown is shown in FIG. 3(e), nine types of Legendre-chaoses, {P1(ζ_{t−s})}_{s=1,2,3,10,11,12} and {P2(ζ_{t−s})}_{s=1,2,3}, are mainly required in a certain ratio. According to FIG. 3(b)-(d), for any activation function, as ρ(< 1) increases, {P1(ζ_{t−s})}_{s=1,2,3,10,11,12} approaches the required rate. However, the three types of second-order IPCs, {P1(ζ_{t−s})P1(ζ_{t−s−9})}_{s=1,2,3}, are almost zero in ESNs with any activation function. Therefore, in the NARMA10 task with ESNs, performance is compared using only the first-order IPCs.
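The Legendre decomposition of the asymmetric input term used in the analysis above can be verified directly: with u_t = (σ/2)(1 + ζ_t) and P1(ζ) = ζ, the product u_{t−9}u_t expands exactly into the stated combination. A quick numerical check (our sketch; σ and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
sigma = 0.2
z_t, z_t9 = rng.uniform(-1, 1, 1000), rng.uniform(-1, 1, 1000)  # zeta_t, zeta_{t-9}

# Asymmetric input u in [0, sigma]: u = (sigma/2)(1 + zeta)
u_t = sigma / 2 * (1 + z_t)
u_t9 = sigma / 2 * (1 + z_t9)

lhs = u_t9 * u_t
rhs = sigma ** 2 / 4 * (z_t * z_t9 + z_t + z_t9 + 1)  # P1(z) = z
```

The two sides agree identically, confirming that the asymmetric input injects first-order Legendre terms into the NARMA10 dynamics, whereas the symmetric input (µ = 0) leaves only the second-order product term.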
To investigate why the second-order IPCs did not appear in the breakdown, we estimated the IPCs from the ESN states. FIG. 3(f)-(h) shows the change in the IPC breakdown with the increase in the spectral radius ρ of the ESN with the linear, hyperbolic tangent, and analog integral functions, respectively. The rank r of the ESN state increases with ρ(≤ 1) and corresponds to C_tot. As these three breakdowns have small capacities for the required second-order targets (each capacity is less than 0.2), the ESNs cannot emulate the second-order Legendre-chaoses.
The above results demonstrate that our method is capable of decomposing the computational capability of the machine learning network before and after training, as well as clarifying whether the computational components required for the task have been extracted through learning and exist in the original network.
G. Application #3: the real neural network

Finally, to show the broad applicability of our theory, we prepared a dissociated culture of neurons as a physical reservoir, which is an open non-equilibrium system that fluctuates due to external inputs and has parameters that can be considered time-dependent. Real neurons extracted from the cortices of rat embryos were pharmacologically and mechanically isolated and then seeded on an electrode array. After the culture matured (FIG. 4[a]), we constructed a physical reservoir using electrodes with active neurons (FIG. 4[b]; see the Appendix). We repeatedly applied bipolar waves with a 10, 20, or 30 ms interpulse interval (IPI) to 29 stimulus electrodes, whose amplitude ζ_t follows a Gaussian distribution with mean µ = 200, 300, or 400 mV and standard deviation σ = 50 mV (FIG. 4[c]). Furthermore, we computed the number of spikes in an IPI-width bin from N = 792 measurement electrodes as the reservoir states (FIG. 4[d]). As a result, we obtained a long single trajectory of the activation of the electrodes according to the input stream, which consisted of 20,000 time steps and was used for our TIPC analysis.
Using these data, we computed the TIPCs of the physical system. FIG. 4(a) shows the first-order TIPCs against the delay step s, which contain the memory function (MF) [49,50] and the four temporal memory functions (TMFs), which have the time-varying targets z_t = P1(ζ_{t−s}) cos(nωt) or z_t = P1(ζ_{t−s}) sin(nωt) (n = 1, 2). Note that nω = 2πn/T (n = 1, . . . , T/2) denotes the frequency. The TMFs monotonically decay with an increase in s, as does the MF. Next, to investigate the frequency characteristics of the TIPC, we plotted the TIPCs with s = 1. FIG. 4(b) and (c) show the relationship between the frequency and the TIPCs with the time-varying cosine and sine targets, respectively. We term such a graph the TIPC spectrum. Both spectra have larger TIPCs at lower frequencies. Therefore, the input was processed by coupling terms of low-frequency sinusoidal waves and the past input, suggesting that the superposition of these waves represents a gradual trend, which may be caused, for example, by synaptic plasticity and neural adaptation; thus, information processing was also embedded in the non-stationary changes.
FIG. 4(d)-(f) illustrates the total capacities C tot with different µ and IPI. Every C tot contained time-varying IPCs, and as the IPI decreased, the ratio of time-varying IPC to C tot increased. As with the time-invariant case, the time-variant IPC increased as the degree decreased. Therefore, the total capacity of the dissociated culture of neurons contained time-varying IPCs in all cases. This type of information processing could not be elucidated by the conventional IPC, suggesting that our proposed measure is effective.

A. The relationship between an attractor and information processing
In the present paper, random variables were given as input to the time-invariant systems, whose states were represented by echo functions, which depend not on the time t but only on the past input time-series. In the case where the system with no input converges to a fixed point (e.g., the intersection point of the NARMA10 system), this function illustrates that the system stays at a fixed-point attractor and fluctuates around the fixed point due to the input. The conventional IPC targets time-invariant systems and quantifies the input processing of a state that depends only on the input time-series. Therefore, the conventional IPC sometimes represents the input information processing performed around a certain fixed-point attractor.
In addition, the IPC is the squared norm of the coefficient vector of the temporal basis vectors expanded with PCs. Since these coefficient vectors obviously change depending on the fixed point, different types of information processing are performed at different fixed points. However, the IPC extended for time-variant systems was applied to the limit cycle system, which does not satisfy the prerequisites for RC. From its TIPC estimates, the expanded solution contains coupling terms of the input and the time-varying terms, P1(ζ_{t−s1}) cos(ωτt + α_{1,s1}), P2(ζ_{t−s1}) cos(ωτt + α_{2,s1}), and P1(ζ_{t−s1})P1(ζ_{t−s2}) cos(ωτt + α_{2,s1,s2}), showing that the processed input represents the amplitude scale of the sinusoids. Therefore, the fluctuation of a periodic attractor determined by the input can be processed around the limit cycle.
Based on these findings, we conclude that the conventional IPC can evaluate computational capabilities around a fixed point, while the TIPC can also evaluate capabilities around a periodic attractor. Since the recent RC framework exploits not only fixed-points or periodic attractors but also chaotic ones [51][52][53][54], future work should examine the relationship between various attractors and information processing.

B. Methods to utilize a time-variant system as a computational resource
In demonstrating the TIPCs for a limit cycle system, we showed that the state of the system can include coupling terms of the past input time-series. Since a coupling term is represented by the product of the time-varying basis ψ_t and the PC φ_t, the conventional IPC for time-invariant terms could not quantify this amount of information. The TIPC indicates that, even in a time-variant system, information can be processed by the PC φ_t in the coupling term. In addition, to date, RC has trained a static readout weight by linear regression, assuming that the ESP or FMP is satisfied, that is, that the state in the reservoir is a function of the input time-series. Since the target output is described by the input history, performance decreases when a time-dependent reservoir (e.g., an ESN with a spectral radius of ρ ≥ 1) is used: as the number of input-history terms in the expanded state decreases, the performance drops, while the number of coupling terms increases. The input information in a coupling term can be used for the task and can be extracted by giving a readout weight that cancels out ψ_t in the coupling term (e.g., a time-varying weight). Therefore, even in a reservoir where the ESP or FMP does not hold, the input time-series may be processed by TDPCs, and the task can be successfully solved by using new types of readout.
In this connection, a method already exists for exploiting periodic systems as computational resources. As shown in FIG. 5, the time-multiplexing technique [5] switches the input u t with the time width τ (FIG. 5[a]) and extracts x(t) = [x(t + τ ), x(t + 2τ ), . . . , x(t + (N − 1)τ )] to virtually increase the number of nodes in the reservoir and improve its computational capabilities. If x(t) is a periodic function and oscillates with a period specific to u(t) (FIG. 5[b]), applying an input with the same width τ as the period can extract time-invariant virtual nodes because the i th virtual node always corresponds to x(t) at a certain phase and is not affected by the periodic fluctuation. However, if the period does not match τ (FIG. 5[c]), the phase shifts, and the scheme cannot exploit the computational capabilities of the periodic system. Therefore, time-multiplexing can be interpreted as a method that extracts a computational resource by transforming a time-varying system into a time-invariant one. Thus, a time-variant system can process input information through coupling terms, and rich input information can be extracted from time-variant systems by designing readouts for them. In the future, information processing using a time-varying reservoir and novel design methods for readouts should be investigated.
[Displaced caption fragment, apparently from FIG. 4: two frames show the first-order TIPC spectra with the cosine target ζ t = P 1 (ζ t−1 ) cos(nωt) and the sine target ζ t = P 1 (ζ t−1 ) sin(nωt), respectively; frames (h), (i), and (j) depict the total capacity of the culture with IPI = 10, 20, and 30 ms, respectively, with the mean amplitude µ on the horizontal axis; the hatched bars represent the total capacity with time-varying n th -order polynomials.]
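A minimal sketch of the time-multiplexing bookkeeping (the slot width θ and node count below are illustrative parameters, not values from FIG. 5):

```python
import numpy as np

def virtual_nodes(x, theta, n_virtual):
    """Reshape a sampled trajectory into virtual-node states: each
    input symbol occupies n_virtual * theta time steps, and virtual
    node i is the trajectory value at the end of its theta-wide slot."""
    period = theta * n_virtual
    n_inputs = len(x) // period
    frames = x[: n_inputs * period].reshape(n_inputs, n_virtual, theta)
    return frames[:, :, -1]  # shape: (n_inputs, n_virtual)
```

Each row is the virtual reservoir state for one input symbol. If the intrinsic period of x(t) equals the holding time, each column samples the trajectory at a fixed phase, which is exactly the time-invariance argument made above; a period mismatch makes the sampled phase drift across rows.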

C. Extension to TIPC with Multiple Input Variables
In the present paper, the TIPC was limited to one type of input, but it can easily be extended to a multiple-input version. Let M independent stochastic inputs following arbitrary distributions be ζ (1) t , . . . , ζ (M) t ; the state of a time-invariant system driven by them can be expanded by aPCs with multiple input variables [55]. As all of these aPCs are orthogonal, and their orthogonality is defined by the same inner product as Eqs. (32) and (33), which is also common to TD-PCs, it is clear that we can expand the state of a time-variant system using the TD-PCs and define its TIPCs. Physical systems often receive various inputs from the external environment, resulting in such multiple-input-driven systems. For example, in the dissociated culture of neurons, the ranks of the state are 724-792, but the total capacities are less than 3.5 (FIG. 4[c]), which is much smaller than the ranks. One possible explanation is that, in our scheme, the state may be a function of multiple stochastic inputs beyond the one we applied (e.g., electrical noise, synaptic noise, thermal noise, and shot noise [56]). Of all these inputs, the ones we can observe are limited, and the total capacity computed only from the observed inputs did not reach the rank. Therefore, physical systems can receive unobservable inputs, which hinder the examination of all capacities.
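The construction of multivariate aPC targets can be sketched numerically: under the empirical inner product ⟨f, g⟩ = E[f g], Gram-Schmidt orthogonalization of a monomial dictionary in two independent inputs (an exponential and a uniform distribution, chosen arbitrarily here for illustration) yields orthonormal polynomials suitable as TIPC targets:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200_000
# Two independent inputs with different, arbitrary distributions.
z1 = rng.exponential(1.0, T)
z2 = rng.uniform(-1.0, 1.0, T)

# Monomial dictionary in the two variables, up to total degree 2.
raw = np.stack([np.ones(T), z1, z2, z1 * z2, z1**2, z2**2], axis=1)

# Gram-Schmidt under the empirical inner product <f, g> = E[f g].
basis = []
for col in raw.T:
    v = col.copy()
    for b in basis:
        v -= np.mean(v * b) * b
    basis.append(v / np.sqrt(np.mean(v * v)))
Q = np.stack(basis, axis=1)

# The empirical Gram matrix is the identity: the multivariate
# polynomials are orthonormal under the input distributions.
G = (Q.T @ Q) / T
print(np.round(G, 2))
```

Because the orthogonalization is performed against the same sample average that defines the inner product, the Gram matrix is the identity up to floating-point error; with analytically known moments, the same construction gives the closed-form multivariate aPCs of [55].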

V. CONCLUSION
This paper attempted to clarify the previously unknown relationship between the PC expansion and the IPC by deriving the IPC from the PC-expanded system. To illustrate this relation, we showed that the IPC can be measured using gPCs and aPCs. In addition, using the NARMA10 model, we concretely described the relationship and showed the usefulness of our theory. Next, taking into account the characteristics of the general solution of an input-driven dynamical system, we proposed the IPC for time-variant systems, called the TIPC. To demonstrate that time-variant systems have such TIPCs, we investigated the TIPC breakdown of a limit cycle system. The primary results are summarized as follows:
• Using SVD, we can obtain the orthogonal temporal basis vectors from the state time-series. These vectors can be expanded with PCs to obtain the coefficient vector of each basis. The IPC is equivalent to the squared norm of the coefficient of the PC used as the target output. Therefore, the expanded basis coefficients represent the amount of processed input information.
• Using eight types of polynomials in the Askey scheme and Gram-Schmidt PCs, we estimated the IPCs of a one-dimensional ESN, whose total IPCs were equal to one. These results indicate that various types of PCs within the Askey scheme and Gram-Schmidt orthogonalization are suitable for the target output of the IPC.
• We calculated the TIPC of a time-variant system using the simple limit cycle and revealed that the input information processing is performed by timeand input time-series-dependent terms.
• IPC analysis revealed that the NARMA10 model is mainly composed of P 1 (ζ t−s ) (s = 1, 2, 3, 10, 11, 12) and P 1 (ζ t−s )P 1 (ζ t−s−9 ) (s = 1, 2, 3). The NARMA10 benchmark task can be solved by holding the nine types of input information in a reservoir. Consequently, combining the IPC and PC expansion provides a clear and concise picture of information processing.
• The dissociated culture of neurons had not only the time-invariant IPC but also the time-variant IPC, suggesting that the TIPC reveals information processing in trends, such as synaptic plasticity and neural adaptation, that has been left behind to date.
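As a numerical illustration of the first two points, the following minimal sketch estimates the IPC as a sum of squared correlations between the detrended state and Legendre targets of the input history. The one-node tanh ESN and its weights are illustrative assumptions, not the paper's exact setup; for a one-dimensional state, the total capacity is bounded by the rank, i.e., one:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)
T, washout = 50_000, 1_000
zeta = rng.uniform(-1.0, 1.0, T + washout)

# One-node tanh ESN driven by uniform input (illustrative weights).
x = np.zeros(T + washout)
for t in range(T + washout - 1):
    x[t + 1] = np.tanh(0.6 * x[t] + 0.9 * zeta[t])
X = x[washout:] - x[washout:].mean()  # detrended state

def capacity(target):
    """Capacity for one PC target: squared correlation between the
    detrended state and the (centered) target time-series."""
    z = target - target.mean()
    return (X @ z) ** 2 / ((X @ X) * (z @ z))

total = 0.0
for s in range(1, 20):       # delays s = 1..19
    zs = zeta[washout - s : washout - s + T]
    for n in range(1, 4):    # Legendre orders P1, P2, P3
        total += capacity(legendre.legval(zs, [0] * n + [1]))

print(f"total capacity: {total:.3f} (bounded by the rank, 1)")
```

Because the Legendre targets at different delays and orders are orthogonal under the i.i.d. uniform input, the capacities can simply be summed; in the multi-node case the same sum is taken over the orthogonal temporal basis vectors obtained by SVD.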
The above results suggest that the connection between the IPC and the PC expansion allows for a simpler description of information processing in dynamical systems. In the future, information processing using time-variant systems should be further elucidated. Our scheme can be applied to non-stationary systems and thus may be suitable for elucidating information processing in neural circuits, which has been overlooked so far. In addition, it can be applied not only to neural systems but also to other physical systems that can be time-variant, such as fluid, quantum, spintronic, and optical systems. For example, it has recently been reported that certain types of vortex generated when a fluid flows past a bluff body can be used as an information processing device [57]. In that analysis, the information processing capability was found to be maximized near the critical Reynolds number, where the flow exhibits a twin vortex before the onset of the Kármán vortex shedding associated with the Hopf bifurcation. This was also characterized by the breakdown of the ESP. According to our results, it may be possible to evaluate the type and amount of information processing even in the Kármán vortex shedding, which would be a direction for future work.

Appendix A: Echo state property and fading memory property

In an ESN composed of N nodes, the i th (i = 1, 2, . . . , N ) node state x i,t at the t th time step can be written as follows: where f is the activation function, and w ij and w in,i are the internal and input weights, respectively. To emulate the target output z t , we use linear regression to obtain an estimate of z t , ẑ t , as follows: where w out and ŵ out ∈ R L are the weight and solution vectors for the target output, respectively. This learning method does not affect the state of the reservoir but places a constraint on x t . Jaeger [2] and Maass [3] independently developed RC using ESNs and LSMs, respectively, and the prerequisites for x t differ between ESNs and LSMs.
An ESN requires x t to be an echo function, which is a function of only the past input time-series u t = {u t−s } t s=1 . This dynamical property is referred to as the ESP [13][14][15]. We examine this feature by giving the same input time-series to two systems with different initial values and checking whether the two states coincide after a sufficiently long period of time. If the input is noise, this phenomenon is called noise-induced synchronization [58,59]. Furthermore, when the input is generated from a deterministic system, the phenomenon wherein the state is synchronized with the input is called generalized synchronization [60]. Therefore, ESP is related to the synchronization phenomenon of nonlinear dynamical systems.
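This convergence test can be sketched directly. The toy random tanh ESN below, with a spectral radius of 0.8, is an assumption for illustration, not a network used in the text: two copies with different initial values receive the same input time-series, and their distance shrinks toward zero when the ESP holds:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 1000
W = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
W *= 0.8 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius 0.8
w_in = rng.uniform(-1.0, 1.0, N)
u = rng.uniform(-1.0, 1.0, T)  # one shared input time-series

def run(x0):
    """Drive the ESN from initial state x0 with the shared input."""
    x, traj = x0.copy(), []
    for t in range(T):
        x = np.tanh(W @ x + w_in * u[t])
        traj.append(x.copy())
    return np.array(traj)

# Same input, two different initial conditions: under the ESP the
# trajectories synchronize (noise-induced synchronization).
d = np.linalg.norm(run(rng.normal(size=N)) - run(rng.normal(size=N)), axis=1)
print(f"distance after 1 step: {d[0]:.2f}, after {T} steps: {d[-1]:.2e}")
```

When the input is random, as here, the vanishing distance is precisely the noise-induced synchronization of [58,59]; replacing u with the output of a deterministic system turns the same test into a check for generalized synchronization [60].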
In addition, LSMs impose a prerequisite on the power-series expansion of states. If the system is time-invariant, i.e., the state does not depend explicitly on time, and retains exponentially decaying inputs, its state can be approximated by the Volterra series [61], which is a series expansion with non-orthogonal bases. Accordingly, the Volterra series operator [62] can be expanded in a non-orthogonal power series, as follows: where g s1···sn ∈ R L is the n th Volterra kernel. Such a memory-decay feature is called the FMP [3,16,17] and is considered a prerequisite for LSMs.
Appendix B: Polynomial chaos expansion A deterministic dynamical system with a stochastic input ζ t can be described as an operator of {ζ t−s } ∞ s=1 according to polynomial chaos expansion [23], which is a series expansion using the target outputs of IPC as the bases-i.e., multivariate orthogonal polynomials of the random variables, z (i) t , described in Eq. (5). These multivariate polynomials are referred to as polynomial chaoses (PCs), and the space spanned by the PCs is called homogeneous chaos [23], expressed as where c i ∈ R L is the i th coefficient vector.

Generalized polynomial chaos
Generalized polynomial chaos (gPC) [24] is PC composed of the univariate polynomials F n (ζ) included in the Askey scheme [63] tree (FIG. B1). The Askey scheme represents various orthogonal polynomials (e.g., Hermite, Jacobi, Laguerre, and Charlier) using the hypergeometric series r F s along with parameters a 1 , . . . , a r , where the Pochhammer symbol (a) n is defined as (a) n = 1 for n = 0 and (a) n = a(a + 1) · · · (a + n − 1) for n = 1, 2, . . . . For example, the Laguerre polynomial with a parameter α, L (α) n (ζ), can be written as Note that in FIG. B1, an upper polynomial becomes a lower polynomial connected with a line in the limit of a certain parameter or parameters; for example, the Laguerre polynomial becomes the Hermite polynomial H n (ζ) by taking the following limit of α: The i th target z (i) t is given by Eq. (B4). Using the hypergeometric series r F s (ζ), F n (ζ t−k ) can be realized by the following eight types of orthogonal polynomials.

The Legendre polynomial and a uniform distribution
The n th (n = 1, 2, . . .) order Legendre polynomial P n (ζ) is given by where · represents the floor function, and ζ follows a uniform distribution in the range of [−1, 1]:

The Charlier polynomial and a Poisson distribution

The n th (n = 1, 2, . . .) order Charlier polynomial C n (ζ; a) is given by where ζ follows a Poisson distribution In FIG. 1(e), the parameter a was set to 6.
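The orthogonality relations underlying these gPC bases can be checked by Monte Carlo sampling. A minimal sketch for the Legendre case (the sample size is arbitrary) verifies that ⟨P m , P n ⟩ vanishes for m ≠ n and equals 1/(2n + 1) on the diagonal under the uniform distribution on [−1, 1]:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)
zeta = rng.uniform(-1.0, 1.0, 1_000_000)

# Evaluate P_0..P_4 on uniform samples and form the Gram matrix
# <P_m, P_n> = E[P_m(z) P_n(z)] under the input distribution.
P = np.stack([legendre.legval(zeta, [0] * n + [1]) for n in range(5)])
G = (P @ P.T) / len(zeta)

# Diagonal follows 1 / (2n + 1); off-diagonals vanish, so the P_n
# are valid (unnormalized) gPC targets for uniform input.
print(np.round(G, 3))
```

The same check, with the sampler and polynomial family swapped, applies to each distribution-polynomial pair in the Askey scheme, e.g., Poisson samples with Charlier polynomials.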

The Meixner polynomial and a negative binomial distribution
The n th (n = 1, 2, . . .) order Meixner polynomial M n (ζ; β, c) := M n is given by where M 0 = 1 and M −1 = 0. ζ follows a negative binomial distribution In FIG. 1(g), the parameters (c, β) were set to (0.2, 10).

The results for ∆w (1) and ∆w (2) are shown in TABLEs D1-D3. From the above results, we found that the model with no input has a fixed-point attractor at the saddle point.

Divergence
Next, we examined the conditions under which the NARMA10 model diverges. Since the nullclines are time-varying through z t , the point to which y t converges can change. Thus, in the case of no input (µ = κ = 0), we examined the time step at which y t diverges with the initial values w (1) 0 and w (2) 0 . The initial values of y t were set as y 0 = w (1) 0 and y 1 = y 2 = · · · = y 9 = w (2) 0 . In the presence of input, y t diverges depending on the initial condition. As shown in FIG. C1(d), the model with input u t ∈ [0, 0.4] (µ = κ = 0.2) diverges at an initial value similar to the one in the case of no input. FIG. C1(e) shows the bifurcation diagram of y t given an initial value at which y t converges to the fixed point. Since the fixed point around which y t fluctuates is the saddle point, y t can diverge under the given input. To investigate the divergence conditions caused by the input, we ran the model over 10 6 time steps and examined the time step at which y t diverges by altering the initial values y 0 = · · · = y 9 = ψ and the input intensity σ. FIG. C1(f) shows that y t diverges to infinity when ψ or σ crosses one of three thresholds: (i) ψ < −5.15, (ii) ψ > 1.239, or (iii) σ > 0.45. In cases (i) and (ii), y t diverges at a shorter time step (t < 100) than in (iii) because the divergence is caused by the initial value ψ. In case (iii), however, y t diverges according to the input: after converging to the fixed point, y t can diverge when receiving successive large positive inputs. The time steps in FIG. C1(f) also show that, as the runtime becomes longer, y t diverges at smaller σ; consequently, threshold (iii) is a boundary that depends on the input time-series. These results suggest that y t in the vicinity of the fixed point can diverge depending on the input.
In FIG. 2, p expresses the probability of not diverging as a function of σ for different random series {ζ t }. As previously shown, once y t converges to the fixed-point, the model stochastically diverges due to the input, and the probability p depends on σ; thus, even though p is high, y t can potentially diverge. For example, two typical ranges of input, u t ∈ [0, 0.5] and [0, 1], have been used for the NARMA10 task; however, y t diverges in both cases (FIG. 2[b], σ = 0.5, 1.0) because σ is relevant to the average time until divergence, and the runtime, 10 6 time steps, is much longer than the time used for the benchmark task. Consequently, the divergence probability of y t , p, changes depending on the parameter σ.
Therefore, although the NARMA10 model produces the fixed-point attractor, it can potentially diverge depending on the initial values, input time-series, and parameter settings.
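These convergence and divergence regimes can be reproduced with a short simulation. The sketch below uses the commonly cited NARMA10 coefficients (0.3, 0.05, 1.5, 0.1), which may differ from the paper's exact parametrization; with zero input, the trajectory settles on the lower root of the fixed-point equation 0.5y² − 0.7y + 0.1 = 0, i.e., y* = 0.7 − √0.29 ≈ 0.1615, whose upper root ≈ 1.239 is the saddle matching threshold (ii) above:

```python
import numpy as np

def narma10(u):
    """Standard NARMA10 recurrence (commonly cited coefficients; the
    paper's parametrization may differ). Stops early on blow-up."""
    T = len(u)
    y = np.zeros(T)
    for t in range(9, T - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * y[t - 9 : t + 1].sum()
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
        if not np.isfinite(y[t + 1]) or abs(y[t + 1]) > 1e6:
            return y[: t + 2]  # diverged: return truncated trajectory
    return y

# Zero input: y converges to the lower root of 0.5y^2 - 0.7y + 0.1 = 0,
# i.e. y* = 0.7 - sqrt(0.29) ~ 0.1615; the upper root ~ 1.239 is the
# saddle, consistent with threshold (ii) in the text.
fp_traj = narma10(np.zeros(500))
print(f"zero-input fixed point: {fp_traj[-1]:.4f}")

rng = np.random.default_rng(0)
small = narma10(rng.uniform(0.0, 0.5, 5000))  # typical benchmark range
large = narma10(rng.uniform(0.0, 1.0, 5000))  # wider input range
print(f"u in [0,0.5]: simulated {len(small)} steps")
print(f"u in [0,1.0]: simulated {len(large)} steps")
```

Running the two input ranges for longer horizons, or from initial values beyond the thresholds above, reproduces the stochastic divergence behavior of FIG. 2 and FIG. C1(f).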

Time-variance analysis
Finally, we investigated the time dependence of the NARMA10 model. The state of a dynamical system receiving noise input can transition from non-chaos to chaos, an effect referred to as noise-induced chaos [66]. As chaotic behavior is exhibited by a time-variant system, we investigated whether the system is chaotic or ordered by calculating the maximum Lyapunov exponent λ 1 . Thus, we derived the Lyapunov spectrum of the model, λ i (i = 1, 2, . . . , 10), based on the 10-dimensional time-delay system in Eqs. (D1) and (D2). We expressed z (s) t (s = 1, . . . , 10) as a vector z t = [z (1) t · · · z (10) t ] , and the Jacobian matrix of Eq. (17), J t ∈ R 10×10 , with respect to z t can be written as follows: where X t = α + 2βz t . Using the Jacobian matrices, the Lyapunov spectrum was computed from the trajectory average of ln ρ i (J t+M −1 J t+M −2 · · · J t )/M (i = 1, 2, . . . , 10), where ρ i (J t+M −1 J t+M −2 · · · J t ) is the i th singular value of the matrix J t+M −1 J t+M −2 · · · J t , while T and M were set to 6000 and 40, respectively. FIG. C1(g) shows the three largest Lyapunov exponents, λ 1 , λ 2 , and λ 3 , all of which are negative for all σ, indicating that the system does not exhibit chaos. Therefore, the NARMA10 model is not a chaotic, time-variant system.
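The finite-time scheme described above (log singular values of M-step Jacobian products, averaged along the trajectory) can be sketched generically; the diagonal test matrix below is only a sanity check with a known answer, not the NARMA10 Jacobian:

```python
import numpy as np

def lyapunov_spectrum(jacobians, M=40):
    """Estimate Lyapunov exponents from a Jacobian time-series by
    averaging log singular values of length-M products, as in the
    text; for very long M a QR decomposition would be numerically
    safer than forming the product directly."""
    D = jacobians[0].shape[0]
    exps = np.zeros(D)
    blocks = 0
    for k in range(0, len(jacobians) - M, M):
        P = np.eye(D)
        for J in jacobians[k : k + M]:
            P = J @ P
        exps += np.log(np.linalg.svd(P, compute_uv=False))
        blocks += 1
    return exps / (blocks * M)

# Sanity check on a contracting linear map x_{t+1} = A x_t, whose
# exponents equal the logs of |eigenvalues| of A.
A = np.diag([0.9, 0.5])
lam = lyapunov_spectrum([A] * 4000)
print(lam, np.log([0.9, 0.5]))
```

For the NARMA10 model, the same function would be fed the time-series of Jacobians J t evaluated along a simulated trajectory; all-negative exponents, as in FIG. C1(g), rule out chaos.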
The divergence and time-variance analysis results revealed that the NARMA10 model converges to the fixed point and fluctuates in its vicinity. We therefore considered the y t fluctuating around the fixed point to be time-invariant.

Appendix E: Interconvertibility of the PC expansion and IPC
To clearly demonstrate that the PC expansion and the IPC are interconvertible, we derived an approximate model that has nearly the same IPC breakdown as the original one using Legendre-chaoses. From the above capacity analysis, we narrowed the polynomial terms to P 1 (ζ t−s ) (s = 1, 2, . . .) and P 1 (ζ t−s )P 1 (ζ t−s−9 ) (s = 1, 2, . . .), which yielded significantly greater capacities. The expanded state is expressed as follows: where p, q s , and r s are the coefficients for the Legendre-chaoses P 0 = 1, P 1 (ζ t−s ), and P 1 (ζ t−s )P 1 (ζ t−s−9 ), respectively, while P 1 (ζ) = ζ. ζ t follows a uniform distribution in [−1, 1], and N 1 and N 2 represent the sets of delayed time steps s for P 1 (ζ t−s ) and P 1 (ζ t−s )P 1 (ζ t−s−9 ), respectively. Let the normalized Legendre-chaoses be φ 0 = 1/ √ T and the correspondingly normalized first- and second-order terms; the state is then rewritten with the modified coefficients p̃ = p √ T , q̃ s = q s √( Σ T t=1 P 1 (ζ t−s ) 2 ), and r̃ s = r s √( Σ T t=1 {P 1 (ζ t−s )P 1 (ζ t−s−9 )} 2 ) for p, q s , and r s , respectively.
Detrending the state and using Eq. (6), the IPCs for φ
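The interconvertibility can also be checked in the forward direction: regressing the NARMA10 output onto the narrowed Legendre-chaos dictionary should recover most of its variance. The sketch below assumes the commonly used NARMA10 coefficients and the input map u t = µ + κζ t with µ = κ = 0.2; the delay sets N 1 = {1, . . . , 12} and N 2 = {1, 2, 3} follow the capacity analysis in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
T, wash = 20_000, 200
zeta = rng.uniform(-1.0, 1.0, T + wash)
u = 0.2 + 0.2 * zeta  # mu = kappa = 0.2, as in the divergence analysis

# NARMA10 trajectory (assumed standard coefficients).
y = np.zeros(T + wash)
for t in range(9, T + wash - 1):
    y[t + 1] = (0.3 * y[t] + 0.05 * y[t] * y[t - 9 : t + 1].sum()
                + 1.5 * u[t - 9] * u[t] + 0.1)

# Narrowed Legendre-chaos dictionary: P1(z_{t-s}) and
# P1(z_{t-s}) P1(z_{t-s-9}), with P1(z) = z.
idx = np.arange(wash, T + wash)
feats = [np.ones(T)]
for s in range(1, 13):
    feats.append(zeta[idx - s])
for s in range(1, 4):
    feats.append(zeta[idx - s] * zeta[idx - s - 9])
F = np.stack(feats, axis=1)

coef, *_ = np.linalg.lstsq(F, y[idx], rcond=None)
resid = y[idx] - F @ coef
r2 = 1.0 - resid.var() / y[idx].var()
print(f"R^2 of the truncated PC expansion: {r2:.3f}")
```

The fitted coefficients play the roles of p, q s , and r s above; the high explained variance is the regression-side counterpart of the statement that these few chaoses carry almost all of the model's capacity.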
Appendix F: How to compose the dissociated culture reservoir

All experiments were approved by the ethics committee of the University of Tokyo and followed the "Guiding Principles for the Care and Use of Animals in the Field of Physiological Science" established by the Physiological Society of Japan. Embryonic rat cortices were dissected from E18 rats and used for cortical cell cultures. The cortices were dissociated in 2 mL of 0.25% trypsin-ethylenediaminetetraacetic acid (Trypsin-EDTA, Life Technologies), from which cells were isolated by trituration, and 38,000 cells were seeded on each microelectrode array (MEA; MaxWell Biosystems). For cell adhesion, 5 mL of 0.05% polyethyleneimine (PEI; Sigma-Aldrich) and 5 µl of 0.02 mg/ml laminin (Sigma-Aldrich) were applied before plating the cells. Then, after 24 hours, the plating media [67] were changed to growth media [68]. The plating media were composed of 850 µl Neurobasal (Life Technologies), 10% horse serum (HyClone), 0.5 mM GlutaMAX (Life Technologies), and 2% B27 (Life Technologies). The growth media were composed of 850 µl DMEM (Life Technologies), 10% horse serum (HyClone), 0.5 mM GlutaMAX (Life Technologies), and 1 mM sodium pyruvate (Life Technologies). All experiments were conducted in an incubator at 37 °C and 5% CO 2 . The MEAs were sealed with a lid to prevent water evaporation and the invasion of bacteria and fungi.
The MEA had 26,400 electrodes, which were placed 17.5 µm apart and arranged in a 120 × 220 grid. Up to 1,024 of the 26,400 electrodes can be used simultaneously. We selected electrodes with a high firing rate as measurement electrodes and electrodes located on axons as stimulation electrodes. We applied bipolar pulse stimuli with an amplitude of ζ t , which followed a normal distribution with mean µ and standard deviation σ, and an interpulse interval (IPI) of 10, 20, or 30 ms to the stimulation electrodes. Furthermore, a 6 th -order Butterworth bandpass filter and a zero-phase IIR filter were applied to the voltage traces observed from the measurement electrodes to extract the 300-3000 Hz components. At all electrodes, stimulus-induced artifacts were removed by eliminating traces within ±2 ms of the stimulus times. The standard deviation of the extracted signals was calculated as follows [69]: If the amplitude of an extracted signal exceeded 4σ, the value of the spike train was set to one; otherwise, it was set to zero. As the measurement frequency was 20 kHz, the above spike train was divided into 1-ms time bins, and if one or more spikes appeared in one bin, the modified spike train was set to one; otherwise, it was set to zero. The train was then divided into IPI-width bins, and the number of spikes in each bin was used as the state x t .
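The binning pipeline in this appendix can be sketched as follows. Band-pass filtering and artifact removal are assumed to have been applied to the input trace already, and np.std stands in for the σ estimator of [69], whose exact formula is not reproduced here:

```python
import numpy as np

def spikes_to_state(filtered, fs=20_000, ipi_ms=10, thresh_mult=4.0):
    """Turn a band-passed voltage trace into the reservoir state x_t:
    threshold at thresh_mult * sigma, binarize into 1-ms bins, then
    count spikes in IPI-width bins. (Filtering and artifact removal
    are assumed done; np.std is a stand-in for the estimator of [69].)"""
    sigma = np.std(filtered)
    spike = np.abs(filtered) > thresh_mult * sigma
    ms = fs // 1000                  # samples per 1-ms bin
    n_ms = len(filtered) // ms
    binary = spike[: n_ms * ms].reshape(n_ms, ms).any(axis=1)
    n_bins = n_ms // ipi_ms          # IPI-width bins -> one x_t each
    return binary[: n_bins * ipi_ms].reshape(n_bins, ipi_ms).sum(axis=1)

# Toy check: 1 s of noise at 20 kHz with two injected spikes 10 ms apart.
rng = np.random.default_rng(0)
trace = rng.normal(0.0, 1.0, 20_000)
trace[5_000] = 50.0
trace[5_200] = 50.0
x = spikes_to_state(trace, ipi_ms=10)
print(len(x), int(x.sum()))
```

Each entry of the returned array is the spike count of one IPI-width bin, i.e., the state x t fed to the capacity estimation.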