Modeling dynamics from output data only

We examine the problem of reconstructing input-output systems from time series data. Although the method of delays has already been used in the case where both input and output are measured, in some cases the inputs cannot be measured, and hence the method of delays cannot be used. On the basis of ideas derived from existing embedding theorems, we propose to build models by using delays of multivariate observations of output data. Assuming that the inputs are few, we use several observations for obtaining information about the inputs, and the remaining observations for obtaining information about the state of the system. Numerical examples on a discrete map and a continuous-time system show that input-output systems can indeed be identified by using multivariate observations of output data only. We also discuss the application of this method to the analysis of coupled systems or complex networks, by partitioning such large systems and analyzing each subsystem separately. The models used in this paper are nonpredictive; thus, they cannot be used to predict the future behavior of the system. However, since they model the dynamics of the system, they have other possible applications, such as change detection and noise reduction.


I. INTRODUCTION
In the analysis of experimental time series originating from low-dimensional nonlinear dynamical systems, the method of delays [1] is widely used for reconstructing a state space in which the dynamics can be viewed. This method is theoretically justified by Takens's embedding theorem [2,3]. Thus, for an autonomous deterministic dynamical system, a scalar time series can contain sufficient information to reconstruct a state space that is equivalent to the state space of the original dynamical system. One can then use the reconstruction, for example, to estimate dynamical invariants of the system, to predict future values of the time series, to remove noise, or to control the system (see [4,5]).
Since Takens's theorem applies only to autonomous deterministic systems, strictly speaking, it cannot be directly applied to a measured time series because of the presence of noise or an external input. First, signals obtained from real systems are affected by observational noise, which modifies the measured variable. A second type of noise is dynamical noise, which acts on the internal state of the system; such systems are known as stochastic systems. Finally, in many fields, input-output systems are considered instead of autonomous systems.
Many studies have been carried out in order to overcome these difficulties. The effect of observational noise on the reconstruction of a state space has been well studied [6]. The applicability of such reconstruction to input-output systems was conjectured by Casdagli [7], and indeed, several embedding theorems were later proved for deterministically forced systems [8] and for stochastic systems and input-output systems [9]. See [10] for a review of these theorems.
Another question that recurs in the literature, but for which embedding theorems have not yet been stated or proved [11], is the use of multivariate observations in the reconstruction of a state space [12-16]. If the dynamics of a dynamical system can be reconstructed from a scalar variable, then intuitively, in certain cases, the reconstruction should be improved by the use of several variables. Muldoon et al. [17] showed that by performing many simultaneous observations, such that an embedding of the original state space becomes automatic, even the noise can be reconstructed.
In this paper, on the basis of ideas derived from the theorem given by Muldoon et al. [17] and from Takens's theorem for stochastic systems [9], we propose to build models for low-dimensional deterministic input-output systems with unobserved inputs. We assume that the inputs are few, and we separate multivariate observations into primary observations and secondary observations. The secondary observations are used for obtaining information about the input, while delays of the primary observations are used for obtaining information about the state of the system.
In Sec. II, we review some of the existing embedding theorems. In Sec. III, we compare our approach to the existing embedding theorems and propose some theoretical arguments for the existence of a functional relation between delayed multivariate observations. In Sec. IV, we verify the functional relations of Sec. III by analyzing three numerically simulated systems: a discrete map with an external input, a continuous-time system with an external input, and finally two coupled continuous-time systems. We evaluate the quality of the reconstruction on the basis of predictability. In Sec. V, we summarize and discuss this research and its possible applications.

II. EMBEDDING THEOREMS

A. Takens's embedding theorem

We consider a deterministic dynamical system whose state is represented by a point x on a compact m-dimensional manifold M. The time evolution of the system is given by a map f^t: M → M, so that if at time t_0 the system is in state x_0, then at time t + t_0 its state is x(t + t_0) = f^t(x_0). Such a map is usually given by the solution of a differential equation on M.
In most applications, the state x is observed not directly but through a scalar observable y that is related to x by an observation function φ: M → R. The system is observed at discrete times t_k = t_0 + kτ (k = 0, 1, ...) with a sampling interval τ in order to obtain a scalar time series {y_k}, where y_k = φ(x_k) and x_k = f^{kτ}(x_0). The states at discrete times t_0, t_1, ... are related by the equation

x_{k+1} = f(x_k),    (1)

where the notations were simplified by rescaling the time so that τ = 1 and f = f^τ. On the basis of the observed scalar time series {y_k}, it is possible to reconstruct a multidimensional state space that is equivalent to the state space of the original system; from such a state space, one can then build predictive models. One way of doing so is to use the method of delays [1]: use successive delays of the time series to form the vector

v_k = (y_k, y_{k+1}, ..., y_{k+d-1}).    (2)

The integer d is called the embedding dimension, and v_k is called a reconstruction vector or delay vector.
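As an illustration, the delay vectors of Eq. (2) can be formed as in the following minimal sketch (the function name `delay_vectors` is our own):

```python
import numpy as np

def delay_vectors(y, d):
    """Stack the delay vectors v_k = (y_k, y_{k+1}, ..., y_{k+d-1})
    obtained from a scalar time series by the method of delays."""
    y = np.asarray(y, dtype=float)
    n = len(y) - d + 1                    # number of complete delay vectors
    return np.stack([y[k:k + d] for k in range(n)])

# A series of 6 samples embedded with dimension d = 3 yields 4 vectors.
v = delay_vectors([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], 3)
```

Each row of `v` is one reconstruction vector; a predictive model then maps row k to the next observation.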
From Eqs. (1) and (2), it follows that v_k and x_k are related by the equation

v_k = Φ(x_k),    (3)

where Φ: M → R^d is called the delay map or reconstruction map and is defined as follows:

Φ(x) = (φ(x), φ(f(x)), ..., φ(f^{d-1}(x))).

Takens's embedding theorem [2,3,10] then shows that for d ≥ 2m + 1 and for typical f and φ of class C², the delay map Φ is a smooth coordinate change between the original state space, where x_k lies, and the reconstructed state space, where v_k lies. To be precise, the delay map is then an embedding, i.e., it maps M diffeomorphically onto its image: it is one-to-one and continuously differentiable, and its inverse defined on Φ(M) is also continuously differentiable. For a precise statement of Takens's theorem, refer to [10].
Since Φ is an embedding, a predictive relation can be obtained in the following manner. The coordinate change Φ can be used to transport the map f, which becomes F = Φ ∘ f ∘ Φ⁻¹ in the reconstructed coordinates, where the symbol "∘" denotes function composition. Then, from Eq. (3), we know that v_k = Φ(x_k) is in the image of Φ. Thus, F can be applied to it; F(v_k) = Φ(f(x_k)) = Φ(x_{k+1}) = v_{k+1} by Eq. (3).
By rewriting the above equation in terms of the observed time series {y_k}, we can express the equations of dynamics of v_k as follows:

(y_{k+1}, y_{k+2}, ..., y_{k+d}) = F(y_k, y_{k+1}, ..., y_{k+d-1}).
Here, the first d − 1 components of F are a trivial shift of coordinates, and hence, if we write the last component of F as G: Φ(M) → R, we obtain the following predictive relation:

y_{k+d} = G(y_k, y_{k+1}, ..., y_{k+d-1}).

The function G can be estimated from observed data; thus, the future of the time series {y_k} can be predicted.

B. Generalizations to forced systems
We now consider a discrete-time system that is excited by an external force, namely, a forced system. The state at discrete time k is represented by a vector x_k on a compact m-dimensional manifold M, and a multidimensional external input u_k, which lies on a compact n-dimensional manifold N, is added.
The input u is considered to be a stochastic input or the state of a dynamical system that is coupled to the considered system. If the input is the state of a low-dimensional deterministic system, such that, for example, u_{k+1} = g(u_k), then by using the methods of Stark [8], it is possible to simultaneously reconstruct both systems and obtain a predictive relation; this case is not considered here. Also, a nonstationary input is not considered.
The evolution of the state of the system is given by a map f: M × N → M, so that the state at discrete time k + 1 is related to the state and the input at discrete time k as follows:

x_{k+1} = f(x_k, u_k).    (4)

As before, we observe a scalar value y_k, which is related to the state x_k by an observation function φ: M → R. The question we now ask is whether we can reconstruct a state space equivalent to the original one, given only the input and output time series {u_k} and {y_k}. The purpose of this reconstruction is to obtain an input-output model of the system.
Stark et al. [9] proved an embedding theorem that is applicable to a wide class of stochastic dynamical systems as well as to input-output systems. They called this theorem Takens's theorem for stochastic systems. For d ≥ 2m + 1, for typical f and φ, and for a typical input sequence u, the corresponding delay map is an embedding. Further, we have the following predictive relation, which was conjectured by Casdagli [7]:

y_{k+d} = G(y_k, y_{k+1}, ..., y_{k+d-1}, u_k, u_{k+1}, ..., u_{k+d-1}).    (5)

However, G is only defined for almost every u_k, u_{k+1}, ..., u_{k+d-1}. Hence, it is not known whether G is regular with respect to the input terms [10].
Muldoon et al. [17] have a different approach; they use multivariate observations. Assume that the observation function φ: M → R^p gives p independent observables, with p ≥ 2m + 1. Then, on the basis of Whitney's embedding theorem [18], φ is generically an embedding. Thus, from one multivariate observation y_k, information about the state x_k can be obtained. The idea developed by Muldoon et al. [17] is to combine this observation with its successor y_{k+1}, in order to obtain information about the input u_k. More explicitly, the reconstruction map Φ_{f,φ}: M × N → R^{2p} defined by Φ_{f,φ}(x, u) = (φ(x), φ(f(x, u))) is an embedding [26]. In other words, knowledge of the state x_k (through y_k) and its successor x_{k+1} (through y_{k+1}) gives us information about the input u_k. However, this theorem cannot be directly used to derive a predictive relation for future observations. In Sec. III, we use the main idea of this theorem in order to build a functional relation that does not rely on knowing the inputs.

III. FORCED SYSTEMS WITH UNKNOWN INPUTS
In Sec. II, we reviewed two theorems that are related to Takens's embedding theorem and are applicable to forced systems. The theorem by Stark et al. [9] allows the reconstruction of an input-output system, and, by assuming the knowledge of the input, we may use this theorem to obtain the predictive relation given in Eq. (5). The theorem by Muldoon et al. [17] allows the reconstruction of an input-output system by using multivariate observations; however, this theorem does not lead to a predictive relation.
In the case of unmeasured input, we now argue that it is possible to reconcile the two approaches and derive a functional relation that uses multivariate observations but does not require any knowledge of the input. Our approach is to combine multivariate observations with the method of delays.
We separate the problem into two cases. In the simpler case, in which the entire state can be observed, we derive a functional relation to estimate a part of the state. Then, in the more general case, for which the state is observed through observation functions, we argue that a similar functional relation, but one that involves delays, can be obtained.
Both functional relations are nonpredictive relations; thus, they cannot be used to predict the future behavior of the system. However, since they model the dynamics of the system, they have other possible applications such as change detection and noise reduction (see Sec. V).

A. Observing the entire state
We first assume that the entire state is directly observed and that the manifold M, where the state x lies, is itself a Cartesian product, M = M_a × M_b, where M_a and M_b are two compact manifolds of respective dimensions m_a and m_b. Therefore, x may be decomposed as x = (x_a, x_b)^T, and Eq. (4) can be expressed as follows:

x_{a,k+1} = f_a(x_k, u_k),    (6a)
x_{b,k+1} = f_b(x_k, u_k),    (6b)

where f_a and f_b are the corresponding components of the map f. We call x_a the primary state variables, and x_b the secondary state variables. Here, the word primary is used to imply that these are the state variables of primary interest.
The secondary state variables are used to uniquely determine the input, as explained below.
An important hypothesis of the theorem by Muldoon et al. [17] is that the restriction f_x ≡ f(x, ·): N → M is an embedding for each x ∈ M. This hypothesis allows the unique determination of the input u_k from one state x_k and the next state x_{k+1}.
We consider a more restrictive hypothesis that states that the restriction f_{b,x} ≡ f_b(x, ·): N → M_b is an embedding for each x ∈ M. By using Eq. (6b), we have x_{b,k+1} = f_{b,x_k}(u_k), and therefore, u_k can be uniquely determined as follows:

u_k = f_{b,x_k}^{-1}(x_{b,k+1}).

Inserting the value for u_k into Eq. (6a) leads to a functional relation on the basis of which the primary state variables, x_{a,k+1}, can be inferred,

x_{a,k+1} = f_a(x_k, f_{b,x_k}^{-1}(x_{b,k+1})).    (7)

Equation (7) can be rewritten as follows:

x_{a,k+1} = G(x_{a,k}, x_{b,k}, x_{b,k+1}).    (8)

The function G can then be estimated from time series data, and the next values of the primary state variables can be estimated. To this end, G should be at least continuous, which is the case, as the following lemma shows.

Lemma 1. Let M_a, M_b, and N be compact manifolds of respective dimensions m_a, m_b, and n, let M = M_a × M_b, and let f: M × N → M be of class C¹ such that f_{b,x} is an embedding for each x ∈ M. Then the function G of Eq. (8) is of class C¹.

Proof. Let Ψ: M × N → M × M_b be the function defined by Ψ(x, u) = (x, f_b(x, u)). By assumption, f_b is of class C¹ and, for each x ∈ M, f_{b,x} is an embedding; therefore, it is easy to show that Ψ is of class C¹, is injective, and is an immersion. Thus, since M and N are compact, Ψ is an embedding, and its inverse Ψ⁻¹ is defined on Ψ(M × N) and is of class C¹. Since G is obtained by composing f_a with Ψ⁻¹, it follows that G is of class C¹.
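As a toy numeric check of this construction (our own example, with scalar state components standing in for compact manifolds), take f_b(x, u) = x_b + u, which is trivially invertible in u for every state, and an arbitrary primary map f_a. The input recovered from x_{b,k} and x_{b,k+1} then reproduces the true input, and the relation of Eq. (8) reproduces x_{a,k+1}:

```python
import numpy as np

# Toy maps: f_b is invertible in the input u, as the hypothesis requires.
def f_a(xa, xb, u):
    return 0.5 * xa + 0.2 * xb + np.sin(u)

def f_b(xa, xb, u):
    return xb + u

rng = np.random.default_rng(0)
xa_k, xb_k = 0.1, -0.3
u_k = rng.normal()
xa_next, xb_next = f_a(xa_k, xb_k, u_k), f_b(xa_k, xb_k, u_k)

# u_k = f_{b,x_k}^{-1}(x_{b,k+1}), then Eq. (8): x_{a,k+1} = G(x_{a,k}, x_{b,k}, x_{b,k+1}).
u_recovered = xb_next - xb_k
G_value = 0.5 * xa_k + 0.2 * xb_k + np.sin(xb_next - xb_k)
```

Here G never sees u_k directly; the input enters only through the secondary variable at the next step.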

B. Using observation functions
We now suppose that the state is indirectly observed through two observation functions φ_a: M → R^{p_a} and φ_b: M → R^{p_b}. Thus, we have two time series {y_a} and {y_b} given at discrete time k by

y_{a,k} = φ_a(x_k),   y_{b,k} = φ_b(x_k).

Here also, we call y_a the primary observations, and y_b the secondary observations. Paralleling the reasoning of Sec. III A, we assume that the restriction φ_b ∘ f_x ≡ φ_b ∘ f(x, ·): N → R^{p_b} is an embedding for each x ∈ M. Thus, the knowledge of x_k and y_{b,k+1} uniquely determines the input u_k. This can be seen by replacing the expression of y_{b,k+1} using the equation of dynamics [Eq. (4)],

y_{b,k+1} = φ_b(x_{k+1}) = φ_b(f(x_k, u_k)) = (φ_b ∘ f_{x_k})(u_k).

The previous equation shows that y_{b,k+1} is in the image of φ_b ∘ f_{x_k}. Hence, by using our hypothesis, we can apply its inverse (φ_b ∘ f_{x_k})⁻¹ and obtain

u_k = (φ_b ∘ f_{x_k})⁻¹(y_{b,k+1}).

Inserting this expression of u_k in the equation of dynamics [Eq. (4)] then yields

x_{k+1} = f(x_k, (φ_b ∘ f_{x_k})⁻¹(y_{b,k+1})),

which may be rewritten as

x_{k+1} = f̃(x_k, y_{b,k+1}).    (9)

Equation (9) can be viewed as the equation of dynamics of a hypothetical input-output system whose state at discrete time k is x_k and whose input at discrete time k is y_{b,k+1}. The state of this input-output system is observed through the observation function φ_a. If Takens's theorem for stochastic systems [9] could be applied to this system, then similarly to Eq. (5), the functional relation

y_{a,k+d} = G(y_{a,k}, ..., y_{a,k+d-1}, y_{b,k+1}, ..., y_{b,k+d})

would hold for sufficiently large d.
However, Takens's theorem for stochastic systems cannot be applied to this case for the following reason. The theorem requires that for fixed y, the evolution operator of the system, i.e., the function f̃_y: M → M defined by f̃_y(x) = f̃(x, y), be defined on the whole of M. By the construction of f̃, we have φ_b(f̃_y(x)) = y for every x, so that φ_b takes the same value y on the entire image of f̃_y. This implies that φ_b will be a constant function on M, which is contrary to our assumption that φ_b ∘ f_x is an embedding for each x ∈ M. The problem is that we are trying to use φ_b to determine the input, and then φ_a to obtain information about the entire state of the system, including the part that we already know from φ_b.
Instead, assume as in Sec. III A that M is a product of two manifolds M = M_a × M_b and that the function f_b(x, ·): N → M_b is an embedding. Further, assume that φ_a depends only on x_a, that φ_b depends only on x_b, and that φ_b is an embedding of M_b. Then, Eq. (8) holds, and since φ_b is an embedding, x_{b,k} is uniquely determined by y_{b,k} = φ_b(x_{b,k}), and x_{b,k+1} is uniquely determined by y_{b,k+1} = φ_b(x_{b,k+1}). Therefore, Eq. (8) can be rewritten as follows:

x_{a,k+1} = G̃(x_{a,k}, y_{b,k}, y_{b,k+1}).    (10)

With these assumptions, the aforementioned problem does not occur. If Takens's theorem for stochastic systems is applicable to the hypothetical dynamical system of Eq. (10), the following functional relation holds:

y_{a,k+d} = G(y_{a,k}, ..., y_{a,k+d-1}, y_{b,k}, ..., y_{b,k+d}).    (11)

IV. NUMERICAL SIMULATIONS
In order to verify the validity of the functional relations derived in Sec. III A [Eq. (8)] and Sec. III B [Eq. (11)], in Secs. IV B-IV D we introduce and analyze three systems: a discrete map obtained from the Rössler system by finite differences, the continuous-time Rössler system, and two coupled continuous-time Rössler systems.

A. Methodology
For each system to be analyzed, we build models based on multivariate observations and without any knowledge of the input excitation. Further, we evaluate their estimation errors on separate test data. In this section, we present the models and the methodology used for verifying the validity of the functional relations of Eqs. (8) and (11).
In both cases, the data that we analyze come in the form of a multivariate time series {y_k}, 1 ≤ k ≤ N, consisting of p individual channels y_{i,k}, i = 1, ..., p. Each channel represents an independent observation of the state of the system. Following the notations of Sec. III B, some of these observations will be primary observations, and others will be secondary observations. From the primary observations, we select one channel as the target variable z_k. The target z_k is the value that we wish to estimate. For example, z_k may be the first channel of the observed time series, y_{1,k}. The multivariate time series {y_k} and the associated target time series {z_k} form a training set T. From the training data in T, we wish to identify the functional relation

z_k = G(v_k),    (12)

where v_k is a vector formed from delayed values of the observed time series. We write the delay vector v_k in a generic form, using nonuniform delays [19],

v_k = (y_{i_1,k−τ_1}, y_{i_2,k−τ_2}, ..., y_{i_d,k−τ_d}).

The reconstruction vector v_k has d components. For the jth component, we independently choose the channel i_j and the delay τ_j. Usually, we restrict τ_j > 0 so that the target z_k depends only on past values; Eq. (12) is then a predictive relation. However, in the case of an input-output system without any knowledge of the input, as the relation from Eq. (11) shows, one of the delays for the secondary outputs is 0. Therefore, when modeling this relation, we allow τ_j = 0 for the channels chosen as secondary outputs. In this paper, we take v_k as the right-hand side of Eq. (8) or (11).
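A sketch of how such a nonuniform delay vector can be assembled from a p-channel series (the function name and layout are our own; note that a delay of 0 is permitted for secondary channels):

```python
import numpy as np

def delay_vector(Y, k, spec):
    """Build v_k = (y_{i_1, k-tau_1}, ..., y_{i_d, k-tau_d}) from an
    array Y of shape (p, N); spec lists (channel i_j, delay tau_j) pairs."""
    return np.array([Y[i, k - tau] for i, tau in spec])

# Two channels of 10 samples: channel 0 holds 0..9, channel 1 holds 10..19.
Y = np.arange(20.0).reshape(2, 10)
# Primary channel 0 with delays 1 and 2; secondary channel 1 with delays 0 and 1.
v5 = delay_vector(Y, 5, [(0, 1), (0, 2), (1, 0), (1, 1)])
```

For k = 5 this picks y_{0,4}, y_{0,3}, y_{1,5}, and y_{1,4}; the zero-delay entry y_{1,5} is the secondary observation taken at the same time as the target.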
Once the reconstruction vector v_k is chosen, we use the measured time series for estimating the function G of Eq. (12). To this purpose, we use a local linear approximation (refer to [5]) defined as follows. For a test point ṽ with associated target z̃, define U_ν(ṽ) as the set of points among the reconstruction vectors v_k of the training set T that are closest to the test point ṽ in terms of the Euclidean distance.

The number of neighbors ν = |U_ν(ṽ)| is chosen beforehand.

Then, the local linear approximation of the target z̃ is

ẑ = a·ṽ + b,

where the coefficients a and b are obtained by least-squares fitting over the set of neighbors U_ν(ṽ).
In this paper, we perform all the local linear approximations by using neighborhoods containing ν = 20 neighbors.
The optimal choice of ν depends on the nonlinearity of the equations, on the level of noise, and on the dimension of the reconstruction vector v_k. Thus, a fixed number of neighbors is not optimal for all the systems studied in Secs. IV B-IV D, nor even for all the relations of a given system. Several values were tried, and the choice ν = 20 is close to optimal for the systems studied here; for ν = 10 or ν = 50, the results were similar to those shown in this paper, except in cases for which the dimension of the reconstruction vector is close to ν.
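The local linear approximation described above can be sketched as follows (a minimal implementation of ours; for an exactly linear relation, it recovers the target to numerical precision):

```python
import numpy as np

def local_linear_estimate(V, z, v_test, nu=20):
    """Fit z ≈ a·v + b by least squares over the nu nearest neighbors
    (Euclidean distance) of v_test, then evaluate the fit at v_test."""
    d2 = np.sum((V - v_test) ** 2, axis=1)        # squared distances to v_test
    idx = np.argsort(d2)[:nu]                     # the nu nearest neighbors
    A = np.hstack([V[idx], np.ones((nu, 1))])     # affine design matrix [v, 1]
    coef, *_ = np.linalg.lstsq(A, z[idx], rcond=None)
    return np.append(v_test, 1.0) @ coef

rng = np.random.default_rng(0)
V = rng.normal(size=(500, 2))                     # training reconstruction vectors
z = 2.0 * V[:, 0] - V[:, 1] + 3.0                 # a known linear target relation
z_hat = local_linear_estimate(V, z, np.array([0.5, -0.2]))
```

Since the toy relation is globally linear, the fit over 20 neighbors reproduces z = 2(0.5) − (−0.2) + 3 = 4.2.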
In order to estimate the accuracy of the model, we compute the root mean square of the estimation error of the model on a test time series, which we normalize by the standard deviation of the time series to be estimated,

E = sqrt(⟨(ẑ_k − z_k)²⟩) / σ_z.    (13)

In the rest of the paper, we refer to E as the normalized estimation error, or simply as the estimation error.
We now examine the source of the estimation errors. The functional relation that we are modeling is of the form of Eq. (12). If it holds exactly, then as the number of points N in the training time series increases, better and better approximations of the function G can be made, and the normalized estimation error E will become closer and closer to zero. For the local linear approximation, the scaling law is

E ∝ N^{−2/D},    (14)

where N is the length of the training data and, in the case of autonomous systems, D is the information dimension of the attractor [20,21]. For input-output systems, the same scaling law is expected to hold, with D being the dimension of the space in which the delay vector v_k is constrained to lie [7].
With a random input, the dimension is D = m + n_i, where m is the state-space dimension and n_i is the number of different inputs that appear in the delay vector. Alternatively, if Eq. (12) does not hold exactly, then the relation between the delay vector v_k and the target variable z_k is of the form

z_k = G(v_k) + ε_k,    (15)

where ε_k is the noise term, or innovation, that does not depend on the variables included in v_k. In this case, in the limit of large N, the normalized estimation error E is limited by the signal-to-noise ratio of the time series. For large N, the normalized estimation error E is then approximately

E ≈ σ_ε / σ_z.    (16)

In the present study, there may be several causes for not being able to estimate the target z_k: first, when the reconstruction vector v_k does not have a sufficient number of delay terms, a functional relation does not exist between v_k and z_k; second, when the function G is not continuously differentiable, the local linear approximation fails; third, when the input u_k cannot be uniquely determined from the variables included in the reconstruction vector v_k, the proposed relations do not hold. It is the third case that we are looking for.
In this case, the innovation ε_k of Eq. (15) mainly consists of a term depending on u_k. Therefore, we can estimate the standard deviation of the innovation ε_k, and thus the normalized estimation error E in the limit of large N, by estimating the contribution to z_k of the terms depending on u_k in the equations of dynamics of the system. In Secs. IV B-IV D, we use this reasoning for verifying the validity of the functional relations. If, as the training data length N is increased, the estimation error does not decrease below the threshold given by Eq. (16), the input u_k cannot be uniquely determined as was supposed in Sec. III. Conversely, an estimation error following the scaling law of Eq. (14) indicates a successful model based on only the output data.
In order to better discriminate between the two cases, we need to make a further adjustment to the procedure. As Casdagli [21] noted and as we also observed, a few very infrequent bad estimations can dominate the normalized estimation error E; hence, for large N, Eq. (14) does not hold. Casdagli [21] suggests discarding the 10% worst estimations, or alternatively, replacing the arithmetic mean in Eq. (13) by a geometric mean. In this paper, we removed the 10% worst estimations when computing the normalized estimation error E. The use of a geometric mean in Eq. (13) gave similar results, which we do not discuss here.
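A sketch of the normalized estimation error of Eq. (13) with the 10% worst estimations removed (the helper function is our own):

```python
import numpy as np

def normalized_error(z_true, z_est, discard_frac=0.10):
    """Root-mean-square estimation error, computed after discarding the
    worst fraction of estimations, normalized by the std of z_true."""
    z_true, z_est = np.asarray(z_true), np.asarray(z_est)
    n_keep = int(np.ceil(len(z_true) * (1.0 - discard_frac)))
    keep = np.argsort(np.abs(z_est - z_true))[:n_keep]    # drop largest errors
    rms = np.sqrt(np.mean((z_est[keep] - z_true[keep]) ** 2))
    return rms / np.std(z_true)

z = np.sin(0.1 * np.arange(1000))
e_exact = normalized_error(z, z)          # perfect estimation
e_biased = normalized_error(z, z + 0.01)  # small constant estimation bias
```

For a constant bias, trimming changes nothing and E is simply the bias divided by the standard deviation of the series.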
Finally, we note that no observational noise was added to the simulated time series, and that the time series were normalized in the following manner so as to give an equal weight to every channel when computing Euclidean distances for the local linear approximations. Each channel of the training time series was first normalized by subtracting the sample mean and dividing by the sample standard deviation, so that each channel of the resulting time series has zero mean and unit variance. Each channel of the test time series was then normalized by the same factors (so that the channels of the test data may or may not have zero mean and unit variance). The reason for using the same factors is that the behavior of the system, and hence the values estimated by the models, depends on the amplitude of the data, i.e., on the amplitude of the current states and of the inputs. By normalizing the training data and the test data by the same factors, we retain this dependency.
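The normalization step can be sketched as follows (channels are rows; the test data are scaled by the training factors, as described above):

```python
import numpy as np

def normalize_by_training(train, test):
    """Give each training channel zero mean and unit variance, and apply
    the SAME per-channel factors to the test data, so that the amplitude
    dependence of the dynamics is retained."""
    mu = train.mean(axis=1, keepdims=True)
    sd = train.std(axis=1, keepdims=True)
    return (train - mu) / sd, (test - mu) / sd

train = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
test = np.array([[2.0, 4.0], [20.0, 40.0]])
train_n, test_n = normalize_by_training(train, test)
```

Each training row ends up with zero mean and unit variance, while the test rows inherit whatever mean and variance the training factors give them.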

B. Rössler map

1. Model and data
As the first example, we use a discretized version of the Rössler system, which we call the Rössler map. The original Rössler system [22] is given by

dx/dt = −ωy − z,
dy/dt = ωx + ay,    (17)
dz/dt = b + z(x − c).

We transform it into a forced system by adding an input force u(t) on the first or second equation,

dx/dt = −ωy − z + α_x u(t),
dy/dt = ωx + ay + α_y u(t),    (18)
dz/dt = b + z(x − c).

We use the third finite-difference model presented by Letellier et al. [23], and to it, we add the above input force; the resulting equations [Eq. (19)] define the Rössler map. This is a nonstandard finite-difference model; ε₁, ε₂, and ε₃ are coefficients that depend on the integration time step h and on the time scales of the system. Following Letellier et al. [23], we set these coefficients in terms of x_c = (−c + √(c² − 4ab)) / 2. We use the parameter values ω = 1, a = 0.432, b = 2, c = 4, an integration step h = 0.1, and input coefficients α_x = α_y = 1.
Several training time series of the Rössler map with N data points were generated by evaluating Eq. (19) for N + 1000 time steps and discarding the first 1000 values, for increasing values of N. Each time series had different randomly chosen initial conditions and a different external input {u_k}, which was unfiltered Gaussian white noise with zero mean and unit variance. In order to compute estimation errors, a test time series with 1000 points was also generated for each training time series.

2. Full state observation
We now examine Eq. (19) in more detail in order to understand which functional relations can be expected to hold. First, observe that the terms x_k, y_k, z_k, and u_k appear in all three equations, through the term x_{k+1} for Eqs. (19b) and (19c). Thus, the next states can be expressed as functions of the immediately preceding input and states,

x_{k+1} = f_x(x_k, y_k, z_k, u_k),
y_{k+1} = f_y(x_k, y_k, z_k, u_k),    (20)
z_{k+1} = f_z(x_k, y_k, z_k, u_k).

The functions f_x and f_y are linear, while f_z is nonlinear in that it involves second-order polynomial terms because of the presence of the term x_{k+1} z_k in Eq. (19c). Therefore, local linear approximations of f_x and f_y should have very low estimation errors, while local linear approximations of f_z should follow the scaling law of Eq. (14).
Consider the case of a full state observation. We assume that the hypothesis of Sec. III A holds, so that given the full knowledge of the state (x_k, y_k, z_k), only one of the states x_{k+1}, y_{k+1}, or z_{k+1} is necessary to uniquely determine the input u_k. Considering successively x, y, or z as the secondary variable, we thus define six functional relations on the basis of Eq. (8),

y_{k+1} = G_{y1}(x_k, y_k, z_k, x_{k+1}),   z_{k+1} = G_{z1}(x_k, y_k, z_k, x_{k+1}),
x_{k+1} = G_{x1}(x_k, y_k, z_k, y_{k+1}),   z_{k+1} = G_{z2}(x_k, y_k, z_k, y_{k+1}),    (21)
x_{k+1} = G_{x2}(x_k, y_k, z_k, z_{k+1}),   y_{k+1} = G_{y2}(x_k, y_k, z_k, z_{k+1}).

By using Eq. (19), we can derive analytical expressions for the above functions: G_{y1} and G_{x1} are completely linear; G_{z1} and G_{z2} are polynomial; and G_{x2} and G_{y2} are rational functions with a pole at z_k = 0. (The time series that were generated for the Rössler map do not include state-space regions with z_k = 0.) For each of the functions of Eq. (21) to be approximated, Fig. 1 shows the normalized estimation error E as a function of the training data length N. As the estimation errors of local linear approximations of the functions f_x, f_y, G_{x1}, and G_{y1} were, as expected, on the order of the round-off error of floating-point computations, their values are not included. For the sake of comparison, the estimation errors of local linear approximations of f_z are also shown in Fig. 1. Since both axes of Fig. 1 are shown on a logarithmic scale, it is apparent that all estimation errors decrease as a negative power of the data length N, following the scaling law of Eq. (14).
We further validate the estimation errors by comparing them to thresholds as defined in Sec. IV A. For each state variable x_k, y_k, and z_k, the corresponding threshold was computed in the following manner. Assume that the noise ε_k mainly consists of an additive term depending on the input u_k, and approximate it by an estimate of the contribution of the terms depending on u_k in the equations of dynamics. For example, when estimating x_{k+1}, the noise term ε_{x,k} is approximated by the u_k-dependent contribution of Eq. (19a) minus its sample average. The threshold is then obtained by dividing the standard deviation of ε_x by that of x, as per Eq. (16). The resulting thresholds are shown in Table I. All estimation errors of Fig. 1 are well below the thresholds for training time series with more than 10 000 points.
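As a sketch of this threshold computation (with an illustrative input coefficient `c_in` and synthetic stand-in series, not the paper's actual values), the centered input contribution is divided by the standard deviation of the estimated variable, per Eq. (16):

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.normal(size=100_000)                 # unit-variance Gaussian input
x = rng.normal(scale=5.0, size=100_000)      # stand-in for the state series
c_in = 0.1                                   # illustrative input coefficient

eps_x = c_in * u - np.mean(c_in * u)         # centered input contribution
threshold = np.std(eps_x) / np.std(x)        # Eq. (16): sigma_eps / sigma_x
```

With these stand-in values, the threshold is close to 0.1 / 5 = 0.02; an estimation error staying above this level signals that the input could not be uniquely determined.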

3. Observation functions
We now consider the case of Sec. III B, for which the state is observed through two observation functions φ_a and φ_b. Since the entire state is not observed, the functional relation of Sec. III A cannot be used. For several choices of observation functions φ_a and φ_b and for increasing d, we evaluate the quality of local linear approximations of the functional relation of Eq. (11).
Table II contains the definitions of the pairs (φ_a, φ_b) of observation functions that were considered, the thresholds computed in the same manner as that described in Sec. IV B 2, and the normalized estimation errors obtained for local linear approximations of Eq. (11) using training time series with N = 100 000 points.
For d ≥ 2, the estimation error in all cases is close to one order of magnitude smaller than the corresponding threshold, or smaller still, which indicates that for d ≥ 2, the functional relations hold in all the cases listed in Table II.
Figures 2 and 3 show the evolution of the estimation error as a function of the training data length N, for case 1 and case 4 of Table II. Case 1 is one of the cases with the smallest estimation error; case 4 is one of the cases with the largest estimation error. The figures show that the estimation error for d = 1 stagnates, while for d ≥ 2, the estimation error follows the scaling law of Eq. (14).

TABLE II. Thresholds and estimation errors for local linear approximations of the Rössler map using various primary (φ_a) and secondary (φ_b) observation functions. The threshold for each primary observation function φ_a was computed from data sets of 1000 points as described in Sec. IV B 2. The functional relation that is approximated is that of Eq. (11), with increasing d. The estimation errors were computed using 100 000 training data points and 1000 test data points. All values shown are the mean for ten sets of data, and the standard deviation as a percentage of the mean.

C. Rössler system

1. Model and data
The second example is the Rössler system given by Eq. (18), with parameter values a = 0.398, b = 2, c = 4, and ω = 1. We define three cases of input coefficients: (a) α_x = 1 and α_y = 0, (b) α_x = 0 and α_y = 1, and (c) α_x = 1 and α_y = 1. By studying these three cases, it is possible to confirm, as the following results show, that if the inputs do not affect the chosen secondary variables, or if the number of secondary variables is insufficient, the relations do not hold: the estimation error stops decreasing after reaching a certain level. For each of these cases, several training and test time series were generated with the same procedure as described in Sec. IV B 1, by simulating the Rössler system of Eq. (18) with the fourth-order Runge-Kutta method (see, for example, [24]) with an integration time step h = 0.1.
Within an integration step, the fourth-order Runge-Kutta equations evaluate the input u(t) at the beginning of the step [u(t_k) = u_k], at the middle of the step [u(t_k + h/2)], and at the end of the step [u(t_k + h) = u_{k+1}]. We assume that the input u(t) is constant during one time step, so that u(t_k + h/2) = u_k. Because of this integration method, in the discretized equations that evolve the state (x_k, y_k, z_k) to (x_{k+1}, y_{k+1}, z_{k+1}), there are input terms depending on u_k in all three equations, and input terms depending on u_{k+1} in the first equation (when α_x = 1) and in the second equation (when α_y = 1). Thus, the generic form of the equations of dynamics of the discretized system is as follows:

x_{k+1} = f_x(x_k, y_k, z_k, u_k, u_{k+1}),
y_{k+1} = f_y(x_k, y_k, z_k, u_k, u_{k+1}),    (22)
z_{k+1} = f_z(x_k, y_k, z_k, u_k).
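A sketch of this data generation (forced Rössler system of Eq. (18), case (a), with the input held at u_k within the step and u_{k+1} used at the final Runge-Kutta stage, as described above):

```python
import numpy as np

a, b, c, omega = 0.398, 2.0, 4.0, 1.0
alpha_x, alpha_y = 1.0, 0.0                  # input coefficients, case (a)

def deriv(s, u):
    """Right-hand side of the forced Rössler system, Eq. (18)."""
    x, y, z = s
    return np.array([-omega * y - z + alpha_x * u,
                     omega * x + a * y + alpha_y * u,
                     b + z * (x - c)])

def rk4_step(s, u_k, u_k1, h=0.1):
    """One RK4 step with u(t_k) = u(t_k + h/2) = u_k and u(t_k + h) = u_{k+1}."""
    k1 = deriv(s, u_k)
    k2 = deriv(s + h / 2 * k1, u_k)
    k3 = deriv(s + h / 2 * k2, u_k)
    k4 = deriv(s + h * k3, u_k1)
    return s + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

rng = np.random.default_rng(2)
u = rng.normal(size=501)                     # Gaussian white-noise input
s = np.array([1.0, 1.0, 0.1])
traj = np.empty((500, 3))
for k in range(500):
    s = rk4_step(s, u[k], u[k + 1])
    traj[k] = s
```

Because u_{k+1} enters only through the final stage, the discretized x equation depends on both u_k and u_{k+1}, as in the generic form of Eq. (22).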

Full state observation
We first consider the case of a full state observation. Since the discretized equation, Eq. (22), involves two inputs u_k and u_{k+1}, we need at least two state variables to uniquely determine the inputs u_k and u_{k+1}, as described in Sec. III A. We have at our disposal only three state variables. We choose two of these as the secondary variables, and the remaining one as the primary variable. Thus, we consider the functional relations of Eq. (23), based on Eq. (8). Here, all the functions of Eqs. (22) and (23) are polynomial or rational functions of their arguments. We noted in Sec. IV C 1 that when α_x = 1 and α_y = 0, f_y and f_z do not depend on the input u_{k+1}. Therefore, the input u_{k+1} cannot be uniquely determined unless x is chosen as one of the secondary variables; hence, Eq. (23a) does not hold in case (a).
Similarly, when α_x = 0 and α_y = 1, f_x and f_z do not depend on the input u_{k+1}. Therefore, the input u_{k+1} cannot be uniquely determined unless y is chosen as one of the secondary variables; hence, Eq. (23b) does not hold in case (b).
Figure 4 shows the normalized estimation error of local linear approximations of G_x, G_y, and G_z as a function of the training data length N, while Table III gives the thresholds of Eq. (16), computed in the same way as for the Rössler map in Sec. IV B 2.
From Fig. 4(a), we see that for case (a), for which α_x = 1 and α_y = 0, the estimation error of approximations of G_y and G_z continues to decrease below the corresponding threshold as N is increased, which indicates successful models using only output data. However, the estimation error of an approximation of G_x does not decrease below the threshold as N is increased. This was expected since, as mentioned above, in case (a) the variable x_{k+1} is indispensable for uniquely determining the second input u_{k+1}.
Similarly, Fig. 4(b) shows that for case (b), for which α_x = 0 and α_y = 1, the estimation error of an approximation of G_z continues to decrease below the threshold, while G_y is limited by the inability to uniquely determine the input u_{k+1} from the variables y_{k+1}, z_{k+1}, x_k, y_k, and z_k. Nevertheless, note that the estimation error for G_x is below the threshold but does not follow the scaling law. This can be interpreted as follows. In Fig. 4(b), the estimation error for an approximation of G_x without using the variable z_{k+1} is also included. Its estimation error is similar to that of the approximation of G_x. Thus, the information contained in z_{k+1} was not useful for obtaining information about the input u_k; only the information contained in y_{k+1} was useful. However, since y_{k+1} depends not only on u_k but also on u_{k+1}, the input u_k can only be partially determined. Thus, the estimation of x_{k+1} is not accurate due to the partial knowledge of u_k.
Finally, Fig. 4(c) shows that for case (c), for which α_x = 1 and α_y = 1, the estimation error of approximations of G_x, G_y, and G_z continues to decrease below the threshold as N is increased. This indicates that the functional relations hold, and that any two of the state variables x_{k+1}, y_{k+1}, and z_{k+1} can be used to uniquely determine the two inputs u_k and u_{k+1}, assuming also knowledge of the previous state (x_k, y_k, z_k).
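The local linear approximations evaluated throughout this section can be sketched generically as follows. This is a sketch only, assuming an affine fit on a fixed number of nearest neighbors (the paper's exact neighborhood rule is not reproduced in this excerpt), with the estimation error normalized by the standard deviation of the test targets.

```python
import numpy as np

def local_linear_predict(X_train, y_train, x_query, n_neighbors=20):
    """Fit an affine map on the n_neighbors training points closest to
    x_query (in Euclidean distance) and evaluate it at x_query."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dists)[:n_neighbors]
    A = np.hstack([X_train[idx], np.ones((len(idx), 1))])  # affine design matrix
    coef, *_ = np.linalg.lstsq(A, y_train[idx], rcond=None)
    return np.concatenate([x_query, [1.0]]) @ coef

def normalized_error(X_train, y_train, X_test, y_test):
    """RMS prediction error on the test set, normalized by the standard
    deviation of the test targets."""
    preds = np.array([local_linear_predict(X_train, y_train, x) for x in X_test])
    return np.sqrt(np.mean((preds - y_test) ** 2)) / np.std(y_test)
```

For a relation that is exactly affine, the normalized error is at the level of round-off; for a smooth nonlinear relation, it decreases with the training data length N according to the scaling law discussed in the text.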

Observation functions
We now consider the case in which the state is observed through two observation functions a and b. When choosing the observation functions, it is important to ensure that the main assumption of Sec. III B holds, i.e., when knowledge of the state (x_k, y_k, z_k) is assumed, the secondary observation b(x_{k+1}, y_{k+1}, z_{k+1}) uniquely determines the inputs, which here are u_k and u_{k+1}. Ignoring the fact that u_k and u_{k+1} are the same input at two different times, we need at least two secondary observations. Here again, for several choices of a and b, and for increasing d, we evaluate the normalized estimation errors of local linear approximations of the functional relation of Eq. (11).
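Assembling the training pairs for such an approximation can be sketched as follows. This is a sketch under an assumed form of Eq. (11), which is not reproduced in this excerpt: the next primary observation is predicted from the next secondary observation(s) together with d delays of the primary observation.

```python
import numpy as np

def build_regression_pairs(phi_a, phi_b, d):
    """Assemble (regressor, target) pairs for the assumed relation
    phi_a[k+1] = G(phi_b[k+1], phi_a[k], ..., phi_a[k-d+1])."""
    X, y = [], []
    for k in range(d - 1, len(phi_a) - 1):
        secondary = np.atleast_1d(phi_b[k + 1])   # next secondary observation(s)
        delays = phi_a[k - d + 1 : k + 1][::-1]   # d delays of the primary observation
        X.append(np.concatenate([secondary, delays]))
        y.append(phi_a[k + 1])
    return np.array(X), np.array(y)
```

The resulting pairs (X, y) can then be fed to any local linear estimator; phi_b may be vector valued when two or more secondary observations are needed.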
We present here the results only for case (b) with α_x = 0 and α_y = 1; for cases (a) and (c), the results were similar.
Table IV contains the definitions of the pairs (a, b) of observation functions that were considered, as well as the thresholds, computed in the same manner as described in Sec. IV B 2, and the normalized estimation errors obtained for local linear approximations of Eq. (11) using training time series with N = 100 000 points. For cases 2 to 6, when d ≥ 1 the estimation errors are at least one order of magnitude below the threshold; for cases 1 and 7, when d ≥ 2 the estimation errors are one order of magnitude below the threshold. As a representative example, Fig. 5 shows the estimation error for case 5 as a function of the training data length N. One can see that the estimation error follows the scaling law of Eq. (14).
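The scaling law E = O(N^{-2/D}) can be checked numerically by a log-log fit of the estimation error against the training data length. The sketch below uses synthetic (N, E) values generated with an assumed dimension D = 4 and prefactor 0.3; both values are illustrative, not taken from the paper.

```python
import numpy as np

# Synthetic (N, E) pairs generated from E = C * N^(-2/D) with C = 0.3, D = 4.
N_vals = np.array([1.0e3, 1.0e4, 1.0e5])
E_vals = 0.3 * N_vals ** (-2.0 / 4.0)

# The slope of the log-log plot estimates -2/D.
slope, intercept = np.polyfit(np.log(N_vals), np.log(E_vals), 1)
D_est = -2.0 / slope
```

On real estimation errors, a deviation of the fitted slope from -2/D (or a stagnating error) signals that the assumed functional relation does not hold, as seen for d = 1 in the text.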

D. Two coupled Rössler systems
Finally, we analyze a system that consists of two coupled Rössler systems, the first one having additional input forces [Eq. (24)].
We consider the two coupled systems not as one system but as two separate systems. The state variables of the first Rössler system are x_1, y_1, and z_1, and the external inputs are u and y_2. The state variables of the second Rössler system are x_2, y_2, and z_2, and the external input is y_1.
However, for discretized versions of these systems, the external inputs are different, as shown below. Consider a discretization using the fourth-order Runge-Kutta method. As in Sec. IV C 1, we assume that the input is constant during one time step. Equation (24) can be inserted into the Runge-Kutta equations in order to obtain discretized equations of the two coupled systems. The discretized equations are fourth-order approximations. Reproducing the complete set of equations here would take considerable space; however, for the second Rössler system, we succinctly write the general form of these equations [Eq. (25)], where superscripts have been added as follows. Note that the functions f_{x2}, f_{y2}, and f_{z2} are polynomials. When the terms of the polynomials are ordered according to increasing powers of h, the superscript of each variable on the right-hand side of Eq. (25) is the order of the first term in which this variable appears. For example, y_{1,k} first appears in a term multiplied by h^2 in the function f_{x2}, in a term multiplied by h in the function f_{y2}, and in a term multiplied by h^3 in the function f_{z2}. Equation (25) shows that the inputs of the discretized second Rössler system are y_{1,k}, x_{1,k}, z_{1,k}, and u_k. We cannot in general uniquely determine four inputs from the next values of the three state variables x_2, y_2, and z_2, or from observables that depend on them, since this requires at the very least four state variables. However, if we consider a second-order approximation in h, Eq. (25) shows that only y_{1,k} and x_{1,k} are inputs of the second Rössler system.
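The order-counting argument above can be reproduced symbolically. The sketch below uses a toy stand-in for the second subsystem, not the paper's exact Eq. (24): the coupling variable y1 enters only the y2 equation, as in the coupled Rössler pair, and we find the lowest power of h at which y1 appears in each component of one Runge-Kutta step.

```python
import sympy as sp

h, x2, y2, z2, y1 = sp.symbols('h x2 y2 z2 y1')

def rhs(state):
    """Toy stand-in for the driven second subsystem; the coupling
    variable y1 enters only the y2 equation."""
    x, y, z = state
    return (-y - z,            # dx2/dt
            x + y / 2 + y1,    # dy2/dt (coupling enters here)
            1 + z * (x - 2))   # dz2/dt

def rk4_step(state):
    """One symbolic fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs(tuple(s + h / 2 * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + h / 2 * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + h * k for s, k in zip(state, k3)))
    return [sp.expand(s + h / 6 * (a + 2 * b + 2 * c + e))
            for s, a, b, c, e in zip(state, k1, k2, k3, k4)]

def lowest_order_in_h(expr, var):
    """Smallest power of h among the terms of expr that contain var."""
    degs = [sp.degree(t, h) for t in sp.Add.make_args(sp.expand(expr)) if t.has(var)]
    return min(degs)

nxt = rk4_step((x2, y2, z2))
orders = [lowest_order_in_h(e, y1) for e in nxt]
# y1 first enters the x2 update at O(h^2), the y2 update at O(h),
# and the z2 update at O(h^3), matching the superscript pattern in the text.
```

This reproduces, for the toy system, the pattern stated above for Eq. (25): the coupling variable enters the three update equations at orders h^2, h, and h^3, respectively.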
In order to verify this approach, we consider output-only models that rely on the assumption of having only two inputs y_{1,k} and x_{1,k}, and we identify, from the data, local linear approximations of three functional relations, the first of which is

x_{2,k+1} = G_{x2}(y_{2,k+1}, z_{2,k+1}, x_{2,k}, y_{2,k}, z_{2,k}).

TABLE IV. Thresholds and estimation errors for local linear approximations of the Rössler system using various primary (a) and secondary (b) observation functions, for input coefficients α_x = 0 and α_y = 1. The threshold for each primary observation function a was computed from data sets of 1000 points as described in Sec. IV B 2. The functional relation that is approximated is that of Eq. (11), with increasing d. The estimation errors were computed using 100 000 training data points and 1000 test data points. All values shown are the mean for ten sets of data, and the standard deviation as a percentage of the mean.

These functional relations are the same as those for the Rössler system in Eq. (23). For comparison, we also identify local linear approximations of the functions f_{x2}, f_{y2}, and f_{z2}, in which we include variables of increasing order in h. For example, f_{x2}^1 is a first-order approximation of f_{x2} depending only on the variables x_{2,k}, y_{2,k}, and z_{2,k}.
The normalized estimation error as a function of the training data length N is shown in Fig. 6. Figure 6(b) shows that the local linear approximation using only output data to estimate y_2 has an estimation error that is smaller than that of a first-order approximation of f_{y2}, but larger than that of a second-order approximation of f_{y2}. This was expected, since the second-order approximation depends on both y_{1,k} and x_{1,k}, which cannot be uniquely determined from the two secondary state variables x_{2,k+1} and z_{2,k+1}.
In Figs. 6(a) and 6(c), we observe that, for an estimation of x_2 or z_2, local linear approximations using only output data outperform local linear approximations of the functions f_{x2} and f_{z2}. This is contrary to our expectation. A possible explanation is that the local linear approximation not only modeled the relations between primary and secondary variables using the equations for the second Rössler system, but also reconstructed part of the dynamics of the first Rössler system.

V. SUMMARY AND DISCUSSIONS
We approached the problem of building models of input-output systems without any knowledge of the input. After examining existing embedding theorems, we proposed theoretical arguments for the existence of a functional relation between delays of multivariate observations. The main assumptions are that the number of inputs is small and that the inputs can be uniquely determined from the current state and some of the observations at the next discrete time. Thus, multivariate observations are split into primary observations and secondary observations. The secondary observations are used to obtain information about the input, while delays of the primary observations are used to obtain information about the state of the system.
We first studied two examples, namely, a discrete map obtained by discretizing the Rössler system using finite differences, and the Rössler system simulated by the Runge-Kutta method. We showed that the estimation errors obtained for local linear models using only observed data are smaller, by close to one order of magnitude, than what they would be if the inputs could not be taken into account in the model. Furthermore, the estimation errors follow the classical scaling law E = O(N^{-2/D}). This indicates that successful models can be built by using only output data. A condition for this is that the secondary observations are functions of the states that are affected by the input.
An important application of this method is the analysis of coupled systems or complex networks. We can partition such a large system into several subsystems and study them separately. In this way, models based on the deterministic dynamics of each subsystem can be built, which would not be possible for the whole system because of its high dimensionality.
In order to verify the feasibility of this approach for the case of continuous-time systems, we studied the example of two coupled Rössler systems. The results indicate that in the case of a noiseless time series originating from these two continuous-time coupled systems, we cannot fully decouple the two subsystems to obtain a model of only one of the two subsystems. Although the two subsystems are coupled by only one variable when viewed as continuous-time systems, the discrete equations controlling the evolution of a time series originating from these subsystems have more coupling terms. In the case of a discretization with the Runge-Kutta method, we showed that, because of the continuous-time nature of the system, the inputs of the considered subsystem in the discretized equations include all the state variables of the other subsystem. This dependence originates from the fact that information flows through each variable in the continuous-time equations. Despite this, the effect of some of the inputs on the equations of the considered subsystem is negligible; therefore, the proposed relations hold approximately.
Only noiseless time series were used in this paper. The effect of observational noise is reported elsewhere [25]. For noisy observations, the dependence on the additional variables will be hidden by the noise, since these variables appear in the discretized equations in terms of higher order in the integration time step h. In such a case, we can expect to separate the dynamics of the two subsystems, since the noise hides the flow of information from the variables of one subsystem to the other.
It should be noted that the two proposed functional relations are nonpredictive relations; thus, they cannot be used to predict the future behavior of the system. However, since they model the dynamics of the system, they have other applications such as change detection and noise reduction. The application to change detection is reported elsewhere [25]. Also, preliminary results indicate that for noisy data, the estimates of the primary observations are less noisy than the original time series; thus, noise in the primary observations can be reduced.

FIG. 1. Normalized estimation error for a local linear approximation of the Rössler map as a function of the training data length N. The functions f_x, f_y, G_{x1}, and G_{y1} were not included, since their estimation error was of the order of the round-off error of floating-point computations. For each value of N, the mean and the standard deviation for ten sets of training and test data are shown.
FIG. 2. Normalized estimation error for a local linear approximation of the Rössler map with observation functions a(x, y, z) = x and b(x, y, z) = y, as a function of the training data length N. For each value of N, the mean and the standard deviation for ten sets of training and test data are shown.

FIG. 3. Normalized estimation error for a local linear approximation of the Rössler map with observation functions a(x, y, z) = xy and b(x, y, z) = x + y + z, as a function of the training data length N. For each value of N, the mean and the standard deviation for ten sets of training and test data are shown.

FIG. 4. Normalized estimation error for a local linear approximation of the Rössler system as a function of the training data length N, with parameters (a) α_x = 1 and α_y = 0, (b) α_x = 0 and α_y = 1, and (c) α_x = 1 and α_y = 1. For each value of N, the mean and the standard deviation for ten sets of training and test data are shown.

FIG. 5. Normalized estimation error for a local linear approximation of the Rössler system with observation functions a(x, y, z) = x + y + z and b(x, y, z) = (yz + x^2, (x + y)^2), as a function of the training data length N. The input parameters were α_x = 0 and α_y = 1. For each value of N, the mean and the standard deviation for ten sets of training and test data are shown.

FIG. 6. Normalized estimation error for a local linear approximation of the two coupled Rössler systems as a function of the training data length N. The estimated variables are (a) x_2, (b) y_2, and (c) z_2. For each value of N, the mean and the standard deviation for ten sets of training and test data are shown.

TABLE I. Estimation error thresholds for the Rössler map. The values given are the mean for ten data sets of 1000 points, and the standard deviation.

TABLE III. Estimation error thresholds for the Rössler system. The values given are the mean for ten data sets of 1000 points, and the standard deviation.